996 lines
75 KiB
Plaintext
996 lines
75 KiB
Plaintext
{
|
|
"nbformat": 4,
|
|
"nbformat_minor": 0,
|
|
"metadata": {
|
|
"colab": {
|
|
"name": "Lab13_exercises.ipynb",
|
|
"provenance": [],
|
|
"collapsed_sections": []
|
|
},
|
|
"kernelspec": {
|
|
"name": "python3",
|
|
"display_name": "Python 3"
|
|
},
|
|
"language_info": {
|
|
"name": "python"
|
|
}
|
|
},
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"#Principal Component Analysis"
|
|
],
|
|
"metadata": {
|
|
"id": "0TRW769UviGZ",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"In this lab, we'll see in practice how to use Principal Component Analysis (PCA) to reduce the dimensionality of a dataset, and how to interpret the principal components. To this end, we'll use the \"Wisconsin Breast Cancer dataset\", which contains diverse characteristics of cancerous cell nuclei, and whether the patient's tumor is malignant (M) or benign (B). In the second part of the lab, we'll see how PCA can be used to compress an image. "
|
|
],
|
|
"metadata": {
|
|
"id": "6Wv_p-C5vtEK",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**Load the necessary libraries**"
|
|
],
|
|
"metadata": {
|
|
"id": "SJrpIZNhw_wK",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {
|
|
"id": "uhVzgj_HOHit",
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import sklearn\n",
|
|
"import pandas as pd \n",
|
|
"from sklearn.decomposition import PCA\n",
|
|
"import numpy as np \n",
|
|
"import matplotlib.pyplot as plt \n",
|
|
"from sklearn.linear_model import LogisticRegression\n",
|
|
"from sklearn.pipeline import Pipeline \n",
|
|
"from sklearn.model_selection import train_test_split\n",
|
|
"from sklearn.metrics import accuracy_score\n",
|
|
"#import cv2"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**1) Read the dataset, drop the columns 'id' and 'Unnamed : 32', and check whether there are missing values. If any, drop the entire corresponding row.**"
|
|
],
|
|
"metadata": {
|
|
"id": "ud0SDrc2xDn2",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" diagnosis radius_mean texture_mean perimeter_mean area_mean \\\n",
|
|
"0 M 17.99 10.38 122.80 1001.0 \n",
|
|
"1 M 20.57 17.77 132.90 1326.0 \n",
|
|
"2 M 19.69 21.25 130.00 1203.0 \n",
|
|
"3 M 11.42 20.38 77.58 386.1 \n",
|
|
"4 M 20.29 14.34 135.10 1297.0 \n",
|
|
"\n",
|
|
" smoothness_mean compactness_mean concavity_mean concave points_mean \\\n",
|
|
"0 0.11840 0.27760 0.3001 0.14710 \n",
|
|
"1 0.08474 0.07864 0.0869 0.07017 \n",
|
|
"2 0.10960 0.15990 0.1974 0.12790 \n",
|
|
"3 0.14250 0.28390 0.2414 0.10520 \n",
|
|
"4 0.10030 0.13280 0.1980 0.10430 \n",
|
|
"\n",
|
|
" symmetry_mean ... radius_worst texture_worst perimeter_worst \\\n",
|
|
"0 0.2419 ... 25.38 17.33 184.60 \n",
|
|
"1 0.1812 ... 24.99 23.41 158.80 \n",
|
|
"2 0.2069 ... 23.57 25.53 152.50 \n",
|
|
"3 0.2597 ... 14.91 26.50 98.87 \n",
|
|
"4 0.1809 ... 22.54 16.67 152.20 \n",
|
|
"\n",
|
|
" area_worst smoothness_worst compactness_worst concavity_worst \\\n",
|
|
"0 2019.0 0.1622 0.6656 0.7119 \n",
|
|
"1 1956.0 0.1238 0.1866 0.2416 \n",
|
|
"2 1709.0 0.1444 0.4245 0.4504 \n",
|
|
"3 567.7 0.2098 0.8663 0.6869 \n",
|
|
"4 1575.0 0.1374 0.2050 0.4000 \n",
|
|
"\n",
|
|
" concave points_worst symmetry_worst fractal_dimension_worst \n",
|
|
"0 0.2654 0.4601 0.11890 \n",
|
|
"1 0.1860 0.2750 0.08902 \n",
|
|
"2 0.2430 0.3613 0.08758 \n",
|
|
"3 0.2575 0.6638 0.17300 \n",
|
|
"4 0.1625 0.2364 0.07678 \n",
|
|
"\n",
|
|
"[5 rows x 31 columns]\n",
|
|
"<class 'pandas.core.frame.DataFrame'>\n",
|
|
"RangeIndex: 569 entries, 0 to 568\n",
|
|
"Data columns (total 31 columns):\n",
|
|
" # Column Non-Null Count Dtype \n",
|
|
"--- ------ -------------- ----- \n",
|
|
" 0 diagnosis 569 non-null category\n",
|
|
" 1 radius_mean 569 non-null float64 \n",
|
|
" 2 texture_mean 569 non-null float64 \n",
|
|
" 3 perimeter_mean 569 non-null float64 \n",
|
|
" 4 area_mean 569 non-null float64 \n",
|
|
" 5 smoothness_mean 569 non-null float64 \n",
|
|
" 6 compactness_mean 569 non-null float64 \n",
|
|
" 7 concavity_mean 569 non-null float64 \n",
|
|
" 8 concave points_mean 569 non-null float64 \n",
|
|
" 9 symmetry_mean 569 non-null float64 \n",
|
|
" 10 fractal_dimension_mean 569 non-null float64 \n",
|
|
" 11 radius_se 569 non-null float64 \n",
|
|
" 12 texture_se 569 non-null float64 \n",
|
|
" 13 perimeter_se 569 non-null float64 \n",
|
|
" 14 area_se 569 non-null float64 \n",
|
|
" 15 smoothness_se 569 non-null float64 \n",
|
|
" 16 compactness_se 569 non-null float64 \n",
|
|
" 17 concavity_se 569 non-null float64 \n",
|
|
" 18 concave points_se 569 non-null float64 \n",
|
|
" 19 symmetry_se 569 non-null float64 \n",
|
|
" 20 fractal_dimension_se 569 non-null float64 \n",
|
|
" 21 radius_worst 569 non-null float64 \n",
|
|
" 22 texture_worst 569 non-null float64 \n",
|
|
" 23 perimeter_worst 569 non-null float64 \n",
|
|
" 24 area_worst 569 non-null float64 \n",
|
|
" 25 smoothness_worst 569 non-null float64 \n",
|
|
" 26 compactness_worst 569 non-null float64 \n",
|
|
" 27 concavity_worst 569 non-null float64 \n",
|
|
" 28 concave points_worst 569 non-null float64 \n",
|
|
" 29 symmetry_worst 569 non-null float64 \n",
|
|
" 30 fractal_dimension_worst 569 non-null float64 \n",
|
|
"dtypes: category(1), float64(30)\n",
|
|
"memory usage: 134.2 KB\n",
|
|
"None\n",
|
|
"diagnosis 0\n",
|
|
"radius_mean 0\n",
|
|
"texture_mean 0\n",
|
|
"perimeter_mean 0\n",
|
|
"area_mean 0\n",
|
|
"smoothness_mean 0\n",
|
|
"compactness_mean 0\n",
|
|
"concavity_mean 0\n",
|
|
"concave points_mean 0\n",
|
|
"symmetry_mean 0\n",
|
|
"fractal_dimension_mean 0\n",
|
|
"radius_se 0\n",
|
|
"texture_se 0\n",
|
|
"perimeter_se 0\n",
|
|
"area_se 0\n",
|
|
"smoothness_se 0\n",
|
|
"compactness_se 0\n",
|
|
"concavity_se 0\n",
|
|
"concave points_se 0\n",
|
|
"symmetry_se 0\n",
|
|
"fractal_dimension_se 0\n",
|
|
"radius_worst 0\n",
|
|
"texture_worst 0\n",
|
|
"perimeter_worst 0\n",
|
|
"area_worst 0\n",
|
|
"smoothness_worst 0\n",
|
|
"compactness_worst 0\n",
|
|
"concavity_worst 0\n",
|
|
"concave points_worst 0\n",
|
|
"symmetry_worst 0\n",
|
|
"fractal_dimension_worst 0\n",
|
|
"dtype: int64\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"file = '../data/wbcd.csv'\n",
|
|
"\n",
|
|
"##Read dataframe##\n",
|
|
"\n",
|
|
"df = pd.read_csv(file)\n",
|
|
"df.drop(columns=['id','Unnamed: 32'],axis=1,inplace=True)\n",
|
|
"df= df.astype({'diagnosis':'category'})\n",
|
|
"\n",
|
|
"\n",
|
|
"print(df.head())\n",
|
|
"print(df.info())\n",
|
|
"print(df.isna().sum())"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**2) Select X as all columns in the dataframe, at the exception of the target variable 'diagnosis', and y as the variable diagnosis. Center the matrix X column-wise.**"
|
|
],
|
|
"metadata": {
|
|
"id": "5JlJfmxfxiSw",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 37,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" radius_mean texture_mean perimeter_mean area_mean smoothness_mean \\\n",
|
|
"0 3.862708 -8.909649 30.830967 346.110896 0.02204 \n",
|
|
"1 6.442708 -1.519649 40.930967 671.110896 -0.01162 \n",
|
|
"2 5.562708 1.960351 38.030967 548.110896 0.01324 \n",
|
|
"3 -2.707292 1.090351 -14.389033 -268.789104 0.04614 \n",
|
|
"4 6.162708 -4.949649 43.130967 642.110896 0.00394 \n",
|
|
".. ... ... ... ... ... \n",
|
|
"564 7.432708 3.100351 50.030967 824.110896 0.01464 \n",
|
|
"565 6.002708 8.960351 39.230967 606.110896 0.00144 \n",
|
|
"566 2.472708 8.790351 16.330967 203.210896 -0.01181 \n",
|
|
"567 6.472708 10.040351 48.130967 610.110896 0.02144 \n",
|
|
"568 -6.367292 5.250351 -44.049033 -473.889104 -0.04373 \n",
|
|
"\n",
|
|
" compactness_mean concavity_mean concave points_mean symmetry_mean \\\n",
|
|
"0 0.173259 0.211301 0.098181 0.060738 \n",
|
|
"1 -0.025701 -0.001899 0.021251 0.000038 \n",
|
|
"2 0.055559 0.108601 0.078981 0.025738 \n",
|
|
"3 0.179559 0.152601 0.056281 0.078538 \n",
|
|
"4 0.028459 0.109201 0.055381 -0.000262 \n",
|
|
".. ... ... ... ... \n",
|
|
"564 0.011559 0.155101 0.089981 -0.008562 \n",
|
|
"565 -0.000941 0.055201 0.048991 -0.005962 \n",
|
|
"566 -0.002041 0.003711 0.004101 -0.022162 \n",
|
|
"567 0.172659 0.262601 0.103081 0.058538 \n",
|
|
"568 -0.060721 -0.088799 -0.048919 -0.022462 \n",
|
|
"\n",
|
|
" fractal_dimension_mean ... radius_worst texture_worst \\\n",
|
|
"0 0.015912 ... 9.11081 -8.347223 \n",
|
|
"1 -0.006128 ... 8.72081 -2.267223 \n",
|
|
"2 -0.002808 ... 7.30081 -0.147223 \n",
|
|
"3 0.034642 ... -1.35919 0.822777 \n",
|
|
"4 -0.003968 ... 6.27081 -9.007223 \n",
|
|
".. ... ... ... ... \n",
|
|
"564 -0.006568 ... 9.18081 0.722777 \n",
|
|
"565 -0.007468 ... 7.42081 12.572777 \n",
|
|
"566 -0.006318 ... 2.71081 8.442777 \n",
|
|
"567 0.007362 ... 9.47081 13.742777 \n",
|
|
"568 -0.003958 ... -6.81319 4.692777 \n",
|
|
"\n",
|
|
" perimeter_worst area_worst smoothness_worst compactness_worst \\\n",
|
|
"0 77.338787 1138.416872 0.029831 0.411335 \n",
|
|
"1 51.538787 1075.416872 -0.008569 -0.067665 \n",
|
|
"2 45.238787 828.416872 0.012031 0.170235 \n",
|
|
"3 -8.391213 -312.883128 0.077431 0.612035 \n",
|
|
"4 44.938787 694.416872 0.005031 -0.049265 \n",
|
|
".. ... ... ... ... \n",
|
|
"564 58.838787 1146.416872 0.008631 -0.042965 \n",
|
|
"565 47.738787 850.416872 -0.015769 -0.062065 \n",
|
|
"566 19.438787 243.416872 -0.018469 0.055135 \n",
|
|
"567 77.338787 940.416872 0.032631 0.613835 \n",
|
|
"568 -48.101213 -611.983128 -0.042409 -0.189825 \n",
|
|
"\n",
|
|
" concavity_worst concave points_worst symmetry_worst \\\n",
|
|
"0 0.439712 0.150794 0.170024 \n",
|
|
"1 -0.030588 0.071394 -0.015076 \n",
|
|
"2 0.178212 0.128394 0.071224 \n",
|
|
"3 0.414712 0.142894 0.373724 \n",
|
|
"4 0.127812 0.047894 -0.053676 \n",
|
|
".. ... ... ... \n",
|
|
"564 0.138512 0.106994 -0.084076 \n",
|
|
"565 0.049312 0.048194 -0.032876 \n",
|
|
"566 0.068112 0.027194 -0.068276 \n",
|
|
"567 0.666512 0.150394 0.118624 \n",
|
|
"568 -0.272188 -0.114606 -0.002976 \n",
|
|
"\n",
|
|
" fractal_dimension_worst \n",
|
|
"0 0.034954 \n",
|
|
"1 0.005074 \n",
|
|
"2 0.003634 \n",
|
|
"3 0.089054 \n",
|
|
"4 -0.007166 \n",
|
|
".. ... \n",
|
|
"564 -0.012796 \n",
|
|
"565 -0.017576 \n",
|
|
"566 -0.005746 \n",
|
|
"567 0.040054 \n",
|
|
"568 -0.013556 \n",
|
|
"\n",
|
|
"[569 rows x 30 columns]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"pred = list(df.columns)\n",
|
|
"pred.remove('diagnosis')\n",
|
|
"X_normal =df[pred]\n",
|
|
"X = df[pred] - df[pred].mean()\n",
|
|
"Y = df[['diagnosis']]\n",
|
|
"print(X)"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**3) Compute the total variance of the variable X.**"
|
|
],
|
|
"metadata": {
|
|
"id": "X-jy8912yE8O",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"451896.5562573987\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(X.var().sum())"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**4) Apply PCA to the dataset X by computing all principal components. Make sure you can access to the principal components, the variance explained by each component and the ratio of variance explained by each component.**"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 20,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[9.82044672e-01 1.61764899e-02 1.55751075e-03 1.20931964e-04\n",
|
|
" 8.82724536e-05 6.64883951e-06 4.01713682e-06 8.22017197e-07\n",
|
|
" 3.44135279e-07 1.86018721e-07 6.99473205e-08 1.65908880e-08\n",
|
|
" 6.99641650e-09 4.78318306e-09 2.93549214e-09 1.41684927e-09\n",
|
|
" 8.29577731e-10 5.20405883e-10 4.08463983e-10 3.63313378e-10\n",
|
|
" 1.72849737e-10 1.27487508e-10 7.72682973e-11 6.28357718e-11\n",
|
|
" 3.57302295e-11 2.76396041e-11 8.14452259e-12 6.30211541e-12\n",
|
|
" 4.43666945e-12 1.55344680e-12]\n",
|
|
"[[ 5.08623202e-03 2.19657026e-03 3.50763298e-02 5.16826469e-01\n",
|
|
" 4.23694535e-06 4.05260047e-05 8.19399539e-05 4.77807775e-05\n",
|
|
" 7.07804332e-06 -2.62155251e-06 3.13742507e-04 -6.50984008e-05\n",
|
|
" 2.23634150e-03 5.57271669e-02 -8.05646029e-07 5.51918197e-06\n",
|
|
" 8.87094462e-06 3.27915009e-06 -1.24101836e-06 -8.54530832e-08\n",
|
|
" 7.15473257e-03 3.06736622e-03 4.94576447e-02 8.52063392e-01\n",
|
|
" 6.42005481e-06 1.01275937e-04 1.68928625e-04 7.36658178e-05\n",
|
|
" 1.78986262e-05 1.61356159e-06]\n",
|
|
" [ 9.28705650e-03 -2.88160658e-03 6.27480827e-02 8.51823720e-01\n",
|
|
" -1.48194356e-05 -2.68862249e-06 7.51419574e-05 4.63501038e-05\n",
|
|
" -2.52430431e-05 -1.61197148e-05 -5.38692831e-05 3.48370414e-04\n",
|
|
" 8.19640791e-04 7.51112451e-03 1.49438131e-06 1.27357957e-05\n",
|
|
" 2.86921009e-05 9.36007477e-06 1.22647432e-05 2.89683790e-07\n",
|
|
" -5.68673345e-04 -1.32152605e-02 -1.85961117e-04 -5.19742358e-01\n",
|
|
" -7.68565692e-05 -2.56104144e-04 -1.75471479e-04 -3.05051743e-05\n",
|
|
" -1.57042845e-04 -5.53071662e-05]\n",
|
|
" [-1.23425821e-02 -6.35497857e-03 -7.16694814e-02 -2.78944181e-02\n",
|
|
" 7.26596827e-05 1.01754350e-04 2.65989729e-04 3.60471764e-05\n",
|
|
" 1.41290958e-04 5.06376971e-05 6.06156709e-03 6.23377635e-03\n",
|
|
" 4.38560369e-02 9.90245878e-01 4.34471433e-05 1.27658711e-04\n",
|
|
" 2.07365800e-04 4.78855144e-05 1.14411270e-04 2.43158370e-05\n",
|
|
" -1.55659935e-02 -3.15446196e-02 -9.23133791e-02 -3.93186778e-02\n",
|
|
" -4.21307399e-05 -7.64833237e-04 -8.46552237e-04 -3.33596393e-04\n",
|
|
" -3.49992952e-04 -4.09371692e-05]\n",
|
|
" [-3.42380473e-02 -3.62415111e-01 -3.29281417e-01 3.94122494e-02\n",
|
|
" -3.44153009e-04 -3.00489873e-03 -3.40779110e-03 -1.24725032e-03\n",
|
|
" -9.66809714e-04 -1.99194796e-04 -4.08618843e-03 -2.26398666e-02\n",
|
|
" -4.98565303e-02 -1.01980275e-01 6.69114619e-06 -8.93263012e-04\n",
|
|
" -9.95328878e-04 -2.34560912e-04 -1.24528446e-04 -6.72412843e-05\n",
|
|
" -6.18999387e-02 -5.42057412e-01 -6.66816451e-01 3.87691524e-02\n",
|
|
" -7.21927589e-04 -1.03619614e-02 -1.15618071e-02 -2.99467373e-03\n",
|
|
" -2.64085170e-03 -9.08697292e-04]\n",
|
|
" [ 3.54561138e-02 -4.43187450e-01 3.13382893e-01 -4.60378117e-02\n",
|
|
" 5.79019359e-04 2.52639926e-03 2.19520726e-03 1.13196737e-03\n",
|
|
" 9.37014169e-04 2.07028041e-04 2.93386180e-03 -3.75434531e-02\n",
|
|
" 3.57275320e-02 5.08045702e-02 -5.18037664e-05 5.24579915e-04\n",
|
|
" 5.76839903e-04 2.25598524e-04 6.11321955e-05 4.64421630e-05\n",
|
|
" 5.31447667e-02 -6.12574312e-01 5.64102976e-01 -1.84525531e-02\n",
|
|
" 4.65062512e-04 6.09647380e-03 6.16530214e-03 2.41157233e-03\n",
|
|
" 1.88324182e-03 5.19581269e-04]\n",
|
|
" [-1.31213101e-01 -2.13486089e-01 -8.40324225e-01 5.23468101e-02\n",
|
|
" -4.06502430e-04 -1.01527758e-03 2.75600070e-04 -5.76346878e-04\n",
|
|
" -1.79444495e-04 2.19983885e-04 -8.45585552e-04 -1.24013980e-02\n",
|
|
" 9.48056397e-02 -2.31166662e-02 -1.49989514e-05 -3.59930492e-04\n",
|
|
" -3.83840527e-04 -4.25616208e-04 -4.11711911e-05 -1.00135535e-04\n",
|
|
" -7.49807186e-02 1.21167279e-01 4.44630524e-01 -2.01806772e-02\n",
|
|
" 1.47871511e-03 9.48569782e-03 1.04511092e-02 1.59681971e-03\n",
|
|
" 5.47852368e-03 1.23726579e-03]\n",
|
|
" [ 3.35131912e-02 -7.84253475e-01 1.89074737e-01 -7.33787337e-03\n",
|
|
" 1.60796958e-03 2.77107786e-04 1.02365525e-03 9.05454729e-04\n",
|
|
" 5.98298140e-04 -4.25619565e-05 -1.53826412e-02 6.66867308e-02\n",
|
|
" -1.48548561e-01 2.25977534e-02 -2.37177063e-04 -1.27405510e-03\n",
|
|
" -1.41036865e-03 -5.21614746e-04 -7.13773114e-04 -1.94572501e-04\n",
|
|
" 4.53901657e-02 5.52024144e-01 -1.17015608e-01 1.83169390e-03\n",
|
|
" 3.94704099e-03 7.74390329e-03 1.08822097e-02 4.24156865e-03\n",
|
|
" 7.03799527e-03 1.17067750e-03]\n",
|
|
" [-7.54924585e-02 -6.87405638e-02 8.39642267e-02 -3.00992471e-03\n",
|
|
" 3.43658580e-03 1.55731486e-02 1.92512587e-02 9.07295722e-03\n",
|
|
" 9.14981253e-03 3.00903313e-03 8.41264091e-02 5.87281487e-01\n",
|
|
" 7.77894165e-01 -4.22340772e-02 1.60686058e-03 8.68045428e-03\n",
|
|
" 1.17908464e-02 3.99536083e-03 5.54864404e-03 1.44055227e-03\n",
|
|
" -1.29429315e-01 1.60158677e-02 -7.32396132e-02 5.01077221e-03\n",
|
|
" -2.13704577e-03 -8.59643463e-03 -5.96017341e-03 4.06900452e-04\n",
|
|
" -7.69020238e-03 -9.48038255e-05]\n",
|
|
" [-3.50549264e-01 4.08376429e-03 1.32828034e-01 -3.82916116e-03\n",
|
|
" 8.22698130e-03 5.63148308e-02 7.02297025e-02 1.92498100e-02\n",
|
|
" 1.49895864e-02 7.63859418e-03 -9.15127909e-02 1.62432584e-02\n",
|
|
" -1.90207008e-01 4.47806986e-03 1.02379224e-03 1.89362797e-02\n",
|
|
" 2.52133644e-02 2.90414428e-03 1.76381905e-03 2.10548292e-03\n",
|
|
" -8.60507446e-01 -2.11047341e-03 3.86007208e-02 4.09978696e-03\n",
|
|
" 1.14296358e-02 1.56279540e-01 1.91177497e-01 3.09293509e-02\n",
|
|
" 2.02890503e-02 1.76591684e-02]\n",
|
|
" [ 1.39559852e-01 7.66679112e-02 -8.92113884e-02 1.95571374e-03\n",
|
|
" -4.44685266e-03 -2.99475404e-02 -2.79441150e-02 -1.04362500e-02\n",
|
|
" -8.27800166e-03 -4.54280517e-03 -3.22073955e-02 7.91295671e-01\n",
|
|
" -5.48501632e-01 2.03427164e-02 4.45191329e-04 -6.64830167e-03\n",
|
|
" -9.88467884e-03 -2.63444827e-03 1.06193405e-03 -6.29812958e-04\n",
|
|
" 2.83270491e-02 -9.44156308e-02 8.41866696e-02 -3.10937022e-03\n",
|
|
" -1.17730197e-02 -8.54753911e-02 -1.00411922e-01 -2.84427473e-02\n",
|
|
" -2.97280861e-02 -1.21927540e-02]\n",
|
|
" [-4.19346972e-01 2.90168453e-02 2.68885270e-03 3.44514452e-03\n",
|
|
" 2.91369356e-02 1.16711657e-01 1.85699905e-01 5.77209423e-02\n",
|
|
" 5.17276491e-02 1.92137160e-02 1.05998470e-01 1.40864382e-01\n",
|
|
" -8.77215348e-02 1.44869247e-03 3.85723906e-03 4.01767227e-02\n",
|
|
" 7.88928691e-02 8.56309877e-03 9.72156094e-03 6.24114957e-03\n",
|
|
" 4.21861885e-01 -3.27379836e-02 -2.65059616e-02 -1.75632076e-03\n",
|
|
" 5.36022007e-02 3.93362884e-01 5.92901605e-01 1.14283667e-01\n",
|
|
" 1.26297595e-01 5.67738652e-02]\n",
|
|
" [ 7.35141931e-01 -1.77040388e-03 -8.17809788e-02 -1.46297165e-03\n",
|
|
" -4.63391435e-02 -9.44692077e-02 3.35666920e-02 -4.75382102e-02\n",
|
|
" -5.52351985e-02 -1.52406056e-02 -6.42159672e-02 -4.12379900e-03\n",
|
|
" 6.76048261e-02 -1.15869620e-03 -4.32409361e-03 3.13125859e-02\n",
|
|
" 1.25506514e-01 1.43712028e-02 -6.51537135e-03 2.13796926e-03\n",
|
|
" -1.68128352e-01 4.46460236e-04 2.63303246e-03 9.19000402e-04\n",
|
|
" -5.93045367e-02 1.24613176e-01 5.97245084e-01 2.27020609e-02\n",
|
|
" -6.98326847e-02 -2.22345680e-04]\n",
|
|
" [ 2.18087182e-01 4.23058843e-03 -2.51180394e-02 -2.00988446e-04\n",
|
|
" 5.25266400e-03 8.77581594e-02 -2.24378306e-01 -4.04419791e-02\n",
|
|
" 1.15113362e-01 9.03812811e-03 -9.66326953e-02 2.27283852e-02\n",
|
|
" 2.63044655e-02 -2.31375241e-04 -2.53645692e-03 3.81654059e-02\n",
|
|
" -1.39282086e-01 -1.20723947e-02 3.81570258e-02 -6.22637688e-04\n",
|
|
" -3.10629219e-02 -4.46771810e-03 -3.35229868e-03 2.87275851e-04\n",
|
|
" 7.34213381e-03 7.40034844e-01 -3.35405153e-01 3.29427129e-03\n",
|
|
" 4.24745510e-01 6.41650534e-02]\n",
|
|
" [ 8.10260113e-02 1.98471260e-03 -5.22865768e-03 -3.38365236e-04\n",
|
|
" 3.61161370e-02 -2.54306035e-02 1.28761314e-01 5.48374336e-02\n",
|
|
" 3.26359052e-01 -2.39053134e-03 1.46435306e-01 -9.85458035e-03\n",
|
|
" -1.39810594e-02 -2.25744302e-04 2.03874389e-04 -4.61430983e-02\n",
|
|
" 5.09256883e-02 3.76274691e-03 8.42645063e-02 -5.15315786e-03\n",
|
|
" -6.31338895e-02 -1.61770544e-04 4.07595409e-03 2.14122499e-04\n",
|
|
" 2.83201064e-02 -4.15638333e-01 9.88608381e-02 4.73907070e-02\n",
|
|
" 7.99896166e-01 -4.83413296e-02]\n",
|
|
" [-1.37865559e-01 7.07543943e-03 1.34434455e-02 1.89595169e-04\n",
|
|
" -5.37159784e-02 -9.61467980e-02 -1.06775624e-01 -7.50229010e-02\n",
|
|
" -5.24186368e-02 -1.31995161e-02 -9.24496275e-01 3.78188038e-02\n",
|
|
" 8.57009464e-02 1.78223809e-03 -1.57517228e-02 -4.90555493e-02\n",
|
|
" -5.88786230e-02 -1.64236829e-02 1.19559471e-02 -1.34453974e-02\n",
|
|
" 1.22855427e-01 -7.46425242e-03 -1.02334959e-02 -2.62085387e-04\n",
|
|
" -6.01702395e-02 -1.33017878e-01 1.29230578e-01 -2.80515569e-02\n",
|
|
" 1.55327182e-01 -1.76897423e-02]\n",
|
|
" [-1.41957144e-01 -3.71772553e-03 2.06841238e-02 7.16236305e-05\n",
|
|
" -2.44151203e-01 -1.73132652e-01 -1.63916071e-01 -3.11870436e-01\n",
|
|
" -9.20311012e-02 -3.65397500e-02 1.75147665e-01 -4.74183050e-03\n",
|
|
" -1.64085616e-02 -3.94123160e-04 -2.72335855e-02 7.14537635e-02\n",
|
|
" 9.66647543e-02 -6.30245504e-02 4.42032739e-02 6.62754743e-03\n",
|
|
" 1.47186018e-02 2.33182486e-03 -6.73566380e-04 -4.67346730e-05\n",
|
|
" -3.91496947e-01 3.33214534e-02 1.05075524e-01 -7.21728014e-01\n",
|
|
" 1.26502755e-01 -3.48926382e-02]\n",
|
|
" [ 4.42129324e-02 -1.74411881e-03 -1.08282412e-02 1.33246661e-04\n",
|
|
" -1.30030608e-01 1.80413129e-01 4.32652559e-01 3.88939443e-02\n",
|
|
" 2.16862037e-02 3.76425779e-02 -1.64246809e-01 -9.58819754e-03\n",
|
|
" 2.53430874e-03 6.73342112e-04 5.32163112e-03 3.31399676e-01\n",
|
|
" 6.76877071e-01 1.00527901e-01 4.05424525e-02 5.35339202e-02\n",
|
|
" 2.25143677e-02 1.93164766e-03 5.41453437e-04 -1.30917707e-04\n",
|
|
" -2.60189514e-01 6.24843150e-02 -2.69804020e-01 5.43397196e-02\n",
|
|
" 2.27045336e-03 1.74914413e-02]\n",
|
|
" [ 8.97292328e-02 -1.41458884e-04 -1.37775702e-02 1.13279338e-06\n",
|
|
" 3.06212225e-01 2.87099573e-01 1.99451087e-01 -1.38388516e-02\n",
|
|
" 4.38304941e-01 8.09957805e-02 -1.46255624e-01 -6.12757296e-03\n",
|
|
" 1.47257226e-02 1.80666960e-04 3.78861546e-02 -1.84806700e-02\n",
|
|
" 2.13236057e-02 -5.36608155e-02 -5.41504310e-02 8.38679702e-03\n",
|
|
" 5.92277426e-03 4.86674310e-04 6.13605812e-04 -4.23143566e-05\n",
|
|
" 4.12030401e-01 4.05638129e-03 1.16402570e-02 -5.80515606e-01\n",
|
|
" -1.65287246e-01 1.13530565e-01]\n",
|
|
" [-2.10057742e-02 -1.24960485e-03 6.16356938e-04 1.20299753e-04\n",
|
|
" -1.97107021e-01 -5.53153375e-02 -4.20063482e-02 4.44765528e-02\n",
|
|
" 7.63628284e-01 -4.40913636e-02 -2.28731804e-02 -2.36191575e-03\n",
|
|
" -1.30277683e-03 1.42354992e-04 -5.34210340e-02 -5.56807262e-02\n",
|
|
" -1.71590650e-01 -1.70880697e-03 -1.23077156e-01 -1.45669104e-02\n",
|
|
" 1.31009681e-02 1.01735246e-03 -5.69726713e-04 -6.08533609e-05\n",
|
|
" -4.69474770e-01 2.95753422e-02 2.70708049e-02 1.57973616e-01\n",
|
|
" -2.57505985e-01 -6.39356851e-02]\n",
|
|
" [-8.01074429e-02 2.12853660e-04 1.09397982e-02 1.22408810e-04\n",
|
|
" 6.49458801e-02 4.29470256e-02 -6.67235914e-01 -3.28925155e-01\n",
|
|
" 2.22382756e-01 2.33171053e-02 4.39284077e-02 3.20684477e-03\n",
|
|
" -9.11272542e-03 7.44106237e-05 3.77952551e-02 2.25649739e-01\n",
|
|
" 4.43120250e-01 1.11669448e-01 -1.24646702e-02 3.87988752e-02\n",
|
|
" 4.69179080e-03 -1.86035982e-04 3.17772987e-04 -4.94152270e-05\n",
|
|
" 2.49151067e-01 -9.92181813e-02 7.59212067e-02 1.88195407e-01\n",
|
|
" -5.35602114e-02 5.62031123e-02]\n",
|
|
" [ 5.94747777e-02 -5.08486619e-04 -1.00150532e-02 6.72130539e-05\n",
|
|
" 5.05875321e-02 7.86476447e-01 -2.07298286e-01 1.95295923e-02\n",
|
|
" -1.59737119e-01 1.76680619e-01 1.13499143e-02 -2.52359539e-04\n",
|
|
" -4.16204243e-03 1.50857164e-04 -6.61066896e-02 -4.07953355e-02\n",
|
|
" -1.51525068e-01 -2.36680658e-02 4.01205397e-02 2.34074017e-02\n",
|
|
" -6.14078602e-03 6.80390514e-04 1.23096656e-03 -2.07826845e-05\n",
|
|
" -3.71434661e-01 -1.75980615e-01 9.48178123e-02 1.97712543e-02\n",
|
|
" 3.91032580e-02 2.51854689e-01]\n",
|
|
" [-8.72363409e-03 3.25522689e-04 3.17936985e-03 -8.61054786e-05\n",
|
|
" -1.23800316e-01 -3.07516380e-01 6.74748546e-02 -9.94859922e-03\n",
|
|
" 2.28284660e-02 2.04533392e-01 1.42505916e-02 1.32642700e-03\n",
|
|
" 2.11743590e-04 -4.80315787e-05 -1.43604236e-02 -5.31878430e-02\n",
|
|
" 2.33095914e-03 -2.06071899e-02 -1.17416102e-01 1.12889769e-01\n",
|
|
" -6.81318531e-03 -3.14021728e-04 2.85261632e-04 2.78114736e-05\n",
|
|
" 7.74928695e-03 -4.02655019e-02 -3.33052794e-02 3.67880307e-02\n",
|
|
" 1.97580346e-02 8.99017368e-01]\n",
|
|
" [-4.57847381e-03 5.70803677e-04 1.25149830e-03 -8.95193307e-06\n",
|
|
" 5.76678236e-02 6.81084244e-02 9.75431969e-02 -3.70822357e-01\n",
|
|
" -2.75978831e-02 1.20861421e-02 1.66061175e-02 4.48926295e-03\n",
|
|
" 1.25850374e-03 -1.36165607e-04 -1.18579493e-01 -8.02419200e-01\n",
|
|
" 3.20221001e-01 -3.65853434e-02 -2.26926628e-01 -7.70139814e-02\n",
|
|
" -1.97252553e-03 -5.94055091e-04 -3.40223822e-04 2.29399965e-05\n",
|
|
" -4.07567556e-02 8.45421998e-02 -6.52663643e-02 8.91754582e-02\n",
|
|
" 1.96427496e-02 -5.59332520e-02]\n",
|
|
" [ 2.82894830e-02 7.33059897e-05 -3.58436087e-03 -2.64959224e-05\n",
|
|
" -6.84974204e-01 2.47366571e-01 2.21208940e-01 -4.61447929e-01\n",
|
|
" 2.89096396e-02 -1.34513145e-02 8.00696324e-03 1.98785626e-03\n",
|
|
" 4.17372804e-04 -3.67867124e-05 1.16955811e-01 9.20949353e-02\n",
|
|
" -2.32860819e-01 -2.37982721e-02 6.40992514e-03 -1.79257890e-04\n",
|
|
" -3.65597275e-03 -3.28916831e-04 7.46498022e-05 1.52392975e-05\n",
|
|
" 3.38845354e-01 -4.71308735e-02 -1.18303714e-02 1.03567515e-01\n",
|
|
" -5.07940555e-03 -3.03265921e-02]\n",
|
|
" [ 3.59617411e-03 4.32289948e-04 -3.07763932e-04 -1.83005653e-05\n",
|
|
" -4.70763997e-01 1.12910548e-01 -2.46083834e-01 6.06124800e-01\n",
|
|
" -3.79117865e-02 -6.37956993e-02 -5.48812336e-03 3.36786194e-03\n",
|
|
" 9.65358515e-04 4.85535218e-05 1.91129278e-02 -1.51776701e-01\n",
|
|
" 2.04174155e-01 -5.12935751e-03 -4.44877141e-01 -2.90928879e-02\n",
|
|
" -1.57275838e-03 -5.10381910e-04 -3.00466192e-05 6.32348960e-06\n",
|
|
" 1.84440788e-01 9.05167514e-04 2.49243203e-02 -1.66799320e-01\n",
|
|
" 6.17567463e-02 -4.43746763e-02]\n",
|
|
" [-1.60336173e-03 -6.85637302e-04 1.33993328e-04 -6.11786890e-06\n",
|
|
" -2.33739581e-01 5.64508610e-04 -1.07297993e-01 2.34238246e-01\n",
|
|
" 9.10728702e-02 2.70861187e-02 -1.02803829e-02 -4.05065873e-03\n",
|
|
" -1.22510674e-03 9.14651057e-05 1.06982587e-01 -3.42920796e-01\n",
|
|
" 1.51034303e-01 -3.96060033e-03 8.30872058e-01 -1.33614026e-02\n",
|
|
" 1.13210755e-03 6.60844087e-04 1.44679382e-04 -9.13824977e-06\n",
|
|
" 7.59773478e-02 3.31512433e-02 1.62159306e-03 -4.32253502e-02\n",
|
|
" -1.19316598e-01 6.67516068e-02]\n",
|
|
" [-2.79341068e-03 -2.03286434e-04 -1.48499782e-04 2.79755999e-05\n",
|
|
" 4.23689945e-02 4.47732652e-02 2.87280747e-02 -2.51468694e-02\n",
|
|
" -1.73524458e-02 -3.36239624e-01 -3.54107942e-03 -3.89325162e-04\n",
|
|
" -1.36868856e-03 5.15075948e-05 3.23321169e-01 -1.08304964e-01\n",
|
|
" -9.17739425e-02 8.54802094e-01 -3.62928822e-02 -2.00238568e-02\n",
|
|
" 1.77504179e-03 1.50254292e-04 9.94451416e-05 -1.32750118e-05\n",
|
|
" -5.98909049e-02 -1.10335473e-03 6.00223990e-03 -8.35480473e-02\n",
|
|
" 1.56674416e-02 1.15823522e-01]\n",
|
|
" [-3.25869730e-03 -1.08812487e-04 5.92481499e-04 -2.69334809e-06\n",
|
|
" -4.43105873e-02 -1.08470081e-01 -7.89825836e-03 1.74453235e-02\n",
|
|
" -1.30715896e-03 8.88814613e-01 2.08214818e-03 -4.16851133e-04\n",
|
|
" -1.30535120e-03 4.73766465e-05 1.79728739e-01 -3.75730401e-02\n",
|
|
" -4.78504389e-02 3.07352340e-01 -5.57502875e-02 2.40473411e-02\n",
|
|
" 8.98500542e-04 1.21159573e-04 3.92446496e-05 -7.82704450e-06\n",
|
|
" -1.63749618e-02 2.53473356e-02 8.73144088e-03 -3.09344589e-02\n",
|
|
" 6.96062043e-03 -2.44427991e-01]\n",
|
|
" [-5.12865809e-04 -1.28702530e-04 2.82547547e-04 -1.30371539e-05\n",
|
|
" 9.43925292e-02 6.81916699e-03 -1.44289058e-02 -1.73481243e-02\n",
|
|
" -8.82902913e-03 -4.74686275e-02 -3.63257185e-03 -4.72658847e-04\n",
|
|
" 1.19657801e-03 -2.03423347e-05 9.01248743e-01 -4.34291557e-02\n",
|
|
" 4.96009338e-02 -3.76480981e-01 -9.78356412e-02 -7.75315348e-03\n",
|
|
" 5.15472696e-05 1.07829473e-04 -1.15443883e-04 5.20580200e-06\n",
|
|
" -1.32092425e-01 8.29891881e-05 -1.08953390e-03 4.79998985e-02\n",
|
|
" 1.28640353e-02 1.74174206e-02]\n",
|
|
" [ 6.48447162e-04 4.67664637e-06 -1.53201140e-04 1.78631233e-06\n",
|
|
" 2.95544130e-03 1.31734821e-02 -4.01346519e-03 4.22724702e-03\n",
|
|
" -2.19542355e-03 -6.13701892e-02 -6.51778391e-03 -8.77127241e-05\n",
|
|
" 7.98704359e-04 7.20344351e-06 2.31482576e-03 -9.55572740e-02\n",
|
|
" -2.29448848e-02 -2.54907674e-03 -1.00353743e-02 9.86791087e-01\n",
|
|
" 4.51134631e-04 1.35174619e-06 -6.58155925e-05 -2.32842029e-07\n",
|
|
" 5.02792306e-03 9.24822560e-03 4.92968092e-03 -2.98937509e-03\n",
|
|
" 3.40003739e-03 -1.10843881e-01]]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"pca = PCA()\n",
|
|
"X_PCA = pca.fit_transform(X)\n",
|
|
"\n",
|
|
"#print(pca.explained_variance_)\n",
|
|
"print(pca.explained_variance_ratio_)\n",
|
|
"print(pca.components_)"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**5) Create a barplot of the proportion of total variance explained by the first 5 components. What do you observe ?**\n",
|
|
"\n",
|
|
"**Generate also a barplot of the cumulative ratio of the total variance explained by the first 5 components.**\n",
|
|
"\n",
|
|
"**Also, what is the proportion of the total variance explained by all the components ?**"
|
|
],
|
|
"metadata": {
|
|
"id": "gyfp2zIeze8E",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 32,
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": "<Figure size 432x288 with 1 Axes>",
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAMiElEQVR4nO3cf6jd913H8edrSeuk61YwVylJulswE8MQWy5xUNGim6TtSARFGqi/KMs/q1Q6lAylzvrPpjBFjD+iK2NTF+s2JdhoHC4yJmuXm/WHS2LkEqtJHCTrarUMrdW3f9xTObu9N+ckOTmneef5gEvP9/v99Jz3958n33zPj1QVkqSr3xtmPYAkaTIMuiQ1YdAlqQmDLklNGHRJamL9rF54w4YNNT8/P6uXl6Sr0tGjR79aVXOrHZtZ0Ofn51lcXJzVy0vSVSnJP691zFsuktSEQZekJkYGPcmjSc4l+fIax5PkN5MsJXk2ye2TH1OSNMo4V+gfBbZf4PhdwJbB327gdy5/LEnSxRoZ9Kr6HPC1CyzZCXyslj0B3JTk5kkNKEkazyTuoW8ETg9tnxnse40ku5MsJlk8f/78BF5akvSqqb4pWlX7qmqhqhbm5lb9GKUk6RJNIuhngc1D25sG+yRJUzSJoB8AfmLwaZd3AC9W1Vcm8LySpIsw8puiST4B3AlsSHIG+CXgOoCq+l3gIHA3sAR8HfjpKzXsq+b3PH6lX2JqnvvgPbMeQVITI4NeVbtGHC/gvRObSJJ0SfymqCQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDUxVtCTbE9yMslSkj2rHL8lyeEkTyV5Nsndkx9VknQhI4OeZB2wF7gL2ArsSrJ1xbJfBB6rqtuAe4HfnvSgkqQLG+cKfRuwVFWnquplYD+wc8WaAt48ePwW4F8nN6IkaRzjBH0jcHpo+8xg37APAPclOQMcBH5mtSdKsjvJYpLF8+fPX8K4kqS1TOpN0V3AR6tqE3A38PEkr3nuqtpXVQtVtTA3Nzehl5YkwXhBPwtsHtreNNg37H7gMYCq+gLwRmDDJAaUJI1nnKAfAbYkuTXJ9Sy/6XlgxZp/AX4QIMl3shx076lI0hSNDHpVvQI8ABwCTrD8aZZjSR5JsmOw7H3Ae5I8A3wC+Kmqqis1tCTptdaPs6iqDrL8ZufwvoeHHh8H7pjsaJKki+E3RSWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJamJsYKeZHuSk0mWkuxZY82PJTme5FiSP57smJKkUdaPWpBkHbAXeBdwBjiS5EBVHR9aswV4P3BHVb2Q5Fuv1MCSpNWNc4W+DViqqlNV9TKwH9i5Ys17gL1V9QJAVZ2b7JiSpFHGCfpG4PTQ9pnBvmFvA96W5O+SPJFk+2pPlGR3ksUki+fPn7+0iSVJq5rUm6LrgS3AncAu4PeT3LRyUVXtq6qFqlqYm5ub0EtLkmC8oJ8FNg9tbxrsG3YGOFBV/11V/wT8I8uBlyRNyThBPwJsSXJrkuuBe4EDK9b8OctX5yTZwPItmFOTG1OSNMrIoFfVK8ADwCHgBPBYVR1L8kiSHYNlh4DnkxwHDgM/V1XPX6mhJUmvNfJjiwBVdRA4uGLfw0OPC3ho8CdJmgG/KSpJTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNjBX0JNuTnEyylGTPBdb9SJJKsjC5ESVJ4xgZ9CTrgL3AXcBWYFeSrausuxF4EHhy0kNKkkYb5wp9G7BUVaeq6mVgP7BzlXW/AnwI+M8JzidJGtM4Qd8InB7aPjPY9/+S3A5srqrHL/RESXYnWUyyeP78+YseVpK0tst+UzTJG4APA+8btbaq9lXVQlUtzM3NXe5LS5KGjBP0s8Dmoe1Ng32vuhF4O/C3SZ4D3gEc8I1RSZqucYJ+BNiS5NYk1wP3AgdePVhVL1bVhqqar6p54AlgR1UtXpGJJUmrGhn0qnoFeAA4BJwAHquqY0keSbLjSg8oSRrP+nEWVdVB4OCKfQ+vsfbOyx9LknSx/KaoJDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNTFW0JNsT3IyyVKSPascfyjJ8STPJvmbJG+d/KiSpAsZGfQk64C9wF3AVmBXkq0rlj0FLFTVdwGfBH510oNKki5snCv0bcBSVZ2qqpeB/cDO4QVVdbiqvj7YfALYNNkxJUmjjBP0jcDpoe0zg31ruR/4y8sZSpJ08dZP8smS3AcsAN+/xvHdwG6AW265ZZIvLUnXvHGu0M8Cm4e2Nw32fYMk7wR+AdhRVf+12hNV1b6qWqiqhbm5uUuZV5K0hnGCfgTYkuTWJNcD9wIHhhckuQ34PZZjfm7yY0qSRhkZ9Kp6BXgAOAScAB6rqmNJHkmyY7Ds14A3AX+a5OkkB9Z4OknSFTLWPfSqOggcXLHv4aHH75zwXJKki+Q3RSWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJamJ9bMeQBdvfs/jsx5hIp774D2zHkFqZawr9CTbk5xMspRkzyrHvynJnwyOP5lkfuKTSpIuaGTQk6wD9gJ3AVuBXUm2rlh2P/BCVX078OvAhyY9qCTpwsa55bINWKqqUwBJ9gM7geNDa3YCHxg8/iTwW0lSVTXBWaU2t5vAW06avHGCvhE4PbR9BvietdZU1StJXgS+Bfjq8KIku4Hdg82Xkpy8lKGnaAMrzmHS8vr9t4znfoVd6+f/OnU1nPtb1zow1TdFq2ofsG+ar3k5kixW1cKs55gFz/3aPHe4ts//aj/3cd4UPQtsHtreNNi36pok64G3AM9PYkBJ0njGCfoRYEuSW5NcD9wLHFix5gDwk4PHPwp81vvnkjRdI2+5DO6JPwAcAtYBj1bVsSSPAItVdQD4CPDxJEvA11iOfgdXze2hK8Bzv3Zdy+d/VZ97vJCWpB786r8kNWHQJakJg76KJI8mOZfky7OeZdqSbE5yOMnxJMeSPDjrmaYlyRuTfDHJM4Nz/+VZzzRtSdYleSrJX8x6lmlL8lySv0/ydJLFWc9zKbyHvook3we8BHysqt4+63mmKcnNwM1V9aUkNwJHgR+uquMj/terXpIAN1TVS0muAz4PPFhVT8x4tKlJ8hCwALy5qt4963mmKclzwEJVvd6/WLQmr9BXUVWfY/nTOtecqvpKVX1p8Pg/gBMsfxO4vVr20mDzusHfNXPFk2QTcA/wB7OeRZfGoGtNg1/NvA14csajTM3glsPTwDngM1V1zZw78BvAzwP/O+M5ZqWAv05ydPAzJVcdg65VJXkT8CngZ6vq32c9z7RU1f9U1Xez/I3obUmuiVtuSd4NnKuqo7OeZYa+t6puZ/mXZd87uPV6VTHoeo3B/eNPAX9UVZ+e9TyzUFX/BhwGts94lGm5A9gxuI+8H/iBJH8425Gmq6rODv57Dvgzln9p9qpi0PUNBm8MfgQ4UVUfnvU805RkLslNg8ffDLwL+IeZDjUlVfX+qtpUVfMsf9P7s1V134zHmpokNww+BECSG4AfAq66T7kZ9FUk+QTwBeA7kpxJcv+sZ5qiO4AfZ/kK7enB392zHmpKbgYOJ3mW5d8w+kxVXXMf37tGfRvw+STPAF8EHq+qv5rxTBfNjy1KUhNeoUtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklN/B/cG+Fv2wnaEQAAAABJRU5ErkJggg==\n"
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": "<Figure size 432x288 with 1 Axes>",
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAMjUlEQVR4nO3cf6jd913H8edrSeuk67Y/cpWSH0vBbBim2HLJhIoWt0najURwSAMdKnX5Z5FKh5KhVK3/OAdThPgjuDI3tbFuTi42GoeLlMnS5Wb94ZIYucZqEwdJu25ahtbq2z/umZze3OScpCf3cN/3+YBLz/f7/fSc9/efJ998z49UFZKk1e910x5AkjQZBl2SmjDoktSEQZekJgy6JDWxflovvGHDhtq6deu0Xl6SVqUTJ048X1Uzyx2bWtC3bt3K/Pz8tF5eklalJP9yuWPecpGkJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMjg57k4SQXknzlMseT5LeSLCR5Jsntkx9TkjTKOFfonwB2XuH4XcC2wd9e4Hde+1iSpKs1MuhV9TjwtSss2Q18shYdA96c5JZJDShJGs8kvim6EXhuaPvcYN9Xly5MspfFq3i2bNkygZdem7buf2zaI0zEs7/2nqv+f7qcO6zt81/L5w7Xdv7jWNE3RavqYFXNVtXszMyyP0UgSbpGkwj6eWDz0PamwT5J0gqaxC2XOWBfkkPAO4BvVNUlt1smyX96SdKlRgY9ySPAncCGJOeAXwJuAKiq3wUOA3cDC8A3gZ+6XsNKki5vZNCras+I4wV8cGITSZKuid8UlaQmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUxFhBT7IzyZkkC0n2L3N8S5KjSZ5M8kySuyc/qiTpSkYGPck64ABwF7Ad2JNk+5Jlvwg8WlW3AfcAvz3pQSVJVzbOFfoOYKGqzlbVy8AhYPeSNQW8cfD4TcC/TW5ESdI4xgn6RuC5oe1zg33Dfhm4N8k54DDwM8s9UZK9SeaTzF+8ePEaxpUkXc6k3hTdA3yiqjYBdwOfSnLJc1fVwaqararZmZmZCb20JAnGC/p5YPPQ9qbBvmH3AY8CVNUXgdcDGyYxoCRpPOME/TiwLcmtSW5k8U3PuSVr/hV4J0CS72Yx6N5TkaQVNDLoVfUKsA84Apxm8dMsJ5M8lGTXYNmHgA8keRp4BPjJqqrrNbQk6VLrx1lUVYdZfLNzeN+DQ49PAXdMdjRJ0tXwm6KS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpoYK+hJdiY5k2Qhyf7LrPnxJKeSnEzyx5MdU5I0yvpRC5KsAw4A7wbOAceTzFXVqaE124APA3dU1YtJvuN6DSxJWt44V+g7gIWqOltVLwOHgN1L1nwAOFBVLwJU1YXJjilJGmWcoG8EnhvaPjfYN+ytwFuT/F2SY0l2TmpASdJ4Rt5yuYrn2QbcCWwCHk/yPVX19eFFSfYCewG2bNkyoZeWJMF4V+jngc1D25sG+4adA+aq6r+r6p+Bf2Qx8K9SVQeraraqZmdmZq51ZknSMsYJ+nFgW5Jbk9wI3APMLVnz5yxenZNkA4u3YM5ObkxJ0igjg15VrwD7gCPAaeDRqjqZ5KEkuwbLjgAvJDkFHAV+rqpeuF5DS5IuNdY99Ko6DBxesu/BoccFPDD4kyRNgd8UlaQmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCbGCnqSnUnOJFlIsv8K634sSSWZndyIkqRxjAx6knXAAeAuYDuwJ8n2ZdbdDNwPPDHpISVJo41zhb4DWKiqs1X1MnAI2L3Mul8FPgL85wTnkySNaZygbwSeG9o+N9j3/5LcDmyuqseu9ERJ9iaZTzJ/8eLFqx5WknR5r/lN0SSvAz4GfGjU2qo6WFWzVTU7MzPzWl9akjRknKCfBzYPbW8a7PuWm4G3A3+b5Fng+4E53xiVpJU1TtCPA9uS3JrkRuAeYO5bB6vqG1W1oaq2VtVW4Biwq6rmr8vEkqRljQx6Vb0C7AOOAKeBR6vqZJKHkuy63gNKksazfpxFVXUYOLxk34OXWXvnax9LknS1/KaoJDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6Qmxgp6kp1JziRZSLJ/meMPJDmV5Jkkf5PkLZMfVZJ0JSODnmQdcAC4C9gO7EmyfcmyJ4HZqvpe4NPAr096UEnSlY1zhb4DWKiqs1X1MnAI2D28oKqOVtU3B5vHgE2THVOSNMo4Qd8IPDe0fW6w73LuA/5yuQNJ9iaZTzJ/8eLF8aeUJI000TdFk9wLzAIfXe54VR2sqtmqmp2ZmZnkS0vSmrd+jDXngc1D25sG+14lybuAXwB+qKr+azLjSZLGNc4V+nFgW5Jbk9wI3APMDS9Ichvwe8Cuqrow+TElSaOMDHpVvQLsA44Ap4FHq+pkkoeS7Bos+yjwBuBPkzyVZO4yTydJuk7GueVCVR0GDi/Z9+DQ43dNeC5J0lXym6KS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUhEGXpCYMuiQ1YdAlqQmDLklNGHRJasKgS1ITBl2SmjDoktSEQZekJgy6JDVh0CWpCYMuSU0YdElqwqBLUhMGXZKaMOiS1IRBl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpow6JLUxFhBT7IzyZkkC0n2L3P825L8yeD4E0m2TnxSSdIVjQx6knXAAeAuYDuwJ8n2JcvuA16squ8CfgP4yKQHlSRd2ThX6DuAhao6W1UvA4eA3UvW7Ab+YPD408A7k2RyY0qSRklVXXlB8j5gZ1X99GD7/cA7qmrf0JqvDNacG2z/02DN80ueay+wd7D5NuDMpE7kOtkAPD9yVU+e+9q1ls9/NZz7W6pqZrkD61dyiqo6CBxcydd8LZLMV9XstOeYBs99bZ47rO3zX+3nPs4tl/PA5qHtTYN9y65Jsh54E/DCJAaUJI1nnKAfB7YluTXJjcA9wNySNXPATwwevw/4fI26lyNJmqiRt1yq6pUk+4AjwDrg4ao6meQhYL6q5oCPA59KsgB8jcXod7Bqbg9dB5772rWWz39Vn/vIN0UlSauD3xSVpCYMuiQ1YdCXkeThJBcGn69fU5JsTnI0yakkJ5PcP+2ZVkqS1yf5UpKnB+f+K9OeaaUlWZfkySR/Me1ZVlqSZ5P8fZKnksxPe55r4T30ZST5QeAl4JNV9fZpz7OSktwC3FJVX05yM3AC+NGqOjXl0a67wbebb6qql5LcAHwBuL+qjk15tBWT5AFgFnhjVb132vOspCTPArNLvxC5mniFvoyqepzFT+usOVX11ar68uDxfwCngY3TnWpl1KKXBps3DP7WzBVPkk3Ae4Dfn/YsujYGXZc1+NXM24AnpjzKihnccngKuAB8rqrWzLkDvwn8PPC/U55jWgr46yQnBj9TsuoYdC0ryRuAzwA/W1X/Pu15VkpV/U9VfR+L34jekWRN3HJL8l7gQlWdmPYsU/QDVXU7i78s+8HBrddVxaDrEoP7x58B/qiq/mza80xDVX0dOArsnPIoK+UOYNfgPvIh4IeT/OF0R1pZVXV+8N8LwGdZ/KXZVcWg61UGbwx+HDhdVR+b9jwrKclMkjcPHn878G7gH6Y61Aqpqg9X1aaq2sriN70/X1X3TnmsFZPkpsGHAEhyE/AjwKr7lJtBX0aSR4AvAm9Lci7JfdOeaQXdAbyfxSu0pwZ/d097qBVyC3A0yTMs/obR56pqzX18b436TuALSZ4GvgQ8VlV/NeWZrpofW5SkJrxCl6QmDLokNWHQJakJgy5JTRh0SWrCoEtSEwZdkpr4P+aN9SgSpmNRAAAAAElFTkSuQmCC\n"
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"plt.bar([1,2,3,4,5],pca.explained_variance_ratio_[:5])\n",
|
|
"plt.show()\n",
|
|
"plt.bar([1,2,3,4,5],pca.explained_variance_ratio_[:5].cumsum())\n",
|
|
"plt.show()\n",
|
|
"\n",
|
|
"\n"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**6) Generate a biplot of the component's scores in the space spanned by the first two components. Color the points depending on their target label ('M' or 'B'). Do you notice anything ?**"
|
|
],
|
|
"metadata": {
|
|
"id": "VaE6VhCf1MPK",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 40,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" radius_mean texture_mean perimeter_mean area_mean \\\n",
|
|
"0 20870.964901 -3050.864103 5965.427229 -8720.687283 \n",
|
|
"1 23864.132741 -5222.914750 6456.069045 -11552.079258 \n",
|
|
"2 22843.207276 -6245.747802 6315.191692 -10480.506295 \n",
|
|
"3 13248.828192 -5990.039539 3768.712088 -3363.693666 \n",
|
|
"4 23539.292820 -4214.777576 6562.941520 -11299.431974 \n",
|
|
".. ... ... ... ... \n",
|
|
"564 25012.673889 -6580.813802 6898.132463 -12885.011480 \n",
|
|
"565 23353.670009 -8303.170608 6373.485769 -10985.800863 \n",
|
|
"566 19258.366723 -8253.204625 5261.040463 -7475.746011 \n",
|
|
"567 23898.937018 -8620.601555 6805.833508 -11020.648764 \n",
|
|
"568 9002.706372 -7212.736521 2327.876814 -1576.867531 \n",
|
|
"\n",
|
|
" smoothness_mean compactness_mean concavity_mean concave points_mean \\\n",
|
|
"0 3.788858 0.351279 0.279494 0.021795 \n",
|
|
"1 2.711721 0.099512 0.080933 0.010397 \n",
|
|
"2 3.507253 0.202340 0.183846 0.018951 \n",
|
|
"3 4.560069 0.359251 0.224825 0.015587 \n",
|
|
"4 3.209649 0.168047 0.184405 0.015454 \n",
|
|
".. ... ... ... ... \n",
|
|
"564 3.552054 0.146662 0.227153 0.020580 \n",
|
|
"565 3.129648 0.130844 0.134113 0.014507 \n",
|
|
"566 2.705641 0.129452 0.086158 0.007856 \n",
|
|
"567 3.769657 0.350520 0.327272 0.022521 \n",
|
|
"568 1.684186 0.055197 0.000000 0.000000 \n",
|
|
"\n",
|
|
" symmetry_mean fractal_dimension_mean ... radius_worst texture_worst \\\n",
|
|
"0 0.180328 0.046388 ... 0.537772 0.004172 \n",
|
|
"1 0.135078 0.033399 ... 0.529508 0.005636 \n",
|
|
"2 0.154236 0.035356 ... 0.499420 0.006146 \n",
|
|
"3 0.193597 0.057427 ... 0.315925 0.006380 \n",
|
|
"4 0.134854 0.034672 ... 0.477595 0.004013 \n",
|
|
".. ... ... ... ... ... \n",
|
|
"564 0.128667 0.033140 ... 0.539255 0.006356 \n",
|
|
"565 0.130605 0.032609 ... 0.501963 0.009209 \n",
|
|
"566 0.118529 0.033287 ... 0.402163 0.008215 \n",
|
|
"567 0.178688 0.041349 ... 0.545400 0.009491 \n",
|
|
"568 0.118305 0.034678 ... 0.200361 0.007312 \n",
|
|
"\n",
|
|
" perimeter_worst area_worst smoothness_worst compactness_worst \\\n",
|
|
"0 0.466728 23.339693 0.000936 0.000917 \n",
|
|
"1 0.401497 22.611412 0.000715 0.000257 \n",
|
|
"2 0.385569 19.756085 0.000834 0.000585 \n",
|
|
"3 0.249975 6.562627 0.001211 0.001193 \n",
|
|
"4 0.384811 18.207042 0.000793 0.000282 \n",
|
|
".. ... ... ... ... \n",
|
|
"564 0.419954 23.432173 0.000814 0.000291 \n",
|
|
"565 0.391890 20.010406 0.000673 0.000265 \n",
|
|
"566 0.320338 12.993470 0.000658 0.000426 \n",
|
|
"567 0.466728 21.050808 0.000953 0.001195 \n",
|
|
"568 0.149575 3.105023 0.000519 0.000089 \n",
|
|
"\n",
|
|
" concavity_worst concave points_worst symmetry_worst \\\n",
|
|
"0 -0.001411 0.000343 0.000915 \n",
|
|
"1 -0.000479 0.000241 0.000547 \n",
|
|
"2 -0.000893 0.000314 0.000719 \n",
|
|
"3 -0.001361 0.000333 0.001320 \n",
|
|
"4 -0.000793 0.000210 0.000470 \n",
|
|
".. ... ... ... \n",
|
|
"564 -0.000814 0.000287 0.000410 \n",
|
|
"565 -0.000637 0.000211 0.000512 \n",
|
|
"566 -0.000674 0.000183 0.000441 \n",
|
|
"567 -0.001860 0.000343 0.000813 \n",
|
|
"568 -0.000000 0.000000 0.000571 \n",
|
|
"\n",
|
|
" fractal_dimension_worst \n",
|
|
"0 0.000084 \n",
|
|
"1 0.000063 \n",
|
|
"2 0.000062 \n",
|
|
"3 0.000122 \n",
|
|
"4 0.000054 \n",
|
|
".. ... \n",
|
|
"564 0.000050 \n",
|
|
"565 0.000047 \n",
|
|
"566 0.000055 \n",
|
|
"567 0.000087 \n",
|
|
"568 0.000050 \n",
|
|
"\n",
|
|
"[569 rows x 30 columns]\n"
|
|
]
|
|
},
|
|
{
|
|
"ename": "ValueError",
|
|
"evalue": "'c' argument must be a color, a sequence of colors, or a sequence of numbers, not diagnosis\n0 M\n1 M\n2 M\n3 M\n4 M\n.. ...\n564 M\n565 M\n566 M\n567 M\n568 B\n\n[569 rows x 1 columns]",
|
|
"output_type": "error",
|
|
"traceback": [
|
|
"\u001B[0;31m---------------------------------------------------------------------------\u001B[0m",
|
|
"\u001B[0;31mValueError\u001B[0m Traceback (most recent call last)",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:4221\u001B[0m, in \u001B[0;36mAxes._parse_scatter_color_args\u001B[0;34m(c, edgecolors, kwargs, xsize, get_next_color_func)\u001B[0m\n\u001B[1;32m 4220\u001B[0m \u001B[38;5;28;01mtry\u001B[39;00m: \u001B[38;5;66;03m# Is 'c' acceptable as PathCollection facecolors?\u001B[39;00m\n\u001B[0;32m-> 4221\u001B[0m colors \u001B[38;5;241m=\u001B[39m \u001B[43mmcolors\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mto_rgba_array\u001B[49m\u001B[43m(\u001B[49m\u001B[43mc\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 4222\u001B[0m \u001B[38;5;28;01mexcept\u001B[39;00m (\u001B[38;5;167;01mTypeError\u001B[39;00m, \u001B[38;5;167;01mValueError\u001B[39;00m) \u001B[38;5;28;01mas\u001B[39;00m err:\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/colors.py:377\u001B[0m, in \u001B[0;36mto_rgba_array\u001B[0;34m(c, alpha)\u001B[0m\n\u001B[1;32m 376\u001B[0m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[0;32m--> 377\u001B[0m rgba \u001B[38;5;241m=\u001B[39m np\u001B[38;5;241m.\u001B[39marray([to_rgba(cc) \u001B[38;5;28;01mfor\u001B[39;00m cc \u001B[38;5;129;01min\u001B[39;00m c])\n\u001B[1;32m 379\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m alpha \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;129;01mnot\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/colors.py:377\u001B[0m, in \u001B[0;36m<listcomp>\u001B[0;34m(.0)\u001B[0m\n\u001B[1;32m 376\u001B[0m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[0;32m--> 377\u001B[0m rgba \u001B[38;5;241m=\u001B[39m np\u001B[38;5;241m.\u001B[39marray([\u001B[43mto_rgba\u001B[49m\u001B[43m(\u001B[49m\u001B[43mcc\u001B[49m\u001B[43m)\u001B[49m \u001B[38;5;28;01mfor\u001B[39;00m cc \u001B[38;5;129;01min\u001B[39;00m c])\n\u001B[1;32m 379\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m alpha \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;129;01mnot\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/colors.py:187\u001B[0m, in \u001B[0;36mto_rgba\u001B[0;34m(c, alpha)\u001B[0m\n\u001B[1;32m 186\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m rgba \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m: \u001B[38;5;66;03m# Suppress exception chaining of cache lookup failure.\u001B[39;00m\n\u001B[0;32m--> 187\u001B[0m rgba \u001B[38;5;241m=\u001B[39m \u001B[43m_to_rgba_no_colorcycle\u001B[49m\u001B[43m(\u001B[49m\u001B[43mc\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43malpha\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 188\u001B[0m \u001B[38;5;28;01mtry\u001B[39;00m:\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/colors.py:262\u001B[0m, in \u001B[0;36m_to_rgba_no_colorcycle\u001B[0;34m(c, alpha)\u001B[0m\n\u001B[1;32m 261\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m c, c, c, alpha \u001B[38;5;28;01mif\u001B[39;00m alpha \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;129;01mnot\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m \u001B[38;5;28;01melse\u001B[39;00m \u001B[38;5;241m1.\u001B[39m\n\u001B[0;32m--> 262\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mValueError\u001B[39;00m(\u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mInvalid RGBA argument: \u001B[39m\u001B[38;5;132;01m{\u001B[39;00morig_c\u001B[38;5;132;01m!r}\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n\u001B[1;32m 263\u001B[0m \u001B[38;5;66;03m# turn 2-D array into 1-D array\u001B[39;00m\n",
|
|
"\u001B[0;31mValueError\u001B[0m: Invalid RGBA argument: 'diagnosis'",
|
|
"\nThe above exception was the direct cause of the following exception:\n",
|
|
"\u001B[0;31mValueError\u001B[0m Traceback (most recent call last)",
|
|
"Input \u001B[0;32mIn [40]\u001B[0m, in \u001B[0;36m<module>\u001B[0;34m\u001B[0m\n\u001B[1;32m 2\u001B[0m NY \u001B[38;5;241m=\u001B[39m X_PCA[\u001B[38;5;241m1\u001B[39m]\n\u001B[1;32m 4\u001B[0m \u001B[38;5;28mprint\u001B[39m(NX\u001B[38;5;241m*\u001B[39mX_normal)\n\u001B[0;32m----> 5\u001B[0m \u001B[43mplt\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mscatter\u001B[49m\u001B[43m(\u001B[49m\u001B[43mNX\u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mX_normal\u001B[49m\u001B[43m,\u001B[49m\u001B[43mNY\u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mX_normal\u001B[49m\u001B[43m,\u001B[49m\u001B[43mc\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mY\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mcmap\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m'\u001B[39;49m\u001B[38;5;124;43mYlOrRd\u001B[39;49m\u001B[38;5;124;43m'\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 6\u001B[0m plt\u001B[38;5;241m.\u001B[39mshow()\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/pyplot.py:2807\u001B[0m, in \u001B[0;36mscatter\u001B[0;34m(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, data, **kwargs)\u001B[0m\n\u001B[1;32m 2802\u001B[0m \u001B[38;5;129m@_copy_docstring_and_deprecators\u001B[39m(Axes\u001B[38;5;241m.\u001B[39mscatter)\n\u001B[1;32m 2803\u001B[0m \u001B[38;5;28;01mdef\u001B[39;00m \u001B[38;5;21mscatter\u001B[39m(\n\u001B[1;32m 2804\u001B[0m x, y, s\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, c\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, marker\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, cmap\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, norm\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m,\n\u001B[1;32m 2805\u001B[0m vmin\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, vmax\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, alpha\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, linewidths\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, \u001B[38;5;241m*\u001B[39m,\n\u001B[1;32m 2806\u001B[0m edgecolors\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, plotnonfinite\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mFalse\u001B[39;00m, data\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs):\n\u001B[0;32m-> 2807\u001B[0m __ret \u001B[38;5;241m=\u001B[39m \u001B[43mgca\u001B[49m\u001B[43m(\u001B[49m\u001B[43m)\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mscatter\u001B[49m\u001B[43m(\u001B[49m\n\u001B[1;32m 2808\u001B[0m \u001B[43m \u001B[49m\u001B[43mx\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43my\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43ms\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43ms\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mc\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mc\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mmarker\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mmarker\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mcmap\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mcmap\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mnorm\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mnorm\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 2809\u001B[0m \u001B[43m \u001B[49m\u001B[43mvmin\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mvmin\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mvmax\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mvmax\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43malpha\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43malpha\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mlinewidths\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mlinewidths\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 2810\u001B[0m \u001B[43m \u001B[49m\u001B[43medgecolors\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43medgecolors\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mplotnonfinite\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mplotnonfinite\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 2811\u001B[0m \u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43m(\u001B[49m\u001B[43m{\u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mdata\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m:\u001B[49m\u001B[43m \u001B[49m\u001B[43mdata\u001B[49m\u001B[43m}\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;28;43;01mif\u001B[39;49;00m\u001B[43m \u001B[49m\u001B[43mdata\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;129;43;01mis\u001B[39;49;00m\u001B[43m \u001B[49m\u001B[38;5;129;43;01mnot\u001B[39;49;00m\u001B[43m \u001B[49m\u001B[38;5;28;43;01mNone\u001B[39;49;00m\u001B[43m \u001B[49m\u001B[38;5;28;43;01melse\u001B[39;49;00m\u001B[43m \u001B[49m\u001B[43m{\u001B[49m\u001B[43m}\u001B[49m\u001B[43m)\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 2812\u001B[0m sci(__ret)\n\u001B[1;32m 2813\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m __ret\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/__init__.py:1412\u001B[0m, in \u001B[0;36m_preprocess_data.<locals>.inner\u001B[0;34m(ax, data, *args, **kwargs)\u001B[0m\n\u001B[1;32m 1409\u001B[0m \u001B[38;5;129m@functools\u001B[39m\u001B[38;5;241m.\u001B[39mwraps(func)\n\u001B[1;32m 1410\u001B[0m \u001B[38;5;28;01mdef\u001B[39;00m \u001B[38;5;21minner\u001B[39m(ax, \u001B[38;5;241m*\u001B[39margs, data\u001B[38;5;241m=\u001B[39m\u001B[38;5;28;01mNone\u001B[39;00m, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs):\n\u001B[1;32m 1411\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m data \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[0;32m-> 1412\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mfunc\u001B[49m\u001B[43m(\u001B[49m\u001B[43max\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;28;43mmap\u001B[39;49m\u001B[43m(\u001B[49m\u001B[43msanitize_sequence\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43margs\u001B[49m\u001B[43m)\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 1414\u001B[0m bound \u001B[38;5;241m=\u001B[39m new_sig\u001B[38;5;241m.\u001B[39mbind(ax, \u001B[38;5;241m*\u001B[39margs, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs)\n\u001B[1;32m 1415\u001B[0m auto_label \u001B[38;5;241m=\u001B[39m (bound\u001B[38;5;241m.\u001B[39marguments\u001B[38;5;241m.\u001B[39mget(label_namer)\n\u001B[1;32m 1416\u001B[0m \u001B[38;5;129;01mor\u001B[39;00m bound\u001B[38;5;241m.\u001B[39mkwargs\u001B[38;5;241m.\u001B[39mget(label_namer))\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:4387\u001B[0m, in \u001B[0;36mAxes.scatter\u001B[0;34m(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, **kwargs)\u001B[0m\n\u001B[1;32m 4384\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m edgecolors \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[1;32m 4385\u001B[0m orig_edgecolor \u001B[38;5;241m=\u001B[39m kwargs\u001B[38;5;241m.\u001B[39mget(\u001B[38;5;124m'\u001B[39m\u001B[38;5;124medgecolor\u001B[39m\u001B[38;5;124m'\u001B[39m, \u001B[38;5;28;01mNone\u001B[39;00m)\n\u001B[1;32m 4386\u001B[0m c, colors, edgecolors \u001B[38;5;241m=\u001B[39m \\\n\u001B[0;32m-> 4387\u001B[0m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43m_parse_scatter_color_args\u001B[49m\u001B[43m(\u001B[49m\n\u001B[1;32m 4388\u001B[0m \u001B[43m \u001B[49m\u001B[43mc\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43medgecolors\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mkwargs\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mx\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43msize\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 4389\u001B[0m \u001B[43m \u001B[49m\u001B[43mget_next_color_func\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43m_get_patches_for_fill\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mget_next_color\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 4391\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m plotnonfinite \u001B[38;5;129;01mand\u001B[39;00m colors \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[1;32m 4392\u001B[0m c \u001B[38;5;241m=\u001B[39m np\u001B[38;5;241m.\u001B[39mma\u001B[38;5;241m.\u001B[39mmasked_invalid(c)\n",
|
|
"File \u001B[0;32m~/.local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:4230\u001B[0m, in \u001B[0;36mAxes._parse_scatter_color_args\u001B[0;34m(c, edgecolors, kwargs, xsize, get_next_color_func)\u001B[0m\n\u001B[1;32m 4227\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m invalid_shape_exception(c\u001B[38;5;241m.\u001B[39msize, xsize) \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01merr\u001B[39;00m\n\u001B[1;32m 4228\u001B[0m \u001B[38;5;66;03m# Both the mapping *and* the RGBA conversion failed: pretty\u001B[39;00m\n\u001B[1;32m 4229\u001B[0m \u001B[38;5;66;03m# severe failure => one may appreciate a verbose feedback.\u001B[39;00m\n\u001B[0;32m-> 4230\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mValueError\u001B[39;00m(\n\u001B[1;32m 4231\u001B[0m \u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m'\u001B[39m\u001B[38;5;124mc\u001B[39m\u001B[38;5;124m'\u001B[39m\u001B[38;5;124m argument must be a color, a sequence of colors, \u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 4232\u001B[0m \u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mor a sequence of numbers, not \u001B[39m\u001B[38;5;132;01m{\u001B[39;00mc\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m\"\u001B[39m) \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01merr\u001B[39;00m\n\u001B[1;32m 4233\u001B[0m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[1;32m 4234\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28mlen\u001B[39m(colors) \u001B[38;5;129;01mnot\u001B[39;00m \u001B[38;5;129;01min\u001B[39;00m (\u001B[38;5;241m0\u001B[39m, \u001B[38;5;241m1\u001B[39m, xsize):\n\u001B[1;32m 4235\u001B[0m \u001B[38;5;66;03m# NB: remember that a single color is also acceptable.\u001B[39;00m\n\u001B[1;32m 4236\u001B[0m \u001B[38;5;66;03m# Besides *colors* will be an empty array if c == 'none'.\u001B[39;00m\n",
|
|
"\u001B[0;31mValueError\u001B[0m: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not diagnosis\n0 M\n1 M\n2 M\n3 M\n4 M\n.. ...\n564 M\n565 M\n566 M\n567 M\n568 B\n\n[569 rows x 1 columns]"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": "<Figure size 432x288 with 1 Axes>",
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAANT0lEQVR4nO3cYYjkd33H8ffHO1NpjKb0VpC706T00njYQtIlTRFqirZc8uDugUXuIFgleGAbKVWEFEuU+MiGWhCu1ZOKVdAYfSALntwDjQTEC7chNXgXItvTeheFrDHNk6Ax7bcPZtKdrneZf3Zndy/7fb/gYP7/+e3Mlx97752d2ZlUFZKk7e8VWz2AJGlzGHxJasLgS1ITBl+SmjD4ktSEwZekJqYGP8lnkzyZ5PuXuD5JPplkKcmjSW6c/ZiSpPUa8gj/c8CBF7n+VmDf+N9R4F/WP5YkadamBr+qHgR+/iJLDgGfr5FTwNVJXj+rASVJs7FzBrexGzg/cXxhfO6nqxcmOcrotwCuvPLKP7z++utncPeS1MfDDz/8s6qaW8vXziL4g1XVceA4wPz8fC0uLm7m3UvSy16S/1zr187ir3SeAPZOHO8Zn5MkXUZmEfwF4F3jv9a5GXimqn7t6RxJ0taa+pROki8BtwC7klwAPgK8EqCqPgWcAG4DloBngfds1LCSpLWbGvyqOjLl+gL+emYTSZI2hO+0laQmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpow+JLUhMGXpCYMviQ1YfAlqQmDL0lNGHxJasLgS1ITBl+SmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpow+JLUhMGXpCYMviQ1YfAlqYlBwU9yIMnjSZaS3HWR69+Q5IEkjyR5NMltsx9VkrQeU4OfZAdwDLgV2A8cSbJ/1bK/B+6vqhuAw8A/z3pQSdL6DHmEfxOwVFXnquo54D7g0Ko1BbxmfPm1wE9mN6IkaRaGBH83cH7i+ML43KSPArcnuQCcAN5/sRtKcjTJYpLF5eXlNYwrSVqrWb1oewT4XFXtAW4DvpDk1267qo5X1XxVzc/Nzc3oriVJQwwJ/hPA3onjPeNzk+4A7geoqu8CrwJ2zWJASdJsDAn+aWBfkmuTXMHoRdmFVWt+DLwNIMmbGAXf52wk6TIyNfhV9TxwJ3ASeIzRX+OcSXJPkoPjZR8E3pvke8CXgHdXVW3U0JKkl27nkEVVdYLRi7GT5+6euHwWeMtsR5MkzZLvtJWkJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpow+JLUhMGXpCYMviQ1YfAlqQmDL0lNGHxJasLgS1ITBl+SmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNTEo+EkOJHk8yVKSuy6x5p1JziY5k+SLsx1TkrReO6ctSLIDOAb8GXABOJ1koarOTqzZB/wd8JaqejrJ6zZqYEnS2gx5hH8TsFRV56rqOeA+4NCqNe8FjlXV0wBV9eRsx5QkrdeQ4O8Gzk8cXxifm3QdcF2S7yQ5leTAxW4oydEki0kWl5eX1zaxJGlNZvWi7U5gH3ALcAT4TJKrVy+qquNVNV9V83NzczO6a0nSEEOC/wSwd+J4z/jcpAvAQlX9qqp+CPyA0Q8ASdJlYkjwTwP7klyb5ArgMLCwas3XGD26J8kuRk/xnJvdmJKk9Zoa/Kp6HrgTOAk8BtxfVWeS3JPk4HjZSeCpJGeBB4APVdVTGzW0JOmlS1VtyR3Pz8/X4uLilty3JL1cJXm4qubX8rW+01aSmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpow+JLUhMGXpCYMviQ1YfAlqQmDL0lNGHxJasLgS1ITBl+SmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmBgU/yYEkjydZSnLXi6x7R5JKMj+7ESVJszA1+El2AMeAW4H9wJEk+y+y7irgb4CHZj2kJGn9hjzCvwlYqqpzVfUccB9w6CLrPgZ8HPjFDOeTJM3IkODvBs5PHF8Yn/s/SW4E9lbV11/shpIcTbKYZHF5efklDytJWrt1v2ib5BXAJ4APTltbVcerar6q5ufm5tZ715Kkl2BI8J8A9k4c7xmfe8FVwJuBbyf5EXAzsOALt5J0eRkS/NPAviTXJrkCOAwsvHBlVT1TVbuq6pqqugY4BRysqsUNmViStCZTg19VzwN3AieBx4D7q+pMknuSHNzoASVJs7FzyKKqOgGcWHXu7kusvWX9Y0mSZs132kpSEwZfkpow+JLUhMGXpCYMviQ1YfAlqQmDL0lNGHxJasLgS1ITBl+SmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpow+JLUhMGXpCYMviQ1YfAlqQmDL0lNGHxJasLgS1ITBl+SmhgU/CQHkjyeZCnJXRe5/gNJziZ5NMk3k7xx9qNKktZjavCT7ACOAbcC+4EjSfavWvYIMF9VfwB8FfiHWQ8qSVqfIY/wbwKWqupcVT0H3AccmlxQVQ9U1bPjw1PAntmOKUlaryHB3w2cnzi+MD53KXcA37jYFUmOJllMsri8vDx8SknSus30RdsktwPzwL0Xu76qjlfVfFXNz83NzfKuJUlT7Byw5glg78TxnvG5/yfJ24EPA2+tql/OZjxJ0qwMeYR/GtiX5NokVwCHgYXJBUluAD4NHKyqJ2c/piRpvaYGv6qeB+4ETgKPAfdX1Zkk9yQ5OF52L/Bq4CtJ/j3JwiVuTpK0RYY8pUNVnQBOrDp398Tlt894LknSjPlOW0lqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpow+JLUhMGXpCYMviQ1YfAlqQmDL0lNGHxJasLgS1ITBl+SmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElqwuBLUhMGX5KaMPiS1ITBl6QmDL4kNWHwJakJgy9JTRh8SWrC4EtSEwZfkpoYFPwkB5I8nmQpyV0Xuf43knx5fP1DSa6Z+aSSpHWZGvwkO4BjwK3AfuBIkv2rlt0BPF1Vvwv8E/DxWQ8qSVqfIY/wbwKWqupcVT0H3AccWrXmEPBv48tfBd6WJLMbU5K0XjsHrNkNnJ84vgD80aXWVNXzSZ4Bfhv42eSiJEeBo+PDXyb5/lqG3oZ2sWqvGnMvVrgXK9yLFb+31i8cEvyZqarjwHGAJItVNb+Z93+5ci9WuBcr3IsV7sWKJItr/dohT+k8AeydON4zPnfRNUl2Aq8FnlrrUJKk2RsS/NPAviTXJrkCOAwsrFqzAPzl+PJfAN+qqprdmJKk9Zr6lM74Ofk7gZPADuCzVXUmyT3AYlUtAP8KfCHJEvBzRj8Upjm+jrm3G/dihXuxwr1Y4V6sWPNexAfiktSD77SVpCYMviQ1seHB92MZVgzYiw8kOZvk0STfTPLGrZhzM0zbi4l170hSSbbtn+QN2Ysk7xx/b5xJ8sXNnnGzDPg/8oYkDyR5ZPz/5LatmHOjJflskicv9V6ljHxyvE+PJrlx0A1X1Yb9Y/Qi738AvwNcAXwP2L9qzV8BnxpfPgx8eSNn2qp/A/fiT4HfHF9+X+e9GK+7CngQOAXMb/XcW/h9sQ94BPit8fHrtnruLdyL48D7xpf3Az/a6rk3aC/+BLgR+P4lrr8N+AYQ4GbgoSG3u9GP8P1YhhVT96KqHqiqZ8eHpxi952E7GvJ9AfAxRp/L9IvNHG6TDdmL9wLHquppgKp6cpNn3CxD9qKA14wvvxb4ySbOt2mq6kFGf/F4KYeAz9fIKeDqJK+fdrsbHfyLfSzD7kutqarngRc+lmG7GbIXk+5g9BN8O5q6F+NfUfdW1dc3c7AtMOT74jrguiTfSXIqyYFNm25zDdmLjwK3J7kAnADevzmjXXZeak+ATf5oBQ2T5HZgHnjrVs+yFZK8AvgE8O4tHuVysZPR0zq3MPqt78Ekv19V/7WVQ22RI8Dnquofk/wxo/f/vLmq/merB3s52OhH+H4sw4ohe0GStwMfBg5W1S83abbNNm0vrgLeDHw7yY8YPUe5sE1fuB3yfXEBWKiqX1XVD4EfMPoBsN0M2Ys7gPsBquq7wKsYfbBaN4N6stpGB9+PZVgxdS+S3AB8mlHst+vztDBlL6rqmaraVVXXVNU1jF7POFhVa/7QqMvYkP8jX2P06J4kuxg9xXNuE2fcLEP24sfA2wCSvIlR8Jc3dcrLwwLwrvFf69wMPFNVP532RRv6lE5t3McyvOwM3It7gVcDXxm/bv3jqjq4ZUNvkIF70cLAvTgJ/HmSs8B/Ax+qqm33W/DAvfgg8Jkkf8voBdx3b8cHiEm+xOiH/K7x6xUfAV4JUFWfYvT6xW3AEvAs8J5Bt7sN90qSdBG+01aSmjD4ktSEwZekJgy+JDVh8CWpCYMvSU0YfElq4n8BzPZculjwdYoAAAAASUVORK5CYII=\n"
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"\n"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**7) Generate a table of the loadings for the first two principal components, i.e. a pandas Dataframe which columns are the principal components, and which rows are the loadings for each variable. Set the Dataframe's indices to be the original variable names. How do you interpret it ?**"
|
|
],
|
|
"metadata": {
|
|
"id": "N2sQPSRi5Q5c",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 41,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[[ 1.16014257e+03 -2.93917544e+02 4.85783976e+01 ... 1.29334919e-03\n",
|
|
" 1.98910417e-03 7.04378359e-04]\n",
|
|
" [ 1.26912244e+03 1.56301818e+01 -3.53945342e+01 ... -1.34685217e-03\n",
|
|
" 6.85925212e-04 -1.06125086e-03]\n",
|
|
" [ 9.95793889e+02 3.91567432e+01 -1.70975298e+00 ... 1.84867758e-05\n",
|
|
" -7.75218581e-04 4.05360270e-04]\n",
|
|
" ...\n",
|
|
" [ 3.14501756e+02 4.75535252e+01 -1.04424072e+01 ... 2.54369638e-05\n",
|
|
" 4.83858890e-04 -2.85342703e-04]\n",
|
|
" [ 1.12485812e+03 3.41292250e+01 -1.97420874e+01 ... 1.23547951e-03\n",
|
|
" -8.08728730e-04 1.21655195e-03]\n",
|
|
" [-7.71527622e+02 -8.86431064e+01 2.38890319e+01 ... -4.44552928e-03\n",
|
|
" 2.42876427e-04 1.46800350e-03]]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(X_PCA)"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**8) Use the function below to generate a loading plot. How do you interpret it ?**"
|
|
],
|
|
"metadata": {
|
|
"id": "-AnpD_8V74lb",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"source": [
|
|
"def myplot(score,coeff,labels=None):\n",
|
|
" fig, ax = plt.subplots()\n",
|
|
" xs = score[:,0]\n",
|
|
" ys = score[:,1]\n",
|
|
" n = coeff.shape[0]\n",
|
|
" scalex = 1.0/(xs.max() - xs.min())\n",
|
|
" scaley = 1.0/(ys.max() - ys.min())\n",
|
|
" ax.scatter(xs*scalex, ys*scaley, c = y)\n",
|
|
" for i in range(n):\n",
|
|
" ax.arrow(0, 0, coeff[i,0], coeff[i,1],color = 'r',alpha = 0.5)\n",
|
|
" if labels is None:\n",
|
|
" ax.text(coeff[i,0]* 1.15 , coeff[i,1] * 1.15 , \"Var\"+str(i+1), color = 'g', ha = 'center', va = 'center')\n",
|
|
" else:\n",
|
|
" ax.text(coeff[i,0]* 1.15 , coeff[i,1] * 1.15 , labels[i], color = 'g', ha = 'center', va = 'center')\n",
|
|
" plt.xlim(-0.5,1.1)\n",
|
|
" plt.ylim(-1,1.1)\n",
|
|
" ax.set_xlabel(\"PC{}\".format(1))\n",
|
|
" ax.set_ylabel(\"PC{}\".format(2))\n",
|
|
" fig.set_size_inches(18.5, 10.5)\n"
|
|
],
|
|
"metadata": {
|
|
"id": "Z8FYAJLcxdrk",
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
},
|
|
"execution_count": 19,
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**9) Split X (the centered data) and y into a training and a test set following a 80/20 partition. Fit a logistic regression model to the training data using all original variables, and evaluate its accuracy on the test set. Redo the same, but create a pipeline that adds a PCA pre-processing step to the data X. Fit the model on only the two first principal components. Is the difference in accuracy significant between the two approaches ?**"
|
|
],
|
|
"metadata": {
|
|
"id": "fmFXxtqrCJo7",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**10) We'll now see how PCA can be employed to compress an image. First, load the 'doggo.jpeg' picture using the library matplotlib.**"
|
|
],
|
|
"metadata": {
|
|
"id": "MoD52zwBDd2m",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**11) Split the image into its red, green and blue channels (using the method cv2.split()). Then, on each channel, apply a PCA transformation with 1 component. For each channel, also compute the inverse PCA transform (using the pca.inverse_transform() method).**\n",
|
|
"\n",
|
|
"**Stack the three inverted transforms (one for each channel) back together to form the compressed imgage, and display the image. Try by increasing the number of principal components until you reach a satisfactory quality.***"
|
|
],
|
|
"metadata": {
|
|
"id": "rnevCJjcEiw1",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
} |