{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Lab5_solutions.ipynb",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"**Last week, you were introduced to model testing and model selection for a regression task using the framework of cross-validation. In this lab, we'll experiment a bit more with model selection, but this time we'll focus on the task of classification. We'll work with the dataset \"heart_failure_lab\", which contains information about a patient's health and whether they suffered a heart failure. The task is to find the best model that can predict a heart failure from the patient information at hand. To this end, you'll be introduced to several models that you may consider \"black boxes\" (at this stage, you're not asked to understand how the models work under the hood).**"
],
"metadata": {
"id": "lxzNCYczocbF",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"**Import the necessary libraries**"
],
"metadata": {
"id": "xlWn8gFVYOXp",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "2Ll2oo0bFKbm",
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np \n",
"import pandas as pd \n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.svm import SVC\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.neural_network import MLPClassifier\n",
"from sklearn.metrics import accuracy_score\n",
"from sklearn.model_selection import train_test_split, cross_validate, LeaveOneOut, GridSearchCV, RandomizedSearchCV\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.preprocessing import StandardScaler, RobustScaler\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.pipeline import make_pipeline \n",
"from sklearn.compose import ColumnTransformer"
],
"metadata": {
"id": "y2UWo1OBYWeM",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "markdown",
"source": [
{
"cell_type": "markdown",
"source": [
"**1) Load the dataframe, get its general information and change the data type of the columns 'anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking' to categorical.**"
],
"metadata": {
"id": "y2UWo1OBYWeM",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"file = './heart_failure_lab.csv'\n",
"df = pd.read_csv(file, index_col=0)\n",
"df = df.astype({'anaemia' : 'category', 'diabetes': 'category', 'high_blood_pressure' : 'category', 'sex' : 'category', 'smoking' : 'category'})\n",
"print(df.info())\n",
"print(df.head())\n",
"print(df.isna().sum())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0Hm-3k-tU-hh",
"outputId": "13c94a5e-db68-4fe9-f24f-47e84110a57b",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 299 entries, 0 to 298\n",
"Data columns (total 13 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 age 294 non-null float64 \n",
" 1 anaemia 291 non-null category\n",
" 2 creatinine_phosphokinase 292 non-null float64 \n",
" 3 diabetes 287 non-null category\n",
" 4 ejection_fraction 296 non-null float64 \n",
" 5 high_blood_pressure 291 non-null category\n",
" 6 platelets 294 non-null float64 \n",
" 7 serum_creatinine 290 non-null float64 \n",
" 8 serum_sodium 296 non-null float64 \n",
" 9 sex 292 non-null category\n",
" 10 smoking 293 non-null category\n",
" 11 time 299 non-null int64 \n",
" 12 DEATH_EVENT 292 non-null float64 \n",
"dtypes: category(5), float64(7), int64(1)\n",
"memory usage: 23.1 KB\n",
"None\n",
" age anaemia creatinine_phosphokinase diabetes ejection_fraction \\\n",
"0 75.0 0.0 582.0 0.0 20.0 \n",
"1 55.0 0.0 7861.0 0.0 38.0 \n",
"2 65.0 0.0 146.0 0.0 20.0 \n",
"3 50.0 1.0 111.0 0.0 20.0 \n",
"4 65.0 1.0 160.0 1.0 20.0 \n",
"\n",
" high_blood_pressure platelets serum_creatinine serum_sodium sex smoking \\\n",
"0 1.0 265000.00 1.9 130.0 1.0 0.0 \n",
"1 0.0 263358.03 NaN 136.0 1.0 0.0 \n",
"2 0.0 162000.00 1.3 129.0 1.0 1.0 \n",
"3 0.0 210000.00 1.9 137.0 1.0 0.0 \n",
"4 0.0 327000.00 2.7 116.0 0.0 0.0 \n",
"\n",
" time DEATH_EVENT \n",
"0 4 1.0 \n",
"1 6 1.0 \n",
"2 7 1.0 \n",
"3 7 1.0 \n",
"4 8 NaN \n",
"age 5\n",
"anaemia 8\n",
"creatinine_phosphokinase 7\n",
"diabetes 12\n",
"ejection_fraction 3\n",
"high_blood_pressure 8\n",
"platelets 5\n",
"serum_creatinine 9\n",
"serum_sodium 3\n",
"sex 7\n",
"smoking 6\n",
"time 0\n",
"DEATH_EVENT 7\n",
"dtype: int64\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"**2) Remove the entire row wherever the column 'DEATH_EVENT' contains a missing value. Change the data type of the column 'DEATH_EVENT' to integer.**"
],
"metadata": {
"id": "9Y46JzzHZnp0",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"idx = df[df['DEATH_EVENT'].isnull()].index\n",
"df = df.drop(idx, axis=0)\n",
"df = df.astype({'DEATH_EVENT' : 'int'})\n",
"print(df.info())\n",
"print(df.isna().sum())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yQY2P3j8Y8UC",
"outputId": "2a51d2e2-9694-4b26-8501-57029c4a2652",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 292 entries, 0 to 298\n",
"Data columns (total 13 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 age 287 non-null float64 \n",
" 1 anaemia 284 non-null category\n",
" 2 creatinine_phosphokinase 285 non-null float64 \n",
" 3 diabetes 281 non-null category\n",
" 4 ejection_fraction 289 non-null float64 \n",
" 5 high_blood_pressure 284 non-null category\n",
" 6 platelets 287 non-null float64 \n",
" 7 serum_creatinine 283 non-null float64 \n",
" 8 serum_sodium 289 non-null float64 \n",
" 9 sex 285 non-null category\n",
" 10 smoking 286 non-null category\n",
" 11 time 292 non-null int64 \n",
" 12 DEATH_EVENT 292 non-null int64 \n",
"dtypes: category(5), float64(6), int64(2)\n",
"memory usage: 22.6 KB\n",
"None\n",
"age 5\n",
"anaemia 8\n",
"creatinine_phosphokinase 7\n",
"diabetes 11\n",
"ejection_fraction 3\n",
"high_blood_pressure 8\n",
"platelets 5\n",
"serum_creatinine 9\n",
"serum_sodium 3\n",
"sex 7\n",
"smoking 6\n",
"time 0\n",
"DEATH_EVENT 0\n",
"dtype: int64\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"**3) Replace categorical missing values by the most frequent occurrence of the corresponding column. For continuous missing values, replace them by the mean. Create a new dataframe 'df_new' which does not contain the variable 'time'.**"
],
"metadata": {
"id": "ooF6KgF1acaU",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"imp_cont = SimpleImputer(missing_values=np.nan, strategy='mean')\n",
"imp_cat = SimpleImputer(missing_values=np.nan, strategy='most_frequent')\n",
"\n",
"cat_columns = df.select_dtypes(include=['category']).columns\n",
"cont_columns = df.select_dtypes(exclude=['category', 'int']).columns\n",
"\n",
"df[cat_columns] = imp_cat.fit_transform(df[cat_columns])\n",
"df[cont_columns] = imp_cont.fit_transform(df[cont_columns])\n",
"\n",
"print(df.isna().sum())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Pj_OfdtzV4n9",
"outputId": "b79fbceb-cc44-4b76-d652-0ce7716d4cb4",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"age 0\n",
"anaemia 0\n",
"creatinine_phosphokinase 0\n",
"diabetes 0\n",
"ejection_fraction 0\n",
"high_blood_pressure 0\n",
"platelets 0\n",
"serum_creatinine 0\n",
"serum_sodium 0\n",
"sex 0\n",
"smoking 0\n",
"time 0\n",
"DEATH_EVENT 0\n",
"dtype: int64\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"df_new = df.drop(\"time\", axis=1)"
],
"metadata": {
"id": "mb_pNlE8aWJY",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**4) Fit a model to predict the target variable 'DEATH_EVENT' given the remaining columns of 'df_new'. To this end, use an SVC, a RandomForestClassifier, and a LogisticRegression. Fit each model using 10-fold cross-validation, and report the mean training and test accuracy across the folds. What are your conclusions?**"
],
"metadata": {
"id": "HwkQkulGa76d",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"X = df_new.drop(\"DEATH_EVENT\", axis=1)\n",
"y = df_new.DEATH_EVENT\n",
"\n",
"model = SVC(gamma='auto')\n",
"#model = RandomForestClassifier()\n",
"#model = LogisticRegression()\n",
"\n",
"cv_results = cross_validate(model, X, y, cv=10, scoring=['accuracy'], return_train_score=True)\n",
"\n",
"train_acc = cv_results['train_accuracy']\n",
"test_acc = cv_results['test_accuracy']\n",
"\n",
"mean_train_acc = train_acc.mean()\n",
"mean_test_acc = test_acc.mean()\n",
"\n",
"print('Train accuracy : {}'.format(mean_train_acc))\n",
"print('Test accuracy : {}'.format(mean_test_acc))\n",
"\n",
"'''\n",
"Both the SVC and the RandomForestClassifier show a high score on the training data and a much lower performance on the test sets. This suggests \n",
"that these models overfit the training data (i.e. they start to model irrelevant patterns in the training data, which do not generalize well to unseen \n",
"data). This indicates that the models have high variance. On the contrary, the LogisticRegression shows low performance on both the training and test data, which \n",
"is a sign of underfitting (the model is not complex enough to explain the data). Underfitting is associated with a model having high bias. \n",
"'''"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 124
},
"id": "c_W4FeGVaidW",
"outputId": "938e535a-2e78-4e93-ee48-5eb46f58cd2c",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Train accuracy : 1.0\n",
"Test accuracy : 0.6781609195402301\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'\\nBoth the SVC and the RandomForestClassifier show a high score on the training data, and a much lower performance on the test sets. This suggests \\nthat the model overfits the training data (i.e. the models start to model irrelevant patterns in the training data, which do not generalize well to unseen \\ndata). This indicate that the model has high variance. On the contrary, the LogisticRegression shows low performance on both the training and test data, which \\nis a sign of underfitting (the model is actually not complex enough to explain the data). Underfitting is associated to a model having high bias. \\n'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
"execution_count": 15
}
]
},
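The overfitting diagnosis above rests on comparing the mean train and test accuracy returned by `cross_validate`. A minimal, self-contained sketch of that comparison, using scikit-learn's `make_classification` as a synthetic stand-in for the heart-failure data (an assumption, so the numbers will differ from the lab's):

```python
# Synthetic stand-in for df_new: 300 samples, 12 features
# (make_classification is an illustrative assumption, not the lab's data).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)

# return_train_score=True exposes both train and test accuracy per fold
cv_results = cross_validate(SVC(gamma='auto'), X_demo, y_demo, cv=10,
                            scoring=['accuracy'], return_train_score=True)

# A large train/test gap suggests overfitting (high variance);
# low scores on both suggest underfitting (high bias).
gap = cv_results['train_accuracy'].mean() - cv_results['test_accuracy'].mean()
print('train/test accuracy gap: {:.3f}'.format(gap))
```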
{
"cell_type": "markdown",
"source": [
"**5) Like last week, grid search is a useful procedure when trying to find the best subset of hyper-parameters for a model. Check the documentation of the class RandomForestClassifier(), and perform a grid search over a selected range of hyper-parameters. Set the number of folds to 10, and evaluate on the accuracy. Report the best subset of hyper-parameters and the best score.**"
],
"metadata": {
"id": "yifQfW7-cE0t",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"clf = RandomForestClassifier()\n",
"\n",
"param_grid = { \n",
" 'n_estimators': [10, 20, 30],\n",
" 'max_features': ['auto', 'sqrt'],\n",
" 'max_depth' : [4,5],\n",
" 'criterion' :['gini', 'entropy']}\n",
"\n",
"grid = GridSearchCV(estimator=clf, param_grid=param_grid, cv=10, scoring='accuracy')\n",
"grid.fit(X, y)\n",
"\n",
"best_params = grid.best_params_\n",
"best_test_score = grid.cv_results_['mean_test_score'].max()\n",
"\n",
"print('Best set of hyper-parameters : {}'.format(best_params))\n",
"print('Best test accuracy : {}'.format(best_test_score))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "t7BtcV33i-bR",
"outputId": "c56ca5aa-8d17-44da-be69-76a1c04d6166",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best set of hyper-parameters : {'criterion': 'entropy', 'max_depth': 4, 'max_features': 'auto', 'n_estimators': 30}\n",
"Best test accuracy : 0.7497701149425288\n"
]
}
]
},
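A side note on reading the search results: `GridSearchCV` already stores the winning configuration's mean CV score in `best_score_`, so taking `cv_results_['mean_test_score'].max()` as above is equivalent. A small sketch on synthetic data (the tiny grid and the `make_classification` stand-in are illustrative, not the lab's setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=200, random_state=0)

# Deliberately tiny grid (4 combinations) so the search stays fast
param_grid = {'n_estimators': [10, 20], 'max_depth': [4, 5]}
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                    cv=5, scoring='accuracy')
grid.fit(X_demo, y_demo)

# best_score_ is the mean CV accuracy achieved by best_params_
assert grid.best_score_ == grid.cv_results_['mean_test_score'].max()
print(grid.best_params_, grid.best_score_)
```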
{
"cell_type": "markdown",
"source": [
"**6) Scaling the variables sometimes helps the classifier in its predictions. Check the class StandardScaler() and create a pipeline (using the method 'make_pipeline') that combines both the scaler and an SVC(). Be careful: the scaler must only be applied to continuous variables. To this end, check the class ColumnTransformer(), and how it can be used to apply a scaler to only a set of specified columns. Use this pipeline to perform the same grid search as before. Repeat the experiment using a RobustScaler().**\n",
"\n",
"**Does the performance change before and after scaling?**"
],
"metadata": {
"id": "piIwQ3VMdWdQ",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"columns = df_new.select_dtypes(include=['float']).columns\n",
"\n",
"clf = SVC()\n",
"\n",
"param_grid = { \n",
" 'kernel': ['poly', 'rbf'],\n",
" 'degree': [1,2,3]\n",
"}\n",
"\n",
"grid = GridSearchCV(estimator=clf, param_grid=param_grid, cv=10, scoring='accuracy')\n",
"grid.fit(X,y)\n",
"\n",
"best_params = grid.best_params_\n",
"best_test_score = grid.cv_results_['mean_test_score'].max()\n",
"\n",
"print('Best set of hyper-parameters : {}'.format(best_params))\n",
"print('Best test accuracy : {}'.format(best_test_score))\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fe2VWgPb21Nv",
"outputId": "f3bde876-6007-4753-9caf-6b5c21cfb297",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best set of hyper-parameters : {'degree': 1, 'kernel': 'poly'}\n",
"Best test accuracy : 0.6781609195402301\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"columns = df_new.select_dtypes(include=['float']).columns\n",
"\n",
"ct = ColumnTransformer([\n",
" ('num_transformer', StandardScaler(), columns)\n",
" ], remainder='passthrough')\n",
"\n",
"clf = SVC()\n",
"\n",
"pipeline = make_pipeline(ct, clf)\n",
"\n",
"param_grid = { \n",
" 'svc__kernel': ['poly', 'rbf'],\n",
" 'svc__degree': [1,2,3]\n",
"}\n",
"\n",
"grid = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10, scoring='accuracy')\n",
"grid.fit(X,y)\n",
"\n",
"best_params = grid.best_params_\n",
"best_test_score = grid.cv_results_['mean_test_score'].max()\n",
"\n",
"print('Best set of hyper-parameters : {}'.format(best_params))\n",
"print('Best test accuracy : {}'.format(best_test_score))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "23Wm9KB5KWTe",
"outputId": "25babb69-a7a2-445c-e684-962400b473c9",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best set of hyper-parameters : {'svc__degree': 1, 'svc__kernel': 'poly'}\n",
"Best test accuracy : 0.7262068965517241\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"ct = ColumnTransformer([\n",
" ('num_transformer', RobustScaler(), columns)\n",
" ], remainder='passthrough')\n",
"\n",
"clf = SVC()\n",
"\n",
"pipeline = make_pipeline(ct, clf)\n",
"\n",
"param_grid = { \n",
" 'svc__kernel': ['poly', 'rbf'],\n",
" 'svc__degree': [1,2,3]\n",
"}\n",
"\n",
"grid = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10, scoring='accuracy')\n",
"grid.fit(X,y)\n",
"\n",
"best_params = grid.best_params_\n",
"best_test_score = grid.cv_results_['mean_test_score'].max()\n",
"\n",
"print('Best set of hyper-parameters : {}'.format(best_params))\n",
"print('Best test accuracy : {}'.format(best_test_score))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EUaTnMRk174A",
"outputId": "7f76ecaa-2d26-43f8-a43c-d6371d160f19",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best set of hyper-parameters : {'svc__degree': 1, 'svc__kernel': 'rbf'}\n",
"Best test accuracy : 0.7535632183908046\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Scaling the data ensures that all variables are on the same scale. In our dataset, you might have noticed that 'platelets' and 'ejection_fraction' differ by orders of magnitude. By scaling both variables with a StandardScaler, we transform them such that their distributions have mean 0 and standard deviation 1, i.e. they now share the same scale. \n",
"\n",
"But why is it necessary? Some models, especially distance-based models like SVC and KNN that use the distance between two data points to compute their similarity, are greatly affected by widely different feature scales. To ensure that all features contribute comparably to the model, we scale our variables beforehand. \n",
"\n",
"In the experiments above, we observe that scaling the variables indeed helped the SVC in making its predictions, with the RobustScaler yielding better performance than the StandardScaler. \n"
],
"metadata": {
"id": "DD1BeB1Jc_1Z",
"pycharm": {
"name": "#%% md\n"
}
}
},
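The selective-scaling idea discussed above can be shown on a toy mixed-type frame (the column names 'cont_a', 'cont_b' and 'cat' are made up for illustration): `ColumnTransformer` standardizes only the listed columns and, with `remainder='passthrough'`, forwards the rest untouched.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy frame with two continuous columns and one categorical one
df_demo = pd.DataFrame({'cont_a': [1.0, 2.0, 3.0],
                        'cont_b': [10.0, 20.0, 30.0],
                        'cat': [0, 1, 0]})

ct = ColumnTransformer([('num', StandardScaler(), ['cont_a', 'cont_b'])],
                       remainder='passthrough')
out = ct.fit_transform(df_demo)

# Transformed columns come first in the output array, passthrough columns after;
# the scaled columns now have mean 0, while 'cat' is unchanged.
print(out)
```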
{
"cell_type": "markdown",
"source": [
"**7) Let's now see which model amongst a logistic regression, a random forest, an SVC and a multi-layer perceptron performs best at predicting a heart failure. For each model, perform a grid search cross-validation over a selected subset of hyper-parameters, and report the best model amongst the above. Check the documentation of LogisticRegression(), RandomForestClassifier(), SVC() and MLPClassifier() to choose the hyper-parameters. Use the same pipeline as before, and evaluate on the accuracy using 10-fold cross-validation. Report the best model and the best score.**"
],
"metadata": {
"id": "wItdjyS4f24T",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"logistic = LogisticRegression(solver='saga', max_iter=1000)\n",
"forest = RandomForestClassifier()\n",
"svc = SVC() \n",
"mlp = MLPClassifier(max_iter=1000)\n",
"\n",
"param_grid_log = {\n",
" 'logisticregression__penalty' : ['l2', 'none'],\n",
" 'logisticregression__fit_intercept' : [True, False] \n",
"}\n",
"\n",
"param_grid_forest = { \n",
" 'randomforestclassifier__n_estimators': [10, 50, 200],\n",
" 'randomforestclassifier__max_features': ['auto', 'sqrt'],\n",
" 'randomforestclassifier__max_depth' : [4,5],\n",
" 'randomforestclassifier__criterion' :['gini', 'entropy']\n",
"}\n",
"\n",
"param_grid_svc = { \n",
" 'svc__kernel': ['linear', 'poly', 'rbf'],\n",
" 'svc__degree': [1,2,3],\n",
" 'svc__gamma' : ['scale', 'auto'],\n",
"}\n",
"\n",
"param_grid_mlp = { \n",
" 'mlpclassifier__hidden_layer_sizes': [(10,), (50,), (100,)],\n",
" 'mlpclassifier__activation': ['identity', 'tanh', 'relu'],\n",
" 'mlpclassifier__learning_rate' : ['constant', 'invscaling', 'adaptive'],\n",
"}\n",
"\n",
"estimators = [logistic, forest, svc, mlp]\n",
"grids = [param_grid_log, param_grid_forest, param_grid_svc, param_grid_mlp]\n",
"best_results_list = []\n",
"best_params_list = []\n",
"for i, clf in enumerate(estimators):\n",
" pipeline = make_pipeline(ct, clf)\n",
" print(clf)\n",
" grid = GridSearchCV(estimator=pipeline, param_grid=grids[i], cv=10, scoring='accuracy')\n",
" grid.fit(X, y)\n",
" best_params_list.append(grid.best_params_)\n",
" best_results_list.append(grid.cv_results_['mean_test_score'].max())\n",
"\n",
"\n",
"print('Best set of hyper-parameters : {}'.format(best_params_list[int(np.argmax(best_results_list))]))\n",
"print('Best test accuracy : {}'.format(max(best_results_list)))\n",
"\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 416
},
"id": "VQGyZpj797-p",
"outputId": "530e2b70-4407-429b-902c-a65955c9f187",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"LogisticRegression(max_iter=1000, solver='saga')\n",
"RandomForestClassifier()\n"
]
},
{
"output_type": "error",
"ename": "KeyboardInterrupt",
"evalue": "ignored",
"traceback": [
"\u001B[0;31m---------------------------------------------------------------------------\u001B[0m",
"\u001B[0;31mKeyboardInterrupt\u001B[0m Traceback (most recent call last)",
"\u001B[0;32m<ipython-input-11-ad8f6ebf232c>\u001B[0m in \u001B[0;36m<module>\u001B[0;34m()\u001B[0m\n\u001B[1;32m 39\u001B[0m \u001B[0mprint\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mclf\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 40\u001B[0m \u001B[0mgrid\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mGridSearchCV\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mestimator\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mpipeline\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mparam_grid\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mgrids\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0mi\u001B[0m\u001B[0;34m]\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mcv\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;36m10\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mscoring\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;34m'accuracy'\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m---> 41\u001B[0;31m \u001B[0mgrid\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mfit\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mX\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0my\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 42\u001B[0m \u001B[0mbest_params_list\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mappend\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mgrid\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mbest_params_\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 43\u001B[0m \u001B[0mbest_results_list\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mappend\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mgrid\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mcv_results_\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0;34m'mean_test_score'\u001B[0m\u001B[0;34m]\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mmax\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_search.py\u001B[0m in \u001B[0;36mfit\u001B[0;34m(self, X, y, groups, **fit_params)\u001B[0m\n\u001B[1;32m 889\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0mresults\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 890\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 891\u001B[0;31m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_run_search\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mevaluate_candidates\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 892\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 893\u001B[0m \u001B[0;31m# multimetric is determined here because in the case of a callable\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_search.py\u001B[0m in \u001B[0;36m_run_search\u001B[0;34m(self, evaluate_candidates)\u001B[0m\n\u001B[1;32m 1390\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m_run_search\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mevaluate_candidates\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 1391\u001B[0m \u001B[0;34m\"\"\"Search all candidates in param_grid\"\"\"\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m-> 1392\u001B[0;31m \u001B[0mevaluate_candidates\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mParameterGrid\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mparam_grid\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 1393\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 1394\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_search.py\u001B[0m in \u001B[0;36mevaluate_candidates\u001B[0;34m(candidate_params, cv, more_results)\u001B[0m\n\u001B[1;32m 849\u001B[0m )\n\u001B[1;32m 850\u001B[0m for (cand_idx, parameters), (split_idx, (train, test)) in product(\n\u001B[0;32m--> 851\u001B[0;31m \u001B[0menumerate\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mcandidate_params\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0menumerate\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mcv\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0msplit\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mX\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0my\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mgroups\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 852\u001B[0m )\n\u001B[1;32m 853\u001B[0m )\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self, iterable)\u001B[0m\n\u001B[1;32m 1044\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_iterating\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_original_iterator\u001B[0m \u001B[0;32mis\u001B[0m \u001B[0;32mnot\u001B[0m \u001B[0;32mNone\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 1045\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m-> 1046\u001B[0;31m \u001B[0;32mwhile\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mdispatch_one_batch\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0miterator\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 1047\u001B[0m \u001B[0;32mpass\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 1048\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36mdispatch_one_batch\u001B[0;34m(self, iterator)\u001B[0m\n\u001B[1;32m 859\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0;32mFalse\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 860\u001B[0m \u001B[0;32melse\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 861\u001B[0;31m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_dispatch\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mtasks\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 862\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0;32mTrue\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 863\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m_dispatch\u001B[0;34m(self, batch)\u001B[0m\n\u001B[1;32m 777\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_lock\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 778\u001B[0m \u001B[0mjob_idx\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mlen\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_jobs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 779\u001B[0;31m \u001B[0mjob\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_backend\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mapply_async\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mbatch\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mcb\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 780\u001B[0m \u001B[0;31m# A job can complete so quickly than its callback is\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 781\u001B[0m \u001B[0;31m# called before we get here, causing self._jobs to\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py\u001B[0m in \u001B[0;36mapply_async\u001B[0;34m(self, func, callback)\u001B[0m\n\u001B[1;32m 206\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0mapply_async\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mfunc\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;32mNone\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 207\u001B[0m \u001B[0;34m\"\"\"Schedule a func to be run\"\"\"\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 208\u001B[0;31m \u001B[0mresult\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mImmediateResult\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mfunc\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 209\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 210\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mresult\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py\u001B[0m in \u001B[0;36m__init__\u001B[0;34m(self, batch)\u001B[0m\n\u001B[1;32m 570\u001B[0m \u001B[0;31m# Don't delay the application, to avoid keeping the input\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 571\u001B[0m \u001B[0;31m# arguments in memory\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 572\u001B[0;31m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mresults\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mbatch\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 573\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 574\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0mget\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self)\u001B[0m\n\u001B[1;32m 261\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mparallel_backend\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_backend\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mn_jobs\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_n_jobs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 262\u001B[0m return [func(*args, **kwargs)\n\u001B[0;32m--> 263\u001B[0;31m for func, args, kwargs in self.items]\n\u001B[0m\u001B[1;32m 264\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 265\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m__reduce__\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m<listcomp>\u001B[0;34m(.0)\u001B[0m\n\u001B[1;32m 261\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mparallel_backend\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_backend\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mn_jobs\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_n_jobs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 262\u001B[0m return [func(*args, **kwargs)\n\u001B[0;32m--> 263\u001B[0;31m for func, args, kwargs in self.items]\n\u001B[0m\u001B[1;32m 264\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 265\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m__reduce__\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/utils/fixes.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self, *args, **kwargs)\u001B[0m\n\u001B[1;32m 214\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m__call__\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 215\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mconfig_context\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m**\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mconfig\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 216\u001B[0;31m \u001B[0;32mreturn\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mfunction\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 217\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 218\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py\u001B[0m in \u001B[0;36m_fit_and_score\u001B[0;34m(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, split_progress, candidate_progress, error_score)\u001B[0m\n\u001B[1;32m 700\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 701\u001B[0m \u001B[0mfit_time\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mtime\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mtime\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;34m-\u001B[0m \u001B[0mstart_time\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 702\u001B[0;31m \u001B[0mtest_scores\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0m_score\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mestimator\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mX_test\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0my_test\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mscorer\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0merror_score\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 703\u001B[0m \u001B[0mscore_time\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mtime\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mtime\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;34m-\u001B[0m \u001B[0mstart_time\u001B[0m \u001B[0;34m-\u001B[0m \u001B[0mfit_time\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 704\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mreturn_train_score\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py\u001B[0m in \u001B[0;36m_score\u001B[0;34m(estimator, X_test, y_test, scorer, error_score)\u001B[0m\n\u001B[1;32m 759\u001B[0m \u001B[0mscores\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mscorer\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mestimator\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mX_test\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 760\u001B[0m \u001B[0;32melse\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 761\u001B[0;31m \u001B[0mscores\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mscorer\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mestimator\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mX_test\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0my_test\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 762\u001B[0m \u001B[0;32mexcept\u001B[0m \u001B[0mException\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 763\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0merror_score\u001B[0m \u001B[0;34m==\u001B[0m \u001B[0;34m\"raise\"\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self, estimator, X, y_true, sample_weight)\u001B[0m\n\u001B[1;32m 219\u001B[0m \u001B[0mX\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 220\u001B[0m \u001B[0my_true\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 221\u001B[0;31m \u001B[0msample_weight\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0msample_weight\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 222\u001B[0m )\n\u001B[1;32m 223\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py\u001B[0m in \u001B[0;36m_score\u001B[0;34m(self, method_caller, estimator, X, y_true, sample_weight)\u001B[0m\n\u001B[1;32m 256\u001B[0m \"\"\"\n\u001B[1;32m 257\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 258\u001B[0;31m \u001B[0my_pred\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mmethod_caller\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mestimator\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m\"predict\"\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mX\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 259\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0msample_weight\u001B[0m \u001B[0;32mis\u001B[0m \u001B[0;32mnot\u001B[0m \u001B[0;32mNone\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 260\u001B[0m return self._sign * self._score_func(\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py\u001B[0m in \u001B[0;36m_cached_call\u001B[0;34m(cache, estimator, method, *args, **kwargs)\u001B[0m\n\u001B[1;32m 66\u001B[0m \u001B[0;34m\"\"\"Call estimator with method and args and kwargs.\"\"\"\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 67\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mcache\u001B[0m \u001B[0;32mis\u001B[0m \u001B[0;32mNone\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m---> 68\u001B[0;31m \u001B[0;32mreturn\u001B[0m \u001B[0mgetattr\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mestimator\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mmethod\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 69\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 70\u001B[0m \u001B[0;32mtry\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/utils/metaestimators.py\u001B[0m in \u001B[0;36m<lambda>\u001B[0;34m(*args, **kwargs)\u001B[0m\n\u001B[1;32m 111\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 112\u001B[0m \u001B[0;31m# lambda, but not partial, allows help() to work with update_wrapper\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 113\u001B[0;31m \u001B[0mout\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;32mlambda\u001B[0m \u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mfn\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mobj\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;31m# noqa\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 114\u001B[0m \u001B[0;32melse\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 115\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py\u001B[0m in \u001B[0;36mpredict\u001B[0;34m(self, X, **predict_params)\u001B[0m\n\u001B[1;32m 468\u001B[0m \u001B[0;32mfor\u001B[0m \u001B[0m_\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mname\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mtransform\u001B[0m \u001B[0;32min\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_iter\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mwith_final\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;32mFalse\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 469\u001B[0m \u001B[0mXt\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mtransform\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mtransform\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mXt\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 470\u001B[0;31m \u001B[0;32mreturn\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0msteps\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0;34m-\u001B[0m\u001B[0;36m1\u001B[0m\u001B[0;34m]\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0;36m1\u001B[0m\u001B[0;34m]\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpredict\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mXt\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mpredict_params\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 471\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 472\u001B[0m \u001B[0;34m@\u001B[0m\u001B[0mavailable_if\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0m_final_estimator_has\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m\"fit_predict\"\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py\u001B[0m in \u001B[0;36mpredict\u001B[0;34m(self, X)\u001B[0m\n\u001B[1;32m 806\u001B[0m \u001B[0mThe\u001B[0m \u001B[0mpredicted\u001B[0m \u001B[0mclasses\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 807\u001B[0m \"\"\"\n\u001B[0;32m--> 808\u001B[0;31m \u001B[0mproba\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpredict_proba\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mX\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 809\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 810\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mn_outputs_\u001B[0m \u001B[0;34m==\u001B[0m \u001B[0;36m1\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py\u001B[0m in \u001B[0;36mpredict_proba\u001B[0;34m(self, X)\u001B[0m\n\u001B[1;32m 865\u001B[0m \u001B[0;34m)\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 866\u001B[0m \u001B[0mdelayed\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0m_accumulate_prediction\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0me\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpredict_proba\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mX\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mall_proba\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mlock\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 867\u001B[0;31m \u001B[0;32mfor\u001B[0m \u001B[0me\u001B[0m \u001B[0;32min\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mestimators_\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 868\u001B[0m )\n\u001B[1;32m 869\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self, iterable)\u001B[0m\n\u001B[1;32m 1044\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_iterating\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_original_iterator\u001B[0m \u001B[0;32mis\u001B[0m \u001B[0;32mnot\u001B[0m \u001B[0;32mNone\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 1045\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m-> 1046\u001B[0;31m \u001B[0;32mwhile\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mdispatch_one_batch\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0miterator\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 1047\u001B[0m \u001B[0;32mpass\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 1048\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36mdispatch_one_batch\u001B[0;34m(self, iterator)\u001B[0m\n\u001B[1;32m 859\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0;32mFalse\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 860\u001B[0m \u001B[0;32melse\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 861\u001B[0;31m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_dispatch\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mtasks\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 862\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0;32mTrue\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 863\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m_dispatch\u001B[0;34m(self, batch)\u001B[0m\n\u001B[1;32m 777\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_lock\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 778\u001B[0m \u001B[0mjob_idx\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mlen\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_jobs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 779\u001B[0;31m \u001B[0mjob\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_backend\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mapply_async\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mbatch\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mcb\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 780\u001B[0m \u001B[0;31m# A job can complete so quickly than its callback is\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 781\u001B[0m \u001B[0;31m# called before we get here, causing self._jobs to\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py\u001B[0m in \u001B[0;36mapply_async\u001B[0;34m(self, func, callback)\u001B[0m\n\u001B[1;32m 206\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0mapply_async\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mfunc\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;32mNone\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 207\u001B[0m \u001B[0;34m\"\"\"Schedule a func to be run\"\"\"\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 208\u001B[0;31m \u001B[0mresult\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mImmediateResult\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mfunc\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 209\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 210\u001B[0m \u001B[0mcallback\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mresult\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py\u001B[0m in \u001B[0;36m__init__\u001B[0;34m(self, batch)\u001B[0m\n\u001B[1;32m 570\u001B[0m \u001B[0;31m# Don't delay the application, to avoid keeping the input\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 571\u001B[0m \u001B[0;31m# arguments in memory\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 572\u001B[0;31m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mresults\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mbatch\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 573\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 574\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0mget\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self)\u001B[0m\n\u001B[1;32m 261\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mparallel_backend\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_backend\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mn_jobs\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_n_jobs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 262\u001B[0m return [func(*args, **kwargs)\n\u001B[0;32m--> 263\u001B[0;31m for func, args, kwargs in self.items]\n\u001B[0m\u001B[1;32m 264\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 265\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m__reduce__\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/joblib/parallel.py\u001B[0m in \u001B[0;36m<listcomp>\u001B[0;34m(.0)\u001B[0m\n\u001B[1;32m 261\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mparallel_backend\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_backend\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mn_jobs\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0m_n_jobs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 262\u001B[0m return [func(*args, **kwargs)\n\u001B[0;32m--> 263\u001B[0;31m for func, args, kwargs in self.items]\n\u001B[0m\u001B[1;32m 264\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 265\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m__reduce__\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/utils/fixes.py\u001B[0m in \u001B[0;36m__call__\u001B[0;34m(self, *args, **kwargs)\u001B[0m\n\u001B[1;32m 214\u001B[0m \u001B[0;32mdef\u001B[0m \u001B[0m__call__\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 215\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mconfig_context\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m**\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mconfig\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 216\u001B[0;31m \u001B[0;32mreturn\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mfunction\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m*\u001B[0m\u001B[0margs\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m**\u001B[0m\u001B[0mkwargs\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 217\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 218\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py\u001B[0m in \u001B[0;36m_accumulate_prediction\u001B[0;34m(predict, X, out, lock)\u001B[0m\n\u001B[1;32m 638\u001B[0m \u001B[0mcomplains\u001B[0m \u001B[0mthat\u001B[0m \u001B[0mit\u001B[0m \u001B[0mcannot\u001B[0m \u001B[0mpickle\u001B[0m \u001B[0mit\u001B[0m \u001B[0mwhen\u001B[0m \u001B[0mplaced\u001B[0m \u001B[0mthere\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 639\u001B[0m \"\"\"\n\u001B[0;32m--> 640\u001B[0;31m \u001B[0mprediction\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mpredict\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mX\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mcheck_input\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;32mFalse\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 641\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mlock\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 642\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mlen\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mout\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;34m==\u001B[0m \u001B[0;36m1\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/sklearn/tree/_classes.py\u001B[0m in \u001B[0;36mpredict_proba\u001B[0;34m(self, X, check_input)\u001B[0m\n\u001B[1;32m 974\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mn_outputs_\u001B[0m \u001B[0;34m==\u001B[0m \u001B[0;36m1\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 975\u001B[0m \u001B[0mproba\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mproba\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;34m:\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mn_classes_\u001B[0m\u001B[0;34m]\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m--> 976\u001B[0;31m \u001B[0mnormalizer\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mproba\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0msum\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0maxis\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0;36m1\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mnp\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mnewaxis\u001B[0m\u001B[0;34m]\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 977\u001B[0m \u001B[0mnormalizer\u001B[0m\u001B[0;34m[\u001B[0m\u001B[0mnormalizer\u001B[0m \u001B[0;34m==\u001B[0m \u001B[0;36m0.0\u001B[0m\u001B[0;34m]\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;36m1.0\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 978\u001B[0m \u001B[0mproba\u001B[0m \u001B[0;34m/=\u001B[0m \u001B[0mnormalizer\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;32m/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py\u001B[0m in \u001B[0;36m_sum\u001B[0;34m(a, axis, dtype, out, keepdims, initial, where)\u001B[0m\n\u001B[1;32m 44\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0mumr_minimum\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0ma\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0maxis\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0;32mNone\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mout\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mkeepdims\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0minitial\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mwhere\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 45\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m---> 46\u001B[0;31m def _sum(a, axis=None, dtype=None, out=None, keepdims=False,\n\u001B[0m\u001B[1;32m 47\u001B[0m initial=_NoValue, where=True):\n\u001B[1;32m 48\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0mumr_sum\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0ma\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0maxis\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mdtype\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mout\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mkeepdims\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0minitial\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mwhere\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
|
|
"\u001B[0;31mKeyboardInterrupt\u001B[0m: "
|
|
]
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**8) As you might have noticed, grid search can quickly become very slow when the hyper-parameter space to search over grows large. Another approach, which trades some precision in the solution for a lower runtime, is called Random Search. In a Random Search, only a fixed number of hyper-parameter combinations are sampled at random and used to fit the model, which considerably speeds up the procedure. Perform the same experiment as above, but this time use the class RandomizedSearchCV(). Set the number of combinations to try to 5.**"
|
|
],
|
|
"metadata": {
|
|
"id": "1G5RUgPhhL8s",
|
|
"pycharm": {
|
|
"name": "#%% md\n"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"source": [
|
|
"# Silence convergence warnings raised while fitting the many candidate models\n",
"import warnings\n",
|
|
"warnings.filterwarnings(\"ignore\")\n",
|
|
"\n",
|
|
"logistic = LogisticRegression(solver='saga')\n",
|
|
"forest = RandomForestClassifier()\n",
|
|
"svc = SVC() \n",
|
|
"mlp = MLPClassifier()\n",
|
|
"\n",
|
|
"param_grid_log = {\n",
|
|
" 'logisticregression__penalty' : ['l2', 'none'],\n",
|
|
" 'logisticregression__fit_intercept' : [True, False] \n",
|
|
"}\n",
|
|
"\n",
|
|
"param_grid_forest = { \n",
|
|
" 'randomforestclassifier__n_estimators': [10, 50, 200],\n",
|
|
" 'randomforestclassifier__max_features': ['auto', 'sqrt'],\n",
|
|
" 'randomforestclassifier__max_depth' : [4,5],\n",
|
|
" 'randomforestclassifier__criterion' :['gini', 'entropy']\n",
|
|
"}\n",
|
|
"\n",
|
|
"param_grid_svc = { \n",
|
|
" 'svc__kernel': ['linear', 'poly', 'rbf'],\n",
|
|
" 'svc__degree': [1,2,3],\n",
|
|
" 'svc__gamma' : ['scale', 'auto'],\n",
|
|
"}\n",
|
|
"\n",
|
|
"param_grid_mlp = { \n",
|
|
" 'mlpclassifier__hidden_layer_sizes': [(10,), (50,), (100,)],\n",
|
|
" 'mlpclassifier__activation': ['identity', 'tanh', 'relu'],\n",
|
|
" 'mlpclassifier__learning_rate' : ['constant', 'invscaling', 'adaptive'],\n",
|
|
"}\n",
|
|
"\n",
|
|
"estimators = [logistic, forest, svc, mlp]\n",
|
|
"grids = [param_grid_log, param_grid_forest, param_grid_svc, param_grid_mlp]\n",
|
|
"best_results_list = []\n",
|
|
"best_params_list = []\n",
|
|
"best_estimators_list = []\n",
|
|
"# For each model, run a randomized search over its hyper-parameter grid\n",
"for i, clf in enumerate(estimators):\n",
|
|
" print(clf)\n",
|
|
" pipeline = make_pipeline(ct, clf)\n",
|
|
"    # n_iter=5: only 5 randomly sampled hyper-parameter combinations are evaluated\n",
"    grid = RandomizedSearchCV(estimator=pipeline, param_distributions=grids[i], cv=10, scoring='accuracy', n_iter=5)\n",
|
|
" grid.fit(X, y)\n",
|
|
" best_params_list.append(grid.best_params_)\n",
|
|
" best_results_list.append(grid.cv_results_['mean_test_score'].max())\n",
|
|
" best_estimators_list.append(grid.best_estimator_)"
|
|
],
|
|
"metadata": {
|
|
"colab": {
|
|
"base_uri": "https://localhost:8080/"
|
|
},
|
|
"id": "5sVKQJVqbaGA",
|
|
"outputId": "76f0fc01-8da3-4bb1-f0f6-093879a68b95",
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
},
|
|
"execution_count": null,
|
|
"outputs": [
|
|
{
|
|
"output_type": "stream",
|
|
"name": "stdout",
|
|
"text": [
|
|
"LogisticRegression(solver='saga')\n",
|
|
"RandomForestClassifier()\n",
|
|
"SVC()\n",
|
|
"MLPClassifier()\n"
|
|
]
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"source": [
|
|
"print('Best set of hyper-parameters : {}'.format(best_params_list[np.argmax(best_results_list)]))\n",
|
|
"print('Best test accuracy : {}'.format(max(best_results_list)))"
|
|
],
|
|
"metadata": {
|
|
"colab": {
|
|
"base_uri": "https://localhost:8080/"
|
|
},
|
|
"id": "CQX7lUwSnLW3",
|
|
"outputId": "fef9dcbc-d9f6-4c03-fa76-e7a546da2fb7",
|
|
"pycharm": {
|
|
"name": "#%%\n"
|
|
}
|
|
},
|
|
"execution_count": null,
|
|
"outputs": [
|
|
{
|
|
"output_type": "stream",
|
|
"name": "stdout",
|
|
"text": [
|
|
"Best set of hyper-parameters : {'mlpclassifier__learning_rate': 'invscaling', 'mlpclassifier__hidden_layer_sizes': (50,), 'mlpclassifier__activation': 'relu'}\n",
|
|
"Best test accuracy : 0.7432183908045977\n"
|
|
]
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"**9) Using the best model found above, report the precision, the recall, the accuracy, the F1 score and the area under the ROC curve. Use 10-fold cross-validation, and report the mean of each metric across all folds, for both the training and the test folds.**"
],
"metadata": {
"id": "ItK-qfL4i5NM",
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"# Select the estimator with the best cross-validated accuracy overall,\n",
"# not just the one from the last grid searched in the loop\n",
"best_estimator = best_estimators_list[np.argmax(best_results_list)]\n",
"\n",
"scoring = ['precision', 'recall', 'accuracy', 'f1', 'roc_auc']\n",
"\n",
"cv_results = cross_validate(best_estimator, X, y, cv=10, scoring=scoring, return_train_score=True)\n",
"\n",
"print('Test Precision : {}'.format(cv_results['test_precision'].mean()))\n",
"print('Train Precision : {}'.format(cv_results['train_precision'].mean()))\n",
"print('Test Recall : {}'.format(cv_results['test_recall'].mean()))\n",
"print('Train Recall : {}'.format(cv_results['train_recall'].mean()))\n",
"print('Test Accuracy : {}'.format(cv_results['test_accuracy'].mean()))\n",
"print('Train Accuracy : {}'.format(cv_results['train_accuracy'].mean()))\n",
"print('Test F1 score : {}'.format(cv_results['test_f1'].mean()))\n",
"print('Train F1 score : {}'.format(cv_results['train_f1'].mean()))\n",
"print('Test AUROC : {}'.format(cv_results['test_roc_auc'].mean()))\n",
"print('Train AUROC : {}'.format(cv_results['train_roc_auc'].mean()))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "h8dxh_1ykdUw",
"outputId": "44fcef39-d641-4422-d2e6-44abbf3da2cd",
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Test Precision : 0.645\n",
"Train Precision : 0.7626033967740431\n",
"Test Recall : 0.5033333333333333\n",
"Train Recall : 0.6394117647058823\n",
"Test Accuracy : 0.7467816091954023\n",
"Train Accuracy : 0.8196296403796476\n",
"Test F1 score : 0.5555950490857611\n",
"Train F1 score : 0.6951755621432663\n",
"Test AUROC : 0.7637280701754385\n",
"Train AUROC : 0.8775512965841662\n"
]
}
]
},
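{
"cell_type": "markdown",
"source": [
"**Sanity check (a sketch, not part of the lab questions): the cross-validated metrics above can also be recomputed by hand from a confusion matrix on a single held-out split. The split and the variable names (`y_te`, `y_hat`) are illustrative.**"
],
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"# Sketch: derive precision, recall, accuracy and F1 from the TP/FP/TN/FN counts\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)\n",
"best_estimator.fit(X_tr, y_tr)\n",
"y_hat = best_estimator.predict(X_te)\n",
"\n",
"tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()\n",
"precision = tp / (tp + fp)\n",
"recall = tp / (tp + fn)\n",
"accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
"f1 = 2 * precision * recall / (precision + recall)\n",
"print(precision, recall, accuracy, f1)"
],
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": null,
"outputs": []
},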
{
"cell_type": "markdown",
"source": [
"What do these metrics actually compute to evaluate the performance of our model?\n",
"\n",
"* Precision = $\\frac{TP}{TP + FP}$\n",
"\n",
"  * Fraction of the points that I **correctly** predicted as positive among all the points that I predicted as positive. \n",
"\n",
"* Recall = $\\frac{TP}{TP + FN}$\n",
"\n",
"  * Fraction of the points that I **correctly** predicted as positive among all the points that I should have predicted as positive. \n",
"\n",
"* Accuracy = $\\frac{TP + TN}{P + N}$\n",
"\n",
"  * Fraction of the points that I correctly predicted. \n",
"\n",
"* F1-score = $2\\frac{Precision \\cdot Recall}{Precision + Recall}$\n",
"\n",
"  * Harmonic mean of precision and recall. \n",
"\n",
"* AUROC : All the models used in this lab (logistic regression, random forest, SVC) can output a score (for most of them, a probability) that a datapoint belongs to the positive class. Given this score, we must then decide on a threshold above which the point is assigned to the positive class. By convention, this threshold is usually set to 0.5, so that if the probability output by the model is greater than or equal to 0.5, the point is classified as \"1\", and as \"0\" otherwise. However, 0.5 is a rather arbitrary choice, and other values might be better suited for the task at hand. For instance, if you want to detect credit card fraud, you want to capture as much fraud as possible, even if that means classifying a couple of legitimate transactions as fraud. To this end, you might want to lower your threshold, say to a value of 0.2. \n",
"\n",
"  * We can further define the Specificity, or True Negative Rate, as $\\frac{TN}{N}$. As TP, TN, FP and FN all depend on the selected threshold, so do the Recall and the Specificity. The Receiver Operating Characteristic (ROC) curve displays the points $\\big(1-\\text{Specificity}(\\tau), \\text{Recall}(\\tau)\\big)$ for different values of the threshold $\\tau$. The AUROC (Area Under the ROC curve) is simply the area under the obtained curve; it varies between 0 and 1, 1 being the AUROC of a perfect classifier. "
],
"metadata": {
"id": "f6RykhJMgJKa",
"pycharm": {
"name": "#%% md\n"
}
}
}
]
} |