{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Lab3_solutions.ipynb", "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "This notebook is an introduction to the scikit-learn library (https://scikit-learn.org/stable/), which provides many tools for common machine learning tasks.\n", "\n", "In this lab, we'll experiment with two of the most common tasks in machine learning:\n", " - Regression: predicting a continuous variable from a set of predictors.\n", " - Classification: predicting the class of a sample from a set of predictors.\n", "\n", "To give you a feel for the general pipeline of a machine learning task, we'll start with two simple models: a linear regression and a K-nearest neighbors classifier." ], "metadata": { "id": "4bV7mOa9W1Eu", "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "markdown", "source": [ "**Import the necessary libraries**" ], "metadata": { "id": "jBfdU_joaqrm", "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "source": [ "import numpy as np\n", "import pandas as pd\n", "import os\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.metrics import mean_squared_error, precision_score, recall_score, accuracy_score, confusion_matrix, \\\n", " roc_auc_score, roc_curve, f1_score\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from mpl_toolkits import mplot3d" ], "metadata": { "id": "y90MCHqMa1j5", "pycharm": { "name": "#%%\n" } }, "execution_count": 5, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Load the 'Pokemon.csv' dataset as a pandas dataframe, convert the 'Type 1' and 'Type 2' variables to categorical, and fill missing 'Type 2' 
values with the corresponding 'Type 1' value.**" ], "metadata": { "id": "zx1zL4ura5ze", "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "source": [ "file = 'data/Pokemon.csv'\n", "\n", "##Read dataframe##\n", "\n", "df = pd.read_csv(file)\n", "print(df.head())" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LuZ28dR_bsXi", "outputId": "953949cf-64d3-4c54-e218-15b2cba75dbe", "pycharm": { "name": "#%%\n" } }, "execution_count": 9, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " # Name Type 1 Type 2 Total HP Attack Defense \\\n", "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", "1 2 Ivysaur Grass Poison 405 60 62 63 \n", "2 3 Venusaur Grass Poison 525 80 82 83 \n", "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", "4 4 Charmander Fire NaN 309 39 52 43 \n", "\n", " Sp. Atk Sp. Def Speed Generation Legendary \n", "0 65 65 45 1 False \n", "1 80 80 60 1 False \n", "2 100 100 80 1 False \n", "3 122 120 80 1 False \n", "4 60 50 65 1 False \n" ] } ] }, { "cell_type": "code", "source": [ "##Change variable types##\n", "\n", "print(df.dtypes)\n", "# astype returns a new dataframe, so assign the result back to df\n", "df = df.astype({'Type 1': 'category', 'Type 2': 'category'})\n", "df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 667 }, "id": "BRPoZ4YhbuLr", "outputId": "7f4450fa-b7c8-4ea6-f53b-c3cd360882cb", "pycharm": { "name": "#%%\n" } }, "execution_count": 10, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# int64\n", "Name object\n", "Type 1 object\n", "Type 2 object\n", "Total int64\n", "HP int64\n", "Attack int64\n", "Defense int64\n", "Sp. Atk int64\n", "Sp. 
Def int64\n", "Speed int64\n", "Generation int64\n", "Legendary bool\n", "dtype: object\n" ] }, { "data": { "text/plain": " # Name Type 1 Type 2 Total HP Attack Defense \\\n0 1 Bulbasaur Grass Poison 318 45 49 49 \n1 2 Ivysaur Grass Poison 405 60 62 63 \n2 3 Venusaur Grass Poison 525 80 82 83 \n3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n4 4 Charmander Fire NaN 309 39 52 43 \n.. ... ... ... ... ... .. ... ... \n795 719 Diancie Rock Fairy 600 50 100 150 \n796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 \n797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n799 721 Volcanion Fire Water 600 80 110 120 \n\n Sp. Atk Sp. Def Speed Generation Legendary \n0 65 65 45 1 False \n1 80 80 60 1 False \n2 100 100 80 1 False \n3 122 120 80 1 False \n4 60 50 65 1 False \n.. ... ... ... ... ... \n795 100 150 50 6 True \n796 160 110 110 6 True \n797 150 130 70 6 True \n798 170 130 80 6 True \n799 130 90 70 6 True \n\n[800 rows x 13 columns]", "text/html": "
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
| 799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
800 rows × 13 columns