Supervised Learning: Andreas Müller

This document discusses supervised machine learning using scikit-learn. It begins with an overview of supervised learning, including that it uses labeled training data to predict target variables. It then discusses classification and regression problems in supervised learning. The rest of the document demonstrates supervised learning on the Iris dataset using k-nearest neighbors classification in scikit-learn, including fitting a model to training data and measuring its performance on test data.

Supervised learning

SUPERVISED LEARNING WITH SCIKIT-LEARN

Andreas Müller
Core developer, scikit-learn
What is machine learning?
The art and science of:
Giving computers the ability to learn to make decisions from data

without being explicitly programmed!

Examples:
Learning to predict whether an email is spam or not

Clustering Wikipedia entries into different categories

Supervised learning: Uses labeled data

Unsupervised learning: Uses unlabeled data

Unsupervised learning
Uncovering hidden patterns from unlabeled data

Example:
Grouping customers into distinct categories (Clustering)

Reinforcement learning
Software agents interact with an environment
Learn how to optimize their behavior

Given a system of rewards and punishments

Draws inspiration from behavioral psychology

Applications
Economics

Genetics

Game playing

AlphaGo: First computer program to defeat a world champion in Go

Supervised learning
Predictor variables/features and a target variable

Aim: Predict the target variable, given the predictor variables

Classification: Target variable consists of categories

Regression: Target variable is continuous

Naming conventions
Features = predictor variables = independent variables

Target variable = dependent variable = response variable

Supervised learning
Automate time-consuming or expensive manual tasks
Example: Doctor’s diagnosis

Make predictions about the future

Example: Will a customer click on an ad or not?

Need labeled data

Historical data with labels

Experiments to get labeled data

Crowd-sourcing labeled data

Supervised learning in Python
We will use scikit-learn/sklearn
Integrates well with the SciPy stack

Other libraries
TensorFlow

Keras

Exploratory data analysis

Hugo Bowne-Anderson
Data Scientist, DataCamp
The Iris dataset
Features:

Petal length

Petal width

Sepal length

Sepal width

Target variable: Species

Versicolor

Virginica

Setosa

The Iris dataset in scikit-learn
from sklearn import datasets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
iris = datasets.load_iris()
type(iris)

sklearn.datasets.base.Bunch

print(iris.keys())

dict_keys(['data', 'target_names', 'DESCR', 'feature_names', 'target'])

The Iris dataset in scikit-learn
type(iris.data), type(iris.target)

(numpy.ndarray, numpy.ndarray)

iris.data.shape

(150, 4)

iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
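As a quick check on the loaded data, the sketch below counts the samples per species. It assumes a scikit-learn installation that provides `load_iris`; using `np.bincount` to tally the integer labels is a choice made here for illustration and does not come from the slides.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# iris.target holds integer class labels (0, 1, 2) that index into target_names
counts = np.bincount(iris.target)

for name, count in zip(iris.target_names, counts):
    print(name, count)
```

Each of the three species should appear 50 times, which matches the 150 rows reported by `iris.data.shape`.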

Exploratory data analysis (EDA)
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns=iris.feature_names)
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

Visual EDA
_ = pd.plotting.scatter_matrix(df, c=y, figsize=[8, 8],
                               s=150, marker='D')

The classification challenge

Hugo Bowne-Anderson
Data Scientist, DataCamp
k-Nearest Neighbors
Basic idea: Predict the label of a data point by
Looking at the ‘k’ closest labeled data points

Taking a majority vote
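The two steps above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function name `knn_predict`, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every labeled point
    distances = [(math.dist(x, x_new), label) for x, label in zip(X_train, y_train)]
    # Keep the k closest labeled points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in 2D
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["red", "red", "red", "green", "green", "green"]

print(knn_predict(X_train, y_train, (1.1, 1.0), k=3))
print(knn_predict(X_train, y_train, (5.1, 5.0), k=3))
```

The sketch does not handle tied votes in any principled way; with an even k and two classes, the winner depends on the order in which labels are encountered.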

Scikit-learn fit and predict
All machine learning models implemented as Python classes
They implement the algorithms for learning and predicting

Store the information learned from the data

Training a model on the data = ‘fitting’ a model to the data

.fit() method

To predict the labels of new data: .predict() method
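To make the fit/predict convention concrete, here is a hypothetical toy class written in the same style. `MajorityClassifier` is invented for this illustration and is not part of scikit-learn; it only shows how a model object stores what it learned in `.fit()` and reuses it in `.predict()`.

```python
from collections import Counter

class MajorityClassifier:
    """Hypothetical toy model: always predicts the most common training label."""

    def fit(self, X, y):
        # "Learning" here is just remembering the most frequent label
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # returning self mirrors scikit-learn's convention

    def predict(self, X):
        # One prediction per row of X
        return [self.majority_ for _ in X]

model = MajorityClassifier()
model.fit([[0], [1], [2], [3]], ["a", "b", "b", "b"])
predictions = model.predict([[10], [20]])
print(predictions)
```

The trailing underscore on `majority_` follows the scikit-learn habit of marking attributes that only exist after fitting.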

Using scikit-learn to fit a classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(iris['data'], iris['target'])

KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski',metric_params=None, n_jobs=1,
n_neighbors=6, p=2,weights='uniform')

iris['data'].shape

(150, 4)

iris['target'].shape

(150,)

Predicting on unlabeled data
prediction = knn.predict(X_new)

X_new.shape

(3, 4)

print('Prediction: {}'.format(prediction))

Prediction: [1 1 0]
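The slides do not show how `X_new` was created. The sketch below is one way to run the same steps end to end, with the caveat that the three rows of measurements are hypothetical values chosen for illustration, so the predictions need not match the output shown above.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(iris['data'], iris['target'])

# Hypothetical measurements in cm, one row per flower:
# sepal length, sepal width, petal length, petal width
X_new = np.array([[5.6, 2.8, 3.9, 1.1],
                  [5.7, 2.6, 3.8, 1.3],
                  [4.7, 3.2, 1.3, 0.2]])

prediction = knn.predict(X_new)
print(X_new.shape)
print('Prediction: {}'.format(prediction))

# Map the integer labels back to species names
print(iris['target_names'][prediction])
```

The new data must have the same number of columns, in the same order, as the data the model was fit on.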

Measuring model performance

Hugo Bowne-Anderson
Data Scientist, DataCamp
Measuring model performance
In classification, accuracy is a commonly used metric

Accuracy = Fraction of correct predictions

Which data should be used to compute accuracy?

How well will the model perform on new data?
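The definition of accuracy can be written out directly. The helper name `accuracy` and the label lists below are hypothetical and serve only to illustrate the arithmetic.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels and predictions for five observations
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]

print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```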

Measuring model performance
Could compute accuracy on data used to fit classifier

NOT indicative of ability to generalize

Split data into training and test set

Fit/train the classifier on the training set

Make predictions on test set

Compare predictions with the known labels

Train/test split
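from sklearn.model_selection import train_test_split
# Hold out 30% of the data for testing; stratify=y keeps class proportions equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)
knn = KNeighborsClassifier(n_neighbors=8)
# Fit on the training set only, then predict the held-out test set
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))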

Test set predictions:

[2 1 2 2 1 0 1 0 0 1 0 2 0 2 2 0 0 0 1 0 2 2 2 0 1 1 1 0 0
1 2 2 0 0 2 2 1 1 2 1 1 0 2 1]

knn.score(X_test, y_test)

0.9555555555555556

Model complexity
Larger k = smoother decision boundary = less complex model

Smaller k = more complex model = can lead to overfitting

Source: Andreas Müller & Sarah Guido, Introduction to Machine Learning with Python
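One way to see the effect of k is to compare accuracy on the training set with accuracy on the test set for several values of k. The sketch below reuses the split settings from the train/test split slide; the range of k values and the `results` dictionary are choices made for this illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=21, stratify=iris.target)

results = {}
for k in range(1, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Accuracy on the data the model was fit to vs. accuracy on held-out data
    results[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))

for k, (train_acc, test_acc) in results.items():
    print("k={}: train accuracy={:.3f}, test accuracy={:.3f}".format(k, train_acc, test_acc))
```

With k=1 each training point is its own nearest neighbor, so training accuracy is typically at or near 1.0 regardless of how well the model generalizes. A large gap between training and test accuracy is a common sign of overfitting.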

[Figure: Model complexity and over/underfitting]
