Practical Data Science
An Introduction to Supervised Machine Learning
and Pattern Classification: The Big Picture
Sebastian Raschka
Michigan State University
NextGen Bioinformatics Seminars - 2015
Feb. 11, 2015
A Little Bit About Myself ...
PhD candidate in Dr. L. Kuhn's lab:
Developing software & methods for
- Protein-ligand docking
- Large-scale drug/inhibitor discovery
and some other machine learning side-projects
What is Machine Learning?
"Field of study that gives computers the
ability to learn without being explicitly
programmed."
(Arthur Samuel, 1959)
By Phillip Taylor [CC BY 2.0]
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Examples of Machine Learning
Text Recognition
Biology
http://commons.wikimedia.org/wiki/
File:American_book_company_1916._letter_envelope-2.JPG#filelinks
[public domain]
Spam Filtering
https://flic.kr/p/5BLW6G [CC BY 2.0]
Examples of Machine Learning
Self-driving cars
Recommendation systems
http://commons.wikimedia.org/wiki/File:Netflix_logo.svg [public domain]
By Steve Jurvetson [CC BY 2.0]
Photo search
and many, many
more ...
http://googleresearch.blogspot.com/2014/11/a-picture-is-worth-thousand-coherent.html
How many of you have used
machine learning before?
Our Agenda
Concepts and the big picture
Workflow
Practical tips & good habits
Supervised Learning:
- Labeled data
- Direct feedback
- Predict outcome/future

Unsupervised Learning:
- No labels
- No feedback
- Find hidden structure

Reinforcement Learning:
- Decision process
- Reward system
- Learn series of actions
Unsupervised Learning
- Clustering: [DBSCAN on a toy dataset]

Supervised Learning
- Regression: [Soccer Fantasy Score prediction]
- Classification: [SVM on 2 classes of the Wine dataset]  (today's topic)
Nomenclature
IRIS
https://archive.ics.uci.edu/ml/datasets/Iris
Instances (samples, observations), features (attributes, dimensions), and classes (targets):

       sepal_length  sepal_width  petal_length  petal_width  class
  1    5.1           3.5          1.4           0.2          setosa
  2    4.9           3.0          1.4           0.2          setosa
  ...
  50   6.4           3.2          4.5           1.5          versicolor
  ...
  150  5.9           3.0          5.1           1.8          virginica

Each row is an instance (sample, observation); the first four columns are the features (attributes, dimensions); the last column holds the classes (targets).
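As a quick, minimal sketch (using the copy of the Iris data bundled with scikit-learn), the dataset can be loaded and inspected like this:

from sklearn.datasets import load_iris

# Load the 150-sample Iris dataset (4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target          # X: (150, 4) feature matrix, y: class labels 0-2

print(iris.feature_names)              # sepal/petal length and width (cm)
print(iris.target_names)               # ['setosa' 'versicolor' 'virginica']
print(X[0], iris.target_names[y[0]])   # first instance and its class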
Classification
1) Learn from training data
2) Map unseen (new) data
[Scatter plot: class1 vs. class2 in a two-dimensional feature space (x1, x2)]
Supervised Learning Workflow

Raw data collection
-> Pre-processing: feature extraction, missing data, sampling
-> Split into a training dataset and a test dataset
-> Pre-processing of the training data: feature selection, feature scaling, dimensionality reduction
-> Training: learning algorithm, cross-validation, hyperparameter optimization, refinement
-> Model selection -> final model
-> Evaluation: performance metrics on the test dataset
-> Final classification/regression model -> prediction on new data, post-processing

Sebastian Raschka 2014
This work is licensed under a Creative Commons Attribution 4.0 International License.
A Few Common Classifiers
Perceptron
Naive Bayes
Decision Tree
K-Nearest Neighbor
Logistic Regression
Artificial Neural Network / Deep Learning
Support Vector Machine
Ensemble Methods: Random Forest, Bagging, AdaBoost
Discriminative Algorithms
Map x -> y directly.
E.g., distinguish between people speaking different languages
without learning the languages.
Examples: Logistic Regression, SVM, Neural Networks

Generative Algorithms
Model a more general problem: how the data was generated.
I.e., the distribution of the class; the joint probability distribution p(x, y).
Examples: Naive Bayes, Bayesian Belief Network classifier, Restricted
Boltzmann Machine
Examples of Discriminative Classifiers:
Perceptron
F. Rosenblatt. The Perceptron: A Perceiving and Recognizing Automaton (Project Para). Cornell Aeronautical Laboratory, 1957.

[Diagram: inputs $x_{i1}, x_{i2}$ (plus bias) with weights $w_0, w_1, w_2$ feeding a threshold unit that outputs $\hat{y}_i$]

$\hat{y} = w^T x = w_0 + w_1 x_1 + w_2 x_2, \quad y \in \{-1, 1\}$

where
$w_j$ = weight
$x_i$ = training sample
$y_i$ = desired output
$\hat{y}_i$ = actual output
$t$ = iteration step
$\eta$ = learning rate
$\theta$ = threshold (here 0)

update rule:

$\hat{y}_i = \begin{cases} 1 & \text{if } w^T x_i \geq \theta \\ -1 & \text{otherwise} \end{cases}$

$w_j(t+1) = w_j(t) + \eta\,(y_i - \hat{y}_i)\,x_{ij}$

until $t+1$ = max iterations or error $= 0$
Discriminative Classifiers:
Perceptron
F. Rosenblatt. The Perceptron: A Perceiving and Recognizing Automaton (Project Para). Cornell Aeronautical Laboratory, 1957.

[Diagram: perceptron with inputs $x_{i1}, x_{i2}$, bias weight $w_0$, weights $w_1, w_2$, output $\hat{y}_i \in \{-1, 1\}$]
Binary classifier (multi-class via one-vs-all, OVA)
Convergence problems (set n iterations)
Modification: stochastic gradient descent
Modern perceptron: Support Vector Machine (maximize margin)
Multilayer perceptron (MLP)
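Below is a minimal NumPy sketch of the update rule from the previous slide (a toy implementation on an assumed linearly separable example, not the SVM or MLP variants):

import numpy as np

def train_perceptron(X, y, eta=0.1, max_iter=10):
    """Rosenblatt perceptron: w_j(t+1) = w_j(t) + eta * (y_i - y_hat_i) * x_ij."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend x0 = 1 for the bias weight w0
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if w @ xi >= 0 else -1         # threshold at 0
            w += eta * (yi - y_hat) * xi             # update only on misclassification
            errors += int(y_hat != yi)
        if errors == 0:                              # converged
            break
    return w

# Toy example: an AND-like, linearly separable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w = train_perceptron(X, y)
print(w, [1 if w @ np.r_[1, xi] >= 0 else -1 for xi in X])

If the classes are not linearly separable, the inner loop never reaches zero errors and training only stops at max_iter, which is the convergence problem noted above.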
Generative Classifiers:
Naive Bayes
Bayes' Theorem:

$P(\omega_j \mid x_i) = \dfrac{P(x_i \mid \omega_j)\, P(\omega_j)}{P(x_i)}$

Posterior probability = (Likelihood x Prior probability) / Evidence

Iris example: $P(\text{"Setosa"} \mid x_i)$, with $x_i = [4.5\ \text{cm},\ 7.4\ \text{cm}]$
Generative Classifiers:
Naive Bayes
Bayes' Theorem:

$P(\omega_j \mid x_i) = \dfrac{P(x_i \mid \omega_j)\, P(\omega_j)}{P(x_i)}$

Decision Rule:

predicted class label $\hat{\omega}_j = \arg\max_j P(\omega_j \mid x_i), \quad i = 1, \ldots, m$

e.g., $\omega_j \in \{\text{Setosa}, \text{Versicolor}, \text{Virginica}\}$
Generative Classifiers:
Naive Bayes
$P(\omega_j \mid x_i) = \dfrac{P(x_i \mid \omega_j)\, P(\omega_j)}{P(x_i)}$

Evidence: $P(x_i)$ (cancels out in the decision rule)

Prior probability: $P(\omega_j) = \dfrac{N_{\omega_j}}{N_c}$ (class frequency)

Class-conditional probability (here: Gaussian kernel):

$P(x_{ik} \mid \omega_j) = \dfrac{1}{\sqrt{2\pi\sigma_j^2}} \exp\!\left(-\dfrac{(x_{ik} - \mu_j)^2}{2\sigma_j^2}\right)$

$P(x_i \mid \omega_j) = \prod_k P(x_{ik} \mid \omega_j)$
Generative Classifiers:
Naive Bayes
Naive conditional independence assumption typically
violated
Works well for small datasets
Multinomial model still quite popular for text classification
(e.g., spam filter)
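A short, hedged sketch with scikit-learn's GaussianNB, which estimates the class priors and a per-feature Gaussian class-conditional density as above:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

nb = GaussianNB()                      # class priors and per-feature Gaussians estimated from the data
nb.fit(X_train, y_train)
print(nb.score(X_test, y_test))        # accuracy on held-out data
print(nb.predict_proba(X_test[:3]))    # posterior probabilities P(class | x)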
Non-Parametric Classifiers:
K-Nearest Neighbor
[Nearest-neighbor illustration, e.g. k = 3 or k = 1]

- Simple!
- Lazy learner
- Very susceptible to the curse of dimensionality

Iris example:
[Decision regions; C = 3 classes, k = 3, Mahalanobis distance, uniform weights]
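A brief scikit-learn sketch along the lines of the figure caption; the Mahalanobis setup (passing the feature covariance matrix and forcing a ball-tree search) is one way to configure it and is an assumption here:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k = 3 neighbors, uniform weights; the Mahalanobis metric needs the
# feature covariance matrix V
knn = KNeighborsClassifier(
    n_neighbors=3,
    weights='uniform',
    algorithm='ball_tree',
    metric='mahalanobis',
    metric_params={'V': np.cov(X.T)},
)
knn.fit(X, y)
print(knn.predict(X[:3]))   # lazy learner: most of the work happens at prediction time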
Decision Tree

petal length <= 2.45?
 Y -> Setosa
 N -> petal length <= 4.75?
       Y -> Versicolor
       N -> Virginica

[Decision regions for trees of depth = 2 and depth = 4]

Entropy $= -\sum_i p_i \log_k p_i$
e.g., for two equally likely classes: $2 \cdot (-0.5 \log_2 0.5) = 1$

Information Gain = entropy(parent) - [avg entropy(children)]
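A minimal scikit-learn sketch (the entropy criterion and the depth are assumptions chosen to mirror the slide):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned splits, e.g. "petal length (cm) <= 2.45"
print(export_text(tree, feature_names=iris.feature_names))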
"No Free Lunch" :(
D. H. Wolpert. The supervised learning no-free-lunch theorems. In Soft Computing and Industry, pages 25-42. Springer, 2002.
Our model is a simplification of reality
Simplification is based on assumptions (model bias)
Assumptions fail in certain situations
Roughly speaking:
No one model works best for all possible situations.
Which Algorithm?
What is the size and dimensionality of my training set?
Is the data linearly separable?
How much do I care about computational efficiency?
- Model building vs. real-time prediction time
- Eager vs. lazy learning / on-line vs. batch learning
- prediction performance vs. speed
Do I care about interpretability or should it "just work well?"
...
[Supervised learning workflow diagram, repeated as a roadmap]
Missing Values:
- Remove features (columns)
- Remove samples (rows)
- Imputation (mean, nearest neighbor, ...)

Sampling:
- Random split into training and validation sets
- Typically 60/40, 70/30, 80/20
- Don't use the validation set until the very end!
(overfitting)
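A small sketch of both steps using the current scikit-learn API (SimpleImputer; older releases exposed this as preprocessing.Imputer), with a 70/30 split as one of the typical ratios above:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

X = np.array([[5.1, 3.5], [4.9, np.nan], [6.4, 3.2], [5.9, 3.0]])
y = np.array([0, 0, 1, 2])

# Impute missing entries with the column (feature) mean
X_imputed = SimpleImputer(strategy='mean').fit_transform(X)

# Random 70/30 split; keep the held-out set untouched until the final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X_imputed, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)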
Categorical Variables

   color  size  prize  class label
0  green  M     10.1   class1
1  red    L     13.5   class2
2  blue   XL    15.3   class1

color is nominal, size is ordinal.

Encodings:
- nominal (one-hot): green -> (1, 0, 0), red -> (0, 1, 0), blue -> (0, 0, 1)
- ordinal: M -> 1, L -> 2, XL -> 3
- class label: class1 -> 0, class2 -> 1

   color=blue  color=green  color=red  prize  size  class label
0  0           1            0          10.1   1     0
1  0           0            1          13.5   2     1
2  1           0            0          15.3   3     0
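A hedged pandas sketch of these mappings (column names as in the table above):

import pandas as pd

df = pd.DataFrame({
    'color': ['green', 'red', 'blue'],
    'size':  ['M', 'L', 'XL'],
    'prize': [10.1, 13.5, 15.3],
    'class label': ['class1', 'class2', 'class1'],
})

# Ordinal feature and class label: map to integers
df['size'] = df['size'].map({'M': 1, 'L': 2, 'XL': 3})
df['class label'] = df['class label'].map({'class1': 0, 'class2': 1})

# Nominal feature: one-hot encode into color=blue / color=green / color=red
df = pd.get_dummies(df, columns=['color'], prefix_sep='=')
print(df)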
[Supervised learning workflow diagram, repeated as a roadmap]
Generalization Error and Overfitting
How well does the model perform on unseen data?
Generalization Error and Overfitting
Error Metrics: Confusion Matrix

here: setosa = positive class

                 predicted positive   predicted negative
actual positive  TP                   FN
actual negative  FP                   TN

[Linear SVM on sepal/petal lengths]
Error Metrics

here: setosa = positive class

                 predicted positive   predicted negative
actual positive  TP                   FN
actual negative  FP                   TN

[Linear SVM on sepal/petal lengths]

(micro and macro averaging for multi-class problems)

Accuracy = (TP + TN) / (FP + FN + TP + TN) = 1 - Error

False Positive Rate = FP / N

True Positive Rate = TP / P   (Recall)

Precision = TP / (TP + FP)
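A short sketch computing these metrics with scikit-learn, assuming a binary setosa-vs-rest labeling and a linear SVM as in the figure caption:

from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y = (y == 0).astype(int)   # setosa = positive class (1), everything else negative (0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
y_pred = SVC(kernel='linear').fit(X_train, y_train).predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(tp, fn, fp, tn)
print(accuracy_score(y_test, y_pred), precision_score(y_test, y_pred), recall_score(y_test, y_pred))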
Receiver Operating Characteristic
(ROC) Curves
Model Selection
Complete dataset -> split into a training dataset and a test dataset

k-fold cross-validation (k = 4): the training dataset is split into 4 folds.

1st iteration: fold 1 is held out as the test fold, folds 2-4 are used for training -> calc. error
2nd iteration: fold 2 is held out -> calc. error
3rd iteration: fold 3 is held out -> calc. error
4th iteration: fold 4 is held out -> calc. error

-> calculate the avg. error over the 4 iterations
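A minimal sketch of 4-fold cross-validation with scikit-learn (the classifier is just a placeholder choice):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 4-fold CV: each fold is held out once, the model is trained on the rest,
# and the per-fold scores are averaged
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=4)
print(scores, scores.mean())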
k-fold CV and ROC
Feature Selection
IMPORTANT!
(Noise, overfitting, curse of dimensionality, efficiency)
- Domain knowledge
- Variance threshold
- Exhaustive search
- Decision trees

Simplest example: Greedy Backward Selection

start: X = [x1, x2, x3, x4]
       X = [x1, x3, x4]
stop:  X = [x1, x3]   (if d = k)
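A hedged sketch of greedy backward selection (backward_selection is a hypothetical helper that scores feature subsets by cross-validated accuracy):

from sklearn.model_selection import cross_val_score

def backward_selection(estimator, X, y, k):
    """Greedily remove one feature at a time until only k features remain."""
    features = list(range(X.shape[1]))
    while len(features) > k:
        # Score the remaining subset for each candidate removal
        scores = {f: cross_val_score(estimator,
                                     X[:, [g for g in features if g != f]],
                                     y, cv=4).mean()
                  for f in features}
        # Remove the feature whose removal leaves the highest CV score (the least useful one)
        features.remove(max(scores, key=scores.get))
    return features

# Example: keep d = k = 2 features of Iris
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
print(backward_selection(KNeighborsClassifier(n_neighbors=3), X, y, k=2))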
Dimensionality Reduction
Transformation onto a new feature subspace
e.g., Principal Component Analysis (PCA)
Find directions of maximum variance
Retain most of the information
PCA in 3 Steps
0. Standardize the data:

$z = \dfrac{x_{ik} - \mu_k}{\sigma_k}$

1. Compute the covariance matrix:

$\sigma_{jk} = \dfrac{1}{n-1} \sum_i (x_{ij} - \mu_j)(x_{ik} - \mu_k)$

$\Sigma = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2
\end{pmatrix}$
PCA in 3 Steps
2. Eigendecomposition and sorting of the eigenvalues:

$\Sigma v = \lambda v$

Eigenvectors:
[[ 0.52237162 -0.37231836 -0.72101681  0.26199559]
 [-0.26335492 -0.92555649  0.24203288 -0.12413481]
 [ 0.58125401 -0.02109478  0.14089226 -0.80115427]
 [ 0.56561105 -0.06541577  0.6338014   0.52354627]]

Eigenvalues (from high to low):
[ 2.93035378  0.92740362  0.14834223  0.02074601]
PCA in 3 Steps
3. Select the top k eigenvectors and transform the data

(Eigenvectors and eigenvalues as above; for k = 2, keep the two eigenvectors belonging to the largest eigenvalues, 2.93 and 0.93.)

[Scatter plot: first 2 principal components of Iris]
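A NumPy sketch of the three steps on the standardized Iris data; it should reproduce eigenvalues close to those above (eigenvector signs and ordering may differ):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# 0. Standardize (zero mean, unit variance per feature)
X_std = StandardScaler().fit_transform(X)

# 1. Covariance matrix (4 x 4)
cov = np.cov(X_std.T)

# 2. Eigendecomposition; sort eigenvalues from high to low
eig_vals, eig_vecs = np.linalg.eigh(cov)
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
print(eig_vals)          # ~ [2.93, 0.93, 0.15, 0.02]

# 3. Project onto the top k = 2 eigenvectors (the first 2 principal components)
W = eig_vecs[:, :2]
X_pca = X_std @ W
print(X_pca.shape)       # (150, 2)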
Hyperparameter Optimization:
GridSearch in scikit-learn
[Decision-region plots for different hyperparameter settings, e.g. SVM with C=1000, gamma=0.1 vs. C=1; KNN with k=11, uniform weights]
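A minimal GridSearchCV sketch; the parameter grid values are illustrative assumptions in the spirit of the slide:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=4)   # exhaustive search with 4-fold CV
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)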
Non-Linear Problems
- e.g., the XOR gate
[Decision regions on XOR-like data; depth = 4]
Kernel Trick

Kernel function: map the data onto a higher-dimensional space (non-linear combinations of the features) where it becomes linearly separable.

Trick: no explicit dot product in the high-dimensional space is needed!

Radial Basis Function (RBF) kernel:

$K(x_i, x_j) = \exp\!\left(-\gamma\, \lVert x_i - x_j \rVert^2\right)$

Kernel PCA
[PC1 from linear PCA vs. PC1 from kernel PCA on a non-linearly separable dataset]
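A hedged sketch contrasting linear PCA and RBF-kernel PCA on a toy non-linear dataset (make_circles); the gamma value is an assumption:

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: not linearly separable in the original 2D space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

pc1_linear = PCA(n_components=1).fit_transform(X)                                 # linear PCA
pc1_kernel = KernelPCA(n_components=1, kernel='rbf', gamma=10).fit_transform(X)   # kernel PCA

# Along the kernel-PCA component the two classes separate; along linear PC1 they do not
print(pc1_linear[:3].ravel(), pc1_kernel[:3].ravel())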
[Supervised learning workflow diagram, repeated as a summary]
Thanks!
Questions?
@rasbt
[email protected]
https://github.com/rasbt
Additional Slides
Inspiring Literature
P. N. Klein. Coding the Matrix: Linear
Algebra Through Computer Science
Applications. Newtonian Press, 2013.
S. Gutierrez. Data Scientists at Work.
Apress, 2014.
R. Schutt and C. O'Neil. Doing Data
Science: Straight Talk from the Frontline.
O'Reilly Media, Inc., 2013.
R. O. Duda, P. E. Hart, and D. G. Stork.
Pattern Classification. 2nd Edition. New
York, 2001.
Useful Online Resources
https://www.coursera.org/course/ml
http://stats.stackexchange.com
http://www.kaggle.com
My Favorite Tools
http://scikit-learn.org/stable/
http://www.numpy.org
http://pandas.pydata.org
Seaborn
http://stanford.edu/~mwaskom/software/seaborn/
http://ipython.org/notebook.html
Which one to pick?

[Two decision boundaries separating class1 and class2: a simple one and a complex one that fits the training data perfectly]

Generalization error!
The problem of overfitting