Week 09 Lesson 1 Intro Machine Learning 1 to 32 (4)
[Figure: examples plotted by Feature 1 vs. Feature 2, separated by a decision boundary.]
Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical
Learning, Knowledge Discovery in Databases, Pattern Discovery.
Data everywhere!
1. Google: processes 24 petabytes of data per day.
Data types
Data comes in different sizes and flavors (types):
Texts
Numbers
Clickstreams
Graphs
Tables
Images
Transactions
Videos
Applications of ML
• We all use it on a daily basis. Examples:
• Spam filtering
• Credit card fraud detection
• Digit recognition on checks, zip codes
• Detecting faces in images
• MRI image analysis
• Recommendation systems
• Search engines
• Handwriting recognition
• Scene classification
• etc.
[Figure: Machine Learning at the intersection of many fields, e.g. Biology, Engineering, Economics, Visualization.]
ML versus Statistics
Statistics:
• Hypothesis testing
• Experimental design
• Anova
• Linear regression
• Logistic regression
• GLM
• PCA
Machine Learning:
• Decision trees
• Rule induction
• Neural Networks
• SVMs
• Clustering methods
• Association rules
• Feature selection
• Visualization
• Graphical models
• Genetic algorithms
http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf
example xi → xi1 xi2 . . . xid   yi ← label
. . .
example xn → xn1 xn2 . . . xnd   yn ← label
Supervised vs. Unsupervised
Unsupervised learning:
Learning a model from unlabeled data.
Supervised learning:
Learning a model from labeled data.
Unsupervised Learning
Training data: "examples" x.
x1, . . . , xn, xi ∈ X ⊂ R^d
• Clustering/segmentation:
f : R^d → {C1, . . . , Ck} (set of clusters).
Unsupervised learning
[Figure: unlabeled points in the Feature 1 / Feature 2 plane, grouped into clusters.]
Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral
clustering, etc.
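As a sketch of the first method listed above, here is a minimal K-means loop (assuming NumPy; the toy data, the simple "first k points" initialization, and the function name are illustrative — real implementations use k-means++ or random restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal K-means: alternate nearest-center assignment and mean update."""
    # Seed the centers with the first k points (a real implementation would
    # use k-means++ or several random restarts).
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # assignments stable: converged
        centers = new_centers
    return centers, labels

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [10.0, 10.0],
              [0.0, 1.0], [1.0, 0.0],
              [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(X, k=2)
```

On this data the loop converges in two iterations and recovers the two groups.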
Supervised learning
Training data: "examples" x with "labels" y.
(x1, y1), . . . , (xn, yn), xi ∈ R^d
• Classification: y is discrete. To simplify, y ∈ {−1, +1}.
f : R^d → {−1, +1}; f is called a binary classifier. Example: approve credit or not.
Supervised learning
[Figure: labeled examples in the Feature 1 / Feature 2 plane, separated by a decision boundary.]
Classification:
[Figure: further classification examples in the Feature 1 / Feature 2 plane.]
Supervised learning
Non-linear classification
Supervised learning
Training data: "examples" x with "labels" y.
(x1, y1), . . . , (xn, yn), xi ∈ R^d
• Regression: y is a real value, y ∈ R.
f : R^d → R; f is called a regressor. Example: amount of
credit, weight of fruit.
Supervised learning
Regression:
[Figure: y plotted against a single feature x, with a fitted regression function.]
Example: income as a function of age; weight of a fruit as a function of its
length.
Training and Testing
[Figure: Training phase — the training set is fed to an ML algorithm, which outputs a model f. Testing phase — a new example (income, gender, age, family status, zipcode) is fed to the model, which predicts the credit amount ($) or a credit yes/no decision.]
K-nearest neighbors
• Not every ML method builds a model!
K-nearest neighbors
• KNN uses the standard Euclidean distance to define nearest neighbors.
Given two examples xi and xj:

d(xi, xj) = sqrt( Σ_{k=1}^{d} (xik − xjk)^2 )
K-nearest neighbors
Training algorithm:
Add each training example (x, y) to the dataset D, x ∈ R^d, y ∈ {+1, −1}.
Classification algorithm:
Given an example xq to be classified, let Nk(xq) be the set of the K
training examples nearest to xq; predict the label that is the majority
among {yi : xi ∈ Nk(xq)}.
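The training and classification steps above can be sketched directly. A minimal version assuming NumPy and the Euclidean distance from the previous slide (the toy dataset and function name are illustrative):

```python
import numpy as np

def knn_classify(X_train, y_train, x_q, k=3):
    """Predict the majority label among the k nearest training examples."""
    # Euclidean distance from x_q to every training example.
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]   # indices of N_k(x_q)
    votes = y_train[nearest]
    # Majority vote over labels in {-1, +1}.
    return 1 if votes.sum() > 0 else -1

# Toy dataset: positives near (1, 1), negatives near (-1, -1).
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [-1.0, -1.0], [-1.1, -0.9], [-0.9, -1.2]])
y = np.array([1, 1, 1, -1, -1, -1])
```

"Training" here is just storing (X, y) — as the slide notes, KNN builds no model.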
K-nearest neighbors
[Figure: 3-NN illustration and the resulting decision boundary. Credit: Introduction to Statistical Learning.]
K-nearest neighbors
Question: What are the pros and cons of K-NN?
Pros:
+ Simple to implement.
+ Works well in practice.
+ Does not require building a model, making assumptions, or tuning parameters.
+ Can be extended easily with new examples.
Cons:
- Requires large space to store the entire training dataset.
- Slow! Given n examples and d features, the method takes O(n × d) to
classify one example.
- Suffers from the curse of dimensionality.
Applications of K-NN
1. Information retrieval.
• Training error (in-sample error): E_train(f) = (1/n) Σ_{i=1}^{n} ℓ(yi, f(xi)).
• Examples of loss functions ℓ: the 0/1 loss 1[f(xi) ≠ yi] for classification;
the squared loss (f(xi) − yi)^2 for regression.
• We aim to have E_train(f) small, i.e., minimize E_train(f).
• We hope that E_test(f), the out-of-sample error (test/true error), will be small
too.
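With the 0/1 loss, the in-sample error above is just the fraction of misclassified training examples. A small sketch (the classifier and data are hypothetical, for illustration only):

```python
import numpy as np

def train_error(f, X, y):
    """E_train(f) = (1/n) * sum_i 1[f(x_i) != y_i]  (0/1 loss)."""
    preds = np.array([f(x) for x in X])
    return float((preds != y).mean())

# Hypothetical classifier: predict the sign of the first feature.
f = lambda x: 1 if x[0] > 0 else -1
X = np.array([[0.5, 1.0], [-2.0, 0.3], [1.5, -0.7], [-0.2, 0.9]])
y = np.array([1, -1, -1, -1])
# f is wrong on the third example only, so E_train(f) = 1/4.
```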
Overfitting/underfitting: an intuitive example
[Figure: prediction error vs. model complexity (low to high). Training error decreases steadily with complexity, while test error first decreases, then rises again once the model overfits.]
Avoid overfitting
In general, use simple models!
Regularization: minimize Σ_{i=1}^{n} ℓ(yi, f(xi)) + λ C(f), where C(f)
penalizes complex models and λ ≥ 0 controls the trade-off.
Regularization: Intuition
[Figure: the same dataset fit with the three models below, from underfit to overfit.]

f(x) = λ0 + λ1 x ... (1)
f(x) = λ0 + λ1 x + λ2 x^2 ... (2)
f(x) = λ0 + λ1 x + λ2 x^2 + λ3 x^3 + λ4 x^4 ... (3)

Hint: Avoid high-degree polynomials.
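Models (1)–(3) can be fit by least squares on polynomial features, and an L2 penalty (ridge) shrinks the coefficients of the higher-degree model toward simplicity. A sketch of the ridge normal equations with NumPy (the function name and data are illustrative):

```python
import numpy as np

def fit_poly_ridge(x, y, degree, lam=0.0):
    """Fit f(x) = sum_j coef[j] * x**j by ridge regression:
    coef = (A^T A + lam*I)^(-1) A^T y, where A is the polynomial design matrix."""
    A = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    n_coef = degree + 1
    return np.linalg.solve(A.T @ A + lam * np.eye(n_coef), A.T @ y)

x = np.linspace(0.0, 1.0, 8)
y = 1.0 + 2.0 * x                              # truly linear data
c1 = fit_poly_ridge(x, y, degree=1)            # model (1): recovers ~[1, 2]
c4 = fit_poly_ridge(x, y, degree=4, lam=1.0)   # model (3) with shrinkage
```

With λ > 0 the ridge solution always has a smaller norm than the unpenalized least-squares solution, which is the sense in which regularization favors simpler fits.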
Example: Split the data randomly into 60% for training, 20% for validation,
and 20% for testing.
1. Training set is a set of examples used for learning the model.
2. Validation set is a set of examples that cannot be used for learning the
model but can help tune model parameters (e.g., selecting K in K-NN).
Validation helps control overfitting.
3. Test set is used to assess the performance of the final model and
provide an estimate of the test error.
Note: Never use the test set in any way to further tune the parameters
or revise the model.
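The 60/20/20 split above can be done with a single random permutation of the example indices. A minimal sketch assuming NumPy (the function name is illustrative):

```python
import numpy as np

def split_indices(n, train=0.6, val=0.2, seed=0):
    """Randomly split indices 0..n-1 into train / validation / test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(train * n)
    n_val = int(val * n)
    # Remaining indices (here 20%) form the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

tr, va, te = split_indices(100)   # 60 / 20 / 20 examples
```

Fixing the seed makes the split reproducible; every index lands in exactly one of the three sets.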
Algorithm (K-fold cross-validation): for j = 1, . . . , K, train on all folds
except fold j and evaluate on fold j; average the K errors.
Confusion matrix

                          Predicted label
Actual label      Positive                 Negative
Positive          True Positive (TP)       False Negative (FN)
Negative          False Positive (FP)      True Negative (TN)

Evaluation metrics

Accuracy = (TP + TN) / (TP + TN + FP + FN): the percentage of predictions
that are correct.
Precision = TP / (TP + FP): the percentage of positive predictions that are
correct.
Sensitivity (Recall) = TP / (TP + FN): the percentage of positive cases that
were predicted as positive.
Specificity = TN / (TN + FP): the percentage of negative cases that were
predicted as negative.
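The four metrics above follow directly from the confusion-matrix counts. A small sketch (the counts below are made up for illustration):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity (recall) and specificity
    from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Illustrative counts: 100 examples, 60 actual positives, 40 actual negatives.
m = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts: accuracy 70/100, precision 40/50, sensitivity 40/60, specificity 30/40.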
Terminology review
Review the concepts and terminology at KDnuggets: http://www.kdnuggets.com/
Credit
• The Elements of Statistical Learning: Data Mining, Inference, and
Prediction. 2nd Edition, 2009. T. Hastie, R. Tibshirani, J. Friedman.
• Machine Learning 1997. Tom Mitchell.