Fundamentals – Part 2
[email protected]
Agenda
Machine learning categories
What learning means
Why and how to estimate f?
1. Supervised learning
Learn a model for predicting or estimating an output based on one or more inputs
For each observation of the predictors, there is an associated response measurement (labeled training data)
Regression and classification tasks
Examples:
Linear regression
Logistic regression
Boosting
Support Vector Machines
...
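A minimal sketch of a supervised task, assuming Python with scikit-learn and synthetic labeled data (the slides do not prescribe any particular tooling): a model is fitted on observations with known responses and then used to predict the response for new inputs.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Labeled training data: each row of X is an observation of the predictors,
    # each entry of y is the associated response measurement.
    rng = np.random.default_rng(0)
    X_train = rng.uniform(0, 10, size=(100, 2))
    y_train = 3.0 * X_train[:, 0] - 1.5 * X_train[:, 1] + rng.normal(0, 1, size=100)

    # Supervised learning: fit a model mapping inputs to the output.
    model = LinearRegression().fit(X_train, y_train)

    # Predict the response for previously unseen observations.
    X_new = np.array([[2.0, 5.0], [7.5, 1.0]])
    print(model.predict(X_new))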
[Figure: Wage data — wage plotted against age, calendar year, and education level (three panels)]
[Figure: Percentage change in the S&P index, grouped by market direction (Down / Up), three panels]
Regression vs classification
Sometimes the difference between a regression and a classification task is not clear from the beginning
Classification tasks can be interpreted as estimating the probability that an element has a given label
However, theory and tools are very different!
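As an illustration of this probabilistic view (a sketch assuming Python with scikit-learn and synthetic data), a logistic regression classifier returns, for each observation, an estimated probability for every label; the predicted class is then obtained by thresholding these probabilities.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic binary classification data: one predictor, labels 0 / 1.
    rng = np.random.default_rng(1)
    X = rng.normal(0, 1, size=(200, 1))
    y = (X[:, 0] + rng.normal(0, 0.5, size=200) > 0).astype(int)

    clf = LogisticRegression().fit(X, y)

    # Estimated probability of each label for new observations...
    X_new = np.array([[-2.0], [0.0], [2.0]])
    print(clf.predict_proba(X_new))   # columns: P(label = 0), P(label = 1)
    # ...and the corresponding hard class predictions (threshold at 0.5).
    print(clf.predict(X_new))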
2. Unsupervised learning
There are inputs but no supervising output
We can learn relationships and structure from data
We observe only features and have no measurements of the outcome
There is no response variable to predict
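A minimal sketch of an unsupervised task (again assuming Python with scikit-learn; the two groups are simulated for illustration): k-means clustering looks for structure using the features alone, with no response variable involved.

    import numpy as np
    from sklearn.cluster import KMeans

    # Only features are observed -- there is no outcome to predict.
    rng = np.random.default_rng(2)
    group_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
    group_b = rng.normal(loc=[6.0, 4.0], scale=1.0, size=(50, 2))
    X = np.vstack([group_a, group_b])

    # Learn structure from the inputs alone: partition the observations into 2 clusters.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])        # cluster assignment of the first observations
    print(kmeans.cluster_centers_)    # estimated group centres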
[Figure: Observations plotted against features X1 and X2 (two panels)]
If the spending patterns were among the observed variables, this would become a supervised task
Y = f(X) + ε
f: fixed but unknown function linking X to Y
ε: random error term, independent of X, with mean zero
[Figure: Advertising data — Sales plotted against TV, Radio, and Newspaper budgets (three panels)]
X: budget
X1: TV budget
X2: radio budget
X3: newspaper budget
Y: sales
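To make Y = f(X) + ε concrete, here is a small simulation sketch in Python with numpy (the chosen f, its coefficients, and the noise level are invented for illustration and are not taken from the Advertising data): sales are generated as an unknown function of the three budgets plus a random error that no model can remove.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200

    # Predictors: TV, radio and newspaper budgets (illustrative units).
    tv = rng.uniform(0, 300, n)
    radio = rng.uniform(0, 50, n)
    newspaper = rng.uniform(0, 100, n)

    # In practice f is unknown; here we invent one to generate the data.
    def f(tv, radio, newspaper):
        return 3.0 + 0.045 * tv + 0.19 * radio + 0.0 * newspaper

    # Y = f(X) + eps: the error term has mean zero and is independent of X.
    eps = rng.normal(0, 1.5, n)
    sales = f(tv, radio, newspaper) + eps
    print(sales[:5])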
X: years of education
Y: income
Blue line: true relationship (unknown)
Black lines: errors (approx. mean zero)
[Figure: Income data — income plotted against years of education (two panels)]
1. Prediction
Goal: find a good f̂ yielding good predictions Ŷ while keeping the error as low as possible
f̂: estimate for f
Ŷ: resulting prediction for Y
Since the error term averages to zero:
Ŷ = f̂(X)
The accuracy of Ŷ as a prediction of Y depends on:
reducible error: improve the accuracy of f̂ by choosing the most appropriate learning technique (if f̂ = f, this term disappears)
irreducible error: even if f̂ = f, this term remains, because Y is also a function of ε. Examples: unmeasured variables, unmeasured variations...
→ We will study techniques for estimating f while minimizing the reducible error
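A small simulation sketch of this decomposition (Python with numpy and scikit-learn; all numbers are invented): even when f̂ has the correct functional form, the test error cannot fall below Var(ε), the irreducible error.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    n = 5000

    # Data generated from a linear f plus irreducible noise with Var(eps) = 4.
    X = rng.uniform(0, 10, size=(n, 1))
    eps = rng.normal(0, 2.0, size=n)
    y = 1.0 + 2.0 * X[:, 0] + eps

    # Fit f-hat on a training set, evaluate on held-out test data.
    X_train, X_test = X[:4000], X[4000:]
    y_train, y_test = y[:4000], y[4000:]
    f_hat = LinearRegression().fit(X_train, y_train)

    # The test MSE is close to Var(eps): the reducible part is (almost) gone,
    # the irreducible part remains no matter how good the estimate of f is.
    mse = np.mean((y_test - f_hat.predict(X_test)) ** 2)
    print(mse)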
2. Inference
How to estimate f?
Goal: apply a learning method to the training data to estimate the unknown function f (i.e. find f̂ s.t. Y ≈ f̂(X) for any observation (X, Y))
Learning methods
1 Parametric learning methods
2 Non-parametric learning methods
1. Parametric methods
A two-step model-based approach
1 Assumption on the function shape
Example: f is linear in X → linear model, i.e.:
f(X) = β0 + β1X1 + β2X2 + · · · + βpXp
2 After selecting a model, choose a procedure that uses the training data to fit the model
Example: estimate the parameters β0, β1, . . . , βp, i.e.:
Y ≈ β0 + β1X1 + β2X2 + · · · + βpXp
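A sketch of the second step under the linear assumption (Python with numpy; the data are simulated since the slides do not name a dataset here): the parameters β0, . . . , βp are estimated from the training data by ordinary least squares.

    import numpy as np

    rng = np.random.default_rng(5)
    n, p = 200, 3

    # Simulated training data from a linear model with known coefficients.
    X = rng.normal(size=(n, p))
    beta_true = np.array([2.0, 1.0, -3.0, 0.5])        # beta_0, ..., beta_3
    y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 0.5, size=n)

    # Step 2: fit the assumed model by ordinary least squares.
    X_design = np.column_stack([np.ones(n), X])        # prepend the intercept column
    beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    print(beta_hat)   # estimates of beta_0, ..., beta_p, close to beta_true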
[Figure: Income as a function of years of education and seniority — parametric (linear) fit]
2. Non-parametric methods
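Non-parametric methods avoid an explicit assumption about the functional form of f. As a sketch (Python with scikit-learn; the slides illustrate this idea with thin plate splines, while k-nearest-neighbours regression is used here as a simpler non-parametric stand-in), the prediction at a point is driven directly by nearby training observations rather than by a small, fixed set of parameters.

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(6)
    n = 300

    # Simulated data with a clearly non-linear relationship.
    X = rng.uniform(0, 10, size=(n, 1))
    y = np.sin(X[:, 0]) + 0.1 * X[:, 0] ** 2 + rng.normal(0, 0.3, size=n)

    # No functional form is assumed: the prediction at x is the average response
    # of the k training observations closest to x.
    knn = KNeighborsRegressor(n_neighbors=15).fit(X, y)

    X_grid = np.linspace(0, 10, 5).reshape(-1, 1)
    print(knn.predict(X_grid))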
[Figure: Income as a function of years of education and seniority — non-parametric fit]
More restrictive models (e.g. linear regression) offer a small range of shapes for f
More flexible models (e.g. thin plate splines) offer a wider range of shapes for f
Why choose a more restrictive method?
Inference: restrictive models are more interpretable
Prediction: flexible models are less interpretable, but can yield more accurate predictions
[Figure: Trade-off between model flexibility and interpretability — subset selection and the lasso (high interpretability, low flexibility), least squares in between, bagging and boosting (high flexibility, low interpretability)]