ICT515 Lec 1
Yi = f(Xi) + εi
Where f is an unknown function and εi is a random
error term, independent of X, with mean zero (so
the errors introduce no systematic bias).
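As a concrete illustration, the model can be simulated directly. The particular f below (a small sine curve) and the error standard deviation are assumptions chosen to resemble the plots that follow, not anything fixed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true function; the model only says f is unknown.
    return 0.05 * np.sin(2 * np.pi * x)

n = 200
x = rng.uniform(0.0, 1.0, n)        # inputs X_i
eps = rng.normal(0.0, 0.01, n)      # errors: mean zero, independent of X
y = f(x) + eps                      # responses Y_i = f(X_i) + eps_i
```

Because the errors have mean zero, they average out rather than pushing the responses systematically up or down.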
[Figure: scatterplot of simulated data, y against x]
[Figure: the same scatterplot with the true curve f overlaid; the vertical distance from each point to the curve is its error εi]
Different Standard Deviations
• The difficulty of estimating f will depend on the
standard deviation of the ε's.
[Figure: four panels of y against x, with noise standard deviations sd = 0.001, 0.005, 0.01 and 0.03]
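The effect the slide describes can be checked numerically. The sine-shaped truth and the moving-average estimator below are illustrative assumptions, not methods from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Hypothetical smooth truth, a stand-in for the slide's unknown f.
    return 0.05 * np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 100)
errors = {}
for sd in (0.001, 0.005, 0.01, 0.03):   # the four panels on the slide
    y = true_f(x) + rng.normal(0.0, sd, x.size)
    # Estimate f by a crude moving average (window of 9 points).
    fhat = np.convolve(y, np.ones(9) / 9, mode="same")
    # Average estimation error over the interior (the edges are biased).
    errors[sd] = np.mean(np.abs(fhat[10:-10] - true_f(x)[10:-10]))
```

A larger error standard deviation means noisier data, and the same estimator recovers f less accurately.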
Why estimate f?
Statistical Learning, and this course, are all
about how to estimate f.
The term statistical learning refers to using the
data to “learn” f.
Why do we care about estimating f?
There are two reasons for estimating f:
Prediction, and
Inference.
1. Prediction
If we can produce a good estimate for f (and the
variance of ε is not too large) we can make
accurate predictions for the response, Y, based
on a new value of X.
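A minimal sketch of prediction, assuming a linear f-hat fitted by least squares to made-up training data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training data: Y roughly linear in X plus small noise.
x = rng.uniform(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.05, 50)

# Fit f-hat by least squares (design matrix with an intercept column).
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the response at a new input value.
x_new = 0.5
y_pred = beta[0] + beta[1] * x_new
```

With a good estimate of f and a small error variance, `y_pred` lands close to the true value 2 + 3(0.5) = 3.5.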
2. Inference
Alternatively, we may be interested in the type
of relationship between Y and the X's.
For example,
Which particular predictors actually affect the
response?
Is the relationship positive or negative?
Is the relationship a simple linear one or is it more
complicated etc.?
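These questions can often be read off a fitted model's coefficients. The data below are simulated with known effects (an assumption made purely for illustration) so the answers can be checked:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: X1 has a positive effect, X2 a negative one,
# and X3 has no effect on the response at all.
n = 500
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.1, n)

design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# The fitted coefficients answer the inference questions directly:
# the sign of the X1 effect, the sign of the X2 effect, and a
# near-zero coefficient flagging X3 as irrelevant.
```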
How Do We Estimate f?
We will assume we have observed a set of
training data
{(X1, Y1), (X2, Y2), …, (Xn, Yn)}
We must then use the training data and a
statistical method to estimate f.
Statistical Learning Methods:
Parametric Methods
Non-parametric Methods
STEP 1:
Make some assumption about the functional form of f,
i.e. come up with a model. The most common
example is a linear model i.e.
f(Xi) = β0 + β1 Xi1 + β2 Xi2 + … + βp Xip
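Under this assumption, estimating f reduces to estimating the p + 1 coefficients. A least-squares sketch with simulated data (the true coefficient values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data generated from a linear model with p = 2 predictors.
n, p = 100, 2
X = rng.uniform(size=(n, p))
beta_true = np.array([0.5, 1.0, -2.0])          # beta_0, beta_1, beta_2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 0.05, n)

# Step 1 turns "estimate a whole function f" into "estimate p + 1 numbers":
design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
```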
• Even if the standard deviation is low, we will
still get a bad answer if we use the wrong model.
• The true f has some curvature that is not
captured in the linear fit.
Non-parametric Methods
They do not make explicit assumptions about the
functional form of f.
Instead, they seek an estimate of f that gets as
close to the data points as possible without being
too rough or wiggly.
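One simple non-parametric estimator in this spirit is k-nearest-neighbour averaging: no formula for f is assumed, and the estimate at a point is just the average of the nearby responses. The data below are simulated for illustration:

```python
import numpy as np

def knn_regress(x_train, y_train, x0, k=5):
    """Estimate f(x0) by averaging the y's of the k nearest training x's."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 200)  # hypothetical data

# No functional form assumed: the estimate simply tracks nearby points.
# The true value here is sin(pi/2) = 1.
fhat_quarter = knn_regress(x, y, 0.25)
```

Increasing k makes the estimate smoother (less wiggly) but slower to follow genuine curvature in f.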
Advantages of Non-parametric Methods
They accurately fit a wider range of possible
shapes of f.
Any parametric approach brings with it the
possibility that the functional form used to
estimate f is very different from the true f, in
which case the resulting model will not fit the
data well.
In contrast, non-parametric approaches
completely avoid this danger, since essentially
no assumption is made about the form of f.
Disadvantage of Non-parametric Methods
Since they do not reduce the problem of
estimating f to a small number of parameters, a
very large number of observations is required to
obtain an accurate estimate of f.
Example of a Non-parametric Approach:
A Thin-Plate Spline Estimate
• Non-linear regression methods are more
flexible and can potentially provide more
accurate estimates.
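A thin-plate spline can be fitted with a small amount of linear algebra. The sketch below builds the exact (interpolating) version in 2-D; the smooth surfaces shown in textbooks usually come from a smoothing variant instead (e.g. scipy's RBFInterpolator with the 'thin_plate_spline' kernel). The data are made up:

```python
import numpy as np

def tps_phi(r):
    # Thin-plate radial basis: r^2 log r, with the limit 0 at r = 0.
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def tps_fit(points, values):
    """Fit an exact (non-smoothing) thin-plate spline through 2-D points."""
    n = len(points)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    K = tps_phi(r)
    P = np.column_stack([np.ones(n), points])   # affine part: 1, x1, x2
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.concatenate([values, np.zeros(3)])
    coefs = np.linalg.solve(A, b)
    return coefs[:n], coefs[n:]                 # kernel weights, affine terms

def tps_eval(points, w, a, query):
    r = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=2)
    return tps_phi(r) @ w + a[0] + query @ a[1:]

rng = np.random.default_rng(7)
pts = rng.uniform(0.0, 1.0, (30, 2))               # hypothetical 2-D inputs
vals = np.sin(2 * np.pi * pts[:, 0]) + pts[:, 1]   # hypothetical responses
w, a = tps_fit(pts, vals)

# An exact spline reproduces the training values it was fitted to.
recon = tps_eval(pts, w, a, pts)
```

The exact version passes through every data point; a smoothing penalty is what keeps the estimate from being "too rough or wiggly" on noisy data.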
Reason 2:
Even if you are only interested in prediction (so the
first reason is not relevant), it is often possible to get
more accurate predictions with a simple model than
with a complicated one. This seems counterintuitive,
but it has to do with the fact that a more flexible
model is harder to fit.
Flexibility vs. Interpretability
A Poor Estimate
• Non-linear regression methods can also be
too flexible and produce poor estimates for f.
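This failure mode is easy to reproduce: fit polynomials of increasing degree to a small noisy sample and compare their error against the (known, simulated) truth. The sine-shaped truth, the sample size and the degrees below are illustrative choices:

```python
import warnings
import numpy as np

x_test = np.linspace(0.05, 0.95, 50)
y_test_true = np.sin(2 * np.pi * x_test)

def avg_test_mse(degree, n_reps=20):
    """Average error (vs the true f) of degree-`degree` polynomial fits."""
    total = 0.0
    for seed in range(n_reps):
        rng = np.random.default_rng(seed)
        x = np.sort(rng.uniform(0.0, 1.0, 20))
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 20)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # high degrees are ill-conditioned
            coefs = np.polyfit(x, y, degree)
        total += np.mean((np.polyval(coefs, x_test) - y_test_true) ** 2)
    return total / n_reps

err_moderate = avg_test_mse(5)    # flexible enough to follow the curve
err_wiggly = avg_test_mse(15)     # chases the noise in 20 data points
```

The degree-15 fit tracks the training points almost exactly, yet its error against the true curve is worse: too much flexibility turns noise into wiggles.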
A Simple Clustering Example
Regression vs. Classification
Supervised learning problems can be further
divided into regression and classification problems.
Regression covers situations where Y is
continuous/numerical, e.g.
Predicting the value of the Dow Jones stock market
index in 6 months
Predicting the value of a given house based on
various inputs
Classification covers situations where Y is
categorical, e.g.
Will the Dow be up (U) or down (D) in 6 months?
Is this email spam or not?
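The same idea (k-nearest neighbours, mentioned just below) handles both cases, which makes the distinction concrete: average the neighbours for a numerical Y, take a majority vote for a categorical Y. The tiny dataset is made up:

```python
import numpy as np

def knn_predict(x_train, y_train, x0, k=5, classify=False):
    """k-nearest-neighbour prediction for a single query point x0.

    Regression: average the k nearest responses (numerical Y).
    Classification: majority vote among the k nearest labels (categorical Y).
    """
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    neighbours = y_train[nearest]
    if classify:
        values, counts = np.unique(neighbours, return_counts=True)
        return values[np.argmax(counts)]
    return neighbours.mean()

x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# Regression: Y is numerical.
y_num = np.array([1.0, 1.2, 0.9, 3.0, 3.1, 2.9])
pred_reg = knn_predict(x, y_num, 0.82, k=3)   # averages 3.0, 3.1, 2.9

# Classification: Y is categorical ("U"p or "D"own).
y_cat = np.array(["D", "D", "D", "U", "U", "U"])
pred_cls = knn_predict(x, y_cat, 0.82, k=3, classify=True)
```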
Different Approaches
We will deal with both types of problems in this
unit.
Some methods work well on both types of
problems, e.g. Neural Networks.
Other methods work best on regression, e.g.
Linear Regression, or on classification, e.g.
k-Nearest Neighbors.
Short Assignment 2
• Find a paper/article/blog post on the web that
discusses the differences between statistical
learning and machine learning.
• Share the link with the class in the Discussion
Forum, and be prepared to discuss the
differences between statistical learning and
machine learning in class next week.