
Machine Learning and Data Analytics

Fundamentals – Part 2

Dr. Rossana Cavagnini

Deutsche Post Chair – Optimization of Distribution Networks (DPO)


RWTH Aachen University

[email protected]

Agenda

1 Machine learning categories

2 What learning means

3 Why and how to estimate f?


1. Supervised learning

Learn a model for predicting or estimating an output based on one or more inputs
For each observation of the predictors, there is an associated response measurement
(labeled training data)
Regression and classification tasks
Examples:
Linear regression
Logistic regression
Boosting
Support Vector Machines
...


1.a Regression problems

Problems with a quantitative response


Quantitative variables (numerical values)


Example: wage data


Predict the wage based on age, education level, and calendar year.

[Figure: three scatter plots of Wage against Age (20–80), Year (2003–2009), and Education Level (1–5)]
Lots of variability → combine the three features


Methodologies: linear regression, but non-linear relationship between wage and age!


1.b Classification problems

Problems with a qualitative (categorical) response


Qualitative variables (values of one of K different classes, categories)


Example: stock market data


Predict whether a market index will increase or decrease based on its performance over
the previous one, two, and three days.

[Figure: boxplots of the percentage change in the S&P over the previous one, two, and three days, grouped by Today's Direction (Down/Up)]

Will the answer fall into the Up or Down bucket?


There is no simple strategy for using yesterday’s movement to predict today’s returns.

Regression vs classification

Sometimes the difference between a regression and a classification task is not clear
from the beginning
Classification tasks can be interpreted as estimating the probability that an element
has a given label
However, theory and tools are very different!
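The probability view of classification can be sketched with a logistic model: a linear score is squashed into [0, 1] as an estimated class probability, then thresholded into a label. The coefficients below are invented for illustration, not learned from any data.

```python
import math

def predict_proba(x, beta0=-0.5, beta1=1.2):
    """Estimated probability that observation x has the label 'Up'."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def predict_label(x, threshold=0.5):
    """Turn the probability estimate into a categorical prediction."""
    return "Up" if predict_proba(x) >= threshold else "Down"
```

Changing `threshold` trades one kind of misclassification against the other without refitting anything, which is one way the probability view pays off in practice.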


2. Unsupervised learning
There are inputs but no supervising output
We can learn relationships and structure from data
We observe only features and have no measurements of the outcome
There is no response variable to predict


Example: Market segmentation study


Dataset with zip code and family income for customers. Determine whether there are
clusters of customers (goal: are there spending patterns?).

[Figure: two scatter plots of customers in the (X1, X2) plane, showing potential clusters]

If we had the spending patterns among the observed variables → supervised task
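A minimal clustering sketch in the spirit of this example: a hand-rolled k-means with k = 2 on made-up (X1, X2) points. The data and the naive deterministic initialization are assumptions for illustration, not the slide's dataset.

```python
import math

def kmeans(points, k=2, iters=20):
    # Naive deterministic initialization (assumes k = 2):
    # use the first and last point as starting centers.
    centers = [points[0], points[-1]]
    clusters = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its cluster.
        centers = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Two obvious groups of customers in (X1, X2) space.
pts = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
centers, clusters = kmeans(pts)
```

A real k-means uses random restarts and a convergence check; scikit-learn's KMeans is the usual tool for this kind of segmentation study.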


What learning means


X : input variables (predictors, independent variables, features, ...)
Y : output variables (response variables, dependent variables, target, ...)
Hypothesis: there is a certain hidden (unknown) relation between the inputs and the
outputs. Let’s estimate it!
Learning refers to the set of approaches for estimating f

Y = f(X) + ε

f: fixed, unknown function of X1 , . . . , Xp (the systematic information that X provides about Y )

ε: error term, standing in for complex real-world relations vs. simplifications and assumptions
- independent of X : properties of Y which cannot be inferred from the features
- mean zero (E(ε) = 0): over (infinitely) many observations, the effects of the error average out

Example for the error term


Describe the grade obtained in a class as a function of some inputs: share of class
attendance, number of hours of study, student’s GPA, number of weekends spent
partying, availability of a quiet room to study
These features are not enough (e.g., a student gets a cold and performs poorly) →
uncertainty captured in ε
The error ε:
is independent of the other variables: we cannot guess whether a student will have a bad
exam day (e.g., a cold) from the other input features.
has zero mean: on average, over a very large number of observations, there will be as
many students with unlucky days as with lucky ones
We could reduce this error by increasing the number of features (ex. add the feature
“body temperature on the exam day”)
But we could still miss something...
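The two properties of ε can be checked numerically. Below is a small simulation under an invented f and noise scale; in a real dataset ε is unobserved, the point here is only to make the zero-mean assumption visible.

```python
import random

random.seed(42)
f = lambda x: 2.0 * x + 1.0                   # invented "true" systematic part
xs = [random.uniform(0, 10) for _ in range(100_000)]
eps = [random.gauss(0.0, 1.0) for _ in xs]    # drawn independently of X
ys = [f(x) + e for x, e in zip(xs, eps)]      # Y = f(X) + eps

mean_eps = sum(eps) / len(eps)                # close to 0 over many observations
```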


Example: advertising data


Sales of a product in 200 different markets, with advertising budgets for the product
in each of those markets for three different media: TV, radio, and newspaper.
Develop a model to predict sales based on the three media budgets.

[Figure: three scatter plots of Sales against TV, Radio, and Newspaper advertising budgets]

X : budget
X1 : TV budget
X2 : radio budget
X3 : newspaper budget
Y : sales

Example: income data


Develop a model to predict the income based on the years of education.

X : years of education
Y : income

[Figure: scatter plot of Income vs. Years of Education. Blue line: true relationship (unknown); black segments: errors (approx. mean zero)]

f may involve more than one input variable

[Figure: 3D surface of Income as a function of Years of Education and Seniority]


1. Prediction

Goal: find a good fˆ yielding good predictions Ŷ and keeping the error as low as
possible
fˆ: estimate for f
Ŷ : resulting prediction for Y
Since the error term averages to zero:
Ŷ = fˆ(X )
The accuracy of Ŷ as a prediction of Y depends on:
reducible error: improve the accuracy of fˆ by choosing the most appropriate learning
technique (if fˆ = f , this term disappears)
irreducible error: even if fˆ = f , this term remains, because Y is also a function of ε.
Examples: unmeasured variables, unmeasured variations...
→ We will study techniques for estimating f while minimizing the reducible error
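The split into reducible and irreducible error can be made concrete with a simulation. Here f, the noise level, and the two candidate estimates are all made up: a perfect estimate hits the irreducible floor of σ², while a biased one adds a reducible part on top.

```python
import random

random.seed(0)
f = lambda x: 3.0 * x              # "true" f, unknown in practice
f_hat_good = lambda x: 3.0 * x     # fhat = f: only irreducible error remains
f_hat_poor = lambda x: 2.5 * x     # biased fhat: adds reducible error

def mse(f_hat, n=50_000, sigma=1.0):
    total = 0.0
    for _ in range(n):
        x = random.uniform(0, 2)
        y = f(x) + random.gauss(0.0, sigma)   # Y = f(X) + eps
        total += (y - f_hat(x)) ** 2
    return total / n

mse_good = mse(f_hat_good)   # around sigma^2 = 1.0: the irreducible floor
mse_poor = mse(f_hat_poor)   # around 1.0 + E[(0.5 X)^2] = 4/3: floor + reducible part
```

No choice of learning technique can push the error below the first value; better techniques only shrink the gap between the two.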


2. Inference

Goal: explain the relationship between inputs and outputs


How does Y change as a function of X1 , . . . , Xp ?
Which predictors are associated with the response? Identify the few important predictors
What is the relationship between the response and each predictor? Positive vs. negative
relationship
Example: advertising data
Which media contribute to sales?
Which media generate the biggest boost in sales?
How much sales increase is associated with a given increase in TV advertising?


How to estimate f?

Training data: n different observations (rows), p different features (columns)


xij : value of the jth feature for observation i, i = 1, 2, . . . , n and j = 1, 2, . . . , p
yi : value of the response variable for the ith observation
Training data: {(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )}, where xi = (xi1 , xi2 , . . . , xip )T
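As a concrete (made-up) instance of this notation, with n = 3 observations and p = 2 features:

```python
# Rows are observations, columns are features; Python indices are 0-based,
# so x_ij from the slide corresponds to X[i - 1][j - 1].
X = [
    [5.1, 3.5],   # x_1 = (x_11, x_12)
    [4.9, 3.0],   # x_2
    [6.2, 2.9],   # x_3
]
y = [0.4, 0.2, 1.1]        # y_i: response for observation i
train = list(zip(X, y))    # {(x_1, y_1), ..., (x_n, y_n)}

n, p = len(X), len(X[0])
x_32 = X[3 - 1][2 - 1]     # value of feature j = 2 for observation i = 3
```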

Goal: apply a learning method to the training data to estimate the unknown function
f (i.e. find fˆ s.t. Y ≈ fˆ(X ) for any observation (X , Y )).
Learning methods
1 Parametric learning methods
2 Non-parametric learning methods


1. Parametric methods
A two-step model-based approach
1 Assumption on the function shape
Example: f is linear in X → linear model, i.e.:

f (X ) = β0 + β1 X1 + β2 X2 + · · · + βp Xp

2 After selecting a model, choose a procedure that uses the training data to fit the
model
Example: estimate the parameters β0 , β1 , . . . , βp , i.e.:

Y ≈ β0 + β1 X1 + β2 X2 + · · · + βp Xp
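For a single predictor (p = 1), step 2 has a closed form. A sketch with invented data roughly following y ≈ 2x:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares estimates for the model Y ≈ beta0 + beta1 * X
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
beta0 = y_bar - beta1 * x_bar
```

With p predictors the same idea is solved as a linear system (the normal equations), but the estimation target is still just the fixed set of parameters β0 , . . . , βp.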


The problem of estimating f is reduced to estimating a set of parameters which minimize
the error of the estimator (i.e. training)
The chosen model will usually not match the true unknown form of f (poor estimate)
Possible solution: flexible models that can fit many different functional forms for f
→ it requires estimating a greater number of parameters
→ risk of overfitting the data: they follow the errors (or noise) too closely
Parametric models are the most common type of model used for ML
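The overfitting risk can be made concrete. The sketch below uses an extremely flexible fit (1-nearest-neighbour, chosen for brevity rather than as a parametric model) on invented data from Y = f(X) + ε: training error is exactly zero because the noise is memorized, while error on fresh data from the same process is much worse.

```python
import random

random.seed(1)
f = lambda x: 2.0 * x
noise = lambda: random.gauss(0.0, 1.0)
train = [(i / 10, f(i / 10) + noise()) for i in range(50)]
test = [(i / 10 + 0.05, f(i / 10 + 0.05) + noise()) for i in range(50)]

def one_nn(x, data):
    """Predict with the single closest training point (maximal flexibility)."""
    return min(data, key=lambda p: abs(p[0] - x))[1]

mse = lambda data: sum((y - one_nn(x, train)) ** 2 for x, y in data) / len(data)

train_mse = mse(train)   # exactly 0: the fit follows the errors (noise) perfectly
test_mse = mse(test)     # roughly 2 * sigma^2: the memorized noise does not generalize
```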


Example: income data


income ≈ β0 + β1 education + β2 seniority
We assumed a linear relationship: only estimate β0 , β1 , β2 (least squares linear
regression)
[Figure: two 3D plots of Income as a function of Years of Education and Seniority. Left: a linear model fit by least squares to the income data. Right: a smooth thin-plate spline fit]


2. Non-parametric methods

No explicit assumptions about the function shape


An estimate of f that gets as close as possible to the data points
Advantage: accurately fit a wider range of possible shapes for f
Disadvantage: a very large number of observations is required
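A classic non-parametric method is k-nearest-neighbours regression: no functional form is assumed, and the prediction at x is simply the average response of the k closest training points. The data and the choice k = 3 below are invented for illustration.

```python
def knn_predict(x, train, k=3):
    """Average the responses of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Noisy observations of a relationship the method makes no assumption about.
train = [(1, 1.0), (2, 4.1), (3, 8.9), (4, 16.2), (5, 24.8)]
pred = knn_predict(3.1, train)   # mean of the responses at x = 2, 3, 4
```

The choice of k controls smoothness: a small k tracks the data points closely (flexible), a large k averages more heavily (smoother), which connects directly to the smoothness question in the next example.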


Example: income data


What is the correct amount of smoothness?

[Figure: two thin-plate spline fits of Income as a function of Years of Education and Seniority. Left: thin-plate spline for the income data. Right: thin-plate spline with a lower level of smoothness]


The trade-off between prediction accuracy and model interpretability

More restrictive models (ex. linear regression) offer a small range of shapes for f
More flexible models (ex. thin plate splines) offer a wider range of shapes for f
Why choose a more restrictive method?
Inference: restrictive models are more interpretable
Prediction: flexible models are less interpretable, but give more accurate predictions


[Figure: methods arranged by flexibility (low to high) and interpretability (high to low): Subset Selection and Lasso; Least Squares; Generalized Additive Models and Trees; Bagging and Boosting; Support Vector Machines]