
*** Applied Machine Learning Fundamentals ***

Regression

Abdelkrim EL MOUATASIM, Full Professor of AI


UIZ FPO - DMG

Winter term 2021/2022

Find all slides on https://sites.google.com/a/uiz.ac.ma/elmouatasim/


(Master-MASED: Machine Learning via Python)

Lecture Overview

Unit I Machine Learning Introduction


Unit II Mathematical Foundations
Unit III Regression
Unit IV Classification I
Unit V Evaluation
Unit VI Classification II
Unit VII Clustering
Unit VIII Deep learning


Agenda for this Unit

1 Introduction
    What is Regression?
    Least Squares Error Function
2 Solutions to Regression
    Closed-Form Solutions and Normal Equation
    Gradient Descent
3 Probabilistic Regression
    Underlying Assumptions
    Maximum Likelihood Solution
4 Basis Function Regression
    General Idea
    Polynomial Basis Functions
    Radial Basis Functions
    Regularization Techniques
5 Wrap-Up
    Summary
    Self-Test Questions
    Lecture Outlook
    Recommended Literature and further Reading
    Meme of the Day

Section: Introduction

Regression
Type of target variable: Continuous
Type of training information: Supervised
Example availability: Batch learning

Algorithm sketch: Given the training data D, the algorithm derives a function of the type

  hθ(x) = θ0 + θ1 x1 + · · · + θm xm,   x ∈ R^m, θ ∈ R^(m+1)   (1)

from the data. θ is the parameter vector containing the coefficients to be estimated by
the regression algorithm. Once θ is learned, it can be used for prediction.


Example Data Set: Revenues

(Figure: scatter plot of revenue y against marketing expenses x1.)

• Find a linear function:

  hθ(x) = θ0 + θ1 x1 + · · · + θm xm

• Usually we set x0 = 1 and work with the augmented input

  x̂ = [1 x]⊺ ∈ R^(m+1)

  so that

  hθ(x̂) = Σ_{j=0}^{m} θj xj = θ⊺ x̂
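
As a small illustration (an assumption of this write-up, not part of the original slides), prepending the constant x0 = 1 in Python/NumPy looks like this:

import numpy as np

def add_bias(X):
    # Prepend a column of ones (x0 = 1) to the n x m input matrix X.
    return np.hstack([np.ones((X.shape[0], 1)), X])   # shape: n x (m + 1)

def h(theta, X_hat):
    # Linear hypothesis h_theta(x_hat) = theta^T x_hat, evaluated for all rows at once.
    return X_hat @ theta

# Hypothetical toy data: marketing expenses (x1) -> revenue (y)
X = np.array([[2.0], [4.0], [6.0]])    # n = 3 examples, m = 1 feature
theta = np.array([1.0, 3.0])           # [theta_0, theta_1]
print(h(theta, add_bias(X)))           # [ 7. 13. 19.]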


Error Function for Regression


• We need an error function J(θ) in order to know how well the function fits:

  J(θ) = 1/(2n) Σ_{i=1}^{n} (hθ(x̂^(i)) − y^(i))²   (2)

• We want to minimize J(θ):

  min_θ 1/(2n) Σ_{i=1}^{n} (hθ(x̂^(i)) − y^(i))²

• This is ordinary least squares (OLS)
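
The cost (2) translates directly into NumPy; the following is a minimal sketch assuming X_hat already contains the bias column from the previous slide:

import numpy as np

def J(theta, X_hat, y):
    # OLS cost: J(theta) = 1/(2n) * sum_i (h_theta(x_i) - y_i)^2
    n = X_hat.shape[0]
    residuals = X_hat @ theta - y
    return residuals @ residuals / (2 * n)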



Error Function Intuition


y x (i) ) − y (i) )
(hθ (b

4 hθ (b
x) = θ x
y (i)
b
3

2
Why the square
in the error function?
1

x
1 2 3 4 5 6 7 8 9

xb(i)
Section: Solutions to Regression

Closed-Form Solutions
• Usual approach (for two unknowns): Calculate θ0 and θ1 from the sample means x̄ and ȳ:

  θ0 = ȳ − θ1 x̄,   θ1 = Σ_{i=1}^{n} (x^(i) − x̄)(y^(i) − ȳ) / Σ_{i=1}^{n} (x^(i) − x̄)²   (3)

• `Normal equation' (scales to arbitrary dimensions):

  θ = (X̂⊺ X̂)^(−1) X̂⊺ y   (4)

  where (X̂⊺ X̂)^(−1) X̂⊺ is the Moore-Penrose pseudo-inverse and X̂ is called the `design matrix' or `regressor matrix'
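
For the two-unknown case, equation (3) takes only a few lines; this is an illustrative sketch (assuming 1-D NumPy arrays x and y), not the slide's own code:

import numpy as np

def fit_simple(x, y):
    # Closed-form solution for h(x) = theta0 + theta1 * x with a single feature.
    x_bar, y_bar = x.mean(), y.mean()
    theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1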



Design Matrix / Regressor Matrix


• The design matrix X̂ ∈ R^(n×(m+1)) looks as follows (in the following, X̂ ≡ X):

        ⎡ 1  x1^(1)  x2^(1)  ···  xm^(1) ⎤
        ⎢ 1  x1^(2)  x2^(2)  ···  xm^(2) ⎥
  X̂  =  ⎢ 1  x1^(3)  x2^(3)  ···  xm^(3) ⎥   (5)
        ⎢ ⋮    ⋮       ⋮      ⋱     ⋮    ⎥
        ⎣ 1  x1^(n)  x2^(n)  ···  xm^(n) ⎦

• And the n × 1 label vector:

  y = [y^(1), y^(2), y^(3), . . . , y^(n)]⊺


Derivation of the Normal Equation

• The derivation involves a bit of linear algebra


• Step ❶: Rewrite J(θ) in matrix-vector notation:

  J(θ) = 1/2 (Xθ − y)⊺ (Xθ − y)
       = 1/2 ((Xθ)⊺ − y⊺)(Xθ − y)
       = 1/2 ((Xθ)⊺Xθ − (Xθ)⊺y − y⊺(Xθ) + y⊺y)
       = 1/2 (θ⊺X⊺Xθ − 2(Xθ)⊺y + y⊺y)

• To be continued...

Derivation of the Normal Equation (Ctd.)


• Step ❷: Calculate the derivative of J(θ) and set it to zero:

  ∇θ J(θ) = 1/2 (2X⊺Xθ − 2X⊺y) = 0
  ⇔ X⊺Xθ = X⊺y

• If X⊺X is invertible, we can multiply both sides by (X⊺X)^(−1):

  Normal equation:  θ = (X⊺X)^(−1) X⊺ y
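
A sketch of the normal equation in NumPy (illustrative; a linear solve or np.linalg.lstsq is usually preferred over forming the inverse explicitly):

import numpy as np

def fit_normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y, computed via a linear solve for numerical stability.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Alternative that uses the Moore-Penrose pseudo-inverse internally:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)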


Problems with Matrix Inversion?


• What if (X⊺X)^(−1) does not exist?
• Problems and solutions:
  1 Linearly dependent (redundant) features, or the design matrix does not have full rank
    (e. g. size in m² and size in feet²)?
    ⇒ Delete correlated features
  2 Too many features (m > n)?
    ⇒ Delete features (e. g. using PCA) / add training examples
  3 Other numerical instabilities?
    ⇒ Add a regularization term (later)
  4 Computationally too expensive?
    ⇒ Use gradient descent

Gradient Descent
• We want to minimize a smooth function J : R^(m+1) → R:

  min_{θ ∈ R^(m+1)} J(θ)

• Update the parameters iteratively:

  θ^(t+1) ←− θ^(t) − α ∇θ J(θ^(t))   (6)

• where α > 0 is the learning rate and ∇θ J(θ) is the gradient of J(θ) w. r. t. θ:

  ∇θ J(θ) = [∂J(θ)/∂θ0, ∂J(θ)/∂θ1, . . . , ∂J(θ)/∂θm]⊺
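
The update rule (6) as a generic loop; a minimal sketch in which grad_J is a placeholder for whatever gradient function the model supplies:

import numpy as np

def gradient_descent(grad_J, theta_init, alpha=0.01, n_iters=1000):
    # Repeat theta <- theta - alpha * grad_J(theta) for a fixed number of iterations.
    theta = np.array(theta_init, dtype=float)
    for _ in range(n_iters):
        theta = theta - alpha * grad_J(theta)
    return theta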


Data Input Space vs. Hypothesis Space

(Figure. Left, data input space: the fitted line y = θ0 + θ1 x through the data points in the
(x, y) plane. Right, hypothesis space H: the error surface J(θ) over the parameters θ0 and θ1.)


Data Input Space vs. Hypothesis Space (Ctd.)

• Data input space
  • Determined by the m + 1 attributes of the data set x0, x1, x2, . . . , xm
  • Often high-dimensional
• Hypothesis space H
  • Determined by the number of parameters of the model
  • Each point in the hypothesis space corresponds to a specific assignment of model parameters
  • The error function gives information about how good this assignment is
  • Gradient descent is applied in the hypothesis space H


Data Input Space vs. Hypothesis Space (Ctd.)


Visualization of Gradient Descent in 3 Dimensions

(Figure: 3-D surface plot of the error J(θ) over the parameters θ0 and θ1.)


Versions of Gradient Descent


• Assume some training data D = {(x^(i), y^(i))}_{i=1}^{n}
• Squared error for a single example: ℓ(y_pred, y_true) = (y_pred − y_true)²
• Our objective is to minimize the total error:

  min_{θ ∈ R^(m+1)} J(θ) = min_{θ ∈ R^(m+1)} Σ_{i=1}^{n} ℓ(hθ(x^(i)), y^(i))

• Three versions of gradient descent:
  1 Batch gradient descent
  2 Stochastic gradient descent
  3 Mini-batch gradient descent

Versions of Gradient Descent (Ctd.)


• Batch gradient descent: Compute the gradient based on ALL data points:

  θ^(t+1) ←− θ^(t) − α Σ_{i=1}^{n} ∇ℓ(hθ^(t)(x^(i)), y^(i))   (7)

• Stochastic gradient descent: Compute the gradient based on a SINGLE data point
  (pick the training example randomly, not sequentially!). For i ∈ {1, . . . , n} do:

  θ^(t+1) ←− θ^(t) − α ∇ℓ(hθ^(t)(x^(i)), y^(i))   (8)
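
A sketch of the stochastic variant (8) for linear regression with squared error (assumes NumPy and a design matrix X_hat with bias column; the factor 2 from the derivative is folded into the learning rate):

import numpy as np

def sgd_linear_regression(X_hat, y, alpha=0.01, n_epochs=50, seed=0):
    # Stochastic gradient descent: update theta from one randomly chosen example at a time.
    rng = np.random.default_rng(seed)
    n, d = X_hat.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):            # random order, not sequential
            error = X_hat[i] @ theta - y[i]     # h_theta(x_i) - y_i
            theta -= alpha * error * X_hat[i]   # gradient of the per-example squared error
    return theta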


Solving linear Regression using Gradient Descent


• Randomly initialize θ
• To minimize the error, keep changing θ according to:
θ (t+1) ←− θ (t) − α∇θ J(θ (t) ) (9)
• We need to calculate ∇θj J(θ (based on a single example)
(t)
):
∂ 1 X 1X
n n
(hθ (x (i) ) − y (i) )2 = 2 · (hθ (x (i) ) − y (i) ) · (hθ (x (i) ) − y (i) )
∂ ∂
J(θ) =
∂θj ∂θj 2 2 ∂θj
i=1 i=1
(10)
X
n X
n
(hθ (x (i) ) − y (i) ) · (hθ (x (i) ) − y (i) )xj
∂ (i) (i)
= (θ0 x0 + · · · + θm xm(i) − y (i) ) =
∂θj
i=1 i=1
(11)
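
The derived gradient (11) in vectorized form gives a compact batch gradient descent routine; a sketch in which dividing by n is an assumption that keeps the step size independent of the data set size:

import numpy as np

def batch_gd_linear_regression(X_hat, y, alpha=0.1, n_iters=5000):
    # Batch gradient descent: grad_j = sum_i (h_theta(x_i) - y_i) * x_ij, here averaged over n.
    n, d = X_hat.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        errors = X_hat @ theta - y        # residuals h_theta(x_i) - y_i for all i
        theta -= alpha * (X_hat.T @ errors) / n
    return theta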

Solving the introductory Example

(Figure: revenue R (y) plotted against marketing expenses M (x1) with the fitted regression line.)

• θ0 ≈ 7.4218
• θ1 ≈ 2.9827
• J(θ) ≈ 446.9584
• hθ(x) = 7.4218 + 2.9827 · x1
• R = hθ(2.7) = 15.4750

Disadvantage of Gradient Descent

(Figure: a non-convex error function J(θ) with a local optimum and a global optimum;
gradient descent can get stuck in the local optimum.)

Section: Probabilistic Regression

Probabilistic Regression

• Assumption 1: The target function values are generated by adding noise to the function estimate:

  y = hθ(x) + ε   (12)

• Assumption 2: The noise is a Gaussian random variable:

  ε ∼ N(0, β^(−1))   (13)

  p(y|x, θ, β) = N(y | hθ(x), β^(−1))   (14)

  where β denotes the precision, β = 1/σ²

• y is now a random variable!


Probabilistic Regression (Ctd.)

(Figure cf. [1], p. 29: probabilistic regression. The notation there maps to ours as
w ≡ θ, t ≡ y, y(x0, w) ≡ hθ(x0).)



Maximum Likelihood Regression

• Given: A labeled set of training data points {(x^(i), y^(i))}_{i=1}^{n}
• Conditional likelihood (assuming the data is i. i. d.):

  p(y|X, θ, β) = Π_{i=1}^{n} N(y^(i) | hθ(x^(i)), β^(−1))   (15)
              = Π_{i=1}^{n} N(y^(i) | θ⊺x^(i), β^(−1))   (16)

• Maximize the likelihood w. r. t. θ and β


Maximum Likelihood Regression (Ctd.)


Simplify using the log-likelihood:

  log p(y|X, θ, β) = Σ_{i=1}^{n} log N(y^(i) | θ⊺x^(i), β^(−1))   (17)

                   = Σ_{i=1}^{n} [ log(√β / √(2π)) − β/2 (y^(i) − θ⊺x^(i))² ]   (18)

  (Remember the log-rules?)

                   = n/2 log β − n/2 log(2π) − β/2 Σ_{i=1}^{n} (y^(i) − θ⊺x^(i))²   (19)


Maximum Likelihood Regression (Ctd.)

• Compute the gradient w. r. t. θ and set it to zero:

  ∇θ log p(y|X, θ, β) = 0
  ⇔ −β Σ_{i=1}^{n} (y^(i) − θ⊺x^(i)) x^(i) = 0
  ...
  θ_ml = (X⊺X)^(−1) X⊺ y

• Same result as in least squares regression



We have derived the squared Error!

Minimizing the squared error gives the maximum likelihood solution for the parameters θ assuming Gaussian noise.

• The maximum likelihood approach gives rise to the squared error
• But it is much more powerful than regular least squares ⇒ We can also estimate the uncertainty (noise precision) β:

  β_ml = [ 1/n Σ_{i=1}^{n} (y^(i) − θ_ml⊺ x^(i))² ]^(−1)   (20)
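
Both maximum likelihood quantities are easy to compute once the residuals are available; the following sketch (assuming NumPy) mirrors equations (4) and (20):

import numpy as np

def ml_linear_regression(X, y):
    # theta_ml: least squares / maximum likelihood solution; beta_ml: noise precision from (20).
    theta_ml, *_ = np.linalg.lstsq(X, y, rcond=None)   # equivalent to (X^T X)^(-1) X^T y
    residuals = y - X @ theta_ml
    beta_ml = 1.0 / np.mean(residuals ** 2)            # inverse of the mean squared residual
    return theta_ml, beta_ml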
Section: Basis Function Regression

What if the Data is non-linear?

(Figure: a scatter plot of clearly non-linear data, y against x.)

• So far we have fitted straight lines
• What if the data is not linear...?

The best-fitting function is obviously not a straight line!

What would you do?


Basis Functions
• Remember: `When stuck, switch to a different perspective'
• We can add higher-order features using basis functions φ (we assume 1-D data):

  hθ(x) = Σ_{j=0}^{p} θj φj(x)   (21)

• There exist several types of basis functions:
  • linear: φ0(x) = 1 and φ1(x) = x
  • polynomial ⇒ see below
  • radial basis functions (RBFs) ⇒ see below
  • Fourier basis

New Design Matrix


Applying the basis functions to X we get the new design matrix Φ:

        ⎡ φ0(x^(1))  φ1(x^(1))  φ2(x^(1))  ···  φp(x^(1)) ⎤
        ⎢ φ0(x^(2))  φ1(x^(2))  φ2(x^(2))  ···  φp(x^(2)) ⎥
  Φ  =  ⎢     ⋮          ⋮          ⋮       ⋱       ⋮     ⎥   (22)
        ⎣ φ0(x^(n))  φ1(x^(n))  φ2(x^(n))  ···  φp(x^(n)) ⎦

The model is still linear in the parameters, so we can still use the same algorithm
as before. This is still linear regression (!!!)


Polynomial Basis Functions

• A quite frequently used basis function: The polynomial basis (for N-D data we would also include cross-terms!):

  φ0(x) = 1
  φj(x) = x^j

  hθ(x) = Σ_{j=0}^{p} θj φj(x) = θ0 + θ1 x + θ2 x² + · · · + θp x^p

• Here, p is the degree of the polynomial
• Here: φ(x) = [1, x, x², x³, . . . , x^p]
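
Polynomial basis regression for 1-D inputs reduces to building Φ and reusing the least squares machinery; a minimal sketch (not the slide's own code):

import numpy as np

def polynomial_design_matrix(x, p):
    # Phi[i, j] = x_i ** j for j = 0..p (1-D inputs).
    return np.vander(x, N=p + 1, increasing=True)

def fit_polynomial(x, y, p):
    Phi = polynomial_design_matrix(x, p)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # still linear regression!
    return theta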

It is still linear!


It is still linear! (Ctd.)


Basis Functions: Radial Basis Functions

• Yet another possible choice of basis function: Radial basis functions

  φ0(x) = 1   (23)
  φj(x) = exp(−∥x − zj∥² / (2σ²))   (24)

• {zj} are the centers of the radial basis functions
• p denotes the number of centers / number of radial basis functions
• Often we take each data point as a center, so p = n
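
A sketch of the corresponding feature map for 1-D inputs, taking the training points themselves as centers (p = n), as mentioned above:

import numpy as np

def rbf_design_matrix(x, centers, sigma=1.0):
    # Phi[i, 0] = 1 and Phi[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2)) for j >= 1 (1-D inputs).
    sq_dists = (x[:, None] - centers[None, :]) ** 2
    rbf = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), rbf])

# Example: Phi = rbf_design_matrix(x_train, centers=x_train, sigma=1.0)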


Radial Basis Functions (Ctd.)

(Figure: the non-linear example data set, y against x.)


The Danger of too expressive Models...


(Figure. Left: polynomial of degree p = 16, severe overfitting. Right: RBF with σ = 1.00 and p = n, about right.)


Overfitting vs. Underfitting

• Underfitting
  • The model is not complex enough to fit the data well ⇒ High bias
  • Make the model more complex; adding new examples does not help
• Overfitting
  • The model predicts the training data perfectly
  • But it fails to generalize to unseen instances ⇒ High variance
  • Decrease the degrees of freedom or add more training examples
  • Also: Try regularization
• Bias-variance trade-off

First Solution: Smaller Degree


One solution: Use a smaller degree (here: p = 3). Much better :)

(Figure: the data with the degree-3 polynomial regression line.)

Second Solution: Regularization


• Enrich J(θ) with a regularization term
• This can prevent overfitting and results in a smoother function (large values for θj are prevented)
• Two forms of regularization, L1 and L2:

  min_θ J(θ) + λ|θ|   (L1)          min_θ J(θ) + λ∥θ∥²   (L2)

  with |θ| = Σ_{j=1}^{m} |θj|   and   ∥θ∥² = Σ_{j=1}^{m} θj²

• λ ⩾ 0 controls the degree of regularization



Regularization visualized

• Here: w ≡ θ
• L1-Regularization ⇒ Lasso regression (least absolute shrinkage and selection operator)
• L2-Regularization ⇒ Ridge regression (Tikhonov regularization)
• The combination of both is called elastic net

(Figure cf. [1], p. 146; left: L2, right: L1)


Incorporating Regularization
• Normal equation with regularization (ridge regression); the regularization also helps to overcome numerical issues!

  θ = (X⊺X + λI)^(−1) X⊺ y   (25)

• Regularized gradient descent update rule:

  θ^(t+1) ←− θ^(t) − α ∇θ J(θ^(t))

  with   ∂J(θ)/∂θj = (hθ(x) − y) xj + λθj
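
Equation (25) is a one-liner on top of the earlier normal-equation code; this sketch keeps the full λI as written above (in practice the bias θ0 is often excluded from the penalty):

import numpy as np

def fit_ridge(X_hat, y, lam=1.0):
    # theta = (X^T X + lambda * I)^(-1) X^T y, solved without forming the inverse explicitly.
    d = X_hat.shape[1]
    return np.linalg.solve(X_hat.T @ X_hat + lam * np.eye(d), X_hat.T @ y)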

Polynomial Regression with Regularization


(Figure. Left: regularized polynomial fit, at least better. Right: way too much regularization.)

Section: Wrap-Up

Summary
• Regression predicts continuous target variables
• The algorithm minimizes the (mean) squared error
• Minimizing the squared error gives the maximum likelihood solution
• Two approaches:
1 Normal equation
2 (Batch / stochastic / mini-batch) gradient descent
• Probabilistic regression allows us to quantify the uncertainty of the model
• Use basis functions to fit non-linear regression lines
• Regularization is important

Self-Test Questions

1 What is the goal of regression?


2 What can you do if matrix inversion fails for the normal equation?
3 What is a suitable cost function for regression? Where does it come from?
4 Does gradient descent give the exact solution?
5 What is the advantage of probabilistic regression?
6 What are basis functions? Why use them? State some examples.
7 What is overfitting / underfitting?
8 What is regularization? Why should you apply it?

What's next...?

Unit I Machine Learning Introduction


Unit II Mathematical Foundations
Unit III Regression
Unit IV Classification I
Unit V Evaluation
Unit VI Classification II
Unit VII Clustering
Unit VIII Deep learning


Recommended Literature and further Reading I

[1] Pattern Recognition and Machine Learning. Christopher Bishop. Springer. 2006. → Link, cf. chapter 3.1

[2] Machine Learning: A Probabilistic Perspective. Kevin Murphy. MIT Press. 2012. → Link, cf. chapters 1.4.5 and 1.4.7

[3] Stanford CS229 course notes. Andrew Ng. Stanford University. 2019. → Link


Recommended Literature and further Reading II

[4] Stanford CS229 course recording. Andrew Ng. Stanford University. 2008. → Link


Meme of the Day

Thank you very much for your attention!

Topic: *** Applied Machine Learning Fundamentals *** Regression


Term: Winter term 2021/2022
Contact:
Abdelkrim EL MOUATASIM, Full Professor of AI
UIZ FPO - DMG
[email protected]

Do you have any questions?
