
*** Applied Machine Learning Fundamentals ***

Regression

Abdelkrim EL MOUATASIM, Full Professor of AI


UIZ FPO - DMG

Winter term 2021/2022

Find all slides on https://sites.google.com/a/uiz.ac.ma/elmouatasim/


(Master-MASED: Machine Learning via Python)

Lecture Overview

Unit I Machine Learning Introduction


Unit II Mathematical Foundations
Unit III Regression
Unit IV Classification I
Unit V Evaluation
Unit VI Classification II
Unit VII Clustering
Unit VIII Deep learning


Agenda for this Unit

1 Introduction
    What is Regression?
    Least Squares Error Function
2 Solutions to Regression
    Closed-Form Solutions and Normal Equation
    Gradient Descent
3 Probabilistic Regression
    Underlying Assumptions
    Maximum Likelihood Solution
4 Basis Function Regression
    General Idea
    Polynomial Basis Functions
    Radial Basis Functions
    Regularization Techniques
5 Wrap-Up
    Summary
    Self-Test Questions
    Lecture Outlook
    Recommended Literature and further Reading
    Meme of the Day

Section: Introduction

Regression
Type of target variable: Continuous
Type of training information: Supervised
Example availability: Batch learning

Algorithm sketch: Given the training data D, the algorithm derives a function of the type

  hθ(x) = θ0 + θ1 x1 + · · · + θm xm,   x ∈ R^m, θ ∈ R^(m+1)   (1)

from the data. θ is the parameter vector containing the coefficients to be estimated by
the regression algorithm. Once θ is learned, it can be used for prediction.


Example Data Set: Revenues

(Figure: scatter plot of revenue y against marketing expenses x1.)

• Find a linear function:

  hθ(x) = θ0 + θ1 x1 + · · · + θm xm

• Usually we set x0 = 1 and work with the augmented input

  x̂ = [1 x]⊺ ∈ R^(m+1)

  so that

  hθ(x̂) = Σ_{j=0}^{m} θj xj = θ⊺ x̂
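
As a small illustration (an assumption of this write-up, not part of the original slides), prepending the constant x0 = 1 in Python/NumPy looks like this:

import numpy as np

def add_bias(X):
    # Prepend a column of ones (x0 = 1) to the n x m input matrix X.
    return np.hstack([np.ones((X.shape[0], 1)), X])   # shape: n x (m + 1)

def h(theta, X_hat):
    # Linear hypothesis h_theta(x_hat) = theta^T x_hat, evaluated for all rows at once.
    return X_hat @ theta

# Hypothetical toy data: marketing expenses (x1) -> revenue (y)
X = np.array([[2.0], [4.0], [6.0]])    # n = 3 examples, m = 1 feature
theta = np.array([1.0, 3.0])           # [theta_0, theta_1]
print(h(theta, add_bias(X)))           # [ 7. 13. 19.]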


Error Function for Regression


• We need an error function J(θ) in order to know how well the function fits:

  J(θ) = 1/(2n) Σ_{i=1}^{n} (hθ(x̂^(i)) − y^(i))²   (2)

• We want to minimize J(θ):

  min_θ 1/(2n) Σ_{i=1}^{n} (hθ(x̂^(i)) − y^(i))²

• This is ordinary least squares (OLS)
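
The cost (2) translates directly into NumPy; the following is a minimal sketch assuming X_hat already contains the bias column from the previous slide:

import numpy as np

def J(theta, X_hat, y):
    # OLS cost: J(theta) = 1/(2n) * sum_i (h_theta(x_i) - y_i)^2
    n = X_hat.shape[0]
    residuals = X_hat @ theta - y
    return residuals @ residuals / (2 * n)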



Error Function Intuition


y x (i) ) − y (i) )
(hθ (b

4 hθ (b
x) = θ x
y (i)
b
3

2
Why the square
in the error function?
1

x
1 2 3 4 5 6 7 8 9

xb(i)
Section: Solutions to Regression

Closed-Form Solutions
• Usual approach (for two unknowns): Calculate θ0 and θ1 from the sample means x̄ and ȳ:

  θ0 = ȳ − θ1 x̄,   θ1 = Σ_{i=1}^{n} (x^(i) − x̄)(y^(i) − ȳ) / Σ_{i=1}^{n} (x^(i) − x̄)²   (3)

• `Normal equation' (scales to arbitrary dimensions):

  θ = (X̂⊺ X̂)^(−1) X̂⊺ y   (4)

  where (X̂⊺ X̂)^(−1) X̂⊺ is the Moore-Penrose pseudo-inverse and X̂ is called the `design matrix' or `regressor matrix'
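
For the two-unknown case, equation (3) takes only a few lines; this is an illustrative sketch (assuming 1-D NumPy arrays x and y), not the slide's own code:

import numpy as np

def fit_simple(x, y):
    # Closed-form solution for h(x) = theta0 + theta1 * x with a single feature.
    x_bar, y_bar = x.mean(), y.mean()
    theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1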



Design Matrix / Regressor Matrix


• The design matrix X̂ ∈ R^(n×(m+1)) looks as follows (in the following, X̂ ≡ X):

        ⎡ 1  x1^(1)  x2^(1)  ···  xm^(1) ⎤
        ⎢ 1  x1^(2)  x2^(2)  ···  xm^(2) ⎥
  X̂  =  ⎢ 1  x1^(3)  x2^(3)  ···  xm^(3) ⎥   (5)
        ⎢ ⋮    ⋮       ⋮      ⋱     ⋮    ⎥
        ⎣ 1  x1^(n)  x2^(n)  ···  xm^(n) ⎦

• And the n × 1 label vector:

  y = [y^(1), y^(2), y^(3), . . . , y^(n)]⊺


Derivation of the Normal Equation

• The derivation involves a bit of linear algebra


• Step ❶: Rewrite J(θ) in matrix-vector notation:

  J(θ) = 1/2 (Xθ − y)⊺ (Xθ − y)
       = 1/2 ((Xθ)⊺ − y⊺)(Xθ − y)
       = 1/2 ((Xθ)⊺Xθ − (Xθ)⊺y − y⊺(Xθ) + y⊺y)
       = 1/2 (θ⊺X⊺Xθ − 2(Xθ)⊺y + y⊺y)

• To be continued...

Derivation of the Normal Equation (Ctd.)


• Step ❷: Calculate the derivative of J(θ) and set it to zero:

  ∇θ J(θ) = 1/2 (2X⊺Xθ − 2X⊺y) = 0
  ⇔ X⊺Xθ = X⊺y

• If X⊺X is invertible, we can multiply both sides by (X⊺X)^(−1):

  Normal equation:  θ = (X⊺X)^(−1) X⊺ y
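
A sketch of the normal equation in NumPy (illustrative; a linear solve or np.linalg.lstsq is usually preferred over forming the inverse explicitly):

import numpy as np

def fit_normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y, computed via a linear solve for numerical stability.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Alternative that uses the Moore-Penrose pseudo-inverse internally:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)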


Problems with Matrix Inversion?


• What if (X⊺X)^(−1) does not exist?
• Problems and solutions:
  1 Linearly dependent (redundant) features, or the design matrix does not have full rank
    (e. g. size in m² and size in feet²)?
    ⇒ Delete correlated features
  2 Too many features (m > n)?
    ⇒ Delete features (e. g. using PCA) / add training examples
  3 Other numerical instabilities?
    ⇒ Add a regularization term (later)
  4 Computationally too expensive?
    ⇒ Use gradient descent

Gradient Descent
• We want to minimize a smooth function J : R^(m+1) → R:

  min_{θ ∈ R^(m+1)} J(θ)

• Update the parameters iteratively:

  θ^(t+1) ←− θ^(t) − α ∇θ J(θ^(t))   (6)

• where α > 0 is the learning rate and ∇θ J(θ) is the gradient of J(θ) w. r. t. θ:

  ∇θ J(θ) = [∂J(θ)/∂θ0, ∂J(θ)/∂θ1, . . . , ∂J(θ)/∂θm]⊺
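
The update rule (6) as a generic loop; a minimal sketch in which grad_J is a placeholder for whatever gradient function the model supplies:

import numpy as np

def gradient_descent(grad_J, theta_init, alpha=0.01, n_iters=1000):
    # Repeat theta <- theta - alpha * grad_J(theta) for a fixed number of iterations.
    theta = np.array(theta_init, dtype=float)
    for _ in range(n_iters):
        theta = theta - alpha * grad_J(theta)
    return theta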


Data Input Space vs. Hypothesis Space

(Figure. Left, data input space: the fitted line y = θ0 + θ1 x through the data points in the
(x, y) plane. Right, hypothesis space H: the error surface J(θ) over the parameters θ0 and θ1.)


Data Input Space vs. Hypothesis Space (Ctd.)

• Data input space
  • Determined by the m + 1 attributes of the data set x0, x1, x2, . . . , xm
  • Often high-dimensional
• Hypothesis space H
  • Determined by the number of parameters of the model
  • Each point in the hypothesis space corresponds to a specific assignment of model parameters
  • The error function gives information about how good this assignment is
  • Gradient descent is applied in the hypothesis space H


Data Input Space vs. Hypothesis Space (Ctd.)


Visualization of Gradient Descent in 3 Dimensions

(Figure: 3-D surface plot of the error J(θ) over the parameters θ0 and θ1.)


Versions of Gradient Descent


• Assume some training data D = {(x^(i), y^(i))}_{i=1}^{n}
• Squared error for a single example: ℓ(y_pred, y_true) = (y_pred − y_true)²
• Our objective is to minimize the total error:

  min_{θ ∈ R^(m+1)} J(θ) = min_{θ ∈ R^(m+1)} Σ_{i=1}^{n} ℓ(hθ(x^(i)), y^(i))

• Three versions of gradient descent:
  1 Batch gradient descent
  2 Stochastic gradient descent
  3 Mini-batch gradient descent

Versions of Gradient Descent (Ctd.)


• Batch gradient descent: Compute the gradient based on ALL data points:

  θ^(t+1) ←− θ^(t) − α Σ_{i=1}^{n} ∇ℓ(hθ^(t)(x^(i)), y^(i))   (7)

• Stochastic gradient descent: Compute the gradient based on a SINGLE data point
  (pick the training example randomly, not sequentially!). For i ∈ {1, . . . , n} do:

  θ^(t+1) ←− θ^(t) − α ∇ℓ(hθ^(t)(x^(i)), y^(i))   (8)
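
A sketch of the stochastic variant (8) for linear regression with squared error (assumes NumPy and a design matrix X_hat with bias column; the factor 2 from the derivative is folded into the learning rate):

import numpy as np

def sgd_linear_regression(X_hat, y, alpha=0.01, n_epochs=50, seed=0):
    # Stochastic gradient descent: update theta from one randomly chosen example at a time.
    rng = np.random.default_rng(seed)
    n, d = X_hat.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):            # random order, not sequential
            error = X_hat[i] @ theta - y[i]     # h_theta(x_i) - y_i
            theta -= alpha * error * X_hat[i]   # gradient of the per-example squared error
    return theta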


Solving linear Regression using Gradient Descent


• Randomly initialize θ
• To minimize the error, keep changing θ according to:
θ (t+1) ←− θ (t) − α∇θ J(θ (t) ) (9)
• We need to calculate ∇θj J(θ (based on a single example)
(t)
):
∂ 1 X 1X
n n
(hθ (x (i) ) − y (i) )2 = 2 · (hθ (x (i) ) − y (i) ) · (hθ (x (i) ) − y (i) )
∂ ∂
J(θ) =
∂θj ∂θj 2 2 ∂θj
i=1 i=1
(10)
X
n X
n
(hθ (x (i) ) − y (i) ) · (hθ (x (i) ) − y (i) )xj
∂ (i) (i)
= (θ0 x0 + · · · + θm xm(i) − y (i) ) =
∂θj
i=1 i=1
(11)
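
The derived gradient (11) in vectorized form gives a compact batch gradient descent routine; a sketch in which dividing by n is an assumption that keeps the step size independent of the data set size:

import numpy as np

def batch_gd_linear_regression(X_hat, y, alpha=0.1, n_iters=5000):
    # Batch gradient descent: grad_j = sum_i (h_theta(x_i) - y_i) * x_ij, here averaged over n.
    n, d = X_hat.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        errors = X_hat @ theta - y        # residuals h_theta(x_i) - y_i for all i
        theta -= alpha * (X_hat.T @ errors) / n
    return theta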

Solving the introductory Example

(Figure: revenue R (y) plotted against marketing expenses M (x1) with the fitted regression line.)

• θ0 ≈ 7.4218
• θ1 ≈ 2.9827
• J(θ) ≈ 446.9584
• hθ(x) = 7.4218 + 2.9827 · x1
• R = hθ(2.7) = 15.4750

Disadvantage of Gradient Descent

(Figure: a non-convex error function J(θ) with a local optimum and a global optimum;
gradient descent can get stuck in the local optimum.)

Section: Probabilistic Regression

Probabilistic Regression

• Assumption 1: The target function values are generated by adding noise to the function estimate:

  y = hθ(x) + ε   (12)

• Assumption 2: The noise is a Gaussian random variable:

  ε ∼ N(0, β^(−1))   (13)

  p(y|x, θ, β) = N(y | hθ(x), β^(−1))   (14)

  where β denotes the precision, β = 1/σ²

• y is now a random variable!


Probabilistic Regression (Ctd.)

(Figure cf. [1], p. 29: probabilistic regression. The notation there maps to ours as
w ≡ θ, t ≡ y, y(x0, w) ≡ hθ(x0).)



Maximum Likelihood Regression

• Given: A labeled set of training data points {(x^(i), y^(i))}_{i=1}^{n}
• Conditional likelihood (assuming the data is i. i. d.):

  p(y|X, θ, β) = Π_{i=1}^{n} N(y^(i) | hθ(x^(i)), β^(−1))   (15)
              = Π_{i=1}^{n} N(y^(i) | θ⊺x^(i), β^(−1))   (16)

• Maximize the likelihood w. r. t. θ and β


Maximum Likelihood Regression (Ctd.)


Simplify using the log-likelihood:

  log p(y|X, θ, β) = Σ_{i=1}^{n} log N(y^(i) | θ⊺x^(i), β^(−1))   (17)

                   = Σ_{i=1}^{n} [ log(√β / √(2π)) − β/2 (y^(i) − θ⊺x^(i))² ]   (18)

  (Remember the log-rules?)

                   = n/2 log β − n/2 log(2π) − β/2 Σ_{i=1}^{n} (y^(i) − θ⊺x^(i))²   (19)


Maximum Likelihood Regression (Ctd.)

• Compute the gradient w. r. t. θ and set it to zero:

  ∇θ log p(y|X, θ, β) = 0
  ⇔ −β Σ_{i=1}^{n} (y^(i) − θ⊺x^(i)) x^(i) = 0
  ...
  θ_ml = (X⊺X)^(−1) X⊺ y

• Same result as in least squares regression



We have derived the squared Error!

Minimizing the squared error gives the maximum likelihood solution for the parameters θ assuming Gaussian noise.

• The maximum likelihood approach gives rise to the squared error
• But it is much more powerful than regular least squares ⇒ We can also estimate the uncertainty (noise precision) β:

  β_ml = [ 1/n Σ_{i=1}^{n} (y^(i) − θ_ml⊺ x^(i))² ]^(−1)   (20)
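
Both maximum likelihood quantities are easy to compute once the residuals are available; the following sketch (assuming NumPy) mirrors equations (4) and (20):

import numpy as np

def ml_linear_regression(X, y):
    # theta_ml: least squares / maximum likelihood solution; beta_ml: noise precision from (20).
    theta_ml, *_ = np.linalg.lstsq(X, y, rcond=None)   # equivalent to (X^T X)^(-1) X^T y
    residuals = y - X @ theta_ml
    beta_ml = 1.0 / np.mean(residuals ** 2)            # inverse of the mean squared residual
    return theta_ml, beta_ml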
Section: Basis Function Regression

What if the Data is non-linear?

(Figure: a scatter plot of clearly non-linear data, y against x.)

• So far we have fitted straight lines
• What if the data is not linear...?

The best-fitting function is obviously not a straight line!

What would you do?


Basis Functions
• Remember: `When stuck, switch to a different perspective'
• We can add higher-order features using basis functions φ (we assume 1-D data):

  hθ(x) = Σ_{j=0}^{p} θj φj(x)   (21)

• There exist several types of basis functions:
  • linear: φ0(x) = 1 and φ1(x) = x
  • polynomial ⇒ see below
  • radial basis functions (RBFs) ⇒ see below
  • Fourier basis

New Design Matrix


Applying the basis functions to X we get the new design matrix Φ:

        ⎡ φ0(x^(1))  φ1(x^(1))  φ2(x^(1))  ···  φp(x^(1)) ⎤
        ⎢ φ0(x^(2))  φ1(x^(2))  φ2(x^(2))  ···  φp(x^(2)) ⎥
  Φ  =  ⎢     ⋮          ⋮          ⋮       ⋱       ⋮     ⎥   (22)
        ⎣ φ0(x^(n))  φ1(x^(n))  φ2(x^(n))  ···  φp(x^(n)) ⎦

The model is still linear in the parameters, so we can still use the same algorithm
as before. This is still linear regression (!!!)


Polynomial Basis Functions

• A quite frequently used basis function: The polynomial basis (for N-D data we would also include cross-terms!):

  φ0(x) = 1
  φj(x) = x^j

  hθ(x) = Σ_{j=0}^{p} θj φj(x) = θ0 + θ1 x + θ2 x² + · · · + θp x^p

• Here, p is the degree of the polynomial
• Here: φ(x) = [1, x, x², x³, . . . , x^p]
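
Polynomial basis regression for 1-D inputs reduces to building Φ and reusing the least squares machinery; a minimal sketch (not the slide's own code):

import numpy as np

def polynomial_design_matrix(x, p):
    # Phi[i, j] = x_i ** j for j = 0..p (1-D inputs).
    return np.vander(x, N=p + 1, increasing=True)

def fit_polynomial(x, y, p):
    Phi = polynomial_design_matrix(x, p)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # still linear regression!
    return theta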

It is still linear!


It is still linear! (Ctd.)


Basis Functions: Radial Basis Functions

• Yet another possible choice of basis function: Radial basis functions

  φ0(x) = 1   (23)
  φj(x) = exp(−∥x − zj∥² / (2σ²))   (24)

• {zj} are the centers of the radial basis functions
• p denotes the number of centers / number of radial basis functions
• Often we take each data point as a center, so p = n
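
A sketch of the corresponding feature map for 1-D inputs, taking the training points themselves as centers (p = n), as mentioned above:

import numpy as np

def rbf_design_matrix(x, centers, sigma=1.0):
    # Phi[i, 0] = 1 and Phi[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2)) for j >= 1 (1-D inputs).
    sq_dists = (x[:, None] - centers[None, :]) ** 2
    rbf = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), rbf])

# Example: Phi = rbf_design_matrix(x_train, centers=x_train, sigma=1.0)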


Radial Basis Functions (Ctd.)

(Figure: the non-linear example data set, y against x.)


The Danger of too expressive Models...


(Figure. Left: polynomial of degree p = 16, severe overfitting. Right: RBF with σ = 1.00 and p = n, about right.)


Overfitting vs. Underfitting

• Underfitting
  • The model is not complex enough to fit the data well ⇒ High bias
  • Make the model more complex; adding new examples does not help
• Overfitting
  • The model predicts the training data perfectly
  • But it fails to generalize to unseen instances ⇒ High variance
  • Decrease the degrees of freedom or add more training examples
  • Also: Try regularization
• Bias-variance trade-off

First Solution: Smaller Degree


One solution: Use a smaller degree (here: p = 3). Much better :)

(Figure: the data with the degree-3 polynomial regression line.)

Second Solution: Regularization


• Enrich J(θ) with a regularization term
• This can prevent overfitting and results in a smoother function (large values for θj are prevented)
• Two forms of regularization, L1 and L2:

  min_θ J(θ) + λ|θ|   (L1)          min_θ J(θ) + λ∥θ∥²   (L2)

  with |θ| = Σ_{j=1}^{m} |θj|   and   ∥θ∥² = Σ_{j=1}^{m} θj²

• λ ⩾ 0 controls the degree of regularization



Regularization visualized

• Here: w ≡ θ
• L1-Regularization ⇒ Lasso regression (least absolute shrinkage and selection operator)
• L2-Regularization ⇒ Ridge regression (Tikhonov regularization)
• The combination of both is called elastic net

(Figure cf. [1], p. 146; left: L2, right: L1)


Incorporating Regularization
• Normal equation with regularization (ridge regression); the regularization also helps to overcome numerical issues!

  θ = (X⊺X + λI)^(−1) X⊺ y   (25)

• Regularized gradient descent update rule:

  θ^(t+1) ←− θ^(t) − α ∇θ J(θ^(t))

  with   ∂J(θ)/∂θj = (hθ(x) − y) xj + λθj
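
Equation (25) is a one-liner on top of the earlier normal-equation code; this sketch keeps the full λI as written above (in practice the bias θ0 is often excluded from the penalty):

import numpy as np

def fit_ridge(X_hat, y, lam=1.0):
    # theta = (X^T X + lambda * I)^(-1) X^T y, solved without forming the inverse explicitly.
    d = X_hat.shape[1]
    return np.linalg.solve(X_hat.T @ X_hat + lam * np.eye(d), X_hat.T @ y)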

Polynomial Regression with Regularization


(Figure. Left: regularized polynomial fit, at least better. Right: way too much regularization.)

Section: Wrap-Up

Summary
• Regression predicts continuous target variables
• The algorithm minimizes the (mean) squared error
• Minimizing the squared error gives the maximum likelihood solution
• Two approaches:
1 Normal equation
2 (Batch / stochastic / mini-batch) gradient descent
• Probabilistic regression allows us to quantify the uncertainty of the model
• Use basis functions to fit non-linear regression lines
• Regularization is important

Self-Test Questions

1 What is the goal of regression?


2 What can you do if matrix inversion fails for the normal equation?
3 What is a suitable cost function for regression? Where does it come from?
4 Does gradient descent give the exact solution?
5 What is the advantage of probabilistic regression?
6 What are basis functions? Why use them? State some examples.
7 What is overfitting / underfitting?
8 What is regularization? Why should you apply it?

What's next...?

Unit I Machine Learning Introduction


Unit II Mathematical Foundations
Unit III Regression
Unit IV Classification I
Unit V Evaluation
Unit VI Classification II
Unit VII Clustering
Unit VIII Deep learning


Recommended Literature and further Reading I

[1] Pattern Recognition and Machine Learning. Christopher Bishop. Springer. 2006. → Link, cf. chapter 3.1

[2] Machine Learning: A Probabilistic Perspective. Kevin Murphy. MIT Press. 2012. → Link, cf. chapters 1.4.5 and 1.4.7

[3] Stanford CS229 course notes. Andrew Ng. Stanford University. 2019. → Link


Recommended Literature and further Reading II

[4] Stanford CS229 course recording. Andrew Ng. Stanford University. 2008. → Link


Meme of the Day

Thank you very much for your attention!

Topic: *** Applied Machine Learning Fundamentals *** Regression


Term: Winter term 2021/2022
Contact:
Abdelkrim EL MOUATASIM, Full Professor of AI
UIZ FPO - DMG
[email protected]

Do you have any questions?
