
Machine Learning

EE514 – CS535

Linear Regression: Formulation, Solutions, Polynomial Regression, Gradient Descent and Regularization

Zubair Khalid

School of Science and Engineering
Lahore University of Management Sciences

https://www.zubairkhalid.org/ee514_2023.html
Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Regression
Regression: Quantitative Prediction on a continuous scale
- Given a data sample, predict a numerical value

Example: Linear relationship


[Block diagram: input x → Process or System → observed output y. With noise shown: the system computes f(x) and noise n is added, so the observed output is y = f(x) + n.]

Here, PROCESS or SYSTEM refers to any underlying physical or logical phenomenon which maps our input data to our observed and noisy output data.
Regression
Overview:

[Block diagram: input x → Process or System → observed output y.]

One-variable regression: $y$ is a scalar.
Multi-variable regression: $\mathbf{y}$ is a vector.

We will cover:
Single feature regression: $x$ is a scalar.
Multiple feature regression: $\mathbf{x}$ is a vector.


Regression
Examples:

Single Feature:
- Predict the score in the course given the number of hours of effort per week.
- Establish the relationship between monthly e-commerce sales and advertising costs.
Multiple Feature:
- Studying the operational efficiency of a machine given sensor (temperature, vibration) data.
- Predicting the remaining useful life (RUL) of a battery from charging and discharging information.
- Estimate sales volume given population demographics, GDP indicators, climate data, etc.
- Predict crop yield using remote sensing (satellite images, gravity information).
- Dynamic pricing or surge pricing by ride-sharing applications (Uber).
- Rate the condition (fatigue or distraction) of the driver given video.
- Rate the quality of driving given data from sensors installed on the car or driving patterns.
Regression
Model Formulation and Setup:
True Model:
We assume there is an inherent but unknown relationship between input and output:
$y = f(\mathbf{x}) + n$,
where $f$ is the true unknown function and $n$ is noise.

Goal:
Given noisy observations, we need to estimate the unknown functional relationship as accurately as possible.

[Figure: the true unknown function $f(x)$ and the noisy observations plotted against $x$.]
Regression
Model Formulation and Setup:
- Single Feature Regression, Example:

Training Data:
First data sample: $(x^{(1)}, y^{(1)})$
Second data sample: $(x^{(2)}, y^{(2)})$
$\vdots$
$n$-th data sample: $(x^{(n)}, y^{(n)})$

[Figure: scatter plot of the training samples in the $x$–$y$ plane.]
Regression
Model Formulation and Setup:
We have:

[Block diagram: the input feeds both the true Process or System (with noise $n$, producing the observed output) and our model (producing the model output); the error is the difference between the observed output and the model output.]
Linear Regression
Overview:
- Second learning algorithm of the course.

- Scalar output is a linear function of the inputs.

- Different from kNN: linear regression adopts a modular approach which we will use most of the time in the course:
  - Select a model.
  - Define a loss function.
  - Formulate an optimization problem to find the model parameters such that the loss function is minimized.
  - Employ different techniques to solve the optimization problem, i.e., to minimize the loss function.
Linear Regression
Model:
$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_d x_d$, or compactly $\hat{y} = \mathbf{w}^T \mathbf{x}$ with $\mathbf{x} = [1, x_1, \ldots, x_d]^T$ and $\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$.

What is Linear?
The model output $\hat{y}$ is a linear function of the parameters $\mathbf{w}$ (and of the inputs); $w_0$ is the bias (intercept) term.

Interpretation:
Each weight $w_j$ quantifies the change in $\hat{y}$ for a unit change in the feature $x_j$, with the other features held fixed.
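A minimal sketch of this model in code (assuming NumPy; the data and weights below are purely illustrative): the prediction is just an inner product between the bias-augmented input and the parameter vector.

```python
import numpy as np

def predict(X, w):
    """Linear model: y_hat = w0 + w1*x1 + ... + wd*xd.

    X: (n, d) array of inputs, one sample per row.
    w: (d + 1,) parameter vector, w[0] is the bias.
    """
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a ones column for the bias
    return X_aug @ w

# Hypothetical example: two features, weights chosen purely for illustration
X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([0.5, 1.0, -2.0])  # [w0, w1, w2]
print(predict(X, w))            # [-2.5 -4.5]
```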
Linear Regression
Define Loss Function:
- The loss function should be a function of the model parameters.

[Figure: observed values $y^{(i)}$ plotted against $x$ together with the fitted line and the true unknown function; the vertical gaps between the observations and the fitted line are the residual errors.]

Residual error: $e^{(i)} = y^{(i)} - \hat{y}^{(i)} = y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)}$.
Linear Regression
Define Loss Function:
Sum of squared residuals (least-squares loss):
$L(\mathbf{w}) = \sum_{i=1}^{n} \big( y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)} \big)^2$

- One minimizer: scaled versions of this loss (e.g., the mean squared error, or the sum divided by 2) all share the same minimizer.

How to solve?
$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} L(\mathbf{w})$
Linear Regression
Define Loss Function:
Reformulation (matrix form):
Stack the observations as $\mathbf{y} = [y^{(1)}, \ldots, y^{(n)}]^T$, the inputs as the rows of a matrix $\mathbf{X}$, and the parameters as the vector $\mathbf{w}$. Then the model output is $\mathbf{X}\mathbf{w}$ and the residual error is $\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{w}$.

Consequently:
$L(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2 = (\mathbf{y} - \mathbf{X}\mathbf{w})^T (\mathbf{y} - \mathbf{X}\mathbf{w})$
Linear Regression
Solve Optimization Problem (Analytical Solution employing Calculus):

- $L(\mathbf{w}) = (\mathbf{y} - \mathbf{X}\mathbf{w})^T(\mathbf{y} - \mathbf{X}\mathbf{w})$ is quadratic and convex in $\mathbf{w}$: a very beautiful, elegant function we have here!

Gradient of a function: Overview
For $f : \mathbb{R}^d \to \mathbb{R}$, the gradient $\nabla_{\mathbf{w}} f = [\partial f / \partial w_1, \ldots, \partial f / \partial w_d]^T$ collects the partial derivatives; setting it to zero locates the stationary points.

Examples:
$\nabla_{\mathbf{w}} (\mathbf{b}^T \mathbf{w}) = \mathbf{b}$, and $\nabla_{\mathbf{w}} (\mathbf{w}^T \mathbf{A} \mathbf{w}) = (\mathbf{A} + \mathbf{A}^T)\mathbf{w}$.

Applying these to the loss:
$\nabla_{\mathbf{w}} L = -2\mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w}) = 0 \;\Rightarrow\; \hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ (the normal equations).
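A minimal sketch of the closed-form solution (assuming NumPy; the data below is synthetic, generated only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: y = 2 + 3x + noise
n = 100
x = rng.uniform(0, 1, size=n)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

# Design matrix X with a leading column of ones for the bias w0
X = np.column_stack([np.ones(n), x])

# Normal equations: w_hat = (X^T X)^{-1} X^T y.
# lstsq solves the same least-squares problem but is numerically
# preferable to forming the inverse explicitly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # approximately [2, 3]
```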
Linear Regression
So far and moving forward:
- We assumed that we know the structure of the model, that is, there is a linear relationship between inputs and output.
- Number of parameters = dimension of the feature space + 1 (bias parameter).
- Formulated the loss function using the residual error.
- Formulated the optimization problem and obtained an analytical solution.
- Linear regression is one of the models for which we can obtain an analytical solution.
- We will shortly learn an algorithm to solve the optimization problem numerically.

Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Polynomial Regression
Overview:
[Figure: scatter plot of $y$ vs. $x$ showing a clearly non-linear trend. Is it linear?]

- If the relationship between the inputs and output is not linear, we can use a polynomial to model the relationship.

- We will formulate the polynomial regression model for the single feature regression problem.

- Polynomial regression is often termed non-linear regression or linear-in-parameters regression.

- We will also revisit the concept of 'over-fitting'.

Polynomial Regression
Single Feature Regression:
Formulation:
Model the output as a degree-$M$ polynomial of the single input $x$:
$\hat{y} = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$

Treating the powers $1, x, x^2, \ldots, x^M$ as features, this is exactly linear regression in the parameters $\mathbf{w}$. We have seen this before, and we are capable of solving it!
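A minimal sketch of this reduction (assuming NumPy; the degree, data, and noisy-sinusoid target are illustrative):

```python
import numpy as np

def poly_design_matrix(x, M):
    """Map a scalar feature x to the polynomial features [1, x, x^2, ..., x^M]."""
    return np.vander(x, M + 1, increasing=True)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy sinusoid

M = 3
X = poly_design_matrix(x, M)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # same least-squares machinery as before
print(w_hat)  # M + 1 = 4 coefficients
```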
Polynomial Regression
Single Feature Regression:
Example (Ref: CB, Section 1.1):

[Figure: degree-$M$ polynomial fits to 10 noisy samples of a sinusoid, for several values of $M$.]

Underfitting: model is too simple (the low-degree fits miss the trend in the data).

Overfitting: model is too complex (the $M = 9$ fit passes through every sample but oscillates wildly between them).

[Figure: training and test error vs. $M$; both are small for a good choice of $M$, while the test error blows up in the overfitting regime.]

Solution 1: Restrict the model complexity, e.g., stick with the $M = 3$ model.

[Figure: the $M = 9$ fit improves markedly as the number of data points is increased.]
Polynomial Regression
Single Feature Regression:
How to Handle Overfitting?
- The polynomial degree $M$ is the hyper-parameter of our model, like $k$ in kNN, and controls the complexity of the model.
- If we stick with the $M=3$ model, this is a restriction on the number of parameters (Solution 1).
- We encounter overfitting for $M=9$ because we do not have sufficient data.
Solution 2: Take more data points to avoid over-fitting.

Solution 3: Regularization
Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Regularization
Regularization overview:
- The concept is broad, but we will see it in the context of linear regression (and polynomial regression, which we formulated as linear regression).
- Regularization encourages the model coefficients to be small by adding a penalty term to the error.
- We had a loss function of the following form that we minimize to find the coefficients (see the linear regression formulation):
$L(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$
- We add a 'penalty term', known as a regularizer, to the loss function:
$\tilde{L}(\mathbf{w}) = \underbrace{\|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2}_{\text{loss}} + \underbrace{\lambda\, R(\mathbf{w})}_{\text{regularizer}}$
Here $\lambda \ge 0$ controls the strength of the regularization.


Regularization
L2 Least-squares Regularization – Ridge Regression:
- Since we want to discourage the model coefficients from reaching large values, we can use the following simple regularizer:
$R(\mathbf{w}) = \|\mathbf{w}\|_2^2 = \mathbf{w}^T \mathbf{w}$

- For this choice, the regularized loss function becomes
$\tilde{L}(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2 + \lambda \|\mathbf{w}\|_2^2$

- This regularization term maintains a trade-off between the 'fit of the model to the data' and the 'square of the norm of the coefficients':
  - If the model is fitted poorly, the first term is large.
  - If the coefficients have large values, the second term (penalty term) is large.

Intuitive Interpretation: We want to minimize the error while keeping the norm of the coefficients bounded.
Regularization
L2 Least-squares Regularization – Ridge Regression:
- The regularized loss function is still quadratic in $\mathbf{w}$, so we can find a closed-form solution:
$\hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$
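A minimal sketch of the ridge closed form (assuming NumPy; λ values and data are illustrative, reusing the degree-9 polynomial setting from above):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y.

    For simplicity this also penalizes the bias w0; in practice the
    bias is often left unregularized.
    """
    d = X.shape[1]
    # Solve the regularized normal equations rather than forming the inverse
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

M = 9
X = np.vander(x, M + 1, increasing=True)  # degree-9 polynomial features

w_overfit = ridge_fit(X, y, lam=0.0)   # no regularization: huge coefficients
w_ridge = ridge_fit(X, y, lam=1e-3)    # moderate regularization: much smaller
print(np.abs(w_overfit).max(), np.abs(w_ridge).max())
```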
Regularization
L2 Least-squares Regularization – Ridge Regression:
Example:

[Figure: the $M = 9$ polynomial fit under different regularization strengths; with no regularization the fit oscillates wildly (overfitting), while too much regularization over-smooths the fit (underfitting).]
Regularization
L2 Least-squares Regularization – Ridge Regression:
Graphical Visualization:
[Figure: contours of the least-squares loss together with the circular L2 constraint region; the ridge solution lies where a loss contour first touches the circle.]
Regularization
L1 Least-squares Regularization – Lasso Regression:
Graphical Visualization:
[Figure: the same loss contours with the diamond-shaped L1 constraint region; because of the corners, the solution often lands on an axis, setting some coefficients exactly to zero (sparsity).]
Regularization
Elastic Net Regression, L1 vs L2:
Elastic net combines the two penalties, $\lambda_1 \|\mathbf{w}\|_1 + \lambda_2 \|\mathbf{w}\|_2^2$. In short: L1 (lasso) encourages sparse coefficients, while L2 (ridge) shrinks all coefficients smoothly.
Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Gradient Descent Algorithm
Optimization and Gradient Descent - Overview:
- We want to minimize a loss function $L(\mathbf{w})$, but a closed-form solution may be unavailable or too expensive to compute.
- Idea: starting from an initial guess, repeatedly take a small step in the direction of steepest descent, i.e., along the negative gradient $-\nabla_{\mathbf{w}} L(\mathbf{w})$.

Formulation:
$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \alpha\, \nabla_{\mathbf{w}} L(\mathbf{w}^{(t)})$,
where $\alpha > 0$ is the step size (learning rate).

Algorithm:
Overall: iterate the update rule from an initial guess until the parameters stop changing appreciably.

Pseudo-code:
1. Initialize $\mathbf{w}^{(0)}$ and choose a step size $\alpha$.
2. Repeat until convergence: compute the gradient at the current $\mathbf{w}$ and update all parameters simultaneously using the rule above.

Note: Simultaneous update (compute the full gradient before changing any parameter).

Convergence and Step size: too small an $\alpha$ makes convergence slow; too large an $\alpha$ can overshoot the minimum and diverge.
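A minimal sketch of the update rule on a toy one-dimensional function (the function and step size are purely illustrative):

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0       # initial guess
alpha = 0.1   # step size (learning rate)
for _ in range(100):
    w = w - alpha * grad(w)  # step in the negative gradient direction
print(w)  # converges to 3
```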
Gradient Descent Algorithm
Linear Regression Case:
For the least-squares loss $L(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$, the gradient is $\nabla_{\mathbf{w}} L = -2\mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w})$.

Gradient Descent:
$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + 2\alpha\, \mathbf{X}^T\big(\mathbf{y} - \mathbf{X}\mathbf{w}^{(t)}\big)$

Note: Simultaneous update.
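A minimal sketch of batch gradient descent for linear regression (assuming NumPy; the synthetic data and hyper-parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 1, size=n)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])  # bias column + single feature

w = np.zeros(2)
alpha = 0.1
for _ in range(2000):
    residual = y - X @ w              # e = y - Xw
    grad = -2.0 * X.T @ residual / n  # gradient of the mean squared error
    w = w - alpha * grad              # simultaneous update of all parameters
print(w)  # approaches [2, 3], matching the closed-form solution
```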
Gradient Descent Algorithm
Linear Regression Case:
Visualization:

[Figures: surface plot and contour plot of the quadratic loss over $(w_0, w_1)$; gradient descent traces a path down to the unique minimum.]
Gradient Descent Algorithm
Notes:
- Each batch gradient descent step requires a pass over the entire dataset to compute the gradient, which is expensive for large training sets.

Why? The gradient is a sum of per-sample terms, $\nabla_{\mathbf{w}} L = \sum_{i=1}^{n} \nabla_{\mathbf{w}}\, \ell^{(i)}(\mathbf{w})$, so the cost of one update grows linearly with the number of samples $n$.

Stochastic Gradient Descent: approximate the full gradient using a single (randomly chosen) sample per update.
Gradient Descent Algorithm
Stochastic Gradient Descent (SGD) - Rationale:
The gradient of a single sample is a noisy but unbiased estimate of the full gradient, and it is $n$ times cheaper to compute.

Stochastic Gradient Descent (SGD):
Update the parameters after every sample: $\mathbf{w} \leftarrow \mathbf{w} - \alpha\, \nabla_{\mathbf{w}}\, \ell^{(i)}(\mathbf{w})$ for a randomly chosen sample $i$.

Pros: much cheaper iterations; progress begins before a full pass over the data; well suited to large or streaming datasets.

SGD for Linear Regression Case:
For sample $i$: $\mathbf{w} \leftarrow \mathbf{w} + 2\alpha\, \big(y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)}\big)\, \mathbf{x}^{(i)}$.

Iteration vs. Epoch: an iteration is one parameter update (one sample); an epoch is one full pass over all $n$ training samples.
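A minimal sketch of SGD for linear regression (assuming NumPy; same synthetic setting as the batch example):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 1, size=n)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

w = np.zeros(2)
alpha = 0.05
for epoch in range(100):             # one epoch = one full pass over the data
    for i in rng.permutation(n):     # visit the samples in random order
        residual = y[i] - X[i] @ w   # scalar residual for a single sample
        w = w + 2.0 * alpha * residual * X[i]  # one iteration: single-sample update
print(w)  # approaches [2, 3], with more noise than batch gradient descent
```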
Gradient Descent Algorithm
Mini-batch Stochastic Gradient Descent (SGD):
Compute the gradient over a small random subset (mini-batch) of the data at each step: a compromise between batch gradient descent (stable but expensive updates) and stochastic gradient descent (cheap but noisy updates).

[Figure: comparison of Batch Gradient Descent and Stochastic Gradient Descent trajectories on the loss contours.]