1
Regularization and
polynomial regression
CSCI-P 556
ZORAN TIGANJ
2
Reminders/Announcements
u Don’t forget the Quiz deadline on Wednesday
u More office hours available (check Canvas homepage)
u Make sure to follow instructions when signing up for groups in HW1
u Don’t create your own groups – those are not visible to us when grading, use our
groups
u Make sure to be in a group even if you’re doing the assignment alone
(otherwise, we won’t see your submission on Canvas)
3
Today
u Polynomial regression
u Regularization (Lasso, Ridge, Elastic Net, Early stopping)
4
Examples of linear models
Linear models have a linear relationship between the dependent variable (y) and the model parameters
u Simple Linear Regression
Description: This model predicts a response based on a single predictor variable.
Example Application: Predicting the salary of an employee based on years of experience.
u Multiple Linear Regression
Description: This model uses multiple predictor variables to predict a response.
Example Application: Estimating the price of a house based on its size, age, and location.
u Polynomial Regression
Description: Although it models non-linear relationships, it is considered a linear model because it is linear in the parameters.
Example Application: Modeling the growth rate of bacteria depending on the nutrient concentration.
5
Examples of non-linear models
Non-linear models have a non-linear relationship between the dependent variable (y or p) and the model parameters
u Logistic Regression
Description: Despite its name, logistic regression is a non-linear model used for binary classification.
Example Application: Predicting whether a patient has a disease (1) or not (0) based on their test results.
u Exponential Growth Model
Description: Used for growth processes where the increase is proportional to the current amount.
Example Application: Predicting population growth.
u Function fitting
Description: Fitting some known functional relationship; e.g., a sinusoid is appropriate for data exhibiting periodic fluctuations.
Example Application: Seasonal variations in temperature or other cyclic phenomena.
6
Why does the difference between
linear and non-linear models matter?
u Linear models usually have a closed-form solution given by the Normal Equation, and the loss function is convex (when MSE is the loss function, this is also called the Ordinary Least Squares method).
u A closed-form solution might not exist if we cannot compute the inverse in the Normal Equation (shown below).
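For reference, the Normal Equation mentioned above is the standard OLS closed form (the equation itself is not written out on the slide):

$$\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}$$

It exists only when $\mathbf{X}^{\top}\mathbf{X}$ is invertible, which is exactly the failure case mentioned in the previous bullet.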
u Non-linear models can represent more complex relationships. This ability
makes them suitable for dealing with real-world phenomena that
inherently exhibit non-linear dynamics, such as exponential growth,
saturation effects, and threshold effects.
u Non-linear models typically require iterative, numerical approaches, such as
gradient descent, which can be computationally intensive and require more
data to achieve stable estimates (loss function often not convex)
7
Polynomial Regression
u What if your data is more complex than a straight line?
u Surprisingly, you can use a linear model to fit nonlinear data.
u A simple way to do this is to add powers of each feature as new features,
then train a linear model on this extended set of features.
u This technique is called Polynomial Regression
8
Polynomial Regression
u Let’s generate some nonlinear data, based on a simple quadratic
equation:
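A minimal sketch of this data-generation step; the coefficients and noise level below are illustrative assumptions for a simple quadratic relationship:

```python
import numpy as np

m = 100
X = 6 * np.random.rand(m, 1) - 3                 # 100 points spread over [-3, 3)
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)   # quadratic signal plus Gaussian noise
```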
9
Polynomial Regression
u Let’s use Scikit-Learn’s PolynomialFeatures class to transform our training
data, adding the square (second-degree polynomial) of each feature in
the training set as a new feature (in this case there is just one feature):
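A sketch of the transformation described above, assuming the X and y arrays from the previous snippet:

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)    # each row is now [x, x^2]

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)                     # plain Linear Regression on the extended features
print(lin_reg.intercept_, lin_reg.coef_)   # estimates of the quadratic's coefficients
```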
10
Polynomial Regression
11
Polynomial Regression
u Note that when there are multiple features, Polynomial Regression is
capable of finding relationships between features.
u This is made possible by the fact that PolynomialFeatures also adds all
combinations of features up to the given degree.
u For example, if there were two features a and b, PolynomialFeatures with
degree=3 would not only add the features a², a³, b², and b³, but also the
combinations ab, a²b, and ab².
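A small sketch showing which terms PolynomialFeatures generates for two features a and b with degree=3 (the sample values are arbitrary; get_feature_names_out assumes a reasonably recent Scikit-Learn version):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_ab = np.array([[2.0, 3.0]])    # one sample with features a=2, b=3
poly = PolynomialFeatures(degree=3, include_bias=False).fit(X_ab)
print(poly.get_feature_names_out(["a", "b"]))
# ['a' 'b' 'a^2' 'a b' 'b^2' 'a^3' 'a^2 b' 'a b^2' 'b^3']
```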
12
Learning Curves
u If you perform high-degree Polynomial
Regression, you will likely fit the training data
much better than with plain Linear
Regression.
u This high-degree Polynomial Regression
model is severely overfitting the training
data, while the linear model is underfitting it.
u How can you tell that your model is
overfitting or underfitting the data?
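One way to tell is to plot learning curves: train the model on increasingly large subsets of the training set and track the training and validation errors. A minimal sketch (the plotting details are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    """Plot train/validation RMSE as a function of training-set size."""
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        train_errors.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
        val_errors.append(mean_squared_error(y_val, model.predict(X_val)))
    plt.plot(np.sqrt(train_errors), "r-+", label="train RMSE")
    plt.plot(np.sqrt(val_errors), "b-", label="validation RMSE")
    plt.xlabel("training set size")
    plt.legend()
    plt.show()

# e.g., with the quadratic data generated earlier:
# plot_learning_curves(LinearRegression(), X, y)
```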
13
Learning Curves: Underfitting
u Linear model: These learning curves are
typical of a model that’s underfitting. Both
curves have reached a plateau; they are
close and fairly high.
Adding more training instances cannot correct underfitting; we need a more complex model.
14
Learning Curves: Overfitting
u 10th degree polynomial:
u The error on the training data is much lower
than with the Linear Regression model.
u There is a gap between the curves. This means
that the model performs significantly better on
the training data than on the validation data,
which is the hallmark of an overfitting model.
u If you used a much larger training set,
however, the two curves would continue to
get closer.
15
Linear Regression Model: high bias (underfitting), so reduce α.
10th Degree Polynomial Model: high variance (overfitting), so increase α.
16
Regularized Linear Models
u A good way to reduce overfitting is to regularize the model (i.e., to constrain it):
the fewer degrees of freedom it has, the harder it will be for it to overfit the
data.
u A simple way to regularize a polynomial model is to reduce the polynomial degree.
u For a linear model, regularization is typically achieved by constraining the
weights of the model.
u We will now look at three different ways to constrain the weights:
u Ridge Regression,
u Lasso Regression,
u Elastic Net
17
Ridge Regression
u Ridge Regression (also called Tikhonov regularization) is a regularized
version of Linear Regression: a regularization term is added to the cost
function.
u This forces the learning algorithm to not only fit the data but also keep the
model weights as small as possible.
u Loss function with regularization:
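The slide shows this equation as an image; written out, the standard Ridge cost (consistent with the later reference to half the squared ℓ2 norm) is

$$J(\boldsymbol{\theta}) = \mathrm{MSE}(\boldsymbol{\theta}) + \alpha \, \frac{1}{2} \sum_{i=1}^{n} \theta_i^{2}$$

Note that the sum starts at i = 1, so the bias term θ₀ is not regularized.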
18
Ridge Regression
u Ridge Regression (also called Tikhonov regularization) is a regularized version of Linear Regression: a regularization term is added to the cost function.
Note that the regularization term should only be added to the cost function during training. Once the model is trained, you want to evaluate the model’s performance using the unregularized performance measure.
u This forces the learning algorithm to not only fit the data but also keep the model weights as small as possible.
u Loss function with regularization (same as on the previous slide).
u If α is very large, then all weights end up very close to zero and the result is a flat line going through the data’s mean (the sum starts from 1, so the bias term is not regularized). The higher the α, the flatter the line.
19
Ridge Regression
20
Ridge Regression
u Ridge Regression closed-form solution
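The closed-form solution referenced above appears as an image on the slide; it is typically written as

$$\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^{\top}\mathbf{X} + \alpha \mathbf{A}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}$$

where A is the identity matrix except for a 0 in the top-left cell, so the bias term is not regularized.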
21
Ridge Regression
u And using Stochastic Gradient Descent:
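A sketch of Ridge-style training with SGD, assuming the X and y arrays from the earlier polynomial-regression snippets (the alpha value is an arbitrary assumption); penalty="l2" adds the ℓ2 regularization term to the SGD updates:

```python
from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(penalty="l2", alpha=0.1, max_iter=1000, tol=1e-3)
sgd_reg.fit(X, y.ravel())                 # ravel: SGDRegressor expects a 1-D target
print(sgd_reg.intercept_, sgd_reg.coef_)
```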
22
Lasso Regression
u Least Absolute Shrinkage and Selection Operator Regression (usually simply called Lasso Regression) is another regularized version of Linear Regression.
u Just like Ridge Regression, it adds a regularization term to the cost function, but it uses the ℓ1 norm of the weight vector instead of half the square of the ℓ2 norm.
Use Lasso if some features are useless: Lasso drives some feature weights to zero or close to zero, whereas Ridge only shrinks the weights of all features.
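Written out, the Lasso cost (a standard form consistent with the ℓ1 description above) is

$$J(\boldsymbol{\theta}) = \mathrm{MSE}(\boldsymbol{\theta}) + \alpha \sum_{i=1}^{n} \left|\theta_i\right|$$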
23
Lasso Regression
24
Lasso Regression
u An important characteristic of Lasso Regression is that it tends to eliminate the weights of the least important features (i.e., set them to zero).
u In other words, Lasso Regression automatically performs feature selection and outputs a sparse model (i.e., with few nonzero feature weights).
https://explained.ai/regularization/index.html
25
Lasso Regression
26
Elastic Net
u Elastic Net is a middle ground between Ridge Regression and Lasso Regression.
u The regularization term is a simple mix of both Ridge and Lasso’s
regularization terms, and you can control the mix ratio r.
u when r = 0, Elastic Net is equivalent to Ridge Regression,
u when r = 1, it is equivalent to Lasso Regression
Ridge: weights as small as possible
Lasso: weights zero for unimportant features
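With mix ratio r, the Elastic Net cost is conventionally written as (the slide’s equation is an image, so this is the standard form)

$$J(\boldsymbol{\theta}) = \mathrm{MSE}(\boldsymbol{\theta}) + r\,\alpha \sum_{i=1}^{n} \left|\theta_i\right| + \frac{1-r}{2}\,\alpha \sum_{i=1}^{n} \theta_i^{2}$$

so r = 0 recovers the Ridge term and r = 1 recovers the Lasso term.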
27
Which regularization to choose?
u So when should you use plain Linear Regression (i.e., without any
regularization), Ridge, Lasso, or Elastic Net?
u It is almost always preferable to have at least a little bit of regularization, so
generally you should avoid plain Linear Regression.
u Ridge is a good default, but if you suspect that only a few features are useful,
you should prefer Lasso or Elastic Net because they tend to reduce the useless
features’ weights down to zero.
u In general, Elastic Net is preferred over Lasso because Lasso may behave
erratically when the number of features is greater than the number of training
instances or when several features are strongly correlated.
Summary: prefer Elastic Net over Lasso, since Lasso behaves erratically when the number of features is greater than the number of training instances or when several features are strongly correlated.
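For concreteness, the three regularized models in Scikit-Learn share the same fit/predict interface (the alpha and l1_ratio values here are arbitrary assumptions):

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

ridge = Ridge(alpha=1.0)                        # l2 penalty: shrinks all weights
lasso = Lasso(alpha=0.1)                        # l1 penalty: can zero out weights entirely
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)   # l1_ratio plays the role of the mix ratio r
```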
28
Early Stopping
u A very different way to regularize iterative learning algorithms such as
Gradient Descent is to stop training as soon as the validation error reaches
a minimum.
u This is called early stopping.
29
Early Stopping
u With Stochastic and Mini-batch Gradient Descent, the curves are not so
smooth, and it may be hard to know whether you have reached the
minimum or not.
u One solution is to stop only after the validation error has been above the
minimum for some time (when you are confident that the model will not
do any better), then roll back the model parameters to the point where
the validation error was at a minimum.
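A minimal sketch of early stopping as described above: train an SGD regressor one epoch at a time (warm_start=True continues from the previous epoch) and keep a copy of the model with the lowest validation error. The data generation and hyperparameters are illustrative assumptions; in practice you would also add polynomial features and scale them as on the earlier slides.

```python
from copy import deepcopy
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = 6 * rng.random((100, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=100)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                       penalty=None, learning_rate="constant", eta0=0.0005)

minimum_val_error = float("inf")
best_epoch, best_model = None, None
for epoch in range(1000):
    sgd_reg.fit(X_train, y_train)                # warm start: continues where it left off
    val_error = mean_squared_error(y_val, sgd_reg.predict(X_val))
    if val_error < minimum_val_error:            # roll-back point: remember the best model
        minimum_val_error = val_error
        best_epoch, best_model = epoch, deepcopy(sgd_reg)
```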
30
Next time
u Logistic regression, from Chapter 4 of the Hands-On Machine Learning textbook