MACHINE LEARNING
WHY MACHINE LEARNING WAS INTRODUCED
Statistics: How to efficiently train large complex models?
Computer Science & Artificial Intelligence: How to build more robust versions of AI systems?
Neuroscience: How to design operational models of the brain?
CAN YOU RECOGNIZE THESE PICTURES?
If yes, how do you recognize them?
ORIGIN OF MACHINE LEARNING
……… lies in the very effort of understanding intelligence.
What is intelligence?
It can be defined as the ability to comprehend; to understand and profit from experience; the capability to acquire and apply knowledge.
LEARNING? 2300 YEARS AGO……
Plato (427 – 347 BC)
Abstract concepts are known to us a priori, through a mystical connection with the world.
He concluded that the ability to think lies in a priori knowledge of these concepts.
LEARNING?
Plato’s Pupil
Aristotle (384 – 322 BC)
Criticized his teacher's theory for not taking into account an important aspect:
the ability to learn and adapt to a changing world.
MACHINE LEARNING
Machine Learning is a subset of AI techniques that uses statistical methods to enable machines to improve with experience.
• Learning –
– A computer program is said to learn from
• experience E
• with respect to some class of tasks T
• and performance measure P
– if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, 1997)
LEARNING ALGORITHMS…
• General Tasks
– Classification, Regression, Transcription, Machine Translation, etc.
• Performance measures
– Depends on the type of problem; examples include:
• accuracy, error rate, etc.
– Performance is measured on a dataset called the test set, which is different from the dataset used to train the algorithm.
– Often difficult to choose a performance measure that corresponds well to the
desired behavior of the system.
• Experience
– Algorithms are termed supervised or unsupervised learning algorithms based on the experience they are allowed to have on datasets.
EXAMPLE (HANDWRITING RECOGNITION LEARNING PROBLEM)
Task T: Recognizing and classifying handwritten words within images
Performance Measure P: Percentage of words correctly classified.
Training experience E: A database of handwritten words with given classifications
MACHINE LEARNING
• Learning from experience on data to make predictions.
[Diagram: data → machine learning algorithm → training → trained model; unseen data → trained model → prediction]
BRANCHES OF MACHINE LEARNING
Source: [Link]
SUPERVISED MACHINE LEARNING
SUPERVISED MACHINE LEARNING APPROACH
For each specific task:
We collect lots of examples with their known outcomes.
Learn a function that maps inputs to outputs.
These programs tend to be data-centric, i.e. driven by the learning examples, and try to learn a hypothesis function that describes the mapping as closely as possible.
SUPERVISED MACHINE LEARNING APPROACH
We collect lots of examples with their known outcomes.
Learn a function that maps inputs to outputs.
Supervised learning models try to find parameter values that allow them to perform well on historical data. They are then used for making predictions on unseen data that was not part of the training dataset.
There are two main problems that can be solved with Supervised
Learning:
Regression: Linear Regression, Multiple Linear Regression, Polynomial Linear Regression, Support Vector Regression, Decision Tree Regression, Random Forest Regression
Classification: Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Naïve Bayes, Decision Tree Classification, Random Forest Classification
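As a concrete illustration of the two problem types, here is a minimal Python sketch (assuming NumPy and scikit-learn are installed; the toy data is invented for illustration) that fits one model from each column:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy supervised data (invented): inputs X with known outcomes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Regression: continuous target.
y_reg = np.array([1.2, 1.9, 3.1, 3.9, 5.2])
reg = LinearRegression().fit(X, y_reg)
print("regression prediction at x=6:", reg.predict([[6.0]]))

# Classification: discrete target (two classes).
y_clf = np.array([0, 0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_clf)
print("predicted class at x=6:", clf.predict([[6.0]]))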
SUPERVISED EXAMPLE & USE CASES
UNSUPERVISED EXAMPLES & USE CASES
UNSUPERVISED MACHINE LEARNING APPROACH
Finding patterns in data
Draw inferences from unlabeled data (without reference to known or labeled outcomes).
Models based on this type of algorithm can be used to discover unknown data patterns and the structure of the data itself.
CLUSTERING
ASSOCIATION RULE MINING
Source: [Link]
DIMENSION REDUCTION METHOD
Clustering: K-Means, Hierarchical, DBSCAN
Association Rule Mining: Apriori, FP-Growth, Eclat
Dimension Reduction: PCA, LDA
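As a minimal illustration, the sketch below (assuming scikit-learn is available; the toy points are invented) runs one clustering method and one dimension reduction method from the lists above:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Toy unlabeled data: two loose groups in 2-D (invented for illustration).
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# Clustering: K-Means discovers the two groups without any labels.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print("cluster labels:", labels)

# Dimension reduction: PCA projects the points onto their main axis of variation.
X_1d = PCA(n_components=1).fit_transform(X)
print("1-D projection:", X_1d.ravel())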
UNSUPERVISED EXAMPLE & USE CASES
REINFORCEMENT LEARNING
Reinforcement learning is a type of machine learning where an agent learns to behave in an environment by performing actions and seeing the results.
Exploration (trial and error)
Exploitation (knowledge gained from the environment)
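The slides give no code here, but a standard way to see the exploration/exploitation trade-off is an ε-greedy multi-armed bandit; the sketch below (arm payout probabilities invented for illustration) explores with probability ε and otherwise exploits the current best value estimate:

import random

# Invented 3-armed bandit: each arm pays 1 with the given probability.
arm_probs = [0.2, 0.5, 0.8]
values = [0.0] * 3   # estimated value of each arm (knowledge gained)
counts = [0] * 3     # number of pulls per arm
epsilon = 0.1        # exploration rate

random.seed(0)
for step in range(5000):
    if random.random() < epsilon:                   # exploration (trial and error)
        arm = random.randrange(3)
    else:                                           # exploitation
        arm = max(range(3), key=lambda a: values[a])
    reward = 1.0 if random.random() < arm_probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

print("estimated arm values:", [round(v, 2) for v in values])   # roughly [0.2, 0.5, 0.8]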
DEEP LEARNING
• The difference in artificial intelligence approaches over
the two decades (1997-2017)
– 1997: The IBM chess computer Deep Blue was explicitly programmed to win against the grandmaster Garry Kasparov.
– 2017: AlphaGo was not preprogrammed to play Go.
– It learned using a general-purpose algorithm that allowed it to
interpret the game’s patterns.
• The AlphaGo program applied deep learning.
DEEP LEARNING
Deep learning is a newer area of machine learning research, introduced with the objective of moving machine learning closer to its original goal: Artificial Intelligence.
It is inspired by the functionality of our brain cells, called neurons, which led to the concept of artificial neural networks.
DEEP LEARNING
Source: [Link]
MACHINE LEARNING VS DEEP LEARNING
Deep Learning IS Machine Learning
They differ in: data dependency, hardware requirements, execution time, feature engineering, interpretability, and problem-solving approach.
REGRESSION
SUPERVISED LEARNING
Learning a discrete function (classification): the algorithm attempts to estimate the mapping function from the input variables to discrete or categorical output variables.
Learning a continuous function (regression): the algorithm attempts to estimate the mapping function from the input variables to numeric or continuous output variables.
CLASSIFICATION VS REGRESSION
Source: [Link]
SUPERVISED LEARNING
Image Source: [Link]
WHAT IS REGRESSION
It is used to predict target variables on a continuous scale.
[Diagram: dataset → regression → map x to y, identifying the relationship]
SALARY AFTER COMPLETING THE COURSE
How much will your salary be?
Depends on x = performance in course, quality of projects, etc….
TWEET POPULARITY
How many people will retweet your tweet? (y)
Depends on x = # of followers, # of followers of followers, features of the text tweeted, popularity of the hashtag, # of past retweets, …
REGRESSION ANALYSIS
Regression analysis is a statistical tool for investigating the relationship between a dependent variable and one or more independent (explanatory) variables.
Regression analysis is widely used for prediction and
forecasting
INDEPENDENT AND DEPENDENT VARIABLE
Independent Variable (Explanatory Variable):
A variable whose value does not change under the effect of other variables and is used to manipulate the dependent (target) variable. It is often denoted by X.
Dependent Variable
A variable whose value changes when there is any manipulation in the
values of independent variable. It is often denoted by Y
CASE STUDY: PREDICTING HOUSE PRICE
Size of the house (sq. ft) is the independent variable, also known as the control variable.
Price of the house is the dependent (response) variable.
BIVARIATE AND MULTIVARIATE MODEL
Bivariate or simple regression model:
Size of house (X) → Price (Y)
Multivariate or multiple regression model:
Size of house (X1), # of bedrooms (X2), Age of house (X3) → Price (Y)
SIMPLE/BIVARIATE LINEAR REGRESSION
Simple linear regression is a linear regression model with a single explanatory
variable.
It concerns two-dimensional sample points with one independent variable and one dependent variable, and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent-variable values as a function of the independent variable.
The adjective simple refers to the fact that the outcome variable is related to a
single predictor.
HOW MUCH IS MY HOUSE WORTH?
LOOK AT RECENT SALES IN MY NEIGHBORHOOD
How much did they sell for?
[Scatter plot: each recent sale is a data point (x⁽ⁱ⁾, y⁽ⁱ⁾)]
REGRESSION (HOUSE PRICE PREDICTION)
A scatter plot is a mathematical diagram that displays values of two variables for a set of data.
Size of the house (sq. ft) is the independent (control) variable; price of the house is the dependent (response) variable.
[Scatter plot: points (x⁽ⁱ⁾, y⁽ⁱ⁾), size of house on the x-axis, price on the y-axis]
Scatter plots are used to investigate the relationship between the variables.
SIMPLE LINEAR REGRESSION
House Price Prediction
We want to fit the best line (a linear function y = f(x)) to explain the data.
SIMPLE LINEAR REGRESSION
The equation that describes how the dependent variable (y) is related to the independent variable (x) is referred to as the regression equation:
y = mx + c
The simple linear regression model is:
h_θ(x) = θ0 + θ1·x
• x is the independent variable
• The parameters (regression coefficients) are θ0 (intercept) and θ1 (slope)
REGRESSION
The simple linear regression equation h_θ(x) = θ0 + θ1·x represents the relationship between input (x) and output (y).
[Plot: fitted line over the data, size of house (x) vs. house price (y)]
1. The regression equation is a straight line
2. θ0 is the intercept of the regression line
3. θ1 is the slope of the regression line
4. h_θ(x) is the hypothesis of the model
ESTIMATION PROCESS
[Diagram]
Regression equation: h_θ(x) = θ0 + θ1·x, with θ0, θ1 unknown
↓ fit on sample data (x, y)
Estimated regression equation: h_θ(x) = θ0 + θ1·x, with θ0, θ1 now known
GOAL OF REGRESSION MODEL
Our goal is to learn the model parameters that minimize the error in the model's predictions.
[Plot: fitted line h_θ(x) = θ0 + θ1·x over the data; for each point, the predicted value h_θ(x⁽ⁱ⁾) and the actual value y⁽ⁱ⁾, size of house (x) vs. house price (y)]
To find the best parameters:
Define a cost function, or loss function, that measures how inaccurate our model's predictions are.
The error on the i-th example is y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾) (equivalently h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ with the opposite sign).
[Plot: vertical offsets between each point y⁽ⁱ⁾ and the line value h_θ(x⁽ⁱ⁾)]
SIMPLE LINEAR REGRESSION
Parameters: the regression coefficients θ0 and θ1
h_θ(x) = θ0 + θ1·x
EFFECTS OF PARAMETERS ON LINE PLACEMENT
h_θ(x) = 1.5 + 0·x
h_θ(x) = 0 + 0.5·x
h_θ(x) = 1 + 0.5·x
Data: (x, y) = (1, 1), (2, 2), (3, 3)
[Plot: the three lines over the data points]
EFFECTS OF PARAMETERS ON LINE PLACEMENT
h_θ(x) = 1.5 + 0·x
h_θ(x) = 0 + 0.5·x
h_θ(x) = 1 + 0.5·x
Data: (x, y) = (1, 1), (2, 2), (3, 3)
[Plot: the three lines over the data points]
Example
Suppose x = 2.5 and h_θ(x) = 1 + 0.5·x.
Predict the outcome: h_θ(2.5) = 1 + 0.5·2.5 = 2.25
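A one-line Python check of this prediction (a sketch that simply encodes the hypothesis h_θ(x) = θ0 + θ1·x):

def h(x, theta0=1.0, theta1=0.5):
    # simple linear regression hypothesis: h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

print(h(2.5))   # 2.25, matching the worked example above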
ESTIMATION PROCESS
[Scatter plot: house price (y) vs. size of house (x)]
LEAST SQUARE METHOD
One of the most common estimation techniques for linear regression is least squares estimation.
The least squares method is a statistical procedure that finds the best fit for a set of data points by minimizing the sum of the offsets (residuals) of the points from the plotted curve.
[Plot: fitted line with residual offsets, size of house (x) vs. price (y)]
Least Squares Method
y⁽ⁱ⁾ = θ0 + θ1·x⁽ⁱ⁾ + ε⁽ⁱ⁾
ε⁽ⁱ⁾ = y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾) is the residual error in the i-th observation.
J(θ0, θ1) = (y⁽¹⁾ − h_θ(x⁽¹⁾))² + (y⁽²⁾ − h_θ(x⁽²⁾))² + (y⁽³⁾ − h_θ(x⁽³⁾))² + … (including all training houses)
So our aim is to minimize the total error:
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))²
minimize over θ0, θ1: J(θ0, θ1)   (the cost function)
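A minimal NumPy sketch of this cost function (the toy data matches the two-point example used on the next slides):

import numpy as np

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = (1/(2m)) * sum_i (y_i - h_theta(x_i))^2
    m = len(x)
    residuals = y - (theta0 + theta1 * x)
    return (residuals ** 2).sum() / (2 * m)

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])
print(cost(0.0, 1.0, x, y))   # 0.0: the line y = x fits exactly
print(cost(0.0, 1.5, x, y))   # 0.3125, as in the theta1 = 1.5 example below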
EXAMPLE
Let's take only one parameter, θ1 (so h_θ(x) = θ1·x):
J(θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))²
Goal: minimize over θ1: J(θ1)
EXAMPLE
Data: (x, y) = (1, 1), (2, 2)
h_θ(x) = θ1·x: for a fixed θ1, this is a function of x; J(θ1) is a function of θ1.
For θ1 = 1:
[Left plot: the line h_θ(x) = x passes through both data points; right plot: J(θ1) vs. θ1]
J(θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − θ1·x⁽ⁱ⁾)² = (1/(2·2))·(0² + 0²) = 0
EXAMPLE
For θ1 = 1.5:
[Left plot: the line h_θ(x) = 1.5·x over the data; right plot: J(θ1) vs. θ1]
J(θ1) = (1/(2·2))·((1 − 1.5)² + (2 − 3)²) = 1.25/4 = 0.3125
EXAMPLE
For θ1 = 0.75:
[Left plot: the line h_θ(x) = 0.75·x over the data; right plot: J(θ1) vs. θ1]
J(θ1) = (1/(2·2))·((1 − 0.75)² + (2 − 1.5)²) = 0.3125/4 ≈ 0.078
COST FUNCTION SURFACE PLOT
CONTOUR PLOT
A contour plot is also known as a level plot.
It is used to visualize the change in J(θ0, θ1) as a function of the two inputs θ0 and θ1:
J(θ0, θ1) = f(θ0, θ1)
For a function f(θ0, θ1) of two variables, assign different colors to different values of f.
Pick some values to plot; the result will be contours, curves in the graph along which the value of f(θ0, θ1) is constant.
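A short sketch (assuming NumPy and matplotlib are available; the two-point data is the toy example from earlier slides) of how such a contour plot of J(θ0, θ1) can be drawn:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])

# Evaluate J(theta0, theta1) on a grid of parameter values.
t0, t1 = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-1, 3, 100))
J = np.zeros_like(t0)
for xi, yi in zip(x, y):
    J += (yi - (t0 + t1 * xi)) ** 2
J /= 2 * len(x)

plt.contour(t0, t1, J, levels=20)   # curves along which J is constant
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()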
EXAMPLE
[Left: h_θ(x), for fixed θ0, θ1, a function of x; right: J(θ0, θ1), a function of the parameters θ0, θ1]
EXAMPLE
[Left: h_θ(x), for fixed θ0, θ1, a function of x; right: J(θ0, θ1), a function of the parameters θ0, θ1]
EXAMPLE
[Left: h_θ(x), for fixed θ0, θ1, a function of x; right: J(θ0, θ1), a function of the parameters θ0, θ1]
SUMMARY
Hypothesis: h_θ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))²
Goal: minimize over θ0, θ1: J(θ0, θ1)
CONVEX AND CONCAVE FUNCTION
Convex function: g″(z) ≥ 0; the slope is 0 at the (unique) minimum.
Concave function: g″(z) < 0; the slope is 0 at the (unique) maximum.
[Plots: a convex and a concave function g(z) over an interval [a, b]]
Example
g(z) = 5 − (z − 10)²
dg(z)/dz = −2(z − 10) = −2z + 20
Set dg(z)/dz = 0 ⇒ z = 10
FINDING MAXIMUM VIA HILL CLIMBING
At the maximum max g(θ), the derivative is 0.
How do we know whether to move θ to the right or to the left (increase or decrease θ)?
If dg(θ)/dθ > 0 (positive slope), increase θ; if dg(θ)/dθ < 0 (negative slope), decrease θ.
While not converged:
  θ⁽ᵗ⁺¹⁾ ← θ⁽ᵗ⁾ + α·dg(θ⁽ᵗ⁾)/dθ
where t is the iteration and α is the step size.
FINDING MINIMUM VIA HILL DESCENT
At the minimum min g(θ), the derivative is 0.
When the derivative is positive, we want to decrease θ; when the derivative is negative, we want to increase θ.
[Plot: g(θ) with negative slope (dg(θ)/dθ < 0) left of the minimum and positive slope (dg(θ)/dθ > 0) right of it]
While not converged:
  θ⁽ᵗ⁺¹⁾ ← θ⁽ᵗ⁾ − α·dg(θ⁽ᵗ⁾)/dθ
where t is the iteration and α is the step size.
STEP SIZE/LEARNING RATE (𝛼)
With a fixed learning rate, we slowly reach the optimum position.
STEP SIZE/LEARNING RATE (𝛼)
With a fixed learning rate:
Small step size: advantage: will converge to the global optimum; disadvantage: slow convergence.
Large step size: advantage: moves fast toward the optimum; disadvantage: may overshoot the optimum point.
STEP SIZE/LEARNING RATE (𝛼)
Decreasing step size: the step size is scheduled. Common choices:
α_t = β/t
α_t = β/√t
CONVERGENCE CRITERIA
For a convex function, the optimum occurs when
dg(θ)/dθ = 0
In practice, stop when
|dg(θ)/dθ| < ε
While not converged:
  θ⁽ᵗ⁺¹⁾ ← θ⁽ᵗ⁾ − α·dg(θ⁽ᵗ⁾)/dθ
where t is the iteration and α is the step size.
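Putting the update rule and the stopping criterion together, a minimal sketch for the convex function g(θ) = (θ − 10)² (chosen here only for illustration):

def dg(theta):
    # derivative of g(theta) = (theta - 10)^2
    return 2.0 * (theta - 10.0)

theta, alpha, eps = 0.0, 0.1, 1e-6
while abs(dg(theta)) >= eps:           # stop when |dg/dtheta| < epsilon
    theta = theta - alpha * dg(theta)  # hill descent: step against the slope
print(round(theta, 4))                 # ~10.0, the minimizer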
FINDING THE LEAST SQUARES LINE
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))²
The cost function is convex in θ0, θ1, so the solution is unique and gradient descent will converge to the minimum.
Goal: minimize over θ0, θ1: J(θ0, θ1)
COMPUTE THE GRADIENT
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))², with h_θ(x⁽ⁱ⁾) = θ0 + θ1·x⁽ⁱ⁾
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))²
∂J(θ0, θ1)/∂θ0 = ∂/∂θ0 [(1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))²]
              = (1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))·(−1)
∂J(θ0, θ1)/∂θ1 = ∂/∂θ1 [(1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))²]
              = (1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))·(−x⁽ⁱ⁾)
COMPUTE THE GRADIENT
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))²
Putting it together:
∇J(θ0, θ1) = [ −(1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾)) ,
               −(1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))·x⁽ⁱ⁾ ]
APPROACH 1 : SET GRADIENT = 0
Set ∇J(θ0, θ1) = [ −(1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾)) ,
                   −(1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − (θ0 + θ1·x⁽ⁱ⁾))·x⁽ⁱ⁾ ] = 0
Top term:
θ0 = (Σ y⁽ⁱ⁾)/m − θ1·(Σ x⁽ⁱ⁾)/m
Bottom term:
Σ y⁽ⁱ⁾x⁽ⁱ⁾ − θ0·Σ x⁽ⁱ⁾ − θ1·Σ (x⁽ⁱ⁾)² = 0
⇒ θ1 = [Σ y⁽ⁱ⁾x⁽ⁱ⁾ − (Σ y⁽ⁱ⁾)(Σ x⁽ⁱ⁾)/m] / [Σ (x⁽ⁱ⁾)² − (Σ x⁽ⁱ⁾)²/m]
QUESTION 1
Find the least squares regression line for the following data.
Also estimate the value of y when x = 10.
X Y
0 2
1 3
2 5
3 4
4 6
SOLUTION
h_θ(x) = 2.2 + 0.9·x
At x = 10:
h_θ(10) = 2.2 + 0.9·10 = 11.2
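As a sanity check, a NumPy sketch of the closed-form estimates from Approach 1, applied to the Question 1 data:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
m = len(x)

# Normal-equation solutions obtained by setting the gradient to zero.
theta1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / m) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
theta0 = np.mean(y) - theta1 * np.mean(x)

print(theta0, theta1)         # 2.2 0.9
print(theta0 + theta1 * 10)   # 11.2, the estimate at x = 10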
APPROACH 2: GRADIENT DESCENT
Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function that minimize a cost function.
GRADIENT DESCENT
The gradient descent algorithm produces estimated parameters (intercept and slope), which are used to form predictions.
Have some function
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))², with h_θ(x⁽ⁱ⁾) = θ0 + θ1·x⁽ⁱ⁾
Goal: minimize over θ0, θ1: J(θ0, θ1)
Outline:
Start with some θ0, θ1.
Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum.
While not converged {
  for j = 0 to 1:
    θj := θj − α·∂J(θ0, θ1)/∂θj
}
GRADIENT DESCENT ALGORITHM
When the slope of the line is negative, ∂J(θ1)/∂θ1 < 0:
θ1 := θ1 − α·(negative value)
⇒ increase the value of θ1 by some quantity.
GRADIENT DESCENT ALGORITHM
When the slope of the line is positive, ∂J(θ1)/∂θ1 > 0:
θ1 := θ1 − α·(positive value)
⇒ decrease the value of θ1 by some quantity.
GRADIENT DESCENT ALGORITHM
When the slope of the line is 0, ∂J(θ1)/∂θ1 = 0:
θ1 := θ1 − α·0
⇒ no change.
GRADIENT DESCENT ALGORITHM
While not converged {
  θ0 := θ0 + α·(1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))
  θ1 := θ1 + α·(1/m) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))·x⁽ⁱ⁾
}
LINEAR REGRESSION WITH GRADIENT DESCENT
Linear regression model:
h_θ(x⁽ⁱ⁾) = θ0 + θ1·x⁽ⁱ⁾
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾))²
Gradient descent algorithm:
While not converged {
  for j = 0 to 1:
    θj := θj − α·∂J(θ0, θ1)/∂θj
}
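A minimal NumPy sketch of this loop, run on the Question 1 data (the learning rate and iteration count are illustrative choices, not values from the slides):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

theta0, theta1, alpha = 0.0, 0.0, 0.05
for _ in range(20000):
    err = y - (theta0 + theta1 * x)        # y_i - h_theta(x_i)
    theta0 += alpha * err.mean()           # update rule for theta0
    theta1 += alpha * (err * x).mean()     # update rule for theta1

print(round(theta0, 3), round(theta1, 3))  # ~2.2 0.9, matching the closed form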
GRADIENT DESCENT ALGORITHM
Types of Gradient Descent Algorithm
Stochastic gradient descent (SGD):
SGD randomly picks one data point from the whole dataset at each iteration.
Batch gradient descent:
Every step of gradient descent uses all the training examples.
Mini-batch gradient descent:
A balance between the robustness of batch gradient descent and the speed of SGD; it samples a small number of data points, instead of just one, at each step (see the sketch below).
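A sketch of the mini-batch variant on the same toy data (batch size, learning rate, and epoch count are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

theta0, theta1, alpha, batch = 0.0, 0.0, 0.05, 2
for epoch in range(5000):
    order = rng.permutation(len(x))          # shuffle once per epoch
    for start in range(0, len(x), batch):
        idx = order[start:start + batch]     # a small random sample of points
        err = y[idx] - (theta0 + theta1 * x[idx])
        theta0 += alpha * err.mean()
        theta1 += alpha * (err * x[idx]).mean()

print(theta0, theta1)   # approximately 2.2 and 0.9 (mini-batch noise makes it approximate)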
COEFFICIENT OF DETERMINATION (r²)
Quantifies the goodness of a fit.
r² is a measure of how closely each data point fits the regression line.
In other words, it represents the fraction of variance in the dependent variable (response) that is explained by the regression model.
R-squared is a way of measuring how much better than the mean line you have done, based on the summed squared error.
Our objective is to do better than the mean: a good regression line gives a lower sum of squared errors than the horizontal (mean) line.
Ideally you would have zero regression error, i.e. your regression line would perfectly match the data; in that case you would get an r-squared value of 1.
SS_Regression = Σ (y_i − y_regression)²   (residual: actual minus predicted)
SS_Total = Σ (y_i − ȳ)²
SS_Explained = Σ (y_regression − ȳ)²
r² = 1 − SS_Regression / SS_Total
[Plot: an actual point y_i, its fitted value y_regression, the mean line ȳ, and the intercept]
EXAMPLE
Regression line: y = 6x − 5
x    y    (y − ȳ)²    ŷ = 6x − 5    y − ŷ    (y − ŷ)²
0    0      169           −5          5        25
1    1      144            1          0         0
2    4       81            7         −3         9
3    9       16           13         −4        16
4   16        9           19         −3         9
5   25      144           25          0         0
6   36      529           31          5        25
Average ȳ = 13
SS_Total = Σ(y − ȳ)² = 1092; SS_Regression = Σ(y − ŷ)² = 84
R-squared = 1 − 84/1092 ≈ 0.923
Source: [Link]
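A quick NumPy check of this r² computation (using r² = 1 − SS_Regression / SS_Total as above):

import numpy as np

x = np.arange(7.0)          # x = 0 .. 6
y = x ** 2                  # y = 0, 1, 4, 9, 16, 25, 36, the table's Y column
y_hat = 6 * x - 5           # the regression line y = 6x - 5

ss_total = np.sum((y - y.mean()) ** 2)   # 1092
ss_resid = np.sum((y - y_hat) ** 2)      # 84
print(1 - ss_resid / ss_total)           # ~0.923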