ML02

1) The document introduces linear regression with one variable (univariate linear regression) for solving supervised learning problems. In supervised learning, the training set provides examples of the "right answer" or target variable y for each value of the input or feature x. 2) Linear regression finds the best-fitting straight line to describe the relationship between x and y. This straight line is represented by a hypothesis function hθ(x) = θ0 + θ1x, where θ0 and θ1 are parameters to be estimated. 3) The parameters θ0 and θ1 are estimated by minimizing a cost function J(θ0, θ1), which measures the total deviation between the predicted values hθ(x) and the actual values y over the training set; the minimization is carried out with the gradient descent algorithm.



Introduction to
Machine Learning
Dr. Muhammad Amjad Iqbal
Associate Professor
University of Central Punjab, Lahore.
[email protected]
Slides adapted from Prof. Dr. Andrew Ng (Stanford) and Dr. Humayoun

Lecture 2:
Supervised Learning
Linear regression with one variable
Reading:
• Chapter 17, "Bayesian Reasoning and Machine Learning", pages 345–348
• Chapter 3, "Pattern Recognition and Machine Learning" by Christopher M. Bishop, page 137
• Chapter 11, "Data Mining: A Knowledge Discovery Approach", from page 346
• Chapter 18, "Artificial Intelligence: A Modern Approach", from page 718

Model representation


[Figure: Housing Prices (Portland, OR): Price (in 1000s of dollars) plotted against Size (feet²).]
Supervised Learning: the "right answer" is given for each example in the data.
Regression Problem: predict a real-valued output.
Classification Problem: predict a discrete-valued output.

Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y) – one training example
(x(i), y(i)) – the iᵗʰ training example, where i is an index into the training set
For the table above: x(1) = 2104, x(3) = 1534, y(2) = 232, y(4) = 178
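As a minimal sketch (not part of the slides; names are illustrative), the training set and notation can be written directly in NumPy:

import numpy as np

# Training set from the table above: size in feet^2 (x) and price in $1000s (y)
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

m = len(x)           # m = number of training examples (here, 4)
print(x[0], y[1])    # x(1) = 2104 and y(2) = 232 (1-indexed on the slide, 0-indexed here)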

How do we represent h?

hθ(x) = θ0 + θ1x        (shorthand: h(x))

Training Set → Learning Algorithm → h (the hypothesis)
Size of house (x) → h → Estimated price (estimated value of y)

h is a function that maps from x's to y's.
Because the model is linear and uses a single input variable, this is linear regression with one variable, also called univariate linear regression.
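A minimal sketch of this hypothesis in Python (illustrative, not from the slides):

def h(x, theta0, theta1):
    # Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

# With theta0 = 50 and theta1 = 0.06 (values sketched later in the lecture),
# a 2104 ft^2 house is predicted at about 50 + 0.06 * 2104 ≈ 176 (in $1000s).
print(h(2104, 50, 0.06))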


In summary
• A hypothesis h takes in some variable(s)
• Uses parameters determined by a learning system
• Outputs a prediction based on that input

Cost function
• A cost function lets us figure out how to fit the best straight line to our data

Training Set:

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: hθ(x) = θ0 + θ1x
θ0, θ1: parameters
How do we choose θ0, θ1?


Different parameter values give different functions:

• θ0 = 1.5, θ1 = 0:    h(x) = 1.5 + 0·x   (a horizontal line)
• θ0 = 0,   θ1 = 0.5:  h(x) = 0 + 0.5·x
• θ0 = 1,   θ1 = 0.5:  h(x) = 1 + 0.5·x

The line has a positive slope if θ1 > 0.

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

• hθ(x) is a "y imitator": it tries to convert the x into y
• Since we already have the true y values, we can evaluate how well hθ(x) does this

We want minimal deviation of hθ(x) from y.

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples.
This is a minimization problem.


hθ(x(i)) = θ0 + θ1x(i)

Cost function:
J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1

J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))²

• This cost function is also called the squared error cost function
  – A reasonable choice for most regression problems
  – Probably the most commonly used cost function
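A short NumPy sketch of this squared error cost (illustrative, not from the slides):

import numpy as np

def compute_cost(x, y, theta0, theta1):
    # J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)
    m = len(x)
    predictions = theta0 + theta1 * x        # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)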

Cost function intuition I

Simplified version (set θ0 = 0, so the line passes through the origin):
Hypothesis:     hθ(x) = θ1x
Parameter:      θ1
Cost function:  J(θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))²
Goal:           minimize J(θ1) over θ1


(Left plot: hθ(x) for a fixed θ1, as a function of x. Right plot: J(θ1), as a function of the parameter θ1.)

For the toy training set (1, 1), (2, 2), (3, 3), so m = 3:

J(θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))² = (1/2m) Σᵢ₌₁ᵐ (θ1x(i) − y(i))²

For θ1 = 1:
J(1) = (1/(2×3)) · (0² + 0² + 0²) = 0
For θ1 = 0.5:
J(0.5) = (1/(2×3)) · [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] = (1/6) · (0.25 + 1 + 2.25) = 3.5/6 ≈ 0.58

For θ1 = 0:
J(0) = (1/(2×3)) · [(0×1 − 1)² + (0×2 − 2)² + (0×3 − 3)²] = (1/6) · (1 + 4 + 9) = 14/6 ≈ 2.3
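These values can be checked numerically with a few lines of NumPy (the toy data (1, 1), (2, 2), (3, 3) comes from the plots above; the code itself is only a sketch):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta1):
    # simplified cost with theta0 fixed at 0: J(theta1) = 1/(2m) * sum((theta1*x - y)^2)
    m = len(x)
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

print(J(1.0))    # 0.0
print(J(0.5))    # ~0.58
print(J(0.0))    # ~2.33
print(J(-0.5))   # ~5.25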


For θ1 = −0.5:
J(−0.5) = (1/(2×3)) · [(−0.5 − 1)² + (−1 − 2)² + (−1.5 − 3)²] = 31.5/6 ≈ 5.25

• If we compute J(θ1) for a range of θ1 values and plot J(θ1) vs. θ1, we get a polynomial that looks like a quadratic (a bowl-shaped curve).

Cost function intuition II

Back to the full version:
Hypothesis:     hθ(x) = θ0 + θ1x
Parameters:     θ0, θ1
Cost function:  J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))²
Goal:           minimize J(θ0, θ1) over θ0, θ1


(Left: hθ(x) for fixed θ0, θ1, as a function of x. Right: J(θ0, θ1), as a function of both parameters.)

[Figure: Price ($ in 1000's) vs. Size in feet² (x), with the hypothesis line for θ0 = 50, θ1 = 0.06 drawn through the housing data. What does J(θ0, θ1) look like for these parameter values?]


(Left: hθ(x) as a function of x. Right: J(θ0, θ1) drawn as a contour plot over the θ0, θ1 plane.)


[Figures: further examples pairing a hypothesis line hθ(x) with the corresponding point on the contour plot of J(θ0, θ1).]


• Doing this manually is painful
• What we really want is an efficient algorithm that finds the θ0 and θ1 minimizing J

Gradient descent algorithm

• Minimizes the cost function J
• Used all over machine learning for minimization

Gradient descent algorithm

Have some function J(θ0, θ1).
Want to minimize J(θ0, θ1) over θ0, θ1.

Outline:
• Start with some initial θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum


• Local search methods for optimization:
  – hill climbing, simulated annealing, the gradient descent algorithm, etc.

Local Search Methods

• Applicable when we are seeking a goal state and don't care how we get there. E.g.:
  – N-queens
  – finding shortest/cheapest round trips (Travelling Salesman Problem, Vehicle Routing Problem)
  – finding models of propositional formulae (SAT solvers)
  – VLSI layout, planning, scheduling, time-tabling, …
  – map coloring
  – resource allocation
  – protein structure prediction
  – genome sequence assembly

Local search

Key idea (surprisingly simple):
1. Select a (random) initial state (generate an initial guess)
2. Make local modifications to improve the current state (evaluate the current state and move to other states)
3. Repeat step 2 until a goal state is found (or we run out of time)



J(0,1)

1
0

35

J(0,1)

1
0

36


Gradient descent algorithm

repeat until convergence {
  θj := θj − α · ∂/∂θj J(θ0, θ1)     (for j = 0 and j = 1)
}

∂/∂θj J(θ0, θ1) is the derivative term.
α is the learning rate and should be a small number:
• large α := huge steps
• small α := baby steps

Gradient descent algorithm

Correct (simultaneous update of θ0, θ1):
  temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
  temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect: updating θ0 first and then using the new θ0 when computing the update for θ1.
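A small Python sketch of why the temporaries matter (the two partial-derivative functions are placeholders; their concrete form for linear regression is derived later):

def gradient_descent_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    # Both temporaries are computed from the *old* theta0, theta1 before either is overwritten.
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1    # simultaneous update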

Gradient descent intuition

• To understand the intuition, we'll return to the simpler case where we minimize over a single parameter:
  θ1 := θ1 − α · d/dθ1 J(θ1),   where θ1 ∈ ℝ


Two key terms in the algorithm
• the derivative term
• the learning rate α

Partial derivative vs. derivative

• Use a partial derivative (∂) when the function has multiple variables but we differentiate with respect to only one
• Use an ordinary derivative (d) when we differentiate with respect to all the variables (here, the single variable θ1)

θ1 := θ1 − α · d/dθ1 J(θ1)

If the slope is positive, d/dθ1 J(θ1) ≥ 0, then
  θ1 := θ1 − α · (positive number)
so θ1 decreases and moves toward the minimum.

The derivative takes the tangent to the curve at the point (the straight red line in the figure) and calculates the slope of that tangent line. Slope = vertical change / horizontal change.

θ1 := θ1 − α · d/dθ1 J(θ1)

If the slope is negative, d/dθ1 J(θ1) ≤ 0, then
  θ1 := θ1 − α · (negative number)
so θ1 increases and moves toward the minimum.


Slope
• Familiar meaning?
• The slope of a line is the change in y divided by the change in x.
• Slope: m = Δy/Δx = (y2 − y1)/(x2 − x1) = rise/run
• Pick any two points on the line: (x1, y1), (x2, y2)
• Example: find the slope of the line which passes through the points (2, 5) and (0, 1):
  m = (5 − 1)/(2 − 0) = 4/2 = 2, which is a positive number
• Meaning: every time x increases by 1 (anywhere on the line), y increases by 2, and whenever x decreases by 1, y decreases by 2.

Positive slope (i.e. m > 0)

• y always increases when x increases, and y always decreases when x decreases.
• The graph of the line starts at the bottom left and goes towards the top right.

Negative slope (i.e. m < 0)

Example: m = (3 − (−1)) / (−2 − 1) = 4 / (−3) ≈ −1.33

• y always decreases when x increases, and y always increases when x decreases.


Horizontal and Vertical Lines

• The slope of any horizontal line is 0: m = 0/Δx = 0
• The slope of any vertical line is undefined (Δx = 0, so the ratio involves division by zero)

The derivative can take a:
• positive value
• negative value
• zero value

At each point, the (red) line is tangent to the curve, and its slope is the derivative at that point.

α: Learning Rate

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge.
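A toy sketch (an assumed example, not from the slides) on the one-parameter cost J(θ) = θ², whose gradient is 2θ, showing both failure modes:

def step(theta, alpha):
    return theta - alpha * 2 * theta     # gradient of J(theta) = theta**2 is 2*theta

for alpha in (0.01, 1.1):                # too small vs. too large
    theta = 1.0
    for _ in range(20):
        theta = step(theta, alpha)
    print(alpha, theta)                  # alpha=0.01 creeps slowly toward 0; alpha=1.1 diverges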


Question: what happens when you reach a local minimum?

At a local optimum, the current value of the derivative term is 0, so the update becomes
  θ1 := θ1 − α · 0
and θ1 remains the same.

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, the derivative term shrinks, so gradient descent will automatically take smaller steps. So there is no need to decrease α over time.

Gradient descent for linear regression

Gradient descent algorithm:
  repeat until convergence { θj := θj − α · ∂/∂θj J(θ0, θ1)   for j = 0, 1 }

Linear regression model:
  hθ(x) = θ0 + θ1x
  J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))²


Working out the derivative term for the squared error cost:

∂/∂θj J(θ0, θ1) = ∂/∂θj [ (1/2m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))² ]
               = ∂/∂θj [ (1/2m) Σᵢ₌₁ᵐ (θ0 + θ1x(i) − y(i))² ]

j = 0:  ∂/∂θ0 J(θ0, θ1) = (1/m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))
j = 1:  ∂/∂θ1 J(θ0, θ1) = (1/m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i)) · x(i)

Gradient descent algorithm

repeat until convergence {
  θ0 := θ0 − α · (1/m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i))
  θ1 := θ1 − α · (1/m) Σᵢ₌₁ᵐ (hθ(x(i)) − y(i)) · x(i)
}

Update θ0 and θ1 simultaneously.
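A compact NumPy sketch of this update rule, reusing the housing numbers from the earlier table (the feature scaling, learning rate, and iteration count are illustrative choices, not from the slides):

import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
x = (x - x.mean()) / x.std()                     # scale the feature so this toy run stays stable

theta0, theta1 = 0.0, 0.0
alpha, iterations = 0.1, 1000
m = len(x)

for _ in range(iterations):
    error = (theta0 + theta1 * x) - y            # h_theta(x(i)) - y(i) for all i
    temp0 = theta0 - alpha * error.mean()        # alpha * (1/m) * sum of errors
    temp1 = theta1 - alpha * (error * x).mean()  # alpha * (1/m) * sum of errors * x(i)
    theta0, theta1 = temp0, temp1                # simultaneous update

print(theta0, theta1)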


J(0,1)

1
0

55

J(0,1)

1
0

56

57


[Figures: a sequence of gradient descent iterations; at each step the hypothesis line hθ(x) fits the housing data better, while the corresponding point on the contour plot of J(θ0, θ1) moves toward the minimum.]






Linear Regression with One Variable

• Writing the line as y = a + bx, a is the y-intercept while b is the slope
• The error being minimized is the SSE (sum of squared errors)
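For reference, the intercept a and slope b that minimize the SSE also have a standard closed form; this is a sketch of that textbook formula, not something spelled out on the slides:

import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)   # slope
a = y.mean() - b * x.mean()                                                 # y-intercept
print(a, b)    # the same line that gradient descent converges to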


Another name: "Batch" Gradient Descent
"Batch": each step of gradient descent uses all the training examples.

Another algorithm that solves the same minimization problem: the normal equations method.

The gradient descent algorithm scales better than the normal equations method to larger datasets.
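As a sketch of that alternative, the normal equation θ = (XᵀX)⁻¹Xᵀy solves for both parameters in a single step (a standard closed form, assumed here rather than taken from the slides):

import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

X = np.column_stack([np.ones_like(x), x])     # add a column of 1s for theta0
theta = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations: (X^T X) theta = X^T y
print(theta)                                  # [theta0, theta1]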


Generalization of the gradient descent algorithm
• Learn with a larger number of features.
• Difficult to plot (more than two parameters).

With multiple features, the training data becomes a matrix whose columns are the features (size, number of bedrooms, number of floors, age of home), and the prices, shown as y, are collected in a single vector.

• We need linear algebra for more complex linear regression models
• Linear algebra is good for making computationally efficient models (we'll see this later)
  – It provides a good way to work with large data sets
  – Typically, vectorization of a problem is a common optimization technique


End

