Gradient Descent Algorithm and Back-Propagation Derivation
The Gradient Descent (GD) algorithm can be used to estimate the values of regression parameters, given a dataset with inputs and outputs.
The functional form of a simple linear regression model is given by:
\[
Y_i = b_0 + b_1 X_i + e_i \qquad (6.2)
\]
where $b_0$ is called the bias or intercept, $b_1$ is the feature weight or regression coefficient, and $e_i$ is the error in prediction.
The predicted value of $Y_i$ is written as $\hat{Y}_i$, and it is given by:
\[
\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i
\]
The cost function for the linear regression model is the total error (mean squared error) across all $N$ records and is given by:
\[
\text{MSE} = \frac{1}{2N} \sum_{i=1}^{N} \left( Y_i - \hat{b}_0 - \hat{b}_1 X_i \right)^2 \qquad (6.5)
\]
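A minimal sketch of this cost function in Python, assuming NumPy arrays for the inputs and outputs; the function name `mse_cost` is illustrative, not from the text:

```python
import numpy as np

def mse_cost(X, Y, b0, b1):
    """Cost from equation (6.5): squared errors averaged over N records,
    with the conventional 1/2 factor that simplifies the gradients."""
    residuals = Y - b0 - b1 * X          # Y_i - b0_hat - b1_hat * X_i
    return np.sum(residuals ** 2) / (2 * len(Y))
```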
The error is a function of $b_0$ and $b_1$. It is a convex function and has a global minimum, as shown in Figure 6.1. The gradient descent algorithm starts from random initial values of $b_0$ and $b_1$ and moves toward the optimal solution.
Gradient descent finds the optimal values of b0 and b1 that minimize the loss function using
the following steps:
1. Randomly guess the initial values of b0 (bias or intercept) and b1 (feature weight).
2. Calculate the estimated value of the outcome variable, $\hat{Y}_i$, for the initialized values of the bias and weights.
3. Calculate the error (MSE) between the actual and estimated values of the outcome variable.
4. Adjust the $b_0$ and $b_1$ values using the gradients of the error function:
\[
b_0 := b_0 - \alpha \, \frac{\partial \text{MSE}}{\partial b_0} \qquad (6.6)
\]
\[
b_1 := b_1 - \alpha \, \frac{\partial \text{MSE}}{\partial b_1} \qquad (6.7)
\]
where $\alpha$ is the learning rate (a hyperparameter). The value of $\alpha$ determines the magnitude of the update applied to the bias and weights at each iteration.
The partial derivatives of MSE with respect to $b_0$ and $b_1$ are given by:
\[
\frac{\partial \text{MSE}}{\partial b_0} = -\frac{1}{N} \sum_{i=1}^{N} (Y_i - \hat{Y}_i) \qquad (6.8)
\]
\[
\frac{\partial \text{MSE}}{\partial b_1} = -\frac{1}{N} \sum_{i=1}^{N} (Y_i - \hat{Y}_i) \, X_i \qquad (6.9)
\]
5. Repeat steps 2 to 4 for several iterations until the error stops reducing or the change in cost becomes infinitesimally small.
The values of $b_0$ and $b_1$ at the minimum of the cost function are the best estimates of the model parameters.
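A minimal sketch of these steps in Python using NumPy. The function name `gradient_descent`, the default learning rate, and the iteration count are illustrative assumptions; the gradients follow equations (6.8) and (6.9).

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.5, iterations=5000):
    """Estimate b0 and b1 for Y = b0 + b1*X by gradient descent."""
    b0, b1 = np.random.randn(), np.random.randn()   # step 1: random initialization
    for _ in range(iterations):
        Y_hat = b0 + b1 * X                         # step 2: predicted values
        error = Y - Y_hat                           # step 3: prediction errors
        grad_b0 = -np.mean(error)                   # equation (6.8)
        grad_b1 = -np.mean(error * X)               # equation (6.9)
        b0 -= alpha * grad_b0                       # equation (6.6)
        b1 -= alpha * grad_b1                       # equation (6.7)
    return b0, b1

# Illustrative check on synthetic data generated with b0 = 2, b1 = 3
X = np.linspace(0, 1, 100)
Y = 2 + 3 * X + 0.1 * np.random.randn(100)
print(gradient_descent(X, Y))   # estimates close to (2, 3)
```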
The same gradient descent principle underlies the back-propagation training rule for a neural network. For each training example $d$, every weight $w_{ji}$ (the weight on the connection from input $i$ to unit $j$) is updated by adding to it $\Delta w_{ji}$:
\[
\Delta w_{ji} = -\eta \, \frac{\partial E_d}{\partial w_{ji}} \qquad (1)
\]
where $\eta$ is the learning rate and $E_d$ is the error on training example $d$, summed over all output units in the network:
\[
E_d(\vec{w}) = \frac{1}{2} \sum_{k \in \text{outputs}} (t_k - o_k)^2 \qquad (2)
\]
• outputs → Set of output units in the network.
• $t_k$ → Target output of unit $k$ for training example $d$.
• $o_k$ → Actual output of unit $k$ for training example $d$.
By the chain rule, the derivative in (1) can be expressed in terms of $net_j = \sum_i w_{ji} x_i$, the weighted sum of inputs to unit $j$, where $x_i$ is the $i$-th input to unit $j$:
\[
\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \, x_i \qquad (3)
\]
For an output unit $j$, the first factor can again be split by the chain rule:
\[
\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} \qquad (4)
\]
Consider the first term in (4). The derivative
\[
\frac{\partial}{\partial o_j} (t_k - o_k)^2
\]
will be zero for all output units $k$ except when $k = j$. Hence, we drop the summation over output units and simply set $k = j$:
\[
\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j} \frac{1}{2} (t_j - o_j)^2
= \frac{1}{2} \cdot 2 (t_j - o_j) \cdot \frac{\partial (t_j - o_j)}{\partial o_j}
= -(t_j - o_j) \qquad (5)
\]
Now, consider the second term in (4). Since $o_j = \sigma(net_j)$,
\[
\frac{\partial o_j}{\partial net_j} = \frac{d\,\sigma(net_j)}{d(net_j)} = o_j (1 - o_j) \qquad (6)
\]
Substituting equations (5) and (6) into (4), we obtain:
\[
\frac{\partial E_d}{\partial net_j} = -(t_j - o_j) \, o_j (1 - o_j)
\]
Now, using equation (3), the weight update becomes:
\[
\Delta w_{ji} = -\eta \, \frac{\partial E_d}{\partial w_{ji}} = \eta \, (t_j - o_j) \, o_j (1 - o_j) \, x_i \qquad (7)
\]
This is the final training rule for output-layer weights in backpropagation using gradient
descent with a sigmoid activation.
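A minimal sketch of rule (7) for a single output unit in Python; the array names `w`, `x`, `t` and the single-unit setup are illustrative assumptions, not notation from the text:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def update_output_weights(w, x, t, eta=0.1):
    """Apply rule (7) to the weight vector w feeding one sigmoid output unit.
    x: inputs to the unit, t: target output, eta: learning rate."""
    o = sigmoid(np.dot(w, x))            # forward pass: o_j = sigma(net_j)
    delta = (t - o) * o * (1 - o)        # (t_j - o_j) * o_j * (1 - o_j)
    return w + eta * delta * x           # Delta w_ji = eta * delta_j * x_i
```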
For the hidden-unit weights, a similar derivation applies. For a hidden unit $j$, the error $E_d$ depends on $net_j$ only through the units in $downstream(j)$, the set of units whose inputs include the output of unit $j$. Writing $\delta_k$ for $-\frac{\partial E_d}{\partial net_k}$ and applying the chain rule:
\[
\frac{\partial E_d}{\partial net_j}
= \sum_{k \in downstream(j)} \frac{\partial E_d}{\partial net_k} \cdot \frac{\partial net_k}{\partial net_j}
= \sum_{k \in downstream(j)} -\delta_k \cdot \frac{\partial net_k}{\partial net_j}
\]
\[
= \sum_{k \in downstream(j)} -\delta_k \cdot \frac{\partial net_k}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j}
= \sum_{k \in downstream(j)} -\delta_k \cdot w_{kj} \cdot \frac{\partial o_j}{\partial net_j}
= \sum_{k \in downstream(j)} -\delta_k \cdot w_{kj} \cdot o_j (1 - o_j)
\]
Using the notation $\delta_j$ to denote $-\frac{\partial E_d}{\partial net_j}$, we have:
\[
\delta_j = o_j (1 - o_j) \sum_{k \in downstream(j)} \delta_k \, w_{kj}
\]
And the final weight update rule for the hidden units becomes:
\[
\Delta w_{ji} = \eta \, \delta_j \, x_i
\]
This is the standard backpropagation rule for updating weights connected to hidden units
in a neural network.
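A minimal sketch of one full back-propagation step for a network with a single hidden layer, combining rule (7) for the output weights with the hidden-unit rule above. The layer shapes, array names, omission of bias terms, and the vectorized form are illustrative assumptions:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, t, W_hidden, W_out, eta=0.1):
    """One gradient-descent update for a one-hidden-layer sigmoid network.
    W_hidden: (n_hidden, n_in) weights, W_out: (n_out, n_hidden) weights."""
    # Forward pass
    o_hidden = sigmoid(W_hidden @ x)      # hidden outputs o_j
    o_out = sigmoid(W_out @ o_hidden)     # network outputs o_k

    # Output-unit deltas: delta_k = (t_k - o_k) * o_k * (1 - o_k)
    delta_out = (t - o_out) * o_out * (1 - o_out)

    # Hidden-unit deltas: delta_j = o_j (1 - o_j) * sum_k delta_k * w_kj
    delta_hidden = o_hidden * (1 - o_hidden) * (W_out.T @ delta_out)

    # Weight updates: Delta w_ji = eta * delta_j * x_i
    W_out = W_out + eta * np.outer(delta_out, o_hidden)
    W_hidden = W_hidden + eta * np.outer(delta_hidden, x)
    return W_hidden, W_out
```

Note that the hidden-unit deltas reuse the output-unit deltas through the weights `W_out`, which is exactly the summation over $downstream(j)$ in the derivation.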