Deep Learning Basics
Lecture 2: Backpropagation
Princeton University COS 495
Instructor: Yingyu Liang
How to train the dragon?
[Figure: a feedforward network mapping input 𝑥 through hidden layers ℎ1, ℎ2, …, ℎ𝐿 to output 𝑦]
How to get the expected output
• The network computes 𝑓𝜃(𝑥); on the current parameters the loss 𝑙(𝑥; 𝜃) ≠ 0
• Goal: find an update 𝑑 to the parameters so that the loss 𝑙(𝑥; 𝜃 + 𝑑) ≈ 0
How to get the expected output
• How to find 𝑑: 𝑙(𝑥; 𝜃 + 𝜖𝑣) ≈ 𝑙(𝑥; 𝜃) + 𝛻𝑙(𝑥; 𝜃) ⋅ 𝜖𝑣 for a small scalar 𝜖
• Want the updated loss 𝑙(𝑥; 𝜃 + 𝑑) ≈ 0
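As a quick numerical check of this first-order approximation (an illustration added here, not from the slides); the toy quadratic loss, the direction 𝑣, and the particular numbers are arbitrary choices:

```python
import numpy as np

def l(theta):
    # Toy quadratic loss over a 2-dimensional parameter vector
    return 0.5 * np.sum((theta - np.array([1.0, -2.0])) ** 2)

def grad_l(theta):
    # Its gradient
    return theta - np.array([1.0, -2.0])

theta = np.array([0.0, 0.0])
v = np.array([1.0, 1.0])        # an arbitrary direction
eps = 1e-3                      # small scalar
lhs = l(theta + eps * v)
rhs = l(theta) + grad_l(theta) @ (eps * v)   # first-order approximation
print(lhs, rhs)                 # the two values nearly agree
```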
How to get the expected output
• Conclusion: move 𝜃 along −𝛻𝑙(𝑥; 𝜃) for a small amount to decrease the loss 𝑙(𝑥; 𝜃 + 𝑑)
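One step of that conclusion in code, a minimal sketch on a toy loss (the step size 0.1 and the loss itself are illustrative choices, not from the slides):

```python
def loss(theta, x=2.0, y=1.0):
    # Toy loss l(x; theta) = (theta * x - y)^2 / 2 for a single example
    return 0.5 * (theta * x - y) ** 2

def grad_loss(theta, x=2.0, y=1.0):
    # Gradient of the toy loss with respect to theta
    return (theta * x - y) * x

theta = 0.0
eps = 0.1                            # "small amount"
d = -eps * grad_loss(theta)          # move along the negative gradient
print(loss(theta), loss(theta + d))  # the loss decreases after the step
```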
Neural Networks as real circuits
Pictorial illustration of gradient descent
Gradient
• Gradient of the loss is simple
• E.g., 𝑙(𝑓𝜃, 𝑥, 𝑦) = (𝑓𝜃(𝑥) − 𝑦)²/2
• 𝜕𝑙/𝜕𝜃 = (𝑓𝜃(𝑥) − 𝑦) 𝜕𝑓𝜃(𝑥)/𝜕𝜃
• Key part: gradient of the hypothesis
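To make the split concrete (the loss gradient is simple, the hard part is the gradient of the hypothesis), here is a minimal sketch; a linear hypothesis 𝑓𝜃(𝑥) = 𝜃𝑥 is assumed only to keep the example short:

```python
def f(theta, x):
    # Hypothesis; a linear model is assumed here purely for illustration
    return theta * x

def df_dtheta(theta, x):
    # Gradient of the hypothesis with respect to theta (the "key part")
    return x

def dl_dtheta(theta, x, y):
    # Chain rule for l = (f(x) - y)^2 / 2:  dl/dtheta = (f(x) - y) * df/dtheta
    return (f(theta, x) - y) * df_dtheta(theta, x)

# Numerical check with a finite difference
theta, x, y, eps = 1.5, 2.0, 1.0, 1e-6
numeric = (0.5 * (f(theta + eps, x) - y) ** 2
           - 0.5 * (f(theta - eps, x) - y) ** 2) / (2 * eps)
print(dl_dtheta(theta, x, y), numeric)  # the two values should match closely
```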
Open the box: real circuit
Single neuron
[Figure: a single neuron computing the difference of its two inputs 𝑥1 and 𝑥2, output 𝑓]
Function: 𝑓 = 𝑥1 − 𝑥2
Single neuron
[Figure: the same neuron, with local gradient 1 on the 𝑥1 edge and −1 on the 𝑥2 edge]
Function: 𝑓 = 𝑥1 − 𝑥2
Gradient: 𝜕𝑓/𝜕𝑥1 = 1, 𝜕𝑓/𝜕𝑥2 = −1
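In code, the subtraction neuron is a two-line forward pass plus a two-line backward pass; the sketch below is an added illustration, not from the slides:

```python
def forward_sub(x1, x2):
    # Subtraction neuron: f = x1 - x2
    return x1 - x2

def backward_sub(grad_f=1.0):
    # Local gradients df/dx1 = 1 and df/dx2 = -1,
    # each multiplied by the gradient flowing in from above
    return grad_f * 1.0, grad_f * (-1.0)

f = forward_sub(3.0, 5.0)     # f = -2
print(backward_sub())         # (1.0, -1.0)
```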
Two neurons
[Figure: an addition neuron computes 𝑥2 = 𝑥3 + 𝑥4, which feeds the subtraction neuron computing 𝑓 = 𝑥1 − 𝑥2]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑥3 + 𝑥4)
Two neurons
[Figure: the two-neuron circuit, with local gradients 𝜕𝑥2/𝜕𝑥3 = 1 and 𝜕𝑥2/𝜕𝑥4 = 1 written on the addition neuron's edges]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑥3 + 𝑥4)
Gradient: 𝜕𝑥2/𝜕𝑥3 = 1, 𝜕𝑥2/𝜕𝑥4 = 1. What about 𝜕𝑓/𝜕𝑥3?
Two neurons
[Figure: the two-neuron circuit, with the backpropagated gradient −1 written on the 𝑥3 and 𝑥4 edges]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑥3 + 𝑥4)
Gradient: 𝜕𝑓/𝜕𝑥3 = (𝜕𝑓/𝜕𝑥2)(𝜕𝑥2/𝜕𝑥3) = −1
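This chain-rule step is exactly what a backward pass does node by node. A minimal sketch for the two-neuron circuit (illustrative code, not from the slides):

```python
def forward(x1, x3, x4):
    x2 = x3 + x4     # addition neuron
    f = x1 - x2      # subtraction neuron
    return f

def backward(grad_f=1.0):
    # Subtraction neuron: df/dx1 = 1, df/dx2 = -1
    dx1 = grad_f * 1.0
    dx2 = grad_f * (-1.0)
    # Addition neuron: dx2/dx3 = 1, dx2/dx4 = 1; the chain rule multiplies by dx2
    dx3 = dx2 * 1.0
    dx4 = dx2 * 1.0
    return dx1, dx3, dx4

print(forward(1.0, 2.0, 3.0))   # 1 - (2 + 3) = -4
print(backward())               # (1.0, -1.0, -1.0)
```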
Multiple input
[Figure: the addition neuron now has three inputs 𝑥3, 𝑥5, 𝑥4 feeding 𝑥2, with local gradient 𝜕𝑥2/𝜕𝑥5 = 1]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑥3 + 𝑥5 + 𝑥4)
Gradient: 𝜕𝑥2/𝜕𝑥5 = 1
Multiple input
[Figure: the three-input circuit, with the backpropagated gradient −1 written on each input edge of the addition neuron]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑥3 + 𝑥5 + 𝑥4)
Gradient: 𝜕𝑓/𝜕𝑥5 = (𝜕𝑓/𝜕𝑥2)(𝜕𝑥2/𝜕𝑥5) = −1
Weights on the edges
[Figure: the inputs 𝑥3 and 𝑥4 enter the addition neuron through edges with weights 𝑤3 and 𝑤4]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑤3𝑥3 + 𝑤4𝑥4)
Weights on the edges
[Figure: the same circuit redrawn with the weights 𝑤3 and 𝑤4 shown as inputs to the circuit]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑤3𝑥3 + 𝑤4𝑥4)
Weights on the edges
[Figure: the circuit with the backpropagated gradients −𝑥3 and −𝑥4 written on the weight edges]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − (𝑤3𝑥3 + 𝑤4𝑥4)
Gradient: 𝜕𝑓/𝜕𝑤3 = (𝜕𝑓/𝜕𝑥2)(𝜕𝑥2/𝜕𝑤3) = −1 × 𝑥3 = −𝑥3
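Treating the weights as just more inputs to the circuit, the same backward pass gives their gradients; a minimal sketch (an added illustration):

```python
def forward(x1, x3, x4, w3, w4):
    x2 = w3 * x3 + w4 * x4   # weighted-sum neuron
    return x1 - x2           # f = x1 - x2

def backward(x3, x4, grad_f=1.0):
    dx2 = grad_f * (-1.0)    # subtraction neuron: df/dx2 = -1
    dw3 = dx2 * x3           # dx2/dw3 = x3, so df/dw3 = -x3
    dw4 = dx2 * x4           # dx2/dw4 = x4, so df/dw4 = -x4
    return dw3, dw4

print(backward(x3=2.0, x4=3.0))  # (-2.0, -3.0)
```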
Activation
[Figure: the weighted sum 𝑤3𝑥3 + 𝑤4𝑥4 passes through an activation 𝜎 before the subtraction neuron]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − 𝜎(𝑤3𝑥3 + 𝑤4𝑥4)
Activation
[Figure: the same circuit, with the pre-activation value labeled 𝑛𝑒𝑡2]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − 𝜎(𝑤3𝑥3 + 𝑤4𝑥4)
Let 𝑛𝑒𝑡2 = 𝑤3𝑥3 + 𝑤4𝑥4
Activation
[Figure: local gradients 𝜕𝑛𝑒𝑡2/𝜕𝑤3 = 𝑥3 and 𝜕𝑥2/𝜕𝑛𝑒𝑡2 = 𝜎′(𝑛𝑒𝑡2) written on the circuit]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − 𝜎(𝑤3𝑥3 + 𝑤4𝑥4)
Gradient: 𝜕𝑓/𝜕𝑤3 = (𝜕𝑓/𝜕𝑥2)(𝜕𝑥2/𝜕𝑛𝑒𝑡2)(𝜕𝑛𝑒𝑡2/𝜕𝑤3) = −1 × 𝜎′(𝑛𝑒𝑡2) × 𝑥3 = −𝜎′(𝑛𝑒𝑡2)𝑥3
Activation
[Figure: the backpropagated gradients −𝜎′(𝑛𝑒𝑡2) on the 𝑛𝑒𝑡2 edge and −𝜎′(𝑛𝑒𝑡2)𝑥3 on the 𝑤3 edge]
Function: 𝑓 = 𝑥1 − 𝑥2 = 𝑥1 − 𝜎(𝑤3𝑥3 + 𝑤4𝑥4)
Gradient: 𝜕𝑓/𝜕𝑤3 = (𝜕𝑓/𝜕𝑥2)(𝜕𝑥2/𝜕𝑛𝑒𝑡2)(𝜕𝑛𝑒𝑡2/𝜕𝑤3) = −1 × 𝜎′(𝑛𝑒𝑡2) × 𝑥3 = −𝜎′(𝑛𝑒𝑡2)𝑥3
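The activation just adds one more factor to the chain. The sketch below assumes a sigmoid for 𝜎 purely for concreteness (the slides leave 𝜎 generic):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x3, x4, w3, w4):
    net2 = w3 * x3 + w4 * x4   # pre-activation net2
    x2 = sigmoid(net2)         # activation
    return x1 - x2, x2

def backward(x3, x4, x2, grad_f=1.0):
    dx2 = grad_f * (-1.0)        # df/dx2 = -1
    dnet2 = dx2 * x2 * (1 - x2)  # sigma'(net2) = sigma(net2) * (1 - sigma(net2))
    dw3 = dnet2 * x3             # dnet2/dw3 = x3
    dw4 = dnet2 * x4             # dnet2/dw4 = x4
    return dw3, dw4

f, x2 = forward(1.0, 2.0, 3.0, 0.1, -0.2)
print(backward(2.0, 3.0, x2))    # equals -sigma'(net2)*x3 and -sigma'(net2)*x4
```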
Multiple paths
• When a variable reaches the output through more than one path, the gradients along the different paths add up
[Figure: a circuit in which one input reaches the output 𝑓 through multiple paths]
[Figure: a small network where 𝑥1 and 𝑥2 feed two sigmoid hidden units ℎ11 = 𝜎(𝑛𝑒𝑡11) and ℎ12 = 𝜎(𝑛𝑒𝑡21), which are combined into the output 𝑓]
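A minimal sketch of the multiple-paths rule (an added illustration, not the slides' circuit): if a variable reaches 𝑓 along two paths, its gradient is the sum of the two path contributions.

```python
def forward(x):
    a = 2.0 * x        # path 1: a = 2x
    b = x * x          # path 2: b = x^2
    return a + b       # f = a + b

def backward(x, grad_f=1.0):
    da = grad_f * 1.0  # f = a + b, so df/da = 1
    db = grad_f * 1.0  # and df/db = 1
    # x reaches f through both paths, so the two contributions add up
    return da * 2.0 + db * 2.0 * x

print(forward(3.0))    # 2*3 + 3^2 = 15
print(backward(3.0))   # 2 + 2*3 = 8, matching d(2x + x^2)/dx
```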
Math form
Gradient descent
• Minimize loss 𝐿(𝜃), where the hypothesis is parametrized by 𝜃
• Gradient descent
• Initialize 𝜃0
• 𝜃𝑡+1 = 𝜃𝑡 − 𝜂𝑡 𝛻𝐿(𝜃𝑡)
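A minimal sketch of this update rule (a constant step size and a toy quadratic loss are assumed here for simplicity):

```python
import numpy as np

def gradient_descent(grad_L, theta0, eta=0.1, steps=100):
    # theta_{t+1} = theta_t - eta * grad L(theta_t)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eta * grad_L(theta)
    return theta

# Example: L(theta) = ||theta - 3||^2 / 2 has gradient theta - 3
print(gradient_descent(lambda th: th - 3.0, theta0=[0.0]))  # approaches [3.]
```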
Stochastic gradient descent (SGD)
• Suppose data points arrive one by one
• 𝐿(𝜃) = (1/𝑛) Σ𝑡=1…𝑛 𝑙(𝜃, 𝑥𝑡, 𝑦𝑡), but we only know 𝑙(𝜃, 𝑥𝑡, 𝑦𝑡) at time 𝑡
• Update rule (a mini-batch of size 𝑏; 𝑏 = 1 gives plain SGD):
  𝜃𝑡+1 = 𝜃𝑡 − 𝜂𝑡 (1/𝑏) Σ1≤𝑖≤𝑏 𝛻𝑙(𝜃𝑡, 𝑥𝑡𝑏+𝑖, 𝑦𝑡𝑏+𝑖)
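A minimal sketch of this mini-batch update (illustrative code, not from the slides); a constant step size and a squared loss on a synthetic dataset are assumed:

```python
import numpy as np

def sgd(grad_l, data, theta0, eta=0.5, b=4, epochs=20):
    # Mini-batch SGD: average the per-example gradients over each batch of size b
    theta = np.asarray(theta0, dtype=float)
    for _ in range(epochs):
        for start in range(0, len(data), b):
            batch = data[start:start + b]
            g = np.mean([grad_l(theta, x, y) for x, y in batch], axis=0)
            theta = theta - eta * g
    return theta

# Example: l(theta, x, y) = (theta*x - y)^2 / 2, so grad l = (theta*x - y) * x
grad_l = lambda th, x, y: (th * x - y) * x
data = [(x, 2.0 * x) for x in np.linspace(-1, 1, 32)]   # targets y = 2x
print(sgd(grad_l, data, theta0=0.0))                    # approaches 2.0
```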