
InfoLab

Deep Learning Seminar


CH 6. Deep Feedforward Networks

2017-07-26
Eunjeong Yi
Chapter 6. Deep Feedforward Networks

6.1 Example: Learning XOR


6.2 Gradient-Based Learning
6.3 Hidden Units
6.4 Architecture Design
6.5 Back-Propagation and Other Differentiation Algorithms
6.6 Historical Notes

2 InfoLab
Deep feedforward network
No feedback connection
Structure of model
Ø Input layer, hidden layer, output layer
Ø Depth and width of the model
Ø Cost function

[Figure: feedforward network diagram — input layer, hidden layer, output layer; depth = number of layers, width = number of units per layer]

3 InfoLab
Example: Learning XOR
XOR function (“exclusive or”)

x₁   x₂   x₁ XOR x₂
0    0    0
0    1    1
1    0    1
1    1    0

𝕏 = { [0, 0]ᵀ, [0, 1]ᵀ, [1, 0]ᵀ, [1, 1]ᵀ }

Mean squared error (MSE) loss function

J(θ) = (1/4) Σ_{x ∈ 𝕏} ( f*(x) − f(x; θ) )²

• f*(x): correct answer (target XOR output)
• f(x; θ): output computed by the neural network

4 InfoLab
Example: Learning XOR
[Figure: two-layer network — inputs x₁, x₂ feed hidden units h₁, h₂ through weights w₁, and the hidden units feed the output f(x; θ) through weights w₂; input layer, hidden layer, output layer]

h = g(w₁ᵀ x + b)          (hidden layer)
output = w₂ᵀ h + c         (output layer)
(g = activation function; b, c = biases)

f(x; θ) = f(x; w₁, b, w₂, c) = w₂ᵀ g(w₁ᵀ x + b) + c
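A minimal NumPy sketch of this network on the XOR task, assuming g = ReLU. The weight values below are one known exact solution used for illustration (an assumption, not numbers given on the slides):

```python
import numpy as np

# f(x; θ) = w2ᵀ g(w1ᵀ x + b) + c with g = ReLU (assumed).
# Weights below are one known exact XOR solution, used only as an example.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # hidden-layer weights (2 inputs -> 2 hidden units)
b  = np.array([0.0, -1.0])    # hidden-layer bias
w2 = np.array([1.0, -2.0])    # output-layer weights
c  = 0.0                      # output-layer bias

def relu(z):
    return np.maximum(0.0, z)

def f(x):
    h = relu(W1.T @ x + b)    # hidden-layer activations
    return w2 @ h + c         # scalar network output

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0.0, 1.0, 1.0, 0.0])        # x1 XOR x2
outputs = np.array([f(x) for x in X])

# MSE loss J(θ) = (1/4) Σ_{x∈𝕏} (f*(x) − f(x; θ))² — zero for this solution
print(outputs)                                  # [0. 1. 1. 0.]
print(np.mean((targets - outputs) ** 2))        # 0.0
```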

5 InfoLab
Gradient-based Learning
The loss function of a neural network is non-convex
Ø Trained using iterative, gradient-based optimizers

Cost function
Ø Learning Conditional Distribution
Ø Learning Conditional Statistics

Output layers
Ø Linear Units for Gaussian Output Distributions
Ø Sigmoid Units
Ø Softmax Units

6 InfoLab
Learning Conditional Distribution
Negative log-likelihood as cross-entropy between
training data and model distribution

J(θ) = H( p̂_data, p_model(y | x) ) = − Σ_{x,y ~ p̂_data} p̂_data log p_model(y | x)

J(θ) = − E_{x,y ~ p̂_data} [ log p_model(y | x) ]

• p̂_data : training-data-generating (empirical) distribution
• p_model(y | x) : probability distribution estimating p̂_data
• H( p̂_data, p_model(y | x) ) : cross-entropy between p̂_data and p_model

The log function undoes the exp of some output units
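A minimal sketch of this cost: the expectation over p̂_data becomes an average of −log p_model(y | x) over the training examples. The probabilities and labels below are made-up illustrative values, not taken from the slides:

```python
import numpy as np

# J(θ) = −E_{x,y ~ p̂_data} log p_model(y | x), assuming a classifier that
# already outputs a probability for each class (illustrative values only).
probs = np.array([[0.7, 0.2, 0.1],     # p_model(y | x) for 4 training examples
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([0, 1, 2, 1])        # observed classes y from the training data

# Average negative log-likelihood over the training set
nll = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
print(nll)
```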

7 InfoLab
Learning Conditional Statistics

Mean Square Error (MSE)


f* = argmin_f E_{x,y ~ p_data} ‖ y − f(x) ‖²

Mean Absolute Error (MAE)

f* = argmin_f E_{x,y ~ p_data} ‖ y − f(x) ‖₁

MSE and MAE often lead to poor results when used with gradient-based learning, because output units that saturate then produce very small gradients

8 InfoLab
Linear Units for Gaussian Output Distributions
Given features h, a layer of linear output units produces a vector ŷ

ŷ = Wᵀ h + b

Often used to produce the mean of a conditional Gaussian distribution

p(y | x) = 𝒩(y; ŷ, I)

Linear units do not saturate → they pose little difficulty for gradient-based optimization
[Figure: output layer — h = f(x; θ) comes from the hidden layers, and the linear output unit computes ŷ = Wᵀ h + b]
9 InfoLab
Sigmoid Units
Binary classification
Output: ŷ = σ(wᵀ h + b)

Ø σ(z) = 1 / (1 + exp(−z))
Saturate to 0 and 1

[Figure: sigmoid activation function. Image: © 2015-2017 CodeReclaimers, http://neat-python.readthedocs.io/en/latest/activation.html]
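A quick numeric sketch (my own illustration, not from the slides) of that saturation: for large |z| the output is pinned near 0 or 1 and the gradient σ'(z) = σ(z)(1 − σ(z)) becomes vanishingly small:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    s = sigmoid(z)
    grad = s * (1.0 - s)               # derivative of the sigmoid
    print(f"z={z:6.1f}  sigmoid={s:.5f}  gradient={grad:.5f}")
# At z = ±10 the gradient is ~4.5e-05, which is why saturation slows
# gradient-based learning unless the loss (e.g., cross-entropy) undoes the exp.
```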

10 InfoLab
Softmax Units
Multiclass classification
To generalize to the case of a discrete variable with n values, produce a vector ŷ with ŷᵢ = P(y = i | x)

z = Wᵀ h + b,  with zᵢ = log P̂(y = i | x)

softmax(z)ᵢ = exp(zᵢ) / Σⱼ exp(zⱼ)
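A minimal sketch of the softmax output layer. Shifting z by its maximum is a standard numerical-stability trick that does not change the result; the h, W, and b values are made-up examples:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)               # guard against overflow in exp
    e = np.exp(z)
    return e / np.sum(e)            # softmax(z)_i = exp(z_i) / Σ_j exp(z_j)

h = np.array([0.5, -1.2, 3.0])      # features from the last hidden layer (example)
W = np.random.randn(3, 4)           # 3 features -> 4 classes (example weights)
b = np.zeros(4)

z = W.T @ h + b                     # unnormalized log-probabilities
y_hat = softmax(z)                  # ŷ_i = P(y = i | x)
print(y_hat, y_hat.sum())           # probabilities summing to 1
```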

11 InfoLab
Hidden Units
How to choose the type of hidden unit to use in the
hidden layers of the model
Input: z = Wᵀ x + b
Activation function g(z)
Ex) Rectified Linear Units (ReLU), Logistic Sigmoid and Hyperbolic Tangent

[Figure: the two-layer network from the XOR example — inputs x₁, x₂, hidden units h₁, h₂ with h = g(w₁ᵀ x + b), output f(x; θ); input layer, hidden layer, output layer (g = activation function; b, c = biases)]

12 InfoLab
Rectified Linear Units (ReLU)
Activation function: g(z) = max(0, z)

If a model's behavior is closer to linear, the model is easier to optimize

[Figure: ReLU activation function. Image: © 2015-2017 CodeReclaimers, http://neat-python.readthedocs.io/en/latest/activation.html]

13 InfoLab
Logistic Sigmoid and Hyperbolic Tangent
Activation function
g(z) = σ(z)
or
g(z) = tanh(z) = 2σ(2z) − 1

The saturation of sigmoidal units makes gradient-based learning difficult

[Figure: sigmoid and tanh activation functions. Image: © 2015-2017 CodeReclaimers, http://neat-python.readthedocs.io/en/latest/activation.html]

14 InfoLab
Back-Propagation

A method to calculate the gradient of the loss function with respect to the weights in an artificial neural network
Forward propagation result

y = w₂ᵀ h + b = w₂ᵀ g(w₁ᵀ x + c) + b
(g = activation function; b, c = biases)

[Figure: network — inputs x₁, x₂ connect to hidden units h₁, h₂ through weights w₁; the hidden units connect to the output y through weights w₂]
15 InfoLab
Back-Propagation

A method to calculate the gradient of the loss function with respect to the weights in an artificial neural network

[Figure: the same network — back-propagation first computes dy/dh at the hidden units h₁, h₂ (through the output weights w₂); inputs x₁, x₂ connect through w₁]

16 InfoLab
Back-Propagation

A method to calculate the gradient of the loss function with respect to the weights in an artificial neural network

[Figure: the same network — dy/dh at the hidden units h₁, h₂ and dh/dx at the inputs x₁, x₂]

17 InfoLab
Back-Propagation

A method to calculate the gradient of the loss function with respect to the weights in an artificial neural network

[Figure: the same network — dy/dh at the hidden units and dh/dx at the inputs]

dy/dx = (dy/dh) × (dh/dx)

18 InfoLab
Back-Propagation

∂y/∂w₁,₁ = (∂y/∂h₁) × (∂h₁/∂w₁,₁)

Update w₁,₁:
w₁,₁ ← w₁,₁ − η · (∂y/∂w₁,₁)
(η: learning rate)

[Figure: the output y connects to h₁ through w₂; h₁ connects to the inputs x₁, x₂ through w₁,₁ and w₁,₂]

In the same way, all of the weights are updated, as in the sketch below
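A minimal back-propagation sketch for the 2-2-1 network on these slides, assuming g = sigmoid and a squared-error loss (both assumptions); the training loop is my own illustration of the chain rule and of the update w ← w − η ∂/∂w, not code from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, c = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer: h = g(W1ᵀ x + c)
w2, b = rng.normal(size=2), 0.0                # output layer: y = w2ᵀ h + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])             # XOR targets
eta = 0.5                                      # learning rate

for _ in range(5000):
    for x, t in zip(X, T):
        # forward propagation
        h = sigmoid(W1.T @ x + c)
        y = w2 @ h + b
        # backward propagation (chain rule)
        dL_dy = y - t                          # d(½(y−t)²)/dy
        dL_dw2 = dL_dy * h                     # ∂L/∂w2 = ∂L/∂y · ∂y/∂w2
        dL_db = dL_dy
        dL_dh = dL_dy * w2                     # ∂L/∂h = ∂L/∂y · ∂y/∂h
        dL_dz = dL_dh * h * (1 - h)            # through the sigmoid derivative
        dL_dW1 = np.outer(x, dL_dz)            # ∂L/∂W1[i,j] = x_i · ∂L/∂z_j
        dL_dc = dL_dz
        # gradient-descent updates: w ← w − η ∂L/∂w
        w2 -= eta * dL_dw2; b -= eta * dL_db
        W1 -= eta * dL_dW1; c -= eta * dL_dc

print([round(float(w2 @ sigmoid(W1.T @ x + c) + b), 2) for x in X])
# typically close to [0, 1, 1, 0] after training
```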

19 InfoLab
Next Deep learning seminar

Chapter 7. Regularization for Deep Learning

7.1 Parameter Norm Penalties


7.2 Norm Penalties as Constrained Optimization
7.3 Regularization and Under-Constrained Problems
7.4 Dataset Augmentation
7.5 Noise Robustness
7.6 Semi-Supervised Learning
7.7 Multitask Learning

20 InfoLab
InfoLab

Thank you

21
