Chap 6 - Deep Feedforward Networks - Eunjeong Yi
2017-07-26
Eunjeong Yi
Chapter 6. Deep Feedforward Networks
Deep feedforward network
• No feedback connections
Structure of the model
• Input layer, hidden layers, output layer
• The depth and width of the model
• Cost function
[Figure: a feedforward network annotated with its depth and width]
Example: Learning XOR
XOR function (“exclusive or”)
x1  x2  x1 XOR x2
 0   0      0
 0   1      1
 1   0      1
 1   1      0

\mathbb{X} = \{ [0,0]^T, [0,1]^T, [1,0]^T, [1,1]^T \}
Mean squared error (MSE) loss function
J(\theta) = \frac{1}{4} \sum_{x \in \mathbb{X}} \left( f^*(x) - f(x;\theta) \right)^2
• f^*(x): the correct answer (target)
• f(x;\theta): the output computed by the neural network
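A minimal NumPy sketch of this loss (not from the slides; the helper name mse_loss is illustrative). It also evaluates the constant prediction 0.5, which is what a least-squares linear model turns out to produce on XOR:

```python
import numpy as np

# The four XOR design points in X and their targets f*(x).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_true = np.array([0.0, 1.0, 1.0, 0.0])

def mse_loss(y_pred):
    """J(theta) = (1/4) * sum over the four points of (f*(x) - f(x; theta))^2."""
    return np.mean((y_true - y_pred) ** 2)

# A linear model fitted by least squares outputs 0.5 on every point of X,
# so it cannot represent XOR; its loss is J = 0.25.
print(mse_loss(np.full(4, 0.5)))  # 0.25
```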
Example: Learning XOR
[Figure: input layer (x1, x2) -> hidden layer (h1, h2) -> output layer f(x;\theta); weights w_1 connect input to hidden, w_2 connect hidden to output]
h = g(w_1^T x + b)
f(x;\theta) = f(x; w_1, b, w_2, c) = w_2^T g(w_1^T x + b) + c
(g = activation function; b, c = biases)
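The textbook's hand-constructed solution to XOR fits exactly this form with g = ReLU; a sketch in NumPy, using the slide's notation (w1, b, w2, c):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hand-picked parameters (the textbook's known XOR solution), mapped onto
# the slide's form f(x; w1, b, w2, c) = w2^T g(w1^T x + b) + c with g = ReLU.
w1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])
c = 0.0

def f(x):
    h = relu(w1.T @ x + b)  # hidden layer
    return w2 @ h + c       # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, f(np.array(x, dtype=float)))  # 0, 1, 1, 0 -- XOR reproduced exactly
```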
Gradient-based Learning
The loss function of a neural network is non-convex
• Trained with iterative, gradient-based optimizers (a minimal sketch follows the outline below)
Cost function
• Learning Conditional Distributions
• Learning Conditional Statistics
Output layers
• Linear Units for Gaussian Output Distributions
• Sigmoid Units
• Softmax Units
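A minimal sketch of such an iterative optimizer (plain gradient descent; the function names and the toy cost are illustrative assumptions, not from the slides):

```python
# grad_J is assumed to return the gradient of the cost J at theta; for a
# neural network it would be computed by back-propagation (later in this chapter).
def gradient_descent(theta, grad_J, lr=0.01, steps=2000):
    for _ in range(steps):
        theta = theta - lr * grad_J(theta)  # step against the gradient
    return theta

# Toy usage on a non-convex scalar cost J(t) = t**4 - 3*t**2 + t.
grad = lambda t: 4 * t**3 - 6 * t + 1
print(gradient_descent(2.0, grad))  # converges to a local minimum (~1.13)
```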
Learning Conditional Distribution
Negative log-likelihood as the cross-entropy between the
training data and the model distribution
J(\theta) = H(\hat{p}_{data}, p_{model}(y \mid x)) = -\sum_{x,y} \hat{p}_{data}(x,y) \log p_{model}(y \mid x)
J(\theta) = -\mathbb{E}_{x,y \sim \hat{p}_{data}} \log p_{model}(y \mid x)
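A small numeric sketch of this cost; the probabilities below are hypothetical values a model might assign to the observed labels:

```python
import numpy as np

# Hypothetical P_model(y_i | x_i) for three training examples.
p_model_y = np.array([0.9, 0.7, 0.4])

# J(theta) = -E[log p_model(y | x)], estimated as an empirical average.
J = -np.mean(np.log(p_model_y))
print(J)  # ~0.46
```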
Learning Conditional Statistics
Instead of learning a full distribution p(y | x), the network can learn a single conditional statistic of y given x, such as the conditional mean E[y | x].
Linear Units for Gaussian Output Distributions
Given features h, the output layer produces a vector \hat{y}
\hat{y} = W^T h + b
[Figure: hidden features h = f(x;\theta) feed the linear output layer producing \hat{y}]
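A minimal NumPy sketch of a linear output unit; the shapes and values are illustrative assumptions:

```python
import numpy as np

# The linear output y_hat = W^T h + b is interpreted as the mean of a
# Gaussian output distribution p(y | x) = N(y; y_hat, I).
rng = np.random.default_rng(0)
h = rng.standard_normal(4)       # hidden features h = f(x; theta), dim 4
W = rng.standard_normal((4, 2))  # weights: 4 features -> 2 outputs
b = np.zeros(2)
y_hat = W.T @ h + b              # mean of the Gaussian output distribution
print(y_hat)
```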
Sigmoid Units
Binary classification
Output: \hat{y} = \sigma(w^T h + b)
• \sigma(z) = \frac{1}{1 + e^{-z}}
Saturates toward 0 and 1 for large |z|
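A quick numeric sketch of the definition and the saturation behavior:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Saturation: the output flattens toward 0 and 1 for large |z|,
# so gradients vanish far from z = 0.
for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(z, sigmoid(z))  # ~0.00005, 0.27, 0.5, 0.73, ~0.99995
```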
Softmax Units
Multiclass classification
To generalize to a discrete variable with n values, produce a vector \hat{y} with \hat{y}_i = P(y = i \mid x)
z = W^T h + b, \quad z_i = \log \hat{P}(y = i \mid x)
softmax(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}
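A minimal NumPy sketch; the max-subtraction step is a standard numerical-stability detail, not something the slide mentions:

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j)."""
    z = z - np.max(z)  # does not change the result, avoids overflow
    e = np.exp(z)
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # z = W^T h + b, unnormalized log-probabilities
p = softmax(z)
print(p, p.sum())              # [0.659 0.242 0.099], sums to 1
```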
Hidden Units
How to choose the type of hidden unit to use in the
hidden layers of the model
Input: z = W^T x + b
Activation function: g(z)
e.g., rectified linear units (ReLU), logistic sigmoid, and hyperbolic tangent
[Figure: the same two-layer network; each hidden unit computes h = g(w_1^T x + b) (g = activation function, b = bias)]
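A minimal sketch of one hidden layer; the sizes and values are illustrative assumptions:

```python
import numpy as np

# One hidden layer: an affine transform z = W^T x + b followed by an
# elementwise nonlinearity g.
rng = np.random.default_rng(1)
x = np.array([0.5, -1.0])
W = rng.standard_normal((2, 3))  # 2 inputs -> 3 hidden units
b = np.zeros(3)
z = W.T @ x + b                  # pre-activation
h = np.maximum(0, z)             # g = ReLU here; sigmoid or tanh also common
print(z, h)
```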
Rectified Linear Units (ReLU)
Activation function g(z):
g(z) = \max(0, z)
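A one-line check of the definition:

```python
import numpy as np

def relu(z):
    """g(z) = max(0, z), applied elementwise."""
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))  # [0. 0. 0. 0.5 2.] -- zero for negative z, identity for positive z
```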
Logistic Sigmoid and Hyperbolic Tangent
Activation function
g(z) = \sigma(z)
or
g(z) = \tanh(z) = 2\sigma(2z) - 1
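A quick numerical check of the identity tanh(z) = 2*sigma(2z) - 1:

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

# Verify tanh(z) = 2 * sigma(2z) - 1 on a grid of points.
z = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(z), 2 * sigma(2 * z) - 1))  # True
```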
Back-Propagation
[Figure sequence: a two-layer network (inputs x1, x2; hidden units h1, h2; weights w1, w2; output y); the gradient flows backward layer by layer, first dy/dh at the hidden layer, then dh/dx at the input layer]
\frac{dy}{dx} = \frac{dy}{dh} \times \frac{dh}{dx}
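A toy scalar example of this chain rule; the graph h = w1*x, y = w2*h**2 is an illustrative assumption, not the slide's network:

```python
# Back-propagation is repeated application of the chain rule
# dy/dx = (dy/dh) * (dh/dx), here on a toy scalar graph.
w1, w2, x = 3.0, 2.0, 0.5
h = w1 * x             # forward pass: hidden value
y = w2 * h**2          # forward pass: output

dy_dh = 2 * w2 * h     # local gradient at the output layer
dh_dx = w1             # local gradient at the hidden layer
dy_dx = dy_dh * dh_dx  # chain rule, propagated backward
print(dy_dx)           # 18.0 (= 2 * w2 * w1**2 * x)
```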
Back-Propagation
[Figure: output y at the top, weight w2, hidden unit h1, and first-layer weights w1,1 and w1,2 at the bottom]
\frac{\partial y}{\partial w_{1,1}} = \frac{\partial y}{\partial h_1} \times \frac{\partial h_1}{\partial w_{1,1}}
Update w_{1,1}:
w_{1,1} \leftarrow w_{1,1} - \eta \frac{\partial y}{\partial w_{1,1}}
(\eta: learning rate)
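A toy numeric sketch of this update; the values and the simplified graph h1 = w11 * x1, y = w2 * h1 are illustrative assumptions:

```python
# Gradient-descent update w11 <- w11 - eta * dy/dw11 on a toy graph.
x1, w11, w2 = 1.0, 0.5, -2.0
eta = 0.1                    # learning rate

dy_dh1 = w2                  # local gradient at the output
dh1_dw11 = x1                # gradient of h1 = w11 * x1 w.r.t. its weight
dy_dw11 = dy_dh1 * dh1_dw11  # chain rule: dy/dw11

w11 = w11 - eta * dy_dw11    # update step
print(w11)                   # 0.5 - 0.1 * (-2.0) = 0.7
```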
Next Deep learning seminar
Thank you