Lesson 6
PRACTICAL DEEP LEARNING FOR CODERS (V2)
Why we need RNNs
“I went to Nepal in 2009” vs. “In 2009, I went to Nepal”
Variable-length sequences
Long-term dependencies
Memory
Stateful representation
Sample of text generated character-by-character by an RNN (it reads like LaTeX, but is mathematically meaningless):
\begin{proof} We may assume that $\mathcal{I}$ is an abelian sheaf on
$\mathcal{C}$. \item Given a morphism $\Delta : \mathcal{F} \to \mathcal{I}$
is an injective and let $\mathfrak q$ be an abelian sheaf on $X$. Let
$\mathcal{F}$ be a fibered complex. Let $\mathcal{F}$ be a category.
Basic NN with single hidden layer
Input: batch_size * #inputs
  -> matrix product; relu
Hidden: batch_size * #activations
  -> matrix product; softmax
Output: batch_size * #classes
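A minimal sketch of this diagram in PyTorch (the framework used in the course); the class and size names (SingleHiddenNet, n_inputs, n_hidden, n_classes) are placeholders, not names from the lesson notebook.

```python
import torch.nn as nn
import torch.nn.functional as F

class SingleHiddenNet(nn.Module):
    def __init__(self, n_inputs, n_hidden, n_classes):
        super().__init__()
        self.l_in  = nn.Linear(n_inputs, n_hidden)    # batch_size * #inputs -> batch_size * #activations
        self.l_out = nn.Linear(n_hidden, n_classes)   # batch_size * #activations -> batch_size * #classes

    def forward(self, x):
        h = F.relu(self.l_in(x))                      # matrix product; relu
        return F.log_softmax(self.l_out(h), dim=-1)   # matrix product; softmax (log form, for NLL loss)
```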
Image CNN with single dense hidden layer
NB: batch_size dimension and activation functions not shown here or in the following slides.
Input: #channels * h * w
  -> convolution, stride 2
Conv1 (hidden): #filters * (h/2) * (w/2)
  -> (flatten); matrix product
FC1 (hidden): #activations
  -> matrix product
Output: #classes
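A hedged sketch of the same diagram; the sizes are placeholders, and the activation functions (not shown on the slide) are assumed to be relu.

```python
import torch.nn as nn
import torch.nn.functional as F

class SingleDenseConvNet(nn.Module):
    def __init__(self, n_channels, h, w, n_filters, n_hidden, n_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(n_channels, n_filters, kernel_size=3, stride=2, padding=1)
        self.fc1   = nn.Linear(n_filters * (h // 2) * (w // 2), n_hidden)
        self.l_out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                               # x: batch_size * #channels * h * w
        x = F.relu(self.conv1(x))                       # Conv1: #filters * (h/2) * (w/2)
        x = F.relu(self.fc1(x.view(x.size(0), -1)))     # (flatten); matrix product -> FC1
        return F.log_softmax(self.l_out(x), dim=-1)     # matrix product -> #classes
```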
Predicting char 3 using chars 1 & 2
NB: layer operations not shown; remember that arrows represent layer operations.
char 1 input: vocab size
  -> FC1 (hidden): #activations
char 2 input: vocab size, combined with FC1
  -> FC2 (hidden): #activations
  -> char 3 output: vocab size
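A sketch of the diagram, assuming the two characters arrive as one-hot vectors of width vocab_size (the lesson notebook itself uses embeddings); the input->hidden weights are shared between the two characters, as the arrows on the slide suggest.

```python
import torch.nn.functional as F
from torch import nn

class Char3Net(nn.Module):
    def __init__(self, vocab_size, n_hidden):
        super().__init__()
        self.l_in     = nn.Linear(vocab_size, n_hidden)  # input -> hidden
        self.l_hidden = nn.Linear(n_hidden, n_hidden)    # hidden -> hidden
        self.l_out    = nn.Linear(n_hidden, vocab_size)  # hidden -> output

    def forward(self, c1, c2):
        h = F.relu(self.l_in(c1))                        # FC1 from char 1
        h = F.relu(self.l_hidden(h) + self.l_in(c2))     # FC2 combines FC1 with char 2
        return F.log_softmax(self.l_out(h), dim=-1)      # char 3 output: vocab size
```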
Predicting char 4 using chars 1, 2 & 3
Arrows now label the three shared weight matrices: InputHidden, HiddenHidden, HiddenOutput.
char 1 input: vocab size -> FC1: #activations
char 2 input, combined with FC1 -> FC2: #activations
char 3 input, combined with FC2 -> FC3: #activations
FC3 -> char 4 output: vocab size
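The same idea with the three shared weight matrices made explicit in a loop; again a sketch with one-hot character inputs assumed, not the notebook's exact code.

```python
import torch
import torch.nn.functional as F
from torch import nn

class Char4Net(nn.Module):
    def __init__(self, vocab_size, n_hidden):
        super().__init__()
        self.n_hidden = n_hidden
        self.l_in     = nn.Linear(vocab_size, n_hidden)  # InputHidden
        self.l_hidden = nn.Linear(n_hidden, n_hidden)    # HiddenHidden
        self.l_out    = nn.Linear(n_hidden, vocab_size)  # HiddenOutput

    def forward(self, c1, c2, c3):
        h = c1.new_zeros(c1.size(0), self.n_hidden)      # start with an empty hidden state
        for c in (c1, c2, c3):                           # FC1, FC2, FC3 reuse the same weights
            h = torch.tanh(self.l_hidden(h) + F.relu(self.l_in(c)))
        return F.log_softmax(self.l_out(h), dim=-1)      # char 4 output: vocab size
```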
Predicting char n using chars 1 to n-1
NB: no hidden/output labels shown.
char 1 input -> hidden state (InputHidden)
Loop (repeat for chars 2 to n-1): the next char input is combined with the hidden state (InputHidden + HiddenHidden)
hidden state -> output: predicted char n (HiddenOutput)
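Once the loop runs over a sequence of arbitrary length, PyTorch's nn.RNN can do the repetition. A sketch using an embedding instead of one-hot inputs (n_fac and n_hidden are placeholder sizes), reading only the final step's output.

```python
import torch
import torch.nn.functional as F
from torch import nn

class CharNNet(nn.Module):
    def __init__(self, vocab_size, n_fac, n_hidden):
        super().__init__()
        self.e     = nn.Embedding(vocab_size, n_fac)
        self.rnn   = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)

    def forward(self, cs):                               # cs: seq_len x batch_size of char indices
        h = torch.zeros(1, cs.size(1), self.rnn.hidden_size, device=cs.device)
        outp, h = self.rnn(self.e(cs), h)                # repeat the hidden step for each char
        return F.log_softmax(self.l_out(outp[-1]), dim=-1)  # predict only char n
```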
Predicting chars 2 to n using chars 1 to n-1
The hidden state is initialized to zeros.
Loop (repeat for chars 1 to n-1): each char input is combined with the hidden state, and the hidden state produces an output at every step, predicting the following char.
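The same model, but keeping every step's output so that chars 2 to n are all predicted; the hidden state is explicitly initialized to zeros as on the slide.

```python
import torch
import torch.nn.functional as F
from torch import nn

class CharSeqNet(nn.Module):
    def __init__(self, vocab_size, n_fac, n_hidden):
        super().__init__()
        self.e     = nn.Embedding(vocab_size, n_fac)
        self.rnn   = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)

    def forward(self, cs):                               # cs: seq_len x batch_size
        h = torch.zeros(1, cs.size(1), self.rnn.hidden_size, device=cs.device)  # initialize to zeros
        outp, h = self.rnn(self.e(cs), h)
        return F.log_softmax(self.l_out(outp), dim=-1)   # one prediction per time step
```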
Predicting chars 2 to n using chars 1 to n-1 using stacked RNNs
Two RNN layers are stacked; each has its own hidden state initialized to zeros, and each loop repeats for chars 1 to n-1. The first layer's hidden states feed the second layer, which produces the outputs.
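Stacking amounts to a second recurrent layer fed by the first layer's hidden-state sequence; in PyTorch this is the num_layers argument, and the zero-initialized state gains one slice per layer. The sizes below are illustrative placeholders.

```python
import torch
from torch import nn

n_fac, n_hidden, batch_size = 42, 256, 64       # example sizes
rnn = nn.RNN(n_fac, n_hidden, num_layers=2)     # two stacked RNN layers
h0  = torch.zeros(2, batch_size, n_hidden)      # "initialize to zeros", once per layer
x   = torch.randn(8, batch_size, n_fac)         # a dummy 8-step input sequence
outp, hn = rnn(x, h0)                           # outp comes from the top layer only
```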
Unrolled stacked RNNs for sequences
The loops are now drawn unrolled: char 1, char 2, and char 3 inputs each pass through the shared InputHidden, HiddenHidden, and HiddenOutput weight matrices, a loss is computed on the outputs, and backprop runs back through the whole unrolled graph.
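A sketch of how that unrolled graph is trained: one loss over all time steps, then a single backward() runs backprop through the unrolled sequence (backpropagation through time). Here `model` is any of the sequence models sketched above, `opt` is an optimizer, and `cs`/`targets` are placeholder batches.

```python
import torch
from torch import nn

crit = nn.NLLLoss()                                      # pairs with the log_softmax outputs above

def train_step(model, opt, cs, targets):                 # cs, targets: seq_len x batch_size
    opt.zero_grad()
    preds = model(cs)                                    # seq_len x batch_size x vocab_size
    loss = crit(preds.reshape(-1, preds.size(-1)), targets.reshape(-1))
    loss.backward()                                      # backprop through the unrolled steps
    opt.step()
    return loss.item()
```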