RNNs for Sequential Data Modeling

The document discusses recurrent neural networks (RNNs) and their ability to model sequential data such as text, speech, and time series: unlike feedforward and convolutional networks, RNNs can process input sequences of variable length. It then works through an example of using a simple RNN for sentiment analysis of movie reviews, with an architecture consisting of an embedding layer, an RNN layer, and a dense output layer that classifies each review as positive or negative. Finally, it evaluates the simple RNN on the movie-review task, reaching about 89% accuracy on the training data.


Recurrent Neural Networks (RNNs)

Shusen Wang
How to model sequential data?

• Limitations of FC Nets and ConvNets:
  • They process the input (e.g., a whole paragraph) at once.
  • Fixed-size input (e.g., an image).
  • Fixed-size output (e.g., a vector of predicted probabilities).

• RNNs are a better way to model sequential data (e.g., text, speech, and time series).
Recurrent Neural Networks (RNNs)

[Figure: an RNN unrolled over the sentence "the cat sat ⋯ mat". Each word is mapped to a word embedding 𝐱ₜ, and the state 𝐡ₜ is updated at every step using one shared parameter matrix.]
Simple RNN Model

Simple RNN

𝐡ₜ = tanh(𝐀 ⋅ [𝐡ₜ₋₁; 𝐱ₜ])

Here tanh is the hyperbolic tangent function, applied entry-wise, and [𝐡ₜ₋₁; 𝐱ₜ] stacks the previous state on top of the current word embedding.
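To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass (not from the slides; the random matrix A and the inputs are placeholders):

import numpy as np

state_dim, embedding_dim = 32, 32  # shape(h) and shape(x), as in the slides
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(state_dim, state_dim + embedding_dim))  # shared parameter matrix

def simple_rnn_step(h_prev, x_t):
    """One step of the simple RNN: h_t = tanh(A · [h_{t-1}; x_t])."""
    return np.tanh(A @ np.concatenate([h_prev, x_t]))

h = np.zeros(state_dim)                            # initial state h_0
for x_t in rng.normal(size=(500, embedding_dim)):  # 500 word embeddings
    h = simple_rnn_step(h, x_t)                    # the final h aggregates the whole sequence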
Simple RNN

Question: Why do we need the tanh function?

• Suppose 𝐱₁ = ⋯ = 𝐱₁₀₀ = 𝟎. Without the tanh, the update would be purely linear, so
  𝐡₁₀₀ = 𝐀𝐡₉₉ = 𝐀²𝐡₉₈ = ⋯ = 𝐀¹⁰⁰𝐡₀.
• What will happen if 𝜆max(𝐀) = 0.9? (Then 0.9¹⁰⁰ ≈ 3×10⁻⁵, so 𝐡₁₀₀ shrinks toward the zero vector.)
• What will happen if 𝜆max(𝐀) = 1.2? (Then 1.2¹⁰⁰ ≈ 8×10⁷, so the entries of 𝐡₁₀₀ blow up.)
• The tanh squashes every entry of the state back into (−1, 1) after each step, which keeps the recursion numerically stable.
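A quick numeric check of the two cases (a sketch; a diagonal A is used so that its largest eigenvalue is explicit):

import numpy as np

h0 = np.ones(4)
for lam in (0.9, 1.2):
    A = lam * np.eye(4)                            # λmax(A) = lam
    h100 = np.linalg.matrix_power(A, 100) @ h0     # h_100 = A^100 · h_0
    print(lam, np.linalg.norm(h100))               # ≈ 5e-05 for 0.9, ≈ 1.7e+08 for 1.2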
Simple RNN

Trainable parameters: matrix 𝐀
• #rows of 𝐀: shape(𝐡)
• #cols of 𝐀: shape(𝐡) + shape(𝐱)
• Total #parameters: shape(𝐡) × [shape(𝐡) + shape(𝐱)]
Simple RNN for Movie Review Analysis
Simple RNN for IMDB Review

[Figure: an RNN over the review "i love the ⋯ much". Each word embedding has shape(𝐱) = 32, each state has shape(𝐡) = 32, and the final state feeds the output layer sigmoid(𝐯ᵀ𝐡ₜ).]
4. Build a Simple Recurrent Neural Network
Simple RNN for IMDB Review

In [8]:
from keras.models import Sequential
from keras.layers import SimpleRNN, Embedding, Dense

vocabulary = 10000      # unique words in the dictionary
embedding_dim = 32      # shape(x) = 32
word_num = 500          # sequence length
state_dim = 32          # shape(h) = 32

model = Sequential()
model.add(Embedding(vocabulary, embedding_dim, input_length=word_num))
model.add(SimpleRNN(state_dim, return_sequences=False))  # only return the last state h_T
model.add(Dense(1, activation='sigmoid'))

model.summary()
Using TensorFlow backend.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           320000
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 32)                2080
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33
=================================================================
Total params: 322,113
Trainable params: 322,113
Non-trainable params: 0
_________________________________________________________________

#parameters in the SimpleRNN layer: 2080 = 32 × (32 + 32) + 32,
i.e., shape(𝐡) × [shape(𝐡) + shape(𝐱)] plus an intercept (bias) vector of length shape(𝐡).
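The slides never show how x_train and y_train are built. A minimal sketch using Keras's built-in IMDB loader (the split sizes match the training log below, but the exact preprocessing in the original notebook is an assumption):

from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

# Keep only the 10,000 most frequent words; rarer words become an out-of-vocabulary token.
(x_train, y_train), (x_test, labels_test) = imdb.load_data(num_words=vocabulary)

# Truncate or pad every review to exactly word_num = 500 tokens.
x_train = pad_sequences(x_train, maxlen=word_num)
x_test = pad_sequences(x_test, maxlen=word_num)

# Hold out 5,000 of the 25,000 training reviews for validation
# (matches "Train on 20000 samples, validate on 5000 samples" below).
x_valid, y_valid = x_train[:5000], y_train[:5000]
x_train, y_train = x_train[5000:], y_train[5000:]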

Simple RNN for IMDB Review

In [9]:
from keras import optimizers

epochs = 3   # early stopping alleviates overfitting

model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='binary_crossentropy', metrics=['acc'])
history = model.fit(x_train, y_train, epochs=epochs,
                    batch_size=32, validation_data=(x_valid, y_valid))

Train on 20000 samples, validate on 5000 samples
Epoch 1/3
20000/20000 [==============================] - 65s 3ms/step - loss: 0.5514 - acc: 0.6959 - val_loss: 0.4095 - val_acc: 0.8176
Epoch 2/3
20000/20000 [==============================] - 66s 3ms/step - loss: 0.3336 - acc: 0.8620 - val_loss: 0.3296 - val_acc: 0.8658
Epoch 3/3
20000/20000 [==============================] - 65s 3ms/step - loss: 0.2774 - acc: 0.8918 - val_loss: 0.3569 - val_acc: 0.8428

In [10]:
import matplotlib.pyplot as plt
%matplotlib inline
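The note "early stopping alleviates overfitting" is implemented above simply by capping training at 3 epochs. Keras can automate the same idea with a callback; a sketch, not from the slides:

from keras.callbacks import EarlyStopping

# Stop once validation loss has not improved for 2 consecutive epochs,
# and roll back to the best weights seen so far.
stopper = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

history = model.fit(x_train, y_train, epochs=20, batch_size=32,
                    validation_data=(x_valid, y_valid), callbacks=[stopper])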
Simple RNN for IMDB Review
In [13]:
loss_and_acc = model.evaluate(x_test, labels_test)
print('loss = ' + str(loss_and_acc[0]))
print('acc = ' + str(loss_and_acc[1]))
25000/25000 [==============================] - 21s 833us/step
loss = 0.6593638356399536
acc = 0.78984

Higher than a naïve shallow model (whose test accuracy is about 75%).
Simple RNN for IMDB Review

• Training Accuracy: 89.2%
• Validation Accuracy: 84.3%
• Test Accuracy: 84.4%

Higher than a naïve shallow model (whose test accuracy is about 75%).
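To use the trained classifier on a new review, the text must be encoded with the same word indices as the IMDB training data. A hypothetical example (the offsets follow keras.datasets.imdb's defaults: 1 = start marker, 2 = out-of-vocabulary, word indices shifted by 3):

from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

word_index = imdb.get_word_index()

review = "i love this movie so much".split()
ids = [1]                                    # start-of-sequence marker
for w in review:
    r = word_index.get(w)
    ids.append(r + 3 if r is not None and r + 3 < vocabulary else 2)

x = pad_sequences([ids], maxlen=word_num)    # pad to 500 tokens, like the training data
print(model.predict(x)[0, 0])                # near 1.0 → positive, near 0.0 → negative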
Simple RNN for IMDB Review

Variant: instead of keeping only the final state, flatten all the states into one vector,
𝐡 = vec(𝐡₁, ⋯, 𝐡ₜ),
and classify with sigmoid(𝐯ᵀ𝐡).

[Figure: the RNN over "i love the ⋯ much", with every state feeding the classifier.]
Simple RNN for IMDB Review

In [15]:
from keras.models import Sequential
from keras.layers import SimpleRNN, Embedding, Dense, Flatten

vocabulary = 10000
embedding_dim = 32
word_num = 500
state_dim = 32

model = Sequential()
model.add(Embedding(vocabulary, embedding_dim, input_length=word_num))
model.add(SimpleRNN(state_dim, return_sequences=True))  # return all the states h_1, ⋯, h_T
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_2 (Embedding)      (None, 500, 32)           320000
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 500, 32)           2080
_________________________________________________________________
flatten_2 (Flatten)          (None, 16000)             0
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 16001
=================================================================
Total params: 338,081
Trainable params: 338,081
Non-trainable params: 0
_________________________________________________________________
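A quick sanity check of the new layer sizes against the summary above (plain arithmetic):

word_num, state_dim = 500, 32
flatten_dim = word_num * state_dim   # 500 × 32 = 16000 entries in vec(h_1, ⋯, h_T)
dense_params = flatten_dim + 1       # one weight per entry plus one bias → 16001
print(flatten_dim, dense_params)     # 16000 16001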
Simple RNN for IMDB Review

• Training Accuracy: 96.3%
• Validation Accuracy: 85.4%
• Test Accuracy: 84.7%

Not really better than using only the final state (whose test accuracy is 84.4%).
Shortcomings of SimpleRNN

SimpleRNN is good at short-term dependence.
• Input text: "clouds are in the" → predicted next word: "sky".

SimpleRNN is bad at long-term dependence.
• 𝐡₁₀₀ is almost irrelevant to 𝐱₁: the gradient ∂𝐡₁₀₀/∂𝐱₁ is near zero.
• Input text: "⋯ in China ⋯ speak fluent" → predicted next word: "Chinese". The correct prediction depends on "China", which appears far back in the sequence.

Figures are from Christopher Olah's blog: Understanding LSTM Networks.

Summary

• RNNs are for text, speech, and time-series data.
• The hidden state 𝐡ₜ aggregates information from the inputs 𝐱₁, ⋯, 𝐱ₜ.
• RNNs can forget early inputs: if 𝑡 is large, 𝐡ₜ is almost irrelevant to 𝐱₁.
Number of Parameters

• SimpleRNN has one parameter matrix (and perhaps an intercept vector).
• The shape of the parameter matrix is shape(𝐡) × [shape(𝐡) + shape(𝐱)].
• There is only one such parameter matrix, no matter how long the sequence is.
Thank You!
