Recurrent Neural Networks (RNNs)
Shusen Wang
How to model sequential data?
• Limitations of FC Nets and ConvNets:
• They process the whole input (e.g., a whole paragraph) in one shot rather than step by step.
• Fixed-size input (e.g., an image).
• Fixed-size output (e.g., predicted probabilities).
• RNNs are better suited to modeling sequential data (e.g., text, speech, and time series).
Recurrent Neural Networks (RNNs)
[Figure: an RNN unrolled over the sentence "the cat sat ⋯ mat". Each word is mapped to a word embedding 𝐱ₜ, the hidden state is updated step by step, and the same parameter matrix is shared across all time steps.]
Simple RNN Model
Simple RNN
𝐡ₜ = tanh(𝐀 ⋅ [𝐡ₜ₋₁; 𝐱ₜ]), where tanh(⋅) is the hyperbolic tangent function and [𝐡ₜ₋₁; 𝐱ₜ] stacks the previous state and the current word embedding into one vector.
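To make the update concrete, here is a minimal NumPy sketch of one recurrence step (my own illustration; the variable names A, h_prev, x are not from the slides):
import numpy as np

def simple_rnn_step(A, h_prev, x):
    # h_t = tanh(A · [h_{t-1}; x_t]): concatenate the old state with the new input,
    # multiply by the shared parameter matrix A, then squash with tanh.
    return np.tanh(A @ np.concatenate([h_prev, x]))

# example shapes matching the slides: shape(h) = shape(x) = 32
A = np.random.randn(32, 64) * 0.1
h = np.zeros(32)
for x in np.random.randn(500, 32):   # one 500-word review, one embedding per word
    h = simple_rnn_step(A, h, x)     # h now summarizes the whole sequence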
Simple RNN
Question: Why do we need the tanh function?
• Suppose 𝐱₁ = ⋯ = 𝐱₁₀₀ = 𝟎, and drop the tanh so that the update is purely linear.
• Then 𝐡₁₀₀ = 𝐀𝐡₉₉ = 𝐀²𝐡₉₈ = ⋯ = 𝐀¹⁰⁰𝐡₀.
• What will happen if 𝜆max(𝐀) = 0.9? (0.9¹⁰⁰ ≈ 3×10⁻⁵, so the state shrinks toward zero.)
• What will happen if 𝜆max(𝐀) = 1.2? (1.2¹⁰⁰ ≈ 8×10⁷, so the state blows up.)
• The tanh acts as a normalization that keeps every entry of 𝐡ₜ in (−1, 1).
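A quick numerical illustration of the two cases (my own sketch, not from the slides), using a diagonal 𝐀 so that 𝜆max is easy to control:
import numpy as np

h0 = np.ones(32)                      # some nonzero initial state
for lam in (0.9, 1.2):
    A = lam * np.eye(32)              # largest eigenvalue of A is exactly lam
    h = h0.copy()
    for _ in range(100):              # 100 steps with x_t = 0 and no tanh
        h = A @ h
    print(lam, np.linalg.norm(h))     # 0.9 -> ~2.7e-5 * ||h0|| (vanishes), 1.2 -> ~8.3e7 * ||h0|| (explodes)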
Simple RNN
Trainable parameters: the matrix 𝐀 in 𝐡ₜ = tanh(𝐀 ⋅ [𝐡ₜ₋₁; 𝐱ₜ]).
• #rows of 𝐀: shape(h).
• #cols of 𝐀: shape(h) + shape(x).
• Total #parameters: shape(h) × [shape(h) + shape(x)].
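For the concrete shapes used below (shape(h) = shape(x) = 32), this is a quick sanity check of the count; note that Keras also adds an intercept (bias) vector of length shape(h) by default, which is where the 2080 in the later model summary comes from:
state_dim = 32        # shape(h)
embedding_dim = 32    # shape(x)

matrix_params = state_dim * (state_dim + embedding_dim)   # the matrix A: 32 x 64 = 2048
bias_params = state_dim                                    # intercept (bias) vector added by Keras
print(matrix_params, matrix_params + bias_params)          # 2048 2080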
Simple RNN for Movie Review Analysis
Simple RNN for IMDB Review
[Figure: a SimpleRNN applied to a review "i love the ⋯ much". Each word is mapped to a 32-dimensional word embedding 𝐱ₜ (shape(𝐱) = 32), the hidden state has shape(𝐡) = 32, and the final state 𝐡ₜ is fed to the classifier sigmoid(𝐯ᵀ𝐡ₜ), which outputs the probability that the review is positive.]
4. Build a Simple Recurrent Neural Network
Simple RNN for IMDB Review
In [8]:
from keras.models import Sequential
from keras.layers import SimpleRNN, Embedding, Dense

vocabulary = 10000      # number of unique words in the dictionary
embedding_dim = 32      # shape(x) = 32
word_num = 500          # sequence length
state_dim = 32          # shape(h) = 32

model = Sequential()
model.add(Embedding(vocabulary, embedding_dim, input_length=word_num))
model.add(SimpleRNN(state_dim, return_sequences=False))   # only return the last state h_t
model.add(Dense(1, activation='sigmoid'))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           320000
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 32)                2080
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33
=================================================================
Total params: 322,113
Trainable params: 322,113
Non-trainable params: 0
_________________________________________________________________
#parameters in the SimpleRNN layer: 2080 = 32 × (32 + 32) + 32, i.e., shape(h) × [shape(h) + shape(x)] + shape(h) for the intercept vector.
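The training and evaluation cells below use x_train, y_train, x_valid, y_valid, x_test, and labels_test, which were prepared in earlier cells not shown in these slides. A minimal sketch of that preprocessing (my own reconstruction; the variable names and the 5,000-review validation split are assumptions based on the training log below), reusing vocabulary and word_num from the cell above:
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

# load the IMDB reviews, keeping only the 10,000 most frequent words
(train_data, train_labels), (test_data, labels_test) = imdb.load_data(num_words=vocabulary)

# pad / truncate every review to exactly word_num = 500 tokens
train_data = pad_sequences(train_data, maxlen=word_num)
x_test = pad_sequences(test_data, maxlen=word_num)

# hold out 5,000 of the 25,000 training reviews for validation
x_valid, y_valid = train_data[:5000], train_labels[:5000]
x_train, y_train = train_data[5000:], train_labels[5000:]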
Simple RNN for IMDB Review
In [9]:
from keras import optimizers

epochs = 3     # stopping early (only 3 epochs) alleviates overfitting

model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='binary_crossentropy', metrics=['acc'])
history = model.fit(x_train, y_train, epochs=epochs,
                    batch_size=32, validation_data=(x_valid, y_valid))
Train on 20000 samples, validate on 5000 samples
Epoch 1/3
20000/20000 [==============================] - 65s 3ms/step - loss: 0.5514 - acc: 0.6959 - val_loss: 0.4095 - val_acc: 0.8176
Epoch 2/3
20000/20000 [==============================] - 66s 3ms/step - loss: 0.3336 - acc: 0.8620 - val_loss: 0.3296 - val_acc: 0.8658
Epoch 3/3
20000/20000 [==============================] - 65s 3ms/step - loss: 0.2774 - acc: 0.8918 - val_loss: 0.3569 - val_acc: 0.8428
In [10]:
import matplotlib.pyplot as plt
%matplotlib inline
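The plotting cell is truncated in these slides; a minimal sketch of what it presumably plots (training vs. validation accuracy stored in history.history), continuing the cell above:
acc = history.history['acc']            # training accuracy per epoch
val_acc = history.history['val_acc']    # validation accuracy per epoch
epochs_range = range(1, len(acc) + 1)

plt.plot(epochs_range, acc, 'bo-', label='Training acc')
plt.plot(epochs_range, val_acc, 'r^-', label='Validation acc')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()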
Simple RNN for IMDB Review
In [13]:
loss_and_acc = model.evaluate(x_test, labels_test)
print('loss = ' + str(loss_and_acc[0]))
print('acc = ' + str(loss_and_acc[1]))
25000/25000 [==============================] - 21s 833us/step
loss = 0.6593638356399536
acc = 0.78984
Higher than a naïve shallow model (whose test accuracy is about 75%).
Simple RNN for IMDB Review
• Training Accuracy: 89.2%
• Validation Accuracy: 84.3%
• Test Accuracy: 84.4%
Higher than a naïve shallow model (whose test accuracy is about 75%).
Simple RNN for IMDB Review
[Figure: a SimpleRNN variant that keeps all the states. The states 𝐡₁, ⋯, 𝐡ₜ are concatenated by a Flatten layer, 𝐡 = vec[𝐡₁, ⋯, 𝐡ₜ], and the classifier sigmoid(𝐯ᵀ𝐡) is applied to the flattened vector.]
Simple RNN for IMDB Review
In [15]:
from keras.models import Sequential
from keras.layers import SimpleRNN, Embedding, Dense, Flatten

vocabulary = 10000
embedding_dim = 32
word_num = 500
state_dim = 32

model = Sequential()
model.add(Embedding(vocabulary, embedding_dim, input_length=word_num))
model.add(SimpleRNN(state_dim, return_sequences=True))   # return all the states h_1, ..., h_t
model.add(Flatten())                                     # concatenate the 500 states into one vector
model.add(Dense(1, activation='sigmoid'))

model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 500, 32) 320000
_________________________________________________________________
simple_rnn_2 (SimpleRNN) (None, 500, 32) 2080
_________________________________________________________________
flatten_2 (Flatten) (None, 16000) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 16001
=================================================================
Total params: 338,081
Trainable params: 338,081
Non-trainable params: 0
_________________________________________________________________
Simple RNN for IMDB Review
• Training Accuracy: 96.3%
• Validation Accuracy: 85.4%
• Test Accuracy: 84.7%
Not really better than using only the final state (whose accuracy is 84.4%).
Shortcomings of SimpleRNN
SimpleRNN is good at short-term dependence.
Input text: "clouds are in the ___"
Predicted next word: "sky"
Figures are from Christopher Olah's blog: Understanding LSTM Networks.
SimpleRNN is bad at long-term dependence.
𝐡₁₀₀ is almost irrelevant to 𝐱₁: the partial derivative ∂𝐡₁₀₀ / ∂𝐱₁ is near zero.
Figures are from Christopher Olah's blog: Understanding LSTM Networks.
SimpleRNN is bad at long-term dependence.
Input text: "... in China ... speak fluent ___"
Predicted next word: "Chinese" (this requires remembering "China" from much earlier in the text).
Figures are from Christopher Olah's blog: Understanding LSTM Networks.
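A rough numerical check of this claim (my own sketch, not from the slides): run the recurrence 𝐡ₜ = tanh(𝐀 ⋅ [𝐡ₜ₋₁; 𝐱ₜ]) over 100 random inputs, perturb either the first or the last input, and compare how much the final state moves.
import numpy as np

rng = np.random.default_rng(0)
dim_h, dim_x, T = 32, 32, 100
A = rng.normal(scale=0.05, size=(dim_h, dim_h + dim_x))   # a small random parameter matrix

def final_state(x_seq):
    h = np.zeros(dim_h)
    for x in x_seq:
        h = np.tanh(A @ np.concatenate([h, x]))
    return h

x_seq = rng.normal(size=(T, dim_x))
x_first, x_last = x_seq.copy(), x_seq.copy()
x_first[0] += 1.0      # perturb the first input x_1
x_last[-1] += 1.0      # perturb the last input x_100

print(np.linalg.norm(final_state(x_first) - final_state(x_seq)))   # ~0: x_1 has been forgotten
print(np.linalg.norm(final_state(x_last) - final_state(x_seq)))    # much larger: recent inputs still matter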
Summary
• RNNs are suited to text, speech, and time-series data.
• The hidden state 𝐡ₜ aggregates information from the inputs 𝐱₁, ⋯, 𝐱ₜ.
• RNNs can forget early inputs: if 𝑡 is large, 𝐡ₜ is almost irrelevant to 𝐱₁.
Number of Parameters
• SimpleRNN has a parameter matrix (and perhaps an intercept vector).
• Shape of the parameter matrix is
shape(h) × [shape(h)+shape(x)].
• Only one such parameter matrix, no matter how long the sequence is.
Thank You!