Lesson 7 - RNN

This document provides an overview of Recurrent Neural Networks (RNNs) and their applications, including LSTMs and GRUs. It discusses the architecture, working principles, and challenges of RNNs, such as the vanishing gradient problem, while also highlighting practical applications like stock price prediction and sentiment analysis. The document emphasizes the importance of sequential data in training these models and offers insights into advanced structures like deep and bidirectional RNNs.

Deep Learning with Keras and TensorFlow
Recurrent Neural Networks (RNN)
Learning Objectives

By the end of this lesson, you will be able to:

Implement RNNs for sequential data

Use LSTMs for memory operations within RNNs

Perform gated operations in RNNs using GRUs

Improve the performance of LSTMs using the Attention mechanism
Sequence Data
What Is Sequential Data?

A dataset is said to be sequential when its data points depend on other data points within the dataset.

Example: Time Series Data


Sequential Data: Problems

Consider sequential data that contains two values, temperature and humidity, for every day.

Goal: To build a neural network that takes the temperature and humidity values of a
given day as input and predicts if the weather for that day is sunny or rainy.
Sequential Data: Problems

The data then flows to the hidden layers, where the weights and biases are applied.

(Diagram: the day's inputs, e.g., [0.23, 0.72], are multiplied by the weights and a bias is added; the output layer predicts a class such as cloudy or sunny.)
Sequential Data: Problems

A traditional neural network assumes that the data is non-sequential and each data point is
independent of the others.

Note: The network does not remember what it gives as an output. It just accepts the next
data point.
Sequential Data: Problems

In the weather data, there is a strong correlation between the weather on one day and the
weather on subsequent days: the former influences the latter.

If it was sunny on a day in the middle of summer, it is easy to presume that it will also be
sunny on the following day.
Sequential Data: Solution

An RNN has a mechanism that can handle a sequential dataset.

(Diagram: data in → RNN → data out; the RNN recurs its new state back to itself.)
RNN Model
The RNN Model

The RNN remembers the analysis done up to a given point by maintaining a state.

(Diagram: data in → RNN → data out; the RNN recurs its new state back to itself.)

Note: You can think of the state as the memory of the RNN, which recurs into the net with
each new input.
RNN: Working

The first data point flows into the network as input data, denoted as x.

(Diagram: the input x enters the network through the weight matrix W_x, which sits between the input and the hidden units; the previous state h_prv, received by the hidden units, is multiplied by the recurrent weight matrix W_h; the network produces the output y and recurs the new state to itself.)
RNN: Working

Two values are calculated in the hidden layer:

The new or updated state, denoted as h_new, which is used for the next data point

The output of the network, denoted as y
RNN: Working

h_new = tanh(W_h · h_prv + W_x · x)

The new state is a function of the previous state and the input data.
RNN: Working

The output of the hidden unit is calculated simply by multiplying the new hidden state by the
output weight matrix.

After processing the first data point, a new context is generated that represents the most recent point.
This context is then fed back into the net with the next data point, and these steps are repeated until all
the data is processed.
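A minimal NumPy sketch of this recurrence follows; the weight shapes, the output weight matrix W_y, and the dummy data are illustrative assumptions, not values from the lesson.

import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))   # input-to-hidden weights (4 hidden units, 3 input features)
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights
W_y = rng.normal(size=(2, 4))   # hypothetical hidden-to-output weights

def rnn_step(x, h_prev):
    # New state from the previous state and the input, then the output.
    h_new = np.tanh(W_h @ h_prev + W_x @ x)
    y = W_y @ h_new
    return h_new, y

h = np.zeros(4)                       # initial state
sequence = rng.normal(size=(5, 3))    # five timesteps of dummy data
for x_t in sequence:
    h, y_t = rnn_step(x_t, h)         # the new state recurs into the next step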
A Typical RNN
(Diagram: an RNN unfolded in time. At each timestep t, the input x_t enters through the weight matrix U, the state s_t is carried to the next step through the recurrent weights W, and the output o_t is produced through V; the same weights U, W, and V are shared across all timesteps.)
Reduces Complexity

Given a function f with h’, y = f(h, x), where h and h’ are vectors of the same dimension:

(Diagram: the same block f is applied at every step, turning h0 and x1 into h1 and y1, then h1 and x2 into h2 and y2, and so on.)

We only need one function f, irrespective of the length of the input and output sequences.
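As a rough Keras illustration (sequence shape and layer sizes are assumptions), a single SimpleRNN layer plays the role of this shared function f, whatever the sequence length:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 8)),            # 10 timesteps, 8 features each
    tf.keras.layers.SimpleRNN(16),                    # one shared cell, 16 hidden units
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()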
Applications of RNN
Speech Recognition

The goal is to consume a sequence of data and then produce another sequence.

Image Captioning

You can create a model that’s capable of understanding the elements in an image.


Note: There is just one input (the image) and the output is a sequence of words.
Therefore, it is also known as one-to-many.
Sentiment Analysis

RNNs can be used for sentiment analysis, where the network focuses only on the final output and not on the
sentiment behind each individual word.


Note: The RNN here consumes a sequence of data and produces just one output.
Therefore, it is also known as many-to-one.
Deep RNNs
Problems with Smaller RNN Networks

If the input sequence 𝑥1 .... 𝑥𝑛 is very long and continues to grow, a fully connected network over the whole sequence becomes too big.

(Diagram: every input x1 … x4 connected to every output y1 … y3 in a fully connected network.)
Bidirectional RNNs

Bidirectional RNNs are constructed by putting two RNNs (f1 and f2) together, one reading the sequence forward
and one reading it backward. Mathematically, they are defined as h’, y = f1(h, x) and g’, z = f2(g, x), with the
two outputs combined at each position by p = f3(y, z).

(Diagram: the forward RNN f1 propagates states h0, h1, h2, h3 over x1, x2, x3 and emits y1, y2, y3; the backward RNN f2 propagates states g0, g1, g2, g3 in the opposite direction and emits z1, z2, z3; f3 combines y and z into p1, p2, p3.)
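A minimal Keras sketch of this idea (shapes and sizes are illustrative assumptions): the Bidirectional wrapper runs one RNN forward and one backward over the input and combines their outputs.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 8)),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(16, return_sequences=True)),  # forward and backward RNNs
    tf.keras.layers.Dense(1, activation="sigmoid"),             # combined per-position output
])
model.summary()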
Deep RNNs

Deep RNNs are constructed by adding more layers to simple RNNs. Mathematically, they can be defined as
h’, y = f1(h, x) and g’, z = f2(g, y): the second layer f2 consumes the outputs y of the first layer f1.

(Diagram: the first layer f1 turns x1, x2, x3 into y1, y2, y3 while carrying states h0 … h3; the second layer f2 turns y1, y2, y3 into z1, z2, z3 while carrying states g0 … g3.)
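A minimal Keras sketch of a stacked (deep) RNN under the same illustrative assumptions: the first layer returns its full output sequence, which the second layer consumes.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 8)),
    tf.keras.layers.SimpleRNN(16, return_sequences=True),  # f1: emits y1 … yn
    tf.keras.layers.SimpleRNN(16),                          # f2: consumes y1 … yn
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()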
Pyramid RNNs

Pyramid RNNs speed up the training process by reducing the number of timesteps.

(Diagram: a pyramid of bidirectional RNN layers.)
Problems with Deep RNNs

Deep RNNs are very hard to train and usually don’t remember data beyond certain timesteps.


(Diagram: a two-layer deep RNN unrolled over several timesteps.)
The Problem of Vanishing Gradient with RNNs

The problem arises while updating weights in RNNs. These weights connect the hidden layers to
themselves in the unrolled temporal loop.

(Diagram: an unrolled RNN with error terms ε at each timestep; the gradient flows back through the recurrent weights W_rec, while W_in and W_out connect the inputs and outputs at each step.)

Note: When a value is repeatedly multiplied by a small number, it shrinks very quickly; gradients propagated back through many timesteps therefore tend to vanish.
Long Short-Term Memory (LSTM)
LSTM Architecture

(Diagram of the LSTM cell: a forget gate decides what to forget, an input gate decides what to insert, the cell holds bits of memory, and the incoming state is combined with the transformed x_t.)
LSTM Architecture

Forget gate: decides which part of the memory to forget. The part to be forgotten is denoted with 0.
LSTM Architecture

Input gate: decides which bits to insert into the next state.

Candidate content: decides what content to store in the next state.
LSTM Architecture

Cell state update: the content of the next memory cell is a mixture of the not-forgotten part of the previous cell and the newly inserted content.
LSTM Architecture

Output gate: decides which part of the cell to output; the output bits are mapped into the range −1 to +1 (via tanh).
A Peephole LSTM

A peephole LSTM allows the gates to peep into the memory (cell state).


Information Flow in LSTM
Information Flow in LSTM

Controls the forget gate:   z_f = σ(W_f · [h_(t−1), x_t])

Controls the input gate:    z_i = σ(W_i · [h_(t−1), x_t])

Updates the information:    z   = tanh(W · [h_(t−1), x_t])

Controls the output gate:   z_o = σ(W_o · [h_(t−1), x_t])

Note: These four matrix computations are done concurrently.


Information Flow in LSTM

c_t = z_f ⊙ c_(t−1) + z_i ⊙ z

h_t = z_o ⊙ tanh(c_t)

y_t = σ(W’ · h_t)

Note: ⊙ signifies element-wise multiplication.
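A minimal NumPy sketch of these equations; the sizes and random weights are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_f, W_i, W_c, W_o = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))
W_out = rng.normal(size=(n_hid, n_hid))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev):
    v = np.concatenate([h_prev, x_t])   # [h_(t-1), x_t]
    z_f = sigmoid(W_f @ v)               # forget gate
    z_i = sigmoid(W_i @ v)               # input gate
    z = np.tanh(W_c @ v)                 # candidate content
    z_o = sigmoid(W_o @ v)               # output gate
    c_t = z_f * c_prev + z_i * z         # element-wise gating of the memory
    h_t = z_o * np.tanh(c_t)
    y_t = sigmoid(W_out @ h_t)
    return h_t, c_t, y_t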


Information Flow in LSTM

The memory from one state is fed to another state along with the new input.

(Diagram: two consecutive LSTM cells. The cell state c_t and hidden state h_t produced at time t are passed, together with the next input x_(t+1), into the cell at time t+1, which produces c_(t+1), h_(t+1), and y_(t+1).)
Stock Price Prediction Using LSTM

Problem Statement: Forecasting stock prices has long been a difficult task for researchers and
analysts. The many complicated financial indicators make stock market fluctuations highly volatile,
so predicting market value is of great importance for maximizing the profit of stock option
purchases while keeping the risk low.
Objective: Use an LSTM approach to predict stock market indices on the dataset [Link].
Note: The prices dataset is fetched from Yahoo Finance; fundamentals are from Nasdaq Financials,
extended by some fields from EDGAR SEC databases.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter
the username and password in the respective fields, and click Login.
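A minimal sketch of the kind of model used in this lab: closing prices turned into sliding windows and fed to an LSTM regressor. The file name, the "close" column, and the window length are assumptions, not the lab's exact code.

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

prices = pd.read_csv("prices.csv")["close"].values.reshape(-1, 1)  # hypothetical file/column
scaled = MinMaxScaler().fit_transform(prices)

window = 60                                        # 60 past days predict the next day
X = np.array([scaled[i - window:i] for i in range(window, len(scaled))])
y = scaled[window:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(1),                      # regression output: next price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32)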
Multiclass Classification Using LSTM

Problem Statement: You are given a news aggregator dataset that contains news headlines,
URLs, and categories for 422,937 news stories collected by a web aggregator. These news articles
have to be categorized into business, science and technology, entertainment, and health.
Objective: Perform multiclass classification using LSTM.
Note: Use [Link] for the above task.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter
the username and password in the respective fields, and click Login.
Load Libraries

Import the necessary libraries.
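One plausible set of imports for the steps that follow; the lab's exact list may differ.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt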


Load Data

Load the .csv file and check the data count in each class.

Note: The m class has far less data than the others, so the classes are imbalanced.
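A sketch of this step, assuming the imports above; the file and column names are assumptions based on the public news aggregator dataset.

df = pd.read_csv("news_aggregator.csv")          # hypothetical file name
print(df["CATEGORY"].value_counts())             # counts for b, t, e, m reveal the imbalance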
Balance the Data

Perform shuffling to balance the classes.


Encode the Data

Perform one-hot encoding on the labels data.
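Continuing the sketch: the four category codes can be mapped to integers and one-hot encoded (the column name and code-to-label mapping are assumptions).

labels = df["CATEGORY"].map({"b": 0, "t": 1, "e": 2, "m": 3}).values
y = tf.keras.utils.to_categorical(labels, num_classes=4)   # one-hot labels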


Tokenization

Perform tokenization and identify the number of unique tokens.
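Continuing the sketch: a Keras Tokenizer fits on the headlines and reports the vocabulary size (max_words is an illustrative choice).

max_words = 20000
tokenizer = Tokenizer(num_words=max_words, oov_token="<unk>")
tokenizer.fit_on_texts(df["TITLE"])                         # hypothetical headline column
sequences = tokenizer.texts_to_sequences(df["TITLE"])
print("unique tokens:", len(tokenizer.word_index))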


Create Train and Test Sets

Split the dataset into training and testing sets. Also, define the epochs, batch size, and label names, as sketched below.
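Continuing the sketch; the specific values are illustrative assumptions.

max_len = 30
X = pad_sequences(sequences, maxlen=max_len)                # fixed-length integer sequences
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
epochs, batch_size = 5, 256
class_names = ["business", "science and technology", "entertainment", "health"]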
Define the LSTM Model

Define the LSTM model and fit it to the processed data.
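A minimal sketch of such a model, continuing the variables above (layer sizes are illustrative choices, not the lab's exact architecture).

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(max_words, 128),              # word embeddings
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(4, activation="softmax"),          # four news categories
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, validation_split=0.1,
                    epochs=epochs, batch_size=batch_size)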
Check Performance

Evaluate the model on the training and testing sets and obtain its accuracy.
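Continuing the sketch:

train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")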
Plot Metrics

Plot the model’s training accuracy versus validation accuracy and training loss versus validation loss.
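Continuing the sketch, using the History object returned by model.fit:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["accuracy"], label="train")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set_title("Accuracy"); ax1.legend()
ax2.plot(history.history["loss"], label="train")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set_title("Loss"); ax2.legend()
plt.show()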
Plot Metrics
Perform Predictions

Perform label predictions against random data.
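Continuing the sketch, with a made-up headline:

sample = ["stocks rally as tech earnings beat expectations"]   # hypothetical headline
seq = pad_sequences(tokenizer.texts_to_sequences(sample), maxlen=max_len)
pred = model.predict(seq)
print(class_names[pred.argmax()])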


Sentiment Analysis Using LSTM

Problem Statement: Sentiment analysis is one of the common problems that companies work on,
and its most important applications arise in natural language processing tasks. Your company is
building a sentiment analyzer to identify employee concerns and to develop programs that
improve the likelihood of employees staying in their jobs.
Objective: Use LSTM to perform sentiment analysis in Keras.
Note: Use the built-in imdb dataset from [Link] for this task.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter
the username and password in the respective fields, and click Login.
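A minimal sketch of sentiment analysis with the built-in Keras IMDB dataset; the vocabulary size, sequence length, and layer sizes are illustrative choices.

import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size, max_len = 10000, 200
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # positive vs. negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=3, batch_size=128)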
Gated Recurrent Unit (GRU)
GRU Architecture

A GRU regulates the flow of information with two gates:

Update Gate
Reset Gate
GRU Architecture

Update Gate: Determines how much of the past information (from the previous timesteps)
needs to be passed along to the future.

Reset Gate

Current Memory

Current State
GRU Architecture

Update Gate

Reset Gate: Determines how much of the past information needs to be forgotten.

Current Memory

Current State
GRU Architecture

Update Gate

Reset Gate

Current Memory: Computed using the reset gate to store the relevant information from the past.

Current State

Note: Here tanh is the nonlinear activation function.
GRU Architecture

Update Gate

Reset Gate

Current Memory

Current State: At the final stage, the vector h_t is calculated so that it holds the information
of the current unit and passes it down the network.
LSTM vs. GRU

LSTM

• Tracks long-term dependencies while mitigating the vanishing and exploding gradient
problems; it does so via input, forget, and output gates.

• Controls the exposure of its memory content.

GRU

• Tracks long-term dependencies using a reset gate and an update gate.

• Exposes the entire cell state to other units in the network.
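As a rough Keras illustration (shapes and sizes are assumptions), a GRU layer is a drop-in replacement for an LSTM layer:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(200,)),
    tf.keras.layers.Embedding(10000, 64),
    tf.keras.layers.GRU(64),                        # swap LSTM(64) for GRU(64)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()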


The Attention Model
Attention Model
Encoder-Decoder Framework

Encoder: From word sequence to sentence representation


Decoder: From representation to word sequence distribution
Universal Representation: Intermediate representation of meaning

(Diagram: for unilingual data, an English encoder feeds an English decoder, reconstructing an English sentence from an English sentence; for bitext data, a French encoder feeds an English decoder, mapping a French sentence to an English sentence through the intermediate representation.)


Motivation

The fixed intermediate representation is limited, and it becomes a constraint over longer distances.
Improving Performance with LSTM
Example: Reversing the input order

Instead of mapping the sentence a, b, c to the sentence α, β, γ, the LSTM is asked to map c, b, a to
α, β, γ, where α, β, γ is the translation of a, b, c. This way, a is in close proximity to α, b is fairly
close to β, and so on.
Examples of Attention
Example 1

(Diagram: one-to-many, many-to-one, and many-to-many configurations.)

The way an LSTM chooses what to forget and what to insert into memory determines which
inputs the network will attend to in the generation phase.
Example 2
Example 3

(Figure: translated MNIST inputs and cluttered translated MNIST inputs.)
Example 4

Similar to the way an LSTM chooses what to forget and what to insert into memory, attention
allows a network to choose a path to focus on in the visual field.
The Attention Mechanism
Improving Performance with LSTM

Consider an input (or intermediate) sequence or image.

Consider an upper-level representation that can choose where to look by assigning a weight or
probability to each input position; this weighting is applied at every position.

(Diagram: a softmax over the lower-level locations, conditioned on the context at the lower and higher levels, connects the lower-level representation to the higher-level one.)
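A minimal NumPy sketch of this weighting step: scores for each lower-level position are computed from the higher-level context, passed through a softmax, and used to form a weighted sum. The dimensions are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
lower = rng.normal(size=(6, 8))     # 6 input positions, 8-dimensional features
context = rng.normal(size=(8,))     # higher-level context vector

scores = lower @ context                          # one score per position
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over positions
attended = weights @ lower                        # weighted sum the upper level attends to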
NMT with Recurrent Nets and Attention Mechanism
Image-to-Text: Caption Generation with Attention
Neural Attention Models
Neural Attention Model for Sentence Summarization

(Figure: a heatmap representing a soft alignment between the input and the generated summary, together with the output of the attention-based summarization system.)
Neural Attention Model for Sentence Summarization

(Diagram: an attention-based encoder (enc3) combined with an NNLM (neural network language model) decoder that has an additional encoder element.)
Neural Attention Model for Sentence Summarization

Decoder: NNLM
Neural Attention Model for Sentence Summarization

Encoder: NNLM
Quora Insincere Questions Classification

Problem Statement: An existential problem for any major website today is how to handle toxic
and divisive content. Quora wants to tackle this problem head-on to keep their platform a place
where users can feel safe sharing their knowledge with the world. As an approach to the solution,
you must create models that identify and flag insincere questions (questions intended to make a
statement rather than to look for helpful answers).
Objective: Predict whether a question asked on Quora is sincere or not.
Note: Use the word embeddings provided along with the datasets to accomplish your goal. Also,
use TensorFlow 1.14 for accessing [Link].
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter
the username and password in the respective fields, and click Login.
Key Takeaways

RNNs have a mechanism that can handle a sequential dataset

The memory from one state is fed to another state along with the
new input in LSTMs

Attention models use an encoder-decoder framework


Knowledge Check
Knowledge
Check Why is an RNN (Recurrent Neural Network) used for machine translation, say
translating English to French?
1

a. It can be trained as an unsupervised learning problem

b. It is strictly more powerful than a Convolutional Neural Network (CNN)

c. It is applicable when the input/output is a sequence (e.g., a sequence of words)

d. RNNs represent the recurrent process of Idea->Code->Experiment->Idea->....


Knowledge
Check Why is an RNN (Recurrent Neural Network) used for machine translation, say
translating English to French?
1

a. It can be trained as an unsupervised learning problem

b. It is strictly more powerful than a Convolutional Neural Network (CNN)

c. It is applicable when the input/output is a sequence (e.g., a sequence of words)

d. RNNs represent the recurrent process of Idea->Code->Experiment->Idea->....

The correct answer is c

RNNs are effective on sequential data.


Knowledge
Check What is the probable approach when dealing with “Vanishing Gradient” problem in
RNNs?
2

a. Use modified architectures like LSTM and GRUs

b. Gradient Clipping

c. Dropout

d. All of the above


Knowledge
Check What is the probable approach when dealing with “Vanishing Gradient” problem in
RNNs?
2

a. Use modified architectures like LSTM and GRUs

b. Gradient Clipping

c. Dropout

d. All of the above

The correct answer is a

LSTMs and GRUs avoid the vanishing gradient problem by incorporating gates within RNNs so that only the
relevant information is passed forward.
Stock Price Forecasting

Problem Statement: It's hard not to think of the stock market as a person: it has moods that
can turn from irritable to euphoric. Stock price prediction is of great use to investors, who
constantly review past pricing history and use it to inform their future investment decisions.
Since LSTMs are very powerful networks for sequence prediction, build a deep learning
model to predict the future behavior of stock prices.

Objective: Use LSTM for forecasting stock data.


Note: Use the [Link] to train your model and perform the testing on
[Link] file.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter
the username and password in the respective fields, and click Login.
Thank You
