
Unit-3: Large Language Model

- RNN, Language Attention Model
- Self Attention, Soft vs Hard Attention
- Transformers for Image Recognition
- Key, Value, Query, Encoder-Decoder
Recurrent Neural Network (RNN)
• Unlike feed-forward neural networks, in which activation outputs are propagated only in one direction, in RNNs the activation outputs from neurons propagate in both directions. This creates loops in the neural network architecture which act as a 'memory state' of the neurons. This state gives the neurons the ability to remember what has been learned so far.
• The hidden state allows the network to retain information from past inputs, making it suitable for sequential tasks. This helps the network understand the context of what has already happened and make better predictions based on that.
• For example, when predicting the next word in a sentence, the RNN uses the previous words to help decide what word is most likely to come next.

[Figure: input node, hidden node, and output node, shown for a feed-forward network and for an RNN with a feedback loop on the hidden node]
Recurrent Neural Network (contd)
• How Recurrent Neural Networks work:
Unlike feed-forward neural networks, the RNN takes an input vector X and generates an output vector y by scanning the data sequentially from left to right, updating the hidden state and producing an output at each time step. At each time step i, the hidden state w_i is computed from the current input x_i, the previous hidden state w_{i-1}, and the model parameters.
The same parameters are shared across all time steps, which allows the RNN to capture temporal dependencies and process sequential data efficiently by retaining information from previous inputs in its current hidden state.
[Figure: unrolled RNN. Inputs x_{i-1}, x_i, x_{i+1} feed hidden states w_{i-1}, w_i, w_{i+1}, connected through shared feedback weights, which produce outputs y_{i-1}, y_i, y_{i+1}]
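To make the update above concrete, here is a minimal numpy sketch of one recurrent step. The symbols follow the slide (input x_i, hidden state w_i, output y_i); the sizes and random parameters are illustrative assumptions rather than anything specified in the document.

```python
# Minimal sketch of a single RNN step (illustrative sizes and random weights).
import numpy as np

def rnn_step(x_i, w_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """Update the hidden state from the current input and the previous state."""
    w_i = np.tanh(W_xh @ x_i + W_hh @ w_prev + b_h)   # hidden state carries the memory
    y_i = W_hy @ w_i + b_y                            # output at this time step
    return w_i, y_i

D, H, O = 3, 5, 2                                     # input, hidden, output sizes (assumed)
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(H, D)), rng.normal(size=(H, H)), rng.normal(size=(O, H))
b_h, b_y = np.zeros(H), np.zeros(O)

w = np.zeros(H)                                       # initial hidden state
for x in rng.normal(size=(4, D)):                     # scan the sequence left to right
    w, y = rnn_step(x, w, W_xh, W_hh, W_hy, b_h, b_y)
```

Note that the same weight matrices are reused at every step of the loop, which is exactly the parameter sharing described above.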
Types of RNN
There are four types of Recurrent Neural Networks:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One
This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.

2. One to Many
This type of neural network has a single input and multiple outputs. An example of this is image captioning.

[Figure: one-to-one architecture (single input, single output) and one-to-many architecture (single input, multiple outputs)]
Types of RNN (contd)
3. Many to One
This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good example of this kind of network, where a given sentence can be classified as expressing positive or negative sentiment.

4. Many to Many
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example; word-to-voice conversion is another. A small sketch contrasting these two types is shown below.

[Figure: many-to-one architecture (multiple inputs, single output) and many-to-many architecture (multiple inputs, multiple outputs)]
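As a small illustration of the last two types, the sketch below (assuming PyTorch is available; sizes are arbitrary) runs one recurrent layer over a batch of sequences: a many-to-many model keeps the output at every time step, while a many-to-one model keeps only the last one.

```python
# Contrasting many-to-many and many-to-one use of the same recurrent layer.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)            # batch of 4 sequences, 10 time steps each

out, h_n = rnn(x)
many_to_many = out                   # (4, 10, 16): one output per step (e.g. translation)
many_to_one = out[:, -1, :]          # (4, 16): keep only the last step (e.g. sentiment)
print(many_to_many.shape, many_to_one.shape)
```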


Image Captioning
https://imagecaptiongenerator.com/

Image caption generation is the process of recognizing the context of an image and annotating it with relevant captions using deep learning and computer vision. This task lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework, where an input image is encoded into an intermediate representation of the information in the image and then decoded into a descriptive text sequence.

Using RNNs for image captioning involves leveraging the power of both
Convolutional Neural Networks (CNNs) for
image feature extraction and Recurrent
Neural Networks (RNNs) for generating
descriptive captions.
The process typically follows an encoder-
decoder framework: a CNN (encoder)
analyzes the image to produce a feature
representation, which is then fed into an
RNN (decoder) to generate a textual
description.
A pre-trained CNN, like VGG16 or ResNet, is
used to extract visual features from the
input image. These features represent the
image's content in a numerical format.
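A minimal sketch of this CNN-encoder / RNN-decoder pattern is shown below, assuming PyTorch and torchvision. The use of resnet18, the layer sizes, and the vocabulary size are illustrative assumptions, not the document's exact model; in practice the CNN would be loaded with pre-trained weights and frozen or fine-tuned.

```python
# Minimal sketch of a CNN-encoder + RNN-decoder image captioner (assumed sizes).
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18()                                   # in practice, use pre-trained weights
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the classification head
        self.img_proj = nn.Linear(512, embed_dim)                 # resnet18 feature size is 512
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)              # (B, 512) visual features
        img_emb = self.img_proj(feats).unsqueeze(1)           # (B, 1, E)
        word_emb = self.embed(captions)                       # (B, T, E)
        seq = torch.cat([img_emb, word_emb], dim=1)           # image acts as the first "token"
        hidden, _ = self.decoder(seq)
        return self.out(hidden)                               # (B, T+1, vocab_size) word scores

model = CaptionModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 12)))
print(logits.shape)                                           # torch.Size([2, 13, 5000])
```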
Image Captioning (contd)
During training, the image encoder and text encoder respectively produce image embeddings (I1 and I2) and text embeddings (T1 and T2) in a way that maximizes the cosine similarities between correct image-text pairs (<I1, T1> and <I2, T2>) while minimizing the cosine similarities for dissimilar pairs (the non-diagonal elements), in a contrastive fashion. The image encoder is a computer vision backbone such as a ResNet or Vision Transformer (ViT) [4], and the text encoder is a transformer [5].
[Figure: contrastive matching of image embeddings I1, I2 against text embeddings T1, T2 for the captions "A baby holding the dog" and "A woman walking on the road"; the diagonal entries I1T1 and I2T2 are the correct pairs]
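A minimal sketch of this contrastive objective (CLIP-style) is shown below, assuming PyTorch; the temperature value and embedding sizes are illustrative assumptions. The diagonal of the similarity matrix holds the matched image-text pairs, which the loss pulls together while pushing the off-diagonal pairs apart.

```python
# Sketch of a CLIP-style contrastive loss over a batch of image/text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0))           # diagonal entries are the matched pairs
    # Pull matched pairs together and push mismatched pairs apart, in both directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(2, 512), torch.randn(2, 512))
```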
Most Popular Applications of RNN
RNNs are most effective in handling sequential type of data across
different domains. In the healthcare and pharmacy sectors specifically,
RNNs have shown promise in patient care management, treatment
personalization, and predictive analytics, which are crucial for both clinical
research and drug discovery. Their ability to process sequential and time-
series data makes them invaluable in these fields.
Some of the most popular applications of RNNs include:

1. Natural Language Processing (NLP)


RNNs are effective in NLP because they process sequences of data (like
words in a sentence) one at a time while maintaining an internal state
(memory) that captures information about the sequence processed so
far. The memory helps RNNs understand context, which is crucial in
language for understanding meaning and intent. They predict the
probability of a sequence of words to generate text that is syntactically
and semantically coherent.
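For concreteness, this next-word view of language modeling corresponds to the standard chain-rule factorization (common notation, not taken from the slide):

$$P(w_1, w_2, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})$$

where, in an RNN language model, the hidden state at step t summarizes the prefix w_1, ..., w_{t-1}.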
Most Popular Applications of RNN (contd)
2. Time Series Analysis

• Time series data is inherently sequential. Recurrent Neural Networks


(RNNs) are particularly well-suited for time series analysis due to their
ability to process sequential data and capture temporal dependencies.
• One of the key challenges in time series analysis is capturing the
dependencies across time steps. For example, in stock market
prediction, the price of a stock today could be influenced by its prices
over the past several days or weeks. RNNs, with their recurrent
connections, are able to capture these long-term dependencies, making
them effective for such analysis.
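A minimal sketch of how such a series is framed for an RNN is shown below (plain numpy; the window length and the synthetic "price" series are illustrative assumptions): each training example is a window of past values and the target is the next value.

```python
# Turning a 1-D time series into (past window, next value) training pairs for an RNN.
import numpy as np

def make_windows(series, window=5):
    """Each example is `window` consecutive values; the target is the value that follows."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window])
    return np.array(X)[..., None], np.array(y)   # (N, window, 1) inputs, (N,) targets

prices = np.cumsum(np.random.randn(100))         # stand-in for daily stock prices
X, y = make_windows(prices)
print(X.shape, y.shape)                          # (95, 5, 1) (95,)
```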
3. Medical Records Analysis
• The sequential nature of data in medical records also makes it apt for RNN analysis. In medical records, the order and timing of events, such as symptoms, diagnoses, treatments, and lab results, are essential in understanding a patient's health status and predicting future outcomes.
• For instance, in predicting the risk of future diseases or readmission, an
RNN can analyze the sequence of a patient's medical history, including
past ailments, treatments, and responses. This sequential analysis allows
the RNN to capture patterns and dependencies that might be missed by
models that treat each event independently.
Limitation of RNN
• RNNs suffer from a major drawback, known as the vanishing gradient problem, which prevents them from achieving high accuracy. As the context length increases, the number of layers in the unrolled RNN also increases. Consequently, as the network becomes deeper, the gradients flowing back in the backpropagation step become smaller. As a result, learning becomes very slow, making it infeasible to learn long-term dependencies of the language.
• In simpler words, RNNs have difficulty memorizing words that appear very far back in the sequence and are only able to make predictions based on the most recent words.
• Another limitation of RNNs is that while they can theoretically use all previous context for making predictions, in practice standard RNNs tend to prioritize more recent inputs, which can limit their effectiveness for some types of problems.
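A toy illustration of the vanishing gradient effect described above: the gradient that reaches early time steps is roughly a product of per-step factors, and when those factors are smaller than 1 it shrinks exponentially. The recurrent weight of 0.5 below is an arbitrary illustrative value.

```python
# Toy demonstration of gradients vanishing as they flow back through many time steps.
w_rec = 0.5                      # recurrent weight factor (|w| < 1)
grad = 1.0                       # gradient at the final time step
for step in range(1, 51):
    grad *= w_rec                # one backpropagation step through the recurrence
    if step % 10 == 0:
        print(f"after {step} steps back: {grad:.2e}")
# After 50 steps the gradient is ~8.9e-16, so early inputs barely influence learning.
```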
Limitation of RNN (contd)
• Training RNNs can be challenging due to issues like choosing the
appropriate number of time steps to unroll the network, which
can significantly affect performance and training time
• The nature of RNNs requires processing the sequence one step at a
time, which can lead to slow computation, as it's difficult to parallelize
the operations across a sequence. This can be a bottleneck when
dealing with very long sequences or large datasets.
To address these problems, a new type of RNN called the LSTM (Long Short-Term Memory) model has been developed. LSTMs have an additional state called the 'cell state', through which the network makes adjustments in the information flow. The advantage of this state is that the model can remember or forget learnings more selectively. Because LSTMs can capture long-term dependencies in sequential data, they are well suited to tasks like language translation, speech recognition, and time series forecasting.
LSTM (contd)
LSTM is designed to handle the vanishing gradient problem. It does this by introducing a memory cell regulated by three gating mechanisms that control the flow of information through it: the input gate, the forget gate, and the output gate. The cell is made up of four neural network layers connected in a chain structure. The cells store information, whereas the gates manipulate the memory. Long Short-Term Memory networks otherwise inherit the same architecture as standard RNNs, with the exception of the hidden state.
[Figure: LSTM cell. The cell state C_{t-1} to C_t runs along the top as the memory line, regulated by (1) a forget gate σ, (2) an input gate σ with a tanh candidate layer, and (3) an output gate σ; the previous hidden state h_{t-1} and input x_t enter at the bottom and the new hidden state h_t is produced as output]
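The three gates sketched in the diagram can be put together in a single update. Below is a minimal numpy sketch of one LSTM cell step in the standard formulation; the shapes, the random parameters, and the packing of all four projections into one matrix are illustrative assumptions, not the document's exact model.

```python
# Minimal sketch of one LSTM cell step (illustrative sizes and random weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One time step: all gates are computed from [h_{t-1}, x_t]."""
    z = W @ np.concatenate([h_prev, x_t]) + b        # all four projections at once
    H = h_prev.size
    f_t = sigmoid(z[0:H])                            # forget gate
    i_t = sigmoid(z[H:2*H])                          # input gate
    c_hat = np.tanh(z[2*H:3*H])                      # candidate cell values
    o_t = sigmoid(z[3*H:4*H])                        # output gate
    c_t = f_t * c_prev + i_t * c_hat                 # update the cell state
    h_t = o_t * np.tanh(c_t)                         # expose part of it as the output
    return h_t, c_t

H, D = 4, 3
rng = np.random.default_rng(0)
h, c = np.zeros(H), np.zeros(H)
W, b = rng.normal(size=(4 * H, H + D)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)
```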
LSTM (contd)
1. Forget Gate
Step 1: Decide what details should be removed from the current input.
The first step in the LSTM is to decide which information should be omitted from the cell in that particular time step. The sigmoid function (σ) determines this. It looks at the previous state (h_{t-1}) along with the current input x_t and computes the function f_t. The function f_t decides what information from the previous state to retain and what to forget because it is not important or relevant.
Consider the following two sentences:

• Let the output of (h_{t-1}) be: "Alice is good in Physics. John, on the other hand, is good at Chemistry. He is as well an excellent football player."

• Let the current input at x_t be: "John plays football well. He told me yesterday over the phone that he had served as the captain of his college football team."

• The forget gate realizes there might be a change in context after encountering the first full stop. It compares this with the current input sentence at x_t. Since the next sentence talks about John, the information about Alice from the earlier context is forgotten.
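In the standard LSTM formulation (common notation consistent with the diagram above, not copied from the slide), the forget gate is:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

Entries of f_t near 0 erase the corresponding entries of the old cell state C_{t-1}, while entries near 1 keep them.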
LSTM (contd)
2. Input Gate
Step 2: Decide how much this unit adds to the current state.
The second layer has two parts. One is the sigmoid function and the other is the tanh function. The sigmoid function decides which values to let through (0 or 1). The tanh function gives weightage to the values that are passed, deciding their level of importance (-1 to 1).
This stage thus decides what information from the current input to retain in the cell state.

With respect to the example sentences:

Let the output of (ht-1) be “Alice is good in Physics. John, on the other
hand, is good at Chemistry.”

Let the current input at Xt be “He told me yesterday over the phone that
he had served as the captain of his college football team.”

With the current input at Xt , the input gate analyzes the important
information — John plays football, and the fact that he was the captain of
his college team is important.
“He told me yesterday over the phone” is less important; hence it might
be dropped. This process of adding controlled new information is done via the input gate.
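In the same standard notation, the input gate, the candidate values, and the resulting cell-state update are:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

The sigmoid term chooses how much of each candidate value to add, and the tanh term supplies the candidate values themselves.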
LSTM (contd)
3. Output Gate
Step 3: Decide what part of the current cell state makes it to the output.
The third step is to decide what the output will be. First, the sigmoid layer decides what parts of the cell state make it to the output. Then, the cell state is passed through tanh to push its values between -1 and 1, and the result is multiplied by the output of the sigmoid gate.
This stage thus finally decides what information from the current cell state is exposed as the output.
With respect to the example sentences:
Let's try to predict the next word in the sentence: "John played tremendously well against the opponent and won for his team. For his contributions, __???__ was awarded player of the match."

There could be other choices for the empty space too. However, since the context revolves around John, "John" is the best output after "contributions".
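In the same standard notation, the output gate and the hidden state it produces are:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$

so only the gated portion of the (squashed) cell state is exposed as the output h_t.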
LSTM (contd)
• The final LSTM network is made up of cells that are connected in a chain structure, as shown below. The cells store information, whereas the gates manipulate memory.

[Figure: LSTM cells connected in a chain]

Bidirectional LSTMs
• In Bidirectional Recurrent Neural Networks (BRNN), each training sequence is presented forwards and backwards to two independent recurrent nets, both of which are coupled to the same output layer. This means that the BRNN has comprehensive, sequential knowledge about all points before and after each point in a given sequence.
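A small sketch of this idea using PyTorch's built-in bidirectional LSTM is shown below (sizes are illustrative assumptions); the same sequence is processed forwards and backwards, and the two hidden states are concatenated at every time step.

```python
# Bidirectional LSTM: forward and backward passes over the same sequence.
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(2, 10, 8)                # (batch, time steps, features)
out, (h_n, c_n) = bilstm(x)
print(out.shape)                         # torch.Size([2, 10, 32]): forward + backward states
```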
LSTM applications
LSTM has a number of well-known applications, including:

• Image captioning
• Machine translation
• Language modeling
• Handwriting generation
• Question answering chatbots

1. Obama-RNN
Here the author used an RNN to generate hypothetical political speeches given by Barack Obama. Taking in over 4.3 MB / 730,895 words of text written by Obama's speech writers as input, the model generates multiple versions covering a wide range of topics including jobs, the war on terrorism, democracy, and China.
https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0

2. Harry Potter
Here the author trained an LSTM Recurrent Neural Network on the first 4
Harry Potter books. The model is asked to produce a chapter based on
what it learned.
https://
Deep Learning is best suited for
COMPLEX PROBLEMS
such as Image recognition, Speech
recognition, or Natural Language
Processing,
PROVIDED
you have enough data, computing power,
and PATIENCE.

Thanks
