
Unit-3: Large Language Model

- RNN, Language Attention Model
- Self Attention, Soft vs Hard Attention
- Transformers for Image Recognition
- Key, Value, Query, Encoder-Decoder
Recurrent Neural Network (RNN)
• Unlike feed-forward neural networks, in which activation outputs are propagated only in one direction, in RNNs the activation outputs from neurons propagate in both directions. This creates loops in the neural network architecture which act as a 'memory state' of the neurons. This state gives the neurons the ability to remember what has been learned so far.
• The hidden state allows the network to retain information from past inputs, making it suitable for sequential tasks. This helps the network understand the context of what has already happened and make better predictions based on that.
• For example, when predicting the next word in a sentence, the RNN uses the previous words to help decide what word is most likely to come next.

[Figure: input node, hidden node, and output node, shown for a feed-forward network and for an RNN with a feedback loop on the hidden node]
Recurrent Neural Network (contd)
• How Recurrent Neural Networks work:
Unlike feed-forward neural networks, the RNN takes an input vector X and generates an output vector y by scanning the data sequentially from left to right, updating the hidden state and producing an output at each time step. At each time step i, the hidden state w_i is computed from the current input x_i, the previous hidden state w_{i-1}, and the model parameters.
The same parameters are shared across all time steps, which allows the RNN to capture temporal dependencies and process sequential data efficiently by retaining information from previous inputs in its current hidden state.
[Figure: unrolled RNN. Inputs x_{i-1}, x_i, x_{i+1} feed hidden states w_{i-1}, w_i, w_{i+1}, connected through shared feedback weights, which produce outputs y_{i-1}, y_i, y_{i+1}]
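To make the update above concrete, here is a minimal numpy sketch of one recurrent step. The symbols follow the slide (input x_i, hidden state w_i, output y_i); the sizes and random parameters are illustrative assumptions rather than anything specified in the document.

```python
# Minimal sketch of a single RNN step (illustrative sizes and random weights).
import numpy as np

def rnn_step(x_i, w_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """Update the hidden state from the current input and the previous state."""
    w_i = np.tanh(W_xh @ x_i + W_hh @ w_prev + b_h)   # hidden state carries the memory
    y_i = W_hy @ w_i + b_y                            # output at this time step
    return w_i, y_i

D, H, O = 3, 5, 2                                     # input, hidden, output sizes (assumed)
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(H, D)), rng.normal(size=(H, H)), rng.normal(size=(O, H))
b_h, b_y = np.zeros(H), np.zeros(O)

w = np.zeros(H)                                       # initial hidden state
for x in rng.normal(size=(4, D)):                     # scan the sequence left to right
    w, y = rnn_step(x, w, W_xh, W_hh, W_hy, b_h, b_y)
```

Note that the same weight matrices are reused at every step of the loop, which is exactly the parameter sharing described above.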
Types of RNN
There are four types of Recurrent Neural Networks:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One
This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.

2. One to Many
This type of neural network has a single input and multiple outputs. An example of this is image captioning.

[Figure: one-to-one architecture (single input, single output) and one-to-many architecture (single input, multiple outputs)]
Types of RNN (contd)
3. Many to One
This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good example of this kind of network, where a given sentence can be classified as expressing positive or negative sentiment.

4. Many to Many
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example; word-to-voice conversion is another. A small sketch contrasting these two types is shown below.

[Figure: many-to-one architecture (multiple inputs, single output) and many-to-many architecture (multiple inputs, multiple outputs)]
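As a small illustration of the last two types, the sketch below (assuming PyTorch is available; sizes are arbitrary) runs one recurrent layer over a batch of sequences: a many-to-many model keeps the output at every time step, while a many-to-one model keeps only the last one.

```python
# Contrasting many-to-many and many-to-one use of the same recurrent layer.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)            # batch of 4 sequences, 10 time steps each

out, h_n = rnn(x)
many_to_many = out                   # (4, 10, 16): one output per step (e.g. translation)
many_to_one = out[:, -1, :]          # (4, 16): keep only the last step (e.g. sentiment)
print(many_to_many.shape, many_to_one.shape)
```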


Image Captioning
https://imagecaptiongenerator.com/

Image caption generation is the process of recognizing the context of an image and annotating it with relevant captions using deep learning and computer vision. This task lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework, where an input image is encoded into an intermediate representation of the information in the image and then decoded into a descriptive text sequence.

Using RNNs for image captioning involves leveraging the power of both
Convolutional Neural Networks (CNNs) for
image feature extraction and Recurrent
Neural Networks (RNNs) for generating
descriptive captions.
The process typically follows an encoder-
decoder framework: a CNN (encoder)
analyzes the image to produce a feature
representation, which is then fed into an
RNN (decoder) to generate a textual
description.
A pre-trained CNN, like VGG16 or ResNet, is
used to extract visual features from the
input image. These features represent the
image's content in a numerical format.
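A minimal sketch of this CNN-encoder / RNN-decoder pattern is shown below, assuming PyTorch and torchvision. The use of resnet18, the layer sizes, and the vocabulary size are illustrative assumptions, not the document's exact model; in practice the CNN would be loaded with pre-trained weights and frozen or fine-tuned.

```python
# Minimal sketch of a CNN-encoder + RNN-decoder image captioner (assumed sizes).
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18()                                   # in practice, use pre-trained weights
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the classification head
        self.img_proj = nn.Linear(512, embed_dim)                 # resnet18 feature size is 512
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)              # (B, 512) visual features
        img_emb = self.img_proj(feats).unsqueeze(1)           # (B, 1, E)
        word_emb = self.embed(captions)                       # (B, T, E)
        seq = torch.cat([img_emb, word_emb], dim=1)           # image acts as the first "token"
        hidden, _ = self.decoder(seq)
        return self.out(hidden)                               # (B, T+1, vocab_size) word scores

model = CaptionModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 12)))
print(logits.shape)                                           # torch.Size([2, 13, 5000])
```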
Image Captioning (contd)
During training, the image encoder and text encoder respectively produce image embeddings (I1 and I2) and text embeddings (T1 and T2) in a way that maximizes the cosine similarities between correct image-text pairs (<I1, T1> and <I2, T2>) while minimizing the cosine similarities for dissimilar pairs (the non-diagonal elements), in a contrastive fashion. The image encoder is a computer vision backbone such as a ResNet or Vision Transformer (ViT) [4], and the text encoder is a transformer [5].
[Figure: contrastive matching of image embeddings I1, I2 against text embeddings T1, T2 for the captions "A baby holding the dog" and "A woman walking on the road"; the diagonal entries I1T1 and I2T2 are the correct pairs]
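A minimal sketch of this contrastive objective (CLIP-style) is shown below, assuming PyTorch; the temperature value and embedding sizes are illustrative assumptions. The diagonal of the similarity matrix holds the matched image-text pairs, which the loss pulls together while pushing the off-diagonal pairs apart.

```python
# Sketch of a CLIP-style contrastive loss over a batch of image/text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0))           # diagonal entries are the matched pairs
    # Pull matched pairs together and push mismatched pairs apart, in both directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(2, 512), torch.randn(2, 512))
```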
Most Popular Applications of RNN
RNNs are most effective in handling sequential type of data across
different domains. In the healthcare and pharmacy sectors specifically,
RNNs have shown promise in patient care management, treatment
personalization, and predictive analytics, which are crucial for both clinical
research and drug discovery. Their ability to process sequential and time-
series data makes them invaluable in these fields.
Some of the most popular applications of RNNs include:

1. Natural Language Processing (NLP)


RNNs are effective in NLP because they process sequences of data (like
words in a sentence) one at a time while maintaining an internal state
(memory) that captures information about the sequence processed so
far. The memory helps RNNs understand context, which is crucial in
language for understanding meaning and intent. They predict the
probability of a sequence of words to generate text that is syntactically
and semantically coherent.
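For concreteness, this next-word view of language modeling corresponds to the standard chain-rule factorization (common notation, not taken from the slide):

$$P(w_1, w_2, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})$$

where, in an RNN language model, the hidden state at step t summarizes the prefix w_1, ..., w_{t-1}.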
Most Popular Applications of RNN (contd)
2. Time Series Analysis

• Time series data is inherently sequential. Recurrent Neural Networks


(RNNs) are particularly well-suited for time series analysis due to their
ability to process sequential data and capture temporal dependencies.
• One of the key challenges in time series analysis is capturing the
dependencies across time steps. For example, in stock market
prediction, the price of a stock today could be influenced by its prices
over the past several days or weeks. RNNs, with their recurrent
connections, are able to capture these long-term dependencies, making
them effective for such analysis.
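A minimal sketch of how such a series is framed for an RNN is shown below (plain numpy; the window length and the synthetic "price" series are illustrative assumptions): each training example is a window of past values and the target is the next value.

```python
# Turning a 1-D time series into (past window, next value) training pairs for an RNN.
import numpy as np

def make_windows(series, window=5):
    """Each example is `window` consecutive values; the target is the value that follows."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window])
    return np.array(X)[..., None], np.array(y)   # (N, window, 1) inputs, (N,) targets

prices = np.cumsum(np.random.randn(100))         # stand-in for daily stock prices
X, y = make_windows(prices)
print(X.shape, y.shape)                          # (95, 5, 1) (95,)
```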
3. Medical Records Analysis
• The sequential nature of data in medical records also makes it apt for RNN analysis. In medical records, the order and timing of events, such as symptoms, diagnoses, treatments, and lab results, are essential in understanding a patient's health status and predicting future outcomes.
• For instance, in predicting the risk of future diseases or readmission, an
RNN can analyze the sequence of a patient's medical history, including
past ailments, treatments, and responses. This sequential analysis allows
the RNN to capture patterns and dependencies that might be missed by
models that treat each event independently.
Limitation of RNN
• RNNs suffer from a major drawback, known as the vanishing gradient problem, which prevents them from achieving high accuracy. As the context length increases, the number of layers in the unrolled RNN also increases. Consequently, as the network becomes deeper, the gradients flowing back in the backpropagation step become smaller. As a result, learning becomes very slow, making it infeasible to learn long-term dependencies of the language.
• In simpler words, RNNs have difficulty memorizing words that appear very far back in the sequence and are only able to make predictions based on the most recent words.
• Another limitation of RNNs is that while they can theoretically use all previous context for making predictions, in practice standard RNNs tend to prioritize more recent inputs, which can limit their effectiveness for some types of problems.
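A toy illustration of the vanishing gradient effect described above: the gradient that reaches early time steps is roughly a product of per-step factors, and when those factors are smaller than 1 it shrinks exponentially. The recurrent weight of 0.5 below is an arbitrary illustrative value.

```python
# Toy demonstration of gradients vanishing as they flow back through many time steps.
w_rec = 0.5                      # recurrent weight factor (|w| < 1)
grad = 1.0                       # gradient at the final time step
for step in range(1, 51):
    grad *= w_rec                # one backpropagation step through the recurrence
    if step % 10 == 0:
        print(f"after {step} steps back: {grad:.2e}")
# After 50 steps the gradient is ~8.9e-16, so early inputs barely influence learning.
```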
Limitation of RNN (contd)
• Training RNNs can be challenging due to issues like choosing the
appropriate number of time steps to unroll the network, which
can significantly affect performance and training time
• The nature of RNNs requires processing the sequence one step at a
time, which can lead to slow computation, as it's difficult to parallelize
the operations across a sequence. This can be a bottleneck when
dealing with very long sequences or large datasets.
To address these problems, a new type of RNN called the LSTM (Long Short-Term Memory) model has been developed. LSTMs have an additional state called the 'cell state', through which the network makes adjustments in the information flow. The advantage of this state is that the model can remember or forget learnings more selectively. Because LSTMs can capture long-term dependencies in sequential data, they are well suited to tasks like language translation, speech recognition, and time series forecasting.
LSTM (contd)
LSTM is designed to handle the vanishing gradient problem. It does this by introducing a memory cell regulated by three gating mechanisms that control the flow of information through it: the input gate, the forget gate, and the output gate. The cell is made up of four neural network layers connected in a chain structure. The cells store information, whereas the gates manipulate the memory. Long Short-Term Memory networks otherwise inherit the same architecture as standard RNNs, with the exception of the hidden state.
[Figure: LSTM cell. The cell state C_{t-1} to C_t runs along the top as the memory line, regulated by (1) a forget gate σ, (2) an input gate σ with a tanh candidate layer, and (3) an output gate σ; the previous hidden state h_{t-1} and input x_t enter at the bottom and the new hidden state h_t is produced as output]
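The three gates sketched in the diagram can be put together in a single update. Below is a minimal numpy sketch of one LSTM cell step in the standard formulation; the shapes, the random parameters, and the packing of all four projections into one matrix are illustrative assumptions, not the document's exact model.

```python
# Minimal sketch of one LSTM cell step (illustrative sizes and random weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One time step: all gates are computed from [h_{t-1}, x_t]."""
    z = W @ np.concatenate([h_prev, x_t]) + b        # all four projections at once
    H = h_prev.size
    f_t = sigmoid(z[0:H])                            # forget gate
    i_t = sigmoid(z[H:2*H])                          # input gate
    c_hat = np.tanh(z[2*H:3*H])                      # candidate cell values
    o_t = sigmoid(z[3*H:4*H])                        # output gate
    c_t = f_t * c_prev + i_t * c_hat                 # update the cell state
    h_t = o_t * np.tanh(c_t)                         # expose part of it as the output
    return h_t, c_t

H, D = 4, 3
rng = np.random.default_rng(0)
h, c = np.zeros(H), np.zeros(H)
W, b = rng.normal(size=(4 * H, H + D)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)
```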
LSTM (contd)
1. Forget Gate
Step 1: Decide what details should be removed from the current input.
The first step in the LSTM is to decide which information should be omitted from the cell in that particular time step. The sigmoid function (σ) determines this. It looks at the previous state (h_{t-1}) along with the current input x_t and computes the function f_t. The function f_t decides what information from the previous state to retain and what to forget because it is not important or relevant.
Consider the following two sentences:

• Let the output of (h_{t-1}) be: "Alice is good in Physics. John, on the other hand, is good at Chemistry. He is as well an excellent football player."

• Let the current input at x_t be: "John plays football well. He told me yesterday over the phone that he had served as the captain of his college football team."

• The forget gate realizes there might be a change in context after encountering the first full stop. It compares this with the current input sentence at x_t. Since the next sentence talks about John, the information about Alice from the earlier context is forgotten.
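In the standard LSTM formulation (common notation consistent with the diagram above, not copied from the slide), the forget gate is:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

Entries of f_t near 0 erase the corresponding entries of the old cell state C_{t-1}, while entries near 1 keep them.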
LSTM (contd)
2. Input Gate
Step 2: Decide how much this unit adds to the current state.
The second layer has two parts. One is the sigmoid function and the other is the tanh function. The sigmoid function decides which values to let through (0 or 1). The tanh function gives weightage to the values that are passed, deciding their level of importance (-1 to 1).
This stage thus decides what information from the current input to retain in the cell state.

With respect to the example sentences:

Let the output of (ht-1) be “Alice is good in Physics. John, on the other
hand, is good at Chemistry.”

Let the current input at Xt be “He told me yesterday over the phone that
he had served as the captain of his college football team.”

With the current input at Xt , the input gate analyzes the important
information — John plays football, and the fact that he was the captain of
his college team is important.
“He told me yesterday over the phone” is less important; hence it might
be dropped. This process of adding controlled new information is done via the input gate.
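In the same standard notation, the input gate, the candidate values, and the resulting cell-state update are:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

The sigmoid term chooses how much of each candidate value to add, and the tanh term supplies the candidate values themselves.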
LSTM (contd)
3. Output Gate
Step 3: Decide what part of the current cell state makes it to the output.
The third step is to decide what the output will be. First, the sigmoid layer decides what parts of the cell state make it to the output. Then, the cell state is passed through tanh to push its values between -1 and 1, and the result is multiplied by the output of the sigmoid gate.
This stage thus finally decides what information from the current cell state is exposed as the output.
With respect to the example sentences:
Let's try to predict the next word in the sentence: "John played tremendously well against the opponent and won for his team. For his contributions, __???__ was awarded player of the match."

There could be other choices for the empty space too. However, since the context revolves around John, "John" is the best output after "contributions".
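In the same standard notation, the output gate and the hidden state it produces are:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$

so only the gated portion of the (squashed) cell state is exposed as the output h_t.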
LSTM (contd)
• The final LSTM network is made up of cells that are connected in a chain structure, as shown below. The cells store information, whereas the gates manipulate memory.

[Figure: LSTM cells connected in a chain]

Bidirectional LSTMs
• In Bidirectional Recurrent Neural Networks (BRNN), each training sequence is presented forwards and backwards to two independent recurrent nets, both of which are coupled to the same output layer. This means that the BRNN has comprehensive, sequential knowledge about all points before and after each point in a given sequence.
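A small sketch of this idea using PyTorch's built-in bidirectional LSTM is shown below (sizes are illustrative assumptions); the same sequence is processed forwards and backwards, and the two hidden states are concatenated at every time step.

```python
# Bidirectional LSTM: forward and backward passes over the same sequence.
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(2, 10, 8)                # (batch, time steps, features)
out, (h_n, c_n) = bilstm(x)
print(out.shape)                         # torch.Size([2, 10, 32]): forward + backward states
```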
LSTM applications
LSTM has a number of well-known applications, including:

• Image captioning
• Machine translation
• Language modeling
• Handwriting generation
• Question answering chatbots

1. Obama-RNN
Here the author used an RNN to generate hypothetical political speeches given by Barack Obama. Taking in over 4.3 MB / 730,895 words of text written by Obama's speech writers as input, the model generates multiple versions covering a wide range of topics including jobs, the war on terrorism, democracy, and China.
https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0

2. Harry Potter
Here the author trained an LSTM Recurrent Neural Network on the first 4
Harry Potter books. The model is asked to produce a chapter based on
what it learned.
https://
Deep Learning is best suited for
COMPLEX PROBLEMS
such as Image recognition, Speech
recognition, or Natural Language
Processing,
PROVIDED
you have enough data, computing power,
and PATIENCE.

Thanks
