Generative AI Unit 3 Notes
Transformer
Transformers: Introduction
• Definition: Transformers are a revolutionary deep learning architecture introduced by Vaswani et al.
in the 2017 paper "Attention Is All You Need". They were primarily designed for tasks involving sequential
data like natural language processing (NLP). Unlike traditional sequential models like Recurrent
Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks, transformers can process
entire input sequences at once rather than sequentially.
• Key Innovation: The key innovation in transformers is the self-attention mechanism, which allows
the model to weigh the importance of different words (or tokens) within the input sequence. This is
done without regard to the word's position, thus overcoming the limitation of RNNs where long-range
dependencies (e.g., relationships between words far apart in a sentence) are difficult to capture.
Applications of Transformers
• Natural Language Processing (NLP): Transformers have been applied extensively in NLP tasks.
They have enabled breakthroughs in tasks like:
o Machine Translation: Translating text from one language to another (e.g., Google
Translate).
o Text Summarization: Creating concise summaries of longer documents.
o Sentiment Analysis: Determining the sentiment (positive, negative, neutral) in a text.
o Question Answering: Systems like BERT (Bidirectional Encoder Representations from
Transformers) can answer questions from given texts by understanding context.
o Language Modeling: Predicting the next word or phrase in a sequence (e.g., GPT-3, used for
generating human-like text).
• Computer Vision:
o Recently, transformers have been adapted for vision tasks, most notably with Vision
Transformers (ViTs). These models treat image patches as sequences of tokens, similar to
words in a text. They have shown competitive performance on image classification tasks.
o Applications include object detection, image segmentation, and even video understanding.
• Speech Processing:
o Transformers are used for tasks like automatic speech recognition (ASR), converting
spoken language into text, and speech synthesis, generating spoken language from text (e.g.,
text-to-speech models).
• Reinforcement Learning:
o Transformers have been applied in reinforcement learning for decision-making tasks. They
are particularly useful in tasks requiring long-term planning and memory of past states, for example
game-playing agents such as DeepMind's AlphaStar.
• Multimodal Learning:
o Multimodal transformers combine information from multiple data types, such as text, images,
and sound, to perform tasks that require an understanding of different modalities. For
example, models like DALL·E and CLIP use text to generate or interpret images.
Transformer Encoder
Self-Attention Mechanisms
• Purpose: Self-attention helps the model determine which words in the input sequence are important
to each other, regardless of their distance apart. This allows the model to handle long-range
dependencies, which is difficult for traditional sequential models like RNNs.
• How it Works:
1. Input Embeddings: Each word in the sequence is converted into a vector (embedding)
representing its meaning.
2. Query, Key, and Value Vectors: These embeddings are then projected into three different
vectors: Query (Q), Key (K), and Value (V).
3. Attention Scores: The attention scores between the words are calculated by taking the dot
product of the Query and Key vectors and scaling it by the square root of the key dimension.
This results in a matrix of scores that represent how much focus each word should place on
every other word.
4. Softmax Normalization: These attention scores are passed through a softmax function to
normalize them, ensuring that they sum to 1.
5. Weighted Values: The final output of the self-attention mechanism is a weighted sum of the
Value vectors, weighted by the attention scores.
• Benefits: Self-attention enables the model to capture relationships between words, regardless of their
position in the sentence. This is in contrast to RNNs, which have difficulty capturing long-range
dependencies because they process the sentence sequentially.
• Self-Attention: A single attention head computes the attention scores between each word and every
other word in the sequence. However, a single attention head might focus too much on certain parts
of the sequence and ignore others, limiting its capacity to capture complex relationships.
• Multi-Head Attention:
o Multi-head attention extends self-attention by using multiple attention heads. Each head
learns to focus on different parts of the sequence. The outputs from all heads are concatenated
and transformed by a feedforward neural network.
o Benefit: By attending to different parts of the sequence simultaneously, multi-head attention
helps the model learn a richer representation of the input, capturing more subtle relationships
and patterns (see the code sketch after this list).
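A minimal NumPy sketch of the steps above, assuming toy dimensions (4 tokens, model dimension 8, 2 heads) and random, untrained weights; the function and variable names are illustrative, not taken from any particular library:

import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Steps 3-5 above: dot-product scores (scaled by sqrt(d_k)), softmax, weighted sum of values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # weighted sum of the value vectors

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # Project the embeddings into per-head Q, K, V, attend within each head,
    # then concatenate the heads and apply the output projection W_o.
    d_model = X.shape[-1]
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        Q, K, V = X @ W_q[:, cols], X @ W_k[:, cols], X @ W_v[:, cols]
        heads.append(scaled_dot_product_attention(Q, K, V))
    return np.concatenate(heads, axis=-1) @ W_o

# Toy example: 4 tokens, model dimension 8, 2 attention heads, random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # token embeddings (with positions added)
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=2).shape)   # (4, 8)

Scaling by the square root of the key dimension keeps the dot products from growing too large, which would otherwise push the softmax into regions with very small gradients.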
Positional Encoding in Transformers
• Challenge: Unlike RNNs and CNNs, transformers process inputs in parallel rather than sequentially.
While this is efficient, it introduces a problem: the model has no sense of word order, which is
crucial in language tasks.
• Solution: Positional encodings are added to the input embeddings to give the model a sense of
position. These encodings are learned or designed functions that provide unique values to each
position in the input sequence.
• How Positional Encoding Works:
o The most common positional encodings are sinusoidal functions of different frequencies,
defined as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)),
which ensure that each position has a unique encoding that can be easily interpreted by the model.
o These encodings are added to the input embeddings before feeding them into the self-attention
mechanism (see the code sketch below).
• Benefit: Positional encodings provide the transformer with information about the relative positions
of the words in the sequence, allowing it to make use of word order when processing the input.
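A short NumPy sketch of the sinusoidal encoding described above, assuming an even model dimension; the function name is illustrative:

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]             # (1, d_model / 2), i.e. 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # even dimensions
    pe[:, 1::2] = np.cos(angles)                         # odd dimensions
    return pe

# The encodings are simply added to the token embeddings, element-wise.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))                     # 4 tokens, d_model = 8
inputs = embeddings + sinusoidal_positional_encoding(4, 8)

Because the frequencies vary across dimensions, nearby positions receive similar but distinguishable encodings, which helps the model reason about relative word order.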
Large Language Models (LLMs)
• Definition: Large Language Models (LLMs) are deep learning models that have been trained on vast
amounts of text data and fine-tuned for specific tasks like translation, summarization, or question answering.
They are based on the transformer architecture.
Considering just the growth in size of the GPT (Generative Pre-trained Transformer) family:
• GPT-1, released in 2018, contains 117 million parameters and was trained on roughly 985 million words.
• GPT-3, released in 2020, contains 175 billion parameters; ChatGPT is also based on this family of models.
• GPT-4, released in 2023, is believed to contain on the order of trillions of parameters, although OpenAI
has not disclosed the exact figure.
Large Language Models (LLMs) operate on the principles of deep learning, leveraging neural network
architectures to process and understand human languages.
These models are trained on vast datasets using self-supervised learning techniques. The core of their
functionality lies in the intricate patterns and relationships they learn from diverse language data during
training. LLMs consist of multiple layers, including embedding layers, attention layers, and feedforward
layers. They employ attention mechanisms, like self-attention, to weigh the importance of different tokens in
a sequence, allowing the model to capture dependencies and relationships.
Architecture of LLM
A Large Language Model's (LLM) architecture is determined by a number of factors, such as the objective of
the specific model design, the available computational resources, and the kind of language processing tasks
the LLM is meant to carry out. The general architecture consists of many layers, such as embedding layers,
attention layers, and feedforward layers: the input text is first embedded, and those embeddings are combined
and transformed layer by layer to generate predictions (a toy sketch of this flow follows the list below). The
key design considerations include:
• Input Representations
• Self-Attention Mechanisms
• Training Objectives
• Computational Efficiency
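A toy, untrained sketch of how these layers fit together to turn embedded text into a next-token prediction. It uses a single attention head and omits layer normalization, causal masking, and the stacking of many blocks; all names and dimensions are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 50, 8, 4

# Embedding layer: map token ids to dense vectors, plus a simple positional signal.
token_ids = rng.integers(0, vocab_size, size=seq_len)
embedding_table = rng.normal(size=(vocab_size, d_model))
x = embedding_table[token_ids] + 0.01 * np.arange(seq_len)[:, None]

# Attention layer (a single head here, for brevity), with a residual connection.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
scores = (x @ W_q) @ (x @ W_k).T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
x = x + weights @ (x @ W_v)

# Position-wise feedforward layer, also with a residual connection.
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
x = x + np.maximum(0.0, x @ W1) @ W2                    # ReLU non-linearity

# Output head: project back onto the vocabulary to score the next token.
logits = x @ embedding_table.T                          # (seq_len, vocab_size)
print("predicted next-token id:", int(logits[-1].argmax()))

In a real LLM this block is repeated many times with learned weights, layer normalization, multiple heads, and causal masking so that each position can only attend to earlier tokens.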
The main reason behind the excitement around LLMs is the wide variety of tasks they can accomplish.
From the introduction and technical details above, it should be clear that ChatGPT is itself an LLM, so let
us use it to describe the use cases of Large Language Models.
• Code Generation – One of the most striking use cases is that the model can generate reasonably
accurate code for a task described by the user in natural language.
• Debugging and Documentation of Code – If you are struggling to debug a piece of code, ChatGPT can
point out the lines that are causing issues along with suggested fixes. It can also write the documentation
for your project, saving hours of manual effort.
• Question Answering – Just as people asked AI-powered personal assistants all sorts of playful questions
when they were first released, you can do the same here, alongside genuine questions.
• Language Translation – It can convert a piece of text from one language to another, supporting more
than 50 languages, and can also help you correct grammatical mistakes in your content.
Use cases of LLMs are not limited to the ones mentioned above; with well-crafted prompts these models
can perform a wide variety of tasks, since they also work in one-shot and zero-shot learning settings.
Because of this, Prompt Engineering has become a new and active topic for people who want to use
ChatGPT-style models extensively.
Chatbots and Virtual Assistants
1. Large language models power advanced chatbots capable of engaging in natural conversations.
2. They can be used to create intelligent virtual assistants for tasks like scheduling, reminders, and
information retrieval.
Content Generation
1. Creating human-like text for various purposes, including content creation, creative writing, and
storytelling.
Language Translation
1. Large language models can aid in translating text between different languages with improved
accuracy and fluency.
Text Summarization
1. Condensing long documents, articles, and reports into concise summaries.
Sentiment Analysis
1. Analyzing and understanding sentiments expressed in social media posts, reviews, and
comments.
• Examples:
o BERT (Bidirectional Encoder Representations from Transformers): A model that reads
text bidirectionally and has been highly successful in tasks like question answering.
o GPT-3 (Generative Pre-trained Transformer 3): A large model capable of generating
human-like text. It’s been used for everything from content generation to coding assistance.
o T5 (Text-to-Text Transfer Transformer): A model that treats every NLP task as a text-to-
text problem, making it highly flexible. (A sketch of loading such pre-trained models follows this list.)
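A minimal sketch of loading and running such pre-trained models, here with the Hugging Face transformers library; the pipeline tasks and checkpoint names are illustrative choices, not the only options:

# pip install transformers
from transformers import pipeline

# BERT-style model: fill in a masked word using bidirectional context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Transformers process the whole [MASK] at once.")[0]["token_str"])

# GPT-style model: generate a continuation of a prompt, token by token.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=20)[0]["generated_text"])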
• Advances:
o Scalability: LLMs are highly scalable and continue to show performance gains as the number
of parameters and the amount of training data increase.
o Transfer Learning: These models can be pre-trained on massive datasets and then fine-
tuned on specific tasks, greatly reducing the amount of task-specific data required (a
fine-tuning sketch follows below).
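A hedged sketch of the pre-train-then-fine-tune workflow, here using the Hugging Face transformers and datasets libraries; the checkpoint, dataset, and hyperparameters are illustrative assumptions, not prescriptions:

# pip install transformers datasets
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a pre-trained checkpoint and fine-tune it on a small labelled dataset.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")                     # binary sentiment-analysis corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetuned-bert",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)))
trainer.train()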
There is little doubt about the future capabilities of LLMs, and this technology already underpins many
AI-powered applications used daily by large numbers of users. But LLMs have some drawbacks as well:
• Successfully training a large language model requires millions of dollars to set up the computing
infrastructure capable of training the model with massive parallelism.
• It requires months of training, followed by human-in-the-loop fine-tuning to achieve better
performance.
• Acquiring a large enough text corpus can itself be a challenge; ChatGPT, for example, has been
accused of being trained on data scraped without permission and of building a commercial application
on top of it.
• In the era of global warming and climate change, we cannot ignore the carbon footprint of an LLM:
it has been estimated that training a single AI model from scratch produces a carbon footprint equal to
that of five cars over their entire lifetimes, which is a serious concern.
ChatGPT
• Overview: ChatGPT is a conversational AI model based on the GPT series developed by OpenAI.
It’s designed to generate human-like responses in a dialogue format by understanding the context of
the conversation.
• Capabilities:
o Natural Conversations: ChatGPT can engage in natural, flowing conversations, maintaining
context over multiple turns of dialogue.
o Question Answering: It can answer factual questions or provide opinions on a wide variety
of topics.
o Task Assistance: ChatGPT can assist with tasks such as content generation, coding, solving
mathematical problems, and even creative writing (a minimal API sketch follows this list).
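A minimal sketch of calling a ChatGPT-style model programmatically, assuming OpenAI's current Python SDK; the model name and prompts are illustrative, and an API key must be available in the OPENAI_API_KEY environment variable:

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Send a short multi-turn conversation; the model replies in context.
response = client.chat.completions.create(
    model="gpt-4o-mini",                      # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful tutoring assistant."},
        {"role": "user", "content": "Explain self-attention in two sentences."},
    ],
)
print(response.choices[0].message.content)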
• Applications:
o Customer Support: Chatbots powered by GPT can handle customer inquiries, providing
information, troubleshooting, and guidance without human intervention.
o Virtual Assistants: ChatGPT-like models are used in virtual assistants to interact with users,
provide information, and automate tasks.
o Content Creation: These models can generate articles, summaries, and other written content
based on prompts.
o Tutoring and Learning: ChatGPT can be used in educational tools to assist with learning by
explaining concepts, answering questions, or providing study guidance.