Generative AI Unit 3 Notes
Transformer
Transformers: Introduction
• Definition: Transformers are a revolutionary deep learning architecture introduced by Vaswani et al.
in the 2017 paper "Attention Is All You Need". They were primarily designed for tasks involving sequential
data like natural language processing (NLP). Unlike traditional sequential models like Recurrent
Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks, transformers can process
entire input sequences at once rather than sequentially.
• Key Innovation: The key innovation in transformers is the self-attention mechanism, which allows
the model to weigh the importance of different words (or tokens) within the input sequence. This is
done without regard to the word's position, thus overcoming the limitation of RNNs where long-range
dependencies (e.g., relationships between words far apart in a sentence) are difficult to capture.
Applications of Transformers
• Natural Language Processing (NLP): Transformers have been applied extensively in NLP tasks.
They have enabled breakthroughs in tasks like:
o Machine Translation: Translating text from one language to another (e.g., Google
Translate).
o Text Summarization: Creating concise summaries of longer documents.
o Sentiment Analysis: Determining the sentiment (positive, negative, neutral) in a text.
o Question Answering: Systems like BERT (Bidirectional Encoder Representations from
Transformers) can answer questions from given texts by understanding context.
o Language Modeling: Predicting the next word or phrase in a sequence (e.g., GPT-3, used for
generating human-like text).
• Computer Vision:
o Recently, transformers have been adapted for vision tasks, most notably with Vision
Transformers (ViTs). These models treat image patches as sequences of tokens, similar to
words in a text. They have shown competitive performance on image classification tasks.
o Applications include object detection, image segmentation, and even video understanding.
• Speech Processing:
o Transformers are used for tasks like automatic speech recognition (ASR), converting
spoken language into text, and speech synthesis, generating spoken language from text (e.g.,
text-to-speech models).
• Reinforcement Learning:
o Transformers have been applied in reinforcement learning for decision-making tasks. They
are particularly useful in tasks requiring long-term planning and memory of past states, for example
game-playing agents such as DeepMind's AlphaStar.
• Multimodal Learning:
o Multimodal transformers combine information from multiple data types, such as text, images,
and sound, to perform tasks that require an understanding of different modalities. For
example, models like DALL·E and CLIP use text to generate or interpret images.
Transformer Encoder
Self-Attention Mechanisms
• Purpose: Self-attention helps the model determine which words in the input sequence are important
to each other, regardless of their distance apart. This allows the model to handle long-range
dependencies, which is difficult for traditional sequential models like RNNs.
• How it Works:
1. Input Embeddings: Each word in the sequence is converted into a vector (embedding)
representing its meaning.
2. Query, Key, and Value Vectors: These embeddings are then projected into three different
vectors: Query (Q), Key (K), and Value (V).
3. Attention Scores: The attention scores between the words are calculated by taking the dot
product of the Query and Key vectors and scaling it by the square root of the key dimension.
This results in a matrix of scores that represent how much focus each word should place on
every other word.
4. Softmax Normalization: These attention scores are passed through a softmax function to
normalize them, ensuring that they sum to 1.
5. Weighted Values: The final output of the self-attention mechanism is a weighted sum of the
Value vectors, weighted by the attention scores.
• Benefits: Self-attention enables the model to capture relationships between words, regardless of their
position in the sentence. This is in contrast to RNNs, which have difficulty capturing long-range
dependencies because they process the sentence sequentially.
• Self-Attention: A single attention head computes the attention scores between each word and every
other word in the sequence. However, a single attention head might focus too much on certain parts
of the sequence and ignore others, limiting its capacity to capture complex relationships.
• Multi-Head Attention:
o Multi-head attention extends self-attention by using multiple attention heads. Each head
learns to focus on different parts of the sequence. The outputs from all heads are concatenated
and transformed by a feedforward neural network.
o Benefit: By attending to different parts of the sequence simultaneously, multi-head attention
helps the model learn a richer representation of the input, capturing more subtle relationships
and patterns (see the code sketch after this list).
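A minimal NumPy sketch of the steps above, assuming toy dimensions (4 tokens, model dimension 8, 2 heads) and random, untrained weights; the function and variable names are illustrative, not taken from any particular library:

import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Steps 3-5 above: dot-product scores (scaled by sqrt(d_k)), softmax, weighted sum of values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # weighted sum of the value vectors

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # Project the embeddings into per-head Q, K, V, attend within each head,
    # then concatenate the heads and apply the output projection W_o.
    d_model = X.shape[-1]
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        Q, K, V = X @ W_q[:, cols], X @ W_k[:, cols], X @ W_v[:, cols]
        heads.append(scaled_dot_product_attention(Q, K, V))
    return np.concatenate(heads, axis=-1) @ W_o

# Toy example: 4 tokens, model dimension 8, 2 attention heads, random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # token embeddings (with positions added)
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=2).shape)   # (4, 8)

Scaling by the square root of the key dimension keeps the dot products from growing too large, which would otherwise push the softmax into regions with very small gradients.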
Positional Encoding in Transformers
• Challenge: Unlike RNNs and CNNs, transformers process inputs in parallel rather than sequentially.
While this is efficient, it introduces a problem: the model has no sense of word order, which is
crucial in language tasks.
• Solution: Positional encodings are added to the input embeddings to give the model a sense of
position. These encodings are learned or designed functions that provide unique values to each
position in the input sequence.
• How Positional Encoding Works:
o The most common positional encodings are sinusoidal functions of different frequencies,
defined as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)),
which ensure that each position has a unique encoding that can be easily interpreted by the model.
o These encodings are added to the input embeddings before feeding them into the self-attention
mechanism (see the code sketch below).
• Benefit: Positional encodings provide the transformer with information about the relative positions
of the words in the sequence, allowing it to make use of word order when processing the input.
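A short NumPy sketch of the sinusoidal encoding described above, assuming an even model dimension; the function name is illustrative:

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]             # (1, d_model / 2), i.e. 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # even dimensions
    pe[:, 1::2] = np.cos(angles)                         # odd dimensions
    return pe

# The encodings are simply added to the token embeddings, element-wise.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))                     # 4 tokens, d_model = 8
inputs = embeddings + sinusoidal_positional_encoding(4, 8)

Because the frequencies vary across dimensions, nearby positions receive similar but distinguishable encodings, which helps the model reason about relative word order.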
Large Language Models (LLMs)
• Definition: Large Language Models (LLMs) are deep learning models that have been trained on vast
amounts of text data and fine-tuned for specific tasks like translation, summarization, or question answering.
They are based on the transformer architecture.
Considering just the growth in size of the GPT (Generative Pre-trained Transformer) family:
• GPT-1, released in 2018, contains 117 million parameters and was trained on roughly 985 million words.
• GPT-3, released in 2020, contains 175 billion parameters; ChatGPT is also based on this family of models.
• GPT-4, released in 2023, is believed to contain on the order of trillions of parameters, although OpenAI
has not disclosed the exact figure.
Large Language Models (LLMs) operate on the principles of deep learning, leveraging neural network
architectures to process and understand human languages.
These models are trained on vast datasets using self-supervised learning techniques. The core of their
functionality lies in the intricate patterns and relationships they learn from diverse language data during
training. LLMs consist of multiple layers, including embedding layers, attention layers, and feedforward
layers. They employ attention mechanisms, like self-attention, to weigh the importance of different tokens in
a sequence, allowing the model to capture dependencies and relationships.
Architecture of LLM
A Large Language Model's (LLM) architecture is determined by a number of factors, such as the objective of
the specific model design, the available computational resources, and the kind of language processing tasks
the LLM is meant to carry out. The general architecture consists of many layers, such as embedding layers,
attention layers, and feedforward layers: the input text is first embedded, and those embeddings are combined
and transformed layer by layer to generate predictions (a toy sketch of this flow follows the list below). The
key design considerations include:
• Input Representations
• Self-Attention Mechanisms
• Training Objectives
• Computational Efficiency
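A toy, untrained sketch of how these layers fit together to turn embedded text into a next-token prediction. It uses a single attention head and omits layer normalization, causal masking, and the stacking of many blocks; all names and dimensions are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 50, 8, 4

# Embedding layer: map token ids to dense vectors, plus a simple positional signal.
token_ids = rng.integers(0, vocab_size, size=seq_len)
embedding_table = rng.normal(size=(vocab_size, d_model))
x = embedding_table[token_ids] + 0.01 * np.arange(seq_len)[:, None]

# Attention layer (a single head here, for brevity), with a residual connection.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
scores = (x @ W_q) @ (x @ W_k).T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
x = x + weights @ (x @ W_v)

# Position-wise feedforward layer, also with a residual connection.
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
x = x + np.maximum(0.0, x @ W1) @ W2                    # ReLU non-linearity

# Output head: project back onto the vocabulary to score the next token.
logits = x @ embedding_table.T                          # (seq_len, vocab_size)
print("predicted next-token id:", int(logits[-1].argmax()))

In a real LLM this block is repeated many times with learned weights, layer normalization, multiple heads, and causal masking so that each position can only attend to earlier tokens.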
The main reason behind the excitement around LLMs is the wide variety of tasks they can accomplish.
From the introduction and technical details above, it should be clear that ChatGPT is itself an LLM, so let
us use it to describe the use cases of Large Language Models.
• Code Generation – One of the most striking use cases is that the model can generate reasonably
accurate code for a task described by the user in natural language.
• Debugging and Documentation of Code – If you are struggling to debug a piece of code, ChatGPT can
point out the lines that are causing issues along with suggested fixes. It can also write the documentation
for your project, saving hours of manual effort.
• Question Answering – Just as people asked AI-powered personal assistants all sorts of playful questions
when they were first released, you can do the same here, alongside genuine questions.
• Language Translation – It can convert a piece of text from one language to another, supporting more
than 50 languages, and can also help you correct grammatical mistakes in your content.
Use cases of LLMs are not limited to the ones mentioned above; with well-crafted prompts these models
can perform a wide variety of tasks, since they also work in one-shot and zero-shot learning settings.
Because of this, Prompt Engineering has become a new and active topic for people who want to use
ChatGPT-style models extensively.
Chatbots and Virtual Assistants
1. Large language models power advanced chatbots capable of engaging in natural conversations.
2. They can be used to create intelligent virtual assistants for tasks like scheduling, reminders, and
information retrieval.
Content Generation
1. Creating human-like text for various purposes, including content creation, creative writing, and
storytelling.
Language Translation
1. Large language models can aid in translating text between different languages with improved
accuracy and fluency.
Text Summarization
1. Condensing long documents, articles, and reports into concise summaries.
Sentiment Analysis
1. Analyzing and understanding sentiments expressed in social media posts, reviews, and
comments.
• Examples:
o BERT (Bidirectional Encoder Representations from Transformers): A model that reads
text bidirectionally and has been highly successful in tasks like question answering.
o GPT-3 (Generative Pre-trained Transformer 3): A large model capable of generating
human-like text. It’s been used for everything from content generation to coding assistance.
o T5 (Text-to-Text Transfer Transformer): A model that treats every NLP task as a text-to-
text problem, making it highly flexible. (A sketch of loading such pre-trained models follows this list.)
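A minimal sketch of loading and running such pre-trained models, here with the Hugging Face transformers library; the pipeline tasks and checkpoint names are illustrative choices, not the only options:

# pip install transformers
from transformers import pipeline

# BERT-style model: fill in a masked word using bidirectional context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Transformers process the whole [MASK] at once.")[0]["token_str"])

# GPT-style model: generate a continuation of a prompt, token by token.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=20)[0]["generated_text"])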
• Advances:
o Scalability: LLMs are highly scalable and continue to show performance gains as the number
of parameters and the amount of training data increase.
o Transfer Learning: These models can be pre-trained on massive datasets and then fine-
tuned on specific tasks, greatly reducing the amount of task-specific data required (a
fine-tuning sketch follows below).
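A hedged sketch of the pre-train-then-fine-tune workflow, here using the Hugging Face transformers and datasets libraries; the checkpoint, dataset, and hyperparameters are illustrative assumptions, not prescriptions:

# pip install transformers datasets
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a pre-trained checkpoint and fine-tune it on a small labelled dataset.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")                     # binary sentiment-analysis corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetuned-bert",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)))
trainer.train()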
There is little doubt about the future capabilities of LLMs, and this technology already underpins many
AI-powered applications used daily by large numbers of users. But LLMs have some drawbacks as well:
• Successfully training a large language model requires millions of dollars to set up the computing
infrastructure capable of training the model with massive parallelism.
• It requires months of training, followed by human-in-the-loop fine-tuning to achieve better
performance.
• Acquiring a large enough text corpus can itself be a challenge; ChatGPT, for example, has been
accused of being trained on data scraped without permission and of building a commercial application
on top of it.
• In the era of global warming and climate change, we cannot ignore the carbon footprint of an LLM:
it has been estimated that training a single AI model from scratch produces a carbon footprint equal to
that of five cars over their entire lifetimes, which is a serious concern.
ChatGPT
• Overview: ChatGPT is a conversational AI model based on the GPT series developed by OpenAI.
It’s designed to generate human-like responses in a dialogue format by understanding the context of
the conversation.
• Capabilities:
o Natural Conversations: ChatGPT can engage in natural, flowing conversations, maintaining
context over multiple turns of dialogue.
o Question Answering: It can answer factual questions or provide opinions on a wide variety
of topics.
o Task Assistance: ChatGPT can assist with tasks such as content generation, coding, solving
mathematical problems, and even creative writing (a minimal API sketch follows this list).
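A minimal sketch of calling a ChatGPT-style model programmatically, assuming OpenAI's current Python SDK; the model name and prompts are illustrative, and an API key must be available in the OPENAI_API_KEY environment variable:

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Send a short multi-turn conversation; the model replies in context.
response = client.chat.completions.create(
    model="gpt-4o-mini",                      # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful tutoring assistant."},
        {"role": "user", "content": "Explain self-attention in two sentences."},
    ],
)
print(response.choices[0].message.content)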
• Applications:
o Customer Support: Chatbots powered by GPT can handle customer inquiries, providing
information, troubleshooting, and guidance without human intervention.
o Virtual Assistants: ChatGPT-like models are used in virtual assistants to interact with users,
provide information, and automate tasks.
o Content Creation: These models can generate articles, summaries, and other written content
based on prompts.
o Tutoring and Learning: ChatGPT can be used in educational tools to assist with learning by
explaining concepts, answering questions, or providing study guidance.