0% found this document useful (0 votes)
2 views130 pages

Gen-AI-Module1

Generative AI encompasses algorithms that create new content across various mediums, such as text, images, and audio, mimicking the training data. It includes large language models like GPT and applications in diverse fields, including text-to-image and text-to-speech generation. The evolution of generative AI has seen significant milestones from early theoretical foundations to modern advancements like DALL-E and GPT-3, highlighting its growing impact and ethical considerations.

Uploaded by

susheeth1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views130 pages

Gen-AI-Module1

Generative AI encompasses algorithms that create new content across various mediums, such as text, images, and audio, mimicking the training data. It includes large language models like GPT and applications in diverse fields, including text-to-image and text-to-speech generation. The evolution of generative AI has seen significant milestones from early theoretical foundations to modern advancements like DALL-E and GPT-3, highlighting its growing impact and ethical considerations.

Uploaded by

susheeth1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 130

Generative AI

Subject Code & Subject Name : CSE3348 Generative AI


Students: School of Computer Science and Engineering

Faculty Name : Dr J Alamelu Mangai


Designation : Professor
Department : CSE

Section Name Slide Number


What is GenAI?
• Generative AI refers to a set of algorithms that can generate new
content in any medium such as image, text, audio or video.
• This generated content is similar to the content that the algorithm is
trained on.
• A prominent type of generative AI is the large language model (LLM),
which generates natural language texts based on prompts.
• GPT (Generative Pre-trained Transformer) series is a well-known
example of generative AI.
• ChatGPT is a renowned example of LLMs.

Ranjitha P-20213CSE0014 2
Ranjitha P-20213CSE0014 3
What is GenAI?
• GenAI :
• algorithms that generate novel content
• unlike traditional predictive ML, they do
not analyse or act on the existing data

• GenAI models have the ability to generate


text, images and other creative content
indistinguishable from human-generated
content.

Ranjitha P-20213CSE0014 4
Generative Vs. Discriminative Modeling [T2 Pg 1 -
5
• Discriminative Modeling is like supervised learning

Ranjitha P-20213CSE0014 5
What is Generative modeling? [T2 pg. 1 – 5]
• A generative model describes how a data set is generated in terms of
a probabilistic model.
• By sampling this model, new data can be generated.

Ranjitha P-20213CSE0014 6
What is Generative modelling?
• Any generative modeling process has:
• A training data : examples of the entity the model has to generate.
• Observation : one of the examples from the training data
• Each Observation is defined using many features.
• Ex: Image of a horse has individual pixel values as features
• A generative model has to be probabilistic and not deterministic.
• The model should have some randomness that influences the sample
generated every time by the model.
• The model has to identify the unknown prob. distribution that
justifies/distinguishes the images present in the training data from
those not in the training set.

Ranjitha P-20213CSE0014 7
• If the model mimics this distribution, by sampling it can generate new
observations that look realistic.
• Discriminative modelling is done on a labelled data.
• Generative modeling is usually done on an unlabelled data (like
unsupervised learning)
• It can also be used to generate samples of a distinct class in the
training data.

Ranjitha P-20213CSE0014 8
Generative Modeling projects - Examples
• StyleGan by NVIDIA – generates hyper-realistic images of human
faces.

• GPT by OpenAI : given a short introductory passage, the model


completes the given passage.

Ranjitha P-20213CSE0014 9
Generative Modeling projects – Examples[T1 pg 4-
• OpenAI :
• A US based AI research company that promotes and develop friendly AI
applications.
• Started as a non-profit organisation in 2015 .
• In 2019, it became a for profit organisation.
• Significant achievements : Gym library for training reinforcement learning
algorithms
• Recently – GPT-n models and Dall-E generative models which generates
images from text.

Ranjitha P-20213CSE0014 10
Generative models? T1 Pg.4
• Artificial Intelligence (AI) : a broad field of CS focussed on creating intelligent
agents that can reason, learn and act autonomously
• Machine Learning(ML): a subset of AI, focussed on developing algorithms that
can learn from data.
• Deep Learning(DL): uses deep neural networks with many layers, as a mechanism
of ML to learn complex patterns from data.
• Generative models are a type of ML model, that can generate new data based on
patterns learnt from the input data.
• Language Models (LMs): are statistical models used to predict words in a
sequence of natural language text. ”The sky is ********”
• Large Language models (LLMs) : uses deep learning and are trained on massive
data sets.

Ranjitha P-20213CSE0014 11
Ranjitha P-20213CSE0014 12
• Generative models :
• a powerful type of AI that can generate new data that resembles the training
data.
• They handle different data modalities
• They are used in different domains – text, image, music and video
• They synthesise new data rather than just making predictions/decisions
• They are used in applications generating text, image, music and video.
• When real data is scarce to train an AI model, generative models can be used
to create synthetic data.

Ranjitha P-20213CSE0014 13
OpenAI’s generative models
https://platform.openai.com/docs/models

Ranjitha P-20213CSE0014 14
Evolution of Generative AI
• 1948: Claude Shannon wrote a paper called “A Mathematical Theory
of Communication“. In this paper, he introduced the idea of n-grams,
a statistical model that can generate new text based on existing text.
• 1950: Alan Turing wrote a paper called “Computing Machinery and
Intelligence“. In this paper, he introduced the Turing Test, which is a
way to determine if a machine can behave intelligently like a human.
• 1952: A.L. Hodgkin and A.F. Huxley created a mathematical model
that explained how the brain uses neurons to create an electrical
network. This model inspired the development of artificial neural
networks, which are used in generative AI.

Ranjitha P-20213CSE0014 15
• 1965: Alexey Ivakhnenko and Valentin Lapa developed the first
learning algorithm for feedforward neural networks. This algorithm
enabled the networks to learn complex nonlinear functions from
data.
• 1979: Kunihiko Fukushima introduced the neocognitron, a powerful
type of neural network known as a deep convolutional neural
network. It was specifically designed to identify and recognize
handwritten digits and various other patterns.
• 1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams wrote a
paper called “Learning Representations by Back-propagating Errors.”
This paper introduced the backpropagation algorithm, which is
commonly used to train neural networks.

Ranjitha P-20213CSE0014 16
• 1991: Sepp Hochreiter introduced the long short-term memory
(LSTM) network. It is a type of recurrent neural network that can
learn long-term relationships in sequential data.
• 2001: Yoshua Bengio and his colleagues created a neural network
called the Neural Probabilistic Language Model (NPLM). This model
can learn how words are used in natural language.
• 2014: Diederik Kingma and Max Welling introduced the variational
autoencoder (VAE). It is a type of model that can learn
representations of data and generate new data based on those
learned representations.

Ranjitha P-20213CSE0014 17
• 2014: Ian Goodfellow and his colleagues introduced the generative
adversarial network (GAN). It is a type of generative model that
comprises two neural networks: a generator and a discriminator. The
generator aims to generate realistic data, while the discriminator aims
to differentiate between real and fake data.
• 2015: Yann LeCun and his team proposed the diffusion model. It is a
generative model that learns to reverse a process that gradually
transforms data into noise.
• 2016: Aaron van den Oord and his team introduced WaveNet, a
powerful neural network that can create lifelike speech and music
waveforms.

Ranjitha P-20213CSE0014 18
• 2017: Ashish Vaswani and his team introduced the Transformer, a
neural network design that leverages attention mechanisms to learn
from sequential information, like language or speech.
• 2018: Alec Radford and his team introduced Generative Pre-trained
Transformer (GPT). This is a big model that uses the Transformer
architecture to create different kinds of text on different subjects.
• 2018: Jacob Devlin and his team introduced BERT, a powerful model
that can understand the meaning of words and sentences in any
language. It uses a technique called Transformers to learn from lots of
text without needing specific labels.

Ranjitha P-20213CSE0014 19
• 2019: a researcher named Tero Karras and his team
introduced StyleGAN, an enhanced type of GAN (Generative
Adversarial Network) that can create a wide range of detailed and
realistic images, including faces, animals, landscapes, and more.
• 2020: Large Language Models Take Center Stage: OpenAI’s GPT-3
(Generative Pre-trained Transformer 3) with 175 billion parameters
pushes the boundaries of language generation, demonstrating
impressive capabilities in text creation, translation, and code writing.
• 2020: a team led by Alexei Baevski introduced wav2vec 2.0. It is a
model that can learn speech representations directly from raw audio
and achieved excellent performance in speech recognition tasks.

Ranjitha P-20213CSE0014 20
• 2021: Aditya Ramesh and his team created DALL-E, a powerful
model that can create lifelike images based on written descriptions.
• 2021: Focus on Control and Explainability: Researchers grapple with
the “black box” nature of large language models, seeking methods to
improve control over generated outputs and explain the reasoning
behind their creations.
• 2022: Diffusion Models Gain Traction: Diffusion models, known for
their ability to create realistic images, experience a surge in
popularity. Applications in image generation, editing, and inpainting
become prominent.

Ranjitha P-20213CSE0014 21
• 2023: Multimodal Generative AI Takes Shape: Models capable of
generating across different modalities, like text and image
combinations, start to emerge. This opens doors for more interactive
and immersive experiences.
• 2023: Ethical Considerations Mount: Concerns around bias,
misinformation, and potential misuse of generative AI lead to
discussions on responsible development and deployment practices.
• 2024: Focus on Real-World Integration: A growing trend towards
integrating generative AI tools into real-world applications across
various industries like customer service, product design, and
marketing.

Ranjitha P-20213CSE0014 22
Advantages of generative modeling
• Synthetic data generation using generative models reduces the cost
of labelling and improves the training efficiency.
• Microsoft Research trained their LLM named phi-1 using generative
modelling, for basic Python coding.
• It is a transformer with 1.3 billion parameters.
• Trained on code from The Stack, Q&A content from StackOverflow,
synthetic codes generated by GPT3.5
• “Textbooks Are All You Need, June 2023”
https://www.microsoft.com/en-us/research/publication/textbooks-
are-all-you-need/

Ranjitha P-20213CSE0014 23
Ranjitha P-20213CSE0014 24
Types of generative models[T1 pg 6]
• Different types of generative models for different data modalities:
1) Text-to-text :
• models that generate text from input text, like conversational agents. Ex:
LLaMa 2, GPT-4, Claude, PaLM 2
• A conversational agent is a program designed to converse with humans in
natural language.
• It can talk to people on phones, computers, and other devices, allowing them
to order food or do other functions through voice, text, or chat.
• It can achieve these using technologies like natural language processing (NLP),
machine learning (ML), speech recognition, text-to-speech synthesis, and
dialog management to interact with people through various mediums.

Ranjitha P-20213CSE0014 25
• Llama 2 is a family of pre-trained and fine-tuned large language
models (LLMs) released by Meta AI in 2023.
• Released free of charge for research and commercial use, Llama 2 AI
models are capable of a variety of natural language processing (NLP)
tasks, from text generation to programming code.

Ranjitha P-20213CSE0014 26
• GPT-n by OpenAI:
• Generative Pre-trained Transformer 3
(GPT-3) is a large language model
released by OpenAI in 2020.
• it is a decoder-only transformer model
of deep neural network and convolution
-based architectures with a
technique known as "attention“ with 175
billion parameters.

Ranjitha P-20213CSE0014 27
2) Text-to-Image:
• Models that generate images from text captions. Ex: Dall-E 2, Stable Diffusion
and Imagen.
• Dall-E 2 : https://openai.com/index/dall-e-2/
• DALL·E is a 12-billion parameter version of GPT-3 (opens in a new
window) trained to generate images from text descriptions, using a dataset of
text–image pairs.

Ranjitha P-20213CSE0014 28
Ranjitha P-20213CSE0014 29
3) Text-to-Audio:
• Models that generate audio clips and music from text. Ex: Jukebox, AudioLM and
MusicGen
• Jukebox is a neural network-based tool that uses artificial intelligence to
generate music.
• Developed by OpenAI, Jukebox is a neural network model capable of composing
original songs in different genres and styles.
• Jukebox employs a combination of deep learning techniques, including generative
modeling and reinforcement learning, to create music that is both coherent and
creative.
• The main use cases of Jukebox include music generation, song completion, and
music style transfer. It can generate new songs in the style of a given artist or
even complete a song given a short melody.

Ranjitha P-20213CSE0014 30
4) Text-to-video:
• Models that generate video content from text descriptions. Ex:
Phenaki and Emu Video
• Phenaki : A model for generating videos from text, with prompts that
can change over time, and videos that can be as long as multiple
minutes. https://phenaki.video/
5) Text-to-Speech: Models that synthesize speech audio from input
text. Ex: WaveNet and Tacotron
6) Speech-to-text: Models that transcribe speech to text [ also called
Automatic Speech Recognition ASR]. Ex: Whisper and SpeechGPT

Ranjitha P-20213CSE0014 31
7) Image-to-text: Models that generate image captions from images.
Ex: CLIP and DALL-E 3.
8) Image to Image: Applications –
• data augmentation,
• Neural style transfer (NST) - manipulate digital images,
or videos, in order to adopt the appearance or visual style of
another image.
• generating a new image by combining the content of
one image with the style of another image.
• The goal of style transfer is to create an image that
preserves the content of the original image while
applying the visual style of another image.

Ranjitha P-20213CSE0014 32
Ranjitha P-20213CSE0014 33
• Inpainting : removing defects in the image
Ex: Right arm is missing in the original image

Ranjitha P-20213CSE0014 34
9) Text-to-code: models that generate programming code from text.
Ex: Stable diffusion and Dall-E 3
10) Video-to-audio:
Models that analyse video and generate matching audio.
Ex: Soundify
11) Text-to-Math: generates mathematical expressions from text.
• Many other combinations of data modalities exists
• Text is the common modality.
• OpenAI’s GPT-4V model – Sep 2023 takes both text and images to
better OCR to read text from images.

Ranjitha P-20213CSE0014 35
L3:Prompt Engineering

Ranjitha P-20213CSE0014 1
Prompt Engineering-Introduction, prompt design and prompt engineering
https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/introduction-prompt-design

• A prompt is a natural language request submitted to a language model to receive


a response back.
• They can contain questions, instructions, contextual information, few-shot
examples, and partial input for the model to complete or continue.
• depending on the type of model being used, it can generate text, embeddings,
code, images, videos, music, and more.
• Prompt design is the process of creating prompts that elicit the desired response
from language models.
• Writing well structured prompts can be an essential part of ensuring accurate,
high quality responses from a language model.

Ranjitha P-20213CSE0014 2
Prompt Engineering-Introduction
https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/introduction-prompt-design

• The iterative process of repeatedly updating prompts and assessing the model's
responses is sometimes called prompt engineering.
• for straightforward tasks many of the models (Google’s Gemini) often perform
well without the need for prompt engineering.
• for complex tasks, effective prompt engineering still plays an important role.

Ranjitha P-20213CSE0014 3
Components of a Prompt
• Prompt contents fall within one of the following components:
1. Task (required)
2. System instructions (optional)
3. Few-shot examples (optional)
4. Contextual information (optional)
1. Task:
• A task is the text in the prompt that you want the model to provide a
response for.
• Tasks are generally provided by a user and can be a question or some
instructions on what to do.

Ranjitha P-20213CSE0014 4
Components of a Prompt..
• Example of a task with a question:

Ranjitha P-20213CSE0014 5
Components of a Prompt..
• Example of a task with an instruction:

Ranjitha P-20213CSE0014 6
Components of a Prompt..
2. System Instructions:
• Are instructions passed to the model before user input.
• They dictate the style and tone of the model – what the model should and
should not do
• They are given through : “system parameter”

Ranjitha P-20213CSE0014 7
Ranjitha P-20213CSE0014 8
3. Few shot examples:
• Few-shot examples are examples that you include in a prompt to show the model what getting it
right looks like.
• Few-shot examples are especially effective at dictating the style and tone of the response and
for customizing the model's behavior.

Ranjitha P-20213CSE0014 9
4. Contextual Information:
• Contextual information, or context, is information that is included in the prompt that the
model uses or references when generating a response.
• We can include contextual information in different formats, like tables or text.

Ranjitha P-20213CSE0014 10
Safety and Fallback responses
• In few use cases the model is not expected to fulfill the user's requests.
• when the prompt is encouraging a response that is not aligned with the service provider’s
(Google's/OpenAI’s) values or policies, the model might refuse to respond and provide a fallback
response.
• Few cases where the model is likely to refuse to respond:
• Hate Speech:
• Prompts with negative or harmful content targeting identity and/or protected attributes.
• Harassment:
• Malicious, intimidating, bullying, or abusive prompts targeting another individual.
• Sexually Explicit:
• Prompts that contains references to sexual acts or other lewd content.
• Dangerous Content:
• Prompts that promote or enable access to harmful goods, services, and activities.

Ranjitha P-20213CSE0014 11
Prompting Strategies
https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/prompt-design-strategies

• There is no specific syntax/format of a prompt


• There are common strategies that can be uses to affect the model's responses.
• Rigorous testing and evaluation remain crucial for optimizing model performance.
• Large language models (LLM) are trained on vast amounts of text data to learn
the patterns and relationships between units of language.
• When given some text (the prompt), language models can predict what is likely to
come next, like a sophisticated autocompletion tool.
• Therefore, when designing prompts, consider the different factors that can
influence what a model predicts comes next.

Ranjitha P-20213CSE0014 12
Prompt Engineering Workflow…
• Prompt engineering is a test-driven and iterative process that can
enhance model performance.
• When creating prompts, it is important to clearly define the
objectives and expected outcomes for each prompt and
systematically test them to identify areas of improvement.

Ranjitha P-20213CSE0014 13
Prompt Engineering Workflow…
• Two factors affecting the effectiveness of a prompt : content and
structure.
• Content:
• has all relevant information associated with a task for the model to complete
the task.
• It can include instructions, examples, contextual information, and so on.
• Structure:
• Helps to give the relevant information in a structure, easy to parse for the
model.
• Structure of the prompt is defined by prompt template
• Things like the ordering, labeling, and the use of delimiters can all affect the
quality of responses.

Ranjitha P-20213CSE0014 14
Ranjitha P-20213CSE0014 15
Ranjitha P-20213CSE0014 16
Best Practices of Prompt design
• Give clear and specific instructions
• Include few-shot examples
• Assign a role
• Add contextual information
• Use system instructions
• Structure prompts
• Instruct the model to explain its reasoning
• Break down complex tasks
• Experiment with parameter values
• Prompt iteration strategies

Ranjitha P-20213CSE0014 17
1. Give clear and specific instructions
• Use Effective instructions applying the following principles:
• Tell the model what to do.
• Be clear and specific.
• Specify any constraints or formatting requirements for the output.
• Example: Suppose you own a cheeseburger restaurant and you want
to use a model to help you learn about which menu items are the
most popular. You want the model to format transcripts of customer
orders in JSON so that you can quickly identify menu items

Ranjitha P-20213CSE0014 18
Case 1: Response if the instruction is more generic

Ranjitha P-20213CSE0014 19
Ranjitha P-20213CSE0014 20
Case 2: Response if the instruction is specific following the principles

Ranjitha P-20213CSE0014 21
2. Include few-shot examples
• include examples in the prompt that show the model what a good response looks
like.
• The model attempts to identify patterns and relationships from the examples and
applies them when generating a response.
• Prompts that contain examples are called few-shot prompts, while prompts that
provide no examples are called zero-shot prompts.
• Few-shot prompts are often used to regulate the output formatting, phrasing,
scoping, or general patterning of model responses.
• Use specific and varied examples to help the model narrow its focus and generate
more accurate results.
• Experiment with the number of prompts to include. Depending on the model, too
few examples are ineffective at changing model behavior. Too many examples can
cause the model to overfit.

Ranjitha P-20213CSE0014 22
Zero-shot versus few-shot prompts
Extract the technical specifications from the text given and give the
output in JSON
Case 1: Response from a zero-shot prompt

Ranjitha P-20213CSE0014 23
Case 2: If we want the keys in lower case, use few-shot prompt

Ranjitha P-20213CSE0014 24
3. Assign a role
• Assign a role to the model (in persona)
• Adding a role is not always necessary but can enforce a certain level
of expertise when generating a response, improve performance, and
tailor its communication style.
• particularly useful for getting the model to perform highly technical
tasks or enforcing specific communication styles.

Ranjitha P-20213CSE0014 25
Ranjitha P-20213CSE0014 26
4. Add contextual information
• instead of assuming that the model has all of the required
information, can include instructions and information that the model
needs to solve a problem, in the prompt.
• This contextual information helps the model understand the
constraints and details of what it is asked to do.
• Effective contextual information includes the following:
• Background information (context) for the model to refer to when generating
responses.
• Rules or pre-programmed responses to steer the model behavior.
Ex: Ask the model to give troubleshooting guidance for a router:
https://cloud.google.com/vertex-ai/generative-
ai/docs/learn/prompts/contextual-information

Ranjitha P-20213CSE0014 27
5. System Instructions
• System instructions are like a preamble added before the LLM gets
exposed to any further instructions from the user.
• It lets users steer the behavior of the model based on their specific
needs and use cases.
• When you set a system instruction, you give the model additional
context to understand the task, provide more customized responses,
and adhere to specific guidelines over the full user interaction with
the model.
• Example, include things like the role or persona, contextual
information, and formatting instructions:

Ranjitha P-20213CSE0014 28
https://cloud.google.com/vertex-ai/generative-
ai/docs/learn/prompts/system-instructions

Ranjitha P-20213CSE0014 29
6. Structure prompts [Same as discussed before using prompt
templates]
7. Instruct the model to explain its reasoning
8. Break down complex tasks
9. Experiment with parameter values
10. Prompt iteration strategies

Ranjitha P-20213CSE0014 30
7. Instruct the model to explain reasoning
• the model responds with the steps that it employs to solve the
problem – reasoning steps
• Going through this process can sometimes improve accuracy and
nuance, especially for challenging queries.
• Reasoning steps are part of the response.
• To parse out the reasoning steps from the answer,specify an output
format by using XML or other separators.

Ranjitha P-20213CSE0014 31
• Example, suppose you're writing a cooking blog and you want the
model to tell you how readers might interpret different parts of the
blog. If you don't instruct the model to explain its reasoning, the
response from the model might not be as useful as you'd like:
• https://cloud.google.com/vertex-ai/generative-
ai/docs/learn/prompts/explain-reasoning

Ranjitha P-20213CSE0014 32
Ranjitha P-20213CSE0014 33
8. Break down complex tasks
• For complex tasks that require multiple instructions or steps, improve
the model's responses by breaking the prompts into subtasks.
• Smaller prompts can help improve controllability, debugging, and
accuracy.
• There are two ways to break down complex prompts and ingest them
into a model:
• Chain prompts: split a task into subtasks(sequential steps) and run the
subtasks sequentially.
• Aggregate responses: split a task into subtasks and run the subtasks in
parallel.

Ranjitha P-20213CSE0014 34
Chain Prompts
• For complex tasks that involve multiple sequential steps, make each
step a prompt and chain the prompts together in a sequence.
• In this sequential chain of prompts, the output of one prompt in the
sequence becomes the input of the next prompt.
• The output of the last prompt in the sequence is the final output.
• For example, suppose you run a telecommunications business and
want to use a model to help you analyze customer feedback to
identify common customer issues, classify issues into categories, and
generate solutions for categories of issues.

Ranjitha P-20213CSE0014 35
Task 1: identify customer issues

Ranjitha P-20213CSE0014 36
Task 2: classify issues into categories

Ranjitha P-20213CSE0014 37
Task 3: generate solutions

Ranjitha P-20213CSE0014 38
Aggregate responses
• Used for a complex task, that doesn’t follow a sequence.
• you can run parallel prompts and aggregate the model's responses.
• Ex: https://cloud.google.com/vertex-ai/generative-
ai/docs/learn/prompts/break-down-prompts
• Problem: Suppose you own a music record store and want to use a
model to help you decide which records to stock based on music
streaming trends and your store's sales data.
• Analyse the two data sets streaming and sales data in parallel by
running two prompts.

Ranjitha P-20213CSE0014 39
Task 1: analyze streaming and sales data in parallel

Ranjitha P-20213CSE0014 40
Task 2 : aggregate the responses

Ranjitha P-20213CSE0014 41
9. Experiment with parameter values
• Each call sent to a model includes parameter values that control how the model
generates a response.
• The model can generate different results for different parameter values.
• Experiment with different parameter values to get the best values for the task.
• The parameters available for different models may differ.
• The most common parameters are the following:
• Max output tokens
• Temperature
• Top-K
• Top-P

Ranjitha P-20213CSE0014 42
• Max output tokens:
• Maximum number of tokens that can be generated in the response.
• A token is approximately four characters.
• 100 tokens correspond to roughly 60-80 words.
• Specify a lower value for shorter responses and a higher value for potentially longer
responses.
• Temperature:
• Temperature controls the degree of randomness in token selection.
• Lower temperatures are good for prompts that require a less open-ended or creative
response, while higher temperatures can lead to more diverse or creative results.
• Temp=0 always selects the tokens with the highest probability.
• Temp=0, always generates deterministic response, with little randomness still possible.
• If the model returns a response that's too generic, too short, or the model gives a fallback
response, try increasing the temperature.

Ranjitha P-20213CSE0014 43
• Top-K:
• Top-K changes how the model selects tokens for output.
• Top-K = 1, the next selected token is the most probable among all tokens in
the model's vocabulary (also called greedy decoding)
• Top-K = 3, the next token is selected from among the three most probable
tokens by using temperature.
• For each token selection step, the top-K tokens with the highest probabilities
are sampled.
• Then tokens are further filtered based on top-P with the final token selected
using temperature sampling.
• Specify a lower value for less random responses and a higher value for more
random responses.

Ranjitha P-20213CSE0014 44
• Top-P:
• Top-P changes how the model selects tokens for output
• Tokens are selected from the most propable(see top-K) to least probable until
the sum of their probabilities equals the top-P value.
• For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and
the top-P =0.5, then the model will select either A or B as the next token by
using temperature and excludes C as a candidate.
• Specify a lower value for less random responses and a higher value for more
random responses.

Ranjitha P-20213CSE0014 45
10. Prompt Iteration strategies
• Prompt design often requires a few iterations before you get the
desired response consistently.
• Your prompt design strategy should apply the Prompt design best practices,
with incremental refinements.
• You can iteratively introduce some or all of the best practices when testing for
performance that meets your use case needs.
• Additionally, the order of the content in the prompt can sometimes
affect the response.
• Try changing the content order and see how the response changes.
• For example, for multimodal prompts, try adding the files to the prompt
before the instructions.

Ranjitha P-20213CSE0014 46
• As you receive responses from the model, take note of the aspects
that you like and dislike about its responses
• modify your prompts to guide the model to responses that best align
with your use cases.
• Ex: https://cloud.google.com/vertex-ai/generative-
ai/docs/learn/prompts/prompt-iteration

Ranjitha P-20213CSE0014 47
L4:Large language Models(LLMs)

1
Large Language Models (LLMs) T1 pg:11
• Language models(LMs) aim to predict the next word, character or
sentence based on the previous ones in a sequence.
• These models work by estimating the probability of a token or
sequence of tokens occurring within a longer sequence of tokens.
• Ex: “When I hear rain on my roof, I ------- in my kitchen”
• If a token is a word, then a language model determines the
probabilities of different words or sequences of words to replace that
underscore.

2
Large Language Models (LLMs) T1 pg:11
• if a token is a sentence or a blocks of text, the LM could calculate
the likelihood of different entire sentences or blocks of text.
• LMs encode the rules and structure of a language in a way that can be
understood by a machine.
• Applications:
• generating text,
• translating languages, and
• answering questions

3
Large Language Models(LLMs)
• LLMs uses deep learning techniques and massively large data sets to
understand, summarize, generate and predict new content similar to
human language.
• They can understand complex textual data, identify entities and
relationships between them, and generate new text that is coherent and
grammatically accurate.
• A generative language model encodes information about its training text
(embeddings) and uses it to generate new text based on those learnings.
• These models doesn’t use explicit feature extraction.
• Embeddings are the foundations of LLMs.
• Language modeling/NLP tasks depends on the quality of representation
learning – embeddings.

4
• An embedding is a numerical representation of a piece of information
- text, documents, images, audio, etc.
• The representation captures the semantic meaning of what is being
embedded,
• This data could consist of words, in which case we call it a word
embedding
• Embeddings are everywhere in modern deep learning such as
transformers, recommendation engines, SVD matrix decomposition,
layers of deep neural networks, encoders and decoders.
• They provide a common mathematical representation of data
• They compress the data
• They preserve relationships within the data

5
Word Embedding : Example
• Corpus : king, queen, prince and princess
• Embedding : one hot encoding with 3D vectors
• Disadvantage : sparse

6
• The 4 words differ in two dimensions: age and gender
• They can be represented using only 2 dimensions

7
• In real-time applications:
• An embedding is a dense vector of floating point values (the
length of the vector is a user defined parameter)
• Instead of specifying the values for the embedding manually, they
are trainable parameters (weights learned by the model during
training, in the same way a model learns weights for a dense
layer).
• For small data sets: word embeddings are 8-dimensional
• In real time applications: corpus has all words of a language, all
words of all languages too
• For such large data sets - 1024-dimensions.
• A higher dimensional embedding can capture fine-grained relationships
between words, but takes more data to learn.

8
• Embeddings, preserve the important information in a compact form.

9
Embeddings for images

10
• Methods/Tools for embeddings:
• Word2Vec trains neural networks based on the context in which words
appear.
• GloVe(Global Vectors for Word Representation) uses global word-word co-
occurrence statistics https://nlp.stanford.edu/projects/glove/
• BERT uses deep learning and attention mechanisms to generate context-
sensitive embeddings.
• OpenAI provides APIs to send text and receive back precomputed
embeddings using their advanced models.

11
• Modelling human language at a high scale is extremely complex and
resource intensive.
• The size and capability of language models has exploded over the last
few years as computer memory, dataset size, and processing power
increases, and more effective techniques for modeling longer text
sequences are developed.
• "Large" can refer either to the number of parameters in the model,
or sometimes the number of words in the dataset.
• Parameters are the weights the model learned during training, is used
to predict the next token in the sequence.
• BERT (110M parameters) as well as PaLM 2 (up to 340B parameters).

12
13
14
15
Applications of LLMs T1:Pg.12
• Question Answering:
• AI chatbots and virtual assistants can provide personalised and efficient assistance, reducing
response time in customer support and increasing customer experience.
• Ex: chatbots in restaurant reservations and ticket bookings
• Automatic Summarization:
• Produce concise summaries of articles, research papers and other content.
• Sentiment Analysis:
• Analyze opinions and emotions in text, LLMs help business to understand customer feedback and
opinions efficiently
• Topic modelling:
• Given a corpus of docs, LLMs can discover abstract topics and themes across them.
• Using word clusters and latent semantic structures.
• Semantic Search:
• LLMs can focus understanding meaning within a document.
• Using NLP to interpret words and concepts for improved search relevance.

16
• Machine Translation:
• LLMs can translate text/docs from one language to another.
• This supports business to expand globally.

17
Types of LLMs
Different types of LLMs exists to address specific need and challenges
• Auto regressive models :
• Generate text by predicting the next word given the preceding words in a
sequence
• trained to maximize the likelihood of generating the correct next word,
conditioned by context
• computationally expensive and may suffer from generating repetitive or
irrelevant responses.
• Ex: GPT3

18
2. Transformer based models:
• Transformers are a type of deep learning architecture used in large language
models.
• The transformer model, introduced by Vaswani et al. in 2017 is a key component
of many LLMs.
• this transformer architecture allows the model to process and generate text
effectively, capturing long-range dependencies and contextual information.
• Ex: RoBERTa (Robustly Optimized BERT Pretraining Approach) by Facebook AI

19
3. Encoder-Decoder models:
• These models consist of two main components: an encoder that reads and
processes the input sequence and a decoder that generates the output
sequence.
• The encoder learns to encode the input information into a fixed-length
representation, which the decoder uses to generate the output sequence.
• The transformer-based model known as the ‘Transformer’ is an example of an
encoder-decoder architecture.
• MarianMT (Marian Neural Machine Translation) by the University of Edinburgh

20
4. Pre-trained and fine-tuned models:
• Many LLMs are pre-trained on large-scale datasets, enabling them to understand
language patterns and semantics broadly.
• These pre-trained models can then be fine-tuned on specific tasks or domains
using smaller task-specific datasets.
• Fine-tuning allows the model to specialize in a particular task, such as sentiment
analysis or named entity recognition.
• This approach saves computational resources and time compared to training a
large model from scratch for each task.
• Ex: ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements
Accurately)

21
5. Multilingual models
• trained on text from multiple languages and can process and generate text in
several languages.
• They can be useful for tasks such as cross-lingual information retrieval, machine
translation, or multilingual chatbots.
• By leveraging shared representations across languages, multilingual models can
transfer knowledge from one language to another.
• Ex: XLM (Cross-lingual Language Model) developed by Facebook AI Research

22
4. Multilingual models
• trained on text from multiple languages and can process and generate text in
several languages.
• They can be useful for tasks such as cross-lingual information retrieval, machine
translation, or multilingual chatbots.
• By leveraging shared representations across languages, multilingual models can
transfer knowledge from one language to another.
• Ex: XLM (Cross-lingual Language Model) developed by Facebook AI Research
Researchers and engineers continue to explore new architectures, techniques, and
applications to advance the capabilities of these models further

23
Layers of a Neural Network..a Recap

24
Layers of a Neural Network..a Recap
• Input layer :
• Receives input from the outside world, such as images, text, or video, and is made up of one or
more artificial neurons.
• Each neuron represents a feature of the input data, and its value represents the feature's value
• Hidden layers:
• Between the input and output layers, there may be one or more hidden layers.
• These layers perform complex computations on the input data.
• Each neuron in a hidden layer receives inputs from all neurons in the previous layer, applies a
weighted sum, adds a bias term, and passes the result through an activation function.
• Output layer :
• The output layer produces the final prediction or result.
• The number of neurons in this layer depends on the nature of the problem.
• For example, in a binary classification task, there might be one neuron for each class, outputting
probabilities.

25
• Connection Weights (W):
• These represent the strengths of connections between neurons.
• Each connection from neuron A to neuron B has a weight associated with it, denoted as
WAB.
• These weights are learned during training and determine the impact of neuron A’s output
on neuron B.
• Bias Terms (b):
• Each neuron also has a bias term (b) associated with it.
• The bias allows the neuron to shift its output. Bias terms are also learned during training.

26
• Activation Function (σ):
• Each neuron applies an activation function to the weighted sum of its inputs plus the bias.
• Common activation functions include the sigmoid function, ReLU (Rectified Linear Unit), and
tanh (hyperbolic tangent).
• The activation function introduces non-linearity, allowing the network to model complex
relationships.
• https://medium.com/@sarita_68521/basic-understanding-of-neural-network-
structure-eecc8f149a23

27
General Architecture of LLMs
The architecture of Large Language Model primarily consists of multiple layers
of neural networks,
• embedding layers, recurrent layers, feedforward layers and attention layers.
• These layers work together to process the input text and generate output
predictions.
• The embedding layer converts each word in the input text into a high-
dimensional vector representation.
• These embeddings capture semantic and syntactic information about the
words and help the model to understand the context.

28
General Architecture of LLMs – Text Generation

29
General Architecture of LLMs – Text Generation

30
General Architecture of LLMs
• The feedforward layers of LLMs have multiple fully connected
layers that apply nonlinear transformations to the input
embeddings.
• These layers help the model learn higher-level abstractions
from the input text..
• The recurrent layers of LLMs are designed to interpret
information from the input text in sequence.
• These layers maintain a hidden state that is updated at each
time step, allowing the model to capture the dependencies
between words in a sentence.

31
General Architecture of LLMs
• The attention mechanism allows the model to focus
selectively on different parts of the input text.
• This mechanism helps the model attend to the input text’s
most relevant parts and generate more accurate predictions.

32
Popular Large Language Model (LLM)
• GPT - 3 (Generative Pre-trained Transformer 3) –
• the largest Large Language Models developed by OpenAI.
• It has 175 billion parameters
• text generation, translation, and summarization.
• BERT (Bidirectional Encoder Representations from
Transformers) –
• Developed by Google, BERT is another popular LLM that has been
trained on a massive corpus of text data.
• It can understand the context of a sentence and generate
meaningful responses to questions.

33
Popular Large Language Model (LLM)
• XLNet –
• This LLM developed by Carnegie Mellon University and Google uses
“permutation language modeling.”
• It has state-of-the-art performance on language tasks, including
language generation and question answering.
• T5 (Text-to-Text Transfer Transformer) – T5,
• developed by Google,
• is trained on a variety of language tasks and can perform text-to-
text transformations, like translating text to another language,
creating a summary, and question answering.

34
Popular Large Language Model (LLM)

RoBERTa (Robustly Optimized BERT Pretraining Approach)


• Developed by Facebook AI Research,
• RoBERTa is an improved BERT version that performs better on
several language tasks.

35
L5: GPT and its variants

1
Generative Pre-trained Transformer (GPT)
• GPT is a family of AI models built by OpenAI.
• It stands for Generative Pre-trained Transformer,
• Generative: Generative AI is a technology capable of producing content, such as text and imagery.
• Pre-trained: Pre-trained models are saved networks that have already been taught, using a large data set, to resolve a
problem or accomplish a specific task.
• Transformer: A transformer is a deep learning architecture that transforms an input into another type of output.
• GPT is a generative AI technology that has been previously trained to transform its input into a different type of
output.
• Initially, GPT was made up of only LLMs (large language models). But OpenAI has expanded this to include two new
models:
• GPT-4o: a large multimodal model (LLM)
• GPT-4o mini: a small language model (SLM)
• Generate human-like responses to a prompt – initially text based.
• But GPT-4o and GPT-4o mini) can also work with images and audio inputs because they're multimodal.

2
• GPT-1
• is the first version of OpenAI’s language model.
• It followed Google’s 2017 paper Attention is All You Need, in which researchers introduced
the first general transformer model.
• serves as the framework for Google Search, Google Translate, autocomplete, and all large
language models (LLMs), including Bard and Chat-GPT.
• GPT-2
• is the second transformer-based language model by OpenAI.
• It’s open-source, unsupervised, and trained on over 1.5 billion parameters.
• GPT-2 was designed specifically to predict and generate the next sequence of text to follow a
given sentence.

3
GPT-3
• The third iteration of OpenAI’s GPT model is trained on 175 billion parameters,.
• Trained on Wikipedia entries as well as the open-source data set Common Crawl.
• can generate computer code and improve performance in niche areas of
content creation such as storytelling.
GPT-4
• GPT-4 is the most recent model from OpenAI.
• It’s a large multimodal model (LMM), meaning it's capable of parsing image
inputs as well as text.
• exhibits human-level performance across a variety of benchmarks in the
professional and academic realm.

4
How GPT works
1. Pre-training https://www.youtube.com/watch?v=8Lqi-F8g_ps
• The model is trained on a large dataset consisting of text from the internet.
• During this phase, GPT learns grammar, facts about the world and some
reasoning abilities for predicting the next word in a sentence.
• It builds a general understanding of language and context from this extensive
data.
2. Fine Tuning
• After pre-training, the model undergoes fine-tuning on a smaller and more
focused dataset.
• This dataset usually contains examples directly related to the intended
application.

5
3. Tokenization
• The process of breaking down input text into smaller parts, known as tokens, can
include words, subwords or individual characters.
• These tokens are then converted into numerical representations that the model
can process.
4. Transformer architecture
• GPT uses the transformer architecture, which includes mechanisms like self-
attention.
• Self-attention enables the model to assess the significance of individual words
within a sentence, enhancing its comprehension of context and connections
among words.

6
5. Generation
• During the generation phase, the model receives an AI prompt and generates a
coherent and contextually relevant continuation based on its training data.
• It predicts one token at a time, using the previously generated tokens as context.
• This process continues until the desired output length is reached.

7
L6: Other LLMs

1
Other LLMs [T1 Pg.16]
• Other notable foundation models :

• Google’s ------ PaLM2 [PaLM (Pathways Language Model) is a 540 billion parameter transformer-
based large language model developed by Google AI]

• LLaMa --------- Meta AI

• Claude -------- Anthropic

2
• Features of PaLM2 : a 340 billion parameter model trained on 3.6 trillion tokens
• Technical report : https://ai.google/static/documents/palm2techreport.pdf
• Multilinguality:
• PaLM 2 is more heavily trained on multilingual text, spanning more than 100 languages.
• This has significantly improved its ability to understand, generate and translate nuanced text
— including idioms, poems and riddles — across a wide variety of languages, a hard problem
to solve.
• PaLM 2 also passes advanced language proficiency exams at the “mastery” level.
• Reasoning:
• Trained on scientific papers and web pages that contain mathematical expressions.
• As a result, it demonstrates improved capabilities in logic, common sense reasoning, and
mathematics.
• Coding:
• pre-trained on a large quantity of publicly available source code datasets.
• This means that it excels at popular programming languages like Python and JavaScript,
• can also generate specialized code in languages like Prolog, Fortran and Verilog.

3
Med-PaLM2:
Prompt :

Response:

4
• Sec-PaLM is a specialized version of PaLM 2 trained on security use cases, and
used for cybersecurity analysis.
• Available through Google Cloud, it uses AI to help analyze and explain the
behavior of potentially malicious scripts,
• It detects which scripts are actually threats to people and organizations in
unprecedented time.

5
2. LLaMa :
• Llama (acronym for Large Language Model Meta AI, is a family
of autoregressive large language models (LLMs) released by Meta AI starting in
February 2023.
• The latest version is Llama 3.1, released in July 2024.

3. Claude :
• Claude was the initial version of Anthropic's language model released in March
2023
• Claude demonstrated proficiency in various tasks but had certain limitations in
coding, math, and reasoning capabilities.

You might also like