LLM_introduction 2024
2
A brief history of NLP
3
Outline
● Generative AI
● Large Language Models
● Prompt Engineering
● LLM agents
5
Generative AI market
Source: https://www.precedenceresearch.com/generative-ai-market
6
Generative AI
7
Key characteristics of Generative AI
● Creativity: It can produce novel outputs that do not exist in the training
data
● Learning patterns: It analyzes and learns from vast amounts of data to
replicate style or logic
● Versatility: Generative AI can be applied across multiple media types,
such as text, images, and sound
8
What is Generative AI capable of?
9
What is Generative AI capable of?
● Prompt: a picture of a
narwhal giving a
lecture on generative
AI. Put the narwhal on
a stage in front of a
large audience. The
narwhal should appear
scholarly and is
presenting off of a slide
deck, gesturing at
some dense equations
on the slide
10
Source: https://stablediffusionweb.com/app/image-generator
What is Generative AI capable of?
11
Generative AI in math
Source: https://waveline.ai/blog/generative-ai-mathematically
12
Types of Generative AI models
13
Source: https://www.xenonstack.com/blog/generative-ai-models
Intelligence in Multi-Sensory Data
● Harnessing Multimodality
○ The world we live in is replete with
multimodal information and signals,
not just language
14
Intelligence in Multi-Sensory Data
● Building Multimodal LLMs (MLLMs)
○ Can we transfer the success of LLMs to MLLMs, enabling LLMs to comprehend multimodal
information as deeply as they understand language?
● Perceiving and interacting with the world as human beings do might be the key
to achieving human-level AI.
15
Intelligence in Multi-Sensory Data
● Trends of MLLMs
16
Large Language Models
17
Language Modeling?
● What is the probability of "Tôi trình bày ChatGPT tại Trường ĐH Công Nghệ"
("I am presenting ChatGPT at Trường ĐH Công Nghệ")?
● What is the probability of the scrambled word order "Công Nghệ học Đại trình bày
ChatGPT tại Tôi"?
● What is P("Tôi trình bày ChatGPT tại Trường ĐH Công nghệ, địa điểm …"), or the
probability of the next word, P(… | "Tôi trình bày ChatGPT tại Trường ĐH Công nghệ, địa điểm")?
● A model that computes either of these for W = w1, w2, w3, …, wn:
P(W) or P(wn | w1, w2, …, wn-1)
is called a language model (see the sketch below).
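To make the definition concrete, here is a minimal sketch (my own illustration, not from the slides) of how a language model assigns a probability to a sentence. It uses a toy bigram approximation, P(W) ≈ ∏ P(wi | wi-1), estimated from a two-sentence corpus invented for this example; real LLMs estimate these probabilities with neural networks trained on billions of tokens.

```python
from collections import defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    ["I", "present", "ChatGPT", "at", "the", "university"],
    ["I", "present", "a", "lecture", "at", "the", "university"],
]

# Estimate P(w_i | w_{i-1}) by relative frequency of bigrams.
bigram_counts = defaultdict(int)
unigram_counts = defaultdict(int)
for sentence in corpus:
    for prev, cur in zip(sentence, sentence[1:]):
        bigram_counts[(prev, cur)] += 1
        unigram_counts[prev] += 1

def sentence_probability(words):
    """Chain rule with a bigram approximation: P(w1..wn) ~ prod_i P(w_i | w_{i-1})."""
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if unigram_counts[prev] == 0:
            return 0.0                        # unseen context -> zero probability (no smoothing)
        prob *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return prob

print(sentence_probability(["I", "present", "ChatGPT", "at", "the", "university"]))   # high
print(sentence_probability(["university", "the", "at", "ChatGPT", "present", "I"]))   # 0.0, scrambled
```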
18
LM decoder
19
Large Language Model
20
Large Language Model
21
22
GPT-o1
23
Large Language Models - YottaFLOPs of Compute
Source: https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf 24
Why LLMs?
● Double Descent
25
Why LLMs?
● Scaling Law for Neural Language Models
○ Performance depends strongly on scale! We keep getting better performance as
we scale the model, data, and compute up!
Source: Jared Kaplan et al, Scaling Laws for Neural Language Models, 2020 26
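For reference, the scaling laws in Kaplan et al. (2020) take the form of power laws in each resource when the other two are not bottlenecks (constants omitted; this is a paraphrase of the paper, not a slide from the deck):

L(N) ≈ (N_c / N)^{α_N},   L(D) ≈ (D_c / D)^{α_D},   L(C) ≈ (C_c / C)^{α_C}

where L is the test loss, N the number of model parameters, D the dataset size in tokens, and C the training compute.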
Why LLMs?
● Generalization
○ We can now use one single model to solve many tasks
27
Why LLMs? Emergence in few-shot prompting
Emergent Abilities
● Some abilities of LMs are not present in
smaller models but are present in larger
models
What is pre-training / fine-tuning?
29
BERT: Bidirectional Encoder Representations from
Transformers
Source: (Devlin et al, 2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 30
Masked Language Modeling (MLM)
● Solution: Mask out k% of the input words, and then predict the masked words
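A minimal sketch of the masking step (my own illustration, not BERT's exact recipe, which also sometimes substitutes a random token or keeps the original): randomly replace about k% of the input tokens with a [MASK] symbol and keep the originals as prediction targets.

```python
import random

def mask_tokens(tokens, k=0.15, mask_token="[MASK]"):
    """Replace roughly k% of tokens with [MASK]; return the masked input and targets."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < k:
            masked.append(mask_token)
            targets.append(tok)        # the model must predict this token
        else:
            masked.append(tok)
            targets.append(None)       # no loss is computed at this position
    return masked, targets

masked, targets = mask_tokens("the man went to the store to buy milk".split())
print(masked)
print(targets)
```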
31
Next Sentence Prediction (NSP)
32
BERT pre-training
33
RoBERTa
● BERT is still under-trained
● Removed the next sentence prediction pre-training objective, as it adds more noise
than benefit!
● Trained longer with 10x data & bigger batch sizes
● Pre-trained on 1,024 V100 GPUs for one day in 2019
34
(Liu et al., 2019): RoBERTa: A Robustly Optimized BERT Pretraining Approach
Three major forms of pre-training (LLMs)
35
Text-to-text models
● Encoder-only models (e.g., BERT) enjoy the benefits of bidirectionality, but they
can't be used to generate text
● Decoder-only models (e.g., GPT) can do generation, but they are left-to-right LMs
● Text-to-text models combine the best of both worlds! (e.g., T5, BART)
(Raffel et al., 2020): Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 36
How to use these pre-trained models?
37
From GPT to GPT-2 to GPT-3
38
GPT-2 architecture
39
GPT-3: language models are few-shot learners
40
GPT-3’s in-context learning
41
[2020] GPT-3 to [2022] ChatGPT
What's new?
● Training on code
● Supervised instruction tuning
● RLHF = Reinforcement Learning from Human Feedback
Source: Fu, 2022, "How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources" 42
43
Risks of Large Language Models
45
Pretraining + Prompting Paradigm
46
47
Source: https://gradientflow.com/llm-triad-tune-prompt-reward/
Temperature in LLMs
● Temperature and Top-p sampling are two essential parameters that can be tweaked to
control the output of LLMs
● Temperature (0-2): This parameter determines the creativity and diversity of the text
generated by the LLM. A higher temperature value (e.g., 1.5) leads to more diverse and
creative text, while a lower value (e.g., 0.5) results in more focused and deterministic
text (see the sketch below).
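Mathematically, temperature divides the model's logits before the softmax, so low values sharpen the distribution and high values flatten it. A small sketch (my own, not tied to any particular LLM API):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Softmax over logits/temperature, then sample one token index."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                   # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0], probs

logits = [2.0, 1.0, 0.2]                              # hypothetical next-token logits
print(sample_with_temperature(logits, temperature=0.5)[1])   # sharper: more deterministic
print(sample_with_temperature(logits, temperature=1.5)[1])   # flatter: more diverse
```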
48
Top-p Sampling in LLMs
● Top-p Sampling (0-1): This parameter balances diversity against high-probability words by
sampling only from the smallest set of most probable tokens whose cumulative probability
mass is at least p (see the sketch below).
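A corresponding sketch of top-p (nucleus) sampling, again my own illustration: sort tokens by probability, keep the smallest prefix whose cumulative probability reaches p, renormalize within that set, and sample from it.

```python
import random

def top_p_sample(probs, p=0.9):
    """Nucleus sampling over a list of token probabilities (index = token id)."""
    ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for idx, prob in ranked:
        nucleus.append((idx, prob))
        cumulative += prob
        if cumulative >= p:                        # stop once the mass reaches p
            break
    total = sum(prob for _, prob in nucleus)       # renormalize inside the nucleus
    ids = [idx for idx, _ in nucleus]
    weights = [prob / total for _, prob in nucleus]
    return random.choices(ids, weights=weights, k=1)[0]

print(top_p_sample([0.5, 0.3, 0.1, 0.05, 0.05], p=0.9))   # only the top 3 tokens can be drawn
```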
49
Why Prompt Engineering?
● Why learn prompt engineering?
○ Important for research, discoveries, and advancement
○ Helps to test and evaluate the limitations of LLMs
○ Enables all kinds of innovative applications on top of LLMs
50
Source: https://jobs.lever.co/Anthropic/e3cde481-d446-460f-b576-93cab67bd1ed
First Basic Prompt
51
Elements of a Prompt
● A prompt is composed of the following components:
52
Settings to keep in mind
● When prompting a language model, you should keep a few settings in mind
● You can get very different results with prompts when using different settings
● One important setting is controlling how deterministic the model is when
generating completions for prompts
○ Temperature and top_p are two important parameters to keep in mind
○ Generally, keep these low if you are looking for exact answers
○ Keep them high if you are looking for more diverse responses (see the sketch below)
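With API-based models these settings are usually passed as request parameters. The sketch below assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the model name is only an example, and other providers expose similar temperature/top_p knobs.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",                      # example model name
    messages=[{"role": "user", "content": "What is the capital of Vietnam?"}],
    temperature=0.2,                          # low: focused, near-deterministic answer
    top_p=1.0,
)
print(response.choices[0].message.content)
```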
53
Designing Prompts for Different Tasks
● In the next few slides, we will cover a few examples of common tasks
using different prompts
● Tasks covered:
○ Text Summarization
○ Question Answering
○ Text Classification
○ Code Generation
○ …
54
Text Summarization
55
Question Answering
Answer the question based on the context below. Keep the answer short and concise.
Respond "Unsure about answer" if not sure about the answer.
Context: Teplizumab traces its roots to a New Jersey drug company called Ortho
Pharmaceutical. There, scientists generated an early version of the antibody, dubbed
OKT3. Originally sourced from mice, the molecule was able to bind to the surface of
T cells and limit their cell-killing potential. In 1986, it was approved to help prevent
organ rejection after kidney transplants, making it the first therapeutic antibody
allowed for human use.
Question: What was OKT3 originally sourced from?
Answer: According to the context, OKT3 was originally sourced from mice.
56
Text Classification
Classify the text into neutral, negative or positive.
Text: I think the food was okay.
Sentiment: Neutral
57
Code Generation
58
Reasoning
Tìm các số nguyên tố từ 1 đến 100 và cho biết có tất cả bao nhiêu số nguyên tố?
(Find the prime numbers from 1 to 100 and state how many prime numbers there are in
total.)
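For reference, the expected result can be checked with a few lines of Python (my own check, not part of the slide): there are 25 primes between 1 and 100.

```python
def is_prime(n):
    """Trial division up to sqrt(n)."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

primes = [n for n in range(1, 101) if is_prime(n)]
print(primes)        # 2, 3, 5, ..., 97
print(len(primes))   # 25
```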
59
Advanced Techniques for Prompt Engineering
60
Prompt Engineering Techniques
● Many advanced prompting techniques have been designed to
improve performance on complex tasks
○ Few-shot prompts
○ Chain-of-thought (CoT) prompting
○ Self-Consistency
○ Generate Knowledge Prompting
○ ReAct
61
Few-shot Prompts
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A: The answer is True.
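Programmatically, few-shot prompting just means concatenating labelled exemplars in front of the new input. Below is a minimal sketch using two exemplars from the slide; the commented-out generate call is a stand-in for whichever LLM client you use.

```python
exemplars = [
    ("The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.",
     "The answer is False."),
    ("The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.",
     "The answer is True."),
]
query = "The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1."

prompt = "\n".join(f"{q}\nA: {a}" for q, a in exemplars) + f"\n{query}\nA:"
print(prompt)
# completion = generate(prompt)   # hypothetical call to an LLM API
```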
62
Chain-of-Thought (CoT) Prompting
● Chain-of-thought (CoT) prompting is a recently developed prompting method that
encourages the LLM to explain its reasoning.
63
Chain-of-Thought (CoT) Prompting
● Prompting can be further improved by instructing the model to reason about
the task when responding
○ This is very useful for tasks that require reasoning
○ You can combine it with few-shot prompting to get better results
○ You can also do zero-shot CoT where exemplars are not available (see the sketch below)
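Zero-shot CoT in particular is trivial to apply: append a trigger phrase such as "Let's think step by step." to the question. A minimal sketch follows; the commented-out generate call is a placeholder for any LLM client.

```python
question = (
    "A juggler has 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

zero_shot_cot_prompt = question + "\nLet's think step by step."
print(zero_shot_cot_prompt)
# completion = generate(zero_shot_cot_prompt)   # hypothetical LLM call
```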
65
Self-Consistency
66
Self-Consistency
Source: Wang et al, Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022
67
Self-Consistency
Hello,
Cheers,
Donny
Classify the above email as IMPORTANT or NOT IMPORTANT as it relates to a software company.
Let's think step by step.
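Self-consistency itself is easy to sketch: sample several chain-of-thought completions at a non-zero temperature, extract the final answer from each, and take a majority vote. In the sketch below, generate is a placeholder for any LLM call and the answer extraction is deliberately naive.

```python
from collections import Counter

def self_consistent_answer(prompt, generate, n_samples=5, temperature=0.7):
    """Sample several CoT completions and majority-vote on the final answer."""
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=temperature)   # placeholder LLM call
        answers.append(completion.strip().splitlines()[-1])      # naive: last line holds the answer
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

# Usage sketch: answer, agreement = self_consistent_answer(cot_prompt, generate=my_llm_fn)
```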
68
Generate Knowledge Prompting
● This technique involves using additional knowledge, provided as part of the context, to
improve results on complex tasks such as commonsense reasoning
● The knowledge used in the context is generated by a model and included in the prompt to
make a prediction
○ The highest-confidence prediction is used (see the sketch below)
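The two stages can be sketched as two rounds of LLM calls; generate below is a hypothetical placeholder, and the prompts are simplified versions of the knowledge-generation pattern.

```python
def generated_knowledge_answers(question, generate, n_knowledge=3):
    """Stage 1: generate knowledge statements. Stage 2: answer with each statement prepended."""
    knowledge = [
        generate(f"Generate a fact that could help answer the question.\n"
                 f"Question: {question}\nKnowledge:")
        for _ in range(n_knowledge)
    ]
    # One answer proposal per knowledge statement; selecting the highest-confidence
    # proposal (e.g., by log-probability or voting) is left to the application.
    return [generate(f"{k}\nQuestion: {question}\nAnswer:") for k in knowledge]
```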
69
Generate Knowledge Prompting Example
● The first step is to generate knowledge. Below is an example of how to
generate the knowledge samples
70
Generate Knowledge Prompting Example
● The knowledge samples are then used to form knowledge-augmented
questions and obtain answer proposals
○ The highest-confidence response is selected as the final answer
71
ReAct Prompting
● (Yao et al., 2022) introduced a framework named ReAct where LLMs are
used to generate both reasoning traces and task-specific actions in an
interleaved manner.
● The ReAct framework allows LLMs to interact with external tools to
retrieve additional information, leading to more reliable and factual
responses.
● ReAct can be combined with chain-of-thought (CoT), allowing the use of both
internal knowledge and external information obtained during reasoning (see the
sketch below).
Source: (Yao et al., 2022) ReAct: Synergizing Reasoning and Acting in Language Models
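The interleaving of reasoning and acting can be sketched as a simple loop: the model emits a Thought and an Action, the action is executed against an external tool, and the Observation is appended to the prompt before the next step. Everything below (the generate function, the toy action parser, the tool registry) is a simplified stand-in, not the authors' implementation.

```python
import re

def parse_action(step_text):
    """Extract 'Action: tool[argument]' from the model's output (toy parser)."""
    match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step_text)
    return (match.group(1), match.group(2)) if match else (None, None)

def react_loop(question, tools, generate, max_steps=5):
    """Minimal ReAct-style loop: alternate model reasoning with tool calls."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(prompt + "Thought:")            # model returns Thought / Action / Final Answer text
        prompt += "Thought:" + step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        tool_name, arg = parse_action(step)
        if tool_name in tools:
            observation = tools[tool_name](arg)         # execute the external tool
            prompt += f"Observation: {observation}\n"
    return None

# Usage sketch: react_loop("Where is VNU-UET located?", {"search": my_search_fn}, my_llm_fn)
```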
72
ReAct Agent pipeline
73
74
More Prompting Techniques
● Refer to: https://www.promptingguide.ai/techniques
75
LLM agents
76
General components of an agent
Source: https://developer.nvidia.com/blog/introduction-to-llm-agents/ 77
Summary
● Generative AI
● Large Language Models and Multimodal LLMs
● Prompt Engineering
● LLM agents