
LLM_introduction 2024

The document provides an overview of Generative AI and Large Language Models (LLMs), highlighting their history, capabilities, and applications. It discusses key characteristics of Generative AI, such as creativity and versatility, and outlines various prompting techniques to enhance LLM performance. Additionally, it addresses the risks associated with LLMs and the importance of prompt engineering in leveraging these models effectively.

Generative AI (LLMs)

Nguyen Van Vinh - UET


A brief history of NLP

2
A brief history of NLP

● 2018: BERT
● 2019: T5, RoBERTa
● 2020: GPT-3
● 2022: ChatGPT

3
Outline

● Generative AI
● Large Language Models
● Prompt Engineering
● LLM agents

UET-FIT LLM and Its Applications 4


Generative AI market

5
Generative AI market

Source: https://www.precedenceresearch.com/generative-ai-market
6
Generative AI

● The field of artificial intelligence is concerned with developing computer systems that mimic (or exceed) human behavior

7
Key characteristics of Generative AI

● Creativity: It can produce novel outputs that do not exist in the training
data
● Learning patterns: It analyzes and learns from vast amounts of data to
replicate style or logic
● Versatility: Generative AI can be applied across multiple media types,
such as text, images, and sound

8
What is Generative AI capable of?

9
What is Generative AI capable of?

● Prompt: a picture of a narwhal giving a lecture on generative AI. Put the narwhal on a stage in front of a large audience. The narwhal should appear scholarly and is presenting off of a slide deck, gesturing at some dense equations on the slide

10
Source: https://stablediffusionweb.com/app/image-generator
What is Generative AI capable of?

11
Generative AI in math

Source: https://waveline.ai/blog/generative-ai-mathematically
12
Types of Generative AI models

13
Source: https://www.xenonstack.com/blog/generative-ai-models
Intelligence in Multi-Sensory Data

● Harnessing Multimodality
○ This world we live in is replete with multimodal information & signals, not just language

14
Intelligence in Multi-Sensory Data
● Building Multimodal LLMs (MLLMs)
○ Can we transfer the success of LLMs to MLLMs, enabling LLMs to comprehend multimodal information as deeply as they understand language?

● Perceiving and interacting with the world as HUMAN BEINGS do might be the key to achieving human-level AI.
15
Intelligence in Multi-Sensory Data

● Trends of MLLMs

16
Large Language Models

17
Language Modeling (Mô hình ngôn ngữ)?

● What is the probability of “Tôi trình bày ChatGPT tại Trường ĐH Công Nghệ” (“I present ChatGPT at the University of Engineering and Technology”)?
● What is the probability of the scrambled word order “Công Nghệ học Đại trình bày ChatGPT tại Tôi”?
● P(“Tôi trình bày ChatGPT tại Trường ĐH Công nghệ, địa điểm …”) or P(… | Tôi trình bày ChatGPT tại Trường ĐH Công nghệ, địa điểm)?
● A model that computes either of these for a sequence W = w1, w2, w3, w4, w5 … wn,
P(W) or P(wn | w1, w2 … wn-1),
is called a language model
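As a small addition (not on the original slide), the joint probability factorizes by the chain rule, which is exactly what an autoregressive language model estimates token by token:

P(W) = P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})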

18
LM decoder

19
Large Language Model

20
Large Language Model

21
22
OpenAI o1

23
Large Language Models - yottaFlops of Compute

Source: https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf 24
Why LLMs?

● Double Descent

25
Why LLMs?
● Scaling Law for Neural Language Models
○ Performance depends strongly on scale! We keep getting better performance as we scale up the model size, data, and compute!

Source: Jared Kaplan et al, Scaling Laws for Neural Language Models, 2020 26
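For reference (my addition, paraphrasing the functional form reported in the cited paper), the loss follows a power law in model size N, dataset size D, and compute C when the other factors are not bottlenecks; for model size:

L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}

where N_c and \alpha_N are empirically fitted constants, with analogous laws L(D) and L(C).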
Why LLMs?

● Generalization
○ We can now use one single model to solve many tasks

27
Why LLMs? Emergence in few-shot prompting
Emergent Abilities
● Some abilities of LMs are not present in smaller models but are present in larger models
What is pre-training / fine-tuning?

● “Pre-train” a model on a large dataset for task X, then “fine-tune” it on a dataset for task Y
● Key idea: X is somewhat related to Y, so a model that can do X will have
some good neural representations for Y as well (transfer learning)
● ImageNet pre-training is huge in computer vision: learning generic visual
features for recognizing objects

Can we find some task X that can be useful for a wide range of downstream tasks Y?

29
BERT: Bidirectional Encoder Representations from
Transformers

Source: (Devlin et al, 2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 30
Masked Language Modeling (MLM)

● Q: Why can’t we do language modeling with bidirectional models?

● Solution: Mask out k% of the input words, and then predict the masked words
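A minimal sketch of the masking step (my illustration, not from the slides; 15% is the rate used by BERT):

import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    # Randomly replace a fraction of tokens with [MASK]; the model is trained
    # to predict the original tokens at the masked positions.
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok          # remember the original token as the prediction target
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(tokens))  # e.g. (['the', '[MASK]', 'brown', ...], {1: 'quick'})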

31
Next Sentence Prediction (NSP)

32
BERT pre-training

33
RoBERTa
● BERT is still under-trained
● Removed the next sentence prediction pre-training — it adds more noise than
benefits!
● Trained longer with 10x data & bigger batch sizes
● Pre-trained on 1,024 V100 GPUs for one day in 2019

34
(Liu et al., 2019): RoBERTa: A Robustly Optimized BERT Pretraining Approach
Three major forms of pre-training (LLMs)

35
Text-to-text models
● Encoder-only models (e.g., BERT) enjoy the benefits of bidirectionality but they
can’t be used to generate text
● Decoder-only models (e.g., GPT) can do generation, but they are left-to-right LMs
● Text-to-text models combine the best of both worlds! (T5, BART)

T5 = Text-to-Text Transfer Transformer

(Raffel et al., 2020): Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 36
How to use these pre-trained models?

37
From GPT to GPT-2 to GPT-3

38
GPT-2 architecture

39
GPT-3: language models are few-shot learners

● GPT-2 → GPT-3: 1.5B → 175B (# of parameters), ~14B → 300B (# of tokens)

40
GPT-3’s in-context learning

41
[2020] GPT-3 to [2022] ChatGPT

What’s new?
● Training on code

● Supervised
instruction tuning

● RLHF =
Reinforcement
learning from
human feedback

Source: Fu, 2022, “How does GPT Obtain its Ability? Tracing Emergent Abilities of Language
Models to their Sources" 42
43
Risks of Large Language Models

● LLMs make mistakes
○ (falsehoods, hallucinations)
● LLMs can be misused
○ (misinformation, spam)
● LLMs can cause harms
○ (toxicity, biases, stereotypes)
● LLMs can be attacked
○ (adversarial examples, poisoning, prompt injection)
● LLMs are expensive

45
Pretraining + Prompting Paradigm

46
47
Source: https://gradientflow.com/llm-triad-tune-prompt-reward/
Temperature in LLMs

● Temperature and Top-p sampling are two essential parameters that can
be tweaked to control the output of LLMs

● Temperature (0-2): This parameter determines the creativity and diversity of the text generated by the LLM. A higher temperature value (e.g., 1.5) leads to more diverse and creative text, while a lower value (e.g., 0.5) results in more focused and deterministic text.
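A minimal sketch (my addition) of what temperature does under the hood: the logits are divided by the temperature before the softmax, so low values sharpen the next-token distribution and high values flatten it:

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature, then normalize; T < 1 sharpens, T > 1 flattens.
    z = np.array(logits, dtype=float) / temperature
    z -= z.max()                      # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5, -1.0]
print(softmax_with_temperature(logits, 0.5))   # peaked: most mass on the top token
print(softmax_with_temperature(logits, 1.5))   # flatter: more diverse sampling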

48
Top-p Sampling in LLMs
● Top-p sampling (0-1): This parameter balances diversity and high-probability words by sampling only from the smallest set of most probable tokens whose cumulative probability mass is greater than or equal to the threshold p.
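A minimal nucleus-sampling sketch (my illustration of the idea, operating on a toy next-token distribution):

import numpy as np

def nucleus_sample(probs, p=0.9, seed=None):
    # Keep the smallest set of highest-probability tokens whose cumulative mass
    # reaches p, renormalize, and sample the next token from that set only.
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                # token ids sorted by descending probability
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=kept_probs))

next_token_probs = [0.50, 0.25, 0.15, 0.07, 0.03]  # toy distribution over 5 tokens
print(nucleus_sample(next_token_probs, p=0.8))     # samples only among tokens 0, 1, 2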

49
Why Prompt Engineering?
● Why learn prompt engineering?
○ Important for research, discoveries, and advancement
○ Helps to test and evaluate the limitations of LLMs
○ Enables all kinds of innovative applications on top of LLMs

50
Source: https://jobs.lever.co/Anthropic/e3cde481-d446-460f-b576-93cab67bd1ed
First Basic Prompt

51
Elements of a Prompt
● A prompt is composed of the following components:

52
Settings to keep in mind

● When prompting a language model you should keep in mind a few settings
● You can get very different results with prompts when using different settings
● One important setting is controlling how deterministic the model is when generating completions for prompts
○ Temperature and top_p are two important parameters to keep in mind
○ Generally, keep these low if you are looking for exact answers
○ Keep them high if you are looking for more diverse responses
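As an illustration (a minimal sketch assuming the OpenAI Python SDK, openai >= 1.0; any chat-completion API that exposes temperature and top_p behaves analogously):

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice for this sketch
    messages=[{"role": "user", "content": "Explain what a language model is in one sentence."}],
    temperature=0.2,      # low temperature: focused, near-deterministic output
    top_p=1.0,            # leave top_p at its default; tune one knob at a time
)
print(response.choices[0].message.content)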

53
Designing Prompts for Different Tasks

● In the next few slides, we will cover a few examples of common tasks
using different prompts
● Tasks covered:
○ Text Summarization
○ Question Answering
○ Text Classification
○ Code Generation
○ …

54
Text Summarization

55
Question Answering

Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer.

Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.

Question: What was OKT3 originally sourced from?
Answer: According to the context, OKT3 was originally sourced from mice.

56
Text Classification
Classify the text into neutral, negative or positive.
Text: I think the food was okay.
Sentiment: Neutral

57
Code Generation

"""
Table departments, columns = [DepartmentId, DepartmentName]
Table students, columns = [DepartmentId, StudentId, StudentName]
Create a MySQL query for all students in the Computer Science Department
"""

SELECT students.StudentName
FROM students
JOIN departments ON departments.DepartmentId = students.DepartmentId
WHERE departments.DepartmentName = 'Computer Science';

58
Reasoning

Find the prime numbers from 1 to 100 and state how many primes there are in total.
(Original Vietnamese prompt: "Tìm các số nguyên tố từ 1 đến 100 và cho biết có tất cả bao nhiêu số nguyên tố?")

59
Advanced Techniques for Prompt Engineering

60
Prompt Engineering Techniques
● Many advanced prompting techniques have been designed to improve performance on complex tasks
○ Few-shot prompting
○ Chain-of-thought (CoT) prompting
○ Self-consistency
○ Generated knowledge prompting
○ ReAct

61
Few-shot Prompts

● Few-shot prompting allows us to provide examples in prompts to steer the model towards better performance

The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A: The answer is True.
(Note: this last answer is incorrect, since 15 + 5 + 13 + 7 + 1 = 41, which is odd; the example illustrates that few-shot prompting alone is often not enough for reasoning tasks.)

62
Chain-of-Thought (CoT) Prompting
● Chain-of-thought (CoT) prompting is a recently developed prompting method that encourages the LLM to explain its reasoning.
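An illustrative CoT exemplar (my rendering of the canonical example from the cited paper; the slide itself shows a figure):

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

The worked "A:" demonstration teaches the model to emit intermediate reasoning steps before the final answer for the test question that follows.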

63
Chain-of-Thought (CoT) Prompting
● Prompting can be further improved by instructing the model to reason about
the task when responding
○ This is very useful for tasks that require reasoning
○ You can combine it with few-shot prompting to get better results
○ You can also do zero-shot CoT where exemplars are not available

Source: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, 2022 64


Zero-Shot CoT
● Involves adding "Let's think step by step" to the original prompt
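A minimal illustration (the widely used example popularized with this technique):

Prompt: I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with? Let's think step by step.

The trailing phrase elicits intermediate reasoning (10 - 2 - 2 + 5 - 1 = 10) before the final answer, without any hand-written exemplars.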

65
Self-Consistency

● Self-consistency is an approach that simply asks a model the same prompt multiple times and takes the majority result as the final answer. It is a follow-up to CoT and is more powerful when used in conjunction with it.
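A minimal sketch of the sampling-and-voting loop (my illustration; generate and extract_answer are hypothetical stand-ins for your LLM call and answer parser):

from collections import Counter

def self_consistent_answer(prompt, generate, extract_answer, n_samples=10):
    # Sample several chain-of-thought completions for the same prompt (temperature > 0),
    # extract the final answer from each, and return the majority-vote answer.
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]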

66
Self-Consistency

● Self-consistency has been shown to improve results on arithmetic, commonsense and symbolic reasoning tasks.
● (Wang et al., 2022) discuss a more complex method for selecting the final
answer, which deals with the LLM generated probabilities for each chain
of thought.

Source: Wang et al, Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022
67
Self-Consistency

Hello,

I have discovered a major security vulnerability in your system. Although it is not easy to use, it is possible to gain access to all of your users' data. I have attached a proof of concept. Please fix this issue as soon as possible.

Cheers,

Donny

Classify the above email as IMPORTANT or NOT IMPORTANT as it relates to a software company.
Let's think step by step.

68
Generate Knowledge Prompting
● This technique involves using additional knowledge provided as part of the context to
improve results on complex tasks such as commonsense reasoning
● The knowledge used in the context is generated by a model and used in the prompt to
make a prediction
○ Highest-confidence prediction is used

Source: Generated Knowledge Prompting for Commonsense Reasoning

69
Generate Knowledge Prompting Example
● The first step is to generate knowledge. Below is an example of how to
generate the knowledge samples

70
Generate Knowledge Prompting Example
● The knowledge samples are then used to generate knowledge augmented
questions to get answer proposals
○ The highest-confidence response is selected as final answer
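An illustrative prompt in the style of the cited paper (my rendering; the slides show screenshots):

Input: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: The objective of golf is to play a set of holes in the least number of strokes. The player with the lowest total score wins, so a higher point total is not the goal.
Answer (using the knowledge): No.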

71
ReAct Prompting

● (Yao et al., 2022) introduced a framework named ReAct where LLMs are
used to generate both reasoning traces and task-specific actions in an
interleaved manner.
● The ReAct framework can allow LLMs to interact with external tools to
retrieve additional information that leads to more reliable and factual
responses.
● ReAct can be combined with chain-of-thought (CoT), allowing the model to use both internal knowledge and external information obtained during reasoning.

Source: (Yao et al., 2022) ReAct: Synergizing Reasoning and Acting in Language Models
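An illustrative (hypothetical, shortened) ReAct trace in the Thought / Action / Observation format the framework uses:

Question: What is the elevation of the highest mountain in the country where the Eiffel Tower is located?
Thought 1: The Eiffel Tower is in France, so I need the highest mountain in France and its elevation.
Action 1: Search[highest mountain in France]
Observation 1: Mont Blanc is the highest mountain in France.
Thought 2: Now I need the elevation of Mont Blanc.
Action 2: Search[Mont Blanc elevation]
Observation 2: Mont Blanc is about 4,808 m high.
Thought 3: I have enough information to answer.
Action 3: Finish[about 4,808 m]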

72
ReAct Agent pipeline

73
74
More Prompting Techniques
● Refer to: https://www.promptingguide.ai/techniques

75
LLM agents

76
General components of an agent

Source: https://developer.nvidia.com/blog/introduction-to-llm-agents/ 77
Summary

● Generative AI
● Large Language Models and Multimodal LLMs
● Prompt Engineering
● LLM agents

UET-FIT LLM and Its Applications 78
