Modern Natural Language Processing:
Where do we go from here?
Antoine Bosselut
Section Outline
• Advances: NLP Successes, Pretraining, Scale
• New Problems: Prompting, Knowledge & Reasoning, Retrieval-Augmentation,
Robustness, Multimodality
Core Methods
Word Embeddings
• Words and other tokens become vectors; no longer discrete symbols!
• Need to define a vocabulary of words (or token types) V that our system can map to vectors
• Word embeddings can be learned in a self-supervised manner from large
quantities of raw text
• Learning word embeddings from scratch using labeled data for a task is data-
inefficient!
• Three main algorithms: Continuous Bag of Words (CBOW), Skip-gram, and
GloVe
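As a rough illustration, a minimal skip-gram-style training loop in PyTorch (the toy corpus, window size, and dimensions are made up for the example):

```python
import torch
import torch.nn as nn

# Toy corpus and vocabulary (illustrative only)
corpus = "i really enjoyed the movie we watched on saturday".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}

# Skip-gram training pairs: (center word, context word) within a +/-2 window
pairs = [(word2id[corpus[i]], word2id[corpus[j]])
         for i in range(len(corpus))
         for j in range(max(0, i - 2), min(len(corpus), i + 3)) if j != i]

center_emb = nn.Embedding(len(vocab), 16)   # input (center) embeddings
context_emb = nn.Embedding(len(vocab), 16)  # output (context) embeddings
optimizer = torch.optim.Adam(list(center_emb.parameters()) + list(context_emb.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    centers = torch.tensor([c for c, _ in pairs])
    contexts = torch.tensor([o for _, o in pairs])
    # Score every vocabulary word against each center word, then maximise the
    # probability of the observed context word (softmax over the vocabulary)
    logits = center_emb(centers) @ context_emb.weight.T
    loss = loss_fn(logits, contexts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(center_emb(torch.tensor(word2id["movie"])))  # learned vector for "movie"
```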
LMs & RNNs
• Language models learn to estimate the distribution over the next word given a
context
• Early neural LMs (and n-gram models) suffered from fixed context windows
• Recurrent neural networks can theoretically learn to model an unbounded
context length
• no increase in model size because weights are shared across time steps
• Practically, however, vanishing gradients stop vanilla RNNs from learning useful
long-range dependencies
• LSTMs are variants of recurrent networks that mitigate the vanishing gradient
problem
• used for many sequence-to-sequence tasks
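A minimal LSTM language-model sketch in PyTorch (illustrative sizes): the same weights are applied at every time step, and the model predicts a distribution over the next token.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Toy LSTM LM: embed tokens, run an LSTM over time, predict the next token."""
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embed(token_ids)                 # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)                       # (batch, seq_len, hidden_dim)
        return self.out(h)                        # logits over the next token at each position

model = LSTMLanguageModel(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 10))          # fake batch of token ids
logits = model(tokens[:, :-1])                    # predict token t+1 from tokens <= t
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
```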
Transformers
• Temporal Bottleneck: Vanishing gradients stop many RNN architectures from
learning long-range dependencies
• Parallelisation Bottleneck: Each RNN state depends on the previous time step's hidden state, so states must be computed in series
• Attention: Direct connections between output states and inputs (solves
temporal bottleneck)
• Self-Attention: Remove recurrence over input, allowing parallel computation
for encoding
• Transformers use self-attention to encode sequences, but now require position
embeddings to capture sequence order
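A minimal single-head scaled dot-product self-attention sketch in PyTorch (illustrative dimensions; real Transformers add multiple heads, masking, residual connections, and the position embeddings mentioned above):

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (no mask, no heads)."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))  # (batch, seq, seq)
        attn = scores.softmax(dim=-1)              # each position attends to every other position
        return attn @ V                            # weighted sum of value vectors

x = torch.randn(2, 5, 64)                          # 2 sequences of 5 token vectors
out = SelfAttention()(x)                           # same shape as x: (2, 5, 64)
```

Because every position is computed from the same input tensor, all positions can be encoded in parallel, which is what removes the parallelisation bottleneck.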
Text Generation
• Text generation is the foundation of many useful NLP applications (e.g.,
translation, summarisation, dialogue systems)
• Autoregressive: models generate one token at a time, using the context and previously generated tokens as inputs to generate the next token
• Teacher forcing is the standard algorithm for training text generators
• A variety of decoding algorithms can be used to generate text from models,
each trading off expected quality vs. diversity in different ways.
• Automatic evaluation of NLG systems (content overlap, model-based, human)
is difficult as most metrics fall short of reliable estimates of output quality
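A rough sketch of two such decoding loops, greedy decoding and temperature sampling; `model` here stands in for any autoregressive LM that maps token ids to next-token logits (not a production decoder):

```python
import torch

def decode(model, prefix_ids, max_new_tokens=20, temperature=0.0, eos_id=None):
    """Greedy decoding when temperature == 0, otherwise temperature sampling.
    `model` is assumed to map a (1, t) tensor of token ids to (1, t, vocab) logits."""
    ids = prefix_ids.clone()
    for _ in range(max_new_tokens):
        logits = model(ids)[0, -1]                       # distribution over the next token
        if temperature == 0.0:
            next_id = logits.argmax()                    # greedy: highest-probability token
        else:
            probs = (logits / temperature).softmax(dim=-1)
            next_id = torch.multinomial(probs, 1)[0]     # sampling: more diverse output
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids
```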
Deep Learning Successes in NLP
Question
What did these ingredients propel?
Pretraining
A massive text corpus is used to learn a Transformer language model
(Radford et al., 2018, 2019, many others)
Pretraining: Two Approaches
• Causal (left-to-right) language modeling: "I really enjoyed the movie we watched on ____" (Radford et al., 2018, 2019, many others)
• Masked language modeling: "I really enjoyed the ____ we watched on Saturday!" (Devlin et al., 2018; Liu et al., 2020)
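A sketch contrasting how the two objectives build training targets from the same sentence (the token ids and mask id below are placeholders, not from any real tokenizer):

```python
import torch

tokens = torch.tensor([5, 23, 87, 12, 40, 9])   # a sentence as (made-up) token ids
MASK_ID = 103                                    # placeholder mask-token id

# Causal (left-to-right) LM: predict token t+1 from tokens <= t
causal_inputs, causal_targets = tokens[:-1], tokens[1:]

# Masked LM: corrupt a random subset of positions and predict only those
masked_inputs = tokens.clone()
is_masked = torch.rand(len(tokens)) < 0.15       # ~15% of positions, as in BERT
masked_inputs[is_masked] = MASK_ID
mlm_targets = torch.where(is_masked, tokens, torch.full_like(tokens, -100))  # -100 = ignored by the loss
```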
Fine-tuning a single model
• Prepend special token [CLS]: Classify output embedding for this token
• Can use same model for classification tasks, sentence pair tasks, sequence
labelling tasks, and many more!
Devlin et al. (2019)
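A sketch of [CLS]-based fine-tuning with the Hugging Face transformers library (the 2-way classifier head is an arbitrary example, not part of the original slides):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, 2)   # e.g. 2-way sentiment

batch = tokenizer(["I really enjoyed the movie!"], return_tensors="pt")
hidden = encoder(**batch).last_hidden_state             # (batch, seq_len, hidden)
cls_vector = hidden[:, 0]                                # embedding of the prepended [CLS] token
logits = classifier(cls_vector)                          # fine-tune encoder + classifier jointly
```

The same encoder can be reused across tasks; only the small task-specific head (and the input format) changes.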
Pretraining Improvements!
Superhuman results on benchmark datasets!
All top models use pretrained transformers!
Scale: Parameters
[Plot: # parameters in model over time]
Scale: Data
ELMo: 1B training tokens
BERT: 3.3B training tokens
RoBERTa: ~30B training tokens
Slide Credit: Mohit Iyyer
Scale: FLOPs
Slide Credit: Mohit Iyyer
Scaling Laws
Kaplan et al. (2020)
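Kaplan et al. (2020) find that test loss follows power laws in model size, data size, and compute (when the other factors are not the bottleneck). Schematically, with the constants and exponents fitted empirically (values omitted here):

```latex
% Test loss as a power law in model size N, dataset size D, and compute C;
% N_c, D_c, C_c and the exponents \alpha_N, \alpha_D, \alpha_C are fitted constants.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```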
Question
Why do we want to make these models as big as possible?
Efficient tuning: LoRA
• During fine-tuning:
- Keep all pretrained parameters frozen
- LoRA: initialise a new low-rank feedforward net (FFN) alongside components of the transformer blocks
- Keep these FFN layers limited in number of parameters
‣ Next to the frozen pretrained weight W ∈ ℝ^{d×d}, the added layers compute h = f(x) = Wx + BAx, with A ∈ ℝ^{r×d} initialised from 𝒩(0, σ²) and B ∈ ℝ^{d×r} initialised to 0
‣ # parameters in the FFN layers is 2 · d · r, so keep r small
‣ r is the hidden dimension of the FFN
- Only update these FFN layers
Hu et al. (2021)
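A minimal sketch of a LoRA-style linear layer in PyTorch (the rank, σ, and sizes are illustrative; real implementations also apply a scaling factor and handle the pretrained bias, omitted here for brevity):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank update B @ A."""
    def __init__(self, pretrained: nn.Linear, r: int = 8, sigma: float = 0.01):
        super().__init__()
        d_out, d_in = pretrained.weight.shape
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)  # frozen W
        self.A = nn.Parameter(torch.randn(r, d_in) * sigma)   # A ~ N(0, sigma^2)
        self.B = nn.Parameter(torch.zeros(d_out, r))          # B = 0, so B @ A = 0 at initialisation

    def forward(self, x):
        # h = W x + B A x ; only A and B (2 * d * r parameters when d_in = d_out = d) get gradients
        return x @ self.weight.T + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
```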
In-context Learning: A new paradigm!
• At very large scale, language models exhibit emergent in-context learning abilities
• Providing examples as input that depict the desired behaviour is enough for the model to replicate it (see the prompt sketch below)
• No parameter updates are required, though further training can improve this ability
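A sketch of what such a few-shot prompt might look like (the sentiment task and examples are invented for illustration):

```python
prompt = """Review: The plot was predictable and the acting was flat.
Sentiment: negative

Review: A beautiful, moving film with a stellar cast.
Sentiment: positive

Review: I really enjoyed the movie we watched on Saturday!
Sentiment:"""

# A large LM simply continues the pattern, e.g. completion = lm.generate(prompt),
# with no gradient updates to the model's parameters.
```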
Chain-of-thought Reasoning
Standard Prompting
Input:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Model Output:
A: The answer is 27.

Chain of Thought Prompting
Input:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Model Output:
A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.
Model self-rationalizes through text generation
What do those two abilities remind you of?
ChatGPT!
Alignment
Large models can be aligned to new behaviours
Outcome: Many Tasks
Outcome: Personalization
Question
Why are these language models so effective at scale?
Encoded Knowledge
World knowledge is implicitly encoded in LM parameters! (e.g., that barbershops are places to get buzz cuts)
Input to BERT (24-layer Transformer): "Bob went to the <MASK> to get a buzz cut"
Top predictions: barbershop: 54%, barber: 20%, salon: 6%, stylist: 4%, …
Slide Credit: Mohit Iyyer
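The same kind of probe can be run with the Hugging Face fill-mask pipeline (exact probabilities will differ from the figures on the slide):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Bob went to the [MASK] to get a buzz cut."):
    print(f"{pred['token_str']:>12s}  {pred['score']:.2%}")   # top predicted fillers and their probabilities
```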
Question
Why might this be a bad idea?
[Figure slide] Guu et al., 2020 ("REALM")
Question
What could we do instead?
Retrieval-Augmented LLMs
Lewis et al., 2020; Chang et al., 2020; Borgeaud et al., 2021
[Figure slides; slide credit: Mohit Iyyer]
Retrieval-Augmented LLMs
• Text corpora are readily available at scale and require no processing
• We have powerful methods for encoding text (e.g., BERT)
• However, these methods don’t really work yet with larger units of text (e.g.,
books)
• “Long-context” NLP is an active area of research!
Slide credit: Mohit Iyyer
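A rough sketch of the retrieve-then-read pattern, assuming the sentence-transformers library as the text encoder (the model name and corpus are illustrative; large-scale systems use approximate nearest-neighbour indexes such as FAISS):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")        # illustrative encoder choice
corpus = [
    "EPFL is a technical university in Lausanne, Switzerland.",
    "Barbershops are places where people get haircuts such as buzz cuts.",
]
corpus_emb = encoder.encode(corpus, convert_to_tensor=True)

query = "Where can Bob get a buzz cut?"
query_emb = encoder.encode(query, convert_to_tensor=True)
best = util.cos_sim(query_emb, corpus_emb).argmax().item()   # nearest passage by cosine similarity

# Condition the LM on the retrieved passage instead of relying only on its parameters
prompt = f"Context: {corpus[best]}\n\nQuestion: {query}\nAnswer:"
```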
Recap Question
Why might we want to integrate retrieval components?
Remaining Problems!
Robustness
Deep learning models exploit biases (Bolukbasi et al., 2016), annotation
artifacts (Gururangan et al., 2018), surface patterns (Li & Gauthier, 2017), etc.
They struggle to learn robust understanding abilities
“All the impressive achievements of deep
learning amount to just curve fitting”
(Pearl, 2018)
Multimodality
Image captioning with attention
Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015
Multimodality
Masked language modeling extended to paired image and text inputs: "I really enjoyed the ____ we watched on Saturday!"
Lu et al., 2019
Multimodality
• CLIP: using natural language training to improve computer vision (https://openai.com/blog/clip/)
• DALL-E: learning to generate images from natural language descriptions (https://openai.com/blog/dall-e/)
Thanks for a great semester!