
Generative AI Innovative Applications

Transformers for Natural Language Processing and Computer Vision
1132GAIIA02
MBA, IM, NTPU (M6031) (Spring 2025)
Tue 2, 3, 4 (9:10-12:00) (B3F17)

Min-Yuh Day, Ph.D.


https://meet.google.com/paj-zhhj-mya

Professor
Institute of Information Management, National Taipei University
https://web.ntpu.edu.tw/~myday
1
2025-02-25
Syllabus
Week Date Subject/Topics
1 2025/02/18 Introduction to Generative AI Innovative Applications
2 2025/02/25 Transformers for Natural Language Processing and Computer Vision
3 2025/03/04 Large Language Models (LLMs), NVIDIA Building RAG Agents with LLMs Part I
4 2025/03/11 Case Study on Generative AI Innovative Applications I
5 2025/03/18 NVIDIA Building RAG Agents with LLMs Part II
6 2025/03/25 NVIDIA Building RAG Agents with LLMs Part III

2
Syllabus
Week Date Subject/Topics
7 2025/04/01 Self-Learning
8 2025/04/08 Midterm Project Report
9 2025/04/15 Generative AI for Multimodal Information Generation
10 2025/04/22 NVIDIA Generative AI with Diffusion Models Part I
11 2025/04/29 NVIDIA Generative AI with Diffusion Models Part II
12 2025/05/06 Case Study on Generative AI Innovative Applications II

3
Syllabus
Week Date Subject/Topics
13 2025/05/13 NVIDIA Generative AI with Diffusion Models Part III
14 2025/05/20 AI Agents and Large Multimodal Agents (LMAs)
15 2025/05/27 Final Project Report I
16 2025/06/03 Final Project Report II

4
Transformers for
Natural Language Processing
and Computer Vision

5
Denis Rothman (2024),
Transformers for Natural Language Processing and Computer Vision:
Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3,
3rd Edition, Packt Publishing

Source: https://www.amazon.com/Transformers-Natural-Language-Processing-Computer/dp/1805128728 6
Denis Rothman (2024),
Transformers for Natural Language Processing and Computer Vision:
Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3,
3rd Edition, Packt Publishing
1. What Are Transformers?
2. Getting Started with the Architecture of the Transformer Model
3. Emergent vs Downstream Tasks: The Unseen Depths of Transformers
4. Advancements in Translations with Google Trax, Google Translate, and Gemini
5. Diving into Fine-Tuning through BERT
6. Pretraining a Transformer from Scratch through RoBERTa
7. The Generative AI Revolution with ChatGPT
8. Fine-Tuning OpenAI GPT Models
9. Shattering the Black Box with Interpretable Tools
10. Investigating the Role of Tokenizers in Shaping Transformer Models
11. Leveraging LLM Embeddings as an Alternative to Fine-Tuning
12. Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
13. Summarization with T5 and ChatGPT
14. Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
15. Guarding the Giants: Mitigating Risks in Large Language Models
16. Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
17. Transcending the Image-Text Boundary with Stable Diffusion
18. Hugging Face AutoTrain: Training Vision Models without Coding
19. On the Road to Functional AGI with HuggingGPT and its Peers
20. Beyond Human-Designed Prompts with Generative Ideation
Source: https://github.com/Denis2054/Transformers-for-NLP-and-Computer-Vision-3rd-Edition 7
Denis Rothman (2024),
Transformers for Natural Language Processing and Computer Vision:
Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3,
3rd Edition, Packt Publishing

Source: https://github.com/Denis2054/Transformers-for-NLP-and-Computer-Vision-3rd-Edition 8
Jay Alammar and Maarten Grootendorst (2024),
Hands-On Large Language Models:
Language Understanding and Generation,
O'Reilly Media

Source: https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961/ 9
Jay Alammar and Maarten Grootendorst (2024),
Hands-On Large Language Models:
Language Understanding and Generation,
O'Reilly Media
Chapter 1: Introduction to Language Models
Chapter 2: Tokens and Embeddings
Chapter 3: Looking Inside Transformer LLMs
Chapter 4: Text Classification
Chapter 5: Text Clustering and Topic Modeling
Chapter 6: Prompt Engineering
Chapter 7: Advanced Text Generation Techniques and Tools
Chapter 8: Semantic Search and Retrieval-Augmented Generation
Chapter 9: Multimodal Large Language Models
Chapter 10: Creating Text Embedding Models
Chapter 11: Fine-tuning Representation Models for Classification
Chapter 12: Fine-tuning Generation Models
Source: https://github.com/HandsOnLLM/Hands-On-Large-Language-Models 10
Generative AI
Large Language Models
(LLMs)
Foundation Models
11
Generative AI
(Gen AI)
AI Generated Content
(AIGC) 12
Generative AI (Gen AI)
AI Generated Content (AIGC)

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 13
AI, ML, DL, Generative AI

Source: Jeong, Cheonsu. "A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture." arXiv preprint arXiv:2309.01105 (2023). 14
lmarena.ai Chatbot Arena Leaderboard
Rank (UB)  Rank (StyleCtrl)  Model  Arena Score  95% CI  Votes  Organization  License
1 1 chocolate (Early Grok-3) 1403 +6/-6 9992 xAI Proprietary
2 3 Gemini-2.0-Flash-Thinking-Exp-01-21 1385 +4/-6 15083 Google Proprietary
2 3 Gemini-2.0-Pro-Exp-02-05 1380 +5/-6 13000 Google Proprietary
2 1 ChatGPT-4o-latest (2025-01-29) 1377 +5/-5 13470 OpenAI Proprietary
5 3 DeepSeek-R1 1362 +7/-7 6581 DeepSeek MIT
5 8 Gemini-2.0-Flash-001 1358 +7/-7 10862 Google Proprietary
5 3 o1-2024-12-17 1352 +5/-5 17248 OpenAI Proprietary
8 7 o1-preview 1335 +3/-4 33169 OpenAI Proprietary
8 8 Qwen2.5-Max 1334 +5/-5 9282 Alibaba Proprietary
8 7 o3-mini-high 1332 +5/-9 5954 OpenAI Proprietary
11 11 DeepSeek-V3 1318 +4/-5 19461 DeepSeek DeepSeek
11 13 Qwen-Plus-0125 1311 +9/-7 5112 Alibaba Proprietary
11 14 GLM-4-Plus-0111 1310 +6/-9 5134 Zhipu Proprietary
11 13 Gemini-2.0-Flash-Lite-Preview-02-05 1309 +6/-5 10262 Google Proprietary
12 12 o3-mini 1306 +5/-6 12179 OpenAI Proprietary
https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard 15
lmarena.ai Chatbot Arena Leaderboard
Confidence Intervals on Model Strength (via Bootstrapping)

https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard 16
Transformer (Attention is All You Need)
(Vaswani et al., 2017)

Source: Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.
"Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. 17
Transformer (Attention is All You Need)
(Vaswani et al., 2017)

• A Transformer is a type of deep learning model introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017).
• It revolutionized Natural Language Processing (NLP) by replacing traditional sequence models like RNNs and LSTMs with a self-attention mechanism that enables highly parallelizable training.

Source: Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.
"Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. 18
Transformer Models
• Encoder-only: BERT, DistilBERT, RoBERTa, XLM, XLM-R, ALBERT, ELECTRA, DeBERTa
• Encoder-Decoder: T5, BART, M2M-100, BigBird, mT0
• Decoder-only: GPT, GPT-2, CTRL, GPT-3, GPT-Neo, GPT-J, BLOOM, BLOOMZ, ChatGPT, GPT-4

Source: Lewis Tunstall, Leandro von Werra, and Thomas Wolf (2022), Natural Language Processing with Transformers: Building Language Applications with Hugging Face, O'Reilly Media. 19
BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding
BERT (Bidirectional Encoder Representations from Transformers)
Overall pre-training and fine-tuning procedures for BERT

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805. 20
Fine-tuning BERT on Different Tasks

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805. 21
Sentiment Analysis:
Single Sentence Classification

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805 22
Fine-tuning BERT on
Question Answering (QA)

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805. 23
Fine-tuning BERT on Dialogue
Intent Detection (ID; Classification)

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805. 24
Fine-tuning BERT on Dialogue
Slot Filling (SF)

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805. 25
Key Features of the Transformer Model
• Self-Attention Mechanism: Allows the model to weigh the importance of different words in a sentence, regardless of their position.
• Positional Encoding: Since Transformers don't use recurrence (like RNNs), positional encodings are added to input embeddings to retain word order information.
• Multi-Head Attention: The model attends to different words simultaneously in multiple ways, capturing various relationships.
• Feed-Forward Layers: After attention, the output passes through dense layers for further transformation.
• Layer Normalization & Residual Connections: Improve gradient flow and training stability.
• Encoder-Decoder Architecture:
  • Encoder: Processes input text and converts it into contextual embeddings.
  • Decoder: Generates output text, often used in translation or text generation tasks.

Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 26
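To make the self-attention and scaled dot-product bullets concrete, here is a minimal NumPy sketch of single-head attention as defined in Vaswani et al. (2017); the random matrices are illustrative stand-ins for learned projection weights.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # weighted sum of value vectors

# Toy example: 4 tokens with model dimension 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                                  # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)                                             # (4, 8): one context vector per token

Because every token attends to every other token in a single matrix product, the whole sequence is processed at once, which is exactly what makes Transformer training more parallelizable than RNNs.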
Popular Transformer-Based Models
• BERT (Bidirectional Encoder Representations from
Transformers)
• Used for tasks like classification and question answering.
• GPT (Generative Pre-trained Transformer)
• Generates text based on input prompts.
• T5 (Text-To-Text Transfer Transformer)
• Converts all NLP tasks into a text-to-text format.
• ViT (Vision Transformer)
• Applies Transformer architecture to computer vision.
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 27
Tokens in NLP (Text Processing)
• Tokenization: Splitting text into meaningful units (words, subwords, or characters).
• Subword Tokens: Methods like Byte Pair Encoding (BPE) and WordPiece break words into reusable subunits (e.g., "unhappiness" → ["un", "happiness"]).
• Embeddings: Convert tokens into numerical vectors for processing.
• Semantic Role Labeling (SRL): Identifies sentence structure by assigning roles to tokens (e.g., subject, object).
• Special Tokens: [CLS], [SEP]
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 28
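As a quick illustration of the subword behavior in the table, a Hugging Face WordPiece tokenizer splits out-of-vocabulary words into reusable pieces; this sketch assumes the transformers package and the bert-base-uncased vocabulary, and the exact split depends on that learned vocabulary.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece vocabulary
print(tokenizer.tokenize("unhappiness"))
# subword pieces, e.g. a stem plus "##"-prefixed continuations; exact split depends on the vocabulary
print(tokenizer.encode("a visually stunning rumination on love", add_special_tokens=True))
# integer IDs with the special tokens [CLS] and [SEP] added at the start and end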
Tokens in CV (Image Processing)
• Patch Tokens: Vision Transformers (ViT) split an image into small patches (e.g., 16×16 pixels), treating them as tokens.
• Positional Encoding: Since images lack inherent sequence order like text, positional embeddings help ViTs understand spatial structure.
• Midjourney API Tokens: Midjourney's AI processes text prompts into image tokens, converting descriptions into AI-generated art.
• CLIP Tokens: OpenAI's CLIP model tokenizes both text and images, allowing cross-modal understanding (e.g., "dog" matches a picture of a dog).
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 29
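A minimal NumPy sketch of the patch-token idea in the table above: split an image into 16×16 patches and linearly project each one. The projection matrix here is a random stand-in for a learned parameter.

import numpy as np

def image_to_patch_tokens(image, patch=16, d_model=64):
    # Split an (H, W, C) image into (H/patch * W/patch) flattened patches,
    # then project each patch to a d_model-dimensional token
    H, W, C = image.shape
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * C))
    W_proj = np.random.default_rng(0).normal(size=(patch * patch * C, d_model))
    return patches @ W_proj                      # (num_patches, d_model)

tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)                              # (196, 64): 14 x 14 = 196 patch tokens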
Tokens in NLP vs CV
• Basic Unit: NLP = words, subwords, or characters; CV = image patches (e.g., 16×16 pixel grids)
• Processing: NLP = tokenized using BPE, WordPiece, SentencePiece; CV = tokenized as patch embeddings
• Positional Encoding: NLP = needed to retain word order; CV = needed to retain spatial information
• Example Models: NLP = BERT, GPT, T5, RAG; CV = ViT, DINOv2, CLIP
• Use Case: NLP = text-based tasks (chatbots, summarization, RAG); CV = vision-based tasks (image classification, AI art generation)
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 30
Attention in NLP (Text Processing)
• Self-Attention: Each token attends to every other token in a sentence, capturing dependencies across long text sequences.
• Scaled Dot-Product Attention: Computes attention scores using query (Q), key (K), and value (V) vectors.
• Multi-Head Attention (MHA): Improves attention by using multiple attention heads that learn different relationships.
• Causal Attention (Decoder-Only Models): Restricts attention to past tokens only, enabling text generation without looking ahead.
• Cross-Attention: The decoder attends to encoder outputs in seq-to-seq tasks like translation (e.g., T5, BART).
• Retrieval-Augmented Attention: Fetches external knowledge before generating a response (RAG models).
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 31
Attention in CV (Vision Processing)
• Self-Attention in ViTs: Treats images as a sequence of patches (like words in text) and applies attention to learn spatial relationships.
• Multi-Head Attention in ViTs: As in NLP, multiple attention heads capture different visual features.
• Positional Encoding in Vision: Since images lack inherent order, positional embeddings help maintain spatial structure.
• Cross-Attention in Multimodal AI: Used in models like CLIP and Midjourney, where text descriptions attend to visual features.
• Attention Maps in Vision: Heatmaps showing which image regions the model focuses on (e.g., for explainability).
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 32
Attention in NLP vs CV
• Input Format: NLP = tokenized text sequences; CV = image patches
• Basic Unit: NLP = words/subwords; CV = pixels/patches
• Role of Attention: NLP = captures long-range dependencies; CV = learns spatial and contextual relationships
• Sequential Processing: neither; both operate on the full input in parallel
• Model Examples: NLP = BERT, GPT, T5, RAG; CV = ViT, CLIP, Midjourney
Source: Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing 33
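Building on the single-head sketch earlier, here is a self-contained NumPy sketch of multi-head attention, the mechanism both tables attribute to text and vision Transformers alike: split the model dimension across heads, attend per head, and concatenate. Random weights again stand in for learned projections.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, n_heads=4):
    # Each head attends in its own d_head-dimensional subspace; outputs are concatenated
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    rng = np.random.default_rng(0)
    heads = []
    for _ in range(n_heads):
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        heads.append(softmax(Q @ K.T / np.sqrt(d_head)) @ V)
    return np.concatenate(heads, axis=-1)        # (seq_len, d_model)

x = np.random.default_rng(1).normal(size=(6, 32))  # 6 tokens (or patches), d_model = 32
print(multi_head_attention(x).shape)               # (6, 32)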
Single-layer Transformer
consists of self-attention,
a feedforward network, and residual connections

Source: Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson 34
Transformer Architecture
for POS Tagging

Source: Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson 35
Transformer (Attention is All You Need)
(Vaswani et al., 2017)

Source: Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.
"Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. 36
Transformer

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 37


Transformer
Encoder Decoder

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 38


Transformer
Encoder Decoder Stack

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 39


Transformer
Encoder Self-Attention

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 40


Transformer
Decoder

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 41


Transformer
Encoder with Tensors
Word Embeddings

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 42


Transformer
Self-Attention Visualization

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 43


Transformer
Positional Encoding Vectors

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 44
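A minimal NumPy sketch of the sinusoidal positional encoding vectors the figure illustrates, following the sin/cos formulation of Vaswani et al. (2017); these vectors are added to the input embeddings so the model can recover word order.

import numpy as np

def positional_encoding(max_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(max_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions
    return pe

print(positional_encoding(max_len=50, d_model=128).shape)   # (50, 128)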


Transformer
Self-Attention Softmax Output

Source: Jay Alammar (2019), The Illustrated Transformer, http://jalammar.github.io/illustrated-transformer/ 45


Training Contextual Representations
using a left-to-right Language Model

Source: Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson 46
Masked Language Modeling:
Pretrain a Bidirectional Model

Source: Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson 47
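A sketch of the masking step behind masked language modeling: replace a random ~15% of token ids with a [MASK] id and keep the originals as prediction targets. The example ids and the -100 "ignore" label follow common conventions (e.g., PyTorch's CrossEntropyLoss ignore_index) and are assumptions of this sketch.

import numpy as np

def mask_tokens(token_ids, mask_id, mask_prob=0.15, seed=0):
    # Returns (masked inputs, labels); labels keep the original id at masked
    # positions and -100 elsewhere, the usual "ignore this position" marker
    rng = np.random.default_rng(seed)
    token_ids = np.array(token_ids)
    mask = rng.random(token_ids.shape) < mask_prob
    labels = np.where(mask, token_ids, -100)
    inputs = np.where(mask, mask_id, token_ids)
    return inputs, labels

inputs, labels = mask_tokens([101, 2023, 2003, 1037, 3231, 102], mask_id=103)
print(inputs, labels)   # the model is trained to predict the original ids at masked positions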
Illustrated BERT

Source: Jay Alammar (2019), The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning),
http://jalammar.github.io/illustrated-bert/ 48
BERT Classification Input Output

Source: Jay Alammar (2019), The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning),
http://jalammar.github.io/illustrated-bert/ 49
BERT Encoder Input

Source: Jay Alammar (2019), The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning),
http://jalammar.github.io/illustrated-bert/ 50
BERT Classifier

Source: Jay Alammar (2019), The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning),
http://jalammar.github.io/illustrated-bert/ 51
Sentiment Analysis:
Single Sentence Classification

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805 52
A Visual Guide to
Using BERT for the First Time
(Jay Alammar, 2019)

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
53
Sentiment Classification: SST2
Sentences from movie reviews
sentence → label
• a stirring , funny and finally transporting re imagining of beauty and the beast and 1930s horror films → 1
• apparently reassembled from the cutting room floor of any given daytime soap → 0
• they presume their audience won't sit still for a sociology lesson → 0
• this is a visually stunning rumination on love , memory , history and the war between art and commerce → 1
• jonathan parker 's bartleby should have been the be all end all of the modern office anomie films → 1

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
54
Movie Review Sentiment Classifier

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
55
Movie Review Sentiment Classifier

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
56
Movie Review Sentiment Classifier
Model Training

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
57
Step #1: Use DistilBERT to
Generate Sentence Embeddings

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
58
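The slides that follow walk through Jay Alammar's notebook step by step. The code fragments assume roughly this setup, with DistilBERT and its tokenizer loaded from Hugging Face (a sketch; package versions aside).

import numpy as np
import pandas as pd
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from transformers import DistilBertModel, DistilBertTokenizer

# Model #1: pretrained DistilBERT (feature extractor) and its WordPiece tokenizer
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")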
Step #2: Test/Train Split for
Model #2, Logistic Regression

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
59
Step #3: Train the logistic regression
model using the training set

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
60
Tokenization
[CLS] a visually stunning rum ##ination on love [SEP]
a visually stunning rumination on love

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
61
Tokenization
tokenizer.encode("a visually stunning rumination on love",
add_special_tokens=True)

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
62
Tokenization for BERT Model

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
63
Flowing Through DistilBERT
(768 features)

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
64
Model #1 Output Class vector as
Model #2 Input

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
65
Fine-tuning BERT on
Single Sentence Classification Tasks

Source: Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018).
"Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805. 66
Model #1 Output Class vector as
Model #2 Input

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
67
Logistic Regression Model to
classify Class vector

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
68
df = pd.read_csv(
    'https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv',
    delimiter='\t', header=None)

df.head()

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
69
Tokenization
tokenized = df[0].apply(lambda x: tokenizer.encode(x, add_special_tokens=True))

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
70
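Between tokenization and the forward pass, the variable-length id lists must be padded to a common length. A sketch of that step; the attention mask marks which positions are real tokens (as in the original notebook).

# Pad every tokenized review with zeros up to the length of the longest one
max_len = max(len(ids) for ids in tokenized.values)
padded = np.array([ids + [0] * (max_len - len(ids)) for ids in tokenized.values])

# 1 marks real tokens, 0 marks padding
attention_mask = np.where(padded != 0, 1, 0)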
BERT Input Tensor

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
71
Processing with DistilBERT
input_ids = torch.tensor(np.array(padded))
with torch.no_grad():   # inference only: feature extraction needs no gradients
    last_hidden_states = model(input_ids)

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
72
Unpacking the BERT output tensor

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
73
Sentence to last_hidden_state[0]

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
74
BERT’s output for the [CLS] tokens
# Slice out the first position ([CLS]) for every sequence, keeping all hidden units
features = last_hidden_states[0][:, 0, :].numpy()

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
75
The tensor sliced from BERT's output
Sentence Embeddings

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
76
Dataset for Logistic Regression
(768 Features)
The features are the output vectors of BERT for the [CLS] token (position #0)

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
77
labels = df[1]
train_features, test_features, train_labels, test_labels = train_test_split(features, labels)

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
78
Score Benchmarks
Logistic Regression Model
on SST-2 Dataset
# Training
lr_clf = LogisticRegression()
lr_clf.fit(train_features, train_labels)

# Testing
lr_clf.score(test_features, test_labels)

# Accuracy: 81%
# Highest accuracy: 96.8%
# Fine-tuned DistilBERT: 90.7%
# Full size BERT model: 94.9%
Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
79
Sentiment Classification: SST2
Sentences from movie reviews
sentence → label
• a stirring , funny and finally transporting re imagining of beauty and the beast and 1930s horror films → 1
• apparently reassembled from the cutting room floor of any given daytime soap → 0
• they presume their audience won't sit still for a sociology lesson → 0
• this is a visually stunning rumination on love , memory , history and the war between art and commerce → 1
• jonathan parker 's bartleby should have been the be all end all of the modern office anomie films → 1

Source: Jay Alammar (2019), A Visual Guide to Using BERT for the First Time,
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
80
A Visual Notebook to
Using BERT for the First Time

https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb 81
Artificial Intelligence
(AI)
82
Definition
of
Artificial Intelligence
(A.I.)
83
Artificial Intelligence

“… the science and engineering of making intelligent machines”
(John McCarthy, 1955)
Source: https://digitalintelligencetoday.com/artificial-intelligence-defined-useful-list-of-popular-definitions-from-business-and-science/ 84
Artificial Intelligence
“… technology that
thinks and acts
like humans”
Source: https://digitalintelligencetoday.com/artificial-intelligence-defined-useful-list-of-popular-definitions-from-business-and-science/ 85
Artificial Intelligence
“… intelligence
exhibited by machines
or software”
Source: https://digitalintelligencetoday.com/artificial-intelligence-defined-useful-list-of-popular-definitions-from-business-and-science/ 86
4 Approaches of AI
1. Acting Humanly: The Turing Test Approach (1950)
2. Thinking Humanly: The Cognitive Modeling Approach
3. Thinking Rationally: The “Laws of Thought” Approach
4. Acting Rationally: The Rational Agent Approach
Source: Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson 87
AI Acting Humanly:
The Turing Test Approach
(Alan Turing, 1950)

• Knowledge Representation
• Automated Reasoning
• Machine Learning (ML)
• Deep Learning (DL)
• Computer Vision (Image, Video)
• Natural Language Processing (NLP)
• Robotics
Source: Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson 88
AI, ML, DL
Artificial Intelligence (AI)
• Machine Learning (ML): Supervised Learning, Unsupervised Learning, Semi-supervised Learning, Reinforcement Learning
• Deep Learning (DL): CNN, RNN, LSTM, GRU, GAN
Source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/deep_learning.html 89
Comparison of Generative AI
and Traditional AI
• Output type: Generative AI = new content; Traditional AI = classification/prediction
• Creativity: Generative AI = high; Traditional AI = low
• Interactivity: Generative AI = usually more natural; Traditional AI = limited


90
Generative AI
• Generative AI: The Art of Creation
• Definition: AI systems capable of creating new content
• Characteristics: Creativity, interactivity

91
Neural Network and Deep Learning

Source: 3Blue1Brown (2017), But what *is* a Neural Network? | Chapter 1, deep learning,
https://www.youtube.com/watch?v=aircAruvnKk 92
Gradient Descent
how neural networks learn

Source: 3Blue1Brown (2017), Gradient descent, how neural networks learn | Chapter 2, deep learning,
https://www.youtube.com/watch?v=IHZwWFHWa-w 93
Backpropagation

Source: 3Blue1Brown (2017), What is backpropagation really doing? | Chapter 3, deep learning,
https://www.youtube.com/watch?v=Ilg3gGewQ5U 94
Transformers (how LLMs work)

Source: 3Blue1Brown (2024), Transformers (how LLMs work) explained visually | DL5,
https://www.youtube.com/watch?v=wjZofJX0v4M 95
Attention in Transformers

Source: 3Blue1Brown (2024), Attention in transformers, visually explained | DL6,


https://www.youtube.com/watch?v=eMlx5fFNoYc 96
How might LLMs store facts

Source: 3Blue1Brown (2024), How might LLMs store facts | DL7,


https://www.youtube.com/watch?v=9-Jl0dxWQs8 97
Large Language Models explained briefly

Source: 3Blue1Brown (2024), Large Language Models explained briefly,


https://www.youtube.com/watch?v=LPZh9BOjkQs 98
Generative AI,
Agentic AI,
AI Agent,
RAG LLM
for
QA and Dialogue Systems 99
Chatbot
Dialogue System
Intelligent Agent
Conversational AI
100
The Development of LM-based Dialogue Systems
1) Early Stage (1966 - 2015)
2) The Independent Development of TOD and ODD (2015 - 2019)
3) Fusions of Dialogue Systems (2019 - 2022)
4) LLM-based DS (2022 - Now)

Task-oriented DS (TOD), Open-domain DS (ODD)


Source: Wang, Hongru, Lingzhi Wang, Yiming Du, Liang Chen, Jingyan Zhou, Yufei Wang, and Kam-Fai Wong. "A Survey of the Evolution of Language Model-Based Dialogue Systems." arXiv preprint arXiv:2311.16789 (2023). 101
Intelligent Agents Roadmap

Source: Cheng, Yuheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang et al. "Exploring large language model based intelligent agents: Definitions, methods, and prospects." arXiv preprint arXiv:2401.03428 (2024). 102
AI Agents
• Traditional AI Agents: simple reflex agents, model-based reflex agents, goal-based agents, utility-based agents, learning agents
• Evolution of AI Agents: LLM-based agents, multi-modal agents, embodied AI agents in virtual environments, collaborative AI agents

103
Reinforcement Learning (RL)

Agent ⇄ Environment

Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book. 104
Reinforcement Learning (RL)

Agent-Environment loop: (1) observation, (2) action, (3) reward

Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book. 105
Reinforcement Learning (RL)

Agent-Environment loop: (1) observation Ot, (2) action At, (3) reward Rt

Source: Richard S. Sutton & Andrew G. Barto (2018), Reinforcement Learning: An Introduction, 2nd Edition, A Bradford Book. 106
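A minimal sketch of the observation-action-reward loop in the diagrams above, with a toy environment and a random-action agent standing in for a real learner.

import random

class ToyEnvironment:
    # State is a counter; actions move it by +1/-1; reward is 1.0 when it reaches 5
    def __init__(self):
        self.state = 0
    def step(self, action):
        self.state += action
        reward = 1.0 if self.state == 5 else 0.0
        return self.state, reward              # observation and reward for the next step

env = ToyEnvironment()
total_reward = 0.0
for t in range(20):
    action = random.choice([-1, +1])           # At: a learning agent would choose based on Ot
    observation, reward = env.step(action)
    total_reward += reward
print(total_reward)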
Large Language Model (LLM) based Agents

Source: Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., ... & Gui, T. (2023). The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864. 107
LLM-based Agents

Source: Cheng, Yuheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang et al. "Exploring large language model based intelligent agents: Definitions, methods, and prospects." arXiv preprint arXiv:2401.03428 (2024). 108
Large Multimodal Agents (LMA)

Source: Xie, J., Chen, Z., Zhang, R., Wan, X., & Li, G. (2024). Large Multimodal Agents: A Survey. ArXiv, abs/2402.15116. 109
Large Multimodal Agents (LMA)

Source: Xie, J., Chen, Z., Zhang, R., Wan, X., & Li, G. (2024). Large Multimodal Agents: A Survey. ArXiv, abs/2402.15116. 110
The Development of LM-based Dialogue Systems
1) Early Stage (1966 - 2015)
2) The Independent Development of TOD and ODD (2015 - 2019)
3) Fusions of Dialogue Systems (2019 - 2022)
4) LLM-based DS (2022 - Now)

Task-oriented DS (TOD), Open-domain DS (ODD)


Source: Wang, Hongru, Lingzhi Wang, Yiming Du, Liang Chen, Jingyan Zhou, Yufei Wang, and Kam-Fai Wong. "A Survey of the Evolution of Language Model-Based Dialogue Systems." arXiv preprint arXiv:2311.16789 (2023). 111
Major GenAI LLMs Research Milestones
(2017-2024)

Source: https://github.com/Hannibal046/Awesome-LLM 112


Multimodal Large Language Models (MLLM)

Source: Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. (2024) "A survey on multimodal large language models." National Science Review (2024): nwae403. 113
Multimodal Large Language Models (MLLM)

Multimodal LLM
Three types of connectors:
1. projection-based
2. query-based
3. fusion-based connectors

Source: Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. (2024) "A survey on multimodal large language models." National Science Review (2024): nwae403. 114
Multimodal Large Language Model (MLLM)
for Vision Question Answering

Source: Jiayi Kuang, Jingyou Xie, Haohao Luo, Ronghao Li, Zhe Xu, Xianfeng Cheng, Yinghui Li, Xika Lin, and Ying Shen. (2024) "Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey." arXiv preprint arXiv:2411.17558. 115
Multi-task Language Understanding on MMLU
GPT-4, Claude 3.5 Sonnet
Massive Multitask Language Understanding (MMLU)

Source: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu 116


LLM Capabilities

Source: Minaee, Shervin, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. (2024) "Large language models: A survey." arXiv preprint arXiv:2402.06196 (2024). 117
LLM-powered Multimodal Agents
Large Multimodal Agents (LMAs)

(Figure: timeline of LMAs, 2022 to 2024)

Source: Xie, Junlin, Zhihong Chen, Ruifei Zhang, Xiang Wan, and Guanbin Li. "Large Multimodal Agents: A Survey." arXiv preprint arXiv:2402.15116 (2024). 118
Four Paradigms in NLP (LM)

Transfer Learning: Pre-training, Fine-Tuning (FT)

GAI: Pre-train, Prompt, and Predict (Prompting)

Source: Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. (2023) "Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing." ACM Computing Surveys 55, no. 9 (2023): 1-35. 119
Large Language Models (LLM)
Three typical learning paradigms

Source: Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. (2024) "A survey on multimodal large language models." National Science Review (2024): nwae403. 120
Popular Generative AI
• OpenAI ChatGPT (o1, GPT-4o, GPT-4)
• Claude.ai (Claude 3.5)
• Google Gemini
• Meta Llama 3.3, Llama 3.2 Vision
• Mistral Pixtral (mistral.ai)
• DeepSeek
• Chat.LMSys.org (lmarena.ai)
• Perplexity.ai
• Stable Diffusion
• Video: D-ID, Synthesia
• Audio: Speechify
121
Claude 3.5 Sonnet State-of-the-art vision

Source: https://www.anthropic.com/news/3-5-models-and-computer-use 122


Llama 3.2 90B vision LLMs

Source: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ 123


Llama 3.3 70B instruction-tuned

https://www.llama.com/ 124
Mistral Pixtral Large (124B)
Frontier-class multimodal performance

Source: https://mistral.ai/news/pixtral-large/ 125


Mistral Pixtral 12B

Source: Agrawal, Pravesh, Szymon Antoniak, Emma Bou Hanna, Baptiste Bout, Devendra Chaplot, Jessica Chudnovsky, Diogo Costa et al. (2024) "Pixtral 12B." arXiv preprint arXiv:2410.07073. 126
Large Language Models (LLMs)
Artificial Analysis Quality Index

Source: https://artificialanalysis.ai/ 127


Large Language Models (LLMs)
Quality vs. Price

Source: https://artificialanalysis.ai/ 128


Large Language Models for Data Science
Chat with Open Large Language Models: Chatbot Arena
(Llama 3.2, Claude 3.5 Sonnet)
https://lmarena.ai/ 129
Perplexity.ai

https://www.perplexity.ai/ 130
Generative AI (Gen AI)
AI Generated Content (AIGC)
Image Generation

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 131
Generative AI (Gen AI)
AI Generated Content (AIGC)

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 132
The history of Generative AI
in CV, NLP and VL

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 133
Generative AI
Foundation Models

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 134
Categories of Vision Generative Models

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 135
The General Structure of
Generative Vision Language

Source: Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun (2023). "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT."
arXiv preprint arXiv:2303.04226. 136
RAG LLM
Dialogue Systems

137
Technology Tree of RAG Research
Retrieval-Augmented Generation (RAG) for Large Language Models (LLMs)

Source: Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., ... & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997. 138
Retrieval-Augmented Generation (RAG)
for Large Language Models (LLMs)

Source: Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., ... & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997. 139
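A minimal sketch of the retrieve-then-generate loop the survey describes: embed the corpus, retrieve the top-k passages by cosine similarity, and prepend them to the prompt. The embed() and generate() helpers are hypothetical placeholders for any embedding model and any LLM API.

import numpy as np

def embed(texts):
    # Hypothetical: replace with any sentence-embedding model returning (n, d) vectors
    raise NotImplementedError

def generate(prompt):
    # Hypothetical: replace with any LLM chat/completions call
    raise NotImplementedError

def rag_answer(question, corpus, k=3):
    doc_vecs = embed(corpus)                                   # (n_docs, d)
    q_vec = embed([question])[0]                               # (d,)
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    top_k = [corpus[i] for i in np.argsort(-sims)[:k]]         # best-matching passages
    context = "\n".join(top_k)
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")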
Retrieval-Augmented Generation (RAG) Architecture

Source: Zhao, P., Zhang, H., Yu, Q., Wang, Z., Geng, Y., Fu, F., ... & Cui, B. (2024). Retrieval-augmented generation for ai-generated content: A survey. arXiv preprint arXiv:2402.19473. 140
Synthesizing RAG with LLMs
for Question Answering Application

Source: Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2024). Large language models: A survey. arXiv preprint arXiv:2402.06196. 141
Synthesizing the KG as a Retriever with LLMs

Source: Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2024). Large language models: A survey. arXiv preprint arXiv:2402.06196. 142
HuggingGPT:
An agent-based approach to using tools and planning

Source: Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2024). Large language models: A survey. arXiv preprint arXiv:2402.06196. 143
A LLM-based Agent for
Conversational Information Seeking

Source: Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2024). Large language models: A survey. arXiv preprint arXiv:2402.06196. 144
Direct LLM, RAG, and GraphRAG

Source: Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., ... & Tang, S. (2024). Graph retrieval-augmented generation: A survey. arXiv preprint arXiv:2408.08921. 145
GraphRAG Framework for Question Answering

Source: Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., ... & Tang, S. (2024). Graph retrieval-augmented generation: A survey. arXiv preprint arXiv:2408.08921. 146
LangChain Architecture

Source: https://www.langchain.com/ 147


Multimodal LLM RAG
Multi-Vector Retriever for RAG

Source: https://blog.langchain.dev/deconstructing-rag/ 148


Evaluating RAG with Ragas Metrics

Source: https://blog.langchain.dev/evaluating-rag-pipelines-with-ragas-langsmith/ 149


References
• Numa Dhamani and Maggie Engler (2024), Introduction to Generative AI, Manning
• Denis Rothman (2024), Transformers for Natural Language Processing and Computer Vision: Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition, Packt Publishing
• NVIDIA DLI (2024), Building RAG Agents with LLMs, https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-FX-15+V1
• NVIDIA DLI (2024), Generative AI with Diffusion Models, https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-FX-14+V1
• Denis Rothman (2024), RAG-Driven Generative AI: Build custom retrieval augmented generation pipelines with LlamaIndex, Deep Lake, and Pinecone, Packt Publishing
• Jay Alammar and Maarten Grootendorst (2024), Hands-On Large Language Models: Language Understanding and Generation, O'Reilly Media
• Ben Auffarth (2023), Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT and other LLMs, Packt Publishing
• Chris Fregly, Antje Barth, and Shelbee Eigenbrode (2023), Generative AI on AWS: Building Context-Aware Multimodal Reasoning Applications, O'Reilly Media
• David Foster (2023), Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play, 2nd Edition, O'Reilly Media

150
