
Name: Abhay Raj Agrawal
Name: Dhwani Bharat Gohil
SAP ID: 60009210102 / 60009210100
Div: D1 / D12
Experiment No 7
Aim: To fine-tune the BERT model to perform a Natural Language Processing task.

Theory:

BERT

BERT stands for Bidirectional Encoder Representations from Transformers and is a language
representation model by Google. It uses two steps, pre-training and fine-tuning, to create
state-of-the-art models for a wide range of tasks. Its distinctive feature is a unified
architecture across different downstream tasks, which we will discuss shortly. This means the
same pre-trained model can be fine-tuned for a variety of final tasks that may be quite
different from the task it was pre-trained on, while still giving close to state-of-the-art results.

BERT Architecture

BERT comes in two architecture sizes: BERT Base and BERT Large.


BERT Base: L=12, H=768, A=12. Total parameters = 110M.
BERT Large: L=24, H=1024, A=16. Total parameters = 340M.
L = number of layers (i.e., the number of Transformer encoder blocks in the stack).
H = hidden size (the dimensionality of the token representations; each of the A attention heads works with q, k and v vectors of size H/A).
A = number of attention heads.

Pre-training BERT

The BERT model is pre-trained on the following two unsupervised tasks.


1. Masked Language Model (MLM)
This task enables the deep bidirectional learning aspect of the model. Some percentage of the
input tokens are masked (replaced with the [MASK] token) at random, and the model tries to
predict these masked tokens rather than the entire input sequence. The final hidden states of the
masked positions are fed into an output softmax over the vocabulary to obtain the predicted words.
This, however, creates a mismatch between pre-training and fine-tuning, because most downstream
tasks do not involve predicting masked words. The mismatch is mitigated by a subtle twist in how
the input tokens are masked.
Approximately 15% of the tokens are selected for masking during training, but not all of the
selected tokens are actually replaced by the [MASK] token:
80% of the time the selected token is replaced with [MASK].
10% of the time it is replaced with a random token.
10% of the time it is left unchanged.
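As a concrete illustration (a minimal sketch, not the original BERT pre-training code), the 80/10/10 rule can be written roughly as follows in PyTorch; the function and variable names here are invented for the example:

import random
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    # Apply the BERT-style 80/10/10 masking rule to a 1-D tensor of token ids.
    # (In practice, special tokens such as [CLS] and [SEP] would be excluded from masking.)
    input_ids = input_ids.clone()
    labels = torch.full_like(input_ids, -100)       # -100 marks positions that are not predicted
    for i in range(len(input_ids)):
        if random.random() < mlm_prob:              # select roughly 15% of positions
            labels[i] = input_ids[i]                # the model must recover the original token
            r = random.random()
            if r < 0.8:
                input_ids[i] = mask_token_id        # 80%: replace with [MASK]
            elif r < 0.9:
                input_ids[i] = random.randrange(vocab_size)  # 10%: replace with a random token
            # remaining 10%: leave the token unchanged
    return input_ids, labels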

2. Next Sentence Prediction (NSP)

Masked language modelling does not directly capture the relationship between two sentences, which
matters in many downstream tasks such as Question Answering (QA) and Natural Language Inference
(NLI). The model is therefore also taught sentence relationships by training on a binarized next
sentence prediction task.
In this task, two sentences, A and B, are chosen for each pre-training example:
50% of the time B is the actual sentence that follows A (labelled IsNext).
50% of the time B is a random sentence from the corpus (labelled NotNext).
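A minimal sketch of how such 50/50 pairs could be drawn from a corpus given as a list of sentences (illustrative only; the names below are not from the paper):

import random

def make_nsp_pair(sentences, idx):
    # Return (sentence_a, sentence_b, label) where label 1 means "IsNext" and 0 means "NotNext".
    sentence_a = sentences[idx]
    if random.random() < 0.5 and idx + 1 < len(sentences):
        return sentence_a, sentences[idx + 1], 1    # B really follows A
    return sentence_a, random.choice(sentences), 0  # B is a random sentence from the corpus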
Training: Inputs and Outputs

The model is trained on both of the above tasks simultaneously. This is made possible by careful
design of the inputs and outputs.
Inputs

The input representation for BERT

The model must be able to take either a single sentence or a pair of sentences packed together
unambiguously in one token sequence. The authors note that a "sentence" can be an arbitrary span
of contiguous text rather than an actual linguistic sentence. A [SEP] token is used to separate
the two sentences, along with a learnt segment embedding that marks each token as belonging to
segment A or segment B.

Problem #1: All of the inputs are fed in one step (as opposed to RNNs, where inputs are fed
sequentially), so the model cannot by itself preserve the ordering of the input tokens. The order
of words in every language is significant, both semantically and syntactically.
Problem #2: To perform the next sentence prediction task properly, we need to be able to
distinguish between sentences A and B. Fixing the lengths of the sentences would be too
restrictive and a potential bottleneck for various downstream tasks.
Both of these problems are solved by adding embeddings containing the required information to the
original token embeddings and using the sum as the input to the BERT model. The following
embeddings are added to the token embeddings:

Segment Embedding: provides information about which sentence a particular token belongs to.

Position Embedding: provides information about the order of tokens in the input.
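For illustration, here is how a sentence pair can be packed into one sequence using the Hugging Face transformers tokenizer (assumed here as tooling; it is not part of the original paper). The token_type_ids correspond to the segment embedding lookup, while position embeddings are added automatically inside the model based on token order:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

enc = tokenizer("The cat sat on the mat.",      # segment A
                "It was very comfortable.",     # segment B
                return_tensors="pt")

print(enc["input_ids"])       # [CLS] tokens of A [SEP] tokens of B [SEP]
print(enc["token_type_ids"])  # 0 for segment A positions, 1 for segment B positions
print(enc["attention_mask"])  # 1 for every real token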
Outputs

For every input token, the model produces a final hidden state. The final state of the [CLS]
token is used for the next sentence prediction, while the final states at the masked positions
are fed into the vocabulary softmax for the masked language model objective.

Fine-tuning BERT

Fine-tuning on various downstream tasks is done by swapping in the appropriate inputs and
outputs. In general, to train a task-specific model we add an extra output layer to the existing
BERT model and fine-tune the resulting model end to end, updating all parameters. A positive
consequence of only adding input/output layers, rather than changing the BERT model itself, is
that only a minimal number of parameters need to be learned from scratch, which makes the
procedure fast, cost-efficient and resource-efficient.
To give an idea of how fast and efficient this is, the authors claim that all the results in the
paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU,
starting from the exact same pre-trained model.
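As a sketch of "add one output layer and fine-tune everything end to end", the Hugging Face BertForSequenceClassification class (assumed tooling, not the authors' original code) places a freshly initialised linear classifier on top of the pre-trained encoder:

import torch
from transformers import BertForSequenceClassification

# Pre-trained BERT encoder plus a new, randomly initialised classification layer.
# Only this small head is learned from scratch; everything else starts from the pre-trained weights.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

# All parameters (encoder + new head) are fine-tuned end to end with a small learning rate.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)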

Fine-tuning BERT on various downstream tasks.

In Sentence Pair Classification and Single Sentence Classification, the final state corresponding
to the [CLS] token is used as input to the additional layer that makes the prediction.
In QA tasks, a start vector S and an end vector E are introduced during fine-tuning. The question
is fed as sentence A and the passage containing the answer as sentence B. The probability of word
i being the start of the answer span is computed as the dot product between Ti (the final state
corresponding to the i-th input token) and S, followed by a softmax over all of the words in the
paragraph. The same method, using E, gives the probability of word i being the end of the span.
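The span scoring described above can be sketched as follows; T, S and E are dummy tensors standing in for the final token states and the learned start/end vectors (an illustration, not the original implementation):

import torch
import torch.nn.functional as F

seq_len, hidden = 50, 768
T = torch.randn(seq_len, hidden)   # final hidden states T_i for each paragraph token (dummy values)
S = torch.randn(hidden)            # learned start vector
E = torch.randn(hidden)            # learned end vector

start_probs = F.softmax(T @ S, dim=0)   # P(token i starts the answer) = softmax_i(T_i . S)
end_probs = F.softmax(T @ E, dim=0)     # P(token i ends the answer)   = softmax_i(T_i . E)

# The predicted answer span is the (start, end) pair with the highest combined score, with end >= start.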

Advantages of Fine-Tuning
Quicker Development

First, the pre-trained BERT model weights already encode a lot of information about our
language. As a result, it takes much less time to train our fine-tuned model - it is as if we have
already trained the bottom layers of our network extensively and only need to gently tune them
while using their output as features for our classification task. In fact, the authors recommend
only 2-4 epochs of training for fine-tuning BERT on a specific NLP task (compared to the
hundreds of GPU hours needed to train the original BERT model or an LSTM from scratch).
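As an example of such a short fine-tuning schedule, a Hugging Face TrainingArguments configuration (assumed tooling) matching the paper's recommended ranges might look like this:

from transformers import TrainingArguments

# Recommended fine-tuning ranges from the BERT paper: 2-4 epochs,
# batch size 16 or 32, learning rate between 2e-5 and 5e-5.
args = TrainingArguments(
    output_dir="bert-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)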

Less Data

In addition, and perhaps just as important, because of the pre-trained weights this method allows
us to fine-tune our task on a much smaller dataset than would be required in a model that is built
from scratch. A major drawback of NLP models built from scratch is that we often need a
prohibitively large dataset in order to train our network to reasonable accuracy, meaning a lot of
time and energy has to be put into dataset creation. By fine-tuning BERT, we are now able to
get away with training a model to good performance on a much smaller amount of training data.

Better Results

Finally, this simple fine-tuning procedure (typically adding one fully-connected layer on top of
BERT and training for a few epochs) was shown to achieve state of the art results with minimal
task-specific adjustments for a wide variety of tasks: classification, language inference, semantic
similarity, question answering, etc. Rather than implementing custom and sometimes-obscure
architectures shown to work well on a specific task, simply fine-tuning BERT is shown to be a
better (or at least equal) alternative.

Steps to Fine-Tune the BERT Model to Perform Multi-Class Text Classification
1. Load dataset
2. Pre-process data
3. Define model
4. Train the model
5. Evaluate
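A compact end-to-end sketch of these five steps, assuming the Hugging Face datasets and transformers libraries and using the AG News dataset purely as an example (the actual Colab notebook may use a different dataset and settings):

import numpy as np
from datasets import load_dataset
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

# 1. Load dataset (AG News: four news categories, chosen here only as an example).
dataset = load_dataset("ag_news")

# 2. Pre-process: tokenize and pad/truncate every article to a fixed length.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)

# 3. Define the model: pre-trained BERT plus a new 4-way classification head.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

# 4. Train on small subsets to keep the runtime short (see the conclusion below).
def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-agnews", num_train_epochs=2,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
    compute_metrics=accuracy,
)
trainer.train()

# 5. Evaluate on the held-out subset.
print(trainer.evaluate())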

Lab Exercise to be Performed in this Session:


Perform Text Classification by Fine tuning BERT model.
Colab Link:
https://colab.research.google.com/drive/1ojVt77k0AThffONLXr3rvSjPagVn6lRk?usp=sharing
Conclusion:
We successfully fine-tuned a BERT model for multi-class text classification, leveraging its
pre-trained language representations to achieve high accuracy with minimal training time. The
process involved data loading, pre-processing, model definition, training, and evaluation.
Although training on the full dataset is time-consuming, training on smaller subsets keeps the
runtime manageable while still giving good accuracy, showcasing BERT's effectiveness in
classifying text across various categories.
