
Apex Institute of Technology

Department of Computer Science & Engineering

NATURAL LANGUAGE PROCESSING


(22CSH-379)

Dr Prabhjot Kaur (E16646)
Assistant Professor, CSE (AIT), CU

DISCOVER . LEARN . EMPOWER
NATURAL LANGUAGE PROCESSING: Course Objectives
The objectives of this course are:
• To understand the foundational concepts of speech and language processing,
including ambiguity and computational models.
• To explore the role of algorithms and automata in morphological parsing and
linguistic analysis.
• To familiarize students with language modelling techniques like n-grams and
smoothing, and their application in speech recognition.
• To analyze the structure of language through parsing, feature structures, and
probabilistic grammars.
• To introduce semantic representation techniques for understanding meaning in
natural language.
• To equip students with the skills to implement NLP systems using tools and
techniques like tagging, parsing, and unification.

COURSE OUTCOMES
On completion of this course, the students shall be able to:

Table of Contents
• N-Gram
• Bi-Gram
• Maximum Likelihood Estimation
• Smoothing
• Entropy

Words


● N-grams of text are extensively used in text mining and natural language
processing tasks. They are basically a set of co-occurring words within a given
window; when computing the n-grams you typically move one word forward
(although you can move X words forward in more advanced scenarios).

N-Grams
● An N-gram model helps in suggesting words which could be used next in a given
sentence.

● An n-gram is a contiguous sequence of n items from a given sample
of text or speech.

● The items can be phonemes, syllables, letters, words or base pairs
according to the application.

● The n-grams are typically collected from a text or speech corpus.

● So, an N-gram model predicts the occurrence of a word based on
the occurrence of its N – 1 previous words.

Bi-Gram
● An N-gram is a sequence of N tokens (or words).

● A 1-gram (or unigram) is a one-word sequence. For the sentence "I love
reading blogs about data science", the unigrams would simply be: "I", "love",
"reading", "blogs", "about", "data", "science".

● A 2-gram (or bigram) is a two-word sequence of words, like "I love",
"love reading".

● A 3-gram (or trigram) is a three-word sequence of words, like "I love
reading", "about data science".

Example:
For the sentence "The cow jumps over the moon", if N=2 (known as
bigrams), then the n-grams would be:
• the cow
• cow jumps
• jumps over
• over the
• the moon

So you have 5 n-grams in this case. Notice that we moved from the->cow
to cow->jumps to jumps->over, etc., essentially moving one word forward
to generate the next bigram.

If N=3, the n-grams would be:

• the cow jumps
• cow jumps over
• jumps over the
• over the moon

So you have 4 n-grams in this case. When N=1, the n-grams are referred to
as unigrams: essentially the individual words in a sentence.
When N=2, they are called bigrams, and when N=3, trigrams.
When N>3 they are usually referred to as four-grams, five-grams, and so on.
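
As an illustration of this sliding-window process, here is a minimal Python sketch (not part of the original slides; whitespace tokenisation and lowercasing are simplifying assumptions):

```python
# Minimal n-gram extraction: slide a window of n words across the sentence,
# moving one word forward each time.
def ngrams(sentence, n):
    tokens = sentence.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "The cow jumps over the moon"
print(ngrams(sentence, 2))  # ['the cow', 'cow jumps', 'jumps over', 'over the', 'the moon']
print(ngrams(sentence, 3))  # ['the cow jumps', 'cow jumps over', 'jumps over the', 'over the moon']
```
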
How many N-grams in a sentence?

If X = the number of words in a given sentence K, the number of n-grams for
sentence K would be:

Number of n-grams = X – (N – 1)

For example, "The cow jumps over the moon" has X = 6 words, giving
6 – (2 – 1) = 5 bigrams and 6 – (3 – 1) = 4 trigrams, matching the lists above.
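
The same count expressed as a quick check in code (a sketch; the guard for the case N > X is an added assumption):

```python
# Number of n-grams in a sentence of X words: X - (N - 1).
def count_ngrams(X, N):
    return max(X - (N - 1), 0)

assert count_ngrams(6, 2) == 5  # bigrams of "The cow jumps over the moon"
assert count_ngrams(6, 3) == 4  # trigrams of the same sentence
```
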
How do N-gram language models work?

An N-gram language model predicts the probability of a given N-gram within any
sequence of words in the language. If we have a good N-gram model, we can
predict p(w | h) – the probability of seeing the word w given a history of
previous words h – where the history contains n-1 words.
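
For example, a bigram model (history of one word) estimates p(moon | the), while a trigram model (history of two words) estimates p(moon | over the).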

• We must estimate this probability to construct an N-gram model.

• We compute this probability in two steps:

• Apply the chain rule of probability.

• Then apply a very strong simplification assumption that allows us to
compute p(w1...wn) in an easy manner.

• The chain rule of probability is:

p(w1...wn) = p(w1) . p(w2 | w1) . p(w3 | w1 w2) . p(w4 | w1 w2 w3) ..... p(wn | w1...wn-1)
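
For example, for a three-word sequence: p(the cow jumps) = p(the) . p(cow | the) . p(jumps | the cow).
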
• So what is the chain rule? It tells us how to compute the joint
probability of a sequence by using the conditional probability of a word
given previous words.

• But we do not have access to these conditional probabilities with
complex conditions of up to n-1 words. So how do we proceed?

• This is where we introduce a simplification assumption. We can assume,
for all conditions, that:

p(wk | w1...wk-1) = p(wk | wk-1)
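
A small Python sketch of how this bigram simplification is used to score a sentence (illustrative only; the start marker <s> and the probability values are made-up assumptions, not estimates from any corpus):

```python
# Sentence probability under the bigram simplification:
# p(w1...wn) is approximated by p(w1 | <s>) * p(w2 | w1) * ... * p(wn | wn-1).
bigram_prob = {
    ("<s>", "the"): 0.5,
    ("the", "cow"): 0.2,
    ("cow", "jumps"): 0.3,
    ("jumps", "over"): 0.4,
    ("over", "the"): 0.6,
    ("the", "moon"): 0.1,
}

def sentence_probability(words):
    prob = 1.0
    prev = "<s>"  # assumed start-of-sentence marker
    for w in words:
        prob *= bigram_prob.get((prev, w), 0.0)  # unseen bigrams get probability 0 here
        prev = w
    return prob

print(sentence_probability("the cow jumps over the moon".split()))
```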


Maximum Likelihood Estimation
● An intuitive way to estimate probabilities is called maximum likelihood
estimation or MLE.
● The MLE estimate for the parameters of an n-gram model is obtained by getting
counts from a corpus and normalizing the counts so that they lie
between 0 and 1.
● Maximum likelihood estimation involves defining a likelihood function for
calculating the conditional probability of observing the data sample
given a probability distribution and distribution parameters.

Maximum Likelihood Estimation
● PMLE(w1,...,wn) = C(w1,...,wn) / N, where C(w1,...,wn) is the frequency of the
n-gram w1,...,wn and N is the total number of training n-grams.
● PMLE(wn | w1,...,wn-1) = C(w1,...,wn) / C(w1,...,wn-1)
● This estimate is called the Maximum Likelihood Estimate (MLE) because it
is the choice of parameters that gives the highest probability to the
training corpus.

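A minimal sketch of the conditional estimate PMLE(wn | wn-1) = C(wn-1, wn) / C(wn-1) on a tiny made-up corpus (the corpus and whitespace tokenisation are assumptions for illustration):

```python
from collections import Counter

# Tiny illustrative corpus (invented, not from the course texts).
corpus = "the cow jumps over the moon the cow sleeps".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_mle(w, prev):
    """PMLE(w | prev) = C(prev, w) / C(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, w)] / unigram_counts[prev]

print(p_mle("cow", "the"))   # C(the, cow) / C(the) = 2 / 3
print(p_mle("moon", "the"))  # C(the, moon) / C(the) = 1 / 3
```
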
Smoothing
● What do we do with words that are in our vocabulary (they are not
unknown words) but appear in a test set in an unseen context (for
example they appear after a word they never appeared after in
training)?
● To keep a language model from assigning zero probability to these
unseen events, we’ll have to shave off a bit of probability mass from
some more frequent events and give it to the events we’ve never seen.
● This modification is called smoothing or discounting.

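The slide does not commit to a particular technique; as one common example, here is a sketch of add-one (Laplace) smoothing for bigram probabilities, using the same invented toy corpus as above (the corpus and vocabulary size V are assumptions for illustration):

```python
from collections import Counter

corpus = "the cow jumps over the moon the cow sleeps".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
V = len(unigram_counts)  # vocabulary size

def p_laplace(w, prev):
    """Add-one smoothed bigram probability: (C(prev, w) + 1) / (C(prev) + V)."""
    return (bigram_counts[(prev, w)] + 1) / (unigram_counts[prev] + V)

# A bigram never seen in training no longer gets zero probability:
print(p_laplace("jumps", "moon"))  # (0 + 1) / (1 + 6)
```
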
Entropy
● Entropy is a measure of information.
● Given a random variable X ranging over whatever we are predicting
(words, letters, parts of speech, the set of which we'll call χ) and with a
particular probability function, call it p(x), the entropy of the random
variable X is:

H(X) = - Σ p(x) log2 p(x), where the sum is over all x in χ

● The log can be computed in any base.
If we use log base 2, the resulting value of entropy will be measured in
bits.
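
A short sketch of this definition in Python (the coin distributions are made-up examples):

```python
import math

def entropy(p):
    """H(X) = -sum over x of p(x) * log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

print(entropy({"heads": 0.5, "tails": 0.5}))  # a fair coin: 1.0 bit
print(entropy({"heads": 0.9, "tails": 0.1}))  # a biased coin: about 0.47 bits
```
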
Reference:
Books:

TEXTBOOKS:
T1: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics,
and Speech Recognition by Daniel Jurafsky and James H. Martin
T2: Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
REFERENCE BOOKS:
R1: Handbook of Natural Language Processing, Second Edition, edited by Nitin Indurkhya and Fred J. Damerau
Course Link:
https://in.coursera.org/specializations/natural-language-processing

Video Link:
https://youtu.be/YVQcE5tV26s

Web Link:
https://www.tutorialspoint.com/natural_language_processing/natural_language_processing_tutorial.pdf

THANK YOU

For queries
Email:
[email protected]
