NPTEL NLP Assignment 2
Assignment-2
TYPE OF QUESTION: MCQ
Number of questions: 10 Total marks: 10 x 1 = 10
______________________________________________________________________________
QUESTION 1:
According to Zipf’s law, which statement(s) is/are correct?
(i) A small number of words occur with high frequency.
(ii) A large number of words occur with low frequency.
a. Both (i) and (ii) are correct
b. Only (ii) is correct
c. Only (i) is correct
d. Neither (i) nor (ii) is correct
Correct Answer: a
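Solution: Zipf’s law states that a word’s frequency is roughly inversely proportional to its
frequency rank, so a few words are very frequent while most words are rare.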
____________________________________________________________________________
QUESTION 2:
Consider the following corpus C1 of 4 sentences. What is the total count of unique bi-grams for
which the likelihood will be estimated? Assume we do not perform any pre-processing.
a. 24
b. 28
c. 27
d. 23
Correct Answer: a
Detailed Solution:
Unique bi-grams are:
(<s> tomorrow), (tomorrow is), (is Sachin’s), (Sachin’s birthday), (birthday <\s>)
(<s> he), (he loves), (loves cream), (cream chocolates), (chocolates <\s>)
(he is), (is also), (also fond), (fond of), (of sweet)
(cake <\s>)
(<s> we), (we will), (will celebrate), (celebrate his)
(his birthday), (birthday with), (with sweet), (chocolate cake)
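
Counting unique bi-grams can be sketched in a few lines of Python. This is a minimal illustration, not the assignment's own method: the function name and the two-sentence toy corpus are hypothetical (corpus C1 is not reproduced here), and the end marker is written as </s> in the code where this document writes <\s>.

```python
def unique_bigrams(sentences):
    """Return the set of distinct bi-grams, with sentence boundary markers added."""
    seen = set()
    for sentence in sentences:
        # No pre-processing: split on whitespace only and keep case as-is.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Pair each token with its successor.
        seen.update(zip(tokens, tokens[1:]))
    return seen


# Hypothetical toy corpus, NOT corpus C1 from the question.
toy_corpus = ["the cat sat", "the cat ran"]
print(len(unique_bigrams(toy_corpus)))  # 6
```

Bi-grams repeated across sentences, such as (<s> the) and (the cat) in the toy corpus, are counted once because they are stored in a set.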
______________________________________________________________________________
QUESTION 3:
A 4-gram model is a ___________ order Markov Model.
a. Two
b. Five
c. Four
d. Three
Correct Answer: d
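Detailed Solution: An N-gram model conditions each word on the previous N-1 words, so a 4-gram
model is a third-order Markov Model.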
______________________________________________________________________________
QUESTION 4:
d. The probability of a word depends only on the current and the previous word.
Correct Answer: b
Solution:
______________________________________________________________________________
QUESTION 5:
For the string ‘mash’, identify which of the following sets of strings has a Levenshtein distance
of 1 from it.
Correct Answer: c
Detailed Solution:
______________________________________________________________________________
QUESTION 6:
Assume that we modify the costs of the operations used in calculating Levenshtein distance,
such that insertion and deletion each incur a cost of 1, while substitution incurs a cost of 2.
Now, for the string ‘clash’, which of the following sets of strings will have an edit distance
of 1?
Correct Answer: d
Detailed Solution:
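
The effect of the modified costs can be checked with a small dynamic-programming sketch. This is a minimal illustration under stated assumptions: the function name and the test strings are hypothetical and are not the answer options of this question. With substitution costing 2, a string reaches distance 1 only through a single insertion or deletion.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Edit distance between two strings with configurable operation costs."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    # Turning a prefix of source into the empty string needs only deletions.
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    # Turning the empty string into a prefix of target needs only insertions.
    for j in range(1, cols):
        dist[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                      # delete from source
                dist[i][j - 1] + ins_cost,                      # insert into source
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # match or substitute
            )
    return dist[-1][-1]


# Hypothetical test strings, not the options from the question.
print(edit_distance("clash", "lash", sub_cost=2))   # 1 (one deletion)
print(edit_distance("clash", "crash", sub_cost=2))  # 2 (one substitution)
print(edit_distance("clash", "crash"))              # 1 (standard unit costs)
```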
____________________________________________________________________________
QUESTION 7:
Given a corpus C2, the Maximum Likelihood Estimation (MLE) for the bigram “dried berries” is
0.45 and the count of occurrence of the word “dried” is 720. For the same corpus C2, the likelihood
of “dried berries” after applying add-one smoothing is 0.05. What is the vocabulary size of C2?
a. 4780
b. 3795
c. 4955
d. 5780
Correct Answer: d
Detailed Solution:
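The MLE gives count(dried berries) = 0.45 x 720 = 324. Assuming the usual add-one estimate
P(w2 | w1) = (count(w1 w2) + 1) / (count(w1) + |V|), we have (324 + 1) / (720 + |V|) = 0.05,
so 720 + |V| = 6500 and |V| = 5780. A minimal Python sketch of the same arithmetic (the function name is illustrative, not from the assignment):

```python
from fractions import Fraction


def vocab_size_from_add_one(mle, prev_count, smoothed):
    """Solve (bigram_count + 1) / (prev_count + V) = smoothed for V."""
    # MLE = bigram_count / prev_count, so recover the bigram count first.
    bigram_count = mle * prev_count
    return (bigram_count + 1) / smoothed - prev_count


# Exact fractions avoid floating-point rounding in 0.45 and 0.05.
v = vocab_size_from_add_one(Fraction(45, 100), 720, Fraction(5, 100))
print(v)  # 5780
```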
______________________________________________________________________________
QUESTION 8:
Calculate P(they play in a big garden) assuming a bi-gram language model.
a. 1/8
b. 1/12
c. 1/24
d. None of the above
Correct Answer: b
Detailed Solution:
______________________________________________________________________________
QUESTION 9:
Considering the same model as in Question 8, calculate the perplexity of <s> they play in a big
garden <\s>.
a. 2.289
b. 1.426
c. 1.574
d. 2.178
Correct Answer: b
Detailed Solution:
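Perplexity is the inverse sentence probability normalised by the number of predicted tokens N:
PP = P^(-1/N). Taking P = 1/12 from Question 8 and N = 7 (six words plus the end marker; this
count is an assumption, chosen because it reproduces option b), PP = 12^(1/7), which is about 1.426. A minimal Python sketch with an illustrative function name:

```python
def perplexity(sentence_prob, n_tokens):
    """Perplexity as the inverse probability, normalised by the token count."""
    return sentence_prob ** (-1.0 / n_tokens)


# P = 1/12 is the answer to Question 8; N = 7 is assumed (6 words + end marker).
pp = perplexity(1 / 12, 7)
print(round(pp, 3))  # 1.426
```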
______________________________________________________________________________
QUESTION 10:
Assume that you are using a bi-gram language model with add-one smoothing. Calculate P(they
play in a beautiful garden).
a. 4.472 x 10^-6
b. 2.236 x 10^-6
c. 3.135 x 10^-6
d. None of the above
Correct Answer: b
Detailed Solution:
|V|=11
P(they | <s>) = (1+1)/(3+11)
P(play | they) = (1+1)/(1+11)
P(in | play) = (1+1)/(2+11)
P(a | in) = (1+1)/(1+11)
P(beautiful | a) = (0+1)/(2+11)
P(garden | beautiful) = (1+1)/(1+11)
P(<\s> | garden) = (3+1)/(3+11)
P(they play in a beautiful garden) = 2/14 x 2/12 x 2/13 x 2/12 x 1/13 x 2/12 x 4/14
= 2.236 x 10^-6
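
The product above can be re-checked with exact fractions. This is a minimal sketch: the counts are copied from the solution lines above (the corpus itself is not reproduced here), and the variable names are illustrative.

```python
from fractions import Fraction

V = 11  # vocabulary size, as given in the solution

# (bigram count, count of the preceding token), copied from the solution.
counts = [
    (1, 3),  # they | <s>
    (1, 1),  # play | they
    (1, 2),  # in | play
    (1, 1),  # a | in
    (0, 2),  # beautiful | a  (unseen bigram)
    (1, 1),  # garden | beautiful
    (3, 3),  # end marker | garden
]

prob = Fraction(1)
for bigram_count, prev_count in counts:
    # Add-one smoothing: add 1 to the bigram count and |V| to the context count.
    prob *= Fraction(bigram_count + 1, prev_count + V)

print(prob)                  # 1/447174
print(f"{float(prob):.3e}")  # 2.236e-06
```

The unseen bi-gram (a beautiful) still receives a non-zero probability of 1/13, which is why the sentence probability is small but not zero.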
____________________________________________________________________________
************END*******