Natural Language Processing (NLP): Important Questions and Answers (University of Mumbai)

Note: before use, check the answers against your syllabus.

Module 1 : Introduction to NLP.

Q1 Discuss the challenges in various stages of natural language processing.

Ans.

Text Preprocessing

 Tokenization: Breaking text into meaningful units (tokens) is challenging due to language-specific
rules, complex words, and variations in spacing and punctuation. For example, "New York-based"
should be kept together as one token, but simple tokenizers might split it incorrectly.
 Normalization: Converting text to a standardized format (e.g., lowercasing, stemming, or lemmatizing
words) is complex because it often depends on context. Words like "saw" can be either a verb (to
see) or a noun (a tool), requiring contextual understanding.

 Handling Stop Words: Removing words like "the," "is," or "and" can be problematic since they
sometimes carry meaning. For instance, negations like "not" significantly change sentence meaning,
so ignoring them could lead to errors.

 Noise and Error Handling: User-generated content, such as tweets, can have slang, typos, and
inconsistent grammar, making normalization difficult. Dealing with this noise is essential for
accurate downstream analysis.

Syntactic Analysis

 Part-of-Speech (POS) Tagging: Identifying the grammatical role of each word in a sentence is
challenging due to ambiguity and homographs (words spelled the same but with different
meanings). For example, "lead" can be a noun (metal) or a verb (to guide).

 Dependency Parsing: Determining syntactic dependencies (which words are subjects, objects, etc.) is
difficult when sentences are complex or have unusual structures. Ambiguities like "I saw the man
with a telescope" can have multiple interpretations, and parsers often struggle to disambiguate
them accurately.

 Grammar Variation Across Languages: Languages have vastly different grammatical structures, which
requires designing adaptable parsing algorithms. English is SVO (Subject-Verb-Object), but other
languages, like Japanese, follow SOV (Subject-Object-Verb), complicating syntactic analysis.

Semantic Analysis

 Word Sense Disambiguation (WSD): Many words have multiple meanings, and identifying the correct
one is essential for understanding. For example, "bank" could mean a financial institution or the side
of a river. WSD often relies on context, which is not always explicit or clear.



 Named Entity Recognition (NER): Detecting and categorizing entities like names, locations, and dates
is challenging due to ambiguities and diverse formats. For instance, "Apple" can be an entity
representing the company or a common noun referring to the fruit.

 Handling Figurative Language: Metaphors, sarcasm, and idioms are common in human language but
hard for machines to interpret. "Kick the bucket" meaning "to die" or "break the ice" meaning "to
start a conversation" require cultural and contextual knowledge to understand.

 Coreference Resolution: Resolving references to entities across sentences is difficult in complex text.
In the text "Alice went to the market. She bought apples," identifying that "she" refers to "Alice"
requires keeping track of entities and context.

Pragmatic Analysis

 Context and World Knowledge: Pragmatics involves understanding language in context, including the
speaker’s intent and background knowledge. For example, understanding "Can you pass the salt?" as
a request rather than a question requires context-sensitive interpretation.

 Dealing with Ambiguity and Implicit Meaning: Pragmatics often involves implied meanings that are not
explicitly stated. In the sentence "John ate an apple. He is healthy," there is an implicit
understanding that apples contribute to health, which NLP systems may miss without external
knowledge.

 Dialogue and Discourse Analysis: Maintaining context across sentences and turns in a conversation is
challenging, especially in multi-turn dialogues. Tracking what has been said before, handling
interruptions, and managing topic shifts requires sophisticated models that can “remember”
prior interactions.

Machine Translation

 Handling Idioms and Non-literal Expressions: Machine translation often struggles with idiomatic
expressions. Translating "piece of cake" directly would fail to capture its meaning ("easy task") in
the target language.

 Low-Resource Languages: NLP resources for widely spoken languages like English and Mandarin
are abundant, but many languages have limited data and resources, making it difficult to train
effective models.

 Grammatical Structure Differences: Differences in syntax, morphology, and semantics across languages pose significant challenges. For example, English uses prepositions to indicate relationships (e.g., "on the table"), whereas languages like Russian use case endings, requiring careful handling during translation.

 Maintaining Cultural and Contextual Nuances: Language often carries cultural and regional meaning
that is hard to convey directly in another language. Maintaining these nuances is critical for accurate,
context-sensitive translations.



Q2 Explain the applications of Natural Language Processing.

Ans.

1. Machine Translation: Automatically translating text from one language to another (e.g.,
Google Translate). Useful for bridging language barriers in global communication.
2. Sentiment Analysis: Identifying and categorizing opinions expressed in text, commonly used in
social media monitoring, customer feedback analysis, and brand reputation management.
3. Chatbots and Virtual Assistants: Enabling interactive, conversational agents like Siri, Alexa, and
customer service bots to assist users, answer questions, and perform tasks through natural
language.
4. Speech Recognition: Converting spoken language into text, used in transcription services, voice
search, and accessibility tools for voice-activated systems.
5. Text Summarization: Automatically generating concise summaries of large documents, useful for news
aggregation, legal document analysis, and summarizing research papers.
6. Information Retrieval: Searching and retrieving relevant information from large datasets (e.g.,
search engines like Google) based on user queries.
7. Named Entity Recognition (NER): Identifying and classifying named entities like names,
locations, organizations, etc., in text, used in information extraction and database indexing.
8. Text Classification: Automatically categorizing text into predefined categories (e.g., spam detection
in emails, topic categorization in news articles).
9. Optical Character Recognition (OCR): Converting scanned images of text into editable and searchable
text, widely used in digitizing printed documents, invoices, and receipts.
10. Question Answering Systems: Providing precise answers to user questions by searching large
knowledge bases or text corpora, commonly used in search engines and customer
service.
11. Recommendation Systems: Using NLP to analyze user reviews and preferences to
recommend products, movies, articles, etc., enhancing personalized experiences.

Q3 Explain the various stages of Natural Language Processing.

Ans.

1. Text Preprocessing

Text preprocessing involves cleaning and preparing raw text data for further analysis. This stage typically
includes tokenization (splitting text into words or phrases), removing stop words (common but
uninformative words like "the" or "and"), and normalizing text (standardizing words by lowercasing or
lemmatizing them). Preprocessing helps reduce noise and standardize text to improve the accuracy of
NLP tasks. Handling different languages, slang, or typos also adds complexity to this foundational step.

2. Syntactic Analysis (Parsing)

Syntactic analysis, or parsing, examines the grammatical structure of sentences. It identifies parts of speech
(e.g., nouns, verbs) and analyzes dependencies between words (e.g., subject-verb-object
relationships).
This structural analysis enables NLP systems to understand sentence construction and hierarchy.



Challenges include handling ambiguous structures and complex syntax, but syntactic parsing is
essential for applications like machine translation and grammar checking.

3. Semantic Analysis

Semantic analysis seeks to capture the meaning of text by interpreting word senses, phrases, and
sentence structures. Key techniques include word sense disambiguation (determining the correct
meaning of words with multiple interpretations) and named entity recognition (identifying entities like
names, locations, and organizations). This stage helps extract accurate meaning from text, a
requirement for applications like information retrieval, question answering, and summarization.

4. Pragmatic Analysis

Pragmatic analysis deals with understanding context and the intended meaning of text, going beyond
literal interpretation. This stage involves disambiguating references, identifying the speaker's intent, and
recognizing nuances in language like sarcasm or implied meaning. Pragmatics is crucial for dialogue
systems, as it enables them to interpret user queries within context and respond appropriately, often
relying on world knowledge and contextual clues.

5. Discourse Analysis

Discourse analysis focuses on interpreting connected pieces of text, understanding how sentences relate
to each other across paragraphs and documents. This stage includes tasks like coreference resolution
(linking pronouns or phrases to specific entities) and coherence analysis (ensuring logical flow
between ideas).
Discourse analysis is essential in summarization and document understanding, enabling systems
to comprehend and generate coherent, logically structured outputs.
6. Machine Translation

Machine translation converts text from one language to another, taking into account grammar,
semantics, and cultural nuances. This complex task goes beyond simple word-for-word translation,
requiring syntactic and semantic understanding to maintain accuracy and fluency. Challenges include
handling idiomatic expressions and maintaining context across sentences, making machine
translation a sophisticated NLP application dependent on various stages of language processing.

7. Text Generation and Summarization

In text generation, NLP systems create coherent and relevant text based on prompts or structured
inputs. Summarization condenses information, extracting key points while retaining essential meaning.
This stage often uses language models and involves generating summaries, articles, or conversational
responses. It’s essential for applications like news aggregation, report generation, and chatbots,
though challenges remain in maintaining factual accuracy, coherence, and relevance.

Module 2 : Word Level Analysis.

Q1 Explain and illustrate the working of the Porter stemmer algorithm.



Ans.

The Porter Stemmer algorithm is a rule-based approach for reducing words to their root or base form,
called the stem. Developed by Martin Porter in 1980, it’s commonly used in Natural Language Processing
(NLP) tasks like text preprocessing for information retrieval, where similar forms of a word need to be
matched (e.g., "running" and "run").

The algorithm applies a series of transformation rules in five sequential steps, modifying suffixes to
reduce words to their stems. These rules work by examining the suffixes and applying specific
conditions to determine if they should be removed or replaced. The steps of the Porter Stemmer are as
follows:

Step-by-Step Working of the Porter Stemmer Algorithm

1. Step 1: Removing Plurals and -ed, -ing Suffixes

 The first step handles suffixes like "-s," "-es," "-ed," and "-ing."

2. Step 2: Handling “-y” Suffix

 This step converts words ending in "y" to "i" if there’s a vowel earlier in the word.

3. Step 3: Handling Double Suffixes like “-ational” or “-izer”

 Step 3 applies transformations for suffixes commonly found in adjectives and nouns, like "-ational," "-izer," "-ization," etc.

4. Step 4: Removing Suffixes like “-al,” “-ance,” and “-ence”

 The fourth step handles suffixes often found in nouns and adjectives, like "-al," "-ence," "-ance," and "-able."

5. Step 5: Removing the Final “-e”

 In the final step, the algorithm removes a trailing "e" from words, but only if this doesn’t reduce
the word to fewer than three characters.

Example Illustrations

Let’s go through a few example words to see how the Porter Stemmer applies these steps:

1. Word: “caresses”

 Step 1: Ends with "-sses" → Remove "-es" → Result: "caress"

 Step 5: No further suffixes to remove

 Stemmed Word: "caress"

2. Word: “relational”

 Step 3: Ends with "-ational" → Replace with "-ate" → Result: "relate"

 Step 5: Trailing "-e" removed → Result: "relat"

 Stemmed Word: "relat"



3. Word: “happiness”

 Step 1: No change

 Step 3: Ends with "-ness" → Remove "-ness" → Result: "happi"

 Stemmed Word: "happi"

4. Word: “agreed”

 Step 1: Ends with "-ed" → Remove "-ed" → Result: "agree"

 Stemmed Word: "agree"
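The behaviour illustrated above can be reproduced with NLTK's ready-made Porter stemmer. The snippet below is a minimal sketch (it assumes the nltk package is installed); exact outputs can differ slightly between the original 1980 algorithm and NLTK's extended implementation.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Words from the examples above; stem() applies the rule steps in sequence.
for word in ["caresses", "relational", "happiness", "agreed", "running"]:
    print(word, "->", stemmer.stem(word))

# Typical results: "caresses" -> "caress", "happiness" -> "happi",
# "running" -> "run"; short words and trailing-"e" handling may vary
# slightly between implementations.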

Q2 Illustrate the concept of tokenization and stemming in Natural Language Processing.

Ans.

In Natural Language Processing (NLP), tokenization and stemming are two fundamental preprocessing
steps that help convert raw text into a more manageable format for further analysis. These techniques
are used to break down and simplify text to extract meaningful patterns, making it easier for NLP
models to process.

1. Tokenization

Tokenization is the process of splitting text into smaller units, known as tokens. These tokens can be words,
subwords, sentences, or even characters. Tokenization is important because it helps to transform
unstructured text into structured units that can be processed by algorithms.

Types of Tokenization:

 Word Tokenization: Splitting a sentence into individual words.

 Sentence Tokenization: Splitting a paragraph or text into individual sentences.

Example of Tokenization:

 Sentence: “Natural Language Processing is fun!”


 Word Tokenization: [“Natural”, “Language”, “Processing”, “is”, “fun”]
 Sentence Tokenization: [“Natural Language Processing is fun!”]

2. Stemming

Stemming is the process of reducing words to their root form (also called the stem). This helps in grouping related words that share the same base form but have different suffixes (e.g., "running" and "runs" both reduce to the stem "run"). Stemming removes prefixes or suffixes using predefined rules to reduce words to their core form.

Stemming is often implemented using algorithms like the Porter Stemmer. Example of
Stemming:



 “running” → “run”

 “better” → “better” (no change)

 “happiness” → “happi” (the stem “happi” is produced)

Illustration of Tokenization and Stemming

Let’s walk through a practical example, combining both tokenization and stemming.
Text: “The runners were running fast because they are happy.”

Step 1: Tokenization

We first tokenize the text into words (word tokenization).

 Tokens: ["The", "runners", "were", "running", "fast", "because", "they", "are", "happy"]

Step 2: Stemming

Next, we apply stemming to reduce words to their base forms.

 “runners” → “runner” (removing the plural suffix “-s”)

 “running” → “run” (removing the suffix “-ing”)

 “happy” → “happi” (Porter stemmer produces a non-standard form)

 Other words like “The,” “were,” “fast,” “because,” “they,” and “are” either remain unchanged or
aren’t stemmed because they are already in their base form.
Stemmed Tokens: ["The", "runner", "were", "run", "fast", "because", "they", "are", "happi"]
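This two-step pipeline can be sketched with NLTK as follows (assumes nltk is installed and the 'punkt' tokenizer models are downloaded; note that NLTK's Porter stemmer also lowercases its output):

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)  # tokenizer models, fetched once

text = "The runners were running fast because they are happy."

# Step 1: word tokenization
tokens = word_tokenize(text)

# Step 2: stem each token with the Porter stemmer
stemmer = PorterStemmer()
stems = [stemmer.stem(tok) for tok in tokens]

print(tokens)
print(stems)  # the inflected forms come out as "runner", "run" and "happi", as above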

Q3 Explain inflectional and derivational morphology with example.

Ans.

1. Inflectional Morphology

Inflectional morphology refers to the process of adding inflectional morphemes to a base word to express
grammatical features like tense, number, case, person, or gender. These morphemes do not change
the core meaning of the word but alter its grammatical role within a sentence.
Key Characteristics:

 Inflection does not change the word’s part of speech.

 It is primarily used to convey grammatical information such as tense, number, or possession.

 Inflectional forms are often predictable and follow standard rules.

Examples of Inflectional Morphology:

 Verb Tense:



o "run" (base form) → "ran" (past tense)

o "run" → "running" (present participle)

o "run" → "runs" (third-person singular present)

 Plural Nouns:

o "cat" → "cats" (plural)

o "child" → "children" (irregular plural)

 Possession:

o "dog" → "dog’s" (possessive form)

 Adjective Comparison:

o "big" → "bigger" (comparative)

o "big" → "biggest" (superlative)

2. Derivational Morphology

Derivational morphology involves adding derivational morphemes to a base word to create a new word
with a different meaning or a different part of speech. Unlike inflection, derivational morphemes can
change the meaning of a word and often change its syntactic category (e.g., from a noun to a verb, or
from an adjective to a noun).

Key Characteristics:

 Derivation often changes the word's part of speech.

 It can produce words with meanings that are more specific or have a new context.

 The resulting word may have a completely different grammatical role.

Examples of Derivational Morphology:

 Verb to Noun:

o "act" → "action" (derivational morpheme: "-ion")

o "decide" → "decision" (derivational morpheme: "-ion")

 Adjective to Noun:

o "happy" → "happiness" (derivational morpheme: "-ness")

o "active" → "activity" (derivational morpheme: "-ity")

 Noun to Adjective:

o "danger" → "dangerous" (derivational morpheme: "-ous")

o "beauty" → "beautiful" (derivational morpheme: "-ful")



 Adjective to Adverb:

o "quick" → "quickly" (derivational morpheme: "-ly")

o "careful" → "carefully" (derivational morpheme: "-ly")

Q4 Explain Good-Turing Discounting.

Ans.

Good-Turing Discounting is a statistical technique used to estimate the probability of unseen events in
probabilistic models, especially in the context of natural language processing and speech recognition. It is
particularly useful when dealing with situations where you have a frequency distribution of observed
events, and you want to estimate probabilities for events that have not been observed during training (i.e., zero-frequency events).

The Good-Turing discounting method is based on the observation that events that have been observed
once or a few times are more likely to appear again, even if they have low observed frequencies. The
core idea of Good-Turing is to adjust the frequency counts of observed events to account for the
possibility of unseen events.

In a typical probabilistic model, such as a language model, we estimate the probability of an event (like
the appearance of a word or a phrase) based on its frequency in the observed data. However, for
rare events or words that don’t appear in the data at all, direct probability estimation leads to zero
probability, which is not helpful, especially for tasks like language modeling, where unseen events can
be common.

Good-Turing Discounting corrects this by redistributing probability mass from the more frequent events to
the unseen ones.
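Concretely, if N_c is the number of distinct event types observed exactly c times and N is the total number of observations, the simple (unsmoothed) Good-Turing estimate replaces a raw count c with the adjusted count c* = (c + 1) * N_(c+1) / N_c, and reserves a total probability mass of N_1 / N for all unseen events. The sketch below applies this adjustment to a toy frequency distribution in plain Python; counts with no observed N_(c+1) are left unadjusted, mirroring the common practice of only discounting low counts.

from collections import Counter

# Toy corpus of observed events (e.g., words)
observations = ["the", "the", "the", "cat", "cat", "sat", "on", "mat"]
counts = Counter(observations)      # raw counts c(w)
N = sum(counts.values())            # total number of observations
Nc = Counter(counts.values())       # N_c: how many types occur exactly c times

def good_turing_count(c):
    """Adjusted count c* = (c + 1) * N_(c+1) / N_c (simple, unsmoothed form)."""
    if Nc.get(c + 1, 0) == 0:
        return c                    # leave higher counts unadjusted
    return (c + 1) * Nc[c + 1] / Nc[c]

print("P(unseen) =", Nc[1] / N)     # probability mass reserved for unseen events

for word, c in counts.items():
    print(word, "raw:", c, "adjusted:", round(good_turing_count(c), 3))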

Advantages of Good-Turing Discounting

 Handling Zero-Frequency Events: The primary advantage of Good-Turing discounting is its ability to
handle zero-frequency events, which is a common issue in natural language processing, where
many possible word combinations may never be observed in training.

 Improved Probability Estimation: It leads to more accurate probability estimates for rare or unseen
events by adjusting the frequency counts and redistributing probability mass.

 Works in Various Domains: Though originally developed for language models, Good-Turing discounting
has been applied to a wide variety of fields, such as speech recognition, computational biology,
and other domains involving probabilistic models.

Q5 Explain how the N-gram model is used in spelling correction.

Ans.



An N-gram model is a probabilistic language model that uses the frequency of word sequences (n-grams) to predict the likelihood of a word or phrase occurring in a given context. In the context of spelling
correction, an N-gram model can help identify and correct misspelled words by considering the
likelihood of a word or sequence of words based on the context in which they appear.

How the N-gram Model Works for Spelling Correction

Spelling correction using an N-gram model typically involves two main tasks: error detection (identifying
the misspelled word) and error correction (suggesting the correct word). The N-gram model helps in both
these tasks by leveraging the relationship between words and their surrounding context.

1. Error Detection

In the first step, the system needs to detect whether a word is misspelled or not. While traditional
spelling correction methods often rely on dictionaries (word lists) and simple rules, the N-gram model
uses the context of surrounding words to identify if a word is unusual or possibly misspelled.

 Contextual Analysis: The N-gram model works by analyzing the sequence of words in the
context (e.g., in a sentence). It compares the probability of the word's occurrence in a given
context.

For example, in a sentence like "The quick bown fox jumps over the lazy dog," the word "bown" is likely
identified as a misspelling because the context around it ("quick" and "fox") makes the word
"brown" more likely, even if "bown" itself is a valid word.

 Probabilistic Analysis: The model calculates the probability of a sequence of words appearing
together. If the probability of the word in the context is low (for example, because it's not common
in the language or doesn’t fit the pattern), it can be flagged as a possible spelling error.

2. Error Correction

Once a misspelled word is detected, the N-gram model can help suggest the correct word. This is
done by:

 Generating Candidate Words: For a detected misspelled word, the system generates a list
of possible candidate corrections. This could be done using various methods, such as:

o Edit distance: Suggesting words that are a small number of character changes away from
the misspelled word (e.g., "bown" → "brown").

o Phonetic similarity: Using techniques like Soundex or Metaphone to find words that
sound similar to the misspelled word.

 Contextual Scoring: The N-gram model evaluates the candidate words based on the surrounding
context. For each candidate, it calculates the probability of the sequence of words (including the
candidate word) appearing in the given context.

For example, in the sentence "The quick bown fox...", the N-gram model will compute the probability of
sequences like "quick brown fox" and "quick bown fox". The word "brown" will have a higher likelihood



(probability) given the context (because "quick brown fox" is a more common sequence in English than
"quick bown fox").

 Selecting the Best Candidate: The model selects the word that maximizes the likelihood of the
word sequence. If "brown" has a higher probability than "bown" in the context, it will be chosen as
the correct word.
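A minimal sketch of the contextual-scoring step with a toy bigram model is shown below. The counts and the candidate set are invented for illustration; a real system would estimate the counts from a large corpus and typically combine this language-model score with an error (edit-distance or channel) model.

from collections import Counter

# Toy counts, as if estimated from a corpus (illustrative numbers only)
bigram_counts = Counter({
    ("quick", "brown"): 120, ("brown", "fox"): 150,
    ("quick", "frown"): 2,   ("frown", "fox"): 1,
})
unigram_counts = Counter({"quick": 300, "brown": 200, "bown": 0, "frown": 10, "fox": 180})
V = len(unigram_counts)  # vocabulary size, used for add-one smoothing

def bigram_prob(w1, w2):
    # Add-one (Laplace) smoothing so unseen bigrams keep a small non-zero probability
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

def context_score(candidate, left, right):
    # Likelihood of the candidate between its left and right neighbours
    return bigram_prob(left, candidate) * bigram_prob(candidate, right)

# Candidate corrections for the flagged word "bown" in "quick ___ fox"
candidates = ["bown", "brown", "frown"]
best = max(candidates, key=lambda c: context_score(c, "quick", "fox"))
print(best)  # "brown" scores highest in this toy model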

Module 3: Syntax Analysis.

Q1 Discuss the challenges in part-of-speech (POS) tagging.

Ans.

1. Ambiguity in Word Meanings

Words can serve as multiple parts of speech depending on the context. For example, the word "bank"
can be a noun (a financial institution) or a verb (to rely on something). Deciding which POS to assign
in such cases requires a deep understanding of the context, making it one of the primary challenges
in POS tagging.

2. Polysemy

Polysemy refers to a single word having multiple meanings or uses, often based on the context in which it
appears. This is closely related to ambiguity but focuses on words that share the same form but have
distinct meanings that influence their POS tag.

3. Context Sensitivity

The correct POS tag for a word is heavily dependent on its surrounding words, which can introduce
complexity. Words may change their role based on syntactic context, and this requires the tagger to
use contextual cues effectively, making accurate tagging challenging.

4. Word Segmentation and Tokenization

In some languages, such as Chinese or Japanese, there is no clear boundary between words (i.e., no
spaces between words), which makes tokenization and POS tagging much more difficult. Accurately
segmenting words in these languages is a prerequisite for correct POS tagging.

5. Handling Out-of-Vocabulary (OOV) Words

POS taggers typically rely on a predefined vocabulary of words. However, when encountering a word
that was not present during training (an out-of-vocabulary or OOV word), it is difficult to assign the
correct POS tag. This challenge is particularly common in domains like social media or informal speech,
where new words, abbreviations, or slang are frequently used.

6. Ambiguity of Word Boundaries

Some languages, like German, have compound words where the boundaries between individual
components are unclear, making POS tagging difficult. These compounds can be interpreted in
different ways depending on how they are split.



7. Language-Specific Challenges

Different languages exhibit different syntactic structures, which means POS tagging algorithms must
be tailored to the specific linguistic characteristics of the language. For example, languages like English
have relatively fixed word order (subject-verb-object), while languages like Japanese or Arabic have
free word order, which complicates POS tagging.

8. Sparse Data and Rare Tags

Some words may occur only a few times in the training data, leading to sparse data problems. Rare or
unseen POS tags, especially in languages with rich morphology, make it hard for POS taggers to
generalize well across all possible word forms or usage scenarios.

9. Ambiguity in Multi-Word Expressions (MWEs)

Some phrases or expressions have a fixed syntactic or semantic meaning, and their correct POS
tagging depends on the entire expression rather than individual words. Identifying and tagging these
multi-word expressions can be difficult.

10. Errors in Training Data

The accuracy of a POS tagger largely depends on the quality of the annotated training data. If the
training data contains errors or inconsistencies in POS tagging, these errors can propagate and lead to
incorrect tagging when the model is applied to new text.
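To make the ambiguity and context-sensitivity problems concrete, the short sketch below tags a sentence in which "refuse" and "permit" each occur once as a verb and once as a noun (it assumes nltk is installed with the 'punkt' and 'averaged_perceptron_tagger' resources downloaded):

import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "They refuse to permit us to obtain the refuse permit"
tokens = nltk.word_tokenize(sentence)

# The same surface forms receive different tags depending on context:
# "refuse"/"permit" as verbs after "to", and as nouns after "the".
print(nltk.pos_tag(tokens))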

Q2 Demonstrate the concept of conditional random field in NLP.

Ans.

A Conditional Random Field (CRF) is a type of discriminative probabilistic model commonly used in
structured prediction tasks in Natural Language Processing (NLP). It is particularly useful for tasks where
the goal is to predict a sequence of labels (tags) for a sequence of input data, such as part-of-speech
tagging, named entity recognition (NER), and chunking.

CRFs are useful in NLP because they model the conditional probability of a label sequence given an observed input sequence. Unlike generative models such as Hidden Markov Models (HMMs),
which generate both the data and the labels, CRFs focus only on predicting the labels conditioned on
the observed sequence, making them more flexible and powerful for many structured prediction
tasks.

Key Concepts in CRF

1. Conditional Probability Model:

 A CRF directly models the conditional probability P(Y|X) of the output label sequence Y given an input sequence X, where Y is a sequence of labels and X is a sequence of observed features (e.g., words in a sentence).



 CRFs are called discriminative because they model the decision boundary between different label sequences, directly focusing on P(Y|X) instead of modeling the joint distribution P(X, Y).

2. Structured Prediction:

 In NLP, the label space is often structured (e.g., a sequence of labels), and CRFs are
designed to handle this by capturing dependencies between neighboring labels. This makes
them suitable for sequence labeling tasks where the label of one token depends on the labels of
neighboring tokens.

3. Markov Assumption:

 CRFs assume that the labels depend on a small context, typically the current token and its
neighboring tokens, thus making them more effective in capturing dependencies between
labels. This is similar to Hidden Markov Models (HMMs), but CRFs allow for more complex
dependencies between the labels.
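A minimal sketch of a CRF sequence labeller using the third-party sklearn-crfsuite package is shown below. The tiny training set and the feature template are illustrative assumptions, not a realistic configuration.

import sklearn_crfsuite

def word_features(sent, i):
    """Feature dictionary for token i: the word itself plus simple context cues."""
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_capitalized": word[0].isupper(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training data: two tagged sentences (illustrative only)
sentences = [["The", "dog", "barks"], ["A", "cat", "sleeps"]]
labels = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]

X_train = [[word_features(s, i) for i in range(len(s))] for s in sentences]
y_train = labels

# The CRF learns P(label sequence | feature sequence), including transition
# weights between neighbouring labels.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

test = ["The", "bird", "sings"]
print(crf.predict([[word_features(test, i) for i in range(len(test))]]))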

Advantages of CRFs in NLP

1. Contextual Dependencies: CRFs model the dependencies between adjacent labels, which helps
capture syntactic relationships and contextual information, resulting in more accurate
predictions.

2. Flexible Feature Engineering: CRFs allow for the use of a wide range of features (e.g., word-
level, character-level, lexicon-based features), making them highly customizable for different
tasks.

3. Discriminative Nature: Unlike generative models like Hidden Markov Models (HMMs), CRFs are discriminative, meaning they directly model the conditional probability P(Y|X), which typically leads to better performance.

Q3 Explain hidden Markov model for POS based tagging.

Ans.

The Hidden Markov Model (HMM) is a statistical model that can be effectively applied to Part-of-
Speech (POS) tagging in Natural Language Processing (NLP). POS tagging involves assigning a
grammatical category (such as noun, verb, adjective, etc.) to each word in a sentence, and HMM is
one of the classical techniques used for this sequence labeling task.

An HMM is a type of probabilistic model that assumes the presence of hidden (latent) states which can
only be observed through certain outputs. In the case of POS tagging, the hidden states are the POS
tags, and the observations are the words in the sentence.

Key Components of HMM for POS Tagging

1. States: The hidden states correspond to the possible POS tags (e.g., NN for noun, VB for verb, JJ
for adjective, etc.).

2. Observations: The observations are the words in the sentence. For each word, we need to assign a
POS tag, but we only observe the word (not the tag directly), making the tag “hidden” from us.



3. Transition Probabilities (A): These are the probabilities of transitioning from one state (POS tag) to
another. They represent how likely it is for a certain POS tag to follow another tag in a sentence.

4. Emission Probabilities (B): These represent the probability of a word being associated with a particular
POS tag. It reflects how likely a given word is generated by a specific POS tag.

5. Initial Probabilities (π): These represent the probability distribution over the starting POS tags in
the sentence (i.e., the tag of the first word in the sequence).
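The sketch below shows how these components combine to score a tag sequence. The toy probabilities are invented for illustration, and the brute-force search simply picks the sequence that maximizes the joint probability; the Viterbi algorithm computes the same result efficiently with dynamic programming.

from itertools import product

tags = ["NN", "VB"]
sentence = ["time", "flies"]

# Toy HMM parameters (illustrative numbers only)
pi = {"NN": 0.6, "VB": 0.4}                        # initial probabilities
A = {("NN", "NN"): 0.3, ("NN", "VB"): 0.7,         # transition probabilities
     ("VB", "NN"): 0.6, ("VB", "VB"): 0.4}
B = {("NN", "time"): 0.6, ("VB", "time"): 0.1,     # emission probabilities
     ("NN", "flies"): 0.4, ("VB", "flies"): 0.5}

def joint_prob(tag_seq):
    """P(tags, words) = pi(t1) * B(t1, w1) * product of A(t_prev, t_i) * B(t_i, w_i)."""
    p = pi[tag_seq[0]] * B[(tag_seq[0], sentence[0])]
    for i in range(1, len(sentence)):
        p *= A[(tag_seq[i - 1], tag_seq[i])] * B[(tag_seq[i], sentence[i])]
    return p

best = max(product(tags, repeat=len(sentence)), key=joint_prob)
print(best, joint_prob(best))   # ('NN', 'VB') is the most likely tagging here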

Advantages of HMM for POS Tagging

1. Simplicity: HMMs are relatively simple to implement and understand, making them a good
starting point for sequence labeling tasks like POS tagging.

2. Captures Sequential Dependencies: HMMs model the dependencies between neighboring words in a
sentence through transition probabilities.

3. Effective for Small Datasets: HMMs work well for smaller labeled datasets, especially when training
data is limited.

Disadvantages of HMM for POS Tagging

1. Independence Assumption: HMMs assume that the current word’s tag depends only on the previous
tag, ignoring longer-range dependencies. This can limit the model's ability to capture more
complex syntactic structures.

2. Limited Feature Set: HMMs use only the word’s context (previous and next tags) for tagging,
which restricts the range of features they can consider (e.g., morphological features, word
clusters).

3. Poor Performance with Sparse Data: HMMs can struggle when faced with unseen words or rare
tags, as they rely on previously observed tag sequences for estimating probabilities.

Q4 What are rule-based and stochastic part-of-speech taggers?

Ans.

Part-of-Speech (POS) tagging is a crucial task in Natural Language Processing (NLP) where the goal is to
assign a grammatical category (like noun, verb, adjective, etc.) to each word in a sentence. There are
two main types of POS taggers based on how they approach the task: rule-based POS taggers and
stochastic POS taggers. Each type has its own methodology, advantages, and limitations.

Rule-Based POS Taggers

A rule-based POS tagger assigns POS tags to words in a sentence based on predefined linguistic
rules. These rules typically involve the structure and context of the sentence to determine the
appropriate tag for each word. Rule-based taggers rely heavily on linguistic knowledge and
require extensive manual effort to craft the rules.

How Rule-Based Taggers Work:



 Lexicon: A rule-based POS tagger typically has a lexicon (or dictionary) of words with possible
POS tags.

 Transformation Rules: These are patterns or heuristics that are applied to decide the correct tag.
For example, a rule might state: "If a word ends in 'ing', it is most likely a verb (VBG)" or "If a word
is preceded by an article (e.g., 'the'), it is likely a noun."

 Contextual Clues: Rule-based taggers often consider the context of a word within a sentence. For
example, the word "can" could be tagged as a verb ("can do") or as a noun ("a tin can").
Rules help disambiguate such cases based on neighboring words.
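A rule-based tagger of this kind can be sketched with NLTK's RegexpTagger, which applies hand-written suffix patterns in order; the pattern list below is a small illustrative subset, not a complete rule set.

import nltk

# Hand-crafted suffix rules, tried in order; the last pattern is a fallback.
patterns = [
    (r".*ing$", "VBG"),    # gerunds: running, walking
    (r".*ed$", "VBD"),     # simple past: walked, played
    (r".*ly$", "RB"),      # adverbs: quickly
    (r".*'s$", "NN$"),     # possessive nouns
    (r".*s$", "NNS"),      # plural nouns
    (r"^-?[0-9]+$", "CD"), # numbers
    (r".*", "NN"),         # default: everything else is tagged as a noun
]

tagger = nltk.RegexpTagger(patterns)
print(tagger.tag("The dogs were quickly running home".split()))
# Words not covered by a specific rule fall through to "NN" (e.g. "The"),
# which is exactly why practical rule-based taggers need large rule sets.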

Stochastic POS Taggers

A stochastic POS tagger relies on statistical models to assign POS tags to words in a sentence.
Unlike rule-based taggers, which depend on handcrafted rules, stochastic taggers use data-
driven approaches to learn how words are tagged based on probabilistic models. The most
common stochastic taggers use methods like Hidden Markov Models (HMM) or Maximum Entropy
Models.

How Stochastic Taggers Work:

 Probabilistic Models: Stochastic taggers estimate the likelihood of a POS tag for a given word
based on statistical patterns observed in a training corpus. These models use:

o Emission Probabilities: The probability of a word given a specific POS tag (e.g., P(word | tag)).

o Transition Probabilities: The probability of a POS tag following another tag (e.g., P(tag2 | tag1)).

 Training: Stochastic taggers learn these probabilities from a large annotated corpus (a text corpus
where each word is tagged with its correct POS). The model "learns" the likelihood of various
tags occurring in the context of surrounding words.

 Decoding: Once trained, stochastic models can assign tags to new sentences by using
algorithms like the Viterbi Algorithm (for sequence tagging tasks) to find the most likely sequence
of tags given the observed words.

Q5 Explain Maximum Entropy Model for POS Tagging.

Ans.

The Maximum Entropy Model (MEM) is a probabilistic framework used for solving classification tasks in
Natural Language Processing (NLP), including Part-of-Speech (POS) tagging. The core idea behind MEM is
that, when we have incomplete information about the world (i.e., in the case of POS tagging, we do not
know the true tags for words in a sentence), we should make predictions that are as uncertain (or
"uniform") as possible, unless we have strong evidence to the contrary. This helps ensure that no
unnecessary assumptions are made. The principle of maximum entropy essentially states that we should
choose the probability distribution that has the highest entropy (i.e., the least bias), subject to the
constraints that we know.



Key Concepts of Maximum Entropy Models

1. Entropy: In information theory, entropy is a measure of uncertainty or unpredictability. The higher the
entropy, the more uncertain the system. For example, if you have a perfectly balanced coin, the
entropy of that system is high because you have no reason to prefer heads or tails. In contrast, a
biased coin with a higher chance of landing on heads would have lower entropy.

2. Maximum Entropy Principle: The principle of maximum entropy states that, when trying to predict
something (such as a POS tag), we should choose the probability distribution that maximizes
entropy, subject to the constraints provided by the available data. In other words, we should
avoid making assumptions beyond the information already provided by the data. This helps
prevent overfitting and ensures that the model is as general as possible.

3. Features: The features used in a Maximum Entropy model for POS tagging are typically based on the
context of a word within a sentence. These could include the surrounding words, word morphology
(e.g., suffixes), position in the sentence, and other linguistic cues that might help determine the
correct tag.

4. Conditional Probability: In POS tagging, we are interested in the conditional probability of a word's
POS tag given its context.
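The sketch below shows the form of the model: the conditional probability of a tag is a normalized exponential (softmax) of a weighted sum of binary feature functions. The features and weights here are invented for illustration; in practice the weights are learned from an annotated corpus (e.g. by iterative scaling or gradient-based optimization).

import math

TAGS = ["NN", "VB", "JJ"]

def features(word, prev_word, tag):
    """A tiny illustrative set of binary feature functions f(context, tag)."""
    return {
        ("suffix=ing", tag): word.endswith("ing"),
        ("prev=the", tag): prev_word == "the",
        ("is_capitalized", tag): word[0].isupper(),
    }

# Hypothetical learned weights for (feature, tag) pairs
weights = {
    ("suffix=ing", "VB"): 2.0,
    ("prev=the", "NN"): 1.5,
    ("is_capitalized", "NN"): 0.5,
}

def p_tag_given_context(word, prev_word):
    """P(tag | context) = exp(sum_i w_i * f_i(context, tag)) / Z  (log-linear form)."""
    scores = {}
    for t in TAGS:
        active = [k for k, on in features(word, prev_word, t).items() if on]
        scores[t] = math.exp(sum(weights.get(k, 0.0) for k in active))
    Z = sum(scores.values())
    return {t: s / Z for t, s in scores.items()}

print(p_tag_given_context("running", "was"))  # "VB" receives most of the mass
print(p_tag_given_context("table", "the"))    # "NN" receives most of the mass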

Advantages of Maximum Entropy Models for POS Tagging

1. Flexibility: MEM can handle a wide variety of feature types, including both local (e.g., word and
suffix) and global features (e.g., the POS tags of neighboring words), making it highly adaptable.

2. No Assumptions About Feature Interactions: MEM does not assume independence between features,
unlike models such as Naive Bayes. This flexibility allows it to model complex feature interactions.
3. Generalization: The model generalizes well because it does not rely on overly specific
assumptions, reducing the risk of overfitting.

4. Continuous Model: It can be adapted to new linguistic data without significant changes to the
underlying structure, as it can continue learning from new examples.

Q6 Explain the use of Probabilistic Context Free Grammar (PCFG) in Natural Language Processing.

Ans.

A Probabilistic Context-Free Grammar (PCFG) is an extension of Context-Free Grammar (CFG) used in Natural Language Processing (NLP) to model the syntactic structure of languages. It assigns probabilities to the production rules in a CFG, allowing it to handle ambiguity by making probabilistic choices about which syntactic structure is most likely for a given sentence.

In a Context-Free Grammar, the structure of a language is defined using a set of production rules,
where each rule expresses how a non-terminal symbol can be replaced with a sequence of terminal or
non-terminal symbols. A Probabilistic Context-Free Grammar builds on this by assigning probabilities to these production rules, which enables the model to choose the most likely parse when multiple parses are possible.
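A minimal sketch with NLTK is shown below: a toy grammar whose rules and probabilities are invented for illustration, parsed with the Viterbi parser, which returns the most probable tree for the classic ambiguous sentence. The probabilities of all rules sharing a left-hand side must sum to 1.

import nltk

grammar = nltk.PCFG.fromstring("""
    S   -> NP VP                                   [1.0]
    NP  -> 'I' [0.4] | Det N [0.4] | Det N PP [0.2]
    Det -> 'the' [0.5] | 'a' [0.5]
    N   -> 'man' [0.5] | 'telescope' [0.5]
    VP  -> V NP [0.6] | V NP PP [0.4]
    V   -> 'saw'                                   [1.0]
    PP  -> P NP                                    [1.0]
    P   -> 'with'                                  [1.0]
""")

parser = nltk.ViterbiParser(grammar)
tokens = "I saw the man with a telescope".split()

# The parser scores every parse licensed by the grammar and returns the most
# probable one, resolving the PP-attachment ambiguity probabilistically.
for tree in parser.parse(tokens):
    print(tree)
    print("probability:", tree.prob())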



Applications of PCFG in NLP

1. Syntactic Parsing:

 PCFGs are primarily used in syntactic parsing, where the goal is to generate the syntactic
structure (parse tree) of a given sentence. A PCFG helps to efficiently find the most likely parse
tree by selecting the production rules with the highest probabilities.

2. Disambiguation in Parsing:

 Sentences with multiple possible structures are common in natural language. A PCFG helps in
disambiguating these structures by providing probabilities to guide the parser toward the most
likely syntactic interpretation. This is particularly useful for handling ambiguity in sentence
structure.

3. Language Modeling:

 PCFGs can also be used to improve language modeling, especially in tasks that require both
syntactic and probabilistic considerations, like machine translation, information extraction,
or speech recognition.

4. Ambiguity Handling:

 In word sense disambiguation (WSD) and other NLP tasks, the probabilistic nature of PCFGs
can help select the most likely syntactic interpretation based on context. This is essential for tasks
like part-of-speech tagging and named entity recognition, where the same word might have different meanings or grammatical roles depending on its context.

5. Improved Parsing with Probabilistic Information:

 While CFGs are useful for representing the structure of a language, they may not perform well
when multiple structures are valid. By adding probabilities to the production rules, PCFGs provide
a richer and more flexible framework for parsing sentences in real-world applications.

Advantages of PCFGs

1. Disambiguation: The primary advantage of a PCFG over a regular CFG is its ability to resolve
syntactic ambiguity by selecting the most probable parse based on training data.

2. Data-Driven: PCFGs learn the probabilities of the production rules from annotated training corpora.
This makes them adaptable to different languages and domains without requiring manual rule
crafting.
3. Efficiency: In parsing tasks, PCFGs help reduce the search space by assigning probabilities to
various possible parse trees, making it easier to select the most likely tree without exhaustively
checking all possibilities.

4. Handling Complex Sentences: PCFGs can handle more complex syntactic structures effectively,
especially when sentences are long or involve deep nesting of phrases, by using probabilistic
measures to prefer more natural or common structures.



Module 4: Semantic Analysis.

Q1 Explain with suitable example the following relationships between word meanings:
Hyponymy, Hypernymy, Meronymy, Holonymy.

Ans.

In linguistics, words can have various kinds of semantic relationships with each other. These relationships
help define how words are connected or related in meaning. Some of these relationships are hyponymy,
hypernymy, meronymy, and holonymy. Below is an explanation of each relationship, along with
examples.

1. Hyponymy (Specificity Relationship)

 Definition: Hyponymy is the relationship between a more specific term (hyponym) and a
more general term (hypernym). A hyponym represents a subclass or a specific instance of
a broader category denoted by the hypernym.

 Example:

o Hyponym: Poodle

o Hypernym: Dog

In this case, Poodle is a specific type of dog. The word dog is the more general term, and Poodle is a
more specific instance under that category.

2. Hypernymy (Generalization Relationship)

 Definition: Hypernymy is the opposite of hyponymy. A hypernym is a general term that covers a
broad category, which includes more specific terms (hyponyms). It is a broader concept
under which many specific items (hyponyms) can be classified.

 Example:

o Hypernym: Vehicle

o Hyponym: Car

In this case, Vehicle is a more general term, and Car is a specific example of a vehicle. Vehicle
encompasses all kinds of vehicles like cars, trucks, motorcycles, etc.

3. Meronymy (Part-Whole Relationship)

 Definition: Meronymy refers to the relationship between a part (meronym) and the whole
(holonym). A meronym is a word that denotes a part of something, while a holonym refers to
the whole entity that the part belongs to. This relationship captures how objects or concepts are
composed of smaller parts.

 Example:

o Meronym: Wheel



o Holonym: Car

In this case, Wheel is a part of a Car. A car is composed of several parts, one of which is the wheel.

4. Holonymy (Whole-Part Relationship)

 Definition: Holonymy is the opposite of meronymy. A holonym is a term that refers to the
whole entity that is made up of various parts, while a meronym is a term for a part of the
whole. This relationship illustrates how a whole is composed of multiple parts.

 Example:

o Holonym: Car

o Meronym: Wheel

Here, Car is the whole, and Wheel is one of the parts that make up the car.

Summary of Relationships:

 Hyponymy: a specific term (hyponym) under a broader category (hypernym). Example: Poodle (hyponym) → Dog (hypernym).

 Hypernymy: a general term (hypernym) that includes specific examples (hyponyms). Example: Vehicle (hypernym) → Car (hyponym).

 Meronymy: a part (meronym) of a whole (holonym). Example: Wheel (meronym) → Car (holonym).

 Holonymy: a whole (holonym) made up of parts (meronyms). Example: Car (holonym) → Wheel (meronym).
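These relations can be queried directly from WordNet through NLTK; the sketch below assumes nltk is installed and the 'wordnet' corpus has been downloaded.

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

dog = wn.synset("dog.n.01")
car = wn.synset("car.n.01")

print(dog.hyponyms()[:5])        # more specific kinds of dog (hyponyms), e.g. poodle
print(dog.hypernyms())           # broader categories (hypernyms), e.g. canine
print(car.part_meronyms()[:5])   # parts of a car (meronyms), e.g. car door
print(wn.synset("wheel.n.01").part_holonyms())  # wholes a wheel belongs to (holonyms)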

Q2 Explain the Lesk algorithm for Word Sense Disambiguation.

Ans.

The Lesk algorithm is a classic algorithm used for Word Sense Disambiguation (WSD) in Natural Language
Processing (NLP). The goal of WSD is to determine the correct meaning (sense) of a word based on its
context in a sentence. Since many words have multiple meanings (polysemy), choosing the right
sense from a dictionary or lexical resource is crucial for understanding text correctly.

The Lesk algorithm is based on the idea that the correct sense of a word is the one that has the most
overlap (in terms of shared words or definitions) with its surrounding context. The algorithm uses
dictionary definitions (often taken from WordNet or similar lexical databases) to determine which sense of
the word is most likely in a particular context.

Steps of the Lesk Algorithm



1. Extract the Context:

 Given a word with multiple senses (meanings), the first step is to extract the context of the word.
Typically, this involves the surrounding words in a sentence or a window of words around the
target word.

 For example, in the sentence "I went to the bank to deposit money." the target word is "bank,"
which could refer to a financial institution or the side of a river.

2. Define the Senses:

 For the target word, the algorithm gathers all possible senses (meanings) from a lexical resource
like WordNet. Each sense comes with a dictionary definition and possibly some example usage.

 In the case of "bank," there might be two senses:

o Sense 1: A financial institution.

o Sense 2: The side of a river.

3. Compute the Overlap:

 For each sense of the word, the algorithm compares the definitions (or glosses) of that sense
with the context words. The key idea is that the sense that shares the most common words with
the context is the most likely sense.

 Overlap refers to the number of common words between the context and the definition of the
sense.

o For example, if the context includes the words "deposit" and "money," the definition for the
financial institution sense of "bank" may have a high overlap with words like "money" or
"financial."

4. Choose the Sense with Maximum Overlap:

 After calculating the overlap for each sense, the algorithm selects the sense that has the
maximum overlap with the context. This sense is then assumed to be the correct meaning in the
given context.

5. Return the Disambiguated Sense:

 The word's sense with the highest overlap is returned as the final result.
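NLTK ships a simplified implementation of this procedure; a minimal sketch (assuming the 'wordnet' corpus and 'punkt' models are downloaded):

import nltk
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)
nltk.download("punkt", quiet=True)

context = word_tokenize("I went to the bank to deposit money.")

# lesk() compares the gloss of each noun sense of "bank" with the context
# words and returns the synset whose gloss overlaps the most.
sense = lesk(context, "bank", pos="n")
print(sense, "-", sense.definition())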

Q3 Describe the semantic analysis in Natural Language Processing.

Ans.

Semantic analysis in NLP refers to the process of extracting meaning from text. While syntactic analysis
focuses on the structure of sentences (e.g., parsing sentence trees), semantic analysis focuses on
understanding the meaning of words, phrases, sentences, and larger text units. It is a crucial part of NLP,
as it enables machines to understand and process human language in a way that is closer to human
comprehension.



The goal of semantic analysis is to convert unstructured text into a structured representation that a
machine can process and reason about. This includes resolving ambiguities, understanding word
meanings, relationships between concepts, and extracting useful information from the text.

Key Concepts in Semantic Analysis

1. Word Sense Disambiguation (WSD):

 Definition: WSD refers to the process of determining which meaning of a word is used in a
given context when the word has multiple meanings (polysemy).

 Example: The word "bank" can refer to a financial institution or the side of a river. WSD helps
choose the correct sense based on the surrounding context (e.g., "I went to the bank to deposit
money" indicates the financial institution sense).

2. Named Entity Recognition (NER):

 Definition: NER involves identifying and classifying key entities in text, such as persons,
organizations, locations, dates, etc.

 Example: In the sentence "Barack Obama was born in Honolulu," NER would identify "Barack Obama" as a Person and "Honolulu" as a Location (see the code sketch after this list).

3. Coreference Resolution:

 Definition: Coreference resolution identifies when different expressions in a text refer to the
same entity. This is important for understanding the relationships between words and phrases.

 Example: In the sentence "John went to the store. He bought some milk," "He" refers to "John,"
and coreference resolution links the two.

4. Sentiment Analysis:

 Definition: Sentiment analysis determines the sentiment or opinion expressed in a piece of text, typically categorizing it as positive, negative, or neutral.

 Example: In the sentence "I love this phone, it's amazing!" the sentiment is positive.

5. Word Sense Induction (WSI):

 Definition: WSI is the task of automatically discovering the possible senses of a word from its
usage in a large corpus of text without pre-labeled senses.

 Example: For the word "bank," WSI might find different senses based on the contexts in which
"bank" appears in large corpora (e.g., financial sense, riverbank sense).

6. Semantic Role Labeling (SRL):

 Definition: SRL assigns roles to constituents in a sentence based on the verb. These roles include
Agent, Theme, Goal, etc., which describe the participants in an event.

 Example: In the sentence "John gave Mary a gift," the roles could be:



o Agent: John

o Recipient: Mary

o Theme: gift

7. Word Embeddings and Distributional Semantics:

 Definition: Word embeddings represent words as vectors in a continuous vector space, where similar words have similar vector representations. This helps capture semantic similarity and relationships between words.

 Example: Words like "king" and "queen" are represented as vectors that are close to each
other, reflecting their semantic similarity. Similarly, "king" and "man" might be related in a
different way compared to "king" and "woman."
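As an example of concept 2 above (NER), the sketch below uses spaCy; it assumes the spacy package and its small English pipeline are installed (python -m spacy download en_core_web_sm).

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama was born in Honolulu.")

# Each detected entity carries a text span and a label such as PERSON or GPE.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)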

Q4 Demonstrate Lexical semantic analysis using an example.

Ans.

Lexical semantic analysis involves the study of word meanings and their relationships within language.
It focuses on understanding how individual words carry meaning, how words can have multiple
meanings (polysemy), and how words relate to each other in terms of synonyms, antonyms,
hyponyms, hypernyms, etc.

To demonstrate lexical semantic analysis, we’ll look at an example where we analyze the meanings and
relationships of words in a given context.

Example Sentence:

"The bat flew out of the cave at dusk."

In this sentence, the word "bat" is ambiguous, as it could have two meanings:

1. Bat (the animal) - A small flying mammal.

2. Bat (the sports equipment) - A piece of equipment used in games like baseball or cricket.
To analyze the word "bat" in this sentence, we need to perform lexical semantic analysis.
Steps in Lexical Semantic Analysis

Step 1: Identify Potential Word Senses (Word Sense Disambiguation)

The first task is to identify the potential meanings of the ambiguous word “bat” using a lexical resource
such as WordNet.

 Sense 1: Bat (animal):

o Meaning: A small, nocturnal flying mammal, often found in caves.

o Example: "The bat flew out of the cave at dusk."



 Sense 2: Bat (sports equipment):

o Meaning: A piece of equipment used to hit a ball, typically in baseball or cricket.

o Example: "He hit the ball with the bat."

Step 2: Contextual Clues for Disambiguation

Now, we examine the context to choose the correct meaning of "bat" in this specific sentence. We
analyze the surrounding words for clues:

 "The bat flew out of the cave at dusk."

o The words "cave" and "dusk" provide important clues:

 Cave: The bat is likely referring to the animal because bats are commonly
associated with caves.

 Dusk: Bats are nocturnal creatures that typically emerge at dusk, which further supports
the idea that the "bat" here is the animal.

Thus, Sense 1 (bat as an animal) is the most appropriate meaning in this context.
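Steps 1 and 2 can be reproduced with WordNet through NLTK: the sketch below simply lists the candidate senses of "bat" so that the context clues above can be matched against their glosses (assumes the 'wordnet' corpus is downloaded).

from nltk.corpus import wordnet as wn

# Step 1: enumerate the candidate senses (synsets) of the ambiguous noun "bat"
for synset in wn.synsets("bat", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())

# Among the listed senses are the nocturnal flying mammal and the club used in
# games such as baseball or cricket; the context words "cave" and "dusk"
# point to the animal sense.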

Step 3: Identify Synonyms, Antonyms, and Relationships

In lexical semantics, words are also analyzed in terms of their relationships to other words. These
relationships help build a more comprehensive understanding of their meanings.

 Synonyms for "bat" (animal sense):

o Chiroptera (scientific term for the bat family).

o Flying mammal (descriptive synonym).

 Antonyms for "bat" (animal sense):

o Bird (another flying animal, but typically not a mammal).

 Hyponyms and Hypernyms for "bat" (animal sense):

o Hypernym: Mammal (broader category).

o Hyponyms: Fruit bat, Vampire bat (specific types of bats).

 Hyponyms and Hypernyms for "bat" (sports equipment sense):

o Hypernym: Sports equipment (broader category).

o Hyponyms: Baseball bat, Cricket bat (specific types of bats).

Step 4: Semantic Role Labeling

This step involves identifying the roles played by various words in the sentence. In our case, we need to
identify the semantic roles of the elements in the sentence "The bat flew out of the cave at dusk."



 Bat: The Agent (the entity that is performing the action).

 Cave: The Source (the location where the bat is coming from).

 Dusk: The Time (the time when the action occurs).

This analysis gives us a better understanding of the sentence structure and the relationships
between the elements.

Step 5: Word Sense Induction

While Word Sense Disambiguation (WSD) focuses on disambiguating predefined senses of a word,
Word Sense Induction (WSI) aims to automatically discover the possible senses of a word based on
its usage in context.

For example, in a corpus of sentences where "bat" appears in different contexts, the system might
automatically identify two main senses of "bat":

1. Animal: Found in contexts involving caves, flight, and dusk.

2. Sports Equipment: Found in contexts related to baseball, cricket, and hitting.

This step may not be directly applied to our specific sentence but can be used when analyzing large
text corpora.

Q5 What is Word Sense Disambiguation (WSD)? Explain the dictionary-based approach to Word Sense Disambiguation.

Ans.

Word Sense Disambiguation (WSD)

Word Sense Disambiguation (WSD) is the task in Natural Language Processing (NLP) that aims to
determine which sense (meaning) of a word is used in a given context. Many words in natural
language are polysemous, meaning they have multiple meanings depending on the context in which they
are used. The challenge in WSD is to correctly identify the appropriate sense of a word based on the
surrounding text.

For example, consider the word "bank":

 Bank (financial institution): "I went to the bank to deposit money."

 Bank (side of a river): "The boat sailed along the bank of the river."

The goal of WSD is to identify whether the word "bank" refers to a financial institution or the side of a
river in the given sentence.

Dictionary-based Approaches to Word Sense Disambiguation

Dictionary-based approaches to WSD rely on external lexical resources (like dictionaries or thesauruses)
that provide definitions and semantic relationships for words. These approaches use information from
these resources to decide the correct sense of a word in context.



1. Lesk Algorithm (Original Approach)

The Lesk algorithm is one of the simplest and most well-known dictionary-based approaches to WSD.
It uses the definition (gloss) of a word and the definitions of its surrounding words (context) to
determine the most appropriate sense of the word. The idea is that the sense whose definition
overlaps most with the context words should be the correct one.

Working of the Lesk Algorithm:

 Step 1: For each possible sense of the ambiguous word, retrieve the gloss (definition) of
that sense from a dictionary or lexical resource.

 Step 2: Retrieve the glosses of surrounding words in the context.

 Step 3: Calculate the overlap between the gloss of each sense of the ambiguous word and
the glosses of surrounding context words.

 Step 4: The sense with the highest overlap (i.e., the most shared words) is chosen as the
correct sense.
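
A minimal sketch of the gloss-overlap idea behind the Lesk algorithm, using NLTK's WordNet glosses (assumptions: nltk and its wordnet corpus are available; real implementations also remove stop words and may use example sentences; NLTK additionally ships a ready-made nltk.wsd.lesk function):

# Sketch: simplified Lesk - choose the sense whose gloss overlaps most with the context.
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_words):
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss_words = set(sense.definition().lower().split())
        overlap = len(gloss_words & set(context_words))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

context = "i went to the bank to deposit money".split()
print(simplified_lesk("bank", context))   # with this toy overlap count, a financial sense should win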

2. WordNet-based Approach

WordNet is a lexical database of English, where words are organized into sets of synonyms (called
synsets), and these synsets are linked by semantic relationships like hyponymy (is-a relationships),
meronymy (part-whole relationships), and others. It is widely used for WSD, as it provides a rich
source of semantic relationships.

Working of WordNet-based WSD:

 Step 1: For each word in a sentence, generate a list of possible senses (synsets) of the word using WordNet.

 Step 2: Retrieve definitions (glosses) of the synsets for the word in the context.

 Step 3: Calculate the semantic similarity between the senses based on their glosses and the
surrounding words. Various similarity metrics, such as Lesk, Wu-Palmer, and Path Similarity,
are used.

 Step 4: Choose the synset with the highest semantic similarity to the context.
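
A sketch of the similarity-scoring step (assuming NLTK/WordNet): each candidate noun sense of "bank" is scored by its best Wu-Palmer similarity to the senses of a context word such as "money", and the highest-scoring sense is chosen.

# Sketch: score candidate senses of "bank" against a context word using Wu-Palmer similarity.
from nltk.corpus import wordnet as wn

context_senses = wn.synsets("money", pos=wn.NOUN)
scores = []
for sense in wn.synsets("bank", pos=wn.NOUN):
    best = max((sense.wup_similarity(c) or 0.0) for c in context_senses)
    scores.append((best, sense))

best_score, chosen = max(scores, key=lambda pair: pair[0])   # highest-similarity sense wins
print(round(best_score, 2), chosen.name(), chosen.definition())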

3. Dictionary-based Information Content (IC) Approach

In this approach, the information content of a word sense is used to determine its appropriateness for a
given context. The idea is that more specific senses (with a more precise meaning) have higher
information content than general senses.

This approach leverages resources like WordNet and calculates the entropy or information content
of each possible sense:

 The more specific a sense, the more informative it is.

 The sense with the highest information content in the given context is chosen.
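
NLTK exposes precomputed information-content files that can be used for this kind of comparison. A minimal sketch (assumptions: the wordnet and wordnet_ic corpora have been downloaded, and the information_content helper is available in nltk.corpus.reader.wordnet):

# Sketch: information content of noun senses of "bank", using the Brown-corpus IC file.
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.corpus.reader.wordnet import information_content

brown_ic = wordnet_ic.ic("ic-brown.dat")
for sense in wn.synsets("bank", pos=wn.NOUN)[:3]:
    print(sense.name(), round(information_content(sense, brown_ic), 2))
# More specific (rarer) senses tend to get higher IC values than general ones.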

4. Extended Lesk Algorithm (Overlap-based Approach)

An extension of the original Lesk algorithm, the Extended Lesk algorithm uses additional semantic
relationships between words, such as synonyms, antonyms, hypernyms, hyponyms, and meronyms.
This approach improves the basic Lesk method by not only considering the glosses but also
incorporating semantic relationships between words.

Working of Extended Lesk:

 Step 1: For each possible sense, retrieve the glosses and related synsets from a resource
like WordNet.

 Step 2: Expand the context with related words (e.g., synonyms, hypernyms).

 Step 3: Calculate the overlap between the context and the expanded gloss of each sense.

 Step 4: Choose the sense with the highest overlap, considering both direct gloss matches
and semantic relations.
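
A sketch of the gloss-expansion idea behind Extended Lesk (assuming NLTK/WordNet): each sense's gloss is enlarged with the glosses of related synsets (here just hypernyms, hyponyms, and part meronyms) before the overlap with the context is computed.

# Sketch: Extended Lesk - expand each sense's gloss with glosses of related synsets.
from nltk.corpus import wordnet as wn

def expanded_gloss(sense):
    related = sense.hypernyms() + sense.hyponyms() + sense.part_meronyms()
    text = sense.definition() + " " + " ".join(r.definition() for r in related)
    return set(text.lower().split())

def extended_lesk(word, context_words):
    context = set(context_words)
    return max(wn.synsets(word), key=lambda s: len(expanded_gloss(s) & context))

print(extended_lesk("bank", "the boat drifted to the muddy river bank".split()))
# The river-bank sense is likely to win for this context.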

Module 5: Pragmatic & Discourse Processing.

Q1 What is reference resolution? Explain the three types of referents that complicate the reference resolution problem.

Ans.

Reference resolution, also known as anaphora resolution, is the task in Natural Language Processing (NLP)
that aims to determine what a word (usually a pronoun or a noun phrase) refers to in a sentence or
text. The goal is to link an ambiguous expression to its actual referent (the specific object, person, or
idea it represents in the context).

For example:

 "John went to the store. He bought a loaf of bread."

o The task of reference resolution here is to figure out that "He" refers to "John."

Reference resolution is a crucial part of many NLP tasks, including information extraction, machine
translation, question answering, and summarization, because understanding what each word or
phrase refers to is necessary for accurate interpretation.

Three Types of Referents that Complicate the Reference Resolution Problem

1. Pronouns and Anaphora

 Definition: Pronouns (such as he, she, it, they) and anaphora (a broader term encompassing all types of referential expressions that depend on a preceding element) are common forms of reference in language. These expressions depend on a previous word or phrase (an antecedent) for their meaning.

 Complication: The challenge lies in determining the antecedent — the entity or concept in
the discourse that the pronoun refers to. For example, in the sentence:

o "Alice called Bob because he needed help."

o The pronoun "he" could refer to either Alice or Bob depending on the context, but it is
usually disambiguated by rules about gender and logical expectations.

2. Definite Descriptions (Definite Noun Phrases)

 Definition: A definite description refers to a specific entity or thing that is assumed to be known to
both the speaker and listener. In English, this is often marked by the definite article "the" (e.g., the
cat, the book). These descriptions are usually referring to something specific in the context or
previously mentioned in the discourse.

 Complication: The difficulty arises when a definite noun phrase doesn't explicitly mention the
referent, and the system needs to figure out what exactly it refers to from prior context. For
example:

o "I saw a car in the parking lot. The car was red."

o Here, the car clearly refers to the car mentioned earlier, but in more complex
contexts, resolving definite descriptions can be tricky if no prior reference is
available.

3. Indirect or Implicit Referents (including Events or Ideas)

 Definition: An indirect or implicit referent refers to things that aren't explicitly mentioned but
are inferred from context, such as ideas, events, or actions. These may include abstract
concepts, events, or situations that are indirectly referred to, often using pronouns, ellipses,
or other expressions.

 Complication: The primary challenge with indirect or implicit referents is that the referent is not
explicitly named or described in the text, and the connection to it must be inferred from the
broader context. For example:

o "John gave Mary a gift. It was wrapped in blue paper."

o Here, the word "It" refers to the gift, which is only implicitly mentioned. The
challenge is recognizing that "It" refers to an abstract or implied object.

Q2 Illustrate the reference phenomena for solving the pronoun problem.

Ans.

In Natural Language Processing (NLP), solving the pronoun problem refers to the task of reference resolution,
where we need to determine what a pronoun refers to in a given context. Pronouns like he, she, it,
they, him, her, this, that, etc., do not explicitly mention their referents but instead point to a noun
phrase (known as the antecedent) elsewhere in the text. The challenge is identifying the correct antecedent that the pronoun
refers to.

Example: Simple Pronoun Resolution

Sentence:
"John went to the store. He bought some milk."

 Pronoun: He

 Context: "John went to the store."

 Possible antecedent: The word John.

In this case, the pronoun he clearly refers to the subject John in the previous sentence. He is a subject pronoun, and in the given context the most likely antecedent for he is John, since "John" is the only noun phrase preceding the pronoun.

Reference Phenomenon:

 The reference phenomenon here is straightforward: the antecedent of he is John, as there is no ambiguity in the context.

 A pronoun resolution algorithm would look for the nearest noun phrase in the preceding text
(or discourse) to resolve he as referring to John.

Techniques for Solving the Pronoun Problem

1. Rule-based methods:

 Use grammatical rules to match pronouns with their closest noun phrases.

 Gender and number agreement rules are often applied (e.g., he must refer to a male subject, it to a
singular object, etc.).

2. Machine Learning:

 Machine learning models, such as coreference resolution systems, can be trained on annotated
data to learn patterns of how pronouns refer to antecedents.

 These models can use features like sentence position, syntactic structure, and semantic
coherence to make predictions.

3. Contextual and Discourse Analysis:

 By considering discourse structure (e.g., whether the pronoun appears after a mention of a
person or object), the system can resolve which antecedent is most likely.

 Contextual cues such as verbs, adjectives, and action types help link pronouns to their referents.

4. World Knowledge:

 Knowledge about the world (e.g., understanding that people usually refer to weather when they
use "it") can help resolve ambiguous cases where the antecedent is not explicitly mentioned.

Q3 Explain Anaphora Resolution using the Hobbs and Centering algorithms.

Ans.

Anaphora Resolution is the task of identifying the antecedent (the entity or concept) to which an
anaphor (a linguistic expression, like a pronoun or definite noun phrase) refers. Anaphora can be
simple, such as resolving a pronoun like he or she, or complex, such as identifying the referent for noun
phrases like the dog or the tall man. The goal is to link these expressions back to their appropriate
antecedents in a way that makes the discourse semantically coherent.

Two popular algorithmic approaches to anaphora resolution are Hobbs' algorithm and the Centering algorithm. Both algorithms focus on identifying the antecedent of an anaphor in a systematic way, often relying on linguistic rules and discourse structure.

Hobbs' Algorithm for Anaphora Resolution

Hobbs' algorithm, proposed by Jerry Hobbs in the late 1970s, is one of the earliest rule-based approaches for anaphora resolution. It focuses on resolving pronouns and other anaphoric expressions by considering syntactic and discourse structures.

The Hobbs algorithm operates by looking for the most local antecedent (closest possible match) in the
discourse. It uses a left-to-right reading strategy and incorporates syntactic information to
determine which noun phrase or entity a pronoun refers to.

Steps in Hobbs' Algorithm

1. Input:

o The input consists of a sentence containing an anaphor (typically a pronoun) and a discourse context (previous sentences or noun phrases).

2. Identifying Candidate Antecedents:

o The algorithm first identifies all noun phrases in the discourse that could potentially
be antecedents. These noun phrases are taken from prior sentences or within the
current sentence (if applicable).
3. Syntactic Criteria:

o Hobbs' algorithm primarily uses syntactic proximity to identify the most likely antecedent.
It prefers to resolve a pronoun to a noun phrase that is grammatically close (e.g., a noun
phrase in the same clause or sentence). It checks for agreement in gender, number, and
person between the pronoun and potential antecedents.

4. Discourse Information:

o The algorithm also takes into account discourse coherence: the antecedent that makes the
most sense in the given context will be selected. For example, in the sentence John
went to the store. He bought bread., the pronoun he refers to John, because John is the only
suitable candidate based on the context.

5. Final Decision:

o After analyzing syntactic features and context, Hobbs' algorithm selects the most
likely antecedent.

Centering Algorithm for Anaphora Resolution

The Centering algorithm is another rule-based approach that attempts to resolve anaphora in a more structured and sophisticated manner. It is based on Centering Theory, developed by Grosz, Joshi, and Weinstein in the 1980s, builds on ideas from discourse representation theory, and attempts to resolve not just pronouns but also other forms of anaphora like definite noun phrases.

The Centering algorithm works by considering the semantic relations and pragmatic context of discourse. It evaluates the discourse structure and uses rules to match anaphors to their referents. The algorithm is more sophisticated than Hobbs' algorithm in terms of handling complex anaphora (e.g., when there are multiple potential antecedents or when anaphors refer to events or ideas rather than individuals).

Steps in the Centering Algorithm

1. Input:

o The input consists of an anaphoric expression and the discourse context, including previous
sentences and noun phrases.

2. Identify Possible Antecedents:

o The algorithm first identifies all potential antecedents in the preceding discourse. These
antecedents are the noun phrases, events, or concepts that might be linked to the
anaphor.

3. Semantic and Syntactic Constraints:

o The algorithm uses semantic constraints (such as gender, number, and person agreement) and syntactic proximity to narrow down the list of candidates. In contrast to Hobbs' algorithm, Centering focuses more on contextual and semantic relationships.

4. Contextual Coherence:

o It analyzes the discourse coherence by considering the relationships between previous statements, focusing on how previous actions or statements connect to the current anaphor.

5. Selecting the Antecedent:

o Based on semantic and syntactic analysis, the algorithm resolves the anaphor by
selecting the most appropriate antecedent from the set of candidates.

Module 6: Applications of NLP.

Q1 Explain the various approaches for machine translation.

Ans.

1. Rule-Based Machine Translation (RBMT)

RBMT is one of the earliest approaches to machine translation, based on linguistic rules and grammatical
knowledge of both the source and target languages.

 How it Works: Linguists create extensive sets of rules to cover aspects like grammar, syntax,
morphology, and semantics for each language. A parser analyzes the structure of the source text
according to these rules, then transfers this structure to the target language by applying another
set of rules specific to it.

 Strengths: It works well when high-quality linguistic resources are available. It can offer
relatively accurate translations for structured, domain-specific texts.

 Weaknesses: Requires extensive manual effort to develop rules and dictionaries for each
language pair. It struggles with ambiguous and unstructured sentences, and adding new
languages requires significant work.

2. Statistical Machine Translation (SMT)

SMT models, which emerged in the 1990s, use statistical techniques to generate translations based on large
bilingual text corpora.

 How it Works: SMT relies on probabilities. Given a bilingual corpus, SMT calculates the likelihood
of a word or phrase in the target language, given the source language phrase. It typically includes
a language model, which ensures the fluency of output in the target language, and a
translation model, which handles the conversion of source language phrases to target phrases.
Phrase-based SMT (PB-SMT) is one of the most common variants, where phrases (instead of single
words) are translated as units.

 Strengths: SMT models improve over time with more data, making them adaptive to
specific domains. They were a significant improvement over rule-based models for many
languages and contexts.

 Weaknesses: SMT systems require large, high-quality bilingual corpora to work effectively.
They often produce grammatically awkward translations because they translate phrases
without understanding the overall sentence context.
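
The standard formulation behind SMT is the noisy-channel model: for a source (foreign) sentence f, the decoder searches for the target sentence e that maximizes

    e* = argmax_e P(e | f) = argmax_e P(f | e) * P(e)

where P(f | e) is the translation model learned from the bilingual corpus and P(e) is the language model that scores fluency in the target language.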

3. Hybrid Machine Translation

Hybrid machine translation combines elements from multiple translation approaches, often integrating
rule-based and statistical methods.

 How it Works: Hybrid models can take many forms; for example, using rule-based techniques
to pre-process or post-process translations from a statistical system. Some hybrid systems
also integrate SMT with example-based or neural approaches to handle different language phenomena
or translation needs.

 Strengths: By leveraging the strengths of different techniques, hybrid systems can improve
overall translation accuracy and adaptability.

 Weaknesses: Hybrid systems can be complex and challenging to maintain. They may still
struggle with idiomatic expressions, long sentences, or domains without adequate data.

4. Neural Machine Translation (NMT)

NMT, introduced in the mid-2010s, uses deep learning techniques to improve the quality of machine
translation. It has since become the most popular and effective approach.

 How it Works: NMT systems are based on neural networks, specifically sequence-to-sequence
(Seq2Seq) models with encoder-decoder architecture. The encoder processes the source sentence
into a continuous representation (or context vector), which is then decoded by the target
language decoder. Attention mechanisms help the model focus on relevant parts of the source
sentence when generating each word in the target language. Transformers are now the
dominant model architecture for NMT, enhancing the model's ability to handle longer
dependencies.

 Strengths: NMT generally produces smoother, more fluent translations than previous methods.
The models can be trained end-to-end, allowing them to learn language patterns and nuances
directly from data without the need for extensive linguistic rules.

 Weaknesses: NMT requires large datasets and significant computational power. It may struggle
with low-resource languages or highly specialized vocabulary, where data is limited.
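
As a concrete illustration (not part of the original answer), a pretrained transformer NMT model can be called in a few lines with the Hugging Face transformers library. This is a sketch assuming the transformers and sentencepiece packages are installed and the public Helsinki-NLP/opus-mt-en-fr checkpoint can be downloaded:

# Sketch: English-to-French translation with a pretrained Marian NMT model.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("The cat is on the mat.")
print(result[0]["translation_text"])   # expected (approximately): "Le chat est sur le tapis."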

Q2 Demonstrate the working of Machine translation systems.

Ans.

To demonstrate the working of machine translation (MT) systems, let's walk through the main types of MT systems and their operational processes, using a single example task, translating the English sentence "The cat is on the mat" into French, to show how each type handles the job.

1. Rule-Based Machine Translation (RBMT)


Process:

1. Analysis of Source Language: The RBMT system first analyzes the English sentence according to
grammatical rules, identifying parts of speech (e.g., "the" as a determiner, "cat" as a noun).

2. Mapping to Target Language Structure: Based on pre-set rules, it then maps the source language
structure to the target language structure. For instance, English and French use articles and nouns
in similar ways, so the system would know to map "the cat" to "le chat."

3. Transformation and Generation: The system applies morphological and syntactic rules to generate
the correct form of each word in the French structure.

Result:

Translation: "Le chat est sur le tapis."

Explanation: Here, the rule-based system translates each word individually based on dictionary lookups
and grammar rules, applying transformations to match French grammatical conventions.

2. Statistical Machine Translation (SMT)


Process:

1. Phrase-Based Segmentation: The SMT model segments the sentence into phrases or chunks based on
statistical probabilities (e.g., "The cat" as one phrase and "is on the mat" as another).

2. Phrase Translation Using Probabilities: The model uses a bilingual corpus to find probable phrase
translations. For example, it might find that "The cat" translates to "Le chat" in the majority of
cases.

3. Reordering and Fluency Check: It arranges phrases in the correct order based on target language
syntax probabilities and uses a language model to ensure the translation reads naturally in
French.

Result:

Translation: "Le chat est sur le tapis."

Explanation: SMT relies on matching phrase probabilities from training data. Here, it finds the most
likely French phrase equivalents for the input, reordering and smoothing the result for natural flow.

3. Neural Machine Translation (NMT)


Process:

1. Encoding: An NMT model, often based on a sequence-to-sequence or transformer architecture, encodes each word in the sentence to a continuous vector (numerical) representation. For "The cat is on the mat," the model creates contextually informed vector representations.

2. Attention Mechanism: The model applies attention to decide which parts of the input sentence are
most relevant to each target word it generates. For example, when generating "le chat," it focuses
on "the cat" rather than "on the mat."

3. Decoding: The decoder generates the French translation word-by-word, guided by both the
context vector from the encoder and the attention mechanism.

Result:

Translation: "Le chat est sur le tapis."

Explanation: NMT uses deep learning to consider the full sentence context, resulting in a smooth,
contextually accurate translation. Attention allows the model to capture the dependencies
between words in both languages effectively.

4. Hybrid Machine Translation
Process:
1. Combining Techniques: A hybrid model might use rule-based translation for specific phrases (e.g.,
fixed expressions) and statistical translation for other parts.

2. Phrase Processing: For example, the system could use RBMT rules to translate "The cat" as "Le
chat" and then apply SMT to translate "is on the mat."

3. Final Translation Assembly: The hybrid model combines outputs from different approaches,
selecting the one with higher probability or fluency in the target language.

Result:

Translation: "Le chat est sur le tapis."

Explanation: Hybrid systems leverage multiple MT methods to handle different parts of a translation
task, combining the strengths of rule-based and statistical methods to improve accuracy and
fluency.

Q3 Explain the information retrieval system.

Ans.

 Objective: The main goal of an IR system is to retrieve information relevant to a user's query,
typically from large datasets, databases, or collections of documents.
 Core Components:

o Document Collection: This is the database or corpus where information is stored, such as
text documents, images, or other media.

o Indexing: Documents are processed and indexed to enable quick searching. The indexing
process organizes and represents information to allow efficient retrieval.

o Query Processing: When a user enters a query, the system processes it to understand the
user's intent and identify key terms or phrases.

o Ranking: The system ranks documents based on their relevance to the query, often using
algorithms to assess factors like keyword presence, frequency, and document importance.

o Retrieval and Results: The system retrieves and displays ranked results, showing the most
relevant documents at the top for the user.

 Key Processes:

o Tokenization: Breaking down documents and queries into individual words or tokens for
easier matching.

o Stemming and Lemmatization: Reducing words to their root form to improve matching
(e.g., "running" to "run").

o Relevance Scoring: Assigning a score to each document based on how well it matches the
query, often using statistical or machine learning models.

o Feedback Loop: Many IR systems include user feedback to improve future searches, learning
which results were helpful or irrelevant.

 Types of IR Systems:

o Boolean Retrieval: Uses boolean operators (AND, OR, NOT) to match documents exactly based
on query terms.

o Vector Space Model: Represents documents and queries as vectors in a multi-dimensional space
and uses similarity measures to rank results.

o Probabilistic Models: Predicts the probability that a document is relevant to a given query based on
statistics from past queries and documents.

o Neural IR Models: Uses machine learning and deep learning models to understand semantic
relationships and improve search relevance.
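
To make the vector space model above concrete, here is a minimal sketch that ranks a few invented documents against a query by cosine similarity over TF-IDF vectors (assumes scikit-learn is installed):

# Sketch: vector space retrieval - rank documents by cosine similarity to the query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "climate change and global warming effects",
    "recipe for chocolate cake",
    "policies to reduce climate change emissions",
]
query = "climate change"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors)[0]
for score, doc in sorted(zip(scores, documents), reverse=True):   # most relevant first
    print(round(score, 2), doc)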

 Applications: IR systems are used in web search engines, library databases, medical records
retrieval, e-commerce product search, and more.
 Challenges:

o Handling large-scale datasets efficiently.

o Understanding and processing complex or vague queries.

o Balancing relevance, speed, and accuracy in results.

Q4 Explain the Question Answering System (QAS) in detail.

Ans.

A Question Answering System (QAS) is an advanced application in Natural Language Processing (NLP)
designed to automatically answer user questions in natural language. Unlike traditional search
engines that return a list of documents, a QAS aims to provide direct answers to specific questions. It
interprets the question, searches for the answer in a structured or unstructured data source, and
generates a response that best fits the query.

Components of a QAS

1. Question Processing:

 This stage focuses on understanding the user’s question by identifying its type (e.g., "Who,"
"What," "Where," "How") and extracting key information or keywords.

 It also involves identifying the intent and classifying the question to determine what type of answer
is needed (e.g., a person, place, date, fact).

2. Information Retrieval (IR):

 Once the question is processed, the system searches relevant documents or data sources to
find potential answers.

 This stage may involve traditional IR techniques, where the system retrieves a set of documents
or text passages that are likely to contain the answer.

3. Answer Processing:

 The QAS extracts and ranks candidate answers from the retrieved passages using NLP techniques.

 This may involve text parsing, named entity recognition (NER), dependency parsing, and
relation extraction to isolate information relevant to the question.

4. Answer Selection and Generation:

 From the list of possible answers, the system selects the one with the highest likelihood
of correctness, often using a scoring mechanism.

 If the system uses generative models (like certain neural network-based models), it may
generate answers directly rather than extracting them from documents.
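
A minimal extractive question-answering sketch using a pretrained reader model (assumptions: the Hugging Face transformers library is installed; with no model specified, the pipeline downloads its default SQuAD-tuned model on first use):

# Sketch: extractive QA - the model selects the answer span from the given context.
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="Who went to the store?",
    context="John went to the store. He bought a loaf of bread.",
)
print(result["answer"], round(result["score"], 3))   # expected answer: "John"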

Types of QAS

 Closed-Domain QAS: Focuses on a specific domain (e.g., medical, legal, or technical fields) and is
typically trained on specialized data, offering high accuracy within that domain.

 Open-Domain QAS: Handles a broader range of questions across various topics, commonly applied
in general-purpose search engines or virtual assistants. This system needs a vast knowledge
base and robust processing to handle diverse queries.

Applications of QAS

 Customer Support: QAS can automate responses to frequently asked questions, saving time
and enhancing customer experience.

 Virtual Assistants: Systems like Siri, Alexa, and Google Assistant utilize QAS to respond to
user questions conversationally.

 Educational Tools: QAS can help students by providing instant answers to educational queries
or guiding them through problem-solving.

 Medical Assistance: In healthcare, QAS can assist by providing information on symptoms, treatments, or medication details.

Q5 Explain information retrieval versus information extraction systems.

Ans.

Feature-by-feature comparison of Information Retrieval (IR) and Information Extraction (IE):

 Objective: IR retrieves relevant documents or passages based on a query; IE extracts specific, structured information from unstructured data.

 Primary Focus: IR operates at the document level; IE operates at the sentence or phrase level.

 Data Output: IR returns a list of documents or text passages; IE returns structured data (e.g., entities, relationships, facts).

 Data Type: Both work on unstructured data (documents, articles, etc.), but IE outputs structured information.

 Typical Applications: IR - search engines, digital libraries, news article search; IE - named entity recognition, relation extraction, summarization.

 User Query: IR generally requires a query to initiate the search; IE does not always require one and can run over entire datasets automatically.

 Techniques Used: IR - indexing, ranking, retrieval algorithms; IE - Natural Language Processing (NLP), pattern recognition, rule-based or ML models.

 Example Use Case: IR - finding documents containing information about "climate change"; IE - extracting names, dates, locations, and relationships from news articles.

 Results Interpretation: In IR, the user interprets the retrieved documents to find the relevant details; in IE, the system presents the extracted information (e.g., key facts) directly.

 Complexity of Output: IR output is simple (entire documents are retrieved); IE output is more complex (specific information is identified and extracted within documents).

 Scope of Operation: IR is broader, fetching large segments of data; IE is narrower, targeting specific information within text.

 Evaluation Metrics: IR - precision, recall, relevance; IE - precision, recall, and accuracy of extracted entities/relations.
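
To make the contrast concrete, an information extraction component such as named entity recognition returns structured fields rather than whole documents. A minimal sketch with spaCy (assumptions: the spacy package is installed and the en_core_web_sm model has been downloaded; the example sentence is invented):

# Sketch: information extraction via named entity recognition with spaCy.
# Assumes: pip install spacy  and  python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a new office in Mumbai on 5 January 2024.")
print([(ent.text, ent.label_) for ent in doc.ents])
# Typical output: [('Apple', 'ORG'), ('Mumbai', 'GPE'), ('5 January 2024', 'DATE')]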
