Natural Language Processing - Chapter 1 Study Guide
Q1. What is natural language processing (NLP)? Describe the various
stages involved in the NLP process with a suitable example.
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that
enables computers to understand, interpret, and generate human language. Its primary
objective is to reduce the communication barrier between humans and machines.
The NLP process is typically executed through a pipeline of several stages. Let's use the
sentence "The old men finally kicked the bucket." to illustrate these stages:
1. Lexical Analysis (Morphological Analysis): This initial stage involves breaking down
the text into its basic components, or tokens, and analyzing word structures.
○ Tokenization: The sentence is segmented into tokens: ["The", "old", "men", "finally",
"kicked", "the", "bucket", "."].
○ Morphological Analysis: Words are broken down into their roots and affixes. "kicked" is
identified as kick + -ed (past tense), and "men" is recognized as the plural form of "man".
2. Syntactic Analysis (Parsing): This stage focuses on grammar, analyzing the
arrangement of words to ensure the sentence is grammatically correct. It constructs a
parse tree to map the grammatical relationships. For our example, it would identify "The
old men" as the noun phrase (subject) and "kicked the bucket" as the verb phrase,
confirming its structural validity.
3. Semantic Analysis: This stage moves from structure to meaning, aiming to understand
the literal definition of the words and the sentence.
○ It assigns dictionary meanings to individual words like men, kicked, and bucket.
○ The literal interpretation would be that the men physically struck a pail with their feet.
○ It also checks whether the sentence makes sense; for instance, it would flag a sentence like
"The wall ate a sandwich" as semantically illogical, even though it is grammatically correct.
4. Discourse Integration: This stage analyzes the context provided by surrounding
sentences to understand the complete meaning. It's responsible for tasks like pronoun
resolution. If a subsequent sentence were "They were very frail," this stage would link
"They" back to "The old men."
5. Pragmatic Analysis: This is the most advanced stage, dealing with the intended or
inferred meaning, which can differ from the literal meaning. It uses real-world knowledge
and context. For our example, pragmatic analysis would recognize the idiom "kicked the
bucket" to understand its true meaning: "to die."
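The lexical stage above can be sketched in a few lines of Python. This is a minimal illustration, not a real morphological analyzer: the `IRREGULAR` table, the `tokenize` and `analyze` helpers, and the "strip -ed" rule are assumptions made for this example, and the suffix rule would mis-handle words such as "need".

```python
import re

# Hypothetical lookup table for irregular forms; a real system would use a full lexicon.
IRREGULAR = {"men": ("man", "plural")}

def tokenize(text):
    # A word is a run of letters, digits, or apostrophes;
    # every other non-space character becomes its own token.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

def analyze(token):
    """Return (root, feature) for a token using two naive rules."""
    lower = token.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    # Naive rule: treat a trailing "-ed" as a past-tense suffix.
    if lower.endswith("ed") and len(lower) > 3:
        return (lower[:-2], "past tense")
    return (lower, None)

sentence = "The old men finally kicked the bucket."
tokens = tokenize(sentence)
print(tokens)
for tok in tokens:
    print(tok, analyze(tok))
```

Running it on the example sentence yields the eight tokens listed in stage 1, with "kicked" analyzed as ("kick", "past tense") and "men" as ("man", "plural").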
Q2. Explain the types of ambiguities in NLP with examples.
Ambiguity, where a piece of language can have multiple interpretations, is a primary challenge
in NLP.
● Lexical Ambiguity (Word Level): Occurs when a single word has multiple meanings.
○ Example: "The fishermen went to the bank." (Could be a river bank or a financial
institution).
● Syntactic Ambiguity (Structural Level): Occurs when a sentence can be grammatically
parsed in more than one way.
○ Example: "I saw the man with the telescope." (Who has the telescope? Me or the
man?).
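● Semantic Ambiguity: Occurs when a sentence is grammatically correct but still admits more
than one interpretation of its meaning.
○ Example: "The car hit the pole while it was moving." (Was the car moving or was
the pole moving? Because the uncertainty comes from the pronoun "it", this example
overlaps with anaphoric ambiguity.)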
● Anaphoric Ambiguity: Arises when it is unclear which noun a pronoun refers to.
○ Example: "The boy waved to his father. He was happy." (Who was happy? The boy or
the father?).
● Pragmatic Ambiguity: Occurs when the speaker's intent is not clear from the literal
meaning and requires context.
○ Example: A person walks into a very messy room and says, "You're so tidy!" (This is
likely sarcasm, not a literal statement).
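Syntactic ambiguity can be made concrete by counting parse trees. The sketch below uses a CKY-style chart with a tiny hand-written grammar; the grammar rules, the lexicon, and the `count_parses` function are assumptions made for illustration and cover only the telescope example.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase (I used the telescope)
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase (the man has the telescope)
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words):
    """Count the distinct parse trees rooted in S that cover all the words."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON[word.lower()]:
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)]["S"]

print(count_parses("I saw the man with the telescope".split()))  # 2
print(count_parses("I saw the man".split()))                     # 1
```

The two parses correspond to the two readings in the example: one where "with the telescope" modifies "saw" and one where it modifies "the man".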
Q3 & Q4. What are the challenges in NLP? Explain in detail.
1. Ambiguity: Resolving the multiple meanings of words and sentences is the most
fundamental challenge.
2. Contextual Understanding: The meaning of a word is highly dependent on its context.
"It's cool" can refer to temperature, fashion, or acceptance.
3. Synonymy and Paraphrasing: Humans express the same idea in many ways (e.g., "book
a flight" vs. "reserve a seat"). Teaching a machine to recognize these as equivalent is
difficult.
4. Named Entity Recognition (NER): Correctly identifying entities like "Washington" (a
person, a state, or a city?) or "May" (a name or a month?) is complex.
5. Irony, Sarcasm, and Figurative Language: NLP models struggle to detect non-literal
language like sarcasm or idioms from text alone.
6. Spelling and Grammatical Errors: NLP systems must be robust enough to handle the
messy, error-filled text common in real-world data from social media and emails.
7. Coreference Resolution: Correctly linking pronouns and other expressions (e.g., "he,"
"it," "the company") to the entities they refer to throughout a text is a significant
challenge.
8. Domain Adaptation: A model trained on news articles often performs poorly on legal or
medical texts due to differences in vocabulary and structure.
9. World Knowledge: True understanding requires common-sense knowledge that is
difficult to program into a machine (e.g., knowing that water is wet).
10. Bias: Models trained on human-generated text can learn and perpetuate societal biases
related to gender, race, and other attributes.
Q5. Define syntactic and semantic levels of language understanding
in NLP. Give an example for each level.
● Syntactic Level: This level is concerned with the grammatical structure of a sentence.
It analyzes how words are ordered and related to form a valid sentence according to
grammatical rules, without considering meaning.
○ Example: In the sentence "The boy ate the apple," syntactic analysis confirms that it
follows the correct Noun Phrase + Verb Phrase structure.
● Semantic Level: This level focuses on the literal meaning of the words and the
sentence. After syntax is confirmed, semantics determines if the sentence makes logical
sense.
○ Example: In the sentence "The apple ate the boy," while it is syntactically correct,
semantic analysis would identify it as nonsensical because apples cannot perform
the action of eating.
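The difference between the two levels can be sketched as two separate checks. This is a toy illustration under stated assumptions: the `POS`, `ANIMATE`, and `NEEDS_ANIMATE_SUBJECT` tables are hypothetical, and the syntactic check accepts only the single pattern Det N V Det N.

```python
# Hypothetical toy lexicon covering only the words in the two example sentences.
POS = {"the": "Det", "boy": "N", "apple": "N", "ate": "V"}
ANIMATE = {"boy"}                 # nouns that can perform actions such as eating
NEEDS_ANIMATE_SUBJECT = {"ate"}   # verbs that require an animate subject

def is_syntactically_valid(words):
    # Syntactic level: only the order of word categories matters, not meaning.
    tags = [POS.get(w.lower()) for w in words]
    return tags == ["Det", "N", "V", "Det", "N"]

def is_semantically_plausible(words):
    # Semantic level: check that the subject can plausibly perform the verb.
    # Assumes the sentence already passed the syntactic check (exactly five words).
    _, subject, verb, _, _ = (w.lower() for w in words)
    if verb in NEEDS_ANIMATE_SUBJECT:
        return subject in ANIMATE
    return True

for sentence in ["The boy ate the apple", "The apple ate the boy"]:
    words = sentence.split()
    print(sentence, is_syntactically_valid(words), is_semantically_plausible(words))
```

Both sentences pass the syntactic check, but only "The boy ate the apple" passes the semantic one.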
Q6. Explain Tokenization, Stop Word Removal, Case Folding, Script
Validation, and Named Entity Recognition.
These are common text pre-processing techniques:
1. Tokenization: The process of breaking down text into smaller units called tokens (usually
words or punctuation).
○ Example: "NLP is fun!" → ["NLP", "is", "fun", "!"].
2. Stop Word Removal: The process of removing common words (like "the", "is", "a", "in", "on")
that add little semantic meaning.
○ Example: "The cat is on the mat" → ["cat", "mat"].
3. Case Folding: Converting all text to a single case (usually lowercase) to treat words like
"Apple" and "apple" as the same.
○ Example: "Apple", "apple", "APPLE" → "apple".
4. Script Validation: Verifying that the text uses the expected character set or script (e.g.,
Latin, Devanagari) to ensure data quality.
5. Named Entity Recognition (NER): A more advanced process to identify and classify
named entities into categories like Person, Organization, or Location.
○ Example: "Tim Cook works for Apple in Cupertino." → "Tim Cook" (PERSON), "Apple"
(ORGANIZATION), "Cupertino" (LOCATION).
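The five techniques above can be sketched with the Python standard library. All names here (`STOP_WORDS`, `GAZETTEER`, and the helper functions) are assumptions made for this example. In particular, `tag_entities` is a plain dictionary lookup standing in for NER; real NER systems use trained models and handle names they have never seen.

```python
import re
import unicodedata

STOP_WORDS = {"the", "is", "a", "in", "on"}  # small illustrative list

def tokenize(text):
    # Runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def case_fold(tokens):
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def is_script(text, script_name="LATIN"):
    # Script validation: every alphabetic character's Unicode name
    # must start with the expected script name.
    return all(
        unicodedata.name(ch, "").startswith(script_name)
        for ch in text
        if ch.isalpha()
    )

# Hypothetical gazetteer (list of known names) used in place of a trained NER model.
GAZETTEER = {"Tim Cook": "PERSON", "Apple": "ORGANIZATION", "Cupertino": "LOCATION"}

def tag_entities(text):
    return [(name, label) for name, label in GAZETTEER.items() if name in text]

print(tokenize("NLP is fun!"))
print(remove_stop_words(case_fold(tokenize("The cat is on the mat"))))
print(case_fold(["Apple", "apple", "APPLE"]))
print(is_script("NLP is fun!"), is_script("नमस्ते", "DEVANAGARI"))
print(tag_entities("Tim Cook works for Apple in Cupertino."))
```

Note that case folding runs before stop word removal here, so that "The" matches the lowercase entry "the" in the stop word list.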
Q7. Describe the applications of natural language processing.
● Machine Translation (e.g., Google Translate)
● Sentiment Analysis (analyzing customer reviews, social media)
● Chatbots and Virtual Assistants (e.g., Siri, Alexa)
● Search Engines (understanding query intent)
● Text Summarization (condensing long documents)
● Question Answering Systems (providing direct answers to queries)
● Spam Detection (filtering emails)
● Speech Recognition (voice-to-text)
● Grammar and Spell Checkers (e.g., Grammarly)
Q8. Identify and describe the ambiguities in the following sentences.
● i. The man kept the dog in the house.
○ Ambiguity: Syntactic.
○ Explanation: It's unclear what "in the house" attaches to. It can modify the verb,
saying where the man kept the dog (The man [kept the dog] [in the house]), or modify the
noun, saying which dog he kept (The man kept [the dog that was in the house]).
● ii. Book the flight.
○ Ambiguity: Lexical.
○ Explanation: The word "Book" could be a verb (a command to make a reservation)
or a noun (referring to a specific book titled "The Flight").
● iii. Did you say you were looking for mixed nuts?
○ Ambiguity: Pragmatic.
○ Explanation: This could be a literal question or a figurative one. "Mixed nuts" can be
an idiom for "crazy people," so the speaker's intent depends entirely on the context.
● iv. The old man finally kicked the bucket.
○ Ambiguity: Pragmatic (Idiomatic).
○ Explanation: This sentence has a literal meaning (a man physically kicked a pail) and
a well-known idiomatic meaning ("to die"). An NLP system needs knowledge of
idioms to understand the intended meaning.
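The need for idiom knowledge in sentence iv can be illustrated with a small lookup. This is a sketch under stated assumptions: the `IDIOM_PATTERNS` table and the `readings` function are hypothetical, and the table holds only two well-known English idioms.

```python
import re

# Hypothetical idiom table: regular expression -> idiomatic meaning.
IDIOM_PATTERNS = {
    r"\bkick(?:s|ed|ing)? the bucket\b": "to die",
    r"\bspill(?:s|ed|ing)? the beans\b": "to reveal a secret",
}

def readings(sentence):
    """List the candidate readings: always literal, plus any idiom that matches."""
    found = ["literal"]
    for pattern, meaning in IDIOM_PATTERNS.items():
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(f"idiomatic: {meaning}")
    return found

print(readings("The old man finally kicked the bucket."))
# ['literal', 'idiomatic: to die']
print(readings("The man kept the dog in the house."))
# ['literal']
```

The lookup only surfaces the candidate readings; choosing between them still depends on context, which is why the ambiguity is classed as pragmatic.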