UNIT NO: 2
Q. No | Question | Marks
1 | Show the working of Named Entity Recognition (NER) with an appropriate example. | 6
ANS Named Entity Recognition (NER):
Named Entity Recognition (NER) is a key task in Natural Language Processing (NLP) that
involves the identification and classification of named entities in unstructured text, such as
people, organizations, locations, dates, and other relevant information.
NER is used in various NLP applications such as information extraction, sentiment analysis,
question-answering, and recommendation systems.
Working of NER:
1. Tokenization:
o The input text is split into individual tokens (words, punctuation, numbers).
2. Entity Identification:
o Spans of tokens that form candidate named entities are detected.
3. Entity Categorization:
o Tokens are assigned appropriate labels.
o Example (Token-Level Labels): Apple (B-ORG), Inc. (I-ORG), Steve (B-PER), Jobs (I-PER), Cupertino (B-LOC), California (B-LOC), 1976 (B-DATE).
o B stands for the beginning of an entity, and I stands for inside the entity.
4. Output:
Organization: Apple Inc.
Person: Steve Jobs
Location: Cupertino, California
Date: 1976
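The step from token-level BIO labels (step 3) to the final entities (step 4) can be sketched in Python. The function below is a minimal illustration of the grouping logic only, not a full NER system:

```python
def bio_to_entities(tokens, labels):
    """Merge B-/I- tagged tokens into (entity_text, entity_type) pairs."""
    entities, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):          # a new entity begins
            if current:
                entities.append(current)
            current = (token, label[2:])
        elif label.startswith("I-") and current and label[2:] == current[1]:
            current = (current[0] + " " + token, current[1])  # extend the entity
        else:                               # "O" or an inconsistent tag ends it
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tokens = ["Apple", "Inc.", "Steve", "Jobs", "Cupertino", "California", "1976"]
labels = ["B-ORG", "I-ORG", "B-PER", "I-PER", "B-LOC", "B-LOC", "B-DATE"]
# Groups into: Apple Inc. (ORG), Steve Jobs (PER), Cupertino (LOC), California (LOC), 1976 (DATE)
```

Note that "Cupertino" and "California" each carry a B-LOC tag, so they come out as two separate location entities.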
Applications:
1) Information Extraction:
NER can be used to extract relevant information from large volumes of unstructured text,
such as news articles, social media posts, and online reviews.
This information can be used to generate insights and make informed decisions.
2) Sentiment Analysis:
NER can be used to identify the sentiment expressed in a text towards a particular named
entity, such as a product or service.
This information can be used to improve customer satisfaction and identify areas for
improvement.
3) Question Answering:
NER can be used to identify the relevant entities in a text that can be used to answer a
specific question.
This is particularly useful for chatbots and virtual assistants.
4) Recommendation Systems:
NER can be used to identify the interests and preferences of users based on the entities
mentioned in their search queries or online interactions.
This information can be used to provide personalized recommendations and improve user
engagement.
Advantages of NER:
1) Improved Accuracy: NER can improve the accuracy of NLP applications by identifying and
classifying named entities in a text more accurately and efficiently.
2) Speed and Efficiency: NER can automate the process of identifying and classifying named
entities in a text, saving time and improving efficiency.
3) Scalability: NER can be applied to large volumes of unstructured text, making it a valuable
tool for analyzing big data.
4) Personalization: NER can be used to identify the interests and preferences of users based on
their interactions with a system, allowing for personalized recommendations and improved
user engagement.
Disadvantages of NER:
1) Ambiguity: NER can be challenging to apply where the meaning of a word or phrase is
ambiguous. For example, the word “Apple” can refer to a fruit or a technology company.
2) Limited Scope: NER is limited to identifying and classifying named entities in a text and
cannot capture the full meaning of a text.
NLP END SEM QUESTION BANK
3) Data Requirements: NER requires large volumes of labeled data for training, which can be
expensive and time-consuming to collect and annotate.
4) Language Dependency: NER models are language-dependent and may require additional
training for use in different languages.
Types of Morphology:
1) Derivational Morphology:
Derivation is how we create new words from a base word or root word by adding parts
called affixes (prefixes and suffixes).
We combine affixes with root words.
For example, adding -ation to the verb "summarize" gives us the noun "summarization."
Less Productive: Some affixes only work with certain words. For example, you can’t add
-ation to every verb.
Different Meanings: Suffixes can change the meaning of a word. For instance,
"conformation" and "conformity" both come from "conform," but they mean different
things.
The new words created can also be used to make even more new words, making our
language richer.
2) Inflectional Morphology:
Inflection changes the form of the same word without changing its meaning.
Inflection adds parts (called morphemes) to a word’s stem to show grammatical information
like number (singular/plural), tense (past/present), agreement, or case.
The inflected word keeps its original meaning and word category.
Example: a noun stays a noun, and a verb remains a verb but changes to show a different
tense.
In English, inflection happens mostly with nouns and verbs (and sometimes adjectives).
English has fewer inflectional morphemes compared to some other languages.
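The derivational/inflectional distinction can be made concrete with a toy suffix table in Python. The affix lists below are assumptions chosen for illustration, not a complete morphological analyzer:

```python
# Toy affix tables (assumed for illustration; real analyzers use full lexicons).
INFLECTIONAL = {"s": "plural / 3rd person", "ed": "past tense", "ing": "progressive"}
DERIVATIONAL = {"ation": "verb -> noun", "ness": "adjective -> noun", "ly": "adjective -> adverb"}

def classify_suffix(word, stem):
    """Classify the suffix that turns `stem` into `word` as inflectional or derivational."""
    if not word.startswith(stem):
        return None
    suffix = word[len(stem):]
    if suffix in INFLECTIONAL:
        return ("inflectional", INFLECTIONAL[suffix])
    if suffix in DERIVATIONAL:
        return ("derivational", DERIVATIONAL[suffix])
    return None
```

For example, classify_suffix("playing", "play") reports an inflectional suffix (the verb stays a verb), while classify_suffix("summarization", "summariz") reports a derivational one (a verb became a noun).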
Applications:
1) Machine Translation:
Helps in translating words with different morphological rules across languages.
Example: Translating “playing” (English) to “jouant” (French).
2) Spell Checking: Identifies and suggests corrections for misspelled words based on
morphological rules.
3) Information Retrieval:
Improves search accuracy by matching words with their morphological variations.
Example: Searching for “run” will also return results for “running” and “ran.”
4) Text-to-Speech Systems: Morphological analysis helps in pronouncing words correctly by
understanding their structure.
5) Sentiment Analysis:
Extracts meaning from words with prefixes or suffixes indicating positive or negative
sentiments.
Example: "unhappy" (negative sentiment) → "happy" (positive root).
6) Language Modeling: Helps build NLP models for morphologically rich languages like
Finnish, Turkish, or Hindi, where words have complex structures.
In a log-linear (Maximum Entropy) model, the conditional probability is:

P(y|x) = exp( Σᵢ λᵢ fᵢ(x, y) ) / Z(x)

Where:
P(y|x): The probability of class y given input x.
λᵢ, fᵢ(x, y): The weight and value of the i-th feature.
Z(x): The normalization factor, also called the partition function, which ensures that
the probabilities sum to 1:

Z(x) = Σ_y′ exp( Σᵢ λᵢ fᵢ(x, y′) )
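The probability P(y|x) and partition function Z(x) described above can be computed directly. The sketch below assumes a log-linear (Maximum Entropy) model with a single hand-written feature and weight, chosen only for illustration:

```python
import math

def maxent_prob(x, classes, features, weights):
    """P(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x), with Z(x) summing over all classes."""
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
              for y in classes}
    z = sum(scores.values())            # the partition function Z(x)
    return {y: s / z for y, s in scores.items()}

# One assumed feature: "word ends in -ing and the class is VERB".
features = [lambda x, y: 1.0 if x.endswith("ing") and y == "VERB" else 0.0]
weights = [2.0]
probs = maxent_prob("running", ["NOUN", "VERB"], features, weights)
```

Because Z(x) normalizes the exponentiated scores, the returned probabilities always sum to 1, and the positively weighted feature pushes probability mass toward VERB for "-ing" words.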
Pre-processing Steps:
1) Stemming:
Stemming reduces words to their base or root form, usually by removing suffixes.
The resulting stems are not necessarily valid words but are useful for text normalization.
Common ways to implement stemming:
1. Porter Stemmer: One of the most popular stemming algorithms, known for its
simplicity and efficiency.
2. Snowball Stemmer: An improvement over the Porter Stemmer, supporting multiple
languages.
3. Lancaster Stemmer: A more aggressive stemming algorithm, often resulting in
shorter stems.
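A drastically simplified suffix stripper can illustrate the idea. This is a toy sketch, not the actual Porter algorithm (which applies ordered phases of context-sensitive rules):

```python
def naive_stem(word):
    """Strip the first matching suffix (a toy illustration, NOT the Porter algorithm)."""
    for suffix in ("ization", "ational", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]   # keep at least a 3-letter stem
    return word
```

As the notes say, stems need not be valid words: naive_stem("played") gives "play" and naive_stem("cats") gives "cat", but naive_stem("running") gives "runn".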
2) Lemmatization:
Lemmatization reduces words to their base or dictionary form (lemma).
It considers the context and part of speech, producing valid words.
To implement lemmatization in Python, the WordNet Lemmatizer is commonly used; it
leverages the WordNet lexical database to find the base form of words.
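A dictionary-lookup sketch shows why context (part of speech) matters for lemmatization. The lemma table below is a tiny assumed sample of the kind of mapping the full WordNet database provides:

```python
# A tiny assumed lemma table; NLTK's WordNetLemmatizer consults the full WordNet database.
LEMMA_TABLE = {
    ("ran", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("mice", "NOUN"): "mouse",
}

def lemmatize(word, pos):
    """Look up the (word, part-of-speech) pair; fall back to the word itself."""
    return LEMMA_TABLE.get((word, pos), word)
```

Unlike stemming, every output here is a valid dictionary word, and the same surface form can resolve differently depending on its part of speech.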
3) Morphological Parsing:
Morphological parsing involves analyzing the structure of words to identify their
morphemes (roots, prefixes, suffixes).
It requires knowledge of morphological rules and patterns.
Finite-State Transducers (FSTs) are used as a tool for morphological parsing.
FSTs are computational models used to represent and analyze the morphological structure
of words.
They consist of states and transitions, capturing the rules of word formation.
Applications:
1. Morphological Analysis: Parsing words into their morphemes.
2. Morphological Generation: Generating word forms from morphemes.
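A toy FST over a one-word lexicon can make states and transitions concrete. The lexicon (only "cat") and the feature symbols "+N+PL" are assumptions for illustration:

```python
# Each transition maps (state, input symbol) -> (next state, output string).
TRANSITIONS = {
    (0, "c"): (1, "c"), (1, "a"): (2, "a"), (2, "t"): (3, "t"),  # spell out the root
    (3, "s"): (4, "+N+PL"),                                      # plural "s" -> features
}
FINAL_STATES = {3, 4}  # state 3 = bare noun, state 4 = plural noun

def parse(word):
    """Run the transducer over `word`; return its analysis or None if rejected."""
    state, output = 0, []
    for ch in word:
        if (state, ch) not in TRANSITIONS:
            return None                  # no transition: word not in this language
        state, out = TRANSITIONS[(state, ch)]
        output.append(out)
    return "".join(output) if state in FINAL_STATES else None
```

Here parse("cats") yields the analysis "cat+N+PL", parse("cat") yields "cat", and parse("dog") is rejected because the transducer has no path for it.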
4) Neural Network Models: Neural network models, especially deep learning models, can be
trained to perform morphological analysis by learning patterns from large datasets.
Types of Neural Network:
1. Recurrent Neural Networks (RNNs): Useful for sequential data like text.
2. Convolutional Neural Networks (CNNs): Can capture local patterns in the text.
3. Transformers: Advanced models like BERT and GPT that understand context and
semantics.
5) Rule-Based Methods:
Rule-based methods rely on manually defined linguistic rules for morphological analysis.
These rules can handle specific language patterns and exceptions.
Applications:
1. Affix Stripping: Removing known prefixes and suffixes to find the root form.
2. Inflectional Analysis: Identifying grammatical variations like tense, number, and
case.
6) Hidden Markov Models (HMMs):
Hidden Markov Models (HMMs) are probabilistic models that can be used to analyze
sequences of data, such as morphemes in words.
HMMs consist of a set of hidden states, each representing a possible state of the system,
and observable outputs generated from these states.
In the context of morphological analysis, HMMs can be used to model the probabilistic
relationships between sequences of morphemes, helping to predict the most likely
sequence of morphemes for a given word.
Components of Hidden Markov Models (HMMs):
1. States: Represent different parts of words (e.g., prefixes, roots, suffixes).
2. Observations: The actual characters or morphemes in the words.
3. Transition Probabilities: Probabilities of moving from one state to another.
4. Emission Probabilities: Probabilities of an observable output being generated from a
state.
Applications:
1. Morphological Segmentation: Breaking words into morphemes.
2. Part-of-Speech Tagging: Assigning parts of speech to each word in a sentence.
3. Sequence Prediction: Predicting the most likely sequence of morphemes for a given
word.
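A small Viterbi decoder shows how these four components interact to find the most likely state sequence. All probabilities below are invented for illustration, with states playing the morpheme roles PREFIX/ROOT/SUFFIX:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observed sequence."""
    # Each cell holds (probability of the best path ending here, that path).
    v = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (v[-1][p][0] * trans_p[p][s] * emit_p[s].get(o, 0.0), v[-1][p][1] + [s])
                for p in states
            )
            layer[s] = (prob, path)
        v.append(layer)
    return max(v[-1].values())[1]

# Toy model for segmenting "un + happi + ness" (all numbers assumed).
states = ["PREFIX", "ROOT", "SUFFIX"]
start_p = {"PREFIX": 0.5, "ROOT": 0.5, "SUFFIX": 0.0}
trans_p = {
    "PREFIX": {"PREFIX": 0.1, "ROOT": 0.9, "SUFFIX": 0.0},
    "ROOT":   {"PREFIX": 0.0, "ROOT": 0.1, "SUFFIX": 0.9},
    "SUFFIX": {"PREFIX": 0.0, "ROOT": 0.1, "SUFFIX": 0.9},
}
emit_p = {
    "PREFIX": {"un": 0.9},
    "ROOT":   {"happi": 0.8, "un": 0.1},
    "SUFFIX": {"ness": 0.9},
}
```

Running viterbi(["un", "happi", "ness"], ...) labels the three morphemes PREFIX, ROOT, SUFFIX, because that path maximizes the product of start, transition, and emission probabilities.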
Types of N-grams:
1. Unigrams (1-grams): These are single words.
2. Bigrams (2-grams): These consist of pairs of consecutive words.
3. Trigrams (3-grams): These are sequences of three consecutive words.
Example: For the sentence "I love natural language processing":
Unigrams: "I", "love", "natural", "language", "processing"
Bigrams: "I love", "love natural", "natural language", "language processing"
Trigrams: "I love natural", "love natural language", "natural language processing"
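All three types can be generated with one small helper, a minimal sketch of sliding-window n-gram extraction:

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples of tokens) in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love natural language processing".split()
# ngrams(tokens, 1) -> unigrams, ngrams(tokens, 2) -> bigrams, ngrams(tokens, 3) -> trigrams
```

A sentence of length L yields L unigrams, L-1 bigrams, and L-2 trigrams.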
Uses of N-grams:
1) Language Modeling:
N-grams are used to estimate the probability of a word given its previous N-1 words.
This is fundamental in predicting the next word in a sequence of text, making them
essential for applications like auto-completion and text generation.
2) Text Classification:
N-grams can be used to represent documents or text for classification tasks.
By counting the occurrences of N-grams in a document, it's possible to create a feature
vector that can be used in machine learning algorithms for text classification.
3) Information Retrieval:
In search engines, N-grams can be used to index documents and query terms.
This helps in ranking and retrieving relevant documents.
4) Speech Recognition: N-grams can be used to model sequences of phonemes or words in
speech recognition systems, aiding in accurate transcription of spoken language.
5) Machine Translation: N-grams are used in machine translation to align and translate
sequences of words or phrases in different languages.
6) Spelling Correction: N-grams can help identify and correct misspelled words by comparing
them to correctly spelled N-grams in a language model.
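Use (1), language modeling, reduces to counting n-grams. The sketch below is a maximum-likelihood bigram estimate over an assumed toy corpus, with no smoothing for unseen pairs:

```python
from collections import Counter

def bigram_prob(corpus_tokens, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw counts (no smoothing)."""
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    unigrams = Counter(corpus_tokens)
    if unigrams[prev] == 0:
        return 0.0                       # unseen history word
    return bigrams[(prev, word)] / unigrams[prev]

corpus = "i love nlp i love code".split()   # assumed toy corpus
```

In this corpus "i" is always followed by "love", so P(love | i) = 1.0, while "love" splits evenly between "nlp" and "code", giving P(nlp | love) = 0.5. Real language models add smoothing so unseen bigrams do not get probability zero.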