NLP Study Material

The document is a set of study material for Natural Language Processing (NLP), divided into four modules. It covers topics including Named Entity Recognition, tokenization, language models, text classification, sentiment analysis, recommendation systems, and the use of libraries such as NLTK and spaCy. Each module contains questions and tasks designed to deepen understanding of NLP concepts and their real-world applications.

Uploaded by avoynath2005
Copyright © All Rights Reserved

NLP Study Material

Module 1

 Identify two real-world scenarios where Named Entity Recognition is utilized.
 Describe different categories of entities recognized by NER with relevant illustrations.
 Define the concept of natural language processing in your own words.
 What are stop words and why are they often excluded from processing?
 Cite two domains where NLP techniques are commonly applied.
 Clarify how precision differs from recall in the evaluation of search results.
 Briefly describe the purpose of the NLTK library.
 Explain what is meant by identifying multiple words as a single unit in tokenization.
 What is stemming and how is it useful in NLP?
 Explain what affixes are and their function in word formation.
 What is a lexicon in NLP, and why is it important?
 Why do we sometimes prefer tokenizing phrases instead of individual words?
 Describe the task of breaking text into separate sentences.
 What is the importance of segmenting text into sentences?
 Define morphology as it pertains to NLP tasks.
 Name the key types of morphology relevant in computational linguistics.
 How does natural language understanding differ from NLP more generally?
 Provide examples of standard corpora used in NLP research.
 Distinguish between dividing text into words and segmenting into sentences.
 Outline the steps taken to solve a typical NLP problem.
 Show how a sentence is broken into tokens, using a specific sentence.
 Describe the inner workings of a named entity recognition system.
 Explain the value of removing stop words and when it might be counterproductive.
 Illustrate how regular expressions work through a simple case.
 Discuss how dependency parsing helps analyze sentence structure.
 Create a regex for all strings using only ‘a’ and ‘b’ with even length.
 Build a regex for strings of length four starting with the letter 'a'.
 Write a regex that ensures a string includes at least one ‘a’.
 Compare the features and capabilities of NLTK and spaCy.
 Define and exemplify the Bag of Words model.
 How do regular expressions differ from regular grammar?
 Walk through both word and sentence tokenization with a hands-on example.
 What are the frequent issues faced in morphological analysis, and how can they be
solved?
 Explain and demonstrate the Minimum Edit Distance algorithm with “MAM” and
“MADAM”.
 Describe how NER and IE are used to address real-life tasks.
 Pick a real-world NLP problem and explain how it was solved step-by-step.
 Explain how to build a corpus tailored to a specific NLP problem.
 What does Information Extraction aim to achieve in NLP?
 List uses of sentiment and opinion mining with corresponding variations.
 List practical examples of where Information Retrieval techniques are applied.
 Define text normalization and why it matters in NLP preprocessing.
 How does normalization differ from tokenization? Provide justification with
examples.
 Why is POS tagging important? Explain with a real example.
 Point out limitations of a basic Top-Down parsing technique.
 Clarify the contrast between classification and prediction using an example.
 How are word and phrase tokenization techniques used differently?
 Name two real-life examples that show NLP’s role across industries.
 Identify all strings of length 5 or fewer from given regular expressions.
 Write regex for:
 All alphabet-only strings
 Lowercase strings ending with ‘b’
 Strings where every 'a' is surrounded by 'b'
 Describe how rule-based POS tagging works.
 Explain how a regular expression contrasts with regular grammar.
 Briefly explain what NLTK is and its purpose.
 Clarify what’s meant by Multi Word Tokenization.
 Describe how sentence segmentation operates in NLP.
 What does morphology focus on in computational linguistics?
 Share well-known corpus datasets used in NLP.
 What does word tokenization involve?
 Calculate the minimum number of operations to turn “ELEPHANT” into
“RELEVANT”.
 Determine the edit distance between "SUNDAY" and "SATURDAY".
 Identify the different branches of morphology in linguistics.
 Explain what stemming is and where it fits in the NLP pipeline.
 Define a corpus and explain its role in NLP.
 Differentiate stemming and lemmatization with examples.
 List and describe the major stages in the NLP pipeline.
 What is the role of a chatbot in NLP-driven applications?
 Summarize the main steps in text preprocessing and explain how to manage outliers.
 Provide an example highlighting issues with sentence segmentation.
 What are some fundamental tasks NLP systems typically perform?
 Describe what is meant by text cleanup and extraction, with examples.
 What is word sense ambiguity and how can it impact NLP?
 Write a short note on how the Bag of Words approach works.
 What is homonymy? Provide a brief explanation with an example.
 Define WordNet and its utility.
 Given a document and corpus stats, compute TF and IDF values.
 Relate the use of SVD, matrix completion, and matrix factorization in NLP.
 Present two use cases showing how regex supports NLP tasks.
 Why is it beneficial to use multiword tokens over single tokens? Provide illustrations.
 Differentiate natural language from formal language.
 Define lexicon and lexeme, and explain the relationships between different lexemes.
 What makes bottom-up parsers more efficient than top-down ones?
 Describe the Skip-gram model and its approach to learning embeddings.
 Clarify how TF-IDF scoring works in search ranking.
 Tokenize and tag a sample sentence.
 Identify different pronunciations and parts-of-speech of a given word.
 Use an edit distance grid to transform “intention” to “execution”.
 Why is creating corpora essential in NLP research?
 Discuss the role of regex in managing and exploring text data.
 Explain how WordNet supports understanding word relationships in NLP.
 Define pragmatic ambiguity and explain how it affects interpretation.
 Describe what kind of strings are matched by these regex patterns:
a. [a-zA-Z]+
b. [A-Z][a-z]*
 Extract email addresses from a sample text using regex.
 Analyze a faulty regex intended to capture uppercase letters and digits. Suggest a fix.
 Build a regex capable of detecting three different date formats in text.
 Calculate edit distance between “MAMA” and “MADAAM”.
 Find the transformation steps for turning “kitten” into “sitting”, with a cost of 1 per edit operation.
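
Worked sketch for the edit-distance questions in this module. This is a minimal dynamic-programming implementation of minimum edit distance, not a reference solution; the function name `edit_distance` and its cost parameters are illustrative choices. By default every operation costs 1, and `sub_cost=2` reproduces the variant in which a substitution counts as a deletion plus an insertion.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum edit distance computed with the standard DP grid."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost   # delete every source char
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost   # insert every target char
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match / substitution
            )
    return dist[-1][-1]


print(edit_distance("kitten", "sitting"))                   # 3
print(edit_distance("SUNDAY", "SATURDAY"))                  # 3
print(edit_distance("MAM", "MADAM"))                        # 2
print(edit_distance("intention", "execution", sub_cost=2))  # 8
```

The full grid `dist` is what the "edit distance grid" questions ask you to fill in by hand; tracing back from the bottom-right cell recovers the sequence of operations.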

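Worked sketch for the regex-construction questions in this module, using Python's `re` module. The questions leave the alphabet unspecified, so the patterns below assume lowercase letters where it matters, and "every 'a' is surrounded by 'b'" is read as "each 'a' has a 'b' immediately before and after it" over the alphabet {a, b}. The dictionary keys and the helper `matches` are hypothetical names used only for this illustration.

```python
import re

# Each pattern is applied with re.fullmatch, so it must cover the whole string.
PATTERNS = {
    "even_length_ab": r"(?:[ab]{2})*",        # over {a, b}; the empty string counts as even
    "length4_starts_a": r"a[a-z]{3}",         # assumes a lowercase-letter alphabet
    "contains_a": r"[a-z]*a[a-z]*",           # at least one 'a'
    "alphabet_only": r"[A-Za-z]+",            # letters only
    "lowercase_ends_b": r"[a-z]*b",           # lowercase string ending in 'b'
    "a_between_b": r"(?:b+(?:ab+)*)?",        # every 'a' has a 'b' on both sides
}


def matches(name, text):
    """Return True if the whole of `text` matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


print(matches("even_length_ab", "abba"))   # True
print(matches("even_length_ab", "aba"))    # False
print(matches("a_between_b", "babab"))     # True
print(matches("a_between_b", "baab"))      # False
```

Using `re.fullmatch` rather than `re.search` matters here: the questions describe the set of strings a pattern accepts, not substrings found inside a longer text.
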
Module 2

 How would you define a language model in NLP?
 Explain how an n-gram model works, using a simple example.
 Mention two key differences between bigram and trigram models.
 What is the chain rule of probability, and how is it used in language modeling?
 When applying the bigram model, what simplifications do we introduce into
probability computation?
 Define the Markov assumption and explain its role in NLP.
 What purpose does maximum likelihood estimation serve in statistical NLP?
 Given a word and its preceding word, how would you calculate normalized bigram
probabilities?
 What does relative frequency mean in the context of n-gram models?
 Identify the core elements that make up a semantic interpretation system.
 What is lexical ambiguity and how does it affect language understanding?
 Provide an explanation and example of semantic ambiguity.
 What is syntactic ambiguity, and how does it arise in sentences?
 Why is it important to represent meaning in computational language processing?
 Explain the major contrast between lexical and semantic analysis.
 List two tools commonly used for creating or analyzing language models.
 Discuss various part-of-speech attributes with suitable examples.
 What is extrinsic evaluation in language models, and what challenges are associated
with it?
 Demonstrate how path-based similarity is used to compare word meanings.
 Define and differentiate homonymy, polysemy, and synonymy with examples.
 How can WordNet be used to derive meaning from text corpora?
 Explain how computational semantics is implemented in NLP systems.
 What are the drawbacks of basic path similarity, and how does information content
improve it?
 How does the extended Lesk algorithm work? Include a use case.
 Highlight the differences between rule-based and stochastic part-of-speech tagging
techniques.
 What are the features and advantages of stochastic POS tagging?
 Describe rule-based POS tagging and list its key properties.
 Give examples of how n-gram models assist in predicting the next word in a
sentence.
 What is transformation-based tagging and how does it function?
 Compare structured and unstructured data using NLP examples.
 What is semi-structured data, and how does it differ from structured data? Provide
an example.
 How can supervised learning methods help in categorizing text?
 List practical applications of emotion detection in real-world scenarios.
 Suppose you work for a food delivery company and need to survey competitors—
how would you carry this out using NLP techniques?
 Describe a traditional search model and illustrate it with a diagram.
 Why is tagging parts of speech vital for natural language tasks?
 What is meant by “vocabulary” in NLP and why does it matter?
 Define Information Extraction and state its importance.
 What is morphological parsing, and what are its major stages?
 Clarify the idea of a Bag-of-Words and give an example of its usage.
 How do formal and natural languages differ from each other?
 Suppose you want to regenerate a document based on its word content. How would
you do this and what does it tell you about the topic?
 What does text parsing mean in an NLP context?
 How is sentiment analysis applied in analyzing market trends?
 Briefly describe the Hidden Markov Model and its role in sequence labeling.
 Explain a key advantage of using Latent Dirichlet Allocation (LDA) over Probabilistic
Latent Semantic Analysis (PLSA) for recommendations.
 How can building a recommender system using matrix factorization be considered a
regression task?
 Name two common POS tagging methods used in computational linguistics.
 What is WordNet and what role does it play in NLP?
 Explain the hierarchy of word relationships as represented in WordNet.
 How are morphological operations implemented in NLP applications?
 Define hypernyms, hyponyms, and heteronyms, and explain with examples.
 Compare the advantages and disadvantages of Skip-gram and CBOW models.
 Describe how text classification works using the Naive Bayes algorithm.
 How can a Naive Bayes model be applied to collaborative filtering tasks?
 Clarify the difference between lexical and semantic analysis in NLP.
 Define what n-grams are and explain their use in language processing.
 What are word embeddings and how are they useful in NLP systems?
 Explain “vector semantics” and its role in interpreting word meaning.
 State a major limitation of the TF-IDF model in NLP applications.
 Show how regular expressions help in processing and cleaning text data.
 Describe the role of n-grams in language modeling, and illustrate how they detect
text patterns.
 Discuss how n-grams are applied in building classification systems.
 What problem does the unigram model present in extracting useful information?
 Define homographs and illustrate with a real-world example.
 How does the Levenshtein distance measure similarity between two words?
 What are heteronyms? Give one example.
 Define polysemy and illustrate it with a word that has multiple meanings.
 What are synonyms and antonyms? Provide examples of each.
 Given a small corpus and a bigram model, calculate the smoothed probability of a
word following another.
 Analyze the validity of the following:
o Are rule-based taggers deterministic?
o Do stochastic taggers depend on specific languages?
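
Worked sketch for the bigram-probability questions in this module. The three-sentence corpus is invented purely for illustration (the source supplies no corpus), and the function names `mle` and `laplace` are hypothetical. `mle` is the relative-frequency (maximum likelihood) estimate count(prev, word) / count(prev); `laplace` applies add-one smoothing with the vocabulary size in the denominator.

```python
from collections import Counter

# Hypothetical toy corpus, invented for this example only.
corpus = [
    "<s> i like nlp </s>",
    "<s> i like python </s>",
    "<s> nlp is fun </s>",
]

sentences = [line.split() for line in corpus]
unigrams = Counter(word for sent in sentences for word in sent)
bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
vocab_size = len(unigrams)  # 8 distinct tokens, counting <s> and </s>


def mle(prev, word):
    """Unsmoothed estimate: count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]


def laplace(prev, word):
    """Add-one smoothed estimate: (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


print(mle("i", "like"))         # 1.0
print(mle("like", "fun"))       # 0.0
print(laplace("i", "like"))     # 0.3
print(laplace("like", "fun"))   # 0.1
```

The unseen bigram ("like", "fun") gets probability 0 under MLE but a small non-zero probability after smoothing, which is the problem smoothing is meant to address.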

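Sketch for the Skip-gram versus CBOW questions. It shows only how training examples are formed from a token sequence, which is the main structural difference between the two models; the neural network that learns the embeddings is omitted. The function names and the `window` parameter are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word is paired with every context word (center -> context)."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the whole context window is used to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, center))
    return examples


tokens = ["the", "cat", "sat"]
print(skipgram_pairs(tokens, window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_examples(tokens, window=1))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

Skip-gram produces one training pair per (center, context) combination, while CBOW produces one example per position.
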
Module 3

 How can concepts like TF-IDF, data splitting (train/validation/test), and stop words
influence the performance of NLP-based machine learning models? What challenges
might arise, and how can they be addressed?
 What is meant by text classification in NLP?
 How can structured information be extracted from unstructured text?
 What are ad-hoc retrieval tasks in information systems?
 Which aspects of ad-hoc retrieval are typically addressed in IR research?
 What components are included in a basic Information Retrieval system?
 Define the purpose and structure of an inverted index.
 How can manually designed rules assist in categorizing text?
 List and explain a few machine learning techniques used for classifying text
documents.
 What are some known limitations of the Naive Bayes classifier in text tasks?
 Describe the outcome of applying the independence assumption in the multinomial
Naive Bayes model.
 Mention two scenarios where the bag-of-words approach can be effectively applied.
 What issue does maximum likelihood estimation present in multinomial Naive Bayes,
and how can this be mitigated?
 Using the concept of a confusion matrix, explain the working of a spam classifier.
 How is k-fold cross-validation used to test the robustness of a classifier?
 Identify real-world challenges faced by text classifiers and how to resolve them.
 What are the major categories of techniques used for text classification?
 List three performance metrics for evaluating classifiers. Explain each with suitable
examples.
 What evaluation metrics can be derived from a confusion matrix to assess a classifier's performance?
 Illustrate Word2Vec-based embeddings using a simple diagram.
 Explain how Doc2Vec differs from Word2Vec, with a labeled diagram.
 With examples, explain how vector semantics and probabilistic models help represent
sequences of words.
 Define the process of opinion mining in NLP.
 What are the major considerations when collecting feedback data for sentiment
analysis?
 What is intent analysis and where is it applied?
 Define emotion analysis and describe how it functions.
 Explain the mechanism behind emotion analytics and its impact on business or user
behavior.
 Why is the Naive Bayes model still effective despite its "naive" independence assumption?
 Describe how the multinomial Naive Bayes algorithm works using a step-by-step
approach.
 Differentiate micro and macro averaging in classification metrics using an example.
 Explain three different techniques for opinion mining.
 What challenge arises in keyword-based IR when dealing with large documents?
 What are the key initial steps in the text preprocessing pipeline?
 What is the primary aim of an information retrieval system?
 In what different ways can a Bag-of-Words representation be leveraged for
classifying documents?
 Compare sentiment, emotion, and intent analysis in terms of purpose and output.
 How do companies utilize sentiment analysis to track product reception in the market?
 Name some practical implementations of emotion detection through recognition
techniques.
 Show how Naive Bayes can be applied step-by-step in a text classification scenario.
 List and explain the four essential stages involved in text normalization.
 Highlight specific real-world applications of automatic text classification.
 Define NER and explain its role in extracting meaningful elements from text.
 How is Named Entity Recognition useful in building NLP-based applications?
 Explain how k-fold validation helps in performance testing of classifiers.
 Discuss the core principles of NLP and why they are valuable in modern applications.
 What is ambiguity in language? Describe the different types found in NLP.
 List some key benefits of using automated text classification, with an example.
 What foundational concepts make up a semantic system in NLP?
 How does NLTK differ from spaCy in terms of features and use cases?
 Provide a detailed explanation of dependency parsing in syntactic analysis.
 What preprocessing tasks are typically performed before applying NLP techniques?
 Describe different industries where chatbots are being actively used.
 Using Levenshtein distance (insertion = 1, deletion = 1, substitution = 2), transform
“DOG” into “COW”.
 What is word embedding and how does it benefit various NLP applications?
 Clarify the difference between prediction and classification with practical examples.
 How does lexical ambiguity affect tasks such as translation or sentiment detection?
How does WordNet help?
 What is topic modeling in text analytics and what purpose does it serve?
 Using the given dataset, apply Naive Bayes to determine if an email with “Offer =
Yes”, “Win = Yes”, and “Money = Yes” is spam.
 Given a dataset of customer feedback, determine if a new feedback entry is “Positive”
using Naive Bayes.
 With a given weather dataset, predict whether someone will play tennis using Naive
Bayes based on the conditions:
o Outlook: Rain
o Temperature: Mild
o Humidity: High
o Wind: Strong
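
Worked sketch for the play-tennis question above. The source does not include the weather dataset, so this example ASSUMES the classic 14-row "play tennis" table that appears in many machine learning textbooks; if your course uses a different table, the numbers will differ. The functions `train` and `score` are hypothetical names, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# ASSUMPTION: classic textbook play-tennis table (not given in the source).
# Columns: Outlook, Temperature, Humidity, Wind, PlayTennis
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def train(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for *values, label in rows:
        for idx, value in enumerate(values):
            value_counts[(idx, label)][value] += 1
    return class_counts, value_counts


def score(query, class_counts, value_counts):
    """Unnormalized posterior: P(label) * product of P(value | label)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        prob = count / total
        for idx, value in enumerate(query):
            prob *= value_counts[(idx, label)][value] / count
        scores[label] = prob
    return scores


class_counts, value_counts = train(ROWS)
scores = score(("Rain", "Mild", "High", "Strong"), class_counts, value_counts)
print(scores)                       # Yes is about 0.0106, No is about 0.0274
print(max(scores, key=scores.get))  # No
```

Under this assumed table the "No" score is larger, so the classifier predicts that tennis is not played. The same `train` and `score` functions apply to the spam and customer-feedback questions once their datasets are supplied.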

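Sketch for the confusion-matrix and evaluation-metric questions in this module. The counts in the usage example are invented for illustration (a hypothetical spam filter evaluated on 100 emails); the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many are found
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical spam filter: 8 spam caught, 2 ham flagged, 4 spam missed, 86 ham passed.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case accuracy is high because most emails are not spam, while recall shows that a third of the spam is missed; this is why precision and recall are reported alongside accuracy on imbalanced data.
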
Module 4

 Break down the primary types of recommendation systems and how they are
categorized.
 How does a content-based recommendation engine function?
 Describe the operational principles behind collaborative filtering.
 List and briefly explain the key metrics used to evaluate recommendation systems.
 What is meant by a hybrid recommendation approach?
 Define conversational agents and explain their purpose.
 What is the role of summarization in processing large text data?
 How does item-based collaborative filtering differ from user-based methods?
 Provide some real-world use cases of topic modeling.
 Why is there a need to automatically summarize textual content?
 Justify the classification of a chatbot as a conversational system.
 What benefits does artificial intelligence bring to chatbot development?
 Explain the idea behind retrieval-based conversation models.
 Define a question-answering system and its purpose.
 Mention examples of commonly used question answering systems.
 How does the user-based collaborative filtering technique operate?
 Explain the approach of using IR for building a question answering system.
 Contrast the goals of information retrieval versus traditional web search.
 Use an example to describe how user ratings help shape a recommendation.
 What is sentiment analysis, and how can it be illustrated with an example?
 Describe the various types of recommendation engines available.
 Provide real-world scenarios that exemplify recommendation system usage.
 Explain two different types of conversational agents and their characteristics.
 Show with an example how collaborative filtering can generate recommendations.
 List the most frequent use cases where sentiment analysis is applied.
 What are the major steps involved in implementing Latent Dirichlet Allocation
(LDA)?
 Describe how sentiment from Twitter posts can be analyzed using NLP.
 What components define chatbot architecture in modern systems?
 Explain how summarizing multiple documents works.
 Define topic modeling and its significance in text analysis.
 What is extractive summarization and how is it implemented?
 Contrast extractive and abstractive summarization techniques.
 Categorize recommendation strategies and provide an example of each.
 Highlight various techniques for summarizing text content.
 What are the most relevant use-cases where recommendation engines are deployed?
 Differentiate content-based filtering from collaborative filtering.
 Outline the major steps of sentiment analysis in NLP.
 What makes LDA different from other topic modeling techniques?
 With a simple case, explain how abstractive summarization is done.
 Given a collection of sentences, demonstrate how summarization techniques would condense them.
 What are the key pros and cons of both collaborative and content-based filtering?
 How do TF-IDF, training-validation-test split, and stop word removal impact model
building in NLP? What are some common challenges, and how can they be
addressed?
 How do extractive and abstractive summarizers differ? How would you construct an
extractive summarization tool?
 Describe how models like GPT use pretraining to enhance performance on NLP tasks.
 How can a Naive Bayes model be adapted for use in collaborative filtering?
 Clarify how lexical and semantic analysis differ in language processing.
 How does NLP support sentiment analysis tasks?
 Provide a brief explanation of n-grams in the NLP context.
 Describe how n-grams are applied in practical NLP tasks.
 Define the term "data augmentation" in the context of NLP.
 What steps would you follow to build a recommendation system that works with text
data?
 Explain how CBOW and Skip-Gram are implemented as part of Word2Vec.
 What distinguishes collaborative filtering from content-based recommendation?
 Name a few real-world scenarios where recommendation systems are applied.
 Compare rule-based, retrieval-based, and generative chatbot architectures, evaluating
them by scalability, quality, and flexibility.
 How does ChatGPT leverage large-scale pretraining and transformer architecture to
deliver relevant replies?
 Provide an example to explain collaborative recommendation systems in action.
 List five diverse areas where NLP has real-world applications like education,
healthcare, or finance.
 Why is Natural Language Understanding (NLU) essential in chatbot design?
 Outline the architectural framework behind ChatGPT as an NLP model.
 Mention different ways to apply data augmentation in NLP-related tasks.
 Distinguish between information retrieval and a basic web search engine.
 Elaborate on the principles of item-based collaborative filtering.
 What is dialogue management and how is it relevant in chatbot development?
 How do large pre-trained models like GPT-3 enhance conversational systems?
 How does user-based collaborative filtering operate in recommender engines?
 Can statistical models support machine translation? If yes, give a short explanation.
 With a diagram, explain how both single and multi-document summarization
techniques function.
 Provide an example that demonstrates how abstractive summarization works.
 Highlight the strengths and weaknesses of collaborative vs. content-based filtering
approaches.
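
Sketch for the user-based collaborative filtering questions in this module. The ratings table is invented for illustration, and the approach shown (cosine similarity over co-rated items, then a similarity-weighted average) is one common textbook formulation rather than the only one. The names `ratings`, `cosine` and `predict` are hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        similarity_sum += sim
    return weighted_sum / similarity_sum if similarity_sum else None


print(round(predict("alice", "D"), 2))  # 3.79
```

Alice's tastes match Bob's exactly on their shared items, so Bob's rating of 5 for item D carries more weight than Carol's rating of 2. Item-based filtering uses the same idea with the roles of users and items swapped.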

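Sketch for the extractive summarization questions in this module. It scores each sentence by the average corpus frequency of its non-stop words and keeps the top-scoring sentences in their original order. This is a deliberately simple frequency-based heuristic, offered as one possible starting point; the stop-word list is a tiny illustrative set, and the function name `summarize` is an assumption.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it", "on"}


def content_words(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=1):
    """Return the `num_sentences` highest-scoring sentences, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


text = "NLP models process text. NLP models learn from text data. Dogs bark."
print(summarize(text, num_sentences=1))  # NLP models process text.
```

An abstractive summarizer, by contrast, generates new sentences rather than selecting existing ones, so it cannot be reduced to a scoring function like this.
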