Machine Learning Natural Language 2023
NLP System : IBM Watson
Natural Language Processing
• NLP focuses on developing systems that allow
computers to perform useful tasks involving human
language
– Also called Computational Linguistics
• NLP applications
– Information Retrieval
– Question Answering
– Machine Translation
– Information Extraction
NLP application : Information Retrieval
• Stemming
• Spell checking
• Query expansion
• Word sense disambiguation
NLP application : Question Answering
NLP application : Machine Translation
• Sentence alignment
• POS tagging
• Parsing
• Sentence generation grammars
• Named Entity Recognition (“New Delhi”)
NLP application : Information Extraction
NLP application : Categorization
• Topical : politics, sports, business
• Sentiment: positive, negative, neutral
POS tagging to obtain adjectives
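The idea of using POS tags to pick out adjectives for sentiment categorization can be sketched as follows. This is a minimal illustration, not a production method: the polarity lexicon `POLARITY`, the helper names, and the pre-tagged input are all assumptions made for the example, and a real system would obtain the tags from a trained tagger.

```python
# Toy sentiment categorizer: keep only adjectives (Penn Treebank tags
# starting with "JJ"), then look them up in a small polarity lexicon.
# The lexicon entries are hypothetical, chosen only for illustration.
POLARITY = {
    "great": 1, "good": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}


def adjectives(tagged_tokens):
    """Return the lowercased words whose tag marks them as adjectives."""
    return [word.lower() for word, tag in tagged_tokens if tag.startswith("JJ")]


def sentiment(tagged_tokens):
    """Sum adjective polarities and map the total to a category."""
    score = sum(POLARITY.get(adj, 0) for adj in adjectives(tagged_tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Input is assumed to be already POS-tagged as (word, tag) pairs.
review = [("The", "DT"), ("food", "NN"), ("was", "VBD"), ("excellent", "JJ")]
print(adjectives(review), sentiment(review))
```

Restricting the lookup to adjectives keeps nouns and verbs that happen to appear in the lexicon from influencing the score, at the cost of missing sentiment carried by other word classes.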
NLP : Tasks
• Discourse
Coreference Resolution : linking pronouns/abbreviations to entities
“I saw Scott yesterday. He was fishing by the lake.”
“Indian Institute of Technology Hyderabad is a public institution
located in Hyderabad. IITH was established in 2007.”
• Semantic
“I put the plant in the window” vs “Ford put the plant in Mexico”
• Ambiguity is Explosive
“I saw the man on the hill with the telescope.”: 4 parses
Machine Learning Natural Language
● “Rules” in language have numerous exceptions and irregularities
● Manual knowledge engineering is difficult, time-consuming, and error-prone.
● Use machine learning methods to automatically acquire the required
knowledge from appropriately annotated text corpora.
● Annotating corpora is easier and requires less expertise than manual
knowledge engineering.
Machine Learning POS Tagging
• Lowest level of syntactic analysis
• Useful for Parsing and word sense disambiguation
• Ambiguity in POS tagging
Flies[Noun] like[Verb] flower[Noun]
Time flies[Verb] like[Prep] an arrow.
• Learning: train models on human-annotated corpora such as the Penn Treebank.
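Learning a tagger from annotated text can be illustrated with a most-frequent-tag baseline. The sketch below is only an assumption-laden toy: the three-sentence `CORPUS` is made up for the example (it is not Penn Treebank data), and `train` and `tag` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Tiny hand-annotated corpus (hypothetical); a real system would train on
# a large resource such as the Penn Treebank.
CORPUS = [
    [("Time", "NN"), ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN")],
    [("Flies", "NNS"), ("like", "VBP"), ("a", "DT"), ("flower", "NN")],
    [("He", "PRP"), ("flies", "VBZ"), ("like", "IN"), ("a", "DT"), ("bird", "NN")],
]


def train(corpus):
    """Map each lowercased word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(model, words, default="NN"):
    """Tag each word independently; unseen words get the default tag."""
    return [model.get(w.lower(), default) for w in words]


model = train(CORPUS)
print(tag(model, "Time flies like an arrow".split()))
print(tag(model, "Flies like a flower".split()))
```

Because this baseline ignores context, it tags "Flies like a flower" the same way as "Time flies like an arrow", which is exactly the ambiguity problem noted above.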
POS Tagging
• Classification: a classifier assigns a tag to each word independently
NN VBZ VBP DT NN
Time flies like an arrow.
• Sequence Labeling: tags of words are dependent on the tags of other words in
the sentence, particularly their neighbors
NN VBZ IN DT NN
Time flies like an arrow.
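The contrast between per-word classification and sequence labeling can be sketched with a small hidden Markov model decoded by the Viterbi algorithm. The HMM is a swapped-in technique chosen for illustration, not necessarily the model behind the slides, and every probability in `START`, `TRANS`, and `EMIT` is a made-up assumption.

```python
TAGS = ["NN", "VBZ", "VBP", "IN", "DT"]

# Hypothetical probabilities, hand-set for this example only.
START = {"NN": 0.8, "DT": 0.2}                      # P(first tag)
TRANS = {                                           # P(next tag | previous tag)
    "NN": {"VBZ": 0.7, "VBP": 0.1, "IN": 0.2},
    "VBZ": {"IN": 0.6, "DT": 0.3, "VBP": 0.1},
    "VBP": {"DT": 0.6, "IN": 0.4},
    "IN": {"DT": 0.9, "NN": 0.1},
    "DT": {"NN": 1.0},
}
EMIT = {                                            # P(word | tag)
    "NN": {"time": 0.5, "arrow": 0.5},
    "VBZ": {"flies": 1.0},
    "VBP": {"like": 0.6},
    "IN": {"like": 0.4},
    "DT": {"an": 1.0},
}


def classify_independently(words):
    """Pick, for each word on its own, the tag with the highest emission score."""
    return [max(TAGS, key=lambda t: EMIT[t].get(w, 0.0)) for w in words]


def viterbi(words):
    """Return the most probable tag sequence under the toy HMM."""
    score = [{t: START.get(t, 0.0) * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = []
    for w in words[1:]:
        prev, col, ptr = score[-1], {}, {}
        for t in TAGS:
            # Best previous tag for reaching tag t at this position.
            best = max(TAGS, key=lambda p: prev[p] * TRANS[p].get(t, 0.0))
            ptr[t] = best
            col[t] = prev[best] * TRANS[best].get(t, 0.0) * EMIT[t].get(w, 0.0)
        score.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: score[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


words = "time flies like an arrow".split()
print(classify_independently(words))
print(viterbi(words))
```

With these numbers the independent classifier prefers VBP for "like", while Viterbi chooses IN because a preposition is much more likely than a second verb directly after VBZ.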
Parsing
• Ambiguity
“I saw [the man with the telescope]” vs
“I saw [the man] [with the telescope]”
• Probabilistic Context-Free Grammars (PCFG)
• Structured Prediction
Strings → Trees
• Software tools
Stanford CoreNLP, Apache OpenNLP, NLTK, LingPipe
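To make the attachment ambiguity and the role of a PCFG concrete, here is a small CKY sketch that counts the parses of the telescope sentence and scores the most probable one. The grammar, the lexicon, and all rule probabilities are hypothetical values chosen for the example; the function name `cky` is likewise an assumption.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form: (parent, left, right) -> probability.
# All probabilities are made up for illustration.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("NP", "Det", "N"): 0.5,
    ("PP", "P", "NP"): 1.0,
}
# Lexical rules: word -> {symbol: probability}.
LEXICON = {
    "I": {"NP": 0.3},
    "saw": {"V": 1.0},
    "the": {"Det": 1.0},
    "man": {"N": 0.5},
    "telescope": {"N": 0.5},
    "with": {"P": 1.0},
}


def cky(words):
    """Return (number of S parses, probability of the best S parse)."""
    n = len(words)
    count = defaultdict(int)    # (i, j, symbol) -> number of trees over words[i:j]
    best = defaultdict(float)   # (i, j, symbol) -> probability of the best tree
    for i, w in enumerate(words):
        for sym, p in LEXICON[w].items():
            count[i, i + 1, sym] = 1
            best[i, i + 1, sym] = p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point
                for (parent, left, right), p in BINARY.items():
                    trees = count[i, k, left] * count[k, j, right]
                    if trees == 0:
                        continue
                    count[i, j, parent] += trees
                    cand = p * best[i, k, left] * best[k, j, right]
                    best[i, j, parent] = max(best[i, j, parent], cand)
    return count[0, n, "S"], best[0, n, "S"]


n_parses, p_best = cky("I saw the man with the telescope".split())
print(n_parses, p_best)
```

The count of 2 corresponds to the two attachments of "with the telescope"; under these made-up probabilities the verb-phrase attachment scores higher, and different rule probabilities could reverse that preference.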
References
Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing