Natural Language Processing

This document discusses natural language processing and text mining applications. It covers topics like text clustering, trend analysis, supervised and unsupervised text mining, sentiment analysis, combining structured and text data in predictive models, common NLP tasks including part-of-speech tagging, named entity recognition, and parsing.

Uploaded by Mohamed Adel

Natural Language Processing

Natural Language Understanding

Text Mining Applications – Unsupervised
• Text clustering
• Trend analysis

Trend for the Term “text mining” from Google Trends


Cluster No.   Comments   Key Words
1             1, 3, 4    doctor, staff, friendly, helpful
2             5, 6, 8    treatment, results, time, schedule
3             2, 7       service, clinic, fast

Text Mining Applications – Supervised
– Many typical predictive modeling or classification applications can be enhanced by incorporating textual data in addition to traditional input variables.
  • churn propensity models that include customer center notes, website forms, e-mails, and Twitter messages
  • hospital admission prediction models that incorporate medical record notes as a new source of information
  • insurance fraud modeling using adjuster notes
  • sentiment categorization
  • stylometry or forensic applications that identify the author of a particular writing sample
Sentiment Analysis
• Sentiment analysis deals with the categorization (or classification) of opinions expressed in textual documents.

Figure: green marks positive tone and red marks negative tone; product features and model names are highlighted in blue and brown, respectively.
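
A minimal sketch of sentiment categorization, assuming a tiny hand-made word list. The `POSITIVE` and `NEGATIVE` sets and the `classify_sentiment` helper are hypothetical and only illustrate the idea of mapping a text to a tone; they are not the method behind the figure.

```python
import re

# Hypothetical mini-lexicons, chosen only for illustration.
POSITIVE = {"friendly", "helpful", "fast", "great"}
NEGATIVE = {"slow", "rude", "poor", "late"}

def classify_sentiment(text):
    """Label a text by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("The staff was friendly and helpful."))
print(classify_sentiment("The service was slow and the doctor was rude."))
```

A word-counting approach like this ignores negation ("not helpful") and context, which is one reason practical systems usually train a classifier on labeled examples instead.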

Structured + Text Data in Predictive Models
• Both structured data and text data can be used together when building predictive models.

ROC Chart of Models With and Without Textual Comments


NLP Tasks
• NLP applications typically rely on several analysis steps:
  – Word tokenization
  – Sentence boundary detection
  – Part-of-speech (POS) tagging
    • identifies the part of speech (e.g. noun, verb) of each word
  – Named Entity (NE) recognition
    • identifies proper nouns (e.g. names of persons, locations, organizations) and domain terminology
  – Parsing
    • identifies the syntactic structure of a sentence
  – Semantic analysis
    • derives the meaning of a sentence
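
The first two steps in the list above can be sketched with regular expressions. This is a naive illustration under simple assumptions (sentences end in ".", "!" or "?" and the next sentence starts with a capital letter); the function names `split_sentences` and `tokenize` are made up for this example.

```python
import re

def split_sentences(text):
    """Naive sentence boundary detection.

    Splits on whitespace that follows '.', '!' or '?' and precedes
    an uppercase letter.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    """Naive word tokenization.

    A token is a run of word characters (optionally joined by '-' or "'"),
    or a single punctuation mark.
    """
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

text = "Mary ate the apple. The lead paint is unsafe!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

Abbreviations followed by a capitalized word (for example "U.N. Official") would be split incorrectly by this rule, which is why sentence boundary detection is treated as a task of its own.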

1. Part-Of-Speech (POS) Tagging
• POS tagging assigns a part-of-speech (lexical class) marker to each word in a sentence, and to every sentence in a corpus.

Input: the lead paint is unsafe

Output: the/Det lead/N paint/N is/V unsafe/Adj
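
The input/output pair above can be reproduced with a toy dictionary lookup. This is only a sketch: the `LEXICON` mapping and the `pos_tag` helper are assumptions made for this example, and each word is given exactly one tag.

```python
# Hypothetical one-tag-per-word lexicon covering only the example sentence.
LEXICON = {"the": "Det", "lead": "N", "paint": "N", "is": "V", "unsafe": "Adj"}

def pos_tag(sentence, lexicon=LEXICON, unknown="UNK"):
    """Tag each whitespace-separated word by dictionary lookup."""
    return [(word, lexicon.get(word.lower(), unknown)) for word in sentence.split()]

def format_tagged(tagged):
    """Render (word, tag) pairs in word/Tag notation."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)

tagged = pos_tag("the lead paint is unsafe")
print(format_tagged(tagged))  # the/Det lead/N paint/N is/V unsafe/Adj
```

A lookup table cannot resolve ambiguity: "lead" can also be a verb, so a practical tagger has to use the surrounding words to choose between candidate tags.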

Syntactic Analysis - Grammar
• sentence -> noun_phrase, verb_phrase
• noun_phrase -> proper_noun
• noun_phrase -> determiner, noun
• verb_phrase -> verb, noun_phrase
• proper_noun -> [mary]
• noun -> [apple]
• verb -> [ate]
• determiner -> [the]
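
The grammar above is small enough to turn into a recursive-descent parser, with one function per non-terminal. The sketch below is one possible encoding; the tuple-based tree format and the helper `word` are choices made for this example.

```python
# Terminal rules from the grammar, as sets of words per lexical category.
LEXICON = {
    "proper_noun": {"mary"},
    "noun": {"apple"},
    "verb": {"ate"},
    "determiner": {"the"},
}

def word(category, tokens, i):
    """Match one token of the given category at position i."""
    if i < len(tokens) and tokens[i] in LEXICON[category]:
        return (category, tokens[i]), i + 1
    return None

def noun_phrase(tokens, i):
    # noun_phrase -> proper_noun
    pn = word("proper_noun", tokens, i)
    if pn:
        return ("noun_phrase", pn[0]), pn[1]
    # noun_phrase -> determiner, noun
    det = word("determiner", tokens, i)
    if det:
        noun = word("noun", tokens, det[1])
        if noun:
            return ("noun_phrase", det[0], noun[0]), noun[1]
    return None

def verb_phrase(tokens, i):
    # verb_phrase -> verb, noun_phrase
    verb = word("verb", tokens, i)
    if verb:
        np = noun_phrase(tokens, verb[1])
        if np:
            return ("verb_phrase", verb[0], np[0]), np[1]
    return None

def sentence(tokens):
    # sentence -> noun_phrase, verb_phrase (must consume every token)
    np = noun_phrase(tokens, 0)
    if np:
        vp = verb_phrase(tokens, np[1])
        if vp and vp[1] == len(tokens):
            return ("sentence", np[0], vp[0])
    return None

tree = sentence("mary ate the apple".split())
print(tree)
```

The grammar checks syntax only, so "the apple ate mary" is also accepted; ruling such sentences out is a job for semantic analysis.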
2. Named Entity Recognition (NER)
• NER processes a text and identifies the named entities in each sentence
  – e.g. “U.N. official Ekeus heads for Baghdad.”
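
For the example sentence, a dictionary (gazetteer) lookup is enough to illustrate the expected result. The `GAZETTEER` entries and the labels ORG, PER and LOC are assumptions introduced for this sketch; practical NER systems generalize to unseen names with statistical or neural models rather than fixed lists.

```python
import re

# Hypothetical gazetteer covering only the entities in the example sentence.
GAZETTEER = {"U.N.": "ORG", "Ekeus": "PER", "Baghdad": "LOC"}

def find_entities(text, gazetteer=GAZETTEER):
    """Return (entity, label) pairs in order of appearance in the text."""
    found = []
    for name, label in gazetteer.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    # Sort by character offset so entities come out in reading order.
    return [(name, label) for _, name, label in sorted(found)]

entities = find_entities("U.N. official Ekeus heads for Baghdad.")
print(entities)  # [('U.N.', 'ORG'), ('Ekeus', 'PER'), ('Baghdad', 'LOC')]
```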

Confusion matrix

• True Positive:
You predicted positive and the actual class is positive.
• True Negative:
You predicted negative and the actual class is negative.
• False Positive (Type I error):
You predicted positive but the actual class is negative.
• False Negative (Type II error):
You predicted negative but the actual class is positive.
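
The four cells can be counted directly from paired actual and predicted labels. The sketch below uses made-up labels purely for illustration; `confusion_counts` is a hypothetical helper, with 1 standing for the positive class.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a Type I error (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: correct (TN) or a Type II error (FN).
            counts["TN" if a != positive else "FN"] += 1
    return counts

# Made-up labels for illustration only.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(dict(counts))
```

With these labels the counts are TP = 3, TN = 3, FP = 1 and FN = 1, which together account for all eight examples.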
