Natural Language Processing CS 1462: Some Slides Borrowed From Carl Sable
CS 1462
Introduction
Computer Science
3rd semester - 1444
Dr. Fahman Saeed
[email protected]
Some slides borrowed from Carl Sable
What is NLP?
Some Applications of NLP
Information retrieval
Text categorization
Grammar checking
Automatic machine translation
Question-answering
Automatic summarization
Fields that Contributed to NLP
Wordforms
A question mark ('?') can be used to match the preceding character or RE, or
nothing (i.e., zero or one instance of the preceding character or RE)
For example, /woodchucks?/ matches the singular or plural of the word
"woodchuck"
The Kleene star ('*') indicates zero or more occurrences of the previous character
or RE
The Kleene + ('+') means one or more of the previous character or RE
As an example of why these operators are useful, suppose we want to describe all
strings that represent sheep sounds
In other words, we want to describe the language consisting of the strings "baa!",
"baaa!", "baaaa!", "baaaaa!", etc.
Two regular expressions defining this language are /baaa*!/ and /baa+!/
This language is an example of a formal language, which is a set of strings
adhering to specific rules
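The '?', '*', and '+' operators above can be tried out with Python's standard `re` module. This is a minimal sketch; the variable names are chosen only for illustration, and `fullmatch` is used so that a string counts only if the whole string belongs to the language.

```python
import re

# /woodchucks?/ : the final "s" is optional (zero or one occurrence)
woodchuck = re.compile(r"woodchucks?")

# Two equivalent patterns for the sheep language "baa!", "baaa!", ...
sheep_star = re.compile(r"baaa*!")  # "baa", then zero or more "a", then "!"
sheep_plus = re.compile(r"baa+!")   # "ba", then one or more "a", then "!"

for word in ["woodchuck", "woodchucks"]:
    print(word, bool(woodchuck.fullmatch(word)))

# Both sheep patterns should agree on every candidate string
for s in ["ba!", "baa!", "baaaa!"]:
    print(s, bool(sheep_star.fullmatch(s)), bool(sheep_plus.fullmatch(s)))
```

Both sheep patterns require at least two "a" characters, so "ba!" is rejected by each of them.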
One very important special character is the period ('.'); this is a wildcard expression
that matches any single character (except an end-of-line character)
For example, to find a line in which the word "aardvark" appears twice, you can use
/aardvark.*aardvark/
To match an actual period, you can use "\." within an RE
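The wildcard and the escaped period can be checked the same way. The sketch below uses Python's `re` module; the sample lines are invented for illustration.

```python
import re

# /aardvark.*aardvark/ : "aardvark", any characters on the same line, "aardvark"
twice = re.compile(r"aardvark.*aardvark")

lines = [
    "an aardvark met another aardvark",
    "only one aardvark here",
]
matches = [line for line in lines if twice.search(line)]
print(matches)

# By default '.' does not match a newline, so the two words
# must appear on the same line for the pattern to match
across_lines = twice.search("aardvark\naardvark")
print(across_lines)

# "\." matches a literal period rather than any character
literal_dot = re.compile(r"end\.")
print(bool(literal_dot.search("the end.")), bool(literal_dot.search("the ends")))
```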
Anchors
Book: "Morphology is the study of the way words are built up from
smaller meaning-bearing units called morphemes."
The current edition of the textbook very briefly discusses morphological
parsing, which is necessary for "sophisticated methods for lemmatization"
Recall that in practice, a wordform can instead be looked up in an
appropriate resource to retrieve the lemma
WordNet is an example of such a resource that was very popular in
conventional NLP
Keeping such resources current (e.g., by adding new words) involves a lot of
manual effort
Also recall that we can avoid lemmatization altogether by applying
stemming, which is much simpler (but doesn't always work as well)
The current edition of the textbook dropped most of its discussion of
morphology
We will discuss it in more detail than the book (but significantly less than I
used to); some of this content comes from the previous edition of the
textbook
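To make the contrast between lemma lookup and stemming concrete, here is a minimal sketch. The lookup table is a hypothetical hand-built stand-in for a resource such as WordNet, and `crude_stem` is a naive suffix stripper written for illustration; it is not the Porter stemmer or any other published algorithm.

```python
# Hypothetical hand-built lemma table; a real system would consult a
# resource such as WordNet instead.
LEMMA_TABLE = {
    "geese": "goose",
    "went": "go",
    "bunnies": "bunny",
    "running": "run",
}

# Suffixes tried in order by the naive stemmer (an illustrative choice)
SUFFIXES = ("ing", "ies", "es", "ed", "s")

def lookup_lemma(wordform):
    """Return the lemma from the table, or the lowercased word if unknown."""
    word = wordform.lower()
    return LEMMA_TABLE.get(word, word)

def crude_stem(wordform):
    """Strip the first matching suffix, keeping at least three characters."""
    word = wordform.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["geese", "bunnies", "running", "went"]:
    print(w, lookup_lemma(w), crude_stem(w))
```

The lookup returns true lemmas but only for words someone has entered by hand, while the stemmer needs no resource but produces non-words such as "bunn" and cannot relate "went" to "go".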
Rules of Morphology
Orthographic rules are general rules that deal with spelling and tell us how
to transform words; some examples are:
To pluralize a noun ending in a consonant followed by "y", change the "y" to an "i" and add "es" (e.g., "bunnies")
A single consonant letter is often doubled before adding "-ing" or "-ed" suffixes (e.g.,
"begging", "begged")
A "c" is often changed to "ck" when adding "-ing" and "-ed" (e.g., "picnicking", "picnicked")
Morphological rules deal with exceptions; e.g., "fish" is its own plural,
"goose" becomes "geese"
Morphological parsing uses both types of rules in order to break down a
word into its component morphemes
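The two kinds of rules can be sketched in the generation direction (building a wordform from a stem). This is a rough illustration covering only the examples above: the exception table and function names are hypothetical, and since the doubling rule applies only "often", the code over-applies it to some verbs.

```python
VOWELS = "aeiou"

# Exceptions handled by lookup (the "morphological rules" above)
IRREGULAR_PLURALS = {"fish": "fish", "goose": "geese"}

def pluralize(noun):
    """Apply the exception table first, then the orthographic "y" rule."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # consonant + "y" -> "ies": bunny -> bunnies (but day -> days)
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"

def add_ing(verb):
    """Attach "-ing", applying the "c" -> "ck" and doubling rules."""
    # "c" -> "ck": picnic -> picnicking
    if verb.endswith("c"):
        return verb + "king"
    # Double a final consonant that follows a single vowel: beg -> begging
    if (len(verb) >= 3 and verb[-1] not in VOWELS + "wxy"
            and verb[-2] in VOWELS and verb[-3] not in VOWELS):
        return verb + verb[-1] + "ing"
    return verb + "ing"

for noun in ["bunny", "day", "fish", "goose"]:
    print(noun, pluralize(noun))
for verb in ["beg", "picnic", "go", "read"]:
    print(verb, add_ing(verb))
```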
A morpheme is the smallest part of a word that carries meaning
For example, given the wordform "going", the parsed form can be
represented as: "VERB-go + GERUND-ing"
Conventionally, morphological parsing sometimes played an important role
for POS tagging
For morphologically complex languages (we'll discuss an example later), it
can also play an important role for web search
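A toy parser that produces output in the "VERB-go + GERUND-ing" style might look like the sketch below. The mini-lexicon, the suffix labels other than GERUND, and the function names are all assumptions made for illustration; conventional morphological parsers are far more elaborate.

```python
# Hypothetical mini-lexicon of stems and their parts of speech
STEMS = {"go": "VERB", "beg": "VERB", "walk": "VERB", "cat": "NOUN"}

# Suffix -> (label, POS of the stems it attaches to); labels are illustrative
SUFFIXES = {
    "ing": ("GERUND", "VERB"),
    "ed": ("PAST", "VERB"),
    "s": ("PLURAL", "NOUN"),
}

def candidate_stems(base):
    """Yield the base itself, then the base with consonant doubling undone."""
    yield base
    if len(base) >= 2 and base[-1] == base[-2]:
        yield base[:-1]  # "begg" -> "beg"

def parse(wordform):
    """Return a parse string, or None if the word cannot be analyzed."""
    word = wordform.lower()
    if word in STEMS:
        return f"{STEMS[word]}-{word}"
    for suffix, (label, pos) in SUFFIXES.items():
        if word.endswith(suffix):
            for stem in candidate_stems(word[: -len(suffix)]):
                if STEMS.get(stem) == pos:
                    return f"{pos}-{stem} + {label}-{suffix}"
    return None

for w in ["going", "begged", "cats", "go"]:
    print(w, "->", parse(w))
```

Undoing the consonant doubling in `candidate_stems` is one place where an orthographic rule is applied in reverse during parsing.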
Stems and Affixes
Combining Morphemes
Parts of speech (POS) are categories for words that indicate their syntactic
functions
Parts of speech are also known as word classes, lexical tags, or syntactic
categories
An early grammar of Greek, attributed to Dionysius Thrax of Alexandria (c. 100 B.C.),
described eight parts of speech: noun, verb, pronoun, preposition, adverb,
conjunction, participle, and article
Uses of POS
Knowing the POS of a word gives you information about its neighbors
Examples: possessive pronouns (e.g., "my", "her", "its") are likely to be followed by
nouns, while personal pronouns (e.g., "I", "you", "he") are likely to be followed by
verbs
POS can tell us about how a word is pronounced (e.g., "content" as a noun or an
adjective)
POS can also be useful for applications such as parsing, named entity recognition,
and coreference resolution
Corpora that have been marked with parts of speech are useful for linguistic
research
Part-of-speech tagging (a.k.a. POS tagging or sometimes just tagging) is the
automatic assignment of POS to words
POS tagging is often an important first step before several other NLP applications
can be applied
We will discuss the use of hidden Markov models (HMMs) for POS tagging later in
the topic
There are also deep learning approaches for POS tagging, which tend to perform a
bit better (we'll learn about such methods later in the course)
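As a point of reference before HMMs are introduced, here is a minimal sketch of a most-frequent-tag baseline tagger. This is not the HMM approach; the tiny tagged corpus and the tag names (PRON, POSS, VERB, NOUN) are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for illustration only
TAGGED_SENTENCES = [
    [("I", "PRON"), ("read", "VERB"), ("her", "POSS"), ("book", "NOUN")],
    [("you", "PRON"), ("book", "VERB"), ("my", "POSS"), ("flight", "NOUN")],
    [("he", "PRON"), ("likes", "VERB"), ("my", "POSS"), ("book", "NOUN")],
]

def train(tagged_sentences):
    """Map each word to the tag it received most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="NOUN"):
    """Tag each word independently; unknown words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]

model = train(TAGGED_SENTENCES)
print(tag(["I", "book", "my", "flight"], model))
```

Because each word is tagged in isolation, "book" is labeled NOUN even after the personal pronoun "I", where a verb is more likely. Using the neighboring words, as described above, is what sequence models such as HMMs add.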
Named-Entity Recognition
Example
A Short Introduction to Arabic Natural Language Processing
Prof. Nizar Habash
Wishing you a fruitful educational experience