

Natural Language Processing (NLP) Unit-I


1. Natural Language Processing – Introduction

 Humans communicate through some form of language, either text or speech.
 To enable interaction between computers and humans, computers need to understand the natural languages humans use.
 Natural language processing is all about making computers learn, understand, analyze, manipulate, and interpret natural (human) languages.
 NLP stands for Natural Language Processing, a field that draws on computer science, linguistics, and artificial intelligence.
 Processing of natural language is required when you want an intelligent system such as a robot to perform as per your instructions, or when you want to hear a decision from a dialogue-based clinical expert system.
 The ability of machines to interpret human language is now at the core of many applications that we use every day: chatbots, email classification and spam filters, search engines, grammar checkers, voice assistants, and social language translators.
 The input and output of an NLP system can be speech or written text.

2. Applications of NLP or Use cases of NLP

1. Sentiment analysis
 Sentiment analysis, also referred to as opinion mining, is an approach to natural
language processing (NLP) that identifies the emotional tone behind a body of text.
 This is a popular way for organizations to determine and categorize opinions about a
product, service or idea.
 Sentiment analysis systems help organizations gather insights into real-time customer
sentiment, customer experience and brand reputation.
 Generally, these tools use text analytics to analyze online sources such as emails, blog
posts, online reviews, news articles, survey responses, case studies, web chats, tweets,
forums and comments.
 Sentiment analysis uses machine learning models to perform text analysis of human
language. The metrics used are designed to detect whether the overall sentiment of a
piece of text is positive, negative or neutral.
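
As a brief illustrative sketch (not part of the original notes), the following Python snippet uses NLTK's VADER analyzer, a rule-based sentiment model, to label a piece of text as positive, negative, or neutral; the example review text is made up.

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of the VADER word lists

sia = SentimentIntensityAnalyzer()
review = "The battery life is great, but the screen is disappointing."
scores = sia.polarity_scores(review)  # neg/neu/pos plus an overall compound score
print(scores)

# A common convention: compound >= 0.05 is positive, <= -0.05 is negative
if scores['compound'] >= 0.05:
    print("positive")
elif scores['compound'] <= -0.05:
    print("negative")
else:
    print("neutral")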
2. Machine Translation
 Machine translation, sometimes referred to by the abbreviation MT, is a sub-field
of computational linguistics that investigates the use of software to translate text or
speech from one language to another.
 On a basic level, MT performs mechanical substitution of words in one language for
words in another, but that alone rarely produces a good translation because
recognition of whole phrases and their closest counterparts in the target language is
needed.
 Not all words in one language have equivalent words in another language, and many
words have more than one meaning.

 Solving this problem with corpus statistical and neural techniques is a rapidly
growing field that is leading to better translations, handling differences in linguistic
typology, translation of idioms, and the isolation of anomalies.
 Corpus: A collection of written texts, especially the entire works of a particular
author.
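
A toy sketch of the mechanical word-for-word substitution described above (the two-entry dictionary is invented), showing why substitution alone rarely produces a good translation:

# Naive word-for-word English-to-Spanish substitution (toy dictionary, illustrative only)
lexicon = {"the": "el", "black": "negro", "cat": "gato"}

def naive_translate(sentence):
    # Replace each word independently; unknown words pass through unchanged
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())

print(naive_translate("The black cat"))
# Output: "el negro gato" -- the word order is wrong; Spanish requires "el gato negro",
# which is why recognition of whole phrases is needed.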

3. Text Extraction
 There are a number of natural language processing techniques that can be
used to extract information from text or unstructured data.
 These techniques can be used to extract information such as entity names,
locations, quantities, and more.
 With the help of natural language processing, computers can make sense
of the vast amount of unstructured text data that is generated every day,
and humans can reap the benefits of having this information readily
available.
 Industries such as healthcare, finance, and e-commerce are already using
natural language processing techniques to extract information and
improve business processes.
 As machine learning technology continues to develop, we will see more and more information extraction use cases covered.
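
As one possible sketch (not from the notes), NLTK's built-in named-entity chunker can pull entity names and locations out of raw text; the sample sentence is invented.

import nltk
for pkg in ('punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'):
    nltk.download(pkg)  # one-time model downloads

text = "Apple opened a new office in London last year."
tokens = nltk.word_tokenize(text)   # split into words
tagged = nltk.pos_tag(tokens)       # assign part-of-speech tags
tree = nltk.ne_chunk(tagged)        # group tagged tokens into named entities

for subtree in tree:
    if hasattr(subtree, 'label'):   # entity chunks are subtrees with a label
        entity = " ".join(word for word, pos in subtree.leaves())
        print(subtree.label(), "->", entity)  # e.g. GPE -> London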

4. Text Classification

 Unstructured text is everywhere: emails, chat conversations, websites, and social media. Nevertheless, it's hard to extract value from this data unless it's organized in a certain way.
 Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups. By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of pre-defined tags or categories based on its content.
 Text classification is becoming an increasingly important part of businesses, as it allows them to easily get insights from data and automate business processes.
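
A minimal sketch of such a classifier (the tiny training set below is invented; scikit-learn is one common choice, though the notes do not prescribe a library):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: short support messages with pre-defined category tags
texts = ["my card was charged twice", "refund has not arrived",
         "the app crashes on startup", "login button does nothing"]
labels = ["billing", "billing", "bug", "bug"]

# Vectorize the text, then fit a Naive Bayes classifier on the vectors
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["app freezes when I open it"]))  # expected: ['bug']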

5. Speech Recognition
 Speech recognition is an interdisciplinary subfield of computer
science and computational linguistics that develops methodologies and technologies
that enable the recognition and translation of spoken language into text by computers.
 It is also known as automatic speech recognition (ASR), computer speech
recognition or speech to text (STT).
 It incorporates knowledge and research in the computer
science, linguistics and computer engineering fields. The reverse process is speech
synthesis.

Speech recognition use cases


 A wide range of industries are utilizing different applications of speech technology today, helping businesses and consumers save time and even lives. Some examples include:
 Automotive: Speech recognizers improve driver safety by enabling voice-activated navigation systems and search capabilities in car radios.
 Technology: Virtual agents are increasingly becoming integrated within our daily lives, particularly on our mobile devices. We use voice commands to access them through our smartphones, such as Google Assistant or Apple’s Siri, for tasks such as voice search, or through our speakers, via Amazon’s Alexa or Microsoft’s Cortana, to play music. They’ll only continue to integrate into the everyday products that we use, fueling the “Internet of Things” movement.
 Healthcare: Doctors and nurses leverage dictation applications to capture and log
patient diagnoses and treatment notes.
 Sales: Speech recognition technology has a couple of applications in sales. It can help
a call center transcribe thousands of phone calls between customers and agents to
identify common call patterns and issues. AI chatbots can also talk to people via a
webpage, answering common queries and solving basic requests without needing to
wait for a contact center agent to be available. In both instances speech recognition
systems help reduce time to resolution for consumer issues.
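
An illustrative sketch of transcribing a recorded call using the third-party SpeechRecognition package (pip install SpeechRecognition); the file name call.wav is hypothetical:

import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a pre-recorded call (hypothetical file) and capture its audio data
with sr.AudioFile("call.wav") as source:
    audio = recognizer.record(source)

try:
    # recognize_google sends the audio to Google's free web API (needs internet)
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio")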
6. Chatbot
 Chatbots are computer programs that conduct automatic conversations with people.
They are mainly used in customer service for information acquisition. As the name
implies, these are bots designed with the purpose of chatting and are also simply
referred to as “bots.”

 You’ll come across chatbots on business websites or messengers that give pre-scripted
replies to your questions. As the entire process is automated, bots can provide quick
assistance 24/7 without human intervention.
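
A minimal sketch of the pre-scripted behavior described above (keywords and replies are invented):

# Toy rule-based chatbot: matches keywords to canned replies (illustrative only)
REPLIES = {
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "refund": "Refunds are processed within 5 business days.",
}

def bot_reply(message):
    # Return the first canned reply whose keyword appears in the message
    for keyword, reply in REPLIES.items():
        if keyword in message.lower():
            return reply
    return "Sorry, I didn't catch that. A human agent will follow up."

print(bot_reply("What are your opening hours?"))  # matches "hours"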

7. Email Filter
 One of the most fundamental and essential applications of NLP online is email filtering. It began with spam filters, which identified specific words or phrases that indicate a spam message. But, like other early NLP applications, filtering has since improved.
 Gmail's email categorization is one of the more common, newer implementations of NLP. Based on the contents of emails, the algorithm determines whether they belong in one of three categories (Primary, Social, or Promotions).
 This keeps the inbox manageable for all Gmail users, surfacing the critical, relevant emails you want to see and reply to quickly.
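
A toy sketch of the early keyword-based spam filtering described above (the phrase list is invented):

# Keyword-based spam filter of the early kind described above (illustrative only)
SPAM_PHRASES = ["free money", "act now", "you are a winner", "no cost"]

def is_spam(email_body):
    body = email_body.lower()
    # Flag the message if any known spam phrase appears in it
    return any(phrase in body for phrase in SPAM_PHRASES)

print(is_spam("Congratulations, you are a winner! Claim your free money."))  # True
print(is_spam("Meeting moved to 3pm tomorrow."))                             # False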
8. Search Autocorrect and Autocomplete
 When you type 2-3 letters into Google to search for anything, it displays a list of
probable search keywords. Alternatively, if you search for anything with mistakes, it
corrects them for you while still returning relevant results. Isn't it incredible?

 Everyone uses Google search autocorrect and autocomplete on a regular basis but seldom gives them any thought. They are a fantastic illustration of how natural language processing touches millions of people across the world, including you and me.
 Both search autocomplete and autocorrect make it much easier to locate accurate results.
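
A simple sketch of prefix-based autocomplete over a tiny, invented query log:

import bisect

# Toy query log, kept sorted so prefix matches are contiguous
QUERIES = sorted(["naive bayes classifier", "natural gas prices",
                  "natural language processing", "nature documentaries"])

def autocomplete(prefix, limit=3):
    # Binary-search to the first query >= prefix, then collect matches in order
    start = bisect.bisect_left(QUERIES, prefix)
    results = []
    for query in QUERIES[start:]:
        if not query.startswith(prefix) or len(results) == limit:
            break
        results.append(query)
    return results

print(autocomplete("natu"))
# ['natural gas prices', 'natural language processing', 'nature documentaries']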
3. Components of NLP
 There are two components of NLP: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
 Natural Language Understanding (NLU) involves transforming human language into a machine-readable format. It helps the machine to understand and analyze human language by extracting elements from large bodies of text, such as keywords, emotions, relations, and semantics.
 Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural language representation.
 NLG mainly involves text planning, sentence planning, and text realization.
 NLU is harder than NLG.

4. Steps in NLP
There are five general steps:
1. Lexical Analysis
2. Syntactic Analysis (Parsing)
3. Semantic Analysis
4. Discourse Integration
5. Pragmatic Analysis

Lexical Analysis:
 The first phase of NLP is lexical analysis.
 This phase scans the source text as a stream of characters and converts it into meaningful lexemes.
 It divides the whole text into paragraphs, sentences, and words.
 Lexeme: A lexeme is a basic unit of meaning. In linguistics, the abstract unit of morphological analysis that corresponds to a set of forms taken by a single word is called a lexeme.
 The way in which a lexeme is used in a sentence is determined by its grammatical category.

 A lexeme can be an individual word or a multiword expression.
 For example, the word talk is an individual-word lexeme, which may have many grammatical variants such as talks, talked, and talking.
 A multiword lexeme can be made up of more than one orthographic word. For example, speak up and pull through are multiword lexemes.
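
A small sketch of this phase using NLTK's tokenizers (the sample text is invented):

import nltk
nltk.download('punkt')  # one-time download of the sentence tokenizer models

text = "NLP is fun. It lets computers read human language."
sentences = nltk.sent_tokenize(text)                # split the text into sentences
words = [nltk.word_tokenize(s) for s in sentences]  # split each sentence into words

print(sentences)  # ['NLP is fun.', 'It lets computers read human language.']
print(words)      # [['NLP', 'is', 'fun', '.'], ['It', 'lets', ...]]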

Syntax Analysis (Parsing)


 Syntactic analysis is used to check grammar and word arrangement, and it shows the relationship among the words.
 A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.
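
A sketch of this idea with a toy context-free grammar in NLTK (the grammar is invented for illustration): the parser finds a tree for the well-formed word order and no tree for the ill-formed one.

import nltk

# Toy grammar: every noun phrase needs a determiner
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V PP
PP -> P NP
Det -> 'the'
N -> 'boy' | 'school'
V -> 'goes'
P -> 'to'
""")

parser = nltk.ChartParser(grammar)

good = "the boy goes to the school".split()
bad = "the school goes to boy".split()   # 'boy' lacks a determiner

print(len(list(parser.parse(good))))  # 1 -- one valid parse tree
print(len(list(parser.parse(bad))))   # 0 -- rejected by the grammar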

Semantic Analysis
 Semantic analysis is concerned with meaning representation.
 It mainly focuses on the literal meaning of words, phrases, and sentences.
 The semantic analyzer disregards sentences such as “hot ice-cream”.
 Another example: “Manhattan calls out to Dave” passes syntactic analysis because it is a grammatically correct sentence. However, it fails semantic analysis: because Manhattan is a place (and can’t literally call out to people), the sentence’s meaning doesn’t make sense.

Discourse Integration
 Discourse integration depends upon the sentences that precede a given sentence and also invokes the meaning of the sentences that follow it.
 For instance, if one sentence reads, “Manhattan speaks to all its people,” and the following sentence reads, “It calls out to Dave,” discourse integration checks the first sentence for context to understand that “It” in the latter sentence refers to Manhattan.

Pragmatic Analysis
 During this phase, what was said is re-interpreted as what it actually meant.
 It involves deriving those aspects of language which require real-world knowledge.
 For instance, a pragmatic analysis can uncover the intended meaning of “Manhattan speaks to all its people.” Methods like neural networks assess the context to understand that the sentence isn’t literal, and most people won’t interpret it as such. A pragmatic analysis deduces that this sentence is a metaphor for how people emotionally connect with a place.

Note: for details on the morphemes topic, refer to the notes.



Issues and challenges in morphological modeling and parsing in natural language processing (NLP).

1. Ambiguity in Language and Syncretism

Ambiguity arises when a linguistic expression can have multiple interpretations. This is a fundamental challenge in NLP and computational linguistics. The main cases are:

 Accidental Ambiguity: This happens when a word or phrase has multiple possible meanings depending on the context. For example, the word bank can refer to a financial institution or the side of a river.

 Ambiguity Due to Lexemes Having Multiple Senses: Some words can have multiple meanings even within a single grammatical category. For example, light can mean "not heavy" or "illumination."

 Syncretism (Systematic Ambiguity): This refers to cases where different grammatical categories share the same word form, making interpretation more difficult. For example, in English, the word deer remains the same in both singular and plural forms. In many languages, the same word form can be used for different grammatical cases, making disambiguation complex for NLP models.

2. Productivity and Creativity in Language

 Language is Dynamic: New words are constantly created due to cultural, technological, and
societal changes (e.g., "selfie" or "metaverse").

 Morphological Systems Can’t Always Handle New Words: Most NLP models rely on a
predefined lexicon or a set of grammatical rules. When they encounter a new word (a
neologism) or an unfamiliar usage, they may fail to parse or process it correctly.

 The Unknown Word Problem: Words that are not present in the model’s vocabulary remain
unprocessed. This problem is especially severe in:

o Speech or writing that includes domain-specific terminology.

o Conversations that mix multiple languages (code-switching).

o Cases where foreign names or new slang words are used.

 Example: If an NLP model trained only on standard English encounters a new internet slang
term like yeet, it may fail to understand or process it correctly.

3. Irregularity in Morphological Parsing

 Morphological Parsing: This refers to breaking down words into their smallest meaningful
units (morphemes). For example, the word unhappiness can be split into un- (prefix), happy
(root), and -ness (suffix).

 Generalization and Abstraction:

o NLP systems aim to create broad rules that can apply to many words.

o However, language has many exceptions (irregular forms), making this difficult.

 Challenges of Irregularity:

o Some words don’t follow standard rules (e.g., go → went instead of goed).

o Some words have multiple possible segmentations, leading to ambiguity.

o Some descriptions of linguistic data may be inaccurate or overly complex.

 Example:

o Regular English past tense: walk → walked (follows rule)

o Irregular English past tense: run → ran (does not follow rule)
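
A short sketch of this contrast using NLTK (one possible toolchain; the notes do not prescribe one): a rule-based stemmer handles the regular form but not the irregular one, while a dictionary-backed lemmatizer recovers it.

import nltk
nltk.download('wordnet')  # lexicon used by the lemmatizer (one-time)
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Regular past tense: the suffix-stripping rule works
print(stemmer.stem("walked"))                 # 'walk'

# Irregular past tense: there is no suffix to strip, so the rule fails
print(stemmer.stem("ran"))                    # 'ran' (cannot recover 'run')

# A lemmatizer with an exception dictionary handles the irregular form
print(lemmatizer.lemmatize("ran", pos="v"))   # 'run'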

Topic Boundary Detection:

Topic boundary detection is another important subtask of finding the structure of documents in NLP. It involves identifying the points in a document where the topic or theme of the text shifts. This task is particularly useful for organizing and summarizing large amounts of text, as it allows for the identification of different topics or subtopics within a document.

Topic boundary detection is a challenging task, as it involves understanding the underlying semantic structure and meaning of the text, rather than simply identifying specific markers or patterns. As such, several methods and techniques have been developed to address this challenge, including:

1. Lexical cohesion: This method looks at the patterns of words and phrases that appear in a text, and identifies changes in the frequency or distribution of these patterns as potential topic boundaries. For example, if the frequency of a particular keyword or phrase drops off sharply after a certain point in the text, this could indicate a shift in topic (see the sketch after this list).

2. Discourse markers: This method looks at the use of discourse markers, such as "however", "in contrast", and "furthermore", which are often used to signal a change in topic or subtopic. By identifying these markers in a text, it is possible to locate potential topic boundaries.

3. Machine learning: This method involves training a machine learning model to identify patterns and features in a text that are associated with topic boundaries. This can involve using a variety of linguistic and contextual features, such as sentence length, word frequency, and part-of-speech tags, to identify potential topic boundaries.
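
A toy sketch of the lexical cohesion idea from method 1 (the sentences, stopword list, and 0.1 threshold are all invented for illustration): compare adjacent sentences by word overlap and flag sharp dips in similarity as candidate boundaries.

from collections import Counter
import math

STOPWORDS = {"the", "a", "is", "on", "in", "this", "when"}

def vectorize(sentence):
    # Bag-of-words vector over content words only
    words = [w for w in sentence.lower().replace(".", "").split() if w not in STOPWORDS]
    return Counter(words)

def cosine(a, b):
    # Cosine similarity between two word-count vectors
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

sentences = [
    "The cat purrs when the cat is happy.",
    "The cat chases mice around the house.",
    "The cat sleeps on the mat.",
    "The market fell sharply today.",
    "Investors sold shares in the market.",
    "The market closed lower this week.",
]

vectors = [vectorize(s) for s in sentences]
for i in range(len(vectors) - 1):
    sim = cosine(vectors[i], vectors[i + 1])
    boundary = "  <-- possible topic boundary" if sim < 0.1 else ""
    print(f"gap {i}-{i + 1}: similarity {sim:.2f}{boundary}")
# Only the gap between sentence 2 (cats) and sentence 3 (markets) is flagged.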

METHODS:

There are several methods and techniques used in NLP to find the structure of documents, which include:

1. Sentence boundary detection: This involves identifying the boundaries between sentences in a document, which is important for tasks like parsing, machine translation, and text-to-speech synthesis.

2. Part-of-speech tagging: This involves assigning a part of speech (noun, verb, adjective, etc.) to each word in a sentence, which is useful for tasks like parsing, information extraction, and sentiment analysis.

3. Named entity recognition: This involves identifying and classifying named entities (such as people, organizations, and locations) in a document, which is important for tasks like information extraction and text categorization.

4. Coreference resolution: This involves identifying all the expressions in a text that refer to the same entity, which is important for tasks like information extraction and machine translation.

5. Topic boundary detection: This involves identifying the points in a document where the topic or theme of the text shifts, which is useful for organizing and summarizing large amounts of text.

6. Parsing: This involves analyzing the grammatical structure of sentences in a document, which is important for tasks like machine translation, text-to-speech synthesis, and information extraction.

7. Sentiment analysis: This involves identifying the sentiment (positive, negative, or neutral) expressed in a document, which is useful for tasks like brand monitoring, customer feedback analysis, and market research.

Generative Sequence Classification Methods:

The most commonly used generative sequence classification method for topic and sentence boundary detection is the hidden Markov model (HMM).

An HMM has five components:

1. States (hidden states): the things we want to predict (e.g., POS tags: Noun, Verb, Adjective).
2. Observations (visible outputs): the actual words we see in a sentence (e.g., "dogs", "run", "quickly").
3. Transition probabilities (A): the probability of moving from one hidden state to another (e.g., P(Noun → Verb)).
4. Emission probabilities (B): the probability of a word being generated from a state (e.g., P("run" | Verb)).
5. Initial probabilities (π): the probability of starting in a particular state.

HMM for POS tagging: training on a small dataset and testing on a new sentence.

Training data: “The cat runs”, “A dog barks”

Testing data: “A dog runs”

from nltk.tag import hmm

# Tiny hand-tagged training corpus: each sentence is a list of (word, tag) pairs
train_data = [[('The', 'DET'), ('cat', 'NOUN'), ('runs', 'VERB')],
              [('A', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')]]

# Estimate initial, transition, and emission probabilities from the tagged data
trainer = hmm.HiddenMarkovModelTrainer()
hmm_model = trainer.train(train_data)

# Tag the unseen test sentence; every word here was seen during training
test_sentence = ["A", "dog", "runs"]
predicted_tags = hmm_model.tag(test_sentence)
print(predicted_tags)  # expected: [('A', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB')]

Bayes rule:
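
Bayes' rule relates a conditional probability to its inverse (a standard statement, added here for completeness):

P(A | B) = P(B | A) · P(A) / P(B)

In HMM tagging, Bayes' rule underlies choosing the tag sequence T that maximizes P(T | W) for an observed word sequence W: since P(T | W) ∝ P(W | T) · P(T), the emission probabilities supply P(W | T) and the initial and transition probabilities supply P(T), and the tagger picks the T that maximizes this product.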