ARTIFICIAL INTELLIGENCE
Unit 6: Natural Language Processing (NLP)
1. Natural Language Processing: Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to process human language in the form of text or voice data, 'understand' its full meaning and mimic human conversation.
2. Sentiment Analysis: Sentiment Analysis refers to the use of AI-based linguistic analysis to detect emotional and language tones in written text or in speech converted to text.
3. Text Classification: Text Classification is the process of understanding, analysing and categorising
unstructured text into organised groups using NLP and other AI technologies based on predetermined
tags and categories.
4. Virtual Assistants: Virtual Assistants are NLP-based programs that are automated to communicate in a human voice, mimicking human interaction to help ease your day-to-day tasks, such as showing weather reports, creating reminders, making shopping lists etc. Examples include Siri, Cortana, Alexa and Google Assistant.
7. Chatbots: Chatbots are essentially software applications that use AI and NLP to assist humans and communicate through text or voice.
i. Script Chatbots: Script chatbots work around a pre-programmed script and can respond only to the limited set of queries they have been scripted for.
ii. Smart Chatbots: Smart chatbots are based on AI. These bots don't have pre-programmed answers; they learn with time, catching keywords and putting them in context, and help users arrive at the most relevant answers to their queries.
Script Chatbot vs Smart Chatbot:
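To make the contrast concrete, here is a minimal sketch in Python (the questions, replies and keyword list are made up purely for illustration, and the keyword-matching bot is only a toy stand-in for a real AI-based smart chatbot): a script bot can answer only the exact questions it was programmed for, while the keyword-matching bot catches keywords anywhere in the user's message and answers from context.

# Minimal illustration of the two chatbot styles; all questions and replies are hypothetical.

# Script bot: replies only to the exact questions it was programmed for.
SCRIPT = {
    "what are your working hours?": "We are open from 9 am to 6 pm, Monday to Saturday.",
    "where are you located?": "We are located at the City Centre Mall.",
}

def script_bot(message: str) -> str:
    # Look up the exact message; anything unscripted gets a fallback reply.
    return SCRIPT.get(message.lower().strip(),
                      "Sorry, I can only answer the questions I was scripted for.")

# Keyword-matching bot: a very simplified stand-in for a smart (AI-based) chatbot.
KEYWORDS = {
    "hours": "We are open from 9 am to 6 pm, Monday to Saturday.",
    "located": "We are located at the City Centre Mall.",
    "price": "Prices start at Rs. 499.",
}

def keyword_bot(message: str) -> str:
    # Catch keywords anywhere in the message and answer from their context.
    for keyword, reply in KEYWORDS.items():
        if keyword in message.lower():
            return reply
    return "Could you rephrase that?"

print(script_bot("Where are you located?"))               # scripted question -> scripted answer
print(script_bot("Hey, where exactly are you?"))          # unscripted wording -> fallback
print(keyword_bot("Hey, where exactly are you located?")) # keyword match -> relevant answer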
Humans communicate through language, which we process all the time. As a person speaks, the sound travels and enters the listener's eardrum. This sound is then converted into neuron impulses and transported to the brain for processing. After processing, the brain gains an understanding of the meaning of the sound.
The computer understands the language of numbers. Everything that is sent to the machine has to be converted to numbers, and if a single mistake is made while typing, the computer throws an error and does not process that part. The communication made by machines is very basic and simple.
1. Arrangement of the words and meaning: There are rules in human language which provide structure to a language. There are nouns, verbs, adverbs and adjectives, and a word can be a noun at one time and an adjective at another.
2. Multiple meanings of a word: In natural language, a word can have multiple meanings, and the meaning that fits is decided by the context of the statement.
3. Perfect Syntax, no Meaning: Sometimes, a statement can have a perfectly correct syntax but still not mean anything. For example, take a look at this statement: 'Chickens feed extravagantly while the moon drinks tea.' This statement is grammatically correct but does not make any sense.
How does NLP make it possible for machines to understand and speak just like humans?
We all know that the language of computers is numerical, so the very first step that comes to mind is to convert our language to numbers. This conversion happens in various steps, which are given below.
1. Text Normalisation: In Text Normalisation, we undergo several steps to normalise the text to a lower level. Text Normalisation helps in cleaning up the textual data in such a way that it comes down to a level where its complexity is lower than that of the actual data. The steps of Text Normalisation are listed below (a small code sketch of the whole pipeline follows this list):
a) Sentence Segmentation: Sentence Segmentation is the process of dividing the whole text into smaller
components, i.e., individual sentences. This is done to understand the thought or idea of each
individual sentence. For example:
b) Tokenisation: After segmenting the sentences, each sentence is then further divided into tokens. A token is a term used for any word, number or special character occurring in a sentence. Under tokenisation, every word, number and special character is considered separately, and each of them is now a separate token. For example:
c) Removing Stopwords, Special Characters and Numbers: The function of this step is to remove the words that are unimportant to the overall meaning and retain the important words in the text. For example, consider the following two sentences, which convey the same meaning to the computer.
This step, along with removing the stopwords, also removes the redundant special characters and numbers, which do not contribute to the overall meaning.
d) Converting text to a common case: After stopword removal, we convert the whole text into the same case, preferably lower case. This ensures that, if a machine is case sensitive, the overall result is not affected by the same word written in different cases being treated as two different words.
e) Stemming: In this step, the remaining words are reduced to their root words. In other words,
stemming is the process in which the affixes of words are removed and the words are converted to
their base form. Note that in stemming, the stemmed words might not be meaningful.
f) Lemmatization: Stemming and lemmatization are alternative processes to each other, as the role of both processes is the same – removal of affixes. The difference between them is that in lemmatization, the word we get after affix removal (also known as the lemma) is a meaningful one.
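Putting these steps together, here is a minimal sketch of the whole normalisation pipeline in plain Python (the sample paragraph, the stopword list and the crude suffix-stripping 'stemmer' are simplified assumptions made for illustration; real projects would typically rely on an NLP library for proper tokenisation, stopword lists, stemming and lemmatization):

import re

# Hypothetical input text, stopword list and suffix list, purely for illustration.
TEXT = "Aman was stressed. He talked to his friends and his friends suggested going to a therapist!"
STOPWORDS = {"a", "an", "and", "are", "he", "his", "is", "the", "to", "was"}
SUFFIXES = ("ing", "ed", "es", "s")   # very crude stemming rules

# a) Sentence Segmentation: divide the whole text into individual sentences.
sentences = [s for s in re.split(r"[.!?]\s*", TEXT) if s]

# b) Tokenisation: split each sentence into word / number / special-character tokens.
tokens = [tok for s in sentences for tok in re.findall(r"\w+|[^\w\s]", s)]

# c) Removing stopwords, special characters and numbers.
tokens = [t for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]

# d) Converting text to a common (lower) case.
tokens = [t.lower() for t in tokens]

# e) Stemming: chop off common affixes (the result may not be a meaningful word).
def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

stemmed = [stem(t) for t in tokens]

# f) Lemmatization would instead map each word to its dictionary form (lemma),
#    which normally needs a lexical resource, so it is not re-implemented here.

print(sentences)   # segmented sentences
print(tokens)      # cleaned, lower-cased tokens
print(stemmed)     # stemmed tokens, e.g. 'suggested' -> 'suggest', 'stressed' -> 'stress'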
2. Bag of Words: Bag of Words is a Natural Language Processing model which helps in extracting features out of the text that can be used by machine learning algorithms. In Bag of Words, we get the occurrences of each word and construct the vocabulary for the text.
Step 1: Text Normalisation: In the first step, the data is to be collected and then pre-processed. For example, we have
the following text available:
Step 2: Create Dictionary: Now we can make a list of all of the words in our model vocabulary with the unique words
left after pre-processing of all the documents. So, the unique words in our Dictionary for the text will be:
Step 3: Create document vector: In this step, the vocabulary is written in the top row. Now, for each word in the
document, if it matches with the vocabulary, put a 1 under it. If the same word appears again, increment the previous
value by 1. And if the word does not occur in that document, put a 0 under it.
Since the first document contains the words aman, and, anil, are, stressed, all these words get a value of 1 and the rest of the words get a value of 0.
Step 4: Repeat for all documents: The same exercise has to be done for all the documents. Hence, the table becomes:
In this table, the header row contains the vocabulary of the corpus and three rows correspond to three different
documents.
Finally, this gives us the document vector table for our corpus. But the tokens have still not been converted to numbers. This leads us to the final step of our algorithm: TFIDF.
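As a concrete sketch of Steps 2 to 4, the snippet below builds the dictionary and the document vectors for a tiny corpus. Only the first document ('aman and anil are stressed') is taken from the example above; the other two documents are hypothetical stand-ins, since the original corpus is not reproduced here.

# Bag of Words sketch: build the vocabulary and document vectors for a tiny corpus.
# Document 1 matches the example in the text; documents 2 and 3 are hypothetical.
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]

# Step 2: Create Dictionary - collect every unique word across all documents.
vocabulary = []
for doc in documents:
    for word in doc.split():
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3 and 4: Create a document vector for each document -
# the count of each vocabulary word in that document (0 if it does not occur).
document_vectors = [[doc.split().count(word) for word in vocabulary] for doc in documents]

print(vocabulary)
for vector in document_vectors:
    print(vector)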
3. TFIDF: Term Frequency & Inverse Document Frequency: TFIDF stands for Term Frequency and Inverse Document Frequency. TFIDF helps us in identifying the value of each word. Let us understand each term one by one.
Term Frequency: Term frequency is the frequency of a word in one document. Term frequency can easily be found from the document vector table, as that table lists the frequency of each vocabulary word in each document.
Inverse Document Frequency: To understand Inverse Document Frequency, let us first understand what document frequency means. Document Frequency is the number of documents in which a word occurs, irrespective of how many times it has occurred in those documents. The document frequency for the exemplar vocabulary would be:
For inverse document frequency, we put the document frequency in the denominator while the total number of documents is the numerator. Here, the total number of documents is 3, hence the inverse document frequency becomes:
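Continuing the same small example corpus used in the Bag of Words sketch above (documents 2 and 3 are hypothetical), the snippet below computes document frequency, inverse document frequency and the final TFIDF values; the common formulation TFIDF = term frequency × log(total documents / document frequency) is assumed here, with the log taken to base 10.

import math

# Same tiny corpus as in the Bag of Words sketch (documents 2 and 3 are made up).
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]
tokenised = [doc.split() for doc in documents]
vocabulary = sorted({word for doc in tokenised for word in doc})

# Document Frequency: the number of documents in which a word occurs.
doc_freq = {word: sum(word in doc for doc in tokenised) for word in vocabulary}

# Inverse Document Frequency: total number of documents / document frequency.
total_docs = len(documents)
inv_doc_freq = {word: total_docs / doc_freq[word] for word in vocabulary}

# TFIDF of a word in a document: term frequency * log(inverse document frequency).
for index, doc in enumerate(tokenised, start=1):
    tfidf = {word: doc.count(word) * math.log10(inv_doc_freq[word]) for word in set(doc)}
    print(f"Document {index}:", {w: round(v, 3) for w, v in tfidf.items()})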