Module-5 (1)
Applications of AI –
Natural Language Processing, Text
Classification, and Information Retrieval,
Speech Recognition, Image processing and
computer vision, Robotics
Natural language processing
• Natural language processing (NLP) is a subfield
of Artificial Intelligence (AI). It is a widely used
technology for personal assistants in various
business fields/areas. The technology takes the
speech provided by the user, breaks it down for
proper understanding, and processes it
accordingly. Because this approach is recent and
effective, it is in high demand in today’s market.
• Natural Language Processing (NLP) is a field of artificial
intelligence and linguistics that focuses on the interaction
between computers and humans through natural language.
• It encompasses the development of algorithms and
techniques that enable computers to understand, interpret,
and generate human language in a way that is meaningful
and contextually relevant.
Language modeling
• Language modeling, or LM, is the use of various statistical
and probabilistic techniques to determine the probability
of a given sequence of words occurring in a sentence.
• Language models analyze bodies of text data to provide a
basis for their word predictions.
We need language models because they help computers
understand and generate human language. Here's why:
• Communication: Language models enable computers to
understand what humans are saying or writing. For
example, they help virtual assistants like Siri or Alexa
understand spoken commands.
• Text Generation: Language models help computers
generate human-like text, such as auto-completion
suggestions when typing or generating responses for
chatbots.
• Translation: Language models assist in translating text from
one language to another, making communication easier
between people who speak different languages.
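The statistical idea behind language modeling can be sketched with a minimal bigram model that estimates the probability of a word sequence from counts in a toy corpus (the corpus and sentences below are illustrative, not real training data):

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent  # <s> marks the start of a sentence
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sequence_prob(sentence, unigrams, bigrams):
    """P(w1..wn) approximated as the product of P(wi | wi-1) from counts."""
    prob, prev = 1.0, "<s>"
    for w in sentence:
        if unigrams[prev] == 0:
            return 0.0
        prob *= bigrams[(prev, w)] / unigrams[prev]
        prev = w
    return prob

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
uni, bi = train_bigram(corpus)
print(sequence_prob(["the", "cat", "sat"], uni, bi))  # 1/3
```

Sequences seen often in the corpus get high probability; unseen transitions get zero, which is why real systems add smoothing.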
N-gram Character Models:
• An n-gram character model assigns probabilities to sequences of n characters;
counting character n-grams in a corpus yields a simple statistical model of a
language, useful for tasks such as language identification and spelling correction.
Information Retrieval and Question Answering
System Approach (example: answering "What is the capital of France?"):
• Task Definition: The system understands that it needs to provide a short
response to the question.
• Using the Web: It searches the internet for information related to the
capital of France.
• Focus on Accuracy: Instead of giving a list of websites, it aims to give a
single correct answer.
• Rewriting Questions: It changes the question into a search query that a
search engine can understand, like "capital of France."
• Searching and Checking: It looks at short summaries of websites in the
search results to find the answer.
• Scoring the Answers: It rates each potential answer based on how often it
appears and where it appears in the search results.
• Choosing the Best Answer: It picks the answer that seems most likely to
be correct based on the question and the search results.
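The question-rewriting and answer-scoring steps above can be sketched in miniature. The snippets, stop-word list, and scoring rule (frequency plus an earlier-is-better position bonus) are illustrative assumptions, not a real search engine:

```python
import re
from collections import Counter

def rewrite_question(question):
    """Turn a question into a keyword query by dropping common question words."""
    stop = {"what", "is", "the", "of", "a", "an"}
    words = re.findall(r"[a-z]+", question.lower())
    return [w for w in words if w not in stop]

def score_answers(snippets, candidates):
    """Score each candidate by how often and how early it appears in snippets."""
    scores = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for cand in candidates:
            pos = text.find(cand.lower())
            if pos >= 0:
                scores[cand] += 1 + 1.0 / (1 + pos)  # frequency + position bonus
    return scores.most_common(1)[0][0]

snippets = [
    "Paris is the capital of France.",
    "France's capital, Paris, lies on the Seine.",
    "Lyon is a large city in France.",
]
query = rewrite_question("What is the capital of France?")
best = score_answers(snippets, ["Paris", "Lyon"])
print(query, best)  # ['capital', 'france'] Paris
```

"Paris" wins because it appears in more snippets, and near the start of them, mirroring the "scoring the answers" step.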
Speech recognition
• Speech recognition involves identifying spoken words from an acoustic signal.
• It's a mainstream AI application used in various everyday scenarios like voice
mail systems, mobile web searches, and hands-free operation.
• Challenges include ambiguity, noise, segmentation (lack of pauses between
words), coarticulation (blending sounds), and homophones.
• Speech recognition is viewed as a problem of finding the most likely
sequence of words given the observed sounds, using Bayes' rule.
• The approach involves an acoustic model (describing sounds) and a language
model (specifying word probabilities).
• Claude Shannon's noisy channel model inspired this approach,
demonstrating the possibility of recovering the original message despite
noise.
• The Viterbi algorithm is commonly used to find the most likely word
sequence.
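The Bayes'-rule view above can be illustrated with a toy decoder: the most likely transcription maximizes P(sounds | words) x P(words), so a phrase the acoustic model slightly prefers can still lose to one the language model strongly prefers. All probabilities below are invented for illustration, not real model outputs:

```python
# Candidate transcriptions for one acoustic signal, with assumed
# acoustic-model scores P(sounds | words) and language-model scores P(words).
candidates = {
    "recognize speech": {"acoustic": 0.40, "language": 0.010},
    "wreck a nice beach": {"acoustic": 0.45, "language": 0.001},
}

def decode(candidates):
    """Pick the argmax over words of P(sounds | words) * P(words)."""
    return max(candidates,
               key=lambda w: candidates[w]["acoustic"] * candidates[w]["language"])

print(decode(candidates))  # recognize speech
```

Even though "wreck a nice beach" scores higher acoustically, the language model makes "recognize speech" the overall winner, which is exactly the noisy-channel intuition.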
• A phone model is like a map that helps a computer understand and recognize the
different sounds in spoken language: it breaks speech down into small units called
"phones," the basic sound segments of a language.
Hidden Markov Models (HMMs):
• Markov Model: Think of it like a series of states where you move from one
state to another based on probabilities. Each state represents a different
sound or word.
• Hidden: The states are "hidden" because we don't directly observe them.
Instead, we hear the speech but don't know exactly which state produced
each sound.
• Modeling Speech: HMMs help us model how speech transitions between
different sounds or words. They learn the probabilities of going from one
sound to another and use this knowledge to recognize spoken words.
• Key Component: HMMs are a key component of many speech recognition
systems because they can handle the variability and uncertainty present in
real-world speech. They're like a smart guesser that figures out the most
likely sequence of words based on the sounds it hears.
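The Viterbi algorithm mentioned above can be sketched over a hand-made two-state HMM. The states, transition probabilities, and emission probabilities here are invented for illustration; a real recognizer would learn them from speech data:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Two hidden "phone" states emitting two observable acoustic symbols, a and b
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"a": 0.9, "b": 0.1}, "S2": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p))
```

The algorithm keeps only the best path into each state at each time step, which is why it finds the globally most likely sequence without enumerating every possibility.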
Image processing and computer vision
• Image processing is the manipulation and analysis of images using
computational techniques. It involves converting images into digital form
and performing various operations on them to extract useful information,
enhance visual quality, or facilitate further analysis. Image processing can be
used in a wide range of applications including medical imaging, satellite
imaging, surveillance, remote sensing, and digital photography.
• [Converting images into digital form involves capturing an image through a digital
device, such as a digital camera or a scanner, and representing it in a format that can be
processed and manipulated by computers]
• Early Vision Operations: These are the initial steps in image processing,
including:
• Edge Detection: Identifying boundaries between objects or regions in an
image.
• Texture Analysis: Analyzing patterns and textures within the image.
• Computation of Optical Flow: Tracking the movement of objects or features
between frames in a sequence of images.
• Edge Detection:
definition: Edge detection is the process of identifying abrupt changes
in brightness in an image to locate boundaries between objects or
regions
– What it does: Identifies significant changes in brightness in an image,
indicating boundaries between objects or regions.
– How it works: Examines the rate of change in brightness across pixels and
detects sharp transitions.
– Example: If you have a picture of a stapler on a desk, edge detection
would outline the edges of the stapler, desk, and any other prominent
features.
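The "rate of change in brightness" idea can be sketched on a tiny grayscale grid. This uses a simple horizontal difference threshold rather than a full operator such as Sobel, and the pixel values and threshold are illustrative:

```python
def horizontal_edges(image, threshold=100):
    """Mark pixels where brightness jumps sharply from the pixel to the left."""
    edges = []
    for row in image:
        edge_row = [False]  # first column has no left neighbour
        for x in range(1, len(row)):
            edge_row.append(abs(row[x] - row[x - 1]) > threshold)
        edges.append(edge_row)
    return edges

# A dark object (value 10) on a bright background (value 200)
image = [
    [200, 200, 10, 10, 200],
    [200, 200, 10, 10, 200],
]
for row in horizontal_edges(image):
    print(row)  # edges flagged where background meets object
```

The True entries trace the left and right boundaries of the dark object, which is exactly what an edge map outlines.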
• Texture:
definition: Texture refers to the repetitive patterns or visual characteristics of
surfaces in an image, aiding in object recognition based on unique surface
qualities.
– What it is: The repeating patterns or visual feel of a surface in an image.
– How it's used: Helps in recognizing objects or surfaces based on their
unique textures.
– Example: Texture could distinguish between the smooth surface of a desk
and the rough texture of a brick wall in an image.
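One simple numeric proxy for texture is local brightness variance: a smooth surface has nearly uniform pixel values (low variance), a rough one alternates light and dark (high variance). The pixel values below are illustrative:

```python
def variance(patch):
    """Brightness variance of a flat list of pixel values."""
    mean = sum(patch) / len(patch)
    return sum((v - mean) ** 2 for v in patch) / len(patch)

smooth_desk = [120, 121, 119, 120, 122, 120]   # nearly uniform brightness
rough_brick = [60, 180, 90, 200, 70, 190]      # alternating light/dark pattern

print(variance(smooth_desk), variance(rough_brick))
```

Real texture descriptors (e.g. filter banks or co-occurrence statistics) are richer, but the same principle applies: different surfaces produce measurably different local patterns.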
• Optical Flow:
definition: Optical flow describes the apparent motion of objects in a
sequence of images or video by measuring their direction and speed of
movement
– What it describes: Apparent motion in a sequence of images or video.
– How it works: Measures the direction and speed of movement of features in an
image sequence.
– Example: In a video of a moving tennis player, optical flow would track the
direction and speed of the player's racket or limbs.
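In one dimension, the direction and speed of motion can be estimated by finding the pixel shift that best aligns two frames. Real optical-flow methods (e.g. Lucas-Kanade) are far more sophisticated, so treat this as a sketch with made-up frames:

```python
def best_shift(frame1, frame2, max_shift=3):
    """Find the horizontal shift (in pixels) that best maps frame1 onto frame2."""
    best, best_err = 0, float("inf")
    n = len(frame1)
    for shift in range(-max_shift, max_shift + 1):
        err, count = 0, 0
        for x in range(n):
            if 0 <= x + shift < n:
                err += (frame1[x] - frame2[x + shift]) ** 2
                count += 1
        err /= count  # mean squared difference over overlapping pixels
        if err < best_err:
            best, best_err = shift, err
    return best

# A bright blob moves two pixels to the right between frames
frame1 = [0, 0, 255, 255, 0, 0, 0, 0]
frame2 = [0, 0, 0, 0, 255, 255, 0, 0]
print(best_shift(frame1, frame2))  # 2
```

The recovered shift (+2 pixels per frame) gives both the direction and the speed of the motion, which is what an optical-flow vector encodes per pixel in 2D.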
• Segmentation of Images:
definition: Image segmentation is the process of partitioning an image into
regions with similar visual characteristics, facilitating object recognition and
analysis.
– What it does: Divides an image into regions of similar visual properties.
– How it's done: Pixels with similar attributes (brightness, color, texture) are
grouped together.
– Example: In a picture of a beach, segmentation could separate the sand, water,
sky, and objects like umbrellas or people into distinct regions.
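A minimal form of segmentation groups pixels by a brightness threshold; real methods also use color, texture, and spatial connectivity, so the values and labels below are illustrative:

```python
def segment(image, threshold=128):
    """Label each pixel 'bright' or 'dark' by a brightness threshold."""
    return [["bright" if v >= threshold else "dark" for v in row] for row in image]

# Bright sky over dark sand, as a tiny grayscale image
image = [
    [220, 225, 230],
    [215, 220, 228],
    [40, 50, 45],
]
for row in segment(image):
    print(row)
```

The result partitions the image into two regions of similar visual properties, the "sky" rows and the "sand" row, which is the essence of the grouping step described above.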