CLASS X - ARTIFICIAL INTELLIGENCE


PART-B
UNIT-I INTRODUCTION TO AI
SESSION – I FOUNDATIONAL CONCEPT OF AI
What is AI?
Artificial intelligence is the science of making machines that can think like humans and
do things that are considered "smart." AI technology can process large amounts of
data in ways that humans cannot. The goal of AI is to be able to do things such as
recognize patterns, make decisions, and judge situations like humans.

Artificial Intelligence suggests that machines can mimic humans in:

 Talking
 Thinking
 Learning
 Planning
 Understanding

Artificial Intelligence is also called Machine Intelligence and Computer Intelligence.

What is Intelligence?
Intelligence refers to the ability to understand, distinguish, and question
things, objects, feelings, situations, and people, along with acquiring and applying
knowledge and skills in various domains.

Role of intelligence in our life


- Any one Example

Types of intelligence

What is decision making?


Decision making is the process of identifying and picking a final choice from an available set of choices.

Examples of AI – What AI can do.

Text Editors

Chatting, commenting, emailing, etc., are a part of our lives. The keypad apps in
smartphones have built-in AI to auto-predict sentences and emojis. Gmail also has an
auto-predictor that suggests the next part of the sentence as you type.

Navigation and Maps

Maps and GPS navigation are some of the best real-world examples of AI in everyday use.

Facial Recognition

Facial recognition is another real-world AI product. The face-lock security feature on
smartphones is a perfect example of this.

Healthcare

Real-world AI products are rampant in the healthcare industry, with many hospitals
and pharma companies investing in advanced technology. Data shows that 38% of
medical and healthcare providers use computer-aided diagnosis when treating
patients.

Customer Service

The customer service industry is quickly using chatbots to interact with customers.

Electronic Payments

Payment apps are well-known AI products in everyday life. Google Pay, PhonePe,
Paytm, etc., are commonly used payment apps in India to make instant electronic
payments by scanning a QR code.

Home Assistants

Home assistants are the easiest example of AI products used in our day-to-day lives.
Many of us have a home assistant device like Alexa, Siri, Cortana, or Google Assistant to
play music, place an order, read out webpages, or provide answers to our questions by
searching on the internet. They are hands-free devices that listen to our voice and
respond by answering in audio.

ChatGPT, Google Bard, and Bing

Generative AI has taken chatbots to the next level. Rather than giving only predetermined
replies, they can understand our input and provide a more detailed and realistic response.
ChatGPT by OpenAI took the world by storm. Google Bard and Microsoft Bing followed
it, though ChatGPT continues to be the most used chatbot. A survey
by Forbes reported that 97% of companies think ChatGPT will have a positive impact
on their operations.

What AI is not?
1. AI is not just automation, e.g. a smart washing machine or smart TV.
2. AI is not a single entity like a human or an animal.
3. AI does not have emotions like a human being.
4. AI is not magic. It is mathematics and algorithms.

How do machines become intelligent?

In order to know how machines become intelligent, it is important to know the various AI
domains and branches. AI has the following domains and branches.
1. Machine Learning - The branch of AI that teaches machines how to make inferences and
decisions based on past experience.
2. Neural Networks - The branch of AI that works similarly to the human brain's neural
networks, where multiple layers capture the relationships among data and process them as
per the need.

3. Natural Language Processing - The branch of AI that helps machines read,
understand and interpret natural language and provide a response in natural
language.
4. Computer Vision - The branch of AI that helps machines recognize an image by
breaking it down and studying the different parts of the objects in it.

Session II AI Domain and Technologies.


Introduction

Deep learning is an incredibly powerful tool for extracting complex patterns from data
using neural networks that have multiple layers.

The following figure depicts what deep learning is and where it resides in the field
of artificial intelligence:

Deep Learning is a subfield of Artificial Neural Networks, which in turn is a subfield
of Machine Learning within Artificial Intelligence.

Artificial Intelligence: Any technique that mimics human behaviour using a computer or
digital processor is known as artificial intelligence. For example: robots, chatbots,
spam filtering and email categorization, face recognition, etc.

Machine Learning: The ability of a machine to learn from examples without being
explicitly programmed is known as machine learning. For example: face recognition,
stock market prediction, voice recognition, etc.

Artificial Neural Network: A computational algorithm for machine learning inspired
by the human brain is known as an artificial neural network. For example: stock market
prediction, object detection, face recognition, outlier detection, etc.

Applications

Deep learning has following applications:

1. Computer Vision: Object Detection, Object Recognition, Face Recognition etc.


2. Natural Language Processing: Speech Recognition, Language Understanding,
Language Generation etc.
3. Bioinformatics: Understanding Biological Data, Finding Pattern in Biological
Data etc.
4. Machine Translation: Translating text and speech from one language to
another language.
5. Medical Image Analysis: Analyzing different images in medical field like
Medical Resonance Imaging (MRI). Application includes brain tumor detection,
cancer diagnosis and detection.

Requirements
Deep learning has the following requirements:

1. Large Data Requirements
2. Hardware Requirements
3. Software Requirements

Domains of AI
Natural Language Processing (NLP)
 Natural Language Processing (NLP) is a domain of AI that focuses on enabling
machines to understand, interpret, and generate human language.

 The purpose of NLP is to bridge the communication gap between humans and
computers, allowing seamless interactions and extracting valuable insights from
textual data.

Applications of NLP

 Speech Recognition: NLP powers speech recognition systems, converting


spoken language into text.

 Language Translation: NLP facilitates language translation, enabling machines


to translate text from one language to another

Examples of NLP in Virtual Assistants and Chatbots

 Virtual Assistants: NLP is the backbone of popular virtual assistants like Siri,
Alexa, and Google Assistant. These assistants understand spoken language,
process user queries, and provide relevant responses or perform tasks based on
the context.

 Chatbots: NLP is essential for chatbots, enabling them to engage in human-like


conversations with users. These chatbots are employed in customer support,
helping users find information, and providing personalized recommendations.

Computer Vision
Definition and Scope of Computer Vision

Computer vision is a field of artificial intelligence that enables computers to interpret


and understand visual information from the world. It involves developing algorithms
and techniques to enable machines to extract meaningful insights from images and
videos. Computer vision aims to replicate human visual perception and understand
the content, context, and spatial relationships within visual data.

Use Cases of Computer Vision

 Facial Recognition: Identifying and verifying individuals based on their facial


features, commonly used for security and user authentication.

 Object Detection: Locating and identifying specific objects or patterns within


images or videos, utilized in various applications like security systems and
object tracking.

 Image Classification: Categorizing images into predefined classes or categories,


used in areas like medical diagnostics, quality control, and content organization.

 Optical Character Recognition (OCR): Extracting text from images or scanned


documents to make it machine-readable and searchable.

 Gesture Recognition: Understanding and interpreting human gestures,


employed in applications like gaming and virtual reality.

 Augmented Reality (AR): Overlaying virtual objects on the real world using
computer vision techniques, enriching the user experience.

Applications of Computer Vision in Autonomous Vehicles and Surveillance Systems

 Autonomous Vehicles: Computer vision plays a critical role in enabling self-driving cars.

 Traffic Analysis: Analyzing traffic flow and congestion through computer vision
techniques to optimize transportation systems.

 Object Tracking: Tracking and following specific objects or individuals across


video frames for surveillance and forensic analysis.

 Anomaly Detection: Identifying abnormal events or behaviour patterns in


surveillance footage, alerting operators to potential threats.

Data Science

Session III AI Applications


Real life AI applications.

Surveillance
AI has made it possible to develop face recognition tools which may be used for
surveillance and security purposes.
As a result, this empowers systems to monitor footage in real time, which can be
a pathbreaking development with regard to public safety.

Session IV AI Ethics
AI ethics is a set of guiding principles designed to help humans maximize the benefits
of artificial intelligence and minimize its potential negative impacts. These principles
establish ‘right’ from ‘wrong’ in the field of AI, encouraging producers of AI
technologies to address questions surrounding transparency, inclusivity, sustainability
and accountability, among other areas.

AI ethics may require organizations to establish policies that respect data privacy
laws, account for bias in algorithms and explain to customers how their data is used
before they sign up for a product.

Ethical issues around AI


Bias and Fairness-
AI systems should be free from all types of bias and be fair.
Ex- Screening candidates for a job must not be biased against any gender, colour, etc.
Accountability-
is an assurance that an individual or organization is evaluated on its performance or
behavior related to something for which it is responsible.
Ex- if a bank manager relies on an A.I. lending tool that unfairly discriminates against
certain loan applicants, this manager could be liable for not properly overseeing its
decisions.
Transparency-
It means nothing is hidden and everything that AI performs is explainable.
Safety-
The AI should cause no harm to data, people, or outcomes.
Trust Privacy and Control-
Privacy in AI isn't just about protecting data from unauthorized access; it's about using
data responsibly. In an age where data is a valuable asset, ensuring its ethical use is
paramount for maintaining consumer trust and regulatory compliance.
Cyber security and malicious use-
It is an ethical responsibility of an organization to have human control over AI use, in
terms of its span and control, so that it is not available to hackers for malicious use.
Automation and impact over job
AI and Robotics are leading to increased automation in all types of fields and
industries and at many places, robots are replacing humans too. This will lead to many
humans losing their jobs. But AI doesn’t mean that jobs are reduced, it just means that
the nature of jobs and work is changing.
It is an ethical responsibility of an organization to upgrade the skillset of its workers
so that they are ready for AI-oriented jobs. It is an ethical
responsibility of governments too to bring changes in education system for its people,
keeping in mind the nature and impact of AI over them.
Human Rights in the Age of AI
Human rights in the age of AI are both a critical concern and an evolving field of
discussion and action. Here are some key points to consider:

1. Privacy Rights: AI technologies often involve the collection and analysis of large
amounts of data, raising concerns about privacy rights. Governments and companies
must ensure that individuals' personal data is protected and secure.

2. Bias: AI algorithms can reflect and even amplify biases present in the data they are trained on,
leading to unfair outcomes. It's essential to address bias in AI systems to ensure that
they do not violate individuals' rights.
3. Transparency and Accountability: The opacity of AI systems can pose challenges to
accountability and the right to information. There is a need for transparency in how AI
systems are developed, deployed, and used to ensure that individuals can understand
and challenge decisions that affect their rights.

AI BIAS AND AI ACCESS

AI Bias:

Facial Recognition: Facial recognition algorithms have been found to exhibit bias,
particularly against people of color.
Hiring Algorithms: AI-powered hiring platforms may be biased. For instance, if a company
historically hired more men for technical roles, an AI hiring system trained on this
data may continue to favor male candidates.

AI bias can come from various sources, both technical and societal. Here are
several reasons why AI systems may be biased:

Reason #1: Insufficient Training Data


A major contributor to the problem of bias in AI is that not enough training data was
collected. Or more precisely, there is a lack of good training data for certain
demographic groups

Reason #2: Humans Are Biased


Whether we like it or not, as humans we all carry our (un)conscious biases that are
reflected in the data we collect about our world.

Real life examples of AI bias

1. Amazon's Hiring - The AI hiring tool did not select female resumes for technical posts.
2. COMPAS -

The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS)
is an American case management and decision support tool. It is designed to help
judges assess the likelihood of a defendant re-offending.
However, an analysis of the COMPAS AI algorithm discovered racial biases — black
criminals were inaccurately and unfairly assessed as being more risky than white
criminals, and were also misclassified by COMPAS as being more dangerous than
white criminals.

3. US healthcare-

The US health care system used an AI model that assigned a lower risk rating to black
people than to white people for the same disease. This happened because the model was
optimized for cost, and since black people were perceived as being less able to pay, the
model ranked their health risk lower than that of white people.

4. Twitter image cropping-

Twitter users found out that the image cropping algorithm favored white faces
over black faces, because the AI application shows only a certain portion of an
image in the preview.
5. Facebook advertisement algorithm- Targets people based on their gender, color
and religion.

Reducing AI Bias

 Through research - Deep study of the problem.
 Diversity of team - No one person or team has a major influence on the data.
 Data diversity - Combine input from multiple sources to ensure data diversity.
 Standardized data labeling - Proper data labels are used in data collection.
 Data review - Take the help of domain experts to review the data.
 Regular data analysis - The team should keep track of errors and problem areas
so as to respond to and resolve them quickly.

Advantages and disadvantages of AI

Advantages
 Reduce human errors
 Help in learning repetitive task
 Provide digital assistance
 Faster and more accurate
 24x7 Availability

Disadvantages

 High Costs
 No Creativity
 Unemployment
 Make Humans Lazy
 Emotionless

Questions answers

UNIT-II

Project Cycle

Session-Introduction to AI project cycle

Definition - The project cycle refers to the life cycle of any project. It describes the
different project stages, with each stage being separate from the others and delivering
or meeting a certain objective.

The AI project cycle typically consists of these five stages:

1. Problem Scoping
2. Data Acquisition
3. Data Exploration
4. Modeling
5. Evaluation

Problem Scoping

Problem scoping is the process of pinpointing a particular issue or opportunity that
can be tackled using artificial intelligence (AI). During this phase, we not only identify
the problem but also set specific objectives, goals, and criteria for success. However,
scoping a problem is no simple task. It requires a deep understanding of the issue so
that we can work on it effectively and solve it.

Data Acquisition

This is the second phase of the AI project cycle, which is focused on obtaining the
necessary data for the project. While developing an AI system for predictive purposes,
it’s essential to begin by training it with relevant data. Data for this can be collected through sources such as:
1. Surveys
2. Web Scraping
3. Sensors
4. Cameras
5. Observations

Data Exploration

Data is a complicated thing, often just a bunch of numbers. But to make sense of it, we
need to find the hidden patterns. That’s where data visualization comes in. It’s all
about turning those numbers into pictures that are easy for people to understand.

Modeling

In the AI project cycle, modeling is a critical step in simplifying complex data for
computers to process and make predictions. At the start, data is usually presented in
charts or graphs to help people spot patterns. But, for AI systems to work, we need to
convert this data into a basic form that computers can grasp that is binary (0s and 1s).

Evaluation

Now, we are at the last stage of the AI project cycle. Once you’ve created and trained a
model, it’s crucial to thoroughly test it to evaluate how well it performs. To do this, we
use a separate dataset called testing data.

Session II Understanding Problem Scoping



link for reference-

https://aiforkids.in/class-10/project-cycle/#problem-scoping

Framing problem statement


The desired outcome of applying the 4W canvas in AI project cycle is that we should be
able to frame a problem statement.
But we must have a problem statement that is specific, attainable and timebound:
 Specific: This means you will list out what are the exact painpoints you are trying to
solve.
 Attainable: And these painpoints should not be out of this world. They should be
achievable within your resources and a specified timeframe.
 Time bound: The problem statement must mention a time frame. You will not say that
I’m going to solve this problem. You will say, I’m going to find a solution in the next six
months.

The first blank is the answer of who. The stakeholder, the name of the people, the
name of the group of people who are facing the problem.
In the second blank, you list the painpoints that you are going to solve. The
stakeholders might have five pain points, but you are only going to solve the top
three. You will list those here.
In the third blank you put the answer of “where.” Where they’re actually facing the
problem.
And the last blank is for why are we solving the problem? You can say, because we can
do this, this, and this, and list your reasons.

4W Canvas Examples
For the first example, let us say there is a long waiting line for the parking area in the
mall near your house. People often park their cars on the roads rather than wait for
their turn. This creates traffic problems, and this is the problem that you are going to
solve.
solve.
Let’s get answer to the 4Ws:
 Who is facing the problem – The car owners of the city.
 What is the problem – They have to wait for a long time for parking space.
 Where does the problem exist – In the XYZ Mall, when they visit it.
 Why are you trying to solve this problem – Because you have the ability to develop
software that calculates the average wait time before the next parking space falls
vacant, so that somebody waiting in the line will know, for example, that they have just
two minutes to wait before they can join the queue and park the car.
This is how the problem statement can be framed.

Session 3 Simplifying Data Acquisition

Data Acquisition is the process of collecting accurate and reliable data to work
with. Data can be in the form of text, video, images, audio, and so on, and it can
be collected from various sources like the internet, journals, newspapers, and so on.

Data Sources

Surveys
1. A survey is one of the methods to gather data from users for the second stage of the AI
project cycle, that is, data acquisition.
2. A survey is a method of gathering specific information from a sample of people. For
example, a census survey is conducted periodically for analyzing the population.
3. Surveys are conducted in particular areas to acquire data from particular people.

Web Scraping
1. Web scraping means collecting data from the web using certain technologies.
2. We use it for monitoring prices, news, and so on.

Sensors
1. Sensors are very important but very simple to understand.
2. Sensors are part of IoT. IoT is the Internet of Things.
3. Examples of IoT are smart watches or a smart fire alarm which automatically detects fire
and starts the alarm. How does this happen? Sensors such as a fire sensor send data to
the IoT device (the smart alarm), and if the sensor detects heat or fire, the alarm starts.

Cameras

1. Camera captures the visual information and then that information which is called
image is used as a source of data.
2. Cameras are used to capture raw visual data
Observations

1. When we observe something carefully, we get some information.

2. For example, scientists observe insects for years, and that data is then used
by them. So this is a data source.
3. Observation is a time-consuming data source.

API
1. API stands for Application Programming Interface.
2. An API is actually a messenger which takes a request from you, tells the system
what you want, and then gives you back a response.
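
To see how this messenger idea looks in code, here is a minimal sketch using the popular Python requests library; the URL, parameters, and response fields are made-up placeholders, not a real service.

# Minimal sketch of calling an API with the requests library.
# The URL and parameters below are hypothetical placeholders, not a real service.
import requests

# Our "request": ask a weather service for the weather in Pune
response = requests.get("https://api.example.com/weather", params={"city": "Pune"})

if response.status_code == 200:      # 200 means the server answered successfully
    data = response.json()           # the response, usually in JSON format
    print(data)
else:
    print("Request failed with status code", response.status_code)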

characteristics of data quality

Relevance: This is a more subjective and comprehensive assessment of data


quality. Data is useless if it is not relevant to the intended purpose

Completeness: The data should be complete without any missing data.

Reliability: The degree to which data is true and factual.

Validity: Data is considered valid if it has the correct format, type, and range. This may
differ based on the country, sector, or standards used. Here are several examples:

 Data type: numeric, boolean, labels.


 Range: values must be within a certain interval; for example, a birth year of 201
is invalid because it is outside the date range.
 Patterns: When dates do not meet established standards, they are considered
invalid, for example, MM-DD-YYYYY for a date of birth.

Accuracy: How effectively does the data describe the real-world conditions it is
trying to describe? This is one of the most important properties of high-quality
data. Accuracy can be checked by comparing data with a reliable source.

Timeliness

For data to retain its quality, it should be recorded promptly to capture changes.
Tracking data weekly rather than annually helps maintain timeliness. An example of a
timeliness metric is time variance.
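
As a rough illustration of how some of these characteristics can be checked programmatically, here is a minimal Python sketch; the record fields and the accepted year range are assumptions made purely for this example.

# Minimal sketch: simple completeness and validity checks on made-up records.
records = [
    {"name": "Asha", "birth_year": 2008},
    {"name": "", "birth_year": 201},      # incomplete name, invalid year
]

for record in records:
    is_complete = bool(record["name"])                # completeness: name must not be missing
    is_valid = 1900 <= record["birth_year"] <= 2024   # validity: year within an assumed range
    print(record, "complete:", is_complete, "valid:", is_valid)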

Types of data used in AI project

Data used in an AI project is of two types: structured and unstructured.


What is structured data?
Structured data is data that has been predefined and formatted to a set structure
before being placed in data storage.

There are three key benefits of structured data:

 Easy use by machine learning algorithms
 Easy use by business users
 Increased access to more tools
Examples of structured data
Structured data is everywhere. It’s the basis for inventory control systems and ATMs.
It can be human- or machine-generated.
What is unstructured data?

Unstructured data, typically categorized as qualitative data, cannot be processed
and analyzed through conventional data tools and methods, since unstructured
data does not have a predefined data model.

Structured data is:

 In the form of numbers and text, in standardized, readable formats.
 Typically XML and CSV.
 Follows a predefined relational data model.
 Stored in a relational database in tables, rows, and columns, with specific labels.
Relational databases use SQL for processing.
 Easy to search and use with ample analytics tools available.
 Quantitative (has countable elements), easy to group based on attributes or
characteristics.

Unstructured data is:

 Comes in a variety of shapes and sizes that do not conform to a predefined data model.
 Typically DOC, WMV, MP3.
 Does not have a data model, though it may have hidden structure.
 Stored in unstructured raw formats or in a NoSQL database. Many companies use data
lakes to store large volumes of unstructured data that they can then access when needed.
 Requires complex search, processing, and analysis before it can be placed in a
relational database.
 Qualitative, with subjective information that must be split, stacked, and grouped.

Session 4 Data Exploration with Data Visualisation.

To simplify, Data Exploration means arranging the data that we collected during Data
Acquisition. For example, if we have data of 50 students in a class, we have their
Mobile Number, Date of Birth, Class, etc.

In the process of data exploration, we can make a chart for that data in which all the
names will be in one place, all the mobile numbers in another, and so on.

1. Google Charts

Google chart tools are powerful, simple to use, and free. They offer a rich gallery of
interactive charts and data tools.

2. Tableau

Tableau is often regarded as the grandmaster of data visualization software and for
good reason.

Tableau has a very large customer base of 57,000+ accounts across many industries
due to its simplicity of use and ability to produce interactive visualizations far beyond
those provided by general BI solutions.

3. FusionCharts

This is a very widely-used, JavaScript-based charting and visualization package that


has established itself as one of the leaders in the paid-for market.

It can produce 90 different chart types and integrates with a large number of
platforms and frameworks giving a great deal of flexibility.

4. Highcharts

A simple options structure allows for deep customization, and styling can be done via
JavaScript or CSS. Highcharts is also extendable and pluggable for experts seeking
advanced animations and functionality.

What is data visualization?

Data visualization is the representation of information and data using charts, graphs,
maps, and other visual tools. These visualizations allow us to easily understand any
patterns, trends, or outliers in a data set.

Benefits of data visualization

Data visualization can be used in many contexts in nearly every field, like public policy,
finance, marketing, retail, education, sports, history, and more. Here are the benefits of
data visualization:
 Storytelling: People are drawn to colors and patterns in clothing, arts and culture,
architecture, and more. Data is no different—colors and patterns allow us to visualize
the story within the data.
 Accessibility: Information is shared in an accessible, easy-to-understand manner for a
variety of audiences.
 Visualize relationships: It’s easier to spot the relationships and patterns within a data
set when the information is presented in a graph or chart.
 Exploration: More accessible data means more opportunities to explore, collaborate,
and inform actionable decisions.
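
As a small illustration of these benefits, here is a minimal sketch that turns a made-up set of numbers into a bar chart and a line chart, assuming the matplotlib library is installed.

# Minimal sketch: visualizing made-up monthly sales figures with matplotlib.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 170]            # made-up values

plt.bar(months, sales)                 # bar chart: compare categories
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.show()

plt.plot(months, sales, marker="o")    # line chart: spot the trend over time
plt.title("Sales Trend")
plt.show()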
Types of chart

Session-V Modelling
What is an AI model?

An AI model is a program that has been trained on a set of data to recognize certain
patterns or make certain decisions without further human intervention.

There are two ways/approaches to making a machine learning model: a Learning-Based
Approach and a Rule-Based Approach.

Machine Learning (ML)

Machine learning is a subset of artificial intelligence (AI) which provides machines


the ability to learn automatically and improve from experience without being
programmed for it.

Types of Machine Learning

Machine learning can be divided into three types: Supervised Learning, Unsupervised
Learning, and Reinforcement Learning.

Supervised Learning

Supervised learning is where a computer algorithm is trained on input data that has
been labeled for a particular output.

For example, a shape with three sides is labeled as a triangle. Classification and
Regression models are the two types of supervised learning.

What is classification ?

Classification is a type of supervised learning in which the algorithm’s job is to separate
the labeled data into classes to predict the output.

Example: to predict whether a given fruit is an apple or a pineapple.

What is Regression ?

Regression is a type of supervised learning which is used to predict a continuous value.

Example: Regression is widely used for weather forecasting, such as predicting
tomorrow's temperature.
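
The following minimal sketch shows both ideas with scikit-learn (assumed to be installed); the tiny fruit and temperature datasets are invented purely for illustration.

# Minimal sketch of supervised learning: classification and regression with scikit-learn.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: label a fruit as apple (0) or pineapple (1) from weight (g) and spikiness (0-10)
X_fruit = [[150, 1], [170, 2], [900, 9], [1000, 8]]
y_fruit = [0, 0, 1, 1]
clf = DecisionTreeClassifier().fit(X_fruit, y_fruit)
print(clf.predict([[160, 1]]))         # expected: [0] -> apple

# Regression: predict temperature (a continuous value) from the hour of the day
X_hours = [[6], [9], [12], [15]]
y_temp = [22.0, 26.0, 31.0, 29.0]
reg = LinearRegression().fit(X_hours, y_temp)
print(reg.predict([[10]]))             # predicted temperature at 10 o'clock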

Unsupervised Learning

In machine learning, unsupervised learning is where a system learns from data sets
on its own. In this, the training data is not labeled.

Learning on its own is termed unsupervised learning.

Basically, in unsupervised learning, where the data is untagged or unnamed, the
machine finds structure and patterns in its input data by itself.

Example: Suppose a boy sees someone performing tricks with a ball, and he then learns
the tricks by himself. This is what we call unsupervised learning.

Reinforcement Learning

Learning through feedback or a trial-and-error method is called Reinforcement
Learning.

In this type of learning, the system works on a reward-or-penalty policy. An agent
performs an action, positive or negative, in the environment; the system then changes
the state of the environment, and the agent is provided with a reward or a penalty.
The system also builds a policy of what action should be taken under a specific
condition.

Example: A very good example of this is a vending machine.

Suppose you put a coin (action) into a juice vending machine (environment). The
system detects the amount of the coin given (state) and you get the drink corresponding
to that amount (reward), or if the coin is damaged or there is any other problem, you
get nothing (penalty).
Here the machine is building a policy of which drink should be provided under what
condition and how to handle an error in the environment.

Rule Based Approach

 Rule Based Approach Refers to the AI modelling where the relationship or


patterns in data are defined by the developer.
 That means the machine works on the rules and information given by the
developer and performs the task accordingly.

For example: Suppose you have a dataset containing 100 images each of apples and
bananas. Now you create a machine using computer vision and train it with the labeled
images of apples and bananas. If you test your machine with an image of an apple, it
will give you the output by comparing it with the images in its dataset. This is known
as the Rule-Based Approach.
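
A minimal sketch of the rule-based idea is given below; the rules and thresholds are assumptions written by the developer, which is exactly what distinguishes this approach from a learning-based one.

# Minimal sketch of a rule-based approach: the developer writes the rules,
# and the machine only applies them. The thresholds below are assumptions.
def classify_fruit(weight_g, colour):
    if weight_g < 300 and colour in ("red", "green"):      # rule 1: small and red/green -> apple
        return "apple"
    if weight_g >= 300 and colour in ("yellow", "brown"):   # rule 2: large and yellow/brown -> pineapple
        return "pineapple"
    return "unknown"

print(classify_fruit(180, "red"))        # apple
print(classify_fruit(1200, "yellow"))    # pineapple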

Datasets

A dataset is a collection of related sets of information that is composed of separate
elements but can be manipulated by a computer as a unit.

In the Rule-Based Approach we will deal with two divisions of the dataset:

1. Training Data – the subset required to train the model.
2. Testing Data – the subset required to test the trained model.

Training vs Testing Data

Base   Training Set                                 Testing Set

Use    Used for training the model                  Used for testing the model after it is trained

Size   A lot bigger than the testing set;           Smaller than the training set;
       constitutes about 70% to 80% of the data     constitutes about 20% to 30% of the data
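
A minimal sketch of making this split with scikit-learn's train_test_split (assuming the library is installed) is shown below; the ten samples are made up, and the 80/20 split matches the proportions in the table above.

# Minimal sketch: splitting a made-up dataset into training and testing subsets.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]           # 10 samples with one feature each
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]     # their labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), "training samples,", len(X_test), "testing samples")   # 8 and 2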

Session 6 Neural Network

link for reference-

https://aiforkids.in/class-10/project-cycle/#problem-scoping

What is a neural network?

Neural networks are series of networks of independent neurons, just like in our brain;
but in computers, the neurons are not living tissue, they are just algorithms which give
an output based on the given data.

The key advantage of neural networks is that they are able to extract data features
automatically without needing the input of the programmer.

A Neural Network is essentially a system of organizing machine learning algorithms to


perform certain tasks. It is a fast and efficient way to solve problems for which the
dataset is very large, such as in images.

Biological Neural Network

Fig- Structure of human brain neurons

Soma- It is the cell body that contains the nucleus.

Axon- The axon is a long fibre that carries signals from the cell body out to other
neurons.

Dendrites- Tree-like structures that carry electrical signals into the cell body.

Synapse- The point of contact between the axon of one cell and the dendrites of
another cell.

Structure of ANN
Structure: The structure of artificial neural networks is inspired by biological
neurons. A biological neuron has a cell body or soma to process the impulses,
dendrites to receive them, and an axon that transfers them to other neurons. The
input nodes of artificial neural networks receive input signals, the hidden layer
nodes compute these input signals, and the output layer nodes compute the final
output by processing the hidden layer’s results using activation functions.

Biological Neuron Artificial Neuron

Dendrite Inputs

Cell nucleus or Soma Nodes

Synapses Weights

Axon Output

Synapses: Synapses are the links between biological neurons that enable the
transmission of impulses from dendrites to the cell body. Synapses are the weights
that join the one-layer nodes to the next-layer nodes in artificial neurons. The
strength of the links is determined by the weight value

Layers in ANN

 Input Layer: Each feature in the input layer is represented by a node on the
network, which receives input data.
 Weights and Connections: The weight of each neuronal connection indicates
how strong the connection is. Throughout training, these weights are changed.
 Hidden Layers: Each hidden layer neuron processes inputs by multiplying them
by weights, adding them up, and then passing them through an activation
function. By doing this, non-linearity is introduced, enabling the network to
recognize intricate patterns.
 Output: The final result is produced by repeating the process until the output
layer is reached.
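
The following minimal NumPy sketch shows one forward pass through such a network; the weights and inputs are made-up numbers chosen only to illustrate the multiply-sum-activate idea described above.

# Minimal sketch: one forward pass through a tiny 2-3-1 network with made-up weights.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # activation function introducing non-linearity

inputs = np.array([0.5, 0.8])          # input layer: 2 features
w_hidden = np.array([[0.1, 0.4],       # weights from 2 inputs to 3 hidden neurons
                     [0.2, 0.3],
                     [0.7, 0.6]])
hidden = sigmoid(w_hidden @ inputs)    # hidden layer: weighted sums passed through activation
w_output = np.array([0.3, 0.5, 0.9])   # weights from 3 hidden neurons to 1 output neuron
output = sigmoid(w_output @ hidden)
print(output)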

Features of Neural Network

 The Artificial Neural Network systems are modelled on the human brain and
nervous system.
 Every node of a layer in a Neural Network is necessarily a machine learning
algorithm.
 It is very useful to implement when solving problems for very huge datasets.
 They can perform multiple tasks in parallel without affecting the system
performance.
 Neural Networks have the ability to learn by themselves and produce the output
that is not limited to the input provided to them.

Training an ANN

Steps:

1. Initialize weights for all neurons.
2. Feed the input data into the input layer.
3. Calculate the output as per the weights and activation function.
4. Compare the output with the expected results.
5. Update the weights if the output produced does not match the expected results, and repeat.
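
A minimal sketch of these steps for a single artificial neuron (a perceptron-style update rule) is shown below; the AND-gate data, learning rate and number of epochs are assumptions for illustration only.

# Minimal sketch of the training steps above for one neuron learning the AND function.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # input data
y = np.array([0, 0, 0, 1])                       # expected outputs (logical AND)

weights = np.zeros(2)                            # step 1: initialize weights
bias = 0.0
learning_rate = 0.1

for epoch in range(10):
    for xi, expected in zip(X, y):
        output = 1 if xi @ weights + bias > 0 else 0   # steps 2-3: feed input, calculate output
        error = expected - output                      # step 4: compare with expected result
        weights += learning_rate * error * xi          # step 5: update weights on a mismatch
        bias += learning_rate * error

print(weights, bias)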

Advantages of Neural Networks

1. Effective Visual Analysis

The very first advantage of neural networks is that they lead to an effective visual
analysis. Since an artificial neural network is similar to that of a human’s neural
network, it is capable of performing more complex tasks and activities as
compared to other machines.

This involves analyzing visual information and segregating it into different


categories. A typical example of this advantage is when any website that you visit
asks you to prove whether or not you are a robot.

Robots cannot effectively analyze visual information, while humans can


successfully do so. This proves that any person logging into a website is a human
as s/he is required to differentiate between different images and put images of a
certain kind together.
2. Processing of Unorganized Data

Another one of the greatest advantages of neural networks is that it is capable of


processing unorganized data

3. User-friendly Interface

The last advantage among others is that they portray a user-friendly interface.
For any machine or artificial equipment to become a success, its interface and
usability of it should be user-friendly.

Disadvantages of Neural Networks


1. Hardware Requirement

Despite their ability to quickly adapt to the changing requirements of the purpose
they are supposed to work for, neural networks can be a bit hefty to arrange and
organize. This means that they require heavy machinery and hardware
equipment to work for any application.
2. Incomplete Results

The second demerit of neural networks is that they can often create incomplete
results or outputs. Since ANNs are trained to adapt to the changing applications
of neural networks, they are often left untrained for the whole process.
3. Data Suitability

Another one of the challenges of neural networks is that they are highly
dependent on the data made available to them. This infers that the efficiency of
any neural network is directly proportional to the amount of data it receives to
process.

Applications of ANN

1. Facial Recognition

Facial Recognition Systems are serving as robust systems of surveillance. Recognition


Systems matches the human face and compares it with the digital images. They are
used in offices for selective entries. The systems thus authenticate a human face and
match it up with the list of IDs that are present in its database.
2. Stock Market Prediction

3. Social Media

No matter how cliche it may sound, social media has altered the normal boring course
of life. Artificial Neural Networks are used to study the behaviours of social media
users. Data shared everyday via virtual conversations is tacked up and analyzed for
competitive analysis.

Neural networks duplicate the behaviours of social media users. Post analysis of
individuals' behaviours via social media networks the data can be linked to people’s
spending habits. Multilayer Perceptron ANN is used to mine data from social media
applications.
4. Defence

Defence is the backbone of every country. Every country’s state in the international
domain is assessed by its military operations. Neural Networks also shape the defence
operations of technologically advanced countries. The United States of America,
Britain, and Japan are some countries that use artificial neural networks for
developing an active defence strategy.

5. Healthcare

The age old saying goes like “Health is Wealth”. Modern day individuals are leveraging
the advantages of technology in the healthcare sector. Convolutional Neural
Networks are actively employed in the healthcare industry for X ray detection, CT
Scan and ultrasound.
6. Signature Verification and Handwriting Analysis

Signature Verification, as the self-explanatory term goes, is used for verifying an
individual’s signature. Banks, and other financial institutions use signature verification
to cross check the identity of an individual.
7. Weather Forecasting

The forecasts done by the meteorological department were never accurate before
artificial intelligence came into force. Weather Forecasting is primarily undertaken to
anticipate the upcoming weather conditions beforehand. In the modern era, weather
forecasts are even used to predict the possibilities of natural disasters.

Session 7 Evaluation

The stage of testing the models is known as Evaluation. In this stage, we evaluate each
and every model tried and choose the model which gives the most efficient and
reliable results.
Model Evaluation
Model Evaluation in Machine Learning is the process of determining a trained model's
effectiveness and quality using a variety of metrics and approaches. It entails
evaluating whether the model achieves the required goals and how well it generalizes
to fresh, untested data. We are able to examine several models, grasp their advantages
and disadvantages, and make informed judgments thanks to model evaluation.
Determining the model's predicted accuracy and evaluating its effectiveness in solving
the given problem are the key goals of model evaluation.

Causes behind performance of AI Model


Overfitting- Refers to a situation when an AI model fits so exactly against its training
data that it produces correct results on that data but performs poorly on new test data.
Underfitting- Refers to a situation when an AI model is not complex enough to capture
the structure and relationships of its training data and predict effective outcomes.
Generalization- Refers to how well the model generalizes from the training data to any
data from the problem domain. This allows us to make predictions in the future on data
the model has never seen.
A good model is balanced between overfitting and underfitting.

Model Evaluation Metrics

Model evaluation has various subtopics that are essential to thoroughly assess the
performance and accuracy of machine learning models. Some of the key topics within
model evaluation include.

What is a Confusion Matrix?


A confusion matrix is a matrix that summarizes the performance of a machine
learning model on a set of test data. It is a means of displaying the number of
accurate and inaccurate instances based on the model’s predictions. It is often used
to measure the performance of classification models, which aim to predict a
categorical label for each input instance.
The matrix displays the number of instances produced by the model on the test data.
 True positives (TP): occur when the model accurately predicts a positive data
point.
 True negatives (TN): occur when the model accurately predicts a negative data
point.
 False positives (FP): occur when the model incorrectly predicts a positive outcome
for an actually negative data point.
 False negatives (FN): occur when the model incorrectly predicts a negative outcome
for an actually positive data point.
Confusion Matrix For binary classification
A 2X2 Confusion matrix is shown below for the image recognition having a Dog
image or Not Dog image.
                            Actual
                     Dog                Not Dog

Predicted Dog        True Positive      False Positive
                     (TP)               (FP)

Predicted Not Dog    False Negative     True Negative
                     (FN)               (TN)

 True Positive (TP): The total count where both the predicted and actual values are Dog.
 True Negative (TN): The total count where both the predicted and actual values are Not Dog.
 False Positive (FP): The total count where the prediction is Dog while the actual value is Not Dog.
 False Negative (FN): The total count where the prediction is Not Dog while the actual value is Dog.

Example for binary classification problems


Index      1    2        3    4        5    6        7    8    9        10

Actual     Dog  Dog      Dog  Not Dog  Dog  Not Dog  Dog  Dog  Not Dog  Not Dog

Predicted  Dog  Not Dog  Dog  Not Dog  Dog  Dog      Dog  Dog  Not Dog  Not Dog

Result     TP   FN       TP   TN       TP   FP       TP   TP   TN       TN

 Actual Dog Counts = 6


 Actual Not Dog Counts = 4
 True Positive Counts = 5
 False Positive Counts = 1
 True Negative Counts = 3
 False Negative Counts = 1

                          Predicted
                   Dog                 Not Dog

Actual Dog         True Positive       False Negative
                   (TP = 5)            (FN = 1)

Actual Not Dog     False Positive      True Negative
                   (FP = 1)            (TN = 3)

Metrics based on Confusion Matrix Data


1. Accuracy
Accuracy is used to measure the performance of the model. It is the ratio of Total
correct instances to the total instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
For the above case:
Accuracy = (5+3)/(5+3+1+1) = 8/10 = 0.8
2. Precision

Precision is a measure of how accurate a model’s positive predictions are. It is


defined as the ratio of true positive predictions to the total number of positive
predictions made by the model.
Precision = TP / (TP + FP)
For the above case:
Precision = 5/(5+1) =5/6 = 0.8333
3. Recall
Recall measures the effectiveness of a classification model in identifying all relevant
instances from a dataset. It is the ratio of the number of true positive (TP) instances
to the sum of true positive and false negative (FN) instances.
Recall = TP / (TP + FN)
For the above case:
Recall = 5/(5+1) =5/6 = 0.8333

4. F1-Score
F1-score is used to evaluate the overall performance of a classification model. It is
the harmonic mean of precision and recall,
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
For the above case:
F1-Score: = 0.8333
We balance precision and recall with the F1-score when a trade-off between
minimizing false positives and false negatives is necessary, such as in information
retrieval systems.
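
The following minimal Python sketch recomputes all four metrics from the confusion-matrix counts of the worked example above (TP = 5, TN = 3, FP = 1, FN = 1).

# Minimal sketch: computing accuracy, precision, recall and F1-score from the counts above.
TP, TN, FP, FN = 5, 3, 1, 1

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1_score = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))     # 0.8
print(round(precision, 4))    # 0.8333
print(round(recall, 4))       # 0.8333
print(round(f1_score, 4))     # 0.8333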

UNIT-III Advance Python


Session-I Jupyter Notes and other Python Related Platform
1. Introduction of various Python Platform
Python is a programming language that lets you work quickly and integrate systems more
efficiently. It is a widely-used general-purpose, high-level programming language. It was
designed with an emphasis on code readability, and its syntax allows programmers to
express their concepts in fewer lines of code. In the Python programming language, there
are two ways in which we can run our code:

1. Interactive mode
2. Script mode

What is Script Mode?

Script mode is where you write your code in a .py file and then run it with the
python command. This is the most common way that people use Python because it
lets you write and save your code so that you can use it again later.

What is the Interactive Mode?

Interactive mode is where you type your code into the Python interpreter directly.
This is useful for trying out small snippets of code, or for testing things out as you’re
writing them.

Session –II Python Basics in New Flavour

Types of Tokens in Python


When working with the Python language, it is important to understand the different
types of tokens that make up the language. Python has different types of tokens,
including identifiers, literals, operators, keywords, delimiters, and whitespace. Each
token type fulfills a specific function and plays an important role in the execution of a
Python script.

Keywords are reserved words in Python that have a special meaning and are used to
define the syntax and structure of the language. These words cannot be used as
identifiers for variables, functions, or other objects. Python has a set of 35 keywords,
each serving a specific purpose in the language.

There are 35 keywords in Python 3.11. They are:


and        as         assert    async    continue
else       if         not       while    def
except     import     or        with     del
finally    in         pass      yield    elif
for        is         raise     await    False
from       lambda     return    break    None
global     nonlocal   try       class    True


1. Identifiers in Python

An identifier is a user-defined name given to identify variables, functions, classes, modules, or any
other user-defined object in Python. Identifiers are case-sensitive and can consist of letters, digits, and
underscores, yet they cannot start with a digit. Python follows a naming convention called
“snake_case,” where words are separated by underscores. Identifiers are used to make code
more readable and maintainable by providing meaningful names to objects.

Examples of Python Identifiers

Examples of valid Python identifiers:

 my_variable
 my_function()
 my_class
 my_module
 _my_private_variable
 my_variable_with_underscores

3. Literals in Python
Literals are the fixed values or data items used in a source code. Python supports
different types of literals such as:
(i) String Literals: The text written in single, double, or triple quotes represents the
string literals in Python. For example: “Computer Science”, ‘sam’, etc. We can also
use triple quotes to write multi-line strings.
 Python3

# String Literals
a = 'Hello'
b = "Geeks"
c = '''Geeks for Geeks is a
learning platform'''

# Driver code
print(a)
print(b)
print(c)

Output
Hello
Geeks
Geeks for Geeks is a
learning platform
(ii) Character Literals: Character literal is also a string literal type in which the
character is enclosed in single or double-quotes.
 Python3

# Character Literals
a = 'G'
b = "W"

# Driver code
print(a)
print(b)
Output:
G
W
(iii) Numeric Literals: These are the literals written in form of numbers. Python
supports the following numerical literals:
 Integer Literal: It includes both positive and negative numbers along with 0. It
doesn’t include fractional parts. It can also include binary, decimal, octal,
hexadecimal literal.
 Float Literal: It includes both positive and negative real numbers. It also includes
fractional parts.
 Python3

# Numeric Literals
a =5
b = 10.3
c = -17

# Driver code
print(a)
print(b)
print(c)
Output
5
10.3
-17
(iv) Boolean Literals: Boolean literals have only two values in Python. These are
True and False.
 Python3

# Boolean Literals
a =3
b = (a == 3)
c = True + 10

# Driver code
print(a, b, c)

Output
3 True 11
(v) Special Literals: Python has a special literal ‘None’. It is used to denote nothing,
no values, or the absence of value.
 Python3

# Special Literals
var = None
print(var)
Output
None

Python Operators
Operators are special symbols that perform operations on variables and values. For
example,
print(5 + 6) # 11


Here, + is an operator that adds two numbers: 5 and 6.

Types of Python Operators


Here's a list of different types of Python operators that we will learn in this tutorial.

1. Arithmetic Operators

2. Assignment Operators

3. Comparison Operators

4. Logical Operators

1. Python Arithmetic Operators

Operator Operation Example

+ Addition 5 + 2 = 7

- Subtraction 4 - 2 = 2

* Multiplication 2 * 3 = 6

/ Division 4 / 2 = 2.0

// Floor Division 10 // 3 = 3

% Modulo 5 % 2 = 1

** Power 4 ** 2 = 16

Example 1: Arithmetic Operators in Python


a = 7
b = 2

# addition
print ('Sum: ', a + b)

# subtraction
print ('Subtraction: ', a - b)

# multiplication
print ('Multiplication: ', a * b)

# division
print ('Division: ', a / b)

# floor division
print ('Floor Division: ', a // b)

# modulo
print ('Modulo: ', a % b)

# a to the power b
print ('Power: ', a ** b)

Output
Sum: 9
Subtraction: 5
Multiplication: 14
Division: 3.5
Floor Division: 3
Modulo: 1
Power: 49

Example 2: Assignment Operators


# assign 10 to a
a = 10

# assign 5 to b
b = 5

# assign the sum of a and b to a


a += b # a = a + b

print(a)

# Output: 15

3. Python Comparison Operators


Comparison operators compare two values/variables and return a boolean
result: True or False. For example,

Operator Meaning Example

== Is Equal To 3 == 5 gives us False



!= Not Equal To 3 != 5 gives us True

> Greater Than 3 > 5 gives us False

< Less Than 3 < 5 gives us True

>= Greater Than or Equal To 3 >= 5 gives us False

<= Less Than or Equal To 3 <= 5 gives us True

Example 3: Comparison Operators

a = 5
b = 2
# equal to operator
print('a == b =', a == b)

# not equal to operator


print('a != b =', a != b)

# greater than operator


print('a > b =', a > b)

# less than operator


print('a < b =', a < b)

# greater than or equal to operator


print('a >= b =', a >= b)

# less than or equal to operator


print('a <= b =', a <= b)

Output

a == b = False
a != b = True
a > b = True
a < b = False
a >= b = True
a <= b = False

4. Python Logical Operators


Logical operators are used to check whether an expression is True or False . They are
used in decision-making. For example,
a = 5
b = 6
print((a > 2) and (b >= 6)) # True
Operator   Example   Meaning

and        a and b   Logical AND: True only if both the operands are True

or         a or b    Logical OR: True if at least one of the operands is True

not        not a     Logical NOT: True if the operand is False and vice-versa

Example 4: Logical Operators


# logical AND

print(True and True) # True


print(True and False) # False

# logical OR
print(True or False) # True

# logical NOT
print(not True) # False

Python Comments
Comments can be used to explain Python code.
Comments can be used to make the code more readable.
Comments can be used to prevent execution when testing code.
#This is a comment
print("Hello, World!")

For multiline comments we use triple quotes, as shown in the example below.
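
A triple-quoted string that is not assigned or used by the program acts as a multiline comment:

"""This is a multiline comment.
It can span several lines and has no effect
because it is not assigned or used."""
print("Hello, World!")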

One Value to Multiple Variables


And you can assign the same value to multiple variables in one line:

x = y = z = "Orange"
print(x)
print(y)
print(z)

print("assigning values of different datatypes")


a, b, c, d = 4, "geeks", 3.14, True
print(a)
print(b)

print(c)
print(d)

Output:
assigning values of different datatypes
4
geeks
3.14
True

Variable and Assignments

Variables
Variables are containers for storing data values.

Creating Variables
Python has no command for declaring a variable.

A variable is created the moment you first assign a value to it.

x = 5
y = "John"
print(x)
print(y)

Output:
5
John

Data types in Python


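Since the figure for this topic is not reproduced here, the minimal sketch below lists the common built-in data types with made-up example values, checked using the type() function.

# Minimal sketch: common Python data types checked with type().
a = 10                     # int
b = 3.14                   # float
c = "hello"                # str
d = True                   # bool
e = [1, 2, 3]              # list
f = (1, 2, 3)              # tuple
g = {"name": "Asha"}       # dict
h = {1, 2, 3}              # set

for value in (a, b, c, d, e, f, g, h):
    print(type(value))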

Getting user input in Python

input()
The input() function is used in all the latest versions of Python. It takes
input from the user as a string.
If the input is needed as an integer or a float, it must be type cast, as the
explanation below describes. Let's understand the following example.

# Python program showing


# a use of input()
name = input("Enter your name: ") # String Input
age = int(input("Enter your age: ")) # Integer Input
marks = float(input("Enter your marks: ")) # Float Input
print("The name is:", name)
print("The age is:", age)
print("The marks is:", marks)

Enter your name: Johnson


Enter your age: 21
Enter your marks: 89
The name is: Johnson
The age is: 21
The marks is: 89.0
Explanation:
By default, the input() function takes input as a string so if we need to
enter the integer or float type input then the input() function must be
type casted.

Session III
Python conditionals and loops
Conditional Statement

The if statement
In order to write useful programs, we almost always need the ability to check
conditions and change the behavior of the program accordingly. Conditional
statements give us this ability. The simplest form is the if statement, which has the
general form:

if BOOLEAN_EXPRESSION:
    STATEMENTS

A few important things to note about if statements:


1. The colon (:) is significant and required. It separates the header of
the compound statement from the body.
2. The line after the colon must be indented. It is standard in Python to use four
spaces for indenting.
3. All lines indented the same amount after the colon will be executed whenever
the BOOLEAN_EXPRESSION is true.
Here is an example:
a = 33
b = 200
if b > a:
    print("b is greater than a")

The if else statement



It is frequently the case that you want one thing to happen when
a condition is true, and something else to happen when it is
false. For that we have the if-else statement. The elif keyword
is used to check a further condition.
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")

Else
The else keyword catches anything which isn't caught by the preceding
conditions.

a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")

Python range() function


The Python range() function returns a sequence of numbers, in a given range.
The most common use of it is to iterate sequences on a sequence of numbers
using Python loops.
Example
In the given example, we are printing the numbers from 0 to 4.
for i in range(5):
    print(i, end=" ")
print()
0 1 2 3 4

Parameter Description

start Optional. An integer number specifying at which position to start.


Default is 0

stop Required. An integer number specifying at which position to stop


(not included).

step Optional. An integer number specifying the incrementation.


Default is 1

What is the use of the range function in Python


In simple terms, range() allows the user to generate a series of numbers within a
given range
Python range() function takes can be initialized in 3 ways.
 range (stop) takes one argument.
 range (start, stop) takes two arguments.
 range (start, stop, step) takes three arguments.

Python range (stop)

# printing the first 6
# whole numbers
for i in range(6):
    print(i, end=" ")
print()
0 1 2 3 4 5

Python range (start, stop)

# printing the natural numbers from 5 to 19 (20 is not included)
for i in range(5, 20):
    print(i, end=" ")

Output: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Python range (start, stop, step)

for i in range(0, 10, 2):
    print(i, end=" ")
print()

Output: 0 2 4 6 8

Incrementing the Range using a Positive Step

If a user wants to increment, then the step needs to be a positive number.
# incremented by 4
for i in range(0, 30, 4):
    print(i, end=" ")
print()

Output: 0 4 8 12 16 20 24 28

Python range() using Negative Step



If a user wants to decrement, then the step needs to be a negative number.

# decremented by 2
for i in range(25, 2, -2):
    print(i, end=" ")
print()

Output: 25 23 21 19 17 15 13 11 9 7 5 3

Python range() with Float Values

Python range() function doesn’t support float numbers. i.e. user cannot use floating-
point or non-integer numbers in any of its arguments. Users can use only integer
numbers.
The in and not in operators
These operators test whether a value is present in a sequence.
Ex-
3 in [1,2,3,4,5]        True
5 in [1,2,3]            False
5 not in [1,2,3,4]      True

Python for Loop


In Python, a for loop is used to iterate over sequences such
as lists, strings, tuples, etc.

languages = ['Swift', 'Python', 'Go']


# access elements of the list one by one
for i in languages:
    print(i)
Swift
Python
Go

for loop Syntax


for val in sequence:
    # statement(s)

Here, val accesses each item of the sequence on each


iteration. The loop continues until we reach the last item in
the sequence.
Example: Loop Through a String
language = 'Python'
# iterate over each character in language
for x in language:
    print(x)
Output

P
y
t
h
o
n

Here, we have printed each character of the string language using a for loop.

for Loop with Python range()


In Python, the range() function returns a sequence of numbers. For example,

values = range(4)

Here, range(4) returns a sequence of 0, 1, 2 ,and 3.


Since the range() function returns a sequence of numbers, we can iterate over it using
a for loop. For example,
# iterate from i = 0 to i = 3
for i in range(4):
    print(i)

Output

0
1
2
3

Python While Loop


Python While Loop is used to execute a block of statements
repeatedly as long as a given condition is true. When the condition
becomes false, the line immediately after the loop in the program is
executed.
Syntax of while loop in Python
while expression:
statement(s)

# Python while loop
count = 0
while count < 3:
    count = count + 1
    print("Hello Geek")
Hello Geek
Hello Geek
Hello Geek
Questions and answers

UNIT IV Data Science


Session I Introducing data science

Define Data Science?


The term “data science” combines two key elements: “data” and “science.”
1. Data: It refers to the raw information that is collected, stored, and processed. In
today’s digital age, enormous amounts of data are generated from various
sources such as sensors, social media, transactions, and more. This data can
come in structured formats (e.g., databases) or unstructured formats (e.g., text,
images, videos).
2. Science: It refers to the systematic study and investigation of phenomena
using scientific methods and principles. Science involves analyzing data and
drawing conclusions based on evidence.

When we put these two elements together, “data+science” refers to the scientific
study of data. Data Science involves applying scientific methods, statistical
techniques, computational tools, and domain expertise to explore, analyze, and
extract insights from data. The term emphasizes the rigorous and systematic
approach taken to understand and derive value from vast and complex datasets.
Essentially, data science is about using scientific methods to unlock the potential of
data, uncover patterns, make predictions, and drive informed decision-making across
various domains and industries.

What is Data Science in Simple Words?


Imagine you’re scrolling through your favorite social media platform, and you notice
that certain types of posts always seem to grab your attention. Maybe it’s cute animal
videos, delicious food recipes, or inspiring travel photos.
Now, from the platform’s perspective, they want to keep you engaged and coming
back for more. This is where data science comes into play. They collect a ton of
information about what you like, share, and comment on. They use data science
techniques to analyze all this information to understand your preferences better.

Real-world Applications of Data Science


1. In Search Engines
The most useful application of Data Science is in search engines. When we want to
search for something on the internet, we mostly use search engines like Google,
Yahoo, Bing, etc. Data Science is used to make these searches faster and more relevant.

2. In Transport
Data Science has also entered real-time fields such as transport, for example
driverless cars. With the help of driverless cars, it is possible to reduce the number of accidents.
For example, in driverless cars the training data is fed into the algorithm and, with
the help of Data Science techniques, the data is analysed: what the speed limit is on
highways, busy streets, narrow roads, etc., and how to handle different situations
while driving.
3. In Finance
Data Science plays a key role in financial industries, which always face issues of
fraud and risk of losses. Financial industries therefore need to automate risk-of-loss
analysis in order to carry out strategic decisions for the company.
For example, Data Science is a major part of the stock market. In the stock market,
Data Science is used to examine past behaviour with past data in order to estimate
future outcomes. Data is analysed in such a way that it makes it possible
to predict future stock prices over a set timeframe.
4. In E-Commerce
E-commerce websites like Amazon, Flipkart, etc. use Data Science to provide a
better user experience with personalised recommendations.
For example, when we search for something on an e-commerce website, we get
suggestions similar to our past choices, and we also get recommendations based on
the most-bought products.
5. In Health Care
In the healthcare industry, Data Science acts as a boon. Data Science is used for:
 Detecting tumours.
 Drug discoveries.
 Medical Image Analysis.
 Virtual Medical Bots.
6. Image Recognition
Currently, Data Science is also used in image recognition. For example, when we
upload a photo with a friend on Facebook, Facebook suggests tagging the people
who are in the picture. This is done with the help of machine learning and Data
Science.
7. Data Science in Gaming
In most games where a user plays against a computer opponent, Data Science
concepts are used with machine learning so that, with the help of past data, the
computer improves its performance. Games like Chess, EA Sports titles, etc. use
Data Science concepts.
8. Autocomplete
The autocomplete feature is an important application of Data Science: the user types
a few letters or words and the rest of the line is completed automatically. In Gmail,
for example, when we are writing a formal mail, the Data Science concept of
autocomplete suggests an efficient way to complete the whole line. The autocomplete
feature is also widely used in search engines, social media, and various apps.

Session II Revisiting AI project cycle.



1. PROBLEM SCOPING
Problem Scoping means selecting a problem which we might want to solve
using our AI knowledge.
Problem Scoping is the process of identifying the scope of the problem (its
cause, nature or solution) that you wish to solve with the help of
your project.
The process of finalising the aim of a system or project means you scope the
problem that you wish to solve with the help of your project. This is "Problem
Scoping".
4WS PROBLEM CANVAS
The 4Ws Problem canvas helps you in identifying the key elements related to
the problem. Let us go through each of the blocks one by one.

Who?
The “Who” block helps you in analysing the people getting affected directly or
indirectly due to it. Under this, you find out who the ‘Stakeholders’ to this
problem are and what you know about them. Stakeholders are the people
who face this problem and would be benefitted with the solution.
What?
Under the “What” block, you need to look into what you have on hand. At this
stage, you need to determine the nature of the problem. What is the problem
and how do you know that it is a problem? Under this block, you also gather
evidence to prove that the problem you have selected actually exists.
Newspaper articles, Media, announcements, etc are some examples.
Where?
Now that you know who is associated with the problem and what the problem
actually is, you need to focus on the context / situation / location of the
problem.
Why?
You have finally listed down all the major elements that affect the problem
directly. Now it is clear who would benefit from the solution, what is to be
solved, and where the solution will be deployed. These three canvases now
become the base of why you want to solve this problem.

Data Acquisition
Data acquisition refers to the systematic process of capturing data from various
sources in the physical world, converting it into digital format, and making it
available for analysis. This process involves the use of sensors, instruments, and
technologies to measure and record parameters such as temperature, pressure,
voltage, sound, image, or any other measurable quantity.

Components of Data Acquisition


Let’s discuss the components of data acquisition in detail

 Sensors and Transducers: These devices are responsible for converting


physical phenomena, such as light, temperature, or pressure, into electrical
signals that can be processed by data acquisition systems.
 Signal Conditioning: Raw signals from sensors are often weak or noisy.
Signal conditioning involves amplification, filtering, and other techniques to
improve signal quality and accuracy.
 Data Converters: Analog-to-digital converters (ADCs) transform continuous
analog signals from sensors into discrete digital values that can be stored and
processed by computers.
 Data Logger or Computer Interface: This component establishes
communication between sensors and a computer system. It records and stores
the digital data for analysis.
 Software: Specialized software is used to control data acquisition systems,
visualize collected data, and perform analysis.

Data Exploration
What is data exploration?
Data exploration is a statistical process that lets you see how your data is distributed,
identify any outliers, and determine which statistical tests might be most appropriate.
When deciding what type of analysis or interpretation is most accurate for your data,
preliminary data exploration can help you understand its characteristics.

1. Descriptive analysis
Descriptive analysis offers the most basic overview of your data. For example, let’s say
you are an educator and have a spreadsheet of test scores for your learners.
Descriptive analysis would give you a summary of the data and key features
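As a rough sketch of such a summary (the marks below are invented), pandas can produce it in one step with describe():

import pandas as pd

# Hypothetical test scores for a small class
scores = pd.DataFrame({"marks": [45, 67, 89, 72, 55, 91]})

# describe() reports count, mean, std, min, quartiles and max
print(scores.describe())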

2. Visual analysis
Visual analysis helps you visualize your data's trends, distribution patterns, outliers, and
tendencies. This may be particularly useful if you have large data sets that are difficult
to fully understand with numbers alone. You can see the bigger picture of your data by
creating visual representations like graphs, charts, and plots.

3. Statistical analysis
Statistical analysis provides a deeper look into your data using mathematical tools.

Data Modeling
Data Modeling is the process of analyzing the data objects and their relationship to the
other objects. It is used to analyze the data requirements that are required for the
business processes. The data models are created for the data to be stored in a
database. The Data Model's main focus is on what data is needed and how we have
to organize data rather than what operations we have to perform.
Data Model is basically an architect's building plan. It is a process of documenting
complex software system design as in a diagram that can be easily understood. The
diagram will be created using text and symbols to represent how the data will flow. It is
also known as the blueprint for constructing new software .

Evaluation
Model evaluation is the process of using different evaluation metrics to understand a
machine learning model’s performance, as well as its strengths and weaknesses.

Classification

The most popular metrics for measuring classification performance include accuracy,
precision, confusion matrix, log-loss, and AUC (area under the ROC curve).

Accuracy measures how often the classifier makes the correct predictions, as it is the
ratio between the number of correct predictions and the total number of predictions.

Precision measures the proportion of predicted Positives that are truly Positive.
Precision is a good choice of evaluation metrics when you want to be very sure of
your prediction. For example, if you are building a system to predict whether to
decrease the credit limit on a particular account, you want to be very sure about the
prediction or it may result in customer dissatisfaction.
A confusion matrix (or confusion table) shows a more detailed breakdown of correct
and incorrect classifications for each class. Using a confusion matrix is useful when
you want to understand the distinction between classes, particularly when the cost of
misclassification might differ for the two classes, or you have a lot more test data on
one class than the other. For example, the consequences of making a false positive
or false negative in a cancer diagnosis are very different.
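A minimal sketch of these metrics using scikit-learn (assuming scikit-learn is installed; the labels below are invented toy data, where 1 is the Positive class):

from sklearn.metrics import accuracy_score, precision_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # labels predicted by a classifier

print("Accuracy :", accuracy_score(y_true, y_pred))    # correct predictions / total predictions
print("Precision:", precision_score(y_true, y_pred))   # true positives / predicted positives
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))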

Data Evaluation
Data evaluation may include the following tasks:

 identifying significant data gaps (if any)

 performing statistical evaluation

 developing visual representations of data, data reduction

Deployment
Deployment is the mechanism through which applications, modules, updates, and patches are
delivered from developers to users. The methods used by developers to build, test
and deploy new code will impact how fast a product can respond to changes in
customer preferences or requirements and the quality of each change.

Data formats used in Data Science


Different type of file formats:
CSV: CSV stands for Comma-Separated Values. As the name suggests, a CSV file
uses commas to separate values. In a CSV file, each line is a data record and each
record consists of one or more data fields separated by commas.
XLSX: An XLSX file is a Microsoft Excel Open XML Format Spreadsheet file. It can
be used to store any type of data, but it is mainly used to store financial data and to
create mathematical models.

ZIP: ZIP files are used as data containers; they store one or more files in
compressed form. They are widely used on the internet. After you download a ZIP
file, you need to unpack its contents in order to use it.
JSON and XML
JSON (JavaScript Object Notation) and XML (Extensible Markup
Language) are formats for storing and exchanging structured and
hierarchical data
SQL and NoSQL
SQL (Structured Query Language) and NoSQL (Not Only SQL) are types of databases
that store and query data.
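As a hedged illustration, several of these formats can be read directly with pandas (the file names below are placeholders, and read_excel additionally needs the openpyxl package):

import pandas as pd

df_csv = pd.read_csv("students.csv")     # comma-separated values
df_xlsx = pd.read_excel("marks.xlsx")    # Excel spreadsheet
df_json = pd.read_json("records.json")   # JSON data

print(df_csv.head())                     # preview the first five rows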

Session III Python for data science

What is NumPy?
NumPy is a Python library used for working with arrays.

Why Use NumPy?


In Python we have lists that serve the purpose of arrays, but they are
slow to process.

NumPy aims to provide an array object that is up to 50x faster than


traditional Python lists.

The array object in NumPy is called ndarray, it provides a lot of


supporting functions that make working with ndarray very easy.

Arrays are very frequently used in data science

Installation of NumPy
If you have Python and PIP already installed on a system, then
installation of NumPy is very easy.

Install it using this command:

C:\Users\Your Name>pip install numpy



If this command fails, then use a python distribution that already has
NumPy installed like, Anaconda, Spyder etc.

Import NumPy
Once NumPy is installed, import it in your applications by adding
the import keyword:

import numpy

Now NumPy is imported and ready to use.

import numpy

arr = numpy.array([1, 2, 3, 4, 5])

print(arr)
[1 2 3 4 5]

NumPy as np
NumPy is usually imported under the np alias.
alias: In Python alias are an alternate name for referring to the same
thing.
Create an alias with the as keyword while importing:
import numpy as np
Now the NumPy package can be referred to as np instead of numpy.
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

NumPy Array vs Python List

NumPy Array:
 It is provided by NumPy, the core Python library for scientific computing.
 It contains similar (homogeneous) datatypes.
 We need the NumPy library to access NumPy arrays.
 It is homogeneous.

Python List:
 It is provided by the core Python library (a built-in type).
 It can contain different types of datatypes.
 It is both homogeneous and heterogeneous.

What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating
data.
The name "Pandas" has a reference to both "Panel Data", and "Python
Data Analysis" and was created by Wes McKinney in 2008.

Why Use Pandas?


 Pandas allows us to analyze big data and make conclusions based on statistical
theories.
 Pandas can clean messy data sets, and make them readable and relevant.
 Relevant data is very important in data science.

What Can Pandas Do?


 Is there a correlation between two or more columns?
 What is average value?
 Max value?
 Min value?

Pandas is also able to delete rows that are not relevant or contain
wrong values, like empty or NULL values. This is called cleaning the data.
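A small sketch of these operations (the data below is invented):

import pandas as pd

df = pd.DataFrame({
    "hours_studied": [2, 4, 6, 8, None],
    "marks": [35, 48, 65, 80, 72]
})

print(df["marks"].mean())   # average value
print(df["marks"].max())    # max value
print(df["marks"].min())    # min value
print(df.corr())            # correlation between the columns
print(df.dropna())          # cleaning: drop rows with empty (NULL) values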

Installation of Pandas
If you have Python and PIP already installed on a system, then
installation of Pandas is very easy.

Install it using this command:

C:\Users\Your Name>pip install pandas

If this command fails, then use a python distribution that already has
Pandas installed like, Anaconda, Spyder etc.

import pandas as pd

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)

cars passings
0 BMW 3
1 Volvo 7
2 Ford 2

Pandas as pd
Pandas is usually imported under the pd alias.

alias: In Python alias are an alternate name for referring to the same thing.

Create an alias with the as keyword while importing:

import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.

Example

import pandas as pd

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

What is Matplotlib?
Matplotlib is a low level graph plotting library in python that serves as a
visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it freely.

Installation of Matplotlib
If you have Python and PIP already installed on a system, then
installation of Matplotlib is very easy.

Install it using this command:

C:\Users\Your Name>pip install matplotlib

If this command fails, then use a python distribution that already has
Matplotlib installed, like Anaconda, Spyder etc.

Example
Draw a line in a diagram from position (0,0) to position (6,250):

import matplotlib.pyplot as plt


import numpy as np

xpoints = np.array([0, 6])


ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()

Homework- Types of chart



Basic Statistics

Mean, median, and mode


Mean, median, and mode are different measures of center in a numerical
data set. They each try to summarize a dataset with a single number to
represent a "typical" data point from the dataset.

Calculating the mean


There are many different types of mean, but usually when people say mean,
they are talking about the arithmetic mean.

The arithmetic mean is the sum of all of the data points divided by the number
of data points.

Finding the median


The median is the middle point in a dataset—half of the data points are smaller
than the median and half of the data points are larger.

To find the median:

 Arrange the data points from smallest to largest.


 If the number of data points is odd, the median is the middle data point
in the list.
 If the number of data points is even, the median is the average of the
two middle data points in the list.

The median is 2.

Finding the mode


The mode is the most commonly occurring data point in a dataset. The mode
is useful when there are a lot of repeated values in a dataset. There can be
no mode, one mode, or multiple modes in a dataset.

Range
The range tells you the spread of your data from the lowest to the
highest value in the distribution. It’s the easiest measure of variability to
calculate.

To find the range, simply subtract the lowest value from the highest
value in the data set.

Variance

In order to understand standard deviation we must first know variance.

The numbers 7, 13 and 22 have mean (7 + 13 + 22) / 3 = 14.

Each of these numbers has a difference (distance) from the mean:

14 - 7 = 7

14 - 13 = 1

14 - 22 = -8

The sum of the squares of these differences is 49 + 1 + 64 = 114.

Count of numbers = 3

Variance = 114 / 3 = 38

Standard Deviation - It is the square root of the variance.

The square root of 38 is 6.164, approximately 6.

Functions in Python
import numpy as np
arry1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(arry1)

Other NumPy functions:

np.mean(arry1), np.median(arry1), np.var(arry1), np.std(arry1)
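To connect this with the variance example above, a minimal sketch using the numbers 7, 13 and 22:

import numpy as np

data = np.array([7, 13, 22])

print(np.mean(data))     # 14.0  (arithmetic mean)
print(np.median(data))   # 13.0  (middle value)
print(np.var(data))      # 38.0  (variance)
print(np.std(data))      # about 6.16 (standard deviation, the square root of the variance)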

Session IV K-Nearest Neighbour Model.


The K-Nearest Neighbors (K-NN) algorithm is a popular
Machine Learning algorithm used mostly for solving
classification problems.
In this article, you'll learn how the K-NN algorithm works with practical
examples.

We'll use diagrams, as well as sample data, to show how you can classify
data using the K-NN algorithm.

How Does the K-Nearest Neighbors Algorithm


Work?
The K-NN algorithm compares a new data entry to the values in a
given data set (with different classes or categories).

Based on its closeness or similarities in a given range (K) of


neighbors, the algorithm assigns the new data to a class or category in
the data set (training data).

Let's break that down into steps:

Step #1 - Assign a value to K.


Step #2 - Calculate the distance between the new data entry and all
other existing data entries (you'll learn how to do this shortly). Arrange
them in ascending order.

Step #3 - Find the K nearest neighbors to the new entry based on


the calculated distances.
Step #4 - Assign the new data entry to the majority class in the
nearest neighbors.
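These steps are implemented for us by libraries such as scikit-learn. A minimal sketch (assuming scikit-learn is installed; the points and labels are invented):

from sklearn.neighbors import KNeighborsClassifier

# Toy training data: [x, y] points belonging to class "A" or class "B"
X_train = [[1, 1], [1, 2], [2, 2], [6, 6], [7, 6], [6, 7]]
y_train = ["A", "A", "A", "B", "B", "B"]

# K = 3 nearest neighbours
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Classify new data entries by majority vote of their 3 nearest neighbours
print(model.predict([[2, 1]]))   # expected ['A'] - close to the A cluster
print(model.predict([[6, 5]]))   # expected ['B'] - close to the B cluster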
K-Nearest Neighbors Classifiers and Model Example With
Diagrams

In the above figure, we have two classes: Class A (the yellow family) and Class B
(the purple family). The dataset is used to train the k-NN model, and we check
what happens for two different values of K, namely K = 3 (three nearest
neighbours) and K = 6 (six nearest neighbours). When K = 3, two of the nearest
neighbours belong to the purple class and one belongs to the yellow class, so by
majority vote the new point is assigned to the purple class. Similarly, when K = 6,
four of the nearest neighbours belong to the yellow class and two belong to the
purple class, so by majority vote the new point is assigned to the yellow class.
This is how k-NN works.

See Real-world applications of KNN

Preprocessing of data: In the preprocessing of data many


missing values can be found in datasets. Missing data
imputation is a procedure that uses the KNN algorithm to
estimate missing values.

Recognizing patterns: The KNN algorithm's capacity to group similar data

points makes it useful for recognizing patterns.

Recommendation systems: for example, to propose content that


a user is more likely to view based on what other users watch.

Computer Vision: here for picture classification, the KNN


algorithm is used. It's important in various computer vision
applications since it can group comparable data points, such as
cats and dogs in separate classes.

In the medical field

KNN can predict whether a patient is likely to be hospitalised due to a

heart attack, and whether they are at risk of a second heart attack. The
prediction is based on demographic, diet, and clinical measurements for
that patient.

UNIT V Computer Vision


Introducing Computer vision and its Application
What is Computer Vision?
Computer vision is a field of study within artificial intelligence (AI) that focuses on
enabling computers to interpret and extract information from images and videos, in a
manner similar to human vision. It involves developing algorithms and techniques to
extract meaningful information from visual inputs and make sense of the visual world.
Computer Vision Examples:
Here are some examples of computer vision:
 Facial recognition: Identifying individuals through visual analysis.
 Self-driving cars: Using computer vision to navigate and avoid obstacles.
 Robotic automation: Enabling robots to perform tasks and make decisions
based on visual input.
 Medical anomaly detection: Detecting abnormalities in medical images for
improved diagnosis.
 Sports performance analysis: Tracking athlete movements to analyze and
enhance performance.
 Manufacturing fault detection: Identifying defects in products during the
manufacturing process.
 Agricultural monitoring: Monitoring crop growth, livestock health, and weather
conditions through visual data

Session II Computer Vision Concept and


OpenCV
Concept of Computer vision-Image basics
https://www.javatpoint.com/dip-color-codes-conversion

Types of Images
There are three types of images. They are as following:

1. Binary Images
It is the simplest type of image. It takes only two values i.e, Black and
White or 0 and 1. The binary image consists of a 1-bit image and it takes
only 1 binary digit to represent a pixel. Binary images are mostly used
for general shape or outline.

For Example: Optical Character Recognition (OCR).

Binary images are generated using a threshold operation. When a pixel is
above the threshold value, it is turned white ('1'), and when it is below the
threshold value, it is turned black ('0').
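A minimal OpenCV sketch of this threshold operation (the file name is a placeholder and the opencv-python package is assumed to be installed):

import cv2

# Load a hypothetical image directly in grayscale mode
img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

# Pixels above 127 become white (255); pixels below become black (0)
ret, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

cv2.imwrite("binary.jpg", binary)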

2. Gray-scale images
Grayscale images are monochrome images, which means they have only one
colour channel. Grayscale images do not contain any information about colour. Each
pixel stores one of the available grey levels.

A normal grayscale image contains 8 bits/pixel data, which has 256


different grey levels. In medical images and astronomy, 12 or 16
bits/pixel images are used.

3. Colour images
Colour images are three band monochrome images in which, each band
contains a different color and the actual information is stored in the
digital image. The color images contain gray level information in each
spectral band.

The images are represented as red, green and blue (RGB images). And
each color image has 24 bits/pixel means 8 bits for each of the three
color band(RGB).

8-bit color format


8-bit colour is used for storing image information in a computer's memory
or in an image file. In this format, each pixel is represented by one 8-bit
byte. It has a range of 0-255 colours, in which 0 is used for black, 255 for
white and 127 for grey. The 8-bit colour format is also known as a
grayscale image. Initially, it was used by the UNIX operating system.

16-bit color format


The 16-bit color format is also known as high color format. It has 65,536
different color shades. It is used in the system developed by Microsoft.
The 16-bit color format is further divided into three formats which are
Red, Green, and Blue also known as RGB format.

24-bit color format


The 24-bit color format is also known as the true color format. The 24-bit
color format is also distributed in Red, Green, and Blue. As 24 can be
equally divided on 8, so it is distributed equally between 3 different
colors like 8 bits for R, 8 bits for G and 8 bits for B.

Different color codes


As we know, colour here is in 24-bit format, which means 8 bits of red,
8 bits of green and 8 bits of blue. By changing the quantity of these 3 portions,
you can make different colours.

Binary color format


Color: Black


Decimal code : RGB(0,0,0)

Color: White

Decimal : RGB(255, 255, 255)



Resolution
Image resolution is typically described in PPI, which refers to how many pixels are
displayed per inch of an image.
Higher resolutions mean that there are more pixels per inch (PPI), resulting in more
pixel information and creating a high-quality, crisp image.
Images with lower resolutions have fewer pixels per inch, so they appear less sharp.

Image Classification

It refers to the act of identifying and classifying a given image as


belonging to one of a set of predefined categories or classes.

Localisation

localization is the process of identifying the location of an object in an image or


video. It is a common task in the field of computer vision, and is used in a wide
range of applications including image and video analysis.

Object localization is an important task in computer vision, as it allows computers to


recognize and understand the content of an image or video. It is a complex and
active area of research, and new approaches and techniques are being developed to
improve the accuracy and effectiveness of object localization algorithms.

What is Object Detection ?


Object detection, within computer vision, involves combined action of
localization and classification within images or videos. These algorithms
commonly rely on machine learning or deep learning methods to
generate valuable outcomes.

So instead of classifying, which type of dog is present in these images, we


have to actually locate a dog in the image. That is, I have to find out
where is the dog present in the image? Is it at the center or at the bottom
left? And so on. Now the next question comes into the human mind, how
can we do that? So let’s start.
Well, we can create a box around the dog that is present in the image and
specify the x and y coordinates of this box.

for now, consider that the location of the object in the image can be represented as
coordinates of these boxes. So this box around the object in the image is formally
known as a bounding box. Now, this becomes an image localization problem where
we are given a set of images and we have to identify where the object is present in
the image.
Note that here we have a single class. What if we have multiple classes? Here is an
example:

In this image, we have to locate the objects in the image but note that all the
objects are not dogs. Here we have a dog and a car. So we not only have to locate
the objects in the image but also classify the located object as a dog or Car. So this
becomes an object detection problem.

OpenCV- Introduction
Opencv is a huge open-source library for computer vision, machine learning, and image
processing. Now, it plays a major role in real-time operation which is very important in
today’s systems. By using it, one can process images and videos to identify objects,
faces, or even the handwriting of a human.
OpenCV allows you to perform various operations on an image.
 Read the Image: OpenCV helps you read the image from a file or directly from a
camera to make it accessible for further processing.
 Image Enhancement: You can enhance an image by adjusting its brightness,
sharpness or contrast. This is helpful for improving the visual quality of the image.
 Object Detection: As you can see in the image below, objects such as a bracelet, a
watch, patterns or faces can be detected by using OpenCV. This also includes
recognising faces, shapes or other objects.
 Image Filtering: You can change an image by applying various filters such as
blurring or sharpening.
 Drawing on the Image: OpenCV allows you to draw text, lines and shapes on the
images.

 Saving the Changed Images: After processing, you can save the modified images
for future analysis.
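A small sketch of a few of these operations (the file names are placeholders and the opencv-python package is assumed to be installed):

import cv2

img = cv2.imread("photo.jpg")                   # read the image from a file

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # convert to grayscale
blur = cv2.GaussianBlur(img, (5, 5), 0)         # image filtering: blurring

# draw a rectangle and some text on the image
cv2.rectangle(img, (50, 50), (200, 200), (0, 255, 0), 2)
cv2.putText(img, "Detected", (50, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

cv2.imwrite("result.jpg", img)                  # save the changed image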
Session III Understanding convolution operator and CNN

Convolution is the process of transforming an image by applying a
special array called a kernel over each pixel and its local neighbours
across the entire image. The kernel plays an important role here; its size
and values depend upon the type of operation being performed.

Convolution operation-
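As a rough illustration of the convolution operation, here is a minimal NumPy sketch that slides a 3x3 kernel over a small 5x5 image (the pixel and kernel values are invented):

import numpy as np

image = np.array([
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5],
])
kernel = np.array([
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
])

# Slide the kernel over every 3x3 neighbourhood and take the weighted sum
output = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        region = image[i:i+3, j:j+3]
        output[i, j] = np.sum(region * kernel)

print(output)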

Convolutional Neural Networks (CNN)


Layers - It has three layers:

Input layer - acquires the data

Hidden layer - processes the input and carries out the task
Output layer - provides the final output

Layers of CNN Architecture

1. Convolutional Layer

This layer is the first layer that is used to extract the various features
from the input images. In this layer, the mathematical operation of
convolution is performed between the input image and a filter of a
particular size MxM. By sliding the filter over the input image, the dot
product is taken between the filter and the parts of the input image
with respect to the size of the filter (MxM).

2. Pooling Layer - Pooling layers divide the input data into small regions, called
pooling windows or receptive fields, and perform an aggregation operation, such as
taking the maximum or average value, within each window. This aggregation reduces
the size of the feature maps, resulting in a compressed representation of the input
data.

3. Fully Connected Layer -

It takes the features extracted by the previous layers, assigns weights
to them and, based on these weighted features, assigns a class label
to the image.
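A hedged sketch of these layers using Keras (assuming TensorFlow/Keras is installed; the input shape and number of classes are invented):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 3)),  # convolutional layer
    layers.MaxPooling2D((2, 2)),                                            # pooling layer
    layers.Flatten(),
    layers.Dense(32, activation="relu"),                                    # fully connected layer
    layers.Dense(10, activation="softmax"),                                 # output layer for 10 classes
])

model.summary()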

Uses/Application of CNN

UNIT VI INTRODUCTION TO NLP

Natural language processing (NLP) is an area of computer science


and artificial intelligence concerned with the interaction between
computers and humans in natural language. The ultimate goal of NLP is
to help computers understand language as well as we do. It is the
driving force behind things like virtual assistants, speech recognition,
sentiment analysis, automatic text summarization, machine translation and
much more.

The field is divided into three parts:

 Speech recognition — the translation of spoken language into text.


 Natural language understanding — a computer’s ability to understand language.
 Natural language generation — the generation of natural language by a computer, e.g. when
we ask Siri or Alexa for some information, it replies in speech form.

1.A chatbot- is a computer program that simulates and processes human


conversation (either written or spoken), allowing humans to interact with digital
devices . Chatbots work in two simple steps. First, they identify the meaning of the
question asked and collect all the data from the user that may be required to answer
the question. Then they answer the question appropriately.

2. Autocomplete in Search Engines


Have you noticed that search engines tend to guess what you are typing and
automatically complete your sentences? For example, On typing “game” in Google,
you may get further suggestions for “game of thrones”, “game of life” or if you are
interested in maths then “game theory”. All these suggestions are provided using
autocomplete that uses Natural Language Processing to guess what you want to ask.

3. Voice Assistants

These days voice assistants are all the rage! Whether it's Siri, Alexa, or Google
Assistant, almost everyone uses one of these to make calls, place reminders,
schedule meetings, set alarms, surf the internet, etc. These voice assistants have
made life much easier. But how do they work? They use a complex combination of
speech recognition, natural language understanding, and natural language
processing to understand what humans are saying and then act on it. The long
term goal of voice assistants is to become a bridge between humans and the
internet and provide all manner of services based on just voice interaction.
However, they are still a little far from that goal seeing as Siri still can’t understand
what you are saying sometimes!

4. Language Translator

Want to translate a text from English to Hindi but don’t know Hindi? Well, Google
Translate is the tool for you! While it’s not exactly 100% accurate, it is still a great
tool to convert text from one language to another.

5. Sentiment analysis is the process of analyzing digital text to determine if the


emotional tone of the message is positive, negative, or neutral. Today, companies
have large volumes of text data like emails, customer support chat transcripts,
social media comments, and reviews.

6. Grammar Checkers

Grammar and spelling are very important factors while writing professional
reports for your superiors, or even assignments for your lecturers. After all,
having major errors may get you fired or failed! That’s why grammar and
spell checkers are a very important tool for any professional writer. They
can not only correct grammar and check spellings but also suggest better
synonyms and improve the overall readability of your content.

7. Email Classification and Filtering


Email services use natural language processing to identify the contents of
each Email with text classification so that it can be put in the correct
section. This method is not perfect since there are still some Promotional
newsletters in Primary, but its better than nothing. In more advanced
cases, some companies also use specialty anti-virus software with natural
language processing to scan the Emails and see if there are any patterns
and phrases that may indicate a phishing attempt on the employees.

Session II AI Project Cycle NLP

1. PROBLEM SCOPING
Let us look at various factors around this problem through the 4Ws problem
canvas.
Who Canvas – Who has the problem? Or who are the stakeholders?

What Canvas – What is the nature of the problem?



Where Canvas – Where does the problem arise? or What is the


context/situation in which the stakeholders experience this problem?
Why Canvas – Why do you think it is a problem worth solving, and how
would it improve their situation?

2. DATA ACQUISITION
To understand the sentiments of people, we need to collect their
conversational data so the machine can interpret the words that they use
and understand their meaning. Such data can be collected from various
means:
1. Surveys
2. Observing the therapist’s sessions
3. Databases available on the internet
4. Interviews, etc.
3. DATA EXPLORATION
Once the data has been collected, it needs to be processed and cleaned.
4. MODELLING
Once the text has been normalised, it is then fed to an NLP based AI model.
Depending upon the type of chatbot we try to make, there are a lot of AI
models available which help us build the foundation of our project.
5. EVALUATION
The model trained is then evaluated and the accuracy for the same is
generated on the basis of the relevance of the answers which the machine
gives to the user’s responses

How do chatbots work?


Chatbots process collected data and often are trained on that data using AI
and machine learning (ML), NLP, and rules defined by the developer. This
allows the chatbot to provide accurate and efficient responses to all
requests.

Who builds chatbots?


Chatbots tend to be built by chatbot developers, but not without a team of machine
learning and AI engineers, and experts in NLP. Here are a few careers involved in
building chatbots.

Chatbot developer: These professionals build a conversational experience for users


with AI, machine learning, and natural language processing.

AI engineer: AI engineers build models using machine learning algorithms and deep
learning neural networks that can be used to make decisions, such as in the production
of chatbots.

NLP engineer: NLP engineers create programs that can understand human languages
and respond accordingly, using a combination of computer science and AI.

Types Of Chatbot –
Mainly, chatbots are classified into three types: Rule-Based, AI-Based and Hybrid.

1. Rule-Based Chatbot: It is also known as a decision-tree bot. It has a set of

predefined responses from a database for a particular query, based on the
keywords uttered in the query. So it can be tedious to extract a very lengthy and
informative response from the bot. It works just like a decision tree: it gives a response
based on the keywords extracted from the user's utterance. Most of them don't
use NLP/NLU. The advantage of using this type is that it is economical.
2. AI Based Chatbot: They are built using ML, NLP/NLU. It also provides answers
from a given database but the thing that makes it unique is that it becomes more
intelligent over time with the help of past interactions with the users.
3. Hybrid Chatbot: These are the most common type of chatbot. It is basically a mix
of both rule-based and AI-based chatbots. They interact with humans and provide
a personalized reply, i.e. the chatbot can start the conversation with the user, but when the
conversation gets deeper, the chatbot can be replaced by a human being.

Session III Human vs. Computer Languages and NLP


Similarities in Human language and Computer language

Differences in Human language and Computer language



Concept of NLP
 NLP systems break up the problem into very small pieces to simplify it.
 They remove complexity by removing extra information.
 They use AI to solve each smaller piece separately.
 They tie together the processed results.
 Finally, they convert the processed result to numbers so that computers can
understand it.
Text Normalisation

 Text normalization simplifies the text for further processing.


 Text normalization divides the text into smaller components called
tokens and group related tokens together.
Steps
1. Sentence segmentation - The whole text is divided into individual
sentences.
2. Tokenisation - Each sentence is then further divided into tokens.
3. Remove stop words, special characters and numbers - In this step the
unnecessary tokens are removed from the token list.
4. Stemming - In this step the remaining words are reduced to their root
words, e.g. 'charger' is reduced to 'charge'.
5. Converting text to a common case - The whole text is converted into
lower case.
6. Lemmatisation -
the process of reducing the different forms of a word to one single
form, for example, reducing "builds", "building", or "built" to the
lemma "build". Lemmatization is the process of grouping inflected
forms together as a single base form.
7. Finally convert to numbers.

Sentence segmentation- Sentence segmentation is a fundamental process in


natural language processing. It involves breaking down a given text into
individual sentences

Tokenisation-
Tokenization, in the realm of Natural Language Processing (NLP) and
machine learning, refers to the process of converting a sequence of text into
smaller parts, known as tokens. These tokens can be as small as characters
or as long as words.

Remove stop words, special characters and numbers - Stop words do not add
much meaning to a sentence.
Ex- and, the, a, an

Stemming-
The process of removing affixes from a word so that we are left with the
stem of that word is called stemming. For example, consider the words
‘run’, ‘running’, and ‘runs’, all convert into the root word ‘run’ after
stemming is implemented on them.

Lemmatisation-
Lemmatization is a text pre-processing technique used in natural language
processing (NLP) models to break a word down to its root meaning to identify
similarities. For example, a lemmatization algorithm would reduce the word
better to its root word.

Case Normalisation-
Case normalisation refers to the conversion of all the words to the same case
(often lowercase); finally the text is converted to numbers.
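A rough sketch of these steps using NLTK (assuming NLTK and its 'punkt', 'stopwords' and 'wordnet' data have been downloaded; the sentence is invented):

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The chargers are charging. We are building better models!"

sentences = sent_tokenize(text)                              # sentence segmentation
tokens = [w for s in sentences for w in word_tokenize(s)]    # tokenisation

stop = set(stopwords.words("english"))
tokens = [t.lower() for t in tokens
          if t.isalpha() and t.lower() not in stop]          # remove stop words/punctuation, lower case

print([PorterStemmer().stem(t) for t in tokens])             # stemming
print([WordNetLemmatizer().lemmatize(t) for t in tokens])    # lemmatisation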

Bag of Words
It is a representation of text that describes the occurrence of words within a
document.
A Bag of Words contains two things:
1. A vocabulary of known words
2. A measure of the presence of known words
The Bag of Words model therefore works on a collection of words.
How does BoW work, and how do we implement it?
Here are the steps involved when we want to implement the Bag of
Word Model:
 Preprocess the data: We should convert the text into lowercase,
and we should remove all non-word characters and punctuation.
 Finding the frequent words: The vocabulary should be defined by
finding the frequency of each word in the document. Each sentence
should be tokenized into words, and we should count the number of
occurrences of the word.

 Model construction: We should construct the model by building a vector


to determine if a word is a frequent word. If it is a frequent word, it can be
set as 1, else 0.
 Output: We can now generate the output.

1. Data Collection:
We should consider some lines of text as different documents that need to
be vectorized:
The cat danced
The cat danced on a chair
The cat danced with a chair
2. Determine the vocabulary:
The vocabulary is the set of all words found in the document. These are the
only words that are found in the documents above.
3. Counting:
The vectorization process will involve counting the number of times every
word appears:

Document                      the  cat  danced  on  chair  a  with
The cat danced                 1    1     1      0    0    0   0
The cat danced on a chair      1    1     1      1    1    1   0
The cat danced with a chair    1    1     1      0    1    1   1

This will generate a seven-length vector for each document.

The cat danced: [1,1,1,0,0,0,0]

The cat danced on a chair: [1,1,1,1,1,1,0]

The cat danced with a chair: [1,1,1,0,1,1,1]


Now we can see that the BoW vector only has information about which
words occurred and how many times, without any contextual information
about where they occurred.
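The same count vectors can be produced with scikit-learn's CountVectorizer (a minimal sketch, assuming scikit-learn is installed; note that the columns come out in alphabetical order of the vocabulary):

from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "The cat danced",
    "The cat danced on a chair",
    "The cat danced with a chair",
]

# The default token pattern drops one-letter words such as "a", so we widen it
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
bow = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())   # the vocabulary
print(bow.toarray())                        # one count vector per document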
4. Managing vocabulary:
With the above example, we can see that as the vocabulary grows, so does
the vector representation. When we consider large documents, the vector
length can stretch up to millions.
As every document contains only a few of the known words, this creates a lot of
empty spots with 0s, called sparse vectors. Since such vectors can make modeling
cumbersome for traditional algorithms, there are cleaning methods for reducing the
vocabulary size. These include ignoring punctuation, fixing misspelled words, and
ignoring common words like 'a', 'of', etc.
5. Scoring words:
Scoring words is attaching a numerical value for marking the occurrences
of the words. In the above example, scoring was binary: it means the
presence or absence of the words. There are other methods as well:
Counts: The number of times each word appears in a document.
Frequencies: Frequencies are the calculation of word frequency in a
document in contrast to the total number of words in that document.
Applications of TF-IDF
Document Classification - Helps in classifying the type of document.
Topic Modelling - It helps in predicting the topic for a corpus.
Information Retrieval System - To extract the important information out
of a corpus.
Stop word filtering - Helps in removing the unnecessary words out of a
text body.
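TF-IDF stands for Term Frequency-Inverse Document Frequency. As a hedged sketch, its weights can be computed with scikit-learn's TfidfVectorizer in the same way as the Bag of Words counts:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The cat danced",
    "The cat danced on a chair",
    "The cat danced with a chair",
]

tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)

# Words that appear in every document (cat, danced, the) get lower weights;
# rarer words (on, with) get higher weights.
print(tfidf.get_feature_names_out())
print(weights.toarray().round(2))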
