
Chapter 4

Introduction to Machine
Learning
4.1 What is machine learning?
4.2 Why learn? When is it required?
4.3 Types of Machine Learning: Supervised learning, Unsupervised learning, Reinforcement learning
4.4 Comparison between a traditional software algorithm and a machine learning algorithm
• Introduction To Machine Learning
• The term Machine Learning was coined by Arthur Samuel in 1959.
• “It is the field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur Samuel, 1959
• When machine learning is seen as a process: “Machine learning is the process by which a computer can work more accurately as it collects and learns from the data it is given.” – Mike Roberts

Example: the autocomplete feature on a cell phone when typing text messages.


• To give you a better understanding of how important Machine Learning is, let’s list a few Machine Learning applications:
• Netflix’s Recommendation Engine: At the core of Netflix is its well-known recommendation engine. Over 75% of what you watch is recommended by Netflix, and these recommendations are made by implementing Machine Learning.
• Facebook’s Auto-tagging feature: The logic behind Facebook’s DeepFace face verification system is Machine Learning and Neural Networks. DeepFace studies the facial features in an image to tag your friends and family.
• Amazon’s Alexa: Alexa, which is based on Natural Language Processing and Machine Learning, is an advanced virtual assistant that does more than just play songs from your playlist. It can book you an Uber, connect with other IoT devices at home, track your health, etc.
• Google’s Spam Filter: Gmail makes use of Machine Learning to filter out spam
messages. It uses Machine Learning algorithms and Natural Language Processing
to analyze emails in real-time and classify them as either spam or non-spam.
Features of Machine Learning
• 1) It uses data to detect patterns in a dataset and adjust the program
actions accordingly.
• 2) It focuses on development of computer programs that can teach
themselves to grow and change when exposed to new data.
• 3) It enables computers to find hidden insights using iterative algorithms without being explicitly programmed.
• 4) It is a method of data analysis that automates analytical model
building.
What is Machine Learning?

Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used to learn patterns
from data and draw significant information from it. It is the logic behind a Machine Learning model. An
example of a Machine Learning algorithm is the Linear Regression algorithm.

Model: A model is the main component of Machine Learning. A model is trained by using a Machine
Learning Algorithm. An algorithm maps all the decisions that a model is supposed to take based on the
given input, in order to get the correct output.
• Predictor Variable: One or more features of the data that are used to predict the output.
• Response Variable: The feature, or output variable, that needs to be predicted using the predictor variable(s).
• Training Data: The Machine Learning model is built using the training
data. The training data helps the model to identify key trends and
patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate
how accurately it can predict an outcome. This is done by the testing
data set.
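To make these terms concrete, here is a minimal Python sketch (assuming the scikit-learn library is available; the tiny experience/salary dataset is invented purely for illustration) showing a predictor variable, a response variable, training data, and testing data:

```python
# Minimal sketch of the terminology above; the small dataset is made up.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # predictor variable: years of experience
y = np.array([30, 35, 42, 48, 55, 61, 68, 74])          # response variable: salary (in thousands)

# Training data builds the model; testing data evaluates it on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression()        # the algorithm
model.fit(X_train, y_train)       # training produces the model
print(model.predict(X_test))      # predictions on the testing data
```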
• Broadly, the first step in a machine learning process is to feed the computer a large amount of data. The system is then trained to recognize hidden trends and insights in this data.
• These observations are then used to build a machine learning model, by applying an algorithm, in order to address the problem at hand.
Machine Learning Process
• Building a predictive model is a step in the machine learning process
that can be utilized to solve problems.
• To better grasp the machine learning process, let’s assume that you have been given a problem that needs to be solved using machine learning.
• Step 1: Define the objective of the Problem Statement
• At this step, we must understand what exactly needs to be predicted.
In our case, the objective is to predict the possibility of rain by
studying weather conditions. At this stage, it is also essential to take
mental notes on what kind of data can be used to solve this problem
or the type of approach you must follow to get to the solution.
• Step 2: Data Gathering
• At this stage, you must be asking questions such as,
• What kind of data is needed to solve this problem?
• Is the data available?
• How can I get the data?
• Once you know the types of data that are required, you must understand how you can obtain this data. Data collection can be done manually or by web scraping.
• The data needed for weather forecasting includes measures such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc. Such data must be collected and stored for analysis.
• Step 3: Data Preparation
• The data you collected is almost never in the right format. You will
encounter a lot of inconsistencies in the data set such as missing
values, redundant variables, duplicate values, etc. Removing such
inconsistencies is very essential because they might lead to wrongful
computations and predictions.
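As a rough illustration of this cleaning step, a hypothetical pandas sketch (the file name and column names here are assumptions, not taken from the slides):

```python
# Sketch of data preparation; "weather.csv" and its columns are hypothetical.
import pandas as pd

df = pd.read_csv("weather.csv")

df = df.drop_duplicates()                    # remove duplicate rows
df = df.dropna(subset=["humidity"])          # drop rows missing a key value
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())  # impute missing values
df = df.drop(columns=["station_id"])         # drop a redundant variable

print(df.isna().sum())                       # confirm no missing values remain
```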
• Step 4: Exploratory Data Analysis
• It is the brainstorming stage of Machine Learning. Data Exploration
involves understanding the patterns and trends in the data. At this
stage, all the useful insights are drawn and correlations between the
variables are understood.
• When it comes to predicting rainfall, we are aware that there is a
good chance of rain if the temperature has dropped. At this point, it is
necessary to comprehend and map these correlations.
• Step 5: Building a Machine Learning Model
• The Machine Learning Model is constructed using all of the
conclusions and trends discovered during Data Exploration. The data
set is divided into training and testing halves at the start of this stage.
The model will be developed and examined using the training data.
The machine learning algorithm that is being used forms the basis of
the model's reasoning.
• In the case of predicting rainfall, since the output will be in the form
of True (if it will rain tomorrow) or False (no rain tomorrow), we can
use a Classification Algorithm such as Logistic Regression.
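A minimal sketch of this step for the rain example, assuming scikit-learn; the humidity/pressure features and their values are invented for illustration:

```python
# Step 5 sketch: train a classification model to predict rain (1) or no rain (0).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = np.array([[85, 1002], [60, 1015], [90, 998], [55, 1020],
              [78, 1005], [65, 1012], [95, 995], [50, 1022]])  # [humidity %, pressure hPa]
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])                          # 1 = rain tomorrow, 0 = no rain

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)        # the model learns from the training half
print(clf.predict(X_test))       # 0/1 (False/True) predictions for the testing half
```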

• Step 6: Model Evaluation & Optimization


• It's finally time to test the model after developing it using the training
data set. The testing data set is used to evaluate the model's
effectiveness and degree of outcome prediction accuracy.
• Once the accuracy has been determined, any additional improvements can be made to the model. Its performance can be increased by using techniques such as cross-validation and parameter tuning.
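Continuing the rain example, a hedged sketch of this evaluation step with scikit-learn (it assumes the clf, X_train, X_test, y_train, y_test variables from the Step 5 sketch above):

```python
# Step 6 sketch: evaluate the trained classifier and tune it.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Cross-validation: average accuracy over repeated train/validation splits.
print("cv accuracy:", cross_val_score(LogisticRegression(), X_train, y_train, cv=2).mean())

# Parameter adjustment: search over the regularization strength C.
grid = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(X_train, y_train)
print("best C:", grid.best_params_)
```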
• Step 7: Predictions
• Once the model has been evaluated and improved, it is used to make predictions. The output can be either a Continuous Quantity (e.g., the predicted value of a stock) or a Categorical Variable (such as True or False).
• In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.
• Machine Learning Types

Supervised Learning
Unsupervised Learning
Reinforcement Learning
Supervised Learning

• In supervised learning, we instruct or train the machine using well-labeled data. For example, as children we all needed assistance to tackle math problems: our teachers explained addition to us and showed us how to do it. Supervised learning is similar; it is learning that includes a guide. The labelled data set is the teacher that trains the machine to recognize patterns in the data. The labelled data set is the training data set.
What is data labeling?

• In machine learning, data labeling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn from it.
• Consider a collection of Tom and Jerry photographs. Here, we feed the computer these photographs with the intention of having it recognize and divide the images into two categories (Tom images and Jerry images).
• We label the training data that we feed the model, informing it, "This is how Tom looks, and this is Jerry." By doing this, you are using labelled data to train the computer.
• With the aid of labelled data, there is a clearly defined training phase in supervised learning.
Linear Regression vs Logistic Regression

• Linear Regression is used for solving Regression problems, whereas Logistic Regression is used for solving Classification problems.
• The goal of Linear Regression is to find the best-fit line that can accurately predict the output for a continuous dependent variable.
• If a single independent variable is used for prediction, it is called Simple Linear Regression; if more than one independent variable is used, it is called Multiple Linear Regression.
• By finding the best-fit line, the algorithm establishes the relationship between the dependent variable and the independent variable(s), and this relationship should be linear in nature.
• The output of Linear Regression should only be continuous values such as price, age, salary, etc.
• For example, with the dependent variable (salary) on the y-axis and the independent variable (experience) on the x-axis, the regression line can be written as:
• y = a0 + a1x + ε, where a0 and a1 are the coefficients and ε is the error term.
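A minimal sketch of recovering these coefficients from data (the experience/salary numbers are invented for illustration):

```python
# Fit the best-fit line y = a0 + a1*x by least squares on made-up data.
import numpy as np

experience = np.array([1, 2, 3, 4, 5])     # x: years of experience
salary = np.array([32, 38, 45, 51, 58])    # y: salary in thousands

a1, a0 = np.polyfit(experience, salary, deg=1)   # slope and intercept of the best-fit line
print(f"y = {a0:.2f} + {a1:.2f} * x")
print("prediction for x = 6:", a0 + a1 * 6)
```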


• Logistic Regression is used to predict a categorical dependent variable with the help of independent variables.
• The output of a Logistic Regression problem can only be between 0 and 1.
• Logistic Regression can be used where the probability of one of two classes is required, such as whether it will rain today or not: either 0 or 1, true or false, etc.
• Logistic Regression is based on the concept of Maximum Likelihood Estimation. According to this estimation, the observed data should be the most probable.
• In Logistic Regression, we pass the weighted sum of inputs through an activation function that maps values to the range between 0 and 1. This activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve. Its equation is given below.

The equation for logistic regression:
• p = 1 / (1 + e^-(a0 + a1x)), where p is the predicted probability that the output belongs to class 1; equivalently, log(p / (1 − p)) = a0 + a1x.
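A small plain-NumPy sketch of this sigmoid mapping (the coefficients and humidity values are invented):

```python
# The sigmoid function maps any weighted sum of inputs to a value in (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a0, a1 = -10.0, 0.12                    # illustrative coefficients
humidity = np.array([40, 70, 85, 95])   # illustrative inputs
p_rain = sigmoid(a0 + a1 * humidity)    # predicted probability of rain for each input
print(p_rain)                           # values between 0 and 1; e.g. p > 0.5 -> class 1
```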
Random Forest
• Random Forest is a popular machine learning algorithm that belongs
to the supervised learning technique. It can be used for both
Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the
performance of the model
• "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average
to improve the predictive accuracy of that dataset.
• Instead of relying on one decision tree, the random forest takes the
prediction from each tree and based on the majority votes of
predictions, and it predicts the final output
• The greater number of trees in the forest leads to higher accuracy
and prevents the problem of overfitting.
• Random Forest is capable of performing both Classification and
Regression tasks.
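A hedged sketch of a Random Forest classifier in scikit-learn, using a small built-in dataset purely for illustration:

```python
# Random Forest: an ensemble of decision trees whose majority vote gives the prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 trees
forest.fit(X_train, y_train)
print("accuracy:", forest.score(X_test, y_test))  # majority vote across the trees
```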
Naïve Bayes
• Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Bayes’ Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes’ theorem is stated mathematically as the following equation:
P(A|B) = P(B|A) · P(A) / P(B)
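A minimal sketch of the spam-filtering example with a Naive Bayes classifier in scikit-learn (the toy messages and labels are invented):

```python
# Naive Bayes text classification sketch on an invented toy dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at noon tomorrow",
            "free money claim now", "lunch with the team"]
labels = [1, 0, 1, 0]                      # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(messages)            # high-dimensional word-count features

nb = MultinomialNB().fit(X, labels)        # applies Bayes' theorem per class
print(nb.predict(vec.transform(["claim your free prize"])))   # likely [1] (spam)
```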
Types of Problems solved by
Supervised Learning
• Classification problems ask the algorithm to predict a discrete value that
can identify the input data as a member of a particular class or group.
Taking up the animal photos dataset, each photo has been labeled as a dog,
a cat, etc., and then the algorithm has to classify the new images into any
of these labeled categories.
• Regression problems deal with continuous data, e.g., predicting the price of a piece of land in a city given its area, location, etc. Here, the input is sent to the machine for predicting the price according to previous instances, and the machine determines a function that maps the input–output pairs. If it is unable to provide accurate results, the error is propagated backward and the process is repeated until satisfactory results are achieved.
Types of Problems solved by Unsupervised Learning
• Clustering
• "A way of grouping the data points into different clusters, consisting of similar data points. The objects with possible similarities remain in a group that has few or no similarities with another group."
• Association
• A rule-based way of discovering relationships between variables in a dataset, such as items that are frequently bought together (see the Apriori algorithm below).
• Unsupervised Learning
• Unsupervised learning involves training by using unlabeled data and
allowing the model to act on that information without guidance.
• Think of unsupervised learning as a smart kid who learns without any guidance. In this type of Machine Learning, the model is not fed labeled data; it has no clue that "this image is Tom and this is Jerry." It figures out the patterns and differences between Tom and Jerry on its own by taking in tons of data.
• For instance, it identifies Tom's distinguishing characteristics, such as his larger size and pointy ears, to recognize that such an image is of type 1.
• Similar traits are present in Jerry, so it recognizes those images as type 2.
• As a result, without knowing who Tom or Jerry is, it divides the photos into two groups.
Unsupervised Algorithms
• K-means clustering
• KNN (k-nearest neighbors); note that K-NN, covered later in this chapter, is a supervised algorithm
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
K-means clustering
• K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.
• It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of points with similar properties.
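A minimal K-Means sketch with K=2, using scikit-learn and invented 2-D points:

```python
# K-Means with K=2 on made-up 2-D points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [2, 3],       # one natural group
                   [8, 8], [9, 10], [10, 9]])    # another natural group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)            # cluster index assigned to each unlabeled point
print(km.cluster_centers_)   # the two cluster centers found iteratively
```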
The Apriori algorithm
• It utilizes frequent itemsets to create association rules. Frequent itemsets are the items with a greater value of support. The algorithm generates the itemsets and finds associations by performing multiple scans of the full dataset. Say you have four transactions:
• transaction 1={apple, peach, grapes, banana};
• transaction 2={apple, potato, tomato, banana};
• transaction 3={apple, cucumber, onion}; and
• transaction 4={oranges, grapes}.
• As we can see from the transactions, the frequent itemsets are
{apple}, {grapes}, and {banana} according to the calculated support
value of each.
• Itemsets can contain multiple items. For instance, the support value for {apple, banana} is two out of four, or 50%.
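These support values can be checked with a few lines of plain Python (this only counts support; it is not a full Apriori implementation):

```python
# Count support for single items and for the {apple, banana} pair
# using the four transactions listed above.
transactions = [
    {"apple", "peach", "grapes", "banana"},
    {"apple", "potato", "tomato", "banana"},
    {"apple", "cucumber", "onion"},
    {"oranges", "grapes"},
]

def support(itemset):
    count = sum(1 for t in transactions if itemset <= t)   # itemset is a subset of t
    return count / len(transactions)

print(support({"apple"}))            # 0.75
print(support({"grapes"}))           # 0.5
print(support({"banana"}))           # 0.5
print(support({"apple", "banana"}))  # 0.5 -> "two out of four"
```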
K-NN
• K-Nearest Neighbor is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification.
• At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category most similar to the new data.
• Why do we need a K-NN Algorithm?
• Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
• Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are similar to the cat and dog images and, based on the most similar features, will put it in either the cat or the dog category.
• How does K-NN work?
• The K-NN working can be explained on the basis of the below algorithm:
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K number of neighbors
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
• Suppose we have a new data point and we need to put it in the required category.
• Firstly, we choose the number of neighbors; here we choose k=5.
• Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, as studied in geometry; for points (x1, y1) and (x2, y2) it is d = √((x2 − x1)² + (y2 − y1)²).
• By calculating the Euclidean distances we find the nearest neighbors: three nearest neighbors in Category A and two nearest neighbors in Category B.

•As we can see the 3 nearest neighbors are from


category A, hence this new data point must belong
to category A.
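A sketch of the same idea with scikit-learn's K-NN classifier and k=5 (the Category A/B points are invented):

```python
# K-NN with k=5 on invented 2-D points from two categories.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 3], [2, 1],      # Category A points
              [7, 8], [8, 7], [9, 9], [8, 9]])     # Category B points
y = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5, Euclidean distance by default
knn.fit(X, y)                               # "lazy learner": it just stores the data

new_point = np.array([[3, 2]])
print(knn.predict(new_point))               # majority vote among the 5 nearest neighbors -> 'A'
```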
• Reinforcement Learning
• Reinforcement Learning is a part of Machine Learning in which an agent is put in an environment and learns to behave in this environment by performing certain actions and observing the rewards it gets from those actions.
• This type of Machine Learning is comparatively different. Imagine that you were dropped off on an isolated island: you would have to learn how to survive by trying actions and observing their outcomes.
• Reinforcement Learning is mainly used in advanced Machine Learning areas such as self-driving cars, AlphaGo (a computer program that plays the board game Go), etc.
• Reinforcement learning is a type of learning that is based on interaction with the environment.
• To begin with, there is always a start and an end state for an agent
(the AI-driven system); however, there might be different paths for
reaching the end state, like a maze. This is the scenario wherein
reinforcement learning is able to find a solution for a problem.
Examples of reinforcement learning include self-navigating vacuum
cleaners, driverless cars, scheduling of elevators, etc.
• Consider the example of a child trying to take his/her first steps. What are the steps he/she follows to start walking?
• Observing others walking and trying to replicate the same
• Standing still
• Remaining still
• Trying to balance the body weight, along with deciding on which foot to advance first to start walking
• It sounds like a difficult and challenging task for a child to get up and walk, right? But for us it is easy, since we have become used to it over time.
• Now, putting it together, a child is an agent who is trying to
manipulate the environment (surface or floor) by trying to walk and
going from one state to another (taking a step). A child gets a
reward when he/she takes a few steps (appreciation) but will not
receive any reward or appreciation if he/she is unable to walk. This is
a simplified description of a reinforcement learning problem.
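The agent/state/reward loop can be made concrete with a very small tabular Q-learning sketch (a 1-D "corridor" environment invented for illustration; this is not the method used by AlphaGo or self-driving cars):

```python
# Tiny tabular Q-learning sketch: an agent in a 5-cell corridor learns to
# walk right to reach a reward in the last cell.
import random

n_states, actions = 5, [-1, +1]          # actions: move left or move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(200):
    state = 0                            # start state
    while state != n_states - 1:         # end state = rightmost cell
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: learn from the reward observed after the action
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print([q.index(max(q)) for q in Q])      # learned action per non-terminal state: 1 (move right)
```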
What to use and when?
Traditional algorithm v/s machine learning
• In a traditional software algorithm, a programmer hand-codes explicit rules: data and a program go in, and the output comes out.
• In machine learning, the system is given data together with the expected outputs and learns the rules itself: data and outputs go in, and a program (the model) comes out.
• A traditional algorithm is appropriate when the rules are known, stable, and easy to express; machine learning is preferred when the rules are too complex to specify by hand or change as new data arrives (e.g., spam filtering, recommendations, face recognition).
