02 ML Supervised Learning
Supervised Learning
Machine Learning – Model
A machine learning model is a mathematical representation of the output of the training process: it is what
remains once a learning algorithm has been run over a set of training data. The model captures certain types
of patterns. Once it has been trained, you can use it to reason over data that it hasn't seen before and make
predictions about those data.
Machine learning is the study of algorithms that improve automatically through experience and historical
data, and that build such models. A machine learning model is similar to computer software designed to recognize
patterns or behaviors based on previous experience or data. The learning algorithm discovers patterns within
the training data and outputs an ML model that captures these patterns and makes predictions on new
data.
For example, let's say you want to build an application that can recognize a user's emotions based
on their facial expressions. You can train a model by providing it with images of faces that are each
tagged with a certain emotion, and then you can use that model in an application that can
recognize any user's emotion.
A machine learning model can be understood as a program that has been trained to find patterns in data
and make predictions. It is represented as a mathematical function that takes a request in the form of input
data, makes a prediction on that input, and returns an output in response. A learning algorithm is first run
over a set of training data, extracts patterns from the data it is fed, and produces the model. Once the model
is trained, it can be used to make predictions on unseen data.
As another example, say you want to build an application that can recognize geometric shapes such as
triangles and rectangles. You can train the model by providing it with different types of geometric
shapes described by criteria such as the number of sides and the angles. Once the model is trained, it
can identify any new shape described by the same criteria.
Machine Learning – Classification of ML Models
Based on different business goals and data sets, there are three learning models for algorithms. Each machine
learning algorithm settles into one of the three models:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Machine Learning – ML Models
Supervised Learning
Supervised learning, one of the most widely used methods in ML, takes both the training data (also called data
samples) and the associated outputs (also called labels or responses) during the training process. The main goal
of supervised learning methods is to learn the association between the input training data and their labels. To do
this, the algorithm works through many training instances.
Supervised algorithms are called supervised because the machine learning model learns from data samples
whose output is known in advance. In this sense, the whole learning process can be thought of as being
overseen by a supervisor.
Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles, and other
polygons such as hexagons. The first step is to train the model on each labelled shape.
If the given shape has four sides and all the sides are equal, it is labelled as a square.
If the given shape has three sides, it is labelled as a triangle.
If the given shape has six equal sides, it is labelled as a hexagon.
Once the machine has been trained on all types of shapes, it classifies each new shape it encounters on the
basis of its number of sides and predicts the output.
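The shape example above can be sketched in code. This is only an illustration, not part of the original slides: the two features (number of sides and an "all sides equal" flag), the tiny training set, and the choice of scikit-learn's DecisionTreeClassifier are all assumptions made for the example.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [number_of_sides, all_sides_equal (1 = yes, 0 = no)]
X_train = [[4, 1], [4, 0], [3, 1], [3, 0], [6, 1]]
y_train = ["square", "rectangle", "triangle", "triangle", "hexagon"]

# Supervised step: the model sees each input together with its known label
shape_model = DecisionTreeClassifier(random_state=0)
shape_model.fit(X_train, y_train)

# Prediction step: classify shapes described only by their features
new_shapes = [[4, 1], [3, 0], [6, 1]]
predicted = shape_model.predict(new_shapes)
print(list(predicted))
```

Any other classifier could be substituted here; the point is only that labels are supplied during training and predicted afterwards.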
Unsupervised Learning
Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset and are allowed to
act on that data without any supervision.
These models are not supervised using labelled training data. Instead, the models themselves find hidden patterns and
insights in the given data. This can be compared to the learning that takes place in the human brain when it encounters new things.
We have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying
structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
Social network analysis − Social network analysis builds clusters of friends based on the frequency of
connection between them. Such analysis reveals the links between the users of a social networking website.
Market segmentation − Sales organizations can cluster or group their customers into multiple segments on the basis of their
previously billed items. For instance, a big superstore may want to send an SMS about grocery items specifically to its grocery
customers rather than sending that SMS to all of its customers.
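As a rough sketch of the market segmentation idea, the snippet below groups customers by spending pattern with k-means clustering. The spending figures, the two-category layout, and the choice of two clusters are invented for illustration; the slides do not name a specific clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical monthly spend per customer: [grocery, electronics]
spend = np.array([
    [200, 10], [220, 15], [210, 5],    # mostly grocery shoppers
    [20, 300], [15, 280], [30, 310],   # mostly electronics shoppers
])

# No labels are given; the algorithm groups customers by similarity alone
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(spend)
print(segments)
```

The cluster numbers themselves are arbitrary; what matters is which customers end up in the same group.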
Reinforcement Learning
Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an environment
to obtain maximum reward. This optimal behavior is learned through interactions with the environment and observations of
how it responds, similar to children exploring the world around them and learning the actions that help them achieve a goal.
In the absence of a supervisor, the learner must independently discover the sequence of actions that maximizes the reward.
This discovery process is akin to a trial-and-error search. The quality of an action is measured not just by the immediate reward
it returns, but also by the delayed reward it might fetch. Because it can learn the actions that result in eventual success in an
unseen environment without the help of a supervisor, reinforcement learning is a very powerful approach.
•Autonomous Driving. An autonomous driving system must perform multiple perception and planning tasks in an
uncertain environment. Some specific tasks where RL finds application include vehicle path planning and motion
prediction. Vehicle path planning requires several low and high-level policies to make decisions over varying
temporal and spatial scales. Motion prediction is the task of predicting the movement of pedestrians and other
vehicles, to understand how the situation might develop based on the current state of the environment.
The above image shows a robot, a diamond, and fire. The goal of the robot is to reach the reward, which is the diamond, and to
avoid the hurdles, which are the fires. The robot learns by trying the possible paths and then choosing the path that reaches the
reward with the fewest hurdles. Each right step gives the robot a reward and each wrong step subtracts from the robot's
reward. The total reward is calculated when it reaches the final reward, the diamond.
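The robot example can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the slides do not say which algorithm is intended). Everything concrete below is an assumption made for illustration: the 3x4 grid layout, the reward values (+10 for the diamond, -10 for fire, -1 per ordinary step), and the learning parameters.

```python
import numpy as np

# Hypothetical grid: S = start, D = diamond (goal), F = fire, . = empty cell
GRID = ["...D",
        ".F.F",
        "S..."]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
N_ROWS, N_COLS = len(GRID), len(GRID[0])
START = (2, 0)

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    # Moves that would leave the grid keep the robot at the border
    r = min(max(state[0] + ACTIONS[action][0], 0), N_ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), N_COLS - 1)
    cell = GRID[r][c]
    if cell == "D":
        return (r, c), 10.0, True    # reached the diamond
    if cell == "F":
        return (r, c), -10.0, True   # stepped into fire
    return (r, c), -1.0, False       # small cost for every other step

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_ROWS, N_COLS, len(ACTIONS)))
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy: explore sometimes, otherwise take the best known action
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(q[state]))
            nxt, reward, done = step(state, action)
            # Immediate reward plus discounted estimate of future reward
            target = reward + (0.0 if done else gamma * np.max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start cell."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, int(np.argmax(q[state])))
        path.append(state)
        if done:
            break
    return path

q_table = train()
path = greedy_path(q_table)
print(path)
```

The discount factor `gamma` is what lets the delayed reward of the diamond influence the value of earlier steps.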
Machine Learning – Supervised Learning
Based on the type of task, supervised learning algorithms can be divided into the following two classes:
Regression
Classification
Regression
Regression algorithms are used when the output variable is continuous and depends on the input variables. They
are used to predict continuous quantities, for example in weather forecasting and market trend prediction. Below
are some popular regression algorithms that come under supervised learning.
Classification
Classification algorithms are used when the output variable is categorical, which means it takes one of a fixed
set of classes, for example two classes such as Yes-No, Male-Female, or True-False.
Regression
Linear Regression
Regression Trees
Non-Linear Regression
Polynomial Regression
Classification
Random Forest
Decision Trees
Logistic Regression
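As a minimal sketch of one of the classification algorithms listed above, the snippet below fits a logistic regression to a two-class problem. The data (hours studied against a pass/fail result) is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> exam result (0 = fail, 1 = pass)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# predict returns the class label; predict_proba returns class probabilities
queries = np.array([[1], [8]])
labels = clf.predict(queries)
probs = clf.predict_proba(queries)
print(labels)
print(probs)
```

Unlike a regression model, the output here is a category; the probabilities show how confident the model is in each class.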
Regression analysis is a fundamental concept in machine learning. It helps establish a relationship among
variables by estimating how one variable affects another.
There are many real-world scenarios where we need predictions about the future, such as weather conditions, sales,
and marketing trends. For such cases we need regression analysis, a statistical method used in machine learning and
data science to make these predictions more accurately.
Regression estimates the relationship between the target and the independent variables.
By performing regression, we can estimate which factors matter most, which matter least,
and how each factor affects the target.
Regression
Example:
Car purchase - Imagine you’re going to purchase a car and have decided that gas mileage is the deciding factor in your
decision to buy. If you wanted to predict the miles per gallon (MPG) of some promising cars, how would you do it? Since
you know the different features of each car (weight, horsepower, displacement, etc.), one possible method is regression. By
plotting the MPG of each car against its features, you can use regression techniques to find the relationship between
the MPG and the input features. The regression function here could be represented as $Y = f(X)$, where Y is the
MPG and X is the set of input features such as weight, displacement, and horsepower.
Linear regression is a regression technique in which the dependent variable has a linear relationship
with an independent variable. The main goal of linear regression is to take the given data points and fit
the trend line that matches the data as closely as possible.
Examples:
- Let’s say we have a dataset that contains information about the relationship between X and Y. A number of
observations of X and Y are made and recorded. This will be our training data. Our goal is to design a
model that can predict the Y value when the X value is provided. Using the training data, we obtain the
regression line that gives the minimum error. This linear equation is then applied to new data: if
we give X as an input, our model should be able to predict Y with minimum error.
- As another example, suppose there is a connection between how many hours a student studies and the
marks the student gets; regression analysis can help us understand that connection. It provides a
relation that can be plotted as a graph and used to make predictions about the data.
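The "minimum error" line in the first example is usually found by ordinary least squares. As a sketch, assume the model $Y = b_0 + b_1 X$ and $n$ recorded observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$; the symbols $b_0$ and $b_1$ are notation chosen here, not taken from the slides. The line minimizes the sum of squared errors, which gives closed-form values for the slope and intercept:

```latex
\min_{b_0, b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2,
\qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```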
Regression - Linear
The goal of regression analysis is to create a trend line based on the data. We can then examine whether
factors other than hours of study, such as level of stress, also affect the student’s marks. Before
taking such factors into account, we need to look at these attributes and determine whether there is a
correlation between them. Linear regression can then be used to draw a trend line that helps
confirm or reject the relationship between attributes.
Model Performance
After the model is built, we need to check the difference between the predicted values and the actual data. If the
difference is small, the model is considered to be a good one.
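One way to quantify "the difference between the predicted values and the actual data" is to look at the residuals and summarize them with an error metric. The sketch below reuses the six data points from this section's Python code; the choice of mean absolute error and root mean squared error is an assumption, since the slides do not name a metric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

x = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([2, 5, 6, 8, 9, 12])

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# Residual = actual value minus predicted value, one per data point
residuals = y - y_pred

# Two common summaries of how far predictions are from the data
mae = mean_absolute_error(y, y_pred)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print('residuals:', residuals)
print('MAE:', mae)
print('RMSE:', rmse)
```

Errors measured on the training data tend to be optimistic; in practice the same metrics are usually computed on data held out from training.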
Python Code
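import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Training data: x must be 2-D (one column per feature) for scikit-learn
x = np.array([1, 2, 3, 4, 5, 6]).reshape((-1, 1))
y = np.array([2, 5, 6, 8, 9, 12])

# Fit the regression line to the data
model = LinearRegression()
model.fit(x, y)

# Predict on the training inputs and compute the coefficient of determination
y_pred = model.predict(x)
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

# Plot the data points and the fitted line
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
plt.show()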
Note: The coefficient of determination ($R^2$) measures the proportion of the variance in the dependent variable
that is explained by the independent variable(s) in the model. A value of 1 means the model reproduces the data
exactly, and a value of 0 means it does no better than always predicting the mean.
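To make the note concrete, the coefficient of determination can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for the same six data points and compares the result with `model.score`; the variable names `ss_res`, `ss_tot`, and `r_sq_manual` are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([2, 5, 6, 8, 9, 12])
model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# Variation left unexplained by the model
ss_res = np.sum((y - y_pred) ** 2)
# Total variation of y around its mean
ss_tot = np.sum((y - y.mean()) ** 2)

r_sq_manual = 1 - ss_res / ss_tot
print('manual R^2:', r_sq_manual)
print('model.score:', model.score(x, y))
```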
Multivariate Linear Regression
Linear regression is one of the most used statistical models in industry. Its main advantage lies in its
simplicity and interpretability. Linear regression is used to forecast a company’s revenue based on given
parameters, forecast a player’s growth in sports, predict the price of a product given the cost of
raw materials, predict crop yield given rainfall, and much more.
During our internship at Ambee, we were given a warm-up task to predict car prices from a dataset. This task
strengthened our understanding of feature selection for multivariate linear regression and of statistical measures
for choosing the right model. You might be wondering why an environmental company makes interns work on a car
pricing dataset. At Ambee, we celebrate outside data as much as inside data. That is what lets us relate things like
how a change in pollutants impacts health businesses’ economies of scale, effects that many do not see directly but
that matter indirectly. It is important for a data scientist to gain domain knowledge, but it is also important to
keep an open mind about external factors that can be directly or indirectly related.
Regression is a statistical technique used to model continuous target variables. It has also been adopted in
machine learning to predict continuous variables. Regression models the target variable as a function of
independent variables, also called predictors. Linear regression fits a straight line to the data. Simple Linear
Regression (SLR) models the target variable as a function of a single predictor, whereas Multivariate Linear
Regression (MLR) models the target variable as a function of multiple predictors.
Problem Statement
A new car manufacturer is looking to set up business in the US market. To take on its competition, it needs to know
the factors on which the pricing of a car depends. The company wants to know which variables the price depends on
and to what extent those variables explain the price of a car.
Business Goal
We need to build a model for the price of a car as a function of explanatory variables. The company will then use
it to set the price of a car according to its features, or to configure the features according to a target price. In
this blog post, we go through the process of cleaning the data, understanding our variables, and modelling using
linear regression. Let us import our libraries. NumPy is a fast matrix computation library that most of the other
libraries depend on, and we might need it at some point. Pandas is our data manipulation library and one of the
most important libraries in our pipeline. Matplotlib and Seaborn are used for plotting graphs.
Python Code
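A minimal sketch of multivariate linear regression for a car-pricing setup like the one described above. The feature names (horsepower, curb weight, engine size), the synthetic data, and the coefficients used to generate the prices are all invented for illustration; they are not taken from the car pricing dataset, and only NumPy and scikit-learn are used to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_cars = 50

# Synthetic explanatory variables (hypothetical, not real car data)
horsepower = rng.uniform(60, 250, n_cars)
curb_weight = rng.uniform(1500, 4000, n_cars)
engine_size = rng.uniform(70, 300, n_cars)
X = np.column_stack([horsepower, curb_weight, engine_size])

# Price generated from made-up coefficients, without noise
true_coefs = np.array([50.0, 3.0, 20.0])
price = 2000.0 + X @ true_coefs

# Fit price as a linear function of all three predictors at once
mlr = LinearRegression().fit(X, price)
print('intercept:', mlr.intercept_)
print('coefficients:', mlr.coef_)
print('R^2:', mlr.score(X, price))
```

Because the synthetic prices contain no noise, the fitted coefficients should match the made-up ones; with real data, the coefficients indicate how much the predicted price changes per unit change in each variable, holding the others fixed.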
Multi Level Models
Many kinds of data, including observational data collected in the human and biological sciences, have a
hierarchical or clustered structure. For example, children with the same parents tend to be more alike in
their physical and mental characteristics than individuals chosen at random from the population at large.
Individuals may be further nested within geographical areas or institutions such as schools or employers.
Multilevel data structures also arise in longitudinal studies where an individual’s responses over time are
correlated with each other.
Multilevel models recognize the existence of such data hierarchies by allowing for residual components at
each level in the hierarchy. For example, a two-level model which allows for grouping of child outcomes
within schools would include residuals at the child and school level. Thus the residual variance is
partitioned into a between-school component (the variance of the school-level residuals) and a within-
school component (the variance of the child-level residuals). The school residuals, often called ‘school
effects’, represent unobserved school characteristics that affect child outcomes. It is these unobserved
variables which lead to correlation between outcomes for children from the same school.
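The two-level example above is often written as a random-intercept model. The notation below is an assumption chosen here for illustration ($y_{ij}$ for the outcome of child $i$ in school $j$, $u_j$ for the school-level residual, $e_{ij}$ for the child-level residual, with the two residuals taken to be independent); the text itself does not give a formula.

```latex
y_{ij} = \beta_0 + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\operatorname{Var}(y_{ij}) = \sigma_u^2 + \sigma_e^2, \qquad
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Here $\sigma_u^2$ is the between-school component and $\sigma_e^2$ the within-school component of the residual variance. The ratio $\rho$ is the share of the variance that lies between schools, and under these assumptions it is also the correlation between the outcomes of two children from the same school.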
Regression - Polynomial
Polynomial regression is a special case of linear regression in which the curvilinear relationship between the
independent variable x and the dependent variable y is modeled as an nth-degree polynomial. Polynomial regression
fits a nonlinear relationship between the value of x and the corresponding conditional mean of y.
Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and
an independent variable (x) as an nth-degree polynomial. The Polynomial Regression equation is:
$y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n$
It is also called a special case of Multiple Linear Regression in ML, because we add polynomial terms to the
Multiple Linear Regression equation to convert it into Polynomial Regression.
It is a linear model with some modifications made to increase accuracy on curved data.
The dataset used for training in Polynomial Regression is non-linear in nature.
It makes use of a linear regression model to fit complicated, non-linear functions and datasets.
Hence, "In Polynomial regression, the original features are converted into polynomial features of the required
degree (2, 3, ..., n) and then modeled using a linear model."
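The idea of converting the original feature into polynomial features and then fitting a linear model can be sketched as follows. The data is a made-up noise-free quadratic, and the use of scikit-learn's PolynomialFeatures with degree 2 is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved data: y = 2x^2 + 1 on a symmetric range
x = np.arange(-3, 4).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 1

# Straight-line fit for comparison
linear = LinearRegression().fit(x, y)

# Convert x into polynomial features [x, x^2], then fit a linear model on them
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)
poly_model = LinearRegression().fit(x_poly, y)

# Predict for a new value of x using the same feature transformation
prediction = poly_model.predict(poly.transform([[5]]))

print('linear R^2:', linear.score(x, y))
print('polynomial R^2:', poly_model.score(x_poly, y))
print('prediction at x=5:', prediction)
```

The model is still linear in its coefficients; only the inputs have been expanded, which is why an ordinary linear regression can fit the curve.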
Regression - Polynomial
In the above image, we have taken a dataset that is arranged non-linearly. If we try to fit it with a linear
model, we can clearly see that the line passes close to hardly any of the data points. A curve, on the other hand,
as produced by the Polynomial model, covers most of the data points.
Hence, if the datasets are arranged in a non-linear fashion, then we should use the Polynomial Regression model
instead of Simple Linear Regression.
Python Code
# Polynomial Regression