Predictive Analytics (2)
UNIT: 1
Linear methods for regression and classification: overview of supervised learning, linear regression models and least squares, multiple regression, multiple outputs, subset selection, ridge regression, lasso regression, linear discriminant analysis, logistic regression, perceptron learning algorithm.
WHAT IS PREDICTIVE ANALYTICS?
Predictive analytics uses historical data and statistical or machine learning models to predict future outcomes. Most predictive models are built with supervised learning, which requires a labelled dataset.
A labelled dataset is one that has both input and output parameters. In this type of learning, both training and validation are performed on labelled data.
The labelled dataset used in supervised learning consists of input features and corresponding output labels. The input features are the attributes or characteristics of the data that are used to make predictions, while the output labels are the desired outcomes or targets that the algorithm tries to predict.
• Figure A: A dataset from a shopping store used to predict whether a customer will purchase a particular product based on their gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased (0 or 1); 1 means the customer will purchase the product and 0 means the customer won't.
• Figure B: A meteorological dataset used to predict wind speed based on different parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed
Types of Supervised Learning Algorithms
Regression:
Regression is a supervised learning technique used to predict continuous
numerical values based on input features. It aims to establish a functional
relationship between independent variables and a dependent variable, such as
predicting house prices based on features like size, bedrooms, and location.
The goal is to minimize the difference between predicted and actual values using
algorithms like Linear Regression, Decision Trees, or Neural Networks.
Classification:
Classification is a supervised learning technique used to predict discrete class labels or categories (for example, whether a customer will purchase a product or not) rather than continuous values. Logistic Regression and Linear Discriminant Analysis, covered later in this unit, are classification methods.
Linear Regression and Least Squares:
In the case of one independent variable, linear regression is called simple linear regression. For more than one independent variable, the process is called multiple linear regression.
Let X be the independent variable and Y be the dependent variable. We will define a linear relationship between these two variables as follows:
Y = mX + c
This is the equation for a line that you studied in high school: m is the slope of the line and c is the y-intercept. We will use this equation to train our model on a given dataset and then predict the value of Y for any given value of X.
Our challenge is to determine the values of m and c that give the minimum error for the given dataset. We will do this using the Least Squares method.
For each data point, the error is the difference between the actual value y and the predicted value p, and the total error (loss) L is the sum of the squared differences:
L = Σ (yᵢ - pᵢ)²
We square the differences because, for points below the regression line, y - p is negative, and we do not want negative values cancelling out positive ones in the total error.
Least Squares method
Now that we have determined the loss function, the only thing left to do is minimize it. This is done by taking the partial derivatives of L with respect to m and c, setting them to 0, and solving for m and c. After we do the math, we are left with these equations:
m = Σ (xᵢ - x̄)(yᵢ - ȳ) / Σ (xᵢ - x̄)²
c = ȳ - m x̄
Here x̄ is the mean of all the values in the input X and ȳ is the mean of all the values in the desired output Y. This is the Least Squares method.
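A minimal sketch of this Least Squares fit in Python (the small X and Y arrays below are illustrative, not taken from the slides):

import numpy as np

# Illustrative data points (X = input, Y = desired output)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Closed-form Least Squares estimates for the slope m and intercept c
x_mean, y_mean = X.mean(), Y.mean()
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
c = y_mean - m * x_mean

print(f"m = {m:.3f}, c = {c:.3f}")
print("predicted Y at X = 6:", m * 6 + c)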
Multiple Regression:
Multiple linear regression is used to estimate the relationship between two or more independent
variables and one dependent variable.
You can use multiple linear regression when you want to know:
1. How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
2. The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Multiple linear regression example
You are a public health researcher interested in social factors that influence heart disease. You survey 500 towns
and gather data on the percentage of people in each town who smoke, the percentage of people in each town who
bike to work, and the percentage of people in each town who have heart disease.
Because you have two independent variables and one dependent variable, and all your variables are quantitative,
you can use multiple linear regression to analyze the relationship between them.
How to perform a multiple linear regression
Multiple linear regression formula
The formula for a multiple linear regression is:
y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε
where y is the predicted value of the dependent variable, β₀ is the intercept, β₁ … βₙ are the regression coefficients of the independent variables X₁ … Xₙ, and ε is the model error (the variation not explained by the predictors).
To find the best-fit line for each independent variable, multiple linear
regression calculates three things:
•The regression coefficients that lead to the smallest overall model
error.
•The t statistic of the overall model.
•The associated p value (how likely it is that the t statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables were true).
It then calculates the t statistic and p value for each regression
coefficient in the model.
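As a rough sketch, the statsmodels library can fit such a model and report the coefficients, t statistics, and p values; the town-level data below is synthetic and only stands in for the survey described above:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the 500-town survey (values are percentages)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "smoking": rng.uniform(5, 35, 500),
    "biking": rng.uniform(1, 75, 500),
})
df["heart_disease"] = 0.18 * df["smoking"] - 0.05 * df["biking"] + rng.normal(0, 0.5, 500)

X = sm.add_constant(df[["smoking", "biking"]])  # adds the intercept term
model = sm.OLS(df["heart_disease"], X).fit()    # ordinary least squares fit
print(model.summary())                          # coefficients, t statistics, p values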
SUBSET SELECTION:
Subset selection is a way of selecting the subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.
• While developing a machine learning model, only a few variables in the dataset are useful for building the model; the remaining features are either redundant or irrelevant.
• If we input the dataset with all these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model.
• Hence it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important features, which is done with the help of feature selection in machine learning.
• Feature selection is one of the important concepts of machine learning, which highly impacts the performance of the model. Since machine learning works on the principle of "Garbage In, Garbage Out", we always need to input the most appropriate and relevant dataset to the model in order to get a better result.
Subset selection is a technique used in statistics and machine learning to choose a subset of features or variables from a larger set.
The goal is to identify the most relevant and informative subset that optimizes a certain criterion, such as model performance,
interpretability, or efficiency. There are various methods for subset selection, and they can be broadly categorized into three types:
Forward Selection:
Procedure: Start with an empty set and iteratively add variables that most improve the model performance.
Algorithmic Steps:
Evaluate models with one variable and select the one with the best performance.
Add one variable at a time, selecting the variable that contributes the most improvement.
Stop when a certain criterion is met (e.g., model performance no longer improves).
Backward Elimination:
Procedure: Start with all variables and iteratively remove the least valuable variables.
Algorithmic Steps:
Evaluate the model with all variables and identify the variable that contributes the least.
Remove one variable at a time, excluding the variable with the least impact.
Stop when a certain criterion is met (e.g., removing further variables worsens the model).
Stepwise Selection:
Procedure: Combine elements of both forward and backward selection.
Algorithmic Steps:
Evaluate models with one variable and select the one with the best performance.
At each step, add or remove one variable based on its impact.
Continue until the addition or removal of variables no longer improves the model.
These methods are often used in the context of linear regression or other statistical models, where the goal is to improve
model fit or reduce overfitting. The choice of the subset selection method depends on the specific goals, the nature of the data,
and the characteristics of the model.
It's important to note that subset selection should be done carefully, considering the potential for overfitting or loss of
information. Cross-validation and other model evaluation techniques are often employed to ensure that the selected subset
generalizes well to new data.
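As a brief illustration of forward selection, scikit-learn's SequentialFeatureSelector can add one variable at a time based on cross-validated performance; the diabetes dataset and the choice of three features here are only for demonstration:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Forward selection: start with no features and greedily add the one that
# improves cross-validated model performance the most.
X, y = load_diabetes(return_X_y=True)
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))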
EXAMPLE : Predicting Housing Prices
Features:
Square Footage (X1)
Number of Bedrooms (X2)
Distance to City Center (X3)
Presence of Nearby Schools (X4)
Crime Rate in the Neighborhood (X5)
Target Variable:
Housing Price (Y)
Implement a stepwise process of adding or removing features based on their impact on the model.
Start with a subset of features and iteratively evaluate the model's performance.
Add features that improve the model or remove features that do not contribute significantly.
Final Subset:
Select the final subset of features that optimizes the model's predictive performance,
interpretability, or any other defined criteria.
Example Outcome:
After the subset selection process, the model might identify that Square Footage (X1), Number of
Bedrooms (X2), and Distance to City Center (X3) are the most significant features for predicting
housing prices. These features may provide a good balance between predictive power and
simplicity.
RIDGE REGRESSION:
Ridge regression is a statistical regularization technique. It corrects for overfitting on
training data in machine learning models. Ridge regression—also known as L2
regularization—is one of several types of regularization for linear regression models.
Regularization is a statistical method to reduce errors caused by overfitting on training
data.
(Overfitting occurs when a machine learning model learns the training data too
closely, capturing noise and patterns that do not generalize well to new, unseen data, resulting in
poor performance on test or validation sets. It is characterized by high accuracy on training data
but low accuracy on new data.)
OR
•Ridge regression is a regularization technique which is used to reduce the complexity of the model. It is also known as L2 regularization.
•In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the Ridge Regression penalty. We can calculate it by multiplying the penalty parameter λ (lambda) by the sum of the squared coefficients.
•The equation for the cost function in ridge regression will be:
Cost = Σ (yᵢ - ŷᵢ)² + λ Σ βⱼ²   (sum of squared residuals plus λ times the sum of squared coefficients)
•In the above equation, the penalty term regularizes the coefficients of the model, and hence ridge regression reduces the magnitudes of the coefficients, which decreases the complexity of the model.
•As we can see from the above equation, if the value of λ tends to zero, the equation becomes the cost function of the linear regression model. Hence, for a very small value of λ, the model will resemble the linear regression model.
•A general linear or polynomial regression will fail if there is high collinearity between the independent variables; ridge regression can be used to solve such problems.
•It helps to solve problems where we have more parameters than samples.
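A small sketch of how ridge regression behaves with highly collinear inputs, using scikit-learn's Ridge; the two nearly identical features below are synthetic:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two almost identical (highly collinear) features
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.001, size=100)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

# Plain least squares tends to give large, erratic coefficients on collinear inputs,
# while the ridge penalty (alpha plays the role of lambda) shrinks them toward stable values.
print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_)
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_)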
Lasso regression:
•Lasso regression is another regularization technique to reduce the complexity of the
model.
•It stands for Least Absolute Shrinkage and Selection Operator.
•It is similar to the Ridge Regression except that the penalty term contains only the
absolute weights instead of a square of weights.
•Since it takes absolute values, hence, it can shrink the slope to 0, whereas Ridge
Regression can only shrink it near to 0.
•It is also called L1 regularization. The equation for the cost function of Lasso regression will be:
Cost = Σ (yᵢ - ŷᵢ)² + λ Σ |βⱼ|   (sum of squared residuals plus λ times the sum of absolute coefficients)
Difference Between Ridge Regression and Lasso Regression
Ridge regression (L2) penalizes the sum of the squared coefficients and can only shrink coefficients close to zero, while Lasso regression (L1) penalizes the sum of the absolute coefficients and can shrink some coefficients exactly to zero, effectively performing feature selection.
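A short sketch contrasting the two penalties with scikit-learn; the eight synthetic features below, of which only the first two matter, are purely illustrative:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Only the first 2 of 8 features actually influence y
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Ridge (L2) shrinks all coefficients but keeps them non-zero;
# Lasso (L1) can drive the irrelevant coefficients exactly to zero.
print("Ridge:", np.round(Ridge(alpha=1.0).fit(X, y).coef_, 3))
print("Lasso:", np.round(Lasso(alpha=0.5).fit(X, y).coef_, 3))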
Linear Discriminant Analysis:
• Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for
classification tasks in machine learning. It is a technique used to find a linear combination
of features that best separates the classes in a dataset.
• It is used for modelling differences in groups, i.e. separating two or more classes. It is used to project features from a higher-dimensional space onto a lower-dimensional space.
For example, suppose we have two classes and we need to separate them efficiently. Classes can have multiple features, and using only a single feature to classify them may result in some overlapping between the classes.
EXAMPLE:
Suppose we have two sets of data points belonging to two different classes that we want to
classify. As shown in the given 2D graph, when the data points are plotted on the 2D plane,
there’s no straight line that can separate the two classes of the data points completely.
Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the 2D
graph into a 1D graph in order to maximize the separability between the two classes.
Two criteria are used by LDA to create a new axis:
1.Maximize the distance between means of the two classes.
2.Minimize the variation within each class.
[Figure: the data points before (left) and after (right) projection onto the new LDA axis]
• In the above graph, it can be seen that a new axis (in red) is generated and plotted in
the 2D graph such that it maximizes the distance between the means of the two
classes and minimizes the variation within each class.
• In simple terms, this newly generated axis increases the separation between the data
points of the two classes.
• After generating this new axis using the above-mentioned criteria, all the data points
of the classes are plotted on this new axis and are shown in the figure given below.
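A minimal sketch of LDA reducing 2D data with two classes to 1D using scikit-learn; the two Gaussian clusters below are synthetic:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two classes of 2D points (illustrative)
rng = np.random.default_rng(3)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1)  # project 2D -> 1D
X_1d = lda.fit_transform(X, y)                    # new axis maximizes class separation
print("projected shape:", X_1d.shape)
print("training accuracy:", lda.score(X, y))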
Logistic Regression:
•Logistic regression is one of the most popular Machine Learning algorithms, and it comes under the Supervised Learning technique. It is used for predicting a categorical dependent variable using a given set of independent variables.
•Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value.
•It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
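A brief sketch of logistic regression on a synthetic version of the shopping-store example; the purchase rule used to generate the labels is made up purely for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic customers: age in years, salary in thousands
rng = np.random.default_rng(4)
age = rng.uniform(18, 60, 300)
salary = rng.uniform(20, 120, 300)
X = np.column_stack([age, salary])
y = ((age > 35) & (salary > 60)).astype(int)  # made-up purchase rule (1 = purchase)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("probability of purchase:", clf.predict_proba([[40, 80]])[0, 1])  # value between 0 and 1
print("predicted class:", clf.predict([[40, 80]])[0])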
Perceptron Learning Algorithm:
The perceptron is a simple supervised learning algorithm for binary classification. It learns a linear decision boundary through the following steps:
Initialization:
Initialize weights and bias to small random values.
Input Signals:
Input features are multiplied by their respective weights.
Sum these products and add a bias.
Activation Function:
Use an activation function (often a step function) to determine the output of the perceptron based
on the weighted sum.
Error Calculation:
Calculate the error by comparing the predicted output to the actual output from the training data.
Update Weights:
Adjust weights and bias based on the error using a learning rate.
This step helps the perceptron learn from its mistakes and improve its accuracy.
Iteration:
Repeat steps 2-5 for multiple iterations or until convergence.
EXAMPLE : Imagine you have a dataset of flowers with two features: petal length (x1) and petal width (x2). The
goal is to predict whether a flower belongs to species A (1) or species B (0) based on these features.
• Initialization:
Start with random weights (w1, w2) and a bias (b).
• Input Signals:
For a flower with petal length x1 and width x2, calculate z = w1*x1 + w2*x2 + b.
• Activation Function:
If z is greater than or equal to 0, predict species A (1); otherwise, predict species B (0).
• Error Calculation:
Compare the prediction with the actual species label (0 or 1) to calculate the error.
• Update Weights:
Adjust weights and bias based on the error to improve the model's accuracy.
• Iteration:
Repeat steps 2-5 for each flower in the dataset until the model converges or for a set number of iterations.
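A compact sketch of these steps on the flower example; the six data points are invented for illustration, and the weights start at zero here for simplicity rather than at small random values:

import numpy as np

# Petal length (x1) and petal width (x2); label 1 = species A, 0 = species B
X = np.array([[5.1, 1.8], [4.8, 1.6], [5.5, 2.0],   # species A
              [3.0, 1.0], [2.8, 0.9], [3.2, 1.1]])  # species B
y = np.array([1, 1, 1, 0, 0, 0])

w = np.zeros(2)   # weights (w1, w2)
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(1000):                    # iterate until convergence
    mistakes = 0
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b                # weighted sum of inputs plus bias
        prediction = 1 if z >= 0 else 0      # step activation function
        error = target - prediction          # error calculation
        if error != 0:
            w += lr * error * xi             # update weights
            b += lr * error                  # update bias
            mistakes += 1
    if mistakes == 0:                        # converged: all flowers classified correctly
        break

print("weights:", w, "bias:", b)
print("predictions:", [1 if np.dot(w, xi) + b >= 0 else 0 for xi in X])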
TYPES OF PERCEPTRON LEARNING:
There are mainly two types of perceptron learning:
Single-Layer Perceptron (SLP):
Consists of a single layer of output nodes that directly produce the final output.
Primarily used for binary classification problems.
Limited to solving linearly separable problems.
Multi-Layer Perceptron (MLP):
Involves multiple layers of interconnected nodes, including input, hidden, and output layers.
Can handle more complex problems and non-linear relationships between input and
output.
Utilizes activation functions and backpropagation for training.
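To make the contrast concrete, here is a rough comparison of a single-layer perceptron and a small multi-layer perceptron on a non-linearly separable dataset; the "two moons" data and the chosen hidden-layer size are only illustrative:

from sklearn.datasets import make_moons
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Non-linearly separable "two moons" data
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

slp = Perceptron().fit(X_train, y_train)                    # single layer: linear boundary only
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)   # hidden layer trained by backpropagation
print("single-layer perceptron accuracy:", slp.score(X_test, y_test))
print("multi-layer perceptron accuracy: ", mlp.score(X_test, y_test))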