
PREDICTIVE ANALYTICS

UNIT: 1
Linear methods for regression and classification: overview of
supervised learning, linear regression models and least squares,
multiple regression, multiple outputs, subset selection, ridge
regression, lasso regression, linear discriminant analysis, logistic
regression, perceptron learning algorithm
WHAT IS PREDICTIVE ANALYTICS?

 The basic goal of predictive analytics is to forecast what will happen in the future.

 Predictive analytics is a branch of advanced analytics that makes predictions about future outcomes using historical data combined with statistical modeling, data mining techniques, and machine learning.
Overview of supervised learning:
 Supervised learning is a machine learning technique that is widely used in various fields
such as finance, healthcare, marketing, and more.
 It is a form of machine learning in which the algorithm is trained on labeled data to
make predictions or decisions based on the data inputs.
 In supervised learning, the algorithm learns a mapping between the input and output
data. This mapping is learned from a labeled dataset, which consists of pairs of input
and output data.
 The algorithm tries to learn the relationship between the input and output data so that it
can make accurate predictions on new, unseen data.
 In supervised learning, the model is trained on a labeled dataset.

 A labeled dataset is one that has both input and output parameters. In this type of learning, both training and validation are performed.

 The labeled dataset used in supervised learning consists of input features and
corresponding output labels

 The input features are the attributes or characteristics of the data that are used to
make predictions, while the output labels are the desired outcomes or targets that
the algorithm tries to predict.
•Figure A: A dataset from a shopping store, useful for predicting whether a customer will purchase a particular product based on their gender, age, and salary.

Input: Gender, Age, Salary

Output: Purchased (0 or 1); 1 means the customer will purchase the product and 0 means the customer won't.
-----------------------------------------------------------------------------------------------------------
•Figure B: A meteorological dataset that serves the purpose of predicting wind speed based on different parameters.

Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction

Output: Wind Speed
Types of Supervised Learning Algorithms
Regression:
 Regression is a supervised learning technique used to predict continuous
numerical values based on input features. It aims to establish a functional
relationship between independent variables and a dependent variable, such as
predicting house prices based on features like size, bedrooms, and location.

 The goal is to minimize the difference between predicted and actual values using
algorithms like Linear Regression, Decision Trees, or Neural Networks.
Classification

 Classification is a type of supervised learning that categorizes input data into predefined labels. It involves training a model on labeled examples to learn patterns between input features and output classes.

 In classification, the target variable is a categorical value. For example, classifying emails as spam or not.
Linear regression models and least squares:
 Linear regression is one of the simplest machine learning methods.
 In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables.

 In the case of one independent variable it is called simple linear regression. For more than one independent variable, the process is called multiple linear regression.
 Let X be the independent variable and Y be the dependent variable. We will define a linear relationship between these two variables as follows:

$$Y = mX + c$$

This is the equation for a line that you studied in high school: m is the slope of the line and c is the y-intercept. We will use this equation to train our model on a given dataset and predict the value of Y for any given value of X.
Our challenge is to determine the values of m and c that give the minimum error for the given dataset. We will do this by using the least squares method.

Finding the Error

To minimize the error, we first need a way to calculate it. A loss function in machine learning is simply a measure of how different the predicted value is from the actual value. We will use the quadratic loss function to calculate the loss or error in our model. It can be defined as:

$$L = \sum_{i=1}^{n} (y_i - p_i)^2$$

where $y_i$ is the actual value and $p_i$ is the predicted value. We square the difference because, for points below the regression line, $y_i - p_i$ would be negative, and we don't want negative values in our total error.
Least Squares method
Now that we have determined the loss function, the only thing left to do is minimize it. This is done by taking the partial derivatives of L with respect to m and c, setting them to 0, and solving for m and c. After doing the math, we are left with these equations:

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad c = \bar{y} - m\bar{x}$$

Here $\bar{x}$ is the mean of all the values in the input X and $\bar{y}$ is the mean of all the values in the desired output Y. This is the least squares method.
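As a minimal sketch, the two closed-form equations above can be computed directly with NumPy; the data values here are made up purely for illustration:

```python
# Simple linear regression via the least squares formulas above.
# The example data is synthetic.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # dependent variable

x_bar, y_bar = X.mean(), Y.mean()

# m = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
m = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
c = y_bar - m * x_bar                      # c = y_bar - m * x_bar

p = m * X + c                              # predicted values
L = np.sum((Y - p) ** 2)                   # quadratic loss from above
print(f"m = {m:.3f}, c = {c:.3f}, loss = {L:.3f}")
```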
Multiple Regression:
Multiple linear regression is used to estimate the relationship between two or more independent
variables and one dependent variable.
You can use multiple linear regression when you want to know:
1. How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
2. The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Multiple linear regression example
You are a public health researcher interested in social factors that influence heart disease. You survey 500 towns
and gather data on the percentage of people in each town who smoke, the percentage of people in each town who
bike to work, and the percentage of people in each town who have heart disease.
Because you have two independent variables and one dependent variable, and all your variables are quantitative,
you can use multiple linear regression to analyze the relationship between them.
How to perform a multiple linear regression
Multiple linear regression formula
The formula for a multiple linear regression is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$$

where $y$ is the predicted value of the dependent variable, $\beta_0$ is the intercept, $\beta_1 \dots \beta_n$ are the regression coefficients of the independent variables $x_1 \dots x_n$, and $\epsilon$ is the model error.
To find the best-fit line for each independent variable, multiple linear regression calculates three things:
•The regression coefficients that lead to the smallest overall model error.
•The t statistic of the overall model.
•The associated p value (how likely it is that the t statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables were true).
It then calculates the t statistic and p value for each regression coefficient in the model, as in the sketch below.
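As a sketch of how such an analysis might look in code, the snippet below fits an ordinary least squares model with the statsmodels library on synthetic data shaped like the heart-disease survey example (the variable names and values are invented for illustration):

```python
# Multiple linear regression with statsmodels on synthetic survey data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
smoking = rng.uniform(5, 35, 500)           # % of each town that smokes
biking = rng.uniform(1, 75, 500)            # % of each town that bikes to work
heart_disease = 0.18 * smoking - 0.05 * biking + rng.normal(0, 1, 500)

X = sm.add_constant(np.column_stack([smoking, biking]))  # adds the intercept
model = sm.OLS(heart_disease, X).fit()      # ordinary least squares fit

# summary() reports coefficients with their t statistics and p values.
print(model.summary())
```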
SUBSET SELECTION:
 Subset selection is a way of selecting the subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.
• While developing a machine learning model, only a few variables in the dataset are useful for building the model; the remaining features are either redundant or irrelevant.
• If we input the dataset with all these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model.
• Hence it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important ones, which is done with the help of feature selection in machine learning.
• Feature selection is one of the important concepts of machine learning, and it highly impacts the performance of the model. As machine learning works on the principle of "garbage in, garbage out", we always need to input the most appropriate and relevant dataset to the model in order to get a better result.
Subset selection is a technique used in statistics and machine learning to choose a subset of features or variables from a larger set.
The goal is to identify the most relevant and informative subset that optimizes a certain criterion, such as model performance,
interpretability, or efficiency. There are various methods for subset selection, and they can be broadly categorized into three types:

Forward Selection:

Procedure: Start with an empty set and iteratively add variables that most improve the model performance.
Algorithmic Steps:
Evaluate models with one variable and select the one with the best performance.
Add one variable at a time, selecting the variable that contributes the most improvement.
Stop when a certain criterion is met (e.g., model performance no longer improves).

Backward Elimination:

Procedure: Start with all variables and iteratively remove the least valuable ones.
Algorithmic Steps:
Evaluate the model with all variables and identify the variable that contributes the least.
Remove one variable at a time, excluding the variable with the least impact.
Stop when a certain criterion is met.
Stepwise Selection:
Procedure: Combine elements of both forward and backward selection.
Algorithmic Steps:
Evaluate models with one variable and select the one with the best performance.
At each step, add or remove one variable based on its impact.
Continue until the addition or removal of variables no longer improves the model.

These methods are often used in the context of linear regression or other statistical models, where the goal is to improve
model fit or reduce overfitting. The choice of the subset selection method depends on the specific goals, the nature of the data,
and the characteristics of the model.

It's important to note that subset selection should be done carefully, considering the potential for overfitting or loss of information. Cross-validation and other model evaluation techniques are often employed to ensure that the selected subset generalizes well to new data; the forward-selection sketch below illustrates one such setup.
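A simplified sketch of forward selection is shown below, using cross-validated R² as the improvement criterion; the dataset is synthetic and the stopping rule is one of several reasonable choices:

```python
# Forward selection: greedily add the feature that improves CV score most.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = float("-inf")

while remaining:
    # Score a candidate model for each remaining feature.
    scores = {f: cross_val_score(LinearRegression(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    candidate = max(scores, key=scores.get)
    if scores[candidate] <= best_score:      # stop: no further improvement
        break
    best_score = scores[candidate]
    selected.append(candidate)
    remaining.remove(candidate)

print("Selected feature indices:", selected)
```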
EXAMPLE : Predicting Housing Prices

Features:
Square Footage (X1)
Number of Bedrooms (X2)
Distance to City Center (X3)
Presence of Nearby Schools (X4)
Crime Rate in the Neighborhood (X5)

Target Variable:
Housing Price (Y)

Subset Selection Process:

Initial Consideration:
Begin by considering all features (X1 to X5) for predicting the housing price.
Feature Importance:
Evaluate the importance of each feature individually in predicting the housing price.
Identify which features contribute the most to explaining the variance in housing prices.
Correlation Analysis:
Examine the pairwise correlations between features.
Identify if there are redundant features that provide similar information.
Stepwise Iterative Process:
Implement a stepwise process of adding or removing features based on their impact on the model.
Start with a subset of features and iteratively evaluate the model's performance.
Add features that improve the model or remove features that do not contribute significantly.

Final Subset:

Select the final subset of features that optimizes the model's predictive performance,
interpretability, or any other defined criteria.

Example Outcome:

After the subset selection process, the model might identify that Square Footage (X1), Number of
Bedrooms (X2), and Distance to City Center (X3) are the most significant features for predicting
housing prices. These features may provide a good balance between predictive power and
simplicity.
RIDGE REGRESSION:
Ridge regression is a statistical regularization technique. It corrects for overfitting on training data in machine learning models. Ridge regression, also known as L2 regularization, is one of several types of regularization for linear regression models. Regularization is a statistical method to reduce errors caused by overfitting on training data.
 (Overfitting: overfitting occurs when a machine learning model learns the training data too closely, capturing noise and patterns that do not generalize well to new, unseen data, resulting in poor performance on test or validation sets. It is characterized by high accuracy on training data but low accuracy on new data.)
•Ridge regression is one of the types of linear regression in which a small amount of bias is introduced so that we can get better long-term predictions.
•Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
•In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the ridge regression penalty. It is calculated by multiplying lambda by the squared weight of each individual feature.
•The equation for the cost function in ridge regression is:

$$\text{Cost} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

•In the above equation, the penalty term regularizes the coefficients of the model; hence ridge regression reduces the amplitudes of the coefficients, which decreases the complexity of the model.
•As we can see from the above equation, if the value of λ tends to zero, the equation becomes the cost function of the linear regression model. Hence, for the minimum value of λ, the model will resemble the linear regression model.
•A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, ridge regression can be used.
•It also helps to solve problems where we have more parameters than samples.
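As a minimal sketch with scikit-learn, where the `alpha` parameter plays the role of λ in the cost function above (the data is synthetic):

```python
# Ridge (L2) regression: the penalty shrinks coefficient amplitudes.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=15, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)        # larger alpha -> stronger shrinkage

print("OLS coefficient magnitudes:  ", abs(ols.coef_).round(1))
print("Ridge coefficient magnitudes:", abs(ridge.coef_).round(1))
```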
Lasso regression:
•Lasso regression is another regularization technique used to reduce the complexity of the model.
•It stands for Least Absolute Shrinkage and Selection Operator.
•It is similar to ridge regression except that the penalty term contains the absolute values of the weights instead of their squares.
•Since it takes absolute values, it can shrink a slope all the way to 0, whereas ridge regression can only shrink it close to 0.
•It is also called L1 regularization. The equation for the cost function of lasso regression is:

$$\text{Cost} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
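A matching sketch for lasso, again on synthetic data; note how some coefficients are driven exactly to zero, which is the feature-selection behaviour described above:

```python
# Lasso (L1) regression: absolute-value penalty can zero out coefficients.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, noise=15, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("Coefficients:", lasso.coef_.round(2))
print("Number of zeroed-out features:", int((lasso.coef_ == 0).sum()))
```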
Difference Between Ridge Regression And Lasso Regression
•Ridge regression uses an L2 penalty (the sum of squared coefficients), while lasso regression uses an L1 penalty (the sum of absolute coefficients).
•Ridge can only shrink coefficients close to zero, whereas lasso can shrink them exactly to zero.
•Because it can zero out coefficients, lasso also performs feature selection; ridge keeps all features in the model.
Linear Discriminant Analysis:
• Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for classification tasks in machine learning. It is a technique used to find a linear combination of features that best separates the classes in a dataset.

• Linear Discriminant Analysis, also called Normal Discriminant Analysis or Discriminant Function Analysis, is a dimensionality reduction technique that is commonly used for supervised classification problems.

• It is used for modelling differences between groups, i.e. separating two or more classes. It is used to project features from a higher-dimensional space into a lower-dimensional space.
For example, suppose we have two classes and we need to separate them efficiently. Classes can have multiple features. Using only a single feature to classify them may result in some overlapping, as shown in the figure below.
EXAMPLE:

Suppose we have two sets of data points belonging to two different classes that we want to
classify. As shown in the given 2D graph, when the data points are plotted on the 2D plane,
there’s no straight line that can separate the two classes of the data points completely.
Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the 2D
graph into a 1D graph in order to maximize the separability between the two classes.
Two criteria are used by LDA to create a new axis:
1.Maximize the distance between means of the two classes.
2.Minimize the variation within each class.

[Figure: class distributions BEFORE (2D) and AFTER (1D) projection onto the new LDA axis]
• In the above graph, it can be seen that a new axis (in red) is generated and plotted in
the 2D graph such that it maximizes the distance between the means of the two
classes and minimizes the variation within each class.
• In simple terms, this newly generated axis increases the separation between the data
points of the two classes.
• After generating this new axis using the above-mentioned criteria, all the data points
of the classes are plotted on this new axis and are shown in the figure given below.
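A minimal sketch of this 2D-to-1D projection with scikit-learn, on synthetic two-class data:

```python
# LDA as both a classifier and a projection onto one discriminant axis.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=0)

lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)             # 2D points projected onto 1D axis

print("Projected shape:", X_1d.shape)      # (200, 1)
print("Training accuracy:", lda.score(X, y))
```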
Logistic Regression:
•Logistic regression is one of the most popular machine learning algorithms, and it comes under the supervised learning technique. It is used for predicting a categorical dependent variable using a given set of independent variables.

•Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value.

•It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

•Logistic regression is much like linear regression except in how the two are used. Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.

•Logistic regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification. The logistic (sigmoid) function is:

$$f(z) = \frac{1}{1 + e^{-z}}$$
Logistic Regression Equation:
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to get the logistic regression equation are given below:
•We know the equation of a straight line can be written as:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

•In logistic regression, y can be between 0 and 1 only, so let's divide the above equation by (1 - y):

$$\frac{y}{1 - y}; \quad 0 \text{ for } y = 0 \text{ and } \infty \text{ for } y = 1$$

•But we need a range between minus infinity and plus infinity; taking the logarithm of the equation, it becomes:

$$\log\left[\frac{y}{1 - y}\right] = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$
Type of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified into
three types:
•Binomial: In binomial Logistic regression, there can be only two
possible types of the dependent variables, such as 0 or 1, Pass or Fail,
etc.
•Multinomial: In multinomial Logistic regression, there can be 3 or more
possible unordered types of the dependent variable, such as "cat",
"dog", or "sheep".
•Ordinal: In ordinal Logistic regression, there can be 3 or more possible
ordered types of dependent variables, such as "low", "Medium", or
"High".
Perceptron Learning Algorithm:
The perceptron learning algorithm is a binary classification algorithm used in predictive analytics. It is a type of supervised learning in which the algorithm learns to classify inputs into two categories based on training data.

Initialization:
Initialize weights and bias to small random values.

Input Signals:
Input features are multiplied by their respective weights.
Sum these products and add a bias.

Activation Function:
Use an activation function (often a step function) to determine the output of the perceptron based
on the weighted sum.
Error Calculation:
Calculate the error by comparing the predicted output to the actual output from the training data.

Update Weights:
Adjust weights and bias based on the error using a learning rate.
This step helps the perceptron learn from its mistakes and improve its accuracy.

Iteration:
Repeat steps 2-5 for multiple iterations or until convergence.
EXAMPLE : Imagine you have a dataset of flowers with two features: petal length (x1) and petal width (x2). The
goal is to predict whether a flower belongs to species A (1) or species B (0) based on these features.

• Initialization:
Start with random weights (w1, w2) and a bias (b).

• Input Signals:
For a flower with petal length x1 and width x2, calculate z = w1*x1 + w2*x2 + b.

• Activation Function:
If z is greater than or equal to 0, predict species A (1); otherwise, predict species B (0).

• Error Calculation:
Compare the prediction with the actual species label (0 or 1) to calculate the error.

• Update Weights:
Adjust weights and bias based on the error to improve the model's accuracy.

• Iteration:
Repeat steps 2-5 for each flower in the dataset until the model converges or for a set number of iterations.
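A from-scratch sketch of these steps for the flower example; the data points, learning rate, and iteration count are all made up for illustration:

```python
# Perceptron learning rule on a tiny, linearly separable flower dataset.
import numpy as np

# Columns: petal length (x1), petal width (x2); labels: 1 = A, 0 = B.
X = np.array([[5.1, 1.8], [4.8, 1.6], [1.4, 0.2], [1.3, 0.3]])
y = np.array([1, 1, 0, 0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=2)         # small random initial weights
b = 0.0                                    # initial bias
lr = 0.1                                   # learning rate

for _ in range(20):                        # fixed number of iterations
    for xi, target in zip(X, y):
        z = w @ xi + b                     # weighted sum of inputs plus bias
        pred = 1 if z >= 0 else 0          # step activation function
        error = target - pred              # error calculation
        w += lr * error * xi               # weight update
        b += lr * error                    # bias update

print("Weights:", w.round(3), "Bias:", round(b, 3))
print("Predictions:", [int(w @ xi + b >= 0) for xi in X])
```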
TYPES OF PERCEPTRON LEARNING:
There are mainly two types of perceptron learning:
Single-Layer Perceptron (SLP):

 Consists of a single layer of output nodes that directly produce the final output.
 Primarily used for binary classification problems.
 Limited to solving linearly separable problems.

Multi-Layer Perceptron (MLP):

 Involves multiple layers of interconnected nodes, including input, hidden, and output
layers.
 Can handle more complex problems and non-linear relationships between input and
output.
 Utilizes activation functions and backpropagation for training.
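A brief sketch contrasting the two types with scikit-learn on the same synthetic data; `Perceptron` is a single-layer linear model, while `MLPClassifier` adds a hidden layer with nonlinear activations and trains by backpropagation:

```python
# Single-layer vs multi-layer perceptron on the same dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

slp = Perceptron(max_iter=100).fit(X, y)             # linear decision boundary
mlp = MLPClassifier(hidden_layer_sizes=(10,),        # one hidden layer
                    max_iter=2000, random_state=0).fit(X, y)

print("SLP training accuracy:", slp.score(X, y))
print("MLP training accuracy:", mlp.score(X, y))
```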
THANK YOU
