Logistic Regression
Prof. Kailash Singh
Department of Chemical Engineering
MNIT Jaipur
Logistic Regression
(A Classification Algorithm)
Introduction
• Logistic regression is a classification algorithm used to assign observations to a discrete set of classes.
• Some examples of classification problems are:
– Email: spam or not spam,
– Online transactions: fraud or not fraud,
– Tumor: malignant or benign.
• y = 0 or 1, where 0 is the negative class and 1 is the positive class.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function (sigmoid function), whose predictions are bounded between the two extreme values 0 and 1.
Logistic Function
• The sigmoid function is a mathematical function used to map predicted values to probabilities.
• It maps any real value to a value in the range 0 to 1. Since the output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, the curve takes an "S" shape.
• This S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which decides between class 0 and class 1: probabilities above the threshold are mapped to 1, and probabilities below it are mapped to 0 (see the short sketch below).
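A minimal sketch of the sigmoid and the thresholding step (assuming NumPy; the 0.5 cutoff is a common default, not prescribed by the slide):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
p = sigmoid(z)                    # probabilities in (0, 1)
labels = (p >= 0.5).astype(int)   # apply the threshold
print(p)        # [0.047 0.269 0.5 0.731 0.953] (rounded)
print(labels)   # [0 0 1 1 1]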
Types of Logistic Regression
• Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high". (A brief code sketch follows this list.)
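A brief sketch of the first two types in scikit-learn (an assumption: these slides use scikit-learn later; the toy data here is made up for illustration, and ordinal logistic regression is not built into scikit-learn, so it would need a separate package):

from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Binomial: two classes (0/1)
y_binary = np.array([0, 0, 0, 1, 1, 1])
LogisticRegression().fit(X, y_binary)

# Multinomial: three or more unordered classes; scikit-learn fits a
# multinomial model automatically when the target has more than two classes
y_multi = np.array(["cat", "cat", "dog", "dog", "sheep", "sheep"])
LogisticRegression().fit(X, y_multi)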
Logistic function
• The logistic function maps any real-valued combination of the independent (input) variables to a value between 0 and 1.
• The logistic function (or sigmoid function) is
$$S(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
• Odds are the ratio of the probability of success to the probability of failure:
$$\text{odds} = \frac{p}{1 - p}$$
• The log of the odds, $\ln\left(\frac{p}{1-p}\right) = z$, has range from $-\infty$ to $+\infty$.
• Solving for $p$ gives
$$p = \frac{1}{1 + e^{-z}},$$
which is the sigmoid function.
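A quick numerical check of this relationship (assuming NumPy; p = 0.8 is an arbitrary example value):

import numpy as np

p = 0.8
odds = p / (1 - p)                # 4.0
z = np.log(odds)                  # log-odds (logit) ≈ 1.386
p_back = 1 / (1 + np.exp(-z))     # sigmoid recovers p = 0.8
print(odds, z, p_back)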
Cost Function
• The following cost function for one data point may be defined:
$$J = \begin{cases} -\log(S) & \text{if } y = 1 \\ -\log(1 - S) & \text{if } y = 0 \end{cases}$$
• The cost $J = -\log(S)$ approaches 0 if we correctly predict that y belongs to class 1 (i.e., as $S \to 1$).
• Similarly, the cost $J = -\log(1 - S)$ approaches 0 if we correctly predict y = 0 (i.e., as $S \to 0$).
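A small sketch of this behaviour (assuming NumPy; the S values are arbitrary illustrative probabilities):

import numpy as np

S = np.array([0.01, 0.5, 0.9, 0.99])   # predicted P(y = 1)
print(-np.log(S))        # cost when y = 1: large for small S, → 0 as S → 1
print(-np.log(1 - S))    # cost when y = 0: → 0 as S → 0, large as S → 1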
Overall Cost Function
• The overall cost function is the average of the per-point costs:
$$J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(S_i) + (1 - y_i) \log(1 - S_i) \right]$$
• Use the gradient descent method to minimize the cost function:
$$\beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j}$$
Or,
$$\beta_j := \beta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (S_i - y_i)\, x_{ij}$$
Note that $x_{ij} = 1$ for $j = 0$; $\alpha$ is the learning rate.
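A minimal one-step sketch of this update rule (assuming NumPy; the toy data is made up, and the full training loop appears in the program later in these slides):

import numpy as np

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])  # first column of 1s for beta_0
y = np.array([0.0, 0.0, 1.0])
beta = np.zeros(2)
alpha = 0.1                        # learning rate

S = 1 / (1 + np.exp(-X @ beta))    # current predictions
grad = X.T @ (S - y) / len(y)      # gradient of J with respect to beta
beta -= alpha * grad               # one gradient-descent step
print(beta)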
A Schematic of Logistic Regression
Confusion matrix
• A confusion matrix summarizes the performance of an ML classification model on a set of test data.
• It is a means of displaying the number of accurate and inaccurate instances based on the model's predictions.
• The matrix displays the number of instances produced by the model on the test data as follows:
– True Positive (TP): The model correctly predicted a positive
outcome.
– True Negative (TN): The model correctly predicted a negative
outcome.
– False Positive (FP): The model incorrectly predicted a positive outcome. Also known as a Type I error.
– False Negative (FN): The model incorrectly predicted a negative
outcome. Also known as a Type II error.
The confusion matrix is a 2x2 matrix:

                  Predicted Positive   Predicted Negative
Actual Positive          TP                   FN
Actual Negative          FP                   TN

TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative.

The confusion matrix helps us understand whether the model is performing as expected. It also helps identify which classes of data are most often misclassified.
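A short sketch of computing a confusion matrix with scikit-learn (the labels here are made-up illustrative data):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
# scikit-learn orders binary labels as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)                # [[3 1] [1 3]]
print(tp, tn, fp, fn)    # 3 3 1 1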
Accuracy, Precision, Recall
• Accuracy is used to measure the performance of the model. It is the ratio of total correct instances to the total instances:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision is a measure of how accurate a model's positive predictions are:
$$\text{Precision} = \frac{TP}{TP + FP}$$
• Recall measures the effectiveness of a classification model in identifying all relevant instances from a dataset:
$$\text{Recall} = \frac{TP}{TP + FN}$$
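These formulas applied to the counts from the sketch above (tp = 3, tn = 3, fp = 1, fn = 1 are illustrative values, not data from the slides):

tp, tn, fp, fn = 3, 3, 1, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.75
precision = tp / (tp + fp)                   # 0.75
recall = tp / (tp + fn)                      # 0.75
print(accuracy, precision, recall)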
F1-Score and Specificity
• The F1-score is used to evaluate the overall performance of a classification model. It is the harmonic mean of precision (P) and recall (R):
$$\text{F1-Score} = \frac{2PR}{P + R}$$
• Specificity measures the ability of a model to correctly identify negative instances. It is also known as the True Negative Rate:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
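Continuing with the same illustrative counts:

precision, recall = 0.75, 0.75
tn, fp = 3, 1

f1 = 2 * precision * recall / (precision + recall)   # 0.75
specificity = tn / (tn + fp)                         # 0.75
print(f1, specificity)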
Type 1 and Type 2 errors
• A Type 1 error occurs when the model predicts a positive instance, but it is actually negative (a false positive).
• A Type 2 error occurs when the model fails to predict a positive instance, i.e., it predicts negative when the instance is actually positive (a false negative).
Jaccard Index
The Jaccard index, also known as the Jaccard similarity coefficient, is defined as the size of the intersection divided by the size of the union of two label sets:
$$J(y, \hat{y}) = \frac{|y \cap \hat{y}|}{|y \cup \hat{y}|} = \frac{|y \cap \hat{y}|}{|y| + |\hat{y}| - |y \cap \hat{y}|}$$
Suppose the true data set is y = [1, 0, 1, 1, 0, 1] and the predicted data set is ŷ = [0, 0, 1, 1, 0, 0].
Four values in the predicted data set match the actual data, so $|y \cap \hat{y}| = 4$.
Therefore, the Jaccard index is calculated as follows:
$$J(y, \hat{y}) = \frac{4}{6 + 6 - 4} = \frac{4}{8} = 0.5$$
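The same calculation in code (assuming NumPy and scikit-learn; note that scikit-learn's jaccard_score computes the index over the positive class only by default, which also happens to give 0.5 for this data):

import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 0, 0])

matches = np.sum(y_true == y_pred)                          # 4
jaccard = matches / (len(y_true) + len(y_pred) - matches)   # 4/8 = 0.5
print(jaccard)
print(jaccard_score(y_true, y_pred))                        # positive-class version: 2/4 = 0.5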
Example
For the given data, fit a logistic regression model to predict heart disease. Then predict the heart disease of a person with Age 58 years, BP 135, and cholesterol 230.

Age   Blood Pressure   Cholesterol   Heart Disease
45         120             200             0
50         130             220             1
60         140             240             1
55         125             210             0
58         135             230             1
48         122             205             0
62         145             250             1
53         128             215             0
59         138             235             1
46         121             202             0
51         132             225             1
65         150             260             1
57         134             228             0
49         124             208             0
63         142             245             1
56         131             223             0
61         139             238             1
Python Code for Logistic Regression

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize_parameters(n):
    weights = np.zeros(n)
    bias = 0
    return weights, bias

def compute_cost(y, y_predicted):
    m = y.shape[0]
    cost = -(1 / m) * np.sum(y * np.log(y_predicted) + (1 - y) * np.log(1 - y_predicted))
    return cost

def gradient_descent(X, y, weights, bias, learning_rate, num_iterations):
    m = X.shape[0]
    for i in range(num_iterations):
        linear_model = np.dot(X, weights) + bias
        y_predicted = sigmoid(linear_model)
        dw = (1 / m) * np.dot(X.T, (y_predicted - y))
        db = (1 / m) * np.sum(y_predicted - y)
        weights -= learning_rate * dw
        bias -= learning_rate * db
        if i % 100 == 0:
            cost = compute_cost(y, y_predicted)
            print(f"Iteration {i}, Cost: {cost}")
    return weights, bias

def predict(X, weights, bias):
    linear_model = np.dot(X, weights) + bias
    y_predicted = sigmoid(linear_model)
    y_predicted_class = [1 if i > 0.5 else 0 for i in y_predicted]
    return np.array(y_predicted_class)

def scaled(a):
    # Min-max scaling to the range [0, 1]
    return (a - a.min()) / (max(a) - min(a))
Contd…

# Example usage
if __name__ == "__main__":
    # Sample data
    Age = np.array([45, 50, 60, 55, 58, 48, 62, 53, 59, 46, 51, 65, 57, 49, 63, 56, 61])
    BP = np.array([120, 130, 140, 125, 135, 122, 145, 128, 138, 121, 132, 150, 134, 124, 142, 131, 139])
    Cholesterol = np.array([200, 220, 240, 210, 230, 205, 250, 215, 235, 202, 225, 260, 228, 208, 245, 223, 238])
    Heart_Disease = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
    # Scale input data...
    Age = scaled(Age); BP = scaled(BP); Cholesterol = scaled(Cholesterol)
    X = np.array([Age, BP, Cholesterol]).T
    y = Heart_Disease
    # Initialize parameters
    weights, bias = initialize_parameters(X.shape[1])
    # Train the model
    learning_rate = 0.1
    num_iterations = 1000
    weights, bias = gradient_descent(X, y, weights, bias, learning_rate, num_iterations)
    # Make predictions
    yp = predict(X, weights, bias)
    print("y :", y)
    print("yp:", yp)
    print("%Accuracy:", 100 * sum(y == yp) / len(y))
Python Code using sklearn

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data
data = pd.DataFrame({
    'Age': [45, 50, 60, 55, 58, 48, 62, 53, 59, 46, 51, 65, 57, 49, 63, 56, 61],
    'Blood Pressure': [120, 130, 140, 125, 135, 122, 145, 128, 138, 121, 132, 150, 134, 124, 142, 131, 139],
    'Cholesterol': [200, 220, 240, 210, 230, 205, 250, 215, 235, 202, 225, 260, 228, 208, 245, 223, 238],
    'Heart Disease': [0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
})
# Split the data into features (X) and target (y)
X = data[['Age', 'Blood Pressure', 'Cholesterol']]
y = data['Heart Disease']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))
# Example prediction
example = pd.DataFrame({'Age': [58], 'Blood Pressure': [135], 'Cholesterol': [230]})
probability = model.predict_proba(example)[0, 1]   # P(class 1) for the new person
print("Predicted probability of heart disease:", probability)
Python Code using class

# Code for Logistic Regression:
import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.03, num_iterations=100):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        # Initialize parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Gradient descent
        for _ in range(self.num_iterations):
            linear_model = np.dot(X, self.weights) + self.bias
            y_predicted = self.sigmoid(linear_model)
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        linear_model = np.dot(X, self.weights) + self.bias
        y_predicted = self.sigmoid(linear_model)
        y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]
        return y_predicted_cls

def scaled(a):
    # Min-max scaling to the range [0, 1]
    return (a - a.min()) / (max(a) - min(a))

# Main program
Age = np.array([45, 50, 60, 55, 58, 48, 62, 53, 59, 46, 51, 65, 57, 49, 63, 56, 61])
BP = np.array([120, 130, 140, 125, 135, 122, 145, 128, 138, 121, 132, 150, 134, 124, 142, 131, 139])
Cholesterol = np.array([200, 220, 240, 210, 230, 205, 250, 215, 235, 202, 225, 260, 228, 208, 245, 223, 238])
Heart_Disease = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1])

# Scale input data...
Age = scaled(Age); BP = scaled(BP); Cholesterol = scaled(Cholesterol)
X = np.array([Age, BP, Cholesterol]).T
y = Heart_Disease

model = LogisticRegression()
model.fit(X, y)
yp = model.predict(X)
print("Actual :", y)
print("Predictions:", np.array(yp))
Accuracy = 100 * sum(y == yp) / len(y)
print(f"Accuracy={Accuracy:.2f}%")
Problem
Suppose we want to predict whether a student will pass (1) or fail (0) an exam based on the number of hours they studied. Find the probability that a student who studied 3 hours will pass.

Hours Studied   Passed/Failed
1               0
2               0
3               0
4               1
5               1
6               1
Solution
The probability of passing the exam is given by the sigmoid function:
$$p = \frac{1}{1 + e^{-z}}$$
where $z = b_0 + b_1 x$ and $x$ is the number of hours studied.
Using the Python program: $b_0 = -4$ and $b_1 = 1$.
Hence, for $x = 3$: $z = -4 + 1 \times 3 = -1$.
The probability of passing is $p = \frac{1}{1 + e^{1}} \approx 0.27$.
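A quick check of this number (assuming NumPy):

import numpy as np

b0, b1 = -4.0, 1.0                     # parameters reported on the slide
x = 3                                  # hours studied
p = 1 / (1 + np.exp(-(b0 + b1 * x)))
print(p)                               # ≈ 0.2689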