07 Logistics Regression

Uploaded by

BALI RAM
Logistic Regression

Prof. Kailash Singh


Department of Chemical Engineering
MNIT Jaipur

Logistic Regression
(A Classification Algorithm)
Introduction

• Logistic regression is a classification algorithm used to assign observations to a discrete set of classes.
• Some examples of classification problems are:
– Email: spam or not spam,
– Online transactions: fraud or not fraud,
– Tumor: malignant or benign.
• y = 0 or 1. Here 0: negative class, 1: positive class.
• In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function (sigmoid function), whose output lies between the two limiting values 0 and 1.
Logistic Function

• The sigmoid function is a mathematical function used to map the predicted values to probabilities.
• It maps any real value to a value between 0 and 1. Because the output of logistic regression cannot go beyond these limits, the function forms an “S”-shaped curve.
• The S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use a threshold value to convert the predicted probability into a class: values above the threshold are assigned to class 1, and values below the threshold are assigned to class 0.
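
The mapping and thresholding described above can be sketched in a few lines. This is a minimal illustration assuming NumPy and a threshold of 0.5 (the slide does not fix a threshold value); the names `sigmoid` and `classify` are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real value (or array of values) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def classify(z, threshold=0.5):
    """Assign class 1 where the sigmoid output exceeds the threshold, else class 0."""
    return (sigmoid(z) > threshold).astype(int)

# Illustrative inputs spanning negative and positive values
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = sigmoid(z)
labels = classify(z)

print("probabilities:", probs)
print("classes      :", labels)
```

Large negative inputs give probabilities near 0 and large positive inputs give probabilities near 1, which is the “S” shape.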
Types of Logistic Regression

• Binomial: In binomial logistic regression, the dependent variable can take only two possible values, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial logistic regression, the dependent variable can take 3 or more unordered values, such as “cat”, “dog”, or “sheep”.
• Ordinal: In ordinal logistic regression, the dependent variable can take 3 or more ordered values, such as “Low”, “Medium”, or “High”.
Logistic function

• The logistic function maps any real-valued input, formed from the set of independent variables, into a value between 0 and 1.
• The logistic function (or sigmoid function) is $S(z) = \frac{1}{1 + e^{-z}}$.
• Odds are the ratio of the probability of success to the probability of failure: $\text{odds} = \frac{p}{1 - p}$.
• The log of the odds, $z = \ln\frac{p}{1 - p}$, has range from −∞ to +∞.
• So $p = \frac{1}{1 + e^{-z}}$, which is the sigmoid function.
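
The relation between odds, log-odds, and the sigmoid function can be checked numerically. A minimal sketch using only the standard library; the function names and the value p = 0.8 are illustrative assumptions, not part of the slides.

```python
import math

def odds(p):
    """Ratio of the probability of success to the probability of failure."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds; ranges over all real numbers for p in (0, 1)."""
    return math.log(odds(p))

def sigmoid(z):
    """Inverse of the logit: recovers the probability from the log-odds."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8                      # illustrative probability of success
z = logit(p)

print("odds     :", odds(p))
print("log-odds :", z)
print("sigmoid  :", sigmoid(z))   # should return the original p
```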
Cost Function

• The following cost function for one data point may be defined:
$J = -\log(S)$ if $y = 1$, and $J = -\log(1 - S)$ if $y = 0$, where $S$ is the sigmoid output.
• The cost ($J = -\log(S)$) approaches 0 if we correctly predict that y belongs to class 1, i.e. as $S$ approaches 1.
• Similarly, the cost ($J = -\log(1 - S)$) approaches 0 if we correctly predict y = 0, i.e. as $S$ approaches 0.
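
To see how the per-point cost behaves, the sketch below evaluates it for a confident, an uncertain, and a wrong prediction. It assumes the cost is −log(S) for y = 1 and −log(1 − S) for y = 0; the name `point_cost` and the sample values of S are illustrative.

```python
import math

def point_cost(y, s):
    """Cost for one data point: -log(s) if y == 1, -log(1 - s) if y == 0."""
    return -math.log(s) if y == 1 else -math.log(1.0 - s)

# s is the sigmoid output, i.e. the predicted probability of class 1
for s in (0.99, 0.5, 0.01):
    print(f"S={s}: cost if y=1 is {point_cost(1, s):.4f}, "
          f"cost if y=0 is {point_cost(0, s):.4f}")
```

A correct, confident prediction costs almost nothing, while a confident wrong prediction is penalized heavily.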
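Overall Cost Function

• The overall cost function combines the two cases into one expression and averages over the $m$ data points:
$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(S_i) + (1 - y_i) \log(1 - S_i) \right]$
• Use the gradient descent method to minimize the cost function:
$b_j := b_j - \alpha \frac{\partial J}{\partial b_j}$
Or,
$b_j := b_j - \frac{\alpha}{m} \sum_{i=1}^{m} (S_i - y_i) x_{ij}$
Note that $x_{ij} = 1$ for $j = 0$. $\alpha$ is the learning rate.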
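
The gradient used in the update rule can be sanity-checked against a finite-difference approximation of the cost. The sketch below assumes NumPy, the averaged cross-entropy cost, and a small made-up data set whose first column is all ones (so that x_i0 = 1); all names and values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(b, X, y):
    """Average cross-entropy cost over the m data points."""
    s = sigmoid(X @ b)
    return -np.mean(y * np.log(s) + (1 - y) * np.log(1 - s))

def gradient(b, X, y):
    """Analytic gradient: (1/m) * sum_i (S_i - y_i) * x_ij for each j."""
    s = sigmoid(X @ b)
    return X.T @ (s - y) / len(y)

# Hypothetical toy data; the column of ones plays the role of x_i0 = 1
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
b = np.zeros(2)
alpha = 0.1   # learning rate

# Central finite differences, one parameter at a time
eps = 1e-6
numeric = np.array([
    (cost(b + eps * e, X, y) - cost(b - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
analytic = gradient(b, X, y)

# One gradient descent step should lower the cost for a small learning rate
cost_before = cost(b, X, y)
b_new = b - alpha * analytic
cost_after = cost(b_new, X, y)

print("numeric gradient :", numeric)
print("analytic gradient:", analytic)
print("cost before/after:", cost_before, cost_after)
```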
A Schematic of Logistic Regression
Confusion matrix

• A confusion matrix is a matrix that summarizes the performance of an ML classification model on a set of test data.
• It is a means of displaying the number of accurate and inaccurate instances based on the model’s predictions.
• The matrix displays the number of instances produced by the model on the test data as follows:
– True Positive (TP): The model correctly predicted a positive outcome.
– True Negative (TN): The model correctly predicted a negative outcome.
– False Positive (FP): The model incorrectly predicted a positive outcome. Also known as a Type I error.
– False Negative (FN): The model incorrectly predicted a negative outcome. Also known as a Type II error.

For binary classification, the confusion matrix is a 2x2 matrix:
TP: True Positive
FP: False Positive
FN: False Negative
TN: True Negative
The confusion matrix helps us understand whether the model is performing as expected. It also helps identify which classes of data are most often misclassified.
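
The four counts can be tallied directly from paired true and predicted labels. A minimal sketch in plain Python; the function name `confusion_counts` and the label lists are made up for illustration, with 1 taken as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical labels for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```

The four counts always add up to the number of test instances.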
Accuracy, Precision, Recall

• Accuracy is used to measure the performance of the model. It is the ratio of total correct instances to the total instances:
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
• Precision is a measure of how accurate a model’s positive predictions are:
$\text{Precision} = \frac{TP}{TP + FP}$
• Recall measures the effectiveness of a classification model in identifying all relevant instances from a dataset:
$\text{Recall} = \frac{TP}{TP + FN}$
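
As a worked illustration, the three measures can be computed from the confusion-matrix counts using their standard definitions. The counts below are hypothetical and chosen only so the three values differ.

```python
# Hypothetical confusion-matrix counts for 100 test instances
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)   # share of all predictions that are correct
precision = TP / (TP + FP)                   # share of predicted positives that are correct
recall = TP / (TP + FN)                      # share of actual positives that are found

print(f"Accuracy : {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall   : {recall:.3f}")
```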
F1-Score and Specificity

• F1-score is used to evaluate the overall performance of a classification model. It is the harmonic mean of precision (P) and recall (R):
$\text{F1-Score} = \frac{2PR}{P + R}$
• Specificity measures the ability of a model to correctly identify negative instances. It is also known as the True Negative Rate:
$\text{Specificity} = \frac{TN}{TN + FP}$
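
A short sketch of both measures, assuming the standard definitions F1 = 2PR/(P + R) and specificity = TN/(TN + FP). The counts are hypothetical.

```python
# Hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 45, 5, 10

P = TP / (TP + FP)            # precision
R = TP / (TP + FN)            # recall

f1 = 2 * P * R / (P + R)      # harmonic mean of precision and recall
specificity = TN / (TN + FP)  # true negative rate

print(f"F1-score   : {f1:.4f}")
print(f"Specificity: {specificity:.4f}")
```

Because it is a harmonic mean, the F1-score lies between P and R and sits closer to the smaller of the two.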
Type 1 and Type 2 errors

• Type 1 error occurs when the model predicts a positive instance, but it is actually negative (a false positive).
Type 1 error rate = $\frac{FP}{FP + TN}$
• Type 2 error occurs when the model fails to predict a positive instance (a false negative).
Type 2 error rate = $\frac{FN}{FN + TP}$
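
Assuming the Type 1 and Type 2 error rates are measured as FP/(FP + TN) and FN/(FN + TP), they are the complements of specificity and recall respectively. The sketch below checks this on hypothetical counts.

```python
# Hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 45, 5, 10

type1_rate = FP / (FP + TN)   # actual negatives wrongly predicted as positive
type2_rate = FN / (FN + TP)   # actual positives the model failed to predict

specificity = TN / (TN + FP)
recall = TP / (TP + FN)

print(f"Type 1 error rate: {type1_rate:.2f} (1 - specificity = {1 - specificity:.2f})")
print(f"Type 2 error rate: {type2_rate:.2f} (1 - recall      = {1 - recall:.2f})")
```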
Jaccard Index
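
Jaccard Index, also known as the Jaccard Similarity Coefficient, is defined as the size of the intersection divided by the size of the union of two label sets:
$J(y, \hat{y}) = \frac{|y \cap \hat{y}|}{|y \cup \hat{y}|} = \frac{|y \cap \hat{y}|}{|y| + |\hat{y}| - |y \cap \hat{y}|}$

Suppose the true data set is $y$ = [1,0,1,1,0,1]
and the predicted data set is $\hat{y}$ = [0,0,1,1,0,0].
So 4 values in the predicted data set match the actual data. Hence, $|y \cap \hat{y}| = 4$ and $|y \cup \hat{y}| = 6 + 6 - 4 = 8$.
Therefore, the Jaccard index is calculated as follows:
$J(y, \hat{y}) = \frac{4}{8} = 0.5$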
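
The same calculation can be sketched in Python. It follows the counting used on this slide, where the intersection is the number of positions at which the predicted label matches the true label; the function name `jaccard_index` is illustrative.

```python
def jaccard_index(y_true, y_pred):
    """Matching positions divided by |y_true| + |y_pred| - matching positions."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    union = len(y_true) + len(y_pred) - matches
    return matches / union

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 0]

j = jaccard_index(y_true, y_pred)
print("Jaccard index:", j)
```

Library implementations such as scikit-learn's `jaccard_score` define the index per class from TP, FP and FN, so in general they need not agree with this position-matching count.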
Example

For the given data, fit a Logistic Regression model to predict heart disease. Predict the heart disease of a person with Age 58 years, BP 135, cholesterol 230.

Age   Blood Pressure   Cholesterol   Heart Disease
45    120              200           0
50    130              220           1
60    140              240           1
55    125              210           0
58    135              230           1
48    122              205           0
62    145              250           1
53    128              215           0
59    138              235           1
46    121              202           0
51    132              225           1
65    150              260           1
57    134              228           0
49    124              208           0
63    142              245           1
56    131              223           0
61    139              238           1
Python Code for Logistic Regression
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z into (0, 1)
    return 1 / (1 + np.exp(-z))

def initialize_parameters(n):
    # Start from zero weights (one per feature) and zero bias
    weights = np.zeros(n)
    bias = 0
    return weights, bias

def compute_cost(y, y_predicted):
    # Average cross-entropy cost over the m data points
    m = y.shape[0]
    cost = -(1 / m) * np.sum(y * np.log(y_predicted) + (1 - y) * np.log(1 - y_predicted))
    return cost

def gradient_descent(X, y, weights, bias, learning_rate, num_iterations):
    m = X.shape[0]
    for i in range(num_iterations):
        linear_model = np.dot(X, weights) + bias
        y_predicted = sigmoid(linear_model)

        # Gradients of the cost with respect to weights and bias
        dw = (1 / m) * np.dot(X.T, (y_predicted - y))
        db = (1 / m) * np.sum(y_predicted - y)

        weights -= learning_rate * dw
        bias -= learning_rate * db

        if i % 100 == 0:
            cost = compute_cost(y, y_predicted)
            print(f"Iteration {i}, Cost: {cost}")

    return weights, bias

def predict(X, weights, bias):
    linear_model = np.dot(X, weights) + bias
    y_predicted = sigmoid(linear_model)
    # Threshold the probabilities at 0.5 to get class labels
    y_predicted_class = [1 if i > 0.5 else 0 for i in y_predicted]
    return np.array(y_predicted_class)

def scaled(a):
    # Min-max scaling to [0, 1] (numerator assumed to be a - a.min())
    return (a - a.min())/(max(a)-min(a))
Contd…

# Example usage
if __name__ == "__main__":
    # Sample data
    Age=np.array([45, 50, 60, 55, 58, 48, 62, 53, 59, 46, 51, 65, 57, 49, 63, 56, 61])
    BP=np.array([120, 130, 140, 125, 135, 122, 145, 128, 138, 121, 132, 150, 134, 124, 142,
                 131, 139])
    Cholesterol=np.array([200, 220, 240, 210, 230, 205, 250, 215, 235, 202, 225, 260, 228,
                          208, 245, 223, 238])
    Heart_Disease=np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
    #Scale input data...
    Age=scaled(Age); BP=scaled(BP); Cholesterol=scaled(Cholesterol)

    # One row per person, one column per feature
    X=np.array([Age,BP,Cholesterol]).T
    y=Heart_Disease

    # Initialize parameters
    weights, bias = initialize_parameters(X.shape[1])

    # Train the model
    learning_rate = 0.1
    num_iterations = 1000
    weights, bias = gradient_descent(X, y, weights, bias, learning_rate, num_iterations)

    # Make predictions
    yp = predict(X, weights, bias)
    print("y :",y)
    print("yp:", yp)
    print("%Accuracy:",100*sum(y==yp)/len(y))
Python Code using sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data
data = pd.DataFrame({
    'Age': [45, 50, 60, 55, 58, 48, 62, 53, 59, 46, 51, 65, 57, 49, 63, 56, 61],
    'Blood Pressure': [120, 130, 140, 125, 135, 122, 145, 128, 138, 121, 132, 150, 134, 124, 142, 131, 139],
    'Cholesterol': [200, 220, 240, 210, 230, 205, 250, 215, 235, 202, 225, 260, 228, 208, 245, 223, 238],
    'Heart Disease': [0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
})

# Split the data into features (X) and target (y)
X = data[['Age', 'Blood Pressure', 'Cholesterol']]
y = data['Heart Disease']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model, train it, and evaluate it on the test set
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))

# Example prediction
example = pd.DataFrame({'Age': [58], 'Blood Pressure': [135], 'Cholesterol': [230]})
# predict_proba is assumed here because the printed label is a probability;
# column 1 holds the probability of class 1 (heart disease)
prediction = model.predict_proba(example)[:, 1]
print("Predicted probability of heart disease:", prediction)
Python Code using class

#Code for Logistic Regression:
import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.03, num_iterations=100):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        # Initialize parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Gradient descent
        for _ in range(self.num_iterations):
            linear_model = np.dot(X, self.weights) + self.bias
            y_predicted = self.sigmoid(linear_model)
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        linear_model = np.dot(X, self.weights) + self.bias
        y_predicted = self.sigmoid(linear_model)
        # Threshold the probabilities at 0.5 to get class labels
        y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]
        return y_predicted_cls

def scaled(a):
    # Min-max scaling to [0, 1] (numerator assumed to be a - a.min())
    return (a - a.min())/(max(a)-min(a))

# Main Prog……
Age=np.array([45, 50, 60, 55, 58, 48, 62, 53, 59, 46, 51, 65,
              57, 49, 63, 56, 61])
BP=np.array([120, 130, 140, 125, 135, 122, 145, 128, 138,
             121, 132, 150, 134, 124, 142, 131, 139])
Cholesterol=np.array([200, 220, 240, 210, 230, 205, 250,
                      215, 235, 202, 225, 260, 228, 208, 245, 223, 238])
Heart_Disease=np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0,
                        1, 0, 1])
#Scale input data...
Age=scaled(Age); BP=scaled(BP);
Cholesterol=scaled(Cholesterol)

X=np.array([Age,BP,Cholesterol]).T
y=Heart_Disease

model = LogisticRegression()
model.fit(X, y)

yp = model.predict(X)
print("Actual :",y)
print("Predictions:", np.array(yp))

Accuracy=100*sum(y==yp)/len(y)
print(f"Accuracy={Accuracy:.2f}%")
Problem

Suppose we want to predict whether a student will pass (1) or fail (0) an exam based on the number of hours they studied.
Find the probability that a student who studied 3 hours will pass.

Hours Studied   Passed / Failed
1               0
2               0
3               0
4               1
5               1
6               1
Solution
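
The probability of passing the exam is given by the sigmoid function:
$S(z) = \frac{1}{1 + e^{-z}}$
where $z = b_0 + b_1 x$ and $x$ is the number of hours studied.

Using Python program: $b_0 = -4$ and $b_1 = 1$.
Hence, for $x = 3$: $z = -4 + 1 \times 3 = -1$.
Probability of passing is $S(-1) = \frac{1}{1 + e^{1}} \approx 0.269$.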
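
The final step can be reproduced in a few lines, taking b0 = -4 and b1 = 1 as given in the solution. The function name `pass_probability` is illustrative.

```python
import math

# Coefficients as stated in the solution
b0, b1 = -4.0, 1.0

def pass_probability(hours):
    """Sigmoid of the linear model z = b0 + b1 * hours."""
    z = b0 + b1 * hours
    return 1.0 / (1.0 + math.exp(-z))

p3 = pass_probability(3)
print(f"Probability of passing after 3 hours: {p3:.4f}")
```

With these coefficients the probability crosses 0.5 at exactly 4 hours, which is where the data switches from fail to pass.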
