Logistic Regression
Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent
variable (a dependent variable that can take on a limited number of categories) based on one or more predictor
variables (continuous, ordinal or categorical).
In Binary Logistic Regression the outcome or dependent variable is binary or dichotomous, i.e. 0 or 1
Examples:
a customer will churn (1) or not (0)
a customer will respond to a campaign (1) or not (0)
should we grant a loan to a particular person (1) or not (0)
Logistic regression measures the relationship between a categorical dependent variable and one or more
independent variables by converting the dependent variable into a probability score (P)
The probability score signifies the probability of the event happening, for example the probability that a customer
will churn or respond to a campaign
How is Logistic Regression different from Linear Regression?
In Linear Regression, the outcome variable is continuous and the predictor variables can be a mix of numeric and
categorical. But often there are situations where we wish to evaluate the effects of multiple explanatory variables on
a binary outcome variable
For example, the effects of a number of factors on the development or otherwise of a disease. A patient may be
cured or not; a prospect may respond or not; a loan may be granted to a particular person or not; etc.
When the outcome or dependent variable is binary, and we wish to measure the effects of several independent
variables on it, we use Logistic Regression
The probability for each observation does not vary linearly with the predictors; it follows a sigmoid (S-shaped)
curve, so the predicted values stay between 0 and 1 and tend to lie close to those two extremes.
The binary outcome variable can be coded as 0 or 1.
The logistic curve (sigmoid function) is shown in the figure below:
Concept of Sigmoid Function in Logistic Regression
The sigmoid function is a bounded function: its values always lie between 0 and 1
p = 1 / (1 + e^-(a + bx))
If b is +ve, the curve rises from 0 towards 1 as x increases
If b is -ve, the shape of the curve is reversed
If we use linear regression, the predicted value can become greater than one or less than zero.
Basically, Y is a random variable having a 0 or 1 outcome, which is a Bernoulli random variable.
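As a rough illustration of why the sigmoid is used, the R snippet below compares linear predictions, which can fall outside the 0 to 1 range, with sigmoid predictions, which cannot. The coefficient values a and b here are made up purely for the example:

a <- -1; b <- 0.8                         # illustrative coefficients (assumed, not from any data)
x <- seq(-10, 10, by = 0.5)
linear  <- a + b * x                      # linear predictions: can go below 0 or above 1
sigmoid <- 1 / (1 + exp(-(a + b * x)))    # sigmoid predictions: always between 0 and 1
range(linear)
range(sigmoid)
plot(x, sigmoid, type = "l", ylab = "p")  # S-shaped curve; a negative b reverses the shape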
Log of odds:
ln(p / (1 - p)) = a + bx
This is also called the logit function
The estimation of the parameters is done using Maximum Likelihood Estimation (for the non-linear
distribution), unlike Linear Regression, where the method of Ordinary Least Squares is used.
Odds Ratio
Odds is calculated as P(Y = 1)/P(Y = 0)
Odds > 1 if Y = 1 is more likely
Odds < 1 if Y = 0 is more likely
The log of odds is called the logit, and its right-hand side looks like a linear regression equation
The bigger the logit, the bigger P(Y = 1)
Quick Question 1
Suppose the coefficients of a logistic regression model with two
independent variables are as follows:
β0 = -1.5, β1 = 3, β2 = -0.5
And we have an observation with the following values of independent
variables:
x1 = 1, x2 = 5
What is the value of the Logit for this observation? Recall that the Logit is
log(Odds)
What is the value of the Odds for this observation? Note that you can
compute e^x, for some number x, in your R console by typing exp(x). The
function exp() computes the exponential of its argument
What is the value of P(y = 1) for this observation?
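You can check the arithmetic in the R console using the coefficients and observation values given above:

b0 <- -1.5; b1 <- 3; b2 <- -0.5      # coefficients from the question
x1 <- 1; x2 <- 5                     # observation values
logit <- b0 + b1 * x1 + b2 * x2      # Logit = log(Odds)
odds  <- exp(logit)                  # Odds = e^Logit
p     <- odds / (1 + odds)           # P(y = 1) = Odds / (1 + Odds)
c(logit = logit, odds = odds, p = p)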
Applications of logistic regression in business
Response Model: response to an e-mail campaign
Conversion Model: subscriber conversion after a campaign
Attrition Model: churning of subscribers
Application Risk Model: finding credit card defaulters by demography and behavior
Behavioral Risk Model: finding the probability of loan defaults
Cross Sell / Up Sell Model: finding parameters that boost cross sell and up sell
These are just a few of them
Logistic Process
THE FRAMINGHAM HEART STUDY
Evaluating Risk Factors to Save Lives
Misconceptions in the first half of the 20th century about blood pressure:
High blood pressure, dubbed hypertension, was considered important for forcing blood through the arteries, and
lowering blood pressure was considered harmful
In the late 1940s, the US government set out to better understand
cardiovascular disease
The plan was to track a large cohort of initially healthy patients over
their lifetimes
The city of Framingham, Massachusetts, was chosen as the site for the study
Appropriate Size
Stable population
5209 patients aged 30 – 59 enrolled
Patients were given a questionnaire and an exam every 2 years:
Physical characteristics
Behavioral characteristics
THE FRAMINGHAM HEART STUDY Contd..
We use an anonymized version of the original data that was collected
Includes several demographic risk factors:
the sex of the patient, male or female;
the age of the patient in years;
the education level coded as either 1 for some high school, 2 for a
high school diploma or GED, 3 for some college or vocational
school, and 4 for a college degree.
Includes behavioral risk factors:
Does the patient smoke (yes/no)
Medical history - blood pressure medication, previously had a
stroke, hypertensive or not, diabetic or not
Includes risk factors from physical examination:
Cholesterol level
Systolic/diastolic blood pressure
Body mass index
Heart rate
Blood glucose level
THE FRAMINGHAM HEART STUDY Contd..
Building the model
Split the data randomly into training and testing sets
Use logistic regression to predict whether or not a patient experienced
Coronary Heart Disease (CHD) within 10 years of the first examination
After building the model, we will evaluate the predictive power of the
model on the test set, as sketched below
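A minimal sketch of these steps in R. The file name framingham.csv, the outcome column TenYearCHD and the 65/35 split are assumptions for illustration; the exact names depend on the version of the data you are given:

framingham <- read.csv("framingham.csv")   # assumed file name
set.seed(88)                               # make the random split reproducible
train_rows <- sample(nrow(framingham), size = floor(0.65 * nrow(framingham)))
train <- framingham[train_rows, ]
test  <- framingham[-train_rows, ]

# family = binomial tells glm() to fit a logistic regression model
chd_model <- glm(TenYearCHD ~ ., data = train, family = binomial)
summary(chd_model)

# Predicted probabilities of CHD within 10 years, for the test set
pred_test <- predict(chd_model, newdata = test, type = "response")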
Threshold Value
The outcome of a logistic regression model is a probability
Often, we want to make a binary prediction – whether this person will
suffer from CHD or not
We can do this using a threshold value t
If P(CHD = 1) ≥ t, predict CHD
If P(CHD = 1) < t, predict healthy
What value should we pick?
Often selected based on which type of error is more costly
Confusion Matrix
Compare actual outcomes to predicted outcomes using a confusion
matrix (classification matrix)
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
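Continuing the sketch above (pred_test and test$TenYearCHD are the assumed names from the earlier snippet), the confusion matrix for a threshold of, say, t = 0.5 can be built with table():

t <- 0.5
conf <- table(actual = test$TenYearCHD, predicted = pred_test >= t)
conf
# Provided both predicted classes occur, the measures above are:
sensitivity <- conf["1", "TRUE"]  / sum(conf["1", ])   # TP / (TP + FN)
specificity <- conf["0", "FALSE"] / sum(conf["0", ])   # TN / (TN + FP)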
Receiver Operating Characteristic
The Receiver Operating Characteristic (ROC) curve is a graph of the True Positive Rate (Sensitivity)
against the False Positive Rate (1 - Specificity)
Accuracy is measured by the area under the ROC curve: the greater the area under the curve, the better
the model. An area of 1 represents a perfect test.
Each point on the ROC curve corresponds to a cutoff probability. These cutoff points represent the
tradeoff between the sensitivity and specificity probabilities
Ideally the goal should be to have high probabilities for both Sensitivity and Specificity
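One common way to draw this curve in R is with the ROCR package (my choice here; pROC is another option). The names pred_test and test$TenYearCHD are again the assumptions from the earlier sketch:

library(ROCR)                                       # install.packages("ROCR") if needed
roc_pred <- prediction(pred_test, test$TenYearCHD)  # predicted probabilities vs. actual labels
roc_perf <- performance(roc_pred, "tpr", "fpr")     # true positive rate vs. false positive rate

# Plot the ROC curve, labelling a few candidate thresholds along it
plot(roc_perf, colorize = TRUE, print.cutoffs.at = seq(0, 1, by = 0.1))

# Area under the curve
as.numeric(performance(roc_pred, "auc")@y.values)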
Selecting a Threshold using ROC
Captures all thresholds simultaneously
High threshold means High specificity and Low sensitivity
Low Threshold means Low specificity and High sensitivity
Choose the threshold that gives the best trade-off between:
the cost of failing to detect positives
the cost of raising false alarms
Compute Outcome Measures
Overall accuracy = (TN + TP)/N
Overall error rate = (FP + FN)/N
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
False negative error rate = FN/(TP + FN)
False positive error rate = FP/(TN + FP)
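All of these measures can be computed directly from a 2 x 2 confusion matrix. A small helper sketch (the function name and the layout, actual outcomes in rows and predicted outcomes in columns, are my assumptions):

outcome_measures <- function(conf) {
  # conf: 2 x 2 table, actual 0/1 in rows, predicted FALSE/TRUE in columns
  TN <- conf[1, 1]; FP <- conf[1, 2]
  FN <- conf[2, 1]; TP <- conf[2, 2]
  N  <- sum(conf)
  c(accuracy    = (TN + TP) / N,
    error_rate  = (FP + FN) / N,
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    fn_rate     = FN / (TP + FN),
    fp_rate     = FP / (TN + FP))
}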
Quick Question 2
Using the confusion matrix below (actual outcome in rows, predicted outcome in columns), answer the
following questions.
            predicted FALSE   predicted TRUE
actual 0    1069              6
actual 1    187               11
What is the sensitivity of our logistic regression model on the test set,
using a threshold of 0.5?
What is the specificity of our logistic regression model on the test set,
using a threshold of 0.5?
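To check your answers in the R console, reading the cells of the matrix as TN = 1069, FP = 6, FN = 187, TP = 11:

TN <- 1069; FP <- 6; FN <- 187; TP <- 11      # cells of the confusion matrix above
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
c(sensitivity = sensitivity, specificity = specificity)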
Thank You