Chapter 10 – Logistic Regression
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2010
Logistic Regression
Powerful model-based classification tool
Extends idea of linear regression to situation where outcome variable is categorical
Model relates predictors with the outcome
Example: Y denotes recommendation on holding/selling/buying a stock – categorical variable with 3 categories
We focus on binary classification (Y = 0 or Y = 1), but predictors can be categorical or continuous
Widely used, particularly where a structured model is useful
The Logit
Goal: Find a function of the predictor variables that relates them to a 0/1 outcome
Instead of Y as outcome variable (as in linear regression), we use a function of Prob(Y=1) called the logit
Logit can be modeled as a linear function of the predictors
The logit can be mapped back to a probability, which, in turn, can be mapped to a class
Use a cutoff value on the probability of belonging to class 1, P(Y=1)
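To make the logit → probability → class mapping concrete, here is a minimal Python sketch (not from the textbook; the function name logit_to_class and the 0.5 cutoff are illustrative assumptions):

```python
import numpy as np

def logit_to_class(logit_value, cutoff=0.5):
    """Map a logit to a probability (inverse logit), then to a 0/1 class."""
    p = 1 / (1 + np.exp(-logit_value))   # probability of belonging to class 1
    return p, int(p > cutoff)            # classify as "1" if p exceeds the cutoff

print(logit_to_class(1.2))   # (~0.769, 1)
```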
From MLR to Logistic Regression
In linear regression the right-hand side, $\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q$, can take any value in $(-\infty, +\infty)$, while a probability must lie in $[0, 1]$. How to make them match? Logistic regression!

$p = P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}}$    (eq. 10.2 in textbook)

Another format:

$p = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q}}$
Step 2: The Odds
The odds of an event are defined as:

$\text{Odds} = \frac{p}{1 - p}$    (eq. 10.3), where $p$ = probability of the event

Or, given the odds of an event, the probability of the event can be computed by:

$p = \frac{\text{Odds}}{1 + \text{Odds}}$    (eq. 10.4)

We can also relate the Odds to the predictors:

$\text{Odds} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q}$    (eq. 10.5)
Recall that $p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}}$ (eq. 10.2); substituting this into eq. 10.3 yields eq. 10.5.
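As a quick worked check of eqs. 10.3–10.4, using the 9.6% overall acceptance rate from the loan example later in the chapter: $p = 0.096 \Rightarrow \text{Odds} = \frac{0.096}{1 - 0.096} \approx 0.106$, and converting back, $\frac{0.106}{1 + 0.106} \approx 0.096$.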
Step 3: Take log on both sides
• This gives us the logit:

$\log(\text{Odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$    (eq. 10.6)

• Log(Odds) is called the logit, and it takes values from –∞ to +∞
• The logit is the dependent variable, and is a linear function of the predictors x1, x2, …, xq
• This helps make interpretations easier
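A quick numerical check (a sketch, not from the book) that the logit spans the whole real line as p moves through (0, 1):

```python
import numpy as np

# The logit is 0 at p = 0.5, strongly negative near p = 0, strongly positive near p = 1
for p in [0.001, 0.096, 0.5, 0.9, 0.999]:
    odds = p / (1 - p)   # eq. 10.3
    print(f"p={p:.3f}  odds={odds:8.3f}  logit={np.log(odds):7.3f}")
```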
Example: Acceptance of Personal Loan Offer
Outcome variable: accept bank loan (0/1)
Predictors: demographics (age, income, etc.) and information about the customer's bank relationship (mortgage, securities account, etc.)
Data: 5000 customers – 480 (9.6%) accepted the loan offer previously
Goal: find characteristics of customers who are most likely to accept the loan offer in future mailings
Data preprocessing
Partition: 60% training, 40% validation
Create 0/1 dummy variables for categorical predictors
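A minimal preprocessing sketch in Python; the DataFrame loan_df is a toy stand-in for the 5000-customer data, and its column names are hypothetical (the book's 60/40 partition ratio is used):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the loan data (columns are hypothetical)
loan_df = pd.DataFrame({
    "Income":       [49, 34, 180, 45, 112, 29, 85, 60],
    "Family":       [4, 3, 1, 2, 1, 4, 3, 2],
    "Education":    ["UG", "UG", "Grad", "Prof", "Prof", "UG", "Grad", "Grad"],
    "PersonalLoan": [0, 0, 1, 0, 0, 0, 1, 0],
})

# 0/1 dummy variables for the categorical predictor
loan_df = pd.get_dummies(loan_df, columns=["Education"], drop_first=True)

# 60% training / 40% validation partition
train_df, valid_df = train_test_split(loan_df, test_size=0.4, random_state=1)
```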
Single Predictor Model
Modeling loan acceptance on income (x)
Fitted coefficients: b0 = -6.3525, b1 = 0.0392
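Plugging these coefficients into eq. 10.2 gives the fitted model; a quick sketch evaluating it (assuming income is measured in $000s, as in the book's data):

```python
import numpy as np

b0, b1 = -6.3525, 0.0392   # fitted coefficients from the slide

def p_accept(income):
    """Fitted single-predictor model (eq. 10.2): P(accept loan | income)."""
    return 1 / (1 + np.exp(-(b0 + b1 * income)))

print(p_accept(100))   # ~0.081 for income = 100 (i.e., $100K)
```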
Last step - classification
Model produces an estimated probability of being a “1”
Example: P(accept loan | income)
Convert to a classification by establishing a cutoff level
If estimated probability > cutoff, classify as “1”
Thus the model supports classification as well as estimating the probability of belonging to a class
Default cutoff value: 0.50, but it can be changed, e.g., to maximize classification accuracy
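A sketch of the cutoff step (the probability values here are made up for illustration):

```python
import numpy as np

probs = np.array([0.03, 0.62, 0.48, 0.91])   # estimated P(Y=1) for four customers
cutoff = 0.5                                  # default cutoff; can be tuned
print((probs > cutoff).astype(int))           # [0 1 0 1]
```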
Example: Parameter estimation
Estimates of the β's are derived through an iterative process called maximum likelihood estimation (MLE)
Let us now include all 12 predictors in the model
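A hedged sketch of the fitting step using statsmodels (one possible tool; the data here are synthetic, not the book's loan data, and the real model would use all 12 predictors):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in for the training data: three hypothetical predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
true_logit = -2.0 + 1.0 * X[:, 0] + 0.5 * X[:, 1]     # eq. 10.6 form
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Coefficients are estimated by iterative maximum likelihood
model = sm.Logit(y, sm.add_constant(X)).fit()
print(model.summary())   # coefficients, standard errors, p-values
```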
Estimated Equation for Logit
• Interpreting binary predictor effects:
• The odds of accepting the loan offer for those who already have a CD account with the bank are 32.1 times the odds for those who do not have a CD account (p-value < 0.001).
• Interpreting continuous predictor effects:
• The odds of accepting the loan offer increase by 77.1% for each additional family member (p-value < 0.001).
• The odds of accepting the loan offer decrease by 4.4% for each additional year of age, but this effect is not statistically significant (p-value = 0.624).
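These interpretations come from exponentiating the fitted coefficients: the odds ratio for a one-unit increase in predictor $x_j$ is $e^{\beta_j}$, i.e., a percent change of $100(e^{\beta_j} - 1)\%$. For example (coefficient values back-computed here from the slide's odds ratios, not copied from the textbook output): $e^{\beta} = 32.1 \Rightarrow \beta \approx \ln 32.1 \approx 3.47$ for the CD account dummy; $e^{\beta} = 1.771$ for family size (a 77.1% increase in odds per member); $e^{\beta} = 0.956$ for age (a 4.4% decrease per year).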
Variable Selection
Problems:
As in linear regression, correlated predictors introduce bias in the method
Overly complex models run the danger of overfitting
Solution: remove extreme redundancies by dropping predictors via automated selection of variable subsets (as in linear regression) or by data reduction methods such as PCA (see the sketch below)
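A hedged sketch of automated subset selection with scikit-learn (one of several possible approaches; the data are synthetic, reusing the pattern from the MLE sketch above):

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data: 12 hypothetical predictors, only the first two matter
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))
y = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + X[:, 0] + 0.5 * X[:, 1]))))

# Forward selection of a smaller subset of predictors
selector = SequentialFeatureSelector(LogisticRegression(), n_features_to_select=4)
selector.fit(X, y)
print(selector.get_support())   # boolean mask of the selected predictors
```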
P-values for Predictors
Test the null hypothesis that a coefficient = 0
P-values reported alongside the coefficients give the results of these tests
Coefficients with low p-values (close to 0) are statistically significant
Useful when reviewing whether to include a variable in the model
Key in profiling tasks, but less important in predictive classification
Summary
Logistic regression is similar to linear regression, except that it is used with a categorical response
It can be used for explanatory tasks (= profiling) or predictive tasks (= classification)
The predictors are related to the response Y via a nonlinear function called the logit
As in linear regression, reducing the number of predictors can be done via variable selection
Logistic regression can be generalized to more than two classes