Chapter 10 – Logistic Regression

Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2010
Logistic Regression
• Powerful model-based classification tool
• Extends the idea of linear regression to situations where the outcome variable is categorical
• The model relates the predictors to the outcome
• Example: Y denotes a recommendation on holding/selling/buying a stock – a categorical variable with 3 categories
• We focus on binary classification, i.e. Y = 0 or Y = 1, but the predictors can be categorical or continuous
• Widely used, particularly where a structured model is useful
The Logit
• Goal: Find a function of the predictor variables that relates them to a 0/1 outcome
• Instead of Y as the outcome variable (as in linear regression), we use a function of Prob(Y=1) called the logit
• The logit can be modeled as a linear function of the predictors
• The logit can be mapped back to a probability, which, in turn, can be mapped to a class, using a cutoff value on the probability of belonging to class 1, P(Y=1)
From MLR to Logistic Regression
How do we make the linear model's predictions match a 0/1 outcome? Logistic regression!
Another format

eq. 10.2: p = 1 / (1 + e^-(β0 + β1x1 + β2x2 + … + βqxq))
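
As a minimal sketch of eq. 10.2 in Python (not from the textbook; the coefficients below are invented purely for illustration):

import numpy as np

def logistic(x, b0, b1):
    # eq. 10.2: p = 1 / (1 + e^-(b0 + b1*x))
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, for illustration only
x = np.linspace(0, 250, 6)
print(logistic(x, b0=-6.0, b1=0.04))  # every value falls strictly between 0 and 1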


Step 2: The Odds
• The odds of an event are defined as:

eq. 10.3: Odds = p / (1 - p), where p = probability of the event

• Or, given the odds of an event, the probability of the event can be computed by:

eq. 10.4: p = Odds / (1 + Odds)

• We can also relate the odds to the predictors:

eq. 10.5: Odds = e^(β0 + β1x1 + β2x2 + … + βqxq)

Recall that: Odds = p / (1 - p) (eq. 10.3)
Step 3: Take log on both sides
• This gives us the logit:

eq. 10.6: log(Odds) = β0 + β1x1 + β2x2 + … + βqxq

• Log(odds) is called the logit, and it takes values from –∞ to +∞
• The logit is the dependent variable, and it is a linear function of the predictors x1, x2, …, xq
• This helps make interpretation easier
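
A short sketch tying eqs. 10.3–10.6 together (written for this rewrite, not taken from the textbook):

import numpy as np

def odds(p):
    # eq. 10.3: Odds = p / (1 - p)
    return p / (1.0 - p)

def prob_from_odds(o):
    # eq. 10.4: p = Odds / (1 + Odds)
    return o / (1.0 + o)

def logit(p):
    # eq. 10.6: logit = log(Odds), ranging over (-inf, +inf)
    return np.log(odds(p))

p = np.array([0.1, 0.5, 0.9])
print(odds(p))                  # [0.111..., 1.0, 9.0]
print(logit(p))                 # [-2.197..., 0.0, 2.197...]
print(prob_from_odds(odds(p)))  # the round trip recovers p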
Example: Acceptance of Personal Loan Offer
• Outcome variable: accept bank loan (0/1)
• Predictors: demographics (age, income, etc.) and information about the customer's bank relationship (mortgage, securities account, etc.)
• Data: 5,000 customers – 480 (9.6%) accepted the loan offer in a previous campaign
• Goal: find characteristics of customers who are most likely to accept the loan offer in future mailings
Data preprocessing
• Partition: 60% training, 40% validation
• Create 0/1 dummy variables for categorical predictors
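
A hedged sketch of these two steps with pandas and scikit-learn (the file name and the "Education" column are assumptions for illustration, not necessarily the textbook's names):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("UniversalBank.csv")  # placeholder file name

# 0/1 dummy variables for a categorical predictor (column name assumed)
df = pd.get_dummies(df, columns=["Education"], drop_first=True)

# 60% training / 40% validation partition
train, valid = train_test_split(df, train_size=0.6, random_state=1)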
Single Predictor Model
• Modeling loan acceptance on income (x)
• Fitted coefficients: b0 = -6.3525, b1 = 0.0392
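
With these fitted coefficients, the estimated probability for a given income can be computed directly; a sketch, assuming income is measured in $000s:

import numpy as np

b0, b1 = -6.3525, 0.0392

def p_accept(income):
    # P(accept loan | income) = 1 / (1 + e^-(b0 + b1*income))
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * income)))

print(p_accept(100))  # about 0.08 for an income of 100 (units assumed to be $000s)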


Last step - classification
• The model produces an estimated probability of being a “1”
  Example: P(accept loan | income)
• Convert to a classification by establishing a cutoff level
• If the estimated probability > cutoff, classify as “1”
• Thus the model helps with classification as well as with predicting the probability of belonging to a class
• Default cutoff value: 0.50, but it can be changed, e.g. to maximize classification accuracy
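
Converting estimated probabilities into classifications with a cutoff might look like this sketch (the probabilities are made up; 0.50 is the default cutoff):

import numpy as np

probs = np.array([0.03, 0.48, 0.52, 0.91])  # hypothetical estimated P(Y=1)
cutoff = 0.50

classes = (probs > cutoff).astype(int)
print(classes)  # [0 0 1 1]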
Example: Parameter estimation
• Estimates of the β's are derived through an iterative process called maximum likelihood estimation
• Let us now include all 12 predictors in the model
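
Standard logistic regression routines perform this maximum likelihood estimation internally; a hedged sketch with statsmodels, continuing from the preprocessing sketch (the "PersonalLoan" column name is an assumption):

import statsmodels.api as sm

# `train` comes from the preprocessing sketch above; "PersonalLoan" is the
# assumed name of the 0/1 outcome, and the remaining columns are the 12 predictors
X = sm.add_constant(train.drop(columns=["PersonalLoan"]))
y = train["PersonalLoan"]

logit_model = sm.Logit(y, X).fit()  # iterative maximum likelihood estimation
print(logit_model.summary())        # coefficients, standard errors, p-values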
Estimated Equation for Logit

• Interpreting binary predictor effects:
  • The odds of accepting the loan offer for those who already have a CD account with the bank are 32.1 times the odds for those who do not have a CD account (p-value < 0.001).
• Interpreting continuous predictor effects:
  • The odds of accepting the loan offer increase by 77.1% if family size increases by one (p-value < 0.001).
  • The odds of accepting the loan offer decrease by 4.4% if a client is 1 year older (p-value = 0.624).
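
These interpretations follow from exponentiating the fitted coefficients: the odds ratio for a predictor is e^β. A sketch with coefficient values back-calculated from the numbers above (so approximate):

import numpy as np

# Implied coefficients, back-calculated from the reported odds ratios
beta_cd     = np.log(32.1)   # CD account:  odds multiplied by 32.1
beta_family = np.log(1.771)  # family size: odds increase by 77.1% per member
beta_age    = np.log(0.956)  # age:         odds decrease by 4.4% per year

for name, b in [("CD", beta_cd), ("Family", beta_family), ("Age", beta_age)]:
    print(f"{name}: beta = {b:+.3f}, odds ratio = {np.exp(b):.3f}")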
Variable Selection
• Problems:
  • As in linear regression, correlated predictors introduce bias into the method
  • Overly complex models run the danger of overfitting
• Solution: Remove extreme redundancies by dropping predictors via automated selection of variable subsets (as in linear regression) or by data reduction methods such as PCA
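
One possible way to automate subset selection in Python is scikit-learn's SequentialFeatureSelector; a hedged sketch reusing the names from the earlier fit (this is one tool choice, not the textbook's procedure, and n_features_to_select=5 is arbitrary):

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Forward stepwise search over the predictors (`train` and "PersonalLoan"
# reused from the sketches above)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000), n_features_to_select=5, direction="forward"
)
selector.fit(train.drop(columns=["PersonalLoan"]), train["PersonalLoan"])
print(selector.get_support())  # boolean mask of retained predictors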
P-values for Predictors
• Test the null hypothesis that a coefficient = 0
• P-values reported with the coefficients give the results of these tests
• Coefficients with low p-values (close to 0) are statistically significant
• Useful for reviewing whether to include a variable in the model
• Key in profiling tasks, but less important in predictive classification
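
With the statsmodels fit from the parameter-estimation sketch, per-coefficient p-values are directly available; a sketch of using them for review (the 0.05 threshold is a common convention, not a rule):

# p-values from the earlier maximum likelihood fit (`logit_model`)
print(logit_model.pvalues)

# Flag predictors whose coefficients are significant at the 5% level
significant = logit_model.pvalues[logit_model.pvalues < 0.05]
print(significant.index.tolist())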
Summary
• Logistic regression is similar to linear regression, except that it is used with a categorical response
• It can be used for explanatory tasks (= profiling) or predictive tasks (= classification)
• The predictors are related to the response Y via a nonlinear function called the logit
• As in linear regression, reducing the number of predictors can be done via variable selection
• Logistic regression can be generalized to more than two classes
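
On the last point, a minimal sketch of multi-class logistic regression with scikit-learn on toy data (e.g. hold/sell/buy coded as 0/1/2; everything here is invented for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 3-class problem
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 2))
y = rng.integers(0, 3, size=90)

# With more than two classes, LogisticRegression fits a multinomial model
clf = LogisticRegression(max_iter=200).fit(X, y)
print(clf.predict_proba(X[:3]))  # one probability per class; rows sum to 1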
