Amr Assignment 2: Logistic Regression On Credit Risk
Submitted by: Karthic CM
We treat the remaining factors as the independent variables, with the exception of the row
identifier Loan_ID and Credit_History.
The constant is statistically significant, which indicates a clear difference between the Yes
and No categories of the dependent variable Loan_Status (the two outcomes are not equally frequent).
Model Summary
The Nagelkerke R2 statistic is the one usually reported (it is scaled so that its maximum is 1.0).
It measures how well the independent variables explain the dependent variable.
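
As a rough illustration of where this statistic comes from, the sketch below computes the Cox and Snell R2 and rescales it to the Nagelkerke R2 from the log-likelihoods of the intercept-only model and the fitted model. The function name and the example log-likelihood values are hypothetical and are not taken from the SPSS output.

```python
import math

def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke R2 from two log-likelihoods (hypothetical helper)."""
    # Cox and Snell R2: 1 - (L_null / L_model)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    # Largest value Cox and Snell R2 can reach, which is below 1 for a binary outcome.
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    # Dividing by that maximum is what scales the statistic up to 1.0.
    return cox_snell / max_cox_snell

# Made-up log-likelihoods, for illustration only.
r2 = nagelkerke_r2(ll_null=-380.0, ll_model=-360.0, n=607)
print(round(r2, 3))
```

A model that does no better than the intercept-only model gives 0, and a model that fits perfectly (log-likelihood of 0) gives 1.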
Classification Table
                                Predicted Loan_Status
Observed                        N        Y        Percentage Correct
Loan_Status   N                 14       175       7.4
              Y                 12       406      97.1
Overall Percentage                                69.2
The classification (or truth) table shows that the model is good at identifying approved loans
(97.1% of actual Y cases are predicted correctly) but poor at identifying rejected loans (only
7.4% of actual N cases are predicted correctly). This is not ideal for a bank, as the model
should be used to predict which loans to reject.
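
The percentages in the classification table can be reproduced from its four counts. The sketch below assumes Y is treated as the positive class; the function and variable names are illustrative only.

```python
def classification_rates(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Percentage-correct figures from the four cells of a classification table."""
    return {
        # Share of observed N cases predicted as N.
        "specificity": round(100 * tn / (tn + fp), 1),
        # Share of observed Y cases predicted as Y.
        "sensitivity": round(100 * tp / (tp + fn), 1),
        # Share of all cases predicted correctly.
        "overall": round(100 * (tn + tp) / (tn + fp + fn + tp), 1),
    }

# Counts from the first classification table (observed N: 14, 175; observed Y: 12, 406).
first_model = classification_rates(tn=14, fp=175, fn=12, tp=406)
print(first_model)
```

This gives 7.4 for the N row, 97.1 for the Y row and 69.2 overall, matching the table.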
                                            Parameter coding
Variable        Category      Frequency   (1)     (2)     (3)     (4)
Dependents      (no label)         1     1.000    .000    .000    .000
                0                349      .000   1.000    .000    .000
                1                103      .000    .000   1.000    .000
                2                103      .000    .000    .000   1.000
                3+                51      .000    .000    .000    .000
Property_Area   Rural            177     1.000    .000
                Semiurban        231      .000   1.000
                Urban            199      .000    .000
Education       Graduate         476     1.000
                Not Grad         131      .000
Married         No               210     1.000
                Yes              397      .000
Self_Employed   No               513     1.000
                Yes               94      .000
Gender          Female           116     1.000
                Male             491      .000
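
The parameter coding above is indicator (dummy) coding: each categorical variable gets one 0/1 column per category except for a reference category, which is the row coded with all zeros. A minimal sketch of that scheme follows, using Property_Area with Urban as the reference; the helper name is hypothetical.

```python
def indicator_code(value: str, categories: list, reference: str) -> list:
    """Return one 0/1 indicator per non-reference category, in listed order."""
    non_reference = [c for c in categories if c != reference]
    return [1.0 if value == c else 0.0 for c in non_reference]

# Property_Area levels from the table, with Urban (all zeros) as the reference.
levels = ["Rural", "Semiurban", "Urban"]
for level in levels:
    print(level, indicator_code(level, levels, reference="Urban"))
```

Each coefficient for such a variable is then read as the effect of that category relative to the reference category.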
Variables in the Equation
The significant factors are:
Married
Education
Property_Area (Urban and Rural)
By adding Credit_History as an independent variable alongside the significant factors identified
above, we can create a new model.
                                             Parameter coding
Variable         Category      Frequency   (1)     (2)
Credit_History   (no label)         2     1.000    .000
                 0                103      .000   1.000
                 1                509      .000    .000
Property_Area    Rural            179     1.000    .000
                 Semiurban        233      .000   1.000
                 Urban            202      .000    .000
Education        Graduate         480     1.000
                 Not Grad         134      .000
Married          No               214     1.000
                 Yes              400      .000
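
With these codings, the new model can be written out as a sketch. It assumes the model predicts the probability p that Loan_Status is Y, and the coefficients are placeholders rather than estimated values. CH(1) and CH(2) stand for the two Credit_History indicator columns in the table above, with Credit_History = 1 as the reference category.

```latex
\ln\!\left(\frac{p}{1-p}\right)
  = \beta_0
  + \beta_1\,\mathrm{CH}_{(1)}
  + \beta_2\,\mathrm{CH}_{(2)}
  + \beta_3\,\mathrm{Rural}
  + \beta_4\,\mathrm{Semiurban}
  + \beta_5\,\mathrm{Graduate}
  + \beta_6\,\mathrm{Married}_{\mathrm{No}}
```

Each indicator equals 1 when the applicant falls in that category and 0 otherwise, so an applicant in every reference category (Credit_History 1, Urban, Not Grad, Married Yes) has log-odds equal to the constant alone.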
Model Summary
It can be seen that the R2 statistic has improved substantially, from 0.07 to 0.354.
Classification Table (Step 1)
                                Predicted Loan_Status
Observed                        N        Y        Percentage Correct
Loan_Status   N                 87       105      45.3
              Y                 16       406      96.2
Overall Percentage                                80.3
The truth table still shows a high number of False Positives (105), although specificity, the
share of actual N cases predicted correctly, is much better than in the previous model: it has
risen from 7.4% to 45.3%. Education was significant in the previous model, but once credit
history is added it no longer has much of a significant impact on the dependent variable