Amr Assignment 2: Logistic Regression On Credit Risk

The document describes using logistic regression to analyze factors influencing credit risk and loan status. Several models are tested using different independent variables like marital status, education, property area, and credit history. The best model includes credit history as a significant factor, improving the prediction of true negatives from 7.4% to 45.3% and the overall accuracy from 69.2% to 80.3%.

Uploaded by

Karthic C M

AMR ASSIGNMENT 2

LOGISTIC REGRESSION ON CREDIT RISK

Submitted by : Karthic CM

Roll no : 133078 | IMG 13 | AMR 1

Date of Submission : 30th August 2020


Logistic Regression

The dependent variable : Loan_Status

Loan_Status has two possible outcomes : Yes and No

All remaining factors are treated as independent variables, except the row identifier Loan_ID and Credit_History.

Variables in the Equation

                      B       S.E.    Wald      df    Sig.    Exp(B)
Step 0   Constant     .794    .088    81.997    1     .000    2.212

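The constant is significant (Sig. = .000, i.e. p < .001), indicating that the Yes and No categories of the dependent variable Loan_Status are not equally likely. Exp(B) = 2.212 means that, before any predictors are added, the odds of Yes are roughly 2.2 times the odds of No.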
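
As a quick check, the Step 0 constant can be reproduced from the observed counts alone, because a constant-only logistic model simply encodes the log-odds of the outcome. The sketch below assumes the counts in the first classification table of this report (418 Yes, 189 No); the variable names are illustrative only.

```python
import math

# Observed counts, taken from the first classification table in this report.
n_yes = 406 + 12   # observed Loan_Status = Y
n_no = 14 + 175    # observed Loan_Status = N

odds_yes = n_yes / n_no          # odds of Y versus N, i.e. Exp(B)
b_constant = math.log(odds_yes)  # constant of a constant-only model, i.e. B

print(f"Exp(B) = {odds_yes:.3f}")
print(f"B      = {b_constant:.3f}")
```

Both values should agree with the Step 0 row to three decimals.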

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       722.105(a)           .049                    .070

a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

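The Nagelkerke R² is the statistic usually reported because, unlike the Cox & Snell R², it is rescaled so that its maximum is 1.0. It is a pseudo-R² that indicates how much of the variation in the dependent variable is explained by the independent variables. Here it is only .070, so this model explains very little.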
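
The two pseudo-R² values can be recomputed from the reported -2 Log likelihood. The sketch below uses the standard Cox & Snell and Nagelkerke formulas and assumes the null (constant-only) model is fitted on the same 607 cases (418 Yes, 189 No) shown in the classification table; that null likelihood is derived here, not quoted from the output.

```python
import math

# Assumed sample: counts from the first classification table in this report.
n_yes, n_no = 418, 189
n = n_yes + n_no

# -2 Log likelihood of the fitted model, as reported in the Model Summary.
neg2ll_model = 722.105

# -2 Log likelihood of the constant-only model (derived, not reported).
ll_null = n_yes * math.log(n_yes / n) + n_no * math.log(n_no / n)
neg2ll_null = -2 * ll_null

# Cox & Snell R²: 1 - (L_null / L_model)^(2/n)
cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)

# Nagelkerke R²: Cox & Snell divided by its maximum attainable value.
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(f"Cox & Snell R2 = {cox_snell:.3f}")
print(f"Nagelkerke R2  = {nagelkerke:.3f}")
```

Under these assumptions the results should land close to the reported .049 and .070.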

Classification Table(a)

                           Predicted
                           Loan_Status
Observed                   N       Y       Percentage Correct
Loan_Status     N          14      175     7.4
                Y          12      406     97.1
Overall Percentage                         69.2

The classification (or truth) table shows that the model predicts approved loans well (97.1% of observed Y cases are classified correctly) but rejected loans poorly (only 7.4% of observed N cases are classified correctly). This is not ideal for a bank, since the model is meant to identify which loans to reject.
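
The percentages in the classification table follow directly from its four cell counts. A minimal sketch, treating Y as the positive class (an assumption about labelling, not something the output states):

```python
# Cell counts from the first classification table (rows = observed, columns = predicted).
tn, fp = 14, 175   # observed N: predicted N, predicted Y
fn, tp = 12, 406   # observed Y: predicted N, predicted Y

sensitivity = tp / (tp + fn)                 # share of observed Y predicted as Y
specificity = tn / (tn + fp)                 # share of observed N predicted as N
accuracy = (tp + tn) / (tn + fp + fn + tp)   # overall percentage correct

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"Accuracy:    {accuracy:.1%}")
```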

Categorical Variables Codings

                                           Parameter coding
                              Frequency    (1)      (2)      (3)      (4)
Dependents       (blank)      1            1.000    .000     .000     .000
                 0            349          .000     1.000    .000     .000
                 1            103          .000     .000     1.000    .000
                 2            103          .000     .000     .000     1.000
                 3+           51           .000     .000     .000     .000
Property_Area    Rural        177          1.000    .000
                 Semiurban    231          .000     1.000
                 Urban        199          .000     .000
Education        Graduate     476          1.000
                 Not Grad     131          .000
Married          No           210          1.000
                 Yes          397          .000
Self_Employed    No           513          1.000
                 Yes          94           .000
Gender           Female       116          1.000
                 Male         491          .000

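
The coding table above appears to follow indicator (dummy) coding with the last category as the reference, which is why the final row of each variable is all zeros. The sketch below reproduces that pattern; `indicator_coding` is a hypothetical helper written for illustration, not part of any statistics package.

```python
def indicator_coding(categories):
    """Return {category: dummy row}, using the last category as the reference."""
    n_params = len(categories) - 1  # one parameter per non-reference category
    coding = {}
    for i, category in enumerate(categories):
        # Category i gets a 1 in column i; the reference category gets all zeros.
        coding[category] = [1.0 if i == j else 0.0 for j in range(n_params)]
    return coding

property_area = indicator_coding(["Rural", "Semiurban", "Urban"])
for category, row in property_area.items():
    print(category, row)
```

With this coding, Property_Area(1) compares Rural with Urban and Property_Area(2) compares Semiurban with Urban.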
Variables in the Equation

                                   B          S.E.         Wald      df    Sig.     Exp(B)
Step 1(a)    Gender(1)             -.013      .249         .003      1     .958     .987
             Married(1)            -.509      .214         5.633     1     .018     .601
             Dependents                                    2.559     4     .634
             Dependents(1)         -21.980    40192.970    .000      1     1.000    .000
             Dependents(2)         .305       .341         .798      1     .372     1.356
             Dependents(3)         -.025      .377         .004      1     .948     .976
             Dependents(4)         .362       .386         .882      1     .348     1.436
             Education(1)          .492       .218         5.084     1     .024     1.636
             Self_Employed(1)      -.076      .253         .089      1     .765     .927
             ApplicantIncome       .000       .000         .000      1     .997     1.000
             CoapplicantIncome     .000       .000         2.162     1     .141     1.000
             LoanAmount            -.001      .001         .592      1     .442     .999
             Loan_Amount_Term      -.001      .001         .648      1     .421     .999
             Property_Area                                 11.448    2     .003
             Property_Area(1)      -.170      .222         .583      1     .445     .844
             Property_Area(2)      .558       .222         6.311     1     .012     1.748
             Constant              .939       .645         2.122     1     .145     2.558

a. Variable(s) entered on step 1: Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Property_Area.
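
The Wald, Sig. and Exp(B) columns are all derived from B and S.E. A minimal sketch for three rows of the table, assuming the usual single-parameter formulas (Wald = (B / S.E.)², a chi-square test with 1 df, and Exp(B) = e^B); small differences from the table are expected because the printed B and S.E. are rounded.

```python
import math

# (B, S.E.) pairs copied from the Variables in the Equation table.
coefficients = {
    "Married(1)": (-0.509, 0.214),
    "Education(1)": (0.492, 0.218),
    "Property_Area(2)": (0.558, 0.222),
}

results = {}
for name, (b, se) in coefficients.items():
    wald = (b / se) ** 2                       # Wald chi-square statistic, 1 df
    p_value = math.erfc(math.sqrt(wald / 2))   # upper tail of chi-square with 1 df
    exp_b = math.exp(b)                        # odds ratio
    results[name] = (wald, p_value, exp_b)
    print(f"{name:18s} Wald={wald:.3f}  Sig.={p_value:.3f}  Exp(B)={exp_b:.3f}")
```

Read this way, Exp(B) = .601 for Married(1) means the odds of Loan_Status = Y for unmarried applicants are about 0.6 times those of married applicants, holding the other variables fixed.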

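The model identifies the following independent variables as significant at the 5% level:

- Married (Sig. = .018)
- Education (Sig. = .024)
- Property_Area (Sig. = .003 overall; Semiurban differs significantly from the Urban reference category, Sig. = .012, while Rural does not, Sig. = .445)

Adding the Credit_History factor alongside these significant factors gives a new model.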

Categorical Variables Codings

                                            Parameter coding
                               Frequency    (1)      (2)
Credit_History    (blank)      2            1.000    .000
                  0            103          .000     1.000
                  1            509          .000     .000
Property_Area     Rural        179          1.000    .000
                  Semiurban    233          .000     1.000
                  Urban        202          .000     .000
Education         Graduate     480          1.000
                  Not Grad     134          .000
Married           No           214          1.000
                  Yes          400          .000

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       585.044(a)           .251                    .354

a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

The Nagelkerke R² has improved substantially, from .070 to .354.

Classification Table(a)

                                    Predicted
                                    Loan_Status
          Observed                  N       Y       Percentage Correct
Step 1    Loan_Status     N         87      105     45.3
                          Y         16      406     96.2
          Overall Percentage                        80.3

a. The cut value is .500

The classification table still shows a high number of false positives (105 loans with an observed status of N are predicted as Y), although specificity is much better than in the previous model: the share of observed N cases predicted correctly has risen from 7.4% to 45.3%, and overall accuracy from 69.2% to 80.3%. Although Education was significant in the previous model, once Credit_History is added it no longer has a significant impact on the dependent variable.
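
To put the 80.3% in context, the sketch below recomputes the false positive rate for this model and compares its accuracy with a naive rule that approves every application. The approve-all baseline is an added comparison for illustration and does not appear in the original output.

```python
# Cell counts from the second classification table (rows = observed, columns = predicted).
tn, fp = 87, 105   # observed N: predicted N, predicted Y
fn, tp = 16, 406   # observed Y: predicted N, predicted Y
total = tn + fp + fn + tp

false_positive_rate = fp / (tn + fp)   # observed N wrongly predicted as Y
specificity = tn / (tn + fp)           # observed N correctly predicted as N
accuracy = (tn + tp) / total           # overall percentage correct

# Hypothetical baseline: predict Y for every application.
baseline_accuracy = (fn + tp) / total

print(f"False positive rate: {false_positive_rate:.1%}")
print(f"Specificity:         {specificity:.1%}")
print(f"Model accuracy:      {accuracy:.1%}")
print(f"Approve-all rule:    {baseline_accuracy:.1%}")
```

More than half of the observed N cases are still predicted as Y, which is the weakness the paragraph above points to.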
