LDA CreditCardDefault Code

This document discusses using linear discriminant analysis (LDA) to build a predictive model for credit card default status. It loads credit card data from a CSV file and cleans the data. Univariate and bivariate analyses are conducted to explore relationships between variables like work experience and loan status. The distribution of loan statuses (default vs no default) is also calculated. LDA is then used to classify applications as default or no default, and the model's performance is evaluated.

Problem Statement - Credit Card Default Status

Predictive Modeling - Linear Discriminant Analysis

#Import all necessary modules
import pandas as pd  ###Software library written for the Python programming language for data manipulation and analysis.
import numpy as np ### fundamental package for scientific computing with Python
import os ### using operating system dependent functionality
import scipy.stats as stats
import matplotlib.pyplot as plt 
plt.rc("font", size=14)
import seaborn as sns
sns.set(style="white")
sns.set(style="whitegrid", color_codes=True)

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import scale
from sklearn.preprocessing import StandardScaler

Set the working directory

#os.chdir(r'C:\GL Class\Solution Preparation\Logistic Regression - Python') ### raw string avoids backslash-escape issues

Import the data file

Load the data file into a Python DataFrame using the pandas read_csv method.

Ensure that the loaded data set does not contain any stray unicode characters.

data_df = pd.read_csv("default(HandsOnVideo Content).csv")

View the top 10 rows

The head function is used to view the top records. The number of records to view is given in the parentheses.

data_df.head(10)

   Gender  Loan.Offered  Job         Work.Exp  Credit.Score  EMI.Ratio  Status  Credit.History
0  Male    0             unskilled   14        86            3.0        No      poor
1  Female  1             skilled     15        94            3.0        No      poor
2  Male    0             unskilled   16        86            3.0        No      poor
3  Female  1             skilled     13        94            3.0        No      poor
4  Male    1             skilled     12        85            3.3        No      poor
5  Female  1             Management  12        86            3.6        No      critical
6  Female  1             Management  15        86            3.6        No      critical
7  Male    1             skilled     12        85            3.6        No      poor
8  Male    1             skilled     13        87            3.9        No      critical
9  Male    1             skilled     13        89            4.0        No      critical

(The remaining columns -- Own house, Purpose, Dependents -- are cut off in the original output.)

Some more basic commands

The tail function is used to view the last records. The number of records to view is given in the parentheses.
data_df.tail(20)

     Gender  Loan.Offered  Job         Work.Exp  Credit.Score  EMI.Ratio  Status
761  Male    1             Management  0         43            14.0       Default
762  Male    1             Management  2         47            14.0       Default
763  Female  1             skilled     5         58            14.0       Default
764  Female  1             skilled     6         58            14.0       Default
765  Male    1             skilled     1         42            14.0       Default
766  Male    1             skilled     4         47            14.0       Default
767  Male    1             skilled     3         47            14.2       Default
768  Male    1             skilled     1         42            14.2       Default
769  Male    1             skilled     4         52            14.3       Default
770  Male    1             skilled     3         42            14.3       Default
771  Male    1             skilled     3         52            14.4       Default
772  Male    1             skilled     7         59            14.4       Default
773  Male    1             skilled     7         59            14.4       Default
774  Male    0             unskilled   10        65            14.6       Default
775  Male    0             unskilled   2         46            14.7       Default
776  Male    0             unskilled   2         46            14.7       Default
777  Male    0             unskilled   3         54            14.7       Default
778  Male    0             unskilled   3         51            14.8       Default
779  Male    0             unskilled   3         54            14.8       Default
780  Male    0             unskilled   3         51            14.8       Default

(Credit.History and the remaining columns are cut off in the original output.)

data_df.describe()

Loan.Offered Work.Exp Credit.Score EMI.Ratio Own house Dependents

count 781.000000 781.000000 781.000000 781.000000 781.000000 781.000000

mean 0.756722 12.377721 83.597951 9.495006 0.768246 2.081946

std 0.429336 3.809161 12.040410 2.786867 0.422223 1.068641

min 0.000000 0.000000 42.000000 3.000000 0.000000 0.000000

25% 1.000000 11.000000 83.000000 7.400000 1.000000 2.000000

50% 1.000000 13.000000 87.000000 9.500000 1.000000 2.000000

75% 1.000000 15.000000 91.000000 11.400000 1.000000 3.000000

max 1.000000 19.000000 99.000000 15.000000 1.000000 4.000000

data_df.dtypes

Gender object
Loan.Offered int64
Job object
Work.Exp int64
Credit.Score int64
EMI.Ratio float64
Status object
Credit.History object
Own house int64
Purpose object
Dependents int64
dtype: object

type(data_df)

pandas.core.frame.DataFrame

Check for missing values

data_df.isnull().sum()
Gender 0
Loan.Offered 0
Job 0
Work.Exp 0
Credit.Score 0
EMI.Ratio 0
Status 0
Credit.History 0
Own house 0
Purpose 0
Dependents 0
dtype: int64

No missing values.

data_df.shape ### 781 rows and 11 features

(781, 11)

Convert Own house into an object type, since it is a categorical flag rather than a true numeric

data_df['Own house']=data_df['Own house'].astype('object')

Find out unique values in each categorical column

data_df['Gender'].unique()

array(['Male', 'Female'], dtype=object)

data_df['Job'].unique()

array(['unskilled', 'skilled', 'Management'], dtype=object)

data_df['Status'].unique() ### No means No Default

array(['No', 'Default'], dtype=object)

data_df['Credit.History'].unique()

array(['poor', 'critical', 'good', 'very good', 'verygood', 'Poor'],
      dtype=object)

data_df['Own house'].unique()

array([1, 0], dtype=object)

data_df['Purpose'].unique()

array(['personal', 'car', 'education', 'consumer.durable'], dtype=object)

data_df.dtypes

Gender object
Loan.Offered int64
Job object
Work.Exp int64
Credit.Score int64
EMI.Ratio float64
Status object
Credit.History object
Own house object
Purpose object
Dependents int64
dtype: object

Clean the dataset -- correct inconsistent values ('very good' vs 'verygood' and 'Poor' vs 'poor' are duplicate categories)

data_df['Credit.History']=np.where(data_df['Credit.History'] =='very good', 'verygood', data_df['Credit.History'])
data_df['Credit.History']=np.where(data_df['Credit.History'] =='Poor', 'poor', data_df['Credit.History'])

data_df['Credit.History'].unique()

array(['poor', 'critical', 'good', 'verygood'], dtype=object)
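Equivalently, the same cleanup can be written with pandas' replace method; a minimal sketch:

data_df['Credit.History'] = data_df['Credit.History'].replace(
    {'very good': 'verygood', 'Poor': 'poor'})  # same effect as the np.where calls above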

Count of Default and No Default in the Target Column

data_df['Status'].value_counts()

No 656
Default 125
Name: Status, dtype: int64

Univariate Plots

sns.distplot(data_df['Work.Exp'])
plt.show() ### not strictly required in notebooks, but add it if the graph does not render

(A FutureWarning is emitted here: distplot is deprecated in recent seaborn versions.)
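Since distplot is deprecated (hence the FutureWarning), a minimal equivalent using the current seaborn API (assuming seaborn >= 0.11) would be:

sns.histplot(data_df['Work.Exp'], kde=True) ### histogram with a KDE overlay
plt.show()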

Bivariate Analysis

The bivariate plot shown below is only a sample; the reader is advised to perform a complete data exploration.

sns.jointplot(x='Work.Exp', y='Loan.Offered', data=data_df) ### scatter plot with marginal distributions; keyword arguments avoid the FutureWarning

<seaborn.axisgrid.JointGrid at 0x2403e0e0a00>

sns.stripplot(x='Status', y='Work.Exp', data=data_df) ### shows the concentration of observations; keyword arguments avoid the FutureWarning

<AxesSubplot:xlabel='Status', ylabel='Work.Exp'>

Lower work experience shows a higher concentration of defaults. The reader is advised to perform more analysis and generate further insights.

Find the Distribution of Dependent Variable Categories

count_no_sub = len(data_df[data_df['Status']=='No'])
count_sub = len(data_df[data_df['Status']=='Default'])
pct_of_no_sub = count_no_sub/(count_no_sub+count_sub)
print("percentage of no Default is", pct_of_no_sub*100)
pct_of_sub = count_sub/(count_no_sub+count_sub)
print("percentage of Default", pct_of_sub*100)

percentage of no Default is 83.99487836107554
percentage of Default 16.005121638924454
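The same percentages can be obtained in one line with value_counts; a small sketch:

data_df['Status'].value_counts(normalize=True) * 100 ### proportions of No and Default, as percentages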

The class distribution (roughly 84:16) is not severely imbalanced, hence there is no need to use SMOTE or any other package to balance the binary classes.

pd.crosstab(data_df.Dependents,data_df.Status).plot(kind='bar')
plt.title('Dependents Vs. Status')
plt.xlabel('Dependents')
plt.ylabel('Count') ### the y-axis shows counts per Status, not Status itself

Text(0, 0.5, 'Count')
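Because the classes are imbalanced, proportions can be easier to compare across groups than raw counts; a hedged sketch using crosstab's normalize option:

pd.crosstab(data_df.Dependents, data_df.Status, normalize='index').plot(kind='bar', stacked=True)
plt.title('Status Proportion by Dependents') ### share of Default vs No within each Dependents level
plt.xlabel('Dependents')
plt.ylabel('Proportion')
plt.show()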

The renaming commands below are only good practice (dots and spaces in column names complicate attribute-style access); they are not mandatory.

data_df.rename(columns = {'Own house': 'Ownhouse',
                          'Loan.Offered': 'LoanOffered',
                          'Work.Exp': 'WorkExp',
                          'Credit.Score': 'CreditScore',
                          'EMI.Ratio': 'EMIRatio',
                          'Credit.History': 'CreditHistory'}, inplace = True)

data_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 781 entries, 0 to 780
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Gender 781 non-null object
1 LoanOffered 781 non-null int64
2 Job 781 non-null object
3 WorkExp 781 non-null int64
4 CreditScore 781 non-null int64
5 EMIRatio 781 non-null float64
6 Status 781 non-null object
7 CreditHistory 781 non-null object
8 Ownhouse 781 non-null object
9 Purpose 781 non-null object
10 Dependents 781 non-null int64
dtypes: float64(1), int64(4), object(6)
memory usage: 67.2+ KB

Convert object feature types to numeric codes for linear discriminant analysis

data_df['Gender']=np.where(data_df['Gender'] =='Male', 1, data_df['Gender'])
data_df['Gender']=np.where(data_df['Gender'] =='Female', 0, data_df['Gender'])

data_df['Job']=np.where(data_df['Job'] =='Management', 1, data_df['Job'])
data_df['Job']=np.where(data_df['Job'] =='unskilled', 0, data_df['Job'])
data_df['Job']=np.where(data_df['Job'] =='skilled', 2, data_df['Job'])

data_df['CreditHistory']=np.where(data_df['CreditHistory'] =='critical', 1, data_df['CreditHistory'])
data_df['CreditHistory']=np.where(data_df['CreditHistory'] =='poor', 0, data_df['CreditHistory'])
data_df['CreditHistory']=np.where(data_df['CreditHistory'] =='good', 2, data_df['CreditHistory'])
data_df['CreditHistory']=np.where(data_df['CreditHistory'] =='verygood', 3, data_df['CreditHistory'])

data_df['Purpose']=np.where(data_df['Purpose'] =='personal', 1, data_df['Purpose'])
data_df['Purpose']=np.where(data_df['Purpose'] =='car', 0, data_df['Purpose'])
data_df['Purpose']=np.where(data_df['Purpose'] =='education', 2, data_df['Purpose'])
data_df['Purpose']=np.where(data_df['Purpose'] =='consumer.durable', 3, data_df['Purpose'])
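The same recoding can be expressed more compactly with map(); a sketch to run instead of (not after) the np.where block above, since mapping already-encoded integers would produce NaNs. The integer codes are the same as above, and astype(int) also fixes the dtype, which np.where leaves as object:

encodings = {
    'Gender':        {'Male': 1, 'Female': 0},
    'Job':           {'unskilled': 0, 'Management': 1, 'skilled': 2},
    'CreditHistory': {'poor': 0, 'critical': 1, 'good': 2, 'verygood': 3},
    'Purpose':       {'car': 0, 'personal': 1, 'education': 2, 'consumer.durable': 3},
}
for col, mapping in encodings.items():
    data_df[col] = data_df[col].map(mapping).astype(int)  # numeric dtype, unlike np.where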

data_df.head()

   Gender  LoanOffered  Job  WorkExp  CreditScore  EMIRatio  Status  CreditHistory  Ownhouse
0  1       0            0    14       86           3.0       No      0              1
1  0       1            2    15       94           3.0       No      0              1
2  1       0            0    16       86           3.0       No      0              1
3  0       1            2    13       94           3.0       No      0              1
4  1       1            2    12       85           3.3       No      0              1

(Purpose and Dependents are cut off in the original output.)

#Scale the predictors. Scaling is not strictly required for LDA, but it puts the coefficients on a comparable scale for interpretation
scaler=StandardScaler()
X = scaler.fit_transform(data_df.drop(['Status'],axis=1))
Y = data_df['Status']

Y.value_counts()

No 656
Default 125
Name: Status, dtype: int64

Y.replace({"No":1,"Default":0})

0 1
1 1
2 1
3 1
4 1
..
776 0
777 0
778 0
779 0
780 0
Name: Status, Length: 781, dtype: int64

#Build LDA Model
# Refer details for LDA at http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html
clf = LinearDiscriminantAnalysis()
model=clf.fit(X,Y)
model

LinearDiscriminantAnalysis()
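Note that everything below scores the model on the same data it was fit on. For an out-of-sample estimate, a hedged sketch (the split size and random_state are assumptions, not part of the original analysis):

from sklearn.model_selection import train_test_split, cross_val_score

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42, stratify=Y)  # stratify keeps the 84:16 class ratio
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print("Hold-out accuracy:", lda.score(X_test, y_test))

# 5-fold cross-validated accuracy on the full data
print("CV accuracy:", cross_val_score(LinearDiscriminantAnalysis(), X, Y, cv=5).mean())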

# Predict it
pred_class = model.predict(X)
data_df['Prediction'] = pred_class 

# Check Correlation values
#Refer on correlation at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.corr.html
data= data_df[['Gender','LoanOffered','Job','WorkExp','CreditScore','EMIRatio','CreditHistory','Ownhouse','Purpose','Dependents']]
Cor1 = data.corr()
Cor1

LoanOffered WorkExp CreditScore EMIRatio Dependents

LoanOffered 1.000000 -0.076224 -0.082435 0.057273 -0.029145

WorkExp -0.076224 1.000000 0.915575 -0.300286 0.408753

CreditScore -0.082435 0.915575 1.000000 -0.382192 0.490798

EMIRatio 0.057273 -0.300286 -0.382192 1.000000 -0.251782

Dependents -0.029145 0.408753 0.490798 -0.251782 1.000000
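Only five of the ten selected columns appear above: corr() silently drops the columns that the np.where recoding left with object dtype. A sketch that casts them to numeric first so the full 10x10 matrix is produced:

Cor_full = data.astype(float).corr()  # cast the object-dtype coded columns to numeric
Cor_full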

#generate Confusion Matrix
# Please refer for confusion matrix http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
confusion_matrix(Y, pred_class)

array([[124, 1],
[ 22, 634]], dtype=int64)
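The row/column order of the matrix follows model.classes_, which is alphabetical here ('Default' first, then 'No'); the labels argument can reorder it if preferred, e.g.:

print(model.classes_)                                       # ['Default' 'No']
confusion_matrix(Y, pred_class, labels=['No', 'Default'])   # No first, Default second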

plt.figure(figsize=(6,4))
sns.heatmap(confusion_matrix(Y, pred_class),annot=True,fmt='.4g')
plt.ylabel('Actual Value')
plt.xlabel('Predicted Value')
plt.show()

Reading the matrix (rows and columns ordered Default, No): 124 of the 125 defaults are classified correctly, and 22 of the 656 non-defaults are misclassified as Default. In total, 146 rows are classified as 0 (Default) and 635 rows as 1 (Not Default). Note that these figures are computed on the training data itself (in-sample).

from sklearn.metrics import classification_report
print(classification_report(Y, pred_class))

              precision    recall  f1-score   support

     Default       0.85      0.99      0.92       125
          No       1.00      0.97      0.98       656

    accuracy                           0.97       781
   macro avg       0.92      0.98      0.95       781
weighted avg       0.97      0.97      0.97       781

X.shape

(781, 10)

model.coef_
array([[ 1.137129 , -0.46395456, 0.83372221, -1.08383805, 3.80345376,
-0.53102867, 0.36191579, 5.95912536, 0.14435512, 2.30379498]])

model.intercept_

array([9.15964746])

The linear discriminant function (LDF) for the above model is:

'''
LDF = 9.159 + X1*1.137 + X2*(-0.464) + X3*0.834 + X4*(-1.084) + X5*3.803 + X6*(-0.531) + X7*0.362 + X8*5.959 + X9*0.144 + X10*2.304
'''

From the above equation, the following can be summarized:

the coefficient of the X8 predictor (Ownhouse) is the largest in magnitude, so it discriminates the target best
the coefficient of the X9 predictor (Purpose) is the smallest in magnitude, so it discriminates the target least
the discriminant score (DS) can be computed for each row using the above f(x), which aids classification

Classification by Discriminant Score

#Computation of Discriminant Scores/LDF for each row of data

DS=[]
coef=[1.137129  , -0.46395456,  0.83372221, -1.08383805,  3.80345376,
        -0.53102867,  0.36191579,  5.95912536,  0.14435512,  2.30379498] # coefficients from model.coef_[0]
for p in range(len(X)):              # loop over rows
    s3=0
    for q in range(X.shape[1]):      # loop over the 10 scaled predictors
        s3=s3+(X[p,q]*coef[q])       # accumulate coefficient * feature value
    s3=s3+9.159                      # add the (rounded) intercept from model.intercept_
    DS.append(s3)
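As a consistency check, the hand-computed scores should agree with sklearn's decision_function (up to the rounding of the hard-coded intercept); a small sketch:

print(np.allclose(DS, model.decision_function(X), atol=1e-2))  # True: same discriminant scores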

'''
Classification Rule :

if LDF>=0 then Classify as 1 
else if LDF <0 then Classify as 0 
'''

s1=0
s2=0
for i in range(len(X)):
    if DS[i]>=0:
        print("FOR Row:",i," ",X[i,:])
        print()
        print("-->","{ DS: ",DS[i],">=0 , Classify as 1}")
        print("------------------------------------------------------------------------------------------")
        s1+=1
    elif DS[i]<0:
        print("FOR Row:",i," ",X[i,:])
        print()
        print("-->","{ DS: ",DS[i],"<0 , Classify as 0}")
        print("------------------------------------------------------------------------------------------")
        s2+=1

FOR Row: 0   [ 0.65206018 -1.76366843 -2.20488852  0.42616174  0.19962674 -2.33206968
 -1.80993827  0.5492419  -0.42611361 -0.07673177]

--> { DS:  12.795920822262884 >=0 , Classify as 1}
------------------------------------------------------------------------------------------
FOR Row: 1   [-1.53360077  0.567       0.58832792  0.68885497  0.8644817  -2.33206968
 -1.80993827  0.5492419  -0.42611361  0.85963561]

--> { DS:  15.959211343453298 >=0 , Classify as 1}
------------------------------------------------------------------------------------------
FOR Row: 2   [ 0.65206018 -1.76366843 -2.20488852  0.9515482   0.19962674 -2.33206968
 -1.80993827  0.5492419  -0.42611361 -0.07673177]

--> { DS:  12.226486990137262 >=0 , Classify as 1}
------------------------------------------------------------------------------------------
... (one such block is printed for every row; output truncated)

print(s1," rows classified as 1 (Not Default) ")
print(s2," rows classified as 0 (Default) ")

635 rows classified as 1 (Not Default)
146 rows classified as 0 (Default)

Classification by Probability

pred_prob=model.predict_proba(X)#Posterior Probability for each row
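For a binary LDA in sklearn, the posterior returned by predict_proba is the logistic sigmoid of the discriminant score, so the probability rule below is equivalent to the score rule above; a quick check (expit is scipy's sigmoid):

from scipy.special import expit

print(np.allclose(pred_prob[:, 1], expit(model.decision_function(X))))  # True: P(Y='No'|X) = sigmoid(DS)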

pred_prob[:,1]

array([9.99997230e-01, 9.99999883e-01, 9.99995104e-01, 9.99999934e-01,
       9.99999346e-01, 9.99999818e-01, 9.99999571e-01, 9.99999307e-01,
9.99999636e-01, 9.99985261e-01, 9.99999641e-01, 9.99964028e-01,
9.99999504e-01, 9.99999816e-01, 9.99999944e-01, 9.99999600e-01,
9.99998776e-01, 9.99998879e-01, 9.99999957e-01, 9.99969492e-01,
9.99968905e-01, 9.99999447e-01, 9.99945047e-01, 9.99999605e-01,
9.99999805e-01, 9.99945047e-01, 9.99999605e-01, 9.99999000e-01,
9.99999867e-01, 9.99999426e-01, 9.99999849e-01, 9.99999886e-01,
9.99999276e-01, 9.99999662e-01, 9.99997674e-01, 9.99998690e-01,
9.99999591e-01, 9.99998592e-01, 9.99999019e-01, 9.99999578e-01,
9.99999583e-01, 9.99998721e-01, 9.99999784e-01, 9.99999548e-01,
9.99999352e-01, 9.99999747e-01, 9.99999878e-01, 9.99999809e-01,
9.99999421e-01, 9.99999417e-01, 9.99997974e-01, 9.99999957e-01,
9.99999930e-01, 9.99995647e-01, 9.99998811e-01, 9.99999106e-01,
9.99999544e-01, 9.99999351e-01, 9.99999013e-01, 9.99999828e-01,
9.99998272e-01, 9.99999966e-01, 9.99998358e-01, 9.99999911e-01,
9.99999811e-01, 9.99999666e-01, 9.99999744e-01, 9.99999214e-01,
9.99999964e-01, 9.99999076e-01, 9.99999631e-01, 9.99999653e-01,
9.99582194e-01, 4.15299803e-01, 9.99999616e-01, 9.99999933e-01,
9.99999952e-01, 9.99999827e-01, 9.99999688e-01, 9.99999109e-01,
9.99999330e-01, 9.99999879e-01, 9.99999972e-01, 9.99999913e-01,
9.99998997e-01, 9.99999292e-01, 9.99999322e-01, 9.99998482e-01,
9.99999876e-01, 9.99999988e-01, 9.99998739e-01, 9.99999873e-01,
9.99999931e-01, 9.99995178e-01, 9.99995416e-01, 9.99999978e-01,
9.99999003e-01, 9.99996312e-01, 9.99998958e-01, 9.99996312e-01,
9.99999859e-01, 9.99999862e-01, 9.99999855e-01, 9.99549097e-01,
9.99999585e-01, 9.99999812e-01, 9.99997612e-01, 9.99999739e-01,
9.99999460e-01, 9.99999896e-01, 9.99999821e-01, 4.15150988e-01,
3.90301730e-01, 5.87157584e-01, 1.10849153e-02, 2.71041726e-01,
1.61645802e-01, 8.74835930e-01, 4.65103366e-01, 3.50892366e-01,
4.15205003e-01, 9.02886741e-01, 7.28027133e-01, 1.23825710e-01,
4.29940069e-01, 3.61972628e-01, 6.33954029e-01, 9.31032412e-01,
9.01149356e-01, 9.27306677e-01, 9.99999688e-01, 9.99996368e-01,
9.99999095e-01, 9.99999992e-01, 9.99999847e-01, 9.99999902e-01,
9.99998477e-01, 9.99999290e-01, 9.99999919e-01, 9.99999788e-01,
9.99999850e-01, 9.99999956e-01, 9.99994861e-01, 9.99997722e-01,
9.99998760e-01, 9.99999229e-01, 9.99999898e-01, 9.99999831e-01,
9.99999727e-01, 9.99999083e-01, 9.99998644e-01, 9.99999550e-01,
9.99999995e-01, 9.99999927e-01, 9.99999476e-01, 9.99999744e-01,
9.99998585e-01, 9.99999427e-01, 9.99999804e-01, 9.99999909e-01,
9.99999716e-01, 9.99999859e-01, 9.99999943e-01, 9.99998592e-01,
9.99999460e-01, 9.99999703e-01, 9.99999923e-01, 9.99999636e-01,
2.84522392e-01, 9.99999451e-01, 9.99999734e-01, 9.99998503e-01,
9.99999937e-01, 9.99999861e-01, 9.99998708e-01, 9.99996949e-01,
9.99922340e-01, 9.99999052e-01, 9.99997145e-01, 9.99999705e-01,
9.99996017e-01, 9.99991479e-01, 9.99996223e-01, 6.05726285e-01,
9.99999362e-01, 9.99999275e-01, 9.99999194e-01, 9.99983703e-01,
9.99999815e-01, 9.99999728e-01, 9.99999287e-01, 9.99999947e-01,
9.99999936e-01, 9.99999850e-01, 9.99999970e-01, 9.99999923e-01,
9.99999238e-01, 9.99999188e-01, 9.99999526e-01, 9.99999798e-01,
9.99999488e-01, 9.99999227e-01, 9.99999804e-01, 9.99997570e-01,
9.99998174e-01, 9.99999774e-01, 9.99999845e-01, 9.99999645e-01,
9.99999928e-01, 9.99998692e-01, 9.99999260e-01, 9.99999887e-01,
9.99994393e-01, 9.99999558e-01, 9.99936952e-01, 9.99998947e-01,
9.99999208e-01, 9.99994555e-01, 9.99999922e-01, 9.99999913e-01,
9.99999826e-01, 9.99999978e-01, 7.05627741e-01, 9.99999458e-01,
9.99999938e-01, 9.99976588e-01, 9.99999911e-01, 9.99999991e-01,
       9.99841960e-01, 9.99992949e-01, 9.63048628e-01, 6.76239653e-01, ...
(output truncated)

'''
Classification Rule :

if prob(Y=1|X) >= 0.5 then classify as 1
else if prob(Y=1|X) < 0.5 then classify as 0
'''

s3,s4=0,0
for i in range(len(pred_prob[:,1])):
    if pred_prob[:,1][i]>=0.5:
        print("FOR Row:",i," ",X[i,:])
        print()
        print("-->","{ prob(Y=1|X) =",pred_prob[:,1][i],">=0.5 , Classify as 1 }")
        print("------------------------------------------------------------------------------------------")
        s3+=1
    elif pred_prob[:,1][i]<0.5:
        print("FOR Row:",i," ",X[i,:])
        print()
        print("-->","{ prob(Y=1|X) =",pred_prob[:,1][i],"< 0.5 , Classify as 0 }")
        print("------------------------------------------------------------------------------------------")
        s4+=1

  

FOR Row: 0   [ 0.65206018 -1.76366843 -2.20488852  0.42616174  0.19962674 -2.33206968
 -1.80993827  0.5492419  -0.42611361 -0.07673177]

--> { prob(Y=1|X) = 0.999997229744646 >=0.5 , Classify as 1 }
------------------------------------------------------------------------------------------
FOR Row: 1   [-1.53360077  0.567       0.58832792  0.68885497  0.8644817  -2.33206968
 -1.80993827  0.5492419  -0.42611361  0.85963561]

--> { prob(Y=1|X) = 0.9999998828556507 >=0.5 , Classify as 1 }
------------------------------------------------------------------------------------------
FOR Row: 2   [ 0.65206018 -1.76366843 -2.20488852  0.9515482   0.19962674 -2.33206968
 -1.80993827  0.5492419  -0.42611361 -0.07673177]

--> { prob(Y=1|X) = 0.9999951042317538 >=0.5 , Classify as 1 }
------------------------------------------------------------------------------------------
... (one such block is printed for every row; output truncated)

print(s3," rows classified as 1 (Not Default) ")
print(s4," rows classified as 0 (Default) ")

635 rows classified as 1 (Not Default)
146 rows classified as 0 (Default)
