Project 3 Thera Bank

The document summarizes a project report for Thera Bank that aims to build a classification model to identify customers most likely to purchase a loan. It performs exploratory data analysis on a dataset of 5,000 customers, identifies patterns and outliers. Key findings include age and experience being normally distributed while income, credit spending and mortgages have many outliers. Family size and education correlate with loan acceptance, with more advanced education customers needing loans for higher studies. The analysis will inform feature selection for building a predictive model to increase the loan approval success rate.

Uploaded by

Meghapriya1234

PROJECT #3

Project report on Thera Bank

1.0 PROJECT OBJECTIVE

The objective is to build a classification model that identifies the customers most likely to
purchase a personal loan.

2.0 ASSUMPTIONS

Last year's campaign converted about 9% of liability customers into personal loan customers. The
aim is to increase this success rate.

3.0 EXPLORATORY DATA ANALYSIS

Exploratory data analysis is performed on the Thera Bank data set before predictive analysis and
model building, with the goal of identifying potential customers. It uses histograms,
identification of outliers and identification of specific areas (ZIP codes) to shorten the process of
identifying the customer base.

3.1 Environment set up and data analysis

3.1.1 Installation of packages and library setup

install.packages("caret")
install.packages("rpart")
install.packages("rpart.plot")
install.packages("randomForest")
install.packages("lattice")
install.packages("ggplot2")
install.packages("scales")
library(ROCR)
library(ineq)
library(rattle)
library(RColorBrewer)

3.1.2 Setup working directory

Set up working directory and import “Thera Bank_dataset.xlsx” for further interpretation of data and
model build-up.

setwd("C:/Users/MEGHA/Desktop/Thera Bank")

3.1.3 Import the data and read data set

library(readxl)

Thera_Bank_dataset <- read_excel("Thera Bank_dataset.xlsx")

View(Thera_Bank_dataset)

dim(Thera_Bank_dataset): To find out number of observations and variables.

[1] 5000 14

To find out Class of each Feature, along with internal structure

> str(Thera_Bank_dataset)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5000 obs. of 14 variables:
$ ID : num 1 2 3 4 5 6 7 8 9 10 ...
$ Age (in years) : num 25 45 39 35 35 37 53 50 35 34 ...
$ Experience (in years): num 1 19 15 9 8 13 27 24 10 9 ...
$ Income (in K/year) : num 49 34 11 100 45 29 72 22 81 180 ...
$ ZIP Code : num 91107 90089 94720 94112 91330 ...
$ Family members : num 4 3 1 1 4 4 2 1 3 1 ...
$ CCAvg : num 1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
$ Education : num 1 1 1 2 2 2 2 3 2 3 ...
$ Mortgage : num 0 0 0 0 0 155 0 0 104 0 ...
$ Personal Loan : num 0 0 0 0 0 0 0 0 0 1 ...
$ Securities Account : num 1 1 0 0 0 0 0 0 0 0 ...
$ CD Account : num 0 0 0 0 0 0 0 0 0 0 ...
$ Online : num 0 0 0 0 0 1 1 0 1 0 ...
$ CreditCard : num 0 0 0 0 1 0 0 1 0 0 ...

colnames(Thera_Bank_dataset) <- c('ID','Age (in years)','Experience (in years)','Income (in K/year)',
'ZIP Code','Family members','CCAvg','Education','Mortgage','Personal Loan',
'Securities Account','CD Account','Online','CreditCard')

> head(Thera_Bank_dataset)

# A tibble: 6 x 14
     ID `Age (in years)` `Experience (in~ `Income (in K/y~
  <dbl>            <dbl>            <dbl>            <dbl>
1     1               25                1               49
2     2               45               19               34
3     3               39               15               11
4     4               35                9              100
5     5               35                8               45
6     6               37               13               29

#renaming columns for ease of understanding

> names(Thera_Bank_dataset)[2]<-"age_in_years"
> names(Thera_Bank_dataset)[3]<-"experience_in_years"
> names(Thera_Bank_dataset)[4]<-"income_in_K_month"
> names(Thera_Bank_dataset)[5]<-"zip_code"
> names(Thera_Bank_dataset)[6]<-"family_size"
> names(Thera_Bank_dataset)[10]<-"did_accept_personal_loan_offer"
#target (response) variable
> names(Thera_Bank_dataset)[11]<-"have_securities_account"
> names(Thera_Bank_dataset)[12]<-"have_deposit_account"
> names(Thera_Bank_dataset)[13]<-"have_online_access"
> names(Thera_Bank_dataset)[14]<-"have_CC"

> summary(Thera_Bank_dataset)

ID age_in_years experience_in_years
Min. : 1 Min. :23.00 Min. :-3.0
1st Qu.:1251 1st Qu.:35.00 1st Qu.:10.0
Median :2500 Median :45.00 Median :20.0
Mean :2500 Mean :45.34 Mean :20.1
3rd Qu.:3750 3rd Qu.:55.00 3rd Qu.:30.0
Max. :5000 Max. :67.00 Max. :43.0
income_in_K_month zip_code family_size
Min. : 8.00 Min. : 9307 Min. :1.000
1st Qu.: 39.00 1st Qu.:91911 1st Qu.:1.000
Median : 64.00 Median :93437 Median :2.000
Mean : 73.77 Mean :93153 Mean :2.397
3rd Qu.: 98.00 3rd Qu.:94608 3rd Qu.:3.000
Max. :224.00 Max. :96651 Max. :4.000
NA's :18
CCAvg Education Mortgage
Min. : 0.000 Min. :1.000 Min. : 0.0
1st Qu.: 0.700 1st Qu.:1.000 1st Qu.: 0.0
Median : 1.500 Median :2.000 Median : 0.0
Mean : 1.938 Mean :1.881 Mean : 56.5
3rd Qu.: 2.500 3rd Qu.:3.000 3rd Qu.:101.0
Max. :10.000 Max. :3.000 Max. :635.0
did_accept_personal_loan_offer have_securities_account
Min. :0.000 Min. :0.0000
1st Qu.:0.000 1st Qu.:0.0000
Median :0.000 Median :0.0000
Mean :0.096 Mean :0.1044
3rd Qu.:0.000 3rd Qu.:0.0000
Max. :1.000 Max. :1.0000
have_deposit_account have_online_access have_CC
Min. :0.0000 Min. :0.0000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
Median :0.0000 Median :1.0000 Median :0.000
Mean :0.0604 Mean :0.5968 Mean :0.294
3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.000
Max. :1.0000 Max. :1.0000 Max. :1.000

family_size_fact Education_fact
1 :1464 1:2096
2 :1292 2:1403
3 :1009 3:1501
4 :1217
NA's: 18

have_deposit_account_fact have_online_access_fact
0:4698 0:2016
1: 302 1:2984
have_CC_fact zip_code_fact have_securities_account_fact
0:3530 94720 : 169 0:4478
1:1470 94305 : 127 1: 522
95616 : 116
90095 : 71
93106 : 57
92037 : 54
(Other):4406

Out of 5,000 customers, 9.6% (480) took the personal loan and 90.4% (4,520) did not.

did_accept_personal_loan_offer_fact
0:4520
1: 480

Points considered:

#1. ID is only an identifier and is therefore removed from the dataset

#2. There are negative values in professional experience (minimum -3); all negative values are replaced with 0

#3. There are invalid ZIP codes (some 4-digit ones, which need to be prefixed with 0)

#4. The data is imbalanced: only 9.6% of the customers accepted the offer

#5. The majority of customers (about 90%) do not have a securities account

#6. The majority of customers (about 94%) do not have a deposit account

#7. Mortgage data is highly right-skewed: the median is 0, so more than half of the customers have no house mortgage

#8. Age and professional experience are approximately normally distributed
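
The first three points can be sketched in base R as follows. This is a minimal sketch on a toy data frame, not the report's own cleaning code: the data frame name bank and its values are made up for illustration, and padding 4-digit ZIP codes with a leading zero simply follows point #3 above.

```r
# Toy stand-in for the bank data; column names follow the renamed dataset,
# the values are hypothetical.
bank <- data.frame(
  ID = 1:4,
  experience_in_years = c(1, -3, 19, -1),
  zip_code = c(91107, 9307, 94720, 90089)
)

# 1. Drop the identifier column.
bank$ID <- NULL

# 2. Replace negative experience values with 0.
bank$experience_in_years <- pmax(bank$experience_in_years, 0)

# 3. Left-pad ZIP codes to five characters; they are stored as text so the
#    leading zero survives.
bank$zip_code <- sprintf("%05d", as.integer(bank$zip_code))

print(bank)
```

Storing the ZIP code as text also fits its later use as a factor rather than as a number.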

## Univariate analysis
> hist(Thera_Bank_dataset$age_in_years,main = "Histogram of Age",xlab = "age_in_years")

X-axis: Age
Y-axis: Frequency

Inference : Age is very close to the normal distribution

#Converting the categorical variables to factor from numeric

> Thera_Bank_dataset$family_size_fact<-as.factor(Thera_Bank_dataset$family_size)
> Thera_Bank_dataset$Education_fact<-as.factor(Thera_Bank_dataset$Education)
> Thera_Bank_dataset$did_accept_personal_loan_offer_fact<-as.factor(Thera_Bank_dataset$did_accept_personal_loan_offer)
> Thera_Bank_dataset$have_deposit_account_fact<-as.factor(Thera_Bank_dataset$have_deposit_account)
> Thera_Bank_dataset$have_online_access_fact<-as.factor(Thera_Bank_dataset$have_online_access)
> Thera_Bank_dataset$have_CC_fact<-as.factor(Thera_Bank_dataset$have_CC)
> Thera_Bank_dataset$zip_code_fact<-as.factor(Thera_Bank_dataset$zip_code)
> Thera_Bank_dataset$have_securities_account_fact<-as.factor(Thera_Bank_dataset$have_securities_account)

# Grouped Bar Plot

counts <- table(Thera_Bank_dataset$family_size_fact, Thera_Bank_dataset$did_accept_personal_loan_offer)
barplot(counts, main="family_size_fact vs did_accept_personal_loan_offer",
xlab="Personal Loan No vs Yes", col=c("darkblue","red","green","yellow"),
legend = rownames(counts), beside=TRUE)

Inference : Larger families are more likely to take the loan.
0 - No loan
1 - Loan

counts <- table(Thera_Bank_dataset$Education, Thera_Bank_dataset$did_accept_personal_loan_offer)
> barplot(counts, main="Education Category vs did_accept_personal_loan_offer",
+ xlab="Personal Loan No vs Yes", col=c("darkblue","red","green"),
+ legend = c("1 Undergrad", "2 Graduate","3 Advanced/Professional"), beside=TRUE)

Inference : Hypothesis: customers with Advanced/Professional education may need loans for higher studies

Boxplot for numerical data

boxplot(Thera_Bank_dataset$age_in_years, main = toupper("Boxplot of Age"),ylab = "age_in_years",col = "blue")

Inference : Not many outliers in the Age column

boxplot(Thera_Bank_dataset$experience_in_years,main = toupper("Boxplot of Experience"),ylab = "experience_in_years",col = "purple")

Inference : Not many outliers in the Experience column

boxplot(Thera_Bank_dataset$income_in_K_month,main = toupper("Boxplot of Monthly Income"),ylab = "Monthly Income",col = "orange")

Inference : Many outliers in the monthly income

boxplot(Thera_Bank_dataset$CCAvg,main = toupper("Boxplot of Average Spending of credit card per month"),ylab = "Average Spending",col = "red")

Inference : Average monthly credit card spending also has many outliers

boxplot(Thera_Bank_dataset$Mortgage,main = toupper("Boxplot of House Mortgage if any"),ylab = "House Mortgage",col = "red")

Inference : House mortgage also has many outliers

Maximum outliers observed in

• Monthly income
• Average spending against Credit card
• Mortgage
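
For reference, the points a boxplot draws as outliers are those lying more than 1.5 times the interquartile range beyond the quartiles (Tukey's hinges). The sketch below reproduces that rule by hand on a small made-up vector and compares it with boxplot.stats(); the values in x are hypothetical, not taken from the bank data.

```r
# Toy income-like values with one extreme observation (hypothetical data).
x <- c(8, 39, 45, 64, 72, 98, 224)

# Lower and upper hinges, as used by boxplot().
hinges <- fivenum(x)[c(2, 4)]
iqr <- diff(hinges)
lower <- hinges[1] - 1.5 * iqr
upper <- hinges[2] + 1.5 * iqr

# Values outside the fences are the ones drawn as individual points.
flagged <- x[x < lower | x > upper]
print(flagged)

# The same values are returned by boxplot.stats().
print(boxplot.stats(x)$out)
```

For a skewed variable such as income or mortgage, many points beyond the upper fence are expected, so they are not necessarily data errors.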

3.1.4 Correlation between the numeric features

my_data <- Thera_Bank_dataset[, c(2,3,4,7,9)]

> res <- cor(my_data)
> round(res, 2)

age_in_years experience_in_years
age_in_years 1.00 0.99
experience_in_years 0.99 1.00
income_in_K_month -0.06 -0.05
CCAvg -0.05 -0.05
Mortgage -0.01 -0.01
income_in_K_month CCAvg Mortgage
age_in_years -0.06 -0.05 -0.01
experience_in_years -0.05 -0.05 -0.01
income_in_K_month 1.00 0.65 0.21
CCAvg 0.65 1.00 0.11
Mortgage 0.21 0.11 1.00
                    age_in_years experience_in_years income_in_K_month       CCAvg    Mortgage
age_in_years          1.00000000          0.99421486       -0.05526862 -0.05201218 -0.01253859
experience_in_years   0.99421486          1.00000000       -0.04657418 -0.05007651 -0.01058155
income_in_K_month    -0.05526862         -0.04657418        1.00000000  0.64598367  0.20680623
CCAvg                -0.05201218         -0.05007651        0.64598367  1.00000000  0.10990472
Mortgage             -0.01253859         -0.01058155        0.20680623  0.10990472  1.00000000

library(corrplot)
Thera_Bank_dataset_cont_vars<-Thera_Bank_dataset[,c(2:4,7,9)]
corrmatrix<-cor(Thera_Bank_dataset_cont_vars)
corrplot(corrmatrix,method='circle',type='upper',order='FPC')

Inference
1. There is a very high correlation (0.99) between age and professional experience
2. There is a moderate correlation (0.65) between income and spending on credit card
3. There is a weak correlation (0.21) between income and mortgage amount
4. There is no significant correlation (-0.05) between years of experience and income
5. The correlation plot corroborates the above, so we drop the professional experience column and
keep only age in years
4.0 Cluster Analysis

Only the numerical features are used for clustering.
The categorical features are not considered, as distance-based clustering is not meaningful for
them.
1. Age in Years
2. Experience
3. Monthly Income
4. CCAvg
5. Mortgage

A fundamental step for any unsupervised algorithm is to determine the optimal number of clusters
into which the data may be clustered. The Elbow Method is one of the most popular methods to
determine this optimal value of k (clusters). It is applied here using the K-Means clustering technique.

wss <- (nrow(my_data)-1)*sum(apply(my_data,2,var))
> for(i in 2:15) wss[i] <- sum(kmeans(my_data,centers=i,iter.max=15)$withinss)
> plot(1:15,wss,type="b",main="15 clusters",xlab="no. of clusters",ylab="within-cluster sum of squares")

Inference : Based on the elbow curve, 4 clusters are a reasonable choice.
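
The elbow computation can be sketched in a self-contained way as follows. The data here are synthetic, and two details are assumptions that go beyond the report: the features are standardised with scale() so that no single variable dominates the distances, and nstart = 10 is used so that each k gets several random starts.

```r
set.seed(42)
# Synthetic data: three well-separated groups in two dimensions.
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2),
  matrix(rnorm(100, mean = 12), ncol = 2)
)
toy_scaled <- scale(toy)

# Total within-cluster sum of squares for k = 1..k_max.
k_max <- 8
wss <- sapply(1:k_max, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 10, iter.max = 25)$tot.withinss
})

# Relative drop in WSS when going from k to k + 1 clusters;
# the elbow is where these drops become small.
rel_drop <- -diff(wss) / wss[-k_max]
print(round(wss, 1))
print(round(rel_drop, 2))
```

Reading the elbow remains a visual judgment; the relative drops only make that judgment easier to compare across runs.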

fit <- kmeans(my_data,4)

> library(cluster)
> library(fpc)

Getting the cluster means

mydata <- data.frame(my_data,fit$cluster)
> cluster_mean <- aggregate(mydata,by = list(fit$cluster),FUN = mean)
> cluster_mean

Group.1 age_in_years experience_in_years income_in_K_month

1 1 45.35738 20.12164 56.98742
2 2 45.73072 20.40566 51.66331
3 3 44.62385 19.47706 129.17431
4 4 44.44778 19.44667 139.28778
CCAvg Mortgage fit.cluster
1 1.566745 141.411074 1
2 1.387648 0.000000 2
3 3.044434 343.116208 3
4 3.605644 1.925556 4

"It is important to remember that Data Analytics Projects require a delicate balance between
experimentation, intuition, but also following (once a while) a process to avoid getting fooled by
randomness and 'finding results and patterns' that are mainly driven by our own biases and not by
the facts/data themselves"

As K-means is sensitive to outliers, we re-cluster the data after outlier removal.

> my_data2<-my_data
> outliers3 <- boxplot(my_data2$income_in_K_month, plot=FALSE)$out
> my_data2 <- my_data2[!(my_data2$income_in_K_month %in% outliers3),]
> outliers4 <- boxplot(my_data2$CCAvg, plot=FALSE)$out
> my_data2 <- my_data2[!(my_data2$CCAvg %in% outliers4),]
> outliers5 <- boxplot(my_data2$Mortgage, plot=FALSE)$out
> my_data2 <- my_data2[!(my_data2$Mortgage %in% outliers5),]
> nrow(my_data2)

[1] 4402

Plotting elbow curve for the outlier removed data

Inference : 5 clusters make sense here

fit2<-kmeans(my_data2,5)
> my_data3 <- data.frame(my_data2)
> cluster_mean_2 <- aggregate(my_data3,by = list(fit2$cluster),FUN = mean)
> cluster_mean_2

Group.1 age_in_years experience_in_years income_in_K_month

1 1 45.41659 20.198930 59.32114
2 2 35.10999 9.868307 71.47323
3 3 43.52968 18.610350 142.68493
4 4 47.66155 22.370759 32.45961
5 5 54.64748 29.233094 81.78273
CCAvg Mortgage
1 1.6275914 144.8599465
2 1.8128075 0.0000000
3 3.7670928 2.1476408
4 0.9817447 0.8626817
5 2.0223741 0.3280576

Inference : The 5 clusters make much more sense after outlier removal

my_data2$cluster<-fit2$cluster
install.packages("dplyr")
library(dplyr)
head(my_data2)

# A tibble: 6 x 6
age_in_years experience_in_ye~ income_in_K_mon~ CCAvg Mortgage cluster
<dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 25 1 49 1.6 0 2
2 45 19 34 1.5 0 4
3 35 9 100 2.7 0 2
4 37 13 29 0.4 155 1
5 53 27 72 1.5 0 5
6 50 24 22 0.3 0 4

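#note: this assumes the row names of my_data2 still hold the original row positions, which is true for a
#base data.frame but not for a tibble (row names are reset when a tibble is subset)
index<-as.integer(rownames(my_data2))

did_accept_personal_loan_offer<-Thera_Bank_dataset$did_accept_personal_loan_offer[index]

my_data2$Personal_loan<-did_accept_personal_loan_offer

head(my_data2)

  age_in_years experience_in_y~ income_in_K_mon~ CCAvg Mortgage cluster Personal_loan
1           25                1               49   1.6        0       2             0
2           45               19               34   1.5        0       4             0
3           35                9              100   2.7        0       2             0
4           37               13               29   0.4      155       1             0
5           53               27               72   1.5        0       5             0
6           50               24               22   0.3        0       4             0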

To check Personal_loan vs cluster with a bar chart

# Grouped Bar Plot

counts <- table(my_data2$Personal_loan,my_data2$cluster)

barplot(counts, main="Cluster vs Personal Loan", xlab="Cluster",
col=c("red","green"), legend = c("Personal_Loan_No","Personal_Loan_Yes"), beside=TRUE)

Inference :

Since the target population is close to 500 customers, we will target the cluster 4 segment to achieve a
higher conversion ratio.

This would help the company to spend the marketing money on the correctly predicted customers.
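
A bar chart of counts can hide differences in cluster size, so it may help to compare the share of loan takers within each cluster. A minimal sketch, using made-up cluster labels and outcomes rather than the report's data:

```r
# Hypothetical cluster labels and loan outcomes (0 = no loan, 1 = loan).
cluster <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)
personal_loan <- c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0)

counts <- table(personal_loan, cluster)

# Column-wise proportions: the share of each cluster that accepted the loan.
rates <- prop.table(counts, margin = 2)["1", ]
print(round(rates, 2))

# Cluster with the highest conversion rate.
best <- names(which.max(rates))
print(best)
```

The cluster with the highest rate is the natural candidate for targeting, provided it also contains enough customers.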

5.0 CART Model

Creating Training and Testing Dataset

The given data set is divided into training and testing data sets in a 70:30 proportion. The distribution of
the Responder and Non Responder classes is verified in both data sets to ensure the proportions are close to
equal.

library(caret)
set.seed(111)
#Personal_loan is the response column (called did_accept_personal_loan_offer in section 3)
trainIndex <- createDataPartition(Thera_Bank_dataset$Personal_loan,p=0.7,
list = FALSE,
times = 1)
train.data <- Thera_Bank_dataset[trainIndex,2:length(Thera_Bank_dataset)]
test.data <- Thera_Bank_dataset[-trainIndex,2:length(Thera_Bank_dataset)]
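
The idea of a stratified split, and the check that both parts keep the same class proportions, can be illustrated in base R. This is a sketch under stated assumptions: the response y is simulated with 10% positives, and the split is done by hand with sample() rather than with caret.

```r
set.seed(111)
# Hypothetical imbalanced response: 900 non-responders and 100 responders.
y <- rep(c(0, 1), times = c(900, 100))

# Sample 70% of the row indices within each class separately.
idx_by_class <- split(seq_along(y), y)
train_idx <- sort(unlist(lapply(idx_by_class, function(i) {
  sample(i, round(0.7 * length(i)))
})))

train_y <- y[train_idx]
test_y <- y[-train_idx]

# Class proportions should match in both parts.
print(prop.table(table(train_y)))
print(prop.table(table(test_y)))
```

Sampling within each class is what keeps the responder share stable; a plain random split only matches the proportions approximately.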

5.1 Model Building - CART (Unbalanced Dataset)

Setting the control parameter inputs for rpart

library(rpart)
r.ctrl <- rpart.control(minsplit = 100,
minbucket = 10,
cp = 0,
xval = 10)
#ID was already excluded when train.data was created
cart.train <- train.data
m1 <- rpart(formula = Personal_loan~.,
data = cart.train,
method = "class",
control = r.ctrl)
#install.packages("rattle")
#install.packages("RColorBrewer")
library(rattle)
fancyRpartPlot(m1)

printcp(m1)

Classification tree:
rpart(formula = Personal_loan ~ ., data = cart.train, method =
"class", control = r.ctrl)
Variables actually used in tree construction:
[1] CCAvg Education Family_members Income_Monthly
Zip_code
Root node error: 308/3062 = 0.10059
n= 3062

         CP nsplit rel error  xerror     xstd
1 0.3214286      0  1.000000 1.00000 0.054039
2 0.1525974      2  0.357143 0.36688 0.033871
3 0.0487013      3  0.204545 0.21753 0.026283
4 0.0097403      5  0.107143 0.23052 0.027039
5 0.0000000      6  0.097403 0.23377 0.027224
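
One common way to read such a table is to look for the row with the lowest cross-validated error (xerror), or for the smallest tree within one standard error (xstd) of that minimum. The sketch below applies both rules to the values printed above; it is offered as an illustration of how the table can be read, not as the selection procedure used in this report.

```r
# Values copied from the printcp(m1) output above.
cp_table <- data.frame(
  CP     = c(0.3214286, 0.1525974, 0.0487013, 0.0097403, 0.0000000),
  nsplit = c(0, 2, 3, 5, 6),
  xerror = c(1.00000, 0.36688, 0.21753, 0.23052, 0.23377),
  xstd   = c(0.054039, 0.033871, 0.026283, 0.027039, 0.027224)
)

# Row with the lowest cross-validated error.
best <- which.min(cp_table$xerror)

# One-standard-error rule: the smallest tree whose xerror is within
# one xstd of the minimum.
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se <- min(which(cp_table$xerror <= threshold))

print(cp_table[unique(c(best, one_se)), ])
```

On a fitted model the same columns are available in the cptable component of the rpart object.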

5.2 Pruning the cart tree to ensure that the model is not overfitting

“Overfitting happens when a model learns the detail and noise in the training data to the extent that
it negatively impacts the performance of the model on new data. The problem is that these concepts
do not apply to new data and negatively impact the models ability to generalize”

plotcp(m1)

We take 0.045 as the pruning parameter (cp) and rebuild the tree.

ptree<- prune(m1, cp= 0.045)

printcp(ptree)

Classification tree:
rpart(formula = Personal_loan ~ ., data = cart.train, method =
"class", control = r.ctrl)

Variables actually used in tree construction:

[1] CCAvg Education Family_members Income_Monthly
Zip_code

Root node error: 308/3062 = 0.10059

n= 3062

        CP nsplit rel error  xerror     xstd
1 0.321429      0   1.00000 1.00000 0.054039
2 0.152597      2   0.35714 0.36688 0.033871
3 0.048701      3   0.20455 0.21753 0.026283
4 0.045000      5   0.10714 0.23052 0.027039

fancyRpartPlot(ptree,
uniform = TRUE,
main = "Final Tree",
palettes = c("Blues", "Oranges"))

5.3 Predicting on the test set

## Scoring Holdout sample

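cart.test <- test.data
cart.test$predict.class = predict(ptree, cart.test,type = "class")
x<-cart.test$Personal_loan
cart.test$predict.score = predict(ptree, cart.test, type = "prob")

library(caret)
#rows of this table are the actual classes and columns the predicted classes; caret reads rows as
#predictions, so in its output Sensitivity and Pos Pred Value (and Specificity and Neg Pred Value) trade places
confusionMatrix(table(as.factor(x),cart.test$predict.class ))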

5.4 Confusion Matrix and Statistics

       0    1
  0 1744   22
  1   25  147

               Accuracy : 0.9757
                 95% CI : (0.9679, 0.9821)
    No Information Rate : 0.9128
    P-Value [Acc > NIR] : <2e-16
                  Kappa : 0.8489
 Mcnemar's Test P-Value : 0.7705
            Sensitivity : 0.9859
            Specificity : 0.8698
         Pos Pred Value : 0.9875
         Neg Pred Value : 0.8547
             Prevalence : 0.9128
         Detection Rate : 0.8999
   Detection Prevalence : 0.9112
      Balanced Accuracy : 0.9278
       'Positive' Class : 0
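
Because the positive class reported above is 0 (no loan), it can be useful to recompute the metrics for the class of interest, the loan takers (class 1). The sketch below does this by hand from the four counts. It assumes rows are the actual classes and columns the predicted classes, which is how the table was built in the call table(as.factor(x), cart.test$predict.class).

```r
# Counts copied from the matrix above: rows = actual class, columns = predicted class.
cm <- matrix(c(1744, 25, 22, 147), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

tp <- cm["1", "1"]  # loan takers predicted as loan takers
fn <- cm["1", "0"]  # loan takers that were missed
fp <- cm["0", "1"]  # non-takers predicted as loan takers

accuracy  <- sum(diag(cm)) / sum(cm)
recall    <- tp / (tp + fn)   # share of actual loan takers that were found
precision <- tp / (tp + fp)   # share of predicted loan takers that were correct

print(round(c(accuracy = accuracy, recall = recall, precision = precision), 4))
```

On imbalanced data, recall and precision for the minority class say more about campaign targeting than overall accuracy does.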

5.5 AUC/ROC performance metrics

ROC curve for pruned tree

library("ROCR")
Pred.cart = predict(ptree, newdata = cart.test, type = "prob")[,2]
Pred2 = prediction(Pred.cart, cart.test$Personal_loan)
plot(performance(Pred2, "tpr", "fpr"))
abline(0, 1, lty = 2)

Computing AUC
auc.tmp <- performance(Pred2,"auc")
auc <- as.numeric(auc.tmp@y.values)
print(auc)
[1] 0.973238
Inference : The area under the curve is close to 0.97
Result : The CART model reaches about 97.6% accuracy (0.9757) on the test data when predicting which
customers will take a personal loan.
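
The AUC can be read as the probability that a randomly chosen loan taker receives a higher score than a randomly chosen non-taker. A minimal base R sketch of this rank-based computation follows; the function name auc_rank and the example scores are hypothetical and only serve to illustrate the definition.

```r
# Rank-based AUC (Mann-Whitney form).
# labels: 1 = positive, 0 = negative; tied scores receive average ranks.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical predicted probabilities and true labels.
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1,   1,   1,    0,   0,   0)

# 8 of the 9 positive/negative pairs are ordered correctly.
print(auc_rank(scores, labels))
```

An AUC of 1 means every positive is scored above every negative, while 0.5 corresponds to random ordering.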

6.0 Random Forest model

library(randomForest)
library(caret)

#Personal_loan is the response column (called did_accept_personal_loan_offer in section 3)
trainIndex <- createDataPartition(Thera_Bank_dataset$Personal_loan, p=0.7, list = FALSE, times = 1)

#drop the ZIP code column
Thera_Bank_dataset_2 <- Thera_Bank_dataset[,-5]

train.data <- Thera_Bank_dataset_2[trainIndex,2:length(Thera_Bank_dataset_2)]
train.data$Personal_loan<-as.factor(train.data$Personal_loan)
train.data<-na.omit(train.data)

test.data <- Thera_Bank_dataset_2[-trainIndex,2:length(Thera_Bank_dataset_2)]
test.data<-na.omit(test.data)
test.data$Personal_loan<-as.factor(test.data$Personal_loan)

model1 <- randomForest(Personal_loan ~ ., ntree = 100,data = train.data, importance = TRUE)

model1

Call:
randomForest(formula = Personal_loan ~ ., data = train.data, ntree = 100, importance = TRUE)
Type of random forest: classification
Number of trees: 100
No. of variables tried at each split
OOB estimate of error rate: 1.41%

     0   1 class.error
0 2730   6 0.002192982
1   37 277 0.117834395

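Pred_rf <- predict(model1, test.data, type = 'class')

#the actual classes are passed as the first (data) argument, so the "Prediction" rows of the output are
#the actual classes and the "Reference" columns are the predicted classes
confusionMatrix(test.data$Personal_loan, Pred_rf)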
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1766    2
         1   25  139

               Accuracy : 0.986
                 95% CI : (0.9797, 0.9908)
    No Information Rate : 0.927
    P-Value [Acc > NIR] : < 2.2e-16
                  Kappa : 0.9039
 Mcnemar's Test P-Value : 2.297e-05
            Sensitivity : 0.9860
            Specificity : 0.9858
         Pos Pred Value : 0.9989
         Neg Pred Value : 0.8476
             Prevalence : 0.9270
         Detection Rate : 0.9141
   Detection Prevalence : 0.9151
      Balanced Accuracy : 0.9859
       'Positive' Class : 0

Result : Random forest has performed very well, with 98.6% accuracy on the test data

ROC curve for random forest

library("ROCR")
Pred_rf <- predict(model1, test.data, type = 'prob')[,2]
library(pROC)
rf.roc<-roc(test.data$Personal_loan,Pred_rf)
plot(rf.roc)
auc(rf.roc)
Area under the curve: 0.9975

Inference : The ROC curve is very close to ideal

Inference:
• Monthly income and education are the most significant factors in deciding whether a customer takes a personal loan.
• Random forest performed better, with 98.6% accuracy on the test data, compared to
the CART model, which reached 97.6% accuracy in predicting which customers will take a personal
loan on the test data
