

Jurnal Kejuruteraan 36(5) 2024: 1935–1944


https://doi.org/10.17576/jkukm-2024-36(5)-13

Algorithm Comparison for Data Mining Classification: Assessing Bank Customer Credit Scoring Default Risk

Elaf Adel Abbas a* & Nisreen Abbas Hussein b

a College of Computer Science and Information Technology, University of Karbala, Iraq
b Babylon Education Directorate, Ministry of Education, Iraq

*Corresponding author: [email protected]

Received 22 January 2024; Received in revised form 13 May 2024; Accepted 11 July 2024; Available online 30 September 2024

ABSTRACT

Rating consumer credit risk involves assessing the risks of credit applications; thus, every business must appropriately identify good and bad debtors. This study uses machine learning approaches to model consumer credit risk and compares the results to a logistic model, determining whether machine learning improves the classification of client defaults. The study examines how customer attributes affect the default experience. Despite advances in machine learning models for credit assessment, unbalanced datasets and some algorithms' inability to explain their forecasts remain major issues. This study used 2005 data on Taiwanese credit card consumers' education, age, marital status, payment history, and sex. The default experience is modeled using Logistic Regression, K neighbors, Support Vector Machine, Decision Tree, Random Forest, Ada Boost Classifier, and Gradient Boosting. The models' accuracy, precision, recall, receiver operating characteristic (ROC) curve, and precision-recall curve were evaluated. Random Forest's 97% ROC-AUC outperformed all other models on the evaluated metrics. The logistic model underperformed, while machine learning improved the default classification.

Keywords: Credit scoring; artificial intelligence; machine learning; classification techniques; logistic regression

INTRODUCTION

Financial risk management is a delicate topic that should be investigated. Some organizations, industries, and governments worldwide depend on risk management systems and credit scoring (Zhou et al. 2018). Financial fraud, which includes business fraud, personal loan fraud, money laundering, credit card fraud, insurance fraud, peer-to-peer lending fraud, and others, is a conscious strategy, culpability, or fraud committed with the intention of exploiting the structure of an organization in order to illicitly achieve financial benefit without resorting to physical coercion (Pławiak et al. 2020). The line separating fraudulent activity from damaging credit events is becoming hazier in the credit markets as more credit events shift online and counterfeiters improve their skills. As a result, financial institutions frequently combine fraud detection, credit scoring, and other factors when making decisions in order to lower the risk of credit loss. Banks and customers face many risks. Banks use Credit Scoring (CS) to evaluate loan applications (Durand, 1941). To manage financial risks and decide whether to lend money, banks and other financial institutions must collect customer data. This method can help identify good and bad debtors. Banks consider "good borrowers" clients with clean credit histories; "bad borrowers" have poor credit. A simple selection technique may not always classify correctly, so more accurate automated approaches that reduce prediction errors are critically needed to manage vast and complicated CS datasets (Anderson 2007). The credit rating process comprises model development and model implementation. The first step is to collect samples of good and bad loan applications from past borrowers to train and construct a model that can predict payment behavior. Formally, let A = {ai, bi}, where ai denotes the i-th loan application and bi represents its status as a good or bad loan. The loan application form has several properties or variables, ai = (ai1, ai2, ..., aim). Thus, a quantitative model is constructed to convert loan application characteristics to the chance of default [5]. After the model's development and training, it is time to test it and see how well it classifies loan applicants.
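The quantitative model just described maps an application's attributes to a default score and then to an accept/reject decision against a cutoff. A minimal sketch of that decision step, with a hypothetical cutoff value (in practice the lender chooses it):

```python
# A minimal sketch of a score-vs-cutoff lending decision. The cutoff value
# here is hypothetical; in practice the lender calibrates it.
def approve_loan(score: float, cutoff: float = 0.5) -> bool:
    """Approve when the model's score f(x) is below the cutoff.

    Convention used in the text: 0 = good applicant, 1 = bad applicant,
    so a lower score means lower predicted default risk.
    """
    return score < cutoff

print(approve_loan(0.2), approve_loan(0.8))  # -> True False
```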

The applicant's final score, which the lender will use to decide whether to grant the loan, is based on a threshold or cutoff score, the threshold value (Tc). A loan applicant's status is usually (0) for good and (1) for bad. The model's score is f(x) for new loan applications. If this score is below Tc, the loan is approved; otherwise, it is denied.

LITERATURE REVIEW

This section highlights how financial institutions in both developed and emerging nations are developing and implementing cutting-edge technologies based on Artificial Intelligence and Machine Learning (AI-ML) strategies to deal with their various credit risks. The majority of financial organizations today deal with various risks daily; credit risk, operational risk, market risk, and liquidity risk are a few of these dangers (Leo et al. 2019).

Few writers have discussed the socioeconomic implications of determining a client's credit score in earlier research papers, which have mostly focused on a customer's demographics and statistical factors (Moradi & Mokhatab Rafiei, 2019). The authors emphasized that political changes have an impact on economic aspects as well, and they took politico-economic issues into account when estimating credit risk. To anticipate whether a specific loan is performing, they created an adaptive network-based fuzzy inference system. Most banks and other financial institutions now prioritize social and economic effects due to Covid-19. One study evaluated customers' credit scores using data from an Iranian bank, especially under exceptional societal and economic conditions: using the features of Iranian bank clients' behavior as input, it assessed credit scoring with a fuzzy inference method that outperformed more traditional models, especially during economic crises.

Researchers have stressed the non-linear and non-parametric correlations between the factors influencing bank lending and the volume of outstanding loans (Ozgur et al. 2021). They showed how 19 macroeconomic, local, and international variables impacted Turkish bank loans between 2002Q4 and 2019Q2, and they contrasted a regression model with ML-based approaches to determine how these factors affect the results. The authors also pointed out that conventional linear regression methods struggled with the extremely high dimensionality of the datasets, whereas ML-based techniques were able to accommodate it. For the majority of their debt recovery management, banking institutions rely on outside sources, which entails increased expenses and market risks. Therefore, it is always advised to have a reliable strategy in place for predicting debt repayment before extending any credit to the debtors.

Some authors classified the sufficiency of borrowers using ML approaches like Random Forest (RF) and AdaBoost (Aniceto et al. 2020). Using a loan database from a Brazilian bank, researchers examined various ML techniques and evaluated the suitability of the borrowers. Low-income borrowers of large Brazilian financial institutions make up the majority of the data sets, and the portfolio's default rate was close to 48%. Using real-world data, they developed a machine learning (ML) model and showed that Random Forest and AdaBoost performed better than competing methods. Some authors suggested using decision tree models to determine whether a loan poses a risk of becoming non-performing. Most academics emphasized that credit scoring is a classification problem (Boughaci & Alkhawaldeh, 2018). They compared the credit data sets from Germany and Australia with well-known classifier benchmarks, coupling the Support Vector Machine (SVM) model with the Local Search method (LS), Stochastic Local Search technique (SLS), and Variable Neighborhood Search (VNS) approach to determine a person's credit score.

MACHINE-LEARNING CLASSIFICATION TECHNIQUES IN CREDIT-SCORING

The major goal is to create a model that can effectively categorize and measure borrower repayment behavior as well as anticipate borrower loan applications. This section provides an overview of the most popular modern machine-learning classification approaches that are pertinent to this research and were used to create the credit-scoring models.

LOGISTIC REGRESSION

Logistic Regression (LR) is a specific type of Generalized Linear Model (GLM), a generalization of ordinary linear models. Logistic regression is therefore similar to linear regression and is employed in this analysis to solve a classification problem: a binary outcome variable, typically denoted by 0 or 1, is modeled using LR. According to Thomas, the scoring model's result must be binary (accept/good loan, 0; reject/bad loan, 1), based on a number of independent variables (Ala'raj & Abbod 2015).
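A minimal sketch of the logistic link used by LR to turn a linear combination of attributes into a default probability. The weights and intercept below are hypothetical, purely for illustration; a real model learns them from the training data.

```python
import math

# A sketch of logistic-regression scoring (not the paper's fitted model).
# Weights and intercept are hypothetical; in practice they are learned.
def predict_default_probability(features, weights, intercept):
    """Return P(default = 1 | x) via the logistic (sigmoid) link."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

p = predict_default_probability([1.0, 0.5], weights=[0.8, -1.2], intercept=-0.3)
print(round(p, 3))  # -> 0.475
```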

K NEAREST NEIGHBOR

K nearest neighbor (KNN) is one of the most widely applied credit scoring techniques. It belongs to the category of non-parametric classification methods. It is well known that non-parametric classifiers are often sensitive to outliers, especially when the training sample size is small. Numerous credit scoring researchers have utilized KNN to evaluate the risk involved in making a loan to a business or a person (Mukid et al. 2018). The Euclidean distance between the given training samples and the test sample is a common foundation for the k-nearest neighbor classifier. The primary principle of the k-NN method is that the training data is used to select the k nearest neighbors of each new point that needs to be predicted. The average of the values of the new point's k closest neighbors can then be used as a prediction (Zhang & Wang 2016).

DECISION TREES

Decision trees are now frequently used to fit data, anticipate default, and improve credit rating. Decision tree algorithms work top-down, selecting the variable that divides the dataset "best" at each stage. Any of a number of criteria, such as the Gini index, information value, or entropy, can be used to determine what is "best." To predict an outcome, follow the tree's branches from the starting (root) node to a leaf node; the final leaf node contains the answer. Classification trees deliver nominal responses, such as "true" or "false," while regression trees produce numerical results (Bastos, 2007).

SUPPORT VECTOR MACHINE

Another effective machine-learning method used in classification and credit scoring problems is the SVM. Owing to its excellent results, it is widely employed in the field of credit scoring as well as other areas. SVMs take the form of a linear classifier: given inputs from two classes, an SVM predicts which of the two classes the output is most likely to belong to. Binary classification is accomplished by constructing the best hyperplane (line) that divides the input data into two groups (good and bad credit).

SVM can be used in both linear and non-linear separation settings. The latter uses a basis expansion h(x) to construct a linear boundary in an extended, transformed version of the feature space, which corresponds to a non-linear boundary in the original space. It is necessary to understand how the kernel function K computes the inner products of vectors in the transformed space using the original space X as input (Dastile et al. 2020).

Fitting linear classifiers and regressors with convex loss functions, such as those of (linear) Support Vector Machines and Logistic Regression, is straightforward with stochastic gradient descent (SGD). Text categorization and NLP are two areas where SGD has been successfully applied to tackle sparse and massive machine learning challenges. SGD does not belong to a particular family of machine learning models; it is merely an optimization technique whose only purpose here is to train a model. A straightforward stochastic gradient descent learning procedure that supports various classification penalties and loss functions is implemented by the class SGDClassifier (Condori-Alejo et al. 2021).
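A toy sketch of stochastic gradient descent fitting a linear classifier under the logistic loss, in the spirit of the SGD procedure described above. The learning rate, epoch count, and toy data are all illustrative assumptions, not the study's configuration.

```python
import math
import random

# Toy SGD on the logistic loss for a linear classifier (illustrative only).
def sgd_logistic(samples, labels, lr=0.5, epochs=200, seed=0):
    """Learn weights w and bias b for P(y=1|x) = sigmoid(w.x + b)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(samples[0]), 0.0
    idx = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(idx)                     # visit examples in random order
        for i in idx:
            z = b + sum(wj * xj for wj, xj in zip(w, samples[i]))
            z = max(min(z, 30.0), -30.0)     # numerically safe sigmoid
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - labels[i]                # d(logistic loss)/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, samples[i])]
            b -= lr * g
    return w, b

# Linearly separable toy data: class 1 when the single feature is large.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
w, b = sgd_logistic(X, y)
preds = [1 if b + w[0] * x[0] > 0 else 0 for x in X]
print(preds)
```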

RANDOM FOREST CLASSIFICATION

Random forest is a machine learning technique for dealing with classification and regression problems. It employs ensemble learning, a technique for resolving challenging problems by integrating a variety of classifiers: it combines several decision trees built on various subsets of the dataset and aggregates their outcomes to enhance predictive accuracy. One potential benefit of these techniques is that they may allow model builders to significantly shorten the time spent on data management and data pre-processing (Tang et al. 2019).

METHODOLOGY

The various steps of our process are depicted in Figure 1: selecting data sets, cleaning and organizing information, developing models, and checking those models are all parts of the process described here. This study proposes methods to classify borrowers according to their payment risk in order to help banks avoid the real risks associated with non-payment, which are burdensome to banks. By focusing solely on trustworthy borrowers, banks may boost their profitability.
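The bagging idea behind the random forest described in this section can be sketched as weak learners fit on bootstrap resamples and combined by majority vote. This is a deliberately simplified illustration, not the study's random forest configuration: real forests grow full decision trees with random feature subsets, while each "tree" here is a one-feature threshold stump.

```python
import random
from collections import Counter

# Toy bagging: fit stumps on bootstrap resamples, combine by majority vote.
def fit_stump(sample):
    """Pick the (threshold, polarity) stump with fewest errors on the sample."""
    best, best_err = (0.0, 1), len(sample) + 1
    for (xi,), _ in sample:
        for polarity in (0, 1):
            err = sum(1 for (xj,), yj in sample
                      if (polarity if xj >= xi else 1 - polarity) != yj)
            if err < best_err:
                best, best_err = (xi, polarity), err
    return best

def predict_stump(stump, x):
    threshold, polarity = stump
    return polarity if x[0] >= threshold else 1 - polarity

def bagged_predict(X_train, y_train, x, n_trees=25, seed=0):
    rng = random.Random(seed)
    data = list(zip(X_train, y_train))
    stumps = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]   # bootstrap resample
        stumps.append(fit_stump(boot))
    votes = [predict_stump(s, x) for s in stumps]
    return Counter(votes).most_common(1)[0][0]    # majority vote

X = [[0.1], [0.3], [0.7], [0.9]]   # toy single-feature data
y = [0, 0, 1, 1]                   # toy labels (1 = default)
print(bagged_predict(X, y, [0.95]))
```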

FIGURE 1. The procedure of credit card classification

THE DATASET

The dataset chosen for this study includes anonymized information on 30,000 Taiwanese credit card customers from 2005, with customer attributes as explanatory data (Yeh & Lien, 2009). The dataset includes characteristics of the clients, including whether or not they were in default on their obligations. Of the 30,000 debtors, 23,364 paid back their loans, while 6,636 missed payments; around 78 percent of the dataset's debtors are good debtors, and 22 percent are bad debtors. In this study, the outcome variable was a binary variable called default payment (Yes = 1, No = 0). As shown in Table 1, the data are divided into 23 columns with various numerical values, and categorical information such as education is also encoded as a numerical value.

TABLE 1. The Data Attributes

Attribute            Description
ID                   ID of each client
SEX                  Gender (1 = male; 2 = female)
EDUCATION            1 = graduate school; 2 = university; 3 = high school; 4 = others
MARRIAGE             Marital status (1 = married; 2 = single; 3 = others)
AGE                  Age in years
PAY_1 - PAY_6        History of past payment. The past monthly payment records (from April to
                     September 2005) were tracked as follows: X6 = the repayment status in
                     September 2005; X7 = the repayment status in August 2005; ...; X11 = the
                     repayment status in April 2005. The measurement scale for the repayment
                     status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay
                     for two months; ...; 8 = payment delay for eight months; 9 = payment delay
                     for nine months and above.
BILL_AMT1 -          Amount of bill statement (NT dollar). X12 = amount of bill statement in
BILL_AMT6            September 2005; X13 = amount of bill statement in August 2005; ...;
                     X17 = amount of bill statement in April 2005.
PAY_AMT1 -           Amount of previous payment (NT dollar). X18 = amount paid in September 2005;
PAY_AMT6             X19 = amount paid in August 2005; ...; X23 = amount paid in April 2005.
DEFAULT              Default payment (1 = yes, 0 = no).
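The study's preprocessing later bins the age attribute into ten-year groups and maps the account limit balance to Low/Medium/High bands. A sketch of that encoding, with the band edges quoted in the text (the function names are illustrative):

```python
# A sketch of the discretization the study describes: Age grouped into
# ten-year bins, limit balance (NT dollars) mapped to Low/Medium/High.
def age_bin(age: int) -> str:
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def limit_band(limit_bal: int) -> str:
    if limit_bal <= 100_000:
        return "Low"
    if limit_bal <= 500_000:
        return "Medium"
    return "High"

print(age_bin(34), limit_band(120_000))  # -> 30-39 Medium
```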

The magnitude of values varies between features because some fields contain information about the account limit and payment details. All features were normalized to lessen the effect of this on the outcomes.

DATA PREPROCESSING

Data mining relies on several preparatory steps, one of which is preprocessing. Some features may be redundant, while others are noisy and unimportant; Feature Selection (FS) is a technique for identifying the most helpful features. Preprocessing of the dataset is necessary when it contains useless data that is noisy (outliers), unreliable (missing), or inconsistent. Any extraneous and correlated data were eliminated from the dataset in order to increase model accuracy and obtain useful results. We then applied data cleaning, discretization, and target class balancing to our data to produce a dataset that would work well with our algorithms.

DATA CLEANING

The methods employed to address missing data involve excluding the affected records or imputing them with a predetermined value. In the case of noisy data, many techniques can be utilized, including binning algorithms, clustering, a combination of human and machine inspection, and regression analysis. Inconsistencies can be fixed manually. Some numbers in the dataset have no official meaning on the UCI site; for example, the Education attribute's values should range from one to four, but 331 records had values greater than four.

FEATURES SELECTION

Feature selection is a method to lessen dimensionality. Its primary use is the extraction of discrete subsets of pertinent features from the original dataset according to an assessment criterion.

DATA TRANSFORMATION

Data transformation is the process of converting data from one format to another so that it can be used for data mining. A few transformation techniques include normalization, smoothing, aggregation, and generalization. The values in the dataset were all expressed as numbers in the records; for instance, categorical information such as sex was encoded as a "1" for male and a "2" for female. That was problematic because bare numeric codes make the data harder to interpret; therefore, we needed to alter some columns to make them better suited for analyzing the outcome. We converted these attributes (sex, education, and marital status) to their string representations.

DATA REDUCTION

Analyzing enormous amounts of data requires a lot of time. Data reduction can be accomplished using data cube aggregation, data compression, dimension reduction, concept hierarchies, and discretization. Because a group of academics concluded that discretization enhances the efficiency of the naive Bayesian algorithm (Lustgarten, 2008), we discretized the continuous variables in our dataset: the Age attribute was split into ten-year bins, and the Limit bal attribute was reclassified from its numeric ranges ((0-100,000), (100,001-500,000), and over 500,001 New Taiwan dollars) to the labels Low, Medium, and High. Our next preprocessing step was data reduction, where we shrank the dataset to obtain two equally represented classes, default and no default. From 30,000 records, we reduced the data to 13,210 records, a balanced 50/50 split with 6,605 records for each class. To improve performance, we eliminated all redundant information from our dataset; as a result, there are now only six attributes instead of 24.

After that, a 70/30 split was performed randomly across the full data set to create a training set and a test set. Sampling methods such as SMOTE, kNN-based approaches, and Tomek links can be used on imbalanced datasets like this one; however, the study relied on a relatively simple random over-sampling method for the response variable due to technical limitations and time constraints. Model performance was evaluated on the test data set after the models were trained on the training set.

TECHNIQUES FOR CREDIT-SCORING EVALUATION

The following metrics are the most widely used of the many assessment metrics found in the literature for evaluating the effectiveness of credit-scoring models (Dastile et al. 2020). A confusion matrix is a common tool for assessing a classifier's performance (see Table 2). All of the instances in the data set are displayed in a confusion matrix and are divided into four groups:

TABLE 2. Confusion Matrix Description

Observed class         Predicted class
                       Class (1) = Good     Class (0) = Bad
Class (1) = Good       True Positive        False Negative
Class (0) = Bad        False Positive       True Negative
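The four cells of Table 2 can be counted directly from paired actual and predicted labels. A minimal sketch, using the convention that 1 is the positive (default) class; the label lists are illustrative:

```python
# Counting Table 2's four cells from labels (1 = positive/default class).
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

actual    = [1, 1, 0, 0, 1, 0]   # illustrative labels
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))  # -> (2, 2, 1, 1)
```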

TP (True Positive): These are the positive findings that the model correctly predicted based on the actual data; in our case, the model successfully predicted defaults that actually occurred in the data.

TN (True Negative): These are the negative values that the model correctly predicted based on the actual data; in our case, the model correctly identified non-defaults that were non-defaults in the real data.

FP (False Positive): These are the positive values the model mistakenly predicted; in our case, observations the model projected as defaults that were not defaults in the real data.

FN (False Negative): These are the negative values that the model mistakenly predicted; in our case, observations the model projected as non-defaults that were actually defaults in the real data.

The consequences of misclassification in the context of credit rating are very different. False positives cause the lender to lose some or all of the interest and the principal that was supposed to be repaid. In contrast, false negatives refer only to the opportunity cost of the interest that could have been earned. Because these individuals were given a loan after the model classified them as good, false positives are substantially more expensive (Bunker et al. 2016; Nyangena 2019).

The proportion of actual goods that were accurately identified as such is known as the true positive rate:

TPR = TP / (TP + FN)    (1)

Similar to the true positive rate, the true negative rate is the percentage of actual bads that were accurately categorized as such:

TNR = TN / (TN + FP)    (2)

One of the most often used metrics in the field of accounting and finance, specifically for applications involving credit rating, is the Percentage Correctly Classified (PCC). The PCC rate calculates the percentage of cases in a given data set that are correctly classified as having good or bad credit, and it is an important factor to consider while assessing the proposed scoring models' capacity for classification.

PCC = (TP + TN) / (TP + TN + FP + FN)    (3)

To assess the accuracy of the model's predictions of loan default, the Receiver Operating Characteristic (ROC) curve, a standard classification statistic, is used. The probability of binary outcomes, which in our situation are default and non-default, can be evaluated using the ROC curve. The ROC curve compares the rate of false positives to the rate of true positives (Osei et al. 2021).

The advantage of the ROC curve is that different thresholds or modeling techniques can be compared using measurements of the area under the curve (AUC), with a greater AUC signifying a better model. Movement along a curve indicates a change in the threshold used to classify a positive instance, and each line on the plot represents the curve for a single model. The threshold is 0 at the upper right and 1 at the lower left. The AUC, which ranges from 0 to 1 with a good model scoring higher, is the area under the ROC curve; a ROC AUC of 0.5 results from a model making random predictions. Since Sensitivity and (1 - Specificity) are plotted, the ROC curve is effectively a plot of the True Positive Rate against the False Positive Rate.

MODEL RESULT ANALYSIS AND DISCUSSION

The model performance was assessed using the metrics presented in the previous section. It might be difficult and subjective to agree on a single criterion for evaluating performance, depending on the nature of the activity at hand. The quality of a classification model can be evaluated via inspection of the confusion matrix.

A 70/30 split was performed randomly across the entire data set to create a training and a test set. Sampling methods such as Tomek links, kNN-based approaches, and SMOTE can be used on imbalanced datasets like the one used in this research; the study, however, opted to employ a simple random over-sampling strategy for the response variable due to time and technical limitations. The test data set was used to evaluate how well the models performed after being trained on the training set.

A confusion matrix is a table that compares the model's predicted classes with the actual classes from the labeled data within the validation set. The confusion matrices for each of our classifiers are shown in Figure 2.

The ideal evaluation model would actually be a profit function, a function of recall and precision that would need to be optimized. The trade-off between the TP (profit) and the FP (cost), both of which are captured by the F-measure or the precision-recall AUC, would be used to estimate the profit. The F-measure, however, is the metric picked for this study's evaluation of the models. The model with the best F-measure can then be modified and verified in an effort to produce improved evaluation metrics and predictions.
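Equations (1)-(3) can be computed directly from the four confusion-matrix counts. A short sketch, with illustrative counts rather than the paper's results:

```python
# The evaluation metrics of equations (1)-(3), from confusion-matrix counts.
def tpr(tp, fn):            # equation (1): true positive rate
    return tp / (tp + fn)

def tnr(tn, fp):            # equation (2): true negative rate
    return tn / (tn + fp)

def pcc(tp, tn, fp, fn):    # equation (3): percentage correctly classified
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts, not the paper's results.
tp_, tn_, fp_, fn_ = 80, 70, 30, 20
print(tpr(tp_, fn_), tnr(tn_, fp_), pcc(tp_, tn_, fp_, fn_))  # -> 0.8 0.7 0.75
```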

FIGURE 2. The Confusion Matrix for Different Classifier Models

TABLE 3. The Summary of Results of Each Model

Model                          Accuracy   Recall   Precision   ROC-AUC
Logistic Regression            61%        48%      60%         64%
K Neighbours Classifier        70%        78%      65%         78%
SVM Classifier                 55%        47%      64%         57%
Decision Tree Classifier       88%        95%      81%         86%
Random Forest Classifier       93%        95%      91%         97%
Ada Boost Classifier           71%        57%      75%         78%
Gradient Boosting Classifier   73%        60%      76%         79%

The results in Table 3 demonstrate how well the models performed on these measures after the best model for each class was chosen. Using the results table as a guide, we examined the range of values of the measures employed to evaluate our models. The Random Forest model, followed by the Decision Tree, has the best accuracy score, while the SVM Classifier has the lowest. The Random Forest and Decision Tree both have the greatest recall scores, whereas Logistic Regression has the lowest. Likewise, the Random Forest has the highest precision score, followed by the Decision Tree, with Logistic Regression the lowest.

Figure 3 shows the ROC curve of each model. The ROC curve for Random Forest has a convex shape,

which is indicative of reduced rates of false negative and false positive errors compared to the other curves. In other words, for each given value of sensitivity and specificity, Random Forest performs optimally. In addition, there is little to no difference in performance between Ada Boost, K neighbors, and Gradient Boosting. Random Forest and Decision Tree are the most promising alternatives, yet it is impossible to discern a preferred individual technique from the curves alone. The skewed ROC curves observed for Logistic Regression, K neighbors, SVM, Ada Boost, and Gradient Boosting indicate that increased specificity came at the cost of markedly reduced sensitivity.

FIGURE 3. The ROC Curves of the Classification Models

Choosing the kind of error that the Bank can tolerate while dealing with credit risk is essential. False positives will force us to turn away consumers who would otherwise be profitable clients, because the models incorrectly labeled them as the Bank's worst customers; this is what the Precision score determines. False negatives increase the Bank's risk by rating consumers as low-risk when, in reality, they would be more likely to default and cause losses for the business; this is captured by the Recall score.

Figure 4 displays the Precision-Recall (PR) curves for each model; the AUC values for each model are all greater than 0.5, indicating that the models performed well.

FIGURE 4. The PR (AUC) of Different Classification Models

CONCLUSION AND FUTURE WORKS

Since identifying which customers pose a high risk is more like a sliding scale than a straightforward binary decision, the present study demonstrates the challenges of credit risk modeling. The outcomes have shown that the challenging aspect of binary classification is defining a boundary, and the imbalanced nature of the dataset further exacerbated the difficulty of the algorithms' learning.

For the data used in this study, six data mining classification techniques were applied: K neighbors, SVM, Decision Tree, Random Forest, Ada Boost, and Gradient Boosting. Machine learning models outperformed logistic regression, and the best machine learning model for credit risk estimation was Random Forest. The random forest method can handle unbalanced data with hundreds of variables, and it automatically balances data sets with rare classes. The logistic regression model performed poorly, and the Decision Tree model came in second. In this study, models were considered efficient if they could maximize revenue while minimizing the opportunity cost of false positives and false negatives, so as to maximize a company's profitability. The F-measure, the harmonic mean of Precision and Recall, together with the ROC curve score, was used to do this. In terms of Accuracy (the proportion of properly classified observations), Random Forest performed well, scoring 93%; however, Accuracy was not used as the main performance indicator, since it does not consider the cost of misclassification reflected in the false positives and negatives.

The study concludes that machine learning models are more effective at estimating credit risk when dealing with unbalanced information, such as credit data sets. However, the models might have performed even better had the dataset's number of features been higher. Additionally, more advanced sampling strategies like SMOTE may have helped to balance the data set and boost performance. The findings are therefore not restricted to any one particular bank and may be applied more broadly, for example to forecasting early instances of corporate insolvency.

Future work should investigate various dataset pre-processing techniques, such as feature selection or data-filtering techniques, and ascertain their potential effects on the outcomes. It should also try a filtering-condensing strategy rather than pure filtering, removing both outlier items and non-informative entries that could negatively affect the training process.

ACKNOWLEDGEMENT

This research has received no external funding.

DECLARATION OF COMPETING INTEREST

None.

REFERENCES

Ala'raj, M., & Abbod, M. 2015. A systematic credit scoring model based on heterogeneous classifier ensembles. 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA): 1-7.
Anderson, R. 2007. The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. Oxford University Press.
Aniceto, M. C., Barboza, F., & Kimura, H. 2020. Machine learning predictivity applied to consumer creditworthiness. Future Business Journal 6(1): 1-14.
Bastos, J. 2007. Credit scoring with boosted decision trees.
Boughaci, D., & Alkhawaldeh, A. A. 2018. Three local search-based methods for feature selection in credit scoring. Vietnam Journal of Computer Science 5(2): 107-121.
Bunker, R. P., Zhang, W., & Naeem, M. A. 2016. Improving a credit scoring model by incorporating bank statement derived features. ArXiv Preprint ArXiv:1611.00252.

Condori-Alejo, H. I., Aceituno-Rojo, M. R., & Alzamora, G. S. 2021. Rural micro credit assessment using machine learning in a Peruvian microfinance institution. Procedia Computer Science 187: 408-413.
Dastile, X., Celik, T., & Potsane, M. 2020. Statistical and machine learning models in credit scoring: A systematic literature survey. Applied Soft Computing 91: 106263.
Durand, D. 1941. Risk Elements in Consumer Installment Financing. National Bureau of Economic Research, New York.
Leo, M., Sharma, S., & Maddulety, K. 2019. Machine learning in banking risk management: A literature review. Risks 7(1): 29.
Moradi, S., & Mokhatab Rafiei, F. 2019. A dynamic credit risk assessment model with data mining techniques: Evidence from Iranian banks. Financial Innovation 5(1): 1-27.
Mukid, M. A., Widiharih, T., Rusgiyono, A., & Prahutama, A. 2018. Credit scoring analysis using weighted k nearest neighbor. Journal of Physics: Conference Series 1025(1): 012114.
Nyangena, B. O. 2019. Consumer Credit Risk Modelling Using Machine Learning Algorithms: A Comparative Approach. Strathmore University.
Osei, S., Mpinda, B. N., Sadefo-Kamdem, J., & Fadugba, J. 2021. Accuracies of some learning or scoring models for credit risk measurement.
Ozgur, O., Karagol, E. T., & Ozbugday, F. C. 2021. Machine learning approach to drivers of bank lending: Evidence from an emerging economy. Financial Innovation 7(1): 1-29.
Pławiak, P., Abdar, M., Pławiak, J., Makarenkov, V., & Acharya, U. R. 2020. DGHNL: A new deep genetic hierarchical network of learners for prediction of credit scoring. Information Sciences 516: 401-418.
Tang, L., Cai, F., & Ouyang, Y. 2019. Applying a nonparametric random forest algorithm to assess the credit risk of the energy industry in China. Technological Forecasting and Social Change 144: 563-572.
Yeh, I.-C., & Lien, C. 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36(2): 2473-2480.
Zhang, Y., & Wang, J. 2016. K-nearest neighbors and a kernel density estimator for GEFCom2014 probabilistic wind power forecasting. International Journal of Forecasting 32(3): 1074-1080.
Zhou, X., Cheng, S., Zhu, M., Guo, C., Zhou, S., Xu, P., Xue, Z., & Zhang, W. 2018. A state of the art survey of data mining-based fraud detection and credit scoring. MATEC Web of Conferences 189: 03002.
