Credit Risk Analysis Applying Logistic Regression, Neural Networks and Genetic Algorithms Models
International Journal of Advanced Engineering Research and Science (IJAERS)
Peer-Reviewed Journal
ISSN: 2349-6495(P) | 2456-1908(O)
Vol-8, Issue-9; Sep, 2021
Journal Home Page Available: https://ijaers.com/
Article DOI: https://dx.doi.org/10.22161/ijaers.89.20
Received: 14 Aug 2021; Received in revised form: 15 Sep 2021; Accepted: 22 Sep 2021; Available online: 30 Sep 2021
©2021 The Author(s). Published by AI Publication. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).

Abstract—Most large Brazilian institutions working with credit concession use credit models to evaluate the risk of consumer loans. Any improvement in the techniques that may bring about greater precision of a prediction model will provide financial returns to the institution. The first phase of this study introduces concepts of credit and risk. Subsequently, with a sample set of applicants from a large Brazilian financial institution, three credit scoring models are built applying these distinct techniques: Logistic Regression, Neural Networks and Genetic Algorithms. Finally, the quality and performance of these models are evaluated and compared to identify the best. Results obtained by the logistic regression and neural network models are good and very similar, although the first is slightly better. Results obtained with the genetic algorithm model are also good, but somewhat inferior. This study shows the procedures to be adopted by a financial institution to identify the best credit model to evaluate the risk of consumer loans. Use of the best fitted model will favor the definition of an adequate business strategy thereby increasing profits.

Keywords—credit risk, credit scoring models, genetic algorithms, logistic regression, neural networks.
More extensive use of the models in the sixties transformed business in the American market (Thomas, 2000). Not only companies in the financial area but also the large retailers began to use credit scoring models to carry out credit sales to their consumers. Retailers such as Wards, Bloomingdale's and J.C. Penney were some of the pioneers in this segment.

In Brazil the background is shorter. Financial institutions started to make intensive use of credit scoring models only in the mid-nineties.

There are some steps to be followed to construct a credit scoring model, such as:

1. Survey of a historical background of the clients
The basic supposition to construct a model of credit evaluation is that the clients have the same behavior pattern over time; therefore models are constructed based upon past information. The availability and quality of the database are fundamental for the success of the model (Jain et al., 2020).

2. Classification of clients according to their behavior pattern and definition of the dependent variable
In addition to good and bad clients there are also the excluded clients, those who have peculiar characteristics and should not be considered (for instance, workers in the institution), and the indeterminate clients, those on the threshold of being good or bad, still without a clear position about them. In practice, institutions consider only the good and bad clients to build the model because it is much easier to work with binary response models. This tendency to work only with good and bad clients is also noticed in academic works (Amaral & Iquiapaza, 2020; Gonçalves et al., 2013; Locatelli et al., 2015; Ríha, 2016).

3. Selection of a random sample representative of the historical background
It is important that the samples of good and bad clients have the same size so as to avoid any possible bias due to size difference. There is no fixed number for the sample; however, Lewis (1992) suggests a sample of 1,500 good clients and 1,500 bad clients to achieve robust results. Habitually three samples are used: one for building the model, another for validation of the model and a third to test the model.

4. Descriptive analysis and preparation of data
This consists of analyzing, according to statistical criteria, each variable that will be utilized in the model.

5. Choice and application of techniques to be used in the construction of the model
Logistic Regression, Neural Networks and Genetic Algorithms will be used in this work. Hand & Henley (1997) further stress Discriminant Analysis, Linear Regression and Decision Trees as methods that can be used in practice. There is no method that is clearly better than the others; everything depends upon how the elected technique fits the data.

6. Definition of the comparison criteria of the models
Measurements for the comparison of the models are defined here, normally the rate of hits and the Kolmogorov-Smirnov (KS) statistic.

7. Selection and implementation of the best model
The best model is chosen using the previously defined criteria. As such, the implementation of the model must be programmed. The institution must adjust its systems to receive the final algorithm and program its utilization in coordination with the other areas involved.

III. METHODOLOGICAL PROCEDURES
3.1 Description of the Study
A financial institution wishes to grant loans to its clients and therefore requires a tool to assess the level of risk associated with each loan, to support the decision-making process. To set up this project, information on the history of the clients that contracted personal credit was made available.

The product under study is personal credit, a rapid and practical consumer credit operation. The purpose of the loan does not need to be stated, and the loan is extended according to the applicant's credit scoring. Another characteristic of the product in question is that no goods are required as a guarantee of payment. The modality with pre-fixed interest rates and loan terms ranging from 1 to 12 months was the focus of this study.

3.2 The Data
To carry out this study a random selection was made in a universe of clients of the bank: 10,000 credit contracts considered good and 10,000 considered bad. All these contracts had already matured, that is to say, the sample was collected after the due date of the last installment of all contracts. This is a historical database with monthly information on the utilization of the product. Based upon this structure, the progress of each contract could be followed and flagged when the client did not pay one or more installments.

In this work, the sample is divided into three sub-samples coming from the same universe of interest: one for the construction of the model, another for its validation and a third for testing.
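For illustration, a split of this kind can be sketched as below; the file name and label column are assumptions, while the sub-sample sizes (8,000 for training and 6,000 each for validation and test) follow the figures reported later in the paper.

```python
# Sketch: splitting the 20,000 labeled contracts into three sub-samples
# (training, validation, test), stratified so good and bad clients stay balanced.
# The data frame and its 'good_client' label column are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

contracts = pd.read_csv("contracts.csv")
train, rest = train_test_split(contracts, train_size=8000,
                               stratify=contracts["good_client"], random_state=0)
validation, test = train_test_split(rest, train_size=6000,
                                    stratify=rest["good_client"], random_state=0)
print(len(train), len(validation), len(test))  # 8000 6000 6000
```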
The model of Logistic Regression is a particular case of the Generalized Linear Models (Lopes et al., 2017). The function which characterizes the model is given by (Ye & Bellotti, 2019):

$\ln\left(\frac{p(X)}{1 - p(X)}\right) = \beta'X = Z$

$\beta' = (\beta_0, \beta_1, \beta_2, \ldots, \beta_n)$: vector of the parameters associated with the variables;
$p(X) = E(Y = 1 \mid X)$: probability of the individual being classified as good, given the vector X.

This probability is expressed by (Gonçalves et al., 2013):

$p(X) = \frac{e^{\beta'X}}{1 + e^{\beta'X}} = \frac{1}{1 + e^{-\beta'X}}$
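As a hedged illustration of fitting a model of this form, the sketch below uses scikit-learn's LogisticRegression on hypothetical applicant data; the file name and explanatory variables are placeholders, not the variables actually selected in this study.

```python
# Illustrative sketch only: fits a logistic regression credit scoring model
# of the form ln(p/(1-p)) = B'X on hypothetical applicant data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("contracts.csv")                       # hypothetical file
X = data[["income", "age", "time_at_job", "has_home_phone"]]  # placeholder variables
y = data["good_client"]                                   # 1 = good, 0 = bad

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# model.coef_ plays the role of the parameter vector B; predict_proba gives p(X).
scores = model.predict_proba(X_test)[:, 1]
print("estimated coefficients:", model.coef_)
print("hit rate on test sample:", model.score(X_test, y_test))
```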
It was only in the eighties that, because of the greater computational power, neural networks were widely studied and applied. Rojas (1996) underlines the development of the backpropagation algorithm as the turning point for the popularity of neural networks.

An artificial neural network model processes certain characteristics and produces replies like those of the human brain. Artificial neural networks are developed using mathematical models in which the following suppositions are made (Rojas, 1996):
1. Processing of information takes place within the so-called neurons;
2. Stimuli are transmitted by the neurons through connections;
3. Each connection is associated with a weight which, in a standard neural network, multiplies the stimulus it receives;
4. Each neuron applies an activation function (in general not linear) to determine the output stimulus (response of the network).

The pioneer model by McCulloch and Pitts (McCulloch & Pitts, 1943) for one processing unit (neuron) can be summarized as:
• Signals are presented upon input;
• Each signal is multiplied by a weight that indicates its influence on the output of the unit;
• The weighted sum of the signals is made, producing a level of activity;
• If this level exceeds a limit, the unit produces an output.

There are input signals $X_1, X_2, \ldots, X_p$ and corresponding weights $W_1, W_2, \ldots, W_p$, the limit being $k$. In this model the level of activity is given by:

$a = \sum_{i=1}^{p} W_i X_i$

And the output is given by:

$y = 1$, if $a \geq k$
$y = 0$, if $a < k$
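A minimal sketch of this threshold unit, with arbitrary example weights and limit $k$ (not values from the study), is:

```python
# Minimal sketch of a McCulloch-Pitts threshold unit: weighted sum + hard limit.
from typing import Sequence

def mcculloch_pitts(signals: Sequence[float], weights: Sequence[float], k: float) -> int:
    """Return 1 if the weighted sum of the input signals reaches the limit k, else 0."""
    activity = sum(w * x for w, x in zip(weights, signals))  # a = sum(Wi * Xi)
    return 1 if activity >= k else 0

# Arbitrary example values (not taken from the paper).
print(mcculloch_pitts(signals=[1.0, 0.0, 1.0], weights=[0.5, 0.2, 0.4], k=0.8))  # -> 1
print(mcculloch_pitts(signals=[1.0, 0.0, 0.0], weights=[0.5, 0.2, 0.4], k=0.8))  # -> 0
```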
Three characteristics must be taken into account in the definition of a neural network model: the form of the network, called architecture; the method for determination of the weights, called the learning algorithm; and the activation function.

Architecture relates to the format of the network. Every network is divided into layers, usually classified into three groups (Akkoç, 2012):
• Input layer, where the patterns are presented to the network;
• Intermediate or hidden layers, in which the major part of the processing takes place by means of the weighted connections; they may be viewed as extractors of characteristics;
• Output layer, in which the end result is concluded and presented.

There are basically three main types of architecture: feedforward networks with a single layer, feedforward networks with multiple layers, and recurrent networks.
1. Feedforward networks with a single layer are the simplest networks, in which there is only one input layer and one output layer. Some networks utilizing this architecture are the Hebb network, the perceptron and ADALINE, among others.
2. Multilayered feedforward networks are those having one or more intermediate layers. The multilayer perceptron (MLP), MADALINE and radial basis function networks are some of the networks utilizing this architecture.
3. Recurrent networks: in this type of network, the output layer has at least one connection that feeds back into the network. The networks called BAM (Bidirectional Associative Memory) and ART1 and ART2 (Adaptive Resonance Theory) are recurrent networks.

The most important quality of neural networks is the capacity to "learn" according to the environment and thereby improve their performance (Deiu-merci & Mayou, 2018). There are essentially three types of learning:
1. Supervised learning: the expected reply is indicated to the network. This is the case of this work, where it is already known a priori whether the client is good or bad.
2. Unsupervised learning: the network must rely only on the received stimuli; the network must learn to cluster the stimuli.
3. Reinforcement learning: the behavior of the network is assessed by an external reviewer.

Berry & Linoff (2004) point out the following positive points in the utilization of neural networks:
• They are versatile: neural networks may be used for the solution of different types of problems, such as prediction, clustering or identification of patterns;
• They are able to identify non-linear relationships between variables;
• They are widely utilized and can be found in various software packages.

As for the disadvantages, the authors state:
• Results cannot be explained: no explicit rules are produced; analysis is performed inside the network and only the result is supplied by the "black box";
• The network can converge to a lesser solution: there are no warranties that the network will find the best possible solution; it may converge to a local maximum.
3.7 Genetic Algorithms
The idea of genetic algorithms resembles the evolution of the species proposed by Darwin: the algorithms evolve with the passing of generations, and the candidate solutions to the problem one wants to solve "stay alive" and reproduce (Silva et al., 2019).

The algorithm works on a population represented by chromosomes, which are merely the various possible solutions for the proposed problem. Solutions that are selected to shape new solutions (through cross-over) are chosen according to the fitness of the parent chromosomes. Thus, the fitter the chromosome, the higher its possibility of reproducing. This process is repeated until the halt rule is satisfied, that is to say, until a solution very near the desired one is found.

Every genetic algorithm goes through the following stages:
Start: initially a population is generated, formed by a random set of individuals (chromosomes) that may be viewed as possible solutions for the problem.
Fitness: a fitness function is defined to evaluate the "quality" of each one of the chromosomes.
Selection: according to the results of the fitness function, a percentage of the best fit is maintained while the others are rejected (Darwinism).
Cross-over: two parents are chosen and, based upon them, an offspring is generated according to a specific cross-over criterion. The same criterion is used with another chromosome, and the material of both chromosomes is exchanged. If there is no cross-over, the offspring is an exact copy of the parents.
Mutation: an alteration in one of the genes of the chromosome. The purpose of mutation is to avoid the population converging to a local maximum; should this convergence take place, mutation ensures that the population will jump away from the local point, endeavoring to reach other maximum points.
Verification of the halt criterion: once a new generation is created, the halt criterion is verified; should it not have been met, one returns to the fitness stage.

The following positive points in the utilization of genetic algorithms must be highlighted:
• Contrary to neural networks, they produce explicable results (Berry & Linoff, 2004);
• Their use is easy (Berry & Linoff, 2004);
• They may work with a large set of data and variables (Fensterstock, 2005).

Some of the disadvantages pointed out in the literature are:
• They continue to be seldom used for problems of credit risk assessment (Fensterstock, 2005);
• They require a major computational effort (Berry & Linoff, 2004);
• They are available in only a few software packages (Berry & Linoff, 2004).
Criteria for Performance Evaluation
To evaluate the performance of the models two samples were selected, one for validation and the other for test. Both were of the same size (3,000 clients considered good and 3,000 considered bad, in each one). In addition to the samples, other criteria are used, which are presented in this section.

3.8 Score of Hits
The score of hits is measured by dividing the total of clients correctly classified by the number of clients included in the model. Similarly, the scores of hits of the good and of the bad clients can be quantified.

In some situations it is much more important to identify a good client than a bad client (or vice versa); in such cases, a heavier weight is often given to the corresponding score of hits and a weighted mean of the scores is calculated. In this work, as there is no a priori information on what would be more attractive for the financial institution (identification of the good or of the bad clients), the product between the score of hits of good clients and of bad clients (Ih) will be used as the indicator of hits to evaluate the quality of the model. This indicator privileges models with high scores of hits for both types of clients. The greater the indicator, the better the model.
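A minimal sketch of this indicator, on toy labels rather than the study's data, is:

```python
# Sketch: hit rates for good and bad clients and the product indicator Ih.
def hit_indicator(y_true, y_pred):
    """y_true/y_pred: 1 = good client, 0 = bad client."""
    good = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    bad = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    hit_good = sum(1 for t, p in good if p == 1) / len(good)  # share of goods correctly classified
    hit_bad = sum(1 for t, p in bad if p == 0) / len(bad)     # share of bads correctly classified
    return hit_good * hit_bad                                 # Ih: product of the two hit rates

# Toy example (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(hit_indicator(y_true, y_pred))  # (2/3) * (2/3) ~ 0.44
```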
3.9 The Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (KS) statistic is the other criterion often used in practice and used in this work (Fonseca et al., 2019; Lin, 2013; Machado, 2015).

The KS test is a non-parametric technique to determine whether two samples were collected from the same population (or from populations with similar distributions) (Jaklič et al., 2018). This test is based on the accumulated distributions of the scores of clients considered good and bad.

To check whether the samples have the same distribution, there are tables to be consulted according to the significance level and the size of the sample (Siegel & Castellan Jr, 2006). In this work, as the samples are large, the tendency is that all models reject the hypothesis of equal distributions. The best model will therefore be the one with the highest value in the test, because this result indicates a larger spread between the good and the bad clients.
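For illustration, the KS statistic between the score distributions of good and bad clients can be computed as below; the scores are simulated stand-ins, not the study's data.

```python
# Sketch: KS statistic between the score distributions of good and bad clients.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_good = rng.normal(0.65, 0.15, 1000)   # simulated scores of good clients
scores_bad = rng.normal(0.45, 0.15, 1000)    # simulated scores of bad clients

# Maximum distance between the two empirical cumulative distributions.
ks_stat, p_value = stats.ks_2samp(scores_good, scores_bad)
print(f"KS = {100 * ks_stat:.1f}")           # market rule of thumb: KS >= 30 is considered good
```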
• The variables commercial telephone and home telephone were recoded in binary form as ownership or not;

[Table fragment: 9264.686   1825.669   28   0.000]

The model with 28 variables disclosed that the reduction of the −2LL measure was statistically significant.

The Hosmer and Lemeshow test considers the statistical hypothesis that the predicted classifications in groups are equal to those observed. Therefore, this is a test of the fitness of the model to the data. The chi-square statistic presented the outcome 3.4307, with eight degrees of freedom and descriptive level equal to 0.9045. This outcome leads to the non-rejection of the null hypothesis of the test, endorsing the model's adherence to the data.
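As a quick check, the descriptive level quoted above can be reproduced from the chi-square distribution:

```python
# Sketch: descriptive level (p-value) of a chi-square statistic of 3.4307
# with 8 degrees of freedom, as reported for the Hosmer and Lemeshow test.
from scipy.stats import chi2

p_value = chi2.sf(3.4307, df=8)   # survival function = 1 - CDF
print(round(p_value, 4))          # approximately 0.9045 -> do not reject the null hypothesis
```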
4.3 Neural Network
In this work, a supervised learning network will be used, as it is known a priori whether the clients in question are good or bad. According to Potts (1998: 44), the most used neural network structure for this type of problem is the multilayer perceptron (MLP), which is a network with a feedforward architecture and multiple layers. Consulted literature (Akkoç, 2012; Deiu-merci & Mayou, 2018; Olson et al., 2012; Ríha, 2016) supports this statement. The MLP network will also be adopted in this work.

MLP networks can be trained using the following algorithms: Conjugate Descending Gradient, Levenberg-Marquardt, Backpropagation, Quickpropagation or Delta-bar-Delta. The most common (Rojas, 1996) is the Backpropagation algorithm, which will be detailed later on.

The implemented model has an input layer of neurons and a single-neuron output layer, which corresponds to the outcome of whether a client is good or bad in the classification of the network. It also has an intermediate layer with three neurons, since this was the network which presented the best outcomes, both in terms of the highest percentage of hits and of the reduction of the mean error. Networks with one, two or four hidden neurons were also tested in this work.

Each neuron of the hidden layer is a processing element that receives n inputs weighted by weights $W_i$. The weighted sum of the inputs is transformed by means of a nonlinear activation function f(.). The activation function used in this study is the logistic function $\frac{1}{1 + e^{-g}}$, where g is the weighted sum of the neuron's inputs.
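A network with this configuration can be sketched with scikit-learn as below; the single hidden layer of three neurons and the logistic activation follow the description above, while the data are simulated and the solver choice is an assumption standing in for the backpropagation training described next.

```python
# Sketch: MLP with one hidden layer of three neurons and logistic activation,
# mirroring the architecture described above (data and dimensions are simulated).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8000, 20))          # stand-in applicant variables (arbitrary dimension)
y_train = rng.integers(0, 2, size=8000)        # 1 = good client, 0 = bad client

X_scaled = StandardScaler().fit_transform(X_train)

mlp = MLPClassifier(hidden_layer_sizes=(3,),   # one intermediate layer, three neurons
                    activation="logistic",     # logistic activation 1 / (1 + exp(-g))
                    solver="sgd",              # gradient-based training akin to backpropagation
                    max_iter=500,
                    random_state=0)
mlp.fit(X_scaled, y_train)
print("training misclassification:", 1 - mlp.score(X_scaled, y_train))
```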
Training of the network consists in finding the set of weights $W_i$ that minimizes an error function. In this work the Backpropagation algorithm is used for training. In this algorithm the network operates in a two-step sequence. First, a pattern is presented to the input layer of the network; the resulting activity flows through the network, layer by layer, until the reply is produced by the output layer. In the second step, the output achieved is compared to the desired output for this particular pattern; if it is not correct, the error is estimated. The error is propagated from the output layer back to the input layer, and the weights of the connections of the units of the inner layers are modified as the error is backpropagated. This procedure is repeated in successive iterations until the halt criterion is reached.

In this model the halt criterion adopted was the mean error on the set of validation data. This error is calculated by means of the module of the difference between the value the network produces and the expected one; its mean over the 8,000 cases (training sample) or the 6,000 cases (validation sample) is estimated. Processing detected that the stability of the model took place after the 94th iteration. In the validation sample the error was somewhat larger (0.62 vs. 0.58), which is common considering that the model is fitted based upon the first sample.

Initially, the misclassification rate is 50%, because the allocation of an individual as a good or bad client is random; with the increase of the iterations, the better result of 30.6% error is reached for the training sample and of 32.3% for the validation sample. Some of the statistics of the adopted network are in Table 2.

Table 2: Neural network statistics

  Obtained statistics               Training   Validation
  Misclassification of cases        0.306      0.323
  Mean error                        0.576      0.619
  Mean square error                 0.197      0.211
  Degrees of freedom of the model   220
  Degrees of freedom of the error   7780
  Total degrees of freedom          8000
Besides the misclassification and the mean error, the mean square error and the degrees of freedom are also presented. The mean square error is calculated as the average of the squares of the differences between the observed values and those obtained from the network.

The number of degrees of freedom of the model is related to the number of estimated weights, through the connection of each of the attributes to the neurons of the intermediate layer and the binding of the intermediate layer to the output.
4.4 Genetic Algorithms
The genetic algorithm was used to find a discriminant equation permitting to score the clients and, later, separate the good from the bad according to the score achieved. The equation scores the clients: those with a higher score are considered good, while the bad are those with a lower score. This route was adopted by Metawa et al. (2017) and Picinini et al. (2003).

The implemented algorithm was similar to that presented in Picinini et al. (2003). Each one of the 71 categories of variables was given an initial random weight. To these seventy-one coefficients one more was added, an additive constant incorporated into the linear equation. The score of a client is given by:

$S_j = \sum_{i=1}^{72} w_i \, p_{ij}$, where

$S_j$ = score obtained by client j;
$w_i$ = weight relating to category i;
$p_{ij}$ = binary indicator equal to 1 if client j has category i, and 0 otherwise.

The following rule was used to define whether the client is good or bad:
If $S_j \geq 0$, the client is considered good;
If $S_j < 0$, the client is considered bad.

As such, the problem the algorithm has to solve is to find the vector $W = [w_1, w_2, \ldots, w_{72}]$ resulting in a classification criterion with a good rate of hits in predicting the performance of payment of the credit.

Following the stages of a genetic algorithm, one has:
Start: a population of 200 individuals was generated, with each chromosome holding 72 genes. The initial weight $w_i$ of each gene was randomly generated in the interval [-1, 1] (Picinini et al., 2003).
Fitness function: each client was associated with an estimated score and classified as good or bad. By comparing with the information already known a priori on the nature of the client, the precision of each chromosome can be calculated. The indicator of hits (Ih) is the fitness function, that is to say, the greater the indicator, the better the chromosome.
Selection: in this work an elitism of 10% was used; for each new generation, the twenty best chromosomes are maintained while the other hundred and eighty are formed by cross-over and mutation.
Cross-over: to choose the parents for cross-over, the method known as roulette wheel was used for selection among the twenty chromosomes that were maintained (Oreski et al., 2012). In this method, each individual is given a probability of being drawn according to its value of the fitness function. For the exchange of genetic material, the method known as uniform cross-over was used (Galvan, 2016). In this type of cross-over, each gene of the offspring chromosome is randomly chosen among the genes of one of the parents, while the second offspring receives the complementary genes of the other parent.
Mutation: in the mutation process, each gene of the chromosome is independently evaluated. Each gene of each chromosome has a 0.5% probability of undergoing mutation. Whenever a gene is chosen for mutation, the genetic alteration is performed by adding a small scalar value k to the gene. In the described experiment a value ranging from -0.05 to +0.05 was randomly drawn.
Verification of the halt criterion: a maximum number of generations equal to 600 was defined as the halt criterion. After six hundred iterations, the fittest chromosome will be the solution.
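A compact sketch of this procedure is shown below; the population size, elitism, roulette-wheel selection, uniform cross-over, mutation rate and generation limit follow the figures quoted above, while the client data are simulated, so this is an illustration rather than the authors' implementation.

```python
# Illustrative sketch of the genetic algorithm described above, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
N_CLIENTS, N_GENES = 500, 72
P = rng.integers(0, 2, size=(N_CLIENTS, N_GENES))        # binary category indicators p_ij (simulated)
y = rng.integers(0, 2, size=N_CLIENTS)                   # 1 = good client, 0 = bad (simulated)

def indicator_of_hits(w):
    """Fitness: product of the hit rates for good and bad clients (Ih)."""
    pred_good = (P @ w) >= 0                              # S_j = sum_i w_i * p_ij; S_j >= 0 -> good
    hit_good = np.mean(pred_good[y == 1])
    hit_bad = np.mean(~pred_good[y == 0])
    return hit_good * hit_bad

POP, ELITE, GENERATIONS, MUTATION_RATE = 200, 20, 600, 0.005
population = rng.uniform(-1, 1, size=(POP, N_GENES))      # initial weights in [-1, 1]

for _ in range(GENERATIONS):
    fitness = np.array([indicator_of_hits(w) for w in population])
    order = np.argsort(fitness)[::-1]
    elite = population[order[:ELITE]]                     # elitism: keep the 20 best chromosomes
    probs = fitness[order[:ELITE]] / fitness[order[:ELITE]].sum()  # roulette-wheel probabilities

    children = []
    while len(children) < POP - ELITE:
        pa, pb = elite[rng.choice(ELITE, size=2, p=probs)]
        mask = rng.integers(0, 2, size=N_GENES).astype(bool)        # uniform cross-over
        child = np.where(mask, pa, pb)
        mutate = rng.random(N_GENES) < MUTATION_RATE                # 0.5% chance per gene
        child = child + mutate * rng.uniform(-0.05, 0.05, N_GENES)  # small additive perturbation
        children.append(child)

    population = np.vstack([elite, np.array(children)])

best = population[np.argmax([indicator_of_hits(w) for w in population])]
print("best Ih:", indicator_of_hits(best))
```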
Results of the algorithm run that had the highest indicator of hits are presented here. After execution of the algorithm, variables with a very small weight were discarded. In the work by Picinini et al. (2003) the authors consider that variables with a weight between -0.15 and 0.15 should be discarded because they do not have a significant weight for the model. In this work, after performing a sensitivity analysis, it was decided that the variables with a weight higher than 0.10 or lower than -0.10 would be considered significant for the model. This rule was not applied to the constant, which proved important for the model even with a value below the cutoff.

4.5 Evaluation of the Models' Performance
After obtaining the models, the three samples were scored and the Ih and KS were calculated for each of the samples.
All presented good classification results because, according to Picinini et al. (2003), "credit scoring models with hit rates above 65% are considered good by specialists".

The hit percentages were very similar in the logistic regression and neural network models and were somewhat lower for the genetic algorithm model. Another interesting result is that, except for the genetic algorithm, the models presented the greatest rate of hits for bad clients, with a rate higher than 70% for bad clients in the three samples of the logistic regression and neural network models.

Table 4 presents the results of the Ih and KS criteria, which were chosen to compare the models.

Table 4: Comparison indexes

  Ih                    Training   Validation   Test
  Logistic regression   47.9       45.1         46.6
  Neural network        47.9       45.3         45.3
  Genetic algorithm     45.7       42.3         44.2

  KS                    Training   Validation   Test
  Logistic regression   38         35           37
  Neural network        39         35           35
  Genetic algorithm     34         30           32
KS values in all models can be considered good. Again, Picinini et al. (2003) explain: "The Kolmogorov-Smirnov test (KS) is used in the financial market as one of the efficiency indicators of the credit scoring models. A model which presents a KS value equal to or higher than 30 is considered good by the market". Here again, the logistic regression and neural network models exhibit very close results, superior to those achieved by the genetic algorithm.

In choosing the model that best fits these data, analyzing according to the Ih and KS indicators, the model built by logistic regression was elected. Although its results were very similar to those achieved by the neural network, this model presented the best results in the test sample, suggesting that it is best fitted for application to other databases. Nevertheless, it must be highlighted that the adoption of any one of the models would bring about good results for the financial institution.

V. CONCLUSION
The objective of this study was to develop credit scoring predictive models based upon data of a large financial institution by using Logistic Regression, Artificial Neural Networks and Genetic Algorithms.

When developing the credit scoring models some care must be taken to guarantee the quality of the model and its