Survival Analysis
Abstract: The aim of this paper is to discuss credit scoring modeling of customer revolving credit based on customer application data and transaction behavior data. Logistic regression, survival analysis, and neural network credit scoring models were developed in order to assess the relative importance of different variables in predicting the default of a customer. Three neural network algorithms were tested: multilayer perceptron, radial basis function, and probabilistic. The radial basis function network model produced the highest average hit rate. The overall results show that the best NN model outperforms the LR model and the survival model. All three models extracted similar sets of variables as important. Working status and the client's delinquency history are the most important features for customer revolving credit scoring on the observed dataset.
Key-Words: credit scoring modeling, logistic regression, revolving credit, survival analysis, neural networks
Proceedings of the 7th WSEAS International Conference on Neural Networks, Cavtat, Croatia, June 12-14, 2006 (pp164-169)
LR as a classical one, survival analysis in order to observe the client's behavior in time, and NN as a nonparametric intelligent method frequently used in data mining to discover nonlinear relationships among variables.

2.1 Logistic regression
The reason for using LR is to find determinants of default in revolving credit in personal open-end accounts, as well as to provide the probability that a client will default in the next 6 months. LR modeling is widely used for analyzing multivariate data involving binary responses, which is what we deal with in credit scoring modeling. Since the likelihood function of mutually independent variables Y_1, ..., Y_n, with outcomes measured on a binary scale, is a member of the exponential family with

\left( \log\frac{\pi_1}{1-\pi_1}, \ldots, \log\frac{\pi_n}{1-\pi_n} \right)   (1)

as a canonical parameter (\pi_j is the probability that Y_j becomes 1), the assumption of the LR model is a linear relationship between the canonical parameter and the vector of explanatory variables x_j (dummy variables for factor levels and measured values of covariates):

\log\frac{\pi_j}{1-\pi_j} = x_j^\tau \beta   (2)

This linear relationship between the logarithm of the odds and the vector of explanatory variables results in a nonlinear relationship between the probability that Y_j equals 1 and the vector of explanatory variables:

\pi_j = \frac{\exp(x_j^\tau \beta)}{1+\exp(x_j^\tau \beta)}   (3)

In order to extract important variables, the forward selection procedure, available in SAS software, is used with standard overall fit measures.

2.2 Survival analysis
The survival-based model is created using the length of time before a loan defaults (let us denote it by T). The main interest in survival modeling is in describing the probability that T is larger than a given time t (the survivor function S(t)). The most interesting object for this is the hazard function, h(t), which can be described by the equation:

h(t)\,\Delta t = \mathrm{Prob}\{t \le T \le t + \Delta t \mid T \ge t\}   (4)

which denotes the probability that a loan defaults in the next instant, conditioned on the fact that it survives at least to time t [7]. In proportional hazard models, it is the hazard function that is modeled using the explanatory variables. The main assumption of a proportional hazard model is that the explanatory variables have a multiplier effect on the hazard rate which does not depend on time t:

h(t) = \exp(x^\tau \beta)\, h_0(t)   (5)

Cox's regression is used in order to find determinants of default in personal open-end accounts, including time to default, and to provide the likelihood of default in the next 6 months. The Cox procedure is used because it enables estimation of the regression parameters without knowing the type of the baseline hazard function h_0(t). As we are not, at this moment, interested in the form of the hazard function, but only in the significant variables, it was sufficient for this purpose [7]. The forward selection procedure, available in SAS software, is used for variable extraction.

2.3 Neural network classifiers
Although many research results show that NNs can solve almost all problems more efficiently than traditional modelling and statistical methods, there are opposite research results showing that statistical methods outperform NNs on particular data samples. The lack of standardized paradigms that could determine the efficiency of certain NN algorithms and architectures in a particular problem domain is emphasized by many authors [11]. Therefore, we test three different NN classifiers: backpropagation, radial basis function network, and probabilistic network. The first two algorithms were tested using hyperbolic tangent functions in the hidden layer and the softmax activation function in the output layer in order to obtain probabilities. The learning rate ranged from 0.01 to 0.08 during the training phase. The momentum was set to 0.3. Overtraining was avoided by a cross-validation procedure which alternately trains and tests the network until the performance of the network on the test sample does not improve for n iterations. After training and cross-validating the network for a maximum of 10000 iterations, all the NN algorithms were tested on the out-of-sample data in order to determine their generalization ability. The probabilistic neural network (PNN) algorithm was chosen due to its fast learning and efficiency. It is a stochastic-based network, developed by Specht [11], which uses nonparametric estimation methods for classification.
The topology of the networks consisted of an input layer, a hidden (or pattern) layer, and an output layer. The number of hidden neurons was optimized by a pruning procedure. The maximum number of hidden units was initially set to 50 in the backpropagation and RBFN models. The number of hidden units in the probabilistic NN was set to the size of the training sample.
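The logistic link of Eq. (3) and the proportional-hazard multiplier of Eq. (5) above can be sketched numerically; this is a minimal illustration with made-up coefficient values, not the models fitted in the paper:

```python
import math

def logistic_default_probability(x, beta):
    """Eq. (3): P(Y_j = 1) = exp(x'b) / (1 + exp(x'b))."""
    score = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x'b
    return math.exp(score) / (1.0 + math.exp(score))

def hazard_multiplier(x, beta):
    """Eq. (5): h(t) = exp(x'b) * h0(t); exp(x'b) scales the
    baseline hazard by the same factor at every time t."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

# Hypothetical client with two covariates and illustrative coefficients.
x = [1.0, 0.5]      # e.g. a job-status dummy and an overdraft-usage value
beta = [0.8, 1.2]   # made-up estimates, not the paper's

p = logistic_default_probability(x, beta)   # probability of default (LR link)
ratio = hazard_multiplier(x, beta)          # relative hazard vs. the baseline
print(round(p, 4), round(ratio, 4))
```

Under the proportional-hazard assumption, `ratio` is the same at every time t, which is what lets Cox's procedure estimate the coefficients without specifying h_0(t).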
The inputs with small fan-out weights were pruned during the learning phase in order to select important variables. Sensitivity analysis is performed on the test sample in order to determine the significance of the selected variables for the output.

3 Review of previous research results
Previous research has shown that common issues in behavior scoring models are delinquency history, payment history, and usage history [12]. Hamilton and Khan [8] found that important variables discriminating between those who defaulted and those who did not are behavior characteristics such as cash advances, minimum payment due, and interest paid in the previous period. Avery et al. [2] emphasize that a failure to consider situational circumstances in credit scoring may influence the accuracy of the model. Their results show that the unemployment rate is positively associated with the estimated likelihood of default, and that there is a higher probability of default for lower-income individuals. Marital status also affects the probability of default. Andreeva et al. [1] analyzed a revolving store card and found that a combination of application, purchase, and behavior characteristics is the most important in predicting time of default. The research of Noh et al. [13] shows that default increases with a higher utilization rate, but decreases with a higher contracted overdraft, a higher number of sales transactions made over the account, the client's age, and her/his relationship with the bank. Dey and Mumy [6] found that the higher the clients' income and age, and the longer the usage of the account, the lower their likelihood to default. Also, less risky clients are those who are employed and have not previously experienced bankruptcy.
Concerning the methodology used in credit scoring modeling, the most frequent method is LR [9]. Survival analysis is also often used in behavior scoring modeling. Banasik et al. [3] compare logistic regression to survival analysis in analysing a personal loan data set. Andreeva et al. [1] showed that survival analysis is competitive with LR and that there is little difference in classification accuracy between parametric models, the non-parametric Cox PH model, and LR. Baesens et al. [4] showed that LR and Cox have the same accuracy in predicting default in the first 12 months, while NNs were more accurate, but not significantly. In predicting loan default between 12 and 24 months, LR had a hit rate of 78.24%, the Cox model 77.50%, and NNs 78.58%, where NNs were significantly better than the Cox model. Malhotra and Malhotra [10] used NNs and LDA to classify consumer loans and obtained an overall mean accuracy of 71.98% by NNs and 69.32% by LDA, where the NNs significantly outperformed LDA.

4 Variables and data
Data was collected randomly in a Croatian bank, covering a period of 12 months in 2004. An observation point is set in the middle of the period, on June 30. The period preceding this point is the performance period, and the characteristics of the performance in this period were used for developing the scoring models. On the basis of its performance in the period of 6 months after the observation point, a client is defined as "good" or "bad" [14]. A client is "bad" if she/he exceeds a contracted overdraft for more than 35 days during the period of 6 months. Otherwise, a client is considered to be "good". A total of 35 input variables that deal with personal open-end accounts is used. The variables can be divided into three main groups: (i) demographic data; (ii) socio-economic data; (iii) behavior data, i.e. repayment and usage (average values for the period of 6 months). Due to the lack of a credit bureau in the country, it was not possible to examine the influence of some other variables that create a credit bureau score.

5 Sampling procedure
The total sample consisted of 44087 cases. The majority of applicants in the sample were good (96.80%), and 3.20% were bad. Using the appropriate cut-off [5], the LR model was estimated on the in-sample data (34879, or approximately 80% of cases), and validated on the out-of-sample data (9208, or approximately 20% of cases). Because of the nature of their objective function, NNs require an equal number of good and bad applicants in the training sample. Therefore, the surplus of good applicants was removed from the in-sample data, and the NN models were trained on an equal number of good and bad applicants, cross-validated in order to optimize the training time and the network topology, and finally tested on the out-of-sample test data. The subsamples were created using a random selection of cases into the train, cross-validation, and test samples, while keeping an equal distribution of good and bad applicants in the train and cross-validation samples. The proportion of good and bad applicants in the validation sample corresponds to the proportion of good and bad applicants in the whole sample. In order to enable the comparison, the same out-of-sample validation data was used to test the LR, Cox, and NN models. The distribution of applicants in the subsamples, separately for the LR, Cox, and NN models, is given in Tables 1 and 2.
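The balancing step of the sampling procedure (dropping the surplus of good applicants so that the NN train and cross-validation samples are 50/50) can be sketched as follows; the class counts mirror Tables 1 and 2, but the sampling code itself is an illustration, not the authors' procedure:

```python
import random

def split_balanced(goods, bads, n_train, n_cv, seed=42):
    """Randomly draw disjoint, balanced train and cross-validation
    samples containing n_train and n_cv cases of each class."""
    rng = random.Random(seed)
    g = rng.sample(goods, n_train + n_cv)  # the surplus goods are dropped
    b = rng.sample(bads, n_train + n_cv)
    train = g[:n_train] + b[:n_train]
    cross_val = g[n_train:] + b[n_train:]
    return train, cross_val

# Population mirroring the in-sample data (Table 1): 33745 goods, 1134 bads.
goods = ["good"] * 33745
bads = ["bad"] * 1134

train, cross_val = split_balanced(goods, bads, 900, 225)
print(len(train), len(cross_val))               # 1800 450
print(train.count("good"), train.count("bad"))  # 900 900
```

The held-out test sample, by contrast, keeps the natural 96.98/3.02 class proportions, so the reported hit rates reflect performance at the population class mix.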
Table 1. Distribution of applicants in the sample - LR and Cox models

Subsample    No. of cases   Good: No.   Good: %   Bad: No.   Bad: %
Estimation   34879          33745       96.75     1134       3.25
Test          9208           8930       96.98      278       3.02
Total        44087          42675       96.80     1412       3.20

Table 2. Distribution of applicants in the sample - NN models

Subsample          No. of cases   Good: No.   Good: %   Bad: No.   Bad: %
Train               1800            900       50.00      900      50.00
Cross-validation     450            225       50.00      225      50.00
Test                9208           8930       96.98      278       3.02
Total              11458          10055       87.76     1403      12.24

6 Results of credit scoring models

6.1 Logistic regression scoring model
The final LR scoring model, using forward selection, ended with 18 variables, and has the following standard overall fit measures: Score=8309.4855 (p<.0001), Wald=2919.9757 (p<.0001).
In order to measure score performance, the ROC curve is used as a plot of the cumulative score distribution of the bad accounts versus the good accounts (Fig. 1).

[Fig. 1: ROC curve of the LR model - cumulative distribution of good applicants plotted against the cumulative distribution of bad applicants]

On the basis of the LR model results it can be seen that default increases: (i) with higher withdrawals by cheques; (ii) with a higher number of days the client's overdraft exceeds the contracted overdraft; (iii) with a higher number of continuous months with the overdraft exceeded; (iv) if the client does not have a permanent job; (v) if the client does not have a second job. The opposite influence on the probability of default has been found for the following variables: (i) a higher amount of contracted overdraft; (ii) higher salaries; (iii) a higher amount of cash paid from the account; (iv) a higher amount of payments to the account from other accounts; (v) a higher balance; (vi) a higher number of days since the last time the client exceeded the contracted overdraft; (vii) an older age of the client and a higher number of years the client has owned the account.

6.2 Survival analysis model results
The final Cox's regression scoring model, using forward selection, ended with 10 input variables. The Cox's regression model has the following standard overall fit measures: Score=8777.4851 (p<.0001), Wald=3407.2117 (p<.0001). The ROC curve (Fig. 2) has also been used as a measure of performance in the same way as in the LR model. Test results of the model obtained on the holdout sample showed an average hit rate of 77.7%, a good hit rate of 75.91%, and a bad hit rate of 79.50%.
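The class-wise hit rates quoted for the holdout sample combine into the average hit rate as their unweighted mean. A minimal sketch follows; the correct-classification counts below are hypothetical values chosen for illustration on a holdout of 8930 good and 278 bad applicants (the Test rows of Tables 1 and 2), not the paper's raw confusion matrix:

```python
def hit_rates(good_correct, good_total, bad_correct, bad_total):
    """Class-wise hit rates (in %) and their unweighted average,
    as reported for model performance on the holdout sample."""
    good_rate = 100.0 * good_correct / good_total
    bad_rate = 100.0 * bad_correct / bad_total
    return good_rate, bad_rate, (good_rate + bad_rate) / 2.0

# Hypothetical counts of correctly classified applicants.
good_hr, bad_hr, avg_hr = hit_rates(6779, 8930, 221, 278)
print(round(good_hr, 2), round(bad_hr, 2), round(avg_hr, 2))
```

With these assumed counts the rates come out close to the Cox holdout results (about 75.91% good, 79.50% bad, 77.7% average); taking the unweighted mean prevents the 96.98% share of good applicants from dominating the performance measure.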
the above that payment history is the most important for credit scoring modeling of open-ended accounts.

7 Conclusion and discussion
The paper aimed to identify important features for customer revolving credit scoring modeling for open-end accounts on a Croatian dataset. A standard LR is used in addition to Cox survival analysis and three NN algorithms in order to classify clients into good or bad ones according to the probability of their default. The most successful NN algorithm was the radial basis function network, producing a higher average hit rate than the LR and Cox models.
The above findings confirm some previous results on consumer behavior in open-end accounts, in the sense that delinquency, payment, and usage history were also found important by other authors. However, it is also shown that working status (having a permanent and a second job) is a specific feature extracted on the examined dataset. This is not surprising given the specific economic conditions behind the observed dataset, such as a high unemployment rate. In order to develop models that will be more sensitive to macroeconomic factors, additional variables such as the unemployment rate, interest rate, stock market, housing market, etc., should be included in the data. Also, more significant results in testing the impact of the economy would be obtained if a time-series approach were taken with more than one data sample. In order to achieve generalization of the results, more datasets from different transition countries should be included. Additional NN models that deal with time series and survival analysis could also be examined.

References:
[5] P. Beling, Z. Covaliu, R.M. Oliver, Optimal Scoring Cutoff Policies and Efficient Frontiers, Journal of the Operational Research Society, Vol. 56, 2005, pp. 1016-1029.
[6] S. Dey, G. Mumy, Determinants of Borrowing Limits on Credit Cards, Bank of Canada Working Paper 2005-7, 2005.
[7] F.E. Harrell, Jr., Regression Modeling Strategies, Springer-Verlag, New York, 2001.
[8] R. Hamilton, M. Khan, Revolving Credit Card Holders: Who Are They and How Can They Be Identified?, The Service Industries Journal, Vol. 21, No. 3, 2001, pp. 37-48.
[9] D.J. Hand, W.E. Henley, Statistical Classification Methods in Consumer Credit Scoring: A Review, Journal of the Royal Statistical Society, Series A, Vol. 160, 1997, pp. 523-541.
[10] R. Malhotra, D.K. Malhotra, Evaluating Consumer Loans Using Neural Networks, Omega, Vol. 31, 2003, pp. 83-96.
[11] T. Masters, Advanced Algorithms for Neural Networks: A C++ Sourcebook, John Wiley & Sons, 1995.
[12] H. McNab, A. Wynn, Principles and Practice of Consumer Credit Risk Management, CIB Publishing, Canterbury, 2000.
[13] H.J. Noh, T.H. Roh, I. Han, Prognostic Personal Credit Risk Model Considering Censored Information, Expert Systems with Applications, Vol. 28, 2005, pp. 753-762.
[14] L.C. Thomas, J. Ho, W.T. Scherer, Time Will Tell: Behavioural Scoring and the Dynamics of Consumer Credit Assessment, IMA Journal of Management Mathematics, Vol. 12, 2001, pp. 89-103.