Interpretability of Neural Networks: a credit card
default model example
Ksenia Ponomareva
Simone Caenazzo
Abstract
Neural networks have risen in popularity for a number of applications, also in quan-
titative finance. However, the low interpretability of their ‘black box’ representation
has always been a common criticism. Previous literature has attempted to provide
a better understanding and visualisation of neural networks, focusing primarily on
image classification. This paper shows the feasibility of applying the same methods
to an example deep neural network model, concerned with the estimation of credit
risk for a portfolio of credit cards. Results show that the analysis of relevance,
sensitivity and neural activities can increase the interpretability of a neural network
in a financial modelling context.
1 Introduction and motivation
Historically, the widespread use of advanced deep learning models in sensitive fields like medicine
and finance has been hindered by a fundamental lack of human interpretability regarding the outcomes
of such advanced models. Simpler techniques such as linear or logistic regressions yield outcomes
which are deterministic in nature and follow mechanics which are well understood and controlled
by model developers and analysts. Deep neural networks, however, have a large number of hidden
layers and neurons, the exact roles of which are not easily understood by humans [1].
The interpretability issue in the financial services context has started to receive broad coverage in recent times. Examples can be found in [2], [3] and [4]. In [5], the interpretability issue of neural-network-based models has been introduced in the context of Retail Banking, looking at the opposing challenges in data analysis and model interpretability that financial institutions are facing.
A number of research streams have recently been seeking solutions for the issue of interpreting deep neural network models. Among them are the following:
• Relevance analysis: how much of the output (e.g. a probability of default) is directly due to a given input variable?
• Sensitivity analysis: how much does the output change subject to a (small) change in a given input variable?
• Neural activity analysis: which neural paths are most activated by a given input variable?
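As an illustration of the second stream, sensitivity can be approximated numerically even without analytic gradients. The sketch below uses a central finite difference, with a hypothetical `model_fn` standing in for the trained network's scoring function; it is one possible implementation, not the paper's own code:

```python
def sensitivity(model_fn, x, eps=1e-4):
    """Central finite-difference sensitivity of a scalar model
    output with respect to each input feature."""
    grads = []
    for i in range(len(x)):
        x_up, x_dn = list(x), list(x)
        x_up[i] += eps
        x_dn[i] -= eps
        grads.append((model_fn(x_up) - model_fn(x_dn)) / (2 * eps))
    return grads

# Toy model: the output depends strongly on x[0] and weakly on x[1].
toy = lambda x: 3.0 * x[0] + 0.1 * x[1]
grads = sensitivity(toy, [0.5, 0.5])
```

For a linear toy model the estimated sensitivities recover the coefficients exactly; for a real network the same call would be made around each portfolio record of interest.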
Another promising research avenue is found in [6], where a novel class of neural networks (called Deep Fundamental Factor models) with enhanced information ratios is introduced in the context of multi-factor asset models.
For larger banks and institutions, these research outputs make it possible to consider developing more sophisticated models that previously could not comply with rigorous regulatory standards; for smaller fintechs that already use artificial intelligence techniques, they provide a way of building deeper model validation into their processes.
Electronic copy available at: https://ssrn.com/abstract=3519142

The primary focus of this paper is to show the feasibility of these methods in overcoming the
interpretability hurdles around the application of neural networks and deep learning in business
and/or risk processes in the financial industry. This is achieved by analysing a credit/default risk
neural-network-based model in the context of a credit card portfolio. The analysis involves applying
a selection of relevance, sensitivity and neural activation analysis techniques, demonstrating their
ability to explain the model’s mechanics. It is worth clarifying, however, that the paper does not focus
on trying to structure a neural network model that outperforms other techniques for the particular
dataset at hand. Instead it focuses on developing foundations for understanding, examining and
modelling interpretability and explainability of deep learning models.
The rest of the paper is organised as follows. Section 2 provides details on the construction of the
dataset and architecture, as well as a brief discussion on the training and test results for the credit
card default model to be interpreted. Section 3 introduces the relevance analysis technique, Section 4
provides details for the sensitivity methods and Section 5 describes the neuron activity analysis. All
three sections illustrate techniques by analysing their application to the model examples. Conclusions
are drawn in Section 6.
2 Dataset and the neural network
2.1 Dataset and features
For this paper, a publicly available dataset from the UCI machine learning repository [7] has been used.
This data provides information on default payments, demographic factors, credit data, history of
payment and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
This dataset has been used in various academic papers, see [8], [9], [10], as well as online machine learning blogs [11].
The input used in this paper has 23 features, similar to [8]. There are four demographic features
covering gender, education, marital status and age of each client. These are followed by 18 features
providing the history of payment and bill statements, i.e. repayment status as well as the amounts of bill statements and previous payments for six consecutive months. The last feature is the amount of credit given to the client. These input features have been scaled between zero and one to speed up convergence of gradient descent and accelerate training, as well as to enable easy comparison of sensitivities. The ground truth, or true label, takes two values: zero for no default and one if the client defaults on the next month's payment.
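The min-max scaling described above can be sketched as follows (a minimal per-column version, assuming numeric feature columns; the paper does not specify its exact implementation):

```python
def min_max_scale(column):
    """Rescale a numeric feature column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(v - lo) / span for v in column]

# Example: ages are mapped onto [0, 1] before training.
scaled_ages = min_max_scale([21, 35, 70])
```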
Data is randomly shuffled and then split in such a way that 80% is used for training and 20% for testing. In the original dataset, ≈ 78% of entries represent non-default. This finding is as expected for this specific financial context.
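The shuffle-and-split step can be sketched as below (a seeded stand-in for reproducibility; the paper does not state which library performs its split):

```python
import random

def shuffled_split(rows, test_frac=0.2, seed=0):
    """Randomly shuffle the dataset, then hold out a test fraction."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(rows) * test_frac)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

train, test = shuffled_split(list(range(100)))
```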
2.2 Network specification
The model examined in this paper is a feed-forward neural network. The network topology has
been determined after a brief hyper-parameter tuning exercise that experimented with hyperbolic tangent (tanh) and Rectified Linear Unit (ReLU) activation functions, as well as various numbers of
hidden layers and neurons. The final model architecture was selected as the one that minimised the
discrepancy between accuracy metrics in the training and testing phases, whilst keeping training time
within reasonable levels.
The final model topology comprises three hidden layers, see Figure 1 for the high-level architecture, with 100, 50 and 10 nodes respectively. The three hidden layers have activation functions A1, A2 and A3 respectively, the input vector can be considered as A0, and the output has final activation A4. All activations apart from the final one are ReLU, where ReLU(z) = max(0, z). Since predicting whether a client would default or not in the next month is a binary classification problem, the final activation is the sigmoid function, sig, where

A4 = sig(Z4) = 1 / (1 + e^(−Z4))
and
Z4
is the linear transformation applied to the output of the last hidden layer. To avoid over-fitting,
neuron dropouts are implemented in the training phase across all hidden layers, with 65%, 50% and
25% dropout rates in the first, the second and the last layers respectively.
Figure 1: High-level architecture for the considered neural network.
This model takes the dataset described in Section 2.1 as an input. The classification output is a default prediction, Ŷ = D(A4), obtained from a sigmoid score of the final activation A4 as follows:

D(A4) = 1 if A4 = sig(Z4) > 0.5,
        0 otherwise.    (1)
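As a hedged sketch, the topology and decision rule above can be written out as a plain forward pass. The weights below are random placeholders, not the trained parameters, and dropout is omitted because it is active only during training:

```python
import math
import random

def dense(x, W, b):
    """Linear transformation Z = Wx + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(z):
    return [max(0.0, v) for v in z]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, layers):
    """A0 = input; A1..A3 = ReLU hidden layers; A4 = sigmoid score."""
    a = x
    for W, b in layers[:-1]:
        a = relu(dense(a, W, b))
    W, b = layers[-1]
    return sigmoid(dense(a, W, b)[0])  # A4 = sig(Z4)

def decide(a4, threshold=0.5):
    """Decision rule of Equation (1): predict default iff A4 > threshold."""
    return 1 if a4 > threshold else 0

# Random placeholder weights for the 23-100-50-10-1 topology.
random.seed(0)
sizes = [23, 100, 50, 10, 1]
layers = [([[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]
x = [random.random() for _ in range(23)]
a4 = forward(x, layers)
y_hat = decide(a4)
```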
2.3 Results analysis
The neural network, once trained, achieves ≈ 82% accuracy both on the train and the test datasets.
This might be considered reasonable accuracy for a binary classification and, in fact, to the authors’
best knowledge, it has not been exceeded for this particular dataset in the published literature so far.
However, since the dataset is highly dominated by non-default entries, overall accuracy alone will not
provide enough information on how well the model predicts defaults.
Firstly, other metrics such as precision, recall and F1 score should be considered, where:

precision = TP / (TP + FP),   recall = TP / (TP + FN),   F1 = 2 × (precision × recall) / (precision + recall).
Here, TP stands for true positives (both prediction and true label are default), FP stands for false
positives (prediction is default but true label states no default occurs) and FN means false negatives
(prediction is no default and true label is default). Table 1 shows the normalised confusion matrix for
the test dataset, based on which it can be concluded that the model has 71% precision, 31% recall and an F1 score of 43%. This means that 71% of the clients for whom defaults are predicted by the model would indeed default. However, out of all the client defaults that occur in the next month, only 31% are correctly identified by the model.
Predicted Default Predicted Non-Default
Actual Default 6.88% (TP) 15.10% (FN)
Actual Non-Default 2.82% (FP) 75.20% (TN)
Table 1: Normalised confusion matrix for the test dataset.
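The quoted precision, recall and F1 figures follow directly from the Table 1 entries (expressed as percentages of the test set):

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Table 1 entries: TP = 6.88%, FP = 2.82%, FN = 15.10%.
precision, recall, f1 = classification_metrics(6.88, 2.82, 15.10)
```

Rounding to the nearest percent recovers the 71%, 31% and 43% reported above.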
Secondly, the receiver operating characteristic (ROC) curve obtained from test data should be
examined, see Figure 2 for the details. The ROC curve is obtained as a scatter plot of true positive
rate (TPR) versus false positive rate (FPR) for increasing values of the decision threshold in Equation
(1). TPR and FPR are defined as:
TPR = TP / (TP + FN),   FPR = FP / (FP + TN).
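Sweeping the decision threshold and collecting (FPR, TPR) pairs traces out the ROC curve. A minimal sketch follows, with illustrative scores and labels rather than the paper's test set:

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) pairs as the decision threshold is swept."""
    pts = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 0)
        pts.append((fp / (fp + tn) if fp + tn else 0.0,
                    tp / (tp + fn) if tp + fn else 0.0))
    return pts

# Illustrative scores/labels; a perfect ranker traces the top-left corner.
curve = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], [0.0, 0.5, 1.0])
```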