
Scorecard Training Slides 2024 v5

The document outlines a training program on scoring concepts in credit analytics, scheduled for June 20-21, 2024, by APDS Consulting. It covers various topics including model diagnostics, validation, decision areas, and the use of data sources for scoring. The training aims to enhance understanding of credit risk management and analytics, leveraging both traditional and alternative data sources for improved decision-making in financial services.

Concepts of Scoring

Training
20th / 21st June 2024
by

APDS Consulting
(Matthew Freeman)

www.apds-analytics.com
Scorecard Training Agenda

• Introduction
• Credit Analytics and Scoring Concepts
• Decision Areas
    • Origination
    • Behavioural
    • Credit Propensity
• Modelling Steps 1 & 2 (includes a brief discussion on Reject Inference)
    • Sampling
    • Segmentation
    • Characteristic Selection
    • Characteristic Classing
    • Modelling
    • Scorecard Scaling
• Model Diagnostics & Validation
    • Score Level – PSI, Gini, KS & Rank Ordering
    • Characteristic Analysis
• Other Risk Model Issues
    • IRB / IFRS 9 PD, EAD & LGD
• Model Risk Considerations
    • Model Lifecycle
    • Model Management Framework
    • Model Inventory
    • Approval Networks / Committees
• Conclusions
APDS Consulting - Matthew Freeman Biography

Matthew has twenty-plus years of retail financial services industry experience, predominantly within the analytical risk management sphere, covering diverse geographies ranging from the United Kingdom and Western Europe, throughout the Middle East and India, China, Asia (North and across the South East) and Australasia (Australia and New Zealand).

Much of this experience has been gained within the big credit risk consulting firms (such as Experian and FICO) and with large banking and finance groups (such as Lloyds, Barclays, United Overseas Bank and GE Capital). It covers the development of both operational models (supervised, segmented application and behavioural scorecards, incorporating statistical and business-oriented segmentation, built utilising linear or logistic regression) and regulatory models (Basel II & IFRS9 PD, EAD and LGD), utilising a wide range of internal and external datasources (which have undergone detailed and specific quality checks to ensure that they are fit for the modelling / analytical purpose for which they are intended). Models and associated strategies have been developed for all the major consumer product groups such as cards, loans and mortgages.

More recently he has started to consider how the new, exciting and developing datasources thrown up by the Big Data revolution can be used to help both traditional and new lenders to exploit the opportunities that have been created (reviewing current data scenarios, whilst identifying any data gaps and therefore the need for enhanced data collection that fits with the intended business (problem-solving) use of the data). Examples include identifying potential high net worth individuals for a wealth management bank, sourcing new customers in a risk-responsible manner where traditional credit data is thin but newer innovative sources are abundant, and developing risk and marketing models for the pre-paid mobile telephone sector in Asia.

Typically, all data analysis (including quality assurance and data manipulation, such as merging and creating new advanced combinations of variables) has been carried out utilising SAS, Python or R, as well as specialised statistical modelling software supplied by the likes of Experian, FICO or Paragon Business Solutions.
Global Experience applied Locally – By APDS Consulting

APDS Clients
• IFRS9 / IRB Consulting for multiple large UK Banks
• Scorecard & IRB Model Development for numerous banks
• IFRS9 / IRB Consulting for a Nordic wide
• Development of Scorecards and Long Term Analytical Support in Mongolia
• Scorecard Training
• Scoring for Credit Start-up across regional South East Asia
• Marketing Analytics & Campaign Analysis / Strategy for Top NZ Bank / Mastercard
• Social Scoring for Lending to Start Ups
• Model Risk Management Gap Analysis across the Middle East
• Marketing Analytics and Modelling for a top 5 Saudi Bank
• Generic Visa Scorecards across 4 countries
Training and Mentoring – Foundation to Expert Level

• Foundation (Training to allow project participation)
• Intermediate (Portfolio Monitoring and joint development of new models and strategies)
• Advanced (Internal Model Development, advanced strategies, consultants as trusted advisors)

3 years
The Analytics Journey

[Figure: chart of Analytics (Low to High) against Analytic Infrastructure (Low to High). Labels on the chart include: Expert/Generic Scores and Rules; Pooled Scores; Custom Scores and Policies; Integrated Custom + Bureau Scores; Score-based Strategies; Alternative Data Scores; Multiple Scores; Data Driven Design & Strategy; Profitability Outcomes (Risk and Marketing); Advanced Data Driven Design (multiple & alternate datasources); Constrained Optimisation; Optimisation.]
Credit life cycle Analytics – Decision & Prediction

Stages: Acquisition, Origination, Customer Management, Collections & Recovery

Decisions
 Whom to target  Approve/Decline  Line Management  Collections priority
 Product to offer  Line/Loan/Lease  Re-pricing/Renewal  Collection action
 Channel amount
 Authorizations  DCA placements
 Timing  Price
 Cross-sell  Channel placements
 Up-sell
 Early Collections

Predictions
 Response  Risk  Risk  Amount collectible
 Revenue  Revenue  Revenue  Charge-off
 Risk  Capacity  Attrition  Bankruptcy
 Pre-payment  Capacity  Roll
 Fraud  Fraud

Credit life cycle Analytics - Results

Stages: Acquisition, Origination, Customer Management, Collections & Recovery

 Increased market share  Increased approvals  Increased revenue  Reduced losses


 Increased share of wallet  Reduced losses  Reduced losses  Increased amounts
 Reduced marketing collected
 Increased  Reduced attrition
costs activation/usage  Reduced customer
 Reduced transaction
 Increased revenue service and collection
 Reduced origination fraud
costs
costs
 Reduced
application fraud

Better Results

Questions
Scoring Concepts
Analytics & Scoring - The Concept of Modelling

Data Universe → Data Science / AI or ML → Future Predictions

Using past or historic data to predict future performance

Credit Risk Analytics – Solving Risk Management Challenges

• Data: Internal Data, Bureau Data, Open Banking, Alt Data, Macro Econ
• Models: Generic Models, Custom Models, Regulatory Models, Stress / Scenario Models, Advanced Combo Models
• Outcome: Strategic Use of Models & Data
Describe Datasources

• Traditional Application Data, obtained from the application form submitted by the applicant, online or on paper (covers demographics, employment, affordability)
• Traditional Bureau Data (collected by an agency from banks / FS organisations, relating to past and existing credit facilities)
• Transactional Data (data from the use of a credit or debit card)
• Internal (Bank) Performance Data (information about account usage, payments etc.)
• Open Banking Data
• Alternate Data, collected from Telcos, Mobile Handsets, Social Media Use and other sources unconnected to the bank

**Datasources may differ if the product / applicant is business / commercial
How Does Scoring Work?

• Scorecards add points to, and subtract points from, a baseline constant according to each individual’s or account’s data
• These scorecards are easy to apply and intuitively simple to understand
• The resulting score gives a prediction of future behaviour
• These scores can be used to rank a group of individuals to assign the best actions

EXAMPLE SCORECARD

Applicant Age in Years
    < 22                   -50
    22 – 25                -20
    26 – 40                  0
    41 – 55                +30
    > 55                     0
Time as Rider
    < 1                      0
    1 – 2                  -45
    3+                    -100
Worst Payment Status
    Current                  0
    1 Payment missed       -10
    2 Payments missed      -60
    Etc.                  Etc.
    …                        …
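
The additive mechanism on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the trainer's implementation: the points are read from the example scorecard for two of its characteristics (Applicant Age and Worst Payment Status), while the baseline constant of 0, the function names and the use of whole-year ages are assumptions made for the example.

```python
from typing import Dict

# Assumed baseline constant; the example scorecard does not show one.
BASELINE = 0

# Points for 'Worst Payment Status', keyed by number of payments missed.
# The slide lists further classes ("Etc."), which are not filled in here.
PAYMENT_STATUS_POINTS: Dict[int, int] = {0: 0, 1: -10, 2: -60}


def age_points(age: int) -> int:
    """Points for 'Applicant Age in Years' (whole-year ages assumed)."""
    if age < 22:
        return -50
    if age <= 25:
        return -20
    if age <= 40:
        return 0
    if age <= 55:
        return 30
    return 0


def score(age: int, payments_missed: int, baseline: int = BASELINE) -> int:
    """Add the points for each characteristic to the baseline constant."""
    if payments_missed not in PAYMENT_STATUS_POINTS:
        raise ValueError("class not shown on the example scorecard")
    return baseline + age_points(age) + PAYMENT_STATUS_POINTS[payments_missed]
```

For instance, a 45-year-old whose worst status is Current receives +30 and 0 points on top of the baseline, whereas a 21-year-old with two missed payments receives -50 and -60.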

The Role of Analytics & Scoring
Consider a scorecard built to predict whether a new applicant for a credit product will default on their
payments within time X
This scorecard is used when a new customer applies…

[Figure: scoring flow. Data inputs (Application Form Data from the loan application, such as name, loan amount, purpose, risk and demographics; Alternate Data, such as rider or customer characteristics; Open Banking and / or Bureau Data; Previous Loan Behaviour) feed the example scorecard shown earlier (Applicant Age in Years, Time as Rider, Worst Payment Status), which produces a Score. The score is then used to take the most appropriate action for each individual.]
Using the Scores in Decisions

APPLICATION SCORE (Low Score to High Score)

• Very High Risk: Worst applicants. Reject these applicants.
• High Risk: Reject these applicants or give low limits and apply higher pricing.
• Standard Risk: Good applicants. Accept.
• Very Low Risk: The best applicants. Consider higher limits.
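
A score-based strategy of this kind amounts to a lookup from score to band. The sketch below assumes three cut-offs (500, 600 and 700) purely for illustration, since the slide shows the bands but no score values; the band names and actions are taken from the slide.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical cut-offs: a score below 500 falls in the first band, and so on.
CUTOFFS: List[int] = [500, 600, 700]

BANDS: List[Tuple[str, str]] = [
    ("Very High Risk", "Reject"),
    ("High Risk", "Reject, or give low limits and apply higher pricing"),
    ("Standard Risk", "Accept"),
    ("Very Low Risk", "Consider higher limits"),
]


def decide(score: float) -> Tuple[str, str]:
    """Return the (risk band, action) pair for an application score."""
    return BANDS[bisect_right(CUTOFFS, score)]
```

In practice the cut-offs would be set from the observed bad rates by score range and the bank's risk appetite, not chosen arbitrarily as they are here.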

Analytics Across Business Areas
GDS Link Asia
Each business area below is described by its Challenge, Model, Data and Business Outcome.

Prospecting
• Challenge: Who is the right customer to target? How can we cost-effectively market to the right customer? Can we pre-screen the risk associated with our targets?
• Model: Propensity to Respond, Propensity to Take-Up
• Data: Alternative Data, Demographics, Account Management
• Business Outcome: Improve Customer Loyalty, Reduce Marketing Costs, Increased Marketing Efficiency, Reduced Attrition, Increased Revenue / Profits

Origination
• Challenge: Who to accept and reject? Under what terms and conditions? Can I X-sell additional products to the applicant?
• Model: Application Scores, Bureau Scores, Application Fraud
• Data: Demographics, Bureau Data, Alternative Data, Account Management
• Business Outcome: Risk Profiles, Improved Customer Service, Reduced Losses

Customer Management
• Challenge: Which of my existing customers can we x-sell or up-sell additional products and services to? Which customers can we offer an increase in limit? How do we determine pre-delinquency treatments?
• Model: Behavioural Scores
• Data: Demographics, Bureau Data, Alternative Data, Account Management
• Business Outcome: Risk Profiles, Improved Customer Service, Increased Market Share, Increased Profits, Reduced Losses

Usage
• Challenge: How can we identify the customers most likely to use the product more? How do we identify the profitable customers that are likely to close their facilities?
• Model: Revenue Models, Utilisation, Attrition
• Data: Demographics, Bureau Data, Alternative Data, Account Management
• Business Outcome: Increased Revenue / Profits

Collections
• Challenge: Which delinquent customers do we collect upon? How do we prioritise collection treatments? Which accounts do we collect, litigate, sell or write off?
• Model: Collections Scores, Payment Projection, Loss Forecasts
• Data: Demographics, Bureau Data, Alternative Data, Account Management, Collections Management
• Business Outcome: Reduced Losses, Increased Recoveries

Regulatory Models
• Challenge: How much capital do we need? Can we estimate our provisions based upon the lifetime of the facility? How will the economy affect our capital and provisioning positions in the future?
• Model: PD, EAD and LGD models for AIRB; PD, EAD and LGD models for IFRS9; Macroeconomic models for stress testing and scenario analysis
• Data: Demographics, Bureau Data, Alternative Data, Account Management, Collections Management, Macroeconomic Data
• Business Outcome: Capital Planning, Expected Loss Calculation for Provisioning, Long Term Business Planning
Benefits of Credit Scoring

• Automated Decisioning
• High Level of Control
• Reduced Losses
• Increased Revenue
• Increased Portfolio Knowledge
• Optimised Portfolios
• Engaged Staff
• Increased Regulatory Standing
• Maximised Opportunity

Questions
Exercise 1 (1 hour)
• List three areas where the concept of scoring could be applied to your bank / organisation
• What’s the business problem that could be addressed?
• What data would be used or could be useful?
• Discuss the business case for doing so
• Ideation sessions
Decision Areas
Acquisition Analytics
• Is the bank targeting the right customers?
• Is the bank using Alternative Data to target thin file or unbanked customers?
• Is the bank acquiring customers based upon the bank’s risk appetite and assigning appropriate terms and conditions based upon the risk profile?
• Does the bank capture and store the right data for use within scoring and strategy deployment?
• Is the bank maximising the benefits of using data from both internal and external datasources (bureau data / alternative data)?
• Is the bank satisfying the prevailing regulatory compliance criteria?
Acquisition Analytics

Key Analytic Areas at an Acquisition Level
• App Scores
• Bureau Scores
• Alt Data Scores
• Fraud Scores
• Propensity Scores
Analytics in the Acquisition Area
Application Scoring - Overview

Business Challenge: Automate the acquisition decisioning process based on the risk profile

Solution: Application scorecard built to predict delinquency
• Application scorecard using application demographic data
• Can use bureau data and internal data from other products
• Usually uses past delinquency behaviour to predict the outcome, i.e. a 90+ DPD default definition

Benefits:
• Reduction in bad debts
• Aim of a high level of automation
• Tool for risk-based pricing

[Figure: a Development Sample from the recent past (THEN) and its Outcome some time later (NOW) are used to build a Statistical Model.]

Example scorecard

Constant                          +800
Age of Applicant
    <22                            -50
    22-25                          -10
    26-40                            0
    41-55                          +30
Worst status Last 6 months
    0                                0
    1-2                            -45
    3+                            -100
Joint Applicant
    Y                              +20
    N                                0

[Figure: APPLICATION SCORE scale from Low Score to High Score, with bands Very High Risk, High Risk, Standard Risk and Very Low Risk.]
Acquisition Analytics – Combining Scores
Combining the internal application scores with the bureau scores enables the bank to consider a customer’s credit position at other financial institutions, and also to consider cases that may previously have been rejected if only internal data was considered.

Benefits
• Increase in predictive power of acquisition decisioning
• Reduction in Bad Debts
• Expansion in the number of applicants that can be offered facilities (e.g. an applicant previously seen as high risk on internal data only can now be referred, instead of rejected)
• Aim for high level of automation

Decision matrix (rows: Application score band; columns: Bureau score band)

Application      Bureau Low     Bureau Medium     Bureau High
Very Low         Reject         Reject            Refer
Low              Reject         Refer             Refer
Medium           Reject         Accept            Accept
High             Refer          Accept            Accept
Very High        Accept         Accept            Accept
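
The dual-score matrix on this slide is a two-way lookup, which the following sketch encodes as a nested dictionary. The decisions are copied from the matrix; the function name is hypothetical, and mapping raw scores to the band labels is assumed to happen beforehand.

```python
from typing import Dict

# Rows: application score band. Columns: bureau score band.
MATRIX: Dict[str, Dict[str, str]] = {
    "Very Low":  {"Low": "Reject", "Medium": "Reject", "High": "Refer"},
    "Low":       {"Low": "Reject", "Medium": "Refer",  "High": "Refer"},
    "Medium":    {"Low": "Reject", "Medium": "Accept", "High": "Accept"},
    "High":      {"Low": "Refer",  "Medium": "Accept", "High": "Accept"},
    "Very High": {"Low": "Accept", "Medium": "Accept", "High": "Accept"},
}


def combined_decision(application_band: str, bureau_band: str) -> str:
    """Look up the decision for a pair of score bands."""
    return MATRIX[application_band][bureau_band]
```

An applicant in the Very Low application band would be rejected on internal data alone, but is referred when the bureau band is High, which is the expansion effect listed under Benefits.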

Potential Acquisition & Origination Modelling Datasources

Models can cover many areas:
• Origination Models
• Cross-Sell Models
• Income Models
• Affordability Models
• Indebtedness Models
• Fraud Models

Demographic Data
• Age
• Residential Status
• Socio-Economic Group
• Occupation
• Location
• Time at Address
• Time in Job
• Etc.

Product Information
• Credit Card Limit Requested
• Card Type Requested
• Loan Purpose
• Loan Term
• Property Type
• Borrower Type – Owner Occupier vs Buy-to-Let

Mobile
• Pre-Paid Y/N
• Number of devices used to pay for goods L3M, L6M
• Time of first daily use
• Average Distance between daytime and nighttime location last week
• Handset Type – Smart vs Analogue

CRA Data
• Worst Delq Status (all prod) last 24 months
• Number of Searches Last 3 Months
• Number New Cards Opened Last 6 Months
• Time Since Last Default
• Number of CCJs / Court Judgements Last 6 years

Open Banking
• Total Number of Accounts in a Delinquent state
• Total Outstanding Balance
• Our Outstanding as a proportion of Total Outstanding Last Months
• Total Late Fees Paid Last 3 months
• No. Months with delinquency L3M, L6M, L12M

Alternate Data
• Psychometric Data
    • Time taken to answer moral questions
    • Time taken to answer DoB question
• Tax Records
• Utilities Data

Model Power increases as data availability expands


Modelling Arena

• Collections operations are highly data dependent
• Data used will be generated from account operation and from contacts with the customer within the collections / delinquent environment
• The use of external data, such as open banking and other alternative sources, would be encouraged as it provides a much more holistic view

[Figure: Usage Data at the centre, surrounded by the propensity models it supports, with the question "Do we need customer consent?"]
• Activation (New to Book): Can we predict which new customers will quickly activate their cards?
• Re-Activation (Dormant): Which customers are likely to re-activate a dormant card?
• Attrition: Which customers are likely to close their accounts?
• Response
• Revolve (pay interest): Can we predict which dormant or transacting customers will revolve balances in the future?
• Spend (Domestic): Can we predict which customers will spend on their cards domestically?
• Spend (Overseas): Or spend overseas?
Potential Propensity Modelling Datasources

Models can cover many areas:
• Product Propensity – which product, when?
• Credit Propensity – determining which customers need additional credit
• Usage – determining which customers will generate the most revenue through greater usage
• Benefits include
    • Strategic Customer Targeting
    • Reduced Marketing Costs
    • Efficient Decisioning, Increased Revenue
• Examples
    • Propensity to spend on a Credit Card
    • Propensity to Revolve a balance on a CC
    • Propensity to Attrite / Switch a Mortgage

Demographic Data
• Age
• Residential Status
• Socio-Economic Group
• Occupation
• Location
• Time at Address
• Time in Job

Behavioural Scoring Data
• Worst Status Last Month / L3M / L6M / L12M
• Min Balance Last Month, L3M, L6M, L12M
• Number of Payments L1M, L3M, L6M, L12M
• Average Payment to Balance Ratio L3M

Mobile
• Pre-Paid Y/N
• Number of devices used to pay for goods L3M, L6M
• Time of first daily use
• Average Distance between daytime and nighttime location last week

CRA Data
• Worst Delq Status (all prod) last 24 months
• Number of Searches Last 3 Months
• Number New Cards Opened Last 6 Months
• Time Since Last Default
• Number of CCJs / Court Judgements Last 6 years

Transactional Data
• Number of Purchases Last 3 Months
• Number of Purchases greater than $100 Last Month
• Time since Last Purchase
• Number of Months Revolving Balance Held last 12 months
• Average Value Purchases by MCC last 6 months

Open Banking
• Total Number of Accounts in a Delinquent state
• Total Outstanding Balance
• Our Outstanding as a proportion of Total Outstanding Last Months
• Total Late Fees Paid Last 3 months
• No. Months with delinquency L3M, L6M, L12M

Model Power increases as data availability expands


Acquisition Analytics – Fraud

• Increasing levels of identity fraud lead to increasing fraud losses at a diverse range of banks
• Data and models can be used to help the bank identify cases that require further investigation (searching for anomalies in the data)
• Application Fraud Models
    • Use of bureau data is permitted
    • Potentially, early bads may be classed as fraudulent
• Benefits (taken from a UK example)
    • Referral of 9% of applications for further investigation
    • Detection of 56% of fraud cases, saving the bank money
• Example characteristics include complex interactions, e.g. Age and High Income
Analytics and Customer Management
Maximising the use of the bank’s data that can be mined from existing customers

Is the bank pro-actively managing its existing customer base to drive revenue and reduce costs?

• Behavioural Scoring in Customer Management
• Is the bank managing relationships at an account level?
• Is the bank managing relationships at a customer level?
• Pro-actively managing the customer base?
• Identifying high value customers for potential upsell / x-sell?
• Increase revenue from X-Sell opportunities and grow the portfolios
• Identify customers that are starting to look vulnerable? (triage)
• Prioritising the flow into collections and determining appropriate actions
• Increased (early) collection process efficiency and reduced roll-rates
Customer Management Analytics

Key Analytic Area at a Customer Management Level, with its Strategies
• Account Level Behavioural Scorecard: Credit Limit Increase, Further Advances, X-Sell
• Customer Level Behavioural Scorecard: Credit Limit Increase, Further Advances, X-Sell
• Collection & Recovery Scores: Roll-Rate Models, Payment Projection, Collections Strategies
• Revenue & Propensity Scores: X-Sell, Attrition, Spend Models
Customer Management Analytics

Key Analytic Areas at a Customer Management Level
• Account Level Behavioural Score
• Customer Level Behavioural Score
• Customer Level Cross Sell Scores
• Revenue Scores (propensity to use)
• Early Collections Scores
• Late Collections & Recovery Scores
Looking at Behavioural Scoring and how to use Strategy

[Figure: Portfolio Data and example characteristics feed Behavioural Scoring Modelling, whose scores support a range of strategies.]

Chars (Examples)
• Max Delq last 6 months
• Number of times 30+ last 12 months
• Number of Overlimit last 3 months
• Number of Payments greater than minimum payment last 6 months
• Max Utilisation last 9 months
• Number of Collection Calls last 24 months

Behavioural Scoring Modelling

$$\Pr(x) = \frac{e^{\alpha + \sum \beta x}}{1 + e^{\alpha + \sum \beta x}}$$

Strategy uses
• Exposure Management
• Cross-Sell
• Triage
• Segmentation
• Collections Prioritisation
• Roll-Rate Management
• Payment Projection
• Recoveries Management
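
The logistic formula on this slide can be evaluated directly, as in the short sketch below. The function name and any coefficient values passed to it are hypothetical, and the slide does not say whether Pr(x) is the probability of a good or a bad outcome, so no interpretation is assumed here.

```python
import math
from typing import Sequence


def logistic_probability(alpha: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """Evaluate Pr(x) = e^(alpha + sum(beta * x)) / (1 + e^(alpha + sum(beta * x)))."""
    if len(betas) != len(xs):
        raise ValueError("one coefficient is needed per characteristic value")
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    # 1 / (1 + e^-z) is algebraically identical to e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))
```

A linear predictor of zero gives a probability of exactly 0.5, and the probability rises towards 1 as the predictor increases.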

Customer Management Analytics – Collections Management

Models used as delinquency increases: Behavioural Scoring, Roll-Rate Model, Payment Projection

Account Status: Up To Date
• Description: Manage customer relationships through pro-active limit management, competitive pricing and X-Sell programmes
• Objective: To maximise profitability by balancing expected revenue with associated levels of risk and exposure

Account Status: 1-29 Days Past Due
• Description: Initiate customer contact for debt control, triage, limit management and pre-emptive triage
• Objective: To maximise profitability by balancing expected revenue with associated levels of risk and exposure, whilst catching early delinquency

Account Status: 30 - 90 Days Past Due
• Description: Continued customer contact for prioritised debt control and arrears management. Save the relationship?
• Objective: To minimise roll-rates to the non-performing bucket whilst determining whether to save the relationship or minimise loss

Account Status: 90+ Days Past Due
• Description: Decision made to end the customer relationship, therefore switch to recovery strategies, debt sale and litigation
• Objective: To maximise recovery cash-flows and minimise losses to the bank
Champion-Challenger Philosophy

[Figure: champion-challenger framework across the delinquency cycle. Delinquency buckets UTD, X, 30, 60 and 90 run from "Increasing delinquency – Missed Payments" to "Recovery – Making Payments". Models shown: Lend More Model, Current Model, Default Prevention Model, Recovery to Contact Model and Payment Projection / LGD Models, with stage markers at 30 dpd, 30 dpd and 60 dpd. Characteristic sets shown: Behavioural Scoring Chars, History Chars, Alternative Data Chars and Post Default Payment Chars. A Champion is compared with Challengers A, B and C through Modelling and Monitoring. Inset chart "Efficient Frontier": % Collected against Months since Implementation.]

Data Science / Credit Scoring

$$\Pr(x) = \frac{e^{\alpha + \sum \beta x}}{1 + e^{\alpha + \sum \beta x}}$$

Challenger C too extreme, A not radical enough, B improves over the Champion and becomes incumbent
Collections & Payment Projection Models

• Credit customers are obliged to make regular (usually monthly) payments to repay their lending facilities
• Customers are categorised as sitting in a range of buckets, representing the number of missed payments
• If a payment is missed, the customer will ‘roll’ to the next delinquency bucket
• If a payment is made, the customer will ‘roll back’ the number of buckets the payment represents
• At 90 days past due the customer is deemed to have ‘defaulted’, and a loss may be incurred
• NPLs occur at 90 days past due, so banks will aim to avoid a 90-day status
• The predictive power of the behavioural score deteriorates the longer the customer is in a delinquent state
• Typically, the bank will look to use contact information collected during the collections process, e.g. Number of Right Party Contacts Last Month or Number of PTPs Kept Last Three Months
• As we move post-default (usually 90 dpd) the data and the model structure change again, towards recovery payments data and payment projection models (the bank may wish to utilise its LGD models in this phase, or have separate operational models without the cost and discounting elements)

[Figure: delinquency buckets UTD, X, 30, 60, 90 and Losses, with "Increasing delinquency – Missed Payments" in one direction and "Recovery – Making Payments" in the other. As Delinquency Increases B-Score Data Usefulness Decreases.]
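
The rolling behaviour described in these bullets is usually summarised as a roll-rate table: for each starting bucket, the share of accounts that moved to each bucket a month later. The sketch below is one simple way to compute it from two monthly snapshots; the function name, the dictionary layout and the account data in the usage note are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict


def roll_rates(last_month: Dict[str, str], this_month: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Share of accounts moving from each bucket to each bucket.

    Both arguments map an account id to its delinquency bucket
    (for example "UTD", "X", "30", "60" or "90").
    Accounts missing from the second snapshot are ignored.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for account, start in last_month.items():
        end = this_month.get(account)
        if end is not None:
            counts[start][end] += 1

    rates: Dict[str, Dict[str, float]] = {}
    for start, ends in counts.items():
        total = sum(ends.values())
        rates[start] = {end: n / total for end, n in ends.items()}
    return rates
```

With three accounts starting in bucket 30, of which two miss a further payment and one clears its arrears, the function reports a roll rate from 30 to 60 of two thirds and a roll-back rate to UTD of one third.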

Collection Scores – Basic Strategy

What is it?
• Behavioural & Collections Scoring is used to predict which accounts or customers will go into the late stages of delinquency (often 3 cycles plus, although more complex performance definitions can be used on a product-by-product basis)
• Often the score is grouped into risk grades and appropriate actions are taken by grade

Why?
• Identify future delinquents, reducing losses
• Identifying cases that will self-cure will allow resources to be concentrated where they are needed
• Identifying x-sell potential will increase revenue

How?
• A behavioural score is often an integral part of the bank’s customer management infrastructure and can be used to determine x-sell opportunities (from a revenue perspective)
• The score is also key to identifying customers that are deteriorating so that preventative measures can be taken to pre-cure before delinquency
• The score would also be used in the early stages of delinquency to prioritise strategies

Tailored Use of Resources
• High Risk: Day 1, High Risk Collections Intensive Contact Strategy / Early Debt Sale
• Medium Risk: Day 7, Standard Collections Process
• Low Risk: Day 21, Soft Touch Collections Process, New Facility offers
Customer Management Analytics - Sampling

[Figure: sample design timeline, with an Observation Period leading up to the Observation Point and an Outcome Period following it.]
• Independent Variables, taken over the Observation Period (usually up to 12 months): Transactional Data; Bureau Data, e.g. No. of Delq accounts L3M; Internal Data, e.g. Other Account type Delq; Contact History, e.g. £ of PTPs taken
• Data elements can be at account or customer level OR be external bureau chars
• Observation Point: the Decision Point, at which this information is known
• Performance Variables, taken over the Outcome Period (dependent upon model purpose): Performance File with One Record per Account / Customer and a Good-Bad Flag
• Outcome is shorter for Colls Models
Potential Behavioural & Collections Modelling Datasources
Examples of Data that could be used when Collections Scoring

• Collections operations are highly data dependent
• Data used will be generated from account operation and from contacts with the customer within the collections / delinquent environment
• The use of external data, such as open banking and other alternative sources, would be encouraged as it provides a much more holistic view

Behavioural Scoring Data
• Worst Status Last Month / L3M / L6M / L12M
• Min Balance Last Month, L3M, L6M, L12M
• Number of Payments L1M, L3M, L6M, L12M
• Average Payment to Balance Ratio L3M

Mobile
• Pre-Paid Y/N
• Number of devices used to pay for goods L3M, L6M
• Time of first daily use
• Average Distance between daytime and nighttime location last week

Collections Contact Data
• Number of right party contacts last month
• Number of Promises to Pay Taken L1M, L3M, L6M
• Number of Partial Payments Made L3M, L6M
• Ratio of Payment to Outstanding L3M

Open Banking
• Total Number of Accounts in a Delinquent state
• Total Outstanding Balance
• Our Outstanding as a proportion of Total Outstanding Last Months
• Total Late Fees Paid Last 3 months
• No. Months with delinquency L3M, L6M, L12M

Model Power increases as data availability expands



Questions
Exercise 2: Let’s Talk Data for ABC Bank Scoring Concepts
• Earlier we listed areas where your bank may apply scoring concepts. Now please list, or revisit, the data you would need to build upon the identified uses

Discussion Area – Exercise 2
Modelling Steps
Scorecard Development Steps

Process flow: Design and Data Requirements → Initial Analysis → Variable Reduction (PCA, Correlation, VoI) → Modelling incl. Reject Inference → Model Diagnostics → Strategy Development

Design
• Design Workshop
• Business Objective / Problem
• Data Availability
• Portfolio Nuances
• Design Document
It is essential that the design is correct and aligned to business requirements.

Initial Analysis
• Data Read In
• Data Manipulation
• Performance Definitions
• Segmentation
• Sampling
• Initial Analysis Workshop & Report
The design is confirmed through data analysis, e.g. the performance definition, the development sample etc.

Modelling
• Characteristic Classing
• Modelling – Linear vs Logistic Regression
• Model Assessment & Re-model
• Finalisation Workshop & Report
Characteristics with low predictive power undergo a business logic check to determine whether they are considered for modelling.
Data Sample Construction

Process steps
• Design (Business Problem, What Data is Available, High-level Solutionising)
• Data Sourcing (extract, merge, quality checks etc.)
• Data Manipulation (char derivation, segmentation investigation, performance definitions, exclusions etc.)

Application Scoring
• Sample Window: 12 Months Applications, from 24 Mths Ago (Jan 2013) to 12 Mths Ago (Dec 2013)
• Outcome Window: 12 Months Performance, from 12 Mths Ago (Dec 2013) to the Outcome Point, Now (Dec 2014)

Behavioural Scoring
• Observation Period: 12 Months Applications, from 24 Mths Ago (Jan 2013) to the Obs Point, 12 Mths Ago (Dec 2013)
• Outcome Window: 12 Months Performance, from 12 Mths Ago (Dec 2013) to the Outcome Point, Now (Dec 2014)

*could have multiple sample windows
Data Sample Construction

Modelling Data: Monthly Application or Transactional Files

[Figure: Sample Window made up of monthly files Jan (1), Feb (2), Mar (3) through to Dec (12), each with its own Outcome Point. Rolling Observation Windows / Rolling Performance Windows.]

*Typically used when the number of bads is not low and bads mature relatively quickly, or when the portfolio requires a fixed outcome period (i.e. all bads have the same time to exhibit poor performance)
Data Sample Construction – Fixed Window

Modelling Data: Performance Data, Transaction Data

[Figure (labelled Development Team): Sample Window running from 24 Months Ago to 12 Months Ago, with a single Outcome Point at Now.]

*Typically used when the number of bads is low; the earliest bads are given longer (up to 24 months in this case) to mature
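
The difference between the two designs comes down to how many months of performance each record is given. The sketch below makes that explicit under stated assumptions: months are represented by first-of-month dates, the function names are hypothetical, and `fixed_period=None` stands for the fixed-window design with a single outcome point.

```python
from datetime import date
from typing import Optional


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def performance_months(app_month: date, outcome_point: date,
                       fixed_period: Optional[int] = None) -> int:
    """Months of performance observed for one application.

    fixed_period=None: fixed-window design, every record is observed up to
    the single outcome point, so earlier records get longer to mature.
    fixed_period=n: rolling design, every record gets exactly n months.
    """
    available = months_between(app_month, outcome_point)
    if fixed_period is None:
        return available
    if available < fixed_period:
        raise ValueError("not enough performance history for this record")
    return fixed_period
```

Using the dates from the earlier slide, an application from Jan 2013 observed to Dec 2014 has 23 months of performance under the fixed-window design but 12 under a rolling 12-month design, while a Dec 2013 application has 12 months under both.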
Example Performance Definition - Complex

GB Flag   Definition                                               Good-Bad Classification
1         Voluntary Cancel / Close / Deceased / Never Active       Exclusion
2         Bankruptcy                                               Bad / Default
3         Write-Off                                                Bad / Default
4         Re-Age / Re-Structured                                   Bad / Default
5         Ever 90+ Days Past Due >= $100 in the last 12 months     Bad / Default
6         Ever 60+ Days Past Due in the last 12 months             Indeterminate / Non-Default
7         Ever 30+ Days Past Due in the last 12 months             Indeterminate / Non-Default
8         Ever x-Days Past Due in the last 12 months               Good / Non-Default
9         Up-to Date in the last 12 months                         Good / Non-Default
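
A performance definition like this is typically applied as a hierarchy, testing each rule in flag order and stopping at the first match. The sketch below follows that reading; the `Account` fields are hypothetical names, and treating a 90+ days past due account with less than $100 past due as Indeterminate (because it falls through to the 60+ rule) is an assumption about how the hierarchy resolves.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical performance record for one account."""
    excluded: bool = False          # voluntary cancel / close / deceased / never active
    bankrupt: bool = False
    written_off: bool = False
    re_aged: bool = False           # re-aged or re-structured
    worst_dpd_12m: int = 0          # worst days past due in the last 12 months
    amount_past_due: float = 0.0    # amount past due at the worst status, in $


def classify(acc: Account) -> str:
    """Apply the rules in flag order; the first rule that matches wins."""
    if acc.excluded:
        return "Exclusion"
    if acc.bankrupt or acc.written_off or acc.re_aged:
        return "Bad"
    if acc.worst_dpd_12m >= 90 and acc.amount_past_due >= 100:
        return "Bad"
    if acc.worst_dpd_12m >= 30:
        return "Indeterminate"
    return "Good"
```

Indeterminates are normally left out of model development so that the scorecard separates clear goods from clear bads.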

Example Performance Definition - Simple

The aim is to keep the scorecard development as simple as possible

GB Flag   Definition                                                              Good-Bad Classification
1         Ever 90+ Days Past Due                                                  Bad / Default
2         Ever 30 – 90 Days Past Due in the last 12 months                        Indeterminate / Non-Default
3         Ever 30+ Days Past Due in the last 12 months                            Indeterminate / Non-Default
4         Worst Status x-Days Past Due in the last 12 months (includes current)   Good / Non-Default
Defining Exclusions (Application)

Some customers have characteristics which the bank views as
favourable, so the applicant will be accepted regardless of score,
e.g.
• a Premium Banking Customer
• a Brand Ambassador
• a VIP

The bank would like to treat these customers differently from the
normal through-the-door population when they apply for credit
• Example Exclusion Rules
• VIP
• Staff
• Students
• Pre-Approved Customers

Defining Policy Rules (Application)
• The applicant will automatically be rejected when they hit any of the bank’s
Policy Rules (which define the bank’s lending or regulatory criteria)

• Policy Decline Rules are a set of criteria which each applicant must pass
regardless of their application score

• Example Policy Rules


• Age Between 18 and 70
• Derogatory Records (Bankrupt, Legal Action, Charged Off)
• Excessive Credit Exposure
• High Debt to Income Ratios

• Where the bank allows policy exemptions, we may analyse the effectiveness
of the policy rules by looking into default rates by exception rule

Defining Exclusion Rules (Behavioural)
• For the behavioural model the customer is already on book
therefore the exclusion rules are slightly different
• There are two levels of exclusion rules for existing customers,
Observation & Outcome
• Example Observation Exclusion Rules
• Bad at Observation
• Inactive in the 12 months prior to Observation Point (no predictive data)
• Fraud / Lost or Stolen / Dispute

• Example Outcome Exclusion Rules


• Deceased in the performance window
• Inactive for the entire performance window
• Fraud / Lost or Stolen / Dispute

• Basel and IFRS9 Models may have separate exclusion rules, therefore these must be
known as the operational scorecards are often the keystone within those model
structures

Exercise 3 – Performance Definitions and Outcomes
• Please define your performance definition for your risk or propensity model
• What are your key assumptions and the rationale behind your thinking?
• Please relate your definition back to a business prompt
Exercise 3 – Discussion
• Please define your performance definition for your risk or propensity model
• Discussion
Day 1 - Reflections
• Any Questions?
• Anything you’d like to go over again?
Defining Data
1. Spec the data (Dev Team, Business Reps)
2. Extract Data (IT Team)
3. Initial Data Analysis to assess data quality (Dev Team)
• Depth
• History
• Comparisons between segments, e.g. Accepts & Rejects
• Population Rates
• Expected Values
• Future Data Availability (Dev Team, IT Team)
• Data Audit Report (Dev Team, Business Reps, IT)

Process stages:
Data Sourcing (extract, merge, quality checks etc.)
Data Extract (application data, associated performance, sufficient time periods)
Data Manipulation (char derivation, segmentation investigation, performance definitions, exclusions etc.)

Who?
Dev Team
Business Reps
IT
Data Prep
For data assessment and specification two main areas are
investigated. The first step is to look at the ‘Quantity’ of data
available and whether the information is sufficient (and covers
all the necessary fields) to construct appropriate scorecards.
The second step is to determine whether the available data is
of adequate ‘Quality’ and is a more analytic exercise.

Step 1 – Quantity
• Data Quantity assessment of the key data fields for in scope portfolios
• Starting point is always best practice variable lists
• Recommendations of required fields
• Recommendations of going forward fields

Step 2 – Quality
• Review of data elements within each field by assessing population rates, data accuracy through detailed analysis of documentation, data items etc
• Data Quality Assessment should include descriptive statistics, accuracy tests, documentation review and recommendations

Ref No. | Data Item | Portfolio | Best Practice Recommendation | Bank Available
1 | Balance | CC | Y | Y
2 | DPD | CC | Y | Y
3 | Open Date | CC | Y | N
4 | Limit | CC | Y | Y
5 | Payment | CC | Y | Y
Data Check & Audit
• To determine scorecard development feasibility the following analysis will be
undertaken, to determine
• Depth of Data
• Breadth of Data
• For Numerical Characteristics, frequencies will be produced investigating the
descriptive statistics of mean, min, max to highlight any data anomalies
• For Categorical Characteristics the frequency tables will investigate that the
appropriate codes are populated correctly
• For Example
• Balances are within the expected ranges, i.e. all positive and no abnormally high values
• Charge-Off is expected to take values of A to E but a value of X is observed
• No. of Written Off accounts is expected to be 2% but 20% of cases have that status
• Additionally
• Data to code exclusion rules and good-bad definitions (including the default definition) exists and is well populated
• The supplied data is logical and no irregular trends are observed, examples include
• movement through delinquency buckets follows expected trends, e.g. it would be irregular for DPD to be 0 in Jan 2013 but 90 dpd in February
• Valid Open Dates exist
• DPD registers a zero value but the account is charged off
• Number of records matches the expected number of records
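
The frequency and descriptive-statistics checks above can be sketched in a few lines of standard-library Python. The function names and sample values are illustrative assumptions only; a real audit would run against the extracted files.

```python
from collections import Counter


def audit_numeric(values):
    """Descriptive statistics for a numerical characteristic (None = missing)."""
    present = [v for v in values if v is not None]
    return {
        "population_rate": len(present) / len(values),
        "min": min(present),
        "max": max(present),
        "mean": sum(present) / len(present),
        "negatives": sum(1 for v in present if v < 0),
    }


def audit_codes(values, valid_codes):
    """Return the frequency of any code that is not in the expected list."""
    counts = Counter(values)
    return {code: n for code, n in counts.items() if code not in valid_codes}


balance_audit = audit_numeric([100, 250, None, -5])
unexpected_codes = audit_codes(list("AABCEXX"), set("ABCDE"))
```

Here the unexpected charge-off code `X` and the negative balance are the kind of anomalies that would be raised in the Data Audit Report.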

Data Manipulation
Matching Data (outcome & performance)
• Valid Match Keys
• Sorted Match Keys on both files (application and performance)
• Explain any duplication (or remove)
• Check the range of the match keys
• Sense check match rates
• Verify that the merged file contains the relevant and correct data
• Investigate any unmatched records
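
As a rough illustration of the matching checks above, the sketch below compares the match keys of an application file and a performance file. The key values and the `match_report` helper are hypothetical.

```python
from collections import Counter


def match_report(application_keys, performance_keys):
    """Summarise duplicates, unmatched keys and the match rate between two files."""
    app_counts = Counter(application_keys)
    perf_counts = Counter(performance_keys)
    app_set, perf_set = set(app_counts), set(perf_counts)
    matched = app_set & perf_set
    return {
        "duplicate_app_keys": sorted(k for k, n in app_counts.items() if n > 1),
        "duplicate_perf_keys": sorted(k for k, n in perf_counts.items() if n > 1),
        "unmatched_app_keys": sorted(app_set - perf_set),
        "unmatched_perf_keys": sorted(perf_set - app_set),
        # Share of distinct application keys that found a performance record
        "match_rate": len(matched) / len(app_set) if app_set else 0.0,
    }


report = match_report(["A1", "A2", "A3", "A3", "A4"], ["A1", "A2", "A3", "A9"])
```

Duplicates and unmatched keys are listed rather than silently dropped so that each can be explained or removed before the merge.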

Data Needed for Application Scorecards
[Figure: timeline from the Observation Point to the Outcome Point]
Data as at point of Application For Credit (Observation Point):
• Appl Data: Historical Application Data, covering 12 Months of applications
• Existing Accounts: Account / Customer Data from existing client
• Credit Bureau: Customer Credit Data from other Financial Institutions
• Other Sources: Open Banking, alternative data sources
Outcome Period (typically 12 to 24 months):
• Perf Data: Monthly Account Performance Data
• Monthly account performance information for each month in the outcome, used to determine the GB Flag
Data Needed for Behavioural Scorecards
[Figure: timeline from the Observation Point to the Outcome Point]
Data as at point of scoring (Observation Point):
• Historic Account Data: Historical Account or Customer Data, covering 12 Months of performance
• Current Month Data: Account / Customer Data from existing Client in the month of observation
• Credit Bureau Data: Customer Credit Data from other Financial Institutions
• Other Sources: Open Banking, alternative data sources
Outcome Period (typically 6 to 12 months):
• Perf Data: Monthly Account Performance Data
• Monthly account performance information for each month in the outcome, used to determine the GB Flag
Data Preparation & Initial Analysis
‘Turning the (raw) data into information’
• Raw Data -> Data Derivation, e.g. Max Delq Last 3m, Min Balance last 6m etc
• Performance Definition, e.g. Bad – ever 90 dpd inside X months
• Sampling
[Figure: RE60 - Default Spike - Pop 1 vs Pop 2; y-axis 0.00% to 100.00%, x-axis 1 to 47, series pop 1 and pop 2]
Example of Raw to Derived – Turning Data into Information
Raw Data

Column ID | Column Description | Comment
CIF_ID | Customer ID |
FORACID | Loan Account ID |
OPEN_DATE | Loan Open Date |
MATURE_DATE | Loan Mature Date |
DISBURSED_AMOUNT | Loan Disbursed Amount |
SCHM_CODE | Loan Product Code |
DR_INTEREST_RATE | Interest rate |
BALANCE | Balance as of reporting date |
OVERDUE_PRINCIPAL | Overdue in principal |
OVERDUE_PDAY | Overdue days in principal |
OVERDUE_INTEREST | Overdue in interest |
OVERDUE_IDAY | Overdue days in interest |
TOTAL_AMT | total overdue amount | overdue principal + interest
REPORTDATE | report date |
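
A small sketch of turning monthly raw records into derived characteristics such as Max Delq Last 3m and Min Balance Last 6m. It assumes one record per loan per month with an ISO-formatted `REPORTDATE`; the derived names and the sample values are illustrative only.

```python
def derive_characteristics(snapshots):
    """Derive two characteristics from monthly records for a single loan.

    Each record is a dict with REPORTDATE (ISO string), BALANCE and OVERDUE_PDAY.
    """
    ordered = sorted(snapshots, key=lambda r: r["REPORTDATE"])
    last3 = ordered[-3:]   # three most recent months
    last6 = ordered[-6:]   # six most recent months
    return {
        "max_delq_l3m": max(r["OVERDUE_PDAY"] for r in last3),
        "min_balance_l6m": min(r["BALANCE"] for r in last6),
    }


snapshots = [
    {"REPORTDATE": "2014-07-31", "BALANCE": 1000, "OVERDUE_PDAY": 0},
    {"REPORTDATE": "2014-08-31", "BALANCE": 950, "OVERDUE_PDAY": 0},
    {"REPORTDATE": "2014-09-30", "BALANCE": 900, "OVERDUE_PDAY": 45},
    {"REPORTDATE": "2014-10-31", "BALANCE": 900, "OVERDUE_PDAY": 15},
    {"REPORTDATE": "2014-11-30", "BALANCE": 860, "OVERDUE_PDAY": 30},
    {"REPORTDATE": "2014-12-31", "BALANCE": 880, "OVERDUE_PDAY": 0},
]
derived = derive_characteristics(snapshots)
```

Only records dated on or before the observation point should be passed in, so that no outcome-period information leaks into the characteristics.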

Data Exercise – 15 Minutes

Raw Data:
'min_repordate', 'max_reportdate', 'CYCLE_COUNT', 'CYCLE_NUM', 'CYCLE_POINT', 'loan_product', 'CIF_ID', 'FORACID', 'OPEN_DATE', 'MATURE_DATE', 'DISBURSED_AMOUNT', 'BALANCE', 'OVERDUE_DAY', 'TOTAL_AMT', 'REPORTDATE', 'Delq Cycle'

Derived Data:
Please design 5 derived characteristics based upon the raw data
And what information are you engineering into the model?
Sample Window Selection
Considerations for the Sample Window chosen
• Volumes
• Is there enough volume within the good-bad categories? Typically, scorecards require at least 750 of each
• Do sufficient volumes exist for the suggested or desired scorecard segments?
• Is the Sample Window representative of the ‘Going Forward’
World?
• Long Outcome Windows may be less relevant
• Have there been, or are there expected to be, changes in the portfolio?
• Are the products offered today the same as in the past? How do they differ?
• Have there been any specific marketing events that need considering?
• Data Availability
• Are all the characteristics available for the entire observation period?
• Is retrospective bureau data available for the observation period?
• Seasonality
• Do special events in the year affect the flow of the portfolio, e.g. Christmas?
• Does selecting a 12-month observation window help to smooth out spikes in demand or delinquency?

Vintage Analysis – Length of the window
[Figure: two charts, "Time to Bad Analysis" and "Vintage Report". Annotation: choose the period when the bad rate is stable / flattens]

Examination of the Bad Rate over time will identify the
exposure required for the outcome period
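
A minimal sketch of a time-to-bad curve, assuming we know the month on book at which each bad account first went bad. The `first_stable_month` rule (monthly increase no greater than a chosen tolerance) is an illustrative assumption; in practice the flattening point is usually judged from the chart.

```python
def cumulative_bad_rate(months_to_bad, n_accounts, horizon):
    """Cumulative bad rate by month on book (index 0 is month 1)."""
    curve = []
    for month in range(1, horizon + 1):
        bads_so_far = sum(1 for t in months_to_bad if t <= month)
        curve.append(bads_so_far / n_accounts)
    return curve


def first_stable_month(curve, tolerance):
    """First 1-based month after which no monthly increase exceeds tolerance."""
    for i in range(1, len(curve)):
        if all(curve[j] - curve[j - 1] <= tolerance for j in range(i, len(curve))):
            return i
    return None


# Hypothetical cohort: 1,000 accounts, 8 of which go bad at the months shown
curve = cumulative_bad_rate([3, 5, 6, 6, 8, 9, 10, 12], n_accounts=1000, horizon=18)
stable_from = first_stable_month(curve, tolerance=0.0005)
```

In this toy cohort the last new bad appears in month 12, so a 12-month outcome period would capture all observed bads.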

Roll-Rate Analysis – Confirming Good-Bad Definition
Maintaining Stability

Example Roll-Rate Analysis:

Why Segment?
• Diverse Customer Profiles
• Improved Accuracy
• Segmentation enhances the accuracy of risk assessment by accounting for nuances within each customer segment, reducing the likelihood of misclassification.
• Key Reasons for Segmentation
• Risk Differentiation – there are differences within the risk profiles associated with differing segments
• Customized Strategies – segmentation gives the ability to tailor risk management strategies to each segment, optimizing resource allocation and improving overall risk outcomes
• Adaptive Modelling - segmentation allows for adaptive modelling, ensuring that risk assessments remain relevant and effective in dynamic retail environments
• Implementation Challenges – segmentation allows for differing data by segment to be assessed
• Increased Model Complexity – segmentation may add model risk into the process, increases model management overhead (monitoring, validation, management committees / approvals)

[Figure: example segmentation trees. Origination: App Pop split into salary (Bad Rate 1.2%) and Non-salary (Bad Rate 4.2%). Existing Customers / Existing Book (behavioural modelling): split into Clean last 12 mths and Dirty, with further splits labelled "< 60 dpd", "Curr" and "Pay Pred"]
Confirming Segmentation – Differing Performance
1. Is the Population large enough?
2. Are the GB Odds different?
3. Is the Risk Profile different?
4. Is the Population % distribution different?

Current Delq (left-hand table)
Interval | Known WoE | Known GB Odds | Totals | Totals %
0 | 3.66 | 40.72 | 453279 | 58.50%
1-10 | 3.24 | 125.86 | 77675 | 10.03%
11-25 | 2 | 39.84 | 65666 | 8.48%
26-40 | 1.32 | 13.37 | 63213 | 8.16%
41-50 | 0.34 | 8.18 | 54362 | 7.02%
51-65 | -1.3 | 5.77 | 23421 | 3.02%
66-80 | -1.8 | 4.5 | 16542 | 2.13%
81-100 | -2.1 | 2.77 | 12652 | 1.63%
101+ | -2.5 | 2.25 | 5643 | 0.73%
Others | 0.12 | 10.1 | 2352 | 0.30%
total | | 17.17 | 774805 | 100.00%

Current Delq (right-hand table)
Interval | Known WoE | Known GB Odds | Totals | Totals %
0 | 2.1 | 9.52 | 34500 | 32.20%
1-10 | 1.75 | 17.47 | 19876 | 18.55%
11-25 | 1.55 | 8.96 | 18249 | 17.03%
26-40 | 1.5 | 5.06 | 15986 | 14.92%
41-50 | 0.1 | 3.26 | 4500 | 4.20%
51-65 | -0.6 | 2.33 | 4032 | 3.76%
66-80 | -2.3 | 1.58 | 3333 | 3.11%
81-100 | -3.2 | 0.9 | 3121 | 2.91%
101+ | -4.65 | 0.49 | 2321 | 2.17%
Others | -1.2 | 6.66 | 1234 | 1.15%
total | | 2.46 | 107152 | 100.00%

Also compare Segment Characteristic Analysis Reports to the total
population Characteristic Analysis to assess total differences
Model Development
Model Prep – Univariate Analysis
• Variable Reduction
• Population Stability
• Correlation Analysis
• Business Rationale
• Characteristic Analysis
Model Construct – Modelling
• Dummy vs Weight of Evidence
• Linear Regression
• Logistic Regression
• Decision Trees
• Random Forests
Model Check – Models Diagnostics (Score Dist, Gini, KSI etc)
Modelling Process Flow – Application Scores
Modelling Samples -> Build Known Good-Bad Model -> Build Accept-Reject Model -> Reject Inference -> Build Final Model
• Accept-Reject Model and Reject Inference: Application Scores only
• Only one model for B-Score & C-Score
Customer Management Analytics - Sampling
Independent Variables (Observation Period, usually up to 12 months; known info @ Decision Point)
Data Elements can be at account or customer level OR be external bureau type chars
• Transactional Data, e.g. Delq L3M
• Bureau Data, e.g. No. of accounts
• Internal Data, e.g. Other Account Delq
• Contact History, e.g. £ of PTPs taken
Performance Variables (Outcome Period, dependent upon model purpose)
• Performance File: One Record per Account / Customer, with Good-Bad Flag
• Outcome is shorter for Colls Models
[Figure: timeline from the Observation Period through the Observation Point to the Outcome]
Introduction to Paragon Modeller
(www.credit-scoring.co.uk)

• Reading Modelling Data – Read the given dataset


• Basic Manipulation
• Assigning Good Bad definitions / Generating Characteristics

Exercise Discussion

• What have we learnt?


• Modeller Questions

Modelling Steps 2
Model Process – Univariate Analysis
PAST DATA -> Modelling -> Score -> Decide

Characteristic Reduction
• PSI
• Characteristic Analysis
• Fine to Coarse Classing
• Correlation Analysis
• Business Rationale

Statistical Modelling
• Linear Regression
• Logistic Regression
• Decision Trees
• Random Forests
Characteristic Reduction
• In Feature Engineering we may generate hundreds / thousands of characteristics
• Importance of Reducing Variables within the Model Selection Process:
• Simplicity and Interpretability (for model / product stakeholders, model validators / approvers &
regulators)
• More efficient to build, less opportunity for the developers to engineer in Model Risk
• Reduced likelihood of generalised model over-fitting (tailored specifically to the development
data)
• Easier to implement and maintain
• Efficiency in model development, it is impossible to analyse 1000s of characteristics
• Simpler (lower characteristic models) tend to be more robust and less prone (to deterioration) to
small changes in the operational environment, therefore are more reliable & robust for longer
• Simpler models have less monitoring, on-going modelling overhead, may introduce less Model
Risk

Correlation Considerations
• Correlation is a measure of how similar the information given by one
characteristic is to that given by another variable
• Highly correlated variables may interact with each other and cause the models to be over fitted
• Where multicollinearity exists the true relationship between variables may be hidden
• Where correlation exists the opinions of model or portfolio experts may be blunted
• Within the scorecard arena we typically utilise Pearson or Spearman correlation metrics

Pearson’s Correlation Co-Efficient:
$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} $$
Where $X_i$ and $Y_i$ are the individual data points, and $\bar{X}$ and $\bar{Y}$ are the means of X and Y, respectively

Spearman Rank Correlation Coefficient:
$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
Where $d_i$ is the difference between the ranks of the corresponding data points
X and Y and n is the number of data points
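
The two coefficients can be computed directly from their definitions. The sketch below is a plain-Python illustration; the simple ranking helper assumes there are no tied values, and the sample data is made up.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation using the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotonic in x, but not linear
r = pearson(x, y)
rho = spearman(x, y)
```

Because `y` always rises with `x`, the rank-based Spearman coefficient is exactly 1, while Pearson is slightly lower because the relationship is not a straight line.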

Selection Statistics
Weight of Evidence / Value of Information

• Weight of Evidence is a relationship between the proportion of goods and
bads with a particular attribute of a cell
• The Value of Information or Information Value is a measure of predictive power
across the variable

Weight of Evidence
$$ \mathrm{WoE} = \ln\left(\frac{\%\,\text{goods}}{\%\,\text{bads}}\right) $$

Value of Information
$$ \mathrm{IV} = \sum \left(\%\,\text{goods} - \%\,\text{bads}\right) \times \mathrm{WoE} $$

Range | Pred. Power | Action
<= 0.02 | Very Weak | Exclude / Drop
> 0.02 – <= 0.1 | Weak | Exclude / Drop / Review
> 0.1 – <= 0.3 | Medium | Include / Keep / Review
> 0.3 – <= 0.5 | Strong | Include / Keep
> 0.5 | Very Strong | Include / Keep / Review
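
A short sketch of the WoE and IV calculations, assuming the common convention that % goods is the share of all goods falling in an attribute (and likewise for bads). The counts are made up for illustration.

```python
import math


def woe_iv(goods, bads):
    """Return (list of WoE per attribute, Information Value).

    goods / bads are counts per attribute, listed in the same order.
    Assumes every attribute has at least one good and one bad.
    """
    total_goods, total_bads = sum(goods), sum(bads)
    woes, iv = [], 0.0
    for g, b in zip(goods, bads):
        pct_goods, pct_bads = g / total_goods, b / total_bads
        woe = math.log(pct_goods / pct_bads)
        woes.append(woe)
        iv += (pct_goods - pct_bads) * woe
    return woes, iv


woes, iv = woe_iv(goods=[400, 300, 200, 100], bads=[10, 20, 30, 40])
```

With these counts the WoE falls steadily from positive to negative across the attributes, and the IV of roughly 0.91 would land in the "Very Strong" band, which is also the band the table marks for review.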

Exercise – Modeller Field Reduction
• Let’s play with the Field Reducer in Modeller
• Discuss key fields / reduction parameters
• Review the results
• What’s in the keep list?
• Anything worthwhile or business worthy in the delete list?
Characteristic Analysis
• At a characteristic level the developer wishes to compare the distribution of goods
and bads across the attributes in order to ensure that predictive trends are
consistent and intuitive with perceived business logic
• The predictive trends across the attributes should be broadly linear in an
increasing or decreasing manner (dependent upon the characteristic)
• Characteristics with counter-intuitive (unexpected) trends will be excluded from
further analysis unless the trend can be explained due to business process and
procedures, e.g. for product A, older customers are higher risk.
• Additionally, we try to keep greater than 50 goods and 50 bads in each
characteristic group
• Records should not be overly concentrated in one attribute band
• We try to reduce the number of attribute groups to 6 or 7 (for low volume
scorecards there are likely to be fewer groups, perhaps 4 or 5)
• Groups are combined into classes of similar risk (risk can be measured by GB
Odds, Bad Rate or Weight of Evidence (WoE))
• Categorical fields should be grouped into like meaning categories, e.g. owned and
mortgaged property

Measures of Characteristic Performance
• The developer wishes to ensure that the characteristics used within the model exhibit a degree of
stability over time (ensuring that a robust model is constructed)
• The Population Stability Index (PSI) is the standard best practice measure used to determine stability
across samples drawn from different time periods
• The measure illustrates the discrepancy between two populations (development versus validation),
when used with individual characteristics it will show whether the information being given has
changed
• PSI <= 0.1 means that the two populations are sufficiently stable for the characteristic to be considered
for model entry

PSI

Attribute | 2012 | 2012 Percentage (A) | 2014 | 2014 Percentage (B) | C = A - B | D = Ln(A/B) | E = C*D
Current | 229,195 | 0.56 | 233,923 | 0.57 | -0.01 | -0.0117 | 0.0001
X-dpd | 107,295 | 0.26 | 117,042 | 0.29 | -0.02 | -0.0782 | 0.0017
30 dpd | 26,567 | 0.07 | 16,158 | 0.04 | 0.03 | 0.5060 | 0.0132
60 dpd | 17,824 | 0.04 | 15,284 | 0.04 | 0.01 | 0.1625 | 0.0011
90 dpd | 9,967 | 0.02 | 7,503 | 0.02 | 0.01 | 0.2927 | 0.0018
120+ dpd | 14,847 | 0.04 | 19,339 | 0.05 | -0.01 | -0.2556 | 0.0027
Total | 405,695 | 1.00 | 409,249 | 1.00 | | | 0.0205
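
The PSI calculation in the table above can be reproduced with a few lines of Python. The function name is an assumption; the counts are the 2012 and 2014 attribute volumes from the table.

```python
import math


def population_stability_index(counts_a, counts_b):
    """PSI = sum over attributes of (A - B) * ln(A / B), with A and B as proportions."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    psi = 0.0
    for a, b in zip(counts_a, counts_b):
        pct_a, pct_b = a / total_a, b / total_b
        psi += (pct_a - pct_b) * math.log(pct_a / pct_b)
    return psi


# Attribute order: Current, X-dpd, 30 dpd, 60 dpd, 90 dpd, 120+ dpd
counts_2012 = [229195, 107295, 26567, 17824, 9967, 14847]
counts_2014 = [233923, 117042, 16158, 15284, 7503, 19339]
psi = population_stability_index(counts_2012, counts_2014)
```

Using unrounded proportions gives a PSI of about 0.0205, below the 0.1 threshold, so this characteristic would be treated as stable.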

Measures of Characteristic Performance
• The developer wishes to ensure that the characteristics used within the model are predictive, and
utilises the Information Value (IV), also referred to as the Value of Information (VOI) as a measure of
relative power
• Measures the discrepancy between goods and bads, taken from the same time based sample
• A Low value of VOI indicates that the good and bad populations are similar and is therefore not
predictive, i.e. will not help to separate good and bad accounts or customers

VOI

Attribute | Good | Good Percentage (A) | Bad | Bad Percentage (B) | C = A - B | D = Ln(A/B) | E = C*D
Current | 226,900 | 0.56 | 1,195 | 0.00 | 0.56 | 5.2551 | 2.9238
X-dpd | 105,800 | 0.26 | 897 | 0.00 | 0.26 | 4.7790 | 1.2358
30 dpd | 25,692 | 0.06 | 674 | 0.00 | 0.06 | 3.6494 | 0.2251
60 dpd | 16,889 | 0.04 | 886 | 0.00 | 0.04 | 2.9564 | 0.1167
90 dpd | 8,056 | 0.02 | 1,910 | 0.00 | 0.02 | 1.4480 | 0.0220
120+ dpd | 9,653 | 0.02 | 5,211 | 0.01 | 0.01 | 0.6252 | 0.0069
Total | 392,990 | 0.97 | 10,773 | 0.03 | | | 4.5303

Measures of Characteristic Performance
• The developer wishes to ensure that the characteristics used within the model are not overly
correlated, i.e. telling the same piece of information
• Correlation causes characteristics to come into the model with conflicting predictive trends that may
be counter-intuitive and out of line with earlier project analysis (characteristic analysis trends)
• Where characteristics have a correlation co-efficient greater than 0.3 with each other (the threshold
can be changed), the characteristic with the highest VOI should be selected for regression consideration

Characteristic | VOI

Worst Delq in the last month 2.61

Worst Delq in the last 3 months 2.73

Worst Delq in the last 6 months 2.68

Worst Delq in the last 12 months 2.71

Characteristic Analysis

Attribute | Good | Good % | Bad | Bad % | GB Odds | WoE | Total | Total %
Current | 226,900 | 0.56 | 1,195 | 0.0029 | 189.8745 | 5.255085 | 228,095 | 56.49%
X-dpd | 105,800 | 0.26 | 897 | 0.0022 | 117.9487 | 4.778972 | 106,697 | 26.43%
30 dpd | 25,692 | 0.06 | 674 | 0.0016 | 38.11869 | 3.649427 | 26,366 | 6.53%
60 dpd | 16,889 | 0.04 | 886 | 0.0022 | 19.06208 | 2.956423 | 17,775 | 4.40%
90 dpd | 8,056 | 0.02 | 1,910 | 0.0047 | 4.217801 | 1.448036 | 9,966 | 2.47%
120+ dpd | 9,653 | 0.02 | 5,211 | 0.0127 | 1.852428 | 0.625219 | 14,864 | 3.68%
Total | 392,990 | 0.97 | 10,773 | 0.03 | 36.47916 | | 403,763 | 100.00%

[Figure: bar chart "Max Delq Last 3 Months" (y-axis 0 to 200) across Current, X-dpd, 30 dpd, 60 dpd, 90 dpd, 120+ dpd. Predictive Trend Observed]
Characteristic Analysis

Attribute | Good | Good % | Bad | Bad % | GB Odds | WoE | Total | Total %
<= 0 | 106,811.00 | 27.19% | 3,655.00 | 34.03% | 29.22326 | 0.799194 | 110,466.00 | 27.37%
1-10k | 98,625.00 | 25.11% | 1,374.00 | 12.79% | 71.77948 | 1.963016 | 99,999.00 | 24.78%
10k-50k | 36,911.00 | 9.40% | 1,289.00 | 12.00% | 28.63538 | 0.783117 | 38,200.00 | 9.47%
50k-100k | 79,624.00 | 20.27% | 1,641.00 | 15.28% | 48.52163 | 1.326964 | 81,265.00 | 20.14%
100k-500k | 46,243.00 | 11.77% | 2,134.00 | 19.87% | 21.66963 | 0.592619 | 48,377.00 | 11.99%
500k + | 24,577.00 | 6.26% | 649.00 | 6.04% | 37.86903 | 1.035638 | 25,226.00 | 6.25%
Total | 392,791.00 | 100.00% | 10,742.00 | 100.00% | 36.56591 | | 403,533.00 | 100.00%

[Figure: bar chart "Balance Last Month" (y-axis 0 to 80) across <= 0, 1-10k, 10k-50k, 50k-100k, 100k-500k, 500k +. Non-Predictive Trend Observed]
Characteristic Reduction for Modelling
[Figure: reduction funnel starting from 100 Characteristics]
• Stability: 80 kept (PSI <= 0.1), 20 dropped (PSI > 0.1)
• Predictive: 70 kept, 10 dropped (No)
• Correlation: 50 kept, 20 dropped (No)
• Characteristic Analysis (Linear / Intuitive Trend): 45 kept, 5 dropped (No)
• Put into model after Fine & Coarse Classing
Characteristic Classing
• For numerical characteristics the first step of the process is to produce a set of fine classed
characteristics (it is usual to break the characteristics into 20 fine classes, using a statistical ranking
mechanism)
• The process converts the continuous nature of the variable into a categorical characteristic
• Group manually to reduce the number of attributes by ensuring that fine classes with similar GB Odds
or Weights of Evidence are coarse classed together
• The Weight of Evidence of each Coarse class can be used within the regression analysis (alternatively
Dummies can be modelled)
• Coarse Classes should be created on the following principles:-
• Increasing / decreasing trends (WoE or GB Odds)
• Fewer than 8 coarse classes (for low data models this may be reduced to 4)
• There should be greater than 2% of the population in each class
• No. of bads in each coarse class should be greater than 50 or 1% of all bads
• Unknown classes should be assigned to neutral classes, where WoE is close to zero

Exercise – Grouping / Classing / Binning
• Scorecard Building Starts now
• Auto-grouping
• Grouping
Model Development
Model Development (Linear, Logistic Regression, other model types) -> Model Diagnostics (Gini, KS, Correlation) -> Validation (Hold-out, Bootstrapping, Out of Time)
Comparing Linear Vs Logistic Regression

Linear Regression
Advantages
• Interpretability
• Widely Applicable
• Simplicity
• Can be used with continuous outcomes
Disadvantages
• Assumes linearity
• Sensitive to Outliers
• Limited in that it is not well suited to binary outcomes

Logistic Regression
Advantages
• Requires Binary Outcomes
• Produces probability based estimates
• Less sensitive to outliers
Disadvantages
• Complex Interpretation
• Limited to linear division boundaries
Linear Regression Vs Logistic Regression (Equations)

General form of Linear Regression
$$ \hat{Y} = \alpha + \sum_j \beta_j x_j $$
Where:
Y: dependent variable
α: general intercept
β: co-efficient applied to the explanatory variable
x: explanatory variable

In Scoring:
Y: total score
α: constant
β: co-efficient applied to the characteristics
x: explanatory variable (e.g. age, income, sex)
350 [Y] = 200 [α] + 50 x Age30-40 + 40 x Income50k+ + 60 x SexFemale

General form of Logistic Regression
$$ \Pr(x) = \frac{e^{\alpha + \sum \beta x}}{1 + e^{\alpha + \sum \beta x}} $$
The output of the logistic regression model is a probability from 0 to 1

Other Form of Logistic Regression
$$ g(x) = \ln\left(\frac{\Pr(x)}{1 - \Pr(x)}\right) = \alpha + \sum \beta x $$
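
The two forms of the logistic regression are inverses of one another, which the sketch below checks numerically. The intercept, coefficients and inputs are arbitrary illustrative values.

```python
import math


def linear_predictor(alpha, betas, xs):
    """alpha + sum(beta * x): the right-hand side shared by both forms."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


def logistic_probability(alpha, betas, xs):
    """Pr(x) = e^z / (1 + e^z), always between 0 and 1."""
    z = linear_predictor(alpha, betas, xs)
    return math.exp(z) / (1 + math.exp(z))


def log_odds(p):
    """g(x) = ln(Pr(x) / (1 - Pr(x))), which recovers the linear predictor."""
    return math.log(p / (1 - p))


alpha, betas, xs = -1.0, [0.8, 0.5], [1, 2]
z = linear_predictor(alpha, betas, xs)          # -1.0 + 0.8 + 1.0 = 0.8
p = logistic_probability(alpha, betas, xs)
recovered = log_odds(p)
```

Because the log odds are linear in the characteristics, a score built as a sum of points per attribute maps directly onto the logistic model.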

Comparing Linear Vs Logistic Regression Graphically
[Figure: two scatter plots of GB Odds (y-axis) against Age (x-axis), one fitted with LINEAR REGRESSION and one with LOGISTIC REGRESSION]
Logistic Regression tends to fit to more of the datapoints, for example predicting score by age & GB Odds
Dummy Models Versus Weight of Evidence

Dummy Models
Advantages:-
• Characteristic Attributes split into dummy variable
• Missing values easily handled
• Encodes monotonically
• Effective in binary classification
Disadvantages:-
• Curse of dimensionality when there is a large number of attributes
• Not used in continuous outcome models
• Assumes a linear relationship between variables and target
• Does not handle missing variables well

WoE Models
Advantages:-
• Characteristic Attributes assigned their respective weight of evidence
• Applicable to all model types
• Easy to interpret as each attribute becomes a variable
• Variable contribution is easily observed
Disadvantages:-
• Increased effort to prepare modelling sample (coding inefficiency)
• Non-monotonic relationships may not be desirable
• Tends to model the extremes
Score Creation and Scaling (PDO)

Score creation
• To create the score for Dummy Models
• Multiply the Parameter Estimate by a user determined factor
• Sum across all characteristics, including the intercept
• To create score for WoE Models
• Multiply the Parameter by the calculated attribute level WoE
• Sum across all model characteristics, including the intercept

Points to Double the Odds
• Points to Double the Odds (pdo), often 20 or 50
• Factor = pdo/ln(2)
• Offset = Score – (Factor * ln(odds))
• Scale the scorecards for comparison across the different models
• Comparison is only meaningful if the same performance definition is used
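
A worked sketch of the Factor and Offset formulas above. The anchor point (a score of 600 at odds of 50:1) and the pdo of 20 are assumed values chosen for illustration.

```python
import math


def scaling_parameters(pdo, anchor_score, anchor_odds):
    """Factor = pdo / ln(2); Offset = Score - Factor * ln(odds)."""
    factor = pdo / math.log(2)
    offset = anchor_score - factor * math.log(anchor_odds)
    return factor, offset


def scaled_score(odds, factor, offset):
    """Score for a given good:bad odds under the chosen scaling."""
    return offset + factor * math.log(odds)


factor, offset = scaling_parameters(pdo=20, anchor_score=600, anchor_odds=50)
score_at_anchor = scaled_score(50, factor, offset)
score_at_double = scaled_score(100, factor, offset)
```

Doubling the odds from 50:1 to 100:1 raises the score by exactly the pdo, from 600 to 620, which is what makes scores from different scorecards comparable once they share the same scaling.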
Model Diagnostics - Kolmogorov–Smirnov (KS)
[Figure: two charts of cumulative goods and bads (y-axis 0 to 1) by score band (<= 451 up to > 903): "Example of Maximum Separation (KS)" and "Skewed KS away from Decision Point". KS values shown on the charts: 42.96 and 67.7]

• Maximum difference between cumulative % goods and cumulative % bads
• Advantages: -
• Standard scale 0.0 - 1.0 comparable across models
• Identifies area where Scorecard works best (i.e. max separation)
• Disadvantages: -
• Point of maximum difference may not fall at the decision point
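
A minimal sketch of the KS calculation from banded counts, assuming the bands are ordered from the lowest score to the highest. The counts are invented for illustration.

```python
def ks_statistic(goods_by_band, bads_by_band):
    """Return (KS, index of the band where the maximum gap occurs).

    KS is the largest absolute gap between cumulative % bads and cumulative % goods.
    """
    total_goods, total_bads = sum(goods_by_band), sum(bads_by_band)
    cum_goods = cum_bads = 0
    best_gap, best_band = 0.0, None
    for band, (g, b) in enumerate(zip(goods_by_band, bads_by_band)):
        cum_goods += g
        cum_bads += b
        gap = abs(cum_bads / total_bads - cum_goods / total_goods)
        if gap > best_gap:
            best_gap, best_band = gap, band
    return best_gap, best_band


ks, band = ks_statistic(goods_by_band=[10, 20, 30, 40], bads_by_band=[40, 30, 20, 10])
```

Returning the band alongside the statistic makes it easy to see whether the point of maximum separation sits near the intended cut-off.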

Model Diagnostics – Gini Co-Efficient / Lorenz Curve / RoC Curve
[Figure: two Lorenz curves plotting % Bads (y-axis, 0 to 1) against % Goods (x-axis, 0 to 1): "Example of Gini Plot" and "Example of a Skewed Gini". Gini co-efficients shown on the charts: 56.26 and 83.24]

• Measures the difference between the cumulative distributions of goods and bads by score
• Advantages: -
• Standard scale 0% - 100% comparable across models
• Disadvantages: -
• Separation may be skewed to high or low scores only
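
One common way to compute the Gini co-efficient from banded data is the trapezium rule on the Lorenz curve, with Gini = 2 x area under the curve - 1. The sketch below assumes bands ordered from lowest to highest score and uses invented counts.

```python
def gini_coefficient(goods_by_band, bads_by_band):
    """Gini from the Lorenz curve of cumulative % bads against cumulative % goods."""
    total_goods, total_bads = sum(goods_by_band), sum(bads_by_band)
    area = 0.0
    x_prev = y_prev = 0.0
    cum_goods = cum_bads = 0
    for g, b in zip(goods_by_band, bads_by_band):
        cum_goods += g
        cum_bads += b
        x, y = cum_goods / total_goods, cum_bads / total_bads
        area += (x - x_prev) * (y + y_prev) / 2   # trapezium between two points
        x_prev, y_prev = x, y
    return 2 * area - 1


gini = gini_coefficient(goods_by_band=[10, 20, 30, 40], bads_by_band=[40, 30, 20, 10])
no_power = gini_coefficient([25, 25, 25, 25], [25, 25, 25, 25])
```

A scorecard with no discriminatory power follows the diagonal and returns 0, while the example above returns 0.5 (50%).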
Reject Inference
By building a new model with reject inference we wish to maximise the number of ‘new goods’ from the old Reject
population and minimise the number of ‘new bads’ from the previous Accepted population
[Figure: two charts of Probability (y-axis) against Score (x-axis, 300 to 850). "Known Performance" plots P(G) and P(A); "Known Performance with Rejects" plots P(G), P(A) and Rejected P(G). Labels on the charts: Prev Accepts, Prev Rejects, New Goods, New Bads]
Drag the Rejects P(G) down as we assume rejects would perform worse than known population
Reject Inference – Alternative Data (Bureau / Open Banking)

• Bureau Data / Open Banking Data


• Search for similar loan types applied / approved shortly after reject at other
lenders (does the bank have access to the data)
• Check performance of loans that were written, is this indicative of how the
loan would have performed had we accepted it
• Compare the populations of rejects with the performance of known
performance
• Historic Loans (at bank)
• Other Lending Products

Model Validation
[Figure: Model Diagnostics / Validation, with three approaches: Hold-Out, Boot-strap, Out of Time / Sample]
• Model Validation should be carried out
• as part of the development process
• as an independent verification of model quality prior to model implementation or use
• periodically to confirm that modelling assumptions made at point of development hold and that the model continues to work as designed
• Validation often uses
• a hold-out sample (develop model on 80% of sample, test on 20%)
• Boot-strapped validation (random samples pulled from the development sample to test the model)
• Out of time data sample
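
A rough sketch of boot-strapped validation: records are resampled with replacement and the Gini is recalculated on each resample to see how stable it is. The pairwise Gini calculation, the number of resamples and the fixed seed are illustrative assumptions, and the tiny dataset is invented.

```python
import random


def gini_from_scores(scores, is_bad):
    """Gini = 2 * AUC - 1, where AUC is the share of good/bad pairs in which
    the good has the higher score (ties count as half)."""
    goods = [s for s, bad in zip(scores, is_bad) if not bad]
    bads = [s for s, bad in zip(scores, is_bad) if bad]
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    return 2 * wins / (len(goods) * len(bads)) - 1


def bootstrap_gini(scores, is_bad, n_samples=200, seed=42):
    """Return (min, mean, max) Gini across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    results = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_bad = [is_bad[i] for i in idx]
        if all(sample_bad) or not any(sample_bad):
            continue  # a resample needs both goods and bads
        results.append(gini_from_scores([scores[i] for i in idx], sample_bad))
    return min(results), sum(results) / len(results), max(results)


scores = [700, 680, 650, 640, 500, 480]
is_bad = [False, False, False, False, True, True]
lowest, average, highest = bootstrap_gini(scores, is_bad)
```

A wide spread between the lowest and highest resampled Gini would suggest the model's apparent power depends heavily on a small number of records.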
Exercise – Modelling
• Let’s build models
• Model Assessment
• Refinement
Other Models to Consider – IRB / IFRS9

• IRB Models look to utilise multiple models to predict the capital requirements of
the bank (by modelling the Probability of Default (PD), Exposure at Default (EAD)
and Loss Given Default (LGD))
• IRB Models should be adjusted for cyclicality across the economic cycle
• Model weaknesses need to be addressed with a Margin of Conservatism adjustment
• IFRS9 models typically utilise similar modelling constructs but consider Lifetime
PD, EAD and LGD
• Revolving products may have complicated EAD models that aim to predict the
amount of the limit that will be consumed by the time of default
• LGD for mortgage products can also be complicated by the recovery processes
within the bank and the legal structure / environment in which they operate
• Monitoring and Validation of models is also important (covered in a separate
session)

Why is Model Risk Important?

• What is Model Risk?


• Financial impact associated with utilising models to make key decisions within a bank or financial
institution
• Reputational risk from making poor decisions
• Arises from model error and prediction inefficiency
• Propagated by lack of appropriate controls

• Why is Model Risk important?


• Models are increasingly used in an ever-expanding range of operational and regulatory decisions
• If not understood and managed the risk can aggregate to a level that is outside the bank’s risk appetite
• As with all risks, if not managed a financial loss could occur

• Where does Model Risk arise from?


• From each and every aspect of the model lifecycle
• We’ll focus on model development here

Regulatory Landscape
• Model Risk Regulations or Guidelines have been published by a number of
Regulators, including
• Federal Reserve (SR11-7 – Guidance on Model Risk - April 2011)
• Bank of England (SS1/23 – Model Risk Management Principles for Banks – May 2023)
• CBUAE (Model Management Standards (MMS) & Guidelines (MMG) – December 2022)
• ECB (Guide to Internal Models – Oct 2019)

• The Guidance can be


• Principles based – allows a great deal of freedom in terms of how the models are managed as long as management
of the models follows a general path
• Prescriptive – sets in stone how models should be developed and managed

• More developed markets with sophisticated banking groups tend to have gone
down the principles-based approach, whereas developing markets are more
prescriptive

• Many global regulators are yet to release guidance, firms in those regions may
want to get ahead of the curve

Thank You
www.apds-analytics.com
Risk Modelling Training

by

APDS Consulting

www.apds-analytics.com
