Scorecard Training Slides 2024 v5
Training
20th / 21st June 2024
by
APDS Consulting
(Matthew Freeman)
www.apds-analytics.com
Scorecard Training Agenda
• Introduction
• Credit Analytics and Scoring Concepts
• Decision Areas
• Origination
• Behavioural
• Credit Propensity
• Segmentation
• Characteristic Selection
• Characteristic Classing
• Modelling
• Scorecard Scaling
• Model Diagnostics & Validation
• Score Level – PSI, Gini, KS & Rank Ordering
• Characteristic Analysis
• Conclusions
APDS Consulting - Matthew Freeman Biography
Matthew has twenty-plus years of retail financial services industry experience, predominantly within the analytical risk management sphere, covering diverse geographies ranging from the United Kingdom and Western Europe, throughout the Middle East and India, China, Asia (North and across the South East) and Australasia (Australia and New Zealand).
Much of this experience has been gained within the big credit risk consulting firms (such as Experian and FICO) and with large banking and finance groups (such as Lloyds, Barclays, United Overseas Bank and GE Capital), developing both operational and regulatory models. The operational models are supervised, segmented application and behavioural scorecards (incorporating statistical and business-oriented segmentation), built utilising linear or logistic regression; the regulatory models are Basel II & IFRS9 PD, EAD and LGD models. These utilise a wide range of internal and external datasources, which have undergone detailed and specific quality checks to ensure that they are fit for the modelling / analytical purpose for which they are intended. Models and associated strategies have been developed for all the major consumer product groups such as cards, loans and mortgages.
More recently he has started to consider how the new, exciting and developing datasources thrown up by the Big Data revolution can be used to help both traditional and new lenders to exploit the opportunities that have been created. This involves reviewing current data scenarios, whilst identifying any data gaps and therefore the need for enhanced data collection that fits with the intended business (problem solving) use of the data. Examples include identifying potential high net worth individuals for a wealth management bank, sourcing new customers in a risk responsible manner where traditional credit data is thin but newer innovative sources are abundant, and developing risk and marketing models for the pre-paid mobile telephone sector in Asia.
APDS Clients
• Scorecard Training
• Model Risk Management Gap Analysis across the Middle East
• Marketing Analytics and Modelling for a top 5 Saudi Bank
• Generic Visa Scorecards across 4 countries
Training and Mentoring – Foundation to Expert Level
• Foundation (Training to allow project participation)
• Intermediate (Portfolio Monitoring and joint development of new models and strategies)
• Advanced (Internal Model Development, advanced strategies, consultants as trusted advisors)
3 years
The Analytics Journey
[Figure: analytics maturity on a Low to High scale, with stages labelled Rules, Expert/Generic Scores, Profitability, Data Driven and Optimisation]
Credit life cycle Analytics – Decision & Prediction
Decisions
Whom to target Approve/Decline Line Management Collections priority
Product to offer Line/Loan/Lease Re-pricing/Renewal Collection action
Channel amount
Authorizations DCA placements
Timing Price
Cross-sell Channel placements
Up-sell
Early Collections
Predictions
Response Risk Risk Amount collectible
Revenue Revenue Revenue Charge-off
Risk Capacity Attrition Bankruptcy
Pre-payment Capacity Roll
Fraud Fraud
Credit life cycle Analytics - Results
Better Results
Questions
Scoring Concepts
Analytics & Scoring - The Concept of Modelling
[Figure: a Data Universe (Internal Data, Bureau Data, Macro Econ Data, Open Banking Data, Alt Data) feeds Data Science / AI or ML, which produces Generic Models, Custom Models, Regulatory Models, Stress / Scenario Models and Advanced Combo Models, leading to Future Predictions and Strategic Use of Models & Predictions]
Describe Datasources
• Scorecards add and subtract points to a baseline constant according to each individual’s or account’s data
EXAMPLE SCORECARD
Applicant Age in Years | Points
< 22 | -50
22 – 25 | -20
26 – 40 | 0
41 – 55 | +30
The Role of Analytics & Scoring
Consider a scorecard built to predict whether a new applicant for a credit product will default on their
payments within time X
This scorecard is used when a new customer applies…
[Figure: the LOAN APPLICATION (name, loan amount, purpose, demographics) supplies Application Form Data, which is combined with Alternate Data (Rider or Customer Characteristics), Open Banking and / or Bureau Data and Previous Loan Behaviour; the example scorecard turns these into a RISK Score, which is used to take the most appropriate action for each individual]
EXAMPLE SCORECARD
Applicant Age in Years | Points
< 22 | -50
22 – 25 | -20
26 – 40 | 0
41 – 55 | +30
> 55 | 0
Time as Rider | Points
< 1 | 0
1 – 2 | -45
3+ | -100
Worst Payment Status | Points
Current | 0
1 Payment missed | -10
2 Payments missed | -60
Etc. | …
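As a rough illustration of how the example scorecard above produces a score, the sketch below adds the attribute points to a baseline constant. The points are copied from the example; the baseline of 800 is an assumption borrowed from the constant shown in the later application scoring example, and the function names are hypothetical.

```python
# Minimal sketch of additive scorecard scoring.
# Assumption: BASELINE = 800 (the slide's example does not state its constant).
BASELINE = 800


def age_points(age_years):
    """Points for 'Applicant Age in Years', using the bands in the example."""
    if age_years < 22:
        return -50
    if age_years <= 25:
        return -20
    if age_years <= 40:
        return 0
    if age_years <= 55:
        return 30
    return 0


def time_as_rider_points(whole_years):
    """Points for 'Time as Rider' in whole years, as printed on the slide."""
    if whole_years < 1:
        return 0
    if whole_years <= 2:
        return -45
    return -100


# Points for 'Worst Payment Status'; the slide's 'Etc.' rows are not filled in.
WORST_PAYMENT_STATUS_POINTS = {
    "Current": 0,
    "1 Payment missed": -10,
    "2 Payments missed": -60,
}


def score_applicant(age_years, whole_years_as_rider, worst_payment_status):
    """Baseline constant plus the points of each characteristic's attribute."""
    return (
        BASELINE
        + age_points(age_years)
        + time_as_rider_points(whole_years_as_rider)
        + WORST_PAYMENT_STATUS_POINTS[worst_payment_status]
    )
```

For example, a 45 year old with under a year as a rider and a Current payment status scores 800 + 30 + 0 + 0 = 830 under these assumptions.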
Using the Scores in Decisions
APPLICATION SCORE (Low Score to High Score)
• Very High Risk: Worst applicants. Reject these applicants.
• High Risk: Reject these applicants or give low limits and apply higher pricing.
• Standard Risk: Good applicants: Accept.
• Very Low Risk: The best applicants. Consider higher limits.
Analytics Across Business Areas
GDS Link Asia
Business Area: Prospecting
• Challenge: Who is the right customer to target? How can we cost-effectively market to the right customer? Can we pre-screen the risk associated with our targets?
• Model: Propensity to Respond; Propensity to Take-Up
• Data: Alternative Data; Demographics; Account Management
• Business Outcome: Improve Customer Loyalty; Reduce Marketing Costs; Increased Marketing Efficiency; Reduced Attrition; Increased Revenue / Profits
Business Area: Origination
• Challenge: Who to accept and reject? Under what terms and conditions? Can I x-sell additional products to the applicant?
• Model: Application Scores; Bureau Scores; Application Fraud
• Data: Demographics; Bureau Data; Alternative Data; Account Management
• Business Outcome: Risk Profiles; Improved Customer Service; Reduced Losses
Business Area: Customer Management
• Challenge: Which of my existing customers can we x-sell or up-sell additional products and services to? Which customers can we offer an increase in limit? How do we determine pre-delinquency treatments?
• Model: Behavioural Scores
• Data: Demographics; Bureau Data; Alternative Data; Account Management
• Business Outcome: Risk Profiles; Improved Customer Service; Increased Market Share; Increased Profits; Reduced Losses
Business Area: Usage
• Challenge: How can we identify the customers most likely to use the product more? How do we identify the profitable customers that are likely to close their facilities?
• Model: Revenue Models; Utilisation; Attrition
• Data: Demographics; Bureau Data; Alternative Data; Account Management
• Business Outcome: Increased Revenue / Profits
Business Area: Collections
• Challenge: Which delinquent customers do we collect upon? How do we prioritise collection treatments? Which accounts do we collect, litigate, sell or write-off?
• Model: Collections Scores; Payment Projection; Loss Forecasts
• Data: Demographics; Bureau Data; Alternative Data; Account Management; Collections Management
• Business Outcome: Reduced Losses; Increased Recoveries
Business Area: Regulatory Models
• Challenge: How much capital do we need? Can we estimate our provisions based upon the lifetime of the facility? How will the economy affect our capital and provisioning positions in the future?
• Model: PD, EAD and LGD models for AIRB; PD, EAD and LGD Models for IFRS9; Macroeconomic models for stress testing and scenario analysis
• Data: Demographics; Bureau Data; Alternative Data; Account Management; Collections Management; Macroeconomic Data
• Business Outcome: Capital Planning; Expected Loss Calculation for Provisioning; Long Term Business Planning
Benefits of Credit Scoring
• Increased Portfolio Knowledge
• Increased Regulatory Standing
• Optimised Portfolios
• Engaged Staff
• Maximised Opportunity
Questions
Exercise 1 (1 hour)
• List three areas where the concept of scoring could be applied to your bank / organisation
• What’s the business problem that could be addressed?
• What data would be used or could be useful?
• Discuss the business case for doing such a thing
• Ideation sessions
Decision Areas
Acquisition Analytics
• Is the bank targeting the right customers?
• Is the bank using Alternative Data to target thin file or unbanked customers?
• Is the bank acquiring customers based upon the bank’s risk appetite and assigning appropriate terms and conditions
based upon the risk profile?
• Does the bank capture and store the right data for the use within scoring and strategy deployment?
• Is the bank maximising the benefits of using data from both internal and external datasources (bureau data / alternative data)?
• Is the bank satisfying the prevailing regulatory compliance criteria?
Analytics in the Acquisition Area
Application Scoring - Overview
• Application scorecard using Application Demo data
• Can use Bureau data, and Internal other Product data
• Usually use past Delinquency behaviour to predict outcome, i.e. 90+DPD Default definition
Benefits:
• Reduction in Bad debts
• Aim of high level of Automation
• Tool for Risk Based Pricing
Bureau Benefits
• Increase in predictive power of acquisition decisioning
[Figure: a Development Sample with a known Outcome is used to build a Statistical Model, which is applied NOW to predict a future Outcome; the APPLICATION SCORE runs from Low Score to High Score, with Low / Medium / High bands]
Example scorecard
Constant | +800
Age of Applicant | Points
< 22 | -50
22 – 25 | -10
26 – 40 | 0
41 – 55 | +30
Worst status Last 6 months | Points
0 | 0
1 – 2 | -45
Potential Acquisition & Origination Modelling Datasources
• Increasing levels of identity fraud leads to increasing fraud losses at a diverse range of banks
• Data and Models can be used to help the bank identify cases that would require further investigation (searching for
anomalies in the data)
• Application Fraud Models
• Use of bureau data is permitted
• Potentially early bads may be classed as fraudulent
• Benefits (taken from a UK example)
• Referral of 9% of Applications for further investigation
• Detection of 56% of fraud cases, saving the bank money
• Example Characteristics include complex interactions, e.g. Age and High Income
Analytics and Customer Management
Maximising the use of the bank’s data that can be mined from existing customers
Is the bank pro-actively managing its existing customer base to drive revenue and reduce costs?
Customer Management Analytics
Key Analytic Area at a
Customer Management Strategies
Level
Customer Management Analytics
• Account Level Behavioural Score
• Customer Level Cross Sell Scores
• Early Collections Scores
Looking at Behavioural Scoring and how to use Strategy
Portfolio Data
• Number of Overlimit last 3 months
• Number of Payments greater than minimum payment last 6 months
• Max Utilisation last 9 months
• Number of Collection Calls last 24 months
Model
\[ \Pr(x) = \frac{e^{\alpha + \sum \beta x}}{1 + e^{\alpha + \sum \beta x}} \]
Strategy Uses
• Exposure Management
• Cross-Sell
• Segmentation
• Collections Prioritisation
• Roll-Rate Management
• Payment Projection
• Recoveries Management
Customer Management Analytics – Collections Management
Account Status (from Increasing delinquency – Missed Payments, through to Recovery – Making Payments)
• Manage Customer Relationships through
  • Pro-active limit management
  • Limit Management
  • Competitive Pricing
  • Pre-emptive Triage
• Initiate Customer Contact for
  • Debt Control
  • Triage
  • Arrears Management
  • Save the …
• Continued Customer Contact for
  • Prioritised Debt Control
• Decision made to end the customer relationship, therefore switch to
  • Recovery Strategies
  • Debt Sale
[Figure labels: Behavioural Scoring Chars, History Chars, Post Default Payment Chars, Alternative Data Chars; Lend More Model, Default Prevention Model, Current Model, (30 dpd) Model, (60 dpd), Recovery to Contact, Payment Projection / LGD Models]
Data Science / Credit Scoring
\[ \Pr(x) = \frac{e^{\alpha + \sum \beta x}}{1 + e^{\alpha + \sum \beta x}} \]
Champion / Challenger Modelling and Monitoring
[Chart: Efficient Frontier, % Collected against Months since Implementation, for the Champion and Challengers A, B and C]
Challenger C too extreme, A not radical enough, B improves over the Champion and becomes incumbent
Collections & Payment Projection Models
Collection Scores – Basic Strategy
What is it? How?
• … collections / delinquent environment
• The use of external data, such as open banking and other alternative sources, would be encouraged as it provides a much more holistic view
Collections Contact Data
• Number of right party contact last month
• Number of Promises to Pay Taken L1M, L3M, L6M
• Number of Partial Payments Made L3M, L6M
• Ratio of Payment to Outstanding L3M
• No. Months with delinquency L3M, L6M, L12M
Open Banking
• Total Number of Accounts in a Delinquent state
• Total Outstanding Balance
• Our Outstanding as a proportion of Total Outstanding Last Months
• Total Late Fees Paid Last 3 months
Discussion Area
– Exercise 2
Modelling Steps
Scorecard Development Steps
[Flow: Design and Data Requirements → Initial Analysis → Variable Reduction (PCA, Correlation, VoI) → Modelling incl Reject Inference → Model Diagnostics → Strategy Develop]
• Essential that the design is correct and aligned to business requirements
• Design is confirmed through data analysis, e.g. the performance definition, the development sample etc
• Low predictive chars will be subject to a business logic check to determine whether they are considered for modelling
Modelling
• Characteristic Classing
• Modelling – Linear vs Logistic Regression
• Model Assessment & Re-model
• Finalisation Workshop & Report
Data Sample Construction
Design (Business Problem, What Data is Available, High-level Solutionising)
[Figure: monthly application or transactional files (Jan = 1, Feb = 2, Mar = 3, … Dec = 12) each have their own Outcome Point, giving Rolling Observation Windows / Rolling Performance Windows. Application Scoring: Sample Window followed by Outcome Window. Behavioural Scoring: Observation Period, Sample Window, then Outcome Window.]
*Typically used when the number of bads is not low and they mature relatively quickly, or when the portfolio requires a fixed outcome period (i.e. all bads have the same time to exhibit poor performance)
Data Sample Construction – Fixed Window
[Figure: timeline from 24 Months Ago through 12 Months Ago to Now; Modelling Data (Transaction Data) is taken from the Sample Window, and Performance Data runs up to a single Outcome Point]
*Typically used when the number of bads is low; the earliest bads are given longer (up to 24 months in this case) to mature
Example Performance Definition - Complex
GB Good-Bad Classification
Flag Definition
Design
(Business Problem, What Data is Available,
High-level Solutionising)
1 Voluntary Cancel / Close / Exclusion
Deceased / Never Active
Example Performance Definition - Simple
The aim is to keep the scorecard development as simple as possible
Design
(Business Problem, What Data is Available, High-level Solutionising)
GB Good-Bad Classification
Flag Definition
Data Manipulation
(char derivation, segmentation
investigation, performance definitions,
exclusions etc.)
Defining Exclusions (Application)
The bank would like to treat these customers differently from the normal through-the-door population when they apply for credit
• Example Exclusion Rules
• VIP
• Staff
• Students
• Pre-Approved Customers
Defining Policy Rules (Application)
• The applicant will automatically be rejected when they hit any of the bank’s
Policy Rules (which define the bank’s lending or regulatory criteria)
• Policy Decline Rules are a set of criteria which each applicant must pass
regardless of their application score
• Where the bank allows policy exemptions, we may analyse the effectiveness
of the policy rules by looking into default rates by exception rule
Defining Exclusion Rules (Behavioural)
• For the behavioural model the customer is already on book
therefore the exclusion rules are slightly different
• There are two levels of exclusion rules for existing customers: Observation & Outcome
• Example Observation Exclusion Rules
• Bad at Observation
• Inactive in the 12 months prior to Observation Point (no predictive data)
• Fraud / Lost or Stolen / Dispute
• Basel and IFRS9 Models may have separate exclusion rules, therefore these must be
known as the operational scorecards are often the keystone within those model
structures
Exercise 3 – Performance Definitions and Outcomes
• Please define your performance definition for your risk or propensity model
• What are your key assumptions and the rationale behind your thinking?
• Please relate your definition back to a business prompt
Exercise 3 – Discussion
• Please define your performance definition for your risk or propensity model
• Discussion
Day 1 - Reflections
• Any Questions?
• Anything you’d like to go over again?
Defining Data
1. Spec the data (Dev Team, Business Reps)
2. Extract Data (IT Team)
3. Initial Data Analysis to assess data quality (Dev Team)
  • Depth
  • History
  • Comparisons between segments, e.g. Accepts & Rejects
  • Population Rates
  • Expected Values
• Future Data Availability (Dev Team, IT Team)
• Data Audit Report (Dev Team, Business Reps, IT)
Process steps
• Data Sourcing (extract, merge, quality checks etc.)
• Data Extract (application data, associated performance, sufficient time periods)
• Data Manipulation (char derivation, segmentation investigation, performance definitions, exclusions etc.)
Who?
• Dev Team
• Business Reps
• IT
Data Prep
For data assessment and specification two main areas are investigated. The first step is to look at the ‘Quantity’ of data available and whether the information is sufficient (and covers all the necessary fields) to construct appropriate scorecards. The second step, which is a more analytic exercise, is to determine whether the available data is of adequate ‘Quality’.
Step 1 – Quantity
• Data Quantity assessment of the key data fields for in scope portfolios
• Starting point is always best practice variable lists
• Recommendations of required fields
• Recommendations of going forward fields
Step 2 – Quality
• Review of data elements within each field by assessing population rates, data accuracy through detailed analysis of documentation, data items etc
• Data Quality Assessment should include descriptive statistics, accuracy tests, documentation review and recommendations
Ref No. | Data Item | Portfolio | Best Practice Recommendation | Bank Available
1 | Balance | CC | Y | Y
2 | DPD | CC | Y | Y
3 | Open Date | CC | Y | N
4 | Limit | CC | Y | Y
5 | Payment | CC | Y | Y
Data Check & Audit
• To determine scorecard development feasibility the following analysis will be
undertaken, to determine
• Depth of Data
• Breadth of Data
• For Numerical Characteristics, frequencies will be produced investigating the
descriptive statistics of mean, min, max to highlight any data anomalies
• For Categorical Characteristics the frequency tables will investigate that the
appropriate codes are populated correctly
• For Example
• Balances should be within the expected ranges, i.e. all positive and no abnormally high values
• Charge-Off is expected to take values of A to E but a value of X is observed
• No. of Written Off accounts is expected to be 2% but 20% of cases have that status
• Additionally
• Data needed to code exclusion rules and good-bad definitions (including the default definition) exists and is well populated
• The supplied data is logical and no irregular trends are observed; examples of checks include
• movement through delinquency buckets follows expected trends, e.g. an account that is 0 DPD in January 2013 should not be 90 dpd in February
• Valid Open Dates exist
• DPD should not register a zero value when the account is charged off
• Number of records matches the expected number of records
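A few of the audit checks listed above can be sketched in code. This is a minimal illustration only: the field names (`balance`, `dpd`, `charged_off`, `charge_off_code`) and the balance ceiling are hypothetical, and a real audit would be driven by the bank's own data dictionary.

```python
# Minimal sketch of record-level data audit checks.
# Assumptions: field names and MAX_EXPECTED_BALANCE are illustrative only.
VALID_CHARGE_OFF_CODES = {"A", "B", "C", "D", "E"}
MAX_EXPECTED_BALANCE = 1_000_000


def audit_records(records):
    """Return (record index, issue description) pairs for failed checks."""
    issues = []
    for index, record in enumerate(records):
        balance = record["balance"]
        if balance < 0 or balance > MAX_EXPECTED_BALANCE:
            issues.append((index, "balance outside expected range"))
        code = record.get("charge_off_code")
        if code is not None and code not in VALID_CHARGE_OFF_CODES:
            issues.append((index, "unexpected charge-off code"))
        if record["charged_off"] and record["dpd"] == 0:
            issues.append((index, "charged off but DPD is zero"))
    return issues


def numeric_summary(values):
    """Descriptive statistics used to highlight anomalies in a numeric field."""
    return {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}


sample = [
    {"balance": 1200.0, "dpd": 0, "charged_off": False, "charge_off_code": None},
    {"balance": -50.0, "dpd": 30, "charged_off": False, "charge_off_code": None},
    {"balance": 900.0, "dpd": 0, "charged_off": True, "charge_off_code": "X"},
]
sample_issues = audit_records(sample)
```

In this sample the second record fails the balance check and the third fails both the charge-off code check and the DPD consistency check.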
Data Manipulation
Data Needed for Application Scorecards
[Figure: timeline from Observation Point to Outcome Point, with an Outcome Period of typically 12 to 24 months]
• Data as at point of Application For Credit
  • Appl Data: Historical Application Data, covering 12 Months of applications
  • Existing Accounts: Account / Customer Data from existing client
  • Credit Bureau: Customer Credit Data from other Financial Institutions
  • Other Sources: Open Banking, alternative datasources
• Perf Data: Monthly Account Performance Data, i.e. monthly account performance information for each month in the outcome period, used to determine the GB Flag
Data Needed for Behavioural Scorecards
[Figure: timeline from Observation Point to Outcome Point]
Data Preparation & Initial Analysis
Raw Data → Data Derivation → Derived Data, e.g. Max Delq Last 3m, Min Balance last 6m etc
Example of Raw to Derived – Turning Data into Information
Raw Data
Column ID | Column Description | Comment
CIF_ID | Customer ID |
FORACID | Loan Account ID |
OPEN_DATE | Loan Open Date |
MATURE_DATE | Loan Mature Date |
DISBURSED_AMOUNT | Loan Disbursed Amount |
SCHM_CODE | Loan Product Code |
DR_INTEREST_RATE | Interest rate |
BALANCE | Balance as of reporting date |
OVERDUE_PRINCIPAL | Overdue in principal |
OVERDUE_PDAY | Overdue days in principal |
OVERDUE_INTEREST | Overdue in interest |
OVERDUE_IDAY | Overdue days in interest |
TOTAL_AMT | total overdue amount | overdue principal + interest
REPORTDATE | report date |
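The two derived characteristics named above (Max Delq Last 3m and Min Balance last 6m) could be built from the raw monthly columns along these lines. The sketch assumes one row per REPORTDATE for a single loan, sorted oldest to newest, and uses OVERDUE_PDAY as the delinquency measure; both are assumptions made for illustration, and the function names are hypothetical.

```python
# Minimal sketch of turning raw monthly rows into derived characteristics.
# Assumption: rows are for one loan account, sorted oldest to newest.
def max_delq_last_n_months(monthly_rows, n):
    """Highest overdue days (principal) over the most recent n months."""
    return max(row["OVERDUE_PDAY"] for row in monthly_rows[-n:])


def min_balance_last_n_months(monthly_rows, n):
    """Lowest reported balance over the most recent n months."""
    return min(row["BALANCE"] for row in monthly_rows[-n:])


# Hypothetical six-month history for one loan account.
history = [
    {"REPORTDATE": "2023-01-31", "OVERDUE_PDAY": 0, "BALANCE": 1000.0},
    {"REPORTDATE": "2023-02-28", "OVERDUE_PDAY": 0, "BALANCE": 950.0},
    {"REPORTDATE": "2023-03-31", "OVERDUE_PDAY": 15, "BALANCE": 900.0},
    {"REPORTDATE": "2023-04-30", "OVERDUE_PDAY": 45, "BALANCE": 900.0},
    {"REPORTDATE": "2023-05-31", "OVERDUE_PDAY": 10, "BALANCE": 850.0},
    {"REPORTDATE": "2023-06-30", "OVERDUE_PDAY": 0, "BALANCE": 800.0},
]

max_delq_l3m = max_delq_last_n_months(history, 3)
min_balance_l6m = min_balance_last_n_months(history, 6)
```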
Data Exercise – 15 Minutes
Sample Window Selection
Considerations for the Sample Window chosen
• Volumes
• Is there enough volume within the good-bad categories? Typically, scorecards require at least 750 of each
• Do sufficient volumes exist for the suggested or desired scorecard segments?
• Is the Sample Window representative of the ‘Going Forward’ World?
• Long Outcome Windows may be less relevant
• Have there been, or are there expected to be, changes in the portfolio?
• Are the products offered today the same as in the past? How do they differ?
• Have there been any specific marketing events that need considering?
• Data Availability
• Are all the characteristics available for the entire observation period?
• Is retrospective bureau data available for the observation period?
• Seasonality
• Do special events in the year affect the flow of the portfolio, e.g. Christmas?
• Does selecting a 12-month observation window help to smooth out spikes in demand or delinquency?
Vintage Analysis – Length of the window
[Charts: Time to Bad Analysis and Vintage Report]
Choose period when bad rate is stable / flattens
Roll-Rate Analysis – Confirming Good-Bad Definition
Maintaining Stability
Why Segment?
[Figure: segmentation diagram with labels Origination, App Pop, Curr, < 60 dpd and Pay Pred]
Confirming Segmentation – Differing Performance
Current Delq
1. Is the Population large enough?
2. … different?
3. Is the Risk Profile different?
4. Is the Population % distribution …
First segment
Interval | Known WoE | Known GB Odds | Totals | Totals %
11-25 | 2 | 39.84 | 65666 | 8.48%
26-40 | 1.32 | 13.37 | 63213 | 8.16%
41-50 | 0.34 | 8.18 | 54362 | 7.02%
51-65 | -1.3 | 5.77 | 23421 | 3.02%
66-80 | -1.8 | 4.5 | 16542 | 2.13%
81-100 | -2.1 | 2.77 | 12652 | 1.63%
101+ | -2.5 | 2.25 | 5643 | 0.73%
Others | 0.12 | 10.1 | 2352 | 0.30%
Second segment
Interval | Known WoE | Known GB Odds | Totals | Totals %
11-25 | 1.55 | 8.96 | 18249 | 17.03%
26-40 | 1.5 | 5.06 | 15986 | 14.92%
41-50 | 0.1 | 3.26 | 4500 | 4.20%
51-65 | -0.6 | 2.33 | 4032 | 3.76%
66-80 | -2.3 | 1.58 | 3333 | 3.11%
81-100 | -3.2 | 0.9 | 3121 | 2.91%
101+ | -4.65 | 0.49 | 2321 | 2.17%
Others | -1.2 | 6.66 | 1234 | 1.15%
Modelling Process Flow – Application Scores
[Flow: Modelling Samples → Build Known Good-Bad Model → Reject Inference → Build Final Model]
Only one model for B-Score & C-Score
Customer Management Analytics - Sampling
Independent Variables
• Transactional Data, e.g. Delq L3M
• Bureau Data, e.g. No. of accounts
• Internal Data, e.g. Other Account type Delq
• Contact History, e.g. £ of PTPs taken
• Data Elements can be at account or customer level OR be external bureau type chars
Performance Variables
• Performance File: One Record per Account / Customer
• Good-Bad Flag
Exercise Discussion
Modelling Steps 2
Model Process – Univariate Analysis
[Flow: PAST DATA → Modelling → Score → Decide]
Characteristic Reduction
• In Feature Engineering we may generate hundreds / thousands of characteristics
• Importance of Reducing Variables within the Model Selection Process:
• Simplicity and Interpretability (for model / product stakeholders, model validators / approvers &
regulators)
• More efficient to build, less opportunity for the developers to engineer in Model Risk
• Reduced likelihood of generalised model over-fitting (tailored specifically to the development
data)
• Easier to implement and maintain
• Efficiency in model development: it is not practical to analyse 1000s of characteristics
• Simpler models (with fewer characteristics) tend to be more robust and less prone to deterioration from small changes in the operational environment, and therefore remain reliable for longer
• Simpler models carry less monitoring and on-going modelling overhead, and may introduce less Model Risk
Correlation Considerations
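Spearman’s Rank Correlation Co-Efficient:
\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
where d_i is the difference between the two ranks of observation i and n is the number of observations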
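The rank correlation formula ρ = 1 − 6Σd²/(n(n² − 1)) can be checked with a short sketch. It assumes there are no tied values within either characteristic (the shortcut formula is only exact in that case), and the function names are hypothetical.

```python
# Minimal sketch of the rank-based correlation co-efficient.
# Assumption: no tied values within x or within y.
def ranks(values):
    """Rank each value from 1 (smallest) to n (largest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

For x = [10, 20, 30, 40, 50] and y = [1, 3, 2, 5, 4] the squared rank differences sum to 4, giving 1 − 24/120 = 0.8.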
Selection Statistics
Weight of Evidence / Value of Information
Exercise – Modeller Field Reduction
• Let’s play with the Field Reducer in Modeller
• Discuss key fields / reduction parameters
• Review the results
• What’s in the keep list?
• Anything worthwhile or business worthy in the delete list?
Characteristic Analysis
• At a characteristic level the developer wishes to compare the distribution of goods and bads across the attributes in order to ensure that predictive trends are consistent and intuitive, in line with perceived business logic
• The predictive trends across the attributes should be broadly linear in an increasing or decreasing manner (dependent upon the characteristic)
• Characteristics with counter-intuitive (unexpected) trends will be excluded from further analysis unless the trend can be explained by business processes and procedures, e.g. for product A, older customers are higher risk
• Additionally, we try to keep more than 50 goods and 50 bads in each characteristic group
• Records should not be overly concentrated in one attribute band
• We try to reduce the number of attribute groups to 6 or 7 (for low volume scorecards there are likely to be fewer groups, perhaps 4 or 5)
• Groups are combined into classes of similar risk (risk can be measured by GB Odds, Bad Rate or Weight of Evidence (WoE))
• Categorical fields should be grouped into like-meaning categories, e.g. owned and mortgaged property
Measures of Characteristic Performance
• The developer wishes to ensure that the characteristics used within the model exhibit a degree of stability over time (ensuring that a robust model is constructed)
• The Population Stability Index (PSI) is the standard best practice measure used to determine stability across samples drawn from different time periods
• The measure illustrates the discrepancy between two populations (development versus validation); when used with individual characteristics it will show whether the information being given has changed
• PSI <= 0.1 means that the two populations are sufficiently stable for the characteristic to be considered for model entry
PSI
Attribute | 2012 | 2012 Percentage (A) | 2014 | 2014 Percentage (B) | C = A - B | D = Ln(A / B) | E = C * D
The PSI is the sum of E over all attributes
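The PSI table columns (A, B, C, D and E) translate directly into a short calculation. The sketch below assumes every attribute has a non-zero count in both samples, since Ln(A / B) is undefined otherwise; the counts used are invented for illustration.

```python
import math


# Minimal sketch of the Population Stability Index for one characteristic.
# Assumption: every attribute has a non-zero count in both samples.
def population_stability_index(counts_a, counts_b):
    """Sum over attributes of (A - B) * ln(A / B), with A and B as proportions."""
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    psi = 0.0
    for count_a, count_b in zip(counts_a, counts_b):
        a = count_a / total_a            # percentage in the first sample
        b = count_b / total_b            # percentage in the second sample
        psi += (a - b) * math.log(a / b)  # column E = C * D
    return psi


# Hypothetical attribute counts for the two time periods.
stable = population_stability_index([100, 200, 300, 400], [100, 200, 300, 400])
shifted = population_stability_index([250, 250, 250, 250], [400, 300, 200, 100])
```

With these invented counts the first comparison gives a PSI of 0 and the second gives roughly 0.23, which is above the 0.1 guideline quoted above.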
Measures of Characteristic Performance
• The developer wishes to ensure that the characteristics used within the model are predictive, and utilises the Information Value (IV), also referred to as the Value of Information (VOI), as a measure of relative power
• Measures the discrepancy between goods and bads, taken from the same time-based sample
• A low value of VOI indicates that the good and bad populations are similar and that the characteristic is therefore not predictive, i.e. it will not help to separate good and bad accounts or customers
VOI
Attribute | Good | Good Percentage (A) | Bad | Bad Percentage (B) | C = A - B | D = Ln(A / B) | E = C * D
The VOI is the sum of E over all attributes
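The VOI table follows the same pattern as the PSI table, with column D being the Weight of Evidence of each attribute. A minimal sketch, assuming every attribute contains at least one good and one bad (the counts are invented for illustration):

```python
import math


# Minimal sketch of Weight of Evidence and Value of Information.
# Assumption: every attribute has at least one good and one bad.
def woe_and_voi(good_counts, bad_counts):
    """Return (WoE per attribute, VOI), where VOI = sum((A - B) * ln(A / B))."""
    total_goods = sum(good_counts)
    total_bads = sum(bad_counts)
    woes = []
    voi = 0.0
    for goods, bads in zip(good_counts, bad_counts):
        a = goods / total_goods   # Good Percentage (A)
        b = bads / total_bads     # Bad Percentage (B)
        woe = math.log(a / b)     # column D
        woes.append(woe)
        voi += (a - b) * woe      # column E = C * D
    return woes, voi


# Hypothetical counts for a three-attribute characteristic.
example_woes, example_voi = woe_and_voi([100, 300, 600], [50, 30, 20])
```

Here the first attribute holds 10% of goods but 50% of bads, so its WoE is negative; the VOI for the characteristic is about 1.08.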
Measures of Characteristic Performance
• The developer wishes to ensure that the characteristics used within the model are not overly correlated, i.e. conveying the same piece of information
• Correlation causes characteristics to come into the model with conflicting predictive trends that may be counter-intuitive and out of line with earlier project analysis (characteristic analysis trends)
• Where characteristics have a correlation co-efficient greater than 0.3 with one another (the threshold can be changed), the characteristic with the highest VOI should be selected for regression consideration
Characteristic VOI
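One possible way to apply the VOI and correlation rule is a simple greedy pass: take characteristics in descending VOI order and keep each one only if it is not correlated above the threshold with anything already kept. This is a sketch of one reasonable reading of the rule, not necessarily the procedure used in the training tool; the characteristic names and values are invented.

```python
# Minimal sketch of VOI-ranked selection with a correlation threshold.
# Assumption: correlation holds one value per unordered pair of names.
def select_characteristics(voi, correlation, threshold=0.3):
    """Keep the highest-VOI characteristic from each correlated group."""
    kept = []
    for name in sorted(voi, key=voi.get, reverse=True):
        if all(abs(correlation[frozenset((name, other))]) <= threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical VOI values and pairwise correlations.
voi = {"max_delq_l3m": 0.60, "max_delq_l6m": 0.55, "utilisation": 0.45, "age": 0.10}
correlation = {
    frozenset(("max_delq_l3m", "max_delq_l6m")): 0.85,
    frozenset(("max_delq_l3m", "utilisation")): 0.25,
    frozenset(("max_delq_l3m", "age")): -0.05,
    frozenset(("max_delq_l6m", "utilisation")): 0.28,
    frozenset(("max_delq_l6m", "age")): -0.04,
    frozenset(("utilisation", "age")): -0.10,
}
selected = select_characteristics(voi, correlation)
```

In this example max_delq_l6m is dropped because it is highly correlated with max_delq_l3m, which has the higher VOI.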
Characteristic Analysis
[Chart: values plotted on a 0 to 150 scale across Current, X-dpd, 30 dpd, 60 dpd, 90 dpd and 120+ dpd; Predictive Trend Observed]
Characteristic Analysis
[Chart: Balance Last Month, plotted on a 0 to 80 scale across <= 0, 1-10k, 10k-50k, 50k-100k, 100k-500k and 500k +; Non-Predictive Trend Observed]
Characteristic Reduction for Modelling
[Figure: funnel starting from 100 Characteristics, with filter stages labelled Predictive, Correlation and Characteristic Analysis (counts shown: 10, 70, 20, 20, 50, 5, 45); the remaining characteristics are put into the model after Fine & Coarse Classing]
Exercise – Grouping / Classing / Binning
• Scorecard Building Starts now
• Auto-grouping
• Grouping
Model Development
• Model Development (Linear, Logistic Regression, other model types)
• Model Diagnostics (Gini, KS, Correlation)
• Model Validation (Hold-out, Bootstrapping, Out of Time)
Comparing Linear Vs Logistic Regression
Linear Regression
Advantages
• Interpretability
• Widely Applicable
• Simplicity
• Can be used with continuous outcomes
Disadvantages
• Assumes linearity
• Sensitive to Outliers
• Limited in that it is not well suited to binary outcomes
Logistic Regression
Advantages
• Requires Binary Outcomes
• Produces probability based estimates
• Less sensitive to outliers
Disadvantages
• Complex Interpretation
• Limited to linear division boundaries
Linear Regression Vs Logistic Regression (Equations)
General form of Linear Regression
\[ \hat{Y} = \alpha + \sum_j \beta_j x_j \]
Where:
Y: dependent variable
α: general intercept
β: co-efficient applied to the explanatory variable
x: explanatory variable
In Scoring:
Y: total score
α: constant
β: co-efficient applied to the characteristics
x: explanatory variable (e.g. age, income, sex)
350 [Y] = 200 [α] + 50 x Age30-40 + 40 x Income50k+ + 60 x SexFemale
General form of Logistic Regression
\[ \Pr(x) = \frac{e^{\alpha + \sum \beta x}}{1 + e^{\alpha + \sum \beta x}} \]
The output of the regression model is a probability from 0 to 1
Other Form of Logistic Regression
\[ g(x) = \ln\left(\frac{\Pr(x)}{1 - \Pr(x)}\right) = \alpha + \sum \beta x \]
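The two general forms can be compared numerically with a small sketch. The first check reproduces the slide's scoring example (200 + 50 + 40 + 60 = 350); the logistic coefficients in the second check are invented for illustration, and the function names are hypothetical.

```python
import math


# Minimal sketch of the linear and logistic forms.
def linear_score(alpha, betas, xs):
    """Y = alpha + sum(beta_j * x_j)."""
    return alpha + sum(beta * x for beta, x in zip(betas, xs))


def logistic_probability(alpha, betas, xs):
    """Pr(x) = e^(alpha + sum(beta * x)) / (1 + e^(alpha + sum(beta * x)))."""
    z = linear_score(alpha, betas, xs)
    return math.exp(z) / (1 + math.exp(z))


def log_odds(probability):
    """g(x) = ln(Pr(x) / (1 - Pr(x))), which recovers alpha + sum(beta * x)."""
    return math.log(probability / (1 - probability))


# Slide example: each dummy variable is 1 because the applicant is aged
# 30-40, has income of 50k+ and is female.
slide_score = linear_score(200, [50, 40, 60], [1, 1, 1])

# Hypothetical logistic coefficients: alpha + sum(beta * x) = -1 + 1 + 1 = 1.
example_probability = logistic_probability(-1.0, [0.5, 0.25], [2, 4])
```

Taking the log odds of the logistic output returns the linear term, which is why the "other form" of the logistic regression is linear in the characteristics.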
Comparing Linear Vs Logistic Regression Graphically
Comparing Graphical Patterns: Logistic Regression Vs Linear Regression
[Charts: two scatter plots of GB Odds against Age, one for each regression type]
Logistic Regression tends to fit to more of the datapoints, for example predicting score by age & GB Odds
Dummy Models Versus Weight of Evidence
Dummy Models
Advantages:
• Characteristic Attributes split into dummy variables
• Missing values easily handled
• Encodes monotonically
• Effective in binary classification
• Variable contribution is easily observed
Disadvantages:
• Curse of dimensionality when there is a large number of attributes
• Not used in continuous outcome models
• Assumes a linear relationship between variables and target
• Does not handle missing variables well
WoE Models
Advantages:
• Characteristic Attributes assigned their respective weight of evidence
• Applicable to all model types
• Easy to interpret as each attribute becomes a variable
Disadvantages:
• Increased effort to prepare modelling sample (coding inefficiency)
• Non-monotonic relationships may not be desirable
• Tends to model the extremes
Score Creation and Scaling (PDO)
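As a hedged illustration of Points to Double the Odds (PDO) scaling, a commonly used convention is Score = Offset + Factor × ln(odds), with Factor = PDO / ln(2) and the Offset chosen so that a reference score corresponds to reference odds. The anchor values below (600 points at good:bad odds of 50:1, PDO of 20) are hypothetical and are not taken from the training material.

```python
import math


# Minimal sketch of PDO score scaling.
# Assumptions: anchor of 600 points at odds of 50:1 and a PDO of 20.
def scaling_parameters(anchor_score, anchor_odds, pdo):
    """Return (offset, factor) so that doubling the odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = anchor_score - factor * math.log(anchor_odds)
    return offset, factor


def scaled_score(good_bad_odds, offset, factor):
    """Score = offset + factor * ln(odds)."""
    return offset + factor * math.log(good_bad_odds)


offset, factor = scaling_parameters(anchor_score=600, anchor_odds=50, pdo=20)
```

Under these assumptions odds of 100:1 map to 620 points and odds of 25:1 map to 580 points, i.e. each doubling of the odds adds 20 points.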
Model Diagnostics – Gini Co-Efficient / Lorenz Curve / RoC Curve
[Charts: Example of Gini Plot (GINI CO-EFFICIENT 56.26) and Example of a Skewed Gini (GINI CO-EFFICIENT 83.24); each plots % Bads against % Goods on a 0 to 1 scale]
• Measures the difference between the cumulative distributions of goods and bads by score
• Advantages: -
• Standard scale 0% - 100% comparable across models
• Disadvantages: -
• Separation may be skewed to high or low scores only
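A minimal sketch of a Gini calculation is shown below. Rather than integrating the Lorenz curve, it uses the equivalent pairwise form Gini = 2 × AUC − 1, where AUC is the share of good/bad pairs in which the good has the higher score (ties count as half). It assumes that higher scores indicate lower risk and returns a value on a −1 to 1 scale (multiply by 100 for a percentage); the scores are invented.

```python
# Minimal sketch of the Gini co-efficient for a score.
# Assumption: higher score = lower risk, so goods should outscore bads.
def gini_coefficient(scores, is_bad):
    """Gini = 2 * AUC - 1, with AUC computed over all good/bad pairs."""
    good_scores = [s for s, bad in zip(scores, is_bad) if not bad]
    bad_scores = [s for s, bad in zip(scores, is_bad) if bad]
    concordant = 0.0
    for good in good_scores:
        for bad in bad_scores:
            if good > bad:
                concordant += 1.0
            elif good == bad:
                concordant += 0.5
    auc = concordant / (len(good_scores) * len(bad_scores))
    return 2 * auc - 1


# Hypothetical scores: perfect separation, then one good/bad pair out of order.
perfect = gini_coefficient([300, 400, 500, 600], [True, True, False, False])
partial = gini_coefficient([300, 400, 500, 600], [True, False, True, False])
```

The pairwise loop is quadratic in the sample size, which is fine for illustration but would normally be replaced by a sorted cumulative calculation on real portfolios.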
Reject Inference
By building a new model with reject inference we wish to maximise the number of ‘new goods’ from the old Reject
population and minimise the number of ‘new bads’ from the previous Accepted population
[Charts: Known Performance, plotting Probability against Score (300 to 850) with P(G) and P(A) curves for the Prev Accepts; a second chart adds the Rejected P(G) curve and marks the New Goods and New Bads regions]
Drag the Rejects P(G) down as we assume rejects would perform worse than known population
Reject Inference – Alternative Data (Bureau / Open Banking)
Model Diagnostics / Validation
• Model Validation should be carried out
  • as part of the development process
  • as an independent verification of model quality prior to model implementation or use
  • periodically, to confirm that modelling assumptions made at point of development hold and that the model continues to work as designed
• Validation often uses
  • Hold-Out: a hold-out sample (develop model on 80% of sample, test on 20%)
  • Boot-strap: boot-strapped validation (random samples pulled from the development sample to test the model)
  • Out of Time / Sample: out of time data sample
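The hold-out and boot-strap approaches can be sketched as follows. The 80% / 20% split comes from the slide; drawing boot-strap samples with replacement at the same size as the development sample is an assumption, as are the function names and the fixed random seed.

```python
import random


# Minimal sketch of hold-out and boot-strap sample construction.
def hold_out_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into (development, hold-out)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def bootstrap_samples(records, n_samples, seed=42):
    """Draw n_samples samples with replacement, each the size of records."""
    rng = random.Random(seed)
    return [[rng.choice(records) for _ in records] for _ in range(n_samples)]


# Hypothetical development sample of 100 account identifiers.
accounts = list(range(100))
development, hold_out = hold_out_split(accounts)
boots = bootstrap_samples(accounts, n_samples=3)
```

An out of time sample is not shown because it is simply a sample drawn from a later period than the development window.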
Exercise – Modelling
• Let’s build models
• Model Assessment
• Refinement
Other Models to Consider – IRB / IFRS9
• IRB Models look to utilise multiple models to predict the capital requirements of the bank, by modelling the Probability of Default (PD), Exposure at Default (EAD) and Loss Given Default (LGD)
• IRB Models should be adjusted for cyclicality across the economic cycle
• Model weaknesses need to be addressed with a Margin of Conservatism adjustment
• IFRS9 models typically utilise similar modelling constructs but consider Lifetime
PD, EAD and LGD
• Revolving products may have complicated EAD models that aim to predict the
amount of the limit that will be consumed by the time of default
• LGD for mortgage products can also be complicated by the recovery processes
within the bank and the legal structure / environment in which they operate
• Monitoring and Validation of models is also important (covered in a separate
session)
Why is Model Risk Important?
Regulatory Landscape
• Model Risk Regulations or Guidelines have been published by a number of
Regulators, including
• Federal Reserve (SR11-7 – Guidance on Model Risk - April 2011)
• Bank of England (SS1/23 – Model Risk Management Principles for Banks – May 2023)
• CBUAE (Model Management Standards (MMS) & Guidelines (MMG) – December 2022)
• ECB (Guide to Internal Models – Oct 2019)
• More developed markets with sophisticated banking groups tend to have gone
down the principles-based approach, whereas developing markets are more
prescriptive
• Many global regulators are yet to release guidance; firms in those regions may want to get ahead of the curve
Thank You
Risk Modelling Training
by
APDS Consulting