Machine Learning Report
B. REPORT
1. Baseline Practice: Under the current baseline practice, each credit account is
assigned a status: active, suspended, closed, or in collection. For modeling
purposes, we treat transactions on accounts with a status of suspended, closed, or
in collection as predicted fraudulent, while transactions on active accounts are
treated as predicted non-fraudulent. Accordingly, we convert account_status into a
binary variable (0 for non-fraud, i.e. active; 1 for fraud, i.e. suspended, closed, or in
collection) and calculate the baseline profit from the resulting confusion matrix.
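For concreteness, a minimal pandas sketch of this conversion and profit calculation is shown below. The file name, the column names (account_status, is_fraud, amt), and the per-outcome payoff values are illustrative assumptions, not the exact cost-reward structure behind our final figures.

```python
import pandas as pd

# Minimal sketch of the baseline evaluation; column names and payoffs are
# illustrative assumptions, not the project's exact cost-reward structure.
df = pd.read_csv("transactions.csv")  # hypothetical file name

# Baseline prediction: 1 = fraud (suspended, closed, or in collection), 0 = non-fraud (active)
df["baseline_pred"] = (df["account_status"].str.lower() != "active").astype(int)

def payoff(actual, predicted, amount):
    """Per-transaction profit; swap in the agreed cost-reward structure."""
    if predicted == 1 and actual == 1:   # TP: fraudulent charge blocked
        return amount
    if predicted == 1 and actual == 0:   # FP: legitimate sale blocked, fee forgone
        return -0.02 * amount
    if predicted == 0 and actual == 1:   # FN: fraud goes through, full loss
        return -amount
    return 0.02 * amount                 # TN: normal transaction fee earned

baseline_profit = sum(
    payoff(a, p, amt)
    for a, p, amt in zip(df["is_fraud"], df["baseline_pred"], df["amt"])
)
print(f"Baseline profit: {baseline_profit:,.2f}")
```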
We then implement an ML model using the best model on the leaderboard, the Light
Gradient Boosted Trees Classifier with Early Stopping at a threshold of 0.1. We
compare its results with the baseline practice and recommend that the bank adopt
the ML model.
● Comparison with the baseline practice (on the test set)
Baseline profit: $334,850.178
ML model profit: $570,623.119 (outperforms the baseline practice)
● Estimated annual profit of the ML model
Because our profit is calculated on the test set, which accounts for only 20% of the
total data (spanning 2 years), we scale it up to estimate the annual profit.
Annual Profit = (Profit on Test Set / Number of Data Points in Test Set) × (Number of Data Points in Original Set / 2)
              = (570,623.119 / 92,620) × (463,099 / 2)
              = $1,426,693.17
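The same scaling step can be written as a small helper, shown below; the row counts are the figures reported above, and the two-year divisor reflects the dataset's time span.

```python
def estimated_annual_profit(test_profit, n_test, n_total, years=2):
    """Scale test-set profit to the full dataset, then average per year."""
    return (test_profit / n_test) * (n_total / years)

# With the figures above, this comes out to roughly $1.43M per year.
print(estimated_annual_profit(570_623.119, 92_620, 463_099))
```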
[Table: Preprocessing steps and explanations]
[Table: Profit/cost matrix, with entries under Predicted (Fraud) given as 2% of the transaction amount and 100% of the transaction amount]
4. Modeling procedures
i. Model selection & Hyperparameters: To select the most suitable model with
strong performance and minimal overfitting, we experimented with the AdaBoost
Classifier, Decision Trees, and various hyperparameter settings using DataRobot
and TPOT. The model from DataRobot outperformed the others. To determine the
optimal threshold, predictions on the validation set were exported to Excel to build
a profit curve, which showed that a threshold of 0.1 maximizes profit.
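For reference, the automated search with TPOT looked roughly like the sketch below; the file name, feature columns, and search budget are assumptions, and the DataRobot runs were configured through its interface rather than in code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Sketch of the automated model/hyperparameter search; the file name,
# target column, and search budget are illustrative assumptions.
df = pd.read_csv("transactions_clean.csv")   # hypothetical preprocessed file
X = df.drop(columns=["is_fraud"])
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

tpot = TPOTClassifier(
    generations=5,          # small budget for illustration
    population_size=50,
    scoring="roc_auc",
    cv=5,
    random_state=42,
    verbosity=2,
    n_jobs=-1,
)
tpot.fit(X_train, y_train)
print("Hold-out AUC:", tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")   # inspect the winning pipeline
```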
To determine an effective decision threshold for fraud prediction, we evaluated
model performance across thresholds ranging from 0 to 1 in small increments. For
each threshold, we calculated the total profit on the holdout set using predefined
cost-reward structures for TP, FP, FN, and TN outcomes. Using DataRobot, we
tested our holdout set with the top-performing model, Light Gradient Boosted
Trees Classifier with Early Stopping. The profit curve increased initially and then
plateaued, showing local stability around a threshold of 0.1. We selected this
threshold as it offered a stable trade-off between fraud detection and false
positives while maximizing profit.
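The threshold sweep itself is straightforward to reproduce outside of Excel. The sketch below assumes the exported holdout predictions contain a fraud probability column (prediction), the true label (is_fraud), and the transaction amount (amt), and reuses the hypothetical payoff structure from the baseline sketch above.

```python
import numpy as np
import pandas as pd

def payoff(actual, predicted, amount):
    # Same hypothetical cost-reward structure as in the baseline sketch.
    if predicted == 1:
        return amount if actual == 1 else -0.02 * amount
    return -amount if actual == 1 else 0.02 * amount

# Holdout predictions exported from the modeling tool (hypothetical file name).
preds = pd.read_csv("holdout_predictions.csv")

# Sweep candidate thresholds and record total profit at each one.
results = []
for threshold in np.arange(0.0, 1.01, 0.01):
    flagged = (preds["prediction"] >= threshold).astype(int)
    profit = sum(
        payoff(a, p, amt)
        for a, p, amt in zip(preds["is_fraud"], flagged, preds["amt"])
    )
    results.append((threshold, profit))

profit_curve = pd.DataFrame(results, columns=["threshold", "profit"])
best = profit_curve.loc[profit_curve["profit"].idxmax()]
print(f"Best threshold: {best.threshold:.2f}, profit: {best.profit:,.2f}")
```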
5. Evaluation process
After running the project on DataRobot, no leakage or overfitting was detected. The
top features with the strongest signal are zip, log_amt,
Purchase_frequency_merchant, age, and hour of transaction. Zip indicates that
certain areas are more likely to see fraud. The other features capture behavioral
patterns and anomalies, such as unusual transaction amounts, purchases from new
or irregular merchants, age-related spending behavior, and off-hour transactions,
which are often linked to fraud.
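Although the final model was trained and evaluated inside DataRobot, the feature-signal check can be approximated with an open-source LightGBM classifier with early stopping, as sketched below; the file name and feature columns are assumptions based on the features listed above.

```python
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split

# Approximate the gradient-boosted-trees model with LightGBM and inspect which
# features carry the strongest signal. File and column names are assumptions.
df = pd.read_csv("transactions_clean.csv")   # hypothetical preprocessed file
features = ["zip", "log_amt", "Purchase_frequency_merchant", "age", "hour"]
X_train, X_val, y_train, y_val = train_test_split(
    df[features], df["is_fraud"], test_size=0.2,
    stratify=df["is_fraud"], random_state=42,
)

model = lgb.LGBMClassifier(n_estimators=1000, learning_rate=0.05)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # mirrors "early stopping"
)

# Rank features by how often they are used for splits.
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```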