Major Project
Guide: Prof. Avinash Gondal
Contents
02 Motivation
03 Literature Review
04 Objective
05 Problem Statement
06 Proposed Architecture
07 Dataset and Technology
08 Proposed Methodology
09 Gantt Chart
10 Conclusion
11 Reference
Introduction
• Credit risk modeling is a critical function for financial institutions, helping them assess the
likelihood of loan applicants defaulting.
• The ability to predict an applicant’s risk accurately helps banks and other lending
institutions minimize non-performing loans (NPLs) and optimize loan approval decisions.
• This project focuses on classifying loan applicants into risk categories (P1-P4) using
advanced machine learning (ML) and deep learning (DL) techniques, based on internal bank
data and external credit data (e.g., CIBIL).
Motivation
The financial industry faces significant risks due to loan defaults. Misclassification of
credit risk can lead to substantial financial losses, as high-risk customers may be
approved for loans while low-risk customers are denied. Traditional credit scoring models
often struggle with complex, non-linear data and do not account for dynamic changes in
borrower behavior. By applying machine learning and deep learning models, we can
improve the accuracy and robustness of credit risk predictions, helping banks make better
decisions, minimize defaults, and optimize their lending processes.
Literature Review
• Credit risk prediction has evolved from traditional models like Logistic
Regression to more advanced methods such as XGBoost and Random Forest,
which offer better accuracy but still face challenges with class imbalance and
transparency.
• Techniques like SMOTE improve data balance, and tools like SHAP values
enhance model interpretability. While deep learning shows potential, its
complexity limits practical use.
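As a rough illustration of the SMOTE idea mentioned above, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal standard-library sketch, not the project's implementation: the function name `smote_oversample`, the parameter defaults, and the toy `high_risk` data are all assumptions made for illustration. In practice a maintained implementation (for example the `SMOTE` class in imbalanced-learn) would normally be used, and oversampling would be applied to the training split only.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features for the rare high-risk class.
high_risk = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15], [0.95, 0.05]]
new_points = smote_oversample(high_risk, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing rows.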
2. Explainable AI Using H2O AutoML and Robustness Check in Credit Risk Management
   Summary: This paper focuses on applying H2O’s AutoML for model tuning and uses SHAP values for interpretability. It emphasizes the need for transparency in credit risk models to ensure regulatory compliance while maintaining high accuracy.
   Limitations: Lacks detailed performance metrics and does not fully address class imbalance.
   Future scope: Incorporate class imbalance techniques and provide a more comprehensive evaluation with detailed metrics.

3. Customer Credit Risk: Application and Evaluation of Machine Learning and Deep Learning Models
   Summary: A comparative analysis of machine learning (SVM, Random Forest) and deep learning models (DNN, CNN) for credit risk prediction. SVM and XGBoost are identified as the top-performing algorithms in terms of accuracy.
   Limitations: SVM struggles with large datasets, and complex models may suffer from overfitting without proper tuning.
   Future scope: Explore hybrid models combining ML and DL techniques to improve performance on large and imbalanced datasets.

4. Optimizing Loan Approval Decisions
   Summary: This paper uses ensemble methods to optimize loan approval decisions, providing insights on different machine learning algorithms and their performance in predicting credit risk categories.
   Limitations: Challenges in deploying stacking ensembles in real-time systems, with limited exploration of deep learning models.
   Future scope: Test deep learning methods and improve the deployment of ensemble models in real-time credit risk environments.

5. Financial Risk Prediction Using Machine Learning
   Summary: Provides an extensive comparison of various machine learning algorithms (Logistic Regression, Decision Trees, XGBoost) for financial risk prediction. It focuses on practical implementation in the financial industry.
   Limitations: Insufficient handling of class imbalance and lack of model interpretability.
   Future scope: Integrate interpretability tools (like SHAP) and better address class imbalance through resampling techniques.

6. Novel Approach to Optimize Credit Risk Prediction
   Summary: Proposes an enhanced SVM model for credit risk prediction, achieving high accuracy (96.92%) and targeted improvements to the algorithm.
   Limitations: Limited dataset size raises concerns about generalizability, and struggles with imbalanced data were noted.
   Future scope: Apply the model to larger datasets and enhance its handling of class imbalance using techniques like SMOTE.

7. An Automatic Deep Reinforcement Learning Based Credit Scoring Model using Deep-Q
   Summary: This paper introduces a Deep-Q Network (reinforcement learning) approach for credit scoring. It focuses on adaptive learning and real-time feedback for improved credit risk classification.
   Limitations: Low accuracy (~10%) due to class imbalance and complex tuning requirements, limiting practical application.
   Future scope: Improve accuracy and scalability by tuning the Deep-Q network and incorporating data balancing techniques.

8. Credit Risk Assessment
   Summary: Explores the use of HistGradientBoosting for credit risk assessment, achieving high accuracy (89.08%) and precision (94.54%) while efficiently handling large datasets.
   Limitations: Needs further validation in real-world scenarios, with a risk of overfitting if not properly regularized.
   Future scope: Test model robustness in real-world environments and apply stronger regularization to avoid overfitting.
Objective
The objective of this project is to develop an accurate, scalable, and interpretable credit
risk prediction model that categorizes loan applicants into four risk categories: P1 (lowest
risk) to P4 (highest risk).
The model will leverage machine learning and deep learning techniques, combining both
internal bank data (loan accounts, payment histories) and external credit data (CIBIL
scores).
The primary goals are:
• To minimize financial losses by accurately predicting high-risk applicants.
• To ensure model interpretability for regulatory compliance.
• To balance model performance and computation time, making it scalable for real-world
deployment.
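Since the first goal is about catching high-risk applicants specifically, overall accuracy alone may hide poor performance on a rare category such as P4. The sketch below, with hypothetical labels and a hypothetical helper `per_class_report`, shows one way to look at precision and recall per risk category alongside accuracy. It is an illustration under those assumptions, not the project's evaluation code.

```python
from collections import Counter

LABELS = ["P1", "P2", "P3", "P4"]


def per_class_report(y_true, y_pred, labels=LABELS):
    """Return overall accuracy plus precision and recall for each risk category."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        actual = sum(pairs[(c, p)] for p in labels)     # rows whose true label is c
        predicted = sum(pairs[(t, c)] for t in labels)  # rows predicted as c
        report[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, report


# Hypothetical example: ten applicants, one P4 applicant misclassified as P3.
y_true = ["P1", "P1", "P1", "P1", "P2", "P2", "P2", "P3", "P4", "P4"]
y_pred = ["P1", "P1", "P1", "P1", "P2", "P2", "P2", "P3", "P3", "P4"]
accuracy, report = per_class_report(y_true, y_pred)
```

In this toy case the model is 90% accurate overall yet recovers only half of the P4 applicants, which is the kind of gap a single accuracy figure would not reveal.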
Conclusion
• This project presents a comprehensive approach to credit risk modeling, leveraging both
internal and external datasets. By applying ensemble machine learning algorithms such as
XGBoost and Random Forest, the project aims to provide a robust solution for
classifying loan applicants into different risk categories.
• The model will not only improve prediction accuracy but will also provide interpretability
using SHAP, ensuring that the system meets regulatory requirements and delivers clear insights
into the decision-making process.
• The expected result is a highly accurate model (85-90% accuracy) that balances performance
with transparency, allowing financial institutions to make informed, data-driven lending
decisions while minimizing risk.
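The SHAP-based interpretability referred to above rests on Shapley values, which attribute a single prediction to its input features. The following sketch computes exact Shapley values for a small, made-up scoring function; `risk_score`, its coefficients, the feature names, and the `baseline` applicant are all hypothetical and stand in for a trained model. Replacing absent features with baseline values is one common convention and is assumed here. The exact computation grows exponentially with the number of features, which is why the SHAP library relies on faster model-specific or approximate algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value
    (an assumed convention for "removing" a feature).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(rest, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def risk_score(features):
    """Hypothetical scoring function standing in for a trained model."""
    utilisation, missed_payments, account_age_years = features
    return (0.5 * utilisation + 0.3 * missed_payments
            - 0.02 * account_age_years
            + 0.2 * utilisation * missed_payments)


applicant = [0.9, 2.0, 3.0]  # hypothetical applicant being explained
baseline = [0.3, 0.0, 8.0]   # hypothetical reference applicant
phi = shapley_values(risk_score, applicant, baseline)
```

The contributions in `phi` add up to the difference between the applicant's score and the baseline score, so each feature's share of the decision can be reported to a reviewer or regulator.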