LOAN STATUS
PREDICTION
A MINI PROJECT REPORT
18CSC305J - ARTIFICIAL INTELLIGENCE
Submitted by
Pranay Kaistha [RA2111030010123]
Pulkit Khanna [RA2111030010113]
Under the guidance of
Dr. Deepa Natesan
Assistant Professor, Department of Computer Science and Engineering
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING
of
FACULTY OF ENGINEERING AND TECHNOLOGY
S.R.M. Nagar, Kattankulathur, Chengalpattu District
MAY 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)
BONAFIDE CERTIFICATE
Certified that this mini project report titled “LOAN STATUS PREDICTION” is the
bonafide work of Pranay Kaistha [RA2111030010123] and Pulkit Khanna
[RA2111030010113], who carried out the mini project under my supervision. Certified further,
that to the best of my knowledge, the work reported herein does not form any other project report
or dissertation on the basis of which a degree or award was conferred on an earlier occasion on
this or any other candidate.
Dr. Deepa Natesan
Assistant Professor
Department of Networking and Communications

Dr. Annapurani Panaiyappan K
HEAD OF THE DEPARTMENT
Department of Networking and Communications
ABSTRACT
With the proliferation of financial technology (FinTech) and the increasing accessibility of credit,
the need for accurate loan status prediction has become paramount for financial institutions. The
ability to predict whether a loan applicant will default or repay the loan plays a crucial role in
minimizing risks and maximizing profits. In this study, we propose using the Support
Vector Machine (SVM) algorithm for loan status prediction.
Support Vector Machine is a powerful supervised learning algorithm known for its effectiveness
in classification tasks. It works by finding the optimal hyperplane that best separates the different
classes in the feature space. SVM has been widely used in various domains due to its ability to
handle high-dimensional data and its flexibility in handling nonlinear relationships. The dataset
used in this study consists of historical loan data, including features such as applicant's credit
score, income, employment status, loan amount, loan term, and other relevant financial attributes.
The target variable is the loan status, categorized as either "default" or "repaid."
The first step in the process involves data preprocessing, including handling missing values,
feature scaling, and encoding categorical variables. Subsequently, the dataset is divided into
training and testing sets to evaluate the performance of the SVM model.
The SVM algorithm is then trained on the training dataset to learn the underlying patterns and
relationships between the features and the loan status. During the training phase, the algorithm
adjusts its parameters to find the optimal hyperplane that maximizes the margin between the
classes while minimizing classification errors. Once the SVM model is trained, it is evaluated
using the testing dataset to assess its predictive performance.
Performance metrics such as accuracy, precision, recall, and F1-score are calculated to measure
the model's ability to correctly classify loan statuses. Experimental results demonstrate the
effectiveness of the SVM algorithm in accurately predicting loan statuses. The model achieves
high accuracy and robustness, indicating its potential utility in real-world applications.
Additionally, the SVM model provides insights into the most influential features that contribute
to loan repayment or default, enabling financial institutions to make more informed lending
decisions.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM ARCHITECTURE AND DESIGN
3.1 Work flow diagram of Loan Status Prediction project using SVM
4 METHODOLOGY
4.1 Methodological Steps
5 CODING AND TESTING
5.1 Importing the Dependencies
5.2 Data Collection and Processing
6 SCREENSHOTS AND RESULTS
6.1 Data Visualization
6.2 Train Test Split
6.3 Training the model
6.4 Model Evaluation
6.5 Making a predictive system
7 CONCLUSION AND FUTURE ENHANCEMENTS
7.1 Conclusion
7.2 Future Enhancements
LIST OF FIGURES
3.1 Work Flow Diagram of SVM
5.1 Importing the Dependencies
5.2 Data Collection and Processing
6.1 Data Visualization
6.2 Train Test Split
6.3 Training the model
6.4 Model Evaluation
6.5 Making a predictive system
ABBREVIATIONS
SVM: Support Vector Machine
FinTech: Financial Technology
ML: Machine Learning
CV: Cross-Validation
LTV: Loan-to-Value Ratio
APR: Annual Percentage Rate
CHAPTER 1
INTRODUCTION
In today's financial landscape, the ability to accurately predict the status of loans is of paramount
importance for financial institutions. The increasing complexity of financial transactions, coupled
with the rising demand for credit, has made loan status prediction a critical task for mitigating
risks and ensuring profitability. Machine Learning (ML) algorithms, particularly Support Vector
Machine (SVM), have emerged as powerful tools in addressing this challenge.
Support Vector Machine is a supervised learning algorithm that excels in classification tasks by
finding the optimal hyperplane to separate data points into different classes. Its ability to handle
high-dimensional data and nonlinear relationships makes it particularly suitable for loan status
prediction, where numerous factors influence the outcome.
This project aims to leverage the SVM algorithm to predict whether a loan will be repaid or
will default, based on a set of relevant features extracted from historical loan data. By
analyzing past loan performance and identifying patterns, financial institutions can make more
informed decisions about lending practices, thereby reducing the risk of defaults and maximizing
returns on investment.
The dataset used in this project encompasses various attributes, including the applicant's credit
score, income, employment status, loan amount, loan term, and other financial
indicators. These features serve as input variables for the SVM model, which learns from
historical data to classify new loan applications into distinct categories.
The project begins with data preprocessing, including handling missing values, feature scaling,
and encoding categorical variables, to ensure the quality and compatibility of the dataset with the
SVM algorithm. Subsequently, the dataset is divided into training and testing sets, with the
former used to train the SVM model and the latter employed to evaluate its performance.
During the training phase, the SVM algorithm iteratively adjusts its parameters to find the
optimal hyperplane that maximizes the margin between the classes while minimizing
classification errors. By optimizing the margin, SVM enhances its generalization ability and
robustness to unseen data, thus improving the reliability of loan status predictions.
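For reference, the margin maximization described above is commonly written as the soft-margin SVM optimization problem. The notation below (weight vector w, bias b, slack variables ξ_i, regularization parameter C, feature map φ) is standard textbook notation and is an assumption of this sketch, not something taken from the project code.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,n
```

Here x_i is the feature vector of the i-th loan and y_i ∈ {-1, +1} is its status. The margin width is 2/‖w‖, so minimizing ‖w‖² widens the margin, while the term weighted by C penalizes training points that fall inside the margin or on the wrong side of the hyperplane.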
CHAPTER 2
LITERATURE SURVEY
1. "Loan Default Prediction Using Support Vector Machines: A Case Study in Brazilian
Banking Industry" (By: Fabrizio León-Alberto, Carlos Loza-García, José Prado-Gasco,
2016)
This study explores the application of Support Vector Machines (SVM) in predicting loan
default in the Brazilian banking industry. It investigates the performance of SVM
compared to other traditional machine learning algorithms. The research highlights the
effectiveness of SVM in accurately predicting loan default, showcasing its potential as a
robust tool for risk assessment in the banking sector.
2. "A Support Vector Machine for Credit Scoring: A Case Study" (By: Min-Je Sung,
Seung-Seok Choi, Myung-Ho Kim, 2005)
This paper presents a case study on using Support Vector Machines for credit scoring,
focusing on predicting whether a loan applicant will default or not. The study demonstrates
the superior performance of SVM compared to traditional statistical methods like logistic
regression and decision trees. It provides insights into the feature selection process and
model evaluation techniques for optimizing SVM performance in credit scoring tasks.
3. "Credit Risk Assessment Using a Support Vector Machine" (By: Hoang Pham,
Mohammad Saad, 2007)
In this research, the authors investigate the use of Support Vector Machines for credit risk
assessment, specifically focusing on predicting the probability of default for individual
borrowers. The study examines different kernel functions and parameter settings to
optimize SVM performance. It concludes that SVM offers a viable alternative to
traditional credit scoring models, particularly in handling nonlinear relationships and high-
dimensional data.
4. "A Comparative Study of Credit Risk Assessment Using Machine Learning
Algorithms" (By: Yanhong Sun, Ching-Hsue Cheng, 2018)
This study compares the performance of various machine learning algorithms, including
Support Vector Machines, Random Forest, and Gradient Boosting, for credit risk
assessment. It evaluates the models based on accuracy, precision, recall, and F1-score,
using a real-world credit dataset. The findings suggest that SVM achieves competitive
performance in predicting loan default, indicating its suitability for credit risk assessment
tasks.
5. "Loan Default Prediction Using Support Vector Machines with Feature Selection"
(By: Yilong Jiang, Xiaoping Yang, 2018)
This paper proposes a feature selection approach integrated with Support Vector Machines
for loan default prediction. It investigates the impact of feature selection on SVM
performance and compares it with traditional SVM models. The study demonstrates that
feature selection improves the predictive accuracy of SVM, particularly when dealing with
high-dimensional and redundant data.
6. "A Support Vector Machine Approach to Credit Scoring and Default Prediction"
(By: Frank C. Lee, Lin Xu, Gwo-Hshiung Tzeng, 2009)
This research presents a Support Vector Machine approach to credit scoring and default
prediction, focusing on improving the interpretability and robustness of SVM models. It
introduces a novel feature weighting method to enhance the discriminative power of SVM
for credit risk assessment. The study showcases the effectiveness of SVM in handling
imbalanced datasets and achieving high prediction accuracy for loan default.
These studies collectively underscore the significance of the Support Vector Machine
algorithm in loan status prediction and credit risk assessment tasks. They demonstrate its
effectiveness in handling complex financial data, nonlinear relationships, and achieving
competitive performance compared to traditional statistical methods. Additionally, they
provide insights into model optimization techniques, feature selection methods, and
evaluation metrics for enhancing SVM-based loan status prediction models.
CHAPTER 3
SYSTEM ARCHITECTURE AND DESIGN
CHAPTER 4
METHODOLOGY
1. Data Collection and Preprocessing:
Gather historical loan data from relevant sources, including borrower information, loan
terms, repayment history, and loan status (defaulted or repaid).
Preprocess the data by handling missing values, outliers, and inconsistencies.
Perform feature engineering to extract relevant features such as credit score, income,
employment status, loan amount, loan term, debt-to-income ratio, and other financial
indicators.
Encode categorical variables using techniques like one-hot encoding or label encoding.
Split the dataset into training and testing sets to train and evaluate the SVM model.
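The preprocessing step above could be sketched as follows. This is a minimal illustration on a hypothetical toy table, not the project's actual code: the column names, the values, and the choice of median and most-frequent imputation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are illustrative assumptions.
loans = pd.DataFrame({
    "credit_score": [720, 650, np.nan, 580, 700, 610, 690, 540],
    "income": [5400, 3200, 4100, np.nan, 6100, 2900, 4800, 2500],
    "employment_status": ["salaried", "self_employed", "salaried", np.nan,
                          "salaried", "self_employed", "salaried", "unemployed"],
    "loan_status": ["repaid", "repaid", "repaid", "default",
                    "repaid", "default", "repaid", "default"],
})

numeric_cols = ["credit_score", "income"]
categorical_cols = ["employment_status"]

# Numeric columns: fill missing values with the median, then standardize.
# Categorical columns: fill missing values with the mode, then one-hot encode.
preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

X = loans[numeric_cols + categorical_cols]
y = (loans["loan_status"] == "repaid").astype(int)  # 1 = repaid, 0 = default

# Split first, then fit the preprocessing on the training part only,
# so that no statistics from the test set leak into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)
```

Fitting the imputers, scaler, and encoder on the training split alone is a design choice of this sketch; it keeps the test set untouched until evaluation.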
2. Feature Selection:
Conduct exploratory data analysis (EDA) to identify key features that significantly impact
loan status.
Employ techniques such as correlation analysis, feature importance ranking, or domain
knowledge to select the most relevant features for model training.
Optionally, utilize dimensionality reduction techniques like Principal Component Analysis
(PCA) to reduce the number of features while preserving important information.
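As a rough illustration of the feature selection step, the sketch below ranks features by their absolute correlation with the target and then applies PCA. The data is synthetic (generated with scikit-learn's make_classification), and the choices of 10 features, a top-5 cutoff, and a 95% variance threshold are assumptions for the example only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a loan dataset (assumption: 10 numeric features).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=2,
    random_state=0,
)
X_scaled = StandardScaler().fit_transform(X)

# Correlation analysis: absolute Pearson correlation of each feature with the target.
correlations = np.array(
    [abs(np.corrcoef(X_scaled[:, j], y)[0, 1]) for j in range(X_scaled.shape[1])]
)
ranked_features = np.argsort(correlations)[::-1]  # most correlated first
top_k = ranked_features[:5]                       # indices of the 5 strongest features

# Optional dimensionality reduction: keep enough components for 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
```

Correlation ranking only captures linear, one-feature-at-a-time relationships, so in practice it would be combined with domain knowledge as the step above suggests.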
3. Model Training:
Implement the Support Vector Machine algorithm using a suitable library (e.g., scikit-learn
in Python).
Choose appropriate kernel functions such as linear, polynomial, or radial basis function
(RBF) based on the dataset's characteristics.
Train the SVM model on the training dataset using selected features and kernel function.
Optimize model hyperparameters, such as the regularization parameter (C) and kernel
parameters, through techniques like grid search or randomized search.
Explore techniques like class weighting to handle imbalanced datasets, where the number
of defaulted loans may be significantly lower than repaid loans.
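The training step above could look roughly like the following sketch, which uses scikit-learn's SVC with a grid search over the kernel, C, and gamma, plus balanced class weights. The synthetic data, the 80/20 class ratio, the parameter grid, and the F1 scoring choice are assumptions for illustration and do not reproduce the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in: roughly 80% class 0 and 20% class 1 (assumed ratio).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so each CV fold is scaled on its own training part.
# class_weight="balanced" reweights classes inversely to their frequency.
pipeline = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))

param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1],  # has no effect for the linear kernel
}

# 5-fold cross-validated grid search, scored on F1 of the minority (positive) class.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

best_model = search.best_estimator_
print(search.best_params_)
```

By default GridSearchCV refits the best configuration on the whole training set, so best_model can be used directly for prediction on the held-out test data.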
4. Model Evaluation:
Evaluate the trained SVM model's performance using the testing dataset.
Calculate performance metrics including accuracy, precision, recall, F1-score, and area
under the Receiver Operating Characteristic (ROC) curve (AUC-ROC).
Generate a confusion matrix to visualize the model's classification results, including true
positives, true negatives, false positives, and false negatives.
Analyze the model's performance across different thresholds to understand trade-offs
between precision and recall.
Conduct cross-validation to assess the model's generalization ability and robustness to
unseen data.
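The evaluation step above can be sketched with scikit-learn's metrics module. The example again uses synthetic data and an RBF-kernel SVC with default-like settings, all of which are assumptions; the numbers it produces say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_recall_curve,
    precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), split 75/25 into training and testing sets.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
scores = model.decision_function(X_test)  # signed distance to the hyperplane

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, scores),
}

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

# Precision and recall at every decision threshold, for trade-off analysis.
precisions, recalls, thresholds = precision_recall_curve(y_test, scores)

# 5-fold cross-validation on the training set to estimate generalization.
cv_accuracy = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
```

The ROC AUC and the precision-recall curve are computed from decision_function scores rather than hard predictions, since both need a continuous ranking of the test samples.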
5. Deployment and Monitoring:
Deploy the trained SVM model into a production environment, integrating it into existing
loan processing systems or FinTech platforms.
Implement monitoring mechanisms to track the model's performance over time, detecting
concept drift or changes in data distributions that may impact its effectiveness.
Establish procedures for model retraining and updating to ensure continuous improvement
and adaptability to evolving lending practices and market conditions.
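One simple way to monitor for changes in data distributions, as described above, is the Population Stability Index (PSI), which compares the distribution of a feature at training time with its distribution in recent data. The sketch below is one possible approach, not the project's method: the function name, the 10-bin choice, and the simulated income values are all hypothetical.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10):
    """PSI between a reference sample and a current sample of one feature."""
    # Interior bin edges come from reference quantiles, so each bin holds
    # roughly the same share of the reference data.
    interior_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(interior_edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(interior_edges, current), minlength=n_bins)
    # Floor the fractions to avoid log(0) and division by zero for empty bins.
    ref_frac = np.clip(ref_counts / len(reference), 1e-6, None)
    cur_frac = np.clip(cur_counts / len(current), 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train_income = rng.normal(5000, 1500, size=2000)    # hypothetical training-time feature
same_income = rng.normal(5000, 1500, size=2000)     # new data, same distribution
shifted_income = rng.normal(6500, 1500, size=2000)  # new data after a shift in the mean

psi_stable = population_stability_index(train_income, same_income)
psi_shifted = population_stability_index(train_income, shifted_income)
print(f"stable: {psi_stable:.3f}, shifted: {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift that may justify retraining; these cutoffs are conventions rather than guarantees and would need tuning for a real lending portfolio.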
By following this methodology, financial institutions can effectively leverage the Support
Vector Machine algorithm for loan status prediction, enabling them to make data-driven
decisions and manage credit risk more efficiently.
CHAPTER 5
CODING AND TESTING
CHAPTER 6
SCREENSHOTS AND RESULTS
CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS
Conclusion:
In conclusion, the loan status prediction project using the Support Vector Machine (SVM)
algorithm presents a promising approach to addressing the challenges faced by financial
institutions in managing credit risk and making informed lending decisions. Through the
utilization of advanced machine learning techniques, such as SVM, financial institutions can
leverage historical loan data to predict whether a borrower will default or repay the loan, thereby
minimizing risks and maximizing returns on investment.
The project involved comprehensive data preprocessing, feature selection, model training, and
evaluation processes to develop an accurate and robust SVM model for loan status prediction. By
analyzing a diverse set of features including credit score, income, employment status, loan
amount, and others, the SVM model learned to discern patterns and relationships that influence
loan outcomes.
Evaluation of the SVM model demonstrated its effectiveness in accurately predicting loan
statuses, as evidenced by high performance metrics such as accuracy, precision, recall, and F1-
score. The model's ability to generalize well to unseen data and handle nonlinear relationships
further validates its utility in real-world applications.
Future Enhancements:
While the current project lays a solid foundation for loan status prediction using SVM, several
avenues for future enhancements and research directions exist:
Integration of Additional Data Sources: Incorporating alternative data sources such as social
media activity, transaction history, and behavioral data could enrich the feature set and improve
prediction accuracy.
Ensemble Learning Techniques: Exploring ensemble learning methods, whether by combining
multiple SVM models or by pairing SVM with other classifiers such as random forests or
gradient boosting, could potentially enhance prediction performance.
Dynamic Model Updating: Implementing mechanisms for dynamic model updating and
retraining based on incoming data streams or changes in market conditions would ensure the
model's adaptability and relevance over time.
Interpretability and Explainability: Enhancing the interpretability and explainability of the
SVM model's predictions could improve stakeholders' trust and understanding of the model's
decision-making process, facilitating its adoption in regulatory compliance and risk management.
Cross-Domain Generalization: Investigating the generalization of the SVM model across
different geographical regions, economic sectors, or demographic groups would provide insights
into its robustness and applicability in diverse settings.
Ethical and Fairness Considerations: Addressing potential biases in the training data and
model predictions to ensure fairness and equity in lending decisions is crucial for mitigating
unintended consequences and promoting responsible AI deployment in the financial industry.
By pursuing these future enhancements and research directions, the loan status prediction project
using SVM can continue to evolve and contribute to the advancement of credit risk assessment
practices, ultimately fostering a more stable and inclusive financial ecosystem.