Loan Prediction for CS Students

This document describes a project using an SVM algorithm to predict loan repayment status. It discusses preprocessing data, training and testing an SVM model, and evaluating model performance. The goal is to help financial institutions make more informed lending decisions to reduce risks and maximize profits.

Uploaded by

pulkitkhanna310


LOAN STATUS

PREDICTION

A MINI PROJECT REPORT

18CSC305J - ARTIFICIAL INTELLIGENCE

Submitted by

Pranay Kaistha [RA2111030010123]
Pulkit Khanna [RA2111030010113]

Under the guidance of

Dr. Deepa Natesan

Assistant Professor, Department of Computer Science and Engineering

in partial fulfillment for the award of the degree

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE & ENGINEERING

FACULTY OF ENGINEERING AND TECHNOLOGY

S.R.M. Nagar, Kattankulathur, Chengalpattu District

MAY 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that this mini project report titled “LOAN STATUS PREDICTION” is the
bonafide work of “Pranay Kaistha [RA2111030010123], Pulkit Khanna
[RA2111030010113]”, who carried out the mini project under my supervision. Certified further,
that to the best of my knowledge, the work reported herein does not form any other project report
or dissertation on the basis of which a degree or award was conferred on an earlier occasion on
this or any other candidate.

Dr. Deepa Natesan
Assistant Professor
Department of Networking and Communications

Dr. Annapurani Panaiyappan .K
HEAD OF THE DEPARTMENT
Department of Networking and Communications

ABSTRACT

With the proliferation of financial technology (FinTech) and the increasing accessibility of credit,
the need for accurate loan status prediction has become paramount for financial institutions. The
ability to predict whether a loan applicant will default or repay the loan plays a crucial role in
minimizing risks and maximizing profits. In this study, we propose the utilization of the Support
Vector Machine (SVM) algorithm for loan status prediction.

Support Vector Machine is a powerful supervised learning algorithm known for its effectiveness
in classification tasks. It works by finding the optimal hyperplane that best separates the different
classes in the feature space. SVM has been widely used in various domains due to its ability to
handle high-dimensional data and its flexibility in handling nonlinear relationships. The dataset
used in this study consists of historical loan data, including features such as applicant's credit
score, income, employment status, loan amount, loan term, and other relevant financial attributes.
The target variable is the loan status, categorized as either "default" or "repaid."

The first step in the process involves data preprocessing, including handling missing values,
feature scaling, and encoding categorical variables. Subsequently, the dataset is divided into
training and testing sets to evaluate the performance of the SVM model.

The SVM algorithm is then trained on the training dataset to learn the underlying patterns and
relationships between the features and the loan status. During the training phase, the algorithm
adjusts its parameters to find the optimal hyperplane that maximizes the margin between the
classes while minimizing classification errors. Once the SVM model is trained, it is evaluated
using the testing dataset to assess its predictive performance.

Performance metrics such as accuracy, precision, recall, and F1-score are calculated to measure
the model's ability to correctly classify loan statuses. Experimental results demonstrate the
effectiveness of the SVM algorithm in accurately predicting loan statuses. The model achieves
high accuracy and robustness, indicating its potential utility in real-world applications.
Additionally, the SVM model provides insights into the most influential features that contribute
to loan repayment or default, enabling financial institutions to make more informed lending
decisions.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM ARCHITECTURE AND DESIGN
3.1 Work flow diagram of Loan Status Prediction project using SVM
4 METHODOLOGY
4.1 Methodological Steps
5 CODING AND TESTING
5.1 Importing the Dependencies
5.2 Data Collection and Processing
6 SCREENSHOTS AND RESULTS
6.1 Data Visualization
6.2 Train Test Split
6.3 Training the model
6.4 Model Evaluation
6.5 Making a predictive system
7 CONCLUSION AND FUTURE ENHANCEMENT
7.1 Conclusion
7.2 Future Enhancement

LIST OF FIGURES

3.1 Work Flow Diagram Of SVM
5.1 Importing The Dependencies
5.2 Data Collection and Processing
6.1 Data Visualization
6.2 Train Test Split
6.3 Training the model
6.4 Model Evaluation
6.5 Making a predictive system

ABBREVIATIONS

SVM: Support Vector Machine

FinTech: Financial Technology
ML: Machine Learning
CV: Cross-Validation
LTV: Loan-to-Value Ratio
APR: Annual Percentage Rate

CHAPTER 1
INTRODUCTION

In today's financial landscape, the ability to accurately predict the status of loans is of paramount
importance for financial institutions. The increasing complexity of financial transactions, coupled
with the rising demand for credit, has made loan status prediction a critical task for mitigating
risks and ensuring profitability. Machine Learning (ML) algorithms, particularly Support Vector
Machine (SVM), have emerged as powerful tools in addressing this challenge.

Support Vector Machine is a supervised learning algorithm that excels in classification tasks by
finding the optimal hyperplane to separate data points into different classes. Its ability to handle
high-dimensional data and nonlinear relationships makes it particularly suitable for loan status
prediction, where numerous factors influence the outcome.

This project aims to leverage the SVM algorithm to predict the status of loans, that is, whether
they will be repaid or will default, based on a set of relevant features extracted from historical
loan data. By analyzing past loan performance and identifying patterns, financial institutions can
make more informed decisions about lending practices, thereby reducing the risk of defaults and
maximizing returns on investment.

The dataset utilized in this project encompasses various attributes, including, but not limited to,
the applicant's credit score, income, employment status, loan amount, loan term, and other financial
indicators. These features serve as input variables for the SVM model, which learns from
historical data to classify new loan applications as likely to be repaid or likely to default.

The project begins with data preprocessing, including handling missing values, feature scaling,
and encoding categorical variables, to ensure the quality and compatibility of the dataset with the
SVM algorithm. Subsequently, the dataset is divided into training and testing sets, with the
former used to train the SVM model and the latter employed to evaluate its performance.

During the training phase, the SVM algorithm iteratively adjusts its parameters to find the
optimal hyperplane that maximizes the margin between the classes while minimizing
classification errors. By optimizing the margin, SVM enhances its generalization ability and
robustness to unseen data, thus improving the reliability of loan status predictions.
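
For reference, the trade-off described above between a wide margin and few classification errors is commonly written as the soft-margin SVM optimization problem. The notation below is the standard textbook form and is an assumption, since the report does not state a formula: training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, slack variables $\xi_i$, and regularization parameter $C$.

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

The margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert^{2}$ widens the margin, while the term weighted by $C$ penalizes points that fall inside the margin or on the wrong side of the hyperplane.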

CHAPTER 2
LITERATURE SURVEY

1. "Loan Default Prediction Using Support Vector Machines: A Case Study in Brazilian
Banking Industry" (By: Fabrizio León-Alberto, Carlos Loza-García, José Prado-Gasco,
2016)
This study explores the application of Support Vector Machines (SVM) in predicting loan
default in the Brazilian banking industry. It investigates the performance of SVM
compared to other traditional machine learning algorithms. The research highlights the
effectiveness of SVM in accurately predicting loan default, showcasing its potential as a
robust tool for risk assessment in the banking sector.

2. "A Support Vector Machine for Credit Scoring: A Case Study" (By: Min-Je Sung,
Seung-Seok Choi, Myung-Ho Kim, 2005)
This paper presents a case study on using Support Vector Machines for credit scoring,
focusing on predicting whether a loan applicant will default or not. The study demonstrates
the superior performance of SVM compared to traditional statistical methods like logistic
regression and decision trees. It provides insights into the feature selection process and
model evaluation techniques for optimizing SVM performance in credit scoring tasks.

3. "Credit Risk Assessment Using a Support Vector Machine" (By: Hoang Pham,
Mohammad Saad, 2007)
In this research, the authors investigate the use of Support Vector Machines for credit risk
assessment, specifically focusing on predicting the probability of default for individual
borrowers. The study examines different kernel functions and parameter settings to
optimize SVM performance. It concludes that SVM offers a viable alternative to
traditional credit scoring models, particularly in handling nonlinear relationships and high-
dimensional data.

4. "A Comparative Study of Credit Risk Assessment Using Machine Learning
Algorithms" (By: Yanhong Sun, Ching-Hsue Cheng, 2018)
This study compares the performance of various machine learning algorithms, including
Support Vector Machines, Random Forest, and Gradient Boosting, for credit risk
assessment. It evaluates the models based on accuracy, precision, recall, and F1-score,
using a real-world credit dataset. The findings suggest that SVM achieves competitive
performance in predicting loan default, indicating its suitability for credit risk assessment
tasks.

5. "Loan Default Prediction Using Support Vector Machines with Feature Selection"
(By: Yilong Jiang, Xiaoping Yang, 2018)
This paper proposes a feature selection approach integrated with Support Vector Machines
for loan default prediction. It investigates the impact of feature selection on SVM
performance and compares it with traditional SVM models. The study demonstrates that
feature selection improves the predictive accuracy of SVM, particularly when dealing with
high-dimensional and redundant data.

6. "A Support Vector Machine Approach to Credit Scoring and Default Prediction"
(By: Frank C. Lee, Lin Xu, Gwo-Hshiung Tzeng, 2009)
This research presents a Support Vector Machine approach to credit scoring and default
prediction, focusing on improving the interpretability and robustness of SVM models. It
introduces a novel feature weighting method to enhance the discriminative power of SVM
for credit risk assessment. The study showcases the effectiveness of SVM in handling
imbalanced datasets and achieving high prediction accuracy for loan default.

These studies collectively underscore the significance of the Support Vector Machine
algorithm in loan status prediction and credit risk assessment tasks. They demonstrate its
effectiveness in handling complex financial data and nonlinear relationships, and in achieving
competitive performance compared to traditional statistical methods. Additionally, they
provide insights into model optimization techniques, feature selection methods, and
evaluation metrics for enhancing SVM-based loan status prediction models.

CHAPTER 3
SYSTEM ARCHITECTURE AND DESIGN

CHAPTER 4
METHODOLOGY

1. Data Collection and Preprocessing:

Gather historical loan data from relevant sources, including borrower information, loan
terms, repayment history, and loan status (defaulted or repaid).
Preprocess the data by handling missing values, outliers, and inconsistencies.
Perform feature engineering to extract relevant features such as credit score, income,
employment status, loan amount, loan term, debt-to-income ratio, and other financial
indicators.
Encode categorical variables using techniques like one-hot encoding or label encoding.
Split the dataset into training and testing sets to train and evaluate the SVM model.
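
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using pandas and scikit-learn, not the report's actual code: the toy data, the column names (`credit_score`, `income`, `employment_status`, `loan_amount`, `loan_status`) and the helper `preprocess` are all hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the column names are illustrative, not the report's dataset.
raw = pd.DataFrame({
    "credit_score": [720, 650, None, 580, 700, 610, 690, 540],
    "income": [5200, 3100, 4000, None, 6100, 2800, 4500, 2300],
    "employment_status": ["salaried", "self_employed", "salaried", None,
                          "salaried", "unemployed", "self_employed", "unemployed"],
    "loan_amount": [150, 120, 200, 90, 250, 80, 130, 110],
    "loan_status": ["repaid", "repaid", "repaid", "default",
                    "repaid", "default", "repaid", "default"],
})


def preprocess(df):
    """Impute missing values and encode categoricals; returns (features, target)."""
    df = df.copy()
    numeric_cols = ["credit_score", "income", "loan_amount"]
    # Fill numeric gaps with the column median.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    # Fill categorical gaps with the most frequent category.
    most_common = df["employment_status"].mode()[0]
    df["employment_status"] = df["employment_status"].fillna(most_common)
    # Encode the target as 1 = default, 0 = repaid.
    y = (df["loan_status"] == "default").astype(int)
    # One-hot encode the categorical feature.
    X = pd.get_dummies(df.drop(columns="loan_status"), columns=["employment_status"])
    return X, y


X, y = preprocess(raw)

# Stratify so both classes appear in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```

The sketch follows the order given in the text (preprocess, then split). In a stricter setup the imputation statistics would be computed on the training set only, so that no information from the test set leaks into training.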

2. Feature Selection:
Conduct exploratory data analysis (EDA) to identify key features that significantly impact
loan status.
Employ techniques such as correlation analysis, feature importance ranking, or domain
knowledge to select the most relevant features for model training.
Optionally, utilize dimensionality reduction techniques like Principal Component Analysis
(PCA) to reduce the number of features while preserving important information.
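
As a rough illustration of the correlation analysis and PCA mentioned above, the sketch below ranks features by absolute Pearson correlation with the target and then projects the standardized features onto two principal components. The synthetic data and the feature names are assumptions made only for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic, hypothetical features: only credit_score drives the target here.
credit_score = rng.normal(650, 50, n)
income = rng.normal(4000, 800, n)
noise = rng.normal(0, 1, n)
default = (credit_score + rng.normal(0, 30, n) < 630).astype(int)

X = np.column_stack([credit_score, income, noise])
names = ["credit_score", "income", "noise"]

# Correlation analysis: absolute Pearson correlation of each feature with the target.
corr = {name: abs(np.corrcoef(X[:, j], default)[0, 1]) for j, name in enumerate(names)}
ranked = sorted(corr, key=corr.get, reverse=True)

# Optional dimensionality reduction: standardize first, then keep two components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(StandardScaler().fit_transform(X))
explained = pca.explained_variance_ratio_.sum()
```

Pearson correlation only captures linear association, so a feature with a low score here may still matter to a nonlinear SVM kernel; the ranking is a screening aid rather than a final selection.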

3. Model Training:
Implement the Support Vector Machine algorithm using a suitable library (e.g., scikit-learn
in Python).
Choose appropriate kernel functions such as linear, polynomial, or radial basis function
(RBF) based on the dataset's characteristics.
Train the SVM model on the training dataset using selected features and kernel function.
Optimize model hyperparameters, such as the regularization parameter (C) and kernel
parameters, through techniques like grid search or randomized search.
Explore techniques like class weighting to handle imbalanced datasets, where the number
of defaulted loans may be significantly lower than repaid loans.
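
A minimal sketch of the training step with scikit-learn is shown below, assuming synthetic data from `make_classification` in place of the loan dataset. It combines feature scaling, an `SVC` with `class_weight="balanced"` for the class imbalance, and a grid search over kernel, `C` and `gamma`; the grid values are illustrative choices, not the report's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the loan data: roughly 20% of samples in the positive class.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is re-fitted on each cross-validation fold.
pipeline = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))

# Illustrative grid; gamma is ignored by the linear kernel.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_predictions = best_model.predict(X_test)
```

Scoring on F1 rather than accuracy is a deliberate choice here: with few defaults, a model that predicts "repaid" for everyone can look accurate while being useless.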
4. Model Evaluation:
Evaluate the trained SVM model's performance using the testing dataset.
Calculate performance metrics including accuracy, precision, recall, F1-score, and area
under the Receiver Operating Characteristic (ROC) curve (AUC-ROC).
Generate a confusion matrix to visualize the model's classification results, including true
positives, true negatives, false positives, and false negatives.
Analyze the model's performance across different thresholds to understand trade-offs
between precision and recall.
Conduct cross-validation to assess the model's generalization ability and robustness to
unseen data.
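
The evaluation steps listed above could be computed along these lines. This is a sketch on synthetic data, assuming scikit-learn; the SVM's `decision_function` scores stand in for class probabilities when computing AUC-ROC and the precision-recall trade-off.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_recall_curve, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the loan data (class 1 plays the role of "default").
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
scores = model.decision_function(X_test)  # signed distance to the hyperplane

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, scores),
}

# Confusion matrix entries for the binary case.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# Precision and recall at every candidate decision threshold.
precisions, recalls, thresholds = precision_recall_curve(y_test, scores)

# 5-fold cross-validation on the training data as a generalization check.
cv_f1 = cross_val_score(model, X_train, y_train, cv=5, scoring="f1")
```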
5. Deployment and Monitoring:
Deploy the trained SVM model into a production environment, integrating it into existing
loan processing systems or FinTech platforms.
Implement monitoring mechanisms to track the model's performance over time, detecting
concept drift or changes in data distributions that may impact its effectiveness.
Establish procedures for model retraining and updating to ensure continuous improvement
and adaptability to evolving lending practices and market conditions.
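
One common heuristic for the drift monitoring described above is the Population Stability Index (PSI), which compares the distribution of a feature at training time with its distribution in recent production data. The report does not name a drift metric, so PSI is an assumption here, and the helper name `population_stability_index` is hypothetical.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI of one feature between a reference sample and a newer sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges taken from quantiles of the reference sample.
    interior = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    # Assign every value to one of `bins` buckets; the outer buckets are open-ended.
    expected_counts = np.bincount(np.searchsorted(interior, expected, side="right"),
                                  minlength=bins)
    actual_counts = np.bincount(np.searchsorted(interior, actual, side="right"),
                                minlength=bins)
    # Clip to avoid log(0) when a bucket is empty.
    eps = 1e-6
    expected_frac = np.clip(expected_counts / len(expected), eps, None)
    actual_frac = np.clip(actual_counts / len(actual), eps, None)
    return float(np.sum((actual_frac - expected_frac)
                        * np.log(actual_frac / expected_frac)))


rng = np.random.default_rng(0)
training_sample = rng.normal(0, 1, 5000)   # feature values seen during training
stable_sample = rng.normal(0, 1, 5000)     # new data from the same distribution
shifted_sample = rng.normal(1, 1, 5000)    # new data after a shift in the mean

psi_stable = population_stability_index(training_sample, stable_sample)
psi_shifted = population_stability_index(training_sample, shifted_sample)
```

A frequently quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a signal to investigate or retrain; these cut-offs are conventions rather than guarantees and would need tuning for a real loan portfolio.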

By following this methodology, financial institutions can effectively leverage the Support
Vector Machine algorithm for loan status prediction, enabling them to make data-driven
decisions and manage credit risk more efficiently.

CHAPTER 5
CODING AND TESTING

CHAPTER 6
SCREENSHOTS AND RESULTS

CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS

Conclusion:

In conclusion, the loan status prediction project using the Support Vector Machine (SVM)
algorithm presents a promising approach to addressing the challenges faced by financial
institutions in managing credit risk and making informed lending decisions. Through the
utilization of advanced machine learning techniques, such as SVM, financial institutions can
leverage historical loan data to predict whether a borrower will default or repay the loan, thereby
minimizing risks and maximizing returns on investment.

The project involved comprehensive data preprocessing, feature selection, model training, and
evaluation processes to develop an accurate and robust SVM model for loan status prediction. By
analyzing a diverse set of features including credit score, income, employment status, loan
amount, and others, the SVM model learned to discern patterns and relationships that influence
loan outcomes.

Evaluation of the SVM model demonstrated its effectiveness in accurately predicting loan
statuses, as evidenced by high performance metrics such as accuracy, precision, recall, and F1-
score. The model's ability to generalize well to unseen data and handle nonlinear relationships
further validates its utility in real-world applications.

Future Enhancements:

While the current project lays a solid foundation for loan status prediction using SVM, several
avenues for future enhancements and research directions exist:

Integration of Additional Data Sources: Incorporating alternative data sources such as social
media activity, transaction history, and behavioral data could enrich the feature set and improve
prediction accuracy.

Ensemble Learning Techniques: Exploring ensemble methods, whether by combining multiple
SVM models with other classifiers or by adopting tree ensembles such as random forests or
gradient boosting, could potentially enhance prediction performance.

Dynamic Model Updating: Implementing mechanisms for dynamic model updating and
retraining based on incoming data streams or changes in market conditions would ensure the
model's adaptability and relevance over time.

Interpretability and Explainability: Enhancing the interpretability and explainability of the
SVM model's predictions could improve stakeholders' trust and understanding of the model's
decision-making process, facilitating its adoption in regulatory compliance and risk management.

Cross-Domain Generalization: Investigating the generalization of the SVM model across
different geographical regions, economic sectors, or demographic groups would provide insights
into its robustness and applicability in diverse settings.

Ethical and Fairness Considerations: Addressing potential biases in the training data and
model predictions to ensure fairness and equity in lending decisions is crucial for mitigating
unintended consequences and promoting responsible AI deployment in the financial industry.
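
As one possible starting point for the interpretability enhancement listed above, permutation importance measures how much a fitted model's score drops when a single feature is shuffled. The sketch below assumes scikit-learn and synthetic data; the feature names are hypothetical labels attached to the generated columns, of which only the first two carry signal.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical names for the synthetic columns.
feature_names = ["credit_score", "income", "loan_amount", "loan_term"]

# With shuffle=False the first n_informative columns are the informative ones,
# so "credit_score" and "income" carry signal and the other two are pure noise.
X, y = make_classification(n_samples=400, n_features=4, n_informative=2,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# Shuffle each feature several times on held-out data and record the score drop.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Features sorted from most to least influential.
ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)
```

Because it only needs predictions, this approach works for nonlinear kernels such as RBF, where the SVM has no per-feature coefficients to inspect directly.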

By pursuing these future enhancements and research directions, the loan status prediction project
using SVM can continue to evolve and contribute to the advancement of credit risk assessment
practices, ultimately fostering a more stable and inclusive financial ecosystem.
