by
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI - 600 119
MAY - 2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of GUNNAM SRINIVASU
(Reg. No: 38110180), ALLAM SHANKAR PAVAN KALYAN (Reg. No: 38110026) who
carried out the project entitled “BANK LOAN APPROVAL DATA ANALYZE AND
PREDICTION USING DATA SCIENCE TECHNIQUE (ML)” under my supervision from
November 2021 to April 2022.
We, GUNNAM SRINIVASU and ALLAM SHANKAR PAVAN KALYAN, hereby declare that the Project Report entitled BANK LOAN APPROVAL DATA ANALYZE AND PREDICTION USING DATA SCIENCE TECHNIQUE (ML), done by us under the guidance of Dr. M. Maheswari, M.E., Ph.D. (Internal), is submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in Computer Science and Engineering.
I convey my thanks to Dr. T. Sasikala, M.E., Ph.D., Dean, School of Computing, Dr. L. Lakshmanan, M.E., Ph.D., and Dr. S. Vigneshwari, M.E., Ph.D., Head of the Department, Department of Computer Science and Engineering, for providing me the necessary support and details at the right time during the progressive reviews.
I would like to express my sincere and deep sense of gratitude to my Project Guide, Dr. M. Maheswari, M.E., Ph.D., whose valuable guidance, suggestions and constant encouragement paved the way for the successful completion of my project work.
I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many ways for
the completion of the project.
ABSTRACT
TABLE OF CONTENTS

LIST OF FIGURES

1 INTRODUCTION
  1.1 Objective of the project
    1.1.1 Necessity
    1.1.2 Software development method
    1.1.3 Layout of the document
  1.2 Overview of the designed project
2 LITERATURE SURVEY
3 AIM AND SCOPE OF THE PRESENT INVESTIGATION
  3.6.5 Advantages
  3.8 Flow chart
4 EXPERIMENTAL OR MATERIALS AND METHODS; ALGORITHMS USED
  4.4.6 Flask
  4.5 Modules
5 RESULTS AND DISCUSSION, PERFORMANCE ANALYSIS
6 SUMMARY AND CONCLUSION
  6.1 Summary
  6.2 Conclusion
  6.3 Future Work
REFERENCES
APPENDIX
  A. SOURCE CODE
  B. SCREENSHOTS
LIST OF FIGURES

Fig. 4.1 System Architecture
Fig. 4.8 Random Forest Classifier
Fig. 4.10 Naïve Bayes Classifier
Fig. 4.11 K-Nearest Neighbor
Fig. 4.12 Support Vector Classifier
Fig. 4.13 Gradient Boost
Fig. 6.3 Inputs Page
Fig. 6.5 Loan Approved
CHAPTER-1
INTRODUCTION
1.1 OBJECTIVE OF THE PROJECT:
The goal is to develop a machine learning model for Bank Loan Approval Prediction. Several supervised classification algorithms are trained on the loan dataset and compared, and the algorithm that gives the best accuracy is used to predict the result.
1.1.1 Necessity:
This online bank loan approval system helps in overcoming time-consuming manual processing. The application is very easy to use, works accurately and smoothly in different scenarios, reduces the workload, and increases efficiency. In terms of time value, it is worthwhile. On this website the user can easily check whether the loan is approved or not.
1.2 OVERVIEW OF THE DESIGNED PROJECT:
First, we take the dataset from our source. We then perform data pre-processing and visualization to clean and explore the dataset, apply the machine learning algorithms to the dataset, generate a pickle file for the best algorithm, and use Flask as the user interface for displaying the result.
CHAPTER-2
LITERATURE SURVEY
Title : A benchmark of machine learning approaches for credit score prediction.
Author: Vincenzo Moscato, Antonio Picariello, Giancarlo Sperlí
Year : 2021
The lack of a borrower's credit history can increase risks in social lending platforms, requiring an accurate credit risk scoring. To overcome such issues, the credit risk assessment problem of financial operations is usually modeled as a binary problem on the basis of debt repayment, and proper machine learning techniques can consequently be exploited. In this paper, we propose a benchmarking study of some of the most used credit risk scoring models to predict if a loan will be repaid in a P2P platform. We deal with a class imbalance problem and leverage several classifiers among the most used in the literature, based on different sampling techniques. A real social lending platform (Lending Club) dataset, composed of 877,956 samples, has been used to perform the experimental analysis, considering different evaluation metrics (i.e. AUC, Sensitivity, Specificity) and comparing the obtained outcomes with the state-of-the-art approaches. Finally, the three best approaches have also been evaluated in terms of their explainability by means of different eXplainable Artificial Intelligence (XAI) tools.
Title : An Approach for Prediction of Loan approval using Machine Learning Algorithm.
Author: Mohammad Ahmad Sheikh, Amit Kumar Goel, Tapas Kumar
Year : 2020
In our banking system, banks have many products to sell, but the main source of income of any bank is its credit line, as banks earn from the interest on the loans they grant. A bank's profit or loss depends to a large extent on loans, i.e. whether the customers are paying back the loan or defaulting. By predicting the loan defaulters, the bank can reduce its Non-Performing Assets. This makes the study of this phenomenon very important. Previous research in this area has shown that there are many methods to study the problem of controlling loan default. But as the right predictions are very important for the maximization of profits, it is essential to study the nature of the different methods and their comparison. A very important approach in predictive analytics is used to study the problem of predicting loan defaulters: the logistic regression model. The data is collected from Kaggle for study and prediction. Logistic regression models have been fitted and the different measures of performance are computed. The models are compared on the basis of performance measures such as sensitivity and specificity. The final results show that the models produce different results. The model is marginally better because it includes variables (personal attributes of the customer like age, purpose, credit history, credit amount, credit duration, etc.) other than checking account information (which shows the wealth of a customer) that should be taken into account to calculate the probability of default on a loan correctly. Therefore, by using a logistic regression approach, the right customers to be targeted for granting a loan can be easily detected by evaluating their likelihood of default. The model concludes that a bank should not only target rich customers for granting loans but should also assess the other attributes of a customer, which play a very important part in credit granting decisions and in predicting loan defaulters.
Title : Predict Loan Approval in Banking System Machine Learning Approach for
Cooperative Banks Loan Approval.
Author: Amruta S. Aphale, Dr. Sandeep R. Shinde.
Year : 2020
In today's world, taking loans from financial institutions has become a very common phenomenon. Every day a large number of people apply for loans, for a variety of purposes. But all these applicants are not reliable, and everyone cannot be approved. Every year, we read about a number of cases where people do not repay the bulk of the loan amount to the banks, due to which the banks suffer huge losses. The risk associated with making a decision on loan approval is immense. So the idea of this project is to gather loan data from multiple data sources and use various machine learning algorithms on this data to extract important information. This model can be used by organizations in making the right decision to approve or reject the loan request of a customer. In this paper, we examine real bank credit data and apply several machine learning algorithms to the data to determine the creditworthiness of customers, in order to build an automated bank risk assessment system.
Year : 2021
With the growth of the financial sector, many individuals apply for bank loans, but a bank has limited assets which it can grant to a limited number of people, so finding out to whom the loan can be granted safely is a typical process for the bank. In this work we try to reduce this risk factor by selecting the safe applicants, in order to save a lot of bank effort and assets. This is done by mining the data of past records of the people to whom the loan was granted before; based on these records/experiences, the machine was trained using a machine learning model which gives the most accurate result. The main objective of this paper is to predict whether granting the loan to a particular person will be safe or not. This paper is divided into four sections: (i) data collection, (ii) comparison of machine learning models on the collected data, (iii) training of the system on the most promising model, and (iv) testing.
Title : Modern Approach for Loan Sanctioning in Banks Using Machine Learning
Author: Golak Bihari Rath, Debasish Das, BiswaRanjan Acharya
Year : 2021
Loan analysis is a process adopted by banks to check the credibility of loan applicants who can pay back the sanctioned loan amount within the regulations and loan term mentioned by the bank. Most banks use their common recommended procedure of credit scoring and background check techniques to analyze the loan application and to make decisions on loan approval. This is overall a risk-oriented and time-consuming process. In some cases, people suffer through financial problems, while some intentionally try to commit fraud. As a result, such delay and default in payment by the loan applicants can lead to loss of capital for the banks. Hence, to overcome this, banks need to adopt a better procedure to find trustworthy applicants for granting loans, from the list of all applicants, who can pay their loan amount in the stipulated time. In the modern-day age and advance of technology, we adopt a machine learning approach to reduce the risk factor and human errors in the loan sanction process and determine whether an applicant is eligible for loan approval or not. Here, we examine various features such as applicant income, credit history, and education from past records of loan applicants irrespective of their loan sanction, and the best features which have a direct impact on the outcome of loan approval are determined and selected.
CHAPTER-3
AIM AND SCOPE OF THE PRESENT INVESTIGATION
3.1.1 Mission:
Online web-based machine learning applications are very popular and well known to everyone. Nowadays everybody wants to use and work with them. Loan prediction is mostly useful for bank employees in approving loan applications. This simple method gives fast and accurate results when approving a customer's application.
3.1.2 Goal:
The goal is to develop a machine learning model for Bank Loan
Approval Prediction.
3.4 EXISTING SYSTEM:
Anomaly detection relies on individuals’ behavior profiling and works by
detecting any deviation from the norm. When used for online banking fraud detection,
however, it mainly suffers from three disadvantages. First, for an individual, the
historical behavior data are often too limited to profile his/her behavior pattern. Second, due to the heterogeneous nature of transaction data, there is no uniform treatment of different kinds of attribute values, which becomes a potential barrier for model development and further usage.
Third, the transaction data are highly skewed, and it becomes a challenge to
utilize the label information effectively. Anomaly detection often suffers from poor
generalization ability and a high false alarm rate. We argue that individuals’ limited
historical data for behavior profiling and the highly skewed nature of fraud data could
account for this defect. Although it is straightforward to use information from other similar individuals, measuring similarity itself becomes a great challenge due to heterogeneous
attribute values. We propose to transform the anomaly detection problem into a
pseudo-recommender system problem and solve it with an embedding based method.
By doing so, the idea of collaborative filtering is implicitly used to utilize information from
similar users, and the learned preference matrices and attribute embedding provide a
concise way for further usage.
3.4.1 Disadvantages:
1. They proposed a mathematical model; machine learning algorithms were not used.
2. The class imbalance problem was not addressed and proper measures were not taken.
This dataset contains 665 records of features extracted from bank loan data, which were then classified into 2 classes:
• Approve
• Reject
3.6 PROPOSED SYSTEM:
[Block diagram: Bank Loan Dataset → Training Algorithm → Classification ML Model]
3.6.5 Advantages:
➢ Performance and accuracy of the algorithms can be calculated and
compared.
➢ Class imbalance can be dealt with machine learning approaches.
3.8 FLOW CHART:
CHAPTER-4
EXPERIMENTAL OR MATERIALS AND METHODS
ALGORITHMS USED
Data analysis techniques are used to (a) extract useful information from raw data, (b) build an understanding of underlying phenomena in the form of a model, (c) predict future values of a phenomenon using the above-generated model, and (d) detect anomalous behavior exhibited by a phenomenon under observation.
4.4.1 NUMPY LIBRARY
NumPy is an open-source numerical Python library. NumPy contains multi-dimensional array and matrix data structures. It can be utilized to perform a number of mathematical operations on arrays, such as trigonometric, statistical, and algebraic routines like mean, mode, standard deviation, etc.
Installation- (https://numpy.org/install/)
Here we use pandas for reading the CSV files, for grouping the data, and for cleaning the data using some operations.
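As a brief illustration, the following is a minimal sketch (assuming the Loan_Train.csv file used in Appendix A, with its Education and ApplicantIncome columns) of pandas reading and grouping the data while NumPy supplies the statistical routines:

import numpy as np
import pandas as pd

df = pd.read_csv("Loan_Train.csv")                          # read the csv file
print(df.groupby("Education")["ApplicantIncome"].mean())    # group the data

income = df["ApplicantIncome"].to_numpy()                   # pandas column -> NumPy array
print(np.mean(income), np.median(income), np.std(income))   # statistical routines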
pip install Matplotlib
Installation-(https://scikit-learn.org/stable/install.html)
4.4.6 FLASK
Flask is a Python framework that allows us to build web applications. It was developed by Armin Ronacher. Flask's framework is more explicit than Django's framework and is also easier to learn, because it has less base code needed to implement a simple web application.
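A minimal sketch of a Flask application is shown below; the project's full App.py is listed in Appendix A.

from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    # A single route returning plain text; render_template is used in App.py.
    return "Loan approval prediction service is running."

if __name__ == "__main__":
    app.run(debug=True)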
4.5 MODULES:
A modular design reduces complexity, facilitates change (a critical aspect of software maintainability), and results in easier implementation by encouraging parallel development of different parts of the system. Software with effective modularity is easier to develop because functions may be compartmentalized and interfaces are simplified. Software architecture embodies modularity; that is, software is divided into separately named and addressable components, called modules, that are integrated to satisfy problem requirements.
Modularity is the single attribute of software that allows a program to be intellectually manageable. The five important criteria that enable us to evaluate a design method with respect to its ability to define an effective modular design are: modular decomposability, modular composability, modular understandability, modular continuity, and modular protection.
Fig:4.1: SYSTEM ARCHITECTURE
Use case diagrams are considered for high-level requirement analysis of a system. When the requirements of a system are analyzed, the functionalities are captured in use cases. So it can be said that use cases are nothing but the system functionalities written in an organized manner.
4.6.3 Activity Diagram
4.6.4 Sequence Diagram
4.6.5 Entity Relationship Diagram (ERD)
The following are the modules of the project, which are planned to complete the project with respect to the proposed system, while overcoming the limitations of the existing system and also providing support for future enhancement.
4.7 MODULE DETAILS:
4.7.1 Data Pre-processing
Validation techniques in machine learning are used to get the error rate of the Machine Learning (ML) model, which can be considered as close to the true error rate of the dataset. If the data volume is large enough to be representative of the population, you may not need the validation techniques. However, in real-world scenarios, we often work with samples of data that may not be truly representative of the population of the given dataset. We find missing values and duplicate values, and describe the data type of each column, whether it is a float or an integer. The validation set is a sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.
The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration. The validation set is used to evaluate a given model frequently, and machine learning engineers use this data to fine-tune the model hyperparameters. Data collection, data analysis, and the process of addressing data content, quality, and structure can add up to a time-consuming to-do list. The process of data identification helps to understand your data and its properties; this knowledge will help you choose which algorithm to use to build your model.
We perform a number of data cleaning tasks using the Python pandas library, focusing on probably the biggest data cleaning task: missing values. This lets us clean data more quickly, spend less time cleaning data, and more time exploring and modeling.
Some missing values are just simple random mistakes. Other times, there can be a deeper reason why data is missing. It is important to understand these different types of missing data from a statistics point of view. The type of missing data will influence how to detect and fill in the missing values, and which basic imputation or detailed statistical approach to use for dealing with missing data. Before jumping into code, it is important to understand the sources of missing data. Here are some typical reasons why data is missing:
• User forgot to fill in a field.
• Data was lost while transferring manually from a legacy database.
• There was a programming error.
• Users chose not to fill out a field tied to their beliefs about how the results
would be used or interpreted.
The steps and techniques for data cleaning will vary from dataset to dataset. The
primary goal of data cleaning is to detect and remove errors and anomalies to
increase the value of data in analytics and decision making.
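The sketch below is a minimal example of these checks, assuming the Loan_Train.csv dataset of Appendix A (its LoanAmount and Gender columns may contain missing entries):

import pandas as pd

df = pd.read_csv("Loan_Train.csv")

print(df.dtypes)              # data type of each column (float, integer, object)
print(df.isnull().sum())      # number of missing values per column
print(df.duplicated().sum())  # number of duplicate rows

# Basic imputation: median for a numeric column, mode for a categorical one.
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])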
MODULE DIAGRAM
➢ How to use plots in Python to better understand your own data.
➢ How to chart time series data with line plots and categorical quantities with bar charts.
➢ How to summarize data distributions with histograms and box plots.
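As a minimal sketch (again assuming the Loan_Train.csv dataset of Appendix A), the chart types above can be produced as follows:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Loan_Train.csv")

df["ApplicantIncome"].plot(kind="hist", bins=30, title="Applicant income")   # histogram
plt.show()
df["ApplicantIncome"].plot(kind="box", title="Applicant income spread")      # box plot
plt.show()
df["Loan_Status"].value_counts().plot(kind="bar", title="Loan status")       # bar chart
plt.show()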
MODULE DIAGRAM
Before comparing the algorithms, we build machine learning models using the scikit-learn library. From this package we use the preprocessing utilities, the logistic regression linear model, KFold cross-validation, the random forest ensemble method, and the decision tree classifier. Additionally, we split the data into a train set and a test set, and predict the result by comparing accuracy.
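A minimal sketch of this comparison is given below; scikit-learn's make_classification is used here only as a stand-in for the preprocessed loan features of Appendix A.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}
kfold = KFold(n_splits=5, shuffle=True, random_state=1)   # KFold cross-validation
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=kfold)       # accuracy on each fold
    print(f"{name}: mean accuracy {scores.mean():.3f}")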
False Positives (FP): a person who will pay is predicted as a defaulter; the actual class is no and the predicted class is yes. E.g. the applicant actually repays the loan, but the model predicts a default.
False Negatives (FN): a person who defaults is predicted as a payer; the actual class is yes but the predicted class is no. E.g. the applicant actually defaults, but the model predicts repayment.
True Positives (TP): a person who will not pay is correctly predicted as a defaulter. These are the correctly predicted positive values: the value of the actual class is yes and the value of the predicted class is also yes.
True Negatives (TN): a person who will pay is correctly predicted as a payer. These are the correctly predicted negative values: the value of the actual class is no and the value of the predicted class is also no.
True Positive Rate (TPR) = TP / (TP + FN)
False Positive Rate (FPR) = FP / (FP + TN)
Accuracy calculation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy is the most intuitive performance measure; it is simply the ratio of correctly predicted observations to the total observations. One may think that if we have high accuracy then our model is best. Accuracy is a great measure, but only when you have symmetric datasets where the counts of false positives and false negatives are almost the same.
F1 Score is the weighted average of Precision and Recall. Therefore, this score takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best if false positives and false negatives have similar cost. If the costs of false positives and false negatives are very different, it is better to look at both Precision and Recall.
General Formula:
F- Measure = 2TP / (2TP + FP + FN)
F1-Score Formula:
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
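A minimal sketch computing these measures with scikit-learn is shown below, using small hand-made label vectors as a stand-in for real model predictions.

from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (1 = defaulter)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TPR =", tp / (tp + fn))                        # TP / (TP + FN)
print("FPR =", fp / (fp + tn))                        # FP / (FP + TN)
print("Accuracy =", accuracy_score(y_true, y_pred))   # (TP + TN) / (TP + TN + FP + FN)
print("F1 Score =", f1_score(y_true, y_pred))         # 2*(Recall*Precision)/(Recall+Precision)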
NumPy:
• It is a numeric python module which provides fast maths functions for
calculations.
• It is used to read data in numpy arrays and for manipulation purpose.
Pandas:
• Used to read and write different files.
• Data manipulation can be done easily with data frames.
Matplotlib:
• Data visualization is a useful way to help identify patterns in a given dataset.
• Charts such as line plots, bar charts, histograms and box plots can be produced easily.
Logistic Regression:
It is a statistical method for analyzing a data set in which there are one or more
independent variables that determine an outcome. The outcome is measured with
a dichotomous variable (in which there are only two possible outcomes). The goal
of logistic regression is to find the best fitting model to describe the relationship
between the dichotomous characteristic of interest (dependent variable =
response or outcome variable) and a set of independent (predictor or explanatory)
variables. Logistic regression is a Machine Learning classification algorithm that
is used to predict the probability of a categorical dependent variable. In logistic
regression, the dependent variable is a binary variable that contains data coded
as 1 (yes, success, etc.) or 0 (no, failure, etc.).
➢ Only the meaningful variables should be included.
➢ The independent variables should be independent of each other. That is, the model should have little or no multicollinearity.
➢ The independent variables are linearly related to the log odds.
➢ Logistic regression requires quite large sample sizes.
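A minimal sketch of training such a model with scikit-learn follows; make_classification again stands in for the loan features.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

LRclassifier = LogisticRegression(max_iter=1000)
LRclassifier.fit(X_train, y_train)
print("Accuracy:", LRclassifier.score(X_test, y_test))
print("Class probabilities:", LRclassifier.predict_proba(X_test[:1]))  # P(0) and P(1)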
MODULE DIAGRAM
Random Forest Classifier:
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forest is a type of supervised machine learning algorithm based on ensemble learning. Ensemble learning is a type of learning where you join different types of algorithms, or the same algorithm multiple times, to form a more powerful prediction model. The random forest algorithm combines multiple algorithms of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the name "Random Forest". The random forest algorithm can be used for both regression and classification tasks.
The following are the basic steps involved in performing the random forest
algorithm:
➢ Pick N random records from the dataset.
➢ Build a decision tree based on these N records.
➢ Choose the number of trees you want in your algorithm and repeat steps
1 and 2.
In case of a regression problem, for a new record, each tree in the forest predicts a value for Y (the output). The final value can be calculated by taking the average of all the values predicted by all the trees in the forest. In case of a classification problem, each tree in the forest predicts the category to which the new record belongs, and the new record is assigned to the category that wins the majority vote.
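The steps above can be sketched with scikit-learn as follows (synthetic data stands in for the bank loan dataset):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_estimators is the number of trees; each tree is built on a bootstrap sample
# (N random records) and the forest takes the majority vote of all trees.
RFclassifier = RandomForestClassifier(n_estimators=100, random_state=1)
RFclassifier.fit(X_train, y_train)
print("Accuracy:", RFclassifier.score(X_test, y_test))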
Fig:4.8: RANDOM FOREST CLASSIFIER
A decision tree breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. A decision node has two or more branches, and a leaf node represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data. A decision tree builds classification or regression models in the form of a tree structure. It utilizes an if-then rule set which is mutually exclusive and exhaustive for classification. The rules are learned sequentially using the training data, one at a time. Each time a rule is learned, the tuples covered by the rules are removed. This process is continued on the training set until a termination condition is met. The tree is constructed in a top-down recursive divide-and-conquer manner. All the attributes should be categorical; otherwise, they should be discretized in advance. Attributes at the top of the tree have more impact on the classification, and they are identified using the information gain concept. A decision tree can easily be over-fitted, generating too many branches, and may reflect anomalies due to noise or outliers.
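A minimal decision tree sketch on synthetic stand-in data; criterion="entropy" selects splits by information gain, and max_leaf_nodes limits growth to reduce the over-fitting risk noted above.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

DTclassifier = DecisionTreeClassifier(criterion="entropy", max_leaf_nodes=10, random_state=1)
DTclassifier.fit(X_train, y_train)
print("Accuracy:", DTclassifier.score(X_test, y_test))
print("Root split on feature index:", DTclassifier.tree_.feature[0])  # the best predictor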
Naive Bayes algorithm:
The Naive Bayes algorithm is an intuitive method that uses the probabilities of
each attribute belonging to each class to make a prediction. It is the supervised
learning approach you would come up with if you wanted to model a predictive
modeling problem probabilistically.
➢ Naive bayes simplifies the calculation of probabilities by assuming that
the probability of each attribute belonging to a given class value is
independent of all other attributes. This is a strong assumption but results
in a fast and effective method.
➢ The probability of a class value given a value of an attribute is called the
conditional probability. By multiplying the conditional probabilities together
for each attribute for a given class value, we have a probability of a data
instance belonging to that class. To make a prediction we can calculate
probabilities of the instance belonging to each class and select the class
value with the highest probability.
➢ Naive Bayes is a statistical classification technique based on Bayes' Theorem. It is one of the simplest supervised learning algorithms. The Naive Bayes classifier is a fast, accurate and reliable algorithm. Naive Bayes classifiers have high accuracy and speed on large datasets.
➢ Naive Bayes classifier assumes that the effect of a particular feature in a
class is independent of other features. For example, a loan applicant is
desirable or not depending on his/her income, previous loan and
transaction history, age, and location.
➢ Even if these features are interdependent, these features are still
considered independently. This assumption simplifies computation, and
that's why it is considered as naive. This assumption is called class
conditional independence.
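A minimal Gaussian Naive Bayes sketch follows (synthetic data stands in for applicant attributes such as income, credit history, and age):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

NBclassifier = GaussianNB()
NBclassifier.fit(X_train, y_train)
# predict_proba multiplies per-feature conditional probabilities under the
# class-conditional independence assumption and picks the most probable class.
print("Class probabilities:", NBclassifier.predict_proba(X_test[:1]))
print("Accuracy:", NBclassifier.score(X_test, y_test))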
Fig:4.10: NAÏVE BAYES CLASSIFIER
K-Nearest Neighbor
K-Nearest Neighbor is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique. It assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories. It stores all the available data and classifies a new data point based on this similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm. The K-NN algorithm can be used for regression as well as for classification, but mostly it is used for classification problems.
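A minimal K-NN sketch; choosing k (n_neighbors) is the main tuning decision, as the loop over k in Appendix A also shows.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A new point is assigned the majority class of its 5 nearest neighbours.
KNclassifier = KNeighborsClassifier(n_neighbors=5)
KNclassifier.fit(X_train, y_train)
print("Accuracy:", KNclassifier.score(X_test, y_test))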
Fig:4.11: K-NEAREST NEIGHBOR
Fig:4.12: SUPPORT VECTOR CLASSIFIER
Gradient Boost
The gradient boosting algorithm is one of the most powerful algorithms in the field of machine learning. The errors in machine learning algorithms are broadly classified into two categories, i.e., bias error and variance error. As gradient boosting is one of the boosting algorithms, it is used to minimize the bias error of the model.
Gradient boosting algorithm can be used for predicting not only continuous target
variable (as a Regressor) but also categorical target variable (as a Classifier).
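A minimal gradient boosting sketch with scikit-learn; each new tree is fitted to the errors of the ensemble built so far, which is how boosting reduces bias error.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_estimators sets the boosting stages; learning_rate shrinks each tree's contribution.
GBclassifier = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=1)
GBclassifier.fit(X_train, y_train)
print("Accuracy:", GBclassifier.score(X_test, y_test))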
Fig:4.13: GRADIENT BOOST
Flask was created by Armin Ronacher of Pocoo, an international group of Python enthusiasts formed in 2004. According to Ronacher, the idea was originally an April Fool's joke that was popular enough to make into a serious application. The name is a play on the earlier Bottle framework.
When Ronacher and Georg Brandl created a bulletin board system written in Python, the Pocoo projects Werkzeug and Jinja were developed.
In April 2016, the Pocoo team was disbanded and development of Flask and related libraries passed to the newly formed Pallets project.
Flask has become popular among Python enthusiasts. As of October 2020, it had the second-most stars on GitHub among Python web-development frameworks, only slightly behind Django, and it was voted the most popular web framework in the Python Developers Survey 2018.
The micro-framework Flask is part of the Pallets Projects, and is based on several others of them.
CHAPTER-5
RESULTS AND DISCUSSION, PERFORMANCE ANALYSIS
5.2 DISCUSSION:
While discussions provide avenues for exploration and discovery,
leading a discussion can be anxiety-producing: discussions are, by their nature,
unpredictable, and require us as instructors to surrender a certain degree of control over
the flow of information. Fortunately, careful planning can help us ensure that discussions
are lively without being chaotic and exploratory without losing focus. When planning a
discussion, it is helpful to consider not only cognitive, but also social/emotional, and
physical factors that can either foster or inhibit the productive exchange of ideas.
CHAPTER-6
SUMMARY AND CONCLUSION
6.1 SUMMARY:
This project's objective is to predict the loan approval status of a user. This online banking loan approval system will reduce paperwork, reduce the wastage of bank assets and effort, and also save the valuable time of the customer.
6.2 CONCLUSION:
The analytical process started from data cleaning and processing, missing value handling and exploratory analysis, and finished with model building and evaluation. The model with the highest accuracy score on the test set is selected. This application can help to predict Bank Loan Approval.
REFERENCES:
[1] Amruta S. Aphale, Sandeep R. Shinde, "Predict Loan Approval in Banking System: Machine Learning Approach for Cooperative Banks Loan Approval", International Journal of Engineering Research & Technology (IJERT), Volume 09, Issue 08, August 2020.
[2] Ashwini S. Kadam, Shraddha R. Nikam, Ankita A. Aher, Gayatri V. Shelke, Amar S. Chandgude, "Prediction for Loan Approval using Machine Learning Algorithm", International Research Journal of Engineering and Technology (IRJET), Volume 08, Issue 04, April 2021.
[3] M. A. Sheikh, A. K. Goel, T. Kumar, "An Approach for Prediction of Loan Approval using Machine Learning Algorithm", 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), 2020, pp. 490-494, doi: 10.1109/ICESC48915.2020.9155614.
[4] Golak Bihari Rath, Debasish Das, Biswaranjan Acharya, "Modern Approach for Loan Sanctioning in Banks Using Machine Learning", 2021, pp. 179-188, doi: 10.1007/978-981-15-5243-4_15.
[5] Vincenzo Moscato, Antonio Picariello, Giancarlo Sperlí, "A benchmark of machine learning approaches for credit score prediction", Expert Systems with Applications, Volume 165, 2021, 113986, ISSN 0957-4174.
[6] Yash Divate, Prashant Rana, Pratik Chavan, "Loan Approval Prediction Using Machine Learning", International Research Journal of Engineering and Technology (IRJET), Volume 08, Issue 05, May 2021.
APPENDIX:
A. SOURCE CODE:
Jupyter Notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import missingno as mso
import seaborn as sns
# Imports assumed by the cells below; they were missing from the listing.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Load the dataset and inspect the first rows.
df = pd.read_csv("Loan_Train.csv")
df.head()

# Explore a categorical column.
df.Married.value_counts(dropna=False)
sns.countplot(x="Married", data=df, palette="Paired")
plt.show()

# One-hot encode the categorical columns and drop one dummy from each pair.
df = pd.get_dummies(df)
df = df.drop(['Gender_Female', 'Married_No', 'Education_Not Graduate',
              'Self_Employed_No', 'Loan_Status_N'], axis=1)

# Remove outlier rows outside 1.5 * IQR.
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
df = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]

# Split features and target (assumed: Loan_Status_Y is the label column left
# after dropping Loan_Status_N above; X and y were not defined in the listing).
X = df.drop('Loan_Status_Y', axis=1)
y = df['Loan_Status_Y']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# K-Nearest Neighbors: test accuracy for k = 1..20.
scoreListknn = []
for i in range(1, 21):
    KNclassifier = KNeighborsClassifier(n_neighbors=i)
    KNclassifier.fit(X_train, y_train)
    scoreListknn.append(KNclassifier.score(X_test, y_test))

# Gaussian Naive Bayes.
NBclassifier2 = GaussianNB()
NBclassifier2.fit(X_train, y_train)
y_pred = NBclassifier2.predict(X_test)

# Decision Tree: test accuracy for max_leaf_nodes = 2..20.
scoreListDT = []
for i in range(2, 21):
    DTclassifier = DecisionTreeClassifier(max_leaf_nodes=i)
    DTclassifier.fit(X_train, y_train)
    scoreListDT.append(DTclassifier.score(X_test, y_test))

# Random Forest: test accuracy for max_leaf_nodes = 2..24.
scoreListRF = []
for i in range(2, 25):
    RFclassifier = RandomForestClassifier(n_estimators=1000, random_state=1,
                                          max_leaf_nodes=i)
    RFclassifier.fit(X_train, y_train)
    scoreListRF.append(RFclassifier.score(X_test, y_test))

# Serialize the trained random forest for the Flask app.
import pickle
file = open("RFclassifier.pkl", 'wb')
pickle.dump(RFclassifier, file)
Predict.html
<!doctype html>
<html lang="en">
<head><title>prediction</title></head>
<body>
<section class="text-gray-600 body-font">
  <div class="container px-5 py-24 mx-auto">
    <div class="flex flex-col text-center w-full mb-20">
      <h1 class="sm:text-3xl text-2xl font-medium title-font mb-4 text-gray-900">Loan prediction</h1>
      <p class="lg:w-2/3 mx-auto leading-relaxed text-base">Fill the form for prediction</p>
    </div>
    <a class="btn btn-primary" href="./" role="button">Back</a>
    <br><br>
    <form action='/predict' method='POST'>
      <div class="mb-3">
        <label for="gender" class="form-label">Gender</label>
        <select class="form-select" id="gender" name="gender" aria-label="Default select example">
          <option selected>-- select gender --</option>
          <option value="Male">Male</option>
          <option value="Female">Female</option>
        </select>
      </div>
      <div class="mb-3">
        <label for="ApplicantIncome" class="form-label">Enter ApplicantIncome</label>
        <input type="text" class="form-control" id="ApplicantIncome" name="ApplicantIncome" placeholder="ApplicantIncome">
      </div>
      <button type="submit" class="btn btn-primary">Predict</button>
    </form>
  </div>
</section>
</body>
</html>
LA.html (Loan Approval):
<!DOCTYPE html>
<html lang="en">
<head>
<title>Loan Approval</title>
</head>
<body>
<div class="bg-image"></div>
<div class="bg-text">
  <div class="col-md-6 my-2 d-flex align-items-end justify-content-around">
    <a href="/predict">
      <button type="submit" class="btn btn-danger button" style="margin-right: 100%;">Back</button>
    </a>
  </div>
  <h1>LOAN STATUS</h1>
  <p>You will get the approval from the bank.</p>
</div>
</body>
</html>

LR.html (Loan Rejected):
<div class="bg-text">
  <div class="col-md-6 my-2 d-flex align-items-end justify-content-around">
    <a href="/predict">
      <button type="submit" class="btn btn-danger button" style="margin-right: 100%;">Back</button>
    </a>
  </div>
  <h1>LOAN STATUS</h1>
  <p>Your details are not satisfied for loan approval.</p>
</div>
</body>
</html>
App.py:
from flask import Flask, request, render_template
import pickle
import numpy as np

app = Flask(__name__)
model = pickle.load(open('RFclassifier.pkl', 'rb'))

@app.route('/')
def home():
    return render_template("index.html")

@app.route('/predict', methods=['GET', 'POST'])
def predict():
    if request.method == 'POST':
        gender = request.form['gender']
        married = request.form['married']
        dependents = request.form['dependents']
        education = request.form['education']
        self_employed = request.form['employed']
        credit = float(request.form['credit'])
        area = request.form['area']
        ApplicantIncome = float(request.form['ApplicantIncome'])
        CoapplicantIncome = float(request.form['CoapplicantIncome'])
        LoanAmount = float(request.form['LoanAmount'])
        Loan_Amount_Term = float(request.form['Loan_Amount_Term'])

        # One-hot encode the form values to match the training dummies. The
        # original listing defined only some of these flags; the remainder are
        # reconstructed here following the same pattern.
        Gender_Male = 1 if gender == "Male" else 0
        Married_Yes = 1 if married == "Yes" else 0
        dependents_0 = 1 if dependents == '0' else 0
        dependents_1 = 1 if dependents == '1' else 0
        dependents_2 = 1 if dependents == '2' else 0
        dependents_3 = 1 if dependents == '3+' else 0
        education_graduate = 1 if education == "Graduate" else 0
        self_employed_yes = 1 if self_employed == "Yes" else 0
        semiurban = 1 if area == "Semiurban" else 0
        urban = 1 if area == "Urban" else 0
        rural = 1 if area == "Rural" else 0

        # Feature order is assumed to match the columns used at training time.
        arr = np.array([[credit, ApplicantIncome, CoapplicantIncome, LoanAmount,
                         Loan_Amount_Term, Gender_Male, Married_Yes,
                         dependents_0, dependents_1, dependents_2, dependents_3,
                         education_graduate, self_employed_yes, semiurban,
                         urban, rural]])
        prediction = model.predict(arr)
        if prediction[0] == 0:
            return render_template("LR.html")   # loan rejected
        else:
            return render_template("LA.html")   # loan approved
    else:
        return render_template("prediction.html")

if __name__ == "__main__":
    app.run(debug=True)
B. SCREENSHOTS
Fig:6.3: INPUTS PAGE
Fig:6.5: LOAN APPROVED