Credit Card Fraud Detection Using Machine Learning Techniques A Comparative Analysis

This document compares the performance of naive bayes, k-nearest neighbor, and logistic regression classifiers for detecting credit card fraud. It analyzes these techniques on a credit card transaction data set containing 284,807 records where fraud accounts for 0.172% of transactions. The data is resampled using under-sampling of legitimate transactions and over-sampling of fraudulent transactions to create more balanced distributions for analysis. Results show that k-nearest neighbor achieved the highest accuracy of 97.69%, outperforming naive bayes and logistic regression.


Credit card fraud detection using Machine Learning Techniques: A Comparative Analysis

John O. Awoyemi, Adebayo O. Adetunmbi, Samuel A. Oluwadare
Department of Computer Science, Federal University of Technology Akure, Akure, Nigeria

Abstract: Financial fraud is an ever growing menace with far-reaching consequences in the financial industry. Data mining has played an imperative role in the detection of credit card fraud in online transactions. Credit card fraud detection, which is a data mining problem, becomes challenging due to two major reasons: first, the profiles of normal and fraudulent behaviours change constantly, and secondly, credit card fraud data sets are highly skewed. The performance of fraud detection in credit card transactions is greatly affected by the sampling approach on the dataset, the selection of variables and the detection technique(s) used. This paper investigates the performance of naïve bayes, k-nearest neighbor and logistic regression on highly skewed credit card fraud data. The dataset of credit card transactions is sourced from European cardholders and contains 284,807 transactions. A hybrid technique of under-sampling and oversampling is carried out on the skewed data. The three techniques are applied on the raw and preprocessed data. The work is implemented in Python. The performance of the techniques is evaluated based on accuracy, sensitivity, specificity, precision, Matthews correlation coefficient and balanced classification rate. The results show that the optimal accuracies for the naïve bayes, k-nearest neighbor and logistic regression classifiers are 97.92%, 97.69% and 54.86% respectively. The comparative results show that k-nearest neighbour performs better than naïve bayes and logistic regression techniques.

Keywords: credit card fraud; data mining; naïve bayes; decision tree; logistic regression; comparative analysis

I. INTRODUCTION

Financial fraud is an ever growing menace with far-reaching consequences in the finance industry, corporate organizations, and government. Fraud can be defined as criminal deception with the intent of acquiring financial gain. High dependence on internet technology has led to increased credit card transactions. As credit card transactions become the most prevailing mode of payment for both online and offline transactions, the credit card fraud rate also accelerates. Credit card fraud can come as either inner card fraud or external card fraud. Inner card fraud occurs as a result of consent between cardholders and the bank, using a false identity to commit fraud, while external card fraud involves the use of a stolen credit card to get cash through dubious means. A lot of research has been devoted to the detection of external card fraud, which accounts for the majority of credit card frauds. Detecting fraudulent transactions using traditional methods of manual detection is time consuming and inefficient, and the advent of big data has made manual methods more impractical. Financial institutions have therefore focused attention on recent computational methodologies to handle the credit card fraud problem.

Data mining is one notable method used in solving the credit card fraud detection problem. Credit card fraud detection is the process of classifying transactions into two classes, legitimate (genuine) and fraudulent [1]. Credit card fraud detection is based on analysis of a card's spending behaviour. Many techniques have been applied to credit card fraud detection: artificial neural network [2], genetic algorithm [3, 4], support vector machine [5], frequent itemset mining [6], decision tree [7], migrating birds optimization algorithm [8] and naïve bayes [9]. A comparative analysis of logistic regression and naive bayes is carried out in [10]. The performance of bayesian and neural networks [11] is evaluated on credit card fraud data. Decision tree, neural networks and logistic regression are tested for their applicability in fraud detection [12]. The paper [13] evaluates two advanced data mining approaches, support vector machines and random forests, together with logistic regression, as part of an attempt to better detect credit card fraud, while neural network and logistic regression are applied to the credit card fraud detection problem in [14]. A number of challenges are associated with credit card fraud detection, namely: fraudulent behaviour profiles are dynamic, that is, fraudulent transactions tend to look like legitimate ones; credit card transaction datasets are rarely available and highly imbalanced (or skewed); optimal feature (variables) selection for the models; and a suitable metric to evaluate the performance of techniques on skewed credit card fraud data. Credit card fraud detection performance is greatly affected by the type of sampling approach used, the selection of variables and the detection technique(s) used. This study investigates the effect of hybrid sampling on performance of fraud detection of naïve bayes, k-nearest

978-1-5090-4642-3/17/$31.00 ©2017 IEEE

Authorized licensed use limited to: UNIVERSIDADE DE SAO PAULO. Downloaded on June 14,2023 at 00:30:21 UTC from IEEE Xplore. Restrictions apply.
neighbour and logistic regression classifiers on highly skewed credit card fraud data.

This paper seeks to carry out a comparative analysis of credit card fraud detection using naive bayes, k-nearest neighbor and logistic regression techniques on highly skewed data based on accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) metrics. This paper extends the handling of highly imbalanced credit card fraud data in [33]. The imbalanced dataset used in this study, which contains about 0.172% fraud transactions, is sampled in a hybrid approach. The positive class (fraud) is oversampled while the negative class (legitimate) is under-sampled by the same number of times to achieve two distributions of 34:66 and 10:90. The three techniques are applied to the data. The performance comparison of the three techniques is analyzed based on accuracy, sensitivity, specificity, Matthews Correlation Coefficient (MCC) and balanced classification rate.

The rest of this paper is organized as follows: Section II gives a detailed review of credit card fraud, feature selection, detection techniques and performance comparison. Section III describes the experimental setup, including the data pre-processing and the three classifier methods for credit card fraud detection. Section IV reports the experimental results and discusses the comparative analysis. Section V concludes the comparative study and suggests future areas of research.

II. RELATED WORKS

Classification of credit card transactions is mostly a binary classification problem. Here, a credit card transaction is either a legitimate transaction (negative class) or a fraudulent transaction (positive class). Fraud detection is generally viewed as a data mining classification problem, where the objective is to correctly classify the credit card transactions as legitimate or fraudulent [6].

A. Credit Card Fraud

Credit card frauds have been partitioned into two types: inner card fraud and external fraud [12, 15], while a broader classification has been done in three categories, that is, traditional card related frauds (application, stolen, account takeover, fake and counterfeit), merchant related frauds (merchant collusion and triangulation) and Internet frauds (site cloning, credit card generators and false merchant sites) [16]. It is reported in [17] that the total amount of fraud losses of banks and businesses around the world reached more than USD 16 billion in 2014, an increase of nearly USD 2.5 billion over the previous year's recorded losses, meaning that 5.6 cents of every USD 100 was fraudulent, the report concluded.

Credit card transactions data are mainly characterized by an unusual phenomenon. Both legitimate transactions and fraudulent ones tend to share the same profile. Fraudsters learn new ways to mimic the spending behaviour of a legitimate card (or cardholder). Thus, the profiles of normal and fraudulent behaviours are constantly changing. This inherent characteristic leads to a decrease in the number of true fraudulent cases identified in a pool of credit card transactions data, leading to a highly skewed distribution towards the negative class (legitimate transactions). The credit card data investigated in [18] contains 20% positive cases, that in [19] 0.025% positive cases and that in [8] below 0.005% positive cases. The data used in this study has the positive class (frauds) accounting for 0.172% of all transactions. A number of sampling approaches have been applied to highly skewed credit card transactions data. A random sampling approach is used in [18, 20], with experimental results indicating that a 50:50 artificial distribution of fraud/non-fraud training data generates classifiers with the highest true positive rate and a low false positive rate. The paper [8] uses stratified sampling to under-sample the legitimate records to a meaningful number. It experiments on 50:50, 10:90 and 1:99 distributions of fraud to legitimate cases and reports that the 10:90 distribution has the best performance (regarding the performance comparisons on the 1:99 set) as it is closest to the real distribution of frauds and legitimates. Stratified sampling is also applied in [21]. In this study, a hybrid of under-sampling the negative cases and oversampling the positive cases is carried out in order to preserve valuable patterns from the data.

B. Feature (Variables) selection

The basis of credit card fraud detection lies in the analysis of a cardholder's spending behaviour. This spending profile is analysed using an optimal selection of variables that capture the unique behaviour of a credit card. The profiles of both legitimate and fraudulent transactions tend to be constantly changing. Thus, an optimal selection of variables that greatly differentiates both profiles is needed to achieve efficient classification of credit card transactions. The variables that form the card usage profile and the techniques used affect the performance of credit card fraud detection systems. These variables are derived from a combination of the current transaction and the past transaction history of a credit card. These variables fall under five main variable types, namely all transactions statistics, regional statistics, merchant type statistics, time-based amount statistics and time-based number of transactions statistics [19].

The variables that fall under the all transactions statistics type depict the general card usage profile of the card. The variables under the regional statistics type show the spending habits of the card, taking into account the geographical regions. The variables under the merchant statistics type show the usage of the card in different merchant categories. The variables of the time-based statistics types identify the usage profile of the cards with respect to usage amounts versus time ranges or frequencies of usage versus time ranges. Most literature focused on cardholder profile rather than card profile. It is evident that a person can operate two or more credit cards for different purposes. Therefore, one can exhibit different spending profiles on such cards. In this study, the focus is on the card rather than the cardholder because one credit card can only exhibit a unique spending profile while a cardholder can exhibit multiple behaviours on different cards. A total of 30 variables are used in [18], 27 variables in [19], and 20 variables are reduced to 16 relevant ones in [6].
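
As a rough illustration of the time-based statistics described above, the sketch below derives two card-level variables for each transaction: the number and the total amount of the same card's earlier transactions inside a trailing time window. The tuple layout `(card_id, timestamp, amount)`, the function name `time_window_features` and the 24-hour window are assumptions made for this example; they are not taken from [19] or from the dataset used in this study, whose features are anonymized.

```python
from collections import defaultdict, deque

def time_window_features(transactions, window_seconds=86400):
    """For each transaction, count and sum the same card's earlier
    transactions that fall inside the trailing time window.
    `transactions` is an iterable of (card_id, timestamp, amount) tuples
    (a hypothetical layout chosen for this sketch)."""
    history = defaultdict(deque)  # card_id -> deque of (timestamp, amount)
    features = []
    for card_id, timestamp, amount in sorted(transactions, key=lambda t: t[1]):
        past = history[card_id]
        # Drop transactions that are older than the window.
        while past and timestamp - past[0][0] > window_seconds:
            past.popleft()
        features.append({
            "card_id": card_id,
            "timestamp": timestamp,
            "amount": amount,
            "prior_count": len(past),                   # time-based number of transactions
            "prior_amount": sum(a for _, a in past),    # time-based amount statistic
        })
        past.append((timestamp, amount))
    return features

sample = [
    ("card_A", 0, 20.0),
    ("card_A", 3600, 35.0),
    ("card_B", 4000, 10.0),
    ("card_A", 100000, 500.0),
]
rows = time_window_features(sample)
```

Keying the history on the card identifier rather than on a cardholder mirrors the card-level focus argued for in this section.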

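
The hybrid resampling idea used in this study (oversample the fraud class and under-sample the legitimate class by the same number of times) can be sketched as follows. This is a simplified stand-in, not the authors' procedure: it uses plain random duplication and random removal instead of interpolating between existing data points, and the function name `hybrid_resample`, the `factor` parameter and the toy data are assumptions made for the example.

```python
import random

def hybrid_resample(positives, negatives, factor, seed=0):
    """Oversample positives by `factor` (random duplication with replacement)
    and under-sample negatives by the same factor (random selection without
    replacement). Assumption: simple random resampling replaces the
    interpolation step described in the paper."""
    rng = random.Random(seed)
    extra = len(positives) * (factor - 1)
    new_positives = list(positives) + [rng.choice(positives) for _ in range(extra)]
    keep = max(1, len(negatives) // factor)
    new_negatives = rng.sample(list(negatives), keep)
    return new_positives, new_negatives

# Toy data: 10 fraud records and 990 legitimate records (1% fraud).
toy_positives = [("fraud", i) for i in range(10)]
toy_negatives = [("legit", i) for i in range(990)]

pos, neg = hybrid_resample(toy_positives, toy_negatives, factor=3)
fraud_share = len(pos) / (len(pos) + len(neg))
```

With a factor of 3 the toy fraud share rises from 1% to 30 out of 360 records; larger factors move the distribution further towards balance.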
C. Credit card Fraud Detection

As credit card becomes the most general mode of payment (both online and regular purchase), the fraud rate tends to accelerate. Detecting fraudulent transactions using traditional methods of manual detection is time consuming and inaccurate, and the advent of big data has made these manual methods more impractical. Financial institutions have therefore turned to intelligent techniques. These intelligent fraud techniques comprise computational intelligence (CI)-based techniques. Statistical fraud detection methods have been divided into two broad categories: supervised and unsupervised [22]. In supervised fraud detection methods [13], models are estimated based on samples of fraudulent and legitimate transactions to classify new transactions as fraudulent or legitimate, while in unsupervised fraud detection, outlier transactions are detected as potential instances of fraudulent transactions. A detailed discussion of supervised and unsupervised techniques is found in [23]. Quite a number of studies on a range of techniques have been carried out in solving the credit card fraud detection problem. These techniques include, but are not limited to: neural network models (NN), Bayesian network (BN), intelligent decision engines (IDE), expert systems, meta-learning agents, machine learning, pattern recognition, rule-based systems, logistic regression (LR), support vector machine (SVM), decision tree, k-nearest neighbor (kNN), meta learning strategy, adaptive learning etc. Some related works on comparative study of credit card fraud detection techniques are presented.

D. Comparative study

A study of the issues and results associated with credit card fraud detection using meta-learning is presented in [18]. This study is geared towards investigating the distribution of frauds and non-frauds that will lead to better performance, and the best learning algorithms within the meta-learning strategy. The results show that, given a skewed distribution in the original data, artificially more balanced training data leads to better classifiers. It demonstrates how meta-learning can be used to combine different classifiers and maintain, and in some cases improve, the performance of the best classifier. Multiple algorithms for fraud detection are investigated in [24] and results indicate that an adaptive solution can provide fraud filtering and case ordering functions for reducing the number of final-line fraud investigations necessary. A comparison of logistic regression and naive bayes is presented in [10]. The results of the analysis show that even though the discriminative logistic regression algorithm has a lower asymptotic error, the generative naive Bayes classifier may also converge more quickly to its (higher) asymptotic error. There are a few cases reported in which logistic regression underperformed naive Bayes, but this is observed primarily in particularly small datasets. Another comparative study on credit card fraud detection using Bayesian and neural networks is done in [11]. The results report that the Bayesian network performs better than the neural network in detecting credit card fraud. Back-propagation (BP), together with naive Bayesian (NB) and C4.5 algorithms, is applied to skewed data partitions derived from minority oversampling with replacement [25]. The study shows that innovative use of naive Bayesian (NB), C4.5, and back-propagation (BP) classifiers to process the same partitioned numerical data has the potential of getting better cost savings. An adaptive and robust model learning method that is highly adaptive to concept changes and is robust to noise is presented in [26]. The classifiers' weights are computed by the logistic regression technique, which ensures good adaptability. Three different classification methods, decision tree, neural networks and logistic regression, are tested for their applicability in fraud detection [12]. The results show that the proposed classifiers of neural networks and logistic regression approaches outperform decision tree in solving the problem under investigation. A fusion approach using Dempster–Shafer theory and Bayesian learning for detecting credit card fraud is proposed in [27]. The results also show that the use of Bayesian learning brings down the false positive rates to values close to 5%.

Detection of credit card fraud using decision trees and support vector machines is investigated in [28] and the results show that the proposed classifiers of decision tree approaches outperform SVM approaches in solving the problem under investigation. As the training data scales, the detection accuracy of SVM based models equals that of the decision tree based models, but falls short in the number of frauds detected. The paper [13] evaluates the performance of logistic regression alongside two advanced data mining approaches, support vector machines and random forests, in credit card fraud detection. The study shows that logistic regression maintained similar performance with different levels of under-sampling, while SVM performance tends to increase with a lower proportion of fraud in the training data. Logistic regression shows appreciable performance, often surpassing that of the SVM models with different kernels. In another study, classification models based on Artificial Neural Networks (ANN) and Logistic Regression (LR) are developed and applied to the credit card fraud detection problem [14] using highly skewed data. The results show that the proposed ANN classifiers outperform LR classifiers in solving the problem under investigation. The logistic regression classifiers tend to overfit the training data as it increases. This is due to the lack of adequate sampling in the work. A comparative assessment of supervised data mining techniques for fraud prevention is presented in [29]. The techniques evaluated are decision tree, neural network and naive bayes classifiers. It is reported that neural network classifiers are suitable for larger databases only and take a long time to train. Bayesian classifiers are more accurate, much faster to train and suitable for different sizes of data, but are slower when applied to new instances.

A meta-classification strategy is applied in improving credit card fraud detection [30]. The approach consists of 3 base classifiers constructed using the decision tree, naïve Bayesian, and k-nearest neighbour algorithms. Using the naïve Bayesian algorithm as the meta-level algorithm to combine the

base classifier predictions, the result shows 28% improvement in performance. The paper [31] sheds light on performance evaluation based on the correct and incorrect instances of data classification using Naïve Bayes and decision tree. The results show that the efficiency and accuracy of J48 is better than that of Naïve Bayes [31]. In the paper [19], a new comparison measure that realistically represents the monetary gains and losses due to fraud detection shows that including the real cost, by creating a cost sensitive system using a Bayes minimum risk classifier, gives rise to much better fraud detection results in the sense of higher savings.

III. EXPERIMENTAL SET UP AND METHODS

This section describes the dataset used in the experiments and the three classifiers under study, namely Naïve Bayes, k-Nearest Neighbour and Logistic Regression. The different stages involved in generating the classifiers include: collection of data, preprocessing of data, analysis of data, training of the classifier algorithm and testing (evaluation). During the preprocessing stage, the data is converted into a useable format and sampled. A hybrid of under-sampling (the negative cases) and over-sampling (the positive cases) is carried out to achieve two sets of data distributions. For the analysis stage, the feature selection and reduction is already carried out on the dataset using PCA. The training stage is where the classifier algorithms are developed and fed with the processed data. The experiments are evaluated using the True Positive, True Negative, False Positive and False Negative rates. The performance comparison of the classifiers is analyzed based on accuracy, sensitivity, specificity, precision, Matthews correlation coefficient and balanced classification rate.

A. Dataset

The dataset is sourced from the ULB Machine Learning Group and its description is found in [32]. The dataset contains credit card transactions made by European cardholders in September 2013. This dataset presents transactions that occurred in two days, consisting of 284,807 transactions. The positive class (fraud cases) makes up 0.172% of the transactions data. The dataset is highly unbalanced and skewed towards the negative class. It contains only numerical (continuous) input variables which are the result of a Principal Component Analysis (PCA) feature selection transformation resulting in 28 principal components. Together with the 'time' and 'amount' features, a total of 30 input features are utilized in this study. The details and background information of the features cannot be presented due to confidentiality issues. The 'time' feature contains the seconds elapsed between each transaction and the first transaction in the dataset. The 'amount' feature is the transaction amount. Feature 'class' is the target class for the binary classification and it takes value 1 for a positive case (fraud) and 0 for a negative case (non fraud).

B. Hybrid Sampling of dataset

Data pre-processing is carried out on the data. A hybrid of under-sampling and over-sampling is carried out on the highly unbalanced dataset to achieve two sets of distribution (10:90 and 34:66) for analysis. This is done by stepwise addition and subtraction of a data point interpolated between existing data points till an over-fitting threshold is reached.

$PC_{new} = \sum_{i=1}^{n} PC + i$  (1)

$NC_{new} = \sum_{i=1}^{n} NC - i$  (2)

$n = \operatorname{mod}((NC / PC) / 2)$  (3)

where $PC_{new}$ is the new number of positive data point instances, $NC_{new}$ is the new number of negative data points, $n$ is the modulus of the ratio ($NC/PC$) of the number of negative class to positive class data points, and $PC$ and $NC$ are the numbers of positive and negative class data points in the imbalanced dataset respectively.

C. Naïve Bayes Classifier

Naïve Bayes is a statistical approach based on Bayesian theory, which chooses the decision based on the highest probability. Bayesian probability estimates unknown probabilities from known values. It also allows prior knowledge and logic to be applied to uncertain statements. This technique has an assumption of conditional independence among features in the data. The Naïve Bayes classifier is based on the conditional probabilities (4) and (5) of the binary classes (fraud and non fraud).

$P(c_i \mid f_k) = \dfrac{P(f_k \mid c_i) \, P(c_i)}{P(f_k)}$  (4)

$P(f_k \mid c_i) = \prod_{i=1}^{n} P(f_k \mid c_i), \quad k = 1, \ldots, n; \; i = 1, 2$  (5)

where $n$ represents the maximum number of features (30), $P(c_i \mid f_k)$ is the probability of feature value $f_k$ being in class $c_i$, $P(f_k \mid c_i)$ is the probability of generating feature value $f_k$ given class $c_i$, and $P(c_i)$ and $P(f_k)$ are the probability of occurrence of class $c_i$ and the probability of feature value $f_k$ occurring respectively. The classifier performs the binary classification based on the Bayesian classification rule.

If $P(c_1 \mid f_k) > P(c_2 \mid f_k)$ then the classification is $C_1$

If $P(c_1 \mid f_k) < P(c_2 \mid f_k)$ then the classification is $C_2$

$C_i$ is the target class for classification, where $C_1$ is the negative class (non fraud cases) and $C_2$ is the positive class (fraud cases).

D. K-Nearest Neighbour Classifier

The k-nearest neighbour is an instance based learning method which carries out its classification based on a similarity measure, like the Euclidean, Manhattan or Minkowski distance functions. The first two distance measures work well with continuous variables while the third suits categorical variables. The Euclidean distance measure is used in this study for the kNN classifier. The Euclidean distance ($D_{ij}$) between two input vectors ($X_i$, $X_j$) is given by:

$D_{ij} = \sqrt{\sum_{k=1}^{n} (X_{ik} - X_{jk})^2}, \quad k = 1, 2, \ldots, n$  (6)

For every data point in the dataset, the Euclidean distance between an input data point and the current point is calculated. These distances are sorted in increasing order and the k items with the lowest distances to the input data point are selected. The majority class among these items is found and the classifier returns the majority class as the classification for the input point. Parameter tuning for k is carried out for k = 1, 3, 5, 7, 9, 11 and k = 3 showed optimal performance. Thus, a value of k = 3 is used in the classifier.

E. Logistic Regression Classifier

Logistic Regression uses a functional approach to estimate the probability of a binary response based on one or more variables (features). It finds the best-fit parameters to a nonlinear function called the sigmoid. The sigmoid function ($\sigma$) and the input ($x$) to the sigmoid function are shown in (7) and (8).

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$  (7)

$x = w_0 z_0 + w_1 z_1 + \ldots + w_n z_n$  (8)

The vector $z$ is the input data and $w$ is the vector of best-fit coefficients; corresponding elements are multiplied and the products are summed to give one number, which determines the classification of the target class. If the value of the sigmoid is more than 0.5, it's considered a 1; otherwise, it's a 0. An optimization method is used to train the classifier and find the best-fit parameters. The gradient ascent (9) and modified stochastic gradient ascent optimization methods were experimented with to evaluate their performance on the classifier.

$w := w + \alpha \nabla_w f(w)$  (9)

where the parameter $\alpha$ is the magnitude of movement (step size) of the gradient ascent and $\nabla_w f(w)$ is the gradient of $f$ with respect to $w$. The steps are continued until a stopping criterion is met. The optimization methods are investigated (for iterations 50 to 1000) to know whether the parameters are converging, that is, whether the parameters are reaching a steady value or are constantly changing. At 100 iterations, steady values of the parameters are achieved.

Stochastic gradient ascent incrementally updates the classifier as new data comes in rather than all at once. It starts with all weights set to 1. Then, for every instance in the dataset, the gradient is calculated. The weights vector is updated by the product of alpha and the gradient. Then the weight vector is returned. Stochastic gradient ascent is used in this study because, given the large size of the data, it updates the weights using only one instance at a time, thus reducing computational complexity.

IV. PERFORMANCE EVALUATION AND RESULTS

Four basic metrics are used in evaluating the experiments, namely the true positive rate (TPR), true negative rate (TNR), false positive rate (FPR) and false negative rate (FNR).

$TPR = \dfrac{TP}{P}$  (10)

$TNR = \dfrac{TN}{N}$  (11)

$FPR = \dfrac{FP}{N}$  (12)

$FNR = \dfrac{FN}{P}$  (13)

where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative test cases classified, while P and N are the total numbers of positive and negative class cases under test. True positives are cases classified as positive which are actually positive. True negatives are cases rightly classified as negative. False positives are cases classified as positive which are actually negative. False negatives are cases classified as negative which are actually positive.

The performance of the naïve bayes, k-nearest neighbour and logistic regression classifiers is evaluated based on accuracy, sensitivity, specificity, precision, Matthews correlation coefficient (MCC) and balanced classification rate. These evaluation metrics are employed based on their relevance in evaluating imbalanced binary classification problems.

$Accuracy = \dfrac{TP + TN}{TP + FP + TN + FN}$  (14)

$Sensitivity = \dfrac{TP}{TP + FN}$  (15)

$Specificity = \dfrac{TN}{FP + TN}$  (16)

$Precision = \dfrac{TP}{TP + FP}$  (17)

$MCC = \dfrac{(TP \cdot TN) - (FP \cdot FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$  (18)

$BCR = \dfrac{1}{2} \left( \dfrac{TP}{P} + \dfrac{TN}{N} \right)$  (19)

Sensitivity (Recall) gives the accuracy on positive (fraud) cases classification. Specificity gives the accuracy on negative (legitimate) cases classification. Precision gives the accuracy in cases classified as fraud (positive). Matthews Correlation Coefficient (MCC) is an evaluation metric for binary classification problems. MCC is used mainly with unbalanced data sets because its evaluation consists of TP, FP, TN and FN. The MCC value lies between -1 and +1; a +1 value represents excellent classification while a -1 value represents total disagreement between classification and observation. Balanced classification rate represents the average of sensitivity and specificity, the latter being the proportion of negatives which are classified as negatives [33].
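
The metrics in (10) to (19) can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name `evaluate`, the zero-denominator convention (returning 0.0) and the example counts are assumptions made for illustration and are not results from this study.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Compute the rates and metrics of equations (10)-(19) from counts."""
    p = tp + fn  # total positive (fraud) cases under test
    n = tn + fp  # total negative (legitimate) cases under test
    tpr = tp / p
    tnr = tn / n
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "tpr": tpr,                                   # (10), same as sensitivity (15)
        "tnr": tnr,                                   # (11), same as specificity (16)
        "fpr": fp / n,                                # (12)
        "fnr": fn / p,                                # (13)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # (14)
        "sensitivity": tpr,                           # (15)
        "specificity": tnr,                           # (16)
        # Assumed convention: report 0.0 when nothing is classified positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,  # (17)
        # Assumed convention: report 0.0 when the MCC denominator is zero.
        "mcc": (tp * tn - fp * fn) / denominator if denominator else 0.0,  # (18)
        "bcr": 0.5 * (tpr + tnr),                     # (19)
    }

# Hypothetical counts: 100 fraud and 900 legitimate test cases.
m = evaluate(tp=90, tn=880, fp=20, fn=10)
```

Because accuracy is dominated by the majority class on skewed data, the MCC and balanced classification rate entries are usually the more informative ones to compare across classifiers.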

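
Returning to the k-nearest neighbour procedure of Section III.D (compute the Euclidean distance to every stored point, sort, take the k closest and return the majority class), a minimal sketch is given below. The function names and the two-dimensional toy points are assumptions for illustration; k = 3 follows the value reported in the text.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors, as in (6)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority class among the k training points closest to query."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data: label 0 = legitimate, label 1 = fraud.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_labels = [0, 0, 0, 1, 1, 1]

near_legit = knn_predict(train_points, train_labels, (0.5, 0.5))
near_fraud = knn_predict(train_points, train_labels, (5.5, 5.5))
```

An odd k such as 3 avoids tied votes in a two-class problem, which is one practical reason the tuning grid in the text uses odd values only.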
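
The stochastic gradient ascent training of Section III.E can be sketched as follows. The sketch assumes that the objective f(w) in (9) is the log-likelihood of the logistic model, so that the per-instance gradient is (label minus sigmoid output) times the input vector; the paper does not state f explicitly. The function names, the shuffling of instances, the step size and the toy data are likewise assumptions; the initial weights of 1 follow the text.

```python
import math
import random

def sigmoid(x):
    """Sigmoid function of equation (7), guarded against overflow."""
    if x < -700:
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))

def train_sga(rows, labels, alpha=0.01, iterations=100, seed=0):
    """Stochastic gradient ascent: update the weights one instance at a time."""
    rng = random.Random(seed)
    weights = [1.0] * len(rows[0])  # all weights start at 1, as in the text
    order = list(range(len(rows)))
    for _ in range(iterations):
        rng.shuffle(order)  # assumption: visit instances in random order
        for idx in order:
            z = rows[idx]
            # Assumed log-likelihood gradient for one instance: (y - sigma(w.z)) * z
            error = labels[idx] - sigmoid(sum(w * v for w, v in zip(weights, z)))
            weights = [w + alpha * error * v for w, v in zip(weights, z)]
    return weights

def predict(weights, z):
    """Classify as 1 when the sigmoid output exceeds 0.5, otherwise 0."""
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, z))) > 0.5 else 0

# Toy data: the first element of each row is a constant bias input z0 = 1.
toy_rows = [(1.0, 0.0), (1.0, 1.0), (1.0, 4.0), (1.0, 5.0)]
toy_labels = [0, 0, 1, 1]

w = train_sga(toy_rows, toy_labels, alpha=0.1, iterations=200)
low_class = predict(w, (1.0, 0.0))
high_class = predict(w, (1.0, 5.0))
```

Each update touches a single instance, which is the property the text cites as the reason for preferring the stochastic variant on a dataset of this size.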
A. Results
In this study, three classifier models based on naïve Bayes, k-nearest
neighbour and logistic regression are developed. To evaluate these
models, 70% of the dataset is used for training while 30% is set aside
for validating and testing. Accuracy, sensitivity, specificity,
precision, Matthews correlation coefficient (MCC) and balanced
classification rate (BCR) are used to evaluate the performance of the
three classifiers. The accuracy of the classifiers for the original
0.172:99.828 dataset distribution and for the sampled 10:90 and 34:66
distributions is presented in Tables 1, 2 and 3 respectively.

An observation of the metric tables shows that there is significant
improvement from the sampled dataset distribution of 10:90 to 34:66 for
accuracy, sensitivity, specificity, Matthews correlation coefficient
and balanced classification rate of the classifiers. This indicates
that hybrid sampling (under-sampling and over-sampling) on a highly
imbalanced dataset greatly improves the performance of binary
classification. The true positive, true negative, false positive and
false negative rates of the classifiers for the un-sampled and sampled
data distributions are shown in Tables 4, 5 and 6. Logistic regression
is the only technique whose false negative rate did not improve from
the 10:90 to the 34:66 data distribution. However, it showed the best
overall performance on the un-sampled distribution.

TABLE 1. Accuracy result for un-sampled data distribution

Metric                            Naïve Bayes  k-Nearest Neighbour  Logistic Regression
Accuracy                          0.9737       0.9691               0.9824
Sensitivity                       0.8072       0.8835               0.9767
Specificity                       0.9741       0.9711               0.9824
Precision                         0.0505       0.4104               0.0873
Matthews Correlation Coefficient  +0.1979      +0.5903              +0.2893
Balanced Classification Rate      0.8907       0.9273               0.9796

TABLE 2. Accuracy result for 10:90 data distribution

Metric                            Naïve Bayes  k-Nearest Neighbour  Logistic Regression
Accuracy                          0.9752       0.9715               0.3639
Sensitivity                       0.8210       0.8285               0.7155
Specificity                       0.9754       1.0000               0.2939
Precision                         0.0546       1.0000               0.1678
Matthews Correlation Coefficient  +0.2080      +0.8950              +0.0077
Balanced Classification Rate      0.8975       0.9143               0.5047

TABLE 3. Accuracy result for 34:66 data distribution

Metric                            Naïve Bayes  k-Nearest Neighbour  Logistic Regression
Accuracy                          0.9769       0.9792               0.5486
Sensitivity                       0.9514       0.9375               0.5833
Specificity                       0.9896       1.0000               0.5313
Precision                         0.9786       1.0000               0.3836
Matthews Correlation Coefficient  +0.9478      +0.9535              +0.1080
Balanced Classification Rate      0.9705       0.9688               0.5573

TABLE 4. Basic metric rates for un-sampled data distribution

Metric                            Naïve Bayes  k-Nearest Neighbour  Logistic Regression
True Positive Rate                0.8072       0.8835               0.9767
False Positive Rate               0.0259       0.0288               0.0176
True Negative Rate                0.9741       0.9711               0.9824
False Negative Rate               0.1928       0.1165               0.0233

TABLE 5. Basic metric rates for 10:90 data distribution

Metric                            Naïve Bayes  k-Nearest Neighbour  Logistic Regression
True Positive Rate                0.8200       0.8285               0.7155
False Positive Rate               0.0250       0.000                0.7061
True Negative Rate                0.9750       1.0000               0.2939
False Negative Rate               0.1280       0.1715               0.2845

TABLE 6. Basic metric rates for 34:66 data distribution

Metric                            Naïve Bayes  k-Nearest Neighbour  Logistic Regression
True Positive Rate                0.9514       0.9375               0.5833
False Positive Rate               0.0104       0.000                0.4688
True Negative Rate                0.9896       1.0000               0.5313
False Negative Rate               0.0486       0.0625               0.4167
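
The six metrics reported in Tables 1 to 3 can all be derived from the
four confusion-matrix counts. The sketch below is an illustration, not
the authors' code: the function name `evaluate` and the example counts
are hypothetical, and the formulas are the standard textbook
definitions. The balanced classification rate is taken here as the mean
of sensitivity and specificity, which agrees with the tabulated values
(for example, (0.8072 + 0.9741) / 2 ≈ 0.8907 for Naïve Bayes in Table 1).

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp, tn, fp, fn):
    """Compute the six evaluation metrics from confusion-matrix counts.

    tp/fn count fraud cases caught/missed; tn/fp count legitimate
    transactions passed/flagged. (Hypothetical helper for illustration.)
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    # Matthews correlation coefficient, ranging from -1 to +1.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)
    # Balanced classification rate: mean of the two class-wise recalls.
    bcr = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
        "bcr": bcr,
    }


# Hypothetical counts with no false positives (not taken from the paper).
m = evaluate(tp=90, tn=900, fp=0, fn=10)
print(m["precision"], m["specificity"])  # 1.0 1.0
```

With no false positives, precision and specificity both evaluate to
1.0, which is the pattern the k-nearest neighbour column shows in
Tables 2 and 3.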

B. Comparative Performance
The performance evaluation of the three classifiers for the 34:66 data
distribution is shown in figure 1. This data distribution showed better
performance than the 10:90 distribution. The k-nearest neighbour
technique showed superior performance on most of the evaluation metrics
used. It reached the highest value for specificity and precision (that
is 1.0) for both sampled data distributions. This is because the kNN
classifier recorded no false positive in the classification. Among the
sampled distributions, the Naïve Bayes classifier outperformed kNN in
accuracy only for the 10:90 data distribution. The logistic regression
classifier showed the least performance among the three classifiers
evaluated. However, there was significant improvement in performance
between the two sets of sampled data distribution. Since not all
related works carried out evaluation based on accuracy, sensitivity,
specificity, precision, Matthews correlation coefficient and balanced
classification rate, other related works are compared with this study
based on the basic true positive and false positive rates. Figures 2
and 3 show the TPR and FPR evaluation of the proposed Naïve Bayes and
kNN classifiers against other related works. The related works are
referenced using their reference number delimited within square
brackets “[ ]”.

It could be observed that our proposed kNN classifier recorded zero
false positive for both sets of data distributions (that is 10:90 and
34:66 datasets). Thus, the classifier shows better performance than
reviewed works. The true positive and false positive rates evaluation
on logistic regression with other works is shown in figure 4. There is
an overlap between true positive and false positive rate for the 10:90
data distribution unlike in figures 2 and 3. This shows that the
logistic regression classifier performs better on the un-sampled
dataset than the two sampled sets.

Figure 1. Performance evaluation chart for Naïve Bayes, kNN and Logistic Regression
*MCC = Matthews Correlation Coefficient
*BCR = Balanced Classification Rate

Figure 3. TPR and FPR evaluation of k-nearest neighbour classifiers
*TPR = True Positive Rate
*FPR = False Positive Rate
*Proposed kNN = Proposed k-nearest neighbour classifier

Figure 4. TPR and FPR evaluation of Logistic Regression classifiers
*TPR = True Positive Rate
*FPR = False Positive Rate
*Proposed LR = Proposed Logistic Regression classifier
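
The 10:90 and 34:66 distributions compared in this section come from
hybrid sampling, in which the fraud (positive) class is over-sampled
and the legitimate (negative) class is under-sampled. A minimal sketch
of one way to build such a distribution follows. It is an illustration
under stated assumptions, not the authors' procedure: the function
`hybrid_sample`, its parameters, the choice of random sampling with
replacement for the minority class, and the toy records are all
hypothetical.

```python
import random


def hybrid_sample(fraud, legit, fraud_share, n_total, seed=0):
    """Build a resampled dataset with a target fraud share.

    Fraud records are over-sampled WITH replacement (there are few of
    them); legitimate records are under-sampled WITHOUT replacement.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_fraud = round(n_total * fraud_share)
    n_legit = n_total - n_fraud
    if not fraud:
        raise ValueError("need at least one fraud record to over-sample")
    if n_legit > len(legit):
        raise ValueError("not enough legitimate records to under-sample")
    oversampled = [rng.choice(fraud) for _ in range(n_fraud)]
    undersampled = rng.sample(legit, n_legit)
    combined = oversampled + undersampled
    rng.shuffle(combined)  # avoid any class ordering in the output
    return combined


# Toy records (label, id); real data would carry transaction features.
fraud = [("fraud", i) for i in range(5)]
legit = [("legit", i) for i in range(1000)]

sample_34_66 = hybrid_sample(fraud, legit, fraud_share=0.34, n_total=100)
n_fraud = sum(1 for label, _ in sample_34_66 if label == "fraud")
print(n_fraud, len(sample_34_66) - n_fraud)  # 34 66
```

Over-sampling with replacement duplicates minority records, so any
train/test split should be made with that duplication in mind to avoid
the same fraud record appearing on both sides.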
Figure 2. TPR and FPR evaluation of Naïve Bayes classifiers
*TPR = True Positive Rate
*FPR = False Positive Rate
*Proposed NB = Proposed Naïve Bayes classifier

V. CONCLUSION
This paper investigates the comparative performance of Naïve Bayes,
k-nearest neighbour and logistic regression models in binary
classification of imbalanced credit card fraud data. These three
techniques were chosen because they have attracted less comparison in
past literature. A subsequent study to compare other single and
ensemble techniques using our approach is underway. The contribution
of the paper is summarized in the following:
1. Three classifiers based on different machine learning
techniques (Naïve Bayes, K-nearest neighbours and
Logistic Regression) are trained on real-life credit card
transaction data, and their performances on credit card fraud
detection are evaluated and compared based on several relevant
metrics.
2. The highly imbalanced dataset is sampled in a hybrid approach
where the positive class is over-sampled and the negative class
under-sampled, achieving two sets of data distributions.
3. The performances of the three classifiers are examined on the
two sets of data distributions using accuracy, sensitivity,
specificity, precision, balanced classification rate and Matthews
correlation coefficient metrics.

Performance of classifiers varies across different evaluation metrics.
Results from the experiment show that kNN performs strongly on all
metrics evaluated except for accuracy in the 10:90 data distribution.
This study shows the effect of hybrid sampling on the performance of
binary classification of imbalanced data. Expected future areas of
research could be in examining meta-classifiers and meta-learning
approaches in handling highly imbalanced credit card fraud data. Also,
the effects of other sampling approaches can be investigated.

Acknowledgment

We wish to acknowledge Nwaiwu John C for his effort in the
experimentation carried out and Pozzolo et al [32] for the source and
description of the credit card fraud data.

References

[1] Maes, S., Tuyls, K., Vanschoenwinkel, B. and Manderick, B., (2002). Credit card fraud detection using Bayesian and neural networks. Proceeding International NAISO Congress on Neuro Fuzzy Technologies.
[2] Ogwueleka, F. N., (2011). Data Mining Application in Credit Card Fraud Detection System, Journal of Engineering Science and Technology, Vol. 6, No. 3, pp. 311 – 322
[3] RamaKalyani, K. and UmaDevi, D., (2012). Fraud Detection of Credit Card Payment System by Genetic Algorithm, International Journal of Scientific & Engineering Research, Vol. 3, Issue 7, pp. 1 – 6, ISSN 2229-5518
[4] Meshram, P. L., and Bhanarkar, P., (2012). Credit and ATM Card Fraud Detection Using Genetic Approach, International Journal of Engineering Research & Technology (IJERT), Vol. 1 Issue 10, pp. 1 – 5, ISSN: 2278-0181
[5] Singh, G., Gupta, R., Rastogi, A., Chandel, M. D. S., and Riyaz, A., (2012). A Machine Learning Approach for Detection of Fraud based on SVM, International Journal of Scientific Engineering and Technology, Volume No.1, Issue No.3, pp. 194-198, ISSN : 2277-1581
[6] Seeja, K. R., and Zareapoor, M., (2014). FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining, The Scientific World Journal, Hindawi Publishing Corporation, Volume 2014, Article ID 252797, pp. 1 – 10, http://dx.doi.org/10.1155/2014/252797
[7] Patil, S., Somavanshi, H., Gaikwad, J., Deshmane, A., and Badgujar, R., (2015). Credit Card Fraud Detection Using Decision Tree Induction Algorithm, International Journal of Computer Science and Mobile Computing (IJCSMC), Vol.4, Issue 4, pp. 92-95, ISSN: 2320-088X
[8] Duman, E., Buyukkaya, A., & Elikucuk, I. (2013). A novel and successful credit card fraud detection system implemented in a turkish bank. In Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on (pp. 162-171). IEEE.
[9] Bahnsen, A. C., Stojanovic, A., Aouada, D., & Ottersten, B. (2014). Improving credit card fraud detection with calibrated probabilities. In Proceedings of the 2014 SIAM International Conference on Data Mining (pp. 677-685). Society for Industrial and Applied Mathematics.
[10] Ng, A. Y., and Jordan, M. I., (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. Advances in neural information processing systems, 2, 841-848.
[11] Maes, S., Tuyls, K., Vanschoenwinkel, B., & Manderick, B. (2002). Credit card fraud detection using Bayesian and neural networks. In Proceedings of the 1st international naiso congress on neuro fuzzy technologies (pp. 261-270).
[12] Shen, A., Tong, R., & Deng, Y. (2007). Application of classification models on credit card fraud detection. In Service Systems and Service Management, 2007 International Conference on (pp. 1-4). IEEE.
[13] Bhattacharyya, S., Jha, S., Tharakunnel, K., & Westland, J. C. (2011). Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3), 602-613.
[14] Sahin, Y. and Duman, E., (2011). Detecting credit card fraud by ANN and logistic regression. In Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on (pp. 315-319). IEEE.
[15] Chaudhary, K. and Mallick, B., (2012). Credit Card Fraud: The study of its impact and detection techniques, International Journal of Computer Science and Network (IJCSN), Volume 1, Issue 4, pp. 31 – 35, ISSN: 2277-5420
[16] Bhatla, T.P.; Prabhu, V.; and Dua, A. (2003). Understanding credit card frauds. Cards Business Review #2003-1, Tata Consultancy Services
[17] The Nilson Report. (2015). U.S. Credit & Debit Cards 2015. David Robertson.
[18] Stolfo, S., Fan, D. W., Lee, W., Prodromidis, A., & Chan, P. (1997). Credit card fraud detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on Fraud Detection and Risk Management.
[19] Bahnsen, A. C., Stojanovic, A., Aouada, D., & Ottersten, B. (2013). Cost sensitive credit card fraud detection using Bayes minimum risk. In Machine Learning and Applications (ICMLA), 2013 12th International Conference on (Vol. 1, pp. 333-338). IEEE.
[20] Pun, J. K. F. (2011). Improving Credit Card Fraud Detection using a Meta-Learning Strategy (Doctoral dissertation, University of Toronto).
[21] Sahin, Y., Bulkan, S., & Duman, E. (2013). A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications, 40(15), 5916-5923.
[22] Bolton, R. J. and Hand, D. J., (2001). Unsupervised profiling methods for fraud detection, Conference on Credit Scoring and Credit Control, Edinburgh.
[23] Kou, Y., Lu, C-T., Sinvongwattana, S. and Huang, Y-P., (2004). Survey of Fraud Detection Techniques, In Proceedings of the 2004 IEEE International Conference on Networking, Sensing & Control, Taipei, Taiwan, March 21-23.
[24] Wheeler, R., and Aitken, S. (2000). Multiple algorithms for fraud detection. Knowledge-Based Systems, 13(2), 93-99. Elsevier
[25] Phua, C., Alahakoon, D., & Lee, V. (2004). Minority report in fraud detection: classification of skewed data. Acm sigkdd explorations newsletter, 6(1), 50-59.
[26] Chu, F., Wang, Y., & Zaniolo, C. (2004). An adaptive learning approach for noisy data streams. In Data Mining, 2004. ICDM'04. Fourth IEEE International Conference on (pp. 351-354). IEEE

[27] Panigrahi, S., Kundu, A., Sural, S., & Majumdar, A. K. (2009). Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning. Information Fusion, 10(4), 354-363.
[28] Sahin, Y. and Duman, E., (2011). Detecting Credit Card Fraud by Decision Trees and Support Vector Machines, Proceedings of International Multi-Conference of Engineers and Computer Scientists (IMECS 2011), Mar. 16-18, Hong Kong, Vol. 1, pp. 1 - 6, ISBN: 978-988-18210-3-4, ISSN: 2078-0966 (Online)
[29] Sherly, K. K. (2012). A comparative assessment of supervised data mining techniques for fraud prevention. TIST. Int. J. Sci. Tech. Res, 1(16).
[30] Pun, J., and Lawryshyn, Y. (2012). Improving credit card fraud detection using a meta-classification strategy. International Journal of Computer Applications, 56(10).
[31] Patil, T. R., & Sherekar, S. S. (2013). Performance analysis of Naive Bayes and J48 classification algorithm for data classification. International Journal of Computer Science and Applications, 6(2), 256-261.
[32] Pozzolo, A. D., Caelen, O., Johnson, R. A., and Bontempi, G., (2015). Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE.
[33] Fahmi, M., Hamdy, A. and Nagati, K., (2016). Data Mining Techniques for Credit Card Fraud Detection: Empirical Study, In Sustainable Vital Technologies in Engineering and Informatics BUE ACE1, pp. 1 – 9, Elsevier Ltd.
[34] Islam, M. J., Wu, Q. M. J., Ahmadi, M. and Sid-Ahmed, M. A., (2007). Investigating the Performance of Naive-Bayes Classifiers and K-Nearest Neighbor Classifiers. IEEE, International Conference on Convergence Information Technology, pp. 1541-1546.

978-1-5090-4642-3/17/$31.00 ©2017 IEEE
