Credit Card Fraud Detection Using Machine Learning as Data Mining Technique
Abstract—The rapid growth of participation in online transactional activities has increased fraudulent cases all over the world and causes tremendous losses to individuals and the financial industry. Although many criminal activities occur in the financial industry, credit card fraud is among the most prevalent and the most worrying for online customers. Thus, countering fraud through data mining and machine learning is one of the prominent approaches introduced by scholars to prevent the losses caused by these illegal acts. Primarily, data mining techniques were employed to study the patterns and characteristics of suspicious and non-suspicious transactions based on normalized and anomalous data. On the other hand, machine learning (ML) techniques were employed to predict suspicious and non-suspicious transactions automatically by using classifiers. Therefore, the combination of machine learning and data mining techniques was able to distinguish genuine from non-genuine transactions by learning the patterns of the data. This paper discusses supervised classification using the Bayesian network classifiers K2, Tree Augmented Naïve Bayes (TAN) and Naïve Bayes, together with the Logistic and J48 classifiers. After the dataset was preprocessed using normalization and Principal Component Analysis, all the classifiers achieved more than 95.0% accuracy, higher than the results attained before preprocessing.

Index Terms—Credit Card; Data Mining; Fraud Detection; Machine Learning.

I. INTRODUCTION

According to the Global Payments Report 2015, the credit card was the most used payment method globally in 2014, ahead of other methods such as e-wallets and bank transfers [1]. This huge volume of transactions is often targeted by cyber criminals, who conduct fraudulent activities through credit card services. Credit card fraud is defined as the unauthorized use of a card, unusual transaction behavior, or transactions on an inactive card [2]. In general, there are three categories of credit card fraud, namely conventional frauds (e.g. stolen, fake and counterfeit cards), online frauds (e.g. false/fake merchant sites), and merchant-related frauds (e.g. merchant collusion and triangulation) [3].

In the past couple of years, credit card breaches have been trending alarmingly. According to the Nilson Report, global credit card fraud losses reached $16.31 billion in 2014 and are estimated to exceed $35 billion in 2020 [4]. Therefore, it is necessary to develop credit card fraud detection techniques as a countermeasure against these illegal activities. In general, credit card fraud detection is known as the process of identifying whether transactions are genuine or fraudulent. As data mining and machine learning techniques are widely used to counter cyber-crime, scholars have often embraced these approaches to study and detect credit card fraud.

Data mining is known as the process of extracting interesting, novel and insightful patterns, as well as discovering understandable, descriptive and predictive models, from large-scale data collections [5, 6]. The ability of data mining techniques to extract useful information from large-scale data using statistical and mathematical techniques can assist credit card fraud detection by differentiating the characteristics of common and suspicious credit card transactions. While data mining focuses on discovering valuable knowledge, machine learning is rooted in learning from that knowledge and developing its own model for purposes such as classification or clustering.

Machine learning techniques are applied widely throughout computer science domains such as spam filtering, web search, ad placement, recommender systems, credit scoring, drug design, fraud detection, stock trading, and many other applications. Machine learning classifiers operate by building a model from example inputs and using it to make predictions or decisions, rather than following strictly static program instructions. Many different types of machine learning approaches are available to solve heterogeneous problems. Because this study focuses on classification, the discussion that follows is limited to this topic. Machine learning classification refers to the process of learning to assign instances to predefined classes. Formally, there are several types of learning, such as supervised, semi-supervised, unsupervised, reinforcement and transductive learning, as well as learning to learn [7]. As this study concerns supervised machine learning classification, the other types are not elaborated further. In most classification studies, supervised learning is favoured over other methods because the classes of the instances can be controlled through human intervention. In supervised learning, the instances are labeled with their classes before being fed into the classifiers. The performance of the classifiers can then be measured using evaluation metrics.

In the case of credit card fraud detection, binary classification was employed because each instance is labeled as either fraud or non-fraud. The inputs were transformed into Boolean vectors $\mathbf{x} = (x_1, \ldots, x_q)$, where $x_j = 1$ if the $j$th characteristic appears in the instance and $x_j = 0$ otherwise. A classifier takes as input a training set of pairs $(\mathbf{x}_i, y_i)$, where $\mathbf{x}_i = (x_{i1}, \ldots, x_{iq})$ is an observed input and $y_i$ is the corresponding output, i.e. its class label (fraud or non-fraud). The rest of the paper is organized into background studies, research methodology, results, discussions and conclusions.
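To illustrate the Boolean input representation and the $(\mathbf{x}_i, y_i)$ training pairs described above, the following is a minimal sketch of a Bernoulli Naïve Bayes classifier written from scratch in Python. It is not the WEKA implementation used in this study; the attribute names (blacklisted_address, foreign_country, amount_exceeds_limit) and the toy transactions are hypothetical and serve only as an illustration, assuming every characteristic is binary.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical Boolean characteristics; NOT the attributes of the paper's dataset.
FEATURES = ["blacklisted_address", "foreign_country", "amount_exceeds_limit"]

Example = Tuple[List[int], int]  # (x_i, y_i) with y_i = 1 for fraud, 0 for non-fraud


def encode(transaction: Dict[str, bool]) -> List[int]:
    """Return x = (x_1, ..., x_q): x_j = 1 if the j-th characteristic appears, else 0."""
    return [1 if transaction.get(name, False) else 0 for name in FEATURES]


def train_bernoulli_nb(data: Sequence[Example]):
    """Estimate P(y) and P(x_j = 1 | y) with Laplace (add-one) smoothing."""
    q = len(data[0][0])
    class_count = {0: 0, 1: 0}
    ones = {0: [0] * q, 1: [0] * q}  # how often x_j = 1 within each class
    for x, y in data:
        class_count[y] += 1
        for j, xj in enumerate(x):
            ones[y][j] += xj
    n = len(data)
    prior = {c: (class_count[c] + 1) / (n + 2) for c in (0, 1)}
    cond = {c: [(ones[c][j] + 1) / (class_count[c] + 2) for j in range(q)] for c in (0, 1)}
    return prior, cond


def predict(model, x: List[int]) -> int:
    """Pick the class with the highest log-posterior (ties go to non-fraud, class 0)."""
    prior, cond = model
    scores = {}
    for c in (0, 1):
        score = math.log(prior[c])
        for j, xj in enumerate(x):
            score += math.log(cond[c][j] if xj else 1.0 - cond[c][j])
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(model, data: Sequence[Example]) -> float:
    """Fraction of instances whose predicted class matches the label."""
    return sum(predict(model, x) == y for x, y in data) / len(data)


# Toy, hand-made training set for illustration only (not data from this study).
toy = [
    (encode({"blacklisted_address": True, "amount_exceeds_limit": True}), 1),
    (encode({"blacklisted_address": True, "foreign_country": True}), 1),
    (encode({"foreign_country": True, "amount_exceeds_limit": True}), 1),
    (encode({}), 0),
    (encode({"foreign_country": True}), 0),
    (encode({"amount_exceeds_limit": True}), 0),
]
model = train_bernoulli_nb(toy)
print(accuracy(model, toy))  # prints 1.0
```

To keep the example short, accuracy is computed on the training data itself; a proper evaluation would use held-out data or cross-validation.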
[Figure: diagram with node labels Blacklisted, Address, Country, Similar, Exceed, and Fraud Case]
An overview of the research methodology is illustrated in Figure 2.

The following paragraphs elaborate on data transformation and data reduction. Generally, data transformation and data reduction are referred to as the data preprocessing phase, where the raw data is cleaned and …
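The two preprocessing steps named above, data transformation (normalization) and data reduction (Principal Component Analysis), can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: min-max scaling to [0, 1] is assumed as the normalization, and PCA is computed by power iteration with deflation on the sample covariance matrix. It is not the WEKA filter used in this study, and the toy rows are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def min_max_normalize(rows: Matrix) -> Matrix:
    """Data transformation: rescale every attribute (column) to [0, 1].
    Constant columns carry no variation and are mapped to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def covariance(rows: Matrix) -> Tuple[Matrix, List[float]]:
    """Sample covariance matrix of the columns, plus the column means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, means


def dominant_eigenpair(m: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    d = len(m)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix is (numerically) zero: no variance left
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


def pca(rows: Matrix, k: int) -> Tuple[Matrix, Matrix, List[float]]:
    """Data reduction: project rows onto the top-k principal components.
    Returns (projected rows, components, explained-variance ratios)."""
    cov, means = covariance(rows)
    d = len(cov)
    m = [row[:] for row in cov]
    comps, eigvals = [], []
    for _ in range(k):
        lam, v = dominant_eigenpair(m)
        comps.append(v)
        eigvals.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum((r[j] - means[j]) * c[j] for j in range(d)) for c in comps]
                 for r in rows]
    total = sum(cov[i][i] for i in range(d))  # total variance = trace of cov
    ratios = [lam / total if total > 0 else 0.0 for lam in eigvals]
    return projected, comps, ratios


# Hypothetical raw rows: two perfectly correlated attributes and one constant attribute.
raw = [[0, 0, 7], [1, 1, 7], [2, 2, 7], [3, 3, 7]]
normalized = min_max_normalize(raw)
reduced, components, ratios = pca(normalized, k=1)
print(ratios)      # the single component explains (almost) all the variance
print(components)  # the constant third attribute gets weight 0.0 in this component
```

Power iteration keeps the sketch dependency-free; in practice a numerical library's eigendecomposition or SVD would be the usual choice. In the toy run, the constant attribute receives zero weight, a small illustration of how an attribute can end up contributing little after reduction.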
… ensured by noticing the ability of WEKA to produce non-zero results. Generally, WEKA is not able to process data that is highly unstructured and would return N/A (Not Applicable) results, errors, or freeze during the modeling process. However, this did not happen with our dummy dataset. Furthermore, the dummy dataset was developed from attributes commonly used for credit card fraud detection and was created automatically using GNU data generation scripts. As many data mining researchers emphasize, preprocessing the raw dataset is an essential factor in improving classification results. This was confirmed by the differences between the results of Experiment 1 and Experiment 2: the data transformation and data reduction applied in Experiment 2 significantly improved the classification performance. As mentioned earlier, the strength of Principal Component Analysis, which reduced the dimensionality without losing much of the information in the attributes, was one of the major factors that improved the classification process. Therefore, we believe that Principal Component Analysis is a suitable filtering approach to be considered and used in credit card fraud detection processes. Our classification process also showed that the Bayesian classifiers K2, Naïve Bayes and TAN, as well as the Logistic and J48 classifiers, were able to classify and predict credit card fraud activities better when the data was preprocessed using reliable filtering techniques. Moreover, after the dimensionality of the raw data was reduced using Principal Component Analysis, the authors of this study found that the terminal_id attribute was largely reduced. Therefore, we assumed that terminal_id information contributes less to credit card fraud detection. However, investigations of credit card hacking based on physical methods (e.g. hardware stressing) still have to use the terminal_id attribute as the reference to identify the illegal activity.

In the future, this study will attempt to explore credit card fraud detection further using real-time data. Since the Bayesian network classifiers showed better results, comparisons with other types of classifiers, such as hyperplane-based classifiers, may further contribute to the body of knowledge.

VI. CONCLUSION

This paper evaluated classification performance using five classifiers, namely the Bayesian classifiers Naïve Bayes, K2 and TAN, plus the Logistic and J48 classifiers. The evaluations were conducted using two datasets: the first was a dummy dataset that represented the characteristics of credit card data, and the second was a newly transformed dataset obtained using data normalization and Principal Component Analysis. Overall, all the classifiers achieved significantly better results after being fed with the filtered data.

ACKNOWLEDGMENT

We are grateful to Universiti Sains Malaysia for providing support to this work.

REFERENCES

[1] WorldPay. (2015, Nov). Global payments report preview: your definitive guide to the world of online payments. Retrieved September 28, 2016, from http://offers.worldpayglobal.com/rs/850-JOA-856/images/GlobalPaymentsReportNov2015.pdf.
[2] Federal Trade Commission. (2008). Consumer Sentinel Network data book for January-December 2008. Retrieved October 20, 2016, from https://www.ftc.gov/.
[3] Bhatla, T. P., Prabhu, V., and Dua, A. (2003). Understanding credit card frauds. Cards Business Review #2003-1, Tata Consultancy Services.
[4] The Nilson Report. (2015). Global fraud losses reach $16.31 billion. Edition: July 2015, Issue 1068.
[5] Sahin, Y., and Duman, E. (2011). Detecting credit card fraud by decision trees and support vector machines. Proceedings of the International MultiConference of Engineers and Computer Scientists 2011, Vol. I, IMECS 2011, March 2011.
[6] Elkan, C. (2001). Magical thinking in data mining: lessons from CoIL challenge 2000. Proc. of SIGKDD '01, pp. 426-431.
[7] Zaki, M. J., and Meira Jr., W. (2014). Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press. ISBN 978-0-521-76633-3.
[8] Ogwueleka, F. N. (2011). Data mining application in credit card fraud detection system. Journal of Engineering Science and Technology, Vol. 6, No. 3, pp. 311-322.
[9] Bhusari, V., and Patil, S. (2011). Application of hidden Markov model in credit card fraud detection. International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 6.
[10] Stolfo, S. J., Fan, D. W., Lee, W., Prodromidis, A. L., and Chan, P. K. (1998). Credit card fraud detection using meta-learning: issues and initial results. Proc. AAAI Workshop on AI Methods in Fraud and Risk Management, pp. 83-90.
[11] Sen, S. K., and Dash, S. (2013). Meta learning algorithms for credit card fraud detection. International Journal of Engineering Research and Development, Vol. 6, Issue 6, pp. 16-20.
[12] Maes, S., Tuyls, K., Vanschoenwinkel, B., and Manderick, B. (2002). Credit card fraud detection using Bayesian and neural networks. Proc. of 1st NAISO Congress on Neuro Fuzzy Technologies, Havana.
[13] Bahnsen, A. C., Stojanovic, A., Aouada, D., and Ottersten, B. (2013). Cost sensitive credit card fraud detection using Bayes minimum risk. 12th International Conference on Machine Learning and Applications.
[14] Kundu, A., Panigrahi, S., Sural, S., and Majumdar, A. K. (2009). Credit card fraud detection: a fusion approach using Dempster-Shafer theory and Bayesian learning. Special Issue on Information Fusion in Computer Security, Vol. 10, Issue No. 4, pp. 354-363.
[15] Lam, W., and Bacchus, F. (1994). Learning Bayesian belief networks: an approach based on the MDL principle. Computational Intelligence, Vol. 10, Issue No. 3, pp. 269-293.
[16] Mehdi, M., Zair, S., Anou, A., and Bensebti, M. (2007). A Bayesian networks in intrusion detection systems. International Journal of Computational Intelligence Research, Vol. 3, Issue No. 1, pp. 0973-1873.
[17] Najafi, R., and Afsharchi, M. (2012). Network intrusion detection using tree augmented naive-Bayes. The Third International Conference on Contemporary Issues in Computer and Information Sciences (CICI) 2012.
[18] Cooper, G., and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, Vol. 9, No. 4, pp. 309-347.
[19] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
[20] Friedman, N., and Goldszmidt, M. (1996). Building classifiers using Bayesian networks. Proc. 13th National Conference on Artificial Intelligence, Vol. 2, pp. 1277-1284.
[21] Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, Vol. 29, pp. 131-163. Kluwer Academic Publishers, Boston.