Classification_of_Fraud_Calls_by_Intent_Analysis_of_Call_Transcripts
of the most efficient techniques for word embedding is Word2Vec. Word2Vec was developed by Google, and pre-trained Word2Vec embeddings are publicly available. To train our own word embeddings, we used the Gensim Python package, which implements the Word2Vec algorithm. Gensim expects its input as a sequence of tokenized sentences. It learns a vector for each word and stores the vectors in a KeyedVectors instance. Gensim also provides several pre-trained models. Once the word vectors are trained, they are stored in a format that is compatible with the original Word2Vec implementation.

• Convolutional Layer: Before implementing the CNN model, we first padded the sequences so that every sentence has the same length, namely the length of the longest sequence. After padding the sequences, we implemented the Convolutional 1-D layer using the Keras library in Python. This layer sits between the Embedding layer and the GlobalMaxPooling1D layer. It has several parameters, the most important being the kernel size, the filter size and the activation type. Typically, in word embedding, each sentence is represented as a matrix whose rows are the tokens of the sentence and whose columns are the dimensions of the word vectors. This matrix is convolved with filters of different sizes in the Convolutional 1-D layer; we used filter sizes of [2,3,4,5,6]. The kernel size is the number of consecutive words the filter convolves at a time. During the convolution, each window of words of that size is multiplied element-wise by the filter weights, the products are summed together with a bias term, and the result is fed to an activation function. The activation function that we used is the Rectified Linear Unit (ReLU). It produces the feature value according to (1):

$$c_i = f\left(\mathbf{w} \cdot \mathbf{x}_{i:i+m-1} + b\right) \qquad (1)$$

Here, $c_i$ is the feature computed for the window that starts at word $i$, $\mathbf{w}$ is the filter (weight) matrix, $\mathbf{x}_{i:i+m-1}$ is the window of $m$ consecutive word vectors ($m$ being the kernel size), the product denotes element-wise multiplication followed by summation, $b$ is the bias term and $f$ is the ReLU function. Once the convolution is complete for one filter, all the features produced by the ReLU function form the feature map $[c_1, c_2, \ldots, c_{n-m+1}]$, where $n$ is the length of the padded sentence.

• Global Max Pooling 1-D: We then applied the GlobalMaxPooling1D layer to the output of the convolutional layer to take the maximum value of each feature map, i.e., one value per filter. After all the filters have been applied, these maxima are collected into a single feature vector.

The final step in the CNN is a fully connected layer, with dropout and regularization, that maps the final feature vector to the output layer. We then summarised the model by displaying the type of each layer, the output shape of each layer and the connections between layers.

2) Naive Bayes: The Naive Bayes classifier is a classification algorithm based on Bayes' theorem. Its main assumption is that every pair of features being classified is independent of each other given the class. In this model, we first stored the positive and negative, i.e., fraud and non-fraud, call transcripts present in the training data set and tokenized them into words. The tokens of the positive and negative classes are stored in separate dictionaries. Both dictionaries are then combined and passed to the model.

IV. IMPLEMENTATION & RESULTS

To implement the proposed methodology, the system needs a stable internet connection and the required data set. For the application to run successfully, a system with a minimum of 4 GB and a maximum of 8 GB of RAM is required.

After implementing our methodology, we obtained an accuracy of 95.47% for the Naive Bayes model and 97.21% for the CNN model.

Fig. 4 shows the confusion matrix of the Naive Bayes algorithm, in which the rows represent the actual labels and the columns represent the predicted labels. Most of the calls were classified correctly. The number of normal calls is larger than the number of fraud calls, even when testing on a small subset of calls. Furthermore, none of the normal calls were classified as fraud calls, while a small number of fraud calls were classified as normal calls.

Since the data set is highly imbalanced, we cannot rely on the models' accuracy alone. To check the performance of both models, we therefore evaluated them with precision, recall and F1 score and plotted these values (Fig. 5 and Fig. 6).

Precision, also known as the positive predictive value, is the fraction of positive-class predictions that truly belong to the positive class. From Fig. 5 and Fig. 6 we observed that for both algorithms the precision is high, which means that the models do not produce many false positives. Recall, on the other hand, also known as the true positive rate, is the fraction of actual positive instances that the model correctly identifies. The recall of both models for the positive class is low, which implies that quite a few instances of the positive class, i.e., fraud calls, are predicted as the negative class, i.e., normal calls.

Fig. 4. Confusion Matrix of the Naive Bayes Model
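The padding, convolution (1) and global max pooling steps of the CNN described earlier can be illustrated with a minimal pure-Python sketch. It is not the authors' Keras code: the helper names (`pad_to_longest`, `conv1d_feature_map`, `global_max_pool`), right-padding with a zero token id, "valid" convolution (no padding inside the layer) and the toy two-dimensional word vectors are all assumptions made for illustration. In Keras, the corresponding roles are played by the `Conv1D` and `GlobalMaxPooling1D` layers.

```python
from typing import List

Vector = List[float]


def pad_to_longest(seqs: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad token-id sequences with pad_id up to the longest length (assumed scheme)."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def relu(z: float) -> float:
    return max(0.0, z)


def conv1d_feature_map(x: List[Vector], w: List[Vector], b: float) -> List[float]:
    """Eq. (1): c_i = f(w . x_{i:i+m-1} + b) for every full window ("valid" mode).

    x: sentence matrix, one row (word vector) per token, n rows.
    w: filter weights, m rows (m = kernel size), same width as the rows of x.
    """
    m, n = len(w), len(x)
    if n < m:
        raise ValueError("sentence is shorter than the kernel size")
    feature_map = []
    for i in range(n - m + 1):
        # Element-wise product of the window x[i:i+m] with the filter, then summed.
        s = sum(w_val * x_val
                for w_row, x_row in zip(w, x[i:i + m])
                for w_val, x_val in zip(w_row, x_row))
        feature_map.append(relu(s + b))
    return feature_map  # [c_1, ..., c_{n-m+1}]


def global_max_pool(feature_maps: List[List[float]]) -> List[float]:
    """Keep the maximum of each feature map: one value per filter."""
    return [max(fm) for fm in feature_maps]


if __name__ == "__main__":
    print(pad_to_longest([[3, 1], [4, 1, 5]]))  # [[3, 1, 0], [4, 1, 5]]
    # Hypothetical 4-word sentence with 2-dimensional word vectors.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    w2 = [[1.0, 0.0], [0.0, 1.0]]                  # kernel size 2
    w3 = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]  # kernel size 3
    maps = [conv1d_feature_map(x, w2, 0.0), conv1d_feature_map(x, w3, 0.5)]
    print(maps)                   # [[2.0, 1.0, 1.0], [0.0, 0.0]]
    print(global_max_pool(maps))  # [2.0, 0.0]
```

Each filter contributes exactly one pooled value, so the vector after pooling has one entry per filter regardless of how long the padded sentence is.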
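The Naive Bayes step (one token dictionary per class, a combined vocabulary, and the independence assumption) can be sketched as a multinomial Naive Bayes with add-one (Laplace) smoothing. This is a from-scratch illustration, not necessarily the library or smoothing scheme the authors used; the regex tokenizer, the class names `"fraud"`/`"normal"`, the class `NaiveBayesCallClassifier` and the toy transcripts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the transcript and keep runs of letters/apostrophes (simplistic)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCallClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, transcripts: List[str], labels: List[str]) -> "NaiveBayesCallClassifier":
        self.doc_counts: Counter = Counter(labels)
        # One token dictionary per class, e.g. "fraud" and "normal".
        self.token_counts: Dict[str, Counter] = {}
        for text, label in zip(transcripts, labels):
            self.token_counts.setdefault(label, Counter()).update(tokenize(text))
        # Combine the class dictionaries into one shared vocabulary.
        self.vocab: Set[str] = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        self.class_totals = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(token | label), assuming independent tokens."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        counts = self.token_counts[label]
        denom = self.class_totals[label] + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.token_counts, key=lambda c: self.log_score(text, c))


if __name__ == "__main__":
    # Hypothetical toy transcripts, not the paper's data set.
    train = [
        "please share your bank otp to verify your account",
        "your card is blocked tell me the pin",
        "hi are we still meeting for lunch today",
        "call me when you reach home",
    ]
    labels = ["fraud", "fraud", "normal", "normal"]
    clf = NaiveBayesCallClassifier().fit(train, labels)
    print(clf.predict("share the otp for your bank card"))  # fraud
    print(clf.predict("are we meeting for lunch"))           # normal
```

Working in log space avoids numerical underflow when a transcript contains many tokens, and the add-one term keeps a word seen in only one class from forcing the other class's probability to zero.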
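Because the data set is imbalanced, a high accuracy can hide poor recall on the fraud class. The sketch below computes accuracy, precision, recall and F1 score from a 2×2 confusion matrix laid out as in Fig. 4 (rows are actual labels, columns are predicted labels). The index convention (0 = normal, 1 = fraud), the function name `binary_metrics` and the counts in the usage example are hypothetical; they are not the paper's results.

```python
from typing import Dict, List


def binary_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Metrics for a 2x2 confusion matrix cm[actual][predicted].

    Index 0 = normal call (negative), index 1 = fraud call (positive).
    """
    tn, fp = cm[0][0], cm[0][1]
    fn, tp = cm[1][0], cm[1][1]
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    # Precision (positive predictive value): flagged calls that really are fraud.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall (true positive rate): actual fraud calls that are caught.
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical imbalanced counts: 90 normal calls, all classified correctly;
    # 10 fraud calls, of which only 4 are caught.
    print(binary_metrics([[90, 0], [6, 4]]))
```

With these made-up counts, accuracy is 0.94 and precision is 1.0 while recall is only 0.4, the same pattern of high precision and low fraud-class recall that the results above describe.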
Fig. 5. Results of the CNN Model

Fig. 6. Results of the Naive Bayes Model

V. CONCLUSION

After implementing two different algorithms, we determined that the CNN model gives an accuracy of 97.21% and the Naive Bayes model an accuracy of 95.47%. Recall on the fraud class, which measures how many fraud calls are detected rather than misclassified as normal calls, is an important factor for our problem statement and is comparatively higher for CNN than for Naive Bayes. Hence we conclude that the CNN model performs better and is well equipped to classify the calls.

There are a few limitations to this project. Newer algorithms offer scope to improve the model's performance. An interface is needed for this model to be used in practice. The ways and methods of duping people are always evolving, and hence the data will need to be updated periodically.

Phishing through phone calls is a modern way to attack people and seek their personal information. It could be used