
International Conference on Data Science and Its Applications (ICoDSA)

Filipino Online Scam Data Classification using Decision Tree Algorithms
Mark Harold H. Calderon, Department of Information Technology, De La Salle University, Manila, Philippines ([email protected])
Eddie Bouy B. Palad, Department of Information Technology, MSU-Iligan Institute of Technology, Iligan City, Philippines ([email protected])
Marivic S. Tangkeko, Department of Information Technology, De La Salle University, Manila, Philippines ([email protected])

Abstract—Classification as a data mining technique is the process of finding a set of models that describe and distinguish data classes and concepts, so that such a model can be used to predict the class of data with an unknown label. The value of this and other data mining techniques has been highlighted in several prior studies; however, the scarcity of works involving cybercrime data in the Philippines suggests that data mining is rarely applied to facilitate cybercrime investigations in the country. Hence, this study classifies unstructured online scam data consisting of 54,059 mostly Filipino words as a significant continuation of a prior study. Three (3) decision tree algorithms, namely Random Tree, J48, and Logistic Model Tree (LMT), were utilized and compared in terms of their performance and prediction accuracy. The results and evaluation demonstrate that among the classifiers, LMT performs best on the improved online scam data, achieving the highest accuracy and the lowest error rate. Further research may be conducted using other classification or data mining techniques in Weka, and weight allocation and subsequent ranking approaches may be applied to rank the classifiers evaluated.

Keywords—classification, decision trees, text mining

I. INTRODUCTION

Data classification has been tagged as one of the most frequently utilized techniques in data mining [1][2], finding a model that distinguishes data classes by predicting the class of objects [3] with unknown class labels [4]. The technique has been used in a wide range of scholarly works in various fields to accurately predict the target classes of a given dataset. One interesting area is cybercrime investigation and crime classification. In fact, [1] claimed that a number of scholars underline the significance and the perceived contributions of data mining techniques to crime investigation; however, there is still a dearth of notable scholarly data mining works employing crime datasets in the Philippine setting. The authors also observed that most, if not all, previous data mining works involving crime data use crime news and text scraped from news outlets' websites as datasets.

The initial study of [1] ventured into the cybercrime field using data mining techniques with Filipino data coming from police incident reports and victims' narratives of online scam incidents. That research evaluated the performance and prediction accuracy of three (3) classification algorithms under different categories, namely Naïve Bayes, J48 decision tree, and Sequential Minimal Optimization (SMO), using a fraud dataset consisting of 14,098 words or attributes. Its results revealed that, among the classification algorithms employed, the J48 decision tree emerged as the best performer, recording the highest accuracy rate with the least errors. Additionally, police investigators validated the results, saying that among the classifiers used, it was the J48 decision tree algorithm that produced the most favourable results, which were easily interpreted and could be applied in future cybercrime investigations. Based on those initial results, it can be gleaned that decision tree algorithms could be useful in the field of criminal investigations, as they provide an opportunity for police investigators to employ data mining on crime data in the country.

The authors further found support for the claim that the decision tree classification method is the most used method in data classification. In fact, [5] discussed that the decision tree algorithm performs better than other methods, as it was argued to produce favourable classification results presented in a tree-like structure with nodes and leaves that can be visualized clearly, understood easily, and interpreted straightforwardly [6][7][8]. The tree also handles both numerical and categorical variables. Researchers have thus concluded that it is one of the most efficient and popular classifiers [5][9][10].

Thus, this present study aimed to continue the previous study of [1] based on the identified research gaps: the Filipino dataset used was relatively small, and the classifiers compared, namely Naïve Bayes, J48 decision tree, and SMO, belonged to different categories. The objective of the present authors is therefore to compare the performance of other decision tree classification algorithms against J48 using a larger dataset, in order to shift the focus to decision trees and assess whether such algorithms can be effectively used in data mining endeavors that may positively contribute to criminal investigations in the country.

The paper is organized as follows: Section II presents a brief review of related works; Section III introduces the methods employed in conducting this research, including the dataset used; Section IV presents the discussion of the results; and Section V lays out the conclusion as well as some recommendations for future research.

II. RELATED WORK

A classification method is mostly utilized by researchers to correctly predict the target class [11] for each situation in the data. Such a method involves several techniques, namely decision trees, Bayesian networks, K-Nearest Neighbor (KNN) [12][13], fuzzy logic, neural networks, and Support Vector Machines [8]; all of these are implemented in the Weka Data Mining Tool [14]. As already

© IEEE
Authorized licensed use limited to: De La Salle University. Downloaded on October 10,2020 at 14:02:35 UTC from IEEE Xplore. Restrictions apply.
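The classification task described above, building a model from labeled data and then using it to predict the class of an instance with an unknown label, can be illustrated with a minimal sketch. This is a toy Python example, not the authors' Weka pipeline: the snippets and the word-overlap scoring are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Toy labeled corpus of hypothetical scam-narrative snippets.
# (Invented examples -- not drawn from the actual PNP dataset.)
training_data = [
    ("nagbayad ako para sa cellphone pero walang dumating", "buy and sell"),
    ("binigyan ako ng trabaho basta magbayad ng placement fee", "employment"),
    ("nanalo daw ako sa lotto kaya humingi sila ng processing fee", "lottery"),
]

# "Build the model": count how often each word appears per class.
word_counts = defaultdict(Counter)
for text, label in training_data:
    word_counts[label].update(text.lower().split())

def predict(text):
    """Predict the class of an unlabeled narrative by word overlap."""
    tokens = text.lower().split()
    scores = {label: sum(counts[t] for t in tokens)
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(predict("walang dumating na cellphone kahit nagbayad na ako"))
# -> buy and sell
```

A real classifier replaces the naive word-overlap score with a learned model (a decision tree in this study), but the build-then-predict structure is the same.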
mentioned above, in [1] different classification techniques, namely J48, Naïve Bayes, and SMO, were employed in the analysis of the online scam dataset, for which the J48 decision tree yielded the most favourable results.

Hence, in this present study, the focus shifts to decision tree algorithms. Reference [15] mentioned that a decision tree is a machine learning classifier shown as a tree, used for representing classifiers and regressions. A decision tree is presented with branches and nodes, with leaves representing the classes [16]. The terminal nodes show the final classification [17] or the goal decision variables. In this paper, the performance of three (3) commonly used decision tree algorithms, namely Random Tree, J48, and LMT, was evaluated using the online scam dataset.

A. Random Tree
Reference [18] discussed that this algorithm builds a model by randomly constructing a tree from a set of possible trees, maintaining K random features at each node [19][20]. "Random" here means that each tree in the available set has an equal chance of being sampled, i.e., the trees being generated and evaluated follow a uniform distribution. Random Trees are principally single model trees applying Random Forest ideas [21]. Further, the combination of large sets of random trees generally yields accurate models [18][21].

Reference [15] tested Random Tree, among other decision tree algorithms, to present an automatic classification of data that included features from a list of authentic and hijacked journals.

B. J48
Widely known as an implementation of the C4.5 algorithm [9], the J48 decision tree presents a model in a tree-like flow chart [22] using a recursive divide-and-conquer strategy [16][23]. The algorithm works by repeatedly selecting the best attribute for data splits and extending the nodes until the stopping criterion is met [22].

The study of [8] developed a crime prediction prototype model using the J48 decision tree, applied in the context of law enforcement to address the continuing demand for new and advanced approaches to improve crime analytics.

C. Logistic Model Tree
Logistic model trees are classification trees with logistic regression functions at the leaves [24]. The LMT classifier used to build such models is described as a combination of the C4.5 decision tree and logistic regression functions [25]. With the LMT classifier, information gain is used for splitting, while the LogitBoost algorithm of [25] fits the logistic regression functions at a given node. Compared to other decision trees, leaf nodes in LMTs are replaced by a regression plane rather than a constant value.

III. METHODOLOGY

Data mining comprises various techniques, including classification [26], clustering [27], regression, and others. This paper, however, mainly focuses on the classification technique, with emphasis on the use of decision trees. The authors utilized the pipe-lined system model presented in [1], as illustrated in Fig. 1.

The researchers conducted pre-processing activities on the dataset prior to classifying it using the various decision tree algorithms available in the Weka text mining tool. After pre-processing the data, the named entities were extracted and represented as online scam records. Extracting the named entities yields a vector space model, which was then used to conduct the classification phase on the online scam datasets.

Data classification is then performed on the cleaned unstructured data gathered from police records as well as from online scam victims who willingly narrated their scam experiences.

A. Dataset used
The researchers aimed for a significant improvement over the 14,098-word dataset used in [1], and achieved this by gathering a total of 54,059 words derived exclusively from written narratives of Filipino online scam victims and police incident reports. To reiterate, this is one of the main gaps that motivated the researchers in pursuing this study, as they wanted to investigate the performance of classification algorithms using a larger dataset.

As claimed in the previous study, online scams consistently topped the list of the most common cybercrimes reported to the Philippine National Police – Anti-Cybercrime Group (PNP-ACG). For this paper, the authors looked into reported or narrated online scam incidents under different categories or classes, namely buy and sell, banking, employment, imposter, investment, lottery, boiler room, online game, and online romance scams.

B. Pre-processing Phase
This phase involved two steps. First, the online scam data were manually pre-processed: with the help of PNP investigators, incident reports were categorized by type or class of online scam as enumerated in the preceding section; then, the researchers removed the names of victims, suspects, and reporting persons in order to comply with data privacy laws and the confidentiality agreement entered into between the respondents and the researchers. In this stage, some English words were also translated to Filipino.

Fig. 1. Pipe-lined System Model.
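J48 is described above as repeatedly selecting the best attribute for a data split. The selection step at a single node can be sketched with information gain, the entropy reduction that C4.5-family trees use to rank candidate attributes. This is an illustrative Python sketch with invented binary word-presence features, not the Weka implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    gain = entropy(labels)
    n = len(rows)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical word-presence features for a few scam reports (invented data).
rows = [
    {"mentions_payment": 1, "mentions_prize": 0},  # buy and sell
    {"mentions_payment": 1, "mentions_prize": 0},  # buy and sell
    {"mentions_payment": 0, "mentions_prize": 1},  # lottery
    {"mentions_payment": 0, "mentions_prize": 0},  # lottery
]
labels = ["buy and sell", "buy and sell", "lottery", "lottery"]

# A C4.5-style tree picks the attribute with the highest gain for the split,
# then recurses on each branch until a stopping criterion is met.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best, information_gain(rows, labels, best))  # -> mentions_payment 1.0
```

Here `mentions_payment` separates the two classes perfectly (gain 1.0), so it would be chosen as the split attribute at this node.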

TABLE I. STRINGTOWORDVECTOR PARAMETERS AND VALUES

Parameter | Set Value
IDFTransform | True
TFTransform | False
attributeIndices | First-last
attributeNamePrefix | ""
Debug | False
dictionaryFileToSaveTo | Not set
doNotCheckCapabilities | False
doNotOperateOnPerClassBasis | False
invertSelection | False
lowerCaseTokens | True
minTermFreq | 1
periodicPruning | -1
outputWordCounts | True
saveDictionaryBinaryForm | False
Tokenizer | Alphabetic Tokenizer
stemmer | NullStemmer
stopwordsHandler | WordsFromFile: Filipino.txt
normalizeDocLength | No normalization
wordsToKeep | 2,000

The second pre-processing step was done in Weka, where the StringToWordVector filter was employed. Since some words or attributes appear frequently in the online scam dataset without providing information about a text, the stopwords handler in Weka was used to determine whether a substring in the text is an empty word. A list of Filipino stopwords (designated as Filipino.txt, as shown in Table 1) was saved in a text file, and Weka's stopwords handler utilized this list during the pre-processing activity.

Further, the Alphabetic tokenizer was employed to handle tokenization. Tokenization involves breaking a stream of textual content up into words, symbols, or other meaningful terms or elements [28][29], branded as tokens, which are then considered during pre-processing.

Table 1 lists the complete parameters with their corresponding values provided to the StringToWordVector filter, patterned after [1].

C. Classification
In order to classify the Filipino online scam data, two processes were primarily involved: (1) building the classification model using training data and (2) using the model [17][30]. Hence, the authors ran the classification in Weka using the training set and cross-validation methods.

Using the training set means building a model for a particular classifier where the method is trained with all available data and the results are then applied to the same input data collection [31].

In order to validate the model built by a classifier, the cross-validation method is used. In this method, the dataset is divided into k folds, where k stands for the number of folds; while one fold is used for testing, the remaining folds are used for training [13].

It is important to note that in this paper a 10-fold cross-validation was utilized, wherein the data were divided into ten (10) groups. In each of the ten (10) rounds, nine (9) groups were used for training while the remaining group was used by the classifier to test and calculate its classification accuracy. This process is repeated 10 times, and the average of the testing accuracy obtained from all ten (10) rounds determines the overall classification accuracy of a classification algorithm [11][32].

The authors used the following metrics to evaluate the results and arrive at a relevant performance comparison of the decision tree classifiers: (a) the time for the classifier to build the model; (b) prediction accuracy; and (c) the errors generated by the classifiers, represented as the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE).

MAE yields the average of the errors [9] in the given online scam dataset: the sum of the absolute errors over all instances, divided by the number of instances in the test set with the actual online scam category or label. It generally indicates how far the model is from giving the right answer [5].

In contrast, [3] argued that RMSE must also be used to evaluate the performance of the decision tree algorithms together with MAE, as the former renders values in the same range as the predicted value itself, consequently making the interpretation of the results easier. RMSE is also said to be sensitive to outliers and to exaggerate their effect compared to MAE [33]. For its interpretation, a classifier yields a good performance when a low error is indicated in the results.

Furthermore, apart from the basic performance parameters mentioned in the preceding paragraphs, the following metrics are also considered for optimum evaluation of the decision tree classifiers' performance: True Positive (TP) rate and Recall, which show the correctly classified instances; False Positive (FP) rate, which reports instances incorrectly labelled as correct; Precision, also known as Positive Predictive Value (PPV), which measures the exactness of the relevant data retrieved; and the F-Measure, or F-score, which reflects the harmonic mean between precision and recall [3][9][5][29].

In evaluating the results, a high precision rate means that the model returns more relevant data than irrelevant data. High recall means the model has returned most of the relevant results; and a high value of exactness as presented in the F-Measure leads to more correctly recognized instances than improper ones [29].

IV. RESULTS AND DISCUSSIONS

After applying the three (3) decision tree algorithms to the online scam dataset, the following results were obtained. These results are combined in the tables presented below for performance comparison of the algorithms.

The results using the training set are presented in Tables 2 and 3, while Tables 4 and 5 present the results using the 10-fold cross-validation.
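The k-fold procedure described above can be sketched as a short skeleton: partition the indices into k folds, hold each fold out once as the test set, train on the rest, and average the per-fold accuracies. This is an illustrative Python sketch (the study used Weka's built-in cross-validation), with a trivial majority-class stand-in for the classifier and invented data.

```python
import random
from collections import Counter

def cross_validate(data, labels, train_fn, k=10, seed=42):
    """k-fold cross-validation: each fold serves once as the test set,
    the remaining k-1 folds as training; fold accuracies are averaged."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(data[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

def majority_trainer(X, y):
    """Stand-in 'classifier': always predicts the majority training class."""
    majority = Counter(y).most_common(1)[0][0]
    return lambda x: majority

# Invented dataset: 70 scam and 30 non-scam instances.
data = list(range(100))
labels = ["scam"] * 70 + ["non-scam"] * 30
print(round(cross_validate(data, labels, majority_trainer), 2))  # -> 0.7
```

The 0.7 average simply reflects the 70/30 class balance; with a real learner such as J48 or LMT, the averaged accuracy estimates generalization rather than class frequency.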

TABLE II. RESULTS USING THE TRAINING SET

Evaluation Metric | Random Tree | J48 | LMT
Time to Build the Model (in secs) | 0.1 | 0.15 | 197.59
Prediction Accuracy | 99.57% | 89.27% | 99.57%
Mean Absolute Error | 0.001 | 0.0358 | 0.0014
Root Mean Squared Error | 0.0218 | 0.1339 | 0.0219

TABLE III. SUMMARIZED RESULTS ON THE TRAINING SET

Algorithm | TP Rate | FP Rate | Precision | Recall | F-Measure
Random Tree | 0.996 | 0.005 | 0.996 | 0.996 | 0.996
J48 | 0.893 | 0.051 | 0.895 | 0.893 | 0.887
LMT | 0.996 | 0.005 | 0.996 | 0.996 | 0.966

TABLE IV. RESULTS USING 10-FOLD CROSS-VALIDATION

Evaluation Metric | Random Tree | J48 | LMT
Time to Build the Model (in secs) | 0.06 | 7.03 | 212.36
Prediction Accuracy | 46.35% | 65.24% | 72.10%
Mean Absolute Error | 0.1197 | 0.0859 | 0.059
Root Mean Squared Error | 0.3449 | 0.2672 | 0.2181

TABLE V. SUMMARIZED RESULTS ON CROSS-VALIDATION

Algorithm | TP Rate | FP Rate | Precision | Recall | F-Measure
Random Tree | 0.464 | 0.253 | 0.453 | 0.464 | 0.456
J48 | 0.652 | 0.202 | 0.657 | 0.652 | 0.641
LMT | 0.721 | 0.207 | 0.721 | 0.721 | 0.695

Using the training set to build the model, both Random Tree and LMT yield a 99.57% accuracy rate, with J48 having only 89.27% accuracy, as presented in Table 2. This could mean that both Random Tree and LMT were able to classify the instances into their actual online scam category or class more accurately than J48, although it took Random Tree only 0.1 second to build the model compared to 197.59 seconds for LMT.

Looking at the errors, Random Tree has an MAE of 0.001 and an RMSE of 0.0218, while LMT records an MAE of 0.0014 and an RMSE of 0.0219, which could mean that both classifiers gave almost the same average prediction error. Using these evaluation metrics, J48 came out last among the classifiers, reflecting 89.27% prediction accuracy and the highest error rates as well.

Moreover, the results reflected in Table 3, displaying the values of TP rate, FP rate, Precision, Recall, and F-Measure, further show that Random Tree and LMT are tied at 0.996 on these metrics, with a corresponding FP rate of 0.005. A higher TP rate and Recall together with a lower FP rate mean that the classifier is performing efficiently. Likewise, J48 performs the worst, recording a lower TP rate of 0.893 compared to the other classifiers. The results also reflect that J48 garnered a Precision of 0.895, a Recall of 0.893, and an F-Measure of 0.887. In summary, running the training set in Weka, both LMT and Random Tree outperform J48 on the abovementioned performance metrics.

Furthermore, it was the contention of [1] that k-fold cross-validation is highly recommended for estimating classification accuracy, as [13] suggests that this approach is fitting for testing the algorithms while steering clear of biased results. It was also observed that cross-validation provides some robustness to the classification [21].

Hence, using 10-fold cross-validation, the algorithm that gives better Prediction Accuracy, Recall, Precision, and F-Measure values was considered the most efficient. The cross-validation results are given much weight by the researchers in evaluating the performance of the classifiers, as they give a more accurate estimate of performance than the training set.

As per the results obtained using the 10-fold cross-validation presented in Tables 4 and 5, LMT outperforms the other algorithms. Using a larger dataset compared to the prior study, the results show that LMT works very well compared to J48: the former yields a prediction accuracy of 72.10%, which means it correctly classified 72.10% of the online scam data, as compared to J48's 65.24%. Random Tree, on the other hand, only records a prediction accuracy of 46.35%. These results also show that in precision LMT outperforms J48 and Random Tree, with a precision of 0.721; Random Tree has the worst precision (0.453) compared to J48 (0.657). Additionally, Recall shows the accuracy of the classification performed based on the total number of online scam instances, and the results in Table 5 show that the LMT classifier's Recall of 0.721 performs better than J48's 0.652. Random Tree still performs the worst, with a Recall of 0.464.

As to the error rates, using the two (2) classification error metrics, namely MAE and RMSE, the results show that on both, LMT had the lowest errors compared to both J48 and Random Tree. LMT records an MAE of only 0.059 as compared to Random Tree's 0.1197 and J48's 0.0859, and an RMSE of 0.2181 as compared to Random Tree's 0.3449 and J48's 0.2672.

One interesting observation, though, is that while LMT can be considered the most efficient in learning and classification based on the results, its runtime is the slowest among the classifiers. Whereas it took Random Tree only 0.06 seconds to build the model under cross-validation and J48 7.03 seconds, LMT took 212.36 seconds. Still, as already claimed in [1], where time is not the main metric for evaluating performance, LMT can be declared to have performed better than J48 and Random Tree using the online scam dataset.

Furthermore, a graph is also plotted to visually represent and compare the prediction accuracy of the different algorithms. Fig. 2 shows the performance comparison based on prediction accuracy, and it can be gleaned that LMT indeed outperforms the other two (2) classifiers, yielding the highest overall accuracy rate in 10-fold cross-validation.
Fig. 2. Comparison of Prediction Accuracy Results.

In addition, Fig. 3 shows that among the classifiers evaluated, it was LMT that yields the best performance, as it records low error rates both using the training set and 10-fold cross-validation.

Fig. 3. Comparison of Errors.

V. CONCLUSION AND RECOMMENDATIONS

This research was conducted as a continuation and improvement of the study of [1], which was argued to be an initial attempt to take advantage of data mining techniques using available cybercrime data, as the Philippines is seen to lack research applying data mining in the field of crime investigations. In the prior work, the authors used a relatively small quantity of online scam data consisting of 14,098 words. Different classifiers were compared, namely Naïve Bayes, J48, and SMO, based on their accuracy, precision, recall, error rates, learning time, and other metrics. Their results reveal that the J48 decision tree performs best, recording the highest accuracy rate and the least errors compared to Naïve Bayes and SMO. Upon presentation of the results to police investigators through a validation exercise, J48 was also preferred.

In this present study, however, the authors shift their focus to decision tree algorithms using a relatively larger dataset consisting of 54,059 mainly Filipino words. The dataset was tested through the training set and cross-validation using the Random Tree, J48, and LMT classifiers. Based on the prediction accuracy results as well as the classification errors, one may conclude that the LMT classifier was the most suitable algorithm for the dataset. LMT generates the best performance and is the most efficient in learning and classification as against J48 and Random Tree. One reason for this is that it correctly classifies 72.10% of the instances using the 10-fold cross-validation, as opposed to J48's 65.24% and Random Tree's 46.35%.

Another reason is that LMT also obtains the highest values in terms of TP rate, Precision, Recall, and F-Measure, while generating the least errors as well. Having the highest precision rate, it can be concluded that the model generated by the LMT classifier returns more relevant data than irrelevant data as against the other classifiers. Since LMT also obtains the highest recall, it can be interpreted that the model has returned most of the relevant results, while the high value of exactness presented in LMT's F-Measure leads to more correctly recognized instances than improper ones.

Therefore, it can be clearly concluded that the LMT classifier is the best performer in all areas of comparison; however, the authors note that it ranks last as to run time, as it records the longest time to build the model. Nevertheless, as stressed in [1], where building time is not the sole or main metric for evaluating performance, the LMT classifier can be concluded to have performed better than the other decision tree algorithms.

As future work, the authors recommend comparing the results obtained from these classifiers with other decision tree classifiers implemented in Weka and investigating what causes the differences in the performance of such algorithms. Weight allocation and subsequent ranking approaches may also be introduced or employed in order to rank the classifiers evaluated.

ACKNOWLEDGMENT

The authors appreciate all the support given by individuals during the entire course of the research project.

REFERENCES
[1] E. B. B. Palad, M. S. Tangkeko, L. A. K. Magpantay, and G. L. Sipin, "Document Classification of Filipino Online Scam Incident Text using Data Mining Techniques," in 19th International Symposium on Communications and Information Technologies, 2019, pp. 232–237.
[2] P. Rajesh and M. Karthikeyan, "A Comparative Study of Data Mining Algorithms for Decision Tree Approaches using WEKA Tool," Adv. Nat. Appl. Sci., vol. 11, no. 9, pp. 230–241, 2017.
[3] Z. E. Rasjid and R. Setiawan, "Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques," Procedia Comput. Sci., vol. 116, pp. 107–112, 2017.
[4] K. F. Bindhia, Y. Vijayalakshmi, P. Manimegalai, and S. S. Babu, "Classification using Decision Tree Approach towards Information Retrieval Keywords Techniques and a Data Mining Implementation using WEKA Data Set," Int. J. Pure Appl. Math., vol. 116, no. 22, pp. 19–29, 2017.
[5] S. Hussain, N. A. Dahan, F. M. Ba-Alwib, and N. Ribata, "Educational Data Mining and Analysis of Students' Academic

Authorized licensed use limited to: De La Salle University. Downloaded on October 10,2020 at 14:02:35 UTC from IEEE Xplore. Restrictions apply.
Performance Using WEKA," Indones. J. Electr. Eng. Comput. Sci., vol. 9, no. 2, pp. 447–459, 2018.
[6] C. Anuradha and T. Velmurugan, "A Comparative Analysis on the Evaluation of Classification Algorithms in the Prediction of Students Performance," Indian J. Sci. Technol., vol. 8, no. 15, p. 12, 2015.
[7] A. Arivarasan and M. Karthikeyan, "Classification based Performance analysis using Naïve-Bayes J48 and Random forest algorithms," Int. J. Appl. Res., vol. 3, no. 6, pp. 174–178, 2017.
[8] E. Ahishakiye, D. Taremwa, E. O. Omulo, and I. Niyonzima, "Crime Prediction using Decision Tree (J48) Classification Algorithm," Int. J. Comput. Inf. Technol., vol. 06, no. 03, pp. 2279–764, 2017.
[9] R. Panigrahi and S. Borah, "Rank Allocation to J48 Group of Decision Tree Classifiers using Binary and Multiclass Intrusion Detection Datasets," in International Conference on Computational Intelligence and Data Science, 2018, vol. 132, pp. 323–332.
[10] A. H. Aliwy and E. H. A. Ameer, "Comparative Study of Five Text Classification Algorithms with their Improvements," Int. J. Appl. Eng. Res., vol. 12, no. 14, pp. 973–4562, 2017.
[11] G. Obuandike, A. Isah, and J. Alhasan, "Analytical Study of Some Selected Classification Algorithms in WEKA Using Real Crime Data," Int. J. Adv. Res. Artif. Intell., vol. 4, no. 12, pp. 44–48, 2015.
[12] M. Bilal, H. Israr, M. Shahid, and A. Khan, "Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques," J. King Saud Univ. - Comput. Inf. Sci., vol. 28, no. 3, pp. 330–344, 2016.
[13] S. G. Cho and S. B. Kim, "A Data-driven Text Similarity Measure based on Classification Algorithms," Int. J. Ind. Eng., vol. 24, no. 3, pp. 328–339, 2017.
[14] R. Cretulescu, D. Morariu, and M. Breazu, "Using WEKA Framework in Document Classification," Int. J. Adv. Stat. IT&C Econ. Life Sci., vol. 6, no. 2, 2016.
[15] M. A. Shahri, M. D. Jazi, G. Borchardt, and M. Dadkhah, "Detecting Hijacked Journals by Using Classification Algorithms," Sci. Eng. Ethics, vol. 24, pp. 655–668, 2018.
[16] U. Bashir and M. Chachoo, "Performance Evaluation of J48 and Bayes Algorithms for Intrusion Detection System," Int. J. Netw. Secur. its Appl., vol. 9, no. 4, pp. 01–11, 2017.
[17] F. Alam and S. Pachauri, "Comparative Study of J48, Naive Bayes and One-R Classification Technique for Credit Card Fraud Detection using WEKA," Adv. Comput. Sci. Technol., vol. 10, no. 6, pp. 1731–1743, 2017.
[18] S. Kalmegh, "Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and Random Tree for Classification of Indian News," Int. J. Innov. Sci. Eng. Technol., vol. 2, no. 2, pp. 438–446, 2015.
[19] P. M. Vairavan and B. S. Sasidhar, "Classification Using Decision Tree Approach towards Information Retrieval Keywords Techniques and a Data Mining Implementation Using WEKA Data Set," Int. J. Pure Appl. Math., vol. 116, no. 22, pp. 19–29, 2017.
[20] P. Kaur and A. Khamparia, "Classification of liver based diseases using random tree," Int. J. Adv. Eng. Technol., vol. 8, no. 3, pp. 306–313, 2015.
[21] A. K. Mishra and B. K. Ratha, "Study of Random Tree and Random Forest Data Mining Algorithms for Microarray Data Analysis," Int. J. Adv. Electr. Comput. Eng., vol. 3, no. 4, pp. 5–7, 2016.
[22] N. N. Sakhare and S. Joshi, "Classification of Criminal Data Using J48-Decision Tree Algorithm," Int. J. Data Warehous. Min., vol. 4, no. 3, pp. 167–171, 2015.
[23] A. Pelt et al., "Classification of Biochemical and Biomechanical Data of Diabetic Rats Treated with Magnetic Field by PCA-Supported J48 Algorithm," Kafkas Univ. Vet. Fak. Derg., vol. 25, no. 6, pp. 741–747, 2019.
[24] F. M. Ali, E.-B. E. Fgee, and Z. S. Zubi, "Predicting Performance of Classification Algorithms," Int. J. Comput. Eng. Technol., vol. 6, no. 2, pp. 19–28, 2015.
[25] D. T. Bui, T. A. Tuan, H. Klempe, B. Praghan, and I. Revhaug, "Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree," Landslides, vol. 13, pp. 361–378, 2016.
[26] A. Naik and L. Samant, "Correlation Review of Classification Algorithm Using Data Mining Tool: WEKA, Rapidminer, Tanagra, Orange and Knime," Procedia Comput. Sci., vol. 85, pp. 662–668, 2016.
[27] R. Kiani, S. Mahdavi, and A. Keshavarzi, "Analysis and Prediction of Crimes by Clustering and Classification," Int. J. Adv. Res. Artif. Intell., vol. 4, no. 8, pp. 11–17, 2015.
[28] V. Gurusamy, S. Kannan, and N. K., "Performance Analysis: Stemming Algorithm for the English Language," Int. J. Sci. Res. Dev., vol. 5, no. 05, pp. 1933–1938, 2017.
[29] K. S. Digamberao and R. Prasad, "Author Identification using Sequential Minimal Optimization with rule-based Decision Tree on Indian Literature in Marathi," Procedia Comput. Sci., vol. 132, pp. 1086–1101, 2018.
[30] A. K. Pandey, "A comparative study of classification techniques by utilizing WEKA," in 2016 Int. Conf. Signal Process. Commun., 2016, pp. 219–224.
[31] G. D. K. Kishore and M. B. Reddy, "Comparative Analysis between Classification Algorithms and Data Sets (1:N & N:1) through WEKA," Open Access Int. J. Sci. Eng., vol. 2, no. 5, pp. 23–28, 2017.
[32] A. S. Suguitan and L. N. Dacaymat, "Vehicle Image Classification Using Data Mining Techniques," in 2nd International Conference on Computer Science and Software, 2019, pp. 13–17.
[33] G. Saltos and E. Haig, "An Exploration of Crime Prediction Using Data Mining on Open Data," Int. J. Inf. Technol. Decis. Mak., vol. 15, no. 9, 2017.

