
International Conference on Data Science and Its Applications (ICoDSA)

Filipino Online Scam Data Classification using Decision Tree Algorithms
Mark Harold H. Calderon, Department of Information Technology, De La Salle University, Manila, Philippines ([email protected])
Eddie Bouy B. Palad, Department of Information Technology, MSU-Iligan Institute of Technology, Iligan City, Philippines ([email protected])
Marivic S. Tangkeko, Department of Information Technology, De La Salle University, Manila, Philippines ([email protected])

Abstract—Classification as a data mining technique is the process of finding a set of models that describe and distinguish data classes and concepts, so that such a model can be used to predict the class of data with an unknown label. The value of this and other data mining techniques has been highlighted in several prior studies; however, the scarcity of works involving cybercrime data in the Philippines suggests that data mining is rarely applied to facilitate cybercrime investigations in the country. Hence, this study classifies unstructured online scam data consisting of 54,059 mostly Filipino words as a significant continuation of a prior study. Three (3) decision tree algorithms, namely Random Tree, J48, and Logistic Model Tree (LMT), were utilized and compared in terms of their performance and prediction accuracy. The results and evaluation demonstrate that among the classifiers, LMT performs best on the improved online scam data, achieving the highest accuracy and the lowest error rate. Further research may be conducted using other classification or data mining techniques in Weka, and weight allocation and subsequent ranking approaches may be applied to rank the classifiers evaluated.

Keywords—classification, decision trees, text mining

I. INTRODUCTION

Data classification has been tagged as one of the most frequently utilized techniques in data mining [1][2], finding a model that distinguishes data classes by predicting the class of objects [3] with unknown class labels [4]. The technique has been used in a wide range of scholarly works in various fields to accurately predict the target classes of a given dataset. One interesting area is cybercrime investigation and crime classification. In fact, [1] claimed that a number of scholars underline the significance and the perceived contributions of data mining techniques to crime investigation; however, there is still a dearth of notable scholarly data mining works employing crime datasets in the Philippine setting. The authors also observed that most, if not all, previous data mining works involving crime data use crime news and text scraped from news outlets' websites as datasets.

The initial study of [1] ventured into the cybercrime field using data mining techniques with Filipino data coming from police incident reports and victims' narratives of online scam incidents. That research evaluated the performance and prediction accuracy of three (3) classification algorithms under different categories, namely Naïve Bayes, J48 decision tree, and Sequential Minimal Optimization (SMO), using a fraud dataset consisting of 14,098 words or attributes. Its results revealed that, among the classification algorithms employed, the J48 decision tree emerged as the best performer, recording the highest accuracy rate with the least errors. Additionally, police investigators validated the results, saying that among the classifiers used, it was the J48 decision tree algorithm that produced the most favourable results, which were easily interpreted and could be applied in future cybercrime investigations. Based on those initial results, it can be gleaned that decision tree algorithms could be useful in the field of criminal investigations, as they provide an opportunity for police investigators to employ data mining on crime data in the country.

The authors further found support for the claim that the decision tree classification method is the most used method in data classification. In fact, [5] discussed that the decision tree algorithm performs better than other methods, as it was argued to produce favourable classification results presented in a tree-like structure with nodes and leaves that can be visualized clearly, understood easily, and interpreted straightforwardly [6][7][8]. The tree also handles both numerical and categorical variables. Researchers have thus concluded that it is one of the most efficient and popular classifiers [5][9][10].

Thus, this present study aimed to continue the previous study of [1] based on the identified research gaps: the Filipino dataset used was relatively small, and the classifiers compared, namely Naïve Bayes, J48 decision tree, and SMO, belonged to different categories. The objective of the present authors is therefore to compare the performance of other decision tree classification algorithms against J48 using a larger dataset, in order to shift the focus to decision trees and assess whether such algorithms can be effectively used in data mining endeavors that may positively contribute to criminal investigations in the country.

The paper is organized as follows: Section II presents a brief review of related works; Section III introduces the methods employed in conducting this research, including the dataset used; Section IV presents the discussion of the results; and Section V lays out the conclusion as well as some recommendations for future research.

II. RELATED WORK

A classification method is mostly utilized by researchers to correctly predict the target class [11] for each situation in the data. Such a method involves several techniques, namely decision trees, Bayesian networks, K-Nearest Neighbor (KNN) [12][13], fuzzy logic, neural networks, and Support Vector Machines [8]; all of these are implemented in the Weka Data Mining Tool [14]. As already

© IEEE
Authorized licensed use limited to: De La Salle University. Downloaded on October 10,2020 at 14:02:35 UTC from IEEE Xplore. Restrictions apply.
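The classification task described above, building a model from labeled data and then using it to predict the class of an instance with an unknown label, can be illustrated with a minimal sketch. This is a toy Python example, not the authors' Weka pipeline: the snippets and the word-overlap scoring are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Toy labeled corpus of hypothetical scam-narrative snippets.
# (Invented examples -- not drawn from the actual PNP dataset.)
training_data = [
    ("nagbayad ako para sa cellphone pero walang dumating", "buy and sell"),
    ("binigyan ako ng trabaho basta magbayad ng placement fee", "employment"),
    ("nanalo daw ako sa lotto kaya humingi sila ng processing fee", "lottery"),
]

# "Build the model": count how often each word appears per class.
word_counts = defaultdict(Counter)
for text, label in training_data:
    word_counts[label].update(text.lower().split())

def predict(text):
    """Predict the class of an unlabeled narrative by word overlap."""
    tokens = text.lower().split()
    scores = {label: sum(counts[t] for t in tokens)
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(predict("walang dumating na cellphone kahit nagbayad na ako"))
# -> buy and sell
```

A real classifier replaces the naive word-overlap score with a learned model (a decision tree in this study), but the build-then-predict structure is the same.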
mentioned above, in [1] different classification techniques, namely J48, Naïve Bayes, and SMO, were employed in the analysis of the online scam dataset, for which the J48 decision tree yielded the most favourable results.

Hence, in this present study, the focus shifts to decision tree algorithms. Reference [15] mentioned that a decision tree is a machine learning classifier shown as a tree, used for representing classifiers and regressions. A decision tree is presented with branches and nodes, with leaves representing the classes [16]. The terminal nodes show the final classification [17] or the goal decision variables. In this paper, the performance of three (3) commonly used decision tree algorithms, namely Random Tree, J48, and LMT, was evaluated using the online scam dataset.

A. Random Tree
Reference [18] discussed that this algorithm builds a model by randomly constructing a tree from a set of possible trees, maintaining K random features at each node [19][20]. "Random" here means that each tree in the available set has an equal chance of being sampled, i.e., the trees being generated and evaluated follow a uniform distribution. Random Trees are principally single model trees applying Random Forest ideas [21]. Further, the combination of large sets of random trees generally yields accurate models [18][21].

Reference [15] tested Random Tree, among other decision tree algorithms, to present an automatic classification of data that included features from a list of authentic and hijacked journals.

B. J48
Widely known as an implementation of the C4.5 algorithm [9], the J48 decision tree presents a model in a tree-like flow chart [22] using a recursive divide-and-conquer strategy [16][23]. The algorithm works by repeatedly selecting the best attribute for data splits and extending the nodes until the stopping criterion is met [22].

The study of [8] developed a crime prediction prototype model using the J48 decision tree, applied in the context of law enforcement to address the continuing demand for new and advanced approaches to improve crime analytics.

C. Logistic Model Tree
Logistic model trees are classification trees with logistic regression functions at the leaves [24]. The LMT classifier used to build such models is described as a combination of the C4.5 decision tree and logistic regression functions [25]. With the LMT classifier, information gain is used for splitting, while the LogitBoost algorithm of [25] fits the logistic regression functions at a given node. Compared to other decision trees, leaf nodes in LMTs are replaced by a regression plane rather than a constant value.

III. METHODOLOGY

Data mining comprises various techniques, including classification [26], clustering [27], regression, and others. This paper, however, mainly focuses on the classification technique, with emphasis on the use of decision trees. The authors utilized the pipe-lined system model presented in [1], as illustrated in Fig. 1.

The researchers conducted pre-processing activities on the dataset prior to classifying it using the various decision tree algorithms available in the Weka text mining tool. After pre-processing the data, the named entities were extracted and represented as online scam records. Extracting the named entities yields a vector space model, which was then used to conduct the classification phase on the online scam datasets.

Data classification is then performed on the cleaned unstructured data gathered from police records as well as from online scam victims who willingly narrated their scam experiences.

A. Dataset used
The researchers aimed for a significant improvement over the 14,098-word dataset used in [1], and achieved this by gathering a total of 54,059 words derived exclusively from written narratives of Filipino online scam victims and police incident reports. To reiterate, this is one of the main gaps that motivated the researchers in pursuing this study, as they wanted to investigate the performance of classification algorithms using a larger dataset.

As claimed in the previous study, online scams consistently topped the list of the most common cybercrimes reported to the Philippine National Police – Anti-Cybercrime Group (PNP-ACG). For this paper, the authors looked into reported or narrated online scam incidents under different categories or classes, namely buy and sell, banking, employment, imposter, investment, lottery, boiler room, online game, and online romance scams.

B. Pre-processing Phase
This phase involved two steps. First, the online scam data were manually pre-processed: with the help of PNP investigators, incident reports were categorized by type or class of online scam as enumerated in the preceding section; then, the researchers removed the names of victims, suspects, and reporting persons in order to comply with data privacy laws and the confidentiality agreement entered into between the respondents and the researchers. In this stage, some English words were also translated to Filipino.

Fig. 1. Pipe-lined System Model.
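J48 is described above as repeatedly selecting the best attribute for a data split. The selection step at a single node can be sketched with information gain, the entropy reduction that C4.5-family trees use to rank candidate attributes. This is an illustrative Python sketch with invented binary word-presence features, not the Weka implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    gain = entropy(labels)
    n = len(rows)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical word-presence features for a few scam reports (invented data).
rows = [
    {"mentions_payment": 1, "mentions_prize": 0},  # buy and sell
    {"mentions_payment": 1, "mentions_prize": 0},  # buy and sell
    {"mentions_payment": 0, "mentions_prize": 1},  # lottery
    {"mentions_payment": 0, "mentions_prize": 0},  # lottery
]
labels = ["buy and sell", "buy and sell", "lottery", "lottery"]

# A C4.5-style tree picks the attribute with the highest gain for the split,
# then recurses on each branch until a stopping criterion is met.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best, information_gain(rows, labels, best))  # -> mentions_payment 1.0
```

Here `mentions_payment` separates the two classes perfectly (gain 1.0), so it would be chosen as the split attribute at this node.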

TABLE I. STRINGTOWORDVECTOR PARAMETERS AND VALUES

Parameter | Set Value
IDFTransform | True
TFTransform | False
attributeIndices | First-last
attributeNamePrefix | ""
Debug | False
dictionaryFileToSaveTo | Not set
doNotCheckCapabilities | False
doNotOperateOnPerClassBasis | False
invertSelection | False
lowerCaseTokens | True
minTermFreq | 1
periodicPruning | -1
outputWordCounts | True
saveDictionaryBinaryForm | False
Tokenizer | Alphabetic Tokenizer
stemmer | NullStemmer
stopwordsHandler | WordsFromFile: Filipino.txt
normalizeDocLength | No normalization
wordsToKeep | 2,000

The second pre-processing step was done in Weka, where the StringToWordVector filter was employed. Since some words or attributes appear frequently in the online scam dataset without providing information about a text, the stopwords handler in Weka was used to determine whether a substring in the text is an empty word. A list of Filipino stopwords (designated as Filipino.txt, as shown in Table 1) was saved in a text file, and Weka's stopwords handler utilized this list during the pre-processing activity.

Further, the Alphabetic tokenizer was employed to handle tokenization. Tokenization involves breaking a stream of textual content up into words, symbols, or other meaningful terms or elements [28][29], branded as tokens, which are then considered during pre-processing.

Table 1 lists the complete parameters with their corresponding values provided to the StringToWordVector filter, patterned after [1].

C. Classification
In order to classify the Filipino online scam data, two processes were primarily involved: (1) building the classification model using training data and (2) using the model [17][30]. Hence, the authors ran the classification in Weka using the training set and cross-validation methods.

Using the training set means building a model for a particular classifier where the method is trained with all available data and the results are then applied to the same input data collection [31].

In order to validate the model built by a classifier, the cross-validation method is used. In this method, the dataset is divided into k folds, where k stands for the number of folds; while one fold is used for testing, the remaining folds are used for training [13].

It is important to note that in this paper a 10-fold cross-validation was utilized, wherein the data were divided into ten (10) groups. In each of the ten (10) rounds, nine (9) groups were used for training while the remaining group was used by the classifier to test and calculate its classification accuracy. This process is repeated 10 times, and the average of the testing accuracy obtained from all ten (10) rounds determines the overall classification accuracy of a classification algorithm [11][32].

The authors used the following metrics to evaluate the results and arrive at a relevant performance comparison of the decision tree classifiers: (a) the time for the classifier to build the model; (b) prediction accuracy; and (c) the errors generated by the classifiers, represented as the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE).

MAE yields the average of the errors [9] in the given online scam dataset: the sum of the absolute errors over all instances, divided by the number of instances in the test set with the actual online scam category or label. It generally indicates how far the model is from giving the right answer [5].

In contrast, [3] argued that RMSE must also be used to evaluate the performance of the decision tree algorithms together with MAE, as the former renders values in the same range as the predicted value itself, consequently making the interpretation of the results easier. RMSE is also said to be sensitive to outliers and to exaggerate their effect compared to MAE [33]. For its interpretation, a classifier yields a good performance when a low error is indicated in the results.

Furthermore, apart from the basic performance parameters mentioned in the preceding paragraphs, the following metrics are also considered for optimum evaluation of the decision tree classifiers' performance: True Positive (TP) rate and Recall, which show the correctly classified instances; False Positive (FP) rate, which reports instances incorrectly labelled as correct; Precision, also known as Positive Predictive Value (PPV), which measures the exactness of the relevant data retrieved; and the F-Measure, or F-score, which reflects the harmonic mean between precision and recall [3][9][5][29].

In evaluating the results, a high precision rate means that the model returns more relevant data than irrelevant data. High recall means the model has returned most of the relevant results; and a high value of exactness as presented in the F-Measure leads to more correctly recognized instances than improper ones [29].

IV. RESULTS AND DISCUSSIONS

After applying the three (3) decision tree algorithms to the online scam dataset, the following results were obtained. These results are combined in the tables presented below for performance comparison of the algorithms.

The results using the training set are presented in Tables 2 and 3, while Tables 4 and 5 present the results using the 10-fold cross-validation.
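The k-fold procedure described above can be sketched as a short skeleton: partition the indices into k folds, hold each fold out once as the test set, train on the rest, and average the per-fold accuracies. This is an illustrative Python sketch (the study used Weka's built-in cross-validation), with a trivial majority-class stand-in for the classifier and invented data.

```python
import random
from collections import Counter

def cross_validate(data, labels, train_fn, k=10, seed=42):
    """k-fold cross-validation: each fold serves once as the test set,
    the remaining k-1 folds as training; fold accuracies are averaged."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(data[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

def majority_trainer(X, y):
    """Stand-in 'classifier': always predicts the majority training class."""
    majority = Counter(y).most_common(1)[0][0]
    return lambda x: majority

# Invented dataset: 70 scam and 30 non-scam instances.
data = list(range(100))
labels = ["scam"] * 70 + ["non-scam"] * 30
print(round(cross_validate(data, labels, majority_trainer), 2))  # -> 0.7
```

The 0.7 average simply reflects the 70/30 class balance; with a real learner such as J48 or LMT, the averaged accuracy estimates generalization rather than class frequency.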

TABLE II. RESULTS USING THE TRAINING SET

Evaluation Metric | Random Tree | J48 | LMT
Time to Build the Model (in secs) | 0.1 | 0.15 | 197.59
Prediction Accuracy | 99.57% | 89.27% | 99.57%
Mean Absolute Error | 0.001 | 0.0358 | 0.0014
Root Mean Squared Error | 0.0218 | 0.1339 | 0.0219

TABLE III. SUMMARIZED RESULTS ON THE TRAINING SET

Algorithm | TP Rate | FP Rate | Precision | Recall | F-Measure
Random Tree | 0.996 | 0.005 | 0.996 | 0.996 | 0.996
J48 | 0.893 | 0.051 | 0.895 | 0.893 | 0.887
LMT | 0.996 | 0.005 | 0.996 | 0.996 | 0.966

TABLE IV. RESULTS USING 10-FOLD CROSS-VALIDATION

Evaluation Metric | Random Tree | J48 | LMT
Time to Build the Model (in secs) | 0.06 | 7.03 | 212.36
Prediction Accuracy | 46.35% | 65.24% | 72.10%
Mean Absolute Error | 0.1197 | 0.0859 | 0.059
Root Mean Squared Error | 0.3449 | 0.2672 | 0.2181

TABLE V. SUMMARIZED RESULTS ON CROSS-VALIDATION

Algorithm | TP Rate | FP Rate | Precision | Recall | F-Measure
Random Tree | 0.464 | 0.253 | 0.453 | 0.464 | 0.456
J48 | 0.652 | 0.202 | 0.657 | 0.652 | 0.641
LMT | 0.721 | 0.207 | 0.721 | 0.721 | 0.695

Using the training set to build the model, both Random Tree and LMT yield a 99.57% accuracy rate, with J48 having only 89.27% accuracy, as presented in Table 2. This could mean that both Random Tree and LMT were able to classify the instances into their actual online scam category or class more accurately than J48, although it took Random Tree only 0.1 second to build the model compared to 197.59 seconds for LMT.

Looking at the errors, Random Tree has an MAE of 0.001 and an RMSE of 0.0218, while LMT records an MAE of 0.0014 and an RMSE of 0.0219, which could mean that both classifiers gave almost the same average prediction error. Using these evaluation metrics, J48 came out last among the classifiers, reflecting 89.27% prediction accuracy and the highest error rates as well.

Moreover, the results reflected in Table 3, displaying the values of TP rate, FP rate, Precision, Recall, and F-Measure, further show that Random Tree and LMT are tied at 0.996 on these metrics, with a corresponding FP rate of 0.005. A higher TP rate and Recall together with a lower FP rate mean that the classifier is performing efficiently. Likewise, J48 performs the worst, recording a lower TP rate of 0.893 compared to the other classifiers. The results also reflect that J48 garnered a Precision of 0.895, a Recall of 0.893, and an F-Measure of 0.887. In summary, running the training set in Weka, both LMT and Random Tree outperform J48 on the abovementioned performance metrics.

Furthermore, it was the contention of [1] that k-fold cross-validation is highly recommended for estimating classification accuracy, as [13] suggests that this approach is fitting for testing the algorithms while steering clear of biased results. It was also observed that cross-validation provides some robustness to the classification [21].

Hence, using 10-fold cross-validation, the algorithm that gives better Prediction Accuracy, Recall, Precision, and F-Measure values was considered the most efficient. The cross-validation results are given much weight by the researchers in evaluating the performance of the classifiers, as they give a more accurate estimate of performance than the training set.

As per the results obtained using the 10-fold cross-validation presented in Tables 4 and 5, LMT outperforms the other algorithms. Using a larger dataset compared to the prior study, the results show that LMT works very well compared to J48: the former yields a prediction accuracy of 72.10%, which means it correctly classified 72.10% of the online scam data, as compared to J48's 65.24%. Random Tree, on the other hand, only records a prediction accuracy of 46.35%. These results also show that in precision LMT outperforms J48 and Random Tree, with a precision of 0.721; Random Tree has the worst precision (0.453) compared to J48 (0.657). Additionally, Recall shows the accuracy of the classification performed based on the total number of online scam instances, and the results in Table 5 show that the LMT classifier's Recall of 0.721 performs better than J48's 0.652. Random Tree still performs the worst, with a Recall of 0.464.

As to the error rates, using the two (2) classification error metrics, namely MAE and RMSE, the results show that on both, LMT had the lowest errors compared to both J48 and Random Tree. LMT records an MAE of only 0.059 as compared to Random Tree's 0.1197 and J48's 0.0859, and an RMSE of 0.2181 as compared to Random Tree's 0.3449 and J48's 0.2672.

One interesting observation, though, is that while LMT can be considered the most efficient in learning and classification based on the results, its runtime is the slowest among the classifiers. Whereas it took Random Tree only 0.06 seconds to build the model under cross-validation and J48 7.03 seconds, LMT took 212.36 seconds. Still, as already claimed in [1], where time is not the main metric for evaluating performance, LMT can be declared to have performed better than J48 and Random Tree using the online scam dataset.

Furthermore, a graph is also plotted to visually represent and compare the prediction accuracy of the different algorithms. Fig. 2 shows the performance comparison based on prediction accuracy, and it can be gleaned that LMT indeed outperforms the other two (2) classifiers, yielding the highest overall accuracy rate in 10-fold cross-validation.
Fig. 2. Comparison of Prediction Accuracy Results.

In addition, Fig. 3 shows that among the classifiers evaluated, it was LMT that yields the best performance, as it records low error rates both using the training set and 10-fold cross-validation.

Fig. 3. Comparison of Errors.

V. CONCLUSION AND RECOMMENDATIONS

This research was conducted as a continuation and improvement of the study of [1], which was argued to be an initial attempt to take advantage of data mining techniques using available cybercrime data, as the Philippines is seen to lack research applying data mining in the field of crime investigations. In the prior work, the authors used a relatively small quantity of online scam data consisting of 14,098 words. Different classifiers were compared, namely Naïve Bayes, J48, and SMO, based on their accuracy, precision, recall, error rates, learning time, and other metrics. Their results reveal that the J48 decision tree performs best, recording the highest accuracy rate and the least errors compared to Naïve Bayes and SMO. Upon presentation of the results to police investigators through a validation exercise, J48 was also preferred.

In this present study, however, the authors shift their focus to decision tree algorithms using a relatively larger dataset consisting of 54,059 mainly Filipino words. The dataset was tested through the training set and cross-validation using the Random Tree, J48, and LMT classifiers. Based on the prediction accuracy results as well as the classification errors, one may conclude that the LMT classifier was the most suitable algorithm for the dataset. LMT generates the best performance and is the most efficient in learning and classification as against J48 and Random Tree. One reason for this is that it correctly classifies 72.10% of the instances using the 10-fold cross-validation, as opposed to J48's 65.24% and Random Tree's 46.35%.

Another reason is that LMT also obtains the highest values in terms of TP rate, Precision, Recall, and F-Measure, while generating the least errors as well. Having the highest precision rate, it can be concluded that the model generated by the LMT classifier returns more relevant data than irrelevant data as against the other classifiers. Since LMT also obtains the highest recall, it can be interpreted that the model has returned most of the relevant results, while the high value of exactness presented in LMT's F-Measure leads to more correctly recognized instances than improper ones.

Therefore, it can be clearly concluded that the LMT classifier is the best performer in all areas of comparison; however, the authors note that it ranks last as to run time, as it records the longest time to build the model. Nevertheless, as stressed in [1], where building time is not the sole or main metric for evaluating performance, the LMT classifier can be concluded to have performed better than the other decision tree algorithms.

As future work, the authors recommend comparing the results obtained from these classifiers with other decision tree classifiers implemented in Weka and investigating what causes the differences in the performance of such algorithms. Weight allocation and subsequent ranking approaches may also be introduced or employed in order to rank the classifiers evaluated.

ACKNOWLEDGMENT

The authors appreciate all the support given by individuals during the entire course of the research project.

REFERENCES
[1] E. B. B. Palad, M. S. Tangkeko, L. A. K. Magpantay, and G. L. Sipin, "Document Classification of Filipino Online Scam Incident Text using Data Mining Techniques," in 19th International Symposium on Communications and Information Technologies, 2019, pp. 232–237.
[2] P. Rajesh and M. Karthikeyan, "A Comparative Study of Data Mining Algorithms for Decision Tree Approaches using WEKA Tool," Adv. Nat. Appl. Sci., vol. 11, no. 9, pp. 230–241, 2017.
[3] Z. E. Rasjid and R. Setiawan, "Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques," Procedia Comput. Sci., vol. 116, pp. 107–112, 2017.
[4] K. F. Bindhia, Y. Vijayalakshmi, P. Manimegalai, and S. S. Babu, "Classification using Decision Tree Approach towards Information Retrieval Keywords Techniques and a Data Mining Implementation using WEKA Data Set," Int. J. Pure Appl. Math., vol. 116, no. 22, pp. 19–29, 2017.
[5] S. Hussain, N. A. Dahan, F. M. Ba-Alwib, and N. Ribata, "Educational Data Mining and Analysis of Students' Academic

Authorized licensed use limited to: De La Salle University. Downloaded on October 10,2020 at 14:02:35 UTC from IEEE Xplore. Restrictions apply.
Performance Using WEKA," Indones. J. Electr. Eng. Comput. Sci., vol. 9, no. 2, pp. 447–459, 2018.
[6] C. Anuradha and T. Velmurugan, "A Comparative Analysis on the Evaluation of Classification Algorithms in the Prediction of Students Performance," Indian J. Sci. Technol., vol. 8, no. 15, p. 12, 2015.
[7] A. Arivarasan and M. Karthikeyan, "Classification based Performance analysis using Naïve-Bayes J48 and Random forest algorithms," Int. J. Appl. Res., vol. 3, no. 6, pp. 174–178, 2017.
[8] E. Ahishakiye, D. Taremwa, E. O. Omulo, and I. Niyonzima, "Crime Prediction using Decision Tree (J48) Classification Algorithm," Int. J. Comput. Inf. Technol., vol. 06, no. 03, pp. 2279–764, 2017.
[9] R. Panigrahi and S. Borah, "Rank Allocation to J48 Group of Decision Tree Classifiers using Binary and Multiclass Intrusion Detection Datasets," in International Conference on Computational Intelligence and Data Science, 2018, vol. 132, pp. 323–332.
[10] A. H. Aliwy and E. H. A. Ameer, "Comparative Study of Five Text Classification Algorithms with their Improvements," Int. J. Appl. Eng. Res., vol. 12, no. 14, pp. 973–4562, 2017.
[11] G. Obuandike, A. Isah, and J. Alhasan, "Analytical Study of Some Selected Classification Algorithms in WEKA Using Real Crime Data," Int. J. Adv. Res. Artif. Intell., vol. 4, no. 12, pp. 44–48, 2015.
[12] M. Bilal, H. Israr, M. Shahid, and A. Khan, "Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques," J. King Saud Univ. - Comput. Inf. Sci., vol. 28, no. 3, pp. 330–344, 2016.
[13] S. G. Cho and S. B. Kim, "A Data-driven Text Similarity Measure based on Classification Algorithms," Int. J. Ind. Eng., vol. 24, no. 3, pp. 328–339, 2017.
[14] R. Cretulescu, D. Morariu, and M. Breazu, "Using WEKA Framework in Document Classification," Int. J. Adv. Stat. IT&C Econ. Life Sci., vol. 6, no. 2, 2016.
[15] M. A. Shahri, M. D. Jazi, G. Borchardt, and M. Dadkhah, "Detecting Hijacked Journals by Using Classification Algorithms," Sci. Eng. Ethics, vol. 24, pp. 655–668, 2018.
[16] U. Bashir and M. Chachoo, "Performance Evaluation of J48 and Bayes Algorithms for Intrusion Detection System," Int. J. Netw. Secur. its Appl., vol. 9, no. 4, pp. 01–11, 2017.
[17] F. Alam and S. Pachauri, "Comparative Study of J48, Naive Bayes and One-R Classification Technique for Credit Card Fraud Detection using WEKA," Adv. Comput. Sci. Technol., vol. 10, no. 6, pp. 1731–1743, 2017.
[18] S. Kalmegh, "Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and Random Tree for Classification of Indian News," Int. J. Innov. Sci. Eng. Technol., vol. 2, no. 2, pp. 438–446, 2015.
[19] P. M. Vairavan and B. S. Sasidhar, "Classification Using Decision Tree Approach towards Information Retrieval Keywords Techniques and a Data Mining Implementation Using WEKA Data Set," Int. J. Pure Appl. Math., vol. 116, no. 22, pp. 19–29, 2017.
[20] P. Kaur and A. Khamparia, "Classification of liver based diseases using random tree," Int. J. Adv. Eng. Technol., vol. 8, no. 3, pp. 306–313, 2015.
[21] A. K. Mishra and B. K. Ratha, "Study of Random Tree and Random Forest Data Mining Algorithms for Microarray Data Analysis," Int. J. Adv. Electr. Comput. Eng., vol. 3, no. 4, pp. 5–7, 2016.
[22] N. N. Sakhare and S. Joshi, "Classification of Criminal Data Using J48-Decision Tree Algorithm," Int. J. Data Warehous. Min., vol. 4, no. 3, pp. 167–171, 2015.
[23] A. Pelt et al., "Classification of Biochemical and Biomechanical Data of Diabetic Rats Treated with Magnetic Field by PCA-Supported J48 Algorithm," Kafkas Univ. Vet. Fak. Derg., vol. 25, no. 6, pp. 741–747, 2019.
[24] F. M. Ali, E.-B. E. Fgee, and Z. S. Zubi, "Predicting Performance of Classification Algorithms," Int. J. Comput. Eng. Technol., vol. 6, no. 2, pp. 19–28, 2015.
[25] D. T. Bui, T. A. Tuan, H. Klempe, B. Praghan, and I. Revhaug, "Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree," Landslides, vol. 13, pp. 361–378, 2016.
[26] A. Naik and L. Samant, "Correlation Review of Classification Algorithm Using Data Mining Tool: WEKA, Rapidminer, Tanagra, Orange and Knime," Procedia Comput. Sci., vol. 85, pp. 662–668, 2016.
[27] R. Kiani, S. Mahdavi, and A. Keshavarzi, "Analysis and Prediction of Crimes by Clustering and Classification," Int. J. Adv. Res. Artif. Intell., vol. 4, no. 8, pp. 11–17, 2015.
[28] V. Gurusamy, S. Kannan, and N. K., "Performance Analysis: Stemming Algorithm for the English Language," Int. J. Sci. Res. Dev., vol. 5, no. 05, pp. 1933–1938, 2017.
[29] K. S. Digamberao and R. Prasad, "Author Identification using Sequential Minimal Optimization with rule-based Decision Tree on Indian Literature in Marathi," Procedia Comput. Sci., vol. 132, pp. 1086–1101, 2018.
[30] A. K. Pandey, "A comparative study of classification techniques by utilizing WEKA," in 2016 Int. Conf. Signal Process. Commun., 2016, pp. 219–224.
[31] G. D. K. Kishore and M. B. Reddy, "Comparative Analysis between Classification Algorithms and Data Sets (1:N & N:1) through WEKA," Open Access Int. J. Sci. Eng., vol. 2, no. 5, pp. 23–28, 2017.
[32] A. S. Suguitan and L. N. Dacaymat, "Vehicle Image Classification Using Data Mining Techniques," in 2nd International Conference on Computer Science and Software, 2019, pp. 13–17.
[33] G. Saltos and E. Haig, "An Exploration of Crime Prediction Using Data Mining on Open Data," Int. J. Inf. Technol. Decis. Mak., vol. 15, no. 9, 2017.

