International Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 11, November 2013, ISSN: 2277 128X
Research Paper
Available online at: www.ijarcsse.com
Performance Evaluation: A Comparative Study of Various Classifiers
Amresh Kumar, Yashwant S. Ingle
Department of Computer Science & Engineering
G.H. Raisoni College of Engineering, Nagpur, INDIA
Abstract— Real-world knowledge discovery processes typically consist of complex data pre-processing, machine learning, evaluation, and visualization steps. A data mining platform should therefore allow complex nested operator chains or trees, provide transparent data management and comfortable constraint handling and optimization, and be flexible, extendable, and easy to use. Modern machine learning techniques have encouraged interest in the development of systems that provide secure, reliable operation in many different fields and applications, and earlier studies have investigated a range of approaches for building such applications with modern machine learning techniques, and with classification algorithms in particular. The Weka machine learning workbench provides a general-purpose environment for automatic classification, clustering, feature selection, and common data mining problems such as those arising in bioinformatics research. In this paper we apply various classifiers, together with filters, to perform classification, analyse the data with the different classifiers, and then carry out feature selection; throughout these steps we observe and record the resulting performance changes, which are summarised in the graphs presented in this paper.
Keywords— Machine Learning, WEKA, Data mining, KDD, Classification, Filters, Feature Selection
I. INTRODUCTION
In earlier studies, a variety of approaches and methods have been investigated for developing applications with modern machine learning techniques, and with classification algorithms in particular. Modern machine learning techniques have encouraged interest in the development of systems that provide secure, reliable operation in many different fields and applications. The Weka machine learning workbench provides a general-purpose environment for automatic classification, clustering, and feature selection; it also contains an extensive set of data pre-processing methods and supports the experimental comparison of different machine learning techniques on the same problem. Data mining (the analysis step of "Knowledge Discovery in Databases") is a field at the intersection of computer science and statistics that attempts to discover patterns in large data sets, using methods drawn from artificial intelligence, machine learning, statistics, and database systems. The term Knowledge Discovery in Databases, or KDD for short, refers to the broader process of finding knowledge in data and emphasizes the "high-level" application of particular data mining methods; the unifying goal of the KDD process is to extract knowledge from data in the context of large databases. Feature selection identifies the attributes that are most important for the classification, so that accurate results can be obtained while improving performance.
In this paper we apply various classifiers, together with filters, to perform classification, analyse the data with the different classifiers, and then carry out feature selection; throughout these steps we observe and record the resulting performance changes, which are summarised in the graphs presented in this paper. To improve performance we experimented with an artificial-intelligence-based (AI) classifier, a rule-based learning method using statistical analysis, as well as decision-tree and Support Vector Machine (SVM) based classification schemes for analysing and inspecting the data. The efficiency and overall performance of the selected algorithms on the given data set (Temp.csv) are observed and calculated. The study has been conducted with six classifiers, namely SMO, REPTree, IBk, Logistic, Multilayer Perceptron, and DMNBText, on the Temp.csv dataset of 41 instances. The Waikato Environment for Knowledge Analysis (WEKA) learning tool has been used throughout the study.
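For readers who wish to reproduce the setup, the following minimal Java sketch shows one way a CSV dataset such as Temp.csv can be loaded through the WEKA API. The file path and the assumption that the class label is the last attribute are illustrative only; they are not specified in this paper.

```java
import java.io.File;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class LoadTempCsv {
    public static void main(String[] args) throws Exception {
        // Load the CSV file into WEKA's internal Instances representation.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("Temp.csv"));       // path is illustrative
        Instances data = loader.getDataSet();

        // Assumption: the class label is the last attribute of the dataset.
        data.setClassIndex(data.numAttributes() - 1);

        System.out.println("Relation: " + data.relationName());
        System.out.println("Attributes: " + data.numAttributes()
                + ", Instances: " + data.numInstances());
    }
}
```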
II. METHODOLOGY
Fig 1 Flow of the methodology involved.
Using the Weka tool we ran six different classification algorithms on our dataset and compared the classifiers on the basis of the weighted-average ROC Area value.
Fig 2 ROC (Weighted Average)
We also found that, of the six classifiers, four classify 100% of the instances correctly, while two, REPTree and DMNBText, misclassify some instances. The next step is therefore to find out which instances were not correctly classified. This requires a tuple-wise analysis, for which we consider only one classifier, namely the DMNBText classifier.
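This comparison can be reproduced programmatically. The sketch below is a minimal, hedged reconstruction rather than the original experimental code: it builds each of the six classifiers with default parameters on the training data and prints the weighted-average ROC area and accuracy using WEKA's Evaluation class. It assumes the class attribute is the last column of Temp.csv; note that DMNBtext ships with older WEKA releases (e.g. 3.6.x) and may require an add-on package in newer versions.

```java
import java.io.File;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class CompareClassifiers {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("Temp.csv"));          // path is illustrative
        Instances data = loader.getDataSet();
        data.setClassIndex(data.numAttributes() - 1);    // assumption: class is last column

        // The six classifiers compared in the paper (default parameters assumed).
        Classifier[] classifiers = {
            new weka.classifiers.functions.SMO(),
            new weka.classifiers.trees.REPTree(),
            new weka.classifiers.lazy.IBk(),
            new weka.classifiers.functions.Logistic(),
            new weka.classifiers.functions.MultilayerPerceptron(),
            new weka.classifiers.bayes.DMNBtext()        // available in WEKA 3.6.x
        };

        for (Classifier c : classifiers) {
            c.buildClassifier(data);                     // "use training set" test option
            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(c, data);
            System.out.printf("%-25s ROC (wt. avg) = %.3f, correct = %.2f%%%n",
                    c.getClass().getSimpleName(),
                    eval.weightedAreaUnderROC(), eval.pctCorrect());
        }
    }
}
```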
III. RESULTS
TABLE 1: CLASSIFIER COMPARISON CHART
Sl. No.   Classifier              ROC Area (Weighted Average)
1         SMO                     1
2         REPTree                 0.746
3         IBk                     1
4         Logistic                1
5         MultilayerPerceptron    1
6         DMNBText                0.997
TABLE 2: CONSOLIDATED CLASSIFIER SHEET USING THE TRAINING SET AS TEST OPTION
Data file description: Relation = temp, Attributes = 62, Instances = 41
Classifier 1 = SMO (AI/function-based classifier), Classifier 2 = REPTree (tree classifier), Classifier 3 = IBk, Classifier 4 = Logistic, Classifier 5 = MultilayerPerceptron (MLP), Classifier 6 = DMNBText

Summary                                     SMO       REPTree   IBk       Logistic  MLP       DMNBText
Correctly classified instances              41        28        41        41        41        35
Correctly classified instances (%)          100       68.2927   100       100       100       85.3659
Incorrectly classified instances            0         13        0         0         0         6
Incorrectly classified instances (%)        0         31.7073   0         0         0         14.6341
Kappa statistic                             1         0.4336    1         1         1         0.7681
Mean absolute error                         0.2222    0.2969    0.0303    0         0.0059    0.2135
Root mean squared error                     0.2722    0.3853    0.0321    0         0.0091    0.2783
Relative absolute error (%)                 55.9901   74.8118   7.635     0.0001    1.4937    53.7995
Root relative squared error (%)             61.3467   86.8492   7.2447    0.0001    2.0548    62.7209
Total number of instances                   41        41        41        41        41        41

Detailed accuracy by class (weighted avg.)
TP rate                                     1         0.683     1         1         1         0.854
FP rate                                     0         0.266     0         0         0         0.025
Precision                                   1         0.585     1         1         1         0.927
Recall                                      1         0.683     1         1         1         0.854
F-measure                                   1         0.627     1         1         1         0.871
ROC area                                    1         0.746     1         1         1         0.997

Confusion matrix (classes: a = O, b = L, c = N; row "x as y" = instances of class x classified as class y)
a as a                                      22        16        22        22        22        19
a as b                                      0         6         0         0         0         0
a as c                                      0         0         0         0         0         3
b as a                                      0         1         0         0         0         0
b as b                                      13        12        13        13        13        10
b as c                                      0         0         0         0         0         3
c as a                                      6         6         0         0         0         0
c as b                                      0         0         0         0         0         0
c as c                                      6         0         6         6         6         6
TABLE 3
ANALYSIS AND PERFORMANCE CHANGE FOR THE DMNBTEXT CLASSIFIER USING CROSS-VALIDATION AS TEST OPTION
Table 3 below lists the ROC value for the DMNBText classifier without and with the attribute evaluator. By calculating the change in ROC we can determine the performance change as follows:
Performance Change = (ROC_with_evaluator - ROC_without_evaluator) / ROC_without_evaluator
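For example, using the weighted-average ROC values reported for DMNBText in Table 3, the change is (0.909 - 0.897) / 0.897 ≈ 0.0134, i.e. an improvement of roughly 1.34%, which is the performance change listed at the bottom of the table.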
Classifier: DMNBText (cross-validation as test option)
                                            Without attribute selection   With attribute selection
Attribute evaluator / search method         - (all attributes used)       ChiSquaredAttributeEval / Ranker (as named in Table 6)
List of selected attributes                 All                           RL-54, L-18, L-14, L-20, L-24, L-16, L-15, H-35, RL-60, RL-62, P-49, H-32, L-17

Summary
Correctly classified instances              33                            31
Correctly classified instances (%)          80.487                        75.6098
Incorrectly classified instances            8                             10
Incorrectly classified instances (%)        19.512                        24.3902
Kappa statistic                             0.6439                        0.5514
Mean absolute error                         0.2455                        0.2827
Root mean squared error                     0.3339                        0.3463
Relative absolute error (%)                 61.4542                       70.7685
Root relative squared error (%)             74.8162                       77.6059
Total number of instances                   41                            41

Detailed accuracy by class (weighted avg.)
TP rate                                     0.805                         0.756
FP rate                                     0.175                         0.215
Precision                                   0.829                         0.645
Recall                                      0.805                         0.756
F-measure                                   0.77                          0.696
ROC area                                    0.897                         0.909

Confusion matrix (classes: a = O, b = L, c = N)
a as a                                      21                            20
a as b                                      1                             2
a as c                                      0                             0
b as a                                      2                             2
b as b                                      11                            11
b as c                                      0                             0
c as a                                      3                             4
c as b                                      2                             2
c as c                                      1                             0

Performance change (%)                      -                             1.34
Fig 3 Frequency
Fig 4 Attribute Frequency Pattern
TABLE 4: MULTILAYER PERCEPTRON (USING TRAINING SET)
Classifier: MultilayerPerceptron (training set as test option)

Summary                                     With all attributes   With only the 35 selected attributes
Correctly classified instances              41                    41
Correctly classified instances (%)          100                   100
Incorrectly classified instances            0                     0
Incorrectly classified instances (%)        0                     0
Kappa statistic                             1                     1
Mean absolute error                         0.0059                0.0075
Root mean squared error                     0.0091                0.0119
Relative absolute error (%)                 1.4937                1.8879
Root relative squared error (%)             2.0548                2.6893
Total number of instances                   41                    41

Detailed accuracy by class (weighted avg.)
TP rate                                     1                     1
FP rate                                     0                     0
Precision                                   1                     1
Recall                                      1                     1
F-measure                                   1                     1
ROC area                                    1                     1

Confusion matrix (classes: a = O, b = L, c = N)
a as a                                      22                    22
a as b                                      0                     0
a as c                                      0                     0
b as a                                      0                     0
b as b                                      13                    13
b as c                                      0                     0
c as a                                      0                     0
c as b                                      0                     0
c as c                                      6                     6

Performance change                          0                     0
TABLE 5: MULTILAYER PERCEPTRON (USING CROSS-VALIDATION)
Classifier: MultilayerPerceptron (cross-validation as test option)

Summary                                     With all attributes   With only the 35 selected attributes
Correctly classified instances              30                    32
Correctly classified instances (%)          73.1707               78.0488
Incorrectly classified instances            11                    9
Incorrectly classified instances (%)        26.8293               21.9512
Kappa statistic                             0.5847                0.6506
Mean absolute error                         0.1866                0.1636
Root mean squared error                     0.3666                0.3474
Relative absolute error (%)                 46.7145               40.9598
Root relative squared error (%)             82.148                77.8538
Total number of instances                   41                    41

Detailed accuracy by class (weighted avg.)
TP rate                                     0.732                 0.78
FP rate                                     0.101                 0.093
Precision                                   0.818                 0.828
Recall                                      0.732                 0.78
F-measure                                   0.754                 0.795
ROC area                                    0.908                 0.911

Confusion matrix (classes: a = O, b = L, c = N)
a as a                                      15                    16
a as b                                      1                     1
a as c                                      6                     5
b as a                                      1                     0
b as b                                      10                    12
b as c                                      2                     1
c as a                                      1                     2
c as b                                      0                     0
c as c                                      5                     4

Performance change (%)                      0                     0.330396476
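The cross-validated figures in Table 5 can be obtained with WEKA's Evaluation.crossValidateModel. The sketch below is a minimal illustration under stated assumptions (10 folds, a fixed random seed, default MultilayerPerceptron parameters, and the class attribute in the last column), none of which are specified in the paper.

```java
import java.io.File;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class CrossValidateMlp {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("Temp.csv"));       // path is illustrative
        Instances data = loader.getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assumption: class is last column

        MultilayerPerceptron mlp = new MultilayerPerceptron();

        // Cross-validation as the test option; 10 folds and seed 1 are assumptions.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));

        System.out.println(eval.toSummaryString("\n=== Cross-validation summary ===\n", false));
        System.out.println(eval.toMatrixString("=== Confusion matrix ==="));
        System.out.printf("ROC area (weighted avg): %.3f%n", eval.weightedAreaUnderROC());
    }
}
```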
IV. DISCUSSION
For the experimental analysis, the SMO, REPTree, IBk, Logistic, Multilayer Perceptron, and DMNBText classifiers were considered in this study. The 41 instances in the data set were selected from the collected data. The experiments were run on the Temp.csv dataset, and the observations and calculations were made by considering the following:
 Attribute selection
 Frequency
 ROC
 Confusion matrices
TABLE 6
COMPARISON, WITHOUT AND WITH FEATURE SELECTION, FOR THE DMNBTEXT CLASSIFIER (SEE FIG 5)
Classifier: DMNBText; Attribute evaluator: ChiSquaredAttributeEval
Type of search method                       ROC Area (Wt. Avg.)
Without feature selection                   0.897
Ranker (with feature selection)             0.909
Fig 5 Performance index for DMNBText
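Attribute selection with a chi-squared evaluator and the Ranker search method, as reported above, can also be run programmatically. The following sketch is illustrative only: it ranks all attributes rather than reproducing the exact attribute subsets reported in the tables, and in recent WEKA releases ChiSquaredAttributeEval may need to be installed as a separate package.

```java
import java.io.File;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.ChiSquaredAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class RankAttributes {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("Temp.csv"));       // path is illustrative
        Instances data = loader.getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assumption: class is last column

        // Chi-squared attribute evaluator with the Ranker search method,
        // mirroring the evaluator/search combination reported in Table 6.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new ChiSquaredAttributeEval());
        selector.setSearch(new Ranker());             // ranks all attributes by merit
        selector.SelectAttributes(data);

        System.out.println(selector.toResultsString());

        // Indices of the ranked attributes (0-based; the class index is appended last).
        int[] selected = selector.selectedAttributes();
        System.out.println("Number of ranked attributes: " + (selected.length - 1));
    }
}
```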
V. CONCLUSION
In conclusion, classification algorithms play a key role in solving real-world problems, and the selection of an application-specific classifier is an emerging research area. In this paper the performance change has been evaluated and calculated for several popular classifiers. First, the percentage of correctly classified instances was measured; then the ranking performance was estimated in order to select a suitable algorithm for this application. The ranking performance shows that DMNBText performs best for the given dataset. Selecting a suitable classifier in this way also reduces computational complexity and development and maintenance costs, both in terms of hardware and of human inspection. Based on the results obtained with the various algorithms, we conclude that feature selection plays an important role and can be a useful component of many classification tasks. This is possible because of the low computational cost of the method, which makes it more efficient than the other approaches considered; its main advantage is that it makes no assumptions, and in most cases it not only improved classification speed significantly but also improved accuracy and reliability. Thus, using data mining techniques, we examine and calculate performance on the basis of ROC values.
REFERENCES
[1] Ioannis Charalampopoulos, Ioannis Anagnostopoulos, "A Comparable Study Employing WEKA Clustering/Classification Algorithms for Web Page Classification", IEEE, 2011, pp. 235-239.
[2] G. M. Shafiullah, A. B. M. Shawkat Ali, Adam Thompson, Peter J. Wolfs, "Rule-Based Classification Approach for Railway Wagon Health Monitoring", IEEE, 2010, pp. 1-7.
[3] Weka. Available: http://www.cs.waikato.ac.nz
[4] Data_mining. Available: http://en.wikipedia.org
[5] aimag-kdd-overview-1996-Fayyad.pdf. Available: http://www.kdnuggets.com
[6] IJCSE10-01-04-51.pdf. Available: http://www.ijcse.com