The buttons can be used to start the following applications:
Explorer - An environment for exploring data with WEKA.
a) Click the "Explorer" button to bring up the Explorer window.
b) Make sure the "Preprocess" tab is highlighted.
c) Open a new file by clicking "Open file..." and choosing a file with the
".arff" extension from the "Data" directory.
d) The attributes appear in the window below.
e) Click on an attribute to see its visualization on the right.
f) Click "Visualize All" to see them all.
Experimenter - An environment for performing experiments and conducting
statistical tests between learning schemes.
a) The Experimenter is for comparing results.
b) Under the "Setup" tab click "New".
c) Click "Add New" under the "Data" frame. Choose a couple of ARFF format
files from the "Data" directory, one at a time.
d) Click "Add New" under the "Algorithm" frame. Choose several algorithms,
one at a time, by clicking "OK" in the window and then "Add New" again.
e) Under the "Run" tab click "Start".
f) Wait for WEKA to finish.
g) Under the "Analyse" tab click "Experiment" to see the results.
Knowledge Flow - This environment supports essentially the same functions as the
Explorer but with a drag-and-drop interface. One advantage is that it supports
incremental learning.
Simple CLI - Provides a simple command-line interface that allows direct execution of
WEKA commands for operating systems that do not provide their own command line
interface.
(iii) Navigate the options available in WEKA (e.g. the Select
attributes panel, Preprocess panel, Classify panel, Cluster panel, Associate
panel and Visualize panel)
When the Explorer is first started only the first tab is active; the others are greyed
out. This is because it is necessary to open (and potentially pre-process) a data set
before starting to explore the data.
The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.
Once the tabs are active, clicking on them flicks between different screens, on which
the respective actions can be performed. The bottom area of the window (including
the status box, the log button, and the Weka bird) stays visible regardless of which
section you are in.
1. Preprocessing
Loading Data:
The first four buttons at the top of the preprocess section enable you to load data into
WEKA:
1. Open file.... Brings up a dialog box allowing you to browse for the datafile on the
local file system.
2. Open URL ..... Asks for a Uniform Resource Locator address for where the data is
stored.
3. Open DB.... Reads data from a database. (Note that to make this work you might
have to edit the file in weka/experiment/DatabaseUtils.props.)
4. Generate.... Enables you to generate artificial data from a variety of Data
Generators.
Using the Open file... button you can read files in a variety of formats:
WEKA's ARFF format, CSV format, C4.5 format, or serialized Instances format. ARFF
files typically have a .arff extension, CSV files a .csv extension, C4.5 files a .data and
.names extension, and serialized Instances objects a .bsi extension.
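To make the ARFF format more concrete, the sketch below reads a small ARFF text in plain Python. It is only an illustration, not WEKA's own loader: it assumes a dense file with a header of @relation and @attribute lines followed by a @data section, and it ignores quoting, sparse data and missing values. The function name parse_arff and the toy weather data are made up for this example.

```python
def parse_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        if in_data:
            # every line after @data is one comma-separated instance
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip()
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind.strip()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# Toy data invented for illustration only.
SAMPLE = """% toy weather data
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

The attribute list corresponds to what the Preprocess panel shows after a file is opened, and each row is one instance.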
2. Classification:
Selecting a Classifier
At the top of the classify section is the Classifier box. This box has a text field that
gives the name of the currently selected classifier, and its options. Clicking on the
text box with the left mouse button brings up a Generic Object Editor dialog box,
just the same as for filters, that you can use to configure the options of the current
classifier. With a right click (or Alt+Shift+left click) you can once again copy the
setup string to the clipboard or display the properties in a Generic Object Editor
dialog box. The Choose button allows you to choose on4eof the classifiers that are
available in WEKA
Test Options:
The result of applying the chosen classifier will be tested according to the options
that are set by clicking in the Test options box. There are four test modes:
1. Use training set: The classifier is evaluated on how well it predicts the class of
the instances it was trained on.
2. Supplied test set: The classifier is evaluated on how well it predicts the class of
a set of instances loaded from a file. Clicking the Set... button brings up a dialog
allowing you to choose the file to test on.
3. Cross-validation: The classifier is evaluated by cross-validation, using the
number of folds that are entered in the Folds text field.
4. Percentage split: The classifier is evaluated on how well it predicts a certain
percentage of the data which is held out for testing. The amount of data held out
depends on the value entered in the % field.
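The difference between the last two test modes can be sketched with plain index arithmetic. This is a simplified illustration, not WEKA's implementation: it assumes the value in the % field is the share of data used for training (the rest is held out), and it does no shuffling or stratification. The function names are made up for this example.

```python
def percentage_split(n_instances, train_percent):
    """Return (train_indices, test_indices) for a percentage split."""
    n_train = round(n_instances * train_percent / 100)
    indices = list(range(n_instances))
    return indices[:n_train], indices[n_train:]


def k_fold_indices(n_instances, folds):
    """Yield (train_indices, test_indices) once per fold.

    Every instance lands in the test set of exactly one fold.
    """
    indices = list(range(n_instances))
    for k in range(folds):
        test = indices[k::folds]            # every folds-th instance, offset k
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


train, test = percentage_split(150, 66)
print(len(train), len(test))

for train_idx, test_idx in k_fold_indices(10, 5):
    print(test_idx)
```

With cross-validation every instance is used for testing once, whereas a percentage split tests only on the held-out part.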
3. Clustering:
Cluster Modes:
The Cluster mode box is used to choose what to cluster and how to evaluate the
results. The first three options are the same as for classification: Use training set,
Supplied test set and Percentage split.
4. Associating:
Setting Up
This panel contains schemes for learning association rules, and the learners are
chosen and configured in the same way as the clusterers, filters, and classifiers in the
other panels.
5. Selecting Attributes:
Searching and Evaluating
Attribute selection involves searching through all possible combinations of attributes in
the data to find which subset of attributes works best for prediction. To do this, two
objects must be set up: an attribute evaluator and a search method. The evaluator
determines what method is used to assign a worth to each subset of attributes. The
search method determines what style of search is performed.
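The split between the evaluator and the search method can be sketched as two separate pieces of code. The example below is only an illustration under loud assumptions: the search is a brute-force pass over every non-empty subset, the evaluator is a toy scoring function, and the relevance numbers are invented for the example rather than measured on any dataset.

```python
from itertools import combinations


def exhaustive_search(attributes, evaluator):
    """Search method: try every non-empty subset and keep the best-scoring one."""
    best_subset, best_score = (), float("-inf")
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            score = evaluator(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Invented relevance values, for illustration only.
RELEVANCE = {
    "petallength": 0.9,
    "petalwidth": 0.85,
    "sepallength": 0.4,
    "sepalwidth": 0.1,
}


def toy_evaluator(subset):
    """Evaluator: assigns a worth to a subset (total relevance minus a size penalty)."""
    return sum(RELEVANCE[name] for name in subset) - 0.3 * len(subset)


best, score = exhaustive_search(list(RELEVANCE), toy_evaluator)
print(best)
```

Swapping in a different evaluator or a cheaper search (for example a greedy one) changes only one of the two pieces, which is why the panel asks for them separately.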
6. Visualizing:
WEKA's visualization section allows you to visualize 2D plots of the current relation.
Conclusion:-
Hence we have successfully built a data warehouse and explored WEKA.
Association rule mining:
Association rule mining is a machine-learning method used to discover interesting
relations between variables in large databases. It looks for patterns in the data and can be
performed using various methods, such as the Apriori algorithm. Its aim is to identify
strong rules using measures of interestingness such as support and confidence.
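Two common measures of interestingness are support and confidence. The sketch below computes both for a handful of transactions; the basket data and function names are made up for illustration, and the code assumes the antecedent occurs in at least one transaction.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Invented market-basket data, for illustration only.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support(TRANSACTIONS, {"bread", "milk"}))        # 2 of 4 transactions
print(confidence(TRANSACTIONS, {"bread"}, {"milk"}))   # 2 of the 3 bread transactions
```

A rule such as bread => milk is called strong when both values exceed thresholds chosen by the analyst.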
Procedure:-
To pre-process the data, first select the dataset (IRIS.arff).
Select the Filter option, apply the Resample filter, and see the results below.
Select another filter option, apply the Discretize filter, and see the results below.
Likewise, we can apply different filters to pre-process the data and see the
results in different dimensions.
Conclusion:- Hence we have performed data pre-processing tasks and
demonstrated association rule mining on data sets.
Procedure:-
1. Load the dataset (Iris-2D.arff) into the WEKA tool.
2. Go to the Classify option; in the left-hand navigation bar the different
classification algorithms are listed under the bayes section.
3. Select the Naive Bayes algorithm and click Start with the "Use
training set" test option enabled.
4. The output shows the detailed accuracy by class, consisting of the TP rate, FP rate,
Precision, Recall and F-measure values, and the confusion matrix, as represented below.
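The per-class figures in that output can all be derived from the confusion matrix. The sketch below shows the arithmetic, assuming rows are actual classes and columns are predicted classes; the matrix values and the function name are invented for illustration and are not the result of any real run.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                   # actual class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp    # predicted class i, actually otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0   # recall is the same as the TP rate
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        results[label] = {
            "tp_rate": recall,
            "fp_rate": fp_rate,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return results


# Invented confusion matrix for three classes, for illustration only.
LABELS = ["a", "b", "c"]
MATRIX = [
    [50, 0, 0],
    [0, 45, 5],
    [0, 5, 45],
]

for label, values in per_class_metrics(MATRIX, LABELS).items():
    print(label, values)
```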
Conclusion:- Hence we have successfully demonstrated the classification rule
process on WEKA data-set using Naive Bayes algorithm.
Slice:
The slice operation selects one particular dimension from a given cube and
provides a new sub-cube.
Dice:
Dice selects two or more dimensions from a given cube and provides a new
sub-cube.
Pivot (rotate):
The pivot operation is also known as rotation. It rotates the data axes in view
in order to provide an alternative presentation of data.
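These three operations can be sketched on a tiny in-memory cube before trying them in a tool. The example below is an illustration only: the cube is a list of records with three dimensions and one measure, and all names and figures are invented.

```python
from collections import defaultdict

# Invented fact records: three dimensions (category, year, region), one measure (sales).
SALES = [
    {"category": "Rock", "year": 2009, "region": "East", "sales": 100},
    {"category": "Rock", "year": 2010, "region": "West", "sales": 150},
    {"category": "Jazz", "year": 2009, "region": "East", "sales": 80},
    {"category": "Jazz", "year": 2010, "region": "East", "sales": 120},
]


def slice_cube(records, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in records if r[dimension] == value]


def dice_cube(records, criteria):
    """Dice: restrict two or more dimensions to sets of allowed values."""
    return [r for r in records
            if all(r[dim] in allowed for dim, allowed in criteria.items())]


def pivot(records, row_dim, col_dim, measure):
    """Pivot: total the measure into a row_dim x col_dim table."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r[row_dim]][r[col_dim]] += r[measure]
    return {row: dict(cols) for row, cols in table.items()}


print(slice_cube(SALES, "year", 2009))
print(dice_cube(SALES, {"category": {"Rock"}, "year": {2010}}))
print(pivot(SALES, "category", "year", "sales"))
print(pivot(SALES, "year", "category", "sales"))   # same data, axes rotated
```

Calling pivot with the row and column dimensions swapped gives the rotated view of the same totals.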
Now, we are practically implementing all these OLAP Operations using Microsoft
Excel.
Procedure for OLAP Operations:
1. Open Microsoft Excel, go to the Data tab at the top and click "Existing Connections".
2. The Existing Connections window opens; click the "Browse for More" option
to import a file with the .cub extension for performing OLAP
operations. As a sample, the music.cub file is used here.
As shown in the above window, select "PivotTable Report" and click "OK".
This loads all the music.cub data for analysing the different OLAP operations. First, we
performed the drill-down operation as shown below.
In the above window, we selected the year '2008' in the 'Electronic' category;
the Drill-Down option is then automatically enabled in the top navigation options. Clicking
the 'Drill-Down' option displays the window below.
Next we perform the roll-up (drill-up) operation. In the above window the
January month is selected, and the Drill-Up option is then automatically enabled at the top.
Clicking the Drill-Up option displays the window below.
The next OLAP operation, slicing, is performed by inserting a slicer as shown in the top
navigation options.
While inserting slicers for the slicing operation, we select only 2 dimensions (e.g.
Category Name and Year) with one measure (e.g. Sum of Sales). After inserting
a slicer and adding a filter (Category Name: AVANT ROCK and BIG BAND; Year: 2009 and
2010), we get the table shown below.
The dicing operation is similar to the slicing operation. Here we select 3 dimensions
(Category Name, Year, Region Code) and 2 measures (Sum of Quantity, Sum of Sales)
through the 'Insert Slicer' option, and then add a filter for Category Name, Year and
Region Code as shown below.
Finally, the pivot (rotate) OLAP operation is performed by swapping the rows (Order
Date-Year) and columns (Values: Sum of Quantity and Sum of Sales) through the navigation
bar at the bottom right, as shown below.
After swapping (rotating), we get the result represented below,
with a pie chart for the Classical category and year-wise data.
Conclusion:- Hence we have successfully implemented OLAP operations.
Procedure:
1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify option; in the left-hand navigation bar the different
classification algorithms are listed under the functions section.
3. Select the Linear Regression algorithm and click Start with the Use
training set option.
4. The regression model and its result are shown below.
5. The errors of the regression model can be viewed through the Visualize
classifier errors option, which is available in the right-click options, as shown below.
B. Use options cross-validation and percentage split and repeat running
the Linear Regression Model. Observe the results and derive meaningful
results.
Procedure for cross-validation:
1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify option; in the left-hand navigation bar the different
classification algorithms are listed under the functions section.
3. Select the Linear Regression algorithm and click Start with the
Cross-validation option set to 10 folds.
4. The regression model and its result are shown below.
Procedure for percentage split:
1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify option; in the left-hand navigation bar the different
classification algorithms are listed under the functions section.
3. Select the Linear Regression algorithm and click Start with the
Percentage split option set to a 66% split.
4. The regression model and its result are shown below.
C. Explore Simple linear regression technique that only
looks at one variable
Procedure:
1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify option; in the left-hand navigation bar the different
classification algorithms are listed under the functions section.
3. Select the Simple Linear Regression algorithm and click Start
with the Use training set option and one variable (MYCT).
4. The regression model and its result are shown below.
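For a single predictor, the fitted line can be reproduced by hand with ordinary least squares. The sketch below shows that calculation; it is an illustration rather than WEKA's implementation, it assumes the one predictor has already been chosen, and the data points are invented (they are not taken from Cpu.arff).

```python
def simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance and variance terms (the common 1/n factor cancels out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented points lying exactly on y = 2x + 1, for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = simple_linear_regression(xs, ys)
print(slope, intercept)
```

The multi-attribute Linear Regression scheme used in the earlier procedures fits one coefficient per attribute instead of a single slope.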
Conclusion:- Hence we have successfully performed Regression on data sets
Procedure:-
1. Load the dataset (Iris.arff) into the WEKA tool.
2. Go to the Cluster option; in the left-hand navigation bar the different clustering
algorithms are listed under the clusterers section.
3. Select the Simple K-Means algorithm and click Start with the
"Use training set" option enabled.
4. The output shows the sum of squared errors, the centroids, the number of iterations and
the clustered instances, as represented below.
5. Right-click on the Simple K-Means result for more options, and select
"Visualize cluster assignments" to get the cluster
visualization, as shown below.
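The quantities reported in step 4 (centroids, number of iterations, sum of squared errors and cluster sizes) come out of the basic k-means loop, sketched below. This is a plain textbook version under stated assumptions, not WEKA's Simple K-Means: the starting centroids are passed in by hand, no attribute normalisation is done, and the points are invented.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centroids, max_iterations=100):
    """Return (centroids, clusters, sum_of_squared_errors, iterations)."""
    centroids = [tuple(c) for c in initial_centroids]
    iterations = 0
    for _ in range(max_iterations):
        iterations += 1
        clusters = assign(points, centroids)
        # new centroid = mean of the cluster; an empty cluster keeps its old centroid
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:     # nothing moved, so the clustering is stable
            break
        centroids = updated
    clusters = assign(points, centroids)
    sse = sum(squared_distance(p, centroids[i])
              for i, cluster in enumerate(clusters) for p in cluster)
    return centroids, clusters, sse, iterations


# Invented 2D points forming two obvious groups, for illustration only.
POINTS = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters, sse, iterations = k_means(POINTS, [(1.0, 1.0), (9.0, 8.0)])
print(centroids, [len(c) for c in clusters], sse, iterations)
```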
Conclusion:- Hence we have demonstrated the clustering rule process on
data-set iris.arff using simple k-means.
Procedure:-
1. Load the dataset (Breast-Cancer.arff) into the WEKA tool.
2. Go to the Associate option; in the left-hand navigation bar the different
association algorithms are listed.
3. Select the Apriori algorithm and click Start.
4. The rules generated for the selected dataset, with their support and confidence
values, are shown below.
Conclusion:- Hence we have implemented the Apriori algorithm.