DMW Lab Print

The document provides a comprehensive guide on using WEKA for data exploration, experimentation, and analysis, detailing the functionalities of various applications such as Explorer, Experimenter, and Knowledge Flow. It outlines procedures for data preprocessing, classification, clustering, association rule mining, and OLAP operations using Microsoft Excel. The document also includes step-by-step instructions for implementing different algorithms and visualizing results using datasets like Iris and Breast Cancer.

Uploaded by

Professional Morin


The buttons in the WEKA GUI Chooser can be used to start the following applications:

Explorer - An environment for exploring data with WEKA.

a) Click the "Explorer" button to bring up the Explorer window.
b) Make sure the "Preprocess" tab is highlighted.
c) Open a file by clicking "Open file..." and choosing a file with the ".arff" extension from the "data" directory.
d) The attributes of the loaded data set appear in the Attributes pane below.
e) Click an attribute to see its visualization on the right.
f) Click "Visualize All" to see all of them.

Experimenter - An environment for performing experiments and conducting statistical tests between learning schemes.

a) The Experimenter is used for comparing results.
b) Under the "Setup" tab, click "New".
c) Click "Add new..." in the "Datasets" panel and choose a couple of ARFF files from the "data" directory, one at a time.
d) Click "Add new..." in the "Algorithms" panel and choose several algorithms, one at a time. Click "OK" in the dialog for each one, then click "Add new..." again for the next.
e) Under the "Run" tab, click "Start".
f) Wait for WEKA to finish.
g) Under the "Analyse" tab, click "Experiment" to load the results, then click "Perform test" to see them.
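The Experimenter compares two schemes on the same train/test splits and tests whether their scores differ significantly. Our understanding is that WEKA's default tester is a paired t-test corrected for the overlap between training sets. The sketch below computes that corrected statistic from per-run score differences. The function name and the sample scores are hypothetical. The p-value lookup is omitted because the Python standard library has no t-distribution.

```python
import math
from statistics import mean, variance


def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled paired t statistic for two learning schemes.

    scores_a, scores_b: per-run scores (e.g. accuracy) of schemes A and B,
    measured on the same train/test splits in the same order.
    test_train_ratio: n_test / n_train of each split (1/9 for 10-fold CV).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    d_mean = mean(diffs)
    d_var = variance(diffs)  # sample variance, divides by k - 1
    if d_var == 0:
        return 0.0 if d_mean == 0 else math.copysign(math.inf, d_mean)
    # The test/train term widens the variance estimate because training sets
    # of different runs overlap, so the runs are not independent samples.
    return d_mean / math.sqrt((1 / k + test_train_ratio) * d_var)


# Hypothetical accuracies of two schemes over four shared splits.
t = corrected_paired_t([0.90, 0.80, 0.85, 0.95], [0.85, 0.80, 0.80, 0.90], 1 / 9)
print(round(t, 3))
```

Without the test/train term, the same differences would give a larger t value. The correction makes the test less likely to report a spurious difference.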

Knowledge Flow - This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.

Simple CLI - Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface.

(iii) Navigate the options available in WEKA (e.g. the Preprocess, Classify, Cluster, Associate, Select attributes and Visualize panels)

When the Explorer is first started only the first tab is active; the others are greyed
out. This is because it is necessary to open (and potentially pre-process) a data set
before starting to explore the data.
The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.
Once the tabs are active, clicking on them flicks between different screens, on which
the respective actions can be performed. The bottom area of the window (including
the status box, the log button, and the Weka bird) stays visible regardless of which
section you are in.

1. Preprocessing

Loading Data:

The first four buttons at the top of the preprocess section enable you to load data into
WEKA:
1. Open file... Brings up a dialog box allowing you to browse for the data file on the local file system.

2. Open URL... Asks for a Uniform Resource Locator address for where the data is stored.

3. Open DB... Reads data from a database. (Note that to make this work you might have to edit the file in weka/experiment/DatabaseUtils.props.)

4. Generate... Enables you to generate artificial data from a variety of Data Generators.

Using the Open file... button you can read files in a variety of formats: WEKA's ARFF format, CSV format, C4.5 format, or serialized Instances format. ARFF files typically have a .arff extension, CSV files a .csv extension, C4.5 files a .data and .names extension, and serialized Instances objects a .bsi extension.
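ARFF is a plain-text format. An `@relation` line comes first, then one `@attribute` line per column, then the instances after `@data`. Lines starting with `%` are comments and `?` marks a missing value. The sketch below reads only a simplified subset: dense rows, numeric and nominal attributes, and no quoted names or string/date types. The function name is hypothetical and this is not WEKA's own loader.

```python
def parse_arff(text):
    """Parse a small, dense ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # nominal attribute: the allowed values are listed in braces
                values = [v.strip() for v in kind.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, kind.lower()))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)  # missing value
                elif kind in ("numeric", "real", "integer"):
                    row.append(float(cell))
                else:
                    row.append(cell)
            rows.append(row)
    return relation, attributes, rows


# A hypothetical two-row excerpt in the style of iris.arff.
SAMPLE = """% tiny sample
@RELATION iris
@ATTRIBUTE petallength REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor}
@DATA
1.4,Iris-setosa
?,Iris-versicolor
"""
relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, rows)
```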

2. Classification:

Selecting a Classifier

At the top of the classify section is the Classifier box. This box has a text field that gives the name of the currently selected classifier and its options. Clicking on the text box with the left mouse button brings up a Generic Object Editor dialog box, just the same as for filters, that you can use to configure the options of the current classifier. With a right click (or Alt+Shift+left click) you can once again copy the setup string to the clipboard or display the properties in a Generic Object Editor dialog box. The Choose button allows you to choose one of the classifiers that are available in WEKA.

Test Options:

The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box. There are four test modes:

1. Use training set: The classifier is evaluated on how well it predicts the class of the instances it was trained on.

2. Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.

3. Cross-validation: The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.

4. Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the data, which is held out for testing. The amount of data held out depends on the value entered in the % field.
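The cross-validation and percentage split modes differ only in how instance indices are partitioned. The sketch below shows both partitions under stated assumptions. The order is shuffled once with a fixed seed, and the % value is treated as the training share (66 is used here only as an example). Unlike WEKA, the folds are not stratified by class. The function names are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation (not stratified)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of instances")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # randomize the instance order once
    for fold in range(k):
        test = order[fold::k]  # every k-th shuffled instance goes to this fold
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test


def percentage_split(n, train_percent=66.0, seed=1):
    """Shuffle, then keep the first train_percent % for training, the rest for testing."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    cut = round(n * train_percent / 100)
    return order[:cut], order[cut:]


folds = list(k_fold_indices(150, k=10))
train, test = percentage_split(150)
print(len(folds), len(train), len(test))
```

In cross-validation every instance is tested exactly once. A percentage split tests each instance at most once and trains only one model.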

3. Clustering:

Cluster Modes:

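The Cluster mode box is used to choose what to cluster and how to evaluate the results. The first three options are the same as for classification: Use training set, Supplied test set and Percentage split. The fourth, Classes to clusters evaluation, compares how well the chosen clusters match a pre-assigned class in the data.
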
4. Associating:

Setting Up
This panel contains schemes for learning association rules, and the learners are chosen and configured in the same way as the clusterers, filters, and classifiers in the other panels.

5. Selecting Attributes:

Searching and Evaluating

Attribute selection involves searching through all possible combinations of attributes in
the data to find which subset of attributes works best for prediction. To do this, two
objects must be set up: an attribute evaluator and a search method. The evaluator
determines what method is used to assign a worth to each subset of attributes. The
search method determines what style of search is performed.
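The evaluator/search split can be shown with a small example. The evaluator below is a correlation-based subset merit in the spirit of CFS, computed with absolute Pearson correlations on numeric data. It rewards attributes that correlate with the class and penalizes attributes that correlate with each other. The search is a greedy forward search that stops when no attribute improves the merit. This is an illustrative assumption. WEKA's default pairing is, as far as we know, CfsSubsetEval with BestFirst, which can also backtrack. All names are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, target, subset):
    """Merit = k * r_cf / sqrt(k + k(k-1) * r_ff) for a subset of k attributes."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_selection(columns, target, evaluator=cfs_merit):
    """Add the best-scoring attribute each round; stop when nothing improves."""
    selected, best = [], float("-inf")
    remaining = list(range(len(columns)))
    while remaining:
        score, cand = max((evaluator(columns, target, selected + [c]), c) for c in remaining)
        if score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


target = [1, 2, 3, 4, 5, 6]
columns = [
    [1, 2, 3, 4, 5, 6],      # perfectly predictive
    [1, -1, 1, -1, 1, -1],   # mostly noise
    [2, 4, 6, 8, 10, 12],    # redundant copy of the first attribute
]
print(greedy_forward_selection(columns, target))
```

Only one of the two redundant attributes is kept. Adding its copy does not raise the merit, so the search stops.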

6. Visualizing:

WEKA's visualization section allows you to visualize 2D plots of the current relation.

Conclusion:-
Hence we have successfully built a data warehouse and explored WEKA.

Association rule mining:
Association rule mining is a machine-learning method used to discover interesting relations between variables in large databases. It looks for patterns in the data and can be performed using various methods, such as the Apriori algorithm. It is intended to identify strong rules discovered using some measures of interestingness.
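The most common measures of interestingness are support, confidence and lift. For a set of transactions $D$ and a rule $X \Rightarrow Y$, they are defined as follows. The worked line uses hypothetical counts: 5 transactions, with $X$ in 4, $Y$ in 3 and $X \cup Y$ in 3.

```latex
\mathrm{supp}(X) = \frac{|\{t \in D : X \subseteq t\}|}{|D|}, \qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}

\mathrm{supp}(X \cup Y) = \tfrac{3}{5} = 0.6, \qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{0.6}{0.8} = 0.75, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{0.75}{0.6} = 1.25
```

A lift above 1 means that $X$ and $Y$ occur together more often than they would if they were independent.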

Procedure:-
After selecting the dataset (IRIS.arff), pre-process the data as follows.
Under Filter, click Choose, apply the Resample filter and see the results below.

Choose another filter, apply the Discretize filter and see the results below.

Likewise, we can apply different filters for preprocessing the data and see the results in different dimensions.
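The two filters used above can be sketched in a few lines. The sketch assumes the unsupervised variants. Resample draws a random sample with replacement (100% of the original size in this sketch). Discretize splits each numeric range into equal-width bins, which to our understanding is the unsupervised filter's default binning (10 bins). WEKA labels bins with interval strings rather than the integer indices used here. The function names are hypothetical.

```python
import random


def resample(rows, sample_percent=100.0, seed=1):
    """Random sample of the rows drawn with replacement (bootstrap style)."""
    rng = random.Random(seed)
    size = round(len(rows) * sample_percent / 100)
    return [rng.choice(rows) for _ in range(size)]


def equal_width_bins(values, bins=10):
    """Replace each numeric value by the index (0..bins-1) of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / bins
    # min() keeps the maximum value inside the last bin instead of a new one
    return [min(int((v - lo) / width), bins - 1) for v in values]


print(equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5))
print(len(resample(list(range(150)))))
```

Because sampling is with replacement, some instances appear more than once and others not at all. Class proportions in the resampled data can therefore drift from the original.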

Conclusion:- Hence we have performed data pre-processing tasks and demonstrated association rule mining on data sets.

Procedure:-
1. Load the dataset (Iris-2D.arff) into the WEKA tool.
2. Go to the Classify tab and click Choose. The available classification algorithms are listed in a tree, with the Bayesian ones under the bayes folder.
3. Select the NaiveBayes algorithm and click Start with the "Use training set" test option enabled.
4. WEKA then reports the detailed accuracy by class, consisting of TP rate, FP rate, Precision, Recall and F-measure values, together with the confusion matrix, as represented below.
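For numeric attributes such as petal length and width, Naive Bayes models each attribute within each class as a normal distribution. It predicts the class with the highest prior times the product of these likelihoods. The sketch below is a plain Gaussian Naive Bayes, not WEKA's exact NaiveBayes implementation. It also shows how the per-class TP rate, FP rate, precision, recall and F-measure follow from true/predicted labels. The data values and function names are hypothetical.

```python
import math
from collections import Counter


def fit_gaussian_nb(X, y):
    """Class priors plus per-class mean and variance of every numeric attribute."""
    model = {}
    for cls, count in Counter(y).items():
        rows = [x for x, label in zip(X, y) if label == cls]
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / count
            var = sum((v - mu) ** 2 for v in col) / count
            stats.append((mu, max(var, 1e-9)))  # floor avoids division by zero
        model[cls] = (count / len(y), stats)
    return model


def predict_nb(model, x):
    """Return the class with the highest log posterior (attributes assumed independent)."""
    def log_posterior(cls):
        prior, stats = model[cls]
        total = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def per_class_metrics(y_true, y_pred, cls):
    """TP rate, FP rate, precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the TP rate
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp_rate": recall, "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical (petal length, petal width) measurements for two classes.
X = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [4.7, 1.4], [4.5, 1.5], [4.9, 1.5]]
y = ["setosa"] * 3 + ["versicolor"] * 3
model = fit_gaussian_nb(X, y)
print(predict_nb(model, [1.45, 0.25]), predict_nb(model, [4.6, 1.45]))
```

Log probabilities are summed instead of multiplying raw densities. This avoids numeric underflow when there are many attributes.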

Conclusion:- Hence we have successfully demonstrated the classification process on a WEKA data set using the Naive Bayes algorithm.

Slice:
The slice operation fixes one dimension of a given cube to a single value and provides a new sub-cube.

Dice:
Dice selects values on two or more dimensions of a given cube and provides a new sub-cube.

Pivot (rotate):
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of data.
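These operations can be mimicked on a small in-memory cube before moving to Excel. In the sketch below, the fact rows are hypothetical and the function names are illustrative. Aggregating by more dimensions corresponds to drill-down and by fewer to roll-up. Slice and dice filter the facts, and pivot swaps the two axes of a two-dimensional summary.

```python
from collections import defaultdict

# Hypothetical fact rows: dimension values plus a 'sales' measure.
FACTS = [
    {"category": "Electronic", "year": 2008, "month": "Jan", "sales": 120},
    {"category": "Electronic", "year": 2008, "month": "Feb", "sales": 80},
    {"category": "Electronic", "year": 2009, "month": "Jan", "sales": 150},
    {"category": "Classical", "year": 2008, "month": "Jan", "sales": 60},
    {"category": "Classical", "year": 2009, "month": "Feb", "sales": 90},
]


def aggregate(facts, dims, measure="sales"):
    """Sum the measure grouped by the given dimensions (one cuboid of the cube)."""
    cube = defaultdict(int)
    for f in facts:
        cube[tuple(f[d] for d in dims)] += f[measure]
    return dict(cube)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice(facts, criteria):
    """Dice: keep rows whose values lie in the allowed set for every listed dimension."""
    return [f for f in facts if all(f[d] in allowed for d, allowed in criteria.items())]


def pivot(table):
    """Pivot: swap the two axes of a 2-D aggregate keyed by (row, column)."""
    return {(c, r): v for (r, c), v in table.items()}


print(aggregate(FACTS, ["category", "year", "month"]))  # drill-down to months
print(aggregate(FACTS, ["category", "year"]))           # roll-up to years
print(aggregate(slice_(FACTS, "year", 2008), ["category"]))
print(pivot(aggregate(FACTS, ["category", "year"])))
```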
We now implement all of these OLAP operations in practice using Microsoft Excel.

Procedure for OLAP Operations:

1. Open Microsoft Excel, go to the Data tab at the top and click "Existing Connections".
2. The Existing Connections window opens. Click "Browse for More..." to import a .cub file for performing the OLAP operations. As a sample, the music.cub file was used.

As shown in the window above, select "PivotTable Report" and click "OK".

All of the music.cub data is now available for analysing the different OLAP operations. First, we performed the drill-down operation as shown below.

In the window above, we selected the year '2008' in the 'Electronic' category, which automatically enables the Drill-Down option in the top navigation options. Clicking 'Drill-Down' displays the window below.

Now we perform the roll-up (drill-up) operation. In the window above, selecting the month of January automatically enables the Drill-Up option at the top. Clicking Drill-Up displays the window below.

Next, the slicing OLAP operation is performed by inserting a slicer from the top navigation options.

While inserting slicers for the slicing operation, we select two dimensions (e.g. Category Name and Year) with only one measure (e.g. Sum of Sales). After inserting a slicer and adding a filter (Category Name: AVANT ROCK and BIG BAND; Year: 2009 and 2010), we get the table shown below.

The dicing operation is similar to the slicing operation. Here we select three dimensions (Category Name, Year, Region Code) and two measures (Sum of Quantity, Sum of Sales) through the 'Insert Slicer' option, and then add a filter for Category Name, Year and Region Code as shown below.

Finally, the pivot (rotate) OLAP operation is performed by swapping the rows (Order Date - Year) and the columns (Values - Sum of Quantity and Sum of Sales) in the PivotTable field areas at the bottom right, as shown below.

After swapping (rotating), we get the result represented below, with a pie chart of the year-wise data for the Classical category.

Conclusion:- Hence we have successfully implemented OLAP operations.

Procedure:
1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify tab and click Choose. Regression algorithms such as LinearRegression are listed under the functions folder.
3. Select the LinearRegression algorithm and click Start with the "Use training set" test option.
4. WEKA then displays the regression model and its results, as shown below.
5. The model's errors can be visualized, as shown below, by right-clicking the result in the Result list and selecting "Visualize classifier errors".
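A linear regression model predicts the class as an intercept plus a weighted sum of the attributes. The weights are chosen to minimise the squared error on the training data. The sketch below solves the ordinary least-squares normal equations $(X^T X)w = X^T y$ by Gauss-Jordan elimination. It is plain OLS. As far as we know, WEKA's LinearRegression also performs attribute selection and adds a small ridge term by default, so its coefficients can differ. The function names and data are hypothetical.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, w1, ..., wp]."""
    A = [[1.0] + list(row) for row in X]  # prepend a column of ones for the intercept
    p = len(A[0])
    # Augmented system [X^T X | X^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: attributes are collinear")
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_linear(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
w = fit_linear_regression(X, y)
print([round(v, 6) for v in w], predict_linear(w, [6, 6]))
```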

B. Use the cross-validation and percentage split options and repeat running the Linear Regression model. Observe the results and derive meaningful conclusions.

Procedure for cross-validation:
1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify tab and click Choose. LinearRegression is listed under the functions folder.
3. Select the LinearRegression algorithm and click Start with the Cross-validation option set to 10 folds.
4. WEKA then displays the regression model and its results, as shown below.

Procedure for percentage split:

1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify tab and click Choose. LinearRegression is listed under the functions folder.
3. Select the LinearRegression algorithm and click Start with the Percentage split option set to 66%.
4. WEKA then displays the regression model and its results, as shown below.
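To compare the three test modes, look at the error figures in the regression summary: correlation coefficient, mean absolute error, root mean squared error, and the relative absolute and root relative squared errors. The sketch below computes these from actual and predicted values. It assumes the relative errors are measured against a model that always predicts `baseline_mean`, taken here to be the training-set mean of the class. The function name and numbers are hypothetical.

```python
import math


def regression_report(actual, predicted, baseline_mean):
    """Error measures in the style of a regression evaluation summary."""
    n = len(actual)
    errs = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errs)
    sq_err = sum(e * e for e in errs)
    base_abs = sum(abs(a - baseline_mean) for a in actual)
    base_sq = sum((a - baseline_mean) ** 2 for a in actual)
    if base_abs == 0 or base_sq == 0:
        raise ValueError("baseline predicts every value exactly; relative errors undefined")
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return {
        "correlation": cov / (sa * sp) if sa and sp else 0.0,
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / base_abs,      # relative to the mean-only model
        "rrse_percent": 100 * math.sqrt(sq_err / base_sq),
    }


print(regression_report([2, 4, 6, 8], [3, 5, 5, 9], baseline_mean=5))
```

With cross-validation, the predictions from all folds are pooled before these figures are computed. The numbers therefore describe performance on unseen data, unlike the "Use training set" run.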

C. Explore the Simple Linear Regression technique, which only looks at one variable.

Procedure:

1. Load the dataset (Cpu.arff) into the WEKA tool.
2. Go to the Classify tab and click Choose. SimpleLinearRegression is listed under the functions folder.
3. Select the SimpleLinearRegression algorithm and click Start with the "Use training set" option. The resulting model uses one variable (MYCT).
4. WEKA then displays the regression model and its results, as shown below.
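Simple linear regression fits $y = a + b x$ for a single attribute $x$, with $b = \sum (x - \bar x)(y - \bar y) / \sum (x - \bar x)^2$ and $a = \bar y - b \bar x$. As far as we know, WEKA's SimpleLinearRegression tries each numeric attribute and keeps the one with the lowest squared error. The sketch below follows that idea. The function name and data are hypothetical.

```python
def simple_linear_regression(columns, y):
    """Fit y = a + b*x per attribute; return (index, a, b, squared error) of the best one."""
    n = len(y)
    my = sum(y) / n
    best = None
    for j, x in enumerate(columns):
        mx = sum(x) / n
        sxx = sum((v - mx) ** 2 for v in x)
        if sxx == 0:
            continue  # a constant attribute cannot explain y
        b = sum((v - mx) * (t - my) for v, t in zip(x, y)) / sxx
        a = my - b * mx
        sse = sum((t - (a + b * v)) ** 2 for v, t in zip(x, y))
        if best is None or sse < best[3]:
            best = (j, a, b, sse)
    if best is None:
        raise ValueError("no non-constant attribute")
    return best


# Hypothetical data: attribute 0 explains y exactly (y = 1 + 2x).
columns = [[1, 2, 3, 4], [1, 1, 2, 2]]
y = [3, 5, 7, 9]
print(simple_linear_regression(columns, y))
```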

Conclusion:- Hence we have successfully performed regression on data sets.

Procedure:-
1. Load the dataset (Iris.arff) into the WEKA tool.
2. Go to the Cluster tab and click Choose to see the available clustering algorithms.
3. Select the SimpleKMeans algorithm and click Start with the "Use training set" cluster mode enabled.
4. WEKA then reports the within-cluster sum of squared errors, the cluster centroids, the number of iterations and the clustered instances, as represented below.

5. Right-clicking the SimpleKMeans result in the Result list gives more options. Select "Visualize cluster assignments" to get the cluster visualization shown below.
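Simple k-means alternates two steps until the assignments stop changing. Each instance is assigned to its nearest centroid, and then each centroid is moved to the mean of its members. The quantities WEKA reports (centroids, iterations, within-cluster sum of squared errors) fall out of this loop. The sketch below is Lloyd's algorithm with squared Euclidean distance and random initial centres. The seed value is an assumption. WEKA also normalises numeric attributes before measuring distance, which this sketch does not do. All names and data are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100, seed=10):
    """Return (centroids, assignments, within-cluster SSE, iterations)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    assign = None
    for it in range(1, max_iter + 1):
        new_assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no instance changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(dist2(p, centroids[a]) for p, a in zip(points, assign))
    return centroids, assign, sse, it


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, assign, sse, iterations = kmeans(points, k=2)
print(centroids, assign, round(sse, 4), iterations)
```

The result depends on the initial centres. Re-running with another seed is a quick way to check whether a clustering is stable.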

Conclusion:- Hence we have demonstrated the clustering process on the iris.arff data set using simple k-means.

Procedure:-
1. Load the dataset (Breast-Cancer.arff) into the WEKA tool.
2. Go to the Associate tab and click Choose to see the available association algorithms.
3. Select the Apriori algorithm and click Start.
4. Below we can see the rules generated for the selected dataset, with their support and confidence values.
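Apriori first finds all itemsets whose support reaches a minimum threshold, level by level. Candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned. Rules are then generated from those itemsets and kept if their confidence is high enough. For nominal data such as the breast-cancer set, each instance can be treated as a transaction of `attribute=value` items. The sketch below uses fixed thresholds. To our understanding, WEKA's Apriori instead lowers the support threshold step by step until it has found the requested number of rules. The transactions and function name are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support=0.5, min_confidence=0.9):
    """Return (frequent itemsets with support, rules as (lhs, rhs, support, confidence))."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in tsets) / n

    items = {i for t in tsets for i in t}
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), supp, conf))
    return frequent, rules


transactions = [
    {"node-caps=no", "irradiat=no"},
    {"node-caps=no", "irradiat=no", "breast=left"},
    {"node-caps=no", "breast=left"},
    {"node-caps=no", "irradiat=no"},
]
frequent, rules = apriori(transactions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, supp, conf in rules:
    print(lhs, "=>", rhs, f"support={supp:.2f} confidence={conf:.2f}")
```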

Conclusion:- Hence we have implemented the Apriori algorithm.
