Data Mining Using Weka Knowledge Flow Environment
Project Report
By
Mustansar Ali (B210317031)
Regd.No: 2021-UOK-04831
Department: Artificial Intelligence
Session: 2021-2025
Subject: Data Mining
Submitted to: Dr. Waqar Malik
Date of Submission: April 12, 2024
DEPARTMENT OF ARTIFICIAL INTELLIGENCE
FACULTY OF COMPUTING AND ENGINEERING
UNIVERSITY OF KOTLI AZAD JAMMU AND KASHMIR
Data Mining Project Report
Contents
Abstract
1. Introduction to WEKA
2. Introduction to Knowledge Flow
3. Creating a Knowledge Flow in WEKA
4. Classification
   4.1 Performing Classification
   4.2 Comparing the algorithms' performance
5. Association
   5.1 Performing Association
   5.2 Comparing the performance
6. Clustering
   6.1 Performing Clustering
   6.2 Comparing the performance
7. Conclusion
Table of Figures
Figure 1 Classification - Random forest
Figure 2 Results - Random forest
Figure 3 Results - Linear regression
Figure 4 Association - Filtered associator
Figure 5 Association - Filtered associator results
Figure 6 Association - Apriori
Figure 7 Apriori - results
Figure 8 Clustering - Hierarchical
Figure 9 Hierarchical - results
Figure 10 Canopy - clustering
Figure 11 Canopy clustering - results
Abstract
Machine learning plays a pivotal role in extracting meaningful insights and making predictions
from vast datasets across various domains. In this project, we explore the utilization of Weka, an open-
source machine learning software tool, specifically focusing on its Knowledge Flow interface. Our
objective is to investigate the capabilities of Weka's Knowledge Flow in performing classification,
association, and clustering tasks. We begin by providing an overview of Weka and Knowledge Flow,
elucidating their significance in the machine learning workflow. Subsequently, we delve into the
methodology of creating a knowledge flow in Weka and the necessary steps for data preparation.
Utilizing a range of classification, association, and clustering algorithms available in Weka, we
conduct experiments on diverse datasets to evaluate the performance of these tasks. Evaluation metrics
including accuracy, precision, recall, F1 score, and relevant error measures are employed to assess the
effectiveness of the implemented algorithms. Through comprehensive analysis and interpretation of
results, we aim to gain insights into the capabilities and limitations of Weka's Knowledge Flow for
practical machine learning applications. This project contributes to enhancing understanding and
proficiency in leveraging Weka for diverse data analysis and predictive modeling tasks.
1. Introduction to WEKA
Weka, standing for "Waikato Environment for Knowledge Analysis," is a comprehensive suite
of machine learning software tools developed at the University of Waikato in New Zealand. It offers
a wide range of algorithms for data preprocessing, classification, regression, clustering, association
rule mining, and feature selection. Weka is written in Java, making it platform-independent and easily
accessible across different operating systems.
One of the distinguishing features of Weka is its user-friendly graphical user interface (GUI),
which facilitates interactive data exploration and experimentation. The GUI provides intuitive
visualization tools for data analysis, allowing users to visualize datasets, explore attribute distributions,
and inspect model outputs. Additionally, Weka offers a command-line interface for scripting and batch
processing tasks, providing flexibility for automation and integration into larger workflows.
Weka's extensive collection of machine learning algorithms includes both classic techniques
and state-of-the-art methods, making it suitable for a wide range of applications and research purposes.
Moreover, Weka's open-source nature encourages collaboration and community contributions,
fostering continuous development and enhancement of its functionalities.
In addition to its standalone capabilities, Weka can be seamlessly integrated into other software
environments through its Java API, enabling developers to incorporate machine learning
functionalities into custom applications and workflows. Furthermore, Weka supports interoperability
with other data analysis and visualization tools through standard data formats and interfaces, enhancing
its versatility and compatibility with existing software ecosystems.
Overall, Weka serves as a valuable resource for researchers, educators, and practitioners in the
field of machine learning and data mining. Its ease of use, extensive functionality, and active
community support make it a popular choice for various data analysis and predictive modeling tasks.
2. Introduction to Knowledge Flow
In the realm of machine learning and data mining, Knowledge Flow represents a graphical user
interface (GUI) paradigm designed to streamline the process of designing, implementing, and
evaluating machine learning workflows. It provides an intuitive and visual way for users to construct
data processing pipelines, apply machine learning algorithms, and analyze results in a systematic
manner.
Knowledge Flow interfaces, such as the one provided in the Weka software suite, offer users a
visual representation of their data analysis workflows, akin to a flowchart. Users can drag and drop
components representing data sources, preprocessing steps, feature selection techniques, machine
learning algorithms, and evaluation methods onto a canvas, and then connect them together to form a
coherent workflow.
One of the key advantages of Knowledge Flow is its ability to simplify complex machine
learning tasks, making them more accessible to users with varying levels of expertise. By abstracting
away the intricacies of programming and algorithm implementation, Knowledge Flow empowers users
to focus on the conceptual aspects of their data analysis tasks, such as selecting appropriate algorithms,
tuning parameters, and interpreting results.
Furthermore, Knowledge Flow facilitates reproducibility and transparency in machine learning
experiments by providing a visual representation of the entire workflow, including data preprocessing
steps, model configurations, and evaluation metrics. This transparency enables users to easily track the
sequence of operations performed on the data and understand the rationale behind the decisions made
during the analysis process.
Another notable feature of Knowledge Flow is its support for interactive experimentation and
real-time feedback. Users can iteratively refine their workflows by adjusting parameters, swapping out
algorithms, and visualizing intermediate results, allowing for rapid prototyping and hypothesis testing.
Overall, Knowledge Flow represents a powerful tool for designing, implementing, and evaluating
machine learning workflows in a visual and interactive manner. Its intuitive interface, support for
reproducibility, and facilitation of experimentation make it a valuable asset for researchers, educators,
and practitioners in the field of machine learning and data mining.
3. Creating a Knowledge Flow in WEKA
Launching Weka:
Start by launching the Weka application on your computer. Weka provides a user-friendly
graphical interface that allows us to create and visualize Knowledge Flows.
Opening the Knowledge Flow Environment:
Once Weka is launched, the GUI Chooser window appears. Click the "KnowledgeFlow" button
(alongside "Explorer" and "Experimenter") to open the Knowledge Flow environment within the Weka
interface.
Understanding the Knowledge Flow Interface:
The Knowledge Flow interface consists of several panels:
• Toolbar: Contains tools for adding components to the canvas, running the flow, saving the flow,
etc.
• Canvas: The main area where you design your flow by adding components and connecting them.
• Component Palette: Contains a list of available components such as data sources, preprocessing
filters, classifiers, evaluators, etc.
• Properties Panel: Displays the properties of the selected component, allowing you to modify its
settings.
Adding Components to the Canvas:
To add a component to the canvas, simply drag it from the Component Palette onto the Canvas.
Components can include:
• Data Sources: Represent the input data for your analysis (e.g., ARFF files, databases).
• Preprocessing Filters: Apply transformations or cleanups to your data (e.g., attribute selection,
normalization).
• Classifiers: Algorithms used for classification tasks (e.g., decision trees, support vector machines).
• Evaluators: Assess the performance of your model (e.g., cross-validation, holdout evaluation).
Connecting Components:
After adding components to the canvas, connect them together to define the flow of data and
operations. To connect components, click on the output port of one component and drag the cursor to
the input port of another component.
Configuring Component Properties:
After adding components to the canvas, we can configure their properties by selecting the
component and adjusting its settings in the Properties Panel. For example, we can specify parameters
for classifiers, set options for preprocessing filters, or define evaluation metrics for evaluators.
Running the Knowledge Flow:
Once your Knowledge Flow is set up, we can execute it by clicking on the "Run" button in the
Toolbar. Weka will process the data according to the defined flow, applying preprocessing steps,
training classifiers, and evaluating the model's performance.
Analyzing Results:
After running the Knowledge Flow, you can analyze the results by inspecting output messages
in the Console Panel and viewing visualization outputs (e.g., ROC curves, confusion matrices)
generated by evaluators.
Saving and Exporting:
Once you're satisfied with your Knowledge Flow, you can save it for future use by clicking on
the "Save" button in the Toolbar. You can also export the flow as an XML file or share it with others.
Iterative Refinement:
Knowledge Flow allows for iterative refinement of your analysis. You can modify components,
adjust parameters, and rerun the flow to experiment with different settings and improve model
performance.
By following these steps, we can effectively create and execute Knowledge Flows in Weka for
various machine learning tasks. Experimentation and exploration within the Knowledge Flow
environment enable users to gain deeper insights into their data and develop robust predictive models.
Next, we perform classification, association, and clustering in turn and discuss the
performance of different algorithms on different datasets.
4. Classification
Classification in machine learning is a supervised learning task where the goal is to categorize
input data into predefined classes based on their features. It involves training a model on labeled data
to learn the relationships between input features and class labels, enabling it to predict the class labels
of unseen instances.
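As a minimal illustration of this idea, independent of Weka, a nearest-neighbour classifier predicts the class of an unseen instance from its closest labelled example. The feature values and labels below are hypothetical (the "done"/"not done" labels merely echo the checkup-status class used later):

```python
import math

def nearest_neighbour_predict(train, query):
    """Predict the class of `query` as the class of the closest
    training instance (1-nearest-neighbour, Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labelled data: (features, class label)
train = [((1.0, 1.0), "done"), ((1.2, 0.9), "done"),
         ((5.0, 5.0), "not_done"), ((5.5, 4.8), "not_done")]

print(nearest_neighbour_predict(train, (1.1, 1.0)))  # -> done
print(nearest_neighbour_predict(train, (5.2, 5.1)))  # -> not_done
```

The point is only the supervised pattern: a model learns from labelled instances and assigns a class to unseen ones.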
4.1 Performing Classification
For the classification task I used a dataset of my own, created earlier as a course assignment. It
relates to doctors' appointments and checkups, and the class variable is the checkup status: done or
not done.
To build the data flow diagram for classification, I first opened the Weka software and then the
Knowledge Flow environment.
The components I used and the steps I followed are as follows:
• First of all, I used a CSVLoader to load my dataset.
• Then I added a ClassAssigner, which assigns the class attribute, i.e., the target variable the
model will predict.
• Then I added a CrossValidationFoldMaker, which splits the dataset into folds for
cross-validation, a technique that assesses a model by training and testing it on different
subsets of the data.
• Then I selected the RandomForest algorithm to train the model.
• Then I used a ClassifierPerformanceEvaluator, which evaluates the model using metrics such as
accuracy, precision, recall, and F1 score, allowing us to assess how accurately it classifies
instances.
• Finally, I connected a TextViewer so that the results can be viewed.
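The roles of the fold maker and the performance evaluator can also be sketched outside the GUI. The plain-Python sketch below mirrors the idea with a hypothetical dataset and a trivial majority-class "classifier" as a stand-in; it is not Weka's implementation:

```python
import random

def cross_validation_folds(instances, k, seed=1):
    """Split instances into k folds; each fold serves once as the
    test set while the rest form the training set."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

def majority_class(train):
    """A trivial 'classifier': always predict the most common class."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

# Hypothetical appointment data: (features, checkup status)
data = [((i, i % 3), "done" if i % 4 else "not_done") for i in range(20)]

accuracies = []
for train, test in cross_validation_folds(data, k=5):
    predicted = majority_class(train)
    correct = sum(1 for _, label in test if label == predicted)
    accuracies.append(correct / len(test))

print(sum(accuracies) / len(accuracies))  # average accuracy across folds
```

Weka's CrossValidationFoldMaker performs the equivalent splitting internally; the evaluator then aggregates per-fold results just as the final average does here.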
Here is the final flow in the Knowledge Flow environment after completing the steps above.
Figure 1 Classification - Random forest
Displaying the output in the TextViewer gives the following results.
Figure 2 Results -- Random forest
To see which algorithm performs better on my dataset, I then implemented a second one: linear
regression. The setup was identical; I simply replaced RandomForest with LinearRegression. Applying
linear regression produced the following results.
Figure 3 Results -- Linear regression
4.2 Comparing the algorithms performance
Metric                       RandomForest   LinearRegression
Correlation Coefficient      0.9937         1
Mean Absolute Error          0.0209         0
Root Mean Squared Error      0.0512         0.0004
Relative Absolute Error      5.8906%        0.0084%
Root Relative Squared Error  12.1714%       0.097%
Total Number of Instances    4100           4100
Analysis:
Correlation Coefficient: LinearRegression achieved a perfect correlation coefficient of 1, indicating
a perfect linear relationship between predicted and actual values. RandomForest also achieved a very
high correlation coefficient of 0.9937, indicating a strong correlation.
Mean Absolute Error and Root Mean Squared Error: LinearRegression achieved a mean
absolute error of 0 and a root mean squared error of only 0.0004, suggesting that its predictions
almost exactly match the actual values. RandomForest had slightly higher errors, with a mean
absolute error of 0.0209 and a root mean squared error of 0.0512.
Relative Absolute Error and Root Relative Squared Error: LinearRegression had extremely low
relative absolute error and root relative squared error values, indicating minimal deviation from
actual values. RandomForest had higher error percentages, but still relatively low compared to the
scale of the data.
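These error measures follow standard definitions: the relative errors compare a model against a baseline that always predicts the mean of the actual values. The sketch below computes them directly on made-up numbers (not the project's data):

```python
import math

def error_metrics(actual, predicted):
    """MAE, RMSE, and the relative errors (RAE, RRSE, as percentages)
    for a list of numeric predictions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    # Relative errors: ratio to the errors of always predicting the mean.
    rae = mae / (sum(abs(a - mean_actual) for a in actual) / n)
    rrse = rmse / math.sqrt(sum((a - mean_actual) ** 2 for a in actual) / n)
    return mae, rmse, rae * 100, rrse * 100

actual = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
predicted = [0.9, 0.1, 1.0, 0.0, 0.8, 1.0]
mae, rmse, rae, rrse = error_metrics(actual, predicted)
print(round(mae, 4), round(rmse, 4))  # MAE ≈ 0.0667, RMSE ≈ 0.1
```

A relative error near 0% therefore means the model vastly outperforms the mean baseline, which is what the table above shows for both algorithms.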
Conclusion: Both algorithms performed exceptionally well, but LinearRegression achieved slightly
better results in terms of accuracy and error metrics.
5. Association
Association in data mining refers to the process of discovering interesting relationships or patterns
among variables in large datasets. Unlike classification, which predicts a target variable based on input
features, association analysis focuses on identifying correlations or associations between different
variables without necessarily predicting an outcome. A common application of association analysis is
in market basket analysis, where the goal is to uncover relationships between items purchased together
by customers. The most well-known algorithm for association analysis is Apriori, which identifies
frequent itemsets and generates association rules based on their occurrence patterns. These association
rules provide valuable insights into customer behavior, purchasing patterns, and product
recommendations, helping businesses optimize their marketing strategies and improve customer
satisfaction.
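The quantities such rules are ranked by, support, confidence, and lift, are straightforward to compute. Below is a minimal stdlib-Python sketch on a hypothetical market-basket list (illustrative only, not Weka's Apriori code):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent ==> consequent.
    `transactions` is a list of sets; antecedent/consequent are sets."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    c = sum(1 for t in transactions if consequent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = both / n            # how often the rule applies at all
    confidence = both / a         # P(consequent | antecedent)
    lift = confidence / (c / n)   # improvement over baseline frequency
    return support, confidence, lift

# Hypothetical toy basket data
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

A lift above 1 means the antecedent makes the consequent more likely than its baseline frequency; this is why the rules reported later are compared by their Conf and Lift values.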
5.1 Performing Association
The dataset used in this section is weather.nominal, which I downloaded from GitHub; it also ships
with Weka by default so that it can be used for educational purposes. As the name indicates, it is
a nominal dataset suited to association rule mining. I applied two algorithms (FilteredAssociator
and Apriori) to this dataset and then compared their results. Here are the steps I followed to
build the Knowledge Flow diagram for association rule mining.
• First of all, I used an ArffLoader to load the dataset into the Knowledge Flow environment.
• Then I added a ClassAssigner, which specifies the target variable for association rule mining.
• Then I added a CrossValidationFoldMaker, which splits the dataset into folds for
cross-validation, a technique that assesses a model by training and testing it on different
subsets of the data.
• Then I selected the FilteredAssociator, an association rule mining algorithm, to generate
association rules.
• Finally, I connected a TextViewer so that the results can be viewed.
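For intuition, the level-wise itemset search at the heart of Apriori can be sketched independently of Weka. The transactions below are hypothetical attribute-value pairs in the style of the weather data; Weka's Apriori additionally derives rules from these itemsets and iteratively lowers the support threshold:

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search for frequent itemsets: a size-k candidate is
    only kept if all its (k-1)-subsets were frequent (Apriori pruning)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent, current = {}, [frozenset([i]) for i in items]
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items()
                 if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Candidates: unions of this level's sets, pruned by their subsets.
        current = [u for u in
                   {a | b for a in level for b in level if len(a | b) == k}
                   if all(frozenset(s) in level
                          for s in combinations(u, k - 1))]
    return frequent  # maps each frequent itemset to its support

baskets = [{"outlook=sunny", "play=no"}, {"outlook=sunny", "play=no"},
           {"outlook=overcast", "play=yes"}, {"outlook=rainy", "play=yes"}]
freq = apriori_frequent_itemsets(baskets, min_support=0.5)
print(sorted(tuple(sorted(s)) for s in freq))
```

On this toy input, the three frequent single items and the pair {outlook=sunny, play=no} survive the 0.5 support threshold.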
Here is the final flow in the Knowledge Flow environment after completing the steps above.
FilteredAssociator algorithm:
Figure 4 Association -- filtered associator
On execution, the following results are obtained.
Figure 5 Association -- filtered associator results
Apriori algorithm:
Figure 6 Association -- apriori
Figure 7 Apriori - results
5.2 Comparing the performance
Metric               Apriori   FilteredAssociator
Minimum Support      0.15      0.2
Minimum Confidence   0.9       0.9
Number of Cycles     17        16
Number of Rules      10        10

Best rules (identical for both algorithms):
• Temperature=cool ==> Humidity=normal (Conf: 1, Lift: 1.71)
• Humidity=normal ^ Windy=FALSE ==> Play=yes (Conf: 1, Lift: 1.5)
• Outlook=overcast ==> Play=yes (Conf: 1, Lift: 1.5)
• Outlook=rainy ^ Play=yes ==> Windy=FALSE (Conf: 1, Lift: 1.71)
• Outlook=sunny ^ Humidity=high ==> Temperature=hot (Conf: 1, Lift: 3)
Analysis:
• Both algorithms (Apriori and Filtered Associator) produced similar results in terms of the generated
association rules, with identical support, confidence, and lift for the top rules.
• The minimum confidence threshold was the same for both algorithms, while the minimum support
differed slightly (0.15 for Apriori versus 0.2 for the FilteredAssociator).
• The algorithms required a similar number of cycles (17 versus 16) and generated the same number
of rules (10 each).
• The top association rules discovered by both algorithms are identical, indicating consistency in the
patterns identified from the dataset.
Overall, both association rule mining algorithms produced comparable results, suggesting that the
choice between them may depend on other factors such as computational efficiency, ease of use, or
additional customization options provided by the Filtered Associator algorithm.
6. Clustering
Clustering in the realm of machine learning is an unsupervised learning technique aimed at
organizing a dataset into groups or clusters where instances within the same group exhibit similar
characteristics or patterns. Unlike supervised learning, clustering does not involve labeled data;
instead, it seeks to discover intrinsic structures within the data based solely on the attributes of the
instances. The objective of clustering is to partition the dataset in such a way that instances within the
same cluster are more similar to each other than to those in other clusters, while maximizing the
dissimilarity between clusters. Common clustering algorithms include K-means, hierarchical
clustering, and DBSCAN, each with its own approach to defining clusters based on distance metrics,
density, or connectivity. Clustering finds applications in various domains such as customer
segmentation, anomaly detection, image segmentation, and document clustering, providing valuable
insights into the underlying structure and patterns present in the data.
6.1 Performing Clustering
For clustering I used the same weather data as in association rule mining, but in its numeric form:
the weather.numeric dataset, which also ships with Weka in ARFF file format. To draw the data flow
diagram in the Weka Knowledge Flow, I performed the following steps.
• First of all, I used an ArffLoader to load my dataset.
• Then I added a ClassAssigner, which assigns the class attribute, i.e., the target variable.
• Then I added a CrossValidationFoldMaker, which splits the dataset into folds for
cross-validation, a technique that assesses a model by training and testing it on different
subsets of the data.
• Then I selected the HierarchicalClusterer to build the clusters, with Euclidean distance as the
distance function and the number of clusters set to 12.
• Then I used a ClustererPerformanceEvaluator, which assesses the quality of the clustering
results, providing metrics that reflect the compactness and separation of clusters.
• Finally, I connected a TextViewer so that the results can be viewed.
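The core of agglomerative hierarchical clustering with a Euclidean distance function can be sketched in plain Python. This single-linkage version and its data are illustrative only, not Weka's HierarchicalClusterer:

```python
import math

def single_linkage(points, num_clusters):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two clusters whose closest members are nearest
    (single linkage, Euclidean distance) until num_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

# Hypothetical numeric weather readings: (temperature, humidity)
points = [(64, 65), (65, 70), (68, 80), (83, 86), (85, 85), (80, 90)]
for c in single_linkage(points, num_clusters=2):
    print(sorted(c))
```

On this toy input the two natural groups (cool/dry versus hot/humid readings) emerge as the two remaining clusters.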
Hierarchical clustering:
Figure 8 Clustering - Hierarchical
Results:
Figure 9 Hierarchical - results
Canopy clustering:
Figure 10 Canopy - clustering
Results:
Figure 11 Canopy clustering - results
6.2 Comparing the performance
Metric                            Canopy Clustering   Hierarchical Clustering
Number of Clusters                12                  2
Correctly Clustered Instances     1 (100%)            1 (100%)
Incorrectly Clustered Instances   0 (0%)              0 (0%)
Analysis:
• Number of Clusters: Canopy Clustering identified 12 clusters, whereas Hierarchical Clustering
identified 2 clusters.
• Correctly Clustered Instances: Both algorithms correctly clustered all instances, with a 100%
accuracy rate.
• Incorrectly Clustered Instances: Both algorithms did not have any incorrectly clustered instances.
Overall, both clustering algorithms achieved perfect clustering performance in terms of
correctly clustering instances. However, they differ in the number of clusters they identified and their
underlying clustering methodologies. Canopy Clustering tends to produce a larger number of clusters
based on a pre-defined radius threshold, while Hierarchical Clustering builds a hierarchical structure
of clusters based on the proximity of instances. The choice between the two algorithms may depend
on the specific characteristics of the dataset and the desired level of granularity in clustering.
7. Conclusion
Throughout this project, we explored the capabilities of Weka, a powerful tool for data mining
and machine learning. Through the implementation of various tasks including classification,
association rule mining, and clustering, we gained valuable insights into the underlying patterns and
relationships within our datasets. We utilized a range of algorithms such as Random Forest, Apriori,
and Canopy Clustering to analyze and extract meaningful information from the data. Our meticulous
evaluation and comparison of different algorithms demonstrated their effectiveness in addressing
diverse data mining tasks. Ultimately, this project underscores the significance of leveraging advanced
data mining techniques and tools like Weka to uncover actionable insights, optimize decision-making
processes, and drive innovation in various domains.