
Data Mining (3160714) Enrollment No.

220220131116

Experiment No - 4
Aim: Implement the Apriori algorithm, an association rule data mining technique, in any programming language.

Date:

Competency and Practical Skills: Logic building, Programming and Analyzing

Relevant CO: CO2

Objectives: To implement the basic logic of an association rule mining algorithm with support and confidence measures.
Equipment/Instruments: Personal Computer, open-source software for programming

Theory:
The Apriori algorithm is a classic and fundamental data mining algorithm used for discovering
association rules in transactional datasets.

● Apriori is designed for finding associations or relationships between items in a dataset. It's
commonly used in market basket analysis and recommendation systems.
● Apriori discovers frequent itemsets, which are sets of items that frequently co-occur in
transactions. A frequent itemset is a set of items that appears in a minimum number of
transactions, known as the "support threshold."
● Support and Confidence: Support measures how often an itemset appears in the dataset, while confidence measures how often a rule holds when its antecedent is present. High-confidence rules derived from frequent itemsets are of primary interest.

● Apriori uses an iterative approach to progressively discover frequent itemsets of increasing size. It starts by finding frequent 1-itemsets, then 2-itemsets, and so on.
● The algorithm employs pruning techniques to reduce the number of candidate itemsets that
need to be checked, making it more efficient.
● Apriori is widely used in retail for market basket analysis. It helps retailers understand which
products are often purchased together, allowing for optimized store layouts, targeted
marketing, and product recommendations.
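The iterative join-and-prune procedure described above can be sketched in plain Python. This is an illustrative toy implementation (the function name and the sample transactions are made up for this sketch; it is not the apyori library used in the program below):

```python
# A minimal sketch of the Apriori iteration: start with frequent
# 1-itemsets, join them into larger candidates, and prune any candidate
# whose (k-1)-subsets are not all frequent.
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return a dict {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    tx = [set(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    # Step 1: frequent 1-itemsets
    items = {i for t in tx for i in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Toy transaction data (made up for illustration)
transactions = [['bread', 'butter'], ['bread', 'milk'],
                ['bread', 'butter', 'milk'], ['milk']]
result = apriori_frequent_itemsets(transactions, min_support=0.5)
# result contains e.g. frozenset({'bread', 'butter'}) with support 0.5,
# while {'butter', 'milk'} (support 0.25) is filtered out.
```

Note how the prune step uses the Apriori property: no superset of an infrequent itemset can be frequent, so whole branches of the search space are skipped without counting their support.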


Safety and necessary Precautions:

Ensure that your dataset is clean and free from missing values, outliers, and inconsistencies.

Procedure:

1. Import the dataset that you want to analyze for association rules.
2. Define the minimum support and confidence thresholds for the Apriori algorithm. These parameters control the minimum occurrence of itemsets and the minimum confidence level for rules.
3. Implement the Apriori algorithm to discover frequent itemsets.
4. Use the frequent itemsets obtained in the previous step to generate association rules.

Observation/Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

# Load the dataset and inspect the first rows
data = pd.read_csv(r'C:\Users\devendra\Downloads\D2\sentimentdataset.csv')
data.head()

# Reload with header=None so the first row is treated as data, not column names
data = pd.read_csv(r'C:\Users\devendra\Downloads\D2\sentimentdataset.csv', header=None)
data.head()

data.shape

# Convert the DataFrame into a list of transactions (733 rows, 15 columns)
records = []
for i in range(0, 733):
    records.append([str(data.values[i, j]) for j in range(0, 15)])

# Mine association rules with the chosen thresholds
association_rules = apriori(records, min_support=0.0045,
                            min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

print(len(association_results))

print(association_results[0])

for item in association_results:
    # First element of each result contains the itemset (base item and add item)
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # Second element is the support of the itemset
    print("support: " + str(item[1]))

    # Third element holds the ordered statistics; take the confidence
    # and lift of the first one
    print("confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("===================================================")

Conclusion:

The Apriori algorithm, while foundational and widely used, has limitations. Its computational cost
can be high for large datasets, and it may struggle with sparse data or complex patterns. However, it
remains a valuable tool for association rule mining, particularly in scenarios with moderate-sized
datasets and clear associations.

Quiz:

(1) What do you mean by association rule mining?

Association rule mining is a data mining technique used to discover interesting relationships,
patterns, or associations among a set of items in large datasets. It is widely used in market basket
analysis, recommendation systems, and other applications where understanding co-occurrence and
associations between items can be valuable. Here are the key components:

1. Frequent Itemsets: These are groups of items that often appear together in transactions. For
example, in a supermarket, bread and butter might be a frequent itemset.
2. Association Rules: These are implications of the form A -> B, meaning if item A is present
in a transaction, item B is likely to be present as well. For example, the rule bread ->
butter might indicate that if a customer buys bread, they are also likely to buy butter.
3. Support: This measures how often an itemset appears in the dataset. High support means the
itemset is frequently occurring.
4. Confidence: This measures how often the rule A -> B holds true. High confidence means
the presence of A strongly predicts the presence of B.
5. Lift: This measures the strength of the association rule compared to random co-occurrence.
A lift value greater than 1 indicates a strong association.

In summary, association rule mining helps uncover hidden patterns and correlations in data,
enabling businesses and organizations to make data-driven decisions.
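The three core measures above can be computed directly from transaction counts. A small sketch using the bread/butter example (the five transactions are made up for illustration):

```python
# Illustrative computation of support, confidence, and lift
# for the rule bread -> butter on a toy set of five transactions.
transactions = [
    {'bread', 'butter'}, {'bread', 'butter', 'milk'},
    {'bread'}, {'milk'}, {'butter', 'milk'},
]
n = len(transactions)

support_bread = sum('bread' in t for t in transactions) / n             # 3/5
support_butter = sum('butter' in t for t in transactions) / n           # 3/5
support_both = sum({'bread', 'butter'} <= t for t in transactions) / n  # 2/5

# confidence(bread -> butter) = supp(bread AND butter) / supp(bread)
confidence = support_both / support_bread

# lift = confidence / supp(butter); > 1 indicates a positive association
lift = confidence / support_butter

print(support_both, confidence, lift)
```

Here lift is about 1.11, slightly above 1, meaning buying bread makes buying butter somewhat more likely than chance alone would predict.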

(2) What are the different measures used in the Apriori algorithm?

The Apriori algorithm uses several measures to evaluate the quality of association rules. The most
important measures are:

1. Support: Indicates how frequently an itemset appears in the dataset.

2. Confidence: Indicates the likelihood that item B is also present when item A is present.

3. Lift: Measures the strength of association between items A and B compared to their
independent occurrence.

4. Conviction: Measures the degree of implication of item A in the absence of item B.


5. Leverage: Measures the difference between the observed frequency of A and B appearing
together and the expected frequency if they were independent.

6. Gini Index: Measures the inequality among values of a frequency distribution.

These measures help in identifying the most meaningful and interesting association rules in the data,
making them valuable tools for data mining and analysis.
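Leverage and conviction can be derived from the same support and confidence quantities. A sketch with made-up numbers (the helper function names are illustrative, not from any library):

```python
# Leverage and conviction for a rule A -> B, where supp(X) is the
# fraction of transactions containing X.

def leverage(supp_ab, supp_a, supp_b):
    # Observed co-occurrence minus the co-occurrence expected
    # if A and B were independent
    return supp_ab - supp_a * supp_b

def conviction(supp_b, confidence):
    # Ratio of the expected to the observed frequency of A occurring
    # without B; infinite for a rule that never fails
    if confidence == 1.0:
        return float('inf')
    return (1 - supp_b) / (1 - confidence)

# Example: supp(A) = 0.6, supp(B) = 0.6, supp(A and B) = 0.4
conf = 0.4 / 0.6
print(leverage(0.4, 0.6, 0.6))   # 0.4 - 0.36 = 0.04
print(conviction(0.6, conf))     # 0.4 / (1 - 2/3) = 1.2
```

A conviction of 1.2 says the rule would fail 1.2 times less often than it would if A and B were unrelated; a value of 1 means independence.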

Suggested Reference:

● J. Han, M. Kamber, “Data Mining Concepts and Techniques”, Morgan Kaufmann

References used by the students:

● https://colab.research.google.com
● Jupyter Notebook

Rubric wise marks obtained:

Each rubric is graded Good (2) or Average (1):

● Problem Recognition (2)
● Knowledge (2)
● Logic Building (2)
● Completeness and accuracy (2)
● Ethics (2)

Marks:

