devdm
220220131116
Experiment No - 4
Aim: Implement the Apriori algorithm for association rule mining in any programming
language.
Date:
Objectives: To implement basic logic for association rule mining algorithm with support and
confidence measures.
Equipment/Instruments: Personal Computer, open-source software for programming
Theory:
The Apriori algorithm is a classic and fundamental data mining algorithm used for discovering
association rules in transactional datasets.
● Apriori is designed for finding associations or relationships between items in a dataset. It is
commonly used in market basket analysis and recommendation systems.
● Apriori discovers frequent itemsets, which are sets of items that frequently co-occur in
transactions. A frequent itemset is a set of items whose support is at least a user-chosen
minimum, known as the "support threshold."
● Support and Confidence: Support measures the fraction of transactions that contain an itemset,
while the confidence of a rule A -> B measures the fraction of transactions containing A that
also contain B. High-confidence rules derived from frequent itemsets are of interest.
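As a small illustration of the two measures, the sketch below computes support and confidence over five made-up transactions. The transaction data and the helper names `support` and `confidence` are assumptions for illustration only; they are not taken from the experiment's dataset or from any library.

```python
# Toy transactions (hypothetical data for illustration only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B together) / support(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# {bread, butter} occurs in 3 of the 5 transactions.
print(support({"bread", "butter"}, transactions))
# Of the 4 transactions containing bread, 3 also contain butter.
print(confidence({"bread"}, {"butter"}, transactions))
```

Because the values are computed with floating-point division, comparisons against a threshold are best done with `>=` rather than exact equality.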
Data Mining (3160714) Enrollment No.220220131116
Procedure:
1. Ensure that your dataset is clean and free from missing values, outliers, and inconsistencies.
2. Import the dataset that you want to analyze for association rules.
3. Define the minimum support and confidence thresholds for the Apriori algorithm. These
parameters control the minimum occurrence of itemsets and the minimum confidence level
for rules.
4. Implement the Apriori algorithm to discover frequent itemsets.
5. Use the frequent itemsets obtained from the previous step to generate association rules.
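The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the program used in this experiment: the functions `apriori` and `generate_rules`, the toy transactions, and the threshold values are all assumptions chosen for the example, and the function named `apriori` here is a hand-written helper, not a library call.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Keep only candidates whose support meets the threshold.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


def generate_rules(frequent_itemsets, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent_itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                conf = sup / frequent_itemsets[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


# Hypothetical transactions and thresholds, for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]
frequent_itemsets = apriori(transactions, min_support=0.4)
rules = generate_rules(frequent_itemsets, min_confidence=0.7)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          "support:", round(sup, 2), "confidence:", round(conf, 2))
```

The prune step is what gives the algorithm its name: a candidate is discarded without counting if any of its subsets is already known to be infrequent.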
Observation/Program:
import pandas as pd

# Load the dataset (local path as used on the author's machine).
data = pd.read_csv(r'C:\Users\devendra\Downloads\D2\sentimentdataset.csv')
data.head()
data.shape
print(len(association_results))
print(association_results[0])
for item in association_results:
    print("support: " + str(item[1]))
Conclusion:
The Apriori algorithm, while foundational and widely used, has limitations. It scans the dataset
once for each itemset size and can generate a very large number of candidate itemsets, so its
computational cost can be high for large datasets or low support thresholds, and it may struggle
with sparse data or complex patterns. However, it remains a valuable tool for association rule
mining, particularly in scenarios with moderate-sized datasets and clear associations.
Quiz:
Association rule mining is a data mining technique used to discover interesting relationships,
patterns, or associations among a set of items in large datasets. It is widely used in market basket
analysis, recommendation systems, and other applications where understanding co-occurrence and
associations between items can be valuable. Here are the key components:
1. Frequent Itemsets: These are groups of items that often appear together in transactions. For
example, in a supermarket, bread and butter might be a frequent itemset.
2. Association Rules: These are implications of the form A -> B, meaning if item A is present
in a transaction, item B is likely to be present as well. For example, the rule bread ->
butter might indicate that if a customer buys bread, they are also likely to buy butter.
3. Support: This measures how often an itemset appears in the dataset. High support means the
itemset is frequently occurring.
4. Confidence: This measures how often the rule A -> B holds true. High confidence means
the presence of A strongly predicts the presence of B.
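5. Lift: This measures the strength of the association rule compared to random co-occurrence.
A lift greater than 1 means A and B occur together more often than expected if they were
independent (a positive association), a lift of 1 means they are independent, and a lift below 1
means they occur together less often than expected.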
In summary, association rule mining helps uncover hidden patterns and correlations in data,
enabling businesses and organizations to make data-driven decisions.
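To make the lift measure concrete, here is a small sketch that computes lift from three support values and classifies the result. The support values and the helper names `lift` and `interpret_lift` are hypothetical, chosen only for this example.

```python
def lift(support_a, support_b, support_ab):
    """Lift of A -> B: observed joint support divided by the joint
    support expected if A and B were independent."""
    return support_ab / (support_a * support_b)


def interpret_lift(value, tol=1e-9):
    """Classify a lift value relative to 1 (independence)."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positive association" if value > 1.0 else "negative association"


# Hypothetical supports: bread 0.5, butter 0.4, both together 0.3.
value = lift(0.5, 0.4, 0.3)
print(round(value, 2), interpret_lift(value))
```

With these numbers the expected joint support under independence is 0.5 * 0.4 = 0.2, so an observed joint support of 0.3 gives a lift above 1.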
(2) What are the different measures used in the Apriori algorithm?
The Apriori algorithm uses several measures to evaluate the quality of association rules. The most
important measures are:
1. Support: Indicates how frequently an itemset appears in the dataset.
2. Confidence: Indicates the likelihood that item B is also present when item A is present.
3. Lift: Measures the strength of association between items A and B compared to their
independent occurrence.
5. Leverage: Measures the difference between the observed frequency of A and B appearing
together and the expected frequency if they were independent.
These measures help in identifying the most meaningful and interesting association rules in the data,
making them valuable tools for data mining and analysis.
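Leverage can be illustrated in the same way. The sketch below assumes hypothetical support values; the helper name `leverage` is introduced only for this example.

```python
def leverage(support_a, support_b, support_ab):
    """Observed joint support of A and B minus the joint support
    expected if A and B were independent."""
    return support_ab - support_a * support_b


# Hypothetical supports: A 0.5, B 0.4, A and B together 0.3.
value = leverage(0.5, 0.4, 0.3)
print(round(value, 3))
```

A leverage of 0 corresponds to independence, a positive value means the items appear together more often than independence would predict, and a negative value means less often.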
Suggested Reference:
https://colab.research.google.com
Jupyter Notebook
Rubrics | Knowledge (2) | Problem Recognition (2) | Logic Building (2) | Completeness and accuracy (2) | Ethics (2) | Total
        | Good (2) / Average (1) | Good (2) / Average (1) | Good (2) / Average (1) | Good (2) / Average (1) | Good (2) / Average (1) |
Marks   |