
Name: Aditya Dikonda

Roll No: 24
Batch: T12

Experiment 8: To implement Apriori algorithm.

LO mapping: LO6 Mapped


Theory:
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of the properties of frequent itemsets. It applies an iterative, level-wise search in which frequent k-itemsets are used to explore (k+1)-itemsets.
To improve the efficiency of this level-wise generation of frequent itemsets, an important property called the Apriori property is used, which reduces the search space.
Apriori Property –
All non-empty subsets of a frequent itemset must be frequent. The key concept behind the Apriori algorithm is the anti-monotonicity of the support measure: every subset of a frequent itemset must itself be frequent (the Apriori property), and, conversely, if an itemset is infrequent, all of its supersets must be infrequent. This is what lets the algorithm prune candidates, as sketched below.
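To make the pruning concrete, here is a minimal sketch of the subset check implied by the Apriori property (the helper name has_frequent_subsets is illustrative only and is not part of the program given later):

from itertools import combinations

def has_frequent_subsets(candidate, prev_frequent):
    # A candidate k-itemset can only be frequent if every one of its
    # (k-1)-subsets was found frequent in the previous pass.
    return all(frozenset(s) in prev_frequent
               for s in combinations(candidate, len(candidate) - 1))

L2 = {frozenset({'I1', 'I2'}), frozenset({'I1', 'I3'}), frozenset({'I2', 'I3'})}
print(has_frequent_subsets(frozenset({'I1', 'I2', 'I3'}), L2))  # True
print(has_frequent_subsets(frozenset({'I1', 'I2', 'I4'}), L2))  # False: {I1, I4} not in L2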
Consider the following dataset; we will find the frequent itemsets in it and generate association rules for them. (The transaction table did not survive in this copy; the nine transactions below are reconstructed from the support counts used in the steps that follow and match the classic Han and Kamber example.)

T100: I1, I2, I5
T200: I2, I4
T300: I2, I3
T400: I1, I2, I4
T500: I1, I3
T600: I2, I3
T700: I1, I3
T800: I1, I2, I3, I5
T900: I1, I2, I3

The minimum support count is 2 and the minimum confidence is 60%.

Step-1: K=1

(I) Create a table containing the support count of each item present in the dataset; this is called C1 (the candidate set).
(II) Compare each candidate's support count with the minimum support count (here min_support = 2); items whose support_count is less than min_support are removed. This gives us the itemset L1, as sketched below.
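A minimal sketch of this step, assuming the reconstructed transactions above:

from collections import Counter

transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'},
    {'I1', 'I2', 'I4'}, {'I1', 'I3'}, {'I2', 'I3'},
    {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'}, {'I1', 'I2', 'I3'},
]

C1 = Counter(item for t in transactions for item in t)   # candidate 1-itemsets with counts
L1 = {item: n for item, n in C1.items() if n >= 2}       # keep items meeting support count 2
print(L1)   # all five items survive: I1:6, I2:7, I3:6, I4:2, I5:2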

Step-2: K=2

Generate the candidate set C2 by joining L1 with itself (this is called the join step). The condition for joining two itemsets of Lk-1 is that they have (k-2) items in common; for k = 2 this means any two distinct items can be joined.
Check whether all subsets of each candidate itemset are frequent, and if not, remove that candidate. (For example, the subsets of {I1, I2} are {I1} and {I2}, both frequent. Check this for each candidate.)
Then find the support count of these candidates by scanning the dataset.
(II) Compare each C2 candidate's support count with the minimum support count (here min_support = 2); candidates whose support_count is less than min_support are removed. This gives us the itemset L2. A small sketch of the join step follows.
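A minimal sketch of the join step (a simple union-based join; the full program below uses the same idea in combine_frequent_itemsets):

from itertools import combinations

def join(prev_frequent, k):
    # Union two (k-1)-itemsets; keep the union only when it has exactly
    # k items, i.e. when the two itemsets share k-2 items.
    candidates = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        if len(union) == k:
            candidates.add(union)
    return candidates

L1 = [frozenset({i}) for i in ('I1', 'I2', 'I3', 'I4', 'I5')]
C2 = join(L1, 2)
print(len(C2))   # 10 -- every pair of the five frequent items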

Step-3:

Generate the candidate set C3 by joining L2 with itself (join step). The condition for joining two itemsets of Lk-1 is that they have (k-2) items in common, so here, for L2, the first item should match.
The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5} and {I2, I3, I5}.
Check whether all subsets of these itemsets are frequent, and if not, remove that itemset. (Here the 2-item subsets of {I1, I2, I3} are {I1, I2}, {I2, I3} and {I1, I3}, all of which are frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Check every candidate similarly.)
Find the support count of the remaining candidates by scanning the dataset.
(II) Compare each C3 candidate's support count with the minimum support count (here min_support = 2); candidates whose support_count is less than min_support are removed. This gives us the itemset L3.

Step-4:

Generate the candidate set C4 by joining L3 with itself (join step). The condition for joining two itemsets of Lk-1 (k = 4) is that they have (k-2) items in common, so here, for L3, the first two items should match.
Check whether all subsets of the resulting itemsets are frequent. (Here the only itemset formed by joining L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent.) So C4 is empty, and we stop here because no further frequent itemsets can be found.

Thus, we have discovered all the frequent itemsets. Now the generation of strong association rules comes into the picture. For that we need to calculate the confidence of each rule.
Confidence –
A confidence of 60% means that 60% of the customers who purchased, say, milk and bread also bought butter.

Confidence(A->B) = Support_count(A∪B) / Support_count(A)

So here, taking one frequent itemset as an example, we show the rule generation.
Itemset {I1, I2, I3} // from L3
The candidate rules are:
[I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4 * 100 = 50%
[I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4 * 100 = 50%
[I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4 * 100 = 50%
[I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6 * 100 = 33.33%
[I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7 * 100 = 28.57%
[I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6 * 100 = 33.33%
So if the minimum confidence is 50%, the first three rules can be considered strong association rules. (At the 60% threshold stated at the start, none of these six rules would qualify, though rules from other frequent itemsets might.)
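The confidence arithmetic above can be verified with a few lines of Python, using the support counts from the worked example:

sup = {'I1': 6, 'I2': 7, 'I1,I2': 4, 'I1,I2,I3': 2}
print(sup['I1,I2,I3'] / sup['I1,I2'] * 100)   # [I1^I2]=>[I3]: 50.0
print(sup['I1,I2,I3'] / sup['I1'] * 100)      # [I1]=>[I2^I3]: 33.33...
print(sup['I1,I2,I3'] / sup['I2'] * 100)      # [I2]=>[I1^I3]: 28.57...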

Limitations of the Apriori Algorithm

The Apriori algorithm can be slow. Its main limitation is the time required to generate and hold a vast number of candidate itemsets when there are many frequent itemsets, a low minimum support, or large itemsets; it is not an efficient approach for very large datasets. For example, if there are 10^4 frequent 1-itemsets, more than 10^7 candidate 2-itemsets must be generated, tested and accumulated. Furthermore, to detect a frequent pattern of size 100, i.e. {v1, v2, ..., v100}, it would have to generate on the order of 2^100 candidate itemsets, making candidate generation costly and time-consuming. The algorithm also scans the database repeatedly to count candidate supports, so Apriori becomes very slow and inefficient when memory capacity is limited and the number of transactions is large.
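A quick calculation puts numbers on the candidate blow-up described above:

import math

# 10^4 frequent 1-itemsets pair up into roughly 5 * 10^7 candidate 2-itemsets:
print(math.comb(10**4, 2))    # 49995000
# A size-100 pattern has about 1.27 * 10^30 candidate subsets:
print(f"{2**100:.2e}")        # 1.27e+30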

Code:
from itertools import combinations


def get_itemset_transactions(data):
    # Build the initial candidate set C1 (all single items) and the
    # list of transactions, each represented as a set.
    itemset = set()
    transactions = []

    for transaction in data:
        transactions.append(set(transaction))
        for item in transaction:
            itemset.add(frozenset([item]))

    return itemset, transactions


def get_frequent_itemsets(itemset, transactions, min_support):
    # Count the support of every candidate, then split the candidates into
    # those meeting min_support (frequent) and those falling below it.
    itemset_count = {item: 0 for item in itemset}

    for transaction in transactions:
        for item in itemset:
            if item.issubset(transaction):
                itemset_count[item] += 1

    num_transactions = len(transactions)
    frequent_itemsets = {
        item: count / num_transactions
        for item, count in itemset_count.items()
        if count / num_transactions >= min_support
    }
    eliminated_itemsets = {
        item: count / num_transactions
        for item, count in itemset_count.items()
        if count / num_transactions < min_support
    }
    return frequent_itemsets, eliminated_itemsets


def apriori(data, min_support, min_confidence):
    itemset, transactions = get_itemset_transactions(data)
    frequent_itemsets, eliminated_itemsets = get_frequent_itemsets(
        itemset, transactions, min_support)

    iteration = 1
    print(f"\nIteration {iteration}: Frequent 1-itemsets")
    print_itemsets(frequent_itemsets, eliminated_itemsets)

    k = 2
    while True:
        # Join step: build candidate k-itemsets, then keep the frequent ones.
        combined_itemsets = combine_frequent_itemsets(frequent_itemsets, k)
        next_frequent_itemsets, eliminated_itemsets = get_frequent_itemsets(
            combined_itemsets, transactions, min_support)

        if not next_frequent_itemsets:
            break

        iteration += 1
        print(f"\nIteration {iteration}: Frequent {k}-itemsets")
        print_itemsets(next_frequent_itemsets, eliminated_itemsets)

        frequent_itemsets.update(next_frequent_itemsets)
        k += 1

    # Rule generation: for each frequent itemset, try every (size - 1)
    # subset as an antecedent and keep the rules meeting min_confidence.
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) > 1:
            for subset in combinations(itemset, len(itemset) - 1):
                subset = frozenset(subset)
                remainder = itemset - subset
                confidence = frequent_itemsets[itemset] / frequent_itemsets[subset]

                if confidence >= min_confidence:
                    rules.append((subset, remainder, confidence))

    return frequent_itemsets, rules


def combine_frequent_itemsets(frequent_itemsets, length):
    # Union every pair of known frequent itemsets and keep the unions of
    # the requested length (a simple variant of the classic join step).
    combined = set()
    items = list(frequent_itemsets.keys())

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            union_itemset = items[i].union(items[j])
            if len(union_itemset) == length:
                combined.add(union_itemset)

    return combined


def print_itemsets(frequent_itemsets, eliminated_itemsets):
    print("Frequent Itemsets:")
    for itemset, support in frequent_itemsets.items():
        print(f"{set(itemset)}: {support:.3f}")

    if eliminated_itemsets:
        print("\nEliminated Itemsets:")
        for itemset, support in eliminated_itemsets.items():
            print(f"{set(itemset)}: {support:.3f}")


def print_rules(rules):
    print("\nRules:")
    for rule in rules:
        antecedent, consequent, confidence = rule
        print(f"{set(antecedent)} -> {set(consequent)}: {confidence:.3f}")


if __name__ == "__main__":
    data = [
        ['milk', 'bread', 'butter'],
        ['beer', 'bread', 'butter'],
        ['milk', 'bread'],
        ['beer', 'butter'],
        ['milk', 'bread', 'beer', 'butter'],
        ['bread', 'butter'],
        ['beer', 'bread', 'butter']
    ]

    # Note: this program uses relative support (a fraction of transactions),
    # while the worked example above uses absolute support counts.
    min_support = 0.3
    min_confidence = 0.7

    frequent_itemsets, rules = apriori(data, min_support, min_confidence)
    print_rules(rules)

Output:
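A representative run of the program above produces output like the following (the ordering of itemsets may vary between runs, since Python sets are unordered):

Iteration 1: Frequent 1-itemsets
Frequent Itemsets:
{'bread'}: 0.857
{'butter'}: 0.857
{'beer'}: 0.571
{'milk'}: 0.429

Iteration 2: Frequent 2-itemsets
Frequent Itemsets:
{'bread', 'butter'}: 0.714
{'beer', 'butter'}: 0.571
{'milk', 'bread'}: 0.429
{'beer', 'bread'}: 0.429

Eliminated Itemsets:
{'milk', 'butter'}: 0.286
{'milk', 'beer'}: 0.143

Iteration 3: Frequent 3-itemsets
Frequent Itemsets:
{'beer', 'bread', 'butter'}: 0.429

Eliminated Itemsets:
{'milk', 'bread', 'butter'}: 0.286
{'milk', 'bread', 'beer'}: 0.143
{'milk', 'beer', 'butter'}: 0.143

Rules:
{'milk'} -> {'bread'}: 1.000
{'bread'} -> {'butter'}: 0.833
{'butter'} -> {'bread'}: 0.833
{'beer'} -> {'bread'}: 0.750
{'beer'} -> {'butter'}: 1.000
{'beer', 'bread'} -> {'butter'}: 1.000
{'beer', 'butter'} -> {'bread'}: 0.750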
Conclusion: In this experiment, the Apriori algorithm effectively identified
frequent itemsets and generated association rules from transactional data. The
results demonstrated the relationships between items, with support and
confidence metrics guiding the selection of significant patterns. Overall, the
analysis provides valuable insights for decision-making in retail and marketing
strategies.
