DWM Exp8
Roll No:24
Batch:T12
Step-1: K=1. Create candidate set C1 by finding the support count of each item, compare it with the minimum support count, and obtain the frequent 1-itemsets L1.
Step-2: K=2. Join L1 with itself to generate candidate set C2, find the support count of each candidate, and keep the candidates that meet the minimum support count to obtain L2.
Step-3:
Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets must have (K-2) elements in common. So here, for L2, the first element should match.
So the itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5} and {I2, I3, I5}.
Check whether all subsets of these itemsets are frequent and, if not, remove that itemset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3} and {I1, I3}, which are all frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Check every itemset in the same way.) Then find the support count of the remaining itemsets by searching the dataset, and compare each candidate's (C3) support count with the minimum support count (here min_support = 2; if the support count of a candidate itemset is less than min_support, remove it). This gives us the itemset L3.
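To make the join and prune conditions concrete, here is a minimal Python sketch (separate from the Code section below). It assumes the (k-1)-itemsets are stored as sorted tuples; the L2 values are the six pairs implied by the joins listed above, and the helper names join_step and prune_step are illustrative.

from itertools import combinations

# L2 as implied by the joins listed above (stored as sorted tuples).
L2 = [('I1', 'I2'), ('I1', 'I3'), ('I1', 'I5'),
      ('I2', 'I3'), ('I2', 'I4'), ('I2', 'I5')]

def join_step(prev_level, k):
    # Join step: merge two (k-1)-itemsets when their first (k-2) items match.
    candidates = set()
    for a in prev_level:
        for b in prev_level:
            if a < b and a[:k - 2] == b[:k - 2]:
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates

def prune_step(candidates, prev_level, k):
    # Prune step: drop any candidate with a (k-1)-subset not in the previous level.
    prev = set(prev_level)
    return {c for c in candidates
            if all(sub in prev for sub in combinations(c, k - 1))}

C3 = join_step(L2, 3)
print(sorted(C3))                     # the six joined itemsets listed above
print(sorted(prune_step(C3, L2, 3)))  # only {I1,I2,I3} and {I1,I2,I5} survive

The surviving candidates are then counted against the dataset and compared with min_support = 2 to obtain L3, exactly as described above.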
Step-4:
Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (here K = 4) is that the two itemsets must have (K-2) elements in common. So here, for L3, the first 2 elements (items) should match.
Check whether all subsets of these itemsets are frequent (here the only itemset formed by joining L3 is {I1, I2, I3, I5}, and its subsets include {I1, I3, I5}, which is not frequent). So there is no itemset in C4.
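The same subset check explains why C4 is empty; a tiny self-contained sketch, assuming L3 is stored as sorted tuples as in the sketch above:

from itertools import combinations

L3 = [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]

# Join step at K=4: the two L3 itemsets share their first two items,
# so the only candidate is {I1, I2, I3, I5}.
candidate = ('I1', 'I2', 'I3', 'I5')

# Prune step: every 3-item subset must already be in L3.
missing = [sub for sub in combinations(candidate, 3) if sub not in L3]
print(missing)   # [('I1', 'I3', 'I5'), ('I2', 'I3', 'I5')] -> C4 is empty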
We stop here because no further frequent itemsets are found.
Thus, we have discovered all the frequent itemsets. Now the generation of strong association rules comes into the picture. For that we need to calculate the confidence of each rule.
Confidence –
A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.
Confidence(A->B)=Support_count(A∪B)/Support_count(A)
So here, taking any frequent itemset as an example, we will show the rule generation.
Itemset {I1, I2, I3} //from L3
So the rules can be:
[I1^I2]=>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
[I1^I3]=>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
[I2^I3]=>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
[I1]=>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 ≈ 33%
[I2]=>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 ≈ 29%
[I3]=>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 ≈ 33%
So if the minimum confidence is 50%, then the first 3 rules can be considered strong association rules.
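These six confidences can be recomputed directly from the support counts quoted in the rules above; a small sketch, assuming a minimum confidence of 50% as in the text:

from itertools import combinations

# Support counts quoted in the rules above.
support = {
    frozenset(['I1']): 6, frozenset(['I2']): 7, frozenset(['I3']): 6,
    frozenset(['I1', 'I2']): 4, frozenset(['I1', 'I3']): 4,
    frozenset(['I2', 'I3']): 4, frozenset(['I1', 'I2', 'I3']): 2,
}
itemset = frozenset(['I1', 'I2', 'I3'])
min_confidence = 0.5   # 50%, as assumed in the text

# Every non-empty proper subset of the itemset can be the antecedent of a rule.
for r in (1, 2):
    for antecedent in combinations(sorted(itemset), r):
        antecedent = frozenset(antecedent)
        consequent = itemset - antecedent
        confidence = support[itemset] / support[antecedent]
        label = "strong" if confidence >= min_confidence else "weak"
        print(f"{set(antecedent)} -> {set(consequent)}: {confidence:.0%} ({label})")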
Code:
from itertools import combinations

def get_itemset_transactions(data):
    # Build the candidate 1-itemsets and the list of transactions (as frozensets).
    itemset = set()
    transactions = []
    for row in data:
        transactions.append(frozenset(row))
        for item in row:
            itemset.add(frozenset([item]))
    return itemset, transactions

def get_frequent_itemsets(itemsets, transactions, min_support):
    # Count how many transactions contain each candidate itemset.
    num_transactions = len(transactions)
    itemset_count = {}
    for candidate in itemsets:
        for transaction in transactions:
            if candidate.issubset(transaction):
                itemset_count[candidate] = itemset_count.get(candidate, 0) + 1
    # Split candidates into frequent and eliminated sets based on min_support.
    frequent_itemsets = {
        item: count / num_transactions
        for item, count in itemset_count.items()
        if count / num_transactions >= min_support
    }
    eliminated_itemsets = {
        item: count / num_transactions
        for item, count in itemset_count.items()
        if count / num_transactions < min_support
    }
    return frequent_itemsets, eliminated_itemsets

def combine_frequent_itemsets(frequent_itemsets, length):
    # Join step: union pairs of frequent itemsets to form candidates of the given size.
    combined = set()
    items = list(frequent_itemsets.keys())
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            union_itemset = items[i].union(items[j])
            if len(union_itemset) == length:
                combined.add(union_itemset)
    return combined

def print_itemsets(frequent_itemsets, eliminated_itemsets):
    print("Frequent Itemsets:")
    for itemset, support in frequent_itemsets.items():
        print(f"{set(itemset)}: {support:.3f}")
    if eliminated_itemsets:
        print("\nEliminated Itemsets:")
        for itemset, support in eliminated_itemsets.items():
            print(f"{set(itemset)}: {support:.3f}")

def generate_rules(frequent_itemsets, min_confidence):
    # Rule generation: split each frequent itemset of size >= 2 into
    # antecedent -> consequent and keep rules meeting min_confidence.
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) > 1:
            for subset in combinations(itemset, len(itemset) - 1):
                subset = frozenset(subset)
                remainder = itemset - subset
                confidence = frequent_itemsets[itemset] / frequent_itemsets[subset]
                if confidence >= min_confidence:
                    rules.append((subset, remainder, confidence))
    return rules

def print_rules(rules):
    print("\nRules:")
    for rule in rules:
        antecedent, consequent, confidence = rule
        print(f"{set(antecedent)} -> {set(consequent)}: {confidence:.3f}")

if __name__ == "__main__":
    data = [
        ['milk', 'bread', 'butter'],
        ['beer', 'bread', 'butter'],
        ['milk', 'bread'],
        ['beer', 'butter'],
        ['milk', 'bread', 'beer', 'butter'],
        ['bread', 'butter'],
        ['beer', 'bread', 'butter']
    ]
    min_support = 0.3
    min_confidence = 0.7

    # Level 1: frequent single items.
    itemset, transactions = get_itemset_transactions(data)
    frequent_itemsets, eliminated_itemsets = get_frequent_itemsets(
        itemset, transactions, min_support)
    iteration = 1
    print(f"\nIteration {iteration}: Frequent 1-itemsets")
    print_itemsets(frequent_itemsets, eliminated_itemsets)

    # Level k: join, count support, and repeat until no new frequent itemsets appear.
    k = 2
    while True:
        combined_itemsets = combine_frequent_itemsets(frequent_itemsets, k)
        next_frequent_itemsets, eliminated_itemsets = get_frequent_itemsets(
            combined_itemsets, transactions, min_support)
        if not next_frequent_itemsets:
            break
        iteration += 1
        print(f"\nIteration {iteration}: Frequent {k}-itemsets")
        print_itemsets(next_frequent_itemsets, eliminated_itemsets)
        frequent_itemsets.update(next_frequent_itemsets)
        k += 1

    rules = generate_rules(frequent_itemsets, min_confidence)
    print_rules(rules)
Output:
Conclusion: In this experiment, the Apriori algorithm effectively identified
frequent itemsets and generated association rules from transactional data. The
results demonstrated the relationships between items, with support and
confidence metrics guiding the selection of significant patterns. Overall, the
analysis provides valuable insights for decision-making in retail and marketing
strategies.