Association pattern mining

Subhasis Ray

2023-04-12



Motivation

Are customers buying cereals likely to buy milk?

Credit: mroach on flickr (https://www.flickr.com/photos/mroach/5196749464)



Market-basket
What items are frequently purchased together by customers?
You have a universe of items U, e.g., the items sold at a
supermarket.
Each transaction Ti is a set of items, Ti ⊆ U.
An itemset is a set of items.
A k-itemset is an itemset containing k items.

A database of transactions
tid itemset
1 Apple, Coke, DVD
2 Bread, Coke, Egg
3 Apple, Bread, Coke, Egg
4 Bread, Egg

tid: unique identifier for each transaction in the database




Market-basket
tid itemset
1 Apple, Coke, DVD
2 Bread, Coke, Egg
3 Apple, Bread, Coke, Egg
4 Bread, Egg

U = {Apple, Bread, Coke, DVD, Egg, Fish}
{Apple, Bread} is a 2-itemset.
We can encode each basket as a boolean vector that marks the
presence (1) or absence (0) of each item in U:

tid  itemset                   Apple  Bread  Coke  DVD  Egg  Fish
1    Apple, Coke, DVD          1      0      1     1    0    0
2    Bread, Coke, Egg          0      1      1     0    1    0
3    Apple, Bread, Coke, Egg   1      1      1     0    1    0
4    Bread, Egg                0      1      0     0    1    0
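
The encoding in the table above can be sketched in a few lines of Python. This is only an illustration: the names `universe`, `transactions` and `encode` are chosen here and do not come from the slides, and the item order is simply the alphabetical order used in the table.

```python
# Universe of items, in the column order used by the table above.
universe = ["Apple", "Bread", "Coke", "DVD", "Egg", "Fish"]

# The four example transactions, keyed by tid.
transactions = {
    1: {"Apple", "Coke", "DVD"},
    2: {"Bread", "Coke", "Egg"},
    3: {"Apple", "Bread", "Coke", "Egg"},
    4: {"Bread", "Egg"},
}

def encode(basket, items):
    """Return a 0/1 vector with one entry per item in `items`."""
    return [1 if item in basket else 0 for item in items]

encoded = {tid: encode(basket, universe) for tid, basket in transactions.items()}

for tid, row in encoded.items():
    print(tid, row)
```

Summing each column of the encoded rows gives the support-count of the corresponding single item, which is why this representation is convenient for counting.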


Association rules
By analyzing these boolean vectors, we can find buying
patterns: which items are frequently associated or purchased
together.
We want to discover association rules like:
Cereals => Milk [support=2%, confidence=60%]
support: the fraction of transactions in the database that
contain an itemset I as a subset. Here, 2% of the
transactions in our database contain both Cereals and
Milk.
support-count: the number of transactions in the
database that contain the itemset I.
confidence: the fraction of transactions containing the left
side (antecedent) of the rule that also contain the right
side (consequent). Here, 60% of customers who bought
Cereals also bought Milk.


Support and confidence

In terms of probabilities
support(A => B) = P(A ∪ B). Here A ∪ B is the union of the
two itemsets, so P(A ∪ B) is the probability that a
transaction contains every item of both A and B (note the
reversal from probability notation, where the joint
occurrence of two events is written P(A ∩ B)).
confidence(A => B) = P(B | A)
                   = P(A ∪ B) / P(A)
                   = support(A ∪ B) / support(A)
                   = support-count(A ∪ B) / support-count(A)
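
As a rough illustration of these definitions, the sketch below computes support-count, support and confidence on the four-transaction example database. The function names are chosen here for illustration, and the rule Apple => Coke is just an arbitrary example. Exact fractions are used so the results are easy to check by hand.

```python
from fractions import Fraction

# The four example transactions from the market-basket slides.
transactions = [
    {"Apple", "Coke", "DVD"},
    {"Bread", "Coke", "Egg"},
    {"Apple", "Bread", "Coke", "Egg"},
    {"Bread", "Egg"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """Fraction of transactions that contain `itemset`."""
    return Fraction(support_count(itemset, db), len(db))

def confidence(antecedent, consequent, db):
    """support-count(A ∪ B) / support-count(A) for the rule A => B."""
    both = set(antecedent) | set(consequent)
    return Fraction(support_count(both, db), support_count(antecedent, db))

print(support({"Apple", "Coke"}, transactions))       # 1/2
print(confidence({"Apple"}, {"Coke"}, transactions))  # 1
```

Note that `confidence` raises `ZeroDivisionError` when the antecedent never occurs in the database (for example, Fish here); the rule is simply undefined in that case.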



Exercise

Calculate the support for each item
Calculate the support for the itemset {Bread, Egg}
Calculate the confidence for the rule Bread => Coke
tid itemset
1 Apple, Coke, DVD
2 Bread, Coke, Egg
3 Apple, Bread, Coke, Egg
4 Bread, Egg



Frequent itemset

Given a minimum support(-count) threshold minsup, a frequent
itemset is an itemset that occurs in at least a minsup
fraction of the transactions in our database (or, for a
support-count threshold, in at least minsup transactions).
The set of frequent k-itemsets is denoted Lk or Fk.
Frequent itemset mining: finding all itemsets that occur
in at least a minsup fraction of the transactions in our
database.
Support monotonicity: the support of every subset J of an
itemset I is at least equal to that of I:
support(J) >= support(I) ∀J ⊆ I
Downward closure/Apriori property: every subset of a
frequent itemset is also frequent!
Question: How many subsets exist for a frequent 10-itemset?
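
Support monotonicity can be checked directly on the four-transaction example database. The snippet below is only a sketch: the itemset {Bread, Coke, Egg} is an arbitrary choice, and `support_count` is a helper name introduced here.

```python
from itertools import combinations

# The four example transactions from the market-basket slides.
db = [
    {"Apple", "Coke", "DVD"},
    {"Bread", "Coke", "Egg"},
    {"Apple", "Bread", "Coke", "Egg"},
    {"Bread", "Egg"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t)

itemset = frozenset({"Bread", "Coke", "Egg"})
base = support_count(itemset, db)

# All non-empty proper subsets of the chosen itemset.
subsets = [
    frozenset(c)
    for r in range(1, len(itemset))
    for c in combinations(sorted(itemset), r)
]

for s in subsets:
    # Each subset must occur at least as often as the full itemset.
    print(sorted(s), support_count(s, db), ">=", base)

monotone = all(support_count(s, db) >= base for s in subsets)
print("monotonicity holds:", monotone)
```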


Max-itemset

A maximal frequent itemset or max-itemset is a frequent
itemset in the database such that no proper superset of it
is frequent.
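
To make the definition concrete, the following sketch lists the maximal frequent itemsets of the four-transaction example database. The threshold (a support-count of 2) is an assumption made here for illustration, and the enumeration is exhaustive, which is only feasible because the example is tiny.

```python
from itertools import combinations

# The four example transactions from the market-basket slides.
db = [
    {"Apple", "Coke", "DVD"},
    {"Bread", "Coke", "Egg"},
    {"Apple", "Bread", "Coke", "Egg"},
    {"Bread", "Egg"},
]
minsup_count = 2  # assumed threshold, not given in the slides

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t)

items = sorted(set().union(*db))

# Exhaustively enumerate every non-empty itemset and keep the frequent ones.
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support_count(c, db) >= minsup_count
]

# An itemset is maximal if no other frequent itemset strictly contains it.
maximal = [s for s in frequent if not any(s < other for other in frequent)]

print(sorted(sorted(s) for s in maximal))
# [['Apple', 'Coke'], ['Bread', 'Coke', 'Egg']]
```

By downward closure, the nine frequent itemsets of this example are exactly the non-empty subsets of these two maximal itemsets.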



Association Rules

The rule A => B is called an association rule at a minimum
support of minsup and minimum confidence of minconf if it
satisfies the following:
1. The support of the itemset A ∪ B is at least minsup, and
2. The confidence of the rule A => B is at least minconf.
Criterion 1 ensures that the itemsets are frequent enough to
be relevant, and criterion 2 ensures their association is strong
enough in terms of conditional probability.



Steps to generate association rules

1. Generate all frequent itemsets at the minimum support
   minsup
2. Generate association rules from the frequent itemsets at
   the minimum confidence level minconf
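
The two steps above can be sketched on the four-transaction example database as follows. This is a naive illustration, not an efficient miner: step 1 enumerates every possible itemset, and the thresholds minsup = 0.5 and minconf = 0.8 as well as the function names are assumptions made here.

```python
from itertools import combinations

# The four example transactions from the market-basket slides.
db = [
    {"Apple", "Coke", "DVD"},
    {"Bread", "Coke", "Egg"},
    {"Apple", "Bread", "Coke", "Egg"},
    {"Bread", "Egg"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, minsup):
    """Step 1: map each frequent itemset to its support-count (exhaustive)."""
    items = sorted(set().union(*db))
    min_count = minsup * len(db)
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            count = support_count(cand, db)
            if count >= min_count:
                result[frozenset(cand)] = count
    return result

def association_rules(frequent, minconf):
    """Step 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # frequent[a] exists because of downward closure.
                conf = count / frequent[a]
                if conf >= minconf:
                    rules.append((sorted(a), sorted(itemset - a), conf))
    return rules

frequent = frequent_itemsets(db, minsup=0.5)
rules = association_rules(frequent, minconf=0.8)
for antecedent, consequent, conf in rules:
    print(antecedent, "=>", consequent, f"confidence={conf:.2f}")
```

Step 2 never rescans the database: every confidence is a ratio of two support-counts that step 1 has already computed.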



Brute-force approach

1. Generate all candidate itemsets C: $|C| = 2^{|U|} - 1$
   (excluding the empty set) for a universe U.
2. Count the support for each candidate in the transaction
   database.
For $|U| = 1000$, $|C| = 2^{1000} - 1 > 10^{300}$
If computing support count for each candidate takes 1 µs,
checking C will take $10^{300} \times 10^{-6} = 10^{294}$ s
Age of the universe ~14 billion years = $14 \times 10^{9} \times
365 \times 86400$ s = $4.4 \times 10^{17}$ s
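
The arithmetic above is easy to reproduce with Python's arbitrary-precision integers. The snippet is just a back-of-the-envelope check; the variable names are chosen here, and the 1 µs cost per candidate is the slide's own assumption.

```python
import math

n_items = 1000
n_candidates = 2 ** n_items - 1          # all non-empty subsets of U
seconds_per_candidate = 1e-6             # 1 microsecond, as assumed above

total_seconds = n_candidates * seconds_per_candidate
age_of_universe_s = 14e9 * 365 * 86400   # ~14 billion years in seconds

print(f"log10(|C|)          ~ {math.log10(n_candidates):.1f}")
print(f"log10(total time/s) ~ {math.log10(total_seconds):.1f}")
print(f"age of universe     ~ {age_of_universe_s:.1e} s")
```

The exact count is slightly above the rounded bound used on the slide, so the estimated time is a lower bound.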



Brute-force approach - 2

Notice that if no k-itemset is frequent, then no
(k+1)-itemset is frequent.
Start with all 1-itemsets and count their support.
Construct all 2-itemsets from the above.
Continue with increasing length, stopping at the first length
for which none of the candidates is frequent; call the last
length examined k.
k ≪ |U| (a supermarket may have 10,000 items, but it is
unlikely that individual customers frequently buy more
than 50 items).
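
A minimal sketch of this level-wise procedure on the four-transaction example database is shown below. It assumes a support-count threshold of 2 and, in line with the candidate count on the next slide, generates all itemsets of each length over U rather than only those built from frequent ones, so it is not the Apriori algorithm. The function name and the returned candidate counter are introduced here for illustration.

```python
from itertools import combinations

# The four example transactions and the universe from the earlier slides.
db = [
    {"Apple", "Coke", "DVD"},
    {"Bread", "Coke", "Egg"},
    {"Apple", "Bread", "Coke", "Egg"},
    {"Bread", "Egg"},
]
universe = ["Apple", "Bread", "Coke", "DVD", "Egg", "Fish"]

def levelwise_frequent(db, universe, minsup_count):
    """Count all 1-itemsets, then all 2-itemsets, and so on.

    Stops at the first length with no frequent itemset. Returns the
    frequent itemsets with their counts and the number of candidates checked.
    """
    frequent = {}
    n_checked = 0
    for k in range(1, len(universe) + 1):
        level = {}
        for cand in combinations(sorted(universe), k):
            n_checked += 1
            count = sum(1 for t in db if set(cand) <= t)
            if count >= minsup_count:
                level[frozenset(cand)] = count
        if not level:
            break  # no frequent k-itemset, so no longer one can be frequent
        frequent.update(level)
    return frequent, n_checked

frequent, n_checked = levelwise_frequent(db, universe, minsup_count=2)
print(len(frequent), "frequent itemsets;", n_checked, "candidates checked")
```

On this example the loop stops after length 4, so 56 of the 63 possible non-empty itemsets are checked; the saving grows dramatically as |U| increases.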



Brute-force approach - 2

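Now $|C| = \sum_{i=1}^{k} \binom{|U|}{i}$
What is |C| with |U| = 1000 and k = 10?
~ $2.7 \times 10^{23}$
About 277 orders of magnitude fewer candidates than
$2^{1000} - 1$!
But at 1 µs per candidate this still takes about
$2.7 \times 10^{17}$ s (roughly 8.4 billion years), comparable to
the age of the universe!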
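
The candidate count for |U| = 1000 and k = 10 can be checked with `math.comb` (available from Python 3.8). This is only a sanity check of the arithmetic; the variable names are chosen here, and the 1 µs cost per candidate is the assumption carried over from the first brute-force slide.

```python
import math

n_items, k_max = 1000, 10

# Number of itemsets of length 1..k_max over a universe of n_items items.
n_candidates = sum(math.comb(n_items, i) for i in range(1, k_max + 1))

seconds = n_candidates * 1e-6            # 1 microsecond per candidate
years = seconds / (365 * 86400)

# Reduction relative to enumerating all 2^1000 - 1 non-empty itemsets.
orders_saved = math.log10(2 ** n_items - 1) - math.log10(n_candidates)

print(f"|C| ~ {n_candidates:.2e}")
print(f"time ~ {seconds:.2e} s ~ {years:.2e} years")
print(f"orders of magnitude saved ~ {orders_saved:.1f}")
```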



Thank you!

References
Charu Aggarwal
Han, Kamber, Pei
Hongbo Du
