Association: Market Basket Analysis

What Is Association Rule Mining?

 Association rule mining searches for relationships between items in a dataset:

 ⚫ It aims at discovering associations between items in a transactional database.

[Figure: a store's transactions {a, b, c, d, …} are mined to find combinations of items, e.g. {x, y, z}, that typically occur together.]

• Rule form: "Body → Head [support, confidence]".

 buys(x, "bread") → buys(x, "milk") [0.6%, 65%]
 major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]

Transactional Databases

Transaction                      Frequent itemset     Rule
{bread, milk, pop, …}            (bread, milk)        bread → milk
{term1, term2, …, termn}         (term2, term25)      term2 → term25   (automatic diagnostics)
{f1, f2, …, fn}                  (f3, f5, fα)         f3 ^ f5 → fα

Association Rule Mining

 Given a set of transactions, find rules that will predict the occurrence of an
item based on the occurrences of other items in the transaction.

Market-basket transactions: example association rules

 {Diaper} → {Beer}

 {Milk, Bread} → {Eggs, Coke}

 {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!


The model: data

• I = {i1, i2, …, im}: a set of items.

• Transaction t: a set of items such that t ⊆ I.

• Transaction database T: a set of transactions T = {t1, t2, …, tn}.

Transaction data: supermarket data

• Market basket transactions:


t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}

• Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: items purchased in a basket; it may have TID (transaction ID)
• A transactional dataset: A set of transactions
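
For concreteness, one simple way to hold such market-basket data in code is as a list of item sets; the short Python sketch below is illustrative only (item names follow the slide) and also recovers I, the set of all items sold.

# Illustrative sketch: market-basket transactions as a list of Python sets.
transactions = [
    {"bread", "cheese", "milk"},          # t1
    {"apple", "eggs", "salt", "yogurt"},  # t2
    {"biscuit", "eggs", "milk"},          # tn
]

# I: the set of all items that appear in any transaction
I = set().union(*transactions)
print(I)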

Example: Transaction data

Consider the following four sets of items (itemsets) bought together:

1.{bread, diapers, milk}

2.{beer, diapers, butter}

3.{bread, beer, diapers, butter}

4.{beer, butter}

Example: Transaction data

Transaction id Items

t1 {1, 2, 4, 5}
t2 {2, 3, 5}
t3 {1, 2, 4, 5}
t4 {1, 2, 3, 5}
t5 {1, 2, 3, 4, 5}
t6 {2, 3, 4}

The model: rules

• A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.

• An association rule is an implication of the form:

 X → Y, where X, Y ⊂ I, and X ∩ Y = ∅

• An itemset is a set of items.

 • E.g., X = {milk, bread, cereal} is an itemset.

• A k-itemset is an itemset with k items.

 • E.g., {milk, bread, cereal} is a 3-itemset.

What does an association rule look like?

A `rule' is something like this:

 If a basket contains Bread and Egg, then it also contains Milk.

• It consists of an antecedent and a consequent, each of which is a list of items.

• For a given rule, the itemset is the list of all the items in the antecedent and the
consequent.

An association rule is a pattern that states that when X occurs, Y occurs with a
certain probability.

Rule strength measures

• N is the number of transactions in the transaction database,

• sup(X ∪ Y) is the number of transactions containing both X and Y,

• sup(X) is the number of transactions containing X,

• sup(Y) is the number of transactions containing Y.

Rule strength measures

• Support:

 support(X → Y) = sup(X ∪ Y) / N

• This measure gives an idea of how frequent an itemset is in all the transactions.

• Coverage or support – how much of the database contains the `if' part?

Rule strength measures
• Confidence:

 confidence(X → Y) = sup(X ∪ Y) / sup(X)

• This measure defines how likely the consequent is to be in the cart given that
the cart already contains the antecedent.

• Technically, confidence is the conditional probability of occurrence of the
consequent given the antecedent.

• Confidence – when the `if' part is true, how often is the `then' part true? This
is the same as accuracy.

Rule strength measures

• Lift:

 lift(X → Y) = confidence(X → Y) / (sup(Y) / N)

• The lift of the rule X → Y is the confidence of the rule divided by the expected
confidence, assuming that the itemsets X and Y are independent of each other.

• The expected confidence is simply the frequency (support) of {Y}.

Rule strength measures

• Lift: lift(X → Y) = confidence(X → Y) / (sup(Y) / N)

• A lift value near 1 indicates that X and Y appear together about as often as
expected,

• greater than 1 means they appear together more often than expected, and

• less than 1 means they appear together less often than expected.

• Greater lift values indicate a stronger association.

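To make the three measures concrete, here is a minimal Python sketch (not from the slides) that computes support, confidence and lift for the {Diaper} → {Beer} rule mentioned earlier; the five-transaction dataset is purely illustrative.

# Illustrative sketch of the rule-strength measures for a rule X -> Y.
def support(X, Y, transactions):
    """Fraction of transactions that contain every item in X and in Y."""
    both = X | Y
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(X, Y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(X, Y, transactions) / support(X, set(), transactions)

def lift(X, Y, transactions):
    """Confidence of X -> Y divided by the baseline frequency of Y."""
    return confidence(X, Y, transactions) / support(Y, set(), transactions)

# A made-up five-basket dataset, just to exercise the functions.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]
X, Y = {"diaper"}, {"beer"}
print(support(X, Y, transactions))     # 0.6
print(confidence(X, Y, transactions))  # 0.75
print(lift(X, Y, transactions))        # 1.25 -> appear together more than expected
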
What Is An Itemset?

• A set of items together is called an itemset. If an itemset has k items, it is called a
k-itemset.

• An itemset may consist of one or more items. An itemset that occurs frequently is
called a frequent itemset.

• Thus frequent itemset mining is a data mining technique to identify the items that
often occur together.

• For example: bread and butter, laptop and antivirus software, etc.

What Is a Frequent Itemset?

• A set of items is called frequent if it satisfies a minimum threshold value for support
and confidence. Support reflects how often the items are purchased together in a
single transaction; confidence reflects how often the consequent is purchased when
the antecedent is already in the transaction.

• For frequent itemset mining, we consider only those itemsets and rules which meet
the minimum support and confidence thresholds. Insights from these mining
algorithms offer many benefits, such as cost-cutting and improved competitive
advantage.

• There is a trade-off between the time taken to mine the data and the volume of data
to be mined. Frequent itemset mining algorithms aim to find the hidden patterns of
itemsets in a short time and with low memory consumption.

Example

TID  Items
1    Milk, Egg, Bread, Butter
2    Milk, Butter, Egg, Ketchup
3    Bread, Butter, Ketchup
4    Milk, Bread, Butter
5    Bread, Butter, Cookies
6    Milk, Bread, Butter, Cookies
7    Milk, Cookies
8    Milk, Bread, Butter
9    Bread, Butter, Egg, Cookies
10   Milk, Butter, Bread
11   Milk, Bread, Butter
12   Milk, Bread, Cookies, Ketchup

Rule strength measures

Support(Milk)?

Support(Egg)?

Support(Milk -> Bread)?

Confidence(Milk -> Bread)?

Lift(Milk -> Bread)?

(These are worked out in the sketch after the frequency table below.)

Example
Item Frequency
Milk 9
Egg 3
Bread 10
Butter 10
Cookies 5
Ketchup 3

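The questions above can be answered directly from the 12-transaction table; the Python sketch below (an illustration, not part of the slides) computes them.

# The 12 transactions from the example table above.
transactions = [
    {"Milk", "Egg", "Bread", "Butter"},    {"Milk", "Butter", "Egg", "Ketchup"},
    {"Bread", "Butter", "Ketchup"},        {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Cookies"},        {"Milk", "Bread", "Butter", "Cookies"},
    {"Milk", "Cookies"},                   {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Egg", "Cookies"}, {"Milk", "Butter", "Bread"},
    {"Milk", "Bread", "Butter"},           {"Milk", "Bread", "Cookies", "Ketchup"},
]
N = len(transactions)

def sup(items):
    """Number of transactions containing all the given items."""
    return sum(1 for t in transactions if items <= t)

print(sup({"Milk"}) / N)           # Support(Milk)          = 9/12 = 0.75
print(sup({"Egg"}) / N)            # Support(Egg)           = 3/12 = 0.25
print(sup({"Milk", "Bread"}) / N)  # Support(Milk -> Bread) = 7/12 ≈ 0.58
conf = sup({"Milk", "Bread"}) / sup({"Milk"})
print(conf)                        # Confidence(Milk -> Bread) = 7/9 ≈ 0.78
print(conf / (sup({"Bread"}) / N)) # Lift(Milk -> Bread)       ≈ 0.93 (slightly below 1)
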
Apriori Algorithm in Data Mining
1: Find all large (frequent) 1-itemsets
2: For (k = 2; while L(k-1) is non-empty; k++)
3: {  Ck = apriori-gen(L(k-1))
4:    For each c in Ck, initialise c.count to zero
5:    For all records r in the DB
6:       { Cr = subset(Ck, r); for each c in Cr, c.count++ }
7:    Set Lk := all c in Ck whose count >= minsup
8: }  /* end -- return all of the Lk sets */

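As a concrete companion to the pseudocode, here is a minimal Python sketch of the level-wise Apriori loop (an illustrative implementation, not the original course code): generate candidates by joining frequent (k-1)-itemsets, prune candidates that have an infrequent (k-1)-subset, count the survivors against the database, and keep those meeting minsup.

from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset(itemset): count} for every itemset with count >= minsup."""
    transactions = [set(t) for t in transactions]

    # L1: count individual items and keep the frequent ones
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= minsup}
    frequent = dict(L)

    k = 2
    while L:
        # Join step: merge frequent (k-1)-itemsets whose union has exactly k items
        prev = list(L)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step: drop candidates that contain an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Count the surviving candidates against the database and keep Lk
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        L = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(L)
        k += 1
    return frequent

Running apriori on the 12-transaction example below with minsup = 4 (33% of 12) yields {Milk, Bread, Butter} as the largest frequent itemset, matching the iterations worked out later.
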
Apriori Algorithm in Data Mining
A two-step process is followed, consisting of join and prune actions, as follows (a
small worked illustration appears below):

1. Join step: this step generates candidate (k+1)-itemsets from the frequent
k-itemsets by joining the set of frequent k-itemsets with itself.

2. Prune step: this step scans the count of each candidate itemset in the database.
If a candidate itemset does not meet the minimum support, it is regarded as
infrequent and removed. This step reduces the size of the candidate itemsets.

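A small, self-contained illustration of the two steps (an aside, not from the slides), using the frequent 2-itemsets that Iteration 2 of the example below produces; here the prune is the classical subset-based check, whereas the worked example that follows simply counts every joined candidate.

from itertools import combinations

# Frequent 2-itemsets (support count >= 4 in the 12-transaction example below)
L2 = [frozenset(s) for s in ({"Milk", "Bread"}, {"Milk", "Butter"},
                             {"Bread", "Butter"}, {"Bread", "Cookies"})]

# Join: union pairs of frequent 2-itemsets that overlap in one item -> 3-item candidates
candidates = {a | b for a in L2 for b in L2 if a != b and len(a | b) == 3}

# Prune: keep a candidate only if every one of its 2-item subsets is itself frequent
frequent_2 = set(L2)
pruned = {c for c in candidates
          if all(frozenset(s) in frequent_2 for s in combinations(c, 2))}

print([sorted(c) for c in candidates])  # three candidate 3-itemsets
print([sorted(c) for c in pruned])      # only {Bread, Butter, Milk} survives the prune
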
Apriori Algorithm in Data Mining
The Apriori Algorithm: Pseudo Code

Ck: candidate itemsets of size k

Lk: frequent itemsets of size k

Apriori Algorithm: Find Association Rules
Here are a dozen sales transactions.

Find which products sell together often (that is, affinities between products).

The minimum support level will be set at 33% and the confidence level will be set
at 50%.

Apriori Algorithm Example

TID  Items
1    Milk, Egg, Bread, Butter
2    Milk, Butter, Egg, Ketchup
3    Bread, Butter, Ketchup
4    Milk, Bread, Butter
5    Bread, Butter, Cookies
6    Milk, Bread, Butter, Cookies
7    Milk, Cookies
8    Milk, Bread, Butter
9    Bread, Butter, Egg, Cookies
10   Milk, Butter, Bread
11   Milk, Bread, Butter
12   Milk, Bread, Cookies, Ketchup

Apriori Algorithm: Iteration 1

1-Item Sets     Frequency        Frequent 1-Item Sets   Frequency
Milk            9                Milk                   9
Egg             3                Bread                  10
Bread           10               Butter                 10
Butter          10               Cookies                5
Cookies         5
Ketchup         3

Apriori Algorithm: Iteration 2 (from frequent items: Milk, Bread, Butter, Cookies)

2-Item Sets        Frequency        Frequent 2-Item Sets   Frequency
Milk, Bread        7                Milk, Bread            7
Milk, Butter       7                Milk, Butter           7
Milk, Cookies      3                Bread, Butter          9
Bread, Butter      9                Bread, Cookies         4
Bread, Cookies     4
Butter, Cookies    3

Apriori Algorithm: Iteration 3 (from frequent items: Milk, Bread, Butter, Cookies)

3-Item Sets              Frequency        Frequent 3-Item Sets    Frequency
Milk, Bread, Butter      6                Milk, Bread, Butter     6
Milk, Butter, Cookies    1
Bread, Butter, Cookies   3
Milk, Bread, Cookies     2

Apriori Algorithm: Iteration 4

With only one frequent 3-itemset ({Milk, Bread, Butter}), no candidate 4-item sets
can be generated, so the algorithm stops here.

Frequent 3-Item Sets     Frequency
Milk, Bread, Butter      6

Apriori Algorithm: Association Rule Formation

Frequent 3-item set I = {Milk, Bread, Butter}

Non-empty proper subsets are = { (Milk), (Bread), (Butter),
 (Milk, Bread), (Milk, Butter), (Bread, Butter) }

Apriori Algorithm: Association Rule Formation

How to form an association rule …

For every non-empty proper subset S of I, the association rule is:

• S -> (I - S)
• if support(I) / support(S) >= minimum_confidence

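As an illustration of this step (not part of the original slides), the sketch below enumerates every non-empty proper subset S of I = {Milk, Bread, Butter} and keeps the rules S -> (I - S) whose confidence meets the 60% threshold, using the support counts found in the iterations above; the six slides that follow work the same rules out by hand.

from itertools import combinations

# Support counts taken from the worked example (N = 12 transactions).
sup = {frozenset(s): c for s, c in [
    ({"Milk"}, 9), ({"Bread"}, 10), ({"Butter"}, 10),
    ({"Milk", "Bread"}, 7), ({"Milk", "Butter"}, 7), ({"Bread", "Butter"}, 9),
    ({"Milk", "Bread", "Butter"}, 6),
]}
I = frozenset({"Milk", "Bread", "Butter"})
min_confidence = 0.60

for k in range(1, len(I)):                      # proper subsets only
    for S in map(frozenset, combinations(I, k)):
        confidence = sup[I] / sup[S]
        if confidence >= min_confidence:
            print(f"{sorted(S)} -> {sorted(I - S)}  confidence = {confidence:.2f}")
# All six rules pass the 60% threshold, as the following slides confirm.
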
Apriori Algorithm: Association Rule Formation
 Non-empty proper subsets are = {(Milk), (Bread), (Butter), (Milk, Bread), (Milk,
Butter), (Bread, Butter)}

 Min_support = 30% and Min_Confidence = 60%

 Rule 1: {Milk} -> {Bread, Butter}

  support(Milk, Bread, Butter) = 6/12
  support(Milk) = 9/12
  Confidence = support(Milk, Bread, Butter) / support(Milk)
 = 6/12 * 12/9 = 6/9
 = 66.67% >= 60%
 √ {Milk} -> {Bread, Butter}

Apriori Algorithm: Association Rule Formation
 Rule 2: {Bread} -> {Milk, Butter}
  support(Milk, Bread, Butter) = 6/12
  support(Bread) = 10/12
  Confidence = support(Milk, Bread, Butter) / support(Bread)
 = 6/12 * 12/10 = 6/10
 = 60.00% >= 60%
 √ {Bread} -> {Milk, Butter}

 Rule 3: {Butter} -> {Milk, Bread}

  support(Milk, Bread, Butter) = 6/12
  support(Butter) = 10/12
  Confidence = support(Milk, Bread, Butter) / support(Butter)
 = 6/12 * 12/10 = 6/10
 = 60.00% >= 60%
 √ {Butter} -> {Milk, Bread}

Apriori Algorithm: Association Rule Formation
 Rule 4: {Milk, Bread} -> {Butter}
  support(Milk, Bread, Butter) = 6/12
  support(Milk, Bread) = 7/12
  Confidence = support(Milk, Bread, Butter) / support(Milk, Bread)
 = 6/12 * 12/7 = 6/7
 = 85.7% >= 60%
 √ {Milk, Bread} -> {Butter}

 Rule 5: {Milk, Butter} -> {Bread}

  support(Milk, Bread, Butter) = 6/12
  support(Milk, Butter) = 7/12
  Confidence = support(Milk, Bread, Butter) / support(Milk, Butter)
 = 6/12 * 12/7 = 6/7
 = 85.7% >= 60%
 √ {Milk, Butter} -> {Bread}

Apriori Algorithm: Association Rule Formation
 Rule 6: {Butter, Bread} -> {Milk}
  support(Milk, Bread, Butter) = 6/12
  support(Butter, Bread) = 9/12
  Confidence = support(Milk, Bread, Butter) / support(Butter, Bread)
 = 6/12 * 12/9 = 6/9
 = 66.67% >= 60%
 √ {Butter, Bread} -> {Milk}

Thanks
