Association: Market Basket Analysis
WHAT IS ASSOCIATION RULE MINING?
Association rule mining searches for relationships between items in a dataset:
• it aims at discovering associations between items in a transactional database.
Store: items {a,b,c,d…} -> find combinations of items {x,y,z} that typically occur together
Automatic diagnostic: terms {term1, term2,…,termn} -> find terms that typically occur together, e.g. (term2, term25)
Given a set of transactions, find rules that will predict the occurrence of an
item based on the occurrences of other items in the transaction.
Market-Basket transactions
Example of Association Rules
{Diaper} -> {Beer}
Transaction data: supermarket data
• Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: the items purchased in one basket; it may have a TID (transaction ID)
• A transactional dataset: A set of transactions
Example: Transaction data
4. {beer, butter}
Transaction id   Items
t1               {1, 2, 4, 5}
t2               {2, 3, 5}
t3               {1, 2, 4, 5}
t4               {1, 2, 3, 5}
t5               {1, 2, 3, 4, 5}
t6               {2, 3, 4}
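To make the concepts concrete, here is one possible way to hold this transactional dataset in Python: each transaction is a frozenset keyed by its TID, the item set I is taken as the union of all baskets, and a Counter gives how many transactions contain each single item. Deriving I from the data is an assumption for illustration (a real store may sell items that appear in no basket), and the variable names are hypothetical.

```python
from collections import Counter

# The six transactions t1..t6 from the table, keyed by transaction ID (TID).
transactions = {
    "t1": frozenset({1, 2, 4, 5}),
    "t2": frozenset({2, 3, 5}),
    "t3": frozenset({1, 2, 4, 5}),
    "t4": frozenset({1, 2, 3, 5}),
    "t5": frozenset({1, 2, 3, 4, 5}),
    "t6": frozenset({2, 3, 4}),
}

# I: every item that appears in at least one transaction.
I = frozenset().union(*transactions.values())

# Number of transactions that contain each single item.
item_counts = Counter(item for basket in transactions.values() for item in basket)

for item in sorted(I):
    print(item, item_counts[item])
```

Using frozensets makes "does this transaction contain itemset Z?" a simple subset test (`Z <= basket`).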
The model: rules
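What exactly does an association rule look like?
• An association rule has the form X => Y ("if X, then Y"), where the antecedent X and the consequent Y are disjoint itemsets. For a given rule, the itemset is the set of all items in the antecedent and the consequent.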
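A minimal sketch of a rule as a Python data structure, using the {Diaper} -> {Beer} example. The class name `Rule` is hypothetical, and the checks that both sides are non-empty and share no items follow the usual convention for association rules rather than anything stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical representation of an association rule X => Y."""

    antecedent: frozenset  # the 'if' part, X
    consequent: frozenset  # the 'then' part, Y

    def __post_init__(self) -> None:
        # Usual convention: both sides are non-empty and share no items.
        if not self.antecedent or not self.consequent:
            raise ValueError("antecedent and consequent must be non-empty")
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must be disjoint")

    @property
    def itemset(self) -> frozenset:
        """All items in the antecedent and the consequent."""
        return self.antecedent | self.consequent


rule = Rule(frozenset({"Diaper"}), frozenset({"Beer"}))
print(sorted(rule.itemset))  # ['Beer', 'Diaper']
```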
Rule strength measures
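• Support:
$\text{support}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{N}$, where $\sigma(Z)$ is the number of transactions that contain itemset $Z$ and $N$ is the total number of transactions
• support – what fraction of the database contains both the 'if' and the 'then' parts of the rule? (The fraction that contains just the 'if' part is sometimes called the rule's coverage.)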
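As a worked illustration on the six transactions t1 to t6 from the example table, take the rule {1} => {4}, chosen here only as an example. With $\sigma(Z)$ counting the transactions that contain $Z$ and $N = 6$, items 1 and 4 occur together in t1, t3 and t5:

$$
\text{support}(\{1\} \Rightarrow \{4\}) = \frac{\sigma(\{1, 4\})}{N} = \frac{3}{6} = 0.5
$$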
• Confidence:
$\text{confidence}(X \Rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}$
• confidence – when the 'if' part is true, how often is the 'then' part also true? This is also called the accuracy of the rule.
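Continuing the illustrative rule {1} => {4} on transactions t1 to t6: item 1 appears in t1, t3, t4 and t5, and items 1 and 4 appear together in t1, t3 and t5, so

$$
\text{confidence}(\{1\} \Rightarrow \{4\}) = \frac{\text{support}(\{1, 4\})}{\text{support}(\{1\})} = \frac{3/6}{4/6} = \frac{3}{4} = 0.75
$$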
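• Lift:
$\text{lift}(X \Rightarrow Y) = \frac{\text{confidence}(X \Rightarrow Y)}{\text{support}(Y)} = \frac{\text{support}(X \cup Y)}{\text{support}(X)\,\text{support}(Y)}$
• The lift of the rule X => Y is the confidence of the rule divided by the
expected confidence, assuming that the itemsets X and Y are
independent of each other; under independence, the expected confidence is simply support(Y).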
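• Interpreting lift:
• greater than 1 means X and Y appear together more often than expected if they were independent,
• equal to 1 means X and Y are independent, and
• less than 1 means X and Y appear together less often than expected.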
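The three measures can be sketched in a few lines of Python. This is an illustration, not a library API: the function names are hypothetical, and `Fraction` is used so the results are exact ratios rather than rounded floats. The data are transactions t1 to t6 from the example table.

```python
from fractions import Fraction

# Transactions t1..t6 from the example table.
TRANSACTIONS = [
    frozenset({1, 2, 4, 5}),
    frozenset({2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({2, 3, 4}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """support(X u Y) / support(X); raises ZeroDivisionError if X never occurs."""
    return support(frozenset(x) | frozenset(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """confidence(X => Y) / support(Y): observed vs. expected under independence."""
    return confidence(x, y, transactions) / support(y, transactions)


print(support({1, 4}, TRANSACTIONS))       # 1/2
print(confidence({1}, {4}, TRANSACTIONS))  # 3/4
print(lift({1}, {4}, TRANSACTIONS))        # 9/8
```

Here lift({1} => {4}) = 9/8 > 1, so items 1 and 4 appear together slightly more often than independence would predict.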
What Is An Itemset?
• A set of items together is called an itemset. If an itemset has k items, it is called a k-itemset.
• An itemset may contain one or more items (a 1-itemset has a single item). An itemset that occurs frequently is called a frequent itemset.
• Thus, frequent itemset mining is a data mining technique to identify the items that often occur together.
• For example, bread and butter, or a laptop and antivirus software.
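A short sketch of the k-itemset idea: the helper below (a hypothetical name) lists every k-itemset contained in one basket using `itertools.combinations`. It also shows why candidate counts grow quickly: a basket with n items contains comb(n, k) distinct k-itemsets.

```python
from itertools import combinations
from math import comb


def k_itemsets(transaction, k):
    """Every k-itemset contained in a single transaction."""
    return [frozenset(c) for c in combinations(sorted(transaction), k)]


basket = {"bread", "butter", "milk"}
pairs = k_itemsets(basket, 2)
print([sorted(p) for p in pairs])
# A basket with n items contains comb(n, k) distinct k-itemsets.
print(len(pairs) == comb(len(basket), 2))  # True
```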
What Is a Frequent Itemset?
• A set of items is called frequent if its support meets a minimum support threshold. Support measures how often the items are purchased together in a single transaction. Confidence applies to rules rather than itemsets: it measures how often the consequent is purchased in the transactions that contain the antecedent.
• Frequent itemset mining keeps only the itemsets that meet the minimum support threshold; association rules generated from them are then kept only if they meet the minimum confidence threshold. Insights from these mining algorithms offer benefits such as cost-cutting and improved competitive advantage.
• There is a tradeoff between the time taken to mine the data and the volume of data. An efficient frequent itemset mining algorithm finds the hidden patterns of itemsets in a short time and with low memory consumption.
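The definition of a frequent itemset can be checked directly by brute force, which also illustrates the time/volume tradeoff: every one of the 2**n - 1 non-empty itemsets over n items is counted. This is a naive baseline, not Apriori; the function name and the threshold minsup = 0.5 are assumptions chosen for illustration, applied to transactions t1 to t6 from the example table.

```python
from itertools import combinations

# Transactions t1..t6 from the example table.
TRANSACTIONS = [
    frozenset({1, 2, 4, 5}),
    frozenset({2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({2, 3, 4}),
]


def frequent_itemsets_brute_force(transactions, minsup):
    """Count every possible itemset; keep those with support >= minsup.

    Returns {itemset: support_count}. Tries all 2**n - 1 non-empty itemsets
    over the n distinct items, so it is only practical for tiny datasets.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= minsup:
                result[itemset] = count
    return result


frequent = frequent_itemsets_brute_force(TRANSACTIONS, minsup=0.5)
print(len(frequent))  # 19
```

With 5 items this checks only 31 itemsets, but the count doubles with every extra item, which is what Apriori's pruning avoids.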
Example
Support(Milk)?
Support(Egg)?
Lift(Milk -> Bread)?
Item      Frequency
Milk      9
Egg       3
Bread     10
Butter    10
Cookies   5
Ketchup   3
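A sketch of the answers, assuming these frequencies come from the dozen sales transactions used in the Apriori example (N = 12, consistent with support(Bread) = 10/12 there). The table gives only single-item counts, so the lift can be expressed only in terms of the joint count $\sigma(\{\text{Milk}, \text{Bread}\})$, the number of transactions containing both items, which is not listed:

$$
\begin{aligned}
\text{Support}(\text{Milk}) &= \frac{9}{12} = 0.75 \\
\text{Support}(\text{Egg}) &= \frac{3}{12} = 0.25 \\
\text{Lift}(\text{Milk} \Rightarrow \text{Bread}) &= \frac{\sigma(\{\text{Milk}, \text{Bread}\})/12}{(9/12)\,(10/12)} = \frac{2\,\sigma(\{\text{Milk}, \text{Bread}\})}{15}
\end{aligned}
$$

So, under that assumption, the lift exceeds 1 only if Milk and Bread occur together in at least 8 of the 12 transactions.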
Apriori Algorithm in Data Mining
L1 = {large 1-itemsets}
for (k = 2; Lk-1 is non-empty; k++) {
    Ck = apriori-gen(Lk-1)          /* join + prune: new candidate k-itemsets */
    for each c in Ck: c.count = 0
    for each record r in DB {
        Cr = subset(Ck, r)          /* candidates contained in r */
        for each c in Cr: c.count++
    }
    Lk = {c in Ck | c.count >= minsup}
}
return the union of all Lk
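The pseudocode above can be sketched as runnable Python. This is one straightforward translation, not a reference implementation: `minsup` is taken as an absolute count (as in the pseudocode), apriori-gen is inlined as the usual sorted-prefix join followed by a subset-based prune, and items are assumed to be mutually comparable so they can be sorted. The test data are transactions t1 to t6 from the example table, with minsup = 3 chosen for illustration.

```python
from itertools import combinations

# Transactions t1..t6 from the example table.
TRANSACTIONS = [
    frozenset({1, 2, 4, 5}),
    frozenset({2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({2, 3, 4}),
]


def apriori(transactions, minsup):
    """Level-wise Apriori; returns {frozenset(itemset): count} for all large itemsets."""
    db = [frozenset(r) for r in transactions]

    def count_and_filter(candidates):
        counts = {c: 0 for c in candidates}       # c.count = 0
        for r in db:                              # one pass over the DB
            for c in candidates:
                if c <= r:                        # c belongs to subset(Ck, r)
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= minsup}

    # L1: large 1-itemsets.
    level = count_and_filter({frozenset([i]) for r in db for i in r})
    large = dict(level)
    k = 2
    while level:                                  # while L(k-1) is non-empty
        # Join: merge sorted (k-1)-itemsets that share their first k-2 items.
        prev = sorted(tuple(sorted(s)) for s in level)
        candidates = set()
        for a, b in combinations(prev, 2):
            if a[:k - 2] == b[:k - 2]:
                c = frozenset(a) | frozenset(b)
                # Prune: every (k-1)-subset of c must itself be large.
                if all(frozenset(s) in level for s in combinations(c, k - 1)):
                    candidates.add(c)
        level = count_and_filter(candidates)      # Lk
        large.update(level)
        k += 1
    return large


result = apriori(TRANSACTIONS, minsup=3)
print(len(result))  # 19
```

Each level needs one scan of the database, and the loop stops as soon as no candidate survives.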
A two-step process of join and prune actions is followed:
1. Join Step: This step generates candidate (k+1)-itemsets from the frequent
k-itemsets by joining the set Lk with itself; two k-itemsets are joined when
they share their first k-1 items.
2. Prune Step: Any candidate that has an infrequent k-subset cannot be frequent
(the Apriori property), so it is removed. The remaining candidates are counted
in a scan of the database, and any candidate that does not meet minimum support
is regarded as infrequent and removed. This step reduces the size of the
candidate itemsets.
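As a worked illustration on transactions t1 to t6 from the example table, assume a minimum support count of 3 (50%, chosen here only for illustration). The pairs {1,3} and {3,4} occur only twice, so the frequent 2-itemsets are $L_2$ below. Joining pairs that share their first item, then pruning candidates with an infrequent 2-subset, gives $C_3$:

$$
\begin{aligned}
L_2 &= \{\{1,2\}, \{1,4\}, \{1,5\}, \{2,3\}, \{2,4\}, \{2,5\}, \{3,5\}, \{4,5\}\} \\
\text{join:}\quad & \{1,2,4\}, \{1,2,5\}, \{1,4,5\}, \{2,3,4\}, \{2,3,5\}, \{2,4,5\} \\
\text{prune:}\quad & \{2,3,4\} \text{ is removed because } \{3,4\} \notin L_2 \\
C_3 &= \{\{1,2,4\}, \{1,2,5\}, \{1,4,5\}, \{2,3,5\}, \{2,4,5\}\}
\end{aligned}
$$

Counting $C_3$ against the database then shows that all five remaining candidates occur at least 3 times.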
The Apriori Algorithm: Pseudo Code
Apriori Algorithm: Find Association Rules
Here are a dozen sales transactions.
Find which products often sell together (that is, the affinities between products).
Apriori Algorithm Example
Apriori Algorithm: Iteration 2 (items: Milk, Bread, Butter, Cookies)
Apriori Algorithm: Iteration 3 (items: Milk, Bread, Butter, Cookies)
Apriori Algorithm: Iteration 3 (items: Milk, Bread, Butter)
Apriori Algorithm: Association Rule Formation
• For each frequent itemset I and each non-empty proper subset S of I, form the rule S -> (I - S)
• Keep the rule if support(I) / support(S) >= minimum_confidence
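The rule-formation step can be sketched in Python. The function names are hypothetical, and because the slides do not list the pair supports for the Milk/Bread/Butter example, the sketch uses transactions t1 to t6 from the example table, the frequent itemset I = {1, 2, 5}, and an illustrative minimum confidence of 4/5. The threshold is passed as a `Fraction` so that exact ratios are not compared against a rounded binary float such as 0.8.

```python
from fractions import Fraction
from itertools import combinations

# Transactions t1..t6 from the example table.
TRANSACTIONS = [
    frozenset({1, 2, 4, 5}),
    frozenset({2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({2, 3, 4}),
]


def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def rules_from_itemset(itemset, min_conf):
    """Rules S -> (I - S) for each non-empty proper subset S of I, kept when
    support(I) / support(S) >= min_conf. Returns (S, I - S, confidence) tuples."""
    itemset = frozenset(itemset)
    count_i = count(itemset)
    if count_i == 0:
        return []  # I never occurs, so no rule built from it has any support
    rules = []
    for size in range(1, len(itemset)):          # proper subsets only
        for subset in combinations(sorted(itemset), size):
            s = frozenset(subset)
            conf = Fraction(count_i, count(s))    # N cancels in the support ratio
            if conf >= min_conf:
                rules.append((s, itemset - s, conf))
    return rules


for s, rest, conf in rules_from_itemset({1, 2, 5}, min_conf=Fraction(4, 5)):
    print(sorted(s), "->", sorted(rest), conf)
```

Of the six candidate rules, only {2} -> {1, 5} (confidence 4/6) falls below the threshold.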
Apriori Algorithm: Association Rule Formation
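For I = {Milk, Bread, Butter}, the non-empty subsets are {(Milk), (Bread), (Butter), (Milk, Bread), (Milk,
Butter), (Bread, Butter), (Milk, Bread, Butter)}; only the proper subsets (all except I itself) give rules, since I -> {} would have an empty consequent.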
Rule 2: {Bread} -> {Milk, Butter}
support(Milk, Bread, Butter) = 6/12
support(Bread) = 10/12
Confidence = support(Milk, Bread, Butter) / support(Bread)
= 6/12 * 12/10 = 6/10
= 60.00% >= 60%
✓ {Bread} -> {Milk, Butter} is accepted
Thanks