Clickstream Analytics
Clickstream Analytics
Association Rules
Web Analytics
▪ Web Structure Analytics – one of the method is network analysis which can be applied.
▪ Web Content Analytics - Text Analytics process can be applied.
▪ Web Usage Analytics – Application in Recommender system, Clickstream Analytics
Clickstream analytics
▪ Web log data of clicks recorded over a period of time can provide insights in
form of discovering patterns
Session Id Item
item1
Session ID 1
▪ Whenever item 4 is clicked item 5 is also clicked OR
Session ID 1 item3
▪ Whenever item4 is bought item 5 is also bought OR
Session ID 1 item4
Session ID 2 item4
Session ID 2 item5
Discovering Patterns
▪ Pattern
▪ 12121?
▪ ’12’ pattern is found often enough So, with some confidence we can say ‘?’ is 2
▪ “If ‘1’ then ‘2’ follows”
▪ Pattern ➔ Model
Confidence
▪ 121212?
▪ 12121231212123121212?
▪ 121212➔ 3
▪ Association rule mining is well know for Market Basket Analysis . It can also be
applied for clickstream analysis
▪ Finding frequent patterns, associations, correlations, or causal structures among
sets of items or objects in transaction databases, relational databases, and other
information repositories is called association rule mining.
▪ To simply put, given a set of records, each of which contain some number of
items from a given collection
▪ produce dependency rules which will predict occurrence of an item based on
occurrences of other items
What Is ASSOCIATION RULE MINING?
▪ Given a set of records, let say for a fresh farm ecommerce store, each of which
contain some number of items from a given collection
▪ produce dependency rules which will identify occurrence of an item based on
occurrences of other items
▪ For example set of records is in form of transaction data table from which rules are
found
TID Itemset Rules Found:
1 Bread, Coke, Milk {Milk} => {Coke}
{ Diaper, Milk} => {Beer}
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
Association Rule Mining Basics
▪ It is assumed that we are given: (1) database of transactions, (2) each transaction
is a list of items (purchased/viewed/activity by a customer in a visit on
website)
▪ The transaction table have Transaction ID (TID) which is nominal field and
Itemset which is collection of items (one or more) (can be product or webpages)
TID Itemset
Then find: all rules that correlate the
1 Bread, Coke, Milk presence of one set of items with that of
2 Beer, Bread another set of items
E.g., 60% of people who viewed SonyAlpha12 also
3 Beer, Coke, Diaper, Milk tends to view Camera Tripod
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
Applications of Association Rule Mining
▪ Market Basket Analysis or Association Rule Mining is applied in areas such as
▪ Clickstream Analytics – Consider rule for Goodreads as if you viewed the {Biography, … } -->
{Memoir}
▪ Marketing and Sales Promotion – Consider discovered rule as {Potato Chips , … } -->
{Cold drinks}
▪ Supermarket shelf management - Consider discovered rule as {Potato Chips , … } -->
{Cold drinks}
▪ Sequential Pattern Discovery - Given: set of objects, each associated with its own
timeline of events, find rules that predict strong sequential dependencies among
different events, of the form (A B) (C) (D E) --> (F). For
example, (Shoes) (Racket, Racketball) --> (Sports Jacket)
▪ Catalogue design for business, Product clustering , Credit/debit card analysis , Web
usage mining, Banking & Insurance products profiles
Translate Rules to Strategy
Strategy 1: Placing milk and bread as
frequently bought together on fresh
food website may further encourage
the sale of these items
10%
Off
Measures & Concepts of Association Rule
Mining
TID Itemset
▪ Itemset
1 Bread, Coke, Milk
▪ A collection of one or more items
2 Beer, Bread, Milk
▪ Example: {Milk, Bread, Diaper}
3 Beer, Coke, Diaper, Milk
▪ k-itemset
▪ An itemset that contains k items 4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
▪ Rule
▪ E.g. X=> Y
▪ Left Hand Side (LHS) of Rule = X is an itemset, for example X = { Beer, Bread}
▪ Right Hand Side (RHS) of Rule = Y is an itemset, for example Y = {Milk}
▪ => implication means co-occurrence & NOT causality , merely association
Support
TID Itemset
▪ Support count
1 Bread, Coke, Milk
▪ Frequency of occurrence of an itemset ,
▪ E.g. Support Count of ({Milk, Beer,Diaper}) = 2 2 Beer, Bread, Milk
3 Beer, Coke, Diaper, Milk
▪ Support
4 Beer, Bread, Diaper, Milk
▪ Fraction of transactions that contain an itemset
5 Coke, Diaper, Milk
▪ For a rule X => Y
▪ Probability that a transaction contains (X U Y)
▪ E.g. Support ({Milk, Beer, Diaper}) = 2/5 = 0.4
▪ Alternatively in probability notation,
▪ Support = P(XU Y) = Support Count (X U Y)/ Total no. of transactions
▪ Support is an indication of how frequently the itemset appears in the dataset
Confidence
TID Itemset
▪ Confidence
1 Bread, Coke, Milk
▪ For a rule X => Y
2 Beer, Bread, Milk
▪ conditional probability that a transaction having X also
contains Y 3 Beer, Coke, Diaper, Milk
▪ Measures how often itemset Y appear in transactions that 4 Beer, Bread, Diaper, Milk
contains X itemset 5 Coke, Diaper, Milk
▪ E.g. For Rule {Milk, Beer} => {Diaper}
▪ Confidence ({Milk, Beer, Diaper}) = 2/3 = 0.67
• Alternatively,
• Confidence(X=>Y) = Support (X U Y)/ Support (X)
• If we take Ex & Ey as events that a transaction contains itemset X & Y respectively then
• Support (X U Y) = P (Ex Ey)
• Confidence (X=>Y) = P(Ey/Ex) = P (Ex Ey) / P(Ex) = Support (X U Y) / Support (X)
• Confidence is an indication of how often the rule has been found to be true.
Association Rule Mining Task
▪ Now the Association Rule Mining Task can be broken down as
▪ Given a set of transactions T, the goal of association rule mining is to find all rules
having
▪ support ≥ minsup threshold ( user provided parameter)
▪ confidence ≥ minconf threshold ( user provided parameter)
▪ So it seems good rule. But note that Support (Coffee) = (15/100)/((20/100)*(90/100)) = 0.90
▪ Lift here is 0.75/0.9= 0.8333 (< 1, therefore is negatively associated i.e. substitute items) . So Lift is useful to judge
positive as well as negative association.
▪ Other rules interest measures are leverage, conviction, rule power factor , chisquare, cosine, coverage etc.
Association Rule Mining Task
▪ If we computer all rules for {Milk, Diaper, Beer} then we may realize that Rules
originating from the same itemset have identical support but have different
confidence
▪ So Task is broken down in two stages
▪ (1) Frequent Itemset Generation - Generate all itemsets whose support minsup
▪ (2) Rule Generation - Generate high confidence rules from each frequent itemset
where confidence of rule minconfidence
Additional Reference Slides – for those who
want to explore further ( not mandatory)
Approaches for association rule generation
▪ One approach can be Brute Force
▪ List all possible association rules
▪ Compute the support and confidence for each rule
▪ Prune rules that fail the minsup and minconf thresholds
▪ But this is Computationally Prohibitive
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCDE
Apriori Algorithm in Action
TID Itemsets ▪ Let Bread be assigned code 1 Butter – 2; Milk -3
A {Bread, Butter, Milk, Sugar} ; Sugar-4
B {Bread, Butter, Sugar} ▪ Then Transaction dataset would be
C {Bread, Butter}
TID Itemsets
D {Butter, Milk, Sugar}
A {1,2,3,4}
E {Butter, Milk}
B {1,2,4}
F {Milk, Sugar}
C {1,2}
G {Butter, Sugar}
D {2,3,4}
E {2,3}
F {3,4}
G {2,4}
Apriori Algorithm in Action
Itemsets Support
Determine Frequent Item Support
{1} 3
1-Itemset {1,2} 3
{2} 6
Let minsupport = 3, so any {1,3} 1
{3} 4
itemset which appears more {1,4} 2
than equal to 3 will be {4} 5
{2,3} 3
frequent itemset
{2,4} 4
Determine Frequent 2-Itemset
{3,4} 3
Only {1,3} & {1,4} are not
frequent.
Apriori Algorithm make use of
the result that any superset of
these will not be frequent Item Support
{2,3,4} 2