Clickstream Analytics

- The document discusses clickstream analysis using association rule mining to discover patterns in user behavior from web log data. Association rule mining can find relationships between different items that users interact with, such as pages viewed or products purchased, together. It generates rules with metrics like support and confidence to indicate how strongly two items are associated. Uncovering such rules can provide insights for applications like recommender systems, marketing strategies, and website optimization.

Uploaded by

Arunima Singh


Clickstream Analysis using Association Rules

Web Analytics

▪ Web Structure Analytics – network analysis is one method that can be applied.
▪ Web Content Analytics – the text analytics process can be applied.
▪ Web Usage Analytics – applications in recommender systems and clickstream analytics
Clickstream analytics
▪ Web log data of clicks recorded over a period of time can provide insights in the form of discovered patterns

Session ID    Item
1             item1
1             item3
1             item4
1             item5
2             item9
2             item4
2             item5

▪ Whenever item4 is clicked, item5 is also clicked, OR
▪ Whenever item4 is bought, item5 is also bought, OR
▪ Whenever item4 is viewed, item5 is also viewed
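
As a rough illustration of how such a click log becomes input for pattern discovery, the sketch below groups the rows of the table above into one item set per session. The function name `sessions_to_transactions` and the tuple layout of the log are assumptions made for this example, not part of the slides.

```python
from collections import defaultdict

# Click log as (session_id, item) pairs, copied from the table above.
clicks = [
    (1, "item1"), (1, "item3"), (1, "item4"), (1, "item5"),
    (2, "item9"), (2, "item4"), (2, "item5"),
]

def sessions_to_transactions(click_log):
    """Group clicked items by session, giving one item set per session."""
    sessions = defaultdict(set)
    for session_id, item in click_log:
        sessions[session_id].add(item)
    return dict(sessions)

transactions = sessions_to_transactions(clicks)

# Sessions in which item4 was clicked, and those that also contain item5.
with_item4 = [s for s, items in transactions.items() if "item4" in items]
with_both = [s for s in with_item4 if "item5" in transactions[s]]
print(with_item4, with_both)
```

In this small log every session that contains item4 also contains item5, which is the pattern the bullets describe.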
Discovering Patterns
▪ Pattern
▪ 12121?
▪ The pattern ‘12’ occurs often enough that, with some confidence, we can say ‘?’ is 2
▪ “If ‘1’ then ‘2’ follows”
▪ Pattern ➔ Model

Confidence
▪ 121212?
▪ 12121231212123121212?
▪ 121212➔ 3

▪ Models are created from historical data by detecting patterns. A model is a calculated guess about the likelihood that a pattern repeats.
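
The idea of turning a pattern into a model can be sketched by counting what follows a given context in the history. This is only an illustrative sketch; the helper `next_symbol_counts` is a hypothetical name, and the input is the longer sequence from the Confidence example with the trailing ‘?’ removed.

```python
from collections import Counter

def next_symbol_counts(sequence, context):
    """Count which symbol follows each occurrence of `context` in `sequence`."""
    counts = Counter()
    n = len(context)
    for i in range(len(sequence) - n):
        if sequence[i:i + n] == context:
            counts[sequence[i + n]] += 1
    return counts

history = "12121231212123121212"

# After a single '1' the next symbol has always been '2'.
after_1 = next_symbol_counts(history, "1")
# After the longer context '121212' the next symbol has always been '3'.
after_121212 = next_symbol_counts(history, "121212")
print(dict(after_1), dict(after_121212))
```

With the short context the guess for ‘?’ would be 2, while the longer context supports the rule 121212 ➔ 3, which is why the amount of history considered matters.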
Discovering Patterns
What Is ASSOCIATION RULE MINING?

▪ Association rule mining is well known for Market Basket Analysis. It can also be applied to clickstream analysis
▪ Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories is called association rule mining.
▪ Simply put, given a set of records, each of which contains some number of items from a given collection,
▪ produce dependency rules which will predict the occurrence of an item based on occurrences of other items
What Is ASSOCIATION RULE MINING?

▪ Given a set of records, say for a fresh-farm e-commerce store, each of which contains some number of items from a given collection,
▪ produce dependency rules which will identify the occurrence of an item based on occurrences of other items
▪ For example, the set of records is in the form of a transaction data table from which rules are found

TID   Itemset
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

Rules Found:
{Milk} => {Coke}
{Diaper, Milk} => {Beer}
Association Rule Mining Basics
▪ It is assumed that we are given: (1) a database of transactions, where (2) each transaction is a list of items (purchased/viewed/activity by a customer in a visit to a website)
▪ The transaction table has a Transaction ID (TID), which is a nominal field, and an Itemset, which is a collection of one or more items (products or web pages)

TID   Itemset
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

Then find: all rules that correlate the presence of one set of items with that of another set of items
E.g., 60% of people who viewed SonyAlpha12 also tend to view Camera Tripod
Applications of Association Rule Mining
▪ Market Basket Analysis or Association Rule Mining is applied in areas such as
▪ Clickstream Analytics – Consider a rule for Goodreads: if you viewed {Biography, … } --> {Memoir}
▪ Marketing and Sales Promotion – Consider the discovered rule {Potato Chips , … } --> {Cold drinks}
▪ Supermarket shelf management – Consider the discovered rule {Potato Chips , … } --> {Cold drinks}
▪ Sequential Pattern Discovery – Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events, of the form (A B) (C) (D E) --> (F). For example, (Shoes) (Racket, Racketball) --> (Sports Jacket)
▪ Catalogue design for business, product clustering, credit/debit card analysis, web usage mining, banking & insurance product profiles
Translate Rules to Strategy
Strategy 1: Placing milk and bread together as "frequently bought together" on a fresh food website may further encourage the sale of these items.

Strategy 2: Place milk and bread at opposite ends. When a customer has added milk, remind them at the end of the checkout process to add bread to the cart, along with other associated items.

Strategy 3: Put these two items into a package at a reduced price (e.g. 10% off).
Measures & Concepts of Association Rule Mining

TID   Itemset
1     Bread, Coke, Milk
2     Beer, Bread, Milk
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

▪ Itemset
▪ A collection of one or more items
▪ Example: {Milk, Bread, Diaper}
▪ k-itemset
▪ An itemset that contains k items
▪ Rule
▪ E.g. X => Y
▪ Left Hand Side (LHS) of Rule = X is an itemset, for example X = {Beer, Bread}
▪ Right Hand Side (RHS) of Rule = Y is an itemset, for example Y = {Milk}
▪ The implication => means co-occurrence, NOT causality; it is merely association
Support
TID   Itemset
1     Bread, Coke, Milk
2     Beer, Bread, Milk
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

▪ Support count
▪ Frequency of occurrence of an itemset
▪ E.g. Support Count ({Milk, Beer, Diaper}) = 2
▪ Support
▪ Fraction of transactions that contain an itemset
▪ For a rule X => Y
▪ Probability that a transaction contains (X U Y)
▪ E.g. Support ({Milk, Beer, Diaper}) = 2/5 = 0.4
▪ Alternatively, in probability notation,
▪ Support = P(X U Y) = Support Count (X U Y) / Total no. of transactions
▪ Support is an indication of how frequently the itemset appears in the dataset
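
The definitions above can be checked with a short sketch over the five-transaction table of this slide. The function names `support_count` and `support` are chosen for this example only.

```python
# Transactions from the Support slide, keyed by TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread", "Milk"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

example = {"Milk", "Beer", "Diaper"}
print(support_count(example, transactions), support(example, transactions))
```

Only TIDs 3 and 4 contain all three items, so the count is 2 and the support is 2/5 = 0.4, as stated on the slide.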
Confidence
TID   Itemset
1     Bread, Coke, Milk
2     Beer, Bread, Milk
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

▪ Confidence
▪ For a rule X => Y
▪ Conditional probability that a transaction having X also contains Y
▪ Measures how often itemset Y appears in transactions that contain itemset X
▪ E.g. for the rule {Milk, Beer} => {Diaper}
▪ Confidence ({Milk, Beer} => {Diaper}) = 2/3 = 0.67
• Alternatively,
• Confidence (X=>Y) = Support (X U Y) / Support (X)
• If we take Ex & Ey as the events that a transaction contains itemset X & Y respectively, then
• Support (X U Y) = P (Ex ∩ Ey)
• Confidence (X=>Y) = P (Ey | Ex) = P (Ex ∩ Ey) / P (Ex) = Support (X U Y) / Support (X)

• Confidence is an indication of how often the rule has been found to be true.
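
A minimal sketch of the confidence calculation, using the same five transactions; the helper names are assumptions for illustration. It works on counts, which is equivalent to Support (X U Y) / Support (X) because the number of transactions cancels.

```python
# Transactions from the Confidence slide.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in transactions if itemset <= items)

def confidence(lhs, rhs):
    """Confidence(X => Y) = count(X U Y) / count(X)."""
    return count(lhs | rhs) / count(lhs)

conf = confidence({"Milk", "Beer"}, {"Diaper"})
print(round(conf, 2))
```

{Milk, Beer} appears in three transactions (TIDs 2, 3, 4) and Diaper joins them in two (TIDs 3, 4), giving 2/3.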
Association Rule Mining Task
▪ Now the Association Rule Mining Task can be broken down as
▪ Given a set of transactions T, the goal of association rule mining is to find all rules
having
▪ support ≥ minsup threshold ( user provided parameter)
▪ confidence ≥ minconf threshold ( user provided parameter)

▪ Additionally, we can consider the Lift measure for evaluating rules

Generated Rule Evaluation
▪ Lift Measure
▪ For a rule X => Y
▪ Lift (X=>Y) = Support (X U Y) / (Support (X) * Support (Y))

            Coffee   No Coffee   Total
Tea           15         5         20
No Tea        75         5         80
Total         90        10        100

▪ Lift & other measures can be used to prune/rank the derived patterns
▪ Let us test the association rule Tea → Coffee
▪ Confidence = P(Coffee|Tea) = 15/20 = 0.75
▪ So it seems a good rule. But note that Support (Coffee) = 90/100 = 0.90, so Lift = (15/100) / ((20/100) * (90/100)) = 0.75/0.9 = 0.8333
▪ Lift here is < 1, so Tea and Coffee are negatively associated (i.e. substitute items). So Lift is useful to judge positive as well as negative association.
▪ Other rule interest measures are leverage, conviction, rule power factor, chi-square, cosine, coverage, etc.
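
The Tea → Coffee numbers can be reproduced with a few lines. This is a sketch under the assumption that the counts are read from the contingency table as 15 for Tea and Coffee together, 20 for Tea, 90 for Coffee, and 100 in total; the function name `lift` is chosen for this example.

```python
def lift(n_xy, n_x, n_y, n_total):
    """Lift(X => Y) = Support(X U Y) / (Support(X) * Support(Y))."""
    support_xy = n_xy / n_total
    support_x = n_x / n_total
    support_y = n_y / n_total
    return support_xy / (support_x * support_y)

# Counts assumed from the contingency table.
n_tea_and_coffee, n_tea, n_coffee, n_total = 15, 20, 90, 100

confidence_tea_coffee = n_tea_and_coffee / n_tea
lift_tea_coffee = lift(n_tea_and_coffee, n_tea, n_coffee, n_total)
print(confidence_tea_coffee, round(lift_tea_coffee, 4))
```

The confidence of 0.75 looks strong in isolation, but it is lower than the 0.90 baseline share of coffee drinkers, which is what a lift below 1 expresses.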
Association Rule Mining Task
▪ If we compute all rules for {Milk, Diaper, Beer}, we find that rules originating from the same itemset have identical support but can have different confidence
▪ So the task is broken down into two stages
▪ (1) Frequent Itemset Generation - Generate all itemsets whose support ≥ minsup
▪ (2) Rule Generation - Generate high-confidence rules from each frequent itemset, where confidence of rule ≥ minconf
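
To make the first bullet concrete, the sketch below enumerates every rule that can be formed from {Milk, Diaper, Beer}, using the five transactions of the Support and Confidence slides. Variable and function names are illustrative assumptions.

```python
from itertools import combinations

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in transactions if itemset <= items)

itemset = frozenset({"Milk", "Diaper", "Beer"})
rules = []
for lhs_size in range(1, len(itemset)):
    for lhs in combinations(sorted(itemset), lhs_size):
        lhs = frozenset(lhs)
        rhs = itemset - lhs
        rule_support = count(itemset) / len(transactions)
        rule_confidence = count(itemset) / count(lhs)
        rules.append((lhs, rhs, rule_support, rule_confidence))

for lhs, rhs, s, c in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"support={s:.2f}", f"confidence={c:.2f}")
```

All six rules share the support of the underlying itemset, while the confidence depends on how common the left-hand side is. This is why the frequent itemsets can be found first and the rules generated from them afterwards.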
Additional Reference Slides – for those who want to explore further (not mandatory)
Approaches for association rule generation
▪ One approach can be Brute Force
▪ List all possible association rules
▪ Compute the support and confidence for each rule
▪ Prune rules that fail the minsup and minconf thresholds
▪ But this is computationally prohibitive

▪ The other is to use algorithms like Apriori, FP-Growth, and ECLAT
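
To get a feel for why listing every rule is prohibitive, the sketch below counts all rules X => Y with non-empty, disjoint X and Y over the five items of the earlier grocery example. The closed form 3^d - 2^(d+1) + 1 for d items is a standard counting result, not something stated in the slides, and the function name `count_rules` is hypothetical.

```python
from itertools import combinations

def count_rules(items):
    """Count every rule X => Y with X and Y non-empty and disjoint."""
    items = list(items)
    total = 0
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            # Each non-empty proper subset of the itemset can be the LHS.
            for lhs_size in range(1, k):
                total += sum(1 for _ in combinations(itemset, lhs_size))
    return total

items = ["Beer", "Bread", "Coke", "Diaper", "Milk"]
n_rules = count_rules(items)
closed_form = 3 ** len(items) - 2 ** (len(items) + 1) + 1
print(n_rules, closed_form)
```

Five items already give 180 candidate rules, and the count grows exponentially with the number of items, which is why a real catalogue cannot be handled this way.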

Algorithms

▪ Apriori algorithm - uses a breadth-first search strategy to count the support of itemsets and uses a candidate generation function which exploits the downward closure property of support.
▪ ECLAT algorithm - stands for Equivalence Class Transformation; it is a depth-first search algorithm based on set intersection.
▪ FP-growth algorithm - FP stands for frequent pattern

[Figure: lattice traversal, (a) breadth first, (b) depth first]
Frequent Itemset Generation from Lattice
null

A B C D E

AB AC AD AE BC BD BE CD CE DE

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE
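
The lattice above can be generated level by level. The sketch also illustrates the downward closure property that Apriori relies on: once an itemset is found infrequent, all of its supersets can be skipped. Choosing AB as the infrequent itemset is only an assumption for illustration.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

# Level k of the lattice holds all k-itemsets; level 0 is the empty set ("null").
lattice = {
    k: ["".join(c) for c in combinations(items, k)]
    for k in range(len(items) + 1)
}
level_sizes = [len(lattice[k]) for k in sorted(lattice)]
total_itemsets = sum(level_sizes)

# Downward closure: if AB were infrequent, every superset of AB could be skipped.
pruned = [s for k in sorted(lattice) for s in lattice[k] if {"A", "B"} <= set(s)]
print(level_sizes, total_itemsets, pruned)
```

Five items produce 2^5 = 32 itemsets, and a single infrequent 2-itemset removes 8 of them from consideration.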
Apriori Algorithm in Action
▪ Let Bread be assigned code 1; Butter – 2; Milk – 3; Sugar – 4
▪ Then the transaction dataset would be

TID   Itemsets                        Coded itemsets
A     {Bread, Butter, Milk, Sugar}    {1,2,3,4}
B     {Bread, Butter, Sugar}          {1,2,4}
C     {Bread, Butter}                 {1,2}
D     {Butter, Milk, Sugar}           {2,3,4}
E     {Butter, Milk}                  {2,3}
F     {Milk, Sugar}                   {3,4}
G     {Butter, Sugar}                 {2,4}
Apriori Algorithm in Action
Let minsupport = 3, so any itemset which appears at least 3 times is a frequent itemset.

Determine Frequent 1-Itemsets

Item   Support
{1}    3
{2}    6
{3}    4
{4}    5

Determine Frequent 2-Itemsets

Itemsets   Support
{1,2}      3
{1,3}      1
{1,4}      2
{2,3}      3
{2,4}      4
{3,4}      3

Only {1,3} & {1,4} are not frequent.

The Apriori algorithm makes use of the result that any superset of these will not be frequent, so the only 3-itemset left to count is

Item      Support
{2,3,4}   2

which is below minsupport, so there is no frequent 3-itemset.
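
The walk-through above can be reproduced with a compact level-wise search. This is a minimal sketch of the Apriori idea on the coded transactions, not a tuned implementation; the function name `apriori` and its signature are chosen for this example.

```python
from itertools import combinations

# Coded transactions: Bread=1, Butter=2, Milk=3, Sugar=4.
transactions = {
    "A": {1, 2, 3, 4}, "B": {1, 2, 4}, "C": {1, 2}, "D": {2, 3, 4},
    "E": {2, 3}, "F": {3, 4}, "G": {2, 4},
}

def apriori(transactions, min_count):
    """Return {itemset: support count} for all itemsets with count >= min_count."""
    def count(candidate):
        return sum(1 for items in transactions.values() if candidate <= items)

    all_items = sorted(set().union(*transactions.values()))
    current = {frozenset([i]) for i in all_items}
    frequent = {}
    k = 1
    while current:
        counted = {c: count(c) for c in current}
        level = {c: n for c, n in counted.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

frequent = apriori(transactions, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

At the third level the prune step discards {1,2,3} and {1,2,4} without counting them, because they contain the infrequent pairs {1,3} and {1,4}. Only {2,3,4} is counted, and with a count of 2 it falls below the threshold.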
