
Data Mining (DM)

2101CS521

Unit-3
Mining Frequent
Patterns,
Associations, and
Correlations
Prof. Jayesh D. Vagadiya
Computer Engineering
Department
Darshan Institute of Engineering & Technology, Rajkot
[email protected]
9537133260
Topics to be covered
• What Kinds of Patterns Can Be Mined?
• Market Basket Analysis
• Frequent Itemsets
• Association Rule
• Maximal and Closed Frequent Itemsets
• Apriori Algorithm
• Methods to Improve Apriori Efficiency
• FP-growth Algorithm
• Correlation
What Kinds of Patterns Can Be Mined?
 Data mining functionalities can be classified into two categories:
1. Descriptive (we are going to cover this part in this chapter)
2. Predictive

 Descriptive
• This task presents the general properties of data stored in a database.
• The descriptive tasks are used to find out patterns in data.
• E.g.: Frequent patterns, association, correlation etc.

 Predictive
• These tasks predict the value of one attribute on the basis of values of other
attributes.
• E.g.: predicting festival customer/product sales at a store

What Kinds of Patterns Can Be Mined?
 Mining Frequent Patterns:
• Frequent patterns are those patterns that occur frequently in data. The main
kinds of frequent patterns are listed below.

• Frequent Item Set


• It refers to a set of items that frequently
appear together, for example, milk and
bread.

• Frequent Subsequence
• A sequence that occurs frequently, such as purchasing a laptop, followed by
a digital camera and then a memory card.

• Frequent Sub Structure


• A substructure can refer to different structural forms (e.g., graphs, trees, or
lattices) that may be combined with itemsets or subsequences.
Market Basket Analysis
 Market Basket Analysis is a modelling technique to find frequent itemset.
 It is based on the idea that if you buy a certain group of items, you are more (or less)
likely to buy another group of items.
 For example, if you are in a store and you buy a car, then you are more
likely to also buy insurance at the same time than somebody who does not buy a car.
 The set of items that a customer buys is referred to as an itemset.
 Market basket analysis seeks to find relationships between purchases
(Items).
 E.g. IF {Car, Accessories} THEN {Insurance}
{Car, Accessories} → {Insurance}

Association Rule
 Association rule mining is the process of uncovering relationships among data and
determining association rules.
 It is used to discover interesting relationships and associations among
items or events in large datasets.
 E.g. IF {Computer} THEN {Software}

Association Rule Mining
 Given a set of transactions, we need rules that will predict the occurrence
of an item based on the occurrences of other items in the transaction.
 Market-Basket transactions:

TID | Items
1   | Bread, Milk
2   | Bread, Chocolate, Pepsi, Eggs
3   | Milk, Chocolate, Pepsi, Coke
4   | Bread, Milk, Chocolate, Pepsi
5   | Bread, Milk, Chocolate, Coke

 Example of Association Rules:
{Chocolate} → {Pepsi},
{Milk, Bread} → {Eggs, Coke},
{Pepsi, Bread} → {Milk}

Association Rule Mining Cont..
 Itemset
• A collection of one or more items
o E.g.: {Milk, Bread, Chocolate}
• k-itemset
o An itemset that contains k items
 Support count (σ)
• Frequency of occurrence of an itemset
o E.g. σ({Milk, Bread, Chocolate}) = 2
 Support
• Fraction of transactions that contain an itemset
o E.g. s({Milk, Bread, Chocolate}) = 2/5
 Frequent Itemset
• An itemset whose support is greater than or equal to a minimum support threshold
(All examples refer to the market-basket transaction table above.)
Association Rule Mining Cont..
 Association Rule
• An implication expression of the form X → Y, where X and Y are itemsets
• E.g.: {Milk, Chocolate} → {Pepsi}
 Rule Evaluation
• Support (s): fraction of transactions that contain both X and Y
• Confidence (c): measures how often items in Y appear in transactions that contain X

Example (on the transaction table above): find support and confidence for
{Milk, Chocolate} → {Pepsi}
s = σ({Milk, Chocolate, Pepsi}) / |D| = 2/5 = 0.4
c = σ({Milk, Chocolate, Pepsi}) / σ({Milk, Chocolate}) = 2/3 ≈ 0.67
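To make these measures concrete, here is a minimal Python sketch (not from the slides; the function names are illustrative) that computes support and confidence of {Milk, Chocolate} → {Pepsi} on the five transactions above.

# A minimal sketch: computing support and confidence of a rule X -> Y
# over the market-basket transactions shown above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item."""
    return sum(1 for t in transactions if itemset <= t)

def rule_measures(X, Y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    both = support_count(X | Y, transactions)
    support = both / len(transactions)
    confidence = both / support_count(X, transactions)
    return support, confidence

s, c = rule_measures({"Milk", "Chocolate"}, {"Pepsi"}, transactions)
print(f"support = {s:.2f}, confidence = {c:.2f}")  # support = 0.40, confidence = 0.67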
Association Rule Mining Cont..
A common strategy adopted by many association rule mining algorithms is to
decompose the problem into two major subtasks:
1. Frequent Itemset Generation
• The objective is to find all the item-sets that satisfy the minimum
support threshold.
• These item sets are called frequent item sets.
2. Rule Generation
• The objective is to extract all the high-confidence rules from the
frequent item sets found in the previous step.
• These rules are called strong rules.

Maximal and Closed Frequent Itemsets
 Closed Frequent Itemsets:
 A frequent itemset is closed if no (immediate) superset has the same support.

 Maximal Frequent Itemsets:


 A frequent itemset is maximal, if none of its (immediate) supersets is frequent.

The three sets are nested: Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets.

Maximal and Closed Frequent Itemsets – Example (Minimum Support = 3)

TID | Items
1   | A B C E
2   | A C D E
3   | B C E
4   | A C D E
5   | C D E
6   | A D E

{A} = 4; not closed due to {A,E}
{B} = 2; not frequent => ignore
{C} = 5; not closed due to {C,E}
{D} = 4; not closed due to {D,E}
{E} = 6; closed, but not maximal due to e.g. {D,E}
{A,B} = 1; not frequent => ignore
{A,C} = 3; not closed due to {A,C,E}
{A,D} = 3; not closed due to {A,D,E}
{A,E} = 4; closed, but not maximal due to {A,D,E}
{B,C} = 2; not frequent => ignore
{B,D} = 0; not frequent => ignore
{B,E} = 2; not frequent => ignore
{C,D} = 3; not closed due to {C,D,E}
{C,E} = 5; closed, but not maximal due to {C,D,E}
{D,E} = 4; closed, but not maximal due to {A,D,E}
{A,B,C} = 1; not frequent => ignore
{A,B,D} = 0; not frequent => ignore
{A,B,E} = 1; not frequent => ignore
{A,C,D} = 2; not frequent => ignore
{A,C,E} = 3; maximal frequent
{A,D,E} = 3; maximal frequent
{B,C,D} = 0; not frequent => ignore
{B,C,E} = 2; not frequent => ignore
{C,D,E} = 3; maximal frequent
{A,B,C,D} = 0; not frequent => ignore
{A,B,C,E} = 1; not frequent => ignore
{B,C,D,E} = 0; not frequent => ignore
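A small Python sketch (not part of the original slides; helper names are illustrative) that reproduces this classification by brute force on the same database.

from itertools import combinations

# A minimal brute-force sketch: classify itemsets of the example database
# as frequent / closed / maximal (min_sup = 3).
transactions = [set("ABCE"), set("ACDE"), set("BCE"),
                set("ACDE"), set("CDE"), set("ADE")]
min_sup = 3
items = sorted(set().union(*transactions))

def sup(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

# collect all frequent itemsets with their support counts
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = sup(cand)
        if s >= min_sup:
            frequent[frozenset(cand)] = s

def is_closed(X):
    # closed: no immediate superset has the same support count
    return all(sup(X | {i}) < frequent[X] for i in items if i not in X)

def is_maximal(X):
    # maximal: no immediate superset is frequent
    return all(sup(X | {i}) < min_sup for i in items if i not in X)

for X, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    tags = [t for t, ok in [("closed", is_closed(X)), ("maximal", is_maximal(X))] if ok]
    print(set(X), s, ", ".join(tags) or "frequent only")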
Apriori Algorithm
 It is used to mine frequent patterns.
 The algorithm makes use of prior knowledge about frequent item sets to
efficiently explore and generate larger item sets.
 Apriori employs an iterative approach known as a level-wise search, where
k-itemsets are used to explore (k + 1)-itemsets.
 First, the set of frequent 1-itemsets is found by scanning the database to
accumulate the count for each item, and collecting those items that
satisfy minimum support.
 The resulting set is denoted by L1. Next, L1 is used to find L2, the set of
frequent 2-itemsets, which is used to find L3, and so on, until no more
frequent k-itemsets can be found.

#2101CS521 (DM)  Unit 3 - Mining Frequent Patterns, Associations,


Prof. Jayesh D. Vagadiya 13
Apriori Algorithm - Example (Minimum Support = 2)

Transaction database D:
TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D for the count of each candidate 1-itemset (C1):
{1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3

Keep candidates with min. support (L1):
{1}: 2, {2}: 3, {3}: 3, {5}: 3

Generate candidate 2-itemsets from L1 (C2):
{1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D for the count of each candidate in C2:
{1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2

Keep candidates with min. support (L2):
{1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

Generate candidate 3-itemsets from L2 (C3): {1 2 3}, {1 3 5}, {2 3 5};
pruning removes {1 2 3} and {1 3 5} because {1 2} and {1 5} are not frequent
(their counts in D would only be {1 2 3}: 1 and {1 3 5}: 1).

Scan D for the count of the remaining candidate: {2 3 5}: 2

Keep candidates with min. support (L3): {2 3 5}: 2
Apriori Algorithm - Example Cont.. (Minimum Support = 2)

Rule Generation from the frequent itemset {2, 3, 5}:
Confidence(A → B) = support_count(A ∪ B) / support_count(A)

Association Rule | Support | Confidence | Confidence (%)
{2, 3} → {5}     | 2       | 2/2 = 1    | 100%
{3, 5} → {2}     | 2       | 2/2 = 1    | 100%
{2, 5} → {3}     | 2       | 2/3 = 0.66 | 66%
{2} → {3, 5}     | 2       | 2/3 = 0.66 | 66%
{3} → {2, 5}     | 2       | 2/3 = 0.66 | 66%
{5} → {2, 3}     | 2       | 2/3 = 0.66 | 66%
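Below is a minimal Python sketch (illustrative, not from the slides) that enumerates all rules from one frequent itemset and keeps those meeting a confidence threshold; support_count here is the σ(·) defined earlier.

from itertools import combinations

# A minimal sketch: generate rules A -> B from one frequent itemset
# and keep the high-confidence ("strong") ones.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(freq_itemset, min_conf):
    freq_itemset = frozenset(freq_itemset)
    whole = support_count(freq_itemset)
    rules = []
    # every non-empty proper subset A can be an antecedent; B is the rest
    for r in range(1, len(freq_itemset)):
        for A in map(frozenset, combinations(freq_itemset, r)):
            conf = whole / support_count(A)
            if conf >= min_conf:
                rules.append((set(A), set(freq_itemset - A), conf))
    return rules

for A, B, conf in rules_from_itemset({2, 3, 5}, min_conf=0.7):
    print(f"{A} -> {B}  confidence = {conf:.2f}")
# keeps {2, 3} -> {5} and {3, 5} -> {2} (confidence 1.0)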
Apriori Property
 All nonempty subsets of a frequent itemset must also be
frequent.
 E.g. if {A, B} is a frequent itemset, both {A} and {B} must also be frequent
itemsets.
 This property belongs to a special category of properties called
antimonotonicity in the sense that if a set cannot pass a test, all of its
supersets will fail the same test as well.
 It is used in apriori algorithm to improve the performance of algorithm.

Important steps in Apriori
 The Join Step:
 Create candidate k-itemsets by joining the set of frequent (k-1)-itemsets with itself.
 Ck is generated by joining Lk-1 with itself.
 The join, Lk−1 ⋈ Lk−1, is performed, where members of Lk−1 are joinable if
their first (k − 2) items are in common.

 The Pruning Step:


 Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-
itemset. (Apriori property)
 Hence, if any (k − 1)-subset of a candidate k-itemset is not in Lk−1, then
the candidate cannot be frequent either and so can be removed from Ck.
 This step improves performance by removing such candidates from Ck.

Apriori Algorithm Steps
INPUT:
# D, a database of transactions;
# min_support, the minimum support count threshold.

Algorithm:
Ck: Candidate itemset of size k
Lk: Frequent itemset of size k
L1= {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk                    // Join Step
    Prune Ck+1: any k-itemset that is not frequent cannot be a
        subset of a frequent (k+1)-itemset                 // Pruning Step
    for each transaction t in database do
        Increment the count of all candidates in Ck+1 that are
        contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;    // all frequent itemsets
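The following is a compact Python sketch of this level-wise procedure (an illustrative implementation, assuming min_support is given as an absolute count; the function names are our own, not from the slides).

from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    A level-wise sketch: L1 from a first scan, then repeatedly join Lk with
    itself, prune with the Apriori property, and count candidates in D.
    """
    transactions = [frozenset(t) for t in transactions]

    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    Lk = {i: c for i, c in counts.items() if c >= min_support}
    frequent = dict(Lk)

    k = 1
    while Lk:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates
        prev = sorted(Lk, key=sorted)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k + 1:
                candidates.add(union)
        # Prune step: drop candidates that have an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Count the remaining candidates with one scan of D
        cand_counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        Lk = {c: n for c, n in cand_counts.items() if n >= min_support}
        frequent.update(Lk)
        k += 1
    return frequent

# Example: the 4-transaction database used above, min_support = 2
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, count in sorted(apriori(D, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)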

Apriori steps
C1 → apply min_support → L1 (frequent 1-itemsets) → self-join and pruning →
C2 → apply min_support → L2 (frequent 2-itemsets) → self-join and pruning →
C3 → apply min_support → L3 (frequent 3-itemsets) → ... until the frequent N-itemsets are found.
Methods to Improve Apriori Efficiency
 Hash-based technique:
 Using a hash-based structure known as a hash table, the k-itemsets and their related
counts are generated.
 The table is generated using a hash function.
 For example, when scanning each transaction in the database to generate the
frequent 1-itemsets, L1, we can generate all the 2-itemsets for each transaction,
hash (i.e., map) them into the different buckets of a hash table structure, and
increase the corresponding bucket counts.

Transaction database:
TID | Items
1   | I1, I2, I5
2   | I2, I4
3   | I2, I3
4   | I1, I2, I4
5   | I1, I3
6   | I2, I3
7   | I1, I3
8   | I1, I2, I3, I5
9   | I1, I2, I3

C1 (support counts): I1: 6, I2: 7, I3: 6, I4: 2, I5: 2
Methods to Improve Apriori Efficiency
 Hash-based technique: hash table structure used to generate L2

Hash function: H(X, Y) = ((order of X) * 10 + (order of Y)) mod 7

2-itemset | Count | Bucket address
I1, I2    | 4     | (1*10+2) mod 7 = 5
I1, I3    | 4     | (1*10+3) mod 7 = 6
I1, I4    | 1     | (1*10+4) mod 7 = 0
I1, I5    | 2     | (1*10+5) mod 7 = 1
I2, I3    | 4     | (2*10+3) mod 7 = 2
I2, I4    | 2     | (2*10+4) mod 7 = 3
I2, I5    | 2     | (2*10+5) mod 7 = 4
I3, I4    | 0     | -
I3, I5    | 1     | (3*10+5) mod 7 = 0

Bucket address | 0                   | 1         | 2         | 3         | 4         | 5         | 6
Bucket count   | 2                   | 2         | 4         | 2         | 2         | 4         | 4
Bucket content | {I1,I4:1} {I3,I5:1} | {I1,I5:2} | {I2,I3:4} | {I2,I4:2} | {I2,I5:2} | {I1,I2:4} | {I1,I3:4}
Can contain L2 | NO                  | NO        | YES       | NO        | NO        | YES       | YES

A 2-itemset can be frequent only if its bucket count reaches the minimum support count,
so only buckets 2, 5 and 6 need to be considered when forming the candidate 2-itemsets.
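A minimal Python sketch of this bucket-counting idea (illustrative only; the hash function follows the slide, with item "Ii" mapped to order i).

from itertools import combinations

# A minimal sketch of the hash-based (bucket counting) technique:
# while scanning for L1, hash every 2-itemset of each transaction into a
# small table; buckets whose count stays below min_sup cannot contain
# any frequent 2-itemset, so their pairs are pruned from C2.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
min_sup = 3
NUM_BUCKETS = 7

def order(item):          # "I1" -> 1, "I2" -> 2, ...
    return int(item[1:])

def bucket(x, y):         # H(X, Y) = ((order of X)*10 + (order of Y)) mod 7
    a, b = sorted((order(x), order(y)))
    return (a * 10 + b) % NUM_BUCKETS

bucket_counts = [0] * NUM_BUCKETS
for t in transactions:
    for x, y in combinations(sorted(t, key=order), 2):
        bucket_counts[bucket(x, y)] += 1

print("bucket counts:", bucket_counts)              # [2, 2, 4, 2, 2, 4, 4]
# Keep only candidate pairs whose bucket count can reach min_sup
keep = lambda x, y: bucket_counts[bucket(x, y)] >= min_sup
print("prunable example:", not keep("I1", "I4"))    # True: bucket 0 count is 2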
Methods to Improve Apriori Efficiency
 Transaction reduction:
 A transaction that does not contain any frequent k-itemsets cannot contain any
frequent (k + 1)-itemsets.
 Therefore, such a transaction can be marked or removed.
 During this step, the algorithm further reduces the size of transactions by eliminating
items that are no longer frequent after the previous iteration.
 Since the eliminated items can't be part of any frequent itemsets, removing them
reduces the search space and improves efficiency.
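As an illustration of this idea, here is a small Python sketch (our own, not from the slides) that drops transactions containing no frequent k-itemsets before the next pass.

from itertools import combinations

# A minimal sketch of transaction reduction: a transaction that contains
# no frequent k-itemset cannot contribute to any frequent (k+1)-itemset,
# so it can be skipped in later scans.
def reduce_transactions(transactions, frequent_k, k):
    """Keep only transactions that contain at least one itemset from Lk."""
    frequent_k = {frozenset(i) for i in frequent_k}
    kept = []
    for t in transactions:
        if any(frozenset(c) in frequent_k for c in combinations(sorted(t), k)):
            kept.append(t)
    return kept

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
L2 = [{1, 3}, {2, 3}, {2, 5}, {3, 5}]
print(reduce_transactions(D, L2, 2))   # all four transactions still qualify here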

Methods to Improve Apriori Efficiency
 Partitioning:
 It consists of two phases.
 In phase I, the algorithm divides the transactions of D into n nonoverlapping
partitions.
 Find the frequent itemsets local to each partition (1 scan).
 Combine all local frequent itemsets to form candidate itemset.
 In phase II, find the global frequent itemsets among the candidates (2nd scan); the result
is the set of frequent itemsets in D.

Phase I: divide D into n non-overlapping partitions → find the frequent itemsets local to
each partition (1st scan, can be done in parallel) → combine all local frequent itemsets to
form the global candidate set.
Phase II: find the global frequent itemsets among the candidates (2nd scan) → frequent
itemsets in D.
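A minimal Python sketch of this two-phase idea (illustrative; the partition sizes and helper names are our own, and the local miner is a deliberately naive brute-force routine suitable only for tiny partitions).

from itertools import combinations

# A minimal sketch of the partitioning technique: mine each partition
# locally (phase I), union the local frequent itemsets as global
# candidates, then count them with one more full scan (phase II).
def local_frequent(partition, min_sup_count):
    """Brute-force local miner, fine for a tiny partition (sketch only)."""
    items = sorted(set().union(*partition))
    found = set()
    for k in range(1, len(items) + 1):
        for cand in map(frozenset, combinations(items, k)):
            if sum(1 for t in partition if cand <= t) >= min_sup_count:
                found.add(cand)
    return found

def partition_mine(transactions, min_sup_fraction, n_partitions):
    size = -(-len(transactions) // n_partitions)          # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase I: local frequent itemsets, support threshold scaled to the partition size
    # (flooring the local threshold only over-generates candidates, never misses one)
    candidates = set()
    for p in parts:
        candidates |= local_frequent(p, max(1, int(min_sup_fraction * len(p))))
    # Phase II: one scan of the full database to find the global frequent itemsets
    result = {}
    for cand in candidates:
        count = sum(1 for t in transactions if cand <= t)
        if count >= min_sup_fraction * len(transactions):
            result[cand] = count
    return result

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, count in sorted(partition_mine(D, 0.5, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)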

Methods to Improve Apriori Efficiency
 Sampling:
 A random sample S is selected from database D, and then a search is conducted for
frequent itemsets within that sample S.
 In this way, we trade off some degree of accuracy against efficiency.
 These frequent itemsets are called sample frequent itemsets.
 More than one sample could be used to improve accuracy.

Methods to Improve Apriori Efficiency
 Dynamic itemset counting:
 Dynamic itemset counting refers to the process of incrementally updating the
support counts of itemsets as new transactions are added to the dataset.
 This is particularly useful when dealing with dynamic or streaming data where
transactions arrive over time.
 Instead of recalculating support counts from scratch whenever new data arrives,
dynamic counting efficiently maintains and updates the support counts of existing
itemsets.
 The technique uses the count-so-far as the lower bound of the actual count.
 If the count-so-far passes the minimum support, the itemset is added into the
frequent itemset collection and can be used to generate longer candidates.
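A tiny Python sketch of keeping count-so-far values up to date as new transactions arrive (illustrative; the class and method names are our own).

# A minimal sketch of dynamic itemset counting: support counts are kept as
# running totals and updated as new transactions stream in, instead of
# being recomputed from scratch.
class DynamicCounter:
    def __init__(self, itemsets, min_sup_count):
        self.counts = {frozenset(i): 0 for i in itemsets}   # count-so-far per itemset
        self.min_sup_count = min_sup_count

    def add_transaction(self, transaction):
        t = set(transaction)
        for itemset in self.counts:
            if itemset <= t:
                self.counts[itemset] += 1

    def frequent_so_far(self):
        # count-so-far is a lower bound on the true count, so anything that
        # already passes min_sup can be used to generate longer candidates
        return {i for i, c in self.counts.items() if c >= self.min_sup_count}

counter = DynamicCounter([{"milk"}, {"bread"}, {"milk", "bread"}], min_sup_count=2)
for t in [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]:
    counter.add_transaction(t)
print(counter.frequent_so_far())
# {'milk'} and {'bread'} reach the threshold; {'milk', 'bread'} has count-so-far 1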

Disadvantages of Apriori
 It may still need to generate a huge number of candidate sets.
 For example, if there are 10^4 frequent 1-itemsets, the Apriori algorithm will need to
generate more than 10^7 candidate 2-itemsets.
 It may need to repeatedly scan the whole database and check a large set
of candidates by pattern matching.
 It is costly to go over each transaction in the database to determine the support of
the candidate itemsets.

Apriori Algorithm (Try Yourself!)
A database has 4 transactions. Let Min_sup = 50% and Min_conf = 75%.

TID  | Items
1000 | Cheese, Milk, Cookies
2000 | Butter, Milk, Bread
3000 | Cheese, Butter, Milk, Bread
4000 | Butter, Bread

How to convert a percentage support into a count: count = (given % / 100) × total records;
here, (50 / 100) × 4 = 2.

Frequent Itemset        | Sup
{Butter, Milk, Bread}   | 2

Sr.    | Association Rule       | Support | Confidence | Confidence (%)
Rule 1 | Butter ∧ Milk → Bread  | 2       | 2/2 = 1    | 100%
Rule 2 | Milk ∧ Bread → Butter  | 2       | 2/2 = 1    | 100%
Rule 3 | Butter ∧ Bread → Milk  | 2       | 2/3 = 0.66 | 66%
Rule 4 | Butter → Milk ∧ Bread  | 2       | 2/3 = 0.66 | 66%
Rule 5 | Milk → Butter ∧ Bread  | 2       | 2/3 = 0.66 | 66%

With Min_conf = 75%, only Rule 1 and Rule 2 are strong.
FP-growth
 FP-growth (frequent-pattern growth) finds frequent itemsets without candidate generation.
 First, it compresses the database representing frequent items into a
frequent pattern tree or FP tree.
 Once an FP-tree has been constructed, it uses a recursive divide-and-
conquer approach to mine the frequent item sets.

FP-growth Example (Minimum Support = 3)

Transaction database:
TID | Items
1   | E, K, M, N, O, Y
2   | D, E, K, N, O, Y
3   | A, E, K, M
4   | C, K, M, U, Y
5   | C, E, I, K, O

Step 1: Find the frequent 1-itemsets (Min_Sup ≥ 3).
Item frequencies: A:1, C:2, D:1, E:4, I:1, K:5, M:3, N:2, O:3, U:1, Y:3
Frequent items, sorted in decreasing support count: {K:5, E:4, M:3, O:3, Y:3}

Step 2: Rewrite each transaction with its items sorted by this order, ignoring the
infrequent items:
TID | Sorted Items
1   | K E M O Y
2   | K E O Y
3   | K E M
4   | K M Y
5   | K E O

Building the FP-Tree:
 Scan the data to determine the support count of each item.
 Infrequent items are discarded, while the frequent items are sorted in decreasing
support counts.
 Make a second pass over the data to construct the FP-tree.
 As the transactions are read, before being processed, their items are sorted according
to the above order.
FP-growth Example – FP-tree construction
The sorted transactions are inserted one by one into the tree (shared prefixes only
increment the counts of existing nodes):
1. Insert K E M O Y:  null → K:1 → E:1 → M:1 → O:1 → Y:1
2. Insert K E O Y:    K:2, E:2; new branch E → O:1 → Y:1
3. Insert K E M:      K:3, E:3, M:2
4. Insert K M Y:      K:4; new branch K → M:1 → Y:1
5. Insert K E O:      K:5, E:4, O:2 (on the E → O branch)

Final FP-tree (header table: K:5, E:4, M:3, O:3, Y:3):
null
└── K:5
    ├── E:4
    │   ├── M:2 ── O:1 ── Y:1
    │   └── O:2 ── Y:1
    └── M:1 ── Y:1
FP-growth Example – Conditional Pattern Base
For each frequent item, collect the prefix paths that lead to it in the FP-tree
(with the item's count on that path):

Item | Conditional Pattern Base
Y    | {KEMO:1} {KEO:1} {KM:1}
O    | {KEM:1} {KE:2}
M    | {KE:2} {K:1}
E    | {K:4}
K    | -
Conditional FP-tree (Minimum Support = 3)

Item | Conditional Pattern Base   | Conditional FP-tree
Y    | {KEMO:1} {KEO:1} {KM:1}    | {K:3}
O    | {KEM:1} {KE:2}             | {K:3, E:3}

(For Y, only K reaches a count of 3 across its prefix paths; E, M and O do not meet the
minimum support. For O, both K and E reach a count of 3.)
Conditional FP-tree (Minimum Support = 3)

Item | Conditional Pattern Base | Conditional FP-tree
M    | {KE:2} {K:1}             | {K:3}
E    | {K:4}                    | {K:4}
FP-growth Example – Conditional FP-tree and Frequent Patterns Generated

Item | Conditional Pattern Base   | Conditional FP-tree | Frequent Patterns Generated
Y    | {KEMO:1} {KEO:1} {KM:1}    | {K:3}               | {K,Y:3}
O    | {KEM:1} {KE:2}             | {K:3, E:3}          | {K,O:3} {E,O:3} {K,E,O:3}
M    | {KE:2} {K:1}               | {K:3}               | {K,M:3}
E    | {K:4}                      | {K:4}               | {K,E:4}
K    | -                          | -                   | -
FP-growth Algorithm
INPUT:
# D, a database of transactions;
# min_support, the minimum support count threshold.

Algorithm:
1. The FP-tree is constructed as follows:
   a. Scan the transaction database D once. Collect F, the set of frequent items,
      and their support counts. Sort F in support-count-descending order as L, the
      list of frequent items.
   b. Create the root of an FP-tree, and label it as "null." For each transaction
      Trans in D do the following:
      i.  Select and sort the frequent items in Trans according to the order of L.
      ii. Let the sorted frequent-item list in Trans be [p|P], where p is the first
          element and P is the remaining list. Call insert_tree([p|P], T).
2. The FP-tree is mined by calling FP_growth(FP-tree, null).

FP-growth Algorithm
Insert_tree([p|P], T):
    If T has a child N such that N.item-name = p.item-name then
        Increment N's count by 1
    else:
        Create a new node N, let its count be 1, link its parent to T, and link it
        to the other nodes with the same item-name via the node-link structure.
    If P is nonempty then
        call insert_tree(P, N) recursively.

FP_growth(Tree, α):
    if Tree contains a single path P then
        for each combination (denoted as β) of the nodes in the path P
            generate pattern β ∪ α with support count = minimum support count of the
            nodes in β;
    else for each a_i in the header of Tree
        generate pattern β = a_i ∪ α with support count = a_i.support_count;
        construct β's conditional pattern base and then β's conditional FP-tree Tree_β;
        if Tree_β ≠ ∅ then
            call FP_growth(Tree_β, β);
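For reference, here is a compact Python sketch of FP-growth (an illustrative implementation, not the slides' exact code; it omits the single-path optimization and the class/function names are our own). On the five-transaction example above with min_sup = 3 it reproduces the patterns in the table.

from collections import defaultdict

# A compact FP-growth sketch: build the FP-tree from frequency-sorted
# transactions, then mine it recursively through conditional pattern bases.
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_sup):
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    order = {i: c for i, c in counts.items() if c >= min_sup}   # frequent items only
    root, header = Node(None, None), defaultdict(list)          # header: item -> nodes
    for t in transactions:
        node = root
        # sort by decreasing support count, breaking ties by item name
        for item in sorted((i for i in t if i in order), key=lambda i: (-order[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, order

def fp_growth(transactions, min_sup, suffix=frozenset()):
    _, header, order = build_tree(transactions, min_sup)
    patterns = {}
    for item in header:                       # each frequent item in this (conditional) tree
        support = sum(n.count for n in header[item])
        new_suffix = suffix | {item}
        patterns[new_suffix] = support
        # conditional pattern base: the prefix path of every node holding this item
        cond_base = []
        for n in header[item]:
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            cond_base.extend([path] * n.count)
        patterns.update(fp_growth(cond_base, min_sup, new_suffix))
    return patterns

# The five transactions from the example above, min_sup = 3
D = [list("EKMNOY"), list("DEKNOY"), list("AEKM"), list("CKMUY"), list("CEIKO")]
for pattern, sup in sorted(fp_growth(D, 3).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(pattern), sup)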
Pattern Evaluation Methods
 Most association rule mining algorithms employ a support–confidence
framework.
 Even with minimum support and confidence thresholds, a large number of the rules
generated may still be uninteresting to the users.
 Whether or not a rule is interesting can be assessed either subjectively or
objectively.
 Ultimately, only the user can judge if a given rule is interesting, and this
judgment, being subjective, may differ from one user to another.

Correlation Analysis
 The support and confidence measures are insufficient at filtering out
uninteresting association rules.
 A ⇒ B [support, confidence, lift]
 Lift is a simple correlation measure that is given as follows.
 The occurrence of itemset A is independent of the occurrence of itemset B
if P(A ∪ B) = P(A)P(B); otherwise, itemsets A and B are dependent and
correlated as events.
 The lift between the occurrence of A and B can be measured by computing:

lift(A, B) = P(A ∪ B) / (P(A) P(B))

Correlation Analysis
 If the resulting value of lift is less than 1, then the occurrence of A and B is
negatively correlated.
 In other words, the presence of A makes the presence of B less likely.
 if the resulting value of lift is greater than 1, then A and B are positively
correlated.
 This indicates that the presence of A makes the presence of B more likely.
 A lift value of 1 suggests independence, meaning that the presence of one
item doesn't affect the likelihood of the other item's presence.

Example: buys(X, "computer games") ⇒ buys(X, "videos")
• Total transactions = 10,000
• Transactions with computer games: 6,000
• Transactions with videos: 7,500
• Transactions with both computer games and videos: 4,000

lift(game, video) = P(game ∪ video) / (P(game) P(video))
                  = (4,000/10,000) / ((6,000/10,000) × (7,500/10,000))
                  = 0.40 / (0.60 × 0.75) = 0.89

Since 0.89 < 1, buying computer games and buying videos are negatively correlated.
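A one-function Python sketch of the same computation (illustrative names):

# A minimal sketch: lift of the rule "computer games -> videos"
def lift(n_total, n_a, n_b, n_both):
    """lift(A, B) = P(A and B) / (P(A) * P(B))."""
    p_a, p_b, p_both = n_a / n_total, n_b / n_total, n_both / n_total
    return p_both / (p_a * p_b)

value = lift(n_total=10_000, n_a=6_000, n_b=7_500, n_both=4_000)
print(round(value, 2))   # 0.89 -> less than 1, so the purchases are negatively correlated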
Questions
 What are support, confidence and lift? Explain with example.
 Explain maximal and closed item set.
 Explain Apriori algorithm with example.
 Explain FP- Tree algorithm with example.
 Explain steps to improve efficiency of apriori algorithm.
 Explain correlation analysis in frequent pattern mining with example.
 Explain market basket analysis with example.
