
Data Mining (DM)

2101CS521

Unit-3
Mining Frequent
Patterns,
Associations, and
Correlations
Prof. Jayesh D. Vagadiya
Computer Engineering
Department
Darshan Institute of Engineering & Technology, Rajkot
[email protected]
9537133260
Topics to be covered
• What Kinds of Patterns Can Be Mined?
• Market Basket Analysis
• Frequent Itemsets
• Association Rule
• Maximal and Closed Frequent Itemsets
• Apriori Algorithm
• Methods to Improve Apriori Efficiency
• FP-growth Algorithm
• Correlation
What Kinds of Patterns Can Be Mined?
 Data mining functionalities can be classified into two categories:
1. Descriptive (we are going to cover this part in this chapter)
2. Predictive

 Descriptive
• This task presents the general properties of data stored in a database.
• The descriptive tasks are used to find out patterns in data.
• E.g.: Frequent patterns, association, correlation etc.

 Predictive
• These tasks predict the value of one attribute on the basis of values of other
attributes.
• E.g.: predicting festival customer/product sales at a store

What Kinds of Patterns Can Be Mined?
 Mining Frequent Patterns:
• Frequent patterns are those patterns that occur frequently in data. The main
kinds of frequent patterns are listed below.

• Frequent Item Set


• It refers to a set of items that frequently
appear together, for example, milk and
bread.

• Frequent Subsequence
• A sequence that occurs frequently, such as purchasing a laptop, followed by
a digital camera and then a memory card.

• Frequent Sub Structure


• A substructure can refer to different structural forms (e.g., graphs, trees, or
lattices) that may be combined with itemsets or subsequences.
Market Basket Analysis
 Market Basket Analysis is a modelling technique to find frequent itemset.
 It is based on the idea that if you buy a certain group of items, you are more (or less)
likely to buy another group of items.
 For example, if you are in a store and you buy a car, then you are more
likely to also buy insurance at the same time than somebody who does not buy a car.
 The set of items that a customer buys is referred to as an itemset.
 Market basket analysis seeks to find relationships between purchases
(Items).
 E.g. IF {Car, Accessories} THEN {Insurance}
{Car, Accessories} → {Insurance}

Association Rule
 Association rule mining is the process of uncovering relationships among data and
determining association rules.
 It is used to discover interesting relationships and associations among
items or events in large datasets.
 E.g. IF {Computer} THEN {Software}

Association Rule Mining
 Given a set of transactions, we need rules that will predict the occurrence
of an item based on the occurrences of other items in the transaction.
 Market-Basket transactions:

TID | Items
1   | Bread, Milk
2   | Bread, Chocolate, Pepsi, Eggs
3   | Milk, Chocolate, Pepsi, Coke
4   | Bread, Milk, Chocolate, Pepsi
5   | Bread, Milk, Chocolate, Coke

 Example of Association Rules:
{Chocolate} → {Pepsi},
{Milk, Bread} → {Eggs, Coke},
{Pepsi, Bread} → {Milk}

Association Rule Mining Cont..
 Itemset
• A collection of one or more items
o E.g.: {Milk, Bread, Chocolate}
• k-itemset
o An itemset that contains k items
 Support count (σ)
• Frequency of occurrence of an itemset
o E.g. σ({Milk, Bread, Chocolate}) = 2
 Support
• Fraction of transactions that contain an itemset
o E.g. s({Milk, Bread, Chocolate}) = 2/5
 Frequent Itemset
• An itemset whose support is greater than or equal to a minimum support threshold
(All examples refer to the market-basket transaction table above.)
Association Rule Mining Cont..
 Association Rule
• An implication expression of the form X → Y, where X and Y are itemsets
• E.g.: {Milk, Chocolate} → {Pepsi}
 Rule Evaluation
• Support (s): fraction of transactions that contain both X and Y
• Confidence (c): measures how often items in Y appear in transactions that contain X

Example (on the transaction table above): find support and confidence for
{Milk, Chocolate} → {Pepsi}
s = σ({Milk, Chocolate, Pepsi}) / |D| = 2/5 = 0.4
c = σ({Milk, Chocolate, Pepsi}) / σ({Milk, Chocolate}) = 2/3 ≈ 0.67
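To make these measures concrete, here is a minimal Python sketch (not from the slides; the function names are illustrative) that computes support and confidence of {Milk, Chocolate} → {Pepsi} on the five transactions above.

# A minimal sketch: computing support and confidence of a rule X -> Y
# over the market-basket transactions shown above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item."""
    return sum(1 for t in transactions if itemset <= t)

def rule_measures(X, Y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    both = support_count(X | Y, transactions)
    support = both / len(transactions)
    confidence = both / support_count(X, transactions)
    return support, confidence

s, c = rule_measures({"Milk", "Chocolate"}, {"Pepsi"}, transactions)
print(f"support = {s:.2f}, confidence = {c:.2f}")  # support = 0.40, confidence = 0.67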
Association Rule Mining Cont..
A common strategy adopted by many association rule mining algorithms is to
decompose the problem into two major subtasks:
1. Frequent Itemset Generation
• The objective is to find all the item-sets that satisfy the minimum
support threshold.
• These item sets are called frequent item sets.
2. Rule Generation
• The objective is to extract all the high-confidence rules from the
frequent item sets found in the previous step.
• These rules are called strong rules.

Maximal and Closed Frequent Itemsets
 Closed Frequent Itemsets:
 A frequent itemset is closed if no (immediate) superset has the same support.

 Maximal Frequent Itemsets:


 A frequent itemset is maximal, if none of its (immediate) supersets is frequent.

The three sets are nested: Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets.

Maximal and Closed Frequent Itemsets – Example (Minimum Support = 3)

TID | Items
1   | A B C E
2   | A C D E
3   | B C E
4   | A C D E
5   | C D E
6   | A D E

{A} = 4; not closed due to {A,E}
{B} = 2; not frequent => ignore
{C} = 5; not closed due to {C,E}
{D} = 4; not closed due to {D,E}
{E} = 6; closed, but not maximal due to e.g. {D,E}
{A,B} = 1; not frequent => ignore
{A,C} = 3; not closed due to {A,C,E}
{A,D} = 3; not closed due to {A,D,E}
{A,E} = 4; closed, but not maximal due to {A,D,E}
{B,C} = 2; not frequent => ignore
{B,D} = 0; not frequent => ignore
{B,E} = 2; not frequent => ignore
{C,D} = 3; not closed due to {C,D,E}
{C,E} = 5; closed, but not maximal due to {C,D,E}
{D,E} = 4; closed, but not maximal due to {A,D,E}
{A,B,C} = 1; not frequent => ignore
{A,B,D} = 0; not frequent => ignore
{A,B,E} = 1; not frequent => ignore
{A,C,D} = 2; not frequent => ignore
{A,C,E} = 3; maximal frequent
{A,D,E} = 3; maximal frequent
{B,C,D} = 0; not frequent => ignore
{B,C,E} = 2; not frequent => ignore
{C,D,E} = 3; maximal frequent
{A,B,C,D} = 0; not frequent => ignore
{A,B,C,E} = 1; not frequent => ignore
{B,C,D,E} = 0; not frequent => ignore
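A small Python sketch (not part of the original slides; helper names are illustrative) that reproduces this classification by brute force on the same database.

from itertools import combinations

# A minimal brute-force sketch: classify itemsets of the example database
# as frequent / closed / maximal (min_sup = 3).
transactions = [set("ABCE"), set("ACDE"), set("BCE"),
                set("ACDE"), set("CDE"), set("ADE")]
min_sup = 3
items = sorted(set().union(*transactions))

def sup(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

# collect all frequent itemsets with their support counts
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = sup(cand)
        if s >= min_sup:
            frequent[frozenset(cand)] = s

def is_closed(X):
    # closed: no immediate superset has the same support count
    return all(sup(X | {i}) < frequent[X] for i in items if i not in X)

def is_maximal(X):
    # maximal: no immediate superset is frequent
    return all(sup(X | {i}) < min_sup for i in items if i not in X)

for X, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    tags = [t for t, ok in [("closed", is_closed(X)), ("maximal", is_maximal(X))] if ok]
    print(set(X), s, ", ".join(tags) or "frequent only")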
Apriori Algorithm
 It is used to mine frequent patterns.
 The algorithm makes use of prior knowledge about frequent item sets to
efficiently explore and generate larger item sets.
 Apriori employs an iterative approach known as a level-wise search, where
k-itemsets are used to explore (k + 1)-itemsets.
 First, the set of frequent 1-itemsets is found by scanning the database to
accumulate the count for each item, and collecting those items that
satisfy minimum support.
 The resulting set is denoted by L1. Next, L1 is used to find L2, the set of
frequent 2-itemsets, which is used to find L3, and so on, until no more
frequent k-itemsets can be found.

#2101CS521 (DM)  Unit 3 - Mining Frequent Patterns, Associations,


Prof. Jayesh D. Vagadiya 13
Apriori Algorithm - Example (Minimum Support = 2)

Transaction database D:
TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D for the count of each candidate 1-itemset (C1):
{1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3

Keep candidates with min. support (L1):
{1}: 2, {2}: 3, {3}: 3, {5}: 3

Generate candidate 2-itemsets from L1 (C2):
{1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D for the count of each candidate in C2:
{1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2

Keep candidates with min. support (L2):
{1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

Generate candidate 3-itemsets from L2 (C3): {1 2 3}, {1 3 5}, {2 3 5};
pruning removes {1 2 3} and {1 3 5} because {1 2} and {1 5} are not frequent
(their counts in D would only be {1 2 3}: 1 and {1 3 5}: 1).

Scan D for the count of the remaining candidate: {2 3 5}: 2

Keep candidates with min. support (L3): {2 3 5}: 2
Apriori Algorithm - Example Cont.. (Minimum Support = 2)

Rule Generation from the frequent itemset {2, 3, 5}:
Confidence(A → B) = support_count(A ∪ B) / support_count(A)

Association Rule | Support | Confidence | Confidence (%)
{2, 3} → {5}     | 2       | 2/2 = 1    | 100%
{3, 5} → {2}     | 2       | 2/2 = 1    | 100%
{2, 5} → {3}     | 2       | 2/3 = 0.66 | 66%
{2} → {3, 5}     | 2       | 2/3 = 0.66 | 66%
{3} → {2, 5}     | 2       | 2/3 = 0.66 | 66%
{5} → {2, 3}     | 2       | 2/3 = 0.66 | 66%
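Below is a minimal Python sketch (illustrative, not from the slides) that enumerates all rules from one frequent itemset and keeps those meeting a confidence threshold; support_count here is the σ(·) defined earlier.

from itertools import combinations

# A minimal sketch: generate rules A -> B from one frequent itemset
# and keep the high-confidence ("strong") ones.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(freq_itemset, min_conf):
    freq_itemset = frozenset(freq_itemset)
    whole = support_count(freq_itemset)
    rules = []
    # every non-empty proper subset A can be an antecedent; B is the rest
    for r in range(1, len(freq_itemset)):
        for A in map(frozenset, combinations(freq_itemset, r)):
            conf = whole / support_count(A)
            if conf >= min_conf:
                rules.append((set(A), set(freq_itemset - A), conf))
    return rules

for A, B, conf in rules_from_itemset({2, 3, 5}, min_conf=0.7):
    print(f"{A} -> {B}  confidence = {conf:.2f}")
# keeps {2, 3} -> {5} and {3, 5} -> {2} (confidence 1.0)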
Apriori Property
 All nonempty subsets of a frequent itemset must also be
frequent.
 E.g. if {A, B} is a frequent itemset, both {A} and {B} must also be frequent
itemsets.
 This property belongs to a special category of properties called
antimonotonicity in the sense that if a set cannot pass a test, all of its
supersets will fail the same test as well.
 It is used in apriori algorithm to improve the performance of algorithm.

Important steps in Apriori
 The Join Step:
 Create candidate k-itemsets by joining the set of frequent (k-1)-itemsets with itself.
 Ck is generated by joining Lk-1 with itself.
 The join, Lk−1 ⋈ Lk−1, is performed, where members of Lk−1 are joinable if
their first (k − 2) items are in common.

 The Pruning Step:


 Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-
itemset. (Apriori property)
 Hence, if any (k − 1)-subset of a candidate k-itemset is not in Lk−1, then
the candidate cannot be frequent either and so can be removed from Ck.
 This step improves performance by removing such candidates from Ck.

Apriori Algorithm Steps
INPUT:
# D, a database of transactions;
# min_support, the minimum support count threshold.

Algorithm:
Ck: Candidate itemset of size k
Lk: Frequent itemset of size k
L1= {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk                    // Join Step
    Prune Ck+1: any k-itemset that is not frequent cannot be a
        subset of a frequent (k+1)-itemset                 // Pruning Step
    for each transaction t in database do
        Increment the count of all candidates in Ck+1 that are
        contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;    // all frequent itemsets
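The following is a compact Python sketch of this level-wise procedure (an illustrative implementation, assuming min_support is given as an absolute count; the function names are our own, not from the slides).

from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    A level-wise sketch: L1 from a first scan, then repeatedly join Lk with
    itself, prune with the Apriori property, and count candidates in D.
    """
    transactions = [frozenset(t) for t in transactions]

    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    Lk = {i: c for i, c in counts.items() if c >= min_support}
    frequent = dict(Lk)

    k = 1
    while Lk:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates
        prev = sorted(Lk, key=sorted)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k + 1:
                candidates.add(union)
        # Prune step: drop candidates that have an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Count the remaining candidates with one scan of D
        cand_counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        Lk = {c: n for c, n in cand_counts.items() if n >= min_support}
        frequent.update(Lk)
        k += 1
    return frequent

# Example: the 4-transaction database used above, min_support = 2
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, count in sorted(apriori(D, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)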

Apriori steps
C1 → apply min_support → L1 (frequent 1-itemsets) → self-join and pruning →
C2 → apply min_support → L2 (frequent 2-itemsets) → self-join and pruning →
C3 → apply min_support → L3 (frequent 3-itemsets) → ... until the frequent N-itemsets are found.
Methods to Improve Apriori Efficiency
 Hash-based technique:
 Using a hash-based structure known as a hash table, the k-itemsets and their related
counts are generated.
 The table is generated using a hash function.
 For example, when scanning each transaction in the database to generate the
frequent 1-itemsets, L1, we can generate all the 2-itemsets for each transaction,
hash (i.e., map) them into the different buckets of a hash table structure, and
increase the corresponding bucket counts.

Transaction database:
TID | Items
1   | I1, I2, I5
2   | I2, I4
3   | I2, I3
4   | I1, I2, I4
5   | I1, I3
6   | I2, I3
7   | I1, I3
8   | I1, I2, I3, I5
9   | I1, I2, I3

C1 (support counts): I1: 6, I2: 7, I3: 6, I4: 2, I5: 2
Methods to Improve Apriori Efficiency
 Hash-based technique: hash table structure used to generate L2

Hash function: H(X, Y) = ((order of X) * 10 + (order of Y)) mod 7

2-itemset | Count | Bucket address
I1, I2    | 4     | (1*10+2) mod 7 = 5
I1, I3    | 4     | (1*10+3) mod 7 = 6
I1, I4    | 1     | (1*10+4) mod 7 = 0
I1, I5    | 2     | (1*10+5) mod 7 = 1
I2, I3    | 4     | (2*10+3) mod 7 = 2
I2, I4    | 2     | (2*10+4) mod 7 = 3
I2, I5    | 2     | (2*10+5) mod 7 = 4
I3, I4    | 0     | -
I3, I5    | 1     | (3*10+5) mod 7 = 0

Bucket address | 0                   | 1         | 2         | 3         | 4         | 5         | 6
Bucket count   | 2                   | 2         | 4         | 2         | 2         | 4         | 4
Bucket content | {I1,I4:1} {I3,I5:1} | {I1,I5:2} | {I2,I3:4} | {I2,I4:2} | {I2,I5:2} | {I1,I2:4} | {I1,I3:4}
Can contain L2 | NO                  | NO        | YES       | NO        | NO        | YES       | YES

A 2-itemset can be frequent only if its bucket count reaches the minimum support count,
so only buckets 2, 5 and 6 need to be considered when forming the candidate 2-itemsets.
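A minimal Python sketch of this bucket-counting idea (illustrative only; the hash function follows the slide, with item "Ii" mapped to order i).

from itertools import combinations

# A minimal sketch of the hash-based (bucket counting) technique:
# while scanning for L1, hash every 2-itemset of each transaction into a
# small table; buckets whose count stays below min_sup cannot contain
# any frequent 2-itemset, so their pairs are pruned from C2.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
min_sup = 3
NUM_BUCKETS = 7

def order(item):          # "I1" -> 1, "I2" -> 2, ...
    return int(item[1:])

def bucket(x, y):         # H(X, Y) = ((order of X)*10 + (order of Y)) mod 7
    a, b = sorted((order(x), order(y)))
    return (a * 10 + b) % NUM_BUCKETS

bucket_counts = [0] * NUM_BUCKETS
for t in transactions:
    for x, y in combinations(sorted(t, key=order), 2):
        bucket_counts[bucket(x, y)] += 1

print("bucket counts:", bucket_counts)              # [2, 2, 4, 2, 2, 4, 4]
# Keep only candidate pairs whose bucket count can reach min_sup
keep = lambda x, y: bucket_counts[bucket(x, y)] >= min_sup
print("prunable example:", not keep("I1", "I4"))    # True: bucket 0 count is 2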
Methods to Improve Apriori Efficiency
 Transaction reduction:
 A transaction that does not contain any frequent k-itemsets cannot contain any
frequent (k + 1)-itemsets.
 Therefore, such a transaction can be marked or removed.
 During this step, the algorithm further reduces the size of transactions by eliminating
items that are no longer frequent after the previous iteration.
 Since the eliminated items can't be part of any frequent itemsets, removing them
reduces the search space and improves efficiency.
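As an illustration of this idea, here is a small Python sketch (our own, not from the slides) that drops transactions containing no frequent k-itemsets before the next pass.

from itertools import combinations

# A minimal sketch of transaction reduction: a transaction that contains
# no frequent k-itemset cannot contribute to any frequent (k+1)-itemset,
# so it can be skipped in later scans.
def reduce_transactions(transactions, frequent_k, k):
    """Keep only transactions that contain at least one itemset from Lk."""
    frequent_k = {frozenset(i) for i in frequent_k}
    kept = []
    for t in transactions:
        if any(frozenset(c) in frequent_k for c in combinations(sorted(t), k)):
            kept.append(t)
    return kept

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
L2 = [{1, 3}, {2, 3}, {2, 5}, {3, 5}]
print(reduce_transactions(D, L2, 2))   # all four transactions still qualify here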

Methods to Improve Apriori Efficiency
 Partitioning:
 It consists of two phases.
 In phase I, the algorithm divides the transactions of D into n nonoverlapping
partitions.
 Find the frequent itemsets local to each partition (1 scan).
 Combine all local frequent itemsets to form candidate itemset.
 In phase II, find the global frequent itemsets among the candidates (2nd scan); the result
is the set of frequent itemsets in D.

Phase I: divide D into n non-overlapping partitions → find the frequent itemsets local to
each partition (1st scan, can be done in parallel) → combine all local frequent itemsets to
form the global candidate set.
Phase II: find the global frequent itemsets among the candidates (2nd scan) → frequent
itemsets in D.
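A minimal Python sketch of this two-phase idea (illustrative; the partition sizes and helper names are our own, and the local miner is a deliberately naive brute-force routine suitable only for tiny partitions).

from itertools import combinations

# A minimal sketch of the partitioning technique: mine each partition
# locally (phase I), union the local frequent itemsets as global
# candidates, then count them with one more full scan (phase II).
def local_frequent(partition, min_sup_count):
    """Brute-force local miner, fine for a tiny partition (sketch only)."""
    items = sorted(set().union(*partition))
    found = set()
    for k in range(1, len(items) + 1):
        for cand in map(frozenset, combinations(items, k)):
            if sum(1 for t in partition if cand <= t) >= min_sup_count:
                found.add(cand)
    return found

def partition_mine(transactions, min_sup_fraction, n_partitions):
    size = -(-len(transactions) // n_partitions)          # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase I: local frequent itemsets, support threshold scaled to the partition size
    # (flooring the local threshold only over-generates candidates, never misses one)
    candidates = set()
    for p in parts:
        candidates |= local_frequent(p, max(1, int(min_sup_fraction * len(p))))
    # Phase II: one scan of the full database to find the global frequent itemsets
    result = {}
    for cand in candidates:
        count = sum(1 for t in transactions if cand <= t)
        if count >= min_sup_fraction * len(transactions):
            result[cand] = count
    return result

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, count in sorted(partition_mine(D, 0.5, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)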

Methods to Improve Apriori Efficiency
 Sampling:
 A random sample S is selected from database D, and then a search is conducted for
frequent itemsets within that sample S.
 In this way, we trade off some degree of accuracy against efficiency.
 These frequent itemsets are called sample frequent itemsets.
 More than one sample could be used to improve accuracy.

Methods to Improve Apriori Efficiency
 Dynamic itemset counting:
 Dynamic itemset counting refers to the process of incrementally updating the
support counts of itemsets as new transactions are added to the dataset.
 This is particularly useful when dealing with dynamic or streaming data where
transactions arrive over time.
 Instead of recalculating support counts from scratch whenever new data arrives,
dynamic counting efficiently maintains and updates the support counts of existing
itemsets.
 The technique uses the count-so-far as the lower bound of the actual count.
 If the count-so-far passes the minimum support, the itemset is added into the
frequent itemset collection and can be used to generate longer candidates.
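A tiny Python sketch of keeping count-so-far values up to date as new transactions arrive (illustrative; the class and method names are our own).

# A minimal sketch of dynamic itemset counting: support counts are kept as
# running totals and updated as new transactions stream in, instead of
# being recomputed from scratch.
class DynamicCounter:
    def __init__(self, itemsets, min_sup_count):
        self.counts = {frozenset(i): 0 for i in itemsets}   # count-so-far per itemset
        self.min_sup_count = min_sup_count

    def add_transaction(self, transaction):
        t = set(transaction)
        for itemset in self.counts:
            if itemset <= t:
                self.counts[itemset] += 1

    def frequent_so_far(self):
        # count-so-far is a lower bound on the true count, so anything that
        # already passes min_sup can be used to generate longer candidates
        return {i for i, c in self.counts.items() if c >= self.min_sup_count}

counter = DynamicCounter([{"milk"}, {"bread"}, {"milk", "bread"}], min_sup_count=2)
for t in [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]:
    counter.add_transaction(t)
print(counter.frequent_so_far())
# {'milk'} and {'bread'} reach the threshold; {'milk', 'bread'} has count-so-far 1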

Disadvantages of Apriori
 It may still need to generate a huge number of candidate sets.
 For example, if there are 10^4 frequent 1-itemsets, the Apriori algorithm will need to
generate more than 10^7 candidate 2-itemsets.
 It may need to repeatedly scan the whole database and check a large set
of candidates by pattern matching.
 It is costly to go over each transaction in the database to determine the support of
the candidate itemsets.

Apriori Algorithm (Try Yourself!)
A database has 4 transactions. Let Min_sup = 50% and Min_conf = 75%.

TID  | Items
1000 | Cheese, Milk, Cookies
2000 | Butter, Milk, Bread
3000 | Cheese, Butter, Milk, Bread
4000 | Butter, Bread

How to convert a percentage support into a count: count = (given % / 100) × total records;
here, (50 / 100) × 4 = 2.

Frequent Itemset        | Sup
{Butter, Milk, Bread}   | 2

Sr.    | Association Rule       | Support | Confidence | Confidence (%)
Rule 1 | Butter ∧ Milk → Bread  | 2       | 2/2 = 1    | 100%
Rule 2 | Milk ∧ Bread → Butter  | 2       | 2/2 = 1    | 100%
Rule 3 | Butter ∧ Bread → Milk  | 2       | 2/3 = 0.66 | 66%
Rule 4 | Butter → Milk ∧ Bread  | 2       | 2/3 = 0.66 | 66%
Rule 5 | Milk → Butter ∧ Bread  | 2       | 2/3 = 0.66 | 66%

With Min_conf = 75%, only Rule 1 and Rule 2 are strong.
FP-growth
 FP-growth (frequent-pattern growth) finds frequent itemsets without candidate generation.
 First, it compresses the database representing frequent items into a
frequent pattern tree or FP tree.
 Once an FP-tree has been constructed, it uses a recursive divide-and-
conquer approach to mine the frequent item sets.

FP-growth Example (Minimum Support = 3)

Transaction database:
TID | Items
1   | E, K, M, N, O, Y
2   | D, E, K, N, O, Y
3   | A, E, K, M
4   | C, K, M, U, Y
5   | C, E, I, K, O

Step 1: Find the frequent 1-itemsets (Min_Sup ≥ 3).
Item frequencies: A:1, C:2, D:1, E:4, I:1, K:5, M:3, N:2, O:3, U:1, Y:3
Frequent items, sorted in decreasing support count: {K:5, E:4, M:3, O:3, Y:3}

Step 2: Rewrite each transaction with its items sorted by this order, ignoring the
infrequent items:
TID | Sorted Items
1   | K E M O Y
2   | K E O Y
3   | K E M
4   | K M Y
5   | K E O

Building the FP-Tree:
 Scan the data to determine the support count of each item.
 Infrequent items are discarded, while the frequent items are sorted in decreasing
support counts.
 Make a second pass over the data to construct the FP-tree.
 As the transactions are read, before being processed, their items are sorted according
to the above order.
FP-growth Example – FP-tree construction
The sorted transactions are inserted one by one into the tree (shared prefixes only
increment the counts of existing nodes):
1. Insert K E M O Y:  null → K:1 → E:1 → M:1 → O:1 → Y:1
2. Insert K E O Y:    K:2, E:2; new branch E → O:1 → Y:1
3. Insert K E M:      K:3, E:3, M:2
4. Insert K M Y:      K:4; new branch K → M:1 → Y:1
5. Insert K E O:      K:5, E:4, O:2 (on the E → O branch)

Final FP-tree (header table: K:5, E:4, M:3, O:3, Y:3):
null
└── K:5
    ├── E:4
    │   ├── M:2 ── O:1 ── Y:1
    │   └── O:2 ── Y:1
    └── M:1 ── Y:1
FP-growth Example – Conditional Pattern Base
For each frequent item, collect the prefix paths that lead to it in the FP-tree
(with the item's count on that path):

Item | Conditional Pattern Base
Y    | {KEMO:1} {KEO:1} {KM:1}
O    | {KEM:1} {KE:2}
M    | {KE:2} {K:1}
E    | {K:4}
K    | -
Conditional FP-tree (Minimum Support = 3)

Item | Conditional Pattern Base   | Conditional FP-tree
Y    | {KEMO:1} {KEO:1} {KM:1}    | {K:3}
O    | {KEM:1} {KE:2}             | {K:3, E:3}

(For Y, only K reaches a count of 3 across its prefix paths; E, M and O do not meet the
minimum support. For O, both K and E reach a count of 3.)
Conditional FP-tree (Minimum Support = 3)

Item | Conditional Pattern Base | Conditional FP-tree
M    | {KE:2} {K:1}             | {K:3}
E    | {K:4}                    | {K:4}
FP-growth Example – Conditional FP-tree and Frequent Patterns Generated

Item | Conditional Pattern Base   | Conditional FP-tree | Frequent Patterns Generated
Y    | {KEMO:1} {KEO:1} {KM:1}    | {K:3}               | {K,Y:3}
O    | {KEM:1} {KE:2}             | {K:3, E:3}          | {K,O:3} {E,O:3} {K,E,O:3}
M    | {KE:2} {K:1}               | {K:3}               | {K,M:3}
E    | {K:4}                      | {K:4}               | {K,E:4}
K    | -                          | -                   | -
FP-growth Algorithm
INPUT:
# D, a database of transactions;
# min_support, the minimum support count threshold.

Algorithm:
1. The FP-tree is constructed as follows:
   a. Scan the transaction database D once. Collect F, the set of frequent items,
      and their support counts. Sort F in support-count-descending order as L, the
      list of frequent items.
   b. Create the root of an FP-tree, and label it as "null." For each transaction
      Trans in D do the following:
      i.  Select and sort the frequent items in Trans according to the order of L.
      ii. Let the sorted frequent-item list in Trans be [p|P], where p is the first
          element and P is the remaining list. Call insert_tree([p|P], T).
2. The FP-tree is mined by calling FP_growth(FP-tree, null).

FP-growth Algorithm
Insert_tree([p|P], T):
    If T has a child N such that N.item-name = p.item-name then
        Increment N's count by 1
    else:
        Create a new node N, let its count be 1, link its parent to T, and link it
        to the other nodes with the same item-name via the node-link structure.
    If P is nonempty then
        call insert_tree(P, N) recursively.

FP_growth(Tree, α):
    if Tree contains a single path P then
        for each combination (denoted as β) of the nodes in the path P
            generate pattern β ∪ α with support count = minimum support count of the
            nodes in β;
    else for each a_i in the header of Tree
        generate pattern β = a_i ∪ α with support count = a_i.support_count;
        construct β's conditional pattern base and then β's conditional FP-tree Tree_β;
        if Tree_β ≠ ∅ then
            call FP_growth(Tree_β, β);
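For reference, here is a compact Python sketch of FP-growth (an illustrative implementation, not the slides' exact code; it omits the single-path optimization and the class/function names are our own). On the five-transaction example above with min_sup = 3 it reproduces the patterns in the table.

from collections import defaultdict

# A compact FP-growth sketch: build the FP-tree from frequency-sorted
# transactions, then mine it recursively through conditional pattern bases.
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_sup):
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    order = {i: c for i, c in counts.items() if c >= min_sup}   # frequent items only
    root, header = Node(None, None), defaultdict(list)          # header: item -> nodes
    for t in transactions:
        node = root
        # sort by decreasing support count, breaking ties by item name
        for item in sorted((i for i in t if i in order), key=lambda i: (-order[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, order

def fp_growth(transactions, min_sup, suffix=frozenset()):
    _, header, order = build_tree(transactions, min_sup)
    patterns = {}
    for item in header:                       # each frequent item in this (conditional) tree
        support = sum(n.count for n in header[item])
        new_suffix = suffix | {item}
        patterns[new_suffix] = support
        # conditional pattern base: the prefix path of every node holding this item
        cond_base = []
        for n in header[item]:
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            cond_base.extend([path] * n.count)
        patterns.update(fp_growth(cond_base, min_sup, new_suffix))
    return patterns

# The five transactions from the example above, min_sup = 3
D = [list("EKMNOY"), list("DEKNOY"), list("AEKM"), list("CKMUY"), list("CEIKO")]
for pattern, sup in sorted(fp_growth(D, 3).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(pattern), sup)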
Pattern Evaluation Methods
 Most association rule mining algorithms employ a support–confidence
framework.
 Even with minimum support and confidence thresholds, a large number of the rules
generated may still be uninteresting to the users.
 Whether or not a rule is interesting can be assessed either subjectively or
objectively.
 Ultimately, only the user can judge if a given rule is interesting, and this
judgment, being subjective, may differ from one user to another.

Correlation Analysis
 The support and confidence measures are insufficient at filtering out
uninteresting association rules.
 A ⇒ B [support, confidence, lift]
 Lift is a simple correlation measure that is given as follows.
 The occurrence of itemset A is independent of the occurrence of itemset B
if P(A ∪ B) = P(A)P(B); otherwise, itemsets A and B are dependent and
correlated as events.
 The lift between the occurrence of A and B can be measured by computing:

lift(A, B) = P(A ∪ B) / (P(A) P(B))

Correlation Analysis
 If the resulting value of lift is less than 1, then the occurrence of A and B is
negatively correlated.
 In other words, the presence of A makes the presence of B less likely.
 if the resulting value of lift is greater than 1, then A and B are positively
correlated.
 This indicates that the presence of A makes the presence of B more likely.
 A lift value of 1 suggests independence, meaning that the presence of one
item doesn't affect the likelihood of the other item's presence.

Example: buys(X, "computer games") ⇒ buys(X, "videos")
• Total transactions = 10,000
• Transactions with computer games: 6,000
• Transactions with videos: 7,500
• Transactions with both computer games and videos: 4,000

lift(game, video) = P(game ∪ video) / (P(game) P(video))
                  = (4,000/10,000) / ((6,000/10,000) × (7,500/10,000))
                  = 0.40 / (0.60 × 0.75) = 0.89

Since 0.89 < 1, buying computer games and buying videos are negatively correlated.
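A one-function Python sketch of the same computation (illustrative names):

# A minimal sketch: lift of the rule "computer games -> videos"
def lift(n_total, n_a, n_b, n_both):
    """lift(A, B) = P(A and B) / (P(A) * P(B))."""
    p_a, p_b, p_both = n_a / n_total, n_b / n_total, n_both / n_total
    return p_both / (p_a * p_b)

value = lift(n_total=10_000, n_a=6_000, n_b=7_500, n_both=4_000)
print(round(value, 2))   # 0.89 -> less than 1, so the purchases are negatively correlated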
Questions
 What are support, confidence and lift? Explain with example.
 Explain maximal and closed item set.
 Explain Apriori algorithm with example.
 Explain FP- Tree algorithm with example.
 Explain steps to improve efficiency of apriori algorithm.
 Explain correlation analysis in frequent pattern mining with example.
 Explain market basket analysis with example.
