Data Mining Frequent Patterns

Frequent pattern mining, also known as association rule mining, is a data mining technique used to identify patterns or associations in large datasets, with applications in market basket analysis and recommendation systems. Key concepts include frequent itemsets, support, confidence, and lift, which help evaluate the strength of associations between items. The document outlines the Apriori and FP-Growth algorithms for mining frequent itemsets and generating association rules, highlighting their processes and advantages.

MINING FREQUENT PATTERNS (Unit-03)
INTRODUCTION
Frequent pattern mining is a crucial data mining
technique aimed at identifying patterns or
associations that occur frequently within a dataset.
It is also known as ASSOCIATION RULE MINING. The
process involves analyzing large datasets to find items
or sets of items that appear together frequently,
revealing the connections and affiliations among
different components or features within the data.
These patterns provide valuable insights for various
applications, such as market basket analysis,
recommendation systems, and anomaly detection.
KEY CONCEPTS IN
FREQUENT PATTERN MINING:
Frequent Itemsets: These are groups of items (or features) that
appear together in a dataset at least as often as a user-defined
threshold, called the minimum support. The support of an itemset
is the proportion of transactions in which the itemset appears.
Association Rules: These are implications of the form A → B,
where if itemset A occurs, itemset B is likely to occur as well.
The confidence of the rule is the probability that itemset B
appears in transactions where A appears.
Support: Support is a measure of how often an itemset appears in
the dataset. It's usually defined as the fraction of transactions in
which the itemset appears. A higher support indicates a more
frequent pattern.
Support(X) = Number of transactions containing X / Total number of transactions
Confidence: Confidence is a measure of the reliability of an
association rule. It represents the likelihood that item B is
bought when item A is bought. It is calculated from the support
of the combined itemset and the support of the antecedent.
Formula: Confidence(A⇒B) = Support(A∪B) / Support(A)
Example: If the support of {bread, butter} is 0.30 and the
support of {bread} is 0.50, the confidence of the rule
{bread} → {butter} is 0.30 / 0.50 = 0.60, meaning there's a
60% chance that butter is bought when bread is bought.
Lift: Lift is a measure of the strength of an
association rule. It compares the observed
support of an itemset with the expected support if
the items were independent.
Formula: Lift(A⇒B) = Support(A∪B) / (Support(A) × Support(B))
A lift value greater than 1 indicates a positive
association (i.e., the items tend to be bought
together more often than expected), while a value
less than 1 indicates a negative association (i.e.,
the items are less likely to be bought together
than expected).
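These three measures can be computed directly from a list of transactions. The following is a minimal Python sketch (the function names `support`, `confidence`, and `lift` are illustrative, not from any particular library), using the bread/butter/milk transactions worked through later in this unit:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(A, B, transactions):
    """Support(A ∪ B) / Support(A)."""
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    """Confidence(A → B) / Support(B)."""
    return confidence(A, B, transactions) / support(B, transactions)

transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"butter", "milk"},
    {"bread", "butter", "milk"}, {"bread", "butter"},
]
print(round(support({"bread", "butter"}, transactions), 4))     # 3/5 = 0.6
print(round(confidence({"bread"}, {"butter"}, transactions), 4))  # 0.6/0.8 = 0.75
print(round(lift({"bread"}, {"butter"}, transactions), 4))      # 0.75/0.8 = 0.9375
```

A lift below 1, as here, signals that the pair co-occurs slightly less often than independence would predict.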
ASSOCIATION RULE MINING
Association rule mining is a popular technique
in data mining that focuses on discovering
interesting relationships (associations)
between variables in large datasets. It is
commonly used in market basket analysis,
where the goal is to find associations between
different products that are frequently bought
together. The most well-known algorithm for
association rule mining is Apriori, but other
algorithms, such as FP-growth, are also used.
KEY CONCEPTS IN
ASSOCIATION RULE MINING
Association Rules: An association rule is of the form:
X→Y
where: X (antecedent) is the condition or set of items that
imply the presence of Y (consequent). For example, "if a
customer buys bread (X), they are likely to buy butter (Y)."
EXAMPLE DATASET
Let's consider a small dataset of transactions at a retail store:
Transaction ID Items
T1 {Bread, Butter}
T2 {Bread, Milk}
T3 {Butter, Milk}
T4 {Bread, Butter, Milk}
T5 {Bread, Butter}
This dataset shows 5 transactions. We are interested in
discovering association rules between the items.
STEP 1: GENERATE
FREQUENT ITEMSETS
Step 1.1: Find 1-itemsets (individual items). We first count how
frequently each individual item appears in the transactions.
Bread appears in T1, T2, T4, T5 → Support = 4/5 = 0.8.
Butter appears in T1, T3, T4, T5 → Support = 4/5 = 0.8.
Milk appears in T2, T3, T4 → Support = 3/5 = 0.6.
Step 1.2: Find 2-itemsets (pairs of items). Next, we generate pairs
of items and count how frequently each pair occurs together.
{Bread, Butter} appears in T1, T4, T5 → Support = 3/5 = 0.6.
{Bread, Milk} appears in T2, T4 → Support = 2/5 = 0.4.
{Butter, Milk} appears in T3, T4 → Support = 2/5 = 0.4.
Step 1.3: Find 3-itemsets (triplets of items). We can also generate
triplets, though these are less common and are built from the
frequent smaller itemsets.
{Bread, Butter, Milk} appears in T4 → Support = 1/5 = 0.2.
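The three sub-steps above can be reproduced by brute-force enumeration over the five transactions. A short Python sketch, assuming a minimum support threshold of 0.4 so that only the 3-itemset is filtered out:

```python
from itertools import combinations

transactions = [
    {"Bread", "Butter"}, {"Bread", "Milk"}, {"Butter", "Milk"},
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"},
]
items = sorted(set().union(*transactions))
min_support = 0.4

# Enumerate every 1-, 2-, and 3-itemset and keep those meeting min_support.
frequent = {}
for k in (1, 2, 3):
    for cand in combinations(items, k):
        sup = sum(set(cand) <= t for t in transactions) / len(transactions)
        if sup >= min_support:
            frequent[cand] = sup

for itemset, sup in frequent.items():
    print(itemset, sup)
# Prints the three 1-itemsets and three 2-itemsets from Steps 1.1-1.2;
# ('Bread', 'Butter', 'Milk') is dropped at support 0.2.
```

Real algorithms like Apriori avoid this exhaustive enumeration, but on five transactions it makes the support arithmetic easy to check.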
STEP 2: GENERATE
ASSOCIATION RULES
Now that we have frequent itemsets, we can generate
association rules from them. Let's consider generating
rules from the itemset {Bread, Butter}:
Rule 1: {Bread} → {Butter}
Rule 2: {Butter} → {Bread}
We will now calculate the Support, Confidence, and Lift for
these rules.
Rule 1: {Bread} → {Butter}
Support:
Support(Bread→Butter) = Number of transactions containing
both Bread and Butter / Total number of transactions
= 3/5 = 0.6.
Confidence:
Confidence(Bread→Butter) = Number of transactions containing
both Bread and Butter / Number of transactions containing Bread
= 3/4 = 0.75.
Lift:
Lift(Bread→Butter) = Confidence(Bread→Butter) / Support(Butter)
= 0.75/0.8 = 0.9375
The Lift value of 0.9375 indicates that buying bread
decreases the likelihood of buying butter slightly,
suggesting a weak association.
Rule 2: {Butter} → {Bread}
Support:
Support(Butter→Bread) = Number of transactions containing
both Butter and Bread / Total number of transactions
= 3/5 = 0.6
Confidence:
Confidence(Butter→Bread) = Number of transactions containing
both Butter and Bread / Number of transactions containing Butter
= 3/4 = 0.75
Lift:
Lift(Butter→Bread) = Confidence(Butter→Bread) / Support(Bread)
= 0.75/0.8 = 0.9375
This rule also has a Lift of 0.9375, similar to the previous one.
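The numbers for both rules can be checked with a few lines of Python over the five-transaction dataset (the variable names are illustrative):

```python
transactions = [
    {"Bread", "Butter"}, {"Bread", "Milk"}, {"Butter", "Milk"},
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"},
]
n = len(transactions)
n_bread  = sum("Bread" in t for t in transactions)              # 4
n_butter = sum("Butter" in t for t in transactions)             # 4
n_both   = sum({"Bread", "Butter"} <= t for t in transactions)  # 3

# Rule 1: {Bread} -> {Butter}
support_r1    = n_both / n             # 3/5 = 0.6
confidence_r1 = n_both / n_bread       # 3/4 = 0.75
lift_r1       = confidence_r1 / (n_butter / n)   # 0.75 / 0.8 = 0.9375

# Rule 2: {Butter} -> {Bread}: same support; confidence and lift also
# match here only because Bread and Butter happen to have equal support.
confidence_r2 = n_both / n_butter      # 3/4 = 0.75
lift_r2       = confidence_r2 / (n_bread / n)    # 0.75 / 0.8 = 0.9375

print(round(lift_r1, 4), round(lift_r2, 4))
```

Note that support is symmetric for the two rules by definition, while equal confidence and lift are a coincidence of this dataset.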
STEP 3: INTERPRET RESULTS
The two rules we generated indicate that there
is a moderate association between Bread and
Butter. Specifically, the Confidence values are
high (0.75), meaning that when one of these
items is purchased, the likelihood of purchasing
the other item is 75%. However, the Lift values
are slightly less than 1, indicating that the two
items are bought together slightly less often than
would be expected if they were independent, so the
association between them is weak rather than strong.
THE TWO-STEP PROCESS OF
ASSOCIATION RULE MINING:
Frequent Itemset Generation: Identify the
frequent item sets by scanning the dataset
and applying a minimum support threshold.
Rule Generation: Generate association rules
from the frequent item sets, then evaluate and
prune rules based on confidence and lift
thresholds.
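The second step can be sketched as follows: every frequent itemset of size two or more is split into antecedent/consequent pairs, and rules below a confidence threshold are pruned (lift pruning would work the same way). This is an illustrative Python sketch, not a full implementation:

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules and
    keep those meeting min_confidence. `frequent` maps frozensets to their
    support; by the Apriori property it contains every subset of each
    frequent itemset, so the lookup below always succeeds."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

frequent = {frozenset({"Bread"}): 0.8, frozenset({"Butter"}): 0.8,
            frozenset({"Bread", "Butter"}): 0.6}
for a, b, conf in generate_rules(frequent, min_confidence=0.7):
    print(a, "->", b, round(conf, 2))
```

With a confidence threshold of 0.7, both {Bread} → {Butter} and {Butter} → {Bread} survive at confidence 0.75.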
MARKET BASKET ANALYSIS
Market Basket Analysis is a data mining
technique used to discover patterns or
relationships between items that customers
frequently buy together in retail environments.
The primary goal of this analysis is to find
associations between products that often
appear together in transactions, which can
then be used for various business purposes
such as cross-selling, up-selling, inventory
management, and personalized marketing.
APRIORI AND FREQUENT
PATTERN GROWTH
ALGORITHM
Apriori Algorithm and Frequent Pattern Growth (FP-Growth)
Algorithm are two popular techniques used for mining
frequent itemsets and association rules in large datasets,
particularly in the context of market basket analysis. Let's
dive into both algorithms:
1. Apriori Algorithm:
The Apriori algorithm is one of the earliest and most widely
used algorithms for mining frequent itemsets in
transactional databases. It works by iteratively identifying
itemsets that meet a predefined minimum support
threshold.
STEPS IN THE APRIORI
ALGORITHM:
1. Generate Candidate Itemsets: Start by scanning the
database to identify all frequent 1-itemsets (items that
appear at least a minimum number of times).
For each pass, generate candidate itemsets of size k (i.e.,
pairs, triples, etc.), which are formed from frequent itemsets
of size (k-1).
2. Count Support for Candidate Itemsets: Scan the database
again and count the frequency of each candidate itemset.
If the frequency (support) of a candidate itemset meets or
exceeds the minimum support threshold, it is considered a
frequent itemset.
3. Prune Infrequent Itemsets: Remove itemsets that do not
meet the minimum support threshold from the candidate
list. This reduces the size of the search space for the next
iteration.
4. Repeat Until No More Frequent Itemsets: The process
repeats by generating larger candidate itemsets and
checking their support, until no more frequent itemsets
are found.
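The four steps above can be condensed into one loop. Below is a minimal Python sketch of the candidate/count/prune cycle; it uses a simple union-based candidate generation rather than the classic prefix-join, so treat it as illustrative rather than optimized:

```python
def apriori(transactions, min_support):
    """Sketch of the Apriori loop: generate candidates, count support,
    prune, and repeat with larger itemsets until none survive."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]

    # Step 1: candidate 1-itemsets are all items seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    all_frequent = {}
    k = 1
    while candidates:
        # Step 2: count support for each candidate in one database scan.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        # Step 3: prune candidates below the minimum support threshold.
        frequent = {c: cnt / n for c, cnt in counts.items()
                    if cnt / n >= min_support}
        all_frequent.update(frequent)
        # Step 4: build (k+1)-candidates from surviving k-itemsets, repeat.
        k += 1
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
    return all_frequent

freq = apriori([{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk"}], min_support=0.5)
print(freq[frozenset({"milk", "bread"})])  # 2 of 4 transactions -> 0.5
```

On these four transactions, {milk, butter} appears only once (support 0.25) and is pruned, so no 3-itemset candidate survives.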

Example:
If you're mining itemsets from a supermarket dataset
where the transactions include items like milk, bread,
butter, etc., Apriori would identify frequent itemsets (like
{milk, bread}) and generate association rules like:
{milk} -> {bread}: If a customer buys milk, they are likely
to buy bread.
2. FREQUENT PATTERN
GROWTH (FP-GROWTH)
ALGORITHM
The FP-Growth algorithm is an alternative to Apriori, designed to
be faster and more efficient. While Apriori generates candidate
itemsets and scans the database multiple times, FP-Growth uses a
compact data structure (called an FP-tree) to avoid generating
candidate itemsets and reduce the number of database scans.
Steps in the FP-Growth Algorithm:
1. Build the FP-tree: Scan the database to find frequent 1-itemsets
and then order items in the transactions based on frequency.
Construct the FP-tree by inserting transactions into the tree,
ensuring that common prefixes (frequent itemsets) are shared
among transactions.
2. Mining the FP-tree: Once the FP-tree is built, it is
recursively mined for frequent itemsets. Each
conditional pattern base is mined by constructing a
conditional FP-tree and applying the same process.
3. Recursive Process: For each item in the FP-tree,
recursively mine the conditional FP-tree for frequent
itemsets, resulting in a set of frequent patterns.
ADVANTAGES OF FP-
GROWTH OVER APRIORI:
Efficiency: FP-Growth is generally more efficient than
Apriori, especially with large datasets. It reduces the
number of database scans (just two passes in most cases).
No Candidate Generation: Unlike Apriori, FP-Growth
doesn't generate candidate itemsets, avoiding the
combinatorial explosion that can occur with Apriori.
When to Use:
Apriori: Suitable for smaller datasets or when memory
usage is less of a concern.
FP-Growth: Preferred for larger datasets, as it is more
efficient and scalable.
EXAMPLE:
Imagine we have the following transactions in a
database:
T1: {A, B, C}
T2: {A, C, D}
T3: {B, C, D}
FP-Growth would:
Construct an FP-tree by ordering items based on
frequency and organizing the transactions.
Recursively mine the FP-tree to identify frequent
itemsets.
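For these three transactions, the construction step can be sketched in Python. This builds only the FP-tree itself; the recursive conditional-tree mining is omitted for brevity, and the class and function names are illustrative:

```python
from collections import Counter

class FPNode:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support_count=1):
    # Pass 1: count item frequencies and keep the frequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in counts.items() if c >= min_support_count}

    # Pass 2: insert each transaction with items ordered by descending
    # frequency, so transactions with common prefixes share branches.
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root

tree = build_fp_tree([{"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"}])
c = tree.children["C"]
print(c.count)             # 3 — C is most frequent, so all paths start at C
print(sorted(c.children))  # ['A', 'B']
```

Because C appears in every transaction, all three paths share the single C node: this prefix sharing is exactly the compression that lets FP-Growth avoid repeated database scans.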
