Data Mining Frequent Patterns

Frequent pattern mining, also known as association rule mining, is a data mining technique used to identify patterns or associations in large datasets, with applications in market basket analysis and recommendation systems. Key concepts include frequent itemsets, support, confidence, and lift, which help evaluate the strength of associations between items. The document outlines the Apriori and FP-Growth algorithms for mining frequent itemsets and generating association rules, highlighting their processes and advantages.

MINING FREQUENT PATTERNS (Unit-03)
INTRODUCTION
Frequent pattern mining is a crucial data mining
technique aimed at identifying patterns or
associations that occur frequently within a dataset.
It is also known as ASSOCIATION RULE MINING. The
process involves analyzing large datasets to find items
or sets of items that appear together frequently,
revealing the connections and affiliations among
different components or features within the data.
These patterns provide valuable insights for various
applications, such as market basket analysis,
recommendation systems, and anomaly detection.
KEY CONCEPTS IN
FREQUENT PATTERN MINING:
Frequent Itemsets: These are groups of items (or features) that
appear together in a dataset at least as often as a user-defined
threshold, called the minimum support. The support of an itemset
is the proportion of transactions in which the itemset appears.
Association Rules: These are implications of the form A → B,
where if itemset A occurs, itemset B is likely to occur as well.
The confidence of the rule is the probability that itemset B
appears in transactions where A appears.
Support: Support is a measure of how often an itemset appears in
the dataset. It's usually defined as the fraction of transactions in
which the itemset appears. A higher support indicates a more
frequent pattern.
Support(X) = Number of transactions containing X / Total number of transactions
Confidence: Confidence is a measure of the reliability of an
association rule. It represents the likelihood that item B is
bought when item A is bought. It is calculated from the support
of the combined itemset and the support of the antecedent.
Formula: Confidence(A⇒B) = Support(A∪B) / Support(A)
Example: If the support of {bread, butter} is 0.30 and the
support of {bread} is 0.50, the confidence of the rule
{bread} → {butter} is 0.30 / 0.50 = 0.60, meaning there's a
60% chance that butter is bought when bread is bought.
Lift: Lift is a measure of the strength of an
association rule. It compares the observed
support of an itemset with the expected support if
the items were independent.
Formula: Lift(A⇒B) = Support(A∪B) / (Support(A) × Support(B))
A lift value greater than 1 indicates a positive
association (i.e., the items tend to be bought
together more often than expected), while a value
less than 1 indicates a negative association (i.e.,
the items are less likely to be bought together
than expected).
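These three measures can be computed directly from a list of transactions. The following is a minimal Python sketch (the function names `support`, `confidence`, and `lift` are illustrative, not from any particular library), using the bread/butter/milk transactions worked through later in this unit:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(A, B, transactions):
    """Support(A ∪ B) / Support(A)."""
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    """Confidence(A → B) / Support(B)."""
    return confidence(A, B, transactions) / support(B, transactions)

transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"butter", "milk"},
    {"bread", "butter", "milk"}, {"bread", "butter"},
]
print(round(support({"bread", "butter"}, transactions), 4))     # 3/5 = 0.6
print(round(confidence({"bread"}, {"butter"}, transactions), 4))  # 0.6/0.8 = 0.75
print(round(lift({"bread"}, {"butter"}, transactions), 4))      # 0.75/0.8 = 0.9375
```

A lift below 1, as here, signals that the pair co-occurs slightly less often than independence would predict.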
ASSOCIATION RULE MINING
Association rule mining is a popular technique
in data mining that focuses on discovering
interesting relationships (associations)
between variables in large datasets. It is
commonly used in market basket analysis,
where the goal is to find associations between
different products that are frequently bought
together. The most well-known algorithm for
association rule mining is Apriori, but other
algorithms, such as FP-growth, are also used.
KEY CONCEPTS IN
ASSOCIATION RULE MINING
Association Rules: An association rule is of the form:
X→Y
where: X (antecedent) is the condition or set of items that
imply the presence of Y (consequent). For example, "if a
customer buys bread (X), they are likely to buy butter (Y)."
EXAMPLE DATASET
Let's consider a small dataset of transactions at a retail store:
Transaction ID Items
T1 {Bread, Butter}
T2 {Bread, Milk}
T3 {Butter, Milk}
T4 {Bread, Butter, Milk}
T5 {Bread, Butter}
This dataset shows 5 transactions. We are interested in
discovering association rules between the items.
STEP 1: GENERATE
FREQUENT ITEMSETS
Step 1.1: Find 1-itemsets (individual items). We first count how
frequently each individual item appears in the transactions.
Bread appears in T1, T2, T4, T5 → Support = 4/5 = 0.8.
Butter appears in T1, T3, T4, T5 → Support = 4/5 = 0.8.
Milk appears in T2, T3, T4 → Support = 3/5 = 0.6.
Step 1.2: Find 2-itemsets (pairs of items). Next, we generate pairs
of items and count how frequently each pair occurs together.
{Bread, Butter} appears in T1, T4, T5 → Support = 3/5 = 0.6.
{Bread, Milk} appears in T2, T4 → Support = 2/5 = 0.4.
{Butter, Milk} appears in T3, T4 → Support = 2/5 = 0.4.
Step 1.3: Find 3-itemsets (triplets of items). We can also generate
triplets, though these are less common and are built from the
frequent smaller itemsets.
{Bread, Butter, Milk} appears in T4 → Support = 1/5 = 0.2.
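The three sub-steps above can be reproduced by brute-force enumeration over the five transactions. A short Python sketch, assuming a minimum support threshold of 0.4 so that only the 3-itemset is filtered out:

```python
from itertools import combinations

transactions = [
    {"Bread", "Butter"}, {"Bread", "Milk"}, {"Butter", "Milk"},
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"},
]
items = sorted(set().union(*transactions))
min_support = 0.4

# Enumerate every 1-, 2-, and 3-itemset and keep those meeting min_support.
frequent = {}
for k in (1, 2, 3):
    for cand in combinations(items, k):
        sup = sum(set(cand) <= t for t in transactions) / len(transactions)
        if sup >= min_support:
            frequent[cand] = sup

for itemset, sup in frequent.items():
    print(itemset, sup)
# Prints the three 1-itemsets and three 2-itemsets from Steps 1.1-1.2;
# ('Bread', 'Butter', 'Milk') is dropped at support 0.2.
```

Real algorithms like Apriori avoid this exhaustive enumeration, but on five transactions it makes the support arithmetic easy to check.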
STEP 2: GENERATE
ASSOCIATION RULES
Now that we have frequent itemsets, we can generate
association rules from them. Let's consider generating
rules from the itemset {Bread, Butter}:
Rule 1: {Bread} → {Butter}
Rule 2: {Butter} → {Bread}
We will now calculate the Support, Confidence, and Lift for
these rules.
Rule 1: {Bread} → {Butter}
Support:
Support(Bread→Butter) = Number of transactions containing
both Bread and Butter / Total number of transactions
= 3/5 = 0.6.
Confidence:
Confidence(Bread→Butter) = Number of transactions containing
both Bread and Butter / Number of transactions containing Bread
= 3/4 = 0.75.
Lift:
Lift(Bread→Butter) = Confidence(Bread→Butter) / Support(Butter)
= 0.75/0.8 = 0.9375
The Lift value of 0.9375 indicates that buying bread
decreases the likelihood of buying butter slightly,
suggesting a weak association.
Rule 2: {Butter} → {Bread}
Support:
Support(Butter→Bread) = Number of transactions containing
both Butter and Bread / Total number of transactions
= 3/5 = 0.6
Confidence:
Confidence(Butter→Bread) = Number of transactions containing
both Butter and Bread / Number of transactions containing Butter
= 3/4 = 0.75
Lift:
Lift(Butter→Bread) = Confidence(Butter→Bread) / Support(Bread)
= 0.75/0.8 = 0.9375
This rule also has a Lift of 0.9375, similar to the previous one.
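The numbers for both rules can be checked with a few lines of Python over the five-transaction dataset (the variable names are illustrative):

```python
transactions = [
    {"Bread", "Butter"}, {"Bread", "Milk"}, {"Butter", "Milk"},
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"},
]
n = len(transactions)
n_bread  = sum("Bread" in t for t in transactions)              # 4
n_butter = sum("Butter" in t for t in transactions)             # 4
n_both   = sum({"Bread", "Butter"} <= t for t in transactions)  # 3

# Rule 1: {Bread} -> {Butter}
support_r1    = n_both / n             # 3/5 = 0.6
confidence_r1 = n_both / n_bread       # 3/4 = 0.75
lift_r1       = confidence_r1 / (n_butter / n)   # 0.75 / 0.8 = 0.9375

# Rule 2: {Butter} -> {Bread}: same support; confidence and lift also
# match here only because Bread and Butter happen to have equal support.
confidence_r2 = n_both / n_butter      # 3/4 = 0.75
lift_r2       = confidence_r2 / (n_bread / n)    # 0.75 / 0.8 = 0.9375

print(round(lift_r1, 4), round(lift_r2, 4))
```

Note that support is symmetric for the two rules by definition, while equal confidence and lift are a coincidence of this dataset.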
STEP 3: INTERPRET RESULTS
The two rules we generated indicate that there
is a moderate association between Bread and
Butter. Specifically, the Confidence values are
high (0.75), meaning that when one of these
items is purchased, the likelihood of purchasing
the other item is 75%. However, the Lift values
are slightly less than 1, indicating that the two
items are bought together slightly less often than
would be expected if they were independent, so the
association between them is weak rather than strong.
THE TWO-STEP PROCESS OF
ASSOCIATION RULE MINING:
Frequent Itemset Generation: Identify the
frequent item sets by scanning the dataset
and applying a minimum support threshold.
Rule Generation: Generate association rules
from the frequent item sets, then evaluate and
prune rules based on confidence and lift
thresholds.
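The second step can be sketched as follows: every frequent itemset of size two or more is split into antecedent/consequent pairs, and rules below a confidence threshold are pruned (lift pruning would work the same way). This is an illustrative Python sketch, not a full implementation:

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules and
    keep those meeting min_confidence. `frequent` maps frozensets to their
    support; by the Apriori property it contains every subset of each
    frequent itemset, so the lookup below always succeeds."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

frequent = {frozenset({"Bread"}): 0.8, frozenset({"Butter"}): 0.8,
            frozenset({"Bread", "Butter"}): 0.6}
for a, b, conf in generate_rules(frequent, min_confidence=0.7):
    print(a, "->", b, round(conf, 2))
```

With a confidence threshold of 0.7, both {Bread} → {Butter} and {Butter} → {Bread} survive at confidence 0.75.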
MARKET BASKET ANALYSIS
Market Basket Analysis is a data mining
technique used to discover patterns or
relationships between items that customers
frequently buy together in retail environments.
The primary goal of this analysis is to find
associations between products that often
appear together in transactions, which can
then be used for various business purposes
such as cross-selling, up-selling, inventory
management, and personalized marketing.
APRIORI AND FREQUENT
PATTERN GROWTH
ALGORITHM
Apriori Algorithm and Frequent Pattern Growth (FP-Growth)
Algorithm are two popular techniques used for mining
frequent itemsets and association rules in large datasets,
particularly in the context of market basket analysis. Let's
dive into both algorithms:
1. Apriori Algorithm:
The Apriori algorithm is one of the earliest and most widely
used algorithms for mining frequent itemsets in
transactional databases. It works by iteratively identifying
itemsets that meet a predefined minimum support
threshold.
STEPS IN THE APRIORI
ALGORITHM:
1. Generate Candidate Itemsets: Start by scanning the
database to identify all frequent 1-itemsets (items that
appear at least a minimum number of times).
For each pass, generate candidate itemsets of size k (i.e.,
pairs, triples, etc.), which are formed from frequent itemsets
of size (k-1).
2. Count Support for Candidate Itemsets: Scan the database
again and count the frequency of each candidate itemset.
If the frequency (support) of a candidate itemset meets or
exceeds the minimum support threshold, it is considered a
frequent itemset.
3. Prune Infrequent Itemsets: Remove itemsets that do not
meet the minimum support threshold from the candidate
list. This reduces the size of the search space for the next
iteration.
4. Repeat Until No More Frequent Itemsets: The process
repeats by generating larger candidate itemsets and
checking their support, until no more frequent itemsets
are found.
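The four steps above can be condensed into one loop. Below is a minimal Python sketch of the candidate/count/prune cycle; it uses a simple union-based candidate generation rather than the classic prefix-join, so treat it as illustrative rather than optimized:

```python
def apriori(transactions, min_support):
    """Sketch of the Apriori loop: generate candidates, count support,
    prune, and repeat with larger itemsets until none survive."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]

    # Step 1: candidate 1-itemsets are all items seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    all_frequent = {}
    k = 1
    while candidates:
        # Step 2: count support for each candidate in one database scan.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        # Step 3: prune candidates below the minimum support threshold.
        frequent = {c: cnt / n for c, cnt in counts.items()
                    if cnt / n >= min_support}
        all_frequent.update(frequent)
        # Step 4: build (k+1)-candidates from surviving k-itemsets, repeat.
        k += 1
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
    return all_frequent

freq = apriori([{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk"}], min_support=0.5)
print(freq[frozenset({"milk", "bread"})])  # 2 of 4 transactions -> 0.5
```

On these four transactions, {milk, butter} appears only once (support 0.25) and is pruned, so no 3-itemset candidate survives.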

Example:
If you're mining itemsets from a supermarket dataset
where the transactions include items like milk, bread,
butter, etc., Apriori would identify frequent itemsets (like
{milk, bread}) and generate association rules like:
{milk} -> {bread}: If a customer buys milk, they are likely
to buy bread.
2. FREQUENT PATTERN
GROWTH (FP-GROWTH)
ALGORITHM
The FP-Growth algorithm is an alternative to Apriori, designed to
be faster and more efficient. While Apriori generates candidate
itemsets and scans the database multiple times, FP-Growth uses a
compact data structure (called an FP-tree) to avoid generating
candidate itemsets and reduce the number of database scans.
Steps in the FP-Growth Algorithm:
1. Build the FP-tree: Scan the database to find frequent 1-itemsets
and then order items in the transactions based on frequency.
Construct the FP-tree by inserting transactions into the tree,
ensuring that common prefixes (frequent itemsets) are shared
among transactions.
2. Mining the FP-tree: Once the FP-tree is built, it is
recursively mined for frequent itemsets. Each
conditional pattern base is mined by constructing a
conditional FP-tree and applying the same process.
3. Recursive Process: For each item in the FP-tree,
recursively mine the conditional FP-tree for frequent
itemsets, resulting in a set of frequent patterns.
ADVANTAGES OF FP-
GROWTH OVER APRIORI:
Efficiency: FP-Growth is generally more efficient than
Apriori, especially with large datasets. It reduces the
number of database scans (just two passes in most cases).
No Candidate Generation: Unlike Apriori, FP-Growth
doesn't generate candidate itemsets, avoiding the
combinatorial explosion that can occur with Apriori.
When to Use:
Apriori: Suitable for smaller datasets or when memory
usage is less of a concern.
FP-Growth: Preferred for larger datasets, as it is more
efficient and scalable.
EXAMPLE:
Imagine we have the following transactions in a
database:
T1: {A, B, C}
T2: {A, C, D}
T3: {B, C, D}
FP-Growth would:
Construct an FP-tree by ordering items based on
frequency and organizing the transactions.
Recursively mine the FP-tree to identify frequent
itemsets.
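For these three transactions, the construction step can be sketched in Python. This builds only the FP-tree itself; the recursive conditional-tree mining is omitted for brevity, and the class and function names are illustrative:

```python
from collections import Counter

class FPNode:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support_count=1):
    # Pass 1: count item frequencies and keep the frequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in counts.items() if c >= min_support_count}

    # Pass 2: insert each transaction with items ordered by descending
    # frequency, so transactions with common prefixes share branches.
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root

tree = build_fp_tree([{"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"}])
c = tree.children["C"]
print(c.count)             # 3 — C is most frequent, so all paths start at C
print(sorted(c.children))  # ['A', 'B']
```

Because C appears in every transaction, all three paths share the single C node: this prefix sharing is exactly the compression that lets FP-Growth avoid repeated database scans.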
