
Association analysis and frequent sequential pattern mining: the Apriori algorithm


• Enterprises accumulate a large amount of transaction data (for example, sales orders from retailers, invoices, and shipping documentation) from daily operations.
• Finding hidden relationships in the data can be useful for answering questions such as "What products are often bought together?" or "What are the subsequent purchases after buying a cell phone?"
• To answer these two questions, we need to perform association analysis and frequent sequential pattern
mining on a transaction dataset.
• Association analysis is an approach for finding interesting relationships within a transaction dataset.
• If retailers can use these interesting relationships, or rules, to cross-sell products to their customers, there is a high likelihood that they can increase their sales.
• Association analysis finds correlations between itemsets, but what if you want to find out the order in which items are frequently purchased?
• To achieve this, you can adopt frequent sequential pattern mining to find frequent subsequences in transaction datasets with temporal information.
• You can then use the mined frequent subsequences to predict customer shopping orders, web clickstreams, biological sequences, and sequences in other applications.
Mining associations with the Apriori rule

• Association mining is a technique that can discover interesting relationships hidden in transaction datasets.
• This approach first finds all frequent itemsets, and then generates strong association rules from them.
• Apriori is the most well-known association mining algorithm; it identifies frequent individual items first and then performs a breadth-first search to extend them to larger itemsets, until larger frequent itemsets cannot be found.
• The purpose of association mining is to discover associations among items in a transactional database.
• Typically, the process of association mining proceeds by finding the itemsets that have support greater than the minimum support.
• Next, the process uses the frequent itemsets to generate strong rules (for example, milk => bread: a customer who buys milk is likely to buy bread) that have confidence greater than the minimum confidence.
• By definition, an association rule can be expressed in the form X => Y, where X and Y are disjoint itemsets.
• We can measure the strength of an association with two metrics: support and confidence.
• Support shows the percentage of transactions in the dataset that contain both X and Y, while confidence indicates how often Y appears in transactions that contain X:
support(X => Y) = σ(X ∪ Y) / N
confidence(X => Y) = σ(X ∪ Y) / σ(X)

Here, σ refers to the frequency (count of transactions) of a particular itemset; N denotes the total number of transactions.


As support and confidence are metrics of rule strength only, you might still obtain many redundant rules with high support and confidence.
Therefore, we can use a third measure, lift, to evaluate the quality (ranking) of a rule.
By definition, lift indicates the strength of a rule over the random co-occurrence of X and Y, so we can formulate lift in the following form:

lift(X => Y) = support(X ∪ Y) / (support(X) × support(Y))

A lift greater than 1 means that X and Y appear together more often than would be expected if they were independent.
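To make these metrics concrete, here is a small worked example in R; the transaction counts are invented purely for illustration. Suppose that out of N = 10 transactions, X = {milk} appears in 6, Y = {bread} appears in 5, and both appear together in 4:

# Hypothetical counts, invented for illustration
N        <- 10  # total number of transactions
sigma_x  <- 6   # transactions containing X = {milk}
sigma_y  <- 5   # transactions containing Y = {bread}
sigma_xy <- 4   # transactions containing both X and Y

support    <- sigma_xy / N        # 0.4: 40% of transactions contain both
confidence <- sigma_xy / sigma_x  # 0.667: 2 of 3 milk buyers also buy bread
lift       <- support / ((sigma_x / N) * (sigma_y / N))
lift                              # 1.333: above 1, a positive association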
• Apriori is the best-known algorithm for mining associations; it performs a level-wise, breadth-first search to count the candidate itemsets.
• The process of Apriori starts by finding frequent itemsets (sets of items that have minimum support) level by level. For example, the process starts by finding the frequent 1-itemsets.
• Then, the process continues by using the frequent 1-itemsets to find the frequent 2-itemsets.
• The process iteratively discovers new frequent (k+1)-itemsets from the frequent k-itemsets until no more frequent itemsets are found.
• Finally, the process utilizes the frequent itemsets to generate association rules, as the sketch below illustrates:
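As a rough illustration of this level-wise search, the following is a minimal, unoptimized sketch in base R; the function apriori_frequent and the toy transactions are our own inventions for illustration, and candidate generation is done with a simple pairwise join rather than the optimized join used in real implementations.

# Minimal level-wise frequent-itemset miner (illustrative sketch only)
apriori_frequent <- function(transactions, min_support) {
  # Fraction of transactions that contain every item in the itemset
  support_of <- function(itemset) {
    mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
  }
  # Level 1: frequent individual items
  items <- sort(unique(unlist(transactions)))
  level <- Filter(function(s) support_of(s) >= min_support, as.list(items))
  frequent <- level
  # Level k+1: join pairs of frequent k-itemsets whose union has k+1 items,
  # then keep only the candidates that meet the minimum support
  while (length(level) > 1) {
    k <- length(level[[1]])
    candidates <- list()
    for (i in seq_along(level)) for (j in seq_along(level)) {
      if (i < j) {
        u <- sort(union(level[[i]], level[[j]]))
        if (length(u) == k + 1) candidates[[length(candidates) + 1]] <- u
      }
    }
    level <- Filter(function(s) support_of(s) >= min_support, unique(candidates))
    frequent <- c(frequent, level)
  }
  frequent
}

# Toy transactions, invented for illustration
trans <- list(c("milk", "bread"), c("milk", "bread", "butter"),
              c("bread", "butter"), c("milk", "butter"),
              c("milk", "bread", "eggs"))
apriori_frequent(trans, min_support = 0.4)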
The Apriori algorithm
• The Apriori algorithm is credited to Agrawal, Imieliński, and Swami (Agrawal et al. 1993), who applied it to market basket data to generate association rules.
• Association rules are usually applied to binary data, which fits the context where customers either purchase or
don’t purchase particular products.
• The Apriori algorithm operates by systematically considering combinations of variables, and ranking them on either support, confidence, or lift at the user's discretion.
• The Apriori algorithm operates by finding all rules satisfying minimum confidence and support specifications. First, the set of frequent 1-itemsets is identified by scanning the database to count each item.
• Next, the 2-itemsets are identified, gaining some efficiency by using the fact that if a 1-itemset is not frequent, it cannot be part of a frequent itemset of larger dimension.
• This continues with itemsets of larger dimension until they become null.
• The magnitude of the effort required is indicated by the fact that each dimension of itemsets requires a full scan of the database.
The algorithm is:
1. Scan the database to identify the frequent 1-itemsets, L1 (all single items with support ≥ Support_min).
2. For k = 1, 2, ...:
   • Generate the candidate (k+1)-itemsets, Ck+1, from the frequent k-itemsets, Lk.
   • Scan the database and retain as Lk+1 the candidates with support ≥ Support_min.
   • If Lk+1 is null, STOP; otherwise increment k by 1 and repeat.
3. Return the list of frequent itemsets.
4. Identify rules in the form of antecedents and consequents from the frequent itemsets.
5. Check the confidence of these rules; if the confidence of a rule meets Confidence_min, mark the rule as strong.
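As a sketch of how these steps look in practice, the following uses the apriori() function from the arules R package (assuming it is installed) on a handful of invented transactions, with Support_min = 0.4 and Confidence_min = 0.5; only rules meeting both thresholds come back as strong rules.

library(arules)

# Toy transactions, invented for illustration
trans <- as(list(c("milk", "bread"),
                 c("milk", "bread", "butter"),
                 c("bread", "butter"),
                 c("milk", "butter"),
                 c("milk", "bread", "eggs")), "transactions")

# Steps 1-3: mine the frequent itemsets with support >= 0.4
itemsets <- apriori(trans, parameter = list(supp = 0.4,
                                            target = "frequent itemsets"))
inspect(sort(itemsets, by = "support"))

# Steps 4-5: generate rules and keep the strong ones
# (support >= 0.4, confidence >= 0.5; minlen = 2 requires a
# non-empty antecedent)
rules <- apriori(trans, parameter = list(supp = 0.4, conf = 0.5, minlen = 2))
inspect(sort(rules, by = "confidence"))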
• The output of the Apriori algorithm, mined from a training set of data, can be used as the basis for recommendations, considering factors such as correlation or analysis from other techniques.
• This information may be used in many ways, including in retail: if a rule is identified and a customer has purchased the antecedent without purchasing the consequent, then it might be attractive to suggest the consequent to that customer.
• The apriori algorithm can generate many frequent item sets.
• Association rules can be generated by only looking at frequent item sets that are strong, in the sense that they
meet or exceed both minimum support and minimum confidence levels.
• It must be noted that this does not necessarily mean such a rule is useful, that it implies high correlation, or that it offers any proof of causality.
• However, a good feature is that you can let computers loose to identify such rules automatically (an example of machine learning).
• To demonstrate, using the data from the table given below, establish Support_min = 0.4 and Confidence_min = 0.5:
• Identify rules from the frequent items:

• All other combinations of frequent item sets in L3 failed the minimum support test.
• These rules now would need to be evaluated, possibly subjectively by the users, for interestingness.
• Here the focus is on cases where a customer who buys one type of book might, according to this data, be likely to buy another type of book.
• Another indication is that if a customer never bought a paperback, they are not likely to buy a hardback, and vice versa.
The Apriori algorithm to find association rules within transactions:
An application on a real-world dataset

• We use the built-in Groceries dataset, which contains one month of real-world point-of-sale transaction data
from a typical grocery outlet.
• We then use the summary function to obtain the summary statistics of the Groceries dataset.
• The summary statistics shows that the dataset contains 9,835 transactions, which are categorized into 169
categories.
• In addition to this, the summary shows information, such as most frequent items, itemset distribution, and
example extended item information within the dataset.
• We can then use itemFrequencyPlot to visualize the five most frequent items with support over 0.1.
• Next, we apply the Apriori algorithm to search for rules with support over 0.001 and confidence over 0.5.
• We then use the summary function to inspect detailed information on the generated rules. From the output
summary, we find the Apriori algorithm generates 5,668 rules with support over 0.001 and confidence over
0.5.
• Further, we can find the rule length distribution, summary of quality measures, and mining information. In
the summary of the quality measurement, we find descriptive statistics of three measurements, which are
support, confidence, and lift.
• Support is the proportion of transactions containing a certain itemset.
• Confidence is the correctness percentage of the rule, that is, how often the consequent appears in transactions that contain the antecedent. Lift is the rule's confidence divided by the confidence that would be expected if the antecedent and consequent were independent.
• To explore some generated rules, we can use the inspect function to view the first six rules of the 5,668
generated rules.
• Lastly, we can sort the rules by confidence and list the rules with the highest confidence; a consolidated script for this whole workflow appears after this list.
• Therefore, we find that {rice, sugar} => {whole milk} is the most confident rule, with support equal to 0.001220132, confidence equal to 1, and lift equal to 3.913649.
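The workflow described above can be reproduced with a script along the following lines, using the arules package in R, which ships with the Groceries dataset; the exact output depends on the package version.

library(arules)

data(Groceries)       # one month of point-of-sale data, 9,835 transactions
summary(Groceries)    # 169 item categories, most frequent items, and so on

# Visualize the most frequent items with support over 0.1
itemFrequencyPlot(Groceries, support = 0.1, topN = 5)

# Mine rules with support over 0.001 and confidence over 0.5
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5))
summary(rules)        # rule count, length distribution, quality measures

# Explore the first six of the generated rules
inspect(rules[1:6])

# Sort the rules by confidence and list the most confident ones
rules.sorted <- sort(rules, by = "confidence")
inspect(rules.sorted[1:6])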
