Module 3 Mining frequent patterns and associations
• Market Basket Analysis (MBA) is a data mining technique used in retail and e-commerce to identify patterns in
customer purchasing behavior. It helps businesses understand which products are frequently bought together, allowing
them to optimize marketing strategies, cross-selling, and store layouts.
How Market Basket Analysis Works
Market Basket Analysis is based on association rule learning, a technique that discovers relationships between items in a
dataset. It primarily uses three key metrics:
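1. Support
• Measures how frequently an item or item set appears in the transactions:
support(A) = (number of transactions containing A) / (total number of transactions).
2. Confidence
• Measures the likelihood that a customer who buys item A also buys item B:
confidence(A ⇒ B) = support(A ∪ B) / support(A).
3. Lift
• Measures how much more likely item B is to be purchased when item A is bought, compared with how often B is bought overall:
lift(A ⇒ B) = confidence(A ⇒ B) / support(B). A lift above 1 indicates a positive association, a lift of 1 indicates
independence, and a lift below 1 indicates a negative association.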
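The three metrics can be computed directly from a list of transactions. The sketch below is a minimal illustration, assuming each transaction is a Python set of item names; the toy data and the function names are made up for this example.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B together) divided by support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of A => B divided by the overall support of B."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

s = support({"milk", "bread"}, transactions)
c = confidence({"milk"}, {"bread"}, transactions)
l = lift({"milk"}, {"bread"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.4f}")
```

In this toy data, milk and bread appear together in 3 of 5 transactions (support 0.6), 3 of the 4 milk baskets also contain bread (confidence 0.75), and bread appears in 4 of 5 baskets overall, so the lift is 0.75 / 0.8, slightly below 1.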
Applications of Market Basket Analysis
1. Retail & E-commerce
• Recommending products based on frequently bought-together items.
• Example: Amazon's "Customers who bought this also bought..."
2. Supermarket & Grocery Stores
• Designing store layouts to place related products together.
• Example: Placing bread next to butter and jam.
3. Fraud Detection
• Identifying unusual purchasing patterns that may indicate fraud.
4. Healthcare & Medical Diagnosis
• Analyzing patient records to identify common disease patterns.
5. Web & Content Recommendations
• Suggesting related articles, videos, or online courses.
Algorithms for Market Basket Analysis
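• Apriori Algorithm – Finds frequent item sets level by level (bottom-up): candidate item sets of size k are built from
the frequent item sets of size k-1, and any candidate with an infrequent subset is pruned.
• FP-Growth Algorithm – Compresses the transactions into a prefix tree (FP-tree) and mines frequent patterns from it
without generating candidate item sets, which is usually more efficient than Apriori.
• Eclat Algorithm – Uses depth-first search over a vertical data format (each item mapped to the list of transactions
that contain it) to find frequent item sets.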
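To make the bottom-up idea behind Apriori concrete, here is a minimal sketch of the level-wise search. It assumes transactions are sets of item names and that `min_support` is a fraction between 0 and 1; the function name and toy data are illustrative, and the code favors readability over speed.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every item set with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # Count each candidate and keep those that meet the threshold.
        result = {}
        for c in candidates:
            sup = sum(c <= t for t in transactions) / n
            if sup >= min_support:
                result[c] = sup
        return result

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = frequent_among({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]
frequent = apriori(transactions, min_support=0.4)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a threshold of 0.4 (at least 2 of 5 transactions), the search finds four frequent single items and three frequent pairs. Both three-item candidates are pruned before counting because each contains an infrequent pair.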
Frequent Item Sets
• A frequent item set is a collection of items that appear together in a dataset with a frequency (support) at or above a
predefined threshold, called the minimum support. Frequent item sets are a key concept in Market Basket Analysis (MBA)
and are used to derive association rules that help in decision-making.
Closed item sets
• A closed item set is a frequent item set for which no proper superset (a larger set containing it) has the same
support. In simpler terms, the closed item sets together preserve the support of every frequent item set without
redundancy.
Why Use Closed Item Sets?
• Reduces the number of item sets without losing meaningful information.
• Speeds up association rule mining by avoiding the generation of redundant rules.
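The reduction can be seen on a small example. The sketch below first enumerates frequent item sets by brute force (acceptable only for tiny data) and then keeps those with no proper superset of equal support. The data, the `min_count` threshold, and the function names are assumptions made for illustration.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "jam"},
    {"bread", "jam"},
]
min_count = 2  # minimum support expressed as an absolute count

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration: {itemset: support count} for frequent item sets."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(set(combo) <= t for t in transactions)
            if count >= min_count:
                result[frozenset(combo)] = count
    return result

def closed_itemsets(frequent):
    """Keep item sets that have no proper superset with the same support count."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }

frequent = frequent_itemsets(transactions, min_count)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent,", len(closed), "closed")
```

Here {milk} is dropped because {milk, bread} has the same count (3), and {jam} is dropped because {bread, jam} has the same count (2). Their supports can still be recovered from the closed sets that contain them.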
Association Rule
• Association Rule Mining is a machine learning technique used to discover relationships between items in large
datasets. It helps identify patterns, such as which products are often purchased together in retail or which
symptoms frequently co-occur in medical diagnosis.
An association rule is written in the form:
A⇒B where:
• A (Antecedent): The item(s) on the left side of the rule (e.g., "If a customer buys Milk").
• B (Consequent): The item(s) on the right side of the rule (e.g., "Then they also buy Bread").
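One common way to obtain such rules is to split a frequent item set into every possible antecedent and consequent and keep the splits whose confidence is high enough. The following sketch assumes that approach; the `min_confidence` parameter, the function names, and the data are illustrative.

```python
from itertools import combinations

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def rules_from_itemset(itemset, transactions, min_confidence):
    """List (antecedent, consequent, confidence) for each qualifying split A => B.

    Assumes `itemset` occurs in at least one transaction, so no count is zero.
    """
    itemset = frozenset(itemset)
    total = count(itemset, transactions)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            b = itemset - a
            conf = total / count(a, transactions)
            if conf >= min_confidence:
                rules.append((a, b, conf))
    return rules

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]
rules = rules_from_itemset({"milk", "butter"}, transactions, min_confidence=0.7)
for a, b, conf in rules:
    print(sorted(a), "=>", sorted(b), conf)
```

The example shows that rules are directional: every basket with butter also contains milk (confidence 1.0), but only half of the milk baskets contain butter (confidence 0.5), so only the rule butter ⇒ milk passes the 0.7 threshold.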
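Improving the efficiency of Apriori
Multidimensional Association Rules
• A multidimensional association rule involves two or more dimensions (attributes), such as age, gender, and purchases.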
Example Rules:
• [Age=20–30] ∧ [Gender=Male] → [Buys=Laptop]
• [Occupation=Teacher] ∧ [Buys=Books] → [Visits=Library]
Types of Multidimensional Rules:
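• Intradimensional (single-dimensional): Every item in the rule comes from the same dimension, which therefore repeats
(e.g., [Buys=Milk] → [Buys=Bread]).
• Interdimensional: Attributes come from different dimensions and no dimension is repeated
(e.g., [Age=20–30] ∧ [Gender=Male] → [Buys=Laptop]).
• Hybrid-dimensional: Several dimensions are involved and at least one of them is repeated
(e.g., [Age=20–30] ∧ [Buys=Laptop] → [Buys=Printer]).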
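The distinction can be checked mechanically by looking at which dimensions a rule mentions. The sketch below assumes the usual textbook reading: one dimension used throughout is intradimensional, several dimensions with none repeated is interdimensional, and several dimensions with a repeat is hybrid-dimensional. Representing a rule as lists of (dimension, value) pairs is a choice made for this example.

```python
def rule_type(antecedent, consequent):
    """Classify a rule given as two lists of (dimension, value) pairs."""
    dimensions = [name for name, _ in antecedent + consequent]
    distinct = set(dimensions)
    if len(distinct) == 1:
        return "intradimensional"    # one dimension, repeated
    if len(distinct) == len(dimensions):
        return "interdimensional"    # several dimensions, none repeated
    return "hybrid-dimensional"      # several dimensions, at least one repeated

print(rule_type([("Buys", "Milk")], [("Buys", "Bread")]))
print(rule_type([("Age", "20-30"), ("Gender", "Male")], [("Buys", "Laptop")]))
print(rule_type([("Age", "20-30"), ("Buys", "Laptop")], [("Buys", "Printer")]))
```

The three calls print intradimensional, interdimensional, and hybrid-dimensional, in that order.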
Techniques Used:
• Transformation of non-item data into item-like format (e.g., "Age=20-30" becomes an item).
• Use of cube-based mining or relational databases.
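The first technique, turning attribute values into item-like strings, can be sketched as follows. The record layout, the age bins, and the function name are assumptions for illustration. Once every record is a set of "Attribute=value" items, an ordinary frequent item set algorithm can be applied to it.

```python
def to_items(record, age_bins):
    """Convert one record (a dict of attribute -> value) into a set of items.

    Numeric ages are replaced by the label of the bin they fall into;
    an age outside every bin is kept as the raw number.
    """
    items = set()
    for attr, value in record.items():
        if attr == "Age":
            for low, high in age_bins:
                if low <= value <= high:
                    value = f"{low}-{high}"
                    break
        items.add(f"{attr}={value}")
    return items

# Hypothetical customer records and age ranges.
age_bins = [(20, 30), (31, 40)]
records = [
    {"Age": 24, "Gender": "Male", "Buys": "Laptop"},
    {"Age": 35, "Occupation": "Teacher", "Buys": "Books"},
]
transactions = [to_items(r, age_bins) for r in records]
for t in transactions:
    print(sorted(t))
```

The choice of bins matters: ranges that are too narrow leave each item with little support, while ranges that are too wide hide the pattern.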
Applications:
• Customer segmentation
• Targeted marketing
• Fraud detection
• Healthcare analytics