Data Mining_Unit-V
Anomaly Detection:
o Example: Detecting fraudulent credit card transactions.
Regression:
o Example: Predicting real-time stock prices.
d) Sampling
Randomly selects a subset of the data stream for processing.
Example:
o Sampling a portion of website traffic to estimate the total number of
users.
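Illustration: one common way to draw a fixed-size random subset from a stream of unknown length is reservoir sampling. The notes do not name a specific method, so the sketch below is an assumed choice. The function name, the synthetic user IDs, and the sample size are hypothetical. It covers only the selection step, not the extrapolation to a total count.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds are inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical traffic: 10,000 page hits coming from 500 distinct user IDs.
hits = (f"user{n % 500}" for n in range(10_000))
sample = reservoir_sample(hits, k=100, seed=42)
```

Only k items are held in memory at any time, which is why this style of sampling suits data streams.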
e) Incremental Algorithms
Continuously updates the model with each incoming data point.
Example:
o Incremental clustering algorithms like CluStream.
o Updating a decision tree as new labeled data points arrive (incremental
learning).
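Illustration: the idea of updating a model one data point at a time can be shown with a much simpler stand-in than CluStream or an incremental decision tree. In the sketch below the "model" is only a running mean and variance (Welford's update). The class name and sample values are hypothetical.

```python
class RunningStats:
    """Mean and population variance updated one data point at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's update: no past data points need to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)  # the model is usable after every single update
```

Each update costs constant time and memory, which is the property incremental stream algorithms rely on.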
1. Time-Series Data
Definition: Time-series data is a sequence of data points collected over time,
typically at uniform intervals. Each data point is associated with a timestamp.
Examples:
Stock prices over days
Temperature measurements recorded hourly
Monthly sales data of a product
Characteristics:
Temporal Ordering: The sequence of the data matters.
Continuous or Discrete: Data can be continuously measured or occur at
distinct intervals.
Trends and Seasonality: Patterns such as growth trends or seasonal
variations are common.
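Illustration: a simple moving average is one basic way to smooth a time series so that a trend becomes easier to see. The sketch below assumes uniformly spaced readings. The function name and the hourly temperature values are hypothetical.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive points, in time order."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the point that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Hypothetical hourly temperature readings (degrees Celsius).
hourly_temps = [20.0, 21.0, 23.0, 22.0, 24.0, 26.0]
smoothed = moving_average(hourly_temps, window=3)
```

Because temporal ordering matters, the input must stay in timestamp order. Shuffling it would change the result.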
Pattern: {A → C}
Support: 2 (appears in sequences 1 and 2)
b) PrefixSpan (Prefix-projected Sequential Pattern Mining)
Projects the database based on prefixes of sequences to reduce the search
space.
Generates frequent patterns directly from the projected database.
Example:
Database: {A → B → C}, {A → C}, {B → C → D}
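Step 1: Identify a frequent prefix (e.g., {A}, which occurs in sequences 1
and 2).
Step 2: Project the database on {A}: {B → C}, {C}. The third sequence
contains no A and is dropped.
Step 3: Mine the projected database: C occurs in both projected sequences,
so {A → C} has support 2.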
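Illustration: the prefix-projection steps can be sketched for the simplified case where every element of a sequence is a single item. This is a minimal sketch, not a full PrefixSpan implementation (no itemsets within an element, no pseudo-projection). The function name and the minimum support of 2 are assumptions. The database is the one from the example.

```python
from collections import Counter


def prefixspan(sequences, min_support):
    """Return {pattern: support} for sequences made of single items."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Projection: keep only what follows the first occurrence of item.
            new_projected = [
                s[s.index(item) + 1:] for s in projected if item in s
            ]
            mine(pattern, new_projected)

    mine((), [tuple(s) for s in sequences])
    return results


database = [["A", "B", "C"], ["A", "C"], ["B", "C", "D"]]
patterns = prefixspan(database, min_support=2)  # threshold is an assumption
```

With these inputs, `patterns[("A", "C")]` is 2, matching the example, while {A → B} is not reported because B follows A in only one sequence.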
c) SPADE (Sequential Pattern Discovery using Equivalence classes)
Uses a vertical database format to efficiently mine sequences.
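Stores each item with an id-list: the sequences (and positions within them)
in which it appears. The support of a longer pattern is found by joining
the id-lists of its parts.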
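Illustration: the vertical format and one id-list join can be sketched as follows. This is a simplified sketch and not the full SPADE algorithm (no equivalence classes or lattice search). The function names are hypothetical, and the database is reused from the PrefixSpan example.

```python
from collections import defaultdict


def vertical_format(sequences):
    """Map each item to its id-list of (sequence id, position) pairs."""
    idlists = defaultdict(list)
    for sid, seq in enumerate(sequences, start=1):
        for pos, item in enumerate(seq):
            idlists[item].append((sid, pos))
    return dict(idlists)


def temporal_join(first, second):
    """Id-list for 'first → second': entries of `second` that occur after
    an occurrence of `first` in the same sequence."""
    earliest = {}
    for sid, pos in first:
        earliest[sid] = min(pos, earliest.get(sid, pos))
    return [
        (sid, pos)
        for sid, pos in second
        if sid in earliest and pos > earliest[sid]
    ]


def support(idlist):
    """Number of distinct sequences that appear in an id-list."""
    return len({sid for sid, _ in idlist})


database = [["A", "B", "C"], ["A", "C"], ["B", "C", "D"]]
idlists = vertical_format(database)
a_then_c = temporal_join(idlists["A"], idlists["C"])
```

Here `support(a_then_c)` is 2 (sequences 1 and 2). It is obtained from the two id-lists alone, without rescanning the original database.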
2. Healthcare:
o Identifying symptom sequences leading to a diagnosis.
3. Finance:
o Detecting sequences of transactions indicating fraud.
4. Web Analytics:
o Analyzing user navigation paths to improve website design.
Key Techniques:
Spatial Clustering: Identifies groups of similar spatial objects (e.g.,
k-Means for geographic data).
Spatial Classification: Assigns a label to spatial data (e.g., land type
classification: forest, urban, water).
Spatial Association Rules: Finds relationships among spatial data (e.g.,
"If a region is near a river, it is likely to have fertile soil").
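Illustration: spatial clustering with k-Means can be sketched on a handful of 2-D points. This is a minimal sketch under stated assumptions: the coordinates are hypothetical (x, y) positions in kilometres on a flat grid, the initial centroids are simply the first k points, and plain Euclidean distance is used. For real latitude/longitude data, Euclidean distance is only an approximation.

```python
def kmeans(points, k, iterations=10):
    """Cluster 2-D points; returns (centroids, cluster index per point)."""
    centroids = list(points[:k])  # naive initialisation: first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments


# Hypothetical site locations forming two well-separated groups.
sites = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sites, k=2)
```

In practice the initialisation is usually randomised or seeded more carefully, since k-Means results depend on the starting centroids.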
Applications:
Satellite image analysis.
Geographic Information Systems (GIS).
Crime mapping.
Applications:
Search engine optimization (SEO).
E-commerce recommendations.
User behavior analysis.