
Pattern Warehouse

The document discusses pattern warehousing, which involves storing patterns discovered from data in a pattern warehouse rather than storing the raw data. A pattern warehouse is a type of repository that stores useful patterns found within data in an efficient manner. It allows storing patterns over many years using much less storage than would be needed for the raw data. Accessing patterns stored in a pattern warehouse can be done through a pattern query language similar to how SQL is used to access data in a traditional data warehouse.

Uploaded by

Sahu Sahu Subham

Pattern Warehouse

As data warehouses grow with the massive daily increase in data, business analysts no longer
want huge volumes of analytical data; they are interested in the relevant patterns hidden
within the repositories. A pattern warehouse is a kind of repository that stores relevant and
useful patterns in an efficient manner. A pattern is a set of items, subsequences, or
substructures that occurs frequently in a data set. A pattern may take the form of an
association rule, a decision tree, or a cluster of items [13], e.g. {Milk} -> {Bread, Butter}.
The different kinds of repositories used for data storage, from past decades until now, are
shown below.
Comparison among Database, Data Warehouse and Pattern Warehouse

[Figure: repositories used for data storage, from earliest to most recent: Traditional File Based System, DBMS, Data Warehousing, Pattern Warehousing]

Data vs Pattern
[Figure: data is shown as tables (Table 1, with columns Name and Age, and Table 2); a pattern is shown as a rule (Rule 1: If profession = athlete Then Age < 30)]
Data Warehouse vs. Pattern Warehouse
The patterns a data mining system discovers are stored in the Pattern Warehouse. Just as a data
warehouse stores data, the Pattern Warehouse stores patterns: it is an information repository that
stores relationships between data items, but not the data itself. Data items are stored in the data
warehouse; the Pattern Warehouse stores the patterns and the relationships among them. A
Pattern Warehouse is not a knowledge base. A knowledge base holds information that is
usually known to humans, is often hand-coded, and is somewhat static; changing it requires
care and effort. A Pattern Warehouse holds far more dynamic information (which is
automatically regenerated once a month with new data), is often surprising to users, and detects
trends and patterns of change as they happen. A Pattern Warehouse is a repository that holds
historical patterns rather than historical data. With a pattern warehouse, almost all the relevant
patterns in the data are found beforehand and stored for use by business users such as marketing
analysts, bank branch managers, store managers, etc. Business users receive the interesting patterns
of change every week or month, or can query the Pattern Warehouse at will.

Because of disk space limitations, many organizations store only 12 to 18 months' worth of
historical data, and in some cases there are so many transactions that only a few months of data
are actually available. However, because knowledge is so much more compact than data, the
Pattern Warehouse is only a fraction of the size of the data warehouse, allowing the patterns of
many years to be stored with ease, even when the data is no longer available. To get a
perspective on the time and space scales, consider an example where the recent operational data
covers one month of a bank's customer information, while the historical data in the data
warehouse goes back 1 year. The historical patterns in the pattern warehouse may go back as
far as 5 or 10 years and still be a small fraction of the size of the data warehouse. This provides a
large amount of knowledge over time at a low cost in disk space, and response time is far better
than with the data warehouse because the patterns have already been extracted, ready for look-up.
This provides an environment for long-term corporate knowledge management.
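A rough sketch of what "historical patterns rather than historical data" could look like in practice: one small record per rule per period, compared across periods to flag a pattern of change. The `PatternSnapshot` record, the `changes` helper, the periods, and the confidence values are all hypothetical and chosen only for illustration; the rule is the athlete example from earlier in the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternSnapshot:
    """One stored pattern for one period (hypothetical record layout)."""
    period: str        # e.g. "2023-01"; sorts chronologically as text
    rule: str
    confidence: float


def changes(history, threshold):
    """Return (period, delta) pairs where the confidence of a rule moved by
    at least `threshold` compared with the previous period."""
    ordered = sorted(history, key=lambda s: s.period)
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur.confidence - prev.confidence
        if abs(delta) >= threshold:
            flagged.append((cur.period, round(delta, 2)))
    return flagged


# Invented monthly snapshots of a single rule.
rule = "profession=athlete -> age<30"
history = [
    PatternSnapshot("2023-01", rule, 0.80),
    PatternSnapshot("2023-02", rule, 0.78),
    PatternSnapshot("2023-03", rule, 0.60),
    PatternSnapshot("2023-04", rule, 0.62),
]

print(changes(history, threshold=0.1))
```

Each snapshot is a few dozen bytes regardless of how many raw transactions produced it, which is the intuition behind keeping years of patterns after the underlying data has been discarded.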
Components of Pattern Management
To deal with patterns, we need to collect, store, manipulate, access and visualize them. We need
repositories, query languages and systems that deal with refined patterns rather than raw data. Each
of these has an equivalent in the data management world.

Patterns can in fact be represented as a set of "pattern-tables" within a traditional relational
database. This solves several potential issues regarding user access rights, security control,
multi-user access, etc. We still need a language to access and query the contents of the pattern
repository. SQL may be considered an obvious first candidate for this, but when SQL was designed
over 30 years ago, data mining was not a major issue; SQL was designed to access data stored in
databases. We need pattern-oriented languages to access a pattern repository that stores various
types of exact and inexact patterns. It is often very hard to access these patterns with SQL, so
patterns cannot be conveniently queried in a direct way using a relational query language. Not only
are some patterns not easily stored in a simple tabular format, but simply looking up influence
factors in pattern-tables may give incorrect results. We need a "pattern-kernel" that consistently
manages and merges patterns. While SQL relies on relational algebra, a pattern query uses "pattern
algebra". The pattern query process should use SQL as part of its operation, i.e. pattern queries
are decomposed into a set of related SQL queries, and then the results are recombined.

Business users, however, just click on a graphical user interface to retrieve patterns on the
intranet. They can begin to access knowledge immediately, without lengthy training sessions or
analytical know-how. With pattern visualization the user still performs analysis (e.g. visualizes
affinity patterns), but the results delivered for the same level of computational effort are orders
of magnitude better because the user now analyzes refined knowledge, not data. And 100 different
analysts will no longer get 100 different answers from the same data, because there is a central
knowledge repository.

A natural way of delivering pattern-based information to users on the web is a document organized
as a collection of information of different types, e.g. text, data, graphs, etc. An Explainable
Document looks like any other web page at first, but does far more by allowing users to dynamically
obtain explanations that clarify, justify and substantiate the patterns presented within the
document. Explainable documents are in fact a key element of Machine-Man Systems, allowing for the
intelligent exchange of refined information between users and systems.
Fast Access to the Pattern Warehouse
The Pattern Warehouse is represented as a set of "pattern-tables" within a traditional relational
database. This solves several potential issues regarding user access rights, security control,
multi-user access, etc. We still need a language to access and query the contents of Pattern
Warehouses. SQL may be considered an obvious first candidate for this, but when SQL was
designed over 30 years ago, data mining was not a major issue; SQL was designed to access data
stored in databases. We need pattern-oriented languages to access Pattern Warehouses that store
various types of exact and inexact patterns. It is often very hard to access these patterns with
SQL, so a Pattern Warehouse cannot be conveniently queried in a direct way using a relational
query language. Not only are some patterns not easily stored in a simple tabular format, but
simply looking up influence factors in pattern-tables may give incorrect results. We need a
"pattern-kernel" that consistently manages and merges patterns.

The pattern-kernel forms the heart of PQL (Pattern Query Language), which does for decision
support spaces what SQL does for the data space. While SQL relies on relational algebra, PQL uses
"pattern algebra". PQL was designed to access Pattern Warehouses just as SQL was designed to access
databases, and it was designed to be very similar to SQL. It allows knowledge-based queries just as
SQL allows data-based queries. PQL also uses SQL as part of its operation, i.e. PQL queries are
decomposed into a set of related SQL queries, and then the results are recombined. However,
business users do not usually see PQL. They can begin to access knowledge immediately, just by
clicking on a browser-based graphical user interface to retrieve patterns on the intranet, without
lengthy training sessions or analytical know-how.

Using PQL has a multitude of technical and business benefits that reinforce each other. Not only
does it provide faster response with less computing, but it also delivers more accurate, consistent
and higher-quality knowledge. Responses to knowledge queries are more efficient because patterns
have already been pre-computed. Avoiding the repeated discovery sessions that are unknowingly
performed by multiple analysts reduces the overall computational burden. In many cases,
avoiding repeat discovery sessions performed by the same analyst is itself a significant benefit.
With PQL the user still performs analysis (e.g. visualizes affinity patterns), but the results
delivered for the same level of computational effort are orders of magnitude better because the
user now analyzes refined knowledge, not data. And 100 different analysts will no longer
get 100 different answers from the same data, because there is a central knowledge repository.
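The idea of pattern-tables in a relational database, with a pattern query decomposed into related SQL queries whose results are recombined, can be sketched as follows. The text does not show PQL syntax or the layout of the pattern-tables, so the `rule` and `rule_item` tables, the `rules_with_antecedent` helper, and the stored values are assumptions made for illustration. This is not PQL itself, only a Python stand-in for the decompose-and-recombine step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pattern-tables: one row per rule, one row per item in a rule.
conn.executescript("""
CREATE TABLE rule (
    rule_id    INTEGER PRIMARY KEY,
    period     TEXT NOT NULL,
    support    REAL NOT NULL,
    confidence REAL NOT NULL
);
CREATE TABLE rule_item (
    rule_id INTEGER NOT NULL REFERENCES rule(rule_id),
    side    TEXT NOT NULL CHECK (side IN ('if', 'then')),
    item    TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO rule VALUES (?, ?, ?, ?)", [
    (1, "2023-01", 0.40, 0.50),   # invented values
    (2, "2023-01", 0.30, 0.90),
])
conn.executemany("INSERT INTO rule_item VALUES (?, ?, ?)", [
    (1, "if", "Milk"), (1, "then", "Bread"), (1, "then", "Butter"),
    (2, "if", "Bread"), (2, "then", "Butter"),
])


def rules_with_antecedent(conn, item, min_confidence):
    """Answer one pattern-level question with two related SQL queries."""
    # SQL query 1: which rules mention `item` on the 'if' side and are
    # confident enough?
    rows = conn.execute(
        "SELECT r.rule_id, r.support, r.confidence "
        "FROM rule r JOIN rule_item i ON i.rule_id = r.rule_id "
        "WHERE i.side = 'if' AND i.item = ? AND r.confidence >= ? "
        "ORDER BY r.rule_id",
        (item, min_confidence),
    ).fetchall()

    results = []
    for rule_id, supp, conf in rows:
        # SQL query 2: fetch the items of each matching rule.
        items = conn.execute(
            "SELECT side, item FROM rule_item WHERE rule_id = ?",
            (rule_id,),
        ).fetchall()
        # Recombine the rows into one rule-shaped result.
        lhs = sorted(i for side, i in items if side == "if")
        rhs = sorted(i for side, i in items if side == "then")
        results.append((lhs, rhs, supp, conf))
    return results


print(rules_with_antecedent(conn, "Milk", 0.4))
```

The caller asks about rules, not rows; the table layout stays hidden behind the helper, which is roughly the role the text assigns to the pattern-kernel.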
Architectural aspects of Pattern Warehousing
Challenges/Issues in Pattern Warehousing
Data Integration: Data warehouses are designed to integrate data from various sources, which
can be a complex process. The data may be stored in different formats, have different levels of
granularity, or use different data models. Integrating this data into a cohesive and consistent
data warehouse can be challenging.
Data Quality: Data quality is crucial for the success of a data warehouse. Poor data quality
can lead to inaccurate or incomplete analyses, which can have significant impacts on business
decisions. Ensuring data quality requires careful data cleaning and validation, which can be
time-consuming and challenging.

Data Volume: Data warehouses can contain vast amounts of data, which can make it
challenging to manage and process. Managing the volume of data requires careful planning,
design, and optimization to ensure that the system can handle the required workload.

Performance: Data warehouses must provide fast query response times to support business
intelligence and analytics. Achieving high performance can be challenging, as data warehouses
require complex data models, indexing strategies, and query optimization techniques.

Security: Data warehouses contain sensitive data, and ensuring data security is crucial.
Implementing robust security measures, such as access control, data encryption, and data
masking, can be challenging, especially when dealing with large volumes of data.

Business Requirements: Designing and implementing a data warehouse that meets business
requirements can be challenging. Business requirements can be complex, and may require
specialized data models, analytics, or reporting capabilities. Meeting these requirements
requires careful planning, communication, and collaboration between business stakeholders
and IT teams.

What is the basic concept of pattern mining?


Pattern mining concentrates on identifying rules that describe specific patterns
within the data. Market-basket analysis, which identifies items that typically occur
together in purchase transactions, was one of the first applications of data mining.
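As a minimal sketch of market-basket analysis, the usual support and confidence measures can be computed for the rule {Milk} -> {Bread, Butter} used as an example earlier in this document. The five transactions are invented for illustration, and the `support` and `confidence` helper names are not taken from any particular library.

```python
# Toy purchase transactions (invented for illustration).
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter", "Eggs"},
    {"Bread", "Butter"},
    {"Milk", "Eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain
    the consequent. Assumes the antecedent occurs at least once."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


rule_support = support({"Milk", "Bread", "Butter"}, transactions)
rule_confidence = confidence({"Milk"}, {"Bread", "Butter"}, transactions)
print(rule_support, rule_confidence)
```

Here two of the five baskets contain all three items and four contain Milk, so the rule has support 0.4 and confidence 0.5.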

What are the algorithms for pattern mining?


Commonly used algorithms include:
• GSP algorithm
• Sequential Pattern Discovery using Equivalence classes (SPADE)
• FreeSpan
• PrefixSpan
• MAPres
• Seq2Pat (for constraint-based sequential pattern mining)
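To make one of the listed algorithms concrete, here is a simplified sketch of the PrefixSpan idea: grow a sequential pattern one item at a time, and at each step look only at the suffixes (the projected database) that follow the current prefix. As a simplifying assumption, each sequence element is a single item rather than an itemset, and `min_support` is an absolute count. The function name and the example sequences are made up for illustration.

```python
def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support_count} for frequent subsequences.

    Simplified: every element of a sequence is one item, not an itemset.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence index, start position) pairs: the
        # suffix of each sequence that follows the first match of `prefix`.
        counts = {}
        for seq_idx, start in projected:
            # Count each item once per sequence.
            for item in set(sequences[seq_idx][start:]):
                counts[item] = counts.get(item, 0) + 1

        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project again: keep only suffixes after the first `item`.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                if item in seq[start:]:
                    new_projected.append((seq_idx, seq.index(item, start) + 1))
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b"],
]
patterns = prefixspan(sequences, min_support=2)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

With these four sequences and a minimum support of 2, the frequent patterns are the three single items plus (a, b), (a, c) and (b, c).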
What are the common types of data patterns?
• Linear Data Trends. A linear pattern is a continuous decrease or increase in numbers
over time. ...
• Exponential Data Trends. ...
• Seasonality. ...
• Irregular/Random Patterns. ...
• Stationary/Stationarity. ...
• Cyclical Patterns.
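For the first item in the list, one simple way to check for a linear trend is to fit a straight line by least squares and look at the sign and size of the slope. The sketch below assumes equally spaced observations; the `linear_trend` name and the sample series are made up for illustration.

```python
def linear_trend(values):
    """Least-squares slope and intercept of `values` against 0, 1, 2, ...

    Assumes at least two equally spaced observations.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented series that rises by 2 units per period.
slope, intercept = linear_trend([10, 12, 14, 16, 18])
print(slope, intercept)
```

A clearly positive or negative slope suggests a linear increase or decrease; a slope near zero is what a stationary series would show.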

What is the pattern-growth approach in mining?


Pattern-growth is one of several influential frequent pattern mining methodologies,
where a pattern (e.g., an itemset, a subsequence, a subtree, or a substructure) is
frequent if its occurrence frequency in a database is no less than a specified
minimum_support threshold.
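A small sketch of the pattern-growth idea for itemsets: extend a frequent pattern by one item at a time, each time counting only within the transactions that already contain the current pattern (its conditional database). This leaves out the FP-tree compression that FP-growth adds on top of the same idea. The `frequent_itemsets` name and the transactions are hypothetical, and `min_support` is an absolute count.

```python
def frequent_itemsets(transactions, min_support):
    """Return {frozenset: count} for itemsets occurring in at least
    `min_support` transactions."""
    results = {}

    def grow(prefix, projected):
        # Count items within the transactions that contain `prefix`.
        counts = {}
        for t in projected:
            for item in t:
                counts[item] = counts.get(item, 0) + 1

        for item in sorted(i for i, c in counts.items() if c >= min_support):
            pattern = prefix | {item}
            results[pattern] = counts[item]
            # Conditional database: transactions containing `item`, keeping
            # only items that sort after it so each itemset appears once.
            conditional = [
                {other for other in t if other > item}
                for t in projected
                if item in t
            ]
            grow(pattern, conditional)

    grow(frozenset(), [set(t) for t in transactions])
    return results


transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Eggs"},
]
found = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(found.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

With a minimum support of 2, {Milk, Butter} is not reported because it occurs in only one transaction, and no three-item set qualifies.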

What is the best way to find patterns in data?


Graphic displays like histograms in statistics are useful for seeing patterns in data.
Patterns in data are commonly described in terms of center, spread, shape, and
unusual features. Some common distributions have special descriptive labels, such as
symmetric, bell-shaped, skewed, etc.
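The descriptors named above (center, spread, shape) can be computed alongside a rough text histogram. The sketch uses only the Python standard library. The sample ages are invented, the helper names are illustrative, and the skewness is the simple moment-based version, one of several definitions in use.

```python
import statistics


def describe(values):
    """Center, spread and a moment-based skewness for a list of numbers."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)          # population standard deviation
    skew = sum((v - mean) ** 3 for v in values) / (len(values) * stdev ** 3)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        "skewness": skew,                      # > 0 means a longer right tail
    }


def histogram(values, bin_width):
    """Count integer values per bin; keys are the bin start values."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


ages = [22, 24, 25, 27, 28, 29, 31, 35, 48, 61]   # invented sample
summary = describe(ages)
bins = histogram(ages, 10)

print(summary)
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

In this sample the mean (33) sits above the median (28.5) and the skewness is positive, which matches the long right tail visible in the histogram bars.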

What are emerging patterns?


Emerging patterns are sets of items whose frequency changes significantly from
one dataset to another. They are useful as a means of discovering distinctions
inherently present amongst a collection of datasets and have been shown to be a powerful
method for constructing accurate classifiers.
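A common way to quantify "changes significantly" is the growth rate: the ratio of an itemset's support in one dataset to its support in another. The sketch below uses that definition; the two small datasets are invented, the helper names are illustrative, and the cut-off for calling a pattern "emerging" is left to the analyst.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def growth_rate(itemset, base, target):
    """Support in `target` divided by support in `base`.

    Returns 0.0 if the itemset is absent from both datasets and infinity
    if it appears only in `target`.
    """
    s_base = support(itemset, base)
    s_target = support(itemset, target)
    if s_base == 0:
        return float("inf") if s_target > 0 else 0.0
    return s_target / s_base


# Two invented datasets, e.g. last year's and this year's baskets.
last_year = [{"Milk", "Bread"}, {"Milk", "Eggs"}, {"Bread"}, {"Butter"}]
this_year = [{"Milk", "Eggs"}, {"Milk", "Eggs", "Bread"}, {"Eggs", "Milk"}, {"Bread"}]

rate = growth_rate({"Milk", "Eggs"}, last_year, this_year)
print(rate)
```

Here {Milk, Eggs} goes from 1 of 4 baskets to 3 of 4, a growth rate of 3.0; an itemset would be treated as emerging when its growth rate reaches a chosen threshold.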

What software can identify patterns?


• Handwriting Analysis Tool (HAT)
• Visual-Pattern Detector (VPD)
• Line Detection Tool (LDT)
• X-ray Fluorescence Data Analysis Tool (XRF-DAT)
• Text-Lines Counter (TLC)
• Artefact-Features Analysis Tool (AFAT)
