Pattern Warehouse
As data warehouses grow with the massive daily increase in data, business analysts no longer
demand huge volumes of analytical data; they are interested in getting the relevant patterns
hidden within the repositories. A pattern warehouse is a kind of repository that stores relevant
and useful patterns in an efficient manner. A pattern is a set of items, subsequences or
substructures that occur frequently in a data set. A pattern may take the form of an association
rule, a decision tree or a cluster of items [13], e.g. {Milk} -> {Bread, Butter}. The different kinds
of repositories used for data storage from the past decade until now are shown below.
Comparison among Database, Data Warehouse and Pattern Warehouse
[Figure: DBMS, Data Warehousing, Pattern Warehousing]
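To make the association rule {Milk} -> {Bread, Butter} concrete, the following minimal Python sketch measures how often such a rule holds over a handful of transactions. The transactions are hypothetical, and the measures used (support and confidence, the usual ones for association rules) and the function names are assumptions, not something the text prescribes.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count_containing(transactions: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain the itemset."""
    if not transactions:
        return 0.0
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions: List[Transaction],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0
    return count_containing(transactions, antecedent | consequent) / with_antecedent


# Hypothetical market-basket data, for illustration only.
transactions = [
    frozenset({"Milk", "Bread", "Butter"}),
    frozenset({"Milk", "Bread"}),
    frozenset({"Milk", "Bread", "Butter", "Eggs"}),
    frozenset({"Bread", "Butter"}),
    frozenset({"Milk", "Bread", "Butter"}),
]

rule_support = support(transactions, frozenset({"Milk", "Bread", "Butter"}))
rule_confidence = confidence(transactions, frozenset({"Milk"}), frozenset({"Bread", "Butter"}))
print(rule_support, rule_confidence)
```

A pattern warehouse would keep the rule together with measures like these, not the transactions themselves.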
Data vs Pattern
[Figure: data is shown as tables (Table 1, Table 2) with columns such as Name and Age; a pattern is shown as a rule (Rule 1: If profession = athlete Then Age < 30)]
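The contrast between data and pattern can be sketched in Python as follows. The rows, the `profession` column and the `Rule` class are hypothetical, introduced only to illustrate Rule 1; the point is that the rule is one small record no matter how many rows it summarizes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """An if-then pattern: when `condition` holds for a row, `conclusion` is expected to hold."""
    name: str
    text: str
    condition: Callable[[Row], bool]
    conclusion: Callable[[Row], bool]


def rule_accuracy(rule: Rule, rows: List[Row]) -> float:
    """Fraction of rows matching the condition for which the conclusion also holds."""
    matching = [row for row in rows if rule.condition(row)]
    if not matching:
        return 0.0
    return sum(1 for row in matching if rule.conclusion(row)) / len(matching)


# Data: individual records (hypothetical values).
rows: List[Row] = [
    {"name": "A", "age": 24, "profession": "athlete"},
    {"name": "B", "age": 27, "profession": "athlete"},
    {"name": "C", "age": 45, "profession": "teacher"},
    {"name": "D", "age": 33, "profession": "athlete"},
]

# Pattern: a relationship between data items, stored without the data.
rule_1 = Rule(
    name="Rule 1",
    text="If profession = athlete Then Age < 30",
    condition=lambda row: row["profession"] == "athlete",
    conclusion=lambda row: row["age"] < 30,
)

accuracy = rule_accuracy(rule_1, rows)
print(rule_1.text, accuracy)
```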
Data Warehouse vs. Pattern Warehouse
The patterns a data mining system discovers are stored in the Pattern Warehouse. Just as a data
warehouse stores data, the Pattern Warehouse stores patterns: it is an information repository that
stores relationships between data items, but not the data itself. While data items are stored in the
data warehouse, we use the Pattern Warehouse to store the patterns and the relationships among them.
A Pattern Warehouse is not a knowledge base. A knowledge base includes information that is
usually known to humans, is often hand-coded and is somewhat static; changing it requires
care and effort. A Pattern Warehouse holds far more dynamic information (which is
automatically re-generated once a month with new data), is often surprising to users, and detects
trends and patterns of change as they happen. A Pattern Warehouse is a repository that holds
historical patterns rather than historical data. With a pattern warehouse, almost all the relevant
patterns in the data are found beforehand and stored for use by business users such as marketing
analysts, bank branch managers and store managers. Business users receive the interesting patterns
of change every week or month, or can query the Pattern Warehouse at will.
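The idea of holding historical patterns and reporting patterns of change can be sketched as follows. This is a minimal illustration, assuming each monthly re-generation stores one confidence value per pattern; the snapshot values, the threshold and the function name `patterns_of_change` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical monthly snapshots: (period, pattern, confidence).
history: List[Tuple[str, str, float]] = [
    ("2024-01", "{Milk} -> {Bread, Butter}", 0.75),
    ("2024-02", "{Milk} -> {Bread, Butter}", 0.74),
    ("2024-03", "{Milk} -> {Bread, Butter}", 0.55),
    ("2024-01", "profession = athlete -> age < 30", 0.67),
    ("2024-02", "profession = athlete -> age < 30", 0.66),
    ("2024-03", "profession = athlete -> age < 30", 0.68),
]


def patterns_of_change(history: List[Tuple[str, str, float]],
                       threshold: float) -> List[Tuple[str, str, str, float]]:
    """Return (pattern, previous period, current period, change) for every
    consecutive pair of snapshots whose confidence moved by at least `threshold`."""
    by_pattern: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for period, pattern, conf in history:
        by_pattern[pattern].append((period, conf))

    changes = []
    for pattern, points in by_pattern.items():
        points.sort()  # ISO-style period strings sort chronologically
        for (prev_period, prev_conf), (cur_period, cur_conf) in zip(points, points[1:]):
            delta = cur_conf - prev_conf
            if abs(delta) >= threshold:
                changes.append((pattern, prev_period, cur_period, delta))
    return changes


changes = patterns_of_change(history, threshold=0.10)
for pattern, prev_period, cur_period, delta in changes:
    print(f"{pattern}: {prev_period} -> {cur_period}, change {delta:+.2f}")
```

Only the stored patterns are consulted here; the underlying monthly data is not needed to detect the change.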
Because of disk space limitations, many organizations store only 12 to 18 months' worth of
historical data, and in some cases there are so many transactions that data for only a few months
is actually available. However, because knowledge is so much more compact than data, the
Pattern Warehouse is only a fraction of the size of the data warehouse, allowing the patterns of
many years to be stored with ease, even when the data is no longer available. To get a
perspective on the time and space scales, consider an example where the recent operational data
refers to one month of a bank’s customer information, while the historical data in the data
warehouse goes back 1 year. The historical patterns in the pattern warehouse, however, may go
back as far as 5 or 10 years and still be a small fraction of the size of the data warehouse. This
provides a huge amount of knowledge over time at a low cost in disk space, and response time is
far better than with the data warehouse because the patterns have already been extracted and are
ready for look-up. This provides an environment for long-term corporate knowledge management.
Components of Pattern Management
To deal with patterns, we need to collect, store, manipulate, access and visualize them. We need
repositories, query languages and systems that deal with refined patterns rather than raw data. Each
of these has an equivalent in the data management world. Patterns can in fact be represented as a
set of “pattern-tables” within a traditional relational database. This solves several potential issues
regarding user access rights, security control, multi-user access, etc. Obviously, we need a
language to access and query the contents of the pattern repository. SQL may be considered an
obvious first candidate for this, but when SQL was designed over 30 years ago, data mining was
not a major issue. SQL was designed to access data stored in databases. We need pattern-oriented
languages to access a pattern repository storing various types of exact and inexact
patterns. Often, it is very hard to access these patterns with SQL. Patterns cannot be conveniently
queried in a direct way using a relational query language. Not only are some patterns not easily
stored in a simple tabular format, but by just looking up influence factors in pattern-tables we
may get incorrect results. We need a “pattern-kernel” that consistently manages and merges
patterns. While SQL relies on the relational algebra, a pattern query uses the “pattern algebra”.
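As a rough illustration of a "pattern-table" inside a traditional relational database, the sketch below stores rules in SQLite through Python's standard `sqlite3` module. The table name `pattern_rules`, its columns and the sample rows are assumptions made for this example; the text does not define a schema. It covers only the simple case of rules that fit a tabular format, which, as noted above, not every pattern does.

```python
import sqlite3

# In-memory database standing in for the relational store that holds the pattern-tables.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per rule and period, with its quality measures.
conn.execute(
    """
    CREATE TABLE pattern_rules (
        rule_id    INTEGER PRIMARY KEY,
        antecedent TEXT NOT NULL,
        consequent TEXT NOT NULL,
        support    REAL NOT NULL,
        confidence REAL NOT NULL,
        period     TEXT NOT NULL
    )
    """
)

# Hypothetical pattern rows, for illustration only.
conn.executemany(
    "INSERT INTO pattern_rules (antecedent, consequent, support, confidence, period) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("{Milk}", "{Bread, Butter}", 0.60, 0.75, "2024-01"),
        ("profession = athlete", "age < 30", 0.30, 0.67, "2024-01"),
    ],
)

# A plain SQL look-up over stored patterns: rules at or above a confidence threshold.
strong_rules = conn.execute(
    "SELECT antecedent, consequent FROM pattern_rules "
    "WHERE confidence >= ? ORDER BY confidence DESC",
    (0.7,),
).fetchall()
print(strong_rules)
```

Because the patterns live in ordinary tables, the database's existing access control and multi-user facilities apply to them unchanged.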
The pattern query process should use SQL as part of its operation, i.e. pattern queries are
decomposed into a set of related SQL queries, and then the results are recombined. However,
business users just click on a graphical user interface to retrieve patterns on the intranet. They can
begin to access knowledge immediately, without lengthy training sessions or analytical know-how.
With pattern visualization the user still performs analysis (e.g. visualizes affinity patterns), but
the results delivered for the same level of computational effort are orders of magnitude better
because the user now analyzes refined knowledge, not data. And now 100 different analysts will
no longer get 100 different answers from the same data, because there is a central knowledge
repository. A natural way of delivering pattern-based information to users on the web is a
document organized as a collection of information of different types, e.g. text, data, graphs, etc.
An Explainable Document looks like any other web page at first, but does far
more by allowing users to dynamically obtain explanations that clarify, justify and substantiate
the patterns presented within the document. Explainable documents are in fact a key element of
Machine-Man Systems, allowing for the intelligent exchange of refined information between
users and systems.
Fast Access to the Pattern Warehouse
The Pattern Warehouse is represented as a set of "pattern-tables" within a traditional relational
database. This solves several potential issues regarding user access rights, security control,
multi-user access, etc. But obviously, we need a language to access and query the contents of
Pattern Warehouses. SQL may be considered an obvious first candidate for this, but when SQL was
designed over 30 years ago, data mining was not a major issue. SQL was designed to access data
stored in databases. We need pattern-oriented languages to access Pattern Warehouses storing
various types of exact and inexact patterns. Often, it is very hard to access these patterns with
SQL. Hence a Pattern Warehouse cannot be conveniently queried in a direct way using a relational
query language. Not only are some patterns not easily stored in a simple tabular format, but
by just looking up influence factors in pattern-tables we may also get incorrect results. We need a
"pattern-kernel" that consistently manages and merges patterns. The pattern-kernel forms the
heart of PQL (Pattern Query Language), which does for decision support spaces what SQL does
for the data space. While SQL relies on the relational algebra, PQL uses the "pattern algebra".
PQL was designed to access Pattern Warehouses just as SQL was designed to access databases.
PQL was designed to be very similar to SQL. It allows knowledge-based queries just as SQL
allows data-based queries. PQL also uses SQL as part of its operation, i.e. PQL queries are
decomposed into a set of related SQL queries, and then the results are recombined. However,
business users do not usually see PQL. They just click on a browser-based graphical user
interface to retrieve patterns on the intranet, and can begin to access knowledge immediately
without lengthy training sessions or analytical know-how.
Using PQL has a multitude of technical and business benefits that reinforce each other. Not only
does it provide faster response with less computing, but it also delivers more accurate, consistent
and higher-quality knowledge. Responses to knowledge queries are more efficient because patterns
have already been pre-computed. Avoiding the repeated discovery sessions that are unknowingly
performed by multiple analysts reduces the overall computational burden. In many cases,
avoiding repeat discovery sessions performed by the same analyst is itself a significant benefit.
With PQL the user still performs analysis (e.g. visualizes affinity patterns), but the results
delivered for the same level of computational effort are orders of magnitude better because the
user now analyzes refined knowledge, not data. And now 100 different analysts will no longer
get 100 different answers from the same data, because there is a central knowledge repository.
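The decompose-and-recombine idea can be sketched as follows. This is not PQL, whose syntax the text does not give; it is a hypothetical Python function that answers one knowledge-level question ("which rules were strong in an earlier period but are no longer strong?") by issuing two related SQL queries and combining their results. The table, its rows and all function names are assumptions.

```python
import sqlite3
from typing import List, Set


def build_warehouse() -> sqlite3.Connection:
    """Create a small in-memory pattern-table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pattern_rules ("
        "rule TEXT NOT NULL, period TEXT NOT NULL, confidence REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pattern_rules VALUES (?, ?, ?)",
        [
            ("{Milk} -> {Bread, Butter}", "2024-02", 0.74),
            ("{Milk} -> {Bread, Butter}", "2024-03", 0.55),
            ("{Bread} -> {Butter}", "2024-02", 0.80),
            ("{Bread} -> {Butter}", "2024-03", 0.82),
            ("{Tea} -> {Sugar}", "2024-02", 0.60),
            ("{Tea} -> {Sugar}", "2024-03", 0.62),
        ],
    )
    return conn


def strong_rules(conn: sqlite3.Connection, period: str, min_confidence: float) -> Set[str]:
    """One SQL sub-query: rules at or above the confidence threshold in a given period."""
    rows = conn.execute(
        "SELECT rule FROM pattern_rules WHERE period = ? AND confidence >= ?",
        (period, min_confidence),
    )
    return {rule for (rule,) in rows}


def weakened_rules(conn: sqlite3.Connection, old_period: str, new_period: str,
                   min_confidence: float) -> List[str]:
    """Pattern-level query: run one SQL sub-query per period, then recombine the
    two result sets (here, by set difference)."""
    before = strong_rules(conn, old_period, min_confidence)
    after = strong_rules(conn, new_period, min_confidence)
    return sorted(before - after)


conn = build_warehouse()
weakened = weakened_rules(conn, "2024-02", "2024-03", min_confidence=0.7)
print(weakened)
```

Since the sub-queries only read pre-computed patterns, every analyst asking this question against the same warehouse gets the same answer.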
Architectural aspects of Pattern Warehousing
Challenges/Issues in Pattern Warehousing
Data Integration: Data warehouses are designed to integrate data from various sources, which
can be a complex process. The data may be stored in different formats, have different levels of
granularity, or use different data models. Integrating this data into a cohesive and consistent
data warehouse can be challenging.
Data Quality: Data quality is crucial for the success of a data warehouse. Poor data quality
can lead to inaccurate or incomplete analyses, which can have significant impacts on business
decisions. Ensuring data quality requires careful data cleaning and validation, which can be
time-consuming and challenging.
Data Volume: Data warehouses can contain vast amounts of data, which can make it
challenging to manage and process. Managing the volume of data requires careful planning,
design, and optimization to ensure that the system can handle the required workload.
Performance: Data warehouses must provide fast query response times to support business
intelligence and analytics. Achieving high performance can be challenging, as data warehouses
require complex data models, indexing strategies, and query optimization techniques.
Security: Data warehouses contain sensitive data, and ensuring data security is crucial.
Implementing robust security measures, such as access control, data encryption, and data
masking, can be challenging, especially when dealing with large volumes of data.