CH 03
Data warehousing:
Data Warehouse: Subject-Oriented
product, sales
Data Warehouse: Integrated
Data Warehouse: Nonvolatile
operational environment
                      OLTP                           OLAP
users                 clerk, IT professional         knowledge worker
function                                             decision support
DB design             application-oriented           subject-oriented
data                  current, up-to-date;           historical;
                      detailed, flat relational;     summarized, multidimensional;
                      isolated                       integrated, consolidated
usage                 repetitive                     ad-hoc
access                read/write;                    lots of scans
                      index/hash on prim. key
unit of work          short, simple transaction      complex query
# records accessed    tens                           millions
# users               thousands                      hundreds
DB size               100MB-GB                       100GB-TB
metric                transaction throughput
Note: A growing number of systems perform OLAP analysis directly on
relational databases
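To make the contrast concrete, here is a minimal sketch of both access patterns running against the same relational table, using Python's built-in sqlite3 module. The `sales` table, its columns, and its rows are assumptions made up for illustration, not part of the original slides.

```python
import sqlite3

# Hypothetical sales table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
    [
        ("tv", "east", 300.0),
        ("tv", "west", 200.0),
        ("phone", "east", 150.0),
        ("phone", "west", 50.0),
    ],
)

# OLTP-style access: one record fetched through the primary key.
one_sale = conn.execute(
    "SELECT product, amount FROM sales WHERE id = ?", (1,)
).fetchone()

# OLAP-style access: scan the whole table and summarize by a dimension.
totals_by_product = dict(
    conn.execute("SELECT product, SUM(amount) FROM sales GROUP BY product")
)
```

The first query touches a single record; the second reads every row to produce a summary, which is the kind of workload the OLAP column describes.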
Top-down view
Choose the dimensions that will apply to each fact table record
Choose the measure that will populate each fact table record
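The two design steps above can be sketched as a record type in which the dimension fields give the context of a fact and the measure fields hold the numbers to aggregate. The `SalesFact` class, its field names, the `total_by` helper, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    # Dimensions: the context that applies to each fact table record.
    time_key: str
    item_key: str
    location_key: str
    # Measures: the numeric values that populate each fact table record.
    units_sold: int
    dollars_sold: float


facts = [
    SalesFact("2014-Q1", "tv", "east", 2, 600.0),
    SalesFact("2014-Q1", "phone", "east", 3, 450.0),
    SalesFact("2014-Q2", "tv", "west", 1, 300.0),
]


def total_by(facts, dimension, measure):
    """Sum one measure grouped by the values of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[getattr(fact, dimension)] += getattr(fact, measure)
    return dict(totals)


dollars_by_item = total_by(facts, "item_key", "dollars_sold")
```

Any dimension can serve as the grouping key and any measure as the value being summed, which is why the two choices are made separately.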
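[Figure: data warehouse architecture. Data Sources: Operational DBs and Other sources, passing through Extract, Transform, Load, Refresh, with a Monitor & Integrator and Metadata. Data Storage: Data Warehouse and Data Marts, which serve the OLAP Server. Outputs: Analysis, Query, Reports, Data mining.]
December 18, 2014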
Enterprise warehouse
collects all of the information about subjects spanning
the entire organization
Data Mart
a subset of corporate-wide data that is of value to a
specific group of users. Its scope is confined to
specific, selected groups, such as a marketing data mart
Virtual warehouse
A set of views over operational databases
Only some of the possible summary views may be
materialized
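A minimal sketch of the virtual warehouse idea, assuming a single hypothetical operational table named `orders`. SQLite has no materialized views, so a snapshot table created from the view stands in for a materialized summary here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for an operational table (hypothetical schema).
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, region TEXT, amount REAL
    );
    INSERT INTO orders (region, amount)
        VALUES ('east', 100.0), ('east', 50.0), ('west', 70.0);

    -- Virtual summary: a view, recomputed from operational data per query.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

    -- Materialized summary: a snapshot table stands in for a
    -- materialized view, which SQLite does not provide.
    CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region;
    """
)

# A new operational record shows up in the view but not in the snapshot.
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 30.0)")
virtual = dict(conn.execute("SELECT region, total FROM sales_by_region"))
materialized = dict(
    conn.execute("SELECT region, total FROM sales_by_region_mat")
)
```

The view always reflects the operational data at the cost of recomputation, while the snapshot is cheap to read but stale until it is rebuilt. That trade-off is one reason only some summary views are materialized.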
[Figure: data warehouse development diagram. Labels: Data Mart (two), Distributed Data Marts, Enterprise Data Warehouse, Model refinement (two).]
Data extraction
get data from multiple, heterogeneous, and external
sources
Data cleaning
detect errors in the data and rectify them when possible
Data transformation
convert data from legacy or host format to warehouse
format
Load
sort, summarize, consolidate, compute views, check
integrity, and build indices and partitions
Refresh
propagate the updates from the data sources to the
warehouse
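The five steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not a production design: the two sources, their field names, the cents-to-dollars conversion, and the `sales` and `sales_summary` tables are all invented for the example.

```python
import sqlite3

# Two hypothetical sources with different formats (made-up rows).
crm_rows = [{"cust": "Ann", "amount": "120.50"}, {"cust": "Bob", "amount": "n/a"}]
legacy_rows = [("CARL", 8000)]  # assumed: legacy system stores cents


def extract():
    """Extraction: gather records from multiple heterogeneous sources."""
    for row in crm_rows:
        yield {"source": "crm", "customer": row["cust"], "amount": row["amount"]}
    for name, cents in legacy_rows:
        yield {"source": "legacy", "customer": name, "amount": cents}


def clean(records):
    """Cleaning: detect bad amounts; drop records that cannot be rectified."""
    for rec in records:
        try:
            float(rec["amount"])
        except (TypeError, ValueError):
            continue
        yield rec


def transform(records):
    """Transformation: convert each source format to the warehouse format."""
    for rec in records:
        amount = float(rec["amount"])
        if rec["source"] == "legacy":
            amount /= 100  # cents to dollars
        yield (rec["customer"].title(), amount)


def load(conn, rows):
    """Load: store sorted rows, build an index, compute a summary table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sorted(rows))
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_customer ON sales (customer)")
    conn.execute("DROP TABLE IF EXISTS sales_summary")
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer"
    )


def refresh(conn, new_rows):
    """Refresh: propagate new source records and recompute the summary."""
    load(conn, new_rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
summary = dict(conn.execute("SELECT customer, total FROM sales_summary"))
```

Calling `refresh(conn, [("Ann", 9.5)])` afterwards appends the new record and rebuilds `sales_summary`. A real refresh would usually apply incremental updates rather than recompute the whole summary.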
Metadata Repository
Operational metadata
Information processing
Analytical processing
Data mining
[Figure: layered OLAP/OLAM architecture. Layer4, User Interface (mining result); Layer3, OLAP/OLAM (OLAM Engine, OLAP Engine); MDDB with Meta Data; Layer1, Data Repository (Databases, Data Warehouse). Connecting labels: Filtering & Integration, Database API, Filtering, Data cleaning, Data integration.]
Data Mining: Concepts and Techniques
Summary