Data Warehousing
Concepts and
Techniques
warehouses
Data Mining: Concepts and Techniques, July 8, 2025
Data Warehouse: Subject-Oriented
transaction records
Data cleaning and data integration techniques
are applied.
Ensure consistency in naming conventions,
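[Figure: lattice of cuboids over the dimensions time, item, location, supplier]
0-D (apex) cuboid: all
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)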
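The lattice above contains one cuboid for every subset of the dimensions, from the apex cuboid (no dimensions, "all") to the base cuboid (all four dimensions). A minimal sketch that enumerates the lattice follows; the function name `cuboid_lattice` is illustrative and not part of the slides.

```python
from itertools import combinations

# Dimension names taken from the lattice figure.
DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dimensions):
    """Map k -> list of k-D cuboids; the single 0-D entry is the apex cuboid."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}

lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    # An empty tuple is the apex cuboid, printed as "all".
    print(f"{k}-D:", [", ".join(c) or "all" for c in cuboids])

total = sum(len(cuboids) for cuboids in lattice.values())
print("cuboids in total:", total)
```

With n dimensions and no concept hierarchies, the lattice has 2^n cuboids, which is why full materialization becomes expensive as dimensions are added.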
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected
to a set of dimension tables
Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized
into a set of smaller dimension tables, forming a
shape similar to a snowflake
Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
Example of Star Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
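The star schema above can be tried out with an in-memory SQLite database. This is a minimal sketch, not the slides' own implementation: the column types, the sample rows, and the choice to name the fact table `sales` are assumptions, and the location attribute is written `province_or_state`. The average is derived in a query rather than stored as the avg_sales measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day INTEGER,
                   day_of_the_week TEXT, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       province_or_state TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales (time_key INTEGER REFERENCES time(time_key),
                    item_key INTEGER REFERENCES item(item_key),
                    branch_key INTEGER REFERENCES branch(branch_key),
                    location_key INTEGER REFERENCES location(location_key),
                    units_sold INTEGER, dollars_sold REAL);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO time VALUES (?,?,?,?,?,?)",
                 [(1, 5, "Fri", 1, "Q1", 2024), (2, 9, "Tue", 4, "Q2", 2024)])
conn.execute("INSERT INTO item VALUES (1, 'TV-A', 'BrandX', 'TV', 'wholesale')")
conn.execute("INSERT INTO branch VALUES (1, 'B1', 'retail')")
conn.executemany("INSERT INTO location VALUES (?,?,?,?,?)",
                 [(1, "1 Main St", "Vancouver", "BC", "Canada"),
                  (2, "2 Oak St", "Chicago", "IL", "USA")])
conn.executemany("INSERT INTO sales VALUES (?,?,?,?,?,?)",
                 [(1, 1, 1, 1, 2, 800.0), (1, 1, 1, 1, 1, 400.0),
                  (1, 1, 1, 2, 3, 1200.0), (2, 1, 1, 2, 1, 400.0)])

# A typical star join: the fact table joined to two of its dimensions.
rows = conn.execute("""
    SELECT t.quarter, l.country, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON s.time_key = t.time_key
    JOIN location AS l ON s.location_key = l.location_key
    GROUP BY t.quarter, l.country
    ORDER BY t.quarter, l.country
""").fetchall()
for row in rows:
    print(row)
```

Every analytical query follows the same pattern: join the fact table to the dimensions of interest, group by dimension attributes, and aggregate the measures.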
Example of Snowflake Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, province_or_state, country
<dimension_name_first_time> in cube
<cube_name_first_time>
Defining a Star Schema in DMQL
Specification of hierarchies
Schema hierarchy
day < {month < quarter; week} < year
Set_grouping hierarchy
{1..10} < inexpensive
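The two kinds of hierarchy above can be sketched as small data structures. In this illustrative sketch the schema hierarchy is a partial order stored as "rolls up to" edges, and the set-grouping hierarchy maps value ranges to labels. The names `SCHEMA_HIERARCHY`, `rolls_up_to`, `PRICE_GROUPS`, and `price_group` are assumptions; only the {1..10} < inexpensive group comes from the slide, so other values are left ungrouped.

```python
# Schema hierarchy: day < {month < quarter; week} < year, as direct roll-up edges.
SCHEMA_HIERARCHY = {
    "day": ["month", "week"],
    "month": ["quarter"],
    "quarter": ["year"],
    "week": ["year"],
    "year": [],
}

def rolls_up_to(lower, upper):
    """True if `lower` can be rolled up to `upper` along the hierarchy."""
    stack = list(SCHEMA_HIERARCHY[lower])
    while stack:
        level = stack.pop()
        if level == upper:
            return True
        stack.extend(SCHEMA_HIERARCHY[level])
    return False

# Set-grouping hierarchy: only the group given on the slide is defined.
PRICE_GROUPS = [(range(1, 11), "inexpensive")]

def price_group(price):
    """Return the group label for an integer price, or None if ungrouped."""
    for values, label in PRICE_GROUPS:
        if price in values:
            return label
    return None

print(rolls_up_to("day", "year"))      # day reaches year via month or week
print(rolls_up_to("week", "quarter"))  # week and quarter are not comparable
print(price_group(7))
```

The partial order matters because week does not nest inside month or quarter, so a roll-up from week can only go directly to year.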
A Sample Data Cube
[Figure: 3-D data cube of sales. Dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), Country (U.S.A, Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]
[Figure: lattice of cuboids for the sample cube]
0-D (apex) cuboid: all
1-D cuboids: product; date; country
3-D (base) cuboid: (product, date, country)
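Each cuboid in this lattice is the base cuboid aggregated over the dimensions that are dropped. The following sketch computes every cuboid for a handful of base cells; the sales figures are hypothetical and the helper name `aggregate` is illustrative.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "date", "country")

# Hypothetical base-cuboid cells: (product, date, country) -> sales.
facts = {
    ("TV", "1Qtr", "U.S.A"): 100,
    ("TV", "2Qtr", "U.S.A"): 120,
    ("TV", "1Qtr", "Canada"): 40,
    ("PC", "1Qtr", "U.S.A"): 200,
    ("VCR", "3Qtr", "Mexico"): 15,
}

def aggregate(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for cell, amount in facts.items():
        totals[tuple(cell[p] for p in positions)] += amount
    return dict(totals)

# One cuboid per subset of dimensions; the empty subset is the apex cuboid.
cuboids = {keep: aggregate(facts, keep)
           for k in range(len(DIMS) + 1)
           for keep in combinations(DIMS, k)}

print("apex (all):", cuboids[()][()])
print("TV in U.S.A., all quarters:", cuboids[("product", "country")][("TV", "U.S.A")])
```

The second lookup corresponds to the kind of cell highlighted in the sample cube: annual sales of one product in one country, obtained by summing over the date dimension.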
Visualization
OLAP capabilities
Interactive manipulation
Typical OLAP Operations
Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or
detailed data, or introducing new dimensions
Slice and dice:
project and select
Pivot (rotate):
reorient the cube for visualization, e.g., present a 3-D cube as a series of 2-D planes
Other operations
drill across: involving (across) more than one fact table
drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
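The core operations listed above can be sketched on a tiny cube stored as a dictionary. This is an illustration under stated assumptions, not a production OLAP engine: the cell values, the month-to-quarter mapping, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (product, month, city).
cube = {
    ("TV", "Jan", "Vancouver"): 5,
    ("TV", "Feb", "Vancouver"): 7,
    ("TV", "Apr", "Toronto"): 4,
    ("PC", "Jan", "Chicago"): 9,
    ("PC", "May", "Toronto"): 3,
}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2", "May": "Q2"}

def roll_up(cube, axis, mapping=None):
    """Climb a hierarchy on `axis` (mapping given) or drop the axis (mapping None)."""
    out = defaultdict(int)
    for key, value in cube.items():
        if mapping is None:
            new_key = key[:axis] + key[axis + 1:]                        # dimension reduction
        else:
            new_key = key[:axis] + (mapping[key[axis]],) + key[axis + 1:]  # coarser level
        out[new_key] += value
    return dict(out)

def slice_cube(cube, axis, value):
    """Select one value on one dimension and project that dimension away."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def dice(cube, allowed):
    """Select on several dimensions; `allowed` maps axis -> set of permitted values."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in values for axis, values in allowed.items())}

def pivot(cube, order):
    """Reorient the cube by permuting its axes."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

by_quarter = roll_up(cube, 1, MONTH_TO_QUARTER)
no_city = roll_up(cube, 2)
tv_only = slice_cube(cube, 0, "TV")
tv_vancouver = dice(cube, {0: {"TV"}, 2: {"Vancouver"}})
city_first = pivot(cube, (2, 0, 1))
print(by_quarter)
```

Drill-down is not shown because it needs the finer-grained data: it is the step from `by_quarter` back to `cube`, which is only possible when the detailed cells are still available.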
A Star-Net Query Model
[Figure: star-net query model. Radial lines are dimensions, each marked with abstraction levels: Customer Orders (CONTRACTS, ORDER), Shipping Method (AIR-EXPRESS, TRUCK), Time (ANNUALLY, QTRLY, DAILY), Product (PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP), Location (CITY, COUNTRY, REGION), Organization (SALES PERSON, DISTRICT, DIVISION), Customer, Promotion.]
Each circle is called a footprint.
Chapter 2: Data Warehousing and OLAP Technology for Data Mining
Choose the dimensions that will apply to each fact table record
Choose the measure that will populate each fact table record
[Figure: data warehouse architecture. Recoverable labels: Operational DBs and other sources; Extract, Transform, Load, Refresh; Monitor & Integrator; Metadata; Data Warehouse; Data Marts; OLAP Server (Serve); Analysis, Query, Reports, Data mining.]
materialized
Data Warehouse Development: A Recommended Approach
[Figure: development approach. Recoverable labels: Multi-Tier Data Warehouse, Distributed Data Marts.]
techniques)
fast indexing to pre-computed summarized data
[Figure: 3-D array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 chunks numbered 1 to 64.]
What is the best traversing order to do multi-way aggregation?
Multi-way Array Aggregation for Cube Computation
[Figure: 3-D array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 chunks numbered 1 to 64; shown on two consecutive slides.]
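As a sketch of the idea, the code below scans the 64 chunks of the figure in numeric order and updates the AB, AC, and BC aggregates simultaneously in a single pass. It also measures how many partially aggregated chunks of each plane must be held in memory at once for that scan order. The chunk numbering formula is read off the figure (1 to 4 along A, then B, then C); the per-chunk totals and the function names are hypothetical.

```python
from itertools import product

N = 4  # chunks per dimension, as in the 4 x 4 x 4 figure

def chunk_number(a, b, c):
    """Chunk label used in the figure: A varies fastest, then B, then C."""
    return 1 + a + N * b + N * N * c

# Hypothetical per-chunk totals; each chunk simply carries its own label.
chunk_total = {(a, b, c): chunk_number(a, b, c)
               for a, b, c in product(range(N), repeat=3)}
scan_order = sorted(chunk_total, key=lambda k: chunk_number(*k))

def multiway_aggregate(chunk_total, order):
    """Visit every chunk once and update all three 2-D aggregates together."""
    ab, ac, bc = {}, {}, {}
    for a, b, c in order:
        value = chunk_total[(a, b, c)]
        ab[(a, b)] = ab.get((a, b), 0) + value
        ac[(a, c)] = ac.get((a, c), 0) + value
        bc[(b, c)] = bc.get((b, c), 0) + value
    return ab, ac, bc

def peak_open(order, project):
    """Largest number of plane chunks that are started but not yet complete."""
    first, last = {}, {}
    for t, key in enumerate(order):
        target = project(key)
        first.setdefault(target, t)
        last[target] = t
    return max(sum(1 for g in first if first[g] <= t <= last[g])
               for t in range(len(order)))

ab, ac, bc = multiway_aggregate(chunk_total, scan_order)
peaks = {
    "AB": peak_open(scan_order, lambda k: (k[0], k[1])),
    "AC": peak_open(scan_order, lambda k: (k[0], k[2])),
    "BC": peak_open(scan_order, lambda k: (k[1], k[2])),
}
print(peaks)
```

For this order the whole AB plane stays open, one row of the AC plane stays open, and only one BC chunk is open at a time. This suggests scanning so that the plane that must be held in full is the smallest one, which is what the traversal-order question is getting at.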
[Figure: layered architecture. Recoverable labels: Layer1 (Data Repository: Databases, Data Warehouse; data cleaning, data integration); Database API with Filtering & Integration; Layer2 (MDDB, Meta Data).]
Summary
Data warehouse
A subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making
process
A multi-dimensional model of a data warehouse
Star schema, snowflake schema, fact constellations
A data cube consists of dimensions & measures
OLAP operations: drilling, rolling, slicing, dicing and pivoting
OLAP servers: ROLAP, MOLAP, HOLAP
Efficient computation of data cubes
Partial vs. full vs. no materialization
Multiway array aggregation
Bitmap index and join index implementations
Further development of data cube technology
Discovery-driven and multi-feature cubes
From OLAP to OLAM (on-line analytical mining)
http://www.cs.sfu.ca/~han