Data Warehouse

The document discusses data warehousing and OLAP technology. It describes the differences between data warehouses and heterogeneous databases as well as operational databases. The document also explains concepts like star schemas, snowflake schemas, and fact constellations that are used in data warehouse design.

Uploaded by

Vignesh Senthil

Data Warehousing and OLAP Technology

* 1
Data Warehouse vs. Heterogeneous DBMS

■ Traditional heterogeneous DB integration:
■ Build wrappers/mediators on top of heterogeneous databases
■ Query-driven approach
■ When a query is posed to a client site, a meta-dictionary is
used to translate the query into queries appropriate for the
individual heterogeneous sites involved, and the results are
integrated into a global answer set
■ Requires complex information filtering, and queries compete
for resources at the sources
■ Data warehouse: update-driven, high performance
■ Information from heterogeneous sources is integrated in advance
and stored in warehouses for direct query and analysis
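
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real mediator: the two sources, their field names, and the functions `query_driven_total`, `build_warehouse`, and `warehouse_total` are all hypothetical.

```python
# Two hypothetical sources that store the same kind of fact in different shapes.
SOURCE_A = [{"cust": "ann", "amount_usd": 120.0}, {"cust": "bob", "amount_usd": 80.0}]
SOURCE_B = [("ann", 5000), ("cy", 2500)]  # (customer, amount in cents)


def query_driven_total(customer):
    """Query-driven: translate the query for each source at query time,
    then integrate the partial results into one answer."""
    from_a = sum(r["amount_usd"] for r in SOURCE_A if r["cust"] == customer)
    from_b = sum(cents / 100 for name, cents in SOURCE_B if name == customer)
    return from_a + from_b


def build_warehouse():
    """Update-driven: integrate all sources in advance into one uniform store."""
    warehouse = {}
    for r in SOURCE_A:
        warehouse[r["cust"]] = warehouse.get(r["cust"], 0.0) + r["amount_usd"]
    for name, cents in SOURCE_B:
        warehouse[name] = warehouse.get(name, 0.0) + cents / 100
    return warehouse


WAREHOUSE = build_warehouse()  # rebuilt on a refresh schedule, not per query


def warehouse_total(customer):
    """Queries read the pre-integrated store and never touch the sources."""
    return WAREHOUSE.get(customer, 0.0)
```

Both paths give the same answer for a customer; the difference is when the translation and integration work happens, and whether a query has to touch the source systems at all.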

* 2
Data Warehouse vs. Operational DBMS
■ OLTP (on-line transaction processing)
■ Major task of traditional relational DBMS
■ Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
■ OLAP (on-line analytical processing)
■ Major task of data warehouse system
■ Data analysis and decision making
■ Distinct features (OLTP vs. OLAP):
■ User and system orientation: customer vs. market
■ Data contents: current, detailed vs. historical, consolidated
■ Database design: ER + application vs. star + subject
■ View: current, local vs. evolutionary, integrated
■ Access patterns: update vs. read-only but complex queries
* 3
OLTP vs. OLAP

* 4
Why Separate Data Warehouse?
■ High performance for both systems
■ DBMS: tuned for OLTP (access methods, indexing,
concurrency control, recovery)
■ Warehouse: tuned for OLAP (complex OLAP queries,
multidimensional view, consolidation)
■ Different functions and different data:
■ Missing data: decision support requires historical data
which operational DBs do not typically maintain
■ Data consolidation: decision support requires consolidation
(aggregation, summarization) of data from
heterogeneous sources
■ Data quality: different sources typically use
inconsistent data representations, codes, and formats
which have to be reconciled
* 5
From Tables and Spreadsheets
to Data Cubes

■ A data warehouse is based on a multidimensional data model which
views data in the form of a data cube
■ A data cube, such as sales, allows data to be modeled and viewed
in multiple dimensions
■ Dimension tables, such as item (item_name, brand, type), or
time (day, week, month, quarter, year)
■ Fact table contains measures (such as dollars_sold) and keys to
each of the related dimension tables
■ In data warehousing literature, an n-D base cube is called a base
cuboid. The topmost 0-D cuboid, which holds the highest level of
summarization, is called the apex cuboid. The lattice of cuboids
forms a data cube.
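
As a minimal sketch of the lattice idea, the snippet below enumerates every cuboid for a set of dimensions by taking all subsets of the dimension list. The function name `cuboids` is hypothetical, and the sketch assumes no concept hierarchies within a dimension, in which case n dimensions yield 2^n cuboids.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every cuboid (subset of dimensions), grouped by dimensionality."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        lattice[k] = list(combinations(dimensions, k))
    return lattice


dims = ("time", "item", "location", "supplier")
lattice = cuboids(dims)

# Print one line per level; the empty subset is the apex cuboid, "all".
for k, level in lattice.items():
    print(f"{k}-D:", [",".join(c) or "all" for c in level])

total = sum(len(level) for level in lattice.values())
print("cuboids in the lattice:", total)
```

The 0-D level holds only the apex cuboid and the 4-D level holds only the base cuboid.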
* 6
Cube: A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier
* 7
Conceptual Modeling of
Data Warehouses
■ Modeling data warehouses: dimensions & measures
■ Star schema: A fact table in the middle connected to a
set of dimension tables
■ Snowflake schema: A refinement of the star schema
where some dimensional hierarchy is normalized into a
set of smaller dimension tables, forming a shape
similar to a snowflake
■ Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
* 8
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: dollars_sold, units_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
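
A star schema like this one can be tried out with Python's built-in `sqlite3` module. The sketch below is a cut-down version under stated assumptions: it keeps only the time and item dimensions, the table names `time_dim`, `item_dim`, and `sales_fact` are hypothetical, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       brand TEXT, type TEXT);
-- The fact table holds the measures plus one key per dimension table.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL,
    units_sold INTEGER
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, 15, 1, "Q1", 2024), (2, 20, 4, "Q2", 2024)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                 [(10, "TV-55", "Acme", "TV"), (11, "PC-Pro", "Acme", "PC")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 500.0, 1), (1, 11, 900.0, 1), (2, 10, 1000.0, 2)])

# A typical star join: the fact table joined once to each dimension it needs.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN item_dim AS i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.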

* 9
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country
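
The practical effect of the snowflake normalization is an extra join when a query needs an attribute from the split-off table. The following sketch, again using `sqlite3` with made-up rows and a reduced set of columns, totals sales by country, which now requires going from the fact table through location to city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location(location_key),
                         dollars_sold REAL);
INSERT INTO city VALUES (1, 'Vancouver', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO location VALUES (100, '1 Main St', 1), (101, '2 Oak Ave', 1),
                            (102, '3 Lake Rd', 2);
INSERT INTO sales_fact VALUES (100, 250.0), (101, 150.0), (102, 300.0);
""")

# Two joins are needed to reach country: fact -> location -> city.
by_country = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON f.location_key = l.location_key
    JOIN city AS c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(by_country)
```

In the star version the country column sits directly in the location table, so the same query needs one join fewer, at the cost of repeating city and country values on every location row.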

* 10
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
* 11
Measures: Three Categories
■ distributive: if the result derived by applying the function
to n aggregate values is the same as that derived by
applying the function on all the data without partitioning.
■ E.g., count(), sum(), min(), max().
■ algebraic: if it can be computed by an algebraic function
with M arguments (where M is a bounded integer), each
of which is obtained by applying a distributive aggregate
function.
■ E.g., avg(), min_N(), standard_deviation().
■ holistic: if there is no constant bound on the storage size
needed to describe a subaggregate.
■ E.g., median(), mode(), rank().
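
The three categories can be checked on a small partitioned data set. This is a sketch with made-up numbers; the partitioning into three lists stands in for data held in separate chunks.

```python
from statistics import median

partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: the sum of per-partition sums equals the sum over all the data.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

# Algebraic: avg is derived from two distributive aggregates, sum and count.
partial_counts = [len(p) for p in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: combining per-partition medians does not, in general, give the
# median of all the data; the full set of values is needed.
median_of_medians = median(median(p) for p in partitions)
true_median = median(all_values)

print(total, average, median_of_medians, true_median)
```

Here the sum and the average are recovered exactly from small per-partition summaries, while the median of the partition medians differs from the true median.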
* 12
Multidimensional Data
■ Sales volume as a function of product, month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Location: Region > Country > City > Office
Time: Year > Quarter > Month or Week > Day
[Figure: cube with axes Product, Region, Month]
* 13
A Sample Data Cube
[Figure: 3-D data cube. Axes: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), Country (U.S.A., Canada, Mexico, sum). One cell is labeled "Total annual sales of TV in U.S.A."]

* 14
Cuboids Corresponding to the Cube

0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product,date; product,country; date,country
3-D (base) cuboid: product, date, country

* 15
Browsing a Data Cube

■ Visualization
■ OLAP capabilities
■ Interactive manipulation
* 16
Typical OLAP Operations

■ Roll up (drill-up): summarize data
■ by climbing up hierarchy or by dimension reduction
■ Drill down (roll down): reverse of roll-up
■ from higher level summary to lower level summary or detailed
data, or introducing new dimensions
■ Slice and dice:
■ project and select
■ Pivot (rotate):
■ reorient the cube, visualization, 3D to series of 2D planes
■ Other operations
■ drill across: involving (across) more than one fact table
■ drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
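
The core operations can be illustrated on a tiny cube held in a Python dictionary. This is a sketch under stated assumptions: the cube contents, the city-to-country mapping, and all function names are hypothetical, and a real OLAP server would work on far larger, indexed structures.

```python
from collections import defaultdict

# Base cuboid: (city, quarter, product) -> units sold. All values are made up.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}
cube = {
    ("Vancouver", "Q1", "TV"): 10, ("Vancouver", "Q2", "TV"): 12,
    ("Toronto", "Q1", "TV"): 7, ("Toronto", "Q1", "PC"): 5,
    ("Chicago", "Q1", "PC"): 20, ("Chicago", "Q2", "TV"): 9,
}


def roll_up_location(cells):
    """Roll up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (city, quarter, product), units in cells.items():
        out[(CITY_TO_COUNTRY[city], quarter, product)] += units
    return dict(out)


def roll_up_drop_product(cells):
    """Roll up by dimension reduction: aggregate the product dimension away."""
    out = defaultdict(int)
    for (place, quarter, _product), units in cells.items():
        out[(place, quarter)] += units
    return dict(out)


def slice_quarter(cells, quarter):
    """Slice: select one value on the time dimension, leaving a 2-D plane."""
    return {(p, prod): u for (p, q, prod), u in cells.items() if q == quarter}


def dice(cells, places, products):
    """Dice: select on two dimensions at once to get a smaller subcube."""
    return {k: u for k, u in cells.items() if k[0] in places and k[2] in products}


def pivot(plane):
    """Pivot: swap the two axes of a 2-D plane."""
    return {(b, a): u for (a, b), u in plane.items()}


by_country = roll_up_location(cube)
q1 = slice_quarter(cube, "Q1")
```

Drill-down is the reverse of `roll_up_location`: it cannot be computed from `by_country` alone and has to go back to the more detailed `cube`.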
* 17
A Star-Net Query Model
[Figure: star-net with one radial line per dimension; the abstraction levels marked on each line are listed below]
Customer Orders: CONTRACTS, ORDER
Shipping Method: AIR-EXPRESS, TRUCK
Time: ANNUALLY, QTRLY, DAILY
Product: PRODUCT LINE, PRODUCT GROUP, PRODUCT ITEM
Location: CITY, COUNTRY, REGION
Organization: SALES PERSON, DISTRICT, DIVISION
Promotion
Customer
Each circle (abstraction level) is called a footprint
* 18
Design of a Data Warehouse: A
Business Analysis Framework
■ Four views regarding the design of a data warehouse
■ Top-down view
■ allows selection of the relevant information necessary for the
data warehouse
■ Data source view
■ exposes the information being captured, stored, and
managed by operational systems
■ Data warehouse view
■ consists of fact tables and dimension tables
■ Business query view
■ sees the data in the warehouse from the viewpoint
of the end user
* 19
Data Warehouse Design Process

■ Top-down, bottom-up approaches or a combination of both

■ Top-down: Starts with overall design and planning (mature)
■ Bottom-up: Starts with experiments and prototypes (rapid)
■ From software engineering point of view
■ Waterfall: structured and systematic analysis at each step before
proceeding to the next
■ Spiral: rapid generation of increasingly functional systems, with
short turnaround time
■ Typical data warehouse design process
■ Choose a business process to model, e.g., orders, invoices, etc.
■ Choose the grain (atomic level of data) of the business process
■ Choose the dimensions that will apply to each fact table record
■ Choose the measure that will populate each fact table record
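
The four design steps can be walked through on a toy example. In this sketch the business process is orders, the grain is one fact row per day, item, and store, and the measures are units_sold and dollars_sold. The raw order lines, the field names, and the function `build_fact_rows` are all assumptions made for illustration.

```python
from collections import defaultdict

# Step 1: the business process is orders; these are made-up raw order lines.
order_lines = [
    {"order_id": 1, "date": "2024-01-15", "item": "TV", "store": "S1", "qty": 1, "price": 500.0},
    {"order_id": 1, "date": "2024-01-15", "item": "PC", "store": "S1", "qty": 2, "price": 900.0},
    {"order_id": 2, "date": "2024-01-15", "item": "TV", "store": "S1", "qty": 1, "price": 480.0},
]

# Steps 2 and 3: the grain is one fact row per (date, item, store),
# which also fixes the dimensions that apply to each fact row.
GRAIN = ("date", "item", "store")


def build_fact_rows(lines):
    """Step 4: populate the measures for each fact row at the chosen grain."""
    facts = defaultdict(lambda: {"units_sold": 0, "dollars_sold": 0.0})
    for line in lines:
        key = tuple(line[d] for d in GRAIN)
        facts[key]["units_sold"] += line["qty"]
        facts[key]["dollars_sold"] += line["qty"] * line["price"]
    return dict(facts)


fact_rows = build_fact_rows(order_lines)
```

Choosing a coarser grain, for example dropping store from `GRAIN`, would shrink the fact table but make store-level analysis impossible later.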

* 20
Multi-Tiered Architecture
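[Figure: four tiers, left to right]
Data Sources: operational DBs and other sources, fed in through Extract, Transform, Load, Refresh
Data Storage: data warehouse and data marts, with metadata and a monitor & integrator
OLAP Engine: OLAP server, serving the front-end tools
Front-End Tools: analysis, query, reports, data mining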

* 21
Data Warehouse Development:
A Recommended Approach
[Figure: development flow, from bottom to top]
Define a high-level corporate data model
Data marts and enterprise data warehouse, each with model refinement
Distributed data marts
Multi-tier data warehouse

* 22
OLAP Server Architectures
■ Relational OLAP (ROLAP)
■ Use a relational or extended-relational DBMS to store and manage
warehouse data, and OLAP middleware to support the missing pieces
■ Include optimization of DBMS backend, implementation of
aggregation navigation logic, and additional tools and services
■ greater scalability
■ Multidimensional OLAP (MOLAP)
■ Array-based multidimensional storage engine (sparse matrix
techniques)
■ fast indexing to pre-computed summarized data
■ Hybrid OLAP (HOLAP)
■ User flexibility, e.g., low level: relational, high-level: array
■ Specialized SQL servers
■ specialized support for SQL queries over star/snowflake schemas
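
The storage difference between ROLAP and MOLAP can be sketched as follows. This is a toy model only: the fact rows are made up, the function names are hypothetical, and a real MOLAP engine would use sparse matrix techniques rather than the dense nested list shown here.

```python
# Hypothetical fact rows: (product, quarter, units_sold).
rows = [("TV", "Q1", 10), ("PC", "Q1", 5), ("TV", "Q2", 12), ("TV", "Q1", 7)]


def rolap_total(product, quarter):
    """ROLAP-style: answer by scanning and aggregating relational rows."""
    return sum(u for p, q, u in rows if p == product and q == quarter)


# MOLAP-style: pre-aggregate into an array addressed by position.
products = ["TV", "PC"]
quarters = ["Q1", "Q2"]
p_index = {p: i for i, p in enumerate(products)}
q_index = {q: j for j, q in enumerate(quarters)}
array = [[0] * len(quarters) for _ in products]
for p, q, u in rows:
    array[p_index[p]][q_index[q]] += u


def molap_total(product, quarter):
    """Direct lookup into the pre-computed summary; no scan at query time."""
    return array[p_index[product]][q_index[quarter]]
```

The array answers each lookup in constant time, but it also stores a cell for (PC, Q2) even though no sales exist there, which is why sparse storage matters as the number of dimensions grows.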
* 23
