DWH Concepts
to
Data Warehousing
Objectives
Operational Data vs. Data Warehouse
Appl A - m,f
Appl B - 1,0
Appl C - male,female
Warehouse - m,f
Appl A - bal-on-hand
Appl B - current-balance
Appl C - cash-on-hand
Warehouse - Current balance
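The encoding differences above can be sketched in code: each application stores the same attribute in its own way, and the warehouse load maps them all to one convention. This is a minimal sketch, not a prescribed method. The mapping tables and the `integrate` function are hypothetical, and the reading of Appl B's 1 as male and 0 as female is an assumption.

```python
# Hypothetical per-application gender encodings; the warehouse stores m/f.
# Assumption: in Appl B, "1" means male and "0" means female.
GENDER_MAPS = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"male": "m", "female": "f"},
}

# Each application names the balance field differently;
# the warehouse uses a single name for it.
BALANCE_FIELDS = {"A": "bal-on-hand", "B": "current-balance", "C": "cash-on-hand"}


def integrate(app, record):
    """Convert one operational record to the warehouse convention."""
    return {
        "gender": GENDER_MAPS[app][str(record["gender"]).lower()],
        "current_balance": record[BALANCE_FIELDS[app]],
    }


rows = [
    integrate("A", {"gender": "f", "bal-on-hand": 120.0}),
    integrate("B", {"gender": 1, "current-balance": 75.5}),
    integrate("C", {"gender": "Female", "cash-on-hand": 10.0}),
]
print(rows)
```

After the mapping, every row carries the same two field names and the same gender codes, whichever application it came from.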
Operational Data: insert, change, delete, replace
Warehouse: load, read-only access
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
Focus on Reporting
Evolution of Data Warehousing
• Trend Analysis
• What If?
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
hardware is different
Understanding The Differences Is The Key
OLTP Vs Warehouse
Efficiency Effectiveness
Data Marts
• Enterprise-wide data warehousing projects have a very long cycle time
• Getting consensus between multiple parties may also be difficult
• Departments may not be satisfied with the priority accorded to them
• Sometimes individual departmental needs may be strong enough to warrant a local implementation
• Application/database distribution is also an important factor
Data Marts
Warehouse
» Finance, Manufacturing, Sales etc.
[Figures: data warehouse architecture variants. Operational Systems/Data feed a Data Preparation stage (Select, Extract, Transform, Integrate, Maintain, Load) that populates the Data Warehouse, the Data Marts, or an Operational Data Store (ODS), each with its Metadata. Middleware/API connects these stores to the end-user tools: EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining.]
(Ralph Kimball)
– Second supports ER for Data Warehouse and Star
LAN
Data Warehouse Server
Processes
• Extract
• Scrubbing
• Transformation
• Load Jobs
• Aggregation Jobs
• Replication
• Monitoring
• Management
• Meta Data Repository
• Meta Data Population
• Meta Data Maintenance
DW is sum total of all Data Marts
DW Bus using Conformed Dimensions
LAN
Data Warehouse Server
Processes
• Extract
• Scrubbing
• Transformation
• Load Jobs
• Aggregation Jobs
• Replication
• Monitoring
• Management
• Meta Data Repository
• Meta Data Population
• Meta Data Maintenance
Detail Data in ER format
Summarized Data in Star formats
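A few of the server processes listed above (extract, scrubbing, transformation, load, aggregation) can be sketched as a small pipeline that loads detail data and then derives a summarized table from it. This is an illustrative sketch using Python's standard `sqlite3` module. The table names, column names, and sample rows are all hypothetical, and the summary here is a plain aggregate table, not a full star design.

```python
import sqlite3

# Hypothetical extracted rows: (sale_date, product, amount as text).
extracted = [
    ("2024-01-05", "widget", "10.50"),
    ("2024-01-05", "widget", "4.50"),
    ("2024-01-06", "gadget", "n/a"),   # bad amount, removed by scrubbing
    ("2024-01-06", "gadget", "20.00"),
]


def scrub(rows):
    """Scrubbing: drop rows whose amount is not a valid number."""
    clean = []
    for sale_date, product, amount in rows:
        try:
            float(amount)
        except ValueError:
            continue
        clean.append((sale_date, product, amount))
    return clean


def transform(rows):
    """Transformation: convert amounts from text to float."""
    return [(d, p, float(a)) for d, p, a in rows]


# Load job: write the cleaned detail rows into the detail table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)", transform(scrub(extracted))
)

# Aggregation job: build the summarized table from the detail table.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT product, SUM(amount) AS total_amount, COUNT(*) AS n_sales "
    "FROM sales_detail GROUP BY product"
)
summary = conn.execute(
    "SELECT product, total_amount, n_sales FROM sales_summary ORDER BY product"
).fetchall()
print(summary)
```

Keeping the detail table alongside the summary lets the aggregation job be rerun or changed without going back to the operational source.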
Requirement Gathering
Analysis
Deciding Database
Schema Generation
Normalization
Entity
– Object that can be observed and classified by its properties and characteristics
– Business definition with a clear boundary
– Characterized by a noun
– Example
• Product
• Employee
Entity-Relationship Modeling - Basic Concepts
Relationship
– Relationship between entities - structural interaction and association
– Described by a verb
– Cardinality
• 1-1
• 1-M
• M-M
– Example: Books belong to Printed Media
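The 1-M and M-M cardinalities above can be sketched as relational tables: a 1-M relationship becomes a foreign key on the "many" side, and an M-M relationship is resolved through a junction table. This is a sketch using Python's standard `sqlite3` module. The Department entity, the "sells" relationship, and all table and column names are hypothetical; only Product and Employee come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1-M: one department has many employees (foreign key on the "many" side).
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);

CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- M-M: an employee sells many products and a product is sold by many
-- employees, so the relationship gets its own junction table.
CREATE TABLE employee_product (
    emp_id     INTEGER NOT NULL REFERENCES employee(emp_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (emp_id, product_id)
);

INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1);
INSERT INTO product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO employee_product VALUES (1, 1), (1, 2), (2, 1);
""")

# Follow the 1-M relationship: employees in department 1.
employees_in_sales = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 1"
).fetchone()[0]

# Follow the M-M relationship: who sells product 1.
sellers_of_pen = [
    row[0]
    for row in conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN employee_product ep ON ep.emp_id = e.emp_id "
        "WHERE ep.product_id = 1 ORDER BY e.name"
    )
]
print(employees_in_sales, sellers_of_pen)
```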
Dimensional Modeling - Basic Concepts
Dimension
– Collection of members or units of the same type of views
– Determine contextual background for facts
– Parameters for OLAP
– Examples:
• Time
• Location/Region
• Customers
Dimensional Modeling - Basic Concepts
Measures
– A numeric attribute of a fact
– Represents performance or behavior of the business relative to the dimensions
– The actual numbers are called variables
– Examples:
• Quantity supplied
• Transaction amount
• Sales volume
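To illustrate how a measure is read relative to dimensions, the sketch below totals a transaction amount by whichever dimensions are chosen. It is a hedged illustration only: the sample facts, the dimension names (`month`, `region`, `customer`), and the `total_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: three dimension values followed by one measure
# (month, region, customer, transaction_amount).
DIMENSIONS = ("month", "region", "customer")
facts = [
    ("2024-01", "North", "C1", 100.0),
    ("2024-01", "North", "C2", 50.0),
    ("2024-01", "South", "C1", 70.0),
    ("2024-02", "North", "C1", 30.0),
]


def total_by(rows, *dims):
    """Sum the measure for each combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # the measure is the last column
    return dict(totals)


by_region = total_by(facts, "region")
by_month_region = total_by(facts, "month", "region")
print(by_region)
print(by_month_region)
```

The same numbers give different answers depending on the dimensions used to group them, which is why the dimensions are described as the context for the facts.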
Dimensional Data Model
Star Schema
– Fact Tables
– Dimension Tables
Snowflake Schema
Coverage Tables
Factless Tables
Star Join Schema Design
[Figure: ER model with the tables City, Sales District, Sales Region, Salesrep, Order Header, Customer, Order Details, Item, and Product Category, linked by foreign keys (FK).]
Star Schema
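Fact table SALES: CITY, PRODUCT, PERIOD, CUSTOMER (dimension keys); AMOUNT, UNITS (measures)
Dimension CITY: CITY, DISTRICT, STATE, REGION
Dimension PRODUCT: PRODUCT, BRAND, CATEGORY, COLOR, SIZE
Dimension PERIOD: DAY, MONTH, QUARTER, YEAR
Dimension CUSTOMER: CUSTOMER, ADDRESS, CATEGORY, CONTACT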
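A star schema like the one outlined above can be sketched with a central fact table joined to its dimension tables. The example uses Python's standard `sqlite3` module and only three of the dimensions to stay short. The table names, surrogate key columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE city_dim    (city_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE period_dim  (period_key INTEGER PRIMARY KEY, day TEXT, year INTEGER);

-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    city_key    INTEGER REFERENCES city_dim(city_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES period_dim(period_key),
    amount      REAL,
    units       INTEGER
);

INSERT INTO city_dim VALUES (1, 'Pune', 'West'), (2, 'Delhi', 'North');
INSERT INTO product_dim VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO period_dim VALUES (1, '2024-01-05', 2024), (2, '2024-01-06', 2024);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 20.0, 4),
    (1, 2, 2, 30.0, 1),
    (2, 1, 1, 10.0, 2);
""")

# Star join: constrain and group by dimension attributes, sum the measures.
by_region_category = conn.execute("""
    SELECT c.region, p.category, SUM(f.amount), SUM(f.units)
    FROM sales_fact f
    JOIN city_dim c    ON c.city_key = f.city_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY c.region, p.category
    ORDER BY c.region, p.category
""").fetchall()
print(by_region_category)
```

Every query follows the same shape: the fact table sits in the middle, and each dimension is one join away.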
Fact Table & Dimension Tables
dimension table).
– (NOTE: Fact tables must be in first, second, and third normal form.)
Dimensional Tables
Attributes should be textual and discrete
Occupy very little space compared to Fact Tables
Some common dimensions are:
– Customer
– Geography
– Time
– Products
[Figure: a central fact table holding Date_Key, Student_Key, Course_Key, Teacher_Key, and Facility_Key, each key pointing to its own dimension: Date, Student, Course, Teacher, and Facility.]
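The fact table in this figure consists only of dimension keys, with no numeric measure. Assuming it illustrates the factless table idea listed earlier, such a table is analyzed by counting rows, not by summing a measure. The sketch below treats each row as one attendance event, which is an assumption. The sample rows and key values are hypothetical.

```python
from collections import Counter

# Each row records that an event happened; there is no measure column.
# Column order: (date_key, student_key, course_key, teacher_key, facility_key)
attendance = [
    (20240105, 1, 10, 100, 7),
    (20240105, 2, 10, 100, 7),
    (20240105, 1, 11, 101, 8),
    (20240106, 2, 10, 100, 7),
]

# Analysis is done by counting rows per dimension key.
attendance_by_course = Counter(row[2] for row in attendance)
attendance_by_date = Counter(row[0] for row in attendance)
print(attendance_by_course)
print(attendance_by_date)
```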