An Introduction to Data Warehousing
Course Roadmap
► Data Warehousing - An Overview
► Data Warehouse Architecture
► Data Modeling for Data Warehousing
► Data Cleansing
► Data Extraction, Transformation and Load
► Metadata Management
► Data Warehouse Databases
► Data Access and Analysis
► Data Warehousing at Wipro
► Wipro Data Warehouse Process Model
Objectives
► At the end of this lesson, you will know :
What is Data Warehousing
The evolution of Data Warehousing
Need for Data Warehousing
OLTP Vs Warehouse Applications
Data marts Vs Data Warehouses
Operational Data Stores
Overview of Warehouse Architecture
What is a Data Warehouse ?
A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
- WH Inmon (Regarded As Father Of Data Warehousing)
► Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer?
► Data from multiple sources is integrated for a subject
► Identical queries will give same results at different times. Supports analysis requiring historical data
► Data stored for historical period. Data is populated in the data warehouse on daily/weekly basis depending upon the requirement.
Subject-Oriented - Characteristics of a Data Warehouse
Operational: Leads, Prospects, Quotes, Orders
Data Warehouse: Customers, Products, Regions, Time
Focus is on Subject Areas rather than Applications
Integrated - Characteristics of a Data Warehouse
Attribute | Appl A | Appl B | Appl C | Data Warehouse
Gender encoding | m,f | 1,0 | male,female | m,f
Balance format | dec fixed (13,2) | pic 9(9)V99 | pic S9(7)V99 comp-3 | dec fixed (13,2)
Balance field name | bal-on-hand | current-balance | cash-on-hand | Current balance
Date format | julian | yymmdd | absolute | julian
Integrated View Is The Essence Of A Data Warehouse
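The integration step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it covers only the gender codes and the balance field, it assumes that Appl B's 1 means male and 0 means female (the slide does not say), and the function name `integrate` and the record layout are hypothetical.

```python
from decimal import Decimal

# Per-application gender codes, mapped to the warehouse standard (m, f).
GENDER_CODES = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},  # assumption: 1 = male, 0 = female
    "C": {"male": "m", "female": "f"},
}

# Each application uses its own name for the same balance field.
BALANCE_FIELDS = {
    "A": "bal-on-hand",
    "B": "current-balance",
    "C": "cash-on-hand",
}


def integrate(app: str, record: dict) -> dict:
    """Map one source record onto the shared warehouse layout."""
    raw_gender = str(record["gender"]).strip().lower()
    raw_balance = record[BALANCE_FIELDS[app]]
    return {
        "source_app": app,
        "gender": GENDER_CODES[app][raw_gender],
        # Two decimal places, in the spirit of the dec fixed (13,2) target.
        "current_balance": Decimal(str(raw_balance)).quantize(Decimal("0.01")),
    }


rows = [
    integrate("A", {"gender": "m", "bal-on-hand": "120.5"}),
    integrate("B", {"gender": 0, "current-balance": 99}),
    integrate("C", {"gender": "Female", "cash-on-hand": "10.239"}),
]

for row in rows:
    print(row)
```

Whatever the source application, every row leaves `integrate` with the same field names and the same encodings, which is the point of the integrated view.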
Non-volatile - Characteristics of a Data Warehouse
Operational: insert, change, delete, replace
Data Warehouse: load, read only access
Time Variant - Characteristics of a Data Warehouse
Operational | Data Warehouse
Current Value data | Snapshot data
• time horizon : 60-90 days | • time horizon : 5-10 years
• key may not have element of time | • key has an element of time
 | • data warehouse stores historical data
Data Warehouse Typically Spans Across Time
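The difference between a current-value key and a key with an element of time can be shown with two dictionaries. This is only a sketch: the names `operational`, `warehouse`, `record_balance` and `balance_history` are hypothetical, and a real warehouse would use database tables rather than in-memory dictionaries.

```python
from datetime import date

# Operational store: keyed by customer only, so a new value overwrites the old one.
operational = {}

# Warehouse: the key carries a time element (customer_id, snapshot_date),
# so each load adds a snapshot instead of replacing the previous value.
warehouse = {}


def record_balance(customer_id: str, balance: float, as_of: date) -> None:
    """Apply the same fact to both stores to show how they diverge."""
    operational[customer_id] = balance
    warehouse[(customer_id, as_of)] = balance


def balance_history(customer_id: str) -> list:
    """Return (snapshot_date, balance) pairs in date order."""
    return sorted(
        (as_of, bal) for (cid, as_of), bal in warehouse.items() if cid == customer_id
    )


record_balance("C001", 500.0, date(2024, 1, 31))
record_balance("C001", 650.0, date(2024, 2, 29))

print(operational["C001"])      # only the current value survives
print(balance_history("C001"))  # one entry per snapshot
```

After the second call the operational store holds only the latest balance, while the warehouse still answers questions about January.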
Alternate Definitions
A collection of integrated, subject oriented databases designed to support the DSS function, where each unit of data is relevant to some moment of time
- Imhoff
Data Warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End user orientated data access and reporting tools let user get at the data for decision support
- Babcock
Evolution of Data Warehousing
1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
Focus on Reporting
Evolution of Data Warehousing
1985 - 1990 : Querying Era
• Ad hoc, unstructured access to corporate data
• SQL as interface not scalable
• Cannot handle complex analysis
Focus on Online Querying
Evolution of Data Warehousing
1990 - 20xx : Analysis Era
• Trend Analysis
• What If ?
• Moving Averages
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
Focus on Online Analysis
Need for Data Warehousing
► Better business intelligence for end-users
► Reduction in time to locate, access, and analyze information
► Consolidation of disparate information sources
► Strategic advantage over competitors
► Faster time-to-market for products and services
► Replacement of older, less-responsive decision support systems
► Reduction in demand on IS to generate reports
OLTP Vs Warehouse
Operational System | Data Warehouse
Transaction Processing | Query Processing
Time Sensitive | History Oriented
Operator View | Managerial View
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile Data | Non Volatile Data
Stores all data | Stores relevant data
Not Flexible | Flexible
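The contrast between transaction processing and query processing shows up in the shape of the queries. The sketch below uses Python's standard `sqlite3` module with a hypothetical `orders` table; for brevity both queries run against the same in-memory table, whereas the slides argue for keeping the two workloads on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "product TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Acme", "Widget", "2023-11-02", 100.0),
        (2, "Acme", "Gadget", "2024-01-15", 250.0),
        (3, "Bolt", "Widget", "2024-01-20", 75.0),
    ],
)


def get_order(order_id):
    """Transaction-style access: one record, found by its key."""
    return conn.execute(
        "SELECT customer, amount FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()


def sales_by_product_and_year():
    """Analysis-style access: many records, grouped by subject and time."""
    return conn.execute(
        "SELECT product, substr(order_date, 1, 4) AS year, SUM(amount) "
        "FROM orders GROUP BY product, year ORDER BY product, year"
    ).fetchall()


print(get_order(2))
print(sales_by_product_and_year())
```

The first query touches a single row and must return quickly for an operator; the second scans the whole table and summarizes it by product and year for a managerial view.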
Capacity Planning
[Chart: Processing Power plotted against Time of day]
Processing Load Peaks During the Beginning and End of Day
Examples Of Some Applications
► Target Marketing
► Market Segmentation
► Budgeting
► Credit Rating Agencies
► Financial Reporting and Consolidation
► Market Basket Analysis - POS Analysis
► Churn Analysis
► Profitability Management
► Event tracking
[Diagram labels: Manufacturers, Retailers, Customers]
Do we need a separate database ?
► OLTP and data warehousing require two very differently configured systems
► Isolation of Production System from Business Intelligence System
► Significant and highly variable resource demands of the data warehouse
► Cost of disk space no longer a concern
► Production systems not designed for query processing
Data Marts
► Enterprise-wide data warehousing projects have a very long cycle time
► Getting consensus between multiple parties may also be difficult
► Departments may not be satisfied with the priority accorded to them
► Sometimes individual departmental needs may be strong enough to warrant a local implementation
► Application/database distribution is also an important factor
Data Marts
► Subject or Application Oriented Business View of Warehouse
Quick Solution to a specific Business Problem
Finance, Manufacturing, Sales etc.
Smaller amount of data used for Analytic Processing
A Logical Subset of The Complete Data Warehouse
Data Warehouses or Data Marts
For companies interested in changing their corporate cultures or integrating separate departments, an enterprise-wide approach makes sense.
Companies that want a quick solution to a specific business problem are better served by a standalone data mart.
Some companies opt to build a warehouse incrementally, data mart by data mart.
Data Warehouse and Data Mart
Aspect | Data Warehouse | Data Marts
Scope | Application Neutral; Centralized, Shared; Cross LOB/enterprise | Specific Application Requirement; LOB, department; Business Process Oriented
Data Perspective | Historical Detailed data; Some summary | Detailed (some history); Summarized
Subjects | Multiple subject areas | Single Partial subject; Multiple partial subjects
Data Sources | Many; Operational/External Data | Few; Operational, external data
Implement Time Frame | 9-18 months for first stage; Multiple stage implementation | 4-12 months
Characteristics | Flexible, extensible; Durable/Strategic; Data orientation | Restrictive, non extensible; Short life/tactical; Project Orientation
Warehouse or Mart First ?
Data Warehouse First | Data Mart first
Expensive | Relatively cheap
Large development cycle | Delivered in < 6 months
Change management is difficult | Easy to manage change
Difficult to obtain continuous corporate support | Can lead to independent and incompatible marts
Technical challenges in building large databases | Cleansing, transformation, modeling techniques may be incompatible
OLTP Systems Vs Data Warehouse
Remember: between OLTP and Data Warehouse systems,
► users are different
► data content is different
► data structures are different
► hardware is different
Understanding The Differences Is The Key
Operational Data Store - Definition
[Diagram labels: A, B, ODS, Data Warehouse, Operational, DSS]
ODS - Definition
A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data
► Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer?
► Data from multiple sources is integrated for a subject
► Identical queries may give different results at different times. Supports analysis requiring current data
► Data stored only for current period. Old Data is either archived or moved to Data Warehouse
Operational Data Store
► The ODS applies only to the world of operational systems.
► The ODS contains current valued and near current valued data.
► The ODS contains almost exclusively detail data.
► The ODS requires a full function, update, record oriented environment.
Operational Data Store
► Functions of an ODS
  ► Converts Data
  ► Decides Which Data of Multiple Sources Is the Best
  ► Summarizes Data
  ► Decodes/encodes Data
  ► Alters the Key Structures
  ► Alters the Physical Structures
  ► Reformats Data
  ► Internally Represents Data
  ► Recalculates Data
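A few of these functions can be sketched together in Python: choosing the best of several sources, decoding a status flag, altering the key structure and converting a unit. Everything in the sketch is an assumption made for illustration, including the source names, the field names, the status codes and the rule that the most recently updated record wins.

```python
from datetime import datetime

# Hypothetical records for the same customer from two source systems.
candidates = [
    {"source": "billing", "cust_id": "0042", "status": "A",
     "updated": "2024-03-01T09:00:00", "balance_cents": 125050},
    {"source": "crm", "cust_id": "42", "status": "I",
     "updated": "2024-03-05T14:30:00", "balance_cents": 125050},
]

# Decode step: source flags become readable values (codes are assumed).
STATUS_DECODE = {"A": "active", "I": "inactive"}


def pick_best(records):
    """Decide which source wins; here, the most recently updated record."""
    return max(records, key=lambda r: datetime.fromisoformat(r["updated"]))


def to_ods_row(record):
    """Apply key, decode and conversion rules to one source record."""
    return {
        "customer_key": int(record["cust_id"]),     # alter the key structure
        "status": STATUS_DECODE[record["status"]],  # decode
        "balance": record["balance_cents"] / 100,   # convert cents to units
        "source": record["source"],
    }


ods_row = to_ods_row(pick_best(candidates))
print(ods_row)
```

A real ODS would hold such rules in metadata rather than in code, but the order of operations (select the best record, then reshape it) is the same idea.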
Different kinds of Information Needs
► Current: Is this medicine available in stock?
► Recent: What are the tests this patient has completed so far?
► Historical: Has the incidence of Tuberculosis increased in last 5 years in Southern region?
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating Personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data Structure | Detailed | Detailed and lightly summarized | Detailed and Summarized
Data organization | Functional | Subject-oriented | Subject-oriented
Type of Data | Homogeneous | Homogeneous | Vast Supply of very heterogeneous data
Data redundancy | Non-redundant within system; Unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data update | Field by field | Field by field | Controlled batch
Database size | Moderate | Moderate | Large to very large
Development Methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise
Typical Data Warehouse Architecture
[Diagram labels: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Metadata; Data Warehouse; Data Marts; Middleware/API; EIS/DSS; Query Tools; OLAP/ROLAP; Web Browsers; Data Mining]
Multi-tiered Data Warehouse without ODS
Typical Data Warehouse Architecture
[Diagram labels: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Metadata; ODS; Data Preparation (Select, Extract, Transform, Load); Metadata; Data Warehouse; Data Marts]
Multi-tiered Data Warehouse with ODS
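The data preparation steps named in these diagrams (select, extract, transform, integrate, load) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the process model used in the course: the two source lists, their field names and the `sales_fact` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational extracts; field names are illustrative only.
SALES_SYSTEM = [
    {"cust": "acme ", "amt": "100.00", "day": "2024-01-15"},
    {"cust": "Bolt", "amt": "75.50", "day": "2024-01-20"},
]
WEB_SYSTEM = [
    {"customer_name": "ACME", "total": 20.0, "date": "2024-01-16"},
]


def extract():
    """Select and extract rows from each source, tagging their origin."""
    for row in SALES_SYSTEM:
        yield "sales", row
    for row in WEB_SYSTEM:
        yield "web", row


def transform(source, row):
    """Transform a source row into the common warehouse layout."""
    if source == "sales":
        return (row["cust"].strip().title(), float(row["amt"]), row["day"])
    return (row["customer_name"].strip().title(), float(row["total"]), row["date"])


def load(conn, rows):
    """Load the integrated rows into the warehouse table in one batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, [transform(src, row) for src, row in extract()])

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales_fact GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Because both sources are reduced to one customer spelling before loading, the final query can total sales per customer across systems.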
Questions