Data Warehouse Architecture: Processes
Overview
The architecture is the technical blueprint stage. It must support three major driving forces:
Populating the warehouse: data extraction, cleaning, and loading
Day-to-day management of the warehouse: handling large volumes of data, creating and deleting summaries
The ability to cope with requirement evolution: coping with future changes in query profiles
Typical Process Flow
Extract and load the data
Clean and transform the data into a form that provides good query performance
Back up and archive the data
Manage queries and direct them to the appropriate data sources
Extract & Load Process
Extract
Takes data from the source systems and makes it available to the data warehouse
Load
Takes extracted data and loads it into the data warehouse
Data in operational systems is held in a form suitable for that system. Before loading the data into the DW, its information content must be reconstructed: the data must become value-added business information.
The extract and load process must therefore take the data and add context and meaning.
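A minimal sketch of that idea, assuming a hypothetical billing source held in SQLite; all table, column, and status-code names are illustrative:

```python
import sqlite3

# Illustrative mapping from operational status codes to business terms.
STATUS_LABELS = {"A": "active", "C": "cancelled", "S": "suspended"}

def extract_customers(source: sqlite3.Connection) -> list[dict]:
    """Pull raw rows from the source system and reconstruct their
    information content as business-level records."""
    rows = source.execute(
        "SELECT cust_id, status_cd, region_cd FROM customers"
    ).fetchall()
    return [
        {
            "customer_id": cust_id,
            # Decode the operational code into business vocabulary.
            "status": STATUS_LABELS.get(status_cd, "unknown"),
            "region": region_cd,
            "source_system": "billing",  # provenance adds context in the DW
        }
        for cust_id, status_cd, region_cd in rows
    ]
```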
Issues with the Extract & Load Process
When should data extraction start, when should transformations and consistency checks run, and so on?
A controlling mechanism is essential to fire each module when appropriate
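One simple form such a controlling mechanism could take (a sketch; the module names and the registry pattern are illustrative, not a prescribed design):

```python
from typing import Callable

# Ordered registry of pipeline modules; each reports success/failure.
PIPELINE: list[tuple[str, Callable[[], bool]]] = []

def register(name: str):
    def wrap(fn: Callable[[], bool]):
        PIPELINE.append((name, fn))
        return fn
    return wrap

def run_pipeline() -> None:
    # Fire each module only after its predecessor completed successfully.
    for name, module in PIPELINE:
        if not module():
            raise RuntimeError(f"{name} failed; halting the pipeline")

@register("extract")
def extract() -> bool:
    return True  # placeholder: pull data from the source systems

@register("load_staging")
def load_staging() -> bool:
    return True  # placeholder: load extracts into the temporary store

@register("consistency_checks")
def consistency_checks() -> bool:
    return True  # placeholder: check the staged data

run_pipeline()
```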
When to extract?
The data must be in a consistent state. Start extracting data from a source only when it represents the same snapshot of time as all the other data sources.
E.g., a customer database
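A sketch of that gate, assuming each source can report the business date of its last completed update run (how a real source exposes this varies):

```python
import datetime

def ready_to_extract(snapshots: dict[str, datetime.date]) -> bool:
    """Extract only when every source reports the same snapshot date."""
    return len(set(snapshots.values())) == 1

# The customer database is a day behind, so extraction must wait.
snapshots = {
    "orders":    datetime.date(2024, 1, 31),
    "customers": datetime.date(2024, 1, 30),
}
assert not ready_to_extract(snapshots)
```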
Loading the data
Extracted data is loaded into a temporary data store where it is cleaned up and checked for consistency. Do not execute the consistency checks until all the data sources have been loaded into the temporary data store.
E.g., a customer cancelling a subscription: the cancellation may appear in one source before it is reflected in the others.
Error recovery must be an integral part of the design.
The effort required to clean up the source systems increases exponentially with the number of overlapping data sources.
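A sketch of the staging pattern, assuming two hypothetical overlapping sources (billing and CRM) landed in a SQLite staging area; all names are illustrative:

```python
import sqlite3

def load_into_staging(staging: sqlite3.Connection,
                      sources: dict[str, list[tuple[str, str]]]) -> None:
    # Load every source into its own temporary table first.
    for name, rows in sources.items():
        staging.execute(
            f"CREATE TABLE IF NOT EXISTS stg_{name} (cust_id TEXT, status TEXT)")
        staging.executemany(f"INSERT INTO stg_{name} VALUES (?, ?)", rows)

def check_consistency(staging: sqlite3.Connection) -> list[str]:
    # Runs only after ALL sources are staged: a cancellation recorded in
    # billing but not yet in the CRM would otherwise be flagged too early.
    rows = staging.execute(
        """SELECT b.cust_id FROM stg_billing b
           JOIN stg_crm c ON c.cust_id = b.cust_id
           WHERE b.status <> c.status""").fetchall()
    return [cust_id for (cust_id,) in rows]

staging = sqlite3.connect(":memory:")
load_into_staging(staging, {
    "billing": [("c1", "cancelled")],
    "crm":     [("c1", "cancelled")],
})
assert check_consistency(staging) == []  # both sources now agree
```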
Copy management tools and clean-up
E.g., IBM's Information Warehouse framework, with Data Refresher and Data Hub
Most copy management tools cannot perform consistency checks directly; the user must write and code that logic. Perform a cost-benefit analysis before purchasing a copy management tool.
If source systems do not overlap, then consistency checks are very simple
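A sketch of why the non-overlapping case is simple: with no shared entities, checking reduces to validating each source on its own, e.g. key uniqueness (an illustrative check):

```python
def check_disjoint_source(rows: list[tuple]) -> bool:
    """With no overlap between sources, consistency checking reduces
    to per-source validation; here, primary keys must be unique."""
    keys = [row[0] for row in rows]
    return len(keys) == len(set(keys))

assert check_disjoint_source([("c1", "active"), ("c2", "active")])
assert not check_disjoint_source([("c1", "active"), ("c1", "cancelled")])
```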
Clean and Transform Data
Steps involved are:
Clean and transform the loaded data into a structure that speeds up queries
Partition the data to speed up queries, optimize hardware performance, and simplify DW management
Create aggregations to speed up the common queries
Data needs to be cleaned and checked in the following ways:
Make sure the data is consistent with itself
Make sure the data is consistent with other data within the same source
Make sure the data is consistent with data in the other source systems
Make sure the data is consistent with the information already in the DW
Once data is cleaned, convert source data into a structure that is designed to balance query performance and operational cost
The structure must be suitable for long term storage
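A sketch of partitioning and pre-aggregation, assuming a hypothetical sales fact table in SQLite keyed by sale_date; the monthly partitioning scheme and the rollup are illustrative choices, not a prescribed design:

```python
import sqlite3

def partition_by_month(dw: sqlite3.Connection, month: str) -> None:
    """Copy one month of facts into its own table (e.g. sales_2024_01)
    so common queries scan only the partition they need."""
    part = "sales_" + month.replace("-", "_")
    dw.execute(
        f"CREATE TABLE IF NOT EXISTS {part} AS "
        "SELECT * FROM sales WHERE substr(sale_date, 1, 7) = ?", (month,))

def build_monthly_aggregate(dw: sqlite3.Connection) -> None:
    """Precompute a common rollup so frequent queries avoid the detail rows."""
    dw.execute("DROP TABLE IF EXISTS agg_sales_monthly")
    dw.execute(
        """CREATE TABLE agg_sales_monthly AS
           SELECT substr(sale_date, 1, 7) AS month,
                  product_id,
                  SUM(amount) AS total_amount,
                  COUNT(*)    AS n_rows
           FROM sales
           GROUP BY month, product_id""")
```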
Backup & Archive
Regular backup is essential to recover data from loss.
Archiving
Older data is removed from the system in a format that allows it to be quickly restored if required.
Issue
As the DW evolves, all of its information may change. Hence, to ensure that a restored archive is valid, it becomes necessary to archive all related data and structures as well.
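A sketch of archiving data together with its structures, using SQLite's iterdump, which emits the schema and the rows as one script; the file layout and the substring filter are illustrative simplifications:

```python
import sqlite3

def archive_table(dw: sqlite3.Connection, table: str, path: str) -> None:
    """Write the table's CREATE statement and its rows to one file,
    then drop the old data from the DW."""
    with open(path, "w") as f:
        for stmt in dw.iterdump():  # emits schema and data together
            if table in stmt:       # crude filter, adequate for a sketch
                f.write(stmt + "\n")
    dw.execute(f"DROP TABLE {table}")

def restore_table(dw: sqlite3.Connection, path: str) -> None:
    """Replay the archived statements; because the structure was saved
    with the data, the restore is valid even after the DW has evolved."""
    with open(path) as f:
        dw.executescript(f.read())
```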
Query Management Process
It is a system process that:
Manages the queries
Speeds them up by directing each query to the most effective data source
Ensures that all system resources are used effectively
Monitors query profiles to decide which aggregations to generate
This process operates at all times.
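A sketch of the direction step, reusing the illustrative agg_sales_monthly rollup from the earlier sketch; the routing rule (the aggregate can answer a query only if it covers all grouped columns) is a simplification:

```python
def route_query(group_by: set[str], agg_columns: set[str]) -> str:
    """Direct a query to the pre-built aggregate when it can answer it,
    otherwise to the detail (fact) table."""
    return "agg_sales_monthly" if group_by <= agg_columns else "sales"

# A monthly-by-product profile is served by the aggregate...
assert route_query({"month", "product_id"},
                   {"month", "product_id"}) == "agg_sales_monthly"
# ...while a per-customer query falls back to the detail rows.
assert route_query({"customer_id"},
                   {"month", "product_id"}) == "sales"
```

Monitoring which table each query profile ends up hitting is what tells this process which aggregations are worth generating or deleting.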