
Lecture-2 the Building Blocks


DATA WAREHOUSE: THE BUILDING BLOCKS


Lecture # 02
Instructor: Mr. Sharjeel Ahmed
Slide Elements
• Data Warehouse
• Data Warehouse (Definition)
• Defining Features
• Data Warehouse and Data Marts

• Building Blocks of Data Warehouse


• Source Data Component
• Data Staging Component
• Data Storage Component
• Information Storage Component
• Information Delivery Component
• Metadata Component
• Management and Control Component
DATA WAREHOUSE
Data Warehouse - 01
• Bill Inmon, considered to be the father of data
warehousing, provides the following definition:

“A Data Warehouse is a subject oriented,
integrated, nonvolatile, and time variant collection
of data in support of management’s decisions.”
Data Warehouse - 02
• Sean Kelly, another leading data warehousing
practitioner, defines the data warehouse in the following
way:

“The data in the data warehouse is:
• Separate
• Available
• Integrated
• Time stamped
• Subject oriented
• Nonvolatile
• Accessible”
Defining Features - 01
• Some of the key defining features of the data
warehouse based on previous definitions:

• Subject-Oriented Data
• Integrated Data
• Time-Variant Data
• Nonvolatile Data
• Data Granularity
Defining Features - 02
Subject-Oriented Data - I
• In operational systems, we store data by individual
applications. The data sets for each application need to be
organized around that specific application to provide data
for the specific functions efficiently.

• In striking contrast, in the data warehouse, data is stored
by business subjects, not by applications.
• Business subjects differ from enterprise to enterprise.
These are the subjects critical for the enterprise. For
example: for a manufacturing company, sales, shipments,
and inventory are critical business subjects.
Defining Features - 03
Subject-Oriented Data - II
Defining Features - 04
Integrated Data - I
• The data in the data warehouse comes from several
operational systems. For proper decision making, you
need to pull together all the relevant data from the various
applications.
• Source data are in different databases, files, and data
segments. These are disparate applications, so the
operational platforms and operating systems could be
different.
• In addition to data from internal operational systems, data
from outside sources is likely to be very important.
Companies such as Metro Mail specialize in providing
vital data on a regular basis.
Defining Features - 05
Integrated Data - II
• The file layouts, character code representations, and field
naming conventions all could be different.
• Before the data from various disparate sources can be
usefully stored in a data warehouse, you have to remove
the inconsistencies.
• Here are some of the items that would need
standardization:
• Naming conventions
• Codes
• Data attributes
• Measurements
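The standardization items above can be illustrated with a minimal sketch. It assumes two hypothetical source systems (an order-entry system and a billing system) that name fields differently, encode gender differently, and record weight in different units. All field names and mappings here are invented for illustration and do not come from the lecture.

```python
# Hypothetical records from two operational systems with inconsistent conventions.
order_entry_row = {"cust_nm": "A. Khan", "gender": "M", "weight_lb": 22.0}
billing_row = {"customer_name": "S. Ali", "sex": "female", "weight_kg": 5.0}

# Assumed mappings; a real warehouse would keep these in its metadata.
FIELD_NAMES = {"cust_nm": "customer_name", "sex": "gender"}
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
LB_PER_KG = 2.20462


def standardize(row):
    """Return a copy of row using one naming, coding, and unit convention."""
    out = {}
    for name, value in row.items():
        name = FIELD_NAMES.get(name, name)            # naming conventions
        if name == "gender":
            value = GENDER_CODES[value.lower()]       # codes
        elif name == "weight_lb":                     # measurements
            name, value = "weight_kg", round(value / LB_PER_KG, 2)
        out[name] = value
    return out


integrated = [standardize(order_entry_row), standardize(billing_row)]
```

After standardization both records share the same field names, the same gender codes, and the same unit of weight, so they can be stored side by side.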
Defining Features - 06
Integrated Data - III
Defining Features - 07
Time-Variant Data - I
• Operational systems reflect current information because these
systems support day-to-day current operations.

• On the other hand, the data in the data warehouse is meant for
analysis and decision making. A data warehouse, because of
the very nature of its purpose, has to contain historical data.
• Data is stored as snapshots over past and current periods.
• Every data structure in the data warehouse contains the time
element. You will find historical snapshots of the operational
data in the data warehouse.
Defining Features - 08
Time-Variant Data - II
• The time-variant nature of the data in a data warehouse:
• Allows for analysis of the past
• Relates information to the present
• Enables forecasts for the future
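The contrast between current operational data and historical snapshots can be sketched as follows. This is only an illustration under assumed names: the account identifier, the balances, and the `take_snapshot` helper are hypothetical.

```python
from datetime import date

# Operational system: holds only the current value and overwrites it.
operational_balance = {"ACC-1": 500}

# Warehouse: every row carries a time element (snapshot_date).
warehouse_snapshots = []


def take_snapshot(as_of):
    """Append the current operational state to the warehouse, stamped with a date."""
    for account, balance in operational_balance.items():
        warehouse_snapshots.append(
            {"account": account, "balance": balance, "snapshot_date": as_of}
        )


take_snapshot(date(2024, 1, 31))
operational_balance["ACC-1"] = 650      # day-to-day update overwrites the old value
take_snapshot(date(2024, 2, 29))

# Analysis of the past: the change between two periods is still recoverable.
history = [row["balance"] for row in warehouse_snapshots if row["account"] == "ACC-1"]
change = history[-1] - history[0]
```

The operational store ends up knowing only the latest balance, while the warehouse keeps one dated row per period and can therefore answer questions about the past.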
Defining Features - 09
Nonvolatile Data - I
• The business transactions update the operational system
databases in real time.

• The data in a data warehouse is not as volatile as the
data in an operational database.
• The data in the data warehouse is not intended to run the
day-to-day business. The data in a data warehouse is
primarily for query and analysis.
• You usually do not update or delete the data in the
data warehouse in real time.
• Data updates are commonplace in an operational
database; not so in a data warehouse.
Defining Features - 10
Nonvolatile Data - II
• Once the data is captured in the data warehouse, you do
not run individual transactions to change the data there.
Defining Features - 11
Data Granularity - I
• Data granularity refers to the level of detail.
• Depending on the requirements, multiple levels of detail
may be present. Many data warehouses have at least dual
levels of granularity.
• Depending on the query, you can then go to the particular
level of detail and satisfy the query.
• The lower the level (that is, the more detailed the data), the finer the data granularity.
• To keep data in the lowest level of detail, you have to store
a lot of data in the data warehouse.
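Dual levels of granularity can be sketched as a detailed daily level plus a monthly summary derived from it. The sales rows and the `summarize_by_month` helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical daily sales detail: (date as YYYY-MM-DD, product, units sold).
daily_detail = [
    ("2024-01-05", "widget", 10),
    ("2024-01-20", "widget", 15),
    ("2024-02-03", "widget", 7),
    ("2024-02-03", "gadget", 4),
]


def summarize_by_month(rows):
    """Roll daily detail up to a coarser monthly level."""
    totals = defaultdict(int)
    for day, product, units in rows:
        month = day[:7]                     # keep only "YYYY-MM"
        totals[(month, product)] += units
    return dict(totals)


monthly_summary = summarize_by_month(daily_detail)
```

A query about monthly totals can be answered from the smaller summary, while a query about a specific day still has to go to the detailed level, which is why keeping the lowest level of detail costs more storage.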
Defining Features - 12
Data Granularity - II
• Examples of data granularity in a typical data warehouse.
Data Warehouse and Data Marts - 01
• A data mart is a logical subset of the complete data
warehouse.
Data Warehouse and Data Marts - 02
• Writing in a leading trade magazine in 1998, Bill Inmon
stated, “The single most important issue facing the IT
manager this year is whether to build the data warehouse
first or the data mart first.”

• Basic questions to address before deciding to build a
data warehouse:
• Top-down or bottom-up approach?
• Enterprise-wide or departmental?
• Which first—data warehouse or data mart?
• Build pilot or go with a full-fledged implementation?
• Dependent or independent data marts?
Data Warehouse and Data Marts - 03
Top-Down Approach - Build Data Warehouse First
• The advantages of this approach are:
• A truly corporate effort, an enterprise view of data
• Inherently architected—not a union of disparate data marts
• Single, central storage of data about the content
• Centralized rules and control
• May see quick results if implemented with iterations
• The disadvantages are:
• Takes longer to build even with an iterative method
• High exposure/risk to failure
• Needs high level of cross-functional skills
• High outlay without proof of concept
Data Warehouse and Data Marts - 04
Bottom-Up Approach - Build Data Marts First
• The advantages of this approach are:
• Faster and easier implementation of manageable pieces
• Favorable return on investment and proof of concept
• Less risk of failure
• Inherently incremental; can schedule important data marts first
• Allows project team to learn and grow
• The disadvantages are:
• Each data mart has its own narrow view of data
• Permeates redundant data in every data mart
• Perpetuates inconsistent and irreconcilable data
• Proliferates unmanageable interfaces
Data Warehouse and Data Marts - 05
Practical Approach
• Although the top-down and the bottom-up approaches
each have their own advantages and drawbacks, a
compromise approach accommodating both views appears
to be practical. The key points of this approach are:
• You first plan at the enterprise level. You gather
requirements at the overall level. You establish the
architecture for the complete warehouse.
• Then you determine the data content for each supermart.
• Supermarts are carefully architected data marts. You
implement these supermarts, one at a time.
Data Warehouse and Data Marts - 06
Practical Approach (Cont.)
• Before implementation, you make sure to standardize the data in
terms of data types, field lengths, precision, and semantics.
This avoids the spread of disparate data across several data
marts.
• A data warehouse, therefore, is a conformed union of all data
marts.
• The steps in this practical approach are as follows:
• Plan and define requirements at the overall corporate level
• Create a surrounding architecture for a complete warehouse
• Conform and standardize the data content
• Implement the data warehouse as a series of supermarts, one at a
time
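The idea of a warehouse as a conformed union of data marts can be sketched as follows. It assumes two hypothetical supermarts (sales and inventory) that both use the same agreed product keys. The keys, names, and figures are invented for illustration.

```python
# Conformed dimension: one agreed set of product keys and names (hypothetical).
product_dim = {"P1": "Widget", "P2": "Gadget"}

# Two supermarts built at different times, both keyed by the conformed product key.
sales_mart = {"P1": 120, "P2": 80}        # units sold
inventory_mart = {"P1": 30, "P2": 200}    # units on hand


def warehouse_view():
    """Combine the marts through the shared dimension into one enterprise view."""
    return {
        name: {"sold": sales_mart[key], "on_hand": inventory_mart[key]}
        for key, name in product_dim.items()
    }


view = warehouse_view()
```

Because both marts were built against the same standardized keys, combining them needs no reconciliation step. With independently defined keys, this union would require a mapping between the marts first.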
BUILDING BLOCKS OF
DATA WAREHOUSE
Data Warehouse Building Blocks
Overview Of The Components
• Each data warehouse is put together with the same building
blocks. The essential difference for each organization is in
the way these building blocks are arranged. The variation is
in the manner in which some of the blocks are made stronger
than others in the architecture.

• Components:
• Source Data Component
• Data Staging Component
• Data Storage Component
• Information Storage Component
• Information Delivery Component
• Metadata Component
• Management and Control Component
Source Data Component - 01
• Source data coming into the data warehouse may be
grouped into four broad categories:

• Production Data: This data comes from the various
operational systems of the enterprise itself. The great
challenge is to standardize and transform the disparate
data that is coming from different production sources.
• Internal Data: In every organization, users keep their
“private” spreadsheets, documents, customer profiles, and
sometimes even departmental databases. This is the
internal data, parts of which could be useful in a data
warehouse.
Source Data Component - 02
• Archived Data: A data warehouse keeps historical
snapshots of data. You essentially need historical data for
analysis over time. For getting historical information, you
look into your archived data sets.

• External Data: Most executives depend on data from
external sources for a high percentage of the information
they use. In order to spot industry trends and compare
performance against other organizations, you need data
from external sources.
Data Staging Component - 01
• After you have extracted data from various operational
systems and from external sources, you have to prepare
the data for storing in the data warehouse.
• The extracted data coming from several disparate sources
needs to be changed, converted, and made ready in a
format that is suitable to be stored for querying and
analysis.
• Three major functions need to be performed for getting
the data ready.
• You have to extract the data, transform the data, and then
load the data into the data warehouse storage.
Data Staging Component - 02
• Let us understand what happens in data staging:
• Data Extraction: This function has to deal with numerous data
sources. You have to employ the appropriate technique for each data
source. Source data may be from different source machines in diverse
data formats.

• Data Transformation: Data for a data warehouse comes from many
disparate sources. First, you clean the data extracted from each
source. Standardization of data elements forms a large part of data
transformation. Semantic standardization is another major task. You
resolve synonyms and homonyms. When two or more terms from
different source systems mean the same thing, you resolve the
synonyms. When a single term means many different things in
different source systems, you resolve the homonym. Sorting and
merging of data takes place on a large scale in the data staging area.
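Resolving synonyms and homonyms can be sketched with two lookup tables. The source system names (`sales`, `hr`, `crm`) and all field names below are hypothetical, chosen only to show the two cases.

```python
# Synonyms: different source terms that mean the same thing map to one target name.
SYNONYMS = {"client_id": "customer_id", "cust_no": "customer_id"}

# Homonyms: the same term means different things depending on the source system,
# so the target name depends on the pair (source, term).
HOMONYMS = {
    ("sales", "status"): "order_status",
    ("hr", "status"): "employment_status",
}


def resolve(source, record):
    """Rename the fields of one source record to warehouse-wide names."""
    resolved = {}
    for term, value in record.items():
        if (source, term) in HOMONYMS:
            term = HOMONYMS[(source, term)]
        else:
            term = SYNONYMS.get(term, term)
        resolved[term] = value
    return resolved


sales_row = resolve("sales", {"client_id": 7, "status": "shipped"})
hr_row = resolve("hr", {"status": "active"})
crm_row = resolve("crm", {"cust_no": 7})
```

Here `client_id` and `cust_no` collapse into a single `customer_id`, while the ambiguous `status` is split into two distinct names depending on where it came from.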
Data Staging Component - 03
• Data Loading: Two distinct groups of tasks form the data loading
function. When you complete the design and construction of the data
warehouse and go live for the first time, you do the initial loading of
the data into the data warehouse storage. The initial load moves large
volumes of data, using up substantial amounts of time. As the data
warehouse starts functioning, you continue to extract the changes to
the source data, transform the data revisions, and feed the
incremental data revisions on an ongoing basis.
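The two groups of loading tasks can be sketched with a single routine driven by a high-water mark. This is one common way to detect changes, not necessarily the method the lecture has in mind. The `modified` counter, the `state` dictionary, and the sample rows are assumptions.

```python
# Hypothetical source rows keyed by id, each with a last-modified counter.
source = {
    1: {"name": "Ali", "modified": 1},
    2: {"name": "Sara", "modified": 2},
}
warehouse = []                  # append-only target
state = {"last_loaded": 0}      # high-water mark of the most recent load


def load_changes():
    """Load every source row modified since the previous load; return the count."""
    changed = [
        {"id": key, **row}
        for key, row in source.items()
        if row["modified"] > state["last_loaded"]
    ]
    warehouse.extend(changed)
    if changed:
        state["last_loaded"] = max(row["modified"] for row in changed)
    return len(changed)


initial_count = load_changes()                    # initial load: everything
source[2] = {"name": "Sara K.", "modified": 3}    # a later change in the source
source[3] = {"name": "Omar", "modified": 4}
incremental_count = load_changes()                # incremental load: changes only
```

The first call moves every row because nothing has been loaded yet. The second call moves only the revised and the new row, and the earlier version of row 2 stays in the warehouse as history.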
Data Storage Component
• The data repositories for the operational systems typically contain
only the current data. Also, these data repositories contain the data
structured in highly normalized formats for fast and efficient
processing.

• The data storage for the data warehouse is a separate repository
which stores large volumes of historical data for analysis. You have to
keep the data in the data warehouse in structures suitable for
analysis, and not for quick retrieval of individual pieces of information.
• The data warehouses are “read-only” data repositories.
• The data warehouse must be open to different tools. Most of the
data warehouses employ relational database management systems.
Many of the data warehouses also employ multidimensional database
management systems.
Information Delivery Component - 01
• In order to provide information to the wide community of data
warehouse users, the information delivery component includes
different methods of information delivery.
Information Delivery Component - 02
• Ad hoc reports are predefined reports primarily meant for novice and
casual users.
• Provision for complex queries, multidimensional (MD) analysis, and
statistical analysis caters to the needs of the business analysts and
power users.
• Information fed into Executive Information Systems (EIS) is meant for
senior executives and high-level managers.
• Some data warehouses also provide data to data-mining applications.
• Data-mining applications are knowledge discovery systems where the
mining algorithms help you discover trends and patterns from the
usage of your data.
MetaData Component
• Metadata in a data warehouse is similar to the data dictionary or the
data catalog in a database management system.
• The data dictionary contains data about the data in the database.
• Similarly, the metadata component is the data about the data in the
data warehouse.

• Types of Metadata
• Operational Metadata: Operational metadata contains all information about
the operational data sources.
• Extraction and Transformation Metadata: Contains data about the extraction
of data from the source systems.
• End-User Metadata: The end-user metadata is the navigational map of the
data warehouse. It enables the end-users to find information from the data
warehouse.
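The three types of metadata can be sketched as a small catalog. Every table, column, and system name below is invented for illustration, and the `find` and `lineage` helpers are hypothetical.

```python
# Hypothetical metadata entries, grouped by the three types listed above.
metadata = {
    "operational": {
        "orders_db.ORD_HDR": {"platform": "mainframe", "format": "EBCDIC"},
    },
    "extraction_transformation": {
        "sales_fact.amount": {
            "source": "orders_db.ORD_HDR.ORD_AMT",
            "rule": "convert cents to dollars",
            "refresh": "daily",
        },
    },
    "end_user": {
        "Sales Amount": {"table": "sales_fact", "column": "amount"},
    },
}


def find(business_term):
    """Use end-user metadata as a navigational map from a business term to its location."""
    entry = metadata["end_user"][business_term]
    return f'{entry["table"]}.{entry["column"]}'


def lineage(business_term):
    """Trace a business term back to its operational source via transformation metadata."""
    return metadata["extraction_transformation"][find(business_term)]["source"]
```

For example, `find("Sales Amount")` returns `sales_fact.amount`, and `lineage("Sales Amount")` traces that column back to `orders_db.ORD_HDR.ORD_AMT`.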
Management and Control Component
• This component of the data warehouse architecture sits on top of all
the other components.
• The management and control component coordinates the services
and activities within the data warehouse.
• This component controls the data transformation and the data transfer
into the data warehouse storage.
• On the other hand, it moderates the information delivery to the users.
• It works with the database management systems and enables data to
be properly stored in the repositories.
• It monitors the movement of data into the staging area and from there
into the data warehouse storage itself.
• The management and control component interacts with the metadata
component to perform the management and control functions.
