Data Warehousing Fundamentals

DWM MOD 1

Uploaded by

zain.fataks110
Module 1: Data Warehousing Fundamentals
Content
• Introduction to Data Warehouse
• Features of Data Warehouse
• Need of Data Warehouse
• Applications of Data Warehouse
• Benefits of Data Warehouse
• Data Warehouse Architecture
• Data Warehouse versus Data Marts

Data Warehouse: Introduction
• A data warehouse is a relational database designed for query and
analysis rather than for transaction processing.
• Loosely described as any centralized data repository which can be
queried for business benefits.
• It is a database that stores information oriented to satisfy
decision-making requests. It is a group of decision support
technologies that aims to enable the knowledge worker
(executive, manager, and analyst) to make better and faster
decisions.
• It includes historical data derived from transaction data from single
or multiple sources.
• According to William H. Inmon, “A data warehouse is a
subject-oriented, integrated, time-variant, and non-volatile
collection of data in support of management's decision making
process.”
Database vs. Data Warehouse
Database | Data Warehouse
Data is collected for multiple transactional purposes. | Data is collected on an extensive scale to perform analytics.
Databases provide real-time data. | Data warehouses store data to be accessed for big analytical queries.
OLTP is an online database modifying system, for example, an ATM. | A data warehouse is an example of an OLAP system, or an online database query answering system.
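The contrast in the table can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `account` and `withdrawal` tables and the `withdraw` helper are hypothetical, and a real warehouse would live in a separate store from the operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE withdrawal (account_id INTEGER, day TEXT, amount REAL)")
conn.execute("INSERT INTO account VALUES (1, 500.0)")

def withdraw(account_id, day, amount):
    """OLTP-style work: a short transaction that modifies current data."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO withdrawal VALUES (?, ?, ?)",
            (account_id, day, amount),
        )

withdraw(1, "2024-01-05", 100.0)
withdraw(1, "2024-02-10", 50.0)

# OLAP-style work: a read-only query that summarizes history by month.
monthly_totals = conn.execute(
    "SELECT substr(day, 1, 7) AS month, SUM(amount) "
    "FROM withdrawal GROUP BY month ORDER BY month"
).fetchall()
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The ATM-like `withdraw` call touches one account at a time, while the analytical query scans the accumulated history.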
Features of Data Warehouse
1. Subject Oriented
• A data warehouse focuses on the modeling and analysis of data for
decision-makers. Therefore, data warehouses typically provide a
concise and straightforward view around a particular subject, such
as customer, product, or sales, instead of the global organization's
ongoing operations.
• This is done by excluding data that are not useful concerning the
subject and including all data needed by the users to understand the
subject.
2. Integrated
• A data warehouse integrates various heterogeneous data
sources like RDBMS, flat files, and online transaction records.
• Data cleaning and integration must be performed during
data warehousing to ensure consistency in naming
conventions, encoding structures, attribute measures, etc.,
among different data sources.
• Data warehouse systems ingest data from practically all
important operational systems to power analytics with a
broad and complete picture of all enterprise data plus other
data sources beyond the scope of the enterprise.
• Ultimately a lot of heterogeneous data needs to be integrated
and combined.
• Data across separate sources needs to be aligned,
harmonized, and standardized. The need for data cleansing
and data quality control is significant.
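As a rough sketch of what harmonizing encoding structures and measures can look like, the snippet below maps two differently encoded sources onto one representation. The source names, field names, and code tables are invented for illustration.

```python
# Hypothetical records from two source systems that encode the same facts differently.
crm_rows = [{"name": "Ann Lee", "gender": "F", "balance_usd": 120.0}]
billing_rows = [{"name": "Bob Roy", "gender": 1, "balance_cents": 4550}]

# One agreed encoding for the warehouse (an assumption for this example).
GENDER_CODES = {"F": "female", "M": "male", 0: "female", 1: "male"}

def from_crm(row):
    return {
        "name": row["name"],
        "gender": GENDER_CODES[row["gender"]],
        "balance_usd": row["balance_usd"],
    }

def from_billing(row):
    return {
        "name": row["name"],
        "gender": GENDER_CODES[row["gender"]],
        "balance_usd": row["balance_cents"] / 100,  # convert cents to dollars
    }

integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

After integration every row uses the same gender encoding and the same unit of measure, whatever its origin.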
3. Time Variant
• Historical information is kept in a data warehouse. For
example, one can retrieve data from 3 months, 6 months, 12
months, or even further back from a data warehouse.
• This contrasts with a transaction system, where often only the
most current data is kept.
• Every key structure in the data warehouse contains, either
implicitly or explicitly, a time element.
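One way to picture a key that carries a time element is a history table keyed by both the subject and an effective date. The product, dates, and prices below are made up; this is a sketch, not a prescribed design.

```python
from datetime import date

# Hypothetical price history: the key is (product, effective_date),
# so every row carries an explicit time element.
price_history = {
    ("widget", date(2024, 1, 1)): 9.99,
    ("widget", date(2024, 4, 1)): 10.49,
    ("widget", date(2024, 7, 1)): 10.99,
}

def price_as_of(product, when):
    """Return the price in effect on `when`, or None if no row applies yet."""
    candidates = [d for (p, d) in price_history if p == product and d <= when]
    if not candidates:
        return None
    return price_history[(product, max(candidates))]
```

A transaction system would typically store only the latest price; here older values stay queryable, for example `price_as_of("widget", date(2024, 5, 15))`.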
4. Non-volatile
• The data warehouse is a physically separate data storage, which is
transformed from the source operational RDBMS.
• The operational updates of data do not occur in the data
warehouse, i.e., update, insert, and delete operations are not
performed.
• It usually requires only two procedures in data accessing: initial
loading of data and access to data.
• Therefore, the data warehouse does not require transaction
processing, recovery, and concurrency capabilities, which allows for
substantial speedup of data retrieval.
• Non-volatile means that once data has entered the warehouse, it
should not change.
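The two procedures named above, loading and access, can be illustrated with a toy append-only table. The class name and its methods are hypothetical and exist only to show the idea.

```python
class AppendOnlyTable:
    """A toy warehouse table that supports only loading and reading."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies, so later changes to the source objects do not leak in.
        self._rows.extend(dict(r) for r in rows)

    def read(self):
        # Readers get copies, so they cannot modify the stored rows.
        return [dict(r) for r in self._rows]

table = AppendOnlyTable()
source = [{"order_id": 1, "status": "shipped"}]
table.load(source)
source[0]["status"] = "returned"  # an operational update in the source system
snapshot = table.read()           # the warehouse still holds the loaded value
```

There is deliberately no update or delete method: a change in the source would arrive as a new row in a later load.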
Need of Data Warehouse
1. Business users: Business users require a data warehouse to view
summarized data from the past. Since these people are
non-technical, the data may be presented to them in an
elementary form.
2. Store historical data: A data warehouse is required to store
time-variant data from the past. This data is then used
for various purposes.
3. Make strategic decisions: Some strategies may depend
on the data in the data warehouse, so the data warehouse
contributes to making strategic decisions.
4. Data consistency and quality: By bringing data from
different sources to a common place, the user can effectively
bring uniformity and consistency to the data.
5. Fast response time: A data warehouse has to be ready for
somewhat unexpected loads and types of queries, which
demands a significant degree of flexibility and quick response
time.
Applications of Data Warehouse
1. Airline
2. Banking
3. Healthcare
4. Public sector
5. Investment and Insurance sector
6. Retail Chain
7. Telecommunication
8. Hospitality Industry
Benefits of Data Warehouse
1. Delivers enhanced business intelligence
2. Saves time
3. Enhances data quality and consistency
4. Generates high Return on Investment (ROI)
5. Provides competitive advantages
6. Improves decision making process
7. Enables organizations to forecast with confidence
8. Streamlines flow of information
Approaches to Build Data Warehouse
1. Top-Down Approach
• This is the big-picture approach to building
the overall, massive, enterprise-wide data
warehouse. There is no collection of
fragmented islands of information here.
• The data warehouse is large and
well-integrated.
• This approach, on the other hand, would
take longer to build and has a higher
failure rate.
• This approach could be dangerous if you
do not have experienced professionals on
your team.
• It will also be difficult to sell this approach
to senior management and sponsors.
• They are unlikely to see results soon
enough.
Advantages
• Represents a data view from the perspective of the enterprise.
• Data about the content is stored in a single, central location
• Centralized control and rules

Disadvantages
• Even with an iterative strategy, building takes longer.
• High failure risk/exposure
• Requires a high level of cross-functional expertise
• Expenses are high without proof of concept.
2. Bottom-Up Approach
• You create departmental data marts
one by one using this bottom-up
method.
• To figure out which data marts to
build first, you'd create a priority list.
• The most serious disadvantage of
this method is data fragmentation.
• Each data mart will be blind to the
organization's overarching
requirements.
Advantages:
• Implementation of small portions is faster and easier.
• Favorable return on investment and proof of concept.
• There is a lower chance of failure.
• Inherently incremental; significant data marts can be

Disadvantages:
• Each data mart has its own perspective on information.
• Every data mart is flooded with redundant information.
• Increases the number of unmanageable interfaces.
Data Warehouse Architecture
• The GUI (graphical user interface) is used to interact with
users for data entry in the front-end component.
• The database management system, such as Oracle, Informix,
or Microsoft SQL Server, is part of the data storage
component.
• The display component consists of the user's screens and
reports. The connectivity component consists of data
interfaces and network software.
• Architecture is the proper arrangement of the components in
the most efficient manner possible, based on the information
requirements and the structure of our firm.
• You put together a data warehouse using software and
hardware. You organize these building blocks in a specific way
to meet the needs of your company for optimal benefit.
Source Data Component
• Source data coming into the data warehouse may be grouped into
four broad categories:
Production Data
• This category of data comes from the various operational systems of
the enterprise.
• These normally include financial systems, manufacturing systems,
systems along the supply chain, and customer relationship
management systems.
• While dealing with this data, you come across many variations in the
data formats.
• data resides on different hardware platforms
• data is supported by different database systems and operating systems
Internal Data
• In every organization, users keep their “private” spreadsheets,
documents, customer profiles, and sometimes even departmental
databases, which could be useful in a data warehouse.

Archived Data
• Operational systems are primarily intended to run the current
business.
• In every operational system, you periodically take the old data and
store it in archived files.
• Some data is archived after a year. Sometimes data is left in the
operational system databases for as long as five years.
External Data
• Most executives depend on data from external sources for a high
percentage of the information they use.
• They use statistics relating to their industry produced by external
agencies and national statistical offices.
• They use market share data of competitors.
• They use standard values of financial indicators for their business to
check on their performance.
Data Staging Component
• After you have extracted data from various operational
systems and from external sources, prepare the data for
storing in the data warehouse.
• Extracted data coming from several disparate sources needs
to be changed, converted, and made ready in a format that is
suitable to be stored for querying and analysis.
• Three major functions need to be performed for getting the
data ready.
• Extract the data, Transform the data, and then Load the data into
the data warehouse storage.
• Data staging provides a place and an area with a set of
functions to clean, change, combine, convert, deduplicate, and
prepare source data for storage and use in the data
warehouse.
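The three staging functions can be sketched as three small Python functions chained together. The source lists, field names, and transformation rules here are assumptions chosen to keep the example short.

```python
def extract(sources):
    """Pull raw rows from every source into one list."""
    return [row for source in sources for row in source]

def transform(rows):
    """Make rows uniform: trim and upper-case store codes, convert amounts to float."""
    return [
        {"store": r["store"].strip().upper(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, warehouse):
    """Append the prepared rows to warehouse storage and report how many were added."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
pos_system = [{"store": " ny01 ", "amount": "19.50"}]  # hypothetical point-of-sale feed
web_orders = [{"store": "NY01", "amount": 5}]          # hypothetical web order feed
loaded = load(transform(extract([pos_system, web_orders])), warehouse)
```

Keeping the three steps separate makes it possible to rerun or replace one of them without touching the others.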
Data Extraction
• This function has to deal with numerous data sources.
• You have to employ the appropriate technique for each data source.
• Source data may be from different source machines in diverse data
formats.
• Part of the source data may be in relational database systems.
• Some data may be on other legacy network and hierarchical data
models.
• Many data sources may still be in flat files.
• You may want to include data from spreadsheets and local
departmental data sets.
• Data extraction may become quite complex.
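A minimal sketch of using a different technique per source, assuming one flat file and one relational table. The file contents, table, and function names are hypothetical; only standard-library calls are used.

```python
import csv
import io
import sqlite3

def extract_flat_file(text):
    """Read a comma-separated flat file with a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def extract_relational(conn, query):
    """Run a query against a relational source and return rows as dicts."""
    cursor = conn.execute(query)
    columns = [c[0] for c in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

flat_file = "sku,qty\nA1,3\nB2,7\n"
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (sku TEXT, qty INTEGER)")
db.execute("INSERT INTO orders VALUES ('C3', 2)")

extracted = extract_flat_file(flat_file) + extract_relational(
    db, "SELECT sku, qty FROM orders"
)
```

Note that the flat file yields `qty` as text while the database yields an integer, which is one reason a transformation step is needed afterwards.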
Data Transformation
• In every system implementation, data conversion is an important function.
• For example:
• When you implement an operational system such as a magazine
subscription application, you have to initially populate your database
with data from the prior system's records.
• You may be converting from a manual system, or moving from a
file-oriented system to relational database tables.
• Another factor in the data warehouse is that the data feed is not just an
initial load.
• You will have to continue to pick up the ongoing changes from the source
systems.
• Any transformation tasks you set up for the initial load will be adapted for
the ongoing revisions as well.
• You perform a number of individual tasks as part of data
transformation.
• First, you clean the data extracted from each source.
• Cleaning may just be
• correction of misspellings
• resolution of conflicts between state codes and zip codes in the
source data
• providing default values for missing data elements
• elimination of duplicates when you bring in the same data from
multiple source systems.
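The cleaning tasks listed above might look like the following in code. The correction table, the default value, and the choice of duplicate key are illustrative assumptions; real rules would come from the business.

```python
# Hypothetical correction table and default value.
SPELLING_FIXES = {"Calfornia": "California", "Nwe York": "New York"}
DEFAULT_COUNTRY = "US"

def clean(rows):
    cleaned, seen = [], set()
    for row in rows:
        state = SPELLING_FIXES.get(row["state"], row["state"])  # correct misspellings
        country = row.get("country") or DEFAULT_COUNTRY          # default for missing values
        key = (row["customer_id"], state, country)
        if key in seen:                                          # drop duplicates across sources
            continue
        seen.add(key)
        cleaned.append(
            {"customer_id": row["customer_id"], "state": state, "country": country}
        )
    return cleaned

rows = [
    {"customer_id": 7, "state": "Calfornia", "country": None},
    {"customer_id": 7, "state": "California", "country": "US"},
    {"customer_id": 8, "state": "Texas"},
]
result = clean(rows)
```

Duplicates are detected after correction and defaulting, so two differently spelled copies of the same customer collapse into one row.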
• Standardization of data elements forms a large part of data
transformation.
• You standardize the data types and field lengths for the same data
elements retrieved from the various sources.
• Semantic standardization is another major task.
• You resolve synonyms and homonyms.
• When two or more terms from different source systems mean
the same thing, you resolve the synonyms. (ex: emp_id)
• When a single term means many different things in different
source systems, you resolve the homonym (ex: street no)
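A small sketch of resolving synonyms and homonyms with a per-source mapping. The source systems (`hr`, `payroll`), the target field names, and the six-character ID length are all assumptions made for the example.

```python
# Hypothetical per-source field mappings onto one warehouse vocabulary.
# Synonyms: emp_id and staff_no both become employee_id.
# Homonym: street_no means a house number in one source and a street name in the other.
FIELD_MAP = {
    "hr": {"emp_id": "employee_id", "street_no": "house_number"},
    "payroll": {"staff_no": "employee_id", "street_no": "street_name"},
}
ID_LENGTH = 6  # assumed standard field length for employee_id

def standardize(source, record):
    out = {}
    for field, value in record.items():
        target = FIELD_MAP[source][field]
        if target == "employee_id":
            value = str(value).zfill(ID_LENGTH)  # one data type and length for all sources
        out[target] = value
    return out

a = standardize("hr", {"emp_id": 42, "street_no": "221B"})
b = standardize("payroll", {"staff_no": "000042", "street_no": "Baker Street"})
```

Both records end up with the same `employee_id` value and type, and the ambiguous `street_no` is split into two clearly named fields.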
Data Transformation:
• Example: A grocery sale operational system keeps the unit sales
and revenue amounts by individual transactions at the check-out
counter at each store.
• But in the data warehouse, it may not be necessary to keep the
data at this detailed level.
• You may want to summarize the totals by product at each store for
a given day and keep the summary totals of the sale units and
revenue in the data warehouse storage.
• In such cases, the data transformation function would include
appropriate summarization.
• When the data transformation function ends, a collection of
integrated data that is cleaned, standardized, and summarized is
ready to load into each dataset in your data warehouse.
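The grocery example can be sketched as a roll-up from individual check-out transactions to one total per store, day, and product. The sample rows are invented.

```python
from collections import defaultdict

# Hypothetical check-out transactions from the operational system.
transactions = [
    {"store": "S1", "day": "2024-03-01", "product": "milk", "units": 2, "revenue": 3.00},
    {"store": "S1", "day": "2024-03-01", "product": "milk", "units": 1, "revenue": 1.50},
    {"store": "S1", "day": "2024-03-01", "product": "bread", "units": 1, "revenue": 2.25},
]

def summarize(rows):
    """Total units and revenue per (store, day, product)."""
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for r in rows:
        key = (r["store"], r["day"], r["product"])
        totals[key]["units"] += r["units"]
        totals[key]["revenue"] += r["revenue"]
    return dict(totals)

summary = summarize(transactions)
```

Three transaction rows become two summary rows; the warehouse stores the totals rather than every individual sale.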
Data Loading
• When you complete the design and construction of the data
warehouse and go live for the first time:
• Do the initial loading of the data into the data warehouse storage
• The initial load moves large volumes of data using up substantial
amounts of time.
• As the data warehouse starts functioning:
• continue to extract the changes to the source data
• transform the data revisions
• feed the incremental data revisions on an ongoing basis
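One common way to pick up ongoing changes is to remember the timestamp of the last load (a "watermark") and take only rows changed after it. This is one possible technique, not necessarily the one the slides have in mind; the `updated_at` field and function name are assumptions.

```python
def incremental_load(source_rows, warehouse, last_loaded):
    """Append only rows changed after `last_loaded`; return the new watermark."""
    changes = [r for r in source_rows if r["updated_at"] > last_loaded]
    warehouse.extend(dict(r) for r in changes)
    return max((r["updated_at"] for r in changes), default=last_loaded)

# ISO-formatted date strings compare correctly as plain text.
source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
]
warehouse = []
watermark = incremental_load(source, warehouse, "")         # initial load takes everything
source.append({"id": 3, "updated_at": "2024-01-05"})
watermark = incremental_load(source, warehouse, watermark)  # later runs take only new changes
```

The initial load moves the full volume once; every later run moves only the revisions made since the previous watermark.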
Data Storage Component
• The data storage for the data warehouse is a separate repository.
• The operational systems of your enterprise support the
day-to-day operations for applications storing current data.
• Also, these data repositories contain the data structured in highly
normalized formats for fast and efficient processing.
• In contrast, in the data repository for a data warehouse, you need to
keep large volumes of historical data for analysis.
• Further, keep the data in the data warehouse in structures suitable
for analysis, and not for quick retrieval of individual pieces of information.
• Therefore, the data storage for the data warehouse is kept
separate from the data storage for operational systems.
Information Delivery Component
• Who are the users that need information from the data
warehouse?
• The novice user comes to the data warehouse with no training and,
therefore, needs prefabricated reports and preset queries.
• The casual user needs information once in a while, not regularly.
This type of user also needs prepackaged information.
• The business analyst looks for ability to do complex analysis using
the information in the data warehouse.
• The power user wants to be able to navigate throughout the data
warehouse, pick up interesting data, format his or her own
queries, drill through the data layers, and create custom reports
and ad hoc queries.
• Ad hoc reports are predefined reports primarily meant for novice
and casual users.
• Provision for complex queries, multidimensional (MD) analysis,
and statistical analysis caters to the needs of the business analysts
and power users.
• Information fed into Executive Information Systems (EIS) is meant
for senior executives and high-level managers.
• Some data warehouses also provide data to data-mining
applications.
Metadata Component (data about the data)
• Metadata in a data warehouse is similar to the data dictionary or the
data catalog in a database management system.
• In the data dictionary, you keep the information about the logical
data structures, the information about the files and addresses, the
information about the indexes, and so on.
• The metadata component is the data about the data in the data
warehouse.
• Metadata in a data warehouse falls into three major categories:
• Operational metadata
• Extraction and transformation metadata
• End-user metadata
Types of Metadata
Operational Metadata
• Data for the data warehouse comes from several operational
systems of the enterprise.
• These source systems contain different data structures.
• The data elements selected for the data warehouse have various
field lengths and data types.
• In selecting data from the source systems for the data warehouse,
you split records, combine parts of records from different source
files, and deal with multiple coding schemes and field lengths.
• When you deliver information to the end-users, you must be able to
tie that back to the original source data sets.
• Operational metadata contain all of this information about the
operational data sources.
Extraction and Transformation Metadata
• Extraction and transformation metadata contain data about the
extraction of data from the source systems, namely,
• extraction frequencies
• extraction methods
• business rules for the data extraction.
• Also, this category of metadata contains information about all the
data transformations that take place in the data staging area.
End-User Metadata
• The end-user metadata is the navigational map of the data
warehouse.
• It enables the end-users to find information from the data
warehouse.
• The end-user metadata allows the end-users to use their own
business terminology and look for information in those ways in
which they normally think of the business.
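As a sketch of end-user metadata acting as a navigational map, the lookup below translates business terms into physical locations. The table and column names are hypothetical.

```python
# Hypothetical end-user metadata: business terms mapped to physical columns.
BUSINESS_TERMS = {
    "revenue": ("F_SALES", "REV_AMT"),
    "units sold": ("F_SALES", "UNIT_QTY"),
    "customer": ("D_CUSTOMER", "CUST_NAME"),
}

def locate(term):
    """Translate a business term into 'TABLE.COLUMN', ignoring case and extra spaces."""
    table, column = BUSINESS_TERMS[" ".join(term.lower().split())]
    return f"{table}.{column}"
```

A user can ask for "Units Sold" in their own words and be pointed to the right column without knowing the physical naming scheme.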
Example: Metadata of a Book Store
• Name of the books
• Summary of the books
• The date of publication
• Who the publisher is
• How you can find the books
• Whether the book is available or not
• Cost of the books, etc.
Data Warehouse Metadata
• Physical Name
• Logical Name
• Type: Fact, Dimension, Bridge
• Role: OLTP
• DBMS: DB2, MS SQL Server, Oracle
• Create date
• Update date
• Update cycle: Weekly
• Planned Archival: Every six months
• Responsible User
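The attributes listed above could be held in a simple record type. The sketch below uses a Python dataclass; the attribute names follow the list, while the sample values (`F_SALES`, `dw_admin`, the dates) are invented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TableMetadata:
    physical_name: str
    logical_name: str
    table_type: str        # "Fact", "Dimension", or "Bridge"
    role: str
    dbms: str
    create_date: date
    update_date: date
    update_cycle: str
    planned_archival: str
    responsible_user: str

# Hypothetical metadata entry for a sales fact table.
sales_fact = TableMetadata(
    physical_name="F_SALES",
    logical_name="Sales",
    table_type="Fact",
    role="OLTP",
    dbms="DB2",
    create_date=date(2024, 1, 1),
    update_date=date(2024, 1, 8),
    update_cycle="Weekly",
    planned_archival="Every six months",
    responsible_user="dw_admin",
)
```

Holding metadata in a structured form like this makes it queryable, for instance to list every table with a weekly update cycle.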
Management and Control Component
• This component of the data warehouse architecture sits on top of all
the other components.
• The management and control component coordinates the services
and activities within the data warehouse.
• This component controls the data transformation and the data
transfer into the data warehouse storage.
• It moderates the information delivery to the users.
• It works with the database management systems and enables data
to be properly stored in the repositories.
• It monitors the movement of data into the staging area and from
there into the data warehouse storage itself.
Data Warehouse vs Data Marts
• A data mart is a small, single-subject data warehouse subset
that provides decision support to a small group of people.
• Data Marts can serve as a test vehicle for companies exploring
the potential benefits of Data Warehouses.
• Data Marts address local or departmental problems, while a
Data Warehouse involves a company-wide effort to support
decision making at all levels in the organization.
Data Warehouse | Data Mart
A data warehouse is a centralized system. | A data mart is a decentralized system.
Light denormalization takes place. | Heavy denormalization takes place.
A data warehouse follows a top-down model. | A data mart follows a bottom-up model.
Building a warehouse is difficult. | Building a mart is easy.
A data warehouse is flexible. | A data mart is not flexible.
A data warehouse is data-oriented in nature. | A data mart is project-oriented in nature.
A data warehouse has a long life. | A data mart has a shorter life than a warehouse.
Data is held in detailed form. | Data is held in summarized form.
A data warehouse is vast in size. | A data mart is smaller in size than a warehouse.
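To make the detail-versus-summary distinction concrete, the sketch below derives a single-department data mart from warehouse rows. The departments, regions, and amounts are invented, and a real data mart would usually be a separate database rather than a dictionary.

```python
# Hypothetical detailed rows held in the enterprise warehouse.
warehouse_sales = [
    {"dept": "toys", "region": "east", "day": "2024-03-01", "amount": 20.0},
    {"dept": "toys", "region": "west", "day": "2024-03-01", "amount": 30.0},
    {"dept": "books", "region": "east", "day": "2024-03-01", "amount": 12.0},
]

def build_data_mart(rows, dept):
    """Keep one department's rows and summarize them by day."""
    mart = {}
    for r in rows:
        if r["dept"] == dept:
            mart[r["day"]] = mart.get(r["day"], 0.0) + r["amount"]
    return mart

toys_mart = build_data_mart(warehouse_sales, "toys")
```

The mart is smaller and summarized, and it covers only the one subject its department cares about.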
Thank You!!