Datascience Unit 02 1

A Data Warehouse (DW) is a relational database designed for query and analysis, integrating historical data from various sources to support decision-making. Key characteristics include being subject-oriented, integrated, time-variant, and non-volatile, while data marts serve specific business needs. The document also discusses data warehouse architecture, OLAP operations, and data preprocessing techniques essential for effective data mining and analysis.

Uploaded by

vravindraravindra927

DATA WAREHOUSE

UNIT-02
WHAT IS A DATA WAREHOUSE?

• A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data from single or multiple sources.
• A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on supporting decision-makers in data modeling and analysis. A Data Warehouse holds data specific to the entire organization, not only to a particular group of users. It is not used for daily operations and transaction processing.
CHARACTERISTICS OF DATA WAREHOUSE
SUBJECT-ORIENTED

• A data warehouse focuses on the modeling and analysis of data for decision-makers. Therefore, data warehouses typically provide a concise and straightforward view of a particular subject, such as customer, product, or sales, instead of the organization's ongoing global operations. This is done by excluding data that are not useful for the subject and including all data the users need to understand it.
INTEGRATED

• A data warehouse integrates various heterogeneous data sources, such as RDBMSs, flat files, and online transaction records. This requires data cleaning and integration during data warehousing to ensure consistency in naming conventions, attribute types, etc., among the different data sources.
TIME-VARIANT

• Historical information is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even earlier from a data warehouse. This differs from a transaction system, where often only the most current data is kept.
NON-VOLATILE

• The data warehouse is a physically separate data store, populated with data transformed from the source operational RDBMS. Operational updates of data do not occur in the data warehouse, i.e., the update, insert, and delete operations of transaction processing are not performed. It usually requires only two procedures for data access: initial loading of data and access to data. Therefore, the DW does not require transaction processing, recovery, and concurrency control capabilities, which allows for a substantial speedup of data retrieval. Non-volatile means that once data has entered the warehouse, it should not change.
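The time-variant and non-volatile properties can be illustrated with a minimal Python sketch, assuming an in-memory list stands in for warehouse storage. The class and field names (`SalesSnapshot`, `AppendOnlyWarehouse`, `as_of`) are hypothetical. The only write operation is a load that appends dated snapshots, and reads can look back to any earlier point in time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded record cannot be modified in place
class SalesSnapshot:
    load_date: date      # time element: when this snapshot was loaded
    product: str
    units_sold: int


class AppendOnlyWarehouse:
    """Hypothetical store whose only write operation is loading new data."""

    def __init__(self) -> None:
        self._rows: list[SalesSnapshot] = []

    def load(self, rows: list[SalesSnapshot]) -> None:
        # New snapshots are appended; existing history is never updated or deleted.
        self._rows.extend(rows)

    def as_of(self, when: date) -> list[SalesSnapshot]:
        # Read access: return the latest snapshot loaded on or before `when`.
        eligible = [r.load_date for r in self._rows if r.load_date <= when]
        if not eligible:
            return []
        latest = max(eligible)
        return [r for r in self._rows if r.load_date == latest]


wh = AppendOnlyWarehouse()
wh.load([SalesSnapshot(date(2024, 1, 31), "Mobile", 100)])
wh.load([SalesSnapshot(date(2024, 4, 30), "Mobile", 140)])
print(wh.as_of(date(2024, 3, 1))[0].units_sold)  # 100
print(wh.as_of(date(2024, 5, 1))[0].units_sold)  # 140
```

Because the January snapshot is never overwritten, a query dated March still sees the figures as they stood then.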
GOALS OF DATA WAREHOUSING

• Support reporting as well as analysis.
• Maintain the organization's historical information.
• Serve as the foundation for decision-making.
DATA MART:

• A data mart is a subset of a data warehouse that is segmented to serve specific business needs, typically with a focus on a particular purpose.
• For example, if we consider an honours college as a data warehouse, then its departments are:
• Geography Dept.
• History Dept.
• English Dept.
• Bengali Dept.
• CSE Dept.
• Each of these departments is a data mart of the data warehouse.
• There may be distinct data marts for finance, sales, production, or marketing. Each data mart comprises the software, hardware, programs, and data related to a particular department inside the firm.
• Although each of these data marts is unique, they may all be coordinated.
• The data marts of different departments differ from one another.
• A small warehouse designed for a single department is called a data mart.
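Following the college example, a data mart can be sketched as a filtered subset of warehouse rows. This is a minimal Python illustration, assuming warehouse rows are `(department, student, marks)` tuples; the data and the `build_data_mart` helper are hypothetical.

```python
# Hypothetical warehouse rows: (department, student, marks)
warehouse = [
    ("CSE", "Asha", 82),
    ("History", "Ravi", 74),
    ("CSE", "Meera", 91),
    ("English", "John", 68),
]


def build_data_mart(rows, department):
    """Return the subset of warehouse rows that belongs to one department."""
    return [row for row in rows if row[0] == department]


cse_mart = build_data_mart(warehouse, "CSE")
print(cse_mart)  # [('CSE', 'Asha', 82), ('CSE', 'Meera', 91)]
```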
METADATA

• Metadata is the directory that lists the contents of your data warehouse.
• Forms of metadata
• Metadata in a data warehouse is of three main types:
• Operational metadata
• Extraction and transformation metadata
• End-user metadata
1. OPERATIONAL METADATA

• Data for the data warehouse originates from several operational systems inside the organization, so operational metadata encompasses all relevant information about these operational data sources.
2. EXTRACTION AND TRANSFORMATION METADATA

• It includes details of data extraction from the source systems and of every data transformation that has ever occurred.
3. END-USER METADATA (INDEX)

• End-user metadata is the navigational map of the data warehouse. It enables end users to locate information in the data warehouse.
DATA WAREHOUSE ARCHITECTURE
BACK-END TOOLS AND UTILITIES

• They are used to feed data from operational databases or other external sources into the data warehouse (bottom tier).
• In addition to data extraction, cleansing, and transformation (e.g., merging comparable data from several sources into a unified format), these tools and utilities carry out load and refresh operations to update the data warehouse.
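The extract, clean, transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration, assuming two hypothetical operational sources (`source_a`, `source_b`) that name their fields differently; the field names and the rule "drop records with no customer" are assumptions, not part of any specific tool.

```python
# Two hypothetical operational sources with different conventions.
source_a = [{"cust_name": " Asha ", "amount_inr": "1200"}]  # strings, padded names
source_b = [{"customer": "RAVI", "amount": 850.0}, {"customer": None, "amount": 10.0}]


def extract():
    """Extract: read raw records from each source, tagging their origin."""
    for rec in source_a:
        yield "A", rec
    for rec in source_b:
        yield "B", rec


def transform(origin, rec):
    """Clean and map one record to the unified format, or return None to drop it."""
    if origin == "A":
        name, amount = rec["cust_name"], rec["amount_inr"]
    else:
        name, amount = rec["customer"], rec["amount"]
    if not name:  # cleansing: drop records with a missing customer
        return None
    return {"customer": name.strip().title(), "amount": float(amount)}


def load(warehouse):
    """Load: append the transformed records into the warehouse table."""
    for origin, rec in extract():
        row = transform(origin, rec)
        if row is not None:
            warehouse.append(row)
    return warehouse


print(load([]))
# [{'customer': 'Asha', 'amount': 1200.0}, {'customer': 'Ravi', 'amount': 850.0}]
```

A refresh would simply run `load` again on newly extracted records; real back-end tools add scheduling, logging, and error handling on top of this basic flow.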
BOTTOM TIER

• The warehouse database server is most often a relational database system.
• A data warehouse can be created by connecting many data marts.
• This tier also has a metadata repository that holds data about the contents of the data warehouse.
• Additionally, there are integrators and monitors on
THE MIDDLE TIER

• The middle tier is an OLAP server.
• It is typically implemented using either ROLAP or MOLAP.
• A ROLAP (relational OLAP) server works on top of relational databases.
• A MOLAP (multidimensional OLAP) server is a special-purpose server that is specifically designed for multidimensional data and operations.
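The ROLAP/MOLAP contrast can be sketched in Python, assuming the standard-library `sqlite3` module stands in for a relational server and a plain nested list stands in for a multidimensional array store. The table, column, and variable names are hypothetical. Both approaches answer the same question (total units per quarter); they differ in how the data is stored.

```python
import sqlite3

facts = [("Q1", "Mobile", 100), ("Q1", "Modem", 40), ("Q2", "Mobile", 120)]

# ROLAP style: facts stay in a relational table; aggregates are computed with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, item TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap = dict(conn.execute(
    "SELECT quarter, SUM(units) FROM sales GROUP BY quarter ORDER BY quarter"
).fetchall())

# MOLAP style: values live in a dense array indexed by dimension positions.
quarters, items = ["Q1", "Q2"], ["Mobile", "Modem"]
cube = [[0] * len(items) for _ in quarters]  # cube[q][i] = units
for q, i, units in facts:
    cube[quarters.index(q)][items.index(i)] += units
molap = {q: sum(cube[k]) for k, q in enumerate(quarters)}

print(rolap)  # {'Q1': 140, 'Q2': 120}
print(molap)  # {'Q1': 140, 'Q2': 120}
```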
THE TOP TIER

• It is a front-end client layer that includes query and reporting, analysis, and data mining tools.
NEED FOR DATA WAREHOUSE
BENEFITS OF DATA WAREHOUSE

• Understand business trends and make better forecasting decisions.
• Data warehouses are designed to perform well with enormous amounts of data.
• The structure of data warehouses is easier for end users to navigate, understand, and query.
• Queries that would be complex in highly normalized databases can be easier to build and maintain in data warehouses.
• Data warehousing is an efficient way to manage demand for large amounts of information from many users.
• Data warehousing provides the capability to analyze large amounts of historical data.
MULTIDIMENSIONAL DATA MODEL

• A multidimensional data model is a method used to organize data in a database, allowing users to view and analyze data from multiple perspectives. This model is particularly useful in data warehousing and OLAP (Online Analytical Processing), where it enables users to quickly retrieve answers to complex analytical queries.
KEY CONCEPTS

• Data Cubes: The core concept of the multidimensional data model is the data cube, which allows data to be modeled and viewed in multiple dimensions. Each dimension represents a different perspective or entity, such as time, location, or product. For example, a sales data warehouse might have dimensions for time, item, and location.
• Dimensions and Facts:
• Dimensions: These are attributes that describe the measures, such as time, location, or product. They are typically stored in dimension tables.
• Facts: These are numerical measures that represent the central theme of the data, such as sales or revenue. Facts are stored in fact tables, which contain the measures along with keys to the related dimension tables.
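A data cube with the dimensions time, item, and location can be sketched in Python as a mapping from a coordinate (one value per dimension) to an aggregated measure. This is a minimal in-memory illustration; the fact records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records: (time, item, location, units_sold)
facts = [
    ("Q1", "Mobile", "Toronto", 605),
    ("Q1", "Modem", "Toronto", 825),
    ("Q1", "Mobile", "Vancouver", 14),
    ("Q2", "Mobile", "Toronto", 680),
]

# Each cube cell is addressed by (time, item, location) and holds the summed measure.
cube = defaultdict(int)
for time, item, location, units in facts:
    cube[(time, item, location)] += units

print(cube[("Q1", "Modem", "Toronto")])    # 825
print(cube[("Q2", "Modem", "Vancouver")])  # 0 (an empty cell)
```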
MULTIDIMENSIONAL DATA MODEL SCHEMAS

I. Star Schema: Each dimension in a star schema is represented by only one dimension table. This dimension table contains the set of attributes. The following diagram shows the sales data of a company with respect to four dimensions, namely time, item, branch, and location.
There is a fact table at the center. It contains the keys to each of the four dimensions.
The fact table also contains the measures, namely dollars sold and units sold.
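The star schema described above can be sketched with Python's standard `sqlite3` module. The table and column names (`time_dim`, `sales_fact`, and so on) and the sample rows are illustrative assumptions; the point is the shape: one central fact table holding a key to each of the four dimension tables plus the two measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One denormalized table per dimension (column names are illustrative).
    CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
    -- Central fact table: a foreign key to each dimension plus the measures.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim,
        item_key INTEGER REFERENCES item_dim,
        branch_key INTEGER REFERENCES branch_dim,
        location_key INTEGER REFERENCES location_dim,
        dollars_sold REAL,
        units_sold INTEGER
    );
    INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO item_dim VALUES (1, 'Mobile', 'BrandX'), (2, 'Modem', 'BrandY');
    INSERT INTO branch_dim VALUES (1, 'B1');
    INSERT INTO location_dim VALUES (1, 'Toronto', 'Canada');
    INSERT INTO sales_fact VALUES (1, 1, 1, 1, 5000.0, 50), (1, 2, 1, 1, 800.0, 20),
                                  (2, 1, 1, 1, 6000.0, 60);
""")

# A typical star query: join the fact table to the dimension it needs, then aggregate.
rows = conn.execute("""
    SELECT t.quarter, SUM(f.units_sold)
    FROM sales_fact f JOIN time_dim t ON f.time_key = t.time_key
    GROUP BY t.quarter ORDER BY t.quarter
""").fetchall()
print(rows)  # [('Q1', 70), ('Q2', 60)]
```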
II. Snowflake Schema: Some dimension tables in a snowflake schema are normalized. Normalization splits the data into additional tables; this is the main difference from a star schema, whose dimension tables are not normalized. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely item and supplier. The item dimension table now contains the attributes item_key, item_name, type, brand, and supplier_key. The supplier_key is linked to the supplier dimension table, which contains the attributes supplier_key and supplier_type.
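The snowflaked item dimension can be sketched with Python's `sqlite3` module, using the item and supplier attributes listed above. The sample rows and the reduced fact table (only `item_key` and `units_sold`) are hypothetical simplifications. Note that reaching `supplier_type` from the fact table now takes one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked item dimension: supplier attributes moved to their own table.
    CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
    CREATE TABLE item (
        item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT, brand TEXT,
        supplier_key INTEGER REFERENCES supplier
    );
    CREATE TABLE sales_fact (item_key INTEGER REFERENCES item, units_sold INTEGER);
    INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
    INSERT INTO item VALUES (1, 'Mobile', 'phone', 'BrandX', 1),
                            (2, 'Modem', 'network', 'BrandY', 1),
                            (3, 'Charger', 'accessory', 'BrandX', 2);
    INSERT INTO sales_fact VALUES (1, 50), (2, 20), (3, 5);
""")

# fact -> item -> supplier: the normalized dimension costs one additional join.
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales_fact f
    JOIN item i ON f.item_key = i.item_key
    JOIN supplier s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type ORDER BY s.supplier_type
""").fetchall()
print(rows)  # [('retail', 5), ('wholesale', 70)]
```

The trade-off is the usual one for normalization: supplier details are stored once instead of being repeated in every item row, at the cost of an extra join at query time.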
III. Fact Constellation Schema: A fact constellation has multiple fact tables. It is also known as a galaxy schema. The sales fact table is the same as that in the star schema. The shipping fact table has five dimensions, namely item_key, time_key, shipper_key, from_location, and to_location. The shipping fact table also contains two measures, namely dollars sold and units sold. Dimension tables can also be shared between fact tables. For example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.
OLAP OPERATIONS

• Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on multidimensional data.
• Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
• Roll-up: Roll-up performs aggregation on a data cube in either of the following ways:
• By climbing up a concept hierarchy for a dimension
• By dimension reduction
Here, roll-up is performed by climbing up the concept hierarchy for the dimension location. The concept hierarchy is "street < city < province < country".
On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
The data is now grouped into countries rather than cities.
When roll-up is performed by dimension reduction, one or more dimensions are removed from the data cube.
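Roll-up by climbing a concept hierarchy can be sketched in Python as summing child values into their parent level. This is a minimal illustration, assuming a hypothetical city-to-country mapping and made-up sales figures.

```python
from collections import defaultdict

# Hypothetical sales at the "city" level of the location hierarchy.
sales_by_city = {"Toronto": 605, "Vancouver": 14, "Chicago": 440, "New York": 1087}

# Assumed concept hierarchy mapping one level up: city -> country.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada",
                   "Chicago": "USA", "New York": "USA"}


def roll_up(measures, parent_of):
    """Aggregate measures one level up a concept hierarchy by summing the children."""
    totals = defaultdict(int)
    for member, value in measures.items():
        totals[parent_of[member]] += value
    return dict(totals)


print(roll_up(sales_by_city, city_to_country))  # {'Canada': 619, 'USA': 1527}
```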
• Drill-down: Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
• Here, drill-down is performed by stepping down the concept hierarchy for the dimension time.
• The concept hierarchy is "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• When drill-down is performed by introducing a new dimension, one or more dimensions are added to the data cube.
• It navigates from less detailed data to more detailed data.
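Drill-down only works if finer-grained data is available. This minimal Python sketch assumes the base data is stored at the month level with a hypothetical month-to-quarter mapping; drilling down from a quarter returns the month-level values that make up that quarter's total.

```python
# Hypothetical base data stored at the month level of the time hierarchy.
sales_by_month = {"Jan": 150, "Feb": 200, "Mar": 255, "Apr": 300, "May": 180, "Jun": 200}
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}


def drill_down(quarter):
    """Return the month-level values that make up one quarter-level cell."""
    return {m: v for m, v in sales_by_month.items() if month_to_quarter[m] == quarter}


# The summarized (quarter) view a user starts from ...
q1_total = sum(drill_down("Q1").values())
print(q1_total)          # 605
# ... and the more detailed (month) view after drilling down.
print(drill_down("Q1"))  # {'Jan': 150, 'Feb': 200, 'Mar': 255}
```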
• Slice: The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.
• Here, slice is performed for the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube by fixing a single value for that one dimension.
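A slice can be sketched in Python on a cube stored as a dictionary keyed by `(time, item, location)`. This minimal illustration assumes hypothetical cell values; `slice_cube` keeps only the cells where the chosen axis equals the given value and drops that axis from the key.

```python
# Hypothetical 3-D cube: (time, item, location) -> units_sold
cube = {
    ("Q1", "Mobile", "Toronto"): 605, ("Q1", "Modem", "Toronto"): 825,
    ("Q2", "Mobile", "Toronto"): 680, ("Q1", "Mobile", "Vancouver"): 14,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it, giving a lower-dimensional sub-cube."""
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}


q1 = slice_cube(cube, axis=0, value="Q1")  # axis 0 is time
print(q1)
# {('Mobile', 'Toronto'): 605, ('Modem', 'Toronto'): 825, ('Mobile', 'Vancouver'): 14}
```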
• Dice: Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.
• The dice operation on the cube is based on the following selection criteria, which involve three dimensions:
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
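The dice criteria listed above can be applied to a dictionary-based cube in Python. This is a minimal sketch with hypothetical cell values; unlike slice, dice keeps every dimension and restricts several of them to sets of allowed values. The `criteria` argument maps an axis index (0 = time, 1 = item, 2 = location) to its allowed values, which is an assumed convention for this example.

```python
# Hypothetical 3-D cube: (time, item, location) -> units_sold
cube = {
    ("Q1", "Mobile", "Toronto"): 605,
    ("Q2", "Modem", "Vancouver"): 512,
    ("Q3", "Mobile", "Toronto"): 680,
    ("Q1", "Phone", "Toronto"): 400,
    ("Q2", "Mobile", "Chicago"): 440,
}


def dice_cube(cube, criteria):
    """Keep cells whose value on every constrained axis is in the allowed set."""
    return {key: v for key, v in cube.items()
            if all(key[axis] in allowed for axis, allowed in criteria.items())}


sub = dice_cube(cube, {0: {"Q1", "Q2"},
                       1: {"Mobile", "Modem"},
                       2: {"Toronto", "Vancouver"}})
print(sub)  # {('Q1', 'Mobile', 'Toronto'): 605, ('Q2', 'Modem', 'Vancouver'): 512}
```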
• Pivot: The pivot operation is also known as rotation. It
rotates the data axes in view in order to provide an
alternative presentation of data. Consider the following
diagram that shows the pivot operation.
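A pivot on a two-dimensional view can be sketched in Python as a transpose: former columns become rows and former rows become columns. This minimal illustration assumes a hypothetical item-by-location view stored as a dictionary of row lists.

```python
# Hypothetical 2-D view: rows are items, columns are locations.
locations = ["Toronto", "Vancouver"]
view = {"Mobile": [605, 14], "Modem": [825, 400]}


def pivot(row_labels, col_labels, rows):
    """Rotate the view: each former column becomes a row keyed by its column label."""
    columns = list(zip(*(rows[r] for r in row_labels)))
    return {c: list(vals) for c, vals in zip(col_labels, columns)}


rotated = pivot(["Mobile", "Modem"], locations, view)
print(rotated)  # {'Toronto': [605, 825], 'Vancouver': [14, 400]}
```

After the rotation, each row is a location and the columns follow the original item order (Mobile, Modem); no values change, only the presentation.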
ADVANTAGES OF DATA CUBES:

• Multi-dimensional analysis: Data cubes enable multi-dimensional analysis of business data, allowing users to view data from different perspectives and levels of detail.
• Interactivity: Data cubes provide interactive access to large amounts of data, allowing
users to easily navigate and manipulate the data to support their analysis.
• Speed and efficiency: Data cubes are optimized for OLAP analysis, enabling fast and
efficient querying and aggregation of data.
• Data aggregation: Data cubes support complex calculations and data aggregation,
enabling users to quickly and easily summarize large amounts of data.
• Improved decision-making: Data cubes provide a clear and comprehensive view of
business data, enabling improved decision-making and business intelligence.
• Accessibility: Data cubes can be accessed from a variety of devices and platforms,
making it easy for users to access and analyze business data from anywhere.
DISADVANTAGES OF DATA CUBES:

• Complexity: OLAP systems can be complex to set up and maintain, requiring specialized technical
expertise.
• Data size limitations: OLAP systems can struggle with very large data sets and may require
extensive data aggregation or summarization.
• Performance issues: OLAP systems can be slow when dealing with large amounts of data,
especially when running complex queries or calculations.
• Data integrity: Inconsistent data definitions and data quality issues can affect the accuracy of
OLAP analysis.
• Cost: OLAP technology can be expensive, especially for enterprise-level solutions, due to the need
for specialized hardware and software.
• Inflexibility: OLAP systems may not easily accommodate changing business needs and may
require significant effort to modify or extend.
DATA PREPROCESSING IN DATA MINING

• Data preprocessing is an important step in data mining. In this step, raw data is converted into an understandable format and made ready for further analysis. The goal is to improve data quality and make the data fit for specific tasks.
• Tasks in Data Preprocessing
DATA CLEANING

• Data cleaning helps us remove inaccurate, incomplete, and incorrect data from the dataset. Some techniques used in data cleaning are:
• Binning: This method smooths noisy data. The sorted data is divided into bins of equal size, and a smoothing method (for example, replacing each value by its bin mean or by the nearest bin boundary) is then applied to the values in each bin.
• Regression: Regression functions are used to smooth the data. Regression can be linear (one independent variable) or multiple (several independent variables).
• Clustering: Similar data values are grouped into clusters, and values that fall outside the clusters can be identified as outliers.
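Smoothing by bin means, one binning variant, can be sketched in Python. This minimal illustration assumes equal-frequency bins (each bin holds `bin_size` sorted values) and hypothetical price data; other variants smooth by bin medians or bin boundaries instead.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort, split into bins of bin_size values, and replace each value by its bin mean."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        bin_ = data[start:start + bin_size]
        mean = sum(bin_) / len(bin_)
        smoothed.extend([round(mean, 2)] * len(bin_))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```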
DATA INTEGRATION

• Data integration is the process of combining data from multiple sources (databases, spreadsheets, text files) into a single dataset, creating a single, consistent view of the data. Major problems during data integration are schema integration (integrating sets of data collected from various sources), entity identification (identifying the same entities across different databases), and detecting and resolving data value conflicts.
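The three integration problems can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the two sources (`crm`, `billing`), the rule that billing number "C001" identifies the same customer as CRM id 1, and the conflict policy of keeping the first value seen.

```python
# Two hypothetical sources describing the same customer with different schemas.
crm = [{"cust_id": 1, "name": "Asha Rao", "city": "Pune"}]
billing = [{"customer_no": "C001", "full_name": "ASHA RAO", "total": 1200.0}]


def from_crm(r):
    # Schema integration: map source column names onto one target schema.
    return {"customer_id": r["cust_id"], "name": r["name"], "city": r["city"]}


def from_billing(r):
    # Entity identification (assumed rule): "C001" is the same entity as cust_id 1.
    return {"customer_id": int(r["customer_no"][1:]), "name": r["full_name"],
            "total": r["total"]}


def integrate(*sources):
    merged = {}
    for rows, mapper in sources:
        for r in rows:
            rec = mapper(r)
            target = merged.setdefault(rec["customer_id"], {})
            for k, v in rec.items():
                # Value conflict resolution (assumed policy): keep the first value seen.
                target.setdefault(k, v)
    return merged


print(integrate((crm, from_crm), (billing, from_billing)))
# {1: {'customer_id': 1, 'name': 'Asha Rao', 'city': 'Pune', 'total': 1200.0}}
```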
DATA TRANSFORMATION

• In this step, the format or structure of the data is changed to make it suitable for the mining process. Methods for data transformation are:
• Normalization: Scaling data so that it falls within a specific smaller range (e.g., -1.0 to 1.0).
• Discretization: Dividing continuous data into intervals, which helps reduce the data size.
• Attribute Construction: New attributes are derived from the given attributes to help the mining process.
• Concept Hierarchy Generation: Attributes are converted from a lower level to a higher level in the hierarchy.
• Aggregation: Summary data is computed and stored (for example, daily sales aggregated into monthly totals), so that mining works on a smaller, summarized dataset.
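Normalization and discretization can be sketched in Python. This minimal illustration uses min-max normalization, which maps each value v to (v - min) / (max - min) * (new_max - new_min) + new_min, with the target range -1.0 to 1.0 mentioned above, and equal-width discretization into k intervals. The income values are hypothetical.

```python
def min_max_normalize(values, new_min=-1.0, new_max=1.0):
    """Min-max normalization: linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero for constant data
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def equal_width_bins(values, k):
    """Discretization: assign each value to one of k equal-width intervals (0 .. k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in interval k, so it is clamped into the last one.
    return [min(int((v - lo) / width), k - 1) if width else 0 for v in values]


incomes = [20000, 35000, 50000, 80000]
print(min_max_normalize(incomes))    # [-1.0, -0.5, 0.0, 1.0]
print(equal_width_bins(incomes, 3))  # [0, 0, 1, 2]
```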
DATA REDUCTION

• Data reduction increases storage efficiency and reduces the volume of data stored, making analysis easier while producing almost the same analytical results. Analysis becomes harder when working with huge amounts of data, so reduction is used to overcome this problem.
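Sampling is one common data reduction technique. This is a minimal Python sketch, assuming simple random sampling without replacement over hypothetical records; it uses `random.Random(seed).sample` from the standard library so the sample is reproducible.

```python
import random


def simple_random_sample(rows, n, seed=None):
    """Data reduction by sampling: keep n rows chosen uniformly without replacement."""
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    return rng.sample(rows, n)


population = list(range(1, 10001))  # hypothetical 10,000 records
sample = simple_random_sample(population, 500, seed=42)
print(len(sample))                          # 500
print(sum(population) / len(population))    # 5000.5
print(round(sum(sample) / len(sample)))     # close to 5000; depends on the seed
```

Analyses run on the 500-row sample use 5% of the data, and summary statistics such as the mean typically land close to those of the full dataset.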
