Data Warehouse Schema

A data warehouse schema defines how database entities like fact and dimension tables are organized logically. Common schemas include star, snowflake, galaxy, and star cluster. A star schema has a central fact table surrounded by dimension tables, allowing for easy querying of aggregated data. A snowflake schema fully normalizes dimension tables from a star schema into a hierarchical structure, reducing data redundancy but making queries more complex. A galaxy schema has multiple fact tables sharing the same dimension tables.

Uploaded by

maheshpolasani


In a data warehouse, a schema defines how the database entities (fact tables and dimension tables) are organized and how they are logically associated with one another.

Here are the different types of Schemas in DW:

1. Star Schema
2. SnowFlake Schema
3. Galaxy Schema
4. Star Cluster Schema

#1) Star Schema

This is the simplest schema in a data warehouse and one of the most commonly used. A fact table sits in the center, surrounded by multiple dimension tables, so the layout resembles a star.

Each dimension table has a one-to-many relationship with the fact table. Every row in the fact table refers to one row in each dimension table through a foreign key, while a single dimension row can be referenced by many fact rows.

Because of this, navigation among the tables is easy when querying aggregated data, and end-users can readily understand the structure. For the same reason, most Business Intelligence (BI) tools support the Star schema model well.

While designing star schemas the dimension tables are purposefully de-normalized. They are wide with many
attributes to store the contextual data for better analysis and reporting.
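
The structure described above can be sketched as table definitions. This is a minimal illustration only: it assumes SQLite driven through Python's standard sqlite3 module so that it runs without a database server, and the table names (product_dim, store_dim, date_dim, sales_fact), columns, and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; every table, column, and row here is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    brand      TEXT NOT NULL      -- kept inline: de-normalized on purpose
);
CREATE TABLE store_dim (
    store_id INTEGER PRIMARY KEY,
    city     TEXT NOT NULL,
    state    TEXT NOT NULL        -- repeated for every store in the same state
);
CREATE TABLE date_dim (
    date_id INTEGER PRIMARY KEY,
    day     INTEGER NOT NULL,
    month   INTEGER NOT NULL,
    year    INTEGER NOT NULL
);
-- The fact table holds one foreign key per dimension plus the measure.
CREATE TABLE sales_fact (
    product_id  INTEGER NOT NULL REFERENCES product_dim(product_id),
    store_id    INTEGER NOT NULL REFERENCES store_dim(store_id),
    date_id     INTEGER NOT NULL REFERENCES date_dim(date_id),
    sales_units INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO product_dim VALUES (1, 'Novels', 'Acme Books')")
conn.execute("INSERT INTO store_dim VALUES (1, 'Kochi', 'Kerala')")
conn.execute("INSERT INTO date_dim VALUES (20180115, 15, 1, 2018)")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 20180115, 40)")

# A fact row that points at a missing dimension row is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1, 20180115, 5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(orphan_rejected, fact_rows)
```

The wide dimension tables carry descriptive attributes such as brand and state directly, which is what keeps later queries down to one join per dimension.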

Benefits Of Star Schema

• Queries use very simple joins to retrieve the data, which improves query performance.
• It is simple to retrieve data for reporting at any point in time and for any period.

Disadvantages Of Star Schema

• If requirements change frequently, modifying and reusing the existing star schema is not recommended in the long run.
• Data redundancy is higher because the tables are not divided hierarchically.

An example of a Star Schema is given below.

Querying A Star Schema

An end-user can request a report using Business Intelligence tools. Internally, each request is processed as a chain of SELECT queries, and the performance of these queries affects the report execution time.

Using the Star schema example above, if a business user wants to know how many Novels and DVDs were sold in the state of Kerala in January 2018, you can run the following query on the Star schema tables:

SELECT pdim.Name              AS Product_Name,
       SUM(sfact.sales_units) AS Quantity_Sold
FROM   Product pdim,
       Sales   sfact,
       Store   sdim,
       Date    ddim
WHERE  sfact.product_id = pdim.product_id   -- join fact to Product dimension
  AND  sfact.store_id   = sdim.store_id     -- join fact to Store dimension
  AND  sfact.date_id    = ddim.date_id      -- join fact to Date dimension
  AND  sdim.state = 'Kerala'
  AND  ddim.month = 1
  AND  ddim.year  = 2018
  AND  pdim.Name IN ('Novels', 'DVDs')
GROUP BY pdim.Name;
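
To see how the filters in this query behave, here is a small runnable sketch. It assumes SQLite through Python's sqlite3 module, uses hypothetical table names with _dim and _fact suffixes, and loads a handful of made-up rows, so the totals are not the figures from the original report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE sales_fact  (product_id INTEGER, store_id INTEGER,
                          date_id INTEGER, sales_units INTEGER);

INSERT INTO product_dim VALUES (1, 'Novels'), (2, 'DVDs'), (3, 'Magazines');
INSERT INTO store_dim   VALUES (1, 'Kerala'), (2, 'Tamil Nadu');
INSERT INTO date_dim    VALUES (1, 1, 2018), (2, 2, 2018);
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 10),   -- Novels, Kerala, January 2018
    (1, 1, 1, 5),    -- Novels, Kerala, January 2018
    (2, 1, 1, 7),    -- DVDs, Kerala, January 2018
    (2, 2, 1, 100),  -- DVDs, Tamil Nadu: removed by the state filter
    (1, 1, 2, 100),  -- Novels, February: removed by the month filter
    (3, 1, 1, 100);  -- Magazines: removed by the product filter
""")

# Same shape as the star schema query: joins and filters all sit in WHERE.
query = """
SELECT pdim.name              AS product_name,
       SUM(sfact.sales_units) AS quantity_sold
FROM   product_dim pdim, sales_fact sfact, store_dim sdim, date_dim ddim
WHERE  sfact.product_id = pdim.product_id
  AND  sfact.store_id   = sdim.store_id
  AND  sfact.date_id    = ddim.date_id
  AND  sdim.state = 'Kerala'
  AND  ddim.month = 1
  AND  ddim.year  = 2018
  AND  pdim.name IN ('Novels', 'DVDs')
GROUP BY pdim.name
ORDER BY pdim.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('DVDs', 7), ('Novels', 15)]
```

Only the three Kerala rows from January 2018 for the two requested products survive the filters; each of the other rows is dropped by exactly one condition.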

#2) SnowFlake Schema

A star schema is the starting point for designing a SnowFlake schema. Snowflaking is the process of completely normalizing all the dimension tables of a star schema.

A fact table in the center, surrounded by multiple hierarchies of dimension tables, gives the model the shape of a snowflake. As in the star schema, every fact table row is associated with its dimension table rows through a foreign key reference.

When designing SnowFlake schemas, the dimension tables are purposefully normalized. A foreign key is added at each level of a dimension hierarchy to link it to its parent level. The complexity of a SnowFlake schema grows with the number of hierarchy levels in its dimension tables.
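
The normalization step can be illustrated for a single dimension. The sketch below is an assumption-laden example rather than the schema from this document: it uses SQLite through Python's sqlite3 module, and the tables store_star, state_dim, and store_dim with their sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: the state name is repeated on every store row.
CREATE TABLE store_star (store_id INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
INSERT INTO store_star VALUES (1, 'Kochi Central', 'Kerala'),
                              (2, 'Kollam East',   'Kerala'),
                              (3, 'Chennai North', 'Tamil Nadu');

-- SnowFlake form: the state attribute moves into its own table.
CREATE TABLE state_dim (state_id INTEGER PRIMARY KEY, state TEXT UNIQUE);
INSERT INTO state_dim (state)
SELECT DISTINCT state FROM store_star ORDER BY state;

-- The store table keeps only a foreign key to the parent level.
CREATE TABLE store_dim (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    state_id   INTEGER REFERENCES state_dim(state_id)
);
INSERT INTO store_dim
SELECT s.store_id, s.store_name, st.state_id
FROM store_star AS s
JOIN state_dim AS st ON st.state = s.state;
""")

# Joining the two levels reproduces the original wide rows.
rebuilt = conn.execute("""
    SELECT sd.store_id, sd.store_name, st.state
    FROM store_dim AS sd
    JOIN state_dim AS st ON sd.state_id = st.state_id
    ORDER BY sd.store_id
""").fetchall()
original = conn.execute("SELECT * FROM store_star ORDER BY store_id").fetchall()
state_count = conn.execute("SELECT COUNT(*) FROM state_dim").fetchone()[0]
print(rebuilt == original, state_count)
```

Each state name is now stored once, at the cost of one extra join whenever a query needs it.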

Benefits of SnowFlake Schema:

• Data redundancy is reduced by moving repeated attributes into new dimension tables.
• Compared with a star schema, the snowflaked dimension tables use less storage space.
• The snowflaked tables are easy to update and maintain.

Disadvantages of SnowFlake Schema:

• Because the dimension tables are normalized, the ETL system has to load a larger number of tables.
• Queries may need complex joins because of the additional tables, which can degrade query performance.

An example of a SnowFlake Schema is given below.

The dimension tables in the above SnowFlake diagram are normalized as follows:

• The Date dimension is normalized into Quarterly, Monthly, and Weekly tables, leaving foreign key ids in the Date table.
• The Store dimension is normalized to split out a State table.
• The Product dimension is normalized to split out a Brand table.
• In the Customer dimension, the attributes connected to the city are moved into a new City table, leaving a foreign key id in the Customer table.

In the same way, a single dimension can maintain multiple levels of hierarchy.

The hierarchy levels in the above diagram are linked as follows:

• Quarterly id, Monthly id, and Weekly id are the new surrogate keys created for the Date dimension hierarchies, and they have been added as foreign keys in the Date dimension table.
• State id is the new surrogate key created for the Store dimension hierarchy, and it has been added as a foreign key in the Store dimension table.
• Brand id is the new surrogate key created for the Product dimension hierarchy, and it has been added as a foreign key in the Product dimension table.
• City id is the new surrogate key created for the Customer dimension hierarchy, and it has been added as a foreign key in the Customer dimension table.

Querying A Snowflake Schema

SnowFlake schemas can produce the same kinds of reports for end-users as star schemas, but the queries are somewhat more complicated.

Using the SnowFlake schema example above, we will write the same query that was designed in the Star schema query example. That is, if a business user wants to know how many Novels and DVDs were sold in the state of Kerala in January 2018, you can run the following query on the SnowFlake schema tables.

SELECT pdim.Name              AS Product_Name,
       SUM(sfact.sales_units) AS Quantity_Sold
FROM   Sales sfact
INNER JOIN Product pdim ON sfact.product_id = pdim.product_id
INNER JOIN Store sdim   ON sfact.store_id = sdim.store_id
INNER JOIN State stdim  ON sdim.state_id = stdim.state_id   -- extra level: Store to State
INNER JOIN Date ddim    ON sfact.date_id = ddim.date_id
INNER JOIN Month mdim   ON ddim.month_id = mdim.month_id    -- extra level: Date to Month
WHERE stdim.state = 'Kerala'
  AND mdim.month = 1
  AND ddim.year  = 2018
  AND pdim.Name IN ('Novels', 'DVDs')
GROUP BY pdim.Name;

Product_Name    Quantity_Sold
Novels          12,702
DVDs            32,919
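
The SnowFlake query can be tried on a tiny data set as well. The following is a sketch under stated assumptions: SQLite through Python's sqlite3 module, hypothetical table names with _dim and _fact suffixes, and made-up sample rows, so the totals differ from the report figures shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE state_dim   (state_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY,
                          state_id INTEGER REFERENCES state_dim(state_id));
CREATE TABLE month_dim   (month_id INTEGER PRIMARY KEY, month INTEGER);
CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY,
                          month_id INTEGER REFERENCES month_dim(month_id),
                          year INTEGER);
CREATE TABLE sales_fact  (product_id INTEGER, store_id INTEGER,
                          date_id INTEGER, sales_units INTEGER);

INSERT INTO product_dim VALUES (1, 'Novels'), (2, 'DVDs'), (3, 'Magazines');
INSERT INTO state_dim   VALUES (1, 'Kerala'), (2, 'Tamil Nadu');
INSERT INTO store_dim   VALUES (1, 1), (2, 2);
INSERT INTO month_dim   VALUES (1, 1), (2, 2);
INSERT INTO date_dim    VALUES (1, 1, 2018), (2, 2, 2018);
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 10), (1, 1, 1, 5), (2, 1, 1, 7),      -- kept by all filters
    (2, 2, 1, 100), (1, 1, 2, 100), (3, 1, 1, 100); -- each removed by one filter
""")

# Two more joins than the star version: Store to State and Date to Month.
query = """
SELECT pdim.name              AS product_name,
       SUM(sfact.sales_units) AS quantity_sold
FROM   sales_fact AS sfact
INNER JOIN product_dim AS pdim  ON sfact.product_id = pdim.product_id
INNER JOIN store_dim   AS sdim  ON sfact.store_id = sdim.store_id
INNER JOIN state_dim   AS stdim ON sdim.state_id = stdim.state_id
INNER JOIN date_dim    AS ddim  ON sfact.date_id = ddim.date_id
INNER JOIN month_dim   AS mdim  ON ddim.month_id = mdim.month_id
WHERE stdim.state = 'Kerala'
  AND mdim.month = 1
  AND ddim.year  = 2018
  AND pdim.name IN ('Novels', 'DVDs')
GROUP BY pdim.name
ORDER BY pdim.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('DVDs', 7), ('Novels', 15)]
```

The report logic is unchanged; only the path from the fact table to the state and month attributes is longer.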
Points To Remember While Querying Star (or) SnowFlake Schema Tables

Any query can be designed with the structure below.

SELECT Clause:

• The attributes specified in the SELECT clause are shown in the query results.
• The SELECT clause also uses aggregate functions to compute aggregated values, so the non-aggregated attributes must be listed in a GROUP BY clause, which follows the WHERE clause.

FROM Clause:

• All the required fact tables and dimension tables have to be chosen as per the context.

WHERE Clause:

• Appropriate dimension attributes are mentioned in the WHERE clause by joining them with the fact table attributes. Surrogate keys from the dimension tables are joined with the respective foreign keys from the fact tables to fix the range of data to be queried. Refer to the star schema query example above to understand this. You can also filter data in the FROM clause itself if you are using inner/outer joins there, as in the SnowFlake schema example.
• Dimension attributes are also mentioned as constraints on data in the WHERE clause.
• By filtering the data with all the above steps, appropriate data is returned for the reports.

As per the business needs, you can add (or) remove facts, dimensions, attributes, and constraints in a star schema (or) SnowFlake schema query by following the above structure. You can also add sub-queries (or) merge different query results to generate data for any complex reports.
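
The note about filtering in the FROM clause deserves a short illustration, because the placement of a filter matters once outer joins are involved. This sketch assumes SQLite through Python's sqlite3 module and uses hypothetical tables and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim  (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE sales_fact (store_id INTEGER, sales_units INTEGER);
INSERT INTO store_dim  VALUES (1, 'Kerala'), (2, 'Tamil Nadu');
INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 8);
""")

# Filter written in the WHERE clause.
in_where = conn.execute("""
    SELECT SUM(f.sales_units)
    FROM sales_fact AS f
    INNER JOIN store_dim AS s ON f.store_id = s.store_id
    WHERE s.state = 'Kerala'
""").fetchone()[0]

# Same filter moved into the join condition of the FROM clause.
in_from = conn.execute("""
    SELECT SUM(f.sales_units)
    FROM sales_fact AS f
    INNER JOIN store_dim AS s
            ON f.store_id = s.store_id AND s.state = 'Kerala'
""").fetchone()[0]

# With a LEFT JOIN, the ON-clause filter no longer removes fact rows:
# unmatched fact rows are kept with NULL dimension columns.
left_join_total = conn.execute("""
    SELECT SUM(f.sales_units)
    FROM sales_fact AS f
    LEFT JOIN store_dim AS s
           ON f.store_id = s.store_id AND s.state = 'Kerala'
""").fetchone()[0]

print(in_where, in_from, left_join_total)  # 15 15 23
```

For inner joins the two placements are interchangeable, but with an outer join a constraint that is meant to restrict the report belongs in the WHERE clause.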

#3) Galaxy Schema

A galaxy schema is also known as a Fact Constellation Schema. In this schema, multiple fact tables share the same dimension tables. The arrangement of fact tables and dimension tables looks like a collection of stars, which gives the Galaxy schema model its name.

The shared dimensions in this model are known as Conformed dimensions.

This type of schema is used for sophisticated requirements and for aggregated fact tables that are too complex to be supported by the Star schema (or) SnowFlake schema. It is difficult to maintain because of its complexity.
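
A shared dimension is what lets measures from different fact tables appear side by side in one report. The sketch below assumes SQLite through Python's sqlite3 module; the second fact table (shipment_fact), the other table names, and the sample rows are hypothetical and are not taken from the Galaxy schema example in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension shared by two fact tables.
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (
    product_id  INTEGER REFERENCES product_dim(product_id),
    sales_units INTEGER
);
CREATE TABLE shipment_fact (
    product_id    INTEGER REFERENCES product_dim(product_id),
    shipped_units INTEGER
);
INSERT INTO product_dim   VALUES (1, 'Novels'), (2, 'DVDs');
INSERT INTO sales_fact    VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO shipment_fact VALUES (1, 20), (2, 4), (2, 6);
""")

# Aggregate each fact table on its own, then combine through the shared
# dimension. Joining the two fact tables directly would multiply rows.
rows = conn.execute("""
    SELECT p.name, s.sold, sh.shipped
    FROM product_dim AS p
    JOIN (SELECT product_id, SUM(sales_units) AS sold
          FROM sales_fact GROUP BY product_id) AS s
      ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(shipped_units) AS shipped
          FROM shipment_fact GROUP BY product_id) AS sh
      ON sh.product_id = p.product_id
    ORDER BY p.name
""").fetchall()
print(rows)  # [('DVDs', 7, 10), ('Novels', 15, 20)]
```

Because both fact tables use the same product keys, the two aggregates line up row for row without any mapping step.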

An example of Galaxy Schema is given below.

#4) Star Cluster Schema

A SnowFlake schema with many dimension tables may need more complex joins while querying, whereas a star schema with fewer dimension tables may have more redundancy. The star cluster schema combines features of these two schemas.

A star schema is the base for designing a star cluster schema. Only a few essential dimension tables from the star schema are snowflaked, which in turn forms a more stable schema structure.

An example of a Star Cluster Schema is given below.

Which Is Better Snowflake Schema Or Star Schema?

The data warehouse platform and the BI tools used in your DW system play a vital role in deciding which schema is suitable. Star and SnowFlake are the most frequently used schemas in DW.

The Star schema is preferred if the BI tools allow business users to interact with the table structures easily through simple queries. The SnowFlake schema is preferred when business users do not interact directly with the table structures, since its additional joins and more complex queries are harder for them to work with.

You can choose the SnowFlake schema either if you want to save some storage space or if your DW system has optimized tools to design this schema.

Star Schema Vs Snowflake Schema

Given below are the key differences between Star schema and SnowFlake schema.

S.No | Star Schema | SnowFlake Schema
-----|-------------|-----------------
1 | Data redundancy is more. | Data redundancy is less.
2 | Storage space for dimension tables is more. | Storage space for dimension tables is comparatively less.
3 | Contains de-normalized dimension tables. | Contains normalized dimension tables.
4 | Single fact table is surrounded by multiple dimension tables. | Single fact table is surrounded by multiple hierarchies of dimension tables.
5 | Queries use direct joins between fact and dimensions to fetch the data. | Queries use complex joins between fact and dimensions to fetch the data.
6 | Query execution time is less. | Query execution time is more.
7 | Anyone can easily understand and design the schema. | It is tough to understand and design the schema.
8 | Uses top down approach. | Uses bottom up approach.
