Data Warehouse Schema
In a data warehouse, a schema defines how the system's database entities (fact tables and dimension tables) are organized and how they are logically associated with one another. The common schema types are:
1. Star Schema
2. SnowFlake Schema
3. Galaxy Schema
4. Star Cluster Schema
In a star schema, each dimension table has a one-to-many relationship with the fact table. Every row in the fact table references its associated dimension table rows through foreign keys.
Because of this structure, navigating among the tables to query aggregated data is easy, and end-users can understand the model without much effort. This is why Business Intelligence (BI) tools widely support the Star schema model.
When designing star schemas, the dimension tables are deliberately denormalized. They are wide, with many attributes that store contextual data for better analysis and reporting.
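The star layout described above can be sketched as SQLite DDL run from Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the table names (fact_sales, dim_date, dim_store, dim_product, dim_customer) and all columns are hypothetical, chosen only to echo the Date, Store, Product, and Customer dimensions used in this article. Note how each dimension keeps its whole hierarchy (for example city, state, country) inline in one wide, denormalized table.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by denormalized dimensions.
# Table and column names are illustrative assumptions, not the article's diagram.
STAR_DDL = """
CREATE TABLE dim_date (
    date_id    INTEGER PRIMARY KEY,  -- surrogate key
    full_date  TEXT,
    month      INTEGER,
    month_name TEXT,
    quarter    INTEGER,
    year       INTEGER
);
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT,
    state      TEXT,                 -- hierarchy levels stored inline (denormalized)
    country    TEXT
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    brand        TEXT,
    category     TEXT
);
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    city          TEXT,
    state         TEXT
);
CREATE TABLE fact_sales (
    date_id       INTEGER REFERENCES dim_date(date_id),        -- one foreign key
    store_id      INTEGER REFERENCES dim_store(store_id),      -- per dimension
    product_id    INTEGER REFERENCES dim_product(product_id),
    customer_id   INTEGER REFERENCES dim_customer(customer_id),
    quantity_sold INTEGER,
    revenue       REAL
);
"""

def create_star_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical star schema tables on an open connection."""
    conn.executescript(STAR_DDL)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_star_schema(conn)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(tables)  # ['dim_customer', 'dim_date', 'dim_product', 'dim_store', 'fact_sales']
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries simple.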
An end-user can request a report through a Business Intelligence tool. The tool processes each such request internally by generating a chain of SELECT queries, so the performance of these queries directly affects the report's execution time.
In the Star schema example above, if a business user wants to know how many Novels and DVDs were sold in the state of Kerala in January 2018, you can apply the following query to the Star schema tables:
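A query of this kind can be sketched against a small in-memory SQLite database. The schema, store names, and sample rows below are hypothetical and made up for illustration; they do not come from the article's diagram. The query follows the star pattern: each dimension's surrogate key is joined to the matching foreign key in the fact table in the WHERE clause, dimension attributes act as constraints, and GROUP BY follows WHERE.

```python
import sqlite3

# Hypothetical minimal star schema with made-up rows (not the article's data).
SETUP = """
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (date_id INTEGER, store_id INTEGER,
                          product_id INTEGER, quantity_sold INTEGER);
INSERT INTO dim_date    VALUES (1, 2018, 1), (2, 2018, 2), (3, 2017, 1);
INSERT INTO dim_store   VALUES (1, 'Kochi Central', 'Kerala'), (2, 'Bengaluru MG Road', 'Karnataka');
INSERT INTO dim_product VALUES (1, 'Novels'), (2, 'DVDs'), (3, 'Pens');
INSERT INTO fact_sales  VALUES (1, 1, 1, 5), (1, 1, 1, 3), (1, 1, 2, 4),
                               (2, 1, 1, 10), (1, 2, 2, 7), (3, 1, 2, 6), (1, 1, 3, 9);
"""

# One join per dimension: surrogate key = fact table foreign key,
# then dimension attributes (state, year, month, product name) as constraints.
STAR_QUERY = """
SELECT p.product_name, SUM(f.quantity_sold) AS quantity_sold
FROM fact_sales f, dim_date d, dim_store s, dim_product p
WHERE f.date_id = d.date_id
  AND f.store_id = s.store_id
  AND f.product_id = p.product_id
  AND s.state = 'Kerala'
  AND d.year = 2018 AND d.month = 1
  AND p.product_name IN ('Novels', 'DVDs')
GROUP BY p.product_name
ORDER BY p.product_name
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the demo star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn

if __name__ == "__main__":
    print(build_demo_db().execute(STAR_QUERY).fetchall())  # [('DVDs', 4), ('Novels', 8)]
```

Rows from February, from 2017, from the Karnataka store, and for other products are excluded by the dimension constraints, so only the matching January 2018 Kerala sales are summed.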
In the SnowFlake schema model, a fact table sits in the center surrounded by multiple hierarchies of dimension tables, an arrangement that resembles a snowflake. Every fact table row is associated with its dimension table rows through a foreign key reference.
When designing SnowFlake schemas, the dimension tables are deliberately normalized. Each level of a dimension hierarchy gets its own table, and a foreign key links each level to its parent level. The complexity of a SnowFlake schema therefore grows with the number of hierarchy levels in its dimension tables.
The hierarchy levels in the diagram above are as follows:
Quarterly id, Monthly id, and Weekly id are the new surrogate keys created for the Date dimension hierarchies, and they have been added as foreign keys in the Date dimension table.
State id is the new surrogate key created for the Store dimension hierarchy, and it has been added as a foreign key in the Store dimension table.
Brand id is the new surrogate key created for the Product dimension hierarchy, and it has been added as a foreign key in the Product dimension table.
City id is the new surrogate key created for the Customer dimension hierarchy, and it has been added as a foreign key in the Customer dimension table.
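The hierarchy levels listed above can be sketched as normalized SQLite tables. This is a minimal sketch: table and column names such as dim_state, dim_brand, and state_id are hypothetical stand-ins for the State id, Brand id, City id, and Quarterly/Monthly/Weekly id keys in the list. The helper parent_tables (also hypothetical) reads SQLite's own foreign-key metadata to show which parent table each dimension level links to.

```python
import sqlite3

# Hypothetical SnowFlake schema: hierarchy levels split into parent tables,
# each linked by a surrogate-key foreign key.
SNOWFLAKE_DDL = """
CREATE TABLE dim_quarter (quarter_id INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);
CREATE TABLE dim_month   (month_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE dim_week    (week_id INTEGER PRIMARY KEY, week INTEGER, year INTEGER);
CREATE TABLE dim_date (
    date_id    INTEGER PRIMARY KEY,
    full_date  TEXT,
    quarter_id INTEGER REFERENCES dim_quarter(quarter_id),
    month_id   INTEGER REFERENCES dim_month(month_id),
    week_id    INTEGER REFERENCES dim_week(week_id)
);
CREATE TABLE dim_state (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    state_id   INTEGER REFERENCES dim_state(state_id)
);
CREATE TABLE dim_brand (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    brand_id     INTEGER REFERENCES dim_brand(brand_id)
);
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    city_id       INTEGER REFERENCES dim_city(city_id)
);
CREATE TABLE fact_sales (
    date_id       INTEGER REFERENCES dim_date(date_id),
    store_id      INTEGER REFERENCES dim_store(store_id),
    product_id    INTEGER REFERENCES dim_product(product_id),
    customer_id   INTEGER REFERENCES dim_customer(customer_id),
    quantity_sold INTEGER
);
"""

def parent_tables(conn: sqlite3.Connection, table: str) -> dict:
    """Return {foreign_key_column: parent_table} for a table (name must be trusted)."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    return {row[3]: row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}

conn = sqlite3.connect(":memory:")
conn.executescript(SNOWFLAKE_DDL)
for name in ("dim_date", "dim_store", "dim_product", "dim_customer"):
    print(name, "->", parent_tables(conn, name))
```

Compared with a wide, denormalized dimension, each parent table stores a value such as a state name once, and dimension rows refer to it by key.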
SnowFlake schemas can produce the same kinds of reports for end-users as star schemas. However, the queries are more complex, because reaching an attribute at a higher hierarchy level (such as a state) requires an extra join through each intermediate table.
Using the SnowFlake schema example above, we will now write the same query as in the Star schema example.
That is, if a business user wants to know how many Novels and DVDs were sold in the state of Kerala in January 2018, you can apply the following query to the SnowFlake schema tables:
Product_Name Quantity_Sold
Novels 12,702
DVDs 32,919
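The same question can be sketched against a snowflaked version of the data. The schema, store names, and rows below are hypothetical and made up, so they will not reproduce the figures in the result table above. Each hierarchy level adds a join (fact to date to month, fact to store to state), and because these are inner joins, the filters on the parent tables are written directly in the ON clauses of the FROM clause.

```python
import sqlite3

# Hypothetical SnowFlake tables (Date -> Month, Store -> State) with made-up rows.
SETUP = """
CREATE TABLE dim_month   (month_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, full_date TEXT,
                          month_id INTEGER REFERENCES dim_month(month_id));
CREATE TABLE dim_state   (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, store_name TEXT,
                          state_id INTEGER REFERENCES dim_state(state_id));
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (date_id INTEGER, store_id INTEGER,
                          product_id INTEGER, quantity_sold INTEGER);
INSERT INTO dim_month   VALUES (1, 1, 2018), (2, 2, 2018), (3, 1, 2017);
INSERT INTO dim_date    VALUES (1, '2018-01-15', 1), (2, '2018-02-10', 2), (3, '2017-01-20', 3);
INSERT INTO dim_state   VALUES (1, 'Kerala'), (2, 'Karnataka');
INSERT INTO dim_store   VALUES (1, 'Kochi Central', 1), (2, 'Bengaluru MG Road', 2);
INSERT INTO dim_product VALUES (1, 'Novels'), (2, 'DVDs'), (3, 'Pens');
INSERT INTO fact_sales  VALUES (1, 1, 1, 5), (1, 1, 1, 3), (1, 1, 2, 4),
                               (2, 1, 1, 10), (1, 2, 2, 7), (3, 1, 2, 6), (1, 1, 3, 9);
"""

# Extra joins walk up each hierarchy; filters on inner-joined parents sit in ON clauses.
SNOWFLAKE_QUERY = """
SELECT p.product_name, SUM(f.quantity_sold) AS quantity_sold
FROM fact_sales f
INNER JOIN dim_date    d  ON f.date_id = d.date_id
INNER JOIN dim_month   m  ON d.month_id = m.month_id AND m.year = 2018 AND m.month = 1
INNER JOIN dim_store   s  ON f.store_id = s.store_id
INNER JOIN dim_state   st ON s.state_id = st.state_id AND st.state_name = 'Kerala'
INNER JOIN dim_product p  ON f.product_id = p.product_id
WHERE p.product_name IN ('Novels', 'DVDs')
GROUP BY p.product_name
ORDER BY p.product_name
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the demo SnowFlake schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn

if __name__ == "__main__":
    print(build_demo_db().execute(SNOWFLAKE_QUERY).fetchall())  # [('DVDs', 4), ('Novels', 8)]
```

Five joins are needed here to answer a question that a star layout answers with one join per dimension, which is the extra query complexity described above.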
Points to Remember While Querying Star or SnowFlake Schema Tables
A query against these schemas can generally be built with the following structure:
SELECT Clause:
The attributes specified in the SELECT clause appear in the query results.
When the SELECT clause computes aggregated values (for example, the SUM of a fact), the non-aggregated attributes it lists must also appear in a GROUP BY clause, which comes after the WHERE clause.
FROM Clause:
List all the fact tables and dimension tables that the report needs, as per the context.
WHERE Clause:
Join the appropriate dimension tables to the fact table here: each dimension table's surrogate key is matched with the corresponding foreign key in the fact table, which restricts the range of data to be queried. The star schema query example above follows this approach. Alternatively, if you use explicit inner/outer joins in the FROM clause, as in the SnowFlake schema example, the join conditions (and, for inner joins, the filters as well) can be written there instead.
Dimension attributes are also specified in the WHERE clause as constraints on the data.
Together, these steps filter the data so that only what the report needs is returned.
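The two placements described for join conditions can be compared directly. In this sketch the tables and rows are hypothetical and made up; the first query lists the tables in FROM and puts key joins and filters in WHERE, and the second uses explicit INNER JOINs with the same conditions in the ON clauses. Both place GROUP BY after the filtering.

```python
import sqlite3

# Hypothetical tables and made-up rows, only to compare two ways of writing joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, quantity_sold INTEGER);
INSERT INTO dim_product VALUES (1, 'Novels'), (2, 'DVDs');
INSERT INTO dim_store   VALUES (1, 'Kerala'), (2, 'Goa');
INSERT INTO fact_sales  VALUES (1, 1, 5), (2, 1, 4), (1, 2, 7), (1, 1, 2);
""")

# Style 1: tables in FROM; key joins and the filter in WHERE; GROUP BY after WHERE.
WHERE_STYLE = """
SELECT p.product_name, SUM(f.quantity_sold)
FROM fact_sales f, dim_product p, dim_store s
WHERE f.product_id = p.product_id
  AND f.store_id = s.store_id
  AND s.state = 'Kerala'
GROUP BY p.product_name
ORDER BY p.product_name
"""

# Style 2: explicit INNER JOINs; key joins and the filter move into the ON clauses.
JOIN_STYLE = """
SELECT p.product_name, SUM(f.quantity_sold)
FROM fact_sales f
INNER JOIN dim_product p ON f.product_id = p.product_id
INNER JOIN dim_store s   ON f.store_id = s.store_id AND s.state = 'Kerala'
GROUP BY p.product_name
ORDER BY p.product_name
"""

where_rows = conn.execute(WHERE_STYLE).fetchall()
join_rows = conn.execute(JOIN_STYLE).fetchall()
print(where_rows)               # [('DVDs', 4), ('Novels', 7)]
print(where_rows == join_rows)  # True
```

This equivalence holds for inner joins only. With a LEFT OUTER JOIN, a condition in the ON clause only decides which dimension rows match (unmatched fact rows are kept with NULLs), while the same condition in WHERE removes rows, so the two placements can return different results.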
As per the business needs, you can add or remove facts, dimensions, attributes, and constraints in a star schema or SnowFlake schema query by following the above structure. You can also add sub-queries or merge the results of different queries to generate data for complex reports.
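Because every such query shares the SELECT/FROM/WHERE/GROUP BY structure, adding or removing a dimension amounts to adding or removing a table, its key join, and its constraints. The sketch below illustrates this with build_report_query, a hypothetical helper (not part of any BI tool or library); the tables and rows are made up. Filter values are bound as ? parameters, but the table, column, and join strings are inserted as-is, so they must come from trusted code, never from user input.

```python
import sqlite3
from typing import Dict, List, Tuple

def build_report_query(
    measures: List[str],         # aggregated facts, e.g. "SUM(f.quantity_sold)"
    attributes: List[str],       # dimension attributes shown and grouped by
    tables: List[str],           # fact and dimension tables, with aliases
    joins: List[str],            # surrogate key = foreign key conditions
    filters: Dict[str, object],  # dimension attribute -> required value
) -> Tuple[str, list]:
    """Hypothetical helper: assemble SELECT/FROM/WHERE/GROUP BY from its parts."""
    conditions = joins + [f"{column} = ?" for column in filters]
    clauses = [
        "SELECT " + ", ".join(attributes + measures),
        "FROM " + ", ".join(tables),
    ]
    if conditions:
        clauses.append("WHERE " + " AND ".join(conditions))
    if attributes:
        clauses.append("GROUP BY " + ", ".join(attributes))
    # Dict insertion order keeps the parameter values aligned with the "?" placeholders.
    return "\n".join(clauses), list(filters.values())

# Made-up demo data for a tiny star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (store_id INTEGER, product_id INTEGER, quantity_sold INTEGER);
INSERT INTO dim_store   VALUES (1, 'Kerala'), (2, 'Goa');
INSERT INTO dim_product VALUES (1, 'Novels'), (2, 'DVDs');
INSERT INTO fact_sales  VALUES (1, 1, 5), (1, 2, 4), (2, 1, 7);
""")

sql, params = build_report_query(
    measures=["SUM(f.quantity_sold) AS quantity_sold"],
    attributes=["p.product_name"],
    tables=["fact_sales f", "dim_store s", "dim_product p"],
    joins=["f.store_id = s.store_id", "f.product_id = p.product_id"],
    filters={"s.state": "Kerala"},
)
print(sql)
print(sorted(conn.execute(sql, params).fetchall()))  # [('DVDs', 4), ('Novels', 5)]
```

To drop the Store dimension from the report, for example, remove "dim_store s" from tables, its join, and the s.state filter; the rest of the query stays the same.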
A star cluster schema is designed starting from a star schema: a few essential dimension tables from the star schema are snowflaked, which in turn forms a more stable schema structure.
The Star schema is preferred if the BI tools let business users interact easily with the table structures through simple queries. The SnowFlake schema is preferred if the BI tools make it harder for business users to interact directly with the table structures, because the SnowFlake model involves more joins and more complex queries.
Choose the SnowFlake schema if you want to save storage space or if your DW system has tools optimized for designing this schema.