Lecture 4 2025 EUI
The grain describes the level of detail for the business problem/solution.
Declaring the grain means identifying the lowest level of information stored in any
table in your data warehouse.
Fact table
• A fact table is a primary table in dimensional modeling.
• A fact table contains:
1. Measurements/facts
2. Foreign key to dimension table

Dimensions
• A dimension provides the context surrounding a business process event.
• In simple terms, they give the who, what, where of a fact.
• In the Sales business process, for the fact quarterly sales number, the dimensions would be:
• Who – Customers
• Where – Location
• What – Products

Dimension table
• A dimension table contains the dimensions of a fact.
• They are joined to the fact table via a foreign key.
• The dimension attributes are the various columns in a dimension table.
• There is no limit on the number of dimensions.
• The dimension can also contain one or more hierarchical relationships.

Attributes
• The attributes are the various characteristics of the dimension in dimensional data modeling.
• In the Location dimension, the attributes can be:
• State
• Country
• Zipcode

Facts
• Facts are the measurements/metrics from your business process.
• For a Sales business process, a measurement would be the sales number.
Dimensional Model: A Simple Example
[Figure labels: Date Dimension (Date_Dim PK); Product Dimension (Product_Dim PK); Location dimension (Loc_Dim PK, location-related attributes); Customer dimension (Cust_Dim PK, customer-related attributes); Facts / Measures]
[Figure: lattice of cuboids over the dimensions product, customer, location]
• 0-D (apex) cuboid: all
• 1-D cuboids: product; customer; location
• 3-D (base) cuboid: product, customer, location
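
Each cuboid in the lattice corresponds to one subset of the dimensions, so three dimensions give 2^3 = 8 cuboids, including the three 2-D cuboids that sit between the 1-D and base levels. A minimal Python sketch of that enumeration follows; the helper name `cuboids` is illustrative and not from the lecture.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every cuboid (subset of dimensions), keyed by its number of dimensions."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        lattice[k] = [tuple(c) for c in combinations(dimensions, k)]
    return lattice

lattice = cuboids(["product", "customer", "location"])
for level, groups in lattice.items():
    # Level 0 is the apex cuboid ("all"); the highest level is the base cuboid.
    print(f"{level}-D:", groups)
```

The empty tuple at level 0 plays the role of "all": no dimension is kept, so everything is aggregated into a single cell.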
https://www.coursera.org/learn/data-warehouse-fundamentals/ungradedLti/7tDHW/hands-on-lab-populating-a-data-warehouse-using-postgresql
• Slicing reduces the cube dimension by 1.
• Dicing shrinks a dimension.
• Summarize a dimension (average, count, sum, etc.).
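
The three operations can be illustrated on a tiny in-memory cube. This is only a sketch under stated assumptions: the cube contents, the dictionary representation and the function names (`slice_cube`, `dice_cube`, `summarize`) are hypothetical, and summarizing a dimension away is what is often called roll-up.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, customer, location) -> units sold
cube = {
    ("pen", "alice", "Cairo"): 5,
    ("pen", "bob", "Giza"): 3,
    ("ink", "alice", "Cairo"): 2,
    ("ink", "bob", "Cairo"): 4,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value; the result has one dimension fewer."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def dice_cube(cube, axis, allowed):
    """Restrict one dimension to a subset of values; the number of dimensions is unchanged."""
    return {k: v for k, v in cube.items() if k[axis] in allowed}

def summarize(cube, axis, agg=sum):
    """Aggregate one dimension away with the given function (sum, len, max, ...)."""
    groups = defaultdict(list)
    for k, v in cube.items():
        groups[k[:axis] + k[axis + 1:]].append(v)
    return {k: agg(vs) for k, vs in groups.items()}

cairo = slice_cube(cube, axis=2, value="Cairo")     # keys are (product, customer)
pens_only = dice_cube(cube, axis=0, allowed={"pen"})  # keys keep all three dimensions
by_product_location = summarize(cube, axis=1)       # customer is summed away
```

Note the difference in key length: the slice and the summary return two-part keys, while the dice keeps all three dimensions and only shortens the list of products.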
Prof. Hoda M. O. Mokhtar
3. Choose the Dimensions
• The sales fact table only records the products actually sold. There are
no fact table rows with zero facts for products that didn’t sell because
doing so would enlarge the fact table enormously.
• This is similar to the sales fact table we just designed; however, the
grain would be significantly different.
• Here we’d load one row in the fact table for each product on promotion
in a store regardless of whether the product sold or not.
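
The difference between the two grains can be sketched in a few lines of Python. All rows below are hypothetical, the date column is an assumption (the slide only mentions product and store), and using the coverage table to find promoted products that did not sell is a common use assumed here rather than stated above.

```python
# Coverage grain: one row per product on promotion in a store, sold or not.
# Each tuple is (date, store, product); all values are made up for illustration.
promotion_coverage = {
    ("2025-03-01", "store_1", "cola"),
    ("2025-03-01", "store_1", "chips"),
    ("2025-03-01", "store_1", "gum"),
}

# Sales grain: rows exist only for products that actually sold (key -> units sold).
sales_fact = {
    ("2025-03-01", "store_1", "cola"): 12,
    ("2025-03-01", "store_1", "chips"): 4,
}

# Products on promotion that did not sell = coverage keys minus sales keys.
promoted_but_unsold = promotion_coverage - set(sales_fact)
print(promoted_but_unsold)
```

The sales table alone cannot answer this question, because a product that did not sell simply has no row there.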
• The center of the star consists of a large fact table and the points of the star
are the dimension tables.
• A star schema is characterized by one or more very large fact tables that
contain the primary information in the data warehouse, and a number of
much smaller dimension tables (or lookup tables), each of which contains
information about the entries for a particular attribute in the fact table.
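
A minimal sketch of such a star, using Python's built-in `sqlite3` module. The table names, column names and rows are hypothetical and only meant to show a fact table joined to two small dimension tables through foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small lookup tables describing each key.
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, state TEXT, country TEXT, zipcode TEXT);

-- Fact table: the center of the star, holding measures and foreign keys.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    sales_amount REAL
);

INSERT INTO product_dim  VALUES (1, 'pen'), (2, 'ink');
INSERT INTO location_dim VALUES (1, 'Cairo', 'Egypt', '11511'), (2, 'Giza', 'Egypt', '12511');
INSERT INTO sales_fact   VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5);
""")

# A typical star join: aggregate the measure, grouped by dimension attributes.
rows = conn.execute("""
    SELECT p.product_name, l.country, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN product_dim  p ON p.product_key  = f.product_key
    JOIN location_dim l ON l.location_key = f.location_key
    GROUP BY p.product_name, l.country
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

In a real warehouse the fact table would hold millions of rows while the dimension tables stay comparatively small, which is the size contrast the definition above describes.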
1. If a dimension is very sparse (i.e., most of the possible values for the
dimension have no data) and/or a dimension has a very long list of
attributes which may be used in a query, the dimension table may occupy a
significant proportion of the database, and snowflaking may be appropriate.
1. Historical data
2. Inventory Transactions
• Questions:
• What are the 4 design steps?
• What type of fact table do we have in this model?
• Example
• 60,000 products × 100 stores × 14-byte row width = 84 MB per daily snapshot
• A year’s worth of daily snapshots ≥ 30 GB
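
The sizing arithmetic can be checked directly. The sketch below assumes that "14 row width" means 14 bytes per fact row, that 1 MB = 10^6 bytes, and that a year has 365 daily snapshots; the variable names are illustrative.

```python
products = 60_000
stores = 100
row_width_bytes = 14  # assumption: row width is given in bytes

rows_per_snapshot = products * stores                     # one row per product per store
bytes_per_snapshot = rows_per_snapshot * row_width_bytes  # size of one daily snapshot
bytes_per_year = bytes_per_snapshot * 365                 # a year of daily snapshots

print(rows_per_snapshot)   # 6000000
print(bytes_per_snapshot)  # 84000000  (84 MB)
print(bytes_per_year)      # 30660000000  (about 30.7 GB)
```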
• Inventory levels, however, are not additive across dates because they
represent snapshots of a level or balance at one point in time.
• Because inventory levels are additive across some dimensions but not all,
we refer to them as semiadditive facts.
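
A small numeric sketch of semiadditivity, with hypothetical snapshot values. Summing across stores on one date gives a meaningful total, while summing across dates counts the same stock repeatedly; averaging over dates is shown as one common alternative, which is an assumption beyond the text above.

```python
# Hypothetical daily snapshots for one product: (date, store) -> quantity on hand.
on_hand = {
    ("Mon", "store_1"): 10, ("Mon", "store_2"): 20,
    ("Tue", "store_1"): 8,  ("Tue", "store_2"): 20,
}

# Additive across stores: total quantity on hand on Monday.
monday_total = sum(q for (day, _), q in on_hand.items() if day == "Mon")

# Not additive across dates: Mon + Tue for store_1 counts the same stock twice.
store_1_levels = [q for (_, store), q in on_hand.items() if store == "store_1"]
misleading_sum = sum(store_1_levels)

# A meaningful summary across dates is an average level instead of a sum.
average_level = sum(store_1_levels) / len(store_1_levels)

print(monday_total, misleading_sum, average_level)
```

Here store_1 never held 18 units, so the sum across dates describes nothing real, whereas the Monday total of 30 is an actual stock level.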
• Each fact table row will be updated until the product leaves the
warehouse.
Rule 1:
Accumulating snapshots typically have multiple dates in the fact
table representing the major milestones of the process. However,
just because a fact table has several dates doesn’t dictate that it
is an accumulating snapshot.
Rule 2:
The primary differentiator of an accumulating snapshot is that
we typically revisit the fact rows as activity takes place.
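
The "revisit the row" behaviour in Rule 2 can be sketched as follows. The milestone names, the lot identifier and the helper `record_milestone` are hypothetical; the point is only that an existing fact row is updated as activity takes place, instead of a new row being inserted for each event.

```python
# One fact row per product lot; milestone dates start empty and are filled in over time.
row = {
    "product_lot": "LOT-001",   # hypothetical identifier
    "date_received": None,      # hypothetical milestones
    "date_inspected": None,
    "date_shipped": None,
}

def record_milestone(fact_row, milestone, date):
    """Revisit the existing row and stamp a milestone date (no new row is created)."""
    if milestone not in fact_row:
        raise KeyError(f"unknown milestone: {milestone}")
    fact_row[milestone] = date
    return fact_row

record_milestone(row, "date_received", "2025-03-01")
record_milestone(row, "date_shipped", "2025-03-09")
print(row)
```

Milestones that have not happened yet stay empty, which is why such rows typically carry several date columns at once.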