DWT Chapter 2
Part 1
Dimensional Techniques
Identification of Data
Gather business requirements
Determine what dimensional data needs to be represented by holding
meetings with business representatives
During these meetings be sure to watch for undiscovered opportunities
Four-Step Design Process
1. Select business process
2. Declare the grain
3. Identify dimensions
4. Identify the facts
Business Process
Operational activities which take place in a business
Most fact tables focus on the results of individual business
processes
The target business process must be identified first in order to
determine the proper level of data grain, dimensions, and facts
This is the first and best place to combat scope creep
By requiring the business representatives to pick a specific
business process to begin defining the other attributes, you can
counter those who want to “see it all”
Declare the grain
The grain establishes exactly what a single fact table row
represents
Must be declared before dimensions and facts are identified to
ensure both dimensions and facts are collected and stored at the
same level
Prevents future apples-to-oranges comparisons between atomic and
aggregate data
Identify dimensions
Dimension tables contain the attributes used by BI applications to
filter and group the facts in the fact tables
Identification usually starts with conversations with business users;
the dimensions are then expanded to provide filtering and
grouping even the business users don’t yet know they want or need
Identify the facts
The facts are what the business representatives eventually want to
see
The use of the dimensions to group or filter the facts is what
provides the context for the reports
Facts are generally numeric in nature, which provides for
aggregation in reports
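The four steps above can be sketched as a small star schema. This is only a minimal sketch using Python's built-in sqlite3 module. The retail sales process and every table and column name (date_dim, product_dim, sales_fact, and so on) are hypothetical, chosen for illustration rather than taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: business process = retail sales (hypothetical example)
-- Step 2: grain = one row per product per sales transaction
-- Step 3: dimensions = date and product
CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month_name TEXT
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- Step 4: facts = numeric measures stored at the declared grain
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
INSERT INTO date_dim VALUES
    (1, '2024-01-15', 'January'), (2, '2024-02-03', 'February');
INSERT INTO product_dim VALUES
    (1, 'Espresso', 'Coffee'), (2, 'Green tea', 'Tea');
INSERT INTO sales_fact VALUES
    (1, 1, 2, 6.0), (1, 2, 1, 2.5), (2, 1, 3, 9.0);
""")

# Dimension attributes group and filter; the facts are what gets aggregated.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

The query shows the division of labor: the category attribute from the dimension supplies the report context, and the numeric fact is summed.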
Extensions to dimensional models
All the following can easily be used to improve the functionality of
dimensional models
1. Facts (of consistent grain) can be added to an existing fact table by
simply adding new columns
2. Attributes can be added to dimension tables by adding new columns
3. New dimensions can be added and attributed to a fact table through
the addition of new foreign keys to the fact tables
4. The grain of a fact table can be made more atomic by adding attributes
to an existing dimension table and then restating the fact table at the
lower grain (original columns should be retained in the fact table)
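The first three extensions in the list above can be sketched with plain ALTER TABLE statements. This is a minimal sketch using sqlite3, assuming hypothetical tables (sales_fact, product_dim, promotion_dim) and hypothetical column names. The point it tries to show is that a query written before the changes still returns the same answer afterward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact (product_key INTEGER, sales_amount REAL);
INSERT INTO product_dim VALUES (1, 'Espresso');
INSERT INTO sales_fact VALUES (1, 6.0), (1, 9.0);
""")

# A query that existed before the model was extended.
existing_query = "SELECT SUM(sales_amount) FROM sales_fact"
before = conn.execute(existing_query).fetchone()[0]

conn.executescript("""
-- 1. New fact at the same grain: add a column to the fact table
ALTER TABLE sales_fact ADD COLUMN discount_amount REAL DEFAULT 0;
-- 2. New dimension attribute: add a column to the dimension table
ALTER TABLE product_dim ADD COLUMN category TEXT;
-- 3. New dimension: add a table, then a foreign key column on the fact table
CREATE TABLE promotion_dim (
    promotion_key INTEGER PRIMARY KEY,
    promotion_name TEXT
);
INSERT INTO promotion_dim VALUES (0, 'No promotion');
ALTER TABLE sales_fact ADD COLUMN promotion_key INTEGER DEFAULT 0;
""")

# The old query is unaffected by the new columns and the new dimension.
after = conn.execute(existing_query).fetchone()[0]
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
print(before, after, fact_columns)
```

Pointing existing rows at a default "No promotion" row is one possible design choice here, not something the text prescribes.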
Fact table techniques
Fundamental design is based upon a physical business activity and
should never be influenced by the future reports which will draw
upon the fact table data
Fact table data can be:
Fully additive: Numerical data that can be added across any
dimension attributed to the fact table (sales)
Semi-additive: Numerical data that can be summed across some,
but not all dimensions (account balances can be added for a certain
date, but not across dates)
Non-additive: Numerical data that cannot be summed (ratios)
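The three kinds of additivity can be illustrated with a few rows of made-up data. This is a rough sketch; the account balances and the revenue and profit figures are hypothetical. Averaging a balance over time, and recomputing a ratio from summed components, are common ways to handle these cases and are shown here as assumptions rather than as the text's method.

```python
from collections import defaultdict

# Hypothetical daily balance rows: (date, account, balance)
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 50.0),
]

# Semi-additive, valid direction: sum across accounts for a single date.
total_by_date = defaultdict(float)
for day, _account, balance in balances:
    total_by_date[day] += balance

# Semi-additive, invalid direction: 100 + 120 for account A across dates is
# not a balance anyone ever held. Average over the dates instead.
a_balances = [bal for _day, acct, bal in balances if acct == "A"]
a_average = sum(a_balances) / len(a_balances)

# Non-additive: a ratio such as margin. Sum the additive components first,
# then compute the ratio from the totals.
sales = [(200.0, 50.0), (100.0, 10.0)]  # hypothetical (revenue, profit) rows
margin = sum(profit for _rev, profit in sales) / sum(rev for rev, _p in sales)

print(dict(total_by_date), a_average, margin)
```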
Transaction Fact Tables
Holds transactional business event data
The fact table row can be traced to an actual real-world event in
time and space
These tables can be dense or sparse because a row exists only
when a specific business event takes place
Always contain foreign keys to connect to one or more dimension
tables
Periodic snapshot fact tables
Rows in these tables aggregate fact data over standard time
frames (weeks, months, seasons)
The grain is the period instead of individual events
These tables are uniformly dense: a row is typically inserted for
every period, even when no activity took place
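A periodic snapshot can be sketched as a rollup of transactions into one row per period. In this minimal sketch the months, accounts, and amounts are hypothetical, and filling inactive periods with a zero is one assumed way to keep the table dense.

```python
from collections import defaultdict

# Hypothetical transaction rows: (month, account, amount)
transactions = [
    ("2024-01", "A", 40.0),
    ("2024-01", "A", 60.0),
    ("2024-02", "B", 25.0),
]
months = ["2024-01", "2024-02"]
accounts = ["A", "B"]

# Sum the individual events per (month, account).
totals = defaultdict(float)
for month, account, amount in transactions:
    totals[(month, account)] += amount

# The grain is the period: one row per account per month, with a zero row
# where no transaction occurred.
snapshot = [
    (month, account, totals.get((month, account), 0.0))
    for month in months
    for account in accounts
]
print(snapshot)
```

Three transactions become four snapshot rows, because every account gets a row for every month.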
Accumulating snapshot fact tables
Each row summarizes the events that occur at predictable steps
between the beginning and end of a process (such as order
fulfillment), and the row is updated as the process progresses
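One way to picture a row that follows a process from beginning to end is an order that collects a date for each milestone. This is a minimal sketch; the order-fulfillment process, the milestone names, and the record_milestone helper are all hypothetical.

```python
from datetime import date

# One row per order, with a date slot for each milestone in the process.
order_row = {
    "order_id": 1001,
    "order_date": date(2024, 3, 1),
    "ship_date": None,
    "delivery_date": None,
}

def record_milestone(row, milestone, when):
    """Update the existing row in place when a milestone is reached."""
    if milestone not in row:
        raise KeyError(f"unknown milestone: {milestone}")
    row[milestone] = when
    return row

# The same row is revisited as the process moves forward.
record_milestone(order_row, "ship_date", date(2024, 3, 3))
record_milestone(order_row, "delivery_date", date(2024, 3, 6))

# Lags between milestones can be read straight off the finished row.
days_to_deliver = (order_row["delivery_date"] - order_row["order_date"]).days
print(days_to_deliver)
```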
Factless fact tables
Factless fact tables are often used to record events in terms of
what happened or did not happen rather than facts that have a
numerical measure
Customer appointments or customer communications are a couple
of examples. Dinner reservations would be another
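A factless fact table can be sketched as rows made only of dimension references. In this minimal sketch the appointment rows and customer names are hypothetical. Analysis consists of counting rows, or comparing them against the full customer list to find what did not happen.

```python
from collections import Counter

# Hypothetical appointment rows: (date, customer, staff member).
# There is no numeric measure; the row's existence is the event.
appointments = [
    ("2024-04-01", "Ana", "Dr. Lee"),
    ("2024-04-01", "Ben", "Dr. Lee"),
    ("2024-04-02", "Ana", "Dr. Kim"),
]

# What happened: count the rows per day.
appointments_per_day = Counter(day for day, _customer, _staff in appointments)

# What did not happen: customers with no appointment row at all.
all_customers = {"Ana", "Ben", "Cho"}
customers_seen = {customer for _day, customer, _staff in appointments}
no_appointment = all_customers - customers_seen

print(appointments_per_day, no_appointment)
```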
Aggregate fact tables
These are precomputed rollups of the large fact tables, built to
speed up common queries; they serve much the same purpose as the
OLAP cubes we discussed previously
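An aggregate fact table can be sketched as a rollup computed once and stored, so that common queries do not have to rescan the atomic rows. This is a minimal sketch with hypothetical sales rows and a hypothetical month-by-category rollup level.

```python
from collections import defaultdict

# Hypothetical atomic rows: (date, product, category, sales amount)
atomic_sales = [
    ("2024-01-15", "Espresso", "Coffee", 6.0),
    ("2024-01-20", "Latte", "Coffee", 4.5),
    ("2024-01-20", "Green tea", "Tea", 2.5),
    ("2024-02-03", "Espresso", "Coffee", 9.0),
]

# Roll up from (date, product) to (month, category).
monthly_category_sales = defaultdict(float)
for day, _product, category, amount in atomic_sales:
    month = day[:7]  # 'YYYY-MM' prefix of the ISO date
    monthly_category_sales[(month, category)] += amount

# The rollup must agree with the atomic data it was derived from.
atomic_total = sum(row[3] for row in atomic_sales)
aggregate_total = sum(monthly_category_sales.values())
print(dict(monthly_category_sales), atomic_total, aggregate_total)
```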
Consolidated fact tables
These tables hold related data from multiple separate fact tables
The text mentions consolidating historical fact data with
forecasted fact data to make comparisons between the two
different fact tables easier and faster
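The historical-versus-forecast comparison can be sketched by placing both measures side by side in one row. This is a minimal sketch; the figures are hypothetical, and both sources are assumed to share the same grain (month and category) already.

```python
# Hypothetical facts from two separate fact tables, keyed by (month, category).
actuals = {("2024-01", "Coffee"): 150.0, ("2024-02", "Coffee"): 180.0}
forecasts = {("2024-01", "Coffee"): 140.0, ("2024-02", "Coffee"): 200.0}

# Consolidate: one row per key, carrying both measures.
consolidated = [
    {
        "month": month,
        "category": category,
        "actual": actuals.get((month, category), 0.0),
        "forecast": forecasts.get((month, category), 0.0),
    }
    for month, category in sorted(set(actuals) | set(forecasts))
]

# With both facts in one row, the comparison needs no join between tables.
variances = [row["actual"] - row["forecast"] for row in consolidated]
print(variances)
```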
Dimension table structure
Every dimension table contains a primary key which becomes the
foreign key used to connect to one or more fact tables
However, these keys for the dimension tables cannot be natural
keys as the same natural keys might be in use in several places in
the system
In addition, as natural keys are connected with the outside real
world, the DW administrators would not have true control over the
keys being used in the dimension tables
For this reason, dimension tables use dimension surrogate keys
(simple integers starting with 1 and incremented each time a new
key is needed)
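Surrogate key assignment can be sketched as a lookup that hands out the next integer whenever a new natural key shows up. This is a minimal sketch; the SurrogateKeyGenerator class, the source system names, and the customer number are hypothetical.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hands out simple integer keys, starting at 1, for a dimension table."""

    def __init__(self):
        self._next_key = count(1)
        self._assigned = {}  # (source system, natural key) -> surrogate key

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._assigned:
            self._assigned[lookup] = next(self._next_key)
        return self._assigned[lookup]

keys = SurrogateKeyGenerator()
crm_key = keys.key_for("crm", "C-100")
# The same natural key used elsewhere in the system gets its own surrogate.
billing_key = keys.key_for("billing", "C-100")
# Asking again for a known natural key returns the key already assigned.
crm_key_again = keys.key_for("crm", "C-100")
print(crm_key, billing_key, crm_key_again)
```

Because the integers carry no meaning from the outside world, the warehouse alone controls when a new key is issued.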
Natural vs. durable (supernatural) keys
Natural keys: subject to business rules outside the control of the DW system.
These are potentially changeable, like an employee ID number or phone
number.
Durable keys: created within the DW system to serve as a permanent key for
an entity. The best ones have no true connection to the business process
which might use natural keys. This makes them more difficult to change, as
users and/or administrators will not likely see the connection between the
durable key and natural key
While an employee who is hired, fired, and re-hired might have more than one
employee ID number (natural key) in the transactional database, in the DW
system they would be assigned a durable key once, and that key will always
refer to that individual no matter how many employee IDs they might be given
over time
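The hired, fired, and re-hired case can be sketched as a mapping from many employee IDs to one durable key. This is a minimal sketch; the register function, the person_ref identity, and the ID formats are hypothetical. How the warehouse recognizes that two employee IDs belong to the same individual is assumed to be solved elsewhere.

```python
from itertools import count

next_durable_key = count(1)
durable_key_by_person = {}       # hypothetical matched identity -> durable key
durable_key_by_employee_id = {}  # natural key -> durable key

def register(person_ref, employee_id):
    """Map an employee ID to the person's durable key, creating it only once."""
    if person_ref not in durable_key_by_person:
        durable_key_by_person[person_ref] = next(next_durable_key)
    durable_key = durable_key_by_person[person_ref]
    durable_key_by_employee_id[employee_id] = durable_key
    return durable_key

first_hire = register("person-42", "E-1001")
rehire = register("person-42", "E-2077")       # new employee ID, same person
someone_else = register("person-43", "E-2078")
print(first_hire, rehire, someone_else)
```

Both employee IDs resolve to the same durable key, so facts recorded under either ID can be attributed to the same individual.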
What questions can I answer?