
06 07 Dimensional Modeling

This document contains lecture notes on dimensional modeling for a course on database systems and data warehousing/mining. It discusses the need for dimensional modeling over entity-relationship modeling for decision support systems. The key aspects covered include choosing the business process, grain, facts and dimensions in the dimensional modeling process. Specific topics include star schemas, snowflake schemas, dimension hierarchies, and different types of facts and dimensions.

Islamic Republic Of Afghanistan

Ministry Of Higher Education


Herat University
Computer Science Faculty
Advance Studies in
Database Systems 1 (Semester 7)

(Data Warehousing and Data Mining )


Lecture 6 & 7
Dimensional Modeling

LECTURER: HAMED AMIRY

[Link].2015@[Link]

DATAWAREHOUSING & DATAMINING-LEC 6 & 7 1


Outline
 ER and DM
 Dimensional Modeling
 Process of DM
 Choosing the business process
 Choosing the grain
 Choosing the facts
 Factless fact table
 Snapshot, transaction, accumulating snapshot fact tables
 Additive, Semi Additive and non Additive Facts
 Choosing the dimensions
 Junk Dimension
 Slowly changing dimension
 Degenerate dimension
 Outrigger dimension
 International Issues



The need for ER modeling?
 Problems with early COBOL-era data processing systems:
 Data redundancies
 From flat file to table: each entity ultimately becomes a table in the physical schema.
 Coupled with normalization, this drives all the redundancy out of the database.
 Data is changed (added, updated or deleted) at just one point.
 Can be used with indexing for very fast access.
 Resulted in the success of OLTP systems.


“Simplified” 3NF (Retail)
[Figure: ER diagram of a simplified 3NF retail schema. Tables: store, zone, district, division, sale_header, sale_detail, item_x_cat, item_x_splir, cat_x_dept, week, month, quarter and year, linked by 1-to-M relationships.]

Need for DM: Un-answered Qs
 Let's have a look at a typical ER data model first.
 Some observations:
 All tables look alike; as a consequence it is difficult to identify:
 Which table is more important?
 Which is the largest?
 Which tables contain numerical measurements of the business?
 Which tables contain nearly static descriptive attributes?


Need for Dimensional Model: The Paradox
 The Paradox: Trying to make information accessible using tables resulted in an inability to query them!
 ER modeling and normalization result in a large number of tables, which are:
 Hard for users (DB programmers) to understand
 Hard for DBMS software to navigate optimally
 Too complex for queries that span multiple tables with a large number of records


How to simplify an ER data model?
 Two general methods:

 De-Normalization

 Dimensional Modeling (DM)



What is DM?…
 A simpler logical model optimized for decision support.
 Inherently dimensional in nature, with a single central
fact table and a set of smaller dimensional tables.
 Multi-part key for the fact table
 Dimensional tables with a single-part PK.
 Keys are usually system generated
 Results in a star-like structure, called a star schema or star join.
 All relationships are mandatory M-1.
 Single path between any two levels.
 Supports ROLAP operations.
 e.g. drill-down, roll-up


Dimensions have Hierarchies
Items
  Books
    Fiction
    Text
      Engg
      Medical
  Clothes
    Men
    Women

 Analysts tend to look at the data through a dimension at a particular “level” in the hierarchy.
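
As a rough illustration of looking at data at different levels of this hierarchy, the sketch below rolls leaf-level sales up to every parent level. The sales figures are invented for the example, and the placement of Engg and Medical under Text is an assumption read off the slide layout.

```python
from collections import defaultdict

# Child -> parent links for the item hierarchy (Engg/Medical under Text is assumed).
PARENT = {
    "Fiction": "Books", "Text": "Books",
    "Engg": "Text", "Medical": "Text",
    "Men": "Clothes", "Women": "Clothes",
    "Books": "Items", "Clothes": "Items",
}

def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def roll_up(leaf_sales):
    """Add each leaf-level amount to the leaf and to every level above it."""
    totals = defaultdict(int)
    for leaf, amount in leaf_sales.items():
        for node in ancestors(leaf):
            totals[node] += amount
    return dict(totals)

# Hypothetical sales figures, for illustration only.
sales = {"Fiction": 30, "Engg": 20, "Medical": 10, "Men": 25, "Women": 15}
totals = roll_up(sales)
print(totals["Text"], totals["Books"], totals["Clothes"], totals["Items"])
```

Reading `totals["Books"]` instead of `totals["Fiction"]` is a roll-up; going the other way is a drill-down.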



The two Schemas

[Figures: a star schema and a snowflake schema]
 A vastly simplified physical data model!
 Fewer tables (some ERP systems have thousands of tables).
 Fewer joins, resulting in high performance.
 Requires some additional space.

Star Schema of Retail Sale example
Fact Table: RECEIPT#, STORE#, ITEM#, DATE, ..., facts (Sale Rs.)
Product Dim (1 to M with the fact table): ITEM#, CATEGORY, DEPT, SUPPLIER
Geography Dim (1 to M with the fact table): STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
Time Dim (1 to M with the fact table): DATE, WEEK, MONTH, QUARTER, YEAR

Beauty lies in close correspondence with the business, evident even to business users.
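
The retail star schema can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the column names are lower-cased versions of the slide's attributes, some geography levels are left out for brevity, and every row of data is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_dim   (item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, supplier TEXT);
CREATE TABLE geography_dim (store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT, province TEXT);
CREATE TABLE time_dim      (sale_date TEXT PRIMARY KEY, week INTEGER, month INTEGER,
                            quarter INTEGER, year INTEGER);
-- The fact table has a multi-part key built from the dimension keys plus RECEIPT#.
CREATE TABLE sale_fact (
    receipt_no INTEGER,
    store_no   INTEGER REFERENCES geography_dim(store_no),
    item_no    INTEGER REFERENCES product_dim(item_no),
    sale_date  TEXT    REFERENCES time_dim(sale_date),
    sale_rs    REAL,
    PRIMARY KEY (receipt_no, store_no, item_no, sale_date)
);
""")
# Hypothetical rows, for illustration only.
con.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                [(1, "Books", "Media", "S1"), (2, "Shirts", "Clothing", "S2")])
con.executemany("INSERT INTO geography_dim VALUES (?, ?, ?, ?)",
                [(10, "Z1", "Herat", "Herat"), (20, "Z2", "Kabul", "Kabul")])
con.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                [("2015-01-05", 2, 1, 1, 2015), ("2015-02-09", 7, 2, 1, 2015)])
con.executemany("INSERT INTO sale_fact VALUES (?, ?, ?, ?, ?)",
                [(100, 10, 1, "2015-01-05", 50.0),
                 (100, 10, 2, "2015-01-05", 20.0),
                 (101, 20, 1, "2015-02-09", 30.0)])

def sales_by(level):
    """Total sales per product category at one level of the time hierarchy."""
    assert level in {"week", "month", "quarter", "year"}
    return con.execute(f"""
        SELECT p.category, t.{level}, SUM(f.sale_rs)
        FROM sale_fact f
        JOIN product_dim p ON p.item_no = f.item_no
        JOIN time_dim t ON t.sale_date = f.sale_date
        GROUP BY p.category, t.{level}
        ORDER BY p.category, t.{level}
    """).fetchall()

by_month = sales_by("month")      # finer level (drill-down)
by_quarter = sales_by("quarter")  # coarser level (roll-up)
print(by_month)
print(by_quarter)
```

Every query joins the fact table directly to the dimensions it needs, one join per dimension, which is the "fewer joins" property the star schema is meant to provide.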
Process of Dimensional Modeling



The Process of Dimensional Modeling
 Four Step Method from ER to DM

 Choose the Business Process


 Choose the Grain
 Choose the Facts
 Choose the Dimensions



Step-1: Choose the Business Process
 A business process is a major operational process in an organization.
 Typically supported by a legacy system (database) or an OLTP system.
 Examples: Orders, Invoices, Inventory etc.
 Some people take "business process" to mean an organizational or departmental function, and criticize the approach on that basis.
 That view is mistaken.
 By focusing on business processes rather than departments, consistent, deduplicated information can be delivered.
 e.g. a single model is built to handle orders data, rather than separate models for the marketing and sales departments, which both access the orders data.


Step-1: Separating the Process

[Figure: the business processes separated into Star-1, Star-2 and a snowflake schema]

Step-2: Choosing the Grain
 Grain is the fundamental, atomic level of data (lowest level of detail) to be represented.
 Grain is also termed the unit of analysis.
 e.g. the unit of weight is kg
 Example grain statements (one fact row represents a…):
 Line item from a cash register receipt
 Boarding pass to get on a flight
 Daily snapshot of inventory level for a product in a warehouse
 Student enrolled in a course
 Finer-grained fact tables:
 are more expressive
 have more rows
 Trade-off between performance and expressiveness
 Rule of thumb: Err in favor of expressiveness
 Pre-computed aggregates can solve performance problems
 Grain determines the aggregation level

[Figure: granularity from LOW to HIGH over four weeks. Two aggregates per week: 2 x 4 = 8 values. Four aggregates per week: 4 x 4 = 16 values. Daily aggregates: 6 x 4 = 24 values.]


The case for data aggregation
 Works well for repetitive queries.
 Justifiable if used by the maximum number of queries.
 Follows the known thought process.
 Provides a “big picture” or macroscopic view.
 Application dependent and usually inflexible to business changes (remember that conventions are not absolute).
 Aggregation is irreversible.
 Monthly sales data can be created from weekly sales data, but the reverse is not possible.
 Aggregation hides crucial facts.


Aggregation hides crucial facts Example

Zone     Week-1  Week-2  Week-3  Week-4  Average
Zone-1   100     100     100     100     100
Zone-2   50      100     150     100     100
Zone-3   50      100     100     150     100
Zone-4   200     100     50      50      100
Average  100     100     100     100

Just looking at the averages, i.e. the aggregates:
 There is no change in sales, neither across time nor across geography.
 Aggregation can hide crucial facts.
 The average of 100 & 100 is the same as that of 150 & 50.
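
The point can be checked directly on the numbers in the table. The short sketch below computes the zone and week averages and, as one possible extra measure not on the slide, the spread (max minus min) per zone, which the averages conceal.

```python
from statistics import mean

# Weekly sales per zone, copied from the table above.
sales = {
    "Zone-1": [100, 100, 100, 100],
    "Zone-2": [50, 100, 150, 100],
    "Zone-3": [50, 100, 100, 150],
    "Zone-4": [200, 100, 50, 50],
}

zone_avg = {zone: mean(weeks) for zone, weeks in sales.items()}
week_avg = [mean(column) for column in zip(*sales.values())]

# The spread within each zone is lost once only the averages are kept.
spread = {zone: max(weeks) - min(weeks) for zone, weeks in sales.items()}

print(zone_avg)
print(week_avg)
print(spread)
```

All eight averages come out as 100, while the spread ranges from 0 for Zone-1 to 150 for Zone-4.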
Aggregation hides crucial facts chart
[Figure: line chart of weekly sales (0 to 250) for zones Z1 to Z4 over Week-1 to Week-4]

Z1: Sale is constant (need to work on it)
Z2: Sale went up, then fell (cause for concern)
Z3: Sale is on the rise, why?
Z4: Sale dropped sharply, need to look deeply.
W2: Static sale
Step 3: Choose Facts statement

“We need monthly sales volume and Afghanis by week, product and zone”
 Facts: sales volume and Afghanis
 Dimensions: week, product and zone
 Numeric facts are identified by answering the question "what are we measuring?"
Step 3: Choose Facts
 Choose the facts that will populate each fact table record.
 Remember that the best facts are numeric, continuously valued, additive and non-key.
 Example: Quantity Sold, Amount etc.
 All the candidate facts in a design must be true to the grain described in previous slides.
 The fact table record could be a single transaction, a weekly aggregate, monthly sums etc., i.e. a grain is always associated with it.
 Facts that clearly belong to a different grain must be in a separate fact table.


Step 4: Choose Dimensions

 Choose the dimensions that apply to each fact in the fact table.
 Typical dimensions: time, product, geography etc.
 Identify the descriptive attributes that explain each dimension.
 Determine hierarchies within each dimension.


Step-4: How to Identify a Dimension?
 The single-valued attributes during the recording of a transaction are dimensions.
Fact Table
 Dimensions: Calendar_Date, Time_of_Day, Account_No, ATM_Location, Transaction_Type
 Fact: Transaction_Afghanis
 None of the above dimensions changes during a single transaction.
 Time_of_Day: Morning, Mid Morning, Lunch Break etc.
 Transaction_Type: Withdrawal, Deposit, Check balance etc.


Step-4: Can Dimensions be Multi-valued?
 Are dimensions ALWAYS single-valued?
 Not really
 What are the problems, and how can they be handled?
 Calendar_Date (of inspection)
 Reg_No
 Technician
 Workshop
 Maintenance_Operation
 How many maintenance operations are possible?
 Few
 Maybe more for old cars.
 Such as oil change, air filter change, spark plug change, etc.


Step-4: Dimensions & Grain
 Several grains are possible as per business requirement.
 For some aggregations certain descriptions do not remain atomic.
 Example: Time_of_Day may change several times during a daily aggregate, but not during a transaction.
 Choose the dimensions that are applicable within the selected grain.
 Note that the higher the level of aggregation of the fact table, the fewer dimensions you can attach to the fact records.
 Note that none of the ATM example dimensions changes during a single transaction. However, for weekly aggregates of transactions probably only Account_No and ATM_Location can be treated as dimensions.
 The more granular the data, the more dimensions make sense.
 Hence the lowest-level data in any organization is the most dimensional.


Issues of Dimensional Modeling



Step 3: Additive vs. Non-Additive facts
 Additive facts are easy to work with
 Summing the fact value gives meaningful results
 Additive facts:
 Quantity sold
 Total Afghanis sales
 Non-additive facts:
 Averages (average sales price, unit price)
 Percentages (% discount)
 Ratios (gross margin)
 Count of distinct products sold

Month   Crates of Bottles Sold
May     14
Jun.    20
Jul.    24
TOTAL   58

Month   % discount
May     10
Jun.    8
Jul.    6
TOTAL   24% ← Incorrect!
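
The two tables can be reproduced in a few lines. The sketch below sums the additive fact, shows the meaningless sum of percentages, and then computes one possible overall discount by weighting each month's percentage by crates sold. The weighting is an assumption made here for illustration (it presumes every crate has the same list price); the slide itself only states that the plain sum is incorrect.

```python
crates_sold = {"May": 14, "Jun": 20, "Jul": 24}
discount_pct = {"May": 10, "Jun": 8, "Jul": 6}

# Additive fact: the sum is meaningful.
total_crates = sum(crates_sold.values())

# Non-additive fact: the sum of percentages means nothing.
naive_total_pct = sum(discount_pct.values())

# Assumed alternative: weight each month's discount by the crates sold that month.
weighted_pct = sum(
    crates_sold[month] * discount_pct[month] for month in crates_sold
) / total_crates

print(total_crates, naive_total_pct, round(weighted_pct, 2))
```

Storing the additive components (quantity and discount amount) and deriving the percentage at query time avoids the problem altogether.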
Step-3: Not recording Facts
 Transactional fact tables don’t have records for events that don’t occur
 Example: No records (rows) for products that were not sold.
 This has both an advantage and a disadvantage.
 Advantage: Benefit of sparsity of data
 Significantly less data to store for “rare” events
 Disadvantage (the problem): Lack of information
 Example: What products on promotion were not sold?


Step-3: A Fact-less Fact Table
 Solution: “Fact-less” fact table
 A fact table without numeric fact columns
 Captures relationships between dimensions
 Perhaps the many-to-many relationships of dimensions
 Use a dummy fact column that always has value 1
 Examples:
 Department/Student mapping fact table
 What is the major for each student?
 Which students did not enroll in ANY course?
 Promotion coverage fact table
 Which products were on promotion in which stores on which days?
 List the products that were on promotion but did not sell (or did sell).
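
A minimal sketch of the promotion coverage idea, using plain Python sets in place of tables. The product, store and day values are made up for the example; the coverage table holds only dimension keys and no numeric fact.

```python
# Factless fact table: one row per (product, store, day) that was on promotion.
promotion_coverage = {
    ("soap", "S1", "Mon"),
    ("soap", "S2", "Mon"),
    ("tea", "S1", "Mon"),
}

# Transactional sales fact: rows exist only for products that were sold.
sales_fact = [
    ("soap", "S1", "Mon", 3),   # last column is quantity sold
    ("rice", "S1", "Mon", 5),
]

sold = {(product, store, day) for product, store, day, _qty in sales_fact}

# "What products on promotion were not sold?" is a set difference.
promoted_not_sold = sorted(promotion_coverage - sold)
promoted_and_sold = sorted(promotion_coverage & sold)

print(promoted_not_sold)
print(promoted_and_sold)
```

Without the coverage table the first question cannot be answered, because the sales fact has no rows for events that did not happen.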



Step-3 Types of fact tables
 Transaction fact
 Grain is set at a single transaction
 Periodic snapshot fact
 Grain is set at a fixed time span
 Accumulating snapshot fact
 Each fact represents a process with a clear beginning and end


Examples
[Figure: example transaction, periodic snapshot and accumulating snapshot fact tables]

Transactional vs. Snapshot Facts
 Transactional
 Each fact row represents a discrete event
 Provides the most granular, detailed information
 Snapshot
 Each fact row represents a point-in-time snapshot
 Snapshots are taken at predefined time intervals
 Examples: Hourly, daily, or weekly snapshots
 Provides a cumulative view
 Used for continuous processes / measures of intensity
 Examples:
 Account balance
 Inventory level
 Room temperature



Transactional vs. Snapshot Facts
Transactional
Name    Date    Type    Amount
Brian   Oct. 1  CREDIT  +40
Rajeev  Oct. 1  CREDIT  +10
Brian   Oct. 3  DEBIT   -10
Rajeev  Oct. 3  CREDIT  +20
Brian   Oct. 4  DEBIT   -5
Brian   Oct. 4  CREDIT  +15
Rajeev  Oct. 4  CREDIT  +50
Brian   Oct. 5  DEBIT   -20
Rajeev  Oct. 5  DEBIT   -10
Rajeev  Oct. 5  DEBIT   -15

Snapshot
Name    Date    Balance
Brian   Oct. 1  40
Rajeev  Oct. 1  10
Brian   Oct. 2  40
Rajeev  Oct. 2  10
Brian   Oct. 3  30
Rajeev  Oct. 3  30
Brian   Oct. 4  40
Rajeev  Oct. 4  80
Brian   Oct. 5  20
Rajeev  Oct. 5  55


Transactional vs. Snapshot Facts
 Two complementary organizations
 Information content is similar
 Snapshot view can always be derived from the transactional fact
 But not the other way around.
 Why use snapshot facts?
 Sampling is the only option for continuous processes
 E.g. sensor readings
 Data compression
 Recording all transactional activity may be too much data!
 Stock price at each trade vs. opening / closing price
 Query expressiveness
 Some queries are much easier to ask/answer with snapshot fact
 Example: Average daily balance



A Difficult SQL Exercise

How to generate snapshot fact from transactional fact?

Transactional
Name    Date    Type    Amount
Brian   Oct. 1  CREDIT  +40
Rajeev  Oct. 1  CREDIT  +10
Brian   Oct. 3  DEBIT   -10
Rajeev  Oct. 3  CREDIT  +20
Brian   Oct. 4  DEBIT   -5
Brian   Oct. 4  CREDIT  +15
Rajeev  Oct. 4  CREDIT  +50
Brian   Oct. 5  DEBIT   -20
Rajeev  Oct. 5  DEBIT   -10
Rajeev  Oct. 5  DEBIT   -15

Snapshot
Name    Date    Balance
Brian   Oct. 1  40
Rajeev  Oct. 1  10
Brian   Oct. 2  40
Rajeev  Oct. 2  10
Brian   Oct. 3  30
Rajeev  Oct. 3  30
Brian   Oct. 4  40
Rajeev  Oct. 4  80
Brian   Oct. 5  20
Rajeev  Oct. 5  55
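
The slide poses this as a SQL exercise. As a language-neutral sketch of the underlying logic, the Python below computes a running balance per account and carries it forward over days without transactions. It assumes every account opens at zero and that days are numbered 1 to 5 for Oct. 1 to Oct. 5; both are simplifications made for the example.

```python
from collections import defaultdict

# (name, day of October, signed amount), copied from the transactional table.
transactions = [
    ("Brian", 1, 40), ("Rajeev", 1, 10),
    ("Brian", 3, -10), ("Rajeev", 3, 20),
    ("Brian", 4, -5), ("Brian", 4, 15), ("Rajeev", 4, 50),
    ("Brian", 5, -20), ("Rajeev", 5, -10), ("Rajeev", 5, -15),
]

def daily_snapshot(rows, first_day, last_day):
    """Return {(name, day): end-of-day balance} for every day in the range."""
    net_change = defaultdict(int)
    for name, day, amount in rows:
        net_change[(name, day)] += amount

    snapshot = {}
    for name in sorted({name for name, _day, _amount in rows}):
        balance = 0  # assumed opening balance
        for day in range(first_day, last_day + 1):
            # Days with no transactions simply carry the balance forward.
            balance += net_change.get((name, day), 0)
            snapshot[(name, day)] = balance
    return snapshot

snapshot = daily_snapshot(transactions, 1, 5)
for (name, day), balance in sorted(snapshot.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(name, f"Oct. {day}", balance)
```

Applied to the ten transactions listed here, it gives Brian 40, 40, 30, 40, 20 and Rajeev 10, 10, 30, 80, 55 for Oct. 1 to Oct. 5. The hard part in SQL is the same carry-forward step: Oct. 2 has no transaction rows, yet the snapshot needs a row for it.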
Accumulating Snapshot Facts
 Accumulating Snapshot is a third type of fact table
 Not as common as the other two
 Useful for pipelined processes
 Process proceeds through a series of stages
 1 fact row tracks an entire process through its lifetime
 Best for short-lived processes with linear workflow
 Example: Order fulfillment for custom manufacturing
 Order placed → Release to Mfg → Finished Goods Inventory → Shipped → Delivered → Invoiced → Returned
 A fact row, one per line on an order, is initially inserted when the order line is created
 As the order moves through the pipeline, the accumulating fact table row is revisited and updated
 Characteristics of accumulating snapshot facts
 Fact row is updated multiple times during process lifetime
 Different from append-only Transactional and Snapshot facts
 Separate date dimension roles for each milestone
 Numeric fact columns corresponding to various stages



Order fulfillment accumulating snapshot fact table.



Querying Accumulating Snapshots
 Reporting based on lag
 How long does a process spend in a given pipeline stage?
 Calculated by time lapse between dates
 Average lag as a measurement
 Report on current state of the process
 How many orders are currently in each stage?
 Reporting on historical state
 Combine the Periodic Snapshot and Accumulating Snapshot fact table
types
 Take a periodic snapshot of the “active” rows of the Accumulating
Snapshot fact
 How many unshipped orders were waiting in inventory now vs. three
months ago vs. six months ago?
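
The lag and current-state reports can be sketched on a toy accumulating snapshot. The class below is hypothetical: it keeps only four of the pipeline milestones, and the order dates are invented. Each row is created once and then updated in place as milestones are reached.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class OrderLineFact:
    """One row per order line, with one date column per milestone."""
    order_id: int
    placed: date
    released: Optional[date] = None
    shipped: Optional[date] = None
    delivered: Optional[date] = None

    def current_stage(self) -> str:
        # The latest milestone with a date tells us where the order is now.
        for stage in ("delivered", "shipped", "released"):
            if getattr(self, stage) is not None:
                return stage
        return "placed"

def lag_days(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Days spent between two milestones, or None if either is missing."""
    if start is None or end is None:
        return None
    return (end - start).days

rows = [
    OrderLineFact(1, placed=date(2015, 3, 1)),
    OrderLineFact(2, placed=date(2015, 3, 2), released=date(2015, 3, 3)),
    OrderLineFact(3, placed=date(2015, 3, 5)),
]
# The row for order 1 is revisited and updated as the order progresses.
rows[0].released = date(2015, 3, 4)
rows[0].shipped = date(2015, 3, 10)

stage_counts = Counter(row.current_stage() for row in rows)
lags = [lag for row in rows if (lag := lag_days(row.placed, row.released)) is not None]
average_lag = sum(lags) / len(lags)

print(stage_counts)
print(average_lag)
```

Here `stage_counts` answers "how many orders are currently in each stage?" and `average_lag` is the average number of days between the placed and released milestones.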



Semi-Additive Facts
 Snapshot facts are semi-additive
 Additive across non-date dimensions
 Not additive across date dimension
 Example:
 Meaningful: what's the total current balance for all accounts in the bank?
 Not meaningful: adding up the daily balances of a given account over the days of the month
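
A small sketch of the difference, using the Oct. 1 to Oct. 4 balances from the earlier snapshot example. Summing across accounts on one date is valid; across dates the usual substitute is an average, such as the average daily balance.

```python
# End-of-day balances for Oct. 1 to Oct. 4, taken from the snapshot example.
balances = {
    "Brian": [40, 40, 30, 40],
    "Rajeev": [10, 10, 30, 80],
}

# Additive across the account dimension: total held by the bank on Oct. 4.
total_on_oct_4 = sum(days[-1] for days in balances.values())

# Not additive across the date dimension: this sum has no business meaning.
meaningless_sum = sum(balances["Brian"])

# Averaging over dates is meaningful instead.
average_daily_balance = {
    name: sum(days) / len(days) for name, days in balances.items()
}

print(total_on_oct_4, meaningless_sum, average_daily_balance)
```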



Fact tables
 Ensure all records are at the declared grain.
Don’t mix grains
 Don’t add misfit attributes in fact table.
Consider junk dimension instead
 Store additive quantities in the fact table
 Example:
 Don’t store “unit price”
 Store “quantity sold” and “total price” instead



Step-4: Handling Multi-valued Dimensions?
 One of the following approaches is adopted:
 Drop the dimension.
 Drop the Maintenance_Operation dimension as it is multi-valued
 Use a primary value as a single value
 and omit the other values
 Such as 20,000 km maintenance or 40,000 km maintenance (a bundle)
 Add multiple values in the dimension table.
 Add a fixed number of additional columns to the dimension table
 A long list will result in many null entries


Step-4: OLTP & Slowly Changing Dimensions
 OLTP systems are not good at tracking the past. History never changes.
 OLTP systems are not “static”: they are always evolving, with data changed by overwriting.
 OLTP systems are unable to track history; data is purged after 90 to 180 days.
 In fact, an OLTP system does not want to keep historical data.


Step-4: DWH Dilemma: Slowly Changing Dimensions
 It is the responsibility of the DWH to track the changes.
 Example: Slight change in description, but the product ID (SKU) is not changed.
 Dilemma: We want to track both the old and the new descriptions. What do we use for the key? And where do we put the two values of the changed attribute?


Step-4: Explanation of Slowly Changing Dimensions…
 Compared to fact tables, contents of dimension tables are relatively stable.
 New sales transactions occur constantly (fact).
 New products are introduced rarely (dimension).
 New stores are opened very rarely (dimension).
 The assumption does not hold in some cases
 Certain dimensions evolve with time
 e.g. description and formulation of products change with time
 Customers get married and divorced, have children, change addresses etc.
 Land changes ownership etc.
 Changing names of sales regions.


Step-4: Explanation of Slowly Changing Dimensions…
 Although these dimensions change, the change is not rapid.
 Therefore they are called “Slowly” Changing Dimensions.


Step-4: Handling Slowly Changing Dimensions
 Option-1: Overwrite History
 Example: Code for a city or product was entered incorrectly
 Just overwrite the record, changing the values of the modified attributes.
 No keys are affected.
 No changes needed elsewhere in the DM.
 Cannot track history and hence not a good option in DSS.


Step-4: Handling Slowly Changing Dimensions
 Option-2: Create current valued field
 Example: The names and organization of the sales regions change over time, and we want to know how sales would have looked with the old regions.
 Add a new field called current_region and rename the old one to previous_region.
 Sales record keys are not changed.
 Only the TWO most recent values can be tracked.


Step-4: Handling Slowly Changing Dimensions
 Option-3: Preserve History
 Example: The packaging of a part changes from glued box to stapled box, but the code assigned (SKU) is not changed.
 Create an additional dimension record at the time of change with the new attribute values.
 Segments history accurately between the old and new descriptions
 Requires adding two to three version digits to the end of the key, e.g. SKU#+1, SKU#+2 etc.


Step-4: Pros and Cons of SCD Handling
 Option-0: Don’t update dimension at all
 Not up to date → wrong reports → businesses don’t like it
 Option-1: Overwrite existing value
 Simple to implement
 No tracking of history
 Option-2: Add a new field
 Accurate historical reporting to last TWO changes
 Record keys are unaffected
 Dimension table size increases
 Option-3: Add a new dimension row
 Accurate historical reporting
 Pre-computed aggregates unaffected
 Dimension table grows over time
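
The three ways of applying a change can be contrasted on dictionaries standing in for dimension rows. The function names and sample values are made up for this sketch; the numbering follows the earlier slides (Option-1 overwrite, Option-2 current-valued field, Option-3 new dimension record under a versioned key).

```python
def overwrite(row, field, value):
    """Option-1: change the attribute in place; the old value is lost."""
    row[field] = value

def shift_current(row, field, value):
    """Option-2: keep two values in current_<field> and previous_<field>."""
    row["previous_" + field] = row["current_" + field]
    row["current_" + field] = value

def add_version(dim, sku, **changes):
    """Option-3: add a new row keyed by (SKU, version); old rows stay intact."""
    latest = max(version for (s, version) in dim if s == sku)
    dim[(sku, latest + 1)] = {**dim[(sku, latest)], **changes}
    return (sku, latest + 1)

# Option-1: correcting a value that was entered incorrectly.
city = {"code": "HRT", "name": "Heart"}
overwrite(city, "name", "Herat")

# Option-2: after a second change the oldest value ("West") is gone.
region = {"current_region": "West"}
shift_current(region, "region", "North-West")
shift_current(region, "region", "Central")

# Option-3: old facts keep pointing at version 1, new facts use version 2.
product_dim = {("SKU42", 1): {"packaging": "glued box"}}
new_key = add_version(product_dim, "SKU42", packaging="stapled box")

print(city, region, new_key, sorted(product_dim))
```

Only Option-3 leaves earlier fact rows attached to the description that was valid when they were recorded, which is why it segments history accurately.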



Another SCD Handling Alternative - Example



Step-4 Junk Dimension
 Sometimes certain attributes don't fit nicely into any dimension
 Payment method (Cash vs. Credit Card vs. Check)
 Bagging type (Paper vs. Plastic vs. None)
 Create one or more "mix" dimensions
 Group together leftover attributes as a dimension even if not related
 Reduces number of dimension tables, width of fact table
 Works best if leftover attributes are
 Few in number
 Low in cardinality
 Correlated
 Other options
 Each leftover attribute becomes a dimension
 But ideally, the concatenated primary key of a fact table should consist of fewer than 10 foreign keys
 Eliminate leftover attributes that are not useful
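
One common way to populate such a "mix" dimension is to pre-build every combination of the leftover attributes and give each a surrogate key. The sketch below does that for the two attributes named on this slide; the key numbering is arbitrary and chosen only for the example.

```python
from itertools import product

payment_methods = ["Cash", "Credit Card", "Check"]
bagging_types = ["Paper", "Plastic", "None"]

# One dimension row per combination, keyed by a system-generated integer.
junk_dim = {
    key: {"payment_method": payment, "bagging_type": bagging}
    for key, (payment, bagging) in enumerate(
        product(payment_methods, bagging_types), start=1
    )
}

# Lookup used at load time to turn a pair of raw values into one foreign key.
junk_key = {
    (row["payment_method"], row["bagging_type"]): key
    for key, row in junk_dim.items()
}

print(len(junk_dim), junk_key[("Cash", "Paper")], junk_key[("Check", "None")])
```

The fact table then carries a single foreign key instead of two. This stays manageable only while the attributes are few and low in cardinality, since the row count is the product of their cardinalities (3 x 3 = 9 here).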



Step-4 Junk Dimensions
 A technique that reduces the number of foreign keys in a fact table is the creation of "junk" dimensions.
 These are just "made up" dimensions where you can put several of these single-level hierarchies.
 Simplify the model; don’t create a bunch of small dimensions.


Junk dimension example



Step-4 Degenerate Dimension
 Stored in the fact table
 Is a dimension key without a related dimension table
 Is neither a fact nor an attribute
 Provides grouping and business meaning


Step-4 Snowflake Dimension/schema
 In a star schema, dimension tables are not in normal form
 Redundant information about hierarchies
 Snowflaking normalizes the dimension tables
 Avoids redundancy → some storage savings
 Snowflaking is not recommended in most cases
 More tables = more complex design
 More tables → more joins → slower queries
 Space consumed by dimensions is small compared to facts
 Exception: Really big dimension tables


Snowflake dimension example
 Snowflake dimension/schema
 Non-snowflake dimension


Step-4 Outrigger dimension
 Not fully normalized
 One level removed from the fact table
 Use as needed, but don’t overuse


International Issues
 International organizations often have facts
denominated in different currencies
 Some transactions are in dollars, others in Euros, still others
in yen, etc.
 Reporting requirements may be diverse
 Standard currency vs. local currency
 Historical exchange rate vs. current exchange rate
 Time zones cause a similar problem
 Sometimes local time is most meaningful
 E.g. buying patterns are different in morning vs. afternoon
 Sometimes standardized time (e.g. GMT) is better
 Correctly express relative order of events



Handling Multiple Currencies
 Add a Currency dimension to the fact table
 Values are US Dollars, Yen, Euros, etc.
 Each currency-denominated fact gets 2 fact columns
 One column uses the local currency of the transaction
 The other column stores the equivalent value in standard currency
 Currency dimension is used to indicate the units being used in the local
currency column
 Historical exchange rate in effect the day of the transaction is used for
the conversion
 Create a special currency conversion table
 Store current conversion factor between each pair of currencies
 Used to generate reports in any currency of interest



Multi-Currency Example
Sales Fact
Product  Date  Currency  AmtLocal  AmtUSD
443      87    1         400       400
1287     87    4         1250      1447
34       88    2         3500      380

Currency Dimension
Key  Name            Abrv  Country
1    US Dollar       USD   USA
2    Japanese Yen    JPY   Japan
3    Pound Sterling  GBP   UK
4    Euro            EUR   Europe

Conversion Table
From  To  Factor
1     2   111.3
1     3   .562
1     4   .814
2     1   .0089
…     …   …
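
A sketch of how the conversion table might be used to report in a currency of interest. The function names are invented here. The stored AmtUSD column already reflects the historical rate on the transaction date, so only the current factor from the standard currency to the target currency is applied.

```python
USD = 1  # key of the standard currency in the currency dimension

# (from_key, to_key) -> current conversion factor, copied from the table above.
CONVERSION = {(1, 2): 111.3, (1, 3): 0.562, (1, 4): 0.814, (2, 1): 0.0089}

# Rows of the sales fact table shown above.
sales_fact = [
    {"product": 443, "date": 87, "currency": 1, "amt_local": 400, "amt_usd": 400},
    {"product": 1287, "date": 87, "currency": 4, "amt_local": 1250, "amt_usd": 1447},
    {"product": 34, "date": 88, "currency": 2, "amt_local": 3500, "amt_usd": 380},
]

def convert(amount, from_key, to_key):
    """Convert an amount using the current factor for the currency pair."""
    if from_key == to_key:
        return amount
    return amount * CONVERSION[(from_key, to_key)]

def report_total(to_key):
    """Total sales expressed in the requested reporting currency."""
    total_usd = sum(row["amt_usd"] for row in sales_fact)
    return convert(total_usd, USD, to_key)

print(report_total(USD))
print(round(report_total(4), 2))  # total in Euro at the current factor
```

Summing `amt_local` across rows would mix currencies, so the standard-currency column is the one to aggregate.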
Master-Detail Facts
 Consider order data from an e-commerce site
 Each Order consists of a series of Lineitems
 Each Lineitem represents one product that is
purchased
 Measurements are calculated at different levels
 Each Lineitem has Quantity and Price
 Each Order has Tax, Discount, and ShippingFee
 Natural design: two fact tables, different grains
 Orders fact table with 1 row per order
 Lineitem fact table with 1 row per line item



Orders and Lineitems

Orders Header Fact Order Lineitem Fact


 Dimensions  Dimensions
 Date  Date
 Customer  Customer
 OrderID (degenerate)  Product
 Fact Columns  OrderID (degenerate)
 Tax  Fact Columns
 Discount  Quantity
 ShippingFee  Price
 TotalPrice



A Problem with the Design
 Difficult to report on revenue/income by product
 Orders fact lacks Product dimension
 Adding Product would violate the grain
 Lineitem fact lacks important revenue data
 Effects of discount, tax, shipping are important
 But they are not captured at the lineitem level!
 Solution: allocation of master-level facts to detail-level
 Add Tax, Discount, and ShippingFee columns to the Lineitem fact table
 Distribute Tax, Discount, and ShippingFee for the order among its
component line items
 Sum of allocated Tax for all line items in an order = actual overall Tax for
that order
 Different allocation policies are possible
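
One possible allocation policy is to split each header amount in proportion to the line totals. The sketch below works in integer cents and hands any leftover cents to the largest lines, so that the allocated amounts always add up exactly to the header value. The policy and the sample numbers are assumptions made for illustration.

```python
def allocate(header_cents, line_totals):
    """Split header_cents across lines in proportion to line_totals.

    Assumes non-negative integer amounts and at least one non-zero line total.
    """
    grand_total = sum(line_totals)
    shares = [header_cents * total // grand_total for total in line_totals]

    # Floor division can leave a few cents unassigned; give them to the
    # largest lines first so the sum matches the header exactly.
    remainder = header_cents - sum(shares)
    largest_first = sorted(range(len(shares)), key=lambda i: -line_totals[i])
    for i in largest_first[:remainder]:
        shares[i] += 1
    return shares

# An order with tax of 10.00 and three line items worth 50.00, 30.00 and 20.00.
tax_per_line = allocate(1000, [5000, 3000, 2000])
print(tax_per_line, sum(tax_per_line))

# Amounts that do not divide evenly still add up to the header value.
print(allocate(100, [1, 1, 1]))
```

The same function can be applied to Discount and ShippingFee, after which revenue by product can be reported from the Lineitem fact table alone.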



Allocating header facts to the line item.



Any Question???
