Lecture 4

The document discusses multidimensional data modeling and OLAP. It introduces data cubes as a way to model and view data in multiple dimensions. Dimensions include tables like products and time, and facts include measures like sales. It provides an example of a star schema and how a cube with three dimensions can represent it. It then discusses OLAP operations like roll-up, drill-down, slice and dice used to analyze aggregated data in cubes.

CST3340

Multidimensional Data Modeling and OLAP

Slides adapted from Jiawei Han, Micheline Kamber, Jian Pei (2011), Data Mining: Concepts and Techniques, Third Edition, The Morgan Kaufmann Series in Data Management Systems.

CST3340 _ Business Intelligence


From Tables and Spreadsheets to Data Cubes

• The multidimensional data model allows a data warehouse to be viewed in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
– Dimension tables describe each dimension, such as product (product_name, category, sub_category) or time (day, week, month, quarter, year).
– The fact table contains measures (such as units_sold, Revenue, Profit, Cost) and foreign keys to each of the related dimension tables.

Example of a Star Schema

• Time dimension: Time_SK, Day, Week, Month, Quarter, Year
• Product dimension: Product_SK, Name, Subcategory, Category, Supplier_Type
• Location dimension: Location_SK, Store, Street, City, District, Country
• Sales Fact Table: Time_SK, Product_SK, Location_SK, Units_sold, Unit_cost, Revenue, Profit, Average_Sales
Key: the _SK columns are surrogate keys; the remaining columns of the fact table are facts.

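The star schema above can be sketched in a relational database. The following is a minimal illustration using Python's built-in sqlite3 module, not a production design. The lower-case table names (time_dim, product_dim, location_dim, sales_fact) and all the rows are invented for the example, and Average_Sales is left out for brevity.

```python
import sqlite3

# In-memory database; table names and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_sk INTEGER PRIMARY KEY, day TEXT, week INTEGER,
                       month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE product_dim (product_sk INTEGER PRIMARY KEY, name TEXT,
                          subcategory TEXT, category TEXT, supplier_type TEXT);
CREATE TABLE location_dim (location_sk INTEGER PRIMARY KEY, store TEXT, street TEXT,
                           city TEXT, district TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_sk INTEGER REFERENCES time_dim(time_sk),
    product_sk INTEGER REFERENCES product_dim(product_sk),
    location_sk INTEGER REFERENCES location_dim(location_sk),
    units_sold INTEGER, unit_cost REAL, revenue REAL, profit REAL);
""")

conn.executemany("INSERT INTO time_dim VALUES (?,?,?,?,?,?)", [
    (1, "2024-01-15", 3, "Jan", "Q1", 2024),
    (2, "2024-04-10", 15, "Apr", "Q2", 2024)])
conn.executemany("INSERT INTO product_dim VALUES (?,?,?,?,?)", [
    (1, "Laptop A", "Laptop", "Computers", "Wholesale"),
    (2, "Phone B", "Phone", "Mobile", "Direct")])
conn.executemany("INSERT INTO location_dim VALUES (?,?,?,?,?,?)", [
    (1, "Store 1", "1 High St", "London", "Central", "UK"),
    (2, "Store 2", "2 Main St", "Toronto", "Downtown", "Canada")])
conn.executemany("INSERT INTO sales_fact VALUES (?,?,?,?,?,?,?)", [
    (1, 1, 1, 2, 500.0, 1400.0, 400.0),
    (1, 2, 2, 3, 200.0, 900.0, 300.0),
    (2, 1, 2, 1, 500.0, 700.0, 200.0),
    (2, 1, 1, 1, 500.0, 700.0, 200.0)])

# Join the fact table to two dimensions and aggregate the Revenue measure.
rows = conn.execute("""
    SELECT l.country, t.quarter, SUM(f.revenue)
    FROM sales_fact f
    JOIN time_dim t ON f.time_sk = t.time_sk
    JOIN location_dim l ON f.location_sk = l.location_sk
    GROUP BY l.country, t.quarter
    ORDER BY l.country, t.quarter
""").fetchall()
print(rows)
```

Each result row corresponds to one cell of a two-dimensional (Country by Quarter) view of the sales cube.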
Data Cube 1
• A star schema with 3 dimensions can be represented by a cube.
• Each axis of the cube represents a dimension.
• The cells of the cube store the measures.
• An n-dimensional star schema can be represented by an n-dimensional cube (a hypercube), but for n > 3 this cannot be visualised directly.


Visualisation of Multi-dimensional Modelling

[Figure: Sales data cube with three axes: Location (USA, Canada, UK), Time (Q1, Q2, Q3) and Product (PC, Phone, Laptop).]


Data Cube 2
• Different types of data are stored in the cube:
– Data values such as unit sales, profit, costs.
– Aggregated values such as sum, average etc.
– Column and row totals, such as:
• Sum of sales in Q1.
• Total sales in Canada.
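As a rough sketch of how a cube can hold both data values and totals, the Python below stores cells in a dictionary keyed by (country, quarter, product) and adds a "sum" member on every dimension. The cell values, the ALL marker and the with_totals helper are invented for illustration.

```python
from collections import defaultdict

# Invented sample cells: (country, quarter, product) -> units sold.
cells = {
    ("Canada", "Q1", "PC"): 10,
    ("Canada", "Q2", "PC"): 20,
    ("UK", "Q1", "PC"): 5,
    ("UK", "Q1", "Phone"): 7,
}

ALL = "sum"  # hypothetical marker meaning "aggregated over this dimension"

def with_totals(cells):
    """Return a cube holding the base cells plus every partial and grand total."""
    cube = defaultdict(int)
    for (country, quarter, product), value in cells.items():
        # Each base cell contributes to 2**3 = 8 cells: itself and every
        # combination in which one or more dimensions is replaced by ALL.
        for c in (country, ALL):
            for q in (quarter, ALL):
                for p in (product, ALL):
                    cube[(c, q, p)] += value
    return dict(cube)

cube = with_totals(cells)
print(cube[(ALL, "Q1", ALL)])      # sum of sales in Q1
print(cube[("Canada", ALL, ALL)])  # total sales in Canada
print(cube[(ALL, ALL, ALL)])       # grand total
```

The sketch assumes that no real dimension member is called "sum"; a real system would use a dedicated "all" member instead.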


A Sample Data Cube

[Figure: Sample data cube with axes Quarter (Q1, Q2, Q3, Q4, sum), Product type (TV, PC, Phone, sum) and Country (U.S.A, Canada, UK, sum). One highlighted cell holds the total annual sales of TVs in U.S.A.]
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

SAMPLE CUBE with aggregates (totals)

[Figure: The sample data cube with axes Quarter (Q1, Q2, Q3, Q4, sum), Product (TV, PC, VCR, sum) and Country (U.S.A, Canada, UK, sum), with its aggregate cells labelled: total annual sales of TV in U.S.A.; total annual sales of PC in U.S.A.; total annual sales of Phone in U.S.A.; total Q1 sales and total sales in U.S.A., in Canada and in UK; total Q1 sales and total Q2 sales in all countries; TOTAL SALES.]
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

What is OLAP?

• OLAP (On-Line Analytical Processing) is used to process data in a multidimensional cube.
• Data are stored in data cubes, which are defined over multiple dimensions.
• Used to discover trends and patterns in the data using
– Business intelligence reports
– Complex calculations
– Forecasting – what-if analysis
– Ad-hoc queries
• Used to power many business applications such as
– Business Performance Management
– Knowledge Discovery
– Simulation Models


OLAP Operations 1

• Roll up (drill-up): data is summarised over dimensions, giving aggregated data
– Moving up the hierarchy of a dimension.
– Reducing the number of dimensions.
• Drill down (roll down): opposite of roll-up, giving more detailed data
– Moving down the hierarchy of a dimension.
– Increasing the number of dimensions.
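The two forms of roll-up, and drill-down as their inverse, can be sketched on a small dictionary-based cube. This is only an illustration: the monthly figures, the MONTH_TO_QUARTER mapping and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Invented monthly sales: (country, product, month) -> units sold.
monthly = {
    ("UK", "TV", "Jan"): 4, ("UK", "TV", "Feb"): 6, ("UK", "TV", "Apr"): 3,
    ("Canada", "TV", "Jan"): 2, ("Canada", "PC", "May"): 8,
}

MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def roll_up_time(cube):
    """Roll up by moving up the Time hierarchy: months -> quarters."""
    out = defaultdict(int)
    for (country, product, month), value in cube.items():
        out[(country, product, MONTH_TO_QUARTER[month])] += value
    return dict(out)

def roll_up_drop(cube, axis):
    """Roll up by removing one dimension, summing over all of its members."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:axis] + key[axis + 1:]] += value
    return dict(out)

def drill_down_time(cube, quarter):
    """Drill down: return the monthly cells that make up one quarter.
    This needs the detailed (monthly) cube; aggregates alone cannot be split."""
    return {k: v for k, v in cube.items() if MONTH_TO_QUARTER[k[2]] == quarter}

quarterly = roll_up_time(monthly)
by_country_product = roll_up_drop(monthly, axis=2)  # drop the Time dimension
q1_detail = drill_down_time(monthly, "Q1")
print(quarterly)
print(by_country_product)
print(q1_detail)
```

Drill-down is only possible when the more detailed data has been kept, which is why the helper reads from the monthly cube rather than the quarterly one.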


OLAP Operations 2

• Slice:
– Using a slice of the cube: one dimension is fixed to a single value.
– Reducing the number of dimensions.
• Dice:
– Using a sub-cube.
– Reducing the number of values considered on each dimension.
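A minimal Python sketch of slice and dice on a dictionary-based cube follows. The sample figures and the helper names slice_cube and dice are invented for illustration.

```python
# Invented sample cube: (country, product, month) -> units sold.
cube = {
    ("Canada", "Laptop", "Jan"): 5,
    ("Canada", "PC", "Apr"): 3,
    ("France", "Laptop", "Apr"): 4,
    ("UK", "Laptop", "Apr"): 6,
    ("UK", "TV", "Jan"): 2,
    ("France", "Phone", "Jan"): 1,
}

def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def dice(cube, countries, products, months):
    """Dice: keep a subset of values on each dimension; all dimensions remain."""
    return {k: v for k, v in cube.items()
            if k[0] in countries and k[1] in products and k[2] in months}

april = slice_cube(cube, axis=2, value="Apr")              # slice on 1 dimension
april_laptops = slice_cube(april, axis=1, value="Laptop")  # slice on 2 dimensions
sub_cube = dice(cube, {"Canada", "France"}, {"PC", "Laptop"}, {"Jan", "Apr"})
print(april)
print(april_laptops)
print(sub_cube)
```

Note the difference in the result keys: each slice removes a dimension from the key, while dice keeps all three dimensions and only narrows the values allowed on each.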


OLAP Operations 3

• Pivot (rotate):
– Considering different faces of the cube.
– Visualising the 3D cube as a series of 2D planes.
• Other operations
– Drill across: uses the fact constellation, i.e. queries across several fact tables.
– Drill through: uses SQL to access the data in the original relational tables.
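Pivoting changes only the orientation of the cube, not its contents. As a hedged sketch, the Python below reorders the axes of a small dictionary-based cube; the data, the DIMS tuple and the pivot helper are assumptions for illustration.

```python
from itertools import permutations

DIMS = ("Product", "Location", "Time")

# Invented sample cube keyed in the order given by DIMS.
cube = {("PC", "UK", "Q1"): 5, ("PC", "Canada", "Q2"): 7, ("Phone", "UK", "Q1"): 2}

def pivot(cube, dims, new_order):
    """Rotate the cube by reordering its axes; the cell values are unchanged."""
    idx = [dims.index(d) for d in new_order]
    return {tuple(key[i] for i in idx): v for key, v in cube.items()}

# Three dimensions can be ordered in 3! = 6 ways, so a 3D cube has six orientations.
views = list(permutations(DIMS))
rotated = pivot(cube, DIMS, ("Time", "Location", "Product"))
print(len(views))
print(rotated)
```

Fixing the last axis of a rotated cube to each of its values in turn gives the series of 2D planes mentioned above.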


Location Dimension Hierarchy

all
– Europe: Spain, Germany
– North America: Canada, Mexico
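Rolling up along this hierarchy amounts to replacing each member by its parent and summing. A small sketch, assuming a PARENT lookup table and invented sales figures per country:

```python
from collections import defaultdict

# Parent of each member in the Location hierarchy shown above.
PARENT = {
    "Spain": "Europe", "Germany": "Europe",
    "Canada": "North America", "Mexico": "North America",
    "Europe": "all", "North America": "all",
}

def roll_up(sales, parent=PARENT):
    """Move every member one level up the hierarchy and sum the values."""
    out = defaultdict(int)
    for member, value in sales.items():
        out[parent[member]] += value
    return dict(out)

# Invented sales per country.
by_country = {"Spain": 10, "Germany": 15, "Canada": 7, "Mexico": 3}
by_continent = roll_up(by_country)
overall = roll_up(by_continent)
print(by_continent)
print(overall)
```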


Original Data Cube

[Figure: Data cube with axes Month (Jan, Feb, Mar, Apr, May), Product Category (TV, PC, Phone, Laptop) and Country (U.S.A, Canada, UK, France).]
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

Example of Roll Up

[Figure: Data cube with axes Quarter (Q1, Q2, Q3, Q4), Product Category (TV, PC, Phone, Laptop) and Country (U.S.A, Canada, UK, France).]
The data on the Time dimension is rolled up from months to quarters. The data is aggregated.
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

Example of Roll Down

[Figure: Data cube with axes Month (Jan, Feb, Mar, Apr), Product Category (TV, PC, Phone, Laptop) and Country (U.S.A, Canada, UK, France).]
The data on the Time dimension is rolled down from quarters to months. The data is more detailed.
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

Example of Slice on 1 dimension

[Figure: Slice of the cube with Month fixed to Apr; the remaining axes are Product Category (TV, PC, Phone, Laptop) and Country (U.S.A, Canada, UK, France).]
A slice of the cube is extracted. For example, consider the sales in Apr. This produces a sub-cube.
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

Example of Slice on 2 dimensions

[Figure: Slice of the cube with Month fixed to Apr and Product fixed to Laptop; the remaining axis is Country (U.S.A, Canada, UK, France).]
A slice of the cube is extracted. For example, consider the sales of laptops in Apr. This produces a sub-cube.
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

Example of Dice

[Figure: Sub-cube with axes Month (Jan, Apr), Product Category (PC, Laptop) and Country (Canada, France).]
A sub-cube is extracted. For example, consider the sales of Laptops and PCs in January and April in Canada and France.
Source: Jiawei Han, Micheline Kamber, Jian Pei (2011)

Example of Rotation

[Figure: Cube with axes Product, Location and Time.]
Rotation gives six different views of the data (data slicing).

Data Rotation/Slicing – Different views of the data

[Figure: Six 2D views of the cube, one per ordered pair of axes: Product by Time, Location by Time, Location by Product, Time by Product, Time by Location, Product by Location.]

Different ways data is stored

• The data cube is a good way to visualise the data in an OLAP environment.
• However, data is not stored on disc as a data cube.
• There are three different logical data models:
– ROLAP: Relational Online Analytical Processing.
– MOLAP: Multi-dimensional Online Analytical Processing.
– HOLAP: Hybrid Online Analytical Processing.


ROLAP: Relational Online Analytical Processing 1

• Data is stored and manipulated by a relational DBMS.
• Gives the appearance of traditional OLAP functionality.
• Advantages:
– Handles large amounts of data.
– Has access to relational database functionality.
– Easy to modify, and maintenance is low.
– Uses star and snowflake schemas.
– Proven technology (DBMS and relational model).
– Can outperform a multidimensional database for large data sets.


ROLAP: Relational Online Analytical Processing 2

• Disadvantages:
– Performance can be slow.
• ROLAP can require multiple complex SQL queries, which can be slow for large data sets.
– Limited by the functionality of SQL.
• SQL does not meet all the requirements of ROLAP, such as complex calculations.
• Many vendors mitigate this risk by building in complex functions or allowing user-defined functions.
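In a ROLAP setting, OLAP operations are translated into SQL. The sketch below uses Python's built-in sqlite3 module to show a roll-up as GROUP BY, a slice as WHERE, and a user-defined function for a calculation that is awkward in plain SQL. The single denormalised sales table, its rows and the margin_pct function are invented for illustration; a real ROLAP server would run against a star or snowflake schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalised table, used here only to keep the example short.
conn.execute("CREATE TABLE sales (country TEXT, product TEXT, quarter TEXT, "
             "revenue REAL, cost REAL)")
conn.executemany("INSERT INTO sales VALUES (?,?,?,?,?)", [
    ("UK", "PC", "Q1", 100.0, 60.0),
    ("UK", "PC", "Q2", 150.0, 90.0),
    ("Canada", "PC", "Q1", 200.0, 150.0),
    ("Canada", "Phone", "Q1", 50.0, 20.0),
])

# Roll-up: aggregate away the product and quarter dimensions with GROUP BY.
rollup = conn.execute(
    "SELECT country, SUM(revenue) FROM sales GROUP BY country ORDER BY country"
).fetchall()

# Slice: fix the quarter dimension to a single value with WHERE.
slice_q1 = conn.execute(
    "SELECT country, product, revenue FROM sales WHERE quarter = 'Q1' "
    "ORDER BY country, product"
).fetchall()

# A user-defined function extends SQL with a custom calculation.
def margin_pct(revenue, cost):
    return round(100.0 * (revenue - cost) / revenue, 1)

conn.create_function("margin_pct", 2, margin_pct)
margins = conn.execute(
    "SELECT country, margin_pct(SUM(revenue), SUM(cost)) FROM sales "
    "GROUP BY country ORDER BY country"
).fetchall()

print(rollup)
print(slice_q1)
print(margins)
```

Each query here scans the base table when it is run, which illustrates why ROLAP performance can degrade on large data sets.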


MOLAP: Multi-dimensional Online Analytical Processing 1
• Data (facts) are stored in multidimensional arrays, separate from the data warehouse.
• Known as a multidimensional database (MDDB).
• Users have direct access to the arrays.
• Dimensions are used as indexes for the arrays.
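The idea of using dimensions as array indexes can be sketched as follows. The ArrayCube class is a hypothetical illustration, not the storage layout of any particular MOLAP product: each dimension member is mapped to a position, and the positions are combined into a single offset into a flat array.

```python
class ArrayCube:
    """Toy multidimensional array addressed by dimension members."""

    def __init__(self, *dims):
        # One lookup table per dimension: member name -> position on that axis.
        self.index = [{member: i for i, member in enumerate(d)} for d in dims]
        self.sizes = [len(d) for d in dims]
        total = 1
        for n in self.sizes:
            total *= n
        self.data = [0] * total  # every cell is allocated, even if never filled

    def _offset(self, key):
        # Row-major addressing: offset = ((i0 * n1) + i1) * n2 + i2 ...
        off = 0
        for lookup, size, member in zip(self.index, self.sizes, key):
            off = off * size + lookup[member]
        return off

    def __setitem__(self, key, value):
        self.data[self._offset(key)] = value

    def __getitem__(self, key):
        return self.data[self._offset(key)]


cube = ArrayCube(["USA", "Canada", "UK"],     # Location
                 ["PC", "Phone", "Laptop"],   # Product
                 ["Q1", "Q2", "Q3"])          # Time
cube[("Canada", "Laptop", "Q2")] = 12         # invented value
print(cube[("Canada", "Laptop", "Q2")], len(cube.data))
```

A cell is found by arithmetic rather than by searching or joining, which is the source of MOLAP's fast retrieval. The cost is that all 27 cells are allocated even though only one holds data, so real systems need techniques for sparse cubes.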


MOLAP: Multi-dimensional Online Analytical Processing 2
• Advantages:
– Excellent performance: MOLAP cubes are designed for fast and efficient retrieval.
– Optimal efficiency for slicing and dicing operations.
– Performance of complex calculations is optimised, as many calculations are generated as the cube is formed.


MOLAP: Multi-dimensional Online Analytical Processing 3
• Disadvantages:
– Limited data: because calculations are performed as the cube is created, not all data can be included.
– Requires additional investment in human and capital resources.
• Extra technology required to store and manipulate cubes.
• Extra technicians required to support the MDDB.


HOLAP: Hybrid Online Analytical Processing 1

• ROLAP and MOLAP combined.
• Detailed data is stored in an RDBMS.
• Aggregated data is stored in an MDBMS.
• Users access the MDBMS via MOLAP tools.


HOLAP: Hybrid Online Analytical Processing 2

[Figure: HOLAP architecture showing an RDBMS server (relational data), an MDBMS server (multidimensional data), multidimensional viewers and relational viewers, connected by SQL and read-access links.]


ROLAP vs MOLAP vs HOLAP

                            ROLAP                                       MOLAP                          HOLAP
Storage Method              Relational DB                               Multidimensional DB            Both
Data Arrangement & schemas  Star schemas. Tables with rows and columns  Data cubes                     Multidimensional
Volume                      Enormous amount of data processed           Limited data can be processed  Large amounts of data processed
Technique                   SQL                                         Sparse matrix technology       SQL and sparse matrix technology
Design View                 Dynamic                                     Static                         Dynamic


Reading

Chapter 4, Section 4.2:
Jiawei Han, Micheline Kamber, Jian Pei (2011), Data Mining: Concepts and Techniques, Third Edition, The Morgan Kaufmann Series in Data Management Systems.

Chapter 3, Section 3.6:
Sharda, Delen, Turban (2018), Business Intelligence, Analytics, and Data Science: A Managerial Perspective. Pearson.
