Data Mining: OLAP Operations

The document discusses OLAP operations and data warehousing. It describes five OLAP operations performed on a data cube: roll-up, drill-down, slice, dice, and pivot. It also explains three common data warehouse schemas - star schema, snowflake schema, and fact/galaxy schema. Finally, it summarizes the ETL process used to integrate data from source systems into a data warehouse, including extract, transform, and load steps.


Data Mining

OLAP Operations

 OLAP stands for Online Analytical Processing.

 Applications of OLAP include:

o Finance and accounting.

o Sales and marketing.

o Production.

There are five operations performed on a data cube:

o Roll-up

o Drill down

o Slice

o Dice

o Pivot

[Diagram: operations on a cube — Roll-up, Drill-down, Slice, Dice, Pivot]

1) Roll-up: -

o When the roll-up operation is performed on a data cube, one or more dimensions of that cube are removed.

o The roll-up operation performs aggregation on a data cube by climbing up the concept hierarchy of a dimension, so the number of dimensions is reduced.

o Here, more detailed data is summarized into less detailed data.

o Consider a cube of 3 dimensions.

o Here we observe that cities such as Tumkur, Mysore, Pune, and Bombay are climbed up to the country level, hence the number of dimensions is reduced.
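The roll-up above can be sketched with plain Python dictionaries; the city names are taken from the example, but the sales numbers and function names are illustrative only:

```python
from collections import defaultdict

# Hypothetical city-level fact rows: (city, country, units_sold).
sales = [
    ("Tumkur", "India", 10),
    ("Mysore", "India", 15),
    ("Pune", "India", 20),
    ("Bombay", "India", 25),
]

def roll_up(rows):
    """Climb the location hierarchy: aggregate city-level detail to country level."""
    totals = defaultdict(int)
    for _city, country, units in rows:
        totals[country] += units  # detail rows collapse into the parent level
    return dict(totals)

print(roll_up(sales))  # {'India': 70}
```

After the roll-up, the city level disappears and only the country-level totals remain.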

2) Drill-down: -
o It is the reverse operation of roll-up.

o When we perform drill-down on a data cube, dimensions are added to the cube.

o It means moving from a higher-level summary to a lower-level summary.

o Here, less detailed data becomes more detailed data. Consider drilling down on the time dimension, for example from quarter to month.
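A minimal drill-down sketch on the time dimension (the months and values here are hypothetical): the cube is viewed at quarter level, then drilled down to the finer month level it was built from.

```python
# Month-level cells keyed by (quarter, month) -- illustrative values only.
monthly = {("Q1", "Jan"): 30, ("Q1", "Feb"): 25, ("Q1", "Mar"): 20,
           ("Q2", "Apr"): 40}

def quarter_view(cells):
    """Higher-level (less detailed) summary over the time hierarchy."""
    out = {}
    for (quarter, _month), units in cells.items():
        out[quarter] = out.get(quarter, 0) + units
    return out

def drill_down(cells, quarter):
    """Step down the time hierarchy: quarter -> month."""
    return {m: u for (q, m), u in cells.items() if q == quarter}

print(quarter_view(monthly))      # {'Q1': 75, 'Q2': 40}
print(drill_down(monthly, "Q1"))  # {'Jan': 30, 'Feb': 25, 'Mar': 20}
```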




3) Slice: -

o It performs selection on one dimension of the given cube and produces a new sub-cube.

o We select a single value for one particular dimension.
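A slice sketch (the cube cells below are hypothetical): fixing one dimension, here time = "Q1", yields a sub-cube over the remaining dimensions.

```python
# Cells keyed by (quarter, city, item) -- illustrative values only.
cube = {("Q1", "Tumkur", "Pens"): 5, ("Q1", "Mysore", "Books"): 7,
        ("Q2", "Tumkur", "Pens"): 9}

def slice_cube(cells, quarter):
    """Select one value on the time dimension; drop that axis from the keys."""
    return {(city, item): v for (q, city, item), v in cells.items()
            if q == quarter}

print(slice_cube(cube, "Q1"))  # {('Tumkur', 'Pens'): 5, ('Mysore', 'Books'): 7}
```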

4) Dice: -

This operation selects two or more dimensions from a given cube and produces a new sub-cube.
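A dice sketch (hypothetical cells): unlike slice, the selection is applied on two dimensions at once, time and location, yielding a smaller sub-cube.

```python
# Cells keyed by (quarter, city, item) -- illustrative values only.
cube = {("Q1", "Tumkur", "Pens"): 5, ("Q1", "Mysore", "Books"): 7,
        ("Q2", "Tumkur", "Pens"): 9, ("Q2", "Pune", "Books"): 4}

def dice(cells, quarters, cities):
    """Keep only cells whose quarter AND city both fall in the chosen sets."""
    return {k: v for k, v in cells.items()
            if k[0] in quarters and k[1] in cities}

print(dice(cube, {"Q1", "Q2"}, {"Tumkur"}))
```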




5) Pivot: -

o It is also called rotation.

o It is a technique of changing from one dimensional orientation to another, carrying the values along with it.
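A pivot (rotation) sketch with hypothetical cells: the quarter and city axes are exchanged, so what were rows become columns, and the values travel with the cells.

```python
# Cells keyed by (quarter, city) -- illustrative values only.
table = {("Q1", "Tumkur"): 5, ("Q1", "Mysore"): 7, ("Q2", "Tumkur"): 9}

# Rotate: swap the two axes in every key; the values are unchanged.
pivoted = {(city, quarter): v for (quarter, city), v in table.items()}

print(pivoted)  # {('Tumkur', 'Q1'): 5, ('Mysore', 'Q1'): 7, ('Tumkur', 'Q2'): 9}
```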

Data ware house Schema

A schema is the overall structure or design of objects such as tables, views, and indexes.

A data warehouse is designed using one of these three schemas:

o Star Schema

o Snow flake Schema

o Fact / Galaxy Schema




1. Star Schema:

o It is the simplest schema and very easy to understand.

o It has only one fact table, which stores foreign keys referring to a number of dimension tables.

o In this schema all dimension tables are "not normalized". This means fewer tables are used, and therefore fewer joins are needed.

o It is suitable for query processing.

Ex:

Consider four dimension tables - book, college, employee, and student - and one fact table, university.

o In the above star schema, "University" is the fact table, which refers to all the other tables.

o This schema is most suitable for query processing because simple queries suffice.

o The problem is more data redundancy, because the dimension tables are denormalized.
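A star-schema query can be sketched with dictionaries standing in for tables; the column names and values below are hypothetical, and only two of the four dimension tables are shown. The single fact table holds foreign keys, and each key resolves one denormalized dimension row in a single join.

```python
# Denormalized dimension tables, keyed by their primary keys.
book_dim = {1: {"title": "DBMS", "price": 300}}
college_dim = {10: {"name": "SIT"}}

# The one fact table stores foreign keys into the dimension tables.
fact = [{"book_id": 1, "college_id": 10, "copies": 4}]

# A simple star-join: one lookup per dimension, no multi-level joins.
report = [{"title": book_dim[f["book_id"]]["title"],
           "college": college_dim[f["college_id"]]["name"],
           "copies": f["copies"]} for f in fact]

print(report)  # [{'title': 'DBMS', 'college': 'SIT', 'copies': 4}]
```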




2. Snowflake schema:

o It is similar to the star schema: it also has only one "fact" table, which refers to a number of dimension tables.

o The difference is that in this schema all "dimension tables are normalized", and these tables can have multiple levels.

o Because the tables are normalized, more tables are used and more joins are needed in order to get a result.

o The advantage is less redundancy: because the tables are normalized, the dimension tables are easy to update and maintain.

Consider the same four dimension tables, but now we take the book table to the next level of normalization.
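The extra join in a snowflake schema can be sketched as follows; splitting a publisher table out of the book dimension is a hypothetical choice here, as are all names and values. Reaching the publisher name now takes one more join (book → publisher) than it would in a star schema.

```python
# The book dimension is normalized into two levels: book and publisher.
publisher = {100: {"pname": "Pearson"}}
book = {1: {"title": "DBMS", "publisher_id": 100}}

fact = [{"book_id": 1, "copies": 4}]

# Two hops: fact -> book -> publisher (a star schema would need only one).
rows = [{"title": book[f["book_id"]]["title"],
         "publisher": publisher[book[f["book_id"]]["publisher_id"]]["pname"],
         "copies": f["copies"]} for f in fact]

print(rows)  # [{'title': 'DBMS', 'publisher': 'Pearson', 'copies': 4}]
```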

3. Fact / Galaxy schema:

o In this schema we can use multiple fact tables that share common dimension tables.

o It is a complex schema because multiple fact tables must be maintained.

o Dimension tables can also be very large. For example, suppose we have two fact tables, sales and publisher, but one shared dimension table, book.

[Diagram: galaxy schema - a fact table (bid, title, price) and a publisher fact table (pid, bid, lid, transaction) share the book dimension table; bid is the primary key of the dimension table and a foreign key in both fact tables.]
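The sharing of one dimension by two fact tables can be sketched like this (all table contents and column names are hypothetical): both the sales and publisher fact tables resolve book details through the same book dimension, keyed by bid.

```python
# One shared dimension table, keyed by bid.
book_dim = {1: {"title": "DBMS", "price": 300}}

# Two independent fact tables that both carry the bid foreign key.
sales_fact = [{"bid": 1, "units": 12}]
publisher_fact = [{"pid": 100, "bid": 1, "transaction": "T1"}]

# Each fact table joins to the same dimension -- no duplicated book data.
sold_titles = [book_dim[f["bid"]]["title"] for f in sales_fact]
published_titles = [book_dim[f["bid"]]["title"] for f in publisher_fact]

print(sold_titles, published_titles)  # ['DBMS'] ['DBMS']
```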

ETL process: [Extract, Transform, Load]

o The process of moving data from traditional databases into a data warehouse is called the ETL process.

o Transactional databases cannot answer complex analytical questions, so we use ETL.

o ETL provides a method of moving data from various sources into the data warehouse.

[Diagram: data from different sources (Source 1, Source 2, Source 3) is extracted into a staging area, transformed there, and then loaded into the data warehouse.]

1. Extract: [data extraction]

In this step, data is extracted from multiple sources.

2. Data transformation:




The second step of the ETL process is transformation.

After extraction the data is raw and not directly useful, so we need to apply transformations such as:

 Filtering: loading only particular data into the data warehouse.

 Cleaning: making the data accurate.

 Joining: combining multiple columns into one column.

 Splitting: dividing a single column into multiple columns.

 Merging: merging the data from multiple sources.

 Sorting: arranging the data in a chosen order.

Here, the staging area gives an opportunity to validate the extracted data before it is moved to the data warehouse.
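Some of the transformation steps above can be sketched on a few hypothetical staging records; the field names and values are illustrative only.

```python
# Raw records sitting in the staging area -- one row is dirty.
records = [
    {"name": "anu rao", "city": "Tumkur", "units": 5},
    {"name": "ravi k", "city": "", "units": -1},  # invalid row
    {"name": "sita m", "city": "Mysore", "units": 3},
]

filtered = [r for r in records if r["units"] > 0]               # Filtering
cleaned = [{**r, "name": r["name"].title()} for r in filtered]  # Cleaning
split = [{**r, "first": r["name"].split()[0],                   # Splitting
          "last": r["name"].split()[1]} for r in cleaned]
ordered = sorted(split, key=lambda r: r["units"])               # Sorting

print([r["first"] for r in ordered])  # ['Sita', 'Anu']
```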

3. Data loading:

The third and final step of the ETL process is loading.

In this step, the transformed data is finally loaded into the data warehouse on a daily, weekly, monthly, or yearly schedule.
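The three steps can be put together in one small end-to-end sketch; the sources, field names, and warehouse list below are all hypothetical stand-ins for real systems.

```python
# Two source systems with raw, inconsistently typed rows.
source1 = [{"city": "Tumkur", "units": "5"}]
source2 = [{"city": "mysore", "units": "3"}]

# 1. Extract: pull raw rows from every source into the staging area.
staging = source1 + source2

# 2. Transform: fix types and casing while the data sits in staging.
transformed = [{"city": r["city"].title(), "units": int(r["units"])}
               for r in staging]

# 3. Load: append the validated rows into the warehouse fact table.
warehouse = []
warehouse.extend(transformed)

print(warehouse)  # [{'city': 'Tumkur', 'units': 5}, {'city': 'Mysore', 'units': 3}]
```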

Prepared By: K C Silpa
