
Data Warehousing and Mining (MU-Sem 5-Comp.) (Data Warehousing Fundamentals)….Page no. (1-29)

 1.8.4 Junk Dimension

 A junk dimension is a grouping of typically low-cardinality attributes, so that you can remove them from the main dimension.
 You can use a junk dimension to implement a rapidly changing dimension, i.e. to store attributes that change rapidly, such as flags, weights, BMI (body mass index), etc.

 1.8.5 Degenerated Dimension

 A degenerated dimension is a dimension that is derived from the fact table and does not have its own dimension table. For example, a receipt number does not have a dimension table associated with it. Such details are kept for information purposes only.

 1.8.6 Role Playing Dimension

Dimensions which are often used for multiple purposes within the same database are called role-playing dimensions. For example, you can use a date dimension for "date of sale", as well as "date of delivery" or "date of hire".

 1.9 MAJOR STEPS IN ETL PROCESS

 The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading. Note that ETL refers to a broad process, and not three well-defined steps.
 It is a process in which an ETL tool extracts the data from various data source systems, transforms it in the staging area and then, finally, loads it into the data warehouse system.
 The ETL process requires active inputs from various stakeholders, including developers, analysts, testers and top executives, and is technically challenging.
 To maintain its value as a tool for decision-makers, a data warehouse needs to change with business changes.
 ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to be agile, automated, and well documented.
 ETL Tools : The most commonly used ETL tools are Sybase, Oracle Warehouse Builder, CloverETL and MarkLogic.

(1A16)Fig. 1.9.1 : ETL Process

Let us understand each step of the ETL process in depth:

1. Extraction

 The first step of the ETL process is extraction. In this step, data from various source systems, which can be in various formats like relational databases, NoSQL, XML and flat files, is extracted into the staging area.
 It is important to extract the data from the various source systems and store it in the staging area first, and not directly in the data warehouse, because the extracted data is in various formats and can also be corrupted.
 Hence, loading it directly into the data warehouse may damage it, and rollback will be much more difficult. Therefore, this is one of the most important steps of the ETL process.

Data Extraction Techniques

There are two types of data warehouse extraction methods: Logical and Physical extraction.

(A) Logical Extraction

The Logical Extraction method in turn has two methods:

(i) Full Extraction

 In this method, data is completely extracted from the source system. The source data is provided as-is, and no additional logical information is necessary on the source system. Since it is a complete extraction, there is no need to track the source system for changes.
 For example, exporting a complete table in the form of a flat file.

(ii) Incremental Extraction

 In incremental extraction, the changes in source data need to be tracked since the last successful extraction.

(MU-New Syllabus w.e.f academic year 21-22)(M5-56) Tech-Neo Publications...A SACHIN SHAH Venture

 Only these changes in the data will be extracted and then loaded. Identifying the last-changed data is itself a complex process and may involve a lot of logic.
 You can detect the changes in the source system from a specific column in the source system that holds the last-changed timestamp. You can also create a change table in the source system, which keeps track of the changes in the source data.

(B) Physical Extraction

Physical extraction has two methods: Online and Offline extraction.

(i) Online Extraction

In this process, the extraction process connects directly to the source system and extracts the source data.

(ii) Offline Extraction

 The data is not extracted directly from the source system but is staged explicitly outside the original source system.
 You can consider the following common structures in offline extraction:
1. Flat file : Generic format
2. Dump file : Database-specific file

2. Transformation

 The second step of the ETL process is transformation. In this step, a set of rules or functions is applied to the extracted data to convert it into a single standard format.
 It may involve the following processes/tasks:
1. Filtering – loading only certain attributes into the data warehouse.
2. Cleaning – filling up NULL values with default values, mapping U.S.A, United States and America to USA, etc.
3. Joining – joining multiple attributes into one.
4. Splitting – splitting a single attribute into multiple attributes.
5. Sorting – sorting tuples on the basis of some attribute (generally a key attribute).

Data Transformation Techniques

 Data Smoothing : This method is used for removing noise from a dataset. Noise refers to distorted and meaningless data within a dataset. Smoothing uses algorithms to highlight the special features in the data. After removing noise, the process can detect small changes in the data and thus special patterns.
 Data Aggregation : Aggregation is the process of collecting data from a variety of sources and storing it in a single format. Here, data is collected, stored, analyzed and presented in a report or summary format. It helps in gathering more information about a particular data cluster. The method helps in collecting vast amounts of data.
 Discretization : This is a process of converting continuous data into a set of data intervals. Continuous attribute values are substituted by small interval labels. This makes the data easier to study and analyze.
 Generalization : In this process, low-level data attributes are transformed into high-level data attributes using concept hierarchies. For example, numeric age values (20, 30) in a dataset are transformed to a higher conceptual level as a categorical value (young, old).
 Attribute construction : In the attribute construction method, new attributes are created from an existing set of attributes. For example, in a dataset of employee information, the attributes can be employee name, employee ID and address. These attributes can be used to construct another dataset that contains information about only those employees who joined in the year 2019. This method of construction makes mining more efficient and helps in creating new datasets quickly.
 Normalization : Also called data pre-processing, this is one of the crucial techniques for data transformation in data mining. Here, the data is transformed so that it falls within a given range. When attributes are on different ranges or scales, data modelling and mining can be difficult. Normalization helps in applying data mining algorithms and extracting data faster.

3. Loading

 The third and final step of the ETL process is loading. In this step, the transformed data is finally loaded into the data warehouse.
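The three steps can be combined in a toy end-to-end sketch. The rows, the in-memory "staging area" and "warehouse" here are all hypothetical; the point is only to show filtering-style cleaning and the country-name mapping from the transformation tasks above:

```python
# Extraction: raw rows pulled into a staging area (mixed spellings, NULLs).
staging = [
    {"name": "Asha",  "country": "U.S.A",         "sales": 100},
    {"name": "Ravi",  "country": "United States", "sales": None},
    {"name": "Meera", "country": "USA",           "sales": 250},
]

# Transformation: cleaning (default for NULLs, map country variants to USA).
COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(row):
    return {
        "name": row["name"],
        "country": COUNTRY_MAP.get(row["country"], row["country"]),
        "sales": row["sales"] if row["sales"] is not None else 0,
    }

# Loading: write the standardized rows into the warehouse table.
warehouse = [transform(r) for r in staging]
```

In a real pipeline each stage would read from and write to databases or files, but the shape of the flow (extract into staging, transform to one standard format, load) is the same.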


 Sometimes the data is loaded into the data warehouse very frequently, and sometimes it is done after longer but regular intervals.
 The rate and period of loading solely depend on the requirements and vary from system to system.
 Loading can be carried out in two ways:
(A) Refresh : The data warehouse data is completely rewritten, i.e. the older file is replaced. Refresh is usually used in combination with static extraction to populate a data warehouse initially.
(B) Update : Only the changes applied to the source information are added to the data warehouse. An update is typically carried out without deleting or modifying pre-existing data. This method is used in combination with incremental extraction to update data warehouses regularly.

 1.10 OLTP VS OLAP

 We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general, we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
 OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and an effectiveness measured by the number of transactions per second. An OLTP database holds detailed and current data, and the schema used to store transactional databases is the entity model (usually 3NF).
 OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually the star schema). For example, a bank storing years of historical records of check deposits could use an OLAP database to provide reporting to business users. OLAP databases are divided into one or more cubes. The cubes are designed in such a way that creating and viewing reports becomes easy. At the core of the OLAP concept is the OLAP cube, a data structure optimized for very quick data analysis. The OLAP cube consists of numeric facts called measures, which are categorized by dimensions. The OLAP cube is also called the hypercube.

(1A17)Fig. 1.10.1 : OLTP Vs OLAP Operations

 The following table summarizes the major differences between OLTP and OLAP system design.

Table 1.10.1 : OLTP Vs OLAP

Parameters | OLTP | OLAP
Process | It is an online transactional system. It manages database modification. | OLAP is an online analysis and data retrieving process.
Characteristic | It is characterized by a large number of short online transactions. | It is characterized by a large volume of data.
Functionality | OLTP is an online database modifying system. | OLAP is an online database query management system.
Method | OLTP uses a traditional DBMS. | OLAP uses the data warehouse.
Query | Insert, Update, and Delete information from the database. | Mostly Select operations.
Table | Tables in an OLTP database are normalized. | Tables in an OLAP database are not normalized.
Source | OLTP and its transactions are the sources of data. | Different OLTP databases become the source of data for OLAP.
Storage | The size of the data is relatively small as the historical data is archived, e.g. MB, GB. | A large amount of data is stored, typically in TB, PB.
Data integrity | An OLTP database must maintain data integrity constraints. | An OLAP database does not get modified frequently. Hence, data integrity is not an issue.
Response time | Its response time is in milliseconds. | Response time is in seconds to minutes.
Data quality | The data in the OLTP database is always detailed and organized. | The data in an OLAP process might not be organized.
Usefulness | It helps to control and run fundamental business tasks. | It helps with planning, problem-solving, and decision support.
Operation | Allows read/write operations. | Only read, and rarely write.
Audience | It is a market-orientated process. | It is a customer-orientated process.
Query type | Queries in this process are standardized and simple. | Complex queries involving aggregations.
Back-up | Complete backup of the data combined with incremental backups. | OLAP only needs a backup from time to time. Backup is not as important as in OLTP.
Design | DB design is application oriented. Example: database design changes with the industry, like Retail, Airline, Banking, etc. | DB design is subject oriented. Example: database design changes with subjects, like sales, marketing, purchasing, etc.
User type | It is used by data-critical users like clerks, DBAs and database professionals. | Used by data knowledge users like workers, managers, and CEOs.
Purpose | Designed for real-time business operations. | Designed for analysis of business measures by category and attributes.
Performance metric | Transaction throughput is the performance metric. | Query throughput is the performance metric.
Number of users | This kind of database allows thousands of users. | This kind of database allows only hundreds of users.
Productivity | It helps to increase the user's self-service and productivity. | Helps to increase the productivity of business analysts.
Challenge | Data warehouses historically have been a development project which may prove costly to build. | An OLAP cube is not an open SQL-server data warehouse; therefore, technical knowledge and experience are essential to manage the OLAP server.
Process | It provides fast results for daily used data. | It ensures that the response to a query is consistently quick.
Characteristic | It is easy to create and maintain. | It lets the user create a view with the help of a spreadsheet.
Style | OLTP is designed to have fast response time, low data redundancy, and is normalized. | A data warehouse is created uniquely so that it can integrate different data sources for building a consolidated database.
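The contrast between the two workloads can be sketched with Python's built-in sqlite3 module (hypothetical orders table): OLTP work is a short transaction touching one current row, while OLAP work is a read-only aggregation over many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                 [("East", 100.0), ("East", 50.0), ("West", 75.0)])

# OLTP style: a short transaction modifying a single current row.
conn.execute("UPDATE orders SET amount = 120.0 WHERE id = 1")
conn.commit()

# OLAP style: a read-only aggregation (mostly SELECT) over many rows.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall())
```

In practice the two workloads would run on separate systems, with the warehouse fed from the OLTP databases by the ETL process described earlier.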

 1.11 OLAP OPERATIONS

 In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple
levels of abstraction defined by concept hierarchies. This organization provides users with the flexibility to view data
from different perspectives.
 For example, if we have attributes such as day, temperature and humidity, we can group values into subsets and name these subsets, thus obtaining a set of hierarchies as shown in Fig. 1.11.1.

(1A18)Fig. 1.11.1

 OLAP provides a user-friendly environment for interactive data analysis. A number of OLAP data cube operations
exist to materialize different views of data, allowing interactive querying and analysis of the data.
 The most popular end-user operations on dimensional data are:

1. Roll-Up

 The roll-up operation (also called drill-up or aggregation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction (removing one or more dimensions). Let us explain roll-up with an example:
 Consider the following cube illustrating the temperatures of certain days, recorded weekly:

Temperature | 64 | 65 | 68 | 69 | 70 | 71 | 72 | 75 | 80 | 81 | 83 | 85
Week1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0
Week2 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 0 | 1 | 0 | 0

 Consider that we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) for temperature in the above cube. This defines a concept hierarchy over the temperature dimension, and the roll-up operation groups the data by these levels of temperature.
 To do this, we have to group the columns and add up the values according to the concept hierarchy. This operation is known as a roll-up.
 By doing this, we get the following cube:

Temperature | Cool | Mild | Hot
Week1 | 2 | 1 | 1
Week2 | 1 | 3 | 1

2. Drill-Down

 The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is like zooming in on the data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed either by stepping down a concept hierarchy for a dimension or by adding additional dimensions.
 The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy which is defined as day, month, quarter, and year. Drill-down appears by descending the time hierarchy from the level of the quarter to the more detailed level of the month.
 Because a drill-down adds more detail to the given data, it can also be performed by adding a new dimension to a cube. For example, a drill-down on the central cubes of the figure can occur by introducing an additional dimension, such as a customer group.
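The roll-up above can be sketched directly: bin each temperature value into its level via the concept hierarchy, then sum the counts per level.

```python
# The weekly temperature cube from the text: counts of readings per value.
temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
cube = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}

def level(t):
    """Concept hierarchy: cool (64-69), mild (70-75), hot (80-85)."""
    if t <= 69:
        return "cool"
    if t <= 75:
        return "mild"
    return "hot"

def roll_up(counts):
    """Group the temperature columns and add up the counts per level."""
    out = {"cool": 0, "mild": 0, "hot": 0}
    for t, c in zip(temps, counts):
        out[level(t)] += c
    return out

rolled = {week: roll_up(counts) for week, counts in cube.items()}
```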


Drilling the weekly cube down to the day level gives:

Temperature | Cool | Mild | Hot
Day 1 | 0 | 0 | 0
Day 2 | 0 | 0 | 0
Day 3 | 0 | 0 | 1
Day 4 | 0 | 1 | 0
Day 5 | 1 | 0 | 0
Day 6 | 0 | 0 | 0
Day 7 | 1 | 0 | 0
Day 8 | 0 | 0 | 0
Day 9 | 1 | 0 | 0
Day 10 | 0 | 1 | 0
Day 11 | 0 | 1 | 0
Day 12 | 0 | 1 | 0
Day 13 | 0 | 0 | 1
Day 14 | 0 | 0 | 0

3. Slice

 A slice is a subset of the cube corresponding to a single value for one or more members of a dimension. For example, a slice operation is executed when the customer wants a selection on one dimension of a three-dimensional cube, resulting in a two-dimensional view. So, the slice operation performs a selection on one dimension of the given cube, thus resulting in a sub-cube.
 For example, if we make the selection temperature = cool, we obtain the cool column of the day-level cube:

Temperature | Cool
Day 1 | 0
Day 2 | 0
Day 3 | 0
Day 4 | 0
Day 5 | 1
Day 6 | 0
Day 7 | 1
Day 8 | 0
Day 9 | 1
Day 10 | 0
Day 11 | 0
Day 12 | 0
Day 13 | 0
Day 14 | 0

4. Dice

 The dice operation describes a sub-cube obtained by performing a selection on two or more dimensions.
 For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube, we get the following sub-cube (still two-dimensional):

Temperature | Cool | Hot
Day 3 | 0 | 1
Day 4 | 0 | 0

5. Pivot

 The pivot operation is also called rotation. Pivot is a visualization operation which rotates the data axes in view to provide an alternative presentation of the data. It may involve swapping the rows and columns, or moving one of the row dimensions into the column dimensions.

 Example : Let us look at some typical OLAP operations for multidimensional data. Each of the following operations is illustrated below. The figure shows a data cube for Digi1 Electronics sales. The cube contains the dimensions location, time, and item, where location is aggregated with respect to city values, time is aggregated with respect to quarters, and item is aggregated with respect to item types. The measure displayed is dollars sold (in thousands). (For improved readability, only some of the cubes' cell values are shown.) The data examined are for the cities Chicago, New York, Toronto, and Vancouver.


(1A19)Fig. 1.11.2: Pivot Operation on Multidimensional Cube

1. Roll-up Operation

(1A20) Fig. 1.11.3: Roll-up Operation on Location Dimension


2. Drill-down Operation

(1A21) Fig. 1.11.4: Drill-down Operation on Time Dimension

3. Slice Operation

(1A22) Fig. 1.11.5: Slice Operation for Time Dimension


4. Dice Operation

(1A23) Fig. 1.11.6: Dice Operation for Location and Time Dimensions

5. Pivot Operation

(1A24) Fig. 1.11.7: Pivot Operation for Location and Item Dimensions

Ex. 1.11.1 : Consider a data warehouse for a hospital where there are three dimensions: (a) Doctor (b) Patient (c) Time. Consider two measures: (i) Count (ii) Charge, where charge is the fee that the doctor charges a patient for a visit. For the above example, create a cube and illustrate the following OLAP operations: (1) Roll-up (2) Drill-down (3) Slice (4) Dice (5) Pivot.
 Soln. : 1. Roll-up Operation

(1A25) Fig. P. 1.11.1(a): Roll-up Operation on Doctors Dimension


2. Drill-down Operation

(1A26) Fig. 1.11.1(b): Drill-down Operation on Time Dimension

3. Slice Operation

(1A27)Fig. 1.11.1(c): Slice Operation for Time Dimension


4. Dice Operation

(1A28) Fig. 1.11.1(d): Dice Operation on Doctors and Time Dimensions

(1A29) Fig. 1.11.1(e): Pivot Operation on Doctors and Patients Dimensions
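The hospital cube from Ex. 1.11.1 can be sketched in the same spirit. The records below are hypothetical; count and charge are the two measures, and roll-up and slice are shown on the doctor and time dimensions respectively:

```python
# Hypothetical (doctor, patient, time) facts with the charge measure.
facts = [
    {"doctor": "Dr. A", "patient": "P1", "quarter": "Q1", "charge": 500},
    {"doctor": "Dr. A", "patient": "P2", "quarter": "Q1", "charge": 700},
    {"doctor": "Dr. B", "patient": "P1", "quarter": "Q2", "charge": 300},
]

def roll_up_by(facts, dim):
    """Roll-up: aggregate (count, total charge) along one dimension."""
    out = {}
    for f in facts:
        key = f[dim]
        cnt, total = out.get(key, (0, 0))
        out[key] = (cnt + 1, total + f["charge"])
    return out

by_doctor = roll_up_by(facts, "doctor")          # roll-up on Doctor
q1_slice = [f for f in facts if f["quarter"] == "Q1"]   # slice on Time = Q1
```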

 1.12 OLAP SERVERS

There are three types of OLAP servers, namely, Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP) and
Hybrid OLAP (HOLAP).

1. Relational OLAP (ROLAP)

 Relational On-Line Analytical Processing (ROLAP) works mainly on data that resides in a relational database, where the base data and dimension tables are stored as relational tables.
 ROLAP servers are placed between the relational back-end server and client front-end tools.
 ROLAP servers use RDBMS to store and manage warehouse data, and OLAP middleware to support missing
pieces.
 Example : DSS Server of Microstrategy

Advantages of ROLAP

1. ROLAP can handle large amounts of data.
2. Can be used with data warehouse and OLTP systems.

Disadvantages of ROLAP

1. Limited by SQL functionalities.
2. Hard to maintain aggregate tables.


(1A30)Fig.1.12.1 : ROLAP Server

2. Multidimensional OLAP (MOLAP)

 Multidimensional On-Line Analytical Processing (MOLAP) supports multidimensional views of data through array-based multidimensional storage engines.
 With multidimensional data stores, the storage utilization may be low if the data set is sparse.
 Example : Oracle Essbase
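The storage-utilization point can be illustrated with a toy comparison between a dense array layout and a sparse mapping (the 10×10×10 cube and its cell values are hypothetical):

```python
# A 10 x 10 x 10 cube with only 5 non-zero cells is 0.5% dense.
DIMS = (10, 10, 10)
nonzero = {(0, 1, 2): 5.0, (3, 3, 3): 1.0, (9, 0, 0): 2.0,
           (4, 5, 6): 7.0, (8, 8, 1): 3.0}

# A dense, array-based layout allocates every cell of the cube.
dense_cells = DIMS[0] * DIMS[1] * DIMS[2]

# A sparse layout stores only the populated cells.
sparse_cells = len(nonzero)

utilization = sparse_cells / dense_cells   # fraction of dense cells used
```

This is why MOLAP engines typically apply sparse-matrix techniques: with real cubes, the share of populated cells is often far below one percent.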

Advantages of MOLAP

1. Optimal for slice and dice operations.


2. Performs better than ROLAP when data is dense.
3. Can perform complex calculations.

Disadvantages of MOLAP

1. Difficult to change dimension without re-aggregation.


2. MOLAP can handle only a limited amount of data.

(1A31)Fig.1.12.2 : MOLAP Server

3. Hybrid OLAP (HOLAP)

 Hybrid On-Line Analytical Processing (HOLAP) is a combination of ROLAP and MOLAP.


 HOLAP provides the greater scalability of ROLAP and the faster computation of MOLAP.
 Example: Microsoft SQL Server 2000


Advantages of HOLAP

1. HOLAP provides the advantages of both MOLAP and ROLAP.


2. Provides fast access at all levels of aggregation.

Disadvantage of HOLAP

1. HOLAP architecture is very complex because it supports both MOLAP and ROLAP servers.

(1A32)Fig. 1.12.3: HOLAP Server


 1.13 APPLICATIONS OF OLAP

The purpose of an OLAP system is to analyze the business, which helps in decision-making, forecasting, planning and problem solving. Some of the applications of OLAP include:

1. Financial Applications
 Resource (man-power, raw material) allocation
 Budgeting

2. Sales Applications
 Research on market analysis
 Forecasting sales
 Analyzing sales promotions
 Analyzing customer requirements
 Dividing the market based on customers

3. Business Modelling
 Understanding and simulating market trends and business behaviour
 Decision support systems for managers, executives, CEOs and data scientists

 1.14 HYPERCUBE

 Multidimensional databases can present their data to an application using two types of cubes: the hypercube and multi-cubes. In a hypercube, as shown in Fig. 1.14.1, all data appears logically as a single cube. All parts of the manifold represented by this hypercube have identical dimensionality. Each dimension belongs to one cube only; a dimension is owned by the hypercube. This simplicity makes it easy for users to understand.
 Designing a hypercube model is a top-down process with three major steps:
1. You decide which process of the business you want to capture in the model, such as sales activity.
2. You identify the values that you want to capture, such as sales amounts. This information is always numeric.
3. You identify the granularity of the data, meaning the lowest level of detail that you want to capture. These elements are the dimensions; time, geography, product, and customer are some common dimensions. For example, a single cell in a cube could refer to the sales amount of Sony TVs in the first quarter of the year, in PA, USA.
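The three design steps can be sketched as a cube keyed by its dimensions. The measure values below are hypothetical, following the Sony TV example:

```python
# Step 1: business process = sales. Step 2: measure = sales amount (numeric).
# Step 3: granularity = the (time, geography, product) dimensions.
sales_cube = {
    ("Q1", "PA, USA", "Sony TV"): 125000.0,
    ("Q1", "PA, USA", "Laptop"):  98000.0,
    ("Q2", "NY, USA", "Sony TV"): 87000.0,
}

# A single cell: Sony TV sales in the first quarter of the year, in PA, USA.
cell = sales_cube[("Q1", "PA, USA", "Sony TV")]
```

Every cell of the hypercube is addressed by one member from each dimension, which is exactly what "identical dimensionality for all parts of the cube" means in practice.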


(1A33)Fig. 1.14.1 : Hypercube

 In the multi-cube model, data is segmented into a set of smaller cubes, each of which is composed of a subset of the available dimensions, as shown in Fig. 1.14.2. Multi-cubes are used to handle multiple fact tables, each with different dimensionality.
 A dimension can be part of multiple cubes. Dimensions are not owned by any one cube, as under the hypercube model. Rather, they are available to all cubes, and there can also be dimensions that do not belong to any cube. This makes the model much more efficient and versatile. It is also a more efficient way of storing very sparse data, and it can reduce the pre-calculation database explosion effect, which will be covered in a later section.
 The drawback is that this is less straightforward than the hypercube and can carry a steeper learning curve. Some systems use a combined approach of hypercube and multi-cubes by separating the storage, processing, and presentation layers: data is stored as multi-cubes but presented as a hypercube.

 1.15 AGGREGATE FACT TABLES

 Aggregate fact tables are special fact tables in a data warehouse that contain new metrics derived from one or more aggregate functions (AVERAGE, COUNT, MIN, MAX, etc.) or from other specialized functions that output totals derived from a grouping of the base data.
 These new metrics, called "aggregate facts" or "summary statistics", are stored and maintained in the data warehouse database in special fact tables at the grain of the aggregation.
 Likewise, the corresponding dimensions are rolled up and condensed to match the new grain of the fact.
 These specialized tables are used as substitutes whenever possible for answering user queries. The reason is speed: querying a tidy aggregate table is much faster and uses much less disk I/O than the base, atomic fact table, especially if the dimensions are large as well.
 If you want to wow your users, start adding aggregates. You can even use this "trick" in your operational systems to serve as a foundation for operational reports.
 For example, take the "Orders" business process from an online catalog company, where you might have customer orders in a fact table called FactOrders with dimensions Customer, Product, and OrderDate.
 With possibly millions of orders in the transaction fact, it makes sense to start thinking about aggregates.
 To further the above example, assume that the business
is interested in a report: “Monthly orders by state and
product type”.
 While you could generate this easily enough using the
FactOrders fact table, you could likely speed up the
data retrieval for the report by at least half (but likely
much, much more) using an aggregate.
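The "Monthly orders by state and product type" aggregate can be sketched with sqlite3. The FactOrders columns and sample rows here are hypothetical, simplified from the dimensions named in the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE FactOrders
                (customer TEXT, state TEXT, product_type TEXT,
                 order_month TEXT, amount REAL)""")
conn.executemany(
    "INSERT INTO FactOrders VALUES (?, ?, ?, ?, ?)",
    [("c1", "PA", "TV",     "2019-01", 300.0),
     ("c2", "PA", "TV",     "2019-01", 200.0),
     ("c3", "NY", "Laptop", "2019-01", 900.0)])

# Aggregate fact table at the (month, state, product type) grain: the
# report then reads a handful of pre-summed rows instead of every order.
conn.execute("""CREATE TABLE AggOrdersMonthly AS
                SELECT order_month, state, product_type,
                       SUM(amount) AS total_amount, COUNT(*) AS order_count
                FROM FactOrders
                GROUP BY order_month, state, product_type""")

rows = conn.execute(
    "SELECT state, total_amount, order_count FROM AggOrdersMonthly "
    "ORDER BY state").fetchall()
```

In a real warehouse the aggregate would be refreshed as part of the ETL load, and the dimensions (month, state, product type) are the rolled-up versions of OrderDate, Customer and Product.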

(1A34)Fig. 1.14.2 : Multi-cube

 1.16 MULTIPLE CHOICE QUESTIONS

Q. 1.1 Among the following, which is not a type of business data?
(a) Real time data (b) Application data
(c) Reconciled data (d) Derived data
Ans. : (b)

Q. 1.2 A data warehouse is which of the following?
(a) Can be updated by end users.
(b) Contains numerous naming conventions and formats.
(c) Organized around important subject areas.
(d) Contains only current data.
Ans. : (c)

Q. 1.3 An operational system is which of the following?
(a) A system that is used to run the business in real time and is based on historical data.
(b) A system that is used to run the business in real time and is based on current data.
(c) A system that is used to support decision making and is based on current data.
(d) A system that is used to support decision making and is based on historical data.
Ans. : (b)

Q. 1.4 What is the type of relationship in a star schema?
(a) many-to-many (b) one-to-one
(c) many-to-one (d) one-to-many
Ans. : (d)

Q. 1.5 Fact tables are _______.
(a) completely denormalized.
(b) partially denormalized.
(c) completely normalized.
(d) partially normalized.
Ans. : (c)

Q. 1.6 A data warehouse is volatile, because obsolete data are discarded.
(a) True (b) False
Ans. : (b)

Q. 1.7 Which is NOT a basic conceptual schema in data modeling of data warehouses?
(a) Star Schema (b) Tree Schema
(c) Snowflake Schema (d) Fact Constellation Schema
Ans. : (b)

Q. 1.8 Among the following, which is not a characteristic of a data warehouse?
(a) Integrated (b) Volatile
(c) Time-variant (d) Subject-oriented
Ans. : (b)

Q. 1.9 What is not considered an issue in data warehousing?
(a) Optimization (b) Data transformation
(c) Extraction (d) Intermediation
Ans. : (d)

Q. 1.10 Which is NOT considered a standard querying technique?
(a) Roll-up (b) Drill-down
(c) DSS (d) Pivot
Ans. : (c)

Q. 1.11 Among the following, which is not a type of business data?
(a) Real time data (b) Application data
(c) Reconciled data (d) Derived data
Ans. : (b)

Q. 1.12 A snowflake schema has which of the following types of tables?
(a) Fact (b) Dimension
(c) Helper (d) All of the above
Ans. : (d)

Q. 1.13 The extract process is which of the following?
(a) Capturing all of the data contained in various operational systems
(b) Capturing a subset of the data contained in various operational systems
(c) Capturing all of the data contained in various decision support systems
(d) Capturing a subset of the data contained in various decision support systems
Ans. : (b)

Q. 1.14 Which of the following is not true regarding characteristics of warehoused data?
(a) Changed data will be added as new data
(b) A data warehouse can contain historical data
(c) Obsolete data are discarded
(d) Users can change data once entered into the data warehouse
Ans. : (d)

Q. 1.15 Which of the following statements is incorrect?
(a) ROLAPs have large data volumes
(b) The data form of ROLAP is a large multidimensional array made of cubes
(c) MOLAP uses sparse matrix technology to manage data sparsity
(d) Access for MOLAP is faster than ROLAP
Ans. : (b)

Q. 1.16 Which of the following standard query techniques increases the granularity?
(a) roll-up (b) drill-down
(c) slicing (d) dicing
Ans. : (b)

Q. 1.17 The full form of OLAP is
(a) Online Analytical Processing
(b) Online Advanced Processing
(c) Online Analytical Performance
(d) Online Advanced Preparation
Ans. : (a)
