CCS341_Data Warehousing_Unit 4 Notes
UNIT-IV
Multidimensional Model:
A multidimensional model views data in the form of a data-cube. A data cube enables data to be modelled
and viewed in multiple dimensions. It is defined by dimensions and facts.
The dimensions are the perspectives or entities concerning which an organization keeps records. For
example, a shop may create a sales data warehouse to keep records of the store's sales for the dimensions time,
item, and location. These dimensions allow the store to keep track of things, for example, monthly sales of
items and the locations at which the items were sold. Each dimension has a table related to it, called a
dimensional table, which describes the dimension further. For example, a dimensional table for an item may
contain the attributes item name, brand, and type.
A multidimensional data model is organized around a central theme, for example, sales. This theme is
represented by a fact table. Facts are numerical measures. The fact table contains the names of the facts or
measures of the related dimensional tables.
Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the table. In
this 2D representation, the sales for Delhi are shown for the time dimension (organized in quarters) and the
item dimension (classified according to the types of items sold). The fact or measure displayed is rupees
sold (in thousands).
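To make the 2D representation concrete, the following is a minimal sketch in Python using pandas; the item names and sales figures are illustrative assumptions, not the actual table from the notes.

import pandas as pd

# Illustrative sales facts for Delhi: one row per (quarter, item type),
# measure = rupees sold (in thousands). The numbers are made up.
sales = pd.DataFrame({
    "quarter":  ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4"],
    "item":     ["Keyboard", "Mobile", "Keyboard", "Mobile",
                 "Keyboard", "Mobile", "Keyboard", "Mobile"],
    "rupees_sold_k": [605, 825, 680, 952, 812, 1023, 927, 1038],
})

# 2D view: time dimension as rows, item dimension as columns.
view_2d = sales.pivot_table(index="quarter", columns="item",
                            values="rupees_sold_k", aggfunc="sum")
print(view_2d)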
Now suppose we want to view the sales data with a third dimension. For example, suppose the data is considered
according to time and item, as well as location, for the cities Chennai, Kolkata, Mumbai, and Delhi.
These 3D data are shown in the table. The 3D data of the table are represented as a series of 2D tables.
Conceptually, the same data may also be represented in the form of a 3D data cube, as shown in fig:
The working of a Multi-Dimensional Data Model proceeds in stages:
Stage 2: Grouping different segments of the system - In the second stage, the Multi-Dimensional Data
Model recognizes and classifies all the data into the respective sections they belong to, and organizes it
so that it can be applied step by step without problems.
Stage 3: Noticing the different proportions - The third stage forms the basis on which the design of the
system rests. In this stage, the main factors are recognized according to the user's point of view. These
factors are also known as "Dimensions".
Stage 4: Preparing the actual-time factors and their respective qualities - In the fourth stage, the
factors recognized in the previous step are used to identify their related qualities.
These qualities are also known as "attributes" in the database.
Stage 5: Finding the actuality of factors which are listed previously and their qualities - In the fifth
stage, the Multi-Dimensional Data Model separates the actuality (the facts) from the factors collected
earlier and from their qualities. These facts play a significant role in the arrangement of a Multi-Dimensional
Data Model.
Stage 6: Building the Schema to place the data, with respect to the information collected from the
steps above - In the sixth stage, on the basis of the data which was collected previously, a Schema is
built.
Data is grouped or combined into multidimensional matrices called data cubes. The data cube method
has a few alternative names or variants, such as "multidimensional databases," "materialized views,"
and "OLAP (On-Line Analytical Processing)."
The general idea of this approach is to materialize certain expensive computations that are frequently
queried.
For example, a relation with the schema sales (part, supplier, customer, and sale-price) can be materialized
into a set of eight views as shown in fig, where psc indicates a view consisting of aggregate function value
(such as total-sales) computed by grouping three attributes part, supplier, and customer, p indicates a view
composed of the corresponding aggregate function values calculated by grouping part alone, etc.
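As a rough illustration of materializing these eight views, the sketch below uses Python with pandas and itertools; the sales rows, values, and column names are assumptions chosen only for demonstration.

from itertools import combinations
import pandas as pd

# Illustrative sales relation: sales(part, supplier, customer, sale_price).
sales = pd.DataFrame({
    "part":       ["p1", "p1", "p2", "p2"],
    "supplier":   ["s1", "s2", "s1", "s2"],
    "customer":   ["c1", "c1", "c2", "c2"],
    "sale_price": [100, 150, 200, 250],
})

dimensions = ["part", "supplier", "customer"]
views = {}
# Every subset of {part, supplier, customer} yields one view: psc, ps, pc, sc,
# p, s, c, and the empty grouping (grand total) -- eight views in all.
for r in range(len(dimensions), -1, -1):
    for group in combinations(dimensions, r):
        name = "".join(col[0] for col in group) or "none"
        if group:
            views[name] = sales.groupby(list(group))["sale_price"].sum()
        else:
            views[name] = sales["sale_price"].sum()  # apex: total sales

for name, view in views.items():
    print(name, ":\n", view, "\n")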
A data cube is created from a subset of attributes in the database. Specific attributes are chosen to be measure
attributes, i.e., the attributes whose values are of interest. Other attributes are selected as dimensions or
functional attributes. The measure attributes are aggregated according to the dimensions.
For example, XYZ may create a sales data warehouse to keep records of the store's sales for the dimensions
time, item, branch, and location. These dimensions enable the store to keep track of things like monthly sales
of items, and the branches and locations at which the items were sold. Each dimension may have a table
associated with it, known as a dimension table, which describes the dimension. For example, a dimension
table for items may contain the attributes item name, brand, and type.
The data cube method is an interesting technique with many applications. Data cubes could be sparse in many
cases because not every cell in each dimension may have corresponding data in the database.
Techniques should be developed to handle sparse cubes efficiently.
If a query contains constants at even lower levels than those provided in a data cube, it is not clear how to
make the best use of the precomputed results stored in the data cube.
The model views data in the form of a data cube. OLAP tools are based on the multidimensional data model.
Data cubes usually model n-dimensional data.
A data cube enables data to be modelled and viewed in multiple dimensions. A multidimensional data model
is organized around a central theme, like sales and transactions. A fact table represents this theme. Facts are
numerical measures. Thus, the fact table contains measure (such as Rs. sold) and keys to each of the related
dimensional tables.
Dimensions are the entities or perspectives that define a data cube. Facts are generally quantities, which are
used for analyzing the relationship between dimensions.
Example: In the 2-D representation, we will look at the All Electronics sales data for items sold per quarter in
the city of Vancouver. The measure displayed is dollars sold (in thousands).
3-Dimensional Cuboids
Let us suppose we would like to view the sales data with a third dimension. For example, suppose we would
like to view the data according to time and item, as well as location, for the cities Chicago, New York, Toronto,
and Vancouver. The measure displayed is dollars sold (in thousands). These 3-D data are shown in the table.
The 3-D data of the table are represented as a series of 2-D tables.
Conceptually, we may represent the same data in the form of 3-D data cubes, as shown in fig:
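One way to picture the series of 2-D tables is to group an illustrative 3-D data set by location with pandas, printing one quarter-by-item slice per city; the cities, items, and figures below are assumptions for demonstration only.

import pandas as pd

# Illustrative 3-D sales data: (time, item, location) -> dollars sold (in thousands).
sales = pd.DataFrame({
    "quarter":  ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q2", "Q2"],
    "item":     ["Phone", "TV"] * 4,
    "location": ["Chicago", "Chicago", "Vancouver", "Vancouver",
                 "Chicago", "Chicago", "Vancouver", "Vancouver"],
    "dollars_sold_k": [820, 605, 1087, 818, 935, 680, 966, 894],
})

# The 3-D cube viewed as a series of 2-D tables: one slice per city.
for city, slice_2d in sales.groupby("location"):
    print(f"location = {city}")
    print(slice_2d.pivot_table(index="quarter", columns="item",
                               values="dollars_sold_k"), "\n")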
Let us suppose that we would like to view our sales data with an additional fourth dimension, such as a
supplier.
In data warehousing, the data cubes are n-dimensional. The cuboid which holds the lowest level of
summarization is called a base cuboid.
For example, the 4-D cuboid in the figure is the base cuboid for the given time, item, location, and supplier
dimensions.
The figure shows a 4-D data cube representation of sales data, according to the dimensions time, item,
location, and supplier. The measure displayed is dollars sold (in thousands).
The topmost 0-D cuboid, which holds the highest level of summarization, is known as the apex cuboid. In
this example, this is the total sales, or dollars sold, summarized over all four dimensions.
The lattice of cuboids forms a data cube. The figure shows the lattice of cuboids creating a 4-D data cube for
the dimensions time, item, location, and supplier. Each cuboid represents a different degree of summarization.
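A quick way to see the lattice is to enumerate every cuboid for the four dimensions; the following Python sketch simply lists each group-by combination from the 4-D base cuboid down to the 0-D apex cuboid.

from itertools import combinations

dimensions = ["time", "item", "location", "supplier"]

# The lattice of cuboids: every subset of the four dimensions is one cuboid.
# The full set is the base cuboid (4-D); the empty set is the apex cuboid (0-D).
for k in range(len(dimensions), -1, -1):
    for cuboid in combinations(dimensions, k):
        label = ", ".join(cuboid) if cuboid else "apex (all)"
        print(f"{k}-D cuboid: {label}")

# A 4-dimensional cube therefore has 2**4 = 16 cuboids in its lattice.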
Schemas Used in Data Warehouses: Star, Galaxy (Fact constellation), and Snowflake:
We can think of a data warehouse schema as a blueprint or an architecture of how data will be stored and
managed. A data warehouse schema isn’t the data itself, but the organization of how data is stored and how it
relates to other data components within the data warehouse architecture.
In the past, data warehouse schemas were often strictly enforced across an enterprise, but in modern
implementations where storage is increasingly inexpensive, schemas have become less constrained. Despite this
loosening or sometimes total abandonment of data warehouse schemas, knowledge of the foundational
schema designs can be important both for maintaining legacy resources and for creating modern data warehouse
designs that learn from the past.
The basic components of all data warehouse schemas are fact and dimension tables. Different combinations
of these two central elements compose almost the entirety of all data warehouse schema designs.
Fact Table
A fact table aggregates metrics, measurements, or facts about business processes. Fact tables
are connected to dimension tables to form a schema architecture representing how data relates within the data
warehouse. Fact tables store primary keys of dimension tables as foreign keys within the fact table.
Dimension Table
Dimension tables are used to store data attributes or dimensions. As mentioned
above, the primary key of a dimension table is stored as a foreign key in the fact table. Dimension tables are
not joined together. Instead, they are joined via association through the central fact table.
History presents us with three prominent types of data warehouse schema known as Star Schema, Snowflake
Schema, and Galaxy Schema. Each of these data warehouse schemas has unique design constraints and
describes a different organizational structure for how data is stored and how it relates to other data within the
data warehouse.
The star schema in a data warehouse is historically one of the most straightforward designs. This schema
follows some distinct design parameters, such as only permitting one central table and a handful of single-
dimension tables joined to that central table. In following these design constraints, a star schema can resemble
a star with one central table and five dimension tables joined to it (which is where the star schema gets its name).
Star Schema is known to create denormalized dimension tables – a database structuring strategy that organizes
tables to introduce redundancy for improved performance. Denormalization intends to introduce redundancy
in additional dimensions so long as it improves query performance.
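As a minimal sketch of a star schema, the Python example below uses the standard sqlite3 module; the table and column names (dim_time, dim_item, dim_location, fact_sales, and so on) are assumptions chosen for illustration, not a prescribed design.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized dimension tables: each holds all of its attributes directly.
cur.execute("CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER)")
cur.execute("CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT)")
cur.execute("CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT)")

# Central fact table: measures plus foreign keys to each dimension.
cur.execute("""
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    rupees_sold  REAL
)""")

# A typical star join: dimension tables meet only through the fact table.
cur.execute("""
SELECT t.quarter, i.type, SUM(f.rupees_sold)
FROM fact_sales f
JOIN dim_time t ON f.time_key = t.time_key
JOIN dim_item i ON f.item_key = i.item_key
GROUP BY t.quarter, i.type
""")
conn.close()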
The Snowflake Schema is a data warehouse schema that encompasses a logical arrangement of dimension
tables. This data warehouse schema builds on the star schema by adding additional sub-dimension tables that
relate to first-order dimension tables joined to the fact table.
Just like the relationship between the foreign key in the fact table and the primary key in the dimension table,
with the snowflake schema approach, a primary key in a sub-dimension table will relate to a foreign key within
the higher order dimension table.
Snowflake schema creates normalized dimension tables – a database structuring strategy that organizes tables
to reduce redundancy. The purpose of normalization is to eliminate any redundant data to reduce overhead.
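Continuing the illustrative sqlite3 sketch above, a snowflake version might normalize the item dimension by splitting brand details into a sub-dimension table; again, the table and column names are assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Sub-dimension table holding brand attributes once, instead of repeating
# them on every item row (normalization to reduce redundancy).
cur.execute("CREATE TABLE dim_brand (brand_key INTEGER PRIMARY KEY, brand_name TEXT, manufacturer TEXT)")

# First-order dimension table now carries a foreign key to the sub-dimension.
cur.execute("""
CREATE TABLE dim_item (
    item_key  INTEGER PRIMARY KEY,
    item_name TEXT,
    type      TEXT,
    brand_key INTEGER REFERENCES dim_brand(brand_key)
)""")

# The fact table is unchanged: it still references only dim_item, and
# brand details are reached through one extra join (item -> brand).
conn.close()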
The Galaxy Data Warehouse Schema, also known as a Fact Constellation Schema, acts as the next iteration
of the data warehouse schema. Unlike the Star Schema and Snowflake Schema, the Galaxy Schema uses
multiple fact tables connected with shared normalized dimension tables. Galaxy Schema can be thought of as
star schema interlinked and completely normalized, avoiding any kind of redundancy or inconsistency of data.
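A galaxy (fact constellation) variant of the same sketch would keep two fact tables, for example sales and shipping, sharing the same dimension tables; the shipping-related names here are purely illustrative assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Shared dimension tables.
cur.execute("CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER)")
cur.execute("CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT)")

# Two fact tables form the "constellation", each pointing at the same dimensions.
cur.execute("""
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    rupees_sold REAL
)""")
cur.execute("""
CREATE TABLE fact_shipping (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    units_shipped INTEGER
)""")
conn.close()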
To understand data warehouse schema and its various types at the conceptual level, here are a few things to
remember:
Data warehouse schema is a blueprint for how data will be stored and managed. It includes
definitions of terms, relationships, and the arrangement of those terms and relationships.
Star, galaxy, and snowflake are common types of data warehouse schema that vary in the
arrangement and design of the data relationships.
Star schema is the simplest data warehouse schema and contains just one central table and a handful
of single-dimension tables joined together.
Snowflake schema builds on star schema by adding sub-dimension tables, which eliminates
redundancy and reduces overhead costs.
Galaxy schema uses multiple fact tables (snowflake and star use only one), which makes it like an
interlinked star schema. This nearly eliminates redundancy and is ideal for complex database
systems.
There’s no one “best” data warehouse schema. The “best” schema depends on (among other things) your
resources, the type of data you’re working with, and what you’d like to do with it.
For instance, star schema is ideal for organizations that want maximum simplicity and can tolerate higher disk
space usage. But galaxy schema is more suitable for complex data aggregation. And snowflake schema could
be superior for an organization that wants lower data redundancy and can handle a more complex design than star schema.
Process Architecture
The process architecture defines an architecture in which the data from the data warehouse is processed for
a particular computation.
Centralized Process Architecture
Centralized process architecture evolved with transaction processing and is well suited for small
organizations with one location of service.
It requires minimal resources both from people and system perspectives.
It is very successful when the collection and consumption of data occur at the same location.
Distributed Process Architecture
In this architecture, information and its processing are allocated across data centres; the processing of data is
localized within each group, and the results are collected into centralized storage. Distributed architectures are
used to overcome the limitations of the centralized process architecture, where all the information needs to be
collected at one central location and results are available at one central location.
Client-Server
In this architecture, the user does all the information collecting and presentation, while the server does the
processing and management of data.
Three-tier Architecture
With client-server architecture, the client machines need to be connected to a server machine, thus mandating
finite states and introducing latencies and overhead in terms of records to be carried between clients and
servers. The three-tier architecture addresses this by introducing a middle tier (such as an application or
middleware server) between the clients and the database server.
N-tier Architecture
The n-tier or multi-tier architecture is where clients, middleware, applications, and servers are isolated into
multiple tiers.
Cluster Architecture
In this architecture, machines connected in a network architecture (software or hardware) work closely
together to process information or compute requirements in parallel. Each device in a cluster is assigned a
function that is processed locally, and the result sets are collected by a master server, which returns them to
the user.
Peer-to-Peer Architecture
This is a type of architecture where there are no dedicated servers and clients. Instead, all the processing
responsibilities are allocated among all machines, called peers. Each machine can perform the function of a
client or server, or just process data.
Parallelism is used to support speedup, where queries are executed faster because more resources, such as
processors and disks, are provided. Parallelism is also used to provide scale-up, where increasing workloads
are managed without increasing response time, via an increase in the degree of parallelism.
Different architectures for parallel database systems are shared-memory, shared-disk, shared-nothing, and
hierarchical structures.
(a) Horizontal Parallelism: It means that the database is partitioned across multiple disks, and parallel
processing occurs within a specific task (i.e., a table scan) that is performed concurrently on different
processors against different sets of data (a short sketch follows item (b) below).
(b) Vertical Parallelism: It occurs among various tasks. All component query operations (i.e., scan, join, and
sort) are executed in parallel in a pipelined fashion. In other words, the output from one function (e.g., join)
is passed to the next function as soon as records become available.
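As a rough sketch of horizontal parallelism, the Python example below scans partitions of a table concurrently with multiprocessing; the partitioning, records, and filter condition are illustrative assumptions.

from multiprocessing import Pool

# Pretend the table is horizontally partitioned across four "disks":
# each partition is a list of (item, rupees_sold) records.
PARTITIONS = [
    [("Keyboard", 605), ("Mobile", 825)],
    [("Keyboard", 680), ("Mobile", 952)],
    [("Keyboard", 812), ("Mobile", 1023)],
    [("Keyboard", 927), ("Mobile", 1038)],
]

def scan_partition(partition):
    # The same scan task runs concurrently on different sets of data.
    return sum(amount for item, amount in partition if item == "Mobile")

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        partial_sums = pool.map(scan_partition, PARTITIONS)
    print("Total Mobile sales:", sum(partial_sums))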
Intraquery Parallelism
Intraquery parallelism defines the execution of a single query in parallel on multiple processors and disks.
Using intraquery parallelism is essential for speeding up long-running queries.
Interquery parallelism does not help here, since each query is run sequentially. Intraquery parallelism
decomposes the serial SQL query into lower-level operations such as scan, join, sort, and aggregation.
Interquery Parallelism
In interquery parallelism, different queries or transactions execute in parallel with one another.
This form of parallelism can increase transaction throughput. The response times of individual transactions
are not faster than they would be if the transactions were run in isolation.
Thus, the primary use of interquery parallelism is to scale up a transaction processing system to support a
more significant number of transactions per second.
Database vendors started to take advantage of parallel hardware architectures by implementing multiserver
and multithreaded systems designed to handle a large number of client requests efficiently.
This approach naturally resulted in interquery parallelism, in which different server threads (or processes)
handle multiple requests at the same time.
Interquery parallelism has been successfully implemented on SMP systems, where it increased the
throughput and allowed the support of more concurrent users.
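A small sketch of interquery parallelism in Python: several independent queries are submitted at once with a thread pool, so throughput rises even though each individual query is no faster. The database file, table, and queries are illustrative assumptions.

import sqlite3
from concurrent.futures import ThreadPoolExecutor

DB = "warehouse.db"  # illustrative database file

QUERIES = [
    "SELECT COUNT(*) FROM sales",
    "SELECT SUM(rupees_sold) FROM sales",
    "SELECT MAX(rupees_sold) FROM sales",
]

def run_query(sql):
    # Each independent query runs on its own connection, in parallel with the
    # others (interquery parallelism): throughput goes up, but no single
    # query finishes faster than it would alone.
    conn = sqlite3.connect(DB)
    try:
        return conn.execute(sql).fetchone()
    finally:
        conn.close()

if __name__ == "__main__":
    # Set up a tiny sales table so the concurrent queries have data to read.
    setup = sqlite3.connect(DB)
    setup.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, rupees_sold REAL)")
    setup.execute("INSERT INTO sales VALUES ('Mobile', 825), ('Keyboard', 605)")
    setup.commit()
    setup.close()

    with ThreadPoolExecutor(max_workers=3) as pool:
        for result in pool.map(run_query, QUERIES):
            print(result)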
The tools that allow sourcing of data contents and formats accurately from external data stores into the data
warehouse have to perform several essential tasks, including the following (a minimal sketch follows the list):
Data consolidation and integration.
Data transformation from one form to another form.
Data transformation and calculation based on the application of business rules that force transformation.
Metadata synchronization and management, which includes storing or updating metadata about
source files, transformation actions, loading formats, and events.
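The following minimal Python sketch illustrates these tasks on a toy scale: it consolidates two source extracts, applies a simple business-rule transformation, and records metadata about the load. The file names, fields, and conversion rule are assumptions, not part of the notes.

import csv
import datetime

def extract(path):
    # Read one source extract (flat file) into a list of dictionaries.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(record):
    # Business-rule transformation: convert the amount to thousands of rupees
    # and derive a load-date field (the rule itself is illustrative).
    record["rupees_sold_k"] = float(record["amount"]) / 1000.0
    record["load_date"] = datetime.date.today().isoformat()
    return record

def load(records, target_path, metadata):
    # Write the consolidated, transformed records and update load metadata.
    with open(target_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
    metadata["rows_loaded"] = len(records)
    metadata["loaded_at"] = datetime.datetime.now().isoformat()

if __name__ == "__main__":
    # Consolidate and integrate data from two (hypothetical) source extracts.
    rows = extract("sales_store1.csv") + extract("sales_store2.csv")
    rows = [transform(r) for r in rows]
    meta = {"sources": ["sales_store1.csv", "sales_store2.csv"]}
    load(rows, "warehouse_sales.csv", meta)
    print(meta)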
There are several selection criteria which should be considered while implementing a data warehouse:
1. The ability to identify the data in the data source environment that can be read by the tool is necessary.
2. Support for flat files, indexed files, and legacy DBMSs is critical.
3. The capability to merge records from multiple data stores is required in many installations.
4. The specification interface to indicate the information to be extracted and the conversion criteria is essential.
5. The ability to read information from repository products or data dictionaries is desired.
6. The code developed by the tool should be completely maintainable.
7. Selective data extraction of both data items and records enables users to extract only the required
data.
8. A field-level data examination for the transformation of data into information is needed.
9. The ability to perform data type and character-set translation is a requirement when moving data
between incompatible systems.
10. The ability to create aggregation, summarization, and derivation fields and records is necessary.
11. Vendor stability and support for the products are components that must be evaluated carefully.
A warehousing team will require different types of tools during a warehouse project. These software
products usually fall into one or more of the categories illustrated in the figure.
Warehouse Storage
Software products are also needed to store warehouse data and their accompanying metadata. Relational
database management systems are well suited to large and growing warehouses.