DWH

Uploaded by

Amit Patra

1. Compare a database with Data Warehouse.

| Criteria | Database | Data Warehouse |
|---|---|---|
| Type of data | Relational or object-oriented data | Large volume with multiple data types |
| Data operations | Transaction processing | Data modeling and analysis |
| Dimensions of data | Two-dimensional data | Multidimensional data |
| Data design | ER-based and application-oriented database design | Star/Snowflake schema and subject-oriented database design |
| Size of the data | Small (in GB) | Large (in TB) |
| Functionality | High availability and performance | High flexibility and user autonomy |

2. What is the purpose of cluster analysis in Data Warehousing?

Cluster analysis in data warehousing is a technique used for grouping data objects based on
their similarity. The primary purpose of cluster analysis is to identify patterns, groupings, or
relationships within data that are not immediately apparent. Here's a detailed explanation:

Purpose of Cluster Analysis in Data Warehousing

1. Data Segmentation:
   o Divides large datasets into smaller, meaningful groups (clusters) based on similarity.
   o Helps in understanding and organizing data for easier analysis and decision-making.

2. Customer Segmentation:
   o Groups customers based on shared characteristics such as buying behavior, demographics, or preferences.
   o Enables personalized marketing, targeted promotions, and better customer service.

3. Outlier Detection:
   o Identifies data points that do not belong to any cluster, which can represent anomalies or errors.
   o Useful for fraud detection, quality control, and error analysis.

4. Pattern Recognition:
   o Discovers hidden patterns in the data, such as frequently co-occurring features or relationships.
   o Supports predictive modeling and data mining tasks.

5. Simplification of Data:
   o Reduces the complexity of large datasets by summarizing them into distinct groups.
   o Improves the performance of further data analysis tasks, such as classification or trend analysis.

6. Improving Data Quality:
   o Groups similar records together, which can aid in identifying duplicates or inconsistencies.
   o Enhances the accuracy of the data warehouse.

7. Enhancing Decision-Making:
   o Provides insights into distinct segments or clusters, enabling more informed strategic decisions.
   o For example, identifying high-value customers or regions with high sales potential.

8. Trend Analysis:
   o Identifies trends over time within clusters, helping organizations understand changes in behavior or performance.
   o Useful for forecasting and planning.
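
As a rough illustration of customer segmentation, the sketch below runs a plain k-means loop over a handful of made-up customers. The data, the `kmeans` helper, and the starting centroids are all assumptions chosen for the example; k-means is only one of many clustering methods and is used here because it is short.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per year, average order value)
customers = [(2, 20), (3, 25), (1, 15), (20, 200), (22, 210), (19, 190)]
centroids, clusters = kmeans(customers, centroids=[(0, 0), (25, 250)])

for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

With this data the low-volume and high-volume customers end up in separate clusters, which is the kind of grouping a marketing team might then target differently.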

3. What is the difference between agglomerative and divisive hierarchical clustering?

Agglomerative clustering builds clusters bottom-up by merging smaller clusters, while
divisive clustering splits clusters top-down starting with a single large cluster.
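
A minimal sketch of the two directions on one-dimensional data follows. Both functions are illustrative assumptions: the agglomerative version uses single linkage (distance between closest members), and the divisive version simply splits at the largest gap, which is a simplification of real divisive algorithms.

```python
def agglomerative(values, target_clusters):
    """Bottom-up: start with one cluster per value and merge the two closest clusters."""
    clusters = [[v] for v in values]
    while len(clusters) > max(target_clusters, 1):
        best = None  # (distance, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

def divisive(values, target_clusters):
    """Top-down: start with one cluster and repeatedly split at the largest gap."""
    clusters = [sorted(values)]
    while len(clusters) < target_clusters:
        best = None  # (gap, cluster index, split position)
        for ci, c in enumerate(clusters):
            for k in range(1, len(c)):
                gap = c[k] - c[k - 1]
                if best is None or gap > best[0]:
                    best = (gap, ci, k)
        if best is None:  # only single-value clusters remain, nothing to split
            break
        _, ci, k = best
        c = clusters.pop(ci)
        clusters[ci:ci] = [c[:k], c[k:]]
    return clusters

data = [1, 2, 3, 10, 11, 25]
bottom_up = agglomerative(data, 3)
top_down = divisive(data, 3)
print(bottom_up, top_down)
```

On this small dataset both directions arrive at the same three groups, but on real data the two approaches can produce different hierarchies.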

4. What is Virtual Data Warehousing?

Virtual Data Warehousing is an architectural approach where a central data repository is not
physically implemented; instead, data remains in source systems and is accessed through a
virtual layer that provides a unified view for analysis and reporting.

5. What is Active Data Warehousing?

Active Data Warehousing is a data warehousing approach that supports real-time or near-
real-time data updates and analytics, enabling businesses to make timely, data-driven
decisions and respond quickly to changing conditions.

6. What is a snapshot with reference to Data Warehouse?

In a data warehouse, a snapshot is a static view of data captured at a specific point in time,
used to track historical changes, create time-based reports, or maintain periodic data
records for analysis.

7. What is the difference between ‘view’ and ‘materialized view’?

Here’s the detailed explanation of the difference between view and materialized view:

| Aspect | View | Materialized View |
|---|---|---|
| Definition | A view is a virtual table based on the result of a query. It does not store the data itself but fetches data dynamically from the underlying tables whenever accessed. | A materialized view is a physical copy of the query results stored in the database. The data is precomputed and saved, making it available for faster access. |
| Data Storage | No physical storage of data; it only stores the query definition. | Physically stores data in the database, consuming storage space. |
| Performance | Every time the view is queried, the database executes the underlying query, which can be slow for complex queries. | Faster because the results are precomputed and stored, reducing the need to execute complex queries repeatedly. |
| Data Updates | Automatically reflects real-time changes in the underlying tables since it dynamically fetches the latest data. | Does not automatically update with changes in the underlying tables; requires manual or scheduled refresh to synchronize the data. |
| Maintenance | No need for maintenance since it does not store data. | Requires regular maintenance to refresh and update the data to stay current. |
| Use Case | Used when data must always reflect the latest state of the underlying tables. Ideal for simple and dynamic queries. | Suitable for scenarios where query performance is critical, especially for complex or expensive queries with relatively static data. |

Example

A view to show all active customers:

CREATE VIEW ActiveCustomers AS
SELECT * FROM Customers WHERE Status = 'Active';

A materialized view for monthly sales summary:

CREATE MATERIALIZED VIEW MonthlySales AS
SELECT MONTH(Date) AS Month, SUM(Sales) AS TotalSales
FROM Transactions
GROUP BY MONTH(Date);

### **Summary**

- Use a **view** when you need up-to-date, dynamic data directly from the source.

- Use a **materialized view** when performance is critical, and you can afford periodic data
refreshes.
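
The freshness difference can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` module; SQLite has no `CREATE MATERIALIZED VIEW`, so a table filled from the query (`ActiveCustomersMat`, a name invented here) stands in for one, and `refresh()` is a hypothetical helper that mimics a manual refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT, Status TEXT);
    INSERT INTO Customers (Name, Status) VALUES ('Ann', 'Active'), ('Bob', 'Inactive');

    -- A view stores only the query definition.
    CREATE VIEW ActiveCustomers AS
        SELECT * FROM Customers WHERE Status = 'Active';

    -- Stand-in for a materialized view: the query result is stored as a table.
    CREATE TABLE ActiveCustomersMat AS
        SELECT * FROM Customers WHERE Status = 'Active';
""")

def count(relation):
    """Number of rows currently returned by a table or view."""
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]

def refresh():
    """Manual refresh: rebuild the stored copy from the base table."""
    conn.executescript("""
        DELETE FROM ActiveCustomersMat;
        INSERT INTO ActiveCustomersMat SELECT * FROM Customers WHERE Status = 'Active';
    """)

conn.execute("UPDATE Customers SET Status = 'Active' WHERE Name = 'Bob'")
view_count = count("ActiveCustomers")       # the view sees the change immediately
stale_count = count("ActiveCustomersMat")   # the stored copy is still out of date
refresh()
fresh_count = count("ActiveCustomersMat")   # in sync again after the refresh
print(view_count, stale_count, fresh_count)
```

Databases with native materialized views replace the `refresh()` step with their own refresh command or schedule; the trade-off between freshness and query cost stays the same.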

9. What is junk dimension?

A junk dimension is a single dimension table that collects miscellaneous low-cardinality
attributes that do not fit into any of the existing dimensions. The data in a junk
dimension is usually Boolean or flag values. Grouping these attributes keeps the fact table
free of many small flag columns and avoids creating a separate dimension for each one.

Example

Suppose a fact table has the following attributes that don’t fit into other dimensions:
• Is_Returned (Yes/No)
• Order_Priority (High/Medium/Low)
• Payment_Mode (Credit Card/Debit Card/Cash)

These attributes can be combined into a junk dimension like this:

| Junk_Dimension_ID | Is_Returned | Order_Priority | Payment_Mode |
|---|---|---|---|
| 1 | Yes | High | Credit Card |
| 2 | No | Medium | Cash |
| 3 | Yes | Low | Debit Card |

In the fact table, a single foreign key (Junk_Dimension_ID) references this junk
dimension, reducing complexity.
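
One way to build such a dimension is to pre-populate every combination of the flag values and look up the surrogate key when loading facts. The sketch below assumes that approach; the `junk_key` helper and the sample fact row are hypothetical, and because all 18 combinations are generated, the IDs differ from the three-row sample table.

```python
from itertools import product

is_returned = ["Yes", "No"]
order_priority = ["High", "Medium", "Low"]
payment_mode = ["Credit Card", "Debit Card", "Cash"]

# One junk-dimension row per combination of flag values, keyed by a surrogate ID.
junk_dimension = {
    combo: junk_id
    for junk_id, combo in enumerate(
        product(is_returned, order_priority, payment_mode), start=1
    )
}

def junk_key(returned, priority, payment):
    """Look up the surrogate key for one combination of flags."""
    return junk_dimension[(returned, priority, payment)]

# The fact row carries a single foreign key instead of three flag columns.
fact_row = {
    "order_id": 1001,
    "amount": 250.0,
    "junk_dimension_id": junk_key("No", "Medium", "Cash"),
}
print(len(junk_dimension), fact_row)
```

An alternative is to insert only the combinations that actually occur in the source data, which keeps the dimension smaller when many combinations never appear.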

10. What are the different types of SCDs used in Data Warehousing?

In data warehousing, Slowly Changing Dimensions (SCDs) refer to dimensions where
attribute values change slowly over time, and different techniques are used to handle these
changes. The most commonly used types are:

| Type | Description | Use Case |
|---|---|---|
| Type 1 | Overwrite the old value with the new one. | When tracking history is unnecessary. |
| Type 2 | Add new rows for each change (keeps full history). | When full history needs to be preserved. |
| Type 3 | Add new columns for previous values. | When limited history (e.g., current and previous) is needed. |
| Type 4 | Use a separate history table. | To separate historical data for better performance or management. |
| Type 6 | Hybrid approach combining Type 1, Type 2, and Type 3. | For complex history tracking with current and previous values. |

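
To make Type 2 concrete, here is a minimal sketch that closes the current row and appends a new one when an attribute changes. The column names (`valid_from`, `valid_to`, `is_current`) and the `apply_scd2` helper are assumptions for illustration; real implementations vary in how they mark the current row.

```python
from datetime import date

def apply_scd2(dimension, customer_id, new_city, change_date):
    """Type 2: expire the current row and append a new one, preserving history."""
    current = next(
        (r for r in dimension if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if current["city"] == new_city:
            return  # attribute unchanged, nothing to record
        current["valid_to"] = change_date
        current["is_current"] = False
    dimension.append({
        "surrogate_key": len(dimension) + 1,  # simple key for an append-only list
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })

customer_dim = []
apply_scd2(customer_dim, 42, "Pune", date(2023, 1, 1))
apply_scd2(customer_dim, 42, "Mumbai", date(2024, 6, 1))

for row in customer_dim:
    print(row)
```

A Type 1 change would instead overwrite `city` on the existing row, so the earlier value would be lost.
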
22. Explain the ETL cycle's 3-layer architecture.

The staging layer, the data integration layer, and the access layer are the three layers that are
involved in an ETL cycle.

Staging layer: Stores the data extracted from the various source systems before any
transformation is applied.

Data integration layer: The integration layer transforms the data from the staging layer
and loads it into the warehouse database. The data is arranged into hierarchical groups
(often referred to as dimensions), facts, and aggregates. In a DW system, the combination of
fact and dimension tables is called a schema.

Access layer: For analytical reporting, end-users use the access layer to retrieve the data.
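
The three layers can be sketched end to end in a few lines. Everything here is a toy assumption: the staged records, the `transform` function, and the `fact_sales` and `sales_by_product` names are invented for the example, with an in-memory SQLite database standing in for the warehouse.

```python
import sqlite3

# Staging layer: raw records exactly as extracted from a hypothetical source system.
staging = [
    {"order_id": "1", "product": " Laptop ", "amount": "1200.50"},
    {"order_id": "2", "product": "mouse", "amount": "25"},
    {"order_id": "3", "product": "Laptop", "amount": "not available"},  # bad record
]

def transform(rows):
    """Integration layer: clean and type the staged rows, setting bad ones aside."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append((
                int(row["order_id"]),
                row["product"].strip().title(),
                float(row["amount"]),
            ))
        except ValueError:
            rejected.append(row)
    return clean, rejected

clean_rows, rejected_rows = transform(staging)

# Load the cleaned rows into a fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER, product TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", clean_rows)

# Access layer: a reporting view that end users query.
conn.execute("""
    CREATE VIEW sales_by_product AS
    SELECT product, SUM(amount) AS total FROM fact_sales GROUP BY product
""")
report = conn.execute("SELECT * FROM sales_by_product ORDER BY product").fetchall()
print(report, rejected_rows)
```

Keeping rejected rows rather than silently dropping them makes it possible to audit what the integration layer filtered out.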

10. Define conformed dimensions.

Conformed dimensions in data warehousing are dimensions that are consistent and shared
across multiple fact tables or data marts, allowing for standardized reporting and analysis.
These dimensions are designed to be used universally across different parts of the
organization, ensuring that data from different sources or business areas can be combined
and compared accurately.

Example:

• A Date dimension can be a conformed dimension used across multiple fact tables like
Sales Fact, Inventory Fact, and Shipping Fact, ensuring that all the facts are analyzed
in the context of the same calendar dates.

| Date_ID | Date | Month | Quarter | Year |
|---|---|---|---|---|
| 1 | 2023-01-01 | January | Q1 | 2023 |
| 2 | 2023-01-02 | January | Q1 | 2023 |

In this case, the Date dimension is conformed because it has the same structure and
attributes used across all different fact tables, ensuring data consistency.
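
Because both fact tables share the same date keys, their measures can be lined up per date. The sketch below shows this with `sqlite3`; the table and column names (`dim_date`, `sales_fact`, `inventory_fact`, `calendar_date`) and the sample figures are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, calendar_date TEXT,
        month TEXT, quarter TEXT, year INTEGER
    );
    INSERT INTO dim_date VALUES
        (1, '2023-01-01', 'January', 'Q1', 2023),
        (2, '2023-01-02', 'January', 'Q1', 2023);

    -- Two fact tables that reference the same conformed date dimension.
    CREATE TABLE sales_fact (date_id INTEGER REFERENCES dim_date(date_id), amount REAL);
    CREATE TABLE inventory_fact (date_id INTEGER REFERENCES dim_date(date_id), units_on_hand INTEGER);
    INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO inventory_fact VALUES (1, 40), (2, 35);
""")

# Aggregate each fact separately, then align the results on the shared date key.
rows = conn.execute("""
    SELECT d.calendar_date,
           (SELECT SUM(s.amount) FROM sales_fact s WHERE s.date_id = d.date_id) AS sales,
           (SELECT SUM(i.units_on_hand) FROM inventory_fact i WHERE i.date_id = d.date_id) AS units
    FROM dim_date d
    ORDER BY d.calendar_date
""").fetchall()
print(rows)
```

Each fact is aggregated on its own before being combined, which avoids the double counting that a direct join between two fact tables can cause.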

11. What are non-additive facts?

Non-additive facts in data warehousing are facts or measures that cannot be meaningfully
summed across any dimension. Ratios, percentages, and unit prices are typical examples:
adding two profit margins together does not produce a meaningful margin. They therefore
require specialized handling when used in queries.

Characteristics of Non-Additive Facts:

• Summing them across rows gives a meaningless number, so they need a different
calculation at query time.

• They are typically derived measures such as ratios, percentages, or averages.

Handling Non-Additive Facts:

• Store the additive components in the fact table (for example, the numerator and
denominator of a ratio), aggregate those, and compute the ratio from the aggregated
totals. A plain average of ratios is only correct when every row carries equal weight.

• Define the aggregation rule for each such measure in the reporting or querying layer,
ensuring that incorrect aggregations don’t occur.

In summary, non-additive facts require careful treatment in a data warehouse to ensure that
they are correctly calculated and interpreted across dimensions.
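
A small worked example shows why the aggregation method matters for a ratio such as profit margin. The two regions and their figures are made up for illustration.

```python
rows = [
    {"region": "North", "revenue": 1000.0, "profit": 100.0},  # margin 10%
    {"region": "South", "revenue": 100.0, "profit": 50.0},    # margin 50%
]

margins = [r["profit"] / r["revenue"] for r in rows]

# Summing the ratios gives a number with no business meaning.
sum_of_margins = sum(margins)

# A plain average treats the small region as equal in weight to the large one.
average_of_margins = sum(margins) / len(margins)

# Aggregating the additive components first gives the true overall margin.
total_profit = sum(r["profit"] for r in rows)
total_revenue = sum(r["revenue"] for r in rows)
overall_margin = total_profit / total_revenue

print(round(sum_of_margins, 4), round(average_of_margins, 4), round(overall_margin, 4))
```

Profit and revenue are additive, so storing them and deriving the margin after aggregation keeps the result correct at any level of the hierarchy.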

12. What is meant by VLDB?

A VLDB (very large database) is a database that holds an extremely large volume of data,
commonly taken to mean one terabyte or more. It contains a very large number of rows and
occupies extensive file storage. VLDBs typically back decision support applications and
transaction processing applications that serve a large number of users.

11. Slice vs dice

Summary of Operations:

| Operation | Definition | Effect |
|---|---|---|
| Slice | Selects a single value from one dimension, reducing the cube’s dimensionality. | Reduces a multidimensional cube to a lower-dimensional subset. |
| Dice | Selects multiple values from multiple dimensions, creating a subcube. | Filters data based on multiple conditions to focus on specific data points. |
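
The two operations can be sketched on a tiny cube held in a dictionary. The cube contents, the `DIMS` tuple, and the `slice_cube` and `dice_cube` helpers are assumptions for illustration, not any OLAP product's API.

```python
# A tiny cube: each cell is keyed by (year, region, product) and holds a sales figure.
DIMS = ("year", "region", "product")
cube = {
    (2023, "East", "Laptop"): 100,
    (2023, "East", "Phone"): 80,
    (2023, "West", "Laptop"): 60,
    (2024, "East", "Laptop"): 120,
    (2024, "West", "Phone"): 90,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets; all dimensions remain."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(dim)] in values for dim, values in allowed.items())
    }

sales_2023 = slice_cube(cube, "year", 2023)
east_laptops = dice_cube(cube, year={2023, 2024}, region={"East"}, product={"Laptop"})
print(sales_2023)
print(east_laptops)
```

The slice result has two-part keys because the year dimension was removed, while the dice result keeps three-part keys and only has fewer cells.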

12. What are some of the functions performed by OLAP?

The primary functions performed by OLAP are:

Roll up
Slice

Dice

Drill-down

Pivot
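
As a hedged sketch of roll-up, drill-down, and pivot, the example below aggregates a few made-up sales rows at different levels of a time hierarchy. The `aggregate` helper and the data are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, quarter, region, amount)
sales = [
    (2023, "Q1", "East", 100),
    (2023, "Q2", "East", 150),
    (2023, "Q1", "West", 70),
    (2024, "Q1", "East", 120),
]

def aggregate(rows, key):
    """Sum the amount column, grouped by whatever the key function returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Drill-down level: totals per year and quarter (more detail).
by_quarter = aggregate(sales, lambda r: (r[0], r[1]))

# Roll-up: climb the time hierarchy from quarter to year (less detail).
by_year = aggregate(sales, lambda r: r[0])

# Pivot: rotate the result so regions are rows and years are columns.
pivot = defaultdict(dict)
for (year, region), total in aggregate(sales, lambda r: (r[0], r[2])).items():
    pivot[region][year] = total
pivot = dict(pivot)

print(by_quarter, by_year, pivot)
```

Roll-up and drill-down are the same aggregation run at different levels of a hierarchy, whereas pivot changes only how the result is laid out.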

13. You face resistance from a business unit leader who feels their data needs are not
adequately addressed by the data warehouse. How would you handle this situation?

Handling resistance from a business unit leader who feels their data needs are not
adequately addressed by the data warehouse requires a strategic, empathetic, and solution-
oriented approach. Here’s a step-by-step way to address the situation:

1. Understand Their Concerns

• Listen Actively: Schedule a one-on-one meeting to understand the specific issues
they are facing. Ask open-ended questions to get detailed insights into their
concerns.
   o Example: "Can you share some specific examples where you feel the data
   warehouse is not meeting your needs?"

• Clarify Expectations: Understand their expectations for data access, quality, and
timeliness.
   o Example: "What types of reports or data access would help you make better
   decisions?"

2. Validate Their Concerns

• Acknowledge the Problem: Show empathy and validate that their concerns are
legitimate. Business leaders want to feel heard and understood.
   o Example: "I understand that the current data warehouse setup might not fully
   address your reporting needs, and I see how that can affect your ability to
   make informed decisions."

3. Assess the Root Cause

• Analyze the Data Gaps: Investigate whether the issues stem from:
   o Incomplete or outdated data
   o Incorrect data models or dimensions
   o Lack of training or understanding of available tools
   o Technical limitations in the data warehouse

• Consult with Your Team: Collaborate with your data team to analyze whether the
data warehouse infrastructure is misaligned with the business unit’s needs or if the
issue lies elsewhere.

4. Offer a Solution or Roadmap

• Propose Immediate Fixes: If the issues are simple to resolve (e.g., a missing report or
data field), prioritize and provide a solution.
   o Example: "We can quickly add the requested data field to the dashboard you
   need."

• Long-term Improvements: If the issues are more complex (e.g., changes to data
models or integration with new systems), provide a roadmap outlining how these
changes will be implemented.
   o Example: "It will take a few weeks to restructure the data model to
   accommodate your business unit's unique needs, but we can start working on
   it immediately and provide progress updates."

5. Align Data Warehouse with Business Needs

• Collaborate with Stakeholders: Involve the business leader in data warehouse
improvement discussions, so they feel empowered and part of the process.
   o Example: "Let's have regular check-ins to ensure the data warehouse evolves
   in line with your evolving needs."

• Enhance Reporting and Training: If the issue lies in usability or access, offer
additional training sessions or work with the business unit to create tailored reports
or dashboards.
   o Example: "We can schedule a workshop to help your team better understand
   how to extract the data you need from the warehouse."

6. Set Clear Expectations and Timelines

• Manage Expectations: Be transparent about the time and resources needed to
implement changes. Ensure the business leader understands what can be delivered
immediately versus what will take longer to address.
   o Example: "While some changes can be made within a few days, others may
   require more development time, but we’ll provide updates as we make
   progress."

7. Follow-up and Measure Success

• Regular Check-ins: Once the changes are implemented, schedule follow-ups to
confirm that the data warehouse is meeting their needs and to identify any
remaining gaps.
   o Example: "Let’s meet again next month to review how the new reports are
   helping and if any further adjustments are needed."

• Iterative Improvements: Encourage continuous feedback to ensure the data
warehouse evolves and remains aligned with the business needs.

Summary Approach:

1. Listen and understand their concerns.

2. Validate and empathize with their situation.

3. Assess the root cause of the issues.

4. Provide both short-term and long-term solutions.

5. Collaborate to enhance the data warehouse in alignment with their needs.

6. Set clear expectations and timelines.

7. Follow-up regularly and refine based on feedback.

By demonstrating understanding and commitment to addressing their concerns, you can
build a stronger partnership and ensure the data warehouse supports the business unit’s
goals.

14. You need to explain the benefits of data warehousing to executives unfamiliar with
the technical side. How would you communicate its value in layman’s terms?

Answer: Use analogies like comparing the data warehouse to a central library for historical
information. Explain how it enables informed decision-making by providing easily accessible,
accurate, and integrated data insights.

When explaining the value of data warehousing to executives who may not be familiar with
the technical aspects, it’s important to focus on the business benefits and outcomes rather
than technical jargon. Here's how you could communicate its value in layman’s terms:

Summary of Benefits

1. Better, Faster Decisions: Easy access to accurate, up-to-date data.

2. Time Savings: Streamlined reporting and analysis.

3. High-Quality Data: Consistent and error-free data.

4. Historical Insights: Track trends over time and predict future outcomes.

5. Competitive Edge: Faster decision-making and market insights.

6. Scalable Growth: Grows with your business without losing efficiency.
