DWH

Uploaded by Amit Patra
Copyright © All Rights Reserved

1. Compare a database with a data warehouse.

| Criteria | Database | Data Warehouse |
|---|---|---|
| Type of data | Relational or object-oriented data | Large volume with multiple data types |
| Data operations | Transaction processing | Data modeling and analysis |
| Dimensions of data | Two-dimensional data | Multidimensional data |
| Data design | ER-based and application-oriented database design | Star/snowflake schema and subject-oriented database design |
| Size of the data | Relatively small (typically GB) | Large (typically TB or more) |
| Functionality | High availability and performance | High flexibility and user autonomy |

2. What is the purpose of cluster analysis in Data Warehousing?

Cluster analysis in data warehousing is a technique used for grouping data objects based on
their similarity. The primary purpose of cluster analysis is to identify patterns, groupings, or
relationships within data that are not immediately apparent. Here's a detailed explanation:

Purpose of Cluster Analysis in Data Warehousing

1. Data Segmentation:
   - Divides large datasets into smaller, meaningful groups (clusters) based on similarity.
   - Helps in understanding and organizing data for easier analysis and decision-making.

2. Customer Segmentation:
   - Groups customers based on shared characteristics such as buying behavior, demographics, or preferences.
   - Enables personalized marketing, targeted promotions, and better customer service.

3. Outlier Detection:
   - Identifies data points that do not belong to any cluster, which can represent anomalies or errors.
   - Useful for fraud detection, quality control, and error analysis.

4. Pattern Recognition:
   - Discovers hidden patterns in the data, such as frequently co-occurring features or relationships.
   - Supports predictive modeling and data mining tasks.

5. Simplification of Data:
   - Reduces the complexity of large datasets by summarizing them into distinct groups.
   - Improves the performance of further data analysis tasks, such as classification or trend analysis.

6. Improving Data Quality:
   - Groups similar records together, which can aid in identifying duplicates or inconsistencies.
   - Enhances the accuracy of the data warehouse.

7. Enhancing Decision-Making:
   - Provides insights into distinct segments or clusters, enabling more informed strategic decisions.
   - For example, identifying high-value customers or regions with high sales potential.

8. Trend Analysis:
   - Identifies trends over time within clusters, helping organizations understand changes in behaviour or performance.
   - Useful for forecasting and planning.
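The segmentation and outlier-detection purposes above can be illustrated with a minimal k-means sketch in plain Python. Everything here is an assumption for illustration: the customer data, k = 2, the hand-picked starting centroids, and the fixed distance threshold of 100 used to flag outliers. Flagging points that lie far from their cluster's centroid is only one simple outlier heuristic, and real work would normally use a library implementation and scale the features first.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return centroids, labels

# Hypothetical customers: (orders per year, average order value)
customers = [(2, 20), (3, 25), (2, 22), (20, 200), (22, 210), (21, 190), (3, 400)]
centroids, labels = kmeans(customers, centroids=[(2, 20), (20, 200)])

# Customer segmentation: the cluster label of each customer.
print(labels)    # [0, 0, 0, 1, 1, 1, 1]

# Outlier detection: points unusually far from their own cluster's centroid.
outliers = [p for p, lbl in zip(customers, labels) if math.dist(p, centroids[lbl]) > 100]
print(outliers)  # [(3, 400)]
```

Note how the outlier still gets assigned to a cluster and drags that cluster's centroid toward it; this is why outliers are often detected and reviewed before the final segmentation is run.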

3. What is the difference between agglomerative and divisive hierarchical clustering?

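Agglomerative clustering works bottom-up: it starts with each data point as its own cluster and repeatedly merges the two closest clusters. Divisive clustering works top-down: it starts with all data points in a single large cluster and repeatedly splits clusters into smaller ones.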
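A minimal sketch of the bottom-up (agglomerative) direction, assuming one-dimensional numeric points and single linkage (the distance between two clusters is the distance between their closest members). The function name and data are hypothetical. Divisive clustering would run the other way, starting from one cluster holding all six points and splitting it.

```python
def agglomerative(points, target_clusters):
    """Bottom-up clustering with single linkage: start with one cluster
    per point and repeatedly merge the closest pair of clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the nearest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

print(agglomerative([1, 2, 10, 11, 12, 30], target_clusters=3))
# [[1, 2], [10, 11, 12], [30]]
```

Stopping at a target number of clusters is one choice; the full procedure would keep merging until a single cluster remains and record the merge history as a dendrogram.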

4. What is Virtual Data Warehousing?

Virtual Data Warehousing is an architectural approach where a central data repository is not
physically implemented; instead, data remains in source systems and is accessed through a
virtual layer that provides a unified view for analysis and reporting.
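The idea can be sketched with SQLite from Python: two separate databases stand in for hypothetical source systems (`crm` and `billing`), and a view acts as the virtual layer that queries them in place without copying any data. The table names and figures are assumptions for illustration; real virtual data warehouses typically rely on data virtualization or federated query tools.

```python
import sqlite3

# The "virtual layer" connection; the sources are attached, not copied.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")      # hypothetical source 1
conn.execute("ATTACH DATABASE ':memory:' AS billing")  # hypothetical source 2

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])

# Unified view over both sources. SQLite only lets TEMP views
# reference tables in attached databases.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, SUM(i.amount) AS revenue
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
""")

before = conn.execute("SELECT * FROM customer_revenue ORDER BY name").fetchall()
print(before)  # [('Acme', 150.0), ('Globex', 75.0)]

# A change in a source system is visible immediately; there is no load step.
conn.execute("INSERT INTO billing.invoices VALUES (2, 25.0)")
after = conn.execute("SELECT * FROM customer_revenue ORDER BY name").fetchall()
print(after)   # [('Acme', 150.0), ('Globex', 100.0)]
```

The trade-off is that every query hits the source systems, so performance and source-system load depend on them.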

5. What is Active Data Warehousing?

Active Data Warehousing is a data warehousing approach that supports real-time or near-
real-time data updates and analytics, enabling businesses to make timely, data-driven
decisions and respond quickly to changing conditions.
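Active warehousing is often implemented with frequent small loads (trickle feed or micro-batches) instead of one large nightly batch. The sketch below shows one minimal way to do that. It assumes a hypothetical `orders` source table whose `updated_at` column (simplified to an integer counter) serves as a high-water mark. Real systems often use change data capture or streaming tools instead.

```python
import sqlite3

# Hypothetical operational source and warehouse, both SQLite for illustration.
src = sqlite3.connect(":memory:")
dwh = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")
dwh.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")

def load_increment(last_watermark):
    """One micro-batch: copy only rows changed since the previous load
    and return the new high-water mark."""
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Upsert: a changed order replaces its earlier version in the warehouse.
    dwh.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    dwh.commit()
    return max((r[2] for r in rows), default=last_watermark)

watermark = 0
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 1), (2, 20.0, 2)])
watermark = load_increment(watermark)  # loads orders 1 and 2
src.execute("UPDATE orders SET amount = 15.0, updated_at = 3 WHERE id = 1")
watermark = load_increment(watermark)  # loads only the changed order 1

print(watermark, dwh.execute("SELECT id, amount FROM fact_orders ORDER BY id").fetchall())
# 3 [(1, 15.0), (2, 20.0)]
```

Running `load_increment` every few seconds or minutes, rather than once a day, is what keeps the warehouse close to real time.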

6. What is a snapshot with reference to Data Warehouse?

In a data warehouse, a snapshot is a static view of data captured at a specific point in time,
used to track historical changes, create time-based reports, or maintain periodic data
records for analysis.
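A periodic snapshot can be sketched as copying the current state of a table into a history table stamped with the snapshot date. The `inventory` tables, product names, and dates below are hypothetical, and SQLite is used only so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product TEXT PRIMARY KEY, qty INTEGER)")
# Snapshot table: the same columns plus the date the snapshot was taken.
conn.execute("CREATE TABLE inventory_snapshot (snapshot_date TEXT, product TEXT, qty INTEGER)")

def take_snapshot(snapshot_date):
    # Copy the current state of the live table, stamped with the snapshot date.
    conn.execute(
        "INSERT INTO inventory_snapshot SELECT ?, product, qty FROM inventory",
        (snapshot_date,),
    )

conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("widget", 100), ("gadget", 40)])
take_snapshot("2023-01-31")
conn.execute("UPDATE inventory SET qty = 70 WHERE product = 'widget'")
take_snapshot("2023-02-28")

# The live table only holds the latest value; the snapshots preserve history.
history = conn.execute(
    "SELECT snapshot_date, qty FROM inventory_snapshot "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
print(history)  # [('2023-01-31', 100), ('2023-02-28', 70)]
```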

7. What is the difference between ‘view’ and ‘materialized view’?

Here is a detailed comparison of a view and a materialized view:

| Aspect | View | Materialized View |
|---|---|---|
| Definition | A virtual table based on the result of a query. It does not store the data itself but fetches it dynamically from the underlying tables whenever it is accessed. | A physical copy of the query results stored in the database. The data is precomputed and saved, making it available for faster access. |
| Data Storage | No physical storage of data; only the query definition is stored. | Physically stores data in the database, consuming storage space. |
| Performance | Every time the view is queried, the database executes the underlying query, which can be slow for complex queries. | Faster, because the results are precomputed and stored, so complex queries do not have to be executed repeatedly. |
| Data Updates | Automatically reflects changes in the underlying tables, since it fetches the latest data on every access. | Typically does not update automatically with changes in the underlying tables; requires a manual or scheduled refresh to synchronize the data (some databases, such as Oracle, can also refresh on commit). |
| Maintenance | No data refresh is needed, since it does not store data. | Requires regular refreshes to keep the data current. |
| Use Case | Used when data must always reflect the latest state of the underlying tables. Ideal for simple and dynamic queries. | Suitable where query performance is critical, especially for complex or expensive queries over relatively static data. |

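Example: a view to show all active customers:

CREATE VIEW ActiveCustomers AS
SELECT * FROM Customers WHERE Status = 'Active';

A materialized view for a monthly sales summary (PostgreSQL syntax; truncating to the month keeps the year, so January 2023 and January 2024 are not merged into one group):

CREATE MATERIALIZED VIEW MonthlySales AS
SELECT DATE_TRUNC('month', Date) AS Month, SUM(Sales) AS TotalSales
FROM Transactions
GROUP BY DATE_TRUNC('month', Date);

-- Re-run the stored query to bring the materialized view up to date:
REFRESH MATERIALIZED VIEW MonthlySales;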

Summary:

- Use a **view** when you need up-to-date, dynamic data directly from the source.

- Use a **materialized view** when query performance is critical and periodic data refreshes are acceptable.
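SQLite has no `CREATE MATERIALIZED VIEW`, but the difference in behavior can still be demonstrated by emulating one: store the view's result in a real table and rebuild it on demand. In this sketch the table and function names are hypothetical, and "refresh" is a simple drop-and-rebuild (databases with native materialized views offer incremental or concurrent refresh options).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sale_date TEXT, sales REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("2023-01-05", 100.0), ("2023-01-20", 50.0), ("2024-01-10", 70.0)])

# Ordinary view: only the query is stored; it is re-run on every access.
conn.execute("""CREATE VIEW monthly_sales_v AS
    SELECT strftime('%Y-%m', sale_date) AS month, SUM(sales) AS total_sales
    FROM transactions GROUP BY month""")

def refresh_monthly_sales_mv():
    # Emulated materialized view: persist the query result in a real table
    # and rebuild that table whenever a refresh is requested.
    conn.execute("DROP TABLE IF EXISTS monthly_sales_mv")
    conn.execute("CREATE TABLE monthly_sales_mv AS SELECT * FROM monthly_sales_v")

refresh_monthly_sales_mv()
conn.execute("INSERT INTO transactions VALUES ('2024-01-25', 30.0)")  # new source data

query = "SELECT * FROM {} ORDER BY month"
view_rows = conn.execute(query.format("monthly_sales_v")).fetchall()
stale_rows = conn.execute(query.format("monthly_sales_mv")).fetchall()
print(view_rows)   # [('2023-01', 150.0), ('2024-01', 100.0)]  up to date
print(stale_rows)  # [('2023-01', 150.0), ('2024-01', 70.0)]   stale until refreshed

refresh_monthly_sales_mv()
fresh_rows = conn.execute(query.format("monthly_sales_mv")).fetchall()
print(fresh_rows)  # [('2023-01', 150.0), ('2024-01', 100.0)]
```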

8. What is a junk dimension?


When a fact table carries miscellaneous low-cardinality attributes that do not fit naturally into any existing dimension, these attributes can be grouped into a single junk dimension instead of being kept as separate columns in the fact table or as many tiny dimensions. Junk dimension attributes are usually Boolean, flag, or indicator values with only a few possible values each.
Example
Suppose a fact table has the following attributes that don’t fit into other dimensions:
- Is_Returned (Yes/No)
- Order_Priority (High/Medium/Low)
- Payment_Mode (Credit Card/Debit Card/Cash)
These attributes can be combined into a junk dimension like this:

| Junk_Dimension_ID | Is_Returned | Order_Priority | Payment_Mode |
|---|---|---|---|
| 1 | Yes | High | Credit Card |
| 2 | No | Medium | Cash |
| 3 | Yes | Low | Debit Card |

In the fact table, a single foreign key (Junk_Dimension_ID) references this junk
dimension, reducing complexity.
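One common way to build such a junk dimension is to pre-populate it with every combination of the flag values and give each combination a surrogate key; the fact table then stores only that key. This is a hypothetical Python sketch: the dictionary stands in for a dimension table, and the keys it assigns differ from the illustrative IDs in the table above. An alternative design adds rows only for combinations that actually occur in the data.

```python
from itertools import product

# Low-cardinality attributes from the example above.
is_returned = ["Yes", "No"]
order_priority = ["High", "Medium", "Low"]
payment_mode = ["Credit Card", "Debit Card", "Cash"]

# Junk dimension: every combination (2 x 3 x 3 = 18 rows) gets a surrogate key.
junk_dimension = {
    combo: key
    for key, combo in enumerate(product(is_returned, order_priority, payment_mode), start=1)
}

def junk_key(returned, priority, mode):
    """Return the single foreign key a fact row stores instead of three columns."""
    return junk_dimension[(returned, priority, mode)]

# Hypothetical fact row referencing the junk dimension.
fact_row = {"order_id": 1001, "amount": 59.90,
            "junk_dimension_id": junk_key("No", "Medium", "Cash")}
print(len(junk_dimension), fact_row["junk_dimension_id"])  # 18 15
```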

9. What are the different types of SCDs used in data warehousing?

In data warehousing, Slowly Changing Dimensions (SCDs) are dimensions whose attribute values change slowly over time, and different techniques are used to handle these changes. The three classic types are Type 1, Type 2, and Type 3; Types 4 and 6 are widely used extensions:

| Type | Description | Use Case |
|---|---|---|
| Type 1 | Overwrite the old value with the new one. | When tracking history is unnecessary. |
| Type 2 | Add a new row for each change (keeps full history). | When full history needs to be preserved. |
| Type 3 | Add new columns for previous values. | When limited history (e.g., current and previous) is needed. |
| Type 4 | Use a separate history table. | To separate historical data for better performance or management. |
| Type 6 | Hybrid approach combining Type 1, Type 2, and Type 3. | For complex history tracking with current and previous values. |

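To show how Type 2 preserves history, here is a minimal in-memory sketch. The column names (`sk` for the surrogate key, `start_date`, `end_date`, `is_current`) and the customer data are hypothetical; real implementations usually perform these steps with SQL `UPDATE`/`INSERT` or `MERGE` statements in the ETL process.

```python
from datetime import date

# Hypothetical customer dimension; end_date None marks the current version.
customer_dim = [
    {"sk": 1, "customer_id": "C1", "city": "Pune",
     "start_date": date(2022, 1, 1), "end_date": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, change_date):
    """SCD Type 2: expire the current row and append a new version."""
    current = next(r for r in dim if r["customer_id"] == customer_id and r["is_current"])
    if current["city"] == new_city:
        return  # attribute unchanged: nothing to record
    current["end_date"] = change_date
    current["is_current"] = False
    dim.append({"sk": max(r["sk"] for r in dim) + 1, "customer_id": customer_id,
                "city": new_city, "start_date": change_date,
                "end_date": None, "is_current": True})

apply_scd2(customer_dim, "C1", "Mumbai", date(2023, 6, 1))
for row in customer_dim:
    print(row["sk"], row["city"], row["start_date"], row["end_date"], row["is_current"])
# 1 Pune 2022-01-01 2023-06-01 False
# 2 Mumbai 2023-06-01 None True
```

A Type 1 change would instead simply overwrite `current["city"]`, losing the old value; a Type 3 change would keep the old value in an extra column such as a hypothetical `previous_city`.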
10. Explain the ETL cycle's three-layer architecture.

The staging layer, the data integration layer, and the access layer are the three layers that are
involved in an ETL cycle.

Staging layer: Stores the raw data extracted from the various source systems and data structures.

Data integration layer: Transforms the data from the staging layer and loads it into the warehouse database. The data is arranged into hierarchical groups (often referred to as dimensions), facts, and aggregates. In a DW system, the combination of fact and dimension tables is called a schema.

Access layer: For analytical reporting, end-users use the access layer to retrieve the data.
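The three layers can be sketched end to end with SQLite from Python. The table names (`stg_sales`, `dim_store`, `fact_sales`, `rpt_sales_by_store`) and the cleansing rule (trim and upper-case store names) are hypothetical; the point is only which work happens in which layer.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# 1. Staging layer: land the extracted source rows as-is (raw text, no cleanup).
db.execute("CREATE TABLE stg_sales (sale_date TEXT, store TEXT, amount TEXT)")
raw_extract = [("2023-01-01", " north ", "100"), ("2023-01-01", "South", "80"),
               ("2023-01-02", "North", "120")]
db.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", raw_extract)

# 2. Data integration layer: cleanse and conform, then load dimension and fact tables.
db.execute("CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT UNIQUE)")
db.execute("CREATE TABLE fact_sales (sale_date TEXT, store_key INTEGER, amount REAL)")
db.execute("""INSERT OR IGNORE INTO dim_store (store_name)
              SELECT DISTINCT UPPER(TRIM(store)) FROM stg_sales""")
db.execute("""INSERT INTO fact_sales
              SELECT s.sale_date, d.store_key, CAST(s.amount AS REAL)
              FROM stg_sales AS s
              JOIN dim_store AS d ON d.store_name = UPPER(TRIM(s.store))""")

# 3. Access layer: a reporting view that end users query.
db.execute("""CREATE VIEW rpt_sales_by_store AS
              SELECT d.store_name, SUM(f.amount) AS total_sales
              FROM fact_sales AS f JOIN dim_store AS d USING (store_key)
              GROUP BY d.store_name""")

report = db.execute("SELECT * FROM rpt_sales_by_store ORDER BY store_name").fetchall()
print(report)  # [('NORTH', 220.0), ('SOUTH', 80.0)]
```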

11. Define conformed dimensions.

Conformed dimensions in data warehousing are dimensions that are consistent and shared
across multiple fact tables or data marts, allowing for standardized reporting and analysis.
These dimensions are designed to be used universally across different parts of the
organization, ensuring that data from different sources or business areas can be combined
and compared accurately.

Example:

- A Date dimension can be a conformed dimension used across multiple fact tables like Sales Fact, Inventory Fact, and Shipping Fact, ensuring that all the facts are analyzed in the context of the same calendar dates.

| Date_ID | Date | Month | Quarter | Year |
|---|---|---|---|---|
| 1 | 2023-01-01 | January | Q1 | 2023 |
| 2 | 2023-01-02 | January | Q1 | 2023 |

In this case, the Date dimension is conformed because it has the same keys, structure, and attributes in every fact table that uses it, ensuring data consistency.
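Because fact tables that share a conformed dimension can be combined in one report (a drill-across query), a small SQLite sketch makes the benefit concrete. It reuses the two Date rows from the example above; the two fact tables and their figures are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The shared (conformed) Date dimension.
db.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, date TEXT, "
           "month TEXT, quarter TEXT, year INTEGER)")
db.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)",
               [(1, "2023-01-01", "January", "Q1", 2023),
                (2, "2023-01-02", "January", "Q1", 2023)])

# Two fact tables from different business processes reference the same date_id.
db.execute("CREATE TABLE sales_fact (date_id INTEGER REFERENCES dim_date, units_sold INTEGER)")
db.execute("CREATE TABLE shipping_fact (date_id INTEGER REFERENCES dim_date, units_shipped INTEGER)")
db.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 10), (2, 15)])
db.executemany("INSERT INTO shipping_fact VALUES (?, ?)", [(1, 8), (2, 12)])

# Drill-across: aggregate each fact table separately by the conformed
# attributes, then join the two results on those same attributes.
rows = db.execute("""
    WITH s AS (SELECT d.year, d.month, SUM(f.units_sold) AS sold
               FROM sales_fact AS f JOIN dim_date AS d USING (date_id)
               GROUP BY d.year, d.month),
         h AS (SELECT d.year, d.month, SUM(f.units_shipped) AS shipped
               FROM shipping_fact AS f JOIN dim_date AS d USING (date_id)
               GROUP BY d.year, d.month)
    SELECT s.year, s.month, s.sold, h.shipped
    FROM s JOIN h USING (year, month)
""").fetchall()
print(rows)  # [(2023, 'January', 25, 20)]
```

The join on `year` and `month` is only safe because both fact tables use the same Date dimension; with two differently defined calendars, the results could silently disagree.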

12. What are non-additive facts?

Non-additive facts in data warehousing are facts or measures that cannot be meaningfully summed across any dimension. Adding them up produces misleading results, so they require specialized handling when used in queries.

Characteristics of Non-Additive Facts:

- Instead of being summed, non-additive facts usually have to be recalculated at the required level of aggregation, typically from additive components (e.g., a ratio is recomputed from its summed numerator and denominator).
- Typical examples are ratios, percentages, and other derived measures.

Handling Non-Additive Facts:

- For non-additive facts, the correct aggregation method must be defined explicitly (e.g., recomputing a ratio from its components rather than summing, or simply averaging, the stored ratios).
- Specialized reporting or querying techniques are used to handle them, ensuring that incorrect aggregations don’t occur.

In summary, non-additive facts require careful treatment in a data warehouse to ensure that
they are correctly calculated and interpreted across dimensions.
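The following sketch, with hypothetical store figures, shows why a ratio such as a conversion rate (orders divided by visits) is non-additive: summing or naively averaging the stored ratios gives misleading totals, whereas recomputing the ratio from its additive components gives the correct overall rate.

```python
# Hypothetical daily figures per store. Visits and orders are additive;
# the conversion rate derived from them is non-additive.
daily = [
    {"store": "A", "visits": 1000, "orders": 50},  # 5% conversion
    {"store": "B", "visits": 100, "orders": 20},   # 20% conversion
]
for row in daily:
    row["conversion_rate"] = row["orders"] / row["visits"]

# Wrong: summing ratios gives a meaningless "25%".
wrong_sum = sum(r["conversion_rate"] for r in daily)
# Also wrong: a plain average ignores that store A has ten times the traffic.
wrong_avg = wrong_sum / len(daily)
# Correct: re-derive the ratio from the summed additive components (70 / 1100).
correct = sum(r["orders"] for r in daily) / sum(r["visits"] for r in daily)

print(round(wrong_sum, 4), round(wrong_avg, 4), round(correct, 4))
# 0.25 0.125 0.0636
```

Storing the additive numerator and denominator in the fact table, rather than only the ratio, is what makes this correct recomputation possible at any level of aggregation.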

13. What is meant by VLDB?

A VLDB (very large database) is a database so large that it needs special handling for storage, loading, backup, and query performance; a common rule of thumb is a size of one terabyte or more. A VLDB typically contains very large files and a very large number of rows, and it may serve decision-support and transaction-processing applications for a large number of users.

14. Slice vs dice

Summary of Operations:

| Operation | Definition | Effect |
|---|---|---|
| Slice | Selects a single value from one dimension, reducing the cube’s dimensionality. | Reduces a multidimensional cube to a lower-dimensional subset. |
| Dice | Selects multiple values from multiple dimensions, creating a subcube. | Filters data based on multiple conditions to focus on specific data points. |
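The difference can be seen on a tiny cube represented as a Python dictionary. The dimensions (region, quarter, product) and the sales figures are hypothetical; an OLAP engine would do the same filtering over a real multidimensional store.

```python
# Hypothetical sales cube: (region, quarter, product) -> sales
cube = {
    ("East", "Q1", "Laptop"): 100, ("East", "Q1", "Phone"): 80,
    ("East", "Q2", "Laptop"): 120, ("West", "Q1", "Laptop"): 90,
    ("West", "Q2", "Phone"): 60, ("North", "Q2", "Laptop"): 70,
}

# Slice: fix ONE dimension to a single value (quarter = "Q1").
# The quarter dimension disappears; the result is keyed by (region, product).
slice_q1 = {(region, product): v
            for (region, quarter, product), v in cube.items() if quarter == "Q1"}

# Dice: restrict SEVERAL dimensions to chosen values. The result is a
# smaller sub-cube that still has all three dimensions.
dice = {key: v for key, v in cube.items()
        if key[0] in {"East", "West"} and key[1] in {"Q1", "Q2"} and key[2] == "Laptop"}

print(slice_q1)
# {('East', 'Laptop'): 100, ('East', 'Phone'): 80, ('West', 'Laptop'): 90}
print(dice)
# {('East', 'Q1', 'Laptop'): 100, ('East', 'Q2', 'Laptop'): 120, ('West', 'Q1', 'Laptop'): 90}
```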

15. What are some of the functions performed by OLAP?

The primary functions performed by OLAP are:

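- Roll-up: aggregates data to a coarser level of a hierarchy (e.g., from quarter to year).
- Slice: selects a single value on one dimension.
- Dice: selects a subcube using values on several dimensions.
- Drill-down: the reverse of roll-up; moves to a more detailed level (e.g., from year to quarter).
- Pivot: rotates the cube to view the data from a different perspective (e.g., swapping rows and columns).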
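Roll-up, drill-down, and pivot can be sketched over a few hypothetical fact rows in plain Python. The data, the `aggregate` helper, and the use of nested dictionaries for the pivoted view are all assumptions for illustration; in practice an OLAP tool or SQL `GROUP BY` would do this work.

```python
from collections import defaultdict

# Hypothetical fact rows at the finest grain: (year, quarter, region, sales)
facts = [
    (2023, "Q1", "East", 100), (2023, "Q2", "East", 150),
    (2023, "Q1", "West", 80), (2023, "Q2", "West", 120),
]

def aggregate(rows, key):
    """Sum sales grouped by whatever attributes key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

by_quarter = aggregate(facts, lambda r: (r[0], r[1]))  # quarter level
by_year = aggregate(facts, lambda r: r[0])             # roll-up: quarter -> year
# Drill-down is the reverse move: from by_year back to the finer by_quarter.

# Pivot: rotate the view so quarters become rows and regions become columns.
pivot = defaultdict(dict)
for year, quarter, region, sales in facts:
    pivot[quarter][region] = pivot[quarter].get(region, 0) + sales

print(by_year)      # {2023: 450}
print(by_quarter)   # {(2023, 'Q1'): 180, (2023, 'Q2'): 270}
print(dict(pivot))  # {'Q1': {'East': 100, 'West': 80}, 'Q2': {'East': 150, 'West': 120}}
```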

16. You face resistance from a business unit leader who feels their data needs are not adequately addressed by the data warehouse. How would you handle this situation?

Handling resistance from a business unit leader who feels their data needs are not
adequately addressed by the data warehouse requires a strategic, empathetic, and solution-
oriented approach. Here’s a step-by-step way to address the situation:

1. Understand Their Concerns

- Listen Actively: Schedule a one-on-one meeting to understand the specific issues they are facing. Ask open-ended questions to get detailed insights into their concerns.
  - Example: "Can you share some specific examples where you feel the data warehouse is not meeting your needs?"
- Clarify Expectations: Understand their expectations for data access, quality, and timeliness.
  - Example: "What types of reports or data access would help you make better decisions?"

2. Validate Their Concerns

- Acknowledge the Problem: Show empathy and validate that their concerns are legitimate. Business leaders want to feel heard and understood.
  - Example: "I understand that the current data warehouse setup might not fully address your reporting needs, and I see how that can affect your ability to make informed decisions."

3. Assess the Root Cause

- Analyze the Data Gaps: Investigate whether the issues stem from:
  - Incomplete or outdated data
  - Incorrect data models or dimensions
  - Lack of training or understanding of available tools
  - Technical limitations in the data warehouse
- Consult with Your Team: Collaborate with your data team to analyze whether the data warehouse infrastructure is misaligned with the business unit’s needs or whether the issue lies elsewhere.

4. Offer a Solution or Roadmap

- Propose Immediate Fixes: If the issues are simple to resolve (e.g., a missing report or data field), prioritize and provide a solution.
  - Example: "We can quickly add the requested data field to the dashboard you need."
- Long-term Improvements: If the issues are more complex (e.g., changes to data models or integration with new systems), provide a roadmap outlining how these changes will be implemented.
  - Example: "It will take a few weeks to restructure the data model to accommodate your business unit's unique needs, but we can start working on it immediately and provide progress updates."

5. Align the Data Warehouse with Business Needs

- Collaborate with Stakeholders: Involve the business leader in data warehouse improvement discussions so they feel empowered and part of the process.
  - Example: "Let's have regular check-ins to ensure the data warehouse evolves in line with your changing needs."
- Enhance Reporting and Training: If the issue lies in usability or access, offer additional training sessions or work with the business unit to create tailored reports or dashboards.
  - Example: "We can schedule a workshop to help your team better understand how to extract the data you need from the warehouse."

6. Set Clear Expectations and Timelines

- Manage Expectations: Be transparent about the time and resources needed to implement changes. Ensure the business leader understands what can be delivered immediately versus what will take longer to address.
  - Example: "While some changes can be made within a few days, others may require more development time, but we’ll provide updates as we make progress."

7. Follow Up and Measure Success

- Regular Check-ins: Once the changes are implemented, schedule follow-ups to confirm that the data warehouse is meeting their needs and to identify any remaining gaps.
  - Example: "Let’s meet again next month to review how the new reports are helping and whether any further adjustments are needed."
- Iterative Improvements: Encourage continuous feedback to ensure the data warehouse evolves and remains aligned with the business needs.

Summary Approach:

1. Listen and understand their concerns.

2. Validate and empathize with their situation.

3. Assess the root cause of the issues.

4. Provide both short-term and long-term solutions.

5. Collaborate to enhance the data warehouse in alignment with their needs.

6. Set clear expectations and timelines.

7. Follow up regularly and refine based on feedback.

By demonstrating understanding and commitment to addressing their concerns, you can build a stronger partnership and ensure the data warehouse supports the business unit’s goals.

17. You need to explain the benefits of data warehousing to executives unfamiliar with the technical side. How would you communicate its value in layman’s terms?

Answer: Use analogies like comparing the data warehouse to a central library for historical
information. Explain how it enables informed decision-making by providing easily accessible,
accurate, and integrated data insights.

When explaining the value of data warehousing to executives who may not be familiar with
the technical aspects, it’s important to focus on the business benefits and outcomes rather
than technical jargon. Here's how you could communicate its value in layman’s terms:

Summary of Benefits

1. Better, Faster Decisions: Easy access to accurate, up-to-date data.

2. Time Savings: Streamlined reporting and analysis.

3. High-Quality Data: Consistent, cleansed data from a single trusted source.

4. Historical Insights: Track trends over time and support forecasts of future outcomes.

5. Competitive Edge: Faster decision-making and market insights.

6. Scalable Growth: Grows with your business without losing efficiency.

18.
