DWH
Type of data | Relational or object-oriented data | Large volume with multiple data types
Functionality | High availability and performance | High flexibility and user autonomy
Cluster analysis in data warehousing is a technique used for grouping data objects based on
their similarity. The primary purpose of cluster analysis is to identify patterns, groupings, or
relationships within data that are not immediately apparent. Here's a detailed explanation:
1. Data Segmentation:
o Divides large datasets into smaller, meaningful groups (clusters) based on
similarity.
o Helps in understanding and organizing data for easier analysis and decision-
making.
2. Customer Segmentation:
3. Outlier Detection:
o Identifies data points that do not belong to any cluster, which can represent
anomalies or errors.
4. Pattern Recognition:
5. Simplification of Data:
7. Enhancing Decision-Making:
8. Trend Analysis:
o Identifies trends over time within clusters, helping organizations understand
changes in behaviour or performance.
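As a rough illustration of data segmentation and outlier detection, here is a minimal k-means sketch in plain Python. The customer points, the choice of k = 2, seeding the centroids with the first k points, and the distance threshold for flagging outliers are all illustrative assumptions, not features of any particular warehouse or mining tool.

```python
import math

def kmeans(points, k, iterations=10):
    """Lloyd's k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])  # deterministic seeding, chosen for reproducibility
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

def find_outliers(points, centroids, threshold):
    """Return points farther than `threshold` from every centroid."""
    return [p for p in points
            if min(math.dist(p, c) for c in centroids) > threshold]

# Hypothetical customers described by two pre-scaled features.
customers = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (2.0, 1.0),
             (8.0, 9.5), (9.5, 8.0), (8.0, 0.0)]
centroids, clusters = kmeans(customers, k=2)
print(find_outliers(customers, centroids, threshold=3.0))  # [(8.0, 0.0)]
```

Note that k-means always assigns every point to some cluster, so "does not belong to any cluster" is approximated here by distance to the nearest centroid. Production work would typically scale the features first and may prefer a density-based method such as DBSCAN for outliers.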
Virtual Data Warehousing is an architectural approach where a central data repository is not
physically implemented; instead, data remains in source systems and is accessed through a
virtual layer that provides a unified view for analysis and reporting.
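A small way to picture the virtual layer is a single SQLite connection with two attached in-memory databases standing in for separate source systems. The `crm` and `erp` names, tables, and columns are hypothetical; a real virtual warehouse would use a data-virtualization or federation engine rather than SQLite.

```python
import sqlite3

# One connection plays the virtual layer; each ATTACHed in-memory database
# stands in for a separate source system (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp.orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp.orders VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])

# The unified view stores no data: every query is resolved against the sources.
# In SQLite, a view that reads from other attached databases must be a TEMP view.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, SUM(o.amount) AS revenue
    FROM crm.customers AS c
    JOIN erp.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
""")

# A new order in the source system is visible at once, with no load step.
conn.execute("INSERT INTO erp.orders VALUES (13, 2, 25.0)")
rows = conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 100.0)]
```

The trade-off is visible in the design: data is always current, but every report pays the cost of querying the sources.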
Active Data Warehousing is a data warehousing approach that supports real-time or near-
real-time data updates and analytics, enabling businesses to make timely, data-driven
decisions and respond quickly to changing conditions.
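One common way to approach near-real-time loading is a trickle feed: source events are pushed into the warehouse in small batches rather than one nightly bulk load. The sketch below assumes a hypothetical `MicroBatchLoader` class, a `sales_fact` table, and a batch size of 2; real active warehouses usually rely on change data capture or streaming tools instead.

```python
import sqlite3

class MicroBatchLoader:
    """Hypothetical trickle-feed loader: pushes source events into the
    warehouse in small batches instead of one nightly bulk load."""

    def __init__(self, conn, batch_size=2):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def on_event(self, sale):
        # Called whenever the source system emits a new sale.
        self.pending.append(sale)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        with self.conn:  # commits the micro-batch as one transaction
            self.conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", self.pending)
        self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (store TEXT, amount REAL)")
loader = MicroBatchLoader(conn, batch_size=2)

def total_sales():
    return conn.execute("SELECT COALESCE(SUM(amount), 0) FROM sales_fact").fetchone()[0]

loader.on_event(("North", 40.0))
print(total_sales())  # 0: the first event is still buffered
loader.on_event(("South", 60.0))
print(total_sales())  # 100.0: the batch is committed and visible to queries
```

A smaller batch size lowers latency at the cost of more frequent commits.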
In a data warehouse, a snapshot is a static view of data captured at a specific point in time,
used to track historical changes, create time-based reports, or maintain periodic data
records for analysis.
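A periodic snapshot can be sketched as copying the current state of a table into a history table stamped with the snapshot date. The `inventory` and `inventory_snapshot` tables and the dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE inventory_snapshot (snapshot_date TEXT, product TEXT, qty INTEGER)")

def take_snapshot(as_of):
    # Freeze the current state; later changes to `inventory` do not touch these rows.
    conn.execute(
        "INSERT INTO inventory_snapshot SELECT ?, product, qty FROM inventory", (as_of,)
    )

conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("widget", 100), ("gadget", 40)])
take_snapshot("2024-01-31")
conn.execute("UPDATE inventory SET qty = 70 WHERE product = 'widget'")
take_snapshot("2024-02-29")

history = conn.execute(
    "SELECT snapshot_date, qty FROM inventory_snapshot "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
print(history)  # [('2024-01-31', 100), ('2024-02-29', 70)]
```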
Here’s a detailed comparison of a view and a materialized view:
| Aspect | View | Materialized View |
|---|---|---|
| Performance | Every time the view is queried, the database executes the underlying query, which can be slow for complex queries. | Faster because the results are precomputed and stored, reducing the need to execute complex queries repeatedly. |
| Use Case | Used when data must always reflect the latest state of the underlying tables. Ideal for simple and dynamic queries. | Suitable for scenarios where query performance is critical, especially for complex or expensive queries with relatively static data. |
FROM Transactions
GROUP BY MONTH(Date);
### **Summary**
- Use a **view** when you need up-to-date, dynamic data directly from the source.
- Use a **materialized view** when performance is critical and you can afford periodic data refreshes.
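The difference in freshness can be demonstrated in SQLite. SQLite has no `CREATE MATERIALIZED VIEW`, so in this sketch the materialized view is emulated with a table rebuilt by a hypothetical `refresh_monthly_sales_mv()` function; PostgreSQL, for example, supports materialized views natively through `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`. The table and column names are assumptions, and `strftime('%Y-%m', ...)` stands in for `MONTH()`, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tx_date TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("2024-01-05", 100.0), ("2024-01-20", 50.0), ("2024-02-03", 80.0)])

# Regular view: only the query is stored; it is re-run on every read.
conn.execute("""
    CREATE VIEW monthly_sales_v AS
    SELECT strftime('%Y-%m', tx_date) AS month, SUM(amount) AS total
    FROM transactions GROUP BY month
""")

def refresh_monthly_sales_mv():
    # Emulated materialized view: results are computed once and stored in a table.
    conn.execute("DROP TABLE IF EXISTS monthly_sales_mv")
    conn.execute("CREATE TABLE monthly_sales_mv AS SELECT * FROM monthly_sales_v")

refresh_monthly_sales_mv()
conn.execute("INSERT INTO transactions VALUES ('2024-02-10', 20.0)")

query = "SELECT month, total FROM {} ORDER BY month"
current = conn.execute(query.format("monthly_sales_v")).fetchall()
stale = conn.execute(query.format("monthly_sales_mv")).fetchall()
print(current)  # [('2024-01', 150.0), ('2024-02', 100.0)]: always current
print(stale)    # [('2024-01', 150.0), ('2024-02', 80.0)]: stale until the next refresh
```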
2 No Medium Cash
In the fact table, a single foreign key (Junk_Dimension_ID) references this junk
dimension, reducing complexity.
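A junk dimension can be sketched as the cross product of several low-cardinality flags, each combination getting a surrogate `Junk_Dimension_ID`. The flag names (`is_gift`, `priority`, `payment_type`) and the `to_fact_row` helper are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality attributes that would otherwise clutter the fact table.
flags = {
    "is_gift": ["No", "Yes"],
    "priority": ["Low", "Medium", "High"],
    "payment_type": ["Cash", "Card"],
}

# Junk dimension: one row per combination, keyed by a surrogate Junk_Dimension_ID.
junk_dimension = {
    junk_id: dict(zip(flags, combo))
    for junk_id, combo in enumerate(product(*flags.values()), start=1)
}
lookup = {tuple(row.values()): junk_id for junk_id, row in junk_dimension.items()}

def to_fact_row(order_id, amount, is_gift, priority, payment_type):
    # The fact row keeps a single foreign key instead of three flag columns.
    return {"order_id": order_id, "amount": amount,
            "Junk_Dimension_ID": lookup[(is_gift, priority, payment_type)]}

print(len(junk_dimension))  # 12 = 2 * 3 * 2 combinations
print(to_fact_row(501, 99.0, "No", "Medium", "Cash"))
# {'order_id': 501, 'amount': 99.0, 'Junk_Dimension_ID': 3}
```

Pre-generating every combination keeps the dimension small and stable; if the cross product would be large, a common alternative is to insert only combinations that actually occur.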
10. What are the different types of SCDs (Slowly Changing Dimensions) used in data warehousing?
| Type | Description | Use Case |
|---|---|---|
| Type 3 | Add new columns for previous values. | When limited history (e.g., current and previous) is needed. |
| Type 6 | Hybrid approach combining Type 1, Type 2, and Type 3. | For complex history tracking with current and previous values. |
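The contrast between keeping full history (Type 2) and keeping only the previous value (Type 3) can be sketched with plain dictionaries. The column names (`valid_from`, `valid_to`, `is_current`, `previous_city`) are hypothetical conventions, not a required schema.

```python
from datetime import date

def scd_type2_update(rows, key, new_city, change_date):
    """Type 2: close the current row and append a new one, keeping full history."""
    for row in rows:
        if row["customer_id"] == key and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({"customer_id": key, "city": new_city, "valid_from": change_date,
                 "valid_to": None, "is_current": True})

def scd_type3_update(row, new_city):
    """Type 3: keep one row and shift the old value into a 'previous' column."""
    row["previous_city"] = row["current_city"]
    row["current_city"] = new_city

type2 = [{"customer_id": 7, "city": "Pune", "valid_from": date(2020, 1, 1),
          "valid_to": None, "is_current": True}]
scd_type2_update(type2, 7, "Mumbai", date(2024, 6, 1))

type3 = {"customer_id": 7, "current_city": "Pune", "previous_city": None}
scd_type3_update(type3, "Mumbai")

print(len(type2), type2[0]["is_current"], type2[1]["city"])  # 2 False Mumbai
print(type3)  # {'customer_id': 7, 'current_city': 'Mumbai', 'previous_city': 'Pune'}
```

With Type 3, a second change overwrites `previous_city`, so only one level of history survives; Type 6 combines this column with Type 2 rows to keep both.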
22. Explain the ETL cycle's 3-layer architecture.
The three layers involved in an ETL cycle are the staging layer, the data integration layer, and the access layer.
Staging layer: Stores the raw data extracted from the various source systems.
Data integration layer: Transforms the data from the staging layer and loads it into the warehouse database. The data is organized into hierarchical groups (often referred to as dimensions), facts, and aggregates. In a DW system, the combination of fact and dimension tables is called a schema.
Access layer: End users retrieve data through the access layer for analytical reporting.
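The three layers can be sketched end to end in SQLite: raw text rows land in a staging table, the integration step cleans them into a dimension and a fact table, and end users read from a reporting view. The `stg_`, `dim_`, `fact_`, and `rpt_` names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging layer: raw extracts landed as-is (hypothetical source rows, all text).
conn.execute("CREATE TABLE stg_sales (store_name TEXT, sale_date TEXT, amount TEXT)")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", [
    (" north ", "2024-03-01", "120.50"),
    ("North", "2024-03-02", "79.50"),
    ("South", "2024-03-01", "150"),
])

# Integration layer: clean and conform the data into a dimension and a fact table.
conn.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT UNIQUE)")
conn.execute("CREATE TABLE fact_sales (store_id INTEGER, sale_date TEXT, amount REAL)")
conn.execute("""
    INSERT INTO dim_store (store_name)
    SELECT DISTINCT upper(trim(store_name)) FROM stg_sales
""")
conn.execute("""
    INSERT INTO fact_sales
    SELECT d.store_id, s.sale_date, CAST(s.amount AS REAL)
    FROM stg_sales AS s
    JOIN dim_store AS d ON d.store_name = upper(trim(s.store_name))
""")

# Access layer: end users query a reporting-friendly view, not the raw tables.
conn.execute("""
    CREATE VIEW rpt_sales_by_store AS
    SELECT d.store_name, SUM(f.amount) AS total_sales
    FROM fact_sales AS f JOIN dim_store AS d USING (store_id)
    GROUP BY d.store_name
""")
report = conn.execute("SELECT * FROM rpt_sales_by_store ORDER BY store_name").fetchall()
print(report)  # [('NORTH', 200.0), ('SOUTH', 150.0)]
```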
Conformed dimensions in data warehousing are dimensions that are consistent and shared
across multiple fact tables or data marts, allowing for standardized reporting and analysis.
These dimensions are designed to be used universally across different parts of the
organization, ensuring that data from different sources or business areas can be combined
and compared accurately.
Example:
A Date dimension can be a conformed dimension used across multiple fact tables like
Sales Fact, Inventory Fact, and Shipping Fact, ensuring that all the facts are analyzed
in the context of the same calendar dates.
In this case, the Date dimension is conformed because it has the same structure and attributes in every fact table that uses it, which keeps the data consistent.
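Because both fact tables share the same Date dimension, their measures can be compared month by month in a single "drill-across" query. This SQLite sketch assumes a hypothetical schema with `dim_date`, `fact_sales`, and `fact_shipping`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One conformed Date dimension shared by both fact tables (hypothetical schema).
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_month TEXT);
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date, units_sold INTEGER);
    CREATE TABLE fact_shipping (date_key INTEGER REFERENCES dim_date, units_shipped INTEGER);

    INSERT INTO dim_date VALUES (20240115, '2024-01'), (20240210, '2024-02');
    INSERT INTO fact_sales VALUES (20240115, 30), (20240210, 45);
    INSERT INTO fact_shipping VALUES (20240115, 28), (20240210, 44);
""")

# Drill-across: aggregate each fact on its own, then join on the shared attribute.
rows = conn.execute("""
    WITH s AS (SELECT d.calendar_month, SUM(f.units_sold) AS sold
               FROM fact_sales AS f JOIN dim_date AS d USING (date_key)
               GROUP BY d.calendar_month),
         sh AS (SELECT d.calendar_month, SUM(f.units_shipped) AS shipped
                FROM fact_shipping AS f JOIN dim_date AS d USING (date_key)
                GROUP BY d.calendar_month)
    SELECT s.calendar_month, s.sold, sh.shipped
    FROM s JOIN sh USING (calendar_month)
    ORDER BY s.calendar_month
""").fetchall()
print(rows)  # [('2024-01', 30, 28), ('2024-02', 45, 44)]
```

If each fact table used its own, differently structured date table, the join on `calendar_month` would not be guaranteed to line up.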
11.
Non-additive facts in data warehousing are facts or measures that cannot be meaningfully summed across any dimension; typical examples are ratios, percentages, and unit prices. Summing them produces wrong results, and even a simple average can mislead, so they require specialized handling when used in queries.
For non-additive facts, the appropriate aggregation must be defined explicitly (e.g., recomputing a ratio from its summed numerator and denominator rather than summing the ratios themselves).
Specialized reporting or querying techniques are used to handle them, ensuring that
incorrect aggregations don’t occur.
In summary, non-additive facts require careful treatment in a data warehouse to ensure that
they are correctly calculated and interpreted across dimensions.
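A profit-margin percentage is a typical non-additive fact. The sketch below, using two hypothetical rows, shows why summing the margins is meaningless and why a plain average of ratios can also mislead when rows differ in volume; recomputing the ratio from its additive components is usually the safest option.

```python
# Hypothetical rows: revenue and cost are additive; margin_pct is not.
rows = [
    {"revenue": 1000.0, "cost": 900.0},  # 10% margin
    {"revenue": 100.0, "cost": 50.0},    # 50% margin
]
for r in rows:
    r["margin_pct"] = 100 * (r["revenue"] - r["cost"]) / r["revenue"]

wrong_sum = sum(r["margin_pct"] for r in rows)              # 60.0: meaningless
naive_avg = sum(r["margin_pct"] for r in rows) / len(rows)  # 30.0: ignores volume

# Safer: aggregate the additive components first, then recompute the ratio.
total_revenue = sum(r["revenue"] for r in rows)
total_cost = sum(r["cost"] for r in rows)
true_margin = 100 * (total_revenue - total_cost) / total_revenue

print(round(true_margin, 2))  # 13.64
```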
A VLDB (very large database) is a database that typically holds one terabyte of data or more. It requires extensive storage and contains a very large number of rows. VLDBs typically support decision support and transaction processing applications for a large number of users.
Summary of Operations:
- Roll-up
- Slice
- Dice
- Drill-down
- Pivot
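The five OLAP operations can be sketched on a tiny in-memory cube. The cells, the `POSITION` mapping, and the `aggregate` helper are hypothetical; an OLAP server or SQL `GROUP BY` would do the same work at scale.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, quarter, region, product, sales_amount).
cells = [
    (2023, "Q1", "East", "Laptop", 100),
    (2023, "Q1", "West", "Phone", 80),
    (2023, "Q2", "East", "Phone", 60),
    (2024, "Q1", "West", "Laptop", 120),
]
POSITION = {"year": 0, "quarter": 1, "region": 2, "product": 3}

def aggregate(rows, *dims):
    """Sum sales over every dimension not listed in `dims`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[POSITION[d]] for d in dims)] += row[4]
    return dict(totals)

# Roll-up: climb the time hierarchy to the year level.
print(aggregate(cells, "year"))             # {(2023,): 240, (2024,): 120}
# Drill-down: descend from year to year + quarter.
print(aggregate(cells, "year", "quarter"))  # {(2023, 'Q1'): 180, (2023, 'Q2'): 60, (2024, 'Q1'): 120}
# Slice: fix a single dimension value (region = East).
east_slice = [c for c in cells if c[POSITION["region"]] == "East"]
# Dice: select a sub-cube with conditions on two dimensions.
dice = [c for c in cells if c[0] == 2023 and c[1] == "Q1"]
# Pivot: rotate the view so products are rows and regions are columns.
pivot = defaultdict(dict)
for (product, region), amount in aggregate(cells, "product", "region").items():
    pivot[product][region] = amount
print(dict(pivot))  # {'Laptop': {'East': 100, 'West': 120}, 'Phone': {'West': 80, 'East': 60}}
```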
13. You face resistance from a business unit leader who feels their data needs are not
adequately addressed by the data warehouse. How would you handle this situation?
Handling resistance from a business unit leader who feels their data needs are not
adequately addressed by the data warehouse requires a strategic, empathetic, and solution-
oriented approach. Here’s a step-by-step way to address the situation:
o Example: "Can you share some specific examples where you feel the data
warehouse is not meeting your needs?"
Clarify Expectations: Understand their expectations for data access, quality, and
timeliness.
o Example: "What types of reports or data access would help you make better
decisions?"
Acknowledge the Problem: Show empathy and validate that their concerns are
legitimate. Business leaders want to feel heard and understood.
o Example: "I understand that the current data warehouse setup might not fully
address your reporting needs, and I see how that can affect your ability to
make informed decisions."
Analyze the Data Gaps: Investigate whether the issues stem from:
Propose Immediate Fixes: If the issues are simple to resolve (e.g., a missing report or
data field), prioritize and provide a solution.
o Example: "We can quickly add the requested data field to the dashboard you
need."
Long-term Improvements: If the issues are more complex (e.g., changes to data
models or integration with new systems), provide a roadmap outlining how these
changes will be implemented.
o Example: "It will take a few weeks to restructure the data model to
accommodate your business unit's unique needs, but we can start working on
it immediately and provide progress updates."
o Example: "Let's have regular check-ins to ensure the data warehouse evolves
in line with your evolving needs."
Enhance Reporting and Training: If the issue lies in usability or access, offer
additional training sessions or work with the business unit to create tailored reports
or dashboards.
o Example: "We can schedule a workshop to help your team better understand
how to extract the data you need from the warehouse."
o Example: "While some changes can be made within a few days, others may
require more development time, but we’ll provide updates as we make
progress."
o Example: "Let’s meet again next month to review how the new reports are
helping and if any further adjustments are needed."
Summary Approach:
14. You need to explain the benefits of data warehousing to executives unfamiliar with
the technical side. How would you communicate its value in layman’s terms?
Answer: Use analogies like comparing the data warehouse to a central library for historical
information. Explain how it enables informed decision-making by providing easily accessible,
accurate, and integrated data insights.
When explaining the value of data warehousing to executives who may not be familiar with
the technical aspects, it’s important to focus on the business benefits and outcomes rather
than technical jargon. Here's how you could communicate its value in layman’s terms:
Summary of Benefits
15.