
Unit I Data Warehousing Notes Hinglish FULL CLEAN

Unit I of the Data Warehousing & Data Mining course provides an overview of data warehousing, including its definition, components, architecture, and advantages. It distinguishes between OLTP and OLAP systems, explains different schemas like star and snowflake, and details the ETL process. The unit concludes by emphasizing the importance of data warehousing in supporting business intelligence and decision-making.

KOE093: Data Warehousing & Data Mining - Unit I Notes

Unit I: Data Warehousing

Data Warehousing: Overview, Definition, Components, Architecture

Introduction:

Data warehousing is a concept in which data is collected from different data sources and stored in a centralized repository. It is used mainly for analysis and reporting.

Definition:

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports the decision-making process.

Components of Data Warehouse:

1. Data Sources - Multiple OLTP systems such as banking apps, CRMs, ERPs, etc.

2. ETL Process - ETL stands for Extract, Transform, Load. This process extracts the data, cleans it, and loads it into the warehouse.

3. Data Warehouse Database - The place where the cleaned and integrated data is stored.

4. Metadata - Data about the data, such as source, format, and update frequency.

5. Front-End Tools - Used by analysts for reporting and data analysis.

Architecture of Data Warehouse:

1. Single-tier architecture

2. Two-tier architecture

3. Three-tier architecture (most common)

- Bottom Tier: Database server (relational database)

- Middle Tier: OLAP server

- Top Tier: Front-end tools for reporting, data mining, etc.

Difference between OLTP and OLAP:

- OLTP (Online Transaction Processing): For real-time transactions

- OLAP (Online Analytical Processing): For analysis and complex queries
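
The contrast can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample rows are hypothetical, and in practice the OLTP system and the OLAP server are separate systems rather than one in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

def place_order(region, amount):
    """OLTP-style work: a short transaction that writes a single row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
    return cur.lastrowid

def sales_by_region():
    """OLAP-style work: a read-only query that scans and aggregates many rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)

place_order("North", 100.0)
place_order("South", 250.0)
place_order("North", 50.0)
totals = sales_by_region()
```

The write path touches one row at a time, while the analytical query reads every row to produce a summary, which is the difference in workload the two terms describe.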

Multidimensional Data Model:

- Represents data in the form of cubes.

- Dimensions: Customer, Time, Product, etc.

- Measures: Sales, Profit etc.
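
A cube can be pictured as a mapping from dimension coordinates to a measure. The sketch below is a toy model, not how an OLAP server stores data; the customer names, quarters, and sales figures are made up, and `roll_up` and `slice_cube` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical cube cells: each cell is addressed by
# (customer, time, product) and holds the measure "sales".
DIMENSIONS = ("customer", "time", "product")
cells = {
    ("Asha", "2024-Q1", "Laptop"): 1200,
    ("Asha", "2024-Q2", "Phone"): 600,
    ("Ravi", "2024-Q1", "Phone"): 500,
    ("Ravi", "2024-Q1", "Laptop"): 1100,
}

def roll_up(cube, keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for coords, sales in cube.items():
        out[tuple(coords[i] for i in idx)] += sales
    return dict(out)

def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value and keep the matching cells."""
    i = DIMENSIONS.index(dimension)
    return {c: v for c, v in cube.items() if c[i] == value}

sales_by_product = roll_up(cells, ["product"])
q1_only = slice_cube(cells, "time", "2024-Q1")
```

Here `sales_by_product` sums sales over customers and time, and `q1_only` keeps only the cells for one quarter.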

Schemas in Data Warehouse:

1. Star Schema - Central fact table with dimension tables.

2. Snowflake Schema - Normalized version of star schema.

3. Fact Constellation Schema - Multiple fact tables sharing dimension tables.

Concepts:

- Facts: Quantifiable data like sales amount, units sold.

- Dimensions: Contextual information like date, product, location.

- Data Marts: Subset of data warehouse specific to business line.
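
As a rough illustration of a data mart as a subset, the sketch below defines one as a SQL view over a warehouse table, using Python's sqlite3 module. The table, view, and column names are hypothetical, and using a view is a simplification: a data mart is often a separate physical store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Facts (units_sold, sales_amount) with their context (business_line, product).
CREATE TABLE warehouse_facts (
    business_line TEXT, product TEXT, units_sold INTEGER, sales_amount REAL
);
INSERT INTO warehouse_facts VALUES
    ('retail', 'Laptop', 2, 2400.0),
    ('retail', 'Phone', 5, 3000.0),
    ('wholesale', 'Laptop', 40, 40000.0);
-- The data mart: only the rows one business line needs.
CREATE VIEW retail_mart AS
    SELECT product, units_sold, sales_amount
    FROM warehouse_facts
    WHERE business_line = 'retail';
""")

mart_rows = conn.execute(
    "SELECT product, units_sold FROM retail_mart ORDER BY product"
).fetchall()
```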

Advantages of Data Warehousing:

1. Better Decision Making - Access to historical data helps in decision making.

2. Data Consistency - Data is stored in a standardized form.

3. Improved Business Intelligence - Analytical queries can be run easily.

4. High Performance - Queries run fast because the warehouse is kept separate from transactional data.

5. Data Integration - Data from multiple sources is integrated in one place.

Characteristics of Data Warehouse:

1. Subject-Oriented - Focuses on business subjects (sales, marketing, finance).

2. Integrated - Data from different sources is cleaned and integrated.

3. Time-Variant - Data is historical and carries a time dimension.

4. Non-Volatile - Once data is stored, it does not change.
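
The last two characteristics can be sketched together in a few lines of Python. This is a toy model under stated assumptions: `PriceSnapshot` and `AppendOnlyStore` are hypothetical names, and the products, prices, and dates are invented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a stored row cannot be modified afterwards
class PriceSnapshot:
    product: str
    price: float
    valid_on: date  # time-variant: every row carries a time element

class AppendOnlyStore:
    """Non-volatile: rows can be added and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        self._rows.append(row)

    def history(self, product):
        matching = (r for r in self._rows if r.product == product)
        return sorted(matching, key=lambda r: r.valid_on)

store = AppendOnlyStore()
store.load(PriceSnapshot("Laptop", 1200.0, date(2024, 1, 1)))
# A price change arrives as a new row, not as an overwrite of the old one.
store.load(PriceSnapshot("Laptop", 1100.0, date(2024, 6, 1)))
prices_over_time = [r.price for r in store.history("Laptop")]
```

Because old rows are kept, the full price history stays available for analysis.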

Types of Data Warehouse:

1. Enterprise Data Warehouse (EDW) - A centralized warehouse for the entire organization.

2. Operational Data Store (ODS) - Used for real-time data.

3. Data Mart - A subset of the data warehouse for a specific department (such as sales or finance).

Data Warehouse vs Database:

| Feature       | Database           | Data Warehouse              |
|---------------|--------------------|-----------------------------|
| Purpose       | Daily transactions | Analytical processing       |
| Data Type     | Current data       | Historical data             |
| Normalization | Highly normalized  | Denormalized schemas        |
| Users         | Clerks, DB admins  | Business analysts, managers |

Multidimensional Data Modeling:

- This modeling technique organizes data in the form of facts and dimensions.

- Example: Sales data can be analyzed across dimensions like Time, Product, and Region.

Schemas Explained:

1. Star Schema:

- There is one fact table, which holds the numeric data.

- Dimension tables connect directly to the fact table.

- Easy to understand, with fast queries.

2. Snowflake Schema:

- Dimension tables are normalized.

- Requires more joins but uses less storage.

3. Fact Constellation Schema:

- There are multiple fact tables.

- Dimension tables are shared.

- Complex design, but flexible.
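
The difference between the star and snowflake layouts can be sketched in SQL, run here through Python's sqlite3 module. All table names, column names, and rows are hypothetical; the point is only that the snowflake query needs one more join to reach the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: dimension tables link directly to the fact table.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    product_id   INTEGER REFERENCES dim_product(product_id),
    region_id    INTEGER REFERENCES dim_region(region_id),
    sales_amount REAL
);
-- Snowflake variant: category moves out of the product dimension into
-- its own table, so each category name is stored only once.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product_sf (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobiles');
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_category VALUES (1, 'Computers'), (2, 'Mobiles');
INSERT INTO dim_product_sf VALUES (1, 'Laptop', 1), (2, 'Phone', 2);
INSERT INTO fact_sales VALUES (1, 1, 1200.0), (2, 1, 600.0), (1, 2, 1100.0);
""")

STAR_QUERY = """
SELECT p.category, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY p.category ORDER BY p.category
"""

SNOWFLAKE_QUERY = """
SELECT c.name, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_product_sf p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.name ORDER BY c.name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

A fact constellation would add a second fact table (for example, a hypothetical `fact_returns`) that reuses `dim_product` and `dim_region`.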

Data Warehouse Metadata:

- Technical Metadata: ETL logic, source mappings, transformation rules.

- Business Metadata: Definitions, owners, policies.

- Metadata management makes the warehouse easier to maintain.
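
One way to picture the two kinds of metadata is a small catalog record per warehouse table. This is a hypothetical sketch: `TableMetadata`, `register`, `lineage`, and every field value are invented for illustration and do not correspond to any particular metadata tool.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    table: str
    # Technical metadata: where the data comes from and how it is changed.
    source_system: str
    transformation_rules: list = field(default_factory=list)
    # Business metadata: what the data means and who is responsible for it.
    definition: str = ""
    owner: str = ""

catalog = {}

def register(meta):
    """Add or replace the catalog entry for one table."""
    catalog[meta.table] = meta

def lineage(table):
    """Summarize where a table's data came from."""
    m = catalog[table]
    return f"{table} <- {m.source_system} via {len(m.transformation_rules)} rule(s)"

register(TableMetadata(
    table="fact_sales",
    source_system="ERP",
    transformation_rules=["deduplicate on order_id", "convert amounts to one currency"],
    definition="One row per product sold per order",
    owner="Sales analytics team",
))
summary = lineage("fact_sales")
```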

ETL (Extract, Transform, Load) in Detail:

1. Extract:

- Extract raw data from the source systems.

- Examples: ERP, CRM systems.

2. Transform:

- Data cleaning, deduplication, format conversion.

- Business rules are applied.

3. Load:

- Load the data into the warehouse or a data mart.

- Both incremental and full loads are possible.
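
The three steps above can be sketched as a tiny pipeline using only the Python standard library. Everything specific here is an assumption made for illustration: the CSV layout, the `sales` table, and the rule that rows without an amount are skipped. The sketch performs a full load; an incremental load would insert only rows newer than the last run.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system such as a CRM.
RAW_CSV = """customer,amount,date
 asha ,1200.50,2024-01-05
Ravi,600,2024-01-06
 asha ,1200.50,2024-01-05
Meera,,2024-01-07
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean, convert formats, apply a rule, and deduplicate."""
    seen, clean = set(), []
    for r in rows:
        if not r["amount"]:  # business rule (assumed): skip rows with no amount
            continue
        rec = (r["customer"].strip().title(), float(r["amount"]), r["date"])
        if rec not in seen:  # deduplication
            seen.add(rec)
            clean.append(rec)
    return clean

def load(conn, records):
    """Load: append the cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

Of the four raw rows, one duplicate and one row with a missing amount are dropped, so two rows reach the warehouse table.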

ETL Tools:

- Informatica

- Talend

- Microsoft SSIS

- Apache NiFi

Data Warehouse Architecture:

1. Single-tier:

- Everything sits in a single layer. Complex and rarely used.

2. Two-tier:

- The client connects directly to the OLAP server. Performance bottlenecks can occur.

3. Three-tier (Standard):

- Bottom tier: Data warehouse server

- Middle tier: OLAP engine

- Top tier: Front-end tools
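
The three-tier layering can be pictured as three objects that each talk only to the tier below. This is a conceptual sketch, not a real deployment: the class names, the in-memory rows, and the `report` format are all hypothetical.

```python
from collections import defaultdict

class WarehouseServer:
    """Bottom tier: stores the integrated data (here just a list of dicts)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def scan(self):
        return iter(self._rows)

class OlapEngine:
    """Middle tier: turns stored rows into aggregated answers."""

    def __init__(self, server):
        self._server = server

    def total_by(self, dimension, measure):
        totals = defaultdict(float)
        for row in self._server.scan():
            totals[row[dimension]] += row[measure]
        return dict(totals)

class ReportTool:
    """Top tier: a front-end that only talks to the OLAP engine."""

    def __init__(self, engine):
        self._engine = engine

    def report(self, dimension, measure):
        totals = self._engine.total_by(dimension, measure)
        return [f"{key}: {value:.2f}" for key, value in sorted(totals.items())]

server = WarehouseServer([
    {"region": "North", "sales": 100.0},
    {"region": "South", "sales": 250.0},
    {"region": "North", "sales": 50.0},
])
lines = ReportTool(OlapEngine(server)).report("region", "sales")
```

Keeping the front-end unaware of the storage layer is what lets each tier be replaced or scaled independently.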

Data Warehousing Technologies:

- Relational Databases (Oracle, SQL Server, MySQL)

- Columnar Storage (Amazon Redshift, BigQuery)

- Cloud Data Warehousing (Snowflake, Azure Synapse)

- Hadoop-based DW (Hive, Impala)

Conclusion:

Data warehousing is the backbone of modern business intelligence systems. It provides historical, clean, integrated data that helps in strategic decisions.

The next unit covers Data Warehouse Processes and Technologies. This completes Unit I, which explains all the key concepts with examples and comparisons.
