
dwm module 1 (1.1)

A Data Warehouse is a centralized system that collects, cleans, organizes, and stores data from various sources for business analysis and decision-making. It is characterized by being subject-oriented, integrated, non-volatile, time-variant, and optimized for querying and reporting. The architecture includes components like data sources, ETL processes, storage, metadata, and tools for querying and reporting.

Uploaded by

shalom.lane

1.1 Data Warehouse and Dimensional Modelling

📘 Introduction to Data Warehouse: In-depth Detail + Short Summary

✅ Short Explanation:

A Data Warehouse is a centralized system in which data from different sources (such as sales, finance, and marketing) is collected, cleaned, organized, and stored so that it can be used for business analysis and decision-making. It mostly stores historical data and is built for OLAP (Online Analytical Processing) rather than OLTP (Online Transaction Processing).

📘 In-Depth Explanation:

🔹 1. What is a Data Warehouse?

A Data Warehouse is a central repository of data collected from multiple heterogeneous sources. It stores historical and current data in a single place, where it is used for analytical reporting, data mining, and business decision-making.

✅ Data Warehouse Characteristics: In-depth Explanation in Short with Examples

1. Subject-Oriented

➡ Meaning: Data is organized around business subjects such as sales, finance, and customers.
➡ Example: If a company only wants to study customer behaviour, it can work with the customer subject area, which holds just the customer-related data.

2. Integrated

➡ Meaning: Data arriving from different sources is stored in one clean, consistent format.
➡ Example: One source records gender as "M/F" and another as "Male/Female"; the Data Warehouse standardizes both to "Male"/"Female".

3. Non-Volatile

➡ Meaning: Once data is stored, it is not changed. Updates are stored as new records instead.
➡ Example: When a product's price changes, the old price is not deleted; the new price is added as a new record.
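
The append-only behaviour described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `PriceRecord` and `PriceHistory` names, the product ID, and the prices are all made up for the example, and a real warehouse would keep this history in tables rather than in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PriceRecord:
    """One immutable row of price history (hypothetical structure)."""
    product_id: str
    price: float
    effective_from: date


class PriceHistory:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._records: List[PriceRecord] = []

    def record_price(self, product_id: str, price: float, effective_from: date) -> None:
        # A price change is stored as a new row; older rows stay untouched.
        self._records.append(PriceRecord(product_id, price, effective_from))

    def price_on(self, product_id: str, day: date) -> Optional[float]:
        """Return the price in effect on `day`, or None if nothing was recorded yet."""
        candidates = [
            r for r in self._records
            if r.product_id == product_id and r.effective_from <= day
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.effective_from).price

    def history(self, product_id: str) -> List[PriceRecord]:
        """All recorded prices for a product, oldest first."""
        return sorted(
            (r for r in self._records if r.product_id == product_id),
            key=lambda r: r.effective_from,
        )


prices = PriceHistory()
prices.record_price("P-100", 499.0, date(2023, 1, 1))
prices.record_price("P-100", 549.0, date(2024, 1, 1))  # price change = new row
```

Because every row carries its own effective date, the same structure also supports the historical lookups that the time-variant property relies on, for example `prices.price_on("P-100", date(2023, 6, 1))`.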

4. Time-Variant

➡ Meaning: Data carries a timestamp so that historical analysis is possible.
➡ Example: If the company needs sales data for the last 5 years, the Data Warehouse holds the data for all of those dates.

5. Data Consolidation

➡ Meaning: Data is taken from different databases and files and stored in one place.
➡ Example: Combining employee data from the HR system, customer data from the CRM, and finance data from SAP.

6. Optimized for Querying & Reporting

➡ Meaning: It is built for complex queries and reports (it supports OLAP).
➡ Example: If the CEO wants to compare quarterly sales across all of India, a single query can produce the report.
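
As a rough illustration of the kind of single query mentioned above, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a real warehouse engine. The `sales` table, its columns, and the sample rows are assumptions invented for the example.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "2024-01-15", 100.0),
        ("North", "2024-02-20", 150.0),
        ("North", "2024-05-05", 200.0),
        ("South", "2024-03-10", 300.0),
        ("South", "2024-04-12", 50.0),
    ],
)

# One query produces the whole quarterly comparison.
# (month + 2) / 3 uses integer division to map months 1-12 to quarters 1-4.
quarterly_sales = conn.execute(
    """
    SELECT region,
           (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
           SUM(amount) AS total
    FROM sales
    GROUP BY region, quarter
    ORDER BY region, quarter
    """
).fetchall()

for region, quarter, total in quarterly_sales:
    print(f"{region} Q{quarter}: {total}")
```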

7. Supports Decision-Making

➡ Meaning: The business gets accurate data insights for making the right decisions.
➡ Example: If the marketing team wants to know which campaign performed best, it can run that analysis on the Data Warehouse.

8. Read-Only Access

➡ Meaning: Users can only view the data; they cannot change it.
➡ Example: An analyst only runs queries and cannot update or delete data.

9. Large-Scale Storage

➡ Meaning: It is designed to store many years of data.
➡ Example: A bank keeps 15 years of transaction history safely in its Data Warehouse.

10. ETL (Data Cleaning & Transformation)

➡ Meaning: Data is cleaned, filtered, and formatted before it is loaded.
➡ Example: Removing dirty values such as "NULL" or "N/A" and converting the data to a proper format.
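
A minimal sketch of this cleaning step in Python follows. The set of placeholder spellings and the sample row are assumptions; a real job would use whatever markers its own sources actually produce.

```python
from typing import Dict, Optional

# Assumed spellings of "missing" placeholders; extend to match the real sources.
MISSING_MARKERS = {"", "null", "n/a", "na", "none"}


def clean_value(raw: Optional[str]) -> Optional[str]:
    """Trim whitespace and turn placeholder strings into a real None."""
    if raw is None:
        return None
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return text


def clean_row(row: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Apply clean_value to every column of one source row."""
    return {column: clean_value(value) for column, value in row.items()}


dirty = {"name": "  Asha ", "city": "N/A", "phone": "NULL"}
cleaned = clean_row(dirty)
```

Replacing the placeholders with `None` rather than dropping the row is a design choice made here for illustration; whether to drop, default, or flag such values depends on the reporting requirements.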

11. Accessibility

➡ Meaning: Non-technical people can also access the data easily through dashboards and charts.
➡ Example: An HR manager can view an employee turnover report on a dashboard without writing any code.

12. Scalable & Flexible

➡ Meaning: The Data Warehouse copes with growing data volumes and with new data types (the notes mention videos and PDFs, although unstructured files like these are more often kept in a data lake next to the warehouse).
➡ Example: Earlier there was only Excel data; now data also comes in from APIs and the cloud, and the warehouse still works smoothly.

🔚 Final Summary in One Line:

A Data Warehouse is a smart storage system that holds clean, well-organized data from the past up to the present, so that the business can make fast, smart, and strategic decisions.

✅ Data Warehouse Architecture Components: Short In-Depth Explanation with Examples

1. Data Sources

➡ Meaning: Where the data comes from.
➡ Types:

● Operational DBs: e.g., an online shopping site's order database (OLTP).

● External Sources: e.g., Facebook Ads data, Google Analytics, APIs.

● Flat Files: e.g., Excel sheets, CSV files from old systems.

📌 Example: A company has an SAP system, Excel sales reports, and social media campaign data; each of these is a separate data source.

2. ETL (Extract, Transform, Load)

➡ Meaning: Collecting the data, cleaning it, and storing it in the warehouse.

● Extract: Pull the data from the sources.

● Transform: Clean, format, and merge it.

● Load: Put the final data into the warehouse.

📌 Example: Converting "M/F" to "Male/Female" and saving the result in the warehouse.
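
The three steps can be sketched end to end in Python. Everything specific here is an assumption for illustration: the inline CSV stands in for a source system, `sqlite3` stands in for the warehouse, and the `dim_customer` table name and `GENDER_MAP` are invented.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read a file or query a database.
SOURCE_CSV = """customer_id,name,gender
1,Asha,F
2,Ravi,M
3,Meera,Female
"""

# Assumed mapping from source spellings to the warehouse's standard values.
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fix types, trim text, and standardize gender codes."""
    records = []
    for row in rows:
        gender = GENDER_MAP.get(row["gender"].strip().lower(), "Unknown")
        records.append((int(row["customer_id"]), row["name"].strip(), gender))
    return records


def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, gender TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute(
    "SELECT customer_id, name, gender FROM dim_customer ORDER BY customer_id"
).fetchall()
```

Keeping each step in its own function makes it easy to test the transformation rules separately from the source and the target.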

3. Data Warehouse Storage

➡ Parts:

● Staging Area: Raw data is held here temporarily.

● Repository: Holds the final clean, structured data.

● Data Marts: Data for a specific department, such as marketing only or HR only.

📌 Example: A raw CSV lands in staging, is cleaned and moved into the repository, and the HR department gets access only to the HR data mart.
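
One way to picture the three storage parts is as three kinds of database objects. The sketch below uses `sqlite3` with invented names (`staging_employees`, `repo_employees`, `mart_hr_headcount`) and models the data mart as a view, which is only one of several ways a mart can be built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Staging area: raw text exactly as it arrived.
    CREATE TABLE staging_employees (emp_id TEXT, name TEXT, dept TEXT);
    -- Repository: clean, typed, structured data.
    CREATE TABLE repo_employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    """
)

# Raw rows land in staging first (sample data is made up).
conn.executemany(
    "INSERT INTO staging_employees VALUES (?, ?, ?)",
    [("1", " Asha ", "sales"), ("2", "Ravi", "Sales"), ("3", "Meera", "Finance")],
)

# Clean the rows while moving them into the repository.
conn.execute(
    """
    INSERT INTO repo_employees
    SELECT CAST(emp_id AS INTEGER), TRIM(name), UPPER(TRIM(dept))
    FROM staging_employees
    """
)
conn.execute("DELETE FROM staging_employees")  # staging is only temporary

# Data mart: a narrow, department-specific slice for the HR team.
conn.execute(
    "CREATE VIEW mart_hr_headcount AS "
    "SELECT dept, COUNT(*) AS headcount FROM repo_employees GROUP BY dept"
)
headcount = conn.execute(
    "SELECT dept, headcount FROM mart_hr_headcount ORDER BY dept"
).fetchall()
staging_left = conn.execute("SELECT COUNT(*) FROM staging_employees").fetchone()[0]
```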

4. Metadata Layer

➡ Meaning: "Data about data": which source it came from, what was changed, and what format it is in.
📌 Example: A Power BI user can see which date range a sales report was generated for and which source the data came from.
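
To make "data about data" concrete, here is a small hypothetical metadata record in Python. The `DatasetMetadata` class and all of its field values are assumptions chosen to mirror the items listed above (source, changes, format, date range); real metadata catalogs carry far more detail.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record describing one warehouse table."""
    table: str
    source_system: str   # which source the data came from
    source_format: str   # what format it arrived in
    period_start: date   # first date covered by the data
    period_end: date     # last date covered by the data
    transformations: List[str] = field(default_factory=list)  # what was changed

    def describe(self) -> str:
        steps = "; ".join(self.transformations) or "none"
        return (
            f"{self.table}: from {self.source_system} ({self.source_format}), "
            f"covering {self.period_start} to {self.period_end}; changes: {steps}"
        )


sales_meta = DatasetMetadata(
    table="fact_sales",
    source_system="SAP",
    source_format="CSV",
    period_start=date(2024, 1, 1),
    period_end=date(2024, 3, 31),
    transformations=["removed N/A rows", "standardized gender codes"],
)
summary = sales_meta.describe()
```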

5. Data Query and Reporting Tools

➡ Meaning: Tools for viewing, analyzing, and visualizing the data.

● Tools: Tableau, Power BI, SQL, QlikView, Excel Dashboards.

📌 Example: A Sales Manager builds a chart in Power BI to see how sales went over the last 6 months.

6. OLAP (Online Analytical Processing)

➡ Meaning: Analyzing data in a multi-dimensional way: slice, dice, drill down, roll up.
📌 Example: A Regional Manager drills down from Mumbai city → Q2 sales → sales of a specific product.
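
The four operations can be illustrated on a tiny in-memory "cube" in plain Python. The fact rows, the dimension names, and the `aggregate` and `select` helpers are all assumptions made for this sketch; an OLAP engine does the same thing at scale with precomputed aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, quarter, product, sales_amount)
FACTS = [
    ("Mumbai", "Q1", "Laptop", 120),
    ("Mumbai", "Q2", "Laptop", 150),
    ("Mumbai", "Q2", "Phone", 90),
    ("Delhi", "Q2", "Laptop", 80),
    ("Delhi", "Q2", "Phone", 60),
]
DIMENSIONS = ("city", "quarter", "product")


def aggregate(facts, by):
    """Sum sales grouped by the named dimensions."""
    positions = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in positions)] += row[3]
    return dict(totals)


def select(facts, **conditions):
    """Keep rows whose dimensions match; each condition is a value or a set."""
    def matches(row):
        for dim, wanted in conditions.items():
            allowed = wanted if isinstance(wanted, (set, frozenset)) else {wanted}
            if row[DIMENSIONS.index(dim)] not in allowed:
                return False
        return True
    return [row for row in facts if matches(row)]


# Roll up: collapse quarter and product, leaving totals per city.
roll_up = aggregate(FACTS, by=("city",))
# Slice: fix a single dimension to one value.
slice_q2 = select(FACTS, quarter="Q2")
# Dice: restrict two or more dimensions at once.
dice = select(FACTS, city={"Mumbai", "Delhi"}, product={"Phone"})
# Drill down: Mumbai -> Q2 -> sales per product.
drill_down = aggregate(select(FACTS, city="Mumbai", quarter="Q2"), by=("product",))
```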

7. Data Mining Tools

➡ Meaning: Finding hidden patterns or making future predictions using algorithms.

● Tools: Python ML libraries, R, Weka, SAS.

📌 Example: A bank uses past data to predict which customers are likely not to repay a loan.

8. DBMS (Database Management System)

➡ Meaning: Storing, retrieving, and managing data efficiently.

● Examples: Oracle, SQL Server, PostgreSQL.

📌 Example: A retailer stores its warehouse data in a DBMS such as Oracle for fast queries.

9. Data Governance & Security

➡ Meaning: Keeping data secure and controlling access to it.

● Includes: Encryption, Access Control, Compliance (GDPR).

📌 Example: Only the Finance department can see salary data; other departments cannot access it.

10. Middleware

➡ Meaning: Making data communication between systems smooth.
📌 Example: Connecting Excel to Power BI through an API, which goes through middleware.

11. User Interface Layer

➡ Meaning: Easy-to-use dashboards or portals in which non-technical users can also view data.
📌 Example: A CEO can view the monthly profit chart directly on a mobile app dashboard.

12. Performance & Monitoring Tools

➡ Meaning: Checking the health of the data warehouse: slow queries, whether an ETL run failed.

● Examples: Load balancer, ETL logs, Query performance analyzer.

📌 Example: If a query is slow, the monitoring tool raises an alert: "Optimization needed."

🔚 Final Summary (In One Line):

A Data Warehouse Architecture is a complete system in which data arrives from the data sources, is cleaned and stored, and is then used through tools for querying, reporting, and analysis in a secure, fast, and scalable way.

Architecture of Data Warehouse
Data Warehouse vs. Data Mart
