ML Module1 Ppt - Copy

The document provides an overview of data warehousing and modeling, emphasizing its importance for businesses to analyze and utilize data for strategic decision-making. It defines data warehouses, outlines their key features, and compares them with operational databases, highlighting their role in business intelligence. Additionally, it discusses various types of data warehouses, data integration processes, and OLAP operations for data analysis.

Uploaded by

vvce22cseaiml0123


Module 1

Data warehousing & Modelling

Motivation for data warehouse
With competition mounting in every industry, data warehousing is the
must-have marketing weapon for retaining customers by learning more
about their needs.

Data warehousing provides tools & architecture for business
executives to organize, understand, and use data to make strategic decisions.
A data warehouse refers to a data repository that is maintained
separately from an organization’s operational databases. Data
warehouse systems allow for integration of a variety of application
systems. They support information processing by providing a solid
platform of consolidated historic data for analysis.
Oracle definition of data warehouse
• A data warehouse is a type of data management system that is
designed to enable and support business intelligence (BI) activities,
especially analytics. Data warehouses are solely intended to perform
queries and analysis and often contain large amounts of historical
data.
IBM definition
• A core component of business intelligence, a data warehouse
pulls together data from many different sources into a single
data repository for sophisticated analytics and decision
support.
Key features of a data warehouse

Integrated: data from various sources, kept in a consistent format

Subject-oriented: organized around a particular subject, ex: customer, sales (instead of the
whole organization's data)

Nonvolatile: data is not changed once loaded; access is read-only

Time-variant: data is recorded with respect to time (ex: sales over 1-5 years)

• A data warehouse is mainly used for decision support
• Knowledge workers such as managers & analysts work on the data warehouse
to obtain an overview of, or insights into, the data
• It supports business decisions such as:
✓Analysis of customer buying patterns (ex: combo)
✓Repositioning products and managing product portfolios according to
sales over time and by geographical region (ex: clothes)
✓Coming up with strategies to improve profit
✓Managing customer relationships and corporate assets
▪ Organizations collect diverse, heterogeneous, and distributed data;
integrating it and providing access to it is the major challenge
▪ In the traditional database approach, an integrator/wrapper is used to
address this
▪ The integrator is a mediator on top of multiple heterogeneous databases:
when a query is posed to a client site, a metadata dictionary is used to
translate it into queries appropriate for each individual database
▪ These queries are mapped and sent to local query processors.
▪ The results returned from the different sites are integrated into a global
answer set
▪ This query-driven approach requires complex information filtering and
integration processes and competes with local sites for processing
resources. It is inefficient and potentially expensive for frequent queries.
• Data warehousing employs an update-driven approach in which
information from multiple, heterogeneous sources is integrated in
advance and stored in a warehouse for direct querying.
• A data warehouse brings high performance to the integrated
heterogeneous database system because data are copied,
preprocessed, integrated, and summarized into one data store.
Difference between OLAP & OLTP
• Difference between operational database (OLTP) & data warehouse (OLAP):

Feature          OLTP                                  OLAP
Characteristic   Operational processing                Information processing and analysis
Orientation      Transaction                           Analysis
User             Clerk, DBA, database professional     Knowledge worker
Function         Day-to-day operations                 Long-term informational requirements, decision support
DB design        ER-based, application-oriented        Subject-oriented
Data             Current, guaranteed up to date        Historic, accuracy maintained over time
Summarization    Highly detailed                       Consolidated
View             Detailed, flat relational             Summarized, multidimensional
Unit of work     Simple                                Complex
Access           Read/write                            Mostly read
Focus            Data in                               Information out
No. of records   Tens                                  Millions
No. of users     Thousands                             Hundreds
DB size          GB                                    ≥ TB
Priority         High performance, high availability   High flexibility, end-user autonomy


Three Tier data warehouse architecture
Types of warehouse:
Enterprise warehouse
• Contains all the information about the entire organization
• Integrates data from one or more operational systems
• Contains both detailed and summarized data
• Size: gigabytes and beyond
• Implemented on mainframes or superservers
• Requires extensive business modeling, hence may take years to design & build
Data mart
• Subset of the data warehouse
• Confined to a specific subject & group of users
• Can be built on low-cost servers running Windows or Linux
• Implementation can be done in weeks
• Types: dependent or independent
• Independent: sourced from data generated locally within a department
• Dependent: sourced directly from the enterprise warehouse
Virtual data warehouse
• A set of views over operational databases; summary views are created for
frequent queries
• Easy to build
• Requires excess capacity on the operational database servers
• In bottom-up development (data marts first), data integration is difficult
• In top-down development (enterprise warehouse first), building a common
data model for the whole enterprise is difficult
Extraction, Transformation, and Loading
• Data extraction, which typically gathers data from multiple,
heterogeneous, and external sources.
• Data cleaning, which detects errors in the data and rectifies them
when possible.
• Data transformation, which converts data from legacy or host format
to warehouse format.
• Load, which sorts, summarizes, consolidates, checks integrity, and
builds indices and partitions.
• Refresh, which propagates the updates from the data sources to the
warehouse.
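The extraction, cleaning, transformation, and load steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the two source layouts (`crm_rows`, `pos_rows`), their field names, and the values are assumptions made up for the example, not part of the slides.

```python
# Hypothetical source records; field names and values are illustrative only.
crm_rows = [
    {"cust": " Alice ", "amount": "120.50", "date": "2024-01-15"},
    {"cust": "Bob", "amount": "n/a", "date": "2024-01-16"},
]
pos_rows = [
    {"customer_name": "alice", "total_cents": 3000, "day": "2024-02-01"},
]


def extract():
    """Extraction: gather rows from multiple heterogeneous sources."""
    return [("crm", r) for r in crm_rows] + [("pos", r) for r in pos_rows]


def clean_and_transform(tagged_rows):
    """Cleaning and transformation: skip bad rows, unify the format."""
    out = []
    for source, row in tagged_rows:
        try:
            if source == "crm":
                rec = {
                    "customer": row["cust"].strip().lower(),
                    "amount": float(row["amount"]),
                    "date": row["date"],
                }
            else:
                rec = {
                    "customer": row["customer_name"].strip().lower(),
                    "amount": row["total_cents"] / 100,
                    "date": row["day"],
                }
        except ValueError:
            continue  # cleaning: the amount could not be parsed
        out.append(rec)
    return out


def load(records):
    """Load: sort by date and build an index from customer to row positions."""
    table = sorted(records, key=lambda r: r["date"])
    index = {}
    for pos, rec in enumerate(table):
        index.setdefault(rec["customer"], []).append(pos)
    return table, index


warehouse, by_customer = load(clean_and_transform(extract()))
```

A refresh step would rerun the same pipeline on new or changed source rows; it is left out to keep the sketch short.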
Metadata repository
• Description of the warehouse structure: schema, dimensions, etc.
• Operational metadata: history of migrated data, error and monitoring
reports
• Algorithms used for summarization, aggregation, queries, and
reports
• Mapping from the operational environment to the warehouse: data
sources, gateways (ex: ODBC)
• Data related to system performance, which include indices and
profiles that improve data access and retrieval performance
• Business metadata: business terms and definitions
Data cube: a multidimensional approach
• A data cube allows data to be modelled and viewed in multiple dimensions
• Dimension: a perspective with respect to which the organization wants to
keep records
• Ex: a sales data store with the dimensions time, item, and branch
• Each dimension is kept in a table called a dimension table
• Ex: the item dimension table has the attributes item name, brand, and type
• The multidimensional data model is organized around a central theme, ex: sales
• Facts: numeric values/measures
• The fact table contains the names of the facts, or measures, as well as
keys to each of the related dimension tables
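As a rough illustration of how a fact table refers to dimension tables, here is a small Python sketch. The dimension names (time, item, branch) and the item attributes (item name, brand, type) come from the bullets above; the keys, the measures `units_sold` and `amount`, and every value are invented for the example.

```python
# Dimension tables: one per dimension, keyed by a surrogate key.
# All rows are made-up illustration data.
time_dim = {
    1: {"quarter": "Q1", "year": 2024},
    2: {"quarter": "Q2", "year": 2024},
}
item_dim = {
    10: {"item_name": "laptop", "brand": "BrandA", "type": "computer"},
    11: {"item_name": "modem", "brand": "BrandB", "type": "network"},
}
branch_dim = {
    100: {"branch_name": "B1"},
    101: {"branch_name": "B2"},
}

# Fact table: keys to each dimension table plus the numeric measures.
sales_fact = [
    {"time_key": 1, "item_key": 10, "branch_key": 100, "units_sold": 5, "amount": 5000.0},
    {"time_key": 1, "item_key": 11, "branch_key": 100, "units_sold": 3, "amount": 150.0},
    {"time_key": 2, "item_key": 10, "branch_key": 101, "units_sold": 2, "amount": 2000.0},
    {"time_key": 2, "item_key": 11, "branch_key": 101, "units_sold": 4, "amount": 200.0},
]


def total_amount_by(dim_table, key_name, attribute):
    """Join the fact table to one dimension and sum 'amount' per attribute value."""
    totals = {}
    for fact in sales_fact:
        label = dim_table[fact[key_name]][attribute]
        totals[label] = totals.get(label, 0.0) + fact["amount"]
    return totals


amount_by_type = total_amount_by(item_dim, "item_key", "type")
amount_by_quarter = total_amount_by(time_dim, "time_key", "quarter")
```

The fact rows stay narrow (keys and numbers only), while descriptive attributes live once in each dimension table.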
Schema for multidimensional model
• Operational database: an entity-relationship schema is used
• Warehouse: uses a multidimensional model, and hence a star,
snowflake, or fact constellation schema
Star schema
Snowflake schema
Dimension : The role of Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of
low-level concepts to higher-level, more general concepts
• A concept hierarchy that is a total or partial order among attributes
in a database schema is called a schema hierarchy
• Concept hierarchies may also be defined by grouping values for a given
dimension or attribute, resulting in a set-grouping hierarchy
• There may be more than one concept hierarchy for a given attribute
or dimension, based on different user viewpoints.
• Concept hierarchies may be provided manually by system users,
domain experts, or knowledge engineers, or may be automatically
generated based on statistical analysis of the data distribution.
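The two kinds of hierarchy can be sketched as simple mappings. In this Python sketch the location levels follow a city < province or state < country ordering; the price boundaries and group labels in the set-grouping hierarchy are hypothetical values chosen only for illustration.

```python
# Schema hierarchy on location: city < province_or_state < country.
city_to_province = {"Vancouver": "British Columbia", "Toronto": "Ontario"}
province_to_country = {"British Columbia": "Canada", "Ontario": "Canada"}


def generalize_location(city, level):
    """Map a city to the requested, more general level of the hierarchy."""
    if level == "city":
        return city
    province = city_to_province[city]
    if level == "province_or_state":
        return province
    if level == "country":
        return province_to_country[province]
    raise ValueError(f"unknown level: {level}")


# Set-grouping hierarchy on price: values are grouped into labelled ranges.
# Boundaries and labels are assumptions for this example.
PRICE_GROUPS = [
    (0, 100, "inexpensive"),
    (100, 500, "moderately priced"),
    (500, float("inf"), "expensive"),
]


def price_group(price):
    """Return the label of the range [low, high) that contains the price."""
    for low, high, label in PRICE_GROUPS:
        if low <= price < high:
            return label
    raise ValueError("price must be non-negative")
```

For instance, `generalize_location("Vancouver", "country")` yields `"Canada"`, and `price_group(250)` yields `"moderately priced"`.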
Measures: categorization
• A data cube measure is a numeric function that can be evaluated at
each point in the data cube space
• A measure value is computed for a given point by aggregating the
data corresponding to the respective dimension–value pairs defining
the given point
• Example of dimension–value pairs: time = “Q1”, location = “Vancouver”,
item = “computer”
• Measures can be organized into three categories (distributive,
algebraic, and holistic) based on the kind of aggregate functions used
• Distributive: An aggregate function is distributive if it can be
computed in a distributed manner. Suppose the data are partitioned
into n sets. We apply the function to each partition, resulting in n
aggregate values. If applying the function to these n aggregate values
gives the same result as applying it to the entire data set without
partitioning, the function can be computed in a distributed manner.
• For example, sum() can be computed for a data cube by first
partitioning the cube into a set of subcubes, computing sum() for
each subcube, and then summing up the partial sums obtained for each
subcube. Hence, sum() is a distributive aggregate function
• count(), min(), and max() are also distributive aggregate functions
• Distributive measures can be computed efficiently because of the way
the computation can be partitioned.
• Algebraic: An aggregate function is algebraic if it can be computed by
an algebraic function with M arguments (where M is a bounded positive
integer), each of which is obtained by applying a distributive
aggregate function.
• For example, avg() (average) can be computed by sum()/count(),
where both sum() and count() are distributive aggregate functions
• Holistic: An aggregate function is holistic if there does not exist
an algebraic function with M arguments (where M is a constant) that
characterizes the computation. Examples are median(), mode(), and rank().
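The three categories can be checked on a small list of numbers. The following Python sketch uses made-up values and an arbitrary split into three partitions; it shows that sum() and count() combine correctly across partitions, that avg() follows from them, and that median() does not combine in this way.

```python
from statistics import median

# Made-up data, split into three arbitrary partitions.
values = [4, 8, 15, 16, 23, 42, 7]
partitions = [values[0:3], values[3:5], values[5:]]

# Distributive: aggregate each partition, then aggregate the partial results.
partial_sums = [sum(p) for p in partitions]
partial_counts = [len(p) for p in partitions]
total_sum = sum(partial_sums)
total_count = sum(partial_counts)

# Algebraic: avg() is a function of two distributive aggregates (M = 2).
average = total_sum / total_count

# Holistic: combining per-partition medians does not give the true median.
median_of_partition_medians = median(median(p) for p in partitions)
true_median = median(values)
```

Here `total_sum` equals `sum(values)`, while `median_of_partition_medians` differs from `true_median`, which is why a holistic measure generally needs access to the underlying data rather than to a fixed number of partial results.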
OLAP operations
• Organizing data into multiple dimensions, each with several levels of
abstraction, allows the user to view data from different perspectives.
• Several OLAP data cube operations exist to materialize these
different views, allowing interactive querying and analysis of the data
at hand
• Hence, OLAP provides a user-friendly environment for interactive
data analysis
OLAP operations
Roll up
Drill down
Slice and dice
Pivot (rotate)
• Roll-up: The roll-up operation (also called the drill-up operation by
some vendors) performs aggregation on a data cube, either by
climbing up a concept hierarchy for a dimension or by dimension
reduction.
• For example, let the location hierarchy be defined as the total order
“street < city < province or state < country.” A roll-up that ascends
this hierarchy from the level of city to the level of country aggregates
the data so that, rather than grouping the data by city, the resulting
cube groups the data by country
• Drill-down: Drill-down is the reverse of roll-up.
• It navigates from less detailed data to more detailed data. Drill-down
can be realized by either stepping down a concept hierarchy for a
dimension or introducing additional dimensions
• For example, with a concept hierarchy for time defined as
“day < month < quarter < year,” drill-down can descend the time hierarchy
from the level of quarter to the more detailed level of month
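Roll-up and drill-down can be sketched as re-aggregating the same facts at a different hierarchy level. In this Python sketch the facts, the city-to-country mapping, and the month-to-quarter mapping are small invented examples; only two levels per dimension are modelled to keep it short.

```python
from collections import defaultdict

# Hypothetical mappings between hierarchy levels.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Invented sales facts stored at the most detailed level: (city, month, amount).
facts = [
    ("Vancouver", "Jan", 100),
    ("Vancouver", "Feb", 150),
    ("Toronto", "Jan", 200),
    ("Chicago", "Mar", 300),
    ("Chicago", "Apr", 50),
]


def aggregate(rows, location_level, time_level):
    """Group the facts at the requested levels and sum the amounts."""
    cube = defaultdict(int)
    for city, month, amount in rows:
        loc = CITY_TO_COUNTRY[city] if location_level == "country" else city
        t = MONTH_TO_QUARTER[month] if time_level == "quarter" else month
        cube[(loc, t)] += amount
    return dict(cube)


by_city_quarter = aggregate(facts, "city", "quarter")
# Roll-up on location: climb from city to country.
by_country_quarter = aggregate(facts, "country", "quarter")
# Drill-down on time: descend from quarter to month.
by_city_month = aggregate(facts, "city", "month")
```

Note that drill-down is only possible because the detailed facts are kept; an already summarized cube cannot be split back into finer cells.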
• Slice and dice: The slice operation performs a selection on one
dimension of the given cube, resulting in a sub-cube
• For example, a slice on the dimension "time" using the
criterion time = "Q1" forms a new sub-cube containing only
the Q1 data.
• The dice operation defines a sub-cube by performing a selection on
two or more dimensions.
• For example, a dice on the cube based on the following
selection criteria involves three dimensions:
•(location = "Toronto" or "Vancouver")
•(time = "Q1" or "Q2")
•(item = "Mobile" or "Modem")
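A slice and a dice can both be written as filters over the cells of a cube. In the Python sketch below the cube is a plain dictionary keyed by (location, time, item); the cell values, the extra location "Chicago", and the extra item "Phone" are invented so that the filters have something to exclude.

```python
# Hypothetical cube cells: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Modem"): 20,
    ("Vancouver", "Q1", "Modem"): 30,
    ("Vancouver", "Q3", "Mobile"): 40,
    ("Chicago", "Q1", "Mobile"): 50,
    ("Toronto", "Q1", "Phone"): 60,
}

LOCATION, TIME, ITEM = 0, 1, 2  # positions of the dimensions in each key


def slice_cube(cells, dim, value):
    """Slice: select on a single dimension."""
    return {k: v for k, v in cells.items() if k[dim] == value}


def dice_cube(cells, criteria):
    """Dice: select on two or more dimensions; criteria maps dim -> allowed values."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[dim] in allowed for dim, allowed in criteria.items())
    }


q1_slice = slice_cube(cube, TIME, "Q1")
diced = dice_cube(
    cube,
    {
        LOCATION: {"Toronto", "Vancouver"},
        TIME: {"Q1", "Q2"},
        ITEM: {"Mobile", "Modem"},
    },
)
```

With this data, the slice keeps the four Q1 cells, and the dice keeps the three cells that satisfy all three criteria at once.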
• Pivot
• The pivot operation is also known as rotation. It rotates the
data axes in view in order to provide an alternative
presentation of data
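For a two-dimensional view, a pivot amounts to swapping the row and column axes. The Python sketch below uses a made-up location-by-item view; the function name `pivot` and the numbers are illustrative assumptions.

```python
# A 2-D view of made-up numbers: rows are locations, columns are items.
view = {
    "Toronto": {"Mobile": 10, "Modem": 20},
    "Vancouver": {"Mobile": 30, "Modem": 40},
}


def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


rotated = pivot(view)
```

The data values are unchanged; only the presentation differs, and pivoting twice returns the original view.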
OLAP Systems versus Statistical Databases
• A statistical database (SDB) is a database system designed to support
statistical applications, typically socioeconomic applications
• OLAP targets business intelligence and handles large amounts of data
• SDBs have truth issues
Star net model for querying
• The querying of multidimensional databases can be based on a star
net model, which consists of radial lines emerging from a central
point, where each line represents a concept hierarchy for a
dimension.
• Each abstraction level in the hierarchy is called a footprint. These
represent the granularities available for use by OLAP operations such
as drill-down and roll-up
• For example, a starnet with four radial lines can represent concept
hierarchies for the dimensions location, customer, item, and time,
respectively. Each line consists of footprints representing abstraction
levels of the dimension
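One way to picture a starnet in code is as a mapping from each dimension to its ordered list of footprints. This Python sketch is an assumption about representation, not a standard API; it models only the location and time lines, using the orderings street < city < province or state < country and day < month < quarter < year, and leaves out customer and item because their levels are not specified here.

```python
# Each radial line is a dimension; its footprints run from most to least detailed.
STARNET = {
    "location": ["street", "city", "province_or_state", "country"],
    "time": ["day", "month", "quarter", "year"],
}


def roll_up(dimension, footprint):
    """Move one footprint outward (more general); stay put at the top level."""
    levels = STARNET[dimension]
    i = levels.index(footprint)
    return levels[min(i + 1, len(levels) - 1)]


def drill_down(dimension, footprint):
    """Move one footprint inward (more detailed); stay put at the bottom level."""
    levels = STARNET[dimension]
    i = levels.index(footprint)
    return levels[max(i - 1, 0)]
```

For example, `roll_up("location", "city")` gives `"province_or_state"`, and `drill_down("time", "quarter")` gives `"month"`.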
