
Lecture 09

This document summarizes a lecture on data integration. It discusses the challenges of providing uniform access to multiple autonomous and heterogeneous data sources. Current solutions include data warehousing and virtual integration architectures. Virtual integration uses wrappers to query individual data sources and a mediator to combine the results. Source descriptions are needed to map between the mediated schema and source schemas. Global-as-view and local-as-view are two common approaches for specifying these mappings to enable query reformulation.


Lecture #9

Data Integration
May 30th, 2002
Agenda/Administration
• Project demo scheduling.
• Reading pointers for exam.
What is Data Integration
• Providing
– Uniform (same query interface to all sources)
– Access to (queries; eventually updates too)
– Multiple (we want many, but 2 is hard too)
– Autonomous (DBA doesn’t report to you)
– Heterogeneous (data models are different)
– Structured (or at least semi-structured)
– Data Sources (not only databases).
The Problem: Data Integration
[Figure: the mybooks.com mediated schema (Books, Inventory, Orders, Shipping, Reviews) sits above sources reached over the Internet or a WAN. Source labels: Morgan-Kaufman, Prentice-Hall, East, West, Orders, FedEx, UPS, Customer Reviews, NY Times, alt.books.reviews, ...]
Uniform query capability across autonomous, heterogeneous data
sources on LAN, WAN, or Internet
Motivation(s)
• Enterprise data integration; web-site construction.
• WWW:
– Comparison shopping
– Portals integrating data from multiple sources
– B2B, electronic marketplaces
• Science and culture:
– Medical genetics: integrating genomic data
– Astrophysics: monitoring events in the sky.
– Environment: Puget Sound Regional Synthesis Model
– Culture: uniform access to all cultural databases
produced by countries in Europe.
Discussion
• Why is it hard?

• How will we solve it?

Current Solutions
• Mostly ad-hoc programming: create a
special solution for every case; pay
consultants a lot of money.
• Data warehousing: load all the data
periodically into a warehouse.
– 6-18 months lead time
– Separates operational DBMS from decision
support DBMS. (not only a solution to data
integration).
– Performance is good; data may not be fresh.
– Need to clean and scrub your data.
Data Warehouse Architecture
[Figure: data sources feed data extraction programs and data cleaning/scrubbing, which load a relational database (the warehouse). User queries, OLAP / decision support, data cubes, and data mining run against the warehouse.]
The Virtual Integration
Architecture
• Leave the data in the sources.
• When a query comes in:
– Determine the relevant sources to the query
– Break down the query into sub-queries for the
sources.
– Get the answers from the sources, and combine
them appropriately.
• Data is fresh.
• Challenge: performance.
Virtual Integration Architecture
[Figure: user queries are posed over the mediated schema. The mediator contains a reformulation engine, an optimizer, and an execution engine, and consults a data source catalog. The execution engine reaches each data source through a wrapper. Question noted on the figure: which data model?]
Sources can be: relational, hierarchical (IMS), structured files, web sites.
Research Projects

• Garlic (IBM)
• Information Manifold (AT&T)
• Tsimmis, InfoMaster (Stanford)
• The Internet Softbot/Razor/Tukwila (UW)
• Hermes (Maryland)
• DISCO, Agora (INRIA, France)
• SIMS/Ariadne (USC/ISI)
Industry
• Nimble Technology
• Enosys Markets
• IBM starting to announce stuff
• BEA marketing announcing stuff too.
Dimensions to Consider
• How many sources are we accessing?
• How autonomous are they?
• Meta-data about sources?
• Is the data structured?
• Queries or also updates?
• Requirements: accuracy, completeness,
performance, handling inconsistencies.
• Closed world assumption vs. open world?
Outline
• Wrappers
• Semantic integration and source descriptions:
– Modeling source completeness
– Modeling source capabilities
• Query optimization
• Query execution
• Peer-data management systems
• Creating schema mappings
Wrapper Programs
• Task: to communicate with the data sources
and do format translations.
• They are built w.r.t. a specific source.
• They can sit either at the source or at the
mediator.
• Often hard to build (very little science).
• Can be “intelligent”: perform source-
specific optimizations.
Example
Transform:
Introduction to DB
Phil Bernstein
Eric Newcomer
Addison Wesley, 1999

into:
<book>
<title> Introduction to DB </title>
<author> Phil Bernstein </author>
<author> Eric Newcomer </author>
<publisher> Addison Wesley </publisher>
<year> 1999 </year>
</book>
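A wrapper for a source like this could be sketched as follows. This is only a minimal sketch: it assumes the source emits one field per line, with the title first, the authors next, and a final "publisher, year" line. The function name wrap_book is hypothetical.

```python
import xml.etree.ElementTree as ET

def wrap_book(lines):
    """Translate one source record (a list of text lines) into a <book> element.

    Assumed layout (not stated by the source): first line is the title,
    last line is "publisher, year", everything in between is an author.
    """
    fields = [ln.strip() for ln in lines if ln.strip()]
    title, authors, last = fields[0], fields[1:-1], fields[-1]
    publisher, year = (part.strip() for part in last.rsplit(",", 1))

    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for name in authors:
        ET.SubElement(book, "author").text = name
    ET.SubElement(book, "publisher").text = publisher
    ET.SubElement(book, "year").text = year
    return book

record = ["Introduction to DB", "Phil Bernstein", "Eric Newcomer", "Addison Wesley, 1999"]
book_xml = ET.tostring(wrap_book(record), encoding="unicode")
print(book_xml)
```

A real wrapper is tied to the quirks of one source, which is why a different source would need a different parsing step in front of the same output format.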
Data Source Catalog
• Contains all meta-information about the
sources:
– Logical source contents (books, new cars).
– Source capabilities (can answer SQL queries)
– Source completeness (has all books).
– Physical properties of source and network.
– Statistics about the data (like in an RDBMS)
– Source reliability
– Mirror sources
– Update frequency.
Content Descriptions
• User queries refer to the mediated schema.
• Data is stored in the sources in a local
schema.
• Content descriptions provide the semantic
mappings between the different schemas.
• Data integration system uses the
descriptions to translate user queries into
queries on the sources.
Desiderata from Source
Descriptions
• Expressive power: distinguish between
sources with closely related data. Hence, be
able to prune access to irrelevant sources.
• Easy addition: make it easy to add new data
sources.
• Reformulation: be able to reformulate a user
query into a query on the sources efficiently
and effectively.
Reformulation Problem

• Given:
– A query Q posed over the mediated schema
– Descriptions of the data sources
• Find:
– A query Q’ over the data source relations, such
that:
• Q’ provides only correct answers to Q, and
• Q’ provides all possible answers to Q that can be
obtained from the sources.
Approaches to Specifying Source
Descriptions
• Global-as-view: express the mediated
schema relations as a set of views over the
data source relations
• Local-as-view: express the source relations
as views over the mediated schema.
• Can be combined with no additional cost.
Global-as-View
Mediated schema:
Movie(title, dir, year, genre),
Schedule(cinema, title, time).
Create View Movie AS
select * from S1                            -- S1(title, dir, year, genre)
union
select * from S2                            -- S2(title, dir, year, genre)
union
select S3.title, S3.dir, S4.year, S4.genre  -- S3(title, dir), S4(title, year, genre)
from S3, S4
where S3.title=S4.title
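To see the view definition at work, the sketch below loads it into an in-memory SQLite database and poses a query over the mediated relation. The sample rows are made up, and SQLite merely stands in for a mediator here: in a real system the four sources would be remote and reached through wrappers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S1(title, dir, year, genre);
    CREATE TABLE S2(title, dir, year, genre);
    CREATE TABLE S3(title, dir);
    CREATE TABLE S4(title, year, genre);

    -- Global-as-view: the mediated relation is a view over the sources.
    CREATE VIEW Movie AS
        SELECT * FROM S1
        UNION
        SELECT * FROM S2
        UNION
        SELECT S3.title, S3.dir, S4.year, S4.genre
        FROM S3, S4
        WHERE S3.title = S4.title;

    -- Made-up sample rows, for illustration only.
    INSERT INTO S1 VALUES ('Movie A', 'Director X', 1995, 'Comedy');
    INSERT INTO S2 VALUES ('Movie B', 'Director Y', 1980, 'Drama');
    INSERT INTO S3 VALUES ('Movie C', 'Director Z');
    INSERT INTO S4 VALUES ('Movie C', 1999, 'Comedy');
""")

# A query over the mediated schema; the engine expands the view definition,
# so only the source tables S1..S4 are actually read.
comedies = con.execute(
    "SELECT title FROM Movie WHERE genre = 'Comedy' ORDER BY title"
).fetchall()
print(comedies)
```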
Global-as-View: Example 2
Mediated schema:
Movie(title, dir, year, genre),
Schedule(cinema, title, time).

Create View Movie AS
select title, dir, year, NULL    -- S1(title, dir, year)
from S1
union
select title, dir, NULL, genre   -- S2(title, dir, genre)
from S2
Global-as-View: Example 3
Mediated schema:
Movie(title, dir, year, genre),
Schedule(cinema, title, time).
Source S4: S4(cinema, genre)
Create View Movie AS
select NULL, NULL, NULL, genre
from S4
Create View Schedule AS
select cinema, NULL, NULL
from S4
But what if we want to find which cinemas are playing comedies?
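The difficulty can be reproduced directly. In the sketch below (made-up rows, SQLite standing in for the mediator), the two views are defined as above and the comedy question is posed over the mediated schema. Because the title column is NULL in both views, the join between Movie and Schedule never matches.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S4(cinema, genre);
    INSERT INTO S4 VALUES ('Cinema 1', 'Comedy');   -- made-up rows
    INSERT INTO S4 VALUES ('Cinema 2', 'Drama');

    CREATE VIEW Movie AS
        SELECT NULL AS title, NULL AS dir, NULL AS year, genre FROM S4;
    CREATE VIEW Schedule AS
        SELECT cinema, NULL AS title, NULL AS time FROM S4;
""")

# "Which cinemas are playing comedies?" posed over the mediated schema.
rows = con.execute("""
    SELECT s.cinema
    FROM Movie m, Schedule s
    WHERE m.title = s.title AND m.genre = 'Comedy'
""").fetchall()
print(rows)  # the NULL titles never join, so no cinema is returned
```

The source knows which cinema shows which genre, but splitting it across two views discards that association.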
Global-as-View Summary
• Query reformulation boils down to view
unfolding.
• Very easy conceptually.
• Can build hierarchies of mediated schemas.
• You sometimes lose information. Not
always natural.
• Adding sources is hard. Need to consider all
other sources that are available.
Local-as-View: example 1
Mediated schema:
Movie(title, dir, year, genre),
Schedule(cinema, title, time).
Create Source S1 AS
select * from Movie
Create Source S3 AS    -- S3(title, dir)
select title, dir from Movie
Create Source S5 AS
select title, dir, year
from Movie
where year > 1960 AND genre='Comedy'
Local-as-View: Example 2
Mediated schema:
Movie(title, dir, year, genre),
Schedule(cinema, title, time).
Source S4: S4(cinema, genre)
Create Source S4 AS
select cinema, genre
from Movie m, Schedule s
where m.title=s.title
Now if we want to find which cinemas are playing comedies, there is
hope!
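A small simulation shows why. In this sketch the mediated relations are filled with made-up rows that play the role of the real world, and S4 is simulated as a SQL view with the description above. The rewritten query reads only S4 and returns the same cinemas as the query over the mediated schema. The rewriting is written by hand here; producing it automatically is the reformulation problem.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- The mediated relations stand for the "real world"; the mediator
    -- cannot read them directly. Rows are made up for illustration.
    CREATE TABLE Movie(title, dir, year, genre);
    CREATE TABLE Schedule(cinema, title, time);
    INSERT INTO Movie VALUES ('Movie A', 'Director X', 1995, 'Comedy');
    INSERT INTO Movie VALUES ('Movie B', 'Director Y', 1980, 'Drama');
    INSERT INTO Schedule VALUES ('Cinema 1', 'Movie A', '19:00');
    INSERT INTO Schedule VALUES ('Cinema 2', 'Movie B', '21:00');

    -- Local-as-view description of S4 (simulated here as a SQL view).
    CREATE VIEW S4 AS
        SELECT s.cinema, m.genre
        FROM Movie m, Schedule s
        WHERE m.title = s.title;
""")

# User query over the mediated schema.
wanted = con.execute("""
    SELECT DISTINCT s.cinema FROM Movie m, Schedule s
    WHERE m.title = s.title AND m.genre = 'Comedy'
""").fetchall()

# Hand-written rewriting that touches only the source relation S4.
rewritten = con.execute(
    "SELECT DISTINCT cinema FROM S4 WHERE genre = 'Comedy'"
).fetchall()
print(wanted, rewritten)
```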
Local-as-View Summary
• Very flexible. You have the power of the
entire query language to define the contents
of the source.
• Hence, can easily distinguish between
contents of closely related sources.
• Adding sources is easy: they’re independent
of each other.
• Query reformulation: answering queries
using views!
The General Problem
• Given a set of views V1,…,Vn, and a query
Q, can we answer Q using only the answers to
V1,…,Vn?
• Many, many papers on this problem.
• The best performing algorithm: The MiniCon
Algorithm, (Pottinger & Levy, 2000).
• Great survey on the topic: (Halevy, 2001).
Local Completeness Information
• If sources are incomplete, we need to look
at each one of them.
• Often, sources are locally complete.
• Movie(title, director, year) complete for
years after 1960, or for American directors.
• Question: given a set of local completeness
statements, is a query Q’ a complete answer
to Q?
Example
• Movie(title, director, year) (complete after
1960).
• Show(title, theater, city, hour)
• Query: find movies (and directors) playing
in Seattle:
Select m.title, m.director
From Movie m, Show s
Where m.title=s.title AND s.city='Seattle'
• Complete or not?
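One way to probe the question: the query places no restriction on the year, so a movie made in 1960 or earlier can be playing in Seattle and still be absent from the Movie source. The sketch below exhibits such a case on made-up data. It assumes, purely for illustration, that the Show source is complete.

```python
# Made-up data. "world_movies" is the (unknown) real world; the source is
# only guaranteed to contain every movie made after 1960.
world_movies = [
    ("Old Film", "Director P", 1950),
    ("New Film", "Director Q", 1990),
]
source_movies = [m for m in world_movies if m[2] > 1960]  # locally complete

# Assumption for this sketch: the Show source is complete.
shows = [
    ("Old Film", "Theater 1", "Seattle", "19:00"),
    ("New Film", "Theater 2", "Seattle", "21:00"),
]

def playing_in_seattle(movies):
    """Select m.title, m.director From Movie m, Show s
    Where m.title = s.title AND s.city = 'Seattle'."""
    return sorted(
        (title, director)
        for (title, director, _year) in movies
        for (show_title, _theater, city, _hour) in shows
        if title == show_title and city == "Seattle"
    )

true_answer = playing_in_seattle(world_movies)
source_answer = playing_in_seattle(source_movies)
print(source_answer == true_answer)
```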
Example #2

• Movie(title, director, year), Oscar(title, year)

• Query: find directors whose movies won
Oscars after 1965:
select m.director
from Movie m, Oscar o
where m.title=o.title AND m.year=o.year
AND o.year > 1965
• Complete or not?
Query Optimization
• Closely related to query reformulation!
• Goal of the optimizer: find a physical plan
with minimal cost.
• Key components in optimization:
– Search space of plans
– Search strategy
– Cost model
Optimization in Distributed
DBMS
• A distributed database (2-minute tutorial):
– Data is distributed over multiple nodes, but is
uniform.
– Query execution can be distributed to sites.
– Communication costs are significant.
• Consequences for optimization:
– Optimizer needs to decide locality
– Need to exploit independent parallelism.
– Need operators that reduce communication
costs (semi-joins).
DDBMS vs. Data Integration

• In a DDBMS, data is distributed over a set
of uniform sites with precise rules.
• In a data integration context:
– Data sources may provide only limited access
patterns to the data.
– Data sources may have additional query
capabilities.
– Cost of answering queries at sources unknown.
– Statistics about data unknown.
– Transfer rates unpredictable.
Modeling Source Capabilities
• Negative capabilities:
– A web site may require certain inputs (in an
HTML form).
– Need to consider only valid query execution
plans.
• Positive capabilities:
– A source may be an ODBC compliant system.
– Need to decide placement of operations
according to capabilities.
• Problem: how to describe and exploit
source capabilities.
Example #1: Access Patterns
Mediated schema relation: Cites(paper1, paper2)

Create Source S1 as
select *
from Cites
given paper1
Create Source S2 as
select paper1
from Cites

Query: select paper1 from Cites where paper2='Hal00'

Example #1: Continued
Create Source S1 as
select *
from Cites
given paper1
Create Source S2 as
select paper1
from Cites
Select S1.paper1
From S1, S2
Where S2.paper1=S1.paper1 AND S1.paper2='Hal00'
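The plan behind this rewriting can be sketched in a few lines. The functions s1 and s2 below are hypothetical stand-ins for the two sources, and the citation pairs are made up. The point is the order of calls: S2 supplies the paper1 bindings that S1 requires.

```python
# Made-up citation pairs (paper1 cites paper2); only the sources see them.
_CITES = [("P1", "Hal00"), ("P2", "Hal00"), ("P2", "P1"), ("P3", "P1")]

def s1(paper1):
    """Source S1: all Cites tuples, but only for a given paper1 (required input)."""
    return [(p1, p2) for (p1, p2) in _CITES if p1 == paper1]

def s2():
    """Source S2: every paper1 value appearing in Cites, no input required."""
    return sorted({p1 for (p1, _) in _CITES})

def citing(paper2):
    """Papers citing paper2: bindings from S2 are fed into S1 (a dependent join)."""
    result = []
    for p1 in s2():                      # step 1: obtain bindings for paper1
        for (_, cited) in s1(p1):        # step 2: call S1 once per binding
            if cited == paper2:          # step 3: apply the selection
                result.append(p1)
    return result

answer = citing("Hal00")
print(answer)
```

Note that S1 is called once per binding, so the cost of the plan grows with the number of values S2 returns.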
Example #2: Access Patterns
Create Source S1 as
select *
from Cites
given paper1
Create Source S2 as
select paperID
from UW-Papers
Create Source S3 as
select paperID
from AwardPapers
given paperID
Query: select * from AwardPapers
Example #2: Solutions
• Can’t go directly to S3 because it requires a
binding.
• Can go to S2, get UW papers, and check if they’re
in S3.
• Can go to S2, get UW papers, feed them into S1,
and feed the results into S3.
• Can go to S2, feed results into S1, feed results into
S1 again, and then feed results into S3.
• Strictly speaking, we can’t a priori decide when to
stop.
• Need recursive query processing.
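The recursion can be sketched as a fixpoint computation: start from the UW papers, keep following citations until no new paper ID turns up, and test every ID found against the award source. The three functions are hypothetical stand-ins for the sources, over made-up data. Such a plan finds only the award papers reachable this way, not necessarily all of them.

```python
# Made-up contents; each function mimics one source and its access pattern.
_CITES = {"U1": ["A", "B"], "A": ["C"], "B": [], "C": ["A"]}
_UW_PAPERS = ["U1"]
_AWARD_PAPERS = {"C", "U1", "Z"}   # "Z" is not reachable from any UW paper

def cites_source(paper1):
    """Citations of a given paper (paper1 must be supplied)."""
    return _CITES.get(paper1, [])

def uw_source():
    """All UW paper IDs (no input required)."""
    return list(_UW_PAPERS)

def award_source(paper_id):
    """Membership test only: is the given paper an award paper?"""
    return paper_id in _AWARD_PAPERS

def reachable_award_papers():
    """Follow citations from the UW papers until no new paper ID appears."""
    seen = set(uw_source())
    frontier = list(seen)
    while frontier:                       # fixpoint: stop when nothing new
        paper = frontier.pop()
        for cited in cites_source(paper):
            if cited not in seen:
                seen.add(cited)
                frontier.append(cited)
    return sorted(p for p in seen if award_source(p))

found = reachable_award_papers()
print(found)
```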
Handling Positive Capabilities
• Characterizing positive capabilities:
– Schema independent (e.g., can always perform joins,
selections).
– Schema dependent: can join R and S, but not T.
– Given a query, tells you whether it can be handled.
• Key issue: how do you search for plans?
• Garlic approach (IBM): Given a query, STAR
rules determine which subqueries are executable
by the sources. Then proceed bottom-up as in
System-R.
Matching Objects Across
Sources
• How do I know that A. Halevy in source 1 is the
same as Alon Halevy in source 2?
• If there are uniform keys across sources, no
problem.
• If not:
– Domain specific solutions (e.g., maybe look at the
address, ssn).
– Use Information retrieval techniques (Cohen, 98).
Judge similarity as you would between documents.
– Use concordance tables. These are time-consuming to
build, but you can then sell them for lots of money.
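To illustrate the document-similarity idea, the sketch below compares names as bags of character bigrams using cosine similarity. This is a deliberately simple vector-space measure chosen for illustration, not the method of the cited work. The function names are hypothetical.

```python
import math
from collections import Counter

def bigrams(name):
    """Bag of character bigrams of the lower-cased name (letters and spaces only)."""
    s = "".join(ch for ch in name.lower() if ch.isalpha() or ch == " ")
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def cosine(a, b):
    """Cosine similarity between two names viewed as bigram count vectors."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

same = cosine("A. Halevy", "Alon Halevy")
other = cosine("A. Halevy", "Phil Bernstein")
print(same > other)
```

A matcher would still need a threshold, and picking one is domain specific.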
Optimization and Execution

• Problem:
– Few and unreliable statistics about the data.
– Unexpected (possibly bursty) network transfer
rates.
– Generally, unpredictable environment.
• General solution: (research area)
– Adaptive query processing.
– Interleave optimization and execution. As you
get to know more about your data, you can
improve your plan.
Tukwila Data Integration System
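[Figure: Tukwila architecture. A query enters the Reformulator, which uses source mappings from the Catalog and passes a logical plan to the Optimizer ((Re-)Optimizer, Memory Allocator, Fragmenter). The Optimizer passes an execution plan to the Execution Engine (Event Handler, Query Operators, Temp Store), which reads data from the sources, returns the answer, and sends execution results back to the Optimizer.]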

Novel components:
– Event handler
– Optimization-execution loop
Double Pipelined Join (Tukwila)

Hash Join
• Partially pipelined: no output until inner read
• Asymmetric (inner vs. outer): optimization requires source
behavior knowledge
Double Pipelined Hash Join
• Outputs data immediately
• Symmetric: requires less source knowledge to optimize
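The symmetric behavior can be sketched as a generator that keeps one hash table per input. Each arriving tuple is inserted on its own side and probed against the other side, so results come out as soon as both matching tuples have arrived. This is an in-memory sketch only. The tagged-stream input format is an assumption made for illustration, and memory overflow is not handled.

```python
def double_pipelined_join(stream, key=lambda t: t[0]):
    """Symmetric hash join over an interleaved stream of tagged tuples.

    `stream` yields ("L", tuple) or ("R", tuple) in arrival order. Each
    arriving tuple is inserted into its own side's hash table and probed
    against the other side's table, so matches are emitted immediately.
    """
    tables = {"L": {}, "R": {}}
    for side, tup in stream:
        other = "R" if side == "L" else "L"
        tables[side].setdefault(key(tup), []).append(tup)
        for match in tables[other].get(key(tup), []):
            # Always emit as (left tuple, right tuple).
            yield (tup, match) if side == "L" else (match, tup)

arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),   # joins with (1, "a") as soon as it arrives
    ("R", (2, "y")),
    ("L", (2, "b")),   # joins with (2, "y")
    ("L", (1, "c")),   # joins with (1, "x")
]
joined = list(double_pipelined_join(arrivals))
print(joined)
```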
Piazza: A Peer-Data Management System
Goal: To enable users to share data across
local or wide area networks in an ad-hoc,
highly dynamic distributed architecture.
• Peers share data, mediated views.
• Peers act as both clients and servers.
• Rich semantic relationships between peers.
• Ad-hoc collaborations (peers join and leave
at will).
Extending the Vision to Data Sharing
[Figure: a network of peers sharing emergency-services data. Peer labels: First Hospital (FH), Lakeview Hospital (LH), Hospitals (H), 911 Dispatch Center (9DC), Earthquake Command Center (ECC), Medical Aid (MA), Search & Rescue (SR), Fire Services (FS), Emergency Workers (EW), Portland Fire District (PFD), Vancouver Fire District (VFD), National Guard, Washington State, Station 3, Station 19, Station 12, Station 32.]

The Structure Mapping Problem
• Types of structures:
– Database schemas, XML DTDs, ontologies, …
• Input:
– Two (or more) structures, S1 and S2
– (perhaps) Data instances for S1 and S2
– Background knowledge
• Output:
– A mapping between S1 and S2
• Should enable translating between data instances.
Semantic Mappings between
Schemas
• Source schemas = XML DTDs
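[Figure: two schema trees. First schema: house, with children address, contact-info (agent-name, agent-phone), and num-baths. Second schema: house, with children location, contact (name, phone), full-baths, and half-baths. address to location is a 1-1 mapping; num-baths to full-baths and half-baths is a non 1-1 mapping.]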
Why Matching is Difficult
• Structures represent same entity differently
– different names => same entity:
• area & address => location
– same names => different entities:
• area => location or square-feet
• Intended semantics is typically subjective!
– IBM Almaden Lab = IBM?
• Schema, data and rules never fully capture semantics!
– not adequately documented, certainly not for machine
consumption.
• Often hard for humans (committees are formed!)
Desiderata from Proposed
Solutions
• Accuracy, efficiency, ease of use.
• Realistic expectations:
– Unlikely to be fully automated. Need user in the loop.
• Some notion of semantics for mappings.
• Extensibility:
– Solution should exploit additional background
knowledge.
• “Memory”, knowledge reuse:
– System should exploit previous manual or
automatically generated matchings.
– Key idea behind LSD.
Learning for Mapping
• Context: generating semantic mappings between
a mediated schema and a large set of data source
schemas.
• Key idea: generate the first mappings manually,
and learn from them to generate the rest.
• Technique: multi-strategy learning (extensible!)
• L(earning) S(ource) D(escriptions) [SIGMOD 2001].
Data Integration (a simple
PDMS)
[Figure: the query "Find houses with four bathrooms priced under $500,000" is posed over the mediated schema. Query reformulation and optimization translate it into queries over source schema 1 (realestate.com), source schema 2 (homeseekers.com), and source schema 3 (homes.com).]
Applications: WWW, enterprises, science projects
Techniques: virtual data integration, warehousing, custom code.
Learning from the Manual Mappings
Mediated schema:           price         agent-name    agent-phone    office-phone  description
Schema of realestate.com:  listed-price  contact-name  contact-phone  office        comments

Learned from the names: if “office” occurs in the name => office-phone

realestate.com data:
listed-price  contact-name  contact-phone   office          comments
$250K         James Smith   (305) 729 0831  (305) 616 1822  Fantastic house
$320K         Mike Doan     (617) 253 1429  (617) 112 2315  Great location

Learned from the data: if “fantastic” & “great” occur frequently in
data instances => description

homes.com data:
sold-at  contact-agent   extra-info
$350K    (206) 634 9435  Beautiful yard
$230K    (617) 335 4243  Close to Seattle
$190K    (512) 342 1263  Great lot
Multi-Strategy Learning

• Use a set of base learners:
– Name learner, Naïve Bayes, Whirl, XML learner
• And a set of recognizers:
– County name, zip code, phone numbers.
• Each base learner produces a prediction weighted by
a confidence score.
• Combine base learners with a meta-learner, using
stacking.
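The combination step can be sketched with two toy base learners and a weighted sum. Everything here is illustrative: the learners are hand-written rules rather than trained models, and the weights are fixed by hand, whereas stacking would learn them from the manually mapped sources. All function names are hypothetical.

```python
LABELS = ["price", "agent-name", "agent-phone", "office-phone", "description"]

def name_learner(column_name, values):
    """Scores labels from the column name alone (toy rules)."""
    scores = dict.fromkeys(LABELS, 0.0)
    if "office" in column_name:
        scores["office-phone"] = 0.9
    if "phone" in column_name:
        scores["agent-phone"] = 0.5
    return scores

def content_learner(column_name, values):
    """Scores labels from data instances (toy word-frequency rule)."""
    scores = dict.fromkeys(LABELS, 0.0)
    words = " ".join(values).lower().split()
    hits = sum(w in ("fantastic", "great") for w in words)
    if words:
        scores["description"] = min(1.0, 3 * hits / len(words))
    return scores

# Hypothetical weights; with stacking they would be learned from the
# manually mapped sources rather than fixed by hand.
WEIGHTS = [(name_learner, 0.5), (content_learner, 0.5)]

def predict(column_name, values):
    """Weighted sum of the base learners' confidence scores."""
    combined = dict.fromkeys(LABELS, 0.0)
    for learner, weight in WEIGHTS:
        for label, score in learner(column_name, values).items():
            combined[label] += weight * score
    return max(combined, key=combined.get)

guess_office = predict("office", ["(305) 616 1822", "(617) 112 2315"])
guess_info = predict("extra-info", ["Beautiful yard", "Close to Seattle", "Great lot"])
print(guess_office, guess_info)
```

Adding a learner means adding one function and one weight, which is where the extensibility of the multi-strategy approach comes from.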
The Semantic Web
• How does it relate to data integration?
• How are we going to do it?
• Why should we do it? Do we need a killer
app or is the semantic web a killer app?
