Lect#2 DDBS (Characteristics and Layers of Query Processing)

This document discusses query processing in distributed database systems. It describes the key characteristics of query processors, including the languages they support, types of optimization, when optimization occurs, use of statistics, where decisions are made, how network topology and replicated fragments are exploited, and use of semi-joins. It then explains the four main layers involved in distributed query processing: query decomposition, data localization, global query optimization, and distributed query execution. Query decomposition transforms queries into relational algebra and normalizes, analyzes, simplifies, and restructures queries.

Uploaded by

ridagul


Distributed Database Systems
Week 5 and 6

Characteristics of query processing

Layers of Query Processing

by Razaullah Khan, The AUP.

Distributed Database Systems 1

Characterization of Query Processors
• Several important characteristics of query processors can be used as a basis
for comparison. The first four hold for both centralized and distributed
query processors, while the next four are specific to distributed query
processors.
• 1. Languages
• Relational DBMSs use languages based on relational calculus.
• Object DBMSs use languages based on object calculus (an extension of
relational calculus).
• XML, used to store and transport data over the internet, is another
data model; it is queried with XQuery and XPath (Chap 17).
• XQuery vs XPath: XQuery is a query language for collections of XML
data; XQuery is to XML what SQL is to relational databases. XPath is an
XML path language that uses path expressions to select nodes
(navigate through elements) in an XML document.
• The query processor must map the input language efficiently to the
output language (e.g. relational calculus to relational algebra).

Characterization of Query Processors
2. Types of Optimization
•Query optimization aims at choosing the “best” point in the solution space of
all possible execution strategies.
•(i) Exhaustive search: the most direct method is to search the solution space
exhaustively, predict the cost of each strategy, and select the strategy with
minimum cost.
•Although this method is effective in selecting the best strategy, it may incur a
significant processing cost for the optimization itself.
•The problem is that the solution space can be large; that is, there may be
many equivalent strategies, even with a small number of relations.
•(ii) Heuristics: restrict the solution space so that only a few strategies are
considered. Typical heuristics are to perform unary operators (selection,
projection) first and to order binary operators by the increasing sizes of their
intermediate relations. In distributed systems, a further heuristic is to replace
a join with semijoins to minimize data communication cost (as discussed for
fragments).
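
The two approaches above can be contrasted with a minimal sketch. It assumes a toy cost model (the cost of a left-deep plan is the sum of its intermediate result sizes) and hypothetical relation names, cardinalities, and join selectivities; none of these numbers come from the lecture.

```python
from itertools import permutations

# Hypothetical cardinalities and join selectivities (illustrative only).
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SEL = {
    frozenset(("EMP", "ASG")): 1 / 400,
    frozenset(("ASG", "PROJ")): 1 / 100,
}


def selectivity(joined, new):
    """Combined selectivity of joining `new` with the relations already joined.

    A missing entry means no join predicate, i.e. a Cartesian product (1.0).
    """
    s = 1.0
    for rel in joined:
        s *= SEL.get(frozenset((rel, new)), 1.0)
    return s


def plan_cost(order):
    """Toy cost of a left-deep plan: the sum of intermediate result sizes."""
    joined, size, cost = [order[0]], CARD[order[0]], 0.0
    for rel in order[1:]:
        size = size * CARD[rel] * selectivity(joined, rel)
        cost += size
        joined.append(rel)
    return cost


def exhaustive(relations):
    """Enumerate every join order and keep the cheapest one."""
    return min(permutations(relations), key=plan_cost)


def greedy(relations):
    """Heuristic: start with the smallest relation, then repeatedly add the
    relation that yields the smallest next intermediate result."""
    remaining = set(relations)
    first = min(remaining, key=CARD.get)
    order, size = [first], CARD[first]
    remaining.remove(first)
    while remaining:
        nxt = min(remaining, key=lambda r: size * CARD[r] * selectivity(order, r))
        size = size * CARD[nxt] * selectivity(order, nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


if __name__ == "__main__":
    rels = ["EMP", "ASG", "PROJ"]
    print("exhaustive:", exhaustive(rels), plan_cost(exhaustive(rels)))
    print("greedy:    ", greedy(rels), plan_cost(greedy(rels)))
```

The exhaustive search evaluates all n! orders, while the greedy heuristic evaluates only a handful of candidates per step; on larger inputs the heuristic may miss the cheapest plan, which is the trade-off the slide describes.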

Characterization of Query Processors
3. Optimization Timing
Optimization can be done statically, before executing the query, or dynamically,
as the query is executed.
Static query optimization: done at query compilation time, so its cost can be
amortized over multiple executions. Suitable for the exhaustive search method.
Because the sizes of intermediate relations are not known until run time, they
must be estimated using database statistics, and estimation errors may lead to a
suboptimal strategy.
Dynamic query optimization: done as the query is executed.
At any point of execution, the choice of the best next operator can be based on
accurate knowledge of the results of the operators executed previously.
Main advantage: the actual sizes of intermediate relations are available to the
query processor.
Main disadvantage: optimization must be repeated for each execution of the query
(so it is an expensive task).
Hybrid query optimization: basically static, but dynamic optimization may take
place at run time when predicted and actual intermediate sizes differ widely.
It aims to combine the advantages of both static and dynamic optimization.

Characterization of Query Processors
4. Statistics
•The effectiveness of query optimization relies on statistics on the database.
•Dynamic query optimization requires statistics in order to choose which
operators should be done first.
•Static query optimization is even more demanding since the size of
intermediate relations must also be estimated based on statistical information.
•In a DDB, statistics typically describe fragments: their cardinality and size,
and the size and number of distinct values of each attribute. To minimize the
probability of error, more detailed statistics such as histograms of attribute
values (frequency of occurrence of each attribute value) are sometimes kept.
•Statistics are kept accurate by periodic updating; with static optimization,
significant changes in statistics may trigger re-optimization of the query.
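
As a small illustration of why histograms can reduce estimation error, the sketch below compares a uniform-distribution estimate (which needs only the number of distinct values) with a histogram-based estimate for an equality selection. The attribute values and function names are hypothetical.

```python
from collections import Counter


def estimate_uniform(cardinality, distinct_values):
    """Estimated number of tuples matching `attr = value`, assuming every
    distinct value is equally frequent."""
    return cardinality / distinct_values


def estimate_histogram(hist, value):
    """Estimated number of matching tuples using per-value frequencies."""
    total = sum(hist.values())
    return total * (hist.get(value, 0) / total)


if __name__ == "__main__":
    # Hypothetical TITLE column of a 10-tuple fragment (skewed distribution).
    titles = ["Programmer"] * 6 + ["Analyst"] * 3 + ["Manager"]
    hist = Counter(titles)
    print("uniform estimate:  ", estimate_uniform(len(titles), len(hist)))
    print("histogram estimate:", estimate_histogram(hist, "Manager"))
```

With skewed data the uniform estimate overstates the size of the rare value and understates the frequent one; the histogram tracks the actual frequencies, at the cost of storing and refreshing more statistics.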

Characterization of Query Processors
5. Decision Sites
•When static optimization is used, either a single site or several sites may
participate in selecting the strategy to be applied for answering the query.
•Most systems use the centralized decision approach, in which a single site
generates the strategy.
•However, the decision process could be distributed among various sites
participating in the elaboration of the best strategy.
•The centralized approach is simpler but requires knowledge of the entire
DDB, while the distributed approach requires only local information.
•Hybrid approaches are also used, in which one site makes the major decisions
and other sites make local decisions.

Characterization of Query Processors
6. Exploitation of the Network Topology
•The network topology is generally exploited by the distributed
query processor.
•With a WAN, the cost function can be restricted to the data
communication cost, and optimization can be divided into two separate
problems: selection of the global execution strategy, based on inter-site
communication, and selection of each local execution strategy, based
on a centralized query processing algorithm.
•With a LAN, communication costs are comparable to I/O costs.
•Therefore, it is reasonable for the distributed query processor to
increase parallel execution at the expense of communication cost.
•In a client-server environment, data shipping is also performed: the
query work is divided between server and client, so the client also
participates in executing the query.

Characterization of Query Processors
7. Exploitation of Replicated Fragments
•A distributed relation is usually divided into relation fragments.
•Distributed queries expressed on global relations are mapped into
queries on physical fragments of relations by translating relations
into fragments. This process is called localization because its main
function is to localize the data involved in the query.
•For higher reliability, it is useful to have fragments replicated at
different sites.
•Some optimization algorithms exploit replicated fragments at run time
to minimize communication time.

Characterization of Query Processors
8. Use of Semi-joins
•The basic idea of the semijoin is to reduce the communication cost
between different sites.
•It does so by reducing the size of the operand relation.
•When the main cost component considered by the query processor is
communication, a semijoin is particularly useful for improving the
processing of distributed join operators, as it reduces the size of data
exchanged between sites. For example:

Oracle semijoin q1: SELECT D.dept_id, D.dept_name FROM dept D WHERE EXISTS (SELECT 1
FROM emp E WHERE E.dept_id = D.dept_id) ORDER BY D.dept_id;

Oracle conventional join q2: SELECT D.dept_id, D.dept_name FROM dept D, emp E WHERE
E.dept_id = D.dept_id ORDER BY D.dept_id;
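
The difference between q1 and q2 can be reproduced with a minimal sketch. It runs the two queries in an in-memory SQLite database (not Oracle, which the slide names) through Python's standard `sqlite3` module; the table definitions and rows are hypothetical sample data, not the lecture's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER);
    -- Hypothetical data: two employees in dept 10, one in dept 20, none in 30.
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research'), (30, 'HR');
    INSERT INTO emp  VALUES (1, 10), (2, 10), (3, 20);
    """
)

# q1: semijoin expressed with EXISTS (each matching department once).
Q1 = """SELECT D.dept_id, D.dept_name FROM dept D
        WHERE EXISTS (SELECT 1 FROM emp E WHERE E.dept_id = D.dept_id)
        ORDER BY D.dept_id"""

# q2: conventional join (one row per matching dept/emp pair).
Q2 = """SELECT D.dept_id, D.dept_name FROM dept D, emp E
        WHERE E.dept_id = D.dept_id ORDER BY D.dept_id"""

q1_rows = conn.execute(Q1).fetchall()
q2_rows = conn.execute(Q2).fetchall()
print("q1:", q1_rows)
print("q2:", q2_rows)
```

On this sample data q1 returns each department that has at least one employee exactly once, whereas q2 repeats a department for every matching employee.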

[Figures: q1 sample output and q2 sample output]
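
To make the communication saving concrete, here is a rough sketch of a semijoin-based distributed join. It assumes DEPT is stored at one site and EMP at another, counts "values shipped" as a stand-in for communication cost, and uses hypothetical data; real systems measure messages and bytes.

```python
# Hypothetical placement: DEPT at site 1, EMP at site 2.
DEPT = [(10, "Sales"), (20, "Research"), (30, "HR"),
        (40, "Legal"), (50, "Finance"), (60, "Support")]
EMP = [(1, "Ann", 10), (2, "Bob", 10), (3, "Cy", 20)]


def join_shipping_whole_relation():
    """Ship all of DEPT to site 2 and join there."""
    shipped = len(DEPT) * 2  # every DEPT tuple has two attribute values
    result = [(d, e) for d in DEPT for e in EMP if d[0] == e[2]]
    return result, shipped


def join_with_semijoin():
    """Reduce DEPT with a semijoin before shipping it."""
    # Step 1: site 2 projects EMP on the join attribute and ships it to site 1.
    keys = {e[2] for e in EMP}
    shipped = len(keys)
    # Step 2: site 1 computes DEPT semijoin EMP (only matching departments).
    reduced = [d for d in DEPT if d[0] in keys]
    # Step 3: site 1 ships the reduced DEPT to site 2, which completes the join.
    shipped += len(reduced) * 2
    result = [(d, e) for d in reduced for e in EMP if d[0] == e[2]]
    return result, shipped


if __name__ == "__main__":
    full_result, full_cost = join_shipping_whole_relation()
    semi_result, semi_cost = join_with_semijoin()
    print("same answer:", full_result == semi_result)
    print("values shipped without semijoin:", full_cost)
    print("values shipped with semijoin:   ", semi_cost)
```

The semijoin pays off here because few departments have employees; when most tuples match, the extra round trip for the join-attribute values can make it more expensive than shipping the whole relation.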

Layers of Query Processing

• The problem of query processing can be decomposed into
several sub-problems, corresponding to various layers.
• Each layer solves a well-defined sub-problem.
• The input is a query on global data expressed in relational
calculus.
• This query is posed on global (distributed) relations, meaning
that data distribution is hidden.

Layers of Query Processing
• Four main layers are involved in distributed query processing.
• Query decomposition
• Data localization
• Global query optimization, and
• Distributed query execution
• The first three layers map the input query into an optimized
distributed query execution plan.
• Query decomposition and data localization correspond to query
rewriting.
• The first three layers are performed by a central control site and
use schema information stored in the global directory (e.g. the global
query optimizer consults the global conceptual schema). A schema is the
skeleton, or structure, of the entire database.
• The fourth layer performs distributed query execution by executing
the plan and returns the answer to the query.

Layers of Query Processing
1. Query Decomposition
• Query decomposition is the first phase of query processing that
transforms a relational calculus query into a relational algebra
query.
• The information needed for this transformation is found in the
global conceptual schema describing the global relations.
•Both input and output queries refer to global relations, without
knowledge of the distribution of data.
•Therefore, query decomposition is the same for centralized and
distributed systems.

•The successive steps of query decomposition are (1) normalization,
(2) analysis, (3) elimination of redundancy, and (4) rewriting.

Layers of Query Processing
• Query Decomposition
• Query decomposition can be viewed as four successive steps.
• First, the calculus query is rewritten in a normalized form. This involves
manipulating the query quantifiers and qualification by applying logical
operator priority.
• Second, the normalized query is analyzed semantically so that incorrect
queries are detected and rejected as early as possible.
• Third, the correct query (still expressed in relational calculus) is simplified.
One way to simplify a query is to eliminate redundant predicates.
• Fourth, the calculus query is restructured as an algebraic query.
• Several algebraic queries can be derived from the same calculus query, and
some algebraic queries are “better” than others.
• A relational algebra query is represented graphically as an operator
tree.
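
The simplification step relies on logical equivalences between qualifications. As a small sketch, the helper below (a hypothetical name, not part of any DBMS) checks by truth table that a qualification with a redundant predicate is equivalent to its simplified form; `p` and `q` stand for arbitrary simple predicates.

```python
from itertools import product


def equivalent(f, g, n_vars):
    """True if the two Boolean qualifications agree on every truth assignment."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n_vars))


# Qualification with a redundant predicate: p OR (p AND q).
original = lambda p, q: p or (p and q)
# Simplified form obtained with the absorption rule.
simplified = lambda p, q: p

if __name__ == "__main__":
    print("p OR (p AND q) is equivalent to p:", equivalent(original, simplified, 2))
```

A real query processor applies rewrite rules (idempotency, absorption, and so on) symbolically rather than enumerating truth tables; the enumeration here only demonstrates that such a rewrite does not change the query's meaning.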

Query Decomposition: operator tree
• An operator tree is a tree in which a leaf node is a relation stored in
the database, and a non-leaf node is an intermediate relation
produced by a relational algebra operator. The sequence of
operations is directed from the leaves to the root, which represents
the answer to the query.
• The transformation of a tuple relational calculus query into an
operator tree can easily be achieved as follows.
• First, a leaf is created for each tuple variable (corresponding to a
relation). In SQL, the leaves are immediately available in the FROM clause.
• Second, the root node is created as a project operation involving the
result attributes. These are found in the SELECT clause in SQL.
• Third, the qualification (SQL WHERE clause) is translated into the
appropriate sequence of relational operations (select, join, union,
etc.) going from the leaves to the root.
• The sequence can be given directly by the order of appearance of the
predicates and operators.
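
The three steps above can be sketched as follows. This is a simplified illustration, not a parser: the `Node` class, the `build_tree` function, and the sample query over hypothetical relations EMP and ASG are all assumptions, and the clauses are passed in already split.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    op: str    # "relation", "select", "join", or "project"
    arg: str   # relation name, predicate, or attribute list
    children: List["Node"] = field(default_factory=list)


def build_tree(select_attrs, from_relations, join_preds, selection_preds):
    """Build an operator tree from the (pre-split) clauses of a query.

    join_preds[i] joins the tree built so far with from_relations[i + 1];
    selection_preds is a list of (relation, predicate) pairs.
    """
    # Step 1: one leaf per relation in the FROM clause.
    nodes = {r: Node("relation", r) for r in from_relations}
    # Step 3: translate the WHERE clause, going from the leaves to the root.
    for rel, pred in selection_preds:
        nodes[rel] = Node("select", pred, [nodes[rel]])
    current = nodes[from_relations[0]]
    for rel, pred in zip(from_relations[1:], join_preds):
        current = Node("join", pred, [current, nodes[rel]])
    # Step 2: the root is a projection on the SELECT attributes.
    return Node("project", ", ".join(select_attrs), [current])


def leaves(node):
    """Relation names at the leaves, left to right."""
    if not node.children:
        return [node.arg]
    return [name for child in node.children for name in leaves(child)]


def render(node, depth=0):
    """Indented text rendering of the tree, root first."""
    lines = ["  " * depth + f"{node.op}({node.arg})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # SELECT ENAME FROM EMP, ASG WHERE EMP.ENO = ASG.ENO AND DUR > 37
    tree = build_tree(["ENAME"], ["EMP", "ASG"],
                      ["EMP.ENO = ASG.ENO"], [("ASG", "DUR > 37")])
    print("\n".join(render(tree)))
```

This sketch applies each selection directly above its relation; other equally valid trees exist for the same query, which is why a later optimization step chooses among them.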

Example of Operator Tree

Layers of Query Processing
2. Data Localization
•The input to the second layer is an algebraic query on global
relations.
•The main role of the second layer is to localize the query’s data
using data distribution information in the fragment schema.
•In a DDB, relations are fragmented into disjoint subsets,
called fragments, each stored at a different site.
•This layer determines which fragments are involved in the query
and transforms the distributed query into a query on fragments.
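
A minimal sketch of this step, assuming a hypothetical horizontal fragmentation of a relation EMP on ranges of its key ENO (the fragment names, sites, and ranges are illustrative): the global relation is the union of its fragments, and fragments whose defining range cannot satisfy the query predicate are dropped.

```python
# Hypothetical fragment schema: EMP is split horizontally on ENO ranges.
# Each range is half-open: low <= ENO < high.
FRAGMENTS = {
    "EMP1": {"site": "S1", "range": (0, 100)},
    "EMP2": {"site": "S2", "range": (100, 200)},
    "EMP3": {"site": "S3", "range": (200, 300)},
}


def localize(query_low, query_high):
    """Fragments that may hold tuples with query_low <= ENO < query_high.

    A fragment is kept only if its range overlaps the query's range;
    the others would contribute empty results and are eliminated.
    """
    return [
        name
        for name, frag in FRAGMENTS.items()
        if frag["range"][0] < query_high and query_low < frag["range"][1]
    ]


if __name__ == "__main__":
    print(localize(120, 130))  # query touching a single fragment
    print(localize(90, 210))   # query spanning all three fragments
```

The query on the global relation EMP is thereby rewritten as a query on the union of the surviving fragments, so sites holding irrelevant fragments are never contacted.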

Layers of Query Processing
3. Global Query Optimization
•The input to the third layer is an algebraic query on fragments.
•The goal of query optimization is to find an execution strategy for the
query which is close to optimal.
•Query optimization consists of finding the “best” ordering of
operators in the query, including communication operators, that
minimizes a cost function (disk space, I/O, buffer space, CPU cost,
communication cost, i.e. limited bandwidth).
•The execution cost of each candidate ordering is therefore predicted
from database statistics (as in static optimization).
•An important aspect of query optimization is join ordering; semijoin
operators can be used to reduce the cost of distributed joins.
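
One common way to combine these components is a weighted sum of CPU, I/O, and communication work. The sketch below assumes such a linear cost function; the unit costs and the two candidate strategies are hypothetical numbers chosen only to show how the preferred strategy can change with the network.

```python
def total_cost(cpu_insts, ios, msgs, bytes_sent, t_cpu, t_io, t_msg, t_tr):
    """Weighted sum of CPU instructions, disk I/Os, messages, and bytes sent."""
    return t_cpu * cpu_insts + t_io * ios + t_msg * msgs + t_tr * bytes_sent


# Hypothetical resource usage of two strategies for the same query.
STRATEGY_A = dict(cpu_insts=1000, ios=100, msgs=1, bytes_sent=10000)  # ship whole relation
STRATEGY_B = dict(cpu_insts=1500, ios=250, msgs=2, bytes_sent=2000)   # semijoin first

# Hypothetical unit costs. The WAN profile counts communication only;
# the LAN profile makes local processing comparable to communication.
WAN = dict(t_cpu=0, t_io=0, t_msg=100, t_tr=1)
LAN = dict(t_cpu=1, t_io=100, t_msg=10, t_tr=1)

if __name__ == "__main__":
    for name, weights in (("WAN", WAN), ("LAN", LAN)):
        cost_a = total_cost(**STRATEGY_A, **weights)
        cost_b = total_cost(**STRATEGY_B, **weights)
        best = "A" if cost_a < cost_b else "B"
        print(f"{name}: A={cost_a} B={cost_b} -> choose {best}")
```

Under the communication-only profile the semijoin strategy wins because it ships less data; once local I/O is weighted comparably, its extra local work can outweigh the saving.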

Layers of Query Processing

4. Distributed Query Execution

•The last layer is performed by all the sites having fragments involved in
the query.
•Each subquery executing at one site, called a local query, is then
optimized using the local schema of the site and executed.
•At this time, the algorithms that implement the relational operators may be
chosen.
•Local optimization uses the algorithms of centralized systems.

The End
