Distributed Database Systems
Chapter 8
Decomposition & Data Localization
Distributed Database System 1
Query Decomposition
1. Normalization
2. Analysis
3. Elimination of Redundancy
4. Rewriting
1. Normalization
For an SQL statement, normalization applies to the
predicate in the WHERE clause, which may be an
arbitrarily complex quantifier-free predicate
preceded by all necessary quantifiers (∀, ∃).
Conjunctive normal form (more practical)
Disjunctive normal form
Transformation rules
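For reference, the logical equivalences usually applied to quantifier-free predicates in this step are listed below. This is a sketch using the standard laws of propositional logic, with $p_1, p_2, p_3$ standing for arbitrary simple predicates (notation assumed here).

```latex
\begin{align*}
p_1 \land p_2 &\Leftrightarrow p_2 \land p_1 \\
p_1 \lor p_2 &\Leftrightarrow p_2 \lor p_1 \\
p_1 \land (p_2 \land p_3) &\Leftrightarrow (p_1 \land p_2) \land p_3 \\
p_1 \lor (p_2 \lor p_3) &\Leftrightarrow (p_1 \lor p_2) \lor p_3 \\
p_1 \land (p_2 \lor p_3) &\Leftrightarrow (p_1 \land p_2) \lor (p_1 \land p_3) \\
p_1 \lor (p_2 \land p_3) &\Leftrightarrow (p_1 \lor p_2) \land (p_1 \lor p_3) \\
\neg (p_1 \land p_2) &\Leftrightarrow \neg p_1 \lor \neg p_2 \\
\neg (p_1 \lor p_2) &\Leftrightarrow \neg p_1 \land \neg p_2 \\
\neg (\neg p_1) &\Leftrightarrow p_1
\end{align*}
```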
Example
The conjunctive normal form
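A minimal sketch of how a WHERE predicate could be brought into conjunctive normal form. The tuple encoding of predicates, the function names, and the sample predicate are all assumptions made for illustration; a real query processor would work on its own parse tree.

```python
# Predicates are nested tuples: ("and", a, b), ("or", a, b), ("not", a),
# or a plain string standing for a simple predicate (hypothetical encoding).

def push_not(p, negate=False):
    """Remove double negations and push NOT down to the atoms (De Morgan)."""
    if isinstance(p, str):
        return ("not", p) if negate else p
    op = p[0]
    if op == "not":
        return push_not(p[1], not negate)
    if negate:
        # Negating AND yields OR of the negations, and vice versa.
        op = "or" if op == "and" else "and"
    return (op, push_not(p[1], negate), push_not(p[2], negate))


def distribute(p):
    """Distribute OR over AND, giving a conjunction of disjunctions."""
    if isinstance(p, str) or p[0] == "not":
        return p
    op, a, b = p[0], distribute(p[1]), distribute(p[2])
    if op == "or":
        if not isinstance(a, str) and a[0] == "and":
            return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
        if not isinstance(b, str) and b[0] == "and":
            return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (op, a, b)


def to_cnf(p):
    return distribute(push_not(p))


# Hypothetical predicate: (PNO = "P1" AND DUR = 12) OR DUR = 24
predicate = ("or", ("and", 'PNO = "P1"', "DUR = 12"), "DUR = 24")
cnf = to_cnf(predicate)
```

Here `cnf` is a conjunction of two disjunctions, each of which pairs one of the original conjuncts with `DUR = 24`.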
2. Analysis
Objective
◦ Reject type-incorrect or semantically incorrect
queries
Type incorrect
◦ Undefined relations or attributes, operations
applied to attributes of the wrong type, etc.
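The type check can be sketched as a lookup against the global schema. The schema contents, the function name `type_errors`, and the way the query is passed in (relation names plus attribute/constant comparisons) are assumptions chosen for illustration.

```python
# Hypothetical global schema: relation -> {attribute: Python type}.
SCHEMA = {
    "EMP": {"ENO": str, "ENAME": str, "TITLE": str},
    "ASG": {"ENO": str, "PNO": str, "RESP": str, "DUR": int},
}


def type_errors(relations, comparisons):
    """Return a list of type errors found in a query.

    relations   -- relation names from the FROM clause
    comparisons -- (relation, attribute, constant) triples from the WHERE clause
    """
    errors = []
    for rel in relations:
        if rel not in SCHEMA:
            errors.append(f"undefined relation {rel}")
    for rel, attr, const in comparisons:
        if rel not in SCHEMA:
            continue  # unknown relation, nothing to check the attribute against
        if attr not in SCHEMA[rel]:
            errors.append(f"undefined attribute {rel}.{attr}")
        elif not isinstance(const, SCHEMA[rel][attr]):
            errors.append(f"type mismatch on {rel}.{attr}")
    return errors


# A string attribute compared with a number, and an attribute that does not exist.
errors = type_errors(["EMP"], [("EMP", "ENAME", 200), ("EMP", "SALARY", 1)])
```

A query is rejected as type incorrect whenever the returned list is non-empty.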
Example
Semantically incorrect
◦ The query has components that do not
contribute to the query result
Fact
◦ It is impossible to determine the semantic
correctness of a general query, but it is
possible to do so for queries that contain
neither disjunction (∨) nor negation (¬)
2. Analysis – Tool of Analysis
Query Graph
◦ one node represents the result relation
◦ the other nodes represent operand relations
◦ edges are of two classes:
  ◦ an edge represents a join if neither of its two nodes
    is the result node
  ◦ an edge represents a projection if one of its nodes is
    the result node
◦ nodes and edges may be labeled by predicates for
selection, projection or join
Join Graph
◦ a subgraph of query graph for join operation
Example 1
Example 1 – Query Graph
Example 1 – Join Graph
A conjunctive query without negation is
semantically incorrect if its query graph is
NOT connected!
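The connectivity test can be sketched as a plain graph traversal over the query graph. The node names and edges below are a hypothetical query over EMP, ASG and PROJ, not the example from the slides.

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    # Depth-first traversal starting from an arbitrary node.
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in adjacent[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(nodes)


nodes = ["RESULT", "EMP", "ASG", "PROJ"]
# Join edges EMP-ASG and ASG-PROJ, projection edge EMP-RESULT.
good_query = [("EMP", "ASG"), ("ASG", "PROJ"), ("EMP", "RESULT")]
# Same query with the ASG-PROJ join predicate missing.
bad_query = [("EMP", "ASG"), ("EMP", "RESULT")]

ok = is_connected(nodes, good_query)
broken = is_connected(nodes, bad_query)
```

In the second graph PROJ is isolated, so the query would be rejected as semantically incorrect.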
Example 2
Example 2 – Query Graph
3. Elimination of Redundancy
Use idempotency rules to eliminate
redundant predicates from the
WHERE clause.
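The idempotency rules commonly used for this simplification are the following standard laws of propositional logic, written here with $p$, $p_1$, $p_2$ as arbitrary predicates (notation assumed for illustration).

```latex
\begin{align*}
p \land p &\Leftrightarrow p &
p \lor p &\Leftrightarrow p \\
p \land \mathit{true} &\Leftrightarrow p &
p \lor \mathit{false} &\Leftrightarrow p \\
p \land \mathit{false} &\Leftrightarrow \mathit{false} &
p \lor \mathit{true} &\Leftrightarrow \mathit{true} \\
p \land \neg p &\Leftrightarrow \mathit{false} &
p \lor \neg p &\Leftrightarrow \mathit{true} \\
p_1 \land (p_1 \lor p_2) &\Leftrightarrow p_1 &
p_1 \lor (p_1 \land p_2) &\Leftrightarrow p_1
\end{align*}
```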
Example
4. Rewriting
Rewrite a calculus query in relational
algebra:
◦ translation, and
◦ restructuring of the algebraic query to improve
performance
Relational algebra tree
a tree defined by:
◦ a root node representing the query result
◦ leaves representing database relations
◦ non-leaf nodes representing relations
produced by operations, and
◦ edges from leaves to root representing the
sequences of operations
How to translate an SQL query into an
algebra tree
1. create a leaf for every relation in the FROM
clause
2. create the root as a project operation
involving attributes in the SELECT clause
3. create the operation sequence by the
predicates and operators in the WHERE
clause
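The three steps above can be sketched as follows. The `Node` class, the `build_tree` and `render` helpers, and the sample query are assumptions for illustration; the WHERE clause is assumed to be already split into join predicates and selection predicates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                      # "relation", "join", "select" or "project"
    arg: str = ""                # relation name, predicate or attribute list
    children: list = field(default_factory=list)


def build_tree(select_attrs, from_relations, joins, selections):
    # Step 1: one leaf per relation in the FROM clause.
    leaves = {name: Node("relation", name) for name in from_relations}
    # Step 3: joins, then selections, in the order given by the WHERE clause.
    tree = leaves[from_relations[0]]
    for rel, pred in joins:      # (relation to join in, join predicate)
        tree = Node("join", pred, [tree, leaves[rel]])
    for pred in selections:
        tree = Node("select", pred, [tree])
    # Step 2: the root is a projection on the SELECT attributes.
    return Node("project", ", ".join(select_attrs), [tree])


def render(node):
    """Write the tree as a nested algebra expression."""
    if node.op == "relation":
        return node.arg
    inner = ", ".join(render(child) for child in node.children)
    return f"{node.op}[{node.arg}]({inner})"


# Hypothetical query:
#   SELECT ENAME FROM EMP, ASG WHERE EMP.ENO = ASG.ENO AND DUR > 36
tree = build_tree(["ENAME"], ["EMP", "ASG"],
                  [("ASG", "EMP.ENO = ASG.ENO")], ["DUR > 36"])
expression = render(tree)
```

The resulting tree is a direct translation and is not yet optimized: the selection sits above the join.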
Example 1
Example 1 – Query tree
How to use transformation rules to
optimize
◦ separate unary operations to simplify the
query expression
◦ unary operations on the same relation may be
grouped to access the same relation once
◦ unary operations may be commuted with
binary operations, so that they are performed
first to reduce the size of intermediate
relations
◦ binary operations may be reordered
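One of these transformations, commuting a selection with a join, can be sketched as below. The tuple encoding of the tree and the function names are assumptions for illustration, and each selection is assumed to name the single relation it restricts.

```python
# Tree nodes (hypothetical encoding):
#   ("rel", name)
#   ("select", (relation, predicate), child)
#   ("join", predicate, left, right)
#   ("project", attributes, child)

def relations(tree):
    """Names of the base relations below a node."""
    if tree[0] == "rel":
        return {tree[1]}
    return set().union(*(relations(child) for child in tree[2:]))


def sink(rel, pred, tree):
    """Place a selection on `rel` as low as possible inside `tree`."""
    if tree[0] == "join":
        left, right = tree[2], tree[3]
        if rel in relations(left):
            return ("join", tree[1], sink(rel, pred, left), right)
        if rel in relations(right):
            return ("join", tree[1], left, sink(rel, pred, right))
    return ("select", (rel, pred), tree)


def push_selections(tree):
    """Commute every selection with the joins below it."""
    if tree[0] == "rel":
        return tree
    if tree[0] == "select":
        rel, pred = tree[1]
        return sink(rel, pred, push_selections(tree[2]))
    return (tree[0], tree[1]) + tuple(push_selections(c) for c in tree[2:])


before = ("project", "ENAME",
          ("select", ("ASG", "DUR > 36"),
           ("join", "EMP.ENO = ASG.ENO", ("rel", "EMP"), ("rel", "ASG"))))
after = push_selections(before)
```

After the rewrite the selection on ASG is applied before the join, so the join reads a smaller intermediate relation.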
Example 2 – the optimization of previous
query tree
Localization of Distributed Data
Task:
Translate a query on global relations into algebraic
queries on physical fragments, and optimize the
query by reduction.
Reduction for Primary Horizontal
Fragmentation
Example:
EMP(ENO, ENAME, TITLE) is fragmented
Reduction with selection
Rule 1
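A common statement of this rule, given as a sketch: assume relation $R$ is horizontally fragmented into $R_1, \dots, R_w$ with $R_j = \sigma_{p_j}(R)$, and let $p_i$ be the selection predicate of the query (symbols assumed here). A selection on a fragment is empty, and can be dropped, when its predicate contradicts the fragment's defining predicate.

```latex
\[
\sigma_{p_i}(R_j) = \emptyset
\quad \text{if} \quad
\forall x \in R : \neg \bigl( p_i(x) \land p_j(x) \bigr)
\]
```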
Example 1
For the fragmented EMP we have
Example 1 – Step 1
Generate a global query tree
Example 1 – Step 2
Substitute fragments for EMP
Example 1 – Step 3
Reduce the tree after substituting fragments for EMP:
ENO = "E5" contradicts ENO <= "E3" and ENO > "E6",
so the fragments defined by those predicates are eliminated
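The reduction in this example can be sketched as a range check per fragment. Only the predicates ENO <= "E3" and ENO > "E6" appear in the text; the fragment names and the middle fragment ("E3" < ENO <= "E6") are assumptions that make the fragmentation complete and disjoint.

```python
# Each fragment is described by a range (low, high] on ENO; None = unbounded.
FRAGMENTS = {
    "EMP1": (None, "E3"),   # ENO <= "E3"
    "EMP2": ("E3", "E6"),   # "E3" < ENO <= "E6"  (assumed middle fragment)
    "EMP3": ("E6", None),   # ENO > "E6"
}


def may_contain(bounds, value):
    """True if a tuple with ENO = value can belong to the fragment."""
    low, high = bounds
    return (low is None or value > low) and (high is None or value <= high)


def reduce_selection(fragments, value):
    """Keep only the fragments that do not contradict ENO = value."""
    return [name for name, bounds in fragments.items() if may_contain(bounds, value)]


remaining = reduce_selection(FRAGMENTS, "E5")
```

String comparison is used for brevity, which only orders employee numbers correctly while they all have the same length.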
Reduction with join
Rule 2
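A common statement of this rule, given as a sketch: assume fragments $R_i$ and $R_j$ are defined by predicates $p_i$ and $p_j$ on the same join attribute (symbols assumed here). The join of two fragments is empty when their defining predicates cannot both hold for matching tuples.

```latex
\[
R_i \bowtie R_j = \emptyset
\quad \text{if} \quad
\forall x \in R_i,\ \forall y \in R_j : \neg \bigl( p_i(x) \land p_j(y) \bigr)
\]
```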
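Note that the following transformation (distributing
a join over a union) is often used to eliminate
useless joins in reduction:
$(R_1 \cup R_2) \bowtie S = (R_1 \bowtie S) \cup (R_2 \bowtie S)$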
Example
Assume EMP is fragmented as before, and ASG
is fragmented as
Example
EMP and ASG are fragmented using predicates
on the same attribute ENO
Example
Generic query tree
Example
Reduced query tree
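The step from the generic tree to the reduced tree can be sketched by distributing the join over the unions and keeping only the fragment pairs whose ENO ranges overlap. The fragment names and ranges are assumptions for illustration: only ENO <= "E3" and ENO > "E6" for EMP appear in the text, and the ASG split at "E3" is hypothetical.

```python
# Ranges (low, high] on ENO; None means unbounded.
EMP = {"EMP1": (None, "E3"), "EMP2": ("E3", "E6"), "EMP3": ("E6", None)}
ASG = {"ASG1": (None, "E3"), "ASG2": ("E3", None)}   # assumed fragmentation


def overlaps(a, b):
    """True if the ranges (low, high] a and b can share an ENO value."""
    low = max((x for x in (a[0], b[0]) if x is not None), default=None)
    high = min((x for x in (a[1], b[1]) if x is not None), default=None)
    return low is None or high is None or low < high


def reduced_joins(left, right):
    """Distribute the join over both unions and drop the empty joins."""
    return [(l, r)
            for l, l_range in left.items()
            for r, r_range in right.items()
            if overlaps(l_range, r_range)]


joins = reduced_joins(EMP, ASG)
```

Under these assumptions three of the six candidate joins survive, and they can be executed in parallel at the sites holding the fragments.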
Reduction for Vertical Fragmentation
The reconstruction operation for a
vertically fragmented relation is the join.
Every fragment must contain the key of
the relation.
Example
Let R(A1, A2, ..., An) be a relation,
Rule 3
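A common statement of this rule, given as a sketch: write $A = \{A_1, \dots, A_n\}$, let $R_i = \Pi_{A'}(R)$ with $A' \subseteq A$ be a vertical fragment, let $K$ be the key, and let $D$ be the non-key attributes the query projects on (symbols assumed here). A projection on a fragment is useless when the fragment holds none of the requested attributes.

```latex
\[
\Pi_{D,K}(R_i) \ \text{is useless}
\quad \text{if} \quad
D \cap A' = \emptyset
\]
```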
Example
Example
Generic query tree
Example
Reduced query tree
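Reduction for vertical fragmentation can be sketched as an attribute-set test. The fragment definitions (EMP1 on ENO, ENAME and EMP2 on ENO, TITLE) and the projection on ENAME are assumptions for illustration, built on the EMP(ENO, ENAME, TITLE) schema used earlier.

```python
# Hypothetical vertical fragments of EMP; both contain the key ENO.
FRAGMENTS = {
    "EMP1": {"ENO", "ENAME"},
    "EMP2": {"ENO", "TITLE"},
}


def reduce_vertical(fragments, projected):
    """Keep the fragments that hold at least one projected attribute.

    `projected` is the set of non-key attributes requested by the query.
    """
    return [name for name, attrs in fragments.items() if attrs & projected]


# SELECT ENAME FROM EMP: only the fragment holding ENAME is needed.
needed = reduce_vertical(FRAGMENTS, {"ENAME"})
```

With a single remaining fragment the reconstruction join disappears from the query altogether.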
Reduction for Derived Fragmentation
S has a primary horizontal fragmentation, and R is
fragmented by $R_i = R \ltimes_A S_i$, where A is the
set of common attributes and a foreign key
of R referring to S.
Example
In the Engineering database ASG is
fragmented based on EMP as
Query optimization method: distribute
joins over unions and eliminate the
joins made useless by predicate conflicts.
Example
Example
Generic query (ignoring the final
projection)
Example
Example
Distribute join over union
Example
Remove the useless join (left branch of the tree)
to get the best result.
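The method can be sketched as two list operations. The fragment names and the mapping between them are assumptions for illustration: ASG1 and ASG2 are taken to be derived from disjoint EMP fragments EMP1 and EMP2, and an earlier selection is assumed to have already eliminated EMP1.

```python
# Hypothetical derived fragmentation: ASGi is the semijoin of ASG with EMPi.
DERIVED_FROM = {"ASG1": "EMP1", "ASG2": "EMP2"}


def distribute_join(asg_fragments, emp_fragments):
    """Rewrite (union of ASG fragments) JOIN (union of EMP fragments)
    as a union of fragment-to-fragment joins."""
    return [(a, e) for a in asg_fragments for e in emp_fragments]


def drop_useless(joins):
    """ASGi JOIN EMPj is empty unless ASGi was derived from EMPj,
    assuming the EMP fragments are disjoint."""
    return [(a, e) for a, e in joins if DERIVED_FROM[a] == e]


# EMP1 was removed by a selection, so only EMP2 remains on the EMP side.
candidates = distribute_join(["ASG1", "ASG2"], ["EMP2"])
reduced = drop_useless(candidates)
```

Of the two joins produced by the distribution, only the one between ASG2 and EMP2 can return tuples.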
Reduction for Hybrid Fragmentation
Hybrid fragmentation is the
combination of horizontal and
vertical fragmentation.
Example
EMP is first fragmented vertically, and then
horizontally.
Combine the three rules discussed above to
reduce queries on hybrid fragments.
Example
A query on EMP, fragmented as in the example above.
Example
By rule 3, E3 is eliminated, and by rule 1, E1 is
eliminated. The reduced query is
Conclusions
Decomposition generates algebraic
queries from calculus queries.
Localization expresses algebraic queries on
fragments. An algebraic query can be
optimized by transformation, heuristics,
and elimination of useless operations.