0% found this document useful (0 votes)
24 views11 pages

Class 19

Uploaded by

ktang950
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views11 pages

Class 19

Uploaded by

ktang950
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 11

Relational Query

Optimization II: Size


Estimation and Plan
Generation
Size Estimation and Reduction
Factors SELECT attribute list
FROM relation list
 Consider a query block:WHERE term1 AND ... AND termk
 Maximum # tuples in result is the product of the
cardinalities of relations in the FROM clause.
 Reduction factor (RF) associated with each term
reflects the impact of the term in reducing result
size. Result cardinality = Max # tuples * product
of all RF’s.
– Implicit assumption that terms are independent!
– Term col=value has RF = 1/NKeys(I), given index I on col
– Term col1=col2 has RF = 1/MAX(NKeys(I1), NKeys(I2))
– Term col>value has RF = (High(I)-value)/(High(I)-Low(I))
Result Size Estimation
(cont’d)
 Selections:
– Use reduction factors, logic, probability
– Often use histograms, other statistics
 Projections:
– Ignore duplicate elimination (done last, if
needed)
– Tuple size is reduced  fewer pages for result
 Joins: like selections, plus:
– Use candidate key information
– Should consider “dangling” tuples
Relational Algebra
Equivalences
 Allow us to choose different join orders and to
`push’ selections and projections ahead of
joins.  c1 ... cn  R   c1  . . .  cn  R
 Selections:
 c1  c 2  R   c 2  c1  R (Commute)
(Cascade)

 Projections:  a1  R   a1 . . .  an  R (Cascade)
 Joins:
R (S T)  (R S) T (Associative)

(R S)  (S R) (Commute)
More Equivalences
 A projection commutes with a selection that only
uses attributes retained by the projection.
 Selection between attributes of the two
arguments of a cross-product converts cross-
product to a join.
 A selection on just attributes of R commutes with
R S. (i.e.,  (R S)   (R) S)
 Similarly, if a projection follows a join R S, we
can `push’ it by retaining only attributes of R (and
S) that are needed for the join or are kept by the
projection.
Enumeration of Alternative
Plans
 There are two main cases:
– Single-relation plans
– Multiple-relation plans
 For queries over a single relation, queries consist of a
combination of selects, projects, and aggregate ops:
– Each available access path (file scan / index) is considered,
and the one with the least estimated cost is chosen.
– The different operations are essentially carried out
together (e.g., if an index is used for a selection, projection
is done for each retrieved tuple, and the resulting tuples
are pipelined into the aggregate computation).
Queries Over Multiple
Relations
 Fundamental decision in System R: only left-deep
join trees are considered.
– As the number of joins increases, the number of alternative
plans grows rapidly; we need to restrict the search space.
– Left-deep trees allow us to generate all fully pipelined
plans.
 Intermediate results not written to temporary files.

 Not all left-deep trees are fully pipelined (e.g., SM join).

D D

C C

A B C D A B B
A
Enumeration of Left-Deep
Plans
 Left-deep plans differ only in the order of relations,
the access method for each relation, and the join
method for each join.
 Enumerated using N passes (if N relations joined):
– Pass 1: Find best 1-relation plan for each relation.
– Pass 2: Find best way to join result of each 1-relation plan
(as outer) to another relation. (All 2-relation plans.)
– Pass N: Find best way to join result of a (N-1)-relation plan
(as outer) to the Nth relation. (All N-relation plans.)
 For each subset of relations, retain only:
– Cheapest plan overall, plus
– Cheapest plan for each interesting order of the tuples.
Enumeration of Plans
(Contd.)
 ORDER BY, GROUP BY, aggregates etc. handled
as a final step, using either an `interestingly
ordered’ plan or an additional sorting
operator.
 An N-1 way plan is not combined with an
additional relation unless there is a join
condition between them, unless all
predicates in WHERE have been used up.
– i.e., avoid Cartesian products if possible.
 In spite of pruning plan space, this approach
is still exponential in the # of tables.
Sailors:
sname
B+ tree on rating
Example Hash on sid
Reserves:
 Pass1: B+ tree on bid
sid=sid
– Sailors: B+ tree matches rating>5,
and is probably cheapest. However,
if this selection is expected to bid=100 rating > 5
retrieve a lot of tuples, and index is
unclustered, file scan may be cheaper.
 Still, B+ tree plan kept (because tuples are in rating order).Sailors
Reserves

– Reserves: B+ tree on bid matches bid=500; cheapest.

 Pass 2:
– We consider each plan retained from Pass 1 as the
outer, and consider how to join it with the (only)
other relation.
e.g., Reserves as outer: Hash index can be used to get
Sailors tuples 1
Summary
 Query optimization is an important task in a
relational DBMS.
 Typically optimize 1 “select…” (query block) at a
time
 Must understand optimization in order to understand
the performance impact of a given database design
(relations, indexes) on a workload (set of queries).
 Two parts to optimizing a query:
– Consider a set of alternative plans.
 Must prune search space; typically, left-deep plans only.
– Must estimate cost of each plan that is considered.
 Must estimate size of result and cost for each plan node.
 Key issues: Statistics, indexes, operator implementations.
1

You might also like