Class 19
(R ⋈ S) ≡ (S ⋈ R) (Commute)
More Equivalences
A projection commutes with a selection that only
uses attributes retained by the projection.
A selection that compares attributes of the two
arguments of a cross-product converts the
cross-product to a join.
A selection on just attributes of R commutes with
R ⋈ S (i.e., σ(R ⋈ S) ≡ σ(R) ⋈ S).
Similarly, if a projection follows a join R ⋈ S, we
can 'push' it by retaining only attributes of R (and
S) that are needed for the join or are kept by the
projection.
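The equivalences above can be checked on toy data. The following is a minimal sketch, assuming relations are represented as lists of dicts with bag semantics; the relation contents, attribute names (r_id, a, s_id, r_ref), and helper functions are hypothetical and chosen only for illustration.

```python
# Toy relations as lists of dicts (bag semantics). All names and data are
# hypothetical, chosen only to illustrate the equivalences.
R = [{"r_id": 1, "a": 10}, {"r_id": 2, "a": 20}, {"r_id": 3, "a": 30}]
S = [{"s_id": 7, "r_ref": 1}, {"s_id": 8, "r_ref": 3}, {"s_id": 9, "r_ref": 3}]

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection without duplicate elimination."""
    return [{k: t[k] for k in attrs} for t in rel]

def cross(r, s):
    """Cross-product: every pairing of a tuple of r with a tuple of s."""
    return [{**x, **y} for x in r for y in s]

def join(r, s, pred):
    """Nested-loops join: emit only pairs that satisfy the join predicate."""
    return [{**x, **y} for x in r for y in s if pred({**x, **y})]

def canon(rel):
    """Order-independent form, so two results can be compared as bags."""
    return sorted(tuple(sorted(t.items())) for t in rel)

on_ref = lambda t: t["r_id"] == t["r_ref"]   # join condition
a_big = lambda t: t["a"] > 15                # selection on R's attributes only

# Projection commutes with a selection that uses only retained attributes.
eq_proj_sel = canon(project(select(R, a_big), ["a"])) == canon(select(project(R, ["a"]), a_big))

# A selection over a cross-product gives the same result as a join.
eq_cross_join = canon(select(cross(R, S), on_ref)) == canon(join(R, S, on_ref))

# A selection on attributes of R alone can be pushed below the join.
eq_push_sel = canon(select(join(R, S, on_ref), a_big)) == canon(join(select(R, a_big), S, on_ref))

# A projection can be pushed below the join if the join attributes are kept.
eq_push_proj = canon(project(join(R, S, on_ref), ["a"])) == canon(
    project(join(project(R, ["r_id", "a"]), project(S, ["r_ref"]), on_ref), ["a"]))

print(eq_proj_sel, eq_cross_join, eq_push_sel, eq_push_proj)
```

Each comparison evaluates both sides of one equivalence on the same data; a check on one small instance illustrates the rule but does not prove it.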
Enumeration of Alternative Plans
There are two main cases:
– Single-relation plans
– Multiple-relation plans
Queries over a single relation consist of a
combination of selects, projects, and aggregate ops:
– Each available access path (file scan / index) is considered,
and the one with the least estimated cost is chosen.
– The different operations are essentially carried out
together (e.g., if an index is used for a selection, projection
is done for each retrieved tuple, and the resulting tuples
are pipelined into the aggregate computation).
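A single-relation plan can be sketched as follows, assuming the cost estimates are already available. The cost numbers, the sample Sailors tuples, and all function names are hypothetical; the sketch scans an in-memory list whichever access path is chosen, and is only meant to show the two ideas: pick the cheapest access path, then pipeline selection, projection, and aggregation tuple by tuple.

```python
# Hypothetical cost estimates (in page I/Os) for the available access paths.
access_paths = {"file scan": 500, "B+ tree on rating": 120}

def choose_access_path(paths):
    """Return the name of the access path with the least estimated cost."""
    return min(paths, key=paths.get)

# Made-up Sailors tuples, for illustration only.
sailors = [
    {"sid": 1, "sname": "s1", "rating": 7, "age": 40.0},
    {"sid": 2, "sname": "s2", "rating": 8, "age": 50.0},
    {"sid": 3, "sname": "s3", "rating": 10, "age": 30.0},
    {"sid": 4, "sname": "s4", "rating": 3, "age": 25.0},
]

def scan(table):
    """Stand-in for an access path: yields one tuple at a time."""
    yield from table

def select_op(tuples, pred):
    """Selection applied to each tuple as it is retrieved."""
    return (t for t in tuples if pred(t))

def project_op(tuples, attr):
    """Projection applied to each tuple that survives the selection."""
    return (t[attr] for t in tuples)

def average(values):
    """Aggregate that consumes the pipelined values one at a time."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total / count if count else None

best_path = choose_access_path(access_paths)
# Generators are chained, so no intermediate result is materialized.
avg_age = average(project_op(select_op(scan(sailors), lambda t: t["rating"] > 5), "age"))
print(best_path, avg_age)
```

Because every operator is a generator, each tuple flows through selection and projection into the aggregate before the next tuple is fetched.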
Queries Over Multiple Relations
Fundamental decision in System R: only left-deep
join trees are considered.
– As the number of joins increases, the number of alternative
plans grows rapidly; we need to restrict the search space.
– Left-deep trees allow us to generate all fully pipelined
plans, i.e., plans in which intermediate results are not
written to temporary files.
[Figure: three example join trees over relations A, B, C, and D.]
Enumeration of Left-Deep Plans
Left-deep plans differ only in the order of relations,
the access method for each relation, and the join
method for each join.
Enumerated using N passes (if N relations joined):
– Pass 1: Find best 1-relation plan for each relation.
– Pass 2: Find best way to join result of each 1-relation plan
(as outer) to another relation. (All 2-relation plans.)
– Pass N: Find best way to join result of a (N-1)-relation plan
(as outer) to the Nth relation. (All N-relation plans.)
For each subset of relations, retain only:
– Cheapest plan overall, plus
– Cheapest plan for each interesting order of the tuples.
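The pass structure above can be sketched as a small dynamic program. This is a simplified illustration, not System R's actual algorithm: it assumes a toy cost model (the sum of estimated intermediate result sizes), keeps only the single cheapest plan per subset, and ignores access methods, join methods, and interesting orders. The function name, cardinalities, and selectivities are hypothetical.

```python
from itertools import combinations

def best_left_deep_plan(card, sel):
    """Find the cheapest left-deep join order under a toy cost model.

    card: relation name -> estimated cardinality
    sel:  frozenset({a, b}) -> selectivity of the join predicate between
          a and b; a missing pair means no predicate (factor 1.0)
    Returns (cost, result_size, join_order).
    """
    rels = sorted(card)
    # Pass 1: the best 1-relation "plan" is just the relation itself.
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels}
    # Pass k: extend each best (k-1)-relation plan (as outer) by one relation.
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            s = frozenset(subset)
            for inner in subset:
                outer = s - {inner}
                o_cost, o_size, o_order = best[outer]
                factor = 1.0
                for r in outer:
                    factor *= sel.get(frozenset((r, inner)), 1.0)
                size = o_size * card[inner] * factor
                cost = o_cost + size
                # Retain only the cheapest plan for this subset.
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, size, o_order + (inner,))
    return best[frozenset(rels)]

# Hypothetical statistics for three relations.
card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, size, order = best_left_deep_plan(card, sel)
print(order, cost)
```

With these made-up statistics the sketch joins C with B first and adds A last, because the B-C join produces the smallest intermediate result. Tracking interesting orders would mean keeping several entries per subset instead of one.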
Enumeration of Plans (Contd.)
ORDER BY, GROUP BY, aggregates, etc. are handled
as a final step, using either an 'interestingly
ordered' plan or an additional sorting
operator.
An (N-1)-way plan is combined with an
additional relation only if there is a join
condition between them, or if all
predicates in WHERE have already been used up.
– i.e., avoid Cartesian products if possible.
In spite of pruning the plan space, this approach
is still exponential in the number of tables.
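A rough sense of the growth can be had by counting. The sketch below assumes only join order matters (access and join methods are ignored): N relations admit N! left-deep join orders, while retaining one best plan per non-empty subset of relations needs 2^N - 1 entries. The function names are hypothetical.

```python
from math import factorial

def left_deep_orders(n):
    """Number of left-deep join orders over n relations (n!)."""
    return factorial(n)

def retained_subsets(n):
    """Number of non-empty subsets of n relations (2^n - 1)."""
    return 2 ** n - 1

for n in (2, 4, 8, 12):
    print(n, left_deep_orders(n), retained_subsets(n))
```

Keeping one plan per subset is far cheaper than examining every order, but the number of subsets still doubles with each added table.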
Example
[Figure: query tree projecting sname over a join on sid=sid,
with selection bid=100 on Reserves and rating > 5 on Sailors.]
Sailors:
– B+ tree on rating
– Hash on sid
Reserves:
– B+ tree on bid
Pass 1:
– Sailors: B+ tree matches rating>5,
and is probably cheapest. However,
if this selection is expected to
retrieve a lot of tuples, and the index is
unclustered, a file scan may be cheaper.
Still, the B+ tree plan is kept (because tuples are in rating order).
Pass 2:
– We consider each plan retained from Pass 1 as the
outer, and consider how to join it with the (only)
other relation.
e.g., Reserves as outer: the hash index on sid can be used to get
matching Sailors tuples.
Summary
Query optimization is an important task in a
relational DBMS.
Typically, one “select…” (query block) is optimized at a
time.
Must understand optimization in order to understand
the performance impact of a given database design
(relations, indexes) on a workload (set of queries).
Two parts to optimizing a query:
– Consider a set of alternative plans.
Must prune search space; typically, left-deep plans only.
– Must estimate cost of each plan that is considered.
Must estimate size of result and cost for each plan node.
Key issues: Statistics, indexes, operator implementations.