14. DB - Lecture Query Optimization
Objectives
• Introduction
• Major Phases in Query Processing
• Optimizing Queries
Introduction
Query Processing
• It is the procedure of transforming a high-level query (SQL) into a low-level language, such as relational algebra, and evaluating it. Its main steps are:
• Parsing and Translation
• Optimization
• Evaluation
Query:
select salary from Employee where
salary>10000;
Relational Algebra:
σsalary>10000 (πsalary (Employee))
πsalary (σsalary>10000 (Employee))
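The two expressions are equivalent: selecting before or after projecting gives the same salaries. The minimal Python sketch below checks this on a small, made-up Employee table; modelling a relation as a list of dicts with set-semantics projection is an assumption for illustration only, not how a DBMS stores data.

```python
# Toy relational operators over a list-of-dicts relation (illustrative only).
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """π: keep only the named attributes, eliminating duplicates (set semantics)."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# Hypothetical sample data.
employee = [
    {"name": "Ali", "salary": 8000},
    {"name": "Sara", "salary": 12000},
    {"name": "Omar", "salary": 15000},
    {"name": "Hina", "salary": 12000},
]

high = lambda t: t["salary"] > 10000
plan_1 = select(project(employee, ["salary"]), high)   # σ(π(Employee))
plan_2 = project(select(employee, high), ["salary"])   # π(σ(Employee))

print(sorted(t["salary"] for t in plan_1))  # [12000, 15000]
print(sorted(t["salary"] for t in plan_2))  # [12000, 15000]
```

The swap is valid here only because the predicate uses salary, which the projection keeps; if σ referenced an attribute that π drops, σ could not be applied after π.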
Indexes and Query Optimization
Processing a High-Level Query
Query in high-level language → Query Optimizer → Execution plan → Query result
• The scanner identifies the query tokens (such as SQL keywords, attribute names, and relation names) that appear in the text of the query.
• The parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.
• The query must also be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
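The scanning, parsing, and validation steps can be sketched in Python for a deliberately tiny SQL subset (SELECT attribute list FROM one relation WHERE attribute op number). The grammar, the SCHEMA dictionary standing in for the system catalog, and the names scan, parse, and validate are all assumptions made for illustration; a real DBMS uses a full SQL grammar and its catalog.

```python
import re

# Hypothetical schema standing in for the system catalog: relation -> attributes.
SCHEMA = {"employee": {"fname", "lname", "salary", "dno"}}
KEYWORDS = {"select", "from", "where"}
COMPARISONS = {"<", ">", "=", "<=", ">=", "<>"}
# One token: an identifier, an integer, or a symbol (optional leading whitespace).
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|(\d+)|(<=|>=|<>|[<>=,;]))")

def scan(query):
    """Scanner: turn the query text into (kind, value) tokens."""
    tokens, pos, text = [], 0, query.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        ident, number, symbol = m.groups()
        if ident:
            kind = "KEYWORD" if ident.lower() in KEYWORDS else "NAME"
            tokens.append((kind, ident.lower()))  # names treated case-insensitively
        elif number:
            tokens.append(("NUMBER", int(number)))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parser: accept only  SELECT a1, ..., an FROM r WHERE attr op number [;]"""
    pos = 0

    def take(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of query")
        k, v = tokens[pos]
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind}, got {v!r}")
        pos += 1
        return v

    take("KEYWORD", "select")
    attrs = [take("NAME")]
    while pos < len(tokens) and tokens[pos] == ("SYMBOL", ","):
        pos += 1
        attrs.append(take("NAME"))
    take("KEYWORD", "from")
    relation = take("NAME")
    take("KEYWORD", "where")
    cond_attr = take("NAME")
    op = take("SYMBOL")
    if op not in COMPARISONS:
        raise SyntaxError(f"expected a comparison operator, got {op!r}")
    value = take("NUMBER")
    if pos < len(tokens) and tokens[pos] == ("SYMBOL", ";"):
        pos += 1
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after query")
    return {"attrs": attrs, "relation": relation, "where": (cond_attr, op, value)}

def validate(tree, schema=SCHEMA):
    """Validator: every relation and attribute name must exist in the schema."""
    if tree["relation"] not in schema:
        raise NameError(f"unknown relation {tree['relation']!r}")
    columns = schema[tree["relation"]]
    for attr in tree["attrs"] + [tree["where"][0]]:
        if attr not in columns:
            raise NameError(f"unknown attribute {attr!r}")
    return tree

tree = validate(parse(scan("SELECT lname, salary FROM Employee WHERE salary > 10000;")))
print(tree)
```

A query that is syntactically valid but names an attribute missing from SCHEMA (for example, bonus) passes scan and parse and is rejected only by validate, which mirrors the separation of the three steps above.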
• An internal representation of the query is then
created, usually as a tree data structure called a
query tree.
• It is also possible to represent the query using a
graph data structure called a query graph, which is
generally a directed acyclic graph (DAG).
• The DBMS must then devise an execution strategy or
query plan for retrieving the results of the query from
the database files.
• A query has many possible execution strategies, and the process of choosing a suitable one for processing the query is known as query optimization.
Major Phases in Query Processing
• Scan, Parse, and Validate Query: transform a query written in a high-level language (e.g., SQL) into a correct and efficient execution strategy expressed in a low-level language (e.g., relational algebra)
Example: Translating SQL Queries into Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
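Because the inner query is uncorrelated, a common textbook approach is to split it into two query blocks: the inner block computes c = MAX(SALARY) over σDNO=5(EMPLOYEE), and the outer block becomes πLNAME, FNAME(σSALARY>c(EMPLOYEE)). The Python sketch below evaluates the blocks in that order on made-up EMPLOYEE rows; the tuple layout (LNAME, FNAME, SALARY, DNO) and the function names are assumptions.

```python
# Hypothetical EMPLOYEE rows: (LNAME, FNAME, SALARY, DNO).
EMPLOYEE = [
    ("Ahmed", "Sana", 30000, 5),
    ("Baig", "Umar", 40000, 5),
    ("Chaudhry", "Nida", 43000, 4),
    ("Dar", "Faisal", 55000, 1),
    ("Iqbal", "Zara", 25000, 4),
]

def inner_block(employees):
    """Inner block: MAX(SALARY) over σ DNO=5 (EMPLOYEE), evaluated once.
    SQL's MAX over an empty set is NULL; it is modelled here as None."""
    return max((salary for _, _, salary, dno in employees if dno == 5), default=None)

def outer_block(employees, c):
    """Outer block: π LNAME, FNAME (σ SALARY > c (EMPLOYEE))."""
    if c is None:  # SALARY > NULL is never true, so no row qualifies
        return []
    return [(lname, fname) for lname, fname, salary, _ in employees if salary > c]

c = inner_block(EMPLOYEE)  # the subquery collapses to a constant
result = outer_block(EMPLOYEE, c)
print(c, result)  # 40000 [('Chaudhry', 'Nida'), ('Dar', 'Faisal')]
```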
Optimizing Queries
• The process of choosing a suitable execution strategy for
processing a query
• Because the same high-level query has many equivalent transformations, the aim is to choose the one that minimizes resource usage
• Generally, an efficient execution plan reduces the total execution time of a query, thereby reducing its response time
• The problem is computationally intractable for a large number of relations, so the strategy adopted in practice is to find a near-optimal solution
• Two main techniques for query optimization are:
• Heuristic rules that order operations in a query
• Comparing different strategies based on relative cost, and selecting the one that minimizes resource usage
Steps in Query Optimization
• Process for heuristic optimization:
1. The parser of a high-level query generates an initial internal representation.
2. Heuristic rules are applied to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
Example schema:
checked_out(call_number, copy_number, borrower_id, date_due)
borrower(borrower_id, name)
Example Query
• SELECT title FROM book NATURAL JOIN book_author WHERE author = 'Bruce Schneier'
• Many possible execution plans. For example:
A. πtitle (σauthor = ‘Bruce Schneier’ (Book ⋈ BookAuthor))
• Relevant information:
• How many records are in each table?
• What indexes do we have?
• How many books did Bruce Schneier write?
Evaluating Execution Plans
• Compare:
A. πtitle (σauthor = ‘Bruce Schneier’ (Book ⋈ BookAuthor))
B. πtitle (Book ⋈ (σauthor = ‘Bruce Schneier’ BookAuthor))
• Suppose:
• BookAuthor has 20K tuples
• Book has 10K tuples (an average of two authors per book)
• Only 2 BookAuthor tuples contain “Bruce Schneier”
• Relevant indexes exist
• Selection Strategies
• Join Strategies
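With the figures above, a back-of-the-envelope comparison of the two plans can be sketched in Python. It additionally assumes (this is not stated on the slide) that the join is on the book's key, that each BookAuthor tuple matches exactly one Book tuple, and that "relevant indexes" means an index on BookAuthor.author and one on Book's key; the cost measure (tuples read) is a simplification.

```python
def plan_a(n_book, n_book_author):
    """σ after ⋈: read both relations fully and materialize the whole join;
    each BookAuthor tuple matches one Book tuple (assumption)."""
    tuples_read = n_book + n_book_author
    join_result = n_book_author
    return tuples_read, join_result

def plan_b(n_matching):
    """σ before ⋈: an index lookup on BookAuthor.author returns the matching
    tuples, then one index probe into Book per matching tuple (assumption)."""
    tuples_read = n_matching + n_matching
    join_result = n_matching
    return tuples_read, join_result

print(plan_a(10_000, 20_000))  # (30000, 20000)
print(plan_b(2))               # (4, 2)
```

Both plans produce the same two titles; plan B simply never builds the 20K-tuple intermediate result.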
Selection Strategies
• How to perform selection (σ)?
• Example:
A. Borrower and BookAuthor have no attributes in common, so a Cartesian product is formed. This results in a temporary table with 20 million tuples!
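Two common ways to perform an equality selection are a linear scan, which examines every tuple, and an index lookup, which fetches only matching tuples. The Python sketch below contrasts them, using a plain dict as a stand-in hash index; the relation contents, the index structure, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def linear_scan(relation, attr, value):
    """Examine every tuple; works for any predicate but costs O(n)."""
    examined, result = 0, []
    for t in relation:
        examined += 1
        if t[attr] == value:
            result.append(t)
    return result, examined

def build_hash_index(relation, attr):
    """Map each attribute value to the positions of the tuples that hold it."""
    index = defaultdict(list)
    for pos, t in enumerate(relation):
        index[t[attr]].append(pos)
    return index

def index_lookup(relation, index, value):
    """Equality selection through the index: only matching tuples are fetched."""
    positions = index.get(value, [])
    return [relation[p] for p in positions], len(positions)

# Hypothetical BookAuthor relation: 20K tuples, 2 of them by Bruce Schneier.
book_author = [{"call_no": i, "author": f"author{i}"} for i in range(19_998)]
book_author += [{"call_no": 20_000, "author": "Bruce Schneier"},
                {"call_no": 20_001, "author": "Bruce Schneier"}]

rows_scan, cost_scan = linear_scan(book_author, "author", "Bruce Schneier")
idx = build_hash_index(book_author, "author")
rows_idx, cost_idx = index_lookup(book_author, idx, "Bruce Schneier")
print(cost_scan, cost_idx)  # 20000 2
```

Building the index also costs a full pass, but in a DBMS the index is maintained once and reused by many queries; a hash index helps only equality predicates, whereas range predicates need an ordered index such as a B+-tree.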
Statistics and Query Optimization
• Using statistics about database objects can help speed up queries
• On a relation r:
• nr = number of tuples in the relation
• lr = size (in bytes) of a tuple in the relation
• fr = blocking factor, the number of tuples per block
• br = number of blocks used by the relation
• Thus:
• fr = floor(block size / lr), if tuples do not span blocks
• br = ceiling(nr / fr), if the tuples of r reside in a single file and are not clustered with other relations
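The two formulas translate directly into integer arithmetic. The Python sketch below applies them to the tuple counts and sizes from the Table Statistics slide; the 4096-byte block size is an assumption, since the slides do not give one.

```python
def blocking_factor(block_size, tuple_size):
    """f_r = floor(block size / l_r), assuming tuples do not span blocks."""
    return block_size // tuple_size

def blocks_needed(n_tuples, f_r):
    """b_r = ceiling(n_r / f_r), assuming r is stored alone in one file.
    Integer ceiling division avoids floating-point rounding."""
    return (n_tuples + f_r - 1) // f_r

BLOCK_SIZE = 4096  # assumed block size in bytes; not given in the slides

for name, n_r, l_r in [("borrower", 2000, 58),
                       ("checked_out", 1000, 74),
                       ("book_author", 10_000, 100)]:
    f_r = blocking_factor(BLOCK_SIZE, l_r)
    print(name, f_r, blocks_needed(n_r, f_r))
# borrower 70 29
# checked_out 55 19
# book_author 40 250
```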
Table Statistics
Table         nr       lr          V(A, r)
borrower      2000     58 bytes    V(borrower_id, Borrower) = 2000
checked_out   1000     74 bytes    V(borrower_id, CheckedOut) = 100
                                   V(callNo, CheckedOut) = 500
book_author   10,000   100 bytes   V(callNo, BookAuthor) = 5000
Calculating the Size of a Cartesian Product
• Cartesian product: r × s
• Number of tuples in the product: n(r × s) = nr * ns
• Size of each tuple in the product: l(r × s) = lr + ls
Estimating the Size of a Join
• Natural join: r ⋈ s, where r and s have attribute A in common
• Estimated number of tuples in the join:
n(r ⋈ s) = nr * ns / max(V(A, r), V(A, s))
• Number of distinct values of A in the result: V(A, r ⋈ s) = min(V(A, r), V(A, s))
• The estimate divides by the max because some tuples in the relation with the larger number of distinct A values do not join with any tuple in the other relation
Example Join Estimation
• π name, author Borrower ⋈ BookAuthor ⋈ CheckedOut
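Applying the size formulas to the Table Statistics shows why join order matters for this expression. The Python sketch compares joining Borrower with BookAuthor first (no common attribute, so a Cartesian product) against joining Borrower with CheckedOut first. It assumes, as an estimate not stated on the slides, that the intermediate Borrower ⋈ CheckedOut keeps CheckedOut's V(callNo) = 500.

```python
def cartesian_size(n_r, n_s):
    """n(r × s) = n_r * n_s"""
    return n_r * n_s

def join_size(n_r, n_s, v_r, v_s):
    """n(r ⋈ s) ≈ n_r * n_s / max(V(A, r), V(A, s)) for the common attribute A."""
    return n_r * n_s / max(v_r, v_s)

# Statistics from the Table Statistics slide.
N_BORROWER, N_CHECKED_OUT, N_BOOK_AUTHOR = 2000, 1000, 10_000

# Order 1: (Borrower ⋈ BookAuthor) first; no common attribute -> Cartesian product.
bad_first_step = cartesian_size(N_BORROWER, N_BOOK_AUTHOR)

# Order 2: Borrower ⋈ CheckedOut on borrower_id, then ⋈ BookAuthor on callNo.
step1 = join_size(N_BORROWER, N_CHECKED_OUT, 2000, 100)
# Assumption: the intermediate result keeps CheckedOut's V(callNo) = 500.
step2 = join_size(step1, N_BOOK_AUTHOR, 500, 5000)

print(bad_first_step, step1, step2)  # 20000000 1000.0 2000.0
```

The first order reproduces the 20 million-tuple temporary table mentioned earlier, while the second never holds more than about 2,000 estimated tuples.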
Rules of Equivalence
• Reordering the joins improved performance without changing the results!
• More generally, two formulations of a query are "equivalent" if they produce the same set of results
• Tuples aren't necessarily in the same order
• SELECT title FROM book NATURAL JOIN book_author WHERE author = 'Bruce Schneier'
• "Equivalent" execution plans for this query:
• πtitle (σauthor = ‘Bruce Schneier’ (Book ⋈ BookAuthor))
• πtitle (Book ⋈ (σauthor = ‘Bruce Schneier’ BookAuthor))
• Commutativity:
• A binary operation ∗ is commutative if for all x, y: x ∗ y = y ∗ x
• Associativity:
• A binary operation ∗ is associative if for all x, y, z: (x ∗ y) ∗ z = x ∗ (y ∗ z)
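Natural join is both commutative and associative when results are compared as sets, which is what makes join reordering safe. The Python sketch below checks both properties on tiny made-up Borrower, CheckedOut, and BookAuthor relations; the list-of-dicts representation and the helper names are assumptions.

```python
def natural_join(r, s):
    """r ⋈ s: combine tuples that agree on every shared attribute name.
    With no shared attributes, all([]) is True, giving a Cartesian product."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

def as_set(relation):
    """Compare relations as sets of tuples, ignoring row and column order."""
    return {frozenset(t.items()) for t in relation}

# Hypothetical tiny relations sharing borrower_id and call_no.
borrower = [{"borrower_id": 1, "name": "Ana"}, {"borrower_id": 2, "name": "Bo"}]
checked_out = [{"borrower_id": 1, "call_no": "QA76"}, {"borrower_id": 2, "call_no": "Z103"}]
book_author = [{"call_no": "QA76", "author": "Author X"}, {"call_no": "Z103", "author": "Author Y"}]

commutes = as_set(natural_join(borrower, checked_out)) == \
    as_set(natural_join(checked_out, borrower))
associates = as_set(natural_join(natural_join(borrower, checked_out), book_author)) == \
    as_set(natural_join(borrower, natural_join(checked_out, book_author)))
print(commutes, associates)  # True True
```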
Push Selections Inward
• Do selections as early as possible
• Reduces (“flattens”) the number of records in the
relation(s) being joined
• Example:
• πtitle (σauthor = ‘Bruce Schneier’ (Book ⋈ BookAuthor))
• πtitle (Book ⋈ (σauthor = ‘Bruce Schneier’ BookAuthor))
• Sometimes this is not feasible, because the predicate references attributes of both relations:
• σBorrower.name = BookAuthor.author (Borrower × BookAuthor)
Push Projections Inward
• Example:
• πname, title, dateDue (Borrower ⋈ CheckedOut ⋈ Book)
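Pushing projections inward means each relation keeps only the output attributes plus the attributes needed by later joins. The Python sketch below evaluates the example both ways on made-up rows and checks that the results match while the intermediate join is narrower; the extra columns (phone, copy_no, publisher), the data, and the helper names are hypothetical.

```python
def natural_join(r, s):
    """Nested-loop natural join on all shared attribute names."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]

def project(relation, attrs):
    """π with set semantics; returns dicts so the result can be joined further."""
    rows = {tuple(t[a] for a in attrs) for t in relation}
    return [dict(zip(attrs, row)) for row in rows]

def as_set(relation):
    return {frozenset(t.items()) for t in relation}

# Hypothetical rows; phone, copy_no and publisher are attributes the query never uses.
borrower = [{"borrower_id": 1, "name": "Ana", "phone": "555-0101"},
            {"borrower_id": 2, "name": "Bo", "phone": "555-0102"}]
checked_out = [{"borrower_id": 1, "call_no": "QA76", "copy_no": 1, "date_due": "2024-05-01"},
               {"borrower_id": 2, "call_no": "Z103", "copy_no": 2, "date_due": "2024-05-03"}]
book = [{"call_no": "QA76", "title": "Title A", "publisher": "P1"},
        {"call_no": "Z103", "title": "Title B", "publisher": "P2"}]
wanted = ["name", "title", "date_due"]

# Project last: the joins carry every attribute along.
wide_join = natural_join(natural_join(borrower, checked_out), book)
late = project(wide_join, wanted)

# Project early: keep only output attributes plus the join attributes.
narrow_join = natural_join(
    natural_join(project(borrower, ["borrower_id", "name"]),
                 project(checked_out, ["borrower_id", "call_no", "date_due"])),
    project(book, ["call_no", "title"]))
early = project(narrow_join, wanted)

print(as_set(late) == as_set(early))           # True
print(len(wide_join[0]), len(narrow_join[0]))  # 8 5
```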
Optimization Algorithm
Statistically based (cost-based) optimizer: uses sophisticated algorithms based on statistics about the objects being accessed to determine the best approach to executing a query. The optimizer adds up the processing cost, the I/O cost, and the resource costs (RAM and temporary space) to determine the total cost of a given execution plan.
Rule-based optimizer: uses preset rules and points to determine the best approach to executing a query. The rules assign a "fixed cost" to each SQL operation; the costs are then added to yield the cost of the execution plan. For example, a full table scan has a set cost of 10, while a table access by row ID has a set cost of 3.
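The rule-based scheme can be sketched as a lookup table of fixed costs that is summed per plan, with the cheapest plan chosen. The two costs below are the slide's examples; the candidate plans and function names are hypothetical, and a cost-based optimizer would instead derive each operation's cost from statistics such as nr and br.

```python
# Fixed costs from the slide's example; other operations would need their own entries.
FIXED_COST = {
    "full_table_scan": 10,
    "table_access_by_rowid": 3,
}

def plan_cost(plan):
    """Rule-based costing: add up the fixed cost of every operation in the plan."""
    return sum(FIXED_COST[op] for op in plan)

def choose_plan(plans):
    """Pick the candidate plan with the lowest total fixed cost."""
    return min(plans, key=lambda name: plan_cost(plans[name]))

# Hypothetical candidate plans for the same query.
candidates = {
    "scan_both_tables": ["full_table_scan", "full_table_scan"],
    "scan_then_rowid": ["full_table_scan", "table_access_by_rowid"],
}
print({name: plan_cost(ops) for name, ops in candidates.items()})
# {'scan_both_tables': 20, 'scan_then_rowid': 13}
print(choose_plan(candidates))  # scan_then_rowid
```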
Example
Consider the query
Select balance
From account
Where balance <2500;
This query can be translated into either of the following relational algebra expressions:
σbalance<2500 (πbalance (account))
πbalance (σbalance<2500 (account))
The expression whose evaluation requires the least CPU time and the fewest disk accesses has the lowest cost and is the one executed.
SQL Query & Relational Algebra Expression
SQL Query
SELECT ENAME
FROM EMPLOYEE, WORKSON, PROJECT
WHERE PNAME = 'database' AND PNUM = PNO AND ENO = ENUM
AND BDATE > '1965';
Query Trees
• A tree data structure that corresponds to a relational algebra
expression.
• It represents the input relations of the query as leaf nodes of
the tree, and represents the relational algebra operations as
internal nodes.
• An execution of the query tree consists of executing an internal node
operation whenever its operands are available and then replacing
that internal node by the relation that results from executing the
operation.
• The order of execution of operations starts at the leaf nodes, which represent the input database relations for the query, and ends at the root node, which represents the final operation of the query. The execution terminates when the root node operation is executed and produces the result relation for the query.
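This execution model can be sketched in Python as a post-order traversal: a node's operation runs once its children (operands) have been evaluated, and its result then stands in for the node. The Node class, the list-of-dicts relations, and the sample Student rows are assumptions for illustration; the example tree encodes the query Select NAME From Student Where Major = 'ICS'.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """Query-tree node: leaves hold a base relation, internal nodes an operation."""
    op: str                                  # "relation", "select", "project" or "join"
    children: List["Node"] = field(default_factory=list)
    relation: Optional[list] = None          # set only on leaf nodes
    arg: object = None                       # predicate (select) or attribute list (project)

def execute(node):
    """Evaluate operands first (leaves upward); the node's result then replaces
    the node, and the root's result is the query result."""
    if node.op == "relation":
        return node.relation
    inputs = [execute(child) for child in node.children]
    if node.op == "select":
        return [t for t in inputs[0] if node.arg(t)]
    if node.op == "project":
        rows = {tuple(t[a] for a in node.arg) for t in inputs[0]}
        return [dict(zip(node.arg, row)) for row in rows]
    if node.op == "join":  # natural join on shared attribute names
        left, right = inputs
        return [{**t, **u} for t in left for u in right
                if all(t[a] == u[a] for a in t.keys() & u.keys())]
    raise ValueError(f"unknown operation {node.op!r}")

# Hypothetical Student rows; the tree is π name (σ major='ICS' (Student)).
student = [{"name": "Ayesha", "major": "ICS"}, {"name": "Bilal", "major": "EE"},
           {"name": "Chen", "major": "ICS"}]
tree = Node("project", arg=["name"], children=[
    Node("select", arg=lambda t: t["major"] == "ICS",
         children=[Node("relation", relation=student)])])
print(sorted(t["name"] for t in execute(tree)))  # ['Ayesha', 'Chen']
```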
Example 1
SELECT NAME FROM Student WHERE Major = 'ICS';
Example 2
SELECT DeptName FROM Student, Department WHERE Code = Major AND year = 4;
Relational Algebra Expression
Example 3 – Query Tree
Relational Algebra Expression: πp(R ⋈ R.P = S.P S)
Example 4 – Query Tree
πPnumber, Dnum, Lname, Address, Bdate(((σPlocation = ‘Stanford’(PROJECT)) ⋈ Dnum=Dnumber(DEPARTMENT)) ⋈ Mgr_ssn=Ssn(EMPLOYEE))
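Consider the Following Tables
Instructor (ID, Name, Dept_name, Salary)
Teaches (ID, Course_ID, Sec_ID, Semester, Year)
Course (Course_ID, Title, Dept_name, Credits)
Query 1: Find the names of all instructors in the Music department, along with the titles of the courses they teach
How to optimize it?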
Query Trees
∏name, title (σDept_name=“Music” (Instructor ⋈ (Teaches ⋈ Course)))
Query Tree (Optimized)
∏name, title ((σDept_name=“Music” (Instructor)) ⋈ (Teaches ⋈ Course))
Example
Query 2: Find the names of all instructors in the CSE department who
have taught a course in 2009, along with the titles of the courses that
they taught
How to optimize it?
Query Tree of Example 2
Initial tree: σDept_name=“CSE” (σyear=2009 (Instructor ⋈ Teaches))
Optimized tree (selections pushed down): (σDept_name=“CSE” (Instructor)) ⋈ (σyear=2009 (Teaches))
Query Tree Example
• For every project located in ‘ISB’, list the project number, the
controlling department number and the department
manager’s last name, address, and birth date.
SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MgrSSN = SSN
AND Plocation = 'ISB';
Query Tree: Example (P = Project, D = Department, E = Employee)
∏P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate (((σP.Plocation=‘ISB’ (P)) θP.Dnum=D.Dnumber D) θD.MgrSSN=E.SSN E)
Note: The symbol θ represents JOIN
Query Tree: Example (initial canonical form)
∏P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate (σP.Dnum=D.Dnumber AND D.MgrSSN=E.SSN AND P.Plocation=‘ISB’ ((P × D) × E))
Query Graph Example
• The same query as in the previous example (for every project located in 'ISB', list the project number, the controlling department number, and the department manager's last name, address, and birth date), represented as a query graph:
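Query Graph Example (cont.)
• Relation nodes: P, D, E; constant node: ‘ISB’
• Join edges: P.Dnum = D.Dnumber (between P and D) and D.MgrSSN = E.SSN (between D and E)
• Selection edge: P.Plocation = ‘ISB’ (between P and the constant ‘ISB’)
• Attributes retrieved: [P.Pnumber, P.Dnum] from P and [E.Lname, E.Address, E.Bdate] from E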
Relational Algebra Transformations
• The following list gives a basic selection of relational algebra transformations:
• Cascade of selection: a conjunctive selection operation can be cascaded into individual selection operations
σp AND q AND r (R) = σp (σq (σr (R)))
• Cascade of projection: in a sequence of projection operations, only the last in the sequence is required (provided L ⊆ M ⊆ … ⊆ N)
πL (πM (… (πN (R)) …)) = πL (R)
• Commutativity of selection: a sequence of selection operations is commutative
σq (σp (R)) = σp (σq (R))
• Associativity of natural join and Cartesian product: natural join (*) and Cartesian product (X) are associative:
(R * S) * T = R * (S * T)
(R X S) X T = R X (S X T)
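These transformation rules can be checked mechanically on a small relation. The Python sketch below tests cascade of selection, commutativity of selection, and cascade of projection on a made-up relation R(a, b, c) containing every combination of the values 0 to 2; the predicates p, q, r and the helper names are assumptions chosen for illustration.

```python
from itertools import product

def select(relation, pred):
    """σ: keep the tuples satisfying pred."""
    return [t for t in relation if pred(t)]

def project(relation, attrs):
    """π with set semantics; returns dicts so projections can be nested."""
    rows = {tuple(t[a] for a in attrs) for t in relation}
    return [dict(zip(attrs, row)) for row in rows]

def as_set(relation):
    """Order-insensitive comparison of relations."""
    return {frozenset(t.items()) for t in relation}

# Hypothetical relation R(a, b, c): all 27 combinations of values 0..2.
R = [{"a": a, "b": b, "c": c} for a, b, c in product(range(3), repeat=3)]
p = lambda t: t["a"] > 0
q = lambda t: t["b"] < 2
r = lambda t: t["c"] == 1

# σ p AND q AND r (R) = σp(σq(σr(R)))
cascade_sel = as_set(select(R, lambda t: p(t) and q(t) and r(t))) == \
    as_set(select(select(select(R, r), q), p))
# σq(σp(R)) = σp(σq(R))
commute_sel = as_set(select(select(R, p), q)) == as_set(select(select(R, q), p))
# πL(πM(πN(R))) = πL(R) with L = {a} ⊆ M = {a, b} ⊆ N = {a, b, c}
cascade_proj = as_set(project(project(project(R, ["a", "b", "c"]), ["a", "b"]), ["a"])) == \
    as_set(project(R, ["a"]))
print(cascade_sel, commute_sel, cascade_proj)  # True True True
```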
Heuristic Optimization: Further Details
• Perform selection operations as early as possible
• Keep predicates on the same relation together
• Combine a Cartesian product with a subsequent selection whose predicate represents a join condition into a join operation
• Use associativity of binary operations to rearrange leaf nodes so that the most restrictive selection operations are performed first
• Perform projection as early as possible
• Keep projection attributes on the same relations together
• Compute common expressions once
• If a common expression appears more than once and its result is not too large, store the result and reuse it when required
• This is useful when querying views, as the same expression is used to construct the view each time
Given an SQL Query, Let Us Translate It into Its Relational Algebra Expression
Schema: R(A,B), S(B,C), T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
AND S.C = T.C
AND R.A < 10;
Relational algebra: ΠA,D (σA<10 (T ⋈ (R ⋈ S)))
Optimizing RA Plan
• Push down the selection on A so that it occurs earlier:
ΠA,D (σA<10 (T ⋈ (R ⋈ S)))
Optimizing RA Plan (cont.)
ΠA,D (T ⋈ (σA<10 (R) ⋈ S))
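The pushed-down plan returns the same result as the original while joining fewer tuples. The Python sketch below evaluates both expressions over small, deterministic, made-up instances of R(A,B), S(B,C), and T(C,D); the data and helper names are assumptions, and the join is a simple nested-loop natural join.

```python
def select(relation, pred):
    return [t for t in relation if pred(t)]

def project(relation, attrs):
    # Set of tuples: duplicates removed, row order ignored.
    return {tuple(t[a] for a in attrs) for t in relation}

def join(r, s):
    """Natural join on all shared attribute names (nested loops)."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]

# Hypothetical instances of R(A,B), S(B,C), T(C,D).
R = [{"A": i, "B": i % 5} for i in range(20)]
S = [{"B": j % 5, "C": j % 3} for j in range(15)]
T = [{"C": k % 3, "D": k} for k in range(6)]
a_lt_10 = lambda t: t["A"] < 10

# Original plan: ΠA,D(σA<10(T ⋈ (R ⋈ S)))
rs = join(R, S)
original = project(select(join(T, rs), a_lt_10), ["A", "D"])

# Optimized plan: ΠA,D(T ⋈ (σA<10(R) ⋈ S))
rs_pushed = join(select(R, a_lt_10), S)
optimized = project(join(T, rs_pushed), ["A", "D"])

print(original == optimized)    # True
print(len(rs), len(rs_pushed))  # 60 30
```

Each R tuple matches three S tuples here, so filtering R first halves the intermediate R ⋈ S result.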
Query Optimization: Example
• Query: Find the last names of employees born after 1975 who work on a project named 'Aquarius'
SELECT Lname
FROM Employee, Works_on, Project
WHERE Pname = 'Aquarius'
AND ESSN = SSN
AND Pnumber = PNO
AND Bdate > '1975-12-31';
Query Optimization: Example (cont.)
Initial canonical query tree:
∏Lname (σPname=‘Aquarius’ AND Pnumber=PNO AND ESSN=SSN AND Bdate>‘1975-12-31’ ((Employee × Works_On) × Project))
Query Optimization: Example (cont.)
Move SELECT operations down the tree:
∏Lname (σPnumber=PNO ((σESSN=SSN ((σBdate>‘1975-12-31’ (Employee)) × Works_On)) × (σPname=‘Aquarius’ (Project))))
Query Optimization: Example (cont.)
Apply the more restrictive SELECT first:
∏Lname (σESSN=SSN ((σPnumber=PNO ((σPname=‘Aquarius’ (Project)) × Works_On)) × (σBdate>‘1975-12-31’ (Employee))))
Query Optimization: Example (cont.)
Replace CARTESIAN PRODUCT and SELECT with JOIN:
∏Lname (((σPname=‘Aquarius’ (Project)) ⋈Pnumber=PNO Works_On) ⋈ESSN=SSN (σBdate>‘1975-12-31’ (Employee)))
Query Optimization: Example (cont.)
Move PROJECTION down:
∏Lname ((∏ESSN ((∏Pnumber (σPname=‘Aquarius’ (Project))) ⋈Pnumber=PNO (∏ESSN,PNO (Works_On)))) ⋈ESSN=SSN (∏SSN,Lname (σBdate>‘1975-12-31’ (Employee))))