CSCE 608 – 600 Database Systems
description
Transcript of CSCE 608 – 600 Database Systems
CSCE 608 – 600 Database Systems
Chapter 15: Query Execution
1
2
Index-Based Algorithms
The existence of an index is especially helpful for selection, and helps others
Clustered relation: tuples are packed into the minimum number of blocks
Clustering index: all tuples with the same value for the index's search key are packed into the minimum number of blocks
3
Index-Based Selection
Without an index, selection takes B(R), or even T(R), disk I/O's.
To select all tuples with attribute a equal to value v, when there is an index on a: search the index for value v and get pointers to
exactly the blocks containing the desired tuples
If index is clustering, then number of disk I/O's is about B(R)/V(R,a)
4
Examples
Suppose B(R) = 1000, T(R) = 20,000, there is an index on a and we want to select all tuples with a = 0. If R is clustered and don't use index: 1000 disk I/O's If R is not clustered and don't use index: 20,000 disk
I/O's If V(R,a) = 100, index is clustering, and use index:
1000/100 = 10 disk I/O's (on average) If V(R,a) = 10, R is not clustered, index is non-clustering,
and use index: 20,000/10 = 2000 disk I/O's (on average) If V(R,a) = 20,000 (a is a key) and use index: 1 disk I/O
5
Using Indexes in Other Operations
1. If the index is a B-tree, can efficiently select tuples with indexed attribute in a range
2. If selection is on a complex condition such as "a = v AND …", first do the index-based algorithm to get tuples satisfying "a = v".
Such splitting is part of the job of the query optimizer
6
Index-Based Join Algorithm
Consider natural join of R(X,Y) and S(Y,Z). Suppose S has an index on Y.
for each block of Rfor each tuple t in the current block
use index on S to find tuples of S that match t in the attribute(s) Y
output the join of these tuples
7
Analysis of Index-Based Join
To get all the blocks of R, either B(R) or T(R) disk I/O's are needed
For each tuple of R, there are on average T(S)/V(S,Y) matching tuples of S T(R)*T(S)/V(S,Y) disk I/O's if index is not clustering T(R)*B(S)/V(S,Y) disk I/O's if index is clustering
This method is efficient if R is much smaller than S and V(S,Y) is large (i.e., not many tuples of S match)
8
Join Using a Sorted Index
Suppose we want to join R(X,Y) and S(Y,Z).
Suppose we have a sorted index (e.g., B-tree) on Y for R and S:do sort-join butno need to sort the indexed relations first
9
Buffer Management
The availability of blocks (buffers) of main memory is controlled by buffer manager.
When a new buffer is needed, a replacement policy is used to decide which existing buffer should be returned to disk.
If the number of buffers available for an operation cannot be predicted in advance, then the algorithm chosen must degrade gracefully as the number of buffers shrinks.
If the number of buffers available is not large enough for a two-pass algorithm, then there are generalizations to algorithms that use three or more passes.
CSCE 608 - 600 Database Systems
Chapter 16: Query Compiler
10
Query Compiler
Parsing
Logical Query Plan
11
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
{P1,P2,…..}
{(P1,C1),(P2,C2)...}
Pi
answer
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
13
Outline
Convert SQL query to a parse tree Semantic checking: attributes, relation names, types
Convert to a logical query plan (relational algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations
Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
14
Parsing
Goal is to convert a text string containing a query into a parse tree data structure: leaves form the text string (broken into lexical
elements) internal nodes are syntactic categories
Uses standard algorithmic techniques from compilers given a grammar for the language (e.g., SQL),
process the string and build the tree
15
Example: SQL query
SELECT title
FROM StarsIn
WHERE starName IN (
SELECT name
FROM MovieStar
WHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
Assume we have a simplified grammar for SQL.
16
Example: Parse Tree
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Tuple> IN <Query>
title StarsIn <Attribute> ( <Query> )
starName <SFW>
<Attribute> <RelName> <Attribute> LIKE <Pattern>
name MovieStar birthDate ‘%1960’
SELECT <SelList> FROM <FromList> WHERE <Condition>
17
The Preprocessor
replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query)
does semantic checking: are relations and views mentioned in the schema? are attributes mentioned in the current scope? are attribute types correct?
18
Outline
Convert SQL query to a parse tree Semantic checking: attributes, relation names, types
Convert to a logical query plan (relational algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations
Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
19
Convert Parse Tree to Relational Algebra
Complete algorithm depends on specific grammar, which determines forms of the parse trees
Here give a flavor of the approach
20
Conversion
Suppose there are no subqueries.
SELECT att-list FROM rel-list WHERE cond
is converted into
PROJatt-list(SELECTcond(PRODUCT(rel-list))), or
att-list(cond( X (rel-list)))
SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960';
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> , <FromList> AND <Condition>
movieTitle StarsIn <RelName> <Attribute> LIKE <Pattern>
MovieStar birthdate '%1960'
<Condition>
<Attribute> = <Attribute>
starName name
22
Equivalent Algebraic Expression Tree
movieTitle
starname = name AND birthdate LIKE '%1960'
X
StarsIn MovieStar
23
Handling Subqueries
Recall the (equivalent) query:SELECT titleFROM StarsInWHERE starName IN (
SELECT nameFROM MovieStarWHERE birthdate LIKE ‘%1960’
);
Use an intermediate format called two-argument selection
24
title
StarsIn <condition>
<tuple> IN name
<attribute> birthdate LIKE ‘%1960’
starName MovieStar
Example: Two-Argument Selection
25
Converting Two-Argument Selection
To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression
Different rules depending on the nature of the subquery
Here show example for IN operator and uncorrelated query (subquery computes a relation independent of the tuple being tested)
26
Rules for IN
R <Condition>
t IN S R S
X
C
C is the condition that equatesattributes in t with correspondingattributes in S
27
Example: Logical Query Plan
title
starName=name
StarsIn
birthdate LIKE ‘%1960’
MovieStar
name
28
What if Subquery is Correlated?
Example is when subquery refers to the current tuple of the outer scope that is being tested
More complicated to deal with, since subquery cannot be translated in isolation
Need to incorporate external attributes in the translation
Some details are in textbook
29
Outline
Convert SQL query to a parse tree Semantic checking: attributes, relation names, types
Convert to a logical query plan (relational algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations
Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
30
Improving the Logical Query Plan
There are numerous algebraic laws concerning relational algebra operations
By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently
Next we'll survey some of these laws
31
Associative and Commutative Operations
product natural join set and bag union set and bag intersection
associative: (A op B) op C = A op (B op C) commutative: A op B = B op A
32
Laws Involving Selection
Selections usually reduce the size of the relation
Usually good to do selections early, i.e., "push them down the tree"
Also can be helpful to break up a complex selection into parts
33
Selection Splitting
C1 AND C2 (R) = C1 ( C2 (R))
C1 OR C2 (R) = ( C1 (R)) Uset ( C2 (R))if R is a set
C1 ( C2 (R)) = C2 ( C1 (R))
34
Selection and Binary Operators
Must push selection to both arguments: C (R U S) = C (R) U C (S)
Must push to first arg, optional for 2nd: C (R - S) = C (R) - S C (R - S) = C (R) - C (S)
Push to at least one arg with all attributes mentioned in C: product, natural join, theta join, intersection e.g., C (R X S) = C (R) X S, if R has all the atts
in C
35
Pushing Selection Up the Tree
Suppose we have relations StarsIn(title,year,starName) Movie(title,year,len,inColor,studioName)
and a view CREATE VIEW MoviesOf1996 AS
SELECT *FROM MovieWHERE year = 1996;
and the query SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
36
The Straightforward Tree
starName,studioName
year=1996 StarsIn
Movie Remember the ruleC(R S) = C(R) S ?
37
The Improved Logical Query Plan
starName,studioName
year=1996 StarsIn
Movie
starName,studioName
year=1996
Movie StarsIn
starName,studioName
year=1996 year=1996
Movie StarsIn
push selectionup tree
push selectiondown tree
38
Laws Involving Projections
Consider adding in additional projections Adding a projection lower in the tree can improve
performance, since often tuple size is reduced Usually not as helpful as pushing selections down
If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree Ex: L(R X S) = L(M(R) X N(S)), where M (resp. N) is
all attributes of R (resp. S) that are used in L Another example:
L(R Ubag S) = L(R) Ubag L(S) But watch out for set union!
39
Push Projection Below Selection?
Rule: L(C(R)) = L(C(M(R)))
where M is all attributes used by L or C But is it a good idea?
SELECT starName FROM StarsIn WHERE movieYear = 1996;
starName
movieYear=1996
StarsIn
starName,movieYear
starName
movieYear=1996
StarsIn
40
Joins and Products
Recall from the definitions of relational algebra:R C S = C(R X S) (theta join)R S = L(C(R X S)) (natural join)
where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates
To improve a logical query plan, replace a product followed by a selection with a joinJoin algorithms are usually faster than doing
product followed by selection
41
Duplicate Elimination
Moving down the tree is potentially beneficial as it can reduce the size of intermediate relations
Can be eliminated if argument has no duplicates a relation with a primary key a relation resulting from a grouping operator
Legal to push through product, join, selection, and bag intersection Ex: (R X S) = (R) X (S)
Cannot push through bag union, bag difference or projection