Database Management Systems M Query optimization...
DBMG
Database Management Systems
Query optimization
DBMS Architecture
[Figure: DBMS architecture. Components shown: SQL instruction, optimizer, management of access methods, buffer manager, concurrency control, reliability management, and the database with its system catalog, index files, and data files.]
Query optimizer
It selects an efficient strategy for query execution
It is a fundamental building block of a relational DBMS
It guarantees the data independence property
The form in which the SQL query is written does not affect the way in which it is implemented
A physical reorganization of data does not require rewriting SQL queries
Query optimizer
It automatically generates a query execution plan
It was formerly hard-coded by a programmer
The automatically generated execution plan is usually more efficient
It evaluates many different alternatives
It exploits statistics on data, stored in the system catalog, to make decisions
It exploits the best known strategies
It dynamically adapts to changes in the data distribution
Query optimizer
[Figure: query optimizer pipeline, first step. The SQL query is the input of lexical, syntactic and semantic analysis.]
Lexical, syntactic and semantic analysis
Analysis of a statement to detect
Lexical errors
e.g., misspelled keywords
Syntactic errors
errors in the grammar of the SQL language
Semantic errors
references to objects which do not actually exist in the database (e.g., attributes or tables)
information in the data dictionary is needed
Lexical, syntactic and semantic analysis
Output
Internal representation in (extended) relational algebra
Why relational algebra?
It explicitly represents the order in which operators are applied
It is procedural (different from SQL)
There is a corpus of theorems and properties
exploited to modify the initial query tree
Query optimizer
[Figure: query optimizer pipeline. SQL query → lexical, syntactic and semantic analysis → internal representation based on relational algebra → algebraic optimization. The data dictionary is an additional input.]
Algebraic optimization
Execution of algebraic transformations considered to be always beneficial
Example: anticipation of selection with respect to join
Should eliminate the difference among different formulations of the same query
This step is usually independent of the data distribution
Output
Query tree in “canonical” form
Query optimizer
[Figure: query optimizer pipeline. SQL query → lexical, syntactic and semantic analysis → internal representation based on relational algebra → algebraic optimization → “canonical” query tree → cost based optimization. The data dictionary is an additional input.]
Cost based optimization
Selection of the “best” execution plan by evaluating execution cost
Selection of
the best access method for each table
the best algorithm for each relational operator among available alternatives
Based on a cost model for access methods and algorithms
Generation of the code implementing the best strategy
Cost based optimization
Output
Access program in executable format
It exploits the internal structures of the DBMS
Set of dependencies
conditions on which the validity of the query plan depends
e.g., the existence of an index
Query optimizer
[Figure: complete query optimizer pipeline. SQL query → lexical, syntactic and semantic analysis → internal representation based on relational algebra → algebraic optimization → “canonical” query tree → cost based optimization → access program and set of dependencies. The data dictionary and the data profiles (statistics on data) are additional inputs.]
Execution modes
Compile and go
Compilation and immediate execution of the statement
No storage of the query plan
Dependencies are not needed
Execution modes
Compile and store
The access plan is stored in the database together with its dependencies
It is executed on demand
It should be recompiled when the data structure changes
Database Management Systems
Algebraic optimization
Algebraic optimization
It is based on equivalence transformations
Two relational expressions are equivalent if they both produce the same query result for any arbitrary database instance
Interesting transformations
reduce the size of the intermediate result to be stored in memory
prepare an expression for the application of a transformation which reduces the size of the intermediate result
Transformations
1. Atomization of selection
$\sigma_{F_1 \wedge F_2}(E) \equiv \sigma_{F_2}(\sigma_{F_1}(E)) \equiv \sigma_{F_1}(\sigma_{F_2}(E))$
2. Cascading projections
$\pi_X(E) \equiv \pi_X(\pi_{X,Y}(E))$
3. Anticipation of selection with respect to join (pushing selection down)
$\sigma_F(E_1 \bowtie E_2) \equiv E_1 \bowtie (\sigma_F(E_2))$
F is a predicate on attributes in E2 only
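As a minimal sketch of transformation 3, the following Python snippet checks on a toy instance that applying the selection after the join and pushing it down onto E2 produce the same result. The relations, the attribute names `k`, `x`, `y`, and the helpers `select` and `join` are hypothetical and introduced only for illustration.

```python
def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]


def join(r1, r2, attr):
    """Equi-join of r1 and r2 on a shared attribute (brute-force pairing)."""
    return [{**a, **b} for a in r1 for b in r2 if a[attr] == b[attr]]


# Hypothetical toy instance: E1(k, x) and E2(k, y)
e1 = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}]
e2 = [{"k": 1, "y": 10}, {"k": 1, "y": 99}, {"k": 2, "y": 5}]


def f(t):
    """Predicate F: it only involves attribute y, which belongs to E2."""
    return t["y"] > 8


late = select(f, join(e1, e2, "k"))    # selection applied after the join
early = join(e1, select(f, e2), "k")   # selection pushed below the join

print(late == early)
```

In the pushed-down form the join receives two E2 tuples instead of three, which is the reduction of the intermediate result that makes the transformation attractive.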
Transformations
4. Anticipation of projection with respect to join
$\pi_L(E_1 \bowtie_p E_2) \equiv \pi_L((\pi_{L_1, J}(E_1)) \bowtie_p (\pi_{L_2, J}(E_2)))$
L1 = L - Schema(E2)
L2 = L - Schema(E1)
J = set of attributes needed to evaluate join predicate p
Transformations
5. Join derivation from Cartesian product
$\sigma_F(E_1 \times E_2) \equiv E_1 \bowtie_F E_2$
predicate F only relates attributes in E1 and E2
6. Distribution of selection with respect to union
$\sigma_F(E_1 \cup E_2) \equiv (\sigma_F(E_1)) \cup (\sigma_F(E_2))$
7. Distribution of selection with respect to difference
$\sigma_F(E_1 - E_2) \equiv (\sigma_F(E_1)) - (\sigma_F(E_2)) \equiv (\sigma_F(E_1)) - E_2$
Transformations
8. Distribution of projection with respect to union
$\pi_X(E_1 \cup E_2) \equiv (\pi_X(E_1)) \cup (\pi_X(E_2))$
Can projection be distributed with respect to difference?
$\pi_X(E_1 - E_2) \equiv (\pi_X(E_1)) - (\pi_X(E_2))$
This equivalence only holds if X includes the primary key or a set of attributes with the same properties (unique and not null)
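A small counterexample may help to see why the key is needed. The sketch below uses a hypothetical relation schema (Id, Name), where Id is the key and Name is not unique; the helpers `project` and `difference` are illustrative names, not part of any DBMS.

```python
def project(rel, attrs):
    """Projection with duplicate elimination (set semantics)."""
    return {tuple(t[a] for a in attrs) for t in rel}


def difference(r1, r2):
    """Set difference of two relations stored as lists of dicts."""
    return [t for t in r1 if t not in r2]


# Hypothetical instance: Id is the key, Name is not unique
e1 = [{"Id": 1, "Name": "Ann"}, {"Id": 2, "Name": "Ann"}]
e2 = [{"Id": 2, "Name": "Ann"}]

# X = {Name} does not include the key: the two sides differ
lhs_name = project(difference(e1, e2), ["Name"])
rhs_name = project(e1, ["Name"]) - project(e2, ["Name"])

# X = {Id} includes the key: the two sides coincide
lhs_id = project(difference(e1, e2), ["Id"])
rhs_id = project(e1, ["Id"]) - project(e2, ["Id"])

print(lhs_name, rhs_name, lhs_id, rhs_id)
```

With X = {Name}, the left-hand side still contains "Ann" (from the tuple with Id 1), while the right-hand side is empty because projecting first makes the two distinct tuples indistinguishable.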
Transformations
9. Other properties
$\sigma_{F_1 \vee F_2}(E) \equiv (\sigma_{F_1}(E)) \cup (\sigma_{F_2}(E))$
$\sigma_{F_1 \wedge F_2}(E) \equiv (\sigma_{F_1}(E)) \cap (\sigma_{F_2}(E))$
Transformations
10. Distribution of join with respect to union
$E \bowtie (E_1 \cup E_2) \equiv (E \bowtie E_1) \cup (E \bowtie E_2)$
All binary operators are commutative and associative except for difference
Example
Tables
EMP (Emp#, ………, Dept#, Salary)
DEPT (Dept#, DName,……………)
SQL query
SELECT DISTINCT DName
FROM EMP, DEPT
WHERE EMP.Dept#=DEPT.Dept#
AND Salary > 1000;
Example: Algebraic transformations
$\pi_{DName}(\sigma_{EMP.Dept\# = DEPT.Dept\# \wedge Salary > 1000}(EMP \times DEPT))$
Prop #1
$\pi_{DName}(\sigma_{Salary > 1000}(\sigma_{EMP.Dept\# = DEPT.Dept\#}(EMP \times DEPT)))$
Prop #5
$\pi_{DName}(\sigma_{Salary > 1000}(EMP \bowtie DEPT))$
Prop #3
$\pi_{DName}((\sigma_{Salary > 1000}(EMP)) \bowtie DEPT)$
Prop #2 and #4
$\pi_{DName}((\pi_{Dept\#}(\sigma_{Salary > 1000}(EMP))) \bowtie (\pi_{Dept\#, DName}(DEPT)))$
Example: Query tree
[Figure: final query tree. The root is $\pi_{DName}$, applied to the join of two branches: $\pi_{Dept\#}(\sigma_{Salary > 1000}(EMP))$ and $\pi_{Dept\#, DName}(DEPT)$.]
Example: Cardinalities
Cardinality (EMP) ≈ 10,000
Cardinality (DEPT) ≈ 100
Cardinality (EMP where Salary > 1000) ≈ 50
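To make the benefit of the transformations concrete, the following sketch compares the number of tuple pairs that the initial expression and the optimized expression hand to the next operator, using the cardinalities above. Counting pairs is only a rough proxy chosen for this illustration; it is an assumption of the sketch, not the cost model of an actual optimizer.

```python
# Cardinalities taken from the example
card_emp = 10_000
card_dept = 100
card_emp_selected = 50   # EMP tuples with Salary > 1000

# Initial expression: the selection is applied on top of EMP x DEPT,
# so the Cartesian product is the intermediate result
cartesian_size = card_emp * card_dept

# Optimized expression: the selection on Salary is applied first,
# so at most this many pairs have to be compared by the join
pairs_after_pushdown = card_emp_selected * card_dept

reduction_factor = cartesian_size // pairs_after_pushdown

print(cartesian_size, pairs_after_pushdown, reduction_factor)
```

Under these figures the intermediate work shrinks from 1,000,000 pairs to 5,000 pairs, a factor of 200.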
Database Management Systems
Cost based optimization
Cost based optimization
It is based on
Data profiles
statistical information describing data distribution for tables and intermediate relational expressions
Approximate cost formulas for access operations
Allow evaluating the cost of different alternatives for executing a relational operator
Database Management Systems
Data profiles
Table profiles
Quantitative information on the characteristics of tables and columns
cardinality (# of tuples) in each table T
also estimated for intermediate relational expressions
size in bytes of tuples in T
size in bytes of each attribute Aj in T
number of distinct values of each attribute in T
cardinality of the active domain of the attribute
min and max values of each attribute Aj in T
Table profiles
Table profiles are stored in the data dictionary
Profiles should be periodically refreshed by re-analyzing data in the tables
Update statistics command
Executed on demand
immediate execution during transaction processing would overload the system
Data profiles
Table profiles are exploited to estimate the size of intermediate relational expressions
For the selection operator
$\mathrm{Card}(\sigma_{A_i = v}(T)) \approx \mathrm{Card}(T) / \mathrm{Val}(A_i \text{ in } T)$
Val (Ai in T) = # of distinct values of Ai in T (active domain)
It holds only under the hypothesis of uniform distribution
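The estimate can be sketched in a few lines of Python. The function name `estimate_equality_card` and the sample column are hypothetical; the second part illustrates how far the estimate can be from the real counts when the uniform distribution hypothesis does not hold.

```python
from collections import Counter


def estimate_equality_card(card_t, distinct_values):
    """Estimate Card(sigma_{Ai = v}(T)) as Card(T) / Val(Ai in T)."""
    return card_t / distinct_values


# Hypothetical profile: 10,000 tuples, 100 distinct values of Ai
uniform_estimate = estimate_equality_card(10_000, 100)

# Hypothetical skewed column: value "a" dominates
column = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
skewed_estimate = estimate_equality_card(len(column), len(set(column)))
actual_counts = Counter(column)

print(uniform_estimate, skewed_estimate, actual_counts["a"], actual_counts["b"])
```

For the skewed column the formula predicts about 33 tuples for every value, while the true counts are 90 for "a" and 5 for "b" and "c".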
Database Management Systems
Access operators
Query tree
Internal representation of the relational expression as a query tree
[Figure: query tree. The root is $\pi_{DName}$, applied to the join of two branches: $\pi_{Dept\#}(\sigma_{Salary > 1000}(EMP))$ and $\pi_{Dept\#, DName}(DEPT)$.]
Query tree
Leaves correspond to the physical structures
tables, indices
Intermediate nodes are operations on data supported by the given physical structure
e.g., scan, join, group by
Sequential scan
Executes sequential access to all tuples in a table
also called full table scan
Operations performed during a sequential scan
Projection
discards unnecessary columns
Selection on a simple predicate (Ai=v)
Sorting based on an attribute list
Insert, update, delete
Sorting
Classical algorithms in computer science are exploited
e.g., quick sort
Size of data is relevant
memory sort
sort on disk
Predicate evaluation
If available, it may exploit index access
B+-tree, hash, or bitmap
Simple equality predicate Ai=v
Hash, B+-tree, or bitmap are appropriate
Range predicate v1 ≤ Ai ≤ v2
only B+-tree is appropriate
For predicates with limited selectivity (predicates that discard few tuples), a full table scan is better
if available, consider bitmap
B+-tree versus bitmap
[Figure: Bitmap vs. B-tree. Disk space (MB, from 0 to 600) plotted against NK (from 0 to 45), with one curve for the B-tree and one for the bitmap.]
Courtesy of Golfarelli, Rizzi, "Data warehouse, teoria e pratica della progettazione", McGraw Hill 2006
B-tree: NR × Len(Pointer)
Bitmap: NR × NK × 1 bit
Len(Pointer) = 48 bit
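The two space formulas can be compared numerically. In the sketch below, NR is read as the number of rows and NK as the number of distinct key values; this reading, the function names, and the value chosen for `nr` are assumptions made for the illustration. The formulas ignore any other storage overhead.

```python
POINTER_BITS = 48  # Len(Pointer) as given above


def btree_bits(nr):
    """B-tree space: NR x Len(Pointer)."""
    return nr * POINTER_BITS


def bitmap_bits(nr, nk):
    """Bitmap space: NR x NK x 1 bit."""
    return nr * nk * 1


def to_mb(bits):
    """Convert bits to megabytes (10^6 bytes)."""
    return bits / 8 / 1_000_000


nr = 10_000_000  # hypothetical number of rows
sizes = {
    nk: (to_mb(btree_bits(nr)), to_mb(bitmap_bits(nr, nk)))
    for nk in (5, 20, 48, 60)
}

for nk, (btree_mb, bitmap_mb) in sizes.items():
    print(nk, btree_mb, bitmap_mb)
```

With 48-bit pointers the two formulas give the same size at NK = 48; below that value the bitmap is smaller, above it the B-tree is smaller.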
Predicate evaluation
Conjunction of predicates Ai = v1 ∧ Aj = v2
The most selective predicate is evaluated first
Table is read through the index
Next the other predicates are evaluated on the intermediate result
Optimization
First compute the intersection of bitmaps or RIDs coming from available indices
Next, the table is read and the remaining predicates are evaluated
Example: Predicate evaluation
Which female students living in Piemonte are exempt from enrollment fee?
RID Gender Exempt Region
1 M Y Piemonte
2 F Y Liguria
3 M N Puglia
4 M N Sicilia
5 F Y Piemonte
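Bitmap for Gender = F (RIDs 1 to 5): 0 1 0 0 1
Bitmap for Exempt = Y (RIDs 1 to 5): 1 1 0 0 1
Bitmap for Region = Piemonte (RIDs 1 to 5): 1 0 0 0 1
Intersection of the three bitmaps: 0 0 0 0 1
Result: RID 5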
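The intersection in this example can be reproduced with a bitwise AND. This is a minimal sketch that stores each bitmap as a Python list with one bit per RID; the variable names are illustrative.

```python
# Bitmaps from the example, one bit per RID (RIDs 1..5)
gender_f = [0, 1, 0, 0, 1]
exempt_y = [1, 1, 0, 0, 1]
region_piemonte = [1, 0, 0, 0, 1]

# Intersection: a RID qualifies only if its bit is set in every bitmap
result = [g & e & r for g, e, r in zip(gender_f, exempt_y, region_piemonte)]

# RIDs are 1-based, list positions are 0-based
matching_rids = [pos + 1 for pos, bit in enumerate(result) if bit]

print(result, matching_rids)
```

Only the tuples whose RIDs survive the intersection need to be read from the table.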
Predicate evaluation
Disjunction of predicates Ai = v1 ∨ Aj = v2
Index access can be exploited only if all predicates are supported by an index
otherwise full table scan
Join operation
A critical operation for a relational DBMS
connection between tables is based on values
instead of pointers
size of the intermediate result is typically larger than the smaller table
Different join algorithms
Nested loop
Merge scan join
Hash join
Bitmapped join
[Figure: Nested loop. An external scan reads the outer table; for each outer tuple with join attribute value a, an internal or direct scan looks for tuples with the same value of join attribute A in the inner table.]
Nested loop
A single full scan is done on the outer table
For each tuple in the outer table
a full scan of the inner table is performed, looking for corresponding values
Also called “brute force”
Nested loop
Efficient when
inner table is small and fits in memory
optimized scan
join attribute in the inner table is indexed
index scan
Execution cost
The nested loop join technique is not symmetric
The execution cost depends on which table takes the role of inner table
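A minimal sketch of the two nested loop variants described above follows. Tables are modeled as lists of dicts, and a Python dict stands in for an index on the join attribute of the inner table; both choices, the function names, and the sample data are assumptions made for illustration.

```python
def nested_loop_join(outer, inner, attr):
    """Brute-force nested loop: one full inner scan per outer tuple."""
    result = []
    for o in outer:              # single full scan of the outer table
        for i in inner:          # full scan of the inner table
            if o[attr] == i[attr]:
                result.append({**o, **i})
    return result


def index_nested_loop_join(outer, inner, attr):
    """Nested loop where the inner table is accessed through an index."""
    index = {}                   # stand-in for an index on inner[attr]
    for i in inner:
        index.setdefault(i[attr], []).append(i)
    result = []
    for o in outer:              # single full scan of the outer table
        for i in index.get(o[attr], []):   # direct access, no inner scan
            result.append({**o, **i})
    return result


# Hypothetical sample data
emp = [
    {"Emp#": 1, "Dept#": 10},
    {"Emp#": 2, "Dept#": 20},
    {"Emp#": 3, "Dept#": 10},
]
dept = [{"Dept#": 10, "DName": "R&D"}, {"Dept#": 20, "DName": "Sales"}]

brute = nested_loop_join(emp, dept, "Dept#")
indexed = index_nested_loop_join(emp, dept, "Dept#")

print(brute == indexed, len(brute))
```

The brute-force version compares every outer tuple with every inner tuple, while the indexed version touches only the matching inner tuples, which is why the choice of the inner table matters for the cost.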
[Figure: Merge scan. The left table and the right table, both ordered on join attribute A, are read by a left scan and a right scan that advance in parallel.]
Merge scan
Both tables are sorted on the join attributes
The two tables are scanned in parallel
tuple pairs are generated on corresponding values
Execution cost
The merge scan technique is symmetric
requires sorting both tables
may be sorted by a previous operation
may be read through a clustered index on join attributes
More used in the past
efficient for large tables, because sorted tables may be stored on disk
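The parallel scan of two sorted tables can be sketched as follows. The sketch sorts both inputs in memory with `sorted`, which stands in for tables that are already ordered or sorted on disk; the function name and the sample data are hypothetical.

```python
def merge_scan_join(left, right, attr):
    """Sort both inputs on attr, then scan them in parallel."""
    left = sorted(left, key=lambda t: t[attr])
    right = sorted(right, key=lambda t: t[attr])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][attr], right[j][attr]
        if lv < rv:
            i += 1                       # advance the left scan
        elif lv > rv:
            j += 1                       # advance the right scan
        else:
            # Locate the run of right tuples sharing this value
            j_end = j
            while j_end < len(right) and right[j_end][attr] == lv:
                j_end += 1
            # Pair every left tuple with this value with the whole run
            while i < len(left) and left[i][attr] == lv:
                for k in range(j, j_end):
                    result.append({**left[i], **right[k]})
                i += 1
            j = j_end
    return result


# Hypothetical sample data
left_table = [{"A": "b", "l": 1}, {"A": "a", "l": 2}, {"A": "e", "l": 3}]
right_table = [
    {"A": "a", "r": 1},
    {"A": "b", "r": 2},
    {"A": "b", "r": 3},
    {"A": "c", "r": 4},
]

joined = merge_scan_join(left_table, right_table, "A")
print([(t["A"], t["l"], t["r"]) for t in joined])
```

After sorting, each table is read once from start to end, which is what makes the technique symmetric.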
[Figure: Hash join. HASH(a) is applied to the join attribute of the tuples from the left table and of the tuples from the right table, distributing them into buckets for the left table and buckets for the right table.]
Hash join
Application of the same hash function to the join attributes in both tables
Tuples to be joined end up in the same buckets
collisions are generated by tuples that yield the same hash function result but have different attribute values
A local sort and join is performed within each bucket
Very fast join technique
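A minimal in-memory sketch of the technique follows. It assumes the join attribute is the first field of each tuple and uses Python's built-in `hash` with a hypothetical number of buckets. For brevity, the local join inside each bucket compares all pairs instead of sorting the bucket first.

```python
def hash_join(left, right, n_buckets=4):
    """Partition both tables with the same hash function, then join bucket by bucket."""
    left_buckets = [[] for _ in range(n_buckets)]
    right_buckets = [[] for _ in range(n_buckets)]
    for row in left:
        left_buckets[hash(row[0]) % n_buckets].append(row)
    for row in right:
        right_buckets[hash(row[0]) % n_buckets].append(row)

    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        for l in lb:
            for r in rb:
                # Collisions: same bucket, different attribute value
                if l[0] == r[0]:
                    out.append((l, r))
    return out


# Hypothetical data: keys 1, 5 and 9 collide in the same bucket when n_buckets = 4
left = [(1, "l1"), (5, "l5"), (2, "l2")]
right = [(5, "r5"), (1, "r1"), (9, "r9"), (1, "r1b")]
joined = sorted(hash_join(left, right))
```

Only tuples in buckets with the same number are compared, so most non-matching pairs are never examined.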
![Page 64: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/64.jpg)
DBMG
64
Bitmapped join index
Bit matrix that precomputes the join between two tables A and B
One column for each RID in table A
One row for each RID in table B
Position (i, j) of the matrix is
1 if tuple with RID j in table A joins with tuple with RID i in table B
0 otherwise
Updates may be slow
RID 1 2 … n
1 0 0 … 1
2 0 1 … 0
3 0 0 … 1
4 1 0 … 0
… … … … 0
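As a rough illustration, the matrix can be precomputed as sketched below. The tables, the attribute name and the function `build_bitmapped_join_index` are assumptions, and RIDs are taken to be the positions of the tuples in each list, starting from 1.

```python
def build_bitmapped_join_index(table_a, table_b, attr):
    """Row i corresponds to RID i+1 of table B, column j to RID j+1 of table A."""
    return [[1 if a[attr] == b[attr] else 0 for a in table_a]
            for b in table_b]


# Hypothetical data: table A is Course, table B is Exam
courses = [{"course": "C1"}, {"course": "C2"}]
exams = [{"course": "C2"}, {"course": "C1"}, {"course": "C2"}]

matrix = build_bitmapped_join_index(courses, exams, "course")
```

Inserting a tuple into either table adds a row or a column that has to be computed against the whole other table, which is consistent with the remark that updates may be slow.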
![Page 65: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/65.jpg)
DBMG
65
Bitmapped join
Typically used in OLAP queries
joining several tables with a large central table
Example
Exam table, joined to Student and Course tables
Exploits one or more bitmapped join indices
One for each pair of joined tables
Access to the large central table is the last step
![Page 66: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/66.jpg)
DBMG
66
Bitmapped join
Complex queries may exploit jointly
bitmapped join indices
bitmap indices for predicates on single tables
![Page 67: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/67.jpg)
DBMG
67
Example: Bitmapped join
Average score of male students for exams of courses in the first year of the master's degree
STUDENT (Reg#, SName, Gender)
COURSE (Course#, CName, CourseYear)
EXAM (Reg#, Course#, Date, Grade)
SELECT AVG (Grade)
FROM STUDENT S, EXAM E, COURSE C
WHERE E.Reg# = S.Reg#
AND E.Course# = C.Course#
AND CourseYear = '1M'
AND Gender = 'M';
![Page 68: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/68.jpg)
DBMG
Bitmapped join
Bitmap for CourseYear attribute (rows are RIDs of the Course table)
RID … 1M … …
1 0 1 … 0
2 0 0 … 0
3 0 0 … 1
4 0 1 … 0
5 1 0 … 0
Courses with CourseYear = '1M': RIDs 1 and 4
Bitmapped join index for Course-Exam join (rows are Exam RIDs, columns are Course RIDs)
RID 1 … 4 …
1 0 … 1 1
2 0 … 1 0
3 0 … 0 1
4 1 … 0 0
… … … … …
Column 1 OR column 4 = RIDCY
1: 0 0 0 1 …
4: 1 1 0 0 …
RIDCY: 1 1 0 1 …
… FROM EXAM E, COURSE C
WHERE E.Course# = C.Course#
AND CourseYear = '1M' …
68
![Page 69: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/69.jpg)
DBMG
Bitmapped join
RIDCY AND RIDG = RID
RIDCY: 1 1 0 1 … (bitmap for Course-Exam predicates and join)
RIDG: 1 0 0 1 … (bitmap for Student-Exam predicates and join)
RID: 1 0 0 1 … (RIDs of Exam table for tuples to be read)
69
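The bitmap operations of this example can be replayed with a short sketch. The bit values are the first four positions shown on the slides. The helper names `bit_or` and `bit_and` are hypothetical, and bitmaps are represented as plain lists of 0/1 for readability.

```python
def bit_or(x, y):
    """Position-wise OR of two bitmaps of equal length."""
    return [a | b for a, b in zip(x, y)]


def bit_and(x, y):
    """Position-wise AND of two bitmaps of equal length."""
    return [a & b for a, b in zip(x, y)]


# Columns of the Course-Exam bitmapped join index for Course RIDs 1 and 4,
# the courses selected by CourseYear = '1M'
course_1 = [0, 0, 0, 1]
course_4 = [1, 1, 0, 0]
rid_cy = bit_or(course_1, course_4)

# Bitmap for the Student-Exam predicate and join (Gender = 'M')
rid_g = [1, 0, 0, 1]

# Exam tuples that satisfy both conditions
rid = bit_and(rid_cy, rid_g)
exam_rids_to_read = [pos + 1 for pos, bit in enumerate(rid) if bit == 1]
```

Only the Exam tuples listed in `exam_rids_to_read` need to be fetched, so access to the large central table is the last step.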
![Page 70: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/70.jpg)
DBMG
70
Group by
Sort based
Sort on the group by attributes
Next compute aggregate functions on groups
Hash based
Hash function on the group by attributes
Next sort each bucket and compute aggregate functions
Materialized views may be exploited to improve the performance of aggregation operations
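The two strategies can be compared with a minimal sketch that computes an average per group. The input rows and the function names are assumptions. The hash based version keeps running totals in a dictionary (Python's built-in hash table) instead of sorting each bucket, which is a simplification of the approach described above.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_avg(rows):
    """Sort on the group by attribute, then aggregate each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        result[key] = sum(values) / len(values)
    return result


def hash_based_avg(rows):
    """Hash on the group by attribute and accumulate (sum, count) per group."""
    acc = {}
    for key, value in rows:
        total, count = acc.get(key, (0, 0))
        acc[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in acc.items()}


# Hypothetical (Gender, Grade) rows
rows = [("M", 28), ("F", 30), ("M", 24), ("F", 26), ("M", 26)]
by_sort = sort_based_avg(rows)
by_hash = hash_based_avg(rows)
```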
![Page 71: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/71.jpg)
DBMG
Database Management Systems
Execution plan selection
71
![Page 72: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/72.jpg)
DBMG
Cost based optimization
Inputs
Data profiles
Internal representation of the query tree
Output
“Optimal” query execution plan
Set of dependencies
It evaluates the cost of different alternatives for
reading each table
executing each relational operator
It exploits approximate cost formulas for access operations
72
![Page 73: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/73.jpg)
DBMG
General approach to optimization
The search for the optimal plan is based on the following dimensions
The way data is read from disk
e.g., full scan, index
The execution order among operators
e.g., join order between two join operations
The technique by means of which each operator is implemented
e.g., the join method
When to perform sort (if sort is needed)
73
![Page 74: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/74.jpg)
DBMG
General approach to optimization
The optimizer builds a tree of alternatives in which
each internal node makes a decision on a variable
each leaf represents a complete query execution plan
74
![Page 75: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/75.jpg)
DBMG
Example
Given 3 tables
R, S, T
Compute the join
R ⋈ S ⋈ T
Execution alternatives
4 join techniques to evaluate (for both joins)
3 join orders
In total, at most
4 * 4 * 3 = 48 different alternatives
75
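The count can be checked by enumerating the combinations explicitly. The four technique names below are taken from the join methods presented earlier, and the string labels for the join orders are only illustrative.

```python
from itertools import product

# Four join techniques, evaluated for each of the two joins
techniques = ["nested loop", "merge scan", "hash join", "bitmapped join"]

# Three join orders for the three tables
join_orders = ["(R JOIN S) JOIN T", "(S JOIN T) JOIN R", "(R JOIN T) JOIN S"]

# One alternative = join order + technique for join 1 + technique for join 2
alternatives = list(product(join_orders, techniques, techniques))
n_alternatives = len(alternatives)
```

This is an upper bound: some combinations may not be applicable, for example when the required index does not exist.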
![Page 76: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/76.jpg)
DBMG
Example
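[Figure: tree of alternatives for R ⋈1 S ⋈2 T. The root branches on the join order: (R ⋈1 S) ⋈2 T, (S ⋈1 T) ⋈2 R, (R ⋈1 T) ⋈2 S. The next level branches on the technique for join 1: nested loop with R inner, nested loop with S inner, merge scan, hash join. The following level branches on the technique for join 2, e.g., nested loop with T inner, nested loop with T outer. Each leaf node is a complete query execution plan.]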
76
![Page 77: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/77.jpg)
DBMG
Best execution plan selection
The optimizer selects the leaf with the lowest cost
General formula
$C_{Total} = C_{I/O} \times n_{I/O} + C_{cpu} \times n_{cpu}$
$n_{I/O}$ is the number of I/O operations
$n_{cpu}$ is the number of CPU operations
The selection is based on operations research optimization techniques
e.g., branch and bound
77
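A small sketch of how the general formula ranks candidate plans is given below. The unit costs, the plan names and the operation counts are invented for illustration only. A real optimizer derives the counts from data profiles and would prune the search (e.g., with branch and bound) rather than cost every leaf.

```python
def total_cost(n_io, n_cpu, c_io=10, c_cpu=1):
    """C_Total = C_I/O * n_I/O + C_cpu * n_cpu, with assumed unit costs."""
    return c_io * n_io + c_cpu * n_cpu


# Hypothetical leaves of the tree of alternatives: (n_io, n_cpu)
plans = {
    "full scan + hash join": (120, 500),
    "index scan + nested loop": (30, 2000),
    "merge scan": (90, 900),
}

costs = {name: total_cost(n_io, n_cpu) for name, (n_io, n_cpu) in plans.items()}

# The optimizer selects the leaf with the lowest cost
best_plan = min(costs, key=costs.get)
```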
![Page 78: Database Management Systems M Query optimization Gdbdmg.polito.it/.../wp-content/uploads/2018/11/14-QueryOptimization… · cost based optimization sql query lexical, syntactic and](https://reader034.fdocuments.us/reader034/viewer/2022042223/5ec9927fdb40ba3c186660cc/html5/thumbnails/78.jpg)
DBMG
Best execution plan selection
The final execution plan is an approximation of the best solution
The optimizer looks for a solution which is of the same order of magnitude as the “best” solution
For compile and go
it stops when the time spent in searching is comparable to the time required to execute the current best plan
78
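The stopping rule for compile and go can be sketched as follows. To keep the example deterministic, the time spent searching is simulated with a fixed, assumed cost per evaluated plan instead of a real clock. The function name, the plan costs and the per-evaluation cost are all hypothetical.

```python
def search_with_budget(plan_costs, cost_per_evaluation):
    """Examine plans in order; stop when the search time reaches the
    estimated execution time of the best plan found so far."""
    best = None
    spent = 0.0
    examined = 0
    for cost in plan_costs:
        if best is not None and spent >= best:
            break  # searching further costs as much as running the current plan
        spent += cost_per_evaluation
        examined += 1
        if best is None or cost < best:
            best = cost
    return best, examined


# Estimated execution times of candidate plans, in the same unit as the search cost
best, examined = search_with_budget([50, 8, 20, 3, 1], cost_per_evaluation=2.5)
```

In this run the search stops after four plans with a best cost of 3. The plan with cost 1 is never examined, which is why the final plan is only an approximation of the best solution.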