28-Execution Plan Optimization Techniques Stroffek Kovarik
-
Upload
sreekanthkata -
Category
Documents
-
view
224 -
download
0
Transcript of 28-Execution Plan Optimization Techniques Stroffek Kovarik
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
1/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
2/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
3/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
4/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
5/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
6/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
7/49
Join Graph
Denition
A node represents arelationAn edge indicates that acondition was used torestrict records of theconnected relationsArrows might be used toreect outer joins
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
8/49
Example
We have the following tables
Question/Query?What is the salary and cell phone number of my bossesboss?
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
9/49
Optimizer and Execution Plan
Lets have a query like thisselect
bboss.EmpId, sal.Salary,sal.Currency, con.Value
fromEmployees as emp,
Employees as boss,Employees as bboss,Salaries as sal,Contacts as con
whereemp.LastName=MyNameand emp.ReportsTo = boss.EmpId
and boss.ReportsTo = bboss.EmpIdand bboss.EmpId = sal.EmpIdand con.EmpId = bboss.EmpIdand con.Type = CELL
How to process the query?
There are 5 relations tobe joinedIf we do not allow product joins we have 10 optionsWith product joinsallowed we have 60options
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
10/49
-
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
11/49
Execution PlanLeft-Deep Tree Example
Hash Join (cost=13.84..17.81 rows=1 width=151)Hash Cond: (outer.empid = inner.reportsto) Seq Scan on employees bboss (cost=0.00..3.64 rows=64 width=4) Hash (cost=13.84..13.84 rows=1 width=159)
Hash Join (cost=10.71..13.84 rows=1 width=159)
Hash Cond: (outer.empid = inner.reportsto) Seq Scan on contacts con (cost=0.00..2.80 rows=64 width=136)
Filter: (type = CELL::bpchar) Hash (cost=10.71..10.71 rows=1 width=23)
Hash Join (cost=7.78..10.71 rows=1 width=23)Hash Cond: (outer.empid = inner.reportsto) Seq Scan on sallaries sal (cost=0.00..2.28 rows=128 width=19) Hash (cost=7.77..7.77 rows=1 width=4)
Hash Join (cost=3.80..7.77 rows=1 width=4)
Hash Cond: (outer.empid = inner.reportsto) Seq Scan on employees boss (cost=0.00..3.64 rows=64 width=8) Hash (cost=3.80..3.80 rows=1 width=4)
Seq Scan on employees emp (cost=0.00..3.80 rows=1 width=4)Filter: (lastname = MyName::bpchar)
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
12/49
Execution PlanLeft-Deep Tree Example - pgAdmin
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
13/49
Execution PlanBushy Tree Example
Nested Loop (cost=1.00..13.17 rows=1 width=151)Join Filter: (inner.empid = outer.empid) Hash Join (cost=1.00..4.13 rows=1 width=155)
Hash Cond: (outer.empid = inner.empid) Seq Scan on contacts con (cost=0.00..2.80 rows=64 width=136)
Filter: (type = CELL::bpchar) Hash (cost=1.00..1.00 rows=1 width=19)
Seq Scan on sallaries sal (cost=0.00..1.00 rows=1 width=19) Nested Loop (cost=0.00..9.02 rows=1 width=8)
Join Filter: (outer.reportsto = inner.empid) Nested Loop (cost=0.00..6.01 rows=1 width=4)
Join Filter: (outer.reportsto = inner.empid) Seq Scan on employees emp (cost=0.00..3.00 rows=1 width=4)
Filter: (lastname = MyName::bpchar) Seq Scan on employees boss (cost=0.00..3.00 rows=1 width=8)
Seq Scan on employees bboss (cost=0.00..3.00 rows=1 width=4)
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
14/49
Execution PlanBushy Tree Example - pgAdmin
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
15/49
Rule Based Optimization
Deterministic - produce the same result at every callfor the same queryMostly produces a plan based on a join graphDoes not search through the large spaceThus it is relatively fastDoes not nd the best plan in most casesMany approaches how to nd the plan based on a
use case and database schemaDeprecated Approach
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
16/49
Cost Based Optimization
Evaluate every execution plan by its costUsing relation statistics to calculate estimates andcost
Exponential search space sizeHave to use a proper method for searching the spaceFinds the best plan for a small number of relationsNot possible to nd the best plan for too manyrelations
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
17/49
Deterministic Algorithms
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
18/49
Dynamic Programming
Checks all possible orders of relationsApplicable only for small number of relationsPioneered by IBMs System R database projectIteratively creates and computes costs for everycombination of relations. Start with an access to anevery single relation. Later, compute the cost of joinsof every two relations, afterwards do the same for
more relations.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
19/49
Dynamic ProgrammingExample
Four relations to be joined A, B, C and D.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
20/49
Dijkstras algorithmDescription
A graph algorithm which nds the shortest path from asource node to any destination node in a weightedoriented graph
A graph has to have a non-negative weight functionon edgesThe weight of the path is the sum of weights on edgeson the path
QuestionHow can we manage that the path will go through all thenodes/relations in a join graph?
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
21/49
Dijkstras algorithmDescription II
We will search for the path in a join graph directly but wewill search for a shortest path in a different graph.
Nodes all possible subsets of relations to be joined
Edges directed; connect every node to all othernodes where it is possible to go by performingexactly one joinBeing in a node on the path means that thecorresponding relations are joinedGoing through the edge means that we areperforming a join
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
22/49
Dijkstras algorithmGraph Example
We are joining four relations - A, B, C and D.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
23/49
Dijkstras algorithmDescription III
For each node V we will maintainThe lowest cost c (V ) of already found path to achieve
the nodeThe state of the node open/closed
We will also maintain a set of nodes M which will holdopen nodes
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
24/49
A* Search Algorithm
Modication of Dijkstras algorithmIntroduces a heuristic function h (V ) which calculatesthe lower bound of the path cost from the node V to
the destination nodeWe will choose the node to be processed withminimal c (V ) + h (V ) and mark it as closedIf we have better heuristic function we do not have
to go through the whole space
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
25/49
A* Search AlgorithmPossible Heuristic Functions
The number of tuples in the current nodeReduces paths where too many tuples are prodcedApplicable only in materialization nodes
The number of tuples that need to be fetched from theremaining relationsA realistic lower boundMight be extended to get a lower bound of the cost ofremaining joins
Any other approximation?
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
26/49
Nearest Neighbor
1. Lets start in the rst node2. Chose an edge with the lowest cost going to an
unreached node3. Repeat the previous step until there is any unvisited
node4. If the path found has lower cost than the best path
already known remember the path5. Choose the next node as a starting one and continue
with 2nd step
It is possible to check all the paths of the constant lengthinstead of only edges.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
27/49
Nearest Neighbor for Bushy Trees
1. Build a set of nodes with an access to every relation2. Choose the cheapest join from all the combinations of
possible joins of any two relations from the set
3. Add the result node to the set of relations to be joined4. Remove the originating relations from the set5. Go back to 2nd step if we still have more than one
relation
It is also possible to look forward for more joins at once.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
28/49
Non-Deterministic Algorithms
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
29/49
Random Walk
Randomly goes through the search space
1. Set the MAX VALUE as a cost for the best knownpath
2. Choose a random path3. If the cost of the chosen path is better than the
currently best known remember the path as the best4. Go back to step 2
Very naive, not very usable in practise but can be used forcomparison with other algorithms
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
30/49
Simulated AnnealingIt is an analogy with physics a cooling process ofa crystal.
A crystal has lots of energy and small random changesare happening in its structure. These small changes are
more stable if the crystal emits the energy afterwards.An execution plan has a high cost at the begining. A smallrandom change is generated in every iteration of theprocess. Depending upon the cost change the probability
whether the change is preserved or discarded iscomputed. The change is kept with the computedprobability.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
31/49
Hill-Climbing
1. Start with a random path2. Check all the neighbor paths and nd the one with the
lowest cost3. If the cost of the best neighboring path is lower than
the actual one change the actual one to that neighbor4. Go back to 2nd step if we found a better path in the
previous step
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
32/49
DemoTravelling Salesman Problem
A travelling salesman has N cities which he needs to visit.He would like to nd the shortest path going through allthe cities.
Similar to the execution plan optimization problemThe evaluation function has different properties
The evaluation of an edge between two cities does notdepend on the cities already visitedDifferent behavior of heuristic algorithms
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
33/49
Implemented Algorithms
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
34/49
Implemented Algorithms
Number of algorithms for optimization of large joinqueries was described and studied, e.g. in:
M. Steinbrunn, G. Moerkotte, A. Kemper. Heuristic andRandomized Optimization for the Join Ordering Problem. InVLDB Journal, 6(3), 1997Ioannidis, Y. E. and Kang, Y. 1990. Randomized algorithmsfor optimizing large join queries. In Proceedings of the 1990ACM SIGMOD international Conference on Management ofData
Performance evaluation and comparison of thesealgorithms is availableImplementation is often experimental and only forpurposes of the given research
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
35/49
Studied algorithms
We will look at the following algorithms in more detail
Genetic Algorithm
Simulated AnnealingIterative ImprovementTwo Phase OptimizationToured Simulated Annealing
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
36/49
Genetic Algorithm
Well known to the PostgreSQL community - GEQO -GEnetic Query OptimizerFundamentally different approach designed tosimulate biological evolutionWorks always on a set of solutions called populationSolutions are represented as character strings byappropriate encoding.
In our case query processing trees
Quality, or tness of the solution is measured by anobjective function
Evaluation cost of processing tree, has to be minimized
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
37/49
Genetic AlgorithmSteps
Zero population of random character strings isgeneratedEach next generation is obtained by a combination of
Selection - Selected ttest members of the populationsurvive to the next generationCrossover - Selected ttest members of the population arecombined producing an offspringMutation - Certain fraction of the population is randomlyaltered (mutated)
This is repeated untilthe population has reached desired qualitypredetermined number of populations has been generatedno improvement has been detected for several generations
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
38/49
Simulated Annealing
Doesnt have to improve on every moveDoesnt get trapped in local minimum so easilyExact behavior determined by parameters:
starting temperature (e.g. function of node cost)temperature reduction (e.g. 90% of the old value)stopping condition (e.g. no improvement in certain numberof temperature reductions)
Performance of the algorithm is dependent on thevalues of these parameters
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
39/49
Simulated AnnealingDetermining neighbor states
Join method choiceLeft-deep processing trees solution spacerepresented by an ordered list of relations to be joined
Swap exchange position of two relations in the list3Cycle Cyclic rotation of three relations in the list
Complete solution space including bushy treesJoin commutativity A B B AJoin associativity (A B) C A (B C)
Left join exchange (A B) C
(A C) BRight join exchange A (B C) B (A C)
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
40/49
Iterative Improvement
Strategy similar to Hill climbing algorithmOvercomes the problem of reaching local minimumSteps of the algorithm
Select random starting pointChoose random neighbor, if the cost is lower than thecurrent node, carry out the moveRepeat step 2 until local minimum is reachedRepeat all steps until stopping condition is metReturn the local minimum with the lowest cost
Stopping condition can be processing xed number ofstarting points or reaching the time limit
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
41/49
Two Phase Optimization
Combination of SA and II, uses advantages of bothRapid local minimum discoverySearch of the neighborhood even with uphill moves
Steps of the algorithmPredened number of starting points selected randomlyLocal minima sought using Iterative improvementLowest of these minima is used as a starting point ofsimulated annealing
Initial temperature of the SA is lower, since only smallproximity of the minimum needs to be covered
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
42/49
Toured Simulated Annealing
Simulated annealing algorithm is run several timeswith different starting pointsStarting points are generated using somedeterministic algorithm with reasonable heuristicThe initial temperature is much lower than with thegeneric simulated annealing algorithmBenets over plain simulated annealing
Covering different parts of the search spaceReduced running time
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
43/49
Experimental Performance
Comparision
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
44/49
Experimental Performance ComparisonWhat to consider
Evaluation plan quality vs. running timeDifferent join graph types
ChainStarCycleGrid
Left-deep tree optimization vs. Bushy treeoptimization
Algorithms involvedGenetic algorithm (Bushy genetic algorithm)Simulated annealing, Iterative improvement, Two phaseoptimization
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
45/49
Experimental Performance ComparisonResults
Bushy tree optimization yields better resultsespecially for chain and cycle join graphs.
If solution quality is preferred, 2PO achieves betterresults (although slightly sacricing the running time).Iterative improvement and Genetic algorithm aresuitable in case the running time is more important.
Pure Simulated annealing requires higher runningtime without providing better solutions.
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
46/49
Demo
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
47/49
Conclusions
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
48/49
Conclusions
We have presented an overview of algorithms usablefor query optimizationChoosing the best algorithm is difcultPostgreSQL has conguration parameters for GEQO
ThresholdPopulation sizeNumber of populationsEffort based approach
More congurable optimizer with different algorithms
OLAP - would prefer better plan optimizationOLTP - would prefer faster plan optimization
http://find/http://goback/ -
8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik
49/49
11
J ulius [email protected]@stroffek.net
Tom a s Kova r [email protected]
http://find/http://goback/