Lecture 17 Optimization - Stanford...
Transcript of Lecture 17 Optimization - Stanford...
Logical vs. Physical Optimization
• Logical optimization:
  • Find equivalent plans that are more efficient
  • Intuition: Minimize # of tuples at each step by changing the order of RA operators
• Physical optimization:
  • Find algorithm with lowest IO cost to execute our plan
  • Intuition: Calculate based on physical parameters (buffer size, etc.) and estimates of data size (histograms)
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
Lecture 17
What you will learn about in this section
1. Optimization of RA Plans
2. ACTIVITY: RA Plan Optimization
RDBMS Architecture
How does a SQL engine work?
1. SQL Query: declarative query (from user)
2. Relational Algebra (RA) Plan: translate to relational algebra expression
3. Optimized RA Plan: find logically equivalent, but more efficient, RA expression
4. Execution: execute each operator of the optimized plan!
Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!
Recall: Relational Algebra (RA)
• Five basic operators:
  1. Selection: σ
  2. Projection: Π
  3. Cartesian Product: ×
  4. Union: ∪
  5. Difference: −
  We'll look at these first!
• Derived or auxiliary operators:
  • Intersection, complement
  • Joins (natural, equi-join, theta join, semi-join)
  • Renaming: ρ
  • Division
  And also at one example of a derived operator (natural join) and a special operator (renaming)
Recall: Converting SFW Query -> RA
Students(sid, sname, gpa)
People(ssn, sname, address)
SELECT DISTINCT gpa, address
FROM Students S, People P
WHERE gpa > 3.5 AND sname = pname;
How do we represent this query in RA?
Π_{gpa,address}(σ_{gpa>3.5}(S ⋈ P))
Recall: Logical Equivalence of RA Plans
• Given relations R(A,B) and S(B,C):
• Here, projection & selection commute:
  • σ_{A=5}(Π_A(R)) = Π_A(σ_{A=5}(R))
• What about here?
  • σ_{A=5}(Π_B(R)) ?= Π_B(σ_{A=5}(R))
We'll look at this in more depth later in the lecture…
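A minimal Python sketch of the two cases above, assuming made-up contents for R(A,B) and set semantics for projection. The helpers select and project are illustrative names, not part of the lecture. It suggests that the first pair of plans agrees, while the second left-hand side cannot even be evaluated because Π_B has already discarded A.

```python
# Relations are modeled as lists of dicts (one dict per tuple).
# The contents of R(A, B) are made-up sample data, not from the lecture.
R = [{"A": 5, "B": 1}, {"A": 5, "B": 2}, {"A": 7, "B": 1}]

def select(rel, pred):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep only attrs, eliminating duplicates (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

a_is_5 = lambda t: t["A"] == 5

# sigma_{A=5}(pi_A(R)) versus pi_A(sigma_{A=5}(R))
lhs = select(project(R, ["A"]), a_is_5)
rhs = project(select(R, a_is_5), ["A"])

# sigma_{A=5}(pi_B(R)): the selection needs A, which pi_B removed.
try:
    select(project(R, ["B"]), a_is_5)
    selection_after_pi_b_works = True
except KeyError:
    selection_after_pi_b_works = False
```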
We'll now look at how to optimize these plans
Note: We can visualize the plan as a tree
Π_B(R(A,B) ⋈ S(B,C))
[Plan tree: Π_B at the root, over the join ⋈ of the leaves R(A,B) and S(B,C)]
Bottom-up tree traversal = order of operation execution!
A simple plan
[Plan tree: Π_B over R(A,B) ⋈ S(B,C)]
What SQL query does this correspond to?
Are there any logically equivalent RA expressions?
“Pushing down” projection
[Plan trees: left, Π_B over R(A,B) ⋈ S(B,C); right, the same plan with an additional Π_B pushed down below the join]
Why might we prefer this plan?
Takeaways
• This process is called logical optimization
• Many equivalent plans used to search for “good plans”
• Relational algebra is an important abstraction.
RA commutators
• The basic commutators:
  • Push projection through (1) selection, (2) join
  • Push selection through (3) selection, (4) projection, (5) join
  • Also: Joins can be re-ordered!
• Note that this is not an exhaustive set of operations
  • This covers local re-writes; global re-writes possible but much harder
This simple set of tools allows us to greatly improve the execution time of queries by optimizing RA plans!
Translating to RA
R(A,B) S(B,C) T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;
Π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))
[Plan tree: Π_{A,D} at the root, over σ_{A<10}, over the join of T(C,D) with (R(A,B) ⋈ S(B,C))]
Logical Optimization
• Heuristically, we want selections and projections to occur as early as possible in the plan
  • Terminology: “push down selections” and “pushing down projections.”
• Intuition: We will have fewer tuples in a plan.
  • Could fail if the selection condition is very expensive (say runs some image processing algorithm).
  • Projection could be a waste of effort, but more rarely.
Optimizing RA Plan
Starting plan: Π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))
Push down selection on A so it occurs earlier
Optimizing RA Plan
After pushing down the selection on A: Π_{A,D}(T ⋈ (σ_{A<10}(R) ⋈ S))
[Plan tree: Π_{A,D} at the root, over the join of T(C,D) with (σ_{A<10}(R(A,B)) ⋈ S(B,C))]
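A small Python sketch of why the pushed-down selection helps, assuming made-up contents for R, S, and T (the lecture gives only the schemas). The helpers natural_join, select, and project are illustrative names. Both plans should return the same answer, but the second builds a much smaller intermediate join.

```python
def natural_join(r, s):
    """Join tuples that agree on every attribute name they share."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                out.append({**t, **u})
    return out

def select(rel, pred):
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi with duplicate elimination; returns sorted value tuples."""
    return sorted({tuple(t[a] for a in attrs) for t in rel})

# Made-up sample data for R(A,B), S(B,C), T(C,D).
R = [{"A": a, "B": a % 4} for a in range(40)]
S = [{"B": b, "C": b + 100} for b in range(4)]
T = [{"C": c, "D": c * 2} for c in range(100, 104)]

a_lt_10 = lambda t: t["A"] < 10

# Plan 1: join everything, then select A < 10.
rs_all = natural_join(R, S)
plan1 = project(select(natural_join(T, rs_all), a_lt_10), ["A", "D"])

# Plan 2: select on R first, then join.
rs_small = natural_join(select(R, a_lt_10), S)
plan2 = project(natural_join(T, rs_small), ["A", "D"])

# Sizes of the intermediate R-join-S result under each plan.
intermediate_sizes = (len(rs_all), len(rs_small))
```

With this sample data the intermediate R ⋈ S shrinks from 40 tuples to 10, which is the "fewer tuples at each step" intuition in miniature.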
Optimizing RA Plan
Current plan: Π_{A,D}(T ⋈ (σ_{A<10}(R) ⋈ S))
Push down projection so it occurs earlier
Optimizing RA Plan
After pushing down projection: Π_{A,D}(T ⋈ Π_{A,C}(σ_{A<10}(R) ⋈ S))
[Plan tree: Π_{A,D} at the root, over the join of T(C,D) with Π_{A,C} applied to (σ_{A<10}(R(A,B)) ⋈ S(B,C))]
We eliminate B earlier!
In general, when is an attribute not needed…?
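One way to read the question above: an attribute can be dropped once it is neither in the final output nor used by any operator higher in the plan. Here B is only needed for the R ⋈ S join condition, so it can go right after that join. The Python sketch below checks this on made-up data; the relation contents and helper names are assumptions for illustration.

```python
def natural_join(r, s):
    """Join tuples that agree on every attribute name they share."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                out.append({**t, **u})
    return out

def project(rel, attrs):
    """pi with duplicate elimination; returns a list of dicts."""
    keys = sorted({tuple(t[a] for a in attrs) for t in rel})
    return [dict(zip(attrs, k)) for k in keys]

# Made-up sample data for R(A,B), S(B,C), T(C,D).
R = [{"A": a, "B": a % 4} for a in range(40)]
S = [{"B": b, "C": b + 100} for b in range(4)]
T = [{"C": c, "D": c * 2} for c in range(100, 104)]

r_small = [t for t in R if t["A"] < 10]       # sigma_{A<10}(R)
inner = natural_join(r_small, S)               # tuples carry A, B, C

# Without the extra projection: B rides along into the join with T.
plan_keep_b = project(natural_join(T, inner), ["A", "D"])

# With pi_{A,C}: B is dropped as soon as the R-S join no longer needs it.
inner_ac = project(inner, ["A", "C"])
plan_drop_b = project(natural_join(T, inner_ac), ["A", "D"])

# Number of attributes per intermediate tuple under each plan.
widths = (len(inner[0]), len(inner_ac[0]))
```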
What you will learn about in this section
1. Index Selection
2. Histograms
3. ACTIVITY
Index Selection
Input:
• Schema of the database
• Workload description: set of (query template, frequency) pairs
Goal: Select a set of indexes that minimize execution time of the workload.
• Cost / benefit balance: Each additional index may help with some queries, but requires updating
This is an optimization problem!
Example
Workload description:
SELECT pname
FROM Product
WHERE year = ? AND category = ? AND manufacturer = ?
SELECT pname
FROM Product
WHERE year = ? AND category = ?
Frequency: 10,000,000
Frequency: 10,000,000
Which indexes might we choose?
Example
Workload description:
SELECT pname
FROM Product
WHERE year = ? AND category = ? AND manufacturer = ?
SELECT pname
FROM Product
WHERE year = ? AND category = ?
Frequency: 10,000,000
Frequency: 100
Now which indexes might we choose? Worth keeping an index with manufacturer in its search key around?
Simple Heuristic
• Can be framed as standard optimization problem: Estimate how cost changes when we add index.
• We can ask the optimizer!
• Search over all possible space is too expensive, optimization surface is really nasty.
  • Real DBs may have 1000s of tables!
• Techniques to exploit structure of the space.
  • In SQL Server AutoAdmin.
NP-hard problem, but can be solved!
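A rough Python sketch of "estimate how cost changes when we add an index", using a greedy search. Everything numeric here is a made-up assumption: the cost model in query_cost stands in for asking a real optimizer, the upkeep charge is invented, and the pairing of frequency 100 with the manufacturer query in the skewed workload is a guess at the slide's intent. Greedy search is a swapped-in simple strategy, not the AutoAdmin algorithm.

```python
SCAN_COST = 1000.0               # hypothetical cost of a full table scan
UPDATE_COST_PER_ATTR = 5_000.0   # hypothetical upkeep charged per indexed attribute

def query_cost(pred_attrs, indexes):
    """Hypothetical what-if estimate: each leading index attribute that the
    query constrains cuts the cost by 10x; with no usable index, scan."""
    best = SCAN_COST
    for idx in indexes:
        usable = 0
        for attr in idx:
            if attr not in pred_attrs:
                break
            usable += 1
        if usable:
            best = min(best, SCAN_COST / 10 ** usable)
    return best

def workload_cost(workload, indexes):
    """Frequency-weighted query cost plus upkeep for every chosen index."""
    run = sum(freq * query_cost(attrs, indexes) for attrs, freq in workload)
    upkeep = sum(UPDATE_COST_PER_ATTR * len(idx) for idx in indexes)
    return run + upkeep

def greedy_select(workload, candidates):
    """Repeatedly add the index that lowers total cost the most; stop when
    no remaining candidate pays for its own upkeep."""
    chosen = []
    current = workload_cost(workload, chosen)
    while True:
        options = [(workload_cost(workload, chosen + [c]), c)
                   for c in candidates if c not in chosen]
        if not options:
            return chosen
        cost, best = min(options)
        if cost >= current:
            return chosen
        chosen.append(best)
        current = cost

Q3 = {"year", "category", "manufacturer"}
Q2 = {"year", "category"}
candidates = [("year", "category"), ("year", "category", "manufacturer")]

balanced = [(Q3, 10_000_000), (Q2, 10_000_000)]
skewed = [(Q3, 100), (Q2, 10_000_000)]   # assumed pairing of the frequencies

pick_balanced = greedy_select(balanced, candidates)
pick_skewed = greedy_select(skewed, candidates)
```

Under these assumed numbers the three-attribute index wins when both queries are frequent, and the two-attribute index wins when the manufacturer query is rare.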
Estimating index cost?
• Note that to frame as optimization problem, we first need an estimate of the cost of an index lookup
• Need to be able to estimate the costs of different indexes / index types…
We will see this mainly depends on getting estimates of result set size!
Ex: Clustered vs. Unclustered
Suppose we are using a B+ Tree index with:
• Fanout f
• Fill factor 2/3
Cost to do a range query for M entries over N-page file (P per page):
• Clustered:
  • To traverse: log_f(1.5N)
  • To scan: 1 random IO + (M−1)/P sequential IO
• Unclustered:
  • To traverse: log_f(1.5N)
  • To scan: ~M random IO
Plugging in some numbers
To simplify:
• Random IO = ~10ms
• Sequential IO = free
• Clustered:
  • To traverse: log_f(1.5N)
  • To scan: 1 random IO + (M−1)/P sequential IO ≈ 1 random IO = 10ms
• Unclustered:
  • To traverse: log_f(1.5N)
  • To scan: ~M random IO = M × 10ms
• If M = 1, then there is no difference!
• If M = 100,000 records, then the difference is ~17 min (100,000 × 10ms = 1,000 s) vs. 10ms!
If only we had good estimates of M…
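The arithmetic above can be sketched in Python as follows, under the slide's simplification (random IO ≈ 10 ms, sequential IO free). The function names, the rounding up to whole pages, and the sample values of N, f, and P are assumptions for illustration.

```python
import math

RANDOM_IO_MS = 10.0      # the slide's simplification
SEQUENTIAL_IO_MS = 0.0   # sequential IO treated as free

def traversal_ios(n_pages, fanout):
    """Root-to-leaf IOs: log_f(1.5 N), rounded up to whole page reads.
    The 1.5 factor comes from the 2/3 fill factor."""
    return math.ceil(math.log(1.5 * n_pages, fanout))

def scan_cost_ms(m_entries, per_page, clustered):
    """Cost of scanning M matching entries after reaching the first one."""
    if clustered:
        sequential = math.ceil((m_entries - 1) / per_page)
        return RANDOM_IO_MS + sequential * SEQUENTIAL_IO_MS
    return m_entries * RANDOM_IO_MS

# Hypothetical sizes: P = 100 entries per page.
same_for_one = scan_cost_ms(1, 100, True) == scan_cost_ms(1, 100, False)
clustered_ms = scan_cost_ms(100_000, 100, True)
unclustered_ms = scan_cost_ms(100_000, 100, False)
unclustered_minutes = unclustered_ms / 1000 / 60
```

The traversal cost is the same for both index types, so the comparison comes down to the scan term, which is why a good estimate of M matters.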
IO Cost Estimation via Histograms
• For index selection:
  • What is the cost of an index lookup?
• Also for deciding which algorithm to use:
  • Ex: To execute R ⋈ S, which join algorithm should DBMS use?
  • What if we want to compute σ_{A>10}(R) ⋈ σ_{B=1}(S)?
• In general, we will need some way to estimate intermediate result set sizes
Histograms provide a way to efficiently store estimates of these quantities
Histograms
• A histogram is a set of value ranges (“buckets”) and the frequency with which values fall in each bucket
• How to choose the buckets?
  • Equiwidth & Equidepth
• Turns out high-frequency values are very important
Example
[Figure: bar chart of Frequency (0 to 10) against Values (1 to 15), one bar per value]
How do we compute how many values between 8 and 10? (Yes, it's obvious)
Problem: counts take up too much space!
Full vs. Uniform Counts
[Figure: bar chart of Frequency (0 to 10) against values 1 to 15]
How much space do the full counts (bucket_size = 1) take?
How much space do the uniform counts (bucket_size = ALL) take?
Fundamental Tradeoffs
• Want high resolution (like the full counts)
• Want low space (like uniform)
• Histograms are a compromise!
So how do we compute the “bucket” sizes?
Equi-width
[Figure: histogram of Frequency (0 to 10) over values 1 to 15, grouped into equi-width buckets]
All buckets roughly the same width
Equidepth
[Figure: histogram of Frequency (0 to 10) over values 1 to 15, grouped into equidepth buckets]
All buckets contain roughly the same number of items (total frequency)
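A minimal Python sketch of both bucketing schemes and of a range estimate, assuming integer values and made-up frequencies for the values 1 to 15 (the chart's actual bar heights are not recoverable from the transcript). The function names and the uniform-within-bucket assumption in estimate_range are illustrative choices.

```python
from collections import Counter

# Made-up frequencies for values 1..15 (45 items in total).
FREQS = {1: 8, 2: 4, 3: 3, 4: 1, 5: 1, 6: 2, 7: 1, 8: 5,
         9: 2, 10: 1, 11: 1, 12: 3, 13: 9, 14: 2, 15: 2}
values = [v for v, f in FREQS.items() for _ in range(f)]

def equi_width(values, n_buckets):
    """Buckets span equal value ranges. Returns (lo, hi, count), hi inclusive."""
    lo, hi = min(values), max(values)
    width = -(-(hi - lo + 1) // n_buckets)   # ceiling division
    buckets = []
    for start in range(lo, hi + 1, width):
        end = min(start + width - 1, hi)
        buckets.append((start, end, sum(start <= v <= end for v in values)))
    return buckets

def equi_depth(values, n_buckets):
    """Buckets hold roughly equal numbers of items; a bucket is closed once
    it reaches the target depth, and each value stays in one bucket."""
    counts = Counter(values)
    target = len(values) / n_buckets
    buckets, start, running = [], None, 0
    for v in sorted(counts):
        if start is None:
            start = v
        running += counts[v]
        if running >= target and len(buckets) < n_buckets - 1:
            buckets.append((start, v, running))
            start, running = None, 0
    if start is not None:
        buckets.append((start, v, running))
    return buckets

def estimate_range(buckets, lo, hi):
    """Estimated item count with lo <= value <= hi, assuming items are
    spread uniformly over the integer values inside each bucket."""
    total = 0.0
    for b_lo, b_hi, count in buckets:
        overlap = min(hi, b_hi) - max(lo, b_lo) + 1
        if overlap > 0:
            total += count * overlap / (b_hi - b_lo + 1)
    return total

width_buckets = equi_width(values, 3)
depth_buckets = equi_depth(values, 3)
true_8_to_10 = sum(8 <= v <= 10 for v in values)
est_8_to_10 = estimate_range(width_buckets, 8, 10)
```

With three buckets the histogram stores 3 counts instead of 15, at the price of an approximate answer for the "between 8 and 10" question.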
Histograms
• Simple, intuitive and popular
• Parameters: # of buckets and type
• Can extend to many attributes (multidimensional)
Maintaining Histograms
• Histograms require that we update them!
  • Typically, you must run/schedule a command to update statistics on the database
  • Out of date histograms can be terrible!
• There is research work on self-tuning histograms and the use of query feedback
  • Oracle 11g
Nasty example
[Figure: histogram of Frequency (0 to 10) over values 1 to 15]
1. we insert many tuples with value > 16
2. we do not update the histogram
3. we ask for values > 20?
Compressed Histograms
• One popular approach:
  1. Store the most frequent values and their counts explicitly
  2. Keep an equiwidth or equidepth one for the rest of the values
People continue to try all manner of fanciness here: wavelets, graphical models, entropy models, …
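The two-step approach above can be sketched in Python as follows, assuming integer values, made-up frequencies, and an equi-width histogram for the remaining values. The names compressed_histogram and estimate_equal, and the choice of top_k = 2, are illustrative assumptions.

```python
from collections import Counter

# Made-up frequencies for values 1..15.
FREQS = {1: 8, 2: 4, 3: 3, 4: 1, 5: 1, 6: 2, 7: 1, 8: 5,
         9: 2, 10: 1, 11: 1, 12: 3, 13: 9, 14: 2, 15: 2}
values = [v for v, f in FREQS.items() for _ in range(f)]

def compressed_histogram(values, top_k, n_buckets):
    """Step 1: exact counts for the top_k most frequent values.
    Step 2: an equi-width histogram over everything else."""
    frequent = dict(Counter(values).most_common(top_k))
    rest = sorted(v for v in values if v not in frequent)
    buckets = []
    if rest:
        lo, hi = rest[0], rest[-1]
        width = -(-(hi - lo + 1) // n_buckets)   # ceiling division
        for start in range(lo, hi + 1, width):
            end = min(start + width - 1, hi)
            buckets.append((start, end, sum(start <= v <= end for v in rest)))
    return frequent, buckets

def estimate_equal(hist, value):
    """Estimated count of items equal to value."""
    frequent, buckets = hist
    if value in frequent:
        return float(frequent[value])          # stored exactly
    for lo, hi, count in buckets:
        if lo <= value <= hi:
            # Spread the bucket count over the values it actually covers,
            # skipping the frequent values that are stored separately.
            slots = (hi - lo + 1) - sum(lo <= f <= hi for f in frequent)
            return count / slots if slots else 0.0
    return 0.0

hist = compressed_histogram(values, top_k=2, n_buckets=2)
est_13 = estimate_equal(hist, 13)   # a high-frequency value
est_10 = estimate_equal(hist, 10)   # an ordinary value
```

Keeping the heavy hitters out of the buckets means their large counts no longer inflate the estimates for their neighbors.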
Course Summary
• We learned…
  1. How to design a database
  2. How to query a database, even with concurrent users and crashes / aborts
  3. How to optimize the performance of a database
• We got a sense (as the old joke goes) of the three most important topics in DB research:
  • Performance, performance, and performance
1. Intro
2-3. SQL
4. ER Diagrams
5-6. DB Design
7-8. TXNs
11-12. IO Cost
14-15. Joins
16. Rel. Algebra