Lecture 17 Optimization - Stanford...

Optimization Overview

Today’s Lecture

1. Logical Optimization

2. Physical Optimization

3. Course Summary

Logical vs. Physical Optimization

• Logical optimization:
  • Find equivalent plans that are more efficient
  • Intuition: Minimize the # of tuples at each step by changing the order of RA operators

• Physical optimization:
  • Find the algorithm with the lowest IO cost to execute our plan
  • Intuition: Calculate based on physical parameters (buffer size, etc.) and estimates of data size (histograms)

SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution

1. Logical Optimization

What you will learn about in this section

1. Optimization of RA Plans

2. ACTIVITY: RA Plan Optimization

RDBMS Architecture

How does a SQL engine work?

SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution

• SQL Query: Declarative query (from user)
• Relational Algebra (RA) Plan: Translate to relational algebra expression
• Optimized RA Plan: Find logically equivalent, but more efficient, RA expression
• Execution: Execute each operator of the optimized plan!

Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!

Recall: Relational Algebra (RA)

• Five basic operators (we’ll look at these first!):
  1. Selection: $\sigma$
  2. Projection: $\Pi$
  3. Cartesian Product: $\times$
  4. Union: $\cup$
  5. Difference: $-$

• Derived or auxiliary operators:
  • Intersection, complement
  • Joins (natural, equi-join, theta join, semi-join)
  • Renaming: $\rho$
  • Division

We’ll also look at one example of a derived operator (natural join) and a special operator (renaming).

Recall: Converting SFW Query -> RA

Students(sid, sname, gpa)
People(ssn, sname, address)

SELECT DISTINCT gpa, address
FROM Students S, People P
WHERE gpa > 3.5 AND S.sname = P.sname;

How do we represent this query in RA?

$\Pi_{gpa,address}(\sigma_{gpa>3.5}(S \bowtie P))$

Recall: Logical Equivalence of RA Plans

• Given relations R(A,B) and S(B,C):

• Here, projection & selection commute:
  • $\sigma_{A=5}(\Pi_A(R)) = \Pi_A(\sigma_{A=5}(R))$

• What about here?
  • $\sigma_{A=5}(\Pi_B(R)) \stackrel{?}{=} \Pi_B(\sigma_{A=5}(R))$

We’ll look at this in more depth later in the lecture…
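The two cases above can be checked on a tiny instance. The following is a minimal sketch, assuming set semantics and a toy encoding of rows as frozensets of (attribute, value) pairs; the helper names `relation`, `select`, and `project` are illustrative and not from the lecture.

```python
# Relations use set semantics: a relation is a set of rows, and each row is a
# frozenset of (attribute, value) pairs so that rows are hashable.

def relation(schema, tuples):
    """Build a relation from a schema (attribute names) and a list of tuples."""
    return {frozenset(zip(schema, t)) for t in tuples}

def select(rel, pred):
    """sigma: keep the rows for which pred(row_as_dict) is true."""
    return {row for row in rel if pred(dict(row))}

def project(rel, attrs):
    """Pi: keep only attrs; duplicates collapse because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

R = relation(("A", "B"), [(5, 1), (5, 2), (7, 1)])  # toy instance of R(A,B)

def a_equals_5(row):
    return row["A"] == 5

# Case 1: the selection attribute A survives the projection, so the order is free.
lhs = select(project(R, {"A"}), a_equals_5)
rhs = project(select(R, a_equals_5), {"A"})
commutes_on_a = lhs == rhs

# Case 2: projecting onto B first discards A, so sigma_{A=5} has nothing to test.
try:
    select(project(R, {"B"}), a_equals_5)
    select_after_project_b_is_defined = True
except KeyError:
    select_after_project_b_is_defined = False

# The other order is fine: select on A first, then project onto B.
project_b_after_select = project(select(R, a_equals_5), {"B"})
```

In this sketch the second pair of expressions is not interchangeable: the left-hand side refers to an attribute that the projection has already removed, which is why a selection can only be pushed below a projection that keeps the attributes the selection reads.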

We’ll now look at how to optimize these plans.

Note: We can visualize the plan as a tree

$\Pi_B(R(A,B) \bowtie S(B,C))$

[Figure: plan tree with $\Pi_B$ at the root, above the join of leaves R(A,B) and S(B,C)]

Bottom-up tree traversal = order of operation execution!
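To make the "bottom-up traversal = execution order" point concrete, here is a small sketch of a plan tree and a recursive evaluator that runs the children of each operator before the operator itself. The `PlanNode` class, the `evaluate` function, and the toy data are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class PlanNode:
    op: str                 # "scan", "join", or "project"
    children: tuple = ()    # child PlanNodes (inputs to this operator)
    arg: object = None      # relation name for scan, attribute set for project

def rows(schema, tuples):
    """Rows are frozensets of (attribute, value) pairs (set semantics)."""
    return {frozenset(zip(schema, t)) for t in tuples}

def natural_join(r, s):
    """Join rows that agree on every attribute they share."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[k] == dy[k] for k in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out

def evaluate(node, db, trace):
    """Post-order evaluation: inputs are computed before the operator runs."""
    inputs = [evaluate(child, db, trace) for child in node.children]
    if node.op == "scan":
        result = db[node.arg]
        trace.append(f"scan {node.arg}")
    elif node.op == "join":
        result = natural_join(*inputs)
        trace.append("join")
    elif node.op == "project":
        result = {frozenset((a, v) for a, v in row if a in node.arg)
                  for row in inputs[0]}
        trace.append("project")
    else:
        raise ValueError(f"unknown operator: {node.op}")
    return result

db = {
    "R": rows(("A", "B"), [(1, 10), (2, 20)]),
    "S": rows(("B", "C"), [(10, 100), (30, 300)]),
}

# Pi_B(R join S) as a tree: project at the root, join below it, scans at the leaves.
plan = PlanNode("project",
                (PlanNode("join", (PlanNode("scan", arg="R"),
                                   PlanNode("scan", arg="S"))),),
                arg={"B"})

trace = []
result = evaluate(plan, db, trace)
```

After the call, `trace` lists the operators in the order they ran: the two scans, then the join, then the projection.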

A simple plan

[Figure: plan tree with $\Pi_B$ at the root, above the join of leaves R(A,B) and S(B,C)]

What SQL query does this correspond to?

Are there any logically equivalent RA expressions?

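“Pushing down” projection

[Figure: two plan trees. First: $\Pi_B$ above the join of R(A,B) and S(B,C). Second: the same plan with an additional $\Pi_B$ pushed below the join]

Why might we prefer this plan?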
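One way to see the appeal is to count the tuples that flow into the final projection. The sketch below compares $\Pi_B(R \bowtie S)$ with $\Pi_B(\Pi_B(R) \bowtie S)$ on made-up data, under set semantics. Applying the extra projection to R (rather than S) and the helper `join_on_b` are assumptions for illustration.

```python
# Toy data: R(A, B) has many A values per B value; S(B, C) is small.
R = {(a, b) for a in range(100) for b in (1, 2)}
S = {(1, "x"), (2, "y"), (3, "z")}

def join_on_b(left_rows, s_rows, b_index):
    """Natural join on B; b_index is the position of B in the left rows."""
    return {left + (c,) for left in left_rows
            for (b, c) in s_rows if left[b_index] == b}

# Plan 1: Pi_B(R join S). The join produces (A, B, C) tuples.
joined_1 = join_on_b(R, S, b_index=1)
plan_1 = {(b,) for (_, b, _) in joined_1}

# Plan 2: Pi_B(Pi_B(R) join S). Projecting first collapses R to its distinct B values.
R_b = {(b,) for (_, b) in R}
joined_2 = join_on_b(R_b, S, b_index=0)
plan_2 = {(b,) for (b, _) in joined_2}

same_answer = plan_1 == plan_2
intermediate_sizes = (len(joined_1), len(joined_2))
```

Both plans return the same B values, but on this data the join in the second plan produces far fewer intermediate tuples because duplicates were removed before the join.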

Takeaways

• This process is called logical optimization

• Many equivalent plans are used to search for “good plans”

• Relational algebra is an important abstraction.

RA commutators

• The basic commutators:
  • Push projection through (1) selection, (2) join
  • Push selection through (3) selection, (4) projection, (5) join
  • Also: Joins can be re-ordered!

• Note that this is not an exhaustive set of operations
  • This covers local re-writes; global re-writes are possible but much harder

This simple set of tools allows us to greatly improve the execution time of queries by optimizing RA plans!

Optimizing the SFW RA Plan

Translating to RA

R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;

$\Pi_{A,D}(\sigma_{A<10}(T \bowtie (R \bowtie S)))$

[Figure: plan tree with $\Pi_{A,D}$ at the root, then $\sigma_{A<10}$, then the join of T(C,D) with the join of R(A,B) and S(B,C)]

Logical Optimization

• Heuristically, we want selections and projections to occur as early as possible in the plan
  • Terminology: “pushing down selections” and “pushing down projections.”

• Intuition: We will have fewer tuples in the plan.
  • Could fail if the selection condition is very expensive (say, it runs some image processing algorithm).
  • Projection could be a waste of effort, but more rarely.

Optimizing RA Plan: Push down selection on A so it occurs earlier

Before: $\Pi_{A,D}(\sigma_{A<10}(T \bowtie (R \bowtie S)))$

After: $\Pi_{A,D}(T \bowtie (\sigma_{A<10}(R) \bowtie S))$

[Figure: plan tree in which $\sigma_{A<10}$ has moved from just below $\Pi_{A,D}$ down to sit directly above R(A,B)]
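The effect of this rewrite can be measured by counting intermediate tuples. The following sketch runs both plans over made-up instances of R, S, and T; the data and the `run` helper are assumptions for illustration, and the joins are naive nested loops.

```python
# Hypothetical instances: every R tuple matches exactly one S tuple,
# and every S tuple matches exactly one T tuple.
R = {(a, a % 50) for a in range(1000)}   # R(A, B)
S = {(b, b % 5) for b in range(50)}      # S(B, C)
T = {(c, 10 * c) for c in range(5)}      # T(C, D)

def run(push_selection_down):
    """Evaluate the query; return (result, |R join S|, |T join (R join S)|)."""
    # Pushed-down plan: filter R before any join.
    r_in = {(a, b) for (a, b) in R if a < 10} if push_selection_down else R
    rs = {(a, b, c) for (a, b) in r_in for (b2, c) in S if b == b2}
    rst = {(a, b, c, d) for (a, b, c) in rs for (c2, d) in T if c == c2}
    # Original plan: filter only after both joins have run.
    kept = rst if push_selection_down else {row for row in rst if row[0] < 10}
    result = {(a, d) for (a, _, _, d) in kept}
    return result, len(rs), len(rst)

late_result, late_rs, late_rst = run(push_selection_down=False)
early_result, early_rs, early_rst = run(push_selection_down=True)
```

On this data both plans return the same 10 rows, but the plan with the selection pushed down joins 10 tuples at each step instead of 1,000.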

Optimizing RA Plan: Push down projection so it occurs earlier

Before: $\Pi_{A,D}(T \bowtie (\sigma_{A<10}(R) \bowtie S))$

After: $\Pi_{A,D}(T \bowtie \Pi_{A,C}(\sigma_{A<10}(R) \bowtie S))$

[Figure: plan tree with $\Pi_{A,C}$ inserted above the join of $\sigma_{A<10}(R)$ and S, below the join with T]

We eliminate B earlier!

In general, when is an attribute not needed…?
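One possible answer to the question above: at a given point in the plan, an attribute can be dropped if it is neither in the final output nor read by any operator higher up. A minimal sketch of that rule follows; the function name `needed_attributes` and its inputs are illustrative assumptions.

```python
def needed_attributes(output_attrs, attrs_read_above):
    """Attributes that must be kept: the final output plus anything a later operator reads."""
    needed = set(output_attrs)
    for attrs in attrs_read_above:
        needed |= set(attrs)
    return needed

# Point in the plan: just after sigma_{A<10}(R) join S, which produces (A, B, C).
available = {"A", "B", "C"}

# Above this point: the join with T reads C, and the final projection outputs A and D.
keep = available & needed_attributes({"A", "D"}, [{"C"}])
droppable = available - keep
```

Here `keep` matches the attribute list of the pushed-down projection (A and C), and B is droppable because its only use, the join between R and S, has already happened.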

Activity-17-1.ipynb

2. Physical Optimization

What you will learn about in this section

1. Index Selection

2. Histograms

3. ACTIVITY

Index Selection

Input:
• Schema of the database
• Workload description: set of (query template, frequency) pairs

Goal: Select a set of indexes that minimize execution time of the workload.
• Cost/benefit balance: Each additional index may help with some queries, but requires updating

This is an optimization problem!

Example

Workload description:

SELECT pname
FROM Product
WHERE year = ? AND category = ? AND manufacturer = ?
Frequency: 10,000,000

SELECT pname
FROM Product
WHERE year = ? AND category = ?
Frequency: 10,000,000

Which indexes might we choose?

Example

Workload description:

SELECT pname
FROM Product
WHERE year = ? AND category = ? AND manufacturer = ?
Frequency: 100

SELECT pname
FROM Product
WHERE year = ? AND category = ?
Frequency: 10,000,000

Now which indexes might we choose? Is it worth keeping an index with manufacturer in its search key around?

Simple Heuristic

• Can be framed as a standard optimization problem: Estimate how cost changes when we add an index.

• We can ask the optimizer!

• Search over all of the possible space is too expensive; the optimization surface is really nasty.
  • Real DBs may have 1000s of tables!

• Techniques to exploit the structure of the space.
  • In SQL Server: AutoAdmin.

NP-hard problem, but can be solved!
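The "estimate how cost changes when we add an index" step can be sketched as a simple cost/benefit calculation. Everything numeric below other than the two query frequencies is a made-up assumption (IO saved per query, update rate, maintenance cost per update), and `net_benefit` is an illustrative helper, not an optimizer API.

```python
def net_benefit(query_freq, io_saved_per_query, update_freq, io_per_index_update):
    """Estimated workload IO saved by adding one index (positive means worth adding)."""
    return query_freq * io_saved_per_query - update_freq * io_per_index_update

IO_SAVED = 9          # hypothetical: IOs saved per query that can use the index
UPDATES = 1_000_000   # hypothetical: number of updates in the workload
IO_PER_UPDATE = 2     # hypothetical: extra IOs to maintain the index per update

# The same candidate index, serving a query template at two different frequencies.
frequent = net_benefit(10_000_000, IO_SAVED, UPDATES, IO_PER_UPDATE)
rare = net_benefit(100, IO_SAVED, UPDATES, IO_PER_UPDATE)

keep_if_frequent = frequent > 0
keep_if_rare = rare > 0
```

Under these assumed numbers the index pays for itself when the query runs 10,000,000 times, but not when it runs only 100 times, since the maintenance cost is paid regardless of how often the index is read.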

Estimating index cost?

• Note that to frame this as an optimization problem, we first need an estimate of the cost of an index lookup

• Need to be able to estimate the costs of different indexes / index types…

We will see this mainly depends on getting estimates of result set size!

Ex: Clustered vs. Unclustered

Suppose we are using a B+ Tree index with:
• Fanout f
• Fill factor 2/3

Cost to do a range query for M entries over an N-page file (P per page):

• Clustered:
  • To traverse: $\log_f(1.5N)$
  • To scan: 1 random IO + $\lceil \frac{M-1}{P} \rceil$ sequential IO

• Unclustered:
  • To traverse: $\log_f(1.5N)$
  • To scan: ~M random IO

Plugging in some numbers

To simplify:
• Random IO = ~10ms
• Sequential IO = free

• Clustered:
  • To traverse: $\log_f(1.5N)$
  • To scan: 1 random IO + $\lceil \frac{M-1}{P} \rceil$ sequential IO = ~10ms

• Unclustered:
  • To traverse: $\log_f(1.5N)$
  • To scan: ~M random IO = M * 10ms

• If M = 1, then there is no difference!
• If M = 100,000 records, then the difference is ~17 min (100,000 * 10ms = 1,000s) vs. 10ms!

If only we had good estimates of M…
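The arithmetic above can be reproduced directly. This is a sketch under the slide's simplifications (random IO = 10ms, sequential IO = free); the value P = 100, the example N and f, and rounding the traversal depth up to a whole number of IOs are assumptions added for illustration.

```python
import math

RANDOM_IO_MS = 10.0   # simplification from the slide
SEQ_IO_MS = 0.0       # simplification from the slide: sequential IO is free

def traverse_ios(n_pages, fanout):
    """IOs to descend the B+ tree: log_f(1.5 N), rounded up (rounding is an assumption)."""
    return math.ceil(math.log(1.5 * n_pages, fanout))

def clustered_scan_ms(m, p):
    """1 random IO to reach the first entry, then ceil((M-1)/P) sequential IOs."""
    return 1 * RANDOM_IO_MS + math.ceil((m - 1) / p) * SEQ_IO_MS

def unclustered_scan_ms(m):
    """Roughly one random IO per matching entry."""
    return m * RANDOM_IO_MS

P = 100  # hypothetical entries per page

single_clustered = clustered_scan_ms(1, P)
single_unclustered = unclustered_scan_ms(1)

big_clustered = clustered_scan_ms(100_000, P)
big_unclustered = unclustered_scan_ms(100_000)
big_unclustered_minutes = big_unclustered / 1000 / 60

depth = traverse_ios(n_pages=1_000_000, fanout=100)  # hypothetical N and f
```

The traversal cost is the same for both index types, so the gap comes entirely from the scan term, which is why a good estimate of M matters so much.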

Histograms & IO Cost Estimation

IO Cost Estimation via Histograms

• For index selection:
  • What is the cost of an index lookup?

• Also for deciding which algorithm to use:
  • Ex: To execute $R \bowtie S$, which join algorithm should the DBMS use?

  • What if we want to compute $\sigma_{A>10}(R) \bowtie \sigma_{B=1}(S)$?

• In general, we will need some way to estimate intermediate result set sizes

Histograms provide a way to efficiently store estimates of these quantities

Histograms

• A histogram is a set of value ranges (“buckets”) and the frequencies of values occurring in those buckets

• How to choose the buckets?
  • Equiwidth & Equidepth

• Turns out high-frequency values are very important

Example

[Figure: bar chart of Frequency (0 to 10) for each of the Values 1 to 15]

How do we compute how many values are between 8 and 10? (Yes, it’s obvious)

Problem: counts take up too much space!

Full vs. Uniform Counts

[Figure: bar chart of frequency (0 to 10) for values 1 to 15]

How much space do the full counts (bucket_size = 1) take?

How much space do the uniform counts (bucket_size = ALL) take?

Fundamental Tradeoffs

• Want high resolution (like the full counts)

• Want low space (like uniform)

• Histograms are a compromise!

So how do we compute the “bucket” sizes?

Equi-width

[Figure: bar chart of frequency (0 to 10) for values 1 to 15, grouped into equi-width buckets]

All buckets roughly the same width
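As a sketch of how an equi-width histogram might be built and then used to estimate a range count, consider the code below. The per-value frequencies are made up (the chart's actual values are not reproduced here), and the estimator assumes values are spread uniformly within each bucket.

```python
# Hypothetical frequencies for the values 1..15.
FREQS = dict(zip(range(1, 16), [8, 4, 2, 1, 2, 3, 1, 3, 5, 7, 2, 1, 9, 1, 3]))

def equi_width_histogram(freqs, lo, hi, n_buckets):
    """Split [lo, hi] into equal-width ranges; return (bucket_lo, bucket_hi, total_count)."""
    width = -(-(hi - lo + 1) // n_buckets)  # ceiling division
    buckets = []
    for start in range(lo, hi + 1, width):
        end = min(start + width - 1, hi)
        total = sum(freqs.get(v, 0) for v in range(start, end + 1))
        buckets.append((start, end, total))
    return buckets

def estimate_range(buckets, lo, hi):
    """Estimate the number of tuples with lo <= value <= hi (uniformity within buckets)."""
    est = 0.0
    for b_lo, b_hi, total in buckets:
        overlap = min(hi, b_hi) - max(lo, b_lo) + 1
        if overlap > 0:
            est += total * overlap / (b_hi - b_lo + 1)
    return est

buckets = equi_width_histogram(FREQS, lo=1, hi=15, n_buckets=5)
estimated = estimate_range(buckets, 8, 10)
actual = sum(FREQS[v] for v in range(8, 11))
```

With five buckets instead of fifteen exact counts, the estimate for "between 8 and 10" is noticeably off from the true count on this data, which illustrates the resolution vs. space tradeoff.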

Equidepth

[Figure: bar chart of frequency (0 to 10) for values 1 to 15, grouped into equi-depth buckets]

All buckets contain roughly the same number of items (total frequency)
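One simple way to pick equi-depth boundaries is to walk the values in order and close the k-th bucket once the running total reaches k/n of all items. The sketch below does this over made-up frequencies; this particular boundary rule is an assumption, and real systems may choose boundaries differently.

```python
# Hypothetical frequencies for the values 1..15 (all counts positive).
FREQS = dict(zip(range(1, 16), [8, 4, 2, 1, 2, 3, 1, 3, 5, 7, 2, 1, 9, 1, 3]))

def equi_depth_histogram(freqs, n_buckets):
    """Return (bucket_lo, bucket_hi, total_count) with roughly equal totals per bucket."""
    values = sorted(freqs)
    total = sum(freqs.values())
    buckets = []
    start, in_bucket, cumulative = values[0], 0, 0
    for i, v in enumerate(values):
        in_bucket += freqs[v]
        cumulative += freqs[v]
        is_last = i == len(values) - 1
        # Close the k-th bucket once the running total reaches k/n of all items.
        if is_last or cumulative * n_buckets >= total * (len(buckets) + 1):
            buckets.append((start, v, in_bucket))
            if not is_last:
                start, in_bucket = values[i + 1], 0
    return buckets

buckets = equi_depth_histogram(FREQS, n_buckets=4)
depths = [count for (_, _, count) in buckets]
widths = [hi - lo + 1 for (lo, hi, _) in buckets]
```

On this data the bucket totals stay close to one another while the bucket widths vary, the opposite of the equi-width case.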

Histograms

• Simple, intuitive and popular

• Parameters: # of buckets and type

• Can extend to many attributes (multidimensional)

Maintaining Histograms

• Histograms require that we update them!
  • Typically, you must run/schedule a command to update statistics on the database
  • Out of date histograms can be terrible!

• There is research work on self-tuning histograms and the use of query feedback
  • Oracle 11g

Nasty example

[Figure: bar chart of frequency (0 to 10) for values 1 to 15]

1. We insert many tuples with value > 16
2. We do not update the histogram
3. We ask for values > 20?
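The three steps above can be simulated in a few lines. The bucket counts and the inserted values below are made up for illustration, and `estimate_greater_than` assumes uniformity within each bucket.

```python
def estimate_greater_than(buckets, threshold):
    """Estimate the number of tuples with value > threshold from (lo, hi, count) buckets."""
    est = 0.0
    for lo, hi, count in buckets:
        first_match = max(lo, threshold + 1)
        if first_match <= hi:
            est += count * (hi - first_match + 1) / (hi - lo + 1)
    return est

# Hypothetical histogram, built when the table only held values 1..15.
stale_histogram = [(1, 5, 20), (6, 10, 18), (11, 15, 14)]

# Step 1: insert many tuples with value > 16 (here: 1,000 tuples, values 17..1016).
inserted = list(range(17, 1017))

# Step 2: the histogram is NOT rebuilt.

# Step 3: ask for values > 20.
estimated = estimate_greater_than(stale_histogram, 20)
actual = sum(1 for v in inserted if v > 20)
```

The stale histogram has no bucket covering the new values, so the estimate is zero while the true answer is almost every inserted tuple. An optimizer relying on this estimate could pick a plan suited to an empty result.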

Compressed Histograms

• One popular approach:
  1. Store the most frequent values and their counts explicitly
  2. Keep an equiwidth or equidepth one for the rest of the values
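The two-part structure described above can be sketched as follows, using exact counts for the top-k values and equi-width buckets for the remainder. The frequencies, the choice of equi-width for the rest, and the helper names are assumptions for illustration.

```python
# Hypothetical frequencies: values 1 and 6 are far more common than the rest.
FREQS = {1: 50, 2: 3, 3: 2, 4: 4, 5: 3, 6: 40, 7: 2, 8: 4}

def compressed_histogram(freqs, top_k, n_buckets):
    """Exact counts for the top_k most frequent values, equi-width buckets for the rest."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    top = dict(ranked[:top_k])
    rest = {v: c for v, c in freqs.items() if v not in top}
    lo, hi = min(rest), max(rest)
    width = -(-(hi - lo + 1) // n_buckets)  # ceiling division
    buckets = []
    for start in range(lo, hi + 1, width):
        end = min(start + width - 1, hi)
        inside = [c for v, c in rest.items() if start <= v <= end]
        # Each bucket stores its range, its total count, and how many distinct values it covers.
        buckets.append((start, end, sum(inside), len(inside)))
    return top, buckets

def estimate_point(top, buckets, value):
    """Estimate the count of a single value."""
    if value in top:
        return float(top[value])  # exact for the frequent values
    for b_lo, b_hi, total, n_values in buckets:
        if b_lo <= value <= b_hi and n_values:
            return total / n_values  # uniform within the bucket
    return 0.0

top, buckets = compressed_histogram(FREQS, top_k=2, n_buckets=2)
estimate_for_1 = estimate_point(top, buckets, 1)
estimate_for_3 = estimate_point(top, buckets, 3)
```

Because the two heavy values are stored exactly, they no longer inflate the per-value average of the bucket they would otherwise fall into, so estimates for the remaining values stay close to their true counts.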

People continue to try all manner of fanciness here: wavelets, graphical models, entropy models, …

Activity-17-2.ipynb

3. Course Summary

Course Summary

• We learned…

1. How to design a database

2. How to query a database, even with concurrent users and crashes / aborts

3. How to optimize the performance of a database

• We got a sense (as the old joke goes) of the three most important topics in DB research:
  • Performance, performance, and performance

• Lecture 1: Intro
• Lectures 2-3: SQL
• Lecture 4: ER Diagrams
• Lectures 5-6: DB Design
• Lectures 7-8: TXNs
• Lectures 11-12: IO Cost
• Lectures 14-15: Joins
• Lecture 16: Rel. Algebra
