
Programming by Optimisation:

A Practical Paradigm

for Computer-Aided Algorithm Design

Holger H. Hoos & Frank Hutter

Leiden Institute for Advanced Computer Science, Universiteit Leiden, The Netherlands

Department of Computer Science, Universität Freiburg, Germany

IJCAI 2017

Melbourne, Australia, 2017/08/21

The age of machines

“As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise – by what course of calculation can these results be arrived at by the machine in the shortest time?”

(Charles Babbage, 1864)

The age of computation

“The maths[!] that computers use to decide stuff [is] infiltrating every aspect of our lives.”

• financial markets

• social interactions

• cultural preferences

• artistic production

• . . .

Performance matters ...

• computation speed (time is money!)

• energy consumption (battery life, ...)

• quality of results (cost, profit, weight, ...)

... increasingly:

• globalised markets

• just-in-time production & services

• tighter resource constraints

Example: Resource allocation

• resources > demands → many solutions, easy to find
  → economically wasteful → reduction of resources / increase of demand

• resources < demands → no solution, easy to demonstrate
  → lost market opportunity, strain within organisation → increase of resources / reduction of demand

• resources ≈ demands → difficult to find solution / show infeasibility

This tutorial:

new approach to software development, leveraging . . .

• human creativity

• optimisation & machine learning

• large amounts of computation / data

Key idea:

• program → (large) space of programs

• encourage software developers to
  • avoid premature commitment to design choices
  • seek & maintain design alternatives

• automatically find performance-optimising designs for given use context(s)

⇒ Programming by Optimization (PbO)

[Figure: in the traditional approach, one fixed solver is deployed in each application context; under PbO, a parametric solver[·] is automatically instantiated as solver[p1], solver[p2], solver[p3] to suit application contexts 1–3]

Outline

1. Programming by Optimization: Motivation & Introduction

2. Algorithm Configuration (incl. Coffee Break)

3. Portfolio-based Algorithm Selection

4. Software Development Support & Further Directions

Programming by Optimization:

Motivation & Introduction

Example: SAT-based software verification
Hutter, Babic, Hoos, Hu (2007)

• Goal: solve SAT-encoded software verification problems as fast as possible

• new DPLL-style SAT solver Spear (by Domagoj Babic)
  = highly parameterised heuristic algorithm
  (26 parameters, ≈ 8.3 × 10^17 configurations)

• manual configuration by algorithm designer

• automated configuration using ParamILS, a generic algorithm configuration procedure
  Hutter, Hoos, Stützle (2007)

Spear: Performance on software verification benchmarks

solver                          num. solved    mean run-time
MiniSAT 2.0                     302/302        161.3 CPU sec
Spear, original                 298/302        787.1 CPU sec
Spear, generic opt. config.     302/302         35.9 CPU sec
Spear, specific opt. config.    302/302          1.5 CPU sec

• ≈ 500-fold speedup through use of an automated algorithm configuration procedure (ParamILS)

• new state of the art
  (winner of 2007 SMT Competition, QF BV category)

Levels of PbO:

Level 4: Make no design choice prematurely that cannot be justified compellingly.

Level 3: Strive to provide design choices and alternatives.

Level 2: Keep and expose design choices considered during software development.

Level 1: Expose design choices hardwired into existing code (magic constants, hidden parameters, abandoned design alternatives).

Level 0: Optimise settings of parameters exposed by existing software.

[Figure: the five PbO levels placed on a spectrum from Lo to Hi]

Success in optimising speed:

Application (design choices)                                   Speedup      PbO level
SAT-based software verification (Spear), 26                    4.5–500 ×    2–3
  Hutter, Babic, Hoos, Hu (2007)
AI Planning (LPG), 62                                          3–118 ×      1
  Vallati, Fawcett, Gerevini, Hoos, Saetti (2011)
Mixed integer programming (CPLEX), 76                          2–52 ×       0
  Hutter, Hoos, Leyton-Brown (2010)

... and solution quality:

University timetabling, 18 design choices, PbO level 2–3
→ new state of the art; UBC exam scheduling
  Fawcett, Chiarandini, Hoos (2009)

Machine learning / classification, 786 design choices, PbO level 0–1
→ outperforms specialised model selection & hyper-parameter optimisation methods from machine learning
  Thornton, Hutter, Hoos, Leyton-Brown (2012–13)

PbO enables . . .

• performance optimisation for different use contexts (some details later)

• adaptation to changing use contexts (see, e.g., life-long learning – Thrun 1996)

• self-adaptation while solving a given problem instance (e.g., Battiti et al. 2008; Carchrae & Beck 2005; Da Costa et al. 2008)

• automated generation of instance-based solver selectors (e.g., SATzilla – Leyton-Brown et al. 2003, Xu et al. 2008; Hydra – Xu et al. 2010; ISAC – Kadioglu et al. 2010; AutoFolio – Lindauer 2015)

• automated generation of parallel solver portfolios (e.g., Huberman et al. 1997; Gomes & Selman 2001; Hoos et al. 2012; Lindauer et al. 2017)

Cost & concerns

But what about ...

• Computational complexity?

• Cost of development?

• Limitations of scope?

Computationally too expensive?

Spear revisited:

• total configuration time on software verification benchmarks: ≈ 20 CPU days

• wall-clock time on a 10-CPU cluster: ≈ 2 days

• cost on Amazon Elastic Compute Cloud (EC2): 60.23 AUD (= 48 USD)

• 60.23 AUD pays for ...
  • 1h 42min of a typical software engineer in Australia
  • 3h 33min at minimum wage in Australia

Too expensive in terms of development?

Design and coding:

• tradeoff between performance/flexibility and overhead

• overhead depends on level of PbO

• traditional approach: cost from manual exploration of design choices!

Testing and debugging:

• design alternatives for individual mechanisms and components can be tested separately

→ effort linear (rather than exponential) in the number of design choices

Limited to the “niche” of NP-hard problem solving?

Some PbO-flavoured work in the literature:

• computing-platform-specific performance optimisation of linear algebra routines (Whaley et al. 2001)

• optimisation of sorting algorithms using genetic programming (Li et al. 2005)

• compiler optimisation (Pan & Eigenmann 2006; Cavazos et al. 2007; Fawcett et al. 2017)

• database server configuration (Diao et al. 2003)

Algorithm Configuration:

Methods

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues [coffee]
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Algorithm Configuration

• In a nutshell: optimization of free parameters
  – Which parameters? The ones you’d otherwise tune manually & more

• Examples of free parameters in various subfields of AI
  – Tree search (in particular for SAT): pre-processing, branching heuristics, clause learning & deletion, restarts, data structures, ...
  – Local search: neighbourhoods, perturbations, tabu length, annealing, ...
  – Genetic algorithms: population size, mating scheme, crossover operators, mutation rate, local improvement stages, hybridizations, ...
  – Machine learning: pre-processing, regularization (type & strength), minibatch size, learning rate schedules, optimizer & its parameters, ...
  – Deep learning (in addition): #layers (& layer types), #units/layer, dropout constants, weight initialization and decay, non-linearities, ...

Algorithm Parameters

Parameter types
  – Continuous, integer, ordinal
  – Categorical: finite domain, unordered, e.g. {a, b, c}

Parameter space has structure
  – E.g. parameter C is only active if heuristic H = h is used
  – In this case, we say C is a conditional parameter with parent H

Parameters give rise to a structured space of algorithms
  – Many configurations (e.g. 10^47)
  – Configurations often yield qualitatively different behaviour
→ Algorithm configuration (as opposed to “parameter tuning”)

The Algorithm Configuration Process

Algorithm Configuration – in More Detail

Algorithm Configuration – Full Formal Definition
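The formal problem statement on this slide did not survive extraction; the standard formulation from this literature, in LaTeX, is: given a parameterised algorithm $A$ with configuration space $\Theta$, a set (or distribution) $\mathcal{D}$ of problem instances, and a cost metric $m(\theta, \pi)$ (e.g., the runtime of $A(\theta)$ on instance $\pi$), find

$$\theta^{*} \in \arg\min_{\theta \in \Theta} \; \mathbb{E}_{\pi \sim \mathcal{D}} \big[\, m(\theta, \pi) \,\big].$$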

Distribution vs Set of Instances

Algorithm Configuration is a Useful Abstraction

Two different instantiations:

• Large improvements to solvers for many hard combinatorial problems
  – SAT, Max-SAT, MIP, SMT, TSP, ASP, time-tabling, AI planning, …
  – Competition winners for all of these rely on configuration tools

Increasingly popular (citation numbers: Google Scholar)

[Figure: citations per year, 2008–2016, rising to several hundred, for ParamILS (Hutter et al. 07+09), SMAC (Hutter et al. 11), I/F-Race (Balaprakash et al. 07, Birattari et al. 10, López-Ibáñez et al. 11) and GGA/GGA++ (Ansotegui et al. 09+15)]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Configurators have Two Key Components

• Component 1: which configuration to evaluate next?
  – Out of a large combinatorial search space
  – E.g., CPLEX: 76 parameters, 10^47 configurations

• Component 2: how to evaluate that configuration?
  – Evaluating the performance of a configuration is expensive
  – E.g., CPLEX: budget of 10000s per instance
  – Instances vary in hardness
    • Some take milliseconds, others days (for the default)
    • Improvement on a few instances might not mean much

Component 1: Which Configuration to Choose?

• For this component, we can consider a simpler problem: black-box function optimization

    min f(θ) over θ ∈ Θ
    (only observable behaviour of the black box: a query θ returns the value f(θ))

  – Only mode of interaction: query f(θ) at arbitrary θ ∈ Θ
  – Abstracts away the complexity of multiple instances
  – Θ is still a structured space
    • Mixed continuous/discrete
    • Conditional parameters
    • Still more general than “standard” continuous BBO [e.g., Hansen et al.]

Component 1: Which Configuration to Choose?

• Need to balance diversification and intensification
• The extremes:
  – Random search
  – Hill-climbing
• Stochastic local search (SLS)
• Population-based methods
• Model-based optimization
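For concreteness, the two extremes can be sketched in a few lines of Python; f is any black-box cost function, and sample / neighbours are user-supplied helpers (both names are illustrative):

import random

def random_search(f, sample, n_iters=100):
    # Pure diversification: independent random queries of the black box.
    best, best_cost = None, float("inf")
    for _ in range(n_iters):
        theta = sample()
        cost = f(theta)
        if cost < best_cost:
            best, best_cost = theta, cost
    return best, best_cost

def hill_climbing(f, start, neighbours, n_iters=100):
    # Pure intensification: only accept moves that improve.
    theta, cost = start, f(start)
    for _ in range(n_iters):
        cand = random.choice(neighbours(theta))
        cand_cost = f(cand)
        if cand_cost < cost:
            theta, cost = cand, cand_cost
    return theta, cost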

Sequential Model-Based Optimization

• Fit a (probabilistic) model of the function f

• Use that model to trade off exploitation vs exploration

• In the machine learning literature known as Bayesian optimization
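A minimal SMBO loop, sketched with a random-forest surrogate from scikit-learn; the uniform candidate sampling and the simple “mean minus standard deviation” acquisition rule are illustrative simplifications, not the exact choices of any particular system:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def smbo(f, dim, n_init=10, n_iters=50, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n_init, dim))                # initial random design
    y = np.array([f(x) for x in X])
    for _ in range(n_iters):
        model = RandomForestRegressor(n_estimators=50).fit(X, y)
        cands = rng.random((n_candidates, dim))
        per_tree = np.stack([t.predict(cands) for t in model.estimators_])
        mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
        x_next = cands[np.argmin(mu - sigma)]    # favour low mean, high uncertainty
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()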

Component 2: How to Evaluate a Configuration?

Back to general algorithm configuration

• General principle
  – Don’t waste too much time on bad configurations
  – Evaluate good configurations more thoroughly

Simplest Solution: Use Fixed N Instances

• Effectively treats the problem as black-box function optimization

• Issue: how large to choose N?
  – Too small: over-tuning to those instances
  – Too large: every function evaluation is slow

Racing Algorithms

• Compare two or more algorithms against each other
  – Perform one run for each configuration at a time
  – Discard configurations when dominated

[Image source: Maron & Moore, Hoeffding Races, NIPS 1994]

[Maron & Moore, NIPS 1994] [Birattari, Stützle, Paquete & Varrentrapp, GECCO 2002]
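In sketch form (plain Python; the fixed tolerance margin below stands in for the statistical test, e.g. a Hoeffding bound or F-test, used by the real racing methods):

import statistics

def race(configs, instances, run, margin=1.0):
    # run(config, instance) returns the observed cost of a single run.
    costs = {c: [] for c in configs}
    alive = list(configs)
    for instance in instances:
        for c in alive:
            costs[c].append(run(c, instance))
        means = {c: statistics.mean(costs[c]) for c in alive}
        best = min(means.values())
        # discard configurations that appear dominated so far
        alive = [c for c in alive if means[c] <= best + margin]
    return alive, costs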

Saving Time: Aggressive Racing

• Race new configurations against the best known
  – Discard poor new configurations quickly
  – No requirement for statistical domination

• The search component should allow returning to configurations discarded because they were “unlucky”

[Hutter, Hoos & Stützle, AAAI 2007]

Saving More Time: Adaptive Capping

Can terminate runs for poor configurations θ′ early:
  – Is θ′ better than θ?

• Example: RT(θ) = 20; once the run of θ′ exceeds 20 seconds, RT(θ′) > 20 is guaranteed

• Can terminate the evaluation of θ′ once it is guaranteed to be worse than θ

(only applicable when minimizing algorithm runtime)

[Hutter, Hoos, Leyton-Brown & Stützle, JAIR 2009]
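A minimal sketch of the idea, assuming the target run can be launched with a cutoff (run_with_cutoff is a hypothetical helper that returns the runtime, or None on timeout):

def challenger_beats_incumbent(run_with_cutoff, challenger, instances,
                               incumbent_total):
    # Adaptive capping when minimizing total runtime: abort the challenger
    # as soon as it can no longer beat the incumbent's total.
    total = 0.0
    for instance in instances:
        remaining = incumbent_total - total
        if remaining <= 0:
            return False               # already guaranteed worse
        runtime = run_with_cutoff(challenger, instance, cutoff=remaining)
        if runtime is None:            # capped: RT > remaining budget
            return False
        total += runtime
    return total < incumbent_total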

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Overview: Algorithm Configuration Systems

• Continuous parameters, single instances (black-box function optimization)
  – Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [Hansen et al. 2006–17]
  – Sequential Parameter Optimization (SPO) [Bartz-Beielstein et al. 2006]

• General algorithm configuration methods
  – ParamILS [Hutter et al. 2007, 2009; Blot et al. 2016]
  – Gender-based Genetic Algorithm (GGA) [Ansotegui et al. 2009, 2015]
  – Iterated F-Race [Birattari et al. 2002, 2010]
  – Sequential Model-based Algorithm Configuration (SMAC) [Hutter et al. 2011–17]
  – Distributed SMAC [Hutter et al. 2012–17]

The Baseline: Graduate Student Descent

Start with some configuration
repeat
  modify a single parameter
  if performance on a benchmark set degrades then
    undo modification
until no more improvement possible (or “good enough”)

The ParamILS Framework

Iterated Local Search in parameter configuration space:
→ performs a biased random walk over local optima

[Hutter, Hoos, Leyton-Brown & Stützle, AAAI 2007 & JAIR 2009]

The BasicILS(N) Algorithm

• Instantiates the ParamILS framework
• Uses a fixed number of N runs for each evaluation
  – Sample N instances from the given set (with repetitions)
  – Same instances (and seeds) for evaluating all configurations
  – Essentially treats the problem as black-box optimization

• How to choose N?
  – Too high: evaluating a configuration is expensive
    → optimization process is slow
  – Too low: noisy approximations of the true cost
    → poor generalization to test instances/seeds
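A skeleton of the iterated local search underlying BasicILS (plain Python; evaluate is the fixed-N cost estimator described above, neighbours enumerates the one-exchange neighbourhood, and the perturbation / restart settings are illustrative):

import random

def param_ils(evaluate, random_config, neighbours,
              n_iters=100, perturb_steps=3, restart_prob=0.01):
    def local_search(theta):
        current_cost, improved = evaluate(theta), True
        while improved:
            improved = False
            for nb in neighbours(theta):      # one-exchange neighbourhood
                if evaluate(nb) < current_cost:
                    theta, current_cost, improved = nb, evaluate(nb), True
                    break
        return theta

    incumbent = local_search(random_config())
    for _ in range(n_iters):
        if random.random() < restart_prob:
            theta = random_config()           # occasional random restart
        else:
            theta = incumbent
            for _ in range(perturb_steps):    # perturbation
                theta = random.choice(list(neighbours(theta)))
        theta = local_search(theta)
        if evaluate(theta) < evaluate(incumbent):
            incumbent = theta                 # acceptance criterion
    return incumbent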

Generalization to Test Set, Large N (N = 100)

[Figure: SAPS on a single QWH instance (same instance for training & test; only difference: seeds)]

Generalization to Test Set, Small N (N = 1)

[Figure: SAPS on a single QWH instance (same instance for training & test; only difference: seeds)]

BasicILS: Speed/Generalization Tradeoff

[Figure: test performance of SAPS on a single QWH instance]

The FocusedILS Algorithm

Aggressive racing: more runs for good configurations
  – Start with N(θ) = 0 for all configurations
  – Increment N(θ) whenever the search visits θ
  – “Bonus” runs for configurations that win many comparisons

Theorem: As the number of FocusedILS iterations → ∞, convergence to the optimal configuration.

  – Key ideas in the proof:
    1. The underlying ILS eventually reaches any configuration
    2. For N(θ) → ∞, the error in the cost approximations vanishes

FocusedILS: Speed/Generalization Tradeoff

[Figure: test performance of SAPS on a single QWH instance]

Speeding up ParamILS

Standard adaptive capping
  – Is θ′ better than θ?

• Example: RT(θ) = 20; once the run of θ′ exceeds 20 seconds, RT(θ′) > 20 is guaranteed

• Can terminate the evaluation of θ′ once it is guaranteed to be worse than θ

Theorem: Early termination of poor configurations does not change the ParamILS trajectory.

  – Often yields substantial speedups
  – Especially when the best configuration is much faster than the worst

[Hutter, Hoos, Leyton-Brown & Stützle, JAIR 2009]

Gender-based Genetic Algorithm (GGA)

• Genetic algorithm
  – Genome = parameter configuration
  – Combine the genomes of 2 parents to form an offspring

• Supports adaptive capping
  – Evaluate population members in parallel
  – Adaptive capping: can stop when the first k succeed

• Uses N instances to evaluate configurations
  – Increase N in each generation
  – Linear increase from Nstart to Nend

• Not recommended for small budgets
• User has to specify #generations ahead of time

[Ansotegui, Sellmann & Tierney, CP 2009 & IJCAI 2015]

F-Race and Iterated F-Race

• F-Race
  – Standard racing framework
  – F-test to establish that some configuration is dominated
  – Followed by pairwise t-tests if the F-test succeeds

• Iterated F-Race
  – Maintain a probability distribution over which configurations are good
  – Sample k configurations from that distribution & race them
  – Update the distributions with the results of the race

• Well-supported software package in R

[Birattari et al., GECCO 2002 & OR Perspectives 2016; Pérez Cáceres et al., LION 2017]

Model-Based Algorithm Configuration

SMAC: Sequential Model-Based Algorithm Configuration
  – Sequential model-based optimization & aggressive racing

repeat
  – construct a model to predict performance
  – use that model to select promising configurations
  – compare each selected configuration against the best known
until time budget exhausted

[Hutter, Hoos & Leyton-Brown, LION 2011]

SMAC: Aggressive Racing

• Similar racing component as FocusedILS
  – More runs for good configurations
  – Increase #runs for the incumbent over time

• Theorem for discrete configuration spaces: as the overall time budget for SMAC → ∞, convergence to the optimal configuration.

Powering SMAC: Empirical Performance Models

Given:
  – configuration space Θ
  – for each problem instance i: a vector x_i of feature values
  – observed algorithm runtime data: (θ1, x1, y1), …, (θn, xn, yn)

Find: a mapping m: (θ, x) ↦ y predicting A’s performance, y ≈ m(θ, x)

  – Rich literature on such performance prediction problems [see, e.g., Hutter, Xu, Hoos, Leyton-Brown, AIJ 2014, for an overview]
  – Here: use a model m based on random forests

Regression Trees: Fitting to Data

  – In each internal node: only store the split criterion used
  – In each leaf: store the mean of the runtimes

[Figure: example tree splitting on param3 ∈ {red} vs param3 ∈ {blue, green}, then on feature2 ≤ 3.5 vs feature2 > 3.5; leaves store mean runtimes such as 3.7 and 1.65]

Regression Trees: Predictions for New Inputs

E.g. x_{n+1} = (true, 4.7, red):
  – Walk down the tree, return the mean runtime stored in the leaf ⇒ 1.65

Random Forests: Sets of Regression Trees

Training
  – Draw T bootstrap samples of the data
  – For each bootstrap sample, fit a randomized regression tree

Prediction
  – Predict with each of the T trees
  – Return the empirical mean and variance across these T predictions

Complexity for N data points
  – Training: O(T · N log² N)
  – Prediction: O(T · log N)
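A sketch with scikit-learn’s random forest (the actual SMAC implementation uses its own forest variant; the training data here is a random stand-in, and categorical inputs would need encoding):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row of X encodes (configuration, instance features); y holds
# the observed (log) runtimes. Random stand-in data for illustration:
rng = np.random.default_rng(0)
X, y = rng.random((200, 10)), rng.random(200)

forest = RandomForestRegressor(n_estimators=10).fit(X, y)

x_new = rng.random((1, 10))
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
mean, var = per_tree.mean(), per_tree.var()   # empirical mean & variance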

Advantages of Random Forests

Automated selection of important input dimensions
  – Continuous, integer, and categorical inputs
  – Up to 138 features, 76 parameters
  – Can identify important feature and parameter subsets
    • Sometimes 1 feature and 2 parameters suffice! [Hutter, Hoos, Leyton-Brown, LION 2013]

Robustness
  – No need to optimize hyperparameters
  – Already good predictions with few training data points

SMAC: Averaging Across Multiple Instances

• Fit a random forest model

• Aggregate over instances by marginalization
  – Intuition: predict for each instance, then average
  – More efficient implementation in random forests

SMAC: Putting it all Together

Initialize with a single run for the default configuration
repeat
  1. Learn a random forest model from the data so far
  2. Aggregate over instances
  3. Use the model f to select promising configurations
  4. Race each selected configuration against the best known
until time budget exhausted

• Distributed SMAC [Hutter, Hoos & Leyton-Brown, 2012]
  – Maintain a queue of promising configurations
  – Race these against the best known on distributed worker cores

SMAC: Adaptive Capping

Terminate runs for poor configurations θ early:
  – Example: f(θ*) = 20; a capped run of θ only tells us f(θ) > 20
  – Lower bound on runtime → right-censored data point

[Hutter, Hoos & Leyton-Brown, BayesOpt 2011]

Experimental Evaluation

Compared SMAC to ParamILS and GGA
  – On 17 SAT and MIP configuration scenarios, same time budget

SMAC performed best
  – Improvements in test performance of the configurations returned
    • vs ParamILS: 0.93× – 2.25× (11/17 cases significantly better)
    • vs GGA: 1.01× – 2.76× (13/17 cases significantly better)

Wall-clock speedups in distributed SMAC
  – Almost perfect with up to 16 parallel workers
  – Up to 50-fold with 64 workers
    • Reductions in wall-clock time: 5h → 6–15 min; 2 days → 40 min – 2h

[Hutter, Hoos & Leyton-Brown, LION 2011]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

The Algorithm Configuration Process

What the user has to provide:

Parameter space declaration file:
  preproc {none, simple, expensive} [simple]
  alpha [1, 5] [2]
  beta [0.1, 1] [0.5]

Wrapper for command-line call:
  ./wrapper -inst X -timeout 30 -preproc none -alpha 3 -beta 0.7
  → e.g. “successful after 3.4 seconds”
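A wrapper along these lines can be a few lines of Python (a sketch; target_solver stands in for the real solver binary, and crash handling is minimal):

import subprocess, sys, time

# called as: ./wrapper -inst X -timeout 30 -preproc none -alpha 3 -beta 0.7
args = dict(zip(sys.argv[1::2], sys.argv[2::2]))
instance = args.pop("-inst")
timeout = float(args.pop("-timeout"))

command = ["target_solver", instance]     # stand-in for the real binary
for flag, value in args.items():          # pass remaining parameters through
    command += [flag, value]

start = time.time()
try:
    subprocess.run(command, timeout=timeout, check=True)
    print("successful after %.1f seconds" % (time.time() - start))
except subprocess.TimeoutExpired:
    print("timeout after %.1f seconds" % timeout)
except subprocess.CalledProcessError:
    print("crashed")                      # detect & penalize crashes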

Example: Running SMAC

wget http://www.cs.ubc.ca/labs/beta/Projects/SMAC/smac-v2.10.03-master-778.tar.gz
tar xzvf smac-v2.10.03-master-778.tar.gz
cd smac-v2.10.03-master-778

./smac
  (without arguments: prints a usage screen)

./smac --seed 0 --scenarioFile example_scenarios/spear/spear-scenario.txt

The scenario file holds:
  – location of parameter file, wrapper & instances
  – objective function (here: minimize average runtime)
  – configuration budget (here: 30 s)
  – maximal captime per target run (here: 5 s)
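An illustrative scenario file matching that description might look as follows (the key names vary between SMAC versions; treat these as placeholders, not the exact syntax):

algo = ruby spear_wrapper.rb
paramfile = spear-params.txt
instance_file = instances-train.txt
test_instance_file = instances-test.txt
run_obj = runtime          # objective: minimize average runtime
cutoff_time = 5            # maximal captime per target run (s)
wallclock_limit = 30       # configuration budget (s)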

Output of a SMAC run

[…]
[INFO] ***** Runtime Statistics *****
Incumbent ID: 12 (0x22BB8)
Number of Runs for Incumbent: 43
Number of Instances for Incumbent: 5
Number of Configurations Run: 42
Performance of the Incumbent: 0.012555555555555556
Configuration Time Budget used: 30.589647351000067 s (101%)
Sum of Target Algorithm Execution Times (treating minimum value as 0.1): 24.70000 s
CPU time of Configurator: 5.889042742 s
[INFO] **********************************************
[INFO] Total Objective of Final Incumbent 12 (0x22BB8) on training set: 0.012555555555555556; on test set: 0.014499999999999999
[INFO] Sample Call for Final Incumbent 12 (0x22BB8)
cd /ubc/cs/home/h/hutter/tmp/smac-v2.06.00-master-615/example_scenarios/spear; ruby spear_wrapper.rb instances/qcplin2006.10408.cnf 0 5.0 2147483647 3282095 -sp-update-dec-queue '0' -sp-rand-var-dec-scaling '0.3528466348383826' -sp-clause-decay '1.713857938112484' -sp-variable-decay '1.461422623379798' -sp-orig-clause-sort-heur '7' -sp-rand-phase-dec-freq '0.05' -sp-clause-del-heur '0' -sp-learned-clauses-inc '1.452683835620401' -sp-restart-inc '1.6481745669620091' -sp-resolution '0' -sp-clause-activity-inc '0.7121640599232154' -sp-learned-clause-sort-heur '12' -sp-var-activity-inc '0.9358501810374242' -sp-rand-var-dec-freq '0.0001' -sp-use-pure-literal-rule '1' -sp-learned-size-factor '0.27995062371127827' -sp-var-dec-heur '16' -sp-phase-dec-heur '6' -sp-rand-phase-scaling '1.0424648235977578' -sp-first-restart '31'

Decision #1: Configuration Budget & Captime

• Configuration budget
  – Dictated by your resources & needs
    • E.g., start configuration before leaving work on Friday
  – The longer the better (but diminishing returns)
    • Rough rule of thumb: typically time for at least 1000 target runs
    • But we have also achieved good results with 50 target runs in some cases

• Maximal captime per target run
  – Dictated by your needs (typical instance hardness, etc.)
  – Too high: slow progress
  – Too low: possible overtuning to easy instances
  – For SAT etc., often use 300 CPU seconds

Decision #2: Choosing the Training Instances

• Representative instances, moderately hard
  – Too hard: won’t solve many instances, no traction
  – Too easy: will the results generalize to harder instances?
  – Rule of thumb: a mix of hardness ranges
    • Roughly 75% of instances solvable by the default within the maximal captime

• Enough instances
  – The more training instances, the better
  – Very homogeneous instance sets: 50 instances might suffice
  – Preferably ≥ 300 instances, better even ≥ 1000 instances

Decision #2: Choosing the Training Instances

• Split the instance set into training and test sets
  – Configure on the training instances → configuration θ*
  – Run (only) θ* on the test instances
    • Unbiased estimate of performance

Pitfall: configuring on your test instances
  That’s from the dark ages

Fine practice: do multiple configuration runs and pick the θ* with the best training performance
  Not (!!) the best on the test set

Decision #2: Choosing the Training Instances

• Works much better on homogeneous benchmarks
  – Instances that have something in common
    • E.g., come from the same problem domain
    • E.g., use the same encoding
  – One configuration is likely to perform well on all instances

Pitfall: configuration on too heterogeneous sets
  There often is no single great overall configuration (but see algorithm selection etc., later in the tutorial)

Decision #3: How Many Parameters to Expose?

• Suggestion: all parameters you don’t know to be useless
  – More parameters → larger gains possible
  – More parameters → harder problem
  – Max. #parameters tackled so far: 768 [Thornton, Hutter, Hoos & Leyton-Brown, KDD ’13]

• With more time you can search a larger space

Pitfall: including parameters that change the problem
  E.g., the optimality threshold in MIP solving
  E.g., how much memory to allow the target algorithm

Decision #4: How to Wrap the Target Algorithm

• Do not trust any target algorithm
  – Will it terminate in the time you specify?
  – Will it correctly report its time?
  – Will it never use more memory than specified?
  – Will it be correct with all parameter settings?

Pitfall: blindly minimizing target algorithm runtime
  Typically, you will minimize the time to crash

Good practice: wrap target runs with a tool controlling time and memory (e.g., runsolver [Roussel et al., ’11])

Good practice: verify correctness of target runs
  Detect crashes & penalize them

Further Best Practices and Pitfalls to Avoid

• Pitfalls
  – Using different wrappers for comparing different configurators
    • Often causes subtle bugs that completely invalidate the comparison
  – Not terminating algorithm runs properly
    • Use a generic wrapper to handle this for you (we provide one in Python)
  – File-system issues when running many parallel experiments

• Best practices
  – Choose configuration spaces dependent on the budget
    • If you have time for 10 evaluations, don’t configure 100s of parameters
  – Realistic and/or reproducible running time metrics
    • CPU vs wall-clock time vs MEMS
  – Compare configurators on existing, open-source benchmarks
    • E.g., use AClib (in version 2: https://bitbucket.org/mlindauer/aclib2)

[Eggensperger, Lindauer & Hutter, arXiv 2017]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Applications of Algorithm Configuration

Helped win competitions
  – SAT: since 2009
  – ASP: since 2009
  – IPC: since 2011
  – Time-tabling: 2007
  – SMT: 2007

Other academic applications
  – Mixed integer programming (MIP)
  – TSP & Quadratic Assignment Problem
  – Game theory: kidney exchange
  – Linear algebra subroutines
  – Improving Java garbage collection
  – ML hyperparameter optimization
  – Deep learning

Industrial applications (logos shown): FCC spectrum auction; mixed integer programming; analytics & optimization; social gaming; scheduling and resource allocation

Back to the Spear Example

Spear [Babic, 2007]
  – 26 parameters
  – 8.34 × 10^17 configurations

Ran ParamILS, 2 to 3 days × 10 machines
  – On a training set from each of 2 distributions

Compared to default (1 week of manual tuning)
  – On a disjoint test set from each distribution
  – 4.5-fold speedup / 500-fold speedup
    ⇒ won the QF_BV category in the 2007 SMT competition

[Figure: scatter plots of optimised vs default runtimes, log–log scale; points below the diagonal are speedups]

[Hutter, Babic, Hu & Hoos, FMCAD 2007]

Other Examples of PbO for SAT

• SATenstein [KhudaBukhsh, Xu, Hoos & Leyton-Brown, IJCAI 2009 & AIJ 2016]
  – Combined ingredients from existing solvers
  – 54 parameters, over 10^12 configurations
  – Speedup factors: 1.6× to 218×

• Captain Jack [Tompkins & Hoos, SAT 2011]
  – Explored a completely new design space
  – 58 parameters, over 10^50 configurations
  – After configuration: best known solver for 3sat10k and IL50k

Configurable SAT Solver Competition (CSSC)

• Annual SAT competition
  – Scores SAT solvers by their performance across instances
  – Medals for the best average performance with solver defaults
    • Misleading results: implicitly highlights solvers with good defaults

• CSSC 2013 & 2014
  – Better reflects an application setting: homogeneous instances → can automatically optimize parameters
  – Medals for the best performance after configuration

[Hutter, Balint, Bayless, Hoos & Leyton-Brown 2013]

CSSC Result #1

• Solver performance often improved a lot:
  – Lingeling on CircuitFuzz: timeouts 119 → 107
  – Clasp on n-queens: timeouts 211 → 102
  – probSAT on uniform random 5-SAT: timeouts 250 → 0

[Hutter, Lindauer, Balint, Bayless, Hoos & Leyton-Brown 2014]

CSSC Result #2

• Automated configuration changed algorithm rankings
  – Example: random SAT+UNSAT category in 2013

Solver             CSSC ranking    Default ranking
Clasp              1               6
Lingeling          2               4
Riss3g             3               5
Solver43           4               2
Simpsat            5               1
Sat4j              6               3
For1-nodrup        7               7
gNovelty+GCwa      8               8
gNovelty+Gca       9               9
gNovelty+PCL       10              10

[Hutter, Lindauer, Balint, Bayless, Hoos & Leyton-Brown 2014]

Real-World Application: FCC Spectrum Auction

• Wireless frequency spectra: demand increases
  – The US Federal Communications Commission (FCC) is currently holding an auction
  – Expected net revenue: $10 billion to $40 billion

• Key computational problem: feasibility testing based on interference constraints
  – A hard graph colouring problem
  – 2991 stations (nodes) & 2.7 million interference constraints
  – Need to solve many different instances
  – More instances solved: higher revenue

• Best solution: based on SAT solving & configuration with SMAC
  – Improved #instances solved from 73% to 99.6%

[Frechette, Newman & Leyton-Brown, AAAI 2016]

Configuration of a Commercial MIP Solver

Mixed Integer Programming (MIP)

Commercial MIP solver: IBM ILOG CPLEX
  – Leading solver for 15 years
  – Licensed by over 1000 universities and 1300 corporations
  – 76 parameters, 10^47 configurations

Minimizing runtime to optimal solution
  – Speedup factor: 2× to 50×
  – Later work: speedups up to 10,000×

Minimizing the optimality gap reached
  – Gap reduction factor: 1.3× to 8.6×

[Hutter, Hoos & Leyton-Brown, CPAIOR 2010]

Comparison to the CPLEX Tuning Tool

CPLEX tuning tool
  – Introduced in version 11 (late 2007, after ParamILS)
  – Evaluates predefined good configurations, returns the best one
  – Required runtime varies (from < 1 h to weeks)

ParamILS: anytime algorithm
  – At each time step, keeps track of its incumbent

[Figure: ParamILS vs CPLEX tuning tool (lower is better); 2-fold speedup in our worst result, 50-fold speedup in our best result]

[Hutter, Hoos & Leyton-Brown, CPAIOR 2010]

Configuration of Machine Learning Algorithms

• Machine learning has celebrated substantial successes
• But it requires human machine learning experts to
  – Preprocess the data
  – Perform feature selection
  – Select a model family
  – Optimize hyperparameters
  – …

• AutoML: taking the human expert out of the inner loop
  – Yearly AutoML workshops at ICML since 2014
  – PbO applied to machine learning

Auto-WEKA

WEKA [Witten et al., 1999–current]
  – the most widely used off-the-shelf machine learning package
  – over 20000 citations on Google Scholar

Java implementation of a broad range of methods
  – 27 base classifiers (with up to 10 parameters each)
  – 10 meta-methods
  – 2 ensemble methods
  – 3 feature search methods & 8 feature evaluators

Different methods work best on different datasets
  – Want a true off-the-shelf solution

[Thornton, Hutter, Hoos & Leyton-Brown, KDD 2013]

WEKA’s configuration space

Base classifiers
  – 27 choices, each with up to 10 subparameters
  – Coarse discretization: about 10^8 instantiations

Hierarchical structure on top of the base classifiers

WEKA’s configuration space (cont’d)

Feature selection
  – Search method: which feature subsets to evaluate
  – Evaluation method: how to evaluate feature subsets in the search
  – Both methods have subparameters → about 10^7 instantiations

In total: 768 parameters, 10^47 configurations

Auto-WEKA: Results

• Auto-WEKA performed better than the best base classifier
  – Even when the “best base classifier” is determined by an oracle
  – In 6/21 datasets more than 10% reductions in relative error

• Comparison to full grid search
  – Union of grids over the parameters of all 27 base classifiers
  – Auto-WEKA was 100 times faster
  – Auto-WEKA had better test performance in 15/21 cases

• Auto-WEKA based on SMAC vs TPE [Bergstra et al., NIPS 2011]
  – SMAC yielded better CV performance in 19/21 cases
  – SMAC yielded better test performance in 14/21 cases
  – Differences usually small, in 3 cases substantial (SMAC better)

Auto-WEKA as a WEKA plugin

• Command-line example:
  java -cp autoweka.jar weka.classifiers.meta.AutoWEKAClassifier -t iris.arff -timeLimit 5 -no-cv

• GUI example: [screenshot]

Auto-sklearn

• Followed and extended Auto-WEKA’s AutoML approach
• Scikit-learn optimized by SMAC, plus
  – Meta-learning to warmstart Bayesian optimization
  – Automated post-hoc ensemble construction to combine the models Bayesian optimization evaluated

[Feurer, Klein, Eggensperger, Springenberg, Blum, Hutter, NIPS 2015]

Auto-sklearn’s configuration space

• Scikit-learn [Pedregosa et al., 2011–current] instead of WEKA
  – 15 classifiers (with a total of 59 hyperparameters)
  – 13 feature preprocessors (42 hyperparameters)
  – 4 data preprocessors (5 hyperparameters)

• 110 hyperparameters vs 768 in Auto-WEKA

Auto-sklearn: Ready for Prime Time

• Winning approach in the 14-month AutoML challenge
  – Best-performing approach in the auto track and the human track
  – Won both tracks in both final phases
  – Vs 150 teams of human experts

• Trivial to use (see below)

• Available online: https://github.com/automl/auto-sklearn
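The “trivial to use” code snippet did not survive extraction; the call pattern from the auto-sklearn documentation is essentially a drop-in scikit-learn estimator (assuming the usual X_train / y_train / X_test arrays):

import autosklearn.classification

cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)          # runs SMAC + ensembling internally
predictions = cls.predict(X_test)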

PbO for Deep Learning

• What is deep learning?
  – Neural networks with many layers

• Why is there so much excitement about it?
  – Dramatically improved the state of the art in many areas, e.g.,
    • Speech recognition
    • Image recognition
  – Automatic learning of representations → no more manual feature engineering

• What changed?
  – Larger datasets
  – Better regularization methods, e.g., dropout [Hinton et al., 2012]
  – Fast GPU implementations [Krizhevsky et al., 2012]

[Image sources: Krizhevsky et al., 2012; Le et al., 2012]

Deep Learning is Sensitive to Many Choices

[Figure: a convolutional network classifying dog vs cat]

Choice of network architecture:
  #convolutional layers, #fully connected layers, #units per layer, kernel sizes, …

… and 20–30 other numerical choices:
  learning rate schedule (initialization, decay, adaptation), momentum, batch normalization, batch size, #epochs, dropout rates, weight initializations, weight decay, …

Auto-Net for Computer Vision Data

• Application: object recognition

• Parameterized the Caffe framework [Jia, 2013]
  – Convolutional neural network
  – 9 network hyperparameters
  – 12 hyperparameters per layer, up to 6 layers
  – In total 81 hyperparameters

• Results for CIFAR-10
  – New best result for CIFAR-10 without data augmentation
  – SMAC outperformed TPE (the only other applicable hyperparameter optimizer)

[Domhan, Springenberg, Hutter, IJCAI 2015]

Auto-Net in the AutoML Challenge

• Clearly won a dataset of the AutoML challenge
  – 54491 data points, 5000 features, 18 classes

• Unstructured data → fully-connected network
  – Up to 5 layers (with 3 layer hyperparameters each)
  – 14 network hyperparameters, in total 29 hyperparameters
  – Optimized for 18 h on 5 GPUs

• Result (on private test set)
  – AUC 90%
  – All other (manual) approaches < 80%

Speedups by Prediction of Learning Curves

• Humans can look inside the black box
  – They can predict the final performance of a target algorithm run early
  – After a few epochs of stochastic gradient descent
  – Stop if not promising

• We automated that heuristic
  – Fitted a linear combination of 22 parametric models
  – MCMC to preserve uncertainty over model parameters
  – Stopped poor runs early: overall 2-fold speedup

[Domhan, Springenberg & Hutter, IJCAI 2015]

[Figure: SVM error landscapes over (log C, log γ) for subset sizes s_max/128, s_max/16, s_max/4 and s_max]

Speedups by Reasoning over Data Subsets

• Problem: training is very slow for large datasets

• Solution approach: scaling up from subsets of the given data

• Example: SVM
  – Computational cost grows quadratically in dataset size
  – Error shrinks smoothly with the subset size s

[Klein, Bartels, Falkner, Hennig, Hutter, arXiv 2016]

Speedups by Reasoning over Data Subsets

• 10–100× speedup for optimizing SVM hyperparameters
• 5–10× speedup for convolutional neural networks

[Figure: error vs optimizer budget in seconds]

[Klein, Bartels, Falkner, Hennig, Hutter, arXiv 2016]

Summary of Algorithm Configuration

• Algorithm configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Useful abstraction with many (!) applications
• Often better performance than human experts
• Much less human expert time required

Links to all our code: http://ml4aad.org

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration

• Portfolio-Based Algorithm Selection
  – SATzilla: a framework for algorithm selection
  – Hydra: automatic portfolio construction

• Software Development Tools and Further Directions

Motivation: no single great configuration exists

• Heterogeneous instance distributions
  – Even the best overall configuration is not great; e.g.:

    Configuration    Instance type 1    Instance type 2
    #1               1s                 1000s
    #2               1000s              1s
    #3               100s               100s

• No single best solver
  – E.g., SAT solving: different solvers win different categories
  – Virtual best solver (VBS) much better than the single best solver (SBS)

Algorithm portfolios

Exploiting complementary strengths of different algorithms:

• Parallel portfolios [Huberman et al. 1997]

• Algorithm schedules [Sayag et al. 2006]

• Algorithm selection [Rice 1976]

[Figure: in a parallel portfolio, an instance is run on all algorithms at once; in an algorithm schedule, algorithms are run in sequence with time slices; in algorithm selection, a feature extractor feeds an algorithm selector that picks one algorithm per instance]

Portfolios have been successful in many areas
(* Algorithm Selection, † Sequential Execution, ‡ Parallel Execution)

• Satisfiability:
  – SATzilla*† [various coauthors, cited in the following slides; 2003–17]
  – 3S*† [Sellmann 2011]
  – ppfolio‡ [Roussel 2011]
  – claspfolio* [Gebser, Kaminski, Kaufmann, Schaub, Schneider, Ziller 2011]
  – aspeed†‡ [Kaminski, Hoos, Schaub, Schneider 2012]

• Constraint Satisfaction:
  – CPHydra*† [O’Mahony, Hebrard, Holland, Nugent, O’Sullivan 2008]

Portfolios have been successful in many areas
(* Algorithm Selection, † Sequential Execution, ‡ Parallel Execution)

• Planning:
  – FDStoneSoup† [Helmert, Röger, Karpas 2011]

• Mixed Integer Programming:
  – ISAC* [Kadioglu, Malitsky, Sellmann, Tierney 2010]
  – MIPzilla*† [Xu, Hutter, Hoos, Leyton-Brown 2011]

• .. and this is just the tip of the iceberg:
  – http://dl.acm.org/citation.cfm?id=1456656 [Smith-Miles 2008]
  – http://4c.ucc.ie/~larsko/assurvey [Kotthoff 2012]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration

• Portfolio-Based Algorithm Selection
  – SATzilla: a framework for algorithm selection
  – Hydra: automatic portfolio construction

• Software Development Tools and Further Directions

SATzilla: the early core approach
[Leyton-Brown, Nudelman, Andrew, J. McFadden, Shoham 2003]
[Nudelman, Leyton-Brown, Devkar, Shoham, Hoos 2004]

• Training (part of algorithm development)
  – Build a statistical model to predict the runtime of each component algorithm

• Test (for each new instance)
  – Predict the performance of each algorithm
  – Pick the algorithm predicted to be best

• Good performance in SAT competitions
  – 2003: 2 silver, 1 bronze medals
  – 2004: 2 bronze medals
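The core recipe in sketch form, with one scikit-learn runtime model per component solver (stand-in models; the actual SATzilla models evolved over the years, as the following slides describe):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_selector(features, runtimes):
    # features: one row of instance features per training instance;
    # runtimes: one column of observed runtimes per component solver.
    return [RandomForestRegressor(n_estimators=50).fit(features, runtimes[:, j])
            for j in range(runtimes.shape[1])]

def select(models, instance_features):
    # Predict each solver's runtime and pick the predicted best.
    x = instance_features.reshape(1, -1)
    predictions = [m.predict(x)[0] for m in models]
    return int(np.argmin(predictions))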

SATzilla (stylized version)

• Given:
  – training set of instances
  – performance metric
  – candidate solvers
  – portfolio builder (incl. instance features)

• Training:
  – collect performance data
  – learn a model for selecting among solvers

• At runtime:
  – evaluate the model
  – run the selected solver

[Figure: metric, training set and candidate solvers feed the portfolio builder, which outputs a portfolio-based algorithm selector; a novel instance is mapped to the selected solver]

SAT Instance Features (2003–14)

Over 100 features, including:
• Instance size (clauses, variables, clauses/variables, …)
• Syntactic properties (e.g., positive/negative clause ratio)
• Statistics of various constraint graphs
  – factor graph
  – clause–clause graph
  – variable–variable graph
• Knuth’s search space size estimate
• Tree search probing
• Local search probing
• Linear programming relaxation

SATzilla 2007

• Substantially extended features

• Early algorithm schedule: identify a set of “presolvers” and a schedule for running them
  – For every choice of two presolvers + captimes, run the entire SATzilla pipeline and evaluate overall performance
  – Keep the choice that yields the best performance
  – For later steps: discard instances solved by this presolving schedule

• Identify a “backup solver”: the SBS on the remaining data
  – Needed in case feature computation crashes

• 2007 SAT competition: 3 gold, 1 silver, 1 bronze medals

[Xu, Hutter, Hoos & Leyton-Brown, CP 2007; JAIR 2008]

SATzilla 2009

• Robustness: selection of the best subset of component solvers
  – Consider every subset of the given solver set
    • omitting weak solvers prevents their accidental selection
    • conditioned on the choice of presolvers
    • computationally cheap: models decompose across solvers
  – Keep the subset that achieves the best performance

• Fully automated procedure
  – optimizes loss on a validation set

• 2009 SAT competition: 3 gold, 2 silver medals

[Xu, Hutter, Hoos & Leyton-Brown, CP 2007; JAIR 2008]

SATzilla 2011 and later: cost-sensitive DFs

• How it works:
  – Build a classifier to determine which algorithm to prefer between each pair of algorithms in the given portfolio
  – Loss function: cost of misclassification
    • Decision forests and support vector machines have cost-sensitive variants

• Classifiers vote for different algorithms; select the algorithm with the most votes
  – Advantage: selection is a classification problem
  – Advantage: big and small errors are treated differently

• 2011 SAT competition: entered the Evaluation Track (more later)

[Xu, Hutter, Hoos & Leyton-Brown, SAT 2012]

2012 SAT Challenge: Application

[Results plot; * interacting multi-engine solvers: like portfolios, but with richer interaction between solvers]

2012 SAT Challenge: Hard Combinatorial

[Results plot]

2012 SAT Challenge: Random

[Results plot]

2012 SAT Challenge: Sequential Portfolio

• 3S deserves mentioning, but didn’t rank officially [Kadioglu, Malitsky, Sabharwal, Samulowitz, Sellmann, 2011]
  – Disqualified on a technicality
    • chose a buggy solver that returned an incorrect result
    • an occupational hazard for portfolios!
  – Overall performance nearly as strong as SATzilla

2013 onwards

• Algorithm selection: a victim of its own success

• 2013: “The emphasis of SAT Competition 2013 is on evaluation of core solvers:”
  – Single-core portfolios of > 2 solvers not eligible
  – One “open track” allowing parallel solvers, portfolios, etc.
  – That open track was dominated by portfolios

• 2014 onwards
  – “SAT Competition 2014 only allows submission of core solvers”
  – Portfolio researchers started their own competition: the ICON Algorithm Selection Challenge

AutoFolio: PbO for Algorithm Selection

• Define a general space of algorithm selection methods
  – AutoFolio’s configuration space: 54 choices

• Use algorithm configuration to select the best instantiation
  – Partition the training benchmark instances into 10 folds
  – Use SMAC to find the algorithm selection approach & its hyperparameters that optimize CV performance

[Lindauer, Hoos, Hutter & Schaub, JAIR 2015]

ICON Algorithm Selection Challenge 2015

• Ingredients of an algorithm selection (AS) benchmark
  – A set of solvers
  – A set of benchmark instances (split into training & test)
  – Measured performance of all solvers on all instances

• Algorithm selection competition:
  – 13 AS benchmarks from SAT, MaxSAT, CSP, ASP, QBF, Premarshalling
  – 9 competitors using regression, classification, clustering, k-NN, etc.

• Winning algorithms in the 3 tracks:
  – Penalized average runtime: AutoFolio
  – Number of instances solved: AutoFolio
  – Frequency of selecting the best algorithm: SATzilla
  – Overall winner: SATzilla (2nd in the first two tracks)

Try it yourself!

• SATzilla and AutoFolio are freely available online:
  http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/
  http://ml4aad.org/autofolio

• You can try them for your problem
  – we have features for SAT, MIP, AI planning and TSP
  – you need to provide features for other domains
    • in many cases, the general ideas behind our features apply
    • you can also make features by reducing your problem to e.g. SAT and computing the SAT features

Automatically Configuring Algorithms for Portfolio-Based Selection
Xu, Hoos, Leyton-Brown (2010); Kadioglu et al. (2010)

Note:

• SATzilla builds an algorithm selector based on a given set of SAT solvers
  but: success entirely depends on the quality of the given solvers

• Automated configuration produces solvers that work well on average on a given set of SAT instances (e.g., SATenstein – KhudaBukhsh, Xu, Hoos, Leyton-Brown 2009)
  but: may have to settle for compromises for broad, heterogeneous instance sets

Idea: Combine the two approaches → portfolio-based selection from a set of automatically constructed solvers

Combined Configuration + Selection

[Figure: a single parametric algorithm yields multiple configurations; a feature extractor and selector choose among them per instance]

Approach #1:

1. build solvers for various types of instances using automated algorithm configuration

2. construct a portfolio-based selector from these

Problem: requires suitably defined sets of instances

Solution: automatically partition a heterogeneous instance set

Instance-specific algorithm configuration (ISAC)

Kadioglu, Malitsky, Sellmann, Tierney (2010); Malitsky, Sellmann (2012)

1. cluster training instances based on features (using G-means)

2. configure the given parameterised algorithm independently for each cluster (using GGA)

3. construct a portfolio-based selector from the resulting configurations (using distance to cluster centroids)

Drawback: instance features may not correlate well with the impact of algorithm parameters on performance (e.g., uninformative features)

Approach #2:

Key idea: Augment an existing selector AS by targeting instances on which AS performs poorly
(cf. Leyton-Brown et al. 2003; Leyton-Brown et al. 2009)

• interleave configuration and selector construction

• in each iteration, determine the configuration that complements the current selector best

Advantages:

• any-time behaviour: iteratively adds configurations

• desirable theoretical guarantees (under idealising assumptions)

Hydra
Xu, Hoos, Leyton-Brown (2010); Xu, Hutter, Hoos, Leyton-Brown (2011)

1. configure the given target algorithm A on the complete instance set I
   → configuration A1; selector AS1 (always selects A1)

2. configure a new copy of A on I such that the performance of selector AS := AS1 + Anew is optimised
   → configuration A2; selector AS2 := AS1 + A2 (selects from {A1, A2})

3. configure a new copy of A on I such that the performance of selector AS := AS2 + Anew is optimised
   → configuration A3; selector AS3 := AS2 + A3 (selects from {A1, A2, A3})

...
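In sketch form (Python), with configure standing in for one configurator run whose objective is the oracle (perfect-selector) performance of the portfolio extended by the candidate configuration, as described in the note below:

def oracle_cost(portfolio, instances, cost):
    # Perfect-selector (VBS) performance: per-instance best of the portfolio.
    return sum(min(cost(a, instance) for a in portfolio)
               for instance in instances)

def hydra(configure, instances, cost, n_iters=7):
    portfolio = []
    for _ in range(n_iters):
        # objective for the configurator: marginal improvement over
        # the current portfolio under oracle selection
        objective = lambda theta: oracle_cost(portfolio + [theta],
                                              instances, cost)
        portfolio.append(configure(objective))
    return portfolio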

Note:

• effectively adds the A with maximal marginal contribution in each iteration

• estimate marginal contribution using a perfect selector (oracle)
  → avoids costly construction of selectors during configuration

• works well using FocusedILS for configuration, *zilla for selection (but can use other configurators, selectors)

• can be further improved by adding multiple configurations per iteration; using performance estimates from the configurator

Results on SAT:

• target algorithm: SATenstein-LS (KhudaBukhsh et al. 2009)

• 6 well-known benchmark sets of SAT instances (application, crafted, random)

• 7 iterations of Hydra

• 10 configurator runs per iteration, 1 CPU day each

Results on mixture of 6 benchmark sets

[Figure: scatter plot of Hydra[BM, 7] PAR score vs Hydra[BM, 1] PAR score, log–log scale]

Results on mixture of 6 benchmark sets

[Figure: PAR score vs number of Hydra steps (0–8), for Hydra[BM] training, Hydra[BM] test, SATenFACT on training and SATenFACT on test]

Note:

• good results also for MIP (CPLEX) (Xu, Hutter, Hoos, Leyton-Brown 2011)

• the idea underlying Hydra can also be applied to automatically construct parallel algorithm portfolios from a single parameterised target algorithm (Hoos, Leyton-Brown, Schaub, Schneider 2012–14)

Software Development Support

and Further Directions

Software development in the PbO paradigm

[Figure: PbO-<L> source(s) and a design space description are fed to the PbO-<L> weaver, producing parametric <L> source(s); the PbO design optimiser, using benchmark inputs and the use context, produces instantiated <L> source(s) and finally the deployed executable]

Design space specification

Option 1: use language-specific mechanisms

• command-line parameters

• conditional execution

• conditional compilation (ifdef)

Option 2: generic programming language extension

Dedicated support for . . .

• exposing parameters

• specifying alternative blocks of code

Advantages of generic language extension:

• reduced overhead for the programmer

• clean separation of design choices from other code

• dedicated PbO support in software development environments

Key idea:

• augmented sources: PbO-Java = Java + PbO constructs, . . .

• tool to compile down into the target language: weaver


Exposing parameters

...
numerator -= (int) (numerator / (adjfactor+1) * 1.4);
...

becomes

...
##PARAM(float multiplier=1.4)
numerator -= (int) (numerator / (adjfactor+1) * ##multiplier);
...

• parameter declarations can appear at arbitrary places (before or after first use of the parameter)

• access to parameters is read-only (values can only be set/changed via command line or config file)

Specifying design alternatives

• Choice: set of interchangeable fragments of code that represent design alternatives (instances of the choice)

• Choice point: location in a program at which a choice is available

##BEGIN CHOICE preProcessing
<block 1>
##END CHOICE preProcessing

Named design alternatives for the same choice:

##BEGIN CHOICE preProcessing=standard
<block S>
##END CHOICE preProcessing

##BEGIN CHOICE preProcessing=enhanced
<block E>
##END CHOICE preProcessing

The same choice can appear at multiple choice points:

##BEGIN CHOICE preProcessing
<block 1>
##END CHOICE preProcessing

...

##BEGIN CHOICE preProcessing
<block 2>
##END CHOICE preProcessing

Choices can be nested:

##BEGIN CHOICE preProcessing
<block 1a>
##BEGIN CHOICE extraPreProcessing
<block 2>
##END CHOICE extraPreProcessing
<block 1b>
##END CHOICE preProcessing


The Weaver

transforms PbO-<L> code into <L> code (<L> = Java, C++, . . . )

• parametric mode:
  – expose parameters
  – make choices accessible via (conditional, categorical) parameters

• (partial) instantiation mode:
  – hardwire (some) parameters into code (expose others)
  – hardwire (some) choices into code (make others accessible via parameters)

The road ahead

• Support for PbO-based software development
  – Weavers for PbO-C, PbO-C++, PbO-Java
  – PbO-aware development platforms

• Improved / integrated PbO design optimiser

• Debugging and performance analysis tools

• Best practices

• Many further applications

• Scientific insights

Which choices matter?

Observation: Some design choices matter more than others, depending on . . .

• the algorithm under consideration

• the given use context

Knowledge of which choices / parameters matter may . . .

• guide algorithm development

• facilitate configuration

3 recent approaches:

• Forward selection based on empirical performance models
  Hutter, Hoos, Leyton-Brown (2013)

• Functional ANOVA based on empirical performance models
  Hutter, Hoos, Leyton-Brown (2014)

• Ablation analysis
  Fawcett & Hoos (2013–16)

Functional ANOVA based on empirical performance models
Hutter, Hoos, Leyton-Brown (2014)

Key idea:

• build a regression model of algorithm performance as a function of all input parameters (= design choices)
  → empirical performance models (EPMs)

• analyse the variance in the model output (= predicted performance) due to each parameter and to parameter interactions

• importance of a parameter: fraction of the performance variation over the configuration space explained by it (main effect)

• analogous for sets of parameters (interaction effects)

Decomposition of variance in a nutshell

For parameters $p_1, \dots, p_n$ and a function (performance model) $y$:

$$y(p_1, \dots, p_n) = \mu
  + f_1(p_1) + f_2(p_2) + \cdots + f_n(p_n)
  + f_{1,2}(p_1, p_2) + f_{1,3}(p_1, p_3) + \cdots + f_{n-1,n}(p_{n-1}, p_n)
  + f_{1,2,3}(p_1, p_2, p_3) + \cdots$$

Note:

• Straightforward computation of main and interaction effects is intractable (integration over combinatorial spaces of configurations).

• For random forest models, marginal performance predictions and variance decomposition (up to constant-sized interactions) can be computed exactly and efficiently.

Empirical study:

• 8 high-performance solvers for SAT, ASP, MIP, TSP (4–85 parameters)

• 12 well-known sets of benchmark data (random + real-world structure)

• random forest models for performance prediction, trained on 10 000 randomly sampled configurations per solver + data from 25+ runs of the SMAC configuration procedure

Fraction of variance explained by main effects:

CPLEX on RCW (comp. sust.)                     70.3%
CPLEX on CORLAT (comp. sust.)                  35.0%
Clasp on software verification                 78.9%
Clasp on DB query optimisation                 62.5%
CryptoMiniSAT on bounded model checking        35.5%
CryptoMiniSAT on software verification         31.9%

Fraction of variance explained by main + 2-interaction effects:

CPLEX on RCW (comp. sust.)                     70.3% + 12.7%
CPLEX on CORLAT (comp. sust.)                  35.0% + 8.3%
Clasp on software verification                 78.9% + 14.3%
Clasp on DB query optimisation                 62.5% + 11.7%
CryptoMiniSAT on bounded model checking        35.5% + 20.8%
CryptoMiniSAT on software verification         31.9% + 28.5%

Note: this may pick up variation caused by poorly performing configurations.

Simple solution: cap at the default performance or at a quantile of the distribution of randomly sampled configurations; build the model from the capped data.

Ablation analysis
Fawcett & Hoos (2013, 2016)

Key idea:

• given two configurations, A and B, change one parameter at a time to get from A to B
  → ablation path

• in each step, change the parameter that achieves the maximal gain (or minimal loss) in performance

• for computational efficiency, use racing (F-Race) for evaluating the parameters considered in each step
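In sketch form (plain Python, configurations as dicts; full re-evaluation of every candidate flip is shown here, whereas the actual method uses F-Race to cut that cost):

def ablation_path(source, target, evaluate):
    # Greedily flip one parameter at a time from source towards target,
    # always taking the single change with the best performance.
    current, path = dict(source), []
    candidates = {p for p in target if source[p] != target[p]}
    while candidates:
        def flipped(p):
            cfg = dict(current)
            cfg[p] = target[p]
            return cfg
        best = min(candidates, key=lambda p: evaluate(flipped(p)))
        current[best] = target[best]
        candidates.remove(best)
        path.append((best, evaluate(current)))
    return path   # ablation path: (parameter flipped, performance) per step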

Empirical study:

• high-performance solvers for SAT, MIP, AI planning (26–76 parameters), well-known sets of benchmark data (real-world structure)

• optimised configurations obtained from ParamILS (minimisation of penalised average running time; 10 runs per scenario, 48 CPU hours each)

Ablation between default and optimised configurations:

[Figure: LPG on the Depots planning domain; performance (PAR10, s) vs number of parameters modified from the default, for both directions (default to configured, configured to default)]

Which parameters are important?

LPG on Depots:

• cri intermediate levels (43% of overall gain!)

• triomemory

• donot try suspected actions

• walkplan

• weight mutex in relaxed plan

Note: the importance of parameters varies between planning domains.

New development: ablation based on surrogate models (Biedenkapp, Lindauer, Eggensperger, Hutter, Fawcett, Hoos 2017)

Algorithm configuration: parameter importance
Algorithm selection: component contribution
Xu, Hutter, Hoos, Leyton-Brown (2012)

Consider: a portfolio-based algorithm selector AS with candidate algorithms A1, A2, . . . , Ak.

Question: How much does each Ai contribute to the overall performance of AS?

Marginal contribution of Ai to portfolio-based selector AS

= difference in performance of AS with and without Ai (trained separately)

≠ frequency of selecting Ai

≠ fraction of instances solved by Ai

≠ contribution of Ai to the virtual best solver (VBS)
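As a sketch (train and evaluate are stand-ins for building and scoring a selector, e.g. a SATzilla-style model and its test-set performance):

def selector_performance(train, evaluate, solvers):
    selector = train(solvers)      # build a selector from these solvers
    return evaluate(selector)      # e.g. % of test instances solved

def marginal_contribution(train, evaluate, solvers, a_i):
    # Difference in selector performance with and without a_i,
    # retraining the selector in both cases.
    full = selector_performance(train, evaluate, solvers)
    reduced = selector_performance(
        train, evaluate, [s for s in solvers if s != a_i])
    return full - reduced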

Application to SATzilla:

• all instances from the 2011 SAT Competition: 300 Application; 300 Crafted; 300 Random

• candidate solvers from the 2011 SAT Competition:
  – for determining the virtual best solver (VBS) and the single best solver (SBS): all solvers from Phase 2 of the competition: 31 Application; 25 Crafted; 17 Random
  – for building SATzilla: all sequential, non-portfolio solvers from Phase 2: 18 Application; 15 Crafted; 9 Random

• SATzilla assessed by 10-fold cross-validation

SATzilla 2011 Performance (Instances Solved)

Solver                  Application    Crafted    Random
VBS                     84.7%          76.3%      82.2%
SATzilla 2011           75.3%          66.0%      80.8%
SATzilla 2009           70.3%          63.0%      80.3%
Gold medalist (SBS)     71.7%          54.3%      68.0%

Performance of Individual Solvers – Application

[Figure: percentage of instances solved (5000 CPU sec cutoff) by each of the 18 component solvers: QuteRSat, CryptoMinisat, EBGlucose, Glucose2, Glucose1, Rcl, Minisat_psm, Contrasat, MPhaseSAT64, Lingeling, Precosat, LR GL SHR, Glueminisat, Minisatagile, EBMinisat, Minisat, Cirminisat, RestartSAT]

Correlation of Solver Performance – Application

[Figure: matrix of pairwise Spearman correlation coefficients between the 18 Application component solvers; darker = higher correlation coefficient]

Correlation of Solver Performance – Random

[Figure: matrix of pairwise Spearman correlation coefficients between Sparrow, EagleUP, Gnovelty+2, TNM, Sattime11, Adaptg2wsat11, MPhaseSAT_M, March_rw and March_hi; darker = higher correlation coefficient]

Solver Selection Frequency in SATzilla 2011 – Application

[Figure: selection frequencies for Glucose2 (backup), Glucose2, Glueminisat, QuteRSat, Precosat and other solvers, plus the fraction solved by presolvers]

Instances Solved by SATzilla 2011 Components – Application

[Figure: instances solved by Glucose2 (backup), Glucose2 (pre-1), Glucose2, Glueminisat (pre-1), Glueminisat, QuteRSat, Precosat, EBGlucose (pre-1), EBGlucose, Minisat_psm (pre-1), Minisat_psm and other solvers, plus unsolved instances]

Marginal Contribution of Components – Application

[Figure: marginal contribution (%) of each of the 18 component solvers]

Instances Solved vs Marginal Contribution of Components – Application

[Figure: marginal contribution vs % of instances solved by each component solver; MPhaseSAT64 and Glueminisat stand out]

Instances Solved vs Marginal Contribution of Components – Crafted

[Figure: marginal contribution vs % of instances solved; Sol, Clasp2, Sattime and MPhaseSAT stand out]

Joint contributions:
  – 2 Clasp variants = 6.3%
  – 2 Sattime variants = 5.4%

Instances Solved vs Marginal Contribution of Components – Random

[Figure: marginal contribution vs % of instances solved; EagleUP, Sparrow and March_rw stand out]

Joint contributions:
  – 2 March variants = 4%
  – 6 LS solvers = 22.5%

More nuanced analysis based on the Shapley value:
AAAI-16 paper by Frechette et al.

Bias + correction in portfolio performance evaluation:
IJCAI-16 paper by Cameron et al.

Leveraging parallelism

• design choices in parallel programs (Hamadi, Jabbour, Sais 2009)

• deriving parallel programs from sequential sources
  → concurrent execution of optimised designs (parallel portfolios) (Hoos, Leyton-Brown, Schaub, Schneider 2012)

• parallel design optimisers (e.g., Hutter, Hoos, Leyton-Brown 2012)

• use of cloud resources (parallel runs of design optimisers, ...) (Geschwender, Hutter, Kotthoff, Malitsky, Hoos, Leyton-Brown 2014)

Take-home Message

Programming by Optimisation ...

• leverages computational power to construct better software

• enables creative thinking about design alternatives

• produces better performing, more flexible software

• facilitates scientific insights into
  – the efficacy of algorithms and their components
  – the empirical complexity of computational problems

... changes how we build and use high-performance software

More Information:

• www.prog-by-opt.net/Tutorials/IJCAI-17

• www.prog-by-opt.net

• PbO article in Communications of the ACM (Hoos 2012)

• Talk by Kleinberg et al.: Tue, 15:15, Room 216
  Talk by Lindauer et al.: Fri, 10:50, Room 203
  Invited talk by HH at CP/ICLP/SAT: Tue, 08/29, 9:00

• Forthcoming book (Morgan & Claypool)

If PbO works for you:

• Make our day – let us know!

• Share the joy – tell everyone else!