
Programming by Optimisation:

A Practical Paradigm

for Computer-Aided Algorithm Design

Holger H. Hoos & Frank Hutter

Leiden Institute for Advanced Computer Science, Universiteit Leiden, The Netherlands

Department of Computer Science, Universität Freiburg, Germany

IJCAI 2017

Melbourne, Australia, 2017/08/21

The age of machines

“As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise – by what course of calculation can these results be arrived at by the machine in the shortest time?”

(Charles Babbage, 1864)

The age of computation

“The maths[!] that computers use to decide stuff [is] infiltrating every aspect of our lives.”

• financial markets

• social interactions

• cultural preferences

• artistic production

• . . .

Performance matters ...

• computation speed (time is money!)

• energy consumption (battery life, ...)

• quality of results (cost, profit, weight, ...)

... increasingly:

• globalised markets

• just-in-time production & services

• tighter resource constraints

Example: Resource allocation

• resources > demands → many solutions, easy to find
  → economically wasteful → reduction of resources / increase of demand

• resources < demands → no solution, easy to demonstrate
  → lost market opportunity, strain within organisation → increase of resources / reduction of demand

• resources ≈ demands → difficult to find solution / show infeasibility

This tutorial:

new approach to software development, leveraging . . .

• human creativity

• optimisation & machine learning

• large amounts of computation / data

Key idea:

• program → (large) space of programs

• encourage software developers to
  • avoid premature commitment to design choices
  • seek & maintain design alternatives

• automatically find performance-optimising designs for given use context(s)

⇒ Programming by Optimization (PbO)

[Figure: in the traditional approach, one fixed solver is deployed in each application context; under PbO, a parametric solver[·] is automatically instantiated as solver[p1], solver[p2], solver[p3] to suit application contexts 1–3]

Outline

1. Programming by Optimization: Motivation & Introduction

2. Algorithm Configuration (incl. Coffee Break)

3. Portfolio-based Algorithm Selection

4. Software Development Support & Further Directions

Programming by Optimization:

Motivation & Introduction

Example: SAT-based software verification
Hutter, Babic, Hoos, Hu (2007)

• Goal: solve SAT-encoded software verification problems as fast as possible

• new DPLL-style SAT solver Spear (by Domagoj Babic)
  = highly parameterised heuristic algorithm
  (26 parameters, ≈ 8.3 × 10^17 configurations)

• manual configuration by algorithm designer

• automated configuration using ParamILS, a generic algorithm configuration procedure
  Hutter, Hoos, Stützle (2007)

Spear: Performance on software verification benchmarks

solver                          num. solved    mean run-time
MiniSAT 2.0                     302/302        161.3 CPU sec
Spear, original                 298/302        787.1 CPU sec
Spear, generic opt. config.     302/302         35.9 CPU sec
Spear, specific opt. config.    302/302          1.5 CPU sec

• ≈ 500-fold speedup through use of an automated algorithm configuration procedure (ParamILS)

• new state of the art
  (winner of 2007 SMT Competition, QF BV category)

Levels of PbO:

Level 4: Make no design choice prematurely that cannot be justified compellingly.

Level 3: Strive to provide design choices and alternatives.

Level 2: Keep and expose design choices considered during software development.

Level 1: Expose design choices hardwired into existing code (magic constants, hidden parameters, abandoned design alternatives).

Level 0: Optimise settings of parameters exposed by existing software.

[Figure: the five PbO levels placed on a spectrum from Lo to Hi]

Success in optimising speed:

Application (design choices)                                   Speedup      PbO level
SAT-based software verification (Spear), 26                    4.5–500 ×    2–3
  Hutter, Babic, Hoos, Hu (2007)
AI Planning (LPG), 62                                          3–118 ×      1
  Vallati, Fawcett, Gerevini, Hoos, Saetti (2011)
Mixed integer programming (CPLEX), 76                          2–52 ×       0
  Hutter, Hoos, Leyton-Brown (2010)

... and solution quality:

University timetabling, 18 design choices, PbO level 2–3
→ new state of the art; UBC exam scheduling
  Fawcett, Chiarandini, Hoos (2009)

Machine learning / classification, 786 design choices, PbO level 0–1
→ outperforms specialised model selection & hyper-parameter optimisation methods from machine learning
  Thornton, Hutter, Hoos, Leyton-Brown (2012–13)

PbO enables . . .

• performance optimisation for different use contexts (some details later)

• adaptation to changing use contexts (see, e.g., life-long learning – Thrun 1996)

• self-adaptation while solving a given problem instance (e.g., Battiti et al. 2008; Carchrae & Beck 2005; Da Costa et al. 2008)

• automated generation of instance-based solver selectors (e.g., SATzilla – Leyton-Brown et al. 2003, Xu et al. 2008; Hydra – Xu et al. 2010; ISAC – Kadioglu et al. 2010; AutoFolio – Lindauer 2015)

• automated generation of parallel solver portfolios (e.g., Huberman et al. 1997; Gomes & Selman 2001; Hoos et al. 2012; Lindauer et al. 2017)

Cost & concerns

But what about ...

• Computational complexity?

• Cost of development?

• Limitations of scope?

Computationally too expensive?

Spear revisited:

• total configuration time on software verification benchmarks: ≈ 20 CPU days

• wall-clock time on a 10-CPU cluster: ≈ 2 days

• cost on Amazon Elastic Compute Cloud (EC2): 60.23 AUD (= 48 USD)

• 60.23 AUD pays for ...
  • 1h 42min of a typical software engineer in Australia
  • 3h 33min at minimum wage in Australia

Too expensive in terms of development?

Design and coding:

• tradeoff between performance/flexibility and overhead

• overhead depends on level of PbO

• traditional approach: cost from manual exploration of design choices!

Testing and debugging:

• design alternatives for individual mechanisms and components can be tested separately

→ effort linear (rather than exponential) in the number of design choices

Limited to the “niche” of NP-hard problem solving?

Some PbO-flavoured work in the literature:

• computing-platform-specific performance optimisation of linear algebra routines (Whaley et al. 2001)

• optimisation of sorting algorithms using genetic programming (Li et al. 2005)

• compiler optimisation (Pan & Eigenmann 2006; Cavazos et al. 2007; Fawcett et al. 2017)

• database server configuration (Diao et al. 2003)

Algorithm Configuration:

Methods

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues [coffee]
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Algorithm Configuration

• In a nutshell: optimization of free parameters
  – Which parameters? The ones you’d otherwise tune manually & more

• Examples of free parameters in various subfields of AI
  – Tree search (in particular for SAT): pre-processing, branching heuristics, clause learning & deletion, restarts, data structures, ...
  – Local search: neighbourhoods, perturbations, tabu length, annealing, ...
  – Genetic algorithms: population size, mating scheme, crossover operators, mutation rate, local improvement stages, hybridizations, ...
  – Machine learning: pre-processing, regularization (type & strength), minibatch size, learning rate schedules, optimizer & its parameters, ...
  – Deep learning (in addition): #layers (& layer types), #units/layer, dropout constants, weight initialization and decay, non-linearities, ...

Algorithm Parameters

Parameter types
  – Continuous, integer, ordinal
  – Categorical: finite domain, unordered, e.g. {a, b, c}

Parameter space has structure
  – E.g. parameter C is only active if heuristic H = h is used
  – In this case, we say C is a conditional parameter with parent H

Parameters give rise to a structured space of algorithms
  – Many configurations (e.g. 10^47)
  – Configurations often yield qualitatively different behaviour
→ Algorithm configuration (as opposed to “parameter tuning”)

The Algorithm Configuration Process

Algorithm Configuration – in More Detail

Algorithm Configuration – Full Formal Definition
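The formal problem statement on this slide did not survive extraction; the standard formulation from this literature, in LaTeX, is: given a parameterised algorithm $A$ with configuration space $\Theta$, a set (or distribution) $\mathcal{D}$ of problem instances, and a cost metric $m(\theta, \pi)$ (e.g., the runtime of $A(\theta)$ on instance $\pi$), find

$$\theta^{*} \in \arg\min_{\theta \in \Theta} \; \mathbb{E}_{\pi \sim \mathcal{D}} \big[\, m(\theta, \pi) \,\big].$$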

Distribution vs Set of Instances

Algorithm Configuration is a Useful Abstraction

Two different instantiations:

• Large improvements to solvers for many hard combinatorial problems
  – SAT, Max-SAT, MIP, SMT, TSP, ASP, time-tabling, AI planning, …
  – Competition winners for all of these rely on configuration tools

Increasingly popular (citation numbers: Google Scholar)

[Figure: citations per year, 2008–2016, rising to several hundred, for ParamILS (Hutter et al. 07+09), SMAC (Hutter et al. 11), I/F-Race (Balaprakash et al. 07, Birattari et al. 10, López-Ibáñez et al. 11) and GGA/GGA++ (Ansotegui et al. 09+15)]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Configurators have Two Key Components

• Component 1: which configuration to evaluate next?
  – Out of a large combinatorial search space
  – E.g., CPLEX: 76 parameters, 10^47 configurations

• Component 2: how to evaluate that configuration?
  – Evaluating the performance of a configuration is expensive
  – E.g., CPLEX: budget of 10000s per instance
  – Instances vary in hardness
    • Some take milliseconds, others days (for the default)
    • Improvement on a few instances might not mean much

Component 1: Which Configuration to Choose?

• For this component, we can consider a simpler problem: black-box function optimization

    min f(θ) over θ ∈ Θ
    (only observable behaviour of the black box: a query θ returns the value f(θ))

  – Only mode of interaction: query f(θ) at arbitrary θ ∈ Θ
  – Abstracts away the complexity of multiple instances
  – Θ is still a structured space
    • Mixed continuous/discrete
    • Conditional parameters
    • Still more general than “standard” continuous BBO [e.g., Hansen et al.]

Component 1: Which Configuration to Choose?

• Need to balance diversification and intensification
• The extremes:
  – Random search
  – Hill-climbing
• Stochastic local search (SLS)
• Population-based methods
• Model-based optimization
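For concreteness, the two extremes can be sketched in a few lines of Python; f is any black-box cost function, and sample / neighbours are user-supplied helpers (both names are illustrative):

import random

def random_search(f, sample, n_iters=100):
    # Pure diversification: independent random queries of the black box.
    best, best_cost = None, float("inf")
    for _ in range(n_iters):
        theta = sample()
        cost = f(theta)
        if cost < best_cost:
            best, best_cost = theta, cost
    return best, best_cost

def hill_climbing(f, start, neighbours, n_iters=100):
    # Pure intensification: only accept moves that improve.
    theta, cost = start, f(start)
    for _ in range(n_iters):
        cand = random.choice(neighbours(theta))
        cand_cost = f(cand)
        if cand_cost < cost:
            theta, cost = cand, cand_cost
    return theta, cost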

Sequential Model-Based Optimization

• Fit a (probabilistic) model of the function f

• Use that model to trade off exploitation vs exploration

• In the machine learning literature known as Bayesian optimization
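A minimal SMBO loop, sketched with a random-forest surrogate from scikit-learn; the uniform candidate sampling and the simple “mean minus standard deviation” acquisition rule are illustrative simplifications, not the exact choices of any particular system:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def smbo(f, dim, n_init=10, n_iters=50, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n_init, dim))                # initial random design
    y = np.array([f(x) for x in X])
    for _ in range(n_iters):
        model = RandomForestRegressor(n_estimators=50).fit(X, y)
        cands = rng.random((n_candidates, dim))
        per_tree = np.stack([t.predict(cands) for t in model.estimators_])
        mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
        x_next = cands[np.argmin(mu - sigma)]    # favour low mean, high uncertainty
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()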

Component 2: How to Evaluate a Configuration?

Back to general algorithm configuration

• General principle
  – Don’t waste too much time on bad configurations
  – Evaluate good configurations more thoroughly

Simplest Solution: Use Fixed N Instances

• Effectively treats the problem as black-box function optimization

• Issue: how large to choose N?
  – Too small: over-tuning to those instances
  – Too large: every function evaluation is slow

Racing Algorithms

• Compare two or more algorithms against each other
  – Perform one run for each configuration at a time
  – Discard configurations when dominated

[Image source: Maron & Moore, Hoeffding Races, NIPS 1994]

[Maron & Moore, NIPS 1994] [Birattari, Stützle, Paquete & Varrentrapp, GECCO 2002]
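In sketch form (plain Python; the fixed tolerance margin below stands in for the statistical test, e.g. a Hoeffding bound or F-test, used by the real racing methods):

import statistics

def race(configs, instances, run, margin=1.0):
    # run(config, instance) returns the observed cost of a single run.
    costs = {c: [] for c in configs}
    alive = list(configs)
    for instance in instances:
        for c in alive:
            costs[c].append(run(c, instance))
        means = {c: statistics.mean(costs[c]) for c in alive}
        best = min(means.values())
        # discard configurations that appear dominated so far
        alive = [c for c in alive if means[c] <= best + margin]
    return alive, costs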

Saving Time: Aggressive Racing

• Race new configurations against the best known
  – Discard poor new configurations quickly
  – No requirement for statistical domination

• The search component should allow returning to configurations discarded because they were “unlucky”

[Hutter, Hoos & Stützle, AAAI 2007]

Saving More Time: Adaptive Capping

Can terminate runs for poor configurations θ′ early:
  – Is θ′ better than θ?

• Example: RT(θ) = 20; once the run of θ′ exceeds 20 seconds, RT(θ′) > 20 is guaranteed

• Can terminate the evaluation of θ′ once it is guaranteed to be worse than θ

(only applicable when minimizing algorithm runtime)

[Hutter, Hoos, Leyton-Brown & Stützle, JAIR 2009]
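A minimal sketch of the idea, assuming the target run can be launched with a cutoff (run_with_cutoff is a hypothetical helper that returns the runtime, or None on timeout):

def challenger_beats_incumbent(run_with_cutoff, challenger, instances,
                               incumbent_total):
    # Adaptive capping when minimizing total runtime: abort the challenger
    # as soon as it can no longer beat the incumbent's total.
    total = 0.0
    for instance in instances:
        remaining = incumbent_total - total
        if remaining <= 0:
            return False               # already guaranteed worse
        runtime = run_with_cutoff(challenger, instance, cutoff=remaining)
        if runtime is None:            # capped: RT > remaining budget
            return False
        total += runtime
    return total < incumbent_total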

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Overview: Algorithm Configuration Systems

• Continuous parameters, single instances (black-box function optimization)
  – Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [Hansen et al. 2006–17]
  – Sequential Parameter Optimization (SPO) [Bartz-Beielstein et al. 2006]

• General algorithm configuration methods
  – ParamILS [Hutter et al. 2007, 2009; Blot et al. 2016]
  – Gender-based Genetic Algorithm (GGA) [Ansotegui et al. 2009, 2015]
  – Iterated F-Race [Birattari et al. 2002, 2010]
  – Sequential Model-based Algorithm Configuration (SMAC) [Hutter et al. 2011–17]
  – Distributed SMAC [Hutter et al. 2012–17]

The Baseline: Graduate Student Descent

Start with some configuration
repeat
  modify a single parameter
  if performance on a benchmark set degrades then
    undo modification
until no more improvement possible (or “good enough”)

The ParamILS Framework

Iterated Local Search in parameter configuration space:
→ performs a biased random walk over local optima

[Hutter, Hoos, Leyton-Brown & Stützle, AAAI 2007 & JAIR 2009]

The BasicILS(N) Algorithm

• Instantiates the ParamILS framework
• Uses a fixed number of N runs for each evaluation
  – Sample N instances from the given set (with repetitions)
  – Same instances (and seeds) for evaluating all configurations
  – Essentially treats the problem as black-box optimization

• How to choose N?
  – Too high: evaluating a configuration is expensive
    → optimization process is slow
  – Too low: noisy approximations of the true cost
    → poor generalization to test instances/seeds
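A skeleton of the iterated local search underlying BasicILS (plain Python; evaluate is the fixed-N cost estimator described above, neighbours enumerates the one-exchange neighbourhood, and the perturbation / restart settings are illustrative):

import random

def param_ils(evaluate, random_config, neighbours,
              n_iters=100, perturb_steps=3, restart_prob=0.01):
    def local_search(theta):
        current_cost, improved = evaluate(theta), True
        while improved:
            improved = False
            for nb in neighbours(theta):      # one-exchange neighbourhood
                if evaluate(nb) < current_cost:
                    theta, current_cost, improved = nb, evaluate(nb), True
                    break
        return theta

    incumbent = local_search(random_config())
    for _ in range(n_iters):
        if random.random() < restart_prob:
            theta = random_config()           # occasional random restart
        else:
            theta = incumbent
            for _ in range(perturb_steps):    # perturbation
                theta = random.choice(list(neighbours(theta)))
        theta = local_search(theta)
        if evaluate(theta) < evaluate(incumbent):
            incumbent = theta                 # acceptance criterion
    return incumbent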

Generalization to Test Set, Large N (N = 100)

[Figure: SAPS on a single QWH instance (same instance for training & test; only difference: seeds)]

Generalization to Test Set, Small N (N = 1)

[Figure: SAPS on a single QWH instance (same instance for training & test; only difference: seeds)]

BasicILS: Speed/Generalization Tradeoff

[Figure: test performance of SAPS on a single QWH instance]

The FocusedILS Algorithm

Aggressive racing: more runs for good configurations
  – Start with N(θ) = 0 for all configurations
  – Increment N(θ) whenever the search visits θ
  – “Bonus” runs for configurations that win many comparisons

Theorem: As the number of FocusedILS iterations → ∞, convergence to the optimal configuration.

  – Key ideas in the proof:
    1. The underlying ILS eventually reaches any configuration
    2. For N(θ) → ∞, the error in the cost approximations vanishes

FocusedILS: Speed/Generalization Tradeoff

[Figure: test performance of SAPS on a single QWH instance]

Speeding up ParamILS

Standard adaptive capping
  – Is θ′ better than θ?

• Example: RT(θ) = 20; once the run of θ′ exceeds 20 seconds, RT(θ′) > 20 is guaranteed

• Can terminate the evaluation of θ′ once it is guaranteed to be worse than θ

Theorem: Early termination of poor configurations does not change the ParamILS trajectory.

  – Often yields substantial speedups
  – Especially when the best configuration is much faster than the worst

[Hutter, Hoos, Leyton-Brown & Stützle, JAIR 2009]

Gender-based Genetic Algorithm (GGA)

• Genetic algorithm
  – Genome = parameter configuration
  – Combine the genomes of 2 parents to form an offspring

• Supports adaptive capping
  – Evaluate population members in parallel
  – Adaptive capping: can stop when the first k succeed

• Uses N instances to evaluate configurations
  – Increase N in each generation
  – Linear increase from Nstart to Nend

• Not recommended for small budgets
• User has to specify #generations ahead of time

[Ansotegui, Sellmann & Tierney, CP 2009 & IJCAI 2015]

F-Race and Iterated F-Race

• F-Race
  – Standard racing framework
  – F-test to establish that some configuration is dominated
  – Followed by pairwise t-tests if the F-test succeeds

• Iterated F-Race
  – Maintain a probability distribution over which configurations are good
  – Sample k configurations from that distribution & race them
  – Update the distributions with the results of the race

• Well-supported software package in R

[Birattari et al., GECCO 2002 & OR Perspectives 2016; Pérez Cáceres et al., LION 2017]

Model-Based Algorithm Configuration

SMAC: Sequential Model-Based Algorithm Configuration
  – Sequential model-based optimization & aggressive racing

repeat
  – construct a model to predict performance
  – use that model to select promising configurations
  – compare each selected configuration against the best known
until time budget exhausted

[Hutter, Hoos & Leyton-Brown, LION 2011]

SMAC: Aggressive Racing

• Similar racing component as FocusedILS
  – More runs for good configurations
  – Increase #runs for the incumbent over time

• Theorem for discrete configuration spaces: as the overall time budget for SMAC → ∞, convergence to the optimal configuration.

Powering SMAC: Empirical Performance Models

Given:
  – configuration space Θ
  – for each problem instance i: a vector x_i of feature values
  – observed algorithm runtime data: (θ1, x1, y1), …, (θn, xn, yn)

Find: a mapping m: (θ, x) ↦ y predicting A’s performance, y ≈ m(θ, x)

  – Rich literature on such performance prediction problems [see, e.g., Hutter, Xu, Hoos, Leyton-Brown, AIJ 2014, for an overview]
  – Here: use a model m based on random forests

Regression Trees: Fitting to Data

  – In each internal node: only store the split criterion used
  – In each leaf: store the mean of the runtimes

[Figure: example tree splitting on param3 ∈ {red} vs param3 ∈ {blue, green}, then on feature2 ≤ 3.5 vs feature2 > 3.5; leaves store mean runtimes such as 3.7 and 1.65]

Regression Trees: Predictions for New Inputs

E.g. x_{n+1} = (true, 4.7, red):
  – Walk down the tree, return the mean runtime stored in the leaf ⇒ 1.65

Random Forests: Sets of Regression Trees

Training
  – Draw T bootstrap samples of the data
  – For each bootstrap sample, fit a randomized regression tree

Prediction
  – Predict with each of the T trees
  – Return the empirical mean and variance across these T predictions

Complexity for N data points
  – Training: O(T · N log² N)
  – Prediction: O(T · log N)
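A sketch with scikit-learn’s random forest (the actual SMAC implementation uses its own forest variant; the training data here is a random stand-in, and categorical inputs would need encoding):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row of X encodes (configuration, instance features); y holds
# the observed (log) runtimes. Random stand-in data for illustration:
rng = np.random.default_rng(0)
X, y = rng.random((200, 10)), rng.random(200)

forest = RandomForestRegressor(n_estimators=10).fit(X, y)

x_new = rng.random((1, 10))
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
mean, var = per_tree.mean(), per_tree.var()   # empirical mean & variance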

Advantages of Random Forests

Automated selection of important input dimensions
  – Continuous, integer, and categorical inputs
  – Up to 138 features, 76 parameters
  – Can identify important feature and parameter subsets
    • Sometimes 1 feature and 2 parameters suffice! [Hutter, Hoos, Leyton-Brown, LION 2013]

Robustness
  – No need to optimize hyperparameters
  – Already good predictions with few training data points

SMAC: Averaging Across Multiple Instances

• Fit a random forest model

• Aggregate over instances by marginalization
  – Intuition: predict for each instance, then average
  – More efficient implementation in random forests

SMAC: Putting it all Together

Initialize with a single run for the default configuration
repeat
  1. Learn a random forest model from the data so far
  2. Aggregate over instances
  3. Use the model f to select promising configurations
  4. Race each selected configuration against the best known
until time budget exhausted

• Distributed SMAC [Hutter, Hoos & Leyton-Brown, 2012]
  – Maintain a queue of promising configurations
  – Race these against the best known on distributed worker cores

SMAC: Adaptive Capping

Terminate runs for poor configurations θ early:
  – Example: f(θ*) = 20; a capped run of θ only tells us f(θ) > 20
  – Lower bound on runtime → right-censored data point

[Hutter, Hoos & Leyton-Brown, BayesOpt 2011]

Experimental Evaluation

Compared SMAC to ParamILS and GGA
  – On 17 SAT and MIP configuration scenarios, same time budget

SMAC performed best
  – Improvements in test performance of the configurations returned
    • vs ParamILS: 0.93× – 2.25× (11/17 cases significantly better)
    • vs GGA: 1.01× – 2.76× (13/17 cases significantly better)

Wall-clock speedups in distributed SMAC
  – Almost perfect with up to 16 parallel workers
  – Up to 50-fold with 64 workers
    • Reductions in wall-clock time: 5h → 6–15 min; 2 days → 40 min – 2h

[Hutter, Hoos & Leyton-Brown, LION 2011]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

The Algorithm Configuration Process

What the user has to provide:

Parameter space declaration file:
  preproc {none, simple, expensive} [simple]
  alpha [1, 5] [2]
  beta [0.1, 1] [0.5]

Wrapper for command-line call:
  ./wrapper -inst X -timeout 30 -preproc none -alpha 3 -beta 0.7
  → e.g. “successful after 3.4 seconds”
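A wrapper along these lines can be a few lines of Python (a sketch; target_solver stands in for the real solver binary, and crash handling is minimal):

import subprocess, sys, time

# called as: ./wrapper -inst X -timeout 30 -preproc none -alpha 3 -beta 0.7
args = dict(zip(sys.argv[1::2], sys.argv[2::2]))
instance = args.pop("-inst")
timeout = float(args.pop("-timeout"))

command = ["target_solver", instance]     # stand-in for the real binary
for flag, value in args.items():          # pass remaining parameters through
    command += [flag, value]

start = time.time()
try:
    subprocess.run(command, timeout=timeout, check=True)
    print("successful after %.1f seconds" % (time.time() - start))
except subprocess.TimeoutExpired:
    print("timeout after %.1f seconds" % timeout)
except subprocess.CalledProcessError:
    print("crashed")                      # detect & penalize crashes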

Example: Running SMAC

wget http://www.cs.ubc.ca/labs/beta/Projects/SMAC/smac-v2.10.03-master-778.tar.gz
tar xzvf smac-v2.10.03-master-778.tar.gz
cd smac-v2.10.03-master-778

./smac
  (without arguments: prints a usage screen)

./smac --seed 0 --scenarioFile example_scenarios/spear/spear-scenario.txt

The scenario file holds:
  – location of parameter file, wrapper & instances
  – objective function (here: minimize average runtime)
  – configuration budget (here: 30 s)
  – maximal captime per target run (here: 5 s)
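An illustrative scenario file matching that description might look as follows (the key names vary between SMAC versions; treat these as placeholders, not the exact syntax):

algo = ruby spear_wrapper.rb
paramfile = spear-params.txt
instance_file = instances-train.txt
test_instance_file = instances-test.txt
run_obj = runtime          # objective: minimize average runtime
cutoff_time = 5            # maximal captime per target run (s)
wallclock_limit = 30       # configuration budget (s)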

Output of a SMAC run

[…]
[INFO] ***** Runtime Statistics *****
Incumbent ID: 12 (0x22BB8)
Number of Runs for Incumbent: 43
Number of Instances for Incumbent: 5
Number of Configurations Run: 42
Performance of the Incumbent: 0.012555555555555556
Configuration Time Budget used: 30.589647351000067 s (101%)
Sum of Target Algorithm Execution Times (treating minimum value as 0.1): 24.70000 s
CPU time of Configurator: 5.889042742 s
[INFO] **********************************************
[INFO] Total Objective of Final Incumbent 12 (0x22BB8) on training set: 0.012555555555555556; on test set: 0.014499999999999999
[INFO] Sample Call for Final Incumbent 12 (0x22BB8)
cd /ubc/cs/home/h/hutter/tmp/smac-v2.06.00-master-615/example_scenarios/spear; ruby spear_wrapper.rb instances/qcplin2006.10408.cnf 0 5.0 2147483647 3282095 -sp-update-dec-queue '0' -sp-rand-var-dec-scaling '0.3528466348383826' -sp-clause-decay '1.713857938112484' -sp-variable-decay '1.461422623379798' -sp-orig-clause-sort-heur '7' -sp-rand-phase-dec-freq '0.05' -sp-clause-del-heur '0' -sp-learned-clauses-inc '1.452683835620401' -sp-restart-inc '1.6481745669620091' -sp-resolution '0' -sp-clause-activity-inc '0.7121640599232154' -sp-learned-clause-sort-heur '12' -sp-var-activity-inc '0.9358501810374242' -sp-rand-var-dec-freq '0.0001' -sp-use-pure-literal-rule '1' -sp-learned-size-factor '0.27995062371127827' -sp-var-dec-heur '16' -sp-phase-dec-heur '6' -sp-rand-phase-scaling '1.0424648235977578' -sp-first-restart '31'

Decision #1: Configuration Budget & Captime

• Configuration budget
  – Dictated by your resources & needs
    • E.g., start configuration before leaving work on Friday
  – The longer the better (but diminishing returns)
    • Rough rule of thumb: typically time for at least 1000 target runs
    • But we have also achieved good results with 50 target runs in some cases

• Maximal captime per target run
  – Dictated by your needs (typical instance hardness, etc.)
  – Too high: slow progress
  – Too low: possible overtuning to easy instances
  – For SAT etc., often use 300 CPU seconds

Decision #2: Choosing the Training Instances

• Representative instances, moderately hard
  – Too hard: won’t solve many instances, no traction
  – Too easy: will the results generalize to harder instances?
  – Rule of thumb: a mix of hardness ranges
    • Roughly 75% of instances solvable by the default within the maximal captime

• Enough instances
  – The more training instances, the better
  – Very homogeneous instance sets: 50 instances might suffice
  – Preferably ≥ 300 instances, better even ≥ 1000 instances

Decision #2: Choosing the Training Instances

• Split the instance set into training and test sets
  – Configure on the training instances → configuration θ*
  – Run (only) θ* on the test instances
    • Unbiased estimate of performance

Pitfall: configuring on your test instances
  That’s from the dark ages

Fine practice: do multiple configuration runs and pick the θ* with the best training performance
  Not (!!) the best on the test set

Decision #2: Choosing the Training Instances

• Works much better on homogeneous benchmarks
  – Instances that have something in common
    • E.g., come from the same problem domain
    • E.g., use the same encoding
  – One configuration is likely to perform well on all instances

Pitfall: configuration on too heterogeneous sets
  There often is no single great overall configuration (but see algorithm selection etc., later in the tutorial)

Decision #3: How Many Parameters to Expose?

• Suggestion: all parameters you don’t know to be useless
  – More parameters → larger gains possible
  – More parameters → harder problem
  – Max. #parameters tackled so far: 768 [Thornton, Hutter, Hoos & Leyton-Brown, KDD ’13]

• With more time you can search a larger space

Pitfall: including parameters that change the problem
  E.g., the optimality threshold in MIP solving
  E.g., how much memory to allow the target algorithm

Decision #4: How to Wrap the Target Algorithm

• Do not trust any target algorithm
  – Will it terminate in the time you specify?
  – Will it correctly report its time?
  – Will it never use more memory than specified?
  – Will it be correct with all parameter settings?

Pitfall: blindly minimizing target algorithm runtime
  Typically, you will minimize the time to crash

Good practice: wrap target runs with a tool controlling time and memory (e.g., runsolver [Roussel et al., ’11])

Good practice: verify correctness of target runs
  Detect crashes & penalize them

Further Best Practices and Pitfalls to Avoid

• Pitfalls
  – Using different wrappers for comparing different configurators
    • Often causes subtle bugs that completely invalidate the comparison
  – Not terminating algorithm runs properly
    • Use a generic wrapper to handle this for you (we provide one in Python)
  – File-system issues when running many parallel experiments

• Best practices
  – Choose configuration spaces dependent on the budget
    • If you have time for 10 evaluations, don’t configure 100s of parameters
  – Realistic and/or reproducible running time metrics
    • CPU vs wall-clock time vs MEMS
  – Compare configurators on existing, open-source benchmarks
    • E.g., use AClib (in version 2: https://bitbucket.org/mlindauer/aclib2)

[Eggensperger, Lindauer & Hutter, arXiv 2017]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Portfolio-Based Algorithm Selection

• Software Development Support & Further Directions

Applications of Algorithm Configuration

Helped win competitions
  – SAT: since 2009
  – ASP: since 2009
  – IPC: since 2011
  – Time-tabling: 2007
  – SMT: 2007

Other academic applications
  – Mixed integer programming (MIP)
  – TSP & Quadratic Assignment Problem
  – Game theory: kidney exchange
  – Linear algebra subroutines
  – Improving Java garbage collection
  – ML hyperparameter optimization
  – Deep learning

Industrial applications (logos shown): FCC spectrum auction; mixed integer programming; analytics & optimization; social gaming; scheduling and resource allocation

Back to the Spear Example

Spear [Babic, 2007]
  – 26 parameters
  – 8.34 × 10^17 configurations

Ran ParamILS, 2 to 3 days × 10 machines
  – On a training set from each of 2 distributions

Compared to default (1 week of manual tuning)
  – On a disjoint test set from each distribution
  – 4.5-fold speedup / 500-fold speedup
    ⇒ won the QF_BV category in the 2007 SMT competition

[Figure: scatter plots of optimised vs default runtimes, log–log scale; points below the diagonal are speedups]

[Hutter, Babic, Hu & Hoos, FMCAD 2007]

Other Examples of PbO for SAT

• SATenstein [KhudaBukhsh, Xu, Hoos & Leyton-Brown, IJCAI 2009 & AIJ 2016]
  – Combined ingredients from existing solvers
  – 54 parameters, over 10^12 configurations
  – Speedup factors: 1.6× to 218×

• Captain Jack [Tompkins & Hoos, SAT 2011]
  – Explored a completely new design space
  – 58 parameters, over 10^50 configurations
  – After configuration: best known solver for 3sat10k and IL50k

Configurable SAT Solver Competition (CSSC)

• Annual SAT competition
  – Scores SAT solvers by their performance across instances
  – Medals for the best average performance with solver defaults
    • Misleading results: implicitly highlights solvers with good defaults

• CSSC 2013 & 2014
  – Better reflects an application setting: homogeneous instances → can automatically optimize parameters
  – Medals for the best performance after configuration

[Hutter, Balint, Bayless, Hoos & Leyton-Brown 2013]

CSSC Result #1

• Solver performance often improved a lot:
  – Lingeling on CircuitFuzz: timeouts 119 → 107
  – Clasp on n-queens: timeouts 211 → 102
  – probSAT on uniform random 5-SAT: timeouts 250 → 0

[Hutter, Lindauer, Balint, Bayless, Hoos & Leyton-Brown 2014]

CSSC Result #2

• Automated configuration changed algorithm rankings
  – Example: random SAT+UNSAT category in 2013

Solver             CSSC ranking    Default ranking
Clasp              1               6
Lingeling          2               4
Riss3g             3               5
Solver43           4               2
Simpsat            5               1
Sat4j              6               3
For1-nodrup        7               7
gNovelty+GCwa      8               8
gNovelty+Gca       9               9
gNovelty+PCL       10              10

[Hutter, Lindauer, Balint, Bayless, Hoos & Leyton-Brown 2014]

Real-World Application: FCC Spectrum Auction

• Wireless frequency spectra: demand increases
  – The US Federal Communications Commission (FCC) is currently holding an auction
  – Expected net revenue: $10 billion to $40 billion

• Key computational problem: feasibility testing based on interference constraints
  – A hard graph colouring problem
  – 2991 stations (nodes) & 2.7 million interference constraints
  – Need to solve many different instances
  – More instances solved: higher revenue

• Best solution: based on SAT solving & configuration with SMAC
  – Improved #instances solved from 73% to 99.6%

[Frechette, Newman & Leyton-Brown, AAAI 2016]

Configuration of a Commercial MIP Solver

Mixed Integer Programming (MIP)

Commercial MIP solver: IBM ILOG CPLEX
  – Leading solver for 15 years
  – Licensed by over 1000 universities and 1300 corporations
  – 76 parameters, 10^47 configurations

Minimizing runtime to optimal solution
  – Speedup factor: 2× to 50×
  – Later work: speedups up to 10,000×

Minimizing the optimality gap reached
  – Gap reduction factor: 1.3× to 8.6×

[Hutter, Hoos & Leyton-Brown, CPAIOR 2010]

Comparison to the CPLEX Tuning Tool

CPLEX tuning tool
  – Introduced in version 11 (late 2007, after ParamILS)
  – Evaluates predefined good configurations, returns the best one
  – Required runtime varies (from < 1 h to weeks)

ParamILS: anytime algorithm
  – At each time step, keeps track of its incumbent

[Figure: ParamILS vs CPLEX tuning tool (lower is better); 2-fold speedup in our worst result, 50-fold speedup in our best result]

[Hutter, Hoos & Leyton-Brown, CPAIOR 2010]

Configuration of Machine Learning Algorithms

• Machine learning has celebrated substantial successes
• But it requires human machine learning experts to
  – Preprocess the data
  – Perform feature selection
  – Select a model family
  – Optimize hyperparameters
  – …

• AutoML: taking the human expert out of the inner loop
  – Yearly AutoML workshops at ICML since 2014
  – PbO applied to machine learning

Auto-WEKA

WEKA [Witten et al., 1999–current]
  – the most widely used off-the-shelf machine learning package
  – over 20000 citations on Google Scholar

Java implementation of a broad range of methods
  – 27 base classifiers (with up to 10 parameters each)
  – 10 meta-methods
  – 2 ensemble methods
  – 3 feature search methods & 8 feature evaluators

Different methods work best on different datasets
  – Want a true off-the-shelf solution

[Thornton, Hutter, Hoos & Leyton-Brown, KDD 2013]

WEKA’s configuration space

Base classifiers
  – 27 choices, each with up to 10 subparameters
  – Coarse discretization: about 10^8 instantiations

Hierarchical structure on top of the base classifiers

WEKA’s configuration space (cont’d)

Feature selection
  – Search method: which feature subsets to evaluate
  – Evaluation method: how to evaluate feature subsets in the search
  – Both methods have subparameters → about 10^7 instantiations

In total: 768 parameters, 10^47 configurations

Auto-WEKA: Results

• Auto-WEKA performed better than the best base classifier
  – Even when the “best base classifier” is determined by an oracle
  – In 6/21 datasets more than 10% reductions in relative error

• Comparison to full grid search
  – Union of grids over the parameters of all 27 base classifiers
  – Auto-WEKA was 100 times faster
  – Auto-WEKA had better test performance in 15/21 cases

• Auto-WEKA based on SMAC vs TPE [Bergstra et al., NIPS 2011]
  – SMAC yielded better CV performance in 19/21 cases
  – SMAC yielded better test performance in 14/21 cases
  – Differences usually small, in 3 cases substantial (SMAC better)

Auto-WEKA as a WEKA plugin

• Command-line example:
  java -cp autoweka.jar weka.classifiers.meta.AutoWEKAClassifier -t iris.arff -timeLimit 5 -no-cv

• GUI example: [screenshot]

Auto-sklearn

• Followed and extended Auto-WEKA’s AutoML approach
• Scikit-learn optimized by SMAC, plus
  – Meta-learning to warmstart Bayesian optimization
  – Automated post-hoc ensemble construction to combine the models Bayesian optimization evaluated

[Feurer, Klein, Eggensperger, Springenberg, Blum, Hutter, NIPS 2015]

Auto-sklearn’s configuration space

• Scikit-learn [Pedregosa et al., 2011–current] instead of WEKA
  – 15 classifiers (with a total of 59 hyperparameters)
  – 13 feature preprocessors (42 hyperparameters)
  – 4 data preprocessors (5 hyperparameters)

• 110 hyperparameters vs 768 in Auto-WEKA

Auto-sklearn: Ready for Prime Time

• Winning approach in the 14-month AutoML challenge
  – Best-performing approach in the auto track and the human track
  – Won both tracks in both final phases
  – Vs 150 teams of human experts

• Trivial to use (see below)

• Available online: https://github.com/automl/auto-sklearn
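The “trivial to use” code snippet did not survive extraction; the call pattern from the auto-sklearn documentation is essentially a drop-in scikit-learn estimator (assuming the usual X_train / y_train / X_test arrays):

import autosklearn.classification

cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)          # runs SMAC + ensembling internally
predictions = cls.predict(X_test)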

PbO for Deep Learning

• What is deep learning?
  – Neural networks with many layers

• Why is there so much excitement about it?
  – Dramatically improved the state of the art in many areas, e.g.,
    • Speech recognition
    • Image recognition
  – Automatic learning of representations → no more manual feature engineering

• What changed?
  – Larger datasets
  – Better regularization methods, e.g., dropout [Hinton et al., 2012]
  – Fast GPU implementations [Krizhevsky et al., 2012]

[Image sources: Krizhevsky et al., 2012; Le et al., 2012]

Deep Learning is Sensitive to Many Choices

[Figure: a convolutional network classifying dog vs cat]

Choice of network architecture:
  #convolutional layers, #fully connected layers, #units per layer, kernel sizes, …

… and 20–30 other numerical choices:
  learning rate schedule (initialization, decay, adaptation), momentum, batch normalization, batch size, #epochs, dropout rates, weight initializations, weight decay, …

Auto-Net for Computer Vision Data

• Application: object recognition

• Parameterized the Caffe framework [Jia, 2013]
  – Convolutional neural network
  – 9 network hyperparameters
  – 12 hyperparameters per layer, up to 6 layers
  – In total 81 hyperparameters

• Results for CIFAR-10
  – New best result for CIFAR-10 without data augmentation
  – SMAC outperformed TPE (the only other applicable hyperparameter optimizer)

[Domhan, Springenberg, Hutter, IJCAI 2015]

Auto-Net in the AutoML Challenge

• Clearly won a dataset of the AutoML challenge
  – 54491 data points, 5000 features, 18 classes

• Unstructured data → fully-connected network
  – Up to 5 layers (with 3 layer hyperparameters each)
  – 14 network hyperparameters, in total 29 hyperparameters
  – Optimized for 18 h on 5 GPUs

• Result (on private test set)
  – AUC 90%
  – All other (manual) approaches < 80%

Speedups by Prediction of Learning Curves

• Humans can look inside the black box
  – They can predict the final performance of a target algorithm run early
  – After a few epochs of stochastic gradient descent
  – Stop if not promising

• We automated that heuristic
  – Fitted a linear combination of 22 parametric models
  – MCMC to preserve uncertainty over model parameters
  – Stopped poor runs early: overall 2-fold speedup

[Domhan, Springenberg & Hutter, IJCAI 2015]

[Figure: SVM error landscapes over (log C, log γ) for subset sizes s_max/128, s_max/16, s_max/4 and s_max]

Speedups by Reasoning over Data Subsets

• Problem: training is very slow for large datasets

• Solution approach: scaling up from subsets of the given data

• Example: SVM
  – Computational cost grows quadratically in dataset size
  – Error shrinks smoothly with the subset size s

[Klein, Bartels, Falkner, Hennig, Hutter, arXiv 2016]

Speedups by Reasoning over Data Subsets

• 10–100× speedup for optimizing SVM hyperparameters
• 5–10× speedup for convolutional neural networks

[Figure: error vs optimizer budget in seconds]

[Klein, Bartels, Falkner, Hennig, Hutter, arXiv 2016]

Summary of Algorithm Configuration

• Algorithm configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies

• Useful abstraction with many (!) applications
• Often better performance than human experts
• Much less human expert time required

Links to all our code: http://ml4aad.org

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration

• Portfolio-Based Algorithm Selection
  – SATzilla: a framework for algorithm selection
  – Hydra: automatic portfolio construction

• Software Development Tools and Further Directions

Motivation: no single great configuration exists

• Heterogeneous instance distributions
  – Even the best overall configuration is not great; e.g.:

    Configuration    Instance type 1    Instance type 2
    #1               1s                 1000s
    #2               1000s              1s
    #3               100s               100s

• No single best solver
  – E.g., SAT solving: different solvers win different categories
  – Virtual best solver (VBS) much better than the single best solver (SBS)

Algorithm portfolios

Exploiting complementary strengths of different algorithms:

• Parallel portfolios [Huberman et al. 1997]

• Algorithm schedules [Sayag et al. 2006]

• Algorithm selection [Rice 1976]

[Figure: in a parallel portfolio, an instance is run on all algorithms at once; in an algorithm schedule, algorithms are run in sequence with time slices; in algorithm selection, a feature extractor feeds an algorithm selector that picks one algorithm per instance]

Portfolios have been successful in many areas
(* Algorithm Selection, † Sequential Execution, ‡ Parallel Execution)

• Satisfiability:
  – SATzilla*† [various coauthors, cited in the following slides; 2003–17]
  – 3S*† [Sellmann 2011]
  – ppfolio‡ [Roussel 2011]
  – claspfolio* [Gebser, Kaminski, Kaufmann, Schaub, Schneider, Ziller 2011]
  – aspeed†‡ [Kaminski, Hoos, Schaub, Schneider 2012]

• Constraint Satisfaction:
  – CPHydra*† [O’Mahony, Hebrard, Holland, Nugent, O’Sullivan 2008]

Portfolios have been successful in many areas
(* Algorithm Selection, † Sequential Execution, ‡ Parallel Execution)

• Planning:
  – FDStoneSoup† [Helmert, Röger, Karpas 2011]

• Mixed Integer Programming:
  – ISAC* [Kadioglu, Malitsky, Sellmann, Tierney 2010]
  – MIPzilla*† [Xu, Hutter, Hoos, Leyton-Brown 2011]

• .. and this is just the tip of the iceberg:
  – http://dl.acm.org/citation.cfm?id=1456656 [Smith-Miles 2008]
  – http://4c.ucc.ie/~larsko/assurvey [Kotthoff 2012]

Overview

• Programming by Optimization (PbO): Motivation and Introduction

• Algorithm Configuration

• Portfolio-Based Algorithm Selection
  – SATzilla: a framework for algorithm selection
  – Hydra: automatic portfolio construction

• Software Development Tools and Further Directions

SATzilla: the early core approach
[Leyton-Brown, Nudelman, Andrew, J. McFadden, Shoham 2003]
[Nudelman, Leyton-Brown, Devkar, Shoham, Hoos 2004]

• Training (part of algorithm development)
  – Build a statistical model to predict the runtime of each component algorithm

• Test (for each new instance)
  – Predict the performance of each algorithm
  – Pick the algorithm predicted to be best

• Good performance in SAT competitions
  – 2003: 2 silver, 1 bronze medals
  – 2004: 2 bronze medals
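The core recipe in sketch form, with one scikit-learn runtime model per component solver (stand-in models; the actual SATzilla models evolved over the years, as the following slides describe):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_selector(features, runtimes):
    # features: one row of instance features per training instance;
    # runtimes: one column of observed runtimes per component solver.
    return [RandomForestRegressor(n_estimators=50).fit(features, runtimes[:, j])
            for j in range(runtimes.shape[1])]

def select(models, instance_features):
    # Predict each solver's runtime and pick the predicted best.
    x = instance_features.reshape(1, -1)
    predictions = [m.predict(x)[0] for m in models]
    return int(np.argmin(predictions))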

SATzilla (stylized version)

• Given:
  – training set of instances
  – performance metric
  – candidate solvers
  – portfolio builder (incl. instance features)

• Training:
  – collect performance data
  – learn a model for selecting among solvers

• At runtime:
  – evaluate the model
  – run the selected solver

[Figure: metric, training set and candidate solvers feed the portfolio builder, which outputs a portfolio-based algorithm selector; a novel instance is mapped to the selected solver]

SAT Instance Features (2003–14)

Over 100 features, including:
• Instance size (clauses, variables, clauses/variables, …)
• Syntactic properties (e.g., positive/negative clause ratio)
• Statistics of various constraint graphs
  – factor graph
  – clause–clause graph
  – variable–variable graph
• Knuth’s search space size estimate
• Tree search probing
• Local search probing
• Linear programming relaxation

SATzilla 2007

• Substantially extended features

• Early algorithm schedule: identify a set of “presolvers” and a schedule for running them
  – For every choice of two presolvers + captimes, run the entire SATzilla pipeline and evaluate overall performance
  – Keep the choice that yields the best performance
  – For later steps: discard instances solved by this presolving schedule

• Identify a “backup solver”: the SBS on the remaining data
  – Needed in case feature computation crashes

• 2007 SAT competition: 3 gold, 1 silver, 1 bronze medals

[Xu, Hutter, Hoos & Leyton-Brown, CP 2007; JAIR 2008]

SATzilla 2009

• Robustness: selection of the best subset of component solvers
  – Consider every subset of the given solver set
    • omitting weak solvers prevents their accidental selection
    • conditioned on the choice of presolvers
    • computationally cheap: models decompose across solvers
  – Keep the subset that achieves the best performance

• Fully automated procedure
  – optimizes loss on a validation set

• 2009 SAT competition: 3 gold, 2 silver medals

[Xu, Hutter, Hoos & Leyton-Brown, CP 2007; JAIR 2008]

SATzilla 2011 and later: cost-sensitive DFs

• How it works:
  – Build a classifier to determine which algorithm to prefer between each pair of algorithms in the given portfolio
  – Loss function: cost of misclassification
    • Decision forests and support vector machines have cost-sensitive variants

• Classifiers vote for different algorithms; select the algorithm with the most votes
  – Advantage: selection is a classification problem
  – Advantage: big and small errors are treated differently

• 2011 SAT competition: entered the Evaluation Track (more later)

[Xu, Hutter, Hoos & Leyton-Brown, SAT 2012]

2012 SAT Challenge: Application

[Results plot; * interacting multi-engine solvers: like portfolios, but with richer interaction between solvers]

2012 SAT Challenge: Hard Combinatorial

[Results plot]

2012 SAT Challenge: Random

[Results plot]

2012 SAT Challenge: Sequential Portfolio

• 3S deserves mentioning, but didn’t rank officially [Kadioglu, Malitsky, Sabharwal, Samulowitz, Sellmann, 2011]
  – Disqualified on a technicality
    • chose a buggy solver that returned an incorrect result
    • an occupational hazard for portfolios!
  – Overall performance nearly as strong as SATzilla

2013 onwards

• Algorithm selection: a victim of its own success

• 2013: “The emphasis of SAT Competition 2013 is on evaluation of core solvers:”
  – Single-core portfolios of > 2 solvers not eligible
  – One “open track” allowing parallel solvers, portfolios, etc.
  – That open track was dominated by portfolios

• 2014 onwards
  – “SAT Competition 2014 only allows submission of core solvers”
  – Portfolio researchers started their own competition: the ICON Algorithm Selection Challenge

AutoFolio: PbO for Algorithm Selection

• Define a general space of algorithm selection methods
  – AutoFolio’s configuration space: 54 choices

• Use algorithm configuration to select the best instantiation
  – Partition the training benchmark instances into 10 folds
  – Use SMAC to find the algorithm selection approach & its hyperparameters that optimize CV performance

[Lindauer, Hoos, Hutter & Schaub, JAIR 2015]

ICON Algorithm Selection Challenge 2015

• Ingredients of an algorithm selection (AS) benchmark
  – A set of solvers
  – A set of benchmark instances (split into training & test)
  – Measured performance of all solvers on all instances

• Algorithm selection competition:
  – 13 AS benchmarks from SAT, MaxSAT, CSP, ASP, QBF, Premarshalling
  – 9 competitors using regression, classification, clustering, k-NN, etc.

• Winning algorithms in the 3 tracks:
  – Penalized average runtime: AutoFolio
  – Number of instances solved: AutoFolio
  – Frequency of selecting the best algorithm: SATzilla
  – Overall winner: SATzilla (2nd in the first two tracks)

Try it yourself!

• SATzilla and AutoFolio are freely available online:
  http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/
  http://ml4aad.org/autofolio

• You can try them for your problem
  – we have features for SAT, MIP, AI planning and TSP
  – you need to provide features for other domains
    • in many cases, the general ideas behind our features apply
    • you can also make features by reducing your problem to e.g. SAT and computing the SAT features

Automatically Configuring Algorithms for Portfolio-Based Selection
Xu, Hoos, Leyton-Brown (2010); Kadioglu et al. (2010)

Note:

• SATzilla builds an algorithm selector based on a given set of SAT solvers
  but: success entirely depends on the quality of the given solvers

• Automated configuration produces solvers that work well on average on a given set of SAT instances (e.g., SATenstein – KhudaBukhsh, Xu, Hoos, Leyton-Brown 2009)
  but: may have to settle for compromises for broad, heterogeneous instance sets

Idea: Combine the two approaches → portfolio-based selection from a set of automatically constructed solvers

Combined Configuration + Selection

[Figure: a single parametric algorithm yields multiple configurations; a feature extractor and selector choose among them per instance]

Approach #1:

1. build solvers for various types of instances using automated algorithm configuration

2. construct a portfolio-based selector from these

Problem: requires suitably defined sets of instances

Solution: automatically partition a heterogeneous instance set

Instance-specific algorithm configuration (ISAC)

Kadioglu, Malitsky, Sellmann, Tierney (2010); Malitsky, Sellmann (2012)

1. cluster training instances based on features (using G-means)

2. configure the given parameterised algorithm independently for each cluster (using GGA)

3. construct a portfolio-based selector from the resulting configurations (using distance to cluster centroids)

Drawback: instance features may not correlate well with the impact of algorithm parameters on performance (e.g., uninformative features)

Approach #2:

Key idea: Augment an existing selector AS by targeting instances on which AS performs poorly
(cf. Leyton-Brown et al. 2003; Leyton-Brown et al. 2009)

• interleave configuration and selector construction

• in each iteration, determine the configuration that complements the current selector best

Advantages:

• any-time behaviour: iteratively adds configurations

• desirable theoretical guarantees (under idealising assumptions)

Hydra
Xu, Hoos, Leyton-Brown (2010); Xu, Hutter, Hoos, Leyton-Brown (2011)

1. configure the given target algorithm A on the complete instance set I
   → configuration A1; selector AS1 (always selects A1)

2. configure a new copy of A on I such that the performance of selector AS := AS1 + Anew is optimised
   → configuration A2; selector AS2 := AS1 + A2 (selects from {A1, A2})

3. configure a new copy of A on I such that the performance of selector AS := AS2 + Anew is optimised
   → configuration A3; selector AS3 := AS2 + A3 (selects from {A1, A2, A3})

...
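In sketch form (Python), with configure standing in for one configurator run whose objective is the oracle (perfect-selector) performance of the portfolio extended by the candidate configuration, as described in the note below:

def oracle_cost(portfolio, instances, cost):
    # Perfect-selector (VBS) performance: per-instance best of the portfolio.
    return sum(min(cost(a, instance) for a in portfolio)
               for instance in instances)

def hydra(configure, instances, cost, n_iters=7):
    portfolio = []
    for _ in range(n_iters):
        # objective for the configurator: marginal improvement over
        # the current portfolio under oracle selection
        objective = lambda theta: oracle_cost(portfolio + [theta],
                                              instances, cost)
        portfolio.append(configure(objective))
    return portfolio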

Note:

• effectively adds the A with maximal marginal contribution in each iteration

• estimate marginal contribution using a perfect selector (oracle)
  → avoids costly construction of selectors during configuration

• works well using FocusedILS for configuration, *zilla for selection (but can use other configurators, selectors)

• can be further improved by adding multiple configurations per iteration; using performance estimates from the configurator

Results on SAT:

• target algorithm: SATenstein-LS (KhudaBukhsh et al. 2009)

• 6 well-known benchmark sets of SAT instances (application, crafted, random)

• 7 iterations of Hydra

• 10 configurator runs per iteration, 1 CPU day each

Results on mixture of 6 benchmark sets

[Figure: scatter plot of Hydra[BM, 7] PAR score vs Hydra[BM, 1] PAR score, log–log scale]

Results on mixture of 6 benchmark sets

[Figure: PAR score vs number of Hydra steps (0–8), for Hydra[BM] training, Hydra[BM] test, SATenFACT on training and SATenFACT on test]

Note:

• good results also for MIP (CPLEX) (Xu, Hutter, Hoos, Leyton-Brown 2011)

• the idea underlying Hydra can also be applied to automatically construct parallel algorithm portfolios from a single parameterised target algorithm (Hoos, Leyton-Brown, Schaub, Schneider 2012–14)

Software Development Support

and Further Directions

Software development in the PbO paradigm

[Figure: PbO-<L> source(s) and a design space description are fed to the PbO-<L> weaver, producing parametric <L> source(s); the PbO design optimiser, using benchmark inputs and the use context, produces instantiated <L> source(s) and finally the deployed executable]

Design space specification

Option 1: use language-specific mechanisms

• command-line parameters

• conditional execution

• conditional compilation (ifdef)

Option 2: generic programming language extension

Dedicated support for . . .

• exposing parameters

• specifying alternative blocks of code

Advantages of generic language extension:

• reduced overhead for the programmer

• clean separation of design choices from other code

• dedicated PbO support in software development environments

Key idea:

• augmented sources: PbO-Java = Java + PbO constructs, . . .

• tool to compile down into the target language: weaver


Exposing parameters

...
numerator -= (int) (numerator / (adjfactor+1) * 1.4);
...

becomes

...
##PARAM(float multiplier=1.4)
numerator -= (int) (numerator / (adjfactor+1) * ##multiplier);
...

• parameter declarations can appear at arbitrary places (before or after first use of the parameter)

• access to parameters is read-only (values can only be set/changed via command line or config file)

Specifying design alternatives

• Choice: set of interchangeable fragments of code that represent design alternatives (instances of the choice)

• Choice point: location in a program at which a choice is available

##BEGIN CHOICE preProcessing
<block 1>
##END CHOICE preProcessing

Named design alternatives for the same choice:

##BEGIN CHOICE preProcessing=standard
<block S>
##END CHOICE preProcessing

##BEGIN CHOICE preProcessing=enhanced
<block E>
##END CHOICE preProcessing

The same choice can appear at multiple choice points:

##BEGIN CHOICE preProcessing
<block 1>
##END CHOICE preProcessing

...

##BEGIN CHOICE preProcessing
<block 2>
##END CHOICE preProcessing

Choices can be nested:

##BEGIN CHOICE preProcessing
<block 1a>
##BEGIN CHOICE extraPreProcessing
<block 2>
##END CHOICE extraPreProcessing
<block 1b>
##END CHOICE preProcessing


The Weaver

transforms PbO-<L> code into <L> code (<L> = Java, C++, . . . )

• parametric mode:
  – expose parameters
  – make choices accessible via (conditional, categorical) parameters

• (partial) instantiation mode:
  – hardwire (some) parameters into code (expose others)
  – hardwire (some) choices into code (make others accessible via parameters)

The road ahead

• Support for PbO-based software development
  – Weavers for PbO-C, PbO-C++, PbO-Java
  – PbO-aware development platforms

• Improved / integrated PbO design optimiser

• Debugging and performance analysis tools

• Best practices

• Many further applications

• Scientific insights

Which choices matter?

Observation: Some design choices matter more than others, depending on . . .

• the algorithm under consideration

• the given use context

Knowledge of which choices / parameters matter may . . .

• guide algorithm development

• facilitate configuration

3 recent approaches:

• Forward selection based on empirical performance models
  Hutter, Hoos, Leyton-Brown (2013)

• Functional ANOVA based on empirical performance models
  Hutter, Hoos, Leyton-Brown (2014)

• Ablation analysis
  Fawcett & Hoos (2013–16)

Functional ANOVA based on empirical performance models
Hutter, Hoos, Leyton-Brown (2014)

Key idea:

• build a regression model of algorithm performance as a function of all input parameters (= design choices)
  → empirical performance models (EPMs)

• analyse the variance in the model output (= predicted performance) due to each parameter and to parameter interactions

• importance of a parameter: fraction of the performance variation over the configuration space explained by it (main effect)

• analogous for sets of parameters (interaction effects)

Decomposition of variance in a nutshell

For parameters $p_1, \dots, p_n$ and a function (performance model) $y$:

$$y(p_1, \dots, p_n) = \mu
  + f_1(p_1) + f_2(p_2) + \cdots + f_n(p_n)
  + f_{1,2}(p_1, p_2) + f_{1,3}(p_1, p_3) + \cdots + f_{n-1,n}(p_{n-1}, p_n)
  + f_{1,2,3}(p_1, p_2, p_3) + \cdots$$

Note:

• Straightforward computation of main and interaction effects is intractable (integration over combinatorial spaces of configurations).

• For random forest models, marginal performance predictions and variance decomposition (up to constant-sized interactions) can be computed exactly and efficiently.

Empirical study:

• 8 high-performance solvers for SAT, ASP, MIP, TSP (4–85 parameters)

• 12 well-known sets of benchmark data (random + real-world structure)

• random forest models for performance prediction, trained on 10 000 randomly sampled configurations per solver + data from 25+ runs of the SMAC configuration procedure

Fraction of variance explained by main effects:

CPLEX on RCW (comp. sust.)                     70.3%
CPLEX on CORLAT (comp. sust.)                  35.0%
Clasp on software verification                 78.9%
Clasp on DB query optimisation                 62.5%
CryptoMiniSAT on bounded model checking        35.5%
CryptoMiniSAT on software verification         31.9%

Fraction of variance explained by main + 2-interaction effects:

CPLEX on RCW (comp. sust.)                     70.3% + 12.7%
CPLEX on CORLAT (comp. sust.)                  35.0% + 8.3%
Clasp on software verification                 78.9% + 14.3%
Clasp on DB query optimisation                 62.5% + 11.7%
CryptoMiniSAT on bounded model checking        35.5% + 20.8%
CryptoMiniSAT on software verification         31.9% + 28.5%

Note: this may pick up variation caused by poorly performing configurations.

Simple solution: cap at the default performance or at a quantile of the distribution of randomly sampled configurations; build the model from the capped data.

Ablation analysis
Fawcett & Hoos (2013, 2016)

Key idea:

• given two configurations, A and B, change one parameter at a time to get from A to B
  → ablation path

• in each step, change the parameter that achieves the maximal gain (or minimal loss) in performance

• for computational efficiency, use racing (F-Race) for evaluating the parameters considered in each step
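In sketch form (plain Python, configurations as dicts; full re-evaluation of every candidate flip is shown here, whereas the actual method uses F-Race to cut that cost):

def ablation_path(source, target, evaluate):
    # Greedily flip one parameter at a time from source towards target,
    # always taking the single change with the best performance.
    current, path = dict(source), []
    candidates = {p for p in target if source[p] != target[p]}
    while candidates:
        def flipped(p):
            cfg = dict(current)
            cfg[p] = target[p]
            return cfg
        best = min(candidates, key=lambda p: evaluate(flipped(p)))
        current[best] = target[best]
        candidates.remove(best)
        path.append((best, evaluate(current)))
    return path   # ablation path: (parameter flipped, performance) per step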

Empirical study:

• high-performance solvers for SAT, MIP, AI planning (26–76 parameters), well-known sets of benchmark data (real-world structure)

• optimised configurations obtained from ParamILS (minimisation of penalised average running time; 10 runs per scenario, 48 CPU hours each)

Ablation between default and optimised configurations:

[Figure: LPG on the Depots planning domain; performance (PAR10, s) vs number of parameters modified from the default, for both directions (default to configured, configured to default)]

Which parameters are important?

LPG on Depots:

• cri intermediate levels (43% of overall gain!)

• triomemory

• donot try suspected actions

• walkplan

• weight mutex in relaxed plan

Note: the importance of parameters varies between planning domains.

New development: ablation based on surrogate models (Biedenkapp, Lindauer, Eggensperger, Hutter, Fawcett, Hoos 2017)

Algorithm configuration: parameter importance
Algorithm selection: component contribution
Xu, Hutter, Hoos, Leyton-Brown (2012)

Consider: a portfolio-based algorithm selector AS with candidate algorithms A1, A2, . . . , Ak.

Question: How much does each Ai contribute to the overall performance of AS?

Marginal contribution of Ai to portfolio-based selector AS

= difference in performance of AS with and without Ai (trained separately)

≠ frequency of selecting Ai

≠ fraction of instances solved by Ai

≠ contribution of Ai to the virtual best solver (VBS)
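As a sketch (train and evaluate are stand-ins for building and scoring a selector, e.g. a SATzilla-style model and its test-set performance):

def selector_performance(train, evaluate, solvers):
    selector = train(solvers)      # build a selector from these solvers
    return evaluate(selector)      # e.g. % of test instances solved

def marginal_contribution(train, evaluate, solvers, a_i):
    # Difference in selector performance with and without a_i,
    # retraining the selector in both cases.
    full = selector_performance(train, evaluate, solvers)
    reduced = selector_performance(
        train, evaluate, [s for s in solvers if s != a_i])
    return full - reduced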

Application to SATzilla:

• all instances from the 2011 SAT Competition: 300 Application; 300 Crafted; 300 Random

• candidate solvers from the 2011 SAT Competition:
  – for determining the virtual best solver (VBS) and the single best solver (SBS): all solvers from Phase 2 of the competition: 31 Application; 25 Crafted; 17 Random
  – for building SATzilla: all sequential, non-portfolio solvers from Phase 2: 18 Application; 15 Crafted; 9 Random

• SATzilla assessed by 10-fold cross-validation

SATzilla 2011 Performance (Instances Solved)

Solver                  Application    Crafted    Random
VBS                     84.7%          76.3%      82.2%
SATzilla 2011           75.3%          66.0%      80.8%
SATzilla 2009           70.3%          63.0%      80.3%
Gold medalist (SBS)     71.7%          54.3%      68.0%

Performance of Individual Solvers – Application

[Figure: percentage of instances solved (5000 CPU sec cutoff) by each of the 18 component solvers: QuteRSat, CryptoMinisat, EBGlucose, Glucose2, Glucose1, Rcl, Minisat_psm, Contrasat, MPhaseSAT64, Lingeling, Precosat, LR GL SHR, Glueminisat, Minisatagile, EBMinisat, Minisat, Cirminisat, RestartSAT]

Correlation of Solver Performance – Application

[Figure: matrix of pairwise Spearman correlation coefficients between the 18 Application component solvers; darker = higher correlation coefficient]

Correlation of Solver Performance – Random

[Figure: matrix of pairwise Spearman correlation coefficients between Sparrow, EagleUP, Gnovelty+2, TNM, Sattime11, Adaptg2wsat11, MPhaseSAT_M, March_rw and March_hi; darker = higher correlation coefficient]

Solver Selection Frequency in SATzilla 2011 – Application

[Figure: selection frequencies for Glucose2 (backup), Glucose2, Glueminisat, QuteRSat, Precosat and other solvers, plus the fraction solved by presolvers]

Instances Solved by SATzilla 2011 Components – Application

[Figure: instances solved by Glucose2 (backup), Glucose2 (pre-1), Glucose2, Glueminisat (pre-1), Glueminisat, QuteRSat, Precosat, EBGlucose (pre-1), EBGlucose, Minisat_psm (pre-1), Minisat_psm and other solvers, plus unsolved instances]

Marginal Contribution of Components – Application

[Figure: marginal contribution (%) of each of the 18 component solvers]

Instances Solved vs Marginal Contribution of Components – Application

[Figure: marginal contribution vs % of instances solved by each component solver; MPhaseSAT64 and Glueminisat stand out]

Instances Solved vs Marginal Contribution of Components – Crafted

[Figure: marginal contribution vs % of instances solved; Sol, Clasp2, Sattime and MPhaseSAT stand out]

Joint contributions:
  – 2 Clasp variants = 6.3%
  – 2 Sattime variants = 5.4%

Instances Solved vs Marginal Contribution of Components – Random

[Figure: marginal contribution vs % of instances solved; EagleUP, Sparrow and March_rw stand out]

Joint contributions:
  – 2 March variants = 4%
  – 6 LS solvers = 22.5%

More nuanced analysis based on the Shapley value:
AAAI-16 paper by Frechette et al.

Bias + correction in portfolio performance evaluation:
IJCAI-16 paper by Cameron et al.

Leveraging parallelism

• design choices in parallel programs (Hamadi, Jabbour, Sais 2009)

• deriving parallel programs from sequential sources
  → concurrent execution of optimised designs (parallel portfolios) (Hoos, Leyton-Brown, Schaub, Schneider 2012)

• parallel design optimisers (e.g., Hutter, Hoos, Leyton-Brown 2012)

• use of cloud resources (parallel runs of design optimisers, ...) (Geschwender, Hutter, Kotthoff, Malitsky, Hoos, Leyton-Brown 2014)

Take-home Message

Programming by Optimisation ...

• leverages computational power to construct better software

• enables creative thinking about design alternatives

• produces better performing, more flexible software

• facilitates scientific insights into
  – the efficacy of algorithms and their components
  – the empirical complexity of computational problems

... changes how we build and use high-performance software

More Information:

• www.prog-by-opt.net/Tutorials/IJCAI-17

• www.prog-by-opt.net

• PbO article in Communications of the ACM (Hoos 2012)

• Talk by Kleinberg et al.: Tue, 15:15, Room 216
  Talk by Lindauer et al.: Fri, 10:50, Room 203
  Invited talk by HH at CP/ICLP/SAT: Tue, 08/29, 9:00

• Forthcoming book (Morgan & Claypool)

If PbO works for you:

• Make our day – let us know!

• Share the joy – tell everyone else!