Programming by Optimisation:
A Practical Paradigm
for Computer-Aided Algorithm Design
Holger H. Hoos & Frank Hutter
Leiden Institute for Advanced Computer Science, Universiteit Leiden, The Netherlands
Department of Computer Science, Universität Freiburg, Germany
IJCAI 2017
Melbourne, Australia, 2017/08/21
The age of machines
“As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise – by what course of calculation can these results be arrived at by the machine in the shortest time?”
(Charles Babbage, 1864)
Hoos & Hutter: Programming by Optimization 2
The age of computation
“The maths[!] that computers use to decide stuff [is] infiltrating every aspect of our lives.”
• financial markets
• social interactions
• cultural preferences
• artistic production
• ...
Performance matters ...
• computation speed (time is money!)
• energy consumption (battery life, ...)
• quality of results (cost, profit, weight, ...)
... increasingly:
• globalised markets
• just-in-time production & services
• tighter resource constraints
Example: Resource allocation
• resources > demands → many solutions, easy to find
  ⇒ economically wasteful → reduction of resources / increase of demand
• resources < demands → no solution, easy to demonstrate
  ⇒ lost market opportunity, strain within organisation → increase of resources / reduction of demand
• resources ≈ demands → difficult to find solution / show infeasibility
This tutorial:
new approach to software development, leveraging ...
• human creativity
• optimisation & machine learning
• large amounts of computation / data
Key idea:
• program → (large) space of programs
• encourage software developers to
  – avoid premature commitment to design choices
  – seek & maintain design alternatives
• automatically find performance-optimising designs for given use context(s)
⇒ Programming by Optimization (PbO)
[Figure: one solver deployed separately for application contexts 1, 2 and 3]
[Figure: a single parameterised solver, solver[·], instantiated as solver[p1], solver[p2] and solver[p3] for application contexts 1, 2 and 3]
Outline
1. Programming by Optimization: Motivation & Introduction
2. Algorithm Configuration (incl. Coffee Break)
3. Portfolio-based Algorithm Selection
4. Software Development Support & Further Directions
Programming by Optimization:
Motivation & Introduction
Example: SAT-based software verification (Hutter, Babic, Hoos, Hu 2007)
• Goal: solve SAT-encoded software verification problems as fast as possible
• new DPLL-style SAT solver Spear (by Domagoj Babic)
  = highly parameterised heuristic algorithm (26 parameters, ≈ 8.3 × 10^17 configurations)
• manual configuration by algorithm designer
• automated configuration using ParamILS, a generic algorithm configuration procedure (Hutter, Hoos, Stützle 2007)
Spear: Performance on software verification benchmarks
solver                        num. solved   mean run-time
MiniSAT 2.0                   302/302       161.3 CPU sec
Spear original                298/302       787.1 CPU sec
Spear generic. opt. config.   302/302        35.9 CPU sec
Spear specific. opt. config.  302/302         1.5 CPU sec
• ≈ 500-fold speedup through use of the automated algorithm configuration procedure ParamILS
• new state of the art (winner of 2007 SMT Competition, QF_BV category)
Levels of PbO:
Level 4: Make no design choice prematurely that cannot be justified compellingly.
Level 3: Strive to provide design choices and alternatives.
Level 2: Keep and expose design choices considered during software development.
Level 1: Expose design choices hardwired into existing code (magic constants, hidden parameters, abandoned design alternatives).
Level 0: Optimise settings of parameters exposed by existing software.
Success in optimising speed:
Application (design choices)                          Speedup    PbO level
SAT-based software verification (Spear), 26           4.5–500×   2–3
  Hutter, Babic, Hoos, Hu (2007)
AI planning (LPG), 62                                 3–118×     1
  Vallati, Fawcett, Gerevini, Hoos, Saetti (2011)
Mixed integer programming (CPLEX), 76                 2–52×      0
  Hutter, Hoos, Leyton-Brown (2010)

... and solution quality:
• University timetabling, 18 design choices, PbO level 2–3: new state of the art; UBC exam scheduling (Fawcett, Chiarandini, Hoos 2009)
• Machine learning / classification, 786 design choices, PbO level 0–1: outperforms specialised model selection & hyper-parameter optimisation methods from machine learning (Thornton, Hutter, Hoos, Leyton-Brown 2012–13)
PbO enables . . .
• performance optimisation for different use contexts (some details later)
• adaptation to changing use contexts (see, e.g., life-long learning – Thrun 1996)
• self-adaptation while solving given problem instance (e.g., Battiti et al. 2008; Carchrae & Beck 2005; Da Costa et al. 2008)
• automated generation of instance-based solver selectors (e.g., SATzilla – Leyton-Brown et al. 2003, Xu et al. 2008; Hydra – Xu et al. 2010; ISAC – Kadioglu et al. 2010; AutoFolio – Lindauer 2015)
• automated generation of parallel solver portfolios (e.g., Huberman et al. 1997; Gomes & Selman 2001; Hoos et al. 2012; Lindauer et al. 2017)
Cost & concerns
But what about ...
• Computational complexity?
• Cost of development?
• Limitations of scope?
Computationally too expensive?
Spear revisited:
• total configuration time on software verification benchmarks: ≈ 20 CPU days
• wall-clock time on 10 CPU cluster: ≈ 2 days
• cost on Amazon Elastic Compute Cloud (EC2): 60.23 AUD (≈ 48 USD)
• 60.23 AUD pays for ...
  – 1h42 of typical software engineer in Australia
  – 3h33 at minimum wage in Australia
Too expensive in terms of development?
Design and coding:
• trade-off between performance/flexibility and overhead
• overhead depends on level of PbO
• traditional approach: cost from manual exploration of design choices!
Testing and debugging:
• design alternatives for individual mechanisms and components can be tested separately
⇒ effort linear (rather than exponential) in the number of design choices
Limited to the “niche” of NP-hard problem solving?
Some PbO-flavoured work in the literature:
• computing-platform-specific performance optimisation of linear algebra routines (Whaley et al. 2001)
• optimisation of sorting algorithms using genetic programming (Li et al. 2005)
• compiler optimisation (Pan & Eigenmann 2006; Cavazos et al. 2007; Fawcett et al. 2017)
• database server configuration (Diao et al. 2003)
Algorithm Configuration:
Methods
Overview
• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues [coffee]
  – Case Studies
• Portfolio-Based Algorithm Selection
• Software Development Support & Further Directions
Algorithm Configuration
• In a nutshell: optimization of free parameters
  – Which parameters? The ones you'd otherwise tune manually & more
• Examples of free parameters in various subfields of AI
  – Tree search (in particular for SAT): pre-processing, branching heuristics, clause learning & deletion, restarts, data structures, ...
  – Local search: neighbourhoods, perturbations, tabu length, annealing, ...
  – Genetic algorithms: population size, mating scheme, crossover operators, mutation rate, local improvement stages, hybridizations, ...
  – Machine learning: pre-processing, regularization (type & strength), minibatch size, learning rate schedules, optimizer & its parameters, ...
  – Deep learning (in addition): #layers (& layer types), #units/layer, dropout constants, weight initialization and decay, non-linearities, ...
Algorithm Parameters

Parameter types
– Continuous, integer, ordinal
– Categorical: finite domain, unordered, e.g. {a, b, c}

Parameter space has structure
– E.g. parameter C is only active if heuristic H = h is used
– In this case, we say C is a conditional parameter with parent H

Parameters give rise to a structured space of algorithms
– Many configurations (e.g. 10^47)
– Configurations often yield qualitatively different behaviour
→ Algorithm configuration (as opposed to "parameter tuning")
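The notion of a conditional parameter can be sketched in a few lines. This is a minimal illustration (not any configurator's actual API): a hypothetical space with a categorical parameter H and a continuous parameter C that is only instantiated when its parent takes the activating value.

```python
import random

# Hypothetical configuration space illustrating a conditional parameter:
# C is only active when the categorical parameter H takes value "h".
SPACE = {
    "H": ["h", "g"],   # categorical: which heuristic to use
    "C": (0.0, 1.0),   # continuous, conditional on H == "h"
}

def sample_configuration(rng=random):
    """Draw one configuration, instantiating only active parameters."""
    config = {"H": rng.choice(SPACE["H"])}
    if config["H"] == "h":          # C is active only in this branch
        lo, hi = SPACE["C"]
        config["C"] = rng.uniform(lo, hi)
    return config
```

Sampling (and searching) only over active parameters is what distinguishes a structured configuration space from plain blackbox optimization over a fixed-dimensional box.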
The Algorithm Configuration Process

Algorithm Configuration – in More Detail

Algorithm Configuration – Full Formal Definition

Distribution vs Set of Instances

Algorithm Configuration is a Useful Abstraction

Two different instantiations:
Large improvements to solvers for many hard combinatorial problems:
SAT, Max-SAT, MIP, SMT, TSP, ASP, time-tabling, AI planning, ...
Competition winners for all of these rely on configuration tools
Increasingly popular (citation numbers: Google Scholar)
[Figure: citations per year, 2008–2016, rising from near 0 to ≈ 500, for ParamILS (Hutter et al. 07+09), SMAC (Hutter et al. 11), I/F-Race (Balaprakash et al. 07, Birattari et al. 10, López-Ibáñez et al. 11) and GGA/GGA++ (Ansótegui et al. 09+15)]
Overview
• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies
• Portfolio-Based Algorithm Selection
• Software Development Support & Further Directions
Configurators have Two Key Components
• Component 1: which configuration to evaluate next?
  – Out of a large combinatorial search space
  – E.g., CPLEX: 76 parameters, 10^47 configurations
• Component 2: how to evaluate that configuration?
  – Evaluating performance of a configuration is expensive
  – E.g., CPLEX: budget of 10000 s per instance
  – Instances vary in hardness
    • Some take milliseconds, others days (for the default)
    • Improvement on a few instances might not mean much
Component 1: Which Configuration to Choose?
• For this component, we can consider a simpler problem: blackbox function optimization

  min f(θ) over θ ∈ Θ

  – Only mode of interaction: query f(θ) at arbitrary θ ∈ Θ
  – Abstracts away the complexity of multiple instances
  – Θ is still a structured space
    • Mixed continuous/discrete
    • Conditional parameters
    • Still more general than "standard" continuous BBO [e.g., Hansen et al.]
Component 1: Which Configuration to Choose?
• Need to balance diversification and intensification
• The extremes:
  – Random search
  – Hill-climbing
• Stochastic local search (SLS)
• Population-based methods
• Model-based optimization
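The pure-diversification extreme above can be sketched in a few lines: random search just draws configurations from the space and keeps the best one seen. A minimal sketch for the blackbox setting min f(θ), θ ∈ Θ; `sample` stands in for a sampler over Θ.

```python
import random

def random_search(f, sample, budget=100, rng=random):
    """Pure diversification: evaluate `budget` randomly sampled
    configurations of the blackbox f and keep the best (minimization)."""
    best_theta, best_value = None, float("inf")
    for _ in range(budget):
        theta = sample(rng)
        value = f(theta)
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value
```

Hill-climbing is the opposite extreme (pure intensification); the methods on this slide interpolate between the two.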
Sequential Model-Based Optimization
• Fit a (probabilistic) model of function f
• Use that model to trade off exploitation vs exploration
• In the machine learning literature known as Bayesian optimization
Component 2: How to Evaluate a Configuration?
Back to general algorithm configuration
• General principle:
  – Don't waste too much time on bad configurations
  – Evaluate good configurations more thoroughly
Simplest Solution: Use Fixed N Instances
• Effectively treats the problem as black-box function optimization
• Issue: how large to choose N?
  – Too small: over-tuning to those instances
  – Too large: every function evaluation is slow
Racing Algorithms
• Compare two or more algorithms against each other
  – Perform one run for each configuration at a time
  – Discard configurations when dominated
[Image source: Maron & Moore, Hoeffding Races, NIPS 1994]
[Maron & Moore, NIPS 1994] [Birattari, Stützle, Paquete & Varrentrapp, GECCO 2002]
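The racing scheme above can be sketched as follows. This is a deliberately simplified illustration: it drops a configuration once its mean runtime is a fixed factor worse than the best survivor, whereas real racing methods (Hoeffding races, F-Race) use statistical tests for domination.

```python
import statistics

def race(configs, instances, run, threshold=2.0):
    """Minimal racing sketch (runtime minimization): run all surviving
    configurations on one instance at a time, then drop any configuration
    whose mean runtime exceeds `threshold` times the best mean so far."""
    results = {c: [] for c in configs}
    alive = set(configs)
    for inst in instances:
        for c in list(alive):
            results[c].append(run(c, inst))   # one run per config per step
        means = {c: statistics.mean(results[c]) for c in alive}
        best = min(means.values())
        alive = {c for c in alive if means[c] <= threshold * best}
    return alive, results
```

The key property is that hopeless configurations receive only a few runs, while the survivors accumulate enough runs for a reliable comparison.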
Saving Time: Aggressive Racing
• Race new configurations against the best known
  – Discard poor new configurations quickly
  – No requirement for statistical domination
• Search component should allow returning to configurations discarded because they were "unlucky"
[Hutter, Hoos & Stützle, AAAI 2007]
Saving More Time: Adaptive Capping
(only when minimizing algorithm runtime)
Can terminate runs for poor configurations θ' early:
– Is θ' better than θ?
• Example: RT(θ) = 20; once the run of θ' exceeds 20 seconds, we know RT(θ') > 20, and its exact value no longer matters
• Can terminate evaluation of θ' once it is guaranteed to be worse than θ
[Hutter, Hoos, Leyton-Brown & Stützle, JAIR 2009]
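The capping rule can be sketched as follows. This is a minimal illustration, not the actual ParamILS implementation; `run_with_cap(theta, inst, cap)` is a hypothetical helper assumed to run the target algorithm and return min(true runtime, cap).

```python
def capped_runtime(run_with_cap, theta_new, instances, incumbent_total):
    """Adaptive-capping sketch for runtime minimization: stop evaluating
    a challenger as soon as its accumulated runtime provably exceeds the
    incumbent's total on the same instances."""
    total = 0.0
    for inst in instances:
        remaining = incumbent_total - total   # budget left for challenger
        if remaining <= 0:
            return float("inf")               # provably worse: cap it
        total += run_with_cap(theta_new, inst, remaining)
    return total
```

The speedup comes from never spending more time on a challenger than the incumbent needed in total.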
Overview
• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies
• Portfolio-Based Algorithm Selection
• Software Development Support & Further Directions
Overview: Algorithm Configuration Systems
• Continuous parameters, single instances (black-box function optimization)
  – Covariance matrix adaptation evolution strategy (CMA-ES) [Hansen et al. 2006–17]
  – Sequential Parameter Optimization (SPO) [Bartz-Beielstein et al. 2006]
• General algorithm configuration methods
  – ParamILS [Hutter et al. 2007, 2009; Blot et al. 2016]
  – Gender-based Genetic Algorithm (GGA) [Ansótegui et al. 2009, 2015]
  – Iterated F-Race [Birattari et al. 2002, 2010]
  – Sequential Model-based Algorithm Configuration (SMAC) [Hutter et al. 2011–17]
  – Distributed SMAC [Hutter et al. 2012–17]
The Baseline: Graduate Student Descent

Start with some configuration
repeat
  Modify a single parameter
  if performance on a benchmark set degrades then
    undo modification
until no more improvement possible (or "good enough")
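The manual-tuning baseline above is just hill-climbing over single-parameter changes; a minimal sketch (names `neighbours` and `evaluate` are illustrative stand-ins for "single-parameter modifications" and "benchmark performance"):

```python
def graduate_student_descent(config, neighbours, evaluate):
    """Hill-climbing baseline: repeatedly try single-parameter
    modifications, keep only those that improve benchmark cost
    (lower = better), and stop at a local optimum."""
    best_cost = evaluate(config)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(config):  # one-parameter changes
            cost = evaluate(candidate)
            if cost < best_cost:
                config, best_cost = candidate, cost
                improved = True
                break                         # restart from new config
    return config, best_cost
```

Like its human counterpart, this gets stuck in the first local optimum it finds, which is what the configurators on the following slides improve upon.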
The ParamILS Framework
Iterated Local Search in parameter configuration space:
→ performs a biased random walk over local optima
[Hutter, Hoos, Leyton-Brown & Stützle, AAAI 2007 & JAIR 2009]
The BasicILS(N) Algorithm
• Instantiates the ParamILS framework
• Uses a fixed number of N runs for each evaluation
  – Sample N instances from the given set (with repetitions)
  – Same instances (and seeds) for evaluating all configurations
  – Essentially treats the problem as blackbox optimization
• How to choose N?
  – Too high: evaluating a configuration is expensive → optimization process is slow
  – Too low: noisy approximations of the true cost → poor generalization to test instances/seeds
Generalization to Test Set, Large N (N = 100)
[Figure: SAPS on a single QWH instance (same instance for training & test; only difference: seeds)]

Generalization to Test Set, Small N (N = 1)
[Figure: SAPS on a single QWH instance (same instance for training & test; only difference: seeds)]

BasicILS: Speed/Generalization Tradeoff
[Figure: test performance of SAPS on a single QWH instance]
The FocusedILS Algorithm
Aggressive racing: more runs for good configurations
– Start with N(θ) = 0 for all configurations
– Increment N(θ) whenever the search visits θ
– "Bonus" runs for configurations that win many comparisons

Theorem: As the number of FocusedILS iterations → ∞, convergence to the optimal configuration
– Key ideas in the proof:
  1. The underlying ILS eventually reaches any configuration
  2. For N(θ) → ∞, the error in cost approximations vanishes
FocusedILS: Speed/Generalization Tradeoff
[Figure: test performance of SAPS on a single QWH instance]
Speeding up ParamILS
Standard adaptive capping
– Is θ' better than θ?
• Example: RT(θ) = 20; once the run of θ' exceeds 20 seconds, RT(θ') > 20 and its exact value no longer matters
• Can terminate evaluation of θ' once it is guaranteed to be worse than θ

Theorem: Early termination of poor configurations does not change the ParamILS trajectory
– Often yields substantial speedups
– Especially when the best configuration is much faster than the worst
[Hutter, Hoos, Leyton-Brown & Stützle, JAIR 2009]
Gender-based Genetic Algorithm (GGA)
• Genetic algorithm
  – Genome = parameter configuration
  – Combine genomes of 2 parents to form an offspring
• Supports adaptive capping
  – Evaluate population members in parallel
  – Adaptive capping: can stop when the first k succeed
• Uses N instances to evaluate configurations
  – Increase N in each generation
  – Linear increase from Nstart to Nend
• Not recommended for small budgets
• User has to specify #generations ahead of time
[Ansótegui, Sellmann & Tierney, CP 2009 & IJCAI 2015]
F-Race and Iterated F-Race
• F-Race
  – Standard racing framework
  – F-test to establish that some configuration is dominated
  – Followed by pairwise t-tests if the F-test succeeds
• Iterated F-Race
  – Maintain a probability distribution over which configurations are good
  – Sample k configurations from that distribution & race them
  – Update distributions with the results of the race
• Well-supported software package in R
[Birattari et al., GECCO 2002 & OR Perspectives 2016; Pérez Cáceres et al., LION 2017]
Model-Based Algorithm Configuration
SMAC: Sequential Model-based Algorithm Configuration
– Sequential model-based optimization & aggressive racing

repeat
  - construct model to predict performance
  - use that model to select promising configurations
  - compare each selected configuration against the best known
until time budget exhausted
[Hutter, Hoos & Leyton-Brown, LION 2011]
SMAC: Aggressive Racing
• Similar racing component as FocusedILS
  – More runs for good configurations
  – Increase #runs for incumbent over time
• Theorem for discrete configuration spaces: As the overall time budget for SMAC → ∞, convergence to the optimal configuration
Powering SMAC: Empirical Performance Models
Given:
– Configuration space Θ
– For each problem instance i: vector x_i of feature values
– Observed algorithm runtime data: (θ1, x1, y1), ..., (θn, xn, yn)
Find: mapping m: (θ, x) ↦ y predicting A's performance, i.e. y ≈ m(θ, x)
– Rich literature on such performance prediction problems [see, e.g., Hutter, Xu, Hoos, Leyton-Brown, AIJ 2014, for an overview]
– Here: use a model m based on random forests
Regression Trees: Fitting to Data
– In each internal node: only store the split criterion used
– In each leaf: store the mean of the runtimes
[Figure: example tree; root splits on param3 ∈ {red} vs param3 ∈ {blue, green}, a child splits on feature2 ≤ 3.5 vs feature2 > 3.5, leaves store means such as 3.7 and 1.65]
Regression Trees: Predictions for New Inputs
E.g. x_{n+1} = (true, 4.7, red)
– Walk down the tree (param3 ∈ {red}, feature2 > 3.5), return the mean runtime stored in the leaf ⇒ 1.65
Random Forests: Sets of Regression Trees
Training
– Draw T bootstrap samples of the data
– For each bootstrap sample, fit a randomized regression tree
Prediction
– Predict with each of the T trees
– Return empirical mean and variance across these T predictions
Complexity for N data points
– Training: O(T N log² N)
– Prediction: O(T log N)
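The prediction rule above can be sketched in a few lines. This is a toy illustration, not SMAC's actual random-forest code: each "tree" here is a depth-1 regression tree (a stump) splitting one-dimensional inputs at the median, and the forest returns the mean and variance across per-tree predictions.

```python
import statistics

def fit_stump(data):
    """Fit a depth-1 regression tree on (x, y) pairs: split at the
    median x, store the mean y in each leaf."""
    data = sorted(data)
    split = data[len(data) // 2][0]
    left = [y for x, y in data if x <= split]
    right = [y for x, y in data if x > split] or left
    lmean, rmean = statistics.mean(left), statistics.mean(right)
    return lambda x, s=split, l=lmean, r=rmean: l if x <= s else r

def forest_predict(trees, x):
    """Random-forest prediction: query each of the T trees and return the
    empirical mean and variance across the per-tree predictions (SMAC
    uses this variance as its uncertainty estimate)."""
    preds = [tree(x) for tree in trees]
    return statistics.mean(preds), statistics.pvariance(preds)
```

In the real model each tree is fit on a different bootstrap sample with randomized splits, which is what makes the across-tree variance an informative uncertainty signal.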
Advantages of Random Forests
Automated selection of important input dimensions
– Continuous, integer, and categorical inputs
– Up to 138 features, 76 parameters
– Can identify important feature and parameter subsets
  • Sometimes 1 feature, 2 parameters sufficient! [Hutter, Hoos, Leyton-Brown, LION 2013]
Robustness
– No need to optimize hyperparameters
– Already good predictions with few training data points
SMAC: Averaging Across Multiple Instances
• Fit random forest model
• Aggregate over instances by marginalization
  – Intuition: predict for each instance and then average
  – More efficient implementation in random forests
SMAC: Putting It All Together

Initialize with a single run for the default configuration
repeat
  1. Learn a random forest model from the data so far
  2. Aggregate over instances
  3. Use the model f to select promising configurations
  4. Race each selected configuration against the best known
until time budget exhausted

• Distributed SMAC [Hutter, Hoos & Leyton-Brown, 2012]
  – Maintain a queue of promising configurations
  – Race these against the best known on distributed worker cores
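The loop above can be sketched as a generic sequential model-based optimizer. This is a deliberately simplified illustration, not SMAC itself: it selects candidates greedily by predicted cost (SMAC uses an expected-improvement acquisition and races across instances), and `fit_model` stands in for the random-forest training step.

```python
import random

def smbo(evaluate, sample, fit_model,
         n_init=5, n_iter=20, n_candidates=50, rng=random):
    """Sequential model-based optimization sketch (minimization):
    fit a cost model on the history, use it to pick a promising
    configuration, evaluate it, and repeat."""
    history = [(theta, evaluate(theta))
               for theta in (sample(rng) for _ in range(n_init))]
    for _ in range(n_iter):
        model = fit_model(history)                  # cost predictor
        candidates = [sample(rng) for _ in range(n_candidates)]
        theta = min(candidates, key=model)          # greedy acquisition
        history.append((theta, evaluate(theta)))
    return min(history, key=lambda t: t[1])         # best (theta, cost)
```

Swapping the greedy selection for expected improvement, and the single evaluation for a race against the incumbent, recovers the structure of the SMAC loop.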
SMAC: Adaptive Capping
Terminate runs for poor configurations θ early:
– Example: f(θ*) = 20; once a run of θ exceeds 20, we know f(θ) > 20 and can stop
– Lower bound on runtime → right-censored data point for the model
[Hutter, Hoos & Leyton-Brown, BayesOpt 2011]
Experimental Evaluation
Compared SMAC to ParamILS, GGA
– On 17 SAT and MIP configuration scenarios, same time budget
SMAC performed best
– Improvements in test performance of configurations returned
  • vs ParamILS: 0.93× – 2.25× (11/17 cases significantly better)
  • vs GGA: 1.01× – 2.76× (13/17 cases significantly better)
Wall-clock speedups in distributed SMAC
– Almost perfect with up to 16 parallel workers
– Up to 50-fold with 64 workers
  • Reductions in wall-clock time: 5 h → 6–15 min; 2 days → 40 min – 2 h
[Hutter, Hoos & Leyton-Brown, LION 2011]
Overview
• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies
• Portfolio-Based Algorithm Selection
• Software Development Support & Further Directions
The Algorithm Configuration Process

What the user has to provide:
• Parameter space declaration file:
  preproc {none, simple, expensive} [simple]
  alpha [1, 5] [2]
  beta [0.1, 1] [0.5]
• Wrapper for command line call:
  ./wrapper –inst X –timeout 30 -preproc none -alpha 3 -beta 0.7
  → e.g. "successful after 3.4 seconds"
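The wrapper's core job can be sketched as follows. This is a minimal illustration, not the calling convention of any particular configurator: it launches one target-algorithm run under an enforced time limit and reports a status plus measured runtime, guarding against solvers that ignore their own timeout flag.

```python
import subprocess
import time

def run_target(cmd, timeout):
    """Run one target-algorithm call (`cmd` as an argument list) under an
    enforced wall-clock limit; report (status, runtime) back to the
    configurator."""
    start = time.time()
    try:
        proc = subprocess.run(cmd, timeout=timeout, capture_output=True)
        status = "SUCCESS" if proc.returncode == 0 else "CRASHED"
    except subprocess.TimeoutExpired:
        status = "TIMEOUT"
    return status, time.time() - start
```

A production wrapper would additionally limit memory, verify the solver's answer, and penalize crashes, as discussed in the best-practice slides below.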
Example: Running SMAC

wget http://www.cs.ubc.ca/labs/beta/Projects/SMAC/smac-v2.10.03-master-778.tar.gz
tar xzvf smac-v2.10.03-master-778.tar.gz
cd smac-v2.10.03-master-778
./smac                       (for a usage screen)
./smac --seed 0 --scenarioFile example_scenarios/spear/spear-scenario.txt

The scenario file holds:
- Location of parameter file, wrapper & instances
- Objective function (here: minimize avg. runtime)
- Configuration budget (here: 30 s)
- Maximal captime per target run (here: 5 s)
Output of a SMAC run

[…]
[INFO ] *****Runtime Statistics*****
Incumbent ID: 12 (0x22BB8)
Number of Runs for Incumbent: 43
Number of Instances for Incumbent: 5
Number of Configurations Run: 42
Performance of the Incumbent: 0.012555555555555556
Configuration Time Budget used: 30.589647351000067 s (101%)
Sum of Target Algorithm Execution Times (treating minimum value as 0.1): 24.70000 s
CPU time of Configurator: 5.889042742 s
[INFO ] **********************************************
[INFO ] Total Objective of Final Incumbent 12 (0x22BB8) on training set: 0.012555555555555556; on test set: 0.014499999999999999
[INFO ] Sample Call for Final Incumbent 12 (0x22BB8)
cd /ubc/cs/home/h/hutter/tmp/smac-v2.06.00-master-615/example_scenarios/spear; ruby spear_wrapper.rb instances/qcplin2006.10408.cnf 0 5.0 2147483647 3282095 -sp-update-dec-queue '0' -sp-rand-var-dec-scaling '0.3528466348383826' -sp-clause-decay '1.713857938112484' -sp-variable-decay '1.461422623379798' -sp-orig-clause-sort-heur '7' -sp-rand-phase-dec-freq '0.05' -sp-clause-del-heur '0' -sp-learned-clauses-inc '1.452683835620401' -sp-restart-inc '1.6481745669620091' -sp-resolution '0' -sp-clause-activity-inc '0.7121640599232154' -sp-learned-clause-sort-heur '12' -sp-var-activity-inc '0.9358501810374242' -sp-rand-var-dec-freq '0.0001' -sp-use-pure-literal-rule '1' -sp-learned-size-factor '0.27995062371127827' -sp-var-dec-heur '16' -sp-phase-dec-heur '6' -sp-rand-phase-scaling '1.0424648235977578' -sp-first-restart '31'
Decision #1: Configuration Budget & Captime
• Configuration budget
  – Dictated by your resources & needs
    • E.g., start configuration before leaving work on Friday
  – The longer the better (but diminishing returns)
    • Rough rule of thumb: typically time for at least 1000 target runs
    • But have also achieved good results with 50 target runs in some cases
• Maximal captime per target run
  – Dictated by your needs (typical instance hardness, etc.)
  – Too high: slow progress
  – Too low: possible overtuning to easy instances
  – For SAT etc., often use 300 CPU seconds
Decision #2: Choosing the Training Instances
• Representative instances, moderately hard
  – Too hard: won't solve many instances, no traction
  – Too easy: will results generalize to harder instances?
  – Rule of thumb: mix of hardness ranges
    • Roughly 75% of instances solvable by the default in the maximal captime
• Enough instances
  – The more training instances, the better
  – Very homogeneous instance sets: 50 instances might suffice
  – Preferably ≥ 300 instances, better even ≥ 1000 instances
Decision #2: Choosing the Training Instances
• Split the instance set into training and test sets
  – Configure on the training instances → configuration θ*
  – Run (only) θ* on the test instances
    • Unbiased estimate of performance

Pitfall: configuring on your test instances — that's from the dark ages

Fine practice: do multiple configuration runs and pick the θ* with the best training performance — not(!!) the best on the test set
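The fine practice above can be sketched in a few lines; `configurator_run` and `train_cost` are illustrative stand-ins for launching one configuration run with a given seed and for measuring cost on the training set.

```python
def select_incumbent(configurator_run, train_cost, n_runs=10, seeds=None):
    """Run several independent configurator runs and return the
    configuration with the best cost on the *training* set; the test
    set is touched only afterwards, for the final unbiased estimate."""
    seeds = seeds if seeds is not None else range(n_runs)
    candidates = [configurator_run(seed) for seed in seeds]
    return min(candidates, key=train_cost)   # never select on test cost
```

Selecting on training performance keeps the test-set estimate unbiased; selecting on test performance would silently turn the test set into a second training set.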
Decision #2: Choosing the Training Instances
• Works much better on homogeneous benchmarks
  – Instances that have something in common
    • E.g., come from the same problem domain
    • E.g., use the same encoding
  – One configuration likely to perform well on all instances

Pitfall: configuration on too heterogeneous sets — there often is no single great overall configuration (but see algorithm selection etc., later in the tutorial)
Decision #3: How Many Parameters to Expose?
• Suggestion: all parameters you don't know to be useless
  – More parameters → larger gains possible
  – More parameters → harder problem
  – Max. #parameters tackled so far: 768 [Thornton, Hutter, Hoos & Leyton-Brown, KDD '13]
• With more time you can search a larger space

Pitfall: including parameters that change the problem — e.g., the optimality threshold in MIP solving, or how much memory to allow the target algorithm
Decision #4: How to Wrap the Target Algorithm
• Do not trust any target algorithm
  – Will it terminate in the time you specify?
  – Will it correctly report its time?
  – Will it never use more memory than specified?
  – Will it be correct with all parameter settings?

Pitfall: blindly minimizing target algorithm runtime — typically, you will minimize the time to crash
Good practice: wrap target runs with a tool controlling time and memory (e.g., runsolver [Roussel et al. '11])
Good practice: verify correctness of target runs; detect crashes & penalize them
Further Best Practices and Pitfalls to Avoid
• Pitfalls
  – Using different wrappers for comparing different configurators
    • Often causes subtle bugs that completely invalidate the comparison
  – Not terminating algorithm runs properly
    • Use a generic wrapper to handle this for you (we provide one in Python)
  – File system issues when running many parallel experiments
• Best practices
  – Choose configuration spaces dependent on budget
    • If you have time for 10 evaluations, don't configure 100s of parameters
  – Realistic and/or reproducible running time metrics
    • CPU vs wall-clock time vs MEMS
  – Compare configurators on existing, open-source benchmarks
    • E.g., use AClib (in version 2: https://bitbucket.org/mlindauer/aclib2)
[Eggensperger, Lindauer & Hutter, arXiv 2017]
Overview
• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
  – Methods (components of algorithm configuration)
  – Systems (that instantiate these components)
  – Demo & practical issues
  – Case studies
• Portfolio-Based Algorithm Selection
• Software Development Support & Further Directions
Applications of Algorithm Configuration

Helped win competitions:
– SAT: since 2009
– ASP: since 2009
– IPC: since 2011
– Time-tabling: 2007
– SMT: 2007

Other academic applications:
– Mixed integer programming (MIP)
– TSP & Quadratic Assignment Problem
– Game theory: kidney exchange
– Linear algebra subroutines
– Improving Java garbage collection
– ML hyperparameter optimization
– Deep learning

Further application areas:
– FCC spectrum auction
– Mixed integer programming
– Analytics & optimization
– Social gaming
– Scheduling and resource allocation
Back to the Spear Example
Spear [Babic, 2007]
– 26 parameters
– 8.34 × 10^17 configurations
Ran ParamILS, 2 to 3 days × 10 machines
– On a training set from each of 2 distributions
Compared to default (1 week of manual tuning)
– On a disjoint test set from each distribution
4.5-fold speedup on one distribution, 500-fold speedup on the other
⇒ won QF_BV category in the 2007 SMT competition
[Scatter plots, log-log scale; below diagonal: speedup]
[Hutter, Babic, Hu & Hoos, FMCAD 2007]
Other Examples of PbO for SAT
• SATenstein [KhudaBukhsh, Xu, Hoos & Leyton-Brown, IJCAI 2009 & AIJ 2016]
  – Combined ingredients from existing solvers
  – 54 parameters, over 10^12 configurations
  – Speedup factors: 1.6× to 218×
• Captain Jack [Tompkins & Hoos, SAT 2011]
  – Explored a completely new design space
  – 58 parameters, over 10^50 configurations
  – After configuration: best known solver for 3sat10k and IL50k
Configurable SAT Solver Competition (CSSC)
• Annual SAT competition
  – Scores SAT solvers by their performance across instances
  – Medals for best average performance with solver defaults
    • Misleading results: implicitly highlights solvers with good defaults
• CSSC 2013 & 2014
  – Better reflects an application setting: homogeneous instances → can automatically optimize parameters
  – Medals for best performance after configuration
[Hutter, Balint, Bayless, Hoos & Leyton-Brown 2013]
CSSC Result #1
• Solver performance often improved a lot:
  – Lingeling on CircuitFuzz: timeouts 119 → 107
  – Clasp on n-queens: timeouts 211 → 102
  – probSAT on unif rnd 5-SAT: timeouts 250 → 0
[Hutter, Lindauer, Balint, Bayless, Hoos & Leyton-Brown 2014]
CSSC Result #2
• Automated configuration changed algorithm rankings
  – Example: random SAT+UNSAT category in 2013

Solver           CSSC ranking   Default ranking
Clasp                 1               6
Lingeling             2               4
Riss3g                3               5
Solver43              4               2
Simpsat               5               1
Sat4j                 6               3
For1-nodrup           7               7
gNovelty+GCwa         8               8
gNovelty+Gca          9               9
gNovelty+PCL         10              10

[Hutter, Lindauer, Balint, Bayless, Hoos & Leyton-Brown 2014]
Real-World Application: FCC Spectrum Auction
• Wireless frequency spectra: demand increases
  – The US Federal Communications Commission (FCC) is currently holding an auction
  – Expected net revenue: $10 billion to $40 billion
• Key computational problem: feasibility testing based on interference constraints
  – A hard graph colouring problem
  – 2991 stations (nodes) & 2.7 million interference constraints
  – Need to solve many different instances
  – More instances solved: higher revenue
• Best solution: based on SAT solving & configuration with SMAC
  – Improved #instances solved from 73% to 99.6%
[Frechette, Newman & Leyton-Brown, AAAI 2016]
Configuration of a Commercial MIP Solver
Mixed Integer Programming (MIP)
Commercial MIP solver: IBM ILOG CPLEX
– Leading solver for 15 years
– Licensed by over 1000 universities and 1300 corporations
– 76 parameters, 10^47 configurations
Minimizing runtime to optimal solution
– Speedup factor: 2× to 50×
– Later work: speedups up to 10,000×
Minimizing optimality gap reached
– Gap reduction factor: 1.3× to 8.6×
[Hutter, Hoos & Leyton-Brown, CPAIOR 2010]
Comparison to CPLEX Tuning Tool
CPLEX tuning tool
– Introduced in version 11 (late 2007, after ParamILS)
– Evaluates predefined good configurations, returns the best one
– Required runtime varies (from < 1 h to weeks)
ParamILS: anytime algorithm
– At each time step, keeps track of its incumbent
[Plot, lower is better: 2-fold speedup (our worst result) to 50-fold speedup (our best result)]
[Hutter, Hoos & Leyton-Brown, CPAIOR 2010]
Configuration of Machine Learning Algorithms
• Machine learning has celebrated substantial successes
• But it requires human machine learning experts to
  – Preprocess the data
  – Perform feature selection
  – Select a model family
  – Optimize hyperparameters
  – ...
• AutoML: taking the human expert out of the inner loop
  – Yearly AutoML workshops at ICML since 2014
  – PbO applied to machine learning
Auto-WEKA
WEKA [Witten et al., 1999–current]
– Most widely used off-the-shelf machine learning package
– Over 20,000 citations on Google Scholar
Java implementation of a broad range of methods
– 27 base classifiers (with up to 10 parameters each)
– 10 meta-methods
– 2 ensemble methods
– 3 feature search methods & 8 feature evaluators
Different methods work best on different datasets
– Want a true off-the-shelf solution
[Thornton, Hutter, Hoos & Leyton-Brown, KDD 2013]
WEKA's configuration space
Base classifiers
– 27 choices, each with up to 10 subparameters
– Coarse discretization: about 10^8 instantiations
Hierarchical structure on top of base classifiers
WEKA's configuration space (cont'd)
Feature selection
– Search method: which feature subsets to evaluate
– Evaluation method: how to evaluate feature subsets in search
– Both methods have subparameters → about 10^7 instantiations
In total: 768 parameters, 10^47 configurations
Auto-WEKA: Results
• Auto-WEKA performed better than the best base classifier
  – Even when the "best base classifier" is determined by an oracle
  – In 6/21 datasets more than 10% reduction in relative error
• Comparison to full grid search
  – Union of grids over the parameters of all 27 base classifiers
  – Auto-WEKA was 100 times faster
  – Auto-WEKA had better test performance in 15/21 cases
• Auto-WEKA based on SMAC vs TPE [Bergstra et al., NIPS 2011]
  – SMAC yielded better CV performance in 19/21 cases
  – SMAC yielded better test performance in 14/21 cases
  – Differences usually small, in 3 cases substantial (SMAC better)
Auto-WEKA as a WEKA plugin
• Command line example:
  java -cp autoweka.jar weka.classifiers.meta.AutoWEKAClassifier -t iris.arff -timeLimit 5 -no-cv
• GUI example: [screenshot]
Auto-sklearn
• Followed and extended Auto-WEKA's AutoML approach
• Scikit-learn optimized by SMAC, plus
  – Meta-learning to warmstart Bayesian optimization
  – Automated posthoc ensemble construction to combine the models Bayesian optimization evaluated
[Feurer, Klein, Eggensperger, Springenberg, Blum, Hutter, NIPS 2015]
Auto-sklearn's configuration space
• Scikit-learn [Pedregosa et al., 2011–current] instead of WEKA
  – 15 classifiers (with a total of 59 hyperparameters)
  – 13 feature preprocessors (42 hyperparameters)
  – 4 data preprocessors (5 hyperparameters)
• 110 hyperparameters vs 768 in Auto-WEKA
Auto-sklearn: Ready for Prime Time
• Winning approach in the 14-month AutoML challenge
  – Best-performing approach in the auto-track and the human track
  – Won both tracks in both final phases
  – Vs 150 teams of human experts
• Trivial to use
• Available online: https://github.com/automl/auto-sklearn
PbO for Deep Learning

• What is deep learning?
– Neural networks with many layers

• Why is there so much excitement about it?
– Dramatically improved the state of the art in many areas, e.g.,
• Speech recognition
• Image recognition
– Automatic learning of representations → no more manual feature engineering

• What changed?
– Larger datasets
– Better regularization methods, e.g., dropout [Hinton et al., 2012]
– Fast GPU implementations [Krizhevsky et al., 2012]
Source: Krizhevsky et al., 2012
Source: Le et al., 2012
Deep Learning is Sensitive to Many Choices

Choice of network architecture:
• # convolutional layers, # fully connected layers
• # units per layer
• Kernel sizes

… and 20-30 other numerical choices: learning rate schedule (initialization, decay, adaptation), momentum, batch normalization, batch size, # epochs, dropout rates, weight initializations, weight decay, …
Auto-Net for Computer Vision Data

• Application: object recognition

• Parameterized the Caffe framework [Jia, 2013]
– Convolutional neural network
– 9 network hyperparameters
– 12 hyperparameters per layer, up to 6 layers
– In total 81 hyperparameters

• Results for CIFAR-10
– New best result for CIFAR-10 without data augmentation
– SMAC outperformed TPE (only other applicable hyperparameter optimizer)

[Domhan, Springenberg, Hutter, IJCAI 2015]
Auto-Net in the AutoML Challenge

• Clearly won a dataset of the AutoML challenge
– 54491 data points, 5000 features, 18 classes

• Unstructured data → fully-connected network
– Up to 5 layers (with 3 layer hyperparameters each)
– 14 network hyperparameters, in total 29 hyperparameters
– Optimized for 18h on 5 GPUs

• Result (on private test set)
– AUC 90%
– All other (manual) approaches < 80%
Speedups by Prediction of Learning Curves

• Humans can look inside the black box
– They can predict the final performance of a target algorithm run early
– After few epochs of stochastic gradient descent
– Stop if not promising

• We automated that heuristic
– Fitted linear combination of 22 parametric models
– MCMC to preserve uncertainty over model parameters
– Stopped poor runs early: overall 2-fold speedup

[Domhan, Springenberg & Hutter, IJCAI 2015]
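A minimal sketch of this early-stopping heuristic, assuming a single power-law curve y(t) = a - b * t**(-c) fitted by crude grid search rather than the paper's linear combination of parametric models with MCMC:

```python
# Hedged sketch of learning-curve-based early stopping: fit a power law
# to the observed part of the curve, extrapolate to the final epoch, and
# keep running only if the prediction can still beat the best run so far.
# Grid search stands in for the paper's MCMC over model parameters.

def fit_power_law(ts, ys):
    # y(t) ~= a - b * t**(-c); brute-force grid search over (a, b, c)
    best, best_err = None, float("inf")
    for a in [i / 100 for i in range(0, 101)]:
        for b in [0.1, 0.2, 0.5, 1.0, 2.0]:
            for c in [0.25, 0.5, 1.0, 2.0]:
                err = sum((a - b * t ** (-c) - y) ** 2
                          for t, y in zip(ts, ys))
                if err < best_err:
                    best, best_err = (a, b, c), err
    return best

def keep_running(ts, ys, final_t, best_so_far):
    a, b, c = fit_power_law(ts, ys)
    predicted_final = a - b * final_t ** (-c)
    return predicted_final >= best_so_far  # accuracy: higher is better
```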
[Figure: SVM validation error over log(C) and log(γ) for subset sizes s_max/128, s_max/16, s_max/4, s_max]
Speedups by Reasoning over Data Subsets

• Problem: training is very slow for large datasets

• Solution approach: scaling up from subsets of given data

• Example: SVM
– Computational cost grows quadratically in dataset size s
– Error shrinks smoothly with s

[Klein, Bartels, Falkner, Hennig, Hutter, arXiv 2016]
Speedups by Reasoning over Data Subsets

• 10-100x speedup for optimizing SVM hyperparameters
• 5-10x speedup for convolutional neural networks

[Plot: error vs optimizer budget in seconds]

[Klein, Bartels, Falkner, Hennig, Hutter, arXiv 2016]
Summary of Algorithm Configuration

• Algorithm configuration
– Methods (components of algorithm configuration)
– Systems (that instantiate these components)
– Demo & practical issues
– Case studies

• Useful abstraction with many(!) applications
• Often better performance than human experts
• Much less human expert time required

Links to all our code: http://ml4aad.org
Overview

• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
• Portfolio-Based Algorithm Selection
– SATzilla: a framework for algorithm selection
– Hydra: automatic portfolio construction
• Software Development Tools and Further Directions
Motivation: no single great configuration exists

• Heterogeneous instance distributions
– Even the best overall configuration is not great; e.g.:

Configuration   Instance type 1   Instance type 2
#1              1s                1000s
#2              1000s             1s
#3              100s              100s

• No single best solver
– E.g., SAT solving: different solvers win different categories
– Virtual best solver (VBS) much better than single best solver (SBS)
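The SBS/VBS gap can be computed directly from a runtime matrix; this sketch uses the toy numbers from the table above:

```python
# Sketch: single best solver (SBS) vs virtual best solver (VBS)
# on a runtime matrix; numbers match the toy table above.

runtimes = {                  # seconds per (configuration, instance type)
    "#1": [1, 1000],
    "#2": [1000, 1],
    "#3": [100, 100],
}

def avg(xs):
    return sum(xs) / len(xs)

# SBS: the one configuration with the best average runtime overall
sbs = min(runtimes, key=lambda c: avg(runtimes[c]))

# VBS: per-instance best choice (an oracle selector)
vbs_runtimes = [min(col) for col in zip(*runtimes.values())]
```

Here the SBS (#3) averages 100s per instance while the VBS averages 1s, which is exactly the gap portfolio-based selection tries to close.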
Algorithm portfolios

Exploiting complementary strengths of different algorithms:
• Algorithm selection [Rice 1976]
• Algorithm schedules [Sayag et al. 2006]
• Parallel portfolios [Huberman et al. 1997]

[Figure: an algorithm selector uses a feature extractor to pick one algorithm per instance; schedules allot time slices per instance; parallel portfolios run algorithms concurrently]
Portfolios have been successful in many areas
*Algorithm Selection  †Sequential Execution  ‡Parallel Execution

• Satisfiability:
– SATzilla*† [various coauthors, cited in the following slides; 2003-17]
– 3S*† [Sellmann 2011]
– ppfolio‡ [Roussel 2011]
– claspfolio* [Gebser, Kaminski, Kaufmann, Schaub, Schneider, Ziller 2011]
– aspeed†‡ [Kaminski, Hoos, Schaub, Schneider 2012]

• Constraint Satisfaction:
– CPHydra*† [O'Mahony, Hebrard, Holland, Nugent, O'Sullivan 2008]
Portfolios have been successful in many areas
*Algorithm Selection  †Sequential Execution  ‡Parallel Execution

• Planning:
– FDStoneSoup† [Helmert, Röger, Karpas 2011]

• Mixed Integer Programming:
– ISAC* [Kadioglu, Malitsky, Sellmann, Tierney 2010]
– MIPzilla*† [Xu, Hutter, Hoos, Leyton-Brown 2011]

• ... and this is just the tip of the iceberg:
– http://dl.acm.org/citation.cfm?id=1456656 [Smith-Miles 2008]
– http://4c.ucc.ie/~larsko/assurvey [Kotthoff 2012]
Overview

• Programming by Optimization (PbO): Motivation and Introduction
• Algorithm Configuration
• Portfolio-Based Algorithm Selection
– SATzilla: a framework for algorithm selection
– Hydra: automatic portfolio construction
• Software Development Tools and Further Directions
SATzilla: the early core approach
[Leyton-Brown, Nudelman, Andrew, J. McFadden, Shoham 2003]
[Nudelman, Leyton-Brown, Devkar, Shoham, Hoos 2004]

• Training (part of algorithm development)
– Build a statistical model to predict runtime for each component algorithm

• Test (for each new instance)
– Predict performance for each algorithm
– Pick the algorithm predicted to be best

• Good performance in SAT competitions
– 2003: 2 silver, 1 bronze medals
– 2004: 2 bronze medals
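The train/select loop above can be sketched with a deliberately crude stand-in model (1-nearest neighbour on instance features instead of SATzilla's learned regression models); all data below is invented for illustration:

```python
# Hedged sketch of SATzilla's core idea: one runtime predictor per solver,
# trained on (instance features, runtime) pairs; at runtime, select the
# solver with the lowest predicted runtime for the new instance.
# The 1-NN "model" is a toy substitute for the real statistical models.

def predict_1nn(train, features):
    # train: list of (feature_vector, runtime); runtime of nearest point
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda fr: sqdist(fr[0], features))[1]

def select_solver(models, features):
    # models: {solver_name: training data}; pick lowest predicted runtime
    return min(models, key=lambda s: predict_1nn(models[s], features))
```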
SATzilla (stylized version)

• Given:
– training set of instances
– performance metric
– candidate solvers (incl. instance features)
– portfolio builder

• Training:
– collect performance data
– learn a model for selecting among solvers

• At Runtime:
– evaluate model
– run selected solver

[Figure: the portfolio builder combines the training set, metric and candidate solvers into a portfolio-based algorithm selector, which maps a novel instance to a selected solver]
SAT Instance Features (2003-14)

Over 100 features, including:
• Instance size (clauses, variables, clauses/variables, …)
• Syntactic properties (e.g., positive/negative clause ratio)
• Statistics of various constraint graphs
– factor graph
– clause–clause graph
– variable–variable graph
• Knuth's search space size estimate
• Tree search probing
• Local search probing
• Linear programming relaxation
SATzilla 2007

• Substantially extended features

• Early algorithm schedule: identify a set of "presolvers" and a schedule for running them
– For every choice of two presolvers + captimes, run the entire SATzilla pipeline and evaluate overall performance
– Keep the choice that yields best performance
– For later steps: discard instances solved by this presolving schedule

• Identify a "backup solver": SBS on remaining data
– Needed in case feature computation crashes

• 2007 SAT competition: 3 gold, 1 silver, 1 bronze medals

[Xu, Hutter, Hoos & Leyton-Brown, CP 2007; JAIR 2008]
SATzilla 2009

• Robustness: selection of best subset of component solvers
– Consider every subset of the given solver set
• omitting weak solvers prevents their accidental selection
• conditioned on choice of presolvers
• computationally cheap: models decompose across solvers
– Keep the subset that achieves best performance

• Fully automated procedure
– optimizes loss on a validation set

• 2009 SAT competition: 3 gold, 2 silver medals

[Xu, Hutter, Hoos & Leyton-Brown, CP 2007; JAIR 2008]
SATzilla 2011 and later: cost-sensitive DFs

• How it works:
– Build a classifier to determine which algorithm to prefer between each pair of algorithms in the given portfolio
– Loss function: cost of misclassification
• Decision forests and support vector machines have cost-sensitive variants

• Classifiers vote for different algorithms; select the algorithm with most votes
– Advantage: selection is a classification problem
– Advantage: big and small errors treated differently

• 2011 SAT competition: entered Evaluation Track (more later)

[Xu, Hutter, Hoos & Leyton-Brown, SAT 2012]
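The voting scheme itself is simple; in this sketch the trained pairwise cost-sensitive classifiers are stubbed out by a lookup table of per-instance preferences:

```python
# Sketch of SATzilla 2011-style selection by pairwise voting. The dict
# `prefer` stands in for the trained cost-sensitive decision forests:
# for each pair of solvers it names the one preferred on this instance.

def vote_select(solvers, prefer):
    votes = {s: 0 for s in solvers}
    for i, a in enumerate(solvers):
        for b in solvers[i + 1:]:
            votes[prefer[(a, b)]] += 1   # each pairwise classifier votes
    return max(solvers, key=lambda s: votes[s])
```

With k solvers this needs k*(k-1)/2 pairwise classifiers; each misclassification is weighted by its cost at training time, which is what makes big and small errors different.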
2012 SAT Challenge: Application

*Interacting multi-engine solvers: like portfolios, but richer interaction between solvers
2012 SAT Challenge: Hard Combinatorial
SAT Challenge 2012: Random
2012 SAT Challenge: Sequential Portfolio

• 3S deserves mentioning, but didn't rank officially
[Kadioglu, Malitsky, Sabharwal, Samulowitz, Sellmann, 2011]
– Disqualified on a technicality
• chose a buggy solver that returned an incorrect result
• an occupational hazard for portfolios!
– Overall performance nearly as strong as SATzilla
2013 onwards

• Algorithm selection a victim of its own success

• 2013: "The emphasis of SAT Competition 2013 is on evaluation of core solvers:"
– Single-core portfolios of >2 solvers not eligible
– One "open track" allowing parallel solvers, portfolios, etc.
– That open track was dominated by portfolios

• 2014 onwards
– "SAT Competition 2014 only allows submission of core solvers"
– Portfolio researchers started their own competition: the ICON Algorithm Selection Challenge
AutoFolio: PbO for Algorithm Selection

• Define a general space of algorithm selection methods
– AutoFolio's configuration space: 54 choices

• Use algorithm configuration to select the best instantiation
– Partition the training benchmark instances into 10 folds
– Use SMAC to find the algorithm selection approach & its hyperparameters that optimize CV performance

[Lindauer, Hoos, Hutter & Schaub, JAIR 2015]
ICON Algorithm Selection Challenge 2015

• Ingredients of an algorithm selection (AS) benchmark
– A set of solvers
– A set of benchmark instances (split into training & test)
– Measured performance of all solvers on all instances

• Algorithm selection competition:
– 13 AS benchmarks from SAT, MaxSAT, CSP, ASP, QBF, Premarshalling
– 9 competitors using regression, classification, clustering, k-NN, etc.

• Winning algorithms in the 3 tracks:
– Penalized average runtime: AutoFolio
– Number of instances solved: AutoFolio
– Frequency of selecting the best algorithm: SATzilla
– Overall winner: SATzilla (2nd in first two tracks)
Try it yourself!

• SATzilla and AutoFolio are freely available online:
http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/
http://ml4aad.org/autofolio

• You can try them for your problem
– we have features for SAT, MIP, AI planning and TSP
– you need to provide features for other domains
• in many cases, the general ideas behind our features apply
• can also make features by reducing your problem to e.g. SAT and computing the SAT features
Automatically Configuring Algorithms for Portfolio-Based Selection
Xu, Hoos, Leyton-Brown (2010); Kadioglu et al. (2010)
Note:
I SATzilla builds algorithm selectors based on a given set of SAT solvers
but: success entirely depends on the quality of the given solvers

I Automated configuration produces solvers that work well on average on a given set of SAT instances
(e.g., SATenstein – KhudaBukhsh, Xu, Hoos, Leyton-Brown 2009)
but: may have to settle for compromises for broad, heterogeneous instance sets

Idea: Combine the two approaches → portfolio-based selection from a set of automatically constructed solvers
Combined Configuration + Selection

[Figure: a feature extractor and selector choose among multiple configurations of a single parametric algorithm]
Approach #1:

1. build solvers for various types of instances using automated algorithm configuration

2. construct portfolio-based selector from these

Problem: requires suitably defined sets of instances

Solution: automatically partition heterogeneous instance set
Instance-specific algorithm configuration (ISAC)
Kadioglu, Malitsky, Sellmann, Tierney (2010); Malitsky, Sellmann (2012)
1. cluster training instances based on features (using G-means)

2. configure given parameterised algorithm independently for each cluster (using GGA)

3. construct portfolio-based selector from resulting configurations (using distance to cluster centroids)
Drawback: Instance features may not correlate well with the impact of algorithm parameters on performance (e.g., uninformative features)
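ISAC's selection step can be sketched as nearest-centroid assignment; the clusters and configurations below are toy data (the real ISAC clusters with G-means and tunes each cluster's configuration with GGA):

```python
# Minimal sketch of ISAC-style selection: each training cluster has its
# own tuned configuration; a new instance gets the configuration of the
# nearest cluster centroid. All names and data here are illustrative.

def centroid(points):
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def isac_select(clusters, configs, features):
    # clusters: {name: list of feature vectors}; configs: {name: config}
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centroids = {name: centroid(pts) for name, pts in clusters.items()}
    nearest = min(centroids,
                  key=lambda name: sqdist(centroids[name], features))
    return configs[nearest]
```

The drawback noted above shows up directly here: if the features do not separate instances by which configuration works, the nearest centroid picks a poor configuration.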
Approach #2:
Key idea: Augment existing selector AS by targeting instances on which AS performs poorly
(cf. Leyton-Brown et al. 2003; Leyton-Brown et al. 2009)

I interleave configuration and selector construction

I in each iteration, determine the configuration that complements the current selector best
Advantages:
I any-time behaviour: iteratively adds configurations
I desirable theoretical guarantees (under idealising assumptions)
Hydra
Xu, Hoos, Leyton-Brown (2010); Xu, Hutter, Hoos, Leyton-Brown (2011)

1. configure given target algorithm A on complete instance set I
→ configuration A1 = selector AS1 (always selects A1)

2. configure a new copy of A on I such that performance of selector AS := AS1 + Anew is optimised
→ configuration A2, selector AS2 := AS1 + A2 (selects from {A1, A2})

3. configure a new copy of A on I such that performance of selector AS := AS2 + Anew is optimised
→ configuration A3, selector AS3 := AS2 + A3 (selects from {A1, A2, A3})

...
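The Hydra loop can be sketched with an oracle selector (per-instance best) standing in for the trained selector, and a toy runtime matrix standing in for configurator output:

```python
# Hedged sketch of Hydra's greedy loop: in each iteration, add the
# candidate configuration whose inclusion most improves the oracle
# (per-instance-best) portfolio runtime. In the real system each
# candidate comes from a configurator run, not a precomputed table.

def oracle_perf(portfolio, runtimes):
    # mean over instances of the best runtime among portfolio members
    n = len(next(iter(runtimes.values())))
    return sum(min(runtimes[c][i] for c in portfolio)
               for i in range(n)) / n

def hydra(candidates, runtimes, iterations):
    portfolio = []
    for _ in range(iterations):
        best = min(candidates,
                   key=lambda c: oracle_perf(portfolio + [c], runtimes))
        portfolio.append(best)
    return portfolio
```

Note how the second iteration picks the complementary configuration rather than the second-best standalone one; that is the point of optimising the portfolio's performance instead of each configuration's.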
Note:

I effectively adds A with maximal marginal contribution in each iteration

I estimate marginal contribution using perfect selector (oracle)
→ avoids costly construction of selectors during configuration

I works well using FocusedILS for configuration, *zilla for selection (but can use other configurators, selectors)

I can be further improved by adding multiple configurations per iteration; using performance estimates from configurator
Results on SAT:
I target algorithm: SATenstein-LS (KhudaBukhsh et al. 2009)
I 6 well-known benchmark sets of SAT instances (application, crafted, random)
I 7 iterations of Hydra
I 10 configurator runs per iteration, 1 CPU day each
Results on mixture of 6 benchmark sets

[Scatter plot: PAR score of Hydra[BM, 1] (x-axis) vs Hydra[BM, 7] (y-axis), log scales from 10^-2 to 10^3]
Results on mixture of 6 benchmark sets

[Plot: PAR score (0-300) vs number of Hydra steps (0-8); curves: Hydra[BM] training, Hydra[BM] test, SATenFACT on training, SATenFACT on test]
Note:
I good results also for MIP (CPLEX) (Xu, Hutter, Hoos, Leyton-Brown 2011)

I idea underlying Hydra can also be applied to automatically construct parallel algorithm portfolios from a single parameterised target algorithm (Hoos, Leyton-Brown, Schaub, Schneider 2012–14)
Software Development Support
and Further Directions
Software development in the PbO paradigm

[Figure: PbO-<L> source(s) are transformed by the PbO-<L> weaver, guided by a design space description, into parametric <L> source(s) and then instantiated <L> source(s); a PbO design optimiser uses benchmark inputs and the use context to produce the deployed executable]
Design space specification
Option 1: use language-specific mechanisms
I command-line parameters
I conditional execution
I conditional compilation (ifdef)
Option 2: generic programming language extension
Dedicated support for . . .
I exposing parameters
I specifying alternative blocks of code
Advantages of generic language extension:
I reduced overhead for programmer
I clean separation of design choices from other code
I dedicated PbO support in software development environments
Key idea:
I augmented sources: PbO-Java = Java + PbO constructs, . . .
I tool to compile down into target language: weaver
Exposing parameters

...
numerator -= (int) (numerator / (adjfactor+1) * 1.4);
...

...
##PARAM(float multiplier=1.4)
numerator -= (int) (numerator / (adjfactor+1) * ##multiplier);
...

I parameter declarations can appear at arbitrary places (before or after first use of parameter)

I access to parameters is read-only (values can only be set/changed via command-line or config file)
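A toy illustration of what instantiation of the ##PARAM construct might look like; this regex-based substitute is not the actual PbO-<L> weaver:

```python
# Toy sketch of a weaver's "instantiation mode" for the ##PARAM construct
# above: drop the declaration and hardwire chosen (or default) parameter
# values into the source. Illustrative only, not the real PbO-<L> weaver.

import re

def instantiate(source, values):
    # collect declared defaults, e.g. ##PARAM(float multiplier=1.4)
    defaults = dict(re.findall(r'##PARAM\(\s*\w+\s+(\w+)=([^)]*)\)', source))
    # remove the declarations themselves
    source = re.sub(r'##PARAM\([^)]*\)\s*\n?', '', source)
    # substitute every ##name with the chosen or default value
    return re.sub(r'##(\w+)',
                  lambda m: values.get(m.group(1), defaults[m.group(1)]),
                  source)
```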
Specifying design alternatives
I Choice: set of interchangeable fragments of code that represent design alternatives (instances of choice)

I Choice point: location in a program at which a choice is available
##BEGIN CHOICE preProcessing
<block 1>
##END CHOICE preProcessing
Specifying design alternatives
I Choice: set of interchangeable fragments of code that represent design alternatives (instances of choice)

I Choice point: location in a program at which a choice is available
##BEGIN CHOICE preProcessing=standard
<block S>
##END CHOICE preProcessing
##BEGIN CHOICE preProcessing=enhanced
<block E>
##END CHOICE preProcessing
Specifying design alternatives
I Choice: set of interchangeable fragments of code that represent design alternatives (instances of choice)

I Choice point: location in a program at which a choice is available
##BEGIN CHOICE preProcessing
<block 1>
##END CHOICE preProcessing
...
##BEGIN CHOICE preProcessing
<block 2>
##END CHOICE preProcessing
Specifying design alternatives
I Choice: set of interchangeable fragments of code that represent design alternatives (instances of choice)

I Choice point: location in a program at which a choice is available
##BEGIN CHOICE preProcessing
<block 1a>
##BEGIN CHOICE extraPreProcessing
<block 2>
##END CHOICE extraPreProcessing
<block 1b>
##END CHOICE preProcessing
The Weaver
transforms PbO-<L> code into <L> code (<L> = Java, C++, . . . )
I parametric mode:
I expose parameters
I make choices accessible via (conditional, categorical)parameters
I (partial) instantiation mode:
I hardwire (some) parameters into code (expose others)

I hardwire (some) choices into code (make others accessible via parameters)
The road ahead
I Support for PbO-based software development
I Weavers for PbO-C, PbO-C++, PbO-Java
I PbO-aware development platforms
I Improved / integrated PbO design optimiser
I Debugging and performance analysis tools
I Best practices
I Many further applications
I Scientific insights
Which choices matter?
Observation: Some design choices matter more than others
depending on . . .
I algorithm under consideration
I given use context
Knowledge which choices / parameters matter may . . .
I guide algorithm development
I facilitate configuration
3 recent approaches:

I Forward selection based on empirical performance models
Hutter, Hoos, Leyton-Brown (2013)

I Functional ANOVA based on empirical performance models
Hutter, Hoos, Leyton-Brown (2014)

I Ablation analysis
Fawcett & Hoos (2013–16)
Functional ANOVA based on empirical performance models
Hutter, Hoos, Leyton-Brown (2014)

Key idea:

I build regression model of algorithm performance as a function of all input parameters (= design choices)
→ empirical performance models (EPMs)

I analyse variance in model output (= predicted performance) due to each parameter, parameter interactions

I importance of parameter: fraction of performance variation over configuration space explained by it (main effect)

I analogous for sets of parameters (interaction effects)
Decomposition of variance in a nutshell

For parameters p1, . . . , pn and a function (performance model) y:

y(p1, . . . , pn) = μ
+ f1(p1) + f2(p2) + · · · + fn(pn)
+ f1,2(p1, p2) + f1,3(p1, p3) + · · · + fn−1,n(pn−1, pn)
+ f1,2,3(p1, p2, p3) + · · ·
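A worked instance of this decomposition on a 2x2 grid of two binary parameters (toy performance numbers):

```python
# Worked example of the variance decomposition above: compute the grand
# mean mu, the main effects f1 and f2, and the share of total variance
# each explains; what remains is the p1-p2 interaction effect.

y = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 20.0, (1, 1): 26.0}

mu = sum(y.values()) / len(y)                      # grand mean

# main effects: marginal mean minus grand mean
f1 = {a: sum(y[(a, b)] for b in (0, 1)) / 2 - mu for a in (0, 1)}
f2 = {b: sum(y[(a, b)] for a in (0, 1)) / 2 - mu for b in (0, 1)}

total_var = sum((v - mu) ** 2 for v in y.values()) / len(y)
var_f1 = sum(e ** 2 for e in f1.values()) / 2      # explained by p1
var_f2 = sum(e ** 2 for e in f2.values()) / 2      # explained by p2
```

Here total variance is 41, of which p1 explains 36, p2 explains 4, and the remaining 1 is the interaction term; the "importance" fractions on the next slides are exactly var_f1/total_var etc.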
Note:

I Straightforward computation of main and interaction effects is intractable.
(integration over combinatorial spaces of configurations)

I For random forest models, marginal performance predictions and variance decomposition (up to constant-sized interactions) can be computed exactly and efficiently.
Empirical study:
I 8 high-performance solvers for SAT, ASP, MIP, TSP (4–85 parameters)

I 12 well-known sets of benchmark data (random + real-world structure)

I random forest models for performance prediction, trained on 10 000 randomly sampled configurations per solver + data from 25+ runs of the SMAC configuration procedure
Fraction of variance explained by main effects:

CPLEX on RCW (comp sust)                   70.3%
CPLEX on CORLAT (comp sust)                35.0%
Clasp on software verification             78.9%
Clasp on DB query optimisation             62.5%
CryptoMiniSAT on bounded model checking    35.5%
CryptoMiniSAT on software verification     31.9%
Fraction of variance explained by main + 2-interaction effects:

CPLEX on RCW (comp sust)                   70.3% + 12.7%
CPLEX on CORLAT (comp sust)                35.0% + 8.3%
Clasp on software verification             78.9% + 14.3%
Clasp on DB query optimisation             62.5% + 11.7%
CryptoMiniSAT on bounded model checking    35.5% + 20.8%
CryptoMiniSAT on software verification     31.9% + 28.5%
Note: may pick up variation caused by poorly performing configurations

Simple solution: cap at default performance or a quantile from the distribution of randomly sampled configurations; build model from capped data.
Ablation analysis
Fawcett & Hoos (2013, 2016)

Key idea:

I given two configurations, A and B, change one parameter at a time to get from A to B
→ ablation path

I in each step, change the parameter that achieves maximal gain (or minimal loss) in performance

I for computational efficiency, use racing (F-race) for evaluating parameters considered in each step
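The greedy ablation path can be sketched as follows, with a cheap stand-in perf function in place of actual target-algorithm runs (and no racing):

```python
# Hedged sketch of an ablation path from configuration A to B: at each
# step, flip the single parameter (to B's value) that yields the best
# performance. perf() is a toy stand-in; a real study measures the
# target algorithm's (penalised) runtime, using F-race for efficiency.

def ablation_path(a, b, perf):
    current, path = dict(a), []
    while current != b:
        candidates = [p for p in b if current[p] != b[p]]
        best = min(candidates,
                   key=lambda p: perf({**current, p: b[p]}))  # lower = better
        current[best] = b[best]
        path.append((best, perf(current)))
    return path
```

The order in which parameters get flipped, and the performance gain at each flip, is exactly the importance information reported on the next slides.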
Empirical study:
I high-performance solvers for SAT, MIP, AI Planning (26–76 parameters), well-known sets of benchmark data (real-world structure)

I optimised configurations obtained from ParamILS (minimisation of penalised average running time; 10 runs per scenario, 48 CPU hours each)
Ablation between default and optimised configurations:

[Plot: performance (PAR10, s, log scale 0.1-100) vs number of parameters modified from default (0-25), for LPG on the Depots planning domain; curves: default to configured, configured to default]
Which parameters are important?
LPG on depots:
I cri intermediate levels (43% of overall gain!)
I triomemory
I donot try suspected actions
I walkplan
I weight mutex in relaxed plan
Note: Importance of parameters varies between planning domains
New development: Ablation based on surrogate models (Biedenkapp, Lindauer, Eggensperger, Hutter, Fawcett, Hoos 2017)
Algorithm configuration: parameter importance

Algorithm selection: component contribution
Xu, Hutter, Hoos, Leyton-Brown (2012)

Consider: portfolio-based algorithm selector AS with candidate algorithms A1, A2, . . . , Ak

Question: How much does each Ai contribute to the overall performance of AS?
Marginal contribution of Ai to portfolio-based selector AS

= difference in performance of AS with and without Ai (trained separately)

≠ frequency of selecting Ai

≠ fraction of instances solved by Ai

≠ contribution of Ai to virtual best solver (VBS)
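The "with and without Ai" difference can be sketched using the oracle (per-instance best) as a stand-in for a trained selector, on a toy runtime matrix:

```python
# Sketch of marginal contribution: portfolio performance with and
# without solver Ai. The oracle (per-instance best) stands in for a
# trained selector; runtimes and cutoff below are toy values.

def solved(portfolio, runtimes, cutoff):
    # fraction of instances some portfolio member solves within cutoff
    n = len(next(iter(runtimes.values())))
    return sum(any(runtimes[s][i] <= cutoff for s in portfolio)
               for i in range(n)) / n

def marginal_contribution(solver, portfolio, runtimes, cutoff):
    rest = [s for s in portfolio if s != solver]
    return (solved(portfolio, runtimes, cutoff)
            - solved(rest, runtimes, cutoff))
```

In the test below, solver A solves half the instances yet has zero marginal contribution, because C covers the same instances; this is why marginal contribution differs from "fraction of instances solved".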
Application to SATzilla:

I all instances from 2011 SAT Competition: 300 Application; 300 Crafted; 300 Random

I candidate solvers from 2011 SAT Competition:

I for determining virtual best solver (VBS) and single best solver (SBS): all solvers from Phase 2 of the competition: 31 Application; 25 Crafted; 17 Random

I for building SATzilla: all sequential, non-portfolio solvers from Phase 2: 18 Application; 15 Crafted; 9 Random

I SATzilla assessed by 10-fold cross-validation
SATzilla 2011 Performance (Instances Solved)

Solver                 Application   Crafted   Random
VBS                    84.7%         76.3%     82.2%
SATzilla 2011          75.3%         66.0%     80.8%
SATzilla 2009          70.3%         63.0%     80.3%
Gold medalist (SBS)    71.7%         54.3%     68.0%
Performance of Individual Solvers – Application

[Bar chart: percentage of instances solved (5000 CPU sec cutoff) by each of the 18 Application solvers, from QuteRSat and CryptoMinisat through Lingeling and Precosat to Cirminisat and RestartSAT]
Correlation of Solver Performance – Application

[Heat map over the 18 Application solvers; darker = higher Spearman correlation coefficient]
Correlation of Solver Performance – Random

[Heat map over the 9 Random solvers (Sparrow, EagleUP, Gnovelty+2, TNM, Sattime11, Adaptg2wsat11, MPhaseSAT_M, March_rw, March_hi); darker = higher Spearman correlation coefficient]
Solver Selection Frequency in SATzilla 2011 – Application

[Pie chart: selection frequencies of Glucose2 (backup), Glucose2, Glueminisat, QuteRSat, Precosat, other solvers, and instances solved by presolvers]
Instances Solved by SATzilla 2011 Components – Application

[Pie chart: instances solved by Glucose2 (backup), Glucose2 (Pre1), Glucose2, Glueminisat (Pre1), Glueminisat, QuteRSat, Precosat, EBGlucose (Pre1), EBGlucose, Minisat_psm (Pre1), Minisat_psm, and other solvers; plus unsolved instances]
Marginal Contribution of Components – Application

[Bar chart: marginal contribution (%) of each of the 18 Application solvers, ranging from 0 to 10%]
Instances Solved vs Marginal Contribution of Components – Application

[Scatter plot: % solved by component solver (0-60) vs marginal contribution (0-10%); MPhaseSAT64 and Glueminisat highlighted]
Instances Solved vs Marginal Contribution of Components – Crafted

[Scatter plot: % solved by component solver vs marginal contribution; Sol, Clasp2, Sattime, MPhaseSAT highlighted]

Joint contributions:
- 2 Clasp variants = 6.3%
- 2 Sattime variants = 5.4%
Instances Solved vs Marginal Contribution of Components – Random

[Scatter plot: % solved by component solver vs marginal contribution; EagleUP, Sparrow, March_rw highlighted]

Joint contributions:
- 2 March variants = 4%
- 6 LS solvers = 22.5%
More nuanced analysis based on Shapley value:
AAAI-16 paper by Frechette et al.
Bias + correction in portfolio performance evaluation:
IJCAI-16 paper by Cameron et al.
Leveraging parallelism

I design choices in parallel programs (Hamadi, Jabbour, Sais 2009)

I deriving parallel programs from sequential sources
→ concurrent execution of optimised designs (parallel portfolios) (Hoos, Leyton-Brown, Schaub, Schneider 2012)

I parallel design optimisers (e.g., Hutter, Hoos, Leyton-Brown 2012)

I use of cloud resources (parallel runs of design optimisers, ...) (Geschwender, Hutter, Kotthoff, Malitsky, Hoos, Leyton-Brown 2014)
Take-home Message

Programming by Optimisation ...

I leverages computational power to construct better software

I enables creative thinking about design alternatives

I produces better performing, more flexible software

I facilitates scientific insights into
I efficacy of algorithms and their components
I empirical complexity of computational problems

... changes how we build and use high-performance software
More Information:
I www.prog-by-opt.net/Tutorials/IJCAI-17
I www.prog-by-opt.net
I PbO article in Communications of the ACM (Hoos 2012)
I Talk by Kleinberg et al.: Tue, 15:15, Room 216
Talk by Lindauer et al.: Fri, 10:50, Room 203
Invited talk by HH at CP/ICLP/SAT: Tue, 08/29, 9:00
I Forthcoming book (Morgan & Claypool)
If PbO works for you:
I Make our day – let us know!
I Share the joy – tell everyone else!