Page 1: Automatic Algorithm Configuration based on Local Search EARG presentation December 13, 2006 Frank Hutter.

Automatic Algorithm Configuration based on Local Search

EARG presentation, December 13, 2006

Frank Hutter

Page 2:

Motivation

• Want to design “best” algorithm A to solve a problem
- Many design choices need to be made
- Some choices are deferred to later: free parameters of the algorithm
- Set parameters to maximise empirical performance

• Finding the best parameter configuration is non-trivial
- Many parameter configurations
- Many test instances
- Many runs needed to get realistic estimates for randomised algorithms
- Tuning is still often done manually, taking up to 50% of development time

• Let’s automate tuning!

Page 3:

Parameters in different research areas

• NP-hard problems: tree search
- Variable/value heuristics, learning, restarts, …

• NP-hard problems: local search
- Percentage of random steps, tabu length, strength of escape moves, …

• Nonlinear optimisation: interior point methods
- Slack, barrier init, barrier decrease rate, bound multiplier init, …

• Computer vision: object detection
- Locality, smoothing, slack, …

• Supervised machine learning (NOT model parameters)
- L1/L2 loss, penalizer, kernel, preprocessing, numerical optimizer, …

• Compiler optimisation, robotics, …

Page 4:

Related work

• Best fixed parameter setting
- Search approaches [Minton ’93, ’96], [Hutter ’04], [Cavazos & O’Boyle ’05], [Adenso-Diaz & Laguna ’06], [Audet & Orban ’06]
- Racing algorithms / bandit solvers [Birattari et al. ’02], [Smith et al. ’04–’06]
- Stochastic optimisation [Kiefer & Wolfowitz ’52], [Geman & Geman ’84], [Spall ’87]

• Per instance
- Algorithm selection [Knuth ’75], [Rice ’76], [Lobjois & Lemaître ’98], [Leyton-Brown et al. ’02], [Gebruers et al. ’05]
- Instance-specific parameter setting [Patterson & Kautz ’02]

• During algorithm run
- Portfolios [Kautz et al. ’02], [Carchrae & Beck ’05], [Gagliolo & Schmidhuber ’05, ’06]
- Reactive search [Lagoudakis & Littman ’01, ’02], [Battiti et al. ’05], [Hoos ’02]

Page 5:

Static Algorithm Configuration (SAC)

SAC problem instance: 3-tuple (D, A, Θ), where D is a distribution of problem instances, A is a parameterised algorithm, and Θ is the parameter configuration space of A.

Candidate solution: configuration θ ∈ Θ, with expected cost

C(θ) = E_{I∼D}[ Cost(A, θ, I) ]

Stochastic Optimisation Problem

Page 6:

Static Algorithm Configuration (SAC)

SAC problem instance: 3-tuple (D, A, Θ), where D is a distribution of problem instances, A is a parameterised algorithm, and Θ is the parameter configuration space of A.

Candidate solution: configuration θ ∈ Θ, with expected cost

C(θ) = statistic[ CD(A, θ, D) ]

CD(A, θ, D): cost distribution of algorithm A with parameter configuration θ across instances drawn from D. Variation is due to the randomisation of A and to variation across instances.

Stochastic Optimisation Problem
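In practice C(θ) can only be approximated from finitely many samples. A minimal sketch of such an estimator, assuming a hypothetical `run_algorithm(theta, instance, rng)` callback that returns one observed cost (none of these names appear in the talk):

```python
import random

def estimate_cost(run_algorithm, theta, instances, runs_per_instance=3, seed=0):
    """Approximate C(theta) by the sample mean of observed costs over a set
    of instances and repeated runs of the randomised algorithm A.
    `run_algorithm` is a hypothetical callback (an assumption, not from the
    slides) returning one cost observation, e.g. a runlength."""
    rng = random.Random(seed)
    costs = [
        run_algorithm(theta, inst, rng)
        for inst in instances
        for _ in range(runs_per_instance)
    ]
    return sum(costs) / len(costs)
```

For statistics other than the mean (median, quantiles), the same sampling loop applies with a different aggregation of `costs`.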

Page 7:

Parameter tuning in practice

• Manual approaches are often fairly ad hoc
- Full factorial design: expensive (exponential in the number of parameters)
- Tweak one parameter at a time: only optimal if parameters are independent
- Tweak one parameter at a time until no more improvement is possible: local search → local minimum

• Manual approaches are suboptimal
- Only find poor parameter configurations
- Very long tuning time
→ Want to automate

Page 8:

Simple Local Search in Configuration Space

Page 9:

Iterated Local Search (ILS)

Page 10:

ILS in Configuration Space
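The slide itself is a figure; as a rough sketch (an assumed shape, not the exact procedure from the talk), ILS over a discrete configuration space alternates first-improvement descent with random perturbations of the incumbent:

```python
import random

def ils_configurations(evaluate, neighbours, theta0, perturb, iterations=20, seed=0):
    """Iterated Local Search over a discrete configuration space (a sketch).
    `evaluate` returns the (approximate) cost of a configuration,
    `neighbours` yields one-parameter modifications, and `perturb` applies a
    larger random change to escape local minima."""
    rng = random.Random(seed)

    def local_search(theta):
        # first-improvement descent over the one-exchange neighbourhood
        improved = True
        while improved:
            improved = False
            for cand in neighbours(theta):
                if evaluate(cand) < evaluate(theta):
                    theta, improved = cand, True
                    break
        return theta

    incumbent = local_search(theta0)
    for _ in range(iterations):
        candidate = local_search(perturb(incumbent, rng))
        if evaluate(candidate) <= evaluate(incumbent):  # accept if no worse
            incumbent = candidate
    return incumbent

# Toy usage: one integer "parameter" in [0, 20], true optimum at 7.
best = ils_configurations(
    evaluate=lambda x: (x - 7) ** 2,
    neighbours=lambda x: [v for v in (x - 1, x + 1) if 0 <= v <= 20],
    theta0=0,
    perturb=lambda x, rng: rng.randint(0, 20),
)
```

In the configuration-space setting, `neighbours` would change one parameter value at a time and `perturb` would change several at once.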

Page 11:

What is the objective function?

• User-defined objective function
- E.g. expected runtime across a number of instances
- Or expected speedup over a competitor
- Or average approximation error
- Or anything else

• BUT: must be able to approximate the objective based on a finite (small) number of samples
- Statistic is expectation → sample mean (weak law of large numbers)
- Statistic is median → sample median (converges?)
- Statistic is 90% quantile → sample 90% quantile (underestimated with small samples!)
- Statistic is maximum (supremum) → cannot generally be approximated from a finite sample
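The q90 underestimation is easy to check numerically. A small Monte Carlo sketch (my illustration, not from the talk): for Exp(1) the true 90% quantile is −ln(0.1) ≈ 2.30, but the empirical 90% quantile of size-10 samples (the 9th order statistic) has expectation H₁₀ − 1 ≈ 1.93.

```python
import math
import random

def sample_q90(xs):
    """Empirical 90% quantile: the value at index ceil(0.9 * n) - 1 of the
    sorted sample."""
    ys = sorted(xs)
    return ys[math.ceil(0.9 * len(ys)) - 1]

def mean_sample_q90(n=10, trials=20000, seed=0):
    """Average the empirical q90 of `trials` samples of size n from Exp(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sample_q90([rng.expovariate(1.0) for _ in range(n)])
    return total / trials

true_q90 = -math.log(0.1)  # ≈ 2.30; mean_sample_q90(10) lands well below this
```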

Page 12:

Parameter tuning as “pure” optimisation problem

• Approximate the objective function (a statistic of a distribution) based on a fixed number of N instances
- Beam search [Minton ’93, ’96]
- Genetic algorithms [Cavazos & O’Boyle ’05]
- Experimental design & local search [Adenso-Diaz & Laguna ’06]
- Mesh adaptive direct search [Audet & Orban ’06]

• But how large should N be?
- Too large: takes too long to evaluate a configuration
- Too small: very noisy approximation → over-tuning

Page 13:

Minimum is a biased estimator

• Let x1, …, xn be realisations of random variables X1, …, Xn

• Each xi is a priori an unbiased estimator of E[Xi]
- Let xj = min(x1, …, xn)
- xj is an unbiased estimator of E[min(X1, …, Xn)], but NOT of E[Xj] (because we are conditioning on xj being the minimum!)

• Example
- Let Xi ∼ Exp(λ), i.e. F(x|λ) = 1 − exp(−λx)
- Then E[min(X1, …, Xn)] = 1/(nλ) = (1/n) · E[Xi]
- I.e. if we just take the minimum and report its runtime, we underestimate the cost by a factor of n (over-confidence)

• Similar issues arise for cross-validation etc.
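The factor-of-n claim for the exponential example can be verified with a quick Monte Carlo sketch (my illustration; the function and parameter names are mine):

```python
import random

def mean_of_min(n=10, lam=1.0, trials=50000, seed=0):
    """Average the minimum of n i.i.d. Exp(lam) draws. The minimum of n such
    draws is itself Exp(n * lam), so its mean is 1/(n * lam): a factor of n
    below the single-draw mean E[X_i] = 1/lam."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(lam) for _ in range(n))
    return total / trials

# With n=10 and lam=1, the estimate concentrates near 1/(10*1) = 0.1,
# one tenth of E[X_i] = 1.
```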

Page 14:

Primer: over-confidence for N=1 sample

y-axis: runlength of the best configuration found so far
Training: approximation based on N = 1 sample (1 run, 1 instance)
Test: 100 independent runs (on the same instance) for each θ
Median & quantiles over 100 repetitions of the procedure

Page 15:

Over-tuning

• More training → worse performance
- Training cost monotonically decreases
- Test cost can increase

• A big error in the cost approximation leads to over-tuning (in expectation):
- Let θ* ∈ Θ be the optimal parameter configuration
- Let θ′ ∈ Θ be a suboptimal parameter configuration with better training performance than θ*
- If the search finds θ* before θ′ (and it is the best one so far), and the search then finds θ′, training cost decreases but test cost increases
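This mechanism shows up even in a toy model (my illustration, not the talk's experiment): give each configuration a true test cost, approximate its training cost by one noisy sample, and track the incumbent with the best training cost. The training curve only ever decreases, while the test curve of the incumbent can jump up.

```python
import random

def overtuning_demo(num_configs=50, noise=2.0, seed=0):
    """Toy over-tuning simulation (assumed setup). Returns the training-cost
    and true-cost curves of the incumbent as configurations are scanned."""
    rng = random.Random(seed)
    true_costs = [rng.uniform(1.0, 10.0) for _ in range(num_configs)]
    train_costs = [c + rng.gauss(0.0, noise) for c in true_costs]

    best_train = float("inf")
    best_true = None
    train_curve, test_curve = [], []
    for c_train, c_true in zip(train_costs, true_costs):
        if c_train < best_train:  # new incumbent on *training* cost
            best_train, best_true = c_train, c_true
        train_curve.append(best_train)
        test_curve.append(best_true)
    return train_curve, test_curve
```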

Page 16:

1, 10, 100 training runs (qwh)

Page 17:

Over-tuning on uf400 instance

Page 18:

Other approaches without over-tuning

• Another approach: small N for poor configurations θ, large N for good ones
- Racing algorithms [Birattari et al. ’02, ’05]
- Bandit solvers [Smith et al. ’04–’06]
- But these treat all parameter configurations as independent
- May work well for small configuration spaces
- E.g. SAT4J has over a million possible configurations → even a single run each is infeasible

• My work: combination of approaches
- Local search in parameter configuration space
- Start with N = 1 for each θ, increase it whenever θ is re-visited
- Does not have to visit all configurations; good ones are visited often
- Does not suffer from over-tuning
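A minimal sketch of this adaptive-N idea (the class and callback names are mine, not from the talk): each visit to a configuration θ adds one sample, so frequently visited, i.e. promising, configurations accumulate increasingly reliable estimates.

```python
import random
from collections import defaultdict

class AdaptiveEvaluator:
    """Sketch of adaptive-N evaluation: one more cost sample per visit.
    `sample_cost(theta, rng)` is a hypothetical callback returning one
    noisy cost observation for configuration theta."""

    def __init__(self, sample_cost, seed=0):
        self.sample_cost = sample_cost
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)  # theta -> list of observed costs

    def visit(self, theta):
        # run theta once more, then return the current sample-mean estimate
        self.samples[theta].append(self.sample_cost(theta, self.rng))
        xs = self.samples[theta]
        return sum(xs) / len(xs)
```

A local search would call `visit(theta)` whenever it (re-)reaches θ, so the estimate for the incumbent keeps sharpening while poor configurations are evaluated only once or twice.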

Page 19:

Unbiased ILS in parameter space

• Over-confidence for each θ vanishes as N → ∞

• Increase N for good θ
- Start with N = 0 for all θ
- Increment N for θ whenever θ is visited

• Can prove convergence
- Simple property, even applies for round-robin

• Experiments: SAPS on qwh
- ParamsILS with N = 1
- Focused ParamsILS

Page 20:

No over-tuning for my approach (qwh)

Page 21:

Runlength on simple instance (qwh)

Page 22:

SAC problem instances

• SAPS [Hutter, Tompkins, Hoos ’02]
- 4 continuous parameters ⟨α, ρ, wp, Psmooth⟩
- Each discretised to 7 values → 7^4 = 2,401 configurations

• Iterated Local Search for MPE [Hutter ’04]
- 4 discrete + 4 continuous parameters (2 of them conditional)
- In total 2,560 configurations

• Instance distributions
- Two single instances, one easy (qwh), one harder (uf)
- Heterogeneous distribution with 98 instances from SATLIB for which the SAPS median is 50K–1M steps
- Heterogeneous distribution with 50 MPE instances: mixed

• Cost: average runlength, q90 runtime, approximation error
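The 7^4 = 2,401 count comes from the full cross-product of the discretised values. A quick illustration (the value grids below are placeholders, not the grids used in the talk):

```python
import itertools

# Placeholder grids: 7 evenly spaced values per SAPS parameter (assumption).
grids = {name: [i / 6 for i in range(7)] for name in ("alpha", "rho", "wp", "p_smooth")}

# Every combination of one value per parameter is one configuration.
configs = list(itertools.product(*grids.values()))
print(len(configs))  # 7**4 = 2401
```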

Page 23:

Comparison against CALIBRA

Page 24:

Comparison against CALIBRA

Page 25:

Comparison against CALIBRA

Page 26:

Local Search for SAC: Summary

• Direct approach for SAC

• Positive
- Incremental homing-in on good configurations
- Uses the structure of the parameter space (unlike bandits)
- No distributional assumptions
- Natural treatment of conditional parameters

• Limitations
- Does not learn
  - Which parameters are important?
  - An unnecessary binary parameter doubles the search space
- Requires discretisation
  - Could be relaxed (hybrid scheme etc.)