Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

56
Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College
  • date post

    19-Dec-2015
  • Category

    Documents

  • view

    214
  • download

    0

Transcript of Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Page 1: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Experimental Analysis of Algorithms: What to Measure

Catherine C. McGeoch

Amherst College

Page 2: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Theory vs Practice

Predict puck velocities using the new composite sticks.

Page 3: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Isolate Components

Attach Probes

Control Factors

Page 4: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Experiments on AlgorithmsGood news Bad news

Easy to probe.

Nearly total experimental control.

Simple mechanisms.

Model validation not a problem.

Fast experiments (often).

Tons of data points (often).

Unusual questions = few techniques.

Unusually precise questions = need advanced techniques.

Unusual data = parametric methods are weak.

Large and infinite sample spaces = sampling difficulties.

Generalization/abstraction from the computational artifact = extrapolation.

NP-hard problems = problematic.

Page 5: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Standard statistical techniques1. Comparison (estimation

and hypothesis testing): same/different, bigger/smaller.

2. Interpolation (linear and nonlinear regression -- fitting models to data.

3. Extrapolation (??) -- building models of data, explaining phenomena.

cost

parameter

Page 6: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Standard statistical techniques1. Comparison (estimation

and hypothesis testing): same/different, bigger/smaller.

2. Interpolation (linear and nonlinear regression -- fitting models to data.

3. Extrapolation (*) -- building models of data, explaining phenomena.

cost

numerical parameter

Page 7: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Standard statistical techniques1. Comparison (estimation

and hypothesis testing): same/different, bigger/smaller.

2. Interpolation (linear and nonlinear regression -- fitting models to data. Interpolation

3. Extrapolation (*) -- building models, explaining phenomena.

cost

numerical parameter

Page 8: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Some Nonstandard Techniques

1. Graphical analysis (GA) -- big data sets, unusual questions, interpolation, extrapolation.

2. Exploratory data analysis (EDA) -- model building, unusual data sets.

3. Variance reduction techniques -- simple mechanisms, complete control.

4. Biased estimators -- NP-Hard problems, large sample spaces.

Today!

Page 9: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Case Study: First Fit Bin Packing

u

l

Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it.

Applications: CD file storage; stock cutting; iPod file storage; generalizations to 2D, 3D...

Bin packing is NP-Hard.

How well does the FF heuristic

perform?

Page 10: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Experimental Study of First Fit Bin Packing

Input categories: -- uni1: n reals drawn uniformly from (0, 1] -- uni8: n reals drawn uniformly from (0, .8] -- file: n file sizes (scaled to 0..1). -- dict: n dictionary word sizes (in 0..1).

Run First Fit on these inputs, analyze results....

Page 11: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

What to Measure? What performance indicator to use for assessing heuristic solution quality?

The obvious choice: Number of Bins

Other performance indicators suggested by data analysis:

- Graphical analysis (GA) - Exploratory data analysis (EDA) - Variance reduction techniques (VRT) - Biased estimators

Today’s topic!

Page 12: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

uni1 30,000 15,270

uni1 60,000 30,446

uni1 120,000 60,809

uni8 30,000 12,217

uni8 60,000 24,385

uni8 120,000 48,965

file 30,000 9

file 60,000 13

file 124,016 27

dicto 60,687 23,727

dicto 61,406 22,448

dicto 81,520 28,767

Tabular data: Good for comparisons.

Which is better? How much better? When is it better?

First Fit

input type n Bins

Page 13: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Graphical Analysis

Identify trends

Find common scales

Discover anomalies

Build models/explanations

Page 14: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Graphical analysis: Locate Correlations

GA: Look for trends.

Page 15: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

GA (pairs plot): Look for correlations

Page 16: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

GA (Weight Sum vs Bins): Find a common scale

y = x

Page 17: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

GA (distribution of weights in input): look for explanations

Conjectured input properties affecting FF packing quality: symmetry, discreteness, skew.

Page 18: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

File data: FF packings are optimal! Due to extreme skew in the weight distribution (few big weights, many tiny weights).

Dicto data: Sorting the weights (FFD) makes the packing worse! Due to discrete weights in bad combinations. (Bad FFD packings can be predicted to within 100 bins. )

Uniform data: Smaller weight distributions (0 .8) give worse packings (compared to optimal) than larger weight distributions (0, 1). Probably due to asymmetry.

Graphical Analysis: some results...

(more)...

Page 19: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Top line = 1-u

Bottom line = .55 - u/2

Conjecture: The distribution of “empty space per bin” has holes when u <= .85. These holes cause bad FF packings.

GA (details): u vs distribution of es in bins

Page 20: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Graphical Analysis: What to Measure

Input: Number of weights Sum of weights Number of weights > 0.5 Weight distribution

Output: Number of bins Empty space = Bins - Weight Sum Gaps = Empty space per bin Distribution of gaps Animations of packings

Page 21: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Exploratory Data Analysis

Smooth and the rough: look at general trends, and (equally important) deviation from trends.

Categories of data: tune analysis for type of data -- counts and amounts, proportions, counted fractions, percentages ...

Data transformation: adjust data properties using logarithms, powers, square roots, ratios ...

No a priori hypotheses, no models, no estimators.

Page 22: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

EDA: The smooth ...

Number of bins is nearly equal to weight sum.

Page 23: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

... and the rough: Weight Sum vs Bins/Weight Sum

Deviation of bins from weight sum < 8% of bins, depending on input class. (Focus on 8 and 1 ...)

Page 24: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Given n weights drawn uniformly from (0, u], for 0 < u <= 1.

Consider FF packing quality as f(u), for fixed n.

Focus on Uniform Weight Lists

Page 25: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

EDA (the smooth): u vs Bins, at n=100,000

Number of bins is proportional to upper bound on weights.

Page 26: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

EDA (the Rough): u vs Bins/Weight Sum (~ nu/2).

Page 27: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

EDA: Categories of Data

Bin efficiency = Bins/Weight Sum

Always > 1, mean is in [1.0,1.7], variance large but decreasing in n. Does it converge to 1 (optimal) or to 1+c?

Empty space = Bins - Weight Sum

Always >= 0, mean is linear or sublinear in nu, variance constant in n.) Is it linear or sublinear in n?

a ratio

a difference

Convergence in n is easier to see

Page 28: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Packing efficiency: n vs Bins/(un/2).

Page 29: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Empty Space: (Bins - Weight)/nu

Page 30: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Power law fits: y = x^.974 .. x^.998

EDA (data transformation): Linear growth on a log-log scale.

Page 31: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

EDA: Some Results

Number of Bins is near Weight Sum ~ nu/2, with largest deviations near u = .8.

Empty space (a difference) has clearer convergence properties than Bin Efficiency (a ratio).

Empty space appears to be asymptotically linear in n -- non-optimal -- for all u except 1.

Page 32: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Variance Reduction Techniques

Is y asymptotically positive or negative?

x x

y y0

Page 33: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

VRT: Control VariatesSubtract a source of noise if its expectation is known and it is positively correlated with outcome.

Expected number of bins:

Bins = Weight Sum + Empty Space

E[WS - nu/2] = 0

E[B - (WS - nu/2) ] =

Var[B - (WS - nu/2)] = Var[B] + Var[(WS-nu/2)] -

2Cov[B, (WS-nu/2)].

B - WS + nu/2 = ES + nu/2.

Weight Sum is a Control Variate for Bins.

ES + nu/2 is a better estimator of .

Page 34: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Estimating with B.

Page 35: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Estimating with ES

Page 36: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

More Variance Reduction Techniques

Common Random Variates: Compare heuristics on identical inputs when performance is correlated.

Antithetic Variates: Exploit negative correlation in inputs.

Stratification: Adjust variations in output according to known variations in input.

Conditional Monte Carlo: More data per experiment, using efficient tests.

Page 37: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Biased Estimators

Bad estimator of mean(y) vs good estimator of z=lb(y).

x x

y

z

Page 38: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Biased Estimators

Bad estimator of mean(y) vs cheap estimator of z=lb(y).

x x

y

z

Page 39: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Biased Estimators of Optimal Packing Quality

Bounds on optimal number of bins:

U: FF number of bins used (or any heuristic)

L: Weight sum

L: Number items >= 0.5

L: FF/2

L: (FF -2)10/17

L: (FFD - 4 )9/11

Page 40: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Summary: What to Measure

GA: trends

GA: scale

GA: details

EDA: smooth and rough

EDA: data categories

EDA: data transformation

VRT: alternatives with same mean, lower variance

BE: lower/upper bounds on the interesting quantity

First Fit Packings:

number of bins

number of weights

sum of weights

distribution of weights

number of weights > .5

packing efficiency

empty space

empty space per bin

bounds on bin counts

Page 41: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Another Example Problem

TSP: Given graph G with n vertices and m weighted edges, find the least-cost tour through all vertices.

17

3

6

18

12

6

20

4

10

9 12

Applications: well known.

Page 42: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

TSP: What to MeasureVRT’s and BE’s:

Mean edge weight is a control variate for Tour Length.

Beginning Tour is a control variate for Final Tour, in iterative algorithms with random starts.

f(MST + Matching) is a biased estimator of Tour Length (lower bound).

Held-Karp Lower Bound is biased estimator of Tour Length.

Can you think of others?

Page 43: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

TSP: Graphical Analysis

Input: Vertices n and Edges m ... can you think of others?

Output: Tour Length ... can you think of others?

Page 44: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

TSP: Exploratory Data Analysis

Any ideas?

Page 45: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

References

Tukey, Exploratory Data Analysis.

Cleveland, Visualising Data.

Chambers, Cleveland, Kleiner, Tukey, Graphical Methods for Data Analysis.

Bratley, Fox, Schrage, A Guide to Simulation.

C. C. McGeoch, “Variance Reduction Techniques and Simulation Speedups,” Computing Surveys, June 1992.

Page 46: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Upcoming Events in Experimental Algorithmics

January 2007: ALENEX (Workshop on Algorithm Engineering and Experimentation), New Orleans.

Spring 2007: DIMACS/NISS joint workshop on experimental analysis of algorithms, North Carolina. (Center for Discrete Mathematics and Theoretical Computer Science, and National Institute for Statistical Sciences. )

June 2007: WEA (Workshop on Experimental Algorithmics), Rome.

(Ongoing): DIMACS Challenge on Shortest Paths Algorithms.

Page 47: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Unusual Functions

Page 48: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

EDA: What To Measure

The Smooth and the Rough:

Bins used is near nu/2, with large deviations in packing quality near u - .8.

Categories of Data:

Empty space (a difference) has clearer convergence properties than Bin Efficiency (a ratio).

Data Transformation:

Empty space appears to be asymptotically linear in n -- implying non-optimal convergence -- for all u except 1.

Page 49: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Theory and Practice

Page 50: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Hockey Science: Predict the in-game puck velocities generated by new stick technologies.

Page 51: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Theory, Experiment, Practice

Page 52: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

What is experimental algorithmics?

Page 53: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

What is experimental algorithmics?

Abstraction

Dominant operations

Asymptotics

Worst-case, Average Case

Real systems

CPU times

Measurements

Real inputs, benchmarks

Page 54: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

First Fit

u

l

L = rand_list(n, l, u)

S(L)=Sum of weights in L

B(L)=Number of bins used in FF

ES(L) = Total empty space in bins: B(L) -S(L)

ESL(L) = Empty space in all but the last bin.

Opt(L) = Optimal number of bins

Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it.

Page 55: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

First Fit Bin Packing

Given n weights drawn uniformly from the real range (l, u], pack into unit-capacity bins according to the First Fit (FF) heuristic.

+

u

l

n

0

1

Page 56: Experimental Analysis of Algorithms: What to Measure Catherine C. McGeoch Amherst College.

Type n Bins, Bins/Sum

unif 30,000 15,270 / 1.081

unif 60,000 30,446 / 1.024

unif 120,000 60,809 / 1.033

file 30,000 9 / 1.021

file 60,000 13 / 1.015

file 124,016 27 / 1.013

dicto 60,687 23,727/ 1.037

dicto 61,406 22,448/1.047

dicto 81,520 28,767/1.038