Distributions of Randomized Backtrack Search Key Properties: I Erratic behavior of mean II...


Distributions of Randomized Backtrack Search


• Key Properties:

• I Erratic behavior of mean

• II Distributions have “heavy tails”.

Erratic Behavior of Search Cost: Quasigroup Completion Problem

[Figure: sample mean vs. number of runs. Annotations: median = 1; sample mean = 3500.]

[Figure: proportion of cases solved vs. number of backtracks. Annotations: 75% <= 30; 5% > 100000.]

Heavy-Tailed Distributions

• … infinite variance … infinite mean

• Introduced by Pareto in the 1920’s --- “probabilistic curiosity.”

• Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.

• Examples: stock market, earthquakes, weather,...

Decay of Distributions

• Standard --- Exponential Decay, e.g. Normal:

$\Pr[X > x] \sim C e^{-x^2}$, for some $C > 0$, $x > 1$

• Heavy-Tailed --- Power Law Decay, e.g. Pareto-Levy:

$\Pr[X > x] \sim C x^{-\alpha}$, $x > 0$

[Figure: tail of a standard distribution (finite mean & variance) with exponential decay, compared with power law decay.]

Normal, Cauchy, and Levy

• Normal - Exponential Decay

• Cauchy - Power law Decay

• Levy - Power law Decay

Tail Probabilities (Standard Normal, Cauchy, Levy)

c    Normal        Cauchy    Levy
0    0.5           0.5       1
1    0.1587        0.25      0.6827
2    0.0228        0.1476    0.5205
3    0.001347      0.1024    0.4363
4    0.00003167    0.078     0.3829
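The tail probabilities can be recomputed from closed forms. The sketch below assumes the table lists Pr[X > c] and that "Levy" means the standard Levy distribution (location 0, scale 1); both readings are assumptions, chosen because they agree with the listed values to about the precision shown. The function names are illustrative only.

```python
import math


def normal_tail(c):
    """Pr[X > c] for a standard Normal variable."""
    return 0.5 * math.erfc(c / math.sqrt(2))


def cauchy_tail(c):
    """Pr[X > c] for a standard Cauchy variable."""
    return 0.5 - math.atan(c) / math.pi


def levy_tail(c):
    """Pr[X > c] for a standard Levy variable (assumed location 0, scale 1)."""
    if c <= 0:
        return 1.0  # the Levy distribution is supported on x > 0
    return math.erf(math.sqrt(1.0 / (2.0 * c)))


if __name__ == "__main__":
    print(f"{'c':>2} {'Normal':>12} {'Cauchy':>8} {'Levy':>8}")
    for c in range(5):
        print(f"{c:>2} {normal_tail(c):>12.4g} "
              f"{cauchy_tail(c):>8.4f} {levy_tail(c):>8.4f}")
```

The Normal column shrinks by orders of magnitude per unit step, while the Cauchy and Levy columns shrink slowly, which is the exponential vs. power law contrast.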

Example of Heavy Tailed Model (Random Walk)

• Random Walk:

• Start at position 0

• Toss a fair coin:

– with each head take a step up (+1)

– with each tail take a step down (-1)

X --- number of steps the random walk takes to return to position 0.

[Figure: Random Walk. The record of 10,000 tosses of an ideal coin (Feller), showing zero crossings and long periods without zero crossing.]
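The random walk is easy to simulate. The sketch below draws the return time X for many independent walks; the step cap `max_steps` is an assumption of this sketch (not part of the model), added so that every call terminates even when a walk stays away from 0 for a very long time.

```python
import random


def first_return_time(rng, max_steps):
    """Steps until a fair +1/-1 walk started at 0 first returns to 0.

    Returns None if the walk has not returned within max_steps
    (the cap is an assumption of this sketch, not part of the model).
    """
    position = 0
    for step in range(1, max_steps + 1):
        position += 1 if rng.random() < 0.5 else -1
        if position == 0:
            return step
    return None


def sample_return_times(n_walks, max_steps, seed=0):
    """Return times of n_walks independent walks (None where capped)."""
    rng = random.Random(seed)
    return [first_return_time(rng, max_steps) for _ in range(n_walks)]


if __name__ == "__main__":
    times = sample_return_times(n_walks=2000, max_steps=10000)
    finished = sorted(t for t in times if t is not None)
    print("median of finished walks:", finished[len(finished) // 2])
    print("longest finished walk:", finished[-1])
    print("walks still away from 0 at the cap:", len(times) - len(finished))
```

Half of all walks return after exactly 2 steps, yet a small fraction takes thousands of steps, so a few runs dominate any sample mean.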

Heavy-tails vs. Non-Heavy-Tails

[Figure: unsolved fraction 1-F(x) vs. X, the number of steps the walk takes to return to zero (log scale), compared with Normal(2,1) and Normal(2,1000000). Annotations: median = 2 (50%); 0.1% > 200000.]

How to Check for “Heavy Tails”?

• Log-Log plot of tail of distribution should be approximately linear.

• Slope gives value of $\alpha$:

• $\alpha \le 1$: infinite mean and infinite variance

• $1 < \alpha < 2$: infinite variance

[Figure: Heavy-Tailed Behavior in QCP Domain. Unsolved fraction (1-F(x)) (log) vs. number of backtracks (log). Annotations: 0.466, 0.319, 0.153; $\alpha \le 1$ => infinite mean; 18% unsolved; 0.002% unsolved.]
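The log-log check can be sketched on synthetic data. The code below draws samples from a pure Pareto distribution (an assumption made here so that the true exponent is known), builds the empirical tail, and fits a least-squares line in log-log space; minus the slope is the estimate of the exponent. The helper names are illustrative. On real run-time data only the tail part of the plot would normally be fitted.

```python
import math
import random


def sample_pareto(alpha, n, seed=0):
    """Draw n samples with Pr[X > x] = x**(-alpha) for x >= 1 (inverse transform)."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def loglog_tail_slope(samples):
    """Least-squares slope of log(empirical Pr[X >= x]) against log(x)."""
    xs = sorted(samples)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # For the i-th smallest sample, the fraction of samples >= it is (n - i) / n.
    log_tail = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_t = sum(log_tail) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxt = sum((u - mean_x) * (v - mean_t) for u, v in zip(log_x, log_tail))
    return sxt / sxx


def estimate_alpha(samples):
    """Estimated tail exponent: minus the log-log slope."""
    return -loglog_tail_slope(samples)


if __name__ == "__main__":
    for alpha in (0.5, 1.5):
        est = estimate_alpha(sample_pareto(alpha, 20000))
        print(f"true alpha = {alpha}, estimated alpha = {est:.3f}")
```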

Formal Models of Heavy-Tailed Behavior in Combinatorial Search

Chen, Gomes, Selman 2001

Motivation

• Research on heavy tails has been largely based on empirical studies of run time distributions.

• Goal: to provide a formal characterization of tree search models and show under what conditions heavy-tailed distributions can arise.

• Intuition: Heavy-tailed behavior arises when:

• wrong branching decisions may lead the procedure to explore an exponentially large subtree of the search space that contains no solutions;

• the procedure is characterized by a large variability in the time to find a solution on different runs, which leads to highly different trees from run to run.

Balanced vs. Imbalanced Tree Model

• Balanced Tree Model:

• chronological backtrack search model;

• fixed variable ordering;

• random child selection with no propagation mechanisms;

(show demo)

$E[T(n)] = \frac{1 + 2^n}{2}$

$V[T(n)] = \frac{2^{2n} - 1}{12}$

The run time distribution of chronological backtrack search on a complete balanced tree is uniform (therefore not heavy-tailed). Both the expected run time and variance scale exponentially.

Balanced Tree Model

– The expected run time and variance scale exponentially in the height of the search tree (number of variables):

$E[T(n)] = \frac{1 + 2^n}{2}$

$V[T(n)] = \frac{2^{2n} - 1}{12}$

– The run time distribution is uniform (not heavy-tailed).

– Backtrack search on the balanced tree model has no restart strategy with expected polynomial time.

Chen, Gomes & Selman 01
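The uniform distribution and the two formulas can be checked by exact enumeration for small trees. The sketch below assumes a complete binary tree of height n with a single solution leaf, where each node picks its first child uniformly at random; T counts the leaves visited up to and including the solution. These modelling details and the function names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product


def leaves_visited(wrong_first):
    """Leaves visited up to and including the solution leaf.

    wrong_first[d] is True when, at depth d (root is depth 0), the search
    tries the child that does not contain the solution first and must
    exhaust that whole subtree before backtracking.
    """
    n = len(wrong_first)
    visited = 1  # the solution leaf itself
    for depth, wrong in enumerate(wrong_first):
        if wrong:
            visited += 2 ** (n - depth - 1)  # leaves in the sibling subtree
    return visited


def run_time_distribution(n):
    """Exact distribution of T(n): all 2**n choice sequences are equally likely."""
    counts = {}
    for choices in product([False, True], repeat=n):
        t = leaves_visited(choices)
        counts[t] = counts.get(t, 0) + 1
    total = 2 ** n
    return {t: Fraction(c, total) for t, c in counts.items()}


def mean_and_variance(dist):
    """Exact mean and variance of a {value: probability} distribution."""
    mean = sum(t * p for t, p in dist.items())
    var = sum((t - mean) ** 2 * p for t, p in dist.items())
    return mean, var


if __name__ == "__main__":
    for n in (3, 6, 10):
        mean, var = mean_and_variance(run_time_distribution(n))
        print(n, mean, Fraction(1 + 2 ** n, 2), var, Fraction(2 ** (2 * n) - 1, 12))
```

Every value from 1 to 2^n occurs with the same probability, so the tail cannot decay as a power law, and no cutoff makes a restarted search cheap in expectation.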

• How can we improve on the balanced search tree model?

• Very clever search heuristic that leads quickly to the solution node - but that is hard in general;

• Combination of pruning, propagation, dynamic variable ordering that prune subtrees that do not contain the solution, allowing for runs that are short.

• ---> resulting trees may vary dramatically from run to run.

• T - the number of leaf nodes visited up to and including the successful node; b - branching factor

$P[T = b^i] = (1 - p)\,p^i, \quad i \ge 0$

Formal Model Yielding Heavy-Tailed Behavior

b = 2

(show demo)

• Expected Run Time: $p \ge \frac{1}{b} \Rightarrow E[T] = \infty$ (infinite expected time)

• Variance: $p \ge \frac{1}{b^2} \Rightarrow V[T] = \infty$ (infinite variance)

• Tail: $p \ge \frac{1}{b^2} \Rightarrow P[T \ge L] \ge p^2 L^{\log_b p} = C L^{-\alpha}$, $\alpha \le 2$ (heavy-tailed)
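The model P[T = b^i] = (1 - p) p^i can be explored numerically. Since E[T] is the sum over i of (1 - p)(pb)^i, its partial sums grow without bound once pb >= 1; the sketch below computes those partial sums, the exact tail P[T >= L], and running sample means from simulated runs. Reading p as the chance of one more wrong branching decision is an interpretation, and the function names are illustrative.

```python
import math
import random


def sample_T(p, b, rng):
    """Draw T with P[T = b**i] = (1 - p) * p**i, i >= 0."""
    i = 0
    while rng.random() < p:
        i += 1
    return b ** i


def tail_prob(p, b, L):
    """Exact P[T >= L]: p**i for the smallest i with b**i >= L."""
    i = 0
    while b ** i < L:
        i += 1
    return p ** i


def truncated_mean(p, b, k):
    """Partial sum of E[T] over i < k: sum of (1 - p) * (p * b)**i."""
    return sum((1 - p) * (p * b) ** i for i in range(k))


def running_means(samples):
    """Sample mean after each additional run."""
    total, out = 0, []
    for n, t in enumerate(samples, start=1):
        total += t
        out.append(total / n)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    p, b = 0.5, 2
    runs = [sample_T(p, b, rng) for _ in range(2000)]
    means = running_means(runs)
    print("median:", sorted(runs)[len(runs) // 2])
    print("sample mean after 500, 1000, 2000 runs:",
          means[499], means[999], means[1999])
    for L in (10, 100, 1000):
        # Exact tail next to the power-law lower bound p**2 * L**log_b(p).
        print(L, tail_prob(p, b, L), p ** 2 * L ** math.log(p, b))
```

With p = 0.5 and b = 2 every term of the partial sum equals 0.5, so the truncated mean grows linearly in the number of terms and never converges; this is the erratic sample mean seen in practice.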

Bounded Heavy-Tailed Behavior

(show demo)

No Heavy-tailed behavior for Proving Optimality

[Figure: Proving Optimality]

Small-World Vs. Heavy-Tailed Behavior

• Does a Small-World topology (Watts & Strogatz) induce heavy-tail behavior?

The constraint graph of a quasigroup exhibits a small-world topology (Walsh 99)

Exploiting Heavy-Tailed Behavior


• Heavy Tailed behavior has been observed in several domains: QCP, Graph Coloring, Planning, Scheduling, Circuit synthesis, Decoding, etc.

• Consequence for algorithm design:

• Use restarts or parallel / interleaved runs to exploit the extreme variance in performance. Restarts provably eliminate heavy-tailed behavior. (Gomes et al. 97, Hoos 99, Horvitz 99, Huberman, Lukose and Hogg 97, Karp et al. 96, Luby et al. 93, Rish et al. 97, Walsh 99)

Super-linear Speedups

[Figure: five unsuccessful runs (X) of 10 seconds each, followed by a run that is solved.]

• Sequential: 50 + 1 = 51 seconds

• Parallel: 10 machines --- 1 second (51x speedup)

• Interleaved (1 machine): 10 x 1 = 10 seconds (5x speedup)
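The arithmetic can be spelled out in a few lines. The run lengths below are hypothetical: five runs that do not finish within a 10-second cutoff, one run that finishes after 1 second, and four further runs that are assumed (they are not specified) to take longer than 1 second.

```python
import math


def sequential_time(run_times, cutoff):
    """Run one after another, abandoning each run at the cutoff."""
    elapsed = 0
    for t in run_times:
        if t <= cutoff:
            return elapsed + t
        elapsed += cutoff
    return math.inf  # no run finished within the cutoff


def parallel_time(run_times):
    """One machine per run: done as soon as the fastest run finishes."""
    return min(run_times)


def interleaved_time(run_times):
    """One machine time-slicing all runs equally: every run advances
    as far as the fastest run needs before that run finishes."""
    return len(run_times) * min(run_times)


# Hypothetical run lengths in seconds (math.inf = never finishes in time).
RUNS = [math.inf] * 5 + [1] + [math.inf] * 4
CUTOFF = 10

if __name__ == "__main__":
    seq = sequential_time(RUNS, CUTOFF)
    par = parallel_time(RUNS)
    inter = interleaved_time(RUNS)
    print("sequential:", seq, "seconds")
    print("parallel:", par, "second,", seq / par, "x speedup")
    print("interleaved:", inter, "seconds,", seq / inter, "x speedup")
```

Ten machines give a 51x speedup, more than the number of machines, because the parallel runs never pay for the five long failures.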

Restarts

[Figure: unsolved fraction 1-F(x) vs. number of backtracks (log), with no restarts and with a restart every 4 backtracks. Annotations: 70% unsolved; 250 (62 restarts); 0.001% unsolved.]

Example of Rapid Restart Speedup (planning)

[Figure: number of backtracks (log) vs. cutoff (log). Annotations: 20; 2000 (~100 restarts); ~10 restarts; 100000.]

Sketch of proof of elimination of heavy tails

$X$ = number of backtracks to solve the problem

• Let’s truncate the search procedure after $m$ backtracks.

• Probability of solving the problem with the truncated version: $p_m = \Pr[X \le m]$

• Run the truncated procedure and restart it repeatedly.

$Y$ = total number of backtracks with restarts

Number of restarts $= \lceil Y/m \rceil \sim \mathrm{Geometric}(p_m)$

Tail of $Y$: $\Pr[Y > y] \le (1 - p_m)^{\lfloor y/m \rfloor} \le c_1 e^{-c_2 y}$

$Y$ does not have heavy tails.
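The argument can be illustrated with a small simulation. As a stand-in for X, the sketch below uses the heavy-tailed distribution P[X = b^i] = (1 - p) p^i (an assumption chosen for illustration; any run-time distribution could be substituted). Each run is abandoned after m backtracks, so the number of runs is geometric with success probability p_m, and Pr[Y > y] is compared against (1 - p_m)^floor(y/m). One valid choice of constants in the exponential bound is c_1 = 1/(1 - p_m) and c_2 = -ln(1 - p_m)/m.

```python
import random


def sample_backtracks(p, b, rng):
    """Heavy-tailed stand-in for X: P[X = b**i] = (1 - p) * p**i."""
    i = 0
    while rng.random() < p:
        i += 1
    return b ** i


def run_with_restarts(m, p, b, rng):
    """Restart after every m backtracks; return (Y, number of runs)."""
    total, runs = 0, 0
    while True:
        runs += 1
        x = sample_backtracks(p, b, rng)
        if x <= m:
            return total + x, runs
        total += m  # this run was abandoned at the cutoff


def success_prob(m, p, b):
    """p_m = Pr[X <= m] for the stand-in distribution."""
    prob, i = 0.0, 0
    while b ** i <= m:
        prob += (1 - p) * p ** i
        i += 1
    return prob


def tail_bound(p_m, m, y):
    """(1 - p_m) ** floor(y / m), the bound on Pr[Y > y]."""
    return (1 - p_m) ** (y // m)


if __name__ == "__main__":
    rng = random.Random(0)
    m, p, b = 4, 0.5, 2
    p_m = success_prob(m, p, b)
    results = [run_with_restarts(m, p, b, rng) for _ in range(20000)]
    print("p_m:", p_m)
    print("mean number of runs:", sum(r for _, r in results) / len(results))
    for y in (4, 8, 12, 16):
        empirical = sum(1 for total, _ in results if total > y) / len(results)
        print(y, empirical, tail_bound(p_m, m, y))
```

Without restarts this choice of X has infinite mean (p = 1/b); with a cutoff of 4 the total cost Y has a geometrically decaying tail.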