

Reducing the Run-time of MCMC Programs by Multithreading on SMP Architectures

Jonathan M. R. Byrd, Stephen A. Jarvis, Abhir H. Bhalerao

Department of Computer Science, University of Warwick

MTAAP IPDPS 2008


Outline

1 Markov Chain Monte Carlo
  Introduction to MCMC
  Program Cycle
  Existing Parallelisation

2 Speculative Moves
  Method
  Theoretical Results
  Practical Results




What is Markov Chain Monte Carlo?

MCMC is a computationally expensive iterative technique for sampling from a probability distribution.

Basic idea:
  Construct a Markov Chain such that its stationary distribution is equal to the distribution we wish to sample.
  After sufficient burn-in time, sampling from the chain is equivalent to sampling from the distribution.
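The basic idea can be illustrated on a tiny discrete example. The sketch below is not from the slides: it assumes a three-state target distribution and a Metropolis-style chain with uniform proposals (the names `metropolis_matrix`, `step` and `target` are illustrative). It shows that the target is stationary for the chain, and that a chain started far from the target approaches it after a short burn-in.

```python
def metropolis_matrix(target):
    """Transition matrix of a Metropolis chain with uniform proposals
    over the other states (an illustrative construction)."""
    k = len(target)
    P = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:
                # Propose j with probability 1/(k-1), accept with min(1, pi_j/pi_i).
                P[i][j] = min(1.0, target[j] / target[i]) / (k - 1)
        # Remaining probability mass: the chain stays where it is.
        P[i][i] = 1.0 - sum(P[i])
    return P


def step(dist, P):
    """Advance a distribution over states by one step of the chain."""
    k = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]


target = [0.2, 0.3, 0.5]        # distribution we wish to sample
P = metropolis_matrix(target)

dist = [1.0, 0.0, 0.0]          # start far from the target
for _ in range(50):             # burn-in
    dist = step(dist, P)

print("after burn-in:", [round(p, 6) for p in dist])
```

In a real MCMC program the state space is far too large to write the matrix down, which is why the chain is simulated one state at a time instead.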





What uses Markov Chain Monte Carlo?

MCMC is widely used in:
  Bayesian statistics
  Computational physics
  Computational biology

Specific applications include:
  Phylogenetic analysis
  Spectral modelling of X-ray data from the Chandra X-ray satellite
  Calculating financial econometrics
  Mapping vascular trees from retinal slides





The MCMC Program Cycle

[Flowchart, written out as steps]

1 Let x be the current state
2 Create a potential new state, y
3 Calculate acceptance probability α
4 Test rng() < α
  Yes: apply state change x → y
  No: x is left unchanged
5 Return to step 1
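The cycle above can be sketched as a short Metropolis sampler. This is a minimal illustration, not the authors' program: it assumes a symmetric random-walk proposal and a standard normal target, and the names `metropolis`, `log_normal` and `random_walk` are hypothetical.

```python
import math
import random


def metropolis(log_density, propose, x0, iterations, rng):
    """Run the program cycle for a fixed number of iterations.

    Assumes the proposal is symmetric, so the acceptance probability
    reduces to min(1, p(y) / p(x)).
    """
    x = x0                                   # let x be the current state
    samples = []
    accepted = 0
    for _ in range(iterations):
        y = propose(x, rng)                  # create a potential new state y
        alpha = math.exp(min(0.0, log_density(y) - log_density(x)))
        if rng.random() < alpha:             # rng() < alpha ?
            x = y                            # yes: apply state change x -> y
            accepted += 1
        samples.append(x)                    # no: x is left unchanged
    return samples, accepted / iterations


def log_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def random_walk(x, rng):
    """Symmetric proposal: uniform step of at most 2 in either direction."""
    return x + rng.uniform(-2.0, 2.0)


rng = random.Random(0)
samples, acceptance_rate = metropolis(log_normal, random_walk, 0.0, 20000, rng)
kept = samples[2000:]                        # discard burn-in
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"acceptance rate {acceptance_rate:.2f}, mean {mean:.2f}, variance {variance:.2f}")
```

Each pass through the loop depends on the state left by the previous pass, which is what makes the cycle hard to parallelise directly.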





Existing Parallelisation

Execute multiple chains. Take samples from all of them.
  Embarrassingly parallel.
  Does not reduce burn-in time.
  Does not help escape local optima.

Metropolis-Coupled MCMC
  Execute multiple chains.
  Coarse parallelisation, machines connected by LAN.
  Modifies algorithm to improve rate of convergence.
  Good for escaping local optima.
  Hard to predict benefits.




Introducing Speculative Moves

1 Each state in a Markov Chain must depend on only the preceding state.

2 But, typically only 1/4 of iterations accept the proposed state-change.

3 Consecutive rejected iterations could have been performed in parallel.

4 => Assume all iterations will be rejected.





Speculative Moves Program Cycle

[Flowchart, built up over three slides from one thread (A) to two (A, B) and three (A, B, C). The three-thread version, written out:]

Thread A: Create a potential state a. Calculate acceptance probability αa. Test rng() < αa. Yes: let y = a.

Thread B: Create a potential state b. Calculate acceptance probability αb. Test rng() < αb. Yes: let y = b.

Thread C: Create a potential state c. Calculate acceptance probability αc. Test rng() < αc. Yes: let y = c.

If a thread answered Yes: apply x → y.
If all threads answered No: x is left unchanged and the cycle repeats.
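The speculative cycle can be sketched with a thread pool. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the proposal is assumed symmetric, and the moves are assumed to be considered in thread order, with the first accepted move applied and any later speculative results discarded because they were evaluated against a state that is no longer current. Note also that CPython's global interpreter lock prevents pure-Python threads from running in parallel, so this shows the control flow rather than a speed-up.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def evaluate_move(x, seed, log_density, propose):
    """One speculative iteration from state x: propose, compute alpha, test."""
    rng = random.Random(seed)                # private generator per move
    y = propose(x, rng)
    alpha = math.exp(min(0.0, log_density(y) - log_density(x)))
    return y, rng.random() < alpha


def speculative_metropolis(log_density, propose, x0, iterations, n_threads, seed=0):
    """Evaluate n_threads moves from the same state in each program cycle."""
    seeder = random.Random(seed)
    x = x0
    samples = []
    cycles = 0
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        while len(samples) < iterations:
            cycles += 1
            seeds = [seeder.getrandbits(32) for _ in range(n_threads)]
            futures = [pool.submit(evaluate_move, x, s, log_density, propose)
                       for s in seeds]
            for y, accepted in [f.result() for f in futures]:
                if accepted:
                    x = y                    # apply x -> y
                    samples.append(x)
                    break                    # later moves assumed the old x
                samples.append(x)            # rejected: one iteration done
    return samples[:iterations], cycles


def log_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def random_walk(x, rng):
    """Symmetric proposal: uniform step of at most 2 in either direction."""
    return x + rng.uniform(-2.0, 2.0)


samples, cycles = speculative_metropolis(log_normal, random_walk, 0.0, 2000, 2)
print(f"{len(samples)} iterations in {cycles} program cycles")
```

Each program cycle completes between 1 and `n_threads` iterations, so the number of cycles is never more than the number of iterations requested.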





Theoretical Benefits of Speculative Moves

Let:
  n be the number of iterations considered concurrently.
  pr (written $p_r$ in the formula) be the average state-change rejection probability.

Each program cycle performs 1..n MCMC iterations.

On average

$$\frac{1 - p_r^{\,n}}{1 - p_r}$$

MCMC iterations are performed at each loop of the program cycle.

If multithreading overhead is negligible, time for 1 program cycle ≈ time for 1 MCMC iteration.
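The average above is the geometric sum 1 + pr + pr^2 + ... + pr^(n-1): the k-th speculative iteration in a cycle counts only if the k-1 before it were all rejected. A small sketch of the resulting best-case runtime follows, assuming negligible multithreading overhead as stated above; the function names are illustrative.

```python
def expected_iterations_per_cycle(n, p_r):
    """Average MCMC iterations completed per program cycle: (1 - p_r^n) / (1 - p_r)."""
    if p_r == 1.0:
        return float(n)          # limit of the formula: every move is rejected
    return (1.0 - p_r ** n) / (1.0 - p_r)


def runtime_percent_of_sequential(n, p_r):
    """Best-case runtime relative to a sequential run, overhead ignored."""
    return 100.0 / expected_iterations_per_cycle(n, p_r)


for n in (2, 4, 8, 16):
    pct = runtime_percent_of_sequential(n, 0.75)
    print(f"n = {n:2d}: {pct:.1f}% of sequential runtime at p_r = 0.75")
```

At pr = 0.75 this gives 1.75 iterations per cycle for n = 2, so the best-case runtime is 100 / 1.75, roughly 57% of sequential.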





Theoretical Results

[Figure: Maximum benefit of speculative moves on runtime. x-axis: pr (move rejection probability), 0 to 1. y-axis: Runtime (percent of sequential), 0 to 100. Curves: 2 processes, 4 processes, 8 processes, 16 processes. A second build of the slide marks pr = 0.75 on the x-axis and the runtime values 57 and 36 on the y-axis.]





Practical Testing

Circle detection algorithm used for testing
  Fixed number of iterations
  Autogenerated images
  Runtime values averaged over 20 runs

Hardware utilised:
  AMD Athlon 64 X2 4400+ (dual-core)
  Intel Xeon Dual-Processor
  Intel Pentium-D (dual core)
  Intel Core2 Quad Q6600 (2x dual-core dies)
  SGI Altix with 56 Itanium2 processors





Comparing Practical with Theoretical (1)

[Figure: Runtime against pr on an Intel Pentium D (dual-core). x-axis: pr, 0 to 1. y-axis: Runtime (seconds), 0 to 140. Series: 1 thread, observed 2 threads, theoretical 2 threads.]





Comparing Practical with Theoretical (2)

[Figure: Runtime against pr on an Intel Core2 Q6600 (quad-core). x-axis: pr, 0 to 1. y-axis: Runtime (seconds), 0 to 90. Series: 1 thread, observed 2 threads, observed 4 threads, theoretical 2 threads, theoretical 4 threads.]





Preferable Architectures

[Figure: Percent of maximum runtime reduction achieved, 0 to 100, for each machine: X2 4400+ (2 core), Xeon (2 proc), Pentium D (2 core), Q6600 (4 core), SGI Altix (56x Itanium2). Series: 2 threads, 4 threads, 8 threads.]





Will I Benefit?

This table shows the iteration time at which the overhead from multithreading balances the benefits, when pr = 0.75.

Machine                    Iteration Time (µs)    Iteration Rate (s⁻¹)
Xeon Dual-Processor        70                     14 285
Pentium-D (dual core)      55                     18 181
Q6600 (using 2 threads)    75                     13 333
Q6600 (using 4 threads)    25                     40 000
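The two columns carry the same information: the rate is one second divided by the iteration time. The sketch below checks that conversion and adds a hypothetical helper, `likely_to_benefit`, which reads the table on the assumption that iterations slower than the break-even time gain more from speculation than they lose to overhead. The dictionary keys and function names are illustrative.

```python
# Break-even iteration times from the table, in microseconds (p_r = 0.75).
BREAK_EVEN_US = {
    "Xeon Dual-Processor": 70,
    "Pentium-D (dual core)": 55,
    "Q6600 (using 2 threads)": 75,
    "Q6600 (using 4 threads)": 25,
}


def iterations_per_second(iteration_time_us):
    """Convert a per-iteration time in microseconds to a rate per second."""
    return 1_000_000 / iteration_time_us


def likely_to_benefit(iteration_time_us, machine):
    """Assumed reading of the table: slower iterations than break-even benefit."""
    return iteration_time_us > BREAK_EVEN_US[machine]


for machine, t_us in BREAK_EVEN_US.items():
    print(f"{machine}: {t_us} us per iteration = {int(iterations_per_second(t_us))} per second")
```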




Summary

The speculative moves method uses increasingly available multiprocessor and multicore machines to reduce the runtime of MCMC programs.

The statistical algorithm is preserved. Speculative moves will not affect the results, only the real-time required to obtain them.

Real-time reductions of 35% using a dual-core machine and 55% using quad-core machines have been demonstrated.