Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking


Page 1: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

1

Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

Dissertation Defense

Anh Vo

Committee: Prof. Ganesh Gopalakrishnan (co-advisor), Prof. Robert M. Kirby (co-advisor),

Dr. Bronis R. de Supinski (LLNL), Prof. Mary Hall and Prof. Matthew Might

Page 2: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

2

Our computational ambitions are endless!
• Terascale
• Petascale (where we are now)
• Exascale
• Zettascale
• Correctness is important

Jaguar, Courtesy ORNL

Protein Folding, Courtesy Wikipedia

Computational Astrophysics, Courtesy LBL

Page 3: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

3

Yet we are falling behind when it comes to correctness
• Concurrent software debugging is hard
• It gets harder as the degree of parallelism in applications increases
  – Node level: Message Passing Interface (MPI)
  – Core level: Threads, OpenMP, CUDA
• Hybrid programming will be the future
  – MPI + Threads
  – MPI + OpenMP
  – MPI + CUDA
• Yet tools are lagging behind!
  – Many tools cannot operate at scale

[Figure: MPI applications vs. MPI correctness tools]

Page 4: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

4

We focus on dynamic verification for MPI

• Lack of systematic verification tools for MPI

• We need to build verification tools for MPI first
  – Realistic MPI programs run at large scale
  – Downscaling might mask bugs

• MPI tools can be expanded to support hybrid programs

Page 5: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

5

We choose MPI because of its ubiquity

• Born 1994 when the world had 600 internet sites, 700 nm lithography, 68 MHz CPUs

• Still the dominant API for HPC
  – Most widely supported and understood
  – High performance, flexible, portable

Page 6: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

6

Thesis statement

Scalable, modular, and usable dynamic verification of realistic MPI programs is feasible and novel.

Page 7: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

7

Contributions
• Need scalable distributed algorithms to discover alternate schedules
  – Using only local states and observed matches
  – A matches-before ordering is necessary
• DAMPI
  – Distributed Analyzer for MPI programs
  – Implements distributed causality tracking using matches-before
• ISP (previous work)
  – Dynamic verifier for MPI
  – Implements a scheduler which exerts control to enforce matches-before
• Publications: PACT 2011 (in submission), SC 2010, FM 2010, EuroPVM 2010, PPoPP 2009, EuroPVM 2009

Page 8: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

8

Agenda

• Motivation and Contributions
• Background
• MPI ordering based on Matches-Before
• The centralized approach: ISP
• The distributed approach: DAMPI
• Conclusions

Page 9: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

9

Traditional testing is wasteful…
• Example: deterministic operations are permuted

  P1: MPI_Barrier    P2: MPI_Barrier    …    Pn: MPI_Barrier

Exploring all n! permutations is wasteful.

Page 10: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

10


P0: MPI_Send(to P1…); MPI_Send(to P1, data=22);
P1: MPI_Recv(from P0…); MPI_Recv(from P2…); MPI_Recv(*, x);
    if (x==22) then ERROR else MPI_Recv(*, x);
P2: MPI_Send(to P1…); MPI_Send(to P1, data=33);

Unlucky (bug missed)

Testing can also be inconclusive; without non-determinism coverage, we can miss bugs

Page 11: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

11


P0: MPI_Send(to P1…); MPI_Send(to P1, data=22);
P1: MPI_Recv(from P0…); MPI_Recv(from P2…); MPI_Recv(*, x);
    if (x==22) then ERROR else MPI_Recv(*, x);
P2: MPI_Send(to P1…); MPI_Send(to P1, data=33);

Lucky (bug caught!)

Testing can also be inconclusive; without non-determinism coverage, we can miss bugs

Page 12: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

12


P0: MPI_Send(to P1…); MPI_Send(to P1, data=22);
P1: MPI_Recv(from P0…); MPI_Recv(from P2…); MPI_Recv(*, x);
    if (x==22) then ERROR else MPI_Recv(*, x);
P2: MPI_Send(to P1…); MPI_Send(to P1, data=33);

Verification: test all possible scenarios

• Find all possible matches for the receive
  – Ability to track causality is a prerequisite

• Replay the program and force the other matches

Page 13: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

13

Dynamic verification of MPI

• Dynamic verification combines the strengths of formal methods and testing
  – Avoids generating false alarms
  – Finds bugs with respect to actual binaries

• Builds on the familiar approach of “testing”

• Guarantees coverage over nondeterminism

Page 14: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

14

Overview of Message Passing Interface (MPI)

• An API specification for communication protocols between processes

• Allows developers to write high performance and portable parallel code

• Rich in features
  – Synchronous: easy to use and understand
  – Asynchronous: high performance
  – Nondeterministic constructs: reduce code complexity

Page 15: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

15

MPI operations

MPI_Send(void* buffer, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm)
  send(P,T) – send a message with tag T to process P
  send(P)

MPI_Recv(void* buffer, int count, MPI_Datatype type, int src, int tag, MPI_Comm comm, MPI_Status status)
  recv(P,T) – recv a message with tag T from process P
  recv(*,T) – recv a message with tag T from any process
  recv(*,*) – recv a message with any tag from any process

MPI_Isend(void* buffer, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm, MPI_Request h)
  isend(P,T,h) – nonblocking send, communication handle h

MPI_Irecv(void* buffer, int count, MPI_Datatype type, int src, int tag, MPI_Comm comm, MPI_Request h)
  irecv(P,T,h) – nonblocking recv, communication handle h
  irecv(*,T,h)
  irecv(*,*,h)

MPI_Wait(MPI_Request h, MPI_Status status)
  wait(h) – wait for the completion of h

MPI_Barrier(MPI_Comm comm)
  barrier – synchronization
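For illustration only (not part of the original slides), here is a minimal C/MPI program exercising a few of the shorthand operations above; the payload value and tag are arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, x = 0;
    MPI_Request h;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data = 22;
        /* send(1): blocking send to process 1 */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* irecv(*,h): nonblocking wildcard receive, then wait(h) */
        MPI_Irecv(&x, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &h);
        MPI_Wait(&h, &st);
        printf("P1 received %d from P%d\n", x, st.MPI_SOURCE);
    }

    /* barrier: all processes synchronize */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}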

Page 16: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

16

Nonovertaking rule facilitates message matching

• Sender sends two messages:
  – Both can match to a receive operation
  – First message will match before second message
• Receiver posts two receives:
  – Both can match with an incoming message
  – First receive will match before second receive

Examples (P0 sends twice to P1; P1 posts two receives):
  P0: send(1); send(1)    P1: recv(0); recv(0)
  P0: send(1); send(1)    P1: recv(*); recv(0)
  P0: send(1); send(1)    P1: irecv(*); recv(0)
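As a hedged sketch (not taken from the slides) of the second scenario above: with P0 as the only sender, the non-overtaking rule forces the wildcard receive to match P0's first send and the specific receive to match the second.

#include <mpi.h>
#include <assert.h>
#include <stdio.h>

/* Non-overtaking: two sends from the same sender to the same receiver
 * are matched in the order they were issued. */
int main(int argc, char **argv) {
    int rank, first = -1, second = -1;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int m1 = 1, m2 = 2;
        MPI_Send(&m1, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* first send(1)  */
        MPI_Send(&m2, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* second send(1) */
    } else if (rank == 1) {
        /* recv(*) then recv(0): the wildcard is guaranteed to see m1 first */
        MPI_Recv(&first, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&second, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        assert(first == 1 && second == 2);
        printf("matched in order: %d then %d\n", first, second);
    }

    MPI_Finalize();
    return 0;
}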

Page 17: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

17

Happens-before is the basis of causality tracking

• e1 happens-before (→) e2 iff:
  – e1 occurs before e2 in the same process, or
  – e1 is the sending of a message m and e2 is its receipt, or
  – e1 → e3 and e3 → e2 for some event e3

[Figure: a space-time diagram with events a–f across three processes, illustrating the happens-before edges a → c, a → d, a → f, b → d, b → f, and c → f]

Page 18: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

18

Tracking causality with Lamport clocks
• Each process keeps a clock (an integer)
• Increases the clock when it has an event
• Attaches the clock to outgoing messages (piggyback)
• Upon receiving a piggybacked clock, updates its clock to a value greater than or equal to its own clock and higher than the piggybacked clock

[Figure: the same space-time diagram with Lamport clock values a=1, b=2, c=3, d=4, e=2, f=5]

If e1 → e2, then the clock of e1 is less than the clock of e2.

What about e and d? The converse does not hold!
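A minimal sketch (our illustration, not from the slides) of the Lamport clock rules just listed, for one single-threaded process:

#include <stdio.h>

static int lamport_clock = 0;           /* one integer clock per process */

/* Rule 1: every local event (including a send) advances the clock;
 * the returned value is what gets piggybacked on an outgoing message. */
static int next_event(void) { return ++lamport_clock; }

/* Rule 2: on receipt, move to a value that is >= our own clock and
 * strictly higher than the piggybacked clock. */
static void on_receive(int piggybacked) {
    if (piggybacked + 1 > lamport_clock)
        lamport_clock = piggybacked + 1;
    else
        lamport_clock += 1;             /* the receive itself is still an event */
}

int main(void) {
    next_event();                       /* local event: clock = 1 */
    on_receive(3);                      /* message carrying clock 3: clock = 4 */
    printf("clock = %d\n", lamport_clock);
    return 0;
}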

Page 19: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

19

Tracking causality with vector clocks
• Each process keeps a vector of clocks (VC)
• Increases its own clock component when it has an event
• Attaches the VC to outgoing messages (piggyback)
• Upon receiving a piggybacked VC, updates each component to the maximum of the current VC and the piggybacked VC

[Figure: the same diagram with vector clocks a=(1,0,0), b=(1,1,0), c=(1,2,0), d=(1,2,1), e=(2,0,0), f=(2,2,2)]

What about e and d? They are concurrent!

e1 → e2 iff the VC of e1 is less than the VC of e2.
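For comparison, an illustrative vector clock update (again our sketch, not from the slides); N and me are assumed names for the number of processes and this process's rank:

#include <stdio.h>

#define N 3                             /* number of processes (assumed) */
static int vc[N] = {0};                 /* this process's vector clock */
static const int me = 2;                /* this process's rank (assumed) */

/* Local event or send: advance our own component. */
static void tick(void) { vc[me] += 1; }

/* Receive: take the component-wise maximum with the piggybacked VC,
 * then count the receive itself as a local event. */
static void merge(const int piggybacked[N]) {
    for (int i = 0; i < N; i++)
        if (piggybacked[i] > vc[i]) vc[i] = piggybacked[i];
    tick();
}

int main(void) {
    int incoming[N] = {2, 0, 0};        /* e.g., the VC piggybacked on e's message */
    vc[0] = 1; vc[1] = 2; vc[2] = 1;    /* P2's state after event d = (1,2,1) */
    merge(incoming);                    /* event f: vc becomes (2,2,2) */
    printf("vc = (%d,%d,%d)\n", vc[0], vc[1], vc[2]);
    return 0;
}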

Page 20: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

20

Agenda

• Motivation and Contributions
• Background
• Matches-Before for MPI
• The centralized approach: ISP
• The distributed approach: DAMPI
• Conclusions

Page 21: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

21

The necessity for matches-before

• The notion of an event “happening” does not mean much for MPI
• Local (program-order) ordering does not always hold
  – For example, P0: send(1); send(2); (sends to different destinations need not be ordered)

• The notion of completion also does not work

P0: send(1); send(1)
P1: irecv(*, h); recv(*); wait(h)

The irecv(*) happens before the recv(*) but completes after it.

Page 22: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

22

Possible states of an MPI call

• All possible states of an MPI call:
  – Issued
  – Matched
  – Completed
  – Returned
• It’s always possible to know exactly which state the call is in
  – Except for the Matched state, which has a matching window

P1: isend(0,h1); barrier; send(0); wait(h1)
P2: irecv(0,h2); barrier; recv(0); wait(h2)

Page 23: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

23

Definition of matches-before
• recv, barrier, and wait match before all the calls that follow them
• sends and receives have a matches-before order according to the non-overtaking rule
• Nonblocking calls match before their waits
• Matches-before is irreflexive, asymmetric, and transitive

[Figure: the P1/P2 example from the previous slide annotated with its matches-before edges]

Page 24: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

24

The role of match-sets
• When a send and a receive match, they form a match-set
• When barriers match, they form a match-set
• Members of a match-set are not ordered by matches-before among themselves
• e → M iff e → (the send call or some barrier call) of M
• M → e iff (the receive call or some barrier call) of M → e

[Figure: a send–recv match-set and a barrier match-set, with surrounding events e1, e2, e3 ordered through them]

Page 25: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

25

Agenda

• Motivation and Contributions
• Background
• Matches-Before for MPI
• The centralized approach: ISP
• The distributed approach: DAMPI
• Conclusions

Page 26: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

26

Previous work: the centralized approach (ISP)

[Architecture: the MPI program is compiled into an executable; an interposition layer sits between processes Proc1 … Procn and the MPI runtime, and a central scheduler controls each run]

• Verifies MPI programs for deadlocks, resource leaks, assertion violations
• Guarantees coverage over the space of MPI non-determinism
• FM 2010, PPoPP 2009, EuroPVM 2010, EuroPVM 2009

Page 27: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

27

Drawbacks of ISP
• Scales only up to 32–64 processes
• Large programs (of 1000s of processes) often exhibit bugs that are not triggered at low ends
  – Index out of bounds
  – Buffer overflows
  – MPI implementation bugs
• Need a truly in-situ verification method for codes deployed on large-scale clusters!
  – Verify an MPI program as deployed on a cluster

Page 28: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

28

Agenda

• Motivation and Contributions
• Background
• Matches-Before for MPI
• The centralized approach: ISP
• The distributed approach: DAMPI
• Conclusions

Page 29: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

29

DAMPI

• Distributed Analyzer of MPI Programs
• Dynamic verification focusing on coverage over the space of MPI non-determinism
• Verification occurs on the actual deployed code
• DAMPI’s features:
  – Detects and enforces alternative outcomes
  – Scalable
  – User-configurable coverage

Page 30: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

30

DAMPI Framework

[Architecture: the MPI program is compiled into an executable and run on the MPI runtime with DAMPI's PnMPI modules interposed on each process Proc1 … Procn; the run records alternate matches, which the schedule generator turns into epoch decisions that drive reruns]

DAMPI – Distributed Analyzer for MPI

Page 31: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

31

Main idea in DAMPI: distributed causality tracking
• Perform an initial run of the MPI program

• Track causalities (discover which alternative non-deterministic matches could have occurred)

• Two alternatives:
  – Use vector clocks (thorough, but non-scalable)
  – Use Lamport clocks (our choice)

• Omissions possible – but only in unrealistic situations

• Scalable!

Page 32: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

32

DAMPI uses Lamport clocks to maintain matches-before
• Use the Lamport clock to track matches-before
  – Each process keeps a logical clock
  – Attaches the clock to each outgoing message
  – Increases it after a nondeterministic receive has matched
• Matches-before allows us to infer when irecvs match
  – Compare the incoming clock to detect potential matches (a code sketch follows the example below)

[Example: P0: barrier[0]; send(1)[0] – P1: barrier[0]; irecv(*)[0]; recv(*)[1]; wait[2] – P2: barrier[0]; send(1)[0]; the bracketed numbers are the Lamport clock values attached to or recorded at each call]
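A simplified sketch (our illustration, not DAMPI's actual source) of the protocol described above: one integer clock per process, piggybacked on every outgoing message and advanced only when a wildcard receive matches; a send whose piggybacked clock does not exceed the clock the wildcard receive held when it matched is reported as a potential alternative match.

#include <stdbool.h>
#include <stdio.h>

static int clock_ = 0;                      /* one integer clock per process */

/* Every outgoing message carries the sender's current clock. */
static int piggyback(void) { return clock_; }

/* Only a matched nondeterministic (wildcard) receive advances the clock;
 * the value the receive held when it matched is remembered. */
static int on_wildcard_match(void) {
    int clock_at_match = clock_;
    clock_ += 1;
    return clock_at_match;
}

/* Clock test for alternative matches: by the wildcard ordering lemma
 * (next slides), R ->mb e implies C(R) < C(e), so a send carrying a clock
 * <= C(R) cannot be ordered after R and may be a potential match for R. */
static bool potential_match(int clock_at_match, int piggybacked_clock) {
    return piggybacked_clock <= clock_at_match;
}

int main(void) {
    int r = on_wildcard_match();            /* irecv(*) matches while clock = 0 */
    printf("pb=%d potential=%d\n", piggyback(), potential_match(r, 0));
    return 0;
}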

Page 33: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

33

How we use the matches-before relationship to detect alternative matches

[The same P0/P1/P2 example as on the previous slide, with the Lamport clock values shown in brackets]

Page 34: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

34

How we use the matches-before relationship to detect alternative matches
• Wildcard ordering lemma: if r is a wildcard receive and r → e, then C(r) < C(e)
• Intuition:
  – The process increases the clock after r matches
  – If e is in the same process as r, the lemma holds
  – If e is not in the same process, there must be match-sets that act as “bridges” so that r → e; piggybacking ensures the lemma holds

[Figure: send–recv and barrier match-sets acting as bridges between processes]

Page 35: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

35

How we use the matches-before relationship to detect alternative matches

[The same example, with R = irecv(*)[0] and R' = recv(*)[1] on P1, and S = send(1)[0] the send that matched R']

• If S → R, then S → R', which violates the match-set property
  – Thus, S → R does not hold
• The wildcard ordering lemma gives: R → S does not hold
• Thus, S and R are concurrent

Page 36: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

36

Limitations of Lamport clocks

[Figure: a four-process example (P0–P3) with wildcard receives R1(*), R2(*), R3(*), and R(*), sends S(P2) and S(P3), Lamport clock values 0–3, and piggybacked clocks pb(0) and pb(3); one S(P3) is flagged as “a potential match”]

Our protocol guarantees that impossible matches will not be forced (there could be deadlocks otherwise).

Page 37: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

37

Lamport Clocks vs Vector Clocks

• DAMPI provides two protocols
  – Lamport clocks: sound and scalable
  – Vector clocks: sound and complete
• We evaluate scalability and accuracy
  – Scalability: bandwidth, latency, overhead
  – Accuracy: omissions
• The Lamport clocks protocol does not have any omissions in practice
  – MPI applications have well-structured communication patterns

Page 38: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

38

Experiments setup

• Atlas cluster at LLNL
  – 1152 nodes, 8 cores per node
  – 16 GB of memory per node
  – MVAPICH-2
• All experiments run at 8 tasks per node
• Results averaged over five runs

Page 39: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

39

Latency Impact

[Chart: latency (microseconds) of 4-byte messages at 64–1024 processes, comparing Original, Lamport Clocks, and Vector Clocks]

Page 40: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

40

Bandwidth Impact

[Chart: bandwidth at 64–1024 processes, comparing Original, Lamport Clocks, and Vector Clocks]
Page 41: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

41

Application overhead – ParMETIS

[Chart: ParMETIS-3.1.1 slowdown at 64–1024 processes for Original, Lamport Clocks, and Vector Clocks]

Page 42: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

42

Application overhead – AMG 2006

[Chart: AMG2006 slowdown at 64–1024 processes for Original, Lamport Clocks, and Vector Clocks]

Page 43: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

43

Application overhead – SMG2000

[Chart: SMG2000 slowdown at 64–1024 processes for Original, Lamport Clocks, and Vector Clocks]

Page 44: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

44

DAMPI’s Implementation Detail: using PnMPI

[Architecture: the DAMPI framework from before, with the PnMPI module stack expanded. A DAMPI driver loads the core module and the piggyback module, plus optional error-checking modules: status, request, communicator, type, and deadlock modules. The stack records alternate matches, which the schedule generator turns into epoch decisions]

Page 45: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

45

Piggyback implementation details

• MPI does not provide a built-in mechanism to attach piggyback data to messages

• Most common piggyback mechanisms:
  – Attach the piggyback to the buffer: easy to use but expensive
  – Send the piggyback as a separate message: low overhead but has issues with wildcard receives
  – Use a user-defined datatype to transmit the piggyback: low overhead, but difficult to piggyback on collectives

Page 46: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

46

DAMPI uses a mixed piggyback scheme

• Datatype piggyback for point-to-point
• Separate-message piggyback for collectives

Wrapper (piggyback layer); pb_buf stores the piggyback data:

int MPI_Send(buf, count, user_type, …) {
    Create new datatype D from pb_buf and buf
    return PMPI_Send(MPI_BOTTOM, 1, D, …);
}

int MPI_Recv(buf, count, user_type, …) {
    Create new datatype D from pb_buf and buf
    return PMPI_Recv(MPI_BOTTOM, 1, D, …);
}

The wrappers send and receive (MPI_BOTTOM, 1, D) instead of (buffer, count, user_type), where D is a derived datatype covering both the piggyback and the message data.
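A sketch of how such a datatype could be constructed (our illustration under assumptions, not DAMPI's actual code): because the wrapper transfers from MPI_BOTTOM, the derived type is built from the absolute addresses of the piggyback buffer and the user buffer. The names pb_buf and pb_count are assumptions.

#include <mpi.h>

/* Build a datatype D covering pb_buf followed by the user's buffer,
 * for use as (MPI_BOTTOM, 1, D). Illustrative only. */
static MPI_Datatype make_piggyback_type(int *pb_buf, int pb_count,
                                        void *buf, int count,
                                        MPI_Datatype user_type) {
    MPI_Datatype D;
    int blocklens[2] = { pb_count, count };
    MPI_Aint displs[2];
    MPI_Datatype types[2] = { MPI_INT, user_type };

    /* Absolute addresses, since the transfer starts at MPI_BOTTOM. */
    MPI_Get_address(pb_buf, &displs[0]);
    MPI_Get_address(buf, &displs[1]);

    MPI_Type_create_struct(2, blocklens, displs, types, &D);
    MPI_Type_commit(&D);
    return D;                 /* caller frees it with MPI_Type_free */
}

The wrapper would then call PMPI_Send(MPI_BOTTOM, 1, D, dest, tag, comm), and free D after the call completes.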

Page 47: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

47

Experiments

• Comparison with ISP:
  – 64-node cluster of Intel Xeon X5550 (8 cores per node, 2.67 GHz), 24 GB RAM per node
  – All experiments were run with 8 tasks per node
• Measuring the overhead of DAMPI:
  – 800-node cluster of AMD Opteron (16 cores per node, 2.3 GHz), 32 GB RAM per node
  – All experiments were run with 16 tasks per node

Page 48: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

48

DAMPI maintains very good scalability vs ISP

[Chart: ParMETIS-3.1 (no wildcard) verification time in seconds at 4–32 tasks, ISP vs DAMPI]

Page 49: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

49

DAMPI is also faster at processing interleavings

[Chart: matrix multiplication with wildcard receives, time in seconds vs number of interleavings (250–1000), ISP vs DAMPI]

Page 50: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

50

Results on large applications: SpecMPI2007 and NAS-PB

[Chart: DAMPI slowdown at 1024 processes relative to the base run for ParMETIS-3.1, 107.leslie3d, 113.GemsFDTD, 126.lammps, 130.socorro, 137.lu, and the NAS benchmarks BT, CG, DT, EP, FT, IS, LU, MG]

Slowdown is for one interleaving; no replay was necessary.

Page 51: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

51

Heuristics for Overcoming Search Explosion

• Full coverage leads to state explosion
  – Given limited time, full coverage biases towards the beginning of the state space
• DAMPI offers two ways to limit the search space:
  – Ignore uninteresting regions: users annotate programs with MPI_Pcontrol (see the sketch below)
  – Bounded mixing: limits the impact of a nondeterministic choice (bound = infinity gives the full search)
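The slides do not show the annotation convention, so the following is a hypothetical illustration: MPI_Pcontrol is standard MPI, but the meaning of its level argument is tool-defined, and the 0/1 convention below is an assumption rather than DAMPI's documented interface.

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Pcontrol(0);    /* hypothetically: "uninteresting" region, skip exploration */
    /* ... deterministic setup communication ... */

    MPI_Pcontrol(1);    /* hypothetically: region of interest, explore nondeterminism */
    /* ... wildcard receives the tool should explore ... */

    MPI_Finalize();
    return 0;
}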

Page 52: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

52

Bounded Mixing visualization

[Tree of nondeterministic choice points A, B, C, D, E, F, G, H leading to MPI_Finalize]

Bound = 1

Page 53: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

53

Bounded Mixing visualization

[The same tree]

Bound = 1

Page 54: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

54

Bounded Mixing visualization

[The same tree]

Bound = 1; total interleavings: 8

Page 55: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

55

Bounded Mixing visualization

[The same tree]

Bound = 2

Page 56: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

56

Bounded Mixing visualization

[The same tree]

Bound = 2

Page 57: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

57

Bounded Mixing visualization

[The same tree]

Bound = 2

Page 58: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

58

Bounded Mixing visualization

[The same tree]

Bound = 2: total interleavings: 10
Bound >= 3: total interleavings: 16

Page 59: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

59

Applying Bounded Mixing on ADLB

Page 60: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

60

How well did we do?

• DAMPI achieves scalable verification
  – Coverage over the space of nondeterminism
  – Works on realistic MPI programs at large scale

• Further correctness checking capabilities can be added as modules to DAMPI

Page 61: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

61

• Questions?

Page 62: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

62

Concluding Remarks

• Scalable dynamic verification for MPI is feasible
  – Combines the strengths of testing and formal methods
  – Guarantees coverage over nondeterminism

• Matches-before ordering for MPI provides the basis for tracking causality in MPI

• DAMPI is the first MPI verifier that can scale beyond hundreds of processes

Page 63: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

63

Moore's Law still holds

Page 64: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

64

Results on non-trivial applications: SpecMPI2007 and NAS-PB

Benchmark       Slowdown   Total R*   Communicator Leak   Request Leak
ParMETIS-3.1    1.18       0          Yes                 No
104.milc        15         51K        Yes                 No
107.leslie3d    1.14       0          No                  No
113.GemsFDTD    1.13       0          Yes                 No
126.lammps      1.88       0          No                  No
130.socorro     1.25       0          No                  No
137.lu          1.04       732        Yes                 No
BT              1.28       0          Yes                 No
CG              1.09       0          No                  No
DT              1.01       0          No                  No
EP              1.02       0          No                  No
FT              1.01       0          Yes                 No
IS              1.09       0          No                  No
LU              2.22       1K         No                  No
MG              1.15       0          No                  No

Page 65: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

65

How ISP Works: Delayed Execution

[Figure: processes P0, P1, P2 issue Isend(1, req), Irecv(*, req), Recv(2), Wait(req), and Barrier calls; the ISP scheduler intercepts each call, replies “sendNext”, and forwards calls to the MPI runtime only when it decides to]

Page 66: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

66

Delayed Execution

[Figure: the scheduler has collected Isend(1), Irecv(*), and Barrier calls without yet issuing them to the MPI runtime]

Page 67: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

67

Collect Max Set of Enabled MPI calls

[Figure: the scheduler collects the maximal set of enabled MPI calls, here the Barrier from every process, before issuing them to the MPI runtime]

Page 68: Scalable Formal Dynamic Verification of MPI Programs through Distributed Causality Tracking

68

Build Happens-Before Edges

[Figure: the scheduler builds happens-before edges among the collected calls; when it rewrites Irecv(*) to Irecv(2) and forces that match, the subsequent Recv(2) finds no matching send and a deadlock is detected]