Page 1

Argonne National Laboratory School of Computing and SCI Institute, University of Utah

Formal Verification of Programs That Use MPI One-Sided Communication

Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby
School of Computing, University of Utah

Rajeev Thakur, William Gropp
Mathematics and Computer Science Division, Argonne National Laboratory

Page 2

Thesis of the Talk

• The demand for concurrent software is increasing.

• Concurrent algorithms are notoriously hard to design and verify.

• Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.

• Principal advantages of the model checking approach:
  – provides a formal framework for reasoning
  – allows coverage: examination of all possible process interleavings

Thesis: If finite-state models are created and exhaustively analyzed for desired formal properties, robust algorithms and implementations will result.
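To make "examination of all possible process interleavings" concrete, here is a minimal Python sketch that enumerates every order in which the steps of two processes can be merged. The step names and the `interleavings` helper are illustrative assumptions, not part of the talk; a real model checker also prunes schedules that the program's synchronization forbids.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)           # positions in the schedule taken by a
        ia, ib = iter(a), iter(b)
        yield tuple(next(ia) if i in slots else next(ib) for i in range(n))

p0 = ("P0.1", "P0.2", "P0.3")   # three atomic steps of process 0 (hypothetical)
p1 = ("P1.1", "P1.2", "P1.3")   # three atomic steps of process 1 (hypothetical)
schedules = list(interleavings(p0, p1))
print(len(schedules))           # C(6, 3) = 20 distinct schedules
```

Even two three-step processes yield 20 schedules, and the count grows combinatorially with more steps or processes, which is why exhaustive coverage is normally done on a down-scaled model.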

Page 3

What is Model Checking?

Navier-Stokes Equations are a mathematical model of fluid flow physics

“V&V”: Validation and Verification. “Validate Models, Verify Codes.”

“Formal models,” which translate and abstract algorithms and implementations, can be generated either automatically or by a modeler.

Page 4

Model Checking: History and Current Practice

• History
  – Approach invented around 1981 by Clarke and Emerson, and by Queille and Sifakis
  – Widely used in hardware verification since the 1990s
  – Use in software verification is the current rage

• Notable Successes
  – Bell Labs: Telephone Switch Software Verification
  – NASA: Concurrent Java Program Verification
  – Microsoft: Device Driver Verification

• Applications in HPC by others:
  – Siegel and Avrunin: MPI two-sided communication programs
  – Matlin, Lusk, McCune: Verifying parts of MPD

Page 5

MPI One-Sided Communication

• MPI One-Sided Constructs Examined:
  – MPI_Win_lock
  – MPI_Win_unlock
  – MPI_Put
  – MPI_Get

• The desired atomicity is provided by the constructs MPI_Win_lock / MPI_Win_unlock

• Once the lock is relinquished, data values can no longer be trusted
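The last two bullets can be illustrated with a toy model. The sketch below is only an analogy in Python: the `Window` class and its methods are hypothetical stand-ins for the four MPI calls, not MPI bindings, and it ignores the fact that real MPI_Put/MPI_Get operations are only guaranteed to be complete once MPI_Win_unlock returns.

```python
import threading

class Window:
    """Toy stand-in for an MPI window: shared memory plus one exclusive lock."""
    def __init__(self, size):
        self._mem = [0] * size
        self._mutex = threading.Lock()

    def lock(self):                    # plays the role of MPI_Win_lock
        self._mutex.acquire()

    def unlock(self):                  # plays the role of MPI_Win_unlock
        self._mutex.release()

    def put(self, offset, values):     # plays the role of MPI_Put
        self._mem[offset:offset + len(values)] = values

    def get(self, offset, count):      # plays the role of MPI_Get
        return list(self._mem[offset:offset + count])

win = Window(6)

# Process A publishes its values and reads the window inside one epoch.
win.lock()
win.put(0, [1, 3, 5])
snapshot = win.get(0, 6)
win.unlock()

# Process B updates the window after A has unlocked.
win.lock()
win.put(3, [1, 3, 5])
win.unlock()

# A's snapshot was consistent when taken, but it is stale now.
win.lock()
current = win.get(0, 6)
win.unlock()
print(snapshot, current)
```

The snapshot and the current contents differ, which is the sense in which values read under the lock can no longer be trusted once the lock is given up.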

Page 6

Test Case: Byte-Range Algorithm

• Algorithm implemented using MPI one-sided communication (with passive-target lock-unlock synchronization) for coordinating a collection of parallel processes contending for byte-range locks.

Notes Concerning Algorithm:

• To acquire a lock, a process must checkpoint the global state by ‘simultaneously’ indicating its intent and reading others’ status.

• When the lock owner releases the lock, it wakes up all conflicting ‘sleeping’ processes.

Page 7

Lock Acquire

lock_acquire (start, end) {
  val[1] = start; val[2] = end;
  while (1) {
    /* Stage 1 */
    val[0] = 1;      /* flag; set again on every retry */
    conflict = 0;
    lock_win
    place val in win
    get values of other processes from win
    unlock_win
    for all i, if (Pi conflicts with my range)
      conflict = 1;

    /* Stage 2 */
    if (conflict) {
      val[0] = 0
      lock_win
      place val in win
      unlock_win
      MPI_Recv(ANY_SOURCE)
    }
    else {
      /* lock is acquired */
      break;
    }
  } // end while
}

Window:
P0 | P1
flag start end | 0 -1 -1 | 0 -1 -1 | 0 -1 -1
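The two stages of lock_acquire can be sketched as ordinary functions over a list that stands in for the window. This is a minimal Python model, not the authors' code: the function names are hypothetical, each function represents one lock_win ... unlock_win epoch and is therefore treated as atomic, and it assumes that only slots whose flag is 1 count as conflicts and that ranges are closed intervals.

```python
def ranges_conflict(s1, e1, s2, e2):
    """True if the closed byte ranges [s1, e1] and [s2, e2] overlap."""
    return s1 <= e2 and s2 <= e1

def stage1(win, me, start, end):
    """Publish (flag=1, start, end) and return the ranks that conflict."""
    win[me] = [1, start, end]                    # place val in win
    snapshot = [list(slot) for slot in win]      # get values of other processes
    return [i for i, (flag, s, e) in enumerate(snapshot)
            if i != me and flag == 1 and ranges_conflict(start, end, s, e)]

def stage2(win, me):
    """Clear only the flag; the range stays visible while the process waits."""
    win[me][0] = 0

win = [[0, -1, -1] for _ in range(4)]
print(stage1(win, 0, 3, 5))   # [] : P0 sees no conflict and holds the lock
print(stage1(win, 1, 3, 5))   # [0]: P1 conflicts with P0
stage2(win, 1)
print(win[1])                 # [0, 3, 5]: P1 is about to block in MPI_Recv
```

Leaving the range in place after Stage 2 is what lets a later lock_release find the waiting process.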

Page 8

Lock Release

lock_release (start, end) {
  val[0] = 0;   /* flag */
  val[1] = -1;
  val[2] = -1;

  lock_win
  place val in win
  get values of other processes from win
  unlock_win

  for all i, if (Pi conflicts with my range)
    MPI_Send(Pi);
}

Window:
P0 | P1
flag start end | 0 -1 -1 | 0 -1 -1 | 0 -1 -1
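A matching sketch of lock_release, again as a hypothetical Python model rather than the authors' implementation. It reads the released range from the caller's own slot instead of taking it as arguments, and it assumes the other slot's flag is ignored, so both blocked (flag 0) and still-trying (flag 1) processes with an overlapping range are signalled.

```python
def ranges_conflict(s1, e1, s2, e2):
    """True if the closed byte ranges [s1, e1] and [s2, e2] overlap."""
    return s1 <= e2 and s2 <= e1

def lock_release(win, me):
    """Clear own slot and return the ranks that would receive MPI_Send."""
    _, start, end = win[me]
    win[me] = [0, -1, -1]                        # place val in win
    return [i for i, (_, s, e) in enumerate(win)
            if i != me and ranges_conflict(start, end, s, e)]

# P0 holds bytes 3..5 and P1 is blocked waiting for the same bytes.
win = [[1, 3, 5], [0, 3, 5], [0, -1, -1], [0, -1, -1]]
print(lock_release(win, 0))   # [1]: P1 is woken up
```

An empty slot holds (-1, -1), which never overlaps a real range, so idle processes are skipped without a special case.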

Page 9

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
1 3 5 | 0 -1 -1 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

Page 10

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
1 3 5 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

P1: Deduces Conflict – Stage 1

Page 11

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
1 3 5 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

P1: Deduces Conflict – Stage 2; Blocks on Receive

Page 12

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
0 -1 -1 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

P0: Send Signal to P1
P1: Deduces Conflict – Stage 2; Blocks on Receive

Page 13

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
0 -1 -1 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

P0: Send Signal to P1
P1: Receives Signal; Retry Stage 1

Page 14

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
0 -1 -1 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5); lock_release()

Page 15

Example 1: Demonstration of Lock Acquire/Release Strategy

Window:
P0 | P1
0 -1 -1 | 0 -1 -1 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5); lock_release()

Page 16

Modeling in Promela

• C-like structure
• Powerful abstractions like channels

Example Promela Code for lock_release

inline MPI_Win_lock(proc_id) {
  /* try sending a message on a channel of size 1;
     will block if a message is already in the queue. */
  lock_chan!proc_id;
}
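The channel-as-lock idea in the inline above has a close analogue in Python's standard `queue` module, shown here as a rough sketch. The function names are hypothetical, and `try_win_lock` is an addition for demonstration only; it is not something the Promela model is known to contain.

```python
import queue

# Plays the role of a Promela channel with capacity 1.
lock_chan = queue.Queue(maxsize=1)

def win_lock(proc_id):
    """Like lock_chan!proc_id: blocks while another id occupies the channel."""
    lock_chan.put(proc_id)

def win_unlock():
    """Removing the stored id frees the single slot for the next sender."""
    lock_chan.get_nowait()

def try_win_lock(proc_id):
    """Non-blocking variant, used here only to show the channel is full."""
    try:
        lock_chan.put_nowait(proc_id)
        return True
    except queue.Full:
        return False

win_lock(0)
print(try_win_lock(1))   # False: process 0 holds the lock
win_unlock()
print(try_win_lock(1))   # True: the slot is free again
```

Storing the process id in the channel, rather than a dummy token, also records who currently holds the lock, which can be handy when stating properties about the model.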

Page 17

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
1 3 5 | 0 -1 -1 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Page 18

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
1 3 5 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P1: Deduces Conflict – Stage 1

Page 19

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
0 -1 -1 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Send Signal to P1
P1: Deduces Conflict – Stage 1

Page 20

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
1 3 5 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 1
P1: Deduces Conflict – Stage 1

Page 21

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
1 3 5 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 1
P1: Deduces Conflict – Stage 2; Block on Receive

Page 22

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
1 3 5 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 1
P1: Receive Signal; Retry Stage 1

Page 23

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
1 3 5 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 1
P1: Deduces Conflict – Stage 1

Page 24

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
0 3 5 | 1 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 2; Block on Receive
P1: Deduces Conflict – Stage 1

Page 25

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
0 3 5 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 2; Block on Receive
P1: Deduces Conflict – Stage 2; Block on Receive

Page 26

Example 2: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1
0 3 5 | 0 3 5 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 2; Block on Receive
P1: Deduces Conflict – Stage 2; Block on Receive

DEADLOCK
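The deadlock above can also be found mechanically. The following Python sketch is a small explicit-state search written for illustration; it is not the authors' Promela model. It assumes that every lock_win ... unlock_win epoch is one atomic step, that a process conflicts only with slots whose flag is 1, that a release signals every overlapping range regardless of flag, and that each process eventually releases the lock it acquires. A final lock_release is appended to both scripts so that a process waiting on a lock that will never be released is not counted.

```python
def overlap(a, b):
    """Closed ranges (start, end) overlap; (-1, -1) never overlaps a real range."""
    return a[0] <= b[1] and b[0] <= a[1]

SCRIPTS = (("acq", "rel", "acq", "rel"),   # Process 0
           ("acq", "rel"))                 # Process 1
RANGE = (3, 5)                             # both processes want bytes 3..5

def successors(state):
    """Yield each state reachable by one atomic step of one process."""
    pcs, win, mail = state
    for p, (idx, phase) in enumerate(pcs):
        if idx == len(SCRIPTS[p]):
            continue                               # script finished
        w, m = list(win), list(mail)
        others = [q for q in range(len(pcs)) if q != p]
        if SCRIPTS[p][idx] == "rel":
            w[p] = (0, -1, -1)
            for q in others:
                if overlap(RANGE, win[q][1:]):
                    m[q] += 1                      # MPI_Send(Pq)
            nxt = (idx + 1, "s1")
        elif phase == "s1":                        # Stage 1 of lock_acquire
            w[p] = (1,) + RANGE
            conflict = any(win[q][0] == 1 and overlap(RANGE, win[q][1:])
                           for q in others)
            nxt = (idx, "s2") if conflict else (idx + 1, "s1")
        elif phase == "s2":                        # Stage 2: clear the flag
            w[p] = (0,) + RANGE
            nxt = (idx, "recv")
        else:                                      # phase == "recv"
            if m[p] == 0:
                continue                           # blocked in MPI_Recv
            m[p] -= 1
            nxt = (idx, "s1")
        yield (pcs[:p] + (nxt,) + pcs[p + 1:], tuple(w), tuple(m))

def find_deadlocks():
    """Visit every reachable state; collect those that are stuck but unfinished."""
    n = len(SCRIPTS)
    init = (((0, "s1"),) * n, ((0, -1, -1),) * n, (0,) * n)
    seen, stack, deadlocks = {init}, [init], []
    while stack:
        state = stack.pop()
        nexts = list(successors(state))
        finished = all(idx == len(SCRIPTS[p])
                       for p, (idx, _) in enumerate(state[0]))
        if not nexts and not finished:
            deadlocks.append(state)
        for s in nexts:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return deadlocks

for pcs, win, mail in find_deadlocks():
    print(pcs, win, mail)
```

The reported state has both processes in the "recv" phase, both window slots at (0, 3, 5), and no pending messages, which matches the final frame of Example 2.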

Page 27

Observations After Model Checking

• P0 releases lock before it can see that P1 will be blocked.

• There is no way for P0 to figure out whether P1 merely wants the lock or is actually blocked.

• Multiple unmatched sends can occur (example to follow)

Page 28

Example 3: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1 | P2
1 3 5 | 1 6 8 | 0 -1 -1 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

Page 29

Example 3: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1 | P2
1 3 5 | 1 6 8 | 1 5 6 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

P2: Deduces Conflict – Stage 1

Page 30

Example 3: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1 | P2
1 3 5 | 1 6 8 | 0 5 6 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

P2: Deduces Conflict – Stage 2; Block on Receive

Page 31

Example 3: Demonstration of Lock Acquire/Release Limitation

Window:
P0 | P1 | P2
0 -1 -1 | 0 -1 -1 | 0 5 6 | 0 -1 -1

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

P0: Send Signal to P2
P1: Send Signal to P2
P2: Deduces Conflict – Stage 2; Block on Receive
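The unmatched send in this three-process scenario can be reproduced with a few lines of Python. This is an illustrative sketch under the same assumptions as before (closed ranges, release signals every overlapping range); the `mailbox` dictionary is a hypothetical stand-in for MPI's message queues.

```python
from collections import deque

def overlap(a, b):
    """Closed ranges (start, end) overlap; (-1, -1) never overlaps a real range."""
    return a[0] <= b[1] and b[0] <= a[1]

# Window just before the releases: P0 and P1 hold locks, P2 is blocked.
win = {0: [1, 3, 5], 1: [1, 6, 8], 2: [0, 5, 6]}
mailbox = {p: deque() for p in win}      # pending messages per process

def lock_release(me):
    mine = tuple(win[me][1:])
    win[me] = [0, -1, -1]
    for p, (_, s, e) in win.items():
        if p != me and overlap(mine, (s, e)):
            mailbox[p].append(me)        # models MPI_Send(Pp)

lock_release(0)        # bytes 3..5 overlap P2's 5..6
lock_release(1)        # bytes 6..8 overlap P2's 5..6
mailbox[2].popleft()   # P2's single MPI_Recv matches one message
print(len(mailbox[2])) # 1: the second send has no matching receive yet
```

The leftover message stays queued, so a later MPI_Recv by P2 in an unrelated lock_acquire could return immediately on a stale signal.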

Page 32

Proposed Solution 1

• Main idea: Distinguish between processes that want the lock and those that are blocked.

• Three possible flag values:
  – 0 = I do not have the lock
  – 1 = I have the lock
  – 2 = I am trying for the lock

• If a process wants the lock, but finds another conflicting process with a flag value of 2, it must wait until this value changes to either 1 or 0.

• We have added more certainty to the algorithm but taken a possible performance hit and possible livelock.
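One way to read the three-valued flag is as a decision rule applied to a snapshot of the window. The Python sketch below is an interpretation, not the authors' algorithm: the `decide` function, its return values, and the choice to block only on a conflicting holder are assumptions made for illustration.

```python
NO_LOCK, HAS_LOCK, TRYING = 0, 1, 2

def decide(win, me):
    """Return what process `me` (already marked TRYING) should do next."""
    _, start, end = win[me]
    # Flags of every other process whose closed range overlaps mine.
    flags = {flag for i, (flag, s, e) in enumerate(win)
             if i != me and start <= e and s <= end}
    if TRYING in flags:
        return "wait"      # re-read until that flag becomes 0 or 1
    if HAS_LOCK in flags:
        return "block"     # the holder will signal on release
    return "acquire"

print(decide([[2, 3, 5], [1, 3, 5]], 0))   # block
print(decide([[2, 3, 5], [0, -1, -1]], 0)) # acquire
print(decide([[2, 3, 5], [2, 3, 5]], 0))   # wait
print(decide([[2, 3, 5], [2, 3, 5]], 1))   # wait
```

The last two calls show the livelock risk named on the slide: two processes that are both trying for overlapping ranges can each keep waiting for the other's flag to change.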

Page 33

Proposed Solution 2

• Main Idea: The process about to be blocked picks who will wake it up and indicates so by writing to shared memory.

• Once processes declare their intentions globally, deadlock can be avoided.

• For there to be deadlock, a dependency cycle must exist.

• The last process to complete this cycle will know about it and must not do so.

Window:

P0 P1

flag start end pick -1 -1 0 -1 -1 0 -1 -1
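The cycle check behind this proposal can be sketched as a walk along the pick fields. The Python below is a hypothetical illustration: `would_deadlock` and the `picks` list (one entry per process, -1 meaning "not waiting on anyone") are names introduced here, and the slides do not say how the authors implement the check.

```python
def would_deadlock(me, candidate, picks):
    """True if blocking on `candidate` would close a cycle of picks back to `me`."""
    seen = set()
    cur = candidate
    while cur != -1 and cur not in seen:
        if cur == me:
            return True          # the chain of picks leads back to me
        seen.add(cur)
        cur = picks[cur]         # follow whom `cur` is waiting on
    return False

# P1 is blocked and has picked P0; P0 now considers blocking on P1.
print(would_deadlock(0, 1, [-1, 0]))    # True: P0 must not block
# Nobody is waiting yet, so P1 may safely pick P0.
print(would_deadlock(1, 0, [-1, -1]))   # False
```

The `seen` set keeps the walk finite even if the picks already contain a cycle that does not involve the caller.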

Page 34

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
1 3 5 -1 | 0 -1 -1 -1 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Page 35

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
1 3 5 -1 | 1 3 5 -1 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P1: Deduces Conflict – Stage 1

Page 36

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
0 -1 -1 -1 | 1 3 5 -1 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P1: Deduces Conflict – Stage 1

Page 37

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
1 3 5 -1 | 1 3 5 -1 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 1
P1: Deduces Conflict – Stage 1

Page 38

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
1 3 5 -1 | 0 3 5 0 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Conflict – Stage 1
P1: Deduces Conflict – Stage 2; Block on Receive

Page 39

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
1 3 5 -1 | 0 3 5 0 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: No Conflict – Stage 1
P1: Deduces Conflict – Stage 2; Block on Receive

Page 40

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Window:
P0 | P1
0 3 5 -1 | 0 3 5 0 | 0 -1 -1 -1

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

P0: Deduces Deadlock – Stage 2; Reset to Stage 1
P1: Deduces Conflict – Stage 2; Block on Receive

Page 41

Discussion and Future Work

“Execution Checking”: In current practice, concrete executions on a few diverse platforms are often used to verify algorithms/codes. Consequence: many feasible executions might not be manifested.

“Model Checking”: Model checking forces all executions of a judiciously down-scaled model to be examined.

Current focus of our research: minimize modeling effort and error.

Page 42

Summary

• Paradigms such as one-sided MPI and threading create a plethora of execution possibilities, many of which might be algorithmically fatal yet lie dormant at testing time.

• Model checking provides a formal and practical means of reasoning about all possible executions as part of the design, verification and optimization process.

Closing Question (“Food for Thought”):

• Can one come up with safe usages (i.e. easier to verify yet not overly restrictive) of one-sided communication?

Funding Acknowledgements:

• NSF (CSR–SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis)
• Microsoft (Formal Analysis and Code Generation Support for MPI)
• Office of Science – Department of Energy