
Argonne National Laboratory School of Computing and SCI Institute, University of Utah

Formal Verification of Programs That Use MPI One-Sided Communication

Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby

School of Computing, University of Utah

Rajeev Thakur, William Gropp

Mathematics and Computer Science Division, Argonne National Laboratory


• The demand for concurrent software is increasing.

• Concurrent algorithms are notoriously hard to design and verify.

• Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.

• Principal advantages of the model checking approach:
  – provides a formal framework for reasoning
  – allows coverage: examination of all possible process interleavings

Thesis of the Talk

Thesis: If finite-state models are created and exhaustively analyzed for desired formal properties, robust algorithms and implementations will result.
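To make "examination of all possible process interleavings" concrete, here is a minimal Python sketch (not from the talk) that enumerates every interleaving of two tiny processes, each performing a non-atomic read-then-write increment on a shared counter. The names `interleavings` and `run` are illustrative assumptions.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def run(schedule):
    """Execute a schedule of (process, op) steps on one shared counter."""
    shared = 0
    local = {}
    for proc, op in schedule:
        if op == "read":
            local[proc] = shared          # remember the value seen
        else:                             # "write": store the value read, plus one
            shared = local[proc] + 1
    return shared

p0 = [(0, "read"), (0, "write")]
p1 = [(1, "read"), (1, "write")]
outcomes = {run(s) for s in interleavings(p0, p1)}
print(sorted(outcomes))
```

Exhaustive enumeration surfaces the lost-update outcome alongside the expected one; a handful of test runs could easily miss it.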


What is Model Checking?

Navier-Stokes Equations are a mathematical model of fluid flow physics

“V&V”: Validation and Verification

“Validate Models, Verify Codes”

“Formal models,” which translate and abstract algorithms and implementations, can be generated either automatically or by a modeler.


Model Checking: History and Current Practice

• History
  – Approach invented around 1981 by Clarke and Emerson, and by Queille and Sifakis
  – Widely used in hardware verification since the 1990s
  – Use in software verification is the current rage

• Notable Successes
  – Bell Labs: telephone switch software verification
  – NASA: concurrent Java program verification
  – Microsoft: device driver verification

• Applications in HPC by others:
  – Siegel and Avrunin: MPI two-sided communication programs
  – Matlin, Lusk, McCune: verifying parts of MPD


MPI One-Sided Communication

• MPI One-Sided Constructs Examined:
  – MPI_Win_lock
  – MPI_Win_unlock
  – MPI_Put
  – MPI_Get

• The desired atomicity is provided by the constructs MPI_Win_lock / MPI_Win_unlock.

• Once the lock is relinquished, data values read under it can no longer be trusted.


Test Case: Byte-Range Algorithm

• Algorithm implemented using MPI one-sided communication (with passive-target lock-unlock synchronization) for coordinating a collection of parallel processes contending for byte-range locks.

Notes Concerning Algorithm:

• To acquire a lock, a process must checkpoint the global state by ‘simultaneously’ indicating its intent and reading others’ status.

• When the lock owner releases the lock, it wakes up all conflicting ‘sleeping’ processes.
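The "indicate intent and read others' status" step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: byte ranges are taken to be inclusive, only entries whose flag is 1 are treated as conflicts during acquire, and the names `ranges_conflict` and `stage1_attempt` are hypothetical. In the real algorithm the write and the read happen inside one lock/unlock epoch on the window; here ordinary sequential code stands in for that atomic step.

```python
def ranges_conflict(a, b):
    """True if inclusive byte ranges a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def stage1_attempt(window, me, start, end):
    """Publish intent for [start, end], then scan the other entries.

    window is a list of [flag, start, end] entries, one per process.
    Returns True if some other process with flag 1 holds an overlapping range.
    """
    window[me] = [1, start, end]                     # indicate intent
    snapshot = [list(entry) for entry in window]     # read everyone's status
    return any(
        flag == 1 and ranges_conflict((start, end), (s, e))
        for i, (flag, s, e) in enumerate(snapshot)
        if i != me
    )

window = [[0, -1, -1] for _ in range(4)]
first = stage1_attempt(window, 0, 3, 5)    # nobody else has declared intent
second = stage1_attempt(window, 1, 3, 5)   # overlaps the entry written by process 0
print(first, second)
```

A process that gets `True` back would go on to clear its flag and sleep until a releasing owner wakes it.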


Lock Acquire

lock_acquire (start, end) {
  val[1] = start; val[2] = end;
  while (1) {
    /* Stage 1 */
    val[0] = 1;    /* flag; set again on every retry */
    conflict = 0;
    lock_win
    place val in win
    get values of other processes from win
    unlock_win
    for all i, if (Pi conflicts with my range)
      conflict = 1;

    /* Stage 2 */
    if (conflict) {
      val[0] = 0;
      lock_win
      place val in win
      unlock_win
      MPI_Recv(ANY_SOURCE)
    }
    else {
      /* lock is acquired */
      break;
    }
  } // end while
}

Window: one entry per process (P0, P1, ...), each holding flag, start, end. Every entry is initially 0 -1 -1.


Lock Release

lock_release (start, end) {
  val[0] = 0; /* flag */
  val[1] = -1;
  val[2] = -1;

  lock_win
  place val in win
  get values of other processes from win
  unlock_win

  for all i, if (Pi conflicts with my range)
    MPI_Send(Pi);
}

Window: one entry per process (P0, P1, ...), each holding flag, start, end. Every entry is initially 0 -1 -1.

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 0 -1 -1; remaining entries = 0 -1 -1, 0 -1 -1

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P1: Deduces Conflict (Stage 1)

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P1: Deduces Conflict (Stage 2), blocks on Receive

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 0 -1 -1; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Sends Signal to P1
P1: Deduces Conflict (Stage 2), blocks on Receive

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 0 -1 -1; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Sends Signal to P1
P1: Receives Signal, retries Stage 1

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5); lock_release()

Window (flag start end): P0 = 0 -1 -1; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

Example 1: Demonstration of Lock Acquire/Release Strategy

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5); lock_release()

Window (flag start end): P0 = 0 -1 -1; P1 = 0 -1 -1; remaining entries = 0 -1 -1, 0 -1 -1


Modeling in Promela

Example Promela Code for lock_release
• C-like structure
• Powerful abstractions like channels

inline MPI_Win_lock(proc_id) {
  /* Try sending a message on a channel of size 1;
     this will block if a message is already in the queue. */
  lock_chan!proc_id;
}
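The channel-as-lock idea in the Promela fragment can be mimicked in Python with a bounded queue of capacity 1. This is only an analogy under assumptions: the class `ChannelLock` and its method names are hypothetical, the non-blocking `try_lock` is used so the behavior is easy to observe (a plain `put` would block as the Promela send does), and modeling unlock as a receive from the channel is a guess, since the slide shows only the lock side.

```python
import queue

class ChannelLock:
    """A lock modeled as a channel that can hold at most one message."""

    def __init__(self):
        self._chan = queue.Queue(maxsize=1)

    def try_lock(self, proc_id):
        """Attempt the send; fail instead of blocking if the channel is full."""
        try:
            self._chan.put_nowait(proc_id)
            return True
        except queue.Full:
            return False

    def unlock(self):
        """Remove the message, freeing the channel; returns the owner's id."""
        return self._chan.get_nowait()

lock = ChannelLock()
got_first = lock.try_lock(0)     # channel empty: process 0 takes the lock
got_second = lock.try_lock(1)    # channel full: process 1 would block here
owner = lock.unlock()
got_third = lock.try_lock(1)     # free again after unlock
print(got_first, got_second, owner, got_third)
```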

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 0 -1 -1; remaining entries = 0 -1 -1, 0 -1 -1

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P1: Deduces Conflict (Stage 1)

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 0 -1 -1; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Sends Signal to P1
P1: Deduces Conflict (Stage 1)

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 1)
P1: Deduces Conflict (Stage 1)

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 1)
P1: Deduces Conflict (Stage 2), blocks on Receive

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 1)
P1: Receives Signal, retries Stage 1

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 1 3 5; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 1)
P1: Deduces Conflict (Stage 1)

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 0 3 5; P1 = 1 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 2), blocks on Receive
P1: Deduces Conflict (Stage 1)

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 0 3 5; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 2), blocks on Receive
P1: Deduces Conflict (Stage 2), blocks on Receive

Example 2: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end): P0 = 0 3 5; P1 = 0 3 5; remaining entries = 0 -1 -1, 0 -1 -1

P0: Deduces Conflict (Stage 2), blocks on Receive
P1: Deduces Conflict (Stage 2), blocks on Receive

DEADLOCK
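The deadlock in Example 2 can be reproduced with a small exhaustive search in Python. This is a hand-built sketch, not the authors' Promela model, and it rests on several assumptions: each lock/unlock epoch is one atomic step, the wake-up send is folded into the release step, only entries with flag 1 count as conflicts during acquire, and each process is given a trailing release (not shown in the example) so that any stuck state is a genuine deadlock rather than a lock that was simply never released. All names are illustrative.

```python
from collections import deque

RANGE = (3, 5)                                   # both processes want bytes 3..5
PROGRAMS = (("acq", "rel", "acq", "rel"), ("acq", "rel"))
IDLE = (0, -1, -1)

def conflicts(a, b):
    """Overlap test on inclusive (start, end) pairs."""
    return a[0] <= b[1] and b[0] <= a[1]

def successors(state):
    """Yield every state reachable by one atomic step of one process."""
    window, procs, inbox = state
    for me, (op_index, stage) in enumerate(procs):
        if op_index == len(PROGRAMS[me]):
            continue                             # this process has finished
        other = 1 - me
        win, pr, box = list(window), list(procs), list(inbox)
        if PROGRAMS[me][op_index] == "rel":      # release: clear entry, wake conflicts
            win[me] = IDLE
            if conflicts(RANGE, window[other][1:]):
                box[other] += 1
            pr[me] = (op_index + 1, "s1")
        elif stage == "s1":                      # Stage 1: publish intent, read others
            win[me] = (1,) + RANGE
            if window[other][0] == 1 and conflicts(RANGE, window[other][1:]):
                pr[me] = (op_index, "s2")
            else:
                pr[me] = (op_index + 1, "s1")    # lock acquired
        elif stage == "s2":                      # Stage 2: clear flag, keep range
            win[me] = (0,) + RANGE
            pr[me] = (op_index, "recv")
        else:                                    # blocking receive
            if inbox[me] == 0:
                continue                         # no signal pending: not enabled
            box[me] -= 1
            pr[me] = (op_index, "s1")
        yield (tuple(win), tuple(pr), tuple(box))

def find_deadlock():
    """Breadth-first search for a state with no enabled step and unfinished work."""
    start = ((IDLE, IDLE), ((0, "s1"), (0, "s1")), (0, 0))
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        nxt = list(successors(state))
        done = all(pc == len(PROGRAMS[i]) for i, (pc, _) in enumerate(state[1]))
        if not nxt and not done:
            return state
        for s in nxt:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return None

deadlock = find_deadlock()
print(deadlock)
```

The state it reports has both window entries at 0 3 5 and both processes waiting in receive with no pending signal, which is the situation shown on the DEADLOCK slide.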


Observations After Model Checking

• P0 releases lock before it can see that P1 will be blocked.

• There is no way for P0 to figure out whether P1 merely wants the lock or is actually blocked.

• Multiple unmatched sends can occur (example to follow)

Example 3: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

Window (flag start end): P0 = 1 3 5; P1 = 1 6 8; P2 = 0 -1 -1; remaining entry = 0 -1 -1

Example 3: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

Window (flag start end): P0 = 1 3 5; P1 = 1 6 8; P2 = 1 5 6; remaining entry = 0 -1 -1

P2: Deduces Conflict (Stage 1)

Example 3: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

Window (flag start end): P0 = 1 3 5; P1 = 1 6 8; P2 = 0 5 6; remaining entry = 0 -1 -1

P2: Deduces Conflict (Stage 2), blocks on Receive

Example 3: Demonstration of Lock Acquire/Release Limitation

Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)

Window (flag start end): P0 = 0 -1 -1; P1 = 0 -1 -1; P2 = 0 5 6; remaining entry = 0 -1 -1

P0: Sends Signal to P2
P1: Sends Signal to P2
P2: Deduces Conflict (Stage 2), blocks on Receive
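The unmatched-send problem in this three-process example can be traced with a few lines of Python. The sketch assumes inclusive byte ranges and uses a `deque` as a stand-in for P2's incoming message queue; the variable names are illustrative.

```python
from collections import deque

def ranges_conflict(a, b):
    """True if inclusive byte ranges a and b overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

held = {0: (3, 5), 1: (6, 8)}    # ranges released by P0 and P1
blocked = {2: (5, 6)}            # P2 is asleep, wanting bytes 5..6
inbox = {2: deque()}

# Each releasing process signals every sleeper whose range overlaps its own.
for releaser, rng in held.items():
    for waiter, wanted in blocked.items():
        if ranges_conflict(rng, wanted):
            inbox[waiter].append(releaser)

inbox[2].popleft()               # P2's single MPI_Recv(ANY_SOURCE) takes one signal
leftover = list(inbox[2])
print(leftover)
```

One signal remains queued after P2 wakes up. A later blocking receive by P2 would be satisfied by this stale message even though nobody released a lock at that point.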


Proposed Solution 1

• Main idea: Distinguish between processes that want the lock and those that are blocked.

• Three possible flag values:
  – 0 = I do not have the lock
  – 1 = I have the lock
  – 2 = I am trying for the lock

• If a process wants the lock, but finds another conflicting process with a flag value of 2, it must wait until this value changes to either 1 or 0.

• We have added more certainty to the algorithm, at the cost of a possible performance hit and possible livelock.
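A minimal Python sketch of the three-valued flag and the resulting decision is given below. It is an interpretation, not the authors' implementation: the names `Flag` and `decide` are hypothetical, byte ranges are assumed inclusive, and checking for a flag of 2 before a flag of 1 is an arbitrary ordering choice.

```python
from enum import IntEnum

class Flag(IntEnum):
    NOT_HELD = 0    # I do not have the lock
    HELD = 1        # I have the lock
    TRYING = 2      # I am trying for the lock

def ranges_conflict(a, b):
    """True if inclusive byte ranges a and b overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def decide(my_range, others):
    """Return 'wait', 'block', or 'acquire' given the other window entries.

    others is an iterable of (flag, start, end) tuples read from the window.
    """
    relevant = [Flag(f) for f, s, e in others if ranges_conflict(my_range, (s, e))]
    if Flag.TRYING in relevant:
        return "wait"       # outcome undecided: poll until it becomes 0 or 1
    if Flag.HELD in relevant:
        return "block"      # a real conflict: sleep until signalled
    return "acquire"

print(decide((3, 5), [(2, 3, 5), (0, -1, -1)]))
print(decide((3, 5), [(1, 5, 6)]))
print(decide((3, 5), [(2, 6, 8)]))
```

The "wait" branch is where the performance cost and the livelock risk mentioned above come from: two processes that both see a flag of 2 may keep polling each other.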


Proposed Solution 2

• Main idea: The process about to be blocked picks who will wake it up and indicates so by writing to shared memory.

• Once processes declare their intentions globally, deadlock can be avoided.

• For there to be deadlock, a dependency cycle must exist.

• The last process to complete this cycle will know about it and must not do so.

Window: one entry per process (P0, P1, ...), each now holding flag, start, end, pick. An idle entry reads 0 -1 -1 -1.
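The cycle check implied by "the last process to complete this cycle will know about it" can be sketched in Python as a walk along the pick fields. The function name `would_close_cycle` and the use of a plain list for the pick column are assumptions made for illustration.

```python
def would_close_cycle(picks, me, candidate):
    """True if process `me` picking `candidate` would create a wait-for cycle.

    picks[i] is the process that i has chosen to wake it up, or -1 if i is
    not blocked. The walk follows picks starting from the candidate.
    """
    seen = set()
    current = candidate
    while current != -1 and current not in seen:
        if current == me:
            return True              # the chain leads back to me
        seen.add(current)
        current = picks[current]
    return False

# P1 is already blocked waiting on P0; P0 now considers waiting on P1.
print(would_close_cycle([-1, 0, -1], me=0, candidate=1))
# Nobody is blocked yet, so P1 may safely pick P0.
print(would_close_cycle([-1, -1, -1], me=1, candidate=0))
```

A process that gets `True` must not block; in the trace for this solution it resets to Stage 1 instead.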

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 1 3 5 -1; P1 = 0 -1 -1 -1; remaining entry = 0 -1 -1 -1

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 1 3 5 -1; P1 = 1 3 5 -1; remaining entry = 0 -1 -1 -1

P1: Deduces Conflict (Stage 1)

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 0 -1 -1 -1; P1 = 1 3 5 -1; remaining entry = 0 -1 -1 -1

P1: Deduces Conflict (Stage 1)

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 1 3 5 -1; P1 = 1 3 5 -1; remaining entry = 0 -1 -1 -1

P0: Deduces Conflict (Stage 1)
P1: Deduces Conflict (Stage 1)

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 1 3 5 -1; P1 = 0 3 5 0; remaining entry = 0 -1 -1 -1

P0: Deduces Conflict (Stage 1)
P1: Deduces Conflict (Stage 2), blocks on Receive

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 1 3 5 -1; P1 = 0 3 5 0; remaining entry = 0 -1 -1 -1

P1: Deduces Conflict (Stage 2), blocks on Receive

No Conflict – Stage 1

Example 3: Demonstration of Lock Acquire/Release Proposed Solution 2

Process 0: lock_acquire(3,5); lock_release(); lock_acquire(3,5)
Process 1: lock_acquire(3,5)

Window (flag start end pick): P0 = 0 3 5 -1; P1 = 0 3 5 0; remaining entry = 0 -1 -1 -1

P0: Deduces Deadlock (Stage 2), resets to Stage 1
P1: Deduces Conflict (Stage 2), blocks on Receive


Discussion and Future Work

“Execution Checking”

In current practice, concrete executions on a few diverse platforms are often used to verify algorithms/codes.

Consequence: Many feasible executions might not be manifested.

“Model Checking”

Model checking forces all executions of a judiciously down-scaled model to be examined.

Current focus of our research: minimize modeling effort and error.


Funding Acknowledgements:

• NSF (CSR–SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis)
• Microsoft (Formal Analysis and Code Generation Support for MPI)
• Office of Science, Department of Energy

Summary

• Paradigms such as one-sided MPI and threading create a plethora of execution possibilities, many of which might be algorithmically fatal yet lie dormant at testing time.

• Model checking provides a formal and practical means of reasoning about all possible executions as part of the design, verification and optimization process.

Closing Question (“Food for Thought”):

• Can one come up with safe usages (i.e., easier to verify yet not overly restrictive) of one-sided communication?