UPC CHECK: A scalable tool for detecting run-time errors in Unified Parallel C
Indranil Roy
High Performance Computing (HPC) group
Segmentation error. Core dumped.

A good error message
Thread 0 encountered invalid arguments in function upc_all_broadcast at line 26 in file /home/jjc/ex1.upc.
Error: Parameter (sizeof(int) * sh_val) passes non-positive value of 0 to nbytes argument
Variable sh_val was declared at line 10 in file /home/jjc/ex1.upc.

Outline
▫ Understanding Unified Parallel C
▫ UPC-CHECK 1.0 tool
  • How does it work?
  • Usability
  • Error coverage and quality of error reports generated
  • Testing
  • Overheads
  • Scalability
  • Known limitations
▫ Challenges in argument error detection
▫ Deadlock detection algorithm
▫ Demo

Understanding Unified Parallel C
• Shared memory model
• Distributed memory model

Understanding Unified Parallel C
• Unified Parallel C
  ▫ Distributed Shared Memory Model or Partitioned Global Address Space Model

UPC-CHECK v1.0
• Source-to-source translator
• Pre-compiler
• Error handling
  ▫ Argument errors
  ▫ Deadlocks

UPC-CHECK: Usability
• Portable
  ▫ Machine independent
  ▫ Compiler independent
• Ease of use
  ▫ Easy to install
    install_UPC-CHECK
  ▫ Easy to run
• Freely available
  wget http://hpcgroup.public.iastate.edu/UPC-CHECK/UPC-CHECK.tar.gz

UPC-CHECK 1.0: Usability
• Usage
  upc-check [compiler options] [--upccheck:flag [--upccheck:flag] ...] -c sourcefile.upc
    -a|-d_argument_check          disables argument checking (enabled by default)
    -d|-d_deadlock_check          disables deadlock checking (enabled by default)
    -s|-e_track_func_call_stack   enables tracing of function call stack (disabled by default)
    -h|--h|-help                  prints help for UPC-CHECK
• Just replace your compile command with upc-check.

Quality of error reports generated
• Coyle, J., Hoekstra, J., Kraeva, M., Luecke, G. R., Kleiman, R., Srinivas, V., Tripathi, A., Weiss, O., Wehe, A., Xu, Y., Yahya, M. (2008). UPC Run-Time Error Detection Test Suite. http://kraeva.public.iastate.edu/rted/UPC.TestPlan.pdf, Iowa State University, High Performance Computing Group.
  ▫ A score of 5 is given for a detailed error message that will assist a programmer to fix the error.
  ▫ A score of 4 is given for error messages with more information than a score of 3 and less than 5. This is tailored for each test.
  ▫ A score of 3 is given for error messages with the correct error name, line number and the name of the file where the error occurred.
  ▫ A score of 2 is given for error messages with the correct error name and line number where the error occurred but not the file name where the error occurred.
  ▫ A score of 1 is given for error messages with the correct error name.
  ▫ A score of 0 is given when the error was not detected.

Run-time environments   Argument errors   Deadlocks
Cray                               0.38        0
Berkeley                           0.04        0.58
HP                                 0           0.36
GNU                                0           0.27
UPC-CHECK                          4.89        5

UPC-CHECK 1.0: Testing
• 400 error test-cases
• 1800 false-positive cases
• Additional testing for deadlocks
• Testing across application programs

UPC-CHECK 1.0: Overhead
• Base memory requirement
  ▫ ~128 KB per thread
  ▫ With every acquired or requested shared memory lock, the requirement goes up by around 256 B
  ▫ While tracking the function call stack, with every level of nested function call, the memory requirement goes up by around 512 B
• Increase of code section
  ▫ ~100 lines of instrumentation per UPC operation
  ▫ ~12000 lines from support files
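The memory figures above can be combined into a rough per-thread estimate. The following is a minimal sketch in plain C, not part of UPC-CHECK: the function name `upc_check_overhead_bytes` and its parameters are hypothetical, 1 KB is assumed to mean 1024 bytes, and the constants are the approximate values quoted on the slide.

```c
#include <stddef.h>

/* Approximate figures quoted on the slide (1 KB assumed to be 1024 bytes). */
enum {
    BASE_BYTES      = 128 * 1024, /* ~128 KB base requirement per thread */
    BYTES_PER_LOCK  = 256,        /* per acquired or requested shared memory lock */
    BYTES_PER_FRAME = 512         /* per nested call level, only with call-stack tracking */
};

/* Hypothetical helper: rough per-thread memory overhead in bytes. */
size_t upc_check_overhead_bytes(size_t locks, size_t call_depth, int track_call_stack)
{
    size_t total = BASE_BYTES + locks * BYTES_PER_LOCK;
    if (track_call_stack)
        total += call_depth * BYTES_PER_FRAME;
    return total;
}
```

For example, a thread with 4 acquired or requested locks and a call depth of 10, with call-stack tracking enabled, would be estimated at 131072 + 1024 + 5120 bytes.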

Efficiency overhead
                      Berkeley UPC                          Cray UPC
Benchmark Original  Instrumented  Overhead    Original  Instrumented  Overhead
CG-S         7.329         7.393     1.009        7.34         7.924     1.080
CG-W         7.554         7.613     1.008       7.576         8.344     1.101
CG-A         8.531         9.378     1.099       8.372         9.133     1.091
CG-B        73.619        74.222     1.008      56.376        63.239     1.122
CG-C        171.36       173.036     1.010     132.997       140.317     1.055
EP-S         8.048         8.581     1.066       5.319         5.307     0.998
EP-W         8.944        10.179     1.138       6.039         6.019     0.997
EP-A         19.71        25.193     1.278      14.755        14.743     0.999
EP-B        57.366        92.385     1.610      44.706        46.567     1.042
EP-C       211.214       349.248     1.654     164.289       163.929     0.998
FT-S         7.529          7.74     1.028        4.97         4.918     0.990
FT-W+        7.651          7.68     1.004       5.135         5.151     1.003
FT-A*+       15.34        14.312     0.933       9.173         9.084     0.990
FT-B*+      83.981        77.339     0.921      50.621        50.613     1.000
FT-C*+                               0.000     200.947       220.111     1.095
IS-S         7.257         7.389     1.018       4.954         4.894     0.988
IS-W         7.409         7.441     1.004       5.099         5.006     0.982
IS-A         8.526         8.435     0.989       5.787         5.799     1.002
IS-B        12.038        12.115     1.006       8.604          8.54     0.993
IS-C*+      25.397         25.69     1.012      19.662        21.655     1.101
MG-S         7.298          7.45     1.021       4.798          4.79     0.998
MG-W         7.631         7.499     0.983       5.239         5.815     1.110
MG-A*+       11.32         10.73     0.948       6.979        12.038     1.725
MG-B*+      19.083          19.1     1.001      11.718         16.88     1.441
MG-C*+     118.651       118.965     1.003       68.33       107.597     1.575
Maximum                              1.654                               1.725
Average                              1.073                               1.099
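The Overhead column is consistent with the instrumented time divided by the original time (for example, CG-S on Berkeley UPC: 7.393 / 7.329 ≈ 1.009). A minimal sketch of that arithmetic in C, where `overhead_ratio` is a hypothetical helper name and not part of the tool:

```c
/* Hypothetical helper: overhead as instrumented time over original time.
 * A value below 1.0 means the instrumented run happened to be faster. */
double overhead_ratio(double original, double instrumented)
{
    return instrumented / original;
}
```

Under this reading, a value such as 1.654 for EP-C on Berkeley UPC corresponds to the instrumented run taking about 65% longer than the original.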

UPC-CHECK 1.0: Scalability
Benchmark Original  Instrumented  Slowdown
CG-S         4.742         4.942     1.042
CG-W        15.664        15.708     1.003
CG-A         4.912          4.99     1.016
CG-B        54.183        54.239     1.001
CG-C        58.309        58.281     1.000
EP-S         1.145         1.145     1.000
EP-W         6.247         6.243     0.999
EP-A         1.417         1.427     1.007
EP-B         7.116         7.128     1.002
EP-C         11.19         11.17     0.998
FT-S           DNR           DNR     0.000
FT-W           DNR           DNR     0.000
FT-A           DNR           DNR     0.000
FT-B        15.528        15.556     1.002
FT-C*       22.855        22.735     0.995
IS-S         3.541         3.594     1.015
IS-W        10.422        10.961     1.052
IS-A          3.56         3.658     1.028
IS-B         8.752         8.776     1.003
IS-C        10.089        10.073     0.998
MG-S           DNR           DNR     0.000
MG-W         8.288         8.308     1.002
MG-A           DNR           DNR     0.000
MG-B         9.293         9.341     1.005
MG-C        13.551        13.579     1.002
Maximum                              1.052
Average                              1.008
• CROW cluster
• Cray compiler
• Cray run-time environment
• 128 threads

UPC-CHECK v1.0: Known limitations
• UPC-CHECK will not test the single-valued requirement of upc_forall statements.
• Since UPC-CHECK works on UPC source programs, it will be unable to handle any deadlocks which are created in a library that a user might be using.
• UPC-CHECK should not be used for programs where the 'main' function lies within a header file
  ▫ Best effort will be made, but may lead to memory leaks at end of execution.

Challenges in checking argument errors
• Engineering challenges
  ▫ Exhaustiveness
  ▫ Argument checks against multiple functions
  ▫ Handling vector arguments
  ▫ Dependency of one argument on another argument
  ▫ Data structures used
  ▫ Displaying the errors

A novel Deadlock Detection Algorithm
• Dynamic
• Optimal
  ▫ O(1) for deadlocks created by collective routines
  ▫ O(n) for deadlocks created by locks
• Distributed
• Scalable

A few more terms: "collective" operations
• “Collective” is a constraint placed on some language operations which requires evaluation of such operations to be matched across all threads. The behavior of collective operations is undefined unless all threads execute the same sequence of collective operations.
• “Single valued” refers to an operand to a collective operation, which has the same value on every thread. The behavior of the operation is otherwise undefined.

Central idea
• The collective requirement simply states a relative ordering property of calls to collective operations that must be maintained in the parallel execution trace for all executions of any legal program.
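The ordering property can be illustrated with an offline check over recorded traces. This is a sequential C sketch, not the run-time algorithm used by UPC-CHECK: the function name `first_collective_mismatch`, the flat `trace` layout, and the use of integer ids for collective operations are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical model: trace[t * len + k] is the id of the k-th collective
 * operation called by thread t. Returns the index of the first call at which
 * some thread disagrees with thread 0, or -1 if every thread executed the
 * same sequence of collective operations. */
long first_collective_mismatch(const int *trace, size_t threads, size_t len)
{
    for (size_t k = 0; k < len; k++)
        for (size_t t = 1; t < threads; t++)
            if (trace[t * len + k] != trace[k])
                return (long)k;
    return -1;
}
```

A program whose threads all call, say, operations 1, 2, 3 in that order passes the check; if one thread calls 1, 3, 2 instead, the check reports position 1 as the first point where the collective requirement is violated.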

[Figure: axes are threads and time]

Deadlocks in UPC
1. Not all threads are waiting at the same collective routine
[Figure: threads 0, 1, 2, …, i, …, j, …, T-2, T-1 against time]

2. Some threads are waiting at the same collective routine when at least one of the threads has reached end-of-execution
[Figure: threads 0, 1, 2, …, i, …, j, …, T-2, T-1 against time, with end-of-execution marked]
3. One of the threads at a collective routine is holding a lock that at least one of the threads is trying to acquire.
[Figure: threads 0, 1, 2, …, i, …, j, …, T-2, T-1 against time]

5. Circular dependency for acquiring locks amongst threads
[Figure: threads 0, 1, 2, …, i, …, j, …, T-2, T-1 against time]
Definition: A thread i is dependent on another thread j if the thread i is trying to acquire a lock held by thread j
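Using this definition, a circular dependency can be found by following the "is dependent on" relation from a thread until it either ends or returns to the starting thread. The sketch below is plain sequential C under stated assumptions: each thread waits for at most one lock at a time, and the array `waits_for`, the constant `NOT_WAITING`, and the function name are hypothetical, not taken from UPC-CHECK.

```c
#include <stddef.h>

#define NOT_WAITING (-1)

/* waits_for[i] is the thread holding the lock that thread i is trying to
 * acquire, or NOT_WAITING. Returns 1 if following the dependencies from
 * `start` leads back to `start` (a circular dependency), else 0. */
int has_circular_dependency(const int *waits_for, size_t threads, int start)
{
    int cur = waits_for[start];
    /* A chain without repeats visits at most `threads` threads, so the
     * walk is bounded even if it enters a cycle that excludes `start`. */
    for (size_t steps = 0; steps < threads && cur != NOT_WAITING; steps++) {
        if (cur == start)
            return 1;
        cur = waits_for[cur];
    }
    return 0;
}
```

The walk takes at most as many steps as there are threads, which is in line with the O(n) bound the slides give for deadlocks created by locks.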

6. Chain of dependency for acquiring locks leads to a thread which is waiting at a collective routine.
[Figure: threads 0, 1, 2, …, i, …, j, …, T-2, T-1 against time]

6. Chain of dependency for acquiring locks leads to a thread which has reached end-of-execution.
[Figure: threads 0, 1, 2, …, i, …, j, …, T-2, T-1 against time, with end-of-execution marked]

Algorithm: Get all the threads in the picture
[Figure: threads 0, 1, 2, 3, …, i-1, i, i+1, i+2, …, j, …, T-3, T-2, T-1]

Validation method: A basic block
[Figure, two panels: threads i-1 and i against time, with a point labelled R]

Implementation: Algorithm 1
shared [1] deadlock_ctxt_t unified_deadlock_ctxt[THREADS];
[Figure, six animation frames: the state and desired_state fields of unified_deadlock_ctxt for threads 0, 1, …, i-1, i, i+1, …, T-1, with entries marked nk or un changing from frame to frame]

Atomicity and serialization of status checks
• One centralized lock solution
  ▫ Efficiency hit: complete serialization
• Decentralized lock solution: one lock per thread
  ▫ shared [1] upc_lock_t upc_check_deadlock_detection_lock[THREADS];
[Figure: threads 0, 1, 2, …, i, i+1, …, T-3, T-2, T-1]

Avoiding deadlocks created by the checks
[Figure: threads 0, 1, 2, …, i, i+1, …, T-3, T-2, T-1]

Scheme 1 of acquiring locks
[Figure: threads 0, 1, 2, …, i, i+1, …, T-3, T-2, T-1; legend: first lock acquired, second lock acquired]
Even thread: lock[i] then lock[(i+1) % THREADS]
Odd thread: lock[(i+1) % THREADS] then lock[i]
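The acquisition order of Scheme 1 can be written as a small function. This is a sketch in plain C that only computes the lock indices; the type `lock_order_t` and the function `scheme1_lock_order` are hypothetical names. In a UPC program the two indices would presumably select entries of the per-thread lock array, acquired in the order returned.

```c
#include <stddef.h>

/* Indices of the two locks thread i needs, in acquisition order. */
typedef struct {
    size_t first;
    size_t second;
} lock_order_t;

/* Scheme 1: even threads take their own lock first, then their right
 * neighbour's; odd threads take the neighbour's lock first, then their own. */
lock_order_t scheme1_lock_order(size_t i, size_t threads)
{
    size_t own = i;
    size_t next = (i + 1) % threads;
    lock_order_t order;
    if (i % 2 == 0) {
        order.first = own;
        order.second = next;
    } else {
        order.first = next;
        order.second = own;
    }
    return order;
}
```

With 8 threads, thread 2 acquires lock 2 then lock 3, thread 3 acquires lock 4 then lock 3, and thread 7 wraps around to acquire lock 0 then lock 7.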

Scheme 1: Maximum latency of acquiring locks for even number of threads
[Figure: threads i-2, i-1, i, i+1 with lock acquisition order 1 2, 2 1, 1 2, 2 1. Longest dependency chains when i is even]
[Figure: threads i-1, i, i+1, i+2 with lock acquisition order 1 2, 2 1, 1 2, 2 1. Longest dependency chains when i is odd]
Maximum latency is 3, that is, O(1)

Maximum latency: when the total number of threads is odd
Maximum latency is 4, that is, O(1)

Efficiency
• The number of threads for which any thread has to wait before entering its critical section is O(1).
• The number of remote memory accesses is O(1), as any thread i only accesses memory related to the state of thread i and thread (i+1) % THREADS.
• Optimal!

When a thread reaches a upc_lock
• Track requested locks and acquired locks
• Look out for cyclical hold-and-wait conditions
• Look out for chains of hold-and-wait conditions which lead to a thread blocked at a collective routine
  ▫ If a thread has reached a collective routine, check if there is a request for a lock that the thread is holding
• Look out for chains of hold-and-wait conditions which lead to a thread which has reached end-of-execution
  ▫ If a thread is exiting without freeing all locks held by it, then check if there is a request for a lock that the thread is holding
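The two sub-checks above (at a collective routine and at exit) share one question: is another thread requesting a lock that this thread still holds? A minimal sequential C sketch of that check follows. The bookkeeping arrays `holder` and `requester`, the constant `NO_THREAD`, and the function name are hypothetical, and the model is simplified to at most one recorded requester per lock.

```c
#include <stddef.h>

#define NO_THREAD (-1)

/* Hypothetical bookkeeping: holder[l] is the thread holding lock l and
 * requester[l] is a thread blocked requesting lock l (NO_THREAD if none).
 * Returns the first lock held by `me` that another thread is requesting,
 * or -1 if there is no such lock. Intended to be consulted when `me`
 * reaches a collective routine or end-of-execution. */
long requested_lock_held_by(const int *holder, const int *requester,
                            size_t locks, int me)
{
    for (size_t l = 0; l < locks; l++)
        if (holder[l] == me && requester[l] != NO_THREAD && requester[l] != me)
            return (long)l;
    return -1;
}
```

A non-negative result at a collective routine or at exit would indicate one of the hold-and-wait chains described above, since the requesting thread can no longer be served.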

Papers
1. Coyle, J., Hoekstra, J., Kraeva, M., Luecke, G. R., Kleiman, R., Roy, I. (2009). UPC Compile-Time Error Detection Test Suite. http://kraeva.public.iastate.edu/rted/UPCct.TestPlan.pdf, Iowa State University High Performance Computing Group.
2. Roy, I., Luecke, G. R., Coyle, J., Kraeva, M., Hoekstra, J. (2011). UPC-CHECK: A run-time error detection tool for programs written in UPC. Preprint.
3. Roy, I., Luecke, G. R., Coyle, J., Kraeva, M., Hoekstra, J. (2011). An O(1) algorithm to detect deadlocks in collective routines in the distributed shared memory model. Preprint.
Thank you