SYNCHRONIZATION
  Spinlocks and all the rest
Synchronization
  Overview
    Cache coherency
    Single versus Multi-core
    Under versus Oversubscribed
    Atomic operations
    …
Synchronization
  Overview
    Spinlock
      acquire_lock(lock)
      {
          while (TAS(lock) == true)
              ;
      }
    TAS (test and set)
      Atomically puts true in the address and returns the old value
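
The spinlock above can be sketched in C11, where `atomic_flag_test_and_set` plays the role of TAS. This is a minimal sketch, not the slides' own implementation; the names `tas_lock`, `tas_init`, `tas_acquire`, and `tas_release` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type: a single flag that is set while the lock is held. */
typedef struct {
    atomic_flag held;
} tas_lock;

static void tas_init(tas_lock *l)
{
    atomic_flag_clear(&l->held);
}

static void tas_acquire(tas_lock *l)
{
    /* test-and-set writes "set" and returns the previous value;
       keep spinning while the previous value was already "set". */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every waiter repeatedly writes to the same location, which is the source of the contention discussed on the later slides.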
Synchronization
  Mellor-Crummey, Scott 1991
    Analyzed spinlocks and barriers
      ○ Linear, Proportional, Exponential Backoff
      ○ Ticket locks -> “now serving”
    Proposed the “MCS” lock, a queue-based lock
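
The "now serving" idea behind a ticket lock can be sketched as follows. This is an illustrative C11 version with no backoff; the names `ticket_lock`, `ticket_init`, `ticket_acquire`, and `ticket_release` are assumptions made for the example.

```c
#include <stdatomic.h>

/* Hypothetical ticket lock: take a number, wait until it is served. */
typedef struct {
    atomic_uint next_ticket;  /* next number to hand out */
    atomic_uint now_serving;  /* number currently allowed in */
} ticket_lock;

static void ticket_init(ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Returns the ticket that was drawn (returned only to make the order visible). */
static unsigned ticket_acquire(ticket_lock *l)
{
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ;  /* spin until our number comes up */
    return my;
}

static void ticket_release(ticket_lock *l)
{
    /* Only the holder writes now_serving, so a plain load + store suffices. */
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}
```

Waiters are served in FIFO order, but they all still spin on the shared `now_serving` word; a proportional backoff could pause in proportion to the distance between a waiter's ticket and `now_serving`.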
Overview
  Synchronization
  Types to be Discussed
  Further Developments
  Implementation Details
Types to be Discussed
  Mutual Exclusion
    Spinlock
    Mutex
    Reader Writer Lock
  Execution Point
    Barrier
  Queues, etc. (time permitting)
Spinlocks
  Spin until lock is acquired
  Simple implementation
  Contention on lock
Queued Spinlock
  Create a local lock
    Spin on it
    On release, signal next waiter
  Additional operations
  Reduced contention
Mutex
  Wait to acquire
  May use thread scheduler to wait
Reader Writer Lock
  Readers can operate simultaneously with other readers
  Only writers cause problems
  Often spinlock plus count of readers
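
A reader-writer lock built from "spinlock plus count of readers" might look like the sketch below. It is a simplified illustration under stated assumptions: the names (`rw_lock`, `rw_try_read`, and so on) are hypothetical, and nothing here prevents a stream of readers from starving a writer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: a small spinlock guarding a reader count
   and a writer flag. */
typedef struct {
    atomic_flag guard;  /* protects the two fields below */
    int readers;        /* number of active readers */
    bool writer;        /* true while a writer holds the lock */
} rw_lock;

static void rw_init(rw_lock *l)
{
    atomic_flag_clear(&l->guard);
    l->readers = 0;
    l->writer = false;
}

static void guard_lock(rw_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->guard, memory_order_acquire))
        ;
}

static void guard_unlock(rw_lock *l)
{
    atomic_flag_clear_explicit(&l->guard, memory_order_release);
}

/* A reader may enter whenever no writer is active. */
static bool rw_try_read(rw_lock *l)
{
    guard_lock(l);
    bool ok = !l->writer;
    if (ok)
        l->readers++;
    guard_unlock(l);
    return ok;
}

/* A writer needs the lock with no readers and no other writer. */
static bool rw_try_write(rw_lock *l)
{
    guard_lock(l);
    bool ok = !l->writer && l->readers == 0;
    if (ok)
        l->writer = true;
    guard_unlock(l);
    return ok;
}

static void rw_read_lock(rw_lock *l)  { while (!rw_try_read(l)) ; }
static void rw_write_lock(rw_lock *l) { while (!rw_try_write(l)) ; }

static void rw_read_unlock(rw_lock *l)
{
    guard_lock(l);
    l->readers--;
    guard_unlock(l);
}

static void rw_write_unlock(rw_lock *l)
{
    guard_lock(l);
    l->writer = false;
    guard_unlock(l);
}
```

Every reader still touches the shared guard and count, which is the scalability problem the later reader-writer slides address.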
Barrier
  Keep a group of threads in “sync”
  Barrier has to recognize two events
    Old barrier, as some threads may not be active
    New barrier, as threads may have reached it
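
One common way to tell the "old" barrier episode from the "new" one is sense reversal, a technique analyzed by Mellor-Crummey and Scott. The slides do not say which scheme they have in mind, so the following is only a sketch of that technique; `sr_barrier`, `sr_init`, and `sr_wait` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical centralized sense-reversing barrier. */
typedef struct {
    atomic_int remaining;  /* threads still expected in this episode */
    atomic_bool sense;     /* flips once per completed episode */
    int nthreads;
} sr_barrier;

static void sr_init(sr_barrier *b, int nthreads)
{
    atomic_init(&b->remaining, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread keeps its own local_sense, initially false. */
static void sr_wait(sr_barrier *b, bool *local_sense)
{
    *local_sense = !*local_sense;  /* the value that marks "this episode done" */
    if (atomic_fetch_sub_explicit(&b->remaining, 1,
                                  memory_order_acq_rel) == 1) {
        /* Last arriver: re-arm the count, then release everyone. */
        atomic_store_explicit(&b->remaining, b->nthreads,
                              memory_order_relaxed);
        atomic_store_explicit(&b->sense, *local_sense, memory_order_release);
    } else {
        while (atomic_load_explicit(&b->sense, memory_order_acquire)
               != *local_sense)
            ;  /* spin until the last arriver flips the shared sense */
    }
}
```

Because a thread waits for the shared sense to equal its own flipped value, a thread that races ahead into the next episode cannot be confused with one still leaving the previous episode.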
Further Developments
Scalable RW Lock
  Modification to MCS lock
    Count of readers + writer-waiting flag
    Queue of waiting threads
    Readers unblock readers on acquire
    Writers unblock next thread on release
  John M. Mellor-Crummey and Michael L. Scott. 1991. Scalable reader-writer synchronization for shared-memory multiprocessors. In Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming (PPOPP '91). ACM, New York, NY, USA, 106-113.
Scalable RW Lock cont.
  Split up the reader access
    Since readers can hold the lock alongside other readers, use multiple locks
    Writers, however, need all of the reader locks
  Wilson C. Hsieh and William E. Weihl. 1992. Scalable Reader-Writer Locks for Parallel Systems. In Proceedings of the 6th International Parallel Processing Symposium, Viktor K. Prasanna and Larry H. Canter (Eds.). IEEE Computer Society, Washington, DC, USA, 656-659.
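
The "split up the reader access" idea can be illustrated with one small lock per reader slot. This is a loose sketch of the general idea, not the algorithm from the cited paper: it assumes each reader thread owns exactly one slot, and the names `split_rw_lock`, `NSLOTS`, and the functions below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4  /* assumed number of reader slots, one per reader thread */

typedef struct {
    atomic_flag slot[NSLOTS];
} split_rw_lock;

static void split_init(split_rw_lock *l)
{
    for (int i = 0; i < NSLOTS; i++)
        atomic_flag_clear(&l->slot[i]);
}

/* A reader only touches its own slot, so readers do not contend
   with each other. */
static void split_read_lock(split_rw_lock *l, int id)
{
    while (atomic_flag_test_and_set_explicit(&l->slot[id],
                                             memory_order_acquire))
        ;
}

static void split_read_unlock(split_rw_lock *l, int id)
{
    atomic_flag_clear_explicit(&l->slot[id], memory_order_release);
}

/* A writer must collect every slot; taking them in index order keeps
   two writers from deadlocking against each other. */
static void split_write_lock(split_rw_lock *l)
{
    for (int i = 0; i < NSLOTS; i++)
        while (atomic_flag_test_and_set_explicit(&l->slot[i],
                                                 memory_order_acquire))
            ;
}

static void split_write_unlock(split_rw_lock *l)
{
    for (int i = 0; i < NSLOTS; i++)
        atomic_flag_clear_explicit(&l->slot[i], memory_order_release);
}
```

The trade-off is visible in the code: read acquisition is a single uncontended operation, while write acquisition costs one operation per slot.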
Scalable RW Lock cont.
  Or use a C-SNZI
    Closable scalable nonzero indicator
    Like a semaphore, but can be “closed”
  What about write upgrade?
  Yossi Lev, Victor Luchangco, and Marek Olszewski. 2009. Scalable reader-writer locks. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures (SPAA '09). ACM, New York, NY, USA, 101-110.
Biased Locks
  First and second class “citizens”
    Like readers / writers, but all exclusive
  Secondary locks request the lock
    Primary holder grants them the lock
  Nalini Vasudevan, Kedar S. Namjoshi, and Stephen A. Edwards. 2010. Simple and fast biased locks. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10). ACM, New York, NY, USA, 65-74.
MCS Extensions
  Queue-based locks
    What if threads are preempted?
  Add a time component to the lock
    Stale elements are skipped
  Michael L. Scott and William N. Scherer. 2001. Scalable queue-based spin locks with timeout. In Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming (PPoPP '01). ACM, New York, NY, USA, 44-52.
  B. He, W. N. Scherer III, and M. L. Scott. “Preemption Adaptivity in Time-Published Queue-Based Spin Locks,” 11th Intl. Conf. on High Performance Computing, Goa, India, Dec. 2005.
Spinning vs Blocking
  Spinning = busy-waiting
  Blocking = thread scheduling
  What is the trade-off between the two schemes?
    Tested Solaris pthread implementation that does both
  Ryan Johnson, Manos Athanassoulis, Radu Stoica, and Anastasia Ailamaki. 2009. A new look at the roles of spinning and blocking. In Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN '09). ACM, New York, NY, USA, 21-26.
Trees, etc
  Barriers
    Lots of threads all signaling a single count
    Sounds bad
  Signal and wakeup trees, with different degrees
Hardware Supported Barriers
  Introduce dedicated on-chip connections
    Single centralized controller
    Transmission lines
  Jungju Oh, Milos Prvulovic, and Alenka Zajic. 2011. TLSync: support for multiple fast barriers using on-chip transmission lines. In Proceeding of the 38th annual international symposium on Computer architecture (ISCA '11). ACM, New York, NY, USA, 105-116.
Implementation Details
Architectural Primitives
  Compare and Swap(mem, old, new)
    If (*mem == old) *mem = new
    Return what was in mem
  LL/SC
    LL – load value
    SC to same address succeeds only if data unmodified
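
C11 does not expose a compare-and-swap that returns the old value directly, but one can be wrapped around `atomic_compare_exchange_strong`. The sketch below assumes an `int` payload; `cas_int` and `cas_increment` are hypothetical helper names, not part of any library.

```c
#include <stdatomic.h>

/* Compare-and-swap with the slide's semantics:
   if (*mem == old) *mem = new_val; return what was in mem. */
static int cas_int(atomic_int *mem, int old, int new_val)
{
    int expected = old;
    /* On failure, C11 writes the value actually found into `expected`;
       on success, `expected` still equals `old`. */
    atomic_compare_exchange_strong(mem, &expected, new_val);
    return expected;
}

/* Typical use: retry until our update is based on a current value.
   Returns the value this call installed. */
static int cas_increment(atomic_int *mem)
{
    int seen = atomic_load(mem);
    for (;;) {
        int prev = cas_int(mem, seen, seen + 1);
        if (prev == seen)
            return seen + 1;  /* our swap was installed */
        seen = prev;          /* someone else changed it; retry */
    }
}
```

On architectures that provide LL/SC instead of a CAS instruction, compilers typically build these operations from an LL/SC loop; `atomic_compare_exchange_weak` is allowed to fail spuriously for that reason.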
Test and Test-and-Set
  Synchronization instructions are expensive
    So don’t do them until likely to succeed
  Test the lock, then test-and-set the lock
  Caveat emptor
    Can lead to races if used incorrectly
    Can save time like TryToAcquire rather than release
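
The "test, then test-and-set" pattern can be sketched in C11 as below. The plain load is the cheap test that spins in the local cache; the atomic exchange is the expensive step and is attempted only when the lock looks free. The names `ttas_lock`, `ttas_try_acquire`, `ttas_acquire`, and `ttas_release` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool held;
} ttas_lock;

static void ttas_init(ttas_lock *l)
{
    atomic_init(&l->held, false);
}

/* One attempt: give up immediately if the lock looks taken. */
static bool ttas_try_acquire(ttas_lock *l)
{
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;  /* cheap test failed, skip the atomic write */
    /* The lock may have been taken since the test, so the exchange
       result is what actually decides ownership. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void ttas_acquire(ttas_lock *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;  /* read-only spin */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
    }
}

static void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The race the slide warns about shows up if the result of the first test is trusted on its own: only the outcome of the atomic exchange proves that the caller owns the lock.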
Queued Spinlock Details
    void acquire_queued_spinlock(void* lock, entry* me)
    {
        me->next = NULL;
        me->state = UNLOCKED;
        entry* prev = atomic_swap(lock, me);
        if (prev == NULL) return;
        me->state = LOCKED;
        prev->next = me;
        while (me->state == LOCKED);
    }
Queued Spinlock Details cont.
    void release_queued_spinlock(void* lock, entry* me)
    {
        while (me->next == NULL)
        {
            if (me == CAS(lock, me, NULL)) return;
        }
        me->next->state = UNLOCKED;
    }
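
The queued spinlock pseudocode can be written against C11 atomics roughly as follows. This is a sketch in the spirit of the MCS lock, not a verified reproduction of the slides' code: the type and function names (`mcs_node`, `mcs_lock`, `mcs_acquire`, `mcs_release`) and the chosen memory orderings are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Each waiter brings its own queue entry and spins only on that entry. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

/* The lock itself is just the tail pointer; NULL means free. */
typedef _Atomic(mcs_node *) mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Swap ourselves in as the new tail. */
    mcs_node *prev = atomic_exchange_explicit(lock, me, memory_order_acq_rel);
    if (prev == NULL)
        return;  /* queue was empty: we hold the lock */
    /* Link behind the previous tail, then spin on our own flag. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_release(mcs_lock *lock, mcs_node *me)
{
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to NULL. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                lock, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A waiter swapped itself in but has not linked yet; wait for it. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

A caller would typically keep one `mcs_node` per thread, for example on the stack around the critical section, and pass the same node to both calls.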
Bibliography
Dave Dice, Virendra J. Marathe, and Nir Shavit. 2011. Flat-combining NUMA locks. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures (SPAA '11). ACM, New York, NY, USA, 65-74.
B. He, W. N. Scherer III, and M. L. Scott. “Preemption Adaptivity in Time-Published Queue-Based Spin Locks,” 11th Intl. Conf. on High Performance Computing, Goa, India, Dec. 2005.
Wilson C. Hsieh and William E. Weihl. 1992. Scalable Reader-Writer Locks for Parallel Systems. In Proceedings of the 6th International Parallel Processing Symposium, Viktor K. Prasanna and Larry H. Canter (Eds.). IEEE Computer Society, Washington, DC, USA, 656-659.
Ryan Johnson, Manos Athanassoulis, Radu Stoica, and Anastasia Ailamaki. 2009. A new look at the roles of spinning and blocking. In Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN '09). ACM, New York, NY, USA, 21-26.
Yossi Lev, Victor Luchangco, and Marek Olszewski. 2009. Scalable reader-writer locks. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures (SPAA '09). ACM, New York, NY, USA, 101-110.
Peter S. Magnusson, Anders Landin, and Erik Hagersten. 1994. Queue Locks on Cache Coherent Multiprocessors. In Proceedings of the 8th International Symposium on Parallel Processing, Howard Jay Siegel (Ed.). IEEE Computer Society, Washington, DC, USA, 165-171.
Bibliography cont.
John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (February 1991), 21-65.
John M. Mellor-Crummey and Michael L. Scott. 1991. Scalable reader-writer synchronization for shared-memory multiprocessors. In Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming (PPOPP '91). ACM, New York, NY, USA, 106-113.
Jungju Oh, Milos Prvulovic, and Alenka Zajic. 2011. TLSync: support for multiple fast barriers using on-chip transmission lines. In Proceeding of the 38th annual international symposium on Computer architecture (ISCA '11). ACM, New York, NY, USA, 105-116.
Michael L. Scott and William N. Scherer. 2001. Scalable queue-based spin locks with timeout. In Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming (PPoPP '01). ACM, New York, NY, USA, 44-52.
Nalini Vasudevan, Kedar S. Namjoshi, and Stephen A. Edwards. 2010. Simple and fast biased locks. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10). ACM, New York, NY, USA, 65-74.
Lock free list
  Store head pointer
  Atomic update head
    /* CAS(mem, old, new) returns what was in mem, as defined earlier */
    void push(node** head, node* n)
    {
        node* old;
        node* now = *head;
        do {
            old = now;
            n->next = old;
        } while ((now = CAS(head, old, n)) != old);
    }
“ABA” Problem
  Push C // pending
    Pop A
    Pop B
    Push A
  // Does Push C complete successfully now?
“ABA” Problem cont.
  Pop A // pending
    Pop A
    Pop B
    Push A
  Does Pop A succeed?
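
The pending-pop scenario can be replayed deterministically in a single thread. The sketch below assumes a stack that initially holds A -> B -> C (the slides do not state the initial contents) and uses hypothetical names throughout; the "other thread" is simulated by ordinary calls placed between the stalled pop's read and its CAS.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node;

typedef _Atomic(node *) stack_head;

static void push(stack_head *head, node *n)
{
    node *old = atomic_load(head);
    do {
        n->next = old;  /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(head, &old, n));
}

static node *pop(stack_head *head)
{
    node *old = atomic_load(head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(head, &old, old->next))
        ;
    return old;
}

/* Returns the head of the stack after the stalled pop completes. */
static node *aba_demo(node *a, node *b, node *c)
{
    stack_head head;
    atomic_init(&head, NULL);
    push(&head, c);
    push(&head, b);
    push(&head, a);                        /* stack: A -> B -> C */

    /* Stalled "Pop A": it has read head and head->next, then pauses. */
    node *seen_head = atomic_load(&head);  /* A */
    node *seen_next = seen_head->next;     /* B */

    /* Meanwhile, another thread runs Pop A, Pop B, Push A. */
    pop(&head);
    pop(&head);
    push(&head, a);                        /* stack: A -> C, B is gone */

    /* The stalled pop resumes. Head is A again, so its CAS succeeds
       even though the stack changed underneath it. */
    node *expected = seen_head;
    atomic_compare_exchange_strong(&head, &expected, seen_next);
    return atomic_load(&head);
}
```

In this replay the pending pop does succeed, and the head ends up pointing at B, a node that was already popped, while C is no longer reachable. The CAS compares only the pointer value, so it cannot tell that A was removed and re-inserted in between.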