Anshul Kumar, CSE IITD CSL718 : Multiprocessors Synchronization, Memory Consistency 17th April,...
Anshul Kumar, CSE IITD
CSL718 : Multiprocessors
Synchronization, Memory Consistency
17th April, 2006
Synchronization Problem
• Processes run on different processors independently
• At some point they need to know each other's status for
  – communication
  – mutual exclusion, etc.
• A hardware primitive for atomic read+write is required (e.g. test&set, exchange, fetch&increment)
Spin Lock with Exchange Instr.
Lock: 0 indicates free and 1 indicates locked
Code to lock X:
        r2 ← 1
lockit: r2 ↔ X            ;atomic exchange
        if(r2≠0)lockit    ;already locked
Locks are cached for efficiency; coherence is used
Better code to lock X:
lockit: r2 ← X            ;read lock
        if(r2≠0)lockit    ;not available
        r2 ← 1
        r2 ↔ X            ;atomic exchange
        if(r2≠0)lockit    ;already locked
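The "better code" above, which spins on an ordinary read and attempts the atomic exchange only once the lock looks free, can be sketched with C11 atomics. This is an illustrative sketch, not the lecture's code: the type `xchg_lock` and the function names are hypothetical, and the memory orders are one reasonable choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = locked, as on the slide. */
typedef struct { atomic_int x; } xchg_lock;

void xchg_lock_init(xchg_lock *l) { atomic_init(&l->x, 0); }

/* One atomic exchange; true if the lock was free and is now held by us. */
bool xchg_lock_try(xchg_lock *l) {
    return atomic_exchange_explicit(&l->x, 1, memory_order_acquire) == 0;
}

void xchg_lock_acquire(xchg_lock *l) {
    for (;;) {
        /* Read-only spin: while the lock is held, these loads can be
           served from the local cached copy. */
        while (atomic_load_explicit(&l->x, memory_order_relaxed) != 0) { }
        /* The lock looked free: now try the atomic exchange. */
        if (xchg_lock_try(l)) return;
    }
}

void xchg_lock_release(xchg_lock *l) {
    /* Releasing a lock that is not held indicates a caller bug. */
    assert(atomic_load_explicit(&l->x, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->x, 0, memory_order_release);
}
```

The read-only inner loop is what keeps waiting processors from issuing a write on every iteration, which is the point of the second version on the slide.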
LD Locked & ST conditional
Simpler to implement
• atomic exchange using LL and SC
try:    r3 ← r2           ;move exchange value
        LL r1, X          ;load locked
        SC r3, X          ;store conditional
        if(r3=0)try       ;branch if store fails
        r2 ← r1           ;put loaded value in r2
• fetch&increment using LL and SC
try:    LL r1, X          ;load locked
        r3 ← r1 + 1       ;increment
        SC r3, X          ;store conditional
        if(r3=0)try       ;branch if store fails
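C has no direct LL/SC operations, so the following sketch uses a C11 compare-and-swap retry loop as a portable stand-in for the same retry structure. The function names `fetch_and_increment` and `swap_value` are hypothetical. One difference to keep in mind: compare-and-swap checks only that the value is unchanged, whereas a store conditional fails on any intervening write to the location.

```c
#include <stdatomic.h>

/* Stores old + 1 into *x and returns the old value, retrying on failure. */
int fetch_and_increment(atomic_int *x) {
    /* Plays the role of LL: read the current value. */
    int old = atomic_load_explicit(x, memory_order_relaxed);
    /* Plays the role of SC: on failure `old` is refreshed and we retry.
       The weak form may fail spuriously, much like a store conditional. */
    while (!atomic_compare_exchange_weak_explicit(
               x, &old, old + 1,
               memory_order_acq_rel, memory_order_relaxed)) { }
    return old;
}

/* Stores new_value into *x and returns the value it replaced. */
int swap_value(atomic_int *x, int new_value) {
    int old = atomic_load_explicit(x, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               x, &old, new_value,
               memory_order_acq_rel, memory_order_relaxed)) { }
    return old;
}
```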
Spin Lock with LL & SC
lockit: LL r2, X          ;load locked
        if(r2≠0)lockit    ;not available
        r2 ← 1
        SC r2, X          ;store cond
        if(r2=0)lockit    ;branch if store fails
Spin lock with exponential back-off reduces contention
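A minimal sketch of exponential back-off, assuming an exchange-based lock rather than LL/SC: after each failed attempt the processor waits, and the wait doubles up to a cap. The names and the constants `BACKOFF_MIN` and `BACKOFF_MAX` are assumptions chosen for illustration, not values from the lecture.

```c
#include <stdatomic.h>

enum { BACKOFF_MIN = 1, BACKOFF_MAX = 1024 };  /* assumed tuning values */

/* Doubles the delay until it reaches the cap. */
unsigned next_delay(unsigned delay) {
    return delay < BACKOFF_MAX ? delay * 2 : BACKOFF_MAX;
}

/* Acquires the lock (0 = free, 1 = locked) and returns the number of
   failed attempts, which is reported only for illustration. */
unsigned backoff_lock(atomic_int *x) {
    unsigned delay = BACKOFF_MIN;
    unsigned failures = 0;
    while (atomic_exchange_explicit(x, 1, memory_order_acquire) != 0) {
        failures++;
        /* Busy-wait for `delay` iterations before trying again. */
        for (volatile unsigned i = 0; i < delay; i++) { }
        delay = next_delay(delay);
    }
    return failures;
}

void backoff_unlock(atomic_int *x) {
    atomic_store_explicit(x, 0, memory_order_release);
}
```

Spacing out the retries means fewer processors hit the lock variable at the same moment when it is released.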
Barrier Synchronization
lock(X)
if(count=0) release ← 0
count++
unlock(X)
if(count=total) {count ← 0; release ← 1}
else spin(release=1)
Improved Barrier Synch.
local_sense ← !local_sense
lock(X)
count++
unlock(X)
if(count = total) {count ← 0; release ← local_sense}
else {spin(release = local_sense)}
Tree-based barrier reduces contention
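The sense-reversing barrier above might look like this in C11. The sketch deviates from the slide in one labeled way: an atomic fetch-and-add replaces the lock/count++/unlock sequence, so each thread gets its own arrival number. The type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int  count;    /* threads that have arrived in this episode */
    atomic_bool release;  /* shared sense flag */
    int         total;    /* number of participating threads */
} sense_barrier;

void sense_barrier_init(sense_barrier *b, int total) {
    atomic_init(&b->count, 0);
    atomic_init(&b->release, false);
    b->total = total;
}

/* `local_sense` is private to each thread and must start as false. */
void sense_barrier_wait(sense_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;
    /* Assumption: fetch-and-add stands in for lock(X); count++; unlock(X). */
    int arrived = atomic_fetch_add_explicit(&b->count, 1,
                                            memory_order_acq_rel) + 1;
    if (arrived == b->total) {
        /* Last arrival: reset the counter, then release the others. */
        atomic_store_explicit(&b->count, 0, memory_order_relaxed);
        atomic_store_explicit(&b->release, *local_sense,
                              memory_order_release);
    } else {
        /* Spin until the shared flag matches this episode's sense. */
        while (atomic_load_explicit(&b->release, memory_order_acquire)
               != *local_sense) { }
    }
}
```

Because the flag alternates between episodes, a fast thread that re-enters the barrier cannot be confused by a flag value left over from the previous episode, which is the problem with the simpler barrier.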
Memory Consistency Problem
• When must a processor see the value that has been written by another processor? Atomicity of operations – system wide?
• Can memory operations be re-ordered?
Various models: http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps
Example
P1:                     P2:
    A = 0                   B = 0
    ...                     ...
    A = 1                   B = 1
L1: if(B=0) S1          L2: if(A=0) S2
Which statements among S1 and S2 are done?
Both S1 and S2 may be done if writes are delayed
Sequential Consistency
• result of any execution is same as if the operations of all processors were executed in some sequential order
• operations of each processor occur in the order specified by its program
- it requires all memory operations to be atomic
- too restrictive, high overheads
Relaxing W→R order
Loads are allowed to overtake stores
Write buffering is permitted
1. Total Store Ordering : Writes are atomic
2. Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate
Relaxing W→R & W→W order
Partial Store Ordering
• Loads are allowed to overtake stores
• Writes can be re-ordered
• Memory barriers (fences) are used to explicitly order operations where needed
Further improves the performance
Examples
P1                  P2
A = 1;              while(flag=0);
flag = 1;           print A;
SC ensures that “1” is printed
TSO, PC also do so
PSO does not

P1                  P2
A = 1;              print B;
B = 1;              print A;
SC ensures that if B is printed as “1” then A is also printed as “1”
TSO, PC also do so
PSO does not
Examples - continued
P1              P2                  P3
A = 1;          while(A=0);         while(B=0);
                B = 1;              print A;
SC ensures that “1” is printed. TSO and PSO also do that but PC does not

P1              P2
A = 1;          B = 1;
print B;        print A;
SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not
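The last example can be written with C11 atomics as a rough illustration. The default memory order of `atomic_store` and `atomic_load` is sequentially consistent, so when `p1` and `p2` run on different threads the language rules out both reads returning 0. The function names are hypothetical, and a single run cannot show what weaker orderings would permit.

```c
#include <stdatomic.h>

atomic_int A = 0, B = 0;

/* P1: A = 1; print B;  (returns the value it would print) */
int p1(void) {
    atomic_store(&A, 1);      /* memory_order_seq_cst by default */
    return atomic_load(&B);
}

/* P2: B = 1; print A;  (returns the value it would print) */
int p2(void) {
    atomic_store(&B, 1);
    return atomic_load(&A);
}
```

If both stores and loads used `memory_order_relaxed` instead, the language would allow both functions to return 0, which corresponds to the outcome that TSO, PC and PSO permit.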
Relaxing all R/W order
Weak Ordering or Weak Consistency
• Loads and Stores are not restricted to follow an order
• Explicit synchronization primitives are used
• Synchronization primitives follow a strict order
• Easy to achieve
• Low overhead
Release Consistency
• Further relaxation of weak ordering
• Synch primitives are divided into acquire and release operations
• R/W operations after an acquire cannot move before it, but those before it can be moved after
• R/W operations before a release cannot move after it, but those after it can be moved before
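C11's `memory_order_release` and `memory_order_acquire` follow the same one-way ordering rules, so the flag example from the earlier slide can be sketched as below. This is an illustration under that assumption, not part of the lecture; the function names `p1_produce` and `p2_consume` are hypothetical.

```c
#include <stdatomic.h>

int A = 0;            /* ordinary data */
atomic_int flag = 0;  /* synchronization variable */

/* P1: write the data, then perform a release on flag.
   The write to A cannot move after the release store. */
void p1_produce(int value) {
    A = value;
    atomic_store_explicit(&flag, 1, memory_order_release);
}

/* P2: perform an acquire on flag, then read the data.
   The read of A cannot move before the acquire load. */
int p2_consume(void) {
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0) { }
    return A;
}
```

Once `p2_consume` observes `flag` as 1, it is guaranteed to return the value that `p1_produce` wrote to `A`.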
WC and RC Comparison
[Figure: two columns. WC: R/W blocks 1, 2 and 3 separated by two synch operations. RC: R/W blocks 1, 2 and 3 separated by an acquire and a release.]