Lazy Release Consistency for Software Distributed Shared Memory
Pete Keleher
Alan L. Cox
Willy Z.
Overview
Software DSM
Release Consistency
Eager Release Consistency
Lazy Release Consistency
Conclusion
Software DSM
Provides a shared address space using software support
Relies on (user-level) memory management techniques to detect accesses and updates to shared data
Memory coherence protocol maintains the illusion of shared memory
High communication overheads and large, page-sized coherence units
Sending messages is expensive in software DSM
Release Consistency
Extension of weak consistency
Weak Consistency: at synchronization, memory is updated globally; local changes are propagated to all processors
Release Consistency: propagates only locked memory, as needed
RC – Shared Memory Accesses
Shared memory accesses: ordinary or special
Special accesses: sync or nsync
Sync accesses: acquire or release
RC – Formal Definition
A system is release consistent if:
Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.
Before a release is allowed to perform with respect to any other processor, all previous reads and writes must be performed.
Special accesses are sequentially consistent with each other.
Eager Release Consistency (based on Munin’s write-shared protocol)
Release: modifications are propagated at the release
Invalidate protocol: sends invalidations
Update protocol: sends diffs, which limit the amount of data exchanged
Eager Release Consistency (contd.)
Acquire: no consistency-related operations; the protocol locates the processor that last executed a release on the same variable
Access miss: a message is sent to the directory manager, which forwards the request to the current owner
Eager Release Consistency
[Figure: timelines for processors P1 to P4. P1: w(x), rel. P2: acq, w(x), rel. P3: acq, w(x), rel. P4: acq, r(x). Caption: Repeated Updates of Cached Copies in Eager RC]
Lazy Release Consistency
Rather than eagerly “syncing up” data at the release point, why not “lazily” wait until the subsequent acquire?
Propagation of modifications is postponed until the time of an acquire.
To do so, the happened-before-1 partial order is used.
Lazy Release Consistency
[Figure: timelines for processors P1 to P4. P1: w(x), rel. P2: acq, w(x), rel. P3: acq, w(x), rel. P4: acq, r(x). Caption: Message Traffic in LRC]
happened-before-1 Partial Order
Shared memory accesses are partially ordered by happened-before-1, denoted by $\xrightarrow{hb1}$, defined as follows:
If $a_1$ and $a_2$ are accesses on the same processor, and $a_1$ occurs before $a_2$ in program order, then $a_1 \xrightarrow{hb1} a_2$.
If $a_1$ is a release on processor $p_1$, and $a_2$ is an acquire on the same location on processor $p_2$, and $a_2$ returns the value written by $a_1$, then $a_1 \xrightarrow{hb1} a_2$.
If $a_1 \xrightarrow{hb1} a_2$ and $a_2 \xrightarrow{hb1} a_3$, then $a_1 \xrightarrow{hb1} a_3$.
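The three happened-before-1 rules can be sketched in a few lines of Python. This is only an illustration: the function name, the access labels such as `w1(x)` and `acq2`, and the input format are assumptions made for this example and do not come from the slides. The example run encodes the four-processor lock chain used in the figures.

```python
from itertools import product

def happened_before_1(program_order, sync_pairs):
    """Return the set of (a, b) pairs such that a happened-before-1 b.

    program_order: dict mapping processor -> list of access labels in
                   program order (hypothetical input format).
    sync_pairs:    iterable of (release, acquire) pairs where the acquire
                   returned the value written by the release.
    """
    accesses = [a for seq in program_order.values() for a in seq]
    hb = set()
    # Rule 1: program order on a single processor.
    for seq in program_order.values():
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                hb.add((seq[i], seq[j]))
    # Rule 2: a release precedes the acquire that reads its value.
    hb.update(sync_pairs)
    # Rule 3: transitive closure (Warshall-style; k varies slowest).
    for k, i, j in product(accesses, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb

# The lock chain from the figures: P1 -> P2 -> P3 -> P4.
program_order = {
    "P1": ["w1(x)", "rel1"],
    "P2": ["acq2", "w2(x)", "rel2"],
    "P3": ["acq3", "w3(x)", "rel3"],
    "P4": ["acq4", "r4(x)"],
}
sync_pairs = [("rel1", "acq2"), ("rel2", "acq3"), ("rel3", "acq4")]
hb = happened_before_1(program_order, sync_pairs)
print(("w1(x)", "r4(x)") in hb)
```

In this run the write on P1 is ordered before the read on P4 only through the chain of release/acquire pairs, which is exactly the ordering LRC has to respect.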
Write Notices
RC requires that before a processor may continue past an acquire, all shared accesses that precede the acquire must be performed at the acquiring processor
LRC: this is guaranteed by propagating write notices
Write notice: an indication that a page has been modified in a particular interval
Write Notice Propagation
Execution of each processor is divided into intervals
A new interval begins each time the processor executes a special access
An interval is performed at a processor when all modifications made during that interval have been performed at that processor
Write Notice Propagation
[Figure: timelines for processors P1 to P4, with interval labels ip1, ip2, ip3, ip4. P1: w(x), rel. P2: acq, w(x), rel. P3: acq, w(x), rel. P4: acq, r(x).]
Write Notice Propagation
$V_p(i)$: vector timestamp for interval $i$ of processor $p$
Number of elements in $V_p(i)$ = number of processors
Entry for $p$ in $V_p(i)$ = $i$
Entry for $q \neq p$ in $V_p(i)$ = most recent interval of $q$ performed at $p$
Write Notice Propagation
$V_{p1}(i_{p1}) = \{i_{p1}, 0, 0, 0\}$
$V_{p2}(i_{p2}) = \{i_{p1}, i_{p2}, 0, 0\}$
$V_{p3}(i_{p3}) = \{0, i_{p2}, i_{p3}, 0\}$
$V_{p4}(i_{p4}) = \{0, 0, i_{p3}, i_{p4}\}$
On an acquire, the acquiring processor p3 sends its current vector timestamp to the previous releaser, p2.
Processor p2 uses this information to send p3 the write notices for all intervals of all processors that have been performed at p2 but not at p3.
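The selection step on the releaser's side can be sketched as follows. This is a minimal illustration under stated assumptions: processors are indexed from 0, intervals are numbered from 1 with 0 meaning "no interval performed yet", and the function name and the `notices` table layout are hypothetical rather than taken from the slides.

```python
def notices_to_send(releaser_vt, acquirer_vt, notices):
    """Pick write notices for intervals performed at the releaser
    but not yet at the acquirer.

    releaser_vt, acquirer_vt: lists with one entry per processor; entry q
        is the most recent interval of q performed at that processor.
    notices: dict mapping (processor, interval) -> set of modified pages
        (hypothetical bookkeeping kept by the releaser).
    """
    selected = {}
    for q, (seen_by_releaser, seen_by_acquirer) in enumerate(
        zip(releaser_vt, acquirer_vt)
    ):
        # Every interval of q in this range is missing at the acquirer.
        for interval in range(seen_by_acquirer + 1, seen_by_releaser + 1):
            pages = notices.get((q, interval))
            if pages:
                selected[(q, interval)] = pages
    return selected

# p2 (index 1) has performed interval 1 of p1 and its own interval 1;
# p3 (index 2) has not yet performed any interval.
vt_p2 = [1, 1, 0, 0]
vt_p3 = [0, 0, 0, 0]
notices = {(0, 1): {"page_x"}, (1, 1): {"page_x"}}
sent = notices_to_send(vt_p2, vt_p3, notices)
print(sorted(sent))
```

Comparing the two vectors entry by entry lets the releaser decide what to send in a single reply, without asking any third processor.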
Data Movement Protocols
Multiple Writer Protocol
False Sharing
Occurs when two or more processors access different variables within a page, with at least one of the accesses being a write
Generates a large amount of message traffic
Handling false sharing is important for software DSM because of the large page size
LRC allows a multiple-writer protocol:
Allows concurrent writes to different parts of the page
No message traffic
Modifications are merged using diffs
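Merging concurrent writes with diffs can be illustrated with a small sketch. It assumes a common implementation technique in which each writer keeps an unmodified copy of the page (called a "twin" here) and later compares it byte by byte against its working copy; the slides do not describe how diffs are built, so the twin, the helper names, and the 16-byte page size are all assumptions for illustration.

```python
def make_diff(twin, page):
    """Return {offset: new_byte} for every byte that differs from the twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}

def apply_diff(page, diff):
    """Apply a diff in place to a mutable page."""
    for offset, value in diff.items():
        page[offset] = value

PAGE_SIZE = 16  # tiny page, for illustration only
original = bytearray(PAGE_SIZE)

# Two processors write to different variables that share one page.
twin1, copy1 = bytes(original), bytearray(original)
twin2, copy2 = bytes(original), bytearray(original)
copy1[0] = 7   # writer 1 updates the variable at offset 0
copy2[8] = 9   # writer 2 updates the variable at offset 8

diff1 = make_diff(twin1, copy1)
diff2 = make_diff(twin2, copy2)

# A third copy merges both sets of modifications.
merged = bytearray(original)
apply_diff(merged, diff1)
apply_diff(merged, diff2)
print(diff1, diff2, list(merged))
```

Because the two diffs touch disjoint offsets, they can be applied in either order, which is why falsely shared writes need no coordination while they are being made.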
Invalidate Vs Update
Invalidate: the acquiring processor invalidates all pages in its cache for which it receives write notices.
Update: the acquiring processor updates those pages instead.
Diffs must be obtained from all concurrent modifiers.
For interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$.
Access Misses
A copy of the page, as well as a number of diffs, may have to be retrieved
Modifications summarized by diffs are merged before the access proceeds
Access miss: at interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$
If the processor has an invalidated copy of the page, the whole page is not sent
Write notices contain all the necessary information about the diffs
This reduces the amount of data sent
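The rule for choosing which intervals to fetch diffs from can be sketched as a filter over the happened-before-1 relation. This is an illustration under assumptions: `hb` is taken to be an already transitively closed set of (earlier, later) interval pairs, `candidates` is taken to be the intervals known to have modified the page, and `k` is restricted to those candidates; the function and variable names are hypothetical.

```python
def diff_sources(i, candidates, hb):
    """Return the intervals j to fetch diffs from at interval i.

    Keeps every candidate j with j hb1 i for which no other candidate k
    satisfies j hb1 k hb1 i.
    """
    preceding = [j for j in candidates if (j, i) in hb]
    return [
        j
        for j in preceding
        if not any((j, k) in hb and (k, i) in hb for k in preceding)
    ]

# Chain from the figures: ip1 hb1 ip2 hb1 ip3 hb1 ip4 (closed under transitivity).
chain_hb = {
    ("ip1", "ip2"), ("ip1", "ip3"), ("ip1", "ip4"),
    ("ip2", "ip3"), ("ip2", "ip4"),
    ("ip3", "ip4"),
}
chain_sources = diff_sources("ip4", ["ip1", "ip2", "ip3"], chain_hb)

# Two concurrent writers a and b, both ordered before i but not each other.
concurrent_hb = {("a", "i"), ("b", "i")}
concurrent_sources = diff_sources("i", ["a", "b"], concurrent_hb)
print(chain_sources, concurrent_sources)
```

In the chain only the most recent modifier is selected, while for two unordered writers both are selected and their diffs have to be merged.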
Conclusion
Performance of software DSM is sensitive to the number of messages and the amount of data exchanged to create the shared memory abstraction.
LRC aims to reduce both the number of messages and the amount of data exchanged by allowing changes to propagate lazily, only when needed.