Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z.

Overview

Software DSM

Release Consistency

Eager Release Consistency

Lazy Release Consistency

Conclusion

Software DSM

Provides a shared address space through software support

Relies on (user-level) memory management techniques to detect accesses and updates to shared data

Memory coherence protocol – maintains the illusion of shared memory

High communication overheads and large, page-sized coherence units

Sending messages is expensive in software DSM

Release Consistency

Extension of weak consistency

Weak Consistency: at synchronization, memory is updated globally; local changes are propagated to all processors

Release Consistency: propagates only locked memory, as needed

RC – Shared Memory Accesses

Shared memory accesses are either ordinary or special

Special accesses are either sync or nsync

Sync accesses are either acquire or release

RC – Formal Definition

A system is release consistent if

Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.

Before a release is allowed to perform with respect to any other processor, all previous reads and writes must be performed.

Special accesses are sequentially consistent with each other.

Eager Release Consistency (based on Munin’s write-shared protocol)

Release: modifications are propagated at release

Invalidate protocol: sends invalidations

Update protocol: sends diffs, which limit the amount of data exchanged

Eager Release Consistency (contd.)

Acquire: no consistency-related operations. The protocol locates the processor that last executed a release on the same variable.

Access miss: a message is sent to the directory manager, which forwards the request to the current owner.

Eager Release Consistency

Figure: Repeated updates of cached copies in eager RC. P1: w(x), rel. P2: acq, w(x), rel. P3: acq, w(x), rel. P4: acq, r(x).

Lazy Release Consistency

Rather than eagerly “sync up” data at release point, why not “lazily” wait until the subsequent acquire?

Propagation of modifications postponed until the time of an acquire.

To do so, the happened-before-1 partial order is used.

Lazy Release Consistency

Figure: Message traffic in LRC. P1: w(x), rel. P2: acq, w(x), rel. P3: acq, w(x), rel. P4: acq, r(x).

happened-before-1 Partial Order

Shared memory accesses are partially ordered by happened-before-1, denoted by $\xrightarrow{hb1}$, defined as follows:

If $a_1$ and $a_2$ are accesses on the same processor, and $a_1$ occurs before $a_2$ in program order, then $a_1 \xrightarrow{hb1} a_2$

If $a_1$ is a release on processor $p_1$, and $a_2$ is an acquire on the same location on processor $p_2$, and $a_2$ returns the value written by $a_1$, then $a_1 \xrightarrow{hb1} a_2$

If $a_1 \xrightarrow{hb1} a_2$ and $a_2 \xrightarrow{hb1} a_3$, then $a_1 \xrightarrow{hb1} a_3$
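The three rules defining happened-before-1 can be illustrated with a small Python sketch. It is only an illustration: the function name `happened_before_1`, the access labels, and the representation of the relation as a set of pairs are assumptions made for this example and are not part of the original slides.

```python
from itertools import product


def happened_before_1(program_order, release_acquire_pairs):
    """Return hb1 as a set of (earlier, later) pairs of access labels.

    program_order: dict mapping a processor name to its accesses in program order.
    release_acquire_pairs: (release, acquire) pairs in which the acquire
    returns the value written by the release.
    """
    hb = set()
    # Rule 1: accesses on the same processor are ordered by program order.
    for accesses in program_order.values():
        for idx, earlier in enumerate(accesses):
            for later in accesses[idx + 1:]:
                hb.add((earlier, later))
    # Rule 2: a release precedes the acquire that reads its value.
    hb.update(release_acquire_pairs)
    # Rule 3: transitive closure, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb


# Hypothetical three-processor execution: each processor acquires the lock
# released by the previous one.
order = {
    "P1": ["w1(x)", "rel1"],
    "P2": ["acq2", "w2(x)", "rel2"],
    "P3": ["acq3", "r3(x)"],
}
pairs = [("rel1", "acq2"), ("rel2", "acq3")]
hb1 = happened_before_1(order, pairs)
print(("w1(x)", "r3(x)") in hb1)  # the write on P1 precedes the read on P3
```

The brute-force closure is quadratic in the size of the relation on every pass, which is fine for a handful of accesses but would not be how a real protocol tracks ordering.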

Write Notices

RC requires that before a processor may continue past an acquire, all shared accesses that precede the acquire must be performed at the acquiring processor

LRC: this is guaranteed by write notices

Write notice: an indication of a modification

Write Notice Propagation

Execution of each processor is divided into intervals

A new interval begins each time the processor executes a special access

An interval is performed at a processor when all modifications made during that interval have been performed at that processor


Figure: P1: w(x), rel. P2: acq, w(x), rel. P3: acq, w(x), rel. P4: acq, r(x). Intervals ip1, ip2, ip3, ip4 are marked on P1 through P4.


Vp(i): vector timestamp for interval i of processor p

Number of elements in Vp(i) = number of processors

Entry for p in Vp(i) = i

Entry for q in Vp(i) = most recent interval of q performed at p


Vp1(ip1) = {ip1, 0, 0, 0}

Vp2(ip2) = {ip1, ip2, 0, 0}

Vp3(ip3) = {0, ip2, ip3, 0}

Vp4(ip4) = {0, 0, ip3, ip4}

On acquire, the acquiring processor p3 sends its current vector timestamp to the previous releaser, p2.

Processor p2 uses this information to send p3 the write notices for all intervals of all processors that have been performed at p2 but not at p3.
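The comparison of vector timestamps at an acquire can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: processors are numbered from 0, interval numbers are integers starting at 1 with 0 meaning "no interval performed yet", and the names `missing_intervals`, `write_notices_to_send`, and `merge_timestamps` are hypothetical. Taking the element-wise maximum after the acquire is also an assumption of this sketch.

```python
def missing_intervals(releaser_vt, acquirer_vt):
    """Map each processor q to the intervals of q performed at the releaser
    but not yet at the acquirer."""
    return {
        q: list(range(acquirer_vt[q] + 1, releaser_vt[q] + 1))
        for q in range(len(releaser_vt))
        if releaser_vt[q] > acquirer_vt[q]
    }


def write_notices_to_send(releaser_vt, acquirer_vt, notices):
    """notices maps (processor, interval) to the set of pages modified in it.
    Returns (processor, interval, page) triples the releaser should send."""
    out = []
    for q, intervals in missing_intervals(releaser_vt, acquirer_vt).items():
        for i in intervals:
            for page in sorted(notices.get((q, i), ())):
                out.append((q, i, page))
    return out


def merge_timestamps(a, b):
    """Element-wise maximum: what the acquirer knows after the acquire
    (an assumption of this sketch)."""
    return [max(x, y) for x, y in zip(a, b)]


# Hypothetical state: the releaser has performed interval 1 of processors 0 and 1,
# while the acquirer has only performed its own interval 1 (processor 2).
releaser_vt = [1, 1, 0, 0]
acquirer_vt = [0, 0, 1, 0]
notices = {(0, 1): {"x"}, (1, 1): {"x"}}

to_send = write_notices_to_send(releaser_vt, acquirer_vt, notices)
after_acquire = merge_timestamps(releaser_vt, acquirer_vt)
print(to_send)        # [(0, 1, 'x'), (1, 1, 'x')]
print(after_acquire)  # [1, 1, 1, 0]
```

Because the acquirer sends its whole vector, the releaser can work out the missing intervals locally, so one request and one reply suffice in this model.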

Data Movement Protocols

Multiple Writer Protocol

False Sharing

Occurs when two or more processors access different variables within a page, with at least one of the accesses being a write

Generates a large amount of message traffic

Handling false sharing is important for software DSM because of the large page size

LRC allows a multiple-writer protocol:

Allows concurrent writes to different parts of the page

No message traffic

Modifications are merged using diffs
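Merging concurrent writes with diffs can be sketched in Python as below. The sketch assumes that each writer keeps an unmodified copy of the page (often called a twin, a term not used in these slides) and that a diff is a byte-level map from offset to new value; both are simplifying assumptions, and `make_diff` and `apply_diff` are hypothetical names.

```python
def make_diff(twin, page):
    """Record only the bytes that differ between the unmodified twin and the page."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, diff):
    """Apply a diff in place to a copy of the page."""
    for offset, value in diff.items():
        page[offset] = value


# Two processors write to different parts of the same page (false sharing).
original = bytearray(b"aaaabbbb")

p1_copy = bytearray(original)
p1_copy[0:2] = b"XX"          # P1 modifies the start of the page

p2_copy = bytearray(original)
p2_copy[6:8] = b"YY"          # P2 modifies the end of the page

diff1 = make_diff(original, p1_copy)
diff2 = make_diff(original, p2_copy)

merged = bytearray(original)
apply_diff(merged, diff1)
apply_diff(merged, diff2)
print(merged)  # bytearray(b'XXaabbYY')
```

Since the two diffs touch disjoint offsets, the order in which they are applied does not matter here; writes to the same location would need synchronization to be ordered.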

Invalidate vs. Update

Invalidate: the acquiring processor invalidates all pages in its cache for which it receives write notices.

Update: the acquiring processor updates those pages

Diffs must be obtained for all concurrent modifiers.

For interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$

Access Misses

Copy of page as well as a number of diffs may have to be retrieved

Modifications summarized by diffs are merged before access

Access miss: at interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$

If the processor has an invalidated copy of the page:

The whole page is not sent

Write notices contain all the necessary information about the diffs

This reduces the amount of data sent
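The rule for choosing which intervals to request diffs from can be sketched in Python. This is a hedged illustration: it assumes hb1 is given as a set of (earlier, later) interval pairs that is already transitively closed, and it restricts the intermediate interval k to intervals that modified the page, which is an interpretation of the rule rather than something the slides state. The name `diffs_needed` and the interval labels are hypothetical.

```python
def diffs_needed(i, modifiers, hb1):
    """Return the intervals j among modifiers with j hb1 i and no modifier k
    such that j hb1 k hb1 i."""
    preceding = [j for j in modifiers if (j, i) in hb1]
    return [
        j for j in preceding
        if not any((j, k) in hb1 and (k, i) in hb1 for k in modifiers)
    ]


# Hypothetical intervals: "a" precedes "b", and both "b" and "c" precede "i".
# "b" and "c" are unordered with respect to each other (concurrent).
hb1 = {("a", "b"), ("b", "i"), ("a", "i"), ("c", "i")}
modifiers = ["a", "b", "c"]

selected = diffs_needed("i", modifiers, hb1)
print(selected)  # ['b', 'c']
```

Interval "a" is skipped because "b" lies between it and "i", while the concurrent intervals "b" and "c" are both selected.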

Conclusion

Performance of Software DSM – Sensitive to the number of messages and amount of data exchanged to create shared memory abstraction.

LRC aims at reducing both the number of messages and amount of data exchanged by allowing changes to propagate lazily, only when needed.
