ECE 1747: Parallel Programming
Distributed Shared Memory (DSM)
Multiprocessor (SMP)
[Figure: proc1, proc2, and proc3, each shown with a copy X=0, alongside X=0 in memory]
Consistency Models
• Sequential Consistency
– All processors observe the same order
– Must correspond to some serial order
– Only ordering constraint is that the reads/writes of each processor (e.g., P1) appear in that processor's own order; there are no restrictions on the relative ordering between processors.
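As a sketch of what "some serial order" means, the C fragment below enumerates every interleaving of two short programs that preserves each processor's own order, and records which results can occur. The litmus test (P1: x = 1; r1 = y and P2: y = 1; r2 = x) and the names run_order and sc_outcomes are assumptions chosen for illustration; they do not come from the slides.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical litmus test (not from the slides):
 *   P1: x = 1; r1 = y;        P2: y = 1; r2 = x;
 * seen[r1][r2] is set to 1 for every outcome reachable under
 * sequential consistency. */
static void run_order(const int order[4], int seen[2][2])
{
    int x = 0, y = 0, r1 = 0, r2 = 0;
    int pc1 = 0, pc2 = 0;               /* next operation of P1 / P2 */

    for (int k = 0; k < 4; k++) {
        if (order[k] == 1) {            /* P1 executes its next operation */
            if (pc1++ == 0) x = 1; else r1 = y;
        } else {                        /* P2 executes its next operation */
            if (pc2++ == 0) y = 1; else r2 = x;
        }
    }
    assert(pc1 == 2 && pc2 == 2);       /* both programs ran to completion */
    seen[r1][r2] = 1;
}

void sc_outcomes(int seen[2][2])
{
    memset(seen, 0, sizeof(int) * 4);
    /* Pick which 2 of the 4 global slots belong to P1. Program order
     * inside each processor is preserved by construction. */
    for (int a = 0; a < 4; a++)
        for (int b = a + 1; b < 4; b++) {
            int order[4] = {2, 2, 2, 2};
            order[a] = 1;
            order[b] = 1;
            run_order(order, seen);
        }
}
```

In this model the outcome r1 = 0, r2 = 0 is never produced: no serial order lets both reads run before both writes.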
Common consistency protocols
• Write update
– Multicast update to all replicas
• Write invalidate
– Invalidate cached copies in p2, p3
– Cache miss if p2/p3 access X
• Valid data from other cache
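The two protocols can be contrasted with a small simulation. This is a minimal sketch under stated assumptions: one variable X, three caches (indices 0..2 standing in for p1..p3), and hypothetical helper names write_update, write_invalidate, and cache_read. A real protocol also involves memory, a bus or directory, and write-back, all omitted here.

```c
#include <assert.h>

#define NPROC 3   /* indices 0..2 stand in for p1..p3 */

struct cache { int valid; int x; };

/* Write update: the new value is multicast to every valid replica.
 * Returns the number of update messages sent. */
int write_update(struct cache c[NPROC], int writer, int value)
{
    int msgs = 0;
    assert(writer >= 0 && writer < NPROC);
    c[writer].x = value;
    c[writer].valid = 1;
    for (int p = 0; p < NPROC; p++)
        if (p != writer && c[p].valid) {
            c[p].x = value;
            msgs++;
        }
    return msgs;
}

/* Write invalidate: every other valid copy is invalidated.
 * Returns the number of invalidation messages sent. */
int write_invalidate(struct cache c[NPROC], int writer, int value)
{
    int msgs = 0;
    assert(writer >= 0 && writer < NPROC);
    c[writer].x = value;
    c[writer].valid = 1;
    for (int p = 0; p < NPROC; p++)
        if (p != writer && c[p].valid) {
            c[p].valid = 0;
            msgs++;
        }
    return msgs;
}

/* Read: a miss fetches valid data from another cache. */
int cache_read(struct cache c[NPROC], int reader, int *miss)
{
    assert(reader >= 0 && reader < NPROC);
    *miss = !c[reader].valid;
    if (*miss) {
        for (int p = 0; p < NPROC; p++)
            if (c[p].valid) {
                c[reader].x = c[p].x;
                c[reader].valid = 1;
                break;
            }
        assert(c[reader].valid);   /* model assumes some valid copy exists */
    }
    return c[reader].x;
}
```

Starting from three valid copies, a write_invalidate by index 0 sends two invalidations, and the next read by index 1 misses once and then hits.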
Distributed Shared Memory (DSM)
[Figure: nodes proc0/mem0, proc1/mem1, proc2/mem2, ..., procN/memN connected by a network; the memories are labeled as shared memory]
DSM programming
• Standard
– pthread-like
• Synchronizations
– Barriers
– Locks
– Semaphores
Sequential SOR
for some number of timesteps/iterations {
    /* interior points only: rows/columns 0 and n are the boundary */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            temp[i][j] = 0.25 *
                (grid[i-1][j] + grid[i+1][j] +
                 grid[i][j-1] + grid[i][j+1]);
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            grid[i][j] = temp[i][j];
}
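A compilable version of the loop above might look as follows. It is a sketch that assumes a fixed size N (a hypothetical constant), an (N+1) x (N+1) grid whose outer rows and columns are fixed boundary values, and a function name sor_sequential that is not from the slides.

```c
#include <assert.h>

#define N 4   /* interior points are 1..N-1; rows/columns 0 and N are boundary */

double grid[N + 1][N + 1];
double temp[N + 1][N + 1];

void sor_sequential(int iterations)
{
    assert(iterations >= 0);
    for (int t = 0; t < iterations; t++) {
        /* Phase 1: compute new values from the old grid. */
        for (int i = 1; i < N; i++)
            for (int j = 1; j < N; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        /* Phase 2: copy the new values back into the grid. */
        for (int i = 1; i < N; i++)
            for (int j = 1; j < N; j++)
                grid[i][j] = temp[i][j];
    }
}
```

With every boundary cell set to 1.0 and the interior set to 0.0, one iteration gives 0.5 at an interior corner, 0.25 at an interior edge, and 0.0 at the center.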
Parallel SOR with Barriers (1 of 2)
void* sor (void* arg)
{
    int slice = (int)arg;
    int from = (slice * (n-1))/p + 1;
    int to = ((slice+1) * (n-1))/p + 1;

    for some number of iterations { … }
}
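The from/to arithmetic splits the interior rows 1..n-1 into p contiguous slices. The helper below isolates that computation so it can be checked on its own; the name slice_bounds and its pointer-style interface are assumptions for illustration.

```c
#include <assert.h>

/* Rows 1..n-1 are divided among p threads; thread `slice` owns the
 * half-open row range [from, to). */
void slice_bounds(int slice, int n, int p, int *from, int *to)
{
    assert(p > 0 && slice >= 0 && slice < p && n > 1);
    *from = (slice * (n - 1)) / p + 1;
    *to   = ((slice + 1) * (n - 1)) / p + 1;
}
```

For n = 10 and p = 4 this yields [1,3), [3,5), [5,7), and [7,10): the slices are adjacent, cover rows 1 through 9, and differ in size by at most one row.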
Parallel SOR with Barriers (2 of 2)
for (i = from; i < to; i++)
    for (j = 1; j < n; j++)
        temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                             grid[i][j-1] + grid[i][j+1]);
barrier();
for (i = from; i < to; i++)
    for (j = 1; j < n; j++)
        grid[i][j] = temp[i][j];
barrier();
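Putting the two slides together, a runnable POSIX threads version could be sketched as below. It assumes a platform that provides pthread_barrier_t, uses pthread_barrier_wait in place of the slides' barrier(), and picks illustrative constants N, P, and ITERS; sor_parallel is a hypothetical driver, not part of the lecture code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for pthread_barrier_t */
#endif
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define N 8        /* grid is (N+1) x (N+1); interior rows are 1..N-1 */
#define P 2        /* number of worker threads (assumed value) */
#define ITERS 10   /* number of iterations (assumed value) */

double grid[N + 1][N + 1], temp[N + 1][N + 1];
static pthread_barrier_t bar;

static void *sor(void *arg)
{
    int slice = (int)(intptr_t)arg;
    int from = (slice * (N - 1)) / P + 1;
    int to = ((slice + 1) * (N - 1)) / P + 1;

    for (int t = 0; t < ITERS; t++) {
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        pthread_barrier_wait(&bar);   /* every thread has finished reading grid */
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                grid[i][j] = temp[i][j];
        pthread_barrier_wait(&bar);   /* every thread has finished writing grid */
    }
    return NULL;
}

/* Runs P workers to completion. Aborts if a thread or the barrier
 * cannot be created; returns 0 otherwise. */
int sor_parallel(void)
{
    pthread_t tid[P];

    if (pthread_barrier_init(&bar, NULL, P) != 0)
        abort();
    for (intptr_t s = 0; s < P; s++)
        if (pthread_create(&tid[s], NULL, sor, (void *)s) != 0)
            abort();
    for (int s = 0; s < P; s++)
        pthread_join(tid[s], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}
```

The first barrier keeps any thread from overwriting grid rows that a neighbor is still reading; the second keeps the next iteration from reading rows that have not been updated yet.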
Sequential Consistency DSM
• As proposed by Li & Hudak, TOCS '86.
• Use virtual memory to implement sharing.
• Shared memory divided up by virtual memory pages.
• Use an SMP-like coherence protocol.
• Keep pages in one of three states:
– invalid, read-only, read-write
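The three page states can be read as a small state machine driven by page faults. The sketch below is a simplified model and not the published protocol: it tracks a single page across NNODES nodes and omits ownership, managers, and the actual page transfer. The name access_page is hypothetical.

```c
#include <assert.h>

enum page_state { INVALID, READ_ONLY, READ_WRITE };

#define NNODES 3

/* Simulates an access by `node` to one shared page.
 * is_write != 0 means a write. Returns 1 if the access faulted. */
int access_page(enum page_state st[NNODES], int node, int is_write)
{
    assert(node >= 0 && node < NNODES);

    if (is_write) {
        if (st[node] == READ_WRITE)
            return 0;                    /* already writable: no fault */
        for (int p = 0; p < NNODES; p++) /* write fault: invalidate others */
            if (p != node)
                st[p] = INVALID;
        st[node] = READ_WRITE;
        return 1;
    }

    if (st[node] != INVALID)
        return 0;                        /* readable copy present: no fault */
    for (int p = 0; p < NNODES; p++)     /* read fault: downgrade the writer */
        if (st[p] == READ_WRITE)
            st[p] = READ_ONLY;
    st[node] = READ_ONLY;
    return 1;
}
```

The invariant in this model is single writer or multiple readers: at most one node is READ_WRITE, and only while every other node is INVALID.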
SC implementation
• Synchronous read/write
– Writes must be propagated before moving on to the next operation
Read-Write False Sharing
[Figure: variables x and y]
Read-Write False Sharing (Cont.)
[Figure: timeline of operations: w(x); r(y) r(y) r(x); w(x) w(x)]
Read-Write False Sharing (Cont.)
[Figure: timeline of operations: w(x); r(y) r(y) r(x); synch; w(x) w(x)]
Weak Consistency (WEAKC)
• Data modifications are only propagated at the time of synchronization.
• Works fine if program is properly synchronized through system primitives.
– All programs should be …
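A toy model makes the "propagate only at synchronization" rule concrete. This sketch assumes two nodes that each hold a private copy of one small page, with per-word dirty flags; weak_write, weak_read, and weak_synch are hypothetical names, and a real system would propagate at page or diff granularity and only to nodes that need the data.

```c
#include <assert.h>

#define WORDS 4
#define NNODES 2

struct node {
    int page[WORDS];    /* this node's private copy of the page */
    int dirty[WORDS];   /* words written since the last synchronization */
};

/* Writes stay local until the next synchronization. */
void weak_write(struct node *n, int idx, int value)
{
    assert(idx >= 0 && idx < WORDS);
    n->page[idx] = value;
    n->dirty[idx] = 1;
}

int weak_read(const struct node *n, int idx)
{
    assert(idx >= 0 && idx < WORDS);
    return n->page[idx];
}

/* Synchronization point: each node's dirty words are copied to all other
 * nodes. Returns the number of words propagated. If two nodes wrote the
 * same word (a data race), the surviving value depends on loop order. */
int weak_synch(struct node nodes[NNODES])
{
    int sent = 0;
    for (int s = 0; s < NNODES; s++)
        for (int w = 0; w < WORDS; w++) {
            if (!nodes[s].dirty[w])
                continue;
            for (int d = 0; d < NNODES; d++)
                if (d != s)
                    nodes[d].page[w] = nodes[s].page[w];
            nodes[s].dirty[w] = 0;
            sent++;
        }
    return sent;
}
```

Three successive writes to the same word cost one propagation at the synchronization point, and a reader on the other node sees the old value until then.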
Read-Write False Sharing (Before)
[Figure: timeline of operations: w(x); r(y) r(y) r(x); synch; w(x) w(x)]
Read-Write False Sharing (WEAKC)
[Figure: timeline of operations: w(x) w(x); r(y) r(y) r(x); synch]
Write-Write False Sharing
[Figure: variables x and y]
Write-Write False Sharing
[Figure: timeline of operations: w(x); w(y) w(y) r(x); synch; w(x) w(x)]
Write-Write False Sharing (WEAKC)
[Figure: timeline of operations: w(x); w(y) w(y) r(x); synch; w(x) w(x)]
Multiple Writer (MW) Protocols
• Allows multiple writers per page.
• Modifications merged at synchronization (according to the WEAKC definition).
• Modifications are recorded through a mechanism called twinning and diffing.
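Twinning and diffing can be sketched in a few lines. The version below assumes a page is an array of PAGE_WORDS ints and encodes a diff as (index, value) pairs; make_twin, create_diff, and apply_diff are hypothetical names, and real systems use their own page sizes and diff encodings.

```c
#include <assert.h>
#include <string.h>

#define PAGE_WORDS 8

struct diff_entry { int index; int value; };

/* Twinning: save a pristine copy of the page before the first write. */
void make_twin(const int page[PAGE_WORDS], int twin[PAGE_WORDS])
{
    memcpy(twin, page, PAGE_WORDS * sizeof(int));
}

/* Diffing: record every word that differs from the twin.
 * Returns the number of entries written to out. */
int create_diff(const int page[PAGE_WORDS], const int twin[PAGE_WORDS],
                struct diff_entry out[PAGE_WORDS])
{
    int n = 0;
    for (int i = 0; i < PAGE_WORDS; i++)
        if (page[i] != twin[i]) {
            out[n].index = i;
            out[n].value = page[i];
            n++;
        }
    return n;
}

/* Merging: apply another writer's diff to the local copy. */
void apply_diff(int page[PAGE_WORDS], const struct diff_entry d[], int n)
{
    for (int k = 0; k < n; k++) {
        assert(d[k].index >= 0 && d[k].index < PAGE_WORDS);
        page[d[k].index] = d[k].value;
    }
}
```

If one writer changes x (word 0) and another changes y (word 1) on the same page, each diff holds a single entry, and exchanging the diffs at synchronization leaves both copies identical.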
Write-Write False Sharing and MW
[Figure: timeline of operations: w(x); w(y) w(y) r(x); synch; w(x) w(x)]
Creating a diff (delta)
[Figure: a page goes from write-protected to writable for the writes w(x) w(x)... and back to write-protected; a twin is kept and a diff (delta) is produced]
Write-Write False Sharing and MW
[Figure: timeline of operations: w(x); w(y) w(y) r(x); synch; w(x) w(x), annotated with the twins of the page and the x and y modifications]
Release Consistency (RC)
• Distinguish acquires from releases
– Ordinary read/write wait until the previous acquire is performed
– Release waits until previous read/write are performed
– Acquire/release are sequentially consistent w.r.t. one another
Eager & Lazy Release Consistency
• Eager release consistency: transfer consistency information at release of a lock.
• Lazy release consistency: transfer consistency information at acquire of a lock.
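The difference in when consistency information moves can be illustrated by counting messages in a toy simulation. This is a rough sketch under strong assumptions: four processors share one variable x, the eager variant notifies every other processor at release, and the lazy variant delivers one notice to a processor when it next acquires. The struct rc_sim and its functions are hypothetical, and real protocols track which processors cache the data and pass notices along the lock chain.

```c
#include <assert.h>

#define NPROC 4

struct rc_sim {
    int lazy;           /* 0 = eager, 1 = lazy */
    int stale[NPROC];   /* lazy only: a write this processor has not been told about */
    int valid[NPROC];   /* processor's copy of x is up to date */
    int messages;       /* consistency messages sent so far */
};

void rc_init(struct rc_sim *s, int lazy)
{
    s->lazy = lazy;
    s->messages = 0;
    for (int p = 0; p < NPROC; p++) {
        s->stale[p] = 0;
        s->valid[p] = 1;
    }
}

/* Lazy: consistency information is transferred at acquire time. */
void rc_acquire(struct rc_sim *s, int p)
{
    assert(p >= 0 && p < NPROC);
    if (s->lazy && s->stale[p]) {
        s->valid[p] = 0;
        s->stale[p] = 0;
        s->messages++;
    }
}

/* Release by p after it wrote x.
 * Eager: consistency information is transferred now, to everyone else. */
void rc_release_after_write(struct rc_sim *s, int p)
{
    assert(p >= 0 && p < NPROC);
    for (int q = 0; q < NPROC; q++) {
        if (q == p)
            continue;
        if (s->lazy) {
            s->stale[q] = 1;    /* bookkeeping only, no message yet */
        } else {
            s->valid[q] = 0;
            s->messages++;
        }
    }
    s->valid[p] = 1;            /* the writer holds the latest value */
}
```

For three processors that each acquire, write x, and release in turn, followed by a fourth that only acquires, this model counts nine messages in eager mode and three in lazy mode.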
Eager Release Consistency
[Figure: timelines for p1, p2, p3, and p4 with the operation sequences w(x) rel; acq r(x); acq w(x) rel; acq w(x) rel]
Lazy Release Consistency
[Figure: timelines for p1, p2, p3, and p4 with the operation sequences w(x) rel; acq r(x); acq w(x) rel; acq w(x) rel]
Lazy Release Consistency
• Acquiring processor determines which modifications it needs to see.
[Figure: timelines for p1, p2, and p3: w(x) rel; acq w(y) rel; p3 performs acq r(x) r(y); synch]
Vector Timestamps
[Figure: timelines for p1, p2, and p3: w(x) rel; acq w(y) rel; p3 performs acq r(x) r(y). Vector timestamps start at 000, 000, 000 and advance to 100 and 110]
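The timestamps in the figure can be reproduced with a minimal vector-clock sketch. It assumes three processors (index 0 for p1), that a release advances the releaser's own entry, and that an acquire takes the element-wise maximum with the releaser's vector. The names vts_release, vts_acquire, and vts_covers are hypothetical, and real systems define intervals in more detail.

```c
#include <assert.h>

#define NPROC 3

struct vts { int t[NPROC]; };

/* A release closes the current interval of processor p. */
void vts_release(struct vts *v, int p)
{
    assert(p >= 0 && p < NPROC);
    v->t[p]++;
}

/* An acquire merges in the releaser's timestamp (element-wise max). */
void vts_acquire(struct vts *mine, const struct vts *releaser)
{
    for (int i = 0; i < NPROC; i++)
        if (releaser->t[i] > mine->t[i])
            mine->t[i] = releaser->t[i];
}

/* Nonzero if a has seen every interval that b has seen. */
int vts_covers(const struct vts *a, const struct vts *b)
{
    for (int i = 0; i < NPROC; i++)
        if (a->t[i] < b->t[i])
            return 0;
    return 1;
}
```

Running the figure's trace in this model: p1 releases and reaches 100; p2 acquires from p1 and releases, reaching 110; p3 acquires from p2 and ends at 110, so it needs the modifications to both x and y.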
DSM Summary
• Relaxed consistency
– application's definition of correctness
• Achieves more than 70% of the performance of the corresponding message-passing applications