1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false...
![Page 1: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/1.jpg)
DSM Innovations - MultiView
Software Distributed Shared Memory (SDSM):
MultiView
1. SDSM, false sharing.
2. Solution: MultiView.
3. Granularity adaptation.
4. Integrated services.
Ayal Itzkovitz, Assaf Schuster
![Page 2: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/2.jpg)
A multi-core system (simplified)
A parallel program may spawn processes (threads) in order to utilize all computing units.
Processes communicate through shared memory, physically located on the local machine.
[Figure: four cores attached to one local memory]
![Page 3: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/3.jpg)
A distributed system
[Figure: three nodes, each a core with its own local memory, connected by a network; a virtual shared memory spans the nodes]
Emulation of the same programming paradigm.
Ultimately: no changes to source/binary code.
![Page 4: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/4.jpg)
The First SDSM System
The first software SDSM system: Ivy [Li & Hudak, Yale, ’86]
Strict memory semantics (Lamport’s sequential consistency)
Page-based: memory pages as units of sharing
The major performance limitation: page size leads to false sharing
Page size: 4K (and more)
Average object size: 28 bytes
About 150 objects on a page
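The "about 150 objects" figure follows from dividing the page size by the average object size. A minimal sketch of that arithmetic, assuming a 4096-byte page; the function name is illustrative and does not come from the slides:

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole objects of object_size bytes that fit in one page. */
size_t objects_per_page(size_t page_size, size_t object_size)
{
    assert(object_size > 0);
    return page_size / object_size;
}
```

With a 4096-byte page and 28-byte objects this gives 146 whole objects, which the slide rounds to about 150. Any two of them written by different hosts can cause false sharing of the page.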
![Page 5: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/5.jpg)
Object Distribution
![Page 6: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/6.jpg)
Object Distribution – Memory View
[Figure: only the label "Network" survives]
![Page 7: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/7.jpg)
False Sharing
“…the conventional wisdom remains that the overhead of false sharing […] in page-based consistency protocols is the primary factor limiting the performance of software SDSM”
[Amza, Cox, Rajamani, and Zwaenepoel, PPoPP ’97]
“[The] conventional wisdom holds that fine-grain performance and false sharing doom page-based approaches”
[Buck and Keleher, IPPS ’98]
![Page 8: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/8.jpg)
Solution: The MultiView Approach
“MultiView and Millipage – Fine-grain Sharing in Page-based SDSMs” [Itzkovitz and Schuster, OSDI ’99]
Implement small-size pages through special memory configuration
Other goals:
Without compromising the strict memory consistency [ICS’04, EuroPar’04]
Utilizing low-latency networks (Myrinet, VIA/ServerNet-II, InfiniBand) [Hot-Interconnects’03, IPDPS’04]
Transparency [EuroPar’03]
Adaptive sharing granularity [ICPP’00, IPDPS’01 best paper]
Maximize locality through migration and load sharing [DISC’01]
Additional “service layers” (garbage collection, data-race detection) [JPDC’01, JPDC’02]
![Page 9: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/9.jpg)
The Traditional Memory Layout
struct a { … };
struct b;
int x, y, z;
main() {
  w = malloc(sizeof(struct a));
  v = malloc(sizeof(struct a));
  u = malloc(sizeof(struct b));
  …
}
[Figure: traditional layout of the globals x, y, z and the allocations w, v, u]
![Page 10: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/10.jpg)
The MultiView Technique
[Figure: the traditional layout next to the MultiView layout of x, y, z, w, v, u]
![Page 11: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/11.jpg)
The MultiView Technique
Protection is now set independently.
Variables reside in the same page but are not shared.
[Figure: traditional vs. MultiView layout of x, y, z, w, v, u; in the MultiView layout the variables carry separate protections (RW, NA, R)]
![Page 12: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/12.jpg)
The MultiView Technique
[Figure: traditional vs. MultiView layout of x, y, z, w, v, u; View 1, View 2 and View 3 all map the same memory object]
![Page 13: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/13.jpg)
The MultiView Technique: Memory Layout
[Figure: one memory object holding x, y, z, w, v, u, mapped by View 1, View 2 and View 3]
![Page 14: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/14.jpg)
The MultiView Technique
[Figure: Host A and Host B each map one memory object through View 1, View 2 and View 3; each view region is marked R, RW or NA]
![Page 15: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/15.jpg)
The MultiView Technique
[Figure: View 1, View 2 and View 3 on Host A and on Host B, with each view region marked R, RW or NA]
![Page 16: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/16.jpg)
Enabling Technology
Memory-mapped I/O, created for inter-process communication
[Figure: a shared memory object]
![Page 17: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/17.jpg)
Implementation: Millipage
Can be used by a single process to provide desired functionality
[Figure: a shared memory object]
• Windows-NT (Solaris, BSD, Linux)
• CreateFileMapping(), MapViewOfFileEx() for allocating views
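The slide names the Windows calls used for allocating views. As a rough POSIX analogue (an assumption, not the Millipage code), the sketch below maps one backing object twice inside a single process with different protections. The function name `multiview_demo` is hypothetical, and `tmpfile()` is used only as a convenient stand-in for the memory object.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose fileno() and ftruncate() */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one memory object through two views and return what the read-only
 * view sees after a write through the read-write view (-1 on failure).
 */
int multiview_demo(int value)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *backing = tmpfile();            /* stand-in for the memory object */
    if (backing == NULL)
        return -1;

    int result = -1;
    int fd = fileno(backing);
    if (page > 0 && ftruncate(fd, (off_t)page) == 0) {
        size_t len = (size_t)page;
        /* Two views of the same physical page, each with its own protection. */
        int *view_rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        int *view_ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (view_rw != MAP_FAILED && view_ro != MAP_FAILED) {
            assert(view_rw != view_ro);   /* distinct virtual addresses */
            view_rw[0] = value;           /* write through one view */
            result = view_ro[0];          /* observe it through the other */
        }
        if (view_rw != MAP_FAILED) munmap(view_rw, len);
        if (view_ro != MAP_FAILED) munmap(view_ro, len);
    }
    fclose(backing);
    return result;
}
```

Because both views are backed by the same pages, protection can be changed on one view without affecting the other, which is the property the MultiView slides rely on.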
![Page 18: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/18.jpg)
Transparency
1999: minipages are allocated at malloc time (via a malloc-like API); allocation routines should be slightly modified.
Original allocation:
  mat = malloc(lines*cols*sizeof(int));
  …
  mat[i][j] = mat[i-1][j]+mat[i][j-1];
  …
Modified allocation (one malloc per row):
  mat = malloc(lines*sizeof(int*));
  for(i=0;i<lines;i++)
    mat[i] = malloc(cols*sizeof(int));
  …
  mat[i][j] = mat[i-1][j]+mat[i][j-1];
  …
SOR and LU have not been modified at all
WATER: changed ~20 lines out of 783 lines
IS: changed 5 lines out of 93 lines
TSP: changed ~15 lines out of ~400 lines
2003: complete transparency, through binary instrumentation/interception of OS calls
![Page 19: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/19.jpg)
SOR SPLASH-II Benchmark
[Figure: SOR speedup (0 to 8) vs. number of threads (0 to 10); series: Transparent DSM, Millipede 4.0, Transparent+Barrier, SMP (2 processors)]
![Page 20: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/20.jpg)
Performance with Fixed Granularity (NBodyW on 8 nodes)
[Figure: run time [s] (50 to 62) vs. allocation granularity]
![Page 21: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/21.jpg)
False Sharing vs. Prefetching (WATER)
[Figure: plotted against chunking level (1 to 6, none), on axes 0 to 10000 and 0.50 to 1.10; series: compete requests x 10 (4 and 8 hosts), read/write faults (4 and 8 hosts), efficiency (4 and 8 hosts)]
![Page 22: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/22.jpg)
Adapting Granularity
[Figure: shared data elements vs. application run time]
Adaptation is dynamic, automatic, transparent
![Page 23: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/23.jpg)
Performance (VIA/ServerNet-II, 2004)
[Figure: Water-nsq speedup vs. nodes (1 to 12), two panels: one thread per node (speedup 0 to 12) and two threads per node (speedup 0 to 24); series: SC/MV - fine granularity, HLRC, Mixed consistency, SC/MV - best static granularity, SC/MV - dynamic granularity]
![Page 24: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/24.jpg)
Integrating Data Race Detection
Detection in application variable granularity
[Figure: normalized execution time (0.8 to 1.7) for SOR, LU, IS, TSP, WATER, two panels: overheads with 1 proc and overheads with 8 proc; series: NO_DR, BAS, PCT, OPT]
![Page 25: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/25.jpg)
Integrating Distributed Garbage Collection (Remote Reference Counting)
Collection in native application granularity.
[Figure: overhead (0% to 40%) vs. garbage creation ratio (obj/sec). Bars, in the order given: IS (0.8 obj/sec) 0.20%, LU (30 obj/sec) 2.50%, WATER (31 obj/sec) 2.60%, SOR (1140 obj/sec) 37.70%]
![Page 26: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/26.jpg)
Questions?
![Page 27: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/27.jpg)
Types of Parallel Systems
1. In-core multi-threading
2. Multi-core/SMP multi-threading
3. Tightly-coupled cluster, customized interconnect (SGI’s Altix)
4. Tightly-coupled cluster, off-the-shelf interconnect (InfiniBand)
5. WAN, Internet, Grid, peer-to-peer
[Figure labels alongside the list: Scalability, Communication Efficiency]
Traditionally: 1+2 are programmable using shared memory, 3+4 are programmable using message passing, and in 5 peer processes communicate with central control only.
HDSM: systems in 3 move towards presenting a shared memory interface to a physically distributed system.
What about 4 and 5? Software Distributed Shared Memory = SDSM
![Page 28: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/28.jpg)
Matrix Multiplication
Two threads; A and B are read-only matrices (R), C is the written matrix (W).
A = malloc(MATSIZE);
B = malloc(MATSIZE);
C = malloc(MATSIZE);
parfor(n) mult(A, B, C);
mult(id):
  for (line=Nxid .. Nx(id+1))
    for(col=0..N)
      C[line,col] = multline(A[line],B[col]);
![Page 29: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/29.jpg)
Matrix Multiplication
[Figure: two hosts on a network, each computing A x B = C; pages of A and B are RO and are sent once; pages of C are RW]
![Page 30: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/30.jpg)
Matrix Multiplication
[Figure: two hosts on a network, each computing A x B = C; pages of A and B are RO, pages of C are RW; access labels R, R, W]
![Page 31: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/31.jpg)
Matrix Multiplication - False Sharing
[Figure: two hosts on a network, each computing A x B = C; pages of A and B are RO and are sent once; pages of C are marked RW or NA]
![Page 32: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/32.jpg)
Matrix Multiplication - False Sharing
[Figure: two hosts on a network, each computing A x B = C; pages of A and B are RO; pages of C are marked RW or NA]
![Page 33: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/33.jpg)
Matrix Multiplication - False Sharing
[Figure: two hosts on a network, each computing A x B = C; pages of A and B are RO; pages of C are marked RW or NA]
![Page 34: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/34.jpg)
Matrix Multiplication - False Sharing
[Figure: two hosts on a network, each computing A x B = C; pages of A and B are RO; pages of C are RW; access labels R, R, W]
![Page 35: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/35.jpg)
First Approach: Weak Semantics
Example - Release Consistency:
Allow multiple writers to a page (assume exclusive update for any portion of the page)
Each page has a twin copy
At synchronization time, all pages perform “diff” with their twins, and send diffs to managers
Managers hold master copies
[Figure: two RW copies of a page, each with a twin; the diffs are applied to the master copy]
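The twin-and-diff mechanism described above can be sketched as follows. This is a minimal illustration under assumptions, not code from any of the cited systems: the page is shrunk to eight words, the diff is a plain list of (index, value) pairs, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 8   /* tiny "page" for illustration (assumed size) */

typedef struct { size_t index; int value; } diff_entry;

/* Save a twin (pristine copy) of the page before the first local write. */
void make_twin(const int *page, int *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof(int));
}

/* Compare the page with its twin and record every word that changed. */
size_t compute_diff(const int *page, const int *twin, diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++)
        if (page[i] != twin[i])
            out[n++] = (diff_entry){ i, page[i] };
    return n;
}

/* Apply a diff to the manager's master copy. */
void apply_diff(int *master, const diff_entry *diff, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(diff[i].index < PAGE_WORDS);
        master[diff[i].index] = diff[i].value;
    }
}

/* Two writers update disjoint words of one page; returns 1 if the
 * master copy ends up holding both updates. */
int two_writers_merge(void)
{
    int master[PAGE_WORDS] = {0};
    int copy_a[PAGE_WORDS], copy_b[PAGE_WORDS];
    int twin_a[PAGE_WORDS], twin_b[PAGE_WORDS];
    diff_entry diff_a[PAGE_WORDS], diff_b[PAGE_WORDS];

    memcpy(copy_a, master, sizeof master);
    memcpy(copy_b, master, sizeof master);
    make_twin(copy_a, twin_a);
    make_twin(copy_b, twin_b);

    copy_a[1] = 11;   /* host A writes word 1 */
    copy_b[6] = 66;   /* host B writes word 6 */

    size_t na = compute_diff(copy_a, twin_a, diff_a);
    size_t nb = compute_diff(copy_b, twin_b, diff_b);
    apply_diff(master, diff_a, na);
    apply_diff(master, diff_b, nb);

    return na == 1 && nb == 1 && master[1] == 11 && master[6] == 66;
}
```

Note that `compute_diff` scans the whole page even though each writer touched a single word, which is the cost the slides later question ("why diff at locations not touched?").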
![Page 36: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/36.jpg)
First Approach: Weak Semantics
Allow memory to reside in an inconsistent state for time intervals
Enforce consistency only at synchronization points
Reaching a consistent view of the memory requires computation
Reduces (but does not always eliminate) false sharing
Reduces number of protocol messages
Weak memory semantics
Involves both memory and processing time overhead
Still: coarse-grain sharing (why diff at locations not touched?)
![Page 37: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/37.jpg)
Software DSM Evolution - Weak Semantics
IVY (Li & Hudak), ’86, Yale
Page-grain, relaxed consistency:
Munin, ’92, release consistency, Rice
Midway, ’93, entry consistency, CMU
Treadmarks, ’94, lazy release consistency, Rice
Brazos, ’97, scope consistency, Rice
![Page 38: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/38.jpg)
Software DSM Evolution - Multithreading
IVY (Li & Hudak), ’86, Yale
Page-grain, relaxed consistency:
Munin, ’92, release consistency, Rice
Midway, ’93, entry consistency, CMU
Treadmarks, ’94, lazy release consistency, Rice
Brazos, ’97, scope consistency, Rice
Multithreading:
CVM, Millipede, ’96, multi-protocol, Maryland and Technion
Quarks, ’98, protocol latency hiding, Utah
![Page 39: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/39.jpg)
Second Approach: Code Instrumentation
Example - Binary Rewriting: wrap each load and store with instructions that check whether the data is available locally
Source:
  line += 3;
  v = v - line;
Compiled:
  load r1, ptr[line]
  load r2, ptr[v]
  add r1, 3h
  store r1, ptr[line]
  sub r2, r1
  store r2, ptr[v]
After code instrumentation:
  push ptr[line]
  call __check_r
  load r1, ptr[line]
  push ptr[v]
  call __check_r
  load r2, ptr[v]
  add r1, 3h
  push ptr[line]
  call __check_w
  store r1, ptr[line]
  push ptr[line]
  call __done
  sub r2, r1
  push ptr[v]
  call __check_w
  store r2, ptr[v]
  push ptr[v]
  call __done
After optimization:
  push ptr[line]
  call __check_w
  load r1, ptr[line]
  push ptr[v]
  call __check_w
  load r2, ptr[v]
  add r1, 3h
  store r1, ptr[line]
  push ptr[line]
  call __done
  sub r2, r1
  store r2, ptr[v]
  push ptr[v]
  call __done
![Page 40: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/40.jpg)
Second Approach: Code Instrumentation
Provides fine-grain access control, thus avoids false sharing
Bypasses the page protection mechanism
Usually, fixed granularity for all application data (still, false sharing)
Needs a special compiler or binary-level rewriting tools
Cost:
High overheads (even on a single machine)
Inflated code
Not portable (among architectures)
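To make the idea of software access checks concrete, here is a small C sketch of a per-block state table consulted before each access. It is an illustration under assumptions, not the mechanism of Blizzard or Shasta: the 64-byte block size, the table layout and every name are hypothetical, and a "miss" only bumps a counter where a real system would fetch the data.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64   /* fixed access-control granularity in bytes (assumed) */
#define NUM_BLOCKS 16

typedef enum { BLOCK_INVALID, BLOCK_READ_ONLY, BLOCK_READ_WRITE } block_state;

static block_state state_table[NUM_BLOCKS];  /* zero-initialized: all invalid */
static unsigned miss_count;                  /* checks that had to "fetch" */
static int shared_data[NUM_BLOCKS * BLOCK_SIZE / sizeof(int)];

/* Index of the block that contains addr. */
static size_t block_of(const void *addr)
{
    size_t offset = (size_t)((const char *)addr - (const char *)shared_data);
    assert(offset < sizeof shared_data);
    return offset / BLOCK_SIZE;
}

/* Stand-in for a read check: make the block at least readable. */
void check_read(const void *addr)
{
    size_t b = block_of(addr);
    if (state_table[b] == BLOCK_INVALID) {
        miss_count++;                        /* a real system fetches here */
        state_table[b] = BLOCK_READ_ONLY;
    }
}

/* Stand-in for a write check: make the block writable. */
void check_write(const void *addr)
{
    size_t b = block_of(addr);
    if (state_table[b] != BLOCK_READ_WRITE) {
        miss_count++;                        /* a real system gets ownership here */
        state_table[b] = BLOCK_READ_WRITE;
    }
}

/* Instrumented form of: line += 3; v = v - line;
 * One write check per variable also covers the reads. */
void instrumented_update(int *line, int *v)
{
    check_write(line);
    *line += 3;
    check_write(v);
    *v = *v - *line;
}

/* Run the update twice on two variables in different blocks and
 * return the number of misses observed so far. */
unsigned run_instrumented_demo(void)
{
    int *line = &shared_data[0];     /* first block */
    int *v = &shared_data[100];      /* a different block */
    instrumented_update(line, v);    /* two misses: both blocks were invalid */
    instrumented_update(line, v);    /* no new misses: both are writable now */
    return miss_count;
}
```

Every access pays for the table lookup even when it hits, which matches the "high overheads (even on a single machine)" cost listed above.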
![Page 41: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/41.jpg)
Software DSM Evolution
IVY (Li & Hudak), ’86, Yale
Page-grain, relaxed consistency:
Munin, ’92, release consistency, Rice
Midway, ’93, entry consistency, CMU
Treadmarks, ’94, lazy release consistency, Rice
Brazos, ’97, scope consistency, Rice
Multithreading:
CVM, Millipede, ’96, multi-protocol, Maryland and Technion
Quarks, ’98, protocol latency hiding, Utah
Fine-grain, code instrumentation:
Blizzard, ’94, binary instrumentation, Wisconsin
Shasta, ’97, transparent, works for commercial apps, Digital WRL
![Page 42: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/42.jpg)
MultiView - Overheads
Application: traverse an array of integers, all packed up in minipages
The number of minipages is derived from the value of max views in page
Limitations of the experiments:
1.63GB contiguous address space available
Up to 1664 views
Need 64 bits!!!
![Page 43: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/43.jpg)
MultiView - Overheads
As expected, committed (physical) memory is constant
Only a negligible overhead (< 4%), due to TLB misses
[Figure: slowdown (0.96 to 1.08) vs. number of views (1 to 32); series: array sizes 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB]
![Page 44: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/44.jpg)
MultiView - Taking it to the extreme
Beyond critical points overhead becomes substantial
[Figure: slowdown (0 to 20) vs. number of views; series: array sizes 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB; critical points are marked for 8MB, 4MB, 2MB and 1MB]
Number of minipages at critical points is 128K
Slowdown due to L2 cache exhausted by PTEs
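A back-of-the-envelope sketch of why 128K minipages could exhaust a cache with page-table entries. The slides give neither the PTE size nor the cache size, so the 4-byte PTE (as on 32-bit x86 without PAE), the 4 KB page, the rule "one PTE per minipage" and the function names are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of page-table entries needed to map num_minipages minipages,
 * assuming one PTE per minipage. */
size_t pte_footprint_bytes(size_t num_minipages, size_t pte_size)
{
    return num_minipages * pte_size;
}

/* Number of views at which an array reaches minipage_limit, assuming
 * every view maps every page of the array exactly once. */
size_t views_at_limit(size_t array_bytes, size_t page_size, size_t minipage_limit)
{
    assert(page_size > 0 && array_bytes >= page_size);
    return minipage_limit / (array_bytes / page_size);
}
```

Under these assumptions, 128K minipages need 128K x 4 B = 512 KB of PTEs, and an 8 MB array would reach that count at 64 views while a 1 MB array would reach it at 512 views. This only shows that the critical point moves with array size; the actual values should be read from the original plot.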
![Page 45: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/45.jpg)
MultiView - Taking it to the extreme
[Figure: the slowdown vs. number of views plot repeated, with an added "SDSM" label]
Number of minipages at critical points is 128K
Slowdown due to L2 cache exhausted by PTEs
![Page 46: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/46.jpg)
The Transparent DSM: System Initialization
For most DSM systems, initialization is an almost trivial task
The transparent DSM system cannot use such a simple solution
In order to initialize a DSM system transparently we have to inject the initialization code into the loaded application
![Page 47: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/47.jpg)
Standard Initialization
Standard C application:
crtStartup:
  …
  call c_init    (initialize the C runtime library)
  …
  call main      (start the application)
  …
main:
  …application code…
crtStartup is startup code from the C standard library. This code is identical for all C applications. crtStartup is the entry point of the executable.
The "call main" instruction lies at a fixed offset from crtStartup. We denote this offset as main_call_offset.
![Page 48: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/48.jpg)
Transparent DSM System Initialization
crtStartup:
  …
  call c_init    (initialize the C runtime library)
  …
  call main      (patched: the call target main is replaced by hookedMain)
  …
main:
  …application code…
Injected DLL:
mainPtr dd NULL
hookedMain:
  dsm_init(…);
  dsm_create_thread(…,mainPtr,…);
  …
DllMain:
  …
  crtStartup = get_entry_point();
  mainPtr = *(crtStartup + main_call_offset);
  *(crtStartup + main_call_offset) = hookedMain;
  …
The OS passes control to DllMain() after the DLL has been loaded. The main thread is resumed.
dsm_init initializes the DSM system (the OS API is intercepted, globals are moved to the DSM).
The application main thread is created using the DSM system thread creation API.
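The patching step can be modelled in portable C with a function pointer standing in for the operand of the "call main" instruction. This is a simulation under assumptions, not the injected DLL: nothing is patched in a real executable, the DSM calls are reduced to a trace entry, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*entry_fn)(void);

static int trace[2];     /* order in which the two functions ran */
static int trace_len;

/* Stand-in for the application's main. */
static int app_main(void)
{
    trace[trace_len++] = 2;
    return 0;
}

static entry_fn main_call_slot = app_main;  /* models the "call main" operand */
static entry_fn main_ptr = NULL;            /* models mainPtr on the slide */

/* Stand-in for hookedMain: DSM setup first, then the original main. */
static int hooked_main(void)
{
    trace[trace_len++] = 1;   /* where dsm_init(...) would run */
    return main_ptr();        /* the slide starts main via dsm_create_thread */
}

/* Models DllMain: save the original call target, then redirect it. */
void install_hook(void)
{
    if (main_call_slot != hooked_main) {    /* do not hook twice */
        main_ptr = main_call_slot;
        main_call_slot = hooked_main;
    }
}

/* Models crtStartup reaching its "call main" instruction. */
int run_startup(void)
{
    assert(main_call_slot != NULL);
    return main_call_slot();
}

/* Returns 1 if the hook ran before the application main. */
int hook_demo(void)
{
    trace_len = 0;
    install_hook();
    run_startup();
    return trace_len == 2 && trace[0] == 1 && trace[1] == 2;
}
```

The application code itself is untouched; only the call target changes, which is what makes the initialization transparent.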
![Page 49: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/49.jpg)
SDSMs on Emerging Fast Networks
Fast networking is an emerging technology
MultiView provides only one aspect: reducing message sizes
The next magnitude of improvement shifts from the network layer to the system architectures and protocols that use those networks
Challenges:
Efficiently employ and integrate fast networks
Provide a “thin” protocol layer: reduce protocol complexity, eliminate buffer copying, use home-based management, etc.
![Page 50: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/50.jpg)
Adding the Privileged View
Constant Read/Write permissions
Separate application threads from SDSM injected threads
Atomic updates: DSM threads can access (and update) memory while application threads are prohibited
Direct send/receive: memory-to-memory, no buffer copying
[Figure: application views of x, y, z with protections RW, NA and R, and the privileged view with RW, all mapping the same memory object]
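A privileged view can be sketched with POSIX calls as a second mapping that stays read-write while the application mapping has no access. This is an analogue under assumptions (the slides describe a Windows implementation); `privileged_update` and the use of `tmpfile()` as the memory object are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose fileno() and ftruncate() */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Install a value through the privileged view while the application view
 * is inaccessible, then open the application view for reading.
 * Returns the value the application view sees, or -1 on failure.
 */
int privileged_update(int value)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *backing = tmpfile();            /* stand-in for the memory object */
    if (backing == NULL)
        return -1;

    int result = -1;
    int fd = fileno(backing);
    if (page > 0 && ftruncate(fd, (off_t)page) == 0) {
        size_t len = (size_t)page;
        /* Application view: no access, any touch by an application thread faults. */
        int *app_view = mmap(NULL, len, PROT_NONE, MAP_SHARED, fd, 0);
        /* Privileged view: constant read/write, used only by DSM threads. */
        int *priv_view = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (app_view != MAP_FAILED && priv_view != MAP_FAILED) {
            assert(app_view != priv_view);
            priv_view[0] = value;         /* update while the app view is NA */
            if (mprotect(app_view, len, PROT_READ) == 0)
                result = app_view[0];     /* data is complete before access opens */
        }
        if (app_view != MAP_FAILED) munmap(app_view, len);
        if (priv_view != MAP_FAILED) munmap(priv_view, len);
    }
    fclose(backing);
    return result;
}
```

Since the data is written in place through the privileged view, incoming messages can be received directly into the shared memory without an intermediate buffer, which is the "no buffer copying" point above.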
![Page 51: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/51.jpg)
Coarse Granularity
[Figure: Host 1 sends a memory access request for minipages 1-6 to the manager, which sends requests to Host 2 and Host 3; one reply carries data 2, 4, 5 and the other carries data 1, 3]
![Page 52: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/52.jpg)
Automatic Adaptation of Granularity
Recompose: when the same host accesses consecutive minipages (fine granularity to coarse granularity)
Split: when different hosts update different minipages (coarse granularity to fine granularity)
[Figure: minipages 1-6 held by Host A at coarse granularity; after a split, Host A and Host B hold them at fine granularity]
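The split and recompose rules can be illustrated with a toy policy over one chunk of six minipages. This is a deliberately simplified sketch, not the adaptation algorithm from the cited papers: it tracks only the last host to touch each minipage, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_MINIPAGES 6

typedef struct {
    bool coarse;                      /* true: the chunk is handled as one unit */
    int last_host[CHUNK_MINIPAGES];   /* last host to access each minipage, -1 if none */
} chunk;

void chunk_init(chunk *c)
{
    c->coarse = true;
    for (size_t i = 0; i < CHUNK_MINIPAGES; i++)
        c->last_host[i] = -1;
}

/*
 * Record an access and adapt the granularity:
 * split (coarse = false) once two different hosts hold minipages of the chunk,
 * recompose (coarse = true) once a single host has taken over all touched minipages.
 */
void chunk_access(chunk *c, size_t minipage, int host)
{
    assert(minipage < CHUNK_MINIPAGES);
    c->last_host[minipage] = host;

    bool single_host = true;
    for (size_t i = 0; i < CHUNK_MINIPAGES; i++)
        if (c->last_host[i] != -1 && c->last_host[i] != host)
            single_host = false;

    c->coarse = single_host;
}
```

For example, if host 0 touches minipages 0 and 1 the chunk stays coarse; when host 1 then touches minipage 2 the chunk splits; if host 0 later takes minipage 2 back, the chunk is recomposed.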
![Page 53: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/53.jpg)
Memory Faults (Barnes)
[Figure: number of faults (0 to 50000), read faults and write faults; label: Millipede]
![Page 54: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/54.jpg)
Water-nsq Performance (cont’d)
[Figure: Water-nsquared run time breakdown (sec, 0 to 200) for SC/MV-f.g., HLRC, Mixed and SC/MV-b.g.; components: computation, read faults, write faults, barriers, locks]
[Figure: the effect of chunking in Water-nsquared; protocol overhead (read faults, write faults, compete requests; 0 to 4 x 10^4) and run-time (sec, 162 to 180) vs. chunking level (molecules, 1 to 8)]
![Page 55: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/55.jpg)
Basic Costs in Millipage (Myrinet interconnect, 1998)
Access fault: 26 usec
Get protection: 7 usec
Set protection: 12 usec
Messages (one way):
  header msg: 12 usec
  data msg (1/2 KB): 22 usec
  data msg (1 KB): 34 usec
  data msg (4 KB): 90 usec
MPT translation: 7 usec
Message sizes directly influence latency
The most compute-demanding operation: minipage translation, 7 usec
In relaxed consistency systems, protocol operations might take hundreds of usecs
Example: run-length diff for a 4KB page: 250 usec
![Page 56: 1 DSM Innovations - MultiView Software Distributed Shared Memory (SDSM): MultiView 1.SDSM, false sharing. 2.Solution: MultiView. 3.Granularity adaptation.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d815503460f94a6576d/html5/thumbnails/56.jpg)
Scalability (IB vs. VIA interconnects, 2003)
[Figure: application speedups on 8 nodes (0 to 16); series: VIA/ServerNet - 1 thread, Kernel/IB - 1 thread, Kernel/IB - 2 threads]