Heapamaral/papers/ISSC-IPDPS...Speedup in the MANNA machine. 4.2 Robustness to Latency Variation 4 8...
[Figure 1 diagram: I-structure element states Heap, Empty, Deferred, Full and FATAL ERROR, with transitions labeled allocate, read, write, delete and reset.]
Figure 1. State Transition Diagram for the I-Structure Implementation
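For readers unfamiliar with I-structures, the core write-once behavior that a diagram like Figure 1 encodes can be sketched as follows. This is a minimal sketch of conventional I-structure element semantics, not the authors' implementation: the class and method names are hypothetical, and the Heap, allocate, delete and reset transitions are deliberately not modeled because their exact arcs are not recoverable from this text. A read of an Empty element is deferred, the single write satisfies all deferred reads, and a second write corresponds to the FATAL ERROR state.

```python
from enum import Enum, auto


class State(Enum):
    EMPTY = auto()     # no value written, no pending readers
    DEFERRED = auto()  # no value written, one or more readers waiting
    FULL = auto()      # value written exactly once


class IStructureError(Exception):
    """Raised on a second write to a Full element (the FATAL ERROR case)."""


class IElement:
    """Hypothetical single I-structure element with deferred reads (sketch)."""

    def __init__(self):
        self.state = State.EMPTY
        self.value = None
        self.waiting = []  # callbacks of reads deferred until the write arrives

    def read(self, callback):
        """Deliver the value now if Full; otherwise queue the read."""
        if self.state is State.FULL:
            callback(self.value)
        else:
            self.waiting.append(callback)
            self.state = State.DEFERRED

    def write(self, value):
        """Store the value once and satisfy every deferred read."""
        if self.state is State.FULL:
            raise IStructureError("I-structure element written twice")
        self.value = value
        self.state = State.FULL
        pending, self.waiting = self.waiting, []
        for callback in pending:
            callback(value)


received = []
e = IElement()
e.read(received.append)  # Empty -> Deferred: reader is queued
e.write(42)              # Deferred -> Full: queued reader is woken
e.read(received.append)  # Full: served immediately
print(received)          # [42, 42]
```

Deferring reads instead of blocking or polling is what lets a producer and its consumers run in any order while still seeing a single, unchanging value.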
3.1 I-Structure Software Cache Implementation
Table 1. Latency of EARTH and ISSC operations on EARTH-MANNA-SPN, measured in number of cycles (1 cycle = 20 ns).
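Table 1 reports latencies in cycles, while the later experiments vary the communication interface overhead in microseconds, so a small conversion helps when comparing the two. The sketch below assumes only the stated clock period of 20 ns per cycle (equivalently a 50 MHz clock); the function names are hypothetical.

```python
NS_PER_CYCLE = 20  # from Table 1: 1 cycle = 20 ns on EARTH-MANNA-SPN


def cycles_to_us(cycles):
    """Convert a latency given in cycles to microseconds."""
    return cycles * NS_PER_CYCLE / 1000


def us_to_cycles(us):
    """Convert a duration in microseconds to (possibly fractional) cycles."""
    return us * 1000 / NS_PER_CYCLE


clock_mhz = 1000 / NS_PER_CYCLE  # a 20 ns period is a 50 MHz clock
print(clock_mhz)           # 50.0
print(us_to_cycles(10))    # 500.0: a 10 us overhead is 500 cycles
print(cycles_to_us(1000))  # 20.0
```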
Table 2. ISSC cache hit ratios (%).
Table 3. Reduction in remote memory requests achieved by ISSC (%).
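Tables 2 and 3 report two related but distinct metrics. The sketch below illustrates how they could be counted for a software cache placed in front of remote I-structure reads. It is a simplified model under stated assumptions, not the ISSC implementation: the class name is hypothetical, values are cached per element rather than per block, and a read that misses on an element whose remote request is already in flight is assumed to be merged rather than re-sent. Hit ratio is taken as hits over remote reads, and the reduction as the share of remote reads that did not generate a network request. Because an I-structure element is written only once, a cached value cannot become stale while the element is not reset or deleted, so no coherence protocol is modeled.

```python
class SoftwareCacheSketch:
    """Hypothetical per-node cache for remote I-structure reads (sketch)."""

    def __init__(self):
        self.values = {}          # address -> value already fetched
        self.outstanding = set()  # addresses with a remote request in flight
        self.reads = 0            # remote reads issued by the program
        self.hits = 0             # reads served from the local cache
        self.remote_requests = 0  # requests actually sent over the network

    def read(self, addr):
        """Return the cached value, or None if the read must wait for a reply."""
        self.reads += 1
        if addr in self.values:
            self.hits += 1
            return self.values[addr]
        if addr not in self.outstanding:  # assumption: later misses are merged
            self.outstanding.add(addr)
            self.remote_requests += 1
        return None

    def fill(self, addr, value):
        """Install the reply to an outstanding remote request."""
        self.outstanding.discard(addr)
        self.values[addr] = value

    def hit_ratio_pct(self):
        return 100.0 * self.hits / self.reads if self.reads else 0.0

    def reduced_requests_pct(self):
        # Without the cache, every remote read would send one request.
        if not self.reads:
            return 0.0
        return 100.0 * (self.reads - self.remote_requests) / self.reads


# Toy access sequence for illustration only, not data from the paper.
c = SoftwareCacheSketch()
c.read(0x10)      # miss: sends a remote request
c.read(0x10)      # miss while the request is pending: merged, no new request
c.fill(0x10, 3.5)
c.read(0x10)      # hit
c.read(0x10)      # hit
print(c.hit_ratio_pct())         # 50.0
print(c.reduced_requests_pct())  # 75.0
```

Under the merging assumption, the reduction in remote requests can exceed the hit ratio, since merged misses save network traffic without counting as hits.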
4.1 Cache Performance
[Figure 2 plots, one per benchmark: (a) Matrix Multiplication, (b) Conjugate Gradient, (c) Hopfield, (d) Sparse Matrix Multiplication. x-axis: Number of Nodes (4, 8, 12, 16); y-axis: Absolute Speedup; series: Threaded-C, Threaded-C + IS, Threaded-C + ISSC.]
Figure 2. Speedup in the MANNA machine.
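The speedup plots report absolute speedup, which is conventionally measured against a sequential baseline rather than against the parallel program run on one node. This extract does not state which sequential version served as the baseline, so the definition below is an assumption, with $T_{\text{seq}}$ the execution time of the sequential program and $T_{\text{par}}(p)$ the execution time of the parallel program on $p$ nodes:

```latex
S_{\text{abs}}(p) = \frac{T_{\text{seq}}}{T_{\text{par}}(p)}
```

Under this definition, any overhead the parallel version carries on a single node lowers every point of the curve, which is why absolute speedup is a stricter measure than speedup relative to the one-node parallel run.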
4.2 Robustness to Latency Variation
[Figure 3 plots, one per benchmark: (a) Matrix Multiplication, (b) Conjugate Gradient, (c) Hopfield, (d) Sparse Matrix Multiplication. x-axis: Number of Nodes (4, 8, 12, 16); y-axis: Absolute Speedup; series: Threaded-C, Threaded-C + IS, Threaded-C + ISSC.]
Figure 3. Absolute speedup with 10 µs communication interface overhead.
[Figure 4 plots, one per benchmark: (a) Matrix Multiplication, (b) Conjugate Gradient, (c) Hopfield, (d) Sparse Matrix Multiplication. x-axis: Communication Interface Overhead (µs, log scale, 1 to 500); y-axis: Execution Time (µs, log scale); series: Threaded-C, Threaded-C + IS, Threaded-C + ISSC.]
Figure 4. Execution time with synthetically variable communication interface overhead.