Heapamaral/papers/ISSC-IPDPS...Speedup in the MANNA machine. 4.2 Robustness to Latency Variation 4 8...

  • Figure 1. State Transition Diagram for the I-Structure Implementation

    [Diagram not reproduced. Labels recoverable from it: Heap, Empty, Deferred, Full, FATAL ERROR; transition labels: allocate, read, write, delete, reset.]
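The state names in Figure 1 (Empty, Deferred, Full, FATAL ERROR) match the usual write-once semantics of an I-structure element. The sketch below is a minimal illustration of those standard semantics only, not the paper's implementation: the names `IElement`, `State`, and `IStructureError` are hypothetical, and the `allocate`, `delete`, and `reset` transitions are left out because their arcs cannot be recovered from this transcript.

```python
from enum import Enum


class State(Enum):
    EMPTY = "Empty"
    DEFERRED = "Deferred"
    FULL = "Full"


class IStructureError(RuntimeError):
    """Stands in for the FATAL ERROR state (assumed: a second write)."""


class IElement:
    """One write-once element; hypothetical illustration, not the ISSC code."""

    def __init__(self):
        self.state = State.EMPTY
        self.value = None
        self.waiting = []  # consumers whose reads arrived before the write

    def read(self, consumer):
        """Deliver the value to `consumer` now, or once the write arrives."""
        if self.state is State.FULL:
            consumer(self.value)
        else:
            # Empty or Deferred: queue the reader; the element is now Deferred.
            self.waiting.append(consumer)
            self.state = State.DEFERRED

    def write(self, value):
        """Store the value once and release every deferred reader."""
        if self.state is State.FULL:
            raise IStructureError("second write to a Full element")
        self.value = value
        self.state = State.FULL
        pending, self.waiting = self.waiting, []
        for consumer in pending:
            consumer(value)
```

A read that arrives before the write moves the element to Deferred and is answered when the write arrives; a read on a Full element is answered immediately.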

  • 3.1 I-Structure Software Cache implementation

    Table 1. Latency of EARTH and ISSC operations on EARTH-MANNA-SPN, measured in number of cycles (1 cycle = 20 ns).

  • Table 2. ISSC cache hit ratios (%)

    Table 3. Percentage of remote memory requests eliminated by ISSC (%)

    4.1 Cache Performance

    Figure 2. Speedup in the MANNA machine.

    [Plots not reproduced. Panels: (a) Matrix Multiplication, (b) Conjugate Gradient, (c) Hopfield, (d) Sparse Matrix Multiplication. x-axis: Number of Nodes (2 to 16); y-axis: Absolute Speedup; series: Threaded-C, Threaded-C + IS, Threaded-C + ISSC.]
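The figures plot absolute speedup against the number of nodes. The definition is not part of this transcript; assuming the conventional one, with $T_{\text{seq}}$ the running time of the sequential program on one node and $T_{\text{par}}(n)$ the running time of the parallel program on $n$ nodes (both symbols are introduced here for illustration), it would read:

```latex
\[
  S_{\text{abs}}(n) = \frac{T_{\text{seq}}}{T_{\text{par}}(n)}
\]
```

Under this assumption, a value of $S_{\text{abs}}(16) = 16$ would correspond to ideal linear scaling on 16 nodes.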

    4.2 Robustness to Latency Variation

  • Figure 3. Absolute speedup with 10 µs communication interface overhead

    [Plots not reproduced. Panels: (a) Matrix Multiplication, (b) Conjugate Gradient, (c) Hopfield, (d) Sparse Matrix Multiplication. x-axis: Number of Nodes (2 to 16); y-axis: Absolute Speedup; series: Threaded-C, Threaded-C + IS, Threaded-C + ISSC.]

    Figure 4. Execution time with synthetically variable communication interface overhead

    [Plots not reproduced. Panels: (a) Matrix Multiplication, (b) Conjugate Gradient, (c) Hopfield, (d) Sparse Matrix Multiplication. x-axis: Communication Interface Overhead (µs), 1 to 500; y-axis: Execution Time (µs), logarithmic; series: Threaded-C, Threaded-C + IS, Threaded-C + ISSC. Annotations inside the panels read 6.3 µs, 9.2 µs, 6 µs, and 3.2 µs.]