spcl.inf.ethz.ch
@spcl_eth
S. DI GIROLAMO, P. JOLIVET, K. D. UNDERWOOD, T. HOEFLER
Exploiting Offload Enabled Network Interfaces
1980’s 2000’s 2020’s
Lossless Networks
RDMA
Device Programming
Offload
Lossy Networks
Ethernet
How to program QsNet?
How to offload in Portals 4?
How to offload in libfabric?
We need an abstraction!
EXPRESS
Communications (non-blocking), computations, dependencies
L0: recv a from P1;
L1: b = compute f(buff, a);
L2: send b to P1;
L0 and CPU -> L1
L1 -> L2
OFFLOAD
[Figure: the recv, comp, and send operations of the schedule mapped onto the CPU and the Offload Engine]
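The dependency notation above can be read as a small graph in which an operation fires once everything it waits for has happened. The following is a minimal sketch of that reading, not the authors' implementation: the helper `run_schedule` and the event names `msg_a` (arrival of the message for L0) and `CPU` (the host making `buff` available) are assumptions introduced for illustration.

```python
from collections import deque

def run_schedule(deps, ready):
    """Return the order in which operations fire.

    deps maps an operation label to the set of events it waits for.
    ready lists external events that have already happened.
    """
    done = set()
    order = []
    queue = deque(ready)
    while queue:
        event = queue.popleft()
        if event in done:
            continue
        done.add(event)
        if event in deps:
            order.append(event)  # only schedule operations are recorded
        # Enqueue every pending operation whose dependencies are all satisfied.
        for op, needed in deps.items():
            if op not in done and op not in queue and needed <= done:
                queue.append(op)
    return order

# Schedule from the slide: L0 and CPU -> L1, L1 -> L2.
deps = {
    "L0": {"msg_a"},      # recv a from P1 (completes when the message arrives)
    "L1": {"L0", "CPU"},  # b = compute f(buff, a)
    "L2": {"L1"},         # send b to P1
}
order = run_schedule(deps, ready=["msg_a", "CPU"])
print(order)
```

If the `CPU` event is withheld, only L0 fires, which mirrors the "L0 and CPU -> L1" dependency: the computation cannot start on message arrival alone.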
Performance Model
[Figure: timelines for P0 and P1, each with a CPU and an offload engine (OE), plotted over time; segments are labeled o, (s-1)G, and m]
[1] A. Alexandrov et al. "LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for
parallel computation.“, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures. ACM, 1995.
P1{
L0: recv m1 from P1;
L1: send m2 to P1;
L0 -> L1
}
P0{
L0: recv m1 from P1;
L1: send m2 to P1;
}
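For orientation, the standard LogGP cost [1] of a single $s$-byte message, and of a ping-pong built from two such messages, can be written as below. This is a sketch of the CPU-driven case only, assuming both messages have the same size $s$ and that the reply is issued as soon as the first message has been received. The parameter $m$ shown in the figure is not defined on the slide and is left out here.

```latex
\begin{align*}
T_{\text{msg}}(s)       &= \underbrace{o}_{\text{send overhead}}
                           + \underbrace{(s-1)\,G}_{\text{per-byte gap}}
                           + \underbrace{L}_{\text{latency}}
                           + \underbrace{o}_{\text{receive overhead}} \\
T_{\text{ping-pong}}(s) &= 2\,T_{\text{msg}}(s) = 4o + 2L + 2(s-1)\,G
\end{align*}
```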
Offloading Collectives
A collective operation is fully offloaded if:
1. No synchronization is required in order to start the collective operation.
2. Once a collective operation is started, no further CPU intervention is required in order to progress or complete it.
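L1: recv msg1 from 5;
L2: recv msg2 from 6;
L3: res = compute f(res, msg1);
L4: res = compute f(res, msg2);
L5: send res to 0;
L1 and CPU -> L3
L2 and CPU -> L4
L3 and L4 -> L5
[Figure: the schedule drawn as a dependency graph of recv, comp, and send operations, with the CPU as an additional input to the comp operations]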
Definition. A schedule is a local dependency graph describing a partially ordered set of operations.
Definition. A collective communication involving $n$ nodes can be modeled as a set of schedules $S = \{S_1, \dots, S_n\}$, where each node $i$ participates in the collective by executing its own schedule $S_i$.
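To make the second definition concrete, the sketch below builds one schedule per node for a broadcast over a binomial tree. The choice of broadcast and of a binomial tree is an assumption made for illustration, as are the function names and the 0-based ranks (the definition numbers nodes from 1).

```python
def binomial_children(rank, n):
    """Children of `rank` in a binomial tree rooted at 0 over n nodes."""
    children = []
    step = 1
    while step < n:
        if step > rank and rank + step < n:
            children.append(rank + step)
        step *= 2
    return children

def binomial_parent(rank):
    """Parent of `rank` (clear its highest set bit); the root has none."""
    if rank == 0:
        return None
    return rank - (1 << (rank.bit_length() - 1))

def broadcast_schedules(n):
    """Build the set of schedules: one local dependency graph per node."""
    schedules = {}
    for rank in range(n):
        ops = {}   # operation label -> description
        deps = {}  # operation label -> labels it depends on
        parent = binomial_parent(rank)
        if parent is not None:
            ops["recv"] = f"recv data from {parent}"
            deps["recv"] = []
        for child in binomial_children(rank, n):
            label = f"send_{child}"
            ops[label] = f"send data to {child}"
            # A non-root node forwards only after its receive has completed.
            deps[label] = ["recv"] if parent is not None else []
        schedules[rank] = {"ops": ops, "deps": deps}
    return schedules

S = broadcast_schedules(8)
for rank, schedule in S.items():
    print(rank, schedule["ops"], schedule["deps"])
```

Each entry of `S` is purely local: a node only needs its own parent and children, which is what allows every schedule to be handed to the local offload engine independently.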
“Asynchronous algorithms, with their ability to tolerate memory latency, form an important class of algorithms for modern computer architectures.”
Edmond Chow et al., “Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs”, High Performance Computing. Springer International Publishing, 2015.
Solo Collectives
Synchronized collectives lead to the synchronization of the participating nodes.
A solo collective starts its execution as soon as one node (the initiator) starts its own schedule.
[Figure: three timelines over P0 to P3, titled Theory, Synchronized, and Solo. Legend: collective call, data message, activation message]
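The difference between the two start conditions can be illustrated with a toy model. It assumes, as a simplification not stated on the slide, that a synchronized collective makes no progress until the last node has called it. The function names and the arrival times are hypothetical.

```python
def collective_start(arrivals, mode):
    """Time at which the collective can start making progress.

    arrivals maps a node name to the time it calls the collective.
    """
    if mode == "synchronized":
        # Toy assumption: nothing progresses until every node has joined.
        return max(arrivals.values())
    if mode == "solo":
        # The first caller acts as the initiator and activates the others.
        return min(arrivals.values())
    raise ValueError(f"unknown mode: {mode}")

def initiator(arrivals):
    """Node that calls first (ties broken by node name)."""
    return min(sorted(arrivals), key=arrivals.get)

arrivals = {"P0": 4.0, "P1": 1.5, "P2": 3.0, "P3": 2.0}
print("synchronized start:", collective_start(arrivals, "synchronized"))
print("solo start:", collective_start(arrivals, "solo"), "initiated by", initiator(arrivals))
```

In this model the solo variant decouples the start time from the slowest node, at the price of the other nodes contributing whatever data they hold when the activation reaches them.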
Solo Collectives: Activation
Root-Activation: the initiator is always the root of the collective
Non-Root-Activation: the initiator can be any participating node
[Figure: animation over nodes P0 to P7]
A Case Study: Portals 4
Based on the one-sided communication model
Matching/Non-Matching semantics can be adopted
[Figure: an initiator and a target, each attached to the interconnection network through a network interface (NI). Initiator side: memory descriptors (MD). Target side: Portals Table, Priority List, Overflow List, matching entries (ME), Discard]
[2] “The Portals 4.0.2 Network Programming Interface”
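The target-side lookup suggested by the figure can be sketched as a toy model. It assumes the behavior we understand from the Portals 4 specification: the priority list is searched before the overflow list, bits set in an entry's ignore mask are not compared, and a message that matches no entry is discarded. The class and function names are hypothetical and are not the Portals API; unlinking of use-once entries is not modeled.

```python
from dataclasses import dataclass

@dataclass
class MatchEntry:
    match_bits: int
    ignore_bits: int = 0  # bits set here are not compared
    buffer: str = ""      # stands in for the memory region the entry exposes

    def matches(self, incoming_bits):
        # Compare only the bits that are not masked out by ignore_bits.
        return ((incoming_bits ^ self.match_bits) & ~self.ignore_bits) == 0

def deliver(incoming_bits, priority_list, overflow_list):
    """Return (list name, entry) for the first match, or ('discard', None)."""
    for name, entries in (("priority", priority_list), ("overflow", overflow_list)):
        for entry in entries:
            if entry.matches(incoming_bits):
                return name, entry
    return "discard", None

priority = [MatchEntry(match_bits=0x10, buffer="expected_msg")]
overflow = [MatchEntry(match_bits=0x00, ignore_bits=0xFF, buffer="unexpected_pool")]

print(deliver(0x10, priority, overflow))   # matched in the priority list
print(deliver(0x22, priority, overflow))   # falls through to the overflow list
print(deliver(0x122, priority, overflow))  # matches nothing
```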
A Case Study: Portals 4
Communication primitives
Put/Get operations are natively supported by Portals 4
One-sided + matching semantics
Atomic operations
Operands are the data specified by the MD at the initiator and by the ME at the target
Available operators: min, max, sum, prod, swap, and, or, …
Counters
Associated with MDs or MEs
Count specific events (e.g., operation completion)
Triggered operations
Put/Get/Atomic associated with a counter
Executed when the associated counter reaches the specified threshold
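Counters and triggered operations are the pieces that let a schedule progress without the CPU. The sketch below is a plain-Python simulation of that mechanism, not the Portals 4 API: the `Counter` class, its methods, and the logged strings are hypothetical names chosen for illustration.

```python
class Counter:
    """Event counter that fires triggered operations when thresholds are reached."""

    def __init__(self):
        self.value = 0
        self.pending = []  # (threshold, callable) pairs that have not fired yet

    def trigger(self, threshold, operation):
        """Register `operation` to run once the counter reaches `threshold`."""
        self.pending.append((threshold, operation))
        self._fire()

    def increment(self, amount=1):
        self.value += amount
        self._fire()

    def _fire(self):
        ready = [p for p in self.pending if p[0] <= self.value]
        self.pending = [p for p in self.pending if p[0] > self.value]
        for _, operation in ready:
            operation()

log = []
recv_ct = Counter()  # counts completed receives
send_ct = Counter()  # counts completed sends

def triggered_put():
    log.append("put res to 0")
    send_ct.increment()  # completion of the put is itself a counted event

# Forward the result only after both child messages have arrived.
recv_ct.trigger(2, triggered_put)
send_ct.trigger(1, lambda: log.append("collective done"))

recv_ct.increment()  # first message arrives: threshold not reached
after_first = list(log)
recv_ct.increment()  # second message arrives: the put fires on its own
print(after_first, log)
```

Chaining counters this way is how a dependency such as "two receives, then a send" can be expressed once, up front, and then left to run.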
Experimental results
Curie, a Tier-0 system
5,040 nodes
2 eight-core Intel Sandy Bridge processors
Full fat-tree InfiniBand QDR
OMPI: Open MPI 1.8.4
OMPI/P4: Open MPI 1.8.4 + Portals 4 backend
FFLIB: proof of concept library
One process per computing node
[Plots: Broadcast and Allreduce]
More about FFLIB at
http://spcl.inf.ethz.ch/Research/Parallel_Programming/FFlib/
Experimental results (same system and software configuration)
[Plots: Allgather and Scatter]
Simulations
Why? To study offloaded collectives at large scale
How? Extending LogGOPSim [3] to simulate Portals 4 functionalities
[Plots: Allreduce and Broadcast]
[3] T. Hoefler, T. Schneider, A. Lumsdaine. “LogGOPSim - Simulating Large-Scale Applications in the LogGOPS Model”, in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10). ACM, 2010.
[4] Underwood et al., “Enabling Flexible Collective Communication Offload with Triggered Operations”, IEEE 19th Annual Symposium on High Performance Interconnects (HOTI '11). IEEE, 2011.
Simulation parameters:
| Configuration | L | o | g | G | m |
|---|---|---|---|---|---|
| P4-SW | 5 µs | 6 µs | 6 µs | 0.4 ns | 0.9 ns |
| P4-HW [4] | 2.7 µs | 1.2 µs | 0.5 µs | 0.4 ns | 0.3 ns |
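As a rough feel for what the two parameter sets imply, the sketch below plugs the table values into the plain LogGP one-way message time, o + (s-1)G + L + o. This is only an assumption-laden illustration: it is not the simulator's model, the `LogGP` class is a hypothetical helper, and the parameter m is ignored because the slide does not define it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogGP:
    L: float  # latency, seconds
    o: float  # per-message overhead, seconds
    g: float  # gap between messages, seconds (unused for a single message)
    G: float  # gap per byte, seconds

    def message_time(self, size_bytes):
        """One-way time for a single message: o + (s-1)G + L + o."""
        return 2 * self.o + self.L + (size_bytes - 1) * self.G

US, NS = 1e-6, 1e-9
P4_SW = LogGP(L=5 * US, o=6 * US, g=6 * US, G=0.4 * NS)
P4_HW = LogGP(L=2.7 * US, o=1.2 * US, g=0.5 * US, G=0.4 * NS)

for name, params in (("P4-SW", P4_SW), ("P4-HW", P4_HW)):
    for size in (1, 1024, 1024 * 1024):
        micros = params.message_time(size) * 1e6
        print(f"{name} s={size:>8} B: {micros:.2f} us")
```

Because G is identical in both rows, the gap between the two configurations in this simple model comes entirely from L and o and therefore matters most for small messages.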
Abstract Machine Model
Offloading Collectives
Solo Collectives
Mapping to Portals 4
Results
Co-Authors: P. Jolivet, K. D. Underwood, T. Hoefler