Ikki Fujiwara, Michihiro Koibuchi (National Institute of Informatics)
Hiroki Matsutani (Keio University)
Henri Casanova (University of Hawaii at Manoa)
IPDPS 2014 / May 20th, 2014 / Phoenix, Arizona, USA
The Light Speed is Fixed
2014-05-20
2
Koibuchi Lab @ National Institute of Informatics
c ≈ 0.3 m/ns (in vacuum)
c ≈ 0.2 m/ns = 5.00 ns/m (in a cable)
Switch Delay is Continuously Decreasing
Switch delay (1 hop) ÷ 5 ns/m = equivalent cable length
QLogic 12300: 140 ns ÷ 5 ns/m = 28 m
Cisco SFS7000D: 200 ns ÷ 5 ns/m = 40 m
A future product (?): 60 ns ÷ 5 ns/m = 12 m
Switch delay will no longer dominate the end-to-end communication latency
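The conversion on this slide can be checked with a few lines of Python. This is a minimal sketch: the 5 ns/m figure and the three switch delays come from the slide, while the function and constant names are made up for illustration.

```python
# Cable propagation delay from the slide: c ≈ 0.2 m/ns in a cable, i.e. 5 ns/m.
CABLE_NS_PER_M = 5.0


def equivalent_cable_length_m(switch_delay_ns):
    """Cable length whose propagation delay equals one switch traversal."""
    return switch_delay_ns / CABLE_NS_PER_M


# Switch delays as listed on the slide.
switch_delays_ns = {
    "QLogic 12300": 140,
    "Cisco SFS7000D": 200,
    "A future product": 60,
}

for name, delay_ns in switch_delays_ns.items():
    length_m = equivalent_cable_length_m(delay_ns)
    print(f"{name}: {delay_ns} ns per hop is equivalent to {length_m:.0f} m of cable")
```

The shorter the equivalent cable length, the more a long cable costs relative to an extra hop.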
What Happens in the Future
[Figure: maximum latency [μs] (0.8 to 3.2) vs. switch delay [ns] (0 to 180) for Random (degree=11, diameter=5) and Hypercube (degree=11, diameter=11)]
Traditional Hypercube outperforms the same-degree Random topology!
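One way to read this result is through a simplified latency model. The model is an assumption made here, not something stated on the slide: the latency T of a path is the hop count h times the switch delay t_sw, plus the cable propagation delay at 5 ns/m over the total cable length ℓ (in meters). The subscripts R and H are introduced for illustration and denote the worst-case paths of Random and Hypercube.

```latex
% Assumed path-latency model (hops, switch delay, cable length):
T = h \, t_{\mathrm{sw}} + 5\,\mathrm{ns/m} \cdot \ell

% Compare the worst-case paths of Random (R) and Hypercube (H):
T_R - T_H = (h_R - h_H)\, t_{\mathrm{sw}} + 5\,\mathrm{ns/m} \cdot (\ell_R - \ell_H)

% If Random needs fewer hops (h_R < h_H) but longer cables (\ell_R > \ell_H),
% Hypercube has the lower latency exactly when
T_R > T_H
\iff
t_{\mathrm{sw}} < \frac{5\,\mathrm{ns/m} \cdot (\ell_R - \ell_H)}{h_H - h_R}
```

Under these assumptions, lowering the switch delay below the threshold shifts the advantage from the low-diameter topology to the one with shorter cables.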
Topology Design Trends
Geometrical Design Topological Design
Ring+Random [Koibuchi et al. ISCA12]
HyperX [Ahn et al. SC09]
Jellyfish [Singla et al. NSDI12]
Skywalk
Torus / Hypercube
Agenda
Introduction
Skywalk construction
  Intra-cabinet links
  Inter-cabinet links
Graph analysis
Cycle-accurate simulation
Conclusion
Intra-cabinet Links
[Figure: a cabinet containing switches and their hosts (compute nodes). Hereafter the hosts are omitted.]
1 Randomly connect the switches in each cabinet (possibly fully connected)
Inter-cabinet Links
[Figure: cabinets laid out on the floor]
2 Randomly connect the cabinets in each row
Inter-cabinet Links
2 Randomly connect the cabinets in each row
3 Randomly connect the cabinets in each column
4 Randomly connect the remaining cabinets (optional)
Skywalk Construction
1 Randomly connect the switches in each cabinet (possibly fully connected)
2 Randomly connect the cabinets in each row
3 Randomly connect the cabinets in each column
4 Randomly connect the remaining cabinets (optional)
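The four construction steps can be sketched in Python as follows. This is a simplified illustration, not the authors' generator. It assumes a full mesh inside each cabinet, one random pairing of cabinets per row, per column, and over the whole floor, and a round-robin choice of the switch that terminates each inter-cabinet link. The names `build_skywalk`, `attach`, and `connect_randomly` are hypothetical.

```python
import random
from itertools import combinations


def build_skywalk(rows, cols, z, seed=0):
    """Return a set of undirected links; a switch is (cabinet_id, switch_index)."""
    rng = random.Random(seed)
    links = set()
    next_switch = [0] * (rows * cols)  # per-cabinet round-robin pointer

    def attach(cabinet):
        # Assumption: inter-cabinet links land on the cabinet's switches in turn.
        index = next_switch[cabinet]
        next_switch[cabinet] = (index + 1) % z
        return (cabinet, index)

    def connect_randomly(cabinets):
        # Shuffle, then pair adjacent entries. With an odd count, the last
        # cabinet in the shuffled order gets no link in this round.
        order = list(cabinets)
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            links.add(frozenset((attach(a), attach(b))))

    def cabinet_id(r, c):
        return r * cols + c

    # Step 1: connect the switches in each cabinet (here: fully connected).
    for k in range(rows * cols):
        for i, j in combinations(range(z), 2):
            links.add(frozenset(((k, i), (k, j))))
    # Step 2: randomly connect the cabinets in each row.
    for r in range(rows):
        connect_randomly(cabinet_id(r, c) for c in range(cols))
    # Step 3: randomly connect the cabinets in each column.
    for c in range(cols):
        connect_randomly(cabinet_id(r, c) for r in range(rows))
    # Step 4 (optional): randomly connect cabinets across the whole floor.
    connect_randomly(range(rows * cols))
    return links


links = build_skywalk(rows=4, cols=4, z=4)
print(len(links), "links")
```

A real instance would repeat steps 2 to 4 until every switch has its target number of inter-cabinet links; the sketch adds only one link per cabinet in each step to keep the structure visible.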
Skywalk Details
Parameters
z = Number of switches in each cabinet
c = Number of cabinets
di = Number of intra-cabinet links at a switch
do = Number of inter-cabinet links at a switch
d = di + do = Total degree
Cyclic linking
Each cabinet's inter-cabinet links are assigned to its switches in a cyclic manner
Fastest routing
Packets choose lowest-latency paths (not shortest-hop paths)
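The difference between lowest-latency and shortest-hop routing can be illustrated with Dijkstra's algorithm over link latencies. This is a minimal sketch under an assumed cost model: each link traversed costs one switch delay plus 5 ns per meter of cable. The three-switch graph and its cable lengths are hypothetical values chosen for the example, not data from the talk.

```python
import heapq

CABLE_NS_PER_M = 5.0  # cable propagation delay, as used earlier in the deck


def fastest_path_latency(adj, src, dst, switch_delay_ns):
    """Lowest end-to-end latency in ns from src to dst (Dijkstra).

    adj maps a switch to a list of (neighbor, cable_length_m) pairs.
    Assumed cost per link: one switch delay plus cable propagation.
    """
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == dst:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, cable_m in adj[u]:
            nt = t + switch_delay_ns + cable_m * CABLE_NS_PER_M
            if nt < best.get(v, float("inf")):
                best[v] = nt
                heapq.heappush(heap, (nt, v))
    return float("inf")


# Hypothetical example: A and B share a 30 m cable; C sits 2 m from each.
adj = {
    "A": [("B", 30.0), ("C", 2.0)],
    "B": [("A", 30.0), ("C", 2.0)],
    "C": [("A", 2.0), ("B", 2.0)],
}

for delay_ns in (60, 200):
    latency = fastest_path_latency(adj, "A", "B", delay_ns)
    print(f"switch delay {delay_ns} ns: fastest A-to-B latency {latency} ns")
```

In this toy graph the two-hop detour through C is faster at a 60 ns switch delay, while the direct one-hop link is faster at 200 ns, which is why hop count alone is a poor routing metric once switches become fast.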
Standpoints of Skywalk and Dragonfly
Geometrical Design Topological Design
Ring+Random [Koibuchi et al. ISCA12]
HyperX [Ahn et al. SC09]
Jellyfish [Singla et al. NSDI12]
Torus / Hypercube
Dragonfly: 2-layer hierarchical meta-topology with intra-group and inter-group sub-topologies
Skywalk: a Dragonfly instance
• group = cabinet
• intra-group: random
• inter-group: random
Agenda
Introduction
Skywalk construction
Graph analysis
  Switch delay vs. latency
  Degree vs. latency
  Total cable length vs. latency
  Network size vs. latency
  Cabinet size vs. latency
Cycle-accurate simulation
Conclusion
Graph Analysis: Setup
Parameters: (unless otherwise specified)
z = 8 switches/cabinet
c = 256 cabinets arranged in a 16×16 grid
N = 2,048 switches in total
Switch delay = 60 ns
Packet injection delay = 300 ns
Featured topologies:
Skywalk: fully connected for intra-cabinet
Random: d-degree uniform random
Torus: 3-D (8×16×16) or 5-D (8×4×4×4×4)
HyperX: tailored to map onto the floorplan
Dragonfly: group = cabinet, fully connected for both intra- and inter-group
See the proceedings for average latency
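The setup numbers can be cross-checked with a short script. The switch count and torus dimensions come from the slide. The two degree formulas are assumptions made here: a hypercube of N switches has degree log2(N), and a Dragonfly that is fully connected inside each cabinet and between cabinets needs z − 1 intra-cabinet links per switch plus the c − 1 inter-cabinet links of a cabinet spread over its z switches. The function names are hypothetical.

```python
import math


def dragonfly_degree(z, c):
    """Assumed degree of a Dragonfly with full intra- and inter-cabinet meshes."""
    intra = z - 1                    # links to every other switch in the cabinet
    inter = math.ceil((c - 1) / z)   # c - 1 cabinet-level links shared by z switches
    return intra + inter


def hypercube_degree(n_switches):
    """Degree of a hypercube; n_switches must be a power of two."""
    degree = n_switches.bit_length() - 1
    if 2 ** degree != n_switches:
        raise ValueError("hypercube size must be a power of two")
    return degree


z, c = 8, 256                        # switches per cabinet, cabinets (16 x 16 grid)
n = z * c
print("N =", n)
print("3-D torus size:", math.prod((8, 16, 16)))
print("5-D torus size:", math.prod((8, 4, 4, 4, 4)))
print("Hypercube degree:", hypercube_degree(n))
print("Dragonfly degree:", dragonfly_degree(z, c))
```

With z = 8 and c = 256 the assumed formulas give a Hypercube degree of 11 and a Dragonfly degree of 39, the same degrees that label those topologies in the latency plots.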
Switch Delay vs. Latency
* HyperX is omitted. See the proceedings for complete results.
[Figure: maximum latency [μs] (0 to 3.5) vs. switch delay [ns] (0 to 500) for 3-D Torus (d=6), Hypercube (d=11), Random (d=11), Skywalk (d=11), and Dragonfly (d=39), where d = degree. Inset: switch delay 0 to 60 ns, maximum latency 0.5 to 0.9 μs.]
Skywalk leads to the lowest latency with ultra-low-delay switches and also with high-delay switches
Degree vs. Latency
[Figure: maximum latency [μs] (0.7 to 1.5) vs. degree (0 to 40) for 5-D Torus, HyperX, Random, Skywalk, and Dragonfly]
* Skywalk with di = {1, 4} and Hypercube are omitted. See the proceedings for complete results.
Skywalk leads to a desirable tradeoff between degree and latency
Total Cable Length vs. Latency
[Figure: maximum latency [μs] (0.7 to 1.5) vs. total cable length [km] (0 to 600) for 5-D Torus, HyperX, Random, Skywalk, and Dragonfly]
* Skywalk with di = {1, 4} and Hypercube are omitted. See the proceedings for complete results.
Skywalk uses 90% less cable than Dragonfly with only 19% higher maximum latency
Network Size vs. Latency
[Figure: maximum latency [μs] (0.5 to 1.3) vs. number of switches (128 to 8192) for Skywalk (d=8, 16, 32, 64), 3-D Torus (d=6), and Dragonfly (d=9, 15, 39, 135), where d = degree]
* Hypercube is omitted. See the proceedings for complete results.
Skywalk scales well with relatively low degree
Cabinet Size vs. Latency
[Figure: maximum latency [μs] (0.6 to 1.2) vs. number of switches per cabinet (2 to 128) for Skywalk (d=8, 16, 32), where d = degree]
Skywalk has an optimal cabinet size because it becomes similar to Random with very large or very small cabinets
Agenda
Introduction
Skywalk construction
Graph analysis
Cycle-accurate simulation
  Throughput vs. latency
Conclusion
Cycle-accurate Simulation: Setup
Topology parameters:
h = 8 hosts/switch
z = 4 switches/cabinet
c = 64 cabinets arranged in an 8×8 grid
N = 256 switches in total
Switch delay = 60 ns
Simulation parameters:
Adaptive deadlock-free routing
4 virtual channels
256 bits/flit × 33 flits/packet = 8,448 bits/packet
96 Gbps/switch ÷ 8 hosts/switch = 12 Gbps/host max.
Random uniform traffic
See the proceedings for:
Bit reversal traffic
Matrix transpose traffic
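The arithmetic in this setup can be reproduced directly. The flit size, packet length, switch bandwidth, and host count come from the slide. The per-packet serialization time is a derived figure that the slide does not state; it assumes a host injects at its full 12 Gbps share. The constant and variable names are illustrative.

```python
# Values from the simulation setup slide.
HOSTS_PER_SWITCH = 8
SWITCHES_PER_CABINET = 4
CABINETS = 64
FLIT_BITS = 256
FLITS_PER_PACKET = 33
SWITCH_GBPS = 96

switches = SWITCHES_PER_CABINET * CABINETS   # N
hosts = switches * HOSTS_PER_SWITCH          # derived, not on the slide
packet_bits = FLIT_BITS * FLITS_PER_PACKET
host_gbps = SWITCH_GBPS / HOSTS_PER_SWITCH   # maximum per-host bandwidth

# 1 Gbps equals 1 bit/ns, so bits divided by Gbps yields nanoseconds.
# Assumption: the host injects at its full share of the switch bandwidth.
serialization_ns = packet_bits / host_gbps

print("switches:", switches)
print("hosts:", hosts)
print("bits per packet:", packet_bits)
print("max Gbps per host:", host_gbps)
print("packet serialization time [ns]:", serialization_ns)
```

Under that assumption, pushing one packet onto the wire takes several times longer than a single 60 ns switch traversal, which gives a sense of scale for the latency axis of the simulation results.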
Cycle-accurate Simulation: Result
* HyperX is omitted. See the proceedings for complete results.
[Figure: latency [μs] (0.3 to 1.1) vs. accepted traffic [Gbit/sec/host] (0 to 12) for Skywalk (d=11), Hypercube (d=8), Dragonfly (d=19), 3-D Torus (d=6), and Random (d=11), where d = degree]
Skywalk achieves low latency, and higher throughput than Random, at a lower degree than Dragonfly
Wrap-up
The speed of light affects topology design once ultra-low-delay switches are put into practical use
We propose the “Skywalk” topology that uses randomness in a layout-conscious way
Skywalk achieves desirable tradeoffs between end-to-end latency and degree or cable length
Cycle-accurate simulation shows that Skywalk provides not only low latency but also high throughput at low degree
[Figure: Skywalk on the spectrum from geometrical design to topological design]