Post on 30-Dec-2015
Jose Miguel Montanana (NII, Japan)
Michihiro Koibuchi (NII, Japan)
Hiroki Matsutani (U of Tokyo, Japan)
Hideharu Amano (Keio U / NII, Japan)
Stabilizing Path Modification of Power-Aware On/Off Interconnection Networks
Outline
• HPC networks (InfiniBand, GbE)
• On/Off link activation method
– Reducing power consumption of HPC networks
– Paths are updated to avoid deactivated links
• Applying network reconfiguration to switches
• Evaluations
– Cycle-accurate network simulator
– Behavior of network during the path change
Network of High-performance computing
[Figure: number and percentage of supercomputers on the Top500 list by interconnect (Gigabit Ethernet, InfiniBand (IBA), Myrinet, other technologies), Jun 2003 to Jun 2008]
Examples (2008)
• Virginia Tech's X: 2,200 cores, 280th on Top500
• ABE (NCSA): 9,600 cores, 23rd on Top500
• ASCI-Q (LANL): 8,192 cores
• BLUEGENE/L (LLNL): 212,992 processors, 2nd on Top500 list
• RoadRunner (LANL): 122,400 cores, 1st on Top500
• TACC (Univ Texas): 251,904 cores, 5th on Top500
Interconnect labels shown on the slide: IBA (four systems), Quadrics, Proprietary
HPC Networks
• Small switches (24/48-port) provide the lowest cost per port
• When 100,000 cores are connected, a large number of small switches are needed
– drastically increasing the number of links
– Unused and rarely-used links should be deactivated for power-aware HPCs
[Figure: switches and hosts 0-15 connected by four trees (TREE 1 to TREE 4), with link aggregation using 3 links and 4 paths]
Power consumption of HPC switches
• Power consumption is almost constant regardless of traffic load
• The number of activated ports dominates the power consumption of switches
– Power consumption of a port is reduced to zero by the port-shutdown operation

Unit: W (switch types: GbE and IB)
Product        Port   Other (Xbar)   Total (ratio of ports)
PC5324         1.2    14.9           42.9 (65%)
PC6224         2.0    42.5           91.1 (53%)
PC6248         2.1    56.8           155.2 (63%)
SF-420         1.0    32.6           55.4 (41%)
SFS7000D-SK9   1.0    43.4           66.1 (34%)
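The "ratio of ports" figures are consistent with reading the column as the share of total power not attributed to the crossbar and other components, that is (Total - Other) / Total. The short Python sketch below checks that reading against the table's numbers; the formula is an inference from the data rather than something the slide states, and the name `port_share` is illustrative.

```python
# Power figures in watts, copied from the table: (per port, other/Xbar, total).
SWITCH_POWER_W = {
    "PC5324": (1.2, 14.9, 42.9),
    "PC6224": (2.0, 42.5, 91.1),
    "PC6248": (2.1, 56.8, 155.2),
    "SF-420": (1.0, 32.6, 55.4),
    "SFS7000D-SK9": (1.0, 43.4, 66.1),
}

def port_share(other_w, total_w):
    """Fraction of total switch power attributed to ports (assumed: total minus other)."""
    return (total_w - other_w) / total_w

for product, (_port_w, other_w, total_w) in SWITCH_POWER_W.items():
    print(f"{product}: {port_share(other_w, total_w):.0%} of {total_w} W")
```

Under this reading, shutting down every port would save at most the listed ratio of each switch's power.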
Overview of the on/off link method
[Figure: the four-tree network of switches and hosts 0-15 (TREE 1 to TREE 4), before and after traffic load becomes low and a part of the links is turned off]
Network load is not always high (e.g. during computation time)
Switch ports consume 40-60% of the total power of a switch
A runtime on/off link method
• Traffic monitoring (e.g., port monitor, IPTraf, pilot execution)
• Low or high-load links appear?
– No: continue traffic monitoring
– Yes: selection of on/off links and paths
• Update of link status and paths
– Very crucial factor: how is the NW stabilized during the path update?
[Figure: the four-tree network (TREE 1 to TREE 4, hosts 0-15) when low traffic load is detected]
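The monitoring step of the runtime method can be sketched as a threshold check on measured link utilization. This is only an illustration: the thresholds `LOW` and `HIGH`, the function name `select_link_changes`, the example loads, and the rule of keeping at least one link per aggregated group are all assumptions, not the selection policy used by the authors.

```python
LOW, HIGH = 0.2, 0.8  # hypothetical per-link utilization thresholds

def select_link_changes(group_load, active, total):
    """Return the new number of active links in one aggregated link group.

    group_load is the offered load in units of one link's capacity
    (for example, 1.5 means one and a half links' worth of traffic).
    """
    per_link = group_load / active
    if per_link < LOW and active > 1:
        return active - 1   # low-load links appear: turn one link off
    if per_link > HIGH and active < total:
        return active + 1   # high-load links appear: turn one link on
    return active           # no change: keep monitoring

# One monitoring round over three hypothetical groups of 3 links each.
loads = {"A": 0.3, "B": 2.7, "C": 1.2}
active = {"A": 3, "B": 2, "C": 3}
for name in loads:
    active[name] = select_link_changes(loads[name], active[name], total=3)
print(active)
```

Each change in the active set then requires a path update, which is the step the rest of the talk is concerned with.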
Paths: before and after the original path is deactivated
[Figure: example network of switches 0-6 showing the path before and after the update]
Stabilizing network during the path update
Network Reconfiguration (deadlock avoidance)
• Rold = routing table before the update
• Rnew = routing table after the update
• Rold is deadlock free
• Rnew is deadlock free
• Rold+Rnew may deadlock
[Figure: NW reconfiguration on switches 0-6 and their links, showing the paths of Rold and Rnew]
Network Reconfiguration
• Rold is deadlock free
• Rnew is deadlock free
• Rold+Rnew may cause deadlock
[Figure: Rold and Rnew on switches 0-5 during reconfiguration; labels: Deadlock, Old behind new, New behind old]
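Why Rold+Rnew may deadlock can be illustrated with channel dependency graphs: an acyclic dependency graph is a standard sufficient condition for deadlock freedom, yet the union of two acyclic graphs can contain a cycle. The Python sketch below uses a made-up ring of four channels; the channel names `c0` to `c3` and both dependency sets are hypothetical and are not the example drawn on the slide.

```python
def has_cycle(deps):
    """Return True if the channel dependency graph contains a cycle (DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in deps.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True            # back edge: a cycle exists
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(deps))

def union(a, b):
    """Dependencies present while packets of both routing tables coexist."""
    merged = {k: set(v) for k, v in a.items()}
    for k, v in b.items():
        merged.setdefault(k, set()).update(v)
    return merged

r_old = {"c0": {"c1"}, "c1": {"c2"}}   # c0 -> c1 -> c2
r_new = {"c2": {"c3"}, "c3": {"c0"}}   # c2 -> c3 -> c0

print(has_cycle(r_old), has_cycle(r_new), has_cycle(union(r_old, r_new)))
```

Each table alone is acyclic, but packets routed by both at once close the ring c0, c1, c2, c3, c0.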
Existing NW reconfiguration techniques on fault-tolerant networks
• Static reconfiguration: STATIC RECONFIGURATION (ST)
– Traffic is stopped
– New routing is applied
– Traffic is resumed
– High latencies
• Dynamic reconfiguration: DOUBLE SCHEME, SIMPLE RECONFIGURATION
– Traffic is not stopped
– Old and new routing coexist
– Difficulty to avoid deadlock
Current NW Reconfigurations
– SR-PDA: Simple Reconfiguration, Packet Dropping Aware [Lysne08, IEEE TC]
• Tokens are sent before the update of routing tables
• Packets are sent after the routing tables are updated
– SR-LA: Simple Reconfiguration, Latency Aware [Lysne08, IEEE TC]
• All new tables are distributed before the new ones are used
• Latency due to the tokens is reduced
– DS: Double Scheme [Pinkston03, TPDS]
• Requires 2 virtual channels
• One channel has to be drained
– ST: Static Reconfiguration
• Traffic injection is completely stopped
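The two Double Scheme properties listed above (two virtual channels, one of which is drained) can be pictured with a toy bookkeeping model. This is a sketch under simplifying assumptions, not the published algorithm: the class `TwoVcModel`, its methods, and the packet counts are hypothetical, and real switches track buffers per port rather than one global counter.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TwoVcModel:
    """Toy packet counters for a path update that uses two virtual channels."""
    in_flight: dict = field(default_factory=lambda: {0: 0, 1: 0})
    inject_vc: int = 0                  # VC (and routing table) used by new packets
    draining_vc: Optional[int] = None   # VC still carrying packets routed with Rold

    def inject(self, n=1):
        self.in_flight[self.inject_vc] += n       # injection is never stopped

    def deliver(self, vc, n=1):
        self.in_flight[vc] = max(0, self.in_flight[vc] - n)

    def start_update(self):
        self.draining_vc = self.inject_vc         # old packets stay on their VC
        self.inject_vc = 1 - self.inject_vc       # new packets use Rnew on the other VC

    def update_done(self):
        return self.draining_vc is not None and self.in_flight[self.draining_vc] == 0

net = TwoVcModel()
net.inject(3)        # routed with Rold on VC 0
net.start_update()
net.inject(2)        # routed with Rnew on VC 1 while VC 0 drains
net.deliver(0, 3)    # the last Rold packets leave the network
print(net.in_flight, net.update_done())
```

Because old and new packets never share a virtual channel in this model, their dependencies stay separate, while injection continues throughout the update (unlike ST).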
Simulation Environment
• Switch model (InfiniBand)
– Buffered input (1KB per VL) and output (1KB per VL) ports
– Non-multiplexed crossbar with separate ports per VL
– FIFO-based crossbar arbiter per output crossbar port
– Round-robin arbiter per output port
– 100 ns routing time
• Link model
– Link speed = 2.5 Gbps (1X links)
• Topologies
– 2D mesh networks
• Traffic model
– Packet lengths are 58 bytes
– Uniform
– Full range of traffic, from low load to saturation
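To give a feel for the scale of these parameters, the following Python sketch derives two quantities from them: the time to serialize one packet onto a link and the number of whole packets that fit in one per-VL buffer. It assumes that 2.5 Gbps is the usable bit rate (encoding overhead is ignored) and that 1KB means 1024 bytes; neither detail is stated on the slide.

```python
LINK_BPS = 2.5e9       # link speed from the slide (1X link)
PACKET_BYTES = 58      # packet length from the slide
BUFFER_BYTES = 1024    # 1KB per VL, assuming 1KB = 1024 bytes
ROUTING_NS = 100       # routing time from the slide

# Time to push one packet's bits onto the link, in nanoseconds.
serialization_ns = PACKET_BYTES * 8 / LINK_BPS * 1e9

# Whole packets that fit in one per-VL input or output buffer.
packets_per_buffer = BUFFER_BYTES // PACKET_BYTES

print(f"serialization: {serialization_ns:.1f} ns per packet")
print(f"buffer holds {packets_per_buffer} packets per VL")
print(f"routing time is {ROUTING_NS / serialization_ns:.2f}x the serialization time")
```

Under these assumptions the routing time is roughly half of the per-packet serialization time.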
Evaluation Results
We apply the NW reconfiguration process twice in each execution:
• Deactivating links, after decreasing the traffic injection
• Re-activating links, after increasing the traffic injection
We evaluated the full range of initial traffic injection (from low traffic to near congestion)
Static Reconfiguration (ST)
[Figure: results for (a) Low Traffic Load and (b) High Traffic Load. Traffic load decreases and a link is deactivated; traffic load increases and a link is reactivated. Latency is high at both events.]
At each on/off link operation, traffic is not stabilized in ST!!
SR-LA (dynamic reconfiguration)
[Figure: results for (a) Low Traffic Load and (b) High Traffic Load]
Also, at each on/off link operation, traffic is not stabilized in SR-LA!!
SR-PDA (dynamic reconfiguration)
[Figure: results for (a) Low Traffic Load and (b) High Traffic Load]
Also, at each on/off link operation, traffic is not stabilized in SR-PDA!!
Double Scheme (dynamic reconfiguration)
[Figure: results for (a) Low Traffic Load and (b) High Traffic Load. Latency is constant both when traffic load decreases and when it increases.]
Stabilizing the path update only in Double Scheme!!
Larger Network (8x8 Mesh)
[Figure: results on the 8x8 mesh; legend: DS, ST, SRL]
Similar behavior!!
Only Double Scheme stabilizes networks during the path update!!
Conclusions
• We apply network reconfiguration techniques to power-aware on/off networks for HPC
– Links consume ~63% of switch power
• On/off link activation reduces power
• It must accept the topology change
– Network reconfiguration smoothly supports the path update
» Stabilizing the update of new/old paths
» Avoiding deadlocks of new/old paths
• Cycle-accurate simulation
– shows its impact on the power-aware on/off networks
• Double Scheme (dynamic NW reconf) maintains performance, stabilizes the network, and avoids deadlocks
• Network reconfiguration is essential for realizing power-aware on/off networks for HPC systems