Michihiro Koibuchi (NII, Japan), Tomohiro Otsuka (Keio U, Japan), Hiroki Matsutani (U of Tokyo, Japan), Hideharu Amano (Keio U / NII, Japan)
An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters
HPC PC Clusters with Ethernet
• Host/CPU
– Various low-power techniques are used
  • DVFS
  • Power Gating
• Ethernet Switch
– Always active, waiting for packet injection
We propose and evaluate a low-power technique for Ethernet switches in PC clusters
[Figure: PC and Ethernet switch]
Interconnect share in the TOP500 (Nov 2008): Gigabit Ethernet (GbE) 56%
Outline
• Ethernet for HPC
– Link aggregation (channel group) + multi-paths
• On/Off link activation method
• Evaluations
– Overhead of On/Off link operation
– Performance and power consumption of PC clusters
Ethernet on HPC systems
• Increasing the number of ports of GbE switches
- 24/48-port switches provide the lowest cost per port
• Improving the computation power of hosts (> 10 GFlops)
• Link aggregation [IEEE 802.3ad] + multi-path topology [Kudoh, IEEE Cluster, 2004][Viking, Infocom2004]
- drastically increasing the number of links
[Figure: switches and hosts 0-15 connected by TREE 1 to TREE 4; link aggregation using 3 links, 4 paths]
Power cons of GbE switches
• Power cons is almost constant regardless of traffic load
• # of activated ports dominates the power cons of switches
– Power cons of a port is reduced to ZERO by the port-shutdown operation

Product   Port   Other (Xbar)   Total (ratio of ports)
PC5324    1.2    14.9           42.9 (65%)
PC6224    2.0    42.5           91.1 (53%)
PC6248    2.1    56.8           155.2 (63%)
SF-420    1.0    32.6           55.4 (41%)
C-3750    1.8    84.5           127.7 (34%)
Unit: W
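The "ratio of ports" column can be reproduced from the other two power columns. The following is a minimal sketch, assuming the port share is (total − other) / total and that each port that is shut down saves exactly the per-port figure (a linear model). The names `SWITCH_POWER_W`, `port_share`, and `power_with_ports_off` are illustrative and do not come from the slides.

```python
# Measured values from the table, in watts: (power per port, other parts, total).
SWITCH_POWER_W = {
    "PC5324": (1.2, 14.9, 42.9),
    "PC6224": (2.0, 42.5, 91.1),
    "PC6248": (2.1, 56.8, 155.2),
    "SF-420": (1.0, 32.6, 55.4),
    "C-3750": (1.8, 84.5, 127.7),
}


def port_share(other_w: float, total_w: float) -> float:
    """Fraction of the total switch power that is drawn by the ports."""
    return (total_w - other_w) / total_w


def power_with_ports_off(per_port_w: float, total_w: float, ports_off: int) -> float:
    """Estimated switch power after shutting down `ports_off` ports.

    Assumption: every port that is shut down saves exactly `per_port_w` watts.
    """
    return total_w - per_port_w * ports_off


for name, (per_port, other, total) in SWITCH_POWER_W.items():
    print(f"{name}: ports draw {port_share(other, total):.0%} of {total} W")
```

Under this model, shutting down 10 ports of a PC6248 would lower its draw from 155.2 W to about 134.2 W.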
Overview of the on/off link method
• Network load is not always high (e.g. during computation time)
• Switch ports consume 40-60% of the total power
[Figure: switches and nodes 0-15 connected by TREE 1 to TREE 4; when the traffic load becomes low, a part of the links is turned off]
A framework of the on/off link method
• (1) Traffic monitoring (e.g. port monitor, IPTraf, pilot execution)
• (2) Do low- or high-load links appear?
– No: keep monitoring
– Yes: go to (3)
• (3) Selection of on/off links and paths
• (4) Update of on/off link operation
– Very crucial factor: how is it implemented on Ethernet?
[Figure: hosts 0-15 with TREE 1 to TREE 4; low traffic load is detected; paths before and after the "before" path is deactivated]
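One pass of this framework can be sketched as a small decision function. This is only an illustration: the thresholds `LOW_LOAD` and `HIGH_LOAD`, the link names, and the shape of the returned plan are assumptions, since the slides do not give concrete values.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]  # (switch, switch) pair; naming is hypothetical

# Hypothetical thresholds, as a fraction of link capacity (not from the slides).
LOW_LOAD = 0.2
HIGH_LOAD = 0.8


def classify_links(utilization: Dict[Link, float]) -> Tuple[List[Link], List[Link]]:
    """Split monitored links into low-load and high-load lists."""
    low = sorted(link for link, u in utilization.items() if u < LOW_LOAD)
    high = sorted(link for link, u in utilization.items() if u > HIGH_LOAD)
    return low, high


def control_step(utilization: Dict[Link, float]) -> Optional[Dict[str, List[Link]]]:
    """One pass of the loop: monitor, test, and select.

    Returns None on the "No" branch (nothing to change, keep monitoring).
    Otherwise returns the links that are candidates to be turned off and the
    links that need extra capacity; applying the plan is the separate
    "update" step.
    """
    low, high = classify_links(utilization)
    if not low and not high:
        return None
    return {"candidates_off": low, "overloaded": high}


plan = control_step({("sw0", "sw1"): 0.1, ("sw1", "sw2"): 0.9, ("sw2", "sw3"): 0.5})
print(plan)
```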
Requirements for the on/off link method
To achieve a practical on/off link activation method:
• No update of the MPI communication library
• Using existing functions of commercial switches
• Hiding the overhead to activate the link
• Stabilizing the MAC address tables while paths are updated
- Avoiding broadcast storms and communication interruption
[Figure: switches and hosts 0-15 with TREE 1 to TREE 4, before and after]
Changing the paths for on/off link op
• Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
– Specifying the path by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
– Each host sends and receives usual (untagged) frames
  • When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
  • When the frame leaves for a host, the switch removes the VLAN tag
[Figure: hosts 0-7; the path of PVID #v0 on VLAN v0 and the path of PVID #v1 on VLAN v1; VLAN tag #v0 is attached to the frame]
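The tagging behaviour at the edge ports can be sketched as follows. This is a toy model, not switch firmware: the `Frame` class and the functions `ingress_from_host` and `egress_to_host` are hypothetical names used only to show that hosts see untagged frames, while the tag selects the path inside the network.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None  # None means an ordinary untagged frame


def ingress_from_host(frame: Frame, pvid: int) -> Frame:
    """Switch port facing a host: attach the port's PVID as the VLAN tag."""
    return replace(frame, vlan=pvid)


def egress_to_host(frame: Frame) -> Frame:
    """Switch port facing a host: strip the VLAN tag before delivery."""
    return replace(frame, vlan=None)


sent = Frame(src="host0", dst="host7")
inside = ingress_from_host(sent, pvid=1)  # forwarded along the path set of VLAN 1
received = egress_to_host(inside)
print(inside, received)
```

Because only the PVID of the host-facing port decides which VLAN a frame travels on, the path can be changed without touching the hosts or the MPI library.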
When a deactivated link is activated (when the traffic increases)
• (1) Activating the target link
– Using the no-shutdown command of the switch
• (2) Creating VLAN v0 for the new path set that includes the target link, and building its MAC address table
• (3) Updating the PVIDs of the ports connecting hosts to v0
[Figure: hosts 0-7. Before; Steps 1, 2: link on, VLAN v0; Step 3: updating the PVID to v0]
When an activated link is deactivated (when the traffic decreases)
• (1) Creating VLAN v1 for the new path set that avoids the target link, and building its MAC address table
• (2) Updating the PVIDs of the ports connecting hosts to v1
• (3) Deactivating the link
[Figure: hosts 0-7. Before: the path of PVID v0; Steps 1, 2: the path of PVID v1, PVID updated from v0 to v1; Step 3: deactivating the link]
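The order of the three steps matters in both directions: the VLAN used by the hosts must never cross a link that is down. The sketch below checks that invariant on a toy model. The `Fabric` class and its methods are hypothetical and stand in for the switch commands (no-shutdown, shutdown, VLAN creation, PVID update) named on the slides.

```python
class Fabric:
    """Toy model: active links, the links used by each VLAN, and the host-port PVID."""

    def __init__(self, links_up, vlans, pvid):
        self.links_up = set(links_up)
        self.vlans = {vid: set(links) for vid, links in vlans.items()}
        self.pvid = pvid
        self.check()

    def check(self):
        # Invariant: the VLAN currently used by hosts only crosses active links.
        missing = self.vlans[self.pvid] - self.links_up
        if missing:
            raise RuntimeError(f"traffic would hit inactive links: {sorted(missing)}")

    def link_on(self, link):
        self.links_up.add(link)
        self.check()

    def link_off(self, link):
        self.links_up.discard(link)
        self.check()

    def create_vlan(self, vid, links):
        self.vlans[vid] = set(links)
        self.check()

    def set_pvid(self, vid):
        self.pvid = vid
        self.check()


def activate(fabric, link, new_vid, new_links):
    fabric.link_on(link)                     # (1) activate the target link
    fabric.create_vlan(new_vid, new_links)   # (2) VLAN for the path set using it
    fabric.set_pvid(new_vid)                 # (3) move host ports to the new VLAN


def deactivate(fabric, link, new_vid, new_links):
    fabric.create_vlan(new_vid, new_links)   # (1) VLAN for the path set avoiding it
    fabric.set_pvid(new_vid)                 # (2) move host ports to the new VLAN
    fabric.link_off(link)                    # (3) only now shut the link down


fabric = Fabric(links_up={"A", "B"}, vlans={0: {"A", "B"}}, pvid=0)
deactivate(fabric, "B", new_vid=1, new_links={"A"})
activate(fabric, "B", new_vid=0, new_links={"A", "B"})
```

In this model, shutting the link down before the PVID update raises an error, which corresponds to the communication interruption the method is designed to avoid.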
Outline
• Ethernet for HPC
– Link aggregation (channel group) + multi-paths
• On/Off link activation method
• Evaluations
– Overhead of On/Off link operation
  • On/off link operation
  • Overhead to modify the path set
– Performance and power consumption of PC clusters
Fund. eval: On/Off overhead
• Switches: Dell 5324, 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports)
– We can buy them for $1,000-3,000
• A link is operated continuously: on → off → on
• When STP is enabled, the overhead grows to between some tens of seconds and 1 min
• To hide this overhead, paths should be updated after the on/off operation completes

Product   On/Off link op. (sec)
PC5324    4.0
PC6224    3.4
PC6248    2.2
SF-420    12.0
Fund. eval (2): overhead to update paths
• Measuring the overhead to change paths using VLANs
• Communication is not interrupted!!
– Enabling runtime on/off link activation

Product   Path update (sec)
PC5324    0
PC6224    0
PC6248    0
SF-420    0
[Figure: before and after updating the PVID to v1, VLAN v0 and VLAN v1]
Performance evaluation on a PC cluster
• PC Cluster
– 128 hosts, Dual Opteron 1.8GHz x2
– MPICH 1.2.7p1
• GbE switch
– Dell PowerConnect 6248
  • 28 hosts per switch
  • 48 ports × 8 switches
• Application
– NPB 3.2
Topology of the cluster
• Peak: 4×2 torus, 6 links between switches
– Enabling link aggregation (IEEE 802.3ad)
• Pre-executing the applications to estimate the traffic amount
– Setting up the on/off link set before execution
• Two on/off link selection algorithms
– Conservative: maintains the maximum amount of traffic on a link
– Aggressive: further power reduction (details are in the proceedings)
[Figure: torus]
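One possible reading of the conservative policy is sketched below; the actual algorithms are described in the proceedings, so everything here is an assumption. The sketch assumes that the pre-execution gives a traffic estimate per aggregated switch-to-switch group, that traffic is spread evenly over the active links of a group, and that at least one link per group stays on. The function name and group labels are hypothetical.

```python
import math
from typing import Dict


def conservative_active_links(group_traffic: Dict[str, float],
                              links_per_group: int = 6) -> Dict[str, int]:
    """Number of links to keep on in each aggregated group.

    Links are turned off only while the per-link traffic of the group stays
    at or below the busiest per-link traffic of the peak (all links on)
    configuration. Expects a non-empty mapping.
    """
    peak_max = max(group_traffic.values()) / links_per_group
    plan = {}
    for group, traffic in group_traffic.items():
        needed = math.ceil(traffic / peak_max) if peak_max > 0 else 1
        # Keep at least one link for connectivity, never more than exist.
        plan[group] = min(links_per_group, max(1, needed))
    return plan


# Hypothetical traffic estimates from a pilot execution (arbitrary units).
estimate = {"sw0-sw1": 600, "sw1-sw2": 250, "sw2-sw3": 0}
print(conservative_active_links(estimate))
```

With these made-up estimates the busiest group keeps all 6 links, the second keeps 3, and the idle one keeps 1, so no link carries more traffic than the busiest link did with everything on.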
Results of NPB (64 procs, PC6248 SW)
[Fig 1: Performance. Relative Mop/s for EP, IS, LU, SP under peak (all links), conservative, and aggressive]
[Fig 2: Power cons of NWs, PC6248s. Relative power cons for EP, IS, LU, SP under peak (all links), conservative, and aggressive]
(The plots are annotated with the number of off links.)
• 26% of NW power cons is reduced w/o performance degradation
• The conservative policy maintained almost the peak performance
Results of NPB (64 procs, other SWs)
[Fig 3: Power cons, SF-420s. Relative power cons for EP, IS, LU, SP under peak (all links), conservative, and aggressive]
[Fig 4: Power cons, PC5324. Relative power cons for EP, IS, LU, SP under peak (all links), conservative, and aggressive]
• 37% of power reduction
• The L2 switches reduce a larger ratio of power cons
• Fewer services are always running in the L2 switch (PC5324) than in the L3 switch (PC6248)
Related Work
• On/Off interconnection networks
– M. Alonso [IPDPS05], V. Soteriou [TPDS07]
– Cannot be directly applied to Ethernet
– Our on/off link method makes it possible to support some of them on Ethernet
• DVFS for interconnection networks
– L. Shang [HPCA03], J. M. Stine [CAL04]
– Using multi-speed Ethernet (10M/100M/GbE/10GE) is similar to the DVFS approach
  • Dell switch PC6248, power per port: 10M: 1.1 W, 100M: 1.3 W, GbE: 2.1 W
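The per-port figures quoted for the PC6248 allow a quick comparison between lowering the link speed and shutting the port down. This is a small sketch: the "off" entry assumes the zero-power figure stated for the port-shutdown operation, and the names are illustrative.

```python
# Per-port power of the PC6248 in watts, as quoted for each link speed.
PORT_POWER_W = {"GbE": 2.1, "100M": 1.3, "10M": 1.1}
# Assumption: a port that is shut down draws nothing.
PORT_POWER_W["off"] = 0.0


def saving_vs_gbe(mode: str) -> float:
    """Fraction of the GbE per-port power saved by switching to `mode`."""
    return 1 - PORT_POWER_W[mode] / PORT_POWER_W["GbE"]


for mode in ("100M", "10M", "off"):
    print(f"{mode}: saves {saving_vs_gbe(mode):.0%} of the GbE port power")
```

Under these figures, dropping to 10M saves a little under half of the port power, while shutting the port down saves all of it.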
Conclusions
• We propose the on/off link method on Ethernet
– Using the port-shutdown command to reduce power cons
  • Switch ports consume up to 60% of the power cons of a GbE switch
– Stabilizing the update of the MAC address table
• Evaluations on the PC cluster with GbE switches
– No overhead to update paths
– Reducing NW power cons by up to 37%
• We will provide a total solution of Ethernet for low-power PC clusters
– Link aggr. + multi-path topology + on/off links