Michihiro Koibuchi, Takafumi Watanabe, Atsushi Minamihata, Masahiro Nakao,
Tomoyuki Hiroyasu, Hiroki Matsutani, and Hideharu Amano
Performance Evaluation of Power-aware Multi-tree
Ethernet for HPC Interconnects
HPC PC Clusters with Ethernet
• Host/CPU
– Various low-power techniques are used (DVFS, power gating)
• Ethernet switch
– Always active, ready for packet injection
• Interconnect share in the TOP500 (Nov. 2011): Gigabit Ethernet (GbE), 45%
We evaluate our power-aware on/off link activation for Ethernet on PC clusters
(Figure labels: PC, Ethernet switch)
Outline
• Ethernet for HPC
– Link aggregation (channel group) + multi-paths
• Our on/off link activation method
• Evaluations
– Performance and power consumption of PC clusters
Ethernet on HPC systems
• Increasing the number of ports of GbE switches
– 24/48-port switches provide the lowest cost per port
• Improving the computation power of hosts (> 10 GFlops)
• Link aggregation [IEEE 802.3ad] + multi-path topology [Kudoh, IEEE Cluster, 2004][Viking, Infocom2004][Koibuchi et al, IEEE TPDS2011]
– Drastically increasing the number of links
(Figure: switches and hosts 0-7; link aggregation using 2 links; 2 paths)
Power consumption of GbE switches
• Power consumption is almost constant regardless of traffic load
• The number of activated ports dominates the power consumption of switches
– The power consumption of a port is reduced to zero by the port-shutdown operation

Product   Port   Other (Xbar)   Total (ratio of ports)
PC5324    1.2    14.9           42.9 (65%)
PC6224    2.0    42.5           91.1 (53%)
PC6248    2.1    56.8           155.2 (63%)
SF-420    1.0    32.6           55.4 (41%)
C-3750    1.8    84.5           127.7 (34%)
Unit: W
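The "ratio of ports" column appears to follow from the other two measured columns, assuming the ports account for everything in Total that is not in Other (Xbar). A minimal Python sketch under that assumption; the dictionary and function names are illustrative only:

```python
# Measured power in watts, copied from the table: (other, total).
SWITCH_POWER_W = {
    "PC5324": (14.9, 42.9),
    "PC6224": (42.5, 91.1),
    "PC6248": (56.8, 155.2),
    "SF-420": (32.6, 55.4),
    "C-3750": (84.5, 127.7),
}


def port_share_percent(other_w: float, total_w: float) -> int:
    """Share of total power drawn by the ports, rounded to a whole percent.

    Assumption: port power = total - other (Xbar and the rest).
    """
    return round(100.0 * (total_w - other_w) / total_w)


shares = {name: port_share_percent(*watts) for name, watts in SWITCH_POWER_W.items()}
```

Under this reading, `shares` matches the percentages listed in the Total column for all five products.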
Overview of the on/off link method
• Network load is not always high (e.g., during computation time)
• Switch ports consume 40-60% of the total power
• When the traffic load becomes low, a part of the links is turned off
(Figure: switches and hosts 0-7, before and after turning off a part of the links)
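Because a shut-down port is reported to draw no power, a first-order estimate of the saving is the number of ports turned off times the per-port power. The sketch below uses the PC6248 figures (2.1 W in the Port column, 155.2 W total); reading the Port column as watts per port, the linear model, and the function names are all assumptions rather than part of the original evaluation.

```python
def estimated_saving_w(ports_off: int, per_port_w: float) -> float:
    """Power saved by shutting down `ports_off` ports (linear model, an assumption)."""
    if ports_off < 0:
        raise ValueError("ports_off must be non-negative")
    return ports_off * per_port_w


def saving_fraction(ports_off: int, per_port_w: float, total_w: float) -> float:
    """Saving as a fraction of the switch's total power with all ports active."""
    return estimated_saving_w(ports_off, per_port_w) / total_w


# Example: turning off 16 ports of a PC6248 (2.1 W per port, 155.2 W total).
saved = estimated_saving_w(16, 2.1)
fraction = saving_fraction(16, 2.1, 155.2)
```

With these inputs the estimate is a little over one fifth of the switch's total power.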
A framework of the on/off link method
• (1) Traffic monitoring (e.g., port monitor, IPTraf, pilot execution)
• (2) Do low- or high-load links appear? If no, return to (1); if yes, continue
• (3) Selection of on/off links and paths
• (4) Update of on/off link operation (how is it implemented on Ethernet?)
(Figure: paths before and after the traffic load becomes low; the "before" path is deactivated)
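One pass through the monitoring and decision steps of this framework could look like the following sketch. The utilization thresholds, the `LinkStats` type, the link names, and the function names are hypothetical; the slides do not specify how low- or high-load links are detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkStats:
    name: str
    utilization: float  # fraction of link capacity in use, 0.0 to 1.0
    active: bool


def classify_links(stats, low=0.1, high=0.8):
    """Return (links to consider turning off, links that need more capacity).

    `low` and `high` are hypothetical thresholds chosen for illustration.
    """
    low_load = [s.name for s in stats if s.active and s.utilization < low]
    high_load = [s.name for s in stats if s.active and s.utilization > high]
    return low_load, high_load


def needs_update(stats, low=0.1, high=0.8):
    """Decision step: do low- or high-load links appear?"""
    low_load, high_load = classify_links(stats, low, high)
    return bool(low_load or high_load)


sample = [
    LinkStats("sw0-sw1#0", 0.02, True),
    LinkStats("sw0-sw1#1", 0.45, True),
    LinkStats("sw1-sw2#0", 0.93, True),
    LinkStats("sw1-sw2#1", 0.0, False),
]
low_load, high_load = classify_links(sample)
```

Inactive links are ignored by the classifier, since they carry no traffic to measure.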
Requirements for the on/off link method
• No update of the MPI communication library
• Hide the overhead of activating a link
• Keep the MAC address tables stable while paths are being updated
(Figure: switches and hosts 0-7, before and after the path update)
Changing the paths for on/off link operation
• Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
– The path is specified by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
– Each host sends and receives usual (untagged) frames
• When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
• When the frame leaves for a host, the switch removes the VLAN tag
(Figure: hosts 0-7; the path of PVID v0 on VLAN v0 and the path of PVID v1 on VLAN v1; VLAN tag v0 is attached)
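The tagging rule can be modelled in a few lines. This is a toy model, not switch firmware: the `Frame` type, the port name, the numeric VLAN IDs, and the `EdgeSwitch` class are made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None  # None means an ordinary untagged frame


class EdgeSwitch:
    """Toy model of switch-side tagging on host-facing ports."""

    def __init__(self, pvid_by_port):
        self.pvid_by_port = dict(pvid_by_port)  # host-facing port -> PVID

    def ingress_from_host(self, port: str, frame: Frame) -> Frame:
        # Frames from hosts arrive untagged; the switch attaches the port's PVID.
        return replace(frame, vlan=self.pvid_by_port[port])

    def egress_to_host(self, frame: Frame) -> Frame:
        # The tag is stripped before delivery, so hosts never see VLAN tags.
        return replace(frame, vlan=None)


V0, V1 = 100, 101  # hypothetical VLAN IDs standing in for v0 and v1
sw = EdgeSwitch({"p1": V0})
sent = Frame(src="host0", dst="host4")
inside = sw.ingress_from_host("p1", sent)
sw.pvid_by_port["p1"] = V1  # switching paths only changes the port's PVID
rerouted = sw.ingress_from_host("p1", sent)
delivered = sw.egress_to_host(rerouted)
```

The host-side frame is identical before and after the PVID change, which is consistent with the requirement that the MPI library needs no update.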
When a deactivated link is activated (when the traffic increases)
• (1) Activate the target link
– Using the no-shutdown command of the switch
• (2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
• (3) Update the PVIDs of the ports that connect hosts to v0
(Figure: before; steps 1 and 2, activate links and set up VLAN v0; step 3, updating the PVID to v0)
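The activation order can be written down as a small state update. The `Network` class below is a hypothetical in-memory stand-in for the switches, and the link and VLAN names are placeholders; a real deployment would issue the equivalent switch commands instead.

```python
class Network:
    """Minimal in-memory state: link status, VLAN path sets, host-port PVID."""

    def __init__(self, links_up, vlans, pvid):
        self.links_up = set(links_up)  # links currently active
        self.vlans = dict(vlans)       # VLAN name -> set of links its paths use
        self.pvid = pvid               # VLAN assigned to the host-facing ports

    def pvid_is_safe(self) -> bool:
        # Host traffic must only be steered onto links that are up.
        return self.vlans[self.pvid] <= self.links_up


def activate_link(net: Network, link: str, new_vlan: str, new_paths: set) -> list:
    """Apply steps (1)-(3) in order, recording the safety check after each step."""
    trace = []
    net.links_up.add(link)                # (1) activate the target link
    trace.append(net.pvid_is_safe())
    net.vlans[new_vlan] = set(new_paths)  # (2) create the VLAN for the new path set
    trace.append(net.pvid_is_safe())
    net.pvid = new_vlan                   # (3) move host-facing ports to the new VLAN
    trace.append(net.pvid_is_safe())
    return trace


net = Network(links_up={"a"}, vlans={"v1": {"a"}}, pvid="v1")
trace = activate_link(net, link="b", new_vlan="v0", new_paths={"a", "b"})
```

Because the PVID is switched only after the link is up and the new VLAN exists, the check holds after every step in this model.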
When an activated link is deactivated (when the traffic decreases)
• (1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
• (2) Update the PVIDs of the ports that connect hosts to v1
• (3) Deactivate the link
(Figure: before, the path of PVID v0; steps 1 and 2, the path of PVID v1; step 3, deactivating the link)
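Why the link is shut down last can be checked with a small experiment: apply the three operations in different orders and test after each one whether host traffic is still steered only onto active links. Everything below (state layout, link names, operation names) is an illustrative assumption.

```python
def run(order):
    """Apply the deactivation operations in `order`; return True if host
    traffic is never steered onto an inactive link at any intermediate step."""
    state = {
        "links_up": {"a", "b"},
        "vlans": {"v0": {"a", "b"}},  # current paths use both links
        "pvid": "v0",
    }
    ops = {
        "create_v1": lambda s: s["vlans"].update(v1={"a"}),  # paths avoiding link b
        "set_pvid_v1": lambda s: s.update(pvid="v1"),
        "shutdown_b": lambda s: s["links_up"].discard("b"),
    }
    for name in order:
        ops[name](state)
        if not state["vlans"][state["pvid"]] <= state["links_up"]:
            return False
    return True


safe = run(["create_v1", "set_pvid_v1", "shutdown_b"])    # order listed on the slide
unsafe = run(["shutdown_b", "create_v1", "set_pvid_v1"])  # link removed first
```

In this model only the listed order keeps traffic off the inactive link; shutting the link down first leaves the hosts on VLAN v0 while one of its links is gone.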
Performance evaluation on a PC cluster
• PC cluster
– 66 hosts, 528 cores
– CPU: Quad-Core AMD Opteron 2.3 GHz
– Memory: DDR2 667 MHz, 8 GB
– NIC & driver: Broadcom BCM95721, Tigon3
– Kernel: 2.6.9-67.0.15.ELsmp
• GbE switch
– Dell PC 6248
– 48port@8
• Application
– NPB 3.2 / HPL (OpenMPI 1.3 / MPICH-1.2.7p1)
(Figure label: Dell PC6248 SW)
Topology of the cluster
• Tree or completely connected graph
– Up to 5 links between switches
• Enabling the link aggregation (IEEE 802.3ad)
• Pre-executing the applications to estimate the traffic amount
– Set up the on/off link set before executing
• Performing our simple link regulation algorithm
(Figure: tree topology and completely (fully) connected topology)
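The slides do not spell out the link regulation algorithm. As a purely illustrative stand-in, the sketch below picks, for each switch pair, the smallest number of aggregated links whose combined capacity covers the traffic estimated in the pre-execution run. The 1000 Mbps link rate corresponds to GbE and the cap of 5 links comes from the topology description; the headroom factor, the function name, and the sample traffic values are assumptions.

```python
import math


def links_needed(peak_mbps: float, max_links: int = 5,
                 link_mbps: float = 1000.0, headroom: float = 0.8) -> int:
    """Smallest link count whose usable capacity covers the estimated peak.

    At least one link stays on so the switch pair remains connected, and
    the count is capped at `max_links`.
    """
    usable_per_link = link_mbps * headroom
    wanted = math.ceil(peak_mbps / usable_per_link)
    return max(1, min(max_links, wanted))


# Hypothetical per-switch-pair peak traffic from a pilot run, in Mbps.
estimated = {"sw0-sw1": 150.0, "sw1-sw2": 2300.0, "sw2-sw3": 9000.0}
plan = {pair: links_needed(mbps) for pair, mbps in estimated.items()}
```

A static plan like this fits the "set up before executing" approach; it does not react to load changes during the run.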
Pre-evaluation (even link removal)
(Figure, panel 1: Synthetic traffic, matrix transpose and bit-reversal; y-axis: throughput (Mbps/host); series: Tree and Compl with 1 to 5 links)
(Figure, panel 2: Linpack (HPL); y-axis: performance (Tflops); series: Tree(1link), Tree(2link), Tree(5link), Compl(1link), Compl(2link), Compl(5link), Ideal; Rmax/Rpeak = 61%)
(Figure, panel 3: NPB, Class C, benchmarks CG, FT, IS, LU, MG, BT, SP; y-axis: relative Mop/s; series: Tree with 1 to 5 links, Compl(1link), Compl(2link), Compl(5link), ideal)
• All the applications lose performance drastically if links are uniformly removed
Performance and Power in HPL
• Rmax/Rpeak = 61%
• Over 20% power reduction with almost the same performance
Performance and Power in NPB64 (Class C)
• Over 25% power reduction with almost the same performance
• IS, LU, BT, SP keep performance
Performance and Power in NPB128 (Class C)
• Over 20% power reduction with almost the same performance
• LU, MG keep performance
Conclusions
• We evaluated our on/off link method on Ethernet
– Multi-tree topologies and link aggregation are enabled
– The port-shutdown command is used to reduce power consumption
• Ports consume up to 60% of switch power
• The method reduces network power by up to 37% in the 528-core PC cluster