Network Measurement & Characterisation and the Challenge of SuperComputing SC200x
Transcript of "Network Measurement & Characterisation and the Challenge of SuperComputing SC200x"
ESLEA Bedfont Lakes Dec 04, Richard Hughes-Jones
The SC Network
Working with S2io, Cisco & folks
At the SLAC Booth, running the BW Challenge
Bandwidth Lust at SC2003
The Bandwidth Challenge at SC2003
The peak aggregate bandwidth from the 3 booths was 23.21 Gbits/s
1-way link utilisations of >90%
6.6 TBytes transferred in 48 minutes
Multi-Gigabit flows at SC2003 BW Challenge
Three server systems with 10 Gigabit Ethernet NICs
Used the DataTAG altAIMD stack, 9000 byte MTU
Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
Palo Alto PAIX: rtt 17 ms, window 30 MB; shared with the Caltech booth
4.37 Gbit HighSpeed TCP I=5%, then 2.87 Gbit I=16% (fall when 10 Gbit on link)
3.3 Gbit Scalable TCP I=8%; tested 2 flows, sum 1.9 Gbit I=39%
Chicago Starlight: rtt 65 ms, window 60 MB, Phoenix CPU 2.2 GHz
3.1 Gbit HighSpeed TCP I=1.6%
Amsterdam SARA: rtt 175 ms, window 200 MB, Phoenix CPU 2.2 GHz
4.35 Gbit HighSpeed TCP I=6.9%, very stable
Both used Abilene to Chicago
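The window sizes above track the bandwidth-delay product of each path. A minimal sketch, using the RTTs quoted in the slide and an assumed 10 Gbit/s target rate:

```python
# TCP bandwidth-delay product: the window needed to keep a path full.
def bdp_bytes(rate_bits_per_s: float, rtt_s: float) -> float:
    """Minimum TCP window (bytes) to saturate the path."""
    return rate_bits_per_s * rtt_s / 8

# RTTs from the slide; 10 Gbit/s is an assumed target rate.
paths = {"Palo Alto PAIX": 0.017, "Chicago Starlight": 0.065, "Amsterdam SARA": 0.175}
for name, rtt in paths.items():
    print(f"{name}: {bdp_bytes(10e9, rtt) / 2**20:.0f} MB window for 10 Gbit/s")
```

The computed minima (roughly 20, 78 and 209 MB) are of the same order as the windows actually configured (30, 60 and 200 MB).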
10 Gbits/s throughput from SC2003 to PAIX
[Plot: throughput (Gbit/s, 0 to 10) vs date & time, 19 Nov 2003 15:59 to 17:25. Traces: Router to LA/PAIX, Phoenix-PAIX HS-TCP, Phoenix-PAIX Scalable-TCP, Phoenix-PAIX Scalable-TCP #2]
10 Gbits/s throughput from SC2003 to Chicago & Amsterdam
[Plot: throughput (Gbit/s, 0 to 10) vs date & time, 19 Nov 2003 15:59 to 17:25. Traces: Router traffic to Abilene, Phoenix-Chicago, Phoenix-Amsterdam]
UKLight at SC2004
UK e-Science researchers from Manchester, UCL & ULCC were involved in the Bandwidth Challenge
Collaborated with scientists & engineers from Caltech, CERN, FERMI, SLAC, Starlight, UKERNA & U. of Florida
Worked on:
10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale
10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale
10 Gbit Ethernet link from SC2004 to Chicago and on to UKLight
UKLight focused on disk-to-disk transfers between UK sites and Pittsburgh
UK had generous support from Boston Ltd, who loaned the servers
The BWC collaboration had support from: S2io (NICs), Chelsio (TOE), Sun (who loaned servers)
Essential support from Cisco
SCINet
Collaboration at SC2004: setting up the BW Bunker
The BW Challenge at the SLAC Booth
Working with S2io, Sun, Chelsio
The Bandwidth Challenge – SC2004
The peak aggregate bandwidth from the booths was 101.13 Gbits/s, or 3 full-length DVDs per second
Saturated TEN 10GE waves
SLAC Booth: Sunnyvale to Pittsburgh, LA to Pittsburgh and Chicago to Pittsburgh (with UKLight)
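The "DVDs per second" figure can be sanity-checked against a 4.7 GB single-layer DVD:

```python
# 101.13 Gbit/s expressed in single-layer DVDs (4.7 GB) per second.
rate_bytes_per_s = 101.13e9 / 8
dvds_per_s = rate_bytes_per_s / 4.7e9
print(f"{dvds_per_s:.1f} DVDs per second")  # -> 2.7 DVDs per second
```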
SC2004 UKLIGHT – Focused on Disk-to-Disk
[Network diagram: SC2004 show floor (SLAC Booth and Caltech Booth UltraLight IP behind a Cisco 6509) connected over the NLR lambda NLR-PITT-STAR-10GE-16 to Chicago Starlight; UKlight 10G (four 1GE channels) on to ULCC UKlight, Manchester (MB-NG 7600 OSR) and UCL HEP / UCL network; Surfnet/EuroLink 10G (two 1GE channels) to Amsterdam and the CERN 7600]
Transatlantic Ethernet: TCP Throughput Tests
Supermicro X5DPE-G2 PCs: dual 2.9 GHz Xeon CPUs, FSB 533 MHz
1500 byte MTU, 2.6.6 Linux kernel
Memory-to-memory TCP throughput, standard TCP
Wire rate throughput of 940 Mbit/s
Work in progress to study: implementation detail, advanced stacks, packet loss, sharing
[Two plots: TCP achieved bandwidth (Mbit/s, 0 to 2000) and Cwnd vs time (ms), for the full transfer and for the first 10 s. Traces: InstantaneousBW, AveBW, CurCwnd]
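The 940 Mbit/s "wire rate" figure follows from per-frame overheads on a 1500-byte-MTU Gigabit Ethernet path. A sketch, assuming TCP timestamps are enabled (the Linux 2.6 default):

```python
# User-data rate of a saturated 1 Gbit/s Ethernet link at MTU 1500.
ETH_OVERHEAD = 8 + 14 + 4 + 12        # preamble+SFD, MAC header, FCS, inter-frame gap
IP_TCP_HDRS = 20 + 20 + 12            # IPv4 header, TCP header, timestamp option
MTU = 1500

payload = MTU - IP_TCP_HDRS           # TCP payload bytes per frame
on_wire = MTU + ETH_OVERHEAD          # bytes each frame occupies on the wire
goodput = 1e9 * payload / on_wire
print(f"{goodput / 1e6:.0f} Mbit/s")  # -> 941 Mbit/s, matching the measured 940
```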
Transatlantic Ethernet: Disk-to-Disk Tests
Supermicro X5DPE-G2 PCs: dual 2.9 GHz Xeon CPUs, FSB 533 MHz
1500 byte MTU, 2.6.6 Linux kernel, RAID0 (6 SATA disks)
bbftp (disk-to-disk) throughput, standard TCP
Throughput of 436 Mbit/s
Work in progress to study: throughput limitations; help real users
[Two plots: TCP achieved bandwidth (Mbit/s, 0 to 2000) and Cwnd vs time (ms), for the full transfer and for the first 10 s. Traces: InstantaneousBW, AveBW, CurCwnd]
[Plot sculcc1-chi-2: TCP achievable throughput (Mbit/s, 0 to 1000) vs TCP buffer size (MBytes, 0 to 30). Traces: iperf sender Mbit/s, bbftp Mbit/s]
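The buffer-size scan reflects the basic TCP ceiling: rate cannot exceed window / RTT. A sketch, with an assumed ~120 ms RTT for the transatlantic path (the actual RTT for sculcc1-chi-2 is not given here):

```python
# Throughput ceiling imposed by the TCP buffer (window) size.
def max_rate_mbit(window_mbytes: float, rtt_s: float) -> float:
    return window_mbytes * 2**20 * 8 / rtt_s / 1e6

RTT = 0.120  # assumed RTT, not from the slide
for w in (1, 5, 10, 20, 30):
    print(f"{w:2d} MB buffer -> {max_rate_mbit(w, RTT):6.0f} Mbit/s ceiling")
```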
10 Gigabit Ethernet: UDP Throughput Tests
1500 byte MTU gives ~2 Gbit/s; used 16144 byte MTU, max user length 16080
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire rate of 5.7 Gbit/s
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes; wire rate of 5.4 Gbit/s
[Plot an-al 10GE Xsum 512kbuf MTU16114 27Oct03: received wire rate (Mbit/s, 0 to 6000) vs spacing between frames (µs, 0 to 40), for packet sizes from 1472 to 16080 bytes]
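The x-axis of the spacing plot sets the offered rate: each frame contributes frame_bits / spacing until the spacing falls below the frame's serialisation time. A sketch with assumed UDP/IPv4/Ethernet per-packet overheads:

```python
# Offered UDP wire rate as a function of inter-frame spacing.
def wire_rate_mbit(user_bytes: int, spacing_us: float, link_gbit: float = 10) -> float:
    frame = user_bytes + 8 + 20 + 38           # UDP + IPv4 headers + Ethernet framing (assumed)
    tx_us = frame * 8 / (link_gbit * 1e3)      # serialisation time of one frame
    return frame * 8 / max(tx_us, spacing_us)  # Mbit/s offered to the wire

print(f"{wire_rate_mbit(16080, 20):.0f} Mbit/s")  # 16080-byte packets every 20 us
```

The measured curves saturate well below this offered rate, which is the point of the plot: the host and PCI-X bus, not the wire, set the ceiling.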
10 Gigabit Ethernet: Tuning PCI-X
16080 byte packets every 200 µs, Intel PRO/10GbE LR Adapter
PCI-X bus occupancy vs mmrbc
Measured times, and times based on PCI-X timings from the logic analyser
Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Plot: PCI-X transfer time (µs, 0 to 50) and PCI-X transfer rate (Gbit/s, 0 to 9) vs Max Memory Read Byte Count (0 to 5000). Traces: measured PCI-X transfer time, expected time, rate from expected time, max PCI-X throughput]
[Plot Kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04: PCI-X transfer time vs Max Memory Read Byte Count, with measured rate and rate from expected time (Gbit/s), for mmrbc = 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at mmrbc 4096). PCI-X sequence: CSR access, data transfer, interrupt & CSR update]
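The mmrbc effect can be modelled simply: each PCI-X memory-read burst moves at most mmrbc bytes and pays a fixed per-transaction cost, so larger bursts waste fewer bus cycles. The per-burst overhead below is an illustrative assumption, not a value measured in the talk:

```python
import math

BUS_HZ = 133e6        # 133 MHz, 64-bit PCI-X: 8 bytes per data cycle
OVERHEAD_CYCLES = 30  # assumed fixed cost per burst (arbitration, addressing)

def transfer_rate_gbit(packet_bytes: int, mmrbc: int) -> float:
    bursts = math.ceil(packet_bytes / mmrbc)          # read bursts per packet
    cycles = packet_bytes / 8 + bursts * OVERHEAD_CYCLES
    return packet_bytes * 8 * BUS_HZ / cycles / 1e9   # Gbit/s

for mmrbc in (512, 1024, 2048, 4096):
    print(f"mmrbc {mmrbc:4d}: {transfer_rate_gbit(16080, mmrbc):.1f} Gbit/s")
```

With these assumptions the rate for a 16080-byte packet climbs from roughly 5.8 Gbit/s at mmrbc 512 towards 8 Gbit/s at 4096, showing the trend in the plots; real transfers pay further overheads this toy model omits.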
10 Gigabit Ethernet: SC2004 TCP Tests
Sun AMD Opteron compute servers (v20z), Chelsio TOE; tests between Linux 2.6.6 hosts
10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale: two 2.4 GHz AMD 64-bit Opteron processors with 4 GB of RAM at SC2004, 1500B MTU, all Linux 2.6.6; 9.43 Gbit/s in one direction (9.07 Gbit/s goodput) and 5.65 Gbit/s in the reverse direction (5.44 Gbit/s goodput); total of 15+ Gbit/s on the wire
10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale: one 2.4 GHz AMD 64-bit Opteron at each end, 2 MByte window, 16 streams, 1500B MTU, all Linux 2.6.6; 7.72 Gbit/s in one direction (7.42 Gbit/s goodput) for 120 mins (6.6 TBytes shipped)
S2io NICs with Solaris 10 in a 4×2.2 GHz Opteron v40z, to one or more S2io or Chelsio NICs with Linux 2.6.5 or 2.6.6 in 2×2.4 GHz v20zs: LAN 1, S2io NIC back-to-back, 7.46 Gbit/s; LAN 2, S2io in the v40z to 2 v20zs, each NIC ~6 Gbit/s, total 12.08 Gbit/s
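The wire-rate/goodput pairs quoted above are consistent with simple framing arithmetic: at 1500-byte MTU each 1518-byte frame (MTU plus 14-byte MAC header and 4-byte FCS) carries 1460 bytes of TCP payload, assuming no TCP options:

```python
# Goodput implied by a quoted on-the-wire rate at MTU 1500.
def goodput_gbit(wire_gbit: float) -> float:
    return wire_gbit * 1460 / 1518

for wire in (9.43, 5.65, 7.72):
    print(f"{wire} Gbit/s wire rate -> {goodput_gbit(wire):.2f} Gbit/s goodput")
```

This closely reproduces the quoted 9.07, 5.44 and 7.42 Gbit/s goodput figures.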
UKLight and ESLEA
Collaboration forming for SC2005: Caltech, CERN, FERMI, SLAC, Starlight, UKLight, …
Current proposals include:
Bandwidth Challenge with even faster disk-to-disk transfers between UK sites and SC2005
Radio Astronomy demo at 512 Mbit or 1 Gbit user data: Japan, Haystack (MIT), Jodrell Bank, JIVE
High-bandwidth link-up between UK and US HPC systems; 10Gig NLR wave to Seattle
Set up a 10 Gigabit Ethernet test bench; experiments (CALICE) need to investigate >25 Gbit to the processor
ESLEA/UKlight need resources to study: new protocols and congestion/sharing; the interaction between protocol processing, applications and storage; monitoring L1/L2 behaviour in hybrid networks