Network Measurement & Characterisation and the Challenge of SuperComputing SC200x


Page 1: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x

ESLEA Bedfont Lakes, Dec 04, Richard Hughes-Jones

Network Measurement & Characterisation and the Challenge of SuperComputing SC200x

Page 2: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


The SC Network

Working with S2io, Cisco & folks

At the SLAC Booth: Running the BW Challenge

Bandwidth Lust at SC2003

Page 3: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


The Bandwidth Challenge at SC2003
The peak aggregate bandwidth from the 3 booths was 23.21 Gbit/s
1-way link utilisations of >90%
6.6 TBytes in 48 minutes
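As a quick sanity check on those figures (my own illustrative calculation, not from the slides), 6.6 TBytes in 48 minutes corresponds to an average of roughly 18 Gbit/s, consistent with the 23.21 Gbit/s peak:

```python
# Illustrative check of the SC2003 figures quoted above (not from the slides).
tbytes  = 6.6        # data moved, TBytes
minutes = 48         # duration of the run

bits     = tbytes * 1e12 * 8             # total bits transferred
avg_gbps = bits / (minutes * 60) / 1e9   # average rate in Gbit/s

print(f"average rate ~ {avg_gbps:.1f} Gbit/s (quoted peak: 23.21 Gbit/s)")
# -> average rate ~ 18.3 Gbit/s
```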

Page 4: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


Multi-Gigabit flows at SC2003 BW Challenge
Three server systems with 10 Gigabit Ethernet NICs
Used the DataTAG altAIMD stack, 9000 byte MTU
Sent memory-to-memory iperf TCP streams from the SLAC/FNAL booth in Phoenix to:

Palo Alto PAIX: rtt 17 ms, window 30 MB (shared with Caltech booth)
4.37 Gbit HighSpeed TCP, I=5%; then 2.87 Gbit, I=16%, falling when 10 Gbit on the link
3.3 Gbit Scalable TCP, I=8%; tested 2 flows, sum 1.9 Gbit, I=39%

Chicago Starlight: rtt 65 ms, window 60 MB (Phoenix CPU 2.2 GHz)
3.1 Gbit HighSpeed TCP, I=1.6%

Amsterdam SARA: rtt 175 ms, window 200 MB (Phoenix CPU 2.2 GHz)
4.35 Gbit HighSpeed TCP, I=6.9%, very stable
Both used Abilene to Chicago

The window sizes roughly track the bandwidth-delay product of each path; see the sketch below.
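A minimal bandwidth-delay product (BDP) sketch for the three paths above. The slides give only RTT and window, so the 10 Gbit/s target rate here is an assumption for illustration:

```python
# Rough bandwidth-delay product (BDP) sizing for the three SC2003 paths above.
# The 10 Gbit/s target rate is an assumed value; the slides quote RTT and window only.
paths = {
    "Palo Alto PAIX":    0.017,   # RTT in seconds (17 ms)
    "Chicago Starlight": 0.065,   # 65 ms
    "Amsterdam SARA":    0.175,   # 175 ms
}
target_rate_bps = 10e9            # assume the aim is to fill a 10 Gbit/s path

for name, rtt in paths.items():
    bdp_mbytes = target_rate_bps * rtt / 8 / 1e6   # bytes in flight needed
    print(f"{name}: BDP ~ {bdp_mbytes:.0f} MB")
# Palo Alto ~ 21 MB (30 MB window used), Chicago ~ 81 MB (60 MB used),
# Amsterdam ~ 219 MB (200 MB used)
```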

10 Gbits/s throughput from SC2003 to PAIX

[Plot: throughput (0-10 Gbit/s) vs date & time, 11/19/03 15:59 to 17:25. Traces: Router to LA/PAIX, Phoenix-PAIX HS-TCP, Phoenix-PAIX Scalable-TCP, Phoenix-PAIX Scalable-TCP #2]

10 Gbits/s throughput from SC2003 to Chicago & Amsterdam

[Plot: throughput (0-10 Gbit/s) vs date & time, 11/19/03 15:59 to 17:25. Traces: Router traffic to Abilene, Phoenix-Chicago, Phoenix-Amsterdam]

Page 5: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


Page 6: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


UKLight at SC2004
UK e-Science researchers from Manchester, UCL & ULCC were involved in the Bandwidth Challenge
Collaborated with scientists & engineers from Caltech, CERN, FERMI, SLAC, Starlight, UKERNA & U. of Florida
Worked on:
10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale
10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale
10 Gbit Ethernet link from SC2004 to Chicago and on to UKLight
UKLight focused on disk-to-disk transfers between UK sites and Pittsburgh
UK had generous support from Boston Ltd, who loaned the servers
The BWC collaboration had support from: S2io (NICs), Chelsio (TOE), and Sun, who loaned servers
Essential support from Cisco

Page 7: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


SCINet

Collaboration at SC2004
Setting up the BW Bunker

The BW Challenge at the SLAC Booth

Working with S2io, Sun, Chelsio

Page 8: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


The Bandwidth Challenge – SC2004
The peak aggregate bandwidth from the booths was 101.13 Gbit/s, or 3 full-length DVDs per second
Saturated TEN 10GE waves
SLAC Booth: Sunnyvale to Pittsburgh, LA to Pittsburgh, and Chicago to Pittsburgh (with UKLight).

Page 9: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


SC2004 UKLIGHT – Focused on Disk-to-Disk

[Network diagram. SC2004 floor: SLAC Booth (Cisco 6509) and Caltech Booth (UltraLight IP). Links: NLR Lambda NLR-PITT-STAR-10GE-16, UKLight 10G (four 1GE channels), UKLight 10G, SURFnet/EuroLink 10G (two 1GE channels). Sites and equipment: Manchester (MB-NG 7600 OSR), ULCC UKLight, UCL HEP and UCL network, Chicago Starlight, Amsterdam, CERN 7600.]

Page 10: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


Transatlantic Ethernet: TCP Throughput Tests
Supermicro X5DPE-G2 PCs, dual 2.9 GHz Xeon CPU, FSB 533 MHz
1500 byte MTU, 2.6.6 Linux kernel
Memory-to-memory TCP throughput, standard TCP

Wire rate throughput of 940 Mbit/s

First 10 sec

Work in progress to study: implementation detail, advanced stacks, packet loss, sharing
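The 940 Mbit/s figure is close to the maximum TCP payload rate Gigabit Ethernet allows with a 1500 byte MTU. A back-of-the-envelope check (my own, assuming TCP timestamps are in use so the TCP/IP headers take 52 bytes):

```python
# Maximum TCP goodput on Gigabit Ethernet with a 1500 byte MTU (illustrative).
line_rate    = 1e9                   # bits/s
mtu          = 1500
eth_overhead = 8 + 14 + 4 + 12       # preamble + header + FCS + inter-frame gap
tcp_ip_hdrs  = 20 + 20 + 12          # IP + TCP + TCP timestamp option (assumed)

frame_on_wire = mtu + eth_overhead   # 1538 bytes per frame on the wire
payload       = mtu - tcp_ip_hdrs    # 1448 bytes of user data per frame

goodput = line_rate * payload / frame_on_wire
print(f"max goodput ~ {goodput/1e6:.0f} Mbit/s")   # ~ 941 Mbit/s
```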

[Plots: achievable TCP throughput (0-2000 Mbit/s) and Cwnd vs time (ms), for the full ~140 s run and for the first 10 s. Traces: InstantaneousBW, AveBW, CurCwnd.]

Page 11: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


Transatlantic Ethernet: Disk-to-Disk Tests
Supermicro X5DPE-G2 PCs, dual 2.9 GHz Xeon CPU, FSB 533 MHz
1500 byte MTU, 2.6.6 Linux kernel, RAID0 (6 SATA disks)
bbftp (disk-to-disk) throughput, standard TCP

Throughput of 436 Mbit/s

First 10 sec

Work in progress to study: throughput limitations; help real users
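One of the measurements behind the plots that follow is achievable throughput as a function of TCP buffer size. A minimal sketch of such a sweep, driving iperf from Python; the host name and buffer sizes are placeholders, not the actual SC2004 setup:

```python
# Sketch: sweep TCP buffer sizes with iperf and record the achieved throughput.
# "remote.example.org" and the size range are placeholders, not the SC2004 hosts.
import subprocess

host = "remote.example.org"
for window_mb in (1, 2, 4, 8, 16, 32):
    result = subprocess.run(
        ["iperf", "-c", host, "-w", f"{window_mb}M", "-t", "20", "-f", "m"],
        capture_output=True, text=True,
    )
    # The last line of iperf's report contains the measured bandwidth.
    print(f"window {window_mb} MB:", result.stdout.strip().splitlines()[-1])
```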

[Plots: achievable TCP throughput (0-2000 Mbit/s) and Cwnd vs time (ms), for the full ~20 s run and for the first 10 s (traces: InstantaneousBW, AveBW, CurCwnd). Plus sculcc1-chi-2: achievable throughput (Mbit/s) vs TCP buffer size (MBytes) for the iperf sender and for bbftp.]

Page 12: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


10 Gigabit Ethernet: UDP Throughput Tests
1500 byte MTU gives ~2 Gbit/s; used a 16144 byte MTU, max user length 16080 bytes
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes, wire rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes, wire rate of 5.7 Gbit/s
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes, wire rate of 5.4 Gbit/s
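In these UDP tests the sender transmits frames of a fixed size with a controlled gap between them, and the receiver reports the wire rate. A small sketch of the offered wire rate as a function of frame size and spacing (my own model of the plot below, counting standard Ethernet and UDP/IP overheads); the measured plateau sits below this because the sending host and PCI-X bus limit the rate:

```python
# Offered UDP wire rate vs inter-frame spacing (illustrative model of the plot below).
ETH_OVERHEAD = 8 + 14 + 4 + 12      # preamble + header + FCS + inter-frame gap, bytes
UDP_IP_HDRS  = 8 + 20               # UDP + IPv4 headers, bytes
LINE_RATE    = 10e9                 # 10 Gigabit Ethernet, bits/s

def wire_rate_gbps(payload_bytes, spacing_us):
    """Wire rate when one frame is sent every spacing_us microseconds."""
    bits_on_wire = (payload_bytes + UDP_IP_HDRS + ETH_OVERHEAD) * 8
    rate = bits_on_wire / (spacing_us * 1e-6)
    return min(rate, LINE_RATE) / 1e9        # cannot exceed the line rate

for spacing in (15, 20, 25, 30, 40):
    print(f"16080 byte frames every {spacing} us -> "
          f"{wire_rate_gbps(16080, spacing):.1f} Gbit/s")
```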

[Plot 'an-al 10GE Xsum 512kbuf MTU16114 27Oct03': received wire rate (0-6000 Mbit/s) vs spacing between frames (0-40 µs), for packet sizes from 1472 to 16080 bytes.]

Page 13: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


10 Gigabit Ethernet: Tuning PCI-X
16080 byte packets every 200 µs, Intel PRO/10GbE LR Adapter
PCI-X bus occupancy vs mmrbc (max memory read byte count)
Measured times, and times based on PCI-X transactions from the logic analyser
Expected throughput ~7 Gbit/s, measured 5.7 Gbit/s
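The transfer time grows as mmrbc shrinks because each 16080 byte packet has to be fetched in more, shorter PCI-X read bursts, each with its own setup cost. A rough model (my own illustration; the per-burst overhead is an assumed value, not a figure from the logic analyser):

```python
# Rough model of PCI-X bus occupancy per packet vs mmrbc (illustrative only).
# The per-burst overhead is an assumed value, not taken from the logic analyser.
import math

PCIX_CLOCK_NS         = 7.5   # 133 MHz PCI-X bus: ns per cycle
BUS_WIDTH_BYTES       = 8     # 64 bit bus: 8 bytes transferred per data cycle
BURST_OVERHEAD_CYCLES = 12    # assumed arbitration/address/attribute cycles per burst
PACKET_BYTES          = 16080

for mmrbc in (512, 1024, 2048, 4096):
    bursts      = math.ceil(PACKET_BYTES / mmrbc)     # DMA split into read bursts
    data_cycles = PACKET_BYTES / BUS_WIDTH_BYTES
    total_us    = (data_cycles + bursts * BURST_OVERHEAD_CYCLES) * PCIX_CLOCK_NS / 1e3
    rate_gbps   = PACKET_BYTES * 8 / (total_us * 1e-6) / 1e9
    print(f"mmrbc {mmrbc:4d}: ~{total_us:.1f} us on the bus, ~{rate_gbps:.1f} Gbit/s")
```

This only counts the data-transfer bursts; the CSR accesses and interrupt/CSR-update transactions visible in the logic-analyser trace add further overhead, which is why the slide's expected figure is ~7 Gbit/s and the measured rate 5.7 Gbit/s.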

[Plots: PCI-X transfer time (µs) and PCI-X transfer rate (Gbit/s) vs max memory read byte count, showing measured transfer time, expected time, rate from expected time and max PCI-X throughput; mmrbc 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096). Kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04. Logic analyser trace of a PCI-X sequence: CSR access, data transfer, interrupt & CSR update.]

Page 14: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


10 Gigabit Ethernet: SC2004 TCP Tests
Sun AMD Opteron v20z compute servers, Chelsio TOE, tests between Linux 2.6.6 hosts

10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale
Two 2.4 GHz AMD 64 bit Opteron processors with 4 GB of RAM at SC2004, 1500B MTU, all Linux 2.6.6
In one direction 9.43 Gbit/s, i.e. 9.07 Gbit/s goodput; in the reverse direction 5.65 Gbit/s, i.e. 5.44 Gbit/s goodput
Total of 15+ Gbit/s on the wire

10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale
One 2.4 GHz AMD 64 bit Opteron at each end, 2 MByte window, 16 streams, 1500B MTU, all Linux 2.6.6
In one direction 7.72 Gbit/s, i.e. 7.42 Gbit/s goodput, sustained for 120 mins (~6.6 TBytes shipped)

S2io NICs with Solaris 10 in a 4 x 2.2 GHz Opteron v40z to one or more S2io or Chelsio NICs with Linux 2.6.5 or 2.6.6 in 2 x 2.4 GHz v20zs
LAN 1: S2io NIC back to back, 7.46 Gbit/s
LAN 2: S2io in the v40z to 2 v20zs, each NIC ~6 Gbit/s, total 12.08 Gbit/s
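The gap between each quoted wire rate and goodput is essentially the IP/TCP header overhead within a 1518 byte Ethernet frame at 1500 byte MTU. A quick check (my own, ignoring preamble and inter-frame gap since the quoted wire rates appear to count whole Ethernet frames):

```python
# Wire rate vs TCP goodput with a 1500 byte MTU (illustrative check of the figures above).
frame   = 14 + 1500 + 4          # Ethernet header + MTU + FCS = 1518 bytes per frame
payload = 1500 - 20 - 20         # minus IP and TCP headers = 1460 bytes of user data

for wire_gbps in (9.43, 5.65, 7.72):
    print(f"{wire_gbps} Gbit/s on the wire -> "
          f"~{wire_gbps * payload / frame:.2f} Gbit/s goodput")
# -> ~9.07, ~5.43, ~7.43 Gbit/s, close to the quoted goodputs
```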

Page 15: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x


UKLight and ESLEA
Collaboration forming for SC2005: Caltech, CERN, FERMI, SLAC, Starlight, UKLight, …
Current proposals include:
Bandwidth Challenge with even faster disk-to-disk transfers between UK sites and SC2005
Radio Astronomy demo at 512 Mbit or 1 Gbit of user data: Japan, Haystack (MIT), Jodrell Bank, JIVE
High bandwidth link-up between UK and US HPC systems; 10 Gig NLR wave to Seattle
Set up a 10 Gigabit Ethernet test bench; experiments (CALICE) need to investigate >25 Gbit to the processor
ESLEA/UKLight need resources to study:
New protocols and congestion / sharing
The interaction between protocol processing, applications and storage
Monitoring L1/L2 behaviour in hybrid networks

Page 16: Network Measurement & Characterisation  and the  Challenge of SuperComputing SC200x
