Investigating Network Performance – A Case Study
Ralph Spencer, Richard Hughes-Jones, Matt Strong and Simon Casey
The University of Manchester
G2 Technical Workshop, Cambridge, Jan 2006
Very Long Baseline Interferometry
eVLBI – using the Internet for data transfer
GRS 1915+105: a 15 solar-mass black hole in an X-ray binary – MERLIN observations
[Image: jet components, receding component labelled; 600 mas = 6000 A.U. at 10 kpc]
Sensitivity in Radio Astronomy
• Noise level ∝ 1/√(Bτ), where B = bandwidth and τ = integration time
• High sensitivity requires large bandwidths as well as large collecting area, e.g. Lovell, GBT, Effelsberg, Cambridge 32-m
• Aperture synthesis needs signals from individual antennas to be correlated together at a central site
• Need for interconnection data rates of many Gbit/s
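The noise-level scaling (∝ 1/√(Bτ)) can be checked in a few lines. A minimal sketch — the `relative_noise` helper is hypothetical; only the proportionality comes from the slide:

```python
import math

def relative_noise(bandwidth_hz: float, integration_s: float) -> float:
    """Radiometer scaling: noise level falls as 1/sqrt(B * tau)."""
    return 1.0 / math.sqrt(bandwidth_hz * integration_s)

# Doubling the bandwidth lowers the noise by sqrt(2):
n1 = relative_noise(16e6, 60.0)   # 16 MHz band, 1 min integration
n2 = relative_noise(32e6, 60.0)   # 32 MHz band, same integration
print(round(n1 / n2, 3))          # 1.414
```

This is why the new wide-band instruments below buy sensitivity directly.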
New instruments are making the best use of bandwidth:
• eMERLIN 30 Gbps
• Atacama Large mm Array (ALMA) 120 Gbps
• EVLA 120 Gbps
• Upgrade to European VLBI: eVLBI 1 Gbps
• Square Km Array (SKA) many Tbps
The European VLBI Network (EVN)
• Detailed radio imaging uses antenna networks over 100s–1000s km
• Currently uses disk recording at 512 Mb/s (Mk5)
• A real-time connection allows greater response, reliability and sensitivity – hence the need for the Internet
eVLBI: the EVN-NREN network
[Map: Westerbork (Netherlands) – dedicated Gbit link; Onsala (Sweden; Chalmers University of Technology, Gothenburg) – Gbit link; Jodrell Bank (UK); Cambridge (UK, MERLIN); Dwingeloo – DWDM link; Medicina (Italy); Torun (Poland) – Gbit link]
Testing the Network for eVLBI
The aim is to obtain the maximum bandwidth compatible with VLBI observing systems in Europe and the USA.
First sustained data-flow tests in Europe: iGRID 2002, 24–26 September 2002, Amsterdam Science and Technology Centre (WTCW), The Netherlands.
“We hereby challenge the international research community to demonstrate applications that benefit from huge amounts of bandwidth!”
iGRID2002 Radio Astronomy VLBI Demo
• Web-based demonstration sending VLBI data: a controlled stream of UDP packets at 256–500 Mbit/s
• Production network: Manchester – SuperJANET4 – GÉANT – Amsterdam
• Dedicated lambda: Amsterdam – Dwingeloo
The Works:
[Diagram: RAID0 disk → ring buffer → UDP data stream (frames of n bytes, with a wait time between frames) → ring buffer → RAID0 disk; a TCP control channel and web interface manage the flow]
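The sender side of this flow (n-byte frames separated by a fixed wait time) can be sketched as below. This is a minimal illustration, not the actual demo code; the function name and the 4-byte sequence-number framing are assumptions:

```python
import socket
import time

def send_paced(host: str, port: int, n_bytes: int, wait_us: float,
               n_frames: int) -> None:
    """Send n_frames UDP datagrams of n_bytes each, pausing wait_us
    microseconds between sends (the 'n bytes / wait time' pacing above)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(n_bytes)
    try:
        for i in range(n_frames):
            # The first 4 bytes carry a sequence number so the receiver
            # can detect loss and re-ordering.
            sock.sendto(i.to_bytes(4, "big") + payload[4:], (host, port))
            time.sleep(wait_us / 1e6)
    finally:
        sock.close()

# e.g. 1000 frames of 1472 bytes, 12 us apart (hypothetical receiver host):
# send_paced("receiver.example", 5001, 1472, 12.0, 1000)
```

In practice `time.sleep()` is far too coarse for microsecond pacing; tools like UDPmon busy-wait on a fine-grained timer instead.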
UDP Throughput on the Production WAN
• Manchester – UvA (SARA): 750 Mbit/s over SuperJANET4 + GÉANT + SURFnet – 75% of the Manchester access link
• Manchester – UvA (SARA): 825 Mbit/s
[Plot: UDP Man–UvA Gig, 19 May 02 – received wire rate (Mbit/s) vs transmit time per frame (µs), for packet sizes 50–1472 bytes]
[Plot: UDP Man–UvA Gig, 28 Apr 02 – received wire rate (Mbit/s) vs transmit time per frame (µs), for packet sizes 50–1472 bytes]
How do we test the network?
• Simple connectivity test from telescope site to correlator (at JIVE, Dwingeloo, The Netherlands, or MIT Haystack Observatory, Massachusetts): traceroute, bwctl
• Performance of link and end hosts: UDPmon, iperf
• Sustained data tests: vlbiUDP (under development)
• True eVLBI data from a Mk5 recorder: pre-recorded (Disk2Net) or real time (Out2Net)
Mk5s are 1.2 GHz P3s with StreamStor cards and 8-pack exchangeable disks giving 1.3 Tbytes of storage. They are capable of 1 Gbps continuous recording and playback. Made by Conduant to a Haystack design.
Telescope connections
[Map: connections to JIVE – Jodrell Bank (UK, 2 × 1 GE), Cambridge (UK, MERLIN), Onsala (Sweden), Medicina (Italy), Torun (Poland), Effelsberg (Germany), Westerbork (Netherlands); links at 1 Gb/s (one at 155 Mb/s); e-MERLIN link ??end 06???]
eVLBI Milestones
• January 2004: disk-buffered eVLBI session
  – Three telescopes at 128 Mb/s for the first eVLBI image
  – On–Wb fringes at 256 Mb/s
• April 2004: three-telescope, real-time eVLBI session
  – Fringes at 64 Mb/s
  – First real-time EVN image, at 32 Mb/s
• September 2004: four-telescope real-time eVLBI
  – Fringes to Torun and Arecibo
  – First EVN eVLBI science session
• 20 December 2004: connection of JBO to Manchester by 2 × 1 GE; eVLBI tests between Poland, Sweden, UK and the Netherlands at 256 Mb/s
• January 2005: first “dedicated light-path” eVLBI
  – ?? Gbyte of data from the Huygens descent transferred from Australia to JIVE at ~450 Mb/s
• February 2005: TCP and UDP memory-to-memory tests at rates up to 450 Mb/s (TCP) and 650 Mb/s (UDP)
  – Tests showed inconsistencies between Red Hat kernels; rates of only 128 Mb/s were obtained on 10 Feb
  – Haystack (US) – Onsala (Sweden) runs at 256 Mb/s
• 11 March 2005: science demo – the JBO telescope was stowed due to wind, so only a short run on a calibrator source was done
Summary of EVN eVLBI tests
• Regular tests with eVLBI Mk5 data every ~6 weeks
  – 128 Mbps OK, 256 Mbps often, 512 Mbps Onsala–JIVE occasionally
  – but not JBO at 512 Mbps – WHY NOT? (NB: using jumbo packets of 4470 or 9000 bytes)
• Note the correlator can cope with large error rates – up to ~1%
  – but high throughput is needed for sensitivity
  – implications for protocols, since TCP throughput is very sensitive to packet loss
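The last point can be quantified with the well-known Mathis et al. approximation, throughput ≈ (MSS/RTT) · 1.22/√p, for Reno-style TCP:

```python
from math import sqrt

def mathis_tcp_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Mathis approximation: TCP throughput <= (MSS/RTT) * 1.22/sqrt(p),
    where p is the packet-loss probability."""
    bits_per_sec = (mss_bytes * 8 * 1.22) / (rtt_ms * 1e-3 * sqrt(loss_rate))
    return bits_per_sec / 1e6

# Even 0.01% loss caps a 1500-byte-MTU flow on a 15 ms path far below 1 Gbit/s:
print(round(mathis_tcp_mbps(1460, 15.0, 1e-4)))  # 95
```

So a loss rate the correlator barely notices (~1%) would be catastrophic for standard TCP, which is why lossless light paths and alternative protocols matter here.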
UDP Throughput Oct–Nov 2003, Manchester–Dwingeloo production network
Throughput vs packet spacing – Manchester: 2.0 GHz Xeon; Dwingeloo: 1.2 GHz PIII. Near wire rate, 950 Mbps (UDPmon).
[Plot: Gnt5–DwMk5 (11 Nov 03) and DwMk5–Gnt5 (13 Nov 03), 1472-byte packets – % packet loss vs spacing between frames (µs)]
[Plot: same runs – received wire rate (Mbit/s) vs spacing between frames (µs)]
[Plots: % kernel CPU load on sender and on receiver vs spacing between frames (µs)]
4th-year project: Adam Mathews, Steve O’Toole
ESLEA
• Packet loss causes low throughput in TCP/IP
• Congestion results in routers dropping packets: use switched light paths!
• Tests with the MB-NG network, Jan–Jun 05
• JBO connected to JIVE via UKLight in June (thanks to John Graham, UKERNA)
• Comparison tests between the UKLight connection JBO–JIVE and the production network (SJ4–GÉANT)
Project Partners
Project Collaborators
The Council for the Central Laboratory of the Research Councils
Funded by
EPSRC GR/T04465/01
www.eslea.uklight.ac.uk
£1.1 M, 11.5 FTE
UKLight Switched light path
Tests on the UKLight switched light path, Manchester : Dwingeloo
• Throughput as a function of inter-packet spacing (2.4 GHz dual Xeon machines)
• Packet loss for small packet sizes
• Maximum-size packets can reach full line rate with no loss, and there was no re-ordering (plot not shown)
[Plot: gig03–jiveg1, UKLight, 25 Jun 05 – received wire rate (Mbit/s) vs spacing between frames (µs), for packet sizes 50–1472 bytes]
[Plot: gig03–jiveg1, UKLight, 25 Jun 05 – % packet loss (log scale) vs spacing between frames (µs), for packet sizes 50–1472 bytes]
Tests on the production network Manchester : Dwingeloo.
• Throughput
• Small (0.2%) packet loss was seen
• Re-ordering of packets was significant
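Re-ordering like this can be measured from per-packet sequence numbers. A minimal metric (hypothetical helper, not the UDPmon implementation): count every datagram that arrives with a sequence number lower than the highest seen so far.

```python
def reorder_count(seqs: list[int]) -> int:
    """Count datagrams arriving with a sequence number below the
    highest already received -- a simple re-ordering metric."""
    highest = -1
    late = 0
    for s in seqs:
        if s < highest:
            late += 1      # arrived after a later-numbered packet
        else:
            highest = s
    return late

print(reorder_count([0, 1, 3, 2, 4]))  # 1: packet 2 arrived after 3
```

On the parallel-routed production path this count was significant; on the UKLight light path it was zero.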
[Plot: gig6–jivegig1, 31 May 05 – % packet loss (log scale) vs spacing between frames (µs), for packet sizes 50–1472 bytes]
UKLight using Mk5 recording terminals
[Map: Jodrell Bank (UK), Dwingeloo (DWDM link), Medicina (Italy), Torun (Poland)]
e-VLBI at the GÉANT2 Launch, Jun 2005
UDP performance: 3 flows on GÉANT
• Throughput: 5-hour run, 1500-byte MTU
• Jodrell – JIVE: 2.0 GHz dual Xeon – 2.4 GHz dual Xeon, 670–840 Mbit/s
• Medicina (Bologna) – JIVE: 800 MHz PIII – Mk5 (623) 1.2 GHz PIII, 330 Mbit/s, limited by the sending PC
• Torun – JIVE: 2.4 GHz dual Xeon – Mk5 (575) 1.2 GHz PIII, 245–325 Mbit/s, limited by security policing (>600 Mbit/s → 20 Mbit/s)?
• Throughput over a 50-minute period shows a ~17-minute periodicity
[Plot: BW, 14 Jun 05 – received wire rate (Mbit/s) vs time (10 s steps, 0–2000) for the Jodrell, Medicina and Torun flows]
[Plot: BW, 14 Jun 05 – detail of the same run, time steps 200–500]
18-Hour Flows on UKLight, Jodrell – JIVE, 26 June 2005
• Throughput: Jodrell – JIVE, 2.4 GHz dual Xeon – 2.4 GHz dual Xeon, 960–980 Mbit/s
• Traffic through SURFnet
• Packet loss: only 3 groups with 10–150 lost packets each; no packets lost the rest of the time
• Packet re-ordering: none
[Plot: man03–jivegig1, 26 Jun 05 – received wire rate (Mbit/s) vs time (10 s steps) over the full run]
[Plot: detail of the same run, 900–1000 Mbit/s, time steps 5000–5200]
[Plot: packet loss (log scale) vs time (10 s steps)]
Recent Results 1:
• iGRID 2005 and SC 2005
  – Global eVLBI demonstration
  – Achieved 1.5 Gbps across the Atlantic using UKLight
  – 3 VC-3-13c ~700 Mbps SDH links carrying data across the Atlantic from the Onsala, JBO and Westerbork telescopes
  – 512 Mbps K4 – Mk5 data from Japan to the USA
  – 512 Mbps Mk5 real-time interferometry between the Onsala, Westford and Maryland Point antennas, correlated at Haystack Observatory
  – Used VLSR technology from the DRAGON project in the US to set up light paths
[Images: JBO Mk2, Westerbork array, Onsala 20-m, Kashima 34-m]
Recent Results 2:
• Why can Onsala achieve 512 Mbps from Mk5 to Mk5, even transatlantic?
  – Its Mk5 is identical to JBO’s – and the link is longer
• iperf TCP, JBO Mk5 to Manchester: rtt ~1 ms, 4420-byte packets, 960 Mbps
• iperf TCP, JBO Mk5 to JIVE: rtt ~15 ms, 4420-byte packets, 777 Mbps
Not much wrong with the networks!
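The 960 vs 777 Mbps difference is consistent with the bandwidth-delay product: TCP must keep a full window of bandwidth × RTT bytes in flight, so the longer path needs far more socket buffering. A quick calculation (illustrative helper, not from the slides):

```python
def window_for_rate(rate_mbps: float, rtt_ms: float) -> float:
    """Bytes TCP must keep in flight to sustain rate_mbps over rtt_ms."""
    return rate_mbps * 1e6 / 8 * rtt_ms * 1e-3

# JBO -> Manchester (rtt ~1 ms) needs only a small window for 960 Mbit/s...
print(round(window_for_rate(960, 1.0)))    # 120000 bytes
# ...while JBO -> JIVE (rtt ~15 ms) needs well over a megabyte:
print(round(window_for_rate(777, 15.0)))   # 1456875 bytes
```

With the network largely exonerated, the remaining suspect is the end host, as the CPU measurements below show.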
• One Mk5 shows 94.7% kernel usage and 1.5% idle
• The other shows 96.3% kernel usage and 0.06% idle – no CPU left!
• The likelihood is that the Onsala Mk5 has a marginally faster CPU, right at the critical point for 512 Mbps transmission
• Solution: better motherboards for the Mk5s – about 40 machines to upgrade!
[Plot: mk5-606–jive, 9 Dec 05 – % CPU time by mode (kernel, user, nice, idle) per trial]
[Plot: mk5-606–g7, 10 Dec 05 – throughput (Mbit/s) vs nice value of a competing load (large value = low priority), with a no-CPU-load reference]
The Future:
• Regular eVLBI tests in the EVN continue
• Testing the Mk5 StreamStor interface <-> network interaction
• Test upgraded Mk5 recording devices
• Investigate alternatives to TCP/UDP – DCCP, vlbiUDP, tsunami, etc.
• ESLEA comparing UKLight with the production network
• The EU’s EXPReS eVLBI project starts March 2006
  – Connection of the 100-m Effelsberg telescope in 2006
  – Protocols for distributed processing
  – Onsala–JBO correlator test link at 4 Gbps in 2007
• eVLBI will become routine in 2006!
VLBI Correlation: a GRID computation task
[Diagram: controller/data concentrator feeding processing nodes]
Questions?