High Speed Physics Data Transfers using UltraLight Julian Bunn (thanks to Yang Xia and others for material in this talk) UltraLight Collaboration Meeting October 2005

Page 1:

High Speed Physics Data Transfers using UltraLight

Julian Bunn (thanks to Yang Xia and others for material in this talk)

UltraLight Collaboration Meeting, October 2005

Page 2:

Disk to Disk (Newisys) 2004

   System Vendor:      Newisys 4300 AMD Opteron Enterprise Server with 3 AMD-8131
   CPU:                Quad Opteron 848, 2.2GHz
   Memory:             16GB PC2700 DDR ECC
   Network Interface:  S2io 10GE in a 64-bit/133MHz PCI-X slot
   RAID Controller:    3 x Supermicro Marvell SATA controllers
   Hard Drives:        24 x 250GB WDC 7200rpm SATA
   OS:                 Win2K3 AMD64, Service Pack 1, v.1185

   Measured disk-to-disk rate: 550 MBytes/sec

Page 3:

Tests with rootd

• Physics analysis files are typically in ROOT format
• We would like to serve these files over the network as quickly as possible
• At least three possibilities:
  – Use rootd
  – Use Clarens
  – Use a Web server
• Use of rootd is simple:
  – On the client, use “123.456.789.012:/dir/root.file”
  – On the server, run “rootd”

Page 4:

rootd

On the server:

   [root@dhcp-116-157 rootdata]# ./rootd -p 5000 -f -noauth
   main: running in foreground mode: sending output to stderr
   ROOTD_PORT=5000

On the client, add the following to .rootrc (corrects an issue in the current ROOT):

   XNet.ConnectDomainAllowRE: *
   +Plugin.TFile: ^root:  TNetFile Core "TNetFile(const char*,Option_t*,const char*,Int_t,Int_t)"

In the C++ code, access the files like this:

   TChain* ch = new TChain("Analysis");
   ch->Add("root://10.1.1.1:5000/../raid/rootdata/zpr200gev.mumu.root");
   ch->Add("root://10.1.1.1:5000/../raid/rootdata/zpr500gev.mumu.root");

Page 5:

rootd (measuring performance)

   Int_t nbytes = 0, nb = 0;
   TStopwatch s;
   for (Long64_t jentry = 0; jentry < nentries; jentry++) {
      Long64_t ientry = LoadTree(jentry);
      if (ientry < 0) break;
      nb = fChain->GetEntry(jentry);
      nbytes += nb;
   }
   s.Stop();
   s.Print();
   Long64_t fileBytes = gFile->GetBytesRead();
   // Cast before dividing, so the byte count is not truncated
   // by integer division:
   Double_t mbSec = (Double_t) fileBytes / 1024 / 1024;
   mbSec /= s.RealTime();
   cout << nbytes << " Bytes (uncompressed) "
        << fileBytes << " Bytes (in file) "
        << mbSec << " MBytes/sec" << endl;

Compression makes a big difference: the ROOT file is 282 MBytes, but the ROOT object data amounts to 655 MBytes. Thus the physics data rate delivered to the application is more than twice the reported network rate (~22 MBytes/sec in this test).

Application:  Real time 0:00:14, CP time 12.790, 655167999 Bytes
rootd:        rd=2.81415e+08, wr=0, rx=478531, tx=2.81671e+08
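The rates above can be reproduced from the reported counters (a quick sketch; sizes are in decimal MBytes, and the CP time of 12.79 s is used, matching the ~22 MBytes/sec quoted on the slide):

```shell
# Derive the network and application rates from the figures above.
awk 'BEGIN {
    file_mb = 281.4    # bytes read from the ROOT file (rd=2.81415e+08)
    obj_mb  = 655.2    # uncompressed object data (655167999 bytes)
    cpu_s   = 12.79    # CP time reported by TStopwatch
    printf "network: %.1f MB/s  application: %.1f MB/s  compression: %.2fx\n",
           file_mb / cpu_s, obj_mb / cpu_s, obj_mb / file_mb
}'
# prints: network: 22.0 MB/s  application: 51.2 MB/s  compression: 2.33x
```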

Page 6:

Tests with Clarens/Root

• Using Dimitri’s analysis (ROOT files containing Higgs → muon data at various energies)
• The ROOT client requests objects from files of a few hundred MBytes each
• In this analysis, not all the objects in the file are read, so care is required when computing the network data rate
• Clarens serves data to the ROOT client at approx. 60 MBytes/sec
• Compare with a wget pull of the ROOT file from Clarens/Apache: 125 MBytes/sec with a cold cache, 258 MBytes/sec with a warm cache

Page 7:

Tests with gridftp

• gridftp may work well, if you can manage to install it and work within its security constraints
• Michael Thomas’s experience:
  – Installed successfully on a laptop, but it needed a Grid certificate for the host plus reverse DNS lookup; he had neither, so he couldn’t use it
  – Installed successfully on osg-discovery.caltech.edu, but that is a production machine, so it could not be used for testing
  – Attempted an install on the UltraLight dual-core Opterons at Caltech, but: no host certificates, no reverse lookup, and no x86_64 support
• Summary: installation/deployment constraints severely restrict the usefulness of gridftp

Page 8:

Tests with bbftp

• bbftp is supported by IN2P3
• The time-zone difference makes support less interactive than for bbcp
• Operates with an ftp-like client/server setup
• Tested bbftp v3.2.0 between LAN Opterons
• Example localhost copy:
    bbftp -e 'put /tmp/julian/example.session /tmp/julian/junk.dat' localhost -u root
• Some problems:
  – Segmentation faults when using IP numbers rather than names … an x86_64 issue?
  – Transfers fail with a reported routing error, even though the routes are OK
  – By default, files are copied to a temporary location on the target machine, then copied to the correct location. This is not what is wanted when targeting a high-speed RAID array! [Can be avoided with “setoption notmpfile”]
  – Sending files to /dev/null did not seem to work:
      >> USER root PASS
      << bbftpd version 3.2.0 : OK
      >> COMMAND : setoption notmpfile
      << OK
      >> COMMAND : put OneGB.dat /dev/null
      BBFTP-ERROR-00100 : Disk quota excedeed or No Space left on device
      << Disk quota excedeed or No Space left on device

Page 9:

bbcp

• http://www.slac.stanford.edu/~abh/bbcp/
• Developed as a tool for BaBar file transfers
• The work of Andy Hanushevsky (SLAC)
• Peer-to-peer architecture – third-party transfers
• Simple to install: just need the bbcp executable in the path on the remote machine(s)
• Works with all standard methods of authentication

Page 10:

Tests with bbcp

The goal is to transfer data files at 10 Gbits/sec in the WAN. We use Opteron systems, each with two dual-core CPUs, 8GB or 16GB RAM, S2io 10Gbit NICs, and RHEL with a 2.6 kernel.

We use a stepwise approach, starting with the easiest data transfers:

1) Memory to bit bucket (/dev/zero to /dev/null)
2) Ramdisk to bit bucket (/mnt/rd to /dev/null)
3) Ramdisk to ramdisk (/mnt/rd to /mnt/rd)
4) Disk to bit bucket (/disk/file to /dev/null)
5) Disk to ramdisk
6) Disk to disk
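The stepwise tests can be sketched as a sequence of bbcp invocations. This is only a sketch: the hostnames and file names are placeholders, and the flags (-P for periodic progress reports, -s for the number of TCP streams) should be checked against the installed bbcp version. The run wrapper prints each command rather than executing it, so the sequence can be reviewed first.

```shell
SRC=sender.example.org      # hypothetical source host
DST=receiver.example.org    # hypothetical target host

run() { echo "+ $*"; }      # dry run; drop the echo to execute for real

run bbcp -P 2 $SRC:/dev/zero       $DST:/dev/null       # 1) memory to bit bucket
run bbcp -P 2 $SRC:/mnt/rd/1GB     $DST:/dev/null       # 2) ramdisk to bit bucket
run bbcp -P 2 $SRC:/mnt/rd/1GB     $DST:/mnt/rd/1GB     # 3) ramdisk to ramdisk
run bbcp -P 2 $SRC:/raid/1GB.dat   $DST:/dev/null       # 4) disk to bit bucket
run bbcp -P 2 $SRC:/raid/1GB.dat   $DST:/mnt/rd/1GB     # 5) disk to ramdisk
run bbcp -P 2 -s 2 $SRC:/raid/1GB.dat $DST:/raid/1GB.dat  # 6) disk to disk
```

The -s 2 on the final step reflects the later LAN finding that one or two streams give the best single-process rate.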

Page 11:

bbcp LAN Rates

• Goal: bbcp rates should match or exceed iperf rates
• Single bbcp process:
  a)  1 stream:   max rate = 523 MBytes/sec
  b)  2 streams:  max rate = 522 MBytes/sec
  c)  4 streams:  max rate = 473 MBytes/sec
  d)  8 streams:  max rate = 460 MBytes/sec
  e) 16 streams:  max rate = 440 MBytes/sec
  f) 32 streams:  max rate = 417 MBytes/sec
• 3 simultaneous bbcp processes:
  1) bbcp: At 050922 08:58:14 copy 99% complete; 348432.0 KB/s
  2) bbcp: At 050922 08:58:15 copy 54% complete; 192539.5 KB/s
  3) bbcp: At 050922 08:58:15 copy 30% complete; 194359.9 KB/s
  Aggregate utilization of 735 MBytes/sec (~6 Gbits/sec)

Conclusion: bbcp can match iperf in the LAN. Use one or two streams, and several bbcp processes (if you can).

Page 12:

bbcp WAN Rates

Memory to memory: 785 MBytes/sec (sender running FAST & Web100)

[Throughput chart not reproduced in the transcript]

Page 13:

Performance Killers

1) Make sure you're using the right interface! Check with ifconfig
2) Do a cat /proc/sys/net/ipv4/tcp_rmem and make sure the numbers are big, like 1610612736 1610612736 1610612736
3) If not, tune the interface using /usr/local/src/s2io/s2io_perf.sh
4) Flush existing routes: sysctl -w net.ipv4.route.flush=1
5) Sometimes a route has to be configured manually, and added to /etc/sysconfig/network-scripts/route-ethX for the future
6) Sometimes commands like sysctl and ifconfig are not in the PATH
7) Check the route is OK with traceroute in both directions
8) Check the machine is reachable with ping
9) Sometimes the 10Gbit adapter does not have a 9000-byte MTU, but instead has the default of 1500
10) If in doubt, reboot
11) If still in doubt, rebuild your application, and goto 10)
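Checks 2 and 9 can be scripted (a sketch only: the 16 MB buffer threshold is an illustrative floor for 10GbE, not a value from the talk, and the interface name is a placeholder):

```shell
# Warn if the TCP receive buffer ceiling or the interface MTU look too
# small for 10GbE transfers. Thresholds are illustrative.
check_rmem() {   # args: min default max, as read from tcp_rmem
    [ "$3" -ge 16777216 ] && echo "tcp_rmem OK" || echo "tcp_rmem max=$3 too small"
}
check_mtu() {    # arg: interface MTU
    [ "$1" -ge 9000 ] && echo "MTU OK" || echo "MTU=$1; jumbo frames recommended"
}

# Example with the values quoted on the slide, and a default MTU:
check_rmem 4096 87380 1610612736
check_mtu 1500

# In practice, feed in live values:
#   check_rmem $(cat /proc/sys/net/ipv4/tcp_rmem)
#   check_mtu  $(cat /sys/class/net/ethX/mtu)
```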

Page 14:

Ramdisks & SHC

• Avoid disk I/O by using ramdisks – it works:
    % mount -t ramfs none /mnt/rd
• Allows physics data files to be placed in system RAM
• Finesses the new Bandwidth Challenge “rule” disallowing iperf/artificial data
• In CACR’s new “Shared Heterogeneous Cluster” (>80 dual-Opteron HP nodes) we intend to populate ramdisks on all nodes with ROOT files, and transfer them using bbcp to nodes in the Caltech booth at SC2005
• The SHC is connected to the WAN via a Black Diamond switch, with two bonded 10Gbit links to Caltech’s UltraLight Force10

Page 15:

SC2005 Bandwidth Challenge

The Caltech-CERN-Florida-FNAL-Michigan-Manchester-SLAC entry will demonstrate high speed transfers of physics data between host labs and collaborating institutes in the USA and worldwide. Caltech and FNAL are major participants in the CMS collaboration at CERN’s Large Hadron Collider (LHC). SLAC is the host of the BaBar collaboration. Using state-of-the-art WAN infrastructure and Grid-based Web Services based on the LHC Tiered Architecture, our demonstration will show real-time particle event analysis requiring transfers of Terabyte-scale datasets.

We propose to saturate at least fifteen lambdas at Seattle, full duplex (potentially over 300 Gbps of scientific data). The lambdas will carry traffic between SLAC, Caltech and other partner Grid Service sites including UKLight, UERJ, FNAL and AARNet. We will monitor the WAN performance using Caltech's MonALISA agent-based system. The analysis software will use a suite of Grid-enabled Analysis tools developed at Caltech and the University of Florida. There will be a realistic mixture of streams: those due to the transfer of the Terabyte event datasets, and those due to a set of background flows of varied character absorbing the remaining capacity. The intention is to simulate the environment in which distributed physics analysis will be carried out at the LHC. We expect to easily beat our SC2004 record of ~100 Gbits/sec (roughly equivalent to downloading 1000 DVDs in less than an hour).

Pages 16–20: [charts and screenshots only; no text recoverable from the transcript]
Page 21:

Summary

• Seeking the fastest ways of moving physics data in the 10 Gbps WAN
• The disk-to-disk WAN record was held by the Newisys machines in 2004: >500 MBytes/sec
• ROOT files can be served to ROOT clients at decent rates (>60 MBytes/sec); ROOT compression helps by a factor of >2
• ROOT files can be served by rootd, xrootd, Clarens, and vanilla Web servers
• For file transfers, bbftp and gridftp are hard to deploy and test
• bbcp is easy to deploy, well supported, and can match iperf speeds in the LAN (~7 Gbits/sec) and the WAN (~6.3 Gbits/sec) for memory-to-memory transfers
• Optimistically, bbcp should be able to copy disk-resident files in the WAN at the same speeds, given:
  – Powerful servers
  – Fast disks
• Although we are not there yet, we are aiming to be by SC2005!