Connect communicate collaborate LHCONE – Linking Tier 1 & Tier 2 Sites Background and Requirements...

connect • communicate • collaborate LHCONE – Linking Tier 1 & Tier 2 Sites Background and Requirements Richard Hughes-Jones DANTE Delivery of Advanced Network Technology to Europe LHCONE Planning Meeting , RENATER Paris, 5 April 2011

Page 1:

LHCONE – Linking Tier 1 & Tier 2 Sites: Background and Requirements

Richard Hughes-Jones

DANTE Delivery of Advanced Network Technology to Europe

LHCONE Planning Meeting, RENATER Paris, 5 April 2011

Page 2:

Introduction:

Describe some of the changes in the computing model of the LHC experiments.

Demonstrate the importance and usage of the network.

Show the relation between LHCONE and LHCOPN.

Bring together and present the user requirements for future LHC physics analysis.

Provide the information to facilitate the presentations on the Architecture and the Implementation of LHCONE.

Page 3:

A Little History

Requirements paper from K. Bos (ATLAS) and I. Fisk (CMS) in autumn 2010.

Experiments had devised new compute and data models for LHC data evaluation, basically assuming a high-speed network connecting the T2s worldwide.

Ideas & proposals were discussed at a workshop held at CERN in Jan 2011. This gave input from the networking community.

An "LHCONE Architecture" doc was finalised in Lyon in Feb 2011. Here K. Bos proposed to start with a prototype based on the commonly agreed architecture.

K. Bos and I. Fisk produced a "Use Case" note with a list of sites for the prototype.

In Rome in late Feb 2011, some NRENs & DANTE formed ideas for the "LHCONE prototype planning" doc.

Page 4:

LHC: Changing Data Models (1)

LHC computing model based on MONARC has served well for more than 10 years.

ATLAS strictly hierarchical; CMS less so.

The successful operation of the LHC accelerator and the start of data analysis brought a re-evaluation of the computing and data models.

Flatter hierarchy: Any site might in the future pull data from any other site hosting it.

(Figure labelled LHCOPN; slide credit: Artur Barczyk)

Page 5:

LHC: Changing Data Models (2)

Data caching: a bit like web caching. Analysis sites will pull datasets from other sites “on demand”, including from Tier 2s in other regions, then make them available to others.

Possible strategic pre-placement of datasets: datasets put close to the physicists studying that data or to suitable CPU power. Use of continental replicas.

Remote data access: jobs executing locally, using data cached at a remote site in quasi-real time.

Traffic patterns are changing – more direct inter-country data transfers

Page 6:

ATLAS Data Transfers: Between all Tier levels

Average: ~ 2.3 GB/s (daily average)

Peak: ~ 7 GB/s (daily average)

Data available on site within a few hours.

70 Gbit/s on LHCOPN during ATLAS reprocessing
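The slide quotes rates in GB/s, while link capacities are quoted in Gbit/s. A minimal sketch of the conversion, assuming decimal units and 8 bits per byte; the function name is illustrative and not from the talk:

```python
def gbyte_to_gbit_per_s(gbyte_per_s: float) -> float:
    """Convert a rate in GB/s to Gbit/s, assuming 1 byte = 8 bits."""
    return gbyte_per_s * 8


# Daily-average figures quoted on the slide.
average_gbit_s = gbyte_to_gbit_per_s(2.3)
peak_gbit_s = gbyte_to_gbit_per_s(7.0)

print(f"average ~ {average_gbit_s:.1f} Gbit/s, peak ~ {peak_gbit_s:.1f} Gbit/s")
```

On these assumptions the average is about 18 Gbit/s and the peak about 56 Gbit/s, summed over all Tier levels.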

Daniele Bonacorsi

Page 7:

Data Flow EU – US ATLAS Tier 2s

Example above is from US Tier 2 sites.

Exponential rise in April and May, after LHC start.

Changed data distribution model at the end of June – caching ESD and DESD.

Much slower rise since July, even as luminosity grows rapidly.

Kors Bos

Page 8:

LHC: Evolving Traffic Patterns

One example of data coming from the US

4 Gbit/s for ~ 1.5 days (11 Jan 11)
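To give a feel for the scale of such a flow, the sketch below estimates the volume it moves. It assumes the 4 Gbit/s rate was sustained for the full 1.5 days and uses decimal units (1 TByte = 10^12 bytes); the function name is illustrative:

```python
def volume_tbyte(rate_gbit_s: float, days: float) -> float:
    """Data volume in TByte (10**12 bytes) moved at a steady rate."""
    seconds = days * 86_400              # seconds in the transfer window
    bits = rate_gbit_s * 1e9 * seconds   # total bits transferred
    return bits / 8 / 1e12               # bits -> bytes -> TByte


print(f"{volume_tbyte(4, 1.5):.1f} TByte")  # prints "64.8 TByte"
```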

Figure panels: Transatlantic link; GÉANT Backbone; NREN Access Link

Not an isolated case

Often made up of many data flows

Users are getting good at running GridFTP.

Page 9:

Data Transfers over RENATER

Peak rates are a substantial fraction of 10 Gbit/s, often for hours.

Several LHC involved.

Demand variable depending on user work.

Francois-Xavier Andreu

Page 10:

Data Transfers over DFN

Peak rates saturate one of the 10 Gigabit DFN-GÉANT links.

Demand variable depending on user work.

Christian Grimm

Two different weeks from GÉANT to Aachen

Page 11:

Data Transfers from GARR – CNAF: T0-T1 + T1-T1 + T1-T2

Peak rates 14-18 Gigabit/s.

Traffic shows diurnal demand & is variable depending on user work.

Sustained growth over last year

Marco Marletta

Page 12:

CMS Data Transfers: Data Placement for Physics Analysis

Once data is on the WLCG, it must be made accessible to analysis applications.

Largest fraction of analysis computing at LHC is at the Tier2s.

New flexibility reduces latency for end users.

Daniele Bonacorsi


T1‐T2 dominates

T2‐T2 emerges

Page 13:

Data Transfer Performance: Site or Network?

Test NorthGrid to GÉANT PoP London

UDP throughput from SE 990 Mbit/s.

75% packet loss.

Data transmitted by SE at 3.8 Gbit/s over four 1 Gigabit interfaces.

TCP transmits in bursts at 3.8 Gbit/s; packet loss & retries mean low throughput.


1 Gbit Bottleneck at receiver

Classic packet loss from bottleneck
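The quoted loss can be checked against the two rates on this slide. The sketch below assumes the UDP test offered 3.8 Gbit/s and that the 990 Mbit/s figure is what arrived through the 1 Gbit bottleneck; the function name is illustrative:

```python
def loss_fraction(sent_gbit_s: float, received_gbit_s: float) -> float:
    """Fraction of traffic dropped when the receiver gets less than was sent."""
    return 1 - received_gbit_s / sent_gbit_s


loss = loss_fraction(sent_gbit_s=3.8, received_gbit_s=0.990)
print(f"estimated loss: {loss:.0%}")
```

Under these assumptions the estimate comes out just under three quarters, in line with the roughly 75% loss reported.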

Even more data with end-hosts fixed.

Page 14:

LHCOPN linking Tier 0 to Tier 1s; LHCONE for Tier 1s and Tier 2s

Diagram labels: LHCONE; other regions; T2s in a country.

LHCONE prototype in Europe.

T1s are connected, but not the LHCOPN.

Page 15:

Requirements for LHCONE

LHCOPN provides infrastructure to move data T0-T1 and T1-T1.

New infrastructure required to improve transfers T1-T2 & T2-T2:

Analysis is mainly done in Tier 2, so data is required from any T1 or any T2. T2-T2 is very important.

Work done at a Tier 2: Simulations & Physics Analysis (50:50)

Network BW needs of a T2 include:

Re-processing efforts: 400 TByte refresh in a week ≈ 5 Gbit/s

Data bursts from user analysis: 25 TByte in a day ≈ 2.5 Gbit/s

Feeding a 1000 core farm with LHC events: ~ 1 Gbit/s

Note this implies timely delivery of data not just average rates!
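The bandwidth figures above follow from dividing the data volume by the time window. A minimal sketch of that arithmetic, assuming decimal units (1 TByte = 10^12 bytes) and a perfectly steady transfer; the function name is illustrative:

```python
def avg_rate_gbit_s(tbyte: float, days: float) -> float:
    """Average rate in Gbit/s needed to move `tbyte` TByte in `days` days."""
    bits = tbyte * 1e12 * 8
    seconds = days * 86_400
    return bits / seconds / 1e9


reprocessing = avg_rate_gbit_s(400, 7)  # 400 TByte refresh in a week
user_burst = avg_rate_gbit_s(25, 1)     # 25 TByte in a day

print(f"re-processing: {reprocessing:.1f} Gbit/s")
print(f"user burst:    {user_burst:.1f} Gbit/s")
```

This gives about 5.3 Gbit/s and 2.3 Gbit/s, which the slide rounds to 5 and 2.5 Gbit/s. These are averages; as noted, timely delivery requires headroom above them.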

Access link “available bandwidth” for Tier 2 sizes:

Large 10 Gbit/s; Medium 5 Gbit/s; Small 1 Gbit/s
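To relate these access-link classes to the bursts mentioned earlier, the sketch below estimates how long a 25 TByte user-analysis burst would take on each class. It assumes the whole link is available to the transfer; the `efficiency` parameter is a hypothetical knob for modelling protocol overhead or competing traffic and is not from the talk:

```python
ACCESS_LINK_GBIT_S = {"Large": 10, "Medium": 5, "Small": 1}


def transfer_hours(tbyte: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `tbyte` TByte over a link of `link_gbit_s` Gbit/s."""
    bits = tbyte * 1e12 * 8
    usable_bit_s = link_gbit_s * 1e9 * efficiency
    return bits / usable_bit_s / 3600


for size, gbit_s in ACCESS_LINK_GBIT_S.items():
    print(f"{size:6s} {gbit_s:2d} Gbit/s: {transfer_hours(25, gbit_s):5.1f} h")
```

On these assumptions a Large site moves the burst in under six hours, while a Small site needs more than two days.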

Page 16:

Requirements for LHCONE

Sites are free to choose the way they wish to connect.

Flexibility & extensibility required:

T2s change

Analysis usage pattern is more chaotic – Dynamic Networks of interest

World-wide connectivity required for LHC sites.

There is concern about LHC traffic swamping other disciplines.

Monitoring & fault-finding support should be built in.

Cost effective solution required – may influence the Architecture.

No isolation of sites must occur.

No interruption of the data-taking or physics analysis

A prototype is needed.

Page 17:

Requirements: Fitting in with LHC 2011 data taking

Machine development & Technical Stops provide pauses in the data taking.

This does not mean there is plenty of time.

LHCONE prototype might grow in phases.

Page 18:

ANY QUESTIONS?