Transcript of http://presidentofindia.nic.in/scripts/sllatest1.jsp?id=734

28.3.2006, H. Göringer, IT/EE Palaver, GSI (47 slides)

Pages 1 - 9: (no transcribed slide text)

Page 10:

Status of the LHC Project

J. Engelen

February 13, 2006

Page 11:

The Large Hadron Collider: 14 TeV pp collisions at 10^34 cm^-2 s^-1

New energy domain (x10), new luminosity domain (x100)

Will have to cross the threshold of electroweak symmetry breaking; unitarity of WW scattering requires M_Higgs < 850 GeV

Many possibilities: Standard Higgs - SUSY (many possibilities...) - Large Extra Dimensions (quantum gravity)

- and many more results on CP violation, Quark Gluon Plasma, QCD, ..., surprises...

The LHC results will determine the future course of High Energy Physics

The Large Hadron Collider

Page 12:

Barrel Toroid installation status: the mechanical installation is complete; electrical and cryogenic connections are being made now, for a first in-situ cool-down and excitation test in spring 2006

Page 13:

The LHC Computing Grid: LCG (project leader: Les Robertson)

is about storing 15 PB (imagine!) of new data per year; processing them and making the information available to thousands of physicists all around the world!

Model: ‘Tiered’ architecture; 100,000 processors; multi-PB disk, tape capacity

Leading ‘computing centers’ involved
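As a back-of-the-envelope check (not taken from the slides), the 15 PB per year quoted above is roughly what the nominal recording rate of 1.6 GB/s mentioned later implies, if one assumes on the order of 10^7 live seconds of data taking per year:

# Rough consistency check of annual data volume vs. nominal recording rate.
# Assumption (not from the slides): ~1e7 live seconds of data taking per year.
live_seconds_per_year = 1.0e7
nominal_rate_bytes_per_s = 1.6e9            # 1.6 GB/s nominal LHC recording rate

volume_pb = nominal_rate_bytes_per_s * live_seconds_per_year / 1e15
print(f"{volume_pb:.0f} PB per year")       # ~16 PB, consistent with the ~15 PB quoted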

Page 14:

[Timeline graphic spanning 2005-2008: cosmics, first beams, first physics, full physics run]

Building the Service

SC1 - Nov04-Jan05 - data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK)

SC2 – Apr05 - data distribution from CERN to 7 Tier-1s – 600 MB/sec sustained for 10 days (one third of final nominal rate)

SC3 – Sep-Dec05 - demonstrate reliable basic service – most Tier-1s, some Tier-2s; push up Tier-1 data rates to 150 MB/sec (60 MB/sec to tape)

SC4 – May-Aug06 - demonstrate full service – all Tier-1s, major Tier-2s; full set of baseline services; data distribution and recording at nominal LHC rate (1.6 GB/sec)

LHC Service in operation - Sep06 - over the following six months ramp up to full operational capacity & performance

LHC service commissioned – Apr07


Page 15:

Conclusions

The LHC project (machine; detectors; LCG) is well underway for physics in 2007

Detector construction is generally proceeding well, although not without concerns in some cases; an enormous integration/installation effort is ongoing – schedules are tight but are also taken very seriously.

LCG (like machine and detectors at a technological level that defines the new ‘state of the art’) needs to fully develop the functionality required; new ‘paradigm’.

Large potential for exciting physics.

Page 16:

CHEP - Mumbai, February 2006

State of Readiness of LHC Computing Infrastructure

Jamie Shiers, CERN

Page 17:

LHC Commissioning

Expect it to be characterised by:

• Poorly understood detectors, calibration, software, triggers etc.

• Most likely no AOD or TAG from first pass – but ESD will be larger?

• The pressure will be on to produce some results as soon as possible!

There will not be sufficient resources at CERN to handle the load

• We need a fully functional distributed system, aka Grid

• There are many Use Cases we have not yet clearly identified

Nor indeed tested - this remains to be done in the coming 9 months!

Page 18:

Resource Deployment and Usage

Resource Requirements for 2008

Page 19:

State of Readiness of the LHC experiments’ software

P. Sphicas, CERN/UoA

Computing in High Energy Physics, Mumbai, Feb 2006

Page 20:

• ROOT activity at CERN is fully integrated in the LCG organization (planning, milestones, reviews, resources, etc.)
  - The main change during the last year has been the merge of the SEAL and ROOT projects
    • Single development team
    • Adiabatic migration of the software products into a single set of core software libraries
  - 50% of the SEAL functionality has been migrated into ROOT (mathlib, reflection, Python scripting, etc.; see the example below)
  - ROOT is now at the “root” of the software for all the LHC experiments
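As an illustration of the Python scripting now provided through ROOT (PyROOT), a minimal session could look like the following. This is a generic sketch, not code from the talk, and assumes a ROOT installation with its Python bindings available:

# Minimal PyROOT sketch: book, fill and inspect a histogram from Python.
import ROOT

h = ROOT.TH1F("h_demo", "demo histogram;x;entries", 100, -5.0, 5.0)
h.FillRandom("gaus", 10000)            # fill with 10k values sampled from a Gaussian
print("mean =", h.GetMean(), " rms =", h.GetRMS())

c = ROOT.TCanvas("c_demo", "demo")
h.Draw()
c.SaveAs("h_demo.png")                 # write the plot to a file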

Page 21: (no transcribed slide text)

Page 22:

LHC-Era Data Rates in 2004 and 2005: Experiences of the PHENIX Experiment with a PetaByte of Data

[Photo: RHIC from space, Long Island, NY]

Martin L. Purschke, Brookhaven National Laboratory, PHENIX Collaboration

Page 23:

Where we are w.r.t. others

[Bar chart of approximate DAQ data rates, all in MB/s and all approximate, for ATLAS, CMS, LHCb, ALICE and CDF; quoted values range from ~25 to ~1250 MB/s]

400-600 MB/s are not so Sci-Fi these days

Page 24:

But is this a good thing?

We had a good amount of discussion about the merits of going to high data rates. Are we drowning in data? Will we be able to analyze the data quickly enough? Are we mostly recording “boring” events? Is it not better to trigger and reject?

• In heavy-ion collisions, the rejection power of level-2 triggers is limited (high multiplicity, etc.)

• Triggers take a lot of time to study, and developers usually welcome a delay in the onset of the actual rejection mode

• The rejected events are by no means “boring”; there is high-statistics physics in them, too

Page 25:

...good thing? Cont’d

In the end we convinced ourselves that we could (and should) do it.

The increased data rate helped defer the onset of the LVL2 rejection mode in Runs 4, 5 (didn’t run rejection at all in the end)

Saved a lot of headaches… we think

• Get the data while the getting is good - the Detector system is evolving and is hence unique for each run, better get as much data with it as you can

• Get physics that you simply can’t trigger on

• Don’t be afraid to let data sit on tape unanalyzed - computing power increases, time is on your side here

Page 26:

Ingredients to achieve the high rates

We implemented several main ingredients that made the high rates possible.

• We compress the data before they get to disk the first time (cram as much information as possible into each MB)

• We run several local storage servers (“buffer boxes”) in parallel, and dramatically increased the buffer disk space (40 TB currently) - see the sketch after this list

• Improved the overall network connectivity and topology

• Automated most of the file handling so the data rates in the DAQ become manageable
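A minimal sketch of the compress-first, buffer-box idea described above (an illustration only, with made-up paths and names; not PHENIX's actual DAQ code):

# Sketch: compress event data before the first write, and spread output files
# round-robin over several buffer-box directories. Paths/names are hypothetical.
import gzip
import itertools
import os

BUFFER_BOXES = ["/bufferbox1/data", "/bufferbox2/data", "/bufferbox3/data"]
_next_box = itertools.cycle(BUFFER_BOXES)

def write_compressed_chunk(run_number, seq, raw_bytes):
    """Compress one chunk of raw event data and write it to the next buffer box."""
    target_dir = next(_next_box)                        # round-robin over buffer boxes
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, "run%07d_seq%04d.evt.gz" % (run_number, seq))
    with gzip.open(path, "wb", compresslevel=1) as f:   # fast compression, to keep up with the DAQ
        f.write(raw_bytes)
    return path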

Page 27:

How to analyze these amounts of data?

• Priority reconstruction and analysis of “filtered” events (Level2 trigger algorithms offline, filter out the most interesting events)

• Shipped a whole raw data set (Run 5 p-p) to Japan to a regional PHENIX Computing Center (CCJ)

• Radically changed the analysis paradigm to a “train-based” one

Page 28:

Shifts in the Analysis Paradigm

After a good amount of experimenting, we found that simple concepts work best. PHENIX reduced the level of complexity in the analysis model quite a bit.

We found:

You can overwhelm any storage system by having it run as a free-for-all. Random access to files at those volumes brings anything down to its knees.

People don’t care too much what data they are developing with (ok, it has to be the right kind)

Every once in a while you want to go through a substantial dataset with your established analysis code

Page 29:

Shifts in the Analysis Paradigm

• We keep some data of the most-wanted varieties on disk.

• That disk-resident dataset mostly stays the same; we add data to the collection as we get newer runs.

• The stable disk-resident dataset has the advantage that you can immediately compare the effect of a code change while you are developing your analysis module

• Once you think it’s mature enough to see more data, you register it with a “train”.

Page 30:

Analysis Trains

• After some hard lessons with the more free-for-all model, we established the concept of analysis trains.

• Pulling a lot of data off the tapes is expensive (in terms of time/resources).

• Once you go to the trouble, you want to get as much “return on investment” for that file as possible - do all the analysis you want while it's on disk.

• We also switched to tape (cartridge)-centric retrievals - once a tape is mounted, get all the files off while the getting is good

• Hard to put a speed-up factor to this, but we went from impossible to analyse to an “ok” experience. On paper the speed-up is like 30 or so.

• So now the data gets pulled from the tapes, and any number of registered analysis modules run over it - very efficient (the train loop is sketched after this list)

• You can still opt out for certain files you don’t want or need

If you don’t do that, the number of file retrievals explodes - I request the file today, next person requests it tomorrow, 3rd person next week
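The train loop itself is simple; the sketch below (with a hypothetical staging call, not the PHENIX code) shows the key point: each file is staged once, grouped by tape, and every registered module sees it before it is released.

# Sketch of an analysis "train": one tape staging amortised over many modules.
class AnalysisModule:
    """Parent class a train module is expected to inherit from."""
    def begin(self): pass
    def process_file(self, path): raise NotImplementedError
    def end(self): pass

class AnalysisTrain:
    def __init__(self, stage_from_tape):
        self._stage = stage_from_tape          # callable: file name -> local disk path (hypothetical)
        self._modules = []

    def register(self, module, wants=lambda path: True):
        self._modules.append((module, wants))  # 'wants' lets a module opt out of some files

    def run(self, files_by_tape):
        """files_by_tape: dict tape_id -> list of files, so each tape is mounted only once."""
        for module, _ in self._modules:
            module.begin()
        for tape_id in sorted(files_by_tape):           # tape (cartridge)-centric retrieval
            for name in files_by_tape[tape_id]:
                local_path = self._stage(name)          # the file comes off tape exactly once
                for module, wants in self._modules:
                    if wants(local_path):
                        module.process_file(local_path)
        for module, _ in self._modules:
            module.end()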

Page 31:

Analysis Train Etiquette

Your analysis module has to obey certain rules:

• be of a certain module kind to make it manageable by the train conductor
• be a “good citizen” - mostly enforced by inheriting from the module parent class, starting from templates, and review (a minimal template is sketched below)
• code mustn't crash, have no endless loops, no memory leaks
• pass prescribed Valgrind and Insure tests

This train concept has streamlined the PHENIX analysis in ways that are hard to overestimate.

After the train, the typical output is relatively small and fits on disk

Made a mistake? Or forgot to include something you need? Bad, but not too bad… fix it, test it, the next train is leaving soon. Be on board again.
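To make the etiquette concrete, a “good citizen” module built on the base class sketched on the previous page might look like this (hypothetical names; real PHENIX modules are C++ classes and go through Valgrind/Insure, which has no direct Python equivalent here):

# Hypothetical train module following the rules above: inherits the parent class,
# does bounded work per file, and produces a small output that fits on disk.
class DileptonSpectra(AnalysisModule):
    def __init__(self, outfile="dilepton_spectra.txt"):
        self.outfile = outfile
        self.n_files = 0

    def begin(self):
        self.n_files = 0                       # book histograms / open output once, here

    def process_file(self, path):
        self.n_files += 1                      # loop over the events in 'path' and fill histograms
        # ... no unbounded loops, no growing caches (the memory-leak rule) ...

    def end(self):
        with open(self.outfile, "w") as f:     # small summary output
            f.write("files processed: %d\n" % self.n_files)

# Registering it for the next train (conductor side), e.g. only for Run 5 p-p files:
# train.register(DileptonSpectra(), wants=lambda p: "run5pp" in p)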

Pages 32 - 37: (no transcribed slide text)

Page 38:

Performance and Scalability of xrootd

Andrew Hanushevsky (SLAC), Wilko Kroeger (SLAC), Bill Weeks (SLAC),
Fabrizio Furano (INFN/Padova), Gerardo Ganis (CERN), Jean-Yves Nief (IN2P3), Peter Elmer (U Wisconsin),
Les Cottrell (SLAC), Yee Ting Li (SLAC)

Computing in High Energy Physics, 13-17 February 2006

http://xrootd.slac.stanford.edu

xrootd is largely funded by the US Department of Energy, Contract DE-AC02-76SF00515 with Stanford University

Page 39:

xrootd

• 2 projects to integrate SRM and xrootd (SLAC, FNAL)
• possible cooperation gStore - xrootd

Page 40:

xrootd read access

1. Requested file is on a file server and in the cache table in the redirector's memory:
   immediate access.

2. Requested file is on a file server, but not in the cache table:
   access after 5 s (promised to be improved);
   the file lifetime in the cache table is configurable, default 6 hours.

3. File is on tape:
   access via the mass storage interface with gStore - only allowed for a few files!

(The decision flow is sketched below.)
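The three cases boil down to a short decision flow. The sketch below uses invented helper names purely to illustrate that logic; it is not the actual xrootd or gStore interface:

# Illustration of the read-access cases above (all helper calls are invented).
REDIRECT_DELAY_S = 5               # case 2: redirector re-polls the file servers (~5 s today)
CACHE_LIFETIME_S = 6 * 3600        # cache-table entry lifetime, configurable (default 6 hours)

def open_for_read(path, redirector, gstore):
    if redirector.in_cache_table(path):           # case 1: file known to be on a file server
        return redirector.redirect(path)          # immediate access
    if redirector.on_any_file_server(path):       # case 2: on disk, but not yet in the cache table
        redirector.add_to_cache_table(path, lifetime=CACHE_LIFETIME_S)
        return redirector.redirect(path, delay=REDIRECT_DELAY_S)
    # case 3: file only on tape - go through the mass storage interface (gStore);
    # intended for a few files only, not for bulk staging
    gstore.stage_single_file(path)
    return redirector.redirect(path)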

Page 41:

Prepare file servers for large-scale xrootd read access

1. gStore fills the file servers from tape:
   o only one stage command
   o gStore knows the optimal tape read sequence
   o gStore distributes the files optimally over all file servers
   o parallel streams
   o storage quotas for groups can be implemented

2. gStore passes the list of new files to xrootd via the prepare() interface.

3. xrootd updates its cache tables in memory.

(A sketch of this flow follows below.)
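A sketch of that staging flow; apart from prepare(), which the slide names, every call here is invented for illustration:

# Bulk staging sketch: gStore reads files in optimal tape order, spreads them over
# the xrootd file servers, then announces them through the prepare() interface.
def stage_for_xrootd(file_list, gstore, file_servers, xrootd):
    ordered = gstore.sort_by_tape_position(file_list)        # optimal tape read sequence
    staged = []
    for i, name in enumerate(ordered):
        server = file_servers[i % len(file_servers)]         # distribute over all file servers
        staged.append(gstore.copy_from_tape(name, server))   # done with parallel streams in reality
    xrootd.prepare(staged)                                   # xrootd updates its in-memory cache tables
    return staged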

Page 42:

Actions after xrootd writes to the file servers

1. xrootd writes to tape via the mass storage interface with gStore.

2. Eventually also possible: asynchronously with gStore:
   o periodic query (~hours) of the file servers and the xrootd cache tables (?)
   o archive new files that are no longer in the xrootd cache tables (and not yet archived)

(A sketch of the asynchronous variant follows below.)
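The asynchronous variant amounts to a periodic sweep, roughly as sketched below (all calls invented; only the query-and-archive cycle follows the slide):

# Periodic (e.g. hourly) sweep: archive files written via xrootd that are no longer
# in its cache tables and have not yet been copied to tape.
def archive_sweep(xrootd, file_servers, gstore):
    cached = set(xrootd.list_cache_tables())
    for server in file_servers:
        for path in server.list_files():
            if path not in cached and not gstore.is_archived(path):
                gstore.archive(path)          # write to tape via gStore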

Page 43:

Web Addresses CHEP06

• http://www.tifr.res.in/~chep06/index.php
• http://indico.cern.ch/confAuthorIndex.py?confId=048
• http://presidentofindia.nic.in/scripts/sllatest1.jsp?id=734

Page 44: (no transcribed slide text)

Page 45:

Service Challenge Throughput Tests

• Currently focussing on Tier0 -> Tier1 transfers, with modest Tier2 -> Tier1 upload (simulated data)

Recently achieved the target of 1 GB/s out of CERN, with rates into the Tier1s at or close to nominal rates

• Still much work to do!

• We still do not have the stability required / desired…

The daily average needs to meet / exceed targets
We need to handle this without “heroic efforts” at all times of day / night!
We need to sustain this over many (100) days
We need to test recovery from problems (individual sites - also Tier0)
We need these rates to tape at Tier1s (currently disk)

• Agree on milestones for TierX -> TierY transfers & demonstrate readiness

Page 46:

Achieved (Nominal) pp data rates - disk-disk (SRM) rates in MB/s

Centre                       Rate into T1 (pp)
ASGC, Taipei                 80 (100) (have hit 140)
CNAF, Italy                  200
PIC, Spain                   >30 (100) (network constraints)
IN2P3, Lyon                  200
GridKA, Germany              200
RAL, UK                      200 (150)
BNL, USA                     150 (200)
FNAL, USA                    >200 (200)
TRIUMF, Canada               140 (50)
SARA, NL                     250 (150)
Nordic Data Grid Facility    150 (50)

Meeting or exceeding nominal rate (disk – disk)

Met target rate for SC3 (disk & tape) re-run
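As a quick cross-check (not from the slides), the nominal per-site rates in parentheses above, together with the single figures quoted for CNAF, IN2P3 and GridKA (taken here as their nominal rates), add up to the 1.6 GB/s nominal rate out of CERN quoted earlier:

# Sum of the nominal disk-disk rates into the Tier-1s (MB/s), as listed in the table above.
nominal_rates = {
    "ASGC": 100, "CNAF": 200, "PIC": 100, "IN2P3": 200, "GridKA": 200,
    "RAL": 150, "BNL": 200, "FNAL": 200, "TRIUMF": 50, "SARA": 150, "NDGF": 50,
}
total = sum(nominal_rates.values())
print(total, "MB/s =", total / 1000.0, "GB/s")   # 1600 MB/s, i.e. the 1.6 GB/s nominal rate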

Page 47:

Timeline 2006

January:   SC3 disk repeat - nominal rates capped at 150 MB/s; SRM 2.1 delivered (?)
February:  CHEP w/s - T1-T1 Use Cases; SC3 disk-tape repeat (50 MB/s, 5 drives)
March:     Detailed plan for SC4 service agreed (M/W + DM service enhancements)
April:     SC4 disk-disk (nominal) and disk-tape (reduced) throughput tests
May:       Deployment of new M/W and DM services across sites - extensive testing
June:      SC4 production - tests by experiments of 'T1 Use Cases'; 'Tier2 workshop' - identification of key Use Cases and Milestones for T2s
July:      Tape throughput tests at full nominal rates!
August:    T2 Milestones - debugging of tape results if needed
September: LHCC review - rerun of tape tests if required?
October:   WLCG Service officially opened. Capacity continues to build up.
November:  1st WLCG 'conference'; all sites have network / tape h/w in production (?)
December:  'Final' service / middleware review leading to early 2007 upgrades for LHC data taking??

O/S Upgrade? (SLC4) Sometime before April 2007!