Dan Tovey, University of Sheffield User Board Overview Dan Tovey University Of Sheffield.


User Board Overview

Dan Tovey, University of Sheffield


Tier-1 Planning

• Quarterly UB meeting in April (see minutes) updated Tier-1 planning figures

• A shortfall of Tier-1 resources in future years (especially 2008) is evident.

• Will need to consider whether experiment requirements can be met by Tier-2 resources: experiments need to demonstrate a clear need for Tier-1 functionality.

• Requests which can be met by Tier-2 are to be discussed with the Tier-2 board.

• 'Other Experiments' line removed from the Tier-1 schedule following the detailed Tier-1 board plan: all users must make representation to the UB to get access to resources.


Tier-1 Planning

• Tier-1 utilisation figures frequently fall significantly short of both requests and allocations, which sends the wrong message.

– Often not the fault of experiments (e.g. middleware/operational problems), but experiments must work to produce more realistic estimates.

• Move to strict allocation of disk resources (no over-allocation) helps the Tier-1 team.

• Also synchronise with the spending cycle: aim to ensure complete use of all new resources as soon as they are on-line.


DB Links

• Stronger links with the Deployment Board (DB) are seen as vital: there is a standing invitation for DB representation at UB meetings.


UB Concerns

• How are experiments that globally are not moving to the Grid to be handled?

• Site stability and user support.

• Balance of effort at Tier-1: much used for CMS (SRM) and later LCG SC, but what about smaller user communities?

• What about 'non-standard' OS at Tier-2 sites? This can render a site useless to some experiments. The UB and Tier-2 board need to persuade sites to work towards standardisation.


Questionnaire

• User Board questionnaire updated for latest OsC process.

• No big changes from February

• Some new comments/concerns:

– Fragmented support structure
– All stick and no carrot
– Held up by problems with establishing the VO
– Not all experiments supported by large Tier-2s

• Further details at:

– http://www.gridpp.ac.uk/eb/workdoc/gridusebyexpts_0605.doc


Pleasure: LHCb

Shared data (LHCb RTTC production May/June)

Country                           Events produced
UK                                60 M
Italy                             42 M
Switzerland                       23 M
France                            11 M
Netherlands                       10 M
Spain                             8 M
Russia                            3 M
Greece                            2.5 M
Canada                            2 M
Germany                           0.3 M
Belgium                           0.2 M
Sweden                            0.2 M
Romania, Hungary, Brazil, USA     0.8 M

The data reported are preliminary (accuracy at 5%).

• 5% produced with plain DIRAC sites

• 95% produced with LCG sites
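To put the per-country figures in proportion, the following minimal sketch totals the table and prints each country's share of the production. The numbers are copied from the slide; the Belgium entry is assumed to be in millions like the others, and the names `events_millions` and `production_shares` are purely illustrative. Since the slide gives the data as preliminary (accuracy at 5%), the computed shares are indicative only.

```python
# Events produced per country, in millions, as listed on the slide.
# Assumption: the Belgium figure (printed without a unit) is in millions.
events_millions = {
    "UK": 60.0,
    "Italy": 42.0,
    "Switzerland": 23.0,
    "France": 11.0,
    "Netherlands": 10.0,
    "Spain": 8.0,
    "Russia": 3.0,
    "Greece": 2.5,
    "Canada": 2.0,
    "Germany": 0.3,
    "Belgium": 0.2,
    "Sweden": 0.2,
    "Romania, Hungary, Brazil, USA": 0.8,
}


def production_shares(events):
    """Return the total and each entry's percentage share of that total."""
    total = sum(events.values())
    shares = {name: 100.0 * count / total for name, count in events.items()}
    return total, shares


total, shares = production_shares(events_millions)
print(f"Total: {total:.1f} M events")
# List the largest contributors first.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<32} {share:5.1f} %")
```

On these figures the total comes to about 163 M events, with the UK contributing a little over a third.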


Pleasure: ATLAS

• Using the Grid for 100% of Simulation, Digitisation and Reconstruction.

• 8.5M fully simulated ATLAS events produced

• 20% of LCG jobs in UK

• Overall throughput good, and improving …


Pain: ATLAS

• But … experience has been painful!

• Significant throughput problems experienced in January/February: production goals descoped (15M events planned vs. 8.5M events actual).

• Identified problems (highlights; see also questionnaire):

– System appears to function best when only one person is submitting jobs!
– Lack of a distributed mechanism for prioritising jobs
– Lack of inter-operability between LCG and other Grids: load balancing and data replication have to be done 'by hand'. This leads to production errors (e.g. the same sample produced multiple times on different grids)
– Too much human intervention required to set, adjust and enforce priorities
– Could not saturate CPU resources on LCG easily (rate doubled with a simple change of scripts/person!): production time does not scale with CPU requirements
– Job definition/submission very (expert) labour intensive
– Absolute need for an SE/SRM solution for small files
– Urgent need for VOMS, integrated with other grid tools for resource allocation/access/monitoring/accounting


H1 Tests

30 jobs failed, 22 of them due to Grid problems (grid proxy/misc.)
