Distributed Computing Operations
Stefan Roiser
LHCb Computing Operations Workshop, 27 Jan ‘15



Content

• Roles
  – Shifters, GEOC, “LHCb 3rd line”
• Some ideas for Operations in Run 2
  – LHCb Computing Operations meeting
  – “SCRUM” in Operations?

• DIRAC/SAM jobs


Shifters

• What we did in the last ½ year
  – Shifters were only assigned during “heavy campaigns”
    • e.g. Stripping 21
  – Was this useful? Shall we continue like this?
    • Prompt reconstruction is certainly also a “heavy campaign”
• Some ideas on how to improve the situation
  – Use the new web portal as the central source of information
    • It shall contain ALL information relevant for shifters
    • To be maintained by the WLCG Comp Ops Coordinator
  – Changes made centrally will propagate to “subscribers” of the plots
  – Scratch everything else
    • e.g. www.lhcb-shifters.cern.ch, twiki


Example of the new shifters page

• How to import it
  – In “Settings”, change the role to “lhcb_shifter”
  – “Applications” -> “Public State Manager” -> “Desktops” -> “Shared Desktops” -> “Shifter_Overview” -> “Load”


Every page includes help -> “?”

Every “help” page has the following sections:
• “Introduction”
  – How this page is organized
• “Plots explained”
  – Detailed explanation of each plot (or group of plots)
• “What to look for”
  – Hints on possible errors to check
• “Additional Info”


The GEOC role

• Central point of information for Distributed Computing Operations
  – Is the first point of contact for everything that concerns LHCb Distributed Computing Operations
    • Manages and, where possible, solves all issues
    • May involve others: relay to the “LHCb 3rd line” support or ask shifters for help
  – Receives information from shifters, sites, the production team, and LHCb/DIRAC developers
  – Provides information to sites and WLCG services
  – Organizes and participates in LHCb/WLCG meetings


More ideas for GEOCs

• Shall we have a “handover time”? E.g. 8-day shifts, so that there are 2 GEOCs on Mondays

• More involvement in WLCG Ops, e.g. attending the “WLCG Operations Coordination” meeting?

• Do we need an “Organogram” of LHCb Computing?
  – Who is responsible for what in the “3rd line”?

• More involvement of the GEOC in other operational tasks?
  – E.g. production closing


The “LHCb Computing Operations” meeting

• Currently held
  – Mon, Wed, Thu @ 11.30 in CERN/2-R-14
  – Organized by the GEOC

• Do we need changes in Run 2?
  – Go back to a daily meeting?
  – Meeting time: can we have it earlier?
  – Do we need video in the room?


Trello

• Lightweight tool to follow up on daily operational tasks
  – All operations people would be members of a “shared board”
  – We can assign people, set deadlines, categorize, and track progress
  – Proposal: the GEOC of the week keeps an overview of the board, e.g. creates new assignments during/after the “Ops meeting”
    • Everybody assigned is responsible for moving their task through the different states


How to go on from here …

• We now have a good starting point for LHCb Distributed Computing Operations

• Create a repository of the slides and link it from the DIRAC web portal

• Keep the slides up to date -> responsibility of everybody
  – Will ask everybody to go through / update the slides approx. every ½ year, to be organized by the “Computing Operations Coordinator”
  – Do we need to have things spelled out? E.g. written-down documentation, twiki?


Thanks to everybody for contributing to the meeting with slides, ideas, discussions, … !!!!
