Run Coordinator Report on behalf of everybody involved in Pit Operation
Transcript of Run Coordinator Report on behalf of everybody involved in Pit Operation
R. Jacobsson 1
Pit Operation - “Luminosity Production” - is in good hands with many devoted and competent people from experts to shifters
• But as the conclusion will state, we need more people to guarantee quality physics
Experts would also like to be able to devote a bit of time to physics analysis
• Also, luckily we left behind a very good team at CERN meeting the challenges of this week with nominal bunch intensities!
Concentrate on global topics that are of concern/interest to the entire collaboration
• Will not discuss the status of the individual sub-detectors unless affecting global operation
• In the past, presented a lot on how we followed and participated in the beam commissioning
Main topics
• Operation up to now
• Operational status and efficiency
• Luminosity
• Data Quality
• First experience with nominal bunches
• Trigger
• Organization
• Tools to follow operation
• Shifter situation, the working model, and the needs for the future
Machine and Experiment Availability
• Extremely low average failure rate × extremely high number of vital systems = 0.50
• Thunderstorms daily now!
Tripped LHCb magnet already twice
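The 0.50 overall availability quoted above illustrates how a chain of individually reliable systems multiplies into a sizeable combined inefficiency. A minimal sketch of that arithmetic (the per-system availability and system count below are illustrative assumptions, not the actual LHCb numbers):

```python
# Combined availability of a chain of vital systems is the product of the
# per-system availabilities: even tiny failure rates compound quickly.
def combined_availability(per_system: float, n_systems: int) -> float:
    return per_system ** n_systems

# Illustrative: 99.9% availability per system across 700 vital systems
# already brings the combined figure down to ~0.50.
print(combined_availability(0.999, 700))
```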
Wednesday:
1. AFS problem
2. SPS down
3. Thunderstorm
4. VELO motion
5. …
A lot for one day… Still we took 1h of physics!
• Wrong fill number for 30 min!
Where the plan stopped at the RC report in March:
[Planning chart, weeks w1–w12 (January – March): Day shifts & Day Piquet; LHCb PostMortem meeting; LHC Beam Commissioning Workshop; LHC Chamonix Workshop; Meeting on calibration run needs; 24h shifts from 11/2; LHC + experiments Dry Run; First beam in LHC 28/2; 450 GeV technical fills?; Power-up & Online upgrades; Detector calibrations & dataflow?; Cosmics 13/2–14/2; TED run 18/2; Determining colliding beams?; LHCb magnet (V-) with beam; LHCb magnet (V+) with beam?; Sub-detector stand-alone work and tests; Continuous global 'Heat Run' with rate ramp (first detector calibrations); LHC Beam Commissioning; MD Mode = off, No Beam = Tests]
[Trigger-rate/luminosity plot (Hz/b): B up ~5 nb-1, B down ~7.6 nb-1. Phases: Minimum Bias, HLT pass-through; MB < 1 kHz, HLT1 rejection; MB < 100 Hz; nominal bunches, B down, HLT1 rejection, HLT2 pass-through]
Cumulative (in-)efficiency logging implemented since fill 1089
• Breakdown on HV, VELO, DAQ, DAQ livetime (trigger throttling)
• Entered into Run Database
Operational luminosity (in-)efficiencies May 10 – June 5
LHCb dependence on LHC:
• Short-hand page for LHC Operators and EICs
• Completely automated for LHCb Shifters, requiring ‘only’ confirmations
 Also Voice Assistance
 VELO still to be fully integrated
 Very advanced compared to ATLAS and CMS…
LHCb State Control
Shifter Voice Assistance
• Draws attention to new information or changes
 LHC Page 1, injection, optimization scans, etc.
• Instructions for LHCb State Control handling
 HV/LV handling, BCM rearm, etc.
• Undesired events…
 Beams lost, run stopped, magnet trip, clock loss
• DSS Alarms and Histogram Alarms to be added, and voice quality to be improved
• Related work in progress: clean up shifter instructions on the consoles and add a help button to all displays
Collapse of separation bumps now simultaneous between all experiments
• Golden orbit established with improved reproducibility
 Good luminosity already during ADJUST
 Optimization scan right at the start of Stable Beams, starting with the experiment with the lowest luminosity
Full VELO powering during ADJUST (TCTs at physics settings and separation bumps collapsed)
 Powering of the VELO by central shifters is the next step
• Future of VELO closure by Shifter/Expert being discussed
 Closing Manager now very user friendly
 Aim for an “on-call” shifter for closing, preferably the same as the piquet
End-of-fill calibrations: automation?
Work on automatic recovery from DAQ problems in progress
• Added one after the other
• Starting to test Autopilot
Majority mechanism when configuring the farm to start a run being looked into
• Farm software and storage software crashes: still room for improvement
Exclusion/recovery of problematic (sub)farms on the fly while taking data
• Routine procedure for shifters
Recovery of monitoring/reconstruction/calibration farms while taking data
Faster recovery of sub-detectors without stopping the run (only the trigger) becoming a routine maneuver for most shifters
Two numbers for trigger deadtime counting
• TriggerLivetime(L0) @ bb-crossings
• TriggerLivetime(Lumi) @ bb-crossings
Also major improvements made on monitoring of the HLT (histograms and trends)
• Both technical parameters and physics retention
Problem with DAQ and control switch seems solved
Storage problem earlier this year also solved
Purchase of farm during 2010Q3, install in November during ion run
Some outstanding subdetector problems:
• Dying VCSELs in the subdetectors are a worry
• SPECS connections in the OT
• Control of ISEG HV for the VELO seems solved by changing from Systec to Peak
• L0 derandomizer emulation of the Beetle: about to be addressed
• …
System diagnostics tools have been like an AOB on every agenda forever…
• Alarm screen and log viewer
• Well, better to solve the problem than to add the alarm, if that works!
Data Quality is of the highest importance now (together with the trigger)
• Main problem: we need more interest/participation from people doing physics analysis
Discover and document which problems are tolerable and which are not
• Impact on data quality, in order to know the urgency of solving a problem versus operational efficiency
 Aiming at perfect data obviously counts for more than 100% operational efficiency
 But recoveries should be well thought through, well planned and swift, and to the extent possible coordinated with other pending recoveries!
• How to classify data quality problems for different physics analyses
Establish routine for use of Problem Database• Checking in and checking out entries, fast feedback
Procedure outlined for decision on detector interventions which may have an impact on data quality
Working group set up to address Online Data Quality tools and follow-up
• Improvements of histogram presenter, histogram analysis, alarms, etc.
• Need for trend plots, a trend presenter and a trend database being looked into
• Documenting quality problems and their impacts/recoveries
• Reconstruction Farm and associated histograms
• More interest from subdetectors would be welcome
Shifter catalogue
• Most important/significant histograms with descriptions and references
• Several iterations; still needs improvements and links to severity/actions
Alarm panel from automatic histogram analysis
• Associate sound/voice to alarms
The tool for registering data quality problems – Problem Database
• Shared between Online and Offline
• http://lbproblems.cern.ch/ (“Problem DB” from LHCb Welcome page)
Three sources of luminosity online
• Counted by ODIN using non-prescaled L0Calo or L0Muon triggers from the L0DU
 Average number of interactions per crossing and pileup obtained from the fraction of null crossings
 Luminosity corrected in real time
Recorded luminosity
• Beam Loss Scintillators, acceptance determined relative to L0Calo
 Luminosity corrected for pileup
• LHC collision rate monitors (BRANs)
 Not yet calibrated, but in principle only used for cross-checking
• Combination gives delivered luminosity
• Recorded in Online archive, Run Database, LHC displays and logging, and LHC Programme Coordinator plots (delivered) for overall machine performance
Optimization scans are based on this combined luminosity
For offline: lumi triggers containing luminosity counters, “nanofied”
• Tool being finalized to obtain integrated luminosity on analyzed files
• Constantly at 1 kHz
• Careful when changing thresholds/prescaling on the sources of the lumi counters
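The extraction of the average number of interactions per crossing from the fraction of null crossings follows directly from Poisson statistics: P(0) = e^(-μ), so μ = -ln(f_empty). A minimal sketch (the 10% empty fraction is an illustrative number, not a measured one):

```python
import math

def mu_from_empty_fraction(f_empty: float) -> float:
    """Average number of visible interactions per bunch crossing,
    from the fraction of crossings with no visible interaction:
    P(0) = exp(-mu)  =>  mu = -ln(f_empty)."""
    return -math.log(f_empty)

# Illustrative: if 10% of crossings are empty, mu ~ 2.3 -- the same order
# as the pileup values quoted later in this report.
print(round(mu_from_empty_fraction(0.10), 2))  # 2.3
```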
http://lbrundb.cern.ch/ (“RunDB” on LHCb Welcome page)
• Tool for anybody in the collaboration to get a rough idea of the data collected
• Help/documentation should be linked
Van der Meer scans
• To a large extent automatic, with ODIN connected directly to the scan data received from the LHC in real time and flagging the scan steps in the data
 Allows easy offline analysis
Has allowed a first determination of length scales (LHC/VELO) and of absolute luminosity:
• Visible L0Calo cross-section of 60 ± 6 mb (preliminary)
• From MC: σ(L0Calo) = σ(L0) × 0.937 = 63.7 × 0.937 = 59.7 mb
• Many things still to be verified; another vdM scan is in our planning
 Also allows another method to extract beam shapes and VELO resolution
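For reference, the standard van der Meer relation behind such a cross-section determination is σ_vis = R_peak / L_peak, with L_peak = f_rev·N1·N2 / (2π Σx Σy) per colliding pair for Gaussian beams. A hedged sketch; all numerical inputs below are illustrative, not the actual 2010 scan values:

```python
import math

LHC_F_REV_HZ = 11245.0  # LHC revolution frequency

def sigma_vis_mb(rate_peak_hz, cap_sigma_x_m, cap_sigma_y_m, n1, n2):
    """Visible cross-section from a vdM scan (one colliding pair, Gaussian
    beams): L_peak = f_rev*N1*N2 / (2*pi*Sigma_x*Sigma_y),
    sigma_vis = R_peak / L_peak, converted from m^2 to mb (1 mb = 1e-31 m^2)."""
    lumi = LHC_F_REV_HZ * n1 * n2 / (2 * math.pi * cap_sigma_x_m * cap_sigma_y_m)
    return rate_peak_hz / lumi / 1e-31

# Illustrative inputs: 60 um scan widths, 1e10 protons per bunch, 298 Hz
# peak trigger rate give a cross-section of order 60 mb.
print(sigma_vis_mb(298.0, 60e-6, 60e-6, 1e10, 1e10))
```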
Access to experiment condition archive in the online system • Machine settings
• Beam parameters measured by machine and LHCb
• Backgrounds measured by machine and LHCb
• Trigger rates, luminosities, VELO luminous region, bunch profiles
• Run performance numbers, etc
Tool also produces LPC files for luminosity, luminous region and bunch profile data
1. Arrived at a dead-end with Qbunch ~ 2E10 (max 4-5E10)
2. More to understand with increasing Qbunch than Nbunch
3. Summer months with not all experts present
4. Keep up luminosity ladder for this year
June 9 - June 25 (16 days!)
7x7@5E10
13x13@2E10
Increasing number of nominal bunches through July–August
• 170 kJ → 1.5 MJ
• Gain experience
• Understand already strange bunch/beam behaviour
• LHC Operation does not feel ready for 0.5 – 1 MJ yet, work in progress
Scheme  Qbunch  Bunches  Colliding in LHCb  Stored energy (kJ)  Peak lumi  Int. lumi nb-1 (fills)
2x2     1e11    2        1                  112                 2.5E29     0.005 (1 fill)
3x3     1e11    3        2                  168                 5.0E29     0.03 (3 fills)
6x6     1e11    6        4                  336                 1.0E30     0.7 (10 fills)
12x12   1e11    12       8                  672                 2.0E30     2.1 (10 fills)
24x24   1e11    24       16                 1344                4.0E30     4.9 (10 fills)
Trains needed…
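The luminosity column of the ladder above is consistent with a simple scaling from the 2x2 row: at fixed optics, peak luminosity grows linearly with the number of colliding pairs and quadratically with bunch charge. A sketch of that scaling (reference values taken from the table; function name is ours):

```python
def lumi_scaled(n_colliding: int, n_per_bunch: float,
                lumi_ref: float = 2.5e29, n_ref: float = 1e11) -> float:
    """Peak luminosity scaled from a reference configuration
    (one colliding pair at n_ref): L ~ n_colliding * N^2
    at fixed beta* and emittance."""
    return lumi_ref * n_colliding * (n_per_bunch / n_ref) ** 2

# 16 colliding pairs at 1e11 protons/bunch reproduces the 24x24 row: 4.0E30.
print(lumi_scaled(16, 1e11))
```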
Complete two-day internal review of Machine and Experiment Protection
• >1.5 (3) MJ
• Long list of actions
• Will be followed by a complete external review
Dump following lightning strike and power blackout!
Four fills with 3x3
Fill   Qbunch    L0Calo   Pileup   Peak lumi   Efficiency
1179   0.8E11    7500     1.2      0.15        78% (VELO lumi-monitoring/BPM/new conf)
1182   0.9E11    16000    1.7      0.46        68% (deadtime, HLT blocked)
1185   1.15E11   19300    2.3      0.73        85% (RICH, VELO,
1186   –         10000    1.3      0.22        to be patched (wrong fill number but stable)
1188   –         16000    1.7      0.46        65% (Storage, HLT, VELO; trigger OK)
Rocky start!...
Old L0 settings + HLT1 + HLT2Express (stable but 15% deadtime)
Reconfiguring: new L0 settings + HLT1 + HLT2Full (30 min)
Memory and combinatorics – run died and tasks stuck…
2 hours to recover/reconfigure New L0 + HLT1 + HLT2Express
Completely stable through entire night
We’ve been sailing in light breeze up to now
Not only interaction pileup but also problem pileup
• Pileup 2.3!
• Occupancies, e.g. problem with MTU size for UKL1
• Event size 85 kB (used to be 35 kB)
• Storage backpressure
 Running with 10% – 20% deadtime at 1500 – 2000 Hz at 85 kB (peak!)
 Suspicion is that the MD5 checksum calculation limits output (again) to 1 Gb/s
• Lurking instabilities in weak individual electronics boards?
 Desynchronizations, data corruption, strange errors at beginning of fills…
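The storage-backpressure suspicion is easy to check with back-of-the-envelope bandwidth arithmetic: the quoted rate and event size exceed a 1 Gb/s checksum-limited output. A minimal sketch:

```python
def output_rate_gbps(event_rate_hz: float, event_size_kb: float) -> float:
    """Required storage output bandwidth in Gb/s for a given
    trigger rate and event size (1 kB = 1000 bytes here)."""
    return event_rate_hz * event_size_kb * 1e3 * 8 / 1e9

# 2000 Hz of 85 kB events needs ~1.36 Gb/s, above a 1 Gb/s limit --
# consistent with the observed 10-20% deadtime from backpressure.
print(round(output_rate_gbps(2000, 85), 2))  # 1.36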
Peak occupancies 22%! Average >7.5% as compared to 5% in the past
(0x2710 → 0x1F)
• L0-MB (CALO, MUON, minbias, SPD, SPD40, PU, PU20): prescaled by 100
• Physics thresholds:
 Electron    700 MeV    → 1400 MeV
 Hadron      1220 MeV   → 2260 MeV
 Muon        320 MeV    → 1000 MeV
 Dimuon      320/80 MeV → 400 MeV
 Photon      2400 MeV   → 2400 MeV
Yet another configuration prepared
• L0×HLT1 retention 2%; including HLT2 would allow going to 200 kHz
• Would prefer not to use it, even if we have to run with a bit of deadtime
Changed to solve the 10% – 20% deadtime problem
• System completely stable with deadtime, but long to stop in case of problems…
10 kHz of random bb-crossings and be-, eb-, ee-crossings according to
• Weighting {bb: 0.7, eb: 0.15, be: 0.1, ee: 0.05}
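The weighted random trigger described above amounts to sampling crossing types with fixed probabilities. A minimal sketch of such a weighted draw (the function name is ours; the weights are the ones quoted):

```python
import random

# Crossing-type weights as quoted: bb (beam-beam), eb/be (single beam), ee (empty).
WEIGHTS = {"bb": 0.7, "eb": 0.15, "be": 0.1, "ee": 0.05}

def sample_crossing_type(rng: random.Random) -> str:
    """Draw one crossing type for the random trigger according to the
    configured weights."""
    types, weights = zip(*WEIGHTS.items())
    return rng.choices(types, weights=weights, k=1)[0]

rng = random.Random(2010)
draws = [sample_crossing_type(rng) for _ in range(10000)]
print(draws.count("bb") / len(draws))  # ~0.7
```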
Technical problems in HLT• HLT1 (3D) OK with 7.5 % retention
• HLT2Express stable but contains only J/ψ, φ, KS, Ds, D* → D0, BeamHalo
• HLT2Full (150++ lines): serious problems and surely a lot of unnecessary overlap
• HLT2Core (81 lines): validated with FEST and with data taken during the weekend
Configured in pass-through now to test it and check the output before we have to switch on rejection above 6x6
Best compromise we have for the moment, together with L0 TCK 0x1F
 First impression is that it was working stably during the fill this night
• Processing time for HLT with HLT2Express observed to be 140 ms…
 450 nodes × 8 tasks × 1/140E-3 = 26 kHz! To be followed up
 Should see how this develops with HLT2Core during tonight’s fill
• Two measures to solve the bad memory behaviour and stuck tasks partly already done
 Activating swap space on the local disk of the farm nodes improved the situation significantly
 Automatic script prepared which would kill the leader
 Requires careful tuning and testing since the memory spread is narrow
Memory/disk in Westmere machines?
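The 26 kHz figure above is a simple capacity estimate: total number of HLT tasks divided by the per-event processing time. A sketch of that arithmetic:

```python
def hlt_capacity_khz(n_nodes: int, tasks_per_node: int,
                     time_per_event_s: float) -> float:
    """Maximum sustainable HLT input rate in kHz: each task processes
    one event per processing time, all tasks running in parallel."""
    return n_nodes * tasks_per_node / time_per_event_s / 1e3

# 450 nodes x 8 tasks at 140 ms/event, as quoted above:
print(round(hlt_capacity_khz(450, 8, 0.140)))  # 26 (kHz)
```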
We managed to take a lot of data containing full natural mixture of pileup• Invaluable for testing, validating and debugging HLT• Lucky we got nominal intensity now with few bunches!...
We aim hard to be flexible and should keep this spirit
• But converge quickly on a compromise between physics and technical limitations
 Most of all, solve bugs and tune the system
• Avoid cornering ourselves in phase space now by panicking into severe cuts
• Exploring and understanding is now or never
Procedure for release of new TCKs now works well and efficiently
• But should not be abused!
FEST is an indispensable tool for testing/debugging/validating the HLT
• Make sure it satisfies needs for the future
• More HLT real-time diagnostics tools to be developed
Effect of L0 derandomizer and trains…
• No proper emulation for the Beetle, and we are forced to exploit only half of the buffer
• We currently accept all crossings… Filling scheme for autumn → 25% L0 deadtime
Two possibilities to reduce luminosity per bunch
• Back off on β*
 Requires several days to a week of machine commissioning
• Collision offset in the vertical plane
 Beam-beam interaction with an offset between the beams can result in emittance growth
 Follow ongoing tests for ALICE to reduce luminosity by a factor 30
• Hoped for detailed news from the ALICE beam offset tests
 Attempt during end-of-fill study this morning, but not completed due to control software
HOT NEWS while I was in the plane: Seems to work fine
Daily Run Meeting, ~30 minutes
• EVO every day
• Chaired by Run Chief
• 24h summary with Run Summary attached to the agenda (lumi, efficiency, beam, background)
• LHC status and plan
• Round table where experts comment on problems
• Internal Plan of the Day
Minutes from the Run Meeting and other postings on Run News serve two purposes
• Expert follow-up on problems
• Inform the collaboration about daily operation – strive for public language in the 24h summary and the plan for the next 24h
Improve
• Systematic follow-up on data quality
• Check lists
• Checkup on Piquet routines
• Invite more Run Chiefs – already discussed with several candidates
• Meetings three days a week when we are ready for this (Monday – Wednesday – Friday)
Requires more discipline from piquets and efficient exchange of information directly with involved people
• Synchronize piquets take-over with overlaps
http://lhcbproject.web.cern.ch/lhcbproject/online/comet/Online/• (“Status” from LHCb welcome page)
1.Shifter Intro
1. Introduction
2. Pit Area 8 – LHCb
3. Control Room
4. Cavern
5. Access to Cavern
6. Shift Organization
7. Safety
8. Calling Experts
9. Coordinators
10.Experts
11.Online computers
12.Shifter Duties
13.Shift Logbook
14.LHCb Status
15.LHC Status
16.LHC Logbook
17.Documentation
18.Conclusion
1. Introduction for LHCb Shifters
2. SLIMOS
1. Role of SLIMOS
2. Safety Systems
3. Level 3 Alarms
4. L3 Alarm and fire brigade
5. L3 and SLIMOS duties
6. Emergency Panel
7. Detector Safety System
8. DSS Panel
9. Contacts
2. SLIMOS
3. Basic Concepts
1. Introduction
2. LHCb at LHC
3. Coordinate Systems
4. Insertion Region 8
5. Injection
6. Filling Schemes
7. Collimation
8. Beam Dump
9. Fill Procedure
10.Crossing Angle
11.Timing
12.LHCb Detector
13.Readout System
14.Trigger
15.Luminosity
16.Backgrounds
3. Basic Concepts for Shifters
4. Running LHCb
1. Introduction
2. Running LHCb
3. Operational Phases
4. Shifter Interfaces
5. LHC Page 1
6. LHCb Overview
7. LHC/LHC Op.View
8. Intensity&Luminosity
9. Backgrounds
10.LHCb Beam Dumps
11.Beam Pos.Monitor
12.Timing
13.Trigger Rates
14.Run Change
15.Run Performance
16.Experiment Status
17.Magnet
18.Cavern Radiation
4. Running LHCb
5. Data Manager
1. Introduction
2. Data Manager Duties
3. Quality Checking
4. Problem Reporting
5. Data Monitoring
6. Histogram Presenter
7. Trend Presenter
8. Event Display
9. Run & File Status
10.Problem Database
11.Logbook
5. Data Manager
6. Shift Leader
1. Introduction
2. SL Duties
3. Golden Rules
4. Operational Procedure
5. Mode Handshakes
6. Cold Start
7. LHCb State Control
8. Clock Switching
9. End of Fill
10.Machine Development
11.Run Control
12.System Allocation
13.System Configuration
14.Run/File Status
15.Farm Node Status
16.Dead Time
17.Error
18.Slow Control
19.Access
6. Shift Leader
Shifter Training
• Completely overhauled and updated training slides
• Refresher course now as well
 With EVO in future
 Invite piquets to go through Shifter Histograms with Data Managers
• Insist more on newcomers taking shifts together with already experienced shifters
In my view the experiment consists of three levels of activities:
1. Maintaining and developing everything from electronics to the last bit of software in the common interest of the experiment
2. Producing the data we use for analysis, basically carried out by four types of shifters: Shift Leader, Data Manager, Production Manager, Data Quality checker
3. Consuming the data and producing physics results
• Activities 1 and 2 should not be compared and counted in the same “sum”
• Activities 2 and 3 are instead coupled:
 “I contribute to producing the data that I analyze”
• Huge benefit in taking regular shifts: learn about data quality, and have the opportunity to discuss and exchange information about problems met in your analysis of real data
Shifter situation “Far from satisfactory” – what does it mean?
• It means that “the situation is vital to improve” by:
1. Maintaining current commitments
2. And making an additional effort which is relatively modest when spread across all of LHCb!
Shifter model based on the idea of “volunteers”
• Not synonymous with “offering a favour” to the people heavily involved in operating LHCb
• Based on the idea of feeling responsible, in particular for your own data
• We need people interested in learning about the detectors and the data they are hopefully going to use
 Each group would normally find the representatives themselves, to a large extent also via an Experiment Link Person
• Why this model? Because we have neither the tools, nor the time and strength, to be bureaucratic
• However, up to now not sufficiently clear on the size of the required commitments
November 2009 – July 2010          #/24h   #Shifters   #Shifts
• Active Shift Leaders              3       30          660
• Active Data Managers              3       61 (– Dec)  564
• Active Production Managers        2       27          408
• Active Data Quality Checkers      1       11          13
 Total                             9       129         1768
November 2009 – July 2010
[Charts (slides 35–37): current normalized shift contribution per institute, November 2009 – July 2010; shift coverage plots: Nov 09 – July 10 (3/24h), Nov 09 – July 10 (3/24h), Nov 09 – Dec 10 (2/24h), Nov 09 – July 10 (1/24h)]
Assuming
• Perfect uniform availability (no exclusion of weekends, nights)
• Immediate replacement of people leaving and no lag in training new people
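Under these idealized assumptions, the required shifter pool follows from simple arithmetic: slots per day times days per year, divided by the shifts one person takes per year. A sketch (the ~30 shifts/person/year load is an illustrative assumption, roughly the “4-6 shifts every 2 months” quoted below):

```python
def shifts_per_year(shifters_per_day: int) -> int:
    """Total shift slots to fill per year for round-the-clock coverage."""
    return shifters_per_day * 365

def pool_size_needed(shifters_per_day: int, shifts_per_person_year: float) -> float:
    """People needed under idealized assumptions: perfect uniform
    availability and immediate replacement (no training lag)."""
    return shifts_per_year(shifters_per_day) / shifts_per_person_year

# 9 shifter slots per 24h (as in the Nov 09 - July 10 table) at an
# illustrative ~30 shifts/person/year:
print(round(pool_size_needed(9, 30)))  # 110
```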
Change in subdetector piquet coverage: increasingly assured by non-experts instead of experts
Should free the people with the ideal profile for Shift Leader shifts this year
“One available shifter taking 4-6 shifts every 2 months per 3 authors”,
Recruited 2010–2011 from:
1. Shift Leaders: a pool of 50–100 people with experience in commissioning/operation of LHCb
2. Data Managers: all authors doing physics analysis
3. Production Managers: a pool of 50–100 people with experience with analysis on the Grid
4. Data Quality: all authors doing physics analysis
Experiment Conditions are good, machine is very clean
Data Quality
• Requires fast reaction time and feedback/good communication with offline
• Establish the habit and routine
• No offline Data Quality now for two weeks!
Finding the appropriate compromise for the trigger is of the absolute highest priority, together with solving the technical issues
• Dedicate time/luminosity intelligently now
System stability: individually good, but multiplied by the number of systems…
• Make everybody aware of the need to react to any anomaly and act quickly
• Big step from 10 years of MC to real data
Masochistic exercise to produce shifter statistics
• Need improvements and functions in the ShiftDB tool
Great team work, spirit and perseverance • Join us to produce Your data!
LHC bunch evolution until end of August
• Up to 24 bunches with 16 colliding in LHCb = 1.55 MJ/beam
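The 1.55 MJ figure is the stored beam energy, number of protons times energy per proton. A quick check, assuming a nominal bunch intensity of 1.15e11 protons at 3.5 TeV (1 TeV = 1.602e-7 J):

```python
def stored_energy_mj(n_bunches: int, protons_per_bunch: float,
                     beam_energy_tev: float) -> float:
    """Stored energy of one beam in MJ: number of protons times
    energy per proton (1 TeV = 1.602e-7 J)."""
    return n_bunches * protons_per_bunch * beam_energy_tev * 1.602e-7 / 1e6

# 24 bunches of 1.15e11 protons at 3.5 TeV: ~1.55 MJ per beam.
print(stored_energy_mj(24, 1.15e11, 3.5))
```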
Regular opportunities for access up to now
 OT opened 3 times to change an FE box
• Impact on data quality
Procedure for filing access requests and handling works well
• Taken care of very well by shifters, Run Chiefs and Access Piquet/RPs
Issue:
• Still no instruments for measuring radioactivity in a magnetic field!
 Complicates accesses where in principle the magnet could be left on