CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08


Page 1: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+CPass0/CPass1 on LHC12f/e/d/c, updated at 10:00 on 28/08

C. Zampolli

Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.

(S. Beckett)

Page 2: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+To be followed up

LHC12f being processed smoothly

LHC12e done, waiting for LHC12d

LHC12d being processed with Rev-22; failures in T0 and TRD

LHC12c manual update ongoing

pA MC test on LHC11f2 ongoing (merging will be submitted today)

pA data run 187338: done at CPass0, but no info in MonALISA, and not appearing in the CPass1 snapshot

The pA pilot run CTP will have an Aliases file with everything defined as kCalibBarrel (to my understanding)

8/28/12 C. Zampolli

Page 3: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+To be followed up – II

Number of kCalibBarrel triggers to be published in the logbook (waiting for news, promised by the end of August)

Validation codes from the detectors: still missing for TPC and MeanVertex; they will help for short runs failing in TRD due to statistics

Replicating the OCDB entries: new functionality implemented by Raffaele. How should it be implemented in the calib code? As a member of AliCDBManager? Otherwise everybody will have to change their code…

Reprocessing of old runs: trigger classes defined, Aliases file to be created, then downscaling

Better selection of runs? Difficult in my opinion

Page 4: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Some diagnostics

During the last 2 months of CPass0/CPass1 processing, (quite) some manual intervention was needed:

  Fixing steering macros/scripts

  Restarting CPass0 and/or CPass1

  Triggering CPass0 and/or CPass1 manually

Main reasons (to my memory… I might forget something):

  Wrong AddTaskTPCcalib.C committed to the release by mistake during synchronization

  Merging of syswatch trees not properly tested and consuming too much memory

  TPC: wrong OCDB update in makeOCDB.C macro for CPass1

  Wrong TPC gain threshold used for validation

Page 5: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Some diagnostics – II

Reprocessing of LHC12d due to a bug in the TRD reconstruction

Re-reprocessing of LHC12d due to a problem with TRD code in Rev-23

Some LHC12e runs to be reprocessed after a fix in the aliases files due to “miscommunication” (mis = missing + wrong) between TRD, RC, Trigger, calibration

CPass1 manual triggering for runs failed in T0 at CPass0 (1 done, 20 to be done)

CPass1 manual triggering for a run for which CPass0 was merged manually (Raphaelle)

CNAF disk full ALICE::CERN::T0 issue

Page 6: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Two more comments…

As already said in July, no modification in AliRoot that may affect the calibration should be requested to be ported to the Release if not properly tested in the calibration train on the grid

  I cannot know whether changes may affect the calibration; the detector experts should

Since apparently it is not enough to show updates at the usual meetings (Monday Offline, Tuesday RC, Thursday Offline Calibration Readiness and Friday Calibration), I think it would be important that:

  One person representing all the detectors taking part in CPass0/CPass1 should always be present at the calibration meetings

  If the directly responsible person(s) is not available, someone representing the corresponding detector should participate anyway, to propagate the information discussed there

Page 7: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+How to decide when to process a run

Currently, we process runs marked as good (DAQ flag), with duration > 5 min, GRP ok, and with Beam

Could this be improved? Hard to say… Not on the offline side at least…
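The selection criteria on this slide can be read as a simple predicate on per-run metadata. Below is a minimal sketch in Python, assuming a hypothetical `Run` record: the class and its field names are illustrative only and do not correspond to the actual logbook schema or to any AliRoot code.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical per-run record; field names are illustrative only.
    number: int
    daq_good: bool        # run marked as good (DAQ flag)
    duration_min: float   # run duration in minutes
    grp_ok: bool          # GRP ok
    with_beam: bool       # run taken with beam


MIN_DURATION_MIN = 5.0    # "duration > 5 min" from the slide


def should_process(run: Run) -> bool:
    """Return True if the run passes all four criteria listed on the slide."""
    return (run.daq_good
            and run.duration_min > MIN_DURATION_MIN
            and run.grp_ok
            and run.with_beam)


# Made-up example runs (the run numbers are placeholders, not real runs).
runs = [
    Run(1, True, 12.0, True, True),    # passes every criterion
    Run(2, True, 4.0, True, True),     # too short
    Run(3, True, 30.0, True, False),   # no beam
]
selected = [r.number for r in runs if should_process(r)]
```

With the placeholder runs above, only the first one is selected. Note that the duration cut is strict, so a run of exactly 5 minutes is rejected in this sketch.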

Page 8: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+LHC12f


Page 9: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – LHC12f

69 runs in logbook

Filters used: LHC12f, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0], with Beam

CPass0: Snapshot: 69; Reco+CalibTrain: 69; Merging+OCDB: 69, 1 of which running

CPass1: Snapshot: 49; Reco+CalibTrain: 49; Merging+OCDB: 44

Page 10: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12f

COSMICS: 0 (failure expected)

EMCAL/PHOS/MUON: 13 (failure expected)

No triggers: 0 (failure expected, too short run)

EE/EV/Expired: 0 (memory issue during the merging, under investigation)

Running: 1

Others (detectors): 5 (but all short runs)

Successful: 55, but 1 (187338) has no logs in MonALISA

Success rate: 55/(55+5) = 91.7%
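The success rate quoted on this slide counts only successful runs and unexpected failures; the categories marked "failure expected" and the running job are left out of the denominator. A minimal sketch of that arithmetic in Python follows; the function name `success_rate` is an assumption introduced here for illustration.

```python
def success_rate(successful: int, *failed: int) -> float:
    """Percentage of successful runs among successful plus counted failures.

    Only the failure categories passed in `failed` enter the denominator,
    mirroring the slide formula 55/(55+5).
    """
    total = successful + sum(failed)
    if total == 0:
        raise ValueError("no runs to evaluate")
    return 100.0 * successful / total


# LHC12f CPass0: 55 successful, 5 detector failures.
lhc12f = success_rate(55, 5)
print(f"LHC12f CPass0: {lhc12f:.1f}%")  # prints "LHC12f CPass0: 91.7%"
```

Passing several failure counts, as in `success_rate(147, 28, 1)`, covers the case where more than one failure category is counted.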

Page 11: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12f

Failure reason: TRD (5)

Run numbers: 186694, 186816, 186855, 187147, 187148

All failures due to too short runs (number of events/tracks in terms of events used by TRD calibration):

12 min, 4874 events / 43825 tracks

6 min, 11111 events / 114733 tracks

7 min, 11242 events / 107505 tracks

7 min, 11089 events / 138408 tracks

5 min, 11138 events / 154946 tracks

Page 12: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12f

Failure reason: EMCAL/MUON/PHOS runs (13)

Run numbers:

186805

186834

186926

186962

186980

186981

187046

187064

187081

187117

187133

187193

187198


Page 13: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass1 – LHC12f

Of the 55 successful runs: 49 at CPass1 reco+CalibTrain, 44 at CPass1 merging+OCDB

Page 14: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+LHC12e


Page 15: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – LHC12e

27 runs in logbook

Filters used: LHC12e, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]

CPass0, completed: Snapshot: 27; Reco+CalibTrain: 27; Merging+OCDB: 27, 21 useful, 14 ok

CPass1, completed: Snapshot: 15; Reco+CalibTrain: 15; Merging+OCDB: 15

Page 16: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e

COSMICS: 0 (failure expected)

EMCAL/PHOS/MUON: 6 (failure expected)

No triggers: 0 (failure expected, too short run)

EE/EV/Expired: 0 (memory issue during the merging, under investigation)

Running: 0

Others (detectors): 10, of which 3 recovered so far for TRD, 7 remaining

Successful: 11, became 14

Success rate: 11/(11+10) = 52.4%, became 14/(14+7) = 66.7%

Page 17: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e

Failure reason: TRD (8)

186428 (*), 186429 (*), 186453 (*), 186456 (**), 186459 (**), 186507 (*), 186508 (**), 186598 (*)

Failure reason: TRD + T0 (1)

186600 (**)

Failure reason: T0 (1)

186601

TRD:

(*) suffered from a missing class (CSPI8WU-S-NOPF-ALL) in the configuration during data taking. Fixed manually using CINT8WU-S-NOPF-ALL; CPass0/1 should be re-run

(**) suffered from statistics (186459 has CSPI8WU-S-NOPF-ALL, but with zero triggers)

T0: suffers from high background, but limits will be increased. Re-running will be ok (but CPass1 should be triggered manually if Rev < Rev-23 will be used)

14 min, events

14 min, events

14 min, events

Page 18: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e – REPROCESSING

Failure reason: TRD (5)

186428, 186429, 186453, 186507, 186598

Failed (statistics)

Ok

Failure reason: T0 (1)

186601

CPass1 re-run! Failing again in CPass1 as expected, but T0 experts already fixed the OCDB

Page 19: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e

Failure reason: EMCAL/MUON/PHOS runs (6)

Run numbers:

186383

186405

186425

186448

186503

186589


Page 20: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass1 – LHC12e

Of the 14 successful runs, 15 at CPass1 (one more since 186601 was inserted manually!): 15 at the snapshot, 15 at CPass1 reco+CalibTrain, 15 at CPass1 merging+OCDB

Page 21: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Actions

COMPLETED

Since the period was too short, the manual update should be done together with LHC12d; waiting for this period to be completed

Page 22: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+LHC12d


Page 23: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – LHC12d

224 runs in logbook

Filters used: LHC12d, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]

CPass0 completed: Snapshot: 220; Reco+CalibTrain: 220; Merging+OCDB: 220, 176 needed, 147 ok

CPass1 completed: Snapshot: 148 (1 more than CPass0, triggered manually after CPass0); Reco+CalibTrain: 148; Merging+OCDB: 148, 148 needed

Page 24: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Difference between logbook and snapshot in MonALISA

In logbook, but not in MonALISA: 184370 (EMCAL), 184645 (EMCAL), 185345 (ACORDE trigger), 185347 (ACORDE trigger), 185467 (still in the migration process, checking with offline)

In MonALISA, but not in the logbook: 185190 (short run, the quality flag was changed)
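The comparison on this slide amounts to two set differences between run lists. The counts are consistent with the LHC12d summary: 224 runs in the logbook, minus 5 missing from MonALISA, plus 1 extra, gives the 220 runs in the snapshot. A small sketch in Python follows. The two sets are truncated samples: they hold the mismatching runs from the slide plus two runs (184880 and 185460) that are assumed here, for illustration only, to be present in both lists.

```python
# Truncated, illustrative run lists. The real lists would come from the
# logbook query and the MonALISA snapshot; here they are typed in by hand.
logbook = {184370, 184645, 185345, 185347, 185467, 184880, 185460}
monalisa = {185190, 184880, 185460}

# Runs present in one source but not in the other.
only_logbook = sorted(logbook - monalisa)
only_monalisa = sorted(monalisa - logbook)

print("In logbook, but not in MonALISA:", only_logbook)
print("In MonALISA, but not in logbook:", only_monalisa)

# Cross-check against the summary counts quoted for LHC12d.
n_logbook, n_snapshot = 224, 220
assert n_logbook - 5 + 1 == n_snapshot
```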

Page 25: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d

COSMICS: 9 (failure expected)

EMCAL/PHOS/MUON: 33 (failure expected)

No triggers: 2 (failure expected, too short run)

EE/EV/Expired: 1 (memory issue during the merging, but then merged manually)

Running: 0

Others (detectors): 28

Successful: 147

Success rate: 147/(147+28+1) = 83.5%

Page 26: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d

Failure reason: TPC Gain Threshold (1)

185460

Failure reason: COSMICS (9)

184880

184882

184885

184886

184889

184910

184914

184918

186264

Also TRD

16 recovered by rerunning with looser constraints for validation (run 185460 not retried, since it failed anyway in TRD)

Page 27: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d

Failure reason: T0 (20)

185687, 185692, 185695, 185697, 185698, 185699, 185700, 185701, 185734, 185735, 185738, 185756, 185757, 185764, 185765, 185768, 185775, 185776, 185778, 185784

Hardware problem, fixed now

Page 28: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d

Failure reason: EMCAL/MUON/PHOS runs (33)

Run numbers:

184443

184481

184663

184664

184709

184716

184719

184762

184780

185024

185148

185186

185341


185456

185559

185560

185562

185631

185647

185677

185731

185934

185994

185998

186036

186062

186063


186159

186192

186224

186225

186232

186316

Page 29: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d

Failure reason: No triggers (2)

183915, 185190

Failure reason: TRD (8)

184190, 185133, 185378, 185460 (also TPC), 185915, 185916, 186319, 186320

Failure reason: EV (1)

184673 (merged manually)

Page 30: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass1 – LHC12d

Of the 147 successful runs: 148 at CPass1 reco+CalibTrain

1 more than CPass0 since CPass0 was merged manually and the objects were uploaded manually in the OCDB (184673)

148 at CPass1 merging+OCDB, of which 147 successful (ignore the red TPC color) and 1 failed in TRD (184145)

Different statistics for CPass0 and CPass1: 480/480 chunks at CPass0, 472/480 chunks at CPass1

Page 31: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+TRD issue

Due to a problem in the TRD reconstruction, some wrong OCDB entries were produced at CPass0; it is not possible to get the correct ones without re-running CPass0

  Some manual OCDB update is needed (after LHC12d is fully processed, ongoing for completed runs): DONE

  Then CPass0/CPass1 should be re-run with a Rev > Rev-18

  Rev-23 (the latest) was used

  Changes in the TRD code made the calibration not work properly

  More tests, new re-running with Rev-22

Will the failed runs be recovered? Waiting for experts’ reply; still not known

Page 32: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Actions

CPass0 completed

20 runs failed at CPass0 due to T0 hardware problems

  CPass1 should be triggered manually for these runs

  To be done after reprocessing, since now it would be useless (they all contain TRD)

Re-running with Rev-22… ongoing

Page 33: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d – Failures after reprocessing

Failure reason: TRD (1)

184145

185378

185460

185916

12 min, 11490 events / 208981 tracks, had not failed before

Page 34: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d – Failures after reprocessing

Failure reason: T0 (20)

185687, 185692, 185695, 185697, 185698, 185699, 185700, 185701, 185734, 185735, 185738, 185756, 185757, 185764, 185765, 185768, 185775, 185776, 185778, 185784

Hardware problem, fixed now

Page 35: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+LHC12c


Page 36: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – LHC12c

205 runs in logbook

Filters used: LHC12c, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]

These do not coincide with those in MonALISA, since runs were queued manually for CPass0

CPass0 completed: Snapshot: 208, 1 should be ignored (179444); Reco+CalibTrain: 207; Merging+OCDB: 207, 109 needed, 93 ok

CPass1 completed: Snapshot: 93; Reco+CalibTrain: 93; Merging+OCDB: 93

Page 37: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

COSMICS: 37 (failure expected)

EMCAL/PHOS/MUON: 58 (failure expected)

No triggers: 3 (failure expected; too short, or not the right trigger configuration)

EE/EV/Expired: 0

Others (detectors): 16

Successful: 93

Success rate: 93/(93+16) = 85.3%

Page 38: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

Failure reason: COSMICS (37)

Run numbers:

179941

179943

179944

179946

179948

179950

179951

179960

180164

180979

180980

180981

180983

180984

180985


180986

180987

180988

180991

180992

182749

182750


179658

179712

179713

179717

179723

179725

179730

179736

179740

179742

179743

179746

179747

179758

179766

Page 39: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

Failure reason: EMCAL/MUON/PHOS runs (58)

Run numbers:

179595

179603

179604

179685

179687

180552

180559

180616

180643

180644

180692

180704


181026

181040

181046

181328

181339

181344

181360

181546

181558

Page 40: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

Failure reason: EMCAL/MUON/PHOS runs (58)

Run numbers:

181580

181625

181631

181954

181956

181984

182003

182094

182100

182103

182195

182198

182200

182226


182316

182403

182405

182410

182449

182451

182452

182470

182471

182475

182477

Page 41: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

Failure reason: EMCAL/MUON/PHOS runs (58)

Run numbers:

182499

182502

182504

182609

182610

182612

182640

182641

182681

182712

182717

182721

Page 42: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

Failure reason: No triggers (3)

180934, 181609, 182639

Failure reason: TRD (7)

180716 (*), 180717 (*), 182325 (*), 182509 (*), 182508 (*), 182513 (*), 182724 (*)

Failure reason: TPC+TRD (9)

181617 (**), 181618 (**), 181619 (**), 181620 (**), 181652 (**), 181694 (**), 181698 (**), 181701 (**), 181703 (**)

(*) Low statistics, recoverable

(*) Low statistics, not recoverable

(**) No SSD/SDD: number of contributors to Vertex Track = 0, TRD calibration failing; TRD fix in place; what about TPC?

Page 43: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Summary table – on 28/08 at ~10:00 – CPass1 – LHC12c

Of the 93 successful runs: 93 at CPass1 reco+CalibTrain, 93 at CPass1 merging+OCDB…

…of which 84 successful in CPass1 (ignore the red TPC color)…

…and 9 failed in T0, but are MUON runs: they should not have gone through (different AliRoot, some changes in T0)

As soon as CPass1 is completed, 1 week of time will be given for manual update. If too little (QM, holidays), we’ll increase it. Then, Vpass should start

Page 44: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Actions

CPass0 completed; 9 runs failed in TPC and TRD

  Not recoverable, no CPass1

7 runs failed in TRD due to low statistics

  TRD can recover them manually, but no CPass1 would be run after those. How will the other detectors mark these runs?

  TOF, T0 bad; Mean Vertex good; TRP? TRD?

CPass1 completed on the available runs

In summary, ready for the manual update window

1 week for the manual update announced: deadline on Friday 31 Aug (so far; possibly extended to Monday)

Page 45: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Further comments


Page 46: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Interdependencies

Under discussion: do EMCAL runs need calibration triggers? (PHOS does not) Seems not!

Page 47: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Further issues

Some reconstruction jobs fail with bad_alloc: under investigation

  Grid tests with gdb ongoing: not much information retrievable, the jobs ran successfully

  Valgrind test ongoing: did not show anything significant

  Trying with Rev-21 on LHC12c, LHC12e

  Many errors, but FPE, not bad_alloc; stack trace available

  I could not reproduce the problem, still investigating

Page 48: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+PPass

LHC12a and LHC12b Vpass validated: ready for Ppass

A patched Rev-16 was created to fix the TRD QA issue, to be used to run Ppass

LHC12a completed, QA feedback last week

LHC12b completed, QA feedback last week

Page 49: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+Calibration of old data

GRP/CTP/Aliases entries to be created, after defining the classes to be used for the reconstruction

Might be needed to apply some downscale: min(max(nevents/10, 30000), nevents)/nevents, but we need to define nevents
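The downscale expression on this slide keeps one tenth of the events, but never fewer than 30000, and never more than the run actually contains. A minimal sketch in Python evaluates it for a few run sizes. The function name `downscale_fraction` and its parameter names are assumptions introduced here, and `nevents` is simply taken as an input, since the slide leaves its definition open.

```python
def downscale_fraction(nevents: int, divisor: int = 10, floor: int = 30000) -> float:
    """Fraction of events to keep: min(max(nevents/divisor, floor), nevents) / nevents."""
    if nevents <= 0:
        raise ValueError("nevents must be positive")
    kept = min(max(nevents / divisor, floor), nevents)
    return kept / nevents


# Three regimes of the formula:
for n in (20_000, 100_000, 1_000_000):
    print(n, downscale_fraction(n))
# 20000   -> 1.0  (fewer than 30000 events: keep everything)
# 100000  -> 0.3  (the 30000-event floor applies)
# 1000000 -> 0.1  (plain 1/10 downscale)
```

The crossover between the floor and the plain 1/10 downscale sits at nevents = 300000, where both give 30000 kept events.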

Page 50: CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08

+pA

Since MB will be the main trigger, we propose to use that and downscale

For the pA pilot run, all data are asked to be reconstructed, keeping ESDs, friends, and ITS RecPoints

Tests on LHC11f2 ongoing; feedback will be asked