EDELWEISS data structure and analysis framework · 2014. 5. 21.

KIT – Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der Helmholtz-Gemeinschaft Benjamin Schmidt, June 2014 at MPI München www.kit.edu Photo by Böhringer Friedrich EDELWEISS data structure and analysis framework

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 2

    Motivation to build a new data structure and analysis framework (Kdata)

    We had:
    - Edw-II data analysis split between Ana and Era
    - 2 experts (full-time analysis), each with their own code: single-user (few local users), single-programmer
    - 2010: A. Cox and I struggled to find, access and analyze Edw-II data
    - Coincidence (muon-veto/bolometer) study as diploma work

    Era: ROOT based, but difficult to access; no server with the most recent code/data…
    Saclay
    Ana: Fortran, PAW and C; no PAW support; French comments in code/data…

    Task: Get the data

    J. Cham

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 3

    Motivation to build a new data structure and analysis framework (Kdata)

    Short term: facilitate data access
    - Build a flexible event-based data structure
    - Single combined HLA file: muon-veto and bolometer data
    - Make code and data easily available; documentation

    Long term: establish a common collaboration-wide analysis and data storage tool
    - Share tasks (calibration, template creation, …) / remove barriers (documentation)
    - Allow for an upgrade to 100s of detectors: develop an automatic processing scheme

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 4

    The general picture – the idea

    All software modules (diagram labels):
    - KDS: data structure
    - KPTA: pulse trace analysis; Kamping
    - Data levels: Raw, Amp, HLA
    - Analysis: KDataPy, KQPA
    - DAQ: KSamba
    - ampToHLA
    - A bit special: standalone code, extensive use of templates

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 5

    Specific known and unknown requirements during Kdata development

    Requirements for Edw-3:
    - 10 -> 40 detectors
    - Larger workload for debugging, calibration and analysis
    - New detector design (channel number/specifics initially unknown)
    - New electronics (some specifics unknown)
      - First time-resolved ionization signals (trace length? number of traces?)
      - Change in analog amplifiers -> signal shape? trace length? sampling?
      - New efforts to optimize the signal treatment needed
    - Integrate the muon veto into the bolometer DAQ

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 6

    Event-based data storage: Kdata implementation

    The idea:
    - Build a data storage and analysis framework; use ROOT for event-based physics data
      - Fast I/O
      - Support for the LHC lifetime
      - Data compression
      - Statistics tools
      - Well known
    - C++ class library for data encapsulation
      - Keep it modular
      - Keep it flexible and general
      - Try to keep it simple
      - Keep a fully split tree (library independent)
    - Document it
    - Make it easily accessible

    Repository:

    https://edwdev-ik.fzk.de/SVN_Repository_for_the_KIT_Dark_Matter_Group/KData.html

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 7

    Kdata event structure in detail

    - Use ROOT types
    - No nested arrays
    - The Kdata library is not needed to read the data
      - Longevity of the data guaranteed
    - Kdata is coded consistently with the ROOT and Taligent coding style:
      - Easier to read/collaborate/check code
      - For example:
        - classes are defined in a header (.h) and implemented in .cxx
        - member variables start with a lowercase f (fChannelName, fAmp, fExtra, …)
        - functions start with a capital letter: GetChannelName(), GetTrace(), …
    - Kds is completely implemented with Get…() and Set…() methods
      -> Tab completion (ipython, ROOT session)

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 8

    Kdata event structure in detail

    - ROOT TTree with a single event branch
    - Event with a flexible structure:
      - Variable-sized TClonesArrays for Bolometer, BoloPulse, PulseAnalysis, Samba and MuonModule information
      - Allows hardware changes (number of bolos, number of channels per bolo, …) without code changes in "kds" (the data structure source code)!
      - Requires some effort to get to know, though

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 9

    Kdata event structure Logic Layout:

    Diagram boxes: TTree, KEvent, KBolometerRecord, KBoloPulseRecord (= channel), KPulseAnalysisRecord, KSambaRecord, KMuonModuleRecords

    Logical event structure via TRef and TRefArray:
    - Very powerful: the references can be spread over files, …
    - A word of caution though: they require specific handling in event building
    - Never forget to reset the referenced object count with TProcessID::SetObjectCount; otherwise the file size blows up
    - Probably most bugs and problems in kds were related to TRef issues
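The reset mentioned above is usually done as a save-and-restore around each event: read the count with TProcessID::GetObjectCount() before building the event and hand it back to TProcessID::SetObjectCount() once the event is filled. The sketch below only mimics that pattern in pure Python so that it runs without ROOT. `FakeProcessID`, `build_event` and `fill_events` are hypothetical stand-ins, not part of kds or ROOT.

```python
class FakeProcessID:
    """Hypothetical stand-in for the object counter kept by ROOT's TProcessID."""
    _count = 0

    @classmethod
    def GetObjectCount(cls):
        return cls._count

    @classmethod
    def SetObjectCount(cls, n):
        cls._count = n

    @classmethod
    def AssignID(cls):
        # Every newly referenced object takes the next free number.
        cls._count += 1
        return cls._count


def build_event(n_referenced, pid=FakeProcessID):
    """Pretend to build one event that references n_referenced objects."""
    return [pid.AssignID() for _ in range(n_referenced)]


def fill_events(n_events, n_referenced, reset=True, pid=FakeProcessID):
    """Return the highest reference number used in each event."""
    highest = []
    for _ in range(n_events):
        saved = pid.GetObjectCount()       # remember the count before the event
        ids = build_event(n_referenced, pid)
        highest.append(max(ids))           # stands in for "fill the tree"
        if reset:
            pid.SetObjectCount(saved)      # restore it after the event is filled
    return highest


FakeProcessID.SetObjectCount(0)
with_reset = fill_events(3, 4, reset=True)
FakeProcessID.SetObjectCount(0)
without_reset = fill_events(3, 4, reset=False)
print(with_reset, without_reset)
```

With the reset, every event reuses the same small range of reference numbers; without it, the numbering keeps growing from event to event.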

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 10

    Kdata event structure Logic Layout:

    Diagram boxes: TTree, KEvent, KBolometerRecord, KBoloPulseRecord (= channel), KPulseAnalysisRecord, KSambaRecord, KMuonModuleRecords

    Looping in python:

        for event in filereader:
            for bolo in event.boloRecords():
                for pulse in bolo.pulseRecords():
                    for analysis in pulse.analysisRecords():
                        pass  # work on one analysis record here

    Looping C++ style in python:

        for i in range(f.GetEntries()):
            f.GetEntry(i)
            event = f.GetEvent()
            for ii in range(event.GetNumBolos()):
                bolo = event.GetBolo(ii)
                samba = bolo.GetSambaRecord()
                print(samba.GetNtpDateSec())
                for iii in range(bolo.GetNumPulseRecords()):
                    pulse = bolo.GetPulseRecord(iii)
                    trace = pulse.GetTrace()
                    # …

    One KPulseAnalysisRecord per analysis: bandpass analysis, optimal filter, trapezoidal filter, …
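The nested C++-style loop can be written once as a generator and reused. This is a minimal sketch, not KDataPy code: the `Fake…` classes are hypothetical stand-ins that expose only the accessor names quoted on the slides (GetNumBolos, GetBolo, GetNumPulseRecords, GetPulseRecord, GetChannelName), so the example runs without ROOT or KData.

```python
class FakePulse:
    """Hypothetical stand-in for a KBoloPulseRecord."""
    def __init__(self, channel):
        self._channel = channel

    def GetChannelName(self):
        return self._channel


class FakeBolo:
    """Hypothetical stand-in for a KBolometerRecord."""
    def __init__(self, pulses):
        self._pulses = pulses

    def GetNumPulseRecords(self):
        return len(self._pulses)

    def GetPulseRecord(self, i):
        return self._pulses[i]


class FakeEvent:
    """Hypothetical stand-in for a KEvent."""
    def __init__(self, bolos):
        self._bolos = bolos

    def GetNumBolos(self):
        return len(self._bolos)

    def GetBolo(self, i):
        return self._bolos[i]


def iter_pulses(event):
    """Yield (bolo, pulse) pairs for every pulse record of one event."""
    for i in range(event.GetNumBolos()):
        bolo = event.GetBolo(i)
        for j in range(bolo.GetNumPulseRecords()):
            yield bolo, bolo.GetPulseRecord(j)


event = FakeEvent([
    FakeBolo([FakePulse("chalA FID823"), FakePulse("chalB FID823")]),
    FakeBolo([FakePulse("slowD FID823")]),
])
names = [pulse.GetChannelName() for _, pulse in iter_pulses(event)]
print(names)
```

Yielding the bolometer together with the pulse keeps the parent record at hand, which is what a per-channel analysis usually needs.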

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 11

    Kdata event structure in detail

    The structure is subclassed into:
    - Raw: KRawEvent, KRawBolometerRecord, …
    - Amp: KAmpEvent, KAmpBolometerRecord, …
    - HLA: KHLAEvent, KHLABolometerRecord, …

    Raw: with pulse traces, no KPulseAnalysisRecords

    Amp and HLA: no pulse traces, but with KPulseAnalysisRecord

    With a quick calculation: 2.87 * 356/1850 * 2.35 -> FWHM 1.04 keV (Ana: 1.1 keV)

    < 1/10 of the raw file size

    ~ 1/2 of the samba file size

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 12

    Python and KDataPy

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 13

    simpleEventViewer output:

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 14

    Looping utilities – no need to write the looping/plotting

    - Use KDataPy.util with plotpulse(), looppulse(), loopbolo() and KDataPy.loop_amp with loopchannel(), plotchan_x(), plotchan_x_files(), plotchan_x_dir()
    - loop_amp is to be completed with plotchannel_xy(), … and loop/plotbolo functions. Note that KDataPy.util loopbolo() also works for Amp and HLA data
    - Basic usage:

        import ROOT
        import KDataPy.util as ut
        ut.plotpulse("/sps/edelweis/kdata/data/raw/nk23b002_000.root", "chalB FID823")

    - Documentation

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 15

    Our data acquisition chains revisited

    Diagram labels: Samba Macs, Muon Veto DAQ, Bolo-Raw data, Kdata - ROOT on kalinka, Our look-up place, Radon; sites: Modane, Lyon, Karlsruhe

    Automated:
    - proc0: copy to Lyon
    - proc1: rootification
    - proc2: raw -> amp
    - proc3: amp -> hla
    - proc4: merge/skim muon/hla bolo data
    - spsToHpss: backup on tape drive
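The ordered chain above lends itself to a small driver that always picks the first stage not yet done. The following is a sketch under assumptions: the stage names and their order come from the slide, while `next_stage`, `run_chain` and the handler callables are hypothetical and do not reproduce the actual processing scripts.

```python
# Stage names and order as listed on the slide; everything else is illustrative.
PROCESSING_CHAIN = [
    ("proc0", "copy to Lyon"),
    ("proc1", "rootification"),
    ("proc2", "raw -> amp"),
    ("proc3", "amp -> hla"),
    ("proc4", "merge/skim muon/hla bolo data"),
    ("spsToHpss", "backup on tape drive"),
]


def next_stage(done):
    """Return the first stage that is not yet marked as done, or None."""
    for name, _description in PROCESSING_CHAIN:
        if name not in done:
            return name
    return None


def run_chain(done, handlers):
    """Run the remaining stages in order.

    Stops at the first stage that has no handler or whose handler
    returns False, and returns the set of completed stages.
    """
    done = set(done)
    while True:
        stage = next_stage(done)
        if stage is None or stage not in handlers:
            return done
        if not handlers[stage]():
            return done
        done.add(stage)


# Hypothetical handlers: proc1 succeeds, proc2 fails.
handlers = {"proc1": lambda: True, "proc2": lambda: False}
completed = run_chain({"proc0"}, handlers)
print(sorted(completed), next_stage(completed))
```

Because a failing stage is never added to the completed set, rerunning the driver resumes at exactly that stage.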

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 16

    Using the Kdata pulse processing library

    Adam Cox our benevolent dictator for life

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 17

    The KPulseAnalysisChain

    The kpta-chain is applied before your analysis function

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 18

    Ionisation channel after pattern removal:

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 19

    Advantages – drawbacks (personal opinion)

    Advantages:
    - Flexibility of the data structure
    - Consistency of the data structure (over time)
    - Same data structure for different detector systems -> great for coincidence studies
    - Same data structure for different processing/analyses (bandpass, optimal filter, …)
    - Decouples high-level analyses from DAQ/processing changes
    - Independent kpta library
      - Has been reused with (flat) data from the EURECA test stand
      - Very versatile

    Drawbacks:
    - The flexibility of the data structure comes with some complexity (heaviness)
    - Especially TTree.Draw() is more complex
    - Single raw data folder -> restricted use of ls
    - Writing kpta with templates is a bit more complex

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 20

    Usage of python

    - 90 % of the time python feels like the right solution
      - Shorter, more legible code
      - Vast set of external libraries
      - Extremely handy for scripting
      - Basic documentation in python always via '''docstrings'''
    - Main price: speed
      - Circumvent by producing an additional set of data files skimmed by detector
      - Future use of pypy + ROOT6

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 21

    But 50x slower; PyPy JIT compile: 1.06x slower

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 22

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 23

    CouchDB for everything else and python to glue everything together

    - Automat database (117 parameters every 20 s)
    - dataDB
      - Samba header information: useful to find data under given conditions (temperature, voltage, run_type, …)
      - Processing state: history of processing/file location (complete documentation)
    - Supplementary processing databases
      - Templates, high-/lowpass filter parameters, cuts
    - Radon measurements
    - …
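To make "find data under conditions" concrete, here is a minimal sketch that uses plain Python dictionaries in place of CouchDB documents. All document ids, field names and values are hypothetical and only echo the conditions named above (temperature, voltage, run_type) plus a per-stage processing state; no CouchDB client call is shown.

```python
import json

# Hypothetical documents; the real dataDB schema is not shown in the slides.
run_docs = [
    {"_id": "example_run_1", "run_type": "calibration",
     "temperature_mK": 18.0, "voltage_V": 8.0,
     "status": {"proc0": "done", "proc1": "done"}},
    {"_id": "example_run_2", "run_type": "physics",
     "temperature_mK": 18.0, "voltage_V": 8.0,
     "status": {"proc0": "done"}},
    {"_id": "example_run_3", "run_type": "physics",
     "temperature_mK": 20.0, "voltage_V": 8.0,
     "status": {}},
]


def find_runs(docs, run_type=None, max_temperature_mK=None):
    """Return the ids of all documents that match the given conditions."""
    selected = []
    for doc in docs:
        if run_type is not None and doc["run_type"] != run_type:
            continue
        if (max_temperature_mK is not None
                and doc["temperature_mK"] > max_temperature_mK):
            continue
        selected.append(doc["_id"])
    return selected


def pending(docs, stage):
    """Return the ids of documents for which `stage` is not marked done."""
    return [d["_id"] for d in docs if d["status"].get(stage) != "done"]


physics_cold = find_runs(run_docs, run_type="physics", max_temperature_mK=19.0)
needs_proc1 = pending(run_docs, "proc1")
print(json.dumps({"physics_cold": physics_cold, "needs_proc1": needs_proc1}))
```

Keeping the processing state inside the same document as the run conditions is what lets one query answer both "which runs do I want" and "what still has to be processed".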

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 24

    A more complex example: heat template fitting code

    - Three python modules (all part of KDataPy!):
      1. templateFitSelection.py (loop over the data, select pulses, average parameters; call the other scripts)
      2. pulsetempy.py (perform the template fit)
      3. uploadAnalyticalTemplateToDB.py (save the fit parameters to the DB)
    - Usage:

        import KDataPy.TemplateFitSelection as tfit
        tfit.templateFitSelection('/sps/kdata/data/raw/nk23b002_000.root')
        tfit.run('chalB FID808')

    - Note that there are some more options, though!
    - The code itself is commented and should help to discover more options
    - Sorry: the (web) documentation has not been updated yet

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 25

    Basic looping once more

    - More verbose version: use the plotPulseEventViewer module in kanacodewok

        import plotPulseEventViewer as plt
        plt.plotPulseEventViewer('/sps/edelweis/kdata/data/raw/nk23b002_000.root', 'chalA FID823')

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 26

    More advanced usage

    - Hook in an analysis function
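The slide's own example was an image and is not in this transcript, so the following is only a generic sketch of the hook-in idea: the loop is written once and the user passes in a function that is called for every pulse. `loop_pulses` and `peak_to_peak` are hypothetical names and do not reproduce the KDataPy.util signatures; plain lists stand in for pulse traces.

```python
def loop_pulses(pulses, analysis_function, **kwargs):
    """Call analysis_function(pulse, **kwargs) for every pulse.

    Results that are not None are collected and returned, so the
    hook can also act as a selection.
    """
    results = []
    for pulse in pulses:
        out = analysis_function(pulse, **kwargs)
        if out is not None:
            results.append(out)
    return results


def peak_to_peak(pulse, threshold=0.0):
    """Example hook: peak-to-peak amplitude, or None if below threshold."""
    span = max(pulse) - min(pulse)
    return span if span > threshold else None


# Three toy traces standing in for pulse traces.
traces = [[0, 1, 5, 2], [0, 0, 1, 0], [3, 9, 4, 3]]
amplitudes = loop_pulses(traces, peak_to_peak, threshold=2)
print(amplitudes)
```

Passing extra keyword arguments through the loop keeps the hook configurable without touching the looping code.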

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 27

    Processing – some details

    Database driven:
    - Proc0: scp of samba raw data to ccage (Lyon)
      - Task 1: change the scp account to keitel (all tests finished: batch, hpss, …)
      - Task 2: add an md5 checksum test after the transfer
    - Proc1: rootification (Modane), scp to ccage (Lyon)
      - Task: transfer the rootification to Lyon
    - Proc2: processing and filtering
      - Template fitting tools with DB access implemented
      - Adaptation of the processing to 8 step function ionization channels
      - All data from November processed with KFeldbergKampSite (BW bandpass filter, all channels treated separately): sps/edelweis/kdata/data/amp/Run305
      - Task 1: automate using the DB and the redhook.sh script
      - Task 2: implement KSeebugKampSite (BW bandpass with simultaneous heat-ionization fits)
      - Task 3: (longer term) revive/debug the optimal filter KChamonixKAmpSite
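The md5 checksum test named as Task 2 under Proc0 could look roughly like the sketch below. It uses only the Python standard library and compares two local files; in the real chain the second checksum would have to be computed on the receiving host, which is not shown. The file names and helper functions are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, destination):
    """True if both files have the same md5 digest."""
    return md5sum(source) == md5sum(destination)


# Demonstration with a local copy standing in for the transfer.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.bin")
destination = os.path.join(workdir, "copy.bin")
with open(source, "wb") as handle:
    handle.write(b"example payload" * 1000)
shutil.copyfile(source, destination)
copy_ok = verify_copy(source, destination)

# Simulate a damaged transfer by appending one byte to the copy.
with open(destination, "ab") as handle:
    handle.write(b"x")
corrupted_ok = verify_copy(source, destination)

shutil.rmtree(workdir)
print(copy_ok, corrupted_ok)
```

Reading in chunks matters here because raw data files can be far larger than the available memory.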

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 28

    Processing – some more details

    - Proc3: calibration of Amp-level files
      - Task 1: port the Era scripts: perform the calibration, store the results (calibDB)
      - Task 2: implement the Amp -> HLA process using calibDB
    - Proc4/5/6:
      - Tasks: concat/merge/skim data
    - What can/should be automated?
      - Tasks: facilitate access to data:
        - Implement a run list based on datadb (see talks by Cecile/Lukas/Valentin)
        - Write python utilities to facilitate plotting/looping: KDataPy.util, KDataPy.loop_amp, …
    - spsToHPSS:
      - Fully working
      - Task 1: nj13b…tar: there is a file that was too big for automatic processing
      - Task 2: implement an md5 checksum test after writing

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 29

    Template fitting

    The program is rather verbose!

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 30

    Template fitting

    - Strong dependence on the initial parameters
    - Initial parameters come from the last fit (pulstemplates db)
    - Some tweaking still necessary (larger amplitude, …)

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 31

    A useful trick – Quitting your loop

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 32

    Loop-/plotbolo

    - You need to correlate channels? -> skip looping at bolometer level

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 33

    Okay, a stupid example, but a quick one. Note the documentation with further examples: KDataPy Utility functions

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 34

    From theory to practice – Part 2 Working with Amp level data

    The structure is subclassed into:
    - Raw: KRawEvent, KRawBolometerRecord, …
    - Amp: KAmpEvent, KAmpBolometerRecord, …
    - HLA: KHLAEvent, KHLABolometerRecord, …

    Raw: with pulse traces, no KPulseAnalysisRecords

    Amp and HLA: no pulse traces, but with KPulseAnalysisRecord

    With a quick calculation: 2.87 * 356/1850 * 2.35 -> FWHM 1.04 keV (Ana: 1.1 keV)

    < 1/10 of the raw file size

    ~ 1/2 of the samba file size

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 35

    TTree.Draw() example

    With a quick calculation: 2.87 * 356/1850 * 2.35 -> FWHM 1.04 keV (Ana: 1.1 keV)

    TTree->Draw() command, or rather TChain->Draw() (called from python):

        c.Draw("fPulseAna[].GetAmp()", "fPulseAna[].GetBoloPulseRecord().GetChannelName() == \"slowD FID823\" && fPulseAna[].GetExtra(8)==5")
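The selection string in the Draw() call is easy to get wrong because of the nested quotes. A small helper can assemble it. This is a sketch: `channel_cut` is a hypothetical function, and only the accessor names that appear in the Draw() call on the slide are used. It builds strings only, so it runs without ROOT.

```python
def channel_cut(channel_name, extra_index=None, extra_value=None):
    """Build a selection string for one channel, optionally cutting on GetExtra(i)."""
    cut = ('fPulseAna[].GetBoloPulseRecord().GetChannelName() == "%s"'
           % channel_name)
    if extra_index is not None:
        cut += " && fPulseAna[].GetExtra(%d) == %s" % (extra_index, extra_value)
    return cut


expression = "fPulseAna[].GetAmp()"
selection = channel_cut("slowD FID823", extra_index=8, extra_value=5)
print(selection)

# With a TChain `c` as on the slide, the call would then read:
#     c.Draw(expression, selection)
```

Using single quotes for the Python string removes the need for the backslash escapes seen in the original command.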

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 36

    Using loop_amp

    Or – if the automatic binning is too crude:

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 37

    Loop_amp together with file lists/directories

    - Use loop.plotchan_x_files(["file1.root", "file2.root"], 'channel', …) or loop.plotchan_x_dir('directory', 'file-pattern', 'channel', …)

    Plots: two histograms, entries vs. amplitude
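How plotchan_x_dir() resolves its directory and file-pattern arguments is not shown in the slides. As an assumption, a shell-style pattern match is sketched below; the resulting list is the kind of input plotchan_x_files() takes. `files_in_dir` and the file names are hypothetical.

```python
import fnmatch
import os
import shutil
import tempfile


def files_in_dir(directory, pattern):
    """Return the sorted full paths of files in `directory` matching `pattern`."""
    names = sorted(os.listdir(directory))
    return [os.path.join(directory, name)
            for name in names
            if fnmatch.fnmatch(name, pattern)]


# Demonstration on a temporary directory with made-up file names.
demo_dir = tempfile.mkdtemp()
for name in ("a_000.amp.root", "a_001.amp.root", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

matched = [os.path.basename(path)
           for path in files_in_dir(demo_dir, "*.amp.root")]
shutil.rmtree(demo_dir)
print(matched)
```

Sorting the names keeps the files in a reproducible order, which helps when comparing plots made from the same directory.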

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 38

    Plotting a TGraph of two variables – very first example: RMS vs energy

    Plot axes: Chi2, Amplitude

    These are just examples. Develop your own "hook-in" functions: x_some_function(), xy_some_function(), …

  • June 2014, CRESST/EDELWEISS/EURECA software workshop 39

    Calibrated data

    - ERA-calibrated data in Kdata v3.0 format for Run12, at the computing center in Lyon and at KIT
    - Ana-calibrated data in Kdata (dev version) for Run20: https://edwdev-ik.fzk.de/wsvn/EDELWEISS/analysis/kdata/branches/newhla2/
      An initial data set for FID804 is available at KIT and Lyon: /sps/edelweis/schmidt/AnaToKData/Run20
    - KData preliminary analysis files of single detectors for Run12, Run20 and Run 304 at KIT

    Diagram labels: hole collecting, hole veto, electron veto, electron collecting