Nurcan Ozturk University of Texas at Arlington SCHOOL ON HEP@TR-GRID April 30 – May 2, 2008

Page 1: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Nurcan Ozturk

University of Texas at Arlington

SCHOOL ON HEP@TR-GRID

April 30 – May 2, 2008

Turkish Atomic Energy Authority (TAEA), Ankara, Turkey

Data Discovery Tools, DQ2 Enduser Tools and Physics Analysis Tools

Page 2: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Outline

User’s work-flow for Data Analysis

Data Discovery Tools
  AMI - ATLAS Metadata Interface
  TAG Browser - ELSSI

DQ2 Enduser Tools

ATLAS Analysis Model
  Analysis Model Forum Recommendations
  Derived Physics Data (DPD)
  Analyzing the Data (inside or outside Athena)
  AthenaRootAccess (ARA)
  EventView

Page 3: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

User’s Work-flow for Data Analysis

Locate the data

Set up the analysis code

Set up the analysis job

Submit to the Grid

Retrieve the results

Analyze the results

Page 4: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Data Discovery Tools

Page 5: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

ATLAS Metadata Interface (AMI)

http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/index.html

AMI is a bookkeeping project. AMI is a generic cataloging system (a database application). The majority of datasets currently catalogued in AMI are Monte Carlo datasets. AMI reads information from the task request system, and correlates it with information read from the production database. AMI contains the physics metadata for:

  2008 real data
  2008 FDR exercise
  2007 Cosmics runs (M5 data)
  2006/2007 service challenge datasets
  StreamTest
  Data Challenges DC1 and DC2 / Rome
  Production System
  Combined Test Beam

AMI also powers the TagCollector release management tool.

Page 6: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

AMI Tutorial

http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/

Or

http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/FastTrackTutorial.html

What is AMI?
Where does AMI get its information?
How do I search for a dataset?
Which information can I get from the result of an AMI dataset search?
What is the schema of the AMI dataset catalogue?
Why can I sometimes not find a dataset when I can see its existence in other catalogues?
Can I refine the search?
Can I simply browse all of the information in AMI?
Can I bookmark an AMI page?
Why doesn't the back button of my browser work?
Can I use AMI without going through the web interface?
How can I extract information from AMI?
How do I write to AMI?

Page 7: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

How Do I Search For A Dataset? – Simple Search

Follow the link to the “simple search interface” from the tutorial page:

[Screenshot of the simple search interface; a callout marked "type here" points at the search box.]

Page 8: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Results From Simple Search (1)

[Screenshot of the search results; callouts mark a pull-down menu and several links.]

Page 9: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Results From Simple Search (2)

When you click on the Provenance link, it shows which version of the Athena software was used in making the evgen/digit/reco data.

Page 10: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Results From Simple Search (3)

When you click on the DQ2 link, it shows the DQ2 dataset metadata, the existing replicas of the dataset, and a link to the PanDA monitor.

Page 11: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Results From Simple Search (4)

When you click on the PANDA link, it takes you to the dataset browser.

Page 12: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

How Do I Search For A Dataset? – Advanced Search

Follow the link to the “Advanced search interface” from the tutorial page:

Page 13: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Results From Advanced Search

Page 14: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

TAG

Because ATLAS will produce petabytes of data, a system of event-level metadata is needed to quickly identify and select the events that are of interest for a given analysis. This is provided by TAG files and the TAG database.

TAG files are built from the AOD using offline analysis-style code. The TAG files are then loaded into the TAG database.

TAG files store information about the status of each sub-detector, trigger and physics object ID.

For instance, for FDR-1 data the TAGs contain:

Event information:
  Run number, event number, luminosity block, number of vertices and tracks, primary vertex position. (Luminosity has an entry but is not filled.)
  Variables such as the summed cell Et, missing Et magnitude, and phi.

Trigger information:
  BitMasks encode pass and pass-after-prescale for each trigger item/chain.

Physics objects:
  Multiplicity of physics objects and the Pt, eta, phi for the highest-Pt objects.
  A tightness criterion for e/mu/gamma is included, as are b-tag likelihoods and the tau candidate likelihood.

PhysWords:
  32-bit TAG word. For b-physics, for instance: Bit 0: HighPtMuonPair, Bit 1: J/Psi candidate, Bit 2: Upsilon candidate.

For more details on FDR and TAGs, see the talk by James Frost at the April Exotics Working Group meeting.
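The PhysWord layout described above can be read back with plain bit masks. The following is a minimal Python sketch, not ATLAS code: the three bit positions come from the b-physics example on this slide, while the function names and the dictionary are made up for illustration.

```python
# Bit positions from the b-physics example on this slide.
# The dictionary and function names are illustrative, not ATLAS API.
BPHYS_BITS = {
    0: "HighPtMuonPair",
    1: "J/Psi candidate",
    2: "Upsilon candidate",
}


def has_bit(word, bit):
    """Return True if the given bit is set in the TAG word."""
    return bool(word & (1 << bit))


def decode_phys_word(word, bit_names=BPHYS_BITS):
    """Return the names of all known bits that are set in a 32-bit TAG word."""
    if not 0 <= word < 2**32:
        raise ValueError("expected an unsigned 32-bit value")
    return [name for bit, name in sorted(bit_names.items()) if has_bit(word, bit)]


# Bits 0 and 2 set: a high-Pt muon pair that is also an Upsilon candidate.
print(decode_phys_word(0b101))  # ['HighPtMuonPair', 'Upsilon candidate']
```

A query such as "events with a J/Psi candidate" then reduces to testing bit 1 of the word for each event.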

Page 15: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

How Does TAG Selection Work?

Use the TAG file as an input to EventSelector or PoolTAGInput.

Make sure the matching Pool file (e.g. AOD) is in the PoolFileCatalog.

Define your query on the TAG content.

Run the job.

Very flexible:

  Can use the TAG to preselect the events from an AOD in which you are interested, passing only those to an analysis algorithm.

  Can use the TAG to write out an AOD (or ESD, RDO) of only the selected events.

How to learn more? Good tutorials are available already:

https://twiki.cern.ch/twiki/bin/view/Atlas/FeedBackForTags
https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection
https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection#Building_Tags_Under_12_0_31 (create tag files)
https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAG
https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAGAnalysis
https://twiki.cern.ch/twiki/bin/view/Atlas/TopFdrTag
http://twiki.mwt2.org/bin/view/Main/TutorialTag080318 (All the above links are available from this one.)
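The preselection idea on this slide can be pictured with a toy model in plain Python. This is only a sketch of the concept, not Athena job options: the TAG attribute names (n_muon, missing_et), the aod_index pointer and the event contents are all hypothetical stand-ins.

```python
# Toy stand-ins: a TAG record holds a few summary attributes plus a pointer
# (here just a list index) to the full event in the AOD.
# All attribute names and values are hypothetical.
aod_events = [
    {"run": 3051, "event": 1, "payload": "full event 1"},
    {"run": 3051, "event": 2, "payload": "full event 2"},
    {"run": 3051, "event": 3, "payload": "full event 3"},
]

tags = [
    {"run": 3051, "event": 1, "n_muon": 0, "missing_et": 12.0, "aod_index": 0},
    {"run": 3051, "event": 2, "n_muon": 2, "missing_et": 35.0, "aod_index": 1},
    {"run": 3051, "event": 3, "n_muon": 1, "missing_et": 80.0, "aod_index": 2},
]


def query(tag):
    """The user's query on the TAG content: at least one muon and large missing Et."""
    return tag["n_muon"] >= 1 and tag["missing_et"] > 30.0


def select_events(tags, query):
    """Return the AOD indices of the events whose TAG record passes the query."""
    return [tag["aod_index"] for tag in tags if query(tag)]


def run_analysis(aod_events, indices, algorithm):
    """Pass only the preselected events to the analysis algorithm."""
    return [algorithm(aod_events[i]) for i in indices]


selected = select_events(tags, query)
results = run_analysis(aod_events, selected, lambda ev: ev["event"])
print(selected, results)  # [1, 2] [2, 3]
```

The point of the design is that the query only touches the small TAG records; the large AOD events are read only for the entries that pass.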

Page 16: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

TAG Browser – ELSSI (1)

TAGs are accessed by users via a web interface called ELSSI, the ATLAS Event Level Selection Service Interface.

For FDR-1 data (tutorial): https://atldbdev01.cern.ch/tagservices/tutorial/index.htm
For FDR-1 data: https://atldbdev01.cern.ch/tagservices/fdr/index.htm

You need Firefox to see this page, as Jack Cranshaw informed me.

Page 17: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

TAG Browser – ELSSI (2)

How to use ELSSI:

Define a query to select runs, streams, data quality, trigger chains,…

Review the query

Execute the query and retrieve the TAG file (a ROOT file)

Page 18: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

DQ2 Enduser Tools

Page 19: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

The Client Tools to Retrieve Data

DQ2 enduser tools
  Includes the dq2_xxx commands (dq2_ls, dq2_get, etc.)
  Available to download from:
    https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#Download
  The setup files are edited to accommodate local needs (dq2.sh, setup.sh)
  Available on AFS at CERN:
    source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
    source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN

gLite UI (User Interface)
  Includes lcg-cp, egee-gridftp-xxx
  Available on AFS at CERN:
    source /afs/usatlas.bnl.gov/lcg/current/etc/profile.d/grid_env.sh
    source /afs/cern.ch/project/gd/LCG-share/current/external/etc/profile.d/grid-env.sh

Why the gLite UI may be needed in OSG:
  dq2_put/get may use some gLite commands depending on the site they interact with (TiersOfATLASCache.py description): lcg-lg, lcg-rf, glite-gridftp-ls, lcg-gt

More Info:

https://twiki.cern.ch/twiki/bin/view/Atlas/DDMEndUserTutorial

Page 20: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

DQ2 Enduser Tools

dq2_ls: returns a list of datasets matching a given pattern
  dq2_ls fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1

dq2_get: copies the files from DQ2 to a local area
  dq2_get -rv fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1

dq2_put: registers datasets to DQ2

dq2_poolFCjobO: creates PoolFileCatalog and Athena job-option for DQ2 datasets

dq2_register: uploads and registers external generator input files to DQ2

dq2_cleanup: deletes a dataset from a site's catalog and storage.

dq2_sample: copies a portion of an existing dataset and registers it to DQ2

More info:

https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#DQ2_end_user_tools
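The dataset name used in the dq2_ls and dq2_get examples is dot-separated, which makes it easy to take apart in a script. The sketch below is plain Python and does not call DQ2. It assumes the six fields are project, run number, stream, production step, data type and version tag; that reading, the field names, and the use of shell-style wildcards as a stand-in for dq2_ls pattern matching are all assumptions. The second catalogue entry is made up.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class DatasetName(NamedTuple):
    """Assumed meaning of the six dot-separated fields (not confirmed by the slides)."""
    project: str
    run_number: int
    stream: str
    prod_step: str
    data_type: str
    version: str


def parse_dataset_name(name):
    """Split a dataset name into its fields, or raise if the layout differs."""
    parts = name.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(parts)}: {name!r}")
    project, run, stream, step, dtype, version = parts
    return DatasetName(project, int(run), stream, step, dtype, version)


def match_datasets(pattern, names):
    """Shell-style wildcard match, used here as a stand-in for dq2_ls matching."""
    return [n for n in names if fnmatchcase(n, pattern)]


name = "fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1"
info = parse_dataset_name(name)
print(info.project, info.run_number, info.stream, info.data_type)

# The second entry is a made-up name, used only to show the filtering.
catalogue = [name, "fdr08_run1.0003051.StreamMuon.merge.AOD.o1_r6_t1"]
print(match_datasets("fdr08_run1.*.StreamEgamma.*", catalogue))
```

Parsing the name first makes it straightforward to group a list of datasets by run or by stream before deciding which ones to copy locally.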

Page 21: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

ATLAS Analysis Model

Page 22: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Analysis Model Forum Recommendations on the Analysis Model

[Diagram of the recommended analysis model; a callout reads "includes metadata + simple UserData".]

Page 23: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Derived Physics Data - DPnD

Primary DP1D: POOL-based DPD produced by the Grid production system. There are expected to be O(10) primary DPDs, so the contents will not be very specific to an analysis. Compared to the AOD, it is expected to be skimmed (keeping only interesting events), slimmed (keeping only interesting objects, for example electrons and muons), and thinned (keeping only the subset of information inside objects that is relevant in future steps).
  An example job options file: AODtoDPD.py (see CVS)
  Packages in CVS: TopDPDMaker, TauDPDMaker, BPhysicsDPDMaker, SUSYDPDMaker

Secondary DP2D: POOL-based DPD with more analysis-specific information. Typically, this is produced from the primary DPD and may be created using an Athena tool like EventView.
  SimpleThinningExample
  HighPtViewDPDThinningTutorial

Tertiary DP3D: Does not need to be POOL-based; it includes flat ntuples.
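The three reduction steps named above (skimming, slimming, thinning) can be illustrated on toy data. The following is a plain-Python sketch that follows the definitions given on this slide; it is not the DPDMaker code, and the event layout, field names and selection are hypothetical.

```python
# Toy events; the layout, field names and values are hypothetical.
events = [
    {"run": 1, "event": 1,
     "electrons": [{"pt": 42.0, "eta": 0.3, "cells": [1, 2, 3]}],
     "muons": [],
     "jets": [{"pt": 55.0, "eta": 1.1, "cells": [4, 5]}]},
    {"run": 1, "event": 2,
     "electrons": [],
     "muons": [],
     "jets": [{"pt": 20.0, "eta": -2.0, "cells": [6]}]},
    {"run": 1, "event": 3,
     "electrons": [],
     "muons": [{"pt": 25.0, "eta": -0.7, "cells": [7]}],
     "jets": []},
]

ID_KEYS = ("run", "event")


def skim(events, keep_event):
    """Skimming: keep only the interesting events."""
    return [ev for ev in events if keep_event(ev)]


def slim(events, collections):
    """Slimming: keep only the interesting object collections in each event."""
    return [
        {key: value for key, value in ev.items()
         if key in ID_KEYS or key in collections}
        for ev in events
    ]


def thin(events, fields):
    """Thinning: keep only the listed fields inside each remaining object."""
    thinned = []
    for ev in events:
        out = {key: ev[key] for key in ID_KEYS}
        for name, objects in ev.items():
            if name not in ID_KEYS:
                out[name] = [{f: obj[f] for f in fields if f in obj} for obj in objects]
        thinned.append(out)
    return thinned


def has_lepton(ev):
    """Example selection: at least one electron or muon."""
    return len(ev["electrons"]) + len(ev["muons"]) > 0


dpd = thin(slim(skim(events, has_lepton), {"electrons", "muons"}), ("pt", "eta"))
print(dpd)
```

In this toy run, event 2 is dropped by the skim, the jets are dropped by the slim, and the per-object cell lists are dropped by the thin.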

Page 24: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Analyzing the Data

Inside Athena
  Interactive or batch, using C++ or Python code.
  Needs part of Athena (depends on user needs).
  Provides full access to all tools and services.

Outside Athena – AthenaRootAccess (ARA)
  CINT, Python, or compiled C++ code.
  Does not need a full Athena installation (expected ~1 GB).
  Not all classes are available (for example, calo cells).

Important: both methods use the same files as input.

Page 25: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

ARA - AthenaRootAccess

Allows one to read an AOD in ROOT like you would read a normal ntuple (without using Athena).

The goal is to seamlessly use Athena tools.

One can use identical code/tools to run on ESDs, AODs, DPDs.

The names of the variables in the AOD ROOT tree are the same as in the AOD.

Limitations:

  It uses the transient classes and converters of the ATLAS software, so a portion of the offline software is needed: a ~1 GB distribution including Athena libraries.

  Tools and data that need detector description, conditions, B-field, etc. cannot be called in ARA. However, this type of information can be put in UserData in the DPD.

  Gaudi-based classes (like AlgTools, Services) don't work in ARA. Wrapping machinery is needed to reuse the code in Athena/ARA.

Page 26: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

ARA Examples (1)

CINT macros
  Easy development (change code and run)
  Run time is slow: ~x10 that of C++ compiled code

C++ compiled code
  Slower development (change code, recompile, cannot reload libs)
  Fastest runtime
  Integrates easily back into Athena

Python scripts
  Easy development (change code, reload and run)
  Simple example shows runtime ~x3 that of C++ compiled code
  May be able to compile Python
  Integration of developed code into Athena?

Examples on the Twiki and in the release:
  https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaROOTAccess
  PhysicsAnalysis/AthenaROOTAccessExamples

Page 27: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

ARA Examples (2)

Available in CVS under PhysicsAnalysis/AthenaROOTAccessExamples

Need a Python script to open the file and set up the transient tree:

  lxplus:~> get_files AthenaROOTAccess/test.py

Compiled C++ Example:

  lxplus:~> root
  root [0] TPython::Exec("execfile('test.py')");
  root [1] CollectionTree_trans = (TTree *)gROOT->Get("CollectionTree_trans");
  root [2] ClusterExample ce; // Example class in AthenaROOTAccessExamples
  root [3] ce.plot(CollectionTree_trans);
  root [4] TruthInfo ti;
  root [5] ti.truth_info(CollectionTree_trans);

test.py takes about 20 seconds to load the necessary dictionaries

One can recompile and then restart from the beginning

Page 28: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

ARA Examples (3)

CINT Example:

  lxplus:~> root
  root [0] TPython::Exec("execfile('test.py')");
  root [1] CollectionTree_trans = (TTree *)gROOT->Get("CollectionTree_trans");
  root [2] gROOT->LoadMacro("AthenaROOTAccessExamples/macros/cluster_example.C");
  root [3] plot(CollectionTree_trans);

One can now edit cluster_example.C and re-run LoadMacro

Python Example:

  lxplus:~> python -i test.py
  >>> import AthenaROOTAccessExamples.cluster_example
  >>> AthenaROOTAccessExamples.cluster_example.plot(tt)

One can now edit cluster_example.py and re-run:

  >>> reload(AthenaROOTAccessExamples.cluster_example)
  >>> AthenaROOTAccessExamples.cluster_example.plot(tt)

Page 29: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Analysis Frameworks: EventView (1)

This framework provides general tools for common analysis tasks like

particle selection

overlap removal

observable calculation

combinatorics

recalibration

systematics evaluation

generating ntuples

Users can perform a great deal of their analyses in Athena by chaining and configuring a set of these tools and producing an ntuple for further analysis in ROOT.

Twiki page:

https://twiki.cern.ch/twiki/bin/view/Atlas/EventView
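The idea of chaining configurable tools and ending with an ntuple can be sketched in plain Python. This is not the EventView API: the tool names, the event layout, the Pt thresholds and the 0.4 cone used for overlap removal are all assumptions chosen for illustration. Three of the listed tasks are shown: particle selection, overlap removal and ntuple generation.

```python
import math


def delta_r(a, b):
    """Angular distance in (eta, phi), with phi wrapped into [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)


def select(collection, min_pt):
    """Particle selection tool: keep objects above a Pt threshold."""
    def tool(view):
        view = dict(view)
        view[collection] = [p for p in view[collection] if p["pt"] >= min_pt]
        return view
    return tool


def remove_overlap(collection, reference, max_dr):
    """Overlap removal tool: drop objects within max_dr of any reference object."""
    def tool(view):
        view = dict(view)
        view[collection] = [
            p for p in view[collection]
            if all(delta_r(p, r) > max_dr for r in view[reference])
        ]
        return view
    return tool


def run_chain(view, tools):
    """Apply the configured tools in order, each one returning a new view."""
    for tool in tools:
        view = tool(view)
    return view


def to_ntuple_row(view):
    """Ntuple tool: flatten the final view into simple numbers."""
    return {
        "n_el": len(view["electrons"]),
        "n_jet": len(view["jets"]),
        "lead_jet_pt": max((j["pt"] for j in view["jets"]), default=0.0),
    }


# Hypothetical event: the first jet sits on top of the first electron.
event = {
    "electrons": [{"pt": 30.0, "eta": 0.5, "phi": 1.0},
                  {"pt": 5.0, "eta": -1.0, "phi": 2.0}],
    "jets": [{"pt": 32.0, "eta": 0.52, "phi": 1.01},
             {"pt": 60.0, "eta": -1.5, "phi": -2.0}],
}

chain = [
    select("electrons", 20.0),
    select("jets", 25.0),
    remove_overlap("jets", "electrons", 0.4),
]
row = to_ntuple_row(run_chain(event, chain))
print(row)  # {'n_el': 1, 'n_jet': 1, 'lead_jet_pt': 60.0}
```

Because each tool only takes and returns a view, changing the analysis means editing the chain list rather than the tools themselves, which is the configuration-driven style this slide describes.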

Page 30: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

Analysis Frameworks: EventView (2)

Though this style of "modular" analysis usually does not require writing C++, the EventView framework is completely extensible, so if necessary users can easily develop and mix their own C++ tools with the common EventView tools and share their configurations and tools with other collaborators.

Most users are introduced to EventView through one of the "View" packages (eg TopView, SusyView, HighPtView) which for the most part collect configurations of EventView tools for a specific set of analyses and produce a standard ntuple output.

These users typically start by analyzing the View ntuples produced by the various physics working groups, and then move on to reconfiguring and re-running the respective View package if they require additional tuning for their specific analyses.

There are also efforts to evolve (the persistent piece of) EventView in the context of AthenaROOTAccess.

Page 31: Nurcan Ozturk University of Texas at Arlington SCHOOL ON  HEP@TR-GRID  April 30 – May 2, 2008

We will practice with the tools during the tutorial.