ACAT 2000 FNAL, October 2000 Lassi A. Tuura Analysis Environment Challenges Lassi A. Tuura...

ACAT 2000, FNAL, October 2000. Lassi A. Tuura, Northeastern University, Boston. Analysis Environment Challenges. http://iguana.cern.ch


Page 1: ACAT 2000  FNAL, October 2000 Lassi A. Tuura Analysis Environment Challenges Lassi A. Tuura Northeastern University, Boston.

ACAT 2000, FNAL, October 2000, Lassi A. Tuura, http://iguana.cern.ch

Analysis Environment Challenges

Lassi A. Tuura

Northeastern University, Boston

Page 2: What Is An Analysis Environment?

Physics analysis is to a large degree an iterative process of
- Reducing data samples to more interesting subsets
- Distilling the sample into information at a higher abstraction level
  - By summarising lower-level information
  - By calculating statistical entities from the samples

Diagram labels: Experiment, Reduce, Distill, Interpret

A large part of the work can be done on very high-level entities in an interactive analysis and presentation tool
- Hence the focus on tools that work on simple summary information (DSTs, N-tuples, tag databases, ...)
- Additional tools for detector and event visualisation
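The reduce and distill steps on this slide can be illustrated with a minimal Python sketch. The event fields (`pt`, `n_tracks`), the cut threshold and the bin width are hypothetical values chosen for illustration; they do not come from the talk or from any experiment framework.

```python
from collections import Counter
from statistics import mean

# Hypothetical events: each dict is one event's summary record (an N-tuple row).
events = [
    {"pt": 12.0, "n_tracks": 4},
    {"pt": 55.0, "n_tracks": 9},
    {"pt": 31.5, "n_tracks": 6},
    {"pt": 78.0, "n_tracks": 12},
]

def reduce_sample(sample, min_pt):
    """Reduce: keep only the more interesting subset of the sample."""
    return [ev for ev in sample if ev["pt"] >= min_pt]

def distill(sample, bin_width):
    """Distill: summarise the sample as a pt histogram and a mean pt."""
    # Histogram keys are the lower edges of the bins.
    histogram = Counter(int(ev["pt"] // bin_width) * bin_width for ev in sample)
    return {
        "histogram": dict(histogram),
        "mean_pt": mean(ev["pt"] for ev in sample),
    }

selected = reduce_sample(events, min_pt=30.0)
summary = distill(selected, bin_width=25)
print(summary)
```

In an iterative analysis the interpretation of `summary` would feed back into a new choice of `min_pt` or a different summary, which is the loop the slide describes.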

Page 3: So What Is An Analysis Environment?

- Analysis involves a lot more than just the interactive tool
- Learn from the “PAW revolution”
  - N-tuples provided new, more powerful ways to work with the data
  - New user interface
- The move towards closer integration with data continues
  - We can do much more and better than just an N-tuple today
  - Examples: ROOT added trees, CMS uses a full-blown object model
- Experiments are making big jumps in data accessibility
  - Exploiting widely used, very powerful object models, not just data
  - New levels of automation and integration are becoming available for networks, distributed computing and mass-storage systems
  - User interfaces to these new data models need to catch up!

Conclusion: the analysis environments will need considerable links with the rest of the experiment’s computing and software infrastructure

Page 4: The Challenge

- Beyond the interactive analysis tool
  - Data analysis & presentation: N-tuples, histograms, fitting, plotting, …
- A great range of other user activities with fuzzy boundaries
  - Batch
  - Interactive, from “pointy-clicky” to Emacs-like power tool to scripting
  - Setting up configuration management tools, application frameworks and reconstruction packages
  - Data store operations: replicating entire data stores; copying runs, events, event parts between stores; not just copying but also doing something more complicated (filtering, reconstruction, analysis, …)
  - Browsing data stores down to object detail level
  - 2D and 3D visualisation
  - Moving code across final analysis, reconstruction and triggers

Today this involves (too) many tools

Page 5: Example: Distributing Your Data Store

- Problem: replicating and sharing your experiment’s data in full or in part for various analysis tasks and GRID
- Tools exist but...
  - Do I understand my experiment’s world-wide configurations well enough to use the tools confidently?
  - How do I find out the data store nearest me in the first place?
  - If I want a private working store that shares the experiment data at the same time, what should I do?
  - What if I do not want just a plain file copy, but want only a copy of the reconstructed data for the calorimeter from a certain sample that includes events in tens of files?
  - What if I want to share my analysis settings and results with my colleague for a verification?

Enquiring minds want to know!

Page 6: What Do We Need?

- A uniform integrated interface to the whole task range (within reasonable limits)?
- A tool suite or a work bench?
- Wizards for common tasks to guide us through the choices, to give sensible defaults and to explain the terminology?
- Some ideas that might prove helpful
  - Showing the data store or parts of it as a directory
  - Conceptual “home directory” in the data store
  - Make it easy to put stuff related to your analyses under your “home directory” (framework and reconstruction setups, parameters etc.)
  - Make it easy to access analysis setups and results of different groups
    - Keep track of configurations, input and output data selections, …
    - A “desktop” where you can have shortcuts/links
    - Standard shortcuts for common stuff

One size never fits all: the tools need to adapt!
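The idea of showing a data store as a directory, with a “home directory” and shortcuts, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `StoreNode` class, the path names and the user name are all hypothetical, and a real data store would sit behind these calls.

```python
class StoreNode:
    """One entry in a hypothetical directory-like view of a data store."""

    def __init__(self, name):
        self.name = name
        self.children = {}   # child name -> StoreNode
        self.target = None   # set when this node is a shortcut to another node

    def mkdir(self, path):
        """Create (or reuse) nested entries along a slash-separated path."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, StoreNode(part))
        return node

    def link(self, name, target):
        """Add a shortcut entry that points at an existing node."""
        shortcut = StoreNode(name)
        shortcut.target = target
        self.children[name] = shortcut
        return shortcut

    def resolve(self, path):
        """Walk a path, following shortcuts, and return the node reached."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
            while node.target is not None:
                node = node.target
        return node


root = StoreNode("")
# A setup published by another group (hypothetical names).
shared = root.mkdir("groups/higgs/setups/calo-v1")
# A personal "home directory" holding the user's own analysis material.
home = root.mkdir("home/alice")
home.mkdir("analysis/parameters")
# A "desktop"-style shortcut from the home directory to the shared setup.
home.link("calo-setup", shared)

print(root.resolve("home/alice/calo-setup").name)
```

Shortcuts are resolved while walking the path, so a user sees another group's setup under their own home directory without copying it.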

Page 7: Concepts In Today’s Apps

Figure: screenshot of the IGUANA prototype. Callout: Extrapolate these to a data store…

Page 8: Concepts In Today’s Apps…

Figure callouts:
- Command-line interface that reflects actions in other windows
- Visualisation window
- Plus of course batch mode without pointy-clicky!
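One way a command line could reflect actions taken in other windows, and still work in batch mode, is to route every action through a textual command. The sketch below is a hypothetical illustration and not IGUANA code: `CommandShell`, `View` and the `zoom` and `show` commands are invented for the example.

```python
class CommandShell:
    """Single entry point for actions; GUI and batch both go through it."""

    def __init__(self):
        self.handlers = {}   # command name -> callable
        self.history = []    # textual form of every command executed

    def register(self, name, handler):
        self.handlers[name] = handler

    def execute(self, line):
        """Run one textual command and record it in the history."""
        name, *args = line.split()
        result = self.handlers[name](*args)
        self.history.append(line)
        return result


class View:
    """Stand-in for a visualisation window (hypothetical)."""

    def __init__(self):
        self.zoom = 1.0
        self.visible = set()

    def set_zoom(self, factor):
        self.zoom = float(factor)

    def show(self, detector):
        self.visible.add(detector)


def make_shell(view):
    shell = CommandShell()
    shell.register("zoom", view.set_zoom)
    shell.register("show", view.show)
    return shell


# Interactive session: a click is translated into a textual command, so the
# command-line window can echo what the user did in another window.
gui_view = View()
gui_shell = make_shell(gui_view)
gui_shell.execute("show calorimeter")   # e.g. triggered by a menu entry
gui_shell.execute("zoom 2.5")           # e.g. triggered by a mouse wheel

# Batch mode: replay the recorded history with no windows involved.
batch_view = View()
batch_shell = make_shell(batch_view)
for line in gui_shell.history:
    batch_shell.execute(line)

print(batch_view.zoom, sorted(batch_view.visible))
```

Because the history is plain text, an interactive session leaves behind something that can be rerun without the pointy-clicky part.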

Page 9: How To Get There?

- Few can afford to develop a new interactive analysis tool, let alone coherent tools for the entire range of analysis tasks!
- Divide, conquer and co-operate
  - Divide the problem into categories, such as GUI, event and detector visualisation, and data analysis and presentation
  - We need to share: use existing modules in each category where possible, and write your own only where nothing suitable exists (and don’t get attached to code, ditch it when something better is available!)
  - Integrate the lot into a user-friendly and productive environment
  - Make applications by choosing from the module pool; experiments could construct their own specific environments with customisation
- For this to work, the pool should be truly modular
  - Need to take into account all dependencies, not just the obvious ones
  - Need to think what it would take to test all the features provided by each component: those form its immediate dependencies
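Building an application by choosing from a module pool only works if each module's dependencies are declared and followed transitively. A minimal sketch, assuming a hypothetical pool whose module names and dependency lists are invented for illustration:

```python
# Hypothetical module pool: each module lists its immediate dependencies.
MODULE_POOL = {
    "gui-core": [],
    "scene-graph": [],
    "fitting": [],
    "event-display": ["gui-core", "scene-graph"],
    "histogram-view": ["gui-core", "fitting"],
    "store-browser": ["gui-core"],
}

def resolve(selection, pool):
    """Return the selected modules plus all transitive dependencies,
    ordered so that every module comes after the modules it depends on."""
    ordered = []
    visiting = set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        if name not in pool:
            raise KeyError(f"module {name!r} is not in the pool")
        visiting.add(name)
        for dep in pool[name]:
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for name in selection:
        visit(name)
    return ordered

# An experiment-specific application built from two chosen modules.
app = resolve(["event-display", "histogram-view"], MODULE_POOL)
print(app)
```

Modules that were not selected and are not needed, such as `store-browser` here, stay out of the application, which is what makes picking a subset practical.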

Page 10: What Kind of an Architecture?

- Modular where it matters
  - Model-View-Controller and the like work to partition the domain
  - Layer to keep front-ends and back-ends separate
  - Ensure a standard for visual components to facilitate integration
- Interfaces for data access
  - Narrow interfaces to link the analysis and visualisation sub-framework to the core framework
- Not everything needs an abstract interface!
  - It may be better to make a strategic choice to use a particular product if it can be contained and completely replaced in 6-9 months
  - Example: Use OpenInventor instead of inventing your own 3D API
  - We need to assess and bound the risks, not aim for total safety!

Page 11: More About Interfaces

- Example: selecting events using high-level summary data
  - Pick your favourite name for the same concept: Tags, N-tuples, DSTs, B-tree indices…
- N-tuple was both an access paradigm and a storage method
  - Historical emphasis was on storage format
- Shift the emphasis to an access and query interface
  - Can provide the look and feel of a proven access method (N-tuple) with natural modern extensions
  - Implementation behind the interface may vary
    - Data may already be cached or accessed from deep in the event
    - May exploit advanced indexing and retrieval
    - May involve computation on demand
    - May even be necessary to read from tape
- Other interfaces can provide access to underlying features
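The shift from storage format to an access and query interface can be sketched as one N-tuple-like interface with interchangeable implementations. Everything named below (`TupleSource`, `CachedSource`, `OnDemandSource`, the `run` and `energy` columns) is a hypothetical illustration and not an API from ROOT, PAW or CMS software.

```python
from abc import ABC, abstractmethod


class TupleSource(ABC):
    """Hypothetical N-tuple-like access interface: named columns per event."""

    @abstractmethod
    def __len__(self):
        """Number of events (rows) available."""

    @abstractmethod
    def value(self, row, column):
        """Return the value of one column for one event."""

    def select(self, predicate, columns):
        """Query: yield the requested columns for rows passing the predicate."""
        for row in range(len(self)):
            def get(column, row=row):
                return self.value(row, column)
            if predicate(get):
                yield tuple(get(c) for c in columns)


class CachedSource(TupleSource):
    """Columns already sit in memory, like a classic N-tuple."""

    def __init__(self, columns):
        self.columns = columns

    def __len__(self):
        return len(next(iter(self.columns.values())))

    def value(self, row, column):
        return self.columns[column][row]


class OnDemandSource(TupleSource):
    """Columns are computed from full event objects only when asked for."""

    def __init__(self, events, extractors):
        self.events = events
        self.extractors = extractors  # column name -> function(event)

    def __len__(self):
        return len(self.events)

    def value(self, row, column):
        return self.extractors[column](self.events[row])


def high_energy(get):
    return get("energy") > 50.0


cached = CachedSource({"run": [1, 1, 2], "energy": [40.0, 60.0, 75.0]})
on_demand = OnDemandSource(
    events=[
        {"run": 1, "hits": [10.0, 30.0]},
        {"run": 1, "hits": [25.0, 35.0]},
        {"run": 2, "hits": [70.0, 5.0]},
    ],
    extractors={
        "run": lambda ev: ev["run"],
        "energy": lambda ev: sum(ev["hits"]),
    },
)

# The same query runs unchanged against both implementations.
cached_rows = list(cached.select(high_energy, ["run", "energy"]))
on_demand_rows = list(on_demand.select(high_energy, ["run", "energy"]))
print(cached_rows, on_demand_rows)
```

The query code never learns whether a value was cached or computed on demand; a tape-backed or indexed implementation could be added behind the same two methods.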

Page 12: Summary

- An analysis environment includes a lot more than just the interactive data analysis and presentation tools
- As experiment complexity grows we need
  - To be able to drill down to and interact with data in many new ways
  - A good solid user interface for the whole range of tasks, all the way from batch mode operation to the quick pointy-clicky jobs
- Building all this from scratch is neither affordable nor wise
  - Exploit existing components: HEP, open source or commercial
  - Components need clearly defined responsibilities: a mission statement
  - Abstract interfaces are a useful means to
    - Help people co-operate and not disturb each other too much
    - Provide hooks for all the cool new stuff we will see
    - Layer and partition the problem domain
    - Bound risks should a technology or a component fail

Page 14: Some Architecture Ideas

- Three-tier architecture
  - Application model (framework, reconstruction, simulation …)
  - Specific ways of looking at objects (3D, 2D, hierarchical browser, object inspector, fitter…)
  - Representation tier to tie the above two together
  - Dynamically load and integrate required bits together
  - (MV)2C: Representation is the view from the application model, but the model to the visualiser
  - Possible interesting result: scripting becomes “yet another view” and does not require special treatment or privilege
- A host of wizards
  - Coherent, good human interface
  - Easily adapted and expanded to new tasks
  - Should be able to leave behind scripts or other batch mode food
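The three-tier idea can be sketched with a small registry that sits between model objects and views. This is an illustration under assumptions, not the IGUANA design: the `Track` class, the view names and the `represents` decorator are hypothetical, and a plain dictionary filled at import time stands in for dynamic loading.

```python
# Tier 1: application model objects (hypothetical stand-in).
class Track:
    def __init__(self, pt, hits):
        self.pt = pt
        self.hits = hits


# Tier 2: representation tier. Maps (model type, view kind) to a function
# that turns a model object into something that view knows how to present.
REPRESENTATIONS = {}


def represents(model_type, view_kind):
    """Decorator that registers a representation for a model type and view."""
    def register(func):
        REPRESENTATIONS[(model_type, view_kind)] = func
        return func
    return register


@represents(Track, "3d")
def track_as_polyline(track):
    return {"shape": "polyline", "points": list(track.hits)}


@represents(Track, "inspector")
def track_as_table(track):
    return [("pt", track.pt), ("n_hits", len(track.hits))]


@represents(Track, "script")
def track_as_command(track):
    # Scripting is treated as just another view of the same object.
    return f"track pt={track.pt} n_hits={len(track.hits)}"


# Tier 3: a view asks the representation tier, never the model directly.
def present(obj, view_kind):
    key = (type(obj), view_kind)
    if key not in REPRESENTATIONS:
        raise LookupError(
            f"no {view_kind!r} representation for {type(obj).__name__}"
        )
    return REPRESENTATIONS[key](obj)


track = Track(pt=42.0, hits=[(0, 0, 0), (1, 2, 3)])
for kind in ("3d", "inspector", "script"):
    print(kind, "->", present(track, kind))
```

Adding a new way of looking at tracks means registering one more function; neither the `Track` class nor the existing views change.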

Page 15: Interface Pros and Cons

- Modularity and good interfaces make a big difference
  - When one particular component fails, it doesn’t take others down
  - Easier to add new features without disturbing existing ones
  - Easier to adapt to new, sometimes radically different contexts
  - Testing is manageable and actually gets done
  - Easier to manage the project and for people to co-operate (often much more of the work is in communication, not coding)
- …but they come at a price
  - Costlier to develop up front
  - A bad interface can make life really awkward
  - Hard to justify if you have only one implementation
  - A good interface needs one clearly defined mission; coming up with it may require considerable work, but is usually more than worth it, as doing so clarifies problem understanding and project strategy

Page 16: Do Languages Matter?

- No: great concepts will survive in almost any language
  - Especially within a common paradigm like object-oriented languages
  - It is the paradigm changes that hurt; changing from objects to components is a more difficult change than from C++ to Java…
  - Will we see extern “Java” { class XYZ { … }; }?
- Yes: consider this scenario
  - Someone in the collaboration comes up with a new analysis cut
  - … and that cut proves very interesting
  - … so the analysis needs to get into the trigger express line
  - If the analysis was done by C++ code that writes out an N-tuple that was then processed with a few thousand lines of PAW KUMACs and FORTRAN, you’ll have a hard time finding volunteers to re-code it for the trigger, let alone someone willing to double-check it

It is not (just) the languages that hurt...
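The scenario on this slide becomes less painful when a cut is written once in a form that both the offline analysis and the trigger can call. A minimal sketch, assuming a hypothetical `Cut` class and invented event fields (`energy`, `eta`); a real trigger would add its own timing and interface constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cut:
    """A selection defined once, as plain data plus one method (hypothetical)."""
    min_energy: float
    max_abs_eta: float

    def passes(self, event):
        return (
            event["energy"] >= self.min_energy
            and abs(event["eta"]) <= self.max_abs_eta
        )


def offline_analysis(events, cut):
    """Offline use: select events from a stored sample for further study."""
    return [ev for ev in events if cut.passes(ev)]


def trigger_decision(event, cut):
    """Online use: accept or reject a single event as it arrives."""
    return cut.passes(event)


new_cut = Cut(min_energy=50.0, max_abs_eta=2.5)
sample = [
    {"energy": 80.0, "eta": 1.0},
    {"energy": 20.0, "eta": 0.3},
    {"energy": 95.0, "eta": -3.1},
]

offline_selected = offline_analysis(sample, new_cut)
trigger_flags = [trigger_decision(ev, new_cut) for ev in sample]
print(len(offline_selected), trigger_flags)
```

Both paths call the same `passes` method, so there is nothing to re-code and only one piece of logic to double-check.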

Page 17: CMS Analysis Architecture At a Glance

Diagram labels: Data Store (Objectivity) containing files; Federation wizards; Cmscan; DataBrowser; Analysis job wizards; Other Non-IGUANA Tools; IGUANA; OSCAR; ORCA; CARF; Tony’s scripts; Objy tools; GRID Tools

Page 18: Modularity Example: IgAPDlab

Figure callout: Could pick only a subset for some related task

Page 19: Current IGUANA Tools (By Origin)

Figure legend: IGUANA; LHC++ or HEP; Public-domain; Commercial

Page 20: Current IGUANA Tools (By Purpose)