Computing in High Energy Physics 2003

La Jolla 24 March

René Brun

CERN

Perspective on Future Data Analysis in HENP

Data Analysis ??

Data Analysis has traditionally been associated with the final stages of data processing, i.e. Physics Analysis.

In this talk, I will cover a more general aspect of Data Analysis (in the true sense).

How to interact with data at all stages of data processing (batch or interactive modes)?

Can we imagine an experiment-independent way to achieve this?

Evolution

To understand the possible directions, we must understand some messages from the past, the solid recipes!

One important message is “Make it simple”.

Heavy experiment frameworks are often perceived as a serious obstacle and push users to use more basic but universal frameworks.

Once upon a time (seventies)

With the first electronic (as opposed to bubble chamber) experiments, data analysis was experiment specific, an activity after the data taking.

The only common software was the histogramming package (e.g. Hbook), the fitting package (e.g. Minuit), some plotting packages and independent routines in cernlib (linear algebra and small utilities).

Data structures = Fortran common blocks

Early Eighties

With the growing complexity of the experiments and corresponding software, we see the development of Data Structures management systems (hydra, zbook-->zebra, bos).

These systems are able to write/read complex bank collections. Zebra had a self-describing bank format with built-in support for bank evolution.

Most data are processed in batch, but many prototypes of interactive systems start to appear (htv, gep, then paw...).

PAW

Designed in 1985. Stable since 1993.

Row-Wise-Ntuples: OK for small data sets, interactive histogramming with cuts.

Column-Wise-Ntuples: a major step illustrating the advantage of structured data sets.

PAW: a success not so much because of its technical merits, but because it is perceived as a widely available tool.

Stability for many years: an important element.

1993-->2000 (1)

Move from Fortran to OO

Took far more time than expected

new language(s)

new programming techniques

basic infrastructure not available to compete with existing libraries and tools

conflicts between projects

ad-hoc software in experiments

1993-->2000 (2)

False hopes with OODBMS (or too early?)

OODBMS --> Objectivity

OO models designed for Objy

batch oriented

Interactive use via conversion to PAW ntuples

central data base does not fit well with GRID concepts

Licensing problems

and more

Data Analysis Models

From the desktop to the GRID

[Diagram labels: Desktop, Local/remote Storage, Online/Offline Farms, GRID]

New data analysis tools must be able to use remote CPUs, storage elements and networks in parallel, in a way that is transparent for a user at a desktop.

My laptop in 200X

Using a naïve extrapolation of Moore’s law for a state of the art laptop

Year   CPU/GHz   RAM/GB   disk/GB
2003   2.4       0.5      60
2005   5         1        150
2007   10        2        300
2009   20        4        600
2011   40        8        1000

Nice! But less than 1/1000 of what I need
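The CPU and RAM columns of the table roughly double every two years. A minimal sketch of that arithmetic, assuming a fixed two-year doubling period; the function name and the doubling period are illustrative assumptions, the slide only gives the resulting numbers:

```python
def extrapolate(value_2003: float, year: int, doubling_years: float = 2.0) -> float:
    """Naive Moore's-law extrapolation from a 2003 baseline.

    Assumes the quantity doubles every `doubling_years` years (an
    assumption made for this sketch, not a statement from the talk).
    """
    return value_2003 * 2.0 ** ((year - 2003) / doubling_years)


if __name__ == "__main__":
    for year in (2003, 2005, 2007, 2009, 2011):
        cpu = extrapolate(2.4, year)  # GHz, 2003 baseline taken from the table
        ram = extrapolate(0.5, year)  # GB, 2003 baseline taken from the table
        print(f"{year}: CPU ~{cpu:.1f} GHz, RAM ~{ram:.1f} GB")
```

Under this assumption the 2011 CPU value comes out at 38.4 GHz, which the slide rounds to 40. The disk column does not follow an exact doubling.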

Batch-mode Local analysis

Conventional model: The user has full control on the event loop.

The program produces histograms, ntuples or trees.

The selection is via user private code.

Histograms are then added (with a tool or in the interactive session).

ntuples/trees are combined into a chain and analyzed interactively.
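A chain can be pictured as several per-file trees presented as one long sequence of entries. A minimal sketch, assuming in-memory lists stand in for the trees stored in each file; the `Chain` class and its methods are hypothetical illustrations, not the API of any existing tool:

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class Chain:
    """Presents several per-file entry lists as one logical sequence."""

    def __init__(self) -> None:
        self._trees: List[Sequence[float]] = []
        self._offsets: List[int] = [0]  # _offsets[i] = first global entry of tree i

    def add(self, tree: Sequence[float]) -> None:
        self._trees.append(tree)
        self._offsets.append(self._offsets[-1] + len(tree))

    def __len__(self) -> int:
        return self._offsets[-1]

    def locate(self, entry: int) -> Tuple[int, int]:
        """Map a global entry number to (tree index, local entry)."""
        if not 0 <= entry < len(self):
            raise IndexError(entry)
        tree_index = bisect_right(self._offsets, entry) - 1
        return tree_index, entry - self._offsets[tree_index]

    def __getitem__(self, entry: int) -> float:
        tree_index, local = self.locate(entry)
        return self._trees[tree_index][local]


if __name__ == "__main__":
    chain = Chain()
    chain.add([1.0, 2.0, 3.0])  # entries produced by a first batch job
    chain.add([4.0, 5.0])       # entries produced by a second batch job
    print(len(chain), chain.locate(3), list(chain))
```

The analysis code only sees global entry numbers; which file an entry lives in is resolved by the chain.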

Batch Analysis on the GRID

From a user viewpoint, a simple extrapolation of the local batch analysis.

In practice, must involve all the GRID machinery: authentication, resource brokers, sandboxes.

Viewing the current status (histograms) must be possible.

Advantage: Stateless, can process large data volumes.

Advanced systems already exist (see talk by Andreas Wagner)

AliEnFS & Distributed Analysis

  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   3.03/09     3 December 2002 *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

Compiled for linux with thread support.

CINT/ROOT C/C++ Interpreter version 5.15.61, Oct 6 2002
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0] newanalysis->Submit();

[Diagram labels, distributed analysis: Analysis Macro; Query for Input Data; several CE (computing elements) and MSS (mass storage systems); merged Trees + Histograms]

[Diagram labels, AliEnFS: User Space (AliEnFS, AliEn API); Kernel Space (VFS, LUFS, Linux File System); namespace /alien/ with alice/, atlas/, data/, prod/, mc/, a/, b/; access protocols castor://, soap://, root://, https://; MSS]

Interactive Local Analysis

On a public cluster, or the user’s laptop.

Tools like PAW or its successors are used for visualization and ntuples/trees analysis.

GRID: Interactive Analysis Case 1

Data transfer to user’s laptop

Optional Run/File catalog

Optional GRID software

[Diagram labels: optional run/file catalog; remote file server (e.g. rootd); Trees]

Analysis scripts are interpreted or compiled on the local machine

GRID: Interactive Analysis Case 2

Remote data processing

Optional Run/File catalog

Optional GRID software

[Diagram labels: optional run/file catalog; remote data analyzer (e.g. proofd); Trees; commands, scripts; histograms]

Analysis scripts are interpreted or compiled on the remote machine

GRID: Interactive Analysis Case 3

Remote data processing

Run/File catalog

Full GRID software

[Diagram labels: run/file catalog; remote data analyzer (e.g. proofd); six slaves, each with its Trees; commands, scripts; histograms, trees]

Analysis scripts are interpreted or compiled on the remote master(s)

Data Analysis Projects

Tools for data analysis

PAW: started in 1985, no major developments since 1994.

HippoDraw: started in 1991

ROOT: started in 1995, continuous developments

JAS: started in 1995, continuous developments

Open Scientist: ?

LHC++/Anaphe: 1996-->2002

PI: new project in the LHC Computing Grid, just starting now

PAW

The reference for 18 years (since 1985)

Used by most collaborations

Ported to many platforms, small (3 to 15 MB)

Many criticisms during the development phase

Applauded since it is stable

Maintained by Olivier Couet (ROOT team)

Usage still growing

0.1 FTE

HippoDraw

Author: Paul Kunz

Showed the way in 1991/1992

Usage: Paul + “a 50 year-old CERN physicist”

Seems to be in constant prototyping phases

Good to have this type of prototype to illustrate new possible interactive techniques.

1 FTE ?

ROOT

In constant development since 1995

Used by many collaborations and outside HEP

More than 10000 distributions of binary tar files in February

6 +2+.. FTE

JAS

Started in 1995 (Tony Johnson)

Current version 2. JAS3 presented at this CHEP

For the Java world.

How to cooperate with C++ frameworks?

3 FTE ?

In AIDA you believe ?

The Abstract Interfaces for Data Analysis project was started by the defunct LHC++ and continued by Anaphe (now stopped).

Supported by JAS and Open Scientist

Goal: define abstract interfaces to facilitate cooperation between developers and facilitate migration of users to new products

Versions 1, 2 and 3 (version 4 for PI ?)

In AIDA I don’t believe

Abstract Interfaces are fundamental in modern systems to make a system more modular and adaptable.

But, common abstract interfaces are not a good idea.

They force a lowest common denominator

They require international agreements

Users will be confused (what is common and what is not)

You become the slave of a deal: against creativity

It is more important to agree on object interchange formats and data base access

You can easily change a few hundred lines of code. You cannot copy Terabytes of data

The LCG PI project

Fresh from the oven

One of the projects recently launched by the Applications Area of the LCG project.

Ideas:

promote the use of AIDA (version 4)

Python for scripting

interface to ROOT & CINT

in gestation

see Vincenzo

User & Developer views

Users’ requests

very rarely requests for grandiose new features

zillions of tiny new features

zillions of tiny improvements

want consolidation & stability

Developers’ view

want to implement the sexy features

target modularity (more complex installation?)

maintenance & helpdesk: a problem or a chance?

Lessons from the past

It takes time to develop a general tool

more than 7 years for PAW, ROOT and JAS

User feedback is essential in the development phase

People like stable systems

Efficient access to data sets is a prerequisite

24h x 7days x 12 months x N years online support is vital

Develop/Debug/maintain

In an interactive system with N basic functions, the number of combinations may be unlimited (not NxN, but N!).

10% of the time to develop the first 90% of the code.

90% of the time to develop the remaining 10%.
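The gap between the two estimates grows very quickly with N. A small worked example, assuming "combinations" means the possible orderings of the N functions; this is one interpretation, the slide does not define the term, and the function names are illustrative:

```python
import math


def pairwise(n: int) -> int:
    """Number of ordered pairs of N functions (the 'NxN' estimate)."""
    return n * n


def orderings(n: int) -> int:
    """Number of ways to order N functions (the 'N!' estimate)."""
    return math.factorial(n)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"N={n}: NxN={pairwise(n)}, N!={orderings(n)}")
```

For N = 10 this already gives 100 against 3628800, which is why exhaustive testing of an interactive system is out of reach.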

Time to develop

[Figure: time to develop; the only surviving label is LCG]

Technical aspects

Desktop

Plug-in Manager and Dictionary

GUI

Graphics 2-d, 3-d

Event Displays

Histogramming & Fitting

Statistics tools

Scripting

Data/Program organization

Plug-in Manager

[Diagram labels: Object Dictionary; I/O manager; Interpreter; Plug-in manager; Basic Services, GUI, Math..; User Shared lib; General Utility Shared lib; Exp Shared libs]
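One way to picture a plug-in manager is a table that maps a request to a library which is only loaded when first needed. A minimal sketch, assuming Python standard library modules stand in for user and experiment shared libraries; the `PluginManager` class, its methods and the "Decoder" base name are hypothetical and do not describe any existing framework:

```python
import importlib
import re
from typing import Any, Dict, List, Tuple


class PluginManager:
    """Maps (base name, URI pattern) to a handler imported on first use."""

    def __init__(self) -> None:
        self._handlers: List[Tuple[str, "re.Pattern[str]", str, str]] = []
        self._loaded: Dict[Tuple[str, str], Any] = {}

    def add_handler(self, base: str, uri_regex: str, module: str, attr: str) -> None:
        """Register a handler without importing its module yet."""
        self._handlers.append((base, re.compile(uri_regex), module, attr))

    def find(self, base: str, uri: str) -> Any:
        """Return the handler for `uri`, importing its module if necessary."""
        for handler_base, pattern, module, attr in self._handlers:
            if handler_base == base and pattern.match(uri):
                key = (module, attr)
                if key not in self._loaded:  # load only when first requested
                    self._loaded[key] = getattr(importlib.import_module(module), attr)
                return self._loaded[key]
        raise LookupError(f"no plug-in for {base!r} and {uri!r}")


if __name__ == "__main__":
    manager = PluginManager()
    manager.add_handler("Decoder", r"^json:", "json", "loads")
    decode = manager.find("Decoder", "json:events")
    print(decode('{"n": 3}'))
```

The core system stays small: it knows only the table, and a missing handler is reported at request time rather than at start-up.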

The Object Dictionary

[Diagram labels: Object Dictionary (Data dictionary, Functions dictionary); Compiled code; Interpreted scripts; GUI; Command line; I/O; Inspectors; Browsers]
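The idea of one dictionary serving inspectors, browsers and the command line can be sketched with Python's own introspection. This is an illustration only: the `Track` class and the helper functions are hypothetical, and Python dataclass metadata stands in for the data and functions dictionaries of a C++ framework:

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict


@dataclass
class Track:  # hypothetical user class
    px: float
    py: float
    charge: int

    def pt(self) -> float:
        return (self.px ** 2 + self.py ** 2) ** 0.5


def data_dictionary(cls: type) -> Dict[str, str]:
    """Data dictionary: member name -> type name."""
    return {f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(cls)}


def function_dictionary(cls: type) -> Dict[str, Callable[..., Any]]:
    """Functions dictionary: public method name -> callable."""
    return {
        name: member
        for name, member in vars(cls).items()
        if callable(member) and not name.startswith("_")
    }


def inspect_object(obj: Any) -> Dict[str, Any]:
    """Generic inspector: works for any class described by the dictionary."""
    return {name: getattr(obj, name) for name in data_dictionary(type(obj))}


def call(obj: Any, method: str) -> Any:
    """What a command line or GUI would do: look the method up by name."""
    return function_dictionary(type(obj))[method](obj)


if __name__ == "__main__":
    track = Track(3.0, 4.0, -1)
    print(data_dictionary(Track), inspect_object(track), call(track, "pt"))
```

Neither `inspect_object` nor `call` mentions `Track`; they rely only on the dictionary, which is what lets generic tools handle user classes.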

Scripting for data analysis

After KUIP and Tk/Tcl era

Command line Interface required

Scripts

interpreted or/and byte-code interpreted

automatic compilation and linking

call compiled or interpreted code

compiled code must be able to call interpreted code (GUI and configuration scripts)

Big bonus if compiled and interpreted languages are the same

Scripting and object dictionary symbiosis

Remote execution of scripts (in parallel)

Languages & scripting

[Diagram labels: Interactive User; Batch User; C++ Compiled code; C++ Interpreted scripts; Python/Perl scripts; GUI with signal/slots]

Comparing scripts

http://sarkar.home.cern.ch/sarkar/jroot/main.html

Very interesting project from Subir Sarkar

Cooperation between Java and a C++ framework based on Object Dictionary

GUI(s)

Constant evolution

1983: Vax/VMS, SMS, VT100

1985: GKS, Tektronix

1989: MOTIF, Unix workstations

1997: Java/Swing, The Web

2001: Qt, Linux/Laptops

+ Microsoft MFC, Win32 API

Signals/Slots principle: very nice. It helps designing large and modular GUI systems

Interpreters help GUI builders/editors

2-D graphics

An area where constant improvements are required.

Better plotters, better fonts, ...

Better drivers: postscript, SVG, XML, etc.

Publication quality is a must. This requirement alone explains why many proposed data analysis systems do not penetrate experiments

3-D graphics

Data structures: Objects <--> scene

Scene renderers: OpenGL, Open Inventor

Most difficult is detector geometry graphics

z-buffer algorithms OK for fast real time fancy graphics, not OK for good debugging (shape outline is important on top of z-buffer views).

Vector Postscript (or PDF/SVG) must be available (not Postscript from OpenGL triangles)

see talks about GraXML and Persint

Example with PERSINT/ATLAS

Event Displays

The most successful event displays so far were 2-D projections (see Aleph, Atlas/Atlantis)

A lot of work with 3-d graphics in many experiments (see talks about Iguana)

Client-server model

Access to framework objects, browsers

One could have expected a bigger role for Java! Mismatch with experiment C++ frameworks?

Possible directions

standardize object exchange (SOAP/XML/Root I/O)

standardize low level graphics exchange (HEPREP)

Histogramming

This should be a stable area

Thread Safety

Binning on parallel systems

Merging on batch/parallel systems
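One common way to reconcile thread safety with merging is to let each worker fill a private histogram and add the partial results afterwards. A minimal sketch under that assumption; the `Hist1D` class, the fixed binning and the helper names are illustrative, not the interface of any histogramming package:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


class Hist1D:
    """Fixed-binning 1-D histogram; under/overflow entries are dropped."""

    def __init__(self, nbins: int, lo: float, hi: float) -> None:
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.counts: List[int] = [0] * nbins

    def fill(self, x: float) -> None:
        if self.lo <= x < self.hi:
            index = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
            self.counts[min(index, self.nbins - 1)] += 1  # guard against rounding

    def add(self, other: "Hist1D") -> None:
        """Merge another histogram; only valid for identical binning."""
        if (self.nbins, self.lo, self.hi) != (other.nbins, other.lo, other.hi):
            raise ValueError("histograms must share the same binning")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]


def fill_partial(values: Sequence[float]) -> Hist1D:
    """Each worker fills its own private histogram, so no locking is needed."""
    hist = Hist1D(4, 0.0, 4.0)
    for value in values:
        hist.fill(value)
    return hist


def fill_in_parallel(chunks: Sequence[Sequence[float]]) -> Hist1D:
    total = Hist1D(4, 0.0, 4.0)
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(fill_partial, chunks):
            total.add(partial)  # merging happens in the calling thread only
    return total
```

The same `add` step serves batch jobs as well: each job writes its own histogram and the results are summed once all jobs have finished. The merge is only meaningful because all workers agree on the binning beforehand.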

Fitting

Minuit: the standard

Fumili: was nice and fast

Upgrade of Minuit with new algorithms including Fumili in the pipeline

several GUIs on top

a very powerful package developed by BaBar: see talk on RooFit by D.Kirkby

Statistics & Math

Many tools and algorithms exist

GSL ?

Gnu R-Math project

TerraFerma Initiative

Subject of discussions at many workshops

confidence limits workshops

ACAT FermiLab and Moscow

Durham

Need to be federated in a coherent framework

Lost with Complexity?

In large collaborations, users are often lost when confronted with the complexity of big simulation and reconstruction programs:

What is the data organization?

How are algorithms organized? The hierarchy?

The problem is amplified by the use of dynamically configurable systems, dynamic linking and polymorphism

Browsing data and algorithms is a must

Folders/ white boards

Folders help understanding complex hierarchical structures

Language Independent

Could be GRID-aware

Why Folders ?

This diagram shows a system without folders. The objects have pointers to each other to access each other's data.

Pointers are an efficient way to share data between classes. However, a direct pointer creates a direct coupling between classes.

This design can become a very tangled web of dependencies in a system with a large number of classes.

Why Folders ?

In the diagram below, a reference to the data is in the folder and the consumers refer to the folder rather than each other to access the data.

A naming and search service provides an alternative. It loosely couples the classes and greatly enhances I/O operations.

In this way, folders separate the data from the algorithms and greatly improve the modularity of an application by minimizing the class dependencies.
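The decoupling described here can be sketched as a small naming service. In this hedged example the `Folder` class, the path syntax and the `reconstruct`/`analyse` functions are all hypothetical; the point is only that the consumer depends on a path, not on the producer:

```python
from typing import Any, Dict


class Folder:
    """A named node holding objects and sub-folders, looked up by path."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._items: Dict[str, Any] = {}

    def add_folder(self, name: str) -> "Folder":
        """Return the sub-folder `name`, creating it if it does not exist."""
        return self._items.setdefault(name, Folder(name))

    def add(self, name: str, obj: Any) -> None:
        self._items[name] = obj

    def find(self, path: str) -> Any:
        """Resolve a '/'-separated path relative to this folder."""
        node: Any = self
        for part in path.strip("/").split("/"):
            if not isinstance(node, Folder) or part not in node._items:
                raise KeyError(path)
            node = node._items[part]
        return node


def reconstruct(root: Folder) -> None:
    """Producer: posts its output in a folder, knows nothing about consumers."""
    root.add_folder("event").add_folder("tracks").add("momenta", [1.2, 0.7, 3.4])


def analyse(root: Folder) -> float:
    """Consumer: depends only on the path, not on the producer's class."""
    return max(root.find("/event/tracks/momenta"))


if __name__ == "__main__":
    root = Folder("root")
    reconstruct(root)
    print(analyse(root))
```

Replacing `reconstruct` by another producer, or by an I/O layer that fills the same path from a file, requires no change in `analyse`.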

Tasks/Algorithms

In the same way that Folders can be used to organize the data, one can use Tasks to organize a hierarchy of algorithms.

Tasks can be organized into a hierarchical tree of tasks and displayed in the browser. A Task is an abstraction with standard functions to Begin, Execute, Finish.

Each Task derived class may contain other Tasks that can be executed recursively, such that a complex program can be dynamically built and executed by invoking the services of the top level task or one of its subtasks.

Tasks help understanding the organization and sequence of execution of large programs
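A task hierarchy of this kind can be sketched in a few lines. The `Task` class below, its method names and the `active` flag are assumptions made for illustration; they follow the Begin/Execute/Finish description above but are not the interface of any particular framework:

```python
from typing import List


class Task:
    """A named step with begin/execute/finish hooks and optional sub-tasks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subtasks: List["Task"] = []
        self.active = True  # inactive tasks (and their sub-tasks) are skipped

    def add(self, task: "Task") -> "Task":
        self.subtasks.append(task)
        return task

    # Derived classes would override these three hooks with real work.
    def begin(self, log: List[str]) -> None:
        log.append(f"begin {self.name}")

    def execute(self, log: List[str]) -> None:
        log.append(f"execute {self.name}")

    def finish(self, log: List[str]) -> None:
        log.append(f"finish {self.name}")

    def run(self, log: List[str]) -> None:
        """Run this task, then its sub-tasks recursively, depth first."""
        if not self.active:
            return
        self.begin(log)
        self.execute(log)
        for sub in self.subtasks:
            sub.run(log)
        self.finish(log)

    def outline(self, depth: int = 0) -> List[str]:
        """Indented listing of the task tree, as a browser might show it."""
        lines = ["  " * depth + self.name]
        for sub in self.subtasks:
            lines.extend(sub.outline(depth + 1))
        return lines
```

A program is then assembled at run time by adding tasks to the tree, and any sub-tree can be run on its own by calling `run` on that node.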

Directions

Exchange/Compatibility

If we assume that several data analysis tools will be around (HEP made or commercial), it is important to exchange objects between these tools (drag&drop, network or files).

SOAP/XML have emerged as standards for exchanging low volumes of objects.

Several technical solutions are possible. The winning solutions will be the ones that are able to automate the process by exploiting all the information in the object dictionary.

Follow Microsoft ?

SOAP/XML is one of the key components of .NET (and also of the MS competition).

MS is preparing a new OS (Longhorn ?) for 2005. This new OS will introduce a distributed object data base.

This may have a serious impact on the GRID software and on our tools.

Access Patterns

Understand data access patterns

to objects in one file

to subsets of objects in many collections

relations with run/file catalogs

persistent reference pointers

Optimize design of containers for

processing in batch

interactive parallel processing

cache management and proxies

Query processor

Extend/Develop powerful query systems that

minimize the amount of programming

optimize I/O (read only what is strictly necessary)

are able to process data in parallel, hiding the complexity of parallelism from the end user

can be executed again and again, possibly learning from the previous passes

are robust against network failures, CTRL/C, programming errors

can be run in GUI mode, interpreted or compiled mode
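Reading only what is strictly necessary is easiest to see with column-wise storage. A minimal sketch, assuming an in-memory dictionary of columns stands in for a file; `ColumnStore`, `query` and the column names are hypothetical:

```python
from typing import Callable, Dict, List, Sequence, Set


class ColumnStore:
    """Column-wise data set that records which columns a query touched."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        self._columns = columns
        self.columns_read: Set[str] = set()

    def read(self, name: str) -> List[float]:
        self.columns_read.add(name)  # stands in for real I/O on one column
        return self._columns[name]


def query(
    store: ColumnStore,
    needs: Sequence[str],
    cut: Callable[..., bool],
    value: Callable[..., float],
) -> List[float]:
    """Evaluate `value` for every entry passing `cut`, reading only `needs`."""
    columns = [store.read(name) for name in needs]
    return [value(*row) for row in zip(*columns) if cut(*row)]


if __name__ == "__main__":
    store = ColumnStore({
        "pt": [10.0, 25.0, 40.0],
        "eta": [0.1, 2.9, -1.0],
        "phi": [0.0, 1.0, 2.0],
    })
    selected = query(store, ["pt", "eta"],
                     cut=lambda pt, eta: abs(eta) < 2.5,
                     value=lambda pt, eta: pt)
    print(selected, sorted(store.columns_read))
```

The `phi` column is never read. In this sketch the user lists the needed columns explicitly; a real query processor would be expected to deduce them from the query itself.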

Event Collections

Develop/Extend objects able to keep a summary of previous runs

Event collections with their iterators well matched to the query processor (event+run, UUID, tree entry serial number).

Special objects: masks, bit slice index to speed up searches in large collections.

The system must be able to run with and without the run/file catalog
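The mask idea can be sketched with one bit per entry, so that a selection computed once is reused and combined cheaply in later passes. This example uses a plain bit mask held in a Python integer, not a bit-sliced index; the function names and sample values are illustrative assumptions:

```python
from typing import Callable, Iterator, Sequence


def build_mask(values: Sequence[float], cut: Callable[[float], bool]) -> int:
    """Return an integer whose bit i is set when entry i passes the cut."""
    mask = 0
    for i, value in enumerate(values):
        if cut(value):
            mask |= 1 << i
    return mask


def selected_entries(mask: int) -> Iterator[int]:
    """Iterate over the entry numbers whose bit is set, in increasing order."""
    entry = 0
    while mask:
        if mask & 1:
            yield entry
        mask >>= 1
        entry += 1


if __name__ == "__main__":
    pt = [10.0, 25.0, 40.0, 5.0]
    eta = [0.1, 2.9, -1.0, 0.3]
    high_pt = build_mask(pt, lambda v: v > 8.0)
    central = build_mask(eta, lambda v: abs(v) < 2.5)
    # Combining two stored selections is a single bitwise AND.
    print(list(selected_entries(high_pt & central)))
```

Only the entries listed by the combined mask need to be read in the next pass over the collection.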

Exploiting meta information

The normal data analysis mode requires access to the user classes.

However, experience shows that users also expect (as was the case for PAW) to be able to process their data sets without the classes/shared libraries used to generate these data sets, while still supporting automatic schema evolution.

The class meta information is saved in the data set. Simple queries involving only data class attributes must be possible without the code.

This requirement has consequences on the way the object dictionary is used.
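A self-describing data set can be sketched as data stored together with a description of the class that produced it. In this hedged example JSON stands in for the file format, and the `Hit` class, `write_dataset` and `select` are hypothetical; schema evolution is not covered:

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, List


@dataclass
class Hit:  # hypothetical user class, needed only by the writer
    x: float
    y: float
    energy: float


def write_dataset(objects: List[Any]) -> str:
    """Serialize objects together with a description of their class."""
    cls = type(objects[0])
    members = [f.name for f in fields(cls)]
    rows = [[asdict(obj)[m] for m in members] for obj in objects]
    return json.dumps({"schema": {"class": cls.__name__, "members": members},
                       "rows": rows})


def select(dataset: str, member: str, minimum: float) -> List[float]:
    """Reader side: uses only the stored schema, never the Hit class."""
    content = json.loads(dataset)
    column = content["schema"]["members"].index(member)
    return [row[column] for row in content["rows"] if row[column] > minimum]


if __name__ == "__main__":
    data = write_dataset([Hit(0.0, 1.0, 5.5), Hit(2.0, 3.0, 0.4)])
    print(select(data, "energy", 1.0))
```

`select` would work unchanged in a session where `Hit` was never defined, which is the kind of simple attribute query the slide asks for.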

Dependencies & Simplicity

Minimize component dependencies to facilitate software distribution/portability

The winning tools will be the ones that

are easy to port to new systems (OS/compilers)

depend only on other systems also easy to port

are used in real conditions to guarantee feedback

are able to evolve very quickly to adapt to new situations and new requirements.

Integration with GRID soft

The data analysis software is an integral part of the GRID software. It drives the process, not the inverse.

This implies a close cooperation between teams working on tools for data analysis and teams working on the GRID plumbing (resource brokers, authentication, etc.) and on GRID high level tools like Condor.

The Batch line and the Interactive line must be developed in a complementary way.

Trends Summary

Histogram, Ntuple viewers

Data Presenters

Efficient access to large and structured event collections

Interaction with user & experiment classes

Parallelism on the GRID

Batch/Interactive

Access to Catalogs

Resource Brokers

Process migration

Progress Monitors

Proxies/caches

Virtual data sets

More and more GRID oriented data analysis

More and more experiment-independent software

Acknowledgements

For a long time, data analysis has been the last wheel of the car. Many thanks to the organizing committee for giving me the opportunity to present my views on the subject.

Enjoy this conference