SkimSlimService: ENABLING NEW WAYS

ILIJA VUKOTIC, 2/18/13

Problems of Current Analysis Model

Unsustainable in the long run (higher luminosity, no faster CPUs).

Physicists have no feedback on the resources they use.

Long running times.

Only a very small percentage of people want to, or know how to, optimize their code.

IT people are not happy when someone submits 10k jobs that run at 1% efficiency for days and produce 10k files of 100 MB each.

Huge load on the people doing DPD production: frequent errors, slow turnaround.

Nobody wants to care about dataset (DS) sizes, registrations, DDM transfers, approvals.

This is the moment to make changes.

(R)evolution of ATLAS data formats

Original plan (6 y. ago)
• ESD < 500 kB/ev, 1k br.
• AOD < 100 kB/ev, 500 br.
• Athena used for everything

4 y. ago
• ESD < 1500 kB/ev, 8k br.
• AOD < 500 kB/ev, 4k br.
• Athena + ARA

3 y. ago
• ESD < 1800 kB/ev, 10k br.
• AOD < 1000 kB/ev, 7k br.
• D3PD < 20 kB/ev, 500-7k br.
• Athena + ARA + ROOT

Today
• ESD < 1800 kB/ev
• AOD < 1000 kB/ev
• D3PD < 200 kB/ev
• Athena + ARA + ROOT + Mana + RootCore + Event…

Proposals for future
• ESD < 1800 kB/ev
• AOD < 1000 kB/ev
• GODZILLA D3PDs, Structured D3PDs, D3PD
• Athena + ARA + ROOT + Mana + RootCore + Event…
• TAG?!

Problems with ATLAS data formats

ESDs
• Large and kept only for a short time. Used only for special studies.

AOD
• Too large, needs Athena/ARA/Mana, slow to start up, nobody made it user friendly.

D3PD
• A lot of them.
• Flat format.
• Too large (in sum much larger than AOD).
• Expensive to produce and store. Inefficient to read.
• Could be reduced by at least 60%, but nobody knows who needs what.
• Effectively usable only from grid jobs.

Skim/slim D3PD
• Takes up to a week to produce on the grid.
• People make them larger than necessary to avoid doing it twice.
• Files are usually too small for efficient transport and storage, thus requiring merging that can’t be done on the grid.

What does a physicist want?

Full freedom to do analysis.

In whatever language they want.

Not to be forced to use complex frameworks with hundreds of libraries, 20 min compilations, etc.

Not to be forced to think about computing farms, queues, data transfers, job efficiency, …

To get results in no time.

Idea

Let a small number of highly experienced physicists, together with IT staff, handle the big data. They can do it efficiently.

Move the majority of physicists away from 100 TB scale data to ~100 GB data.

That is small enough to transport, so you can analyze it anywhere, even on your laptop.

However inefficient your code is, you won’t use too many resources and you will get results back in a reasonable time.

How would it work

Use FAX to access all the data without the overhead of staging.

Use optimally situated replicas (possible optimization: production D3PDs pre-placed at only a few sites, maybe even just one).

Physicists request a skim/slim through a web service.

A few variables could be added on the fly.

Produced datasets are registered in the name of the requester.

Delivered to the requested site.

All in 1-2 hours. This is essential: only then will people skim/slim down to just the variables they need, without worrying “what if I forget something I’ll need?”
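To make the two operations concrete, here is a minimal sketch of what a skim/slim request asks for: skimming drops whole events that fail a selection, slimming drops the branches (variables) nobody asked for. This is a toy on plain Python dictionaries, not the service's actual code; the function name, the branch names and the cut value are all made up for illustration.

```python
def skim_slim(events, keep_branches, selection):
    """Return the events passing `selection`, reduced to `keep_branches`."""
    reduced = []
    for event in events:
        if selection(event):  # skim: drop whole events that fail the cut
            # slim: copy only the requested branches
            reduced.append({name: event[name] for name in keep_branches})
    return reduced


# Toy events with hypothetical branch names.
events = [
    {"el_pt": 42.0, "el_eta": 0.3, "mu_pt": 5.0, "jet_n": 2},
    {"el_pt": 12.0, "el_eta": 1.9, "mu_pt": 31.0, "jet_n": 0},
    {"el_pt": 55.0, "el_eta": -1.1, "mu_pt": 8.0, "jet_n": 4},
]

# Keep two branches, and only events with an electron above a hypothetical cut.
reduced = skim_slim(events, ["el_pt", "jet_n"], lambda ev: ev["el_pt"] > 25.0)
print(reduced)
```

The output is smaller in both directions (fewer events, fewer branches), which is how a 100 TB scale input can shrink to something a laptop can hold.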

Would it work?

A couple of hundred dedicated cores, freed up from all the personal, inefficient slims/skims run with prun.

Highly optimized code.

As we know which branches (variables) people are using, we know what is useless in the original D3PDs, so we can produce them much smaller.

If a bug is found in D3PD production, no new global redistribution is needed. Some problems can even be fixed in place, without a new production.

If we find it useful, we can split/merge/reorganize D3PDs without anyone noticing.

Later we could even move to a completely different underlying big data format: Godzilla D3PDs, merged AOD/D3PD, Hadoop!
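The point about knowing which branches people use can be sketched as simple bookkeeping over the incoming requests. The snippet below is only an illustration under assumed inputs: the branch names and the request lists are hypothetical, and the real service would take them from its request database.

```python
from collections import Counter


def unused_branches(all_branches, requests):
    """Count how many requests use each branch and list the ones never used."""
    # set(req) makes sure a branch listed twice in one request counts once.
    usage = Counter(branch for req in requests for branch in set(req))
    never_used = sorted(b for b in all_branches if usage[b] == 0)
    return never_used, usage


# Hypothetical branch list of a production D3PD and three user requests.
all_branches = ["el_pt", "el_eta", "mu_pt", "jet_n", "trk_d0"]
requests = [["el_pt", "jet_n"], ["el_pt", "mu_pt"], ["jet_n"]]

never_used, usage = unused_branches(all_branches, requests)
print("never requested:", never_used)
print("most requested:", usage.most_common(2))
```

Branches that never show up in any request are candidates for removal from the next D3PD production.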

SkimSlimService

Web site at CERN: gets requests, shows their status.

Handmade server¹: receives web queries, collects info on datasets, files, trees, branches.

Executor at UC3¹: gets tasks from the DB, creates and submits Condor SkimSlim jobs², makes and registers the resulting DS³.

Oracle DB at CERN: stores requests, splits them into tasks, serves as a backend for the web site.

¹ We have no dedicated resources for this. I used UC3, but any queue that has CVMFS will suffice.
² A modified version of filter-and-merge.py is used.
³ Currently under my name, as I don’t have a production role.
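How a request might be split into tasks for the executor can be sketched as follows. The slide does not say how the split is done, so the chunking by a fixed number of input files, the field names and the file names are assumptions made purely for illustration.

```python
def split_into_tasks(request_id, files, files_per_task):
    """Split one request's input files into tasks of at most `files_per_task` files."""
    if files_per_task < 1:
        raise ValueError("files_per_task must be at least 1")
    tasks = []
    for start in range(0, len(files), files_per_task):
        tasks.append({
            "request": request_id,
            "task": start // files_per_task,  # running task index within the request
            "files": files[start:start + files_per_task],
        })
    return tasks


# Hypothetical request with five input files, two files per task.
files = ["f1.root", "f2.root", "f3.root", "f4.root", "f5.root"]
tasks = split_into_tasks(7, files, 2)
for task in tasks:
    print(task)
```

Each task is independent, so the executor can turn every one of them into a separate batch job and merge the outputs afterwards.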


http://ivukotic.web.cern.ch/ivukotic/SSS/index.asp

Test run results

Used the datasets and the skim/slim code of our largest user. Worst case scenario.

All of the SMWZ 2012 data and MC

185 TB -> 10 TB (300 branches)

Missing in FAX: 24 datasets (~3.5%)

[Figure: SMWZ replicas in FAX. Number of datasets (0 to 120) against number of replicas (0 to 6) for Egamma, Muons, Sherpa, Herwig, Alpgen, Pythia8 and mc.rest.]

List               Datasets
data.Egamma.txt    284
data.Muons.txt     288
mc.Alpgen.txt      63
mc.Herwig.txt      3
mc.Pythia8.txt     28
mc.Sherpa.txt      19
mc.all.txt         9
Total              694
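As a quick cross-check of the numbers above, the per-list counts can be summed and the missing fraction recomputed. The counts and the 24 missing datasets are taken from the slide; reading the counts as numbers of datasets per list is an assumption.

```python
# Counts per list, as given on the slide.
counts = {
    "data.Egamma.txt": 284,
    "data.Muons.txt": 288,
    "mc.Alpgen.txt": 63,
    "mc.Herwig.txt": 3,
    "mc.Pythia8.txt": 28,
    "mc.Sherpa.txt": 19,
    "mc.all.txt": 9,
}
missing_in_fax = 24  # datasets not found in FAX, from the slide

total = sum(counts.values())
missing_percent = round(100.0 * missing_in_fax / total, 1)
print(total, missing_percent)  # prints: 694 3.5
```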

Test run results

CPU efficiency: ~75% when the data is local, between 10 and 50% when the data is remote

(6.25MB/s gives 100% eff.)

All of SMWZ requires 8600 CPU hours.

Can be done in 2 hours by pooling unused resources.

We could have one service in the EU and one in the US to avoid transatlantic traffic.

It is easy to deploy the service on anything that mounts CVMFS (UC3, UCT3, UCT2, OSG, EC2).

On EC2, assuming small instances: ~$500.

With micro instances and spot pricing: ~$100. But result delivery costs ~$1k (10 TB × $0.12/GB).
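The arithmetic behind these numbers can be sketched in a few lines. All inputs come from the slide (8600 CPU hours, 2 hours, 6.25 MB/s, 10 TB, $0.12/GB). Two things are assumptions of this sketch: that CPU efficiency grows linearly with the input rate until it saturates at 6.25 MB/s, and that 1 TB is counted as 1000 GB.

```python
def cores_needed(cpu_hours, wall_hours):
    """Number of cores needed to finish `cpu_hours` of work in `wall_hours`."""
    return cpu_hours / wall_hours


def cpu_efficiency(rate_mb_s, full_rate_mb_s=6.25):
    """Assumed linear model: efficiency scales with input rate, capped at 100%."""
    return min(1.0, rate_mb_s / full_rate_mb_s)


def delivery_cost_usd(size_tb, usd_per_gb=0.12, gb_per_tb=1000):
    """Cost of shipping the result out of the cloud at a flat per-GB price."""
    return size_tb * gb_per_tb * usd_per_gb


print(cores_needed(8600, 2))            # cores to process all of SMWZ in 2 hours
print(cpu_efficiency(3.125))            # input rate that would give 50% efficiency
print(round(delivery_cost_usd(10), 2))  # delivery of the 10 TB output
print(round(500 / 8600, 3))             # implied price per CPU hour of the ~$500 estimate
```

Under these assumptions the 2 hour turnaround needs about 4300 cores, the quoted 10 to 50% remote efficiency corresponds to roughly 0.6 to 3.1 MB/s per job, and delivery comes to about $1200, consistent with the ~$1k on the slide.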


Conclusion


Produced a fully functional system you may use now.

To be done

Polish it

Market it

Push it politically (essential)


Reserve


What is FAX?

A number of ATLAS sites have made their storage accessible from outside using the xRootD protocol¹.

It has a mechanism that gets you a file if it exists anywhere in the federation.

All kinds of sites: xrootd, dCache, dpm, lustre, gpfs

Read only

Need a grid proxy to use it

Instructions: https://twiki.cern.ch/twiki/bin/view/Atlas/UsingFAXforEndUsers

¹ CMS has a very similar system, which they call AAA.

[Figure: hierarchy of redirectors and endpoints. Labels: global, regional, EU, UK; sites: AGLT2, MWT2, SLAC, Oxford, QMUL; legend: Redirector, Endpoint.]
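The idea of getting a file if it exists anywhere in the federation can be illustrated with a toy lookup that starts at one endpoint and climbs through the redirectors until some site has the file. This is a simulation in plain Python, not how XRootD itself is implemented; the tree layout (which sites hang under which redirector), the redirector names and the file names are all assumptions.

```python
class Node:
    """An endpoint (holds files) or a redirector (holds children)."""

    def __init__(self, name, files=(), children=()):
        self.name = name
        self.files = set(files)
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def find_below(self, lfn):
        """Return the name of a site at or below this node that has `lfn`."""
        if lfn in self.files:
            return self.name
        for child in self.children:
            hit = child.find_below(lfn)
            if hit is not None:
                return hit
        return None


def locate(start, lfn):
    """Search locally first, then widen the search one redirector level at a time."""
    node = start
    while node is not None:
        hit = node.find_below(lfn)
        if hit is not None:
            return hit
        node = node.parent
    return None


# Hypothetical layout and file placement.
aglt2 = Node("AGLT2", files={"data.B.root"})
mwt2 = Node("MWT2", files={"data.A.root"})
oxford = Node("Oxford")
qmul = Node("QMUL", files={"data.C.root"})
us = Node("regional-US", children=[aglt2, mwt2])
uk = Node("regional-UK", children=[oxford, qmul])
top = Node("global", children=[us, uk])

print(locate(oxford, "data.C.root"))   # found through the UK redirector
print(locate(oxford, "data.A.root"))   # found through the global redirector
print(locate(oxford, "missing.root"))  # not in the federation
```

A job at Oxford gets the nearby copy when there is one and only reaches across regions when it has to.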

FAX today

We want all the T1s and T2s included.

Adding new sites weekly.

Currently 31:

• AGLT2
• BNL-ATLAS
• BU_ATLAS_TIER2
• CERN-PROD
• DESY-HH
• INFN-FRASCATI
• INFN-NAPOLI-ATLAS
• INFN-ROMA1
• JINR-LCG2
• LRZ-LMU
• MPPMU
• MWT2
• OU_OCHEP_SWT2
• PRAGUELCG2
• RAL-LCG2
• RU-PROTVINO-IHEP
• SWT2_CPB
• UKI-LT2-QMUL
• UKI-NORTHGRID-LANCS-HEP
• UKI-NORTHGRID-LIV-HEP
• UKI-NORTHGRID-MAN-HEP
• UKI-SCOTGRID-ECDF
• UKI-SCOTGRID-GLASGOW
• UKI-SOUTHGRID-CAM-HEP
• UKI-SOUTHGRID-OX-HEP
• WT2
• WUPPERTALPROD
• GRIF-LAL
• GRIF-IRFU
• GRIF-LPNHE
• IN2P3-LAPP

Does it work?

YES!*

* For the most part. But there is a lot of redundancy in the system: we have ~2.5 copies of popular datasets.

What is it good for?

Failover if a grid job has a problem with an input file
• IT: fewer failed jobs
• Physicist: fewer failed jobs

Diskless Tier2
• IT: easier upgrades, more availability
• Physicist: more CPU resources

Diskless Tier3s
• IT: simpler and cheaper
• Physicist: more CPU resources

Enables storage sharing between nearby sites
• Physicist: effectively more disk space
• Less data movement
• GlobalLFN simplifies scripts

Easily spin up more workers
• University queues
• Amazon, Google, Microsoft clouds

Have full info
• Optimize applications
• Who is reading what
• How efficiently

How does it work?

Quite a complex system

A lot of people involved

A lot of development

Takes time to deploy

Takes time to work out the kinks

FAX

Xrootd protocol
• Dcache
• Xrootd
• DPM

N2N
• LFC

Authentication
• VOMS x509

Infrastructure changes
• AGIS
• Pilot

Monitoring & testing
• GLED
• ML
• AMQ
• SSB
• HC

What can I do today?

Access data on T2 disks: localgroupdisk, userdisk, …

If a file is not there, the job won’t fail; the file will come from elsewhere.

I can run jobs at uct2/uct3 and access data anywhere in FAX.

Use frun:
◦ If you have data processed at 10 sites all over the world
◦ Want to merge them
◦ Want to submit jobs where queues are short

Full Dress Rehearsal

A week of stress testing all of the FAX endpoints.

While we have continuous monitoring of standard user accesses (ROOT, xrdcp), to stress the system one has to submit jobs to the grid.

Realistic jobs were submitted both manually and automatically.

Had more problems with the tests than with FAX:
◦ Late distribution of the test datasets to endpoints (TB size datasets)
◦ High load due to winter conferences did not help
◦ Jobs running on a grid node are an entirely different game, due to the limited proxy they use
◦ Found and addressed a number of issues:
  ◦ New voms libraries developed
  ◦ Settings at several sites corrected
  ◦ New pilot version

Conclusion: we broke nothing (storages, LFCs, links, servers, monitoring). As soon as all the observed problems are fixed, we’ll hit harder.

FAX: remaining to be done

Near future:
• Further expansion: next in line are the French and Spanish clouds
• Improving robustness of all the elements of the system
• Improving documentation, giving tutorials, user support

Months:
• Move to Rucio
• Optimization: making the network smart so it provides the fastest transfers
• Integration with other network services

Foogle.com

New internet search engine! Say NO to IE, firefox, chrome! Terminal based! From the inventors of the WWW!

Simple to use:
• Learn a few simple things (shell scripts, pbs/condor macros, python, root and c++, LaTeX, …)
• Write a few hundred pages of code
• Process crawler data and rewrite in a new way. Move it.
• Rewrite original format to a new, different one.
• Rewrite again. Move it.
• Rewrite again. Move it.
• Rewrite again. Move it.
• Code to find the page
• Compile your page to ps/pdf
• Show!

RAW -> ESD
ESD -> AOD
AOD -> D3PD
D3PD -> slimmed D3PD
Slimmed one to Ntuple for final analysis
Final analysis