The ATLAS Computing Model

Roger Jones, Lancaster University. CHEP07, Victoria B.C., Sept. 4 2007.

Transcript of The ATLAS Computing Model

  • The ATLAS Computing Model
    Roger Jones, Lancaster University
    CHEP07, Victoria B.C., Sept. 4 2007

  • Overview
    Brief summary: ATLAS Facilities and their roles
    Commissioning
      Cosmics running: M3-M6
      Dummy data: T0/T1
      Full Dress Rehearsals
    Data Distribution
    CPU, Disk, Mass Storage
    Data Access
      Streaming
      TAGs
    Operational Issues and Hot Topics

  • Transition to Service versus Development
    No more clever developments for the next 18 months!
    Focus now must be physics analysis performance
      Which means integration, deployment, testing, documentation, EDM
      Pragmatic addition of limited vital additional functionality
      Stimulation will come from the physics!
    But also many big unsolved problems for later:
      How can we store data more efficiently?
      How can we compute more efficiently?
      How can we reduce memory profiles?
      How far can we move data reduction earlier in the chain?
      How should we use virtualisation?
      What tasks can really be made interactive, and which are desirable?
      How do we use really high-speed communications?

  • Computing Resources
    No major change in the high-level Computing Model; gradual evolution
      Computing Model Review, Jan 2005
      Computing Technical Design Reports, July-October 2005
      Revised planning, Summer 07
    A big change: planning is now beginning to be informed by real data!
      Detector commissioning has a real payload
      Updates in the accelerator schedule
    Updating with each revision:
      Event sizes
      Reconstruction times (large improvements)
      Simulation times
    Constant tension! This is still evolving!
    Memory is an issue

  • ATLAS Requirements at Start of 2008 and in 2010
    Note the high ratio of disk to CPU in the Tier 2s (see the check below the table)
      Not yet realised
      May require adjustments

                                  CPU (MSi2k)       Disk (PB)       Tape (PB)
                                2008    2010    2008    2010    2008    2010
    Tier-0                       3.7     6.1    0.15     0.5     2.4    11.4
    CERN Analysis Facility       2.1     4.6     1.0     2.8     0.4     1.0
    Sum of Tier-1s              18.1    50.0    10.0    40.0     7.7    28.7
    Sum of Tier-2s              17.5    51.5     7.7    22.1       -       -
    Total                       41.4   112.2    18.9    65.4    10.5    41.1
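
    To make the table concrete, here is a minimal Python sketch (not from the talk; the numbers are simply copied from the table above) that re-derives the totals and the disk-to-CPU ratio per centre:

      # Resource table, copied from the slide: (2008, 2010) values per centre.
      resources = {
          "Tier-0":                 {"cpu": (3.7, 6.1),   "disk": (0.15, 0.5),  "tape": (2.4, 11.4)},
          "CERN Analysis Facility": {"cpu": (2.1, 4.6),   "disk": (1.0, 2.8),   "tape": (0.4, 1.0)},
          "Sum of Tier-1s":         {"cpu": (18.1, 50.0), "disk": (10.0, 40.0), "tape": (7.7, 28.7)},
          "Sum of Tier-2s":         {"cpu": (17.5, 51.5), "disk": (7.7, 22.1),  "tape": (0.0, 0.0)},
      }

      for i, year in enumerate((2008, 2010)):
          cpu  = sum(s["cpu"][i]  for s in resources.values())
          disk = sum(s["disk"][i] for s in resources.values())
          tape = sum(s["tape"][i] for s in resources.values())
          print(f"{year}: CPU {cpu:.1f} MSi2k, disk {disk:.1f} PB, tape {tape:.1f} PB")
          for site, s in resources.items():
              print(f"  {site}: disk/CPU = {s['disk'][i] / s['cpu'][i]:.2f} PB per MSi2k")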

  • Resource Plan
    Recall that the early data is for calibration and commissioning
      This is needed either from collisions or cosmics etc.
    The change of schedule makes little change to the resource profile
    The issue of evolution post 2012 is now under active consideration

  • Roles for Sites
    The roles in our model are remarkably stable
    CERN Tier-0:
      Prompt first-pass processing of express/calibration & physics streams with old calibrations - calibration, monitoring
      Calibration tasks on prompt data
      24-48 hours later, process full physics data streams with reasonable calibrations
    CERN Analysis Facility:
      Access to ESD and RAW/calibration data on demand
      Essential for early calibration / detector optimization / algorithmic development
    10 Tier-1s worldwide (2/3 of collaboration):
      Reprocess 1-2 months after arrival with better calibrations
      Reprocess all resident RAW at year end with improved calibration and software
    30+ Tier 2 facilities distributed worldwide:
      On-demand user physics analysis of shared datasets
      Limited access to ESD and RAW data sets
      Simulation (some at Tier 1s in early years)
    Tier 3 centres distributed worldwide:
      On-demand physics analysis
      Data private and local - summary datasets

  • Advice for Tier 3s
    A Tier 3 is not a defined set-up, nor does it have any formal commitment
      Working definition: Tier 3 facilities are for local use, not for all of ATLAS
    Need a Grid User Interface
    There are some ATLAS common requirements
    It is also desirable to have a CE and SE to allow occasional use by ATLAS for additional production capacity, at local discretion
    The ATLAS Tier 3 task force is starting to give recommendations and to describe possible solutions
      Aim is to help sites
      This will be advisory, not prescriptive!
    They can come in different forms; many ideas:
      Dedicated CPU and disk racks
      A fraction of the fabric of a Tier 2
      Desktop clusters
    Enumerate the possibilities, match to cases

  • Roles for People
    Need for tens of groups
      It is clear we will have 20+ physics/detector groups
      We are also seeing justified regional / national groups
    We envisage in a typical group:
      Production role
      Installer
      User
    We really need groups and roles to be known and understood by the middleware
      Different roles, different quotas / fairshares / access to SRMs
    We can live without individual quotas etc. for now
      But we need to be able to manage at the group level
      At present, we are only able to split storage between production and user by force majeure!
    There is a danger here of divergent VO-specific solutions
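
    As an illustration of what "managing at the group level" could mean in practice, here is a hypothetical Python sketch (all group names, roles and numbers are invented; this is not ATLAS or grid middleware) mapping a (group, role) pair to a fairshare and a storage quota:

      from dataclasses import dataclass

      @dataclass
      class GroupPolicy:
          fairshare: float     # fraction of the VO's CPU share
          disk_quota_tb: int   # group disk quota on the SE, in TB

      # Invented examples of per-group, per-role policies.
      POLICIES = {
          ("/atlas/higgs", "production"): GroupPolicy(fairshare=0.05, disk_quota_tb=200),
          ("/atlas/higgs", "user"):       GroupPolicy(fairshare=0.02, disk_quota_tb=20),
          ("/atlas/uk",    "user"):       GroupPolicy(fairshare=0.03, disk_quota_tb=50),
      }

      def policy_for(group, role):
          """Look up the policy for a (group, role) pair, with a modest default."""
          return POLICIES.get((group, role), GroupPolicy(fairshare=0.01, disk_quota_tb=5))

      print(policy_for("/atlas/higgs", "production"))
      print(policy_for("/atlas/phys-top", "user"))   # unknown group: falls back to the default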

  • Data Movement Policy
    We still believe that an efficient system requires:
      The pre-placement of data
        Early on, try to place all of the Express Stream ESD at Tier 2s
        The express stream determines most RAW data placed on disk
        Multiple instances
      Jobs going to the data
        The brokers, and tools like GANGA, steer jobs to a close CE
        The Tier 1s should serve (most) close Tier 2 data needs
        Full AOD set and group Derived Physics Datasets (DPD) in each cloud
        Pre-placed small RAW and ESD samples in Tier 2s
    This requires:
      Tools that place the data quickly and efficiently
        But if that fails, wildcat data movements by individuals make the situation worse for everyone
      Policies that do the same
      If only complete datasets are to be moved:
        The datasets must be closed in a reasonably short time
        The copy to be replicated must itself be copied quickly/completely
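
    A small Python sketch (ours, not the ATLAS distributed data management tools) of the rule that only complete, closed datasets are replicated, and that an incomplete copy is not used as a source:

      from dataclasses import dataclass

      @dataclass
      class Replica:
          site: str
          files_present: int

      @dataclass
      class Dataset:
          name: str
          closed: bool              # True once no more files will be added
          total_files: int
          replicas: list

      def eligible_sources(ds):
          """Sites whose copy is complete and can seed a new replica."""
          if not ds.closed:
              return []             # open datasets are not replicated at all
          return [r.site for r in ds.replicas if r.files_present == ds.total_files]

      ds = Dataset("data.cosmics.ExpressStream.ESD",   # invented dataset name
                   closed=True, total_files=120,
                   replicas=[Replica("CERN", 120), Replica("RAL", 87)])
      print(eligible_sources(ds))   # -> ['CERN']; the RAL copy is still incomplete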

  • User Data Movement Policy
    Users need to access the files they produce
      This means they need (ATLAS) data tools on Tier 3s
    There is a risk: some users may attempt to move large data volumes (to a Tier 2 or Tier 3)
      SE overload
      Network congestion
    ATLAS policy in outline:
      O(10 GB/day/user): who cares?
      O(50 GB/day/user): rate throttled
      O(10 TB/day/user): user throttled!
      Planned large movements possible if negotiated
    But how can we enforce this?!
      The first line of defence is user education, but it is not enough
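
    A sketch of how the outline policy above might be applied (the thresholds are taken from the slide; the classification logic and names are ours):

      def transfer_decision(gb_moved_today):
          """Classify a user's cumulative transfer volume for the day."""
          if gb_moved_today <= 10:          # O(10 GB/day/user): who cares?
              return "allow"
          if gb_moved_today <= 10_000:      # O(50 GB/day/user) and above: rate throttled
              return "rate-throttled"
          return "user-throttled"           # O(10 TB/day/user): blocked unless negotiated

      for volume in (5, 60, 2_000, 15_000):
          print(f"{volume:>7} GB/day -> {transfer_decision(volume)}")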

  • Interaction Between Computing Models
    So far, the computing models have been tested largely independently
      They are (quite rightly) designed to optimize the operation of the individual VO
    Many sites and many networks are shared
    How will the VO computing models interact?
      E.g. LHCb are the only VO with on-demand analysis at Tier 1s
        Different access patterns etc.
        Probably a manageable interaction, but we need experience
      E.g. on-demand data access in CMS from a Tier 2 to any Tier 1
        Very different network usage cf. ATLAS
    The proposed joint exercises will be very important to iron out issues in advance of real data

  • Access Optimization - Streaming
    All discussions are about optimisation of data access
    ATLAS now plans streaming of RAW, ESD, AOD
      Streams derived from trigger information
      Does not change with processing version
    Inclusive versus exclusive streaming being evaluated based on recent tests
      Current tests have ~5 streams, plus an overlap stream for exclusive streaming
      Evaluating bookkeeping load, multi-stream analyses etc.
    Further refinement of event selection using TAGs
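
    A toy Python sketch (trigger names and stream definitions are invented) of the difference between inclusive and exclusive streaming: with exclusive streams, an event whose triggers match more than one stream is written once to an overlap stream instead of being duplicated:

      STREAM_TRIGGERS = {              # stream -> trigger signatures feeding it (invented)
          "egamma": {"e25i", "g60"},
          "muon":   {"mu20"},
          "jetTau": {"j160", "tau35i"},
      }

      def streams_for(fired, exclusive):
          """Map an event's fired triggers to its output stream(s)."""
          matched = {s for s, bits in STREAM_TRIGGERS.items() if bits & fired}
          if exclusive and len(matched) > 1:
              return {"overlap"}       # exclusive: the duplicate goes to one overlap stream
          return matched               # inclusive: the event is written to every matched stream

      event = {"e25i", "mu20"}         # an event firing an electron and a muon trigger
      print(streams_for(event, exclusive=False))   # {'egamma', 'muon'}
      print(streams_for(event, exclusive=True))    # {'overlap'}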

  • Event Data Model
    Lots of progress on all formats
      Trade size versus flexibility
      Much more will be possible from AOD than originally planned
      AOD size still under control
    Big discussion on Derived Physics Data (DPD)
      This can be made by physics/detector groups or individuals
      Much work on possible common formats - good for groups
      Some use cases are more demanding (large samples, partial data required)
    Studies of:
      Skimming (event selections)
      Thinning (selecting containers, or objects from a container)
      Slimming (selection of the properties of an object)
    Also studies of alternate formats
      Hope this is only needed in a few cases
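
    The three reduction operations can be shown on a toy structure (plain Python dictionaries, not the ATLAS EDM; the values are invented):

      events = [
          {"trigger": "e25i", "electrons": [{"pt": 40, "eta": 0.3, "cells": [1, 2, 3]},
                                            {"pt": 12, "eta": 2.1, "cells": [4]}]},
          {"trigger": "mu20", "electrons": []},
      ]

      # Skimming: keep or drop whole events (an event selection).
      skimmed = [ev for ev in events if ev["trigger"] == "e25i"]

      # Thinning: within the kept events, keep only selected objects from a container.
      thinned = [dict(ev, electrons=[el for el in ev["electrons"] if el["pt"] > 20])
                 for ev in skimmed]

      # Slimming: keep only selected properties of each remaining object.
      slimmed = [dict(ev, electrons=[{"pt": el["pt"], "eta": el["eta"]}
                                     for el in ev["electrons"]])
                 for ev in thinned]

      print(slimmed)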

  • Optimised Access - TAGs
    RAW, ESD and AOD will be streamed to optimise access
    The selection of, and direct access to, individual events is via a TAG database
      A TAG is a keyed list of variables per event
      The overhead of file opens is acceptable in many scenarios
      Works very well with pre-streamed data
    Two roles:
      Direct access to an event in a file via a pointer
      Data collection definition function
    Two formats, file and database:
      We now believe large queries require the full database
        Multi-TB relational database; at least one, but the number is to be determined from tests
        Restricted to Tier 0 and a few other sites
        Does not support complex physics queries
      The file-based TAG allows direct access to events in files (pointers)
        Ordinary Tier 2s hold the file-based primary TAG corresponding to locally-held datasets
        Supports physics queries
    See the performance and scalability test talk from Helen McGlone
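
    A minimal sketch of the file-based TAG idea (field names, file names and values are invented): each TAG row holds a few selection variables per event plus a pointer back to the event in its file, so a query on the TAG yields direct event access without scanning whole files:

      tags = [
          {"run": 5200, "event": 17, "nMuon": 2, "missingEt": 65.0,
           "pointer": ("AOD.pool.root.1", 17)},
          {"run": 5200, "event": 18, "nMuon": 0, "missingEt": 12.0,
           "pointer": ("AOD.pool.root.1", 18)},
          {"run": 5201, "event": 3,  "nMuon": 1, "missingEt": 48.0,
           "pointer": ("AOD.pool.root.2", 3)},
      ]

      def select(tag_rows, predicate):
          """Return (file, event) pointers for the TAG rows passing the predicate."""
          return [row["pointer"] for row in tag_rows if predicate(row)]

      # A selection on TAG variables; only the selected events are then opened.
      pointers = select(tags, lambda t: t["nMuon"] >= 1 and t["missingEt"] > 40.0)
      print(pointers)   # -> [('AOD.pool.root.1', 17), ('AOD.pool.root.2', 3)]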

  • Commissioning Plans
    We are now getting real data, at realistic rates, to validate the model
    M3 (mid-July):
      Cosmics produced about 100 TB in 2 weeks
      Stressed offline by running at 4 times the nominal rate (32 LAr sample test)!
    M4 now underway - August 23 to early September:
      Expected total data volume: RAW = 66 TB, ESD + AOD = 6 TB
        20 TB RAW data and 6 TB ESD at RAL
        2 TB ESD at 5 Tier 2 sites
      Data distribution as for real data
      Currently writing at 200 MB/s, half nominal (checked in the sketch below)
      RAW and ESD now appearing at RAL
    M5 will be similar - October 16-23
    M6 will run from end of December until real data:
      Incremental goals, reprocessing between runs
      Will run close to nominal rate
      Maybe ~420 TB by start of run, plus Monte Carlo
      T1s should treat this as valuable data, but it may only live for about a year
    Status:
      Full-rate T0 processing: OK
      Data exported to 5/10 T1s and stored: OK, and did more!
      For 2/5 T1s, exports to at least 2 T2s: not yet all done
      Quasi-real-time analysis in at least 1 T2: OK, and did more!
      Reprocessing in Sept. in at least 1 T1: not yet started

    Full chain from DAQ to DA in last week of Aug07
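
    A quick back-of-the-envelope check (ours, not from the talk) of the M4 figures quoted above:

      MB, TB = 1e6, 1e12

      current = 200 * MB                    # current writing rate ("half nominal")
      nominal = 2 * current                 # implied nominal rate, ~400 MB/s
      per_day = current * 86400 / TB        # data volume per day at the current rate
      print(f"nominal ~{nominal / MB:.0f} MB/s, ~{per_day:.1f} TB/day at 200 MB/s")

      raw, esd_aod = 66, 6                  # TB, expected M4 volumes from the slide
      print(f"M4 total ~{raw + esd_aod} TB; "
            f"~{420 / (raw + esd_aod):.0f} M4-sized exercises' worth of data "
            "would match the ~420 TB projected by the start of the run")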

  • M4 Data Movement
    [Plots: throughput (MB/s) and completed file transfers]

  • FDR production goals
    Simulated events injected into the T/DAQ
      Realistic physics mix in bytestream format, including luminosity blocks
      Real data file & dataset sizes, trigger tables, data streaming
    T0/T1 data quality, express line, calibration running
      Use of the conditions database
    T0 reconstruction: ESD, AOD, TAG, DPD
    Exports to T1s & T2s
    Remote analysis
    @ the T1s:
      Reprocessing from RAW: ESD, AOD, DPD, TAG
      Remake AOD from ESD
      Group-based analysis DPD
    @ the T2s & T3s:
      Root-based analysis
      Trigger-aware analysis with conditions and trigger DBs
      No MC truth; user analysis
    MC/Reco production in parallel

  • FDR Schedule
    Round 1:
      Data streaming tests: DONE
      Sept/Oct 07: Data preparation: STARTS SOON
      End Oct 07: Tier 0 operations tests
      Nov 07 - Feb 08: Reprocess at Tier 1s, make group DPDs

    Round 2 (ASSUMING NEW G4):
      Dec 07 - Jan 08: New data production for final round
      Feb 08: Data preparation for final round
      Mar 08: Reco final round (ASSUMING SRMv2.2)
      Apr 08: DPD production at T1s
      Apr 08: More simulated data production in preparation for first data
      May 08: Final FDR
    First-pass production should be validated by year-end
    Reprocessing will be validated months later
    Analysis roles will still be evolving
    Expect the unexpected!

  • Remember the Graphic?
    The LHC Computing Facility
    Is this a cloud of uniformity, or the fog of war??!!
    In fact, the fog starts to clear

  • Summary
    The computing model has (so far) stood up well
    The localization of data to clouds future-proofs!
    Scheduled production is largely solved
    On-demand analysis, data management & serving many users are the mountains we are climbing now
    Users are important to getting everything working
    No Pain, No Gain!
