
Computing Strategy

Victoria White, Associate Lab Director for Computing and CIO
Fermilab PAC, June 24, 2011

The experiments you approve
Depend heavily on computing at all stages, from inception to publication and beyond:
- Facilities (power, cooling, space)
- Data storage and distribution
- Compute servers
- Grid services
- Databases
- High-performance networks
- Software frameworks for simulation, processing, and analysis
- Tools such as GEANT, ROOT, Pythia, and GENIE
- General tools to support collaboration, documentation, code management, etc.

Our job in the Computing Sector
Is to enable science and to optimize the support (human and technological) of the scientific programs of the lab, including the experiment program:
- within funding and resource constraints
- in the face of growing demands
- to meet emerging needs
- while dealing with rapidly changing technology

We also have to provide computing to support the lab's operations and provide all the standard services that an organization needs (and often expects 24x7).

Computing Division -> Computing Sector

Service Management
- Business Relationship Management (BSM)
- ITIL process owners
- Continuous Service Improvement Program
- ISO 20K certification

Office of the CIO
- Enterprise Architecture (EA) and configuration management
- Computer security
- Governance and portfolio management
- Project Management Office
- Financial management

Scientific Computing strategy
- Provide computing, software tools, and expertise to all parts of the Fermilab scientific program, including theory simulations (lattice QCD and cosmology) and accelerator modeling.
- Work closely with each scientific program, both as collaborators (where a scientist from computing is involved) and as valued customers.
- Create a coherent scientific computing program from the many parts and many funding sources, encouraging sharing of facilities, common approaches, and re-use of software wherever possible.

Experiment computing strategies

CMS Tier-1 at Fermilab
The CMS Tier-1 facility at Fermilab and the experienced team who operate it enable CMS to reprocess data quickly and to distribute the data reliably to the user community around the world.

Fermilab also operates:
- LHC Physics Center (LPC)
- Remote Operations Center
- U.S. CMS Analysis Facility

2010 data taking was successful, and scientific results were published within weeks of data being acquired. The computing system has had the flexibility to handle:
- trigger rates at 300 Hz, and much higher in peaks; the T0 kept up
- more reprocessing passes than expected (19)
- the full 2010 pp dataset reprocessed in 10 days, plus one week of debugging/postmortem
- large transfer rates with a full-mesh transfer topology: 5 PB from the T0 and other T1s to the T1, and most T2s can get data from any T1 or T2
- 3.1 billion events simulated, with underlying experimental conditions changing throughout
- workflows performed at the expected locations from day 1

2011 will have a quick startup, with peak luminosity for the year reached as soon as May: 16 interactions/event! RECO/AOD event sizes double from 2010 values to 0.8/0.2 MB/event.

CMS Offline and Computing
Fermilab is a hub for CMS Offline and Computing:
- Ian Fisk is the CMS Computing Coordinator
- Liz Sexton-Kennedy is Deputy Offline Coordinator
- Patricia McBride is Deputy Computing Coordinator
- Leadership roles in many areas of CMS Offline and Computing: frameworks, simulations, data quality monitoring, workload management and data management, data operations, integration, and user support
- The Fermilab Remote Operations Center allows US physicists to participate in monitoring shifts for CMS

Computing Strategy for CMS
- Continue to evolve the CMS Tier-1 center at Fermilab, to meet US obligations to CMS and provide the highest level of availability and functionality for the dollar.
- Continue to ensure that the LHC Physics Center and the US CMS physics community are well supported by the Tier-3 (LPC CAF) at Fermilab.
- Plan for evolution of the computing, software, and data access models as the experiment matures; this requires R&D and development:
  - ever higher bandwidth networks
  - data on demand
  - frameworks for multi-core

Any Data, Anywhere, Any Time: Early Demonstrator
The ROOT I/O and Xrootd demonstrator is an example of evolving requirements and technology.
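The demonstrator builds on standard ROOT and xrootd components, so the "data on demand" idea can be illustrated with a short sketch: open a file over the network through ROOT's xrootd client instead of copying it locally first. This is only an illustration; the redirector host, file path, and tree name below are placeholders, not actual CMS endpoints.

// read_remote.C -- minimal ROOT macro sketch: read a tree over xrootd.
// The URL below is a placeholder; a real site would use its own redirector.
#include "TFile.h"
#include "TTree.h"
#include <iostream>

void read_remote()
{
  // TFile::Open dispatches on the protocol prefix; "root://" selects the
  // xrootd client, so the file is read remotely rather than copied.
  TFile* f = TFile::Open("root://xrootd.example.org//store/demo/events.root");
  if (!f || f->IsZombie()) {
    std::cerr << "could not open remote file\n";
    return;
  }

  TTree* tree = nullptr;
  f->GetObject("Events", tree);   // "Events" is an assumed tree name
  if (tree) {
    std::cout << "entries: " << tree->GetEntries() << std::endl;
  }
  f->Close();
}

Because ROOT's tree I/O reads branches in baskets on demand, only the data actually touched needs to cross the network, which is what makes this kind of wide-area access attractive.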

Run II Computing Strategy
- Production processing and Monte Carlo production capability after the end of data taking.
- Reprocessing efforts in 2011/2012 aimed at the Higgs.
- Monte Carlo production at the current rate through mid-2013.
- Analysis computing capability for at least 5 years, but diminishing after the end of 2012.
- The push for 2012 conferences for many results means no large drop in computing requirements through this period.
- Continued support for up to 5 years for code management, science software infrastructure, and data handling for production (and MC) and analysis operations.
- Curation of the data: more than 10 years, with possibly some support for continuing analyses.

Tevatron looking ahead
- CDF and D0 expect the publication rate to remain stable for several years.
- Analysis activity: expect more than 100 students and postdocs actively doing analysis in each experiment through 2012; expect this number to be much smaller in 2015, though data analysis will still be ongoing.

[Charts: D0 and CDF publications per year]

Data Preservation for Tevatron data
- Data will be stored and migrated to new tape technologies for ~10 years; eventually 16 PB of data will seem modest.
- If we want to maintain the ability to reprocess and analyze the data, there is a lot of work to be done to keep the entire environment viable: code, access to databases, libraries, I/O routines, operating systems, documentation, and so on.
- If there is a goal to provide open data that scientists outside of CDF and D0 could use, there is even more work to do.
- The 4th Data Preservation Workshop was held at Fermilab in May; this is not just a Tevatron issue.

Intensity Frontier program needs
Many experiments are in many different phases of development and operations: MINOS, MiniBooNE, SciBooNE, MINERvA, NOvA, MicroBooNE, ArgoNeuT, Mu2e, g-2, LBNE, and Project X era experiments.

[Chart: projected CPU (cores) and disk (TB) needs, approaching 1 PB]

Intensity Frontier strategies
- The NuComp forum encourages planning and common approaches where possible.
- A shared analysis facility lets us quickly and flexibly allocate computing to experiments.
- Continue to work to grid-enable the simulation and processing software; good success with MINOS, MINERvA, and Mu2e.
- All experiments use shared storage services for data and local disk, so we can allocate resources when needed.
- Hired two associate scientists in the past year and reassigned another scientist.

Budget/resource allocation for 2012 and beyond
- There is always upward pressure for computing: more disk and more CPU lead to faster results and greater flexibility, and more help with software and operations is always requested.
- Within a fixed budget, each experiment can usually optimize among tape drives, tapes, disk, CPU, and servers, assuming basic shared services are provided.
- With so many experiments in so many different stages, we intend to convene a Scientific Computing Portfolio Management Team to examine the needs and computing models of the different Fermilab-based experiments and help allocate the finite dollars to optimize scientific output.

Cosmic Frontier experiments
- Continue to curate data for SDSS.
- Support data and processing for Auger, CDMS, and COUPP.
- Maintain an archive copy of the DES data and provide modest analysis facilities for Fermilab DES scientists. Data management is an NCSA (NSF) responsibility; we have the capability to provide computing should this become necessary. DES uses Open Science Grid resources opportunistically.
- Future initiatives are still in the planning stages.

[Images: SDSS and DES]

DES Analysis Computing at Fermilab
Fermilab plans to host a copy of the DES Science Archive. This consists of two pieces:
- a copy of the Science database
- a copy of the relevant image data on disk and tape

This copy serves a number of different roles:
- acts as a backup for the primary NCSA archive, enabling collaboration access to the data when the primary is unavailable
- handles queries by the collaboration, thus supplementing the resources at NCSA
- enables Fermilab scientists to effectively exploit the DES data for science analysis

To support the science analysis of the Fermilab scientists, DES will need a modest amount of computing (of order 24 nodes). This is similar to what was supported for the SDSS project.

LSST
- Fermilab recently joined LSST.
- Fermilab expertise in data management, software frameworks, and overall computing, gained from SDSS and from the entire program, means we could contribute effectively.
- Currently negotiating small roles in data acquisition and science analysis, where they touch data management.

Software in Collaboration

Software tools and frameworks: our strategy
- Develop and maintain core expertise and tools, aiming to support the entire lifecycle of scientific programs.
- Focus on areas of general applicability with long-term support requirements.
- Work in partnership with individual programs to create scientific applications.
- Participate in projects and collaborations that aim to develop scientific computational infrastructure.
- Provide support for concept development to scientific programs in the pre-project phase, enabled by core expertise and tools.
- Reuse expertise and best-of-class tools from partnerships with individual projects and make them available to other projects.

Framework applications
- Success: a specific application (Run II) leads to a community tool and continuing requests for framework applications from new projects.
- Success: high-quality implementations (most recently, the CMS framework).

[Diagram: framework applications spanning the Run II offline infrastructure, a common framework, and LQCD software, with users including LAr experiments, NOvA, CMS, Mu2e, and MiniBooNE]

The CMS framework is in excellent shape and well validated (CMS offline coordinators, December 2010).
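To make the idea of a "framework application" concrete, the sketch below shows the shape of an analyzer module in the style of the CMS framework: the framework owns the event loop, I/O, and configuration, and experiment code plugs in as modules that are handed one event at a time. The module name and its printout are illustrative only, not actual CMS code.

// DemoAnalyzer.cc -- sketch of a framework plug-in module (CMSSW style).
#include "FWCore/Framework/interface/EDAnalyzer.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include <iostream>

class DemoAnalyzer : public edm::EDAnalyzer {
public:
  explicit DemoAnalyzer(const edm::ParameterSet&) {}

  // Called by the framework once per event; the module never touches
  // files, the event loop, or scheduling directly.
  void analyze(const edm::Event& event, const edm::EventSetup&) override {
    std::cout << "run " << event.id().run()
              << " event " << event.id().event() << std::endl;
  }
};

// Register the module so the framework can construct it from a configuration.
DEFINE_FWK_MODULE(DemoAnalyzer);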

Detector Simulation
- GEANT activity: members of the Geant4 collaboration since 2007, developing toolkit capabilities, with work in critical areas defined by Geant4 external reviews.
- Simulation development and support activity: provide expertise and support to Fermilab projects and users, with applications in high-priority areas for the Fermilab program; shifting from an LHC/CMS main focus to the Intensity Frontier.
- Toolkit evolution, in collaboration with other institutions (SLAC, CERN, and others): optimize performance of the existing toolkit, enhance capabilities, and improve infrastructure.
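For context on what the Geant4 toolkit provides: an application supplies a geometry, a physics list, and a primary-particle generator, and the toolkit transports the particles through the geometry. The following generic sketch (an arbitrary liquid-argon box and a 1 GeV muon gun, not any Fermilab or CMS geometry) shows that structure.

// minimal_g4.cc -- generic Geant4 sketch: one world volume, one particle gun.
#include "G4RunManager.hh"
#include "G4VUserDetectorConstruction.hh"
#include "G4VUserPrimaryGeneratorAction.hh"
#include "G4NistManager.hh"
#include "G4Box.hh"
#include "G4LogicalVolume.hh"
#include "G4PVPlacement.hh"
#include "G4ParticleGun.hh"
#include "G4ParticleTable.hh"
#include "G4Event.hh"
#include "G4SystemOfUnits.hh"
#include "FTFP_BERT.hh"   // a reference physics list shipped with Geant4

// Geometry: a 2 m cube of liquid argon (an arbitrary example material).
class SimpleDetector : public G4VUserDetectorConstruction {
public:
  G4VPhysicalVolume* Construct() override {
    auto* lar   = G4NistManager::Instance()->FindOrBuildMaterial("G4_lAr");
    auto* box   = new G4Box("World", 1.0 * m, 1.0 * m, 1.0 * m);
    auto* logic = new G4LogicalVolume(box, lar, "World");
    return new G4PVPlacement(nullptr, G4ThreeVector(), logic, "World",
                             nullptr, false, 0);
  }
};

// Primaries: 1 GeV muons fired along +z.
class SimpleGun : public G4VUserPrimaryGeneratorAction {
public:
  SimpleGun() : fGun(new G4ParticleGun(1)) {
    fGun->SetParticleDefinition(
        G4ParticleTable::GetParticleTable()->FindParticle("mu-"));
    fGun->SetParticleEnergy(1.0 * GeV);
    fGun->SetParticleMomentumDirection(G4ThreeVector(0., 0., 1.));
  }
  void GeneratePrimaries(G4Event* event) override {
    fGun->GeneratePrimaryVertex(event);
  }
private:
  G4ParticleGun* fGun;
};

int main() {
  G4RunManager runManager;
  runManager.SetUserInitialization(new SimpleDetector);  // geometry
  runManager.SetUserInitialization(new FTFP_BERT);       // physics
  runManager.SetUserAction(new SimpleGun);               // primaries
  runManager.Initialize();
  runManager.BeamOn(10);                                 // simulate 10 events
  return 0;
}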

Analysis suites for the community: ROOT
- ROOT is the standard HEP analysis toolkit, used for Run II, the LHC, and the Intensity Frontier.
- Fermilab is a founding member of the ROOT project.
- We support deployment and operation of ROOT applications by Fermilab users and projects.
- Development emphasis, in collaboration with CERN, is on optimizing I/O (essential for the LHC) and thread safety (driven by technology evolution and LHC needs).
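For readers less familiar with ROOT, a short macro shows the flavor of the toolkit: histograms and a self-describing file format handled by the same I/O layer whose optimization is mentioned above. The file and histogram names here are arbitrary examples.

// make_hist.C -- minimal ROOT macro: fill a histogram and write it to a file.
#include "TFile.h"
#include "TH1D.h"
#include "TRandom3.h"

void make_hist()
{
  TFile out("demo.root", "RECREATE");            // arbitrary output file name
  TH1D h("mass", "Toy mass distribution;m [GeV];events", 100, 0.0, 10.0);

  TRandom3 rng(4357);
  for (int i = 0; i < 100000; ++i) {
    h.Fill(rng.Gaus(5.0, 1.0));                  // toy data, not physics
  }

  h.Write();                                     // ROOT I/O: object -> file
  out.Close();
}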

Software collaborative efforts
- ComPASS accelerator modeling tools project
- Lattice QCD project and the USQCD Collaboration
- Open Science Grid: many aspects and some sub-projects, such as grid security and workload management
- Grid and data management tools
- Advanced wide-area network projects
- dCache collaboration
- Enstore collaboration
- Scientific Linux (with CERN)
- GEANT4 core development and validation (with the Geant4 collaboration)
- ROOT development and support (with CERN)
- Cosmological computing
- Data preservation initiative (global HEP)

Sharing Strategies

Why sharing strategies are needed
- Cost
- Coherent technical approaches and architectures
- Support over the entire lifecycle of an experiment or project

Experiment/project lifecycle and funding
- Early period (R&D, simulations, LOIs, proposals): shared services
- Mature phase (construction, operations, analysis): shared services plus experiment- or project-specific services
- Final data taking and beyond (final analysis, data preservation and access): shared services plus project-specific services

Sharing via the grid: FermiGrid
[Diagram: the FermiGrid site gateway connects user login and job submission to the shared clusters (grid farm, 3,284 slots; CMS, 7,485 slots; CDF, 5,600 slots; D0, 6,916 slots) and to FermiGrid monitoring/accounting, infrastructure, and authentication/authorization services, with links to TeraGrid, WLCG, and NDGF]

Open Science Grid
The Open Science Grid (OSG) advances science through open distributed computing. The OSG is a multi-disciplinary partnership to federate local, regional, community, and national cyberinfrastructures to meet the needs of research and academic communities at all scales.

A total of 95 sites; a million jobs a day; 1 million CPU hours/day; 1 million files transferred/day. It is cost effective, it promotes collaboration, and it is working!

The US contribution to, and partnership with, the LHC Computing Grid is provided through OSG for CMS and ATLAS.

[Chart: FNAL CPU core count for science]

Data Storage at Fermilab: Tape
[Charts: data on tape, total and by experiment]

FermiCloud: virtualization is likely a key component for long-term analysis
- The FermiCloud project is a private cloud facility built to provide a production facility for cloud services.
- It is a private cloud: on-site access only, for registered Fermilab users.
- It can be evolved into a hybrid cloud with connections to Magellan, Amazon, or other cloud providers in the future.
- Much of our data-intensive computing cannot use commercial cloud computing; it is not cost effective today for permanent use, only for overflow or unexpected simulation needs.

Computing for Theory and Simulation Science

High-performance (parallel) computing is needed for:
- lattice gauge theory calculations (LQCD)
- accelerator modeling tools and simulations
- computational cosmology


[Graphic: dark energy and matter, cosmic gas, and galaxies; simulations connect fundamentals with observables]

Strategies for Simulation Science Computing
- Lattice QCD is the poster child:
  - a coherent, inclusive USQCD Collaboration, led by Paul MacKenzie (Fermilab), which allocates HPC resources
  - the LQCD Computing Project (HEP and NP funding), with Bill Boroski (Fermilab) as Project Manager
  - a SciDAC-II project to develop the software infrastructure
- Accelerator modeling:
  - the multi-institutional tools project ComPASS, with Panagiotis Spentzouris (Fermilab) as PI
  - accelerator-project-specific modeling efforts as well
- Computational cosmology:
  - the Computational Cosmology Collaboration (C3) for mid-range computing for astrophysics and cosmology
  - a Fermilab, ANL, and University of Chicago task force to develop strategy

Core Computing & Infrastructure

Core Computing: a strong base
Scientific computing relies on core computing services and computing facility infrastructure:
- core networking and network services
- computer rooms, power, and cooling
- email, videoconferencing, and web servers
- document databases, Indico, and calendaring
- service desk
- monitoring and alerts
- logistics
- desktop support (Windows and Mac)
- printer support
- computer security
- ... and more
All of the above is provided through overheads.

Computer Rooms
The computer rooms are the home of all the scientific computing hardware; they provide power, space, and cooling for all the systems. The Computing Division's computer rooms are a critical component of the successful delivery of scientific computing.

Fermilab Computing Facilities
- Feynman Computing Center (FCC)
- Grid Computing Center (GCC)
- Lattice Computing Center (LCC)

Lattice Computing Center (LCC)
- High-performance computing (HPC): accelerator simulation and cosmology nodes
- No UPS

Feynman Computing Center (FCC)
- High-availability services, e.g. core network and email
- Tape robotic storage (three 10,000-slot libraries)
- UPS and standby power generation
- ARRA project to upgrade cooling and add a high-availability computing room: completed

Grid Computing Center (GCC)
- High-density computational computing: CMS, Run II, and grid farm batch worker nodes, plus lattice HPC nodes
- Tape robotic storage (four 10,000-slot libraries)
- UPS and taps for portable generators

EPA Energy Star award, 2010

Facilities: more than just space, power, and cooling; continuous planning

ARRA-funded new high-availability computer room in the Feynman Computing Center

Many CMS disks are now housed here.

Reliable high-speed networking is key

Conclusion
- We have a coherent and evolving scientific computing program that emphasizes sharing of resources, re-use of code and tools, and requirements planning.
- Embedded scientists with deep involvement are also a key strategy for success.
- Fermilab takes on leadership roles in computing in many areas.
- We support projects and experiments at all stages of their lifecycle, but if we want to truly preserve access to Tevatron data long term, much more work is needed.