DATA GRIDS for Science and Engineering


Transcript of DATA GRIDS for Science and Engineering

Page 1: DATA GRIDS for  Science and Engineering

DATA GRIDS for Science and Engineering

Worldwide Analysis at Regional Centers

Harvey B. Newman
Professor of Physics, Caltech

Islamabad, August 21, 2000

Page 2: DATA GRIDS for  Science and Engineering

LHC Vision: Data Grid Hierarchy

[Diagram: the Data Grid hierarchy. The Experiment's Online System (one bunch crossing every 25 nsec; ~100 triggers per second; each event ~1 MByte in size) emits ~PByte/sec of raw data and writes ~100 MBytes/sec to the Tier 0+1 Offline Farm at the CERN Computer Ctr (> 20 TIPS). Tier 1 regional centers (France Centre, FNAL Center, Italy Center, UK Center) link to CERN at ~2.5 Gbits/sec; Tier 2 centers connect at ~0.6-2.5 Gbits/sec; Tier 3 institutes (~0.25 TIPS each) at ~622 Mbits/sec; and Tier 4 workstations at 100-1000 Mbits/sec. Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels, backed by a local physics data cache.]
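The slide's rates are easy to cross-check: the trigger rate times the event size gives the ~100 MBytes/sec recording rate, and the quoted link speeds then set transfer times down the hierarchy. A quick back-of-the-envelope check in Python (the ~10^7 live seconds per year is a conventional HEP assumption, not a figure from the slide):

```python
# Back-of-the-envelope rates from the slide's figures.
TRIGGER_RATE_HZ = 100          # ~100 triggers per second
EVENT_SIZE_BYTES = 1e6         # each event is ~1 MByte

# Recording rate out of the online system:
rate_bytes_per_sec = TRIGGER_RATE_HZ * EVENT_SIZE_BYTES
print(f"Recording rate: {rate_bytes_per_sec / 1e6:.0f} MBytes/sec")   # ~100 MB/s

# A year of running (~1e7 live seconds is the usual HEP assumption):
LIVE_SECONDS_PER_YEAR = 1e7
yearly_bytes = rate_bytes_per_sec * LIVE_SECONDS_PER_YEAR
print(f"Raw data per year: {yearly_bytes / 1e15:.0f} PByte")          # ~1 PB/year

# Time to ship 1 TByte to a Tier 1 center over a ~2.5 Gbit/sec link:
LINK_BITS_PER_SEC = 2.5e9
hours = (1e12 * 8) / LINK_BITS_PER_SEC / 3600
print(f"1 TByte over 2.5 Gbits/sec: ~{hours:.1f} hours")              # ~0.9 hours
```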

Page 3: DATA GRIDS for  Science and Engineering

Grids: Next Generation Web

On-demand creation of powerful virtual computing and data systems

Web: Uniform access to HTML documents

Grid: Flexible, high-performance access to all significant resources

[Diagram: the Web links a user (http://) to HTML documents; the Grid links colleagues to data stores, computers, software catalogs, and sensor nets.]

Page 4: DATA GRIDS for  Science and Engineering

Roles of Projects for HENP Distributed Analysis

RD45, GIOD: Networked object databases

Clipper/GC, FNAL/SAM: High-speed access, processing and analysis of files and object data

SLAC/OOFS: Distributed file system + Objectivity interface

NILE, Condor: Fault-tolerant distributed computing

MONARC: LHC computing models — architecture, simulation, strategy

PPDG: First distributed data services and Data Grid system prototype

ALDAP: OO database structures & access methods for astrophysics and HENP data

GriPhyN, EU Data Grid: Production-scale Data Grids

Page 5: DATA GRIDS for  Science and Engineering

Grid Services Architecture [*]

Grid Fabric: Data stores, networks, computers, display devices, ...; associated local services

Grid Services: Protocols, authentication, policy, resource discovery & management, instrumentation, ...

Appln Toolkits: Remote viz toolkit, remote comp. toolkit, remote data toolkit, remote sensors toolkit, remote collab. toolkit, ...

Applns: A rich set of HEP data-analysis related applications

[*] Adapted from Ian Foster: there are computing grids, access (collaborative) grids, data grids, ...

Page 6: DATA GRIDS for  Science and Engineering

The Grid Middleware Services Concept

Standard services that:

Provide uniform, high-level access to a wide range of resources (including networks)

Address interdomain issues: security, policy

Permit application-level management and monitoring of end-to-end performance

Are broadly deployed, like Internet protocols

Enable application-specific tools as well as applications themselves
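As an illustration of the concept, here is a minimal sketch of what such a uniform, high-level service interface might look like; every name in it is hypothetical, not part of Globus or any other real middleware. The point is that authentication, discovery and monitoring happen behind one interface, whatever the resource is:

```python
# A minimal sketch of the middleware idea: one uniform interface in front of
# heterogeneous resources. All names here are hypothetical illustrations.
from abc import ABC, abstractmethod

class GridResource(ABC):
    """Uniform, high-level view of a compute or storage resource."""

    @abstractmethod
    def authenticate(self, credential: str) -> bool:
        """Interdomain security: every site checks the same credential."""

    @abstractmethod
    def status(self) -> dict:
        """Instrumentation hook for end-to-end performance monitoring."""

class ComputeSite(GridResource):
    def __init__(self, name: str, free_cpus: int):
        self.name, self.free_cpus = name, free_cpus

    def authenticate(self, credential: str) -> bool:
        return credential.startswith("x509:")   # stand-in for a real certificate check

    def status(self) -> dict:
        return {"site": self.name, "free_cpus": self.free_cpus}

def discover(resources, credential):
    """Resource discovery: return sites the user may use, with live status."""
    return [r.status() for r in resources if r.authenticate(credential)]

print(discover([ComputeSite("caltech", 64), ComputeSite("cern", 512)], "x509:alice"))
```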

Page 7: DATA GRIDS for  Science and Engineering

Application Example: Condor Numerical Optimization

Exact solution of the "nug30" quadratic assignment problem on June 16, 2000:
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

Used the "MW" framework, which maps a branch-and-bound problem onto a master-worker structure

Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors), using parallel computers, workstations, and clusters

MetaNEOS: Argonne, Northwestern, Wisconsin
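The MW pattern is easy to state in code. Below is a minimal, self-contained sketch of master-worker branch and bound on a toy 0/1 knapsack problem (nug30 itself is a quadratic assignment problem far too large for this). The master owns a pool of subproblems and the incumbent best solution; each "worker" call bounds and branches one subproblem, which in the real system was farmed out to ~1000 Condor processors:

```python
# A minimal sketch of the master-worker branch-and-bound pattern that the MW
# framework implements on top of Condor, shown on a toy 0/1 knapsack.
from collections import deque

values  = [60, 100, 120, 75, 40]
weights = [10, 20, 30, 15, 5]
CAP = 50

def upper_bound(i, value, room):
    """Fractional-relaxation bound on the best completion of a partial solution."""
    for v, w in sorted(zip(values[i:], weights[i:]), key=lambda t: -t[0] / t[1]):
        if w <= room:
            value, room = value + v, room - w
        else:
            return value + v * room / w
    return value

def worker(task, incumbent):
    """Branch on one item; return child tasks and any improved solution."""
    i, value, room = task
    if i == len(values):
        return [], max(incumbent, value)        # leaf: a complete solution
    children = []
    if weights[i] <= room:                      # branch: take item i
        children.append((i + 1, value + values[i], room - weights[i]))
    children.append((i + 1, value, room))       # branch: skip item i
    return children, incumbent

# The master: a task pool plus the best ("incumbent") solution so far.
pool, incumbent = deque([(0, 0, CAP)]), 0
while pool:
    task = pool.popleft()                       # in MW this is shipped to a worker
    if upper_bound(*task) <= incumbent:
        continue                                # prune: cannot beat the incumbent
    children, result = worker(task, incumbent)
    incumbent = max(incumbent, result)
    pool.extend(children)

print("optimal value:", incumbent)              # 275 for this toy instance
```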

Page 8: DATA GRIDS for  Science and Engineering

Emerging Data Grid User Communities

NSF Network for Earthquake Engineering Simulation Grid (NEES): integrated instrumentation, collaboration, simulation

Grid Physics Network (GriPhyN): ATLAS, CMS, LIGO, SDSS

Particle Physics Data Grid (PPDG)

EU Data Grid

Access Grid; VRVS: supporting group-based collaboration

And:

The Human Genome Project

The Earth System Grid and EOSDIS

Federating Brain Data

Computed Microtomography

The Virtual Observatory (US + Int'l)

Page 9: DATA GRIDS for  Science and Engineering

The Particle Physics Data Grid (PPDG)

First Round Goal: Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to ~1 Petabyte

Matchmaking, Resource Co-Scheduling: SRB, Condor, HRM, Globus

ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS

[Diagram: Site-to-Site Data Replication Service — a PRIMARY SITE (data acquisition, CPU, disk, tape robot) feeds a REGIONAL SITE (CPU, disk, tape robot) at 100 Mbytes/sec.]

[Diagram: Multi-Site Cached File Access Service — a PRIMARY SITE (DAQ, tape, CPU, disk, robot) and regional sites (tape, CPU, disk, robot) serve several universities (CPU, disk, users).]
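A minimal sketch of the "optimized cached read access" idea (all names hypothetical; the actual service composed SRB, HRM and Globus components): a university node serves a file from its local disk cache when it can, and otherwise stages it in from whichever site holds a replica:

```python
# A minimal sketch of multi-site cached file access. All names are
# hypothetical illustrations, not the actual SRB/HRM/Globus interfaces.
import shutil
from pathlib import Path

# Which sites hold a replica of each (logical) file.
replica_catalog = {
    "run42/events.db": ["/primary/run42/events.db", "/regional/run42/events.db"],
}

CACHE = Path("/tmp/ppdg_cache")

def cached_open(logical_name: str) -> Path:
    """Return a local path for a logical file, staging it in on a cache miss."""
    local = CACHE / logical_name
    if local.exists():
        return local                         # cache hit: serve from local disk
    for source in replica_catalog.get(logical_name, []):
        if Path(source).exists():            # first reachable replica wins; a real
            local.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(source, local)       # service would use GridFTP and pick
            return local                     # replicas by network cost
    raise FileNotFoundError(logical_name)
```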

Page 10: DATA GRIDS for  Science and Engineering

GriPhyN: PetaScale Virtual Data Grids

Build the Foundation for Petascale Virtual Data Grids

[Diagram: GriPhyN architecture. Individual investigators, workgroups and production teams use interactive user tools on top of three toolsets — virtual data tools, request planning & scheduling tools, and request execution & management tools. These rest on resource management services, security and policy services, and other Grid services, which mediate access to distributed resources (code, storage, computers, and network), transforms, and the raw data source.]

Page 11: DATA GRIDS for  Science and Engineering

EU-Grid Project Work Packages

WP    Work Package title                        Lead contractor
WP1   Grid Workload Management                  INFN
WP2   Grid Data Management                      CERN
WP3   Grid Monitoring Services                  PPARC
WP4   Fabric Management                         CERN
WP5   Mass Storage Management                   PPARC
WP6   Integration Testbed                       CNRS
WP7   Network Services                          CNRS
WP8   High Energy Physics Applications          CERN
WP9   Earth Observation Science Applications    ESA
WP10  Biology Science Applications              INFN
WP11  Dissemination and Exploitation            INFN
WP12  Project Management                        CERN

Page 12: DATA GRIDS for  Science and Engineering

Grid Tools for CMS "HLT" Production: A. Samar (Caltech)

Distributed Job Execution and Data Handling

Goals: Transparency, Performance, Security, Fault Tolerance, Automation

Jobs are executed locally or remotely; data is always written locally; data is replicated to remote sites

[Diagram: a job is submitted at Site A and writes its data locally; the data is then replicated to Sites B and C.]

A. Samar, M. Hafeez (Caltech) with CERN and FNAL
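That execution model is compact enough to sketch directly (hypothetical names, not the actual Caltech tooling): a job runs wherever it was scheduled and always writes its output locally; replication to the other sites then proceeds asynchronously, and a failed destination must not block the rest — one of the fault-tolerance goals above:

```python
# A minimal sketch of "write locally, replicate to remote sites".
# All names are hypothetical illustrations of the pattern described above.
import shutil
from pathlib import Path

SITES = {"A": Path("/tmp/site_a"), "B": Path("/tmp/site_b"), "C": Path("/tmp/site_c")}

def run_job(site: str, job_id: str) -> Path:
    """Execute a job at `site`; output always lands on that site's local disk."""
    out = SITES[site] / f"{job_id}.out"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"simulated HLT events from job {job_id}\n")
    return out

def replicate(output: Path, origin: str) -> None:
    """Copy the local output to every other site; one failed destination
    must not stop the remaining copies (fault tolerance)."""
    for name, root in SITES.items():
        if name == origin:
            continue
        try:
            root.mkdir(parents=True, exist_ok=True)
            shutil.copy(output, root / output.name)
        except OSError as exc:
            print(f"replication to {name} failed, will retry later: {exc}")

out = run_job("A", "job001")   # job submitted to Site A, writes locally
replicate(out, "A")            # data then replicated to Sites B and C
```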

Page 13: DATA GRIDS for  Science and Engineering

GRIDs In 2000: Summary

Grids are changing the way we do science and engineering

Key services and concepts have been identified, and development has started

Major IT challenges remain

Opportunities for Collaboration

Transition of services and applications to production use is starting to occur

In the future, more sophisticated integrated services and toolsets could drive advances in many fields of science and engineering

High Energy Physics, facing the need for Petascale Virtual Data, is an early adopter and leading Data Grid developer

Page 14: DATA GRIDS for  Science and Engineering

The GRID BOOK

Book published by Morgan Kaufmann: www.mkp.com/grids

Globus: www.globus.org

Grid Forum: www.gridforum.org

Page 15: DATA GRIDS for  Science and Engineering

French GRID Initiative Partners

Computing centres: IDRIS CNRS High Performance Computing Centre; IN2P3 Computing Centre; CINES, the higher-education intensive computing centre; CRIHAN, the regional computing centre in Rouen

Network departments: UREC CNRS network department; GIP Renater

Computing Science CNRS & INRIA labs: Université Joseph Fourier, ID-IMAG, LAAS, RESAM, LIP and PSMN (Ecole Normale Supérieure de Lyon)

Industry: Société Communication et Systèmes; EDF R&D department

Applications development teams (HEP, Bioinformatics, Earth Observation): IN2P3, CEA, Observatoire de Grenoble, Laboratoire de Biométrie, Institut Pierre Simon Laplace

Page 16: DATA GRIDS for  Science and Engineering

LHC Tier 2 Center In 2001

[Diagram: prototype Tier 2 center network layout — an OC-12 router uplink; Gigabit and Fast Ethernet switches; a data server; RAID disk; DLT tape; and VRVS/MPEG2 videoconferencing over OC-3 links.]

Page 17: DATA GRIDS for  Science and Engineering

ESG Prototype Inter-communication Diagram

[Diagram: an LLNL/PCMDI Request Manager coordinates transfers among sites — GSI-wuftpd servers with disk at LBNL, ISI, ANL and NCAR; a GSI-pftpd server in front of HPSS at SDSC; and disk on the Clipper HPSS at LBNL managed by an HRM. Replicas are located through the ANL Replica Catalog (via LDAP scripts or the LDAP C API) and a GIS with NWS; data moves via GSI-ncftp, with CORBA used for inter-component calls.]
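Since the replica catalog in the diagram is an LDAP directory, locating the copies of a file is an ordinary LDAP search. A sketch using the ldap3 Python library follows; the host, base DN, filter and attribute names are invented placeholders, as the real Globus replica catalog schema is not reproduced here:

```python
# A minimal sketch of querying an LDAP-based replica catalog.
# Host, base DN, filter and attribute names are hypothetical placeholders.
from ldap3 import Server, Connection, ALL

server = Server("ldap://replica-catalog.example.org", get_info=ALL)
conn = Connection(server, auto_bind=True)

# Look up every physical location recorded for one logical file.
conn.search(
    search_base="lc=climate-data,rc=EsgReplicaCatalog",
    search_filter="(&(objectClass=logicalFile)(lfn=ncar/ccm3/run7.nc))",
    attributes=["physicalLocation"],
)

for entry in conn.entries:
    print(entry.physicalLocation)   # e.g. gsiftp:// URLs to try in order
```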

Page 18: DATA GRIDS for  Science and Engineering

GriPhyN Scope

Several scientific disciplines:
US-CMS — High Energy Physics
US-ATLAS — High Energy Physics
LIGO — Gravity wave experiment
SDSS — Sloan Digital Sky Survey

Requesting $70M from NSF to build Grids:
4 Grid implementations, one per experiment
Tier2 hardware, networking, people, R&D
Common problems for different implementations
Partnership with CS professionals, IT, industry
R&D from NSF ITR Program ($12M)

Page 19: DATA GRIDS for  Science and Engineering

Data Grids: Better Global Resource Use and Faster Turnaround

Efficient resource use and improved responsiveness through:

Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system

Resource discovery, prioritization

Data caching, query estimation, co-scheduling, transaction management

Network and site "instrumentation": performance tracking, monitoring, problem trapping and handling

Page 20: DATA GRIDS for  Science and Engineering

Emerging Production Grids

NASA Information Power Grid

NSF National Technology Grid

Page 21: DATA GRIDS for  Science and Engineering

EU HEP Data Grid Project

Page 22: DATA GRIDS for  Science and Engineering

Grid (IT) Issues to be Addressed

Data caching and mirroring strategies

Object Collection Extract/Export/Transport/Import for large or highly distributed data transactions

Query estimators, query monitors (cf. ATLAS/GC work): enable flexible, resilient prioritisation schemes — query redirection, priority alteration, fragmentation, etc.

Pre-emptive and realtime data/resource matchmaking (a sketch follows below)

Resource discovery

Co-scheduling and queueing

State, workflow, & performance-monitoring instrumentation; tracking and forward prediction

Security: authentication (for resource allocation/usage and priority); running an international certificate authority
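Several of these items (matchmaking, query estimation, co-scheduling) reduce to ranking candidate sites by predicted turnaround. A minimal sketch, with made-up numbers: the estimator charges each site its queue wait plus the time to move whatever input data it lacks, and the job goes to the earliest expected finish:

```python
# A minimal sketch of data/resource matchmaking: rank candidate sites by an
# estimate of total turnaround (queue wait + time to move the missing data).
# All numbers and site names are made-up illustrations.

sites = [
    # name,    has_data, queue_wait_s, link_mbit_per_s
    ("CERN",    True,     7200,         2500),
    ("FNAL",    False,     600,          622),
    ("Caltech", False,      60,          155),
]

JOB_INPUT_GBYTES = 50   # data the job must read

def estimated_turnaround(site):
    name, has_data, queue_wait, link_mbps = site
    transfer = 0.0 if has_data else JOB_INPUT_GBYTES * 8e3 / link_mbps  # seconds
    return queue_wait + transfer

# The matchmaker sends the job to the site with the earliest expected finish.
for s in sorted(sites, key=estimated_turnaround):
    print(f"{s[0]:8s} estimated turnaround: {estimated_turnaround(s):7.0f} s")
print("matched to:", min(sites, key=estimated_turnaround)[0])
```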

Page 23: DATA GRIDS for  Science and Engineering

Why Now?

The Internet as infrastructure: increasing bandwidth, advanced services; a need to explore higher throughput

Advances in storage capacity: a Terabyte for ~$40k (or ~$10k)

Increased availability of compute resources: dense (Web) server-clusters, supercomputers, etc.

Advances in application concepts: simulation-based design, advanced scientific instruments, collaborative engineering, ...

Page 24: DATA GRIDS for  Science and Engineering

PPDG Work at Caltech and SLAC

Work on the NTON connections between Caltech and SLAC:

Test with 8 OC3 adapters on the Caltech Exemplar, multiplexed across to a SLAC Cisco GSR router; limited throughput due to the small MTU in the GSR

A Dell dual Pentium III based server with two OC12 (622 Mbps) ATM cards, configured to allow aggregate transfer of more than 100 Mbytes/sec in both directions between Caltech and SLAC; so far reached 40 Mbytes/sec on one OC12 (a quick arithmetic check follows below)

Monitoring tools installed at Caltech/CACR:

PingER installed to monitor WAN HEP connectivity

A Surveyor device will be installed soon, for very precise measurement of network traffic speeds

Investigations into a distributed resource management architecture that co-manages processors and data
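A quick arithmetic check of the figures above: one OC12 carries 622 Mbits/sec, so two cards offer just under 156 MBytes/sec of raw capacity, which is why an aggregate above 100 MBytes/sec is plausible and why 40 MBytes/sec on one card is about half of that card's ceiling:

```python
# Sanity-check the OC12 figures quoted above.
OC12_MBITS = 622                      # one OC12 ATM link, Mbits/sec

per_card_mbytes = OC12_MBITS / 8      # ~77.75 MBytes/sec raw per card
two_cards = 2 * per_card_mbytes       # ~155.5 MBytes/sec aggregate ceiling

print(f"one OC12:  {per_card_mbytes:.1f} MBytes/sec")
print(f"two OC12s: {two_cards:.1f} MBytes/sec (so >100 MBytes/sec is feasible)")
print(f"40 MBytes/sec achieved = {40 / per_card_mbytes:.0%} of one card's ceiling")
```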

Page 25: DATA GRIDS for  Science and Engineering

Participants

Main partners: CERN, INFN (I), CNRS (F), PPARC (UK), NIKHEF (NL), ESA-Earth Observation

Other sciences: Earth Observation, Biology, Medicine

Industrial participation: CS SI (F), DataMat (I), IBM (UK)

Associated partners: Czech Republic, Finland, Germany, Hungary, Spain, Sweden (mostly computer scientists)

Work with US: underway; formal collaboration being established

Industry and Research Project Forum with representatives from: Denmark, Greece, Israel, Japan, Norway, Poland, Portugal, Russia, Switzerland

Page 26: DATA GRIDS for  Science and Engineering

GriPhyN: First Production Scale "Grid Physics Network"

Develop a New Form of Integrated Distributed System, while Meeting Primary Goals of the LIGO, SDSS and LHC Scientific Programs

Focus on Tier2 Centers at Universities, in a Unified Hierarchical Grid of Five Levels

18 Centers, with Four Sub-Implementations: 5 each in the US for LIGO, CMS, ATLAS; 3 for SDSS

Near-term focus on LIGO and SDSS handling of real data; LHC "Data Challenges" with simulated data

Cooperation with PPDG, MONARC and the EU Grid Project

http://www.phys.ufl.edu/~avery/GriPhyN/

Page 27: DATA GRIDS for  Science and Engineering

GriPhyN: Petascale Virtual Data Grids

An effective collaboration between Physicists, Astronomers, and Computer Scientists

Virtual Data: a hierarchy of compact data forms, user collections and remote data transformations is essential

Even with future Gbps networks:

Coordination among multiple sites is required

Coherent strategies are needed for data location, transport, caching and replication, structuring, and resource co-scheduling for efficient access
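The virtual-data idea can be sketched in a few lines (hypothetical names, not the eventual GriPhyN tools): a derived product is defined by the transformation that produces it, and a request is served from a catalog of materialized copies when possible, or by re-running the transformation on the raw data otherwise:

```python
# A minimal sketch of "virtual data": derived products are defined by their
# transformation, and are materialized only on demand. Names are hypothetical.

raw_events = list(range(1_000_000))          # stand-in for a raw data store

transformations = {
    # derived product name -> function that derives it from raw data
    "high_pt_sample": lambda raw: [e for e in raw if e % 997 == 0],
    "compact_summary": lambda raw: {"n": len(raw), "checksum": sum(raw) % 2**32},
}

materialized = {}                            # catalog of already-computed products

def request(product: str):
    """Serve a derived product: from the catalog if materialized, else by
    (re)running its transformation -- the essence of a virtual data grid."""
    if product not in materialized:
        print(f"cache miss: deriving {product!r} from raw data")
        materialized[product] = transformations[product](raw_events)
    return materialized[product]

request("compact_summary")   # computed on first request...
request("compact_summary")   # ...served from the catalog thereafter
```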

Page 28: DATA GRIDS for  Science and Engineering

Sloan Digital Sky Survey Data Grid

Three main functions:

Raw data processing on a Grid (FNAL): rapid turnaround with TBs of data; accessible storage of all image data

Fast science analysis environment (JHU): combined data access + analysis of calibrated data; distributed I/O layer and processing layer, shared by the whole collaboration

Public data access: SDSS data browsing for astronomers and students; a complex query engine for the public