Introduction to GRID Computing

Introduction to GRID Computing Bebo White [email protected] New Directions in Information Technology Series Contra Costa College Fall 2005

Transcript of Introduction to GRID Computing

  • Introduction to GRID Computing

    Bebo White [email protected]

    New Directions in Information Technology Series

    Contra Costa College

    Fall 2005

    DataGrid is a project funded by the European Union

  • Today's Goals
    • To provide an introduction to key Grid computing and Web services issues, techniques, and technologies
    • To provide a substantial background and vocabulary to support future studies in Grid computing and Web services
    • To describe some of the current applications of Grid computing
    • To describe some of the current Grid computing initiatives

  • Grid Hype

  • The Power Grid: On-Demand Access to Electricity
    • Decouple production & consumption, enabling
      • On-demand access
      • Economies of scale
      • Consumer flexibility
      • New devices
    • [Figure axes: time; quality, economies of scale]

  • The Shape of Grids to Come?

  • A Grid Checklist (#1)
    • A system that coordinates resources that are not subject to centralized control
    • Integrates and coordinates resources and users that live within different control domains (for example, the user's desktop vs. central computing; different administrative units of the same company; or different companies) and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings.
    • Otherwise we are dealing with a local management system
    • (Ian Foster)

  • A Grid Checklist (#2)
    • A system that uses standard, open, general-purpose protocols and interfaces
    • Is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access.
    • It is important that these protocols and interfaces be standard and open.
    • Otherwise, we are dealing with an application-specific system.
    • (Ian Foster)

  • A Grid Checklist (#3)
    • A system that delivers nontrivial qualities of service.
    • Allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating, for example, to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than the sum of its parts.
    • (Ian Foster)

  • What is Grid Computing?
    • Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations [I. Foster]
    • A VO (virtual organization) is a collection of users sharing similar needs and requirements in their access to processing, data and distributed resources and pursuing similar goals.
    • Key concept: ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose [I. Foster]

  • The Grid Problem
    • Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
      • From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations"
    • Enable communities (virtual organizations) to share geographically distributed resources as they pursue common goals, assuming the absence of
      • central location,
      • central control,
      • omniscience,
      • existing trust relationships.

  • Elements of the Problem
    • Resource sharing
      • Computers, storage, sensors, networks, ...
      • Sharing always conditional: issues of trust, policy, negotiation, payment, ...
    • Coordinated problem solving
      • Beyond client-server: distributed data analysis, computation, collaboration, ...
    • Dynamic, multi-institutional virtual orgs
      • Community overlays on classic org structures
      • Large or small, static or dynamic

  • The Grid Information Problem
    • There is a need for different views of the information depending upon
      • VO membership
      • Security constraints
      • Intended purpose
      • Etc.

  • Why Grids?
    • Scale of the problems/applications
      • Solving problems that are bigger than any one data center can hold
    • Size of user communities
      • Leading research in many different fields today requires collaborations that span research centers and countries (i.e. multi-domain access to distributed resources)
    • Need to provide access to large data processing power and huge data storage

  • What Kinds of Applications?
    • Computation intensive
      • Interactive simulation (climate modeling)
      • Large-scale simulation (galaxy formation, gravity waves, battlefield simulation)
      • Engineering (parameter studies, linked models)
    • Data intensive
      • Experimental data analysis (high energy physics)
      • Image, sensor analysis (astronomy, climate)
    • Distributed collaboration
      • Online instruments (microscopes, x-ray devices)
      • Remote visualization (climate studies, biology)
      • Engineering (structural testing, chemical)

  • Online Access to Scientific Instruments
    • DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
    • [Figure labels: Advanced Photon Source; real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls]

  • Mathematicians Solve NUG30
    • Looking for the solution to the NUG30 quadratic assignment problem
    • The problem involves assigning 30 facilities to 30 fixed locations so as to minimize the total cost of transferring material between the facilities.
    • An informal collaboration of mathematicians and computer scientists
    • Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
    • 14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23

    MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin

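The objective described on the NUG30 slide can be written as a cost function over a permutation of locations. The following is a minimal sketch on a made-up 3-facility instance: the `FLOW` and `DIST` matrices and the function names are illustrative assumptions, not NUG30 data. Brute force is only feasible for tiny sizes; with 30 facilities there are 30! (about 2.65e32) assignments, which is why the real problem needed a smarter search and a Grid's worth of CPU time.

```python
from itertools import permutations

def qap_cost(assignment, flow, dist):
    """Total transfer cost when facility i is placed at location assignment[i]."""
    n = len(assignment)
    return sum(flow[i][j] * dist[assignment[i]][assignment[j]]
               for i in range(n) for j in range(n))

def brute_force_qap(flow, dist):
    """Try every assignment; only practical for very small instances."""
    n = len(flow)
    return min(permutations(range(n)), key=lambda p: qap_cost(p, flow, dist))

# Hypothetical toy instance (symmetric flows and distances).
FLOW = [[0, 5, 2],
        [5, 0, 3],
        [2, 3, 0]]
DIST = [[0, 1, 4],
        [1, 0, 2],
        [4, 2, 0]]

best = brute_force_qap(FLOW, DIST)
print(best, qap_cost(best, FLOW, DIST))
```

In this toy instance the heaviest flow ends up on the shortest distance, which is the intuition behind the objective.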

  • Home Computers Evaluate AIDS Drugs
    • Community =
      • 1000s of home computer users
      • Philanthropic computing vendor (Entropia)
      • Research group (Scripps)
    • Common goal = advance AIDS research

  • Network for Earthquake Engineering Simulation
    • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
    • On-demand access to experiments, data streams, computing, archives, collaboration
    • NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

  • The LHC Detectors
    • CMS, ATLAS, LHCb
    • ~6-8 PetaBytes / year
    • ~10^8 events/year
    • ~10^3 batch and interactive users
    • High Energy Physics
    • (Federico Carminati, EU review presentation)

  • Data Grids for High Energy Physics
    • Image courtesy Harvey Newman, Caltech

  • Solving Large Problems Pre-Grid
    • Once upon a time...
      • Mainframe
      • Mini Computer
      • Microcomputer
      • Cluster
    • (by Christophe Jacquet)

  • The Grid Distributed Computing Idea
    • ...and today
    • (by Christophe Jacquet)

  • Differences Between Grids and Distributed Applications
    • Huge distributed applications already exist, but they tend to be specialized systems intended for a single purpose or user group, e.g., SETI@Home, FightAIDS@Home
    • Grids go further and take into account:
      • Different kinds of resources
        • Not always the same hardware, data and applications
        • No parallelization required
      • Different kinds of interactions
        • User groups or applications want to interact with Grids in different ways
      • Dynamic nature
        • Resources and users added/removed/changed frequently

  • The Grid Vision

  • Broader Context
    • Grid Computing has much in common with major industrial thrusts
      • Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing
    • Sharing issues not adequately addressed by existing technologies
      • Complicated requirements: run program X at site Y subject to community policy P, providing access to data at Z according to policy Q
      • High performance: unique demands of advanced and high-performance systems

  • Grid Types - Physical
    • Cluster Grid
    • Enterprise Grid
    • Global Grid

  • Grid Types - Logical
    • Data Grid: responds to requests for computers and data stores; similar to (but more secure and auditable than) today's research grids
    • Information Grid: responds to requests for computational processes that may require several data sources and processing stages to deliver a desired result
    • Knowledge Grid: responds to high-level questions and finds the appropriate processes to deliver answers in the required form

  • The Classical (early) Grid
    • Focused on applications where data was stored in files
      • Little support for transactions, relational database access or distributed query processing
    • Exploits a range of protocols such as:
      • LDAP for directory services and file store queries
      • GridFTP for large-scale reliable data transfer
      • SSL for security

  • Why Now?
    • Moore's law improvements in computing produce highly functional end systems
    • The Internet and burgeoning wired and wireless networks provide universal connectivity
    • Changing modes of working and problem solving emphasize teamwork, computation
    • Network exponentials produce dramatic changes in geometry and geography

  • Network Exponentials
    • Network vs. computer performance
      • Computer speed doubles every 18 months
      • Network speed doubles every 9 months
      • Difference = order of magnitude per 5 years
    • 1986 to 2000
      • Computers: x 500
      • Networks: x 340,000
    • 2001 to 2010
      • Computers: x 60
      • Networks: x 4000
    • Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.

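The "order of magnitude per 5 years" claim follows directly from the two doubling periods quoted on the slide. A short worked check, assuming idealized exponential growth (the function name is an illustrative choice):

```python
def growth_factor(months, doubling_months):
    """Multiplicative improvement after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_months)

FIVE_YEARS = 60  # months

computer = growth_factor(FIVE_YEARS, 18)  # roughly 10x
network = growth_factor(FIVE_YEARS, 9)    # roughly 100x
gap = network / computer                  # roughly 10x: one order of magnitude

print(f"computers x{computer:.1f}, networks x{network:.1f}, gap x{gap:.1f}")
```

Because the network doubling period is exactly half the computer one, the network factor is the square of the computer factor, so the gap itself grows as fast as computers do.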

  • The 13.6 TF TeraGrid: Computing at 40 Gb/s
    • NCSA/PACI: 8 TF, 240 TB
    • SDSC: 4.1 TF, 225 TB
    • Caltech
    • Argonne
    • [Figure labels: site resources and external networks at each site; archival storage (HPSS, UniTree)]
    • TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne www.teragrid.org

  • iVDGL: International Virtual Data Grid Laboratory
    • U.S. PIs: Avery, Foster, Gardner, Newman, Szalay www.ivdgl.org

  • Main Services of a Grid Architecture
    • Service providers
      • Publish the availability of their services via information systems
      • Such services may come and go or change dynamically
      • E.g. a testbed site that offers x CPUs and y GB of storage
    • Service brokers
      • Register and categorize published services and provide search capabilities
      • E.g. 1) SLAC Resource Broker selects the best site for a job 2) Catalogues of data held at each testbed site
    • Service requesters
      • Single sign-on: log into the Grid once
      • Use brokering services to find a needed service and employ it
      • E.g. CMS physicists submit a simulation job that needs 12 CPUs for 6 hours and 15 GB, which gets scheduled, via the Resource Broker, on the CERN testbed site

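The provider, broker and requester roles above can be sketched in a few lines. This is a toy model under stated assumptions, not the behaviour of any real resource broker: the class names, the site capacities, and the "most spare CPUs wins" selection policy are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SiteOffer:
    """What a service provider publishes (capacities here are made up)."""
    site: str
    free_cpus: int
    free_storage_gb: int

class ResourceBroker:
    """Registers published offers and answers requester queries."""

    def __init__(self):
        self._offers = {}

    def publish(self, offer):
        # Providers may re-publish at any time; the newest offer replaces the old one.
        self._offers[offer.site] = offer

    def withdraw(self, site):
        # Services may come and go dynamically.
        self._offers.pop(site, None)

    def select(self, cpus, storage_gb):
        """Return the name of a site that satisfies the request, or None."""
        candidates = [o for o in self._offers.values()
                      if o.free_cpus >= cpus and o.free_storage_gb >= storage_gb]
        if not candidates:
            return None
        # Hypothetical policy: prefer the site with the most spare CPUs.
        return max(candidates, key=lambda o: o.free_cpus).site

broker = ResourceBroker()
broker.publish(SiteOffer("CERN", free_cpus=64, free_storage_gb=500))
broker.publish(SiteOffer("SLAC", free_cpus=8, free_storage_gb=2000))

# A requester asks for 12 CPUs and 15 GB, as in the CMS example above.
chosen = broker.select(cpus=12, storage_gb=15)
print(chosen)
```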

  • Grid Security
    • Resource providers are essentially opening themselves up to itinerant users
    • Secure access to resources is required
    • X.509 Public Key Infrastructure
      • User's identity has to be certified by (mutually recognized) national Certification Authorities (CAs)
      • Resources (node machines) have to be certified by CAs
      • Temporary delegation from users to processes to be executed in the user's name (proxy certificates)
    • Commonly agreed policies for accessing resources and handling users' rights across different domains within VOs

  • The Globus Project: Making Grid computing a reality
    • Close collaboration with real Grid projects in science and industry
    • Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
    • Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
    • The Globus Toolkit: Open source, reference software base for building grid infrastructure and applications
    • Global Grid Forum: Development of standard protocols and APIs for Grid computing

  • Selected Major Grid Projects

    Name (URL; sponsors): focus

    • Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies
    • BlueGrid (IBM): Grid testbed linking IBM laboratories
    • DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
    • DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
    • Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
    • European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics

  • Selected Major Grid Projects

    Name (URL; sponsor): focus

    • EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create tech for remote access to supercomp resources & simulation codes; in GRIP, integrate with Globus Toolkit
    • Fusion Collaboratory (fusiongrid.org; DOE Off. Science): Create a national computational collaboratory for fusion research
    • Globus Project (globus.org; DARPA, DOE, NSF, NASA, Msoft): Research on Grid technologies; development and support of Globus Toolkit; application and deployment
    • GridLab (gridlab.org; European Union): Grid technologies and applications
    • GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research
    • Grid Research Integration Dev. & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education

  • Selected Major Grid Projects

    Name (URL; sponsor): focus

    • Grid Application Dev. Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
    • Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics expts: ATLAS, CMS, LIGO, SDSS
    • Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
    • International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
    • Network for Earthquake Eng. Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
    • Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments

  • Selected Major Grid Projects

    Name (URL; sponsor): focus

    • TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
    • UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
    • Unicore (BMBFT): Technologies for remote access to supercomputers

    Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS

    See also www.gridforum.org

  • Where is Development of the Grid Going?
    • [Figure labels: Grid, Web]
    • The definition of WSRF means that Grid and Web communities can move forward on a common base

  • Standards
    • Grid and Web Services are merging
      • Grid is an aggressive use case of Web Services
      • WSRF completes common infrastructure
    • Web Services standards landscape is in flux
      • Uncertain status of security and policy standards continues to be a big source of concern
    • Grid services standards landscape heating up
      • Agreement, management, data access, ...
    • Open source software important for adoption

  • Standards (cont)
    • Open, standard protocols
      • Enable interoperability
      • Avoid product/vendor lock-in
      • Enable innovation/competition on end points
      • Enable ubiquity
    • In Grid space, must address how to
      • Describe, discover, and access resources
      • Monitor, manage, and coordinate resources
      • Account and charge for resources
      • For many different types of resource

  • Standards (cont)
    • SSL/TLS v1 (from OpenSSL) (IETF)
    • LDAP v3 (from OpenLDAP) (IETF)
    • X.509 Proxy Certificates (IETF)
    • GridFTP v1.0 (GGF)
    • WSDL 1.1, XML, SOAP (W3C)
    • WS-Security (OASIS)
    • OGSI v1.0 (GGF)
    • And others on the road to standardization
      • WSRF (OASIS), DAIS (GGF), WS-Agreement (GGF), WSDL 2.0, WSDM, SAML, XACML

  • WSRF Specifications
    • List is still changing, but basically includes...
    • Core:
      • WS-Resource Framework (WSRF)
      • WS-ResourceProperties (WSRF-RP)
      • WS-ResourceLifetime (WSRF-RL)
      • WS-ServiceGroup (WSRF-SG)
      • WS-BaseFaults (WSRF-BF)
    • Related:
      • WS-Notification
      • WS-Addressing

  • WSRF
    • WSRF is a framework consisting of a number of specifications:
      • WS-ResourceProperties
      • WS-ResourceLifetime
      • WS-ServiceGroup
      • WS-Notification
      • WS-BaseFaults
      • WS-RenewableReferences (unpublished)
    • Other WS specifications such as:
      • WS-Addressing

  • How WSRF Fits in With Other Standards, Specifications and Protocols
    • Grid stuff: Globus (GRAM, MDS)
    • WSRF
    • Web services: WSDL, SOAP
    • Internet protocols: HTTP, TCP/IP

  • Describing Web Services
    • Web Services Description Language (WSDL) 2.0
      • Status: W3C Last Call Working Draft, http://www.w3.org/TR/wsdl
    • WSDL is for describing Web Services
      • Defines XML-based grammar for describing network services as a set of endpoints
      • Describes their methods, arguments, return values and how to use them
    • Approach: Service Oriented Architecture (SOA)
      • Service-Provider:
        • Develop a Web Service and publish its description as WSDL
        • Publish a link to it in a Service-Registry
      • Service-Consumer:
        • Service discovery, i.e. find WSDL, e.g. via Service-Registry
        • Use endpoint definition (WSDL) to communicate with service

  • Web Services Addressing
    • URIs (Uniform Resource Identifiers). Look like URLs: http://webservices.mysite.com/weather/us/WeatherService
    • When you have a Web Service URI, you will usually need to give that URI to a program
    • If you typed a Web Service URI into your web browser, you would probably get an error message or some unintelligible code
    • Some services include a polite response page

  • Service-Oriented Architecture
    • [Figure labels: Service Provider; Registry (Service Broker); Service Consumer; arrows: Publish (endpoint definition), Discovery, Bind]

  • Web Services Architecture
    • WSDL: Core element of the Web Service Architecture stack (Endpoint definition language)
    • Simplified Web Service Stack (WS-I Basic Profile 1.0 compliant)
      • XML 1.0 + Namespaces (messaging)
      • SOAP (messaging)
      • XSD (service description)
      • WSDL (service description)
      • UDDI (service discovery)
    • [Figure labels: Listener, Responder, Web Service]

  • WSDL Goals
    • Extensibility with respect to
      • New transport protocols
      • New encoding rules
    • Abstraction with respect to
      • Endpoints and Messages
      • THEN mapped onto n concrete transports and encodings
    • Reuse with respect to
      • Definitions reusable to create new definitions

  • Abstract Endpoint Type
    • Possibly part of a WSDL specification
      • Message
      • Operation
      • PortType (Abstract Endpoint Type)
    • PortType: set of message flows (operations) expected by a particular endpoint type, with no details relating to transport or encoding or location
    • Operation kinds: one-way, request-response, notification, solicit-response
    • [Figure labels: Abstract Endpoint Type (PortType); operations, each with its messages]

  • Concrete Endpoint Type
    • Binding (Concrete Endpoint Type)
      • Defines transport and encoding particulars for a portType
    • [Figure labels: Concrete Endpoint Type (Binding); PortType; operations and the messages for each operation; Transport & Encoding; URI]

  • Shift to Service Definition
    • Port (Endpoint Instance)
      • Network address of an endpoint and the binding it adheres to
      • Note: not necessarily a TCP port
    • Service
      • A collection of related endpoint instances
    • [Figure labels: Host; Concrete Endpoint Type (Binding); Endpoint Instance (Port); Service]

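The relationship between the abstract and concrete WSDL concepts from the last three slides can be modelled as plain data. This is an illustrative sketch only: the class and field names mirror the WSDL terms, but the weather service, its operation, and the example.org addresses are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Operation:
    name: str
    pattern: str  # "one-way", "request-response", "solicit-response" or "notification"

@dataclass(frozen=True)
class PortType:
    """Abstract endpoint type: operations only, no transport, encoding or location."""
    name: str
    operations: tuple

@dataclass(frozen=True)
class Binding:
    """Concrete endpoint type: a PortType plus transport and encoding."""
    name: str
    port_type: PortType
    transport: str
    encoding: str

@dataclass(frozen=True)
class Port:
    """Endpoint instance: a Binding plus a network address."""
    name: str
    binding: Binding
    address: str

@dataclass
class Service:
    """A collection of related endpoint instances."""
    name: str
    ports: list = field(default_factory=list)

# Hypothetical service: one abstract PortType exposed through two bindings.
weather = PortType("WeatherPortType", (Operation("GetForecast", "request-response"),))
soap_http = Binding("WeatherSoapBinding", weather, transport="HTTP", encoding="SOAP")
http_get = Binding("WeatherHttpGetBinding", weather, transport="HTTP", encoding="urlEncoded")
service = Service("WeatherService", [
    Port("SoapPort", soap_http, "http://example.org/weather/soap"),
    Port("GetPort", http_get, "http://example.org/weather/get"),
])

for port in service.ports:
    print(port.name, port.binding.encoding, port.address)
```

Keeping the PortType free of transport details is what lets the same definition be reused by both bindings, which is the abstraction and reuse goal stated earlier.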

  • Describing Web Services
    • All WSDL elements belong to the WSDL namespace: http://schemas.xmlsoap.org/wsdl/
    • Namespaces for WSDL bindings
      • SOAP binding: http://schemas.xmlsoap.org/wsdl/soap/
      • HTTP GET and POST binding: http://schemas.xmlsoap.org/wsdl/http/
      • WSDL MIME binding: http://schemas.xmlsoap.org/wsdl/mime/
      • More to come

  • State Management
    • Core communication model of the Web (HTTP) is stateless
    • Application requires state when a user traverses the multiple endpoints of a Web application/service

  • Web Service: Stateless

  • Web Service: Stateful

  • Web Service Invocation - Stateful

  • Web Service + WSRF = Stateful Resources = WS-Resource
    • A stateful resource is something that exists even when you're not interacting with it.
      • E.g. database backend service
    • Stateful resources have properties that define state
      • These properties are how you interact with them
      • Properties have values
      • Add/remove/change properties and values dynamically
    • WSRF Specification: a WS-Resource is the combination of a Web service and a stateful resource on which it acts.

  • WS-Resource Approach to State
    • Typical approach:
      • Put the state in the Web service (thus making it stateful, which is generally regarded as a bad thing)
    • WSRF approach:
      • Store state in a separate entity called a resource
      • Each resource has a unique key
      • A Web service can have multiple resources
      • To connect to service: URI + WS-Addressing Std

  • WS-Resources
    • Web services often provide access to state
      • Job submissions, databases

    A WS-Resource is a standard way of representing that state.

    In this tutorial, we will be using counter resources which are simple accumulators.

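The counter resources mentioned above make a convenient illustration of the WSRF idea of keeping state outside the service. The sketch below is plain Python, not WSRF: `CounterService`, its methods, and the dictionary used as a resource store are assumptions made for the example. In real WSRF the key would travel inside a WS-Addressing endpoint reference rather than as a function argument.

```python
import uuid

class CounterService:
    """Stateless front end: every call names the resource it acts on."""

    def __init__(self, store):
        self._store = store  # state lives in the store, not in the service object

    def create(self):
        key = str(uuid.uuid4())  # each resource gets a unique key
        self._store[key] = 0
        return key

    def add(self, key, amount):
        self._store[key] += amount
        return self._store[key]

    def destroy(self, key):
        del self._store[key]  # explicit lifetime management

store = {}
service_a = CounterService(store)
service_b = CounterService(store)  # a second instance of the same service

k1 = service_a.create()
k2 = service_a.create()            # one service, multiple resources
service_a.add(k1, 5)
total = service_b.add(k1, 3)       # any instance can act on the resource
print(total)
```

Because neither service object holds the counters, either instance can be restarted or replicated without losing state.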

  • WS-Resources
    • WSRF specifications provide:
      • XML-based Resource Properties
      • Lifetime management (creation/destruction) of resources
      • Service groups, which group together WS-Resources
      • Notification (for example, of changes in resource properties)
      • Faults
      • Renewable References

  • Examples of WS-Resources
    • Files on a file server
    • Rows in a database
    • Jobs in a job submission system
    • Accounts in a bank

  • Session Design
    • Session
      • Defines a context in which a user communicates with a Web Application in a defined time period
      • One session per user
      • Assigns application state to multiple requests from one user
    • Design decisions / rules of thumb
      • Use a database to persist state
      • UUID to identify a session/user
    • Physical design: session identifier exchange
      • Cookie, hidden variable, or encoded into the URL

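The two rules of thumb above (persist state in a database, identify the session with a UUID) can be combined in a small sketch. It assumes an in-memory SQLite database and JSON-serializable state; the `SessionStore` class and the table layout are illustrative choices, and the exchange of the identifier via cookie or URL is left out.

```python
import json
import sqlite3
import uuid

class SessionStore:
    """Persists per-session state in a database, keyed by a UUID."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def create(self):
        sid = uuid.uuid4().hex  # this value would be sent to the client, e.g. in a cookie
        with self._conn:
            self._conn.execute("INSERT INTO sessions VALUES (?, ?)", (sid, json.dumps({})))
        return sid

    def load(self, sid):
        row = self._conn.execute(
            "SELECT state FROM sessions WHERE id = ?", (sid,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def save(self, sid, state):
        with self._conn:
            self._conn.execute(
                "UPDATE sessions SET state = ? WHERE id = ?", (json.dumps(state), sid)
            )

sessions = SessionStore(sqlite3.connect(":memory:"))
sid = sessions.create()

# A later request from the same user presents the same identifier.
state = sessions.load(sid)
state["cart"] = ["book"]
sessions.save(sid, state)
print(sessions.load(sid))
```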

  • Transactions
    • Transaction
      • A unit of work that should either succeed or fail as a whole.
      • A series of operations that behave according to the ACID rules.
      • Series: BEGIN_TRANSACTION, Op1, ..., OpN, COMMIT_TRANSACTION
    • The ACID rules define Atomicity, Consistency, Isolation, and Durability
    • Characteristics regarding Web Applications
      • Long running
      • Nested

  • Atomicity And Consistency
    • Atomicity
      • Transaction executes exactly once and is atomic
      • All the work is done or none of it
    • Consistency
      • Transaction preserves the consistency of data
      • Transforming one consistent state of data results in another consistent state

  • Isolation And Durability
    • Isolation
      • Transaction is a unit of isolation
      • Concurrent transactions behave as though each was the only transaction running in the system
    • Durability
      • Transaction is a unit of recovery
      • If a transaction commits, the system guarantees that its updates will persist, immediately after the commit.

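Atomicity and consistency are easiest to see with a failing transfer. The sketch below uses Python's built-in `sqlite3` module; the `accounts` table, the names and the balances are invented for the example. The credit is deliberately issued before the debit so that a failed debit has something to roll back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)        # succeeds: both updates are kept
failed = transfer(conn, "bob", "alice", 500)   # debit fails: the credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second call shows "all the work is done or none of it": the credit to alice has already executed when the debit violates the constraint, and the rollback removes it.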

  • Aspects of DSA
    • Driven by communication aspects
    • Performance issues
      • Protocol overhead
      • Bandwidth
      • Quality of Service
      • Delays
      • Proxy, Cache and Mirrors
    • Other issues
      • Security, availability, etc.
      • Operational aspects

  • Simple Web Service Chain
    • Web Service WS 1 provides functionality using WS 2, WS 2 provides ...
    • Like a chain: the weakest element influences the overall behavior
    • Hops: the number of network nodes involved from the source WS to the destination WS
    • Example shows 2 Hops, 4 Web Services
    • [Figure labels: WS 1, WS 2, WS 3, WS n]

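The "weakest element" remark can be made quantitative under two simplifying assumptions that are not in the slides: services fail independently, and each call in the chain waits for the next one. The availability and latency figures below are invented.

```python
from math import prod

def chain_availability(availabilities):
    """Probability that every service in the chain is up (independent failures assumed)."""
    return prod(availabilities)

def chain_latency_ms(latencies_ms):
    """End-to-end latency when each service calls the next one synchronously."""
    return sum(latencies_ms)

# Hypothetical chain WS 1 -> WS 2 -> WS 3.
availability = chain_availability([0.999, 0.99, 0.95])
latency = chain_latency_ms([20, 35, 120])

print(f"availability {availability:.4f}, latency {latency} ms")
```

Under these assumptions the chain is never more available than its weakest member, and every added service can only lower the figure further.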

  • Considering Scalability
    • Scale Up: more power added to the machine
    • Scale Out: the application logic unit is cloned across a set of identical servers
    • [Figure labels: Scale Up, Scale Out]

  • Scale-Out and Partition
    • Scale out Web Servers and scale up Database
    • Scale Up Database
    • Partition Database

  • Partition Database
    • Functional
      • Each functional area of a site gets its own database
      • Dedicated hardware to certain functions
      • Class of hardware per function
    • Tables
      • Huge scale opportunity for large tables
      • Some modern database management systems provide special support for this
    • Read-only Databases
      • Data changes do not occur often
      • Use of Replicated Databases

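Functional and table partitioning both come down to a routing decision made before the query is sent. A minimal sketch, in which the database names, the functional areas and the hash-based sharding rule are all assumptions for illustration (database systems with built-in partitioning make this choice internally):

```python
import zlib

# Hypothetical functional partitioning: one database per area of the site.
FUNCTIONAL_DATABASES = {
    "catalog": "db-catalog",
    "orders": "db-orders",
    "accounts": "db-accounts",
}

def database_for(area):
    """Route a request to the database that owns this functional area."""
    return FUNCTIONAL_DATABASES[area]

def shard_for(key, shard_count):
    """Table partitioning: map a row key to one of `shard_count` partitions.

    crc32 is used because it is stable across runs, unlike the built-in
    hash() of strings.
    """
    return zlib.crc32(key.encode("utf-8")) % shard_count

print(database_for("orders"), shard_for("customer-42", 4))
```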

  • Dynamic WS Discovery
    • Web Service calls Web Service, mediated by a Broker (or a P2P network)
      • Criteria may be quality, context, price, etc.
      • Requires classification system or metadata
    • Broker could use UDDI automatically on request
    • P2P discovery by content-based routing (e.g. for WSDL)
    • [Figure labels: WS 1, Broker / P2P-Network, WS x, WS y]

  • Integrating Endpoints
    • Typical problems
      • No standard way to expose functionality
      • Integration is expensive and error-prone
      • Not designed for partnership scenarios
    • Why?
      • Semantics of content get lost on the way to presentation
      • Need for semantics

  • Integrating Application Logic
    • Goal: federating Web Applications (or their logical units)
    • Globalize the component-based view
    • Next generation Web Applications will work together
    • Extend processes with external (potentially unknown) partners

  • Federation Approach
    • [Figure labels: four Web Applications; Internet]

  • Federation Scenarios
    • Distributed Computing / Web Services in use for:
      • Mobile Virtual Enterprise
      • Market-place, Supply Chain, Grid Computing (Grid of Web Services)
      • Portals providing uniform access to distributed information spaces
    • Examples of business relationships:
      • B2B: Business-to-Business
      • B2C: Business-to-Consumer
      • C2C: Consumer-to-Consumer
      • B2A: Business-to-Administration
      • A2C: Administration-to-Consumer
      • A2A: Administration-to-Administration

  • Accessing Objects
    • SOAP Version 1.2, W3C Recommendation 24 June 2003
      • Part 0, Tutorial: http://www.w3.org/TR/soap12-part0/
      • Part 1: Defines Messaging Framework
      • Part 2: Adjuncts (may be used in messages)
    • SOAP provides a simple and lightweight mechanism for exchanging structured and typed information between peers in a decentralized, distributed environment
    • Formerly known as Simple Object Access Protocol
    • Does not itself define any application semantics, e.g. programming model

  • SOAP
    • SOAP consists of three parts:
      • SOAP envelope: defines what is in a message, who should deal with it, and whether it is optional or mandatory
      • SOAP encoding rules: define a serialization mechanism for application-defined data types
      • SOAP RPC representation: defines a convention that can be used to represent remote procedure calls and responses

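The envelope structure described above can be assembled with Python's standard XML library. This is a sketch under assumptions rather than a SOAP toolkit: the envelope namespace is the SOAP 1.2 one, but the application namespace, the `Add` operation, its parameter names and the `TransactionId` header block are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:calculator"                    # hypothetical application namespace

def build_envelope(operation, params, header_blocks=()):
    """Return a SOAP envelope with an optional Header and an RPC-style Body."""
    ET.register_namespace("env", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    if header_blocks:
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        for name, text, must_understand in header_blocks:
            block = ET.SubElement(header, f"{{{APP_NS}}}{name}")
            block.text = text
            if must_understand:
                # Marks the block as mandatory for the receiver.
                block.set(f"{{{SOAP_NS}}}mustUnderstand", "true")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

message = build_envelope(
    "Add", {"a": 12, "b": 10},
    header_blocks=[("TransactionId", "42", True)],
)
print(message)
```

The header block with `mustUnderstand` corresponds to the "optional or mandatory" part of the envelope's job; the body carries the call itself.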

  • General Web Service Model
    • [Figure labels: Consumer; Requestor; Parser; Web Service (Provider); Listener; Responder; Process-Logic; SOAP Message; Transport (e.g. HTTP(S), SMTP, FTP)]

  • SOAP Message
    • SOAP protocol layering
      • SOAP
      • Application Protocol (HTTP, SMTP, etc.)
      • Transport Protocol (TCP/IP, IPX/SPX, etc.)
      • Physical Protocol (Ethernet, ATM, etc.)

  • SOAP and Client/Server
    • In order for SOAP to work, the client must have code running that is responsible for building the SOAP request.
    • In response, a server must be responsible for understanding the SOAP request, invoking the specified method, building the response message, and returning it to the client.
    • These details are up to you: your Web application

  • The HTTP Aspect
    • A SOAP request via HTTP POST:

    POST /WebCalculator/Calculator.asmx HTTP/1.1
    Content-Type: text/xml
    ...
    SOAPAction: http://tempuri.org/Add
    Content-Length: 386

    ...

  • Message Structure
    • SOAP Message: the complete SOAP message
      • Headers: protocol binding headers
      • SOAP Envelope: encloses payload
        • SOAP Header: encloses headers
          • Headers: individual headers
        • SOAP Body: contains SOAP message name
          • Message Name and Data: XML-encoded SOAP message name and data

  • SOAP Message Example
    • An XML document using the SOAP schema:

    ... 12 10

  • Encoding Complex Data
    • Data structures are serialized as XML:

    Plastic Novelties Ltd 129 PLAS

  • Example of a SOAP Request
    • SOAP message over HTTP POST:

    POST /StockQuote HTTP/1.1
    Host: www.stockquoteserver.com
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn
    SOAPAction: "Some-URI"

    DIS

  • A SOAP Response
    SOAP response over HTTP:
    HTTP/1.1 200 OK
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn

    34.5

  • Example of a SOAP Error
    SOAP response over HTTP:
    HTTP/1.1 500 Internal Server Error
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn

    SOAP:MustUnderstand SOAP Must Understand Error
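
    The MustUnderstand error above arises when a request carries a mandatory header the server does not recognize. A minimal sketch of that check, assuming SOAP 1.1 conventions; the Transaction and Priority headers and the urn:example namespace are hypothetical:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
UNDERSTOOD = {"{urn:example}Transaction"}  # hypothetical header this server supports

def check_headers(raw):
    """Return (HTTP status, fault code) after checking mandatory headers."""
    header = ET.fromstring(raw).find(f"{{{SOAP_NS}}}Header")
    for block in (header if header is not None else []):
        mandatory = block.get(f"{{{SOAP_NS}}}mustUnderstand") == "1"
        if mandatory and block.tag not in UNDERSTOOD:
            return 500, "MustUnderstand"
    return 200, None

request = f"""<e:Envelope xmlns:e="{SOAP_NS}" xmlns:x="urn:example">
  <e:Header><x:Priority e:mustUnderstand="1">high</x:Priority></e:Header>
  <e:Body/>
</e:Envelope>"""
status, code = check_headers(request)
```

    Here the Priority header is marked mandatory but is not in the server's supported set, so the sketch reports a fault instead of processing the body.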

  • Security and Features
    In the context of HTTP, SOAP builds on existing security
    HTTPS
    X.509 certificates
    Developers explicitly choose which methods to expose
    Extensibility: the major strength of SOAP
    E.g., check the WS-* specifications at http://msdn.microsoft.com/webservices
    Cf. the WS-Security Roadmap

  • WS-Security Roadmap
    [Diagram: specifications layered on SOAP Messaging: Security, SecurityPolicy, SecureConversation, Trust, Federation, Privacy, Authorization.]

  • Discovering Web Services
    Universal Description, Discovery, and Integration (UDDI)
    Specifies what the API for a Web-based registry looks like
    All about the Yellow, White & Green Pages
    Defines how to run and operate registry sites on the Web
    Defines how to pay for its operation: encourages basic lookup services for free
    Further information at http://uddi.org

  • Registry Operation
    Peer nodes (websites)
    Companies register with any node
    Registrations replicated on a daily basis
    Complete set of registered records available at all nodes
    Common set of SOAP APIs supported by all nodes
    Compliance enforced by business contract
    [Diagram: queries go to UDDI.org, with peer nodes operated by Ariba, IBM, Microsoft, and others.]

  • Why a DNS-like Model?
    Enforces cross-platform compatibility across competitor platforms
    Demonstration of trust and openness
    Avoids tacit endorsement of any one vendor's platform
    May migrate to a third party

  • UDDI Provides Information
    Who: business information
    What: find the right type of business
    Where: to access a service
    How: describes how a given interface functions
    Information provided at http://uddi.microsoft.com

  • UDDI A Publisher View

  • UDDI and Web Services
    [Diagram: exchanges between a Web Service Consumer, a Web Service Provider, and UDDI]
    Find a service (http://www.uddi.org): link to discovery document
    Discovery (http://yourservice.com): HTML with link to WSDL
    How do we talk? (WSDL) (http://yourservice.com/?WSDL): return service descriptions (XML)
    Let me talk to you (SOAP) (http://yourservice.com/svc1): return service response (XML)

  • UDDI and SOAP
    [Diagram: a User sends a UDDI SOAP Request to a UDDI Registry Node (HTTP Server, SOAP Processor, UDDI Registry Service, B2B Directory) and receives a UDDI SOAP Response.]
    Create, view, update, and delete registrations
    Implementation-neutral

  • Registry APIs (SOAP)
    Inquiry API
    Find things: find_business, find_service, find_binding, find_tModel
    Get details about things: get_businessDetail, get_serviceDetail, get_bindingDetail, get_tModelDetail
    Publishers API
    Save things: save_business, save_service, save_binding, save_tModel
    Delete things: delete_business, delete_service, delete_binding, delete_tModel
    Security: get_authToken, discard_authToken
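
    Each of the calls listed above is an XML element carried in a SOAP body. The sketch below builds a find_business inquiry; it assumes the UDDI version 2 namespace (urn:uddi-org:api_v2) and the generic="2.0" attribute, and the business name is hypothetical. The message is only constructed, not sent to a registry.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
UDDI_NS = "urn:uddi-org:api_v2"  # assumed: UDDI version 2 API namespace

def find_business(name):
    """Build the SOAP message for a UDDI find_business inquiry."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{UDDI_NS}}}find_business", {"generic": "2.0"})
    ET.SubElement(query, f"{{{UDDI_NS}}}name").text = name
    return ET.tostring(env, encoding="unicode")

message = find_business("Example Corp")  # hypothetical business name
```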

  • Web Services Make Sense for Grid Computing
    [Diagram: a Client requesting a Grid Service sends a SOAP message over HTTP transport, across a VO boundary or network, to the Grid Service Provider, whose interface is described in WSDL.]

  • Why Should HPC Folks Care About the Grid?
    1) Grid is a disruptive technology [Vision]
    It ushers in a virtualized, collaborative, distributed world that our applications will use
    2) Grid addresses pain points now [Reality]
    Grids are built, not bought, and are delivering real benefits
    The computational demands of our applications are not going to get simpler
    3) An open Grid is to our advantage [Future]
    Standards are being defined now that will determine the future of this technology

  • The Globus Project
    Making Grid computing a reality
    Close collaboration with real Grid projects in science and industry
    Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
    Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
    The Globus Toolkit: open source, reference software base for building Grid infrastructure and applications
    Global Grid Forum: development of standard protocols and APIs for Grid computing

  • Some Important Definitions
    Resource
    Network protocol
    Network enabled service
    Application Programming Interface (API)
    Software Development Kit (SDK)
    Syntax

    Not discussed, but important: policies

  • Resource
    An entity that is to be shared
    E.g., computers, storage, data, software
    Does not have to be a physical entity
    E.g., Condor pool, distributed file system, ...
    Defined in terms of interfaces, not devices
    E.g., a scheduler such as LSF or PBS defines a compute resource
    Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS
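
    The idea that a resource is defined by its interface rather than by a device can be sketched as follows. The ComputeResource interface and both implementations are hypothetical and stand in for a real scheduler interface.

```python
from abc import ABC, abstractmethod

class ComputeResource(ABC):
    """A compute resource is whatever implements this interface."""
    @abstractmethod
    def submit(self, command: str) -> int: ...
    @abstractmethod
    def status(self, job_id: int) -> str: ...

class SingleMachine(ComputeResource):
    def __init__(self):
        self.jobs = []
    def submit(self, command):
        self.jobs.append(command)
        return len(self.jobs) - 1
    def status(self, job_id):
        return "done"  # pretend the job ran immediately

class WorkstationPool(ComputeResource):
    """Stands in for a pool of machines; not a single physical entity."""
    def __init__(self, machines):
        self.machines = machines
        self.assigned = {}
    def submit(self, command):
        job_id = len(self.assigned)
        self.assigned[job_id] = self.machines[job_id % len(self.machines)]
        return job_id
    def status(self, job_id):
        return f"queued on {self.assigned[job_id]}"

def run(resource: ComputeResource, command: str) -> str:
    """Client code sees only the interface, never the device behind it."""
    return resource.status(resource.submit(command))

results = [run(SingleMachine(), "sim"), run(WorkstationPool(["ws1", "ws2"]), "sim")]
```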

  • Network Protocol
    A formal description of message formats and a set of rules for message exchange
    Rules may define the sequence of message exchanges
    Protocol may define state change in an endpoint, e.g., file system state change
    Good protocols are designed to do one thing
    Protocols can be layered
    Examples of protocols: IP, TCP, TLS (was SSL), HTTP, Kerberos

  • Network Enabled Services
    Implementation of a protocol that defines a set of capabilities
    Protocol defines interaction with the service
    All services require protocols
    Not all protocols are used to provide services (e.g. IP, TLS)
    Examples: FTP and Web servers

  • Application Programming Interface
    A specification for a set of routines to facilitate application development
    Refers to definition, not implementation
    E.g., there are many implementations of MPI
    Spec often language-specific (or IDL)
    Routine name, number, order and type of arguments; mapping to language constructs
    Behavior or function of routine
    Examples: GSS-API (security), MPI (message passing)

  • Software Development Kit
    A particular instantiation of an API
    SDK consists of libraries and tools
    Provides implementation of API specification
    Can have multiple SDKs for an API
    Examples of SDKs: MPICH, Motif Widgets

  • Syntax
    Rules for encoding information, e.g.
    XML, Condor ClassAds, Globus RSL
    X.509 certificate format (RFC 2459)
    Cryptographic Message Syntax (RFC 2630)
    Distinct from protocols
    One syntax may be used by many protocols (e.g., XML), and may be useful for other purposes
    Syntaxes may be layered
    E.g., Condor ClassAds -> XML -> ASCII
    Important to understand layerings when comparing or evaluating syntaxes
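
    Syntax layering can be illustrated with a small sketch. The attribute list below is a hypothetical stand-in for an advertisement and does not use real ClassAd syntax; the point is only that each layer encodes the one above it and decoding removes the layers in reverse order.

```python
import xml.etree.ElementTree as ET

ad = {"OpSys": "LINUX", "Memory": "512"}  # hypothetical attribute/value pairs

# Layer 1 -> 2: express the attribute list in XML syntax.
root = ET.Element("ad")
for name, value in ad.items():
    ET.SubElement(root, "attr", name=name).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Layer 2 -> 3: encode the XML characters as ASCII bytes.
wire_bytes = xml_text.encode("ascii")

# Decoding peels the layers off in reverse order.
decoded = {a.get("name"): a.text for a in ET.fromstring(wire_bytes.decode("ascii"))}
```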

  • A Protocol Can Have Multiple APIs
    TCP/IP APIs include BSD sockets, Winsock, System V streams, ...
    The protocol provides interoperability: programs using different APIs can exchange information
    I don't need to know the remote user's API
    [Diagram: two Applications, one using the WinSock API and one using the Berkeley Sockets API, communicate over the TCP/IP protocol (reliable byte streams).]
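
    Python's socket module is one more API onto the same TCP protocol. In the sketch below both ends happen to use that API, but the client relies only on the address and on TCP's reliable byte stream, so the peer could equally be written against Winsock or BSD sockets in C. The loopback address and the message are arbitrary choices for the example.

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one connection and send the received bytes back."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
thread = threading.Thread(target=echo_once, args=(server,))
thread.start()

# The client needs the address and the protocol (TCP), not the server's API.
with socket.create_connection(server.getsockname()) as client:
    client.sendall(b"hello grid")
    reply = client.recv(1024)

thread.join()
server.close()
```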

  • An API Can Have Multiple Protocols
    MPI provides portability: any correct program compiles & runs on a platform
    Does not provide interoperability: all processes must link against the same SDK
    E.g., MPICH and LAM versions of MPI

  • APIs and Protocols Are Both Important
    Standard APIs/SDKs are important
    They enable application portability
    But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
    Standard protocols are important
    Enable cross-site interoperability
    Enable shared infrastructure
    But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)

  • Why Discuss Architecture?
    Descriptive: provide a common vocabulary for use when describing Grid systems
    Guidance: identify key areas in which services are required
    Prescriptive: define standard Intergrid protocols and APIs to facilitate creation of interoperable Grid systems and portable applications

  • One View of Requirements
    Identity & authentication
    Authorization & policy
    Resource discovery
    Resource characterization
    Resource allocation
    (Co-)reservation, workflow
    Distributed algorithms
    Remote data access
    High-speed data transfer
    Performance guarantees
    Monitoring
    Adaptation
    Intrusion detection
    Resource management
    Accounting & payment
    Fault management
    System evolution
    Etc.

  • Another View: Three Obstacles to Making Grid Computing Routine
    New approaches to problem solving
    Data Grids, distributed computing, peer-to-peer, collaboration grids, ...
    Structuring and writing programs
    Abstractions, tools
    Enabling resource sharing across distinct institutions
    Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; ...

  • Programming & Systems Problems
    The programming problem
    Facilitate development of sophisticated apps
    Facilitate code sharing
    Requires programming environments: APIs, SDKs, tools
    The systems problem
    Facilitate coordinated use of diverse resources
    Facilitate infrastructure sharing: e.g., certificate authorities, info services
    Requires systems: protocols, services
    E.g., port/service/protocol for accessing information, allocating resources

  • The Systems Problem: Resource Sharing Mechanisms That ...
    Address security and policy concerns of resource owners and users
    Are flexible enough to deal with many resource types and sharing modalities
    Scale to large numbers of resources, many participants, many program components
    Operate efficiently when dealing with large amounts of data & computation

  • Aspects of the Systems Problem
    Need for interoperability when different groups want to share resources
    Diverse components, policies, mechanisms
    E.g., standard notions of identity, means of communication, resource descriptions
    Need for shared infrastructure services to avoid repeated development, installation
    E.g., one port/service/protocol for remote access to computing, not one per tool/application
    E.g., Certificate Authorities: expensive to run
    A common need for protocols & services

  • A Protocol-Oriented View of Grid Architecture That Emphasizes ...
    Development of Grid protocols & services
    Protocol-mediated access to remote resources
    New services: e.g., resource brokering
    On the Grid = speak Intergrid protocols
    Mostly (extensions to) existing protocols
    Development of Grid APIs & SDKs
    Interfaces to Grid protocols & services
    Facilitate application development by supplying higher-level abstractions
    The (hugely successful) model is the Internet

  • Layered Grid Architecture (By Analogy to Internet Architecture)

  • Protocols, Services, and APIs Occur at Each Level
    [Diagram: layered stack, top to bottom]
    Applications
    Languages/Frameworks
    Collective Service APIs and SDKs; Collective Services; Collective Service Protocols
    Resource APIs and SDKs; Resource Services; Resource Service Protocols
    Connectivity APIs; Connectivity Protocols
    Local Access APIs and Protocols
    Fabric Layer

  • Important Points
    Built on Internet protocols & services
    Communication, routing, name resolution, etc.
    Layering here is conceptual; it does not imply constraints on who can call what
    Protocols/services/APIs/SDKs will, ideally, be largely self-contained
    Some things are fundamental: e.g., communication and security
    But it is advantageous for higher-level functions to use common lower-level functions

  • The Hourglass Model
    Focus on architecture issues
    Propose a set of core services as basic infrastructure
    Use them to construct high-level, domain-specific solutions
    Design principles
    Keep participation cost low
    Enable local control
    Support for adaptation
    IP hourglass model
    [Diagram: hourglass with Applications at the top, then diverse global services, core services at the neck, and local OS at the bottom.]

  • Where Are We With Architecture?
    No official standards exist
    But: Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
    GGF has an architecture working group
    Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
    Internet drafts submitted in security area

  • Fabric Layer Protocols & Services
    Just what you would expect: the diverse mix of resources that may be shared
    Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc.
    Few constraints on low-level technology: connectivity and resource level protocols form the neck in the hourglass
    Defined by interfaces, not physical characteristics

  • Connectivity Layer Protocols & Services
    Communication
    Internet protocols: IP, DNS, routing, etc.
    Security: Grid Security Infrastructure (GSI)
    Uniform authentication, authorization, and message protection mechanisms in a multi-institutional setting
    Single sign-on, delegation, identity mapping
    Public key technology, SSL, X.509, GSS-API
    Supporting infrastructure: Certificate Authorities, certificate & key management, ...
    GSI: www.gridforum.org/security

  • Resource Layer Protocols & Services
    Grid Resource Allocation Management (GRAM)
    Remote allocation, reservation, monitoring, control of compute resources
    GridFTP protocol (FTP extensions)
    High-performance data access & transport
    Grid Resource Information Service (GRIS)
    Access to structure & state information
    Network reservation, monitoring, control
    All built on connectivity layer: GSI & IP
    GridFTP: www.gridforum.org
    GRAM, GRIS: www.globus.org

  • Collective Layer Protocols & Services
    Index servers, aka metadirectory services
    Custom views on dynamic resource collections assembled by a community
    Resource brokers (e.g., Condor Matchmaker)
    Resource discovery and allocation
    Replica catalogs
    Replication services
    Co-reservation and co-allocation services
    Workflow management services
    Etc.
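
    Resource brokering, listed above, comes down to matching a job's requirements against advertised resources. The sketch below is a toy matchmaker in that spirit; the advertisement fields and the ranking rule are hypothetical and are not Condor's actual ClassAd language.

```python
machines = [  # hypothetical resource advertisements
    {"name": "ws1", "memory_mb": 256, "os": "linux", "idle": True},
    {"name": "ws2", "memory_mb": 1024, "os": "linux", "idle": True},
    {"name": "ws3", "memory_mb": 2048, "os": "solaris", "idle": False},
]

job = {  # hypothetical job advertisement
    "requirements": lambda m: m["os"] == "linux" and m["memory_mb"] >= 512,
    "rank": lambda m: m["memory_mb"],  # prefer more memory
}

def match(job, machines):
    """Return the best idle machine that satisfies the job, or None."""
    candidates = [m for m in machines if m["idle"] and job["requirements"](m)]
    return max(candidates, key=job["rank"], default=None)

chosen = match(job, machines)
```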

  • Example: High-Throughput Computing System
    App: High Throughput Computing System
    Collective (App): dynamic checkpoint, job management, failover, staging
    Collective (Generic): brokering, certificate authorities
    Resource: access to data, access to computers, access to network performance data
    Connect: communication, service discovery (DNS), authentication, authorization, delegation
    Fabric: storage systems, schedulers

  • Example: Data Grid Architecture
    App: discipline-specific Data Grid application
    Collective (App): coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, ...
    Collective (Generic): replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, ...
    Resource: access to data, access to computers, access to network performance data, ...
    Connect: communication, service discovery (DNS), authentication, authorization, delegation
    Fabric: storage systems, clusters, networks, network caches, ...

  • The Programming Problem
    But how do I develop robust, secure, long-lived, well-performing applications for dynamic, heterogeneous Grids?
    I need, presumably:
    Abstractions and models to add to speed/robustness/etc. of development
    Tools to ease application development and diagnose common problems
    Code/tool sharing to allow reuse of code components developed by others

  • Grid Programming Technologies
    Grid applications are incredibly diverse (data, collaboration, computing, sensors, ...)
    Seems unlikely there is one solution
    Most applications have been written from scratch, with or without Grid services
    Application-specific libraries have been shown to provide significant benefits
    No new language, programming model, etc., has yet emerged that transforms things
    But certainly still quite possible

  • Examples of Grid Programming Technologies
    MPICH-G2: Grid-enabled message passing
    CoG Kits, GridPort: portal construction, based on N-tier architectures
    GDMP, Data Grid Tools, SRB: replica management, collection management
    Condor-G: workflow management
    Legion: object models for Grid computing
    Cactus: Grid-aware numerical solver framework
    Note tremendous variety, application focus

  • MPICH-G2: A Grid-Enabled MPI
    A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
    Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
    Requires services for authentication, resource allocation, executable staging, output, etc.
    Programs run in wide area without change
    See also: MetaMPI, PACX, STAMPI, MAGPIE
    www.globus.org/mpi

  • Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke)
    Modular, portable framework for parallel, multidimensional simulations
    Construct codes by linking
    Small core (flesh): management services
    Selected modules (thorns): numerical methods, grids & domain decompositions, visualization and steering, etc.
    Custom linking/configuration tools
    Developed for astrophysics, but not astrophysics-specific
    [Diagram: thorns attached to the Cactus flesh]
    www.cactuscode.org

  • High-Throughput Computing and Condor
    High-throughput computing
    CPU cycles/day (week, month, year?) under non-ideal circumstances
    "How many times can I run simulation X in a month using all available machines?"
    Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility
    Emphasis on policy management and reliability
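
    The question of how many times simulation X can run in a month reduces to simple arithmetic once the pool is characterized. Every number below (machine counts, usable hours per day, CPU-hours per run) is an assumption made up for the sketch.

```python
HOURS_PER_RUN = 6  # assumed CPU-hours for one run of simulation X

# Assumed usable hours per day on each class of machine: desktops are only
# available when their owners are away, cluster nodes around the clock.
pool = [
    {"kind": "desktop workstation", "count": 40, "usable_hours_per_day": 14},
    {"kind": "dedicated cluster node", "count": 16, "usable_hours_per_day": 24},
]

def runs_per_month(pool, hours_per_run, days=30):
    """Total CPU-hours harvested in the period, divided by the cost of one run."""
    cpu_hours = sum(m["count"] * m["usable_hours_per_day"] * days for m in pool)
    return cpu_hours // hours_per_run

monthly_runs = runs_per_month(pool, HOURS_PER_RUN)
```

    With these figures the pool supplies 28,320 CPU-hours in 30 days, which is 4,720 runs. The metric of interest is sustained throughput over the month, not the speed of any single run.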

  • Object-Based Approaches
    Grid-enabled CORBA
    NASA Lewis, Rutgers, ANL, others
    CORBA wrappers for Grid protocols
    Some initial successes
    Legion
    U. Virginia
    Object models for Grid components (e.g., vault = storage, host = computer)

  • Portals
    N-tier architectures enabling thin clients, with middle tiers using Grid functions
    Thin clients = Web browsers
    Middle tier = e.g. Java Server Pages, with Java CoG Kit, GPDK, GridPort utilities
    Bottom tier = various Grid resources
    Numerous applications and projects, e.g.
    Unicore, Gateway, Discover, Mississippi Computational Web Portal, NPACI GridPort, Lattice Portal, Nimrod-G, Cactus, NASA IPG Launchpad, Grid Resource Broker, ...

  • Common Toolkit Underneath
    Each of these programming environments should not have to implement the protocols and services from scratch!
    Rather, want to share common code that
    Implements core functionality
    Software Development Kits (SDKs) that can be used to construct a large variety of services and clients
    Standard services that can be easily deployed
    Is robust, well-architected, self-consistent
    Is open source, with broad input

  • General Approach
    Define Grid protocols & APIs
    Protocol-mediated access to remote resources
    Integrate and extend existing standards
    On the Grid = speak Intergrid protocols
    Develop a reference implementation
    Client and server SDKs, services, tools, etc.
    Grid-enable wide variety of tools
    Learn through deployment and applications

  • Globus Toolkit
    A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
    Offer a modular bag of technologies
    Enable incremental development of Grid-enabled tools and applications
    Implement standard Grid protocols and APIs
    Make available under liberal open source license
    Current version is 4.0, commonly referred to as GT4

  • Key Concepts for GT4
    OGSA, WSRF, and GT4
    These are basic architecture components for GT4
    Open Grid Services Architecture (OGSA)
    Web Services: OGSA, WSRF, and GT4 are based on standard Web Services technologies such as SOAP and WSDL. You need to be familiar with the Web Services architecture and languages.
    The Web Services Resource Framework: WSRF is the core of GT4.

  • Key Concepts for GT4 (cont)
    The GT4 architecture: based on WS-Resources, Web Services, and Grid computing
    Java & XML: to use GT4, you need to be able to program in Java and to understand basic XML

  • OGSA Key Requirements
    Interoperability and Support for Dynamic and Heterogeneous Environments
    Resource Sharing Across Organizations
    Optimization
    Quality of Service (QoS) Assurance
    Job Execution
    Data Services
    Security
    Administrative Cost Reduction
    Scalability
    Availability
    Ease of Use and Extensibility

  • OGSA Defines Basic Capabilities
    Infrastructure Services
    Execution Management Services
    Data Services
    Resource Management Services
    Security Services
    Self-Management Services
    Information Services
    Security Considerations

  • OGSA, WSRF, and GT4

  • GT4 Roadmap

  • History and Motivation
    Do we want standard APIs?
    E.g., MPI (Message Passing Interface)
    But on the Grid, we actually want standard wire protocols
    The API can be different on each system

  • History and Motivation (cont)
    Open Grid Services Infrastructure (OGSI)
    Global Grid Forum (GGF) standard
    Identified a number of common building blocks used in Grid protocols
    Inspecting state, creating and removing state, detecting changes in state, naming state
    Defined standard ways to do these things, based on Web services (defined a thing called a Grid Service)
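
    The building blocks named above (naming, creating, inspecting, watching, and removing state) can be pictured with a toy in-process service. This is only a conceptual sketch: the class and method names are hypothetical and are not the OGSI or WSRF interfaces, which expose these operations as Web service messages.

```python
import itertools

class StatefulService:
    """Toy service exposing create/inspect/subscribe/destroy on named state."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._state = {}      # name -> properties
        self._watchers = {}   # name -> callbacks

    def create(self, **properties):           # creating state
        name = f"resource-{next(self._ids)}"  # naming state
        self._state[name] = dict(properties)
        self._watchers[name] = []
        return name

    def inspect(self, name):                  # inspecting state
        return dict(self._state[name])

    def subscribe(self, name, callback):      # detecting changes in state
        self._watchers[name].append(callback)

    def update(self, name, **changes):
        self._state[name].update(changes)
        for callback in self._watchers[name]:
            callback(name, changes)

    def destroy(self, name):                  # removing state
        del self._state[name], self._watchers[name]

service = StatefulService()
events = []
job = service.create(status="pending")
service.subscribe(job, lambda name, changes: events.append(changes))
service.update(job, status="running")
snapshot = service.inspect(job)
service.destroy(job)
```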

  • History and Motivation (cont)
    But then ...
    Realized that this was useful for Web services in general, not just for the Grid
    Moved out of GGF, into OASIS
    Split the single OGSI specification into a number of other specifications, collectively called WSRF

  • Globus Toolkit
    Grid infrastructure software
    Four key protocols
    Security/Authentication (GSI)
    Resource Management/Scheduling (GRAM)
    Resource description (GRIS/GIIS)
    Data/File transfer (GASS, GridFTP)

  • Grid Security Infrastructure (GSI)

  • Security Terminology
    Authentication: establishing identity
    Authorization: establishing rights
    Message protection
    Message integrity
    Message confidentiality
    Non-repudiation
    Digital signature
    Accounting
    Certificate Authority (CA)

  • Why Grid Security Is Hard
    Resources being used may be valuable & the problems being solved sensitive
    Resources are often located in distinct administrative domains
    Each resource has its own policies & procedures
    Set of resources used by a single computation may be large, dynamic, and unpredictable
    Not just client/server; requires delegation
    It must be broadly available & applicable
    Standard, well-tested, well-understood protocols; integrated with wide variety of tools

  • GSI in Action
    "Create processes at A and B that communicate & access files at C"
    [Diagram: a User, a computer at Site A (Kerberos), a computer at Site B (Unix), and a storage system at Site C (Kerberos).]

  • Grid Security Requirements

  • Candidate Standards
    Kerberos 5
    Fails to meet requirements:
    Integration with various local security solutions
    User-based trust model
    Transport Layer Security (TLS/SSL)
    Fails to meet requirements:
    Single sign-on
    Delegation

  • Grid Security Infrastructure (GSI)
    Extensions to standard protocols & APIs
    Standards: SSL/TLS, X.509 & CA, GSS-API
    Extensions for single sign-on and delegation
    Globus Toolkit reference implementation of GSI
    SSLeay/OpenSSL + GSS-API + SSO/delegation
    Tools and services to interface to local security
    Simple ACLs; SSLK5/PKINIT for access to K5, AFS; ...
    Tools for credential management
    Login, logout, etc.
    Smartcards
    MyProxy: Web portal login and delegation
    K5cert: automatic X.509 certificate creation

  • Review of Public Key Cryptography
    Asymmetric keys
    A private key is used to encrypt data.
    A public key can decrypt data encrypted with the private key.
    An X.509 certificate includes
    Someone's subject name (user ID)
    Their public key
    A signature from a Certificate Authority (CA) that:
    Proves that the certificate came from the CA
    Vouches for the subject name
    Vouches for the binding of the public key to the subject

  • Public Key Based Authentication
    User sends certificate over the wire.
    Other end sends user a challenge string.
    User encodes the challenge string with the private key
    Possession of the private key means you can authenticate as the subject in the certificate
    Public key is used to decode the challenge.
    If you can decode it, you know the subject
    Treat your private key carefully!!
    Private key is stored only in well-guarded places, and only in encrypted form
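
    The challenge-response exchange above can be sketched with textbook RSA on deliberately tiny numbers. This is a toy for illustration only and is not secure: real keys are thousands of bits long and real systems use a vetted cryptographic library with proper padding. The key values and helper names are choices made for the example.

```python
import hashlib
import secrets

# Toy RSA key pair built from p=61, q=53 (illustration only, not secure).
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent

def digest(data: bytes) -> int:
    """Hash the data and reduce it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(challenge: bytes, private_exponent: int) -> int:
    """User side: encode the challenge with the private key."""
    return pow(digest(challenge), private_exponent, N)

def verify(challenge: bytes, signature: int, public_exponent: int) -> bool:
    """Other end: decode with the public key from the certificate and compare."""
    return pow(signature, public_exponent, N) == digest(challenge)

challenge = secrets.token_bytes(16)  # the other end sends a random challenge
signature = sign(challenge, D)       # the user answers using the private key
authenticated = verify(challenge, signature, E)
```

    The private exponent never crosses the wire; only the certificate, the challenge, and the signed answer do.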

  • X.509 Proxy Certificate
    Defines how a short-term, restricted credential can be created from a normal, long-term X.509 credential
    A proxy certificate is a special type of X.509 certificate that is signed by the normal end entity cert, or by another proxy
    Supports single sign-on & delegation through impersonation
    Currently an IETF draft

  • User Proxies
    Minimize exposure of the user's private key
    A temporary, X.509 proxy credential for use by our computations
    We call this a user proxy certificate
    Allows a process to act on behalf of the user
    User-signed user proxy cert stored in a local file
    Created via the grid-proxy-init command
    Proxy's private key is not encrypted
    Rely on file system security; the proxy certificate file must be readable only by the owner
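
    Because the proxy's private key is stored unencrypted, the file permissions carry the security. A tool could check them along the following lines; this is a sketch assuming a POSIX file system, and the temporary file merely stands in for a real proxy credential file.

```python
import os
import stat
import tempfile

def only_owner_can_access(path):
    """True if no group or other permission bits are set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

# Demonstrate on a throwaway file standing in for a proxy credential.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o600)  # owner read/write only
safe = only_owner_can_access(path)
os.chmod(path, 0o644)  # group and others can read
exposed = not only_owner_can_access(path)
os.remove(path)
```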

  • Delegation
    Remote creation of a user proxy
    Results in a new private key and X.509 proxy certificate, signed by the original key
    Allows a remote process to act on behalf of the user
    Avoids sending passwords or private keys across the network
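
    Delegation produces a chain: the user's long-term certificate signs a proxy, which can sign a further proxy at a remote site. The sketch below models only the bookkeeping of such a chain. It does not verify signatures, and its rules (each proxy names its signer as issuer, extends the signer's subject name, and does not outlive the signer) are simplifying assumptions for illustration; the names and times are made up.

```python
from dataclasses import dataclass

@dataclass
class Cert:
    subject: str
    issuer: str       # subject of the credential that signed this one
    not_after: float  # expiry time, seconds since the epoch

def valid_chain(chain, now):
    """chain[0] is the user's long-term cert; each later cert is a proxy."""
    for parent, proxy in zip(chain, chain[1:]):
        if proxy.issuer != parent.subject:
            return False  # not signed by the previous credential
        if not proxy.subject.startswith(parent.subject + "/"):
            return False  # proxy names extend the signer's name
        if proxy.not_after > parent.not_after:
            return False  # a proxy may not outlive its signer
    return all(cert.not_after > now for cert in chain)

user = Cert("/O=Grid/CN=Jane Doe", "/O=Grid/CN=Example CA", 2_000_000.0)
local = Cert(user.subject + "/CN=proxy", user.subject, 1_043_200.0)
remote = Cert(local.subject + "/CN=proxy", local.subject, 1_043_200.0)
ok = valid_chain([user, local, remote], now=1_000_000.0)
expired = valid_chain([user, local, remote], now=1_050_000.0)
```

    Only short-lived proxy keys exist at the remote site; the user's long-term private key never leaves the user's machine.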

  • Globus Security APIs
    Generic Security Service (GSS) API
    IETF standard
    Provides functions for authentication, delegation, message protection
    Decoupled from any particular communication method
    But GSS-API is somewhat complicated, so we also provide the easier-to-use globus_gss_assist API
    GSI-enabled SASL is also provided

  • Results
    GSI adopted by 100s of sites, 1000s of users
    Globus CA has issued >3000 certs (user & host), >1500 currently active; other CAs active
    Rollouts are currently underway all over:
    NSF Teragrid, NASA Information Power Grid, DOE Science Grid, European Data Grid, etc.
    Integrated in research & commercial apps
    GrADS testbed, Earth Systems Grid, European Data Grid, GriPhyN, NEESgrid, etc.
    Standardization begun in Global Grid Forum, IETF

  • GSI Applications
    Globus Toolkit uses GSI for authentication
    Many Grid tools, directly or indirectly, e.g.
    Condor-G, SRB, MPICH-G2, Cactus, GDMP, ...
    Commercial and open source tools, e.g.
    ssh, ftp, cvs, OpenLDAP, OpenAFS
    SecureCRT (Win32 ssh client)
    And since we use standard X.509 certificates, they can also be used for
    Web access, LDAP server access, etc.

  • Ongoing and Future GSI Work
    Protection against compromised resources
    Restricted delegation, smartcards
    Standardization
    Scalability in numbers of users & resources
    Credential management
    Online credential repositories (MyProxy)
    Account management
    Authorization
    Policy languages
    Community authorization

  • Proxy Certificate Standards Work
    Internet Public Key Infrastructure X.509 Proxy Certificate Profile
    draft-ietf-pkix-proxy-01.txt
    Draft being considered by IETF PKIX working group, and by GGF GSI working group
    Defines proxy certificate format, including restricted rights and delegation tracing
    Demonstrated a prototype of restricted proxies at HPDC (August 2001) as part of CAS demo

  • GSS-API Extensions Work
    4 years of GSS-API experience, while on the whole quite positive, has shed light on various deficiencies of GSS-API
    GSS-API Extensions
    draft-ggf-gss-extensions-04.txt
    Draft being considered by GGF GSI working group. Not yet submitted to IETF.
    Defines extensions to the GSS-API to better support Grid security

  • GSS-API Extensions
    Credential export/import
    Allows delegated credentials to be externalized
    Used for checkpointing a service
    Delegation at any time, in either direction
    Richer options on use of delegation
    Restricted delegation handling
    Add proxy restrictions to delegated cred
    Inspect auth cert for restrictions
    Allow better mapping of GSS to TLS
    Support TLS framing of messages

  • Community Authorization Service
    Question: How does a large community grant its users access to a large set of resources?
    Should minimize burden on both the users and resource providers
    Community Authorization Service (CAS)
    Community negotiates access to resources
    Resource outsources fine-grain authorization to CAS
    Resource only knows about the CAS user credential
    CAS handles user registration, group membership, ...
    User who wants access to resource asks CAS for a capability credential
    Restricted proxy of the CAS user cred., checked by resource

  • Community Authorization (Prototype shown August 2001)
    [Diagram not recoverable; only the label "User" survives.]

  • Community Authorization Service
    CAS provides user community with information needed to authenticate resources
    Sent with capability credential, used on connection with resource
    Resource identity (DN), CA
    This allows new resources/users (and their CAs) to be made available to a community through the CAS without action on the part of the other users/resources

  • Authorization API
    Service providers need to perform authorization policy evaluation on:
    Local policies
    Policies contained in restricted proxies
    We are working on 2 API layers:
    Low-level GAA-API implementation for evaluation of policies
    High-level, very simple authorization API that can easily be embedded into services
    Still in early prototyping stage
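
    The two policy sources above combine restrictively: a request must be allowed by the resource's local policy and by any restrictions carried in the proxy. The sketch below shows that evaluation; the function, the policy representation, and the cas-community identity are hypothetical and do not reflect GAA-API or any Globus interface.

```python
def authorize(user, action, resource, local_policy, proxy_policy=None):
    """Allow only if the local policy and any proxy restrictions both allow."""
    request = (action, resource)
    if request not in local_policy.get(user, set()):
        return False  # the resource's own policy denies it
    if proxy_policy is not None and request not in proxy_policy:
        return False  # the restricted proxy does not carry this right
    return True

# Local policy: what the resource grants to the community identity.
local_policy = {"cas-community": {("read", "/data/run1"), ("write", "/data/run1")}}
# Rights embedded in the restricted proxy presented by one user.
restricted_proxy = {("read", "/data/run1")}

can_read = authorize("cas-community", "read", "/data/run1", local_policy, restricted_proxy)
can_write = authorize("cas-community", "write", "/data/run1", local_policy, restricted_proxy)
```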


  • Passport Online CA & MyProxy

    Requiring users to manage their own certs and keys is annoying and error-prone
    A solution: Leverage Passport global authentication to obtain a proxy credential
    Passport provides
      Globally unique user name (email address)
      Method of verifying ownership of the name (authentication)
      Re-issuance (e.g. forgotten password)
    Passport credentials can be presented to an online CA or credential repository
      Creates and issues new (restricted) proxy certificate to the user on demand


  • Other Future Security Work

    Ease-of-use
      Improved error messages, online CA, etc.
    Improved online credential repositories
      See MyProxy paper at HPDC
    Support for multiple user credentials
    Multi-factor authentication
    Subordinate certificate authorities for domains
      Ease issuance of host certs for domains
    Independent Data Unit Support


  • Security Summary

    GSI successfully addresses wide variety of Grid security issues
    Broad acceptance, deployment, integration with tools
    Standardization on-going in IETF & GGF
    Ongoing R&D to address next set of issues
    For more information:
      www.globus.org/research/papers.html
        A Security Architecture for Computational Grids
        Design and Deployment of a National-Scale Authentication Infrastructure
      www.gridforum.org/security


  • Grid Resource Allocation Management (GRAM)


  • The Challenge

    Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
      Authentication and authorization
      Resource discovery & characterization
      Reservation and allocation
      Computation monitoring and control
    Addressed by new protocols & services
      GRAM protocol as a basic building block
      Resource brokering & co-allocation services
      GSI for security, MDS for discovery


  • Resource Management

    The Grid Resource Allocation Management (GRAM) protocol and client API allow programs to be started on remote resources, despite local heterogeneity
    Resource Specification Language (RSL) is used to communicate requirements
    A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services
      Integrated with Condor, PBS, MPICH-G2, ...


  • Resource Management Architecture

    [Figure labels: Application; RSL; RSL specialization; Ground RSL; Simple ground RSL; Information Service (Queries & Info); GRAM (x3); Local resource managers: LSF, Condor, NQE]


  • Resource Specification Language

    Common notation for exchange of information between components
      Syntax similar to MDS/LDAP filters
    RSL provides two types of information:
      Resource requirements: Machine type, number of nodes, memory, etc.
      Job configuration: Directory, executable, args, environment
    Globus Toolkit provides an API/SDK for manipulating RSL


  • RSL Syntax

    Elementary form: parenthesis clauses
      (attribute op value [ value ] )
    Operators supported:
      <, <=, =, >=, >, !=
    Some supported attributes:
      executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
    Unknown attributes are passed through
      May be handled by subsequent tools


  • Constraints: &

    For example:
      & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)
    Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours


  • Disjunction: |

    For example:
      & (executable=myprog) ( | (&(count=5)(memory>=64)) (&(count=10)(memory>=32)))
    Create 5 instances of myprog on a machine that has at least 64MB of memory, or 10 instances on a machine with at least 32MB of memory
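
The conjunction and disjunction forms above can be assembled programmatically. The following is a minimal Python sketch that only builds RSL text; the helper names (`relation`, `all_of`, `any_of`, `group`) are hypothetical and are not part of the Globus RSL API, and no validation of attribute names is attempted.

```python
def relation(attribute, op, value):
    """Render one parenthesised RSL relation, e.g. (count=5)."""
    return f"({attribute}{op}{value})"


def all_of(*clauses):
    """Conjunction: every clause must hold (RSL '&')."""
    return "&" + "".join(clauses)


def any_of(*clauses):
    """Disjunction: at least one clause must hold (RSL '|')."""
    return "|" + "".join(clauses)


def group(expression):
    """Wrap a compound expression in parentheses so it can be nested."""
    return f"({expression})"


# Rebuild the disjunction example from the slide (whitespace omitted).
request = all_of(
    relation("executable", "=", "myprog"),
    group(any_of(
        group(all_of(relation("count", "=", 5), relation("memory", ">=", 64))),
        group(all_of(relation("count", "=", 10), relation("memory", ">=", 32))),
    )),
)
print(request)
```

Building the string from small functions keeps the parentheses balanced, which is the usual source of mistakes when RSL is written by hand.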


  • GRAM Protocol Evolution

    GRAM-1: Simple HTTP-based RPC
      Job request
        Returns a job contact: Opaque string that can be passed between clients, for access to job
      Job cancel, status, signal
      Event notification (callbacks) for state changes
        Pending, active, done, failed, suspended
    GRAM-1.5 (U Wisconsin contribution)
      Add reliability improvements
        Once-and-only-once submission
        Recoverable job manager service
        Reliable termination detection
    GRAM-2: Moving to Web Services (SOAP)
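
The state-change callbacks listed above can be pictured with a small Python sketch. The five state names come from the slide; the allowed transitions, the `Job` class, and the callback signature are assumptions made for illustration and do not reproduce the GRAM client API.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table (not taken from the GRAM specification).
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.SUSPENDED, JobState.DONE, JobState.FAILED},
    JobState.SUSPENDED: {JobState.ACTIVE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """A job identified by an opaque contact string, with change callbacks."""

    def __init__(self, contact):
        self.contact = contact
        self.state = JobState.PENDING
        self._callbacks = []

    def on_state_change(self, callback):
        """Register callback(contact, old_state, new_state)."""
        self._callbacks.append(callback)

    def transition(self, new_state):
        """Move to new_state and notify every registered callback."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        old_state, self.state = self.state, new_state
        for callback in self._callbacks:
            callback(self.contact, old_state, new_state)
```

A client would register a callback once after submission and then react to notifications instead of polling for status.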


  • Globus Toolkit Implementation

    Gatekeeper
      Single point of entry
      Authenticates user, maps to local security environment, runs service
      In essence, a secure inetd
    Job manager
      A gatekeeper service
      Layers on top of local resource management system (e.g., PBS, LSF, etc.)
      Handles remote interaction with the job


  • GRAM Components

    [Figure labels: Client; Grid Security Infrastructure; MDS: Grid Index Info Server; MDS: Grid Resource Info Server; Gatekeeper; Job Manager; RSL Library; Local Resource Manager; Process (x3); Site boundary]
    [Figure arrows: MDS client API calls to locate resources; MDS client API calls to get resource info; GRAM client API calls to request resource allocation and process creation; GRAM client API state change callbacks; Query current status of resource; Create; Parse; Request; Allocate & create processes; Monitor & control]


  • Co-allocation

    Simultaneous allocation of a resource set
      Handled via optimistic co-allocation based on free nodes or queue prediction
      In the future, advance reservations will also be supported (already in prototype)
    Globus APIs/SDKs support the co-allocation of specific multi-requests
      Uses a Globus component called the Dynamically Updated Request Online Co-allocator (DUROC)


  • Multirequest: +

    A multirequest allows us to specify multiple resource needs, for example
      + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))
      Execute 5 instances of p1 on a machine with at least 64M of memory
      Execute p2 on a machine with an ATM connection
    Multirequests are central to co-allocation
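
To see how a multirequest decomposes into the subrequests that a co-allocator would hand to individual resource managers, here is a minimal Python sketch. The function `split_multirequest` is hypothetical (it is not a globus_rsl call), and it assumes that parentheses never appear inside quoted values.

```python
def split_multirequest(rsl):
    """Return the top-level subrequests of a '+' multirequest as strings."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("a multirequest must start with '+'")
    subrequests = []
    depth = 0
    start = None
    for index, char in enumerate(text):
        if char == "(":
            if depth == 0:
                start = index  # opening parenthesis of a new subrequest
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                # Drop the outer parentheses of the subrequest.
                subrequests.append(text[start + 1:index].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return subrequests


parts = split_multirequest(
    "+ (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))"
)
for part in parts:
    print(part)
```

Each returned string is an ordinary conjunction that could be submitted to one resource on its own.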


  • A Co-allocation Multirequest

    +( & (resourceManagerContact="flash.isi.edu:754:/C=US//CN=flash.isi.edu-fork")
         (count=1)
         (label="subjob A")
         (executable=my_app1)
     )
     ( & (resourceManagerContact="sp139.sdsc.edu:8711:/C=US//CN=sp097.sdsc.edu-lsf")
         (count=2)
         (label="subjob B")
         (executable=my_app2)
     )


  • Job Submission Interfaces

    Globus Toolkit includes several command line programs for job submission
      globus-job-run: Interactive jobs
      globus-job-submit: Batch/offline jobs
      globusrun: Flexible scripting infrastructure
    Others are building better interfaces
      General purpose
        Condor-G, PBS, GRD, Hotpage, etc.
      Application specific
        ECCE, Cactus, Web portals


  • globus-job-run

    For running of interactive jobs
    Additional functionality beyond rsh
      Ex: Run 2 process job w/ executable staging
        globus-job-run -: host -np 2 -s myprog arg1 arg2
      Ex: Run 5 processes across 2 hosts
        globus-job-run \
          -: host1 -np 2 -s myprog.linux arg1 \
          -: host2 -np 3 -s myprog.aix arg2
    For list of arguments run:
      globus-job-run -help


  • globus-job-submit

    For running of batch/offline jobs
      globus-job-submit: Submit job
        Same interface as globus-job-run
        Returns immediately
      globus-job-status: Check job status
      globus-job-cancel: Cancel job
      globus-job-get-output: Get job stdout/err
      globus-job-clean: Cleanup after job


  • globusrun

    Flexible job submission for scripting
      Uses an RSL string to specify job request
      Contains an embedded globus-gass-server
        Defines GASS URL prefix in RSL substitution variable:
          (stdout=$(GLOBUSRUN_GASS_URL)/stdout)
      Supports both interactive and offline jobs
    Complex to use
      Must write RSL by hand
      Must understand its esoteric features
      Generally you should use globus-job-* commands instead


  • Resource Management APIs

    The globus_gram_client API provides access to all of the core job submission and management capabilities, including callback capabilities for monitoring job status.
    The globus_rsl API provides convenience functions for manipulating and constructing RSL strings.
    The globus_gram_myjob API allows multi-process jobs to self-organize and to communicate with each other.
    The globus_duroc_control and globus_duroc_runtime APIs provide access to multirequest (co-allocation) capabilities.


  • Advance Reservation and Other Generalizations

    General-purpose Architecture for Reservation and Allocation (GARA)
      2nd generation resource management services
    Broadens GRAM on two axes
      Generalize to support various resource types
        CPU, storage, network, devices, etc.
      Advance reservation of resources, in addition to allocation
    Currently a research prototype


  • GARA: The Big Picture


  • Grid Information Services

    System information is critical to operation of the grid and construction of applications
      What resources are available?
        Resource discovery
      What is the state of the grid?
        Resource selection
      How to optimize resource use?
        Application configuration and adaptation
    We need a general information infrastructure to answer these questions


  • Examples of Useful Information

    Characteristics of a compute resource
      IP address, software available, system administrator, networks connected to, OS version, load
    Characteristics of a network
      Bandwidth and latency, protocols, logical topology
    Characteristics of the Globus infrastructure
      Hosts, resource managers


  • Grid Information: Facts of Life

    Information is always old
      Time of flight, changing system state
      Need to provide quality metrics
    Distributed state hard to obtain
      Complexity of global snapshot
    Components will fail
    Scalability and overhead
    Many different usage scenarios
      Heterogeneous policy, different information organizations, etc.


  • Grid Information Service

    Provide access to static and dynamic information regarding system components
    A basis for configuration and adaptation in heterogeneous, dynamic environments
    Requirements and characteristics
      Uniform, flexible access to information
      Scalable, efficient access to dynamic data
      Access to multiple information sources
      Decentralized maintenance


  • Two Classes Of Information Servers

    Resource Description Services
      Supplies information about a specific resource (e.g. Globus 1.1.3 GRIS).
    Aggregate Directory Services
      Supplies collection of information which was gathered from multiple GRIS servers (e.g. Globus 1.1.3 GIIS).
      Customized naming and indexing


  • Information Protocols

    Grid Resource Registration Protocol
      Support information/resource discovery
      Designed to support machine/network failure
    Grid Resource Inquiry Protocol
      Query resource description server for information
      Query aggregate server for information
      LDAP V3.0 in Globus 1.1.3


  • GIS Architecture

    [Figure labels: Users; Enquiry Protocol; Customized Aggregate Directories (A, A); Registration Protocol; Standard Resource Description Services (R, R, R, R)]


  • Metacomputing Directory Service

    Use LDAP as inquiry protocol
    Access information in a distributed directory
      Directory represented by collection of LDAP servers
      Each server optimized for particular function
    Directory can be updated by:
      Information providers and tools
      Applications (i.e., users)
      Backend tools which generate info on demand
    Information dynamically available to tools and applications


  • Two Classes Of MDS Servers

    Grid Resource Information Service (GRIS)
      Supplies information about a specific resource
      Configurable to support multiple information providers
      LDAP as inquiry protocol
    Grid Index Information Service (GIIS)
      Supplies collection of information which was gathered from multiple GRIS servers
      Supports efficient queries against information which is spread across multiple GRIS servers
      LDAP as inquiry protocol


  • LDAP Details

    Lightweight Directory Access Protocol
      IETF Standard
      Stripped down version of X.500 DAP protocol
      Supports distributed storage/access (referrals)
      Supports authentication and access control
    Defines:
      Network protocol for accessing directory contents
      Information model defining form of information
      Namespace defining how information is referenced and organized


  • MDS Components

    LDAP 3.0 Protocol Engine
      Based on OpenLDAP with custom backend
      Integrated caching
    Information providers
      Deliver resource information to backend
    APIs for accessing & updating MDS contents
      C, Java, PERL (LDAP API, JNDI)
    Various tools for manipulating MDS contents
      Command line tools, Shell scripts & GUIs


  • GRIS/GIIS


  • Grid Resource Information Service

    Server which runs on each resource
      Given the resource DNS name, you can find the GRIS server (well known port = 2135)
    Provides resource specific information
      Much of this information may be dynamic
        Load, process information, storage information, etc.
        GRIS gathers this information on demand
    White pages lookup of resource information
      Ex: How much memory does machine have?
    Yellow pages lookup of resource options
      Ex: Which queues on machine allow large jobs?


  • Grid Index Information Service

    GIIS describes a class of servers
      Gathers information from multiple GRIS servers
      Each GIIS is optimized for particular queries
        Ex1: Which Alliance machines are >16 process SGIs?
        Ex2: Which Alliance storage servers have >100Mbps bandwidth to host X?
      Akin to web search engines
    Organization GIIS
      The Globus Toolkit ships with one GIIS
      Caches GRIS info with long update frequency
        Useful for queries across an organization that rely on relatively static information (Ex1 above)
      Can be merged into GRIS
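
The caching behaviour described above (an index that gathers per-resource information and refreshes it only occasionally) can be sketched in Python as follows. This is an illustration under stated assumptions, not the MDS implementation: the `IndexCache` class, its `fetch` callable standing in for a GRIS query, and the `ttl_seconds` parameter are all hypothetical.

```python
import time


class IndexCache:
    """Aggregate per-host information and refresh entries only after a TTL."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # hypothetical stand-in for a GRIS query
        self._ttl = ttl_seconds      # long TTL suits relatively static data
        self._clock = clock
        self._entries = {}           # host -> (time fetched, info dict)

    def lookup(self, host):
        """Return cached info for host, fetching it again if it is stale."""
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        info = self._fetch(host)
        self._entries[host] = (now, info)
        return info

    def query(self, hosts, predicate):
        """Return the hosts whose (possibly cached) info satisfies predicate."""
        return [host for host in hosts if predicate(self.lookup(host))]
```

A long TTL trades freshness for fewer queries to the underlying resources, which matches the slide's point that such an index suits questions about relatively static attributes.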


  • Information Services API

    RFC 1823 defines an IETF draft standard client API for accessing LDAP databases
      Connect to server
      Pose query which returns data structures containing sets of object classes and attributes
      Functions to walk these data structures
    Globus does not provide an LDAP API. We recommend the use of OpenLDAP, an open source implementation of RFC 1823.


  • Searching an LDAP Directory

    grid-info-search [options] filter [attributes]

    Default grid-info-search options
      -h mds.globus.org   MDS server
      -p 389              MDS port
      -b o=Grid           search start point
      -T 30               LDAP query timeout
      -s sub              scope = subtree
                          alternatives:
                            base: lookup this entry
                            one: lookup immediate children
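
The three scope values of the `-s` option can be illustrated with a short Python sketch that decides whether an entry falls inside a search. It is a simplification under stated assumptions: `in_scope` is a hypothetical helper, distinguished names are split naively on commas (escaped commas are not handled), and the entry names below `o=Grid` are invented for the example.

```python
def in_scope(entry_dn, base_dn, scope):
    """Return True if entry_dn is covered by a search from base_dn."""
    entry = [part.strip().lower() for part in entry_dn.split(",")]
    base = [part.strip().lower() for part in base_dn.split(",")]
    # The entry must sit at or below the search start point.
    if len(entry) < len(base) or entry[len(entry) - len(base):] != base:
        return False
    levels_below = len(entry) - len(base)
    if scope == "base":   # only the start point itself
        return levels_below == 0
    if scope == "one":    # immediate children of the start point
        return levels_below == 1
    if scope == "sub":    # the start point and everything beneath it
        return True
    raise ValueError(f"unknown scope: {scope}")


# Hypothetical directory entries under the default start point o=Grid.
entries = ["o=Grid", "ou=siteA, o=Grid", "hn=node1, ou=siteA, o=Grid"]
for scope in ("base", "one", "sub"):
    print(scope, [dn for dn in entries if in_scope(dn, "o=Grid", scope)])
```

The default `sub` scope returns the most entries; narrowing to `base` or `one` is useful when the directory is large.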


  • Searching a GRIS Server

    grid-info-host-search [options] filter [attributes]

    Exactly like grid-info-search, except defaults:
      -h localhost   GRIS server
      -p 2135        GRIS port

    Example:
      grid-info-host-search -h pitcairn dn=* dn


  • Filtering

    Filters allow selection of object based on relational operators (=, ~=, <=, >=)
      grid-info-search cputype=*
    Compound filters can be constructed with Boolean operations: (&, |, !)
      grid-info-search (&(cputype=*)(cpuload1
  • Data Grid Problem

    Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data
    Note that this problem:
      Is common to many areas of science
      Overlaps strongly with other Grid problems


  • Data Intensive Issues Include

    Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains
    Respect local and global policies governing what can be used for what
    Schedule resources efficiently, again subject to local and global constraints
    Achieve high performance, with respect to both speed and reliability
    Catalog software and virtual data


  • Data Intensive Computing and Grids

    The term Data Grid is often used
      Unfortunate as it implies a distinct infrastructure, which it isn't; but easy to say
    Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, ...
      Security, resource mgt, info services, etc.
    Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained
    Fortunately this seems easy to do!


  • Examples of Desired Data Grid Functionality

    High-speed, reliable access to remote data
    Automated discovery of best copy of data
    Manage replication to improve performance
    Co-schedule compute, storage, network
    Transparency wrt delivered performance
    Enforce access control on data
    Allow representation of global resource allocation policies


  • A Model Architecture for Data Grids

    [Figure labels: Application; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Selected Replica; MDS; NWS; Performance Information & Predictions; GridFTP Control Channel; GridFTP Data Channel; Replica Location 1; Replica Location 2; Replica Location 3; Tape Library; Disk Cache (x2); Disk Array]
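
The replica selection step named in the architecture can be sketched in Python as a lookup followed by a choice based on predicted performance. Everything here is an assumption for illustration: the `select_replica` function, the dictionary standing in for a replica catalog, the host names, and the per-host bandwidth figures standing in for performance predictions.

```python
from urllib.parse import urlparse


def select_replica(logical_name, replica_catalog, predicted_mbps):
    """Pick the physical copy whose host has the best predicted bandwidth."""
    locations = replica_catalog.get(logical_name)
    if not locations:
        raise LookupError(f"no replicas registered for {logical_name!r}")

    def score(url):
        # Hosts without a prediction are treated as the slowest option.
        return predicted_mbps.get(urlparse(url).hostname, 0.0)

    return max(locations, key=score)


# Hypothetical catalog: one logical file name mapped to three physical copies.
catalog = {
    "run42.dat": [
        "gsiftp://storage1.example.org/data/run42.dat",
        "gsiftp://storage2.example.org/data/run42.dat",
        "gsiftp://storage3.example.org/data/run42.dat",
    ]
}
# Hypothetical bandwidth predictions in Mbps, keyed by host name.
predictions = {"storage1.example.org": 40.0, "storage2.example.org": 95.0}

print(select_replica("run42.dat", catalog, predictions))
```

Keeping the catalog lookup separate from the scoring function lets the selection policy change (for example to account for load or cost) without touching how replicas are registered.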


  • Globus Toolkit Components

    Two major Data Grid components:

    1. Data Transport and Access
      Common protocol
      Secure, efficient, flexible, extensible data movement
      Family of tools supporting this protocol

    2. Replica Management Architecture
      Simple scheme for managing:
        multiple copies of files
        collections of files


  • Motivation for a Common Data Access Protocol

    Existing distributed data storage systems
      DPSS, HPSS: focus on high-performance access, utilize parallel data transfer, striping
      DFS: focus on high-volume usage, dataset replication, local caching
      SRB: connects heterogeneous data collections, uniform client interface, metadata queries
    Problems
      Incompatible (and proprietary) protocols
        Each requires custom client
        Partitions available data sets and storage devices
      Each protocol has subset of desired functionality


  • A Common, Secure, Efficient Data Access Protocol

    Common, extensible transfer protocol
      Common protocol means all can interoperate
    Decouple low-level data transfer mechanisms from the storage service
    Advantages:
      New, specialized storage systems are automatically compatible with existing systems
      Existing systems have richer data transfer functionality
    Interface to many storage systems
      HPSS, DPSS, file systems
      Plan for SRB integration


  • Access/Transport Protocol Requirements

    Suite of communication libraries and related tools that support
      GSI, Kerberos security
      Third-party transfers
      Parameter set/negotiate
      Partial file access
      Reliability/restart
      Large file support
      Data channel reuse
      Integrated instrumentation
      Logging/audit trail
      Parallel transfers
      Striping (cf DPSS)
      Policy-based access control
      Server-side computation
      Proxies (firewall, load bal)
    All based on a standard, widely deployed protocol


  • GridFTP and Grid Access to Secondary Storage (GASS)


  • GridFTP

    Why FTP?
      Ubiquity enables interoperation with many commodity tools
      Already supports many desired features, easily extended to support others
      Well understood and supported
    We use the term GridFTP to refer to
      Transfer protocol which meets requirements
      Family of tools which implement the protocol
    Note GridFTP > FTP
    Note that despite name, GridFTP is not restricted to file transfer!


  • GridFTP: Basic Approach

    FTP protocol is defined by several IETF RFCs
    Start with most commonly used subset
      Standard FTP: get/put etc., 3rd-party transfer
    Implement standard but often unused features
      GSS binding, extended directory listing, simple restart
    Extend in various ways, while preserving interoperability with existing servers
      Striped/parallel data channels, partial file, automatic & manual TCP buffer setting, progress monitoring, extended restart
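
Parallel data channels and partial file access both rest on dividing a file into byte ranges. The following Python sketch shows one way such a plan could be computed; `split_ranges` is a hypothetical helper for illustration and says nothing about how GridFTP itself schedules blocks across channels.

```python
def split_ranges(total_bytes, streams):
    """Divide a file into contiguous (offset, length) ranges, one per stream."""
    if total_bytes < 0 or streams < 1:
        raise ValueError("total_bytes must be >= 0 and streams >= 1")
    base, extra = divmod(total_bytes, streams)
    ranges = []
    offset = 0
    for index in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if index < extra else 0)
        if length == 0:
            continue  # more streams than bytes: leave the rest idle
        ranges.append((offset, length))
        offset += length
    return ranges


# Plan a 10-byte transfer over 3 parallel streams.
print(split_ranges(10, 3))
```

Because each range records its own offset, a range that fails can be retried on its own, which is the same bookkeeping a restart after interruption would need.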


  • GridFTP Protocol Specifications

    Existing standards
      RFC 959: File Transfer Protocol
      RFC 2228: FTP Security Extensions
      RFC 2389: Feature Negotiation for the File Transfer Protocol
      Draft: FTP Extensions
    New drafts
      GridFTP: Protocol Extensions to FTP for the Grid
        Grid Forum Data Working Group


  • GridFTP vs. WebDAV

    WebDAV extends HTTP for remote data access
      Combines control and data over single channel
    FTP splits control and data
      Supports multiple, user selectable data channel protocols
    Advantage to split channels
      Third party transfers handled cleanly
      Can (cleanly) define new data channel protocols
        E.g. parallel/striped transfer, automatic TCP buffer/window negotiation, non-TCP based protocols, etc.
      Amenable to high-performance proxies
        E.g. for firewalls, load balancing, etc.


  • The GridFTP Family of Tools

    Patches to existing FTP code
      GSI-enabled versions of existing FTP client and server, for high-quality production code
    Custom-developed libraries
      Implement full GridFTP protocol, targeting custom use, high-performance
    Custom-developed tools
      Servers and clients with specialized functionality and performance


  • A Word on GASS

    The Globus Toolkit provides services for file and executable staging and I/O redirection that work well with GRAM. This is known as Globus Access to Secondary Storage (GASS).
    GASS uses GSI-enabled HTTP as the protocol for data transfer, and a caching algorithm for copying data when necessary.
    The globus_gass, globus_gass_transfer, and globus_gass_cache APIs provide programmer access to these capabilities, which are already integrated with the GR