Remote Data Access Working Group

Remote Data Access Working Group Introductory Session

Transcript of Remote Data Access Working Group

Page 1: Remote Data Access Working Group

Remote Data Access Working Group

Introductory Session

Page 2: Remote Data Access Working Group

Grid Forum: Remote Data Access Working Group http://www.gridforum.org

Remote Data Access Working Group, Grid Forum 5

Reagan Moore

Summary of Working Group Activities

Challenges:

Rapid evolution of grid environments

Pressure of application implementation

Interactions with Grid Forum working groups

Page 3: Remote Data Access Working Group

Organization

Name: Remote Data Access Working Group
Chairs: Reagan Moore, Ann Chervenak, John Karpovich
Document Editor: Eric Stephan
Charter: Interoperability between remote data access systems
Short-term goals:
  Review “Summary of Data Grids”
  Define framework for common functionality across data grids

Page 4: Remote Data Access Working Group

Working Group Liaisons for Requirements Lists

Accounting: Ed Hanna
Grid Performance: Brian Tierney
Information: Ann Chervenak
Program Models: Tracey Smith / Craig Lee
Scheduling: Judy Beiriger
Security: (Open position)
User Services: Judith Utley
(Applications and Tools): Ron Oldfield

Page 5: Remote Data Access Working Group

Semantics for Data Access

File based access
  User owns the files
  Globus, Nile, IBP

Object based access
  Object is member of a class
  Legion, CORBA, “Objectivity”

Collection based access
  Collection owns files
  Storage Resource Broker, Digital libraries

Page 6: Remote Data Access Working Group

Remote Data Access Architecture Convergence

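[Diagram: two data access stacks shown side by side. Labels recovered from the figure:]

Globus: Application, FTP Client, Replica Catalog, FTP Daemon, Storage System

SDSC Storage Resource Broker: Application, SRB Client, Metadata Catalog, SRB Server, Storage System

A further Metadata Catalog box and two “?” marks also appear in the figure.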

Page 7: Remote Data Access Working Group

Evolution of Data Management

A grid supports
  Data management
    Access to distributed storage systems

Users also require
  Information management
    Tagged attributes of the stored data sets
  Knowledge management
    Relationships between the concepts described by the data set attributes

Page 8: Remote Data Access Working Group

Architecture

[Diagram: layered data access architecture. Labels recovered from the figure:]

Boxes: Application, Data Model Management, Remote Procedure Execution, Data Handling Systems, Storage Resources, Storage System Description, Information Discovery, Dynamic Info Discovery

Storage resources named: DPSS, HPSS, ADSM, DMF, Unitree, NASstore, DFS, DB2, Oracle, Illustra, Sybase, O2, ObjectStore, Objectivity

API that provides “glue” to underlying storage, QoS, etc. [GASS, IBP, SRB]

API that provides “glue” to underlying data handling systems (security, scheduling, QoS, access protocol, data format/model, adaptivity, info discovery, location control): Condor, GASS, NILE, [SRB], I-2 caching

Dynamic Info Discovery: GloPerf, Netlogger, NWS (which perf. monitor, what QoS, location, what access control, replication)

Other systems named: Armada, D’Agents, FEL, ADR, GRAM, SRB

Other annotations: + authentication, + authorization; (e.g., filtering)

Page 9: Remote Data Access Working Group

Information Based Grid

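[Diagram: layered view of an information based grid. Labels recovered from the figure:]

Column headings: Tagging of data, Management, Access Services

Information: Attributes, Semantics; Information Repository; Attribute-based Query; Feature-based Query

Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); (Data Handling System - SRB / FTP / HTTP)

Vertical interface labels: MCAT/HDF, Grids, XML DTD, SDLIP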

Page 10: Remote Data Access Working Group

Knowledge Based Grid

[Diagram: layered view of a knowledge based grid. Labels recovered from the figure:]

Column headings: Tagging of data, Management, Access Services

Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse; (Topic Maps / Buckets / Model-based Access)

Information: Attributes, Semantics; Information Repository; Attribute-based Query; Feature-based Query

Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); (Data Handling System - SRB / FTP / HTTP)

Vertical interface labels: MCAT/HDF, Grids, XML DTD, SDLIP, XTM DTD, Rules - KQL

Page 11: Remote Data Access Working Group

Emerging Applications

Virtual Data Products
  NSF GriPhyN ITR project
  Dynamically create product by application of analysis procedures

Information Repositories
  Protein Data Bank
  Support application of structural comparison algorithms

Collections
  National Virtual Observatory
  Federate sky surveys

Page 12: Remote Data Access Working Group

Current Papers

Remote Data Access Architectures
  Presented at GF4

Summary/survey of existing data grids
  Presented at GF4

Data Transport Protocol
  GridFTP presentation at GF5

Page 13: Remote Data Access Working Group

Grid Forum 5 Sessions

Monday 11:00 - XML Tutorial
  Information tagging
  Relationship tagging

Monday 4:30 - GF/eGRID survey
  Working group session to identify requirements

Tuesday 3:00 - GridFTP specification
  Working group session on data transport protocol

Page 14: Remote Data Access Working Group

DATA Working Groups

GF/eGRID discussion

GridFTP discussion

Page 15: Remote Data Access Working Group

Architecture Working Group

Page 16: Remote Data Access Working Group

Grid Forum Architecture Working Group

Discussion of need for:
  Network services perspective for designing protocols and APIs for Grid Forum services
  Distributed operating system perspective for designing an architecture (naming, binding, persistence, process management, storage)

Led by Charlie Catlett

Page 17: Remote Data Access Working Group

Grid Forum Interactions

Levels          Grid Forum Support
Application     Application APIs
User            Management, resource registration
Collection      Persistence, information repositories
Resource        Abstraction of resources. Standards for information exchange
Connectivity    Transport, security
Fabric          Resource interfaces

Page 18: Remote Data Access Working Group

Grid Forum Interactions

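[Table: one block per level; each line gives a working group column and its entry.]

Level: APIs
  Data: File, object, collection access
  Accounting: Accounting interface
  Scheduling: Scheduler interface
  Performance: Monitor API
  Information Services: Information discovery API
  Grid Computing Environment: Workbench Portal

Level: Management
  Data: Replica catalog
  Accounting: User registration
  Scheduling: Distributed Scheduler Manager
  Performance: Performance aggregation server
  Information Services: Resource addition service
  Grid Computing Environment: Process interaction management

Level: Persistence
  Data: Metadata catalog
  Accounting: Grid usage repository
  Scheduling: Advanced registration management
  Performance: Performance repository
  Information Services: Grid resource naming repository
  Grid Computing Environment: Portal state information

Level: Resource Abstraction standards
  Data: GridFTP, ODBC, SRB
  Accounting: Audit Information Exchange
  Scheduling: Policy description exchange
  Performance: Performance information exchange
  Information Services: Resource capability info. exch.
  Grid Computing Environment: Standard Run environment interface

Level: Transport, Security
  Data: GSS, PKI, TCP/IP
  Accounting: GSS, PKI, TCP/IP, Net usage
  Scheduling: GSS, PKI, TCP/IP
  Performance: GSS, PKI, TCP/IP, Net perf.
  Information Services: GSS, PKI, TCP/IP, SDLIP
  Grid Computing Environment: GSS, PKI, TCP/IP

Level: Resource Interfaces
  Data: Storage interfaces
  Accounting: Usage tracking interface
  Scheduling: Local sched. int., policy int.
  Performance: Monitor data Producer
  Information Services: Information repository interface
  Grid Computing Environment: Local run environment interface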

Page 19: Remote Data Access Working Group

GF/eGRID Discussion Group

Page 20: Remote Data Access Working Group

GF/eGRID Discussion
Led by Reagan Moore

What access protocols are of interest?

What latency hiding mechanisms are of interest?
  Data streaming
  Caching
  Replicas
  Containers for aggregation
  Remote proxies for bundling I/O commands

Page 21: Remote Data Access Working Group

GF/eGRID Discussion

What are data management requirements?
  Data collections
  Information catalogs
  Knowledge repositories

What is the granularity of the data management systems?
  Collection size
  Object size
  Data set access size

Page 22: Remote Data Access Working Group

GF/eGRID Discussion

What is the time granularity?
  Compute time: (Number of operations) / (Execution rate)
  Transfer time: (Number of bytes) / (Transmission bandwidth)

How many operations are done per byte accessed, Ops-per-Byte?

For your resources, is
  Ops-per-Byte ~ Execution rate / Bandwidth
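
The balance condition on this slide can be checked numerically. The sketch below reads the two quantities as a compute time (operations divided by execution rate) and a transfer time (bytes divided by bandwidth). The function names and the example figures (1e9 operations per second, 1e7 bytes per second) are illustrative assumptions, not values from the working group.

```python
def compute_time(num_ops: float, exec_rate: float) -> float:
    """Seconds spent computing: operations / (operations per second)."""
    return num_ops / exec_rate


def transfer_time(num_bytes: float, bandwidth: float) -> float:
    """Seconds spent moving data: bytes / (bytes per second)."""
    return num_bytes / bandwidth


def balanced_ops_per_byte(exec_rate: float, bandwidth: float) -> float:
    """Ops-per-Byte at which compute time equals transfer time."""
    return exec_rate / bandwidth


def is_compute_bound(ops_per_byte: float, exec_rate: float, bandwidth: float) -> bool:
    """True when the application does more work per byte than the balance point."""
    return ops_per_byte > balanced_ops_per_byte(exec_rate, bandwidth)


if __name__ == "__main__":
    exec_rate = 1e9  # assumed: operations per second
    bandwidth = 1e7  # assumed: bytes per second
    print("balance point (ops per byte):", balanced_ops_per_byte(exec_rate, bandwidth))
    print("compute bound at 500 ops/byte:", is_compute_bound(500, exec_rate, bandwidth))
```

With these assumed figures the balance point is 100 operations per byte: an application doing more work per byte than that is limited by execution rate, and one doing less is limited by the network.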

Page 23: Remote Data Access Working Group

GF/eGRID Discussion

Common application exists across Japan, US, and Europe for the high energy physics community (CMS, ATLAS, BaBar)
  NSF GriPhyN
  DOE PPDG
  CERN DataGrid
  Japan ETL-KEK data grid

Analyze event data generated at CERN

Page 24: Remote Data Access Working Group

CERN Event Data

“File” oriented access
  Latency is smaller than the analysis time

Objects managed as a collection
  Collection - 1 PB/year, event is 1 MB in size, implies 1 billion events per year
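
The event count on this slide follows from a single division. The sketch below reproduces it, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes); the function names and the derived average event rate are illustrative additions, not figures from the slides.

```python
PB = 10**15  # bytes per petabyte, decimal convention (an assumption)
MB = 10**6   # bytes per megabyte, decimal convention (an assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600


def events_per_year(collection_bytes_per_year: int, event_bytes: int) -> int:
    """Number of whole events that fit in one year's collection growth."""
    return collection_bytes_per_year // event_bytes


def mean_event_rate(events: int, seconds: int = SECONDS_PER_YEAR) -> float:
    """Average events per second if events were spread evenly over the period."""
    return events / seconds


if __name__ == "__main__":
    n = events_per_year(1 * PB, 1 * MB)
    print("events per year:", n)
    print("average events per second:", round(mean_event_rate(n), 1))
```

The division gives 10^9 events per year, matching the slide's 1 billion.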

Page 25: Remote Data Access Working Group

Data Access Requirements

Current implementation
  Global object namespace
  Global schema

Each site replicates the catalog that manages the global namespace and global schema

Current data model is based upon Objectivity

Page 26: Remote Data Access Working Group

Data Management

Objects identified by
  Database/container/page/slot
  Each database can be thought of as a file
  Replication at the file level
  Analysis time is 10-100 seconds per object

Suggests alternate management by
  Object level access
  Size of initial object is 1 MB
  Derived products are 100 kB to 10 kB in size
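
A four-part identifier of the kind described here can be modelled as a small value type. The sketch below is hypothetical: the class name, the slash-separated integer text form, and the helper functions are assumptions made for illustration and are not Objectivity's API. It shows why file-level replication moves whole databases: every object sharing a database number travels together.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical database/container/page/slot identifier."""
    database: int
    container: int
    page: int
    slot: int

    @classmethod
    def parse(cls, text: str) -> "ObjectId":
        """Parse an assumed 'database/container/page/slot' text form."""
        parts = text.split("/")
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {text!r}")
        return cls(*(int(p) for p in parts))


def group_by_database(ids: Iterable[ObjectId]) -> Dict[int, List[ObjectId]]:
    """Group objects by database, the unit of file-level replication."""
    groups: Dict[int, List[ObjectId]] = defaultdict(list)
    for oid in ids:
        groups[oid.database].append(oid)
    return dict(groups)


if __name__ == "__main__":
    ids = [ObjectId.parse(t) for t in ("3/1/42/7", "3/2/5/0", "9/1/1/1")]
    for database, members in group_by_database(ids).items():
        print(database, len(members))
```

Object-level access, by contrast, would address a single `ObjectId` without moving the rest of its database.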

Page 27: Remote Data Access Working Group

Object Level Access

Manage 5 billion objects

Requires ability to
  Export objects (encapsulated within XML)
  Access individual objects within Objectivity
  Definition of procedure for manipulating/subsetting an object

Maintains
  Global namespace and global schema

Allows migration between collections
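
One way to picture exporting an object encapsulated within XML is sketched below using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`object`, `attribute`, `data`, `id`, `schema`) and the base64 payload encoding are assumptions chosen for illustration; the slides do not define an export format.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def export_object(object_id: str, schema: str,
                  attributes: Dict[str, str], payload: bytes) -> str:
    """Wrap an object's identity, tagged attributes, and bytes in one XML element."""
    root = ET.Element("object", {"id": object_id, "schema": schema})
    for name, value in attributes.items():
        attr = ET.SubElement(root, "attribute", {"name": name})
        attr.text = value
    data = ET.SubElement(root, "data", {"encoding": "base64"})
    data.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def import_object(xml_text: str) -> Tuple[Optional[str], Optional[str], Dict[str, str], bytes]:
    """Recover (id, schema, attributes, payload) from the exported form."""
    root = ET.fromstring(xml_text)
    attributes = {a.get("name", ""): a.text or "" for a in root.findall("attribute")}
    payload = base64.b64decode(root.findtext("data", default=""))
    return root.get("id"), root.get("schema"), attributes, payload


if __name__ == "__main__":
    xml_text = export_object("3/1/42/7", "event-v1", {"run": "42"}, b"\x00\x01raw")
    print(xml_text)
    print(import_object(xml_text))
```

Carrying the identifier and schema name inside the exported element is what would let an object keep its place in a global namespace when it migrates between collections.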

Page 28: Remote Data Access Working Group

Common Requirements

Archive interface
Aggregation of objects into containers to minimize impact on archive namespace
Replication of objects to allow local analysis
Track where replicas are located to improve performance
Knowledge management for mapping between schema
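
Tracking replica locations can be sketched as a mapping from a logical name to the sites that hold a copy. The class below is a hypothetical in-memory illustration; the class name, method names, and site names are assumptions and do not describe the Globus replica catalog or the SRB metadata catalog.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ReplicaCatalog:
    """Hypothetical catalog: logical object name -> sites holding a replica."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = defaultdict(list)

    def register(self, logical_name: str, site: str) -> None:
        """Record that `site` holds a copy; duplicate registrations are ignored."""
        if site not in self._replicas[logical_name]:
            self._replicas[logical_name].append(site)

    def locations(self, logical_name: str) -> List[str]:
        """All known sites for the object, in registration order."""
        return list(self._replicas.get(logical_name, []))

    def choose(self, logical_name: str, local_site: str) -> Optional[str]:
        """Prefer a local replica; otherwise fall back to the first known site."""
        sites = self.locations(logical_name)
        if local_site in sites:
            return local_site
        return sites[0] if sites else None


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("run42/event7", "site-a")  # site names are placeholders
    catalog.register("run42/event7", "site-b")
    print(catalog.choose("run42/event7", local_site="site-b"))
```

Preferring the local site is the simplest selection policy; a real system would presumably also weigh measured network performance.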

Page 29: Remote Data Access Working Group

GridFTP Discussion Group

Page 30: Remote Data Access Working Group

GridFTP Proposal
Led by Steven Tuecke

Extensions to the FTP standard
  RFC 959 - FTP definition
  RFC 2228 - Security
  RFC 2389 - Feature negotiation

What extensions are needed by the Grid Forum to support large data transfers over wide area networks?

Page 31: Remote Data Access Working Group

GridFTP

Add
  Security extension - GSI
  Partial file transfer - Unix semantics
  Parallel I/O
  Striped I/O
  Buffer, window size tuning
  Recoverable data transfers
  Progress monitoring
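
Several of these extensions reduce to bookkeeping over byte ranges. The sketch below illustrates that arithmetic only and does not implement any GridFTP command or wire format: it splits a file into contiguous (offset, length) ranges for parallel partial transfers, lists the ranges still outstanding after an interruption, and sizes a buffer from the bandwidth-delay product. All function names and example figures are assumptions.

```python
import math
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def split_ranges(file_size: int, streams: int) -> List[Range]:
    """Split [0, file_size) into at most `streams` contiguous, non-empty ranges."""
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges: List[Range] = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


def remaining_ranges(ranges: Iterable[Range], completed: Iterable[Range]) -> List[Range]:
    """Ranges still to be transferred after an interrupted transfer."""
    done = set(completed)
    return [r for r in ranges if r not in done]


def window_bytes(bandwidth_bytes_per_s: float, round_trip_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return math.ceil(bandwidth_bytes_per_s * round_trip_s)


if __name__ == "__main__":
    parts = split_ranges(file_size=10, streams=3)
    print(parts)
    print(remaining_ranges(parts, completed=parts[:1]))
    print(window_bytes(1_000_000, 0.5))
```

Because each range carries its own offset, a transfer that fails partway can resume by requesting only the outstanding ranges rather than restarting from byte zero.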

Page 32: Remote Data Access Working Group

Timeline

E-Mail discussion of current draft
  Next 2 months

Complete draft by June, 2001

Implementation by June, 2001
  Depending upon further extensions

Definition of API is within the scope of another working group

Page 33: Remote Data Access Working Group

Participants

Steven Tuecke
Bill Alcock
Lee Liming
Ann Chervenak
John Karpovich
Dan Gunter
Tiziana Ferrari
Parkson Wong
Heinz Stockinger
Samuel Meder
Reagan Moore