Remote Data Access Working Group
Introductory Session
Grid Forum: Remote Data Access Working Group http://www.gridforum.org
Grid Forum 5
Reagan Moore
Summary of Working Group Activities
Challenges:
Rapid evolution of grid environments
Pressure of application implementation
Interactions with Grid Forum working groups
Organization
Name: Remote Data Access Working Group
Chairs: Reagan Moore, Ann Chervenak, John Karpovich
Document Editor: Eric Stephan
Charter: Interoperability between remote data access systems
Short-term goals:
  Review “Summary of Data Grids”
  Define framework for common functionality across data grids
Working Group Liaisons for Requirements Lists
Accounting                 Ed Hanna
Grid Performance           Brian Tierney
Information                Ann Chervenak
Program Models             Tracey Smith / Craig Lee
Scheduling                 Judy Beiriger
Security                   (Open position)
User Services              Judith Utley
(Applications and Tools)   Ron Oldfield
Semantics for Data Access
File based access
  User owns the files
  Globus, Nile, IBP
Object based access
  Object is member of a class
  Legion, CORBA, “Objectivity”
Collection based access
  Collection owns files
  Storage Resource Broker, Digital libraries
Remote Data Access Architecture Convergence
[Figure: side-by-side comparison of two architectures]
  Globus: Application, FTP Client, Replica Catalog, FTP Daemon, Storage System
  SDSC Storage Resource Broker: Application, SRB Client, Metadata Catalog, SRB Server, Storage System
  Additional labels: Metadata Catalog, "?", "?"
Evolution of Data Management
A grid supports
  Data management
    Access to distributed storage systems
Users also require
  Information management
    Tagged attributes of the stored data sets
  Knowledge management
    Relationships between the concepts described by the data set attributes
Architecture
[Figure: layered data handling architecture. Labels recoverable from the diagram, top to bottom:]
  Application
  Data Model Management, Remote Procedure Execution (e.g., filtering): Armada, D’agents, FEL, ADR, GRAM, SRB
  Storage System Description, Information Discovery (+ authentication, + authorization)
  Dynamic Info Discovery (which perf. monitor, what QoS, location, what access control, replication): GloPerf, Netlogger, NWS
  API that provides “glue” to underlying data handling systems (security, scheduling, QoS, access protocol, data format/model, adaptivity, info discovery, location control)
  Data Handling Systems: Condor, GASS, NILE, [SRB], I-2 caching
  API that provides “glue” to underlying storage, QoS, etc. [GASS, IBP, SRB]
  Storage Resources: DPSS, HPSS, ADSM, DMF, Unitree, NASstore, DFS, DB2, Oracle, Illustra, Sybase, O2, ObjectStore, Objectivity
Information Based Grid
[Figure: two-layer diagram (Information, Data) with columns for tagging of data, management, and access services. Labels by layer:]
  Information: Attributes, Semantics; XML DTD; Information Repository; SDLIP; Attribute-based Query
  Data: Fields, Containers, Folders; MCAT/HDF; Storage (Replicas, Persistent IDs); Grids; Feature-based Query
  Access services: (Data Handling System - SRB / FTP / HTTP)
Knowledge Based Grid
[Figure: three-layer diagram (Knowledge, Information, Data) with columns for tagging of data, management, and access services. Labels by layer:]
  Knowledge: Relationships Between Concepts; Rules - KQL; Knowledge Repository for Rules; XTM DTD; Knowledge or Topic-Based Query / Browse
  Information: Attributes, Semantics; XML DTD; Information Repository; SDLIP; Attribute-based Query
  Data: Fields, Containers, Folders; MCAT/HDF; Storage (Replicas, Persistent IDs); Grids; Feature-based Query
  Access services: (Topic Maps / Buckets / Model-based Access), (Data Handling System - SRB / FTP / HTTP)
Emerging Applications
Virtual Data Products
  NSF GriPhyN ITR project
  Dynamically create product by application of analysis procedures
Information Repositories
  Protein Data Bank
  Support application of structural comparison algorithms
Collections
  National Virtual Observatory
  Federate sky surveys
Current Papers
Remote Data Access Architectures
  Presented at GF4
Summary/survey of existing data grids
  Presented at GF4
Data Transport Protocol
  GridFTP presentation at GF5
Grid Forum 5 Sessions
Monday 11:00 - XML Tutorial
  Information tagging
  Relationship tagging
Monday 4:30 - GF/eGRID survey
  Working group session to identify requirements
Tuesday 3:00 - GridFTP specification
  Working group session on data transport protocol
DATA Working Groups
GF/eGRID discussion
GridFTP discussion
Architecture Working Group
Grid Forum Architecture Working Group
Discussion of need for:
  Network services perspective for designing protocols and APIs for Grid Forum services
  Distributed operating system perspective for designing an architecture (naming, binding, persistence, process management, storage)
Led by Charlie Catlett
Grid Forum Interactions
Levels        Grid Forum Support
Application   Application APIs
User          Management, resource registration
Collection    Persistence, information repositories
Resource      Abstraction of resources. Standards for information exchange
Connectivity  Transport, security
Fabric        Resource interfaces
Grid Forum Interactions
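Columns: Data, Accounting, Scheduling, Performance, Information Services, Grid Computing Environment
APIs
  Data: File, object, collection access
  Accounting: Accounting interface
  Scheduling: Scheduler interface
  Performance: Monitor API
  Information Services: Information discovery API
  Grid Computing Environment: Workbench, Portal
Management
  Data: Replica catalog
  Accounting: User registration
  Scheduling: Distributed Scheduler Manager
  Performance: Performance aggregation server
  Information Services: Resource addition service
  Grid Computing Environment: Process interaction management
Persistence
  Data: Metadata catalog
  Accounting: Grid usage repository
  Scheduling: Advanced registration management
  Performance: Performance repository
  Information Services: Grid resource naming repository
  Grid Computing Environment: Portal state information
Resource Abstraction standards
  Data: GridFTP, ODBC, SRB
  Accounting: Audit information exchange
  Scheduling: Policy description exchange
  Performance: Performance information exchange
  Information Services: Resource capability info. exch.
  Grid Computing Environment: Standard run environment interface
Transport, Security
  Data: GSS, PKI, TCP/IP
  Accounting: GSS, PKI, TCP/IP, Net usage
  Scheduling: GSS, PKI, TCP/IP
  Performance: GSS, PKI, TCP/IP, Net perf.
  Information Services: GSS, PKI, TCP/IP, SDLIP
  Grid Computing Environment: GSS, PKI, TCP/IP
Resource Interfaces
  Data: Storage interfaces
  Accounting: Usage tracking interface
  Scheduling: Local sched. int., policy int.
  Performance: Monitor data producer
  Information Services: Information repository interface
  Grid Computing Environment: Local run environment interface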
GF/eGRID Discussion Group
GF/eGRID Discussion
Led by Reagan Moore
What access protocols are of interest?
What latency hiding mechanisms are of interest?
  Data streaming
  Caching
  Replicas
  Containers for aggregation
  Remote proxies for bundling I/O commands
GF/eGRID Discussion
What are data management requirements?
  Data collections
  Information catalogs
  Knowledge repositories
What is the granularity of the data management systems?
  Collection size
  Object size
  Data set access size
GF/eGRID Discussion
What is the time granularity?
  Compute time: (Number of operations) / (Execution rate)
  Transfer time: (Number of bytes) / (Transmission bandwidth)
How many operations are done per byte accessed (Ops-per-Byte)?
For your resources, is
  Ops-per-Byte ~ Execution rate / Bandwidth?
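The balance question on this slide can be worked numerically. The sketch below reads the two quantities as a compute time (operations divided by execution rate) and a transfer time (bytes divided by bandwidth); the two are equal when Ops-per-Byte equals execution rate divided by bandwidth. The function names and the sample figures (1e9 ops/s, 1e7 bytes/s) are illustrative assumptions, not values from the discussion.

```python
def compute_time(num_ops: float, exec_rate: float) -> float:
    """Seconds spent computing: operations / (operations per second)."""
    return num_ops / exec_rate


def transfer_time(num_bytes: float, bandwidth: float) -> float:
    """Seconds spent moving data: bytes / (bytes per second)."""
    return num_bytes / bandwidth


def balance_ops_per_byte(exec_rate: float, bandwidth: float) -> float:
    """Ops-per-byte at which compute time equals transfer time."""
    return exec_rate / bandwidth


def is_compute_bound(ops_per_byte: float, exec_rate: float, bandwidth: float) -> bool:
    """True when the application does more work per byte than the balance point."""
    return ops_per_byte > balance_ops_per_byte(exec_rate, bandwidth)


if __name__ == "__main__":
    exec_rate = 1e9   # operations per second (assumed)
    bandwidth = 1e7   # bytes per second (assumed)
    print(balance_ops_per_byte(exec_rate, bandwidth))  # 100.0
    print(is_compute_bound(500, exec_rate, bandwidth))  # True
```

An application below the balance point spends most of its time waiting on data, which is where the latency hiding mechanisms listed earlier (caching, replicas, containers) matter most.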
GF/eGRID Discussion
Common application exists across Japan, US, and Europe for the high energy physics community (CMS, ATLAS, BaBar)
  NSF GriPhyN
  DOE PPDG
  CERN DataGrid
  Japan ETL-KEK data grid
Analyze event data generated at CERN
CERN Event Data
“File” oriented access
  Latency is smaller than the analysis time
Objects managed as a collection
  Collection - 1 PB/year, event is 1 MB in size, implies 1 billion events per year
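The event count on this slide follows from a single division. The short check below assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), which the slide does not state, and adds the average event rate over a 365-day year as a derived figure.

```python
PETABYTE = 10**15   # bytes, decimal units (assumed)
MEGABYTE = 10**6    # bytes, decimal units (assumed)
SECONDS_PER_YEAR = 365 * 24 * 3600


def events_per_year(collection_bytes_per_year: int, event_bytes: int) -> int:
    """Number of fixed-size events that fit in one year of collected data."""
    return collection_bytes_per_year // event_bytes


def average_event_rate(n_events: int) -> float:
    """Average events per second if collection is spread evenly over a year."""
    return n_events / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = events_per_year(PETABYTE, MEGABYTE)
    print(n)  # 1000000000
    print(round(average_event_rate(n), 1))  # 31.7
```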
Data Access Requirements
Current implementation
  Global object namespace
  Global schema
Each site replicates the catalog that manages the global namespace and global schema
Current data model is based upon Objectivity
Data Management
Objects identified by
  Database/container/page/slot
  Each database can be thought of as a file
  Replication at the file level
  Analysis time is 10-100 seconds per object
Suggests alternate management by
  Object level access
  Size of initial object is 1 MB
  Derived products are 100 kB to 10 kB in size
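The difference between file-level replication and object-level access can be made concrete with the four-part identifier. In this minimal sketch the slash-separated text form and the names `ObjectId` and `files_to_replicate` are hypothetical, chosen only to mirror the slide; they are not the Objectivity API. It shows that a request for a handful of objects turns into a request for every database (file) that holds one of them.

```python
from typing import Iterable, List, NamedTuple


class ObjectId(NamedTuple):
    """Four-part identifier: database/container/page/slot."""
    database: int
    container: int
    page: int
    slot: int

    @classmethod
    def parse(cls, text: str) -> "ObjectId":
        """Parse a hypothetical 'db/container/page/slot' string."""
        parts = text.split("/")
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {text!r}")
        return cls(*(int(p) for p in parts))


def files_to_replicate(object_ids: Iterable[ObjectId]) -> List[int]:
    """With file-level replication, each database holding a wanted object
    must be copied whole, so the unit of transfer is the database number."""
    return sorted({oid.database for oid in object_ids})


if __name__ == "__main__":
    wanted = [ObjectId.parse(s) for s in ("12/3/40/7", "12/5/2/1", "19/1/1/1")]
    print(files_to_replicate(wanted))  # [12, 19]
```

Object-level access would instead move only the selected objects, roughly 1 MB each for initial objects according to the slide.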
Object Level Access
Manage 5 billion objects
Requires ability to
  Export objects (encapsulated within XML)
  Access individual objects within Objectivity
  Define a procedure for manipulating/subsetting an object
Maintains
  Global namespace and global schema
  Allows migration between collections
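Exporting an object encapsulated within XML might look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`object`, `attribute`, `id`, `class`, `name`) are assumptions made for illustration; the working group did not define a schema in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def export_object(object_id: str, class_name: str, attributes: Dict[str, object]) -> str:
    """Wrap one object in a self-describing XML element (hypothetical layout)."""
    root = ET.Element("object", {"id": object_id, "class": class_name})
    for name, value in attributes.items():
        child = ET.SubElement(root, "attribute", {"name": name})
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_object(xml_text: str) -> Tuple[str, str, Dict[str, str]]:
    """Recover the identifier, class name, and attributes from the XML form."""
    root = ET.fromstring(xml_text)
    attrs = {a.get("name"): a.text for a in root.findall("attribute")}
    return root.get("id"), root.get("class"), attrs


if __name__ == "__main__":
    xml_text = export_object("12/3/40/7", "Event", {"energy": 42})
    print(import_object(xml_text))
```

Because the identifier travels with the object, the receiving collection can keep the global namespace intact when an object migrates. Attribute values come back as strings, so a real exchange format would also need to carry type information from the global schema.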
Common Requirements
Archive interface
Aggregation of objects into containers to minimize impact on archive namespace
Replication of objects to allow local analysis
Track where replicas are located to improve performance
Knowledge management for mapping between schemas
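Two of these requirements, container aggregation and replica tracking, can be sketched in a few lines. The packing rule (fill each container in arrival order until the next object would overflow it) and the `ReplicaCatalog` class are illustrative assumptions, not the behavior of SRB or the Globus replica catalog.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def pack_into_containers(object_sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Group objects into containers so the archive stores a few large
    entries instead of one namespace entry per object."""
    containers: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in object_sizes.items():
        if size > capacity:
            raise ValueError(f"{name} is larger than one container")
        if current and used + size > capacity:
            containers.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        containers.append(current)
    return containers


class ReplicaCatalog:
    """Maps a logical name to the set of sites holding a copy."""

    def __init__(self) -> None:
        self._sites = defaultdict(set)

    def register(self, logical_name: str, site: str) -> None:
        self._sites[logical_name].add(site)

    def locate(self, logical_name: str, preferred_site: Optional[str] = None) -> str:
        """Return the preferred site if it holds a replica, else a fixed fallback."""
        sites = self._sites.get(logical_name)
        if not sites:
            raise KeyError(logical_name)
        if preferred_site in sites:
            return preferred_site
        return min(sites)  # deterministic choice; a real system would rank by cost


if __name__ == "__main__":
    print(pack_into_containers({"a": 4, "b": 4, "c": 4, "d": 2}, capacity=10))
    # [['a', 'b'], ['c', 'd']]
```

Registering whole containers rather than individual objects in the catalog keeps both the archive namespace and the catalog small, at the cost of moving a full container when only one object is needed.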
GridFTP Discussion Group
GridFTP Proposal
Led by Steven Tuecke
Extensions to the FTP standard
  RFC 959 - FTP definition
  RFC 2228 - Security
  RFC 2389 - Feature negotiation
What extensions are needed by the Grid Forum to support large data transfers over wide area networks?
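Feature negotiation is what lets a client discover which extensions a server supports before using them. Under RFC 2389 a server answers the FEAT command with a multi-line 211 reply in which each feature line starts with a single space. The parser below is a minimal sketch of reading such a reply; the function name is hypothetical and the sample reply lists only ordinary FTP features, not any GridFTP extension names.

```python
from typing import Dict


def parse_feat_reply(reply: str) -> Dict[str, str]:
    """Return {FEATURE: parameters} from the text of a FEAT reply.

    Feature lines are the ones that begin with a space; the opening
    "211-" line and the closing "211 " line are skipped.
    """
    lines = reply.splitlines()
    if not lines or not lines[0].startswith("211"):
        raise ValueError("not a successful FEAT reply")
    features: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.startswith(" "):
            continue  # closing "211 End" line or other unindented text
        name, _, params = line.strip().partition(" ")
        features[name.upper()] = params
    return features


if __name__ == "__main__":
    sample = "211-Features:\r\n MDTM\r\n REST STREAM\r\n SIZE\r\n211 End\r\n"
    print(parse_feat_reply(sample))
    # {'MDTM': '', 'REST': 'STREAM', 'SIZE': ''}
```

A client would check the returned dictionary for an extension before issuing the corresponding command, and fall back to plain RFC 959 behavior when it is absent.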
Grid FTP
Add
  Security extension - GSI
  Partial file transfer - Unix semantics
  Parallel I/O
  Striped I/O
  Buffer, window size tuning
  Recoverable data transfers
  Progress monitoring
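Partial file transfer and parallel I/O both rest on addressing a file by byte range. The sketch below is not GridFTP and uses no network code; it only illustrates, with hypothetical helper names, how a file of known size could be divided into non-overlapping (offset, length) ranges, one per stream, and read back with seek/read semantics.

```python
import io
from typing import BinaryIO, List, Tuple


def split_ranges(file_size: int, num_streams: int) -> List[Tuple[int, int]]:
    """Divide [0, file_size) into at most num_streams contiguous ranges."""
    if file_size < 0 or num_streams < 1:
        raise ValueError("file_size must be >= 0 and num_streams >= 1")
    base, extra = divmod(file_size, num_streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(num_streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        if length == 0:
            break  # fewer bytes than streams
        ranges.append((offset, length))
        offset += length
    return ranges


def read_range(stream: BinaryIO, offset: int, length: int) -> bytes:
    """Partial read with Unix semantics: seek to the offset, read length bytes."""
    stream.seek(offset)
    return stream.read(length)


if __name__ == "__main__":
    data = bytes(range(10))
    source = io.BytesIO(data)
    parts = [read_range(source, off, n) for off, n in split_ranges(len(data), 3)]
    print(split_ranges(len(data), 3))  # [(0, 4), (4, 3), (7, 3)]
    print(b"".join(parts) == data)     # True
```

The same range list also supports recoverable transfers: recording which ranges have completed is enough to resume after a failure without resending the whole file.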
Timeline
E-Mail discussion of current draft
  Next 2 months
Complete draft by June, 2001
Implementation by June, 2001
  Depending upon further extensions
Definition of API is in the scope of another working group
Participants
Steven Tuecke
Bill Alcock
Lee Liming
Ann Chervenak
John Karpovich
Dan Gunter
Tiziana Ferrari
Parkson Wong
Heinz Stockinger
Samuel Meder
Reagan Moore