Data Management: GridPP and EDG
Gavin McCance, University of Glasgow
May 9, 2002
http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management
GridPP, 9 May, 2002 Gavin McCance 2/32
Overview
Status of data management work
Products delivered to 1.2
GDMP 3.0
Reptor: replica manager
Spitfire
Optor: grid simulation
What’s currently available and future plans
WP2: Data Management
Replication
  Replica catalogue
  Replica manager
Query Optimisation*
  Grid replica optimisation
Meta-data management*
  Secure, transparent access to meta-data
Service discovery
*Direct UK involvement
Work is done within the EDG WP2 team (based at CERN)
General Status
Deliverables on target
Major software released for 1.2
UK manpower based at Glasgow:
  2.5 RAs: me, Will Bell, Paul Millar (50%)
  1 PhD student: David Cameron
  1 more student to come in September
File Replication
Requires: replica catalogue or replica location service
  Keeps track of the mapping between logical file name and physical file names
Requires: replica manager or replica management service
  High-level tool to actually do the replication and manage what files are being replicated
[Figure: one logical file name (LFN), File-1, with replicas of File-1 at Paris, Glasgow and Chicago]
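The LFN-to-PFN mapping can be pictured as a small lookup table. The sketch below is only an illustration of that idea, not the Globus or EDG catalogue: the class name, method names and the `gsiftp://` host names are all invented for the example.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy in-memory catalogue: one logical file name -> many physical file names."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        self._replicas[lfn].discard(pfn)

    def list_replicas(self, lfn):
        # Sorted only so that the result is deterministic.
        return sorted(self._replicas.get(lfn, ()))


catalogue = ReplicaCatalogue()
# Hypothetical PFNs for the three sites named above.
for site in ("paris", "glasgow", "chicago"):
    catalogue.register("File-1", f"gsiftp://se.{site}.example.org/data/File-1")
```

A job asks for `File-1` by its logical name and receives the list of physical copies; choosing between them is a separate question.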
File Replication
Current replication functionality provided by GDMP 3.0 – new for 1.2 release!
Used for mirroring of storage elements
Implements a subscription-based replication model with security, and updates the Globus replica catalogue
GDMP 3.0
1. Site ‘B’ subscribes to the files of site ‘A’
2. ‘A’ produces a new file – ‘B’ is notified of this
3. ‘B’ then starts the transfer of the new file from ‘A’
4. Replica catalogue at ‘B’ is updated to reflect the new file replica.
[Figure: Site A and Site B]
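The four steps can be sketched as a toy publish/subscribe loop. This is an assumption-laden illustration rather than GDMP itself: the `Site` class and its methods are invented, there is no security, and the transfer happens synchronously inside the notification, whereas a real subscriber would start it separately.

```python
class Site:
    """Toy site with local files, a local replica catalogue and subscribers."""

    def __init__(self, name):
        self.name = name
        self.files = {}        # file name -> contents
        self.catalogue = {}    # file name -> set of site names holding a replica
        self.subscribers = []

    def subscribe_to(self, producer):
        # Step 1: this site asks to hear about the producer's new files.
        producer.subscribers.append(self)

    def publish(self, filename, contents):
        # Step 2: a new file appears and every subscriber is notified.
        self.files[filename] = contents
        self.catalogue.setdefault(filename, set()).add(self.name)
        for subscriber in self.subscribers:
            subscriber.on_new_file(self, filename)

    def on_new_file(self, producer, filename):
        # Step 3: the subscriber pulls the file from the producer.
        self.files[filename] = producer.files[filename]
        # Step 4: the subscriber's catalogue records both replicas.
        self.catalogue.setdefault(filename, set()).update({producer.name, self.name})


site_a, site_b = Site("A"), Site("B")
site_b.subscribe_to(site_a)
site_a.publish("run25555.dat", b"event data")
```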
GDMP 3.0
Changes w.r.t. 2.*:
  New security model – host certificates
  Server delegation, i.e. accounts on SE not necessarily required
  Client-only install possible
  Basic space management
  Stand-alone server option
  ‘unsubscribe’ option
GDMP 3.0 status
Final version of GDMP released for 1.2
In future, GDMP will be absorbed into the Replica Manager Service, which will offer richer functionality
SRPM, RPM, tarball, User Guide, Quick Config for EDG SEs:
http://cmsdoc.cern.ch/cms/grid/
Replica Location Service
Current Globus replica catalogue is LDAP based
To be replaced with new ‘GIGGLE’ framework Replica Location Service
Joint EDG WP2 / Globus / PPDG project
Trade-offs: global consistency, space, query / update overhead, reliability
RLS model…
Reliable local state
Relaxed global consistency
Soft state updates to global index nodes permit graceful behaviour in the face of network problems
Secure access
Implemented as web service
[Figure: five Storage Elements, five LRCs and three RLIs arranged in a hierarchy]
Hierarchical indexing. The higher-level RLI contains pointers to lower-level RLIs or LRCs.
RLI = Replica Location Index
LRC = Local Replica Catalog
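A rough way to picture an LRC reporting to an RLI with soft-state updates is sketched below. It assumes a simple timestamp-and-timeout scheme, which the slides do not specify; the class names, the `timeout` parameter and the example PFN are invented. The point is only that an RLI entry which stops being refreshed quietly expires, so a lost update degrades the index rather than corrupting it.

```python
import time


class LocalReplicaCatalog:
    """LRC: authoritative LFN -> PFN mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # LFN -> set of PFNs

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)


class ReplicaLocationIndex:
    """RLI: remembers which LRCs recently claimed to hold an LFN."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._seen = {}  # (LFN, LRC name) -> time of the last soft-state update

    def soft_state_update(self, lrc, now=None):
        now = time.time() if now is None else now
        for lfn in lrc.mappings:
            self._seen[(lfn, lrc.name)] = now

    def lookup(self, lfn, now=None):
        # Entries that were not refreshed within the timeout are ignored.
        now = time.time() if now is None else now
        return sorted(
            name
            for (seen_lfn, name), stamp in self._seen.items()
            if seen_lfn == lfn and now - stamp <= self.timeout
        )


glasgow = LocalReplicaCatalog("glasgow")
glasgow.add("File-1", "gsiftp://se.glasgow.example.org/data/File-1")
index = ReplicaLocationIndex(timeout=60)
index.soft_state_update(glasgow, now=0)
```

The RLI only answers "which LRC should I ask?"; the PFNs themselves stay in the LRC, which remains the reliable local state.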
Scalable, reliable
LFN Namespace partitioned among RLIs
Redundant RLIs for reliability
Lossy compression
  Higher level RLIs may lose accuracy about mappings
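The slide does not say which lossy compression is meant. One common technique with exactly this trade-off is a Bloom filter, used here purely as an assumed example: an LRC could send a fixed-size bit summary of its LFNs instead of the full list, and the higher-level index would then sometimes answer "maybe here" for a file that is not there, but never miss one that is.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions per item by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


summary = BloomFilter()
for lfn in ("File-1", "File-2", "File-3"):
    summary.add(lfn)
```

A positive answer from `might_contain` still has to be confirmed against the LRC, which is where the lost accuracy is recovered.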
RLS status
Currently Alpha for developers
http://cern.ch/grid-data-management/replica-location-service/RLS.html
New version will be progressively integrated with other replication software.
Testbed deployment in September release
Replica Management Service
Web Service under development (Reptor)
  Will absorb GDMP functionality and extend it
  Will use the Replica Location Service
Two facets
  Core Replica Management API
  Optimisation API
Core Reptor API
Similar to GDMP API
  registerEntry
  copyFile
  copyAndRegisterFile
  replicateFile
  deleteFile
  listReplicas
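Only the method names above come from the slide; their signatures are not given. The sketch below guesses at plausible arguments to show how the calls might relate, in particular `copyAndRegisterFile` as a copy followed by a catalogue registration. The class, the dict-based "storage" and every parameter are hypothetical, and `replicateFile` is left out because its intended difference from `copyAndRegisterFile` is not stated.

```python
class ToyReplicaManager:
    """Toy stand-in; storage is a dict of SE name -> {file name: bytes}."""

    def __init__(self, storage):
        self.storage = storage
        self.catalogue = {}  # LFN -> set of (SE name, file name)

    def registerEntry(self, lfn, se, name):
        self.catalogue.setdefault(lfn, set()).add((se, name))

    def copyFile(self, src_se, dst_se, name):
        self.storage[dst_se][name] = self.storage[src_se][name]

    def copyAndRegisterFile(self, lfn, src_se, dst_se, name):
        # Copy first, then register, so the catalogue never points at a missing file.
        self.copyFile(src_se, dst_se, name)
        self.registerEntry(lfn, dst_se, name)

    def listReplicas(self, lfn):
        return sorted(self.catalogue.get(lfn, ()))

    def deleteFile(self, lfn, se, name):
        # Unregister before deleting, for the same reason in reverse.
        self.catalogue.get(lfn, set()).discard((se, name))
        self.storage[se].pop(name, None)


storage = {"glasgow": {"f1.dat": b"data"}, "paris": {}}
manager = ToyReplicaManager(storage)
manager.registerEntry("File-1", "glasgow", "f1.dat")
manager.copyAndRegisterFile("File-1", "glasgow", "paris", "f1.dat")
```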
Interactions with SE
Defined file types:
Physical file attribute    File type
Master                     permanent
Secondary copy             permanent, durable or volatile
RMS Current Status
Testbed can use GDMP for 1.2
Defined Reptor API currently wraps the Globus Replica Manager
Will be developed progressively
Full version on testbed in September
Technical reports: http://cern.ch/grid-data-management/publications.html
Grid Query Optimisation
Best place for a job?
Joint WP1 / WP2 question…
Approach: 2-phase optimisation
  Phase 1: Find a suitable CE for job execution given the distribution of files it will access
  Phase 2: Re-optimise file access during job execution (the Grid is dynamic, so resource status changes over time)
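Phase 1 can be sketched as a small cost minimisation. The cost model here is an assumption made for illustration: each file is charged at the cheapest replica reachable from a candidate CE, and the CE with the lowest total wins. The function name, the site names and the numbers are all made up.

```python
def choose_ce(candidate_ces, needed_lfns, replicas, cost):
    """Return the CE with the lowest total cost of reaching every needed file.

    replicas: LFN -> list of SEs holding a copy
    cost:     (SE, CE) -> estimated access cost (arbitrary units)
    """
    def total_cost(ce):
        return sum(min(cost[(se, ce)] for se in replicas[lfn]) for lfn in needed_lfns)

    return min(candidate_ces, key=total_cost)


# Made-up numbers purely for illustration.
replicas = {"File-1": ["se-glasgow", "se-paris"], "File-2": ["se-paris"]}
cost = {
    ("se-glasgow", "ce-glasgow"): 1, ("se-paris", "ce-glasgow"): 10,
    ("se-glasgow", "ce-paris"): 10, ("se-paris", "ce-paris"): 1,
}
best = choose_ce(["ce-glasgow", "ce-paris"], ["File-1", "File-2"], replicas, cost)
```

With these numbers the job goes to `ce-paris`, because `File-2` exists only in Paris. Phase 2 would repeat a similar calculation per file while the job runs, using whatever the costs are at that moment.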
Optimisation API
initFilePrefetch(LFN[], CE, protocol[], fraction)
cancelFilePrefetch(LFN[], CE)
getBestFile(LFN[], protocol[], fraction)
getNetworkCosts(SE1, SE2, filesize, protocol) from WP7
getIOCosts(SE, PFN) from WP5
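How a call like getBestFile might combine the two cost sources can be sketched as follows. This is a guess at the idea, not the Reptor implementation: `best_replica` and the stub cost functions are hypothetical, the argument order of the cost callables simply mirrors the signatures listed above, and adding the two costs assumes they are expressed in the same units.

```python
def best_replica(replicas, local_se, filesize, protocol, network_cost, io_cost):
    """Pick the (SE, PFN) pair with the lowest combined cost.

    replicas:     list of (SE, PFN) pairs for one LFN
    network_cost: callable (SE1, SE2, filesize, protocol) -> cost
    io_cost:      callable (SE, PFN) -> cost
    """
    def combined(replica):
        se, pfn = replica
        return network_cost(se, local_se, filesize, protocol) + io_cost(se, pfn)

    return min(replicas, key=combined)


# Stub cost functions with made-up values.
def fake_network_cost(src, dst, filesize, protocol):
    return 0 if src == dst else filesize / 10


def fake_io_cost(se, pfn):
    return 5 if se == "se-glasgow" else 1


choice = best_replica(
    [("se-glasgow", "/data/File-1"), ("se-paris", "/data/File-1")],
    local_se="se-glasgow", filesize=100, protocol="gridftp",
    network_cost=fake_network_cost, io_cost=fake_io_cost,
)
```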
Grid Replica Optimisation
Controlled intelligent replication to optimise the grid over the longer term
Collect getBestFile requests
‘Intelligence’ based on algorithms
Test replication algorithms on a data-centric grid simulator
Optor – replica optimiser simulation
Simulate prototype Grid
Input site policies and experiment data files.
Introduce replication algorithm:
  Files are always replicated to the local storage. If necessary, the oldest files are deleted.
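The basic algorithm is simple enough to sketch in a few lines. This is not Optor's code: the class is invented, and "oldest" is taken to mean the file that arrived at this storage element first, which is an assumption (it could equally mean creation time).

```python
from collections import OrderedDict


class LocalStorage:
    """Storage element that always accepts a replica, evicting the oldest files."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # file name -> size, oldest first

    def used(self):
        return sum(self.files.values())

    def replicate(self, name, size):
        if name in self.files:
            return []  # already held locally, nothing to do
        if size > self.capacity:
            raise ValueError("file is larger than the whole storage element")
        evicted = []
        # Delete the oldest files until the new replica fits.
        while self.used() + size > self.capacity:
            oldest, _ = self.files.popitem(last=False)
            evicted.append(oldest)
        self.files[name] = size
        return evicted


se = LocalStorage(capacity=10)
se.replicate("a", 4)
se.replicate("b", 4)
evicted = se.replicate("c", 4)
```

Here the third file does not fit alongside the first two, so `a` is deleted to make room.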
Optor first results
Even a basic replication algorithm significantly reduces network traffic and program running times.
New economics-based algorithms under investigation!
http://ppewww.ph.gla.ac.uk/ScotGRID/Optor
Meta-data Management
Spitfire v1.1.0 delivered
A grid-enabled database service
Grid-enabled front end to any type of RDBMS
Examples:
  Grid meta-data: replica catalogue, service registry
  Application meta-data: experimental data catalogues, calibration data
V1.1.0 XSQL Spitfire
Current version (v1.1.0) is based on XSQL templates on the server, e.g.
  <role="Read-only"/>
  <query>
    SELECT FILENAME FROM HFS_DATASET
    WHERE RNNO={@run} AND TRIGGER={@trig} AND STATUS={@stat}
  </query>
File URL = http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql
V1.1.0 Spitfire client
Any HTTP client – either your own app, or a web-browser form
POST an HTML FORM to http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql with parameters run=25555, trig=highlumi, stat=good
The operation is performed on the database, and the result is sent back to the client…
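From a script, the same form POST could be built with the Python standard library. The URL and parameter values are those given above; everything else is an assumption, including the use of plain `urllib` with no client certificate, which a secured Spitfire server would presumably require. The request is constructed but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL and parameter values as given on the slide.
url = "http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql"
params = {"run": 25555, "trig": "highlumi", "stat": "good"}

# Supplying a body makes urllib issue a POST; urlencode builds the form body.
body = urlencode(params).encode("ascii")
request = Request(
    url,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Sending would be: urllib.request.urlopen(request).read()
# It is left out here because the server may no longer exist.
```

The three parameters fill the `{@run}`, `{@trig}` and `{@stat}` placeholders in the server-side template.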
Security Mechanism
[Figure: flowchart of the security mechanism. An HTTP + SSL request with a client certificate enters the Servlet Container (SSLServletSocketFactory, TrustManager). Checks: is the certificate signed by a trusted CA (Trusted CAs)? Has the certificate been revoked (Revoked Certs repository)? The Security Servlet asks whether the user specifies a role and finds a default if not; the Authorization Module checks that the role is ok (Role repository) and maps the role to a connection id (Connection mappings). The request and connection ID pass to the Translator Servlet, which reaches the RDBMS through the Connection Pool.]
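The decision sequence in the flowchart can be sketched as one function. This is a simplification under stated assumptions: the data structures and the function name are invented, and the "trusted CA" test is reduced to an issuer-name lookup, whereas the real check is a cryptographic signature verification done by the SSL layer.

```python
def authorise(cert, requested_role, trusted_cas, revoked, roles, default_role, connections):
    """Return a connection id, or raise PermissionError.

    cert:         dict with 'subject', 'issuer' and 'serial'
    roles:        subject -> set of roles that subject may use
    default_role: subject -> role used when none is requested
    connections:  role -> database connection id
    """
    if cert["issuer"] not in trusted_cas:
        raise PermissionError("certificate not signed by a trusted CA")
    if cert["serial"] in revoked:
        raise PermissionError("certificate has been revoked")
    # Use the requested role if there is one, otherwise find the default.
    role = requested_role or default_role.get(cert["subject"])
    if role is None or role not in roles.get(cert["subject"], set()):
        raise PermissionError("role not permitted for this user")
    return connections[role]


# Hypothetical user, CA and connection names.
cert = {"subject": "/O=Grid/CN=Some User", "issuer": "Example CA", "serial": 42}
conn = authorise(
    cert, None,
    trusted_cas={"Example CA"}, revoked=set(),
    roles={"/O=Grid/CN=Some User": {"Read-only"}},
    default_role={"/O=Grid/CN=Some User": "Read-only"},
    connections={"Read-only": "conn-ro"},
)
```

The returned connection id is what would accompany the request to the translator, so the database only ever sees connections that carry the privileges of the chosen role.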
V1.1.0
V1.1.0 available for 1.2 release now!
SRPM, RPM, tarball installation
User / Admin / Quick Install guides
http://cern.ch/hep-proj-spitfire
New Spitfire client (dev)
Users can use either this or v1.1.0 static (XSQL template based) functionality
A database client API has been defined
Will be implemented as a grid service using standard web service technologies
Client side API to access remote database
DB Admin
  Create(), Drop(), Alter() Table, Database
DB Core functionality
  Insert(), Update(), Delete(), Select()
DB Role admin
  Secure, role based authorisation
DB Information
  Schema, Quotas, Disk space
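To give a feel for what such a client API could look like, here is a toy version that runs against a local in-memory SQLite database instead of a remote service. It is not the Spitfire API: the class, the method signatures and the role names are hypothetical, and only a few of the listed operations are shown.

```python
import sqlite3


class ToyDatabaseClient:
    """Local stand-in for a remote database client, with a role check."""

    WRITE_ROLES = {"admin", "writer"}  # hypothetical role names

    def __init__(self, role):
        self.role = role
        self.db = sqlite3.connect(":memory:")

    def _require_write(self):
        if self.role not in self.WRITE_ROLES:
            raise PermissionError(f"role {self.role!r} may not modify data")

    def create_table(self, name, columns):
        self._require_write()
        # Identifiers cannot be bound as parameters, so only trusted names belong here.
        self.db.execute(f"CREATE TABLE {name} ({', '.join(columns)})")

    def insert(self, table, values):
        self._require_write()
        marks = ", ".join("?" for _ in values)
        self.db.execute(f"INSERT INTO {table} VALUES ({marks})", values)

    def select(self, table, where="1", params=()):
        return self.db.execute(f"SELECT * FROM {table} WHERE {where}", params).fetchall()


client = ToyDatabaseClient("admin")
client.create_table("dataset", ["filename TEXT", "run INTEGER"])
client.insert("dataset", ("f1.dat", 25555))
rows = client.select("dataset", "run = ?", (25555,))
```

In the real service the role would come from the authorisation step on the server, not from a constructor argument supplied by the client.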
Extra functionality
To be developed:
Distributed querying
Replication of meta-data
Automated expiration and cleanup
Discussions with UK DBTF and GGF Database Group
Service Index
How do I find a specific grid service?
  E.g. replica location server, image database, information service
XML service description
  What, where, attributes, how to contact.
Scalable architectures for querying this developed
Service index web service
  W. Hoschek’s thesis and paper (WP2@CERN)
  API developed
More Info
More information available at…
http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management