Scalla + Castor2
Andrew Hanushevsky
Stanford Linear Accelerator Center, Stanford University
27-March-07, Root Workshop
http://xrootd.slac.stanford.edu
Castor2/xrootd Integration – The Details
14-December-06 http://xrootd.slac.stanford.edu
Outline
- Introduction: design points, architecture, some details
- Data serving and clustering: xrootd and olbd
- Castor2 integration: dirty details
- Conclusion
What is Scalla?
- Structured Cluster Architecture for Low Latency Access: low-latency access to data via xrootd servers
- POSIX-style byte-level random access
- By default, arbitrary data organized as files
- Hierarchical directory-like name space (does not have full file system semantics)
- Protocol includes high-performance features
- Structured clustering provided by olbd servers: exponentially scalable and self-organizing
Scalla Design Points (the commercial)
- High-speed access to experimental data: write-once, read-many processing mode; small-block sparse random access (e.g., root files); high transaction rate with rapid request dispersal (fast opens)
- Low setup cost: high-efficiency data server (low CPU/byte overhead, small memory footprint); very simple configuration requirements; no 3rd-party software needed (avoids messy dependencies)
- Low administration cost: non-assisted fault tolerance; self-organizing servers remove the need for configuration changes; no database requirements (no backup/recovery issues)
- Wide usability: full POSIX access; server clustering for scalability; plug-in architecture and event notification for applicability (HPSS, Castor, etc.)
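The "small-block sparse random access" pattern mentioned above (typical of root files) is just POSIX-style seek-and-read at scattered byte offsets. A minimal sketch in Python, where an ordinary local file stands in for an xrootd-served one (file contents and offsets are invented for illustration):

```python
import os
import tempfile

# Create a stand-in data file; in practice this would be served by xrootd.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(range(256)) * 16)   # 4 KiB of sample data
os.close(fd)

def sparse_reads(path, offsets, size):
    """Read small blocks at scattered offsets: seek + read, no full scan."""
    blocks = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)                # byte-level random access
            blocks.append(f.read(size))
    return blocks

blocks = sparse_reads(path, [0, 1024, 4000], 8)
os.unlink(path)
```

Serving this access pattern efficiently (many tiny reads, fast opens) is what drives the low-latency design points above.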
xrootd Plugin Architecture
[Diagram: the xrootd server as layered plug-ins around the Xrd protocol driver, with events flowing between components]
- Protocol driver (Xrd)
- Protocol, 1 of n (xrootd)
- File System (ofs, sfs, alice, etc.)
- Authentication (gsi, krb5, etc.)
- Authorization (name based)
- Clustering (olbd)
- Storage System (oss, drm/srm, etc.)
- Management (XMI: Castor, DPM), with lfn2pfn prefix encoding
Many ways to accommodate other systems.
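The lfn2pfn prefix encoding named in the diagram maps a client's logical file name onto a physical path in the backing store. A minimal sketch, with the prefix value chosen purely for illustration:

```python
def lfn2pfn(lfn, prefix="/castor/data"):
    """Map a logical file name (LFN) to a physical file name (PFN)
    by prefix encoding: prepend the storage system's local prefix.
    The prefix here is a made-up example value."""
    if not lfn.startswith("/"):
        lfn = "/" + lfn
    return prefix + lfn

pfn = lfn2pfn("/store/run42/events.root")
```

Because the mapping is a pure prefix rule, no catalog lookup is needed to translate names on the data server.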
The Upshot
- Plug-in architecture plus events: easy to integrate other systems
- Low-latency data server: improved performance
- Depending on the integration, open latency may be high
- An actual integration follows: how Castor works now, and how Castor works with xrootd
Castor 2 Instance With Rootd
[Diagram: client, Castor disk servers, Oracle, rootd started via LSF]
1. open() -> stage_open() -> stageGet()
2. Castor selects a server and stages in the file
3. Castor starts rootd via LSF
4. Castor responds with file server and path
5. Client connects to rootd and opens the file
6. Client reads the file
7. close() -> stageUpdtDone()
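The scalability problem with the rootd flow above is that every open() gets its own dedicated server process. A toy Python model (all class names and limits invented for illustration) contrasting one-process-per-open with a single long-lived multiplexing daemon:

```python
class RootdStyleServer:
    """Toy model of the rootd approach: every open() costs one process."""
    def __init__(self, max_processes=4):       # hypothetical machine limit
        self.max_processes = max_processes
        self.processes = []                    # one entry per open file

    def open(self, path):
        if len(self.processes) >= self.max_processes:
            raise RuntimeError("out of server slots: resources consumed")
        self.processes.append(path)
        return len(self.processes) - 1         # handle = process index

class XrootdStyleServer:
    """Toy model of the xrootd approach: one long-lived daemon
    multiplexes all open files, so opens are cheap."""
    def __init__(self):
        self.open_files = {}
        self.next_handle = 0

    def open(self, path):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = path         # just a table entry, no fork
        return handle
```

The rootd-style server fails after a handful of concurrent opens, while the xrootd-style server only adds a table entry per open; this is the resource argument made on the next slide.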
Limitations & Solutions
- Not scalable: one rootd process per open file; resources are quickly consumed
- Not very good performance or robustness: rootd is old generation, and we've learned a lot about servers since then
- What we really want: a long-lived, ultra-efficient data server
- xrootd is a likely candidate: its plug-in architecture is ideal for integration (no surprise here)
Adding xrootd Access
[Diagram: client; Castor disk servers, each now running an xrootd/olbd pair and forming an "orphan" cluster; the XrdCS2d XMI plug-in; Oracle backing Castor]
Using Orphan xrootd Access
[Diagram: client; Castor disk servers with xrootd/olbd pairs (orphan cluster); XrdCS2d XMI plug-in; XrdCS2e; Oracle]
1. open()
2. prepareToGet()
3. Client told to wait for callback
4. Server selected and file staged
5. Location and fileid obtained via polled query
6. Client redirected
7. open()
8. stage_open()
9. LSF launches XrdCS2e to create the LFN-to-PFN symlink
10. read()
11. close() -> stageUpdtDone()
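From the client's point of view, the first-time open above amounts to "open, wait for the callback, then reissue the open at the server you are redirected to." A schematic Python sketch of that loop, with all class, method, and server names invented for illustration:

```python
class Redirector:
    """Stand-in for the Castor-fronted xrootd head node."""
    def __init__(self):
        self.staged = {}                        # path -> data server name

    def open(self, path):
        if path not in self.staged:
            # First open: trigger staging, tell the client to wait.
            self.staged[path] = "dataserver1"   # staging happens in background
            return ("wait", None)
        # Subsequent opens: file is staged, redirect immediately.
        return ("redirect", self.staged[path])

def client_open(redirector, path, max_tries=3):
    """Retry open() until the redirector sends us to a data server."""
    for _ in range(max_tries):
        status, server = redirector.open(path)
        if status == "redirect":
            return server       # client now reissues open() at this server
    raise TimeoutError("callback never arrived")
```

The real protocol delivers an asynchronous callback rather than making the client poll, but the control flow the client observes is the same: wait, then redirect.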
Limitations & Solutions
- Performance only after open(): all open requests must go through Castor (good for management but bad for ultimate speed)
- Reliant on Castor availability: a single point of failure
- Mitigation: xrootd-based clustering; maneuver around Castor whenever possible; reduce open() latency; reduce single-point dependence
Quick Note on Clustering
- xrootd servers can be clustered: increases access points and available data; allows automatic failover
- Structured point-to-point connections: cluster overhead (human and non-human) scales linearly; cluster size is not limited; I/O performance is not affected
- xrootd and olbd servers are always paired: a symmetric, cookie-cutter arrangement; the architecture can be used in very novel ways
File Request Routing
[Diagram: client, manager (head node/redirector), data servers A, B, C]
1. Client asks the manager to open file X
2. Manager asks the cluster "Who has file X?"
3. Server C answers "I have"
4. Client is told "go to C" and opens file X there
On a second open of X the manager redirects immediately: managers cache file locations. The client sees all servers as xrootd data servers. No external database.
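The routing above (broadcast "Who has file X?", then cache the answer) can be sketched as follows; the server names and query interface are invented for illustration:

```python
class Manager:
    """Head node / redirector: locates files with no external database."""
    def __init__(self, servers):
        self.servers = servers            # server name -> set of files held
        self.location_cache = {}          # file -> server name

    def locate(self, path):
        if path in self.location_cache:   # 2nd+ open: answer from cache
            return self.location_cache[path]
        for name, files in self.servers.items():   # "Who has file X?"
            if path in files:                      # "I have"
                self.location_cache[path] = name
                return name                        # client told "go to <name>"
        return None                                # nobody has the file

mgr = Manager({"A": set(), "B": set(), "C": {"X"}})
```

Because the cache is just in-memory state rebuilt by re-querying the cluster, losing a manager costs nothing that cannot be rediscovered, which is what removes the backup/recovery burden of a database.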
Adding an xroot Cluster
[Diagram: client; Castor disk servers with xrootd/olbd pairs and XrdCS2d (the orphan cluster, backed by Oracle); a separate xroot cluster of xrootd/olbd servers; the two clusters joined as peers through a peer xrootd/olbd pair]
Offline File Access
[Diagram: client; Castor disk servers (orphan cluster with XrdCS2d); peer xroot cluster; Oracle]
- 1st time: open(); the cluster asks "Have?"; no server has the file, so the client waits for a callback and is redirected to a peer, as previously shown
- 2nd+ time: open() proceeds directly
Online File Access
[Diagram: client; Castor disk servers (orphan cluster with XrdCS2d); peer xroot cluster; Oracle]
- 1st time: open(); the cluster asks "Have?"; a server answers "Yes!"; the client is redirected to that server
- 2nd+ time: open() and read() proceed directly
Some Unsaid Things
- File creation and open in write mode: requests are always vectored through Castor; this doesn't address file creation/write scalability
- Massive first-time opens of a particular file: requests pile up at the Castor redirector; not very efficient, and scalability is limited
- Is there another way? Yes: complicated multicast callbacks, or simply additional hardware
Alternative Castor Integration
[Diagram: client; a Castor instance (disk servers, Oracle) and a separate xroot cluster (xrootd/olbd servers), run side by side]
Independent systems; complete fault tolerance.
Advantages & Disadvantages
- Both systems are completely independent
  - High degree of fault tolerance: better recovery modes
  - Load isolation: better throughput; ability to speed-match response time; better transaction rate
- Requires additional hardware: servers and disk space
  - Approximately a one-time $50-100K USD (depending on priorities); however, about a 1 to 2 FTE reduction in ongoing effort
- Difficult to share resources across experiments
  - May be moot, as most experiments are always active; also enables local control
Conclusion
- Scalla can be integrated with a wide variety of systems: the Castor integration test is starting; the DPM (G-Lite) integration test is in full swing and is likely to be rewritten in the Castor model
- Alternative integration modes should be considered: they may be classic, but they have distinct advantages
- Whatever the outcome, it's the science that matters: use the best model to crank out the analysis as fast as possible
Acknowledgements
- Castor integration: Olof Barring, Sebastien Ponce, Rosa Maria Garcia Rioja
- Software collaborators: INFN/Padova: Fabrizio Furano (client side), Alvise Dorigo; Root: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenet (Windows); Alice: Derek Feichtinger, Guenter Kickinger; STAR/BNL: Pavel Jackl; Cornell: Gregory Sharp; SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks; BaBar: Pete Elmer (packaging)
- Operational collaborators: BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
- Funding: US Department of Energy, Contract DE-AC02-76SF00515 with Stanford University