
Scalla + Castor2

Andrew Hanushevsky
Stanford Linear Accelerator Center, Stanford University

27-March-07, Root Workshop
http://xrootd.slac.stanford.edu

Castor2/xrootd Integration – The Details


Outline

Introduction: design points, architecture, some details

Data serving and clustering: xrootd and olbd

Castor2 integration: dirty details

Conclusion


What is Scalla?

Scalla: Structured Cluster Architecture for Low Latency Access

Low latency access to data via xrootd servers:
POSIX-style byte-level random access (see the sketch below)
By default, arbitrary data organized as files
Hierarchical directory-like name space (does not have full file system semantics)
Protocol includes high performance features

Structured clustering provided by olbd servers:
Exponentially scalable and self-organizing
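To make the POSIX-style access concrete, here is a minimal client sketch assuming the xrootd POSIX wrapper (XrdPosixXrootd); the host, path, and exact method signatures are illustrative assumptions, not taken from the talk, so check the actual headers before relying on them.

```cpp
// Minimal sketch of POSIX-style byte-level random access over xrootd.
// Assumes the XrdPosix wrapper; verify exact signatures against the
// headers. Host and path are hypothetical.
#include "XrdPosix/XrdPosixXrootd.hh"
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    XrdPosixXrootd posix;  // initializes the client layer

    // Open by root:// URL, just like a POSIX open().
    int fd = XrdPosixXrootd::Open(
        "root://xrootd.example.org//store/data/run42.root", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    // Small-block sparse random access: 4 KB at an arbitrary offset.
    char buf[4096];
    long long off = 123456789LL;
    ssize_t n = XrdPosixXrootd::Pread(fd, buf, sizeof(buf), off);
    if (n >= 0) printf("read %zd bytes at offset %lld\n", n, off);

    XrdPosixXrootd::Close(fd);
    return 0;
}
```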


Scalla Design Points (the commercial)

High speed access to experimental data:
Write once, read many times processing mode
Small-block sparse random access (e.g., root files)
High transaction rate with rapid request dispersal (fast opens)

Low setup cost:
High efficiency data server (low CPU/byte overhead, small memory footprint)
Very simple configuration requirements
No 3rd party software needed (avoids messy dependencies)

Low administration cost:
Non-assisted fault tolerance
Self-organizing servers remove need for configuration changes
No database requirements (no backup/recovery issues)

Wide usability:
Full POSIX access
Server clustering for scalability
Plug-in architecture and event notification for applicability (HPSS, Castor, etc.)


xrootd Plugin Architecture

[Diagram: the xrootd server is a stack of plug-in layers around the Protocol Driver (Xrd), with events flowing between them:
Protocol (1 of n): xrootd
File System: ofs, sfs, alice, etc.
Authentication: gsi, krb5, etc.
Authorization: name based
Clustering: olbd
Storage System: oss, drm/srm, etc.
Management (XMI): Castor, DPM
lfn2pfn: prefix encoding]

Many ways to accommodate other systems.
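As a flavor of how these layers are consumed, here is a skeletal sketch of a management (XMI-style) plug-in. The interface and method names below are simplified assumptions for illustration; the real contract is defined by the XMI header in the xrootd source.

```cpp
// Skeletal sketch of a management (XMI-style) plug-in. This interface is a
// simplified assumption; the xrootd source defines the authoritative one.
class XmiPlugin {
public:
    // Called on client open(): return true if the event was handled here
    // (e.g., forwarded to Castor's stager) instead of by the default path.
    virtual bool Open(const char *lfn, int mode) = 0;

    // Called on close() so the back end (Castor) can be told the file is done.
    virtual bool Close(const char *lfn) = 0;

    virtual ~XmiPlugin() {}
};

// A Castor-flavored implementation would forward Open() to prepareToGet()
// and Close() to stageUpdtDone(), as the following slides show.
class CastorXmi : public XmiPlugin {
public:
    bool Open(const char *lfn, int mode) override {
        // prepareToGet(lfn); poll for location; redirect client (hypothetical)
        return true;
    }
    bool Close(const char *lfn) override {
        // stageUpdtDone(lfn); hypothetical wrapper around the Castor call
        return true;
    }
};
```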


The Upshot

Plug-in architecture plus events: easy to integrate other systems
Low latency data server: improved performance
Depending on the integration, open latency may be high

An actual integration follows: how Castor works now, and how Castor works with xrootd


Castor 2 Instance With Rootd

[Diagram: client, Castor (Oracle) instance, disk servers, rootd.]

1: open() -> stage_open() -> stageGet()
2: Castor selects a server and stages in the file
3: Castor starts rootd via LSF
4: Castor responds with the file server and path
5: Client connects to rootd and opens the file
6: Client reads the file
7: close() -> stageUpdtDone()
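The same seven steps, condensed into a compilable sketch: only the Castor entry points named on the slide (stage_open, stageGet, stageUpdtDone) are real; every type and helper below is a hypothetical stub standing in for them.

```cpp
// Compilable sketch of the rootd-era open flow. Only the Castor entry
// points named on the slide are real; all structs and helpers are stubs.
#include <string>
#include <iostream>

struct StageHandle { std::string lfn; };               // stub
struct DiskServer  { std::string host, pfn; };         // stub

StageHandle stage_open_stub(const std::string &lfn) {  // step 1
    return {lfn};                                      // stands in for stage_open()/stageGet()
}
DiskServer stage_and_select(const StageHandle &h) {    // steps 2 & 4
    return {"diskserver01", "/castor/pool/" + h.lfn};  // Castor stages file, picks server
}
void launch_rootd_via_lsf(const DiskServer &s) {       // step 3
    std::cout << "LSF starts rootd on " << s.host << "\n";  // one rootd per open file
}
void read_and_close(const DiskServer &s) {             // steps 5-7
    std::cout << "client reads " << s.pfn
              << " then close() -> stageUpdtDone()\n";
}

int main() {
    StageHandle h = stage_open_stub("lfn/run42.root"); // 1: open()
    DiskServer  s = stage_and_select(h);               // 2 & 4: server + path
    launch_rootd_via_lsf(s);                           // 3
    read_and_close(s);                                 // 5-7
    return 0;
}
```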


Limitations & SolutionsNot Scalable One rootd process per open file

Resources are quickly consumed

Not Very Good Performance or Robustness rootd is old generation

We’ve learned a lot about servers since then

What We Really Want A long-lived ultra efficient data server

xrootd is a likely candidate Plug-in architecture ideal for integration

• So surprise here


Adding xrootd Access

[Diagram: each Castor disk server gains an xrootd/olbd pair running the XrdCS2d XMI plug-in; these xrootd servers form an "orphan" cluster alongside the existing Castor (Oracle) instance, and clients connect to the cluster.]


Using Orphan xrootd Access

[Diagram: client, Castor (Oracle) instance, and disk servers running xrootd/olbd with the XrdCS2d XMI plug-in as an orphan cluster.]

1: open()
2: prepareToGet()
3: Client told to wait for callback
4: Server selected and file staged
5: XrdCS2d gets location and fileid via polled query
6: Client redirected
7: open()
8: stage_open()
9: LSF launches XrdCS2e to create the LFN to PFN symlink
10: read()
11: close() -> stageUpdtDone()
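Step 9 amounts to publishing the staged physical file (PFN) under its logical file name (LFN). Below is a minimal sketch of that symlink step in the spirit of XrdCS2e; the example paths and the helper name are hypothetical.

```cpp
// Minimal sketch of step 9: expose the staged physical file (PFN) under
// its logical file name (LFN) via a symlink, as the XrdCS2e helper does.
// Both example paths are hypothetical.
#include <unistd.h>
#include <cerrno>
#include <cstdio>

int publish_lfn(const char *pfn, const char *lfn) {
    // Clear any stale link first; a missing link (ENOENT) is harmless.
    if (unlink(lfn) != 0 && errno != ENOENT) { perror("unlink"); return -1; }
    if (symlink(pfn, lfn) != 0) { perror("symlink"); return -1; }  // LFN -> PFN
    return 0;
}

int main() {
    return publish_lfn("/castor/pool07/48713.42", "/export/lfn/run42.root");
}
```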


Limitations & Solutions

Performance only after open() All open requests must go through Castor

Good for management but bad for ultimate speed Reliant on Castor availability

Single point of failure

Mitigate with xrootd based clustering Maneuver around Castor whenever possible

Reduce open() latency Reduce single-point dependence


Quick Note on Clustering

xrootd servers can be clustered: this increases access points and available data, and allows for automatic failover

Structured point-to-point connections: cluster overhead (human and non-human) scales linearly, cluster size is not limited, and I/O performance is not affected

Always pairs xrootd and olbd servers: a symmetric, cookie-cutter arrangement whose architecture can be used in very novel ways (see the sketch below)
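A quick illustration of the "exponentially scalable" claim: each olbd manager or supervisor coordinates a bounded set of subscribers (64 in the xrootd design), so server capacity grows as fanout^depth while the load on any single node stays constant. The fan-out constant below reflects that design; treat it as descriptive, not configuration advice.

```cpp
// Why the clustering is exponentially scalable: capacity grows as
// fanout^depth while per-node load stays bounded.
#include <cstdio>
#include <cmath>

int main() {
    const int fanout = 64;  // subscribers per olbd node (per the xrootd design)
    for (int depth = 1; depth <= 3; ++depth)
        printf("tree depth %d -> up to %.0f data servers\n",
               depth, std::pow(static_cast<double>(fanout), depth));
    // depth 1 ->     64
    // depth 2 ->   4096
    // depth 3 -> 262144
    return 0;
}
```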


File Request Routing

[Diagram: client, manager (head node/redirector), and data servers A, B, C forming a cluster. The client sees all servers as xrootd data servers.]

Client asks the manager to open file X
Manager asks the data servers: who has file X?
Server C answers: I have it
Manager tells the client: go to C
Client opens file X at server C
On a second open of X, the manager answers "go to C" immediately: managers cache file locations, so no external database is needed
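In protocol terms, this routing is just the client reacting to the response of its open request. The sketch below condenses that loop; kXR_redirect and kXR_wait are real xrootd response codes, while the stub server and helper types are hypothetical stand-ins for the client library's internals.

```cpp
// Simplified sketch of client-side file request routing. kXR_redirect and
// kXR_wait are real xrootd response codes; everything else is a stand-in.
#include <string>
#include <iostream>
#include <unistd.h>

enum Status { kXR_ok, kXR_redirect, kXR_wait };  // condensed subset

struct Response { Status status; std::string host; int waitsec; };

// Stub server: the first open at the manager redirects to data server "C",
// and the reopen at "C" succeeds. Real responses come off the wire.
Response send_open(const std::string &host, const std::string &path) {
    if (host == "manager") return {kXR_redirect, "C", 0};
    return {kXR_ok, host, 0};
}

void open_with_routing(std::string host, const std::string &path) {
    for (;;) {
        Response r = send_open(host, path);      // the open request
        if (r.status == kXR_redirect) {          // manager says "go to C"
            std::cout << "redirected to " << r.host << "\n";
            host = r.host;                       // reconnect to the data server
            continue;
        }
        if (r.status == kXR_wait) {              // staging in progress
            sleep(r.waitsec);                    // pause, then retry
            continue;
        }
        std::cout << "open succeeded at " << host << "\n";
        break;                                   // kXR_ok
    }
}

int main() { open_with_routing("manager", "/store/file_X"); return 0; }
```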


Adding an xroot Cluster

[Diagram: an independent xroot cluster (xrootd/olbd servers) is peered with the Castor-side orphan cluster (disk servers running xrootd/olbd with XrdCS2d); the Castor (Oracle) instance sits behind its cluster, and clients connect to the xroot cluster.]


Offline File Access

[Diagram: client, xroot cluster, and the peered Castor-side cluster with XrdCS2d; the olbd layers exchange "Have?" queries; Oracle backs the Castor instance.]

1st time: open(), client told to wait for callback, then redirected to the peer cluster
2nd+ time: open() proceeds as previously shown
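The first-time/second-time split reduces to one decision in the XrdCS2d plug-in: if the "Have?" query finds no online copy, park the client on a callback while Castor stages the file; otherwise redirect at once. A hedged sketch of that decision, with everything except the prepareToGet() name hypothetical:

```cpp
// Hedged sketch of the offline-access decision in an XrdCS2d-style plug-in.
// prepareToGet() is the Castor call named on the slide; the rest is
// hypothetical illustration.
#include <string>
#include <iostream>

enum Action { RedirectToServer, WaitForCallback };

// Stub for the olbd "Have?" query: pretend nothing is online yet.
bool cluster_has_online(const std::string &) { return false; }
// Stub for the Castor stage request named on the slide.
void prepareToGet(const std::string &lfn) {
    std::cout << "prepareToGet(" << lfn << ")\n";
}

Action handle_open(const std::string &lfn) {
    if (cluster_has_online(lfn))   // 2nd+ time: a copy is already on disk
        return RedirectToServer;   // send the client straight to it
    prepareToGet(lfn);             // 1st time: ask Castor to stage the file
    return WaitForCallback;        // client parks on a callback and is
}                                  // redirected to the peer once staged

int main() {
    Action a = handle_open("lfn/run42.root");
    std::cout << (a == WaitForCallback ? "wait for callback\n" : "redirect\n");
    return 0;
}
```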


Online File Access

[Diagram: client, xroot cluster, and the peered Castor-side cluster with XrdCS2d; the "Have?" query is answered "Yes!"; Oracle backs the Castor instance.]

1st time: open(), client redirected to the server holding the file
2nd+ time: open() and read() proceed directly


Some Unsaid Things

File creation and open in write mode: requests are always vectored through Castor, so this doesn't address file creation/write scalability

Massive first-time opens of a particular file: requests will pile up at the Castor redirector, which is not very efficient and has limited scalability

Is there another way? Yes: complicated multicast callbacks, or simply additional hardware


Alternative Castor Integration

[Diagram: a complete Castor instance (disk servers plus Oracle) and an independent xroot cluster (xrootd/olbd servers) stand side by side; clients can use either.]

Independent systems, complete fault tolerance.


Advantages & DisadvantagesBoth System Completely Independent High degree of fault tolerance

BetterBetter recovery modes Load isolation

BetterBetter throughput Ability to speed match response time

BetterBetter transaction rateRequires Additional Hardware Servers and disk space

Approximately one timeone time $50-100K USD (depending on priorities) However, about 1 to 2 FTE reduction in ongoingongoing effort

Difficult to Share Resources Across Experiments May be moot as most experiments are always active and Enables local controlEnables local control


Conclusion

Scalla can be integrated with a wide variety of systems: the Castor integration test is starting, and the DPM (G-Lite) integration test is in full swing (likely to be rewritten in the Castor model)

Alternative integration modes should be considered: they may be classic, but they have distinct advantages

Whatever the outcome, it's the science that matters: use the best model to crank out the analysis as fast as possible


Acknowledgements

Castor integration: Olof Barring, Sebastien Ponce, Rosa Maria Garcia Rioja

Software collaborators:
INFN/Padova: Fabrizio Furano (client side), Alvise Dorigo
Root: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenet (windows)
Alice: Derek Feichtinger, Guenter Kickinger
STAR/BNL: Pavel Jackl
Cornell: Gregory Sharp
SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
BaBar: Pete Elmer (packaging)

Operational collaborators: BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC

Funding: US Department of Energy, Contract DE-AC02-76SF00515 with Stanford University