Efficient Access to Many Small Files in a Grid Filesystem Douglas Thain and Christopher Moretti...
Efficient Access to Many Small Files in a Grid Filesystem
Douglas Thain and Christopher Moretti
University of Notre Dame
Efficient Access to Many Small (and Big) Files in a Grid Filesystem
Abstract
Many grid data tools focus on transferring, storing, and managing large (GB-TB) files.
But many users need to manage, transfer, and process lots (1000s) of small (KB-MB) files.
We describe protocols and interfaces for manipulating many small files over wide area networks. (Doesn’t hurt large files, either.)
Implemented in the Chirp file system.
Performance:
– Best case: order of magnitude improvement.
– Worst case: no slower than before.
Who has lots of small files?
Anyone using a batch system.
– One file for submit, input, output, error, log...
Anyone using a large software package.
– Executables, libraries, config files...
Anyone using a filesystem like a database.
– Genomics, astronomy, physics...
Anyone who likes to write shell scripts.
– foreach host in list ssh $host > $host.output
Why is this a problem?
Users do the “sensible” thing:
– foreach file in (list) do transfer done
The “sensible” thing performs miserably:
– New TCP Connection
– SSL Authentication
– Configuration Operations
– Slow Start Again
Result is KB/s on a GB/s link.
Why not just use tar?
If you can, you should!
Sometimes you cannot:
– The system semantics demand multiple files.
– Packing and unpacking can be very slow.
– Not enough disk space to unpack.
– Different apps select different data subsets.
– Using an existing script or program.
Users don’t know or care that it’s a distributed system, so why should they change?
The Challenge:
How to design interfaces so that users get the expected performance and behavior?
Requirements for a Grid Filesystem
– Transparent access to files in the same manner as a local Unix filesystem.
– Non-privileged deployment at both client and server. (root not possible on the grid.)
– User control over policies for naming, caching, consistency, and fault tolerance.
– Flexible access controls for sharing.
– Good performance on both small and large files.
Chirp/Parrot – A Grid Filesystem
[Diagram: an Ordinary Unix Program issues unix system calls, which Parrot captures (ptrace trap). Parrot talks over a Single TCP Stream (Automatic Recovery) to Chirp, which serves an Ordinary Unix Filesystem. Both Parrot and Chirp are labeled "No Privs Needed!"]
Authorization:
– kerberos:[email protected] RWLDA
– globus:/O=ND/CN=Joe RWLDA
– hostname:*.nd.edu RL
– group:server.nd.edu/team RWL
Protocol:
– open / pread / pwrite / close
– stat / mkdir / rmdir / unlink
– getfile / putfile / movefile
Authentication:
– Kerberos / Globus / Hostname / Unix
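
The authorization entries on this slide pair a subject (an authentication method plus a name, possibly containing a wildcard) with a string of rights letters. The Python sketch below shows one way such a list could be checked. It assumes shell-style wildcard matching and treats each letter as an independent right. The names `parse_acl` and `has_right` are hypothetical, group membership is not modeled, and the real Chirp server may evaluate ACLs differently.

```python
from fnmatch import fnmatchcase

# Two entries copied from the slide; the group entry is left out because
# group membership lookup is not modeled in this sketch.
ACL_TEXT = """\
globus:/O=ND/CN=Joe RWLDA
hostname:*.nd.edu RL
"""

def parse_acl(text):
    """Turn 'subject-pattern rights' lines into (pattern, set_of_letters) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The rights string is the last whitespace-separated field.
        pattern, rights = line.rsplit(None, 1)
        entries.append((pattern, set(rights)))
    return entries

def has_right(entries, subject, right):
    """True if any entry whose pattern matches the subject grants the right.

    Assumption: patterns use shell-style wildcards, as '*.nd.edu' suggests.
    """
    return any(fnmatchcase(subject, pattern) and right in rights
               for pattern, rights in entries)

entries = parse_acl(ACL_TEXT)
print(has_right(entries, "hostname:laptop.nd.edu", "R"))
print(has_right(entries, "hostname:laptop.nd.edu", "W"))
```

Under these assumptions a host inside nd.edu can read and list but not write, while the Globus subject holds every right listed for it.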
Ordinary Unix Commands
> parrot tcsh
> ls /chirp
alpha.nd.edu
beta.nd.edu
...
> cd /chirp/alpha.nd.edu/mydir
> cp /tmp/bigdata .
> emacs mydata.txt
Parrot Specific Commands
> parrot tcsh
> parrot_whoami
globus:/O=ND/CN=Joe
> parrot_getacl /chirp/alpha.nd.edu/
kerberos:[email protected] RWLDA
globus:/O=ND/CN=Joe RWL
hostname:*.nd.edu RL
Chirp as Remote Filesystem
[Diagram: App/Parrot pairs spread across Grid Site A and Grid Site B access a single Chirp Server backed by a Unix Filesystem. Labels: Grid Middleware, Cert, Secured by GSI.]
Chirp as Cluster Filesystem
[Diagram: App/Parrot pairs spread across Grid Site A and Grid Site B access four Chirp Servers, each backed by its own Unix Filesystem. Labels: dirserver, auxdb.]
Sample Applications
Image Processing for Biometrics
– Moretti et al, PCGRID 2007
Bioinformatics on EGEE
– Blanchet et al, Grid 2006
High Energy Physics on LCG
– Sfiligoi et al, CHEP 2005
Molecular Dynamics Repository
– Wozniak et al, HPDC 2005
Remote DB Access on EDG
– Klous et al, CCPE 2005
What About FTP?
FTP is a great data transfer system, but it was never designed to be a file system:
– New TCP stream per data transfer.
– New TCP stream for each directory list.
– Lots of connections can overwhelm net devices.
– Coarse errors: 550 for all file system errors.
– Semantic problems: e.g. empty directory.
– Unix access controls. (But see SecPAL.)
– Wildly varying implementations and support.
FTP Protocol Reminder
[Diagram: FTP Client and FTP Server joined by two connections.]
Control Connection: AUTH GSSAPI, MIC, MIC, PORT, RETR
Data Connection: AUTH GSSAPI, MIC, MIC, Data Transfer
Minimum of four round trips (plus auth overhead) to fetch a file, plus loss of the TCP window.
Common practice is a new control connection for every data transfer!
What About NFS?
NFS was designed for a local area network among (relatively) trusted hosts.
– Fine-grained file access very slow on WAN.
– Kernel support and root assistance needed to start server, mount client, change target.
– Unix UID for ownership, access control.
– Need to bind to privileged port, often filtered.
– Use of “file handles” to refer to files makes it very difficult to build a user-level server, plus lots of lookup operations over the WAN.
NFS Protocol Reminder
[Diagram: NFS Client and NFS Server exchanging one request per round trip.]
lookup(00,a), lookup(10,b), lookup(20,c), ...
read 4KB, read 4KB, read 4KB, ...
On a WAN, throughput is limited to 4 KB / latency:
– 10 ms = 400 KB/s
– 100 ms = 40 KB/s
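
The two figures on this slide follow from dividing the block size by the round-trip latency. A small Python check, assuming 1 KB = 1000 bytes and exactly one outstanding request at a time (the function name `rpc_throughput` is illustrative only):

```python
def rpc_throughput(block_bytes, rtt_seconds):
    """Upper bound in bytes/second when one block is fetched per round trip."""
    return block_bytes / rtt_seconds

BLOCK = 4_000  # the slide's 4 KB, taking 1 KB = 1000 bytes (assumption)

for rtt_ms in (10, 100):
    kb_per_s = rpc_throughput(BLOCK, rtt_ms / 1000) / 1000
    print(f"{rtt_ms} ms RTT: {kb_per_s:.0f} KB/s")
```

The bound depends only on block size and latency, so a faster link does not help once the round trip dominates.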
Chirp Hybrid Protocol Overview
[Diagram: Chirp Client and Chirp Server sharing one connection.]
auth globus (8 RTT)
open, read, write, close, ...
getfile(“mydata”): server replies with size and data
putfile(“otherdata”,size): client follows with data
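
Streaming whole files over one connection relies on each transfer announcing its size first, so the receiver knows where one file ends and the next request or file begins. The Python sketch below illustrates that idea with a hypothetical framing (a decimal size line followed by the raw bytes). It is not the actual Chirp wire format, and an in-memory buffer stands in for the TCP stream.

```python
import io

def send_file(stream, data):
    """Write a size line, then the raw bytes (hypothetical framing)."""
    stream.write(f"{len(data)}\n".encode("ascii"))
    stream.write(data)

def recv_file(stream):
    """Read one size-prefixed file from the stream."""
    header = stream.readline()
    if not header:
        raise EOFError("stream closed before a size header arrived")
    size = int(header.decode("ascii"))
    # On a real socket this read would need a loop until `size` bytes arrive.
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream closed in the middle of a file")
    return data

# Three files, including an empty one and one containing newlines,
# travel back to back over the same stream.
files = [b"alpha", b"", b"line one\nline two\n"]
wire = io.BytesIO()
for payload in files:
    send_file(wire, payload)

wire.seek(0)
received = [recv_file(wire) for _ in files]
print(received == files)
```

Because the size comes first, file contents never need escaping and the connection stays open for the next operation.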
Protocol Comparison
FTP – Stream per File
– Latency = 4+ RTT for each file
– Throughput = TCP limit after slow start
NFS – Remote Procedure Call
– Latency = 1 RTT for each file
– Throughput = block size / latency
Chirp – Hybrid
– Latency = 1 RTT for each file
– Throughput = TCP limit in steady state
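
To see how these latency and throughput terms combine for many small files, here is a rough Python model. It is a sketch under simplifying assumptions: no authentication cost, no TCP slow start, no pipelining, and a fixed 4000-byte RPC block. The file count, file size, RTT, and bandwidth are made-up inputs, not measurements from the talk.

```python
import math

def ftp_time(n_files, size, rtt, bandwidth, setup_rtts=4):
    """Stream per file: several round trips of setup, then a streamed transfer."""
    return n_files * (setup_rtts * rtt + size / bandwidth)

def rpc_time(n_files, size, rtt, block=4000):
    """RPC per block: one round trip to name the file, then one per block."""
    blocks = math.ceil(size / block)
    return n_files * (rtt + blocks * rtt)

def hybrid_time(n_files, size, rtt, bandwidth):
    """Hybrid: one round trip per file, then a streamed transfer."""
    return n_files * (rtt + size / bandwidth)

# Hypothetical workload: 1000 files of 100 KB, 100 ms RTT, 10 MB/s link.
N, SIZE, RTT, BW = 1000, 100_000, 0.1, 10_000_000

print(f"stream per file: {ftp_time(N, SIZE, RTT, BW):.0f} s")
print(f"RPC per block:   {rpc_time(N, SIZE, RTT):.0f} s")
print(f"hybrid:          {hybrid_time(N, SIZE, RTT, BW):.0f} s")
```

With these inputs the per-block RPC scheme is slowest and the hybrid is fastest; the ordering between the first two shifts with file size, since very small files fit in a single block.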
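Standard Unix Copy
cp /tmp/source /chirp/B/target
[Diagram: cp runs on top of Parrot, which has a Local driver attached to the Local Disk and a Chirp driver attached to the Chirp Server.]
cp issues: open(source), open(target), loop: read/write
Parrot forwards each open and read for the source to the Local Disk, and each open and write for the target to the Chirp Server.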
Problem: The system does not know the context of the operation!
Solution: Introduce a higher-level operation, copyfile, that exploits the context.
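Improved Copy with Copyfile
cp /tmp/source /chirp/B/target
[Diagram: newcp runs on top of Parrot, which has a Local driver attached to the Local Disk and a Chirp driver attached to the Chirp Server.]
newcp issues: copyfile(source,target)
Parrot performs open(source) on the Local Disk and a single putfile(target) to the Chirp Server.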
Is it reasonable to modify cp?
Installation:
– Cannot modify /bin/cp.
– Install new parrot_cp.
– Alias cp or link named “cp” in PATH.
Backwards compatibility:
– parrot_cp without Parrot falls back to normal.
– Ordinary cp on Parrot behaves as before.
– parrot_cp on a different filesystem falls back.
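
The fallback behavior listed above can be sketched in Python: try the high-level operation when one is offered, and drop back to the ordinary open/read/write loop when it is not available. The `fast_copy` hook is a hypothetical stand-in for whatever parrot_cp uses to request copyfile; signaling "unsupported" with `NotImplementedError` is also an assumption of this sketch.

```python
import os
import tempfile

def copy_with_fallback(source, target, fast_copy=None, block_size=65536):
    """Copy source to target; return which path was taken ('fast' or 'fallback')."""
    if fast_copy is not None:
        try:
            fast_copy(source, target)  # one high-level request
            return "fast"
        except NotImplementedError:
            pass  # this filesystem has no copyfile; use the ordinary loop
    with open(source, "rb") as src, open(target, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
    return "fallback"

def unsupported(source, target):
    """Stand-in for a filesystem that does not offer copyfile."""
    raise NotImplementedError("no copyfile on this filesystem")

with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "source")
    dst_path = os.path.join(workdir, "target")
    with open(src_path, "wb") as f:
        f.write(b"payload" * 1000)
    print(copy_with_fallback(src_path, dst_path, fast_copy=unsupported))
```

Either path leaves the same bytes in the target, which is what lets the replacement command be dropped in without users noticing.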
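Improved Copy with Copyfile
cp /chirp/A/source /chirp/B/target
[Diagram: newcp runs on top of Parrot, whose Chirp driver talks to Chirp Server A; Chirp Server A talks directly to Chirp Server B.]
newcp issues: copyfile(source,target)
Parrot sends thirdput(source,B,target) to Chirp Server A.
Chirp Server A sends putfile(target) to Chirp Server B.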
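Directory Copy
cp -r /chirp/A/mydir /chirp/B/mydir
[Diagram: cp runs on top of Parrot. Chirp Server A holds mydir containing ACL, X, Y, Z; Chirp Server B receives mydir with its ACL and the files X, Y, Z one at a time.]
Requests issued by cp through Parrot:
– mkdir(mydir)
– setacl(mydir)
– thirdput(/mydir/X,B,/mydir/X)
– thirdput(/mydir/Y,B,/mydir/Y)
– thirdput(/mydir/Z,B,/mydir/Z)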
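Improved Directory Copy
cp -r /chirp/A/mydir /chirp/B/mydir
[Diagram: cp runs on top of Parrot. Chirp Server A holds mydir containing ACL, X, Y, Z; Chirp Server B ends up with the same mydir, ACL, X, Y, Z.]
Request issued by cp through Parrot: thirdput(/mydir,B,/mydir)
Chirp Server A then sends to Chirp Server B: mkdir, putfile*3, setacl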
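
The difference between the two directory-copy slides is how many requests the client itself must issue across the wide area. The Python sketch below builds both request lists for comparison. The tuple layout and function names are hypothetical, and the request names are simply those shown on the slides.

```python
def per_file_plan(dirname, files, dest):
    """Client-issued requests when cp copies the directory entry by entry."""
    plan = [("mkdir", dirname), ("setacl", dirname)]
    for name in files:
        path = f"{dirname}/{name}"
        plan.append(("thirdput", path, dest, path))
    return plan

def recursive_plan(dirname, dest):
    """Client-issued requests when one thirdput covers the whole directory."""
    return [("thirdput", dirname, dest, dirname)]

original = per_file_plan("/mydir", ["X", "Y", "Z"], "B")
improved = recursive_plan("/mydir", "B")
print(len(original), "client requests versus", len(improved))
```

In the per-file plan the client pays for two requests plus one per file; in the recursive plan the servers exchange the mkdir, putfile, and setacl traffic between themselves.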
You get the idea...
ls -la D
– Original: getdir D + N*stat
– Improved: getlongdir D
rm -rf D
– Original: getdir D + N*unlink (recursive)
– Improved: rmall D
md5sum F
– Original: open F + N*read + close
– Improved: md5 F
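
The md5sum case can be illustrated with a small Python simulation that counts requests. `CountingFile` is a hypothetical stand-in for a remote file, not a Chirp client API; it ignores the open and close requests and only contrasts block-by-block reads with a single server-side checksum request.

```python
import hashlib

class CountingFile:
    """Holds bytes in memory and counts requests, standing in for a remote file."""

    def __init__(self, data):
        self.data = data
        self.requests = 0

    def read_block(self, offset, length):
        """One remote read request."""
        self.requests += 1
        return self.data[offset:offset + length]

    def md5(self):
        """One request; the checksum is computed where the data lives."""
        self.requests += 1
        return hashlib.md5(self.data).hexdigest()

def client_side_md5(remote, block=4096):
    """Pull every block across the network and hash it locally."""
    digest = hashlib.md5()
    offset = 0
    while True:
        chunk = remote.read_block(offset, block)
        if not chunk:
            break
        digest.update(chunk)
        offset += len(chunk)
    return digest.hexdigest()

original = CountingFile(b"x" * 10_000)
improved = CountingFile(b"x" * 10_000)
same = client_side_md5(original) == improved.md5()
print(same, original.requests, improved.requests)
```

Both approaches yield the same digest; the block-by-block version needs a number of requests that grows with file size, while the server-side version needs one.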
Final Example
ls -la /chirp/alpha/data
md5sum /chirp/alpha/data/*
cp -r /chirp/alpha/data /chirp/beta/data
md5sum /chirp/beta/data/*
rm -rf /chirp/alpha/data
Original Implementation
[Diagram: app runs on top of parrot, which talks to chirp server A and chirp server B. The traffic for ls -la, md5, cp, rm, and the second md5 all passes through parrot.]
Improved Implementation
[Diagram: app runs on top of parrot, which talks to chirp server A and chirp server B. Arrows are labeled ls -la, md5, cp, md5, and rm.]
Performance on Script
[Bar chart: time (seconds), axis from 0 to 180, for each step of the script (list, checksum, move, checksum, delete), comparing Original and Improved.]
The Challenge:
How to design interfaces so that users get the expected performance and behavior?
Summary
Good small file performance requires attention to low level network protocols.
– getfile, putfile, thirdput, rmall, checksum
Exploiting protocols requires minor changes to the Unix I/O interface.
– copyfile, rmall, checksum, others?
Easy to apply those changes in a user-transparent way.
– cp, rm, md5sum all operate as normal
Usable performance in a wide-area FS.
For more information...
Douglas Thain
– [email protected]
Chris Moretti
– [email protected]
Parrot and Chirp
– http://www.cctools.org