Page 1

RechenZentrum Garching der Max-Planck-Gesellschaft
Sept 29, 2009, European AFS Conference 2009, Rome

Hartmut Reuter, [email protected]

Tutorial OpenAFS + Object Storage

Hartmut Reuter, RZG, Germany
Christof Hanke, RZG, Germany

Felix Frank, DESY Zeuthen, Germany

Page 2

Overview

● What is AFS/OSD ? (in case you didn't attend prior presentations)

● Update to AFS/OSD since Graz

– Policies

– Embedded Filesystems

● Tutorial

– Typical use cases for AFS/OSD

– Setting up AFS/OSD

– Backup strategies for data in OSDs and metadata, link counts

– Running a cell with AFS/OSD

● Reference: New commands and subcommands

Page 3

What is object storage

[Diagram: clients access OSDs and a metadata server over an IP network]

● Object storage systems are distributed filesystems which store data in object storage devices (OSD) and keep metadata in metadata servers.

● Examples of object storage systems are Lustre and Panasas.

● Access from a client to a file in object storage consists of the following steps:

– an RPC to the metadata server, which checks permissions and returns a special encrypted handle for each object the file consists of

– RPCs to the OSDs to read or write the objects using the handles obtained from the metadata server

● The OSD can decrypt the handle by use of a secret shared between OSD and metadata server. The advantage of this technique is that the OSD doesn't need any knowledge about users and access rights.

● Files may be striped or mirrored over multiple OSDs


Page 4

“AFS/OSD” or “OpenAFS + Object Storage“

● Is an extension to OpenAFS which allows files to be stored in object storage.

● The OSDs (object storage devices) for OpenAFS use the rx-protocol and are therefore called “rxosd”.

● The AFS fileserver stores the metadata of files which are in object storage.

● A new ubik-database stores location and features of the OSDs.

● The decision which files should go into object storage can be based on policies.

● Files in object storage can be simple or striped over multiple OSDs, and/or have copies in multiple OSDs

● AFS/OSD allows the use of HSM systems to migrate inactive files to tape

[Diagram: clients, AFS fileserver, OSDDB, rxosds, and an archival rxosd in front of an HSM system]

Page 5

Example: reading a file in OSD

[Diagram: client, fileserver, OSDDB, and several OSDs; the numbers mark the steps below]

(1) The fileserver gets a list of OSDs from the OSDDB. (2) The client knows from the status information that the file it wants to read is in object storage. It therefore does an RPC to the fileserver to get the file's location, the permission to read it, and a lock on the file during the asynchronous fetch operation. The fileserver returns an encrypted handle to the client. (3) With this handle the client gets the data from the OSD. (4) Finally the client does an EndAsynchronousFetch RPC to the fileserver to release the lock.

Page 6

Example: writing a file into OSD

[Diagram: client, fileserver, OSDDB, and several OSDs; the numbers mark the steps below]

(1) The fileserver gets a list of OSDs from the OSDDB. (2) The client asks the fileserver where to store the file it has in its cache. (3) The fileserver decides, following its policies, to store it in object storage and chooses an OSD where it allocates an object. (4) The client asks the fileserver for the permission and the location of the object and to start an asynchronous store operation, which locks the file. (5) The client stores the data in the OSD using the encrypted handle it got from the fileserver. (6) The client does an EndAsyncStore RPC to the fileserver to set the actual length of the file and to release the lock.

Page 7

Objects

• Data are stored in objects inside OSDs.

– simplest case: one file == one object

[Diagram: files stored as single objects inside OSDs]

Page 8

Multiple Objects

● Data of a file could be stored in multiple objects, allowing for

– data striping (up to 8 stripes, each in a separate OSD)

– data mirroring (up to 8 copies, each in a separate OSD)

[Diagram: a file striped or mirrored over multiple objects on different OSDs]

Page 9

Objects + Segments

● More data could later be appended to a file already existing on some OSDs. The appended data may be stored on different OSDs (in case there is not enough free space on the old ones)

– This leads to the concept of segments

– The appended part is stored in objects belonging to a new segment.

[Diagram: appended data stored as objects of a new segment on other OSDs]

Page 10

Objects + Segments + File Copies

● The whole file could get a copy on an archival OSD (tape, HSM)

– This leads to the concept of file copies

[Diagram: a complete copy of the file on an archival OSD]

Page 11

The complete osd metadata

● The vnode of an AFS file points to quite a complex structure of osd metadata:

– Objects are contained in segments

– Segments are contained in file copies

– additional metadata such as md5-checksums may be included.

● Even in the simplest case of a file stored as a single object the whole hierarchy (file copy, segment, object) exists in the metadata.

● The osd metadata of all files belonging to a volume are stored together in a single volume special file

– This osdmetadata file has slots of constant length which the vnodes point to

– In case of complicated metadata multiple slots can be chained

● The osd metadata are stored in network byte order to allow easy transfer during volume move or replication to other machines.


Page 12

osd metadata in memory

● These structures are serialized in net-byte-order by means of rxgen-created xdr-routines into slots of the volume special file “osdmetadata”.

● In memory the file is described by a tree of C structures for the different components:

[Diagram of the in-memory tree: osd_p_fileList (len, val) → osd_p_file (archvers, archtime, spare, segmList, metaList) → osd_p_segm (length, offset, stripes, stripesize, copies, objList) → osd_p_obj (obj_id, part_id, osd_id, stripe); the metaList points to osd_p_meta entries]

Page 13

Update to OpenAFS/OSD

● Since the last European meeting in Graz some new things came in

– Felix Frank completed his work on osd policies

– At the Workshop at Stanford I presented OpenAFS/OSD with embedded cluster filesystems (Lustre or GPFS)

● After the introduction of git and gerrit this year, the integration of the OSD code into OpenAFS could start. After discussions in openafs-devel and AFS3-std:

– a new transaction model has been applied to asynchronous data access enforcing mandatory locking

– dump tags have been changed to be compliant with the AFS3 standard proposal

● A web site at RZG integrates all information about OpenAFS/OSD and gives access to the source in the subversion server at DESY.

– http://pfanne.rzg.mpg.de/trac/openAFS-OSD

Page 14

Policies

● Since the last workshop Felix Frank (DESY Zeuthen) implemented the policies. Policies are rules stored in the OSD-database which have a number and a name. A policy number can be set for a whole volume or individual directories inside the volume.

● The fileserver gets the policies from the OSD-database and applies the rules when new files in the directory or volume are created or written for the 1st time.

● Policies basically can use filenames (suffixes or regular expressions) and file size to state whether a file

– should be stored on the local disk or go into object storage

– and if into object storage whether striped or mirrored

– and if striped with which stripe size.

● Simple example: policy 3, named “root”:

  ~'*.root' => location=osd, stop;

Means: all files with suffix “.root” should be stored in object storage independent of their size. (A typical problem in the HEP community, because the program root does an fsync() after writing the tiny header of the file.)
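To attach a policy like this to a volume you set its number as the volume's osdpolicy. A minimal sketch, reusing the „vos setfields“ command shown later in this tutorial (the volume name is just an example):

  > vos setfields myvolume -osdpolicy 3
  >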

Page 15

File creation rate

Evaluation of a policy has its price:

– “OSD off” means a normal AFS volume.

– “OSD no policy” means files > 1 MB go into OSD.

– Most expensive is the evaluation of regular expressions on file names.

[Chart: file creation rates for the different configurations. Measurement and graphic by Felix Frank.]

Page 16

Embedded Filesystems

● Modern cluster file-systems such as Lustre or GPFS are much faster than AFS especially in combination with fast networks (Infiniband, 10GE)

● But they have other disadvantages:

– Inefficient for small files because of the giant block size

– File creation and deletion is slow

– No secure way to export into WAN or even to desktops

– Accessible only from Linux (or AIX in case of GPFS).

● This is exactly complementary to AFS, therefore combine both!

– Use these fast file-systems for rxosd (or fileserver) partitions

– Export them to your trusted batch clusters and HPC environment

– Allow the AFS clients in the batch cluster to read and write files directly

– Let all other clients access data with OSD protocol or classical AFS protocol.

Page 17

Embedded Filesystems 2

● OSD files:

– rxosd writes an “osdid” file into the partition

– „afsd -check-osds“ informs cache manager about visible OSDs

– If a file consists of a single object in a visible OSD it's accessed directly

● Plain AFS files:

– Fileservers write a “partid” file into the partition

– „afsd -check-fspartitions“ informs the cache manager about visible vicep partitions

● The fileserver with the uuid found in the partid file gets a flag

● Volumes on such a fileserver also get a flag

– If the volume is flagged the CM uses a GetPath RPC to the fileserver

– If the file can be opened all I/O is done directly

● Locking:

– During these asynchronous I/O transactions files are locked in the fileserver

Page 18

HEPIX-Tests (1)

● The “HEPIX Storage Working Group” developed a use case last year for distributed storage systems, based on CERN's soft- and middleware stack for CMS

● In a 1st round in 2008 at FZK (Forschungszentrum Karlsruhe), Andrei Maslennikov compared AFS, DCACHE, LUSTRE, and XROOTD.

Source: „HEPIX storage working group, progress report 2/2008“, Andrei Maslennikov, Taipei, October 20, 2008

● Lustre performs much better than AFS, DCACHE and XROOTD

Page 19

HEPIX-Tests (2)

● In June this year the tests were repeated at FZK, but with different server hardware; only Lustre and Lustre embedded in AFS were compared.

Source: SMS from Andrei Maslennikov, June 3, 2009

● AFS with embedded Lustre comes closer to Lustre native.

[Chart: throughput for 20, 40, 60, and 80 threads, Lustre vs. AFS/Lustre]

Page 20

TUTORIAL

● Typical use cases for AFS/OSD

– AFS with HSM system

– AFS with embedded Lustre or GPFS

– AFS with just disk OSDs

● Setting up AFS/OSD

● Backup strategies for data in OSDs and metadata

● Running a cell with AFS/OSD

– Archiver and wiper

– Gathering information with „vos traverse“

– Health check with „vos salvage“

– How to replace OSDs

– What to do when a RAID breaks

Page 21

Typical Use Cases

There are use cases where you would like to have OpenAFS/OSD

● If you have an HSM system and want to make use of it for AFS

– to get „unlimited“ disk space for AFS

– to remove inactive files from disk instead of buying more and more disks

– to have better backup of your data

● If you have a fast cluster filesystem such as Lustre or GPFS and want to integrate it into AFS

– to get nearly native cluster filesystem speed for AFS inside the cluster

– to make files in the cluster filesystem accessible

● from all platforms
● from all over the world

● If you have both (HSM and fast cluster filesystem)

– You can also add HSM features to the cluster filesystem

Page 22

Other Use Cases

Also without HSM and cluster filesystems you may want to use OpenAFS/OSD

● If you have many fileserver machines, take one partition away on each of them for OSD use

– to make huge volumes easier to handle (move, dump)

– to distribute files more equally over your disk resources

– to get better throughput in heavily used volumes

– to allow for RW mirroring of files (better read throughput, lower risk of losing data)

– to allow for striping of files (better read and write throughput)

[Diagram: Classical OpenAFS vs. OpenAFS/OSD partition layout]

– Same amount of disk space, but split between fileserver (lower piece) and OSD (upper piece).

– Load on the server machines is balanced even if only a single volume is used at a time.

– Huge volumes can still be moved or replicated.

Page 23

Setting Up AFS/OSD

● Adding OSD support to your production cell requires the following steps:

– Get RPMs from Christof Hanke's builds or compile it yourself from the source

– Install and start osddbserver on database machines

– Install rxosd on the OSD machines

– Create database entries for your OSDs in the OSDDB

– Start OSDs

– Install fileserver, volserver, salvager on your 1st AFS/OSD fileserver machine

– Create new volumes there or move existing OpenAFS volumes to it

– Set osdpolicy in the volumes which should use OSDs
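For orientation, a condensed sketch of these steps using the commands that are shown in detail on the following pages (server, OSD, and volume names are examples):

  > bos install mydbserver osddbserver                 # on the database servers
  > bos create mydbserver osddbserver simple /usr/afs/bin/osddbserver
  > osd createosd 2 myosd-z -ip 130.183.2.114 -lun 25 -minsize 1m -maxsize 100g
  > bos install myosd rxosd                            # on the OSD machine
  > bos create myosd rxosd osd /usr/afs/bin/rxosd
  > bos create myfs fs "/usr/afs/bin/fileserver -L" /usr/afs/bin/volserver /usr/afs/bin/salvager
  > vos setfields myvolume -osdpolicy 1                # let files > 1mb go into OSDs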

Page 24

RPMs for OpenAFS/OSD

● OpenAFS/OSD as we use it today is based on the openafs-1.4.11 source.

● For some Linux flavors Christof Hanke builds rpms of OpenAFS/OSD

– CentOS_5

– Fedora_10

– Mandriva_2009

– RHEL_5

– SLES_9

– SLE_10

– SLE_11

– openSUSE_11.0

– openSUSE_11.1

● See http://download.opensuse.org/repositories/home:/hauky/ for details

– The RPMS built with OSD support have 'osd' in their name.

Page 25

Building OpenAFS/OSD

● For all other systems you need to build it yourself

– svn checkout http://svnsrv.desy.de/openafs-osd/trunk/openafs

– I use the following options for configure:

  > ./configure --enable-transarc-paths --enable-largefile-fileserver \
      --enable-bos-restricted-mode --enable-bos-new-config --enable-namei-fileserver \
      --enable-fast-restart --enable-bitmap-later --enable-debug-kernel \
      --disable-optimize-kernel --enable-debug --enable-debug-lwp --disable-optimize \
      --disable-optimize-lwp --enable-object-storage --enable-vicep-access

● We have built OpenAFS/OSD only on the following platforms:

● AIX (5.3, 4.2)
● SOLARIS (5.8, 5.9, 5.10)
● LINUX 2.6 (i386, amd64, ppc64)
● MacOS 10

Page 26

Creating OSDDB

● Install on your AFS database servers the binary „osddbserver“ and create the instance.

– This gives you an empty OSDDB with only a dummy entry for OSD 1 „local_disk“

– The fileserver uses the size range for OSD 1 to decide which files should be stored in OSDs. Default is (0kb-1mb). So files > 1mb go into OSDs.

● Now you may define your 1st OSD in the OSDDB

● The status remains „down“ until the OSD has sent its space information. This has to happen every 5 minutes; otherwise it is marked „down“ again.

  > bos install mydbserver osddbserver
  bos: installed file osddbserver
  > bos create mydbserver osddbserver simple /usr/afs/bin/osddbserver
  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
  >

  > osd createosd 2 myosd-z -ip 130.183.2.114 -lun 25 -minsize 1m -maxsize 100g
  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
   2 myosd-z            0 gb   0.0 % down        0   0     myosd    25 (1mb-100g)
  >

Page 27

Creating OSDDB 2

● To change fields use 'osd setosd <number> ....'

● If more than one OSD is available for a given size range, the one with the higher 'wrprior' will be chosen.

● An OSD may have an owner and/or a location (both max 3 char.)

– The idea is to let fileservers at a remote location use only OSDs there

– and to let OSDs that some project paid for be used only by that project

– OSDs with the same location as the fileserver get a bonus of 20 on 'wrprior'

– OSDs with other or no location get a malus of -20 on 'wrprior'

– For owner it's +10 and -10

– The system remains elastic: if all other OSDs are full you still may get space remotely or on a foreign OSD

  > osd setosd 2 -wrprior 80
  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
   2 myosd-z            0 gb   0.0 % down       80   0     myosd    25 (1mb-100g)
  >

Page 28

Setting Up 1st OSD

● Install and start the binary 'rxosd' on the machine where your 1st rxosd should run

● If you expect to also run a fileserver on your OSD machine

– Put a file “OnlyRxosd” into the OSD's /vicep-partition to let the fileserver ignore it

● Install the fileserver, volserver, salvager binaries and create new instance 'fs'

  > bos install myosd rxosd
  bos: installed file rxosd
  > bos create myosd rxosd osd /usr/afs/bin/rxosd
  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
   2 myosd-z         2500 gb   0.0 % up         80   0      myosd   25 (1mb-100g)
  >

  > bos create myfs fs "/usr/afs/bin/fileserver -L" /usr/afs/bin/volserver /usr/afs/bin/salvager
  >

Page 29

Policy Number 1

● To make use of the new OSD you need a volume with an osd policy set in its header

– Policy number 1 cannot be defined explicitly in the OSDDB

● It says: store files > max size for local_disk in OSDs

– If you don't have special needs you can live with only policy 1

● If no filename based policy is applied all files will be created in the fileserver's partition.

– Before the cache manager stores data for the 1st time it does an ApplyOsdPolicy RPC to the fileserver, telling it the actual length of the file

– At this time the fileserver may or may not decide to move the file to an OSD

– The problem with a too large max size for local_disk is

● With a small cache the cache manager may have to store data of the file before its length reaches this threshold.

● Later appends cannot change the file's location any more.

  > vos setfields myvolume -osdpolicy 1
  >

Page 30

Backup Strategy 1

● „vos dump“ generally will only dump

– files (and directories …) which are stored on the fileserver

– the OSD metadata of the files which are stored in OSDs.

● There is an option “-osd” for “vos dump” which also integrates the data of the files in OSDs as if they were local files

– It is intended primarily to allow restoring OSD volumes on normal fileservers

– You shouldn't use “-osd” if you have HSM and are not sure all files are on-line.

– The dump files could become very large for large volumes.

● Therefore it's better to create copies of the objects on archival OSDs

– If you have HSM these typically are OSDs acting on HSM filesystems

– If not you may use any old cheap slow disks as archival OSDs.

● Instead of doing dumps you also may replicate the volumes to other servers

– OSD volumes mostly have only small files and directories in the fileserver

– After the loss of a partition you are back very fast with “vos convertROtoRW”.
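A hedged example of such a full dump; the -osd option is the one described above, the remaining options are standard „vos dump“ syntax, and the names are placeholders:

  > vos dump -id myvolume -time 0 -file /backup/myvolume.dump -osd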

Page 31

Backup Strategy 2

● Creating backup copies of objects in archival OSDs can easily be automated

– There are scripts to be run on the database servers as a bos instance

– The script works on the osddbserver's sync site and is standby on the others

– It gets all information from the OSDDB and asks the volservers for candidates

– For these candidates it issues „fs fidarchive“ commands

– If you have multiple archival OSDs you may run separate scripts for each one

● guarantees best throughput by having a constant input stream per OSD
● allows you to start and stop them separately in case of down times

● To check the backup works properly run „vos traverse“ over all servers once a day

– It shows you in the section „Data without a copy“ how much data are vulnerable

– In the 1st hour data don't get a copy, to avoid saving intermediate versions

● If you suspect a problem you may run “vos listobjects <osd number> <server> -single -size”

– Shows you all object IDs (Fids) of files without a copy elsewhere
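For example, the daily check and the follow-up on a suspect OSD could look like this (hedged; the server name and OSD number are placeholders and the exact argument lists may differ in your build):

  > vos traverse myfileserver                          # look at the section "Data without a copy"
  > vos listobjects 2 myfileserver -single -size       # objects on OSD 2 without a copy elsewhere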

Page 32

Backup Strategy 3

● Some sites (also we) use TSM to backup AFS volumes over the /afs path

– This allows users to restore deleted files or old versions of files on their own.

– With HSM you shouldn't do this for OSD volumes because files may be wiped, causing TSM to wait a long time to bring them back on-line from tape.

● For OSD volumes we are sometimes able to restore deleted OSD files by

– Never really unlinking files in HSM archival OSDs, but renaming them adding a suffix “-unlinked-yyyy-mm-dd”. This simulates the “soft delete” known in some HSM systems (DMF). After a grace period of a month these files may be deleted.

– This alone wouldn't help for deleted files because the Fid isn't known anymore.

– Therefore we do once in a while a “vos dump -osdmetadata” which only contains directories and OSD metadata. The tar.bz2 file for all volumes is < 1 GB.

– With Ken Hornstein's „dumptool“ you can walk in the volume's tree and find the OSD metadata of the lost file.

– From the OSD object ID you can then calculate the namei-path ...
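A hedged sketch of this metadata-only dump (standard „vos dump“ options plus the -osdmetadata flag mentioned above; paths and names are placeholders). The resulting file is what you later inspect with Ken Hornstein's dumptool:

  > vos dump -id myvolume -time 0 -file /backup/myvolume.osdmeta -osdmetadata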

Page 33

Link Counts

● The link count of a file in a vice-partition indicates how many volumes a file belongs to

– For local AFS files typically you have a RW volume and in the same partition a RO-volume, thus a link count of 2. With a backup volume you reach 3.

– During „vos move“ a temporary volume is cloned which increments the link count.

– Removing of volumes or the file in the RW-volume decrements the link count.

– If the link count goes to zero the file gets unlinked.

● For files in OSDs the link count is handled exactly the same way,

– But it can become higher because volume copies in any fileserver are counted.

– If a file is deleted while the OSD is down the link count cannot be decremented.

– Therefore it's very important to make sure the link counts are correct

● If it's too low you can lose the file while it still belongs to a volume
● If it's too high an orphaned file will later remain in the OSD

● „vos salvage“ checks the size and the link count of all files and can also be used to correct them.

– It calculates the correct link count from the number of copies known in the vldb
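A hedged example of such a health check for a single volume (the volume name is a placeholder; depending on the build, „vos salvage“ may take additional server/partition or repair options not shown here):

  > vos salvage myvolume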

Page 34

Archival OSDs

● For backup and HSM you need archival OSDs

● Archival OSDs may be OSDs with large cheap RAID systems

– If you need more than one archival disk OSD for capacity reasons, you may give them different file size ranges to distribute data equally

● If you have an HSM system use it as archival OSD

– Space on tapes is cheaper than on disk and doesn't consume power

– You can get nearly unlimited space on a HSM system

  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
   2 myosd-z         2500 gb  40.0 % up         80   0      myosd   25 (1mb-100g)
   3 arch-1          2000 gb  10.3 % up   arch  80  80      myback   0 (0kb-64mb)
   4 arch-2          2000 gb  30.5 % up   arch  80  80      myback   1 (64mb-256mb)
   5 arch-3          4000 gb  12.7 % up   arch  80  80      myback2  0 (256mb-100gb)
  >

Page 35

The archiver

● The 'archive1' scripts (one per archival OSD) run on multiple database servers

– They are active only on the OSDDB's sync site, otherwise they are only standby

● 'archive1' asks the volservers in a loop whether they have candidates to archive

– With 'vos archcand' it gets a sorted list of files needing an archival copy

– With 'fs fidarchive' it creates the archival copies.

● 'fs fidarchive' adds the archival copy to the OSD metadata of the file
● The OSD metadata of archival copies contain the md5 checksum.

● Running one script per archival OSD guarantees the maximal throughput

– Because all archival OSDs can be filled in parallel

– Because there is only one input stream into the archival OSD.

● The 'archive1' script waits when the archival OSD is filled over its high water mark

– That will give an underlying HSM system the chance to move the data onto tape.
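For illustration only, a hedged sketch of the two commands the script drives; the argument lists used here are assumptions and may differ in the real implementation:

  > vos archcand myfileserver 3            # hypothetical arguments: candidates on this volserver for archival OSD 3
  > fs fidarchive 536879945.290.46282      # hypothetical arguments: create the archival copy for this FID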

Page 36

Wipeable OSDs

● If you got unlimited archival space from an underlying HSM system you may give HSM functionality to your on-line OSDs

– Set the 'hsm' flag for the OSD

– Define the high water mark when wiping should start (perhaps at 85 %)

– You may also define a minimal wipe size to avoid wiping small files

● In this example minimal wipe size was set to 64mb, so all smaller files remain on-line

● To actually perform automatic wiping run the 'wiper' script on the database servers.

– It also updates the 'newest wiped' field in the OSDDB so you get an idea how long data stay on disk

  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
   2 myosd-z         2500 gb  84.0 % up   hsm   80   0      myosd   25 (1mb-100g)
   3 arch-1          2000 gb  10.3 % up   arch  80  80      myback   0 (0kb-64mb)
   4 arch-2          2000 gb  30.5 % up   arch  80  80      myback   1 (64mb-256mb)
   5 arch-3          4000 gb  32.7 % up   arch  80  80      myback2  0 (256mb-100gb)
  > osd list -wipeable
  id name(loc)       size    state   own   usage   limit  wipe >   newest wiped
   2 myosd-z         2500 gb up            84.0 %  85.0 %  64 mb   Jul 31 2008
  >

Page 37

The wiper

● The 'wiper' script runs on multiple database servers

– It is active only on the OSDDB's sync site, otherwise it is only standby

● It checks for OSDs filled beyond their high water mark (typically 85 %)

– With 'osd wipecandidates' it gets from the OSD a sorted list of the longest unused objects (based on atime)

– With 'fs fidwipe' it removes the on-line copies of files in the RW-volumes until enough blocks have been freed

● Because of the RO-volumes the space is actually freed only after the volumes have been released by the 'wiper' script (for the same reason you shouldn't have backup volumes!)

● It updates the OSDDB database entry of the OSD with the new 'newest wiped' file date.

● After checking all wipeable OSDs it sleeps up to 10 minutes and starts again

Page 38

Using Embedded Filesystem

● As said before: if you have clusters with fast shared filesystems use them for AFS

– Mount the cluster filesystem as /vicep<x> on all cluster machines

– Run rxosd on one machine and create the corresponding OSDDB entry

– Make the client aware of it by „afsd -check-osd“ (put it into the startup script)

– Allow direct vicep-access on the client by „fs protocol -enable vicepaccess“

– If the filesystem is Lustre you need a special hack:

● „fs protocol -enable lustrehack“

– If you need highest read performance run „fs protocol -enable fastread”

● It does only a StartAsyncFetch at the beginning
● and an EndAsyncFetch at the end, instead of one per chunk
● Not a good idea if reads and writes may compete on the same file

● Generally it would also be possible to export fileserver partitions

– „afsd -check-fs“ required on the client

– The many small files in volumes may not be optimal for shared filesystems
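Put together, the client-side setup for an embedded Lustre partition might look like this; a sketch using only the switches named above (whether you need lustrehack and fastread depends on your filesystem and workload):

  > afsd -check-osd ...                     # in the client startup script, together with the usual afsd options
  > fs protocol -enable vicepaccess
  > fs protocol -enable lustrehack          # only for Lustre
  > fs protocol -enable fastread            # optional: only if readers and writers don't compete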

Page 39

osd list

  > osd list
  id name(loc)     ---total space---      flag  prior. own. server lun size range
   1 local_disk                                 wr  rd                 (0kb-1mb)
   4 raid6           5119 gb  69.0 % up   arch  64  64     afs15.rz  0 (0kb-8mb)
   5 tape            8924 gb  24.5 % up   arch  64  40     styx.rzg  0 (1mb-500gb)
   8 afs16-a         4095 gb  81.1 % up   hsm   79  80     afs16.rz  0 (1mb-100gb)
   9 mpp-fs9-a      11079 gb  84.9 % up   hsm   80  80 mpp mpp-fs9.  0 (1mb-500gb)
  10 afs4-a          4095 gb  77.1 % up   hsm   79  80     afs4.bc.  0 (1mb-100gb)
  11 w7as(hgw)       2721 gb  67.1 % up         80  80     afs-w7as  0 (1mb-100gb)
  12 afs1-a          1869 gb  92.6 % up         70  80 tok afs1.rzg  0 (1mb-100gb)
  13 hsmgpfs        31236 gb  14.5 % up   arch  64  30     hsmi.rzg 12 (8mb-500gb)
  14 afs6-a          1228 gb  84.9 % up   hsm   70  80 tok afs6.rzg  0 (8mb-100gb)
  23 mpp-fs11-gj     5580 gb  70.7 % up   hsm   80  80 mpp mpp-fs11 191 (1mb-100gb)
  24 mpp-fs12-a      6143 gb  78.1 % up   hsm   80  80 mpp mpp-fs12  0 (1mb-100gb)
  25 mpp-fs13-a      6143 gb  84.0 % up   hsm   80  80 mpp mpp-fs13  0 (1mb-100gb)
  32 afs8-z          1023 gb   1.3 % up   hsm   12  80     afs8.rzg 25 (1mb-100gb)
  34 afs17-gb        5585 gb  85.0 % up   hsm   80  80     afs17.rz 183 (1mb-500gb)
  35 afs18-ga        5585 gb  85.0 % up   hsm   80  80     afs18.rz 182 (1mb-500gb)
  36 afs21-ge        5585 gb  85.0 % up   hsm   80  80     afs21.rz 186 (1mb-500gb)
  37 afs22-gf        5585 gb  84.9 % up   hsm   80  80     afs22.rz 187 (1mb-500gb)
  38 afs19-gc        5585 gb  83.7 % up   hsm   79  80     afs19.rz 184 (1mb-500gb)
  39 afs20-gd        5585 gb  82.8 % up   hsm   79  80     afs20.rz 185 (1mb-500gb)
  40 afs23-gg        5585 gb  74.9 % up   hsm   79  80     afs23.rz 188 (1mb-500gb)
  41 mpp-fs15-gi     5580 gb  84.7 % up   hsm   80  80 mpp mpp-fs15 190 (1mb-500gb)
  42 mpp-fs14-gh     5580 gb  84.8 % up   hsm   80  80 mpp mpp-fs14 189 (1mb-500gb)
  43 mpp-fs10-gk     5580 gb  84.1 % up   hsm   80  80 mpp mpp-fs10 192 (1mb-500gb)
  44 sxbl19-z        6656 gb   3.5 % up   hsm   80  80 aug sxbl19.a 25 (1mb-500gb)
  >

Page 40

osd list -wipeable

● With -wipeable you get only the wipeable OSDs, but with additional information

– Newest wiped shows you how long unused files stay on-line

  > osd list -wipeable
  id name(loc)       size    state   own   usage   limit  wipe >   newest wiped
   8 afs16-a         4095 gb up            81.1 %  85.0 %  64 mb   Jul 31 2008
   9 mpp-fs9-a      11079 gb up      mpp   84.9 %  85.0 %  64 mb   Oct 21 2008
  10 afs4-a          4095 gb up            77.1 %  85.0 %  64 mb   Aug 14 2008
  14 afs6-a          1228 gb up      tok   84.9 %  85.0 %  64 mb   May 19 2008
  23 mpp-fs11-gj     5580 gb up      mpp   70.7 %  85.0 %  64 mb
  24 mpp-fs12-a      6143 gb up      mpp   78.1 %  85.0 %  64 mb
  25 mpp-fs13-a      6143 gb up      mpp   84.0 %  85.0 %  64 mb
  32 afs8-z          1023 gb up             1.3 %  80.0 %  64 mb   Jul 31 2008
  34 afs17-gb        5585 gb up            85.0 %  85.0 %  64 mb   Jun 30 2009
  35 afs18-ga        5585 gb up            85.0 %  85.0 %  64 mb   Apr 24 2009
  36 afs21-ge        5585 gb up            85.0 %  85.0 %  64 mb   May 31 2009
  37 afs22-gf        5585 gb up            84.9 %  85.0 %  64 mb   Jun 14 2009
  38 afs19-gc        5585 gb up            83.7 %  85.0 %  64 mb   Jun 25 2009
  39 afs20-gd        5585 gb up            82.8 %  85.0 %  64 mb   Aug  2 2009
  40 afs23-gg        5585 gb up            74.9 %  85.0 %  64 mb   Jun 19 2009
  41 mpp-fs15-gi     5580 gb up      mpp   84.7 %  85.0 %  64 mb   Feb 25 2009
  42 mpp-fs14-gh     5580 gb up      mpp   84.8 %  85.0 %  64 mb   Feb 25 2009
  43 mpp-fs10-gk     5580 gb up      mpp   84.1 %  85.0 %  64 mb   Jun 21 2009
  44 sxbl19-z        6656 gb up      aug    3.5 %  85.0 %  64 mb
  >

Page 41

Where are my files?

● There are some new or extended „fs“ subcommands:

– „fs ls“ gives an output similar to „ls -l“

– „fs whereis“ shows also OSDs

– „fs [fid]vnode“ shows all vnode fields. For OSD files also the index in the volume's OSD metadata file

– „fs [fid]osd“ shows the OSD metadata

● The command „osd“ has some subcommands to show what is stored in an OSD

– „osd volumes“ shows the RW-volume ids present in the OSD

– „osd objects“ shows all objects in the OSD belonging to a specified volume

– “osd examine” shows details about a single object

● The command “vos” got some new subcommands

– “vos traverse” shows file size statistic and the number of objects per OSD

– “vos listobjects” shows object-ids of all objects on a specified OSD

– “vos salvage” does a health check for a volume checking sizes and link counts

Page 42

„fs ls“

● „fs ls“ gives an output similar to „ls -l“

● The character in column 1 tells you what it is:

d == directory

f == normal AFS file in fileserver partition

l == symbolic link

m == mount point

o == object file (on-line)

w == wiped object file (off-line)

  > fs ls
  f rw-     hwr      550615 1998-07-01 13:04:21 arla-0.7.2.tar.gz
  l rwx     hwr          11 2009-09-18 19:57:51 servers -> tmp/servers
  m rwx     root       2048 private
  o rwx     hwr    63128640 1999-03-17 12:23:03 unicosmk.cray-t3e.20443
  d rwx     hwr        2048 1995-07-26 09:47:14 tmp
  w rw-  daemon    79829905 1997-01-08 09:22:50 ymp_uni80.mrafs34a.wrk
  >

Page 43

„fs whereis“

● „fs whereis“ has been extended to show also the OSDs where the data are stored:

  > fs ls ymp_uni80.mrafs34a.wrk
  w rw-  daemon    79829905 1997-01-08 09:22:50 ymp_uni80.mrafs34a.wrk
  > fs whereis ymp_uni80.mrafs34a.wrk
  File ymp_uni80.mrafs34a.wrk is on host afs16.rzg.mpg.de
   Osds: tape hsmgpfs
  >

Page 44

„fs vnode“

● „fs vnode“ shows all vnode fields

– Length also shows vn_length_hi and length in hex notation.

– This file is in object storage and therefore doesn't have an inode number

– vn_ino_hi is filled with the uniquifier in namei-fileservers. LastUsageTime is stored at this location instead.

  > fs vnode ymp_uni80.mrafs34a.wrk
  File  536879945.290.46282
          modeBits         = 0644
          linkCount        = 1
          author           = 2
          owner            = 2
          group            = 4132
          Length           = 79829905      (0x0, 0x4c21b91)  76.130 MB
          dataVersion      = 3
          unixModifyTime   = 1997-01-08 09:22:50
          serverModifyTime = 2009-05-10 17:13:48
          vn_ino_lo        = 0    (0x0)
          lastUsageTime    = 2009-02-25 10:46:39
          osd file on disk = 0
          osdMetadataIndex = 75
          parent           = 1
  >

Page 45

„fs osd“

● For an OSD file you can see the OSD metadata with „fs osd“:

● This file is not on-line, only in two archival OSDs (5 and 13)

– The file had been restored once from the 1st archival copy on Feb. 25

– Both archives are from data version 3 and therefore have the same md5 sum

– flag=0x2 means it has been checked that the file was copied to tape

  > fs osd ymp_uni80.mrafs34a.wrk
  ymp_uni80.mrafs34a.wrk has 312 bytes of osd metadata, v=3
  Archive, dv=3, 2007-07-19 17:39:30, 1 fetches, last: 2009-02-25, 1 segm, flags=0x2
      segment:
          lng=79829905, offs=0, stripes=1, strsize=0, cop=1, 1 objects
          object:
              obj=536879945.290.46282.0, osd=5, stripe=0
      metadata:
          md5=6441333f2acdae8833898bebaf2041d2  as from  2007-07-19 17:39:30
  Archive, dv=3, 2009-02-25 10:46:39, 1 segm, flags=0x2
      segment:
          lng=79829905, offs=0, stripes=1, strsize=0, cop=1, 1 objects
          object:
              obj=536879945.290.46282.0, osd=13, stripe=0
      metadata:
          md5=6441333f2acdae8833898bebaf2041d2  as from  2009-02-25 11:52:02
  >


Page 46

osd metadata in memory

(Repeat of page 12: the tree of C structures osd_p_fileList → osd_p_file → osd_p_segm → osd_p_obj, plus osd_p_meta, serialized in network byte order into the volume special file “osdmetadata”.)

Page 47

„fs osd“

(Repeat of the „fs osd“ example from page 45, with the two Archive entries and their segment, object, and md5 metadata lines highlighted.)

Page 48

Running a cell with OSDs

● Once the services and scripts for archiving and wiping are started the cell should run automatically. However, it is useful

– to check the logs daily to detect anomalies and errors

– to run once a day a 'vos traverse' over all fileservers to check that all OSD files got copies in archives

– to run once in a while (per week or month) a script that does a 'vos release' followed by a 'vos salvage' for all volumes as a health check
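A minimal sketch of such a periodic health check (how you enumerate the volumes of your cell is up to you; the volume list file used here is hypothetical):

  #!/bin/sh
  # hypothetical file with one RW volume name per line
  for vol in $(cat /usr/afs/local/all-rw-volumes); do
      vos release "$vol"      # bring the RO copies up to date
      vos salvage "$vol"      # check sizes and link counts of the files and their OSD objects
  done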

Page 49

Analyzing Logfiles

● You find the following lines in a FileLog:

● To find out which file belongs to FID 536986017.82560.85368 just do

  Sat Sep 19 07:55:45 2009 [72] Wrong md5 sum 06bb7f6e2b488bf2b4db6a115589def2 instead of 5789032929755570e2e0c95849d5353f
  Sat Sep 19 07:55:45 2009 [72] SetReady of 536986017.82560.85368 after restore to OSD 0 from archival OSD 0 failed with 5

  > fs fidvnode 536986017.82560.85368
  File  536986017.82560.85368
          modeBits         = 0664
          linkCount        = 1
          author           = 15122
          owner            = 15122
          group            = 2400
          Length           = 15669199      (0x0, 0xef17cf)  14.942 MB
          dataVersion      = 81
          unixModifyTime   = 2008-01-30 05:12:04
          serverModifyTime = 2009-06-02 20:44:20
          vn_ino_lo        = 0    (0x0)
          lastUsageTime    = 2008-02-04 09:14:56
          osd file on disk = 0
          osdMetadataIndex = 2682
          parent           = 1
  Path = {Mountpoint}/snapshot_18880.datafile.corrupted
  >

Page 50

Analyzing Logfiles

● To see the osd metadata of this file do

● To see the objects in the OSDs do

● Here something is wrong on OSD 5, because the file is older than the md5 sum!

  > fs fidosd 536986017.82560.85368
  536986017.82560.85368 has 284 bytes of osd metadata, v=3
  Being-restored, 1 segm, flags=0x1
      segment:
          lng=0, offs=0, stripes=1, strsize=0, cop=1, 1 objects
          object:
              obj=536986017.82560.85368.0, osd=39, stripe=0
  Archive, dv=81, 2008-02-09 08:39:47, 1 segm, flags=0x2
      segment:
          lng=15669199, offs=0, stripes=1, strsize=0, cop=1, 1 objects
          object:
              obj=536986017.82560.85368.0, osd=5, stripe=0
      metadata:
          md5=5789032929755570e2e0c95849d5353f  as from  2008-02-09 08:39:47
  >

  > osd ex 39 536986017.82560.85368.0
  536986017.82560.85368.0 fid 536986017.82560.85368 tag 0 not-striped lng 15669199 lc 3 Sep 19 07:56
  > osd ex 5 536986017.82560.85368.0
  536986017.82560.85368.0 fid 536986017.82560.85368 tag 0 not-striped lng 15669199 lc 3 Sep 25 2007

Page 51

“vos traverse” output

   File Size Range    Files      %  run %     Data         %  run %
  ----------------------------------------------------------------
    0  B -   4 KB 69881317  50.43  50.43   86.903 GB   0.02   0.02
    4 KB -   8 KB 11085714   8.00  58.43   60.663 GB   0.02   0.04
    8 KB -  16 KB  9663216   6.97  65.40  103.294 GB   0.03   0.07
   16 KB -  32 KB 10923970   7.88  73.29  228.964 GB   0.07   0.14
   32 KB -  64 KB  8411853   6.07  79.36  384.437 GB   0.11   0.25
   64 KB - 128 KB  6564296   4.74  84.09  571.588 GB   0.16   0.41
  128 KB - 256 KB  4397953   3.17  87.27  778.242 GB   0.22   0.63
  256 KB - 512 KB  4750125   3.43  90.69    1.604 TB   0.47   1.10
  512 KB -   1 MB  3333510   2.41  93.10    2.161 TB   0.63   1.73
    1 MB -   2 MB  2209330   1.59  94.69    3.079 TB   0.90   2.64
    2 MB -   4 MB  2169849   1.57  96.26    5.737 TB   1.68   4.31
    4 MB -   8 MB  1995968   1.44  97.70   10.545 TB   3.08   7.40
    8 MB -  16 MB  1220014   0.88  98.58   13.267 TB   3.88  11.28
   16 MB -  32 MB   644008   0.46  99.05   13.301 TB   3.89  15.17
   32 MB -  64 MB   505218   0.36  99.41   20.875 TB   6.11  21.28
   64 MB - 128 MB   325616   0.23  99.65   27.368 TB   8.01  29.28
  128 MB - 256 MB   174111   0.13  99.77   29.835 TB   8.73  38.01
  256 MB - 512 MB   153027   0.11  99.88   55.941 TB  16.36  54.37
  512 MB -   1 GB   142159   0.10  99.98   95.390 TB  27.90  82.28
    1 GB -   2 GB    16530   0.01 100.00   22.541 TB   6.59  88.87
    2 GB -   4 GB     2821   0.00 100.00    7.363 TB   2.15  91.02
    4 GB -   8 GB     1693   0.00 100.00    9.816 TB   2.87  93.90
    8 GB -  16 GB      720   0.00 100.00    6.897 TB   2.02  95.91
   16 GB -  32 GB      197   0.00 100.00    4.264 TB   1.25  97.16
   32 GB -  64 GB      130   0.00 100.00    5.724 TB   1.67  98.84
   64 GB - 128 GB       27   0.00 100.00    2.277 TB   0.67  99.50
  128 GB - 256 GB        6   0.00 100.00    1.052 TB   0.31  99.81
  256 GB - 512 GB        2   0.00 100.00  667.311 GB   0.19 100.00
  ----------------------------------------------------------------
  Totals:        138573380 Files          341.869 TB

[Annotations on the original slide: 93.1 % of all files are < 1 MB and on local disk; 99.4 % of all files are < 64 MB and stay permanently on OSDs or local disk; only 0.6 % of all files may be wiped from disk; marked data volumes: 5.9 TB, 74.5 TB, 341 TB]


“vos traverse” output 2

 Storage usage:
 ---------------------------------------------------------------
                 1 local_disk  134158413 files      65.702 TB
   arch. Osd     4 raid6         1202480 objects     3.403 TB
   arch. Osd     5 tape          4281152 objects   273.501 TB
         Osd     8 afs16-a         18591 objects     3.239 TB
         Osd     9 mpp-fs9-a      423720 objects     8.875 TB
         Osd    10 afs4-a          16572 objects     3.057 TB
         Osd    11 w7as            97039 objects     1.743 TB
         Osd    12 afs1-a         254659 objects     1.357 TB
   arch. Osd    13 hsmgpfs       1700047 objects   236.401 TB
         Osd    14 afs6-a           8326 objects  1009.701 GB
         Osd    23 mpp-fs11-gj     61121 objects     3.838 TB
         Osd    24 mpp-fs12-a      79479 objects     4.673 TB
         Osd    25 mpp-fs13-a      75723 objects     5.028 TB
         Osd    34 afs17-gb       159327 objects     4.501 TB
         Osd    35 afs18-ga       112586 objects     4.594 TB
         Osd    36 afs21-ge        99621 objects     4.541 TB
         Osd    37 afs22-gf        80227 objects     4.587 TB
         Osd    38 afs19-gc       112268 objects     4.588 TB
         Osd    39 afs20-gd        87823 objects     4.303 TB
         Osd    40 afs23-gg        70848 objects     4.063 TB
         Osd    41 mpp-fs15-gi     75436 objects     4.625 TB
         Osd    42 mpp-fs14-gh     83013 objects     4.618 TB
         Osd    43 mpp-fs10-gk     75339 objects     4.606 TB
         Osd    44 sxbl19-z        18489 objects   174.675 GB
 ---------------------------------------------------------------
 Total                         143352299 objects   657.023 TB

„vos traverse“ also tells you where your data are (a sketch of the invocation follows below)

● 66 TB on fileservers

● 77 TB on disk OSDs

● 510 TB on arch. OSDs (so much because we create two copies)
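
A minimal invocation sketch. The command reference at the end of this tutorial lists „vos traverse“ as showing file statistics for a server or a volume; the exact argument syntax below (server name or volume id as first argument) is an assumption, so check 'vos help traverse' first.

  vos traverse afs16          # file-size and storage statistics of one fileserver (name is a placeholder)
  vos traverse 1108472992     # the same for a single volume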


Where our data are

[Chart: 66 TB replicated, 77 TB on OSD + tape, 199 TB only on tape]

● All data in the local partitions of the fileservers are replicated to other fileservers

● All data in disk OSDs have copies in archival OSDs (tape)


“vos traverse” output 3

 Data without a copy:
 ---------------------------------------------------------------
 if !replicated: 1 local_disk  134158411 files      65.702 TB
   arch. Osd     5 tape           750001 objects    30.400 TB
         Osd     9 mpp-fs9-a         364 objects   978.990 MB
         Osd    11 w7as                1 objects    71.377 MB
         Osd    23 mpp-fs11-gj       169 objects   510.487 MB
         Osd    24 mpp-fs12-a        164 objects   444.453 MB
         Osd    25 mpp-fs13-a        164 objects   448.978 MB
         Osd    34 afs17-gb            1 objects   488.316 MB
         Osd    35 afs18-ga            1 objects     3.790 MB
         Osd    36 afs21-ge            1 objects    54.628 MB
         Osd    37 afs22-gf            1 objects     3.790 MB
         Osd    38 afs19-gc            2 objects     7.580 MB
         Osd    39 afs20-gd            1 objects    14.942 MB
         Osd    40 afs23-gg            3 objects    69.175 MB
         Osd    41 mpp-fs15-gi       171 objects   446.868 MB
         Osd    42 mpp-fs14-gh         4 objects     7.362 MB
         Osd    43 mpp-fs10-gk       183 objects     1.751 GB
         Osd    44 sxbl19-z           59 objects    83.693 MB
 ---------------------------------------------------------------
 Total                         134909701 objects    96.108 TB

„vos traverse“ also tells you which data are vulnerable

● 66 TB on fileservers are replicated and therefore not really vulnerable

● 30 TB on OSD 5 still have to be copied to the other archival OSD 13, but they already have 2 tape copies

● 6 GB of new data don't yet have their copies on archival OSDs.

● Only these 6 GB are really vulnerable, less than 0.01 % of the total data!


HSM for AFS

[Histogram: number of files and amount of data per file-size bin, x-axis logarithmic from 4 KB to 256 GB]

● The diagram shows the number of files and the amount of data over the logarithm of the file size.

● All data right of the red line at 64 MB can be wiped from disk, staying only on tape.

– These are only 0.59 % of all files, but 79.2 % of the total data volume!


Retiring an OSD

● First set the write priority (wrprior) of that OSD to 0 with „osd setosd <OSD> -wrprior 0“ to prevent creation of new objects.

– Wait at least 5 minutes to make sure all fileservers have seen your change!

● Use „vos listobj <OSD number> <server>“ for all servers to get a list of all the objects

● For each of these objects do

– „fs fidreplace <object> <OSD number>“ which moves the object to another OSD

● Use „osd volumes <OSD>“ to get a list of the volumes present there

● Do „vos release“ for all these volumes to ensure the RO-volumes don't have objects on the OSD.

● Run “vos traverse” over all servers to be sure no RW-volume still contains an object on the retired OSD (a scripted sketch of the whole procedure follows below).
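
The whole procedure can be scripted. This is only a minimal sketch (bash): the OSD number 39 and the server names are placeholders, and it assumes „vos listobjects“ prints one object-id per line; check the exact syntax with 'osd help' and 'vos help' before using it.

  osd setosd 39 -wrprior 0              # stop allocation of new objects on OSD 39
  sleep 300                             # give all fileservers time to see the change
  for server in afs16 afs17 afs18; do
      vos listobjects 39 $server |
      while read obj; do
          fs fidreplace $obj 39         # move each object to another OSD
      done
  done
  osd volumes 39                        # volumes which still reference OSD 39
  # vos release these volumes, then run vos traverse over all servers to verify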


Salvaging OSD volumes

● “bos salvage” checks only consistency of the OSD metadata.

● To also check the data, run „vos salvage <volume>“ (without options it just checks)

● In this example 1 object is too short. The low link count of 2 other objects is not necessarily an error.

– Probably these are archival copies which have been created after the last replication. Before running „vos salvage -update“ we did a new „vos release“

 > vos salvage 1108472992.115754.200635
 Salvaging volume 1108472992
 Object 1108472992.115754.200635.0 has wrong length on 34 (449740800 instead of 512037786)
 Object 1108472992.115828.200709.0: linkcount wrong on 5 (1 instead of 3)
 Object 1108472992.115830.200711.0: linkcount wrong on 5 (1 instead of 3)
 1108472992: 49697 local (10.243 gb) and 8985 in OSDs (2.493 tb), 3 errors ATTENTION

 > vos salvage 1108472992.115754.200635 -update
 Salvaging volume 1108472992
 Object 1108472992.115754.200635.0 has wrong length on 34 (449740800 instead of 512037786), repaired
 1108472992: 49697 local (10.243 gb) and 8985 in OSDs (2.493 tb), 1 errors ATTENTION


Damaged Partition

● Broken RAIDs happen about twice a year in our cell!

● If it's a fileserver partition:

– If you have consistently released RO-volumes to other servers

● You may run „vos convertROtoRW“ on the RO-volumes

● Don't forget to create new RO-volumes after that!

– Otherwise you will have to restore dumps (takes much longer)

● If it's an OSD partition:

– run „vos listobjects“ for the lost OSD on all fileservers to get the object-ids

– If it's a non-archival OSD

● run „fs fidwipe“ for all these object-ids (the tag-suffix will be ignored)

– this will fail for newly created files without an archival copy

● run „fs fidprefetch“ to bring the wiped files on-line again (see the sketch after this list)

– If it's an archival OSD

● run „fs fidreplace <obj-id> <osd-number> -1“ to eliminate the archival copy
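
A minimal sketch of the recovery loop for a lost non-archival OSD partition (bash). The OSD number 40 and the server names are placeholders, and it assumes the object-ids printed by „vos listobjects“ can be passed directly to „fs fidwipe“ and „fs fidprefetch“.

  for server in afs16 afs17 afs18; do
      vos listobjects 40 $server        # collect the object-ids of the lost OSD
  done > lost-objects.txt
  while read obj; do
      fs fidwipe $obj                   # drop the lost on-line copy (fails if no archival copy exists)
      fs fidprefetch $obj               # asynchronously restage the file from its archival copy
  done < lost-objects.txt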


AFS hanging or slow 1

If AFS seems to be slow or if files or directories seem to be blocked, it makes sense to analyze what's going on on your server and client machines.

● Ask hanging clients with „cmdebug“ or „cmdebug -long“ to find which file is requested

● Use „rxdebug <client> 7001 -nodally“ to see active RPCs

● Use „fs threads -server <fileserver>“ to see load on a fileserver

– The last line is my „fs threads“ command itself (as I can tell from the IP address)

– The other FsCmd line is a “fs fidarchive“ running on the database server

● Use „osd threads <OSD>“ to find the corresponding OSD activity

 ~: fs th -ser afs16
 3 active threads found
 rpc FsCmd on 1108595910.590 from 130.183.9.5
 rpc StoreData64 on 1108524165.480 from 134.107.107.11
 rpc FsCmd on 0.15 from 130.183.2.114
 ~:

 ~: osd th hsm
 rpc create_archive on 1108595910.590.8415.0 from 130.183.30.16
 rpc threads on 0.0.0.0 from 130.183.2.114
 ~:


AFS hanging or slow 2

I haven't saved the output and the steps of the analysis of such a hang, but here is what happened:

● There was a problem with GPFS and TSM-HSM on an archival OSD which made a write() to GPFS hang forever

● The write() belonged to an RXOSD_create_archive RPC for which a „fs fidarchive“ was waiting.

● „fs fidarchive“ has a WRITE_LOCK on the file's vnode because it must update the osd metadata when the archiving is successful.

● This blocked many RXAFS_GetStatus and/or RXAFS_InlineBulkStatus RPCs

● Finally all threads of the server were blocked and the server didn't respond anymore

● A server restart only helped for a short time because the archiver script immediately started the next „fs fidarchive“ to that fileserver and the same thing happened again...


Server Statistics

 > fs stat afs14
 Since  2009-09-17 05:08:22 (211842 seconds == 2 days, 10:50:42 hours)
 Total number of bytes received     211137205106  196 gb
 Total number of bytes sent          55456907335   51 gb
 rpc 65538 StoreData64               8707980
 rpc   132 FetchStatus               1563205
 rpc   147 GiveUpCallBacks            612517
 rpc   135 StoreStatus               1906959
 rpc   137 CreateFile                 314794
 rpc 65537 FetchData64               2063199
 rpc   157 ExtendLock                  89790
 rpc   140 Link                        11107
 rpc   136 RemoveFile                  52155
 rpc   156 SetLock                    103698
 rpc   158 ReleaseLock                103694
 rpc 65560 OsdPolicy                    7713
 rpc   138 Rename                     277955
 rpc 65536 InlineBulkStatus           673076
 rpc 65542 GetStatistics64              8254
 rpc   146 GetStatistics                8254
 rpc   130 FetchData                   41994
 rpc   133 StoreData                   41586
 rpc   139 Symlink                      1697
 rpc   141 MakeDir                      8973
 rpc   134 StoreACL                     6528
 rpc   155 BulkStatus                  31685
 rpc 65539 GiveUpAllCallBacks             35
 rpc   142 RemoveDir                     394
 rpc 65566 Statistic                       9

● „fs statistic <server> [-verbose]“ gives statistics about data traffic and RPCs; with “-verbose” you also get the transfer rates per 15-minute interval

● „osd statistic <OSD> [-verbose]“ gives the same for OSDs

● „vos statistic <server> [-verbose]“ gives the same for the volserver, but without RPC statistics.

These commands allow you to see hot spots and give an idea about the data traffic in your cell.


Server Statistics

 > vos statistic afs11 -v
 /----- snip ------/
 17:45-18:00     0 KB/s sent     0 KB/s received
 18:00-18:15     0 KB/s sent  3133 KB/s received
 18:15-18:30     0 KB/s sent  4756 KB/s received
 18:30-18:45     0 KB/s sent  2874 KB/s received
 18:45-19:00     0 KB/s sent  1511 KB/s received
 19:00-19:15     0 KB/s sent  2173 KB/s received
 19:15-19:30     0 KB/s sent  2176 KB/s received
 19:30-19:45     0 KB/s sent   809 KB/s received
 19:45-20:00     0 KB/s sent  1317 KB/s received
 20:00-20:15     0 KB/s sent   371 KB/s received
 20:15-20:30     0 KB/s sent  5660 KB/s received
 20:30-20:45     0 KB/s sent   671 KB/s received
 20:45-21:00     0 KB/s sent  4098 KB/s received
 21:00-21:15     0 KB/s sent  8706 KB/s received
 21:15-21:30     0 KB/s sent 10452 KB/s received
 21:30-21:45     0 KB/s sent  3892 KB/s received
 21:45-22:00     0 KB/s sent  5480 KB/s received
 22:00-22:15     0 KB/s sent  7339 KB/s received
 22:15-22:30     0 KB/s sent  8558 KB/s received
 22:30-22:45     0 KB/s sent  5603 KB/s received
 22:45-23:00     0 KB/s sent  6961 KB/s received
 23:00-23:15     0 KB/s sent  5522 KB/s received
 23:15-23:30     0 KB/s sent  2779 KB/s received
 23:30-23:45     0 KB/s sent   946 KB/s received
 23:45-24:00     0 KB/s sent     0 KB/s received
 Since  Sep 17 05:00 (716186 seconds == 8 days, 6:56:26 hours)
 Total number of bytes received     348728736471  324 gb
 Total number of bytes sent                    0    0 bytes
 ~:

● This example shows volserver traffic during nightly „vos release“.

● This server only keeps RO-copies of volumes belonging to one of the experiments.

● The vos release script is started at 18:00 by CRON.

● The intervals before 18:00 show no activity.


Command Reference 'osd' 1

● The „osd“ command has the following subcommands which talk to the OSDDB

– createosd create new osd entry in the OSDDB

– setosd change fields for existing osd entry in OSDDB

– deleteosd mark osd entry as obsolete (will not really delete it)

– addpolicy create a policy entry in the OSDDB

– deletepolicy delete a policy entry in the OSDDB

– addserver create server entry in the OSDDB

– deleteserver delete server entry in the OSDDB

– list list OSDs known in the OSDDB

– policies list policies known in the OSDDB

– servers list servers known in the OSDDB

– osd list all fields of the osd entries in the OSDDB

● Use 'osd help <subcommand>' to get syntax and parameters information.

● All modifying commands can only be used by administrators (in UserList of the server); a small read-only example follows below.
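
A small read-only example of the OSDDB subcommands. These calls only list what the database knows and are harmless; their output format is not shown here.

  osd list                 # all OSDs known in the OSDDB
  osd policies             # policies defined in the OSDDB
  osd servers              # servers registered in the OSDDB
  osd help setosd          # syntax and parameters of 'osd setosd'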


Command Reference 'osd' 2

● The „osd“ command has the following subcommands to analyze or manage data in an OSD

– volumes show IDs of all RW-volumes having data in the OSD

– objects show object IDs (Fids) of all objects of a volume

– examine show details of an object

– incrlinkcount increment link count of an object

– decrlinkcount decrement link count of an object

– read read contents of an object

– write over-write contents of an object

– md5sum let OSD calculate md5sum of an object

● Use 'osd help <subcommand>' to get syntax and parameters information.

● All these commands can be used only by administrators (in UserList of the server); see the example below.
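
For example, the object from the log-analysis slide can be inspected directly on its OSDs. The argument order (OSD first, then object-id) follows the „osd ex“ calls shown earlier; for „osd md5sum“ it is an assumption, so check 'osd help md5sum'.

  osd examine 39 536986017.82560.85368.0     # same as the 'osd ex' call shown earlier
  osd md5sum  5  536986017.82560.85368.0     # let OSD 5 recompute the md5 sum of its copy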


Command Reference 'osd' 3

● Other subcommands of 'osd':

– fetchqueue show fetch requests on archival HSM OSDs

– statistic show RPC statistic and data flow

– threads show active RPCs

– getvariable show value of a variable

– setvariable set new value to a variable (e.g. LogLevel)

– wipecandidate get sorted list of longest unused objects

– help get help texts

● Use 'osd help <subcommand>' to get syntax and parameters information.

● Some of these commands can be used only by administrators (in UserList of the server); a small monitoring example follows.
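
A small monitoring example. „osd threads hsm“ repeats the call from the hanging/slow analysis; that „osd fetchqueue“ takes the archival OSD as its argument is an assumption, so check 'osd help fetchqueue'.

  osd threads hsm          # active RPCs on the archival OSD 'hsm'
  osd fetchqueue hsm       # pending fetch (tape restore) requests on that HSM OSD (argument assumed)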


Command Reference 'vos'

● New subcommands of 'vos':

– archcand get sorted list of files which need archival copies

– statistic show data flow statistic

– listobjects show objects on specified OSD

– getvariable show value of a variable

– setvariable set new value to a variable (e.g. LogLevel)

– salvage check size and linkcounts of all objects in a volume

– traverse show file statistic on server or volume

– split split a volume at a specified directory vnode

● New options for subcommand 'dump' (example below)

-osd dump OSD files as normal files (include data)

-metadataonly dump only directories and OSD metadata (for 'dumptool')
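
A minimal sketch of dumping a volume that has data in OSDs. The volume name and the dump file paths are placeholders, and it assumes '-file' works as for the normal 'vos dump'.

  vos dump user.hugo -osd          -file /backup/user.hugo.full   # include the object data in the dump
  vos dump user.hugo -metadataonly -file /backup/user.hugo.meta   # directories and OSD metadata only, for 'dumptool'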


Command Reference 'fs'

● New display subcommands of 'fs' (examples below):

– statistic show RPC statistic of data flow

– threads show active RPCs

– [fid]vnode show vnode fields (and relative path)

– [fid]osd show OSD metadata of a file

– ls shows directory in 'ls -l' style with info about osd files

– getvariable show value of a variable (e.g. LogLevel)

– translate translate namei-path to fid or vice versa

– listlocked shows locked vnodes

– listoffline shows volumes taken offline by volserver and salvager

– fsynctrace shows trace table of fssync interface
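
Most of these subcommands have already appeared in this tutorial; a few calls are repeated here for reference (server names and the fid are taken from the earlier examples).

  fs threads -server afs16            # active RPCs on a fileserver ('fs th -ser afs16' above)
  fs statistic afs14                  # RPC and data-flow statistics ('fs stat afs14' above)
  fs fidosd 536986017.82560.85368     # OSD metadata of a file, as in the log-analysis slide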


Command Reference 'fs' 2

● New 'fs' subcommands acting on files (life-cycle sketch below):

– [fid]prefetch bring wiped OSD-file back on-line (asynchronously)

– [fid]archive create archive copy of OSD file

– [fid]wipe wipe on-line copy, keep only archival copies

– [fid]replaceosd move object on specified OSD to another one

– [fid]oldversion restore older version of OSD-file

– createstripedfile preallocate OSD file

● Other new modifying subcommands:

– setpolicy set policy on directory level

– setvariable set new value to a variable (e.g. LogLevel)
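
A minimal sketch of the typical life cycle of an OSD file using the path-based variants of these subcommands; the AFS path is a placeholder.

  fs archive  /afs/example.org/data/run42/result.tar   # create an archival copy of the OSD file
  fs wipe     /afs/example.org/data/run42/result.tar   # drop the on-line copy, keep only archival copies
  fs prefetch /afs/example.org/data/run42/result.tar   # bring the wiped file back on-line (asynchronously)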


Command Reference 'afsio'

● 'afsio' is used to write or read files bypassing the cache manager. 'afsio' reads from stdin (for write) and writes to stdout (for read); a usage example follows the help texts below.

– write write data into an empty or new AFS file

– append append data at the end of an existing AFS file

– read read an AFS file

– help help information

● On some platforms 'afsio' is remarkably faster than I/O through the cache manager

 ~: afsio help read
 afsio read: read a file from AFS
 Usage: afsio read -file <AFS-filename> [-cell <cellname>] [-verbose] [-help]
 ~:

 ~: afsio help append
 afsio append: append to a file in AFS
 Usage: afsio append [-file <AFS-filename>] [-cell <cellname>] [-verbose] [-help]
 ~:

 ~: afsio help write
 afsio write: write a file into AFS
 Usage: afsio write [-file <AFS-filename>] [-cell <cellname>] [-verbose] [-md5] [-help]
 Where: -md5  calculate md5 checksum
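
A usage sketch showing the stdin/stdout piping; the tar pipeline and the cell path are placeholders, the options are those from the help texts above.

  ~: tar cf - ./results | afsio write -file /afs/example.org/archive/results.tar -md5
  ~: afsio read -file /afs/example.org/archive/results.tar | tar tf -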


Questions or comments?

Thank you


AFS cell „ipp-garching.mpg.de“

40 fileservers with 195 TB disk space

21 non-archival OSDs with 100 TB disk space

2 archival OSDs with the HSM system TSM-HSM, to be replaced by HPSS next year

27000 volumes

8700 users

340 TB total data

4 TB data written per day

8 TB data read per day