OpenAFS Administration Guide

OpenAFS Administration Guide i

OpenAFS Administration Guide

OpenAFS Administration Guide ii

Copyright © 2000 IBM Corporation. All Rights Reserved

OpenAFS Administration Guide iii

COLLABORATORS

TITLE :

OpenAFS Administration Guide

ACTION NAME DATE SIGNATURE

WRITTEN BY February 13, 2016

REVISION HISTORY

NUMBER DATE DESCRIPTION NAME

BP--1.6.x-4781-gc0876-dirty

3.6 April 2000 First IBM Edition, Document NumberGC09-4563-00

OpenAFS Administration Guide iv

Contents

I Concepts and Configuration Issues 1

1 An Overview of OpenAFS Administration 2

1.1 A Broad Overview of AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 AFS: A Distributed File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.2 Servers and Clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.3 Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.4 Transparent Access and the Uniform Namespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.5 Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.6 Efficiency Boosters: Replication and Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.7 Security: Mutual Authentication and Access Control Lists . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 More Detailed Discussions of Some Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.2 Distributed File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.3 Servers and Clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.4 Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.5 The Uniform Namespace and Transparent Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.6 Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.7 Mount Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.2.8 Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.2.9 Caching and Callbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3 AFS Server Processes and the Cache Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.1 The File Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3.2 The Basic OverSeer Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3.3 The Protection Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.4 The Volume Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.5 The Volume Location (VL) Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.6 The Salvager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.7 The Update Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

OpenAFS Administration Guide v

1.3.8 The Backup Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.9 The Cache Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.10 The Kerberos KDC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.3.11 The Network Time Protocol Daemon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Issues in Cell Configuration and Administration 12

2.1 Differences between AFS and UNIX: A Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.1.1 Differences in File and Directory Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.1.2 Differences in Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.3 Differences in the Semantics of Standard UNIX Commands . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.4 The AFS version of the fsck Command and inode-based fileservers . . . . . . . . . . . . . . . . . . . . 13

2.1.5 Creating Hard Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.6 AFS Implements Save on Close . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.7 Setuid Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Choosing a Cell Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.1 How to Set the Cell Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.2 Why Choosing the Appropriate Cell Name is Important . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3 Participating in the AFS Global Namespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3.1 What the Global Namespace Looks Like . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3.2 Making Your Cell Visible to Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3.3 Making Other Cells Visible in Your Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3.4 Granting and Denying Foreign Users Access to Your Cell . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4 Configuring Your AFS Filespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4.1 The Top /afs Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4.2 The Second (Cellname) Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.4.3 The Third Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.5 Creating Volumes to Simplify Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.5.1 Assigning Volume Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.5.2 Grouping Related Volumes on a Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5.3 When to Replicate Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5.4 The Default Quota and ACL on a New Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 Configuring Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6.1 Replicating the OpenAFS Administrative Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.6.2 AFS Files on the Local Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.6.3 Configuring Partitions to Store AFS Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.6.4 Monitoring, Rebooting and Automatic Process Restarts . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.7 Configuring Client Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.7.1 Configuring the Local Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.7.2 Enabling Access to Foreign Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

OpenAFS Administration Guide vi

2.7.3 Using the @sys Variable in Pathnames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.7.4 Setting Server Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.8 Configuring AFS User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.8.1 Choosing Usernames and Naming Other Account Components . . . . . . . . . . . . . . . . . . . . . . . 27

2.8.2 Grouping Home Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.8.3 Making a Backup Version of User Volumes Available . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.8.4 Creating Standard Files in New AFS Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.9 Using AFS Protection Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.9.1 The Three System Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.9.2 The Two Types of User-Defined Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.10 Login and Authentication in AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.10.1 Identifying AFS Tokens by PAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.10.2 Using an AFS-modified login Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.10.3 Using Two-Step Login and Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.10.4 Obtaining, Displaying, and Discarding Tokens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.10.5 Setting Default Token Lifetimes for Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.10.6 Changing Passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.10.7 Imposing Restrictions on Passwords and Authentication Attempts . . . . . . . . . . . . . . . . . . . . . 35

2.10.8 Support for Kerberos Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.11 Security and Authorization in AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.11.1 Some Important Security Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.11.2 Three Types of Privilege . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.11.3 Authorization Checking versus Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.11.4 Improving Security in Your Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.11.5 A More Detailed Look at Mutual Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.11.5.1 Simple Mutual Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.11.5.2 Complex Mutual Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.12 Backing Up AFS Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.12.1 Backup Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.12.2 The AFS Backup System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.13 Accessing AFS through NFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

II Managing File Server Machines 41

3 Administering Server Machines 42

3.1 Summary of Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.2 Local Disk Files on a Server Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.2.1 Binaries in the /usr/afs/bin Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.2.2 Common Configuration Files in the /usr/afs/etc Directory . . . . . . . . . . . . . . . . . . . . . . . . . . 44

OpenAFS Administration Guide vii

3.2.3 Local Configuration Files in the /usr/afs/local Directory . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.2.4 Replicated Database Files in the /usr/afs/db Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.2.5 Log Files in the /usr/afs/logs Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.2.6 Volume Headers on Server Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.3 The Four Roles for File Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.3.1 Simple File Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.3.2 Database Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.3.3 Binary Distribution Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.3.4 The System Control Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.3.5 To locate database server machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.3.6 To locate the system control machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.3.7 To locate the binary distribution machine for a system type . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.3.8 Interpreting the Output from the bos status Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.3.8.1 The Output on the System Control Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.3.8.2 The Output on a Binary Distribution Machine . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.4 Administering Database Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.4.1 Replicating the OpenAFS Administrative Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.4.1.1 Configuring the Cell for Proper Ubik Operation . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.4.1.2 How Ubik Operates Automatically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.4.1.2.1 How Ubik Uses Timestamped Messages . . . . . . . . . . . . . . . . . . . . . . . . 53

3.4.1.2.2 A Flexible Coordinator Boosts Availability . . . . . . . . . . . . . . . . . . . . . . . 54

3.4.2 Backing Up and Restoring the Administrative Databases . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.4.3 To back up the administrative databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.4.4 To restore an administrative database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3.5 Installing Server Process Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.5.1 Installing New Binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.5.2 To install new server binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.5.3 Reverting to the Previous Version of Binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.5.4 To revert to the previous version of binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.5.5 Displaying Binary Version Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.5.6 To display binary version dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.5.7 Removing Obsolete Binary Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.5.8 To remove obsolete binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.5.9 Displaying A Binary File’s Build Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.5.10 To display an AFS binary’s build level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.6 Maintaining the Server CellServDB File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.6.1 Distributing the Server CellServDB File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.6.2 To display a cell’s database server machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.6.3 To add a database server machine to the CellServDB file . . . . . . . . . . . . . . . . . . . . . . . . . . 63

OpenAFS Administration Guide viii

3.6.4 To remove a database server machine from the CellServDB file . . . . . . . . . . . . . . . . . . . . . . 63

3.7 Managing Authentication and Authorization Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.7.1 Authentication versus Authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.7.2 Controlling Authorization Checking on a Server Machine . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.7.3 To disable authorization checking on a server machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.7.4 To enable authorization checking on a server machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.7.5 Bypassing Mutual Authentication for an Individual Command . . . . . . . . . . . . . . . . . . . . . . . 66

3.7.6 To bypass mutual authentication for bos, kas, pts, and vos commands . . . . . . . . . . . . . . . . . . . 66

3.7.7 To bypass mutual authentication for fs commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.8 Adding or Removing Disks and Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.8.1 To add and mount a new disk to house AFS volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3.8.2 To unmount and remove a disk housing AFS volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.9 Managing Server IP Addresses and VLDB Server Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.9.1 To create or edit the server NetInfo file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.9.2 To create or edit the server NetRestrict file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.9.3 To display all server entries from the VLDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.9.4 To remove obsolete server entries from the VLDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.9.5 To change a server machine’s IP addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.10 Rebooting a Server Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.10.1 To reboot a file server machine from its console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.10.2 To reboot a file server machine remotely . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4 Monitoring and Controlling Server Processes 73


4.2 Brief Descriptions of the AFS Server Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.2.1 The bosserver Process: the Basic OverSeer Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.2.2 The buserver Process: the Backup Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.2.3 The fs Collection of Processes: the File Server, Volume Server and Salvager . . . . . . . . . . . . . . . . 74

4.2.4 The kaserver Process: the Authentication Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.2.5 The ptserver Process: the Protection Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.2.6 The upserver and upclient Processes: the Update Server . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.2.7 The vlserver Process: the Volume Location Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.3 Controlling and Checking Process Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.3.1 The Information in the BosConfig File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.3.2 How the BOS Server Uses the Information in the BosConfig File . . . . . . . . . . . . . . . . . . . . . . 78

4.3.3 About Starting and Stopping the Database Server Processes . . . . . . . . . . . . . . . . . . . . . . . . 78

4.3.4 About Starting and Stopping the Update Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.4 Displaying Process Status and Information from the BosConfig File . . . . . . . . . . . . . . . . . . . . . . . . 79

4.4.1 To display the status of server processes and their BosConfig entries . . . . . . . . . . . . . . . . . . . . 79

OpenAFS Administration Guide ix

4.5 Creating and Removing Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.5.1 To create and start a new process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.5.2 To stop a process and remove it from the BosConfig file . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.6 Stopping and Starting Processes Permanently . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.6.1 To stop a process by changing its status to NotRun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.6.2 To start processes by changing their status flags to Run . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.7 Stopping and Starting Processes Temporarily . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.7.1 To stop processes temporarily . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.7.2 To start all stopped processes that have status flag Run in the BosConfig file . . . . . . . . . . . . . . . . 85

4.7.3 To start specific processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.8 Stopping and Immediately Restarting Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.8.1 To stop and restart all processes including the BOS Server . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.8.2 To stop and immediately restart all processes except the BOS Server . . . . . . . . . . . . . . . . . . . . 86

4.8.3 To stop and immediately restart specific processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.9 Setting the BOS Server’s Restart Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.9.1 To display the BOS Server restart times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.9.2 To set the general or binary restart time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.10 Displaying Server Process Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.10.1 To examine a server process log file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5 Managing Volumes 90


5.2 About Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.2.1 The Three Types of Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.2.2 How Volumes Improve AFS Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.2.3 Volume Information in the VLDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.2.4 The Information in Volume Headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.2.5 Keeping the VLDB and Volume Headers Synchronized . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.2.6 About Mounting Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.2.7 About Volume Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.3 Creating Read/write Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.3.1 To create (and mount) a read/write volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

5.4 About Clones and Cloning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.5 Replicating Volumes (Creating Read-only Volumes) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5.5.1 Using Read-only Volumes Effectively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.5.2 Replication Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

5.5.3 To replicate a read/write volume (create a read-only volume) . . . . . . . . . . . . . . . . . . . . . . . . 98

5.6 Creating Backup Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.6.1 Backing Up Multiple Volumes at Once . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

OpenAFS Administration Guide x

5.6.2 Automating Creation of Backup Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.6.3 Making the Contents of Backup Volumes Available to Users . . . . . . . . . . . . . . . . . . . . . . . . 101

5.6.4 To create and mount a backup volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.6.5 To create multiple backup volumes at once . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.7 Mounting Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.7.1 The Rules of Mount Point Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.7.2 The Three Types of Mount Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.7.3 Creating a mount point in a foreign cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.7.4 To display a mount point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

5.7.5 To create a regular or read/write mount point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

5.7.6 To create a cellular mount point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

5.7.7 To remove a mount point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.7.8 To access volumes directly by volume ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.8 Displaying Information About Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.8.1 Displaying VLDB Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5.8.2 To display VLDB entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5.8.3 Displaying Volume Headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.8.4 To display volume headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.8.5 Displaying One Volume’s VLDB Entry and Volume Header . . . . . . . . . . . . . . . . . . . . . . . . 112

5.8.6 To display one volume’s VLDB entry and volume header . . . . . . . . . . . . . . . . . . . . . . . . . . 112

5.8.7 Displaying the Name or Location of the Volume that Contains a File . . . . . . . . . . . . . . . . . . . . 113

5.8.7.1 To display the name of the volume that contains a file . . . . . . . . . . . . . . . . . . . . . . 113

5.8.7.2 To display the ID number of the volume that contains a file . . . . . . . . . . . . . . . . . . . 113

5.8.7.3 To display the location of the volume that contains a file . . . . . . . . . . . . . . . . . . . . . 114

5.9 Moving Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

5.9.1 To move a read/write volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

5.10 Synchronizing the VLDB and Volume Headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

5.10.1 To synchronize the VLDB with volume headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

5.11 Salvaging Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.11.1 To salvage volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

5.12 Setting and Displaying Volume Quota and Current Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.12.1 To set quota for a single volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.12.2 To set maximum quota on one or more volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.12.3 To display percent quota used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

5.12.4 To display quota, current size, and other information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

5.12.5 To display quota, current size, and more partition information . . . . . . . . . . . . . . . . . . . . . . . 123

5.13 Removing Volumes and their Mount Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

5.13.1 Other Removal Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

5.13.2 To remove a volume and unmount it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

OpenAFS Administration Guide xi

5.14 Dumping and Restoring Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.14.1 About Dumping Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.14.2 To dump a volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

5.14.3 About Restoring Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.14.4 To restore a dump into a new volume and mount it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.14.5 To restore a dump file, overwriting an existing volume . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

5.15 Renaming Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

5.15.1 To rename a volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

5.16 Unlocking and Locking VLDB Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

5.16.1 To lock a VLDB entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

5.16.2 To unlock a single VLDB entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

5.16.3 To unlock multiple VLDB entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

6 Configuring the AFS Backup System 132


6.2 Introduction to Backup System Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

6.2.1 Volume Sets and Volume Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.2.2 Dumps and Dump Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.2.3 Dump Hierarchies, Dump Levels and Expiration Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.2.4 Dump Names and Tape Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.2.5 Tape Labels, Dump Labels, and EOF Markers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.2.6 Tape Coordinator Machines, Port Offsets, and Backup Data Files . . . . . . . . . . . . . . . . . . . . . 135

6.2.7 The Backup Database and Backup Server Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

6.2.8 Interfaces to the Backup System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.3 Overview of Backup System Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.4 Configuring the tapeconfig File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

6.4.1 To run the fms command on a noncompressing tape device . . . . . . . . . . . . . . . . . . . . . . . . . 138

6.5 Granting Administrative Privilege to Backup Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

6.6 Configuring Tape Coordinator Machines and Tape Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6.6.1 To configure a Tape Coordinator machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6.6.2 To configure an additional Tape Coordinator on an existing Tape Coordinator machine . . . . . . . . . . 141

6.6.3 To unconfigure a Tape Coordinator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

6.6.4 To display the list of configured Tape Coordinators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

6.7 Defining and Displaying Volume Sets and Volume Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

6.7.1 To create a volume set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.7.2 To add a volume entry to a volume set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.7.3 To display volume sets and volume entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.7.4 To delete a volume set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

6.7.5 To delete a volume entry from a volume set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

OpenAFS Administration Guide xii

6.8 Defining and Displaying the Dump Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.8.1 Creating a Tape Recycling Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

6.8.2 Archiving Tapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6.8.3 Defining Expiration Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6.8.4 To add a dump level to the dump hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

6.8.5 To change a dump level’s expiration date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

6.8.6 To delete a dump level from the dump hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

6.8.7 To display the dump hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

6.9 Writing and Reading Tape Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

6.9.1 Recording a Name on the Label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

6.9.2 Recording a Capacity on the Label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

6.9.3 To label a tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

6.9.4 To read the label on a tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

6.10 Automating and Increasing the Efficiency of the Backup Process . . . . . . . . . . . . . . . . . . . . . . . . . . 156

6.10.1 Creating a Device Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

6.10.2 Invoking a Device’s Tape Mounting and Unmounting Routines . . . . . . . . . . . . . . . . . . . . . . . 157

6.10.2.1 How the Tape Coordinator Uses the MOUNT and UNMOUNT Instructions . . . . . . . . . . 158

6.10.2.2 The Available Parameters and Required Exit Codes . . . . . . . . . . . . . . . . . . . . . . . 158

6.10.3 Eliminating the Search or Prompt for the Initial Tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

6.10.4 Enabling Default Responses to Error Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

6.10.5 Eliminating the AFS Tape Name Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

6.10.6 Setting the Memory Buffer Size to Promote Tape Streaming . . . . . . . . . . . . . . . . . . . . . . . . 161

6.10.7 Dumping Data to a Backup Data File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

6.10.8 To configure a backup data file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

7 Backing Up and Restoring AFS Data 165


7.2 Using the Backup System’s Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

7.2.1 Performing Backup Operations as the Local Superuser Root or in a Foreign Cell . . . . . . . . . . . . . 166

7.2.2 Using Interactive and Regular Command Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

7.2.3 To enter interactive mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

7.2.4 To exit interactive mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

7.2.5 To display pending or running jobs in interactive mode . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

7.2.6 To cancel operations in interactive mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7.2.7 Starting and Stopping the Tape Coordinator Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7.2.8 To start a Tape Coordinator process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7.2.9 To stop a Tape Coordinator process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

7.2.10 To check the status of a Tape Coordinator process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

7.3 Backing Up Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

OpenAFS Administration Guide xiii

7.3.1 Making Backup Operations More Efficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

7.3.2 How Your Configuration Choices Influence the Dump Process . . . . . . . . . . . . . . . . . . . . . . . 174

7.3.3 Appending Dumps to an Existing Dump Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

7.3.4 Scheduling Dumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

7.3.5 To create a dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

7.4 Displaying Backup Dump Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

7.4.1 To display dump records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

7.4.2 To display a volume’s dump history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

7.4.3 To scan the contents of a tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

7.5 Restoring and Recovering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

7.5.1 Making Restore Operations More Efficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

7.5.2 Using the backup volrestore Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

7.5.3 To restore volumes with the backup volrestore command . . . . . . . . . . . . . . . . . . . . . . . . . . 188

7.5.4 Using the backup diskrestore Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

7.5.5 To restore a partition with the backup diskrestore command . . . . . . . . . . . . . . . . . . . . . . . . 190

7.5.6 Using the backup volsetrestore Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

7.5.6.1 Restoring a Volume Set with the -name Argument . . . . . . . . . . . . . . . . . . . . . . . . 191

7.5.6.2 Restoring Volumes Listed in a File with the -file Argument . . . . . . . . . . . . . . . . . . . 192

7.5.7 To restore a group of volumes with the backup volsetrestore command . . . . . . . . . . . . . . . . . . . 193

7.6 Maintaining the Backup Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.6.1 Backing Up and Restoring the Backup Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.6.2 Checking for and Repairing Corruption in the Backup Database . . . . . . . . . . . . . . . . . . . . . . 195

7.6.3 To verify the integrity of the Backup Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.6.4 To repair corruption in the Backup Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

7.6.5 Removing Obsolete Records from the Backup Database . . . . . . . . . . . . . . . . . . . . . . . . . . 197

7.6.6 To delete dump records from the Backup Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

8 Monitoring and Auditing AFS Performance 200


8.2 Using the scout Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

8.2.1 System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

8.2.2 Using the -basename argument to Specify a Domain Name . . . . . . . . . . . . . . . . . . . . . . . . . 201

8.2.3 The Layout of the scout Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

8.2.3.1 The Banner Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

8.2.3.2 The Statistics Display Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

8.2.3.3 The Probe Reporting Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

8.2.4 Highlighting Significant Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

8.2.4.1 Highlighting Server Outages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

8.2.4.2 Highlighting for Extreme Statistic Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

OpenAFS Administration Guide xiv

8.2.5 Resizing the scout Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

8.2.6 To start the scout program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

8.2.7 To stop the scout program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8.2.8 Example Commands and Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8.3 Using the fstrace Command Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

8.3.1 About the fstrace Command Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

8.3.2 Requirements for Using the fstrace Command Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

8.3.3 Using fstrace Commands Effectively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

8.3.4 Activating the Trace Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

8.3.5 To configure the trace log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

8.3.6 To set the event set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

8.3.7 Displaying the State of a Trace Log or Event Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

8.3.8 To display the state of an event set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

8.3.9 To display the log size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

8.3.10 Dumping and Clearing the Trace Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

8.3.11 To dump the contents of a trace log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

8.3.12 To clear the contents of a trace log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

8.3.13 Examples of fstrace Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

8.4 Using the afsmonitor Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

8.4.1 Requirements for running the afsmonitor program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

8.4.2 The afsmonitor Output Screens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

8.4.3 The System Overview Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

8.4.4 The File Servers Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

8.4.5 The Cache Managers Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

8.5 Configuring the afsmonitor Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

8.6 Writing afsmonitor Statistics to a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

8.7 To start the afsmonitor Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

8.8 To stop the afsmonitor program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

8.9 The xstat Data Collection Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

8.9.1 The libxstat Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

8.9.2 Example xstat Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

8.9.2.1 To use the example xstat_fs_test command . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

8.9.2.2 To use the example xstat_cm_test command . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

8.10 Auditing AFS Events on AIX File Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

8.10.1 Configuring AFS Auditing on AIX File Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

8.10.2 To enable AFS auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

8.10.3 To disable AFS auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

OpenAFS Administration Guide xv

9 Managing Server Encryption Keys 227


9.2 About Server Encryption Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

9.2.1 Keys and Mutual Authentication: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

9.2.2 Maintaining AFS Server Encryption Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

9.3 Displaying Server Encryption Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.3.1 To display the KeyFile file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.3.2 To display the afs key from the Authentication Database . . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.4 Adding Server Encryption Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

9.4.1 To add a new server encryption key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

9.5 Removing Server Encryption Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

9.5.1 To remove a key from the KeyFile file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

9.6 Handling Server Encryption Key Emergencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

9.6.1 Prevent Mutual Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

9.6.2 Disable Authorization Checking by Hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

9.6.3 Work Quickly on Each Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

9.6.4 Work at the Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

9.6.5 Change Individual KeyFile Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

9.6.6 Two Component Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

9.6.6.1 Disabling Authorization Checking in an Emergency . . . . . . . . . . . . . . . . . . . . . . . 235

9.6.6.2 Reenabling Authorization Checking in an Emergency . . . . . . . . . . . . . . . . . . . . . . 235

9.6.7 To create a new server encryption key in emergencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

III Managing Client Machines 238

10 Administering Client Machines and the Cache Manager 239


10.2 Overview of Cache Manager Customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

10.3 Configuration and Cache-Related Files on the Local Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

10.3.1 Configuration Files in the /usr/vice/etc Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

10.3.2 Cache-Related Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

10.4 Determining the Cache Type, Size, and Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

10.4.1 Choosing the Cache Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

10.4.2 Displaying and Setting the Cache Size and Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

10.4.3 To display the cache size set at reboot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

10.4.4 To display the current cache size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

10.4.5 To edit the cacheinfo file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

10.4.6 To change the disk cache size without rebooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

10.4.7 To reset the disk cache size to the default without rebooting . . . . . . . . . . . . . . . . . . . . . . . . 245

OpenAFS Administration Guide xvi

10.4.8 How the Cache Manager Chooses Data to Discard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

10.5 Setting Other Cache Parameters with the afsd program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

10.5.1 Setting Cache Configuration Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

10.5.2 Configuring a Disk Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

10.5.3 Controlling Memory Cache Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

10.5.4 Tuning Cache Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

10.6 Maintaining Knowledge of Database Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

10.6.1 How Clients Use the List of Database Server Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

10.6.2 The Format of the CellServDB file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

10.6.3 Maintaining the Client CellServDB File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

10.6.4 To display the /usr/vice/etc/CellServDB file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

10.6.5 To display the list of database server machines in kernel memory . . . . . . . . . . . . . . . . . . . . . . 252

10.6.6 To change the list of a cell’s database server machines in kernel memory . . . . . . . . . . . . . . . . . . 252

10.7 Determining if a Client Can Run Setuid Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

10.7.1 To determine a cell’s setuid status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

10.7.2 To change a cell’s setuid status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

10.8 Setting the File Server Probe Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

10.8.1 To set a client’s file server probe interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

10.9 Setting a Client Machine’s Cell Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

10.9.1 To display a client machine’s cell membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

10.9.2 To set a client machine’s cell membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

10.10Forcing the Update of Cached Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

10.10.1 To flush certain files or directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

10.10.2 To flush all data from a volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

10.10.3 To force the Cache Manager to notice other volume changes . . . . . . . . . . . . . . . . . . . . . . . . 257

10.10.4 To flush one or more mount points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

10.11Maintaining Server Preference Ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

10.11.1 How the Cache Manager Sets Default Ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

10.11.2 How the Cache Manager Uses Preference Ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

10.11.3 Displaying and Setting Preference Ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

10.11.4 To display server preference ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

10.11.5 To set server preference ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

10.12Managing Multihomed Client Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

10.12.1 To create or edit the client NetInfo file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

10.12.2 To create or edit the client NetRestrict file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

10.12.3 To display the list of addresses from kernel memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

10.12.4 To set the list of addresses in kernel memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

10.13Controlling the Display of Warning and Informational Messages . . . . . . . . . . . . . . . . . . . . . . . . . . 262

10.13.1 To control the display of warning and status messages . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

OpenAFS Administration Guide xvii

10.14Displaying and Setting the System Type Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

10.14.1 To display the system type name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

10.14.2 To change the system type name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

10.15Enabling Asynchronous Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

10.15.1 To set the default store asynchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

10.15.2 To set the store asynchrony for one or more files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

10.15.3 To display the default store asynchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

10.15.4 To display the store asynchrony for one or more files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

IV Managing Users and Groups 266

11 Creating and Deleting User Accounts with the uss Command Suite 267


11.2 Overview of the uss Command Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

11.2.1 The Components of an AFS User Account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

11.2.2 Privilege Requirements for the uss Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

11.2.3 Avoiding and Recovering from Errors and Interrupted Operations . . . . . . . . . . . . . . . . . . . . . 268

11.3 Creating Local Password File Entries with uss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

11.3.1 Assigning AFS and UNIX UIDs that Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

11.3.2 Specifying Passwords in the Local Password File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

11.3.3 Creating a Common Source Password File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

11.4 Converting Existing UNIX Accounts with uss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

11.4.1 Making UNIX and AFS UIDs Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

11.4.2 Setting the Password Field Appropriately . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

11.4.3 Moving Local Files into AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

11.5 Constructing a uss Template File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

11.5.1 Creating the Three Types of User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

11.5.2 Using Constants and Variables in the Template File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

11.5.3 Where to Place Template Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

11.5.4 Some General Rules for Constructing a Template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

11.5.5 About Creating Local Disk Directories and Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

11.5.6 Example uss Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

11.5.7 Evenly Distributing User Home Directories with the G Instruction . . . . . . . . . . . . . . . . . . . . . 278

11.5.8 Creating a Volume with the V Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

11.5.9 Creating a Directory with the D Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

11.5.10 Creating a File from a Prototype with the F Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . 282

11.5.11 Creating One-Line Files with the E Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

11.5.12 Creating Links with the L and S Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

11.5.13 Increasing Account Security with the A Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

OpenAFS Administration Guide xviii

11.5.14 Executing Commands with the X Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

11.6 Creating Individual Accounts with the uss add Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

11.6.1 To create an AFS account with the uss add command . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

11.7 Deleting Individual Accounts with the uss delete Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

11.7.1 To delete an AFS account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

11.8 Creating and Deleting Multiple Accounts with the uss bulk Command . . . . . . . . . . . . . . . . . . . . . . . 293

11.8.1 Constructing a Bulk Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

11.8.2 Example Bulk Input File Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

11.8.3 To create and delete multiple AFS user accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

12 Administering User Accounts 298


12.2 The Components of an AFS User Account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

12.3 Creating Local Password File Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

12.3.1 Assigning AFS and UNIX UIDs that Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

12.3.2 Specifying Passwords in the Local Password File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

12.4 Converting Existing UNIX Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

12.4.1 Making UNIX and AFS UIDs Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

12.4.2 Setting the Password Field Appropriately . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

12.4.3 Moving Local Files into AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

12.5 Creating AFS User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

12.5.1 To create one user account with individual commands . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

12.6 Improving Password and Authentication Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

12.6.1 To limit the number of consecutive failed authentication attempts . . . . . . . . . . . . . . . . . . . . . 307

12.6.2 To unlock a locked user account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

12.6.3 To set password lifetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

12.6.4 To prohibit reuse of passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

12.7 Changing AFS Passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

12.7.1 To change an AFS password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

12.8 Displaying and Setting the Quota on User Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

12.9 Changing Usernames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

12.9.1 To change a username . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

12.10Removing a User Account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

12.10.1 To remove a user account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

OpenAFS Administration Guide xix

13 Administering the Protection Database 315


13.2 About the Protection Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

13.2.1 The System Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

13.3 Displaying Information from the Protection Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

13.3.1 To display a Protection Database entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

13.3.2 To display group membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318

13.3.3 To list the groups that a user or group owns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

13.3.4 To display all Protection Database entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

13.4 Creating User and Machine Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

13.4.1 To create machine entries in the Protection Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

13.5 Creating Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

13.5.1 Using Groups Effectively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

13.5.2 To create groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

13.5.3 To create a self-owned group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

13.5.4 Using Prefix-Less Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

13.6 Adding and Removing Group Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

13.6.1 To add users and machines to groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

13.6.2 To remove users and machines from groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

13.7 Deleting Protection Database Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

13.7.1 To delete Protection Database entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

13.8 Changing a Group’s Owner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

13.8.1 To change a group’s owner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

13.9 Changing a Protection Database Entry’s Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327

13.9.1 To change the name of a machine or group entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

13.10Setting Group-Creation Quota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

13.10.1 To set group-creation quota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

13.11Setting the Privacy Flags on Database Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

13.11.1 To set a Protection Database entry’s privacy flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

13.12Displaying and Setting the AFS UID and GID Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

13.12.1 To display the AFS ID counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

13.12.2 To set the AFS ID counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

14 Managing Access Control Lists 332


14.2 Protecting Data in AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

14.2.1 Differences Between UFS and AFS Data Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

14.2.2 The AFS ACL Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

14.2.2.1 The Four Directory Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

OpenAFS Administration Guide xx

14.2.2.2 The Three File Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

14.2.2.3 The Eight Auxiliary Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

14.2.2.4 Shorthand Notation for Sets of Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

14.2.3 Using Normal and Negative Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

14.2.4 Using Groups on ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

14.3 Displaying ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

14.3.1 To display an ACL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

14.4 Setting ACL Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

14.4.1 To add, remove, or edit normal ACL permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

14.4.2 To add, remove, or edit negative ACL permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

14.5 Completely Replacing an ACL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

14.5.1 To replace an ACL completely . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

14.6 Copying ACLs Between Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

14.6.1 To copy an ACL between directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

14.7 Removing Obsolete AFS IDs from ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

14.7.1 To clean obsolete AFS IDs from an ACL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

14.8 How AFS Interprets the UNIX Mode Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

15 Managing Administrative Privilege 343


15.2 An Overview of Administrative Privilege . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

15.2.1 The Reason for Separate Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

15.3 Administering the system:administrators Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

15.3.1 To display the members of the system:administrators group . . . . . . . . . . . . . . . . . . . . . . . . . 344

15.3.2 To add users to the system:administrators group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

15.3.3 To remove users from the system:administrators group . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

15.4 Granting Privilege for kas Commands: the ADMIN Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

15.4.1 To check if the ADMIN flag is set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

15.4.2 To set or remove the ADMIN flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

15.5 Administering the UserList File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

15.5.1 To display the users in the UserList file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

15.5.2 To add users to the UserList file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

15.5.3 To remove users from the UserList file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

V Managing the NFS/AFS Translator 348.1 Summary of Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

.2.1 Enabling Unauthenticated or Authenticated AFS Access . . . . . . . . . . . . . . . . . . . . . . . . . . 349

.2.2 Setting the AFSSERVER and AFSCONF Environment Variables . . . . . . . . . . . . . . . . . . . . . . 350

OpenAFS Administration Guide xxi

.2.2.1 The AFSSERVER Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

.2.2.2 The AFSCONF Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

.2.2.3 Setting Values for the Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

.2.3 Delayed Writes for Files Saved on NFS Client Machines . . . . . . . . . . . . . . . . . . . . . . . . . . 351

.3 Configuring NFS/AFS Translator Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

.3.1 Loading NFS and AFS Kernel Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

.3.2 Configuring the Translator Machine to Accept AFS Commands . . . . . . . . . . . . . . . . . . . . . . 352

.3.3 Controlling Optional Translator Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

.3.4 To configure an NFS/AFS translator machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

.3.5 To disable or enable Translator functionality, or set optional features . . . . . . . . . . . . . . . . . . . . 354

.4 Configuring NFS Client Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

.4.1 To configure an NFS client machine to access AFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

.5 Configuring User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

.5.1 To configure a user account for issuing AFS commands . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

.6 Authenticating on Unsupported NFS Client Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

.6.1 To authenticate using the knfs command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358

.6.2 To display tokens using the knfs command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358

.6.3 To discard tokens using the knfs command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359

VI Using AFS Commands 360.7 AFS Command Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

.7.1 Command Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

.7.2 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

.7.3 Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

.7.4 Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

.7.5 An Example Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

.7.6 Rules for Entering AFS Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

.7.6.1 Conditions for Omitting Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

.7.6.2 An Example of Omitting Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

.7.7 Rules for Using Abbreviations and Aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

.7.7.1 Abbreviating Operation Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

.7.7.2 Abbreviating Switches and Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

.7.7.3 Abbreviating Server Machine Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

.7.7.4 Abbreviating Partition Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

.7.7.5 Abbreviating Cell Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

.7.8 Displaying Online Help for AFS Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

OpenAFS Administration Guide xxii

VII The afsmonitor Program Statistics 367.8 The Cache Manager Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

.8.1 Performance Statistics Section (PerfStats_section) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

.8.2 Server Up/Down Statistics Section (Server_UpDown_section) . . . . . . . . . . . . . . . . . . . . . . . 370

.8.3 RPC Operation Measurements Section (RPCop_section) . . . . . . . . . . . . . . . . . . . . . . . . . . 373

.8.4 Authentication and Replicated File Access Section (Auth_Access_section) . . . . . . . . . . . . . . . . 385

.9 The File Server Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

.9.1 Performance Statistics Section (PerfStats_section) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

.9.2 RPC Operations Section (RPCop_section) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388

.9.3 CallBack Statistics Section (CallBackStats_section) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394

VIII AIX Audit Events 396.10 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

.11 Audit-Specific Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

.12 Volume Server Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

.13 Backup Server Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399

.14 Protection Server Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400

.15 Authentication Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

.16 File Server and Cache Manager Interface Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402

.17 BOS Server Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

.18 Volume Location Server Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404

16 Index 406

OpenAFS Administration Guide xxiii

List of Figures

5.1 File Sharing Between the Read/write Source and a Clone Volume . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8.1 First example scout display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8.2 Second example scout display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

8.3 Third example scout display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

8.4 Fourth example scout display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

8.5 The afsmonitor System Overview Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

8.6 The afsmonitor File Servers Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

8.7 The afsmonitor File Servers Screen Shifted One Page to the Right . . . . . . . . . . . . . . . . . . . . . . . . . 219

8.8 The afsmonitor Cache Managers Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

OpenAFS Administration Guide xxiv

List of Tables

2.1 Suggested volume prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2 Example volume-prefixing scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

11.1 Source for values of uss template variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

11.2 Command-line argument sources for uss template variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

Abstract

This document describes the configuration and maintenance of OpenAFS from an adminstrator’s standpoint. It provides a concep-tual overview of OpenAFS for cell administrators, details on OpenAFS configuration, and information about managing OpenAFSservers and clients.

OpenAFS Administration Guide xxvi

About This Guide

This section describes the purpose, organization, and conventions of this document.

Audience and Purpose

This guide describes the concepts and procedures that an AFS(R) system administrator needs to know. It assumes familiaritywith UNIX(R) administration, but no previous knowledge of AFS.

This document describes AFS commands in the context of specific tasks. Thus, it does not describe all commands in detail. Referto the OpenAFS Administration Reference for detailed command descriptions.

Document Organization

This document groups AFS administrative tasks into the following conceptual sections:

• Concepts and Configuration Issues

• Managing File Server Machines

• Managing Client Machines

• Managing Users and Groups

The individual chapters in each section contain the following:

• A chapter overview

• A quick reference list of the tasks and commands described in the chapter

• An introduction to concepts that pertain to all of the tasks described in the chapter

• A set of sections devoted to specific tasks. Each section begins with a discussion of concepts specific to that task, followed bystep-by-step instructions for performing the task. The instructions are as specific as has been judged practical. If two relatedprocedures differ from one another in important details, separate sets of instructions are usually provided.

How to Use This Document

When you need to perform a specific administrative task, follow these steps:

1. Determine if the task concerns file server machines, client machines, or users and groups. Turn to the appropriate sectionin this document and then to the appropriate chapter.

2. Read or review the general introductory material at the beginning of the chapter.

3. Read or review the introductory material concerning the specific task you wish to perform.

4. Follow the step-by-step instructions for the task.

5. If necessary, refer to the OpenAFS Administration Reference for more detailed information about the commands.

OpenAFS Administration Guide xxvii

Related Documents

The following documents are also included in the AFS documentation set.

OpenAFS Quick Start Guide This guide provides instructions for installing AFS server and client machines. It is assumed thatthe installer is an experienced UNIX(R) system administrator.

For predictable performance, machines must be installed and configured in accordance with the instructions in this guide.

OpenAFS User Guide This guide presents the basic concepts and procedures necessary for using AFS effectively. It assumesthat the reader has some experience with UNIX, but does not require familiarity with networking or AFS.

The guide explains how to perform basic functions, including authenticating, changing a password, protecting AFS data,creating groups, and troubleshooting. It provides illustrative examples for each function and describes some of the differ-ences between the UNIX file system and AFS.

OpenAFS Reference Manual This reference manual details the syntax and effect of each AFS command. It is intended for theexperienced AFS administrator, programmer, or user. It contains a reference page for each command or file specifyingits syntax, including the acceptable aliases and abbreviations. It then describes the command’s function, arguments, andoutput if any. Examples and a list of related commands are provided, as are warnings where appropriate.

This manual complements the OpenAFS Administration Guide: it does not include procedural information, but describescommands in more detail than the OpenAFS Administration Guide.

The OpenAFS Reference Manual is provided in the form of UNIX(R) manual pages and as HTML pages.

OpenAFS for Windows Release Notes This document provides a series of usage notes regarding the OpenAFS for Windowsclient, supported platforms, contribution information, debugging techniques, and a reference to supported Windows reg-istry values.

Typographical Conventions

This document uses the following typographical conventions:

• Command and option names appear in bold type in syntax definitions, examples, and running text. Names of directories, files,machines, partitions, volumes, and users also appear in bold type.

• Variable information appears in italic type. This includes user-supplied information on command lines and the parts of promptsthat differ depending on who issues the command. New terms also appear in italic type.

• Examples of screen output and file contents appear in monospace type.

In addition, the following symbols appear in command syntax definitions, both in the documentation and in AFS online helpstatements. When issuing a command, do not type these symbols.

• Square brackets [ ] surround optional items.

• Angle brackets < > surround user-supplied values in AFS commands.

• A superscripted plus sign + follows an argument that accepts more than one value.

• The percent sign % represents the regular command shell prompt. Some operating systems possibly use a different characterfor this prompt.

• The number sign # represents the command shell prompt for the local superuser root. Some operating systems possibly use adifferent character for this prompt.

• The pipe symbol | in a command syntax statement separates mutually exclusive values for an argument.

For additional information on AFS commands, including a description of command string components, acceptable abbreviationsand aliases, and how to get online help for commands, see Appendix B, Using AFS Commands.

OpenAFS Administration Guide 1 / 433

Part I

Concepts and Configuration Issues


Chapter 1

An Overview of OpenAFS Administration

This chapter provides a broad overview of the concepts and organization of AFS. It is strongly recommended that anyone involvedin administering an AFS cell read this chapter before beginning to issue commands.

1.1 A Broad Overview of AFS

This section introduces most of the key terms and concepts necessary for a basic understanding of AFS. For a more detaileddiscussion, see More Detailed Discussions of Some Basic Concepts.

1.1.1 AFS: A Distributed File System

AFS is a distributed file system that enables users to share and access all of the files stored in a network of computers as easilyas they access the files stored on their local machines. The file system is called distributed for this exact reason: files can resideon many different machines (be distributed across them), but are available to users on every machine.

1.1.2 Servers and Clients

AFS stores files on file server machines. File server machines provide file storage and delivery service, along with other special-ized services, to the other subset of machines in the network, the client machines. These machines are called clients because theymake use of the servers’ services while doing their own work. In a standard AFS configuration, clients provide computationalpower, access to the files in AFS and other "general purpose" tools to the users seated at their consoles. There are generally manymore client workstations than file server machines.

AFS file server machines run a number of server processes, so called because each provides a distinct specialized service: onehandles file requests, another tracks file location, a third manages security, and so on. To avoid confusion, AFS documentationalways refers to server machines and server processes, not simply to servers. For a more detailed description of the serverprocesses, see AFS Server Processes and the Cache Manager.

1.1.3 Cells

A cell is an administratively independent site running AFS. As a cell’s system administrator, you make many decisions aboutconfiguring and maintaining your cell in the way that best serves its users, without having to consult the administrators in othercells. For example, you determine how many clients and servers to have, where to put files, and how to allocate client machinesto users.


1.1.4 Transparent Access and the Uniform Namespace

Although your AFS cell is administratively independent, you probably want to organize the local collection of files (your filespaceor tree) so that users from other cells can also access the information in it. AFS enables cells to combine their local filespacesinto a global filespace, and does so in such a way that file access is transparent--users do not need to know anything about a file’slocation in order to access it. All they need to know is the pathname of the file, which looks the same in every cell. Thus everyuser at every machine sees the collection of files in the same way, meaning that AFS provides a uniform namespace to its users.

1.1.5 Volumes

AFS groups files into volumes, making it possible to distribute files across many machines and yet maintain a uniform namespace.A volume is a unit of disk space that functions like a container for a set of related files, keeping them all together on one partition.Volumes can vary in size, but are (by definition) smaller than a partition.

Volumes are important to system administrators and users for several reasons. Their small size makes them easy to move fromone partition to another, or even between machines. The system administrator can maintain maximum efficiency by movingvolumes to keep the load balanced evenly. In addition, volumes correspond to directories in the filespace--most cells store thecontents of each user home directory in a separate volume. Thus the complete contents of the directory move together when thevolume moves, making it easy for AFS to keep track of where a file is at a certain time.

Volume moves are recorded automatically, so users do not have to keep track of file locations. Volumes can be moved from serverto server by a cell administrator without notifying clients, even while the volume is in active use by a client machine. Volumemoves are transparent to client machines apart from a brief interruption in file service for files in that volume.

1.1.6 Efficiency Boosters: Replication and Caching

AFS incorporates special features on server machines and client machines that help make it efficient and reliable.

On server machines, AFS enables administrators to replicate commonly-used volumes, such as those containing binaries for pop-ular programs. Replication means putting an identical read-only copy (sometimes called a clone) of a volume on more than onefile server machine. The failure of one file server machine housing the volume does not interrupt users’ work, because the vol-ume’s contents are still available from other machines. Replication also means that one machine does not become overburdenedwith requests for files from a popular volume.

On client machines, AFS uses caching to improve efficiency. When a user on a client machine requests a file, the Cache Manageron the client sends a request for the data to the File Server process running on the proper file server machine. The user does notneed to know which machine this is; the Cache Manager determines file location automatically. The Cache Manager receivesthe file from the File Server process and puts it into the cache, an area of the client machine’s local disk or memory dedicated totemporary file storage. Caching improves efficiency because the client does not need to send a request across the network everytime the user wants the same file. Network traffic is minimized, and subsequent access to the file is especially fast because thefile is stored locally. AFS has a way of ensuring that the cached file stays up-to-date, called a callback.

1.1.7 Security: Mutual Authentication and Access Control Lists

Even in a cell where file sharing is especially frequent and widespread, it is not desirable that every user have equal access toevery file. One way AFS provides adequate security is by requiring that servers and clients prove their identities to one anotherbefore they exchange information. This procedure, called mutual authentication, requires that both server and client demonstrateknowledge of a "shared secret" (like a password) known only to the two of them. Mutual authentication guarantees that serversprovide information only to authorized clients and that clients receive information only from legitimate servers.

Users themselves control another aspect of AFS security, by determining who has access to the directories they own. For anydirectory a user owns, he or she can build an access control list (ACL) that grants or denies access to the contents of the directory.An access control list pairs specific users with specific types of access privileges. There are seven separate permissions and up totwenty different people or groups of people can appear on an access control list.

For a more detailed description of AFS’s mutual authentication procedure, see A More Detailed Look at Mutual Authentication.For further discussion of ACLs, see Managing Access Control Lists.


1.2 More Detailed Discussions of Some Basic Concepts

The previous section offered a brief overview of the many concepts that an AFS system administrator needs to understand.The following sections examine some important concepts in more detail. Although not all concepts are new to an experiencedadministrator, reading this section helps ensure a common understanding of term and concepts.

1.2.1 Networks

A network is a collection of interconnected computers able to communicate with each other and transfer information back andforth.

A network can connect computers of any kind, but the typical network running AFS connects servers or high-function personalworkstations with AFS file server machines. For more about the classes of machines used in an AFS environment, see Serversand Clients.

1.2.2 Distributed File Systems

A file system is a collection of files and the facilities (programs and commands) that enable users to access the information in thefiles. All computing environments have file systems.

Networked computing environments often use distributed file systems like AFS. A distributed file system takes advantage of theinterconnected nature of the network by storing files on more than one computer in the network and making them accessible toall of them. In other words, the responsibility for file storage and delivery is "distributed" among multiple machines instead ofrelying on only one. Despite the distribution of responsibility, a distributed file system like AFS creates the illusion that there isa single filespace.

1.2.3 Servers and Clients

AFS uses a server/client model. In general, a server is a machine, or a process running on a machine, that provides specializedservices to other machines. A client is a machine or process that makes use of a server’s specialized service during the course ofits own work, which is often of a more general nature than the server’s. The functional distinction between clients and server isnot always strict, however--a server can be considered the client of another server whose service it is using.

AFS divides the machines on a network into two basic classes, file server machines and client machines, and assigns differenttasks and responsibilities to each.

File Server Machines

File server machines store the files in the distributed file system, and a server process running on the file server machine deliversand receives files. AFS file server machines run a number of server processes. Each process has a special function, such asmaintaining databases important to AFS administration, managing security or handling volumes. This modular design enableseach server process to specialize in one area, and thus perform more efficiently. For a description of the function of each AFSserver process, see AFS Server Processes and the Cache Manager.

Not all AFS server machines must run all of the server processes. Some processes run on only a few machines because thedemand for their services is low. Other processes run on only one machine in order to act as a synchronization site. See The FourRoles for File Server Machines.

Client Machines

The other class of machines are the client machines, which generally work directly for users, providing computational powerand other general purpose tools but may also be other servers that use data stored in AFS to provide other services. Clientsalso provide users with access to the files stored on the file server machines. Clients run a Cache Manager, which is normally acombination of a kernel module and a running process that enables them to communicate with the AFS server processes runningon the file server machines and to cache files. See The Cache Manager for more information. There are usually many more clientmachines in a cell than file server machines.


1.2.4 Cells

A cell is an independently administered site running AFS. In terms of hardware, it consists of a collection of file server machinesdefined as belonging to the cell. To say that a cell is administratively independent means that its administrators determine manydetails of its configuration without having to consult administrators in other cells or a central authority. For example, a celladministrator determines how many machines of different types to run, where to put files in the local tree, how to associatevolumes and directories, and how much space to allocate to each user.

The terms local cell and home cell are equivalent, and refer to the cell in which a user has initially authenticated during a session,by logging onto a machine that belongs to that cell. All other cells are referred to as foreign from the user’s perspective. In otherwords, throughout a login session, a user is accessing the filespace through a single Cache Manager--the one on the machine towhich he or she initially logged in--and that Cache Manager is normally configured to have a default local cell. All other cells areconsidered foreign during that login session, even if the user authenticates in additional cells or uses the cd command to changedirectories into their file trees. This distinction is mostly invisible and irrelavant to users. For most purposes, users will see nodifference between local and foreign cells.

It is possible to maintain more than one cell at a single geographical location. For instance, separate departments on a universitycampus or in a corporation can choose to administer their own cells. It is also possible to have machines at geographically distantsites belong to the same cell; only limits on the speed of network communication determine how practical this is.

Despite their independence, AFS cells generally agree to make their local filespace visible to other AFS cells, so that users indifferent cells can share files if they choose. If your cell is to participate in the "global" AFS namespace, it must comply with afew basic conventions governing how the local filespace is configured and how the addresses of certain file server machines areadvertised to the outside world.

1.2.5 The Uniform Namespace and Transparent Access

One of the features that makes AFS easy to use is that it provides transparent access to the files in a cell’s filespace. Users do nothave to know which file server machine stores a file in order to access it; they simply provide the file’s pathname, which AFSautomatically translates into a machine location.

In addition to transparent access, AFS also creates a uniform namespace--a file’s pathname is identical regardless of which clientmachine the user is working on. The cell’s file tree looks the same when viewed from any client because the cell’s file servermachines store all the files centrally and present them in an identical manner to all clients.

To enable the transparent access and the uniform namespace features, the system administrator must follow a few simple con-ventions in configuring client machines and file trees. For details, see Making Other Cells Visible in Your Cell.

1.2.6 Volumes

A volume is a conceptual container for a set of related files that keeps them all together on one file server machine partition.Volumes can vary in size, but are (by definition) smaller than a partition. Volumes are the main administrative unit in AFS, andhave several characteristics that make administrative tasks easier and help improve overall system performance.

• The relatively small size of volumes makes them easy to move from one partition to another, or even between machines.

• You can maintain maximum system efficiency by moving volumes to keep the load balanced evenly among the differentmachines. If a partition becomes full, the small size of individual volumes makes it easy to find enough room on othermachines for them.

• Each volume corresponds logically to a directory in the file tree and keeps together, on a single partition, all the data that makesup the files in the directory (including possible subdirectories). By maintaining (for example) a separate volume for each user’shome directory, you keep all of the user’s files together, but separate from those of other users. This is an administrativeconvenience that is impossible if the partition is the smallest unit of storage.

• The directory/volume correspondence also makes transparent file access possible, because it simplifies the process of filelocation. All files in a directory reside together in one volume and in order to find a file, a file server process need only knowthe name of the file’s parent directory, information which is included in the file’s pathname. AFS knows how to translate thedirectory name into a volume name, and automatically tracks every volume’s location, even when a volume is moved frommachine to machine. For more about the directory/volume correspondence, see Mount Points.


• Volumes increase file availability through replication and backup.

• Replication (placing copies of a volume on more than one file server machine) makes the contents more reliably available; fordetails, see Replication. Entire sets of volumes can be backed up as dump files (possibly to tape) and restored to the file system;see Configuring the AFS Backup System and Backing Up and Restoring AFS Data. In AFS, backup also refers to recordingthe state of a volume at a certain time and then storing it (either on tape or elsewhere in the file system) for recovery in theevent files in it are accidentally deleted or changed. See Creating Backup Volumes.

• Volumes are the unit of resource management. A space quota associated with each volume sets a limit on the maximum volumesize. See Setting and Displaying Volume Quota and Current Size.

1.2.7 Mount Points

The previous section discussed how each volume corresponds logically to a directory in the file system: the volume keepstogether on one partition all the data in the files residing in the directory. The directory that corresponds to a volume is called itsroot directory, and the mechanism that associates the directory and volume is called a mount point. A mount point is similar toa symbolic link in the file tree that specifies which volume contains the files kept in a directory. A mount point is not an actualsymbolic link; its internal structure is different.

NoteYou must not create, in AFS, a symbolic link to a file whose name begins with the number sign (#) or the percent sign (%),because the Cache Manager interprets such a link as a mount point to a regular or read/write volume, respectively.

The use of mount points means that many of the elements in an AFS file tree that look and function just like standard UNIXfile system directories are actually mount points. In form, a mount point is a symbolic link in a special format that names thevolume containing the data for files in the directory. When the Cache Manager (see The Cache Manager) encounters a mountpoint--for example, in the course of interpreting a pathname--it looks in the volume named in the mount point. In the volume theCache Manager finds an actual UNIX-style directory element--the volume’s root directory--that lists the files contained in thedirectory/volume. The next element in the pathname appears in that list.

A volume is said to be mounted at the point in the file tree where there is a mount point pointing to the volume. A volume’scontents are not visible or accessible unless it is mounted. Unlike some other file systems, AFS volumes can be mounted atmultiple locations in the file system at the same time.

1.2.8 Replication

Replication refers to making a copy, or clone, of a source read/write volume and then placing the copy on one or more additionalfile server machines in a cell. One benefit of replicating a volume is that it increases the availability of the contents. If one fileserver machine housing the volume fails, users can still access the volume on a different machine. No one machine need becomeoverburdened with requests for a popular file, either, because the file is available from several machines.

Replication is not necessarily appropriate for cells with limited disk space, nor are all types of volumes equally suitable forreplication (replication is most appropriate for volumes that contain popular files that do not change very often). For moredetails, see When to Replicate Volumes.

1.2.9 Caching and Callbacks

Just as replication increases system availability, caching increases the speed and efficiency of file access in AFS. Each AFS clientmachine dedicates a portion of its local disk or memory to a cache where it stores data temporarily. Whenever an applicationprogram (such as a text editor) running on a client machine requests data from an AFS file, the request passes through the CacheManager. The Cache Manager is a portion of the client machine’s kernel that translates file requests from local applicationprograms into cross-network requests to the File Server process running on the file server machine storing the file. When theCache Manager receives the requested data from the File Server, it stores it in the cache and then passes it on to the applicationprogram.

Caching improves the speed of data delivery to application programs in the following ways:


• When the application program repeatedly asks for data from the same file, it is already on the local disk. The application doesnot have to wait for the Cache Manager to request and receive the data from the File Server.

• Caching data eliminates the need for repeated request and transfer of the same data, so network traffic is reduced. Thus, initialrequests and other traffic can get through more quickly.

While caching provides many advantages, it also creates the problem of maintaining consistency among the many cached copiesof a file and the source version of a file. This problem is solved using a mechanism referred to as a callback.

A callback is a promise by a File Server to a Cache Manager to inform the latter when a change is made to any of the datadelivered by the File Server. Callbacks are used differently based on the type of file delivered by the File Server:

• When a File Server delivers a writable copy of a file (from a read/write volume) to the Cache Manager, the File Server sendsalong a callback with that file. If the source version of the file is changed by another user, the File Server breaks the callbackassociated with the cached version of that file--indicating to the Cache Manager that it needs to update the cached copy.

• When a File Server delivers a file from a read-only volume to the Cache Manager, the File Server sends along a callbackassociated with the entire volume (so it does not need to send any more callbacks when it delivers additional files from thevolume). Only a single callback is required per accessed read-only volume because files in a read-only volume can changeonly when a new version of the complete volume is released. All callbacks associated with the old version of the volume arebroken at release time.

The callback mechanism ensures that the Cache Manager always requests the most up-to-date version of a file. However, it doesnot ensure that the user necessarily notices the most current version as soon as the Cache Manager has it. That depends on howoften the application program requests additional data from the File System or how often it checks with the Cache Manager.

1.3 AFS Server Processes and the Cache Manager

As mentioned in Servers and Clients, AFS file server machines run a number of processes, each with a specialized function. Oneof the main responsibilities of a system administrator is to make sure that processes are running correctly as much of the time aspossible, using the administrative services that the server processes provide.

The following list briefly describes the function of each server process and the Cache Manager; the following sections thendiscuss the important features in more detail.

The File Server, the most fundamental of the servers, delivers data files from the file server machine to local workstations asrequested, and stores the files again when the user saves any changes to the files.

The Basic OverSeer Server (BOS Server) ensures that the other server processes on its server machine are running correctly asmuch of the time as possible, since a server is useful only if it is available. The BOS Server relieves system administrators ofmuch of the responsibility for overseeing system operations.

The Protection Server helps users control who has access to their files and directories. It is responsible for mapping Kerberosprincipals to AFS identities. Users can also grant access to several other users at once by putting them all in a group entry in theProtection Database maintained by the Protection Server.

The Volume Server performs all types of volume manipulation. It helps the administrator move volumes from one server machineto another to balance the workload among the various machines.

The Volume Location Server (VL Server) maintains the Volume Location Database (VLDB), in which it records the location ofvolumes as they move from file server machine to file server machine. This service is the key to transparent file access for users.

The Salvager is not a server in the sense that others are. It runs only after the File Server or Volume Server fails; it repairs anyinconsistencies caused by the failure. The system administrator can invoke it directly if necessary.

The Update Server distributes new versions of AFS server process software and configuration information to all file servermachines. It is crucial to stable system performance that all server machines run the same software.

The Backup Server maintains the Backup Database, in which it stores information related to the Backup System. It enables theadministrator to back up data from volumes to tape. The data can then be restored from tape in the event that it is lost from thefile system. The Backup Server is optional and is only one of several ways that the data in an AFS cell can be backed up.


The Cache Manager is the one component in this list that resides on AFS client rather than file server machines. It not aprocess per se, but rather a part of the kernel on AFS client machines that communicates with AFS server processes. Its mainresponsibilities are to retrieve files for application programs running on the client and to maintain the files in the cache.

AFS also relies on two other services that are not part of AFS and need to be instaled separately:

AFS requires a Kerberos KDC to use for user authentication. It verifies user identities at login and provides the facilities throughwhich participants in transactions prove their identities to one another (mutually authenticate). AFS uses Kerberos for all of itsauthentication. The Kerberos KDC replaces the old Authentication Server included in OpenAFS. The Authentication Server isstill available for sites that need it, but is now deprecated and should not be used for any new installations.

The Network Time Protocol Daemon (NTPD) is not an AFS server process, but plays a vital role nonetheless. It synchronizes theinternal clock on a file server machine with those on other machines. Synchronized clocks are particularly important for correctfunctioning of the AFS distributed database technology (known as Ubik); see Configuring the Cell for Proper Ubik Operation.The NTPD is usually provided with the operating system.

1.3.1 The File Server

The File Server is the most fundamental of the AFS server processes and runs on each file server machine. It provides the sameservices across the network that the UNIX file system provides on the local disk:

• Delivering programs and data files to client workstations as requested and storing them again when the client workstationfinishes with them.

• Maintaining the hierarchical directory structure that users create to organize their files.

• Handling requests for copying, moving, creating, and deleting files and directories.

• Keeping track of status information about each file and directory (including its size and latest modification time).

• Making sure that users are authorized to perform the actions they request on particular files or directories.

• Creating symbolic and hard links between files.

• Granting advisory locks (corresponding to UNIX locks) on request.

1.3.2 The Basic OverSeer Server

The Basic OverSeer Server (BOS Server) reduces the demands on system administrators by constantly monitoring the processesrunning on its file server machine. It can restart failed processes automatically and provides a convenient interface for adminis-trative tasks.

The BOS Server runs on every file server machine. Its primary function is to minimize system outages. It also

• Constantly monitors the other server processes (on the local machine) to make sure they are running correctly.

• Automatically restarts failed processes, without contacting a human operator. When restarting multiple server processes simul-taneously, the BOS server takes interdependencies into account and initiates restarts in the correct order.

• Accepts requests from the system administrator. Common reasons to contact BOS are to verify the status of server processeson file server machines, install and start new processes, stop processes either temporarily or permanently, and restart deadprocesses manually.

• Helps system administrators to manage system configuration information. The BOS Server provides a simple interface formodifying two files that contain information about privileged users and certain special file server machines. It also automatesthe process of adding and changing server encryption keys, which are important in mutual authentication, if the AuthenticationServer is still in use, but this function of the BOS Server is deprecated. For more details about these configuration files, seeCommon Configuration Files in the /usr/afs/etc Directory.


1.3.3 The Protection Server

The Protection Server is the key to AFS’s refinement of the normal UNIX methods for protecting files and directories fromunauthorized use. The refinements include the following:

• Defining associations between Kerberos principals and AFS identities. Normally, this is a simple mapping between principalnames in the Kerberos realm associated with an AFS cell to AFS identities in that cell, but the Protection Server also managesmappings for users using cross-realm authentication from a different Kerberos realm.

Defining seven access permissions rather than the standard UNIX file system’s three. In conjunction with the UNIX mode bitsassociated with each file and directory element, AFS associates an access control list (ACL) with each directory. The ACLspecifies which users have which of the seven specific permissions for the directory and all the files it contains. For a definitionof AFS’s seven access permissions and how users can set them on access control lists, see Managing Access Control Lists.

• Enabling users to grant permissions to numerous individual users--a different combination to each individual if desired. UNIXprotection distinguishes only between three user or groups: the owner of the file, members of a single specified group, andeveryone who can access the local file system.

• Enabling users to define their own groups of users, recorded in the Protection Database maintained by the Protection Server.The groups then appear on directories’ access control lists as though they were individuals, which enables the granting ofpermissions to many users simultaneously.

• Enabling system administrators to create groups containing client machine IP addresses to permit access when it originatesfrom the specified client machines. These types of groups are useful when it is necessary to adhere to machine-based licensingrestrictions or where it is difficult for some reason to obtain Kerberos credentials for processes running on those systems thatneed access to AFS.

The Protection Server’s main duty is to help the File Server determine if a user is authorized to access a file in the requestedmanner. The Protection Server creates a list of all the groups to which the user belongs. The File Server then compares this listto the ACL associated with the file’s parent directory. A user thus acquires access both as an individual and as a member of anygroups.

The Protection Server also maps Kerberos principals to AFS user ID numbers (AFS UIDs). These UIDs are functionally equiv-alent to UNIX UIDs, but operate in the domain of AFS rather than in the UNIX file system on a machine’s local disk. Thisconversion service is essential because the tickets that the Kerberos KDC gives to authenticated users are stamped with principalnames (to comply with Kerberos standards). The AFS server processes identify users by AFS UID, not by username. Beforethey can understand whom the token represents, they need the Protection Server to translate the username into an AFS UID. Forfurther discussion of the authentication process, see A More Detailed Look at Mutual Authentication.

1.3.4 The Volume Server

The Volume Server provides the interface through which you create, delete, move, and replicate volumes, as well as prepare themfor archiving to disk, tape, or other media (backing up). Volumes explained the advantages gained by storing files in volumes.Creating and deleting volumes are necessary when adding and removing users from the system; volume moves are done for loadbalancing; and replication enables volume placement on multiple file server machines (for more on replication, see Replication).

1.3.5 The Volume Location (VL) Server

The VL Server maintains a complete list of volume locations in the Volume Location Database (VLDB). When the Cache Manager(see The Cache Manager) begins to fill a file request from an application program, it first contacts the VL Server in order to learnwhich file server machine currently houses the volume containing the file. The Cache Manager then requests the file from theFile Server process running on that file server machine.

The VLDB and VL Server make it possible for AFS to take advantage of the increased system availability gained by usingmultiple file server machines, because the Cache Manager knows where to find a particular file. Indeed, in a certain sense the VLServer is the keystone of the entire file system--when the information in the VLDB is inaccessible, the Cache Manager cannotretrieve files, even if the File Server processes are working properly. A list of the information stored in the VLDB about eachvolume is provided in Volume Information in the VLDB.


1.3.6 The Salvager

The Salvager differs from other AFS Servers in that it runs only at selected times. The BOS Server invokes the Salvager whenthe File Server, Volume Server, or both fail. The Salvager attempts to repair disk corruption that can result from a failure.

As a system administrator, you can also invoke the Salvager as necessary, even if the File Server or Volume Server has not failed.See Salvaging Volumes.

1.3.7 The Update Server

The Update Server is an optional process that helps guarantee that all file server machines are running the same version of a serverprocess. System performance can be inconsistent if some machines are running one version of the File Server (for example) andother machines were running another version.

To ensure that all machines run the same version of a process, install new software on a single file server machine of eachsystem type, called the binary distribution machine for that type. The binary distribution machine runs the server portion of theUpdate Server, whereas all the other machines of that type run the client portion of the Update Server. The client portions checkfrequently with the server portion to see if they are running the right version of every process; if not, the client portion retrievesthe right version from the binary distribution machine and installs it locally. The system administrator does not need to rememberto install new software individually on all the file server machines: the Update Server does it automatically. For more on binarydistribution machines, see Binary Distribution Machines.

The Update Server also distributes configuration files that all file server machines need to store on their local disks (for a descrip-tion of the contents and purpose of these files, see Common Configuration Files in the /usr/afs/etc Directory). As with serverprocess software, the need for consistent system performance demands that all the machines have the same version of these files.The system administrator needs to make changes to these files on one machine only, the cell’s system control machine, whichruns a server portion of the Update Server. All other machines in the cell run a client portion that accesses the correct versions ofthese configuration files from the system control machine. For more information, see The System Control Machine.

1.3.8 The Backup Server

The Backup Server is an optional process that maintains the information in the Backup Database. The Backup Server and theBackup Database enable administrators to back up data from AFS volumes to tape and restore it from tape to the file system ifnecessary. The server and database together are referred to as the Backup System. This Backup System is only one way to backup AFS, and many AFS cells use different methods.

Administrators who wish to use the Backup System initially configure it by defining sets of volumes to be dumped togetherand the schedule by which the sets are to be dumped. They also install the system’s tape drives and define the drives’ TapeCoordinators, which are the processes that control the tape drives.

Once the Backup System is configured, user and system data can be dumped from volumes to tape or disk. In the event thatdata is ever lost from the system (for example, if a system or disk failure causes data to be lost), administrators can restore thedata from tape. If tapes are periodically archived, or saved, data can also be restored to its state at a specific time. Additionally,because Backup System data is difficult to reproduce, the Backup Database itself can be backed up to tape and restored if it everbecomes corrupted. For more information on configuring and using the Backup System, and on other AFS backup options, seeConfiguring the AFS Backup System and Backing Up and Restoring AFS Data.

1.3.9 The Cache Manager

As already mentioned in Caching and Callbacks, the Cache Manager is the one component in this section that resides on clientmachines rather than on file server machines. It is a combination of a daemon process and a set of extensions or modificationsin the client machine’s kernel, usually implemented as a loadable kernel module, that enable communication with the server pro-cesses running on server machines. Its main duty is to translate file requests (made by application programs on client machines)into remote procedure calls (RPCs) to the File Server. (The Cache Manager first contacts the VL Server to find out which FileServer currently houses the volume that contains a requested file, as mentioned in The Volume Location (VL) Server). When theCache Manager receives the requested file, it caches it before passing data on to the application program.


The Cache Manager also tracks the state of files in its cache compared to the version at the File Server by storing the callbackssent by the File Server. When the File Server breaks a callback, indicating that a file or volume changed, the Cache Managerrequests a copy of the new version before providing more data to application programs.

1.3.10 The Kerberos KDC

The Kerberos KDC (Key Distribution Center) performs two main functions related to network security:

• Verifying the identity of users as they log into the system by requiring that they provide a password or some other form ofauthentication credentials. The Kerberos KDC grants the user a ticket, which is converted into a token to prove to AFS serverprocesses that the user has authenticated. For more on tokens, see Complex Mutual Authentication.

• Providing the means through which server and client processes prove their identities to each other (mutually authenticate).This helps to create a secure environment in which to send cross-network messages.

The Kerberos KDC is a required service, but does not come with OpenAFS. One Kerberos KDC may provide authenticationservices for multiple AFS cells. Each AFS cell must be associated with a Kerberos realm with one or more Kerberos KDCssupporting version 4 or 5 of the Kerberos protocol. Kerberos version 4 is not secure and is supported only for backwardscompatibility; Kerberos 5 should be used for any new installation.

A Kerberos KDC maintains a database in which it stores encryption keys for users and for services, including the AFS serverencryption key. For users, these encryption keys are normally formed by converting a user password to a key, but Kerberos KDCsalso support other authentication mechanisms. To learn more about the procedures AFS uses to verify user identity and duringmutual authentication, see A More Detailed Look at Mutual Authentication.

Kerberos KDC software is included with some operating systems or may be acquired separately. MIT Kerberos, Heimdal, andMicrosoft Active Directory are known to work with OpenAFS as a Kerberos Server.This technology was originally developedby the Massachusetts Institute of Technology’s Project Athena.

NoteThe Authentication Server, or kaserver, was a Kerberos version 4 KDC. It is obsolete and should no longer be used. A third-party Kerberos version 5 KDC should be used instead. The Authentication Server is still provided with OpenAFS, but onlyfor backward compatibility and legacy support for sites that have not yet migrated to a Kerberos version 5 KDC. the KerberosServer. All references to the Kerberos KDC in this guide refer to a Kerberos 5 server.

1.3.11 The Network Time Protocol Daemon

The Network Time Protocol Daemon (NTPD) is not an AFS server process, but plays an important role. It helps guarantee thatall of the file server machines and client machines agree on the time. The NTPD on all file server machines learns the correcttime from a parent NTPD source, which may be located inside or outside the cell.

Keeping clocks synchronized is particularly important to the correct operation of AFS’s distributed database technology, whichcoordinates the copies of the Backup, Protection, and Volume Location Databases; see Replicating the OpenAFS AdministrativeDatabases. Client machines may also refer to these clocks for the correct time; therefore, it is less confusing if all file servermachines have the same time. For more technical detail about the NTPD, see The NTP web site or the documentation for youroperating system.

Clock Skew ImpactClient machines that are authenticating to an OpenAFS cell with valid credentials may still fail when the clocks of theclient machine, Kerberos KDC, and the File Server machines are not in sync.

http://www.ntp.org/


Chapter 2

Issues in Cell Configuration and Administration

This chapter discusses many of the issues to consider when configuring and administering a cell, and directs you to detailedrelated information available elsewhere in this guide. It is assumed you are already familiar with the material in An Overview ofOpenAFS Administration.

It is best to read this chapter before installing your cell’s first file server machine or performing any other administrative task.

2.1 Differences between AFS and UNIX: A Summary

AFS behaves like a standard UNIX file system in most respects, while also making file sharing easy within and between cells.This section describes some differences between AFS and the UNIX file system, referring you to more detailed information asappropriate.

2.1.1 Differences in File and Directory Protection

AFS augments the standard UNIX file protection mechanism in two ways: it associates an access control list (ACL) with eachdirectory, and it enables users to define a large number of their own groups, which can be placed on ACLs.

AFS uses ACLs to protect files and directories, rather than relying exclusively on the mode bits. This has several implications,which are discussed further in the indicated sections:

• AFS ACLs use seven access permissions rather than the three UNIX mode bits. See The AFS ACL Permissions.

• For directories, AFS ignores the UNIX mode bits. For files, AFS uses only the first set of mode bits (the owner bits), and theirmeaning interacts with permissions on the directory’s ACL. See How AFS Interprets the UNIX Mode Bits.

• A directory’s ACL protects all of the files in a directory in the same manner. To apply a more restrictive set of AFS permissionsto certain file, place it in directory with a different ACL. If a directory must contain files with different permissions, usesymbolic links to point to files stored in directories with different ACLs.

• Moving a file to a different directory changes its protection. See Differences Between UFS and AFS Data Protection.

• An ACL can include about 20 entries granting different combinations of permissions to different users or groups, rather thanonly the three UNIX entities represented by the three sets of mode bits. See Differences Between UFS and AFS Data Protection.

• You can designate an AFS file as write-only as in the UNIX file system, by setting only the w (write) mode bit. You cannotdesignate an AFS directory as write-only, because AFS ignores the mode bits on a directory. See How AFS Interprets theUNIX Mode Bits.

AFS enables users to create groups and add other users to those groups. Placing these groups on ACLs extends the samepermissions to a number of exactly specified users at the same time, which is much more convenient than placing the individualson the ACLs directly. See Administering the Protection Database.

There are also system-defined groups, system:anyuser and system:authuser, whose presence on an ACL extends access to awide range of users at once. See The System Groups and Using Groups on ACLs.


2.1.2 Differences in Authentication

Just as the AFS filespace is distinct from each machine’s local file system, AFS authentication is separate from local login.This has two practical implications, which will already be familiar to users and system administrators who use Kerberos forauthentication.

• To access AFS files, users must log into the local machine as normal, obtain Kerberos tickets, and then obtain AFS tokens.This process can often be automated through the system authentication configuration so that the user logs into the system asnormal and obtains Kerberos tickets and AFS tokens transparently. If you cannot or chose not to configure the system this way,your users must login and authenticate in separate steps, as detailed in the OpenAFS User Guide.

• Passwords may be stored in two separate places: the Kerberos KDC and, optionally, each machine’s local user database(/etc/passwd or equivalent) for the local system. A user’s passwords in the two places can differ if desired.

2.1.3 Differences in the Semantics of Standard UNIX Commands

This section summarizes how AFS modifies the functionality of some UNIX commands.

The chmod command Only members of the system:administrators group can use this command to turn on the setuid, setgidor sticky mode bits on AFS files. For more information, see Determining if a Client Can Run Setuid Programs.

The chown command Only members of the system:administrators group can issue this command on AFS files.

The chgrp command Only members of the system:administrators can issue this command on AFS files and directories.

The groups and id commands If the user’s AFS tokens are associated with a process authentication group (PAG), the output ofthese commands may include one or two large numbers. These are artificial groups used by the OpenAFS Cache Managerto track the PAG on some platforms. Other platforms may use other methods, such as native kernel support for a PAG or asimilar concept, in which case the large GIDs may not appear. To learn about PAGs, see Identifying AFS Tokens by PAG.

The ln command This command cannot create hard links between files in different AFS directories. See Creating Hard Links.

The sshd daemon and ssh command In order for a user to have access to files stored in AFS, that user needs to have Kerberostickets and an AFS token on the system from which they’re accessing AFS. This has an implication for users who log inremotely via protocols such as Secure Shell (SSH): that log-in process must create local Kerberos tickets and an AFS tokenon the system, or the user will have to separately authenticate to Kerberos and AFS after logging in.

The OpenSSH project provides an SSH client and server that uses the GSS-API protocol to pass Kerberos tickets betweenmachines. With a suitable SSH client, this allows users to delegate their Kerberos tickets to the remote machine, and thatmachine to store those tickets and obtain AFS tokens as part of the log-in process.

2.1.4 The AFS version of the fsck Command and inode-based fileservers

The fileserver uses either of two formats for storing data on disk. The inode-based format uses a combination of regularfiles and extra fields stored in the inode data structures that are normally reserved for use by the operating system. Thenamei format uses normal file storage and does not use special structures. The choice of storage formats is chosen atcompile time and the two formats are incompatible. The inode format is only available on certain platforms. The storageformat must be consistent for the fileserver binaries and all vice partitions on a given file server machine.

ImportantThis section on fsck advice only applies to the inode-based fileserver binaries. On servers using namei-based binaries,the vendor-supplied fsck can be used as normal.

http://www.openssh.org/


If you are using AFS fileserver binaries compiled with the inode-based format, never run the standard UNIX fsck command onan AFS file server machine. It does not understand how the File Server organizes volume data on disk, and so moves all AFSdata into the lost+found directory on the partition.

Instead, use the version of the fsck program that is included in the AFS distribution. The OpenAFS Quick Start Guide explainshow to replace the vendor-supplied fsck program with the AFS version as you install each server machine.

The AFS version functions like the standard fsck program on data stored on both UFS and AFS partitions. The appearance of abanner like the following as the fsck program initializes confirms that you are running the correct one:

--- AFS (R) version fsck---

where version is the AFS version. For correct results, it must match the AFS version of the server binaries in use on the machine.

If you ever accidentally run the standard version of the program, contact your AFS support provider, contact the OpenAFSmailing lists, or refer to the OpenAFS support web page for support options. It is sometimes possible to recover volume datafrom the lost+found directory. If the data is not recoverabled, then restoring from backup is recommended.

WarningRunning the fsck binary supplied by the operating system vendor on an fileserver using inode-based file storage willresult in data corruption!

2.1.5 Creating Hard Links

AFS does not allow hard links (created with the UNIX ln command) between files that reside in different directories, because inthat case it is unclear which of the directory’s ACLs to associate with the link.

AFS also does not allow hard links to directories, in order to keep the file system organized as a tree.

It is possible to create symbolic links (with the UNIX ln -s command) between elements in two different AFS directories, or evenbetween an element in AFS and one in a machine’s local UNIX file system. Do not create a symbolic link in AFS to a file whosename begins with either a number sign (#) or a percent sign (%), however. The Cache Manager interprets such links as a mountpoint to a regular or read/write volume, respectively.

2.1.6 AFS Implements Save on Close

When an application issues the UNIX close system call on a file, the Cache Manager performs a synchronous write of the datato the File Server that maintains the central copy of the file. It does not return control to the application until the File Serverhas acknowledged receipt of the data. For the fsync system call, control does not return to the application until the File Serverindicates that it has written the data to non-volatile storage on the file server machine.

When an application issues the UNIX write system call, the Cache Manager writes modifications to the local AFS client cacheonly. If the local machine crashes or an application program exits without issuing the close system call, it is possible that themodifications are not recorded in the central copy of the file maintained by the File Server. The Cache Manager does sometimeswrite this type of modified data from the cache to the File Server without receiving the close or fsync system call, such as whenit needs to free cache chunks for new data. However, it is not generally possible to predict when the Cache Manager transfersmodified data to the File Server in this way.

The implication is that if an application’s Save option invokes the write system call rather than close or fsync, the changes arenot necessarily stored permanently on the File Server machine. Most application programs issue the close system call for saveoperations, as well as when they finish handling a file and when they exit.

2.1.7 Setuid Programs

The UNIX setuid bit is ignored by default for programs run from AFS, but can be enabled by the system administrator on a clientmachine. The fs setcell command determines whether setuid programs that originate in a particular cell can run on a given client

http://www.openafs.org/support.html


machine. Running setuid binaries from AFS poses a security risk due to weaknesses in the integrity checks of the AFS protocoland should normally not be permitted. See Determining if a Client Can Run Setuid Programs.

Set the UNIX setuid bit only for files whose owner is UID 0 (the local superuser root). This does not present an automatic securityrisk: the local superuser has no special privilege in AFS, but only in the local machine’s UNIX file system and kernel. Settingthe UNIX setuid bit for files owned with a different UID will have unpredictable resuilts, since that UID will be interpreted aspossibly different users on each AFS client machine.

Any file can be marked with the setuid bit, but only members of the system:administrators group can issue the chown systemcall or the chown command, or issue the chmod system call or the chmod command to set the setuid bit.

2.2 Choosing a Cell Name

This section explains how to choose a cell name and explains why choosing an appropriate cell name is important.

Your cell name must distinguish your cell from all others in the AFS global namespace. By convention, the cell name is thesecond element in any AFS pathname; therefore, a unique cell name guarantees that every AFS pathname uniquely identifiesa file, even if cells use the same directory names at lower levels in their local AFS filespace. For example, both the ExampleCorporation cell and the Example Organization cell can have a home directory for the user pat, because the pathnames aredistinct: /afs/example.com/usr/pat and /afs/example.org/usr/pat.

By convention, cell names follow the Domain Name System (DNS) conventions for domain names. If you are already an Internetsite, then it is simplest and strongly recommended to choose your Internet domain name as the cell name.

If you are not an Internet site, it is best to choose a unique DNS-style name, particularly if you plan to connect to the Internet inthe future. There are a few constraints on AFS cell names:

• It can contain as many as 64 characters, but shorter names are better because the cell name frequently is part of machine andfile names. If your cell name is long, you can reduce pathname length either by creating a symbolic link to the complete cellname, at the second level in your file tree or by using the CellAlias configuration file on a client machine. See The Second(Cellname) Level.

• To guarantee it is suitable for different operating system types, the cell name can contain only lowercase characters, numbers,underscores, dashes, and periods. Do not include command shell metacharacters.

• It can include any number of fields, which are conventionally separated by periods (see the examples below).

2.2.1 How to Set the Cell Name

The cell name is recorded in two files on the local disk of each file server and client machine. Among other functions, thesefiles define the machine’s cell membership and so affect how programs and processes run on the machine; see Why Choosing theAppropriate Cell Name is Important. The procedure for setting the cell name is different for the two types of machines.

For file server machines, the two files that record the cell name are the /usr/afs/etc/ThisCell and /usr/afs/etc/CellServDB files.As described more explicitly in the OpenAFS Quick Start Guide, you set the cell name in both by issuing the bos setcellnamecommand on the first file server machine you install in your cell. It is not usually necessary to issue the command again. If youuse the Update Server, it distributes its copy of the ThisCell and CellServDB files to additional server machines that you install.If you do not use the Update Server, the OpenAFS Quick Start Guide explains how to copy the files manually.

For client machines, the two files that record the cell name are the /usr/vice/etc/ThisCell and /usr/vice/etc/CellServDB files.You create these files on a per-client basis, either with a text editor or by copying them onto the machine from a central source inAFS. See Maintaining Knowledge of Database Server Machines for details.

Change the cell name in these files only when you want to transfer the machine to a different cell (client machines can only haveone default cell at a time and server machines can only belong to one cell at a time). If the machine is a file server, follow thecomplete set of instructions in the OpenAFS Quick Start Guide for configuring a new cell. If the machine is a client, all you needto do is change the files appropriately and reboot the machine. The next section explains further the negative consequences ofchanging the name of an existing cell.

To set the default cell name used by most AFS commands without changing the local /usr/vice/etc/ThisCell file, set the AFS-CELL environment variable in the command shell. It is worth setting this variable if you need to complete significant adminis-trative work in a foreign cell.


NoteThe fs checkservers and fs mkmount commands do not use the AFSCELL variable. The fs checkservers command alwaysdefaults to the cell named in the ThisCell file, unless the -cell argument is used. The fs mkmount command defaults to thecell in which the parent directory of the new mount point resides.

2.2.2 Why Choosing the Appropriate Cell Name is Important

Take care to select a cell name that is suitable for long-term use. Changing a cell name later is complicated. An appropriatecell name is important because it is the second element in the pathname of all files in a cell’s file tree. Because each cellname is unique, its presence in an AFS pathname makes the pathname unique in the AFS global namespace, even if multiplecells use similar filespace organization at lower levels. For instance, it means that every cell can have a home directory called/afs/cellname/usr/pat without causing a conflict. The presence of the cell name in pathnames also means that users in everycell use the same pathname to access a file, whether the file resides in their local cell or in a foreign cell.

Another reason to choose the correct cell name early in the process of installing your cell is that the cell membership defined ineach machine’s ThisCell file affects the performance of many programs and processes running on the machine. For instance,AFS commands (fs, pts, and vos commands, for example) by default execute in the cell of the machine on which they are issued.The command interpreters check the ThisCell file on the local disk and then contact the database server machines listed in theCellServDB file or configured in DNS for the indicated cell. (The bos commands work differently because the issuer always hasto name of the machine on which to run the command.)

The ThisCell file also normally determines the cell for which a user receives an AFS token when he or she logs in to a machine.

If you change the cell name, you must change the ThisCell and CellServDB files on every server and client machine. Failure tochange them all will cause many commands from the AFS suites to not work as expected.

2.3 Participating in the AFS Global Namespace

Participating in the AFS global namespace makes your cell’s local file tree visible to AFS users in foreign cells and makes othercells’ file trees visible to your local users. It makes file sharing across cells just as easy as sharing within a cell. This sectionoutlines the procedures necessary for participating in the global namespace.

• Participation in the global namespace is not mandatory. Some cells use AFS primarily to facilitate file sharing within the cell,and are not interested in providing their users with access to foreign cells.

• Making your file tree visible does not mean making it vulnerable. You control how foreign users access your cell using thesame protection mechanisms that control local users’ access. See Granting and Denying Foreign Users Access to Your Cell.

• The two aspects of participation are independent. A cell can make its file tree visible without allowing its users to see foreigncells’ file trees, or can enable its users to see other file trees without advertising its own.

• You make your cell visible to others by advertising your database server machines and allowing users at other sites to accessyour database server and file server machines. See Making Your Cell Visible to Others.

• You control access to foreign cells on a per-client machine basis. In other words, it is possible to make a foreign cell accessiblefrom one client machine in your cell but not another. See Making Other Cells Visible in Your Cell.

2.3.1 What the Global Namespace Looks Like

The AFS global namespace appears the same to all AFS cells that participate in it, because they all agree to follow a small set ofconventions in constructing pathnames.

The first convention is that all AFS pathnames begin with the string /afs to indicate that they belong to the AFS global namespace.

The second convention is that the cell name is the second element in an AFS pathname; it indicates where the file resides (thatis, the cell in which a file server machine houses the file). As noted, the presence of a cell name in pathnames makes the global


namespace possible, because it guarantees that all AFS pathnames are unique even if cells use the same directory names at lowerlevels in their AFS filespace.

What appears at the third and lower levels in an AFS pathname depends on how a cell has chosen to arrange its filespace. Thereare some suggested conventional directories at the third level; see The Third Level.

2.3.2 Making Your Cell Visible to Others

You make your cell visible to others by advertising your cell name and database server machines. Just like client machines inthe local cell, the Cache Manager on machines in foreign cells use the information to reach your cell’s Volume Location (VL)Servers when they need volume and file location information. For authenticated access, foreign clients must be configured withthe necessary Kerberos version 5 domain-to-realm mappings and Key Distribution Center (KDC) location information for boththe local and remote Kerberos version 5 realms.

There are two places you can make this information available:

• In the global CellServDB file maintained by the AFS Registrar. This file lists the name and database server machines of everycell that has agreed to make this information available to other cells. This file is available at http://grand.central.org/csdb.html

To add or change your cell’s listing in this file, follow the instructions at http://grand.central.org/csdb.html. It is a good policyto check the file for changes on a regular schedule. An updated copy of this file is included with new releases of OpenAFS.

• A file called CellServDB.local in the /afs/cellname/service/etc directory of your cell’s filespace. List only your cell’sdatabase server machines.

Update the files whenever you change the identity of your cell’s database server machines. Also update the copies of theCellServDB files on all of your server machines (in the /usr/afs/etc directory) and client machines (in the /usr/vice/etc directory).For instructions, see Maintaining the Server CellServDB File and Maintaining Knowledge of Database Server Machines.

Once you have advertised your database server machines, it can be difficult to make your cell invisible again. You can remove theCellServDB.local file and ask the AFS Registrar to remove your entry from the global CellServDB file, but other cells probablyhave an entry for your cell in their local CellServDB files already. To make those entries invalid, you must change the names orIP addresses of your database server machines.

Your cell does not have to be invisible to be inaccessible, however. To make your cell completely inaccessible to foreign users,remove the system:anyuser group from all ACLs at the top three levels of your filespace; see Granting and Denying ForeignUsers Access to Your Cell.

2.3.3 Making Other Cells Visible in Your Cell

To make a foreign cell’s filespace visible on a client machine in your cell that is not configured for Freelance Mode or DynamicRoot mode, perform the following three steps:

1. Mount the cell’s root.cell volume at the second level in your cell’s filespace just below the /afs directory. Use the fsmkmount command with the -cell argument as instructed in To create a cellular mount point.

2. Mount AFS at the /afs directory on the client machine. The afsd program, which initializes the Cache Manager, performsthe mount automatically at the directory named in the first field of the local /usr/vice/etc/cacheinfo file or by the com-mand’s -mountdir argument. Mounting AFS at an alternate location makes it impossible to reach the filespace of any cellthat mounts its root.afs and root.cell volumes at the conventional locations. See Displaying and Setting the Cache Sizeand Location.

3. Create an entry for the cell in the list of database server machines which the Cache Manager maintains in kernel memory.

The /usr/vice/etc/CellServDB file on every client machine’s local disk lists the database server machines for the local andforeign cells. The afsd program reads the contents of the CellServDB file into kernel memory as it initializes the CacheManager. You can also use the fs newcell command to add or alter entries in kernel memory directly between reboots ofthe machine. See Maintaining Knowledge of Database Server Machines.

http://grand.central.org/csdb.html

http://grand.central.org/csdb.html


Non-windows client machines may enable Dynamic Root Mode by using the -dynroot option to afsd. When this option isenabled, all cells listed in the CellServDB file will appear in the /afs directory. The contents of the root.afs volume will beignored.

Windows client machines may enable Freelance Mode during client installation or by setting the FreelanceClient setting underService Parameters in the Windows Registry as mentioned in the Release Notes. When this option is enabled, the root.afsvolume is ignored and a mounpoint for each cell is automatically created in the the \\AFS directory when the folder \\AFS\cellname is accessed and the foreign Volume Location servers can be reached.

Note that making a foreign cell visible to client machines does not guarantee that your users can access its filespace. The ACLsin the foreign cell must also grant them the necessary permissions.

2.3.4 Granting and Denying Foreign Users Access to Your Cell

Making your cell visible in the AFS global namespace does not take away your control over the way in which users from foreigncells access your file tree.

By default, foreign users access your cell as the user anonymous, which means they have only the permissions granted tothe system:anyuser group on each directory’s ACL. Normally these permissions are limited to the l (lookup) and r (read)permissions.

There are three ways to grant wider access to foreign users:

• Grant additional permissions to the system:anyuser group on certain ACLs. Keep in mind, however, that all users can thenaccess that directory in the indicated way (not just specific foreign users you have in mind).

• Enable automatic registration for users in the foreign cell. This may be done by creating a cross-realm trust in the KerberosDatabase. Then add a PTS group named system:[email protected] and give it a group quota greater than the numberof foreign users expected to be registered. After the cross-realm trust and the PTS group are created, the aklog command willautomatically register foreign users as needed. Consult the documentation for your Kerberos Server for instructions on howto establish a cross-realm trust.

• Create a local authentication account for specific foreign users, by creating entries in the Protection Database, the KerberosDatabase, and the local password file.

2.4 Configuring Your AFS Filespace

This section summarizes the issues to consider when configuring your AFS filespace. For a discussion of creating volumes thatcorrespond most efficiently to the filespace’s directory structure, see Creating Volumes to Simplify Administration.

NoteFor Windows users: Windows uses a backslash (\) rather than a forward slash (/) to separate the elements in a pathname.The hierarchical organization of the filespace is however the same as on a UNIX machine.

AFS pathnames must follow a few conventions so the AFS global namespace looks the same from any AFS client machine.There are corresponding conventions to follow in building your file tree, not just because pathnames reflect the structure of a filetree, but also because the AFS Cache Manager expects a certain configuration.

2.4.1 The Top /afs Level

The first convention is that the top level in your file tree be called the /afs directory. If you name it something else, then you mustuse the -mountdir argument with the afsd program to get Cache Managers to mount AFS properly. You cannot participate in theAFS global namespace in that case.

http://docs.openafs.org/ReleaseNotesWindows/

http://docs.openafs.org/Reference/1/aklog.html


2.4.2 The Second (Cellname) Level

The second convention is that just below the /afs directory you place directories corresponding to each cell whose file tree isvisible and accessible from the local cell. Minimally, there must be a directory for the local cell. Each such directory is a mountpoint to the indicated cell’s root.cell volume. For example, in the Example Corporation cell, /afs/example.com is a mount pointfor the cell’s own root.cell volume and example.org is a mount point for the Example Organization cell’s root.cell volume. Thefs lsmount command displays the mount points.

% fs lsmount /afs/example.com’/afs/example.com’ is a mount point for volume ’#root.cell’% fs lsmount /afs/example.org’/afs/example.org’ is a mount point for volume ’#example.org:root.cell’

To reduce the amount of typing necessary in pathnames, you can create a symbolic link with an abbreviated name to the mountpoint of each cell your users frequently access (particularly the home cell). In the Example Corporation cell, for instance,/afs/example is a symbolic link to the /afs/example.com mount point, as the fs lsmount command reveals.

% fs lsmount /afs/example’/afs/example’ is a symbolic link, leading to a mount point for volume

’#root.cell’

2.4.3 The Third Level

You can organize the third level of your cell’s file tree any way you wish. The following list describes directories that appear atthis level in the conventional configuration:

common This directory contains programs and files needed by users working on machines of all system types, such as texteditors, online documentation files, and so on. Its /etc subdirectory is a logical place to keep the central update sources forfiles used on all of your cell’s client machines, such as the ThisCell and CellServDB files.

public A directory accessible to anyone who can access your filespace, because its ACL grants the l (lookup) and r (read)permissions to the system:anyuser group. It is useful if you want to enable your users to make selected informationavailable to everyone, but do not want to grant foreign users access to the contents of the usr directory which houses userhome directories (and is also at this level). It is conventional to create a subdirectory for each of your cell’s users.

service This directory contains files and subdirectories that help cells coordinate resource sharing. For a list of the proposedstandard files and subdirectories to create, call or write to AFS Product Support.As an example, files that other cells expect to find in this directory’s etc subdirectory can include the following:

• CellServDB.export, a list of database server machines for many cells• CellServDB.local, a list of the cell’s own database server machines• passwd, a copy of the local password file (/etc/passwd or equivalent) kept on the local disk of the cell’s client machines• group, a copy of the local groups file (/etc/group or equivalent) kept on the local disk of the cell’s client machines

sys_type A separate directory for storing the server and client binaries for each system type you use in the cell. Configurationis simplest if you use the system type names assigned in the AFS distribution, particularly if you wish to use the @sysvariable in pathnames (see Using the @sys Variable in Pathnames). The OpenAFS Release Notes lists the conventionalname for each supported system type.Within each such directory, create directories named bin, etc, usr, and so on, to store the programs normally kept in the/bin, /etc and /usr directories on a local disk. Then create symbolic links from the local directories on client machines intoAFS; see Configuring the Local Disk. Even if you do not choose to use symbolic links in this way, it can be convenient tohave central copies of system binaries in AFS. If binaries are accidentally removed from a machine, you can recopy themonto the local disk from AFS rather than having to recover them from tape

usr This directory contains home directories for your local users. As discussed in the previous entry for the public directory, itis often practical to protect this directory so that only locally authenticated users can access it. This keeps the contents ofyour user’s home directories as secure as possible.If your cell is quite large, directory lookup can be slowed if you put all home directories in a single usr directory. Forsuggestions on distributing user home directories among multiple grouping directories, see Grouping Home Directories.


2.5 Creating Volumes to Simplify Administration

This section discusses how to create volumes in ways that make administering your system easier.

At the top levels of your file tree (at least through the third level), each directory generally corresponds to a separate volume.Some cells also configure the subdirectories of some third level directories as separate volumes. Common examples are the/afs/cellname/common and /afs/cellname/usr directories.

You do not have to create a separate volume for every directory level in a tree, but the advantage is that each volume tends to besmaller and easier to move for load balancing. The overhead for a mount point is no greater than for a standard directory, nordoes the volume structure itself require much disk space. Most cells find that below the fourth level in the tree, using a separatevolume for each directory is no longer efficient. For instance, while each user’s home directory (at the fourth level in the tree)corresponds to a separate volume, all of the subdirectories in the home directory normally reside in the same volume.

Keep in mind that only one volume can be mounted at a given directory location in the tree. In contrast, a volume can be mountedat several locations, though this is not recommended because it distorts the hierarchical nature of the file tree, potentially causingconfusion.

2.5.1 Assigning Volume Names

You can name your volumes anything you choose, subject to a few restrictions:

• Read/write volume names can be up to 22 characters in length. The maximum length for volume names is 31 characters, andthere must be room to add the .readonly extension on read-only volumes.

• Do not add the .readonly and .backup extensions to volume names yourself, even if they are appropriate. The Volume Serveradds them automatically as it creates a read-only or backup version of a volume.

• There must be volumes named root.afs and root.cell, mounted respectively at the top (/afs) level in the filespace and just belowthat level, at the cell’s name (for example, at /afs/example.com in the Example Corporation cell).

Deviating from these names only creates confusion and extra work. Changing the name of the root.afs volume, for instance,means that you must use the -rootvol argument to the afsd program on every client machine, to name the alternate volume.

Similarly, changing the root.cell volume name prevents users in foreign cells from accessing your filespace, if the mount pointfor your cell in their filespace refers to the conventional root.cell name. Of course, this is one way to make your cell invisibleto other cells.

It is best to assign volume names that indicate the type of data they contain, and to use similar names for volumes with similarcontents. It is also helpful if the volume name is similar to (or at least has elements in common with) the name of the directoryat which it is mounted. Understanding the pattern then enables you accurately to guess what a volume contains and where it ismounted.

Many cells find that the most effective volume naming scheme puts a common prefix on the names of all related volumes. Table1 describes the recommended prefixing scheme.

Prefix Contents Example Name Example Mount Pointcommon. popular programs and files common.etc /afs/cellname/common/etcsrc. source code src.afs /afs/cellname/src/afsproj. project data proj.portafs /afs/cellname/proj/portafstest. testing or other temporary data test.smith /afs/cellname/usr/smith/testuser. user home directory data user.terry /afs/cellname/usr/terry

sys_type. programs compiled for anoperating system type rs_aix42.bin /afs/cellname/rs_aix42/bin

Table 2.1: Suggested volume prefixes

Table 2 is a more specific example for a cell’s rs_aix42 system volumes and directories:

There are several advantages to this scheme:


Example Name Example Mount Pointrs_aix42.bin /afs/cellname/rs_aix42/bin, /afs/cellname/rs_aix42/binrs_aix42.etc /afs/cellname/rs_aix42/etcrs_aix42.usr /afs/cellname/rs_aix42/usrrs_aix42.usr.afsws /afs/cellname/rs_aix42/usr/afswsrs_aix42.usr.lib /afs/cellname/rs_aix42/usr/librs_aix42.usr.bin /afs/cellname/rs_aix42/usr/binrs_aix42.usr.etc /afs/cellname/rs_aix42/usr/etcrs_aix42.usr.inc /afs/cellname/rs_aix42/usr/incrs_aix42.usr.man /afs/cellname/rs_aix42/usr/manrs_aix42.usr.sys /afs/cellname/rs_aix42/usr/sysrs_aix42.usr.local /afs/cellname/rs_aix42/usr/local

Table 2.2: Example volume-prefixing scheme

• The volume name is similar to the mount point name in the filespace. In all of the entries in Table 2, for example, the onlydifference between the volume and mount point name is that the former uses periods as separators and the latter uses slashes.Another advantage is that the volume name indicates the contents, or at least suggests the directory on which to issue the lscommand to learn the contents.

• It makes it easy to manipulate groups of related volumes at one time. In particular, the vos backupsys command’s -prefixargument enables you to create a backup version of every volume whose name starts with the same string of characters.Making a backup version of each volume is one of the first steps in backing up a volume with the AFS Backup System, anddoing it for many volumes with one command saves you a good deal of typing. For instructions for creating backup volumes,see Creating Backup Volumes, For information on the AFS Backup System, see Configuring the AFS Backup System andBacking Up and Restoring AFS Data.

• It makes it easy to group related volumes together on a partition. Grouping related volumes together has several advantages ofits own, discussed in Grouping Related Volumes on a Partition.

2.5.2 Grouping Related Volumes on a Partition

If your cell is large enough to make it practical, consider grouping related volumes together on a partition. In general, you needat least three file server machines for volume grouping to be effective. Grouping has several advantages, which are most obviouswhen the file server machine becomes inaccessible:

• If you keep a hardcopy record of the volumes on a partition, you know which volumes are unavailable. You can keep such arecord without grouping related volumes, but a list composed of unrelated volumes is much harder to maintain. Note that therecord must be on paper, because the outage can prevent you from accessing an online copy or from issuing the vos listvolcommand, which gives you the same information.

• The effect of an outage is more localized. For example, if all of the binaries for a given system type are on one partition, thenonly users of that system type are affected. If a partition houses binary volumes from several system types, then an outage canaffect more people, particularly if the binaries that remain available are interdependent with those that are not available.

The advantages of grouping related volumes on a partition do not necessarily extend to the grouping of all related volumes onone file server machine. For instance, it is probably unwise in a cell with two file server machines to put all system volumes onone machine and all user volumes on the other. An outage of either machine probably affects everyone.

Admittedly, the need to move volumes for load balancing purposes can limit the practicality of grouping related volumes. Youneed to weigh the complementary advantages case by case.

2.5.3 When to Replicate Volumes

As discussed in Replication, replication refers to making a copy, or clone, of a read/write source volume and then placing thecopy on one or more additional file server machines. Replicating a volume can increase the availability of the contents. If one


file server machine housing the volume becomes inaccessible, users can still access the copy of the volume stored on a differentmachine. No one machine is likely to become overburdened with requests for a popular file, either, because the file is availablefrom several machines.

However, replication is not appropriate for all cells. If a cell does not have much disk space, replication can be unduly expensive,because each clone not on the same partition as the read/write source takes up as much disk space as its source volume did atthe time the clone was made. Also, if you have only one file server machine, replication uses up disk space without increasingavailability.

Replication is also not appropriate for volumes that change frequently. You must issue the vos release command every time youneed to update a read-only volume to reflect changes in its read/write source.

For both of these reasons, replication is appropriate only for popular volumes whose contents do not change very often, such assystem binaries and other volumes mounted at the upper levels of your filespace. User volumes usually exist only in a read/writeversion since they change so often.

If you are replicating any volumes, you must replicate the root.afs and root.cell volumes, preferably at two or three sites each(even if your cell only has two or three file server machines). The Cache Manager needs to pass through the directories corre-sponding to the root.afs and root.cell volumes as it interprets any pathname. The unavailability of these volumes makes all othervolumes unavailable too, even if the file server machines storing the other volumes are still functioning.

Another reason to replicate the root.afs volume is that it can lessen the load on the File Server machine. The Cache Manager hasa bias to access a read-only version of the root.afs volume if it is replicate, which puts the Cache Manager onto the read-only paththrough the AFS filespace. While on the read-only path, the Cache Manager attempts to access a read-only copy of replicatedvolumes. The File Server needs to track only one callback per Cache Manager for all of the data in a read-only volume, ratherthan the one callback per file it must track for read/write volumes. Fewer callbacks translate into a smaller load on the File Server.

If the root.afs volume is not replicated, the Cache Manager follows a read/write path through the filespace, accessing the read-/write version of each volume. The File Server distributes and tracks a separate callback for each file in a read/write volume,imposing a greater load on it.

For more on read/write and read-only paths, see The Rules of Mount Point Traversal.

It also makes sense to replicate system binary volumes in many cases, as well as the volume corresponding to the /afs/cellname/usr directory and the volumes corresponding to the /afs/cellname/common directory and its subdirectories.

It is a good idea to place a replica on the same partition as the read/write source. In this case, the read-only volume is a clone(like a backup volume): it is a copy of the source volume’s vnode index, rather than a full copy of the volume contents. Only ifthe read/write volume moves to another partition or changes substantially does the read-only volume consume significant diskspace. Read-only volumes kept on other servers’ partitions always consume the full amount of disk space that the read/writesource consumed when the read-only volume was created.

You cannot have a replica volume on a different partition of the same server hosting the read/write volume. "Cheap" read-onlyvolumes must be on the same partition as the read/write; all other read-only volumes must be on different servers.

2.5.4 The Default Quota and ACL on a New Volume

Every AFS volume has associated with it a quota that limits the amount of disk space the volume is allowed to use. To set andchange quota, use the commands described in Setting and Displaying Volume Quota and Current Size.

By default, every new volume is assigned a space quota of 5000 KB blocks unless you include the -maxquota argument tothe vos create command. Also by default, the ACL on the root directory of every new volume grants all permissions to themembers of the system:administrators group. To learn how to change these values when creating an account with individualcommands, see To create one user account with individual commands. When using uss commands to create accounts, you canspecify alternate ACL and quota values in the template file’s V instruction; see Creating a Volume with the V Instruction.

2.6 Configuring Server Machines

This section discusses some issues to consider when configuring server machines, which store AFS data, transfer it to clientmachines on request, and house the AFS administrative databases. To learn about client machines, see Configuring ClientMachines.


If your cell has more than one AFS server machine, you can configure them to perform specialized functions. A machine canassume one or more of the roles described in the following list. For more details, see The Four Roles for File Server Machines.

• A simple file server machine runs only the processes that store and deliver AFS files to client machines. You can run as manysimple file server machines as you need to satisfy your cell’s performance and disk space requirements.

• A database server machine runs the four database server processes that maintain AFS’s replicated administrative databases:the Authentication, Backup, Protection, and Volume Location (VL) Server processes.

• A binary distribution machine distributes the AFS server binaries for its system type to all other server machines of that systemtype.

• The single system control machine distributes common server configuration files to all other server machines in the cell.

The OpenAFS Quick Beginnings explains how to configure your cell’s first file server machine to assume all four roles. TheOpenAFS Quick Beginnings chapter on installing additional server machines also explains how to configure them to perform oneor more roles.

2.6.1 Replicating the OpenAFS Administrative Databases

The AFS administrative databases are housed on database server machines and store information that is crucial for correct cellfunctioning. Both server processes and Cache Managers access the information frequently:

• Every time a Cache Manager fetches a file from a directory that it has not previously accessed, it must look up the file’s locationin the Volume Location Database (VLDB).

• Every time a user obtains an AFS token from the Authentication Server, the server looks up the user’s password in the Authen-tication Database.

• The first time that a user accesses a volume housed on a specific file server machine, the File Server contacts the ProtectionServer for a list of the user’s group memberships as recorded in the Protection Database.

• Every time you back up a volume using the AFS Backup System, the Backup Server creates records for it in the BackupDatabase.

Maintaining your cell is simplest if the first machine has the lowest IP address of any machine you plan to use as a databaseserver machine. If you later decide to use a machine with a lower IP address as a database server machine, you must update theCellServDB file on all clients before introducing the new machine.

If your cell has more than one server machine, it is best to run more than one as a database server machine (but more thanthree are rarely necessary). Replicating the administrative databases in this way yields the same benefits as replicating volumes:increased availability and reliability. If one database server machine or process stops functioning, the information in the databaseis still available from others. The load of requests for database information is spread across multiple machines, preventing anyone from becoming overloaded.

Unlike replicated volumes, however, replicated databases do change frequently. Consistent system performance demands thatall copies of the database always be identical, so it is not acceptable to record changes in only some of them. To synchronizethe copies of a database, the database server processes use AFS’s distributed database technology, Ubik. See Replicating theOpenAFS Administrative Databases.

If your cell has only one file server machine, it must also serve as a database server machine. If you cell has two file servermachines, it is not always advantageous to run both as database server machines. If a server, process, or network failure interruptscommunications between the database server processes on the two machines, it can become impossible to update the informationin the database because neither of them can alone elect itself as the synchronization site.


2.6.2 AFS Files on the Local Disk

It is generally simplest to store the binaries for all AFS server processes in the /usr/afs/bin directory on every file server machine,even if some processes do not actively run on the machine. This makes it easier to reconfigure a machine to fill a new role.

For security reasons, the /usr/afs directory on a file server machine and all of its subdirectories and files must be owned by thelocal superuser root and have only the first w (write) mode bit turned on. Some files even have only the first r (read) mode bitturned on (for example, the /usr/afs/etc/KeyFile file, which lists the AFS server encryption keys). Each time the BOS Serverstarts, it checks that the mode bits on certain files and directories match the expected values. For a list, see the OpenAFS QuickBeginnings section about protecting sensitive AFS directories, or the discussion of the output from the bos status command inTo display the status of server processes and their BosConfig entries.

For a description of the contents of all AFS directories on a file server machine’s local disk, see Administering Server Machines.

2.6.3 Configuring Partitions to Store AFS Data

The partitions that house AFS volumes on a file server machine must be mounted at directories named

/vicepindex

where index is one or two lowercase letters. By convention, the first AFS partition created is mounted at the /vicepa directory,the second at the /vicepb directory, and so on through the /vicepz directory. The names then continue with /vicepaa through/vicepaz, /vicepba through /vicepbz, and so on, up to the maximum supported number of server partitions, which is specified inthe OpenAFS Release Notes.

Each /vicepx directory must correspond to an entire partition or logical volume, and must be a subdirectory of the root directory(/). It is not acceptable to configure part of (for example) the /usr partition as an AFS server partition and mount it on a directorycalled /usr/vicepa.

Also, do not store non-AFS files on AFS server partitions. The File Server and Volume Server expect to have available all ofthe space on the partition. Sharing space also creates competition between AFS and the local UNIX file system for access to thepartition, particularly if the UNIX files are frequently used.

2.6.4 Monitoring, Rebooting and Automatic Process Restarts

AFS provides several tools for monitoring the File Server, including the scout and afsmonitor programs. You can configurethem to alert you when certain threshold values are exceeded, for example when a server partition is more than 95% full. SeeMonitoring and Auditing AFS Performance.

Rebooting a file server machine requires shutting down the AFS processes and so inevitably causes a service outage. Reboot fileserver machines as infrequently as possible. For instructions, see Rebooting a Server Machine.

The BOS Server checks each morning at 5:00 a.m. for any newly installed binary files in the /usr/afs/bin directory. It comparesthe timestamp on each binary file to the time at which the corresponding process last restarted. If the timestamp on the binary islater, the BOS Server restarts the corresponding process to start using it.

The BOS server also supports performing a weekly restart of all AFS server processes, including itself. This functionality isdisabled on new installs, but historically it was set to 4:00am on Sunday. Administrators may find that installations predatingOpenAFS 1.6.0 have weekly restarts enabled.

The default times are in the early morning hours when the outage that results from restarting a process is likely to disturb thefewest number of people. You can display the restart times for each machine with the bos getrestart command, and set themwith the bos setrestart command. The latter command enables you to disable automatic restarts entirely, by setting the time tonever. See Setting the BOS Server’s Restart Times.

2.7 Configuring Client Machines

This section summarizes issues to consider as you install and configure client machines in your cell.


2.7.1 Configuring the Local Disk

You can often free up significant amounts of local disk space on AFS client machines by storing standard UNIX files in AFS andcreating symbolic links to them from the local disk. The @sys pathname variable can be useful in links to system-specific files;see Using the @sys Variable in Pathnames.

There are two types of files that must actually reside on the local disk: boot sequence files needed before the afsd program isinvoked, and files that can be helpful during file server machine outages.

During a reboot, AFS is inaccessible until the afsd program executes and initializes the Cache Manager. (In the conventionalconfiguration, the AFS initialization file is included in the machine’s initialization sequence and invokes the afsd program.) Filesneeded during reboot prior to that point must reside on the local disk. They include the following, but this list is not necessarilyexhaustive.

• Standard UNIX utilities including the following or their equivalents:

– Machine initialization files (stored in the /etc or /sbin directory on many system types)

– The fstab file

– The mount command binary

– The umount command binary

• All subdirectories and files in the /usr/vice directory, including the following:

– The /usr/vice/cache directory

– The /usr/vice/etc/afsd command binary

– The /usr/vice/etc/cacheinfo file

– The /usr/vice/etc/CellServDB file

– The /usr/vice/etc/ThisCell file

For more information on these files, see Configuration and Cache-Related Files on the Local Disk.

The other type of files and programs to retain on the local disk are those you need when diagnosing and fixing problems causedby a file server outage, because the outage can make inaccessible the copies stored in AFS. Examples include the binaries fora text editor (such as ed or vi) and for the fs and bos commands. Store copies of AFS command binaries in the /usr/vice/etcdirectory as well as including them in the /usr/afsws directory, which is normally a link into AFS. Then place the /usr/afswsdirectory before the /usr/vice/etc directory in users’ PATH environment variable definition. When AFS is functioning normally,users access the copy in the /usr/afsws directory, which is more likely to be current than a local copy.

2.7.2 Enabling Access to Foreign Cells

As detailed in Making Other Cells Visible in Your Cell, you enable the Cache Manager to access a cell’s AFS filespace by storinga list of the cell’s database server machines in the local /usr/vice/etc/CellServDB file. The Cache Manager reads the list intokernel memory at reboot for faster retrieval. You can change the list in kernel memory between reboots by using the fs newcellcommand.

Because each client machine maintains its own copy of the CellServDB file, you can in theory enable access to different foreigncells on different client machines. This is not usually practical, however, especially if users do not always work on the samemachine.

2.7.3 Using the @sys Variable in Pathnames

When creating symbolic links into AFS on the local disk, it is often practical to use the @sys variable in pathnames. TheCache Manager automatically substitutes the local machine’s AFS system name (CPU/operating system type) for the @sysvariable. This means you can place the same links on machines of various system types and still have each machine access thebinaries for its system type. For example, the Cache Manager on a machine running AIX 4.2 converts /afs/example.com/@systo /afs/example.com/rs_aix42, whereas a machine running Solaris 10 converts it to /afs/example.com/sun4x_510.


If you want to use the @sys variable, it is simplest to use the conventional AFS system type names as specified in the OpenAFSRelease Notes. The Cache Manager records the local machine’s system type name in kernel memory during initialization. Ifyou do not use the conventional names, you must use the fs sysname command to change the value in kernel memory from itsdefault just after Cache Manager initialization, on every client machine of the relevant system type. The fs sysname commandalso displays the current value; see Displaying and Setting the System Type Name.

In pathnames in the AFS filespace itself, use the @sys variable carefully and sparingly, because it can lead to unexpected results.It is generally best to restrict its use to only one level in the filespace. The third level is a common choice, because that is wheremany cells store the binaries for different machine types.

Multiple instances of the @sys variable in a pathname are especially dangerous to people who must explicitly change directories(with the cd command, for example) into directories that store binaries for system types other than the machine on which theyare working, such as administrators or developers who maintain those directories. After changing directories, it is recommendedthat such people verify they are in the desired directory.

2.7.4 Setting Server Preferences

The Cache Manager stores a table of preferences for file server machines in kernel memory. A preference rank pairs a file servermachine interface’s IP address with an integer in the range from 1 to 65,534. When it needs to access a file, the Cache Managercompares the ranks for the interfaces of all machines that house the file, and first attempts to access the file via the interface withthe best rank. As it initializes, the Cache Manager sets default ranks that bias it to access files via interfaces that are close to it interms of network topology. You can adjust the preference ranks to improve performance if you wish.

The Cache Manager also uses similar preferences for Volume Location (VL) Server machines. Use the fs getserverprefs com-mand to display preference ranks and the fs setserverprefs command to set them. See Maintaining Server Preference Ranks.

2.8 Configuring AFS User Accounts

This section discusses some of the issues to consider when configuring AFS user accounts. Because AFS is separate from theUNIX file system, a user’s AFS account is separate from her UNIX account.

The preferred method for creating a user account is with the uss suite of commands. With a single command, you can create allthe components of one or many accounts, after you have prepared a template file that guides the account creation. See Creatingand Deleting User Accounts with the uss Command Suite.

Alternatively, you can issue the individual commands that create each component of an account. For instructions, along withinstructions for removing user accounts and changing user passwords, user volume quotas and usernames, see AdministeringUser Accounts.

When users leave your system, it is often good policy to remove their accounts. Instructions appear in Deleting IndividualAccounts with the uss delete Command and Removing a User Account.

An AFS user account consists of the following components, which are described in greater detail in The Components of an AFSUser Account.

• A Protection Database entry

• An Authentication Database entry

• A volume

• A home directory at which the volume is mounted

• Ownership of the home directory and full permissions on its ACL

• An entry in the local password file (/etc/passwd or equivalent) of each machine the user needs to log into

• Optionally, standard files and subdirectories that make the account more useful


By creating some components but not others, you can create accounts at different levels of functionality, using either uss com-mands as described in Creating and Deleting User Accounts with the uss Command Suite or individual commands as describedin Administering User Accounts. The levels of functionality include the following

• An authentication-only account enables the user to obtain AFS tokens and so to access protected AFS data and to issueprivileged commands. It consists only of entries in the Authentication and Protection Database. This type of account issuitable for administrative accounts and for users from foreign cells who need to access protected data. Local users generallyalso need a volume and home directory.

• A basic user account includes a volume for the user, in addition to Authentication and Protection Database entries. The volumeis mounted in the AFS filespace as the user’s home directory, and provides a repository for the user’s personal files.

• A full account adds configuration files for basic functions such as logging in, printing, and mail delivery to a basic account,making it more convenient and useful. For a discussion of some useful types of configuration files, see Creating Standard Filesin New AFS Accounts.

If your users have UNIX user accounts that predate the introduction of AFS in the cell, you possibly want to convert them intoAFS accounts. There are three main issues to consider:

• Making UNIX and AFS UIDs match

• Setting the password field in the local password file appropriately

• Moving files from the UNIX file system into AFS

For further discussion, see Converting Existing UNIX Accounts with uss or Converting Existing UNIX Accounts.

2.8.1 Choosing Usernames and Naming Other Account Components

This section suggests schemes for choosing usernames, AFS UIDs, user volume names and mount point names, and also outlinessome restrictions on your choices.

Usernames AFS imposes very few restrictions on the form of usernames. It is best to keep usernames short, both because manyutilities and applications can handle usernames of no more than eight characters and because by convention many componentsof and AFS account incorporate the name. These include the entries in the Protection and Authentication Databases, the volume,and the mount point. Depending on your electronic mail delivery system, the username can become part of the user’s mailingaddress. The username is also the string that the user types when logging in to a client machine.

Some common choices for usernames are last names, first names, initials, or a combination, with numbers sometimes added. Itis also best to avoid using the following characters, many of which have special meanings to the command shell.

• The comma (,)

• The colon (:), because AFS reserves it as a field separator in protection group names; see The Two Types of User-DefinedGroups

• The semicolon (;)

• The "at-sign" (@); this character is reserved for Internet mailing addresses

• Spaces

• The newline character

• The period (.); it is conventional to use this character only in the special username that an administrator adopts while performingprivileged tasks, such as pat.admin


AFS UIDs and UNIX UIDs AFS associates a unique identification number, the AFS UID, with every username, recording themapping in the user’s Protection Database entry. The AFS UID functions within AFS much as the UNIX UID does in the localfile system: the AFS server processes and the Cache Manager use it internally to identify a user, rather than the username.

Every AFS user also must have a UNIX UID recorded in the local password file (/etc/passwd or equivalent) of each clientmachine they log onto. Both administration and a user’s AFS access are simplest if the AFS UID and UNIX UID match. Oneimportant consequence of matching UIDs is that the owner reported by the ls -l command matches the AFS username.

It is usually best to allow the Protection Server to allocate the AFS UID as it creates the Protection Database entry. However,both the pts createuser command and the uss commands that create user accounts enable you to assign AFS UIDs explicitly.This is appropriate in two cases:

• You wish to group together the AFS UIDs of related users

• You are converting an existing UNIX account into an AFS account and want to make the AFS UID match the existing UNIXUID

After the Protection Server initializes for the first time on a cell’s first file server machine, it starts assigning AFS UIDs at adefault value. To change the default before creating any user accounts, or at any time, use the pts setmax command to reset themax user id counter. To display the counter, use the pts listmax command. See Displaying and Setting the AFS UIDand GID Counters.

AFS reserves one AFS UID, 32766, for the user anonymous. The AFS server processes assign this identity and AFS UID to anyuser who does not possess a token for the local cell. Do not assign this AFS UID to any other user or hardcode its current valueinto any programs or a file’s owner field, because it is subject to change in future releases.

User Volume Names Like any volume name, a user volume’s base (read/write) name cannot exceed 22 characters in length orinclude the .readonly or .backup extension. See Creating Volumes to Simplify Administration. By convention, user volumenames have the format user.username. Using the user. prefix not only makes it easy to identify the volume’s contents, but alsoto create a backup version of all user volumes by issuing a single vos backupsys command.

Mount Point Names By convention, the mount point for a user’s volume is named after the username. Many cells follow theconvention of mounting user volumes in the /afs/cellname/usr directory, as discussed in The Third Level. Very large cellssometimes find that mounting all user volumes in the same directory slows directory lookup, however; for suggested alternatives,see the following section.

2.8.2 Grouping Home Directories

Mounting user volumes in the /afs/cellname/usr directory is an AFS-appropriate variation on the standard UNIX practice ofputting user home directories under the /usr subdirectory. However, cells with more than a few hundred users sometimes findthat mounting all user volumes in a single directory results in slow directory lookup. The solution is to distribute user volumemount points into several directories; there are a number of alternative methods to accomplish this.

• Distribute user home directories into multiple directories that reflect organizational divisions, such as academic or corporatedepartments. For example, a company can create group directories called usr/marketing, usr/research, usr/finance. A goodfeature of this scheme is that knowing a user’s department is enough to find the user’s home directory. Also, it makes it easy toset the ACL to limit access to members of the department only. A potential drawback arises if departments are of sufficientlyunequal size that users in large departments experience slower lookup than users in small departments. This scheme is also notappropriate in cells where users frequently change between divisions.

• Distribute home directories into alphabetic subdirectories of the usr directory (the usr/a subdirectory, the usr/b subdirectory,and so on), based on the first letter of the username. If the cell is very large, create subdirectories under each letter thatcorrespond to the second letter in the user name. This scheme has the same advantages and disadvantages of a department-based scheme. Anyone who knows the user’s username can find the user’s home directory, but users with names that beginwith popular letters sometimes experience slower lookup.

• Distribute home directories randomly but evenly into more than one grouping directory. One cell that uses this scheme hasover twenty such directories called the usr1 directory, the usr2 directory, and so on. This scheme is especially appropriate incells where the other two schemes do not seem feasible. It eliminates the potential problem of differences in lookup speed,


because all directories are about the same size. Its disadvantage is that there is no way to guess which directory a givenuser’s volume is mounted in, but a solution is to create a symbolic link in the regular usr directory that references the actualmount point. For example, if user smith’s volume is mounted at the /afs/bigcell.example.com/usr17/smith directory, then the/afs/bigcell.example.com/usr/smith directory is a symbolic link to the ../usr17/smith directory. This way, if someone doesnot know which directory the user smith is in, he or she can access it through the link called usr/smith; people who do knowthe appropriate directory save lookup time by specifying it.

For instructions on how to implement the various schemes when using the uss program to create user accounts, see EvenlyDistributing User Home Directories with the G Instruction and Creating a Volume with the V Instruction.

2.8.3 Making a Backup Version of User Volumes Available

Mounting the backup version of a user’s volume is a simple way to enable users themselves to restore data they have accidentallyremoved or deleted. It is conventional to mount the backup version at a subdirectory of the user’s home directory (called perhapsthe OldFiles subdirectory), but other schemes are possible. Once per day you create a new backup version to capture the changesmade that day, overwriting the previous day’s backup version with the new one. Users can always retrieve the previous day’scopy of a file without your assistance, freeing you to deal with more pressing tasks.

Users sometimes want to delete the mount point to their backup volume, because they erroneously believe that the backupvolume’s contents count against their quota. Remind them that the backup volume is separate, so the only space it uses in theuser volume is the amount needed for the mount point.

For further discussion of backup volumes, see Backing Up AFS Data and Creating Backup Volumes.

2.8.4 Creating Standard Files in New AFS Accounts

From your experience as a UNIX administrator, you are probably familiar with the use of login and shell initialization files (suchas the .login and .cshrc files) to make an account easier to use.

It is often practical to add some AFS-specific directories to the definition of the user’s PATH environment variable, including thefollowing:

• The path to a bin subdirectory in the user’s home directory for binaries the user has created (that is, /afs/cellname/usr/username/bin)

• The /usr/afsws/bin path, which conventionally includes programs like fs, klog, kpasswd, pts, tokens, and unlog

• The /usr/afsws/etc path, if the user is an administrator; it usually houses the AFS command suites that require privilege (thebackup, butc, kas, uss, vos commands), and others.

If you are not using an AFS-modified login utility, it can be helpful to users to invoke the klog command in their .login file sothat they obtain AFS tokens as part of logging in. In the following example command sequence, the first line echoes the stringklog to the standard output stream, so that the user understands the purpose of the Password: prompt that appears whenthe second line is executed. The -setpag flag associates the new tokens with a process authentication group (PAG), which isdiscussed further in Identifying AFS Tokens by PAG.

echo -n "klog " klog -setpag

The following sequence of commands has a similar effect, except that the pagsh command forks a new shell with which the PAGand tokens are associated.

pagsh echo -n "klog " klog

If you use an AFS-modified login utility, this sequence is not necessary, because such utilities both log a user in locally andobtain AFS tokens.


2.9 Using AFS Protection Groups

AFS enables users to define their own groups of other users or machines. The groups are placed on ACLs to grant the same per-missions to many users without listing each user individually. For group creation instructions, see Administering the ProtectionDatabase.

Groups have AFS ID numbers, just as users do, but an AFS group ID (GID) is a negative integer whereas a user’s AFS UIDis a positive integer. By default, the Protection Server allocates a new group’s AFS GID automatically, but members of thesystem:administrators group can assign a GID when issuing the pts creategroup command. Before explicitly assigning a GID,it is best to verify that it is not already in use.

A group cannot belong to another group, but it can own another group or even itself as long as it (the owning group) has at leastone member. The current owner of a group can transfer ownership of the group to another user or group, even without the newowner’s permission. At that point the former owner loses administrative control over the group.

By default, each user can create 20 groups. A system administrator can increase or decrease this group creation quota with thepts setfields command.

Each Protection Database entry (group or user) is protected by a set of five privacy flagswhich limit who can administer the entryand what they can do. The default privacy flags are fairly restrictive, especially for user entries. See Setting the Privacy Flags onDatabase Entries.

2.9.1 The Three System Groups

As the Protection Server initializes for the first time on a cell’s first database server machine, it automatically creates three groupentries: the system:anyuser, system:authuser, and system:administrators groups.

The first two system groups are unlike any other groups in the Protection Database in that they do not have a stable membership:

• The system:anyuser group includes everyone who can access a cell’s AFS filespace: users who have tokens for the local cell,users who have logged in on a local AFS client machine but not obtained tokens (such as the local superuser root), and userswho have connected to a local machine from outside the cell. Placing the system:anyuser group on an ACL grants access tothe widest possible range of users. It is the only way to extend access to users from foreign AFS cells that do not have localaccounts.

• The system:authuser group includes everyone who has a valid token obtained from the cell’s AFS authentication service.

Because the groups do not have a stable membership, the pts membership command produces no output for them. Similarly,they do not appear in the list of groups to which a user belongs.

The system:administrators group does have a stable membership, consisting of the cell’s privileged administrators. Membersof this group can issue any pts command, and are the only ones who can issue several other restricted commands (such as thechown command on AFS files). By default, they also implicitly have the a (administer) and l (lookup) permissions on everyACL in the filespace. For information about changing this default, see Administering the system:administrators Group.

For a discussion of how to use system groups effectively on ACLs, see Using Groups on ACLs.

2.9.2 The Two Types of User-Defined Groups

All users can create regular groups. A regular group name has two fields separated by a colon, the first of which must indicatethe group’s ownership. The Protection Server refuses to create or change the name of a group if the result does not accuratelyindicate the ownership.

Members of the system:administrators group can create prefix-less groups whose names do not have the first field that indicatesownership. For suggestions on using the two types of groups effectively, see Using Groups Effectively.


2.10 Login and Authentication in AFS

As explained in Differences in Authentication, AFS authentication is separate from UNIX authentication because the two filesystems are separate. The separation has two practical implications:

• To access AFS files, users must both log into the local file system and authenticate with the AFS authentication service.(Logging into the local file system is necessary because the only way to access the AFS filespace is through a Cache Manager,which resides in the local machine’s kernel.)

• Passwords are stored in two separate places: in the Kerberos Database for AFS and in the each machine’s local password file(the /etc/passwd file or equivalent) for the local file system.

When a user successfully authenticates, the AFS authentication service passes a token to the user’s Cache Manager. The tokenis a small collection of data that certifies that the user has correctly provided the password associated with a particular AFSidentity. The Cache Manager presents the token to AFS server processes along with service requests, as proof that the user isgenuine. To learn about the mutual authentication procedure they use to establish identity, see A More Detailed Look at MutualAuthentication.

The Cache Manager stores tokens in the user’s credential structure in kernel memory. To distinguish one user’s credentialstructure from another’s, the Cache Manager identifies each one either by the user’s UNIX UID or by a process authenticationgroup (PAG), which is an identification number guaranteed to be unique in the cell. For further discussion, see Identifying AFSTokens by PAG.

A user can have only one token per cell in each separately identified credential structure. To obtain a second token for the samecell, the user must either log into a different machine or obtain another credential structure with a different identifier than anyexisting credential structure, which is most easily accomplished by issuing the pagsh command (see Identifying AFS Tokensby PAG). In a single credential structure, a user can have one token for each of many cells at the same time. As this implies,authentication status on one machine or PAG is independent of authentication status on another machine or PAG, which can bevery useful to a user or system administrator.

The AFS distribution includes library files that enable each system type’s login utility to authenticate users with AFS and logthem into the local file system in one step. If you do not configure an AFS-modified login utility on a client machine, its usersmust issue the klog command to authenticate with AFS after logging in.

NoteThe AFS-modified libraries do not necessarily support all features available in an operating system’s proprietary login utility. Insome cases, it is not possible to support a utility at all. For more information about the supported utilities in each AFS version,see the OpenAFS Release Notes.

2.10.1 Identifying AFS Tokens by PAG

As noted, the Cache Manager identifies user credential structures either by UNIX UID or by PAG. Using a PAG is preferablebecause it guaranteed to be unique: the Cache Manager allocates it based on a counter that increments with each use. In contrast,multiple users on a machine can share or assume the same UNIX UID, which creates potential security problems. The followingare two common such situations:

• The local superuser root can always assume any other user’s UNIX UID simply by issuing the su command, without providingthe user’s password. If the credential structure is associated with the user’s UNIX UID, then assuming the UID means inheritingthe AFS tokens.

• Two users working on different NFS client machines can have the same UNIX UID in their respective local file systems. Ifthey both access the same NFS/AFS Translator machine, and the Cache Manager there identifies them by their UNIX UID,they become indistinguishable. To eliminate this problem, the Cache Manager on a translator machine automatically generatesa PAG for each user and uses it, rather than the UNIX UID, to tell users apart.


Yet another advantage of PAGs over UIDs is that processes spawned by the user inherit the PAG and so share the token; thus theygain access to AFS as the authenticated user. In many environments, for example, printer and other daemons run under identities(such as the local superuser root) that the AFS server processes recognize only as the anonymous user. Unless PAGs are used,such daemons cannot access files for which the system:anyuser group does not have the necessary ACL permissions.

Once a user has a PAG, any new tokens the user obtains are associated with the PAG. The PAG expires two hours after anyassociated tokens expire or are discarded. If the user issues the klog command before the PAG expires, the new token is associatedwith the existing PAG (the PAG is said to be recycled in this case).

AFS-modified login utilities automatically generate a PAG, as described in the following section. If you use a standard loginutility, your users must issue the pagsh command before the klog command, or include the latter command’s -setpag flag. Forinstructions, see Using Two-Step Login and Authentication.

Users can also use either command at any time to create a new PAG. The difference between the two commands is that theklog command replaces the PAG associated with the current command shell and tokens. The pagsh command initializes a newcommand shell before creating a new PAG. If the user already had a PAG, any running processes or jobs continue to use thetokens associated with the old PAG whereas any new jobs or processes use the new PAG and its associated tokens. When youexit the new shell (by pressing <Ctrl-d>, for example), you return to the original PAG and shell. By default, the pagsh commandinitializes a Bourne shell, but you can include the -c argument to initialize a C shell (the /bin/csh program on many system types)or Korn shell (the /bin/ksh program) instead.

2.10.2 Using an AFS-modified login Utility

As previously mentioned, an AFS-modified login utility simultaneously obtains an AFS token and logs the user into the local filesystem. This section outlines the login and authentication process and its interaction with the value in the password field of thelocal password file.

An AFS-modified login utility performs a sequence of steps similar to the following; details can vary for different operatingsystems:

1. It checks the user’s entry in the local password file (the /etc/passwd file or equivalent).

2. If no entry exists, or if an asterisk (*) appears in the entry’s password field, the login attempt fails. If the entry exists, theattempt proceeds to the next step.

3. The utility obtains a PAG.

4. The utility converts the password provided by the user into an encryption key and encrypts a packet of data with the key.It sends the packet to the AFS authentication service (the AFS Authentication Server in the conventional configuration).

5. The authentication service decrypts the packet and, depending on the success of the decryption, judges the password to becorrect or incorrect. (For more details, see A More Detailed Look at Mutual Authentication.)

• If the authentication service judges the password incorrect, the user does not receive an AFS token. The PAG is retained,ready to be associated with any tokens obtained later. The attempt proceeds to Step 6.

• If the authentication service judges the password correct, it issues a token to the user as proof of AFS authentication. Thelogin utility logs the user into the local UNIX file system. Some login utilities echo the following banner to the screento alert the user to authentication with AFS. Step 6 is skipped.

AFS(R) version Login

6. If no AFS token was granted in Step 4, the login utility attempts to log the user into the local file system, by comparing thepassword provided to the local password file.

• If the password is incorrect or any value other than an encrypted 13-character string appears in the password field, thelogin attempt fails.

• If the password is correct, the user is logged into the local file system only.

As indicated, when you use an AFS-modified login utility, the password field in the local password file is no longer the primarygate for access to your system. If the user provides the correct AFS password, then the program never consults the local passwordfile. However, you can still use the password field to control access, in the following way:


• To prevent both local login and AFS authentication, place an asterisk (*) in the field. This is useful mainly in emergencies,when you want to prevent a certain user from logging into the machine.

• To prevent login to the local file system if the user does not provide the correct AFS password, place a character string of anylength other than the standard thirteen characters in the field. This is appropriate if you want to permit only people with localAFS accounts to login on your machines. A single X or other character is the most easily recognizable way to do this.

• To enable a user to log into the local file system even after providing an incorrect AFS password, record a standard UNIXencrypted password in the field by issuing the standard UNIX password-setting command (passwd or equivalent).

Systems that use a Pluggable Authentication Module (PAM) for login and AFS authentication do not necessarily consult thelocal password file at all, in which case they do not use the password field to control authentication and login attempts. Instead,instructions in the PAM configuration file (on many system types, /etc/pam.conf) fill the same function. See the instructions inthe OpenAFS Quick Beginnings for installing AFS-modified login utilities.

2.10.3 Using Two-Step Login and Authentication

In cells that do not use an AFS-modified login utility, users must issue separate commands to login and authenticate, as detailedin the OpenAFS User Guide:

1. They use the standard login program to login to the local file system, providing the password listed in the local passwordfile (the /etc/passwd file or equivalent).

2. They must issue the klog command to authenticate with the AFS authentication service, including its -setpag flag toassociate the new tokens with a process authentication group (PAG).

As mentioned in Creating Standard Files in New AFS Accounts, you can invoke the klog -setpag command in a user’s .loginfile (or equivalent) so that the user does not have to remember to issue the command after logging in. The user still must type apassword twice, once at the prompt generated by the login utility and once at the klog command’s prompt. This implies that thetwo passwords can differ, but it is less confusing if they do not.

Another effect of not using an AFS-modified login utility is that the AFS servers recognize the standard login program as theanonymous user. If the login program needs to access any AFS files (such as the .login file in a user’s home directory), then theACL that protects the file must include an entry granting the l (lookup) and r (read) permissions to the system:anyuser group.

When you do not use an AFS-modified login utility, an actual (scrambled) password must appear in the local password file foreach user. Use the /bin/passwd file to insert or change these passwords. It is simpler if the password in the local password filematches the AFS password, but it is not required.

2.10.4 Obtaining, Displaying, and Discarding Tokens

Once logged in, a user can obtain a token at any time with the klog command. If a valid token already exists, the new oneoverwrites it. If a PAG already exists, the new token is associated with it.

By default, the klog command authenticates the issuer using the identity currently logged in to the local file system. To authen-ticate as a different identity, use the -principal argument. To obtain a token for a foreign cell, use the -cell argument (it can becombined with the -principal argument). See the OpenAFS User Guide and the entry for the klog command in the OpenAFSAdministration Reference.

To discard either all tokens or the token for a particular cell, issue the unlog command. The command affects only the tokensassociated with the current command shell. See the OpenAFS User Guideand the entry for the unlog command in the OpenAFSAdministration Reference.

To display the tokens associated with the current command shell, issue the tokens command. The following examples illustrateits output in various situations.

If the issuer is not authenticated in any cell:


% tokensTokens held by the Cache Manager:

--End of list--

The following shows the output for a user with AFS UID 1000 in the Example Corporation cell:

% tokensTokens held by the Cache Manager:User’s (AFS ID 1000) tokens for [email protected] [Expires Jun 2 10:00]

--End of list--

The following shows the output for a user who is authenticated in Example Corporation cell, the Example Organization cell andthe Example Network cell. The user has different AFS UIDs in the three cells. Tokens for the last cell are expired:

% tokensTokens held by the Cache Manager:User’s (AFS ID 1000) tokens for [email protected] [Expires Jun 2 10:00]User’s (AFS ID 4286) tokens for [email protected] [Expires Jun 3 1:34]User’s (AFS ID 22) tokens for [email protected] [>>Expired<<]

--End of list--

The Kerberos version of the tokens command (the tokens.krb command), also reports information on the ticket-granting ticket,including the ticket’s owner, the ticket-granting service, and the expiration date, as in the following example. Also see Supportfor Kerberos Authentication.

% tokens.krbTokens held by the Cache Manager:User’s (AFS ID 1000) tokens for [email protected] [Expires Jun 2 10:00]User smith’s tokens for [email protected] [Expires Jun 2 10:00]--End of list--

2.10.5 Setting Default Token Lifetimes for Users

The maximum lifetime of a user token is the smallest of the ticket lifetimes recorded in the following three AuthenticationDatabase entries. The kas examine command reports the lifetime as Max ticket lifetime. Administrators who have theADMIN flag on their Authentication Database entry can use the -lifetime argument to the kas setfields command to set an entry’sticket lifetime.

• The afs entry, which corresponds to the AFS server processes. The default is 100 hours.

• The krbtgt.cellname entry, which corresponds to the ticket-granting ticket used internally in generating the token. The defaultis 720 hours (30 days).

• The entry for the user of the AFS-modified login utility or issuer of the klog command. The default is 25 hours for user entriescreated using the AFS 3.1 or later version of the Authentication Server, and 100 hours for user entries created using the AFS3.0 version of the Authentication Server. A user can use the kas examine command to display his or her own AuthenticationDatabase entry.

NoteAn AFS-modified login utility always grants a token with a lifetime calculated from the previously described three values. Whenissuing the klog command, a user can request a lifetime shorter than the default by using the -lifetime argument. For furtherinformation, see the OpenAFS User Guide and the klog reference page in the OpenAFS Administration Reference.


2.10.6 Changing Passwords

Regular AFS users can change their own passwords by using either the kpasswd or kas setpassword command. The commandsprompt for the current password and then twice for the new password, to screen out typing errors.

Administrators who have the ADMIN flag on their Authentication Database entries can change any user’s password, either byusing the kpasswd command (which requires knowing the current password) or the kas setpassword command.

If your cell does not use an AFS-modified login utility, remember also to change the local password, using the operating system’spassword-changing command. For more instructions on changing passwords, see Changing AFS Passwords.

2.10.7 Imposing Restrictions on Passwords and Authentication Attempts

You can help to make your cell more secure by imposing restrictions on user passwords and authentication attempts. To imposethe restrictions as you create an account, use the A instruction in the uss template file as described in Increasing AccountSecurity with the A Instruction. To set or change the values on an existing account, use the kas setfields command as describedin Improving Password and Authentication Security.

By default, AFS passwords never expire. Limiting password lifetime can help improve security by decreasing the time thepassword is subject to cracking attempts. You can choose an lifetime from 1 to 254 days after the password was last changed.It automatically applies to each new password as it is set. When the user changes passwords, you can also insist that the newpassword is not similar to any of the 20 passwords previously used.

Unscrupulous users can try to gain access to your AFS cell by guessing an authorized user’s password. To protect against thistype of attack, you can limit the number of times that a user can consecutively fail to provide the correct password. When thelimit is exceeded, the authentication service refuses further authentication attempts for a specified period of time (the lockouttime). To reenable authentication attempts before the lockout time expires, an administrator must issue the kas unlock command.

In addition to settings on user’s authentication accounts, you can improve security by automatically checking the quality ofnew user passwords. The kpasswd and kas setpassword commands pass the proposed password to a program or script calledkpwvalid, if it exists. The kpwvalid performs quality checks and returns a code to indicate whether the password is acceptable.You can create your own program or modified the sample program included in the AFS distribution. See the kpwvalid referencepage in the OpenAFS Administration Reference.

There are several types of quality checks that can improve password quality.

• The password is a minimum length

• The password is not a word

• The password contains both numbers and letters

2.10.8 Support for Kerberos Authentication

If your site is using standard Kerberos authentication rather than the AFS Authentication Server, use the modified versions ofthe klog, pagsh, and tokens commands that support Kerberos authentication. The binaries for the modified version of thesecommands have the same name as the standard binaries with the addition of a .krb extension.

Use either the Kerberos version or the standard command throughout the cell; do not mix the two versions. AFS Product Supportcan provide instructions on installing the Kerberos version of these four commands. For information on the differences betweenthe two versions of these commands, see the OpenAFS Administration Reference.

2.11 Security and Authorization in AFS

AFS incorporates several features to ensure that only authorized users gain access to data. This section summarizes the mostimportant of them and suggests methods for improving security in your cell.


2.11.1 Some Important Security Features

ACLs on Directories Files in AFS are protected by the access control list (ACL) associated with their parent directory. The ACLdefines which users or groups can access the data in the directory, and in what way. See Managing Access Control Lists.

Mutual Authentication Between Client and Server When an AFS client and server process communicate, each requires theother to prove its identity during mutual authentication, which involves the exchange of encrypted information that only validparties can decrypt and respond to. For a detailed description of the mutual authentication process, see A More Detailed Look atMutual Authentication.

AFS server processes mutually authenticate both with one another and with processes that represent human users. After mutualauthentication is complete, the server and client have established an authenticated connection, across which they can communi-cate repeatedly without having to authenticate again until the connection expires or one of the parties closes it. Authenticatedconnections have varying lifetimes.

Tokens In order to access AFS files, users must prove their identities to the AFS authentication service by providing the correctAFS password. If the password is correct, the authentication service sends the user a token as evidence of authenticated status.See Login and Authentication in AFS.

Servers assign the user identity anonymous to users and processes that do not have a valid token. The anonymous identity hasonly the access granted to the system:anyuser group on ACLs.

Authorization Checking Mutual authentication establishes that two parties communicating with one another are actually whothey claim to be. For many functions, AFS server processes also check that the client whose identity they have verified is alsoauthorized to make the request. Different requests require different kinds of privilege. See Three Types of Privilege.

Encrypted Network Communications

The AFS server processes encrypt particularly sensitive information before sending it back to clients. Even if an unauthorizedparty is able to eavesdrop on an authenticated connection, they cannot decipher encrypted data without the proper key.

The following AFS commands encrypt data because they involve server encryption keys and passwords:

• The bos addkey command, which adds a server encryption key to the /usr/afs/etc/KeyFile file

• The bos listkeys command, which lists the server encryption keys from the /usr/afs/etc/KeyFile file

• The kpasswd command, which changes a password in the Authentication Database

• Most commands in the kas command suite

In addition, the Update Server encrypts sensitive information (such as the contents of KeyFile) when distributing it. Othercommands in the bos suite and the commands in the fs, pts and vos suites do not encrypt data before transmitting it.

2.11.2 Three Types of Privilege

AFS uses three separate types of privilege for the reasons discussed in The Reason for Separate Privileges.

• Membership in the system:administrators group. Members are entitled to issue any pts command and those fs commandsthat set volume quota. By default, they also implicitly have the a (administer) and l (lookup) permissions on every ACL inthe file tree even if the ACL does not include an entry for them.

• The ADMIN flag on the Authentication Database entry. An administrator with this flag can issue any kas command.

• Inclusion in the /usr/afs/etc/UserList file. An administrator whose username appears in this file can issue any bos, vos, orbackup command (although some backup commands require additional privilege as described in Granting AdministrativePrivilege to Backup Operators).


2.11.3 Authorization Checking versus Authentication

AFS distinguishes between authentication and authorization checking. Authentication refers to the process of proving identity.Authorization checking refers to the process of verifying that an authenticated identity is allowed to perform a certain action.

AFS implements authentication at the level of connections. Each time two parties establish a new connection, they mutuallyauthenticate. In general, each issue of an AFS command establishes a new connection between AFS server process and client.

AFS implements authorization checking at the level of server machines. If authorization checking is enabled on a server machine,then all of the server processes running on it provide services only to authorized users. If authorization checking is disabled ona server machine, then all of the server processes perform any action for anyone. Obviously, disabling authorization checking isan extreme security exposure. For more information, see Managing Authentication and Authorization Requirements.

2.11.4 Improving Security in Your Cell

You can improve the level of security in your cell by configuring user accounts, server machines, and system administratoraccounts in the indicated way.

User Accounts

• Use an AFS-modified login utility, or include the -setpag flag to the klog command, to associate the credential structure thathouses tokens with a PAG rather than a UNIX UID. This prevents users from inheriting someone else’s tokens by assumingtheir UNIX identity. For further discussion, see Identifying AFS Tokens by PAG.

• Encourage users to issue the unlog command to destroy their tokens before logging out. This forestalls attempts to accesstokens left behind kernel memory. Consider including the unlog command in every user’s .logout file or equivalent.

Server Machines

• Disable authorization checking only in emergencies or for very brief periods of time. It is best to work at the console of theaffected machine during this time, to prevent anyone else from accessing the machine through the keyboard.

• Change the AFS server encryption key on a frequent and regular schedule. Make it difficult to guess (a long string includingnonalphabetic characters, for instance). Unlike user passwords, the password from which the AFS key is derived can be longerthan eight characters, because it is never used during login. The kas setpassword command accepts a password hundreds ofcharacters long. For instructions, see Managing Server Encryption Keys.

• As much as possible, limit the number of people who can login at a server machine’s console or remotely. Imposing this limitis an extra security precaution rather than a necessity. The machine cannot serve as an AFS client in this case.

• Particularly limit access to the local superuser root account on a server machine. The local superuser root has free access toimportant administrative subdirectories of the /usr/afs directory, as described in AFS Files on the Local Disk.

• As in any computing environment, server machines must be located in a secured area. Any other security measures areeffectively worthless if unauthorized people can access the computer hardware.

System Administrators

• Limit the number of system administrators in your cell. Limit the use of system administrator accounts on publicly accessibleworkstations. Such machines are not secure, so unscrupulous users can install programs that try to steal tokens or passwords.If administrators must use publicly accessible workstations at times, they must issue the unlog command before leaving themachine.

• Create an administrative account for each administrator separate from the personal account, and assign AFS privileges only tothe administrative account. The administrators must authenticate to the administrative accounts to perform duties that requireprivilege, which provides a useful audit trail as well.

• Administrators must not leave a machine unattended while they have valid tokens. Issue the unlog command before leaving.


• Use the -lifetime argument to the kas setfields command to set the token lifetime for administrative accounts to a fairly shortamount of time. The default lifetime for AFS tokens is 25 hours, but 30 or 60 minutes is possibly a more reasonable lifetime foradministrative tokens. The tokens for administrators who initiate AFS Backup System operations must last somewhat longer,because it can take several hours to complete some dump operations, depending on the speed of the tape device and the networkconnecting it to the file server machines that house the volumes is it accessing.

• Limit administrators’ use of the telnet program. It sends unencrypted passwords across the network. Similarly, limit use ofother remote programs such as rsh and rcp, which send unencrypted tokens across the network.

2.11.5 A More Detailed Look at Mutual Authentication

As in any file system, security is a prime concern in AFS. A file system that makes file sharing easy is not useful if it makesfile sharing mandatory, so AFS incorporates several features that prevent unauthorized users from accessing data. Security in anetworked environment is difficult because almost all procedures require transmission of information across wires that almostanyone can tap into. Also, many machines on networks are powerful enough that unscrupulous users can monitor transactions oreven intercept transmissions and fake the identity of one of the participants.

The most effective precaution against eavesdropping and information theft or fakery is for servers and clients to accept theclaimed identity of the other party only with sufficient proof. In other words, the nature of the network forces all parties on thenetwork to assume that the other party in a transaction is not genuine until proven so. Mutual authentication is the means throughwhich parties prove their genuineness.

Because the measures needed to prevent fakery must be quite sophisticated, the implementation of mutual authentication pro-cedures is complex. The underlying concept is simple, however: parties prove their identities by demonstrating knowledge ofa shared secret. A shared secret is a piece of information known only to the parties who are mutually authenticating (they cansometimes learn it in the first place from a trusted third party or some other source). The party who originates the transactionpresents the shared secret and refuses to accept the other party as valid until it shows that it knows the secret too.

The most common form of shared secret in AFS transactions is the encryption key, also referred to simply as a key. The twoparties use their shared key to encrypt the packets of information they send and to decrypt the ones they receive. Encryptionusing keys actually serves two related purposes. First, it protects messages as they cross the network, preventing anyone whodoes not know the key from eavesdropping. Second, ability to encrypt and decrypt messages successfully indicates that theparties are using the key (it is their shared secret). If they are using different keys, messages remain scrambled and unintelligibleafter decryption.

The following sections describe AFS’s mutual authentication procedures in more detail. Feel free to skip these sections if youare not interested in the mutual authentication process.

2.11.5.1 Simple Mutual Authentication

Simple mutual authentication involves only one encryption key and two parties, generally a client and server. The client contactsthe server by sending a challenge message encrypted with a key known only to the two of them. The server decrypts the messageusing its key, which is the same as the client’s if they really do share the same secret. The server responds to the challenge anduses its key to encrypt its response. The client uses its key to decrypt the server’s response, and if it is correct, then the client canbe sure that the server is genuine: only someone who knows the same key as the client can decrypt the challenge and answer itcorrectly. On its side, the server concludes that the client is genuine because the challenge message made sense when the serverdecrypted it.

AFS uses simple mutual authentication to verify user identities during the first part of the login procedure. In that case, the keyis based on the user’s password.

2.11.5.2 Complex Mutual Authentication

Complex mutual authentication involves three encryption keys and three parties. All secure AFS transactions (except the firstpart of the login process) employ complex mutual authentication.

When a client wishes to communicate with a server, it first contacts a third party called a ticket-granter. The ticket-granter andthe client mutually authenticate using the simple procedure. When they finish, the ticket-granter gives the client a server ticket


(or simply ticket) as proof that it (the ticket-granter) has preverified the identity of the client. The ticket-granter encrypts theticket with the first of the three keys, called the server encryption key because it is known only to the ticket-granter and the serverthe client wants to contact. The client does not know this key.

The ticket-granter sends several other pieces of information along with the ticket. They enable the client to use the ticketeffectively despite being unable to decrypt the ticket itself. Along with the ticket, the items constitute a token:

• A session key, which is the second encryption key involved in mutual authentication. The ticket-granter invents the sessionkey at random as the shared secret between client and server. For reasons explained further below, the ticket-granter also putsa copy of the session key inside the ticket. The client and server use the session key to encrypt messages they send to oneanother during their transactions. The ticket-granter invents a different session key for each connection between a client and aserver (there can be several transactions during a single connection).

• The name of the server for which the ticket is valid (and so which server encryption key encrypts the ticket itself).

• A ticket lifetime indicator. The default lifetime of AFS server tickets is 100 hours. If the client wants to contact the serveragain after the ticket expires, it must contact the ticket-granter to get a new ticket.

The ticket-granter seals the entire token with the third key involved in complex mutual authentication--the key known only to it(the ticket-granter) and the client. In some cases, this third key is derived from the password of the human user whom the clientrepresents.

Now that the client has a valid server ticket, it is ready to contact the server. It sends the server two things:

• The server ticket. This is encrypted with the server encryption key.

• Its request message, encrypted with the session key. Encrypting the message protects it as it crosses the network, since onlythe server/client pair for whom the ticket-granter invented the session key know it.

At this point, the server does not know the session key, because the ticket-granter just created it. However, the ticket-granter puta copy of the session key inside the ticket. The server uses the server encryption key to decrypts the ticket and learns the sessionkey. It then uses the session key to decrypt the client’s request message. It generates a response and sends it to the client. Itencrypts the response with the session key to protect it as it crosses the network.

This step is the heart of mutual authentication between client and server, because it proves to both parties that they know thesame secret:

• The server concludes that the client is authorized to make a request because the request message makes sense when the serverdecrypts it using the session key. If the client uses a different session key than the one the server finds inside the ticket, then therequest message remains unintelligible even after decryption. The two copies of the session key (the one inside the ticket andthe one the client used) can only be the same if they both came from the ticket-granter. The client cannot fake knowledge ofthe session key because it cannot look inside the ticket, sealed as it is with the server encryption key known only to the serverand the ticket-granter. The server trusts the ticket-granter to give tokens only to clients with whom it (the ticket-granter) hasauthenticated, so the server decides the client is legitimate.

(Note that there is no direct communication between the ticket-granter and the server, even though their relationship is centralto ticket-based mutual authentication. They interact only indirectly, via the client’s possession of a ticket sealed with theirshared secret.)

• The client concludes that the server is genuine and trusts the response it gets back from the server, because the response makessense after the client decrypts it using the session key. This indicates that the server encrypted the response with the samesession key as the client knows. The only way for the server to learn that matching session key is to decrypt the ticket first. Theserver can only decrypt the ticket because it shares the secret of the server encryption key with the ticket-granter. The clienttrusts the ticket-granter to give out tickets only for legitimate servers, so the client accepts a server that can decrypt the ticketas genuine, and accepts its response.

2.12 Backing Up AFS Data

AFS provides two related facilities that help the administrator back up AFS data: backup volumes and the AFS Backup System.


2.12.1 Backup Volumes

The first facility is the backup volume, which you create by cloning a read/write volume. The backup volume is read-only andso preserves the state of the read/write volume at the time the clone is made.

Backup volumes can ease administration if you mount them in the file system and make their contents available to users. Forexample, it often makes sense to mount the backup version of each user volume as a subdirectory of the user’s home directory. Aconventional name for this mount point is OldFiles. Create a new version of the backup volume (that is, reclone the read/write)once a day to capture any changes that were made since the previous backup. If a user accidentally removes or changes data, theuser can restore it from the backup volume, rather than having to ask you to restore it.

The OpenAFS User Guide does not mention backup volumes, so regular users do not know about them if you decide not to usethem. This implies that if you do make backup versions of user volumes, you need to tell your users about how the backup worksand where you have mounted it.

Users are often concerned that the data in a backup volume counts against their volume quota and some of them even want toremove the OldFiles mount point. It does not, because the backup volume is a separate volume. The only amount of space ituses in the user’s volume is the amount needed for the mount point, which is about the same as the amount needed for a standarddirectory element.

Backup volumes are discussed in detail in Creating Backup Volumes.

2.12.2 The AFS Backup System

Backup volumes can reduce restoration requests, but they reside on disk and so do not protect data from loss due to hardwarefailure. Like any file system, AFS is vulnerable to this sort of data loss.

To protect your cell’s users from permanent loss of data, you are strongly urged to back up your file system to tape on a regularand frequent schedule. The AFS Backup System is available to ease the administration and performance of backups. For detailedinformation about the AFS Backup System, see Configuring the AFS Backup System and Backing Up and Restoring AFS Data.

2.13 Accessing AFS through NFS

Users of NFS client machines can access the AFS filespace by mounting the /afs directory of an AFS client machine that isrunning the NFS/AFS Translator. This is a particular advantage in cells already running NFS who want to access AFS usingclient machines for which AFS is not available. See Appendix A, Managing the NFS/AFS Translator.


Part II

Managing File Server Machines


Chapter 3

Administering Server Machines

This chapter describes how to administer an AFS server machine. It describes the following configuration information andadministrative tasks:

• The binary and configuration files that must reside in the subdirectories of the /usr/afs directory on every server machine’slocal disk; see Local Disk Files on a Server Machine.

• The various roles or functions that an AFS server machine can perform, and how to determine which machines are taking arole; see The Four Roles for File Server Machines.

• How to maintain database server machines; see Administering Database Server Machines.

• How to maintain the list of database server machines in the /usr/afs/etc/CellServDB file; see Maintaining the Server CellServDBFile.

• How to control authorization checking on a server machine; see Managing Authentication and Authorization Requirements.

• How to install new disks or partitions on a file server machine; see Adding or Removing Disks and Partitions.

• How to change a server machine’s IP addresses and manager VLDB server entries; see Managing Server IP Addresses andVLDB Server Entries.

• How to reboot a file server machine; see Rebooting a Server Machine.

To learn how to install and configure a new server machine, see the OpenAFS Quick Beginnings.

To learn how to administer the server processes themselves, see Monitoring and Controlling Server Processes.

To learn how to administer volumes, see Managing Volumes.

3.1 Summary of Instructions

This chapter explains how to perform the following tasks by using the indicated commands:

Install new binaries bos installExamine binary check-and-restart time bos getrestartSet binary check-and-restart time bos setrestartExamine compilation dates on binary files bos getdateRestart a process to use new binaries bos restartRevert to old version of binaries bos uninstallRemove obsolete .BAK and .OLD versions bos pruneList partitions on a file server machine vos listpartShutdown AFS server processes bos shutdown


List volumes on a partition vos listvldbMove read/write volumes vos moveList a cell’s database server machines bos listhostsAdd a database server machine to server CellServDB file bos addhostRemove a database server machine from server CellServDB file bos removehostSet authorization checking requirements bos setauthPrevent authentication for bos, pts, and vos commands Include -noauth flag

Prevent authentication for kas commands

Include -noauth flag on somecommands or issuenoauthentication while ininteractive mode

Display all VLDB server entries vos listaddrsRemove a VLDB server entry vos changeaddrReboot a server machine remotely bos exec reboot_command

3.2 Local Disk Files on a Server Machine

Several types of files must reside in the subdirectories of the /usr/afs directory on an AFS server machine’s local disk. Theyinclude binaries, configuration files, the administrative database files (on database server machines), log files, and volume headerfiles.

Note for Windows users: Some files described in this document possibly do not exist on machines that run a Windows operatingsystem. Also, Windows uses a backslash (\) rather than a forward slash (/) to separate the elements in a pathname.

3.2.1 Binaries in the /usr/afs/bin Directory

The /usr/afs/bin directory stores the AFS server process and command suite binaries appropriate for the machine’s system (CPUand operating system) type. If a process has both a server portion and a client portion (as with the Update Server) or if it hasseparate components (as with the fs process), each component resides in a separate file.

To ensure predictable system performance, all file server machines must run the same AFS build version of a given process.To maintain consistency easily, use the Update Server process to distribute binaries from a binary distribution machine of eachsystem type, as described further in Binary Distribution Machines.

It is best to keep the binaries for all processes in the /usr/afs/bin directory, even if you do not run the process actively on themachine. It simplifies the process of reconfiguring machines (for example, adding database server functionality to an existingfile server machine). Similarly, it is best to keep the command suite binaries in the directory, even if you do not often issuecommands while working on the server machine. It enables you to issue commands during recovery from server and machineoutages.

The following lists the binary files in the /usr/afs/bin directory that are directly related to the AFS server processes or commandsuites. Other binaries (for example, for the klog command) sometimes appear in this directory on a particular file server machine’sdisk or in an AFS distribution.

backup The command suite for the AFS Backup System (the binary for the Backup Server is buserver).

bos The command suite for communicating with the Basic OverSeer (BOS) Server (the binary for the BOS Server is bosserver).

bosserver The binary for the Basic OverSeer (BOS) Server process.

buserver The binary for the Backup Server process.

fileserver The binary for the File Server component of the fs process.

kas The command suite for communicating with the Authentication Server (the binary for the Authentication Server is kaserver).


kaserver The binary for the Authentication Server process.

pts The command suite for communicating with the Protection Server process (the binary for the Protection Server is ptserver).

ptserver The binary for the Protection Server process.

salvager The binary for the Salvager component of the fs process.

udebug The binary for a program that reports the status of AFS’s distributed database technology, Ubik.

upclient The binary for the client portion of the Update Server process.

upserver The binary for the server portion of the Update Server process.

vlserver The binary for the Volume Location (VL) Server process.

volserver The binary for the Volume Server component of the fs process.

vos The command suite for communicating with the Volume and VL Server processes (the binaries for the servers are volserverand vlserver, respectively).

3.2.2 Common Configuration Files in the /usr/afs/etc Directory

The directory /usr/afs/etc on every file server machine’s local disk contains configuration files in ASCII and machine-independentbinary format. For predictable AFS performance throughout a cell, all server machines must have the same version of eachconfiguration file:

• Cells conventionally use the Update Server to distribute a common version of each file from the cell’s system control machineto other server machines (for more on the system control machine, see The System Control Machine). Run the Update Server’sserver portion on the system control machine, and the client portion on all other server machines. Update the files on the systemcontrol machine only, except as directed by instructions for dealing with emergencies.

Never directly edit any of the files in the /usr/afs/etc directory, except as directed by instructions for dealing with emergencies. Innormal circumstances, use the appropriate bos commands to change the files. The following list includes pointers to instructions.

The files in this directory include:

CellServDB An ASCII file that names the cell’s database server machines, which run the Authentication, Backup, Protection,and VL Server processes. You create the initial version of this file by issuing the bos setcellname command while installingyour cell’s first server machine. It is very important to update this file when you change the identity of your cell’s databaseserver machines.

The server CellServDB file is not the same as the CellServDB file stored in the /usr/vice/etc directory on client machines.The client version lists the database server machines for every AFS cell that you choose to make accessible from the clientmachine. The server CellServDB file lists only the local cell’s database server machines, because server processes nevercontact processes in other cells.

For instructions on maintaining this file, see Maintaining the Server CellServDB File.

KeyFile A machine-independent, binary-format file that lists the server encryption keys the AFS server processes use to encryptand decrypt tickets. The information in this file is the basis for secure communication in the cell, and so is extremelysensitive. The file is specially protected so that only privileged users can read or change it.

For instructions on maintaining this file, see Managing Server Encryption Keys.

ThisCell An ASCII file that consists of a single line defining the complete Internet domain-style name of the cell (such asexample.com). You create this file with the bos setcellname command during the installation of your cell’s first fileserver machine, as instructed in the OpenAFS Quick Beginnings.

Note that changing this file is only one step in changing your cell’s name. For discussion, see Choosing a Cell Name.

UserList An ASCII file that lists the usernames of the system administrators authorized to issue privileged bos, vos, and backupcommands. For instructions on maintaining the file, see Administering the UserList File.


3.2.3 Local Configuration Files in the /usr/afs/local Directory

The directory /usr/afs/local contains configuration files that are different for each file server machine in a cell. Thus, they are notupdated automatically from a central source like the files in /usr/afs/bin and /usr/afs/etc directories. The most important file isthe BosConfig file; it defines which server processes are to run on that machine.

As with the common configuration files in /usr/afs/etc, you must not edit these files directly. Use commands from the boscommand suite where appropriate; some files never need to be altered.

The files in this directory include the following:

BosConfig This file lists the server processes to run on the server machine, by defining which processes the BOS Server monitorsand what it does if the process fails. It also defines the times at which the BOS Server automatically restarts processes formaintenance purposes.

As you create server processes during a file server machine’s installation, their entries are defined in this file automatically.The OpenAFS Quick Beginnings outlines the bos commands to use. For a more complete description of the file, andinstructions for controlling process status by editing the file with commands from the bos suite, see Monitoring andControlling Server Processes.

NetInfo This optional ASCII file lists one or more of the network interface addresses on the server machine. If it exists whenthe File Server initializes, the File Server uses it as the basis for the list of interfaces that it registers in its Volume LocationDatabase (VLDB) server entry. See Managing Server IP Addresses and VLDB Server Entries.

NetRestrict This optional ASCII file lists one or more network interface addresses. If it exists when the File Server initializes,the File Server removes the specified addresses from the list of interfaces that it registers in its VLDB server entry. SeeManaging Server IP Addresses and VLDB Server Entries.

NoAuth This zero-length file instructs all AFS server processes running on the machine not to perform authorization checking.Thus, they perform any action for any user, even anonymous. This very insecure state is useful only in rare instances,mainly during the installation of the machine.

The file is created automatically when you start the initial bosserver process with the -noauth flag, or issue the bos setauthcommand to turn off authentication requirements. When you use the bos setauth command to turn on authentication, theBOS Server removes this file. For more information, see Managing Authentication and Authorization Requirements.

SALVAGE.fs This zero-length file controls how the BOS Server handles a crash of the File Server component of the fs process.The BOS Server creates this file each time it starts or restarts the fs process. If the file is present when the File Servercrashes, then the BOS Server runs the Salvager before restarting the File Server and Volume Server again. When the FileServer exits normally, the BOS Server removes the file so that the Salvager does not run.

Do not create or remove this file yourself; the BOS Server does so automatically. If necessary, you can salvage a volumeor partition by using the bos salvage command; see Salvaging Volumes.

salvage.lock This file guarantees that only one Salvager process runs on a file server machine at a time (the single process canfork multiple subprocesses to salvage multiple partitions in parallel). As the Salvager initiates (when invoked by the BOSServer or by issue of the bos salvage command), it creates this zero-length file and issues the flock system call on it. Itremoves the file when it completes the salvage operation. Because the Salvager must lock the file in order to run, only oneSalvager can run at a time.

sysid This file records the network interface addresses that the File Server (fileserver process) registers in its VLDB serverentry. When the Cache Manager requests volume location information, the Volume Location (VL) Server provides all ofthe interfaces registered for each server machine that houses the volume. This enables the Cache Manager to make useof multiple addresses when accessing AFS data stored on a multihomed file server machine. For further information, seeManaging Server IP Addresses and VLDB Server Entries.

3.2.4 Replicated Database Files in the /usr/afs/db Directory

The directory /usr/afs/db contains two types of files pertaining to the four replicated databases in the cell--the AuthenticationDatabase, Backup Database, Protection Database, and Volume Location Database (VLDB):


• A file that contains each database, with a .DB0 extension.

• A log file for each database, with a .DBSYS1 extension. The database server process logs each database operation in this filebefore performing it. If the operation is interrupted, the process consults this file to learn how to finish it.

Each database server process (Authentication, Backup, Protection, or VL Server) maintains its own database and log files.The database files are in binary format, so you must always access or alter them using commands from the kas suite (for theAuthentication Database), backup suite (for the Backup Database), pts suite (for the Protection Database), or vos suite (for theVLDB).

If a cell runs more than one database server machine, each database server process keeps its own copy of its database on itsmachine’s hard disk. However, it is important that all the copies of a given database are the same. To synchronize them,the database server processes call on AFS’s distributed database technology, Ubik, as described in Replicating the OpenAFSAdministrative Databases.

The files listed here appear in this directory only on database server machines. On non-database server machines, this directoryis empty.

bdb.DB0 The Backup Database file.

bdb.DBSYS1 The Backup Database log file.

kaserver.DB0 The Authentication Database file.

kaserver.DBSYS1 The Authentication Database log file.

prdb.DB0 The Protection Database file.

prdb.DBSYS1 The Protection Database log file.

vldb.DB0 The Volume Location Database file.

vldb.DBSYS1 The Volume Location Database log file.

3.2.5 Log Files in the /usr/afs/logs Directory

The /usr/afs/logs directory contains log files from various server processes. The files detail interesting events that occur duringnormal operations. For instance, the Volume Server can record volume moves in the VolserLog file. Events are recorded atcompletion, so the server processes do not use these files to reconstruct failed operations unlike the ones in the /usr/afs/dbdirectory.

The information in log files can be very useful as you evaluate process failures and other problems. For instance, if you receivea timeout message when you try to access a volume, checking the FileLog file possibly provides an explanation, showing thatthe File Server was unable to attach the volume. To examine a log file remotely, use the bos getlog command as described inDisplaying Server Process Log Files.

This directory also contains the core image files generated if a process being monitored by the BOS Server crashes. The BOSServer attempts to add an extension to the standard core name to indicate which process generated the core file (for example,naming a core file generated by the Protection Server core.ptserver). The BOS Server cannot always assign the correct extensionif two processes fail at about the same time, so it is not guaranteed to be correct.

The directory contains the following files:

AuthLog The Authentication Server’s log file.

BackupLog The Backup Server’s log file.

BosLog The BOS Server’s log file.

FileLog The File Server’s log file.

SalvageLog The Salvager’s log file.


VLLog The Volume Location (VL) Server’s log file.

VolserLog The Volume Server’s log file.

core.process If present, a core image file produced as an AFS server process on the machine crashed (probably the processnamed by process).

NoteTo prevent log files from growing unmanageably large, restart the server processes periodically, particularly the database serverprocesses. To avoid restarting the processes, use the UNIX rm command to remove the file as the process runs; it re-createsit automatically.

3.2.6 Volume Headers on Server Partitions

A partition that houses AFS volumes must be mounted at a subdirectory of the machine’s root ( / ) directory (not, for instanceunder the /usr directory). The file server machine’s file system registry file (/etc/fstab or equivalent) must correctly map thedirectory name and the partition’s device name. The directory name is of the form /vicepindex, where each index is one or twolowercase letters. By convention, the first AFS partition on a machine is mounted at /vicepa, the second at /vicepb, and so on. Ifthere are more than 26 partitions, continue with /vicepaa, /vicepab and so on. The OpenAFS Release Notes specifies the numberof supported partitions per server machine.

Do not store non-AFS files on AFS partitions. The File Server and Volume Server expect to have available all of the space on thepartition.

The /vicep directories contain two types of files:

Vvol_ID.vol Each such file is a volume header. The vol_ID corresponds to the volume ID number displayed in the output fromthe vos examine, vos listvldb, and vos listvol commands.

FORCESALVAGE This zero-length file triggers the Salvager to salvage the entire partition. The AFS-modified version of thefsck program creates this file if it discovers corruption.

NoteFor most system types, it is important never to run the standard fsck program provided with the operating system on an AFSfile server machine. It removes all AFS volume data from server partitions because it does not recognize their format.

3.3 The Four Roles for File Server Machines

In cells that have more than one server machine, not all server machines have to perform exactly the same functions. The are fourpossible roles a machine can assume, determined by which server processes it is running. A machine can assume more than onerole by running all of the relevant processes. The following list summarizes the four roles, which are described more completelyin subsequent sections.

• A simple file server machine runs only the processes that store and deliver AFS files to client machines. You can run as manysimple file server machines as you need to satisfy your cell’s performance and disk space requirements.

• A database server machine runs the four database server processes that maintain AFS’s replicated administrative databases:the Authentication, Backup, Protection, and Volume Location (VL) Server processes.

• A binary distribution machine distributes the AFS server binaries for its system type to all other server machines of that systemtype.

• The single system control machine distributes common server configuration files to all other server machines in the cell.


If a cell has a single server machine, it assumes the simple file server and database server roles. The instructions in the OpenAFSQuick Beginnings also have you configure it as the system control machine and binary distribution machine for its system type,but it does not actually perform those functions until you install another server machine.

It is best to keep the binaries for all of the AFS server processes in the /usr/afs/bin directory, even if not all processes are running.You can then change which roles a machine assumes simply by starting or stopping the processes that define the role.

3.3.1 Simple File Server Machines

A simple file server machine runs only the server processes that store and deliver AFS files to client machines, monitor processstatus, and pick up binaries and configuration files from the cell’s binary distribution and system control machines.

In general, only cells with more than three server machines need to run simple file server machines. In cells with three orfewer machines, all of them are usually database server machines (to benefit from replicating the administrative databases); seeDatabase Server Machines.

The following processes run on a simple file server machine:

• The BOS Server (bosserver process)

• The fs process, which combines the File Server, Volume Server, and Salvager processes so that they can coordinate theiroperations on the data in volumes and avoid the inconsistencies that can result from multiple simultaneous operations on thesame data

• A client portion of the Update Server that picks up binary files from the binary distribution machine of its AFS system type(the upclientbin process)

• A client portion of the Update Server that picks up common configuration files from the system control machine (the upclien-tetc process)

3.3.2 Database Server Machines

A database server machine runs the four processes that maintain the AFS replicated administrative databases: the AuthenticationServer, Backup Server, Protection Server, and Volume Location (VL) Server, which maintain the Authentication Database,Backup Database, Protection Database, and Volume Location Database (VLDB), respectively. To review the functions of theseserver processes and their databases, see AFS Server Processes and the Cache Manager.

If a cell has more than one server machine, it is best to run more than one database server machine, but more than three arerarely necessary. Replicating the databases in this way yields the same benefits as replicating volumes: increased availabilityand reliability of information. If one database server machine or process goes down, the information in the database is stillavailable from others. The load of requests for database information is spread across multiple machines, preventing any one frombecoming overloaded.

Unlike replicated volumes, however, replicated databases do change frequently. Consistent system performance demands that allcopies of the database always be identical, so it is not possible to record changes in only some of them. To synchronize the copiesof a database, the database server processes use AFS’s distributed database technology, Ubik. See Replicating the OpenAFSAdministrative Databases.

It is critical that the AFS server processes on every server machine in a cell know which machines are the database servermachines. The database server processes in particular must maintain constant contact with their peers in order to coordinate thecopies of the database. The other server processes often need information from the databases. Every file server machine keepsa list of its cell’s database server machines in its local /usr/afs/etc/CellServDB file. Cells that use the States edition of AFS canuse the system control machine to distribute this file (see The System Control Machine).

The following processes define a database server machine:

• The Authentication Server (kaserver process)

• The Backup Server (buserver process)


• The Protection Server (ptserver process)

• The VL Server (vlserver process)

Database server machines can also run the processes that define a simple file server machine, as listed in Simple File ServerMachines. One database server machine can act as the cell’s system control machine, and any database server machine can serveas the binary distribution machine for its system type; see The System Control Machine and Binary Distribution Machines.

3.3.3 Binary Distribution Machines

A binary distribution machine stores and distributes the binary files for the AFS processes and command suites to all other servermachines of its system type. Each file server machine keeps its own copy of AFS server process binaries on its local disk, byconvention in the /usr/afs/bin directory. For consistent system performance, however, all server machines must run the sameversion (build level) of a process. For instructions for checking a binary’s build level, see Displaying A Binary File’s Build Level.The easiest way to keep the binaries consistent is to have a binary distribution machine of each system type distribute them to itssystem-type peers.

The process that defines a binary distribution machine is the server portion of the Update Server (upserver process). The clientportion of the Update Server (upclientbin process) runs on the other server machines of that system type and references thebinary distribution machine.

Binary distribution machines usually also run the processes that define a simple file server machine, as listed in Simple FileServer Machines. One binary distribution machine can act as the cell’s system control machine, and any binary distributionmachine can serve as a database server machine; see The System Control Machine and Database Server Machines.

3.3.4 The System Control Machine

The system control machine stores and distributes system configuration files shared by all of the server machines in the cell. Eachfile server machine keeps its own copy of the configuration files on its local disk, by convention in the /usr/afs/etc directory. Forconsistent system performance, however, all server machines must use the same files. The easiest way to keep the files consistentis to have the system control machine distribute them. You make changes only to the copy stored on the system control machine,as directed by the instructions in this document.

For a list of the configuration files stored in the /usr/afs/etc directory, see Common Configuration Files in the /usr/afs/etc Direc-tory.

The OpenAFS Quick Beginnings configures a cell’s first server machine as the system control machine. If you wish, you canreassign the role to a different machine that you install later, but you must then change the client portion of the Update Server(upclientetc) process running on all other server machines to refer to the new system control machine.

The following processes define the system control machine:

• The server portion of the Update Server (upserver) process The client portion of the Update Server (upclientetc process) runson the other server machines and references the system control machine.

The system control machine can also run the processes that define a simple file server machine, as listed in Simple File ServerMachines. It can also server as a database server machine, and by convention acts as the binary distribution machine for itssystem type. A single upserver process can distribute both configuration files and binaries. See Database Server Machines andBinary Distribution Machines.

3.3.5 To locate database server machines

1. Issue the bos listhosts command.

% bos listhosts <machine name>

The machines listed in the output are the cell’s database server machines. For complete instructions and example output,see To display a cell’s database server machines.


2. (Optional) Issue the bos status command to verify that a machine listed in the output of the bos listhosts commandis actually running the processes that define it as a database server machine. For complete instructions, see DisplayingProcess Status and Information from the BosConfig File.

% bos status <machine name> buserver kaserver ptserver vlserver

If the specified machine is a database server machine, the output from the bos status command includes the followinglines:

Instance buserver, currently running normally.Instance kaserver, currently running normally.Instance ptserver, currently running normally.Instance vlserver, currently running normally.

3.3.6 To locate the system control machine

1. Issue the bos status command for any server machine. Complete instructions appear in Displaying Process Status andInformation from the BosConfig File.

% bos status <machine name> upserver upclientbin upclientetc -long

The output you see depends on the machine you have contacted: a simple file server machine, the system control machine,or a binary distribution machine. See Interpreting the Output from the bos status Command.

3.3.7 To locate the binary distribution machine for a system type

1. Issue the bos status command for a file server machine of the system type you are checking (to determine a machine’s sys-tem type, issue the fs sysname or sys command as described in Displaying and Setting the System Type Name. Completeinstructions for the bos status command appear in Displaying Process Status and Information from the BosConfig File.

% bos status <machine name> upserver upclientbin upclientetc -long

The output you see depends on the machine you have contacted: a simple file server machine, the system control machine,or a binary distribution machine. See Interpreting the Output from the bos status Command.

3.3.8 Interpreting the Output from the bos status Command

Interpreting the output of the bos status command is most straightforward for a simple file server machine. There is no upserverprocess, so the output includes the following message:

bos: failed to get instance info for ’upserver’ (no such entity)

A simple file server machine runs the upclientbin process, so the output includes a message like the following. It indicates thatfs7.example.com is the binary distribution machine for this system type.

Instance upclientbin, (type is simple) currently running normally.Process last started at Wed Mar 10 23:37:09 1999 (1 proc start)Command 1 is ’/usr/afs/bin/upclient fs7.example.com -t 60 /usr/afs/bin’

A simple file server machine also runs the upclientetc process, so the output includes a message like the following. It indicatesthat fs1.example.com is the system control machine.

Instance upclientetc, (type is simple) currently running normally.Process last started at Mon Mar 22 05:23:49 1999 (1 proc start)Command 1 is ’/usr/afs/bin/upclient fs1.example.com -t 60 /usr/afs/etc’


3.3.8.1 The Output on the System Control Machine

If you have issued the bos status command for the system control machine, the output includes an entry for the upserver processsimilar to the following:

Instance upserver, (type is simple) currently running normally.Process last started at Mon Mar 22 05:23:54 1999 (1 proc start)Command 1 is ’/usr/afs/bin/upserver’

If you are using the default configuration recommended in the OpenAFS Quick Beginnings, the system control machine is alsothe binary distribution machine for its system type, and a single upserver process distributes both kinds of updates. In that case,the output includes the following messages:

bos: failed to get instance info for ’upclientbin’ (no such entity)bos: failed to get instance info for ’upclientetc’ (no such entity)

If the system control machine is not a binary distribution machine, the output includes an error message for the upclientetcprocess, but a complete a listing for the upclientbin process (in this case it refers to the machine fs5.example.com as the binarydistribution machine):

Instance upclientbin, (type is simple) currently running normally.Process last started at Mon Mar 22 05:23:49 1999 (1 proc start)Command 1 is ’/usr/afs/bin/upclient fs5.example.com -t 60 /usr/afs/bin’bos: failed to get instance info for ’upclientetc’ (no such entity)

3.3.8.2 The Output on a Binary Distribution Machine

If you have issued the bos status command for a binary distribution machine, the output includes an entry for the upserverprocess similar to the following and error message for the upclientbin process:

Instance upserver, (type is simple) currently running normally.Process last started at Mon Apr 5 05:23:54 1999 (1 proc start)Command 1 is ’/usr/afs/bin/upserver’bos: failed to get instance info for ’upclientbin’ (no such entity)

Unless this machine also happens to be the system control machine, a message like the following references the system controlmachine (in this case, fs3.example.com):

Instance upclientetc, (type is simple) currently running normally.Process last started at Mon Apr 5 05:23:49 1999 (1 proc start)Command 1 is ’/usr/afs/bin/upclient fs3.example.com -t 60 /usr/afs/etc’

3.4 Administering Database Server Machines

This section explains how to administer database server machines. For installation instructions, see the OpenAFS Quick Begin-nings.

3.4.1 Replicating the OpenAFS Administrative Databases

There are several benefits to replicating the AFS administrative databases (the Authentication, Backup, Protection, and VolumeLocation Databases), as discussed in Replicating the OpenAFS Administrative Databases. For correct cell functioning, the copiesof each database must be identical at all times. To keep the databases synchronized, AFS uses library of utilities called Ubik. Eachdatabase server process runs an associated lightweight Ubik process, and client-side programs call Ubik’s client-side subroutineswhen they submit requests to read and change the databases.


Ubik is designed to work with minimal administrator intervention, but there are several configuration requirements, as detailed inConfiguring the Cell for Proper Ubik Operation. The following brief overview of Ubik’s operation is helpful for understandingthe requirements. For more details, see How Ubik Operates Automatically.

Ubik is designed to distribute changes made in an AFS administrative database to all copies as quickly as possible. Only onecopy of the database, the synchronization site, accepts change requests from clients; the lightweight Ubik process running thereis the Ubik coordinator. To maintain maximum availability, there is a separate Ubik coordinator for each database, and thesynchronization site for each of the four databases can be on a different machine. The synchronization site for a database canalso move from machine to machine in response to process, machine, or network outages.

The other copies of a database, and the Ubik processes that maintain them, are termed secondary. The secondary sites do notaccept database changes directly from client-side programs, but only from the synchronization site.

After the Ubik coordinator records a change in its copy of a database, it immediately sends the change to the secondary sites.During the brief distribution period, clients cannot access any of the copies of the database, even for reading. If the coordinatorcannot reach a majority of the secondary sites, it halts the distribution and informs the client that the attempted change failed.

To avoid distribution failures, the Ubik processes maintain constant contact by exchanging time-stamped messages. As long as amajority of the secondary sites respond to the coordinator’s messages, there is a quorum of sites that are synchronized with thecoordinator. If a process, machine, or network outage breaks the quorum, the Ubik processes attempt to elect a new coordinatorin order to establish a new quorum among the highest possible number of sites. See A Flexible Coordinator Boosts Availability.

3.4.1.1 Configuring the Cell for Proper Ubik Operation

This section describes how to configure your cell to maintain proper Ubik operation.

• Run all four database server processes--Authentication Server, Backup Server, Protection Server, and VL Server--on alldatabase server machines.

Both the client and server portions of Ubik expect that all the database server machines listed in the CellServDB file arerunning all of the database server processes. There is no mechanism for indicating that only some database server processesare running on a machine.

• Maintain correct information in the /usr/afs/etc/CellServDB file at all times.

Ubik consults the /usr/afs/etc/CellServDB file to determine the sites with which to establish and maintain a quorum. Incorrectinformation can result in unsynchronized databases or election of a coordinator in each of several subgroups of machines,because the Ubik processes on various machines do not agree on which machines need to participate in the quorum.

If you use the Update Server, it is simplest to maintain the /usr/afs/etc/CellServDB file on the system control machine, whichdistributes its copy to all other server machines. The OpenAFS Quick Beginnings explains how to configure the Update Server.

The only reason to alter the file is when configuring or decommissioning a database server machine. Use the appropriate boscommands rather than editing the file by hand. For instructions, see Maintaining the Server CellServDB File. The instructionsin Monitoring and Controlling Server Processes for stopping and starting processes remind you to alter the CellServDB filewhen appropriate, as do the instructions in the OpenAFS Quick Beginnings for installing or decommissioning a database servermachine.

(Client processes and the server processes that do not maintain databases also rely on correct information in the CellServDB filefor proper operation, but their use of the information does not affect Ubik’s operation. See Maintaining the Server CellServDBFile and Maintaining Knowledge of Database Server Machines.)

• Keep the clocks synchronized on all machines in the cell, especially the database server machines.

Keeping clocks synchronized is important because the Ubik processes at a database’s sites timestamp the messages which theyexchange to maintain constant contact. Timestamping the messages is necessary because in a networked environment it is notsafe to assume that a message reaches its destination instantly. Ubik compares the timestamp on an incoming message withthe current time. If the difference is too great, it is possible that an outage is preventing reliable communication between theUbik sites, which can possibly result in unsynchronized databases. Ubik considers the message invalid, which can prompt it toattempt election of a different coordinator.

Electing a new coordinator is appropriate if a timestamped message is expired due to actual interruption of communication,but not if a message appears expired only because the sender and recipient do not share the same time. For detailed examplesof how unsynchronized clocks can destabilize Ubik operation, see How Ubik Uses Timestamped Messages.


3.4.1.2 How Ubik Operates Automatically

The following Ubik features help keep its maintenance requirements to a minimum:

• Ubik’s server and client portions operate automatically.

Each database server process runs a lightweight process to call on the server portion of the Ubik library. It is common to referto this lightweight process itself as Ubik. Because it is lightweight, the Ubik process does not appear in process listings such asthose generated by the UNIX ps command. Client-side programs that need to read and change the databases directly call thesubroutines in the Ubik library’s client portion, rather than running a separate lightweight process. Examples of such programsare the klog command and the commands in the pts suite.

• Ubik tracks database version numbers.

As the coordinator records a change to a database, it increments the database’s version number. The version number makesit easy for the coordinator to determine if a site has the most recent version or not. The version number speeds the return tonormal functioning after election of a new coordinator or when communication is restored after an outage, because it makes iteasy to determine which site has the most current database and which need to be updated.

• Ubik’s use of timestamped messages guarantees that database copies are always synchronized during normal operation.

Replicating a database to increase data availability is pointless if all copies of the database are not the same. Inconsistentperformance can result if clients receive different information depending on which copy of the database they access. Aspreviously noted, Ubik sites constantly track the status of their peers by exchanging timestamped messages. For a detaileddescription, see How Ubik Uses Timestamped Messages.

• The ability to move the coordinator maximizes database availability.

Suppose, for example, that in a cell with three database server machines a network partition separates the two secondarysites from the coordinator. The coordinator retires because it is no longer in contact with a majority of the sites listed in theCellServDB file. The two sites on the other side of the partition can elect a new coordinator among themselves, and it canthen accept database changes from clients. If the coordinator cannot move in this way, the database has to be read-only untilthe network partition is repaired. For a detailed description of Ubik’s election procedure, see A Flexible Coordinator BoostsAvailability.

3.4.1.2.1 How Ubik Uses Timestamped Messages

Ubik synchronizes the copies of a database by maintaining constant contact between the synchronization site and the secondarysites. The Ubik coordinator frequently sends a time-stamped guarantee message to each of the secondary sites. When thesecondary site receives the message, it concludes that it is in contact with the coordinator. It considers its copy of the databaseto be valid until time T, which is usually 60 seconds from the time the coordinator sent the message. In response, the secondarysite returns a vote message that acknowledges the coordinator as valid until a certain time X, which is usually 120 seconds in thefuture.

The coordinator sends guarantee messages more frequently than every T seconds, so that the expiration periods overlap. Thereis no danger of expiration unless a network partition or other outage actually interrupts communication. If the guarantee expires,the secondary site’s copy of the database it not necessarily current. Nonetheless, the database server continues to service clientrequests. It is considered better for overall cell functioning that a secondary site remains accessible even if the information it isdistributing is possibly out of date. Most of the AFS administrative databases do not change that frequently, in any case, andmaking a database inaccessible causes a timeout for clients that happen to access that copy.

As previously mentioned, Ubik’s use of timestamped messages makes it vital to synchronize the clocks on database servermachines. There are two ways that skewed clocks can interrupt normal Ubik functioning, depending on which clock is ahead ofthe others.

Suppose, for example, that the Ubik coordinator’s clock is ahead of the secondary sites: the coordinator’s clock says 9:35:30,but the secondary clocks say 9:31:30. The secondary sites send votes messages that acknowledge the coordinator as valid until9:33:30. This is two minutes in the future according to the secondary clocks, but is already in the past from the coordinator’sperspective. The coordinator concludes that it no longer has enough support to remain coordinator and forces election of a newcoordinator. Election takes about three minutes, during which time no copy of the database accepts changes.


The opposite possibility is that a secondary site’s clock (14:50:00) is ahead of the coordinator’s (14:46:30). When the coordinatorsends a guarantee message good until 14:47:30), it has already expired according to the secondary clock. Believing that it is out ofcontact with the coordinator, the secondary site stops sending votes for the coordinator and tries get itself elected as coordinator.This is appropriate if the coordinator has actually failed, but is inappropriate when there is no actual outage.

The attempt of a single secondary site to get elected as the new coordinator usually does not affect the performance of the othersites. As long as their clocks agree with the coordinator’s, they ignore the other secondary site’s request for votes and continuevoting for the current coordinator. However, if enough of the secondary sites’s clocks get ahead of the coordinator’s, they canforce election of a new coordinator even though the current one is actually working fine.

3.4.1.2.2 A Flexible Coordinator Boosts Availability

Ubik uses timestamped messages to determine when coordinator election is necessary, just as it does to keep the database copiessynchronized. As long as the coordinator receives vote messages from a majority of the sites (it implicitly votes for itself), itis appropriate for it to continue as coordinator because it is successfully distributing database changes. A majority is defined asmore than 50% of all database sites when there are an odd number of sites; with an even number of sites, the site with the lowestInternet address has an extra vote for breaking ties as necessary.If the coordinator is not receiving sufficient votes, it retires andthe Ubik sites elect a new coordinator. This does not happen spontaneously, but only when the coordinator really fails or stopsreceiving a majority of the votes. The secondary sites have a built-in bias to continue voting for an existing coordinator, whichprevents undue elections.

The election of the new coordinator is by majority vote. The Ubik subprocesses have a bias to vote for the site with the lowestInternet address, which helps it gather the necessary majority quicker than if all the sites were competing to receive votesthemselves. During the election (which normally lasts less than three minutes), clients can read information from the database,but cannot make any changes.

Ubik’s election procedure makes it possible for each database server process’s coordinator to be on a different machine. Forexample, if the Ubik coordinators for all four processes start out on machine A and the Protection Server on machine A fails forsome reason, then a different site (say machine B) must be elected as the new Protection Database Ubik coordinator. Machine Bremains the coordinator for the Protection Database even after the Protection Server on machine A is working again. The failureof the Protection Server has no effect on the Authentication, Backup, or VL Servers, so their coordinators remain on machine A.

3.4.2 Backing Up and Restoring the Administrative Databases

The AFS administrative databases store information that is critical for AFS operation in your cell. If a database becomes corrupteddue to a hardware failure or other problem on a database server machine, it likely to be difficult and time-consuming to recreateall of the information from scratch. To protect yourself against loss of data, back up the administrative databases to a permanentmedia, such as tape, on a regular basis. The recommended method is to use a standard local disk backup utility such as the UNIXtar command.

When deciding how often to back up a database, consider the amount of data that you are willing to recreate by hand if itbecomes necessary to restore the database from a backup copy. In most cells, the databases differ quite a bit in how often andhow much they change. Changes to the Authentication Database are probably the least frequent, and consist mostly of changeduser passwords. Protection Database and VLDB changes are probably more frequent, as users add or delete groups and changegroup memberships, and as you and other administrators create or move volumes. The number and frequency of changes isprobably greatest in the Backup Database, particularly if you perform backups every day.

The ease with which you can recapture lost changes also differs for the different databases:

• If regular users make a large proportion of the changes to the Authentication Database and Protection Database in your cell,then recovering them possibly requires a large amount of detective work and interviewing of users, assuming that they can evenremember what changes they made at what time.

• Recovering lost changes to the VLDB is more straightforward, because you can use the vos syncserv and vos syncvldbcommands to correct any discrepancies between the VLDB and the actual state of volumes on server machines. Running thesecommands can be time-consuming, however.


• The configuration information in the Backup Database (Tape Coordinator port offsets, volume sets and entries, the dumphierarchy, and so on) probably does not change that often, in which case it is not that hard to recover a few recent changes.In contrast, there are likely to be a large number of new dump records resulting from dump operations. You can recoverthese records by using the -dbadd argument to the backup scantape command, reading in information from the backup tapesthemselves. This can take a long time and require numerous tape changes, however, depending on how much data you back upin your cell and how you append dumps. Furthermore, the backup scantape command is subject to several restrictions. Themost basic is that it halts if it finds that an existing dump record in the database has the same dump ID number as a dump onthe tape it is scanning. If you want to continue with the scanning operation, you must locate and remove the existing recordfrom the database. For further discussion, see the backup scantape command’s reference page in the OpenAFS AdministrationReference.

These differences between the databases possibly suggest backing up the database at different frequencies, ranging from everyfew days or weekly for the Backup Database to every few weeks for the Authentication Database. On the other hand, it is probablysimpler from a logistical standpoint to back them all up at the same time (and frequently), particularly if tape consumption is nota major concern. Also, it is not generally necessary to keep backup copies of the databases for a long time, so you can recyclethe tapes fairly frequently.

3.4.3 To back up the administrative databases

1. Log in as the local superuser root on a database server machine that is not the synchronization site. The machine with thehighest IP address is normally the best choice, since it is least likely to become the synchronization site in an election.

2. Issue the bos shutdown command to shut down the relevant server process on the local machine. For a complete descrip-tion of the command, see To stop processes temporarily.

For the -instance argument, specify one or more database server process names (buserver for the Backup Server, kaserverfor the Authentication Server, ptserver for the Protection Server, or vlserver for the Volume Location Server. Include the-localauth flag because you are logged in as the local superuser root but do not necessarily have administrative tokens.

# bos shutdown <machine name> -instance <instances>+ -localauth [-wait]

3. Use a local disk backup utility, such as the UNIX tar command, to transfer one or more database files to tape. If the localdatabase server machine does not have a tape device attached, use a remote copy command to transfer the file to a machinewith a tape device, then use the tar command there.

The following command sequence backs up the complete contents of the /usr/afs/db directory

# cd /usr/afs/db# tar cvf tape_device .

To back up individual database files, substitute their names for the period in the preceding tar command:

• bdb.DB0 for the Backup Database

• kaserver.DB0 for the Authentication Database

• prdb.DB0 for the Protection Database

• vldb.DB0 for the VLDB

4. Issue the bos start command to restart the server processes on the local machine. For a complete description of thecommand, see To start processes by changing their status flags to Run. Provide the same values for the -instance argumentas in Step 2, and the -localauth flag for the same reason.

# bos start <machine name> -instance <server process name>+ -localauth


3.4.4 To restore an administrative database

1. Log in as the local superuser root on each database server machine in the cell.

2. Working on one of the machines, issue the bos shutdown command once for each database server machine, to shut downthe relevant server process on all of them. For a complete description of the command, see To stop processes temporarily.

For the -instance argument, specify one or more database server process names (buserver for the Backup Server, kaserverfor the Authentication Server, ptserver for the Protection Server, or vlserver for the Volume Location Server. Include the-localauth flag because you are logged in as the local superuser root but do not necessarily have administrative tokens.

# bos shutdown <machine name> -instance <instances>+ -localauth [-wait]

3. Remove the database from each database server machine, by issuing the following commands on each one.

# cd /usr/afs/db

For the Backup Database:

# rm bdb.DB0# rm bdb.DBSYS1

For the Authentication Database:

# rm kaserver.DB0# rm kaserver.DBSYS1

For the Protection Database:

# rm prdb.DB0# rm prdb.DBSYS1

For the VLDB:

# rm vldb.DB0# rm vldb.DBSYS1

4. Using the local disk backup utility that you used to back up the database, copy the most recently backed-up version ofit to the appropriate file on the database server machine with the lowest IP address. The following is an appropriate tarcommand if the synchronization site has a tape device attached:

# cd /usr/afs/db# tar xvf tape_device database_file

where database_file is one of the following:

• bdb.DB0 for the Backup Database

• kaserver.DB0 for the Authentication Database

• prdb.DB0 for the Protection Database

• vldb.DB0 for the VLDB

5. Working on one of the machines, issue the bos start command to restart the server process on each of the database servermachines in turn. Start with the machine with the lowest IP address, which becomes the synchronization site for the BackupDatabase. Wait for it to establish itself as the synchronization site before repeating the command to restart the process onthe other database server machines. For a complete description of the command, see To start processes by changing theirstatus flags to Run. Provide the same values for the -instance argument as in Step 2, and the -localauth flag for the samereason.

# bos start <machine name> -instance <server process name>+ -localauth


6. If the database has changed since you last backed it up, issue the appropriate commands from the instructions in theindicated sections to recreate the information in the restored database. If issuing pts commands, you must first obtainadministrative tokens. The backup and vos commands accept the -localauth flag if you are logged in as the local superuserroot, so you do not need administrative tokens. The Authentication Server always performs a separate authenticationanyway, so you only need to include the -admin argument if issuing kas commands.

• To define or remove volume sets and volume entries in the Backup Database, see Defining and Displaying Volume Setsand Volume Entries.

• To edit the dump hierarchy in the Backup Database, see Defining and Displaying the Dump Hierarchy.

• To define or remove Tape Coordinator port offset entries in the Backup Database, see Configuring Tape CoordinatorMachines and Tape Devices.

• To restore dump records in the Backup Database, see To scan the contents of a tape.

• To recreate Authentication Database entries or password changes for users, see the appropriate section of AdministeringUser Accounts.

• To recreate Protection Database entries or group membership information, see the appropriate section of Administeringthe Protection Database.

• To synchronize the VLDB with volume headers, see Synchronizing the VLDB and Volume Headers.

3.5 Installing Server Process Software

This section explains how to install new server process binaries on file server machines, how to revert to a previous version if thecurrent version is not working properly, and how to install new disks to house AFS volumes on a file server machine.

The most frequent reason to replace a server process’s binaries is to upgrade AFS to a new version. In general, installationinstructions accompany the updated software, but this chapter provides an additional reference.

Each AFS server machine must store the server process binaries in a local disk directory, called /usr/afs/bin by convention. Forpredictable system performance, it is best that all server machines run the same build level, or at least the same version, of theserver software. For instructions on checking AFS build level, see Displaying A Binary File’s Build Level.

The Update Server makes it easy to distribute a consistent version of software to all server machines. You designate one servermachine of each system type as the binary distribution machine by running the server portion of the Update Server (upserverprocess) on it. All other server machines of that system type run the client portion of the Update Server (upclientbin process)to retrieve updated software from the binary distribution machine. The OpenAFS Quick Beginnings explains how to install theappropriate processes. For more on binary distribution machines, see Binary Distribution Machines.

When you use the Update Server, you install new binaries on binary distribution machines only. If you install binaries directlyon a machine that is running the upclientbin process, they are overwritten the next time the process compares the contents of thelocal /usr/afs/bin directory to the contents on the system control machine, by default within five minutes.

The following instructions explain how to use the appropriate commands from the bos suite to install and uninstall server binaries.

3.5.1 Installing New Binaries

An AFS server process does not automatically switch to a new process binary file as soon as it is installed in the /usr/afs/bindirectory. The process continues to use the previous version of the binary file until it (the process) next restarts. By default,the BOS Server restarts processes for which there are new binary files every day at 5:00 a.m., as specified in the /usr/afs/local/-BosConfig file. To display or change this binary restart time, use the bos getrestart and bos setrestart commands, as describedin Setting the BOS Server’s Restart Times.

You can force the server machine to start using new server process binaries immediately by issuing the bos restart command asdescribed in the following instructions.

You do not need to restart processes when you install new command suite binaries. The new binary is invoked automatically thenext time a command from the suite is issued.


When you use the bos install command, the BOS Server automatically saves the current version of a binary file by adding a.BAK extension to its name. It renames the current .BAK version, if any, to the .OLD version, if there is no .OLD versionalready. If there is a current .OLD version, the current .BAK version must be at least seven days old to replace it.

It is best to store AFS binaries in the /usr/afs/bin directory, because that is the only directory the BOS Server automaticallychecks for new binaries. You can, however, use the bos install command’s -dir argument to install non-AFS binaries into otherdirectories on a server machine’s local disk. See the command’s reference page in the OpenAFS Administration Reference forfurther information.

3.5.2 To install new server binaries

1. Verify that you are listed in the /usr/afs/etc/UserList file. If necessary, issue the bos listusers command, which is fullydescribed in To display the users in the UserList file.

% bos listusers <machine name>

2. Verify that the binaries are available in the source directory from which you are installing them. If the machine is also anAFS client, you can retrieve the binaries from a central directory in AFS. Otherwise, you can obtain them directly from theAFS distribution media, from a local disk directory where you previously installed them, or from a remote machine usinga transfer utility such as the ftp command.

3. Issue the bos install command for the binary distribution machine. (If you have forgotten which machine is performingthat role, see To locate the binary distribution machine for a system type.)

% bos install <machine name> <files to install>+

where

i Is the shortest acceptable abbreviation of install.machine name Names the binary distribution machine.

files to install Names each binary file to install into the local /usr/afs/bin directory. Partial pathnames are interpretedrelative to the current working directory. The last element in each pathname (the filename itself) matches the nameof the file it is replacing, such as bosserver or volserver for server processes, bos or vos for commands.Each AFS server process other than the fs process uses a single binary file. The fs process uses three binary files:fileserver, volserver, and salvager. Installing a new version of one component does not necessarily mean that youneed to replace all three.

4. Repeat Step 3 for each binary distribution machine.

5. (Optional) If you want to restart processes to use the new binaries immediately, wait until the upclientbin process retrievesthem from the binary distribution machine. You can verify the timestamps on binary files by using the bos getdatecommand as described in Displaying Binary Version Dates. When the binary files are available on each server machine,issue the bos restart command, for which complete instructions appear in Stopping and Immediately Restarting Processes.

If you are working on an AFS client machine, it is a wise precaution to have a copy of the bos command suite binaries on thelocal disk before restarting server processes. In the conventional configuration, the /usr/afsws/bin directory that houses thebos command binary on client machines is a symbolic link into AFS, which conserves local disk space. However, restartingcertain processes (particularly the database server processes) can make the AFS filespace inaccessible, particularly if aproblem arises during the restart. Having a local copy of the bos binary enables you to uninstall or reinstall process binariesor restart processes even in this case. Use the cp command to copy the bos command binary from the /usr/afsws/bindirectory to a local directory such as /tmp.

Restarting a process causes a service outage. It is best to perform the restart at times of low system usage if possible.

% bos restart <machine name> <instances>+


3.5.3 Reverting to the Previous Version of Binaries

In rare cases, installing a new binary can cause problems serious enough to require reverting to the previous version. Just as withinstalling binaries, consistent system performance requires reverting every server machine back to the same version. Issue thebos uninstall command described here on each binary distribution machine.

When you use the bos uninstall command, the BOS Server discards the current version of a binary file and promotes the .BAKversion of the file by removing the extension. It renames the current .OLD version, if any, to .BAK.

If there is no current .BAK version, the bos uninstall command operation fails and generates an error message. If a .OLD versionstill exists, issue the mv command to rename it to .BAK before reissuing the bos uninstall command.

Just as when you install new binaries, the server processes do not start using a reverted version immediately. Presumably you arereverting because the current binaries do not work, so the following instructions have you restart the relevant processes.

3.5.4 To revert to the previous version of binaries



2. Verify that the .BAK version of each relevant binary is available in the /usr/afs/bin directory on each binary distribu-tion machine. If necessary, you can use the bos getdate command as described in Displaying Binary Version Dates. Ifnecessary, rename the .OLD version to .BAK

3. Issue the bos uninstall command for a binary distribution machine. (If you have forgotten which machine is performingthat role, see To locate the binary distribution machine for a system type.)

% bos uninstall <machine name> <files to uninstall>+

where

u Is the shortest acceptable abbreviation of uninstall.machine name Names the binary distribution machine.

files to uninstall Names each binary file in the /usr/afs/bin directory to replace with its .BAK version. The file namealone is sufficient, because the /usr/afs/bin directory is assumed.

4. Repeat Step 3 for each binary distribution machine.

5. Wait until the upclientbin process on each server machine retrieves the reverted from the binary distribution machine. Youcan verify the timestamps on binary files by using the bos getdate command as described in Displaying Binary VersionDates. When the binary files are available on each server machine, issue the bos restart command, for which completeinstructions appear in Stopping and Immediately Restarting Processes.

If you are working on an AFS client machine, it is a wise precaution to have a copy of the bos command suite binaries on thelocal disk before restarting server processes. In the conventional configuration, the /usr/afsws/bin directory that houses thebos command binary on client machines is a symbolic link into AFS, which conserves local disk space. However, restartingcertain processes (particularly the database server processes) can make the AFS filespace inaccessible, particularly if aproblem arises during the restart. Having a local copy of the bos binary enables you to uninstall or reinstall process binariesor restart processes even in this case. Use the cp command to copy the bos command binary from the /usr/afsws/bindirectory to a local directory such as /tmp.



3.5.5 Displaying Binary Version Dates

You can check the compilation dates for all three versions of a binary file in the /usr/afs/bin directory--the current, .BAK and.OLD versions. This is useful for verifying that new binaries have been copied to a file server machine from its binary distributionmachine before restarting a server process to use the new binaries.

To check dates on binaries in a directory other than /usr/afs/bin, add the -dir argument. See the OpenAFS AdministrationReference.

3.5.6 To display binary version dates

1. Issue the bos getdate command.

% bos getdate <machine name> <files to check>+

where

getd Is the shortest acceptable abbreviation of getdate.machine name Name the file server machine for which to display binary dates.files to check Names each binary file to display.

3.5.7 Removing Obsolete Binary Files

When processes with new binaries have been running without problems for a number of days, it is generally safe to remove the.BAK and .OLD versions from the /usr/afs/bin directory, both to reduce clutter and to free space on the file server machine’slocal disk.

You can use the bos prune command’s flags to remove the following types of files:

• To remove files in the /usr/afs/bin directory with a .BAK extension, use the -bak flag.

• To remove files in the /usr/afs/bin directory with a .OLD extension, use the -old flag.

• To remove files in the /usr/afs/logs directory called core, with any extension, use the -core flag.

• To remove all three types of files, use the -all flag.

3.5.8 To remove obsolete binaries



2. Issue the bos prune command with one or more of its flags.

% bos prune <machine name> [-bak] [-old] [-core] [-all]

where

p Is the shortest acceptable abbreviation of prune.machine name Names the file server machine on which to remove obsolete files.-bak Removes all the files with a .BAK extension from the /usr/afs/bin directory. Do not combine this flag with the -all

flag.-old Removes all the files a .OLD extension from the /usr/afs/bin directory. Do not combine this flag with the -all flag.-core Removes all core files from the /usr/afs/logs directory. Do not combine this flag with the -all flag-all Combines the effect of the other three flags. Do not combine it with the other three flags.


3.5.9 Displaying A Binary File’s Build Level

For the most consistent performance on a server machine, and cell-wide, it is best for all server processes to come from the sameAFS distribution. Every AFS binary includes an ASCII string that specifies its version, or build level. To display it, use thestrings and grep commands, which are included in most UNIX distributions.

3.5.10 To display an AFS binary’s build level

1. Change to the directory that houses the binary file . If you are not sure where the binary resides, issue the which command.

% which binary_file/bin_dir_path/binary_file% cd bin_dir_path

2. Issue the strings command to extract all ASCII strings from the binary file. Pipe the output to the grep command to locatethe relevant line.

% strings ./binary_file | grep Base

The output reports the AFS build level in a format like the following:

@(#)Base configuration afsversion build_level

For example, the following string indicates the binary is from AFS M.m build 3.0:

@(#)Base configuration afsM.m 3.0

3.6 Maintaining the Server CellServDB File

Every file server machine maintains a list of its home cell’s database server machines in the local disk file /usr/afs/etc/-CellServDB on its local disk. Both database server processes and non-database server processes consult the file:

• The database server processes (the Authentication, Backup, Protection, and Volume Location Servers) maintain constant con-tact with their peers in order to keep their copies of the replicated administrative databases synchronized.

As detailed in Replicating the OpenAFS Administrative Databases, the database server processes use the Ubik utility to syn-chronize the information in the databases they maintain. The Ubik coordinator at the synchronization site for each databasemaintains the single read/write copy of the database and distributes changes to the secondary sites as necessary. It must main-tain contact with a majority of the secondary sites to remain the coordinator, and consults the CellServDB file to learn howmany peers it has and on which machines they are running.

If the coordinator loses contact with the majority of its peers, they all cooperate to elect a new coordinator by majority vote.During the election, all of the Ubik processes consult the CellServDB file to learn where to send their votes, and what numberconstitutes a majority.

• The non-database server processes must know which machines are running the database server processes in order to retrieveinformation from the databases. For example, the first time that a user accesses an AFS file, the File Server that houses itcontacts the Protection Server to obtain a list of the user’s group memberships (the list is called a current protection subgroup,or CPS). The File Server uses the CPS as it determines if the access control list (ACL) protecting the file grants the requiredpermissions to the user (for more details, see About the Protection Database).

The consequences of missing or incorrect information in the CellServDB file are as follows:

• If the file does not list a machine, then it is effectively not a database server machine even if the database server processes arerunning. The Ubik coordinator does not send it database updates or include it in the count that establishes a majority. It doesnot participate in Ubik elections, and so refuses to distribute database information to any client machines that happen to contactit (which they can do if their /usr/vice/etc/CellServDB file lists it). Users of the client machine must wait for a timeout beforethey can contact a correctly functioning database server machine.


• If the file lists a machine that is not running the database server processes, the consequences can be serious. The Ubikcoordinator cannot send it database updates, but includes it in the count that establishes a majority. If valid secondary sitesgo down and stop sending their votes to the coordinator, it can wrongly appear that the coordinator no longer has the majorityit needs. The resulting election of a new coordinator causes a service outage during which information from the databasebecomes unavailable. Furthermore, the lack of a vote from the incorrectly listed site can disturb the election, if it makes theother sites believe that a majority of sites are not voting for the new coordinator.

A more minor consequence is that non-database server processes attempt to contact the database server processes on themachine. They experience a timeout delay because the processes are not running.

Note that the /usr/afs/etc/CellServDB file on a server machine is not the same as the /usr/vice/etc/CellServDB file on clientmachine. The client version includes entries for foreign cells as well as the local cell. However, it is important to update bothversions of the file whenever you change your cell’s database server machines. A server machine that is also a client needs tohave both files, and you need to update them both. For more information on maintaining the client version of the CellServDBfile, see Maintaining Knowledge of Database Server Machines.

3.6.1 Distributing the Server CellServDB File

To avoid the negative consequences of incorrect information in the /usr/afs/etc/CellServDB file, you must update it on all ofyour cell’s server machines every time you add or remove a database server machine. The OpenAFS Quick Beginnings providescomplete instructions for installing or removing a database server machine and for updating the CellServDB file in that context.This section explains how to distribute the file to your server machines and how to make other cells aware of the changes if youparticipate in the AFS global name space.

If you use the Update Server to distribute the central copy of the server CellServDB file stored on the cell’s system controlmachine. For instructions on configuring the Update Server, see the OpenAFS Quick Beginnings.

To avoid formatting errors that can cause errors, always use the bos addhost and bos removehost commands, rather than editingthe file directly. You must also restart the database server processes running on the machine, to initiate a coordinator electionamong the new set of database server machines. This step is included in the instructions that appear in To add a databaseserver machine to the CellServDB file and To remove a database server machine from the CellServDB file. For instructions ondisplaying the contents of the file, see To display a cell’s database server machines.

If you make your cell accessible to foreign users as part of the AFS global name space, you also need to inform other cells whenyou change your cell’s database server machines. The AFS Support group maintains a CellServDB file that lists all cells thatparticipate in the AFS global name space, and can change your cell’s entry at your request. For further details, see Making YourCell Visible to Others.

Another way to advertise your cell’s database server machines is to maintain a copy of the file at the conventional location inyour AFS filespace, /afs/cellname/service/etc/CellServDB.local. For further discussion, see The Third Level.

3.6.2 To display a cell’s database server machines

1. Issue the bos listhosts command. If you have maintained the file properly, the output is the same on every server machine,but the machine name argument enables you to check various machines if you wish.

% bos listhosts <machine name> [<cell name>]

where

listh Is the shortest acceptable abbreviation of listhosts.

machine name Specifies the server machine from which to display the /usr/afs/etc/CellServDB file.

cell name Specifies the complete Internet domain name of a foreign cell. You must already know the name of at least oneserver machine in the cell, to provide as the machine name argument.

The output lists the machines in the order they appear in the CellServDB file on the specified server machine. It assigns eachone a Host index number, as in the following example. There is no implied relationship between the index and a machine’s IPaddress, name, or role as Ubik coordinator or secondary site.


% bos listhosts fs1.example.comCell name is example.com

Host 1 is fs1.example.comHost 2 is fs7.example.comHost 3 is fs4.example.com

The output lists machines by name rather than IP address as long as the naming service (such as the Domain Name Service orlocal host table) is functioning properly. To display IP addresses, login to a server machine as the local superuser root and use atext editor or display command, such as the cat command, to view the /usr/afs/etc/CellServDB file.

3.6.3 To add a database server machine to the CellServDB file



2. Issue the bos addhost command to add each new database server machine to the CellServDB file. Specify the systemcontrol machine as machine name. (If you have forgotten which machine is the system control machine, see The Outputon the System Control Machine.)

% bos addhost <machine name> <host name>+

where

addh Is the shortest acceptable abbreviation of addhost.machine name Names the system control machine

host name Specifies the fully qualified hostname of each database server machine to add to the CellServDB file (forexample: fs4.example.com). The BOS Server uses the gethostbyname() routine to obtain each machine’s IP addressand records both the name and address automatically.

3. Restart the Authentication Server, Backup Server, Protection Server, and VL Server on every database server machine, sothat the new set of machines participate in the election of a new Ubik coordinator. The instruction uses the conventionalnames for the processes; make the appropriate substitution if you use different process names. For complete syntax, seeStopping and Immediately Restarting Processes.

Important: Repeat the following command in quick succession on all of the database server machines.

% bos restart <machine name> buserver kaserver ptserver vlserver

4. Edit the /usr/vice/etc/CellServDB file on each of your cell’s client machines. For instructions, see Maintaining Knowledgeof Database Server Machines.

5. If you participate in the AFS global name space, please have one of your cell’s designated site contacts register the changesyou have made with the AFS Product Support group.

If you maintain a central copy of your cell’s server CellServDB file in the conventional location (/afs/cellname/service/etc/CellServDB.local),edit the file to reflect the change.

3.6.4 To remove a database server machine from the CellServDB file




2. Issue the bos removehost command to remove each database server machine from the CellServDB file. Specify the systemcontrol machine as machine name. (If you have forgotten which machine is the system control machine, see The Outputon the System Control Machine.)

% bos removehost <machine name> <host name>+

where

removeh Is the shortest acceptable abbreviation of removehost.machine name Names the system control machine.

host name Specifies the fully qualified hostname of each database server machine to remove from the CellServDB file(for example: fs4.example.com).

3. Restart the Authentication Server, Backup Server, Protection Server, and VL Server on every database server machine, sothat the new set of machines participate in the election of a new Ubik coordinator. The instruction uses the conventionalnames for the processes; make the appropriate substitution if you use different process names. For complete syntax, seeStopping and Immediately Restarting Processes.

Important: Repeat the following command in quick succession on all of the database server machines.


4. Edit the /usr/vice/etc/CellServDB file on each of your cell’s client machines. For instructions, see Maintaining Knowledgeof Database Server Machines.

5. If you participate in the AFS global name space, please have one of your cell’s designated site contacts register the changesyou have made with the AFS Product Support group.

If you maintain a central copy of your cell’s server CellServDB file in the conventional location (/afs/cellname/service/etc/CellServDB.local),edit the file to reflect the change.

3.7 Managing Authentication and Authorization Requirements

This section describes how the AFS server processes guarantee that only properly authorized users perform privileged commands,by checking authorization checking and mutually authenticating with their clients. It explains how you can control authorizationchecking requirements on a per-machine or per-cell basis, and how to bypass mutual authentication when issuing commands.

3.7.1 Authentication versus Authorization

Many AFS commands are privileged in that the AFS server process invoked by the command performs it only for a properlyauthorized user. The server process performs the following two tests to determine if someone is properly authorized:

• In the authentication test, the server process mutually authenticates with the command interpreter, Cache Manager, or otherclient process that is acting on behalf of a user or application. The goal of this test is to determine who is issuing the command.The server process verifies that the issuer really is who he or she claims to be, by examining the server ticket and other compo-nents of the issuer’s token. (Secondarily, it allows the client process to verify that the server process is genuine.) If the issuer hasno token or otherwise fails the test, the server process assigns him or her the identity anonymous, a completely unprivilegeduser. For a more complete description of mutual authentication, see A More Detailed Look at Mutual Authentication.

Many individual commands enable you to bypass the authentication test by assuming the anonymous identity without evenattempting to mutually authenticate. Note, however, that this is futile if the command is privileged and the server process is stillperforming the authorization test, because in that case the process refuses to execute privileged commands for the anonymoususer.

• In the authorization test, the server process determines if the issuer is authorized to use the command by consulting a list ofprivileged users. The goal of this test is to determine what the issuer is allowed to do. Different server processes consultdifferent lists of users, as described in Managing Administrative Privilege. The server process refuses to execute any privilegedcommand for an unauthorized issuer. If a command has no privilege requirements, the server process skips this step andexecutes and immediately.


NoteNever place the anonymous user or the system:anyuser group on a privilege list; it makes authorization checking mean-ingless.You can use the bos setauth command to control whether the server processes on a server machine check for authorization.Other server machines are not affected. Keep in mind that turning off authorization checking is a grave security risk, becausethe server processes on that machine perform any action for any user.

3.7.2 Controlling Authorization Checking on a Server Machine

Disabling authorization checking is a serious breach of security because it means that the AFS server processes on a file servermachine performs any action for any user, even the anonymous user.

The only time it is common to disable authorization checking is when installing a new file server machine (see the IBM AFSQuick Beginnings). It is necessary then because it is not possible to configure all of the necessary security mechanisms beforeperforming other actions that normally make use of them. For greatest security, work at the console of the machine you areinstalling and enable authorization checking as soon as possible.

During normal operation, the only reason to disable authorization checking is if an error occurs with the server encryption keys,leaving the servers unable to authenticate users properly. For instructions on handling key-related emergencies, see HandlingServer Encryption Key Emergencies.

You control authorization checking on each file server machine separately; turning it on or off on one machine does not affect theothers. Because client machines generally choose a server process at random, it is hard to predict what authorization checkingconditions prevail for a given command unless you make the requirement the same on all machines. To turn authorizationchecking on or off for the entire cell, you must repeat the appropriate command on every file server machine.

The server processes constantly monitor the directory /usr/afs/local on their local disks to determine if they need to check forauthorization. If the file called NoAuth appears in that directory, then the servers do not check for authorization. When it is notpresent (the usual case), they perform authorization checking.

Control the presence of the NoAuth file through the BOS Server. When you disable authorization checking with the bos setauthcommand (or, during installation, by putting the -noauth flag on the command that starts up the BOS Server), the BOS Servercreates the zero-length NoAuth file. When you reenable authorization checking, the BOS Server removes the file.

3.7.3 To disable authorization checking on a server machine



2. Issue the bos setauth command to disable authorization checking.

% bos setauth <machine name> off

where

seta Is the shortest acceptable abbreviation of setauth.machine name Specifies the file server machine on which server processes do not check for authorization.

3.7.4 To enable authorization checking on a server machine

1. Reenable authorization checking. (No privilege is required because the machine is not currently checking for authoriza-tion.) For detailed syntax information, see the preceding section.

% bos setauth <machine name> on


3.7.5 Bypassing Mutual Authentication for an Individual Command

Several of the server processes allow any user (not just system administrators) to disable mutual authentication when issuing acommand. The server process treats the issuer as the unauthenticated user anonymous.

The facilities for preventing mutual authentication are provided for use in emergencies (such as the key emergency discussed inHandling Server Encryption Key Emergencies). During normal circumstances, authorization checking is turned on, making ituseless to prevent authentication: the server processes refuse to perform privileged commands for the user anonymous.

It can be useful to prevent authentication when authorization checking is turned off. The very act of trying to authenticate cancause problems if the server cannot understand a particular encryption key, as is likely to happen in a key emergency.

3.7.6 To bypass mutual authentication for bos, kas, pts, and vos commands

Provide the -noauth flag which is available on many of the commands in the suites. To verify that a command accepts the flag,issue the help command in its suite, or consult the command’s reference page in the OpenAFS Administration Reference (thereference page also specifies the shortest acceptable abbreviation for the flag on each command). The suites’ apropos and helpcommands do not themselves accept the flag.

You can bypass mutual authentication for all kas commands issued during an interactive session by including the -noauth flagon the kas interactive command. If you have already entered interactive mode with an authenticated identity, issue the (kas)noauthentication command to assume the anonymous identity.

3.7.7 To bypass mutual authentication for fs commands

This is not possible, except by issuing the unlog command to discard your tokens before issuing the fs command.

3.8 Adding or Removing Disks and Partitions

AFS makes it very easy to add storage space to your cell, just by adding disks to existing file server machines. This sectionexplains how to install or remove a disk used to store AFS volumes. (Another way to add storage space is to install additionalserver machines, as instructed in the OpenAFS Quick Beginnings.)

Both adding and removing a disk cause at least a brief file system outage, because you must restart the fs process to have itrecognize the new set of server partitions. Some operating systems require that you shut the machine off before adding orremoving a disk, in which case you must shut down all of the AFS server processes first. Otherwise, the AFS-related aspects ofadding or removing a disk are not complicated, so the duration of the outage depends mostly on how long it takes to install orremove the disk itself.

The following instructions for installing a new disk completely prepare it to house AFS volumes. You can then use the voscreate command to create new volumes, or the vos move command to move existing ones from other partitions. For instructions,see Creating Read/write Volumes and Moving Volumes. The instructions for removing a disk are basically the reverse of theinstallation instructions, but include extra steps that protect against data loss.

A server machines can house 256 AFS server partitions, each one mounted at a directory with a name of the form /vicepindex,where index is one or two lowercase letters. By convention, the first partition on a machine is mounted at /vicepa, the second at/vicepb, and so on to the twenty-sixth at /vicepz. Additional partitions are mounted at /vicepaa through /vicepaz and so on upto /vicepiv. Using the letters consecutively is not required, but is simpler.

Mount each /vicep directory directly under the local file system’s root directory ( / ), not as a subdirectory of any other directory;for example, /usr/vicepa is not an acceptable location. You must also map the directory to the partition’s device name in the fileserver machine’s file systems registry file (/etc/fstab or equivalent).

These instructions assume that the machine’s AFS initialization file includes the following command to restart the BOS Serverafter each reboot. The BOS Server starts the other AFS server processes listed in the local /usr/afs/local/BosConfig file. Forinformation on the bosserver command’s optional arguments, see its reference page in the OpenAFS Administration Reference.

/usr/afs/bin/bosserver


3.8.1 To add and mount a new disk to house AFS volumes

1. Become the local superuser root on the machine, if you are not already, by issuing the su command.

% su rootPassword: root_password

2. Decide how many AFS partitions to divide the new disk into and the names of the directories at which to mount them (theintroduction to this section describes the naming conventions). To display the names of the existing server partitions onthe machine, issue the vos listpart command. Include the -localauth flag because you are logged in as the local superuserroot but do not necessarily have administrative tokens.

# vos listpart <machine name> -localauth

where

listp Is the shortest acceptable abbreviation of listpart.machine name Names the local file server machine.

-localauth Constructs a server ticket using a key from the local /usr/afs/etc/KeyFile file. The bos command interpreterpresents it to the BOS Server during mutual authentication.

3. Create each directory at which to mount a partition.

# mkdir /vicepx[x]

4. Using a text editor, create an entry in the machine’s file systems registry file (/etc/fstab or equivalent) for each new diskpartition, mapping its device name to the directory you created in the previous step. Refer to existing entries in the file tolearn the proper format, which varies for different operating systems.

5. If the operating system requires that you shut off the machine to install a new disk, issue the bos shutdown commandto shut down all AFS server processes other than the BOS Server (it terminates safely when you shut off the machine).Include the -localauth flag because you are logged in as the local superuser root but do not necessarily have administrativetokens. For a complete description of the command, see To stop processes temporarily.

# bos shutdown <machine name> -localauth [-wait]

6. If necessary, shut off the machine. Install and format the new disk according to the instructions provided by the disk andoperating system vendors. If necessary, edit the disk’s partition table to reflect the changes you made to the files systemregistry file in step 4; consult the operating system documentation for instructions.

7. If you shut off the machine down in step 6, turn it on. Otherwise, issue the bos restart command to restart the fs process,forcing it to recognize the new set of server partitions. Include the -localauth flag because you are logged in as the localsuperuser root but do not necessarily have administrative tokens. For complete instructions for the bos restart command,see Stopping and Immediately Restarting Processes.

# bos restart <machine name> fs -localauth

8. Issue the bos status command to verify that all server processes are running correctly. For more detailed instructions, seeDisplaying Process Status and Information from the BosConfig File.

# bos status <machine name>


3.8.2 To unmount and remove a disk housing AFS volumes



2. Issue the vos listvol command to list the volumes housed on each partition of each disk you are about to remove, inpreparation for removing them or moving them to other partitions. For detailed instructions, see Displaying VolumeHeaders.

% vos listvol <machine name> [<partition name>]

3. Move any volume you wish to retain in the file system to another partition. You can move only read/write volumes. Formore detailed instructions, and for instructions on moving read-only and backup volumes, see Moving Volumes.

% vos move <volume name or ID> \<machine name on source> <partition name on source> \<machine name on destination> <partition name on destination>

4. (Optional) If there are any volumes you do not wish to retain, back them up using the vos dump command or the AFSBackup System. See Dumping and Restoring Volumes or Backing Up Data, respectively.



6. Issue the umount command, repeating it for each partition on the disk to be removed.

# cd /# umount /dev/<partition_block_device_name>

7. Using a text editor, remove or comment out each partition’s entry from the machine’s file systems registry file (/etc/fstabor equivalent).

8. Remove the /vicep directory associated with each partition.

# rmdir /vicepxx

9. If the operating system requires that you shut off the machine to remove a disk, issue the bos shutdown command to shutdown all AFS server processes other than the BOS Server (it terminates safely when you shut off the machine). Includethe -localauth flag because you are logged in as the local superuser root but do not necessarily have administrative tokens.For a complete description of the command, see To stop processes temporarily.


10. If necessary, shut off the machine. Remove the disk according to the instructions provided by the disk and operating systemvendors. If necessary, edit the disk’s partition table to reflect the changes you made to the files system registry file in step7; consult the operating system documentation for instructions.

11. If you shut off the machine down in step 10, turn it on. Otherwise, issue the bos restart command to restart the fs process,forcing it to recognize the new set of server partitions. Include the -localauth flag because you are logged in as the localsuperuser root but do not necessarily have administrative tokens. For complete instructions for the bos restart command,see Stopping and Immediately Restarting Processes.

# bos restart <machine name> fs -localauth

12. Issue the bos status command to verify that all server processes are running correctly. For more detailed instructions, seeDisplaying Process Status and Information from the BosConfig File.

# bos status <machine name>


3.9 Managing Server IP Addresses and VLDB Server Entries

The AFS support for multihomed file server machines is largely automatic. The File Server process records the IP addresses ofits file server machine’s network interfaces in the local /usr/afs/local/sysid file and also registers them in a server entry in theVolume Location Database (VLDB). The sysid file and server entry are identified by the same unique number, which creates anassociation between them.

When the Cache Manager requests volume location information, the Volume Location (VL) Server provides all of the interfacesregistered for each server machine that houses the volume. This enables the Cache Manager to make use of multiple addresseswhen accessing AFS data stored on a multihomed file server machine.

If you wish, you can control which interfaces the File Server registers in its VLDB server entry by creating two files in thelocal /usr/afs/local directory: NetInfo and NetRestrict. Each time the File Server restarts, it builds a list of the local machine’sinterfaces by reading the NetInfo file, if it exists. If you do not create the file, the File Server uses the list of network interfacesconfigured with the operating system. It then removes from the list any addresses that appear in the NetRestrict file, if it exists.The File Server records the resulting list in the sysid file and registers the interfaces in the VLDB server entry that has the sameunique identifier.

On database server machines, the NetInfo and NetRestrict files also determine which interfaces the Ubik database synchroniza-tion library uses when communicating with the database server processes running on other database server machines.

There is a maximum number of IP addresses in each server entry, as documented in the OpenAFS Release Notes. If a multihomedfile server machine has more interfaces than the maximum, AFS simply ignores the excess ones. It is probably appropriate forsuch machines to use the NetInfo and NetRestrict files to control which interfaces are registered.

If for some reason the sysid file no longer exists, the File Server creates a new one with a new unique identifier. When the FileServer registers the contents of the new file, the Volume Location (VL) Server normally recognizes automatically that the newfile corresponds to an existing server entry, and overwrites the existing server entry with the new file contents and identifier.However, it is best not to remove the sysid file if that can be avoided.

Similarly, it is important not to copy the sysid file from one file server machine to another. If you commonly copy the contentsof the /usr/afs directory from an existing machine as part of installing a new file server machine, be sure to remove the sysid filefrom the /usr/afs/local directory on the new machine before starting the File Server.

There are certain cases where the VL Server cannot determine whether it is appropriate to overwrite an existing server entry witha new sysid file’s contents and identifier. It then refuses to allow the File Server to register the interfaces, which prevents the FileServer from starting. This can happen if, for example, a new sysid file includes two interfaces that currently are registered bythemselves in separate server entries. In such cases, error messages in the /usr/afs/log/VLLog file on the VL Server machine andin the /usr/afs/log/FileLog file on the file server machine indicate that you need to use the vos changeaddr command to resolvethe problem. Contact the AFS Product Support group for instructions and assistance.

Except in this type of rare error case, the only appropriate use of the vos changeaddr command is to remove a VLDB serverentry completely when you remove a file server machine from service. The VLDB can accommodate a maximum number ofserver entries, as specified in the OpenAFS Release Notes. Removing obsolete entries makes it possible to allocate server entriesfor new file server machines as required. See the instructions that follow.

Do not use the vos changeaddr command to change the list of interfaces registered in a VLDB server entry. To change a fileserver machine’s IP addresses and server entry, see the instructions that follow.

3.9.1 To create or edit the server NetInfo file



2. Using a text editor, open the /usr/afs/local/NetInfo file. Place one IP address in dotted decimal format (for example, 192.12.107.33) on each line. The order of entries is not significant.

3. If you want the File Server to start using the revised list immediately, use the bos restart command to restart the fs process.For instructions, see Stopping and Immediately Restarting Processes.


3.9.2 To create or edit the server NetRestrict file



2. Using a text editor, open the /usr/afs/local/NetRestrict file. Place one IP address in dotted decimal format on each line.The order of the addresses is not significant. Use the value 255 as a wildcard that represents all possible addresses in thatfield. For example, the entry 192.12.105.255 indicates that the Cache Manager does not register any of the addressesin the 192.12.105 subnet.

3. If you want the File Server to start using the revised list immediately, use the bos restart command to restart the fs process.For instructions, see Stopping and Immediately Restarting Processes.

3.9.3 To display all server entries from the VLDB

1. Issue the vos listaddrs command to display all server entries from the VLDB.

% vos listaddrs

where lista is the shortest acceptable abbreviation of listaddrs.

The output displays all server entries from the VLDB, each on its own line. If a file server machine is multihomed, all ofits registered addresses appear on the line. The first one is the one reported as a volume’s site in the output from the vosexamine and vos listvldb commands.

VLDB server entries record IP addresses, and the command interpreter has the local name service (either a process like theDomain Name Service or a local host table) translate them to hostnames before displaying them. If an IP address appearsin the output, it is not possible to translate it.

The existence of an entry does not necessarily indicate that the machine that is still an active file server machine. To removeobsolete server entries, see the following instructions.

3.9.4 To remove obsolete server entries from the VLDB



2. Issue the vos changeaddr command to remove a server entry from the VLDB.

% vos changeaddr <original IP address> -remove

where

ch Is the shortest acceptable abbreviation of changeaddr.

original IP address Specifies one of the IP addresses currently registered for the file server machine in the VLDB. Anyof a multihomed file server machine’s addresses are acceptable to identify it.

-remove Removes the server entry.


3.9.5 To change a server machine’s IP addresses



2. If the machine is the system control machine or a binary distribution machine, and you are also changing its hostname,redefine all relevant upclient processes on other server machines to refer to the new hostname. Use the bos delete and boscreate commands as instructed in Creating and Removing Processes.

3. If the machine is a database server machine, edit its entry in the /usr/afs/etc/CellServDB file on every server machine inthe cell to list one of the new IP addresses. You can edit the file on the system control machine and wait the required time(by default, five minutes) for the Update Server to distribute the changed file to all server machines.

4. If the machine is a database server machine, issue the bos shutdown command to stop all server processes. If the machineis also a file server, the volumes on it are inaccessible during this time. For a complete description of the command, see Tostop processes temporarily.

% bos shutdown <machine name>

5. Use the utilities provided with the operating system to change one or more of the machine’s IP addresses.

6. If appropriate, edit the /usr/afs/local/NetInfo file, the /usr/afs/local/NetRestrict file, or both, to reflect the changed ad-dresses. Instructions appear earlier in this section.

7. If the machine is a database server machine, issue the bos restart command to restart all server processes on the machine.For complete instructions for the bos restart command, see Stopping and Immediately Restarting Processes.

% bos restart <machine name> -all

At the same time, issue the bos restart command on all other database server machines in the cell to restart the databaseserver processes only (the Authentication, Backup, Protection, and Volume Location Servers). Issue the commands inquick succession so that all of the database server processes vote in the quorum election.

% bos restart <machine name> kaserver buserver ptserver vlserver

If you are changing IP addresses on every database server machine in the cell, you must also issue the bos restart commandon every file server machine in the cell to restart the fs process.

8. If the machine is not a database server machine, issue the bos restart command to restart the fs process (if the machineis a database server, you already restarted the process in the previous step). The File Server automatically compiles a newlist of interfaces, records them in the /usr/afs/local/sysid file, and registers them in its VLDB server entry.

% bos restart <machine name> fs

9. If the machine is a database server machine, edit its entry in the /usr/vice/etc/CellServDB file on every client machine inthe cell to list one of the new IP addresses. Instructions appear in Maintaining Knowledge of Database Server Machines.

10. If there are machine entries in the Protection Database for the machine’s previous IP addresses, use the pts renamecommand to change them to the new addresses. For instructions, see Changing a Protection Database Entry’s Name.

3.10 Rebooting a Server Machine

You can reboot a server machine either by typing the appropriate commands at its console or by issuing the bos exec commandon a remote machine. Remote rebooting can be more convenient, because you do not need to leave your present location, butyou cannot track the progress of the reboot as you can at the console. Remote rebooting is possible because the server machine’soperating system recognizes the BOS Server, which executes the bos exec command, as the local superuser root.


Rebooting server machines is part of routine maintenance in some cells, and some instructions in the AFS documentation includeit as a step. It is certainly not intended to be the standard method for recovering from AFS-related problems, however, but only alast resort when the machine is unresponsive and you have tried all other reasonable options.

Rebooting causes a service outage. If the machine stores volumes, they are all inaccessible until the reboot completes and theFile Server reattaches them. If the machine is a database server machine, information from the databases can become unavailableduring the reelection of the synchronization site for each database server process; the VL Server outage generally has the greatestimpact, because the Cache Manager must be able to access the VLDB to fetch AFS data.

By convention, a server machine’s AFS initialization file includes the following command to restart the BOS Server after eachreboot. It starts the other AFS server processes listed in the local /usr/afs/local/BosConfig file. These instructions assume thatthe initialization file includes the command.

/usr/afs/bin/bosserver

3.10.1 To reboot a file server machine from its console



2. Issue the bos shutdown command to shut down all AFS server processes other than the BOS Server, which terminatessafely when you reboot the machine. Include the -localauth flag because you are logged in as the local superuser rootbut do not necessarily have administrative tokens. For a complete description of the command, see To stop processestemporarily.


3. Reboot the machine. On many system types, the appropriate command is shutdown, but the appropriate options vary;consult your UNIX administrator’s guide.

# shutdown

3.10.2 To reboot a file server machine remotely

1. Verify that you are listed in the /usr/afs/etc/UserList file on the machine you are rebooting. If necessary, issue the boslistusers command, which is fully described in To display the users in the UserList file.


2. Issue the bos shutdown to halt AFS server processes other than the BOS Server, which terminates safely when you turnoff the machine. For a complete description of the command, see To stop processes temporarily.

% bos shutdown <machine name> [-wait]

3. Issue the bos exec command to reboot the machine remotely.

% bos exec <machine name> reboot_command

where

machine name Names the file server machine to reboot.

reboot_command Is the rebooting command for the machine’s operating system. The shutdown command is appropriateon many system types, but consult your operating system documentation.


Chapter 4

Monitoring and Controlling Server Processes

One of your most important responsibilities as a system administrator is ensuring that the processes on file server machinesare running correctly. The BOS Server, which runs on every file server machine, relieves you of much of the responsibility byconstantly monitoring the other AFS server processes on its machine. It can automatically restart processes that have failed,ordering the restarts to take interdependencies into account.

Because different file server machines run different combinations of processes, you must define which processes the BOS Serveron each file server machine is to monitor (to learn how, see Controlling and Checking Process Status).

It is sometimes necessary to take direct control of server process status before performing routine maintenance or correctingproblems that the BOS Server cannot correct (such as problems with database replication or mutual authentication). At thosetimes, you control process status through the BOS Server by issuing bos commands.



Examine process status bos statusExamine information from the BosConfig file file bos status with -long flagCreate a process instance bos createStop a process bos stopStart a stopped process bos startStop a process temporarily bos shutdownStart a temporarily stopped process bos startupStop and immediately restart a process bos restartStop and immediately restart all processes bos restart with -bosserver flagExamine BOS Server’s restart times bos getrestartSet BOS Server’s restart times bos setrestartExamine a log file bos getlogExecute a command remotely bos exec

4.2 Brief Descriptions of the AFS Server Processes

This section briefly describes the different server processes that can run on an AFS server machine. In cells with multiple servermachines, not all processes necessarily run on all machines.

An AFS server process is referred to in one of three ways, depending on the context:

• The output from the bos status command refers to a process by the name assigned when the bos create command createsits entry in the /usr/afs/local/BosConfig file. The name can differ from machine to machine, but it is easiest to maintain the


cell if you assign the same name on all machines. The OpenAFS Quick Beginnings and the reference page for the bos createcommand list the conventional names. Examples are bosserver, kaserver, and vlserver.

• The process listing produced by the standard ps command generally matches the process’s binary file. Examples of processbinary files are /usr/afs/bin/bosserver, /usr/afs/bin/kaserver, and /usr/afs/bin/vlserver.

• In most contexts, including most references in the documentation, a process is referred to as (for example) the Basic OverSeer(BOS) Server, the Authentication Server, or the Volume Location Server.

The following sections specify each name for the process as well as some of the administrative tasks in which you use the process.For a more general description of the servers, see AFS Server Processes and the Cache Manager.

4.2.1 The bosserver Process: the Basic OverSeer Server

The bosserver process, which runs on every AFS server machine, is the Basic OverSeer (BOS) Server responsible for monitoringthe other AFS server processes running on its machine. If a process fails, the BOS Server can restart it automatically, withouthuman intervention. It takes interdependencies into account when restarting a process that has multiple component processes(such as the fs process described in The fs Collection of Processes: the File Server, Volume Server and Salvager).

Because the BOS Server does not monitor or restart itself, it does not appear in the output from the bos status command. Itappears in the ps command’s output as /usr/afs/bin/bosserver.

As a system administrator, you contact the BOS Server when you issue bos commands to perform the following kinds of tasks.

• Defining the processes for the BOS Server to monitor by creating entries in the /usr/afs/local/BosConfig file as described inControlling and Checking Process Status

• Stopping and starting processes on the file server machines according to subsequent instructions in this chapter

• Defining your cell’s database server machines in the /usr/afs/etc/CellServDB file as described in Maintaining the ServerCellServDB File

• Defining AFS server encryption keys in the /usr/afs/etc/KeyFile file as described in Managing Server Encryption Keys.

• Granting system administrator privileges with respect to BOS Server, Volume Server, and Backup Server operations, by addinga user to the /usr/afs/etc/UserList file as described in Administering the UserList File

• Setting authorization checking requirements on a server machine as described in Managing Authentication and AuthorizationRequirements

4.2.2 The buserver Process: the Backup Server

The buserver process, which runs on database server machines, is the Backup Server. It maintains information about BackupSystem configuration and operations in the Backup Database.

The process appears as buserver in the bos status command’s output, if the conventional name is assigned. It appears in theps command’s output as /usr/afs/bin/buserver.

As a system administrator, you contact the Backup Server when you issue any backup command that manipulates informationin the Backup Database, including those that change Backup System configuration information, that dump data from volumes topermanent storage, or that restore data to AFS. See Configuring the AFS Backup System and Backing Up and Restoring AFSData.

4.2.3 The fs Collection of Processes: the File Server, Volume Server and Salvager

The fs process, which runs on every file server machine, combines three component processes: File Server, Volume Server andSalvager. The three components perform independent functions, but are controlled as a single process for the following reasons.


• They all operate on the same data, namely files and directories stored in AFS volumes. Combining them as a single processenables them to coordinate their actions, never attempting simultaneous operations on the same data that can possibly corruptit.

• It enables the BOS Server to stop and restart the processes in the required order. When the File Server fails, the BOS Serverstops the Volume Server and runs the Salvager to correct any corruption that resulted from the failure. (The Salvager runs onlyin this special circumstance or when you invoke it yourself by issuing the bos salvage command as instructed in SalvagingVolumes.) If only the Volume Server fails, the BOS Server can restart it without affecting the File Server or Salvager.

The File Server component handles AFS data at the level of files and directories, manipulating file system elements as requestedby application programs and the standard operating system commands. Its main duty is to deliver requested files to client ma-chines and store them again on the server machine when the client is finished. It also maintains status and protection informationabout each file and directory. It runs continuously during normal operation.

The Volume Server component handles AFS data at the level of complete volumes rather than files and directories. In response tovos commands, it creates, removes, moves, dumps and restores entire volumes, among other actions. It runs continuously duringnormal operation.

The Salvager component runs only after the failure of one of the other two processes. It checks the file system for internalconsistency and repairs any errors it finds.

The process appears as fs in the bos status command’s output, if the conventional name is assigned. An auxiliary messagereports the status of the File Server or Salvager component. See Displaying Process Status and Information from the BosConfigFile.

The component processes of the fs process appear individually in the ps command’s output, as follows. There is no entry for thefs process itself.

• /usr/afs/bin/fileserver

• /usr/afs/bin/volserver

• /usr/afs/bin/salvager

The Cache Manager contacts the File Server component on your behalf whenever you access data or status information in anAFS file or directory or issue file manipulation commands such as the UNIX cp and ls commands. You can contact the FileServer directly by issuing fs commands that perform the following functions

• Administering the ACL of any directory in the file system as described in Managing Access Control Lists

• Installing new partitions for housing AFS volumes, in which case you must restart the fs process for it to recognize the newpartition; for instructions, see Adding or Removing Disks and Partitions

• Creating and deleting volume mount points in the AFS filespace as described in Mounting Volumes

• Setting volume quota and displaying information about the space used and available in a volume or partition as described inSetting and Displaying Volume Quota and Current Size

You contact the Volume Server component when you issue vos commands that manipulate volumes in any way--creating, remov-ing, replicating, moving, renaming, converting to different formats, and salvaging. For instructions, see Managing Volumes.

The Salvager normally runs automatically in case of a failure. You can also start it with the bos salvage command as describedin Salvaging Volumes.

4.2.4 The kaserver Process: the Authentication Server

The kaserver process, which runs on database server machines, is the Authentication Server responsible for several aspectsof AFS security. It verifies AFS user identity by requiring a password. It maintains all AFS server encryption keys and userpasswords in the Authentication Database. The Authentication Server’s Ticket Granting Service (TGS) module creates theshared secrets that AFS client and server processes use when establishing secure connections.


The process appears as kaserver in the bos status command’s output, if the conventional name is assigned. The ka stringstands for Kerberos Authentication, reflecting the fact that AFS’s authentication protocols are based on Kerberos, which wasoriginally developed at the Massachusetts Institute of Technology’s Project Athena.

It appears in the ps command’s output as /usr/afs/bin/kaserver.

As a system administrator, you contact the Authentication Server when you issue kas commands to perform the following kindsof tasks.

• Setting a user’s password. Users normally change their own passwords, so you probably perform this task only creating a newuser account as described in Creating AFS User Accounts and Changing AFS Passwords.

• Setting the AFS server encryption key in the Authentication Database, which the TGS uses to seal server tickets; see ManagingServer Encryption Keys.

• Granting or revoking system administrator privileges with respect to the Authentication Server as described in Granting Privi-lege for kas Commands: the ADMIN Flag.

4.2.5 The ptserver Process: the Protection Server

The ptserver process, which runs on database server machines, is the Protection Server. Its main responsibility is maintainingthe Protection Database which contains user, machine, and group entries. The Protection Server allocates AFS IDs and maintainsthe mapping between them and names. The File Server consults the Protection Server when verifying that a user is authorized toperform a requested action.

The process appears as ptserver in the bos status command’s output, if the conventional name is assigned. It appears in theps command’s output as /usr/afs/bin/ptserver.

As a system administrator, you contact the Protection Server when you issue pts commands to perform the following kinds oftasks.

• Creating a new user, machine, or group entry in the Protection Database as described in Administering the Protection Database

• Adding or removing group members or otherwise manipulating Protection Database entries as described in Administering theProtection Database

• Granting or revoking system administrator privilege by changing the membership of the system:administrators group asdescribed in Administering the system:administrators Group

4.2.6 The upserver and upclient Processes: the Update Server

The Update Server has two separate parts, each of which runs on a different type of server machine. The upserver process is theserver portion of the Update Server. Its function depends on which edition of AFS you use:

• It runs on the binary distribution machine of each system type you use as a server machine, distributing the contents of eachone’s /usr/afs/bin directory to the other server machines of that type. This guarantees that all machines have the same versionof AFS binaries. (For a list of the binaries, see Binaries in the /usr/afs/bin Directory.)

• It also runs on the cell’s system control machine, distributing the contents of its /usr/afs/etc directory to all the other servermachines in order to synchronize the configuration files stored in that directory. (For a list of the configuration files, seeCommon Configuration Files in the /usr/afs/etc Directory.)

The upclient process is the client portion of the Update Server, and like the server portion its function depends on the AFSedition in use.

• It runs on every server machine that is not a binary distribution machine, referencing the binary distribution machine of itssystem type as the source for updates to the binaries in the /usr/afs/bin directory. The conventional process name to assign isupclientbin.


• Another instance of the process runs on every server machine except the system control machine. It references the systemcontrol machine as the source for updates to the common configuration files in the /usr/afs/etc directory. The conventionalprocess name to assign is upclientetc.

In output from the bos status command, the server portion appears as upserver and the client portions as upclientbinand upclientetc, if the conventional names are assigned. In the output from the ps command, the server portion appears as/usr/afs/bin/upserver and the client portions as /usr/afs/bin/upclient.

You do not contact the Update Server directly once you have installed it. It operates automatically whenever you use boscommands to change the files that it distributes.

4.2.7 The vlserver Process: the Volume Location Server

The vlserver process, which runs on database server machines, is the Volume Location (VL) Server that automatically trackswhich file server machines house each volume, making its location transparent to client applications.

The process appears as vlserver in the bos status command’s output, if the conventional name is assigned. It appears in theps command’s output as /usr/afs/bin/vlserver.

As a system administrator, you contact the VL Server when you issue any vos command that changes the status of a volume (itrecords the status changes in the VLDB).

4.3 Controlling and Checking Process Status

To define the AFS server processes that run on a server machine, use the bos create command to create entries for them in thelocal /usr/afs/local/BosConfig file. The BOS Server monitors the processes listed in the BosConfig file that are marked with theRun status flag, and automatically attempts to restart them if they fail. After creating process entries, you use other commandsfrom the bos suite to stop and start processes or change the status flag as desired.

Never edit the BosConfig file directly rather than using bos commands. Similarly, it is not a good practice to run server processeswithout listing them in the BosConfig file, or to stop them using process termination commands such as the UNIX kill command.

4.3.1 The Information in the BosConfig File

A process’s entry in the BosConfig file includes the following information:

• The process’s name. The recommended conventional names are defined in both the OpenAFS Quick Beginnings and Creatingand Removing Processes. The name of a simple process usually matches the name of its binary file (for example, ptserver forthe Protection Server).

• Its type, which is one of the following:

simple A process that runs independently of any other on the server machine. If several simple processes fail at the same time,the BOS Server can restart them in any order. All standard AFS processes except the fs process are simple.

fs A process type reserved for the server process for which the conventional name is also fs. This process combines threecomponents: the File Server, the Volume Server, and the Salvager.

cron A process that runs at a defined time rather than continuously. There are no standard processes of this type.

• Its status flag, which tells the BOS Server whether it performs the following two actions with respect to the process:

– Start the process during BOS Server initialization

– Restart the process if it (the process) fails

The two possible values are Run (which directs the BOS Server to perform these actions) and NotRun (which directs the BOSServer to ignore the process). The BOS Server itself never changes the setting of this flag, even if the process fails repeatedly.Also, this flag is for internal use only; it does not appear in the bos status command’s output.


• Its command parameters, which are the commands that the BOS Server runs to start the process.

– A simple processes has one: the complete pathname to its binary file

– The fs process has three: the complete pathnames to each of the three component processes (/usr/afs/bin/fileserver, /us-r/afs/bin/volserver, and /usr/afs/bin/salvager)

– A cron process has two: the first the complete pathname to its binary file, the second the time at which the BOS Server runsit

In addition to process definitions, the BosConfig file also records automatic restart times for processes that have new binaries,and for all server processes including the BOS Server. See Setting the BOS Server’s Restart Times.

4.3.2 How the BOS Server Uses the Information in the BosConfig File

Whenever the BOS Server starts or restarts, it reads the BosConfig file to learn which processes it is to start and monitor. Ittransfers the information into kernel memory and does not read the BosConfig file again until it next restarts. This implies thatthe BOS Server’s memory state can change independently of the BosConfig file. You can, for example, stop a process but leaveits status flag in the BosConfig file as Run, or start a process even though its status flag in the BosConfig file is NotRun.

4.3.3 About Starting and Stopping the Database Server Processes

When you start or stop a database server process (Authentication Server, Backup Server, Protection Server, or Volume LocationServer) for more than a short time, you must follow the instructions in the OpenAFS Quick Beginnings for installing or removinga database server machine. Here is a summary of the tasks you must perform to preserve correct AFS functioning.

• Start or stop all four database server processes on that machine. All AFS server processes and the Cache Manager processesexpect all four database server processes to be running on each machine listed in the CellServDB file. There is no way toindicate in the file that a machine is running only some of the database server processes.

• Add or remove the machine in the /usr/afs/etc/CellServDB file on all server machines and the /usr/vice/etc/CellServDB fileon all client machines.

• Restart the database server processes on the other database server machines to force an election of a new Ubik coordinator foreach one.

4.3.4 About Starting and Stopping the Update Server

In the conventional cell configuration, one server machine of each system type acts as a binary distribution machine, running theserver portion of the Update Server (upserver process) to distribute the contents of its /usr/afs/bin directory. The other servermachines of its system type run an instance of the Update Server client portion (by convention called upclientbin) that referencesthe binary distribution machine.

It is conventional for the first server machine you install to act as the system control machine, running the server portion of theUpdate Server (upserver process) to distribute the contents of its /usr/afs/etc directory. All other server machines run an instanceof the Update Server client portion (by convention called upclientetc) that references the system control machine.

It is simplest not to move binary distribution or system control responsibilities to a different machine unless you completelydecommission a machine that is currently serving in one of those roles. Running the Update Server usually imposes very littleprocessing load. If you must move the functionality, perform the following related tasks.

• If you replace the system control machine, you must stop the upclientetc process on every other server machine and define anew one that references the new system control machine.

• If you replace a binary distribution machine, you must stop the upclientbin process on every other server machine of its systemtype and define a new one that references the new binary distribution machine (unless you are no longer running any servermachines of that system type).


4.4 Displaying Process Status and Information from the BosConfig File

To display the status of the AFS server processes on a server machine, issue the bos status command. Adding the -long flagdisplays most of the information from each process’s entry in the BosConfig file, including its type and command parameters. Italso displays a warning message if the mode bits on files and subdirectories in the /usr/afs directory do not match the expectedvalues.

4.4.1 To display the status of server processes and their BosConfig entries

1. Issue the bos status command.

% bos status <machine name> [<server process name>+] [-long]

where

stat Is the shortest acceptable abbreviation of status.

machine name Specifies the file server machine for which to display process status.

server process name Names each process for which to display status, using the name assigned when its entry was definedwith the bos create command. Omit this argument to display the status of all server processes.

-long Displays, in addition to status, information from the process’s entry in the BosConfig file: its type, its status flag, itscommand parameters, the associated notifier program, and so on.

The output includes an entry for each process and uses one of the following strings to indicate the process’s status:

• currently running normally indicates that the process is running and its status flag in the BosConfig file is Run.For cron entries, this message indicates that the command is still scheduled to run, not necessarily that it is actually runningwhen the bos status command was issued.

• temporarily enabled indicates that the process is running but that its status flag in the BosConfig file is NotRun. Themost common reason is that a system administrator has used the bos startup command to start the process.

• temporarily disabled indicates that the process is not running even though its status flag in the BosConfig file is Run.The most common reasons are either that a system administrator has used the bos shutdown command to stop the processor that the BOS Server ceased trying to restart the process after numerous failed attempts. In the latter case, a supplementarymessage appears: stopped for too many errors.

• disabled indicates that the process is not running and that its status flag in the BosConfig file is NotRun. The BOS Server isnot monitoring the process. Only a system administrator can set the flag this way; the BOS Server never does.

The output for the fs process always includes a message marked Auxiliary status, which can be one of the following:

• file server running indicates that the File Server and Volume Server components of the File Server process are run-ning normally.

• salvaging file system indicates that the Salvager is running, which usually implies that the File Server and VolumeServer are temporarily disabled. The BOS Server restarts them as soon as the Salvager is finished.

The output for a cron process also includes an Auxiliary status message to report when the command is scheduled to runnext; see the example that follows.

The output for any process can include the supplementary message has core file to indicate that at some point the processfailed and generated a core file in the /usr/afs/logs directory. In most cases, the BOS Server is able to restart the process and it isrunning.

The following example includes a user-defined cron entry called backupusers:


% bos status fs3.example.comInstance kaserver, currently running normally.Instance ptserver, currently running normally.Instance vlserver, has core file, currently running normally.Instance buserver, currently running normally.Instance fs, currently running normally.

Auxiliary status is: file server running.Instance upserver, currently running normally.Instance backupusers, currently running normally.

Auxiliary status is: run next at Mon Jun 7 02:00:00 1999.

If you include the -long flag to the bos status command, a process’s entry in the output includes the following additionalinformation from the BosConfig file:

• The process’s type (simple, fs, or cron).

• The day and time the process last started or restarted.

• The number of proc starts, which is how many times the BOS Server has started or restarted the process since it starteditself.

• The Last exit time when the process (or one of the component processes in the fs process) last terminated. This line doesnot appear if the process has not terminated since the BOS Server started.

• The Last error exit time when the process (or one of the component processes in the fs process) last failed due to anerror. A further explanation such as due to shutdown request sometimes appears. This line does not appear if theprocess has not failed since the BOS Server started.

• Each command that the BOS Server invokes to start the process, as specified by the -cmd argument to the bos create command.

• The pathname of the notifier program that the BOS Server invokes when the process terminates (if any), as specified by the-notifier argument to the bos create command.

In addition, if the BOS Server has found that the mode bits on certain files and directories under /usr/afs deviate from what itexpects, it prints the following warning message:

Bosserver process reports inappropriate access on server directories

The expected protections for the directories and files in the /usr/afs directory are as follows. A question mark indicates that theBOS Server does not check the mode bit. See the OpenAFS Quick Beginnings for more information about setting the protectionson these files and directories.

/usr/afs drwxr?xr-x/usr/afs/backup drwx???---/usr/afs/bin drwxr?xr-x/usr/afs/db drwx???---/usr/afs/etc drwxr?xr-x/usr/afs/etc/KeyFile -rw????---/usr/afs/etc/UserList -rw?????--/usr/afs/local drwx???---/usr/afs/logs drwxr?xr-x

The following illustrates the extended output for the fs process running on the machine fs3.example.com:

% bos status fs3.example.com fs -longInstance fs, (type is fs), currently running normally.

Auxiliary status is file server runningProcess last started at Mon May 3 8:29:19 1999 (3 proc starts)Last exit at Mon May 3 8:29:19 1999


Last error exit at Mon May 3 8:29:19 1999, due to shutdown requestCommand 1 is ’/usr/afs/bin/fileserver’Command 2 is ’/usr/afs/bin/volserver’Command 3 is ’/usr/afs/bin/salvager’

4.5 Creating and Removing Processes

To start a new AFS server process on a server machine, issue the bos create command, which creates an entry in the /usr/afs/lo-cal/BosConfig file, sets the process’s status flag to Run both in the file and in the BOS Server’s memory, and starts it runningimmediately. The binary file for the new process must already be installed, by convention in the /usr/afs/bin directory (seeInstalling New Binaries).

To stop a process permanently, first issue the bos stop command, which changes the process’s status flag to NotRun in boththe BosConfig file and the BOS Server’s memory; it is marked as disabled in the output from the bos status command. Ifdesired, issue the bos delete command to remove the process’s entry from the BosConfig file; the process no longer appears inthe bos status command’s output.

NoteIf you are starting or stopping a database server process in the manner described in this section, follow the complete instructionsin the OpenAFS Quick Beginnings for creating or removing a database server machine. If you run one database server processon a given machine, you must run them all; for more information, see About Starting and Stopping the Database ServerProcesses. Similarly, if you are stopping the upserver process on the system control machine or a binary distribution machine,you must complete the additional tasks described in About Starting and Stopping the Update Server.

4.5.1 To create and start a new process

1. Verify that you are authenticated as a user listed in the /usr/afs/etc/UserList file. If necessary, issue the bos listuserscommand, which is fully described in To display the users in the UserList file.


2. (Optional) Verify that the process’s binaries are installed in the /usr/afs/bin directory on this machine. If necessary, loginat the console or telnet to the machine and list the contents of the /usr/afs/bin directory.

If the binaries are not present, install them on the binary distribution machine of the appropriate system type, and wait forthe Update Server to copy them to this machine. For instructions, see Installing New Binaries.

% ls /usr/afs/bin

3. Issue the bos create command to create an entry in the BosConfig file and start the process.

% bos create <machine name> <server process name> \<server type> <command lines>+ [ -notifier <Notifier program>]

where

cr Is the shortest acceptable abbreviation of create.

machine name Specifies the file server machine on which to create the process.

server process name Names the process to create and start. For simple processes, the conventional value is the name ofthe process’s binary file. It is best to use the same name on every server machine that runs the process. The followingis a list of the conventional names for simple and fs-type processes (there are no standard cron processes).

• buserver for the Backup Server• fs for the process that combines the File Server, Volume Server, and Salvager• kaserver for the Authentication Server


• ptserver for the Protection Server• upclientbin for the client portion of the Update Server that references the binary distribution machine of this

machine’s system type• upclientetc for the client portion of the Update Server that references the system control machine• vlserver for the Volume Location (VL) Server

server type Defines the process’s type. Choose one of the following values:

• cron for a cron process• fs for the process named fs• simple for all other processes listed as acceptable values for the server process name argument

command lines Specifies each command the BOS Server runs to start the process. Specify no more than six commands(which can include the command’s options, in which case the entire string is surrounded by double quotes); anyadditional commands are ignored.For a simple process, provide the complete pathname of the process’s binary file on the local disk (for example,/usr/afs/bin/ptserver for the Protection Server). If including any of the initialization command’s options, surroundthe entire command in double quotes (" "). The upclient process has a required argument, and the commands for allother processes take optional arguments.For the fs process, provide the complete pathname of the local disk binary file for each of the component processes:fileserver, volserver, and salvager, in that order. The standard binary directory is /usr/afs/bin. If including any ofan initialization command’s options, surround the entire command in double quotes (" ").For a cron process, provide two parameters:

• The complete local disk pathname of either an executable file or a command from one of the AFS suites (completewith all of the necessary arguments). Surround this parameter with double quotes (" ") if it contains spaces.

• A specification of when the BOS Server executes the file or command indicated by the first parameter. There arethree acceptable values:– The string now, which directs the BOS Server to execute the file or command immediately and only once. It is

usually simpler to issue the command directly or issue the bos exec command.– A time of day. The BOS Server executes the file or command daily at the indicated time. Separate the hours

and minutes with a colon (hh:MM), and use either 24-hour format, or a value in the range from 1:00 through12:59 with the addition of am or pm. For example, both 14:30 and "2:30 pm" indicate 2:30 in the afternoon.Surround this parameter with double quotes (" ") if it contains a space.

– A day of the week and time of day, separated by a space and surrounded with double quotes (" "). The BOSServer executes the file or command weekly at the indicated day and time. For the day, provide either the wholename or the first three letters, all in lowercase letters (sunday or sun, thursday or thu, and so on). For the time,use the same format as when specifying the time alone.

-notifier Specifies the pathname of a program that the BOS Server runs when the process terminates. For more informationon notifier programs, see the bos create command reference page in the OpenAFS Administration Reference.

The following example defines and starts the Protection Server on the machine db2.example.com:

% bos create db2.example.com ptserver simple /usr/afs/bin/ptserver

The following example defines and starts the fs process on the machine fs6.example.com.

% bos create fs6.example.com fs fs /usr/afs/bin/fileserver \/usr/afs/bin/volserver /usr/afs/bin/salvager

The following example defines and starts a cron process called backupuser process on the machine fs3.example.com, schedulingit to run each day at 3:00 a.m.

% bos create fs3.example.com backupuser cron "/usr/afs/bin/vos backupsys -prefix user - ←↩local" 3:00


4.5.2 To stop a process and remove it from the BosConfig file



2. Issue the bos stop command to change each process’s status flag in the BosConfig file to NotRun and to stop it. You mustissue this command even for cron processes that you wish to remove from the BosConfig file, even though they do not runcontinuously. For a detailed description of this command, see To stop a process by changing its status to NotRun.

% bos stop <machine name> <server process name>+ [-wait]

3. Issue the bos delete command to remove each process from the BosConfig file.

% bos delete <machine name> <server process name>+

where

d Is the shortest acceptable abbreviation of delete.

machine name Specifies the server machine on which to remove processes from the BosConfig file.

server process name Names each process entry to remove from the BosConfig file. Provide the same names as in Step 2.

4.6 Stopping and Starting Processes Permanently

To stop a process so that the BOS Server no longer attempts to monitor it, issue the bos stop command. The process’s status flagis set to NotRun in both the BOS Server’s memory and in the BosConfig file. The process does not run again until you issue thebos start command, which sets its status flag back to Run in both the BOS Server’s memory and in the BosConfig file. (You canalso use the bos startup command to start the process again without changing its status flag in the BosConfig file; see Stoppingand Starting Processes Temporarily.)

There is no entry for the BOS Server in the BosConfig file, so the bos stop and bos start commands do not control it. To stopand immediately restart the BOS Server along with all other processes, use the -bosserver flag to the bos restart command asdescribed in Stopping and Immediately Restarting Processes.

NoteIf you are starting or stopping a database server process in the manner described in this section, follow the complete instructionsin the OpenAFS Quick Beginnings for creating or removing a database server machine. If you run one database server processon a given machine, you must run them all; for more information, see About Starting and Stopping the Database ServerProcesses. Similarly, if you are stopping the upserver process on the system control machine or a binary distribution machine,you must complete the additional tasks described in About Starting and Stopping the Update Server.

4.6.1 To stop a process by changing its status to NotRun



2. Issue the bos stop command to stop each process and set its status flag to NotRun in the BosConfig file and the BOSServer’s memory.

% bos stop <machine name> <server process name>+ [-wait]

where


sto Is the shortest acceptable abbreviation of stop.

machine name Specifies the server machine on which to stop the process.

server process name Names each process to stop, using the name assigned when its entry was defined with the bos createcommand.

-wait Delays the return of the command shell prompt until all specified processes have stopped. If you omit the flag, theprompt returns almost immediately, even if all processes are not yet stopped.

4.6.2 To start processes by changing their status flags to Run



2. Issue the bos start command to change each process’s status flag to Run in both the BosConfig file and the BOS Server’smemory and to start it.

% bos start <machine name> <server process name>+

where

start Must be typed in full.

machine name Specifies the server machine on which to start running each process.

server process name Specifies each process to start on machine name. Use the name assigned to the process at creation.

4.7 Stopping and Starting Processes Temporarily

It is sometimes necessary to halt a process temporarily (for example, to make slight configuration changes or to perform main-tenance). The commands described in this section change a process’s status in the BOS Server’s memory only; the effect isimmediate and lasts until you change the memory state again (or until the BOS Server restarts, at which time it starts the processaccording to its entry in the BosConfig file).

To stop a process temporarily by changing its status flag in BOS Server memory to NotRun, use the bos shutdown command.To restart a stopped process by changing its status flag in the BOS Server’s memory to Run, use the bos startup command. Theprocess starts regardless of its status flag in the BosConfig file. You can also use the bos startup command to start all processesmarked with status flag Run in the BosConfig file, as described in the following instructions.

Because the bos startup command starts a process without changing it status flag in the BosConfig file, it is useful for testinga server process without enabling it permanently. To stop and start processes by changing their status flags in the BosConfigfile, see Stopping and Starting Processes Permanently; to stop and immediately restart a process, see Stopping and ImmediatelyRestarting Processes.

NoteDo not temporarily stop a database server process on all machines at once. Doing so makes the database completely unavail-able.

4.7.1 To stop processes temporarily




2. Issue the bos shutdown command to stop each process by changing its status flag in the BOS Server’s memory to NotRun.

% bos shutdown <machine name> [<instances>+] [-wait]

where

sh Is the shortest acceptable abbreviation of shutdown.

machine name Specifies the server machine on which to stop processes temporarily.

instances Specifies each process to stop temporarily. Use the name assigned to the process at creation.

-wait Delays the return of the command shell prompt until all specified processes have actually stopped. If you omit theflag, the prompt returns almost immediately, even if all processes are not yet stopped.

4.7.2 To start all stopped processes that have status flag Run in the BosConfig file



2. Issue the bos startup command to start each process on a machine that has status flag Run in the BosConfig file bychanging its status flag in the BOS Server’s memory from NotRun to Run.

% bos startup <machine name>

where

startup Must be typed in full.

machine name Specifies the server machine on which you wish to start all processes that have status flag Run in theBosConfig file.

4.7.3 To start specific processes



2. Issue the bos startup command to start specific processes by changing their status flags in the BOS Server’s memory toRun without changing their status flags in the BosConfig file.

% bos startup <machine name> <instances>+

where

startup Must be typed in full.

machine name Names the server machine on which to start processes.

instances Specifies each process to start. Use the name assigned to the process at creation.


4.8 Stopping and Immediately Restarting Processes

Although by default the BOS Server checks each day for new installed binary files and restarts the associated processes, it issometimes desirable to stop and restart processes immediately. The bos restart command provides this functionality, starting acompletely new instance of each affected process:

• To stop and restart the BOS Server, which then restarts all processes marked with the Run status flag in the BosConfig file,include the -bosserver flag.

• To stop and restart all processes marked with the Run status flag in the BosConfig file, include the -all flag. The BOS Serverdoes not restart

• To stop and restart specific processes regardless of the setting of their status flags in the BosConfig file, specify the name ofeach process to restart.

Restarting processes causes a service outage. It is usually best to schedule restarts for periods of low usage. The BOS Serverautomatically restarts all processes once a week, to reduce the potential for the core leaks that can develop as any process runsfor an extended time; see Setting the BOS Server’s Restart Times.

4.8.1 To stop and restart all processes including the BOS Server



2. Issue the bos restart command with the -bosserver flag to stop and restart the BOS Server, which restarts every processmarked with status flag Run in the BosConfig file.

% bos restart <machine name> -bosserver

where

res Is the shortest acceptable abbreviation of restart.machine name Specifies the server machine on which to restart all processes.

-bosserver Stops the BOS Server and all processes running on the machine. A new BOS Server instance starts; it thenstarts new instances of all processes marked with status flag Run in the BosConfig file.

4.8.2 To stop and immediately restart all processes except the BOS Server



2. Issue the bos restart command with the -all flag to stop and immediately restart every process marked with status flagRun in the BosConfig file. The BOS Server does not restart.

% bos restart <machine name> -all

where

res Is the shortest acceptable abbreviation of restart.machine name Specifies the server machine on which to stop and restart processes.

-all Stops and immediately restarts all processes marked with status flag Run in the BosConfig file.


4.8.3 To stop and immediately restart specific processes



2. Issue the bos restart command to stop and immediately restart one or more specified processes, regardless of their statusflag setting in the BosConfig file.


where

res Is the shortest acceptable abbreviation of restart.machine name Names the server machine on which to restart the specified processes.

instances Specifies each process to stop and immediately restart. Use the name assigned to the process at creation.

4.9 Setting the BOS Server’s Restart Times

The BOS Server by default has general restarts disabled. If you wish, it may be configured so that it restarts once a week, andthe new instance restarts all processes marked with status flag Run in the local /usr/afs/local/BosConfig file (this is equivalent toissuing the bos restart command with the -bosserver flag). Historically, the default restart time was Sunday at 4:00 a.m - siteswhich have been upgraded from earlier versions of OpenAFS may find that this value is still present. The weekly restart wasdesigned to minimize core leaks, which can develop as a process continues to allocate virtual memory but does not free it again.It is believed that these leaks have been fixed in OpenAFS.

The BOS Server also by default checks once a day for any newly installed binary files. If it finds that the modification time stampon a process’s binary file in the /usr/afs/bin directory is more recent than the time at which the process last started, it restarts theprocess so that a new instance starts using the new binary file. The default binary-checking time is 5:00 a.m.

Because restarts can cause outages during which the file system is inaccessible, the default times for restarts are in the earlymorning when usage is likely to be lowest. Restarting a database server process on any database server machine usually makesthe entire system unavailable to everyone for a brief time, whereas restarting other types of processes inconveniences only usersinteracting with that process on that machine. The longest outages typically result from restarting the fs process, because the FileServer must reattach all volumes.

The BosConfig file on each file server machine records the two restart times. To display the current setting, issue the bosgetrestart command. To reset a time, use the bos setrestart command.

4.9.1 To display the BOS Server restart times

1. Issue the bos getrestart command to display the automatic restart times.

% bos getrestart <machine name>

where

getr Is the shortest acceptable abbreviation of getrestart.machine name Specifies the server machine for which to display the restart times.


4.9.2 To set the general or binary restart time



2. Issue the bos setrestart command with the -general flag to set the general restart time or the -newbinary flag to set thebinary restart time. The command accepts only one of the flags at a time.

% bos setrestart <machine name> "<time to restart server>" [-general] [-newbinary]

where

setr Is the shortest acceptable abbreviation of setrestart.machine name Specifies the server machine.

time to restart server Sets when the BOS Server restarts itself (if combined with the -general flag) or any process with anew binary file (if combined with the -newbinary flag). Provide one of the following types of values:

• The string never, which directs the BOS Server never to perform the indicated type of restart.• A time of day (the conventional type of value for the binary restart time). Separate the hours and minutes with a

colon (hh:MM), and use either 24-hour format, or a value in the range from 1:00 through 12:59 with the additionof am or pm. For example, both 14:30 and "2:30 pm" indicate 2:30 in the afternoon. Surround this parameterwith double quotes (" ") if it contains a space.

• A day of the week and time of day, separated by a space and surrounded with double quotes (" "). This is theconventional type of value for the general restart. For the day, provide either the whole name or the first threeletters, all in lowercase letters (sunday or sun, thursday or thu, and so on). For the time, use the same format aswhen specifying the time alone.

If desired, precede a time or day and time definition with the string every or at. These words do not change themeaning, but possibly make the output of the bos getrestart command easier to understand.

NoteIf the specified time is within one hour of the current time, the BOS Server does not perform the restart until thenext eligible time (the next day for a time or next week for a day and time).

-general Sets the general restart time when the BOS Server restarts itself.

-newbinary Sets the restart time for processes with new binary files.

4.10 Displaying Server Process Log Files

The /usr/afs/logs directory on each file server machine contains log files that detail interesting events that occur during normaloperation of some AFS server processes. The self-explanatory information in the log files can help you evaluate process failuresand other problems. To display a log file remotely, issue the bos getlog command. You can also establish a connection to theserver machine and use a text editor or other file display program (such as the cat command).

NoteLog files can grow unmanageably large if you do not periodically shutdown and restart the database server processes (forexample, if you disable the general restart time). In this case it is a good policy periodically to issue the UNIX rm command todelete the current log file. The server process automatically creates a new one as needed.


4.10.1 To examine a server process log file



2. Issue the bos getlog command to display a log file.

% bos getlog <machine name> <log file to examine>

where

getl Is the shortest acceptable abbreviation of getlog.

machine name Specifies the server machine from which to display the log file.

log file to examine Names the log file to be displayed. Provide one of the following file names to display the indicatedlog file from the /usr/afs/logs directory.

• AuthLog for the Authentication Server log file• BackupLog for the Backup Server log file• BosLog for the BOS Server log file• FileLog for the File Server log file• SalvageLog for the Salvager log file• VLLog for the Volume Location (VL) Server log file• VolserLog for the Volume Server log file

You can provide a full or relative pathname to display a file from another directory. Relative pathnames are interpretedrelative to the /usr/afs/logs directory.


Chapter 5

Managing Volumes

This chapter explains how to manage the volumes stored on file server machines. The volume is the designated unit of adminis-tration in AFS, so managing them is a large part of the administrator’s duties.



Create read/write volume vos createCreate read-only volume vos addsite and vos releaseCreate backup volume vos backupCreate many backup volumes at once vos backupsysExamine VLDB entry vos listvldbExamine volume header vos listvolExamine both VLDB entry and volume header vos examineDisplay volume’s name fs listquota or fs examineDisplay volume’s ID number fs examine or vos examine or vos listvolDisplay partition’s size and space available vos partinfoDisplay volume’s location fs whereis or vos examineCreate mount point fs mkmountRemove mount point fs rmmountDisplay mount point fs lsmountMove read/write volume vos moveSynchronize VLDB with volume headers vos syncvldb and vos syncservSet volume quota fs setvol or fs setquotaDisplay volume quota fs quota or fs listquota or fs examineDisplay volume’s current size fs listquota or fs examineDisplay list of volumes on a machine/partition vos listvolRemove read/write volume vos remove and fs rmmountRemove read-only volume vos removeRemove backup volume vos remove and fs rmmountRemove volume; no VLDB change vos zapRemove read-only site definition vos remsiteRemove VLDB entry; no volume change vos delentryDump volume vos dumpRestore dumped volume vos restoreRename volume vos rename, fs rmmount and fs mkmountUnlock volume vos unlockUnlock multiple volumes vos unlockvldbLock volume vos lock


5.2 About Volumes

An AFS volume is a logical unit of disk space that functions like a container for the files in an AFS directory, keeping them alltogether on one partition of a file server machine. To make a volume’s contents visible in the cell’s file tree and accessible tousers, you mount the volume at a directory location in the AFS filespace. The association between the volume and its locationin the filespace is called a mount point, and because of AFS’s internal workings it looks and acts just like a standard directoryelement. Users can access and manipulate a volume’s contents in the same way they access and manipulate the contents of astandard UNIX directory. For more on the relationship between volumes and directories, see About Mounting Volumes.

Many of an administrator’s daily activities involve manipulating volumes, since they are the basic storage and administrative unitof AFS. For a discussion of some of the ways volumes can make your job easier, see How Volumes Improve AFS Efficiency.

5.2.1 The Three Types of Volumes

There are three types of volumes in AFS, as described in the following list:

• The single read/write version of a volume houses the modifiable versions of the files and directories in that volume. It is oftenreferred to as the read/write source because volumes of the other two types are derived from it by a copying procedure calledcloning. For instructions on creating read/write volumes, see Creating Read/write Volumes.

• A read-only volume is a copy of the read/write source volume and can exist at multiple sites (a site is a particular partitionon a particular file server machine). Placing the same data at more than one site is called replication; see How VolumesImprove AFS Efficiency. As the name suggests, a read-only volume’s contents do not change automatically as the read/writesource changes, but only when an administrator issues the vos release command. For users to have a consistent view of theAFS filespace, all copies of the read-only volume must match each other and their read/write source. All read-only volumesshare the same name, which is derived by adding the .readonly extension to the read/write source’s name. For instructions oncreating of read-only volumes, see Replicating Volumes (Creating Read-only Volumes).

• A backup volume is a clone of the read/write source volume and is stored at the same site as the source. A backup version isuseful because it records the state of the read/write source at a certain time, allowing recovery of data that is later mistakenlychanged or deleted (for further discussion see How Volumes Improve AFS Efficiency). A backup volume’s name is derived byadding the .backup extension to the read/write source’s name. For instructions on creating of backup volumes, see CreatingBackup Volumes.

NoteA backup volume is not the same as the backup of a volume transferred to tape using the AFS Backup System, althoughmaking a backup version of a volume is usually a stage in the process of backing up the volume to tape. For information onbacking up a volume using the AFS Backup System, see Backing Up Data.

As noted, the three types of volumes are related to one another: read-only and backup volumes are both derived from a read/writevolume through a process called cloning. Read-only and backup volumes are exact copies of the read/write source at the timethey are created.

5.2.2 How Volumes Improve AFS Efficiency

Volumes make your cell easier to manage and more efficient in the following three ways:

• Volumes are easy to move between partitions, on the same or different machines, because they are by definition smallerthan a partition. Perhaps the most common reasons to move volumes are to balance the load among file server machinesor to take advantage of greater disk capacity on certain machines. You can move volumes as often as necessary withoutdisrupting user access to their contents, because the move procedure makes the contents unavailable for only a few seconds.The automatic tracking of volume locations in the Volume Location Database (VLDB) assures that access remains transparent.For instructions on moving volumes, see Moving Volumes.


• Volumes are the unit of replication in AFS. Replication refers to creating a read-only clone from the read/write source anddistributing of the clone to one or more sites. Replication improves system efficiency because more than one machine can fillrequests for popular files. It also boosts system reliability by helping to keep data available in the face of machine or serverprocess outage. In general, volumes containing popular application programs and other files that do not change often are thebest candidates for replication, but you can replicate any read/write volume. See Replicating Volumes (Creating Read-onlyVolumes).

• Volumes are the unit of backup in AFS, in two senses. You can create a backup volume version to preserves the state of aread/write source volume at a specified time. You can mount the backup version in the AFS filespace, enabling users to restoredata they have accidentally changed or deleted without administrator assistance, which frees you for more important jobs. Ifyou make a new backup version of user volumes once a day (presumably overwriting the former backup), then users are alwaysbe able to retrieve the previous day’s version of a file. For instructions, see Creating Backup Volumes.

Backup also refers to using the AFS Backup System to store permanent copies of volume contents on tape or in a specialbackup data. See Configuring the AFS Backup Systemand Backing Up and Restoring AFS Data.

5.2.3 Volume Information in the VLDB

The Volume Location Database (VLDB) includes entries for every volume in a cell. Perhaps the most important information inthe entry is the volume’s location, which is key to transparent access to AFS data. When a user opens a file, the Cache Managerconsults the Volume Location (VL) Server, which maintains the VLDB, for a list of the file server machines that house thevolume containing the file. The Cache Manager then requests the file from the File Server running on one of the relevant fileserver machines. The file location procedure is invisible to the user, who only needs to know the file’s pathname.

The VLDB volume entry for a read/write volume also contains the pertinent information about the read-only and backup versions,which do not have their own VLDB entries. (The rare exception is a read-only volume that has its own VLDB entry because itsread/write source has been removed.) A volume’s VLDB entry records the volume’s name, the unique volume ID number foreach version (read/write, read-only, backup, and releaseClone), a count of the number of sites that house a read/write or read-onlyversion, and a list of the sites.

To display the VLDB entry for one or more volumes, use the vos listvldb command as described in To display VLDB entries.To display the VLDB entry for a single volume along with its volume header, use the vos examine command as described in Todisplay one volume’s VLDB entry and volume header. (See the following section for a description of the volume header.)

5.2.4 The Information in Volume Headers

Whereas all versions of a volume share one VLDB entry, each volume on an AFS server partition has its own volume header, adata structure that maps the files and directories in the volume to physical memory addresses on the partition that stores them.The volume header binds the volume’s contents into a logical unit without requiring that they be stored in contiguous memoryblocks. The volume header also records the following information about the volume, some of it redundant with the VLDB entry:name, volume ID number, type, size, status (online, offline, or busy), space quota, timestamps for creation date and date of lastmodification, and number of accesses during the current day.

To display the volume headers on one or more partitions, use the vos listvol command as described in To display volume headers.To display the VLDB entry for a single volume along with its volume header, use the vos examine command as described in Todisplay one volume’s VLDB entry and volume header.

5.2.5 Keeping the VLDB and Volume Headers Synchronized

It is vital that the information in the VLDB correspond to the status of the actual volumes on the servers (as recorded in volumeheaders) as much of the time as possible. If a volume’s location information in the VLDB is incorrect, the Cache Manager cannotfind access its contents. Whenever you issue a vos command that changes a volume’s status, the Volume Server and VL Servercooperate to keep the volume header and VLDB synchronized. In rare cases, the header and VLDB can diverge, for instancebecause a vos operation halts prematurely. For instructions on resynchronizing them, see Synchronizing the VLDB and VolumeHeaders.


5.2.6 About Mounting Volumes

To make a volume’s contents visible in the cell’s file tree and accessible to users, you mount the volume at a directory location inthe AFS filespace. The association between the volume and its location in the filespace is called a mount point. An AFS mountpoint looks and functions like a regular UNIX file system directory, but structurally it is more like a symbolic link that tells theCache Manager the name of the volume associated with the directory. A mount point looks and acts like a directory only becausethe Cache Manager knows how to interpret it.

Consider the common case where the Cache Manager needs to retrieve a file requested by an application program. The CacheManager traverses the file’s complete pathname, starting at the AFS root (by convention mounted at the /afs directory) andcontinuing to the file. When the Cache Manager encounters (or crosses) a mount point during the traversal, it reads it to learn thename of the volume mounted at that directory location. After obtaining location information about the volume from the VolumeLocation (VL) Server, the Cache Manager fetches the indicated volume and opens its root directory. The root directory of avolume lists all the files, subdirectories, and mount points that reside in it. The Cache Manager scans the root directory listing forthe next element in the pathname. It continues down the path, using this method to interpret any other mount points it encounters,until it reaches the volume that houses the requested file.

Mount points act as the glue that connects the AFS file space, creating the illusion of a single, seamless file tree even whenvolumes reside on many different file server machines. A volume’s contents are visible and accessible when the volume ismounted at a directory location, and are not accessible at all if the volume is not mounted.

You can mount a volume at more than one location in the file tree, but this is not recommended for two reasons. First, it distortsthe hierarchical nature of the filespace. Second, the Cache Manager can become confused about which pathname it followed toreach the file (causing unpredictable output from the pwd command, for example). However, if you mount a volume at morethan one directory, the access control list (ACL) associated with the volume’s root directory applies to all of the mount points.

There are several types of mount points, each of which the Cache Manager handles in a different way and each of which isappropriate for a different purpose. See Mounting Volumes.

5.2.7 About Volume Names

A read/write volume’s name can be up to 22 characters in length. The Volume Server automatically adds the .readonly and.backup extensions to read-only and backup volumes respectively. Do not explicitly add the extensions to volume names, evenif they are appropriate.

It is conventional for a volume’s name to indicate the type of data it houses. For example, it is conventional to name all uservolumes user.username where username is the user’s login name. Similarly, many cells elect to put system binaries in volumeswith names that begin with the system type code. For a list of other naming conventions, see Creating Volumes to SimplifyAdministration.

5.3 Creating Read/write Volumes

A read/write volume is the most basic type of volume, and must exist before you can create read-only or backup versions of it.When you issue the vos create command to create a read/write volume, the VL Server creates a VLDB entry for it which recordsthe name you specify, assigns a read/write volume ID number, and reserves the next two consecutive volume ID numbers forread-only and backup versions that possibly are to be created later. At the same time, the Volume Server creates a volume headerat the site you designate, allocating space on disk to record the name of the volume’s root directory. The name is filled in whenyou issue the fs mkmount command to mount the volume, and matches the mount point name. The following is also recordedin the volume header:

• An initial ACL associated with the volume’s root directory. By default it grants all seven AFS access permissions to thesystem:administrators group. After you mount the volume, you can use the fs setacl command to add other entries and toremove or change the entry for the system:administrators group. See Setting ACL Entries.

• A space quota, which limits the amount of disk space the read/write version of the volume can use on the file server partition.The default is of 5000 kilobyte blocks, but you can use the -maxquota argument to the vos create command to set a differentquota.

To change the quota after creation, use the fs setquota command as described in Setting and Displaying Volume Quota andCurrent Size.


5.3.1 To create (and mount) a read/write volume



2. Verify that you have the a( administer), i( insert), and l( lookup) permissions on the ACL of the directory where you planto mount the volume. If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.

% fs listacl [<dir/file path>]

Members of the system:administrators group always implicitly have the a( administer) and by default also the l( lookup)permission on every ACL and can use the fs setacl command to grant other rights as necessary.

3. Select a site (disk partition on a file server machine) for the new volume. To verify that the site has enough free space tohouse the volume (now, or if it grows to use its entire quota), issue the vos partinfo command.

NoteThe partition-related statistics in this command’s output do not always agree with the corresponding values in the outputof the standard UNIX df command. The statistics reported by this command can be up to five minutes old, because theCache Manager polls the File Server for partition information at that frequency. Also, on some operating systems, the dfcommand’s report of partition size includes reserved space not included in this command’s calculation, and so is likely tobe about 10% larger.

% vos partinfo <machine name> [<partition name>]

where

p Is the shortest acceptable abbreviation of partinfo.

machine name Specifies the file server machine for which to display partition size and usage.

partition name Names one partition for which to display partition size and usage. If you omit it, the output displays thesize and space available for all partitions on the machine.

4. Select a volume name, taking note of the information in About Volume Names.

5. Issue the vos create command to create the volume.

% vos create <machine name> <partition name> <volume name> \[-maxquota <initial quota (KB)>]

where


machine name Specifies the file server machine on which to place the volume.

partition name Specifies the disk partition on which to place the volume.

volume name Names the volume. It can be up to 22 alphanumeric and punctuation characters in length. Your cell possiblyhas naming conventions for volumes, such as beginning user volume names with the string user and using the periodto separate parts of the name.

-maxquota Sets the volume’s quota, as a number of kilobyte blocks. If you omit this argument, the quota is set to 5000kilobyte blocks.

6. (Optional) Issue the fs mkmount command to mount the volume in the filespace. For complete syntax, see To create aregular or read/write mount point.

% fs mkmount <directory> <volume name>


7. (Optional) Issue the fs lsmount command to verify that the mount point refers to the correct volume. Complete instructionsappear in To display a mount point.

% fs lsmount <directory>

8. (Optional) Issue the fs setvol command with the -offlinemsg argument to record auxiliary information about the volumein its volume header. For example, you can record who owns the volume or where you have mounted it in the filespace.To display the information, use the fs examine command.

% fs setvol <dir/file path> -offlinemsg <offline message>

where

sv Is an acceptable alias for setvol(and setv the shortest acceptable abbreviation).

dir/file path Names the mount point of the volume with which to associate the message. Partial pathnames are interpretedrelative to the current working directory.Specify the read/write path to the mount point, to avoid the failure that results when you attempt to change a read-onlyvolume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). For further discussion of the concept of read/write and read-onlypaths through the filespace, see The Rules of Mount Point Traversal.

-offlinemsg Specifies up to 128 characters of auxiliary information to record in the volume header.

5.4 About Clones and Cloning

To create a backup or read-only volume, the Volume Server begins by cloning the read/write source volume to create a clone.The Volume Server creates the clone automatically when you issue the vos backup or vos backupsys command (for a backupvolume) or the vos release command (for a read-only volume). No special action is required on your part.

A clone is not a copy of the data in the read/write source volume, but rather a copy of the read/write volume’s vnode index. Thevnode index is a table of pointers between the files and directories in the volume and the physical disk blocks on the partitionwhere the data resides. From the clone, backup and read-only volumes are created in the following manner:

• A read-only volume that occupies the same partition as its read/write source (also known as a read-only clone), and a backupvolume, are created by attaching a volume header to the clone. These volumes initially consume very little disk space, becausethe clone portion (the vnode index) points to exactly the same files as the read/write volume, as illustrated in Figure 1. Thefile sharing is possible only because the clone is on the same partition as the read/write source volume. When a file in theread/write volume is deleted, it is not actually removed from the partition, because the backup or read-only clone still points toit. Similarly, when a file in the read/write is changed, the entire original file is preserved on disk because the clone still pointsto it, and the read/write volume’s vnode index changes to point to newly space for the changed file. When this happens, thebackup or read-only volume is said to grow or start occupying actual disk space.

• A read-only volume that does not occupy the same site as the read/write source is a copy of the clone and of all of the data inthe read/write source volume. It occupies the same amount of disk space as the read/write volume did at the time the read-onlyvolume was created.


Figure 5.1: File Sharing Between the Read/write Source and a Clone Volume

5.5 Replicating Volumes (Creating Read-only Volumes)

Replication refers to creating a read-only copy of a read/write volume and distributing the copy to one or more additional fileserver machines. Replication makes a volume’s contents accessible on more than one file server machine, which increases dataavailability. It can also increase system efficiency by reducing load on the network and File Server. Network load is reduced if aclient machine’s server preference ranks lead the Cache Manager to access the copy of a volume stored on the closest file servermachine. Load on the File Server is reduced because it issues only one callback for all data fetched from a read-only volume,as opposed to a callback for each file fetched from a read/write volume. The single callback is sufficient for an entire read-onlyvolume because the volume does not change except in response to administrator action, whereas each read/write file can changeat any time.

Replicating a volume requires issuing two commands. First, use the vos addsite command to add one or more read-only sitedefinitions to the volume’s VLDB entry (a site is a particular partition on a file server machine). Then use the vos releasecommand to clone the read/write source volume and distribute the clone to the defined read-only sites. You issue the vos addsiteonly once for each read-only site, but must reissue the vos release command every time the read/write volume’s contents changeand you want to update the read-only volumes.

For users to have a consistent view of the file system, the release of updated volume contents to read-only sites must be atomic:either all read-only sites receive the new version of the volume, or all sites keep the version they currently have. The vos releasecommand is designed to ensure that all copies of the volume’s read-only version match both the read/write source and each other.In cases where problems such as machine or server process outages prevent successful completion of the release operation, AFSuses two mechanisms to alert you.

First, the command interpreter generates an error message on the standard error stream naming each read-only site that did notreceive the new volume version. Second, during the release operation the Volume Location (VL) Server marks site definitionsin the VLDB entry with flags (New release and Old release) that indicate whether or not the site has the new volumeversion. If any flags remain after the operation completes, it was not successful. The Cache Manager refuses to access a read-onlysite marked with the Old release flag, which potentially imposes a greater load on the sites marked with the New releaseflag. It is important to investigate and eliminate the cause of the failure and then to issue the vos release command as many timesas necessary to complete the release without errors.

The pattern of site flags remaining in the volume’s VLDB entry after a failed release operation can help determine the point atwhich the operation failed. Use the vos examine or vos listvldb command to display the VLDB entry. The VL Server sets theflags in concert with the Volume Server’s operations, as follows:


1. Before the operation begins, the VL Server sets the New release flag on the read/write site definition in the VLDBentry and the Old release flag on read-only site definitions (unless the read-only site has been defined since the lastrelease operation and has no actual volume, in which case its site flag remains Not released).

2. If necessary, the Volume Server creates a temporary copy (a clone) of the read/write source called the ReleaseClone (see thefollowing discussion of when the Volume Server does or does not create a new ReleaseClone.) It assigns the ReleaseCloneits own volume ID number, which the VL Server records in the RClone field of the source volume’s VLDB entry.

3. The Volume Server distributes a copy of the ReleaseClone to each read-only site defined in the VLDB entry. As the sitesuccessfully receives the new clone, the VL Server sets the site’s flag in the VLDB entry to New release.

4. When all the read-only copies are successfully released, the VL Server clears all the New release site flags. TheReleaseClone is no longer needed, so the Volume Server deletes it and the VL Server erases its ID from the VLDB entry.

By default, the Volume Server determines automatically whether or not it needs to create a new ReleaseClone:

• If there are no flags (New release, Old release, or Not released) on site definitions in the VLDB entry, theprevious vos release command completed successfully and all read-only sites currently have the same volume. The VolumeServer infers that the current vos release command was issued because the read/write volume has changed. The Volume Servercreates a new ReleaseClone and distributes it to all of the read-only sites.

• If any site definition in the VLDB entry is marked with a flag, either the previous release operation did not complete successfullyor a new read-only site was defined since the last release. The Volume Server does not create a new ReleaseClone, insteaddistributing the existing ReleaseClone to sites marked with the Old release or Not released flag. As previously noted,the VL Server marks each VLDB site definition with the New release flag as the site receives the ReleaseClone, and clearsall flags after all sites successfully receive it.

To override the default behavior, forcing the Volume Server to create and release a new ReleaseClone to the read-only sites,include the -f flag. This is appropriate if, for example, the data at the read/write site has changed since the existing ReleaseClonewas created during the previous release operation.

5.5.1 Using Read-only Volumes Effectively

For maximum effectiveness, replicate only volumes that satisfy two criteria:

• The volume’s contents are heavily used. Examples include a volume housing binary files for text editors or other popularapplication programs, and volumes mounted along heavily traversed directory paths such as the paths leading to user homedirectories. It is an inefficient use of disk space to replicate volumes for which the demand is low enough that a single FileServer can easily service all requests.

• The volume’s contents change infrequently. As noted, file system consistency demands that the contents of read-only volumesmust match each other and their read/write source at all times. Each time the read/write volume changes, you must issue thevos release command to update the read-only volumes. This can become tedious (and easy to forget) if the read/write volumechanges frequently.

Explicitly mounting a read-only volume (creating a mount point that names a volume with a .readonly extension) is not generallynecessary or appropriate. The Cache Manager has a built-in bias to access the read-only version of a replicated volume wheneverpossible. As described in more detail in The Rules of Mount Point Traversal, when the Cache Manager encounters a mount pointit reads the volume name inside it and contacts the VL Server for a list of the sites that house the volume. In the normal case, ifthe mount point resides in a read-only volume and names a read/write volume (one that does not have a .readonly or .backupextension), the Cache Manager always attempts to access a read-only copy of the volume. Thus there is normally no reason toforce the Cache Manager to access a read-only volume by mounting it explicitly.

It is a good practice to place a read-only volume at the read/write site, for a couple of reasons. First, the read-only volume at theread/write site requires only a small amount of disk space, because it is a clone rather a copy of all of the data (see About Clonesand Cloning). Only if a large number of files are removed or changed in the read/write volume does the read-only copy occupymuch disk space. That normally does not happen because the appropriate response to changes in a replicated read/write volumeis to reclone it. The other reason to place a read-only volume at the read/write site is that the Cache Manager does not attempt


to access the read/write version of a replicated volume if all read-only copies become inaccessible. If the file server machinehousing the read/write volume is the only accessible machine, the Cache Manager can access the data only if there is a read-onlycopy at the read/write site.

The number of read-only sites to define depends on several factors. Perhaps the main trade-off is between the level of demandfor the volume’s contents and how much disk space you are willing to use for multiple copies of the volume. Of course, eachprospective read-only site must have enough available space to accommodate the volume. The limit on the number of read-onlycopies of a volume is determined by the maximum number of site definitions in a volume’s VLDB entry, which is defined inthe OpenAFS Release Notes. The site housing the read/write and backup versions of the volume counts as one site, and eachread-only site counts as an additional site (even the read-only site defined on the same file server machine and partition as theread/write site counts as a separate site). Note also that the Volume Server permits only one read-only copy of a volume per fileserver machine.

5.5.2 Replication Scenarios

The instructions in the following section explain how to replicate a volume for which no read-only sites are currently defined.However, you can also use the instructions in other common situations:

• If you are releasing a new clone to sites that already exist, you can skip Step 2. It can still be useful to issue the vos examinecommand, however, to verify that the desired read-only sites are defined.

• If you are adding new read-only sites to existing ones, perform all of the steps. In Step 3, issue the vos addsite command forthe new sites only.

• If you are defining sites but do not want to release a clone to them yet, stop after Step 3and continue when you are ready.

• If you are removing one or more sites before releasing a new clone to the remaining sites, follow the instructions for siteremoval in Removing Volumes and their Mount Pointsand then start with Step 4.

5.5.3 To replicate a read/write volume (create a read-only volume)



2. Select one or more sites at which to replicate the volume. There are several factors to consider:

• How many sites are already defined. As previously noted, it is usually appropriate to define a read-only site at theread/write site. Also, the Volume Server permits only one read-only copy of a volume per file server machine. To displaythe volume’s current sites, issue the vos examine command, which is described fully in Displaying One Volume’s VLDBEntry and Volume Header.

% vos examine <volume name or ID>

The final lines of output display the volume’s site definitions from the VLDB.

• Whether your cell dedicates any file server machines to housing read-only volumes only. In general, only very large cellsuse read-only server machines.

• Whether a site has enough free space to accommodate the volume. A read-only volume requires the same amount ofspace as the read/write version (unless it is at the read/write site itself). The first line of output from the vos examinecommand displays the read/write volume’s current size in kilobyte blocks, as shown in Displaying One Volume’s VLDBEntry and Volume Header.To display the amount of space available on a file server machine’s partitions, use the vos partinfo command, which isdescribed fully in Creating Read/write Volumes.



3. Issue the vos addsite command to define each new read-only site in the VLDB.

% vos addsite <machine name> <partition name> <volume name or ID>

where

ad Is the shortest acceptable abbreviation of addsite.

machine name Defines the file server machine for the new site.

partition name Names a disk partition on the machine machine name.

volume name or ID Identifies the read/write volume to be replicated, either by its complete name or its volume ID num-ber.

4. (Optional) Verify that the fs process (which incorporates the Volume Server) is functioning normally on each file servermachine where you have defined a read-only site, and that the vlserver process (the Volume Location Server) is functioningcorrectly on each database server machine. Knowing that they are functioning eliminates two possible sources of failurefor the release. Issue the bos status command on each file server machine housing a read-only site for this volume andon each database server machine. The command is described fully in Displaying Process Status and Information from theBosConfig File.

% bos status <machine name> fs vlserver

5. Issue the vos release command to clone the read/write source volume and distribute the clone to each read-only site.

% vos release <volume name or ID> [-f]

where

rel Is the shortest acceptable abbreviation of release.

volume name or ID Identifies the read/write volume to clone, either by its complete name or volume ID number. Theread-only version is given the same name with a .readonly extension. All read-only copies share the same read-onlyvolume ID number.

-f Creates and releases a brand new clone.

6. (Optional) Issue the vos examine command to verify that no site definition in the VLDB entry is marked with an Oldrelease or New release flag. The command is described fully in Displaying One Volume’s VLDB Entry and VolumeHeader.


If any flags appear in the output from Step 6, repeat Steps 4and 5until the Volume Server does not produce any error messagesduring the release operation and the flags no longer appear. Do not issue the vos release command when you know that theread/write site or any read-only site is inaccessible due to network, machine or server process outage.

5.6 Creating Backup Volumes

A backup volume is a clone that resides at the same site as its read/write source (to review the concept of cloning, see AboutClones and Cloning). Creating a backup version of a volume has two purposes:

• It is by convention the first step when dumping a volume’s contents to tape with the AFS Backup System. A volume isinaccessible while it is being dumped, so instead of dumping the read/write volume, you create and dump a backup version.Users do not normally access the backup version, so it is unlikely that the dump will disturb them. For more details, seeBacking Up Data.

• It enables users to restore mistakenly deleted or changed data themselves, freeing you for more crucial tasks. The backupversion captures the state of its read/write source at the time the backup is made, and its contents cannot change. Mount thebackup version in the filespace so that users can restore a file to its state at the time you made the backup. See Making theContents of Backup Volumes Available to Users.


5.6.1 Backing Up Multiple Volumes at Once

The vos backupsys command creates a backup version of many read/write volumes at once. This command is useful whenpreparing for large-scale backups to tape using the AFS Backup System.

To clone every read/write volume listed in the VLDB, omit all of the command’s options. Otherwise, combine the command’soptions to clone various groups of volumes. The options use one of two basic criteria to select volumes: location (the -serverand -partition arguments) or presence in the volume name of one of a set of specified character strings (the -prefix, -exclude,and -xprefix options).

To clone only volumes that reside on one file server machine, include the -server argument. To clone only volumes that reside onone partition, combine the -server and -partition arguments. The -partition argument can also be used alone to clone volumesthat reside on the indicated partition on every file server machine. These arguments can be combined with those that selectvolumes based on their names.

Combine the -prefix, -exclude, and -xprefix options (with or without the -server and -partition arguments) in the indicatedways to select volumes based on character strings contained in their names:

• To clone every read/write volume at the specified location whose name includes one of a set of specified character strings (forexample, begins with user. or includes the string afs), use the -prefix argument or combine the -xprefix and -exclude options.

• To clone every read/write volume at the specified location except those whose name includes one of a set of specified characterstrings, use the -xprefix argument or combine the -prefix and -exclude options.

• To clone every read/write volume at the specified location whose name includes one of one of a set of specified characterstrings, except those whose names include one of a different set of specified character strings, combine the -prefix and -xprefixarguments. The command creates a list of all volumes that match the -prefix argument and then removes from the list thevolumes that match the -xprefix argument. For effective results, the strings specified by the -xprefix argument must designatea subset of the volumes specified by the -prefix argument.

If the -exclude flag is combined with the -prefix and -xprefix arguments, the command creates a list of all volumes that donot match the -prefix argument and then adds to the list any volumes that match the -xprefix argument. As when the -excludeflag is not used, the result is effective only if the strings specified by the -xprefix argument designate a subset of the volumesspecified by the -prefix argument.

The -prefix and -xprefix arguments both accept multiple values, which can be used to define disjoint groups of volumes. Eachvalue can be one of two types:

1. A simple character string, which matches volumes whose name begin with the string. All characters are interpreted literally(that is, characters that potentially have special meaning to the command shell, such as the period, have only their literalmeaning).

2. A regular expression, which matches volumes whose names contain the expressions. Place a caret ( ˆ) at the beginning ofthe expression, and enclose the entire string in single quotes ( ’ ’). Explaining regular expressions is outside the scope ofthis reference page; see the UNIX manual page for regexp(5) or (for a brief introduction) Defining and Displaying VolumeSets and Volume Entries. As an example, the following expression matches volumes that have the string aix anywhere intheir names:

-prefix ’^.*aix’

To display a list of the volumes to be cloned, without actually cloning them, include the -dryrun flag. To display a statement thatsummarizes the criteria being used to select volume, include the -verbose flag.

To back up a single volume, use the vos backup command, which employs a more streamlined technique for finding a singlevolume.


5.6.2 Automating Creation of Backup Volumes

Most cells find that it is best to make a new backup version of relevant volumes each day. It is best to create the backup versionsat a time when usage is low, because the backup operation causes the read/write volume to be unavailable momentarily.

You can either issue the necessary the vos backupsys or vos backup commands at the console or create a cron entry in theBosConfig file on a file server machine, which eliminates the need for an administrator to initiate the backup operation.

The following example command creates a cron process called backupusers in the /usr/afs/local/BosConfig file on the machinefs3.example.com. The process runs every day at 1:00 a.m. to create a backup version of every volume in the cell whose namestarts with the string user. The -localauth flag enables the process to invoke the privileged vos backupsys command whileunauthenticated. Note that the -cmd argument specifies a complete pathname for the vos binary, because the PATH environmentvariable for the BOS Server (running as the local superuser root) generally does not include the path to AFS binaries.

% bos create fs3.example.com backupusers cron\-cmd "/usr/afs/bin/vos backupsys -prefix user -localauth" "1:00"

5.6.3 Making the Contents of Backup Volumes Available to Users

As noted, a backup volume preserves the state of the read/write source at the time the backup is created. Many cells chooseto mount backup volumes so that users can access and restore data they have accidentally deleted or changed since the lastbackup was made, without having to request help from administrators. The most sensible place to mount the backup version ofa user volume is at a subdirectory of the user’s home directory. Suitable names for this directory include OldFiles and Backup.The subdirectory looks just like the user’s own home directory as it was at the time the backup was created, with all files andsubdirectories in the same relative positions.

If you do create and mount backup volumes for your users, inform users of their existence. The OpenAFS User Guide doesnot mention backup volumes because making them available to users is optional. Explain to users how often you make a newbackup, so they know what they can recover. Remind them also that the data in their backup volume cannot change; however,they can use the standard UNIX cp command to copy it into their home volume and modify it there. Reassure users that the datain their backup volumes does not count against their read/write volume quota.

5.6.4 To create and mount a backup volume



2. Verify that you have the insert( i) and administer( a) permissions on the ACL of the directory in which you wish to mountthe volume. If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.



3. Issue the vos backup command to create a backup version of a read/write source volume. The message shown confirmsthe success of the backup operation.

% vos backup <volume name or ID> Created backup volume for volume name or ID

where

backup Must be typed in full.

volume name or ID Identifies the read/write volume to back up, either by its complete name or volume ID number. Thebackup volume has the same name with the addition of the .backup extension. It has its own volume ID number.


4. (Optional) Issue the fs mkmount to mount the backup volume. While this step is optional, Cache Managers cannot accessthe volume’s contents if it is not mounted.

% fs mkmount <directory> <volume name> .backup

where

mk Is the shortest acceptable abbreviation of mkmount.directory Names the mount point to create. Do not create a file or directory of the same name beforehand. Partial

pathnames are interpreted relative to the current working directory. For the backup version of a user volume, theconventional location is the user’s home directory.

volume name.backup Is the full name of the backup volume.



5.6.5 To create multiple backup volumes at once



2. Issue the vos backupsys command to create a backup version of every read/write volume that shares the same prefix orsite. The effects of combining the three arguments are described in Backing Up Multiple Volumes at Once.

% vos backupsys [-prefix <common prefix on volume(s)>+] \[-server <machine name>] [-partition <partition name>] \[-exclude] [-xprefix <negative prefix on volume(s)>+] \[-dryrun] [-verbose]

where

backups Is the shortest acceptable abbreviation of backupsys.

-prefix Specifies one or more simple character strings or regular expressions of any length; a volume whose name includesthe string is placed on the list of volumes to be cloned. Include field separators (such as periods) if appropriate. Thisargument can be combined with any combination of the -server, -partition, -exclude, and -xprefix options.

-server Specifies the file server machine housing the volumes to backup. Can be combined with any combination of the-prefix, -partition, -exclude, and -xprefix options.

-partition Specifies the partition housing the volumes you wish to backup. Can be combined with any combination of the-prefix, -server, -exclude, and -xprefix options.

-exclude Indicates that all volumes except those indicated with the -prefix argument are to be backed up. The -prefixargument must be provided along with this one. Can also be combined with any combination of the -prefix, -server,and -partition arguments; or with both the -prefix and -xprefix arguments, but not with the -xprefix argument alone.

-xprefix Specifies one or more simple character strings or regular expressions of any length; a volume whose name doesnot include the string is placed on the list of volumes to be cloned. Can be combined with any combination of the -prefix, -server, and -partition arguments; in addition, it can be combined with both the -prefix and -exclude options,but not with the -exclude flag alone.

-dryrun Displays on the standard output stream a list of the volumes to be cloned, without actually cloning them.

-verbose Displays on the standard output stream a statement that summarizes the criteria being used to select volumes, ifcombined with the -dryrun flag; otherwise, traces the cloning operation for each volume.


5.7 Mounting Volumes

Mount points make the contents of AFS volumes visible and accessible in the AFS filespace, as described in About MountingVolumes. This section discusses in more detail how the Cache Manager handles mount points as it traverses the filespace. Itdescribes the three types of mount points, their purposes, and how to distinguish between them, and provides instructions forcreating, removing, and examining mount points.

5.7.1 The Rules of Mount Point Traversal

The Cache Manager observes three basic rules as it traverses the AFS filespace and encounters mount points:

• Rule 1: Access Backup and Read-only Volumes When Specified

When the Cache Manager encounters a mount point that specifies a volume with either a .readonly or a .backup extension, itaccesses that type of volume only. If a mount point does not have either a .backup or .readonly extension, the Cache Manageruses Rules 2 and 3.

For example, the Cache Manager never accesses the read/write version of a volume if the mount point names the backupversion. If the specified version is inaccessible, the Cache Manager reports an error.

• Rule 2: Follow the Read-only Path When Possible

If a mount point resides in a read-only volume and the volume that it references is replicated, the Cache Manager attempts toaccess a read-only copy of the volume; if the referenced volume is not replicated, the Cache Manager accesses the read/writecopy. The Cache Manager is thus said to prefer a read-only path through the filespace, accessing read-only volumes when theyare available.

The Cache Manager starts on the read-only path in the first place because it always accesses a read-only copy of the root.afsvolume if it exists; the volume is mounted at the root of a cell’s AFS filespace (named /afs by convention). That is, if the root.afsvolume is replicated, the Cache Manager attempts to access a read-only copy of it rather than the read/write copy. This rulethen keeps the Cache Manager on a read-only path as long as each successive volume is replicated. The implication is thatboth the root.afs and root.cell volumes must be replicated for the Cache Manager to access replicated volumes mounted belowthem in the AFS filespace. The volumes are conventionally mounted at the /afs and /afs/cellname directories, respectively.

• Rule 3: Once on a Read/write Path, Stay There

If a mount point resides in a read/write volume and the volume name does not have a .readonly or a .backup extension, theCache Manager attempts to access only the a read/write version of the volume. The access attempt fails with an error if theread/write version is inaccessible, even if a read-only version is accessible. In this situation the Cache Manager is said to be ona read/write path and cannot switch back to the read-only path unless mount point explicitly names a volume with a .readonlyextension. (Cellular mount points are an important exception to this rule, as explained in the following discussion.

5.7.2 The Three Types of Mount Points

AFS uses three types of mount points, each appropriate for a different purpose because of how the Cache Manager handles them.

• When the Cache Manager crosses a regular mount point, it obeys all three of the mount point traversal rules previouslydescribed.

AFS performs best when the vast majority of mount points in the filespace are regular, because the mount point traversal rulespromote the most efficient use of both replicated and nonreplicated volumes. Because there are likely to be multiple read-onlycopies of a replicated volume, it makes sense for the Cache Manager to access one of them rather than the single read/writeversion, and the second rule leads it to do so. If a volume is not replicated, the third rule means that the Cache Manager stillaccesses the read/write volume when that is the only type available. In other words, a regular mount point does not force theCache Manager always to access read-only volumes (it is explicitly not a "read-only mount point").

To create a regular mount point, use the fs mkmount command as described in To create a regular or read/write mount point.


NoteTo enable the Cache Manager to access the read-only version of a replicated volume named by a regular mount point, allvolumes that are mounted above it in the pathname must also be replicated. That is the only way the Cache Manager canstay on a read-only path to the target volume.

• When the Cache Manager crosses a read/write mount point, it attempts to access only the volume version named in the mountpoint. If the volume name is the base (read/write) form, without a .readonly or .backup extension, the Cache Manager accessesthe read/write version of the volume, even if it is replicated. In other words, the Cache Manager disregards the second mountpoint traversal rule when crossing a read/write mount point: it switches to the read/write path through the filespace.

It is conventional to create only one read/write mount point in a cell’s filespace, using it to mount the cell’s root.cell volumejust below the AFS filespace root (by convention, /afs/.cellname). As indicated, it is conventional to place a period at thestart of the read/write mount point’s name (for example, /afs/.example.com). The period distinguishes the read/write mountpoint from the regular mount point for the root.cell volume at the same level. This is the only case in which it is conventionalto create two mount points for the same volume. A desirable side effect of this naming convention for this read/write mountpoint is that it does not appear in the output of the UNIX ls command unless the -a flag is included, essentially hiding it fromregular users who have no use for it.

The existence of a single read/write mount point at this point in the filespace provides access to the read/write version of everyvolume when necessary, because it puts the Cache Manager on a read/write path right at the top of the filespace. At the sametime, the regular mount point for the root.cell volume puts the Cache Manager on a read-only path most of the time.

Using a read/write mount point for a read-only or backup volume is acceptable, but unnecessary. The first rule of mount pointtraversal already specifies that the Cache Manager accesses them if the volume name in a regular mount point has a .readonlyor .backup extension.

To create a read/write mount point, use the -rw flag on the fs mkmount command as described in To create a regular orread/write mount point.

• When the Cache Manager crosses a cellular mount point, it accesses the indicated volume in the specified cell, which isnormally a foreign cell. (If the mount point does not name a cell along with the volume, the Cache Manager accesses thevolume in the cell where the mount point resides.) When crossing a regular cellular mount point, the Cache Manager disregardsthe third mount point traversal rule. Instead, it accesses a read-only version of the volume if it is replicated, even if the volumethat houses the mount point is read/write.

It is inappropriate to circumvent this behavior by creating a read/write cellular mount point, because traversing the read/writepath imposes an unfair load on the foreign cell’s file server machines. The File Server must issue a callback for each filefetched from the read/write volume, rather than single callback required for a read-only volume. In any case, only a cell’s ownadministrators generally need to access the read/write versions of replicated volumes.

It is conventional to create cellular mount points only at the second level in a cell’s filespace, using them to mount foreign cells’root.cell volumes just below the AFS filespace root (by convention, at /afs/foreign_cellname). The mount point enableslocal users to access the foreign cell’s filespace, assuming they have the necessary permissions on the ACL of the volume’sroot directory and that there is an entry for the foreign cell in each local client machine’s /usr/vice/etc/CellServDB file, asdescribed in Maintaining Knowledge of Database Server Machines.

Creating cellular mount points at other levels in the filespace and mounting foreign volumes other than the root.cell volumeis not generally appropriate. It can be confusing to users if the Cache Manager switches between cells at various points in apathname.

To create a regular cellular mount point, use the -cell argument to specify the cell name, as described in To create a cellularmount point.

To examine a mount point, use the fs lsmount command as described in To display a mount point. The command’s outputuses distinct notation to identify regular, read/write, and cellular mount points. To remove a mount point, use the fs rmmountcommand as described in To remove a mount point.

5.7.3 Creating a mount point in a foreign cell

Creating a mount point in a foreign cell’s filespace (as opposed to mounting a foreign volume in the local cell) is basically thesame as creating a mount point in the local filespace. The differences are that the fs mkmount command’s directory argument


specifies a pathname in the foreign cell rather than the local cell, and you must have the required permissions on the ACL of theforeign directory where you are creating the mount point. The fs mkmount command’s -cell argument always specifies the cellin which the volume resides, not the cell in which to create the mount point.

5.7.4 To display a mount point

1. Issue the fs lsmount command.


where

ls Is the shortest acceptable abbreviation of lsmount.directory Names the mount point to display.

If the specified directory is a mount point, the output is of the following form:

’directory’ is a mount point for volume ’volume name’

For a regular mount point, a number sign (#) precedes the volume name string, as in the following example command issued ona client machine in the example.com cell.

% fs lsmount /afs/example.com/usr/terry’/afs/example.com/usr/terry’ is a mount point for volume ’#user.terry’

For a read/write mount point, a percent sign (%) precedes the volume name string, as in the following example command issuedon a client machine in the example.com cell. The cell’s administrators have followed the convention of preceding the read/writemount point’s name with a period.

% fs lsmount /afs/.example.com’/afs/.example.com’ is a mount point for volume ’%root.cell’

For a cellular mount point, a cell name and colon (:) follow the number or percent sign and precede the volume name string, asin the following example command issued on a client machine in the example.com cell.

% fs lsmount /afs/example.org’/afs/example.org’ is a mount point for volume ’#example.org:root.cell’

For a symbolic link to a mount point, the output is of the form shown in the following example command issued on a clientmachine in the example.com cell.

% fs lsmount /afs/example’/afs/example’ is a symbolic link, leading to a mount point for volume ’#root.cell’

If the directory is not a mount point or is not in AFS, the output reads as follows.

’directory’ is not a mount point.

If the output is garbled, it is possible that the mount point has become corrupted in the local cache. Use the fs flushmountcommand as described in To flush one or more mount points. This forces the Cache Manager to refetch the mount point.

5.7.5 To create a regular or read/write mount point

1. Verify that you have the i( insert) and a( administer) permissions on the ACL of the directory where you are placing themount point. If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.



2. Issue the fs mkmount command to create the mount point. Include the -rw flag if creating a read/write mount point.

% fs mkmount <directory> <volume name> [-rw]

where

mk Is the shortest acceptable abbreviation for mkmount.directory Names the mount point to create. A file or directory with the same name cannot already exist. A partial

pathname is interpreted relative to the current working directory.Specify the read/write path to the mount point, to avoid the failure that results when you attempt to create a newmount point in a read-only volume. By convention, you indicate the read/write path by placing a period before thecell name at the pathname’s second level (for example, /afs/.example.com). For further discussion of the concept ofread/write and read-only paths through the filespace, see The Rules of Mount Point Traversal.

volume name Specifies the volume’s full name, including the .backup or .readonly extension for a backup or read-onlyvolume, if appropriate.

-rw Creates a read/write mount point.

5.7.6 To create a cellular mount point

1. Verify that you have the i( insert) and a( administer) permissions on the ACL of the directory where you are placing themount point. If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.


2. If you are mounting one or more foreign cells’ root.cell volume at the second level in your filespace and your cell’s root.afsvolume is replicated, you must create a temporary mount point for the root.afs volume’s read/write version in a directoryon which the ACL grants you the i and a permissions. The following command creates a mount point called new_cells inyour cell’s /afs/.cellname directory (the entry point to the read/write path in your cell).

Substitute your cell’s name for cellname.

% cd /afs/.cellname% fs mkmount new_cells root.afs% cd new_cells

3. Issue the fs mkmount command with the -cell argument to create a cellular mount point. Repeat the command for eachcellular mount point as required.

% fs mkmount <directory> <volume name> -cell <cell name>

where

mk Is the shortest acceptable abbreviation for mkmount.directory Names the mount point to create. A file or directory with the same name cannot already exist. A partial

pathname is interpreted relative to the current working directory. If you are mounting a foreign cell’s root.cellvolume, the standard value for this argument is the cell’s complete Internet domain name.

volume name Specifies the volume’s full name, usually root.cell for a cellular mount point.

-cell Specifies the complete Internet domain name of the cell in which the volume resides.

4. If you performed the instructions in Step 2, issue the vos release command to release the new version of the root.afsvolume to its read-only sites. (This command requires that you be listed in your cell’s /usr/afs/etc/UserList file. Ifnecessary, verify by issuing the bos listusers command, which is fully described in To display the users in the UserListfile.)

Also issue the fs checkvolumes command to force the local Cache Manager to access the new replica of the root.afsvolume. If desired, you can also remove the temporary new_cells mount point from the /afs/.cellname directory.


% vos release root.afs% fs checkvolumes% cd /afs/.cellname% fs rmmount new_cells

For your users to access a newly mounted foreign cell, you must also create an entry for it in each client machine’s local/usr/vice/etc/CellServDB file and either reboot the machine or use the fs newcell command to insert the entry directly intoits kernel memory. See the instructions in Maintaining Knowledge of Database Server Machines.

5.7.7 To remove a mount point

1. Verify that you have the d( delete) permission on the ACL of the directory from which you are removing the mount point.If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.



2. Issue the fs rmmount command to remove the mount point. The volume still exists, but its contents are inaccessible if thisis the only mount point for it.

% fs rmmount <directory>

where

rm Is the shortest acceptable abbreviation of rmmount.directory Names the mount point to remove. A partial pathname is interpreted relative to the current working directory.

Specify the read/write path to the mount point, to avoid the failure that results when you attempt to delete a mountpoint from a read-only volume. By convention, you indicate the read/write path by placing a period before the cellname at the pathname’s second level (for example, /afs/.example.com). For further discussion of the concept ofread/write and read-only paths through the filespace, see The Rules of Mount Point Traversal.

5.7.8 To access volumes directly by volume ID

You can directly access volumes by volume IDs. This is only recommended for temporary access to a volume. For example, ifyou have recently restored a volume from a backup, you can use this syntax to quickly access the volume without mounting it.

This feature is available with Windows and Linux based cache managers by default. This feature is available with other Unixbased cache managers when the dynamic root (-dynroot) and fake stat (-fakestat) modes are enabled.

Examples:

• Windows: \\afs\example.com%root.cell - Access a read/write volume directly

• Windows: \\afs\example.com#root.cell - Access a read-only volume directly

• Unix: /afs/.:mount/example.com:root.cell - Access a read/write volume directly

• Unix: /afs/.:mount/example.com:root.cell.readonly - Access a read-only volume directly

• Unix: /afs/.:mount/example.com:root.cell.backup - Access a backup volume directly

5.8 Displaying Information About Volumes

This section explains how to display information about volumes. If you know a volume’s name or volume ID number, thereare commands for displaying its VLDB entry, its volume header, or both. Other commands display the name or location of thevolume that contains a specified file or directory.

For instructions on displaying a volume’s quota, see Setting and Displaying Volume Quota and Current Size.


5.8.1 Displaying VLDB Entries

The vos listvldb command displays the VLDB entry for the volumes indicated by the combination of arguments you provide.The possibilities are listed here from most to least inclusive:

• To display every entry in the VLDB, provide no arguments. It can take a long time to generate the output, depending on thenumber of entries.

• To display every VLDB entry that mentions a specific file server machine as the site of a volume, specify the machine’s namewith the -server argument.

• To display every VLDB entry that mentions a certain partition on any file server machine as the site of a volume, specify thepartition name with the -partition argument.

• To display every VLDB entry that mentions a certain partition on a certain file server machine as the site of a volume, combinethe -server and -partition arguments.

• To display a single VLDB entry, specify a volume name or ID number with the -name argument.

• To display the VLDB entry only for volumes with locked VLDB entries, use the -locked flag with any of the site definitionsmentioned previously.

5.8.2 To display VLDB entries

1. Issue the vos listvldb command.

% vos listvldb [-name <volume name or ID>] [-server <machine name>] \[-partition <partition name>] [-locked]

where

listvl Is the shortest acceptable abbreviation of listvldb.

-name Identifies one volume either by its complete name or volume ID number. Do not combine this argument with the-server or -partition arguments.

-server Specifies a file server machine. Combine this argument with the -partition argument if desired, but not with the-name argument.

-partition Specifies a partition. Combine this argument with the -server argument if desired, but not with the -nameargument.

-locked Displays only locked VLDB entries. Combine this flag with any of the other options.

The VLDB entry for each volume includes the following information:

• The base (read/write) volume name. The read-only and backup versions have the same name with a .readonly and .backupextension, respectively.

• The volume ID numbers allocated to the versions of the volume that actually exist, in fields labeled RWrite for the read-/write, ROnly for the read-only, Backup for the backup, and RClone for the ReleaseClone. (If a field does not appear, thecorresponding version of the volume does not exist.) The appearance of the RClone field normally indicates that a releaseoperation did not complete successfully; the Old release and New release flags often also appear on one or more ofthe site definition lines described just following.

• The number of sites that house a read/write or read-only copy of the volume, following the string number of sites ->.

• A line for each site that houses a read/write or read-only copy of the volume, specifying the file server machine, partition, andtype of volume (RW for read/write or RO for read-only). If a backup version exists, it is understood to share the read/write site.Several flags can appear with a site definition:


Not released Indicates that the vos release command has not been issued since the vos addsite command was used todefine the read-only site.

Old release Indicates that a vos release command did not complete successfully, leaving the previous, obsolete versionof the volume at this site.

New release Indicates that a vos release command did not complete successfully, but that this site did receive the correctnew version of the volume.

• If the VLDB entry is locked, the string Volume is currently LOCKED.

For further discussion of the New release and Old release flags, see Replicating Volumes (Creating Read-only Volumes).

An example of this command and its output for a single volume:

% vos listvldb user.terryuser.terry

RWrite: 50489902 Backup: 50489904number of sites -> 1

server fs3.example.com partition /vicepc RW Site

5.8.3 Displaying Volume Headers

The vos listvol command displays the volume header for every volume on one or all partitions on a file server machine. The voscommand interpreter obtains the information from the Volume Server on the specified machine. You can control the amount ofinformation displayed by including one of the -fast, the -long, or the -extended flags described following the instructions in Todisplay volume headers.

To display a single volume’s volume header of one volume only, use the vos examine command as described in Displaying OneVolume’s VLDB Entry and Volume Header.

5.8.4 To display volume headers

1. Issue the vos listvol command.

% vos listvol <machine name> [<partition name>] [-fast] [-long] [-extended]

where

listvo Is the shortest acceptable abbreviation of listvol.machine name Names the file server machine for which to display volume headers. Provide this argument alone or with

the partition name argument.

partition name Names one partition on the file server machine named by the machine name argument, which must beprovided along with this one.

-fast Displays only the volume ID numbers of relevant volumes. Do not combine this flag with the -long or -extendedflag.

-long Displays more detailed information about each volume. Do not combine this flag with the -fast or -extended flag.

-extended Displays all of the information displayed by the -long flag, plus tables of statistics about reads and writes tothe files in the volume. Do not combine this flag with the -fast or -long flag.

The output is ordered alphabetically by volume name and by default provides the following information on a single line for eachvolume:

• Name

• Volume ID number

• Type (the flag is RW for read/write, RO for read-only, BK for backup)


• Size in kilobytes (1024 equals a megabyte)

• Number of files in the volume, if the -extended flag is provided

• Status on the file server machine, which is one of the following:

On-line The volume is completely accessible to Cache Managers.

Off-line The volume is not accessible to Cache Managers, but does not seem to be corrupted. This status appears while avolume is being dumped, for example.

Off-line**needs salvage** The volume is not accessible to Cache Managers, because it seems to be corrupted. Usethe bos salvage or salvager command to repair the corruption.

If the following message appears instead of the previously listed information, it indicates that a volume is not accessible to CacheManagers or the vos command interpreter, for example because a clone is being created.

**** Volume volume_ID is busy ****

If the following message appears instead of the previously listed information, it indicates that the File Server is unable to attachthe volume, perhaps because it is seriously corrupted. The FileLog and VolserLog log files in the /usr/afs/logs directory on thefile server machine possibly provide additional information; use the bos getlog command to display them.

**** Could not attach volume volume_ID ****

(For instructions on salvaging a corrupted or unattached volume, see Salvaging Volumes.)

The information about individual volumes is bracketed by summary lines. The first line of output specifies the number of volumesin the listing. The last line of output summarizes the number of volumes that are online, offline, and busy, as in the followingexample:

% vos listvol fs2.example.com /vicepbTotal number of volumes on server fs2.example.com \

partition /vicepb : 66sys 1969534847 RW 1582 K On-linesys.backup 1969535105 BK 1582 K On-line

. . . . . .

. . . . . .user.pat 1969534536 RW 17518 K On-lineuser.pat.backup 1969534538 BK 17537 K On-lineTotal volumes onLine 66 ; Total volumes offLine 0 ; Total busy 0

Output with the -fast Flag

If you include the -fast flag displays only the volume ID number of each volume, arranged in increasing numerical order, as inthe following example. The final line (which summarizes the number of on-line, off-line, and busy volumes) is omitted.

% vos listvol fs3.example.com /vicepa -fTotal number of volumes on server fs3.example.com \

partition /vicepa: 375048990250489904

.

.3597032549732810

Output with the -long Flag

When you include the -long flag, , the output for each volume includes all of the information in the default listing plus thefollowing. Each item in this list corresponds to a separate line of output:

• The file server machine and partition that house the volume, as determined by the command interpreter as the command runs,rather than derived from the VLDB or the volume header.


• The volume ID numbers associated with the various versions of the volume: read/write (RWrite), read-only (ROnly), backup(Backup), and ReleaseClone (RClone). One of them matches the volume ID number that appears on the first line of thevolume’s output. If the value in the RWrite, ROnly, or Backup field is 0 (zero), there is no volume of that type. If there iscurrently no ReleaseClone, the RClone field does not appear at all.

• The maximum space quota allotted to the read/write copy of the volume, expressed in kilobyte blocks in the MaxQuota field.

• The date and time the volume was created, in the Creation field. If the volume has been restored with the backup diskre-store, backup volrestore, or vos restore command, this is the restore time.

• The date and time when the contents of the volume last changed, in the Last Update field. For read-only and backupvolumes, it matches the timestamp in the Creation field.

• The number of times the volume has been accessed for a fetch or store operation since the later of the two following times:

– 12:00 a.m. on the day the command is issued

– The last time the volume changed location

An example of the output when the -long flag is included:

% vos listvol fs2.example.com b -longTotal number of volumes on server fs2.example.com

partition /vicepb: 66. . . . . .. . . . . .

user.pat 1969534536 RW 17518 K On-linefs2.example.com /vicepbRWrite 1969534536 ROnly 0 Backup 1969534538MaxQuota 20000 KCreation Mon Jun 12 09:02:25 1989Last Update Thu Jan 4 17:39:34 19901573 accesses in the past day (i.e., vnode references)

user.pat.backup 1969534538 BK 17537 K On-linefs2.example.com /vicepbRWrite 1969534536 ROnly 0 Backup 1969534538MaxQuota 20000 KCreation Fri Jan 5 06:37:59 1990Last Update Fri Jan 5 06:37:59 19900 accesses in the past day (i.e., vnode references)

. . . . .

. . . . .Total volumes onLine 66 ; Total volumes offLine 0 ; Total busy 0

Output with the -extended Flag

When you include the -extended flag, the output for each volume includes all of the information reported with the -long flag,plus two tables of statistics:

• The table labeled Raw Read/Write Stats table summarizes the number of times the volume has been accessed forreading or writing.

• The table labeled Writes Affecting Authorship table contains information on writes made to files and directories inthe specified volume.

An example of the output when the -extended flag is included:

% vos listvol fs3.example.com a -extendedcommon.bboards 1969535592 RW 23149 K used 9401 files On-line

fs3.example.com /vicepaRWrite 1969535592 ROnly 0 Backup 1969535594MaxQuota 30000 K


Creation Mon Mar 8 14:26:05 1999Last Update Mon Apr 26 09:20:43 199911533 accesses in the past day (i.e., vnode references)

Raw Read/Write Stats|-------------------------------------------|| Same Network | Diff Network ||----------|----------|----------|----------|| Total | Auth | Total | Auth ||----------|----------|----------|----------|

Reads | 151 | 151 | 1092 | 1068 |Writes | 3 | 3 | 324 | 324 |

|-------------------------------------------|Writes Affecting Authorship

|-------------------------------------------|| File Authorship | Directory Authorship||----------|----------|----------|----------|| Same | Diff | Same | Diff ||----------|----------|----------|----------|

0-60 sec | 92 | 0 | 100 | 4 |1-10 min | 1 | 0 | 14 | 6 |10min-1hr | 0 | 0 | 19 | 4 |1hr-1day | 1 | 0 | 13 | 0 |1day-1wk | 1 | 0 | 1 | 0 |> 1wk | 0 | 0 | 0 | 0 |

|-------------------------------------------|

5.8.5 Displaying One Volume’s VLDB Entry and Volume Header

The vos examine command displays information from both the VLDB and the volume header for a single volume. There is someredundancy in the information from the two sources, which allows you to compare the VLDB and volume header.

Because the volume header for each version of a volume (read/write, read-only, and backup) is different, you can specify whichone to display. Include the .readonly or .backup extension on the volume name or ID argument as appropriate. The informationfrom the VLDB is the same for all three versions.

5.8.6 To display one volume’s VLDB entry and volume header

1. Issue the vos examine command.


where

e Is the shortest acceptable abbreviation of examine.

volume name or ID Identifies one volume either by its complete name or volume ID number. It can be a read/write,read-only, or backup type. Use the .backup or .readonly extension if appropriate.

The top part of the output displays the same information from a volume header as the vos listvol command with the -long flag,as described following the instructions in To display volume headers. If you specify the read-only version of the volume and itexists at more than one site, the output includes all of them. The bottom part of the output lists the same information from theVLDB as the vos listvldb command, as described following the instructions in To display VLDB entries.

Below is an example for a volume whose VLDB entry is currently locked.

% vos examine user.terryuser.terry 536870981 RW 3459 K On-line

fs3.example.com /vicepaWrite 5360870981 ROnly 0 Backup 536870983


MaxQuota 40000 KCreation Mon Jun 12 15:22:06 1989Last Update Fri Jun 16 09:34:35 19895719 accesses in the past day (i.e., vnode references)RWrite: 5360870981 Backup: 536870983number of sites -> 1

server fs3.example.com partition /vicepa RW SiteVolume is currently LOCKED

5.8.7 Displaying the Name or Location of the Volume that Contains a File

This section explains how to learn the name, volume ID number, or location of the volume that contains a file or directory.

You can also use one piece of information about a volume (for example, its name) to obtain other information about it (forexample, its location). The following list points you to the relevant instructions:

• To use a volume’s name to learn the volume ID numbers of all its existing versions, use the vos examine command as describedin To display one volume’s VLDB entry and volume header.

You can also use the command to learn a volume’s name by providing its ID number.

• To use a volume’s name or ID number to learn its location, use the vos listvldb command as described in To display VLDBentries.

5.8.7.1 To display the name of the volume that contains a file

1. Issue the fs listquota command.

% fs listquota [<dir/file path>]

where

lq Is an acceptable alias for listquota(and listq the shortest acceptable abbreviation).

dir/file path Names a directory or file housed in the volume for which to display the name. Partial pathnames are inter-preted relative to the current working directory, which is the default if this argument is omitted.

The following is an example of the output:

% fs listquota /afs/example.com/usr/terryVolume Name Quota Used % Used Partitionuser.terry 15000 5071 34% 86%

5.8.7.2 To display the ID number of the volume that contains a file

1. Issue the fs examine command.

% fs examine [<dir/file path>]

where

exa Is the shortest acceptable abbreviation of examine.

dir/file path Names a directory or file housed in the volume for which to display the volume ID. Partial pathnames areinterpreted relative to the current working directory, which is the default if this argument is omitted.

The following example illustrates how the output reports the volume ID number in the vid field.


% fs examine /afs/example.com/usr/terryVolume status for vid = 50489902 named user.terryCurrent maximum quota is 15000Current blocks used are 5073The partition has 46383 blocks available out of 333305

NoteThe partition-related statistics in this command’s output do not always agree with the corresponding values in the output ofthe standard UNIX df command. The statistics reported by this command can be up to five minutes old, because the CacheManager polls the File Server for partition information at that frequency. Also, on some operating systems, the df command’sreport of partition size includes reserved space not included in this command’s calculation, and so is likely to be about 10%larger.

5.8.7.3 To display the location of the volume that contains a file

1. Issue the fs whereis command to display the name of the file server machine that houses the volume containing a file ordirectory.

% fs whereis [<dir/file path>]

where

whe Is the shortest acceptable abbreviation of whereis.

dir/file path Names a directory or file for which to report the location. Partial pathnames are interpreted relative to thecurrent working directory, which is the default if this argument is omitted.

The output displays the file server machine that houses the volume containing the file, as in the following example:

% fs whereis /afs/example.com/user/terryFile /afs/example.com/usr/terry is on host fs2.example.com

2. If you also want to know which partition houses the volume, first issue the fs listquota command to display the volume’sname. For complete syntax, see To display the name of the volume that contains a file.

% fs listquota [<dir/file path>]

Then issue the vos listvldb command, providing the volume name as the volume name or ID argument. For completesyntax and a description of the output, see To display VLDB entries.

% vos listvldb <volume name or ID>

5.9 Moving Volumes

There are three main reasons to move volumes:

• To place volumes on other partitions or machines temporarily while repairing or replacing a disk or file server machine.

• To free space on a partition that is becoming overcrowded. One symptom of overcrowding is that users cannot to save fileseven though the relevant volume is below its quota. The following error message confirms the problem:

afs: failed to store file (partition full)

You can track available space on AFS server partitions by using the scout or afsmonitor programs described in Monitoringand Auditing AFS Performance.


• A file server machine is becoming overloaded because it houses many more volumes than other machines of the same size, orhas volumes with more popular files in them.

To move a read/write volume, use the vos move command as described in the following instructions. Before attempting to movethe volume, the vos command interpreter verifies that there is enough free space for it on the destination partition. If not, it doesnot attempt the move operation and prints the following message.

vos: no space on target partition destination_part to move volume volume

To move a read-only volume, you actually remove the volume from the current site by issuing the vos remove command asdescribed in To remove a volume and unmount it. Then define a new site and release the volume to it by issuing the vos addsiteand vos release commands as described in To replicate a read/write volume (create a read-only volume).

A backup volume always resides at the same site as its read/write source volume, so you cannot move a backup volume exceptas part of moving the read/write source. The vos move command automatically deletes the backup version when you move aread/write volume. To create a new backup volume at the new site as soon as the move operation completes, issue the vos backupcommand as described in To create and mount a backup volume.

5.9.1 To move a read/write volume



2. Issue the vos move command to move the volume. Type it on a single line; it appears on multiple lines here only forlegibility.

% vos move <volume name or ID> \ <machine name on source><partition name on source > \ <machine name on destination> <partition name ←↩

ondestination>

where

m Is the shortest acceptable abbreviation of move.

volume name or ID Specifies the name or volume ID number of the read/write volume to move.

machine name on source Names the file server machine currently housing the volume.

partition name on source Names the partition currently housing the volume.

machine name on destination Names the file server machine to which to move the volume.

partition name on destination Names the partition to which to move the volume.

NoteIt is best not to halt a vos move operation before it completes, because parts of the volume can be left on both the sourceand destination machines. For more information, see the command’s reference page in the OpenAFS AdministrationReference.

3. (Optional) Issue the vos listvldb command to confirm the success of the move. Complete instructions appear in To displayVLDB entries.


4. If a backup version existed at the read/write volume’s previous site, create a new backup at the new site by issuing the vosbackup command, which is fully described in To create and mount a backup volume.

% vos backup <volume name or ID>


5.10 Synchronizing the VLDB and Volume Headers

AFS can provide transparent file access because the Volume Location Database (VLDB) constantly tracks volume locations.When the Cache Manager needs a file, it contacts the Volume Location (VL) Server, which reads the VLDB for the currentlocation of the volume containing the file. Therefore, the VLDB must accurately reflect the state of volumes on the file servermachines at all times. The Volume Server and VL Server automatically update a volume’s VLDB entry when its status changesduring a vos operation, by performing the following series of steps.

1. The VL Server locks the VLDB entry. The lock advises other operations not to manipulate any of the volume versions(read/write, read-only, or backup), which prevents the inconsistency that can result from multiple simultaneous operations.

2. The VL Server sets an intention flag in the VLDB entry that indicates the kind of operation to be performed. This flag neverappears in VLDB listings because it is for internal use only. In case the operation terminates prematurely, this flag tellsthe Salvager which operation was interrupted. (The Salvager then determines the steps necessary either to complete theoperation or return the volume to a previous consistent state. For more information on salvaging, see Salvaging Volumes.)

3. The Volume Server manipulates the volume. It usually sets the Off-line flag in the volume header, which makes thevolume inaccessible to the File Server and other Volume Server operations during the manipulation. When the operationcompletes, the volume is again marked On-line.

4. The VL Server records any changes resulting from the operation in the VLDB entry. Once the operation is complete, itremoves the intention flag set in Step 2and releases the lock set in Step 1.

If a vos operation fails while the Volume Server is manipulating the volume (corresponding to Step 3), the volume can be left inan intermediate state, which is termed corruption. In this case, the Off-line or Off-line**needs salvage** markerusually appears at the end of the first line of output from the vos examine command. To repair the corruption, run the Salvagerbefore attempting to resynchronize the VLDB and volume headers. For salvaging instructions, see Salvaging Volumes.

More commonly, an interruption while flags are being set or removed (corresponding to Step 1, Step 2, or Step 4) causes adiscrepancy between the VLDB and volume headers. To resynchronize the VLDB and volumes, use the vos syncvldb and vossyncserv commands. To achieve complete VLDB consistency, it is best to run the vos syncvldb command on all file servermachines in the cell, and then run the vos syncserv command on all file server machines in the cell.

There are several symptoms that indicate a volume operation failed:

• Error messages on the standard error stream or in server process log files indicate that an operation terminated abnormally.Perhaps you had to halt the operation before it completed (for instance, by using a signal such as Ctrl-c), or a file servermachine or server process was not functioning when the operation ran. To determine if a machine or process is still notfunctioning, issue the bos status command as described in Displaying Process Status and Information from the BosConfigFile.

• A subsequent vos operation fails because a previous failure left a VLDB entry locked. Sometimes an error message reportsthat a volume is locked. To display a list of locked volumes, use the -locked flag on the vos listvldb command as described inDisplaying VLDB Entries.

If the only problem with a volume is that its VLDB entry is locked, you probably do not need to synchronize the entire VLDB.Instead use the vos unlock or vos unlockvldb command to unlock the entry, as described in Unlocking and Locking VLDBEntries.

• A subsequent vos operation fails because a previous failure left a volume marked as offline. To check a volume’s currentstatus, check the first line of output from the vos examine command as described in Displaying One Volume’s VLDB Entryand Volume Header.

The vos syncvldb command corrects the information in the Volume Location Database (VLDB) either about all volumes housedon a file server machine, about the volumes on just one partition, or about a single volume. If checking about one or morepartitions, the command contacts the Volume Server to obtain a list of the volumes that actually reside on each partition. It thenobtains the VLDB entry for each volume from the VL Server. It changes the VLDB entry as necessary to reflect the state of thevolume on the partition. For example, it creates or updates a VLDB entry when it finds a volume for which the VLDB entry ismissing or incomplete. However, if there is already a VLDB entry that defines a different location for the volume, or there are


irreconcilable conflicts with other VLDB entries, it instead writes a message about the conflict to the standard error stream. Thecommand never removes volumes from the file server machine.

When checking a single volume’s VLDB entry, the command also automatically performs the operations invoked by the vossyncserv command: it not only verifies that the VLDB entry is correct for the specified volume type (read/write, backup, orread-only), but also checks that any related volume types mentioned in the VLDB entry actually exist at the site listed in theentry.

The vos syncserv command verifies that each volume type (read/write, read-only, and backup) mentioned in a VLDB entryactually exists at the site indicated in the entry. It checks all VLDB entries that mention a site either on any of a file servermachine’s partitions or on one partition. Note that command can end up inspecting sites other than on the specified machine orpartition, if there are read-only versions of the volume at sites other than the read/write site.

The command alters any incorrect information in the VLDB, unless there is an irreconcilable conflict with other VLDB entries.In that case, it writes a message to the standard error stream instead. The command never removes volumes from their sites.

5.10.1 To synchronize the VLDB with volume headers



2. Issue the vos syncvldb command to make the VLDB reflect the true state of all volumes on a machine or partition, or thestate of one volume.

NoteTo synchronize the VLDB completely, issue the command repeatedly, substituting each file server machine in your cellfor the -server argument in turn and omitting the -partition and -volume arguments, before proceeding to Step 3.

% vos syncvldb -server <machine name> [-partition <partition name>] \[-volume <volume name or ID>] [-verbose >> file]

where

syncv Is the shortest acceptable abbreviation of syncvldb.

-server Names the file server machine housing the volumes for which to verify VLDB entries. If you are also providingthe -volume argument, this argument must name the machine where the volume actually resides.

-partition Identifies the partition (on the file server machine specified by the -server argument) housing the volumes forwhich to verify VLDB entries. In general, it is best to omit this argument so that either the VLDB entries for allvolumes on a server machine are corrected (if you do not provide the -volume argument), or so that you do not needto guarantee that the partition actually houses the volume named by the -volume argument.

-volume Specifies the name or volume ID number of a single volume for which to verify the VLDB entry.

-verbose >> file Directs a detailed trace to the file called file, which can be either in AFS or on the local disk of themachine on which you are issuing the command. The command often writes a large amount of output to the standardoutput stream; writing it to a file enables you to examine the output more carefully.

3. Issue the vos syncserv command to inspect each volume for which the VLDB lists a version at the specified site.

NoteTo synchronize the VLDB completely, issue the command repeatedly, substituting each file server machine in your cellfor the machine name argument in turn and omitting the partition name argument.

% vos syncserv <machine name> [<partition name>] [-v >> file]


where

syncs Is the shortest acceptable abbreviation of syncserv.

machine name Names the file server machine mentioned in each VLDB entry to check.

partition name Identifies the partition mentioned in each VLDB entry to check. If synchronizing the entire VLDB, omitthis argument.

-v >> file Directs a detailed trace to the file called file, which can be either in AFS or on the local disk of the machineon which you are issuing the command. The command often writes a large amount of output to the standard outputstream; writing it to a file enables you to examine the output more carefully.

5.11 Salvaging Volumes

An unexpected interruption while the Volume Server or File Server is manipulating the data in a volume can leave the volumein an intermediate state (corrupted), rather than just creating a discrepancy between the information in the VLDB and volumeheaders. For example, the failure of the operation that saves changes to a file (by overwriting old data with new) can leave theold and new data mixed together on the disk.

If an operation halts because the Volume Server or File Server exits unexpectedly, the BOS Server automatically shuts downall components of the fs process and invokes the Salvager. The Salvager checks for and repairs any inconsistencies it can.Sometimes, however, there are symptoms of the following sort, which indicate corruption serious enough to create problems butnot serious enough to cause the File Server component to fail. In these cases you can invoke the Salvager yourself by issuing thebos salvage command.

• Symptom: A file appears in the output of the ls command, but attempts to access the file fail with messages indicating that itdoes not exist.

Possible cause: The Volume Server or File Server exited in the middle of a file-creation operation, after changing the directorystructure, but before actually storing data. (Other possible causes are that the ACL on the directory does not grant the permis-sions you need to access the file, or there is a process, machine, or network outage. Check for these causes before assumingthe file is corrupted.)

Salvager’s solution: Remove the file’s entry from the directory structure.

• Symptom: A volume is marked Off-line in the output from the vos examine and vos listvol commands, or attempts toaccess the volume fail.

Possible cause: Two files or versions of a file are sharing the same disk blocks because of an interrupted operation. TheFile Server and Volume Server normally refuse to attach volumes that exhibit this type of corruption, because it can be verydangerous. If the Volume Server or File Server do attach the volume but are unsure of the status of the affected disk blocks,they sometimes try to write yet more data there. When they cannot perform the write, the data is lost. This effect can cascade,causing loss of all data on a partition.

Salvager’s solution: Delete the data from the corrupted disk blocks in preference to losing an entire partition.

• Symptom: There is less space available on the partition than you expect based on the size statistic reported for each volumeby the vos listvol command.

Possible cause: There are orphaned files and directories. An orphaned element is completely inaccessible because it is notreferenced by any directory that can act as its parent (is higher in the file tree). An orphaned element is not counted in thecalculation of a volume’s size (or against its quota), even though it occupies space on the server partition.

Salvager’s solution: By default, print a message to the /usr/afs/logs/SalvageLog file reporting how many orphans werefound and the approximate number of kilobytes they are consuming. You can use the -orphans argument to remove or attachorphaned elements instead. See To salvage volumes.

When you notice symptoms such as these, use the bos salvage command to invoke the Salvager before corruption spreads. (Eventhough it operates on volumes, the command belongs to the bos suite because the BOS Server must coordinate the shutdown andrestart of the Volume Server and File Server with the Salvager. It shuts them down before the Salvager starts, and automaticallyrestarts them when the salvage operation finishes.)


All of the AFS data stored on a file server machine is inaccessible during the salvage of one or more partitions. If you salvagejust one volume, it alone is inaccessible.

When processing one or more partitions, the command restores consistency to corrupted read/write volumes where possible. Forread-only or backup volumes, it inspects only the volume header:

• If the volume header is corrupted, the Salvager removes the volume completely and records the removal in its log file, /usr/af-s/logs/SalvageLog. Issue the vos release or vos backup command to create the read-only or backup volume again.

• If the volume header is intact, the Salvager skips the volume (does not check for corruption in the contents). However, if theFile Server notices corruption as it initializes, it sometimes refuses to attach the volume or bring it online. In this case, it issimplest to remove the volume by issuing the vos remove or vos zap command. Then issue the vos release or vos backupcommand to create it again.

Combine the bos salvage command’s arguments as indicated to salvage different numbers of volumes:

• To salvage all volumes on a file server machine, combine the -server argument and the -all flag.

• To salvage all volumes on one partition, combine the -server and -partition arguments.

• To salvage only one read/write volume, combine the -server, -partition, and -volume arguments. Only that volume is inac-cessible to Cache Managers, because the BOS Server does not shutdown the File Server and Volume Server processes duringthe salvage of a single volume. Do not name a read-only or backup volume with the -volume argument. Instead, remove thevolume, using the vos remove or vos zap command. Then create a new copy of the volume with the vos release or vos backupcommand.

The Salvager always writes a trace to the /usr/afs/logs/SalvageLog file on the file server machine where it runs. To record thetrace in another file as well (either in AFS or on the local disk of the machine where you issue the bos salvage command), namethe file with the -file argument. Or, to display the trace on the standard output stream as it is written to the /usr/afs/logs/Sal-vageLog file, include the -showlog flag.

By default, multiple Salvager subprocesses run in parallel: one for each partition up to four, and four subprocesses for four ormore partitions. To increase or decrease the number of subprocesses running in parallel, provide a positive integer value for the-parallel argument.

If there is more than one server partition on a physical disk, the Salvager by default salvages them serially to avoid the inefficiencyof constantly moving the disk head from one partition to another. However, this strategy is often not ideal if the partitions areconfigured as logical volumes that span multiple disks. To force the Salvager to salvage logical volumes in parallel, providethe string all as the value for the -parallel argument. Provide a positive integer to specify the number of subprocesses to run inparallel (for example, -parallel 5all for five subprocesses), or omit the integer to run up to four subprocesses, depending on thenumber of logical volumes being salvaged.

The Salvager creates temporary files as it runs, by default writing them to the partition it is salvaging. The number of files can bequite large, and if the partition is too full to accommodate them, the Salvager terminates without completing the salvage operation(it always removes the temporary files before exiting). Other Salvager subprocesses running at the same time continue until theyfinish salvaging all other partitions where there is enough disk space for temporary files. To complete the interrupted salvage,reissue the command against the appropriate partitions, adding the -tmpdir argument to redirect the temporary files to a localdisk directory that has enough space.

The -orphans argument controls how the Salvager handles orphaned files and directories that it finds on server partitions it issalvaging. An orphaned element is completely inaccessible because it is not referenced by the vnode of any directory that canact as its parent (is higher in the filespace). Orphaned objects occupy space on the server partition, but do not count against thevolume’s quota.

During the salvage, the output of the bos status command reports the following auxiliary status for the fs process:

Salvaging file system


5.11.1 To salvage volumes



2. Issue the bos salvage command to salvage one or more volumes.

% bos salvage -server <machine name> [-partition <salvage partition>] \[-volume <salvage volume number or volume name>] \[-file salvage log output file] [-all] [-showlog] \[-parallel <# of max parallel partition salvaging>] \[-tmpdir <directory to place tmp files>] \[-orphans <ignore | remove | attach >]

where

-server Names the file server machine on which to salvage volumes. This argument can be combined either with the -allflag, the -partition argument, or both the -partition and -volume arguments.

-partition Names a single partition on which to salvage all volumes. The -server argument must be provided along withthis one.

-volume Specifies the name or volume ID number of one read/write volume to salvage. Combine this argument with the-server and -partition arguments.

-file Specifies the complete pathname of a file into which to write a trace of the salvage operation, in addition to the/usr/afs/logs/SalvageLog file on the server machine. If the file pathname is local, the trace is written to the specifiedfile on the local disk of the machine where the bos salvage command is issued. If the -volume argument is included,the file can be in AFS, though not in the volume being salvaged. Do not combine this argument with the -showlogflag.

-all Salvages all volumes on all of the partitions on the machine named by the -server argument.-showlog Displays the trace of the salvage operation on the standard output stream, as well as writing it to the /usr/af-

s/logs/SalvageLog file.-parallel Specifies the maximum number of Salvager subprocesses to run in parallel. Provide one of three values:

• An integer from the range 1 to 32. A value of 1 means that a single Salvager process salvages the partitionssequentially.

• The string all to run up to four Salvager subprocesses in parallel on partitions formatted as logical volumes thatspan multiple physical disks. Use this value only with such logical volumes.

• The string all followed immediately (with no intervening space) by an integer from the range 1 to 32, to run thespecified number of Salvager subprocesses in parallel on partitions formatted as logical volumes. Use this valueonly with such logical volumes.

The BOS Server never starts more Salvager subprocesses than there are partitions, and always starts only one processto salvage a single volume. If this argument is omitted, up to four Salvager subprocesses run in parallel.

-tmpdir Specifies the full pathname of a local disk directory to which the Salvager process writes temporary files as itruns. By default, it writes them to the partition it is currently salvaging.

-orphans Controls how the Salvager handles orphaned files and directories. Choose one of the following three values:ignore Leaves the orphaned objects on the disk, but prints a message to the /usr/afs/logs/SalvageLog file reporting

how many orphans were found and the approximate number of kilobytes they are consuming. This is the defaultif you omit the -orphans argument.

remove Removes the orphaned objects, and prints a message to the /usr/afs/logs/SalvageLog file reporting howmany orphans were removed and the approximate number of kilobytes they were consuming.

attach Attaches the orphaned objects by creating a reference to them in the vnode of the volume’s root directory.Since each object’s actual name is now lost, the Salvager assigns each one a name of the following form:

_ _ORPHANFILE_ _. index for files_ _ORPHANDIR_ _. index for directories

where index is a two-digit number that uniquely identifies each object. The orphans are charged against thevolume’s quota and appear in the output of the ls command issued against the volume’s root directory.


5.12 Setting and Displaying Volume Quota and Current Size

Every AFS volume has an associated quota which limits the volume’s size. The default quota for a newly created volume is 5,000kilobyte blocks (slightly less that 5 MB). When a volume reaches its quota, the File Server rejects attempts to create new files ordirectories in it. If an application is writing data into an existing file in a full volume, the File Server allows a defined overage(by default, 1 MB). (You can use the fileserver command’s -spare or -pctspare argument to change the default overage; see thecommand’s reference page in the OpenAFS Administration Reference.)

To set a quota other than 5000 KB as you create a volume, include the -maxquota argument to the vos create command, asdescribed in Creating Read/write Volumes. To modify an existing volume’s quota, issue either the fs setquota or the fs setvolcommand as described in the following instructions. Do not set an existing volume’s quota lower than its current size.

In general, smaller volumes are easier to administer than larger ones. If you need to move volumes, say for load-balancingpurposes, it is easier to find enough free space on other partitions for small volumes. Move operations complete more quickly forsmall volumes, reducing the potential for outages or other errors to interrupt the move. AFS supports a maximum volume size,which can vary for different AFS releases; see the OpenAFS Release Notes for the version you are using. Also, the size of apartition or logical places an absolute limit on volume size, because a volume cannot span multiple partitions or logical volumes.

It is generally safe to overpack partitions by putting more volumes on them than can actually fit if all the volumes reach theirmaximum quota. However, only experience determines to what degree overpacking works in your cell. It depends on what kindof quota you assign to volumes (particularly user volumes, which are more likely than system volumes to grow unpredictably)and how much information people generate and store in comparison to their quota.

There are several commands that display a volume’s quota, as described in the following instructions. They differ in how muchrelated information they produce.

5.12.1 To set quota for a single volume

1. Verify that you belong to the system:administrators group. If necessary, issue the pts membership command, which isfully described in To display the members of the system:administrators group.

% pts membership system:administrators

2. Issue the fs setquota command to set the volume’s maximum quota.

% fs setquota [<dir/file path>] -max <max quota in kbytes>

where

sq Is an acceptable alias for setquota.

dir/file path Names a file or directory in the volume for which to set the indicated quota. Partial pathnames are interpretedrelative to the current working directory, which is the default if you omit this argument.Specify the read/write path to the file or directory, to avoid the failure that results when you attempt to change aread-only volume. By convention, you indicate the read/write path by placing a period before the cell name at thepathname’s second level (for example, /afs/.example.com). For further discussion of the concept of read/write andread-only paths through the filespace, see The Rules of Mount Point Traversal.

max quota in kbytes Sets the volume’s quota, expressed in kilobyte blocks ( 1024 equals a megabyte). A value of 0grants an unlimited quota, but the size of the partition imposes an absolute limit. You must include the -max switchif omitting the dir/file path argument (to set the quota on the volume that houses the current working directory).

5.12.2 To set maximum quota on one or more volumes




2. Issue the fs setvol command to set the quota on one or more volumes.

% fs setvol [<dir/file path>+] -max <disk space quota in 1K units>

where

sv Is an acceptable alias for setvol.dir/file path Names one file or directory that resides in each volume for which to set the indicated quota. Partial pathnames

are interpreted relative to the current working directory, which is the default if you omit this argument.

disk space quota in 1K units Sets the maximum quota on each volume, expressed in kilobytes blocks ( 1024 equals amegabyte). A value of 0 grants an unlimited quota, but the size of the partition does impose an absolute limit.

5.12.3 To display percent quota used

1. Issue the fs quota command.

% fs quota [<dir/file path>+]

where

q Is the shortest acceptable abbreviation of quota.

dir/file path Names a directory or file in each volume for which to display percent quota used. Partial pathnames areinterpreted relative to the current working directory, which is the default if you omit this argument.

The following example illustrates the output produced by this command:

% fs quota /afs/example.com/usr/terry34% of quota used.

5.12.4 To display quota, current size, and other information

1. Issue the fs listquota command.

% fs listquota [<dir/file path>+]

where

lq Is an alias for listquota.

dir/file path Names a directory or file in each volume for which to display quota along with volume name and currentspace usage. Partial pathnames are interpreted relative to the current working directory, which is the default if youomit this argument.

As illustrated in the following example, the output reports the volume’s name, its quota and current size (both in kilobyte units),the percent quota used, and the percentage of space on the volume’s host partition that is used.

% fs listquota /afs/example.com/usr/terryVolume Name Quota Used % Used Partitionuser.terry 15000 5071 34% 86%


5.12.5 To display quota, current size, and more partition information

1. Issue the fs examine command.

% fs examine [<dir/file path>+]

where

exa Is the shortest acceptable abbreviation of examine.

dir/file path Names a directory or file in each volume for which to display quota information and information about thehost partition. Partial pathnames are interpreted relative to the current working directory, which is the default if youomit this argument.

As illustrated in the following example, the output displays the volume’s volume ID number and name, its quota and current size(both in kilobyte units), and the free and total number of kilobyte blocks on the volume’s host partition.

% fs examine /afs/example.com/usr/terryVolume status for vid = 50489902 named user.terryCurrent maximum quota is 15000Current blocks used are 5073The partition has 46383 blocks available out of 333305

NoteThe partition-related statistics in this command’s output do not always agree with the corresponding values in the output ofthe standard UNIX df command. The statistics reported by this command can be up to five minutes old, because the CacheManager polls the File Server for partition information at that frequency. Also, on some operating systems, the df command’sreport of partition size includes reserved space not included in this command’s calculation, and so is likely to be about 10%larger.

5.13 Removing Volumes and their Mount Points

To remove a volume from its site and its record from the VLDB, use the vos remove command. Use it to remove any of the threetypes of volumes; the effect depends on the type.

• If you indicate the read/write volume by specifying the volume’s base name without a .readonly or .backup extension, thecommand removes both the read/write and associated backup volume from the partition that houses them. You do not needto provide the -server and -partition arguments, because there can be only one read/write site. The site information is alsoremoved from the VLDB entry, and the site count (reported by the vos examine and vos listvldb commands as number ofsites) decrements by one. The read/write and backup volume ID numbers no longer appear in the output from the vosexamine and vos listvldb commands, but they are preserved internally. Read-only sites, if any, are not affected, but cannot bechanged unless a read/write site is again defined. The entire VLDB entry is removed if there are no read-only sites.

If there are no read-only copies left, it is best to remove the volume’s mount point to prevent attempts to access the volume’scontents. Do not remove the mount point if copies of the read-only volume remain.

• If you indicate a read-only volume by including the .readonly extension on its name, it is removed from the partition that housesit, and the corresponding site information is removed from the VLDB entry. The site count reported by the vos examine andvos listvldb commands as number of sites decrements by one for each volume you remove.

If there is more than one read-only site, you must include the -server argument (and optionally -partition argument) to specifythe site from which to remove the volume. If there is only one read-only site, the volume name is sufficient; if no read/writevolume exists in this case, the entire VLDB entry is removed.

It is not generally appropriate to remove the volume’s mount point when removing a read-only volume, especially if theread/write version of the volume still exists. If the read/write version no longer exists, remove the mount point as described inStep 5of To remove a volume and unmount it.


• If you indicate a backup volume by including the .backup extension on its name, it is removed from the partition that housesit and its site information is removed from the VLDB entry. You do not need to provide the -server and -partition arguments,because there can be only one backup site. The backup volume ID number no longer appears in the output from the vosexamine or vos listvldb command, but is preserved internally.

In the standard configuration, there is a separate mount point for the backup version of a user volume. Remember to removethe mount point to prevent attempt to access the nonexistent volume’s contents.

5.13.1 Other Removal Commands

The vos remove command is almost always the appropriate way to remove a volume, because it automatically removes a volume’sVLDB entry and both the volume header and all data from the partition. If either the VLDB entry or volume header does notexist, it is sometimes necessary to use other commands that remove only the remaining element. Do not use these commands inthe normal case when both the VLDB entry and the volume header exist, because by definition they create discrepancies betweenthem. For details on the commands’ syntax, see their reference pages in the OpenAFS Administration Reference.

The vos zap command removes a volume from its site by removing the volume header and volume data for which a VLDBentry no longer exists. You can tell a VLDB entry is missing if the vos listvol command displays the volume header but thevos examine or vos listvldb command cannot locate the VLDB entry. You must run this command to correct the discrepancy,because the vos syncvldb and vos syncserv commands never remove volume headers.

The vos remsite command removes a read-only site definition from the VLDB without affecting the volume on the file servermachine. Use this command when you have mistakenly issued the vos addsite command to define a read-only site, but have notyet issued the vos release command to release the volume to the site. If you have actually released a volume to the site, use thevos remove command instead.

The vos delentry command removes the entire VLDB entry that mentions the volume you specify. If versions of the volumeactually exist on file server machines, they are not affected. This command is useful if you know for certain that a volumeremoval was not recorded in the VLDB (perhaps you used the vos zap command during an emergency), and do not want to takethe time to resynchronize the entire VLDB with the vos syncvldb and vos syncserv commands.

5.13.2 To remove a volume and unmount it



2. If removing the volume’s mount point, verify that you have the d( delete) permission on its parent directory’s ACL. Ifnecessary, issue the fs listacl command, which is fully described in Displaying ACLs.



3. (Optional) Dump the volume to a file or to tape, in case you want to restore it later. To copy the volume’s contents to a file,use the vos dump command as instructed in Dumping and Restoring Volumes. You can then copy the file to tape using athird-party backup utility or an archiving utility such as the UNIX tar command.

Alternatively, use the AFS Backup System to create a tape copy. In this case, it can be convenient to create a temporaryvolume set that includes only the volume of interest. Temporary volume sets are not recorded in the Backup Database, andso do not clutter database with records for volume sets that you use only once. For instructions, see To create a dump.

4. Issue the vos remove command to remove the volume. If removing a read-only volume from multiple sites, repeat thecommand for each one.

% vos remove [-server machine name>] [-partition <partition name>] \-id <volume name or ID>


where

remo Is the shortest acceptable abbreviation of remove.

-server Specifies the file server machine on which the volume resides. It is necessary only when the -id argument namesa read-only volume that exists at multiple sites.

-partition Specifies the partition on machine name where the volume resides. It is necessary only when the -id argumentnames a read-only volume that exists at multiple sites. Provide the -server argument along with this one.

-id Identifies the volume to remove, either by its complete name or volume ID number. If identifying a read-only orbackup volume by name, include the appropriate extension ( .readonly or .backup).

5. If you are removing the last existing version of the volume, issue the fs rmmount command remove the correspondingmount point. Complete instructions appear in To remove a volume and unmount it.

If you are removing a backup volume that is mounted in the conventional way (at a subdirectory of its read/write volume’sroot directory), then removing the source volume’s mount point in this step is sufficient to remove the backup volume’smount point. If you mounted the backup at a completely separate directory, you need to repeat this step for the backupvolume’s mount point.


6. (Optional) If you created a dump file in Step 3, transfer it to tape. The preferred method is to use the AFS Backup System,which is described in Configuring the AFS Backup Systemand Backing Up and Restoring AFS Data.

5.14 Dumping and Restoring Volumes

Dumping a volume with the vos dump command converts its contents into ASCII format and writes them to the file you specify.The vos restore command places a dump file’s contents into a volume after converting them into the volume format appropriatefor the indicated file server machine.

5.14.1 About Dumping Volumes

Dumping a volume can be useful in several situations, including the following:

• You want to back it up to tape, perhaps by using a third-party backup utility. To facilitate this type of backup operation, the vosdump command can write to a named pipe. To learn about using the AFS Backup System instead, see Configuring the AFSBackup Systemand Backing Up and Restoring AFS Data.

• You are removing the volume from your cell (perhaps because its owner is leaving your cell). The vos dump commandenables you to create a copy for safekeeping without incurring the overhead of the Backup System. For complete instructionson removing a volume, see Removing Volumes and their Mount Points.

• You want to create a copy of the volume for safekeeping on a non-AFS server partition, perhaps while you move the actualvolume to another machine or perform maintenance tasks on the partition that houses the volume.

• You need to replace a corrupted read/write volume. If an uncorrupted read-only or backup version of the volume exists, dumpit and restore the data into the read/write volume, overwriting the corrupted contents.

• You want to copy or transfer the contents of the volume to another cell. You cannot use the vos move command, because AFSsupports volume moves only between file server machines that belong to the same cell.

• You want to have another read/write copy of the volume’s contents. The second volume must have a different name than theoriginal one. If you want the contents of the two volumes to remain identical, you must update them both manually. AFSprovides no facility for keeping read/write volumes synchronized in this way.

• You want a copy of only the files and directories in the volume with modification time stamps after a certain date. The vosdump command can create an incremental dump file as described in Step 3of the following instructions.


You can use the vos dump command to create a full dump, which contains the complete contents of the volume at the time youissue the command, or an incremental dump, which contains only those files and directories with modification timestamps (asdisplayed by the ls -l command) that are later than a date and time you specify. See Step 3of the following instructions.

Dumping a volume does not change its VLDB entry or permanently affect its status on the file server machine, but the volume’scontents are inaccessible during the dump operation. To avoid interrupting access to the volume, it is generally best to dump thevolume’s backup version, just after using the vos backup or vos backupsys command to create a new backup version.

If you do not provide a filename into which to write the dump, the vos dump command directs the output to the standard outputstream. You can pipe it directly to the vos restore command if you wish.

Because a volume dump file is in ASCII format, you can read its contents using a text editor or a command such as the catcommand. However, dump files sometimes contain special characters that do not have alphanumeric correlates, which can causeproblems for some display programs.

By default, the vos command interpreter consults the Volume Location Database (VLDB) to learn the volume’s location, so the-server and -partition arguments are not required. If the -id argument identifies a read-only volume that resides at multiplesites, then the command dumps the version from just one of them (normally, the one listed first in the volume’s VLDB entry asreported by the vos examine or vos listvldb command). To dump the read-only volume from a particular site, use the -serverand -partition arguments to specify the site. To bypass the VLDB lookup entirely, provide a volume ID number (rather thana volume name) as the value for the -id argument, along with the -server and -partition arguments. This makes it possible todump a volume for which there is no VLDB entry.

5.14.2 To dump a volume



2. Verify that you have the permissions necessary to create the dump file. If placing it in AFS, you must have the i( insert)permission on the ACL of the file’s directory. If necessary, issue the fs listacl command, which is fully described inDisplaying ACLs.



3. Issue the vos dump command to dump the volume.

% vos dump -id <volume name or ID> [-time <dump from time>] [-file <arg>] [-server < ←↩server>] [-partition <partition>]

where

-id Identifies the volume to be dumped by its complete name or volume ID number. If you want to dump the read-only orbackup version, specify its volume ID number or add the appropriate extension ( .readonly or .backup) to the name.To bypass the normal VLDB lookup of the volume’s location, provide the volume ID number and combine thisargument with the -server and -partition arguments.

-time Specifies whether the dump is full or incremental. Omit this argument to create a full dump, or provide one of threeacceptable values:

• The value 0(zero) to create a full dump.• A date in the format mm / dd / yyyy (month, day and year) to create an incremental dump that includes only files

and directories with modification timestamps later than midnight (12:00 a.m.) on the indicated date. Valid valuesfor the year range from 1970 to 2037; higher values are not valid because the latest possible date in the standardUNIX representation is in 2038. The command interpreter automatically reduces later dates to the maximum value.An example is 01/13/1999.


• A date and time in the format " mm / dd / yyyy hh : MM " to create an incremental dump that includes onlyfiles and directories with modification timestamps later than the specified date and time. The date format is thesame as for a date alone. Express the time as hours and minutes (hh:MM) in 24-hour format (for example, 20:30is 8:30 p.m.). Surround the entire expression with double quotes (" ") because it contains a space. An example is"01/13/1999 22:30".

-file Specifies the pathname of the file to which to write the dump. The file can be in AFS, but not in the volume beingdumped. A partial pathname is interpreted relative to the current working directory. Omit this argument to direct thedump to the standard output stream.

-server Specifies the file server machine on which the volume resides. Provide the -partition argument along with thisone.

-partition Specifies the partition on which the volume resides. Provide the -server argument along with this one.

5.14.3 About Restoring Volumes

Although you can dump any of the three types of volumes (read/write, read-only, or backup), you can restore a dump file to thefile system only as a read/write volume, using the vos restore command. The command automatically translates the dump file’scontents from ASCII back into the volume format appropriate for the file server machine that stores the restored version. As withthe vos dump command, you can restore a dump file via a named pipe, which facilitates interoperation with third-party backuputilities.

You can restore the contents of a dump file in one of two basic ways. In either case, you must restore a full dump of the volumebefore restoring any incremental dumps. Any incremental dumps that you then restore must have been created after the fulldump. If there is more than one incremental dump, you must restore them in the order they were created.

• You can restore volume data into a brand new volume with a new name and at a location that you specify. See To restore adump into a new volume and mount it.

You can assign a volume ID number as you restore the volume, though it is best to have the Volume Server allocate a volumenumber automatically. The most common reason for specifying the volume ID is that a volume’s VLDB entry has disappearedfor some reason, but you know the former read/write volume ID number and want to reuse it.

• You can restore volume data into an existing volume (usually the one that was previously dumped), overwriting its currentcontents. This is convenient if the current contents are corrupted or otherwise incorrect, because it allows you to replace themwith a coherent version from the past or from one of the volume’s clones. See To restore a dump file, overwriting an existingvolume.

Provide the -overwrite argument to preconfirm that you wish to overwrite the volume’s contents, and to specify whether youare restoring a full or incremental dump. If you omit the -overwrite argument, the Volume Server generates the followingprompt to confirm that you want to overwrite the existing volume with either a full ( f) or incremental ( i) dump:

Do you want to do a full/incremental restore or abort? [fia](a):

If you pipe in the dump file via the standard input stream instead of using the -file argument to name it, you must include the-overwrite argument because there is nowhere for the Volume Server to display the prompt in this case.

You can move the volume to a new site as you overwrite it with a full dump, by using the -server and -partition arguments tospecify the new site. You cannot move the volume when restoring an incremental dump.

The vos restore command sets the restored volume’s creation date in the volume header to the time of the restore operation, asreported in the Creation field in the output from the vos examine and vos listvol commands.

5.14.4 To restore a dump into a new volume and mount it




2. Verify that you have permissions needed to read the dump file and to mount the new volume. If the dump file resides inAFS, you need the r( read) permission on the ACL of its directory. You need the i( insert) and a( administer) permissionson the ACL of the directory where you are mounting the new volume. If necessary, issue the fs listacl command, which isfully described in Displaying ACLs.



3. Select a site (disk partition on a file server machine) for the new volume. If your cell groups different types of volumesonto different file server machines, that can guide your decision. It often makes sense to put the volume on the emptiestpartition that meets your other criteria. To display how much space is available on a file server machine’s partitions, usethe vos partinfo command, which is described fully in Creating Read/write Volumes.


4. Issue the vos restore command to create a new volume and restore the dump file into it. Type it on a single line; it appearson multiple lines here only for legibility.

% vos restore <machine name> <partition name> \<name of volume to be restored> \[-file <dump file>] [-id <volume ID>]

where

res Is the shortest acceptable abbreviation of restore.

machine name Names the file server machine on which to create the new volume.

partition name Names the partition on which to create the new volume.

name of volume to be restored Names the new read/write volume, which must not already have a VLDB entry. It can beup to 22 characters in length.

-file Is the dump file to restore. Partial pathnames are interpreted with respect to the current working directory. Omit thisargument if using a pipe to read in the dump file from the standard input stream.

-volume Specifies the new volume’s ID number. It is appropriate only if you are restoring a volume that no longer existsand want to use the volume ID number it had previously.

5. Issue the fs mkmount command to mount the new volume, making its contents accessible. Complete instructions appearin To create a regular or read/write mount point.




5.14.5 To restore a dump file, overwriting an existing volume



2. Verify that you have permissions needed to read the dump file. If it resides in AFS, you need the r( read) permission onthe ACL of its directory. If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.




3. Restore the contents of the dump file into a read/write volume, overwriting the current contents. The volume retains itscurrent volume ID number. Type it on a single line; it appears on multiple lines here only for legibility.

% vos restore <machine name> <partition name> \<name of volume to be restored> \[-file <dump file>] [-id <volume ID>]

where

res Is the shortest acceptable abbreviation of restore.

machine name Names the file server machine where the volume already exists, or the machine to which to move it. Inthe latter case, the value for the -overwrite argument must be full.

partition name Names the partition where the volume already exists, or the partition to which to move it. In the lattercase, the value for the -overwrite argument must be full.

name of volume to be restored Names the read/write volume to overwrite with the contents of the dump file.

-file Is the dump file to restore. Partial pathnames are interpreted with respect to the current working directory. Omit thisargument if using a pipe to read in the dump file from the standard input stream; in this case, you must provide the-overwrite argument.

-overwrite Preconfirms that you want to overwrite the existing volume and specifies which type of dump file you arerestoring. Provide one of the following values:

• f or full if restoring a full dump file• i or incremental if restoring an incremental dump file. This value is not acceptable if you are moving the volume

while restoring it.• a to terminate the restore operation

4. If the volume is replicated, issue the vos release command to release the newly restored contents to read-only sites.Complete instructions appear in Replicating Volumes (Creating Read-only Volumes).

% vos release <volume name or ID>

5. Issue the vos backup command to create a new backup version of the volume. Complete instructions appear in CreatingBackup Volumes.

% vos backup <volume name or ID>

5.15 Renaming Volumes

You can use the vos rename command to rename a volume. For example, it is appropriate to rename a user’s home volume ifyou use the user. username convention for user volume names and you change the username. (For complete instructions forchanging usernames, see Changing Usernames.)

The vos rename command accepts only read/write volume names, but automatically changes the names of the associated read-only and backup volumes. As directed in the following instructions, you need to replace the volume’s current mount point witha new one that reflects the name change.


5.15.1 To rename a volume



2. Verify that you have the a( administer), d( delete), and i( insert) access permissions for the directory in which you arereplacing the volume’s mount point. If necessary, issue the fs listacl command, which is fully described in DisplayingACLs.



3. Issue the vos rename command to rename the volume.

% vos rename <old volume name> <new volume name>

where

ren Is the shortest acceptable abbreviation of rename.

old volume name Is the current name of a read/write volume.

new volume name Is the new name for the volume. It cannot be more than 22 characters in length.

If there is no Volume Location Database (VLDB) entry for the specified current volume name, the command fails with thefollowing error message:

vos: Could not find entry for volume old_volume_name.

4. Issue the fs rmmount command to remove the mount point that refers to the volume’s old name. Complete instructionsappear in To remove a mount point.


5. Issue the fs mkmount to create a mount point that indicates the volume’s new name. Complete instructions appear in Tocreate a regular or read/write mount point.

% fs mkmount <directory> <volume name> [-rw]

5.16 Unlocking and Locking VLDB Entries

As detailed in Synchronizing the VLDB and Volume Headers, The Volume Location (VL) Server locks the Volume LocationDatabase (VLDB) entry for a volume before the Volume Server executes any operation on it. No other operation can affect avolume with a locked VLDB entry, so the lock prevents the inconsistency or corruption that can result from multiple simultaneousoperations on a volume.

To verify that a VLDB entry is locked, issue the vos listvldb command as described in To display VLDB entries. The commandhas a -locked flag that displays locked entries only. If the VLDB entry is locked, the string Volume is currently LOCKEDappears on the last line of the volume’s output.

To lock a VLDB entry yourself, use the vos lock command. This is useful when you suspect something is wrong with a volumeand you want to prevent any changes to it while you are investigating the problem.

To unlock a locked VLDB entry, issue the vos unlock command, which unlocks a single VLDB entry, or the vos unlockvldbcommand, which unlocks potentially many entries. This is useful when a volume operation fails prematurely and leaves a VLDBentry locked, preventing you from acting to correct the problems resulting from the failure.


5.16.1 To lock a VLDB entry



2. Issue the vos lock to lock the entry.

% vos lock <volume name or ID>

where

lo Is the shortest acceptable abbreviation of lock.

volume name or ID Identifies the volume to be locked, either by its complete name or volume ID number. It can be anyof the three versions of the volume.

5.16.2 To unlock a single VLDB entry



2. Issue the vos unlock command to unlock the entry.

% vos unlock <volume name or ID>

where

unlock Must be typed in full.

volume name or ID Identifies the volume to be unlocked, either by its complete name or volume ID number. It can beany of the three versions of the volume.

5.16.3 To unlock multiple VLDB entries



2. Issue the vos unlockvldb command to unlock the desired entries.

% vos unlockvldb [<machine name>] [<partition name>]

where

unlockv Is the shortest acceptable abbreviation of unlockvldb.

machine name Specifies a file server machine. Provide this argument alone to unlock all VLDB entries that mention themachine in a site definition. Omit both this argument and the partition name argument to unlock all VLDB entries.

partition name Specifies a partition. Provide this argument alone to unlock all VLDB entries that mention the partition(on any machine) in a site definition. Omit both this argument and the machine name argument to unlock all VLDBentries.


Chapter 6

Configuring the AFS Backup System

The AFS Backup System helps you to create backup copies of data from AFS volumes and to restore data to the file system ifit is lost or corrupted. This chapter explains how to configure the Backup System. For instructions on backing up and restoringdata and displaying dump records, see Backing Up and Restoring AFS Data.



Determine tape capacity and filemark size fmsDefine Tape Coordinator entry in Backup Database backup addhostRemove Tape Coordinator entry from Backup Database backup delhostDisplay Tape Coordinator entries from Backup Database backup listhostsCreate volume set backup addvolsetAdd volume entry to volume set backup addvolentryList volume sets and entries backup listvolsetsDelete volume set from Backup Database backup delvolsetDelete volume entry from volume set backup delvolentryDefine dump level backup adddumpChange expiration date on existing dump level backup setexpDelete dump level from dump hierarchy backup deldumpDisplay dump hierarchy backup listdumpsLabel tape backup labeltapeRead label on tape backup readlabel

6.2 Introduction to Backup System Features

The AFS Backup System is highly flexible, enabling you to control most aspects of the backup process, including how oftenbackups are performed, which volumes are backed up, and whether to dump all of the data in a volume or just the data thathas changed since the last dump operation. You can also take advantage of several features that automate much of the backupprocess.

To administer and use the Backup System most efficiently, it helps to be familiar with its basic features, which are described inthe following sections. For pointers to instructions for implementing the features as you configure the Backup System in yourcell, see Overview of Backup System Configuration.


6.2.1 Volume Sets and Volume Entries

When you back up AFS data, you specify which data to include in terms of complete volumes rather than individual files. Moreprecisely, you define groups of volumes called volume sets, each of which includes one or more volumes that you want to backup in a single operation. You must include a volume in a volume set to back it up, because the command that backs up data (thebackup dump command) does not accept individual volume names.

A volume set consists of one or more volume entries, each of which specifies which volumes to back up based on their location(file server machine and partition) and volume name. You can use a wildcard notation to include all volumes that share a location,a common character string in their names, or both.

For instructions on creating and removing volume sets and volume entries, see Defining and Displaying Volume Sets and VolumeEntries.

6.2.2 Dumps and Dump Sets

A dump is the collection of data that results from backing up a volume set. A full dump includes all of the data in every volumein the volume set, as it exists at the time of the dump operation. An incremental dump includes only some of the data fromthe volumes in the volume set, namely those files and directory structures that have changed since a specified previous dumpoperation was performed. The previous dump is referred to as the incremental dump’s parent dump, and it can be either a fulldump or an incremental dump itself.

A dump set is a collection of one or more dumps stored together on one or more tapes. The first dump in the dump set is the initialdump, and any subsequent dump added onto the end of an existing dump set is an appended dump. Appending dumps is alwaysoptional, but maximizes use of a tape’s capacity. In contrast, creating only initial dumps can result in many partially filled tapes,because an initial dump must always start on a new tape, but does not necessarily extend to the end of the tape. Appended dumpsdo not have to be related to one another or to the initial dump (they do not have to be dumps of the same or related volume sets),but well-planned appending can reduce the number of times you have to change tapes during a restore operation. For example, itcan make sense to append incremental dumps of a volume set together in a single dump set.

All the records for a dump set are indexed together in the Backup Database based on the initial dump (for more on the BackupDatabase, see The Backup Database and Backup Server Process). To delete the database record of an appended dump, you mustdelete the initial dump record, and doing so deletes the records for all dumps in the dump set. Similarly, you cannot recycle justone tape in a dump set without deleting the database records of all tapes in the dump set.

For instructions on creating an initial dump, see Backing Up Data, and to learn how to append dumps, see Appending Dumps toan Existing Dump Set.

6.2.3 Dump Hierarchies, Dump Levels and Expiration Dates

A dump hierarchy is a logical structure that defines the relationship between full and incremental dumps; that is, it defines whichdump serves as the parent for an incremental dump. Each individual component of a hierarchy is a dump level. When you createa dump by issuing the backup dump command, you specify a volume set name and a dump level name. The Backup Systemuses the dump level to determine whether the dump is full or incremental, and if incremental, which dump level to use as theparent.

You can associate an expiration date with a dump level, to define when a dump created at that level expires. The Backup Systemrefuses to overwrite a tape until all dumps in the dump set to which the tape belongs have expired, so assigning expiration datesautomatically determines how you recycle tapes. You can define an expiration date either in absolute terms (for example, 13January 2000) or relative terms (for example, 30 days from when the dump is created). You can also change the expiration dateassociated with a dump level (but not with an actual dump that has already been created at that level).

For instructions on creating dump hierarchies, assigning expiration dates, and establishing a tape recycling schedule, see Definingand Displaying the Dump Hierarchy.

6.2.4 Dump Names and Tape Names

When you create a dump, the Backup System creates a Backup Database record for it, assigning a name comprising the volumeset name and the last element in the dump level pathname:


volume_set_name.dump_level_name

For example, a dump of the volume set user at the dump level /sunday/friday is called user.friday. The Backup System alsoassigns a unique dump ID number to the dump to distinguish it from other dumps with the same name that possibly exist.

The Backup System assigns a similar AFS tape name to each tape that contains a dump set, reflecting the volume set and dumplevel of the dump set’s initial dump, plus a numerical index of the tape’s position in the dump set, and a unique dump ID number:

volume_set_name.dump_level_name.tape_index (dump ID)

For example, the second tape in a dump set whose initial dump is of the volume set uservol at the dump level /sunday/fridayhas AFS tape name like uservol.friday.2 (914382400).

In addition to its AFS tape name, a tape can have an optional permanent name that you assign. Unlike the AFS tape name, thepermanent name does not have to indicate the volume set and dump level of the initial (or any other) dump, and so does notchange depending on the contents of the tape. The Backup System does not require a certain format for permanent names, soyou need to make sure that each tape’s name is unique. If a tape has a permanent name, the Backup System uses it rather thanthe AFS tape name when referring to the tape in prompts and the output from most backup commands, but still tracks the AFStape name internally.

6.2.5 Tape Labels, Dump Labels, and EOF Markers

Every tape used in the Backup System has a magnetic label at the beginning that records the tape’s name, capacity, and otherinformation. You can use the backup labeltape command to write a label, or the backup dump command creates one automat-ically if you use an unlabeled tape. The label records the following information:

• The tape’s permanent name, which you can assign by using the -pname argument to the backup labeltape command. It canbe any string of up to 32 characters. If you do not assign a permanent name, the Backup System records the value <NULL>when you use the backup labeltape command to assign an AFS tape name, or when you use the backup dump command towrite a dump to the tape.

• The tape’s AFS tape name, which can be one of three types of values:

– A name that reflects the volume set and dump level of the dump set’s initial dump and the tape’s place in the sequence oftapes for the dump set, as described in Dump Names and Tape Names. If the tape does not have a permanent name, you canassign the AFS tape name by using the -name argument to the backup labeltape command.

– The value <NULL>, which results when you assign a permanent name, or provide no value for the backup labeltapecommand’s -name argument.

– No AFS tape name at all, indicating that you have never labeled the tape or written a dump to it.

If a tape does not already have an actual AFS tape name when you write a dump to it, the Backup System constructs andrecords the appropriate AFS tape name. If the tape does have an AFS tape name and you are writing an initial dump, then thename must correctly reflect the dump’s volume set and dump level.

• The capacity, or size, of the tape, followed by a letter that indicates the unit of measure (k or K for kilobytes, m or M formegabytes, g or G for gigabytes, or t or T for terabytes). The tape’s manufacturer determines the tape’s capacity. For furtherdiscussion of how the Backup System uses the value in the capacity field, see Configuring the tapeconfig File.

For information about labeling tapes, see Writing and Reading Tape Labels.

In addition to the tape label, the Backup System writes a dump label on the tape for every appended dump (the tape label anddump label are the same for the initial dump). A dump label records the following information:

• The name of the tape containing the dump

• The date and time that the dump operation began

• The cell to which the volumes in the dump belong


• The dump’s size in kilobytes

• The dump’s dump level

• The dump’s dump ID

The Backup System writes a filemark (also called an End-of-File or EOF marker) between the data from each volume in a dump.The tape device’s manufacturer determines the filemark size, which is typically between 2 KB and 2 MB; in general, the largerthe usual capacity of the tapes that the device uses, the larger the filemark size. If a dump contains a small amount of data fromeach of a large number of volumes, as incremental dumps often do, then the filemark size can significantly affect how muchvolume data fits on the tape. To enable the Backup System to factor in filemark size as it writes a dump, you can record thefilemark size in a configuration file; see Configuring the tapeconfig File.

6.2.6 Tape Coordinator Machines, Port Offsets, and Backup Data Files

A Tape Coordinator machine is a machine that drives one or more attached tape devices used for backup operations. It mustrun the AFS client software (the Cache Manager) but reside in a physically secure location to prevent unauthorized access to itsconsole. Before backup operations can run on a Tape Coordinator machine, each tape device on the machine must be registeredin the Backup Database, and certain files and directories must exist on the machine’s local disk; for instructions, see To configurea Tape Coordinator machine.

Each tape device on a Tape Coordinator machine listens for backup requests on a different UNIX port. You pick the port indirectlyby assigning a port offset number to the tape device. The Backup System sets the device’s actual port by adding the port offset toa base port number that it determines internally. For instructions on assigning port offset numbers, see Configuring the tapeconfigFile.

For a tape device to perform backup operations, a Backup Tape Coordinator (butc) process dedicated to the device must be run-ning actively on the Tape Coordinator machine. You then direct backup requests to the device’s Tape Coordinator by specifyingits port offset number with the -portoffset argument to the backup command.

In addition to writing backup data to tape, you can direct it to a backup data file on the local disk of a Tape Coordinator machine.You can then to transfer the data to a data-archiving system, such as a hierarchical storage management (HSM) system, that youuse in conjunction with AFS and the Backup System. A backup data file has a port offset like a tape device. For instructions onconfiguring backup data files, see Dumping Data to a Backup Data File.

6.2.7 The Backup Database and Backup Server Process

The Backup Database is a replicated administrative database maintained by the Backup Server process on the cell’s databaseserver machines. Like the other AFS database server processes, the Backup Server uses the Ubik utility to keep the variouscopies of the database synchronized (for a discussion of Ubik, see Replicating the OpenAFS Administrative Databases).

The Backup Database records the following information:

• The Tape Coordinator machine’s hostname and the port offset number for each tape device used for backup operations

• The dump hierarchy, which consists of its component dump levels and their associated expiration dates

• The volume sets and their component volume entries

• A record for each dump, which includes the name of each tape it appears on, a list of the volumes from which data is included,the dump level, the expiration date, and the dump ID of the initial dump with which the dump is associated

• A record for each tape that houses dumped data


6.2.8 Interfaces to the Backup System

The backup suite of commands is the administrative interface to the Backup System. You can issue the commands in a commandshell (or invoke them in a shell script) on any AFS client or server machine from which you can access the backup binary. In theconventional configuration, the binary resides on the local disk.

The backup command suite provides an interactive mode, in which you can issue multiple commands over a persistent connec-tion to the Backup Server and the Volume Location (VL) Server. Interactive mode has several convenient features, including thefollowing:

• You need to type only the operation code, omitting the initial backup string.

• If you assume another AFS identity or specify a foreign cell as you enter interactive mode, it applies to all subsequent com-mands.

• You do not need to enclose shell metacharacters in double quotes.

• You can track current and pending operations with the (backup) jobs command, which is available only in this mode.

• You can cancel current and pending operations with the (backup) kill command, which is available only in this mode.

Before issuing a command that requires reading or writing a tape (or backup data file), you must also open a connection to theTape Coordinator machine that is attached to the relevant tape device (or that has the backup data file on its local disk), and issuethe butc command to initialize the Tape Coordinator process. The process must continue to run and the connection remain openas long as you need to use the tape device or file for backup operations.

For further discussion and instructions, see Using the Backup System’s Interfaces.

6.3 Overview of Backup System Configuration

Before you can use the Backup System to back up and restore data, you must configure several of its basic components. Theindicated sections of this chapter explain how to perform the following configuration tasks:

• Determining a tape’s capacity and a tape device’s filemark size, and recording them in the /usr/afs/backup/tapeconfig file (seeConfiguring the tapeconfig File)

• Determining how to grant administrative privilege to backup operators (see Granting Administrative Privilege to Backup Op-erators)

• Configuring Tape Coordinator machines, tape devices, and backup data files (see Configuring Tape Coordinator Machines andTape Devices)

• Defining volume sets and volume entries (see Defining and Displaying Volume Sets and Volume Entries)

• Defining dump levels to create a dump hierarchy (see Defining and Displaying the Dump Hierarchy)

• Labeling tapes (see Writing and Reading Tape Labels)

• Creating a device configuration file to automate the backup process (see Automating and Increasing the Efficiency of theBackup Process)

If you have already configured all of the components required for performing a backup dump or restore operation, you canproceed to the instructions in Backing Up Data and Restoring and Recovering Data.


6.4 Configuring the tapeconfig File

Several factors interact to determine how much data the Tape Coordinator can fit on a tape:

• The tape’s capacity (size), as set by the tape manufacturer.

• The tape device’s filemark size, as set by the tape device’s manufacturer. Recall from Tape Labels, Dump Labels, and EOFMarkers that the Tape Coordinator writes a filemark between the data from each volume in a dump. If a dump contains a smallamount of data from each of a large number of volumes, as incremental dumps often do, then the filemark size can significantlyaffect how much volume data fits on the tape.

• Whether or not you use the tape device’s compression mode.

(The amount of data that can fit in a backup data file is determined by amount of space available on the partition, and theoperating system’s maximum file size. The Tape Coordinator does not write filemarks when writing to a backup data file. Forfurther information about configuring a Tape Coordinator to write to a backup data file, see Dumping Data to a Backup DataFile.)

As the Tape Coordinator (butc) process initializes, it reads the /usr/afs/backup/tapeconfig file on its local disk to learn the tapecapacity and filemark size (for a tape device) or the file size (for a backup data file) to use for dump operations. When you begina dump operation, the Tape Coordinator also reads the tape or backup data file’s label to see if you have recorded a different tapecapacity or file size. If you have, the value on the label overrides the default value from the tapeconfig file.

As the Tape Coordinator writes data to a tape during a dump operation, it uses the capacity and filemark information to trackhow much tape it has used and how much remains before the physical end-of-tape (EOT). Shortly before reaching EOT, the TapeCoordinator stops writing and requests a new tape. Similarly, it uses a backup data file’s size to know when it is about to exhaustthe space in the file. If the Tape Coordinator reaches the EOT unexpectedly, it recovers by obtaining a new tape and writing toit the entire contents of the volume it was writing when it reached EOT. The interrupted volume remains on the first tape, but isnever used.

Many tape devices use tapes that can accommodate multiple gigabytes, or even multiple terabytes, of backup data, especiallyif you use the device’s compression mode. When writing to such devices and tapes, allowing the Tape Coordinator to hit theEOT unexpectedly is generally recommended. The devices write data so quickly that it usually does not take much extra time torewrite the interrupted volume on the new tape. Similarly, they compress data so well that the data abandoned on the first tapefrom the interrupted volume does not constitute a waste of much tape.

When writing to tapes that accommodate a smaller amount of data (say, less than two GB), it is better to avoid having the TapeCoordinator hit EOT unexpectedly. AFS supports volumes up to 2 GB in size, so an interrupted volume can in fact take up mostof the tape. For such tapes, recording accurate values for tape capacity and filemark size, if possible, helps to maximize bothuse of tape and the efficiency of dump operations. The following discussion of the fields in the tapeconfig file explains how todetermine the appropriate values.

Use a text editor to create an entry in a Tape Coordinator’s tapeconfig file for each tape device or backup data file that it uses.Each device or file’s entry is on its own line and has the following format:

[capacity filemark_size] device_name port_offset

where

capacity Specifies the capacity of the tapes used with a tape device, or the amount of data to write into a backup data file.Specify an integer value followed by a letter that indicates units, with no intervening space. The letter k or K indicateskilobytes, m or M indicates megabytes, g or G indicates gigabytes, and t or T indicates terabytes. If the units letter isomitted, the default is kilobytes.

To determine the capacity of a tape under two GB in size that you are going to use in regular (noncompression) mode, youcan either use the value that the tape’s manufacturer specifies on the tape’s packaging or use the fms command to calculatethe capacity, as described later in this section. To avoid having the Tape Coordinator reach the EOT unexpectedly, it is bestto record in the tapeconfig file or on the label a capacity that is about 10% smaller than the actual capacity of the tape. Tocalculate the appropriate value for a small tape used in compression mode, one method is to multiply the tape capacity (asrecorded by the manufacturer) by the device’s compression ratio.


For tapes that hold multiple gigabytes or terabytes of data, or if using a tape drive’s compression mode, the recommendedconfiguration is to record a value quite a bit (for instance, two times) larger than the maximum amount you believe can fiton the tape. It is not generally worthwhile to run the fms command on large tapes, even in noncompression mode. Thecommand definitely does not yield accurate results in compression mode. The Tape Coordinator is likely to reach the EOTunexpectedly, but compression mode fits so much data on the tape that the data abandoned from an interrupted volumedoes not represent much of the tape’s capacity.

For a backup data file, record a value slightly smaller than the amount of space available on the partition, and definitelysmaller than the operating system’s maximum file size. It is also best to limit the ability of other processes to write to thepartition, to prevent them from using up the space in the partition.

If this field is empty, the Tape Coordinator uses the maximum acceptable value (2048 GB or 2 TB). Either leave both thisfield and the filemark_size field empty, or provide a value in both of them.

filemark_size Specifies the tape device’s filemark size, which usually falls between 2 KB and 2 MB. Use the same notation asfor the capacity field, but note that if you omit the units letter, the default unit is bytes rather than kilobytes.

For a tape device in regular (noncompression) mode, you can use the fms command to determine filemark size, or use thevalue reported by the device’s manufacturer. To help the Tape Coordinator avoid reaching EOT unexpectedly, increase thevalue by about 10% when recording it in the tapeconfig file.

The recommended value for a tape device in compression mode is 0 (zero). The fms command does not yield accurateresults in compression mode, so you cannot use it to determine the filemark size.

The recommended value for a backup data file is also 0 (zero). The Tape Coordinator does not use filemarks when writingto a file, but a value must appear in this field nevertheless if there is also a value in the capacity field.

If this field is empty, the Tape Coordinator uses the value 0 (zero). Either leave both this field and the capacity field empty,or provide a value in both of them.

device_name Specifies the complete pathname of the tape device or backup data file. The format of tape device names dependson the operating system, but on UNIX systems, device names generally begin with the string /dev/. For a backup data file,this field defines the complete pathname, but for suggestions on how to name a backup data file, see Dumping Data to aBackup Data File.

port_offset Specifies the port offset number for a specific tape device or backup data file. Each tape device listens for backuprequests on a different UNIX port. You pick the port indirectly by recording a value in this field. The Backup System setsthe device’s actual port by adding the port offset to a base port number that it determines internally.

Legal values are the integers 0 through 58510 (the Backup System can track a maximum of 58,511 port offset numbers).Each value must be unique among the cell’s Tape Coordinators, but you do not have to assign port offset numbers sequen-tially, and you can associate any number of them with a single machine or even tape device. For example, if you planto use a device in both compression and noncompression mode, assign it two different port offsets with appropriate tapecapacity and filemark values for the different modes.

Assign port offset 0 (zero) to the Tape Coordinator for the tape device or backup data file that you use most often forbackup operations; doing so enables you to omit the -portoffset argument from the largest possible number of backupcommands.

The following example tapeconfig file includes entries for two tape devices, /dev/rmt0h and /dev/rmt1h. Each one uses tapeswith a capacity of 2 GB and has a filemark size of 1 MB. Their port offset numbers are 0 and 1.

2g 1m /dev/rmt0h 02G 1M /dev/rmt1h 1

The fms command reports the capacity of the tape you have inserted and the tape device’s filemark size, both on the standardoutput stream (stdout) and in its fms.log file, which it writes in the current working directory. The command interpreter mustwrite data to the entire tape, so running the command can take from several hours to more than a day, depending on the size ofthe tape.

6.4.1 To run the fms command on a noncompressing tape device

1. If an fms.log file does not already exist in the current directory, verify that you can insert and write to files in the currentdirectory. If the log file already exists, you must be able to write to the file.


2. Insert a tape into the drive. Running the command completely overwrites the tape, so use a blank tape or one that you wantto recycle.

3. Issue the fms command.

% fms <tape special file>

where

fms Must be typed in full.

tape special file Specifies the tape device’s UNIX device name, such as /dev/rmt0h.

The following example output reports that the tape in the device with device name /dev/rmt0h has a capacity of 2136604672bytes (about 2 GB), and that the device’s filemark size is 1910205 bytes (close to 2 MB).

% fms /dev/rmt0hwrote block: 130408Finished data capacity test - rewindingwrote 1109 blocks, 1109 file marksFinished file mark testTape capacity is 2136604672 bytesFile marks are 1910205 bytes

6.5 Granting Administrative Privilege to Backup Operators

Each person who issues the backup and butc commands in your cell must be listed in the /usr/afs/etc/UserList file on everydatabase server machine that stores the Backup Database and Volume Location Database (VLDB), and every machine thathouses a volume included in a volume set. By convention, the UserList file is the same on every server machine in the cell; theinstructions in this document assume that your cell is configured in this way. To edit the UserList file, use the bos adduser andbos removeuser commands as described in Administering the UserList File.

In addition to being listed in the UserList file, backup operators who issue the butc command must be able to write to the filesstored in each Tape Coordinator machine’s local /usr/afs/backup directory, which are protected by UNIX mode bits. Beforeconfiguring your cell’s first Tape Coordinator machine, decide which local user and group to designate as the owner of thedirectory and the files in it. Among the possible ownership options are the following:

• The local superuser root. With this option, the issuer of the butc command must log onto the local file system as the localsuperuser root. If the Tape Coordinator is also a server machine, the -localauth flag is used on the butc command to constructa server ticket from the local /usr/afs/etc/KeyFile file. On non-server machine, the issuer must issue the klog command toauthenticate as an AFS administrator while logged in as root.

• A single AFS administrator. Logging in and authenticating are a single step if an AFS-modified login utility is used. Theadministrator is the only user who can start the Tape Coordinator.

• An administrative account for which several operators know the password. This allows them all to start the Tape Coordinator.

Another option is to define a group in the local group file (/etc/group or equivalent) to which all backup operators belong. Thenturn on the w mode bit (write permission) in the group mode bits rather than the user mode bits of the /usr/afs/backup directoryand files in it. An advantage over the methods listed previously is that each operator can retain an individual administrativeaccount for finer granularity in auditing.

For instructions on implementing your choice of protection methods, see Configuring Tape Coordinator Machines and TapeDevices.


6.6 Configuring Tape Coordinator Machines and Tape Devices

This section explains how to configure a machine as a Tape Coordinator machine, and how to configure or remove the TapeCoordinator associated with a single tape device or backup data file.

NoteWhen configuring a tape device attached to an AIX system, you must set the device’s tape block size to 0 (zero) to indicatevariable block size. If you do not, it is possible that devices attached to machines of other system types cannot read the tapesmade on the AIX system. Use the AIX smit program to verify or change the value of the tape block size for a tape device, asinstructed in Sep 3.

6.6.1 To configure a Tape Coordinator machine




% su rootPassword: <root_password>

3. Install one or more tape devices on the Tape Coordinator machine according to the manufacturer’s instructions. TheBackup System can track a maximum of 58,511 tape devices or backup data files per cell.

If the Tape Coordinator machine is an AIX system, issue the following command to change the tape device’s tape blocksize to 0 (zero), which indicates variable block size. Repeat for each tape device.

# chdev -l ’device_name’ -a block_size=’0’

where device_name is the tape device’s device name (for example, /dev/rmt0h).

4. Verify that the binary files for the backup, butc, and fms commands are available on the local disk. If the machine is anAFS client, the conventional location is the /usr/afsws/etc directory.

# ls /usr/afsws/etc

5. Create the /usr/afs directory. (If the Tape Coordinator machine is also configured as a file server machine, this directoryalready exists.) Then create the /usr/afs/backup directory.

# mkdir /usr/afs# mkdir /usr/afs/backup

6. Use a text editor to create the /usr/afs/backup/tapeconfig file. Include a single line for each tape device or backup datafile, specifying the following information in the indicated order. For syntax details and suggestions on the values to use ineach field, see Configuring the tapeconfig File.

• The capacity of tapes to be used in the device, or the size of the backup data file

• The device’s filemark size

• The device’s device name, starting with the string /dev/• The device’s port offset number

7. Decide which user and group are to own the /usr/afs/backup directory and /usr/afs/backup/tapeconfig file, based on thesuggestions in Granting Administrative Privilege to Backup Operators. Correct the UNIX mode bits on the directory andfile, if necessary.


# chown admin_owner /usr/afs/backup# chown admin_owner /usr/afs/backup/tapeconfig# chgrp admin_group /usr/afs/backup# chgrp admin_group /usr/afs/backup/tapeconfig# chmod 774 /usr/afs/backup# chmod 664 /usr/afs/backup/tapeconfig

8. Issue the backup addhost command to create a Tape Coordinator entry in the Backup Database. Repeat the command foreach Tape Coordinator.

# backup addhost <tape machine name> [<TC port offset>]

where

addh Is the shortest acceptable abbreviation of addhost.tape machine name Specifies the Tape Coordinator machine’s fully qualified hostname.TC port offset Specifies the tape device’s port offset number. Provide the same value as you specified for the device in

the tapeconfig file. You must provide this argument unless the default value of 0 (zero) is appropriate.

6.6.2 To configure an additional Tape Coordinator on an existing Tape Coordinator machine





3. Install the tape device on the Tape Coordinator machine according to the manufacturer’s instructions.

If the Tape Coordinator machine is an AIX system, issue the following command to change the tape device’s tape blocksize to 0 (zero), which indicates variable block size.

# chdev -l ’device_name’ -a block_size=’0’

4. Choose the port offset number to assign to the tape device. If necessary, use the backup listhosts command to displaythe port offset numbers that are already used; for a discussion of the output, see To display the list of configured TapeCoordinators.

# backup listhosts

where listh is the shortest acceptable abbreviation of listhosts.

5. Use a text editor to add one or more entries for the device to the /usr/afs/backup/tapeconfig file. Specify the followinginformation in the indicated order. For syntax details and suggestions on the values to use in each field, see Configuringthe tapeconfig File.

• The capacity of tapes to be used in the device, or the size of the backup data file• The device’s filemark size• The device’s device name, starting with the string /dev/• The device’s port offset number

6. Issue the backup addhost command to create an entry in the Backup Database for the Tape Coordinator. For completesyntax, see Step 8 in To configure a Tape Coordinator machine.

# backup addhost <tape machine name> [<TC port offset>]


6.6.3 To unconfigure a Tape Coordinator



2. Using a text editor, remove each of the Tape Coordinator’s entries from the /usr/afs/backup/tapeconfig file.

3. Issue the backup delhost command to delete the Tape Coordinator’s Backup Database entry.

% backup delhost <tape machine name> [<TC port offset>]

where

delh Is the shortest acceptable abbreviation of delhost.tape machine name Is the complete Internet host name of the Tape Coordinator machine.

TC port offset Is the same port offset number removed from the tapeconfig file. You must provide this argument unlessthe default value of 0 (zero) is appropriate.

6.6.4 To display the list of configured Tape Coordinators

1. Issue the backup listhosts command to list the Tape Coordinators and port offset numbers currently configured in theBackup Database.

% backup listhosts

where

listh Is the shortest acceptable abbreviation of listhosts.

The output lists each Tape Coordinator machine and the port offset numbers currently allocated to it in the Backup Database.The appearance of a port offset number does not imply that the associated Tape Coordinator is actually running. Machine namesappear in the format in which they were specified with the backup addhost command.

The following example output lists the Tape Coordinators currently defined in the Backup Database of the Example Corporationcell:

% backup listhostsTape hosts:

Host backup1.example.com, port offset 0Host backup1.example.com, port offset 2Host backup2.example.com, port offset 1Host backup2.example.com, port offset 3

6.7 Defining and Displaying Volume Sets and Volume Entries

The Backup System handles data at the level of volumes rather than individual files. You must define groups of volumes calledvolume sets before performing backup operations, by using the backup addvolset command. A volume set name can be up to 31characters long and can include any character other than the period (.), but avoid using metacharacters that have special meaningsto the shell.

After creating a volume set, use the backup addvolentry command to place one or more volume entries in it. They define thevolumes that belong to it in terms of their location (file server machine and partition) and name. Use the command’s required-server argument to designate the file server machine that houses the volumes of interest and its required -partition argument todesignate the partition. Two types of values are acceptable:


• The fully qualified hostname of one machine or full name of one partition (such as /vicepm)

• The regular expression .* (period and asterisk), which matches every machine name or partition name in the VLDB

For the volume name (the required -volume argument), specify a combination of alphanumeric characters and one or moremetacharacters to specify part or all of the volume name with a wildcard. You can use any of the following metacharacters in thevolume name field:

. The period matches any single character.

* The asterisk matches zero or more instances of the preceding character. Combine it with any other alphanumeric character ormetacharacter.

[ ] Square brackets around a list of characters match a single instance of any of the characters, but no other characters; forexample, [abc] matches a single a or b or c, but not d or A. You can combine this expression with the asterisk.

ˆ The caret, when used as the first character in a square-bracketed set, designates a match with any single character other thanthe characters that follow it; for example, [ˆa] matches any single character except lowercase a. You can combine thisexpression with the asterisk.

\ A backslash preceding any of the metacharacters in this list makes it match its literal value only. For example, the expression\. (backslash and period) matches a single period, \* matches a single asterisk, and \\ matches a single backslash. You cancombine such expressions with the asterisk (for example, \.* matches any number of periods).

Perhaps the most common regular expression is the period followed by an asterisk (.*). This expression matches any string ofany length, because the period matches any character and the asterisk means any number of that character. As mentioned, it is theonly acceptable regular expression in the file server and partition fields of a volume entry. In the volume name field, it can standalone (in which case it matches every volume listed in the VLDB), or can combine with alphanumeric characters. For example,the string user.*\.backup matches any volume name that begins with the string user and ends with .backup.

Issuing the backup addvolentry command in interactive mode is simplest. If you issue it at the shell prompt, you must surroundany string that includes a regular expression with double quotes (" ") so that the shell passes them uninterpreted to the backupcommand interpreter rather than resolving them.

To define various combinations of volumes, provide the following types of values for the backup addvolentry command’s threearguments. The list uses the notation appropriate for interactive mode; if you issue the command at the shell prompt instead,place double quotes around any string that includes a regular expression. To create a volume entry that includes:

• All volumes listed in the VLDB, use the regular expression .* for all three arguments (-server .* -partition .* -volume .*)

• Every volume on a specific file server machine, specify its fully qualified hostname as the -server argument and use the regularexpression .* for the -partition and -volume arguments (for example: -server fs1.example.com -partition .* -volume .*)

• All volumes that reside on a partition with the same name on various file server machines, specify the complete partition nameas the -partition argument and use the regular expression .* for the -server and -volume arguments (for example: -server .*-partition /vicepd -volume .*)

• Every volume with a common string in its name, use the regular expression .* for the -server and -partition arguments,and provide a combination of alphanumeric characters and metacharacters as the -volume argument (for example: -server .*-partition .* -volume .*\.backup includes all volumes whose names end in the string .backup).

• All volumes on one partition, specify the machine’s fully qualified hostname as the -server argument and the full partition nameas the -partition argument, and use the regular expression .* for the -volume argument (for example: -server fs2.example.com-partition /vicepb -volume .*).

• A single volume, specify its complete name as the -volume argument. To bypass the potentially time-consuming search throughthe VLDB for matching entries, you can specify an actual machine and partition name for the -server and -partition argumentsrespectively. However, if it is possible that you need to move the volume in future, it is best to use the regular expression .* forthe machine and partition name.


As you create volume sets, define groups of volumes you want to dump to the same tape at the same time (for example, weekly ordaily) and in the same manner (fully or incrementally). In general, a volume set that includes volumes with similar contents (asindicated by similar names) is more useful than one that includes volumes that share a common location, especially if you oftenmove volumes for load-balancing or space reasons. Most often, then, it is appropriate to use the regular expression .* (periodfollowed by a backslash) for the -server and -partition arguments to the backup addvolentry command.

It is generally more efficient to include a limited number of volumes in a volume entry. Dumps of a volume set that includesa large number of volume can take a long time to complete, increasing the possibility that the operation fails due to a serviceinterruption or outage.

To remove a volume entry from a volume set, use the backup delvolentry command. To remove a volume set and all of itscomponent volume entries from the Backup Database, use the backup delvolset command. To display the volume entries in avolume set, use the backup listvolsets command.

By default, a Backup Database record is created for the new volume set. Sometimes it is convenient to create volume setswithout recording them permanently in the Backup Database, for example when using the backup volsetrestore command torestore a group of volumes that were not necessarily backed up together (for further discussion, see Using the backup volsetrestoreCommand). To create a temporary volume set, include the -temporary flag to the backup addvolset command. A temporaryvolume set exists only during the lifetime of the current interactive session, so the flag is effective only when used during aninteractive session (opened by issuing the backup (interactive) command). You can use the backup delvolset command todelete a temporary volume set before the interactive session ends, if you wish, but as noted it is automatically deleted when youend the session. One advantage of temporary volume sets is that the backup addvolset command, and any backup addvolentrycommands subsequently used to add volume entries to it, complete more quickly than for regular volume sets, because you arenot creating any Backup Database records.

6.7.1 To create a volume set



2. (Optional) Issue the backup command to enter interactive mode. If you are going to define volume entries right away withthe backup addvolentry command, this eliminates the need to surround metacharacter expressions with double quotes.You must enter interactive mode if creating a temporary volume set.

% backup

3. Issue the (backup) addvolset command to create the volume set. You must then issue the (backup) addvolentry commandto define volume entries in it.

backup> addvolset <volume set name> [-temporary]

where

addvols Is the shortest acceptable abbreviation of addvolset.volume set name Names the volume set. The name can include no more than 31 characters, cannot include periods,

and must be unique within the Backup Database. (A temporary volume set can have the same name as an existingpermanent volume set, but this is not recommended because of the confusion it can cause.)

-temporary Creates a temporary volume set, which exists only during the current interactive session.

6.7.2 To add a volume entry to a volume set




2. (Optional) Issue the backup command to enter interactive mode if you have not already. This makes it simpler to usemetacharacter expressions, because you do not need to surround them with double quotes. If you are adding entries to atemporary volume set, you must already have entered interactive mode before creating the volume set.

% backup

3. Issue the (backup) addvolentry command to define volume entries in an existing volume set. The Backup System assignseach volume entry an index within the volume set, starting with 1 (one).

backup> addvolentry -name <volume set name> \-server <machine name> \-partition <partition name> \-volumes <volume name (regular expression)>

where

addvole Is the shortest acceptable abbreviation of addvolentry.

-name Names the volume set to which to add the volume entry. It must already exist (use the backup addvolset commandto create it).

-server Defines the set of one or more file server machines that house the volumes in the volume entry. Provide either onefully-qualified hostname (such as fs1.example.com) or the metacharacter expression .* (period and asterisk), whichmatches all machine names in the VLDB.

-partition Defines the set of one or more partitions that house the volumes in the volume entry. Provide either onecomplete partition name (such as /vicepa) or the metacharacter expression .* (period and asterisk), which matchesall partition names.

-volumes Defines the set of one or more volumes included in the volume entry, identifying them by name. This argumentcan include a combination of alphanumeric characters and one or more of the metacharacter expressions discussed inthe introductory material in this section.

6.7.3 To display volume sets and volume entries

1. Issue the backup listvolsets command to display the volume entries in a specific volume set or all of them. If you aredisplaying a temporary volume set, you must still be in the interactive session in which you created it.

% backup listvolsets [<volume set name>]

where

listv Is the shortest acceptable abbreviation of listvolsets.

volume set name Names the volume set to display. Omit this argument to display all defined volume sets.

The output from the command uses the wildcard notation used when the volume entries were created. The string (temporary) marks a temporary volume set. The following example displays all three of the volume sets defined in a cell’sBackup Database, plus a temporary volume set pat+jones created during the current interactive session:

backup> listvVolume set pat+jones (temporary):

Entry 1: server fs1.example.com, partition /vicepe, volumes: user.pat.backupEntry 2: server fs5.example.com, partition /viceph, volumes: user.jones.backup

Volume set user:Entry 1: server .*, partition .*, volumes: user.*\.backup

Volume set sun:Entry 1: server .*, partition .*, volumes: sun4x_55\..*Entry 2: server .*, partition .*, volumes: sun4x_56\..*

Volume set rs:Entry 1: server .*, partition .*, volumes: rs_aix42\..*


6.7.4 To delete a volume set



2. Issue the backup delvolset command to delete one or more volume sets and all of the component volume entries in them.If you are deleting a temporary volume set, you must still be in the interactive session in which you created it.

% backup delvolset <volume set name>+

where

delvols Is the shortest acceptable abbreviation of delvolset.volume set name Names each volume set to delete.

6.7.5 To delete a volume entry from a volume set



2. Issue the backup command to enter interactive mode.

% backup

3. If the volume set includes more than one volume entry, issue the (backup) listvolsets command to display the indexnumber associated with each one (if there is only one volume entry, its index is 1). For a more detailed description of thecommand’s output, see To display volume sets and volume entries.

backup> listvolsets <volume set name>

where

listv Is the shortest acceptable abbreviation of listvolsets.

volume set name Names the volume set for which to display volume entries.

4. Issue the (backup) delvolentry command to delete the volume entry.

backup> delvolentry <volume set name> <volume entry index>

where

delvole Is the shortest acceptable abbreviation of delvolentry.

volume set name Names the volume set from which to delete a volume entry.

volume entry index Specifies the index number of the volume entry to delete.


6.8 Defining and Displaying the Dump Hierarchy

A dump hierarchy is a logical structure in the Backup Database that defines the relationship between full and incremental dumps;that is, it defines which dump serves as the parent for an incremental dump. Each individual component of a hierarchy is a dumplevel.

As you define dump levels with the backup adddump command, keep the following rules and suggestions in mind:

• Each full dump level is the top level of a hierarchy. You can create as many hierarchies as you need to dump different volumesets on different schedules.

• The name of a full dump level consists of an initial slash (/), followed by a string of up to 28 alphanumeric characters.

• The name of an incremental dump level resembles a pathname, starting with the name of a full dump level, then the firstincremental level, and so on, down to the final incremental level. Precede each level name with a slash to separate it from thepreceding level. Like the full level, each component level in the name can have up to 28 alphanumeric characters, not includingthe slash.

• A hierarchy can have any have any number of levels, but the maximum length of a complete dump level name is 256 characters,including the slashes.

• Before defining a given incremental level, you must define all of the levels above it in the hierarchy.

• Do not use the period (.) in dump level names. The Backup System uses the period as the separator between a dump’s volumeset name and dump level name when it creates the dump name and AFS tape name. Any other alphanumeric and punctuationcharacters are allowed, but it is best to avoid metacharacters. If you include a metacharacter, you must precede it with abackslash (\) or surround the entire dump level name with double quotes (" ").

• Naming dump levels for days or other actual time points reminds you when to perform dumps, and makes it easier to track therelationship between dumps performed at different levels. However, the names have no meaning to the Backup System: it doesnot automatically create dumps according to the names, and does not prevent you from, for example, using the /sunday levelwhen creating a dump on a Tuesday.

• It is best not to use the same name for more than one component level in a hierarchy, because it means the resulting dumpname no longer indicates which level was used. For example, if you name a dump level /full/incr/incr, then the dump nameand AFS tape name that result from dumping a volume set at the first incremental level (/full/incr) look the same as the namesthat result from dumping at the second incremental level (/full/incr/incr).

• Individual levels in different hierarchies can have the same name, but the complete pathnames must be unique. For example,/sunday1/monday and /sunday2/monday share the same name at the final level, but are unique because they have differentnames at the full level (belong to different hierarchies). However, using the same name in multiple hierarchies means thatdump and AFS tape names do not unambiguously indicate which hierarchy was used.

The following example shows three hierarchies. Each begins with a full dump at the top: sunday1 for the first hierarchy, sunday2for the second hierarchy, and sunday_bin for the third hierarchy. In all three hierarchies, each of the other dump levels is anincremental level.

/sunday1/monday/tuesday/wednesday/thursday/friday

/sunday2/monday

/tuesday/wednesday

/thursday/friday

/sunday_bin/monday

/wednesday/friday


In the first hierarchy, each incremental dump level refers to the full level /sunday1 as its parent. When (for example) you dump avolume set at the /sunday1/wednesday level, it includes data that has changed since the volume set was dumped at the /sunday1level.

In contrast, each incremental dump level in the second hierarchy refers to the immediately preceding dump level as its parent.When you dump a volume set at the corresponding level in the second hierarchy (/sunday2/monday/tuesday/wednesday), thedump includes only data that has changed since the volume set was dumped at the /sunday2/monday/tuesday level (presumablythe day before). Assuming you create dumps on the indicated days, an incremental dump made using this hierarchy contains lessdata than an incremental dump made at the corresponding level in the first hierarchy.

The third hierarchy is more appropriate for dumping volumes for which a daily backup is excessive because the data does notchange often (for example, system binaries).

6.8.1 Creating a Tape Recycling Schedule

If your cell is like most cells, you have a limited amount of room for storing backup tapes and a limited budget for new tapes.The easiest solution is to recycle tapes by overwriting them when you no longer need the backup data on them. The BackupSystem helps you implement a recycling schedule by enabling you to associate an expiration date with each dump level. Theexpiration date defines when a dump created at that level expires. Until that time the Backup System refuses to overwrite a tapethat contains the dump. Thus, assigning expiration dates automatically determines how you recycle tapes.

When designing a tape-recycling schedule, you must decide how far in the past and to what level of precision you want toguarantee access to backed up data. For instance, if you decide to guarantee that you can restore a user’s home volume to itsstate on any given day in the last two weeks, you cannot recycle the tape that contains a given daily dump for at least two weeksafter you create it. Similarly, if you decide to guarantee that you can restore home volumes to their state at the beginning of anygiven week in the last month, you cannot recycle the tapes in a dump set containing a weekly dump for at least four weeks. Thefollowing example dump hierarchy implements this recycling schedule by setting the expiration date for each daily incrementaldump to 13 days and the expiration date of the weekly full dumps to 27 days.

The tapes used to store dumps created at the daily incremental levels in the /sunday1 hierarchy expire just in time to be recycledfor daily dumps in the /sunday3 hierarchy (and vice versa), and there is a similar relationship between the /sunday2 and /sunday4hierarchies. Similarly, the tape that houses a full dump at the /sunday1 level expires just in time to be used for a full dump onthe first Sunday of the following month.

/sunday1 expires in 27d/monday1 expires in 13d/tuesday1 expires in 13d/wednesday1 expires in 13d/thursday1 expires in 13d/friday1 expires in 13d





If you use appended dumps in your cell, keep in mind that all dumps in a dump set are subject to the latest (furthest into thefuture) expiration date associated with any of the constituent dumps. You cannot recycle any of the tapes that contain a dump setuntil all of the dumps have reached their expiration date. See also Appending Dumps to an Existing Dump Set.

Most tape manufacturers recommend that you write to a tape a limited number of times, and it is best not to exceed this limitwhen recycling tapes. To help you track tape usage, the Backup System records a useCount counter on the tape’s label.It increments the counter each time the tape’s label is rewritten (each time you use the backup labeltape or backup dumpcommand). To display the useCount counter, use the backup readlabel or backup scantape command or include the -id and-verbose options when you issue the backup dumpinfo command. For instructions see Writing and Reading Tape Labels orDisplaying Backup Dump Records.

6.8.2 Archiving Tapes

Even if you make extensive use of tape recycling, there is probably some backup data that you need to archive for a long (or evenan indefinite) period of time. You can use the Backup System to archive data on a regular schedule, and you can also choose toarchive data on tapes that you previously expected to recycle.

If you want to archive data on a regular basis, you can create date-specific dump levels in the dump hierarchy. For example, ifyou decide to archive a full dump of all data in your cell at the beginning of each quarter in the year 2000, you can define thefollowing levels in the dump hierarchy:

/1Q2000/2Q2000/3Q2000/4Q2000

If you decide to archive data that is on tapes you previously planned to recycle, you must gather all of the tapes that containthe relevant dumps, both full and incremental. To avoid accidental erasure, it is best to set the switch on the tapes that makesthem read-only, before placing them in your archive storage area. If the tapes also contain a large amount of extraneous data thatyou do not want to archive, you can restore just the relevant data into a new temporary volume, and back up that volume to thesmallest number of tapes possible. One reason to keep a dump set small is to minimize the amount of irrelevant data in a dumpset you end up needing to archive.

If you do not expect to restore archived data to the file system, you can consider using the backup deletedump command toremove the associated dump records from the Backup Database, which helps keep it to an efficient size. If you ever need torestore the data, you can use the -dbadd flag to the backup scantape command to reinsert the dump records into the database.For instructions, see To scan the contents of a tape.

6.8.3 Defining Expiration Dates

To associate an expiration date with a dump level as you create it, use the -expires argument to the backup adddump command.To change an existing dump level’s expiration date, use the -expires argument to the backup setexp command. (Note that it isnot possible to change the expiration date of an actual dump that has already been created at that level). With both commands,you can define an expiration date either in absolute terms (for example, 13 January 2000) or relative terms (for example, 30 daysfrom when the dump is created).

• To define an absolute expiration date, provide a value for the -expires argument with the following format:

[at] mm/dd/yyyy [hh:MM]

where mm indicates the month, dd the day, and yyyy the year when the dump expires. Valid values for the year fall in the range1970 through 2037 (the latest possible date that the UNIX time representation can express is in early 2038). If you provide atime, it must be in 24-hour format with hh the hours and MM the minutes (for example, 21:50 is 9:50 p.m.). If you omit thetime, the default is 00:00 hours (12:00 midnight) on the indicated date.

• To define a relative expiration date, provide a value for the -expires argument with the following format:

[in] [yearsy] [monthsm] [daysd]


where each of years, months, and days is an integer. Provide at least one of them together with the corresponding units letter(y, m, or d respectively), with no intervening space. If you provide more than one of the three, list them in the indicated order.

The Backup System calculates a dump’s actual expiration date by adding the indicated relative value to the start time of thedump operation. For example, it assigns an expiration date 1 year, 6 months, and 2 days in the future to a dump created at adump level with associated expiration date in 1y 6m 2d.

• To indicate that a dump backed up at the corresponding dump level never expires, provide the value NEVER instead of adate and time. To recycle tapes that contain dumps created at such a level, you must use the backup readlabel command tooverwrite the tape’s label.

If you omit the -expires argument to the backup adddump command, then the expiration date is set to UNIX time zero (00:00hours on 1 January 1970). The Backup System considers dumps created at such a dump level to expire at their creation time. Ifno dumps in a dump set have an expiration date, then the Backup System does not impose any restriction on recycling the tapesthat contain the dump set. If you need to prevent premature recycling of the tapes that contain the dump set, you must use amanual tracking system.

6.8.4 To add a dump level to the dump hierarchy



2. Optional. Issue the backup command to enter interactive mode.

% backup

3. Issue the backup adddump command to define one or more dump levels. If you are defining an incremental level, thenall of the parent levels that precede it in its pathname must either already exist or precede it on the command line.

backup> adddump -dump <dump level name>+ [-expires <expiration date>+]

where

addd Is the shortest acceptable abbreviation of adddump.

-dump Names each dump level to added. If you specify more than one dump level name, you must include the -dumpswitch.Provide the entire pathname of the dump level, preceding each level in the pathname with a slash (/). Each componentlevel can be up to 28 characters in length, and the pathname can include up to 256 characters including the slashes.

-expires Sets the expiration date associated with each dump level. Specify either a relative or absolute expiration date, asdescribed in Defining Expiration Dates, or omit this argument to assign no expiration date to the dump levels.

NoteA plus sign follows this argument in the command’s syntax statement because it accepts a multiword value whichdoes not need to be enclosed in double quotes or other delimiters, not because it accepts multiple dates. Pro-vide only one date (and optionally, time) definition to be associated with each dump level specified by the -dumpargument.

6.8.5 To change a dump level’s expiration date





% backup

3. Issue the (backup) setexp command to change the expiration date associated with one or more dump levels.

backup> setexp -dump <dump level name>+ [-expires <expiration date>+]

where

se Is the shortest acceptable abbreviation of setexp.

-dump Names each existing dump level for which to change the expiration date.

-expires Sets the expiration date associated with each dump level. Specify either a relative or absolute expiration date, asdescribed in Defining Expiration Dates; omit this argument to remove the expiration date currently associated witheach dump level.

NoteA plus sign follows this argument in the command’s syntax statement because it accepts a multiword value whichdoes not need to be enclosed in double quotes or other delimiters, not because it accepts multiple dates. Pro-vide only one date (and optionally, time) definition to be associated with each dump level specified by the -dumpargument.

6.8.6 To delete a dump level from the dump hierarchy




% backup

3. Issue the (backup) deldump command to delete the dump level. Note that the command automatically removes allincremental dump levels for which the specified level serves as parent, either directly or indirectly.

backup> deldump <dump level name>

where

deld Is the shortest acceptable abbreviation of deldump.

dump level name Specifies the complete pathname of the dump level to delete.

6.8.7 To display the dump hierarchy

1. Issue the backup listdumps command to display the dump hierarchy.

% backup listdumps

where listd is the shortest acceptable abbreviation of listdumps.

The output from this command displays the dump hierarchy, reporting the expiration date associated with each dump level,as in the following example.


% backup listdumps/week1 expires in 27d

/tuesday expires in 13d/thursday expires in 13d

/sunday expires in 13d/tuesday expires in 13d

/thursday expires in 13d/week3 expires in 27d

/tuesday expires in 13d/thursday expires in 13d

/sunday expires in 13d/tuesday expires in 13d

/thursday expires in 13dsunday1 expires in 27d

/monday1 expires in 13d/tuesday1 expires in 13d/wednesday1 expires in 13d/thursday1 expires in 13d/friday1 expires in 13d

sunday2 expires in 27d/monday2 expires in 13d/tuesday2 expires in 13d/wednesday2 expires in 13d/thursday2 expires in 13d/friday2 expires in 13d



6.9 Writing and Reading Tape Labels

As described in Dump Names and Tape Names and Tape Labels, Dump Labels, and EOF Markers, you can assign either apermanent name or an AFS tape name to a tape that you use in the Backup System. The names are recorded on the tape’smagnetic label, along with an indication of the tape’s capacity (size).

You can assign either a permanent name or an AFS tape name, but not both. In general, assigning permanent names rather thanAFS tape names simplifies the backup process, because the Backup System does not dictate the format of permanent names. Ifa tape does not have a permanent name, then by default the Backup System accepts only three strictly defined values in the AFStape name field, and refuses to write a dump to a tape with an inappropriate AFS tape name. The acceptable values are a namethat matches the volume set and dump level of the initial dump, the value <NULL>, and no value in the field at all.

If a tape has a permanent name, the Backup System does not check the AFS tape name, and as part of the dump operationconstructs the appropriate AFS tape name itself and records it on the label. This means that if you assign a permanent name, theBackup System assigns an AFS tape name itself and the tape has both types of name. In contrast, if a tape has an AFS tape namebut not a permanent name, you cannot assign a permanent name without first erasing the AFS tape name.

(You can also suppress the Backup System’s check of a tape’s AFS tape name, even it does not have a permanent name, byassigning the value NO to the NAME_CHECK instruction in the device configuration file. See Eliminating the AFS Tape NameCheck.)


Because the Backup System accepts unlabeled tapes, you do not have to label a tape before using it for the first time. After thefirst use, there are a couple of cases in which you must relabel a tape in order to write a dump to it:

• The tape does not have a permanent name, and the AFS tape name on it does not match the new initial dump set you want tocreate (the volume set and dump level names are different, or the index is incorrect).

• You want to recycle a tape before all of the dumps on it have expired. The Backup System does not overwrite a tape withany unexpired dumps. Keep in mind, though, that if you relabel the tape to making recycling possible, you erase all the dumprecords for the tape from the Backup Database, which makes it impossible to restore any data from the tape.

NoteLabeling a tape that contains dump data makes it impossible to use that data in a restore operation, because the labelingoperation removes the dump’s records from the Backup Database. If you want to record a permanent name on a tape label,you must do it before dumping any data to the tape.

6.9.1 Recording a Name on the Label

To write a permanent name on a tape’s label, include the -pname argument to specify a string of up to 32 characters. Check thatno other tape used with the Backup System in your cell already has the permanent name you are assigning, because the BackupSystem does not prevent you from assigning the same name to multiple tapes. The Backup System overwrites the existing AFStape name, if any, with the value <NULL>. When a tape has a permanent name, the Backup System uses it instead of the AFStape name in most prompts and when referring to the tape in output from backup commands. The permanent name persists untilyou again include the -pname argument to the backup labeltape command, regardless of the tape’s contents and of how oftenyou recycle the tape or use the backup labeltape command without the -pname argument.

To write an AFS tape name on the label, provide a value for the -name argument that matches the volume set name and the finalelement in the dump level pathname of the initial dump that you plan to write to the tape, and an index that indicates the tape’splace in the sequence of tapes for the dump set. The format is as follows:

volume_set_name.dump_level_name.tape_index

If you omit the -name argument, the Backup System sets the AFS tape name to <NULL>. The Backup System automaticallyconstructs and records the appropriate name when you later write an initial dump to the tape by using the backup dump orbackup savedb command.

You cannot use the -name argument if the tape already has a permanent name. To erase a tape’s permanent name, provide a nullvalue to the -pname argument by issuing the following command:

% backup labeltape -pname ""

6.9.2 Recording a Capacity on the Label

To record the tape’s capacity on the label, specify a number of kilobytes as the -size argument. If you omit this argument thefirst time you label a tape, the Backup System records the default tape capacity associated with the specified port offset in the/usr/afs/backup/tapeconfig file on the Tape Coordinator machine. If the tape’s capacity is different (in particular, larger) thanthe capacity recorded in the tapeconfig file, it is best to record a capacity on the label before using the tape. Once set, the valuein the label’s capacity field persists until you again use the -size argument to the backup labeltape command. For a discussionof the appropriate capacity to record for tapes, see Configuring the tapeconfig File.

To read a tape’s label, use the backup readlabel command.

Most tapes also come with an adhesive label you can apply to the exterior casing. To help you easily identify a tape, record atleast the tape’s permanent and AFS tape names on the adhesive label. Depending on the recycling scheme you use, it can beuseful to record other information, such as the dump ID, dump creation date, and expiration date of each dump you write to thetape.


6.9.3 To label a tape



2. If the Tape Coordinator for the tape device that is to perform the operation is not already running, open a connection to theappropriate Tape Coordinator machine and issue the butc command, for which complete instructions appear in To start aTape Coordinator process.

% butc [<port offset>] [-noautoquery]

3. Place the tape in the device.

4. Optional. Issue the backup command to enter interactive mode, if you want to label multiple tapes or issue additionalcommands after labeling the tape. The interactive prompt appears in the following step.

% backup

5. Issue the (backup) labeltape command to label the tape.

backup> labeltape [-name <tape name, defaults to NULL>] \[-size <tape size in Kbytes, defaults to size in tapeconfig>] \[-portoffset <TC port offset>] [-pname <permanent tape name>]

where

la Is the shortest acceptable abbreviation of labeltape.

-name Specifies the AFS tape name to record on the label. Include this argument or the -pname argument, but not both.If you omit this argument, the AFS tape name is set to <NULL>. If you provide it, it must have the following format.

volume_set_name.dump_level_name.tape_index

for the tape to be acceptable for use in a future backup dump operation. The volume_set_name must match thevolume set name of the initial dump to be written to the tape, dump_level_name must match the last element of thedump level pathname at which the volume set is to be dumped, and tape_index must correctly indicate the tape’splace in the sequence of tapes that house the dump set; indexing begins with the number 1 (one).

-size Specifies the tape capacity to record on the label. If you are labeling the tape for the first time, you need to includethis argument only if the tape’s capacity differs from the capacity associated with the specified port offset in the/usr/afs/backup/tapeconfig file on the Tape Coordinator machine.If you provide a value, it is an integer value followed by a letter that indicates units, with no intervening space. Aunit value of k or K indicates kilobytes, m or M indicates megabytes, and g or G indicates gigabytes. If you omit theunits letter, the default is kilobytes.

-portoffset Specifies the port offset number of the Tape Coordinator handling the tape or backup data file for this operation.You must provide this argument unless the default value of 0 (zero) is appropriate.

-pname Specifies the permanent name to record on the label. It can be up to 32 characters in length, and include anyalphanumeric characters. Avoid metacharacters that have a special meaning to the shell, to avoid having to markthem as literal in commands issued at the shell prompt.Include this argument or the -name argument, but not both. When you provide this argument, the AFS tape name isset to <NULL>. If you omit this argument, any existing permanent name is retained.

6. If you did not include the -noautoquery flag when you issued the butc command, or if the device’s device configurationfile includes the instruction AUTOQUERY YES, then the Tape Coordinator prompts you to place the tape in the device’sdrive. You have already done so, but you must now press <Return> to indicate that the tape is ready for labeling.


6.9.4 To read the label on a tape





3. Place the tape in the device.

4. Optional. Issue the backup command to enter interactive mode, if you want to label multiple tapes or issue additionalcommands after labeling the tape. The interactive prompt appears in the following step.

% backup

5. Issue the (backup) readlabel command to read the label on the tape.

backup> readlabel [<TC port offset>]

where

rea Is the shortest acceptable abbreviation of readlabel.TC port offset Specifies the port offset number of Tape Coordinator handling the tape or backup data file for this opera-

tion. You must provide this argument unless the default value of 0 (zero) is appropriate.

6. If you did not include the -noautoquery flag when you issued the butc command, or the device’s device configuration fileincludes the instruction AUTOQUERY YES instruction, then the Tape Coordinator prompts you to place the tape in thedevice’s drive. You have already done so, but you must now press <Return> to indicate that the tape is ready for reading.

Information from the tape label appears both in the backup command window and in the Tape Coordinator window. The outputin the command window has the following format:

Tape read was labelled: tape_name (initial_dump_ID)size: size KBytes

where tape_name is the tape’s permanent name (if it has one) or AFS tape name, initial_dump_ID is the dump ID of the initialdump on the tape, and size is the capacity recorded on the label, in kilobytes.

The information in the Tape Coordinator window is more extensive. The tape’s permanent name appears in the tape namefield and its AFS tape name in the AFS tape name field. If either name is undefined, a value of <NULL> appears in the fieldinstead. The capacity recorded on the label appears in the size field. Other fields in the output report the creation time, dumplevel name, and dump ID of the initial dump on the tape (creationTime, dump path, and dump id respectively). Thecell field reports the cell in which the dump operation was performed, and the useCount field reports the number of timesthe tape has been relabeled, either with the backup labeltape command or during a dump operation. For further details, see thecommand’s reference page in the OpenAFS Administration Reference.

If the tape has no label, or if the drive is empty, the following message appears at the command shell:

Failed to read tape label.

The following example illustrates the output in the command shell for a tape in the device with port offset 1:

% backup readlabel 1Tape read was labelled: monthly_guest (917860000)

size: 2150000 KBytes


The following output appears in the Tape Coordinator window at the same time:

Tape label----------tape name = monthly_guestAFS tape name = guests.monthly.3creationTime = Mon Feb 1 04:06:40 1999cell = example.comsize = 2150000 Kbytesdump path = /monthlydump id = 917860000useCount = 44-- End of tape label --

6.10 Automating and Increasing the Efficiency of the Backup Process

The Backup System includes several optional features to help you automate the backup process in your cell and make it moreefficient. By combining several of the features, you can dump volume data to tape with minimal human intervention in mostcases. To take advantage of many of the features, you create a device configuration file in the /usr/afs/backup directory foreach tape device that participates in automated operations. For general instructions on creating the device configuration file, seeCreating a Device Configuration File. The following list refers you to sections that describe each feature in greater detail.

• You can use tape stackers and jukeboxes to perform backup operations. These are tape drives with an attached unit that storesseveral tapes and can physically insert and remove them from the tape reader (tape drive) without human intervention, meaningthat no operator has to be present even for backup operations that require several tapes. To use a stacker or jukebox with theBackup System, include the MOUNT and UNMOUNT instructions in its device configuration file. See Invoking a Device’sTape Mounting and Unmounting Routines.

• You can suppress the Tape Coordinator’s default prompt for the initial tape that it needs for a backup operation, again elim-inating the need for a human operator to be present when a backup operation begins. (You must still insert the correct tapein the drive at some point before the operation begins.) To suppress the initial prompt, include the -noautoquery flag on thebutc command, or assign the value NO to the AUTOQUERY instruction in the device configuration file. See Eliminating theSearch or Prompt for the Initial Tape.

• You can suppress the prompts that the Tape Coordinator otherwise generates when it encounters several types of errors. Whenyou use this feature, the Tape Coordinator instead responds to the errors in a default manner, which generally allows theoperation to continue without human intervention. To suppress prompts about error conditions, assign the value NO to theASK instruction in the device configuration file. See Enabling Default Responses to Error Conditions.

• You can suppress the Backup System’s default verification that the AFS tape name on a tape that has no permanent namematches the name derived from the volume set and dump level names of the initial dump the Backup System is writing to thetape. This enables you to recycle a tape without first relabeling it, as long as all dumps on it are expired. To suppress namechecking, assign the value NO to the NAME_CHECK instruction in the device configuration file. See Eliminating the AFSTape Name Check.

• You can promote tape streaming (the most efficient way for a tape device to operate) by setting the size of the memory buffer theTape Coordinator uses when transferring volume data between the file system and the device. To set the buffer size, include theBUFFERSIZE instruction in the device configuration file. See Setting the Memory Buffer Size to Promote Tape Streaming.

• You can write dumps to a backup data file on the local disk of the Tape Coordinator machine, rather than to tape. You can thentransfer the backup data file to a data-archiving system, such as a hierarchical storage management (HSM) system, that you usein conjunction with AFS and the Backup System. Writing a dump to a file is usually more efficient that issuing the equivalentvos dump commands individually. To write dumps to a file, include the FILE instruction in the device configuration file. SeeDumping Data to a Backup Data File.

There are two additional ways to increase backup automation and efficiency that do not involve the device configuration file:


• You can schedule one or more backup dump commands to run at specified times. This enables you to create backups at timesof low system usage, without requiring a human operator to be present. You can schedule a single dump operation for a futuretime, or multiple operations to run at various future times. See Scheduling Dumps.

• You can append dumps to a tape that already has other dumps on it. This enables you to use as much of a tape’s capacity aspossible. The appended dumps do not have be related in any way to one another or to the initial dump on the tape, but groupingdumps appropriately can reduce the number of necessary tape changes during a restore operation. To append a dump, includethe -append argument to the backup dump command. See Appending Dumps to an Existing Dump Set.

6.10.1 Creating a Device Configuration File

To use many of the features that automate backup operations, create a configuration file for each tape device in the /usr/af-s/backup directory on the local disk of the Tape Coordinator machine that drives the device. The filename has the followingform:

CFG_device_name

where device_name represents the name of the tape device or backup data file (see Dumping Data to a Backup Data File to learnabout writing dumps to a file rather than to tape).

For a tape device, construct the device_name portion of the name by stripping off the initial /dev/ string with which all UNIXdevice names conventionally begin, and replacing any other slashes in the name with underscores. For example, CFG_rmt_4mis the appropriate filename for a device called /dev/rmt/4m.

For a backup data file, construct the device_name portion by stripping off the initial slash (/) and replacing any other slashes(/) in the name with underscores. For example, CFG_var_tmp_FILE is the appropriate filename for a backup data file called/var/tmp/FILE.

Creating a device configuration file is optional. If you do not want to take advantage of any of the features that the file provides,you do not have to create it.

You can include one of each of the following instructions in any order in a device configuration file. All are optional. Place eachinstruction on its own line, but do not include any newline (<Return>) characters within an instruction.

MOUNT and UNMOUNT Identify a script of routines for mounting and unmounting tapes in a tape stacker or jukebox’s driveas needed. See Invoking a Device’s Tape Mounting and Unmounting Routines.

AUTOQUERY Controls whether the Tape Coordinator prompts for the first tape it needs for a backup operation. See Eliminat-ing the Search or Prompt for the Initial Tape.

ASK Controls whether the Tape Coordinator asks you how to respond to certain error conditions. See Enabling Default Re-sponses to Error Conditions.

NAME_CHECK Controls whether the Tape Coordinator verifies that an AFS tape name matches the initial dump you arewriting to the tape. See Eliminating the AFS Tape Name Check.

BUFFERSIZE Sets the size of the memory buffer the Tape Coordinator uses when transferring data between a tape device anda volume. See Setting the Memory Buffer Size to Promote Tape Streaming.

FILE Controls whether the Tape Coordinator writes dumps to, and restores data from, a tape device or a backup data file. SeeDumping Data to a Backup Data File.

6.10.2 Invoking a Device’s Tape Mounting and Unmounting Routines

A tape stacker or jukebox helps you automate backup operations because it can switch between multiple tapes during an operationwithout human intervention. To take advantage of this feature, include the MOUNT and optionally UNMOUNT instructions inthe device configuration file that you write for the stacker or jukebox. The instructions share the same syntax:

MOUNT filenameUNMOUNT filename


where filename is the pathname on the local disk of a script or program you have written that invokes the routines defined by thedevice’s manufacturer for mounting or unmounting a tape in the device’s tape drive. (For convenience, the following discussionuses the term script to refers to both scripts and programs.) The script usually also contains additional logic that handles errorconditions or modifies the script’s behavior depending on which backup operation is being performed.

You can refer to different scripts with the MOUNT or UNMOUNT instructions, or to a single script that invokes both mountingand unmounting routines. The scripts inherit the local identity and AFS tokens associated with to the issuer of the butc command.

You need to include a MOUNT instruction in the device configuration file for all tape devices, but the need for an UNMOUNTinstruction depends on the tape-handling routines that the device’s manufacturer provides. Some devices, usually stackers, haveonly a single routine for mounting tapes, which also automatically unmounts a tape whose presence prevents insertion of therequired new tape. In this case, an UNMOUNT instruction is not necessary. For devices that have separate mounting andunmounting routines, you must include an UNMOUNT instruction to remove a tape when the Tape Coordinator is finished withit; otherwise, subsequent attempts to run the MOUNT instruction fail with an error.

When the device configuration file includes a MOUNT instruction, you must stock the stacker or jukebox with the necessarytapes before running a backup operation. Many jukeboxes are able to search for the required tape by reading external labels (suchas barcodes) on the tapes, but many stackers can only switch between tapes in sequence and sometimes only in one direction. Inthe latter case, you must also stock the tapes in the correct order.

To obtain a list of the tapes required for a restore operation so that you can prestock them in the tape device, include the -n flag onthe appropriate backup command (backup diskrestore, backup volrestore, or backup volsetrestore). For a dump operation,it is generally sufficient to stock the device with more tapes than the operation is likely to require. You can prelabel the tapeswith permanent names or AFS tape names, or not prelabel them at all. If you prelabel the tapes for a dump operation with AFStape names, then it is simplest to load them into the stacker in sequential order by tape index. But it is probably simpler still toprelabel tapes with permanent tape names or use unlabeled tapes, in which case the Backup System generates and applies theappropriately indexed AFS tape name itself during the dump operation.

6.10.2.1 How the Tape Coordinator Uses the MOUNT and UNMOUNT Instructions

When you issue the butc command to initialize the Tape Coordinator for a given tape device, the Tape Coordinator looks forthe device configuration file called /usr/afs/backup/CFG_device_name on its local disk, where device_name has the formatdescribed in Creating a Device Configuration File. If the file exists and contains a MOUNT instruction, then whenever the TapeCoordinator needs a tape, it executes the script named by the instruction’s filename argument.

If the device configuration file does not exist, or does not include a MOUNT instruction, then whenever the Tape Coordinatorneeds a tape, it generates a prompt in its window instructing the operator to insert the necessary tape. The operator must insertthe tape and press <Return> before the Tape Coordinator continues the backup operation.

Note, however, that you can modify the Tape Coordinator’s behavior with respect to the first tape needed for an operation, bysetting the AUTOQUERY instruction in the device configuration file to NO, or including the -noautoquery flag to the butccommand. In this case, the Tape Coordinator does not execute the MOUNT instruction or prompt for a tape at the start of anoperation, because it expects to find the required first tape in the drive. See Eliminating the Search or Prompt for the Initial Tape.

If there is an UNMOUNT instruction in the device configuration file, then whenever the Tape Coordinator closes the tape device,it executes the script named by the instruction’s filename argument. It executes the script only once, and regardless of whetherthe close operation on the device succeeded or not. If the device configuration file does not include an UNMOUNT instruction,then the Tape Coordinator takes no action.

6.10.2.2 The Available Parameters and Required Exit Codes

When the Tape Coordinator executes the MOUNT script, it passes in five parameters, ordered as follows. You can use theparameters in your script to refine its response to varying circumstances that can arise during a backup operation.

1. The tape device or backup data file’s pathname, as recorded in the /usr/afs/backup/tapeconfig file.

2. The tape operation, which (except for the exceptions noted in the following list) matches the backup command operationcode used to initiate the operation:

• appenddump (when a backup dump command includes the -append flag)


• dump (when a backup dump command does not include the -append flag)

• labeltape• readlabel• restore (for a backup diskrestore, backup volrestore, or backup volsetrestore command)

• restoredb• savedb• scantape

3. The number of times the Tape Coordinator has attempted to open the tape device or backup data file. If the open attemptreturns an error, the Tape Coordinator increments this value by one and again invokes the MOUNT instruction.

4. The tape name. For some operations, the Tape Coordinator passes the string none, because it does not know the tape name(when running the backup scantape or backup readlabel, for example), or because the tape does not necessarily have aname (when running the backup labeltape command, for example).

5. The tape ID recorded in the Backup Database. As with the tape name, the Backup System passes the string none foroperations where it does not know the tape ID or the tape does not necessarily have an ID.

Your MOUNT script must return one of the following exit codes to tell the Tape Coordinator whether or not it mounted the tapesuccessfully:

• Code 0 (zero) indicates a successful mount, and the Tape Coordinator continues the backup operation. If the script or programcalled by the MOUNT instruction does not return this exit code, the Tape Coordinator never calls the UNMOUNT instruction.

• Code 1 indicates that mount attempt failed. The Tape Coordinator terminates the backup operation.

• Any other code indicates that the script was unable to access the correct tape. The Tape Coordinator prompts the operator toinsert it.

When the Tape Coordinator executes the UNMOUNT script, it passes in two parameters in the following order.

1. The tape device’s pathname (as specified in the /usr/afs/backup/tapeconfig file)

2. The tape operation, which is always unmount.

The following example script uses two of the parameters passed to it by the Backup System: tries and operation. It followsthe recommended practice of exiting if the value of the tries parameter exceeds one, because that implies that the stacker isout of tapes.

For a backup dump or backup savedb operation, the routine calls the example stackerCmd_NextTape function provided bythe stacker’s manufacturer. Note that the final lines in the file return the exit code that prompts the operator to insert a tape; theselines are invoked when either the stacker cannot load a tape or a the operation being performed is not one of those explicitlymentioned in the file (is a restore operation, for example).

#! /bin/csh -fset devicefile = $1set operation = $2set tries = $3set tapename = $4set tapeid = $5set exit_continue = 0set exit_abort = 1set exit_interactive = 2#--------------------------------------------if (${tries} > 1) then

echo "Too many tries"exit ${exit_interactive}

endifif (${operation} == "unmount") then


echo "UnMount: Will leave tape in drive"exit ${exit_continue}

endifif ((${operation} == "dump") |\

(${operation} == "appenddump") |\(${operation} == "savedb")) thenstackerCmd_NextTape ${devicefile}if (${status} != 0)exit${exit_interactive}echo "Will continue"exit ${exit_continue}

endifif ((${operation} == "labeltape") |\

(${operation} == "readlabel")) thenecho "Will continue"exit ${exit_continue}

endifecho "Prompt for tape"exit ${exit_interactive}

6.10.3 Eliminating the Search or Prompt for the Initial Tape

By default, the Tape Coordinator obtains the first tape it needs for a backup operation by reading the device configuration file forthe appropriate tape device. If there is a MOUNT instruction in the file, the Tape Coordinator executes the referenced script. Ifthe device configuration file does not exist or does not have a MOUNT instruction in it, the Tape Coordinator prompts you toinsert the correct tape and press <Return>.

If you know in advance that an operation requires a tape, you can increase efficiency by placing the required tape in the drivebefore issuing the backup command and telling the Tape Coordinator’s to skip its initial tape-acquisition steps. This both enablesthe operation to begin more quickly and eliminates that need for you to be present to insert a tape.

There are two ways to bypass the Tape Coordinator’s initial tape-acquisition steps:

1. Include the instruction AUTOQUERY NO in the device configuration file

2. Include the -noautoquery flag to the butc command

To avoid any error conditions that require operator attention, be sure that the tape you are placing in the drive does not containany unexpired dumps and is not write protected. If there is no permanent name on the tape’s label and you are creating an initialdump, make sure that the AFS tape name either matches the volume set and dump set names or is <NULL>. Alternatively,suppress the Tape Coordinator’s name verification step by assigning the value NO to the NAME_CHECK instruction in thedevice configuration file, as described in Eliminating the AFS Tape Name Check.

6.10.4 Enabling Default Responses to Error Conditions

By default, the Tape Coordinator asks you how to respond when it encounters certain error conditions. To suppress the promptsand cause the Tape Coordinator to handle the errors in a predetermined manner, include the instruction ASK NO in the deviceconfiguration file. If you assign the value YES, or omit the ASK instruction completely, the Tape Coordinator prompts you fordirection when it encounters one of the errors.

The following list describes the error conditions and the Tape Coordinator’s response to them.

• The Backup System is unable to dump a volume while running the backup dump command. When you assign the valueNO, the Tape Coordinator omits the volume from the dump and continues the operation. When you assign the value YES, itprompts to ask if you want to try to dump the volume again immediately, to omit the volume from the dump but continue theoperation, or to terminate the operation.


• The Backup System is unable to restore a volume while running the backup diskrestore, backup volrestore, or backup volse-trestore command. When you assign the value NO, the Tape Coordinator continues the operation, omitting the problematicvolume but restoring the remaining ones. When you assign the value YES, it prompts to ask if you want to omit the volumeand continue the operation, or to terminate the operation.

• The Backup System cannot determine if the dump set includes any more tapes while running the backup scantape command(the command’s reference page in the OpenAFS Administration Reference discusses possible reasons for this problem). Whenyou assign the value NO, the Tape Coordinator proceeds as though there are more tapes and invokes the MOUNT script namedin the device configuration file, or prompts the operator to insert the next tape. When you assign the value YES, it prompts toask if there are more tapes to scan.

• The Backup System determines that the tape contains an unexpired dump while running the backup labeltape command.When you assign the value NO, it terminates the operation without relabeling the tape. With a YES value, the Tape Coordinatorprompts to ask if you want to relabel the tape anyway.

6.10.5 Eliminating the AFS Tape Name Check

If a tape does not have a permanent name and you are writing an initial dump to it, then by default the Backup System verifiesthat the tape’s AFS tape name is acceptable. It accepts three types of values:

• A name that reflects the volume set and dump level of the initial dump and the tape’s place in the sequence of tapes for thedump set, as described in Dump Names and Tape Names. If the tape does not already have a permanent name, you can assignthe AFS tape name by using the -name argument to the backup labeltape command.

• A <NULL> value, which results when you assign a permanent name, or provide no value for the backup labeltape command’s-name argument.

• No AFS tape name at all, indicating that you have never labeled the tape or written a dump to it.

To bypass the name check, include the NAME_CHECK NO instruction in the device configuration file. This enables you torecycle a tape without first relabeling it, as long as all dumps on it are expired. (If a tape has unexpired dumps on it but you wantto recycle it anyway, you must use the backup labeltape command to relabel it first. For this to work, the ASK NO instructioncannot appear in the device configuration file.)

6.10.6 Setting the Memory Buffer Size to Promote Tape Streaming

By default, the Tape Coordinator uses a 16-KB memory buffer during dump operations. As it receives volume data from theVolume Server, the Tape Coordinator gathers 16 KB of data in the buffer before transferring the entire 16 KB to the tapedevice. Similarly, during a restore operation the Tape Coordinator by default buffers 32 KB of data from the tape device beforetransferring the entire 32 KB to the Volume Server for restoration into the file system. Buffering makes the volume of dataflowing to and from a tape device more even and so promotes tape streaming, which is the most efficient way for a tape device tooperate.

In a normal network configuration, the default buffer sizes are usually large enough to promote tape streaming. If the networkbetween the Tape Coordinator machine and file server machines is slow, it can help to increase the buffer size.

To determine if altering the buffer size is helpful for your configuration, observe the tape device in operation to see if it isstreaming, or consult the manufacturer. To set the buffer size, include the BUFFERSIZE instruction in the device configurationfile. It takes an integer value, and optionally units, in the following format:

BUFFERSIZE size[{k | K | m | M | g | G}]

where size specifies the amount of memory the Tape Coordinator allocates to use as a buffer during both dump and restoreoperations. The default unit is bytes, but use k or K to specify kilobytes, m or M for megabytes, and g or G for gigabytes. Thereis no space between the size value and the units letter.


6.10.7 Dumping Data to a Backup Data File

You can write dumps to a backup data file rather than to tape. This is useful if, for example, you want to transfer the data to adata-archiving system, such as a hierarchical storage management (HSM) system, that you use in conjunction with AFS and theBackup System. You can restore data from a backup data file into the file system as well. Using a backup data file is usuallymore efficient than issuing the equivalent vos dump and vos restore commands individually for multiple volumes.

Writing to a backup data file is simplest if it is on the local disk of the Tape Coordinator machine, but you can also write the fileto an NFS-mounted partition that resides on a remote machine. It is even acceptable to write to a file in AFS, provided that theaccess control list (ACL) on its parent directory grants the necessary permissions, but it is somewhat circular to back up AFSdata into AFS itself.

If the backup data file does not already exist when the Tape Coordinator attempts to write a dump to it, the Tape Coordinatorcreates it. For a restore operation to succeed, the file must exist and contain volume data previously written to it during a backupdump operation.

When writing to a backup data file, the Tape Coordinator writes data at 16 KB offsets. If a given block of data (such as themarker that signals the beginning or end of a volume) does not fill the entire 16 KB, the Tape Coordinator still skips to the nextoffset before writing the next block. In the output of a backup dumpinfo command issued with the -id option, the value in thePos column is the ordinal of the 16-KB offset at which the volume data begins, and so is not generally only one higher than theposition number on the previous line, as it is for dumps to tape.

Before writing to a backup data file, you need to configure the file as though it were a tape device.

NoteA file pathname, rather than a tape device name, must appear in the third field of the /usr/afs/backup/tapeconfig file whenthe FILE YES instruction appears in the device configuration file, and vice versa. If the tapeconfig file instead refers to a tapedevice, dump operations appear to succeed but are inoperative. You cannot restore data that you accidently dumped to a tapedevice while the FILE instruction was set to YES. In the same way, if the FILE instruction is set to NO, the tapeconfig entrymust refer to an actual tape device.

6.10.8 To configure a backup data file






# backup

4. Choose the port offset number to assign to the file. If necessary, display previously assigned port offsets by issuing the(backup) listhosts command, which is fully described in To display the list of configured Tape Coordinators.

backup> listhosts

As for a tape device, acceptable values are the integers 0 (zero) through 58510 (the Backup System can track a maximumof 58,511 port offset numbers). Each port offset must be unique in the cell, but you can associate any number them with asingle Tape Coordinator machine. You do not have to assign port offset numbers sequentially.

5. Issue the (backup) addhost command to register the backup data file’s port offset in the Backup Database.


backup> addhost <tape machine name> [<TC port offset>]

where

addh Is the shortest acceptable abbreviation of addhost.tape machine name Specifies the fully qualified hostname of the Tape Coordinator machine you invoke to write to the

backup data file.

TC port offset Specifies the file’s port offset number. You must provide this argument unless the default value of 0 (zero)is appropriate.

6. Using a text editor, create an entry for the backup data file in the local /usr/afs/backup/tapeconfig file, using the standardsyntax:

[capacity filemark_size] device_name port_offset

where

capacity Specifies the amount of space on the partition that houses the backup data file that you want to make availablefor the file. To avoid the complications that arise from filling up the partition, it is best to provide a value somewhatsmaller than the actual amount of space you expect to be available when the dump operation runs, and never largerthan the maximum file size allowed by the operating system.Specify a numerical value followed by a letter that indicates units, with no intervening space. The letter k or Kindicates kilobytes, m or M indicates megabytes, and g or G indicates gigabytes. If you omit the units letter, thedefault is kilobytes. If you leave this field empty, the Tape Coordinator uses the maximum acceptable value (2048GB or 2 TB). Also leave the filemark_size field empty in that case.

filemark_size Specify the value 0 (zero) or leave both this field and the capacity field empty. In the latter case, the TapeCoordinator also uses the value zero.

device_name Specifies the complete pathname of the backup data file. Rather than specifying an actual file pathname,however, the recommended configuration is to create a symbolic link in the /dev directory that points to the actualfile pathname, and record the symbolic link in this field. This configuration provides these advantages:

• It makes the device_name portion of the CFG_device_name, of the TE_device_name, and of the TL_device_namefilenames as short as possible. Because the symbolic link is in the /dev directory as though it is a tape device, youstrip off the entire /dev/ prefix when forming the filename, instead of just the initial slash (/). If, for example, thesymbolic link is called /dev/FILE, the device configuration file’s name is CFG_FILE, whereas if the actual path-name /var/tmp/FILE appears in the tapeconfig file, the configuration file’s name must be CFG_var_tmp_FILE.

• It provides for a more graceful, and potentially automated, recovery if the Tape Coordinator cannot write a completedump into the backup data file (for example, because the partition housing the backup data file becomes full). TheTape Coordinator’s reaction to this problem is to invoke the MOUNT script, or to prompt you if the MOUNTinstruction does not appear in the configuration file.– If there is a MOUNT script, you can prepare for this situation by adding a subroutine to the script that changes

the symbolic link to point to another backup data file on a partition where there is space available.– If there is no MOUNT instruction, the prompt enables you manually to change the symbolic link to point to

another backup data file and then press <Return> to signal that the Tape Coordinator can continue the operation.If this field names the actual file, there is no way to recover from exhausting the space on the partition. You cannotchange the tapeconfig file in the middle of an operation.

port_offset Specifies the port offset number that you chose for the backup data file.

7. Create the device configuration file CFG_device_name in the Tape Coordinator machine’s /usr/afs/backup directory.Include the FILE YES instruction in the file.

Construct the device_name portion of the name based on the device name you recorded in the tapeconfig file in Step 6. If,as recommended, you recorded a symbolic link name, strip off the /dev/ string and replace any other slashes (/) in the namewith underscores (_). For example, CFG_FILE is the appropriate name if the symbolic link is /dev/FILE. If you recordedthe name of an actual file, then strip off the initial slash only and replace any other slashes in the name with underscores.For a backup data file called /var/tmp/FILE, the appropriate device configuration filename is CFG_var_tmp_FILE.


8. If you chose in Step 6 to record a symbolic link name in the device_name field of the tapeconfig entry, then you must doone of the following:

• Use the ln -s command to create the appropriate symbolic link in the /dev directory

• Write a script that initializes the backup data file in this way, and include a MOUNT instruction in the device configura-tion file to invoke the script. An example script appears following these instructions.

You do not need to create the backup data file itself, because the Tape Coordinator does so if the file does not exist when thedump operation begins.

The following example script illustrates how you can automatically create a symbolic link to the backup data file during thepreparation phase for writing to the file. When the Tape Coordinator is executing a backup dump, backup restore, backupsavedb, or backup restoredb operation, the routine invokes the UNIX ln -s command to create a symbolic link from the backupdata file named in the tapeconfig file to the actual file to use (this is the recommended method). It uses the values of thetapename and tapeid parameters passed to it by the Backup System when constructing the filename.

The routine makes use of two other parameters as well: tries and operation. The tries parameter tracks how manytimes the Tape Coordinator has attempted to access the file. A value greater than one indicates that the Tape Coordinator cannotaccess it, and the routine returns exit code 2 (exit_interactive), which results in a prompt for the operator to load a tape.The operator can use this opportunity to change the name of the backup data file specified in the tapeconfig file.

#! /bin/csh -fset devicefile = $1set operation = $2set tries = $3set tapename = $4set tapeid = $5set exit_continue = 0set exit_abort = 1set exit_interactive = 2#--------------------------------------------if (${tries} > 1) then

echo "Too many tries"exit ${exit_interactive}

endifif (${operation} == "labeltape") then

echo "Won’t label a tape/file"exit ${exit_abort}

endifif ((${operation} == "dump") |\

(${operation} == "appenddump") |\(${operation} == "restore") |\(${operation} == "savedb") |\(${operation} == "restoredb")) then

/bin/rm -f ${devicefile}/bin/ln -s /hsm/${tapename}_${tapeid} ${devicefile}if (${status} != 0) exit ${exit_abort}

endifexit ${exit_continue}


Chapter 7

Backing Up and Restoring AFS Data

The instructions in this chapter explain how to back up and restore AFS data and to administer the Backup Database. Theyassume that you have already configured all of the Backup System components by following the instructions in Configuring theAFS Backup System.



Enter interactive mode backup (interactive)Leave interactive mode (backup) quitList operations in interactive mode (backup) jobsCancel operation in interactive mode (backup) killStart Tape Coordinator butcStop Tape Coordinator <Ctrl-c>Check status of Tape Coordinator backup statusBack up data backup dumpDisplay dump records backup dumpinfoDisplay volume’s dump history backup volinfoScan contents of tape backup scantapeRestore volume backup volrestoreRestore partition backup diskrestoreRestore group of volumes backup volsetrestoreVerify integrity of Backup Database backup dbverify

Repair corruption in Backup Database backup savedb and backuprestoredb

Delete dump set from Backup Database backup deletedump

7.2 Using the Backup System’s Interfaces

When performing backup operations, you interact with three Backup System components:

• You initiate backup operations by issuing commands from the backup suite. You can issue the commands in a command shell(or invoke them in a shell script) on any AFS client or server machine from which you can access the backup binary. In theconventional configuration, the binary resides in the /usr/afs/bin directory on a server machine and the /usr/afsws/etc directoryon a client machine.

The suite provides an interactive mode, in which you can issue multiple commands over a persistent connection to the Backup


Server and the Volume Location (VL) Server. Interactive mode has several convenient features. For a discussion and instruc-tions, see Using Interactive and Regular Command Mode.

Note that some operating systems include a backup command of their own. You must configure machines that run such anoperating system to ensure that you are accessing the desired backup binary.

• Before you perform a backup operation that involves reading or writing to a tape device or backup data file, you must open adedicated connection to the appropriate Tape Coordinator machine and start the Tape Coordinator (butc) process that handlesthe device or file. The butc process must continue to run over the dedicated connection as long as it is executing an operationor is to be available to execute one. For further discussion and instructions, see Starting and Stopping the Tape CoordinatorProcess.

• The Backup Server (buserver) process must be running on database server machines, because most backup operations requireaccessing or changing information in the Backup Database. The OpenAFS Quick Beginnings explains how to configure theBackup Server.

For consistent Backup System performance, the AFS build level of all three binaries (backup, butc, and buserver) must match.For instructions on displaying the build level, see Displaying A Binary File’s Build Level.

7.2.1 Performing Backup Operations as the Local Superuser Root or in a Foreign Cell

By default, the volumes and Backup Database involved in a backup operation must reside on server machines that belong tothe cell named in the /usr/vice/etc/ThisCell files on both the Tape Coordinator machine and the machine where you issue thebackup command. Also, to issue most backup commands you must have AFS tokens for an identity listed in the local cell’s/usr/afs/etc/UserList file (which by convention is the same on every server machine in a cell). You can, however, perform backupoperations on volumes or the Backup Database from a foreign cell, or perform backup operations while logged in as the localsuperuser root rather than as a privileged AFS identity.

To perform backup operations on volumes that reside in a foreign cell using machines from the local cell, you must designatethe foreign cell as the cell of execution for both the Tape Coordinator and the backup command interpreter. Use one of thetwo following methods. For either method, you must also have tokens as an administrator listed in the foreign cell’s /usr/af-s/etc/UserList file.

• Before issuing backup commands and the butc command, set the AFSCELL environment variable to the foreign cell name inboth command shells.

• Include the -cell argument to the butc and all backup commands. If you include the argument on the backup (interactive)command, it applies to all commands issued during the interactive session.

To perform backup operations without having administrative AFS tokens, you must log on as the local superuser root on boththe Tape Coordinator machine and the machine where you issue backup commands. Both machines must be server machines, orat least have a /usr/afs/etc/KeyFile file that matches the file on other server machines. Then include the -localauth argument onboth the butc command and all backup commands (or the backup (interactive) command). The Tape Coordinator and backupcommand interpreter construct a server ticket using the server encryption key with the highest key version number in the local/usr/afs/etc/KeyFile file, and present it to the Backup Server, Volume Server, and VL Server that belong to the cell named in thelocal /usr/afs/etc/ThisCell file. The ticket never expires.

You cannot combine the -cell and -localauth options on the same command. Also, each one overrides the local cell settingdefined by the AFSCELL environment variable or the /usr/vice/etc/ThisCell file.

7.2.2 Using Interactive and Regular Command Mode

The backup command suite provides an interactive mode, in which you can issue multiple commands over a persistent connectionto the Backup Server and the VL Server. Interactive mode provides the following features:

• The backup> prompt replaces the usual command shell prompt.

• You omit the initial backup string from command names. Type only the operation code and option names.


• You cannot issue commands that do not belong to the backup suite.

• If you assume an administrative AFS identity or specify a foreign cell as you enter interactive mode, it applies to all commandsissued during the interactive session. See Performing Backup Operations as the Local Superuser Root or in a Foreign Cell.

• You do not need to enclose shell metacharacters in double quotes.

When you initiate a backup operation in interactive mode, the Backup System assigns it a job ID number. You can display thelist of current and pending operations with the (backup) jobs command, for which instructions appear in To display pending orrunning jobs in interactive mode. (In both regular and interactive modes, the Tape Coordinator also assigns a task ID number toeach operation you initiate with a backup command. You can track task ID numbers with the backup status command. SeeStarting and Stopping the Tape Coordinator Process.)

You can cancel an operation in interactive mode with the (backup) kill command, for which instructions appear in To canceloperations in interactive mode. However, it is best not to interrupt a dump operation because the resulting dump is incomplete,and interrupting a restore operation can leave volumes in an inconsistent state, or even completely remove them from the servermachine. For further discussion, see Backing Up Data and Restoring and Recovering Data.

The (backup) jobs and (backup) kill commands are available only in interactive mode and there is no equivalent functionalityin regular command mode.

7.2.3 To enter interactive mode

1. Verify that you are authenticated as a user listed in the /usr/afs/etc/UserList file. Entering interactive mode does not itselfrequire privilege, but most other backup commands do, and the AFS identity you assume when entering the mode appliesto all commands you issue within it. If necessary, issue the bos listusers command, which is fully described in To displaythe users in the UserList file.


2. Issue the backup (interactive) command at the system prompt. The backup> prompt appears. You can include either,but not both, of the -localauth and -cell options, as discussed in Performing Backup Operations as the Local SuperuserRoot or in a Foreign Cell.

% backupbackup>

7.2.4 To exit interactive mode

1. Issue the quit command at the backup> prompt. The command shell prompt reappears when the command succeeds,which it does only if there are no jobs pending or currently running. To display and cancel pending or running jobs, followthe instructions in To display pending or running jobs in interactive mode and To cancel operations in interactive mode.

backup> quit%

7.2.5 To display pending or running jobs in interactive mode

1. Issue the jobs command at the backup> prompt.

backup> jobs

where

j Is the shortest acceptable abbreviation of jobs.


The output always includes the expiration date and time of the tokens that the backup command interpreter is using during thecurrent interactive session, in the following format:

date time: TOKEN EXPIRATION

If the execution date and time specified for a scheduled dump operation is later than date time, then its individual line (asdescribed in the following paragraphs) appears below this line to indicate that the current tokens will not be available to it.

If the issuer of the backup command included the -localauth flag when entering interactive mode, the line instead reads asfollows:

: TOKEN NEVER EXPIRES

The entry for a scheduled dump operation has the following format:

Job job_ID: timestamp: dump volume_set dump_level

where

job_ID Is a job identification number assigned by the Backup System.

timestamp Indicates the date and time the dump operation is to begin, in the format month/date/year hours:minutes (in 24-hourformat)

volume_set Indicates the volume set to dump.

dump_level Indicates the dump level at which to perform the dump operation.

The line for a pending or running operation of any other type has the following format:

Job job_ID: operation status

where

job_ID Is a job identification number assigned by the Backup System.

operation Identifies the operation the Tape Coordinator is performing, which is initiated by the indicated command:

Dump (dump name) Initiated by the backup dump command. The dump name has the following format:volume_set_name.dump_level_name

Restore Initiated by the backup diskrestore, backup volrestore, or backup volsetrestore command.

Labeltape (tape_label) Initiated by the backup labeltape command. The tape_label is the name specified by thebackup labeltape command’s -name or -pname argument.

Scantape Initiated by the backup scantape command.

SaveDb Initiated by the backup savedb command.

RestoreDb Initiated by the backup restoredb command.

status Indicates the job’s current status in one of the following messages. If no message appears, the job is either still pendingor has finished.

number Kbytes, volume volume_name For a running dump operation, indicates the number of kilobytes copiedto tape or a backup data file so far, and the volume currently being dumped.

number Kbytes, restore.volume For a running restore operation, indicates the number of kilobytes copied intoAFS from a tape or a backup data file so far.

[abort requested] The (backup) kill command was issued, but the termination signal has yet to reach the TapeCoordinator.

[abort sent] The operation is canceled by the (backup) kill command. Once the Backup System removes an opera-tion from the queue or stops it from running, it no longer appears at all in the output from the command.


[butc contact lost] The backup command interpreter cannot reach the Tape Coordinator. The message can meaneither that the Tape Coordinator handling the operation was terminated or failed while the operation was running, orthat the connection to the Tape Coordinator timed out.

[done] The Tape Coordinator has finished the operation.

[drive wait] The operation is waiting for the specified tape drive to become free.

[operator wait] The Tape Coordinator is waiting for the backup operator to insert a tape in the drive.

7.2.6 To cancel operations in interactive mode

1. Issue the jobs command at the backup> prompt, to learn the job ID number of the operation you want to cancel. Fordetails, see To display pending or running jobs in interactive mode.

backup> jobs

2. Issue the (backup) kill command to cancel the operation.

backup> kill <job ID or dump set name>

where

k Is the shortest acceptable abbreviation of kill.job ID or dump set name Specifies either the job ID number of the operation to cancel, as reported by the jobs command,

or for a dump operation only, the dump name in the format volume_set_name.dump_level_name.

7.2.7 Starting and Stopping the Tape Coordinator Process

Before performing a backup operation that reads from or writes to a tape device or backup data file, you must start the TapeCoordinator (butc) process that handles the drive or file. This section explains how to start, stop, and check the status of a TapeCoordinator process. To use these instructions, you must have already configured the Tape Coordinator machine and created aTape Coordinator entry in the Backup Database, as instructed in Configuring Tape Coordinator Machines and Tape Devices.

The Tape Coordinator assigns a task ID number to each operation it performs. The number is distinct from the job ID numberassigned by the backup command interpreter in interactive mode (which is discussed in Using Interactive and Regular CommandMode). The Tape Coordinator reports the task ID number in its onscreen trace and in the messages that it writes to its log and errorfiles. To view the task ID numbers of a Tape Coordinator’s running or pending operations, issue the backup status command.

7.2.8 To start a Tape Coordinator process

1. Verify that you are authenticated as a user listed in the /usr/afs/etc/UserList file of the cell in which the Tape Coordinatoris to access volume data and the Backup Database. If necessary, issue the bos listusers command, which is fully describedin To display the users in the UserList file.


Alternately, you can log into a file server machine as the local superuser root in Step 3.

2. Verify that you can write to the Tape Coordinator’s log and error files in the local /usr/afs/backup directory (the TE_device_nameand TL_device_name files). If the log and error files do not already exist, you must be able to insert and write to files inthe /usr/afs/backup directory.

3. Open a connection (using a command such as telnet or rlogin) to the Tape Coordinator machine that drives the tape device,or whose local disk houses the backup data file. The Tape Coordinator uses a devoted connection or window that mustremain open for the Tape Coordinator to accept requests and while it is executing them.

If you plan to include the -localauth flag to the butc command in the next step, log in as the local superuser root.


4. Issue the butc command to start the Tape Coordinator. You can include either, but not both, of the -localauth and -celloptions, as discussed in Performing Backup Operations as the Local Superuser Root or in a Foreign Cell.

% butc [<port offset>] [-debuglevel <trace level>] \[-cell <cellname>] [-noautoquery] [-localauth]

where

butc Must be typed in full.

port offset Specifies the Tape Coordinator’s port offset number. You must provide this argument unless the default valueof 0 (zero) is appropriate.

-debuglevel Specifies the type of trace messages that the Tape Coordinator writes to the standard output stream (stdout).Provide one of the following three values, or omit this argument to display the default type of messages (equivalentto setting a value of 0 [zero]):

• 0: The Tape Coordinator generates only the minimum number of messages necessary to communicate with thebackup operator, including prompts for insertion of additional tapes and messages that indicate errors or the begin-ning or completion of operations.

• 1: In addition to the messages displayed at level 0, the Tape Coordinator displays the name of each volume beingdumped or restored.

• 2: In addition to the messages displayed at levels 0 and 1, the Tape Coordinator displays all of the messages it isalso writing to its log file (/usr/afs/backup/TL_device_name).

cellname Names the cell in which to perform the backup operations (the cell where the relevant volumes reside and theBackup Server process is running). If you omit this argument, the Tape Coordinator uses its home cell, as defined inthe local /usr/vice/etc/ThisCell file. Do not combine this argument with the -localauth flag.

-noautoquery Disables the Tape Coordinator’s prompt for the first tape it needs for each operation. For a description ofthe advantages and consequences of including this flag, see Eliminating the Search or Prompt for the Initial Tape.

-localauth Constructs a server ticket using a key from the local /usr/afs/etc/KeyFile file. The butc process presents itto the Backup Server, Volume Server, and VL Server during mutual authentication. You must be logged into a fileserver machine as the local superuser root to include this flag, and cannot combine it with the -cell argument.

7.2.9 To stop a Tape Coordinator process

1. Enter an interrupt signal such as <Ctrl-c> over the dedicated connection to the Tape Coordinator.

7.2.10 To check the status of a Tape Coordinator process



2. Issue the backup status command.

% backup status [<TC port offset>]

where

st Is the shortest acceptable abbreviation of status.

TC port offset Specifies the Tape Coordinator’s port offset number. You must provide this argument unless the defaultvalue of 0 (zero) is appropriate.

The following message indicates that the Tape Coordinator is not currently performing an operation:


Tape coordinator is idle

Otherwise, the output includes a message of the following format for each running or pending operation:

Task task_ID: operation: status

where

task_ID Is a task identification number assigned by the Tape Coordinator. It begins with the Tape Coordinator’s port offsetnumber.

operation Identifies the operation the Tape Coordinator is performing, which is initiated by the indicated command:

• Dump (the backup dump command)

• Restore (the backup diskrestore, backup volrestore, or backup volsetrestore commands)

• Labeltape (the backup labeltape command)

• Scantape (the backup scantape command)

• SaveDb (the backup savedb command)

• RestoreDb (the backup restoredb command)

status Indicates the job’s current status in one of the following messages.

number Kbytes transferred, volume volume_name For a running dump operation, indicates the number ofkilobytes copied to tape or a backup data file so far, and the volume currently being dumped.

number Kbytes, restore.volume For a running restore operation, indicates the number of kilobytes copied intoAFS from a tape or a backup data file so far.

[abort requested] The (backup) kill command was issued, but the termination signal has yet to reach the TapeCoordinator.

[abort sent] The operation is canceled by the (backup) kill command. Once the Backup System removes an opera-tion from the queue or stops it from running, it no longer appears at all in the output from the command.

[butc contact lost] The backup command interpreter cannot reach the Tape Coordinator. The message can meaneither that the Tape Coordinator handling the operation was terminated or failed while the operation was running, orthat the connection to the Tape Coordinator timed out.

[done] The Tape Coordinator has finished the operation.

[drive wait] The operation is waiting for the specified tape drive to become free.

[operator wait] The Tape Coordinator is waiting for the backup operator to insert a tape in the drive.

If the Tape Coordinator is communicating with an XBSA server (a third-party backup utility that implements the Open Group’sBackup Service API [XBSA]), the following message appears last in the output:

XBSA_program Tape coordinator

where XBSA_program is the name of the XBSA-compliant program.

7.3 Backing Up Data

This section explains how to use the backup dump command to back up AFS data to tape or to a backup data file. Theinstructions assume that you understand Backup System concepts and have already configured the Backup System according tothe instructions in Configuring the AFS Backup System. Specifically, you must already have:

• Decided whether to dump data to tape or to a backup data file, and configured the Tape Coordinator machine and Tape Coor-dinator process appropriately. See Configuring Tape Coordinator Machines and Tape Devices and Dumping Data to a BackupData File.


• Defined a volume set that includes the volumes you want to dump together. See Defining and Displaying Volume Sets andVolume Entries.

• Defined the dump level in the dump hierarchy at which you want to dump the volume set. If it is an incremental dump level,you must have previously created a dump at its parent level. See Defining and Displaying the Dump Hierarchy.

• Created a device configuration file. Such a file is required for each tape stacker, jukebox device, or backup data file. You canalso use it to configure the Backup System’s automation features. See Automating and Increasing the Efficiency of the BackupProcess.

The most basic way to perform a dump operation is to create an initial dump of a single volume set as soon as the appropriateTape Coordinator is available, by providing only the required arguments to the backup dump command. Instructions appear inTo create a dump. The command has several optional arguments that you can use to increase the efficiency and flexibility of yourbackup procedures:

• To append a dump to the end of a set of tapes that already contains other dumps, include the -append argument. Otherwise, theBackup System creates an initial dump. Appending dumps enables you to use a tape’s full capacity and has other potentiallyuseful features. For a discussion, see Appending Dumps to an Existing Dump Set.

• To schedule one or more dump operations to run at a future time, include the -at argument. For a discussion and instructions,see Scheduling Dumps.

• To initiate a number of dump operations with a single backup dump command, include the -file argument to name a file inwhich you have listed the commands. For a discussion and instructions, see Appending Dumps to an Existing Dump Set andScheduling Dumps.

• To generate a list of the volumes to be included in a dump, without actually dumping them, combine the -n flag with the otherarguments to be used on the actual command.

7.3.1 Making Backup Operations More Efficient

There are several ways to make dump operations more efficient, less prone to error, and less disruptive to your users. Several ofthem also simplify the process of restoring data if that becomes necessary.

• It is best not to dump the read/write or read-only version of a volume, because no other users or processes can access a volumewhile it is being dumped. Instead, shortly before the dump operation begins, create a backup version of each volume to bedumped, and dump the backup version. Creating a Backup version usually makes the source volume unavailable for just a fewmoments (during which access attempts by other processes are blocked but do not fail). To automate the creation of backupvolumes, you can create a cron process in the /usr/afs/local/BosConfig file on one or more server machines, setting its starttime at a sufficient interval before the dump operation is to begin. Include the -localauth argument to the vos backup or vosbackupsys command to enable it to run without administrative tokens. For instructions, see To create and start a new process.

• The volume set, dump level, and Tape Coordinator port offset you specify on the backup dump command line must be properlydefined in the Backup Database. The Backup System checks the database before beginning a dump operation and halts thecommand immediately if any of the required entities are missing. If necessary, use the indicated commands:

– To display volume sets, use the backup listvolsets command as described in To display volume sets and volume entries.– To display dump levels, use the backup listdumps command as described in To display the dump hierarchy.– To display port offsets, use the backup listhosts command as described in To display the list of configured Tape Coordina-

tors.

• Ensure that a valid token corresponding to a privileged administrative identity is available to the Backup System processesboth when the backup dump command is issued and when the dump operation actually runs (for a complete description orthe necessary privileges, see Granting Administrative Privilege to Backup Operators). This is a special concern for scheduleddumps. One alternative is to run backup commands (or the script that invokes them) and the butc command on servermachines, and to include the -localauth argument on the command. In this case, the processes use the key with the highest keyversion number in the local /usr/afs/etc/KeyFile file to construct a token that never expires. Otherwise, you must use a methodto renew tokens before they expire, or grant tokens with long lifetimes. In either case, you must protect against improper accessto the tokens by securing the machines both physically and against unauthorized network access. The protection possibly needsto be even stronger than when a human operator is present during the operations.


• Record tape capacity and filemark size values that are as accurate as possible in the Tape Coordinator’s /usr/afs/backup/-tapeconfig file and on the tape’s label. For suggested values and a description of what can happen when they are inaccurate,see Configuring the tapeconfig File.

• If an unattended dump requires multiple tapes, arrange to provide them by properly configuring a tape stacker or jukebox andwriting a tape-mounting script to be invoked in the device’s CFG_device_name file. For instructions, see Invoking a Device’sTape Mounting and Unmounting Routines.

• You can configure any tape device or backup data file’s CFG_device_name file to take advantage of the Backup System’sautomation features. See Automating and Increasing the Efficiency of the Backup Process.

• When you issue a backup command in regular (noninteractive) mode, the command shell prompt does not return until theoperation completes. To avoid having to open additional connections, issue the backup dump command in interactive mode,especially when including the -at argument to schedule dump operations.

• An incremental dump proceeds most smoothly if there is a dump created at the dump level immediately above the level youare using. If the Backup System does not find a Backup Database record for a dump created at the immediate parent level,it looks for a dump created at one level higher in the hierarchy, continuing up to the full dump level if necessary. It createsan incremental dump at the level one below the lowest valid parent dump that it finds, or even creates a full dump if that isnecessary. This algorithm guarantees that the dump captures all data that has changed since the last dump, but has a couple ofdisadvantages. First, the Backup System’s search through the database for a valid parent dump takes extra time. Second, thesubsequent pattern of dumps can be confusing to a human operator who needs to restore data from them, because they werenot performed at the expected dump levels.

The easiest way to guarantee that a dump exists at the immediate parent level is always to perform dump operations on thepredetermined schedule. To check that the parent dump exists, you can issue the backup dumpinfo command (as described inTo display dump records) and search for it in the output. Alternatively, issue the backup volinfo command (as described in Todisplay a volume’s dump history) for a volume that you believe is in the parent dump.

• Always use dump levels from the same hierarchy (levels that are descendants of the same full level) when dumping a givenvolume set. The result of alternating between levels from different hierarchies can be confusing when you need to restore dataor read dump records. It also increases the chance that changed data is not captured in any dump, or is backed up redundantlyinto more than one dump.

• Use permanent tape names rather than AFS tape names. You can make permanent names more descriptive than is allowedby an AFS tape name’s strict format, and also bypass the name-checking step that the Backup System performs by defaultwhen a tape has an AFS tape name only. You can also configure the Tape Coordinator always to skip the check, however; forinstructions and a description of the acceptable format for AFS tape names, see Eliminating the AFS Tape Name Check.

• If you write dumps to tape, restore operations are simplest if all of your tape devices are compatible (can read the same typeof tape, at the same compression ratios, and so on). If you must use incompatible devices, then at least use compatible devicesfor all dumps performed at dump levels that are at the same depth in their respective hierarchies (compatible devices for alldumps performed at a full dump level, compatible devices for all dumps performed at a level 1 incremental dump level, and soon). The -portoffset argument to the backup diskrestore and backup volsetrestore commands accepts multiple port offsetnumbers, but uses the first listed port offset when restoring all full dumps, the second port offset when restoring all level 1dumps, and so on. If you did not use compatible tape devices when creating dumps at the same depth in a hierarchy, you mustrestore one volume at a time with the backup volrestore command.

• In some cases, it makes sense to use a temporary volume set, which exists only within the context of the interactive sessionin which it is created and for which no record is created in the Backup Database. One suitable situation is when dumping avolume to tape in preparation for removing it permanently (perhaps because its owner is leaving the cell). In this case, you candefine a volume entry that includes only the volume of interest without cluttering up the Backup Database with a volume setrecord that you are using only once.

• Do not perform a dump operation when you know that there are network, machine, or server process problems that canprevent the Backup System from accessing volumes or the Volume Location Database (VLDB). Although the Backup Systemautomatically makes a number of repeated attempts to get to an inaccessible volume, the dump operation takes extra time andin some cases stops completely to prompt you for instructions on how to continue. Furthermore, if the Backup System’s lastaccess attempt fails and the volume is omitted from the dump, you must take extra steps to have it backed up (namely, the stepsdescribed just following for a halted dump operation). For a more complete description of how the Backup System makesrepeated access attempts, see How Your Configuration Choices Influence the Dump Process.


• Review the logs created by the Backup System as soon as possible after a dump operation completes, particularly if it ranunattended. They name any volumes that were not successfully backed up, among other problems. The Backup Server writesto the /usr/afs/logs/BackupLog file on the local disk of the database server machine, and you can use the bos getlog commandto read it remotely if you wish; for instructions, see Displaying Server Process Log Files. The Tape Coordinator writes to twofiles in the local /usr/afs/backup directory on the machine where it is running: the TE_device_name file records errors, andthe TL_device_name file records both trace and error messages.

• Avoid halting a dump operation (for instance, by issuing the (backup) kill command in interactive mode), both becauseit introduces the potential for confusion and because recovering from the interruption requires extra effort. When a dumpoperation is interrupted, the volumes that were backed up before the halt signal is received are complete on the tape or in thebackup data file, and are usable in restore operations. The records in the Backup Database about the volumes’ dump historyaccurately show when and at which dump level they were backed up; to display the records, use the backup volinfo commandas described in To display a volume’s dump history.

However, there is no indication in the dump’s Backup Database record that volumes were omitted; to display the record, usethe backup dumpinfo command as described in To display dump records. You must choose one of the following methodsfor dealing with the volumes that were not backed up before the dump operation halted. (Actually, you must make the samedecision if the dump operation halts for reasons outside your control.)

– You can take no action, waiting until the next regularly scheduled dump operation to back them up. At that time, the BackupSystem automatically dumps them at the appropriate level to guarantee that the dump captures all of the data that changedsince the volume was last dumped. However, you are gambling that restoring the volume is not necessary before the nextdump operation. If restoration is necessary, you can restore the volume only to its state at the time it was last included in adump--you have lost all changes made to the volume since that time.

– You can discard the entire dump and run the dump operation again. To discard the dump, use the backup labeltape commandto relabel the tapes or backup data file, which automatically removes all associated records from the Backup Database. Forinstructions, see Writing and Reading Tape Labels. If a long time has passed since the backup version of the volumes wascreated, some of the source volumes have possibly changed. If that seems likely, reissue the vos backup or vos backupsyscommand on them before redoing the dump operation.

– You can create a new volume set that includes the missed volumes and dump it at a full dump level (even if you specify anincremental dump level, the Backup System uses the full dump level at the top of your specified level’s hierarchy, because ithas never before backed up these volumes as part of the new volume set). The next time you dump the original volume set,the Backup System automatically dumps the missed volumes at the level one below the level it used the last time it dumpedthe volumes as part of the original volume set.

7.3.2 How Your Configuration Choices Influence the Dump Process

This section provides an overview of the backup process, describing what happens at each stage both by default and as a resultof your configuration choices, including the configuration instructions you include in the device-specific CFG_device_name file.For the sake of clarity, it tracks the progress of a single backup dump command that creates an initial dump. For a discussion ofthe slight differences in the procedure when you append or schedule dumps, see Appending Dumps to an Existing Dump Set orScheduling Dumps.

As a concrete example, the following description traces a dump of the volume set user at the /weekly/mon/tues/wed dump level.The user volume set has one volume entry that matches the backup version of all user volumes:

.* .* user.*\.backup

The dump level belongs to the following dump hierarchy.

/weekly/mon

/tues/wed

/thurs/fri


1. You issue the butc command to start a Tape Coordinator to handle the dump operation. The Tape Coordinator does nothave to be running when you issue the backup dump command, but must be active in time to accept the list of volumes tobe included in the dump, when Step 3 is completed. To avoid coordination problems, it is best to start the Tape Coordinatorbefore issuing the backup dump command.

As the Tape Coordinator initializes, it reads the entry in its local /usr/afs/backup/tapeconfig file for the port offset youspecify on the butc command line. The entry specifies the name of the device to use, and the Tape Coordinator verifiesthat it can access it. It also reads the device’s configuration file, /usr/afs/backup/CFG_device_name, if it exists. See Step6 for a description of how the instructions in the file influence the dump operation.

2. You issue the backup dump command, specifying a volume set, dump level, and the same port offset number you specifiedon the butc command in Step 1. The Backup System verifies that they have correct Backup Database records and halts theoperation with an error message if they do not.

If you issue the command in interactive mode, the Backup System assigns the operation a job ID number, which you canuse to check the operation’s status or halt it by using the (backup) jobs or (backup) kill command, respectively. Forinstructions, see To display pending or running jobs in interactive mode and To cancel operations in interactive mode.

3. The Backup System works with the VL Server to generate a list of the volumes in the VLDB that match the name andlocation criteria defined in the volume set’s volume entries. If a volume matches more than one volume entry, the BackupSystem ignores the duplicates so that the dump includes only one copy of data from the volume.

To reduce the number of times you need to switch tapes during a restore operation, the Backup System sorts the volumes byserver machine and partition, and during the dump operation writes the data from all volumes stored on a specific partitionbefore moving to the next partition.

As previously mentioned, it is best to back up backup volumes rather than read/write volumes, to avoid blocking users’access to data during the dump. To achieve this, you must explicitly include the .backup suffix on the volume names involume entry definitions. For instructions, and to learn how to define volume entries that match multiple volumes, seeDefining and Displaying Volume Sets and Volume Entries.

In the example, suppose that 50 volumes match the user volume set criteria, including three called user.pat.backup,user.terry.backup, and user.smith.backup.

4. The Backup System next scans the dump hierarchy for the dump level you have specified on the backup dump commandline. If it is a full level, then in the current operation the Backup System backs up all of the data in all of the volumes inthe list obtained in Step 3.

If the dump level is incremental, the Backup System reads each volume’s dump history in the Backup Database to learnwhich of the parent levels in its pathname was used when the volume was most recently backed up as part of this volumeset. In the usual case, it is the current dump level’s immediate parent level.

An incremental dump of a volume includes only the data that changed since the volume was included in the parent dump.To determine which data are eligible, the Backup System uses the concept of a volume’s clone date. A read/write volume’sclone date is when the Backup System locks the volume before copying its contents into a dump. A backup volume’s clonedate is the completion time of the operation that created it by cloning its read/write source volume (the operation initiatedby a vos backup or vos backupsys command). A read-only volume’s clone date is the time of the release operation(initiated by the vos release command) that completed most recently before the dump operation.

More precisely then, an incremental dump includes only data that have a modification timestamp between the clone dateof the volume included in the parent dump (the parent clone date) and the clone date of the volume to be included in thecurrent dump (the current clone date).

There are some common exceptions to the general rule that a volume’s parent dump is the dump created at the immediateparent level:

• The volume did not exist at all at the time of the last dump. In this case, the Backup System automatically does a fulldump of it.

• The volume did not match the volume set’s name and location criteria at the time of the last dump. In this case, theBackup System automatically does a full dump of it, even if it was backed up recently (fully or incrementally) as part ofanother volume set. This redundancy is an argument for defining volume entries in terms of names rather than locations,particularly if you move volumes frequently.


• The volume was not included in the dump at the immediate parent level for some reason (perhaps a process, machine,or network access prevented the Backup System from accessing it). In this case, the Backup System sets the clone dateto the time of the last dump operation that included the volume. If the volume was not included in a dump performed atany of the levels in the current level’s pathname, the Backup System does a full dump of it.

In the example, the current dump level is /weekly/mon/tues/wed. The user.pat.backup and user.terry.backup volumeswere included in the dump performed yesterday, Tuesday, at the /weekly/mon/tues level. The Backup System uses as theirparent clone date 3:00 a.m. on Tuesday, which is when backup versions of them were created just before Tuesday’s dumpoperation. However, Tuesday’s dump did not include the user.smith.backup volume for some reason. The last time it wasincluded in a dump was Monday, at the /weekly/mon level. The Backup System uses a parent clone date of Monday at2:47 a.m., which is when a backup version of the volume was created just before the dump operation on Monday.

5. If performing an incremental dump, the Backup System works with the Volume Server to prepare a list of all of the files ineach volume that have changed (have modification timestamps) between the parent clone date and the current clone date.The dump includes the complete contents of every such file. If a file has not changed, the dump includes only a placeholderstub for it. The dump also includes a copy of the complete directory structure in the volume, whether or not it has changedsince the previous dump.

If none of the data in the volume has changed since the last dump, the Backup System omits the volume completely. Itgenerates the following message in the Tape Coordinator window and log files:

Volume volume_name (volume_ID) not dumped - has not been modifiedsince last dump.

6. The Tape Coordinator prepares to back up the data. If there is a CFG_device_name file, the Tape Coordinator already readit in Step 1. The following list describes how the instructions in the file guide the Tape Coordinator’s behavior at this point:

FILE If this instruction is set to YES, the Tape Coordinator writes data to a backup data file. The device_name field in thetapeconfig file must also specify a filename for the dump to work properly. For further discussion and instructionson configuring a backup data file, see Dumping Data to a Backup Data File.If it is set to NO or does not appear in the file, the Tape Coordinator writes to a tape device.

MOUNT and UNMOUNT If there is a MOUNT instruction in the file, each time the Tape Coordinator needs a newtape, it invokes the indicated script or program to mount a tape in the device’s tape drive. There must be a MOUNTinstruction if you want to utilize a tape stacker or jukebox’s ability to switch between tapes automatically. If there isno MOUNT instruction, the Tape Coordinator prompts the human operator whenever it needs a tape.The AUTOQUERY instruction, which is described just following, modifies the Tape Coordinator’s tape acquisitionprocedure for the first tape it needs in a dump operation.If there is an UNMOUNT instruction, then the Tape Coordinator invokes the indicated script or program whenever itcloses the tape device. Not all tape devices have a separate tape unmounting routine, in which case the UNMOUNTinstruction is not necessary. For more details on both instructions, see Invoking a Device’s Tape Mounting andUnmounting Routines.

AUTOQUERY If this instruction is set to NO, the Tape Coordinator assumes that the first tape needed for the dumpoperation is already in the tape drive. It does not use its usual tape acquisition procedure as described in the precedingdiscussion of the MOUNT instruction. You can achieve the same effect by including the -noautoquery flag to thebutc command.If this instruction is absent or set to YES, the Tape Coordinator uses its usual tape acquisition procedure even for thefirst tape. For more details, see Eliminating the Search or Prompt for the Initial Tape.

BUFFERSIZE If this instruction appears in the file, the Tape Coordinator sets its buffer size to the specified value ratherthan using the default buffer size of 16 KB. For further discussion, see Setting the Memory Buffer Size to PromoteTape Streaming.

If there is no CFG_device_name file, the Tape Coordinator writes data to a tape device and prompts the human operatoreach time it needs a tape (the only exception being the first tape if you include the -noautoquery flag to the butc command).

7. The Tape Coordinator opens either a tape drive or backup data file at this point, as directed by the instructions in theCFG_device_name file (described in Step 6). The instructions also determine whether it invokes a mount script or promptsthe operator. In Step 1 the Tape Coordinator read in the device’s capacity and filemark size from the tapeconfig file. It


now reads the same values from the tape or backup data file’s magnetic label, and overwrites the tapeconfig values if thereis a difference.

If creating an initial dump (as in the current example) and there is no permanent name on the label, the Tape Coordinatornext checks that the AFS tape name has one of the three acceptable formats. If not, it rejects the tape and you must usethe backup labeltape command to write an acceptable name. You can bypass this name-checking step by including theNAME_CHECK NO instruction in the CFG_device_name file. For discussion and a list of the acceptable AFS tape namevalues, see Eliminating the AFS Tape Name Check.

8. For an initial dump, the Tape Coordinator starts writing at the beginning of the tape or backup dump file, overwriting anyexisting data. To prevent inappropriate overwriting, the Backup System first checks the Backup Database for any dumprecords associated with the name (permanent or AFS tape name) on the tape or backup dump file’s label. It refuses to writeto a backup data file that has unexpired dumps in it, or to a tape that belongs to a dump set with any unexpired dumps. Torecycle a file or tape before all dumps have expired, you must use the backup labeltape command to relabel it. Doing soremoves the Backup Database records of all dumps in the file or on all tapes in the dump set, which makes it impossible torestore data from any of the tapes. For more information on expiration dates, see Defining Expiration Dates.

The Tape Coordinator also checks for two other types of inappropriate tape reuse. The tape cannot already have data on itthat belongs to the dump currently being performed, because that implies that the previous tape is still in the drive, or youhave mistakenly reinserted it. The Tape Coordinator generates the following message and attempts to obtain another tape:

Can’t overwrite tape containing the dump in progress

The tape cannot contain data from a parent dump of the current (incremental) dump, because overwriting a parent dumpmakes it impossible to restore data from the current dump. The Tape Coordinator generates the following message andattempts to obtain another tape:

Can’t overwrite the parent dump parent_name (parent_dump_ID)

9. The Tape Coordinator now writes data to the tape or backup data file. It uses the capacity and filemark size it obtainedin Step 7 as it tracks how much more space is available, automatically using its tape acquisition procedure if the dumpis not finished when it reaches the end of the tape. For a more detailed description, and a discussion of what happens ifthe Tape Coordinator reaches the physical end-of-tape unexpectedly, see Configuring the tapeconfig File. Similarly, forinstructions on configuring a backup data file to optimize recovery from unexpectedly running out of space, see Step 6 inthe instructions in Dumping Data to a Backup Data File.

If the Tape Coordinator cannot access a volume during the dump (perhaps because of a server process, machine, or networkoutage), it skips the volume and continues dumping all volumes that it can access. It generates an error message in the TapeCoordinator window and log file about the omitted volume. It generates a similar message if it discovers that a backupvolume has not been recloned since the previous dump operation (that is, that the volume’s current clone date is the sameas its parent clone date):

Volume volume_name (volume_ID) not dumped - has not been re-clonedsince last dump.

After completing a first pass through all of the volumes, it attempts to dump each omitted volume again. It first checksto see if the reason that the volume was inaccessible during the first pass is that it has been moved since the VL Servergenerated the list of volumes to dump in Step 3. If so, it dumps the volume from its new site. If the second attempt toaccess a volume also fails, the Tape Coordinator it generates the following message, prompting you for instruction on howto proceed:

Dump of volume volume_name (volume_ID) failedPlease select action to be taken for this volume.

r - retry, try dumping this volume againo - omit, this volume from this dumpa - abort, the entire dump

To increase the automation of the dump process, you can include the ASK NO instruction in the CFG_device_name fileto suppress this prompt and have the Tape Coordinator automatically omit the volume from the dump.

If you are tracking the dump as it happens, the prompt enables you to take corrective action. If the volume has not beenrecloned, you can issue the vos backup command. If the volume is inaccessible, you can investigate and attempt to resolvethe cause.


10. If the tape or backup data file does not already have an AFS tape name, the Backup System constructs the appropriateone and records it on the label and in the Backup Database. It also assigns a dump name and ID number to the dump andrecords them in dump record that it creates in the Backup Database. For details on tape and dump names, see Dump Namesand Tape Names. For instructions on displaying dump records or a volume’s dump history, or scanning the contents of atape, see Displaying Backup Dump Records.

7.3.3 Appending Dumps to an Existing Dump Set

The AFS Backup System enables you to append dumps to the end of the final tape in a dump set by including the -append flagto the backup dump command. Appending dumps improves Backup System automation and efficiency in several ways:

• It maximizes use of a tape’s capacity. An initial dump must always start on a new tape, but does not necessarily extend to theend of the final tape in the dump set. You can fill up the unused tape by appending one or more dumps.

• It can reduce the number of tapes and tape changes needed to complete a dump operation. Rather than performing a series ofinitial dumps first, instead begin with an initial dump and follow it immediately with several appended dumps. In this way youcan write all dumps in the series to the same tape (assuming the tape is large enough to accommodate them all). If, in contrast,you perform all of the initial dumps first, each must begin on a new tape and you must switch tapes again if you then want toappend dumps.

You can either issue the appropriate series of backup dump commands at the interactive backup> prompt, or record them ina file that you then name with the -file argument to the backup dump command. Appending dumps in this way enables you torun multiple unattended backup operations even without a tape stacker or jukebox, if all of the dumps fit on one tape.

• It can reduce the number of tape changes during a restore operation. For example, if you append all of the incremental dumpsof a volume set to tapes in one dump set, then restoring a volume from the volume set requires a minimum number of tapechanges. It is best not to append incremental dumps to a tape that contains the parent full dump, however: if the tape is lost ordamaged, you lose all of the data from the volume.

Although it can be efficient to group together appended dumps that are related, the Backup System does not require anyrelationship between the appended dumps on a tape or in a dump set.

When writing an appended dump, the Backup System performs most of the steps described in How Your Configuration ChoicesInfluence the Dump Process. Appended dumps do not have to be related to one another or the initial dump, so it skips Step 7:there is no need to check that the AFS tape name reflects the volume set and dump level names in this case. It also skips Step 8.Because it is not overwriting any existing data on the tape, it does not need to check the expiration dates of existing dumps onthe tape or in the file. Then in Step 9 the Tape Coordinator scans to the end of the last dump on the tape or in the backup data filebefore it begins writing data.

The Backup System imposes the following conditions on appended dumps:

• If writing to tape, the Tape Coordinator checks that it is the final one in a dump set for which there are complete and valid tapeand dump records in the Backup Database. If not, it rejects the tape and requests an acceptable one. If you believe the tape hasvalid data on it, you can reconstruct the Backup Database dump records for it by using the -dbadd argument to the backupscantape command as instructed in To scan the contents of a tape.

• The most recent dump on the tape or in the backup data file must have completed successfully.

• The dump set to which the tape or file belongs must begin with an initial dump that is recorded in the Backup Database. Ifthere are no dumps on the current tape, then the Backup System treats the dump operation as an initial dump and imposes therelevant requirements (for example, checks the AFS tape name if appropriate).

As you append dumps, keep in mind that all of a dump set’s dump and tape records in the Backup Database are indexed to theinitial dump. If you want to delete an appended dump’s record, you must delete the initial dump record, and doing so erases therecords of all dumps in the dump set. Without those records, you cannot restore any of the data in the dump set.

Similarly, all of the dumps in a dump set must expire before you can recycle (write a new initial dump to) any of the tapes in adump set. Do not append a dump if its expiration date is later than the date on which you want to recycle any of the tapes in itsdump set. To recycle a tape before the last expiration date, you must delete the initial dump’s record from the Backup Database.


Either use the backup labeltape command to relabel the tape as instructed in To label a tape, or use the backup deletedumpcommand to delete the record directly as instructed in To delete dump records from the Backup Database.

Although in theory you can append as many dumps as you wish, it generally makes sense to limit the number of tapes in a dumpset (for example, to five), for these reasons:

• If an unreadable spot develops on one of the tapes in a dump set, it can prevent the Tape Coordinator from scanning the tapeas part of a backup scantape operation you use to reconstruct Backup Database records. The Tape Coordinator can almostalways scan the tape successfully up to the point of damage and can usually skip past minor damage. A scanning operationcan start on any tape in a dump set, so damage on one tape does not prevent scanning of the others in the dump set. However,you can scan only the tapes that precede the damaged one in the dump set or the ones that follow the damaged one, but notboth. (For more information on using tapes to reconstruct the information in the Backup Database, see To scan the contents ofa tape.)

An unreadable bad spot can also prevent you from restoring a volume completely, because restore operations must begin withthe full dump and continue with each incremental dump in order. If you cannot restore a specific dump, you cannot restore anydata from later incremental dumps.

• If you decide in the future to archive one or more dumps, then you must archive the entire set of tapes that constitute thedump set, rather than just the ones that contain the data of interest. This wastes both tape and archive storage space. For moreinformation on archiving, see Archiving Tapes.

7.3.4 Scheduling Dumps

By default, the Backup System starts executing a dump operation as soon as you enter the backup dump command, and the TapeCoordinator begins writing data as soon as it is not busy and the list of files to write is available. You can, however, schedule adump operation to begin at a specific later time:

• To schedule a single dump operation, include the -at argument to specify its start time.

• To schedule multiple dump operations, list the operations in a file named by the -file argument and use the -at argument tospecify when the backup command interpreter reads the file. If you omit the -at argument, the command interpreter readsthe file immediately, which does not count as scheduling, but does allow you to initiate multiple dump operations in a singlecommand. Do not combine the -file argument with the -volumeset, -dump, -portoffset, -append, or -n options.

For file-formatting instructions, see the description of the -file argument in Step 7 of To create a dump.

The Backup System performs initial and appended dumps in the same manner whether they are scheduled or begin running assoon as you issue the backup dump command. The only difference is that the requirements for successful execution hold both atthe time you issue the command and when the Backup System actually begins running it. All required Backup Database entriesfor volume sets, dump levels, and port offsets, and all dump and tape records must exist at both times. Perhaps more importantly,the required administrative tokens must be available at both times. See Making Backup Operations More Efficient.

7.3.5 To create a dump





3. If using a tape device, insert the tape.



% backup

5. Decide which volume set and dump level to use. If necessary, issue the backup listvolsets and backup listdumps com-mands to display the existing volume sets and dump levels. For complete instructions and a description of the output, seeTo display volume sets and volume entries and To display the dump hierarchy.

backup> listvolsets [<volume set name>]backup> listdumps

If you want to use a temporary volume set, you must create it during the current interactive session. This can be useful ifyou are dumping a volume to tape in preparation for removing it permanently (perhaps because its owner is leaving thecell). In this case, you can define a volume entry that includes only the volume of interest without cluttering up the BackupDatabase with a volume set record that you are using only once. Complete instructions appear in Defining and DisplayingVolume Sets and Volume Entries.

backup> addvolset <volume set name> -temporarybackup> addvolentry -name <volume set name> \

-server <machine name> \-partition <partition name> \-volumes <volume name (regular expression)>

6. If you are creating an initial dump and writing to a tape or backup data file that does not have a permanent name, its AFStape name must satisfy the Backup System’s format requirements as described in Eliminating the AFS Tape Name Check.If necessary, use the backup readlabel command to display the label and the backup labeltape command to change thenames, as instructed in Writing and Reading Tape Labels. You must also relabel a tape if you want to overwrite it and it ispart of a dump set that includes any unexpired dumps, though this is not recommended. For a discussion of the appropriateway to recycle tapes, see Creating a Tape Recycling Schedule.

7. Issue the backup dump command to dump the volume set.

• To create one initial dump, provide only the volume set name, dump level name, and port offset (if not zero).

• To create one appended dump, add the -append flag.

• To schedule a single initial or appended dump, add the -at argument.

• To initiate multiple dump operations, record the appropriate commands in a file and name it with the -file argument. Donot combine this argument with options other than the -at argument.

backup> dump <volume set name> <dump level name> [<TC port offset>] \[-at <Date/time to start dump>+] \[-append] [-n] [-file <load file>]

where

dump Must be typed in full.

volume set name Names the volume set to dump.

dump level name Specifies the complete pathname of the dump level at which to dump the volume set.

TC port offset Specifies the port offset number of the Tape Coordinator process that is handling the operation. You mustprovide this argument unless the default value of 0 (zero) is appropriate.

-at Specifies the date and time in the future at which to run the command, or to read the file named by the -file argument.Provide a value in the format mm/dd/yyyy [hh:MM], where the month (mm), day (dd), and year (yyyy) are required.Valid values for the year range from 1970 to 2037; higher values are not valid because the latest possible date in thestandard UNIX representation is in February 2038. The Backup System automatically reduces any later date to themaximum value in 2038.The hour and minutes (hh:MM) are optional, but if provided must be in 24-hour format (for example, the value 14:36represents 2:36 p.m.). If you omit them, the time defaults to midnight (00:00 hours).As an example, the value 04/23/1999 20:20 schedules the command for 8:20 p.m. on 23 April 1999.


NoteA plus sign follows this argument in the command’s syntax statement because it accepts a multiword value whichdoes not need to be enclosed in double quotes or other delimiters, not because it accepts multiple dates. Provideonly one date (and optionally, time) definition.

-append Creates an appended dump by scanning to the end of the data from one or more previous dump operations that itfinds on the tape or in the backup data file.

-n Displays the names of all volumes to be included in the indicated dump, without actually writing data to tape or thebackup data file. Combine this flag with the arguments you plan to use on the actual command, but not with the -fileargument.

-file Specifies the local disk or AFS pathname of a file containing backup commands. The Backup System reads the fileimmediately, or at the time specified by the -at argument if it is provided. A partial pathname is interpreted relativeto the current working directory.Place each backup dump command on its own line in the indicated file, using the same syntax as for the commandline, but without the word backup at the start of the line. Each command must include the volume set name and dumplevel name arguments plus the TC port offset argument if the default value of zero is not appropriate. Commands inthe file can also include any of the backup dump command’s optional arguments, including the -at argument (whichmust specify a date and time later than the date and time at which the Backup System reads the file).

8. If you did not include the -noautoquery flag when you issued the butc command, or if the device’s CFG_device_nameconfiguration file includes the instruction AUTOQUERY YES, then the Tape Coordinator prompts you to place the tapein the device’s drive. You have already done so, but you must now press <Return> to indicate that the tape is ready forlabeling.

If more than one tape is required, you must either include the MOUNT instruction in the CFG_device_name file and stockthe corresponding stacker or jukebox with tapes, or remain at the console to respond to the Tape Coordinator’s prompts forsubsequent tapes.

9. After the dump operation completes, review the Backup System’s log files to check for errors. Use the bos getlog commandas instructed in Displaying Server Process Log Files to read the /usr/afs/logs/BackupLog file, and a text editor on the TapeCoordinator machine to read the TE_device_name and TL_device_name files in the local /usr/afs/backup directory.

It is also a good idea to record the tape name and dump ID number on the exterior label of each tape.

7.4 Displaying Backup Dump Records

The backup command suite includes three commands for displaying information about data you have backed up:

• To display information about one or more dump operations, such as the date it was performed and the number of volumesincluded, use the backup dumpinfo command as described in To display dump records. You can display a detailed record ofa single dump or more condensed records for a certain number of dumps, starting with the most recent and going back in time.You can specify the number of dumps or accept the default of 10.

• To display a volume’s dump history, use the backup volinfo command as described in To display a volume’s dump history.

• To display information extracted from a tape or backup data file about the volumes it includes, use the backup scantapecommand. To create new dump and tape records in the Backup Database derived from the tape and dump labels, add the-dbadd flag. For instructions, see To scan the contents of a tape.

7.4.1 To display dump records




2. Issue the backup dumpinfo command to list information about dumps recorded in the Backup Database.

% backup dumpinfo [-ndumps <no. of dumps>] [-id <dump id>] [-verbose]

where

dump Is the shortest acceptable abbreviation of dumpinfo.

-ndumps Displays the Backup Database record for each of the specified number of dumps, starting with the most recentand going back in time. If the database contains fewer dumps than are requested, the output includes the records forall existing dumps. Do not combine this argument with the -id argument or -verbose flag; omit all three options todisplay the records for the last 10 dumps.

-id Specifies the dump ID number of a single dump for which to display the Backup Database record. You must includethe -id switch. Do not combine this option with the -ndumps or -verbose arguments; omit all three arguments todisplay the records for the last 10 dumps.

-verbose Provides more detailed information about the dump specified with the -id argument, which must be providedalong with it. Do not combine this flag with the -ndumps option.

If the -ndumps argument is provided, the output presents the following information in table form, with a separate line for eachdump:

dumpid The dump ID number.

parentid The dump ID number of the dump’s parent dump. A value of 0 (zero) identifies a full dump.

lv The depth in the dump hierarchy of the dump level used to create the dump. A value of 0 (zero) identifies a full dump, inwhich case the value in the parentid field is also 0. A value of 1 or greater indicates an incremental dump made at thecorresponding level in the dump hierarchy.

created The date and time at which the Backup System started the dump operation that created the dump.

nt The number of tapes that contain the data in the dump. A value of 0 (zero) indicates that the dump operation was terminatedor failed. Use the backup deletedump command to remove such entries.

nvols The number of volumes from which the dump includes data. If a volume spans tapes, it is counted twice. A value of 0(zero) indicates that the dump operation was terminated or failed; the value in the nt field is also 0 (zero) in this case.

dump name The dump name in the form

volume_set_name.dump_level_name (initial_dump_ID)

where volume_set_name is the name of the volume set, and dump_level_name is the last element in the dump levelpathname at which the volume set was dumped.

The initial_dump_ID, if displayed, is the dump ID of the initial dump in the dump set to which this dump belongs. If thereis no value in parentheses, the dump is the initial dump in a dump set that has no appended dumps.

If the -id argument is provided alone, the first line of output begins with the string Dump and reports information for the entiredump in the following fields:

id The dump ID number.

level The depth in the dump hierarchy of the dump level used to create the dump. A value of 0 (zero) identifies a full dump.A value of 1 (one) or greater indicates an incremental dump made at the specified level in the dump hierarchy.

volumes The number of volumes for which the dump includes data.

created The date and time at which the dump operation began.

If an XBSA server was the backup medium for the dump (rather than a tape device or backup data file), the following line appearsnext:


Backup Service: XBSA_program: Server: hostname

where XBSA_program is the name of the XBSA-compliant program and hostname is the name of the machine on which theprogram runs.

Next the output includes an entry for each tape that houses volume data from the dump. Following the string Tape, the first twolines of each entry report information about that tape in the following fields:

name The tape’s permanent name if it has one, or its AFS tape name otherwise, and its tape ID number in parentheses.

nVolumes The number of volumes for which this tape includes dump data.

created The date and time at which the Tape Coordinator began writing data to this tape.

Following another blank line, the tape-specific information concludes with a table that includes a line for each volume dump onthe tape. The information appears in columns with the following headings:

Pos The relative position of each volume in this tape or file. On a tape, the counter begins at position 2 (the tape label occupiesposition 1), and increments by one for each volume. For volumes in a backup data file, the position numbers start with 1and do not usually increment only by one, because each is the ordinal of the 16 KB offset in the file at which the volume’sdata begins. The difference between the position numbers therefore indicates how many 16 KB blocks each volume’s dataoccupies. For example, if the second volume is at position 5 and the third volume in the list is at position 9, that means thatthe dump of the second volume occupies 64 KB (four 16-KB blocks) of space in the file.

Clone time For a backup or read-only volume, the time at which it was cloned from its read/write source. For a Read/Writevolume, it is the same as the dump creation date reported on the first line of the output.

Nbytes The number of bytes of data in the dump of the volume.

Volume The volume name, complete with .backup or .readonly extension if appropriate.

If both the -id and -verbose options are provided, the output is divided into several sections:

• The first section, headed by the underlined string Dump, includes information about the entire dump. The fields labeled id,level, created, and nVolumes report the same values (though in a different order) as appear on the first line of outputwhen the -id argument is provided by itself. Other fields of potential interest to the backup operator are:

Group id The dump’s group ID number, which is recorded in the dump’s Backup Database record if the GROUPID in-struction appears in the Tape Coordinator’s /usr/afs/backup/CFG_tcid file when the dump is created.

maxTapes The number of tapes that contain the dump set to which this dump belongs.

Start Tape Seq The ordinal of the tape on which this dump begins in the set of tapes that contain the dump set.

• For each tape that contains data from this dump, there follows a section headed by the underlined string Tape. The fieldslabeled name, written, and nVolumes report the same values (though in a different order) as appear on the second andthird lines of output when the -id argument is provided by itself. Other fields of potential interest to the backup operator are:

expires The date and time when this tape can be recycled, because all dumps it contains have expired.

nMBytes Data and nBytes Data Summed together, these fields represent the total amount of dumped data actuallyfrom volumes (as opposed to labels, filemarks, and other markers).

KBytes Tape Used The number of kilobytes of tape (or disk space, for a backup data file) used to store the dump data. Itis generally larger than the sum of the values in the nMBytes Data and nBytes Data fields, because it includes thespace required for the label, file marks and other markers, and because the Backup System writes data at 16 KB offsets,even if the data in a given block doesn’t fill the entire 16 KB.

• For each volume on a given tape, there follows a section headed by the underlined string Volume. The fields labeled name,position, clone, and nBytes report the same values (though in a different order) as appear in the table that lists thevolumes in each tape when the -id argument is provided by itself. Other fields of potential interest to the backup operator are:


id The volume ID.tape The name of the tape containing this volume data.

The following example command displays the Backup Database records for the five most recent dump operations.

% backup dump 5dumpid parentid lv created nt nvols dump name

924424000 0 0 04/18/1999 04:26 1 22 usr.sun (924424000)924685000 924424000 1 04/21/1999 04:56 1 62 usr.wed (924424000)924773000 924424000 1 04/22/1999 05:23 1 46 usr.thu (924424000)924860000 924424000 1 04/23/1999 05:33 1 58 usr.fri (924424000)925033000 0 0 04/25/1999 05:36 2 73 sys.week

7.4.2 To display a volume’s dump history



2. Issue the backup volinfo command to display a volume’s dump history.

% backup volinfo <volume name>

where

voli Is the shortest acceptable abbreviation of volinfo.volume name Names the volume for which to display the dump history. If you dumped the backup or read-only version

of the volume, include the .backup or .readonly extension.

The output includes a line for each Backup Database dump record that mentions the specified volume, order from most to leastrecent. The output for each record appears in a table with six columns:

dumpID The dump ID of the dump that includes the volume.

lvl The depth in the dump hierarchy of the dump level at which the volume was dumped. A value of 0 indicates a full dump.A value of 1 or greater indicates an incremental dump made at the specified depth in the dump hierarchy.

parentid The dump ID of the dump’s parent dump. A value of 0 indicates a full dump, which has no parent; in this case, thevalue in the lvl column is also 0.

creation date The date and time at which the Backup System started the dump operation that created the dump.

clone date For a backup or read-only volume, the time at which it was cloned from its read/write source. For a read/writevolume, the same as the value in the creation date field.

tape name The name of the tape containing the dump: either the permanent tape name, or an AFS tape name in the formatvolume_set_name.dump_level_name.tape_index where volume_set_name is the name of the volume set associated withthe initial dump in the dump set of which this tape is a part; dump_level_name is the name of the dump level at which theinitial dump was backed up; tape_index is the ordinal of the tape in the dump set. Either type of name can be followed bya dump ID in parentheses; if it appears, it is the dump ID of the initial dump in the dump set to which this appended dumpbelongs.

The following example shows part of the dump history of the backup volume user.smith.backup:

% backup volinfo user.smith.backupDumpID lvl parentID creation date clone date tape name924600000 1 924427600 04/20/1999 05:20 04/20/1999 05:01 user_incr_2 (924514392)924514392 1 924427600 04/19/1999 05:33 04/19/1999 05:08 user_incr_2924427600 0 0 04/18/1999 05:26 04/18/1999 04:58 user_full_6

. . . . . . . .

. . . . . . . .


7.4.3 To scan the contents of a tape

NoteThe ability to scan a tape that is corrupted or damaged depends on the extent of the damage and what type of data is corrupted.The Backup System can almost always scan the tape successfully up to the point of damage. If the damage is minor, the BackupSystem can usually skip over it and scan the rest of the tape, but more major damage can prevent further scanning. A scanningoperation does not have to begin with the first tape in a dump set, but the Backup System can process tapes only in sequentialorder after the initial tape provided. Therefore, damage on one tape does not prevent scanning of the others in the dump set,but it is possible to scan either the tapes that precede the damaged one or the ones that follow it, not both.

If you use the -dbadd flag to scan information into the Backup Database and the first tape you provide is not the first tape in thedump set, the following restrictions apply:

• If the first data on the tape is a continuation of a volume that begins on the previous (unscanned) tape in the dump set, theBackup System does not add a record for that volume to the Backup Database.

• The Backup System must read the marker that indicates the start of an appended dump to add database records for the volumesin it. If the first volume on the tape belongs to an appended dump, but is not immediately preceded by the appended-dumpmarker, the Backup System does not create a Backup Database record for it or any subsequent volumes that belong to thatappended dump.





3. If scanning a tape, place it in the drive.

4. (Optional) Issue the backup command to enter interactive mode.

% backup

5. Issue the backup scantape command to read the contents of the tape.

backup> scantape [-dbadd] [-portoffset <TC port offset>]

where

sc Is the shortest acceptable abbreviation of scantape.

-dbadd Constructs dump and tape records from the tape and dump labels in the dump and writes them into the BackupDatabase.

TC port offset Specifies the port offset number of the Tape Coordinator process that is handling the operation. You mustprovide this argument unless the default value of 0 (zero) is appropriate.

6. If you did not include the -noautoquery flag when you issued the butc command, or the device’s CFG_device_nameconfiguration file includes the instruction AUTOQUERY YES instruction, then the Tape Coordinator prompts you toplace the tape in the device’s drive. You have already done so, but you must now press <Return> to indicate that the tapeis ready for reading.


To terminate a tape scanning operation, use a termination signal such as <Ctrl-c>, or issue the (backup) kill command ininteractive mode. It is best not to interrupt the scan if you included the -dbadd argument. If the Backup System has alreadywritten new records into the Backup Database, then you must remove them before rerunning the scanning operation. If duringthe repeated scan operation the Backup System finds that a record it needs to create already exists, it halts the operation.

For each dump on the tape, the output in the Tape Coordinator window displays the dump label followed by an entry for eachvolume. There is no output in the command window. The dump label has the same fields as the tape label displayed by thebackup readlabel command, as described in Writing and Reading Tape Labels. Or see the OpenAFS Administration Referencefor a detailed description of the fields in the output.

The following example shows the dump label and first volume entry on the tape in the device that has port offset 2:

% backup scantape 2-- Dump label --tape name = monthly_guestAFS tape name = guests.monthly.3creationTime = Mon Feb 1 04:06:40 1999cell = example.comsize = 2150000 Kbytesdump path = /monthlydump id = 917860000useCount = 44-- End of dump label ---- volume --volume name: user.guest10.backupvolume ID 1937573829dumpSetName: guests.monthlydumpID 917860000level 0parentID 0endTime 0clonedate Mon Feb 1 03:03:23 1999

7.5 Restoring and Recovering Data

The purpose of making backups is to enable you to recover when data becomes corrupted or is removed accidentally, returningthe data to a coherent past state. The AFS Backup System provides three commands that restore varying numbers of volumes:

• To restore one or more volumes to a single site (partition on an AFS file server machine), use the backup volrestore command.

• To restore one or more volumes that are defined as a volume set, each to a specified site, use the backup volsetrestorecommand.

• To restore an entire partition (that is, all of the volumes that the VLDB lists as resident on it), use the backup diskrestorecommand.

The commands are suited to different purposes because they vary in the combinations of features they offer and in the require-ments they impose. To decide which is appropriate for a specific restore operation, see the subsequent sections of this intro-duction: Using the backup volrestore Command, Using the backup diskrestore Command, and Using the backup volsetrestoreCommand.

7.5.1 Making Restore Operations More Efficient

The following comments apply to all types of restore operation:

• The Backup System begins by restoring the most recent full dump of a volume. As it restores subsequent incremental dumps,it alters the data in the full dump appropriately, essentially repeating the volume’s change history. The backup diskrestore


and backup volsetrestore commands always restore all incremental dumps, bringing a volume to its state at the time of themost recent incremental dump. You can use the backup volrestore command to return a volume to its state at a specified timein the past, by not restoring the data from incremental dumps performed after that time.

• The Backup System sets a restored volume’s creation date to the date and time of the restore operation. The creation dateappears in the Creation field of the output from the vos examine and vos listvol commands.

• When identifying the volumes to restore, it is best to specify the base (read/write) name. In this case, the Backup Systemsearches the Backup Database for the most recent dump set that includes data from either the read/write or backup version ofthe volume, and restores dumps of that volume starting with the most recent full dump. If you include the .backup or .readonlyextension on the volume name, the Backup System restores dumps of that version only. If it cannot find data dumped from thatversion, it does not perform the restoration even if another version was dumped.

• All three restoration commands accept the -n option, which generates a list of the volumes to be restored and the tapes orbackup data files that contain the necessary dumps, without actually restoring data to AFS server partitions. This enables youto gather together the tapes before beginning the restore operation, even preloading them into a stacker or jukebox if you areusing one.

• If you back up AFS data to tape, restoration is simplest if all of your tape devices are compatible, meaning that they can readthe same type of tape, at the same compression ratios, and so on. (This suggestion also appears in Making Backup OperationsMore Efficient, because by the time you need to restore data it is too late to implement it.) You can still restore multiplevolumes with a single command even if data was backed up using incompatible devices, because the -portoffset argumentto all three restoration commands accepts multiple values. However, the Backup System uses the first port offset listed whenrestoring the full dump of each volume, the next port offset when restoring the level 1 incremental dump of each volume, andso on. If you did not use a compatible tape device when creating the full dump of every volume (and at each incremental leveltoo), you cannot restore multiple volumes with a single command. You must use the backup volrestore command to restoreone volume at a time, or use the backup volsetrestore command after defining volume sets that group volumes according tothe tape device used to dump them.

• During a restore operation, the Backup System uses instructions in the relevant CFG_device_name configuration file in muchthe same way as during a dump operation, as described in How Your Configuration Choices Influence the Dump Process. Ituses the MOUNT, UNMOUNT, AUTOQUERY, BUFFERSIZE, and FILE instructions just as for a dump operation. Adifference for the BUFFERSIZE instruction is that the default buffer size overridden by the instruction is 32 KB for restoreoperations rather than the 16 KB used for dump operations. The Backup System does not use the NAME_CHECK instructionat all during restore operations. The ASK instruction controls whether the Backup System prompts you if it cannot restorea volume for any reason. If the setting is NO, it skips the problematic volume and restores as many of the other volumes aspossible.

• Do not perform a restore operation when you know that there are network, machine, or server process problems that can preventthe Backup System from accessing volumes or the VLDB. Although the Backup System automatically makes a number ofrepeated attempts to restore a volume, the restore operation takes extra time and in some cases stops completely to prompt youfor instructions on how to continue.

• Avoid halting a restore operation (for instance by issuing the (backup) kill command in interactive mode). If a restore operationis interrupted for any reason, including causes outside your control, reissue the same restoration command as soon as ispractical; if an outage or other problem caused the operation to halt, do not continue until the system returns to normal.

Any volume that is completely restored when the operation halts is online and usable, but very few volumes are likely to bein this state. When restoring multiple volumes at once, the Backup System restores the full dump of every volume beforebeginning the level 1 incremental restore for any of them, and so on, completing the restore of every volume at a specificincremental level before beginning to restore data from the next incremental level. Unless a volume was dumped at fewerincremental levels than others being restored as part of the same operation, it is unlikely to be complete.

It is even more dangerous to interrupt a restore operation if you are overwriting the current contents of the volume. Dependingon how far the restore operation has progressed, it is possible that the volume is in such an inconsistent state that the BackupSystem removes it entirely. The data being restored is still available on tape or in the backup data file, but you must take extrasteps to re-create the volume.


7.5.2 Using the backup volrestore Command

The backup volrestore command is most appropriate when you need to restore a few volumes to a single site (partition on afile server machine). By default, it restores the volumes to their state at the time of the most recent dump operation (this istermed a full restore). You can also use the command to perform a date-specific restore, which restores only the dumps (full andincremental) performed before a specified date and time, leaving the volume in the state it was in at the time of the final relevantincremental dump. The backup diskrestore and backup volsetrestore commands can only perform full restores.

You can restore data into a new copy of each volume rather than overwriting the current version, by including the -extensionargument. After mounting the new volume in the filespace, you can compare the contents of the two and decide which to keeppermanently.

The following list summarizes how to combine the backup volrestore command’s arguments to restore a volume in differentways:

• To perform a date-specific restore as described just previously, use the -date argument to specify the date and optionally time.The Backup System restores the most recent full dump and each subsequent incremental dump for which the clone date of thevolume included in the dump is before the indicated date and time (for a definition of the clone date, see Step 4 in How YourConfiguration Choices Influence the Dump Process). You can combine this argument with the -extension argument to placethe date-specific restore in a new volume.

• To move a volume to a new site as you overwrite its contents with the restored data, use the -server and -partition arguments,singly or in combination, to specify the new site rather than the current site. The Backup System creates a new volume at thatsite, removes the existing volume, and updates the site information in the volume’s VLDB entry. The volume’s backup versionis not removed automatically from the original site, if it exists. Use the vos remove command to remove it and the vos backupcommand to create a backup version at the new site.

• To create a new volume to house the restored data, rather than overwriting an existing volume, use the -extension argument.The Backup System creates the new volume on the server and partition named by the -server and -partition arguments, derivesits name by adding the extension to the name specified with the -volume argument, and creates a new VLDB entry for it. Thecommand does not affect the existing volume in any way. However, if a volume with the specified extension also already exists,the command overwrites it. To make the contents of the new volume accessible, use the fs mkmount command to mount it.You can then compare its contents to those of the existing volume, to see which to retain permanently.

• To restore a volume that no longer exists on an AFS server partition, but for which you have backed up data, specify the nameof the new volume with the -volume argument and use the -server and -partition arguments to place it at the desired site. TheBackup System creates a new volume and new VLDB entry.

7.5.3 To restore volumes with the backup volrestore command





Repeat the command for each Tape Coordinator if you are using more than one tape device.



% backup


5. Issue the backup volrestore command with the desired arguments.

backup> volrestore <destination machine> <destination partition> \-volume <volume(s) to restore>+ \[-extension <new volume name extension>] \[-date <date from which to restore>] \[-portoffset <TC port offsets>+] [-n]

where

volr Is the shortest acceptable abbreviation of volrestore.

destination machine Names the file server machine on which to restore each volume. It does not have to be a volume’scurrent site.

destination partition Names the partition on which to restore each volume. It does not have to be a volume’s current site.

-volume Names each volume to restore. It is best to provide the base (read/write) name, for the reasons discussed inMaking Restore Operations More Efficient.

-extension Creates a new volume to house the restored data, with a name derived by appending the specified string toeach volume named by the -volume extension. The Backup System preserves the contents of the existing volume ifit still exists. Do not use either of the .readonly or .backup extensions, which are reserved. The combination of basevolume name and extension cannot exceed 22 characters in length. If you want a period to separate the extensionfrom the name, specify it as the first character of the string (as in .rst, for example).

-date Specifies a date and optionally time; the restored volume includes data from dumps performed before the dateonly. Provide a value in the format mm/dd/yyyy [hh:MM], where the required mm/dd/yyyy portion indicates themonth (mm), day (dd), and year (yyyy), and the optional hh:MM portion indicates the hour and minutes in 24-hourformat (for example, the value 14:36 represents 2:36 p.m.). If omitted, the time defaults to 59 seconds after midnight(00:00:59 hours).Valid values for the year range from 1970 to 2037; higher values are not valid because the latest possible date in thestandard UNIX representation is in February 2038. The command interpreter automatically reduces any later date tothe maximum value.


-portoffset Specifies one or more port offset numbers, each corresponding to a Tape Coordinator to use in the operation.If there is more than one value, the Backup System uses the first one when restoring the full dump of each volume,the second one when restoring the level 1 incremental dump of each volume, and so on. It uses the final value in thelist when restoring dumps at the corresponding depth in the dump hierarchy and all dumps at lower levels.Provide this argument unless the default value of 0 (zero) is appropriate for all dumps. If 0 is just one of the values inthe list, provide it explicitly in the appropriate order.

-n Displays the list of tapes that contain the dumps required by the restore operation, without actually performing theoperation.

6. If you did not include the -noautoquery flag when you issued the butc command, or the device’s CFG_device_nameconfiguration file includes the instruction AUTOQUERY YES, then the Tape Coordinator prompts you to place the tapein the device’s drive. You have already done so, but you must now press <Return> to indicate that the tape is ready forlabeling.


7. After the restore operation completes, review the Backup System’s log files to check for errors. Use the bos getlogcommand as instructed in Displaying Server Process Log Files to read the /usr/afs/logs/BackupLog file, and a text editoron the Tape Coordinator machine to read the TE_device_name and TL_device_name files in the local /usr/afs/backupdirectory.


7.5.4 Using the backup diskrestore Command

The backup diskrestore command is most appropriate when you need to restore all of the volumes on an AFS server partition,perhaps because a hardware failure has corrupted or destroyed all of the data. The command performs a full restore of all of theread/write volumes for which the VLDB lists the specified partition as the current site, using the dumps of either the read/write orbackup version of each volume depending on which type was dumped more recently. (You can restore any backup or read-onlyvolumes that resided on the partition by using the vos backup and vos release commands after the backup diskrestore operationis complete.)

By default, the Backup System restores the volumes to the site they previously occupied. To move the partition contents to adifferent site, use the -newserver and -newpartition arguments, singly or in combination.

By default, the Backup System overwrites the contents of existing volumes with the restored data. To create a new volume tohouse the restored data instead, use the -extension argument. The Backup System creates the new volume at the site designatedby the -newserver and -newpartition arguments if they are used or the -server and -partition arguments otherwise. It derivesthe volume name by adding the extension to the read/write base name listed in the VLDB, and creates a new VLDB entry. Thecommand does not affect the existing volume in any way. However, if a volume with the specified extension also already exists,the command overwrites it.

If a partition seems damaged, be sure not to run the vos syncserv command before the backup diskrestore command. As noted,the Backup System restores volumes according to VLDB site definitions. The vos syncserv command sometimes removes avolume’s VLDB entry when the corruption on the partition is so severe that the Volume Server cannot confirm the volume’spresence.

7.5.5 To restore a partition with the backup diskrestore command








% backup

5. Issue the backup diskrestore command with the desired arguments.

backup> diskrestore <machine to restore> <partition to restore> \[-portoffset <TC port offset>+] \[-newserver <destination machine>] \[-newpartition <destination partition>] \[-extension <new volume name extension>] [-n]

where

di Is the shortest acceptable abbreviation of diskrestore.

machine to restore Names the file server machine that the VLDB lists as the site of the volumes that need to be restored.

partition to restore Names the partition that the VLDB lists as the site of the volumes that need to be restored.



-newserver Names an alternate file server machine to which to restore the volumes. If you omit this argument, the volumesare restored to the file server machine named by the -server argument.

-newpartition Names an alternate partition to which to restore the data. If you omit this argument, the volumes arerestored to the partition named by the -partition argument.

-extension Creates a new volume for each volume being restored, to house the restored data, appending the specifiedstring to the volume’s read/write base name as listed in the VLDB. Any string other than .readonly or .backup isacceptable, but the combination of the base name and extension cannot exceed 22 characters in length. To use aperiod to separate the extension from the name, specify it as the first character of the string (as in .rst, for example).

-n Displays a list of the tapes necessary to perform the requested restore, without actually performing the operation.




7.5.6 Using the backup volsetrestore Command

The backup volsetrestore command is most appropriate when you need to perform a full restore of several read/write volumes,placing each at a specified site. You specify the volumes to restore either by naming a volume set with the -name argument orby listing each volume’s name and restoration site in a file named by the -file argument, as described in the following sections.

Because the backup volsetrestore command enables you to restore a large number of volumes with a single command, therestore operation can potentially take hours to complete. One way to reduce the time is to run multiple instances of the commandsimultaneously. Either use the -name argument to specify disjoint volume sets for each command, or the -file argument to namefiles that list different volumes. You must have several Tape Coordinators available to read the required tapes. Depending onhow the volumes to be restored were dumped to tape, specifying disjoint volume sets can also reduce the number of tape changesrequired.

7.5.6.1 Restoring a Volume Set with the -name Argument

Use the -name argument to restore a group of volumes defined in a volume set. The Backup System creates a list of the volumesin the VLDB that match the server, partition, and volume name criteria defined in the volume set’s volume entries, and for whichdumps are available. The volumes do not have to exist on the server partition as long as the VLDB still lists them (this can happenwhen, for instance, a hardware problem destroys the contents of an entire disk).

By default, the Backup System restores, as a read/write volume, each volume that matches the volume set criteria to the sitelisted in the VLDB. If a volume of the matching name exists at that site, its current contents are overwritten. You can insteadcreate a new volume to house the restored data by including the -extension argument. The Backup System creates the newvolume at the existing volume’s site, derives its name by adding the extension to the existing volume’s read/write base name, andcreates a new VLDB entry for it. The command does not affect the existing volume in any way. However, if a volume with thespecified extension also already exists, the command overwrites it. To make the contents of the new volume accessible, use the


fs mkmount command to mount it. You can then compare its contents to those of the existing volume, to see which to retainpermanently.

It is not required that the volume set was previously used to back up volumes (was used as the -volumeset option to the backupdump command). It can be defined especially to match the volumes that need to be restored with this command, and that isusually the better choice. Indeed, a temporary volume set, created by including the -temporary flag to the backup addvolsetcommand, can be especially useful in this context (instructions appear in Defining and Displaying Volume Sets and VolumeEntries). A temporary volume set is not added to the Backup Database and exists only during the current interactive backupsession, which is suitable if the volume set is needed only to complete the single restore operation initialized by this command.

The reason that a specially defined volume set is probably better is that volume sets previously defined for use in dump operationsusually match the backup version of volumes, whereas for a restore operation it is best to define volume entries that match thebase (read/write) name. In this case, the Backup System searches the Backup Database for the newest dump set that includes adump of either the read/write or the backup version of the volume. If, in contrast, a volume entry explicitly matches the volume’sbackup or read-only version, the Backup System uses dumps of that volume version only, restoring them to a read/write volumeby stripping off the .backup or .readonly extension.

If there are VLDB entries that match the volume set criteria, but for which there are no dumps recorded in the Backup Database,the Backup System cannot restore them. It generates an error message on the standard error stream for each one.

7.5.6.2 Restoring Volumes Listed in a File with the -file Argument

Use the -file argument to specify the name and site of each read/write volume to restore. Each volume’s entry must appear on itsown (unbroken) line in the file, and comply with the following format:

machine partition volume [comments...]

where

machine Names the file server machine to which to restore the volume. You can move the volume as you restore it by naming amachine other than the current site.

partition Names the partition to which to restore the volume. You can move the volume as you restore it by naming a partitionother than the current site.

volume Names the volume to restore. Specify the base (read/write) name to have the Backup System search the Backup Databasefor the newest dump set that includes a dump of either the read/write or the backup version of the volume. It restores thedumps of that version of the volume, starting with the most recent full dump. If, in contrast, you include the .backupor .readonly extension, the Backup System restores dumps of that volume version only, but into a read/write volumewithout the extension. The base name must match the name used in Backup Database dump records rather than in theVLDB, if they differ, because the Backup System does not consult the VLDB when you use the -file argument.

comments... Is any other text. The Backup System ignores any text on each line that appears after the volume name, so you canuse this field for helpful notes.

Do not use wildcards (for example, .*) in the machine, partition, or volume fields. It is acceptable for multiple lines in the file toname the same volume, but the Backup System processes only the first of them.

By default, the Backup System replaces the existing version of each volume with the restored data, placing the volume at the sitespecified in the machine and partition fields. You can instead create a new volume to house the restored contents by includingthe -extension argument. The Backup System creates a new volume at the site named in the machine and partition fields, derivesits name by adding the specified extension to the read/write version of the name in the volume field, and creates a new VLDBentry for it. The command does not affect the existing volume in any way. However, if a volume with the specified extension alsoalready exists, the command overwrites it. To make the contents of the new volume accessible, use the fs mkmount commandto mount it. You can then compare its contents to those of the existing volume, to see which to retain permanently.

If the file includes entries for volumes that have no dumps recorded in the Backup Database, the Backup System cannot restorethem. It generates an error message on the standard error stream for each one.

One way to generate a file to use as input to the -file argument is to issue the command with the -name and -n options and directthe output to a file. The output includes a line like the following for each volume (shown here on two lines only for legibilityreasons); the value comes from the source indicated in the following list:


machine partition volume_dumped # as volume_restored; \tape_name (tape_ID); pos position_number; date

where

machine Names the file server machine that currently houses the volume, as listed in the VLDB.

partition Names the partition that currently houses the volume, as listed in the VLDB.

volume_dumped Specifies the version (read/write or backup) of the volume that was dumped, as listed in the Backup Database.

volume_restored Specifies the name under which the Backup System restores the volume when the -n flag is not included. Ifyou include the -extension argument with the -name and -n options, then the extension appears on the name in this field(as in user.pat.rst, for example).

tape_name Names the tape containing the dump of the volume, from the Backup Database. If the tape has a permanent name,it appears here; otherwise, it is the AFS tape name.

tape_ID The tape ID of the tape containing the dump of the volume, from the Backup Database.

position_number Specifies the dump’s position on the tape (for example, 31 indicates that 30 volume dumps precede the currentone on the tape). If the dump was written to a backup data file, this number is the ordinal of the 16 KB-offset at which thevolume’s data begins.

date The date and time when the volume was dumped.

To make the entries suitable for use with the -file argument, edit them as indicated:

• The Backup System uses only the first three fields on each line of the input file, and so ignores all the fields after the numbersign (#). You can remove them if it makes it easier for you to read the file, but that is not necessary.

• The volume_dumped (third) field of each line in the output file becomes the volume field in the input file. The Backup Systemrestores data to read/write volumes only, so remove the .backup or .readonly extension if it appears on the name in thevolume_dumped field.

• The output file includes a line for every dump operation in which a specific volume was included (the full dump and anyincremental dumps), but the Backup System only processes the first line in the input file that mentions a specific volume. Youcan remove the repeated lines if it makes the file easier for you to read.

• The machine and partition fields on an output line designate the volume’s current site. To move the volume to another locationas you restore it, change the values.

7.5.7 To restore a group of volumes with the backup volsetrestore command









% backup

5. (Optional) If appropriate, issue the (backup) addvolset command to create a new volume set expressly for this restoreoperation. Include the -temporary flag if you do not need to add the volume set to the Backup Database. Then issue oneor more (backup) addvolentry commands to create volume entries that include only the volumes to be restored. Completeinstructions appear in Defining and Displaying Volume Sets and Volume Entries.

backup> addvolset <volume set name> [-temporary]backup> addvolentry -name <volume set name> \

-server <machine name> \-partition <partition name> \-volumes <volume name (regular expression)>

6. Issue the backup volsetrestore command with the desired arguments.

backup> volsetrestore [-name <volume set name>] \[-file <file name>] \[-portoffset <TC port offset>+] \[-extension <new volume name extension>] [-n]

where

-name Names a volume set to restore. The Backup System restores all of the volumes listed in the VLDB that matchthe volume set’s volume entries, as described in Restoring a Volume Set with the -name Argument. Provide thisargument or the -file argument, but not both.

-file Specifies the full pathname of a file that lists one or more volumes and the site (file server machine and partition) towhich to restore each. The input file has the format described in Restoring Volumes Listed in a File with the -fileArgument. Use either this argument or the -name argument, but not both.


-extension Creates a new volume for each volume being restored, to house the restored data, appending the specifiedstring to the volume’s read/write base name as listed in the VLDB. Any string other than .readonly or .backup isacceptable, but the combination of the base name and extension cannot exceed 22 characters in length. To use aperiod to separate the extension from the name, specify it as the first character of the string (as in .rst, for example).

-n Displays a list of the volumes to be restored when the flag is not included, without actually restoring them. The Outputsection of this reference page details the format of the output. When combined with the -name argument, its outputis easily edited for use as input to the -file argument on a subsequent backup volsetrestore command.





7.6 Maintaining the Backup Database

The Backup Database stores all of the configuration and tracking information that the Backup System uses when dumping andrestoring data. If a hardware failure or other problem on a database server machine corrupts or damages the database, it isrelatively easy to recreate the configuration information (the dump hierarchy and lists of volume sets and Tape Coordinator portoffset numbers). However, restoring the dump tracking information (dump records) is more complicated and time-consuming.To protect yourself against loss of data, back up the Backup Database itself to tape on a regular schedule.

Another potential concern is that the Backup Database can grow large rather quickly, because the Backup System keeps verydetailed and cross-referenced records of dump operations. Backup operations become less efficient if the Backup Server has tonavigate through a large number of obsolete records to find the data it needs. To keep the database to a manageable size, usethe backup deletedump command to delete obsolete records, as described in Removing Obsolete Records from the BackupDatabase. If you later find that you have removed records that you still need, you can use the backup scantape command to readthe information from the dump and tape labels on the corresponding tapes back into the database, as instructed in To scan thecontents of a tape.

7.6.1 Backing Up and Restoring the Backup Database

Because of the importance of the information in the Backup Database, it is best to back it up to tape or other permanent mediaon a regular basis. As for the other AFS, administrative databases, the recommended method is to use a utility designed to backup a machine’s local disk, such as the UNIX tar command. For instructions, see Backing Up and Restoring the AdministrativeDatabases.

In the rare event that the Backup Database seems damaged or corrupted, you can use the backup dbverify command to check itsstatus. If it is corrupted, use the backup savedb command to repair some types of damage. Then use the backup restoredb toreturn the corrected database to the local disks of the database server machines. For instructions, see Checking for and RepairingCorruption in the Backup Database.

7.6.2 Checking for and Repairing Corruption in the Backup Database

In rare cases, the Backup Database can become damaged or corrupted, perhaps because of disk or other hardware errors. Usethe backup dbverify command to check the integrity of the database. If it is corrupted, the most efficient way to repair it is touse the backup savedb command to copy the database to tape. The command automatically repairs several types of corruption,and you can then use the backup restoredb command to transfer the repaired copy of the database back to the local disks of thedatabase server machines.

The backup savedb command also removes orphan blocks, which are ranges of memory that the Backup Server preallocated inthe database but cannot use. Orphan blocks do not interfere with database access, but do waste disk space. The backup dbverifycommand reports the existence of orphan blocks if you include the -detail flag.

7.6.3 To verify the integrity of the Backup Database



2. Issue the backup dbverify command to check the integrity of the Backup Database.

% backup dbverify [-detail]

where

db Is the shortest acceptable abbreviation of dbverify.

-detail Reports the existence of orphan blocks and other information about the database, as described on the backupdbverify reference page in the OpenAFS Administration Reference.


The output reports one of the following messages:

• Database OK indicates that the Backup Database is undamaged.

• Database not OK indicates that the Backup Database is damaged. To recover from the problem, use the instructionsin To repair corruption in the Backup Database.

7.6.4 To repair corruption in the Backup Database

1. Log in as the local superuser root on each database server machine in the cell.



3. If writing to tape, place a tape in the appropriate device.

4. Working on one of the machines, issue the backup command to enter interactive mode.

# backup -localauth

where -localauth constructs a server ticket from the local /usr/afs/etc/KeyFile file. This flag enables you to issue aprivileged command while logged in as the local superuser root but without AFS administrative tokens.

5. Verify that no backup operations are actively running. If necessary, issue the (backup) status command as described in Tocheck the status of a Tape Coordinator process. Repeat for each Tape Coordinator port offset in turn.

backup> status -portoffset <TC port offset>

6. Issue the (backup) savedb command to repair corruption in the database as it is written to tape or a file.

backup> savedb [-portoffset <TC port offset>]

where

sa Is the shortest acceptable abbreviation of savedb.

-portoffset Specifies the port offset number of the Tape Coordinator handling the tape or backup data file for this operation.You must provide this argument unless the default value of 0 (zero) is appropriate.

7. Exit interactive mode.

backup> quit

8. On each machine in turn, issue the bos shutdown command to shut down the Backup Server process. Include the -localauth flag because you are logged in as the local superuser root, but do not necessarily have administrative tokens. Forcomplete command syntax, see To stop processes temporarily.

# /usr/afs/bin/bos shutdown <machine name> buserver -localauth -wait

9. On each machine in turn, issue the following commands to remove the Backup Database.

# cd /usr/afs/db# rm bdb.DB0# rm bdb.DBSYS1

10. On each machine in turn, starting with the machine with the lowest IP address, issue the bos start command to restartthe Backup Server process, which creates a zero-length copy of the Backup Database as it starts. For complete commandsyntax, see To start processes by changing their status flags to Run.


# /usr/afs/bin/bos start <machine name> buserver -localauth

11. Working on one of the machines, issue the backup command to enter interactive mode.

# backup -localauth

where -localauth constructs a server ticket from the local /usr/afs/etc/KeyFile file.

12. Issue the (backup) addhost command to create an entry in the new, empty database for the Tape Coordinator processhandling the tape or file from which you are reading the repaired copy of the database (presumably the process you startedin Step 2 and which performed the backup savedb operation in Step 6). For complete syntax, see Step 8 in To configure aTape Coordinator machine.

backup> addhost <tape machine name> [<TC port offset>]

13. Issue the (backup) restoredb command to copy the repaired database to the database server machines.

backup> restoredb [-portoffset <TC port offset>]

where

res Is the shortest acceptable abbreviation of restoredb.-portoffset Specifies the port offset number of the Tape Coordinator handling the tape or backup data file for this operation.

You must provide this argument unless the default value of 0 (zero) is appropriate.

14. (Optional) Exit interactive mode if you do not plan to issue any additional backup commands.

backup> quit

15. (Optional) If desired, enter Ctrl-d or another interrupt signal to exit the root shell on each database server machine. Youcan also issue the Ctrl-c signal on the Tape Coordinator machine to stop the process.

7.6.5 Removing Obsolete Records from the Backup Database

Whenever you recycle or relabel a tape using the backup dump or backup labeltape command, the Backup System automati-cally removes all of the dump records for the dumps contained on the tape and all other tapes in the dump set. However, obsoleterecords can still accumulate in the Backup Database over time. For example, when you discard a backup tape after using it themaximum number of times recommended by the manufacturer, the records for dumps on it remain in the database. Similarly, theBackup System does not automatically remove a dump’s record when the dump reaches its expiration date, but only if you thenrecycle or relabel the tape that contains the dump. Finally, if a backup operation halts in the middle, the records for any volumessuccessfully written to tape before the halt remain in the database.

A very large Backup Database can make backup operations less efficient because the Backup Server has to navigate through alarge number of records to find the ones it needs. To remove obsolete records, use the backup deletedump command. Eitheridentify individual dumps by dump ID number, or specify the removal of all dumps created during a certain time period. Keepin mind that you cannot remove the record of an appended dump except by removing the record of its initial dump, whichremoves the records of all associated appended dumps. Removing records of a dump makes it impossible to restore data fromthe corresponding tapes or from any dump that refers to the deleted dump as its parent, directly or indirectly. That is, restoreoperations must begin with the full dump and continue with each incremental dump in order. If you have removed the recordsfor a specific dump, you cannot restore any data from later incremental dumps.

Another way to truncate the Backup Database is to include the -archive argument to the backup savedb command. After a copyof the database is written to tape or to a backup data file, the Backup Server deletes the dump records for all dump operationswith timestamps prior to the date and time you specify. However, issuing the backup deletedump command with only the -toargument is equivalent in effect and is simpler because it does not require starting a Tape Coordinator process as the backupsavedb command does. For further information on the -archive argument to the backup savedb command, see the command’sreference page in the OpenAFS Administration Reference.

If you later need to access deleted dump records, and the corresponding tapes still exist, you can use the -dbadd argument to thebackup scantape command to scan their contents into the database, as instructed in To scan the contents of a tape.


7.6.6 To delete dump records from the Backup Database



2. (Optional) Issue the backup command to enter interactive mode, if you want to delete multiple records or issue additionalcommands. The interactive prompt appears in the following step.

% backup

3. (Optional) Issue the backup dumpinfo command to list information from the Backup Database that can help you decidewhich records to delete. For detailed instructions, see To display dump records.

backup> dumpinfo [<no. of dumps>] [-id <dump id>] [-verbose]

4. Issue the backup deletedump command to delete one or more dump sets.

backup> deletedump [-dumpid <dumpid>+] [-from <date time>] \[-to <date time>]

where

dele Is the shortest acceptable abbreviation of deletedump.

-dumpid Specifies the dump ID of each initial dump to delete from the Backup Database. The records for all associatedappended dumps are also deleted. Provide either this argument or the -to (and optionally, -from) argument.

-from Specifies the beginning of a range of dates; the record for any dump created during the indicated period of time isdeleted.To omit all records before the time indicated with the -to argument, omit this argument. Otherwise provide a valuein the following formatmm/dd/yyyy [hh:MM]where the month (mm), day (dd), and year (yyyy) are required. You can omit the hour and minutes (hh:MM) toindicate the default of midnight (00:00 hours). If you provide them, use 24-hour format (for example, the value 14:36represents 2:36 p.m.).You must provide the -to argument along with this one.


-to Specifies the end of a range of dates; the record of any dump created during the range is deleted from the BackupDatabase.To delete all records created after the date you specify with the -from argument, specify the value NOW. To deleteevery dump record in the Backup Database, provide the value NOW and omit the -from argument. Otherwise,provide a date value in the same format as described for the -from argument. Valid values for the year (yyyy) rangefrom 1970 to 2037; higher values are not valid because the latest possible date in the standard UNIX representationis in early 2038. The command interpreter automatically reduces any later date to the maximum value in 2038.If you omit the time portion (hh:MM), it defaults to 59 seconds after midnight (00:00:59 hours). Similarly, thebackup command interpreter automatically adds 59 seconds to any time value you provide. In both cases, adding59 seconds compensates for how the Backup Database and backup dumpinfo command represent dump creationtimes in hours and minutes only. For example, the Database records a creation timestamp of 20:55 for any dumpoperation that begins between 20:55:00 and 20:55:59. Automatically adding 59 seconds to a time thus includes therecords for all dumps created during that minute.Provide either this argument, or the -dumpid argument. This argument is required if the -from argument is provided.


Chapter 8

Monitoring and Auditing AFS Performance

AFS comes with three main monitoring tools:

• The scout program, which monitors and gathers statistics on File Server performance.

• The fstrace command suite, which traces Cache Manager operations in detail.

• The afsmonitor program, which monitors and gathers statistics on both the File Server and the Cache Manager.

AFS also provides a tool for auditing AFS events on file server machines running AIX.



Initialize the scout program scoutDisplay information about a trace log fstrace lslogDisplay information about an event set fstrace lssetChange the size of a trace log fstrace setlogSet the state of an event set fstrace setsetDump contents of a trace log fstrace dumpClear a trace log fstrace clearInitialize the afsmonitor program afsmonitor

8.2 Using the scout Program

The scout program monitors the status of the File Server process running on file server machines. It periodically collects statisticsfrom a specified set of File Server processes, displays them in a graphical format, and alerts you if any of the statistics exceed aconfigurable threshold.

More specifically, the scout program includes the following features.

• You can monitor, from a single location, the File Server process on any number of server machines from the local and foreigncells. The number is limited only by the size of the display window, which must be large enough to display the statistics.

• You can set a threshold for many of the statistics. When the value of a statistic exceeds the threshold, the scout programhighlights it (displays it in reverse video) to draw your attention to it. If the value goes back under the threshold, the high-lighting is deactivated. You control the thresholds, so highlighting reflects what you consider to be a noteworthy situation. SeeHighlighting Significant Statistics.


• The scout program alerts you to File Server process, machine, and network outages by highlighting the name of each machinethat does not respond to its probe, enabling you to respond more quickly.

• You can set how often the scout program collects statistics from the File Server processes.

8.2.1 System Requirements

The scout program runs on any AFS client machine that has access to the curses graphics package, which most UNIX distri-butions include as a standard utility. It can run on both dumb terminals and under windowing systems that emulate terminals,but the output looks best on machines that support reverse video and cursor addressing. For best results, set the TERM environ-ment variable to the correct terminal type, or one with characteristics similar to the actual ones. For machines running AIX, therecommended TERM setting is vt100, assuming the terminal is similar to that. For other operating systems, the wider range ofacceptable values includes xterm, xterms, vt100, vt200, and wyse85.

No privilege is required to run the scout program, so any user who can access the directory where its binary resides (the /us-r/afsws/bin directory in the conventional configuration) can use it. The program’s probes for collecting statistics do not imposea significant burden on the File Server process, but you can restrict its use by placing the binary file in a directory with a morerestrictive access control list (ACL).

Multiple instances of the scout program can run on a single client machine, each over its own dedicated connection (in its ownwindow). It must run in the foreground, so the window in which it runs does not accept further input except for an interruptsignal.

You can also run the scout program on several machines and view its output on a single machine, by opening telnet connectionsto the other machines from the central one and initializing the program in each remote window. In this case, you can include the-host flag to the scout command to make the name of each remote machine appear in the banner line at the top of the windowdisplaying its output. See The Banner Line.

8.2.2 Using the -basename argument to Specify a Domain Name

As previously mentioned, the scout program can monitor the File Server process on any number of file server machines. If all ofthe machines belong to the same cell, then their hostnames probably all have the same domain name suffix, such as example.comin the Example Corporation cell. In this case, you can use the -basename argument to the scout command, which has severaladvantages:

• You can omit the domain name suffix as you enter each file server machine’s name on the command line. The scout programautomatically appends the domain name to each machine’s name, resulting in a fully-qualified hostname. You can omit thedomain name suffix even when you don’t include the -basename argument, but in that case correct resolution of the namedepends on the state of your cell’s naming service at the time of connection.

• The machine names are more likely to fit in the appropriate column of the display without having to be truncated (for more ontruncating names in the display column, see The Statistics Display Region).

• The domain name appears in the banner line at the top of the display window to indicate the name of the cell you are monitoring.

8.2.3 The Layout of the scout Display

The scout program can display statistics either in a dedicated window or on a plain screen if a windowing environment is notavailable. For best results, use a window or screen that can print in reverse video and do cursor addressing.

The scout program screen has three main regions: the banner line, the statistics display region and the probe/message line. Thissection describes their contents, and graphic examples appear in Example Commands and Displays.


8.2.3.1 The Banner Line

By default, the string scout appears in the banner line at the top of the window or screen, to indicate that the scout program isrunning. You can display two additional types of information by include the appropriate option on the command line:

• Include the -host flag to display the local machine’s name in the banner line. This is particularly useful when you are runningthe scout program on several machines but displaying the results on a single machine.

For example, the following banner line appears when you run the scout program on the machine client1.example.com and usethe-host flag:

[client1.example.com] scout

• Include the -basename argument to display the specified cell domain name in the banner line. For further discussion, see Usingthe -basename argument to Specify a Domain Name.

For example, if you specify a value of example.com for the -basename argument, the banner line reads:

scout for example.com

8.2.3.2 The Statistics Display Region

The statistics display region occupies most of the window and is divided into six columns. The following list describes them asthey appear from left to right in the window.

Conn Displays the number of RPC connections open between the File Server process and client machines. This number nor-mally equals or exceeds the number in the fourth Ws column. It can exceed the number in that column because each useron the machine can have more than one connection open at once, and one client machine can handle several users.

Fetch Displays the number of fetch-type RPCs (fetch data, fetch access list, and fetch status) that the File Server process hasreceived from client machines since it started. It resets to zero when the File Server process restarts.

Store Displays the number of store-type RPCs (store data, store access list, and store status) that the File Server process hasreceived from client machines since it started. It resets to zero when the File Server process restarts.

Ws Displays the number of client machines (workstations) that have communicated with the File Server process within the last15 minutes (such machines are termed active). This number is likely to be smaller than the number in the Conn) columnbecause a single client machine can have several connections open to one File Server process.

[Unlabeled column] Displays the name of the file server machine on which the File Server process is running. It is 12 characterswide. Longer names are truncated and an asterisk (*) appears as the last character in the name. If all machines have thesame domain name suffix, you can use the -basename argument to decrease the need for truncation; see Using the -basename argument to Specify a Domain Name.

Disk attn Displays the number of kilobyte blocks available on up to 26 of the file server machine’s AFS server (/vicep)partitions. The display for each partition has the following format:

partition_letter:free_blocks

For example, a:8949 indicates that partition /vicepa has 8,949 KB free. If the window is not wide enough for all partitionentries to appear on a single line, the scout program automatically stacks the partition entries into subcolumns within thesixth column.

The label on the Disk attn column indicates the threshold value at which entries in the column become highlighted.By default, the scout program highlights a partition that is over 95% full, in which case the label is as follows:

Disk attn: > 95% used

For more on this threshold and its effect on highlighting, see Highlighting Significant Statistics.

For all columns except the fifth (file server machine name), you can use the -attention argument to set a threshold value abovewhich the scout program highlights the statistic. By default, only values in the fifth and sixth columns ever become highlighted.For instructions on using the -attention argument, see Highlighting Significant Statistics.


8.2.3.3 The Probe Reporting Line

The bottom line of the display indicates how many times the scout program has probed the File Server processes for statistics.The statistics gathered in the latest probe appear in the statistics display region. By default, the scout program probes the FileServers every 60 seconds, but you can use the -frequency argument to specify a different probe frequency.

8.2.4 Highlighting Significant Statistics

To draw your attention to a statistic that currently exceed a threshold value, the scout program displays it in reverse video(highlights it). You can set the threshold value for most statistics, and so determine which values are worthy of special attentionand which are normal.

8.2.4.1 Highlighting Server Outages

The only column in which you cannot control highlighting is the fifth, which identifies the file server machine for which statisticsare displayed in the other columns. The scout program uses highlighting in this column to indicate that the File Server processon a machine fails to respond to its probe, and automatically blanks out the other columns. Failure to respond to the probe canindicate a File Server process, file server machine, or network outage, so the highlighting draws your attention to a situation thatis probably interrupting service to users.

When the File Server process once again responds to the probes, its name appears normally and statistics reappear in the othercolumns. If all machine names become highlighted at once, a possible network outage has disrupted the connection between thefile server machines and the client machine running the scout program.

8.2.4.2 Highlighting for Extreme Statistic Values

To set the threshold value for one or more of the five statistics-displaying columns, use the -attention argument. The thresholdvalue applies to all File Server processes you are monitoring (you cannot set different thresholds for different machines). Fordetails, see the syntax description in To start the scout program.

It is not possible to change the threshold values for a running scout program. Stop the current program and start a new one. Also,the scout program does not retain threshold values across restarts, so you must specify all thresholds every time you start theprogram.

8.2.5 Resizing the scout Display

Do not resize the display window while the scout program is running. Increasing the size does no harm, but the scout programdoes not necessarily adjust to the new dimensions. Decreasing the display’s width can disturb column alignment, making thedisplay harder to read. With any type of resizing, the scout program does not adjust the display in any way until it displays theresults of the next probe.

To resize the display effectively, stop the scout program, resize the window and then restart the program. Even in this case, thescout program’s response depends on the accuracy of the information it receives from the display environment. Testing duringdevelopment has shown that the display environment does not reliably provide information about window resizing. If you usethe X windowing system, issuing the following sequence of commands before starting the scout program (or placing them in theshell initialization file) sometimes makes it adjust properly to resizing.

% set noglob% eval ’/usr/bin/X11/resize’% unset noglob


8.2.6 To start the scout program

1. Open a dedicated command shell. If necessary, adjust it to the appropriate size.

2. Issue the scout command to start the program.

% scout [initcmd] -server <FileServer name(s) to monitor>+ \[-basename <base server name>] \[-frequency <poll frequency, in seconds>] [-host] \[-attention <specify attention (highlighting) level>+] \[-debug <turn debugging output on to the named file>]

where

initcmd Is an optional string that accommodates the command’s use of the AFS command parser. It can be omitted andignored.

-server Identifies each File Server process to monitor, by naming the file server machine it is running on. Provide fully-qualified hostnames unless the -basename argument is used. In that case, specify only the initial part of each machinename, omitting the domain name suffix common to all the machine names.

-basename Specifies the domain name suffix common to all of the file server machines named by the -server argument.For discussion of this argument’s effects, see Using the -basename argument to Specify a Domain Name.Do not include the period that separates the domain suffix from the initial part of the machine name, but do includeany periods that occur within the suffix itself. (For example, in the Example Corporation cell, the proper value isexample.com, not .example.com.)

-frequency Sets the frequency, in seconds, of the scout program’s probes to File Server processes. Specify an integergreater than 0 (zero). The default is 60 seconds.

-host Displays the name of the machine that is running the scout program in the display window’s banner line. By default,no machine name is displayed.

-attention Defines the threshold value at which to highlight one or more statistics. You can provide the pairs of statisticand threshold in any order, separating each pair and the parts of each pair with one or more spaces. The followinglist defines the syntax for each statistic.conn connections Highlights the value in the Conn (first) column when the number of connections that the File

Server has open to client machines exceeds the connections value. The highlighting deactivates when the valuegoes back below the threshold. There is no default threshold.

fetch fetch_RPCs Highlights the value in the Fetch (second) column when the number of fetch RPCs that clientshave made to the File Server process exceeds the fetch_RPCs value. The highlighting deactivates only when theFile Server process restarts, at which time the value returns to zero. There is no default threshold.

store store_RPCs Highlights the value in the Store (third) column when the number of store RPCs that clientshave made to the File Server process exceeds the store_RPCs value. The highlighting deactivates only when theFile Server process restarts, at which time the value returns to zero. There is no default threshold.

ws active_clients Highlights the value in the Ws (fourth) column when the number of active client machines (thosethat have contacted the File Server in the last 15 minutes) exceeds the active_clients value. The highlightingdeactivates when the value goes back below the threshold. There is no default threshold.

disk percent_full % or disk min_blocks Highlights the value for a partition in the Disk attn (sixth) columnwhen either the amount of disk space used exceeds the percentage indicated by thepercent_full value, or thenumber of free KB blocks is less than the min_blocks value. The highlighting deactivates when the value goesback below the percent_full threshold or above the min_blocks threshold.The value you specify appears in the header of the sixth column following the string Disk attn. The defaultthreshold is 95% full.Acceptable values for percent_full are the integers from the range 0 (zero) to 99, and you must include thepercent sign to distinguish this statistic from a min_blocks value..

The following example sets the threshold for the Conn column to 100, for the Ws column to 50, and for the Diskattn column to 75%. There is no threshold for the Fetch and Store columns.-attention conn 100 ws 50 disk 75%The following example has the same affect as the previous one except that it sets the threshold for the Disk attncolumn to 5000 free KB blocks:-attention disk 5000 ws 50 conn 100


-debug Enables debugging output and directs it into the specified file. Partial pathnames are interpreted relative to thecurrent working directory. By default, no debugging output is produced.

8.2.7 To stop the scout program

1. Enter Ctrl-c in the display window. This is the proper interrupt signal even if the general interrupt signal in your environ-ment is different.

8.2.8 Example Commands and Displays

This section presents examples of the scout program, combining different arguments and illustrating the screen displays thatresult.

In the first example, an administrator in the Example Corporation issues the scout command without providing any optionalarguments or flags. She includes the -server argument because she is providing multiple machine names. She chooses to specifyon the initial part of each machine’s name even though she has not used the -basename argument, relying on the cell’s nameservice to obtain the fully-qualified name that the scout program requires for establishing a connection.

% scout -server fs1 fs2

Figure 2 depicts the resulting display. Notice first that the machine names in the fifth (unlabeled) column appear in the formatthe administrator used on the command line. Now consider the second line in the display region, where the machine name fs2appears in the fifth column. The Conn and Ws columns together show that machine fs2 has 144 RPC connections open to 44client machines, demonstrating that multiple connections per client machine are possible. The Fetch column shows that clientmachines have made 2,734,278 fetch RPCs to machine fs2 since the File Server process last started and the Store columnshows that they have made 34,066 store RPCs.

Six partition entries appear in the Disk attn column, marked a through f (for /vicepa through /vicepf). They appear on threelines in two subcolumns because of the width of the window; if the window is wider, there are more subcolumns. Four of thepartition entries (a, c, d, and e) appear in reverse video to indicate that they are more than 95% full (the threshold value thatappears in the Disk attn header).

Figure 8.1: First example scout display

In the second example, the administrator uses more of the scout program’s optional arguments.


• She provides the machine names in the same form as in Example 1, but this time she also uses the -basename argument tospecify their domain name suffix, example.com. This implies that the scout program does not need the name service to expandthe names to fully-qualified hostnames, but the name service still converts the hostnames to IP addresses.

• She uses the -host flag to display in the banner line the name of the client machine where the scout program is running.

• She uses the -frequency argument to changes the probing frequency from its default of once per minute to once every fiveseconds.

• She uses the -attention argument to changes the highlighting threshold for partitions to a 5000 KB minimum rather than thedefault of 95% full.

% scout -server fs1 fs2 -basename example.com -host -frequency 5 -attention disk 5000

The use of optional arguments results in several differences between Figure 3 and Figure 2. First, because the -host flag isincluded, the banner line displays the name of the machine running the scout process as [client52] along with the basenameexample.com specified with the -basename argument.

Another difference is that two rather than four of machine fs2’s partitions appear in reverse video, even though their values arealmost the same as in Figure 2. This is because the administrator changed the highlight threshold to a 5000 block minimum, asalso reflected in the Disk attn column’s header. And while machine fs2’s partitions /vicepa and /vicepd are still 95% full,they have more than 5000 free blocks left; partitions /vicepc and /vicepe are highlighted because they have fewer than 5000blocks free.

Note also the result of changing the probe frequency, reflected in the probe reporting line at the bottom left corner of the display.Both this example and the previous one represent a time lapse of one minute after the administrator issues the scout command.In this example, however, the scout program has probed the File Server processes 12 times as opposed to once

Figure 8.2: Second example scout display

In Figure 4, an administrator in the State University cell monitors three of that cell’s file server machines. He uses the -basenameargument to specify the stateu.edu domain name.

% scout -server server2 server3 server4 -basename stateu.edu


Figure 8.3: Third example scout display

Figure 5 illustrates three of the scout program’s features. First, you can monitor file server machines from different cells ina single display: fs1.abc.com, server3.stateu.edu, and sv7.def.com. Because the machines belong to different cells, it is notpossible to provide the -basename argument.

Second, it illustrates how the display must truncate machine names that do not fit in the fifth column, using an asterisk at the endof the name to show that it is shortened.

Third, it illustrates what happens when the scout process cannot reach a File Server process, in this case the one on the machinesv7.def.com: it highlights the machine name and blanks out the values in the other columns.

Figure 8.4: Fourth example scout display


8.3 Using the fstrace Command Suite

This section describes the fstrace commands that system administrators employ to trace Cache Manager activity for debuggingpurposes. It assumes the reader is familiar with the Cache Manager concepts described in Administering Client Machines andthe Cache Manager.

The fstrace command suite monitors the internal activity of the Cache Manager and enables you to record, or trace, its operationsin detail. The operations, which are termed events, comprise the cm event set. Examples of cm events are fetching files andlooking up information for a listing of files and subdirectories using the UNIX ls command.

Following are the fstrace commands and their respective functions:

• The fstrace apropos command provides a short description of commands.

• The fstrace clear command clears the trace log.

• The fstrace dump command dumps the contents of the trace log.

• The fstrace help command provides a description and syntax for commands.

• The fstrace lslog command lists information about the trace log.

• The fstrace lsset command lists information about the event set.

• The fstrace setlog command changes the size of the trace log.

• The fstrace setset command sets the state of the event set.

8.3.1 About the fstrace Command Suite

The fstrace command suite replaces and greatly expands the functionality formerly provided by the fs debug command. Its in-tended use is to aid in diagnosis of specific Cache Manager problems, such as client machine hangs, cache consistency problems,clock synchronization errors, and failures to access a volume or AFS file. Therefore, it is best not to keep fstrace logging enabledat all times, unlike the logging for AFS server processes.

Most of the messages in the trace log correspond to low-level Cache Manager operations. It is likely that only personnel familiarwith the AFS source code can interpret them. If you have an AFS source license, you can attempt to interpret the trace yourself,or work with the AFS Product Support group to resolve the underlying problems. If you do not have an AFS source license, itis probably more efficient to contact the AFS Product Support group immediately in case of problems. They can instruct you toactivate fstrace tracing if appropriate.

The log can grow in size very quickly; this can use valuable disk space if you are writing to a file in the local file space.Additionally, if the size of the log becomes too large, it can become difficult to parse the results for pertinent information.

When AFS tracing is enabled, each time a cm event occurs, a message is written to the trace log, cmfx. To diagnose a problem,read the output of the trace log and analyze the operations executed by the Cache Manager. The default size of the trace log is 60KB, but you can increase or decrease it.

To use the fstrace command suite, you must first enable tracing and reserve, or allocate, space for the trace log with the fstracesetset command. With this command, you can set the cm event set to one of three states to enable or disable tracing for the eventset and to allocate or deallocate space for the trace log in the kernel:

active Enables tracing for the event set and allocates space for the trace log.

inactive Temporarily disables tracing for the event set; however, the event set continues to allocate space occupied by thelog to which it sends data.

dormant Disables tracing for the event set; furthermore, the event set releases the space occupied by the log to which it sendsdata. When the cm event set that sends data to the cmfx trace log is in this state, the space allocated for that log is freed ordeallocated.


Both event sets and trace logs can be designated as persistent, which prevents accidental resetting of an event set’s state orclearing of a trace log. The designation is made as the kernel is compiled and cannot be changed.

If an event set such as cm is persistent, you can change its state only by including the -set argument to the fstrace setset command.(That is, you cannot change its state along with the state of all other event sets by issuing the fstrace setset command with noarguments.) Similarly, if a trace log such as cmfx is persistent, you can clear it only by including either the -set or -log argumentto the fstrace clear command (you cannot clear it along with all other trace logs by issuing the fstrace clear command with noarguments.)

When a problem occurs, set the cm event set to active using the fstrace setset command. When tracing is enabled on a busy AFSclient, the volume of events being recorded is significant; therefore, when you are diagnosing problems, restrict AFS activity asmuch as possible to minimize the amount of extraneous tracing in the log. Because tracing can have a negative impact on systemperformance, leave cm tracing in the dormant state when you are not diagnosing problems.

If a problem is reproducible, clear the cmfx trace log with the fstrace clear command and reproduce the problem. If the problemis not easily reproduced, keep the state of the event set active until the problem recurs.

To view the contents of the trace log and analyze the cm events, use the fstrace dump command to copy the content lines of thetrace log to standard output (stdout) or to a file.

NoteIf a particular command or process is causing problems, determine its process id (PID). Search the output of the fstrace dumpcommand for the PID to find only those lines associated with the problem.

8.3.2 Requirements for Using the fstrace Command Suite

Except for the fstrace help and fstrace apropos commands, which require no privilege, issuing the fstrace commands requiresthat the issuer be logged in as the local superuser root on the local client machine. Before issuing an fstrace command, verifythat you have the necessary privilege.

The Cache Manager catalog must be in place so that logging can occur. The fstrace command suite uses the standard UNIXcatalog utilities. The default location is /usr/vice/etc/C/afszcm.cat. It can be placed in another directory by placing the fileelsewhere and using the proper NLSPATH and LANG environment variables.

8.3.3 Using fstrace Commands Effectively

To use fstrace commands most effectively, configure them as indicated:

• Store the fstrace binary in a local disk directory.

• When you dump the fstrace log to a file, direct it to one on the local disk.

• The trace can grow large in just a few minutes. Before attempting to dump the log to a local file, verify that you have enoughroom. Be particularly careful if you are using disk quotas on partitions in the local file system.

• Attempt to limit Cache Manager activity on the AFS client machine other than the problem operation. This reduces the amountof extraneous data in the trace.

• Activate the fstrace log for the shortest possibly period of time. If possible activate the trace immediately before performingthe problem operation, deactivate it as soon as the operation completes, and dump the trace log to a file immediately.

• If possible, obtain UNIX process ID (PID) of the command or program that initiates the problematic operation. This enablesthe person analyzing the trace log to search it for messages associated with the PID.


8.3.4 Activating the Trace Log

To start Cache Manager tracing on an AFS client machine, you must first configure

• The cmfx kernel trace log using the fstrace setlog command

• The cm event set using the fstrace setset command

The fstrace setlog command sets the size of the cmfx kernel trace log in kilobytes. The trace log occupies 60 kilobytes of kernelby default. If the trace log already exists, it is cleared when this command is issued and a new log of the given size is created.Otherwise, a new log of the desired size is created.

The fstrace setset command sets the state of the cm kernel event set. The state of the cm event set determines whether informationon the events in that event set is logged.

After establishing kernel tracing on the AFS client machine, you can check the state of the event set and the size of the kernelbuffer allocated for the trace log. To display information about the state of the cm event set, issue the fstrace lsset command. Todisplay information about the cmfx trace log, use the fstrace lslog command. See the instructions in Displaying the State of aTrace Log or Event Set.

8.3.5 To configure the trace log



2. Issue the fstrace setlog command to set the size of the cmfx kernel trace log.

# fstrace setlog [-log <log_name>+] -buffersize <1-kilobyte_units>

The following example sets the size of the cmfx trace log to 80 KB.

# fstrace setlog cmfx 80

8.3.6 To set the event set



2. Issue the fstrace setset command to set the state of event sets.

% fstrace setset [-set <set_name>+] [-active] [-inactive] \[-dormant]

The following example activates the cm event set.

# fstrace setset cm -active

8.3.7 Displaying the State of a Trace Log or Event Set

An event set must be in the active state to be included in the trace log. To display an event set’s state, use the fstrace lssetcommand. To set its state, issue the fstrace setset command as described in To set the event set.

To display size and allocation information for the trace log, issue the fstrace lslogcommand with the -long argument.


8.3.8 To display the state of an event set



2. Issue the fstrace lsset command to display the available event set and its state.

# fstrace lsset [-set <set_name>+]

The following example displays the event set and its state on the local machine.

# fstrace lsset cmAvailable sets:cm active

The output from this command lists the event set and its states. The three event states for the cm event set are:

active Tracing is enabled.

inactive Tracing is disabled, but space is still allocated for the corresponding trace log (cmfx).

dormant Tracing is disabled, and space is no longer allocated for the corresponding trace log (cmfx).Disables tracing for theevent set.

8.3.9 To display the log size



2. Issue the fstrace lslog command to display information about the kernel trace log.

# fstrace lslog [-set <set_name>+] [-log <log_name>] [-long]

The following example uses the -long flag to display additional information about the cmfx trace log.

# fstrace lslog cmfx -longAvailable logs:cmfx : 60 kbytes (allocated)

The output from this command lists information on the trace log. When issued without the -long flag, the fstrace lslog commandlists only the name of the log. When issued with the -long flag, the fstrace lslog command lists the log, the size of the log inkilobytes, and the allocation state of the log.

There are two allocation states for the kernel trace log:

allocated Space is reserved for the log in the kernel. This indicates that the event set that writes to this log is either active(tracing is enabled for the event set) or inactive (tracing is temporarily disabled for the event set); however, the event setcontinues to reserve space occupied by the log to which it sends data.

unallocated Space is not reserved for the log in the kernel. This indicates that the event set that writes to this log is dormant(tracing is disabled for the event set); furthermore, the event set releases the space occupied by the log to which it sendsdata.


8.3.10 Dumping and Clearing the Trace Log

After the Cache Manager operation you want to trace is complete, use the fstrace dump command to dump the trace log tothe standard output stream or to the file named by the -file argument. Or, to dump the trace log continuously, use the -followargument (combine it with the -file argument if desired). To halt continuous dumping, press an interrupt signal such as <Ctrl-c>.

To clear a trace log when you no longer need the data in it, issue the fstrace clear command. (The fstrace setlog command alsoclears an existing trace log automatically when you use it to change the log’s size.)

8.3.11 To dump the contents of a trace log



2. Issue the fstrace dump command to dump trace logs.

# fstrace dump [-set <set_name>+] [-follow <log_name>] \[-file <output_filename>] \[-sleep <seconds_between_reads>]

At the beginning of the output of each dump is a header specifying the date and time at which the dump began. The number oflogs being dumped is also displayed if the -follow argument is not specified. The header appears as follows:

AFS Trace Dump --Date: date timeFound n logs.

where date is the starting date of the trace log dump, time is the starting time of the trace log dump, and n specifies the numberof logs found by the fstrace dump command.

The following is an example of trace log dump header:

AFS Trace Dump --Date: Fri Apr 16 10:44:38 1999Found 1 logs.

The contents of the log follow the header and are comprised of messages written to the log from an active event set. The messageswritten to the log contain the following three components:

• The timestamp associated with the message (number of seconds from an arbitrary start point)

• The process ID or thread ID associated with the message

• The message itself

A trace log message is formatted as follows:

time timestamp, pid pid:event message

where timestamp is the number of seconds from an arbitrary start point, pid is the process ID number of the Cache Managerevent, and event message is the Cache Manager event which corresponds with a function in the AFS source code.

The following is an example of a dumped trace log message:

time 749.641274, pid 3002:Returning code 2 from 19

For the messages in the trace log to be most readable, the Cache Manager catalog file needs to be installed on the local disk ofthe client machine; the conventional location is /usr/vice/etc/C/afszcm.cat. Log messages that begin with the string raw op,like the following, indicate that the catalog is not installed.


raw op 232c, time 511.916288, pid 0p0:Fri Apr 16 10:36:31 1999

Every 1024 seconds, a current time message is written to each log. This message has the following format:

time timestamp, pid pid: Current time: unix_time

where timestamp is the number of seconds from an arbitrary start point, pid is the process ID number, and unix_time is thestandard time format since January 1, 1970.

The current time message can be used to determine the actual time associated with each log message. Determine the actual timeas follows:

1. Locate the log message whose actual time you want to determine.

2. Search backward through the dump record until you come to a current time message.

3. If the current time message’s timestamp is smaller than the log message’s timestamp, subtract the former from the latter.If the current time message’s timestamp is larger than the log message’s timestamp, add 1024 to the latter and subtract theformer from the result.

4. Add the resulting number to the current time message’s unix_time to determine the log message’s actual time.

Because log data is stored in a finite, circular buffer, some of the data can be overwritten before being read. If this happens, thefollowing message appears at the appropriate place in the dump:

Log wrapped; data missing.

NoteIf this message appears in the middle of a dump, which can happen under a heavy work load, it indicates that not all of thelog data is being written to the log or some data is being overwritten. Increasing the size of the log with the fstrace setlogcommand can alleviate this problem.

8.3.12 To clear the contents of a trace log



2. Issue the fstrace clear command to clear logs by log name or by event set.

# fstrace clear [-set <set_name>+] [-log <log_name>+]

The following example clears the cmfx log used by the cm event set on the local machine.

# fstrace clear cm

The following example also clears the cmfx log on the local machine.

# fstrace clear cmfx


8.3.13 Examples of fstrace Commands

This section contains an extensive example of the use of the fstrace command suite, which is useful for gathering a detailedtrace of Cache Manager activity when you are working with AFS Product Support to diagnose a problem. The Product Supportrepresentative can guide you in choosing appropriate parameter settings for the trace.

Before starting the kernel trace log, try to isolate the Cache Manager on the AFS client machine that is experiencing the problemaccessing the file. If necessary, instruct users to move to another machine so as to minimize the Cache Manager activity on thismachine. To minimize the amount of unrelated AFS activity recorded in the trace log, place both the fstrace binary and the dumpfile must reside on the local disk, not in AFS. You must be logged in as the local superuser root to issue fstrace commands.

Before starting a kernel trace, issue the fstrace lsset command to check the state of the cm event set.

# fstrace lsset cm

If tracing has not been enabled previously or if tracing has been turned off on the client machine, the following output is displayed:

Available sets:cm inactive

If tracing has been turned off and kernel memory is not allocated for the trace log on the client machine, the following output isdisplayed:

Available sets:cm inactive (dormant)

If the current state of the cm event set is inactive or inactive (dormant), turn on kernel tracing by issuing the fstracesetset command with the -active flag.

# fstrace setset cm -active

If tracing is enabled currently on the client machine, the following output is displayed:

Available sets:cm active

If tracing is enabled currently, you do not need to use the fstrace setset command. Do issue the fstrace clear command to clearthe contents of any existing trace log, removing prior traces that are not related to the current problem.

# fstrace clear cm

After checking on the state of the event set, issue the fstrace lslog command with the -long flag to check the current state andsize of the kernel trace log .

# fstrace lslog cmfx -long

If tracing has not been enabled previously or the cm event set was set to active or inactive previously, output similar tothe following is displayed:

Available logs:cmfx : 60 kbytes (allocated)

The fstrace tracing utility allocates 60 kilobytes of memory to the trace log by default. You can increase or decrease the amountof memory allocated to the kernel trace log by setting it with the fstrace setlog command. The number specified with the -buffersize argument represents the number of kilobytes allocated to the kernel trace log. If you increase the size of the kerneltrace log to 100 kilobytes, issue the following command.

# fstrace setlog cmfx 100

After ensuring that the kernel trace log is configured for your needs, you can set up a file into which you can dump the kerneltrace log. For example, create a dump file with the name cmfx.dump.file.1 using the following fstrace dump command. Issuethe command as a continuous process by adding the -follow and -sleep arguments. Setting the -sleep argument to 10 dumpsoutput from the kernel trace log to the file every 10 seconds.


# fstrace dump -follow cmfx -file cmfx.dump.file.1 -sleep 10AFS Trace Dump -

Date: Fri Apr 16 10:54:57 1999Found 1 logs.time 32.965783, pid 0: Fri Apr 16 10:45:52 1999time 32.965783, pid 33657: Close 0x5c39ed8 flags 0x20time 32.965897, pid 33657: Gn_close vp 0x5c39ed8 flags 0x20 (returns0x0)time 35.159854, pid 10891: Breaking callback for 5bd95e4 states 1024(volume 0)time 35.407081, pid 10891: Breaking callback for 5c0fadc states 1024(volume 0)

. .

. .

. .time 71.440456, pid 33658: Lookup adp 0x5bbdcf0 name g3oCKs fid (7564fb7e:588d240.2ff978a8.6)time 71.440569, pid 33658: Returning code 2 from 19time 71.440619, pid 33658: Gn_lookup vp 0x5bbdcf0 name g3oCKs (returns0x2)time 71.464989, pid 38267: Gn_open vp 0x5bbd000 flags 0x0 (returns 0x0)AFS Trace Dump - Completed

8.4 Using the afsmonitor Program

The afsmonitor program enables you to monitor the status and performance of specified File Server and Cache Manager pro-cesses by gathering statistical information. Among its other uses, the afsmonitor program can be used to fine-tune CacheManager configuration and load balance File Servers.

The afsmonitor program enables you to perform the following tasks.

• Monitor any number of File Server and Cache Manager processes on any number of machines (in both local and foreign cells)from a single location.

• Set threshold values for any monitored statistic. When the value of a statistic exceeds the threshold, the afsmonitor programhighlights it to draw your attention. You can set threshold levels that apply to every machine or only some.

• Invoke programs or scripts automatically when a statistic exceeds its threshold.

8.4.1 Requirements for running the afsmonitor program

The following software must be accessible to a machine where the afsmonitor program is running:

• The AFS xstat libraries, which the afsmonitor program uses to gather data

• The curses graphics package, which most UNIX distributions provide as a standard utility

The afsmonitor screens format successfully both on so-called dumb terminals and in windowing systems that emulate terminals.For the output to looks its best, the display environment needs to support reverse video and cursor addressing. Set the TERMenvironment variable to the correct terminal type, or to a value that has characteristics similar to the actual terminal type. Thedisplay window or terminal must be at least 80 columns wide and 12 lines long.

The afsmonitor program must run in the foreground, and in its own separate, dedicated window or terminal. The window orterminal is unavailable for any other activity as long as the afsmonitor program is running. Any number of instances of theafsmonitor program can run on a single machine, as long as each instance runs in its own dedicated window or terminal. Notethat it can take up to three minutes to start an additional instance.


No privilege is required to run the afsmonitor program. By convention, it is installed in the /usr/afsws/bin directory, and anyonewho can access the directory can monitor File Servers and Cache Managers. The probes through which the afsmonitor programcollects statistics do not constitute a significant burden on the File Server or Cache Manager unless hundreds of people arerunning the program. If you wish to restrict its use, place the binary file in a directory available only to authorized users.

8.4.2 The afsmonitor Output Screens

The afsmonitor program displays its data on three screens:

• System Overview: This screen appears automatically when the afsmonitor program initializes. It summarizes separatelyfor File Servers and Cache Managers the number of machines being monitored and how many of them have alerts (statistics thathave exceeded their thresholds). It then lists the hostname and number of alerts for each machine being monitored, indicatingif appropriate that a process failed to respond to the last probe.

• File Server: This screen displays File Server statistics for each file server machine being monitored. It highlights statisticsthat have exceeded their thresholds, and identifies machines that failed to respond to the last probe.

• Cache Managers: This screen displays Cache Manager statistics for each client machine being monitored. It highlightsstatistics that have exceeded their thresholds, and identifies machines that failed to respond to the last probe.

Fields at the corners of every screen display the following information:

• In the top left corner, the program name and version number.

• In the top right corner, the screen name, current and total page numbers, and current and total column numbers. The pagenumber (for example, p.1 of 3) indicates the index of the current page and the total number of (vertical) pages over whichdata is displayed. The column number (for example, c.1 of 235) indicates the index of the current leftmost column andthe total number of columns in which data appears. (The symbol >>> indicates that there is additional data to the right; thesymbol <<< indicates that there is additional data to the left.)

• In the bottom left corner, a list of the available commands. Enter the first letter in the command name to run that command.Only the currently possible options appear; for example, if there is only one page of data, the next and prev commands,which scroll the screen up and down respectively, do not appear. For descriptions of the commands, see the following sectionabout navigating the display screens.

• In the bottom right corner, the probes field reports how many times the program has probed File Servers (fs), CacheManagers (cm), or both. The counts for File Servers and Cache Managers can differ. The freq field reports how often theprogram sends probes.

Navigating the afsmonitor Display Screens

As noted, the lower left hand corner of every display screen displays the names of the commands currently available for movingto alternate screens, which can either be a different type or display more statistics or machines of the current type. To execute acommand, press the lowercase version of the first letter in its name. Some commands also have an uppercase version that has asomewhat different effect, as indicated in the following list.

cm Switches to the Cache Managers screen. Available only on the System Overview and File Servers screens.

fs Switches to the File Servers screen. Available only on the System Overview and the Cache Managers screens.

left Scrolls horizontally to the left, to access the data columns situated to the left of the current set. Available when the <<<symbol appears at the top left of the screen. Press uppercase L to scroll horizontally all the way to the left (to display thefirst set of data columns).

next Scrolls down vertically to the next page of machine names. Available when there are two or more pages of machines andthe final page is not currently displayed. Press uppercase N to scroll to the final page.

oview Switches to the System Overview screen. Available only on the Cache Managers and File Servers screens.


prev Scrolls up vertically to the previous page of machine names. Available when there are two or more pages of machinesand the first page is not currently displayed. Press uppercase N to scroll to the first page.

right Scrolls horizontally to the right, to access the data columns situated to the right of the current set. This command isavailable when the >>> symbol appears at the upper right of the screen. Press uppercase R to scroll horizontally all theway to the right (to display the final set of data columns).

8.4.3 The System Overview Screen

The System Overview screen appears automatically as the afsmonitor program initializes. This screen displays the status ofas many File Server and Cache Manager processes as can fit in the current window; scroll down to access additional information.

The information on this screen is split into File Server information on the left and Cache Manager information on the right. Theheader for each grouping reports two pieces of information:

• The number of machines on which the program is monitoring the indicated process

• The number of alerts and the number of machines affected by them (an alert means that a statistic has exceeded its thresholdor a process failed to respond to the last probe)

A list of the machines being monitored follows. If there are any alerts on a machine, the number of them appears in squarebrackets to the left of the hostname. If a process failed to respond to the last probe, the letters PF (probe failure) appear in squarebrackets to the left of the hostname.

The following graphic is an example System Overview screen. The afsmonitor program is monitoring six File Servers andseven Cache Managers. The File Server process on host fs1.example.com and the Cache Manager on host cli33.example.comare each marked [ 1] to indicate that one threshold value is exceeded. The [PF] marker on host fs6.example.com indicatesthat its File Server process did not respond to the last probe.

Figure 8.5: The afsmonitor System Overview Screen

8.4.4 The File Servers Screen

The File Servers screen displays the values collected at the most recent probe for File Server statistics.


A summary line at the top of the screen (just below the standard program version and screen title blocks) specifies the number ofmonitored File Servers, the number of alerts, and the number of machines affected by the alerts.

The first column always displays the hostnames of the machines running the monitored File Servers.

To the right of the hostname column appear as many columns of statistics as can fit within the current width of the display screenor window; each column requires space for 10 characters. The name of the statistic appears at the top of each column. If the FileServer on a machine did not respond to the most recent probe, a pair of dashes (--) appears in each column. If a value exceedsits configured threshold, it is highlighted in reverse video. If a value is too large to fit into the allotted column width, it overflowsinto the next row in the same column.

For a list of the available File Server statistics, see Appendix C, The afsmonitor Program Statistics.

The following graphic depicts the File Servers screen that follows the System Overview Screen example previously dis-cussed; however, one additional server probe has been completed. In this example, the File Server process on fs1 has exceededthe configured threshold for the number of performance calls received (the numPerfCalls statistic), and that field appears inreverse video. Host fs6 did not respond to Probe 10, so dashes appear in all fields.

Figure 8.6: The afsmonitor File Servers Screen

Both the File Servers and Cache Managers screen (discussed in the following section) can display hundreds of columns of dataand are therefore designed to scroll left and right. In the preceding graphic, the screen displays the leftmost screen and the screentitle block shows that column 1 of 235 is displayed. The appearance of the >>> symbol in the upper right hand corner of thescreen and the right command in the command block indicate that additional data is available by scrolling right. (For informationon the available statistics, see Appendix C, The afsmonitor Program Statistics.)

If the right command is executed, the screen looks something like the following example. Note that the horizontal scroll symbolsnow point both to the left (<<<) and to the right (>>>) and both the left and right commands appear, indicating that additionaldata is available by scrolling both left and right.


Figure 8.7: The afsmonitor File Servers Screen Shifted One Page to the Right

8.4.5 The Cache Managers Screen

The Cache Managers screen displays the values collected at the most recent probe for Cache Manager statistics.

A summary line at the top of the screen (just below the standard program version and screen title blocks) specifies the number ofmonitored Cache Managers, the number of alerts, and the number of machines affected by the alerts.

The first column always displays the hostnames of the machines running the monitored Cache Managers.

To the right of the hostname column appear as many columns of statistics as can fit within the current width of the display screenor window; each column requires space for 10 characters. The name of the statistic appears at the top of each column. If theCache Manager on a machine did not respond to the most recent probe, a pair of dashes (--) appears in each column. If a valueexceeds its configured threshold, it is highlighted in reverse video. If a value is too large to fit into the allotted column width, itoverflows into the next row in the same column.

For a list of the available Cache Manager statistics, see Appendix C, The afsmonitor Program Statistics.

The following graphic depicts a Cache Managers screen that follows the System Overview Screen previously discussed. In theexample, the Cache Manager process on host cli33 has exceeded the configured threshold for the number of cells it can contact(the numCellsContacted statistic), so that field appears in reverse video.


Figure 8.8: The afsmonitor Cache Managers Screen

8.5 Configuring the afsmonitor Program

To customize the afsmonitor program, create an ASCII-format configuration file and use the -config argument to name it. Youcan specify the following in the configuration file:

• The File Servers, Cache Managers, or both to monitor.

• The statistics to display. By default, the display includes 271 statistics for File Servers and 570 statistics for Cache Managers.For information on the available statistics, see Appendix C, The afsmonitor Program Statistics.

• The threshold values to set for statistics and a script or program to execute if a threshold is exceeded. By default, no thresholdvalues are defined and no scripts or programs are executed.

The following list describes the instructions that can appear in the configuration file:

cm hostname Names a client machine for which to display Cache Manager statistics. The order of cm lines in the file determinesthe order in which client machines appear from top to bottom on the System Overview and Cache Managers outputscreens.

fs hostname Names a file server machine for which to display File Server statistics. The order of fs lines in the file determinesthe order in which file server machines appear from top to bottom on the System Overview and File Serversoutput screens.

thresh fs | cm field_name thresh_val [cmd_to_run] [arg1] ...[argn] Assigns the threshold value thresh_valto the statistic field_name, for either a File Server statistic (fs) or a Cache Manager statistic (cm). The optional cmd_to_executefield names a binary or script to execute each time the value of the statistic changes from being below thresh_val to be-ing at or above thresh_val. A change between two values that both exceed thresh_val does not retrigger the binary orscript. The optional arg1 through argn fields are additional values that the afsmonitor program passes as arguments to thecmd_to_execute command. If any of them include one or more spaces, enclose the entire field in double quotes.

The parameters fs, cm, field_name, threshold_val, and arg1 through argn correspond to the values with the same name onthe thresh line. The host_name parameter identifies the file server or client machine where the statistic has crossed thethreshold, and the actual_val parameter is the actual value of field_name that equals or exceeds the threshold value.


Use the thresh line to set either a global threshold, which applies to all file server machines listed on fs lines or clientmachines listed on cm lines in the configuration file, or a machine-specific threshold, which applies to only one file serveror client machine.

• To set a global threshold, place the thresh line before any of the fs or cm lines in the file.

• To set a machine-specific threshold, place the thresh line below the corresponding fs or cm line, and above any otherfs or cm lines. A machine-specific threshold value always overrides the corresponding global threshold, if set. Do notplace a thresh fs line directly after a cm line or a thresh cm line directly after a fs line.

show fs | cm field/group/section Specifies which individual statistic, group of statistics, or section of statistics todisplay on the File Servers screen (fs) or Cache Managers screen (cm) and the order in which to display them.The appendix of afsmonitor statistics in the OpenAFS Administration Guide specifies the group and section to which eachstatistic belongs. Include as many show lines as necessary to customize the screen display as desired, and place themanywhere in the file. The top-to-bottom order of the show lines in the configuration file determines the left-to-right orderin which the statistics appear on the corresponding screen.

If there are no show lines in the configuration file, then the screens display all statistics for both Cache Managers and FileServers. Similarly, if there are no show fs lines, the File Servers screen displays all file server statistics, and if thereare no show cm lines, the Cache Managers screen displays all client statistics.

# comments Precedes a line of text that the afsmonitor program ignores because of the initial number (#) sign, which mustappear in the very first column of the line.

For a list of the values that can appear in the field/group/section field of a show instruction, see Appendix C, The afsmonitorProgram Statistics.)

The following example illustrates a possible configuration file:

thresh cm dlocalAccesses 1000000thresh cm dremoteAccesses 500000 handleDRemotethresh fs rx_maxRtt_Usec 1000cm client5cm client33cm client14thresh cm dlocalAccesses 2000000thresh cm vcacheMisses 10000cm client2fs fs3fs fs9fs fs5fs fs10show cm numCellsContactedshow cm dlocalAccessesshow cm dremoteAccessesshow cm vcacheMissesshow cm Auth_Stats_group

Since the first three thresh instructions appear before any fs or cm instructions, they set global threshold values:

• All Cache Manager process in this file use 1000000 as the threshold for the dlocalAccesses statistic (except for the machineclient2 which uses an overriding value of 2000000.)

• All Cache Manager processes in this file use 500000 as the threshold value for the dremoteAccesses statistic; if that value isexceeded, the script handleDRemote is invoked.

• All File Server processes in this file use 1000 as the threshold value for the rx_maxRtt_Usec statistic.

The four cm instructions monitor the Cache Manager on the machines client5, client33, client14, and client2. The first threeuse all of the global threshold values.


The Cache Manager on client2 uses the global threshold value for the dremoteAccesses statistic, but a different one for thedlocalAccesses statistic. Furthermore, client22 is the only Cache Manager that uses the threshold set for the vcacheMissesstatistic.

The fs instructions monitor the File Server on the machines fs3, fs9, fs5, and fs10. They all use the global threshold fortherx_maxRtt_Usec statistic.

Because there are no show fs instructions, the File Servers screen displays all File Server statistics. The Cache Managersscreen displays only the statistics named in show cm instructions, ordering them from left to right. The Auth_Stats_groupincludes several statistics, all of which are displayed (curr_PAGs, curr_Records, curr_AuthRecords, curr_UnauthRecords,curr_MaxRecordsInPAG, curr_LongestChain, PAGCreations, TicketUpdates, HWM_PAGS, HWM_Records, HWM_MaxRecordsInPAG,and HWM_LongestChain).

8.6 Writing afsmonitor Statistics to a File

All of the statistical information collected and displayed by the afsmonitor program can be preserved by writing it to an outputfile. You can create an output file by using the -output argument when you startup the afsmonitor process. You can use theoutput file to track process performance over long periods of time and to apply post-processing techniques to further analyzesystem trends.

The afsmonitor program output file is a simple ASCII file that records the information reported by the File Server and CacheManager screens. The output file has the following format:

time host_name CM|FS list_of_measured_values

and specifies the time at which the list_of_measured_values were gathered from the Cache Manager (CM) or File Server (FS) pro-cess housed on host_name. On those occasion where probes fail, the value -1 is reported instead of the list_of_measured_values.

This file format provides several advantages:

• It can be viewed using a standard editor. If you intend to view this file frequently, use the -detailed flag with the -outputargument. It formats the output file in a way that is easier to read.

• It can be passed through filters to extract desired information using the standard set of UNIX tools.

• It is suitable for long term storage of the afsmonitor program output.

8.7 To start the afsmonitor Program

1. Open a separate command shell window or use a dedicated terminal for each instance of the afsmonitor program. Thiswindow or terminal must be devoted to the exclusive use of the afsmonitor process because the command cannot be runin the background.

2. Initialize the afsmonitor program. The message afsmonitor Collecting Statistics..., followed by theappearance of the System Overview screen, confirms a successful start.

% afsmonitor [initcmd] [-config <configuration file>] \[-frequency <poll frequency, in seconds>] \[-output <storage file name>] [-detailed] \[-debug <turn debugging output on to the named file>] \[-fshosts <list of file servers to monitor>+] \[-cmhosts <list of cache managers to monitor>+]

afsmonitor Collecting Statistics...

where



-config Specifies the pathname of an afsmonitor configuration file, which lists the machines and statistics to monitor.Partial pathnames are interpreted relative to the current working directory. Provide either this argument or one orboth of the -fshosts and -cmhosts arguments. You must use a configuration file to set thresholds or customize thescreen display. For instructions on creating the configuration file, see Configuring the afsmonitor Program.

-frequency Specifies how often to probe the File Server and Cache Manager processes, as a number of seconds. Accept-able values range from 1 and 86400; the default value is 60. This frequency applies to both File Server and CacheManager probes; however, File Server and Cache Manager probes are initiated and processed independent of eachother. The actual interval between probes to a host is the probe frequency plus the time needed by all hosts to respondto the probe.

-output Specifies the name of an output file to which to write all of the statistical data. By default, no output file is created.For information on this file, see Writing afsmonitor Statistics to a File.

-detailed Formats the output file named by the -output argument to be more easily readable. The -output argument mustbe provided along with this flag.

-fshosts Identifies each File Server process to monitor by specifying the host it is running on. You can identify a host usingeither its complete Internet-style host name or an abbreviation acceptable to the cell’s naming service. Combine thisargument with the -cmhosts if you wish, but not the -config argument.

-cmhosts Identifies each Cache Manager process to monitor by specifying the host it is running on. You can identify ahost using either its complete Internet-style host name or an abbreviation acceptable to the cell’s naming service.Combine this argument with the -fshosts if you wish, but not the -config argument.

8.8 To stop the afsmonitor program

To exit an afsmonitor program session, Enter the <Ctrl-c> interrupt signal or an uppercase Q.

8.9 The xstat Data Collection Facility

The afsmonitor program uses the xstat data collection facility to gather and calculate the data that it (the afsmonitor program)then uses to perform its function. You can also use the xstat facility to create your own data display programs. If you do, keepthe following in mind. The File Server considers any program calling its RPC routines to be a Cache Manager; therefore, anyprogram calling the File Server interface directly must export the Cache Manager’s callback interface. The calling program mustbe capable of emulating the necessary callback state, and it must respond to periodic keep-alive messages from the File Server.In addition, a calling program must be able to gather the collected data.

The xstat facility consists of two C language libraries available to user-level applications:

• /usr/afsws/lib/afs/libxstat_fs.a exports calls that gather information from one or more running File Server processes.

• /usr/afsws/lib/afs/libxstat_cm.a exports calls that collect information from one or more running Cache Managers.

The libraries allow the caller to register

• A set of File Servers or Cache Managers to be examined.

• The frequency with which the File Servers or Cache Managers are to be probed for data.

• A user-specified routine to be called each time data is collected.

The libraries handle all of the lightweight processes, callback interactions, and timing issues associated with the data collection.The user needs only to process the data as it arrives.


8.9.1 The libxstat Libraries

The libxstat_fs.a and libxstat_cm.a libraries handle the callback requirements and other complications associated with thecollection of data from File Servers and Cache Managers. The user provides only the means of accumulating the desired data.Each xstat library implements three routines:

• Initialization (xstat_fs_Init and xstat_cm_Init) arranges the periodic collection and handling of data.

• Immediate probe (xstat_fs_ForceProbeNow and xstat_cm_ForceProbeNow) forces the immediate collection of data, afterwhich collection returns to its normal probe schedule.

• Cleanup (xstat_fs_Cleanup and xstat_cm_Cleanup) terminates all connections and removes all traces of the data collectionfrom memory.

The File Server and Cache Manager each define data collections that clients can fetch. A data collection is simply a related set ofnumbers that can be collected as a unit. For example, the File Server and Cache Manager each define profiling and performancedata collections. The profiling collections maintain counts of the number of times internal functions are called within servers,allowing bottleneck analysis to be performed. The performance collections record, among other things, internal disk I/O statisticsfor a File Server and cache effectiveness figures for a Cache Manager, allowing for performance analysis.

For a copy of the detailed specification which provides much additional usage information about the xstat facility, its libraries,and the routines in the libraries, contact AFS Product Support.

8.9.2 Example xstat Commands

AFS comes with two low-level, example commands: xstat_fs_test and xstat_cm_test. The commands allow you to experimentwith the xstat facility. They gather information and display the available data collections for a File Server or Cache Manager.They are intended merely to provide examples of the types of data that can be collected via xstat; they are not intended for usein the actual collection of data.

8.9.2.1 To use the example xstat_fs_test command

1. Issue the example xstat_fs_test command to test the routines in the libxstat_fs.a library and display the data collectionsassociated with the File Server process. The command executes in the foreground.

% xstat_fs_test [initcmd] \-fsname <File Server name(s) to monitor>+ \-collID <Collection(s) to fetch>+ [-onceonly] \[-frequency <poll frequency, in seconds>] \[-period <data collection time, in minutes>] [-debug]

where

xstat_fs_test Must be typed in full.


-fsname Is the Internet host name of each file server machine on which to monitor the File Server process.

-collID Specifies each data collection to return. The indicated data collection defines the type and amount of data thecommand is to gather about the File Server. Data is returned in the form of a predefined data structure (refer to thespecification documents referenced previously for more information about the data structures).There are two acceptable values:

• 1 reports various internal performance statistics related to the File Server (for example, vnode cache entries and Rxprotocol activity).

• 2 reports all of the internal performance statistics provided by the 1 setting, plus some additional, detailed perfor-mance figures about the File Server (for example, minimum, maximum, and cumulative statistics regarding FileServer RPCs, how long they take to complete, and how many succeed).


-onceonly Directs the command to gather statistics just one time. Omit this option to have the command continue to probethe File Server for statistics every 30 seconds. If you omit this option, you can use the <Ctrl-c> interrupt signal tohalt the command at any time.

-frequency Sets the frequency in seconds at which the program initiates probes to the File Server. If you omit thisargument, the default is 30 seconds.

-period Sets how long the utility runs before exiting, as a number of minutes. If you omit this argument, the default is 10minutes.

-debug Displays additional information as the command runs.

8.9.2.2 To use the example xstat_cm_test command

1. Issue the example xstat_cm_test command to test the routines in the libxstat_cm.a library and display the data collectionsassociated with the Cache Manager. The command executes in the foreground.

% xstat_cm_test [initcmd] \-cmname <Cache Manager name(s) to monitor>+ \-collID <Collection(s) to fetch>+ \[-onceonly] [-frequency <poll frequency, in seconds>] \[-period <data collection time, in minutes>] [-debug]

where

xstat_cm_test Must be typed in full.


-cmname Is the host name of each client machine on which to monitor the Cache Manager.

-collID Specifies each data collection to return. The indicated data collection defines the type and amount of data thecommand is to gather about the Cache Manager. Data is returned in the form of a predefined data structure (refer tothe specification documents referenced previously for more information about the data structures).There are two acceptable values:

• 0 provides profiling information about the numbers of times different internal Cache Manager routines were calledsince the Cache manager was started.

• 1 reports various internal performance statistics related to the Cache manager (for example, statistics about howeffectively the cache is being used and the quantity of intracell and intercell data access).

• 2 reports all of the internal performance statistics provided by the 1 setting, plus some additional, detailed per-formance figures about the Cache Manager (for example, statistics about the number of RPCs sent by the CacheManager and how long they take to complete; and statistics regarding things such as authentication, access, andPAG information associated with data access).

-onceonly Directs the command to gather statistics just one time. Omit this option to have the command continue to probethe Cache Manager for statistics every 30 seconds. If you omit this option, you can use the <Ctrl-c> interrupt signalto halt the command at any time.

-frequency Sets the frequency in seconds at which the program initiates probes to the Cache Manager. If you omit thisargument, the default is 30 seconds.

-period Sets how long the utility runs before exiting, as a number of minutes. If you omit this argument, the default is 10minutes.

-debug Displays additional information as the command runs.

8.10 Auditing AFS Events on AIX File Servers

You can audit AFS events on AIX File Servers using an AFS mechanism that transfers audit information from AFS to the AIXauditing system. The following general classes of AFS events can be audited. For a complete list of specific AFS audit events,see Appendix D, AIX Audit Events.


• Authentication and Identification Events

• Security Events

• Privilege Required Events

• Object Creation and Deletion Events

• Attribute Modification Events

• Process Control Events

NoteThis section assumes familiarity with the AIX auditing system. For more information, see the AIX System Management Guidefor the version of AIX you are using.

8.10.1 Configuring AFS Auditing on AIX File Servers

The directory /usr/afs/local/audit contains three files that contain the information needed to configure AIX File Servers to auditAFS events:

• The events.sample file contains information on auditable AFS events. The contents of this file are integrated into the corre-sponding AIX events file (/etc/security/audit/events).

• The config.sample file defines the six classes of AFS audit events and the events that make up each class. It also defines theclasses of AFS audit events to audit for the File Server, which runs as the local superuser root. The contents of this file mustbe integrated into the corresponding AIX config file (/etc/security/audit/config).

• The objects.sample file contains a list of information about audited files. You must only audit files in the local file space. Thecontents of this file must be integrated into the corresponding AIX objects file (/etc/security/audit/objects).

Once you have properly configured these files to include the AFS-relevant information, use the AIX auditing system to start upand shut down the auditing.

8.10.2 To enable AFS auditing

1. Create the following string in the file /usr/afs/local/Audit on each File Server on which you plan to audit AFS events:

AFS_AUDIT_AllEvents

2. Issue the bos restart command (with the -all flag) to stop and restart all server processes on each File Server. For instruc-tions on using this command, see Stopping and Immediately Restarting Processes.

8.10.3 To disable AFS auditing

1. Remove the contents of the file /usr/afs/local/Audit on each File Server for which you are no longer interested in auditingAFS events.

2. Issue the bos restart command (with the -all flag) to stop and restart all server processes on each File Server. For instruc-tions on using this command, see Stopping and Immediately Restarting Processes.


Chapter 9

Managing Server Encryption Keys

This chapter explains how to maintain your cell’s server encryption keys, which are vital for secure communications in AFS.



Add a new server encryption key bos addkey and kas setpasswordInspect key checksums in the Authentication Database kas examineInspect key checksums in the KeyFile bos listkeysRemove an old server encryption key bos removekey

9.2 About Server Encryption Keys

An encryption key is a string of octal numbers used to encrypt and decrypt packets of information. In AFS, a server encryptionkey is the key used to protect information being transferred between AFS server processes and between them and their clients.A server encryption key is essentially a password for a server process and like a user password is stored in the AuthenticationDatabase.

Maintaining your cell’s server encryption keys properly is the most basic way to protect the information in your AFS filespacefrom access by unauthorized users.

9.2.1 Keys and Mutual Authentication: A Review

Server encryption keys play a central role in the mutual authentication between client and server processes in AFS. For a moredetailed description of mutual authentication, see A More Detailed Look at Mutual Authentication.

When a client wants to contact an AFS server, it first contacts the Ticket Granting Service (TGS) module of the AuthenticationServer. After verifying the client’s identity (based indirectly on the password of the human user whom the client represents), theTGS gives the client a server ticket. This ticket is encrypted with the server’s encryption key. (The TGS also invents a secondencryption key, called the session key, to be used only for a single episode of communication between server and client. Theserver ticket and session key, together with other pieces of information, are collectively referred to as a token.)

The client cannot read the server ticket or token because it does not know the server encryption key. However, the client sends it tothe AFS server along with service requests, because the ticket proves to the AFS server processes that it has already authenticatedwith the TGS. AFS servers trust the TGS to grant tickets only to valid clients. The fact that the client possesses a ticket encryptedwith the server’s encryption key proves to the server that the client is valid. On the other hand, the client assumes that only agenuine AFS server knows the server encryption key needed to decrypt the ticket. The server’s ability to decrypt the ticket andunderstand its contents proves to the client that the server is legitimate.


9.2.2 Maintaining AFS Server Encryption Keys

As you maintain your cell’s server encryption keys, keep the following in mind.

• Change the key frequently to enhance your cell’s security. Changing the key at least once a month is strongly recommended.

• The AFS server encryption key currently in use is stored in two places. When you add a new key, you must make changesin both places and make them in the correct order, as instructed in Adding Server Encryption Keys. Failure to follow theinstructions can seriously impair cell functioning, as clients and servers become unable to communicate. The two storage sitesfor the current server encryption key are the following:

1. The file /usr/afs/etc/KeyFile on the local disk of every file server machine. The file can list more than one key, each withan associated numerical identifier, the key version number or kvno. A client token records the key version number of thekey used to seal it, and the server process retrieves the appropriate key from this file when the client presents the token.

2. The afs entry in the Authentication Database. The current server encryption key is in the entry’s password field, just likean individual user’s scrambled password. The Authentication Server’s Ticket Granting Service (TGS) uses this key toencrypt the tokens it gives to clients. There is only a single key in the entry, because the TGS never needs to read existingtokens, but only to generate new ones by using the current key.

For instructions on creating the initial afs entry and KeyFile files as you install your cell’s first server machine, see theOpenAFS Quick Beginnings.

• At any specific time, the tokens that the Authentication Server’s Ticket Granting Service gives to clients are sealed with onlyone of the server encryption keys, namely the one stored in the afs entry in the Authentication Database.

• When you add a new server encryption key, you cannot immediately remove the former key from the /usr/afs/etc/KeyFilefile on the local disk of every AFS server machine. Any time that you add a new key, it is likely that some clients still havevalid, unexpired tokens sealed with the previous key. The more frequently you change the server encryption key, the more suchtickets there are likely to be. To be able to grant service appropriately to clients with such tokens, an AFS server process muststill be able to access the server encryption key used to seal it.

You can safely delete an old server encryption key only when it is certain that no clients have tokens sealed with that key. Ingeneral, wait a period of time at least as long as the maximum token lifetime in your cell. By default, the maximum tokenlifetime for users is 25 hours (except for users whose Authentication Database entries were created by using the 3.0 versionof AFS, for whom the default is 100 hours). You can use the -lifetime argument to the kas setfields command to change thisdefault.

Instructions for removing obsolete keys appear in Removing Server Encryption Keys.

• You create a new AFS server encryption key in much the same way regular users change their passwords, by providing acharacter string that is converted into an encryption key automatically. See Adding Server Encryption Keys.

• In addition to using server encryption keys when communicating with clients, the server processes use them to protect commu-nications with other server processes. Therefore, all server machines in your cell must have the same version of the KeyFilefile. The easiest way to maintain consistency is to use the Update Server to distribute the contents of the system controlmachine’s /usr/afs/etc directory to all of the other server machines. There are two implications:

– You must run the upserver process on the system control machine and an upclientetc process on all other server machinesthat references the system control machine. The OpenAFS Quick Beginnings explains how to install both processes. Forinstructions on verifying that the Update Server processes are running, see Displaying Process Status and Information fromthe BosConfig File.

– Change the KeyFile file only on the system control machine (except in the types of emergencies discussed in HandlingServer Encryption Key Emergencies). Any changes you make on other server machines are overwritten the next time theupclientetc process retrieves the contents of the system control machine’s /usr/afs/etc directory. By default, this happensevery five minutes.

• Never edit the KeyFile directly with a text editor. Instead, always use the appropriate bos commands as instructed in AddingServer Encryption Keys and Removing Server Encryption Keys.


9.3 Displaying Server Encryption Keys

To display the server encryption keys in the /usr/afs/etc/KeyFile file on any file server machine, use the bos listkeys command.Use the kas examine command to display the key in the Authentication Database’s afs entry.

By default the commands do not display the actual string of octal digits that constitute a key, but rather a checksum, a decimalnumber derived by encrypting a constant with the key. This prevents unauthorized users from easily accessing the actual key,which they can then use to falsify or eavesdrop on protected communications. The bos listkeys and kas examine commandsgenerate the same checksum for a given key, so displaying checksums rather than actual keys is generally sufficient. If yoususpect that the keys differ in a way that the checksums are not revealing, then you are probably experiencing authenticationproblems throughout your cell. The easiest solution is to create a new server encryption key following the instructions in AddingServer Encryption Keys or Handling Server Encryption Key Emergencies. Another common reason to issue the bos listkeyscommand is to display the key version numbers currently in use, in preparation for choosing the next one; here, the checksum issufficient because the key itself is irrelevant.

If it is important to display the actual octal digits, include the -showkey argument to both the bos listkeys and kas examinecommands.

9.3.1 To display the KeyFile file



2. Issue the bos listkeys command to display the contents of one machine’s /usr/afs/etc/KeyFile file.

% bos listkeys <machine name> [-showkey]

where

listk Is the shortest acceptable abbreviation of listkeys.

machine name Names a file server machine. In the normal case, it is acceptable to name any machine, because correctcell functioning requires that the KeyFile file be the same on all of them.

-showkey Displays the octal digits that constitute each key.

In the following example, the output displays a checksum for each server encryption key rather than the actual octal digits. Thepenultimate line indicates when an administrator last changed the file, and the final line confirms that the output is complete.

% bos listkeys fs1.example.comkey 0 has cksum 972037177key 1 has cksum 2825165022Keys last changed on Wed Jan 13 11:20:29 1999.All done.

9.3.2 To display the afs key from the Authentication Database

1. Issue the kas examine command to display the afs entry in the Authentication Database.

The Authentication Server performs its own authentication rather than accepting your existing AFS token. By default, itauthenticates your local (UNIX) identity, which possibly does not correspond to an AFS-privileged administrator. Includethe -admin argument to name an identity that has the ADMIN flag on its Authentication Database entry. To verify that anentry has the flag, issue the kas examine command as described in To check if the ADMIN flag is set.

% kas examine afs [-showkey] \-admin <admin principal to use for authentication>

Administrator’s (admin_user) password: <admin_password>


where


afs Designates the afs entry.

-showkey Displays the octal digits that constitute the key.

-admin Names an administrative account with the ADMIN flag on its Authentication Database entry, such as admin. Thepassword prompt echoes it as admin_user. Enter the appropriate password as admin_password.

In the following example, the admin user displays the afs entry without using the -showkey flag. The second line shows the keyversion number in parentheses and the key’s checksum. The line that begins with the string last mod reports the date on whichthe indicated administrator changed the key. There is no necessary relationship between this date and the date reported by thebos listkeys command, because the latter date changes for any type of change to the KeyFile file, not just a key addition. For adescription of the other lines in the output from the kas examine command, see its reference page in the OpenAFS AdministrationReference.

% kas examine afs -admin adminAdministrator’s (admin) password: <admin_password>User data for afskey (1) cksum is 2825165022, last cpw: no datepassword will never expire.An unlimited number of unsuccessful authentications is permitted.entry expires on never. Max ticket lifetime 100.00 hours.last mod on Wed Jan 13 11:21:36 1999 by adminpermit password reuse

9.4 Adding Server Encryption Keys

As noted, AFS records server encryption keys in two separate places:

1. In the file /usr/afs/etc/KeyFile on the local disk of each server machine, for use by the AFS server processes running onthe machine

2. In the afs entry in the Authentication Database, for use by the Ticket Granting Service (TGS) when creating tokens

To ensure that server processes and the TGS share the same AFS server encryption key, execute all the steps in this sectionwithout interruption.

The following instructions include a step in which you restart the database server processes (the Authentication, Backup, Pro-tection, and Volume Location Server processes) on all database server machines. As a database server process starts, it reads inthe server encryption key that has the highest key version number in the KeyFile file and uses it to protect the messages that itsends for synchronizing the database and maintaining quorum. It uses the same key throughout its lifetime, which can be for anextended period, even if you remove the key from the KeyFile file. However, if one of the peer database server processes restartsand the others do not, quorum and database synchronization break down because the processes are no longer using the same key:the restarted process is using the key that currently has the highest key version number, and the other processes are still usingthe key they read in when they originally started. To avoid this problem, it is safest to restart all of the database server processeswhen adding a new key.

After adding a new key, you can remove obsolete keys from the KeyFile file to prevent it from becoming cluttered. However, youmust take care not to remove keys that client or server processes are still using. For discussion and instructions, see RemovingServer Encryption Keys.

9.4.1 To add a new server encryption key




2. Issue the bos listkeys command to display the key version numbers that are already in use, as a first step in choosing thekey version number for the new key.

% bos listkeys <machine name>

where


machine name Names any file server machine.

3. Choose a key version number for the new key, based on the output from Step 2 and the following requirements:

• A key version number must be an integer between 0 (zero) and 255 to comply with Kerberos standards. It is simplest ifyou keep your key version numbers in sequence by choosing a key version number one greater than the largest existingone.

• Do not reuse a key version number currently found in the KeyFile file, particularly if it is also the one in the Authen-tication Database afs entry. Client processes possibly still have tickets sealed with the key that originally had that keyversion number, but the server processes start using the new key marked with that key version number. Because the keysdo not match, the server processes refuse requests from clients who hold legitimate tokens.

4. Issue the bos addkey command to create a new AFS server encryption key in the KeyFile file.

If you use the Update Server to distribute the contents of the system control machine’s /usr/afs/etc directory, substitutethe system control machine for the machine name argument. (If you have forgotten which machine is the system controlmachine, see To locate the system control machine.)

To avoid visible echoing of the string that corresponds to the new key, omit the -key argument from the command line;instead enter the string at the prompts that appear when you omit it, as shown in the following syntax specification.

% bos addkey -server <machine name> -kvno <key version number>input key: <afs_password>Retype input key: <afs_password>

where

addk Is the shortest acceptable abbreviation of addkey.

-server Names the cell’s system control machine if you are using the Update Server, or each server machine in turn if youare not.

-kvno Specifies the new key’s key version number as an integer from the range 0 (zero) through 255.Remember the number. You need to use it again in Step 6.

afs_password Is a character string similar to a user password, of any length from one to about 1,000 characters. Toimprove security, include nonalphabetic characters and make the string as long as is practical (you need to type itonly in this step and in Step 6).Do not enter an octal string directly. The BOS Server scrambles the character string into an octal string appropriatefor use as an encryption key before recording it in the KeyFile file.

5. If you are using the Update Server, wait for a few minutes while the Update Server distributes the new KeyFile file toall server machines. The maximum necessary waiting period is the largest value provided for the -t argument to theupclientetc process’s initialization command used on any of the server machines; the default time is five minutes.

To be certain that all machines have the same KeyFile file, issue the bos listkeys command for every file server machineand verify that the checksum for the new key is the same on all machines.


If you are not using the Update Server, try to complete Step 4 within five minutes.


6. Issue the kas setpassword command to enter the same key in the afs entry in the Authentication Database.


% kas setpassword -name afs -kvno <kvno> \-admin <admin principal to use for authentication>

Administrator’s (admin_user) password: <admin_password>new_password: afs_passwordVerifying, please re-enter new_password: <admin_password>

where

sp Is an acceptable alias for setpassword (setp is the shortest acceptable abbreviation).

-name afs Creates the new key in the afs entry.

-kvno Specifies the same key version number as in Step 4.

-admin Names an administrative account with the ADMIN flag on its Authentication Database entry, such as admin. Thepassword prompt echoes it as admin_user. Enter the appropriate password as admin_password.

afs_password Is the same character string you entered in Step 4.

7. (Optional.) If you want to verify that the keys you just created in the KeyFile file and the Authentication Database afsentry are identical and have the same key version number, follow the instructions in Displaying Server Encryption Keys.

8. Issue the bos restart command to restart the database server processes on all database server machines. This forces themto start using the key in the KeyFile file that currently has the highest key version number.

Repeat this command in quick succession for each database server machine, starting with the machine that has the lowestIP address.


where

res Is the shortest acceptable abbreviation of restart.machine name Names each database server machine in turn.

buserver kaserver ptserver vlserver Designates the Backup Server, Authentication Server, Protection Server, and Vol-ume Location (VL) Server, respectively.

9.5 Removing Server Encryption Keys

You can periodically remove old keys from the /usr/afs/etc/KeyFile file to keep it to a reasonable size. To avoid disturbing cellfunctioning, do not remove an old key until all tokens sealed with the key and held by users or client processes have expired. Afteradding a new key, wait to remove old keys at least as long as the longest token lifetime you use in your cell. For AuthenticationDatabase user entries created under AFS version 3.1 or higher, the default token lifetime is 25 hours; for entries created underAFS version 3.0, it is 100 hours.

There is no command for removing the key from the afs entry in the Authentication Database, because the key field in that entrymust never be empty. Use the kas setpassword command to replace the afs key, but only as part of the complete proceduredetailed in To add a new server encryption key.

Never remove from the KeyFile file the key that is currently in the afs entry in the Authentication Database. AFS server processesbecome unable to decrypt the tickets that clients present to them.


9.5.1 To remove a key from the KeyFile file



2. Issue the bos listkeys command to display the key version number of each key you want to remove. The output also revealswhether it has been at least 25 hours since a new key was placed in the KeyFile file. For complete instructions for the boslistkeys command, see To display the KeyFile file.


3. Issue the kas examine command to verify that the key currently in the Authentication Database’s afs entry does not havethe same key version number as any of the keys you are removing from the KeyFile file. For detailed instructions for thekas examine command, see To display the afs key from the Authentication Database.

% kas examine afs -admin <admin principal to use for authentication>Administrator’s (admin_user) password: <admin_password>

4. Issue the bos removekey command to remove one or more server encryption keys from the KeyFile file.

If you use the Update Server to distribute the contents of the system control machine’s /usr/afs/etc directory, substitutethe system control machine for the machine name argument. (If you have forgotten which machine is the system controlmachine, see To locate the system control machine.)

% bos removekey <machine name> <key version number>

where

removek Is the shortest acceptable abbreviation of removekey.machine name Names the cell’s system control machine if you are using the Update Server, or each server machine in

turn if you are not.key version number Specifies the key version number of each key to remove.

9.6 Handling Server Encryption Key Emergencies

In rare circumstances, the AFS server processes can become unable to decrypt the server tickets that clients or peer serverprocesses are presenting. Activity in your cell can come to a halt, because the server processes believe that the tickets are forgedor expired, and refuse to execute any actions. This can happen on one machine or several; the effect is more serious when moremachines are involved.

One common cause of server encryption key problems is that the client’s ticket is encrypted with a key that the server processdoes not know. Usually this means that the /usr/afs/etc/KeyFile on the server machine does not include the key in the afsAuthentication Database entry, which the Authentication Server’s Ticket Granting Service (TGS) module is using to encryptserver tickets.

Another possibility is that the KeyFile files on different machines do not contain the same keys. In this case, communicationsamong server processes themselves become impossible. For instance, AFS’s replicated database mechanism (Ubik) breaks downif the instances of a database server process on the different database server machines are not using the same key.

The appearance of the following error message when you direct a bos command to a file server machine in the local cell is onepossible symptom of server encryption key mismatch. (Note, however, that you can also get this message if you forget to includethe -cell argument when directing the bos command to a file server machine in a foreign cell.)

bos: failed to contact host’s bosserver (security object was passed a bad ticket).

The solution to server encryption key emergencies is to put a new AFS server encryption key in both the Authentication Databaseand the KeyFile file on every server machine, so that the TGS and all server processes again share the same key.

Handling key emergencies requires some unusual actions. The reasons for these actions are explained in the following sections;the actual procedures appear in the subsequent instructions.


9.6.1 Prevent Mutual Authentication

It is necessary to prevent the server processes from trying to mutually authenticate with you as you deal with a key emergency,because they possibly cannot decrypt your token. When you do not mutually authenticate, the server processes assign you theidentity anonymous. To prevent mutual authentication, use the unlog command to discard your tokens and include the -noauthflag on every command where it is available.

9.6.2 Disable Authorization Checking by Hand

Because the server processes recognize you as the user anonymous when you do not mutually authenticate, you must turn offauthorization checking. Only with authorization checking disabled do the server processes allow the anonymous user to performprivileged actions such as key creation.

In an emergency, disable authorization checking by creating the file /usr/afs/local/NoAuth by hand. In normal circumstances,use the bos setauth command instead.

9.6.3 Work Quickly on Each Machine

Disabling authorization checking is a serious security exposure, because server processes on the affected machine perform anyaction for anyone. Disable authorization checking only for as long as necessary, completing all steps in an uninterrupted sessionand as quickly as possible.

9.6.4 Work at the Console

Working at the console of each server machine on which you disable authorization checking ensures that no one else logs ontothe console while you are working there. It does not prevent others from connecting to the machine remotely (using the telnetprogram, for example), which is why it is important to work quickly. The only way to ensure complete security is to disablenetwork traffic, which is not a viable option in many environments. You can improve security in general by limiting the numberof people who can connect remotely to your server machines at any time, as recommended in Improving Security in Your Cell.

9.6.5 Change Individual KeyFile Files

If you use the Update Server to distribute the contents of the /usr/afs/etc directory, an emergency is the only time when itis appropriate to change the KeyFile file on individual machines instead. Updating each machine’s file is necessary becausemismatched keys can prevent the system control machine’s upserver process from mutually authenticating with upclientetcprocesses on other server machines, in which case the upserver process refuses to distribute its KeyFile file to them.

Even if it appears that the Update Server is working correctly, the only way to verify that is to change the key on the systemcontrol machine and wait the standard delay period to see if the upclientetc processes retrieve the key. During an emergency,it does not usually make sense to wait the standard delay period. It is more efficient simply to update the file on each servermachine separately. Also, even if the Update Server can distribute the file correctly, other processes can have trouble because ofmismatched keys. The following instructions add the new key file on the system control machine first. If the Update Server isworking, then it is distributing the same change as you are making on each server machine individually.

If your cell does not use the Update Server or you always change keys on server machines individually. The following instructionsare also appropriate for you.

9.6.6 Two Component Procedures

There are two subprocedures used frequently in the following instructions: disabling authorization checking and reenabling it.For the sake of clarity, the procedures are detailed here; the instructions refer to them as necessary.


9.6.6.1 Disabling Authorization Checking in an Emergency



2. Create the file /usr/afs/local/NoAuth to disable authorization checking.

# touch /usr/afs/local/NoAuth

3. Discard your tokens, in case they were sealed with an incompatible key, which can prevent some commands from execut-ing.

# unlog

9.6.6.2 Reenabling Authorization Checking in an Emergency



2. Remove the /usr/afs/local/NoAuth file.

# rm /usr/afs/local/NoAuth

3. Authenticate as an administrative identity that belongs to the system:administrators group and is listed in the /usr/af-s/etc/UserList file.

# klog <admin_user>Password: <admin_password>

4. If appropriate, log out from the console (or close the remote connection you are using), after issuing the unlog commandto destroy your tokens.

9.6.7 To create a new server encryption key in emergencies

1. On the system control machine, disable authorization checking as instructed in Disabling Authorization Checking in anEmergency.

2. Issue the bos listkeys command to display the key version numbers already in use in the KeyFile file, as a first step inchoosing the new key’s key version number.

# bos listkeys <machine name> -noauth

where


machine name Specifies a file server machine.

-noauth Bypasses mutual authentication with the BOS Server. Include it in case the key emergency is preventing success-ful mutual authentication.

3. Choose a key version number for the new key, based on what you learned in Step 2 plus the following requirements:

• It is best to keep your key version numbers in sequence by choosing a key version number one greater than the largestexisting one.


• Key version numbers must be integers between 0 and 255 to comply with Kerberos standards.

• Do not reuse a key version number currently listed in the KeyFile file.

4. On the system control machine, issue the bos addkey command to create a new AFS server encryption key in the KeyFilefile.

# bos addkey <machine name> -kvno <key version number> -noauthinput key: <afs_password>Retype input key: <afs_password>

where

addk Is the shortest acceptable abbreviation of addkey.

machine name Names the file server machine on which to define the new key in the KeyFile file.

-kvno Specifies the key version number you chose in Step 3, an integer in the range 0 (zero) through 255. You mustspecify the same number in Steps 7, 8, and 13.

-noauth Bypasses mutual authentication with the BOS Server. Include it in case the key emergency is preventing success-ful mutual authentication.

afs_password Is a character string similar to a user password, of any length from one to about 1,000 characters. Toimprove security, make the string as long as is practical, and include nonalphabetic characters.Do not type an octal string directly. The BOS Server scrambles the character string into an octal string appropriatefor use as an encryption key before recording it in the KeyFile file.Remember the string. You need to use it again in Steps 7, 8, and 13.

5. On every database server machine in your cell (other than the system control machine), disable authorization checkingas instructed in Disabling Authorization Checking in an Emergency. Do not repeat the procedure on the system controlmachine, if it is a database server machine, because you already disabled authorization checking in Step 1. (If you needto learn which machines are database server machines, use the bos listhosts command as described in To locate databaseserver machines.)

6. Wait at least 90 seconds after finishing Step 5, to allow each of the database server processes (the Authentication, Backup,Protection and Volume Location Servers) to finish electing a new sync site. Then issue the udebug command to verify thatthe election worked properly. Issue the following commands, substituting each database server machine’s name for servermachine in turn. Include the system control machine if it is a database server machine.

# udebug <server machine> buserver# udebug <server machine> kaserver# udebug <server machine> ptserver# udebug <server machine> vlserver

For each process, the output from all of the database server machines must agree on which one is the sync site for theprocess. It is not, however, necessary that the same machine serves as the sync site for each of the four processes. For eachprocess, the output from only one machine must include the following string:

I am sync site ...

The output on the other machines instead includes the following line

I am not sync site

and a subsequent line that begins with the string Sync host and specifies the IP address of the machine claiming to bethe sync site.

If the output does not meet these requirements or seems abnormal in another way, contact AFS Product Support forassistance.

7. On every database server machine in your cell (other than the system control machine), issue the bos addkey commanddescribed in Step 4. Be sure to use the same values for afs_password and kvno as you used in that step.


8. Issue the kas setpassword command to define the new key in the Authentication Database’s afs entry. It must match thekey you created in Step 4 and Step 7.

# kas setpassword -name afs -kvno <key version number> -noauthnew_password: <afs_password>Verifying, please re-enter new_password: <afs_password>

where


-kvno Is the same key version number you specified in Step 4.

afs_password Is the same character string you specified as afs_password in Step 4. It does not echo visibly.

9. On every database server machine in your cell (including the system control machine if it is a database server machine),reenable authorization checking as instructed in Reenabling Authorization Checking in an Emergency. If the systemcontrol machine is not a database server machine, do not perform this procedure until Step 11.

10. Repeat Step 6 to verify that each database server process has properly elected a sync site after being restarted in Step 9.

11. On the system control machine (if it is not a database server machine), reenable authorization checking as instructedin Reenabling Authorization Checking in an Emergency. If it is a database server machine, you already performed theprocedure in Step 9.

12. On all remaining (simple) file server machines, disable authorization checking as instructed in Disabling AuthorizationChecking in an Emergency.

13. On all remaining (simple) file server machines, issue the bos addkey command described in Step 4. Be sure to use thesame values for afs_password and kvno as you used in that step.

14. On all remaining (simple) file server machines, reenable authorization checking as instructed in Reenabling Authoriza-tion Checking in an Emergency.


Part III

Managing Client Machines


Chapter 10

Administering Client Machines and the Cache Man-ager

This chapter describes how to administer an AFS client machine, which is any machine from which users can access the AFSfilespace and communicate with AFS server processes. (A client machine can simultaneously function as an AFS server machineif appropriately configured.) An AFS client machine has the following characteristics:

• The kernel includes the set of modifications, commonly referred to as the Cache Manager, that enable access to AFS filesand directories. You can configure many of the Cache Manager’s features to suit your users’ needs. See Overview of CacheManager Customization.

• The /usr/vice/etc directory on the local disk stores several configuration files. See Configuration Files in the /usr/vice/etcDirectory.

• A cache stores temporary copies of data fetched from AFS file server machines, either in machine memory or on a devotedlocal disk partition. See Determining the Cache Type, Size, and Location and Setting Other Cache Parameters with the afsdprogram.

To learn how to install the client functionality on a machine, see the OpenAFS Quick Beginnings.



Display cache size set at reboot cat /usr/vice/etc/cacheinfoDisplay current cache size and usage fs getcacheparmsChange disk cache size without rebooting fs setcachesizeInitialize Cache Manager afsdDisplay contents of CellServDB file cat /usr/vice/etc/CellServDBDisplay list of database server machines from kernel memory fs listcellsChange list of database server machines in kernel memory fs newcellCheck cell’s status regarding setuid fs getcellstatusSet cell’s status regarding setuid fs setcellSet server probe interval fs checkservers -intervalDisplay machine’s cell membership cat /usr/vice/etc/ThisCellChange machine’s cell membership Edit /usr/vice/etc/ThisCellFlush cached file/directory fs flushFlush everything cached from a volume fs flushvolumeUpdate volume-to-mount-point mappings fs checkvolumesDisplay Cache Manager’s server preference ranks fs getserverprefs


Set Cache Manager’s server preference ranks fs setserverprefsDisplay client machine addresses to register fs getclientaddrsSet client machine addresses to register fs setclientaddrsControl the display of warning and status messages fs messagesDisplay and change machine’s system type fs sysnameEnable asynchronous writes fs storebehind

10.2 Overview of Cache Manager Customization

An AFS client machine’s kernel includes a set of modifications, commonly referred to as the Cache Manager, that enable accessto AFS files and directories and communications with AFS server processes. It is common to speak of the Cache Manager asa process or program, and in regular usage it appears to function like one. When configuring it, though, it is helpful to keep inmind that this usage is not strictly accurate.

The Cache Manager mainly fetches files on behalf of application programs running on the machine. When an application requestsan AFS file, the Cache Manager contacts the Volume Location (VL) Server to obtain a list of the file server machines that housethe volume containing the file. The Cache Manager then translates the application program’s system call requests into remoteprocedure calls (RPCs) to the File Server running on the appropriate machine. When the File Server delivers the file, the CacheManager stores it in a local cache before delivering it to the application program.

The File Server delivers a data structure called a callback along with the file. (To be precise, it delivers a callback for eachfile fetched from a read/write volume, and a single callback for all data fetched from a read-only volume.) A valid callbackindicates that the Cache Manager’s cached copy of a file matches the central copy maintained by the File Server. If an applicationon another AFS client machine changes the central copy, the File Server breaks the callback, and the Cache Manager mustretrieve the new version when an application program on its machine next requests data from the file. As long as the callbackis unbroken, however, the Cache Manager can continue to provide the cached version of the file to applications on its machine,which eliminates unnecessary network traffic.

The indicated sections of this chapter explain how to configure and customize the following Cache Manager features. All but thefirst (choosing disk or memory cache) are optional, because AFS sets suitable defaults for them.

• disk or memory cache. The AFS Cache Manager can use machine memory for caching instead of space on the local disk.Deciding which to use is the most basic configuration decision you must make. See Determining the Cache Type, Size, andLocation.

• cache size. Cache size probably has the most direct influence on client machine performance. It determines how often theCache Manager must contact the File Server across the network or discard cached data to make room for newly requestedfiles, both of which affect how quickly the Cache Manager delivers files to users. See Determining the Cache Type, Size, andLocation.

• cache location. For a disk cache, you can alter the conventional cache directory location (/usr/vice/cache) to take advantageof greater space availability on other disks on the machine. A larger cache can result in faster file delivery. See Determiningthe Cache Type, Size, and Location.

• chunk size and number. The afsd program, which initializes the Cache Manager, allows you to control the size and number ofchunks into which a cache is divided, plus related parameters. Setting these parameters is optional, because there are reasonabledefaults, but it provides precise control. The AFS distribution includes configuration scripts that set Cache Manager parametersto values that are reasonable for different configurations and usage patterns. See Setting Other Cache Parameters with the afsdprogram.

• knowledge of database server machines. Enable access to a cell’s AFS filespace and other services by listing the cell’s databaseserver machines in the /usr/vice/etc/CellServDB file on the local disk. See Maintaining Knowledge of Database ServerMachines.

• setuid privilege. You can control whether the Cache Manager allows programs from a cell to execute with setuid permission.See Determining if a Client Can Run Setuid Programs.


• cell membership. Each client belongs to a one cell defined by the local /usr/vice/etc/ThisCell file. Cell membership determinesthe default cell in which the machine’s users are authenticated and in which AFS commands run. See Setting a Client Machine’sCell Membership.

• cached file version. AFS’s system of callbacks normally guarantees that the Cache Manager has the most current versions offiles and directories possible. Nevertheless, you can force the Cache Manager to fetch the most current version of a file fromthe File Server if you suspect that the cache contains an outdated version. See Forcing the Update of Cached Data.

• File Server and Volume Location Server preferences. The Cache Manager sets numerical preference ranks for the interfaces onfile server machines and Volume Server (VL) machines. The ranks determine which interface the Cache Manager first attemptsto use when fetching data from a volume or from the Volume Location Database (VLDB). The Cache Manager sets defaultranks as it initializes, basing them on its network proximity to each interface, but you can modify the preference ranks if youwish. See Maintaining Server Preference Ranks.

• interfaces registered with the File Server. If the Cache Manager is multihomed (has multiple interface addresses), you cancontrol which of them it registers for File Servers to use when they initiate RPCs to the client machine. See ManagingMultihomed Client Machines.

• display of information messages. By default, the Cache Manager sends basic error and informational messages to the clientmachine’s console and to command shells. You can disable the messaging. See Controlling the Display of Warning andInformational Messages.

• system type. The Cache Manager records the local machine’s AFS system type in kernel memory, and substitutes the value forthe @sys variable in pathnames. See Displaying and Setting the System Type Name.

• delayed writes. By default, the Cache Manager writes all data to the File Server immediately and synchronously when anapplication program closes a file. You can enable asynchronous writes, either for an individual file, or all files that the CacheManager handles, and set how much data remains to be written when the Cache Manager returns control to the closing appli-cation. See Enabling Asynchronous Writes.

You must make all configuration changes on the client machine itself (at the console or over a direct connection such as a telnetconnection). You cannot configure the Cache Manager remotely. You must be logged in as the local superuser root to issue somecommands, whereas others require no privilege. All files mentioned in this chapter must actually reside on the local disk of eachAFS client machine (they cannot, for example, be symbolic links to files in AFS).

10.3 Configuration and Cache-Related Files on the Local Disk

This section briefly describes the client configuration files that must reside in the local /usr/vice/etc directory on every clientmachine. If the machine uses a disk cache, there must be a partition devoted to cache files; by convention, it is mounted at the/usr/vice/cache directory.

Note for Windows users: Some files described in this document possibly do not exist on machines that run a Windows operatingsystem. Also, Windows uses a backslash (\) rather than a forward slash (/) to separate the elements in a pathname.

10.3.1 Configuration Files in the /usr/vice/etc Directory

The /usr/vice/etc directory on a client machine’s local disk must contain certain configuration files for the Cache Manager tofunction properly. They control the most basic aspects of Cache Manager configuration.

If it is important that the client machines in your cell perform uniformly, it is most efficient to update these files from a centralsource. The following descriptions include pointers to sections that discuss how best to maintain the files.

afsd The binary file for the program that initializes the Cache Manager. It must run each time the machine reboots in order forthe machine to remain an AFS client machine. The program also initializes several daemons that improve Cache Managerfunctioning, such as the process that handles callbacks.


cacheinfo A one-line file that sets the cache’s most basic configuration parameters: the local directory at which the CacheManager mounts the AFS filespace, the local disk directory to use as the cache, and how many kilobytes to allocate to thecache.

The OpenAFS Quick Beginnings explains how to create this file as you install a client machine. To change the cache sizeon a machine that uses a memory cache, edit the file and reboot the machine. On a machine that uses a disk cache, you canchange the cache size without rebooting by issuing the fs setcachesize command. For instructions, see Determining theCache Type, Size, and Location.

CellServDB This ASCII file names the database server machines in the local cell and in any foreign cell to which you want toenable access from this machine. (Database server machines are the machines in a cell that run the Authentication, Backup,Protection, and VL Server processes; see Database Server Machines.)

The Cache Manager must be able to reach a cell’s database server machines to fetch files from its filespace. Incorrect ormissing information in the CellServDB file can slow or completely block access. It is important to update the file whenevera cell’s database server machines change.

As the afsd program initializes the Cache Manager, it loads the contents of the file into kernel memory. The CacheManager does not read the file between reboots, so to incorporate changes to the file into kernel memory, you must rebootthe machine. Alternatively, you can issue the fs newcell command to insert the changes directly into kernel memory withoutchanging the file. It can also be convenient to upgrade the file from a central source. For instructions, see MaintainingKnowledge of Database Server Machines.

(The CellServDB file on client machines is not the same as the one kept in the /usr/afs/etc directory on server machines,which lists only the local cell’s database server machines. For instructions on maintaining the server CellServDB file, seeMaintaining the Server CellServDB File).

NetInfo This optional ASCII file lists one or more of the network interface addresses on the client machine. If it exists whenthe Cache Manager initializes, the Cache Manager uses it as the basis for the list of interfaces that it registers with FileServers. See Managing Multihomed Client Machines.

NetRestrict This optional ASCII file lists one or more network interface addresses. If it exists when the Cache Manager initial-izes, the Cache Manager removes the specified addresses from the list of interfaces that it registers with File Servers. SeeManaging Multihomed Client Machines.

ThisCell This ASCII file contains a single line that specifies the complete domain-style name of the cell to which the machinebelongs. Examples are example.com and example.org. This value defines the default cell in which the machine’susers become authenticated, and in which the command interpreters (for example, the bos command) contact server pro-cesses.

The OpenAFS Quick Beginnings explains how to create this file as you install the AFS client functionality. To learn aboutchanging a client machine’s cell membership, see Setting a Client Machine’s Cell Membership.

In addition to these files, the /usr/vice/etc directory also sometimes contains the following types of files and subdirectories:

• The AFS initialization script, called afs.rc on many system types. In the conventional configuration specified by the OpenAFSQuick Beginnings, it is a symbolic link to the actual script kept in the same directory as other initialization files used by theoperating system.

• A subdirectory that houses AFS kernel library files used by a dynamic kernel loading program.

• A subdirectory called C, which houses the Cache Manager catalog file called afszcm.cat. The fstrace program uses the catalogfile to translate operation codes into character strings, which makes the message in the trace log more readable. See About thefstrace Command Suite.

10.3.2 Cache-Related Files

A client machine that uses a disk cache must have a local disk directory devoted to the cache. The conventional mount point is/usr/vice/cache, but you can use another partition that has more available space.

Do not delete or directly modify any of the files in the cache directory. Doing so can cause a kernel panic, from which the onlyway to recover is to reboot the machine. By default, only the local superuser root can read the files directly, by virtue of owningthem.


A client machine that uses a memory cache keeps all of the information stored in these files in machine memory instead.

CacheItems A binary-format file in which the Cache Manager tracks the contents of cache chunks (the V files in the directory,described just following), including the file ID number (fID) and the data version number.

VolumeItems A binary-format file in which the Cache Manager records the mapping between mount points and the volumesfrom which it has fetched data. The Cache Manager uses the information when responding to the pwd command, amongothers.

Vn A cache chunk file, which expands to a maximum size (by default, 64 KB) to house data fetched from AFS files. The numberof Vn files in the cache depends on the cache size among other factors. The n is the index assigned to each file; they arenumbered sequentially, but the Cache Manager does not necessarily use them in order or contiguously. If an AFS file islarger than the maximum size for Vn files, the Cache Manager divides it across multiple Vn files.

10.4 Determining the Cache Type, Size, and Location

This section explains how to configure a memory or disk cache, how to display and set the size of either type of cache, and howto set the location of the cache directory for a disk cache.

The Cache Manager uses a disk cache by default, and it is the preferred type of caching. To configure a memory cache, includethe -memcache flag on the afsd command, which is normally invoked in the machine’s AFS initialization file. If configured touse a memory cache, the Cache Manager does no disk caching, even if the machine has a disk.

10.4.1 Choosing the Cache Size

Cache size influences the performance of a client machine more directly than perhaps any other cache parameter. The largerthe cache, the faster the Cache Manager is likely to deliver files to users. A small cache can impair performance because itincreases the frequency at which the Cache Manager must discard cached data to make room for newly requested data. Whenan application asks for data that has been discarded, the Cache Manager must request it from the File Server, and fetching dataacross the network is almost always slower than fetching it from the local disk. The Cache Manager never discards data froma file that has been modified locally but not yet stored back to the File Server. If the cache is very small, the Cache Managerpossible cannot find any data to discard. For more information about the algorithm it uses when discarding cached data, see Howthe Cache Manager Chooses Data to Discard).

The amount of disk or memory you devote to caching depends on several factors. The amount of space available in memory oron the partition housing the disk cache directory imposes an absolute limit. In addition, you cannot allocate more than 95% of thespace available on the cache directory’s partition to a disk cache. The afsd program exits without starting the Cache Manager andprints an appropriate message to the standard output stream if you violate this restriction. For a memory cache, you must leaveenough memory for other processes and applications to run. If you try to allocate more memory than is actually available, theafsd program exits without initializing the Cache Manager and produces the following message on the standard output stream:

afsd: memCache allocation failure at number KB

where number is how many kilobytes were allocated just before the failure.

Within these hard limits, the factors that determine appropriate cache size include the number of users working on the machine,the size of the files with which they usually work, and (for a memory cache) the number of processes that usually run on themachine. The higher the demand from these factors, the larger the cache needs to be to maintain good performance.

Disk caches smaller than 10 MB do not generally perform well. Machines serving multiple users usually perform better with acache of at least 60 to 70 MB. The point at which enlarging the cache further does not really improve performance depends onthe factors mentioned previously, and is difficult to predict.

Memory caches smaller than 1 MB are nonfunctional, and the performance of caches smaller than 5 MB is usually unsatisfactory.Suitable upper limits are similar to those for disk caches but are probably determined more by the demands on memory fromother sources on the machine (number of users and processes). Machines running only a few processes possibly can use a smallermemory cache.

AFS imposes an absolute limit on cache size in some versions. See the OpenAFS Release Notes for the version you are using.


10.4.2 Displaying and Setting the Cache Size and Location

The Cache Manager determines how big to make the cache by reading the /usr/vice/etc/cacheinfo file as it initializes. As directedin the OpenAFS Quick Beginnings, you must create the file before running the afsd program. The file also defines the directoryon which to mount AFS (by convention, /afs), and the local disk directory to use for a cache directory.

To change any of the values in the file, log in as the local superuser root. You must reboot the machine to have the new valuetake effect. For instructions, see To edit the cacheinfo file.

To change the cache size at reboot without editing the cacheinfo file, include the -blocks argument to the afsd command; see thecommand’s reference page in the OpenAFS Administration Reference.

For a disk cache, you can also use the fs setcachesize command to reset the cache size without rebooting. The value you setpersists until the next reboot, at which time the cache size returns to the value specified in the cacheinfo file or by the -blocksargument to the afsd command. For instructions, see To change the disk cache size without rebooting.

To display the current cache size and the amount of space the Cache Manager is using at the moment, use the fs getcacheparmscommand as detailed in To display the current cache size.

10.4.3 To display the cache size set at reboot

1. Use a text editor or the cat command to display the contents of the /usr/vice/etc/cacheinfo file.

% cat /usr/vice/etc/cacheinfo

10.4.4 To display the current cache size

1. Issue the fs getcacheparms command on the client machine.

% fs getcacheparms

where getca is the shortest acceptable abbreviation of getcacheparms.

The output shows the number of kilobyte blocks the Cache Manager is using as a cache at the moment the command isissued, and the current size of the cache. For example:

AFS using 13709 of the cache’s available 15000 1K byte blocks.

10.4.5 To edit the cacheinfo file



2. Use a text editor to edit the /usr/vice/etc/cacheinfo file, which has three fields, separated by colons:

• The first field names the local directory on which to mount the AFS filespace. The conventional location is /afs.• The second field defines the local disk directory to use for the disk cache. The conventional location is the /usr/vice/cache

directory, but you can specify an alternate directory if another partition has more space available. There must always bea value in this field, but the Cache Manager ignores it if the machine uses a memory cache.

• The third field defines cache size as a number of kilobyte (1024-byte) blocks.

The following example mounts the AFS filespace at the /afs directory, names /usr/vice/cache as the cache directory, andsets cache size to 50,000 KB:

/afs:/usr/vice/cache:50000


10.4.6 To change the disk cache size without rebooting



2. Issue the fs setcachesize command to set a new disk cache size.

NoteThis command does not work for a memory cache.

# fs setcachesize <size in 1K byte blocks (0 => reset)>

where

setca Is the shortest acceptable abbreviation of setcachesize.

size in 1K byte blocks (0 => reset) Sets the number of kilobyte blocks to be used for the cache. Specify a positive integer(1024 equals 1 MB), or 0 (zero) to reset the cache size to the value specified in the cacheinfo file.

10.4.7 To reset the disk cache size to the default without rebooting



2. Issue the fs setcachesize command to reset the size of the local disk cache (the command does not work for a memorycache). Choose one of the two following options:

• To reset the cache size to the value specified in the local cacheinfo file, specify the value 0 (zero)

# fs setcachesize 0

• To reset the cache size to the value set at the last reboot of the machine, include the -reset flag. Unless the -blocksargument was used on the afsd command, this is also the value in the cacheinfo file.

# fs setcachesize -reset

where

setca Is the shortest acceptable abbreviation of setcachesize.

0 Resets the disk cache size to the value in the third field of the /usr/vice/etc/cacheinfo file.

-reset Resets the cache size to the value set at the last reboot.

10.4.8 How the Cache Manager Chooses Data to Discard

When the cache is full and application programs request more data from AFS, the Cache Manager must flush out cache chunksto make room for the data. The Cache Manager considers two factors:

1. How recently an application last accessed the data.

2. Whether the chunk is dirty. A dirty chunk contains changes to a file that have not yet been saved back to the permanentcopy stored on a file server machine.


The Cache Manager first checks the least-recently used chunk. If it is not dirty, the Cache Manager discards the data in thatchunk. If the chunk is dirty, the Cache Manager moves on to check the next least recently used chunk. It continues in this manneruntil it has created a sufficient number of empty chunks.

Chunks that contain data fetched from a read-only volume are by definition never dirty, so the Cache Manager can always discardthem. Normally, the Cache Manager can also find chunks of data fetched from read/write volumes that are not dirty, but a smallcache makes it difficult to find enough eligible data. If the Cache Manager cannot find any data to discard, it must return I/Oerrors to application programs that request more data from AFS. Application programs usually have a means for notifying theuser of such errors, but not for revealing their cause.

10.5 Setting Other Cache Parameters with the afsd program

There are only three cache configuration parameters you must set: the mount directory for AFS, the location of the disk cachedirectory, and the cache size. They correspond to the three fields in the /usr/vice/etc/cacheinfo file, as discussed in Determiningthe Cache Type, Size, and Location. However, if you want to experiment with fine-tuning cache performance, you can use thearguments on the afsd command to control several other parameters. This section discusses a few of these parameters that havethe most direct effect on cache performance. To learn more about the afsd command’s arguments, see its reference page in theOpenAFS Administration Reference.

In addition, the AFS initialization script included in the AFS distribution for each system type includes several variables thatset several afsd arguments in a way that is suitable for client machines of different sizes and usage patterns. For instructions onusing the script most effectively, see the section on configuring the Cache Manager in the OpenAFS Quick Beginnings.

10.5.1 Setting Cache Configuration Parameters

The cache configuration parameters with the most direct effect on cache performance include the following:

• total cache size. This is the amount of disk space or machine memory available for caching, as discussed in detail in Deter-mining the Cache Type, Size, and Location.

• number of cache chunks. For a disk cache, each chunk is a Vn file in the local cache directory (see Cache-Related Files). Fora memory cache, each chunk is a set of contiguous blocks allocated in machine memory.

This parameter does not have as much of an effect on cache performance as total size. However, adjusting it can influence howoften the Cache Manager must discard cached data to make room for new data. Suppose, for example, that you set the diskcache size to 50 MB and the number of chunks (Vn files) to 1,000. If each of the ten users on the machine caches 100 AFSfiles that average 20 KB in size, then all 1,000 chunks are full (a chunk can contain data from only one AFS file) but the cacheholds only about 20 MB of data. When a user requests more data from the File Server, the Cache Manager must discard cacheddata to reclaim some chunks, even though the cache is filled to less than 50% of its capacity. In such a situation, increasing thenumber of chunks enables the Cache Manager to discard data less often.

• chunk size. This parameter determines the maximum amount of data that can fit in a chunk. If a cached element is smallerthan the chunk size, the remaining space in the chunk is not used (a chunk can hold no more than one element). If an elementcannot fit in a single chunk, it is split across as many chunks as needed. This parameter also determines how much data theCache Manager requests at a time from the File Server (how much data per fetch RPC, because AFS uses partial file transfer).

The main reason to change chunk size is because of its relation to the amount of data fetched per RPC. If your network linksare very fast, it can improve performance to increase chunk size; if the network is especially slow, it can make sense to decreasechunk size.

• number of dcache entries in memory. The Cache Manager maintains one dcache entry for each cache chunk, recording a smallamount of information, such as the file ID (fID) and version number of the AFS file corresponding to the chunk.

For a disk cache, dcache entries reside in the /usr/vice/cache/CacheItems file; a small number are duplicated in machinememory to speed access.

For a memory cache, the number of dcache entries equals the number of cache chunks. For a discussion of the implications ofthis correspondence, see Controlling Memory Cache Configuration.


For a description of how the Cache Manager determines defaults for number of chunks, chunk size, and number of dcacheentries in a disk cache, see Configuring a Disk Cache; for a memory cache, see Controlling Memory Cache Configuration. Theinstructions also explain how to use the afsd command’s arguments to override the defaults.

10.5.2 Configuring a Disk Cache

The default number of cache chunks (Vn files) in a disk cache is calculated by the afsd command to be the greatest of thefollowing:

• 100

• 1.5 times the result of dividing cache size by chunk size (cachesize/chunksize * 1.5)

• The result of dividing cachesize by 10 MB (cachesize/10240)

You can override this value by specifying a positive integer with the -files argument. Consider increasing this value if more than75% of the Vn files are already used soon after the Cache Manager finishes initializing. Consider decreasing it if only a smallpercentage of the chunks are used at that point. In any case, never specify a value less than 100, because a smaller value cancause performance problems.

The following example sets the number of Vn files to 2,000:

/usr/vice/etc/afsd -files 2000

NoteIt is conventional to place the afsd command in a machine’s AFS initialization file, rather than entering it in a command shell.Furthermore, the values specified in this section are examples only, and are not necessarily suitable for a specific machine.

The default chunk size for a disk cache is 64 KB. In general, the only reason to change it is to adjust to exceptionally slow orfast networks; see Setting Cache Configuration Parameters. You can use the -chunksize argument to override the default. Chunksize must be a power of 2, so provide an integer between 0 (zero) and 30 to be used as an exponent of 2. For example, a valueof 10 sets chunk size to 1 KB (210 = 1024); a value of 16 equals the default for disk caches (216 = 64 KB). Specifying a valueof 0 (zero) or greater than 30 returns chunk size to the default. Values less than 10 (1 KB) are not recommended. The followingexample sets chunk size to 16 KB (214):

/usr/vice/etc/afsd -chunksize 14

For a disk cache, the default number of dcache entries duplicated in memory is one-half the number of chunks specified withthe -files argument, to a maximum of 2,000 entries. You can use the -dcache argument to change the default, even exceeding2,000 if you wish. Duplicating more than half the dcache entries in memory is not usually necessary, but sometimes improvesperformance slightly, because access to memory is faster than access to disk. The following example sets the number to 750:

/usr/vice/etc/afsd -dcache 750

When configuring a disk cache, you can combine the afsd command’s arguments in any way. The main reason for this flexibilityis that the setting you specify for disk cache size (in the cacheinfo file or with the -blocks argument) is an absolute maximumlimit. You cannot override it by specifying higher values for the -files or -chunksize arguments, alone or in combination. Arelated reason is that the Cache Manager does not have to reserve a set amount of memory on disk. Vn files (the chunks in a diskcache) are initially zero-length, but can expand up to the specified chunk size and shrink again, as needed. If you set the numberof Vn files to such a large value that expanding all of them to the full allowable size exceeds the total cache size, they simplynever grow to full size.


10.5.3 Controlling Memory Cache Configuration

Configuring a memory cache differs from configuring a disk cache in that not all combinations of the afsd command’s argumentsare allowed. This limitation results from the greater interaction between the configuration parameters in a memory cache than adisk cache. If all combinations are allowed, it is possible to set the parameters in an inconsistent way. A list of the acceptableand unacceptable combinations follows a discussion of default values.

The default chunk size for a memory cache is 8 KB. In general, the only reason to change it is to adjust to exceptionally slow orfast networks; see Setting Cache Configuration Parameters.

There is no predefined default for number of chunks in a memory cache. The Cache Manager instead calculates the correctnumber by dividing the total cache size by the chunk size. Recall that for a memory cache, all dcache entries must be in memory.This implies that the number of chunks equals the number of dcache entries in memory, and that there is no default for numberof dcache entries (like the number of chunks, it is calculated by dividing the total size by the chunk size).

The following are acceptable combinations of the afsd command’s arguments when configuring a memory cache:

• -blocks alone, which overrides the cache size specified in the /usr/vice/etc/cacheinfo file. The Cache Manager divides thevalue of this argument by the default chunk size of eight KB to calculate the number of chunks and dcache entries. Thefollowing example sets cache size to five MB (5,120 KB) and the number of chunks to 640 (5,120 divided by 8):

/usr/vice/etc/afsd -memcache -blocks 5120

• -chunksize alone, to override the default of eight KB. The chunk size must be a power of two, so provide an integer between0 (zero) and 30 to be used as an exponent of two. For example, a value of ten sets chunk size to 1 KB (210 = 1024); a value of13 equals the default for memory caches (213 = 8 KB). Specifying a value of 0 (zero) or greater than 30 returns the chunk sizeto the default. Values less than ten (equivalent to 1 KB) are not recommended. The following example sets the chunk size tofour KB (212). Assuming a total cache size of four MB (4,096 KB), the resulting number of chunks is 1024.

/usr/vice/etc/afsd -memcache -chunksize 12

• -blocks and -chunksize together override the defaults for cache size and chunk size. The Cache Manager divides the first bythe second to calculate the number of chunks and dcache entries. For example, the following example sets the cache size to sixMB (6,144 KB) and chunksize to four KB (212), resulting in 1,536 chunks:

/usr/vice/etc/afsd -memcache -blocks 6144 -chunksize 12

The following arguments or combinations explicitly set the number of chunks and dcache entries. It is best not to use them,because they set the cache size indirectly, forcing you to perform a hand calculation to determine the size of the cache. Instead,set the -blocks and -chunksize arguments alone or in combination; in those cases, the Cache Manager determines the number ofchunks and dcache entries itself. Because the following combinations are not recommended, no examples are included.

• The -dcache argument alone explicitly sets the number of chunks and dcache entries. The Cache Manager multiples this valuetimes the default chunk size of 8 KB to derive the total cache size (overriding the value in the cacheinfo file).

• The combination of -dcache and -chunksize sets the chunk number and size. The Cache Manager sets the specified values andmultiplies them together to obtain total cache size (overriding the value in the cacheinfo file).

Do not use the following arguments for a memory cache:

• -files alone. This argument controls the number of Vn files for a disk cache, but is ignored for a memory cache.

• -blocks and -dcache. An error message results, because it is possible to provide values such that dividing the first (total size)by the second (number of chunks) results in a chunk size that is not a power of two.


10.5.4 Tuning Cache Configuration

Tuning the parameters of the OpenAFS cache for optimal performance is highly dependent on the behavior of applications andusers on a client machine. The default options may perform poorly under certain conditions.

The xstat_cm_test command is useful for measuring how effectively the cache is operating. The following procedure may beused to aide in tuning the parameters for the data cache (dcache) and the stats cache (vcache):

1. Run the following command and replace "hostname" with the hostname of the machine to be measured:

xstat_cm_test hostname 2 -onceonly

Take note of the following fields: dcacheHits, dcacheMisses, vcacheHits, and vcacheMisses. Saving the above commandoutput to a file or filtering it using grep is advised.

2. Using the noted fields, compute the miss ratios for the dcache and vcache using the following formulas:

dcache miss ratio = dcacheMisses / ( dcacheMisses + dcacheHits )

vcache miss ratio = vcacheMisses / ( vcacheMisses + vcacheHits )

As a guideline, a miss ratio of 0.05 (5 percent) or less is acceptable and a miss ratio of 0.01 (1 percent) or less is recom-mended.

3. If your dcache miss ratio is too large, then cache performance is likely to improve if the data cache is made larger. If thevcache miss ratio is too large, then increase the size of the stat cache using the -stat parameter to afsd for a Unix-basedclient or using the Control Panel or registry interfaces on Microsoft Windows-based clients. The default size of the statcache is 10,000 entries on windows platforms and 300 entries on Unix platforms. There may be a significant performancepenalty when the vcache size is much smaller than the working set of commonly accessed files. On the fileserver, thenumber of callbacks should be more than the size of the vcache of any client that connects to the server. If the cache istoo small or there aren’t enough callbacks (-cb) on the fileserver, then the cached entries will be discarded prematurely,causing thrashing.

TipAs an example of how the wrong vcache size can degrade performance, one OpenAFS site had performance issueswith the Apache and mod_php software on a Unix web server serving web pages directly out of AFS. During peak times,the load on the server would spike with an excess of Apache processes. After profiling, it was found that Apache andPHP made lots of stat() library calls and that the default vcache size of 300 was too small. After some experimentation,a vcache size of 50,000 was found to improve performance. This size makes sense in light of that fact that the totalnumber of files in the website exceeded 350,000, including 50,000 PHP files. The number of callbacks configured on thefileserver was 1,500,000, so the vcache size was not too large.

4. After changing your configuration appropriately and restarting the AFS client service, wait until enough data has beencollected before changing the configuration further. The sum of the hits and misses should be at least five times the valueof the configured parameter before making further adjustments. Repeat this process until the desired miss ratio is achieved.Take note that the numbers from the xstat_cm_test command only reset when the client is restarted. If multiple samplesare taken, then subtract the previous measurement from the current measurement to accurately measure the activity thathappened between the samples.

10.6 Maintaining Knowledge of Database Server Machines

For the users of an AFS client machine to access a cell’s AFS filespace and other services, the Cache Manager and other client-side agents must have an accurate list of the cell’s database server machines. The affected functions include the following:


• Accessing files. The Cache Manager contacts the Volume Location (VL) Server to learn which file server machine houses thevolume containing a requested file or directory. If the Cache Manager cannot contact a cell’s VL Servers, it cannot fetch files.

• Authenticating. The klog program and AFS-modified login utilities contact the Authentication Server to obtain tokens, whichthe AFS server processes accept as proof that the user is authenticated.

• Creating protection groups. The pts command interpreter contacts the Protection Server when users create protection groupsor request information from the Protection Database.

• Editing access control lists (ACLs). The fs command interpreter contacts the File Server that maintains the read/write volumecontaining a file or directory; the location information comes from the VL Server.

To enable a machine’s users to access a cell, you must list the names and IP addresses of its database server machines in the/usr/vice/etc/CellServDB file on the machine’s local disk. In addition to the machine’s home cell, you can list any foreign cellsthat you want to enable users to access. (To enable access to a cell’s filespace, you must also mount its root.cell volume in thelocal AFS filespace; the conventional location is just under the AFS root directory, /afs. For instructions, see the OpenAFS QuickBeginnings.)

10.6.1 How Clients Use the List of Database Server Machines

As the afsd program runs and initializes the Cache Manager, it reads the contents of the CellServDB file into kernel memory.The Cache Manager does not consult the file again until the machine next reboots. In contrast, the command interpreters for theAFS command suites (such as fs and pts) read the CellServDB file each time they need to contact a database server process.

When a cell’s list of database server machines changes, you must change both the CellServDB file and the list in kernel memoryto preserve consistent client performance; some commands probably fail if the two lists of machines disagree. One possiblemethod for updating both the CellServDB file and kernel memory is to edit the file and reboot the machine. To avoid needing toreboot, you can instead perform both of the following steps:

1. Issue the fs newcell command to alter the list in kernel memory directly, making the changes available to the CacheManager.

2. Edit the CellServDB file to make the changes available to command interpreters. For a description of the file’s format, seeThe Format of the CellServDB file.

The consequences of missing or incorrect information in the CellServDB file or kernel memory are as follows:

• If there is no entry for a cell, the machine’s users cannot access the cell.

• If a cell’s entry does not include a database server machine, then the Cache Manager and command interpreters never attemptto contact the machine. The omission does not prevent access to the cell--as long as the information about the other databaseserver machines is correct and the server processes, machines, and network are functioning correctly--but it can put an undueburden on the machines that are listed. If all of the listed machines become inaccessible to clients, then the cell becomesinaccessible even if the omitted database server machine is functioning correctly.

• If a machine’s name or address is incorrect, or the machine is not actually running the database server processes, then requestsfrom clients time out. Users can experience lengthy delays because they have to wait the full timeout period before the CacheManager or command interpreter contacts another database server machine.

10.6.2 The Format of the CellServDB file

When editing the /usr/vice/etc/CellServDB file, you must use the correct format for cell and machine entries. Each cell has aseparate entry. The first line has the following format:

>cell_name #organization


where cell_name is the cell’s complete Internet domain name (for example, example.com) and organization is an optional fieldthat follows any number of spaces and the number sign (#) and can name the organization to which the cell corresponds (forexample, the Example Corporation). After the first line comes a separate line for each database server machine. Each line hasthe following format:

IP_address #machine_name

where IP_address is the machine’s IP address in dotted decimal format (for example, 192.12.105.3). Following any number ofspaces and the number sign (#) is machine_name, the machine’s fully-qualified hostname (for example, db1.example.com). Inthis case, the number sign does not indicate a comment: machine_name is a required field.

The order in which the cells appear is not important, but it is convenient to put the client machine’s home cell first. Do not includeany blank lines in the file, not even after the last entry.

The following example shows entries for two cells, each of which has three database server machines:

>example.com #Example Corporation (home cell)192.12.105.3 #db1.example.com192.12.105.4 #db2.example.com192.12.105.55 #db3.example.com>example.org #Example Organization cell138.255.68.93 #serverA.example.org138.255.68.72 #serverB.example.org138.255.33.154 #serverC.example.org

10.6.3 Maintaining the Client CellServDB File

Because a correct entry in the CellServDB file is vital for consistent client performance, you must also update the file on eachclient machine whenever a cell’s list of database server machines changes (for instance, when you follow the instructions in theOpenAFS Quick Beginnings to add or remove a database server machine).

Creating a symbolic or hard link from /usr/vice/etc/CellServDB to a central source file in AFS is not a viable option. The afsdprogram reads the file into kernel memory before the Cache Manager is completely initialized and able to access AFS.

Because every client machine has its own copy of the CellServDB file, you can in theory make the set of accessible cells differon various machines. In most cases, however, it is best to maintain consistency between the files on all client machines in thecell: differences between machines are particularly confusing if users commonly use a variety of machines rather than just one.

The AFS Product Support group maintains a central CellServDB file that includes all cells that have agreed to make their databaseserver machines access to other AFS cells. It is advisable to check this file periodically for updated information. See MakingYour Cell Visible to Others.

An entry in the local CellServDB is one of the two requirements for accessing a cell. The other is that the cell’s root.cell volumeis mounted in the local filespace, by convention as a subdirectory of the /afs directory. For instructions, see To create a cellularmount point.

NoteThe /usr/vice/etc/CellServDB file on a client machine is not the same as the /usr/afs/etc/CellServDB file on the local disk ofa file server machine. The server version lists only the database server machines in the server machine’s home cell, becauseserver processes never need to contact foreign cells. It is important to update both types of CellServDB file on all machinesin the cell whenever there is a change to your cell’s database server machines. For more information about maintaining theserver version of the CellServDB file, see Maintaining the Server CellServDB File.

10.6.4 To display the /usr/vice/etc/CellServDB file

1. Use a text editor or the cat command to display the contents of the /usr/vice/etc/CellServDB file. By default, the modebits on the file permit anyone to read it.

% cat /usr/vice/etc/CellServDB


10.6.5 To display the list of database server machines in kernel memory

1. Issue the fs listcells command.

% fs listcells [&]

where listc is the shortest acceptable abbreviation of listcells.

To have your shell prompt return immediately, include the ampersand (&), which makes the command run in the back-ground. It can take a while to generate the complete output because the kernel stores database server machines’ IP addressesonly, and the fs command interpreter has the cell’s name resolution service (such as the Domain Name Service or a localhost table) translate them into hostnames. You can halt the command at any time by issuing an interrupt signal such asCtrl-c.

The output includes a single line for each cell, in the following format:

Cell cell_name on hosts list_of_hostnames.

The name service sometimes returns hostnames in uppercase letters, and if it cannot resolve a name at all, it returns its IPaddress. The following example illustrates all three possibilities:

% fs listcells..

Cell example.com on hosts db1.example.com db2.example.com db3.example.comCell example.org on hosts SERVERA.EXAMPLE.ORG SERVERB.EXAMPLE.ORG

SERVERC.EXAMPLE.ORGCell example.net on hosts 191.255.64.111 191.255.64.112

.

.

10.6.6 To change the list of a cell’s database server machines in kernel memory



2. If you a use a central copy of the CellServDB file as a source for client machines, verify that its directory’s ACL grantsyou the l (lookup), r (read), and w (write) permissions. The conventional directory is /afs/cell_name/common/etc. Ifnecessary, issue the fs listacl command, which is fully described in Displaying ACLs.

# fs listacl [<dir/file path>]

3. Issue the fs newcell command to add or change a cell’s entry in kernel memory. Repeat the command for each cell.

NoteYou cannot use this command to remove a cell’s entry completely from kernel memory. In the rare cases when youurgently need to prevent access to a specific cell, you must edit the CellServDB file and reboot the machine.

# fs newcell <cell name> <primary servers>+ \[-linkedcell <linked cell name>]

where

n Is the shortest acceptable abbreviation of newcell.


cell name Specifies the complete Internet domain name of the cell for which to record a new list of database servermachines.

primary servers Specifies the fully-qualified hostname or IP address in dotted-decimal format for each database servermachine in the cell. The list you provide completely replaces the existing list.

-linkedcell Specifies the complete Internet domain name of the AFS cell to link to a DCE cell for the purposes of DFSfileset location. You can use this argument if the machine’s AFS users access DFS via the AFS/DFS Migration ToolkitProtocol Translator. For instructions, see the OpenAFS/DFS Migration Toolkit Administration Guide and Reference.

4. Add or edit the cell’s entry in the local /usr/vice/etc/CellServDB file, using one of the following three methods. In eachcase, be sure to obey the formatting requirements described in The Format of the CellServDB file.

• If you maintain a central source CellServDB file, first use a text editor to alter the central copy of the file. Then use acopying command such as the cp command to copy it to the local /usr/vice/etc/CellServDB file.

• If you do not use a central source CellServDB file, edit the local machine’s /usr/vice/etc/CellServDB file directly.

10.7 Determining if a Client Can Run Setuid Programs

A setuid program is one whose binary file has the UNIX setuid mode bit turned on. While a setuid program runs, the user whoinitialized it assumes the local identity (UNIX UID) of the binary file’s owner, and so is granted the permissions in the local filesystem that pertain to the owner. Most commonly, the issuer’s assumed identity (often referred to as effective UID) is the localsuperuser root.

AFS does not recognize effective UID: if a setuid program accesses AFS files and directories, it uses the current AFS identityof the user who initialized the program, not of the program’s owner. Nevertheless, it can be useful to store setuid programs inAFS for use on more than one client machine. AFS enables a client machine’s administrator to determine and change whetherthe local Cache Manager allows setuid programs to run or not.

By default, the Cache Manager ignores all setuid permissions in AFS, but this can be changed by a client machine’s admin-istrator. Each cell’s setuid status is set independently of other cells. To change a cell’s setuid status with respect to the localmachine, become the local superuser root and issue the fs setcell command. To determine a cell’s current setuid status, use thefs getcellstatus command.

WarningEnabling support for the UNIX setuid bit for AFS programs is not secure with the current AFS protocol. Enabling thiscapability is not recommended except in very restricted environments on trusted networks.

When you issue the fs setcell command, you directly alter a cell’s setuid status as recorded in kernel memory, so rebootingthe machine is not necessary. However, nondefault settings do not persist across reboots of the machine unless you add theappropriate fs setcell command to the machine’s AFS initialization file.

Only members of the system:administrators group can turn on the setuid mode bit on an AFS file or directory. When the setuidmode bit is turned on, the UNIX ls -l command displays the third user mode bit as an s instead of an x, but for an AFS file ordirectory, the s appears only if setuid permission is enabled for the cell in which the file resides.

10.7.1 To determine a cell’s setuid status

1. Issue the fs getcellstatus command to check the setuid status of each desired cell.

% fs getcellstatus <cell name>

where

getce Is the shortest acceptable abbreviation of getcellstatus.


cell name Names each cell for which to report setuid status. Provide the complete Internet domain name or a shortenedform that distinguishes it from the other cells listed in the local /usr/vice/etc/CellServDB file.

The output reports the setuid status of each cell:

• the string no setuid allowed indicates that the Cache Manager does not allow programs from the cell to run withsetuid permission

• setuid allowed indicates that the Cache Manager allows programs from the cell to run with setuid permission

10.7.2 To change a cell’s setuid status



2. Issue the fs setcell command to change the setuid status of the cell.

# fs setcell <cell name>+ [-suid] [-nosuid]

where

setce Is the shortest acceptable abbreviation of setcell.cell name Names each cell for which to change setuid status as specified by the -suid or -nosuid flag. Provide each

cell’s complete Internet domain name or a shortened form that distinguishes it from the other cells listed in the local/usr/vice/etc/CellServDB file.

-suid Enables programs from each specified cell to execute with setuid permission. Provide this flag or the -nosuid flag,or omit both to disable setuid permission for each cell.

-nosuid Prevents programs from each specified cell from executing with setuid permission. Provide this flag or the -suidflag, or omit both to disable setuid permission for each cell.

10.8 Setting the File Server Probe Interval

The Cache Manager periodically sends a probe to server machines to verify that they are still accessible. Specifically, it probesthe database server machines in its cell and those file servers that house data it has cached.

If a server process does not respond to a probe, the client machine assumes that it is inaccessible. By default, the interval betweenprobes is three minutes, so it can take up to three minutes for a client to recognize that a server process is once again accessibleafter it was inaccessible.

To adjust the probe interval, include the -interval argument to the fs checkservers command while logged in as the localsuperuser root. The new interval setting persists until you again issue the command or reboot the machine, at which time thesetting returns to the default. To preserve a nondefault setting across reboots, include the appropriate fs checkservers commandin the machine’s AFS initialization file.

10.8.1 To set a client’s file server probe interval



2. Issue the fs checkservers command with the -interval argument.


# fs checkservers -interval <seconds between probes>

where

checks Is the shortest acceptable abbreviation of checkservers.

-interval Specifies the number of seconds between probes. Provide an integer value greater than zero.

10.9 Setting a Client Machine’s Cell Membership

Each client machine belongs to a particular cell, as named in the /usr/vice/etc/ThisCell on its local disk. The machine’s cellmembership determines three defaults important to users of the machine:

• The cell for which users of the machine obtain tokens (authenticate) when they use the login program or issue the klogcommand. There are two effects:

– The klog program and AFS-modified login utilities contact an Authentication Server in the cell named in the ThisCell file.

– The klog program and AFS-modified login utilities combine the contents of the ThisCell file with the password that theuser provides, generating an encryption key from the combination. The user’s entry in the Authentication Database includesan encryption key also generated from the combination of password and cell name. If the cell name in the ThisCell file isincorrect, users cannot authenticate even if they provide the correct password.

• The cell the Cache Manager considers its local, or home, cell. The Cache Manager allows programs from its local cell to runwith setuid permission, but not programs from foreign cells, as discussed further in Determining if a Client Can Run SetuidPrograms.

• The default database server machines that are contacted by the AFS command interpreters running on this machine.

10.9.1 To display a client machine’s cell membership

1. Use a text editor or the cat command to display the contents of the /usr/vice/etc/ThisCell file.

% cat /usr/vice/etc/ThisCell

10.9.2 To set a client machine’s cell membership



2. Using a text editor, replace the cell name in the /usr/vice/etc/ThisCell file.

3. (Optional.) Reboot the machine to enable the Cache Manager to use the new cell name immediately; the appropriatecommand depends on the machine’s system type. The klog program, AFS-modified login utilities, and the AFS commandinterpreters use the new cell name the next time they are invoked; no reboot is necessary.

# sync# shutdown


10.10 Forcing the Update of Cached Data

AFS’s callback mechanism normally guarantees that the Cache Manager provides the most current version of a file or directoryto the application programs running on its machine. However, you can force the Cache Manager to discard (flush) cached dataso that the next time an application program requests it, the Cache Manager fetches the latest version available at the File Server.

You can control how many file system elements to flush at a time:

• To flush only specific files or directories, use the fs flush command. This command forces the Cache Manager to discardthe data and status information it has cached from the specified files or directories. It does not discard information froman application program’s buffer or information that has been altered locally (changes made in the cache but not yet savedpermanently to the File Server). However, the next time an application requests the element’s data or status information, theCache Manager has to contact the File Server to get it.

• To flush everything cached from a certain volume, use the fs flushvolume command. This command works like the fs flushcommand, but differs in two ways:

– The Cache Manager discards data for all elements in the cache that come from the same volume as the specified files ordirectories.

– The Cache Manager discards only data, not status information. This difference has little practical effect, but can lead todifferent output from the ls command when the two different commands are used to flush the same element.

In addition to callbacks, the Cache Manager has a mechanism for tracking other kinds of possible changes, such as changesin a volume’s location. If a volume moves and the Cache Manager has not accessed any data in it for a long time, the CacheManager’s volume location record can be wrong. To resynchronize it, use the fs checkvolumes command. When you issue thecommand, the Cache Manager creates a new table of mappings between volume names, ID numbers, and locations. This forcesthe Cache Manager to reference newly relocated and renamed volumes before it can provide data from them.

It is also possible for information about mount points to become corrupted in the cache. Symptoms of a corrupted mount pointincluded garbled output from the fs lsmount command, and failed attempts to change directory to or list the contents of a mountpoint. Use the fs flushmount command to discard a corrupted mount point. The Cache Manager must refetch the mount pointthe next time it crosses it in a pathname. (The Cache Manager periodically refreshes cached mount points, but the only other wayto discard them immediately is to reinitialize the Cache Manager by rebooting the machine.

10.10.1 To flush certain files or directories

1. Issue the fs flush command.

% fs flush [<dir/file path>+]

where

flush Must be typed in full.dir/file path Names each file or directory structure to flush from the cache. Omit this argument to flush the current

working directory. Flushing a directory structure does not flush any files or subdirectories cached from it.

10.10.2 To flush all data from a volume

1. Issue the fs flushvolume command.

% fs flushvolume [<dir/file path>+]

where

flushv Is the shortest acceptable abbreviation of flushvolume.dir/file path Names a file or directory from each volume to flush from the cache. The Cache Manager flushes everything

in the cache that it has fetched from the same volume. Omit this argument to flush all cached data fetched from thevolume that contains the current working directory.


10.10.3 To force the Cache Manager to notice other volume changes

1. Issue the fs checkvolumes command.

% fs checkvolumes

where checkv is the shortest acceptable abbreviation of checkvolumes.

The following command confirms that the command completed successfully:

All volumeID/name mappings checked.

10.10.4 To flush one or more mount points

1. Issue the fs flushmount command.

% fs flush [<dir/file path>+]

where

flushm Is the shortest acceptable abbreviation of flushmount.dir/file path Names each mount point to flush from the cache. Omit this argument to flush the current working directory.

Files or subdirectories cached from the associated volume are unaffected.

10.11 Maintaining Server Preference Ranks

As mentioned in the introduction to this chapter, AFS uses client-side data caching and callbacks to reduce the amount of networktraffic in your cell. The Cache Manager also tries to make its use of the network as efficient as possible by assigning preferenceranks to server machines based on their network proximity to the local machine. The ranks bias the Cache Manager to fetchinformation from the server machines that are on its own subnetwork or network rather than on other networks, if possible.Reducing the network distance that data travels between client and server machine tends to reduce network traffic and speed theCache Manager’s delivery of data to applications.

The Cache Manager stores two separate sets of preference ranks in kernel memory. The first set of ranks applies to machinesthat run the Volume Location (VL) Server process, hereafter referred to as VL Server machines. The second set of ranks appliesto machines that run the File Server process, hereafter referred to as file server machines. This section explains how the CacheManager sets default ranks, how to use the fs setserverprefs command to change the defaults or set new ranks, and how to usethe fs getserverprefs command to display the current set of ranks.

10.11.1 How the Cache Manager Sets Default Ranks

As the afsd program initializes the Cache Manager, it assigns a preference rank of 10,000 to each of the VL Server machineslisted in the local /usr/vice/etc/CellServDB file. It then randomizes the ranks by adding an integer randomly chosen from therange 0 (zero) to 126. It avoids assigning the same rank to machines in one cell, but it is possible for machines from differentcells to have the same rank. This does not present a problem in use, because the Cache Manager compares the ranks of only onecell’s database server machines at a time. Although AFS supports the use of multihomed database server machines, the CacheManager only uses the single address listed for each database server machine in the local /usr/vice/etc/CellServDB file. OnlyUbik can take advantage of a multihomed database server machine’s multiple interfaces.

The Cache Manager assigns preference ranks to a file server machine when it obtains the server’s VLDB record from the VLServer, the first time that it accesses a volume that resides on the machine. If the machine is multihomed, the Cache Managerassigns a distinct rank to each of its interfaces (up to the number of interfaces that the VLDB can store for each machine, which isspecified in the OpenAFS Release Notes). The Cache Manager compares the interface’s IP address to the local machine’s addressand applies the following algorithm:


• If the local machine is a file server machine, the base rank for each of its interfaces is 5,000.

• If the file server machine interface is on the same subnetwork as the local machine, its base rank is 20,000.

• If the file server machine interface is on the same network as the local machine, or is at the distant end of a point-to-point linkwith the local machine, its base rank is 30,000.

• If the file server machine interface is on a different network than the local machine, or the Cache Manager cannot obtainnetwork information about it, its base rank is 40,000.

If the client machine has only one interface, the Cache Manager compares it to the server interface’s IP address and sets arank according to the algorithm. If the client machine is multihomed, the Cache Manager compares each of the local interfaceaddresses to the server interface, and assigns to the server interface the lowest rank that results from comparing it to all of theclient interfaces.

After assigning a base rank to a file server machine interface, the Cache Manager adds to it a number randomly chosen from therange 0 (zero) to 15. As an example, a file server machine interface in the same subnetwork as the local machine receives a baserank of 20,000, but the Cache Manager records the actual rank as an integer between 20,000 and 20,015. This process reducesthe number of interfaces that have exactly the same rank. As with VL Server machine ranks, it is possible for file server machineinterfaces from foreign cells to have the same rank as interfaces in the local cell, but this does not present a problem. Only therelative ranks of the interfaces that house a specific volume are relevant, and AFS supports storage of a volume in only one cellat a time.

10.11.2 How the Cache Manager Uses Preference Ranks

Each preference rank pairs an interface’s IP address with an integer that can range from 1 to 65,534. A lower rank (lowernumber) indicates a stronger preference. Once set, a rank persists until the machine reboots, or until you use the fs setserverprefscommand to change it.

The Cache Manager uses VL Server machine ranks when it needs to fetch volume location information from a cell. It comparesthe ranks for the cell’s VL Server machines and attempts to contact the VL Server process on the machine with the best (lowestinteger) rank. If it cannot reach that VL Server, it tries to contact the VL Server with the next best rank, and so on. If all of acell’s VL Server machines are inaccessible, the Cache Manager cannot fetch data from the cell.

Similarly, when the Cache Manager needs to fetch data from a volume, it compares the ranks for the interfaces of machines thathouse the volume, and attempts to contact the interface that has the best rank. If it cannot reach the fileserver process via thatinterface, it tries to contact the interface with the next best integer rank, and so on. If it cannot reach any of the interfaces formachines that house the volume, it cannot fetch data from the volume.

10.11.3 Displaying and Setting Preference Ranks

To display the file server machine ranks that the Cache Manager is using, use the fs getserverprefs command. Include the -vlservers flag to display VL Server machine ranks instead. By default, the output appears on the standard output stream (stdout),but you can write it to a file instead by including the -file argument.

The Cache Manager stores IP addresses rather than hostnames in its kernel list of ranks, but by default the output identifiesinterfaces by hostname after calling a translation routine that refers to either the cell’s name service (such as the Domain NameServer) or the local host table. If an IP address appears in this case, it is because the translation attempt failed. To bypass thetranslation step and display IP addresses rather than hostnames, include the -numeric flag. This can significantly speed up theoutput.

You can use the fs setserverprefs command to reset an existing preference rank, or to set the initial rank of a file server ma-chine interface or VL Server machine for which the Cache Manager has no rank. The ranks you set persist until the machinereboots or until you issue the fs setserverprefs command again. To make a rank persist across a reboot, place the appropriate fssetserverprefs command in the machine’s AFS initialization file.

As with default ranks, the Cache Manager adds a randomly chosen integer to each rank range that you assign. For file servermachine interfaces, the randomizing number is from the range 0 (zero) to 15; for VL Server machines, it is from the range 0(zero) to 126. For example, if you assign a rank of 15,000 to a file server machine interface, the Cache Manager stores an integerbetween 15,000 to 15,015.


To assign VL Server machine ranks, list them after the -vlserver argument to the fs setserverprefs command.

To assign file server machine ranks, use or more of the three possible methods:

1. List them after the -servers argument on the command line.

2. Record them in a file and name it with the -file argument. You can easily generate a file with the proper format by includingthe -file argument to the fs getserverprefs command.

3. Provide them via the standard input stream, by including the -stdin flag. This enables you to feed in values directly froma command or script that generates preferences using an algorithm appropriate for your cell. It must generate them in theproper format, with one or more spaces between each pair and between the two parts of the pair. The AFS distributiondoes not include such a script, so you must write one if you want to use this method.

You can combine any of the -servers, -file, and -stdin options on the same command line if you wish. If more than one of themspecifies a rank for the same interface, the one assigned with the -servers argument takes precedence. You can also provide the-vlservers argument on the same command line to set VL Server machine ranks at the same time as file server machine ranks.

The fs command interpreter does not verify hostnames or IP addresses, and so willingly stores ranks for hostnames and addressesthat don’t actually exist. The Cache Manager never uses such ranks unless the same VLDB record for a server machine recordsthe same incorrect information.

10.11.4 To display server preference ranks

1. Issue the fs getserverprefs command to display the Cache Manager’s preference ranks for file server machines or VLServer machines.

% fs getserverprefs [-file <output to named file>] [-numeric] [-vlservers]

where

gp Is an acceptable alias for getserverprefs (gets is the shortest acceptable abbreviation).

-file Specifies the pathname of the file to which to write the list of ranks. Omit this argument to display the list on thestandard output stream (stdout).

-numeric Displays the IP address, rather than the hostname, of each ranked machine interface. Omit this flag to have theaddresses translated into hostnames, which takes longer.

-vlservers Displays ranks for VL Server machines rather than file server machines.

The following example displays file server machine ranks. The -numeric flag is not used, so the appearance of an IPaddress indicates that is not currently possible to translate it to a hostname.

% fs gpfs5.example.com 20000fs1.example.com 30014server1.example.org 40011fs3.example.com 20001fs4.example.com 30001192.12.106.120 40002192.12.106.119 40001

. . . . . . .

10.11.5 To set server preference ranks




2. Issue the fs setserverprefs command to set the Cache Manager’s preference ranks for one or more file server machines orVL Server machines.

# fs setserverprefs [-servers <fileserver names and ranks>+] \[-vlservers <VL server names and ranks>+] \[-file <input from named file>] [-stdin]

where

sp Is an acceptable alias for setserverprefs (sets is the shortest acceptable abbreviation).-servers Specifies one or more pairs of file server machine interface and rank. Identify each interface by its fully-qualified

hostname or IP address in dotted decimal format. Acceptable ranks are the integers from 1 to 65534. Separate theparts of a pair, and the pairs from one another, with one or more spaces.

-vlservers Specifies one or more pairs of VL Server machine and rank. Identify each machine by its fully-qualifiedhostname or IP address in dotted decimal format. Acceptable ranks are the integers from 1 to 65534.

-file Specifies the pathname of a file that contains one more pairs of file server machine interface and rank. Place each pairon its own line in the file. Use the same format for interfaces and ranks as with the -servers argument.

-stdin Indicates that pairs of file server machine interface and rank are being provided via the standard input stream (stdin).The program or script that generates the pairs must format them in the same manner as for the -servers argument.

10.12 Managing Multihomed Client Machines

The File Server can choose the interface to which to send a message when it initiates communication with the Cache Manageron a multihomed client machine (one with more than one network interface and IP address). If that interface is inaccessible, itautomatically switches to an alternate. This improves AFS performance, because it means that the outage of an interface doesnot interrupt communication between File Server and Cache Manager.

The File Server can choose the client interface when it sends two types of messages:

• A message to break the callback that the Cache Manager holds on a cached file

• A ping message to check that the Cache Manager is still accessible and responding; the File Server sends such a message everyfew minutes

(The File Server does not choose which client interface to respond to when filling a Cache Manager’s request for AFS data. Inthat case, it always responds to the client interface via which the Cache Manager sent the request.)

The Cache Manager compiles the list of eligible interfaces on its client machine automatically as it initializes, and records themin kernel memory. When the Cache Manager first establishes a connection with the File Server, it sends along the list of interfaceaddresses. The File Server records the addresses, and uses the one at the top of the list when it needs to break a callback or senda ping to the Cache Manager. If that interface is inaccessible, the File Server simultaneously sends a message to all of the otherinterfaces in the list. Whichever interface replies first is the one to which the File Server sends future messages.

You can control which addresses the Cache Manager registers with File Servers by listing them in two files in the /usr/vice/etcdirectory on the client machine’s local disk: NetInfo and NetRestrict. If the NetInfo file exists when the Cache Manager ini-tializes, the Cache Manager uses its contents as the basis for the list of interfaces. Otherwise, the Cache Manager uses the list ofinterfaces configured with the operating system. It then removes from the list any addresses that appear in the /usr/vice/etc/Ne-tRestrict file, if it exists. The Cache Manager records the resulting list in kernel memory.

You can also use the fs setclientaddrs command to change the list of addresses stored in the Cache Manager’s kernel memory,without rebooting the client machine. The list of addresses you provide on the command line completely replaces the currentlist in kernel memory. The changes you make persist only until the client machine reboots, however. To preserve the revised listacross reboots, list the interfaces in the NetInfo file (and if appropriate, the NetRestrict file) in the local /usr/vice/etc directory.(You can also place the appropriate fs setclientaddrs command in the machine’s AFS initialization script, but that is less efficient:by the time the Cache Manager reads the command in the script, it has already compiled a list of interfaces.)

To display the list of addresses that the Cache Manager is currently registering with File Servers, use the fs getclientaddrscommand.

Keep the following in mind when you change the NetInfo or NetRestrict file, or issue the fs getclientaddrs or fs setclientaddrscommands:


• When you issue the fs setclientaddrs command, the revised list of addresses does not propagate automatically to File Serverswith which the Cache Manager has already established a connection. They continue to use the list that the Cache Managerregistered with them when it first established a connection. To force previously contacted File Servers to use the revised list,you must either reboot each file server machine, or reboot the client machine after changing its NetInfo file, NetRestrict file,or both.

• The fs command interpreter verifies that each of the addresses you specify on the fs setclientaddrs command line is actuallyconfigured with the client machine’s operating system. If it is not, the command fails with an error message that marks theaddress as a Nonexistent interface.

• As previously noted, the File Server does not use the registered list of addresses when it responds to the Cache Manager’srequest for data (as opposed to initiating communication itself). It always attempts to send its reply to the interface from whichthe Cache Manager sent the request. If the reply attempt fails, the File Server selects an alternate route for resending the replyaccording to its server machine’s network routing configuration, not the list of addresses registered by the Cache Manager.

• The Cache Manager does not use the list of interfaces when choosing the interface via which to establish a connection to a FileServer.

• The list of addresses that the fs getclientaddrs command displays is not necessarily the one that a specific File Server is using,if an administrator has issued the fs setclientaddrs command since the Cache Manager first contacted that File Server. Itdetermines only which addresses the Cache Manager registers when connecting to File Servers in future.

10.12.1 To create or edit the client NetInfo file



2. Using a text editor, open the /usr/vice/etc/NetInfo file. Place one IP address in dotted decimal format (for example, 192.12.107.33) on each line. On the first line, put the address that you want each File Server to use initially. The order ofthe remaining machines does not matter, because if an RPC to the first interface fails, the File Server simultaneously sendsRPCs to all of the other interfaces in the list. Whichever interface replies first is the one to which the File Server then sendspings and RPCs to break callbacks.

3. If you want the Cache Manager to start using the revised list immediately, either reboot the machine, or use the fs setclien-taddrs command to create the same list of addresses in kernel memory directly.

10.12.2 To create or edit the client NetRestrict file



2. Using a text editor, open the /usr/vice/etc/NetRestrict file. Place one IP address in dotted decimal format on each line.The order of the addresses is not significant. Use the value 255 as a wildcard that represents all possible addresses in thatfield. For example, the entry 192.12.105.255 indicates that the Cache Manager does not register any of the addressesin the 192.12.105 subnet.

3. If you want the Cache Manager to start using the revised list immediately, either reboot the machine, or use the fs setclien-taddrs command to set a list of addresses that does not included the prohibited ones.


10.12.3 To display the list of addresses from kernel memory

1. Issue the fs getclientaddrs command.

% fs getclientaddrs

where gc is an acceptable alias for getclientaddrs (getcl is the shortest acceptable abbreviation).

The output lists each IP address on its own line, in dotted decimal format.

10.12.4 To set the list of addresses in kernel memory



2. Issue the fs setclientaddrs command to replace the list of addresses currently in kernel memory with a new list.

# fs setclientaddrs [-address <client network interfaces>+]

where

sc Is an acceptable alias for setclientaddrs (setcl is the shortest acceptable abbreviation).

-address Specifies one or more IP addresses in dotted decimal format (hostnames are not acceptable). Separate eachaddress with one or more spaces.

10.13 Controlling the Display of Warning and Informational Messages

By default, the Cache Manager generates two types of warning and informational messages:

• It sends user messages, which provide user-level status and warning information, to user screens.

• It sends console messages, which provide system-level status and warning information, to the client machine’s designatedconsole.

You can use the fs messages command to control whether the Cache Manager displays either type of message, both types, orneither. It is best not to disable messages completely, because they provide useful information.

If you want to monitor Cache Manager status and performance more actively, you can use the afsmonitor program to collect anextensive set of statistics (it also gathers File Server statistics). If you experience performance problems, you can use fstracesuite of commands to gather a low-level trace of Cache Manager operations, which the AFS Support and Development groupscan analyze to help solve your problem. To learn about both utilities, see Monitoring and Auditing AFS Performance.

10.13.1 To control the display of warning and status messages



2. Issue the fs messages command, using the -show argument to specify the type of messages to be displayed.

# fs messages -show <user|console|all|none>

where


me Is the shortest acceptable abbreviation of messages.

-show Specifies the types of messages to display. Choose one of the following values:

user Sends user messages to user screens.console Sends console messages to the console.all Sends user messages to user screens and console messages to the console (the default if the -show argument is

omitted).none Disables messages completely.

10.14 Displaying and Setting the System Type Name

The Cache Manager stores the system type name of the local client machine in kernel memory. It reads in the default value froma hardcoded definition in the AFS client software.

The Cache Manager uses the system name as a substitute for the @sys variable in AFS pathnames. The variable is useful whencreating a symbolic link from the local disk to an AFS directory that houses binaries for the client machine’s system type. Becausethe @sys variable automatically steers the Cache Manager to the appropriate directory, you can create the same symbolic link onclient machines of different system types. The link also remains valid when you upgrade the machine to a new system type.

Configuration is simplest if you use the system type names that AFS assigns. For a list, see the OpenAFS Release Notes.

To display the system name stored in kernel memory, use the sys or fs sysname command. To change the name, add the lattercommand’s -newsys argument.

10.14.1 To display the system type name

1. Issue the fs sysname or sys command.

% fs sysname% sys

The output of the fs sysname command has the following format:

Current sysname is ’system_name’

The sys command displays the system_name string with no other text.

10.14.2 To change the system type name



2. Issue the fs sysname command, using the -newsys argument to specify the new name.

# fs sysname <new sysname>

where

sys Is the shortest acceptable abbreviation of sysname.

new sysname Specifies the new system type name.


10.15 Enabling Asynchronous Writes

By default, the Cache Manager writes all data to the File Server immediately and synchronously when an application programcloses a file. That is, the close system call does not return until the Cache Manager has actually written all of the cached datafrom the file back to the File Server. You can enable the Cache Manager to write files asynchronously by specifying the numberof kilobytes of a file that can remain to be written to the File Server when the Cache Manager returns control to the application.

Enabling asynchronous writes can be helpful to users who commonly work with very large files, because it usually means thatthe application appears to perform faster. However, it introduces some complications. It is best not to enable asynchronouswrites unless the machine’s users are sophisticated enough to understand the potential problems and how to avoid them. Thecomplications include the following:

• In most cases, the Cache Manager returns control to applications earlier than it does by default, but it is not guaranteed to doso. Users cannot always expect faster performance.

• If an asynchronous write fails, there is no way to notify the application, because the close system call has already returned witha code indicating success.

• Asynchronous writing increases the possibility that the user fails to notice when a write operation makes a volume exceed itsquota. As always, the portion of the file that exceeds the quota is lost, as indicated by a message like the following:

No space left on device

To avoid losing data because of insufficient quota, before closing a file users must verify that the volume housing the file hasenough free space to accommodate it.

When you enable asynchronous writes by issuing the fs storebehind command, you set the number of kilobytes of a file that canstill remain to be written to the File Server when the Cache Manager returns control to the application program. You can applythe setting either to all files manipulated by applications running on the machine, or only to certain files:

• The setting that applies to all files is called the default store asynchrony for the machine, and persists until the machine reboots.If, for example, you set the default store asynchrony to 10 KB, it means that when an application closes a file, the CacheManager can return control to the application as soon as no more than 10 KB of a file that the application has closed remain tobe written to the File Server.

• The setting for an individual file overrides the default store asynchrony and persists as long as there is an entry for the file inthe internal table that the Cache Manager uses to track information about files. In general, such an entry persists at least untilan application closes the file or exits completely, but the Cache Manager is free to recycle the entry if the file is inactive andit needs to free up slots in the table. To be sure the entry exists in the table, issue the fs storebehind command shortly beforeclosing the file.

10.15.1 To set the default store asynchrony



2. Issue the fs storebehind command with the -allfiles argument.

# fs storebehind -allfiles <new default (KB)> [-verbose]

where

st Is the shortest acceptable abbreviation of storebehind.

-allfiles Sets the number of kilobytes of data that can remain to be written to the File Server when the Cache Managerreturns control to the application that closed a file.

-verbose Produces a message that confirms the new setting.


10.15.2 To set the store asynchrony for one or more files

1. Verify that you have the w (write) permission on the access control list (ACL) of each file for which you are setting thestore asynchrony, by issuing the fs listacl command, which is described fully in Displaying ACLs.

% fs listacl dir/file path

Alternatively, become the local superuser root on the client machine, if you are not already, by issuing the su command.


2. Issue the fs storebehind command with the -kbytes and -files arguments.

# fs storebehind -kbytes <asynchrony for specified names> \-files <specific pathnames>+ \[-verbose]

where

st Is the shortest acceptable abbreviation of storebehind.-kbytes Sets the number of kilobytes of data that can remain to be written to the File Server when the Cache Manager

returns control to the application that closed a file named by the -files argument.-files Specifies each file for which to set a store asynchrony that overrides the default. Partial pathnames are interpreted

relative to the current working directory.-verbose Produces a message that confirms that new setting.

10.15.3 To display the default store asynchrony

1. Issue the fs storebehind command with no arguments, or with the -verbose flag only.

% fs storebehind [-verbose]

where

st Is the shortest acceptable abbreviation of storebehind.-verbose Produces output that reports the default store asynchrony.

10.15.4 To display the store asynchrony for one or more files

1. Issue the fs storebehind command with the -files argument only.

% fs storebehind -files <specific pathnames>+

where

st Is the shortest acceptable abbreviation of storebehind.-files Specifies each file for which to display the store asynchrony. Partial pathnames are interpreted relative to the current

working directory.

The output lists each file separately. If a value has previously been set for the specified files, the output reports the following:

Will store up to y kbytes of file asynchronously.Default store asynchrony is x kbytes.

If the default store asynchrony applies to a file (because you have not set a -kbytes value for it), the output reports the following:

Will store file according to default.Default store asynchrony is x kbytes.


Part IV

Managing Users and Groups


Chapter 11

Creating and Deleting User Accounts with theuss Command Suite

The uss command suite helps you create and delete AFS user accounts quickly and easily. You can create a single account withthe uss add command, delete a single account with the uss delete command, or create and delete multiple accounts with the ussbulk command.

A single uss add or uss bulk command can create a complete AFS user account because the uss command interpreter refers toa template file in which you predefine the configuration of many account components. The uss delete command deletes most ofthe components of a user account, but does not use a template file.

The uss suite also easily incorporates shell scripts or other programs that you write to perform parts of account creation anddeletion unique to your site. To invoke a script or program automatically as a uss command runs, use the appropriate instructionsin the template file or bulk input file. Various sections of this chapter discuss possible uses for scripts.

Using the uss commands to create and delete accounts is the recommended method because it automates and correctly ordersmost of the necessary steps. The alternative is to issue a series of separate commands to the various AFS servers, which requiresmore careful record keeping. For instructions, see Administering User Accounts.



Add a single user account uss addDelete a single user account uss deleteAdd and delete multiple accounts uss bulk

11.2 Overview of the uss Command Suite

The commands in the uss suite help you to automate the creation and deletion of AFS user accounts:

• The uss add command creates all of the components of an account, one account at a time. It consults a template file that definesaccount configuration.

• The uss delete command deletes the major components of an account, one account at a time. It does not use a template file, soyou possibly need to perform additional tasks manually.

• The uss bulk command can create and delete multiple accounts. It refers to a bulk input file that can contain any number ofaccount-creation and deletion instructions, along with other instructions for further automating the process.


11.2.1 The Components of an AFS User Account

An AFS user account can have many components. The only two required components are entries in the Protection Database andAuthentication Database, but the other components add functionality and usability. The following information also appears in acorresponding section of Administering User Accounts, but is repeated here for your convenience.

• A Protection Database entry defines the username (the name provided when authenticating with AFS), and maps it to an AFSuser ID (AFS UID), a number that the AFS servers use internally when referencing users. The Protection Database also tracksthe groups to which the user belongs. For details, see Administering the Protection Database.

• An Authentication Database entry records the user’s AFS password in a scrambled form suitable for use as an encryption key.

• A home volume stores all the files in the user’s home directory together on a single partition of a file server machine. Thevolume has an associated quota that limits its size. For a complete discussion of volumes, see Managing Volumes.

• A mount point makes the contents of the user’s volume visible and accessible in the AFS filespace, and acts as the user’s homedirectory. For more details about mount points, see About Mounting Volumes.

• Full access permissions on the home directory’s access control list (ACL) and ownership of the directory (as displayed by theUNIX ls -ld command) enable the user to manage his or her files. For details on AFS file protection, see Managing AccessControl Lists.

• A local password file entry (in the /etc/passwd file or equivalent) of each AFS client machine enables the user to log in andaccess AFS files through the Cache Manager. A subsequent section in this chapter further discusses local password file entries.

• Other optional configuration files make the account more convenient to use. Such files help the user log in and log out moreeasily, receive electronic mail, print, and so on.

11.2.2 Privilege Requirements for the uss Commands

To issue uss commands successfully, you usually need all of the standard AFS administrative privileges: membership in thesystem:administrators group, inclusion in the /usr/afs/etc/UserList file on every relevant server machine, and the ADMIN flagon your Authentication Database entry. For details on administrative privilege, see Managing Administrative Privilege.

11.2.3 Avoiding and Recovering from Errors and Interrupted Operations

As for any complex operation, there are a number of possible reasons that an account-creation or deletion operation can haltbefore it completes. You can easily avoid several of the common reasons by making the following checks before issuing a usscommand:

• Verify that you have all of the administrative privileges you need to complete an operation, as described in Privilege Require-ments for the uss Commands. The instructions for using the uss add, uss delete, and uss bulk commands include this checkas a step.

• Proofread the template and bulk input files for correct syntax and acceptable values. For discussion, see Constructing a ussTemplate File and Constructing a Bulk Input File.

• Do not issue uss commands when you are aware of network, server machine, or server process outages. Because uss operationsaffect so many components of AFS, it is unlikely that the command can succeed when there are outages.

Another way to avoid errors that halt an operation is to preview the uss command by combining the -dryrun flag with theother arguments to be used on the actual command. The uss command interpreter generates a screen trace of the actions to beperformed by the actual command, without performing them.

Using the -dryrun flag reveals many basic errors that can halt an operation, particularly the ones due to incorrect syntax in thecommand line, template file, or bulk input file. It does not catch all possible errors, however, because the command interpreteris not actually attempting to perform the actions it is tracing. For example, a Volume Server outage does not necessarily halt the


volume creation step when the -dryrun flag is included, because the command interpreter is not actually contacting the server;such an outage halts the actual creation operation.

When the uss command interpreter encounters error conditions minor enough that they do not require halting the operation, itusually generates a message that begins with the string uss:Warning: and describes the action it is taking to avoid halting.For example, if a user’s Protection Database entry already exists, the following message appears on the standard output stream:

uss: Warning: User ’user’ already in the protection databaseThe uid for user ’user’ is AFS UID

If an error is more serious, the word Warning does not appear in the message, which instead describes why the commandinterpreter cannot perform the requested action. Not all of these errors cause the uss operation to halt, but they still require you totake corrective action. For example, attempting to create a mount point fails if you lack the necessary permissions on the parentdirectory’s ACL, or if the mount point pathname in the V instruction’s mount_point field is malformed. However, this error doesnot cause the creation operation to halt until later instructions in the template attempt to install subdirectories or files under thenonexistent mount point.

If the command shell prompts returns directly after an error message, then the error generally was serious enough to halt theoperation. When an error halts account creation or deletion, the best way to recover is to find and fix the cause, and then reissuethe same uss command.

The following list describes what happens when components of a user’s account already exist when you reissue an account-creation command (the uss add command, or the uss bulk command when the bulk input file contains add instructions):

• If the Protection Database entry already exists, a message confirms its existence and specifies the associated AFS UID.

• If the Authentication Database entry already exists, a message confirms its existence.

• If the volume and associated Volume Location Database (VLDB) entry already exist, a message confirms their existence.However, the uss command interpreter does alter the volume’s quota, mount point, or ACL if any of the relevant fields inthe template V instruction have changed since the command last ran. If the value in the mount_point field has changed, thecommand interpreter creates the new mount point but does not remove any existing mount points.

• If any of the fields in the template A instruction have changed, the uss command interpreter makes the changes withoutcomment.

• If a directory, file, or link defined by a template file D, E, F, L, or S instruction already exists, the uss command interpreterreplaces the existing element with one that conforms to the template definition. To control whether the uss command interpreterprompts for confirmation that you wish to overwrite a given element, use the -overwrite flag to the uss add or uss bulkcommand:

– If you include the -overwrite flag, the command interpreter automatically overwrites all elements without asking for confir-mation.

– If you omit the flag, the command interpreter prompts once for each account to ask if you want to overwrite all elementsassociated with it.

• The command interpreter always reexecutes X instructions in the template file. If a command’s result already holds, reissuingit has the same effect as reissuing it outside the context of the uss commands.

The following describes what happens when a uss delete command references account components that have already beendeleted.

• If the volume and VLDB entry no longer exist, a message confirms their absence.

• If the Authentication Database entry no longer exists, a message confirms its absence.


11.3 Creating Local Password File Entries with uss

To obtain authenticated access to a cell’s AFS filespace, a user must not only have a valid AFS token, but also an entry in thelocal password file (/etc/passwd or equivalent) of the AFS client machine. This section discusses why it is important for theuser’s AFS UID to match to the UNIX UID listed in the local password file, the appropriate value to put in the file’s passwordfield, and outlines a method for creating a single source password file.

For instructions on using the template file’s E instruction to generate local password file entries automatically as part of accountcreation, see Creating a Common Source Password File.

The following information also appears in a corresponding section of Administering User Accounts, but is repeated here for yourconvenience.

11.3.1 Assigning AFS and UNIX UIDs that Match

A user account is easiest to administer and use if the AFS user ID number (AFS UID) and UNIX UID match. All instructions inthe AFS documentation assume that they do.

The most basic reason to make AFS and UNIX UIDs the same is so that the owner name reported by the UNIX ls -l and ls -ldcommands makes sense for AFS files and directories. Following standard UNIX practice, the File Server records a number ratherthan a username in an AFS file or directory’s owner field: the owner’s AFS UID. When you issue the ls -l command, it translatesthe UID to a username according to the mapping in the local password file, not the AFS Protection Database. If the AFS andUNIX UIDs do not match, the ls -l command reports an unexpected (and incorrect) owner. The output can even vary on differentclient machines if their local password files map the same UNIX UID to different names.

Follow the recommendations in the indicated sections to make AFS and UNIX UIDs match when you are creating accounts forvarious types of users:

• If creating an AFS account for a user who already has a UNIX UID, see Converting Existing UNIX Accounts with uss.

• If some users in your cell have existing UNIX accounts but the user for whom you are creating an AFS account does not, thenit is best to allow the Protection Server to allocate an AFS UID automatically. To avoid overlap of AFS UIDs with existingUNIX UIDs, set the Protection Database’s max user id counter higher than the largest UNIX UID, using the instructionsin Displaying and Setting the AFS UID and GID Counters.

• If none of your users have existing UNIX accounts, allow the Protection Server to allocate AFS UIDs automatically, startingeither at its default or at the value you have set for the max user id counter.

11.3.2 Specifying Passwords in the Local Password File

Authenticating with AFS is easiest for your users if you install and configure an AFS-modified login utility, which logs a userinto the local file system and obtains an AFS token in one step. In this case, the local password file no longer controls a user’sability to login in most circumstances, because the AFS-modified login utility does not consult the local password file if the userprovides the correct AFS password. You can nonetheless use a password file entry’s password field (usually, the second field) inthe following ways to control login and authentication:

• To prevent both local login and AFS authentication, place an asterisk ( * ) in the field. This is useful mainly in emergencies,when you want to prevent a certain user from logging into the machine.

• To prevent login to the local file system if the user does not provide the correct AFS password, place a character string of anylength other than the standard thirteen characters in the field. This is appropriate if you want to allow only people with localAFS accounts to log into to your machines. A single X or other character is the most easily recognizable way to do this.


If you do not use an AFS-modified login utility, you must place a standard UNIX password in the local password file of everyclient machine the user will use. The user logs into the local file system only, and then must issue the klog command toauthenticate with AFS. It is simplest if the passwords in the local password file and the Authentication Database are the same,but this is not required.


11.3.3 Creating a Common Source Password File

This section explains how to create a common source version of the local password file when using uss commands to create useraccounts. The sequence of steps is as follows:

1. Include an E instruction in the template file to create a one-line file that has the format of a local password file entry.

2. Incorporate the one-line file into the common source version of the local password file. It makes sense to store this file inAFS. See the following two example scripts for automating this step.

As an example, the template file used by the Example Corporation includes the following E instruction to create a file calledpasswd_username in the directory /afs/.example.com/common/etc/newaccts (the entire contents of the template file appear inExample uss Templates and a full description of the E instruction appears in Creating One-Line Files with the E Instruction):

E /afs/.example.com/common/etc/newaccts/passwd_$USER 0644 root \"$USER:X:$UID:11:$NAME:$MTPT:/bin/csh"

For the user Joe L. Smith with username smith, this instruction creates a file called passwd_smith which contains the followingline:

smith:X:1205:11:Joe L. Smith:/afs/example.com/usr/usr1/smith:/bin/csh

A shell script is probably the easiest way to incorporate a set of files created in this manner into a common source password file,and two sample shell scripts appear here. To automate the process even further, you can create a cron process in a file servermachine’s /usr/afs/local/BosConfig directory to execute the shell script, perhaps each day at a given time; for details, see Tocreate and start a new process.

NoteThe following example scripts are suggestions only. If you choose to use them, or to model similar scripts on them, you musttest that your script has the desired result, preferably in a test environment.

Example C Shell Script

The first example is a simple C shell script suitable for the Example Corporation cell. It incorporates the individual filesfound in the /afs/.example.com/common/uss/newaccts directory into a new version of the global password file found in the/afs/.example.com/common/etc directory, sorting the files into alphabetical order. It takes care to save the current version witha .old extension, then removes the individual files when done.

set dir = /afs/.example.com/commoncat $dir/uss/newaccts/passwd_* $dir/etc/passwd >! $dir/etc/passwd.newmv $dir/etc/passwd $dir/etc/passwd.oldsort $dir/etc/passwd.new > $dir/etc/passwdrm $dir/etc/passwd.new $dir/uss/newaccts/passwd_*

Example Bourne Shell Script

The second, more elaborate, example is a Bourne shell script that first verifies that there are new passwd_username files to beincorporated into the global password file. While running, it checks that each new entry does not already exist. Like the shorterC shell example, it incorporates the individual files found in the /afs/.example.com/common/uss/newaccts directory into a newversion of the global passwd file found in the /afs/.example.com/common/etc directory.

#!/bin/shDESTDIR=/afs/.example.com/common/uss/newacctscd $DESTDIRDEST=/afs/.example.com/common/etccp /afs/.example.com/common/etc/passwd /afs/.example.com/common/uss/newaccts/passwdecho "copied in passwd file."PASSWD=/afs/.example.com/common/uss/newaccts/passwdENTRIES=‘ls passwd_*‘


case $ENTRIES in"")

echo No new entry found to be added to passwd file;;

*)echo "Adding new users to passwd file."for i in $ENTRIESdo

cat $i | awk -F: ’{print $1 > "foo"}’USER=‘cat foo‘case ‘egrep -e \^$USER\: $PASSWD‘ in"")

echo adding $USERcat $i >> $PASSWD;;

*)echo $USER already in passwd file;;

esacmv $i ../old.passdir/done_${i}

donecd /afs/.example.com/common/uss/newacctsecho "sorting password file"sort ${PASSWD} > ${PASSWD}.sortedecho "installing files"install ${PASSWD}.sorted ${DEST}/passwdecho "Password file is built, sorted and installed.";;

esac

11.4 Converting Existing UNIX Accounts with uss

This section discusses the three main issues you need to consider if there are existing UNIX accounts to be converted to AFSaccounts.

11.4.1 Making UNIX and AFS UIDs Match

As previously mentioned, AFS users must have an entry in the local password file on every client machine from which theyaccess the AFS filespace as an authenticated user. Both administration and use are much simpler if the UNIX UID and AFS UIDmatch. When converting existing UNIX accounts, you have two alternatives:

• Make the AFS UIDs match the existing UNIX UIDs. In this case, you need to assign the AFS UID yourself as you create anAFS account:

– If using the uss add command, include the -uid argument.

– If using the uss bulk command, specify the desired UID in the uid field of the add instruction in the bulk input file.

Because you are retaining the user’s UNIX UID, you do not need to alter the UID in the local password file entry. However, ifyou are using an AFS-modified login utility, you possibly need to change the password field in the entry. For a discussion ofhow the value in the password field affects login with an AFS-modified login utility, see Creating Local Password File Entrieswith uss.

If now or in the future you need to create AFS accounts for users who do not have an existing UNIX UID, then you mustguarantee that new AFS UIDs do not conflict with any existing UNIX UIDs. The simplest way is to set the max user idcounter in the Protection Database to a value higher than the largest existing UNIX UID. See Displaying and Setting the AFSUID and GID Counters.


• Change the existing UNIX UIDs to match the new AFS UIDs that the Protection Server assigns automatically.

Allow the Protection Server to allocate the AFS UIDs automatically as you create AFS accounts. For instructions on creatinga new entry for the local password file during account creation, see Creating Local Password File Entries with uss.

There is one drawback to changing the UNIX UID: any files and directories that the user owned in the local file system beforebecoming an AFS user still have the former UID in their owner field. If you want the ls -l and ls -ld commands to display thecorrect owner, you must use the chown command to change the value to the user’s new UID, whether you are leaving the filein the local file system or moving it to AFS. See Moving Local Files into AFS.

11.4.2 Setting the Password Field Appropriately

Existing UNIX accounts already have an entry in the local password file, probably with a (scrambled) password in the passwordfield. You possibly need to change the value in the field, depending on the type of login utility you use:

• If the login utility is not modified for use with AFS, the actual password must appear (in scrambled form) in the password fieldof the local password file entry.

• If the login utility is modified for use with AFS, choose one of the acceptable values, each of which affects the login utility’sbehavior differently. See Creating Local Password File Entries with uss.

If you choose to place an actual password in a local password file entry, then you can define a dummy password when you use atemplate file E instruction to create the entry, as described in Creating One-Line Files with the E Instruction. Have the user issuethe UNIX password-setting command (passwd or equivalent) to replace the dummy with an actual secret password.

11.4.3 Moving Local Files into AFS

New AFS users with existing UNIX accounts probably already own files and directories stored in a machine’s local file system,and it usually makes sense to transfer them into the new home volume. The easiest method is to move them onto the local diskof an AFS client machine, and then use the UNIX mv command to transfer them into the user’s new AFS home directory.

As you move files and directories into AFS, keep in mind that the meaning of their mode bits changes. AFS ignores the secondand third sets of mode bits (group and other), and does not use the first set (the owner bits) directly, but only in conjunction withentries on the ACL (for details, see How AFS Interprets the UNIX Mode Bits). Be sure that the ACL protects the file or directoryat least as securely as the mode bits.

If you have chosen to change a user’s UNIX UID to match a new AFS UID, you must change the ownership of UNIX files anddirectories as well. Only members of the system:administrators group can issue the chown command on files and directoriesonce they reside in AFS.

11.5 Constructing a uss Template File

Creating user accounts with uss commands is generally more convenient than using individual commands. You control theaccount creation process just as closely, but the uss template file enables you to predefine many aspects of account configuration.Because you construct the template before issuing uss commands, you have time to consider configuration details carefully andcorrect syntax errors. The following list summarizes some further advantages of using a template:

• You do not have to remember the correct order in which to create or delete account components, or the order of each command’sarguments, which reduces the likelihood of errors.

• You do not have to type the same information multiple times. Instead, you can place constants and variables in the templatefile that enable you to type as little on the command line as possible. See Using Constants and Variables in the Template File.

• You can create different templates for different types of users. Instead of having to remember which components differ for agiven user, specify the appropriate template when issuing the uss add or uss bulk command.

• You can create any of the three types of AFS account (authentication-only, basic, or full) by including or omitting certaininformation in the template, as described in Creating the Three Types of User Accounts.


The following list briefly describes the instructions that can appear in a template file and points you to a later section for moredetails. It lists them in the order that is usually optimal for correct handling of dependencies between the different types ofinstruction.

G Defines a directory that is one of a set of parent directories into which the uss command interpreter evenly distributes newlycreated home directories. Place the corresponding template file variable, $AUTO, in the mount_point field of the Vinstruction. See Evenly Distributing User Home Directories with the G Instruction and Creating a Volume with the VInstruction.

V Creates a volume, mounts it as the user’s home directory at a specified location in the AFS filespace, sets the volume’squota, and defines the owner and ACL for the directory. This instruction must appear in any template that is not empty(zero-length). See Creating a Volume with the V Instruction.

D Creates a directory, generally a subdirectory of the new home directory, and sets its mode bits, owner, and ACL. See Creatinga Directory with the D Instruction.

F Creates a file by copying a prototype and sets its mode bits and owner. See Creating a File from a Prototype with the FInstruction.

E Creates a single-line file by copying in the contents of the instruction itself, then sets the file’s mode bits and owner. SeeCreating One-Line Files with the E Instruction.

L Creates a hard link. See Creating Links with the L and S Instructions.

S Creates a symbolic link. See Creating Links with the L and S Instructions.

A Improves account security by imposing restrictions on passwords and authentication attempts. See Increasing Account Secu-rity with the A Instruction.

X Executes a command. See Executing Commands with the X Instruction.

11.5.1 Creating the Three Types of User Accounts

Using the uss add and uss bulk commands, you can create three types of accounts that differ in their levels of functionality. Fora description of the types, see Configuring AFS User Accounts. The following list explains how to construct a template for eachtype:

• To create an authentication-only account, create an empty (zero-length) template file. Such an account has only two compo-nents: entries in the Authentication Database and Protection Database.

• To create a basic account, include a V instruction, and G instructions if you want to distribute home directories evenly asdescribed in Evenly Distributing User Home Directories with the G Instruction. In addition to Authentication Database andProtection Database entries, this type of account includes a volume mounted at the home directory with owner and ACL setappropriately.

• To create a full account, include D, E, F, L, and S instructions as appropriate, in addition to the V and G instructions. Thistype of account includes configuration files for basic functions such as logging in, printing, and mail delivery. For a discussionof some useful types of configuration files, see Creating Standard Files in New AFS Accounts.

11.5.2 Using Constants and Variables in the Template File

Each instruction in the uss template file has several fields that define the characteristics of the element that it creates. The Dinstruction’s fields, for instance, define a directory’s pathname, owner, mode bits, and ACL.

You can place three types of values in a field: a variable, a constant, or a combination of the two. The appropriate value dependson the desired configuration, and determines which arguments you provide to the uss add command or which fields you includein a bulk input file add instruction.


If an aspect of account configuration is the same for every user, define a constant value in the appropriate field by insertinga character string. For example, to assign a space quota of 10,000 KB to every user volume, place the string 10000 in the Vinstruction’s quota field.

If, on the other hand, an aspect of account configuration varies for each user, put a variable in the appropriate field. When creatingeach account, provide a value for the variable by providing either the corresponding argument to the uss add command or a valuein the corresponding field of the add instruction in the bulk input file.

The uss command suite defines a set of template variables, each of which has a corresponding source for its value, as summarizedin Table 3. For a discussion of their intended uses, see the following sections about each template instruction (Creating a Volumewith the V Instruction through Executing Commands with the X Instruction).

Variable Source for value$AUTO Previous G instructions in template

$MTPT -mount argument to uss add command or mount_point field of bulk input file add instruction,when in V instruction; V instruction’s mount_point field when in subsequent instructions

$NAME-realname argument to uss add command or mount_point field of bulk input file addinstruction, if provided; otherwise, -user argument to uss add command or username field ofin bulk input file add instruction

$PART -partition argument to uss add command or partition field of bulk input file add instruction

$PWEXPIRES -pwexpires argument to uss add command or password_expires field of bulk input file addinstruction

$SERVER -server argument to uss add command or file_server field of bulk input file add instruction

$UID -uid argument to uss add command or uid field of bulk input file add instruction, if provided;otherwise, allocated automatically by Protection Server

$USER -user argument to uss add command or username field of bulk input file add instruction

$1 through $9 -var argument to uss add command or var1 through var9 fields of bulk input file addinstruction

Table 11.1: Source for values of uss template variables

A common use of variables is to define the file server machine and partition that house the user’s volume, which often vary fromuser to user. Place the $SERVER variable in the V instruction’s server field, and the $PART variable in its partition field. If usingthe uss add command, provide the desired value with the -server and -partition arguments. If using the uss bulk command,provide the desired values in the file_server and partition fields of each user’s add instruction in the bulk input file.

The variables $1 through $9 can be used to customize other aspects of the account. Provide a value for these variables with the-var argument to the uss add command or in the appropriate field of the bulk input file add instruction. The -var argument isunusual in that each instance for it has two parts: the number index and the value, separated by a space. For examples of the useof a number variable, see the discussions of the mount_point and quota fields in Creating a Volume with the V Instruction.

If some aspect of account configuration is partly constant and partly variable, you can combine variables and constants inan instruction field. For example, suppose that the Example Corporation mounts user volumes in the /afs/example.com/usrdirectory. That part of the pathname is constant, but the name of the mount point and home directory is the user’s username,which corresponds to the $USER variable. To configure accounts in this way, combine a constant string and a variable in the Vinstruction’s mount_point field as follows:

/afs/example.com/usr/$USER

Then provide the value for the $USER variable with the -user argument to the uss add command, or in the username field ofeach user’s add instruction in the bulk input file.

11.5.3 Where to Place Template Files

A template must be available to the uss command interpreter as it executes a uss add or uss bulk command, even if it is thezero-length file appropriate for creating an authentication-only account.

If you do not provide the -template argument to the uss add or uss bulk command, then the command interpreter searches for atemplate file called uss.template in each of the following directories in turn:


1. The current working directory

2. /afs/cellname/common/uss, where cellname is the local cell

3. /etc

To use a template file with a different name or stored in a different directory, include the -template argument to the uss add oruss bulk command. If you provide a filename only, the command interpreter looks for it in the directories listed just previously.If you provide a pathname and filename, it looks only in the specified directory, interpreting a partial pathname relative to thecurrent working directory.

11.5.4 Some General Rules for Constructing a Template

This section summarizes some general rules to follow when constructing a template file. For each instruction’s syntax definition,see the following sections (Evenly Distributing User Home Directories with the G Instruction through Executing Commandswith the X Instruction).

• If a variable takes its value from an element elsewhere within the template, the definition must precede the reference. Puttingthe instruction lines in the following order usually results in correct resolution of variables:

G V D F E L S A X

• The fields in each instruction must appear in the order specified by the instruction’s syntax definition, which appear in thefollowing sections about each instruction. You cannot omit a field. Separate each field from its neighbors with one or morespaces.

• When specifying a pathname, provide a full one. Partial pathnames are interpreted relative to the current working directory(the one in which the uss command is issued), with possibly unintended results.

• Each instruction must appear on a single line in the template file, with a newline character (<Return>) only at the end of theinstruction. Some example instructions appear in this document on more than one line, but that is only for legibility.

• Provide a value for every variable that appears in the template by including the corresponding argument to the uss add com-mand or placing a value in the corresponding field of the bulk input file add instruction. A missing value halts the entire creationoperation. If a variable does not appear in the template file, the command interpreter ignores the corresponding command-lineargument or field in the bulk input file, even if you provide it.

• You can use blank lines in the template file to increase its legibility. If you place comments in the file, begin each commentline with the number sign (#).

11.5.5 About Creating Local Disk Directories and Files

It is possible to use the D, E, and F instructions to create directories or files in the local file system of the machine on which youare issuing the uss command, but that usage is not recommended. It introduces two potential complications:

• The local file system automatically assigns ownership of a new local disk directory or file to its creator. Because you are theissuer of the uss command that is creating the object, it records your current UNIX UID. If that is not appropriate and youwant to designate another owner as the object is created, then you must be logged in as the local superuser root (the local filesystem allows only the root user to issue the UNIX chown command, which the uss command interpreter invokes to changethe owner from the default value). You must also use the -admin argument to the uss add or uss bulk command to authenticateas a privileged AFS administrator. Only an administrator can create Authentication Database and Protection Database entries,which the uss command interpreter always creates as part of a new account.

The alternative is to become the local superuser root after the uss operation completes, and issue the necessary chown com-mand then. However, that makes the account creation process that much less automated.

• Creating a local disk directory always generates an error message because the uss command interpreter cannot successfully seta local directory’s ACL. The directory is created nevertheless, and a value still must appear in the D instruction’s ACL field.


11.5.6 Example uss Templates

This section describes example templates for the basic and full account types (the template for an authentication-only account isempty).

The first example creates a basic account. It contains two G instructions and a V instruction that defines the volume name, fileserver machine, partition, quota in kilobytes, mount point, home directory owner, and home directory access control list. In theExample Corporation cell, a suitable template is:

G /afs/.example.com/usr1G /afs/.example.com/usr2V user.$USER $SERVER.example.com /vicep$PART 5000 $AUTO/$USER $UID \

$USER all staff rl

When issuing the uss add command with this type of template, provide the following arguments:

• -user to specify the username for the $USER variable

• -server to specify the unique part of the file server machine name for the $SERVER variable

• -partition to specify the unique part of the partition name for the $PART variable

The Protection Server automatically assigns an AFS UID for the $UID variable, and the G instructions provide a value for the$AUTO variable.

The following example template file creates a full account in the Example Corporation cell. The following sections about eachtype of instruction describe the effect of the examples. Note that the V and E instructions appear on two lines each only for thesake of legibility.

## Specify the available grouping directories#G /afs/.example.com/usr1G /afs/.example.com/usr2## Create the user’s home volume#V user.$USER $SERVER.example.com /vicep$PART 5000 /afs/.example.com/$AUTO/$USER \

$UID $USER all abc:staff rl## Create directories and files for mail#D $MTPT/.MESSAGES 0700 $UID $USER all abc:staff noneD $MTPT/.Outgoing 0700 $UID $USER rlidwk postman rlidwkD $MTPT/Mailbox 0700 $UID $USER all abc:staff none system:anyuser lik## Here are some useful scripts for login etc.#F $MTPT/.Xbiff 0755 $UID /afs/example.com/admin/user/protoF $MTPT/.Xresources 0644 $UID /afs/example.com/admin/user/protoF $MTPT/.Xsession 0755 $UID /afs/example.com/admin/user/protoF $MTPT/.cshrc 0755 $UID /afs/example.com/admin/user/protoF $MTPT/.login 0755 $UID /afs/example.com/admin/user/protoF $MTPT/.logout 0755 $UID /afs/example.com/admin/user/protoF $MTPT/.twmrc 0644 $UID /afs/example.com/admin/user/protoF $MTPT/preferences 0644 $UID /afs/example.com/admin/user/proto## Make a passwd entry#E /afs/.example.com/common/etc/newaccts/passwd_$USER 0644 root \

"$USER:X:$UID:11:$NAME:$MTPT:/bin/csh"#


# Put in the standard password/authentication checks#A $USER 250 noreuse 9 25## Create and mount a public volume for the user#X "create_public_vol $USER $1 $2"## Here we set up the symbolic link to public directory#S /afs/example.com/public/$USER $MTPT/public

11.5.7 Evenly Distributing User Home Directories with the G Instruction

In cells with thousands of user accounts, it often makes sense to distribute the mount points for user volumes into multiple parentdirectories, because placing them all in one directory noticeably slows down directory lookup when a user home directory is ac-cessed. A possible solution is to create parent directories that group user home directories alphabetically, or that reflect divisionslike academic or corporate departments. However, in a really large cell, some such groups can still be large enough to slow direc-tory lookup, and users who belong to those groups are unfairly penalized every time they access their home directory. Anotherdrawback to groupings that reflect workplace divisions is that you must move mount points when users change departmentalaffiliation.

An alternative is an even distribution of user home directories into multiple parent directories that do not represent workplacedivisions. The uss command suite enables you to define a list of directories by placing a G instruction for each one at thetop of the template file, and then using the $AUTO variable in the V instruction’s mount_point field. When the uss commandinterpreter encounters the $AUTO variable, it substitutes the directory named by a G instruction that currently has the fewestentries. (Actually, the $AUTO variable can appear in any field that includes a pathname, in any type of instruction. In all cases,the command interpreter substitutes the directory that currently has the fewest entries.)

The G instruction’s syntax is as follows:

G directory

where directory specifies either a complete directory pathname or only the final element (the directory itself). The choicedetermines the appropriate value to place in the V instruction’s mount_point field.

Specify the read/write path to each directory, to avoid the failure that results when you attempt to create a new mount point in aread-only volume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). For further discussion of the concept of read/write and read-only paths throughthe filespace, see Mounting Volumes.

For example, the Example Corporation example template for a full account in Example uss Templates defines two directories:

G /afs/.example.com/usr1G /afs/.example.com/usr2

and puts the value $AUTO/$USER in the V instruction’s mount_point field. An alternative with the same result is to define thedirectories as follows:

G usr1G usr2

and specify a more complete pathname in the V instruction’s mount_point field: /afs/.example.com/$AUTO/$USER.

11.5.8 Creating a Volume with the V Instruction

Unless the template file is empty (zero-length), one and only one V instruction must appear in it. (To create other volumes for auser as part of a uss account-creation operation, use the X instruction to invoke the vos create command or a script that invokes


that command along with others, such as the fs mkmount command. For an example, see Executing Commands with the XInstruction.)

The V instruction defines the following AFS entities:

• A volume and associated VLDB entry

• The volume’s site (file server machine and partition)

• The volume’s mount point in the AFS filespace, which becomes the user’s home directory

• The volume’s space quota

• The home directory’s owner, usually the new user

• The home directory’s ACL, which normally at least grants all permissions to the user

The following discussion of the fields in a V instruction refers to the example in the full account template from Example ussTemplates (the instruction appears here on two lines only for legibility):

V user.$USER $SERVER.example.com /vicep$PART 5000 \/afs/.example.com/$AUTO/$USER $UID $USER all abc:staff rl

The V instruction’s syntax is as follows:

V volume_name server partition quota mount_point owner ACL

where

V Indicates a volume creation instruction.

volume_name Specifies the volume’s name as recorded in the VLDB.

To follow the convention of including the user’s name as part of the volume name, include the $USER variable in this field.The variable takes its value from the -user argument to the uss add command or from the bulk input file add instruction’susername field.

The Example Corporation example uses the value user.$USER to assign the conventional volume name, user.username.When creating an account for user smith, for example, you then include -user smith as an argument to the uss addcommand, or place the value smith in the bulk input file add instruction’s username field.

server Names the file server machine on which to create the new volume. It is best to provide a fully qualified host name (forexample, fs1.example.com), but an abbreviated form is acceptable if the cell’s naming service is available to resolve it atthe time the volume is created.

To place different users’ volumes on different file server machines, use the $SERVER variable in this field, and providea value for it either with the -server argument to the uss add command or in the server field of the bulk input file addinstruction. One easy way to specify a fully qualified hostname without having to type it completely on the command lineis to combine a constant and the $SERVER variable. Specifically, the constant specifies the domain-name suffix commonto all the file server machines.

In the Example Corporation example, all of the file server machines in the cell share the example.com domain name suffix,so the server field combines a variable and constant: $SERVER.example.com. To place the new volume on the machinefs1.example.com, you then include -server fs1 as an argument to the uss add command, or place the value fs1 in the bulkinput file add instruction’s server field.

partition Specifies the partition on which to create the user’s volume; it must be on the file server machine named in the serverfield. Identify the partition by its complete name (for example, /vicepa) or use one of the abbreviations listed in Rules forUsing Abbreviations and Aliases.

To place different users’ volumes on different partitions, use the $PART variable in this field, and provide a value for iteither with the -partition argument to the uss add command or in the partition field of the bulk input file add instruction.Because all full partition names start with the /vicep string, it is convenient to combine that string as a constant with the$PART variable.

The Example Corporation example template combines the constant string /vicep and the $PART variable in this way, as/vicep$PART.


quota Sets the maximum number of kilobyte blocks the volume can occupy on the file server machine’s disk. It must be aninteger. If you assign the same quota to all user volumes, specify a constant value. To assign different quotas to differentvolumes, place one of the number variables ($1 through $9) in this field, and provide a value for it either with the -varargument to the uss add command or in the appropriate field of the bulk input file add instruction.

The Example Corporation example grants a 5000 KB initial quota to every new user.

mount_point Creates a mount point for the volume, which serves as the volume’s root directory and the user’s home directory.By convention, user home directory names include the username, which you can read in by including the $USER variablein this field.

Specify the read/write path to the mount point, to avoid the failure that results when you attempt to create the new mountpoint in a read-only volume. By convention, you indicate the read/write path by placing a period before the cell name atthe pathname’s second level (for example, /afs/.example.com). If you use the $AUTO variable in this field, the directo-ries named by each G instruction possibly already indicate the read/write path. For further discussion of the concept ofread/write and read-only paths through the filespace, see Mounting Volumes.

If other parts of the mount point name also vary from user to user, you can use the $MTPT variable in this field, and providea value with the uss add command’s -mount argument or in the mount_point field of a bulk input file add instruction.Note, however, that when the $MTPT variable appears in subsequent instructions in the template (usually, in D, E, or Finstructions), it instead takes as its value the complete contents of this field.

Combine constants and variables based on how you have decided to group home directories together in one or moreparent directories. Note that the parent directories must already exist before you run a uss add or uss bulk command thatreferences the template. Possibilities for grouping home directories include the following:

• Placing all user home directories in a single parent directory; the name /afs/cellname/usr is an AFS-appropriate vari-ation on the UNIX /usr convention. This choice is most appropriate for a cell with a small number of user ac-counts. The simplest way to implement this choice is to combine a constant string and the $USER variable, as in/afs/.example.com/usr/$USER.

• Distributing home directories evenly into a set of parent directories that do not correspond to workplace divisions. Thischoice is appropriate in cells with tens of thousands of accounts, where the number of home directories is large enoughto slow directory lookup significantly if they all reside together in one parent directory, but distribution according toworkplace divisions is not feasible.The $AUTO variable is designed to distribute home directories evenly in this manner. As explained in Evenly Distribut-ing User Home Directories with the G Instruction, the uss command interpreter substitutes the directory that is definedby a preceding G template instruction and that currently has the fewest entries. The example Example Corporationtemplate illustrates this choice by using the value /afs/.example.com/$AUTO/$USER.

• Distributing home directories into multiple directories that reflect divisions like academic or corporate departments.Perhaps the simplest way to implement this scheme is to use the $MTPT variable to represent the department, as in/afs/.example.com/usr/$MTPT/$USER. You then provide -user smith and -mount acctg arguments to the uss addcommand to create the mount point /afs/.example.com/usr/acctg/smith.

• Distributing home directories into alphabetic subdirectories of usr (usr/a, usr/b and so on), based on the first letter orletters in the username. The advantage is that knowing the username enables you easily to locate a home directory. Apotential drawback is that the distribution is not likely to be even, and if there are a large number of accounts, thenslowed directory lookup unfairly affects users whose names begins with popular letters.Perhaps the simplest way to implement this scheme is to use the $MTPT variable to represent the letter or letters, asin /afs/.example.com/usr/$MTPT/$USER. Then provide the -user smith and -mount s/m arguments to the uss addcommand to create the mount point /afs/.example.com/usr/s/m/smith.

owner Specifies the username or UID of the user to be designated the mount point’s owner in the output from the UNIX ls-ld command. To follow the standard convention for home directory ownership, use the $UID variable in this field, as inthe Example Corporation example template. The Protection Server then automatically assigns an AFS UID unless youprovide the -uid argument to the uss add command or fill in the uid field in the bulk input file add instruction. (If you areconverting existing UNIX accounts, see the discussion of additional considerations in Converting Existing UNIX Accountswith uss.)

ACL Sets the ACL on the new home directory. Provide one or more paired values, each pair consisting of an AFS username orgroup name and the desired permissions, in that order (a group name must already exist in the Protection Database to be


used). Separate the two parts of the pair, and each pair, with a space. For a discussion of the available permissions, seeThe AFS ACL Permissions.

At minimum, grant all permissions to the new user by including the value $USER all in this field. The File Serverautomatically grants all permissions to the system:administrators group as well. You cannot grant permissions to theissuer of the uss command, because as the last step in account creation the uss command interpreter automatically deletesthat user from any ACLs set during the creation process.

The Example Corporation example uses the following value to grant all permissions to the new user and r (read) and l(lookup) permissions to the members of the abc:staff group:

$USER all abc:staff rl

11.5.9 Creating a Directory with the D Instruction

Each D instruction in the template file creates a directory; there is no limit on the number of them in the template. If a D instructioncreates a subdirectory in a new user’s home directory (its intended use), then it must follow the V instruction. Creating a directoryon the local disk of the machine where the uss command runs is not recommended for the reasons outlined in About CreatingLocal Disk Directories and Files.

The following discussion of the fields in a D instruction refers to one of the examples in the full account template in Example ussTemplates:

D $MTPT/Mailbox 0700 $UID $USER all abc:staff none system:anyuser lik

The D instruction’s syntax is as follows:

D pathname mode_bits owner ACL

where

D Indicates a directory creation instruction.

pathname Specifies the directory’s full pathname. If it is a subdirectory of the user’s home directory, it is simplest to use the$MTPT variable to specify the home directory pathname. When the $MTPT variable appears in a D instruction, it takesits value from the preceding V instruction’s mount_point field (this dependency is why a D instruction must follow the Vinstruction).

Specify the read/write pathname to the directory, to avoid the failure that results when you attempt to create a new directoryin a read-only volume. By convention, you indicate the read/write path by placing a period before the cell name at thepathname’s second level (for example, /afs/.example.com). If you use the $MTPT variable in this field, the value in theV instruction’s mount_point field possibly already indicates the read/write path. For further discussion of the concept ofread/write and read-only paths through the filespace, see Mounting Volumes.

The Example Corporation example uses the value $MTPT/Mailbox to place the Mailbox subdirectory in the user’s homedirectory.

mode_bits Defines the directory’s UNIX mode bits. Acceptable values are the standard three- or four-digit numbers correspond-ing to a combination of permissions. Examples: 0755 corresponds to rwxr-xr-x, and 0644 to rw-r--r--. The first (owner)x bit must be turned on to enable access to a directory.

The Example Corporation example uses the value 0700 to set the mode bits on the Mailbox subdirectory to rwxr-----.

owner Specifies the username or UID of the user to be designated the directory’s owner in the output from the UNIX ls -ldcommand.

If the directory resides in AFS, place the $UID variable in this field, as in the Example Corporation example template. TheProtection Server then automatically assigns an AFS UID unless you provide the -uid argument to the uss add command orfill in the uid field in the bulk input file add instruction. (If you are converting existing UNIX accounts, see the discussionof additional considerations in Converting Existing UNIX Accounts with uss.)

If the directory resides on the local disk, it is simplest to specify the username or UNIX UID under which you are issuingthe uss command. For a discussion of the complications that arise from designating another user, see About Creating LocalDisk Directories and Files.


ACL Sets the ACL on the new directory. Provide one or more paired values, each pair consisting of an AFS username or groupname and the desired permissions, in that order (a group name must already exist in the Protection Database to be used).Separate the two parts of the pair, and each pair, with a space. For a description of the available permissions, see The AFSACL Permissions.

At minimum, grant all permissions to the new user by including the value $USER all. You cannot grant permissions to theissuer of the uss command, because as the last step in account creation the uss command interpreter automatically deletesthat user from any ACLs set during the creation process. An error message always appears if the directory is on the localdisk, as detailed in About Creating Local Disk Directories and Files.

The Example Corporation example uses the following value to grant all permissions to the new user, no permissionsto the members of the abc:staff group, and the l (lookup), i (insert), and k (lock) permissions to the members of thesystem:anyuser group:

$USER all abc:staff none system:anyuser likIt grants such extensive permissions to the system:anyuser group to enable any system user (including a mail-deliverydaemon) to insert mail into the Mailbox directory. The absence of the r (read) permission prevents members of thesystem:anyuser group from reading the mail files.

11.5.10 Creating a File from a Prototype with the F Instruction

Each F instruction in the template file creates a file by copying the contents of an existing prototype file; there is no limit on thenumber of them in the template, and each can refer to a different prototype. If an F instruction creates a file in a new user’s homedirectory or a subdirectory of it (the intended use), then it must follow the V or D instruction that creates the parent directory.Creating a file on the local disk of the machine where the uss command runs is not recommended for the reasons detailed inAbout Creating Local Disk Directories and Files.

The E instruction also creates a file, but the two types of instruction have complementary advantages. Files created with an Einstruction can be customized for each user, because variables can appear in the field that specifies the contents of the file. Incontrast, the contents of a file created using the F instruction are the same for every user. An E file can be only a single line,however, whereas an F file can be any length.

The following discussion of the fields in a F instruction refers to one of the examples in the full account template in Example ussTemplates:

F $MTPT/.login 0755 $UID /afs/example.com/admin/user/proto

The F instruction’s syntax is as follows:

F pathname mode_bits owner prototype_file

where

F Indicates a file creation instruction.

pathname Specifies the full pathname of the file to create, including the filename. If it resides in the user’s home directory ora subdirectory of it, it is simplest to use the $MTPT variable to specify the home directory pathname. When the $MTPTvariable appears in an F instruction, it takes its value from the preceding V instruction’s mount_point field (this dependencyis why an F instruction must follow the V instruction).

Specify the read/write path to the file, to avoid the failure that results when you attempt to create a new file in a read-only volume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). If you use the $MTPT variable in this field, the value in the V instruction’smount_point field possibly already indicates the read/write path. For further discussion of the concept of read/write andread-only paths through the filespace, see Mounting Volumes.

The Example Corporation example uses the value $MTPT/.login to place a file called .login in the user’s home directory.

mode_bits Defines the file’s UNIX mode bits. Acceptable values are the standard three- or four-digit numbers corresponding toa combination of permissions. Examples: 0755 corresponds to rwxr-xr-x, and 0644 to rw-r--r--.The Example Corporation example uses the value 0755 to set the mode bits on the .login file to rwxr-xr-x.


owner Specifies the username or UID of the user to be designated the file’s owner in the output from the UNIX ls -l command.

If the file resides in AFS, place the $UID variable in this field, as in the Example Corporation example template. TheProtection Server then automatically assigns an AFS UID unless you provide the -uid argument to the uss add commandor fill in the uid field in the bulk input file add instruction. (If you are converting existing UNIX accounts, see the discussionof additional considerations in Converting Existing UNIX Accounts with uss.)

If the file resides on the local disk, it is simplest to specify the username or UNIX UID under which you are issuing the usscommand. For a discussion of the complications that arise from designating another user, see About Creating Local DiskDirectories and Files.

prototype_file Names the AFS or local directory that houses the prototype file to copy. The prototype file’s name must matchthe final element in the pathname field.

The Example Corporation example references a prototype file called .login in the directory /afs/example.com/admin/user/proto.

11.5.11 Creating One-Line Files with the E Instruction

Each E instruction in the template file creates a file by echoing a specified single line into it; there is no limit on the number ofthem in the template. If an E instruction creates a file in a new user’s home directory or a subdirectory of it (the intended use),then it must follow the V or D instruction that creates the parent directory. Creating a file on the local disk of the machine wherethe uss command runs is not recommended for the reasons detailed in About Creating Local Disk Directories and Files.

The F instruction also creates a file, but the two types of instruction have complementary advantages. Files created with an Einstruction can be customized for each user, because variables can appear in the field that specifies the contents of the file. Thecommand interpreter replaces the variables with appropriate values before creating the file. In contrast, the contents of a filecreated using the F instruction are the same for every user. An E file can be only a single line, however, whereas an F file can beany length.

The E instruction is particularly suited to creating an entry for the new user in the cell’s common source password file, whichis then copied to client machines to serve as the local password file (/etc/passwd or equivalent). The following discussion ofthe fields refers to an example of this type of use, from the Example Corporation’s full account template shown in Example ussTemplates. For further discussion of how to incorporate the files created in this way into a common source password file, seeCreating a Common Source Password File.

E /afs/.example.com/common/etc/newaccts/passwd_$USER 0644 root \"$USER:X:$UID:11:$NAME:$MTPT:/bin/csh"

The E instruction’s syntax is as follows:

E pathname mode_bits owner "contents"

where

E Indicates a file creation instruction.

pathname Specifies the full pathname of the file to create, including the filename. It can include variables. If it resides inthe user’s home directory or a subdirectory of it, it is simplest to use the $MTPT variable to specify the home directorypathname. When the $MTPT variable appears in an E instruction, it takes its value from the preceding V instruction’smount_point field (this dependency is why an E instruction must follow the V instruction.)

Specify the read/write path to the file, to avoid the failure that results when you attempt to create a new file in a read-only volume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). If you use the $MTPT variable in this field, the value in the V instruction’smount_point field possibly already indicates the read/write path. For further discussion of the concept of read/write andread-only paths through the filespace, see Mounting Volumes.

The Example Corporation example writes the file created by the E instruction to /afs/.example.com/common/etc/newacctsdirectory, naming it after the new user:

/afs/.example.com/common/etc/newaccts/passwd_$USER


mode_bits Defines the file’s UNIX mode bits. Acceptable values are the standard three- or four-digit numbers corresponding toa combination of permissions. Examples: 0755 corresponds to rwxr-xr-x, and 0644 to rw-r--r--.The Example Corporation example uses the value 0644 to set the mode bits on the passwd_user file to r-xr--r--.

owner Specifies the username or UID of the user to be designated the file’s owner in the output from the UNIX ls -l command.

If the file resides in AFS and is to be owned by the user, place the $UID variable in this field. The Protection Serverthen automatically assigns an AFS UID unless you provide the -uid argument to the uss add command or fill in the uidfield in the bulk input file add instruction. (If you are converting existing UNIX accounts, see the discussion of additionalconsiderations in Converting Existing UNIX Accounts with uss.)

If the file resides on the local disk, specify the username or UNIX UID under which you are issuing the uss command. Fora discussion of the complications that arise from designating another user, see About Creating Local Disk Directories andFiles.

The Example Corporation example is creating an AFS file intended for incorporation into the common password file,rather than for direct use by the new user. It therefore designates the local superuser root as the owner of the new file.Designating an alternate owner on an AFS file does not introduce complications: issuing the chown command on AFS filesrequires membership in the system:administrators group, but the issuer of the uss command is necessarily authenticatedas a member of that group.

contents Specifies the one-line character string to write into the new file. Surround it with double quotes if it contains one ormore spaces. It cannot contain the newline character, but can contain any of the standard variables, which the commandinterpreter resolves as it creates the file.

The Example Corporation example has the following value in the contents field, to create a password file entry:

$USER:X:$UID:10:$NAME:$MTPT:/bin/csh

11.5.12 Creating Links with the L and S Instructions

Each L instruction in the template file creates a hard link between two files, as achieved by the standard UNIX ln command. TheS instruction creates a symbolic link between two files, as achieved by the standard UNIX ln -s command. An explanation oflinks is beyond the scope of this document, but the basic effect in both cases is to create a second name for an existing file, sothat it can be accessed via either name. Creating a link does not create a second copy of the file.

There is no limit on the number of L or S instructions in a template file. If the link is in a new user’s home directory or asubdirectory of it (the intended use), then it must follow the V or D instruction that creates the parent directory, and the F, E, orX instruction that creates the file being linked to. Creating a file on the local disk of the machine where the uss command runs isnot recommended, for the reasons detailed in About Creating Local Disk Directories and Files.

Note that AFS allows hard links only between files that reside in the same directory. This restriction is necessary to eliminate theconfusion that results from associating two potentially different ACLs (those of the two directories) with the same file. Symboliclinks are legal between two files that reside in different directories and even in different volumes. The ACL on the actual fileapplies to the link as well.

You do not set the owner or mode bits on a link created with an L or S instruction, as you do for directories or files. The usscommand interpreter automatically records the UNIX UID of the uss command’s issuer as the owner, and sets the mode bits tolrwxrwxrwx (777).

The following discussion of the fields in an L or S instruction refers to an example in the full account template from Exampleuss Templates, namely

S /afs/example.com/public/$USER $MTPT/public

The L and S instructions’ syntax is as follows:

L existing_file linkS existing_file link

where


L Indicates a hard link creation instruction.

S Indicates a symbolic link creation instruction.

existing_file Specifies the complete pathname of the existing file. If it resides in the user’s home directory or a subdirectory ofit, it is simplest to use the $MTPT variable to specify the home directory pathname. When the $MTPT variable appears inan L or S instruction, it takes its value from the preceding V instruction’s mount_point field (this dependency is why theinstruction must follow the V instruction).

Do not create a symbolic link to a file whose name begins with the number sign (#) or percent sign (%). When the CacheManager reads a symbolic link whose contents begin with one of those characters, it interprets it as a regular or read/writemount point, respectively.

The Example Corporation example creates a link to the publicly readable volume created and mounted by a preceding Xinstruction, by specifying the path to its mount point:

/afs/example.com/public/$USER

link Specifies the complete pathname of the second name for the file. If it resides in the user’s home directory or a subdirectoryof it, it is simplest to use the $MTPT variable to specify the home directory pathname.

Specify the read/write path to the link, to avoid the failure that results when you attempt to create a new link in a read-only volume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). If you use the $MTPT variable in this field, the value in the V instruction’smount_point field possibly already indicates the read/write path. For further discussion of the concept of read/write andread-only paths through the filespace, see Mounting Volumes.

The Example Corporation example creates a link called public in the user’s home directory:

$MTPT/public

11.5.13 Increasing Account Security with the A Instruction

The A instruction in the template file enhances cell security by imposing the following restrictions on users’ password choiceand authentication attempts.

• Limiting the user’s password lifetime. When the lifetime expires, the user can no longer use the password to authenticate andmust change it.

• Prohibiting the reuse of the user’s 20 most-recently used passwords.

• Limiting the number of consecutive times that a user can provide an incorrect password during authentication, and for howlong the Authentication Server refuses further authentication attempts after the limit is exceeded (referred to as an accountlockout). For regular user accounts in most cells, the recommended limit is nine and lockout time is 25 minutes.

The following discussion of the fields in an A instruction refers to the example in the full account template from Example ussTemplates, which sets a password lifetime of 250 days, prohibits reuse of passwords, limits the number of failed authenticationattempts to nine, and creates a lockout time of 25 minutes if the authentication limit is exceeded:

A $USER 250 noreuse 9 25

The A instruction’s syntax is as follows:

A username password_lifetime password_reuse failures locktime

where

A Indicates a security enhancing instruction.


username Names the Authentication Database entry on which to impose security restrictions. Use the $USER variable to readin the username from the uss add command’s -user argument, or from the username field of an add instruction in the bulkinput file. The Example Corporation example uses this value.

password_lifetime Sets the number of days after the user’s password is changed that it remains valid. When the passwordbecomes invalid (expires), the user is unable to authenticate, but has 30 more days in which to issue the kpasswd commandto change the password (after that, only an administrator can change it).

Specify an integer from the range 1 through 254 to specify the number of days until expiration, the value 0 to indicatethat the password never expires, or the value $PWEXPIRES to read in the number of days from the uss add or uss bulkcommand’s -pwexpires argument. If the A instruction does not appear in the template file, by default the user’s passwordnever expires.

The Example Corporation example sets a password lifetime of 250 days.

password_reuse Determines whether or not the user can change his or her password (using the kpasswd or kas setpasswordcommand) to one that is similar to any of his or her last 20 passwords. The acceptable values are reuse to allow reuse andnoreuse to prohibit it. If the A instruction does not appear in the template file, the default is to allow password reuse.

The Example Corporation example prohibits password reuse.

failures Sets the number of consecutive times the user can provide an incorrect password during authentication (using the klogcommand or a login utility that grants AFS tokens). When the user exceeds the limit, the Authentication Server rejectsfurther authentication attempts for the amount of time specified in the locktime field.

Specify an integer from the range 1 through 254 to specify the number of failures permitted, or the value 0 to indicate thatthere is no limit to the number of unsuccessful attempts. If the A instruction does not appear in the template file, the defaultis to allow an unlimited number of failures.

The Example Corporation example sets the limit to nine failed attempts.

locktime Specifies how long the Authentication Server refuses authentication attempts from a user who has exceeded the failurelimit set in the failures field.

Specify a number of hours and minutes (hh:mm) or minutes only (mm), from the range 01 (one minute) through 36:00 (36hours). The Authentication Server automatically reduces any larger value to 36:00 and also rounds up any nonzero valueto the next highest multiple of 8.5 minutes. A value of 0 (zero) sets an infinite lockout time, in which case an administratormust always issue the kas unlock command to unlock the account.

The Example Corporation example sets the lockout time to 25 minutes, which is rounded up to 25 minutes 30 seconds (thenext highest multiple of 8.5 minutes).

11.5.14 Executing Commands with the X Instruction

The X instruction in the template file executes a command, which can be a standard UNIX command, a shell script or program,or an AFS command. The command string can include standard template variables, and any number of X instructions can appearin a template file. If an instruction manipulates an element created by another instruction, it must appear after that instruction.

The following discussion of the field in an X instruction refers to the example in the full account template from Example ussTemplates:

X "create_public_vol $USER $1 $2"

The X instruction’s syntax is as follows:

X "command"

where command specifies the command to execute. Surround it with double quotes if it contains spaces. The command stringcan contain any of the standard variables, which the uss command interpreter resolves before passing the command on to theappropriate other command interpreter, but it cannot contain newline characters.

The Example Corporation example invokes a script called create_public_vol, which creates another volume associated with thenew user and mounts it in a publicly readable part of the Example Corporation’s filespace:


"create_public_vol $USER $1 $2"

It uses the $USER variable to read in the username and make it part of both the volume name and mount point name. The usscommand issuer supplies a file server machine name for the $1 variable and a partition name for the $2 variable, to specify thesite for the new volume.

11.6 Creating Individual Accounts with the uss add Command

After you have created a template file, you can create an individual account by issuing the uss add command (for templatecreation instructions see Constructing a uss Template File). When you issue the command, the uss command interpreter contactsvarious AFS servers to perform the following actions:

• Create a Protection Database entry. By default, the Protection Server assigns an AFS UID which becomes the value of the$UID variable used in the template.

• Create an Authentication Database entry, recording an encrypted version of the initial password.

• Create the account components defined in the indicated template file, contacting the File Server, Volume Server, and VolumeLocation (VL) Server as necessary.

To review which types of instructions to include in a template to create different file system objects, see Constructing a ussTemplate File. If the template is empty, the uss add command creates an authentication-only account consisting of ProtectionDatabase and Authentication Database entries.

When you issue the uss add command, provide a value for each variable in the template file by including the correspondingcommand-line argument. If you fail to supply a value for a variable, the uss command interpreter substitutes a null string, whichusually causes the account creation to fail. If you include a command line argument for which the corresponding variable doesnot appear in the template, it is ignored.

Table 4 summarizes the mappings between variables and the arguments to the uss add command. It is adapted from Table 3, butincludes only those variables that take their value from command line arguments.

Variable Command-line Argument$MTPT -mount (for occurrence in V instruction)$NAME -realname if provided; otherwise -user$PART -partition$PWEXPIRES -pwexpires$SERVER -server$UID -uid if provided; otherwise allocated by Protection Server$USER -user$1 through $9 -var

Table 11.2: Command-line argument sources for uss template variables

11.6.1 To create an AFS account with the uss add command

1. Authenticate as an AFS identity with all of the following privileges. In the conventional configuration, the admin useraccount has them, or you possibly have a personal administrative account. (To increase cell security, it is best to create spe-cial privileged accounts for use only while performing administrative procedures; for further discussion, see An Overviewof Administrative Privilege.) If necessary, issue the klog command to authenticate.

% klog admin_userPassword: <admin_password>


The following list specifies the necessary privileges and indicates how to check that you have them.

• Membership in the system:administrators group. If necessary, issue the pts membership command, which is fullydescribed in To display the members of the system:administrators group.


• Inclusion in the /usr/afs/etc/UserList file. If necessary, issue the bos listusers command, which is fully described in Todisplay the users in the UserList file.


• The ADMIN flag on the Authentication Database entry. However, the Authentication Server always prompts you for apassword in order to perform its own authentication. The following instructions direct you to specify the administrativeidentity on the uss command line itself.

• The i (insert) and l (lookup) permissions on the ACL of the directory in which you are mounting the user’s volume. Ifnecessary, issue the fs listacl command, which is fully described in Displaying ACLs.


Members of the system:administrators group always implicitly have the a (administer) and by default also the l(lookup) permission on every ACL and can use the fs setacl command to grant other rights as necessary.

2. (Optional) Log in as the local superuser root. This is necessary only if you are creating new files or directories in the localfile system and want to designate an alternate owner as the object is created. For a discussion of the issues involved, seeAbout Creating Local Disk Directories and Files.

3. Verify the location and functionality of the template file you are using. For a description of where the uss commandinterpreter expects to find the template, see Where to Place Template Files. You can always provide an alternate pathnameif you wish. Also note the variables used in the template, to be sure that you provide the corresponding arguments on theuss command line.

4. (Optional) Change to the directory where the template resides. This affects the type of pathname you must type in Step 6.

% cd template_directory

5. (Optional) Run the uss add command with the -dryrun flag to preview the creation of the account. Note any errormessages and correct the cause before reissuing the command without the -dryrun flag. The next step describes theuss add command’s syntax. For more information on the -dryrun flag, see Avoiding and Recovering from Errors andInterrupted Operations.

6. Issue the uss add command to create the account. Enter the command on a single line; it appears here on multiple linesonly for legibility.

The uss add operation creates an Authentication Database entry. The Authentication Server performs its own authentica-tion rather than accepting your existing AFS token. By default, it authenticates your local (UNIX) identity, which possiblydoes not correspond to an AFS-privileged administrator. Include the -admin argument to name an identity that has theADMIN flag on its Authentication Database entry. To verify that an entry has the flag, issue the kas examine command asdescribed in To check if the ADMIN flag is set.

% uss add -user <login name> -admin <administrator to authenticate> \[-realname <full name in quotes>] [-pass <initial passwd>] \[-pwexpires <password expires in [0..254] days (0 => never)>] \[-server <FileServer for home volume>] \[-partition <FileServer’s disk partition for home volume>] \[-mount <home directory mount point>] \[-uid <uid to assign the user>] \[-template <pathname of template file>] \[-var <auxiliary argument pairs (Numval)>+] [-dryrun] \[-overwrite]



where

ad Is the shortest acceptable abbreviation of add.

-user Names the user’s Authentication Database and Protection Database entries. Because it becomes the username (thename under which a user logs in), it must obey the restrictions that many operating systems impose on usernames(usually, to contain no more than eight lowercase letters). Also avoid the following characters: colon (:), semi-colon (;), comma (,), at sign (@), space, newline, and the period (.), which is conventionally used only in specialadministrative names.This argument provides the value for the $USER variable in the template file. For suggestions on standardizingusernames, see Choosing Usernames and Naming Other Account Components.

-admin Names an administrative account that has the ADMIN flag on its Authentication Database entry, such as admin.The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.

-realname Specifies the user’s actual full name. If it contains spaces or punctuation, surround it with double quotes. Ifyou do not provide it, it defaults to the username provided with the -user argument.This argument provides the value for the $NAME variable in the template file. For information about using thisargument and variable as part of an automated process for creating entries in a local password file such as /etc/passwd,see Creating a Common Source Password File.

-pass Specifies the user’s initial password. Although the AFS commands that handle passwords accept strings of virtuallyunlimited length, it is best to use a password of eight characters or less, which is the maximum length that manyapplications and utilities accept.Possible choices for initial passwords include the username, a string of digits such as those from a Social Securitynumber, or a standard string such as changeme, which is the default if you do not provide this argument. There is nocorresponding variable in the template file.Instruct users to change their passwords to a truly secret string as soon as they authenticate with AFS for the firsttime. The OpenAFS User Guide explains how to use the kpasswd command to change an AFS password.

-pwexpires Sets the number of days after a user’s password is changed that it remains valid. Provide an integer from therange 1 through 254 to specify the number of days until expiration, or the value 0 to indicate that the password neverexpires (the default if you do not provide this argument). When the password becomes invalid (expires), the user isunable to authenticate, but has 30 more days in which to issue the kpasswd command to change the password; afterthat, only an administrator can change it.This argument provides the value for the $PWEXPIRES variable in the template file.

-server Names the file server machine on which to create the new user’s home volume. It is best to provide a fullyqualified hostname (for example, fs1.example.com), but an abbreviated form is acceptable provided that the cell’snaming service is available to resolve it when you issue the uss add command.This argument provides the value for the $SERVER variable in the template file. To avoid having to type a fullyqualified hostname on the command line, combine the $SERVER variable with a constant (for example, the cell’sdomain name) in the server field of the V instruction in the template file. For an example, see Creating a Volume withthe V Instruction.

-partition Specifies the partition on which to create the user’s home volume; it must be on the file server machine namedby the -server argument. Identify the partition by its complete name (for example, /vicepa), or use one of theabbreviations listed in Rules for Using Abbreviations and Aliases.This argument provides the value for the $PART variable in the template file.

-mount Specifies the pathname for the user’s home directory in the cell’s read/write filespace. Partial pathnames areinterpreted relative to the current working directory.This argument provides the value for the $MTPT variable in the template file, but only when it appears in the Vinstruction’s mount_point field. When the $MTPT variable appears in any subsequent instructions, it takes its valuefrom the V instruction’s mount_point field, rather than directly from this argument. For more details, and for sugges-tions about how to use this argument and the $MTPT variable, see Creating a Volume with the V Instruction.

-uid Specifies a positive integer other than 0 (zero) to assign as the user’s AFS UID. It is best to omit this argument andallow the Protection Server to assign an AFS UID that is one greater than the current value of the max user idcounter. (To display the counter, use the pts listmax command as described in To display the AFS ID counters.)If you have a reason to use this argument (perhaps because the user already has a UNIX UID), first use the ptsexamine command to verify that there is no existing account with the desired AFS UID; if there is, the accountcreation process terminates with an error.


This argument provides the value for the $UID variable in the template file.

-template Specifies the pathname of the template file. If you omit this argument, the command interpreter searches for atemplate file called uss.template in each of the following directories in turn:

(a) The current working directory(b) /afs/cellname/common/uss, where cellname names the local cell(c) /etc

If you specify a filename other than uss.template but without a pathname, the command interpreter searches for itin the indicated directories. If you provide a full or partial pathname, the command interpreter consults the specifiedfile only; it interprets partial pathnames relative to the current working directory.If the specified template file is empty (zero-length), the command creates Protection and Authentication Databaseentries only.To learn how to construct a template file, see Constructing a uss Template File.

-var Specifies values for each of the number variables $1 through $9 that can appear in the template file. You can use thenumber variables to assign values to variables in the uss template file that are not part of the standard set.For each instance of this argument, provide two parts in the indicated order, separated by a space:

• The integer from the range 1 through 9 that matches the variable in the template file. Do not precede it with a dollarsign.

• A string of alphanumeric characters to assign as the value of the variable.

To learn about suggested uses for the number variables, see the description of the V instruction’s quota field inCreating a Volume with the V Instruction.

-dryrun Reports actions that the command interpreter needs to perform to run the command, without actually performingthem.

-overwrite Overwrites any directories, files, and links that exist in the file system and for which there are definitions in D,E, F, L, or S instructions in the template file named by the -template argument. If you omit this flag, the commandinterpreter prompts you once for confirmation that you want to overwrite all such elements.

7. If the new user home directory resides in a replicated volume, use the vos release command to release the volume, asdescribed in To replicate a read/write volume (create a read-only volume).


NoteThis step can be necessary even if the home directory’s parent directory is not itself a mount point for a replicated volume(and is easier to overlook in that case). For example, the Example Corporation template puts the mount points for uservolumes in the /afs/example.com/usr directory. Because that is a regular directory rather than a mount point, it residesin the root.cell volume mounted at the /afs/example.com directory. That volume is replicated, so after changing it bycreating a new mount point the administrator must issue the vos release command.

8. Create an entry for the new user in the local password file (/etc/passwd or equivalent) on each AFS client machine that heor she can log into. For suggestions on automating this step, see Creating a Common Source Password File.

Even if you do not use the automated method, set the user’s UNIX UID to match the AFS UID assigned automatically bythe Protection Server or assigned with the -uid argument. The new user’s AFS UID appears in the trace produced by theuss add output, or you can use the pts examine command to display it, as described in To display a Protection Databaseentry.

11.7 Deleting Individual Accounts with the uss delete Command

The uss delete command deletes an AFS user account according to the arguments you provide on the command line; unlike theuss add command, it does not use a template file. When you issue the command, the uss command interpreter contacts variousAFS servers to perform the following actions:


• Remove the mount point for the user’s home volume

• Remove the user’s home volume and delete the associated VLDB entry, unless you include the -savevolume flag

• Delete the user’s Authentication Database entry

• Delete the user’s Protection Database entry

Before issuing the uss delete command, you can also perform the following optional tasks:

• Copy the user’s home volume to tape or another permanent medium and record the username and UID on a reserved list. Thisinformation enables you to restore the user’s account easily if he or she returns to your cell. For information about using theAFS Backup System to back up volumes, see Configuring the AFS Backup System and Backing Up and Restoring AFS Data.

• If the user has exclusive use of any other volumes (such as a volume for storing project-related data), make a backup copy ofeach one and then remove it and its mount point as instructed in Removing Volumes and their Mount Points.

• Use the pts listowned command to display any groups that the user owns; instructions appear in To list the groups that a useror group owns. Decide whether to use the pts delete command to remove the groups or the pts chown command to transferownership to another user or group. Instructions appear in To delete Protection Database entries and To change a group’sowner. Alternatively, you can have the user remove or transfer ownership of the groups before leaving. A group that remains inthe Protection Database after its owner is removed is considered orphaned, and only members of the system:administratorsgroup can administer it.

You can automate some of these tasks by including exec instructions in the bulk input file and using the uss bulk command todelete the account. See Creating and Deleting Multiple Accounts with the uss bulk Command.

11.7.1 To delete an AFS account









• The d (delete) permission on the ACL of the directory that houses the user’s home directory. If necessary, issue the fslistacl command, which is fully described in Displaying ACLs.




2. Consider and resolve the issues discussed in the introduction to this section concerning the continued maintenance of adeleted user’s account information, owned groups, and volumes.

3. (Optional) Run the uss delete command with the -dryrun flag to preview the deletion of the account. Note any errormessages and correct the cause before reissuing the command without the -dryrun flag. The next step describes the ussdelete command’s syntax.

4. Issue the uss delete command to delete the account. Enter the command on a single line; it appears here on multiple linesonly for legibility.

The delete operation always removes the user’s entry from the Authentication Database. The Authentication Server per-forms its own authentication rather than accepting your existing AFS token. By default, it authenticates your local (UNIX)identity, which possibly does not correspond to an AFS-privileged administrator. Include the -admin argument to namean identity that has the ADMIN flag on its Authentication Database entry. To verify that an entry has the flag, issue the kasexamine command as described in To check if the ADMIN flag is set.

% uss delete -user <login name> \-mountpoint <mountpoint for user’s volume> \[-savevolume] -admin <administrator to authenticate> \[-dryrun]


where

d Is the shortest acceptable abbreviation of delete.

-user Names the entry to delete from the Protection and Authentication Databases.

-mountpoint Specifies the pathname of the mount point to delete (the user’s home directory). Unless the -savevolumeargument is included, the volume mounted there is also deleted from the file server machine where it resides, as is itsrecord from the VLDB. Partial pathnames are interpreted relative to the current working directory.Specify the read/write path to the mount point, to avoid the failure that results when you attempt to delete a mountpoint from a read-only volume. By convention, you indicate the read/write path by placing a period before the cellname at the pathname’s second level (for example, /afs/.example.com). For further discussion of the concept ofread/write and read-only paths through the filespace, see Mounting Volumes.

-savevolume Retains the user’s volume and VLDB entry.



5. If the deleted user home directory resided in a replicated volume, use the vos release command to release the volume, asdescribed in To replicate a read/write volume (create a read-only volume).


NoteThis step can be necessary even if the home directory’s parent directory is not itself a mount point for a replicated volume(and is easier to overlook in that case). For example, the Example Corporation template puts the mount points for uservolumes in the /afs/example.com/usr directory. Because that is a regular directory rather than a mount point, it residesin the root.cell volume mounted at the /afs/example.com directory. That volume is replicated, so after changing it bydeleting a mount point the administrator must issue the vos release command.

6. Delete the user’s entry from the local password file (/etc/passwd or equivalent) of each client machine. If you intend toreactivate the user’s account in the future, it is simpler to comment out the entry or place an asterisk (*) in the passwordfield.


11.8 Creating and Deleting Multiple Accounts with the uss bulk Command

The uss bulk command allows you to create and delete many accounts at once. Before executing the command, you must

• Construct a template if you plan to create any accounts, just as you must do before running the uss add command. The sametemplate applies to all accounts created by a single uss bulk command.

• Construct a bulk input file of instructions that create and delete accounts and execute any related commands, as described inConstructing a Bulk Input File.

11.8.1 Constructing a Bulk Input File

You can include five types of instructions in a bulk input file: add, delete, exec, savevolume, and delvolume. The followingsections discuss their uses.

Creating a User Account with the add Instruction

Each add instruction creates a single user account, and so is basically the equivalent of issuing one uss add command. There isno limit to the number of add instructions in the bulk input file.

As indicated by the following syntax statement, the order of the instruction’s fields matches the order of arguments to the ussadd command (though some of the command’s arguments do not have a corresponding field). Like the uss add command’sarguments, many of the fields provide a value for a variable in the uss template file. Each instruction must be a single line in thefile (have a newline character only at its end); it appears on multiple lines here only for legibility.

add username[:full_name][:initial_password][:password_expires][:file_server][:partition][:mount_point][:uid][:var1][:var2][:var3][:var4][:var5][:var6][:var7][:var8][:var9][:]

For a complete description of the acceptable values in each field, see the uss Bulk Input File reference page in the OpenAFSAdministration Reference, or the description of the corresponding arguments to the uss add command, in To create an AFSaccount with the uss add command. Following are some basic notes:

• Begin the line with the string add only, not uss add.

• Only the first argument, username, is required. It corresponds to the -user argument to the uss add command.

• Do not surround the full_name value with double quotes, even though you must use them around the value for the -realnameargument to the uss add command.

• If you want to omit a value for an argument, indicate an empty field by using two colons with nothing between them. Leavinga field empty is acceptable if the corresponding command line argument is optional or if the corresponding variable does notappear in the template file. For every field that precedes the last one to which you assign an actual value, you must eitherprovide a value or indicate an empty field. It is acceptable, but not necessary, to indicate empty fields after the last one in whichyou assign a value.

• After the last field, end the line with either a colon and newline character (<Return>), or a newline alone.

• The final nine fields are for assigning values to the number variables ($1 through $9), with the fields listed in increasingnumerical order. Specify the value only, not the variable number.

Deleting a User Account with the delete Instruction

Each delete instruction deletes a single user account, and so is basically the equivalent of issuing one uss delete command. Thereis no limit to the number of delete instructions in the bulk input file.

Like all instructions in the bulk input file, each delete instruction must be a single line in the file (have a newline character onlyat its end), even though it can cover multiple lines on a display screen. The curly braces ({ }) indicate two mutually exclusivechoices.


delete username:mount_point_path[:{ savevolume | delvolume }][:]

For a complete description of the acceptable values in each field, see the uss Bulk Input File reference page in the OpenAFSAdministration Reference or the description of the corresponding arguments to the uss delete command, in To delete an AFSaccount. Following are some basic notes:

• Begin the line with the string delete only, not uss delete.

• The first two arguments, username and mount_point_path, are required. They correspond to the -user and -mountpointarguments to the uss delete command.

• The third field, which is optional, controls whether the user’s home volume is removed from the file server where it resides,along with the corresponding VLDB entry. There are three possible values:

– No value treats the volume and VLDB entry according to the prevailing default, which is established by a preceding savevol-ume or delvolume instruction in the template file. See the following discussion of those instructions to learn how the defaultis set.

– The string savevolume preserves the volume and VLDB entry, overriding the default.

– The string delvolume removes the volume and VLDB entry, overriding the default.

• After the last field, end the line with either a colon and newline character (<Return>), or a newline alone.

Running a Command or Script with the exec Instruction

The exec instruction runs the indicated AFS command, compiled program, or UNIX shell script or command. The commandprocessor assumes the AFS and local identities of the issuer of the uss bulk command, who must have the privileges required torun the command.

The instruction’s syntax is as follows:

exec command

It is not necessary to surround the command string with double quotes (" ") or other delimiters.

Setting the Default Treatment of Volumes with the delvolume and savevolume Instructions

The savevolume and delvolume instructions set the default treatment of volumes referenced by the delete instructions that followthem in the bulk input file. Their syntax is as follows:

savevolumedelvolume

Both instructions are optional and take no arguments. If neither appears in the bulk input file, then by default all volumes andVLDB entries referenced by delete instructions are removed. If the savevolume instruction appears in the file, it prevents theremoval of the volume and VLDB entry referenced by all subsequent delete instructions in the file. The delvolume instructionexplicitly establishes the default (which is deletion) for subsequent delete instructions.

The effect of either instruction lasts until the end of the bulk input file, or until its opposite appears. To override the prevailingdefault for a particular delete instruction, put the savevolume or delvolume string in the instruction’s third field. (You can alsouse multiple instances of the savevolume and delvolume instructions to toggle back and forth between default preservation anddeletion of volumes.)

11.8.2 Example Bulk Input File Instructions

To create an authentication-only account, use an add instruction like the following example, which includes only the first (user-name) argument. The user’s real name is set to match the username (anderson) and her initial password is set to the stringchangeme.

add anderson


The following example also creates an authentication-only account, but sets nondefault values for the real name and initialpassword.

add smith:John Smith:js_pswd

The next two example add instructions require that the administrator of the Example Corporation cell (example.com) has writtena uss template file with the following V instruction in it:

V user.$USER $SERVER.example.com /vicep$PART 10000 /afs/.example.com/usr/$3/$USER \$UID $USER all

To create accounts for users named John Smith from the Marketing Department and Pat Jones from the Finance Department, theappropriate add instructions in the bulk input file are as follows:

add smith:John Smith:::fs1:a:::::marketingadd jones:Pat Jones:::fs3:c:::::finance

The new account for Smith consists of Protection and Authentication Database entries called smith. His initial password is thedefault string changeme, and the Protection Server generates his AFS UID. His home volume, called user.smith, has a 10,000 KBquota, resides on partition /vicepa of file server machine fs1.example.com, and is mounted at /afs/.example.com/usr/marketing/smith.The final $UID $USER all part of the V instruction gives him ownership of his home directory and all permissions on its ACL.The account for jones is similar, except that it resides on partition /vicepc of file server machine fs3.example.com and is mountedat /afs/.example.com/usr/finance/jones.

Notice that the fields corresponding to mount_point, uid, var1, and var2 are empty (between the values a and marketing on thefirst example line) because the corresponding variables do not appear in the V instruction in the template file. The initial_passwdand password_expires fields are also empty.

If you wish, you can specify values or empty fields for all nine number variables in an add instruction. In that case, the bulkinput file instructions are as follows:

add smith:John Smith:::fs1:a:::::marketing::::::add jones:Pat Jones:::fs3:c:::::finance::::::

The following example is a section of a bulk input file with a number of delete instructions and a savevolume instruction. Becausethe first three instructions appear before the savevolume instruction and their third field is blank, the corresponding volumes andVLDB entries are removed. The delete instruction for user terry follows the savevolume instruction, so her volume is notremoved, but the volume for user johnson is, because the delvolume string in the third field of the delete instruction overridesthe current default.

delete smith:/afs/example.com/usr/smithdelete pat:/afs/example.com/usr/patdelete rogers:/afs/example.com/usr/rogerssavevolumedelete terry:/afs/example.com/usr/terrydelete johnson:/afs/example.com/usr/johnson:delvolume

The following example exec instruction is useful as a separator between a set of add instructions and a set of delete instructions.It generates a message on the standard output stream that informs you of the uss bulk command’s progress.

exec echo "Additions completed; beginning deletions..."

11.8.3 To create and delete multiple AFS user accounts










• The d (delete), i (insert) and l (lookup) permissions on the ACL of the parent directory for each volume mount point.If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.



2. (Optional.) Log in as the local superuser root. This is necessary only if you are creating new files or directories in thelocal file system and want to designate an alternate owner as the object is created. For a discussion of the issues involved,see About Creating Local Disk Directories and Files.

3. If the bulk input file includes add instructions, verify the location and functionality of the template you are using. For adescription of where the uss command interpreter expects to find the template, see Where to Place Template Files. Youcan always provide an alternate pathname if you wish. Also note which variables appear in the template, to be sure thatyou provide the corresponding arguments in the add instruction or on the uss bulk command line.

4. Create a bulk input file that complies with the rules listed in Constructing a Bulk Input File. It is simplest to put the file inthe same directory as the template file you are using.

5. (Optional.) Change to the directory where the bulk input file and template file reside.

% cd template_directory

6. Issue the uss bulk command to create or delete accounts, or both. Enter the command on a single line; it appears here onmultiple lines only for legibility.

The bulk operation always manipulates user entries in the Authentication Database. The Authentication Server performs itsown authentication rather than accepting your existing AFS token. By default, it authenticates your local (UNIX) identity,which possibly does not correspond to an AFS-privileged administrator. Include the -admin argument to name an identitythat has the ADMIN flag on its Authentication Database entry. To verify that an entry has the flag, issue the kas examinecommand as described in To check if the ADMIN flag is set.

% uss bulk <bulk input file> \[-template <pathname of template file>] \-admin <administrator to authenticate> \[-dryrun] [-overwrite] \[-pwexpires <password expires in [0..254] days (0 => never)>] \[-pipe]


where


b Is the shortest acceptable abbreviation of bulk.

bulk input file Specifies the pathname of the bulk input file. Partial pathnames are interpreted relative to the currentworking directory. For a discussion of the required file format, see Constructing a Bulk Input File.

-template Specifies the pathname of the template file for any uss add commands that appear in the bulk input file. Partialpathnames are interpreted relative to the current working directory. For a discussion of the required file format, seeConstructing a uss Template File.

-admin Names an administrative account that has the ADMIN flag on its Authentication Database entry, such as the adminaccount. The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.


-overwrite Overwrites any directories, files and links that exist in the file system and for which there are also D, E, F, L, orS instructions in the template file named by the -template argument. If this flag is omitted, the command interpreterprompts, once for each add instruction in the bulk input file, for confirmation that it is to overwrite such elements.Do not include this flag if there are no add instructions in the bulk input file.

-pwexpires Sets the number of days after a user’s password is changed that it remains valid, for each user named by anadd instruction in the bulk input file. Provide an integer from the range 1 through 254 to specify the number of daysuntil expiration, or the value 0 to indicate that the password never expires (the default).When the password becomes invalid (expires), the user is unable to authenticate, but has 30 more days in which toissue the kpasswd command to change the password (after that, only an administrator can change it).

-pipe Suppresses the Authentication Server’s prompt for the password of the issuer or the user named by the -adminargument (the Authentication Server always separately authenticates the user who is creating or deleting an entry inthe Authentication Database). Instead, the command interpreter accepts the password as piped input from anotherprogram, enabling you to run the uss bulk command in unattended batch jobs.

7. If a newly created or deleted user home directory resides in a replicated volume, use the vos release command to releasethe volume, as described in To replicate a read/write volume (create a read-only volume).


NoteThis step can be necessary even if the home directory’s parent directory is not itself a mount point for a replicated volume(and is easier to overlook in that case). For example, the Example Corporation template puts the mount points for uservolumes in the /afs/example.com/usr directory. Because that is a regular directory rather than a mount point, it residesin the root.cell volume mounted at the /afs/example.com directory. That volume is replicated, so after changing it bycreating or deleting a mount point, the administrator must issue the vos release command.

8. If you are creating accounts, create an entry for the new user in the local password file (/etc/passwd or equivalent) on eachAFS client machine that he or she can log into. For suggestions on automating this step, see Creating a Common SourcePassword File.

Even if you do not use the automated method, set the user’s UNIX UID to match the AFS UID assigned automatically bythe Protection Server or assigned with the -uid argument. The new user’s AFS UID appears in the trace produced by theuss add output or you can use the pts examine command to display it, as described in To display a Protection Databaseentry.

9. If you are deleting accounts, delete the user’s entry from the local password file (/etc/passwd or equivalent) of each clientmachine. If you intend to reactivate the user’s account in the future, it is simpler to comment out the entry or place anasterisk (*) in the password field.


Chapter 12

Administering User Accounts

This chapter explains how to create and maintain user accounts in your cell.

The preferred method for creating user accounts is the uss program, which enables you to create multiple accounts with asingle command. See Creating and Deleting User Accounts with the uss Command Suite. If you prefer to create each accountcomponent individually, follow the instructions in Creating AFS User Accounts.



Create Protection Database entry pts createuserCreate Authentication Database entry kas createCreate volume vos createMount volume fs mkmountCreate entry on ACL fs setaclExamine Protection Database entry pts examineChange directory ownership /etc/chownLimit failed authentication attempts kas setfields with -attempts and -locktimeUnlock Authentication Database entry kas unlockSet password lifetime kas setfields with -pwexpiresProhibit password reuse kas setfields with -reuseChange AFS password kas setpasswordList groups owned by user pts listownedRename Protection Database entry pts renameDelete Authentication Database entry kas deleteRename volume vos renameRemove mount point fs rmmountDelete Protection Database entry pts deleteList volume location vos listvldbRemove volume vos remove

12.2 The Components of an AFS User Account

The differences between AFS and the UNIX file system imply that a complete AFS user account is not the same as a UNIXuser account. The following list describes the components of an AFS account. The same information appears in a correspondingsection of Creating and Deleting User Accounts with the uss Command Suite, but is repeated here for your convenience.


• A Protection Database entry defines the username (the name provided when authenticating with AFS), and maps it to an AFSuser ID (AFS UID), a number that the AFS servers use internally when referencing users. The Protection Database also tracksthe groups to which the user belongs. For details, see Administering the Protection Database.

• An Authentication Database entry records the user’s AFS password in a scrambled form suitable for use as an encryption key.

• A home volume stores all the files in the user’s home directory together on a single partition of a file server machine. Thevolume has an associated quota that limits its size. For a complete discussion of volumes, see Managing Volumes.

• A mount point makes the contents of the user’s volume visible and accessible in the AFS filespace, and acts as the user’s homedirectory. For more details about mount points, see About Mounting Volumes.

• Full access permissions on the home directory’s access control list (ACL) and ownership of the directory (as displayed by theUNIX ls -ld command) enable the user to manage his or her files. For details on AFS file protection, see Managing AccessControl Lists.

• A local password file entry (in the /etc/passwd file or equivalent) of each AFS client machine enables the user to log in andaccess AFS files through the Cache Manager. A subsequent section in this chapter further discusses local password file entries.

• Other optional configuration files make the account more convenient to use. Such files help the user log in and log out moreeasily, receive electronic mail, print, and so on.

12.3 Creating Local Password File Entries

To obtain authenticated access to a cell’s AFS filespace, a user must not only have a valid AFS token, but also an entry in thelocal password file (/etc/passwd or equivalent) of the machine whose Cache Manager is representing the user. This sectiondiscusses why it is important for the user’s AFS UID to match to the UNIX UID listed in the local password file, and describesthe appropriate value to put in the file’s password field.

One reason to use uss commands is that they enable you to generate local password file entries automatically as part of accountcreation. See Creating a Common Source Password File.

Information similar to the information in this section appears in a corresponding section of Creating and Deleting User Accountswith the uss Command Suite, but is repeated here for your convenience

12.3.1 Assigning AFS and UNIX UIDs that Match

A user account is easiest to administer and use if the AFS user ID number (AFS UID) and UNIX UID match. All instructions inthe AFS documentation assume that they do.

The most basic reason to make AFS and UNIX UIDs the same is so that the owner name reported by the UNIX ls -l and ls -ldcommands makes sense for AFS files and directories. Following standard UNIX practice, the File Server records a number ratherthan a username in an AFS file or directory’s owner field: the owner’s AFS UID. When you issue the ls -l command, it translatesthe UID to a username according to the mapping in the local password file, not the AFS Protection Database. If the AFS andUNIX UIDs do not match, the ls -l command reports an unexpected (and incorrect) owner. The output can even vary on differentclient machines if their local password files map the same UNIX UID to different names.

Follow the recommendations in the indicated sections to make AFS and UNIX UIDs match when creating accounts for varioustypes of users:

• If creating an AFS account for a user who already has a UNIX UID, see Making UNIX and AFS UIDs Match.

• If some users in your cell have existing UNIX accounts but the user for whom you are creating an AFS account does not, thenit is best to allow the Protection Server to allocate an AFS UID automatically. To avoid overlap of AFS UIDs with existingUNIX UIDs, set the Protection Database’s max user id counter higher than the largest UNIX UID, using the instructionsin Displaying and Setting the AFS UID and GID Counters.

• If none of your users have existing UNIX accounts, allow the Protection Server to allocate AFS UIDs automatically, startingeither at its default or at the value you have set for the max user id counter.


12.3.2 Specifying Passwords in the Local Password File

Authenticating with AFS is easiest for your users if you install and configure an AFS-modified login utility, which logs a userinto the local file system and obtains an AFS token in one step. In this case, the local password file no longer controls a user’sability to login in most circumstances, because the AFS-modified login utility does not consult the local password file if the userprovides the correct AFS password. You can nonetheless use a password file entry’s password field (usually, the second field) inthe following ways to control login and authentication:

• To prevent both local login and AFS authentication, place an asterisk ( * ) in the field. This is useful mainly in emergencies,when you want to prevent a certain user from logging into the machine.

• To prevent login to the local file system if the user does not provide the correct AFS password, place a character string of anylength other than the standard thirteen characters in the field. This is appropriate if you want to allow only people with localAFS accounts to log into to your machines. A single X or other character is the most easily recognizable way to do this.


If you do not use an AFS-modified login utility, you must place a standard UNIX password in the local password file of everyclient machine the user will use. The user logs into the local file system only, and then must issue the klog command toauthenticate with AFS. It is simplest if the passwords in the local password file and the Authentication Database are the same,but this is not required.

12.4 Converting Existing UNIX Accounts

This section discusses the three main issues you need to consider if your cell has existing UNIX accounts that you wish to convertto AFS accounts.

12.4.1 Making UNIX and AFS UIDs Match

As previously mentioned, AFS users must have an entry in the local password file on every client machine from which theyaccess the AFS filespace as an authenticated user. Both administration and use are much simpler if the UNIX UID and AFS UIDmatch. When converting existing UNIX accounts, you have two alternatives:

• Make the AFS UIDs match the existing UNIX UIDs. In this case, you need to assign the AFS UID yourself by including the-id argument to the pts createuser command as you create the AFS account.

Because you are retaining the user’s UNIX UID, you do not need to alter the UID in the local password file entry. However,if you are using an AFS-modified login utility, you possibly need to change the password field in the entry. For a discussionof how the value in the password field affects login with an AFS-modified login utility, see Specifying Passwords in the LocalPassword File.

If now or in the future you need to create AFS accounts for users who do not have an existing UNIX UID, then you mustguarantee that new AFS UIDs do not conflict with any existing UNIX UIDs. The simplest way is to set the max user idcounter in the Protection Database to a value higher than the largest existing UNIX UID. See Displaying and Setting the AFSUID and GID Counters.

• Change the existing UNIX UIDs to match the new AFS UIDs that the Protection Server assigns automatically.

Allow the Protection Server to allocate the AFS UIDs automatically as you create AFS accounts. You must then alter the user’sentry in the local password file on every client machine to include the new UID.

There is one drawback to changing the UNIX UID: any files and directories that the user owned in the local file system beforebecoming an AFS user still have the former UID in their owner field. If you want the ls -l and ls -ld commands to display thecorrect owner, you must use the chown command to change the value to the user’s new UID, whether you are leaving the filein the local file system or moving it to AFS. See Moving Local Files into AFS.


12.4.2 Setting the Password Field Appropriately

Existing UNIX accounts already have an entry in the local password file, probably with a (scrambled) password in the passwordfield. You possibly need to change the value in the field, depending on the type of login utility you use:

• If the login utility is not modified for use with AFS, the actual password must appear (in scrambled form) in the local passwordfile entry.

• If the login utility is modified for use with AFS, choose one of the values discussed in Specifying Passwords in the LocalPassword File.

12.4.3 Moving Local Files into AFS

New AFS users with existing UNIX accounts probably already own files and directories stored in a machine’s local file system,and it usually makes sense to transfer them into the new home volume. The easiest method is to move them onto the local diskof an AFS client machine, and then use the UNIX mv command to transfer them into the user’s new AFS home directory.

As you move files and directories into AFS, keep in mind that the meaning of their mode bits changes. AFS ignores the secondand third sets of mode bits (group and other), and does not use the first set (the owner bits) directly, but only in conjunction withentries on the ACL (for details, see How AFS Interprets the UNIX Mode Bits). Be sure that the ACL protects the file or directoryat least as securely as the mode bits.

If you have chosen to change a user’s UNIX UID to match a new AFS UID, you must change the ownership of UNIX files anddirectories as well. Only members of the system:administrators group can issue the chown command on files and directoriesonce they reside in AFS.

12.5 Creating AFS User Accounts

There are two methods for creating user accounts. The preferred method--using the uss commands--enables you to create multipleaccounts with a single command. It uses a template to define standard values for the account components that are the same foreach user (such as quota), but provide differing values for more variable components (such as username). See Creating andDeleting User Accounts with the uss Command Suite.

The second method involves issuing a separate command to create each component of the account. It is best suited to creationof one account at a time, since some of the commands can create only one instance of the relevant component. To review thefunction of each component, see The Components of an AFS User Account.

Use the following instructions to create any of the three types of user account, which differ in their levels of functionality. For adescription of the types, see Configuring AFS User Accounts.

• To create an authentication-only account, perform Step 1 through Step 4 and also Step 14. This type of account consists onlyof entries in the Authentication Database and Protection Database.

• To create a basic account, perform Step 1 through Step 8 and Step 11 through Step 14. In addition to Authentication Databaseand Protection Database entries, this type of account includes a volume mounted at the home directory with owner and ACLset appropriately.

• To create a full account, perform all steps in the following instructions. This type of account includes configuration files forbasic functions such as logging in, printing, and mail delivery, making it more convenient and useful. For a discussion of someuseful types of configuration files, see Creating Standard Files in New AFS Accounts.

12.5.1 To create one user account with individual commands

1. Decide on the value to assign to each of the following account components. If you are creating an authentication-onlyaccount, you need to pick only a username, AFS UID, and initial password.


• The username. By convention, the names of many components of the user account incorporate this name. For a discus-sion of restrictions and suggested naming schemes, see Choosing Usernames and Naming Other Account Components.

• The AFS UID, if you want to assign a specific one. It is generally best to have the Protection Server allocate one instead,except when you are creating an AFS account for a user who already has an existing UNIX account. In that case,migrating the user’s files into AFS is simplest if you set the AFS UID to match the existing UNIX UID. See ConvertingExisting UNIX Accounts.

• The initial password. Advise the user to change this at the first login, using the password changing instructions in theOpenAFS User Guide.

• The name of the user’s home volume. The conventional name is user.username (for example, user.smith).

• The volume’s site (disk partition on a file server machine). Some cells designate certain machines or partitions for uservolumes only, or it possibly makes sense to place the volume on the emptiest partition that meets your other criteria. Todisplay the size and available space on a partition, use the vos partinfo command, which is fully described in CreatingRead/write Volumes.

• The name of the user’s home directory (the mount point for the home volume). The conventional location is a directory(or one of a set of directories) directly under the cell directory, such as /afs/cellname/usr. For suggestions on how toavoid the slowed directory lookup that can result from having large numbers of user home directories in a single usrdirectory, see Evenly Distributing User Home Directories with the G Instruction.

• The volume’s space quota. Include the -maxquota argument to the vos create command, or accept the default quota of5000 KB.

• The ACL on the home directory. By default, the ACL on every new volume grants all seven permissions to the sys-tem:administrators group. After volume creation, use the fs setacl command to remove the entry if desired, and togrant all seven permissions to the user.








• The ADMIN flag on your Authentication Database entry. However, the Authentication Server performs its own authenti-cation, so in Step 4 you specify an administrative identity on the kas command line itself.

• The i (insert) and a (administer) permissions on the ACL of the directory where you are mounting the user’s volume.If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.



• Knowledge of the password for the local superuser root.

3. Issue the pts createuser command to create an entry in the Protection Database. For a discussion of setting AFS UIDs, seeAssigning AFS and UNIX UIDs that Match. If you are converting an existing UNIX account into an AFS account, alsosee Converting Existing UNIX Accounts.


% pts createuser <user name> [<user id>]

where

cu Is an acceptable alias for createuser (and createu is the shortest acceptable abbreviation).

user name Specifies the user’s username (the character string typed at login). It is best to limit the name to eight or fewerlowercase letters, because many application programs impose that limit. The AFS servers themselves accept namesof up to 63 lowercase letters. Also avoid the following characters: colon (:), semicolon (;), comma (,), at sign (@),space, newline, and the period (.), which is conventionally used only in special administrative names.

user id Is optional and appropriate only if the user already has a UNIX UID that the AFS UID must match. If you do notprovide this argument, the Protection Server assigns one automatically based on the counter described in Displayingand Setting the AFS UID and GID Counters. If the ID you specify is less than 1 (one) or is already in use, an errorresults.

4. Issue the kas create command to create an entry in the Authentication Database. To avoid having the user’s temporaryinitial password echo visibly on the screen, omit the -initial_password argument; instead enter the password at the promptsthat appear when you omit the argument, as shown in the following syntax specification.


% kas create <name of user> \-admin <admin principal to use for authentication>

Administrator’s (admin_user) password: <admin_password>initial_password: <initial_password>Verifying, please re-enter initial_password: <initial_password>

where

cr Is the shortest acceptable abbreviation for create.

name of user Specifies the same username as in Step 3.


initial_password Specifies the initial password as a string of eight characters or less, to comply with the length restrictionthat some applications impose. Possible choices for an initial password include the username, a string of digitsfrom a personal identification number such as the Social Security number, or a standard string such as changeme.Instruct the user to change the string to a truly secret password as soon as possible by using the kpasswd commandas described in the IBM AFS User Guide.

5. Issue the vos create command to create the user’s volume.

% vos create <machine name> <partition name> <volume name> \[-maxquota <initial quota (KB)>]

where


machine name Names the file server machine on which to place the new volume.

partition name Names the partition on which to place the new volume.

volume name Names the new volume. The name can include up to 22 characters. By convention, user volume nameshave the form user.username, where username is the name assigned in Step 3.

-maxquota Sets the volume’s quota, as a number of kilobyte blocks. If you omit this argument, the default is 5000 KB.

6. Issue the fs mkmount command to mount the volume in the filespace and create the user’s home directory.



where

mk Is the shortest acceptable abbreviation for mkmount.directory Names the mount point to create. A directory of the same name must not already exist. Partial pathnames are

interpreted relative to the current working directory. By convention, user home directories are mounted in a directorycalled something like /afs/.cellname/usr, and the home directory name matches the username assigned in Step 3.Specify the read/write path to the mount point, to avoid the failure that results when you attempt to create the newmount point in a read-only volume. By convention, you indicate the read/write path by placing a period before thecell name at the pathname’s second level (for example, /afs/.example.com). For further discussion of the concept ofread/write and read-only paths through the filespace, see The Rules of Mount Point Traversal.

volume name Is the name of the volume created in Step 5.

7. (Optional) Issue the fs setvol command with the -offlinemsg argument to record auxiliary information about the volumein its volume header. For example, you can record who owns the volume or where you have mounted it in the filespace.To display the information, use the fs examine command.

% fs setvol <dir/file path> -offlinemsg <offline message>

where

sv Is an acceptable alias for setvol (and setv the shortest acceptable abbreviation).

dir/file path Names the mount point of the volume with which to associate the message. Partial pathnames are interpretedrelative to the current working directory.Specify the read/write path to the mount point, to avoid the failure that results when you attempt to change a read-onlyvolume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). For further discussion of the concept of read/write and read-onlypaths through the filespace, see The Rules of Mount Point Traversal.

-offlinemsg Specifies up to 128 characters of auxiliary information to record in the volume header.

8. Issue the fs setacl command to set the ACL on the new home directory. At the least, create an entry that grants allpermissions to the user, as shown.

You can also use the command to edit or remove the entry that the vos create command automatically places on the ACLfor a new volume’s root directory, which grants all permissions to the system:administrators group. Keep in mind thateven if you remove the entry, the members of the group by default have implicit a (administer) and by default l (lookup)permissions on every ACL, and can grant themselves other permissions as required.

For detailed instructions for the fs setacl command, see Setting ACL Entries.

% fs setacl <directory> -acl <user name> all \[system:administrators desired_permissions]

9. (Optional) Create configuration files and subdirectories in the new home directory. Possibilities include .login and .logoutfiles, a shell-initialization file such as .cshrc, files to help with printing and mail delivery, and so on.

If you are converting an existing UNIX account into an AFS account, you possibly wish to move some files and directoriesinto the user’s new AFS home directory. See Converting Existing UNIX Accounts.

10. (Optional) In the new .login or shell initialization file, define the user’s $PATH environment variable to include thedirectories where AFS binaries are kept (for example, the /usr/afsws/bin and /usr/afsws/etc directories).

11. In Step 12 and Step 14, you must know the user’s AFS UID. If you had the Protection Server assign it in Step 3, youprobably do not know it. If necessary, issue the pts examine command to display it.

% pts examine <user or group name or id>

where



user or group name or id Is the username that you assigned in Step 3.

The first line of the output displays the username and AFS UID. For further discussion and an example of the output, seeDisplaying Information from the Protection Database.

12. Designate the user as the owner of the home directory and any files and subdirectories created or moved in Step 9. Specifythe owner by the AFS UID you learned in Step 11 rather than by username. This is necessary for new accounts becausethe user does not yet have an entry in your local machine’s password file (/etc/passwd or equivalent). If you are convertingan existing UNIX account, an entry possibly already exists, but the UID is possibly incorrect. In that case, specifying ausername means that the corresponding (possibly incorrect) UID is recorded as the owner.

Some operating systems allow only the local superuser root to issue the chown command. If necessary, issuing the sucommand before the chown command.

% chown new_owner_ID directory

where

new_owner_ID Is the user’s AFS UID, which you learned in Step 11.

directory Names the home directory you created in Step 6, plus each subdirectory or file you created in Step 9.

13. If the new user home directory resides in a replicated volume, use the vos release command to release the volume, asdescribed in To replicate a read/write volume (create a read-only volume).


NoteThis step can be necessary even if the home directory’s parent directory is not itself a mount point for a replicated volume(and is easier to overlook in that case). Suppose, for example, that the Example Corporation puts the mount points foruser volumes in the /afs/example.com/usr directory. Because that is a regular directory rather than a mount point, itresides in the root.cell volume mounted at the /afs/example.com directory. That volume is replicated, so after changingit by creating a new mount point the administrator must issue the vos release command.

14. Create or modify an entry for the new user in the local password file (/etc/passwd or equivalent) of each machine the usercan log onto. Remember to make the UNIX UID the same as the AFS UID you learned in Step 11, and to fill the passwordfield appropriately (for instructions, see Specifying Passwords in the Local Password File).

12.6 Improving Password and Authentication Security

AFS provides several optional features than can help to protect your cell’s filespace against unauthorized access. The followinglist summarizes them, and instructions follow.

• Limit the number of consecutive failed login attempts.

One of the most common ways for an unauthorized user to access your filespace is to guess an authorized user’s password.This method of attack is most dangerous if the attacker can use many login processes in parallel or use the RPC interfacesdirectly.

To protect against this type of attack, use the -attempts argument to the kas setfields command to limit the number of timesthat a user can consecutively fail to enter the correct password when using either an AFS-modified login utility or the klogcommand. When the limit is exceeded, the Authentication Server locks the user’s Authentication Database entry (disallowsauthentication attempts) for a period of time that you define with the -locktime argument to the kas setfields command. Ifdesired, system administrators can use the kas unlock command to unlock the entry before the complete lockout time passes.

In certain circumstances, the mechanism used to enforce the number of failed authentication attempts can cause a lockout eventhough the number of failed attempts is less than the limit set by the -attempts argument. Client-side authentication programs


such as klog and an AFS-modified login utility normally choose an Authentication Server at random for each authenticationattempt, and in case of a failure are likely to choose a different Authentication Server for the next attempt. The AuthenticationServers running on the various database server machines do not communicate with each other about how many times a userhas failed to provide the correct password to them. Instead, each Authentication Server maintains its own separate copy ofthe auxiliary database file kaserverauxdb (located in the /usr/afs/local directory by default), which records the number ofconsecutive authentication failures for each user account and the time of the most recent failure. This implementation meansthat on average each Authentication Server knows about only a fraction of the total number of failed attempts. The only wayto avoid allowing more than the number of attempts set by the -attempts argument is to have each Authentication Server allowonly some fraction of the total. More specifically, if the limit on failed attempts is f, and the number of Authentication Serversis S, then each Authentication Server can only permit a number of attempts equal to f divided by S (the Ubik synchronizationsite for the Authentication Server tracks any remainder, f mod S).

Normally, this implementation does not reduce the number of allowed attempts to less than the configured limit (f ). If oneAuthentication Server refuses an attempt, the client contacts another instance of the server, continuing until either it successfullyauthenticates or has contacted all of the servers. However, if one or more of the Authentication Server processes is unavailable,the limit is effectively reduced by a percentage equal to the quantity U divided by S, where U is the number of unavailableservers and S is the number normally available.

To avoid the undesirable consequences of setting a limit on failed authentication attempts, note the following recommendations:

– Do not set the -attempts argument (the limit on failed authentication attempts) too low. A limit of nine failed attempts isrecommended for regular user accounts, to allow three failed attempts per Authentication Server in a cell with three databaseserver machines.

– Set fairly short lockout times when including the -locktime argument. Although guessing passwords is a common method ofattack, it is not a very sophisticated one. Setting a lockout time can help discourage attackers, but excessively long times arelikely to be more of a burden to authorized users than to potential attackers. A lockout time of 25 minutes is recommendedfor regular user accounts.

– Do not assign an infinite lockout time on an account (by setting the -locktime argument to 0 [zero]) unless there is a highlycompelling reason. Such accounts almost inevitably become locked at some point, because each Authentication Servernever resets the account’s failure counter in its copy of the kaauxdb file (in contrast, when the lockout time is not infinite,the counter resets after the specified amount of time has passed since the last failed attempt to that Authentication Server).Furthermore, the only way to unlock an account with an infinite lockout time is for an administrator to issue the kas unlockcommand. It is especially dangerous to set an infinite lockout time on an administrative account; if all administrativeaccounts become locked, the only way to unlock them is to shut down all instances of the Authentication Server and removethe kaauxdb file on each.

In summary, the recommended limit on authentication attempts is nine and lockout time 25 minutes.

• Limit password lifetime.

The longer a password is in use, the more time an attacker has to try to learn it. To protect against this type of attack, usethe -pwexpires argument to the kas setfields command to limit how many days a user’s password is valid. The user becomesunable to authenticate with AFS after the password expires, but has up to 30 days to use the kpasswd command to set a newpassword. After the 30 days pass, only an administrator who has the ADMIN flag on the Authentication Database entry canchange the password.

If you set a password lifetime, many AFS-modified login utilities (but not the klog command) set the PASSWORD_EXPIRESenvironment variable to the number of days remaining until the password expires. A setting of zero means that the passwordexpires today. If desired, you can customize your users’ login scripts to display the number of days remaining before expirationand even prompt for a password change when a small number of days remain before expiration.

• Prohibit reuse of passwords.

Forcing users to select new passwords periodically is not effective if they simply set the new password to the current value. Toprevent a user from setting a new password to a string similar to any of the last 20 passwords, use the -reuse argument to thekas setfields command.

If you prohibit password reuse and the user specifies an excessively similar password, the Authentication Server generates thefollowing message to reject it:

Password was not changed because it seems like a reused password


A persistent user can try to bypass this restriction by changing the password 20 times in quick succession (or running a scriptto do so). If you believe this is likely to be a problem, you can include the -minhours argument to the kaserver initializationcommand (for details, see the command’s reference page in the OpenAFS Administration Reference. If the user attempts tochange passwords too frequently, the following message appears.

Password was not changed because you changed it too recently; seeyour systems administrator

• Check the quality of new passwords.

You can impose a minimum quality standard on passwords by writing a script or program called kpwvalid. If the kpwvalidfile exists, the kpasswd and kas setpassword command interpreters invoke it to check a new password. If the password doesnot comply with the quality standard, the kpwvalid program returns an appropriate code and the command interpreter rejectsthe password.

The kpwvalid file must be executable, must reside in the same AFS directory as the kpasswd and kas binaries, and itsdirectory’s ACL must grant the w (write) permission only to the system:administrators group.

If you choose to write a kpwvalid program, consider imposing standards such as the following.

– A minimum length

– Words found in the dictionary are prohibited

– Numbers, punctuation, or both must appear along with letters

The AFS distribution includes an example kpwvalid program. See the kpwvalid reference page in the OpenAFS Administra-tion Reference.

12.6.1 To limit the number of consecutive failed authentication attempts

1. Issue the kas setfields command with the -attempts and -locktime arguments.


% kas setfields <name of user> \-admin <admin principal to use for authentication> \-attempts <maximum successive failed login tries ([0..254])> \-locktime <failure penalty [hh:mm or minutes]>


where

name of user Names the Authentication Database entry to edit.

-admin Names an administrative account that has the ADMIN flag on its Authentication Database entry, such as the adminaccount. The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.

-attempts Specifies the maximum consecutive number of times that a user can fail to provide the correct password duringauthentication (via the klog command or an AFS-modified login utility) before the Authentication Server refusesfurther attempts for the amount of time specified by the -locktime argument. The range of valid values is 0 (zero)through 254. If you omit this argument or specify 0, the Authentication Server allows an unlimited number of failures.

-locktime Specifies how long the Authentication Server refuses authentication attempts after the user exceeds the failurelimit specified by the -attempts argument.Specify a time in either hours and minutes (hh:mm) or minutes only (mm), from the range 01 (one minute) through36:00 (36 hours). The kas command interpreter automatically reduces any larger value to 36:00 and also rounds upeach nonzero value to the next-higher multiple of 8.5 minutes.It is best not to provide a value of 0 (zero), especially on administrative accounts, because it sets an infinite lockouttime. An administrator must always issue the kas unlock command to unlock such an account.


12.6.2 To unlock a locked user account

1. Issue the kas command to enter interactive mode.


% kas -admin <admin principal to use for authentication>Administrator’s (admin_user) password: <admin_password>ka>

where -admin names an administrative account that has the ADMIN flag on its Authentication Database entry, such asadmin. The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.

2. Issue the (kas) examine command to verify that the user’s account is in fact locked, as indicated by the message shown:

ka> examine <name of user>User is locked until time

3. Issue the (kas) unlock command to unlock the account.

ka> unlock <authentication ID>

where

u Is the shortest acceptable abbreviation of unlock.

authentication ID Names the Authentication Database entry to unlock.

12.6.3 To set password lifetime

1. Issue the kas setfields command with the -pwexpires argument.


% kas setfields <name of user> \-pwexpires <number days password is valid [0..254])> \-admin <admin principal to use for authentication>


where

name of user Specifies the Authentication Database entry on which to impose a password expiration.

-pwexpires Sets the number of days after the user’s password was last changed that it remains valid. Provide an integerfrom the range 1 through 254 to specify the number of days until expiration.When the password becomes invalid (expires), the user is unable to authenticate, but has 30 more days in whichto issue the kpasswd or kas setpassword command to change the password (after that, only an administrator canchange it). Note that the clock starts at the time the password was last changed, not when the kas setfields commandis issued. To avoid retroactive expiration, have the user change the password just before issuing the command.



12.6.4 To prohibit reuse of passwords

1. Issue the kas setfields command with the -reuse argument.


% kas setfields <name of user> -reuse < permit password reuse (yes/no)> \-admin <admin principal to use for authentication>


where

name of user Names the Authentication Database entry for which to set the password reuse policy.

-reuse Specifies whether the Authentication Server allows reuse of passwords similar to any of the user’s last 20 pass-words. Specify the value no to prohibit reuse, or the value yes to reinstate the default of allowing password reuse.


12.7 Changing AFS Passwords

After setting an initial password during account creation, you normally do not need to change user passwords, since they can usethe kpasswd command themselves by following the instructions in the OpenAFS User Guide. In the rare event that a user forgetsthe password or otherwise cannot log in, you can use the kas setpassword command to set a new password.

If entries in the local password file (/etc/passwd or equivalent) have actual scrambled passwords in their password field, remem-ber to change the password there also. For further discussion, see Specifying Passwords in the Local Password File.

12.7.1 To change an AFS password

1. Issue the kas setpassword command to change the password. To avoid having the new password echo visibly on thescreen, omit the -new_password argument; instead enter the password at the prompts that appear when you omit theargument, as shown.


% kas setpassword <name of user> \-admin <admin principal to use for authentication>

Administrator’s (admin_user) password: <admin_password>new_password: <new_password>Verifying, please re-enter new_password: <new_password>

where


name of user Names the Authentication Database entry for which to set the password.


new_password Specifies the user’s new password. It is subject to the restrictions imposed by the kpwvalid program, ifyou use it.


12.8 Displaying and Setting the Quota on User Volumes

User volumes are like all other volumes with respect to quota. Each new AFS volume has a default quota of 5000 KB, unlessyou use the -maxquota argument to the vos create command to set a different quota. You can also use either of the followingcommands to change quota at any time:

• fs setquota

• fs setvol

You can use any of the three following commands to display a volume’s quota:

• fs quota

• fs listquota

• fs examine

For instructions, see Setting and Displaying Volume Quota and Current Size.

12.9 Changing Usernames

By convention, many components of a user account incorporate the username, including the Protection and AuthenticationDatabase entries, the volume name and the home directory name. When changing a username, it is best to maintain consistencyby changing the names of all components, so the procedure for changing a username has almost as many steps as the procedurefor creating a new user account.

12.9.1 To change a username








• The ADMIN flag on the Authentication Database entry. However, the Authentication Server performs its own authenti-cation, so the following instructions direct you to specify an administrative identity on the kas command line itself.

• The a (administer), d (delete), and i (insert) permissions on the ACL of the directory where you are removing thecurrent mount point and creating a new one. If necessary, issue the fs listacl command, which is fully described inDisplaying ACLs.




2. Issue the pts listowned command to display the names of the groups the user owns. After you change the username in theProtection Database in Step 3, you must issue the pts rename command to change each group’s owner prefix to match thenew name, because the Protection Server does not automatically make this change. For a complete description of the ptslistowned command, see Displaying Information from the Protection Database.

% pts listowned <user or group name or id>

3. Issue the pts rename command to change the user’s name in the Protection Database.

% pts rename <old name> <new name>

4. Issue the pts rename command to change the group names you noted in Step 2, so that their owner prefix (the part of thegroup name before the colon) accurately reflects the owner’s new name.

Repeat the command for each group. Step 3 details its syntax.


5. Issue the kas command to enter interactive mode.


% kas -admin <admin principal to use for authentication>Administrator’s (admin_user) password: <admin_password>ka>

where -admin names an administrative account that has the ADMIN flag on its Authentication Database entry, such asadmin. The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.

6. Issue the (kas) delete command to delete the user’s existing Authentication Database entry.

ka> delete <name of user>

where

del Is the shortest acceptable abbreviation for delete, or you can use the alias rm.

name of user Names the Authentication Database entry to delete.

7. Issue the (kas) create command to create an Authentication Database entry for the new username. To avoid having theuser’s password echo visibly on the screen, do not include the -initial_password argument; instead enter the password atthe prompts that appear in that case, as shown in the following syntax specification.

ka> create <name of user>initial_password: <password>Verifying, please re-enter initial_password: <password>

where

cr Is the shortest acceptable abbreviation for create.

name of user Specifies the new username.

password Specifies the password for the new user account. If the user is willing to tell you his or her current password,you can retain it. Otherwise, provide a string of eight characters or less to comply with the length restriction that someapplications impose. Possible choices for an initial password include the username, a string of digits from a personalidentification number such as the Social Security number, or a standard string such as changeme. Instruct the user tochange the string to a truly secret password as soon as possible by using the kpasswd command as instructed in theOpenAFS User Guide.


8. Issue the quit command to leave interactive mode.

ka> quit

9. Issue the vos rename command to change the name of the user’s volume. For complete syntax, see To rename a volume.

% vos rename <old volume name> <new volume name>

10. Issue the fs rmmount command to remove the existing mount point. For the directory argument, specify the read/writepath to the mount point, to avoid the failure that results when you attempt to delete a mount point from a read-only volume.


11. Issue the fs mkmount command to create a mount point for the volume’s new name. Specify the read/write path to themount point for the directory argument, as in the previous step. For complete syntax, see Step 6 in To create one useraccount with individual commands.


12. If the changes you made in Step 10 and Step 11 are to a mount point that resides in a replicated volume, use the vos releasecommand to release the volume, as described in To replicate a read/write volume (create a read-only volume).


NoteThis step can be necessary even if the home directory’s parent directory is not itself a mount point for a replicated volume(and is easier to overlook in that case). For example, the Example Corporation template puts the mount points for uservolumes in the /afs/example.com/usr directory. Because that is a regular directory rather than a mount point, it residesin the root.cell volume mounted at the /afs/example.com directory. That volume is replicated, so after changing it theadministrator must issue the vos release command.

12.10 Removing a User Account

Before removing an account, it is best to make a backup copy of the user’s home volume on a permanent storage medium such astape. If you need to remove several accounts, it is probably more efficient to use the uss delete command instead; see DeletingIndividual Accounts with the uss delete Command.

12.10.1 To remove a user account









• The ADMIN flag on the Authentication Database entry. However, the Authentication Server performs its own authenti-cation, so the following instructions direct you to specify an administrative identity on the kas command line itself.

• The d (delete) permission on the ACL of the directory where you are removing the user volume’s mount point. Ifnecessary, issue the fs listacl command, which is fully described in Displaying ACLs.



2. (Optional) If it is possible you need to restore the user’s account someday, note the username and AFS UID, possibly in afile designated for that purpose. You can later restore the account with its original AFS UID.

3. (Optional) Copy the contents of the user’s volume to tape. You can use the vos dump command as described in Dumpingand Restoring Volumes or the AFS Backup System as described in Backing Up Data.

4. (Optional) If you intend to remove groups that the user owns from the Protection Database after removing the user’sentry, issue the pts listowned command to display them. For complete instructions, see Displaying Information from theProtection Database.


5. (Optional) Issue the pts delete command to remove the groups the user owns. However, if it is likely that other users haveplaced the groups on the ACLs of directories they own, it is best not to remove them.

% pts delete <user or group name or id>+

where

del Is the shortest acceptable abbreviation for delete.

user or group name or id Specifies the name or AFS UID of each group displayed in the output from Step 4.

6. Issue the kas delete command to remove the user’s Authentication Database entry.


% kas delete <name of user> \-admin <admin principal to use for authentication>


where

d Is the shortest acceptable abbreviation for delete.

name of user Names the Authentication Database entry to delete.


7. Issue the vos listvldb command to display the site of the user’s home volume in preparation for removing it. By convention,user volumes are named user.username.


where


listvl Is the shortest acceptable abbreviation of listvldb.

volume name or ID Specifies the volume’s name or volume ID number.

8. Issue the vos remove command to remove the user’s volume. It automatically removes the backup version of the volume,if it exists. It is not conventional to replicate user volumes, so the command usually also completely removes the volume’sentry from the Volume Location Database (VLDB). If there are ReadOnly replicas of the volume, you must repeat the vosremove command to remove each one individually.

% vos remove <machine name> <partition name> <volume name or ID>

where

remo Is the shortest acceptable abbreviation of remove.

machine name Names the file server machine that houses the volume, as specified in the output from Step 7.

partition name Names the partition that houses the volume, as specified in the output from Step 7.

volume name or ID Specifies the volume’s name or ID number.

9. Issue the fs rmmount command to remove the volume’s mount point.

If you mounted the user’s backup volume as a subdirectory of the home directory, then this command is sufficient tounmount the backup version as well. If you mounted the backup version at an unrelated location in the filespace, repeatthe fs rmmount command for it.


where

rmm Is the shortest acceptable abbreviation of rmmount.directory Names the mount point for the volume’s previous name (the former home directory). Partial pathnames are

interpreted relative to the current working directory.Specify the read/write path to the mount point, to avoid the failure that results when you attempt to delete a mountpoint from a read-only volume. By convention, you indicate the read/write path by placing a period before the cellname at the pathname’s second level (for example, /afs/.example.com). For further discussion of the concept ofread/write and read-only paths through the filespace, see Mounting Volumes.

10. Issue the pts delete command to remove the user’s Protection Database entry. A complete description of this commandappears in Step 5.

% pts delete <user or group name or id>

11. If the deleted user home directory resided in a replicated volume, use the vos release command to release the volume, asdescribed in To replicate a read/write volume (create a read-only volume).


NoteThis step can be necessary even if the home directory’s parent directory is not itself a mount point for a replicated volume(and is easier to overlook in that case). For example, the Example Corporation template puts the mount points for uservolumes in the /afs/example.com/usr directory. Because that is a regular directory rather than a mount point, it residesin the root.cell volume mounted at the /afs/example.com directory. That volume is replicated, so after changing it bydeleting a mount point the administrator must issue the vos release command.


Chapter 13

Administering the Protection Database

This chapter explains how to create and maintain user, machine, and group entries in the Protection Database.



Display Protection Database entry pts examineMap user, machine or group name to AFS ID pts examineDisplay entry’s owner or creator pts examineDisplay number of users or machines belonging to group pts examineDisplay number of groups user or machine belongs to pts examineDisplay group-creation quota pts examineDisplay entry’s privacy flags pts examineDisplay members of group, or groups that user or machine belongs to pts membershipDisplay groups that user or group owns pts listownedDisplay all entries in Protection Database pts listentriesCreate machine entry pts createuserCreate group entry pts creategroupAdd users and machines to groups pts adduserRemove users and machines from groups pts removeuserDelete machine or group entry pts deleteChange a group’s owner pts chownChange an entry’s name pts renameSet group creation quota pts setfieldsSet entry’s privacy flags pts setfieldsDisplay AFS ID counters pts listmaxSet AFS ID counters pts setmax

13.2 About the Protection Database

The Protection Database stores information about AFS users, client machines, and groups which the File Server process uses todetermine whether clients are authorized to access AFS data.

To obtain authenticated access to an AFS cell, a user must have an entry in the cell’s Protection Database. The first time that auser requests access to the data stored on a file server machine, the File Server on that machine contacts the Protection Server torequest the user’s current protection subgroup (CPS), which lists all the groups to which the user belongs. The File Server scans


the access control list (ACL) of the directory that houses the data, looking for groups on the CPS. It grants access in accordancewith the permissions that the ACL extends to those groups or to the user individually. (The File Server stores the CPS and usesit as long as the user has the same tokens. When a user’s group membership changes, he or she must reauthenticate for the FileServer to recognize the change.)

Only administrators who belong to the cell’s system:administrators group can create user entries (the group is itself definedin the Protection Database, as discussed in The System Groups). Members of the system:administrators group can also createmachine entries, which can then be used to control access based on the machine from which the access request originates. Aftercreating a machine entry, add it to a Protection Database group and place the group on ACLs (a machine cannot appear on ACLsdirectly). A machine entry can represent a single machine or multiple machines with consecutive IP addresses as specified bya wildcard notation. For instructions, see Creating User and Machine Entries. Because all replicas of a volume share the sameACL (the one on the volume’s root directory mount point), machine entries enable you to replicate the volume that houses aprogram’s binary file while still complying with a machine-based license agreement as required by the program’s manufacturer.See Creating User and Machine Entries.

A group entry is a list of user entries, machine entries, or both (groups cannot belong to other groups). Putting a group on an ACLis a convenient way to extend or deny access to a set of users without listing them on the ACL individually. Similarly, addingusers to a group automatically grants them access to all files and directories for which the associated ACL lists that group. Bothadministrators and regular users can create groups.

13.2.1 The System Groups

In addition to the groups that users and administrators can create, AFS defines the following three system groups. The ProtectionServer creates them automatically when it builds the first version of a cell’s Protection Database, and always assigns them thesame AFS GIDs.

system:anyuser Represents all users able to access the cell’s filespace from the local and foreign cells, authenticated or not. ItsAFS GID is -101. The group has no stable membership listed in the Protection Database. Accordingly, the pts examinecommand displays 0 in its membership field, and the pts membership command does not list any members for it.

Placing this group on an ACL is a convenient way to extend access to all users. The File Server automatically places thisgroup on the CPS of any user who requests access to data stored on a file server machine. (Every unauthenticated user isassigned the identity anonymous and this group is the only entry on the CPS for anonymous.)

system:authuser Represents all users who are able to access the cell’s filespace from the local and foreign cells and who havesuccessfully obtained an AFS token in the local cell (are authenticated). Its AFS GID is -102. Like the system:anyusergroup, it has no stable membership listed in the Protection Database. Accordingly, the pts examine command displays 0in its membership field, and the pts membership command does not list any members for it.

Placing this group on an ACL is therefore a convenient way to extend access to all authenticated users. The File Serverautomatically places this group on the CPS of any authenticated user who requests access to data stored on a file servermachine.

system:administrators Represents the small number of cell administrators authorized to issue privileged pts commands andthe fs commands that set quota. The ACL on the root directory of every newly created volume grants all permissionsto the group. Even if you remove that entry, the group implicitly retains the a (administer), and by default also the l(lookup), permission on every ACL. Its AFS GID is -204. For instructions on administering this group, see Administeringthe system:administrators Group.

13.3 Displaying Information from the Protection Database

This section describes the commands you can use to display Protection Database entries and associated information. In additionto name and AFS ID, the Protection Database stores the following information about each user, machine, or group entry.

• The entry’s owner, which is the user or group of users who can administer the entry

• The entry’s creator, which serves mostly as an audit trail


• A membership count, which indicates how many groups a user or machine belongs to, or how many members belong to agroup

• A set of privacy flags, which control which users can administer or display information about the entry

• A group-creation quota, which defines how many groups a user can create

• A list of the groups to which a user or machine belongs, or of the users and machines that belong to a group

• A list of the groups that a user or group owns

13.3.1 To display a Protection Database entry

1. Verify that you belong to the system:administrators group, which enables you to display an entry regardless of the settingof its first (s) privacy flag. By default, any user can display a Protection Database entry. If necessary, issue the ptsmembership command, which is fully described in To display the members of the system:administrators group.


2. Issue the pts examine command to display one or more Protection Database entries.

% pts examine <user or group name or id>+

where

e Is the shortest acceptable abbreviation of examine (and check is an alias).user or group name or id Specifies the name or AFS ID of each entry to display. Precede any AFS GID with a hyphen

(-) because it is a negative integer.

The output includes the following fields. Examples follow.

Name Specifies the entry’s name.

• For a user, this is the name used when authenticating with AFS and the name that appears on ACL entries.• For a machine, this is the IP address of a single machine, or a wildcard notation that represents a group of machines with

consecutive IP addresses, as described in Creating User and Machine Entries.• For a group, this is the name that appears on ACL entries and in the list of groups output by the pts membership

command. The names of regular groups have two parts, separated by a colon (:). The part before the colon indicates thegroup’s owner, and the part after is the unique name. A prefix-less group’s name does not have the owner prefix; onlymembers of the system:administrators group can create prefix-less groups. For further discussion of group names, seeCreating Groups.

id Specifies the entry’s unique AFS identification number. For user and machine entries, the AFS user ID (AFS UID) is apositive integer; for groups, the AFS group ID (AFS GID) is a negative integer. AFS UIDs and GIDs have the samefunction as their counterparts in the UNIX file system, but are used by the AFS servers and the Cache Manager only.

Normally, the Protection Server assigns an AFS UID or GID automatically when you create Protection Database entries.Members of the system:administrators group can specify an ID if desired. For further discussion, see Creating User andMachine Entries and Creating Groups.

owner Names the user or group who owns the entry and therefore can administer it (for more information about a group owninganother group, see Using Groups Effectively). Other users possibly have administrative privileges, too, depending on thesetting of the entry’s privacy flags. For instructions on changing the owner, see Changing a Group’s Owner.

creator Names the user who created the entry, and serves as an audit trail. If the entry is deleted from the Protection Database,the creator’s group creation quota increases by one, even if the creator no longer owns the entry; see Setting Group-CreationQuota.

The value anonymous in this field generally indicates that the entry was created when the Protection Server was runningin no-authentication mode, probably during initial configuration of the cell’s first file server machine. For a description ofno-authentication mode, see Managing Authentication and Authorization Requirements.


membership Specifies the number of groups to which the user or machine belongs, or the number of users or machines thatbelong to the group.

flags Specifies who can display or change information in a Protection Database entry. The five flags, each representing adifferent capability, always appear in the same order.

• For user entries, the default value is S----, which indicates that anyone can issue the pts examine command on theentry, but only the user and members of the system:administrators group can perform any other action.

• For machine entries, the default value is S----, which indicates that anyone can issue the pts examine command onthe entry, but only members of the system:administrators group can perform any other action.

• For group entries, the default value is S-M--, which indicates that anyone can issue the pts examine and pts member-ship commands on the entry, but only the group’s owner and members of the system:administrators group can performany other action.

For a complete description of possible values for the flags, see Setting the Privacy Flags on Database Entries.

group quota Specifies how many more groups a user can create in the Protection Database. The value for a newly createduser entry is 20, but members of the system:administrators group can issue the pts setfields command at any time tochange the value; see Setting Group-Creation Quota.

Group creation quota has no meaning for a machine or group entry: the Protection Server recognizes the issuer of the ptscreategroup command only as an authenticated user or as the anonymous user, never as a machine or group. The defaultvalue for group entries is 0 (zero), and there is no reason to change it.

The following examples show the output for a user called pat, a machine with IP address 192.12.108.133 and a group calledterry:friends:

% pts examine patName: pat, id: 1020, owner: system:administrators, creator: admin,membership: 12, flags: S----, group quota: 15.

% pts ex 192.12.108.133Name: 192.12.108.133, id: 5151, owner: system:administrators, creator: admin,membership: 1, flags: S----, group quota: 20.

% pts examine terry:friendsName: terry:friends, id: -567, owner: terry, creator: terry,membership: 12, flags: SOm--, group quota: 0.

13.3.2 To display group membership

1. Verify that you belong to the system:administrators group, which enables you to display an entry’s group membershipinformation regardless of the setting of its third (m) privacy flag. By default the owner and the user can display groupmembership for a user entry, the owner for a machine entry, and anyone for a group entry. If necessary, issue the ptsmembership command, which is fully described in To display the members of the system:administrators group.


2. Issue the pts membership command to display the list of groups to which a user or machine belongs, or the list of usersand machines that belong to a group.

% pts membership <user or group name or id>+

where

m Is the shortest acceptable abbreviation of membership.

user or group name or id Specifies the name or AFS UID of each user or machine for which to list the groups it belongsto, or the name or AFS GID of each group for which to list the members.

For user and machine entries, the output begins with the following string, and then each group appears on its own line:


Groups user_or_machine (id: AFS_UID) is a member of:

For group entries, the output begins with the following string, and then each member appears on its own line:

Members of group (id: AFS_GID) are:

For the system groups system:anyuser and system:authuser, the output includes the initial header string only, because thesegroups do not have a stable membership listed in their Protection Database entry. See The System Groups.

The following examples show the output for a user called terry and a group called terry:friends:

% pts mem terryGroups terry (id: 5347) is a member of:pat:friendssalesacctg:general

% pts mem terry:friendsMembers of terry:friends (id: -567) are:patsmithjohnson

13.3.3 To list the groups that a user or group owns

1. Verify that you belong to the system:administrators group, which enables you to display an entry’s group ownershipinformation regardless of the setting of its second (o) privacy flag. By default the owner can list the groups owned bygroup, and a user the groups he or she owns. If necessary, issue the pts membership command, which is fully describedin To display the members of the system:administrators group.


2. Issue the pts listowned command to list the groups owned by each user or group.

% pts listowned <user or group name or id>+

where

listo Is the shortest acceptable abbreviation of listowned.

user or group name or id Specifies the name or AFS UID of each user, or the name or AFS GID or each group, for whichto list the groups owned.

The output begins with the following string, and then each group appears on its own line:

Groups owned by user_or_group (id: AFS_ID) are:

The following examples show the output for a user called terry and a group called terry:friends:

% pts listo terryGroups owned by terry (id: 5347) are:terry:friendsterry:co-workers

% pts listo terry:friendsGroups owned by terry:friends (id: -567) are:terry:palsterry:buddies


13.3.4 To display all Protection Database entries



2. Issue the pts listentries command to display all Protection Database entries.

% pts listentries [-users] [-groups]

where

liste Is the shortest acceptable abbreviation of listentries.

-users Displays user and machine entries. The same output results if you omit both this flag and the -groups flag.

-groups Displays group entries.

The output is a table that includes the following columns. Examples follow.

Name Specifies the entry’s name.

ID Specifies the entry’s AFS identification number. For user and machine entries, the AFS user ID (AFS UID) is a positiveinteger; for groups, the AFS group ID (AFS GID) is a negative integer.

Owner Specifies the AFS ID of the user or group who owns the entry and therefore can administer it.

Creator Specifies the AFS UID of the user who created the entry.

The following example is from the Example Corporation cell. The issuer provides no options, so the output includes user andmachine entries.

% pts listentriesName ID Owner Creatoranonymous 32766 -204 -204admin 1 -204 32766pat 1000 -204 1terry 1001 -204 1smith 1003 -204 1jones 1004 -204 1192.12.105.33 2000 -204 1192.12.105.46 2001 -204 1

13.4 Creating User and Machine Entries

An entry in the Protection Database is one of the two required components of every AFS user account, along with an entry inthe Authentication Database. It is best to create a Protection Database user entry only in the context of creating a complete useraccount, by using the uss add or uss bulk command as described in Creating and Deleting User Accounts with the uss CommandSuite, or the pts createuser command as described in Creating AFS User Accounts.

You can also use the pts createuser command to create Protection Database machine entries, which can then be used to controlaccess based on the machine from which the access request originates. After creating a machine entry, add it to a ProtectionDatabase group and place the group on ACLs ( a machine cannot appear on ACLs directly). Because all replicas of a volumeshare the same ACL (the one on the volume’s root directory mount point), you can replicate the volume that houses a program’sbinary file while still complying with a machine-based license agreement as required by the program’s manufacturer. If you donot place any other entries on the ACL, then only users working on the designated machines can access the file.

Keep in mind that creating an ACL entry for a group with machine entries in it extends access to both authenticated and unau-thenticated users working on the machine. However, you can deny access to unauthenticated users by omitting an entry for the


system:anyuser group from the ACLs of the parent directories in the file’s pathname. Conversely, if you want to enable unau-thenticated users on the machine to access a file, then the ACL on every directory leading to it must include an entry for eitherthe system:anyuser group or a group to which the machine entry belongs. For more information on the system:anyuser group,see The System Groups.

Because a machine entry can include unauthenticated users, it is best not to add both machine entries and user entries to thesame group. In general, it is easier to use and administer nonmixed groups. A machine entry can represent a single machine, ormultiple machines with consecutive IP addresses (that is, all machines on a network or subnet) specified by a wildcard notation.See the instructions in To create machine entries in the Protection Database.

By default, the Protection Server assigns the next available AFS UID to a new user or machine entry. It is best to allow this,especially for machine entries. For user entries, it makes sense to assign an AFS UID only if the user already has a UNIX UIDthat the AFS UID needs to match (see Assigning AFS and UNIX UIDs that Match). When automatically allocating an AFS UID,the Protection Server increments the max user id counter by one and assigns the result to the new entry. Use the pts listmaxcommand to display the counter, as described in Displaying and Setting the AFS UID and GID Counters.

Do not reuse the AFS UIDs of users who have left your cell permanently or machine entries you have removed, even thoughdoing so seems to avoid the apparent waste of IDs. When you remove a user or machine entry from the Protection Database, thefs listacl command displays the AFS UID associated with the former entry, rather than the name. If you then assign the AFS UIDto a new user or machine, the new user or machine automatically inherits permissions that were granted to the previous possessorof the ID. To remove obsolete AFS UIDs from ACLs, use the fs cleanacl command described in Removing Obsolete AFS IDsfrom ACLs.

In addition to the name and AFS UID, the Protection Server records the following values in the indicated fields of a new user ormachine’s entry. For more information and instructions on displaying an entry, see To display a Protection Database entry.

• It sets the owner field to the system:administrators group, indicating that the group’s members administer the entry.

• It sets the creator field to the username of the user who issued the pts createuser command (or the uss add or uss bulkcommand).

• It sets the membership field to 0 (zero), because the new entry does not yet belong to any groups.

• It sets the flags field to S----; for explanation, see Setting the Privacy Flags on Database Entries.

• It sets the group quota field to 20, meaning that the new user can create 20 groups. This field has no meaning for machineentries. For further discussion, see Setting Group-Creation Quota.

13.4.1 To create machine entries in the Protection Database



2. Issue the pts createuser command to create one or more machine entries.

% pts createuser -name <user name>+

where

cu Is an alias for createuser (and createu is the shortest acceptable abbreviation).

-name Specifies an IP address in dotted-decimal notation for each machine entry. An entry can represent a single machineor a set of several machines with consecutive IP addresses, using the wildcard notation described in the followinglist. The letters W, X, Y, and Z each represent an actual number value in the field:

• W.X.Y.Z represents a single machine, for example 192.12.108.240.• W.X.Y.0 matches all machines whose IP addresses start with the first three numbers. For example, 192.12.108.0

matches both 192.12.108.119 and 192.12.108.120, but does not match 192.12.105.144.


• W.X.0.0 matches all machines whose IP addresses start with the first two numbers. For example, the address192.12.0.0 matches both 192.12.106.23 and 192.12.108.120, but does not match 192.5.30.95.

• W.0.0.0 matches all machines whose IP addresses start with the first number in the specified address. For example,the address 192.0.0.0 matches both 192.5.30.95 and 192.12.108.120, but does not match 138.255.63.52.

Do not define a machine entry with the name 0.0.0.0 to match every machine. The system:anyuser group is equiva-lent.

The following example creates a machine entry that includes all of the machines in the 192.12 network.

% pts cu 192.12.0.0

13.5 Creating Groups

Before you can add members to a group, you must create the group entry itself. The instructions in this section explain how tocreate both regular and prefix-less groups:

• A regular group’s name is preceded by a prefix that indicates who owns the group, in the following format:

owner_name:group_name

Any user can create a regular group. Group names must always be typed in full, so a short group_name that indicates thegroup’s purpose or its members’ common interest is practical. Groups with names like terry:1 and terry:2 are less usefulbecause their purpose is unclear. For more details on the required format for regular group names, see the instructions in Tocreate groups.

• A prefix-less group, as its name suggests, has only one field in its name, equivalent to a regular group’s group_name field.

Only members of the system:administrators group can create prefix-less groups. For a discussion of their purpose, see UsingPrefix-Less Groups.

By default, the Protection Server assigns the next available AFS GID to a new group entry, and it is best to allow this. Whenautomatically allocating an AFS GID (which is a negative integer), the Protection Server decrements the max group idcounter by one and assigns the result to the new group. Use the pts listmax command to display the counter, as described inDisplaying and Setting the AFS UID and GID Counters.

In addition to the name and AFS GID, the Protection Server records the following values in the indicated fields of a new group’sentry. See To display a Protection Database entry.

• It sets the owner field to the issuer of the pts creategroup command, or to the user or group specified by the -owner argument.

• It sets the creator field to the username of the user who issued the pts creategroup command.

• It sets the membership field to 0 (zero), because the group currently has no members.

• It sets the flags field to S-M--; for explanation, see Setting the Privacy Flags on Database Entries.

• It sets the group quota field to 0, because this field has no meaning for group entries.

13.5.1 Using Groups Effectively

The main reason to create groups is to place them on ACLs, which enables you to control access for multiple users withouthaving to list them individually on the ACL. There are three basic ways to use groups, each suited to a different purpose:

• Private use: you create a group and place it on the ACL of directories you own, without necessarily informing the group’smembers that they belong to it. Members notice only that they can or cannot access the directory in a certain way. You retainsole administrative control over the group, since you are the owner.

The existence of the group and the identity of its members is not necessarily secret. Other users can use the fs listacl commandand see the group’s name on a directory’s ACL, or use the pts membership command to list the groups they themselvesbelong to. You can set the group’s third privacy flag to limit who can use the pts membership command to list the group’smembership, but a member of the system:administrators group always can; see Setting the Privacy Flags on Database Entries.


• Shared use: you inform the group’s members that they belong to the group, but you still remain the sole administrator. Forexample, the manager of a work group can create a group of all the members in the work group, and encourage them to use iton the ACLs of directories that house information they want to share with other members of the group.

NoteIf you place a group owned by someone else on your ACLs, the group’s owner can change the group’s membership withoutinforming you. Someone new can gain or lose access in a way you did not intend and without your knowledge.

• Group use: you create a group and then use the pts chown command to assign ownership to a group, either another group or thegroup itself (the latter type is a self-owned group). You inform the members of the owning group that they all can administerthe owned group.

The main advantage of designating a group as an owner is that it spreads responsibility for administering a group amongseveral people. A single person does not have to perform all administrative tasks, and if the original creator leaves the group,ownership does not have to be transferred.

However, everyone in the owner group can make changes that affect others negatively, such as adding or removing people fromthe group inappropriately or changing the group’s ownership to themselves exclusively. These problems can be particularlysensitive in a self-owned group. Using an owner group works best if all the members know and trust each other; it is probablywise to keep the number of people in an owner group small.

13.5.2 To create groups

1. If creating a prefix-less group, verify that you belong to the system:administrators group. If necessary, issue the ptsmembership command, which is fully described in To display the members of the system:administrators group.


2. Issue the pts creategroup command to create each group. All of the groups have the same owner.

% pts creategroup -name <group name>+ [-owner <owner of the group>]

where

cg Is an alias for creategroup (and createg is the shortest acceptable abbreviation).

-name Names each group to create. The name can include up to 63 lowercase letters or numbers, but it is best not toinclude punctuation characters, especially those that have a special meaning to the shell.A prefix-less group name cannot include the colon (:), because it is used to separate the two parts of a regular groupname:owner_name:group_nameThe Protection Server requires that the owner_name prefix of a regular group name accurately indicate the group’sowner. By default, you are recorded as the owner, and the owner_name must be your AFS username. You can includethe -owner argument to designate another AFS user, a regular group, or a prefix-less group as the owner, providingthe required value in the owner_name field:

• If the owner is a user, it must be the AFS username.• If the owner is another regular group, it must match the owning group’s owner_name field. For example, if the

owner is the group terry:associates, the owner field must be terry.• If the owner is a prefix-less group, it must be the owning group’s name.

(For a discussion of why it is useful for a group to own another group, see Using Groups Effectively.)

-owner Is optional and designates an owner other than the issuer of the command. Specify either an AFS username or thename of a regular or prefix-less group that already has at least one member. Do not include this argument if you wantto make the group self-owned as described in Using Groups Effectively. For instructions, see To create a self-ownedgroup.Do not designate a machine as a group’s owner. Because a machine cannot authenticate, there is no way for a machineto administer the group.


13.5.3 To create a self-owned group

1. Issue the pts creategroup command to create a group. Do not include the -owner argument, because you must own agroup to reassign ownership. For complete instructions, see To create groups.

% pts creategroup <group name>

2. Issue the pts adduser command to add one or more members to the group (a group must already have at least one memberbefore owning another group). For complete instructions, see Adding and Removing Group Members.

% pts adduser -user <user name>+ -group <group name>+

3. Issue the pts chown command to assign group ownership to the group itself. For complete instructions, see To change agroup’s owner.

% pts chown <group name> <new owner>

13.5.4 Using Prefix-Less Groups

Members of the system:administrators group can create prefix-less groups, which are particularly suitable for group use, whichis described in Using Groups Effectively.

Suppose, for example, that the manager of the Example Corporation’s Accounting Department, user smith, creates a group thatincludes all of the corporation’s accountants and places the group on the ACLs of directories that house departmental records.Using a prefix-less group rather than a regular group is appropriate for the following reasons:

• The fact that smith created and owns the group is irrelevant, and a regular group must be called smith:acctg. A prefix-lessname like acctg is more appropriate.

• If another user (say jones) ever replaces smith as manager of the Accounting Department, jones needs to become the newowner of the group. If the group is a regular one, its owner_name prefix automatically changes to jones, but the change inthe owner_name prefix does not propagate to any regular groups owned by the group. Someone must use the pts renamecommand to change each one’s owner_name prefix from smith to jones.

A possible solution is to create an authentication account for a fictional user called acctg and make it the owner of regular groupswhich have acctg as their owner_name prefix. However, if the acctg account is also used for other purposes, then the number ofpeople who need to know user acctg’s password is possibly larger than the number of people who need to administer the groupsit owns.

A prefix-less group called acctg solves the problem of inappropriate owner names. The groups that it owns have acctg as theirowner_name prefix, which more accurately reflects their purpose than having the manager’s name there. Prefix-less groupsare also more accountable than dummy authentication accounts. Belonging to the group enables individuals to exercise thepermissions granted to the group on ACLs, but users continue to perform tasks under their own names rather than under thedummy username. Even if the group owns itself, only a finite number of people can administer the group entry.

13.6 Adding and Removing Group Members

Users and machines can be members of groups; groups cannot belong to other groups. Newly created groups have no membersat all. To add them, use the pts adduser command; to remove them, use the pts removeuser command.


13.6.1 To add users and machines to groups

1. Verify that you belong to the system:administrators group, which enables you to add members to a group regardless ofthe setting of its fourth (a) privacy flag. By default the group’s owner also has the necessary privilege. If necessary, issuethe pts membership command, which is fully described in To display the members of the system:administrators group.


2. Issue the pts adduser command to add one or more members to one or more groups.

% pts adduser -user <user name>+ -group <group name>+

where

ad Is the shortest acceptable abbreviation of adduser.

-user Specifies each username or machine IP address to add as a member of each group named by the -group argument.A group cannot belong to another group.

group name Names each group to which to add the new members.

13.6.2 To remove users and machines from groups

1. Verify that you belong to the system:administrators group, which enables you to remove members from a group regardlessof the setting of its fifth (r) privacy flag. By default the group’s owner also has the necessary privilege. If necessary, issuethe pts membership command, which is fully described in To display the members of the system:administrators group.


2. Issue the pts removeuser command to remove one or more members from one or more groups.

% pts removeuser -user <user name>+ -group <group name>+

where

rem Is the shortest acceptable abbreviation of removeuser.

-user Specifies each user or machine IP address to remove from each group named by the -group argument.

-group Names each group from which to remove members.

13.7 Deleting Protection Database Entries

It is best to delete a Protection Database user entry only if you are removing the complete user account. Use either the uss deletecommand as described in Deleting Individual Accounts with the uss delete Command, or the pts delete command as describedin Removing a User Account.

To remove machine and group entries, use the pts delete command as described in this section. The operation has the followingresults:

• When you delete a machine entry, its name (IP address wildcard) is removed from groups.

• When you delete a group entry, its AFS GID appears on ACLs instead of the name. The group-creation quota of the user whocreated the group increases by one, even if the user no longer owns the group.

To remove obsolete AFS IDs from ACLs, use the fs cleanacl command as described in Removing Obsolete AFS IDs fromACLs.


13.7.1 To delete Protection Database entries

1. Verify that you belong to the system:administrators group or own the group you are deleting. If necessary, issue the ptsmembership command, which is fully described in To display the members of the system:administrators group.


2. Issue the pts delete command to delete one or more entries from the Protection Database.

% pts delete <user or group name or id>+

where

del Is the shortest acceptable abbreviation of delete.

user or group name or id Specifies the IP address or AFS UID of each machine or the name or AFS GID or each groupto remove.

13.8 Changing a Group’s Owner

For user and machine entries, the Protection Server automatically assigns ownership to the system:administrators group at cre-ation time, and this cannot be changed. For group entries, you can change ownership. This transfers administrative responsibilityfor it to another user or group (for information on group ownership of other groups, see Using Groups Effectively).

When you create a regular group, its owner_name prefix must accurately reflect its owner, as described in To create groups:

• If the owner is a user, owner_name is the username.

• If the owner is a regular group, owner_name is the owning group’s owner_name prefix.

• If the owner is a prefix-less group, owner_name is the owner group’s name.

When you change a regular group’s owner, the Protection Server automatically changes its owner_name prefix appropriately. Forexample, if the user pat becomes the new owner of the group terry:friends, its name automatically changes to pat:friends, bothin the Protection Database and on ACLs.

However, the Protection Server does not automatically change the owner_name prefix of any regular groups that the group owns.To continue with the previous example, suppose that the group terry:friends owns the group terry:pals. When pat becomes thenew owner of terry:friends, the name terry:pals does not change. To change the owner_name prefix of a regular group that isowned by another group (in the example, to change the group’s name to pat:pals), use the pts rename command as described inChanging a Protection Database Entry’s Name.

13.8.1 To change a group’s owner

1. Verify that you belong to the system:administrators group or own the group for which you are changing the owner.If necessary, issue the pts membership command, which is fully described in To display the members of the sys-tem:administrators group.


2. (Optional) If you are changing the group’s owner to another group (or to itself) and want to retain administrative privilegeon the owned group, verify that you belong to the new owner group. If necessary, issue the pts membership command,which is fully described in To display group membership.

% pts membership <user or group name or id>

Use the pts adduser command to add yourself if necessary, as fully described in To add users and machines to groups.


% pts adduser <user name> <group name>

3. Issue the pts chown command to change the group’s owner.

% pts chown <group name> <new owner>

where

cho Is the shortest acceptable abbreviation of chown.group name Specifies the current name of the group.new owner Names the user or group to become the group’s owner.

4. (Optional) Issue the pts listowned command to display any groups that the group owns. As discussed in the introductionto this section, the pts chown command does not automatically change the owner_name prefix of any regular groups thata group owns.


If you want to change their names to match the new owning group, use the pts rename command on each one, as describedin To change the name of a machine or group entry.


13.9 Changing a Protection Database Entry’s Name

To change the name of a Protection Database entry, use the pts rename command. It is best to change a user entry’s name onlywhen renaming the entire user account, since so many components of the account (Authentication Database entry, volume name,home directory mount point, and so on) share the name. For instructions, see Changing Usernames. A machine entry’s namemaps to the actual IP address of one or more machine, so changing the entry’s name is appropriate only if the IP addresses havechanged.

It is likely, then, that most often you need to change group names. The following types of name changes are possible:

• Changing a regular group’s name to another regular group name. The most common reason for this type of change is that youhave used the pts chown command to change the owner of the group. That operation does not change the owner_name prefixof a regular group owned by the group whose name has been changed. Therefore, you must use the pts rename commandto change it appropriately. For example, when user pat becomes the owner of the terry:friends group, its name changesautomatically to pat:friends, but the name of a group it owns, terry:pals, does not change. Use the pts rename command torename terry:pals to pat:pals. The Protection Server does not accept changes to the owner_name prefix that do not reflect thetrue ownership (changing terry:pals to smith:pals is not possible).You can also use the pts rename command to change the group_name portion of a regular group name, with or withoutchanging the owner_name prefix.Both the group’s owner and the members of the system:administrators group can change its name to another regular groupname.

• Changing a regular group’s name to a prefix-less name. If you change a group’s name in this way, you must also use the ptsrename command to change the name of any regular group that the group owns. Only members of the system:administratorsgroup can make this type of name change.

• Changing a prefix-less name to another prefix-less name. As with other name changes, the owner_name prefix of any regulargroups that the prefix-less group owns does not change automatically. You must issue the pts rename command on them tomaintain consistency.Both the group’s owner and the members of the system:administrators group can change its name to another prefix-less name.

• Changing a prefix-less name to a regular name. The owner_name prefix on the new name must accurately reflect the group’sownership. As with other name changes, the owner_name prefix of any regular groups that the prefix-less group owns does notchange automatically. You must issue the pts rename command on them to maintain consistency.Only members of the system:administrators group can make this type of name change.


13.9.1 To change the name of a machine or group entry



2. Issue the pts rename command to change the entry’s name.


where

ren Is the shortest acceptable abbreviation of rename.

old name Specifies the entry’s current name.

new name Specifies the new name. If the new name is for a regular group, the owner_name prefix must correctly indicatethe owner.

13.10 Setting Group-Creation Quota

To prevent abuse of system resources, the Protection Server imposes a group-creation quota that limits how many more groups auser can create. When a new user entry is created, the quota is set to 20, but members of the system:administrators group canuse the pts setfields command to increase or decrease it at any time.

It is pointless to change group-creation quota for machine or group entries. It is not possible to authenticate as a group or machineand then create groups.

To display the group-creation quota, use the pts examine command to display a user entry’s group quota field, as de-scribed in To display a Protection Database entry.

13.10.1 To set group-creation quota



2. Issue the pts setfields command to specify how many more groups each of one or more users can create.

% pts setfields -nameorid <user or group name or id>+ \-groupquota <set limit on group creation>

where

setf Is the shortest acceptable abbreviation of setfields.

-nameorid Specifies the name or AFS UID of each user for which to set group-creation quota.

-groupquota Defines how many groups each user can create in addition to existing groups (in other words, groups thatalready exist do not count against the quota). The value you specify overwrites the current value, rather than incre-menting it.


13.11 Setting the Privacy Flags on Database Entries

Members of the system:administrators group can always display and administer Protection Database entries in any way, andregular users can display and administer their own entries and any group entries they own. The privacy flags on a ProtectionDatabase entry determine who else can display certain information from the entry, and who can add and remove members in agroup.

To display the flags, use the pts examine command as described in To display a Protection Database entry. The flags appear inthe output’s flags field. To set the flags, include the -access argument to the pts setfields command.

The five flags always appear, and always must be set, in the following order:

s Controls who can issue the pts examine command to display the entry.

o Controls who can issue the pts listowned command to display the groups that a user or group owns.

m Controls who can issue the pts membership command to display the groups a user or machine belongs to, or which users ormachines belong to a group.

a Controls who can issue the pts adduser command to add a user or machine to a group. It is meaningful only for groups, but avalue must always be set for it even on user and machine entries.

r Controls who can issue the pts removeuser command to remove a user or machine from a group. It is meaningful only forgroups, but a value must always be set for it even on user and machine entries.

Each flag can take three possible types of values to enable a different set of users to issue the corresponding command:

• A hyphen (-) designates the members of the system:administrators group and the entry’s owner. For user entries, it designatesthe user in addition.

• The lowercase version of the letter applies meaningfully to groups only, and designates members of the group in addition tothe individuals designated by the hyphen.

• The uppercase version of the letter designates everyone.

For example, the flags SOmar on a group entry indicate that anyone can examine the group’s entry and display the groups that itowns, and that only the group’s members can display, add, or remove its members.

The default privacy flags for user and machine entries are S----, meaning that anyone can display the entry. The ability toperform any other functions is restricted to members of the system:administrators group and the entry’s owner (as well as theuser for a user entry).

The default privacy flags for group entries are S-M--, meaning that all users can display the entry and the members of the group,but only the entry owner and members of the system:administrators group can perform other functions.

13.11.1 To set a Protection Database entry’s privacy flags



2. Issue the pts setfields command to set the privacy flags.

% pts setfields <user or group name or id>+ -access <set privacy flags>

where

setf Is the shortest acceptable abbreviation of setfields.


user or group name or id Specifies the name or AFS UID of each user, the IP address or AFS UID of each machine, orthe name or AFS GID of each group for which to set the privacy flags.

-access Specifies the set of privacy flags to associate with each entry. Provide a value for each of the five flags, observingthe following constraints:

• Provide a value for all five flags, even though the fourth and fifth flags are not meaningful for user and machineentries.

• For self-owned groups, the hyphen is equivalent to a lowercase letter, because all the members of a self-ownedgroup own it.

• Set the first flag to lowercase s or uppercase S only. For user and machine entries, the Protection Server interpretsthe lowercase s as equivalent to the hyphen.

• Set the second flag to the hyphen (-) or uppercase O only. For groups, the Protection Server interprets the hyphenas equivalent to lowercase o (that is, members of a group can always list the groups that it owns).

• Set the third flag to the hyphen (-), lowercase m, or uppercase M. For user and machine entries, the lowercase mdoes not have a meaningful interpretation, because they have no members.

• Set the fourth flag to the hyphen (-), lowercase a, or uppercase A. Although this flag does not have a meaningfulinterpretation for user and machine entries (because they have no members), it must be set, preferably to the hyphen.

• Set the fifth flag to the hyphen (-) or lowercase r only. Although this flag does not have a meaningful interpretationfor user and machine entries (because they have no members), it must be set, preferably to the hyphen.

13.12 Displaying and Setting the AFS UID and GID Counters

When you use the pts createuser command to create a user or machine entry in the Protection Database, the Protection Serverby default automatically allocates an AFS user ID (AFS UID) for it; similarly, it allocates an AFS group ID (AFS GID) for eachgroup entry you create with the pts creategroup command. It tracks the next available AFS UID (which is a positive integer)and AFS GID (which is a negative integer) with the max user id and max group id counters, respectively.

Members of the system:administrators group can include the -id argument to either pts creation command to assign a specificID to a new user, machine, or group. It often makes sense to assign AFS UIDs explicitly when creating AFS accounts for userswith existing UNIX accounts, as discussed in Assigning AFS and UNIX UIDs that Match. It is also useful if you want to establishranges of IDs that correspond to departmental affiliations (for example, assigning AFS UIDs from 300 to 399 to members of onedepartment, AFS UIDs from 400 to 499 to another department, and so on).

To display the current value of the counters, use the pts listmax command. When you next create a user or machine entry anddo not specify its AFS UID, the Protection Server increments the max user id counter by one and assigns that number to thenew entry. When you create a new group and do not specify its AFS GID, the Protection Server decrements the max groupid counter by one (makes it more negative), and assigns that number to the new group.

You can change the value of either counter, or both, in one of two ways:

• Directly, using the pts setmax command.

• Indirectly, by using the -id argument to the pts createuser command to assign an AFS UID that is larger than the max userid counter, or by using the -id to the pts creategroup command to assign an AFS GID that is less (more negative) than themax group id counter. In either case, the Protection Server changes the counter to the value of the -id argument. The ProtectionServer does not use the IDs between the previous value of the counter and the new one when allocating IDs automatically,unless you use the pts setmax command to move the counter back to its old value.

If the value you specify with the -id argument is less than the max user id counter or greater (less negative) than the maxgroup id counter, then the counter does not change.

13.12.1 To display the AFS ID counters

1. Issue the pts listmax command to display the counters.


% pts listmax

where listm is an acceptable abbreviation of listmax.

The following example illustrates the output’s format. In this case, the next automatically assigned AFS UID is 5439 and AFSGID is -469.

% pts listmaxMax user id is 5438 and max group id is -468.

13.12.2 To set the AFS ID counters



2. Issue the pts setmax command to set the max user id counter, the max group id counter, or both.

% pts setmax [-group <group max>] [-user <user max>]

where

setm Is the shortest acceptable abbreviation of setmax.

-group Specifies an integer one greater (less negative) than the AFS GID that the Protection Server is to assign to the nextgroup entry. Because the value is a negative integer, precede it with a hyphen (-).

-user Specifies an integer one less than the AFS UID that the Protection Server is to assign to the next user or machineentry.


Chapter 14

Managing Access Control Lists

To control access to a directory and all of the files in it, AFS associates an access control list (ACL) with it, rather than themode bits that the UNIX file system (UFS) associates with individual files or directories. AFS ACLs provide more refined accesscontrol because there are seven access permissions rather than UFS’s three, and there is room for approximately 20 user or groupentries on an ACL, rather than just the three UFS entries (owner, group, and other).



Examine access control list fs listaclEdit ACL’s normal permissions section fs setaclEdit ACL’s negative permissions section fs setacl with -negative flagReplace an ACL fs setacl with -clear flagCopy an ACL fs copyaclRemove obsolete AFS UIDs fs cleanacl

14.2 Protecting Data in AFS

This section describes the main differences between the AFS and UFS file protection systems, discusses the implications ofdirectory-level protections, and describes the seven access permissions.

14.2.1 Differences Between UFS and AFS Data Protection

The UFS mode bits data protection system and the AFS ACL system differ in the following ways:

• Protection at the file level (UFS) versus the directory level (AFS)

UFS associates a set of nine mode bits with each file element, three (rwx) for each of the element’s owner, owning group, andall other users. A similar set of mode bits on the file’s directory applies to the file only in an oblique way.

An AFS ACL instead protects all files in a directory in the same way. If a certain file is more sensitive than others, store it in adirectory with a more restrictive ACL.

Defining access at the directory level has important consequences:

– The permissions on a directory’s ACL apply to all of the files in the directory. When you move a file to a different directory,you effectively change the access permissions that apply to it to those on its new directory’s ACL. Changing a directory’sACL changes the protection on all the files in it.


– When you create a subdirectory, its initial ACL is created as a copy of its parent directory’s ACL. You can then change thesubdirectory’s ACL independently. However, the parent directory’s ACL continues to control access to the subdirectory inthe following way: the parent directory’s ACL must grant the l (lookup) permission to a user (or a group the user belongsto) in order for the user to access the subdirectory at all.In general, then, it is best to assign fairly liberal access permissions to high-level directories (including user home directo-ries). In particular, it often makes sense to grant at least the l permission to the system:anyuser or system:authuser groupon high-level directories. For further discussion, see Using Groups on ACLs.

• How the mode bits are interpreted

Mode bits are the only file-protection system in UFS. AFS allows you to set the UNIX mode bits on a file in addition to theACL on its directory, but it interprets them differently. See How AFS Interprets the UNIX Mode Bits.

• Three access permissions (UFS) versus seven (AFS)

UFS defines three access permissions in the form of mode bits: r (read), w (write), and x (execute). AFS defines sevenpermissions, which makes access control more precise. For detailed descriptions, see The AFS ACL Permissions.

a (administer)d (delete)i (insert)k (lock)l (lookup)r (read)w (write)

• Three defined users and groups (UFS) versus many (AFS)

UFS controls access for one user and two groups by providing a set of mode bits for each: the user who owns the file ordirectory, a single defined group, and everyone who has an account on the system.

AFS, in contrast, allows you to place many entries (individual users or groups) on an ACL, granting a different set of accesspermissions to each one. The number of possible entries is about 20, and depends on how much space each entry occupies inthe memory allocated for the ACL itself.

AFS defines two system groups, system:anyuser and system:authuser, which represent all users and all authenticated users,respectively; for further discussion, see Using Groups on ACLs. In addition, users can define their own groups in the ProtectionDatabase, consisting of individual users or machine IP addresses. Users who have the a permission on an ACL can createentries for the system groups as well as groups defined by themselves or other users. For information on defining groups, seeAdministering the Protection Database.

When a user requests access to a file or directory, the File Server sums together all of the permissions that the relevant ACLextends to the user and to groups to which the user belongs. Placing group entries on ACLs therefore can control access formany more users than the ACL can accommodate as individual entries.

14.2.2 The AFS ACL Permissions

Functionally, the seven standard ACL permissions fall into two groups: one that applies to the directory itself and one that appliesto the files it contains.

14.2.2.1 The Four Directory Permissions

The four permissions in this group are meaningful with respect to the directory itself. For example, the i (insert) permission doesnot control addition of data to a file, but rather creation of a new file or subdirectory.

The l (lookup) permission This permission functions as something of a gate keeper for access to the directory and its files,because a user must have it in order to exercise any other permissions. In particular, a user must have this permission toaccess anything in the directory’s subdirectories, even if the ACL on a subdirectory grants extensive permissions.

This permission enables a user to issue the following commands:

• The ls command to list the names of the files and subdirectories in the directory


• The ls -ld command to obtain complete status information for the directory element itself

• The fs listacl command to examine the directory’s ACL

This permission does not enable a user to read the contents of a file in the directory, to issue the ls -l command on a file inthe directory, or to issue the fs listacl command with the filename as the -path argument. Those operations require the r(read) permission which is described in The Three File Permissions.

Similarly, this permission does not enable a user to issue the ls, ls -l, ls -ld, or fs listacl commands against a subdirectoryof the directory. Those operations require the l permission on the ACL of the subdirectory itself.

The i (insert) permission This permission enables a user to add new files to the directory, either by creating or copying, and tocreate new subdirectories. It does not extend into any subdirectories, which are protected by their own ACLs.

The d (delete) permission This permission enables a user to remove files and subdirectories from the directory or move theminto other directories (assuming that the user has the i permission on the ACL of the other directories).

The a (administer) permission This permission enables a user to change the directory’s ACL. Members of the system:administratorsgroup implicitly have this permission on every directory (that is, even if that group does not appear on the ACL). Similarly,the owner of a volume root directory implicitly has this permission on its ACL and those of all directories within thevolume.

14.2.2.2 The Three File Permissions

The three permissions in this group are meaningful with respect to files in a directory, rather than the directory itself or itssubdirectories.

The r (read) permission This permission enables a user to read the contents of files in the directory and to issue the ls -lcommand to stat the file elements.

The w (write) permission This permission enables a user to modify the contents of files in the directory and to issue the chmodcommand to change their UNIX mode bits.

The k (lock) permission This permission enables the user to run programs that issue system calls to lock files in the directory.

14.2.2.3 The Eight Auxiliary Permissions

AFS provides eight additional permissions that do not have a defined meaning, denoted by the uppercase letters A, B, C, D, E,F, G, and H.

You can write application programs that assign a meaning to one or more of the permissions, and then place them on ACLs tocontrol file access by those programs. For example, you can modify a print program to recognize and interpret the permissions,and then place them on directories that house files that the program accesses. Use the fs listacl and fs setacl commands to displayand set the auxiliary permissions on ACLs just like the standard seven.

14.2.2.4 Shorthand Notation for Sets of Permissions

You can combine the seven permissions in any way in an ACL entry, but certain combinations are more useful than others.Four of the more common combinations have corresponding shorthand forms. When using the fs setacl command to defineACL entries, you can provide either one or more of the individual letters that represent the permissions, or one of the followingshorthand forms:

all Represents all seven standard permissions (rlidwka).

none Removes the entry from the ACL, leaving the user or group with no permissions.

read Represents the r (read) and l (lookup) permissions.

write Represents all permissions except a (administer): rlidwk.


14.2.3 Using Normal and Negative Permissions

ACLs enable you both to grant and to deny access to a directory and the files in it. To grant access, use the fs setacl commandto create an ACL entry that associates a set of permissions with a user or group, as described in Setting ACL Entries. When youuse the fs listacl command to display an ACL (as described in Displaying ACLs), such entries appear underneath the followingheader, which uses the term rights to refer to permissions:

Normal rights

There are two ways to deny access:

1. The recommended method is simply to omit an entry for the user or group from the ACL, or to omit the appropriatepermissions from the entry. Use the fs setacl command to remove or edit an existing entry, using the instructions in Toadd, remove, or edit normal ACL permissions. In most circumstances, this method is enough to prevent access of certainkinds or by certain users. You must take care, however, not to grant the undesired permissions to any groups to which suchusers belong.

2. The more explicit method for denying access is to use the -negative flag to the fs setacl command to create an entrythat associates negative permissions with the user or group; for instructions, see To add, remove, or edit negative ACLpermissions. The output from the fs listacl command lists negative entries underneath the following header:

Negative rights

When determining what type of access to grant to a user, the File Server first compiles a set of permissions by examining allof the entries in the Normal rights section of the ACL. It then subtracts any permissions associated with the user (orwith groups to which the user belongs) on the Negative rights section of the ACL. Therefore, negative permissionsalways cancel out normal permissions.

Using negative permissions reverses the usual semantics of the fs setacl command, introducing the potential for confusion.In particular, combining the none shorthand and the -negative flag constitutes a double negative: by removing an entryfrom the Negative rights section of the ACL, you enable a user once again to obtain permissions via entries in theNormal rights section. Combining the all shorthand with the -negative flag explicitly denies all permissions.

Note also that it is pointless to create an entry in the Negative rights section if an entry in the Normal rightssection grants the denied permissions to the system:anyuser group. In this case, users can obtain the permissions simplyby using the unlog command to discard their tokens. When they do so, the File Server recognizes them as the anonymoususer, who belongs to the system:anyuser group but does not match the entries on the Negative rights section of theACL.

14.2.4 Using Groups on ACLs

As previously mentioned, placing a group entry on an ACL enables you to control access for many users at once. You can granta new user access to many files and directories simply by adding the user to a group that appears on the relevant ACLs. You canalso create groups of machines, in which case any user logged on to the machine obtains the access that is granted to the group.On directories where they have the a permission on the ACL, users can define their own groups and can create ACL entries forany groups, not just groups that they create or own themselves. For instructions on creating groups of users or machines, and adiscussion of the most effective ways to use different types of groups, see Administering the Protection Database.

AFS also defines the following two system groups, which can be very useful on ACLs because they potentially represent a largegroup of people. For more information about these groups, see The System Groups.

system:anyuser Includes anyone who can access the cell’s file tree, including users who have logged in as the local superuserroot, have connected to a local machine from somewhere outside the cell, and AFS users who belong to a foreign cell.This group includes users who do not have tokens that are valid for the local AFS servers; the servers recognize them asthe user anonymous.

Note that creating an ACL entry for this group is the only way to extend access to AFS users from foreign cells, unless youcreate local authentication accounts for them.

system:authuser Includes all users who have a valid AFS token obtained from the local cell’s authentication service.


It is particularly useful to grant the l (lookup) permission to the system:anyuser group on the ACL of most directories in thefile system, especially at the upper levels. This permission enables users only to learn the names of files and subdirectories in adirectory, but without it they cannot traverse their way through the directories in the path to a target file.

A slightly more restrictive alternative is to grant the l permission to the system:authuser group. If that is still not restrictiveenough, you can grant the l to specific users or groups, which cannot exceed about 20 in number on a given ACL.

Another reason to grant certain permissions to the system:anyuser group is to enable the correct operation of processes thatprovide services such as printing and mail delivery. For example, in addition to the l permission, a print process possibly needsthe r (read) permission in order to access the contents of files, and a mail delivery process possibly requires the i (insert)permission to deliver new pieces of mail.

The ACL on the root directory of every newly created volume grants all permissions to the system:administrators group. Youcan remove this entry if you wish, but members of the system:administrators group always implicitly have the a (administer),and by default also the l, permission on every directory’s ACL. The a permission enables them to grant themselves otherpermissions explicitly when necessary. To learn about changing this default set of permissions, see Administering the sys-tem:administrators Group.

14.3 Displaying ACLs

To display the ACL associated with a file, directory or symbolic link, issue the fs listacl command. The output for a symboliclink displays the ACL that applies to its target file or directory, rather than the ACL on the directory that houses the symboliclink.

Note for AFS/DFS Migration Toolkit users: If the machine on which you issue the fs listacl command is configured to accessa DCE cell’s DFS filespace via the AFS/DFS Migration Toolkit, you can use the command to display the ACL on DFS files anddirectories. To display a DFS directory’s Initial Container and Initial Object ACL instead of the regular one, include the fs listaclcommand’s -id or -if flag. For instructions, see the OpenAFS/DFS Migration Toolkit Administration Guide and Reference. Thefs command interpreter ignores the -id and -if flags if you include them when displaying an AFS ACL.

14.3.1 To display an ACL

1. Issue the fs listacl command.

% fs listacl [<dir/file path>+]

where

la Is an acceptable alias for listacl (and lista is the shortest acceptable abbreviation).dir/file path Names one or more files or directories for which to display the ACL. For files, the output displays the ACL

for its directory. If you omit this argument, the output is for the current working directory. Partial pathnames areinterpreted relative to the current working directory. You can also use the following notation on its own or as part ofa pathname:. (A single period). Specifies the current working directory... (Two periods). Specifies the current working directory’s parent directory.* (The asterisk). Specifies each file and subdirectory in the current working directory. The ACL displayed for a file

is always the same as for its directory, but the ACL for each subdirectory can differ.

The following error message indicates that you do not have the permissions needed to display an ACL. To specify a directoryname as the dir/file path argument, you must have the l (lookup) permission on the ACL. To specify a filename, you must alsohave the r (read) permission on its directory’s ACL.

fs: You don’t have the required access permissions on ’dir/file path’

Members of the system:administrators group and the directory’s owner (as reported by the ls -ld command) implicitly havethe a (administer) permission on every directory’s ACL, and can use the fs setacl command to grant themselves the requiredpermissions; for instructions, see Setting ACL Entries.

The output for each file or directory specified as dir/file path begins with the following header to identify it:


Access list for dir/file path is

The Normal rights header appears on the next line, followed by lines that each pair a user or group name and a set ofpermissions. The permissions appear as the single letters defined in The AFS ACL Permissions, and always in the order rlidwka.If there are any negative permissions, the Negative rights header appears next, followed by pairs of negative permissions.

The following example displays the ACL on user terry’s home directory in the Example Corporation cell:

% fs la /afs/example.com/usr/terryAccess list for /afs/example.com/usr/terry isNormal permissions:

system:authuser rlpat rlwterry rlidwka

Negative permissions:terry:other-dept rljones rl

where pat, terry, and jones are individual users, system:authuser is a system group, and terry:other-dept is a group that terryowns. The list of normal permissions grants all permissions to terry, the r (read), l (lookup), and w (write) permissions to pat,and the r and l permissions to the members of the system:authuser group.

The list of negative permissions denies the r and l permissions to jones and the members of the terry:other-dept group. Theseentries effectively prevent them from accessing terry’s home directory in any way, because they cancel out the r and l permissionsextended to the system:authuser group, which is the only entry on the Normal rights section of the ACL that possiblyapplies to them.

14.4 Setting ACL Entries

To add, remove, or edit ACL entries, use the fs setacl command. By default, the command manipulates entries on the normalpermissions section of the ACL. To manipulate entries on the negative permissions section, include the -negative flag.

You must have the a (administer) permission on an ACL to edit it. The owner of a directory (as reported by the ls -ld) com-mand and members of the system:administrators group always implicitly have it on every ACL. By default, members of thesystem:administrators group also implicitly have the l (lookup) permission.

Note for AFS/DFS Migration Toolkit users: If the machine on which you issue the fs setacl command is configured to accessa DCE cell’s DFS filespace via the AFS/DFS Migration Toolkit, you can use the command to set the ACL on DFS files anddirectories. To set a DFS directory’s Initial Container and Initial Object ACL instead of the regular one, include the fs setaclcommand’s -id or -if flag. For instructions, see the OpenAFS/DFS Migration Toolkit Administration Guide and Reference. Thefs command interpreter ignores the -id and -if flags if you include them when setting an AFS ACL.

14.4.1 To add, remove, or edit normal ACL permissions

1. Verify that you have the a (administer) permission on each directory for which you are editing the ACL. If necessary,issue the fs listacl command, which is fully described in Displaying ACLs.


2. Issue the fs setacl command to edit entries in the normal permissions section of the ACL. To remove an entry, specify thenone shorthand as the permissions. If an ACL entry already exists, the permissions you specify completely replace thosein the existing entry.

% fs setacl -dir <directory>+ -acl <access list entries>+

where

sa Is an acceptable alias for setacl (and seta is the shortest acceptable abbreviation).


-dir Names one or more directories to which to apply the ACL entries defined by the -acl argument. Partial pathnamesare interpreted relative to the current working directory.Specify the read/write path to each directory, to avoid the failure that results when you attempt to change a read-onlyvolume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). For further discussion of the concept of read/write and read-onlypaths through the filespace, see The Rules of Mount Point Traversal.You can also use the following notation on its own or as part of a pathname:

. (A single period). If used by itself, sets the ACL on the current working directory.

.. (Two periods). If used by itself, sets the ACL on the current working directory’s parent directory.* (The asterisk). Sets the ACL on each of the subdirectories in the current working directory. You must precede it

with the -dir switch, since it potentially designates multiple directories. The fs command interpreter generatesthe following error message for each file in the directory:

fs: ’filename’: Not a directory

If you specify only one directory or file name, you can omit the -dir and -acl switches.

-acl Specifies one or more ACL entries, each of which pairs a user or group name and a set of permissions. Separate thepairs, and the two parts of each pair, with one or more spaces.To define the permissions, provide either:

• One or more of the letters that represent the standard or auxiliary permissions (rlidwka and ABCDEFGH), in anyorder

• One of the four shorthand notations:– all (equals rlidwka)– none (removes the entry)– read (equals rl)– write (equals rlidwk)

For a more detailed description of the permissions and shorthand notations, see The AFS ACL Permissions.On a single command line, you can combine user and group entries. You can also use individual letters in some pairsand the shorthand notations in other pairs, but cannot combine letters and shorthand notation within a single pair.

Either of the following examples grants user pat the r (read) and l (lookup) permissions on the ACL of the notes subdirectoryin the issuer’s home directory. They illustrate how it is possible to omit the -dir and -acl switches when you name only onedirectory.

% fs sa ~/notes pat rl% fs sa ~/notes pat read

The following example edits the ACL for the current working directory. It removes the entry for the system:anyuser group, andadds two entries: one grants all permissions except a (administer) to the members of the terry:colleagues group and the othergrants the r (read) and l (lookup) permissions to the system:authuser group. The command appears on two lines here only forlegibility.

% fs sa -dir . -acl system:anyuser none terry:colleagues write \system:authuser rl

14.4.2 To add, remove, or edit negative ACL permissions



2. Issue the fs setacl command with the -negative flag to edit entries in the negative permissions section of the ACL. Toremove an entry, specify the none shorthand as the permissions. If an ACL entry already exists for a user or group, thepermissions you specify completely replace those in the existing entry.


% fs setacl -dir <directory>+ -acl <access list entries>+ -negative

where

sa Is an acceptable alias for setacl (and seta is the shortest acceptable abbreviation).-dir Names one or more directories to which to apply the negative ACL entries defined by the -acl argument. Specify the

read/write path to each directory, to avoid the failure that results when you attempt to change a read-only volume.For a detailed description of acceptable values, see To add, remove, or edit normal ACL permissions.

-acl Specifies one or more ACL entries, each of which pairs a user or group name and a set of permissions. Separate thepairs, and the two parts of each pair, with one or more spaces. For a detailed description of acceptable values, see Toadd, remove, or edit normal ACL permissions. Keep in mind that the usual meaning of each permission is reversed.

-negative Places the entries defined by the -acl argument on the negative permissions section of the ACL for each directorynamed by the -dir argument.

The following example denies user pat the w (write) and d (delete) permissions for the project subdirectory of the currentworking directory.

% fs sa project pat wd -neg

14.5 Completely Replacing an ACL

It is sometimes simplest to clear an ACL completely before defining new permissions on it, for instance if the mix of normaland negative permissions makes it difficult to understand how their interaction affects a user’s access to the directory. To clearan ACL completely while you define new entries, include the -clear flag on the fs setacl command. When you include this flag,you can create entries on either the normal permissions or the negative permissions section of the ACL, but not on both at once.

Remember to create an entry that grants appropriate permissions to the directory’s owner. The owner implicitly has the a(administer) permission required to replace a deleted entry, but the effects of a missing ACL entry (particularly the lack of thelookup permission) can be so confusing that it becomes difficult for the owner to realize that the missing entry is causing theproblems.

14.5.1 To replace an ACL completely



2. Issue the fs setacl command with the -clear flag to clear the ACL completely before setting either normal or negativepermissions. Because you need to grant the owner of the directory all permissions, it is better in most cases to set normalpermissions at this point.

% fs setacl -dir <directory>+ -acl <access list entries>+ -clear \[-negative]

where

sa Is an acceptable alias for setacl (and seta is the shortest acceptable abbreviation).-dir Names one or more directories to which to apply the negative ACL entries defined by the -acl argument. Specify the

read/write path to each directory, to avoid the failure that results when you attempt to change a read-only volume.For a detailed description of acceptable values, see To add, remove, or edit normal ACL permissions.

-acl Specifies one or more ACL entries, each of which pairs a user or group name and a set of permissions. Separate thepairs, and the two parts of each pair, with one or more spaces. Remember to grant all permissions to the owner of thedirectory. For a detailed description of acceptable values, see To add, remove, or edit normal ACL permissions.

-clear Removes all entries from each ACL before creating the entries indicated by the -acl argument.-negative Places the entries defined by the -acl argument on the negative permissions section of each ACL.


14.6 Copying ACLs Between Directories

The fs copyacl command copies a source directory’s ACL to one or more destination directories. It does not affect the sourceACL at all, but changes each destination ACL as follows:

• If an entry on the source ACL does not exist on the destination ACL, the command copies it to the destination ACL.

• If an entry on the destination ACL does not also exist on the source ACL, the command does not remove it unless you includethe -clear flag to overwrite the destination ACL completely.

• If an entry is on both ACLs, the command changes the permissions on the destination ACL entry to match the source ACLentry.

Note for AFS/DFS Migration Toolkit users: If the machine is configured to enable AFS users to access a DCE cell’s DFSfilespace via the AFS/DFS Migration Toolkit, then you can use the fs copyacl command to copy ACLs between DFS files anddirectories also. The command includes -id and -if flags for altering a DFS directory’s Initial Container and Initial Object ACLsas well as its regular ACL; see the OpenAFS/DFS Migration Toolkit Administration Guide and Reference. You cannot copyACLs between AFS and DFS directories, because they use different ACL formats. The fs command interpreter ignores the -idand -if flags if you include them when copying AFS ACLs.

14.6.1 To copy an ACL between directories

1. Verify that you have the l (lookup) permission on the source ACL and the a (administer) permission on each destinationACL. To identify the source directory by naming a file in it, you must also have the r (read) permission on the sourceACL. If necessary, issue the fs listacl command, which is fully described in Displaying ACLs.


2. Issue the fs copyacl command to copy a source ACL to the ACL on one or more destination directories. (The commandappears here on two lines only for legibility.)

% fs copyacl -fromdir <source directory> -todir <destination directory>+ \[-clear]

where

co Is the shortest acceptable abbreviation for copyacl.-fromdir Names the source directory from which to copy the ACL. Partial pathnames are interpreted relative to the current

working directory. If this argument names a file, the ACL is copied from its directory.

-todir Names each destination directory to which to copy the source ACL. Partial pathnames are interpreted relative to thecurrent working directory. Filenames are not acceptable.Specify the read/write path to each directory, to avoid the failure that results when you attempt to change a read-onlyvolume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). For further discussion of the concept of read/write and read-onlypaths through the filespace, see The Rules of Mount Point Traversal.

-clear Completely overwrites each destination directory’s ACL with the source ACL.

The following example copies the ACL from the current working directory’s notes subdirectory to the plans subdirectory. Theissuer does not include the -clear flag, so the entry for user pat remains on the plans directory’s ACL although there is nocorresponding entry on the notes directory’s ACL.

% fs la notes plansAccess list for notes isNormal permissions:

terry rlidwkasmith rl


jones rlAccess list for plans isNormal permissions:

terry rlidwkpat rlidwk

% fs copyacl notes plans% fs la notes plansAccess list for notes isNormal permissions:

terry rlidwkasmith rljones rl

Access list for plans isNormal permissions:

terry rlidwkapat rlidwksmith rljones rl

14.7 Removing Obsolete AFS IDs from ACLs

When you remove a user or group entry from the Protection Database, the fs listacl command displays the user’s AFS UID (orgroup’s AFS GID) in ACL entries, rather than the name. In the following example, user terry has an ACL entry for the groupterry:friends (AFS GID -567) on her home directory in the Example Corporation cell, and then removes the group from theProtection Database.

% fs listacl /afs/example.com/usr/terryAccess list for /afs/example.com/usr/terry isNormal permissions:terry:friends rliksystem:anyuser lterry rlidwka

% pts delete terry:friends% fs listacl /afs/example.com/usr/terryAccess list for /afs/example.com/usr/terry isNormal permissions:-567 rliksystem:anyuser lterry rlidwka

Leaving AFS IDs on ACLs serves no function, because the ID no longer corresponds to an active user or group. Furthermore,if the ID is ever assigned to a new user or group, then the new possessor of the ID gains access that the owner of the directoryactually intended for the previous possessor. (Reusing AFS IDs is not recommended precisely for this reason.)

To remove obsolete AFS UIDs from ACLs, use the fs cleanacl command.

14.7.1 To clean obsolete AFS IDs from an ACL

1. Verify that you have the a (administer) permission on each directory for which you are cleaning the ACL. If necessary,issue the fs listacl command, which is fully described in Displaying ACLs.


2. Issue the fs cleanacl command to remove entries for obsolete AFS IDs.

% fs cleanacl [<dir/file path>+]

where


cl Is the shortest acceptable abbreviation of cleanacl.dir/file path Names each directory for which to clean the ACL. If this argument names a file, its directory’s ACL is

cleaned. Omit this argument to clean the current working directory’s ACL.Specify the read/write path to each directory, to avoid the failure that results when you attempt to change a read-onlyvolume. By convention, you indicate the read/write path by placing a period before the cell name at the pathname’ssecond level (for example, /afs/.example.com). For further discussion of the concept of read/write and read-onlypaths through the filespace, see The Rules of Mount Point Traversal.You can also use the following notation on its own or as part of a pathname:

. (A single period). If used by itself, cleans the current working directory’s ACL.

.. (Two periods). If used by itself, cleans the ACL on the current working directory’s parent directory.* (The asterisk). Cleans the ACL of each of the subdirectories in the current working directory. However, if you

use the asterisk and there are obsolete AFS IDs on any directory’s ACL, the following error message appears forevery file in the directory:

fs: ’filename’: Not a directory

If there are obsolete AFS IDs on a directory, the command interpreter displays its cleaned ACL under the following header.

Access list for directory is now

If a directory’s ACL has no obsolete AFS IDs on it, the following message appears for each.

Access list for directory is fine.

14.8 How AFS Interprets the UNIX Mode Bits

Although AFS uses ACLs to protect file data rather than the mode bits that UFS uses, it does not ignore the mode bits entirely.When you issue the chmod command on an AFS file or directory, AFS changes the bits appropriately. To change a file’s modebits, you must have the AFS w (write) permission on the ACL of the file’s directory. To change a directory’s mode bits, you musthave the d (delete), i (insert), and l (lookup) permissions on its ACL.

AFS also uses the UNIX mode bits as follows:

• It uses the initial bit to determine the element’s type. This is the bit that appears first in the output from the ls -l command andshows the hyphen (-) for a file or the letter d for a directory.

• It does not use any of the mode bits on a directory.

• For a file, the first (owner) set of bits interacts with the ACL entries that apply to the file in the following way:

– If the first r mode bit is not set, no one (including the owner) can read the file, no matter what permissions they have on theACL. If the bit is set, users also need the r (read) and l permissions on the ACL of the file’s directory to read the file.

– If the first w mode bit is not set, no one (including the owner) can modify the file. If the w bit is set, users also need the wand l permissions on the ACL of the file’s directory to modify the file.

– There is no ACL permission directly corresponding to the x mode bit, but to execute a file stored in AFS, the user must alsohave the r and l permissions on the ACL of the file’s directory.


Chapter 15

Managing Administrative Privilege

This chapter explains how to enable system administrators and operators to perform privileged AFS operations.



Display members of system:administrators group pts membershipAdd user to system:administrators group pts adduserRemove user from system:administrators group pts removeuserDisplay ADMIN flag in Authentication Database entry kas examineSet or remove ADMIN flag on Authentication Database entry kas setfieldsDisplay users in UserList file bos listusersAdd user to UserList file bos adduserRemove user from UserList file bos removeuser

15.2 An Overview of Administrative Privilege

A fully privileged AFS system administrator has the following characteristics:

• Membership in the cell’s system:administrators group. See Administering the system:administrators Group.

• The ADMIN flag on his or her entry in the cell’s Authentication Database. See Granting Privilege for kas Commands: theADMIN Flag.

• Inclusion in the file /usr/afs/etc/UserList on the local disk of each AFS server machine in the cell. See Administering theUserList File.

This section describes the three privileges and explains why more than one privilege is necessary.

NoteNever grant any administrative privilege to the user anonymous, even when a server outage makes it impossible to mutuallyauthenticate. If you grant such privilege, then any user who can access a machine in your cell can issue privileged commands.The alternative solution is to put the affected server machine into no-authentication mode and use the -noauth flag availableon many commands to prevent mutual authentication attempts. For further discussion, see Managing Authentication andAuthorization Requirements.


15.2.1 The Reason for Separate Privileges

Often, a cell’s administrators require full administrative privileges to perform their jobs effectively. However, separating the threetypes of privilege makes it possible to grant only the minimum set of privileges that a given administrator needs to complete hisor her work.

The system:administrators group privilege is perhaps the most basic, and most frequently used during normal operation (whenall the servers are running normally). When the Protection Database is unavailable due to machine or server outage, it is notpossible to issue commands that require this type of privilege.

The ADMIN flag privilege is separate because of the extreme sensitivity of the information in the Authentication Database,especially the server encryption key in the afs entry. When the Authentication Database is unavailable due to machine or serveroutage, it is not possible to issue commands that require this type of privilege.

The ability to issue privileged bos and vos command is recorded in the /usr/afs/etc/UserList file on the local disk of each AFSserver machine rather than in a database, so that in case of serious server or network problems administrators can still log ontoserver machines and use those commands while solving the problem.

15.3 Administering the system:administrators Group

The first type of AFS administrative privilege is membership . Members of the system:administrators group in the ProtectionDatabase have the following privileges:

• Permission to issue all pts commands, which are used to administer the Protection Database. See Administering the ProtectionDatabase.

• Permission to issue the fs setvol and fs setquota commands, which set the space quota on volumes as described in Setting andDisplaying Volume Quota and Current Size.

• Implicit a (administer) and by default l (lookup) permissions on the access control list (ACL) on every directory in the cell’sAFS filespace. Members of the group can use the fs setacl command to grant themselves any other permissions they require,as described in Setting ACL Entries.

You can change the ACL permissions that the File Server on a given file server machine implicitly grants to the members ofthe system:administrators group for the data in volumes that it houses. When you issue the bos create command to createand start the fs process on the machine, include the -implicit argument to the fileserver initialization command. For syntaxdetails, see the fileserver reference page in the OpenAFS Administration Reference. You can grant additional permissions, orremove the l permission. However, the File Server always implicitly grants the a permission to members of the group, even ifyou set the value of the -implicit argument to none.

15.3.1 To display the members of the system:administrators group

1. Issue the pts membership command to display the system:administrators group’s list of members. Any user can issuethis command as long as the first privacy flag on the system:administrators group’s Protection Database entry is notchanged from the default value of uppercase S.


where m is the shortest acceptable abbreviation of membership.

15.3.2 To add users to the system:administrators group




2. Issue the pts adduser group to add one or more users.

% pts adduser -user <user name>+ -group system:administrators

where

ad Is the shortest acceptable abbreviation of adduser.

-user Names each user to add to the system:administrators group.

15.3.3 To remove users from the system:administrators group



2. Issue the pts removeuser command to remove one or more users.

% pts removeuser -user <user name>+ -group system:administrators

where

rem Is the shortest acceptable abbreviation of removeuser.

-user Names each user to remove from the system:administrators group.

15.4 Granting Privilege for kas Commands: the ADMIN Flag

Administrators who have the ADMIN flag on their Authentication Database entry can issue all kas commands, which enable themto administer the Authentication Database.

15.4.1 To check if the ADMIN flag is set

1. Issue the kas examine command to display an entry from the Authentication Database.

The Authentication Server performs its own authentication rather than accepting your existing AFS token. By default, itauthenticates your local (UFS) identity, which possibly does not correspond to an AFS-privileged administrator. Includethe -admin_username argument (here abbreviated to -admin) to name a user identity that has the ADMIN flag on itsAuthentication Database entry.

% kas examine <name of user> \-admin <admin principal to use for authentication>


where


name of user Names the entry to display.

-admin Names an administrative account with the ADMIN flag on its Authentication Database entry, such as the adminaccount. The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.

If the ADMIN flag is turned on, it appears on the first line, as in this example:

% kas e terry -admin adminAdministrator’s (admin) password: <admin_password>User data for terry (ADMIN)key version is 0, etc...


15.4.2 To set or remove the ADMIN flag

1. Issue the kas setfields command to turn on the ADMIN flag in an Authentication Database entry.


The following command appears on two lines only for legibility.

% kas setfields <name of user> {ADMIN | NOADMIN} \-admin <admin principal to use for authentication>


where

sf Is an alias for setfields (and setf is the shortest acceptable abbreviation).

name of user Names the entry for which to set or remove the ADMIN flag.

ADMIN | NOADMIN Sets or removes the ADMIN flag, respectively.

-admin Names an administrative account with the ADMIN flag on its Authentication Database entry, such as the adminaccount. The password prompt echoes it as admin_user. Enter the appropriate password as admin_password.

15.5 Administering the UserList File

Inclusion in the file /usr/afs/etc/UserList on the local disk of each AFS server machine enables an administrator to issue com-mands from the indicated suites.

• The bos commands enable the administrator to manage server processes and the server configuration files that define the cell’sdatabase server machines, server encryption keys, and privileged users. See Administering Server Machines and Monitoringand Controlling Server Processes.

• The vos commands enable the administrator to manage volumes and the Volume Location Database (VLDB). See ManagingVolumes.

• The backup commands enable the administrator to use the AFS Backup System to copy data to permanent storage. SeeConfiguring the AFS Backup System and Backing Up and Restoring AFS Data.

Although each AFS server machine maintains a separate copy of the file on its local disk, it is conventional to keep all copies thesame. It can be confusing for an administrator to have the privilege on some machines but not others.

If your cell uses the Update Server to distribute the contents of the system control machine’s /usr/afs/etc directory, then edit onlythe copy of the UserList file stored on the system control machine. If you have forgotten which machine is the system controlmachine, see The Four Roles for File Server Machines.

To avoid making formatting errors that can result in performance problems, never edit the UserList file directly. Instead, use thebos adduser or bos removeuser commands as described in this section.

15.5.1 To display the users in the UserList file

1. Issue the bos listusers command to display the contents of the /usr/afs/etc/UserList file.


where

listu Is the shortest acceptable abbreviation of listusers.

machine name Names an AFS server machine. In the normal case, any machine is acceptable because the file is the sameon all of them.


15.5.2 To add users to the UserList file

1. Verify you are listed in the /usr/afs/etc/UserList file. If not, you must have a qualified administrator add you before youcan add entries to it yourself. If necessary, issue the bos listusers command, which is fully described in To display theusers in the UserList file.


2. Issue the bos adduser command to add one or more users to the UserList file.

% bos adduser <machine name> <user names>+

where

addu Is the shortest acceptable abbreviation of adduser.

machine name Names the system control machine if you use the Update Server to distribute the contents of the /usr/af-s/etc directory. By default, it can take up to five minutes for the Update Server to distribute the changes, so newlyadded users must wait that long before attempting to issue privileged commands.

user names Specifies the username of each administrator to add to the UserList file.

15.5.3 To remove users from the UserList file

1. Verify you are listed in the /usr/afs/etc/UserList file. If not, you must have a qualified administrator add you before youcan remove entries from it yourself. If necessary, issue the bos listusers command, which is fully described in To displaythe users in the UserList file.


2. Issue the bos removeuser command to remove one or more users from the UserList file.

% bos removeuser <machine name> <user names>+

where

removeu Is the shortest acceptable abbreviation of removeuser.

machine name Names the system control machine if you use the Update Server to distribute the contents of the /usr/af-s/etc directory. By default, it can take up to five minutes for the Update Server to distribute the change, so newlyremoved users can continue to issue privileged commands during that time.

user names Specifies the username of each administrator to add to the UserList file.


Part V

Managing the NFS/AFS Translator


The NFS(R)/AFS(R) Translator enables users working on NFS client machines to access, create and remove files stored in AFS.This chapter assumes familiarity with both NFS and AFS.

.1 Summary of Instructions


Mount directory on translator machine mountExamine value of @sys variable fs sysnameEnable/disable reexport of AFS, set other parameters fs exportafsAssign AFS tokens to user on NFS client machine knfs

.2 Overview

The NFS/AFS Translator enables users on NFS client machines to access the AFS filespace as if they are working on an AFSclient machine, which facilitates collaboration with other AFS users.

An NFS/AFS translator machine (or simply ltranslator machine) is a machine configured as both an AFS client and an NFSserver:

• Its AFS client functionality enables it to access the AFS filespace. The Cache Manager requests and caches files from AFSfile server machines, and can even maintain tokens for NFS users, if you have made the configuration changes that enable NFSusers to authenticate with AFS.

• Its NFS server functionality makes it possible for the translator machine to export the AFS filespace to NFS client machines.When a user on an NFS client machine mounts the translator machine’s /afs directory (or one of its subdirectories, if that featureis enabled), access to AFS is immediate and transparent. The NFS client machine does not need to run any AFS software.

.2.1 Enabling Unauthenticated or Authenticated AFS Access

By configuring the translation environment appropriately, you can provide either unauthenticated or authenticated access to AFSfrom NFS client machines. The sections of this chapter on configuring translator machines, NFS client machines, and AFS useraccounts explain how to configure the translation environment appropriately.

• If you configure the environment for unauthenticated access, the AFS File Server considers the NFS users to be the useranonymous. They can access only those AFS files and directories for which the access control list (ACL) extends the requiredpermissions to the system:anyuser group. They can issue only those AFS commands that do not require privilege, and thenonly if their NFS client machine is a system type for which AFS binaries are available and accessible by the system:anyusergroup. Such users presumably do not have AFS accounts.

• If you configure the environment for authenticated access, you must create entries in the AFS Authentication and ProtectionDatabases for the NFS users. The authentication procedure they use depends on whether the NFS client machine is a supportedsystem type (one for which AFS binaries are available):

– If AFS binaries are available for the NFS client machine, NFS users can issue the klog command on the NFS client machine.They can access the filespace and issue AFS commands to the same extent as authenticated users working on AFS clientmachines.

– If AFS binaries are not available for the NFS client machine, NFS users must establish a connection with the translatormachine (using the telnet utility, for example) and then issue the klog and knfs commands on the translator machine tomake its Cache Manager use the tokens correctly while users work on the NFS client. They can access the AFS filespaceas authenticated users, but cannot issue AFS commands. For instructions, see Authenticating on Unsupported NFS ClientMachines.


.2.2 Setting the AFSSERVER and AFSCONF Environment Variables

If you wish to enable your NFS users to issue AFS commands, you must define the AFSSERVER and AFSCONF environmentvariables in their command shell. This section explains the variables’ function and outlines the various methods for setting them.

Issuing AFS commands also requires that the NFS client machine is a supported system type (one for which AFS binaries areavailable and accessible). Users working on NFS client machines of unsupported system types can access AFS as authenti-cated users, but they cannot issue AFS commands. It is not necessary to define the AFSSERVER and AFSCONF variablesfor such users. For instructions on using the knfs command to obtain authenticated access on unsupported system types, seeAuthenticating on Unsupported NFS Client Machines.

.2.2.1 The AFSSERVER Variable

The AFSSERVER variable designates the AFS client machine that performs two functions for NFS clients:

• It acts as the NFS client’s remote executor by executing AFS-specific system calls on its behalf, such as those invoked by theklog and tokens commands and by many commands in the AFS suites.

• Its stores the tokens that NFS users obtain when they authenticate with AFS. This implies that the remote executor machineand the translator machine must be the same if the user needs authenticated access to AFS.

The choice of remote executor most directly affects commands that display or change Cache Manager configuration, such asthe fs getcacheparms, fs getcellstatus, and fs setcell commands. When issued on an NFS client, these commands affect theCache Manager on the designated remote executor machine. (Note, however, that several such commands require the issuer tobe logged into the remote executor’s local file system as the local superuser root. The ability of NFS client users to log in as rootis controlled by NFS, not by the NFS/AFS Translator, so setting the remote executor properly does not necessarily enable userson the NFS client to issue such commands.)

The choice of remote executor is also relevant for AFS commands that do not concern Cache Manager configuration but ratherhave the same result on every machine, such as the fs commands that display or set ACLs and volume quota. These commandstake an AFS path as one of their arguments. If the Cache Manager on the remote executor machine mounts the AFS filespace atthe /afs directory, as is conventional for AFS clients, then the pathname specified on the NFS client must begin with the string/afs for the Cache Manager to understand it. This implies that the remote executor must be the NFS client’s primary translatormachine (the one whose /afs directory is mounted at /afs on the NFS client).

.2.2.2 The AFSCONF Variable

The AFSCONF environment variable names the directory that houses the ThisCell and CellServDB files to use when runningAFS commands issued on the NFS client machine. As on an AFS client, these files determine the default cell for commandexecution.

For predictable performance, it is best that the files in the directory named by the AFSCONF variable match those in the /us-r/vice/etc directory on the translator machine. If your cell has an AFS directory that serves as the central update source for filesin the /usr/vice/etc directory, it is simplest to set the AFSCONF variable to refer to it. In the conventional configuration, thisdirectory is called /afs/cellname/common/etc.

.2.2.3 Setting Values for the Variables

To learn the values of the AFSSERVER and AFSCONF variables, AFS command interpreters consult the following three sourcesin sequence:

1. The current command shell’s environment variable definitions

2. The .AFSSERVER or .AFSCONF file in the issuer’s home directory

3. The /.AFSSERVER or /.AFSCONF file in the NFS client machine’s root (/ ) directory. If the client machine is diskless,its root directory can reside on an NFS server machine.


(Actually, before consulting these sources, the NFS client looks for the CellServDB and ThisCell files in its own /usr/vice/etcdirectory. If the directory exists, the NFS client does not use the value of the AFSCONF variable. However, the /usr/vice/etcdirectory usually exists only on AFS clients, not NFS clients.)

As previously detailed, correct performance generally requires that the remote executor machine be the NFS client’s primarytranslator machine (the one whose /afs directory is mounted at the /afs directory on the NFS client). The requirement holds forall users accessing AFS from the NFS client, so it is usually simplest to create the .AFSSERVER file in the NFS client’s rootdirectory. The main reason to create the file in a user’s home directory or to set the AFSSERVER environment variable in thecurrent command shell is that the user needs to switch to a different translator machine, perhaps because the original one hasbecome inaccessible.

Similarly, it generally makes sense to create the .AFSCONF file in the NFS client’s root directory. Creating it in the user’s homedirectory or setting the AFSCONF environment variable in the current command shell is useful mostly when there is a reason tospecify a different set of database server machines for the cell, perhaps in a testing situation.

.2.3 Delayed Writes for Files Saved on NFS Client Machines

When an application running on an AFS client machine issues the close or fsync system call on a file, the Cache Manager bydefault performs a synchronous write of the data to the File Server. (For further discussion, see AFS Implements Save on Closeand Enabling Asynchronous Writes.)

To avoid degrading performance for the AFS users working on a translator machine, AFS does not perform synchronous writesfor applications running on the translator machine’s NFS clients. Instead, one of the Cache Manager daemons (the maintenancedaemon) checks every 60 seconds for chunks in the cache that contain data saved on NFS clients, and writes their contents to theFile Server. This does not guarantee that data saved on NFS clients is written to the File Server within 60 seconds, but only thatthe maintenance daemon checks for and begins the write of data at that interval.

Furthermore, AFS always ignores the fsync system call as issued on an NFS client. The call requires an immediate and possiblytime-consuming response from the File Server, which potentially causes delays for other AFS clients of the File Server. NFSversion 3 automatically issues the fsync system call directly after the close call, but the Cache Manager ignores it and handlesthe operation just like a regular close.

The delayed write mechanism means that there is usually a delay between the time when an NFS application issues the close orfsync system call on a file and the time when the changes are recorded at the File Server, which is when they become visible tousers working on other AFS client machines (either directly or on its NFS clients). The delay is likely to be longer than for filessaved by users working directly on an AFS client machine.

The exact amount of delay is difficult to predict. The NFS protocol itself allows a standard delay before saved data must betransferred from the NFS client to the NFS server (the translator machine). The modified data remains in the translator machine’sAFS client cache until the maintenance daemon’s next scheduled check for such data, and it takes additional time to transferthe data to the File Server. The maintenance daemon uses a single thread, so there can be additional delay if it takes more than60 seconds to write out all of the modified NFS data. That is, if the maintenance daemon is still writing data at the time of thenext scheduled check, it cannot notice any additional modified data until the scheduled time after it completes the long writeoperation.

The Cache Manager’s response to the write system call is the same whether it is issued on an AFS client machine or on an NFSclient of a translator machine: it records the modifications in the local AFS client cache only.

.3 Configuring NFS/AFS Translator Machines

To act as an NFS/AFS translator machine, a machine must configured as follows:

• It must be an AFS client. Many system types supported as AFS clients can be translator machines. To learn about possiblerestrictions in a specific release of AFS, see the OpenAFS Release Notes.

• It must be an NFS server. The appropriate number of NFS server daemons (nfsd and others) depends on the anticipated NFSclient load.

• It must export the local directory on which the AFS filespace is mounted, /afs by convention.


If users on a translator machine’s NFS clients are to issue AFS commands, the translator machine must also meet the requirementsdiscussed in Configuring the Translator Machine to Accept AFS Commands.

.3.1 Loading NFS and AFS Kernel Extensions

The AFS distribution for system types that can act as NFS/AFS Translator machines usually includes two versions of the AFSkernel extensions file, one for machines where the kernel supports NFS server functionality, and one for machines not using NFS(the latter AFS kernel extensions file generally has the string nonfs in its name). A translator machine must use the NFS-enabledversion of the AFS extensions file. On some system types, you select the appropriate file by moving it to a certain location,whereas on other system types you set a variable that results in automatic selection of the correct file. See the instructions in theOpenAFS Quick Beginnings for incorporating AFS into the kernel on each system type.

On many system types, NFS is included in the kernel by default, so it is not necessary to load NFS kernel extensions explicitly.On system types where you must load NFS extensions, then in general you must load them before loading the AFS kernelextensions. The OpenAFS Quick Beginnings describes how to incorporate the AFS initialization script into a machine’s startupsequence so that it is ordered correctly with respect to the script that handles NFS.

In addition, the AFS extensions must be loaded into the kernel before the afsd command runs. The AFS initialization scriptincluded in the AFS distribution correctly orders the loading and afsd commands.

.3.2 Configuring the Translator Machine to Accept AFS Commands

For users working on a translator machine’s NFS clients to issue AFS commands, the -rmtsys flag must be included on the afsdcommand which initializes the translator machine’s Cache Manager. The flag starts an additional daemon (the remote executordaemon), which executes AFS-specific system calls on behalf of NFS clients. For a discussion of the implications of NFS usersissuing AFS commands, see Setting the AFSSERVER and AFSCONF Environment Variables.

The instructions in the OpenAFS Quick Beginnings for configuring the Cache Manager explain how to add options such as the-rmtsys flag to the afsd command in the AFS initialization script. On many system types, it is simplest to list the flag on theline in the script that defines the OPTIONS variable. The remote executor daemon does not consume many resources, so it issimplest to add it to the afsd command on every translator machine, even if not all users on the machine’s NFS clients issue AFScommands.

.3.3 Controlling Optional Translator Features

After an AFS client machine is configured as a translator machine, it by default exports the AFS filespace to NFS clients. Youcan disable and reenable translator functionality by using the fs exportafs command’s -start argument. The command’s otherarguments control other aspects of translator behavior.

• The -convert argument controls whether the second and third (group and other) sets of UNIX mode bits on an AFS file ordirectory being exported to NFS are set to match the first (owner) mode bits. By default, the mode bits are set to match.

Unlike AFS, NFS uses all three sets of mode bits when determining whether a user can read or write a file, even one stored inAFS. Some AFS files possibly do not have any group and other mode bits turned on, because AFS uses only the owner bits incombination with the ACL on the file’s directory. If only the owner mode bits are set, NFS allows only the file’s owner of thefile to read or write it. Setting the -convert argument to the value on enables other users to access the file in the same manneras the owner. Setting the value off preserves the mode bits set on the file as stored in AFS.

• The -uidcheck argument controls whether tokens can be assigned to an NFS user whose local UID on the NFS client machinediffers from the local UID associated with the tokens on the translator machine. By default, this is possible.

If you turn on UID checking by setting the value on, then tokens can be assigned only to an NFS user whose local UID matchesthe local UID of the process on the translator machine that is assigning the tokens. One consequence is that there is no point inincluding the -id argument to the knfs command: the only acceptable value is the local UID of the command’s issuer, which isthe value used when the -id argument is omitted. Requiring matching UIDs in this way is effective only when users have thesame local UID on the translator machine as on NFS client machines. In that case, it guarantees that users assign their tokensonly to their own NFS sessions. For instructions, see Authenticating on Unsupported NFS Client Machines.


NoteTurning on UID checking also prevents users on supported NFS clients from using the klog command to authenticate on theNFS client directly. They must authenticated and use the knfs command on the translator machine instead. This is becauseafter the klog command interpreter obtains the token on the NFS client, it passes it to the Cache Manager’s remote executordaemon, which makes the system call that stores the token in a credential structure on the translator machine. The remoteexecutor generally runs as the local superuser root, so in most cases its local UID (normally zero) does not match the localUID of the user who issued the klog command on the NFS client machine.On the other hand, although using the knfs command instead of the klog command is possibly less convenient for users, iteliminates a security exposure: the klog command interpreter passes the token across the network to the remote executordaemon in clear text mode.

If you disable UID checking by assigning the value off , the issuer of the knfs command can assign tokens to a user who hasa different local UID on the NFS client machine, such as the local superuser root. Indeed, more than one issuer of the knfscommand can assign tokens to the same user on the NFS client machine. Each time a different user issues the knfs commandwith the same value for the -id argument, that user’s tokens overwrite the existing ones. This can result in unpredictable accessfor the NFS user.

• The -submounts argument controls whether users on the NFS client can mount AFS directories other than the top-level /afsdirectory. By default, the translator does not permit these submounts.

Submounts can be useful in a couple of circumstances. If, for example, NFS users need to access their own AFS homedirectories only, then creating a submount to it eliminates the need for them to know or enter the complete path. Similarly, youcan use a submount to prevent users from accessing parts of the filespace higher in the AFS hierarchy than the submount.

.3.4 To configure an NFS/AFS translator machine

The following instructions configure the translator to enable users to issue AFS commands. Omit Step 6 if you do not want toenable this functionality.



2. Configure the NFS/AFS translator machine as an NFS server, if it is not already. Follow the instructions provided by yourNFS supplier. The appropriate number of NFS server daemons (such as nfsd) depends on the number of potential NFSclients.

3. Configure the NFS/AFS translator machine as an AFS client, if it is not already. For the most predictable performance, thetranslator machine’s local copies of the /usr/vice/etc/CellServDB and /usr/vice/etc/ThisCell files must be the same as onother client machines in the cell.

4. Modify the file that controls mounting of directories on the machine by remote NFS clients.

• On systems that use the /etc/exports file, edit it to enable export of the /afs directory to NFS clients. You can list thenames of specific NFS client machines if you want to provide access only to certain users. For a description of the file’sformat, see the NFS manual page for exports(5).The following example enables any NFS client machine to mount the machine’s /afs, /usr, and /usr2 directories:

/afs/usr/usr2

• On system types that use the share command, edit the /etc/dfs/dfstab file or equivalent to include share instructionsthat enable remote mounts of the /afs directory. Most distributions include the binary as /usr/sbin/share. The followingexample commands enable remote mounts of the root ( / ) and /afs directories. To verify the correct syntax, consult themanual page for the share command.


share -F nfs -o rw -d "root" /share -F nfs -o rw -d "afs gateway" /afs

5. Edit the machine’s AFS initialization file to invoke the standard UNIX exportfs command after the afsd program runs. Onsome system types, the modifications you made in Step 4 are not enough to enable exporting the AFS filespace via the /afsdirectory, because the resulting configuration changes are made before the afsd program runs during machine initialization.Only after the afsd program runs does the /afs directory become the mount point for the entire AFS filespace; before, it isa local directory like any other.

6. Modify the afsd command in the AFS initialization file to include the -rmtsys flag.

For system types other than IRIX, the instructions in the OpenAFS Quick Beginnings for configuring the Cache Managerexplain how to add the -rmtsys flag, for example by adding it to the line in the script that defines the value for the OPTIONSvariable.

On IRIX systems, the AFS initialization script automatically adds the -rmtsys flag if you have activated the afsxnfsconfiguration variable as instructed in the OpenAFS Quick Beginnings instructions for incorporating AFS extensions intothe kernel. If the variable is not already activated, issue the following command.

# /etc/chkconfig -f afsxnfs on

7. (Optional) Depending on the number of NFS clients you expect this machine to serve, it can be beneficial to add otherarguments to the afsd command in the machine’s initialization file, such as the -daemons argument to set the numberof background daemons. See Administering Client Machines and the Cache Manager and the afsd reference page in theOpenAFS Administration Reference.

8. Reboot the machine. On many system types, the appropriate command is shutdown; consult your operating systemadministrator’s guide.

# shutdown appropriate_options

.3.5 To disable or enable Translator functionality, or set optional features



2. Issue the fs exportafs command.

# fs exportafs nfs [-start {on | off}} ] [-convert {on | off}][-uidcheck {on | off}] [-submounts {on | off}]

-start Disables translator functionality if the value is off or reenables it if the value is on. Omit this argument to displaythe current setting of all parameters set by this command.

-convert Controls the setting of the second and third (group and other) sets of UNIX mode bits on AFS files and direc-tories as exported to NFS clients If the value is on, they are set to match the owner mode bits. If the value is off, thebits are not changed. If this argument is omitted, the default value is on.

-uidcheck Controls whether issuers of the knfs command can specify a value for its -id argument that does not matchtheir AFS UID:• If the value is on, the value of the -id argument must match the issuer’s local UID.• If the value is off, the issuer of the knfs command can use the -id argument to assign tokens to a user who has a

different local UID on the NFS client machine, such as the local superuser root.If this argument is omitted, the default value is off.

-submounts Controls whether the translator services an NFS mount of any directory in the AFS filespace other than thetop-level /afs directory. If the value is on, such submounts are allowed. If the value is off, only mounts of the /afsdirectory are allowed. If this argument is omitted, the default value is off.


.4 Configuring NFS Client Machines

Any NFS client machine that meets the following requirements can access files in AFS via the NFS/AFS Translator. It does notneed to be configured as an AFS client machine.

• It must NFS-mount a translator machine’s /afs directory on a local directory, which by convention is also called /afs. Thefollowing instructions explain how to add the mount command to the NFS client machine’s /etc/fstab file or equivalent.

The directory on which an NFS client mounts the translator’s machine’s /afs directory can be called something other than /afs.For instance, to make it easy to switch to another translator machine if the original one becomes inaccessible, you can mountmore than one translator machine’s /afs directory. Name the mount /afs for the translator machine that you normally use, anduse a different name the mount to each alternate translator machine.

Mounting the AFS filespace on a directory other than /afs introduces another requirement, however: when issuing a commandthat takes an AFS pathname argument, you must specify the full pathname, starting with /afs, rather than a relative pathname.Suppose, for example, that a translator machine’s AFS filespace is mounted at /afs2 on an NFS client machine and you issuethe following command to display the ACL on the current working directory, which is in AFS:

% fs listacl .

The fs command interpreter on the NFS client must construct a full pathname before passing the request to the Cache Manageron the translator machine. The AFS filespace is mounted at /afs2, so the full pathname starts with that string. However, theCache Manager on the translator cannot find a directory called /afs2, because its mount of the AFS filespace is called /afs. Thecommand fails. To prevent the failure, provide the file’s complete pathname, starting with the string /afs.

• It must run an appropriate number of NFS client biod daemons, which improve performance by handling pre-reading anddelayed writing. Most NFS vendors recommend running four such daemons, and most NFS initialization scripts start themautomatically. Consult your NFS documentation.

To enable users to issue AFS commands, the NFS client machine must also be a supported system type (one for which AFSbinaries are available) and able to access the AFS command binaries. The OpenAFS Release Notes list the supported systemtypes in each release.

In addition, the AFSSERVER and AFSCONF environment variables must be set appropriately, as discussed in Setting theAFSSERVER and AFSCONF Environment Variables.

.4.1 To configure an NFS client machine to access AFS

NoteThe following instructions enable NFS users to issue AFS commands. Omit Step 5 and Step 6 if you do not want to enable thisfunctionality.



2. Configure the machine as an NFS client machine, if it is not already. Follow the instructions provided by your NFS vendor.The number of NFS client (biod) daemons needs to be appropriate for the expected load on this machine. The usualrecommended number is four.

3. Create a directory called /afs on the machine, if one does not already exist, to act as the mount point for the translatormachine’s /afs directory. It is acceptable to use other names, but doing so introduces the limitation discussed in theintroduction to this section.

# mkdir /afs


4. Modify the machine’s file systems registry file (/etc/fstab or equivalent) to include a command that mounts a translatormachine’s /afs directory. To verify the correct syntax of the mount command, see the operating system’s mount(5) manualpage. The following example includes options that are appropriate on many system types.

mount -o hard,intr,timeo=300 translator_machine:/afs /afs

where

hard Indicates that the NFS client retries NFS requests until the NFS server (translator machine) responds. When usingthe translator, file operations possibly take longer than with NFS alone, because they must also pass through the AFSCache Manager. With a soft mount, a delayed response from the translator machine can cause the request to abort.Many NFS versions use hard mounts by default; if your version does not, it is best to add this option.

intr Enables the user to use a keyboard interrupt signal (such as <Ctrl-c>) to break the mount when the translatormachine is inaccessible. Include this option only if the hard option is used, in which case the connection does notautomatically break off when a translator machine goes down.

timeo Sets the maximum time (in tenths of seconds) the translator can take to respond to the NFS client’s request beforethe client considers the request timed out. With a hard mount, setting this option to a high number like 300 reduces thenumber of error messages like the following, which are generated when the translator does not respond immediately.

NFS server translator is not responding, still trying

With a soft mount, it reduces the number of actual errors returned on timed-out requests.

translator_machine Specifies the fully-qualified hostname of the translator machine whose /afs directory is to bemounted on the client machine’s /afs directory.

NoteTo mount the translator machine’s /afs directory onto a directory on the NFS client other than /afs, substitute the alternatedirectory name for the second instance of /afs in the mount command.

5. (Optional) If appropriate, create the /.AFSSERVER file to set the AFSSERVER environment variable for all of themachine’s users. For a discussion, see Setting the AFSSERVER and AFSCONF Environment Variables. Place a singleline in the file, specifying the fully-qualified hostname of the translator machine that is to serve as the remote executor. Toenable users to issue commands that handle tokens, it must be the machine named as translator_machine in Step 4.

6. (Optional) If appropriate, create the /.AFSCONF file to set the AFSCONF environment variable for all of the machine’susers. For a discussion, see Setting the AFSSERVER and AFSCONF Environment Variables. Place a single line in the file,specifying the name of the directory where the CellServDB and ThisCell files reside. If you use a central update sourcefor these files (by convention, /afs/cellname/common/etc), name it here.

.5 Configuring User Accounts

There are no requirements for NFS users to access AFS as unauthenticated users. To take advantage of more AFS functionality,however, they must meet the indicated requirements.

• To access AFS as authenticated users, they must of course authenticate with AFS, which requires an entry in the Protectionand Authentication Databases.

• To create and store files, they need the required ACL permissions. If you are providing a home directory for storage of personalfiles, it is conventional to create a dedicated volume and mount it at the user’s home directory location in the AFS filespace.

• To issue AFS commands, they must meet several additional requirements:

– They must be working on an NFS client machine of a supported system type and from which the AFS command binariesare accessible.


– Their command shell must define values for the AFSSERVER and AFSCONF environment variables, as described inSetting the AFSSERVER and AFSCONF Environment Variables. It is often simplest to define the variables by creating/.AFSSERVER and /.AFSCONF file in the NFS client machine’s root directory, but you can also either set the variables ineach user’s shell initialization file (.cshrc or equivalent), or create files called .AFSSERVER and .AFSCONF in each user’shome directory.

– They must have an entry in the AFS Protection and Authentication Databases, so that they can authenticate if the commandrequires AFS privilege. Other commands instead require assuming the local root identity on the translator machine; forfurther discussion, see The AFSSERVER Variable.

– Their PATH environment variable must include the pathname to the appropriate AFS binaries. If a user works on NFS clientmachines of different system types, include the @sys variable in the pathname rather than an actual system type name.

.5.1 To configure a user account for issuing AFS commands

1. Create entries for the user in the Protection and Authentication Databases, or create a complete AFS account. See theinstructions for account creation in Creating and Deleting User Accounts with the uss Command Suite or AdministeringUser Accounts.

2. Modify the user’s PATH environment variable to include the pathname of AFS binaries, such as /afs/cellname/sysname/usr/afsws/bin.If the user works on NFS client machines of different system types, considering replacing the specific sysname value withthe @sys variable. The PATH variable is commonly defined in a login or shell initialization file (such as the .login or.cshrc file).

3. (Optional) Set the AFSSERVER and AFSCONF environment variables if appropriate. This is required if the NFS clientmachines on which the user works do not have the /.AFSSERVER and /.AFSCONF files in their root directories, or ifyou want user-specific values to override those settings.

Either define the variables in the user’s login or shell initialization file, or create the files .AFSSERVER and .AFSCONFfiles in the user’s home directory.

For the AFSSERVER variable, specify the fully-qualified hostname of the translator machine that is to serve as the remoteexecutor. For the AFSCONF variable, specify the name of the directory where the CellServDB and ThisCell files reside.If you use a central update source for these files (by convention, /afs/cellname/common/etc), name it here.

4. If the pathname you defined in Step 2 includes the @sys variable, instruct users to check that their system name is definedcorrectly before they issue AFS commands. They issue the following command:

% fs sysname

.6 Authenticating on Unsupported NFS Client Machines

The knfs command enables users to authenticate with AFS when they are working on NFS clients of unsupported system types(those for which AFS binaries are not available). This enables such users to access the AFS file tree to the same extent as anyother AFS user. They cannot, however, issue AFS commands, which is possible only on NFS client machines of supportedsystem types.

To authenticate on an unsupported system type, establish a connection to the translator machine (using a facility such as telnet),and issue the klog command to obtain tokens for all the cells you wish to contact during the upcoming NFS session. Then issuethe knfs command, which stores the tokens in a credential structure associated with your NFS session. The Cache Manager usesthe tokens when performing AFS access requests that originate from your NFS session.

More specifically, the credential structure is identified by a process authentication group (PAG) number associated with a par-ticular local UID on a specific NFS client machine. By default, the NFS UID recorded in the credential structure is the sameas your local UID on the translator machine. You can include the -id argument to specify an alternate NFS UID, unless thetranslator machine’s administrator has used the fs exportafs command’s -uidcheck argument to enable UID checking. In thatcase, the value of the -id argument must match your local UID on the translator machine (so there is not point to including the-id argument). Enforcing matching UIDs prevents someone else from placing their tokens in your credential structure, eitheraccidentally or on purpose. However, it means that your cell’s administrators must set your local UID on the NFS client to match


your local UID on the translator machine. It also makes it impossible to authenticate by issuing the klog command on supportedNFS clients, meaning that all NFS users must use the knfs command. See Controlling Optional Translator Features.

After issuing the knfs command, you can begin working on the NFS client with authenticated access to AFS. When you arefinished working, it is a good policy to destroy your tokens by issuing the knfs command on the translator machine again, thistime with the -unlog flag. This is simpler if you have left the connection to the translator machine open, but you can alwaysestablish a new connection if you closed the original one.

If your NFS client machine is a supported system type and you wish to issue AFS commands on it, include the -sysnameargument to the knfs command. The remote executor daemon on the translator machine substitutes its value for the @sysvariable in pathnames when executing AFS commands that you issue on the NFS client machine. If your PATH environmentvariable uses the @sys variable in the pathnames for directories that house AFS binaries (as recommended), then setting thisargument enables the remote executor daemon to access the AFS binaries appropriate for your NFS client machine even if itssystem type differs from the translator machine’s.

If you do not issue the knfs command (or the klog command on the NFS client machine itself, if it is a supported systemtype), then you are not authenticated with AFS. For a description of unauthenticated access, see Enabling Unauthenticated orAuthenticated AFS Access.

.6.1 To authenticate using the knfs command

1. Log on to the relevant translator machine, either on the console or remotely by using a program such as telnet.

2. Obtain tokens for every cell you wish to access while working on the NFS client. AFS-modified login utilities acquire atoken for the translator machine’s local cell by default; use klog command to obtain tokens for other cells if desired.

3. Issue the knfs command to create a credential structure in the translator machine’s kernel memory for storing the tokensobtained in the previous step. Include the -id argument to associate the structure with a UID on the NFS client that differsfrom your local UID on the translator machine. This is possible unless the translator machine’s administrator has enabledUID checking on the translator machine; see Controlling Optional Translator Features. If the NFS client machine is asupported system type and you wish to issue AFS commands on it, include the -sysname argument to specify its systemtype.

% knfs -host <host name> [-id <user ID (decimal)>] \[-sysname <host’s ’@sys’ value>]

where

-host Specifies the fully-qualified hostname of the NFS client machine on which you are working.

-id Specifies a local UID number on the NFS client machine with which to associate the tokens, if different from yourlocal UID on the translator machine. If this argument is omitted, the tokens are associated with an NFS UID thatmatches your local UID on the translator machine. In both cases, the NFS client software marks your AFS accessrequests with the NFS UID when it forwards them to the Cache Manager on the translator machine.

-sysname Specifies the value that the local machine’s remote executor daemon substitutes for the @sys variable in path-names when executing AFS commands issued on the NFS client machine (which must be a supported system type).

The following error message indicates that the translator machine’s administrator has enabled UID checking and you haveprovided a value that differs from your local UID on the translator machine.

knfs: Translator in ’passwd sync’ mode; remote uid must be the same as local uid

4. Close the connection to the translator machine (if desired) and work on the NFS client machine.

.6.2 To display tokens using the knfs command

1. Log on to the relevant translator machine, either on the console or remotely by using a program such as telnet.


2. Issue the knfs command with the -tokens flag to display the tokens associated with either the NFS UID that matches yourlocal UID on the translator machine or the NFS UID specified by the -id argument.

% knfs -host <host name> [-id <user ID (decimal)>] -tokens

where

-host Specifies the fully-qualified hostname of the NFS client machine on which you are working.

-id Specifies the local UID on the NFS client machine for which to display tokens, if different from your local UID on thetranslator machine. If this argument is omitted, the tokens are for the NFS UID that matches your local UID on thetranslator machine.

-tokens Displays the tokens.

3. Close the connection to the translator machine if desired.

.6.3 To discard tokens using the knfs command

1. If you closed your connection to the translator machine after issuing the knfs command, reopen it.

2. Issue the knfs command with the -unlog flag.

% knfs -host <host name> [-id <user ID (decimal)>] -unlog

where

-host Specifies the fully-qualified hostname of the NFS client machine you are working on.

-id Specifies the local UID number on the NFS client machine for which to discard the associated tokens, if different fromyour local UID on the translator machine. If this argument is omitted, the tokens associated with an NFS UID thatmatches your local UID on the translator machine are discarded.

-unlog Discards the tokens.

3. If desired, close the connection to the translator machine.


Part VI

Using AFS Commands


This section describes the components of AFS commands and how to make entering commands more efficient by using shortened

forms. It has the following sections:

AFS Command SyntaxRules for Entering AFS CommandsRules for Using Abbreviations and AliasesDisplaying Online Help for AFS Commands

.7 AFS Command Syntax

AFS commands that belong to suites have the following structure:

command_suite operation_code-switch <value>[+] [-flag]

.7.1 Command Names

Together, the command_suite and operation_code make up the command name.

The command_suite specifies the group of related commands to which the command belongs, and indicates which commandinterpreter and server process perform the command. AFS has several command suites, including bos, fs, kas, pts, scout, ussand vos. Some of these suites have an interactive mode in which the issuer omits the command_suite portion of the commandname.

The operation_code tells the command interpreter and server process which action to perform. Most command suites includeseveral operation codes. The OpenAFS Administration Reference describes each operation code in detail, and the OpenAFSAdministration Guide describes how to use them in the context of performing administrative tasks.

Several AFS commands do not belong to a suite and so their names do not have a command_suite portion. Their structure isotherwise similar to the commands in the suites.

.7.2 Options

The term option refers to both arguments and flags, which are described in the following sections.

.7.3 Arguments

One or more arguments can follow the command name. Arguments specify the entities on which to act while performing thecommand (for example, which server machine, server process, or file). To minimize the potential for error, provide a command’sarguments in the order prescribed in its syntax definition.

Each argument has two parts, which appear in the indicated order:

• The switch specifies the argument’s type and is preceded by a hyphen ( - ). For instance, the switch -server usually indicatesthat the argument names a server machine. Switches can often be omitted, subject to the rules outlined in Conditions forOmitting Switches.

• The value names a particular entity of the type specified by the preceding switch. For example, the proper value for a -serverswitch is a server machine name like fs3.example.com. Unlike switches (which have a required form), values vary dependingon what the issuer wants to accomplish. Values appear surrounded by angle brackets (< >) in command descriptions and theonline help to show that they are user-supplied variable information.

Some arguments accept multiple values, as indicated by trailing plus sign ( + ) in the command descriptions and online help.How many of a command’s arguments take multiple values, and their ordering with respect to other arguments, determine whenit is acceptable to omit switches. See Conditions for Omitting Switches.

Some commands have optional as well as required arguments; the command descriptions and online help show optional argu-ments in square brackets ([ ]).


.7.4 Flags

Some commands have one or more flags, which specify the manner in which the command interpreter and server process performthe command, or what kind of output it produces. Flags are preceded by hyphens like switches, but they take no values. Althoughthe command descriptions and online help generally list a command’s flags after its arguments, there is no prescribed order forflags. They can appear anywhere on the command line following the operation code, except in between the parts of an argument.Flags are always optional.

.7.5 An Example Command

The following example illustrates the different parts of a command that belongs to an AFS command suite.

% bos getdate -server fs1.example.com -file ptserver kaserver

where

• bos is the command suite. The BOS Server executes most of the commands in this suite.

• getdate is the operation code. It tells the BOS Server on the specified server machine (in this case fs1.example.com) to reportthe modification dates of binary files in the local /usr/afs/bin directory.

• -server fs1.example.com is one argument, with -server as the switch and fs1.example.com as the value. This argumentspecifies the server machine on which BOS Server is to collect and report binary dates.

• -file ptserver kaserver is an argument that takes multiple values. The switch is -file and the values are ptserver and kaserver.This argument tells the BOS Server to report the modification dates on the files /usr/afs/bin/kaserver and /usr/afs/bin/pt-server.

.7.6 Rules for Entering AFS Commands

Enter each AFS command on a single line (press <Return> only at the end of the command). Some commands in this documentappear broken across multiple lines, but that is for legibility only.

Use a space to separate each element on a command line from its neighbors. Spaces rather than commas also separate multiplevalues of an argument.

In many cases, the issuer of a command can reduce the amount of typing necessary by using one or both of the following methods:

• Omitting switches

• Using accepted abbreviations for operation codes, switches (if they are included at all), and some types of values

The following sections explain the conditions for omitting or shortening parts of the command line. It is always acceptable totype a command in full, with all of its switches and no abbreviations.

.7.6.1 Conditions for Omitting Switches

It is always acceptable to type the switch part of an argument, but in many cases it is not necessary. Specifically, switches can beomitted if the following conditions are met.

• All of the command’s required arguments appear in the order prescribed by the syntax statement

• No switch is provided for any argument

• There is only one value for each argument (but note the important exception discussed in the following paragraph)


Omitting switches is possible only because there is a prescribed order for each command’s arguments. When the issuer doesnot include switches, the command interpreter relies instead on the order of arguments; it assumes that the first element after theoperation code is the command’s first argument, the next element is the command’s second argument, and so on. The importantexception is when a command’s final required argument accepts multiple values. In this case, the command interpreter assumesthat the issuer has correctly provided one value for each argument up through the final one, so any additional values at the endbelong to the final argument.

The following list describes the rules for omitting switches from the opposite perspective: an argument’s switch must be providedwhen any of the following conditions apply.

• The command’s arguments do not appear in the prescribed order

• An optional argument is omitted but a subsequent optional argument is provided

• A switch is provided for a preceding argument

• More than one value is supplied for a preceding argument (which must take multiple values, of course); without a switch onthe current argument, the command interpreter assumes that the current argument is another value for the preceding argument

.7.6.2 An Example of Omitting Switches

Consider again the example command from An Example Command.

% bos getdate -server fs1.example.com -file ptserver kaserver

This command has two required arguments: the server machine name (identified by the -server switch) and binary file name(identified by the -file switch). The second argument accepts multiple values. By complying with all three conditions, the issuercan omit the switches:

% bos getdate fs1.example.com ptserver kaserver

Because there are no switches, the bos command interpreter relies on the order of arguments. It assumes that the first elementfollowing the operation code, fs1.example.com, is the server machine name, and that the next argument, ptserver, is a binaryfile name. Then, because the command’s second (and last) argument accepts multiple values, the command interpreter correctlyinterprets kaserver as an additional value for it.

On the other hand, the following is not acceptable because it violates the first two conditions in Conditions for Omitting Switches:even though there is only one value per argument, the arguments do not appear in the prescribed order, and a switch is providedfor one argument but not the other.

% bos getdate ptserver -server fs1.example.com

.7.7 Rules for Using Abbreviations and Aliases

This section explains how to abbreviate operation codes, option names, server machine names, partition names, and cell names.It is not possible to abbreviate other types of values.

.7.7.1 Abbreviating Operation Codes

It is acceptable to abbreviate an operation code to the shortest form that still distinguishes it from the other operation codes in itssuite.


For example, it is acceptable to shorten bos install to bos i because there are no other operation codes in the bos command suitethat begin with the letter i. In contrast, there are several bos operation codes that start with the letter s, so the abbreviations must be

longer to remain unambiguous:

bos sa for bos salvagebos seta for bos setauthbos setc for bos setcellnamebos setr for bos setrestartbos sh for bos shutdownbos start for bos startbos startu for bos startupbos stat for bos statusbos sto for bos stop

In addition to abbreviations, some operation codes have an alias, a short form that is not derived by abbreviating the opera-tion code to its shortest unambiguous form. For example, the alias for the fs setacl command is fs sa, whereas the shortestunambiguous abbreviation is fs seta.

There are two usual reasons an operation code has an alias:

• Because the command is frequently issued, it is convenient to have a form shorter than the one derived by abbreviating. The fssetacl command is an example.

• Because the command’s name has changed, but users of previous versions of AFS know the former name. For example, boslisthosts has the alias bos getcell, its former name. It is acceptable to abbreviate aliases to their shortest unambiguous form(for example, bos getcell to bos getc).

Even if an operation code has an alias, it is still acceptable to use the shortest unambiguous form. Thus, the fs setacl commandhas three acceptable forms: fs setacl (the full form), fs seta (the shortest abbreviation), and fs sa (the alias).

.7.7.2 Abbreviating Switches and Flags

It is acceptable to shorten a switch or flag to the shortest form that distinguishes it from the other switches and flags for itsoperation code. It is often possible to omit switches entirely, subject to the conditions listed in Conditions for Omitting Switches.

.7.7.3 Abbreviating Server Machine Names

AFS server machines must have fully-qualified Internet-style host names (for example, fs1.example.com), but it is not alwaysnecessary to type the full name on the command line. AFS commands accept unambiguous shortened forms, but depend on thecell’s name service (such as the Domain Name Service) or a local host table to resolve a shortened name to the fully-qualifiedequivalent when the command is issued.

Most commands also accept the dotted decimal form of the machine’s IP address as an identifier.

.7.7.4 Abbreviating Partition Names

Partitions that house AFS volumes must have names of the form /vicepx or /vicepxx, where the variable final portion is oneor two lowercase letters. By convention, the first server partition created on a file server machine is called /vicepa, the second/vicepb, and so on. The OpenAFS Quick Beginnings explains how to configure and name a file server machine’s partitions inpreparation for storing AFS volumes on them.

When issuing AFS commands, you can abbreviate a partition name using any of the following forms:

/vicepa = vicepa = a = 0/vicepb = vicepb = b = 1

After /vicepz (for which the index is 25) comes

/vicepaa = vicepaa = aa = 26/vicepab = vicepab = ab = 27


and so on through

/vicepiv = vicepiv = iv = 255

.7.7.5 Abbreviating Cell Names

A cell’s full name usually matches its Internet domain name (such as example.org for the Example Organization or example.comfor Example Corporation). Some AFS commands accept unambiguous shortened forms, usually with respect to the local /us-r/vice/etc/CellServDB file but sometimes depending on the ability of the local name service to resolve the corresponding domainname.

.7.8 Displaying Online Help for AFS Commands

To display online help for AFS commands that belong to suites, use the help and apropos operation codes. A -help flag is alsoavailable on every almost every AFS command.

The online help entry for a command consists of two or three lines:

• The first line names the command and briefly describes what it does

• If the command has aliases, they appear on the next line

• The final line, which begins with the string Usage:, lists the command’s options in the prescribed order; online help entriesuse the same typographical symbols (brackets and so on) as this documentation.

If no operation code is specified, the help operation code displays the first line (short description) for every operation code in thesuite:

% command_suite help

If the issuer specifies one or more operation codes, the help operation code displays each command’s complete online entry(short description, alias if any, and syntax):

% command_suite help operation_code+

The -help flag displays a command’s syntax but not the short description or alias:

% command_name -help

The apropos operation code displays the short description of any command in a suite whose operation code or short descriptionincludes the specified keyword:

% command_suite apropos "<help string>"

The following example command displays the complete online help entry for the fs setacl command:

% fs help setaclfs setacl: set access control listaliases: saUsage: fs setacl -dir <directory>+ -acl <access list entries>+[-clear] [-negative] [-id] [-if] [-help]

To see only the syntax statement, use the -help flag:

% fs setacl -helpUsage: fs setacl -dir <directory>+ -acl <access list entries>+[-clear] [-negative] [-id] [-if] [-help]


In the following example, a user wants to display the quota for her home volume. She knows that the relevant command belongsto the fs suite, but cannot remember the operation code. She uses quota as the keyword:

% fs apropos quotalistquota: list volume quotaquota: show volume quota usagesetquota: set volume quota

The following illustrates the error message that results if no command name or short description contains the keyword:

% fs apropos "list quota"Sorry, no commands found


Part VII

The afsmonitor Program Statistics


This appendix lists the statistics you can gather with the afsmonitor program, grouping them by category and section, and brieflydescribing each field, group, and section. For instructions on using the afsmonitor program, see Monitoring and Auditing AFSPerformance

.8 The Cache Manager Statistics

Cache Manager statistics fields are classified into the following sections and groups:

• PerfStats_section - Performance Statistics Section.

– PerfStats_group - Performance Statistics Group.

– misc_group - Miscellaneous Group.

• Server_UpDown_section - Server Up/Down Statistics Section.

– FS_upDown_SC_group - File Server Up/Down Statistics in Same Cell Group.

– FS_upDown_OC_group - File Server Up/Down Statistics in Other Cells Group.

– VL_upDown_SC_group - VL Server Up/Down Statistics in Same Cell Group.

– VL_upDown_OC_group - VL Server Up/Down Statistics in Other Cells Group.

• RPCop_section - RPC Operation Measurements Section.

– FS_RPCopTimes_group - File Server RPC Operation Timings Group.

– FS_RPCopErrors_group - File Server RPC Operation Errors Group.

– FS_RPCopBytes_group - File Server RPC Transfer Timings Group.

– CM_RPCopTimes_group - Cache Manager RPC Operation Timings Group.

• Auth_Access_section - Authentication and Replicated File Access Section.

– Auth_Stats_group - Authentication Information for Cache Manager Group.

– Access_Stats_group - Unreplicated File Access Group.

All Cache Manager variables categorized under these sections and groups names are listed below.

.8.1 Performance Statistics Section (PerfStats_section)

Performance Statistics Group (PerfStats_group)

• dlocalAccesses: Number of data accesses to files within local cell.

• vlocalAccesses: Number of stat accesses to files within local cell.

• dremoteAccesses: Number of data accesses to files outside of local cell.

• vremoteAccesses: Number of stat accesses to files outside of local cell.

• cacheNumEntries: Number of cache entries.

• cacheBlocksTotal: Number of (1K) blocks configured for cache.

• cacheBlocksInUse: Number of cache blocks actively in use.

• cacheBlocksOrig: Number of cache blocks at bootup.

• cacheMaxDirtyChunks: Maximum number of dirty cache chunks tolerated.

• cacheCurrDirtyChunks: Current number of dirty cache chunks.


• dcacheHits: Number of data files found in local cache.

• vcacheHits: Number of stat entries found in local cache.

• dcacheMisses: Number of data files not found in local cache.

• vcacheMisses: Number of stat entries not found in local cache.

• cacheFlushes: Number of files flushed from cache.

• cacheFilesReused: Number of cache files reused.

• dcacheXAllocs: Additionally allocated dcaches.

• vcacheXAllocs: Additionally allocated vcaches.

• bufAlloced: Number of buffers allocated by AFS.

• bufHits: Number of pages found on buffer cache.

• bufMisses: Number of pages not found on buffer cache.

• bufFlushDirty: Number of cached dirty buffers flushed because all were busy.

• LargeBlocksActive: Number of currently used large free pool entries.

• LargeBlocksAlloced: Number of allocated large free pool entries.

• SmallBlocksActive: Number of currently used small free pool entries.

• SmallBlocksAlloced: Number of allocated used small free pool entries.

• OutStandingMemUsage: Amount of allocated memory.

• OutStandingAllocs: Outstanding osi_allocs (no osi_frees yet).

• CallBackAlloced: Number of callback structures allocated.

• CallBackFlushes: Number of callback flush operations performed.

• srvRecords: Number of servers currently on record.

• srvRecordsHWM: Server record high water mark.

• srvNumBuckets: Number of server hash chain buckets.

• srvMaxChainLength: Maximum server hash chain length.

• srvMaxChainLengthHWM: Server hash chain high water mark.

• sysName_ID: Sysname ID for host hardware.

Miscellaneous Group (misc_group)

• numPerfCalls: Number of performance calls received.

• epoch: Cache Manager epoch time.

• numCellsVisible: Number of cells we know about.

• numCellsContacted: Number of cells contacted.


.8.2 Server Up/Down Statistics Section (Server_UpDown_section)

File Server Up/Down Statistics in Same Cell Group (FS_upDown_SC_group)

NoteThe records referred to in this section are the internal records kept by the afsmonitor program to track the processes fromwhich data is being gathered.

• fs_sc_numTtlRecords: Number of fileserver records, active or inactive.

• fs_sc_numUpRecords: Number of (active) fileserver records currently marked up.

• fs_sc_numDownRecords: Number of (active) fileserver records currently marked down.

• fs_sc_sumOfRecordAges: Sum of fileserver record lifetimes.

• fs_sc_ageOfYoungestRecord: Age of youngest fileserver record.

• fs_sc_ageOfOldestRecord: Age of oldest fileserver record.

• fs_sc_numDowntimeIncidents: Number of (completed) downtime incidents.

• fs_sc_numRecordsNeverDown: Number of fileserver records never marked down.

• fs_sc_maxDowntimesInARecord: Maximum downtimes seen by any fileserver record.

• fs_sc_sumOfDowntimes: Sum of all (completed) downtimes, in seconds.

• fs_sc_shortestDowntime: Shortest downtime, in seconds.

• fs_sc_longestDowntime: Longest downtime, in seconds.

• fs_sc_down_0_10_min: Down time incidents: 0-10 minutes.

• fs_sc_down_10_30_min: Down time incidents: 10-30 minutes.

• fs_sc_down_half_1_hr: Down time incidents: 30-60 minutes.

• fs_sc_down_1_2_hr: Down time incidents: 1-2 hours.



• fs_sc_down_more_8_hr: Down time incidents: more than 8 hours.

• fs_sc_downDst_0: Down time incidents: 0 times.

• fs_sc_downDst_1: Down time incidents: 1 time.

• fs_sc_downDst_2_5: Down time incidents: 2-5 times.



• fs_sc_downDst_more_50: Down time incidents: more than 50 times.

File Server Up/Down Statistics in Other Cells Group (FS_upDown_OC_group)

• fs_oc_numTtlRecords: Number of fileserver records, active or inactive.

• fs_oc_numUpRecords: Number of (active) fileserver records currently marked up.


• fs_oc_numDownRecords: Number of (active) fileserver records currently marked down.

• fs_oc_sumOfRecordAges: Sum of server record lifetimes.

• fs_oc_ageOfYoungestRecord: Age of youngest fileserver record.

• fs_oc_ageOfOldestRecord: Age of oldest fileserver record.

• fs_oc_numDowntimeIncidents: Number of (completed) downtime incidents.

• fs_oc_numRecordsNeverDown: Number of fileserver records never marked down.

• fs_oc_maxDowntimesInARecord: Maximum downtimes seen by any fileserver.

• fs_oc_sumOfDowntimes: Sum of all (completed) downtimes, in seconds.

• fs_oc_shortestDowntime: Shortest downtime, in seconds.

• fs_oc_longestDowntime: Longest downtime, in seconds.

• fs_oc_down_0_10_min: Down time incidents: 0-10 minutes.

• fs_oc_down_10_30_min: Down time incidents: 10-30 minutes.

• fs_oc_down_half_1_hr: Down time incidents: 30-60 minutes.

• fs_oc_down_1_2_hr: Down time incidents: 1-2 hours.



• fs_oc_down_more_8_hr: Down time incidents: more than 8 hours.

• fs_oc_downDst_0: Down time incidents: 0 times.

• fs_oc_downDst_1: Down time incidents: 1 time.

• fs_oc_downDst_2_5: Down time incidents: 2-5 times.



• fs_oc_downDst_more_50: Down time incidents: more than 50 times.

VL Server Up/Down Statistics in Same Cell Group (VL_upDown_SC_group)

• vl_sc_numTtlRecords: Number of vlserver records, active or inactive.

• vl_sc_numUpRecords: Number of (active) vlserver records currently marked up.

• vl_sc_numDownRecords: Number of (active) vlserver records currently marked down.

• vl_sc_sumOfRecordAges: Sum of vlserver record lifetimes.

• vl_sc_ageOfYoungestRecord: Age of youngest vlserver record.

• vl_sc_ageOfOldestRecord: Age of oldest vlserver record.

• vl_sc_numDowntimeIncidents: Number of (completed) downtime incidents.

• vl_sc_numRecordsNeverDown: Number of vlserver records never marked down.

• vl_sc_maxDowntimesInARecord: Maximum downtimes seen by any vlserver record.

• vl_sc_sumOfDowntimes: Sum of all (completed) downtimes, in seconds.


• vl_sc_shortestDowntime: Shortest downtime, in seconds.

• vl_sc_longestDowntime: Longest downtime, in seconds.

• vl_sc_down_0_10_min: Down time incidents: 0-10 minutes.

• vl_sc_down_10_30_min: Down time incidents: 10-30 minutes.

• vl_sc_down_half_1_hr: Down time incidents: 30-60 minutes.

• vl_sc_down_1_2_hr: Down time incidents: 1-2 hours.



• vl_sc_down_more_8_hr: Down time incidents: more than 8 hours.

• vl_sc_downDst_0: Down time incidents: 0 times.

• vl_sc_downDst_1: Down time incidents: 1 time.

• vl_sc_downDst_2_5: Down time incidents: 2-5 times.



• vl_sc_downDst_more_50: Down time incidents: more than 50 times.

VL Server Up/Down Statistics in Other Cells Group (VL_upDown_DC_group)

• vl_oc_numTtlRecords: Number of vlserver records, active or inactive.

• vl_oc_numUpRecords: Number of (active) vlserver records currently marked up.

• vl_oc_numDownRecords: Number of (active) vlserver records currently marked down.

• vl_oc_sumOfRecordAges: Sum of vlserver record lifetimes.

• vl_oc_ageOfYoungestRecord: Age of youngest vlserver record.

• vl_oc_ageOfOldestRecord: Age of oldest vlserver record.

• vl_oc_numDowntimeIncidents: Number of (completed) downtime incidents.

• vl_oc_numRecordsNeverDown: Number of vlserver records never marked down.

• vl_oc_maxDowntimesInARecord: Maximum downtimes seen by any vlserver record.

• vl_oc_sumOfDowntimes: Sum of all (completed) downtimes, in seconds.

• vl_oc_shortestDowntime: Shortest downtime, in seconds.

• vl_oc_longestDowntime: Longest downtime, in seconds.

• vl_oc_down_0_10_min: Down time incidents: 0-10 minutes.

• vl_oc_down_10_30_min: Down time incidents: 10-30 minutes.

• vl_oc_down_half_1_hr: Down time incidents: 30-60 minutes.

• vl_oc_down_1_2_hr: Down time incidents: 1-2 hours.




• vl_oc_down_more_8_hr: Down time incidents: more than 8 hours.

• vl_oc_downDst_0: Down time incidents: 0 times.

• vl_oc_downDst_1: Down time incidents: 1 time.

• vl_oc_downDst_2_5: Down time incidents: 2-5 times.



• vl_oc_downDst_more_50: Down time incidents: more than 50 times.

.8.3 RPC Operation Measurements Section (RPCop_section)

File Server RPC Operation Timings Group (FS_RPCopTimes_group)

• FetchData_ops: Number of FetchData operations executed.

• FetchData_ops_ok: Number of successful FetchData operations.

• FetchData_sum: Sum of timings for FetchData operations.

• FetchData_sqr: Sum of squares of sample timings for FetchData operations.

• FetchData_min: Minimum execution time observed for FetchData operations.

• FetchData_max: Maximum execution time observed for FetchData operations.

• FetchACL_ops: Number of FetchACL operations executed.

• FetchACL_ops_ok: Number of successful FetchACL operations.

• FetchACL_sum: Sum of timings for FetchACL operations.

• FetchACL_sqr: Sum of squares of sample timings for FetchACL operations.

• FetchACL_min: Minimum execution time observed for FetchACL operations.

• FetchACL_max: Maximum execution time observed for FetchACL operations.

• FetchStatus_ops: Number of FetchStatus operations executed.

• FetchStatus_ops_ok: Number of successful FetchStatus operations.

• FetchStatus_sum: Sum of timings for FetchStatus operations.

• FetchStatus_sqr: Sum of squares of sample timings for FetchStatus operations.

• FetchStatus_min: Minimum execution time observed for FetchStatus operations.

• FetchStatus_max: Maximum execution time observed for FetchStatus operations.

• StoreData_ops: Number of StoreData operations executed.

• StoreData_ops_ok: Number of successful StoreData operations.

• StoreData_sum: Sum of timings for StoreData operations.

• StoreData_sqr: Sum of squares of sample timings for StoreData operations.

• StoreData_min: Minimum execution time observed for StoreData operations.

• StoreData_max: Maximum execution time observed for StoreData operations.

• StoreACL_ops: Number of StoreACL operations executed.


• StoreACL_ops_ok: Number of successful StoreACL operation.

• StoreACL_sum: Sum of timings for StoreACL operations.

• StoreACL_sqr: Sum of squares of sample timings for StoreACL operations.

• StoreACL_min: Minimum execution time observed for StoreACL operations.

• StoreACL_max: Maximum execution time observed for StoreACL operations.

• StoreStatus_ops: Number of StoreStatus operations executed.

• StoreStatus_ops_ok: Number of successful StoreStatus operations.

• StoreStatus_sum: Sum of timings for StoreStatus operations.

• StoreStatus_sqr: Sum of squares of sample timings for StoreStatus operations.

• StoreStatus_min: Minimum execution time observed for StoreStatus operations.

• StoreStatus_max: Maximum execution time observed for StoreStatus operations.

• RemoveFile_ops: Number of RemoveFile operations executed.

• RemoveFile_ops_ok: Number of successful RemoveFile operations.

• RemoveFile_sum: Sum of timings for RemoveFile operations.

• RemoveFile_sqr: Sum of squares of sample timings for RemoveFile operations.

• RemoveFile_min: Minimum execution time observed for RemoveFile operations.

• RemoveFile_max: Maximum execution time observed for RemoveFile operations.

• CreateFile_ops: Number of CreateFile operations executed.

• CreateFile_ops_ok: Number of successful CreateFile operations.

• CreateFile_sum: Sum of timings for CreateFile operations.

• CreateFile_sqr: Sum of squares of sample timings for CreateFile operations.

• CreateFile_min: Minimum execution time observed for CreateFile operations.

• CreateFile_max: Maximum execution time observed for CreateFile operations.

• Rename_ops: Number of Rename operations executed.

• Rename_ops_ok: Number of successful Rename operations.

• Rename_sum: Sum of timings for Rename operations.

• Rename_sqr: Sum of squares of sample timings for Rename operations.

• Rename_min: Minimum execution time observed for Rename operations.

• Rename_max: Maximum execution time observed for Rename operations.

• Symlink_ops: Number of Symlink operations executed.

• Symlink_ops_ok: Number of successful Symlink operations.

• Symlink_sum: Sum of timings for Symlink operations.

• Symlink_sqr: Sum of squares of sample timings for Symlink operations.

• Symlink_min: Minimum execution time observed for Symlink operations.

• Symlink_max: Maximum execution time observed for Symlink operations.


• Link_ops: Number of Link operations executed.

• Link_ops_ok: Number of successful Link operations.

• Link_sum: Sum of timings for Link operations.

• Link_sqr: Sum of squares of sample timings for Link operations.

• Link_min: Minimum execution time observed for Link operations.

• Link_max: Maximum execution time observed for Link operations.

• MakeDir_ops: Number of MakeDir operations executed.

• MakeDir_ops_ok: Number of successful MakeDir operations.

• MakeDir_sum: Sum of timings for MakeDir operations.

• MakeDir_sqr: Sum of squares of sample timings for MakeDir operations.

• MakeDir_min: Minimum execution time observed for MakeDir operations.

• MakeDir_max: Maximum execution time observed for MakeDir operations.

• RemoveDir_ops: Number of RemoveDir operations executed.

• RemoveDir_ops_ok: Number of successful RemoveDir operations.

• RemoveDir_sum: Sum of timings for RemoveDir operations.

• RemoveDir_sqr: Sum of squares of sample timings for RemoveDir operations.

• RemoveDir_min: Minimum execution time observed for RemoveDir operations.

• RemoveDir_max: Maximum execution time observed for RemoveDir operations.

• SetLock_ops: Number of SetLock operations executed.

• SetLock_ops_ok: Number of successful SetLock operations.

• SetLock_sum: Sum of timings for SetLock operations.

• SetLock_sqr: Sum of squares of sample timings for SetLock operations.

• SetLock_min: Minimum execution time observed for SetLock operations.

• SetLock_max: Maximum execution time observed for SetLock operations.

• ExtendLock_ops: Number of ExtendLock operations executed.

• ExtendLock_ops_ok: Number of successful ExtendLock operations.

• ExtendLock_sum: Sum of timings for ExtendLock operations.

• ExtendLock_sqr: Sum of squares of sample timings for ExtendLock operations.

• ExtendLock_min: Minimum execution time observed for ExtendLock operations.

• ExtendLock_max: Maximum execution time observed for ExtendLock operations.

• ReleaseLock_ops: Number of ReleaseLock operations executed.

• ReleaseLock_ops_ok: Number of successful ReleaseLock operations.

• ReleaseLock_sum: Sum of timings for ReleaseLock operations.

• ReleaseLock_sqr: Sum of squares of sample timings for StoreStatus operations.

• ReleaseLock_min: Minimum execution time observed for ReleaseLock operations.


• ReleaseLock_max: Maximum execution time observed for ReleaseLock operations.

• GetStatistics_ops: Number of GetStatistics operations executed.

• GetStatistics_ops_ok: Number of successful GetStatistics operations.

• GetStatistics_sum: Sum of timings for GetStatistics operations.

• GetStatistics_sqr: Sum of squares of sample timings for GetStatistics operations.

• GetStatistics_min: Minimum execution time observed for GetStatistics operations.

• GetStatistics_max: Maximum execution time observed for GetStatistics operations.

• GiveUpCallbacks_ops: Number of GiveUpCallbacks operations executed.

• GiveUpCallbacks_ops_ok: Number of successful GiveUpCallbacks operations.

• GiveUpCallbacks_sum: Sum of timings for GiveUpCallbacks operations.

• GiveUpCallbacks_sqr: Sum of squares of sample timings for GiveUpCallbacks operations.

• GiveUpCallbacks_min: Minimum execution time observed for GiveUpCallbacks operations.

• GiveUpCallbacks_max: Maximum execution time observed for GiveUpCallbacks operations.

• GetVolumeInfo_ops: Number of GetVolumeInfo operations executed.

• GetVolumeInfo_ops_ok: Number of successful GetVolumeInfo operations.

• GetVolumeInfo_sum: Sum of timings for GetVolumeInfo operations.

• GetVolumeInfo_sqr: Sum of squares of sample timings for GetVolumeInfo operations.

• GetVolumeInfo_min: Minimum execution time observed for GetVolumeInfo operations.

• GetVolumeInfo_max: Maximum execution time observed for GetVolumeInfo operations.

• GetVolumeStatus_ops: Number of GetVolumeStatus operations executed.

• GetVolumeStatus_ops_ok: Number of successful GetVolumeStatus operations.

• GetVolumeStatus_sum: Sum of timings for GetVolumeStatus operations.

• GetVolumeStatus_sqr: Sum of squares of sample timings for GetVolumeStatus operations.

• GetVolumeStatus_min: Minimum execution time observed for GetVolumeStatus operations.

• GetVolumeStatus_max: Maximum execution time observed for GetVolumeStatus operations.

• SetVolumeStatus_ops: Number of SetVolumeStatus operations executed.

• SetVolumeStatus_ops_ok: Number of successful SetVolumeStatus operations.

• SetVolumeStatus_sum: Sum of timings for SetVolumeStatus operations.

• SetVolumeStatus_sqr: Sum of squares of sample timings for SetVolumeStatus operations.

• SetVolumeStatus_min: Minimum execution time observed for SetVolumeStatus operations.

• SetVolumeStatus_max: Maximum execution time observed for SetVolumeStatus operations.

• GetRootVolume_ops: Number of GetRootVolume operations executed.

• GetRootVolume_ops_ok: Number of successful GetRootVolume operations.

• GetRootVolume_sum: Sum of timings for GetRootVolume operations.

• GetRootVolume_sqr: Sum of squares of sample timings for GetRootVolume operations.


• GetRootVolume_min: Minimum execution time observed for GetRootVolume operations.

• GetRootVolume_max: Maximum execution time observed for GetRootVolume operations.

• CheckToken_ops: Number of CheckToken operations executed.

• CheckToken_ops_ok: Number of successful CheckToken operations.

• CheckToken_sum: Sum of timings for CheckToken operations.

• CheckToken_sqr: Sum of squares of sample timings for CheckToken operations.

• CheckToken_min: Minimum execution time observed for CheckToken operations.

• CheckToken_max: Maximum execution time observed for CheckToken operations.

• GetTime_ops: Number of GetTime operations executed.

• GetTime_ops_ok: Number of successful GetTime operations.

• GetTime_sum: Sum of timings for GetTime operations.

• GetTime_sqr: Sum of squares of sample timings for GetTime operations.

• GetTime_min: Minimum execution time observed for GetTime operations.

• GetTime_max: Maximum execution time observed for GetTime operations.

• NGetVolumeInfo_ops: Number of NGetVolumeInfo operations executed.

• NGetVolumeInfo_ops_ok: Number of successful NGetVolumeInfo operations.

• NGetVolumeInfo_sum: Sum of timings for NGetVolumeInfo operations.

• NGetVolumeInfo_sqr: Sum of squares of sample timings for NGetVolumeInfo operations.

• NGetVolumeInfo_min: Minimum execution time observed for NGetVolumeInfo operations.

• NGetVolumeInfo_max: Maximum execution time observed for NGetVolumeInfo operations.

• BulkStatus_ops: Number of BulkStatus operations executed.

• BulkStatus_ops_ok: Number of successful BulkStatus operations.

• BulkStatus_sum: Sum of timings for BulkStatus operations.

• BulkStatus_sqr: Sum of squares of sample timings for BulkStatus operations.

• BulkStatus_min: Minimum execution time observed for BulkStatus operations.

• BulkStatus_max: Maximum execution time observed for BulkStatus operations.

• XStatsVersion_ops: Number of XStatsVersion operations executed.

• XStatsVersion_ops_ok: Number of successful XStatsVersion operations.

• XStatsVersion_sum: Sum of timings for XStatsVersion operations.

• XStatsVersion_sqr: Sum of squares of sample timings for XStatsVersion operations.

• XStatsVersion_min: Minimum execution time observed for XStatsVersion operations.

• XStatsVersion_max: Maximum execution time observed for XStatsVersion operations.

• GetXStats_ops: Number of GetXStats operations executed.

• GetXStats_ops_ok: Number of successful GetXStats operations.

• GetXStats_sum: Sum of timings for GetXStats operations.


• GetXstats_sqr: Sum of squares of sample timings for GetXStats operations.

• GetXStats_min: Minimum execution time observed for GetXStats operations.

• GetXStats_max: Maximum execution time observed for GetXStats operations.

File Server RPC Operation Errors Group (FS_RPCopErrors_group)

• FetchData_srv_err: Number of server-down errors during FetchData operations.

• FetchData_net_err: Number of network errors during FetchData operations.

• FetchData_prot_err_err: Number of protection violations during FetchData operations.

• FetchData_vol_err: Number of volume related errors during FetchData operations.

• FetchData_busy_err: Number of volume busy conditions during FetchData operations.

• FetchData_other_err: Number of miscellaneous other errors during FetchData operations.

• FetchACL_srv_err: Number of server-down errors during FetchACL operations.

• FetchACL_net_err: Number of network errors during FetchACL operations.

• FetchACL_prot_err: Number of protection violations during FetchACL operations.

• FetchACL_vol_err: Number of volume related errors during FetchACL operations.

• FetchACL_busy_err: Number of volume busy conditions encountered during FetchACL operations.

• FetchACL_other_err: Number of miscellaneous other errors during FetchACL operations.

• FetchStatus_srv_err: Number of server-down errors during FetchStatus operations.

• FetchStatus_net_err: Number of network errors during FetchStatus operations.

• FetchStatus_prot_err: Number of protection violations during FetchStatus operations.

• FetchStatus_vol_err: Number of volume related errors during FetchStatus operations.

• FetchStatus_busy_err: Number of volume busy conditions encountered during FetchStatus operations.

• FetchStatus_other_err: Number of miscellaneous other errors during FetchStatus operations.

• StoreData_srv_err: Number of server-down errors during StoreData operations.

• StoreData_net_err: Number of network errors during StoreData operations.

• StoreData_prot_err: Number of protection violations during StoreData operations.

• StoreData_vol_err: Number of volume related errors during StoreData operations.

• StoreData_busy_err: Number of volume busy conditions encountered during StoreData operations.

• StoreData_other_err: Number of miscellaneous other errors during StoreData operations.

• StoreACL_srv_err: Number of server-down errors during StoreACL operations.

• StoreACL_net_err: Number of network errors during StoreACL operations.

• StoreACL_prot_err: Number of protection violations during StoreACL operations.

• StoreACL_vol_err: Number of volume related errors during StoreACL operations.

• StoreACL_busy_err: Number of volume busy conditions encountered during StoreACL operations.

• StoreACL_other_err: Number of miscellaneous other errors during StoreACL operations.


• StoreStatus_srv_err: Number of server-down errors during StoreStatus operations.

• StoreStatus_net_err: Number of network errors during StoreStatus operations.

• StoreStatus_prot_err: Number of protection violations during StoreStatus operations.

• StoreStatus_vol_err: Number of volume related errors during StoreStatus operations.

• StoreStatus_busy_err: Number of volume busy conditions encountered during StoreStatus operations.

• StoreStatus_other_err: Number of miscellaneous other errors during StoreStatus operations.

• RemoveFile_srv_err: Number of server-down errors during RemoveFile operations.

• RemoveFile_net_err: Number of network errors during RemoveFile operations.

• RemoveFile_prot_err: Number of protection violations during RemoveFile operations.

• RemoveFile_vol_err: Number of volume related errors during RemoveFile operations.

• RemoveFile_busy_err: Number of volume busy conditions encountered during RemoveFile operations.

• RemoveFile_other_err: Number of miscellaneous other errors during RemoveFile operations.

• CreateFile_srv_err: Number of server-down errors during CreateFile operations.

• CreateFile_net_err: Number of network errors during CreateFile operations.

• CreateFile_prot_err: Number of protection violations during CreateFile operations.

• CreateFile_vol_err: Number of volume related errors during CreateFile operations.

• CreateFile_busy_err: Number of volume busy conditions encountered during CreateFile operations.

• CreateFile_other_err: Number of miscellaneous other errors during CreateFile operations.

• Rename_srv_err: Number of server-down errors during Rename operations.

• Rename_net_err: Number of network errors during Rename operations.

• Rename_prot_err: Number of protection violations during Rename operations.

• Rename_vol_err: Number of volume related errors during Rename operations.

• Rename_busy_err: Number of volume busy conditions encountered during Rename operations.

• Rename_other_err: Number of miscellaneous other errors during Rename operations.

• Symlink_srv_err: Number of server-down errors during Symlink operations.

• Symlink_net_err: Number of network errors during Symlink operations.

• Symlink_prot_err: Number of protection violations during Symlink operations.

• Symlink_vol_err: Number of volume related errors during Symlink operations.

• Symlink_busy_err: Number of volume busy conditions encountered during Symlink operations.

• Symlink_other_err: Number of miscellaneous other errors during Symlink operations.

• Link_srv_err: Number of server-down errors during Link operations.

• Link_net_err: Number of network errors during Link operations.

• Link_prot_err: Number of protection violations during Link operations.

• Link_vol_err: Number of volume related errors during Link operations.

• Link_busy_err: Number of volume busy conditions encountered during Link operations.


• Link_other_err: Number of miscellaneous other errors during Link operations.

• MakeDir_srv_err: Number of server-down errors during MakeDir operations.

• MakeDir_net_err: Number of network errors during MakeDir operations.

• MakeDir_prot_err: Number of protection violations during MakeDir operations.

• MakeDir_vol_err: Number of volume related errors during MakeDir operations.

• MakeDir_busy_err: Number of volume busy conditions encountered during MakeDir operations.

• MakeDir_other_err: Number of miscellaneous other errors during MakeDir operations.

• RemoveDir_srv_err: Number of server-down errors during RemoveDir operations.

• RemoveDir_net_err: Number of network errors during RemoveDir operations.

• RemoveDir_prot_err: Number of protection violations during RemoveDir operations.

• RemoveDir_vol_err: Number of volume related errors during RemoveDir operations.

• RemoveDir_busy_err: Number of volume busy conditions encountered during RemoveDir operations.

• RemoveDir_other_err: Number of miscellaneous other errors during RemoveDir operations.

• SetLock_srv_err: Number of server-down errors during SetLock operations.

• SetLock_net_err: Number of network errors during SetLock operations.

• SetLock_prot_err: Number of protection violations during SetLock operations.

• SetLock_vol_err: Number of volume related errors during SetLock operations.

• SetLock_busy_err: Number of volume busy conditions encountered during SetLock operations.

• SetLock_other_err: Number of miscellaneous other errors during SetLock operations.

• ExtendLock_srv_err: Number of server-down errors during ExtendLock operations.

• ExtendLock_net_err: Number of network errors during ExtendLock operations.

• ExtendLock_prot_err: Number of protection violations during ExtendLock operations.

• ExtendLock_vol_err: Number of volume related errors during ExtendLock operations.

• ExtendLock_busy_err: Number of volume busy conditions encountered during ExtendLock operations.

• ExtendLock_other_err: Number of miscellaneous other errors during ExtendLock operations.

• ReleaseLock_srv_err: Number of server-down errors during ReleaseLock operations.

• ReleaseLock_net_err: Number of network errors during ReleaseLock operations.

• ReleaseLock_prot_err: Number of protection violations during ReleaseLock operations.

• ReleaseLock_vol_err: Number of volume related errors during ReleaseLock operations.

• ReleaseLock_busy_err: Number of volume busy conditions encountered during ReleaseLock operations.

• ReleaseLock_other_err: Number of miscellaneous other errors during ReleaseLock operations.

• GetStatistics_srv_err: Number of server-down errors during GetStatistics operations.

• GetStatistics_net_err: Number of network errors during GetStatistics operations.

• GetStatistics_prot_err: Number of protection violations during GetStatistics operations.

• GetStatistics_vol_err: Number of volume related errors during GetStatistics operations.


• GetStatistics_busy_err: Number of volume busy conditions encountered during GetStatistics operations.

• GetStatistics_other_err: Number of miscellaneous other errors during GetStatistics operations.

• GiveUpCallbacks_srv_err: Number of server-down errors during GiveUpCallbacks operations.

• GiveUpCallbacks_net_err: Number of network errors during GiveUpCallbacks operations.

• GiveUpCallbacks_prot_err: Number of protection violations during GiveUpCallbacks operations.

• GiveUpCallbacks_vol_err: Number of volume related errors during GiveUpCallbacks operations.

• GiveUpCallbacks_busy_err: Number of volume busy conditions encountered during GiveUpCallbacks operations.

• GiveUpCallbacks_other_err: Number of miscellaneous other errors during GiveUpCallbacks operations.

• GetVolumeInfo_srv_err: Number of server-down errors during GetVolumeInfo operations.

• GetVolumeInfo_net_err: Number of network errors during GetVolumeInfo operations.

• GetVolumeInfo_prot_err: Number of protection violations during GetVolumeInfo operations.

• GetVolumeInfo_vol_err: Number of volume related errors during GetVolumeInfo operations.

• GetVolumeInfo_busy_err: Number of volume busy conditions encountered during GetVolumeInfo operations.

• GetVolumeInfo_other_err: Number of miscellaneous other errors during GetVolumeInfo operations.

• GetVolumeStatus_srv_err: Number of server-down errors during GetVolumeStatus operations.

• GetVolumeStatus_net_err: Number of network errors during GetVolumeStatus operations.

• GetVolumeStatus_prot_err: Number of protection violations during GetVolumeStatus operations.

• GetVolumeStatus_vol_err: Number of volume related errors during GetVolumeStatus operations.

• GetVolumeStatus_busy_err: Number of volume busy conditions encountered during GetVolumeStatus operations.

• GetVolumeStatus_other_err: Number of miscellaneous other errors during GetVolumeStatus operations.

• SetVolumeStatus_srv_err : Number of server-down errors during SetVolumeStatus operations.

• SetVolumeStatus_net_err: Number of network errors during SetVolumeStatus operations.

• SetVolumeStatus_prot_err: Number of protection violations during SetVolumeStatus operations.

• SetVolumeStatus_vol_err: Number of volume related errors during SetVolumeStatus operations.

• SetVolumeStatus_busy_err: Number of volume busy conditions encountered during SetVolumeStatus operations.

• SetVolumeStatus_other_err: Number of miscellaneous other errors during SetVolumeStatus operations.

• GetRootVolume_srv_err: Number of server-down errors during GetRootVolume operations.

• GetRootVolume_net_err: Number of network errors during GetRootVolume operations.

• GetRootVolume_prot_err: Number of protection violations during GetRootVolume operations.

• GetRootVolume_vol_err: Number of volume related errors during GetRootVolume operations.

• GetRootVolume_busy_err: Number of volume busy conditions encountered during GetRootVolume operations.

• GetRootVolume_other_err: Number of miscellaneous other errors during GetRootVolume operations.

• CheckToken_srv_err: Number of server-down errors during CheckToken operations.

• CheckToken_net_err: Number of network errors during CheckToken operations.

• CheckToken_prot_err: Number of protection violations during CheckToken operations.


• CheckToken_vol_err: Number of volume related errors during CheckToken operations.

• CheckToken_busy_err: Number of volume busy conditions encountered during CheckToken operations.

• CheckToken_other_err: Number of miscellaneous other errors during CheckToken operations.

• GetTime_srv_err: Number of server-down errors during GetTime operations.

• GetTime_net_err: Number of network errors during GetTime operations.

• GetTime_prot_err: Number of protection violations during GetTime operations.

• GetTime_vol_err: Number of volume related errors during GetTime operations.

• GetTime_busy_err: Number of volume busy conditions encountered during GetTime operations.

• GetTime_other_err: Number of miscellaneous other errors during GetTime operations.

• NGetVolumeInfo_srv_err: Number of server-down errors during NGetVolumeInfo operations.

• NGetVolumeInfo_net_err: Number of network errors during NGetVolumeInfo operations.

• NGetVolumeInfo_prot_err: Number of protection violations during NGetVolumeInfo operations.

• NGetVolumeInfo_vol_err: Number of volume related errors during NGetVolumeInfo operations.

• NGetVolumeInfo_busy_err: Number of volume busy conditions encountered during NGetVolumeInfo operations.

• NGetVolumeInfo_other_err: Number of miscellaneous other errors during NGetVolumeInfo operations.

• BulkStatus_srv_err: Number of server-down errors during BulkStatus operations.

• BulkStatus_net_err: Number of network errors during BulkStatus operations.

• BulkStatus_prot_err: Number of protection violations during BulkStatus operations.

• BulkStatus_vol_err: Number of volume related errors during BulkStatus operations.

• BulkStatus_busy_err: Number of volume busy conditions encountered during BulkStatus operations.

• BulkStatus_other_err: Number of miscellaneous other errors during BulkStatus operations.

• XStatsVersion_srv_err: Number of server-down errors during XStatsVersion operations.

• XStatsVersion_net_err: Number of network errors during XStatsVersion operations.

• XStatsVersion_prot_err: Number of protection violations during XStatsVersion operations.

• XStatsVersion_vol_err: Number of volume related errors during XStatsVersion operations.

• XStatsVersion_busy_err: Number of volume busy conditions encountered during XStatsVersion operations.

• XStatsVersion_other_err: Number of miscellaneous other errors during XStatsVersion operations.

• GetXStats_srv_err: Number of server-down errors during GetXStats operations.

• GetXStats_net_err: Number of network errors during GetXStats operations.

• GetXStats_prot_err: Number of protection violations during GetXStats operations.

• GetXStats_vol_err: Number of volume related errors during GetXStats operations.

• GetXStats_busy_err: Number of volume busy conditions encountered during GetXStats operations.

• GetXStats_other_err: Number of miscellaneous other errors during GetXStats operations.

File Server RPC Transfer Timings Group (FS_RPCopBytes_group)


• FetchData_xfers: Number of FetchData operations.

• FetchData_xfers_ok: Number of successful FetchData operations.

• FetchData_xfers_sum: Sum of timing values for FetchData operations.

• FetchData_xfers_sqr: Sum of squares of sample timings for FetchData operations.

• FetchData_xfers_min: Minimum transfer time observed for FetchData operations.

• FetchData_xfers_max: Maximum transfer time observed for FetchData operations.

• FetchData_xfers_bytes_sum: Sum of bytes transferred for FetchData operations.

• FetchData_xfers_bytes_min: Minimum byte transfer observed for FetchData operations.

• FetchData_xfers_bytes_max: Maximum byte transfer observed for FetchData operations.

• FetchData_xfers_bucket0: Tally in bucket0 for FetchData operations.









• StoreData_xfers: Number of StoreData operations.

• StoreData_xfers_ok: Number of successful StoreData operations.

• StoreData_xfers_sum: Sum of timing values for StoreData operations.

• StoreData_xfers_sqr: Sum of squares of sample timings for StoreData operations.

• StoreData_xfers_min: Minimum transfer time observed for StoreData operations.

• StoreData_xfers_max: Maximum transfer time observed for StoreData operations.

• StoreData_xfers_bytes_sum: Sum of bytes transferred for StoreData operations.

• StoreData_xfers_bytes_min: Minimum byte transfer observed for StoreData operations.

• StoreData_xfers_bytes_max: Maximum byte transfer observed for StoreData operations.

• StoreData_xfers_bucket0: Tally in bucket0 for StoreData operations.










Cache Manager RPC Operation Timings Group (CM_RPCopTimes_group)

• CallBack_ops: Number of CallBack operations executed.

• CallBack_ops_ok: Number of successful CallBack operations.

• CallBack_ops_sum: Sum of timings for CallBack operations.

• CallBack_ops_min: Minimum execution time observed for CallBack operations.

• CallBack_ops_max: Maximum execution time observed for CallBack operations.

• CallBack_ops_sqr: Sum of the square of CallBack operations executed.

• InitCallBackState_ops: Number of InitCallBackState operations executed.

• InitCallBackState_ops_ok: Number of successful InitCallBackState operations.

• InitCallBackState_ops_sum: Sum of timings for InitCallBackState operations.

• InitCallBackState_ops_min: Minimum execution time observed for InitCallBackState operations.

• InitCallBackState_ops_max: Maximum execution time observed for InitCallBackState operations.

• InitCallBackState_ops_sqr: Sum of squares of timings for InitCallBackState operations.

• Probe_ops: Number of Probe operations executed.

• Probe_ops_ok: Number of successful Probe operations.

• Probe_ops_sum: Sum of timings for Probe operations.

• Probe_ops_min: Minimum execution time observed for Probe operations.

• Probe_ops_max: Maximum execution time observed for Probe operations.

• Probe_ops_sqr: Sum of squares of timings for Probe operations.

• GetLock_ops: Number of GetLock operations executed.

• GetLock_ops_ok: Number of successful GetLock operations.

• GetLock_ops_sum: Sum of timings for GetLock operations.

• GetLock_ops_min: Minimum execution time observed for GetLock operations.

• GetLock_ops_max: Maximum execution time observed for GetLock operations.

• GetLock_ops_sqr: Sum of squares of timings for GetLock operations.

• GetCE_ops: Number of GetCE operations executed.

• GetCE_ops_ok: Number of successful GetCE operations.

• GetCE_ops_sum: Sum of timings for GetCE operations.

• GetCE_ops_min: Minimum execution time observed for GetCE operations.

• GetCE_ops_max: Maximum execution time observed for GetCE operations.

• GetCE_ops_sqr: Sum of squares of timings for GetCE operations.

• XStatsVersion_CM_ops: Number of XStatsVersion operations executed.

• XStatsVersion_CM_ops_ok: Number of successful XStatsVersion operations.


• XStatsVersion_CM_ops_sum: Sum of timings for XStatsVersion operations.

• XStatsVersion_CM_ops_min: Minimum execution time observed for XStatsVersion operations.

• XStatsVersion_CM_ops_max: Maximum execution time observed for XStatsVersion operations.

• XStatsVersion_CM_ops_sqr: Sum of squares of timings for XStatsVersion operations.

• GetXStats_CM_ops: Number of GetXStats operations executed.

• GetXStats_CM_ops_ok: Number of successful GetXStats operations.

• GetXStats_CM_ops_sum: Sum of timings for GetXStats operations.

• GetXStats_CM_ops_min: Minimum execution time observed for GetXStats operations.

• GetXStats_CM_ops_max: Maximum execution time observed for GetXStats operations.

• GetXStats_CM_ops_sqr: Sum of squares of timings for GetXStats operations.

.8.4 Authentication and Replicated File Access Section (Auth_Access_section)

Authentication Information for Cache Manager Group (Auth_Stats_group)

• curr_PAGs: Current number of PAGs.

• curr_Records: Current number of records in table.

• curr_AuthRecords: Current number of of authenticated records (with valid ticket).

• curr_UnauthRecords: Current number of of unauthenticated records (without any ticket at all).

• curr_MaxRecordsInPAG: Maximum records for a single PAG.

• curr_LongestChain: Length of longest current hash chain.

• PAGCreations: Number of PAG creations.

• TicketUpdates: Number of ticket additions/refreshes.

• HWM_PAGS: High water mark - number of PAGs.

• HWM_Records: High water mark - number of records.

• HWM_MaxRecordsInPAG: High water mark - maximum records for a single PAG.

• HWM_LongestChain: High water mark - longest hash chain.

Unreplicated File Access Group (Access_Stats_group)

• unreplicatedRefs: Number of references to unreplicated data.

• replicatedRefs: Number of references to replicated data.

• numReplicasAccessed: Number of replicas accessed.

• maxReplicasPerRef: Maximum number of replicas accessed per reference.

• refFirstReplicaOK: Number of references satisfied by 1st replica.


.9 The File Server Statistics

File Server statistics are classified into the following sections and groups:

• PerfStats_section: Performance Statistics Section.

– VnodeCache_group: Vnode Cache Group.

– Directory_group: Directory Package Group.

– Rx_group: Rx Group.

– HostModule_group: Host Module Fields Group.

– misc_group: Miscellaneous Variables Group.

• RPCop_section: RPC Operations Section.

– RPCopTimes_group: Individual RPC Operation Timings.

– RPCopBytes_group: Byte Information for Certain RPC Operations.

• CallBackStats_section: CallBack Statistics Section.

– CallBackCounters_group: General CallBack Counters.

– GotSomeSpaces_group: GotSomeSpaces Counters.

All File Server variables categorized under the above sections and groups names are listed below.

.9.1 Performance Statistics Section (PerfStats_section)

Vnode Cache Group (VnodeCache_group)

• vcache_L_Entries: Number of entries in LARGE vnode cache.

• vcache_L_Allocs: Number of allocs (large).

• vcache_L_Gets: Number of gets (large).

• vcache_L_Reads: Number of reads (large).

• vcache_L_Writes: Number of writes (large).

• vcache_S_Entries: Number of entries in SMALL vnode cache.

• vcache_S_Allocs: Number of allocs (small).

• vcache_S_Gets: Number of gets (small).

• vcache_S_Reads: Number of reads (small).

• vcache_S_Writes: Number of writes (small).

• vcache_H_Entries: Number of entries in HEADER vnode cache.

• vcache_H_Gets: Number of gets (header)

• vcache_H_Replacements: Number of replacements (header)

Directory Package Group (Directory_group)

• dir_Buffers: Number of buffers in use.

• dir_Calls: Number of read calls made.


• dir_IOs: I/O operations performed.

Rx Group (Rx_group)

• rx_packetRequests: Packet allocation requests.

• rx_noPackets_RcvClass: Failed packet requests (receive class).

• rx_noPackets_SendClass: Failed packet requests (send class).

• rx_noPackets_SpecialClass: Failed packet requests (special class).

• rx_socketGreedy: Did SO_GREEDY succeed?

• rx_bogusPacketOnRead: Short packets received.

• rx_bogusHost: Host address from bogus packets.

• rx_noPacketOnRead: Read packets with no packet there.

• rx_noPacketBuffersOnRead: Packets dropped due to buffer shortage.

• rx_selects: Selects waiting on packet or timeout.

• rx_sendSelects: Selects forced upon sends.

• rx_packetsRead_RcvClass: Packets read (receive class).

• rx_packetsRead_SendClass: Packets read (send class).

• rx_packetsRead_SpecialClass: Packets read (special class).

• rx_dataPacketsRead: Unique data packets read off wire.

• rx_ackPacketsRead: ACK packets read.

• rx_dupPacketsRead: Duplicate data packets read.

• rx_spuriousPacketsRead: Inappropriate packets read.

• rx_packetsSent_RcvClass: Packets sent (receive class).

• rx_packetsSent_SendClass: Packets sent (send class).

• rx_packetsSent_SpecialClass: Packets sent (special class).

• rx_ackPacketsSent: ACK packets sent.

• rx_pingPacketsSent: Ping packets sent.

• rx_abortPacketsSent: Abort packets sent.

• rx_busyPacketsSent: Busy packets sent.

• rx_dataPacketsSent: Unique data packets sent.

• rx_dataPacketsReSent: Retransmissions sent.

• rx_dataPacketsPushed: Retransmissions pushed by NACK.

• rx_ignoreAckedPacket: Packets with ACKed flag on rxi_Start.

• rx_totalRtt_Sec and rx_totalRtt_Usec: Total round trip time (in seconds and milliseconds).

• rx_minRtt_Sec and rx_minRtt_Usec: Minimum round trip time (in seconds and milliseconds).

• rx_maxRtt_Sec and rx_maxRtt_Usec: Maximum round trip time (in seconds and milliseconds).


• rx_nRttSamples: Round trip samples.

• rx_nServerConns: Total server connections.

• rx_nClientConns: Total client connections.

• rx_nPeerStructs: Total peer structures.

• rx_nCallStructs: Total call structures.

• rx_nFreeCallStructs: Total free call structures.

Host Module Fields Group (HostModule_group)

• host_NumHostEntries: Number of host entries.

• host_HostBlocks: Blocks in use for hosts.

• host_NonDeletedHosts: Non-deleted hosts.

• host_HostsInSameNetOrSubnet: Hosts in same subnet as server.

• host_HostsInDiffSubnet: Hosts in different subnet than server.

• host_HostsInDiffNetwork: Hosts in different network than server.

• host_NumClients: Number of client entries.

• host_ClientBlocks: Blocks in use for clients.

Miscellaneous Variables Group (misc_group)

• numPerfCalls: Number of performance calls received.

.9.2 RPC Operations Section (RPCop_section)

Individual RPC Operation Timings Group (RPCopTimes_group)

• epoch: Time when data collection began.

• FetchData_ops: Number of FetchData operations executed.

• FetchData_ops_ok: Number of successful FetchData operations.

• FetchData_sum: Sum of timings for FetchData operations.

• FetchData_sqr: Sum of squares of sample timings for FetchData operations.

• FetchData_min: Minimum execution time observed for FetchData operations.

• FetchData_max: Maximum execution time observed for FetchData operations.

• FetchACL_ops: Number of FetchACL operations executed.

• FetchACL_ops_ok: Number of successful FetchACL operations.

• FetchACL_sum: Sum of timings for FetchACL operations.

• FetchACL_sqr: Sum of squares of sample timings for FetchACL operations.

• FetchACL_min: Minimum execution time observed for FetchACL operations.

• FetchACL_max: Maximum execution time observed for FetchACL operations.


• FetchStatus_ops: Number of FetchStatus operations executed.

• FetchStatus_ops_ok: Number of successful FetchStatus operations.

• FetchStatus_sum: Sum of timings for FetchStatus operations.

• FetchStatus_sqr: Sum of squares of sample timings for FetchStatus operations.

• FetchStatus_min: Minimum execution time observed for FetchStatus operations.

• FetchStatus_max: Maximum execution time observed for FetchStatus operations.

• StoreData_ops: Number of StoreData operations executed.

• StoreData_ops_ok: Number of successful StoreData operations.

• StoreData_sum: Sum of timings for StoreData operations.

• StoreData_sqr: Sum of squares of sample timings for StoreData operations.

• StoreData_min: Minimum execution time observed for StoreData operations.

• StoreData_max: Maximum execution time observed for StoreData operations.

• StoreACL_ops: Number of StoreACL operations executed.

• StoreACL_ops_ok: Number of successful StoreACL operations.

• StoreACL_sum: Sum of timings for StoreACL operations.

• StoreACL_sqr: Sum of squares of sample timings for StoreACL operations.

• StoreACL_min: Minimum execution time observed for StoreACL operations.

• StoreACL_max: Maximum execution time observed for StoreACL operations.

• StoreStatus_ops: Number of StoreStatus operations executed.

• StoreStatus_ops_ok: Number of successful StoreStatus operations.

• StoreStatus_sum: Sum of timings for StoreStatus operations.

• StoreStatus_sqr: Sum of squares of sample timings for StoreStatus operations.

• StoreStatus_min: Minimum execution time observed for StoreStatus operations.

• StoreStatus_max: Maximum execution time observed for StoreStatus operations.

• RemoveFile_ops: Number of RemoveFile operations executed.

• RemoveFile_ops_ok: Number of successful RemoveFile operations.

• RemoveFile_sum: Sum of timings for RemoveFile operations.

• RemoveFile_sqr: Sum of squares of sample timings for RemoveFile operations.

• RemoveFile_min: Minimum execution time observed for RemoveFile operations.

• RemoveFile_max: Maximum execution time observed for RemoveFile operations.

• CreateFile_ops: Number of CreateFile operations executed.

• CreateFile_ops_ok: Number of successful CreateFile operations.

• CreateFile_sum: Sum of timings for CreateFile operations.

• CreateFile_sqr: Sum of squares of sample timings for CreateFile operations.

• CreateFile_min: Minimum execution time observed for CreateFile operations.


• CreateFile_max: Maximum execution time observed for CreateFile operations.

• Rename_ops: Number of Rename operations executed.

• Rename_ops_ok: Number of successful Rename operations.

• Rename_sum: Sum of timings for Rename operations.

• Rename_sqr: Sum of squares of sample timings for Rename operations.

• Rename_min: Minimum execution time observed for Rename operations.

• Rename_max: Maximum execution time observed for Rename operations.

• Symlink_ops: Number of Symlink operations executed.

• Symlink_ops_ok: Number of successful Symlink operations.

• Symlink_sum: Sum of timings for Symlink operations.

• Symlink_sqr: Sum of squares of sample timings for Symlink operations.

• Symlink_min: Minimum execution time observed for Symlink operations.

• Symlink_max: Maximum execution time observed for Symlink operations.

• Link_ops: Number of Link operations executed.

• Link_ops_ok: Number of successful Link operations.

• Link_sum: Sum of timings for Link operations.

• Link_sqr: Sum of squares of sample timings for Link operations.

• Link_min: Minimum execution time observed for Link operations.

• Link_max: Maximum execution time observed for Link operations.

• MakeDir_ops: Number of MakeDir operations executed.

• MakeDir_ops_ok: Number of successful MakeDir operations.

• MakeDir_sum: Sum of timings for MakeDir operations.

• MakeDir_sqr: Sum of squares of sample timings for MakeDir operations.

• MakeDir_min: Minimum execution time observed for MakeDir operations.

• MakeDir_max: Maximum execution time observed for MakeDir operations.

• RemoveDir_ops: Number of RemoveDir operations executed.

• RemoveDir_ops_ok: Number of successful RemoveDir operations.

• RemoveDir_sum: Sum of timings for RemoveDir operations.

• RemoveDir_sqr: Sum of squares of sample timings for RemoveDir operations.

• RemoveDir_min: Minimum execution time observed for RemoveDir operations.

• RemoveDir_max: Maximum execution time observed for RemoveDir operations.

• SetLock_ops: Number of SetLock operations executed.

• SetLock_ops_ok: Number of successful SetLock operations.

• SetLock_sum: Sum of timings for SetLock operations.

• SetLock_sqr: Sum of squares of sample timings for SetLock operations.


• SetLock_min: Minimum execution time observed for SetLock operations.

• SetLock_max: Maximum execution time observed for SetLock operations.

• ExtendLock_ops: Number of ExtendLock operations executed.

• ExtendLock_ops_ok: Number of successful ExtendLock operations.

• ExtendLock_sum: Sum of timings for ExtendLock operations.

• ExtendLock_sqr: Sum of squares of sample timings for ExtendLock operations.

• ExtendLock_min: Minimum execution time observed for ExtendLock operations.

• ExtendLock_max: Maximum execution time observed for ExtendLock operations.

• ReleaseLock_ops: Number of ReleaseLock operations executed.

• ReleaseLock_ops_ok: Number of successful ReleaseLock operations.

• ReleaseLock_sum: Sum of timings for ReleaseLock operations.

• ReleaseLock_sqr: Sum of squares of sample timings for ReleaseLock operations.

• ReleaseLock_min: Minimum execution time observed for ReleaseLock operations.

• ReleaseLock_max: Maximum execution time observed for ReleaseLock operations.

• GetStatistics_ops: Number of GetStatistics operations executed.

• GetStatistics_ops_ok: Number of successful GetStatistics operations.

• GetStatistics_sum: Sum of timings for GetStatistics operations.

• GetStatistics_sqr: Sum of squares of sample timings for GetStatistics operations.

• GetStatistics_min: Minimum execution time observed for GetStatistics operations.

• GetStatistics_max: Maximum execution time observed for GetStatistics operations.

• GiveUpCallbacks_ops: Number of GiveUpCallbacks operations executed.

• GiveUpCallbacks_ops_ok: Number of successful GiveUpCallbacks operations.

• GiveUpCallbacks_sum: Sum of timings for GiveUpCallbacks operations.

• GiveUpCallbacks_sqr: Sum of squares of sample timings for GiveUpCallbacks operations.

• GiveUpCallbacks_min: Minimum execution time observed for GiveUpCallbacks operations.

• GiveUpCallbacks_max: Maximum execution time observed for GiveUpCallbacks operations.

• GetVolumeInfo_ops: Number of GetVolumeInfo operations executed.

• GetVolumeInfo_ops_ok: Number of successful GetVolumeInfo operations.

• GetVolumeInfo_sum: Sum of timings for GetVolumeInfo operations.

• GetVolumeInfo_sqr: Sum of squares of sample timings for GetVolumeInfo operations.

• GetVolumeInfo_min: Minimum execution time observed for GetVolumeInfo operations.

• GetVolumeInfo_max: Maximum execution time observed for GetVolumeInfo operations.

• GetVolumeStatus_ops: Number of GetVolumeStatus operations executed.

• GetVolumeStatus_ops_ok: Number of successful GetVolumeStatus operations.

• GetVolumeStatus_sum: Sum of timings for GetVolumeStatus operations.


• GetVolumeStatus_sqr: Sum of squares of sample timings for GetVolumeStatus operations.

• GetVolumeStatus_min: Minimum execution time observed for GetVolumeStatus operations.

• GetVolumeStatus_max: Maximum execution time observed for GetVolumeStatus operations.

• SetVolumeStatus_ops: Number of SetVolumeStatus operations executed.

• SetVolumeStatus_ops_ok: Number of successful SetVolumeStatus operations.

• SetVolumeStatus_sum: Sum of timings for SetVolumeStatus operations.

• SetVolumeStatus_sqr: Sum of squares of sample timings for SetVolumeStatus operations.

• SetVolumeStatus_min: Minimum execution time observed for SetVolumeStatus operations.

• SetVolumeStatus_max: Maximum execution time observed for SetVolumeStatus operations.

• GetRootVolume_ops: Number of GetRootVolume operations executed.

• GetRootVolume_ops_ok: Number of successful GetRootVolume operations.

• GetRootVolume_sum: Sum of timings for GetRootVolume operations.

• GetRootVolume_sqr: Sum of squares of sample timings for GetRootVolume operations.

• GetRootVolume_min: Minimum execution time observed for GetRootVolume operations.

• GetRootVolume_max: Maximum execution time observed for GetRootVolume operations.

• CheckToken_ops: Number of CheckToken operations executed.

• CheckToken_ops_ok: Number of successful CheckToken operations.

• CheckToken_sum: Sum of timings for CheckToken operations.

• CheckToken_sqr: Sum of squares of sample timings for CheckToken operations.

• CheckToken_min: Minimum execution time observed for CheckToken operations.

• CheckToken_max: Maximum execution time observed for CheckToken operations.

• GetTime_ops: Number of GetTime operations executed.

• GetTime_ops_ok: Number of successful GetTime operations.

• GetTime_sum: Sum of timings for GetTime operations.

• GetTime_sqr: Sum of squares of sample timings for GetTime operations.

• GetTime_min: Minimum execution time observed for GetTime operations.

• GetTime_max: Maximum execution time observed for GetTime operations.

• NGetVolumeInfo_ops: Number of NGetVolumeInfo operations executed.

• NGetVolumeInfo_ops_ok: Number of successful NGetVolumeInfo operations.

• NGetVolumeInfo_sum: Sum of timings for NGetVolumeInfo operations.

• NGetVolumeInfo_sqr: Sum of squares of sample timings for NGetVolumeInfo operations.

• NGetVolumeInfo_min: Minimum execution time observed for NGetVolumeInfo operations.

• NGetVolumeInfo_max: Maximum execution time observed for NGetVolumeInfo operations.

• BulkStatus_ops: Number of BulkStatus operations executed.

• BulkStatus_ops_ok: Number of successful BulkStatus operations.


• BulkStatus_sum: Sum of timings for BulkStatus operations.

• BulkStatus_sqr: Sum of squares of sample timings for BulkStatus operations.

• BulkStatus_min: Minimum execution time observed for BulkStatus operations.

• BulkStatus_max: Maximum execution time observed for BulkStatus operations.

• XStatsVersion_ops: Number of XStatsVersion operations executed.

• XStatsVersion_ops_ok: Number of successful XStatsVersion operations.

• XStatsVersion_sum: Sum of timings for XStatsVersion operations.

• XStatsVersion_sqr: Sum of squares of sample timings for XStatsVersion operations.

• XStatsVersion_min: Minimum execution time observed for XStatsVersion operations.

• XStatsVersion_max: Maximum execution time observed for XStatsVersion operations.

• GetXStats_ops: Number of GetXStats operations executed.

• GetXStats_ops_ok: Number of successful GetXStats operations.

• GetXStats_sum: Sum of timings for GetXStats operations.

• GetXStats_sqr: Sum of squares of sample timings for GetXStats operations.

• GetXStats_min: Minimum execution time observed for GetXStats operations.

• GetXStats_max: Maximum execution time observed for GetXStats operations.

Byte Information for Certain RPC Operations Group (RPCopBytes_group)

• FetchData_xfers: Number of FetchData operations.

• FetchData_xfers_ok: Number of successful FetchData operations.

• FetchData_xfers_sum: Sum of timing values for FetchData operations.

• FetchData_xfers_sqr: Sum of squares of sample timings for FetchData operations.

• FetchData_xfers_min: Minimum transfer time observed for FetchData operations.

• FetchData_xfers_max: Maximum transfer time observed for FetchData operations.

• FetchData_xfers_bytes_sum: Sum of bytes transferred for FetchData operations.

• FetchData_xfers_bytes_min: Minimum byte transfer observed for FetchData operations.

• FetchData_xfers_bytes_max: Maximum byte transfer observed for FetchData operations.











• StoreData_xfers: Number of StoreData operations.

• StoreData_xfers_ok: Number of successful StoreData operations.

• StoreData_xfers_sum: Sum of timing values for StoreData operations.

• StoreData_xfers_sqr: Sum of squares of sample timings for StoreData operations.

• StoreData_xfers_min: Minimum transfer time observed for StoreData operations.

• StoreData_xfers_max: Maximum transfer time observed for StoreData operations.

• StoreData_xfers_bytes_sum: Sum of bytes transferred for StoreData operations.

• StoreData_xfers_bytes_min: Minimum byte transfer observed for StoreData operations.

• StoreData_xfers_bytes_max: Maximum byte transfer observed for StoreData operations.










.9.3 CallBack Statistics Section (CallBackStats_section)

General CallBack Counters Group (CallBackCounters_group)

• DeleteFiles: Number of times a file was deleted, causing its callbacks to be deleted.

• DeleteCallBacks: Number of times a callback was deleted due to a host giving up the callback.

• BreakCallBacks: Number of times a callback was broken.

• AddCallBack: Number of times a callback was added.

• GotSomeSpaces: Number of times we tried to reclaim some callback space due to running out of callbacks.

• DeleteAllCallBacks: Number of times all of the callbacks for a host were deleted.

• nFEs: Number of files with callbacks currently tracked by the fileserver.

• nCBs: Number of callbacks currently tracked by the fileserver.

• nblks: Maximum number of callbacks or files with callbacks the fileserver will track.

• CBsTimedOut: Number of callbacks that were deleted because they timed out.

• nbreakers: Number of threads busy breaking callbacks.

GotSomeSpaces Counters Group (GotSomeSpaces_group)


• GSS1: Number of times we tried to reclaim callback space and failed to find a suitable host for which to clear callbacks.

• GSS2: Number of times we repeatedly tried to reclaim callback space and never found any host for which to clear callbacks,and cleared callbacks on the host for which we were adding a callback.

• GSS3: Number of times we tried to reclaim callback space and we reclaimed space by clearing timed-out callbacks.

• GSS4: Number of times we tried to reclaim callback space and we found a suitable host to try to clear callbacks on.

• GSS5: Number of times we tried to reclaim callback space and we cleared callbacks on a host that was waiting to have acallback added to it.


Part VIII

AIX Audit Events


This Appendix provides a complete listing of the AFS events that can be audited on AIX file server machines. See ChapterMonitoring and Auditing AFS Performance for instructions on auditing AFS events on AIX file server machines.

.10 Introduction

Below is a list of the AFS events contained in the file /afs/usr/local/audit/events.sample. Each entry contains information onthe event class, the name of the event, the parameters associated with the event, and a description of the event.

Most events have an associated error code that shows the outcome of the event (since each event is recorded after it occurs),an AFSName (the authentication identify of the requesting process), and a host ID (from which the request originated). Manyevents follow the RPC server entry calls defined in the AFS Programmer’s Reference Manual.

Events are classed by functionality (this is AIX specific). Some events possibly fall into one of more of the following classeswhich are defined by the file /usr/afs/local/config.sample:

• A (afsauthent): Authentication and Identification Events

• S (afssecurity): Security Events

• P (afsprivilege): Privilege Required Events

• O (afsobjects): Object Creation and Deletion Events

• M (afsattributes): Attribute modification

• C (afsprocess): Process Control Events

.11 Audit-Specific Events

Event Class Parameters Description

AFS_Audit_WR None <string> The file "/usr/afs/Audit" has been writtento (AIX specific event).

AFS_Aud_On S ECode Auditing is on for this server process(recorded on startup of a server).

AFS_Aud_Off S ECode Auditing is off for this server process(recorded on startup of a server).

AFS_Aud_Unauth S ECode Event Event triggered by an unauthorized user.

NoteThe following audit-specific events indicate an error has occurred while recording the event. Most events have an AFSNameassociated with them and a host ID. If this information cannot be gathered out of the Rx structure, one of these events is raised.


AFS_Aud_NoCall S ECode EventNo rx call structure with this event.Cannot get security, AFS ID, or origin ofcall.

AFS_Aud_NoConn S ECode EventNo connection info associated with rxcall. Cannot get security, AFS ID, ororigin of call.

AFS_Aud_UnknSec S ECode Event Security of call is unknown (must beauthorized or unauthorized caller).

AFS_Aud_NoAFSId S ECode Event No AFS ID/name associated with a secureevent.



AFS_Aud_NoHost S ECode Event No information about origin (machine) ofcaller.

AFS_Aud_EINVAL None Event Error in audit event parameter (can’trecord the event parameter).

.12 Volume Server Events

Event Class Parameters DescriptionAFS_VS_Start P C ECode The volume server has started.

AFS_VS_Finish C ECodeThe volume server has finished. Finishevents are rare since the server process isnormally aborted.

AFS_VS_Exit C ECodeThe volume server has exited. Exit eventsare rare since the server process isnormally aborted.

AFS_VS_TransCr None ECode AFSName HostIDTrans VolID

AFSVolTransCreate - Create transactionfor a [volume, partition]

AFS_VS_EndTrn None ECode AFSName HostIDTrans AFSVolEndTrans - End a transaction.

AFS_VS_CrVol P OECode AFSName HostIDTrans VolID VolName TypeParentID

AFSVolCreateVolume - Create a volume(volumeId volumeName)

AFS_VS_DelVol P O ECode AFSName HostIDTrans AFSVolDeleteVolume - Delete a volume.

AFS_VS_NukVol P O ECode AFSName HostIDVolID

AFSVolNukeVolume - Obliterate avolume completely (volume ID).

AFS_VS_Dump None ECode AFSName HostIDTrans

AFSVolDump - Dump the contents of avolume.

AFS_VS_SigRst P M ECode AFSName HostIDVolName

AFSVolSignalRestore - Show intention tocall AFSVolRestore.

AFS_VS_Restore P O ECode AFSName HostIDTrans

AFSVolRestore - Recreate a volume froma dump.

AFS_VS_Forward P O ECode AFSName HostIDFromTrans Host DestTrans

AFSVolForward - Dump a volume, thenrestore to a given server and volume.

AFS_VS_Clone P OECode AFSName HostIDTrans Purge NewNameNewType NewVolID

AFSVolClone - Clone (and optionallypurge) a volume.

AFS_VS_ReClone P O ECode AFSName HostIDTrans CloneVolID AFSVolReClone - Reclone a volume.

AFS_VS_SetForw P M ECode AFSName HostIDTrans NewHost

AFSVolSetForwarding - Set forwardinginformation for a moved volume.

AFS_VS_GetFlgs None ECode AFSName HostIDTrans

AFSVolGetFlags - Get volume flags for atransaction.

AFS_VS_SetFlgs P M ECode AFSName HostIDTrans Flags

AFSVolSetFlags - Set volume flags for atransaction.

AFS_VS_GetName None ECode AFSName HostIDTrans

AFSVolGetName - Get the volume nameassociated with a transaction.

AFS_VS_GetStat None ECode AFSName HostIDTrans

AFSVolGetStatus - Get status of atransaction/volume.

AFS_VS_SetIdTy P M

ECode AFSName HostIDTrans VolName TypeParentId CloneIDBackupID

AFSVolSetIdsTypes - Set headerinformation for a volume.



AFS_VS_SetDate P M ECode AFSName HostIDTrans Date

AFSVolSetDate - Set creation date in avolume.

AFS_VS_ListPar None ECode AFSName HostID AFSVolListPartitions - Return a list ofAFS partitions on a server.

AFS_VS_ParInf None ECode AFSName HostIDPartName

AFSVolPartitionInfo - Get partitioninformation.

AFS_VS_ListVol None ECode AFSName HostID AFSVolListVolumes - Return a list ofvolumes on a server.

AFS_VS_XLstVol None ECode AFSName HostID AFSVolXListVolumes - Return a(detailed) list of volumes on a server.

AFS_VS_Lst1Vol None ECode AFSName HostIDVolID

AFSVolListOneVolume - Return headerinformation for a single volume.

AFS_VS_XLst1Vl None ECode AFSName HostIDVolID

AFSVolXListOneVolume - Return(detailed) header information for a singlevolume.

AFS_VS_GetNVol None ECode AFSName HostIDVolID

AFSVolGetNthVolume - Get volumeheader given its index.

AFS_VS_Monitor None ECode AFSName HostID AFSVolMonitor - Collect servertransaction state.

AFS_VS_SetInfo P O M ECode AFSName HostIDTrans AFSVolSetInfo - Set volume status.

.13 Backup Server Events

Event Class Parameters DescriptionAFS_BUDB_Start P ECode The backup server has started.

AFS_BUDB_Finish None ECodeThe backup server has finished. Finishevents are rare since the server process isnormally aborted.

AFS_BUDB_Exit None ECodeThe backup server has exited. Exit eventsare rare since the server process isnormally aborted.

AFS_BUDB_CrDmp P O ECode AFSName HostIDdumpId

BUDB_CreateDump - Create a newdump.

AFS_BUDB_AppDmp P ECode AFSName HostIDdumpId

BUDB_makeDumpAppended - Make thedump an appended dump.

AFS_BUDB_DelDmp P O ECode AFSName HostIDdumpId BUDB_DeleteDump - Delete a dump.

AFS_BUDB_FinDmp P ECode AFSName HostIDdumpId

BUDB_FinishDump- Notify buserver thatdump is finished.

AFS_BUDB_UseTpe P M ECode AFSName HostIDdumpId

BUDB_UseTape - Create/add a tape entryto a dump.

AFS_BUDB_DelTpe P M ECode AFSName HostIDdumpId

BUDB_DeleteTape - Remove a tape fromthe database.

AFS_BUDB_FinTpe P ECode AFSName HostIDdumpId

BUDB_FinishTape - Writing to a tape iscompleted.

AFS_BUDB_AddVol P M ECode AFSName HostIDvolId

BUDB_AddVolume - Add a volume to aparticular dump and tape.

AFS_BUDB_GetTxV None ECode AFSName HostIDType

BUDB_GetTextVersion - Get the versionnumber forhosts/volume-sets/dump-hierarchy.

AFS_BUDB_GetTxt P ECode AFSName HostIDType

BUDB_GetText - Get the informationabout hosts/volume-sets/dump-hierarchy.



AFS_BUDB_SavTxt M ECode AFSName HostIDType

BUDB_SaveText - Overwrite theinformation abouthosts/volume-sets/dump-hierarchy.

AFS_BUDB_GetLck None ECode AFSName HostID BUDB_GetLock - Take a lock forreading/writing text information.

AFS_BUDB_FrALck None ECode AFSName HostID BUDB_FreeLock - Free a lock.AFS_BUDB_FreLck None ECode AFSName HostID BUDB_FreeAllLocks - Free all locks.

AFS_BUDB_GetIId None ECode AFSName HostID BUDB_GetInstanceId - Get lock instanceid.

AFS_BUDB_DmpDB None ECode AFSName HostID BUDB_DumpDB - Start dumping thedatabase.

AFS_BUDB_RstDBH None ECode AFSName HostID BUDB_RestoreDbHeader - Restore thedatabase header.

AFS_BUDB_DBVfy None ECode AFSName HostID BUDB_DbVerify - Verify the database.

AFS_BUDB_FndDmp P ECode AFSName HostIDvolName

BUDB_FindDump - Find the dump avolume belongs to.

AFS_BUDB_GetDmp P ECode AFSName HostID BUDB_GetDumps - Get a list of dumps inthe database.

AFS_BUDB_FnLTpe P ECode AFSName HostIDdumpId

BUDB_FindLastTape - Find last tape, andlast volume on tape of a dump.

AFS_BUDB_GetTpe P ECode AFSName HostID BUDB_GetTapes - Find a list of tapesbased on name or dump ID.

AFS_BUDB_GetVol P ECode AFSName HostID BUDB_GetVolumes - Find a list ofvolumes based on dump or tape name.

AFS_BUDB_DelVDP P M ECode AFSName HostIDdumpSetName

BUDB_DeleteVDP - Delete dumps withgiven name and dump path.

AFS_BUDB_FndCln P M ECode AFSName HostIDvolName

BUDB_FindClone - Find clone time ofvolume.

AFS_BUDB_FndLaD P ECode AFSName HostIDvolName

BUDB_FindLatestDump - Find the latestdump a volume belongs to.

AFS_BUDB_TGetVr None ECode AFSName HostID BUDB_T_GetVersion - Test Get version.

AFS_BUDB_TDmpHa P ECode AFSName HostIDfile

BUDB_T_DumpHashTable - Test dumpof hash table.

AFS_BUDB_TDmpDB P ECode AFSName HostIDfile

BUDB_T_DumpDatabase - Test dump ofdatabase.

.14 Protection Server Events

Event Class Parameters DescriptionAFS_PTS_Start P ECode The protection server has started.

AFS_PTS_Finish C ECodeThe protection server has finished. Finishevents are rare since the server process isnormally aborted.

AFS_PTS_Exit C ECodeThe protection server has exited. Exitevents are rare since the server process isnormally aborted.

AFS_PTS_NmToId None ECode AFSName HostID PR_NameToID - Perform one or morename-to-ID translations.

AFS_PTS_IdToNm None ECode AFSName HostIDGroupId

PR_IDToName - Perform one or moreID-to-name translations.

AFS_PTS_NewEnt None ECode AFSName HostIDGroupId Name OwnerId

PR_NewEntry - Create a PDB (ProtectionDataBase) entry for the given name.

AFS_PTS_INewEnt None ECode AFSName HostIDGroupId Name OwnerId

PR_INewEntry - Create a PDB entry forthe given name and ID.



AFS_PTS_LstEnt None ECode AFSName HostIDGroupId

PR_ListEntry - Get the contents of a PDBentry based on its ID.

AFS_PTS_DmpEnt None ECode AFSName HostIDPosition

PR_DumpEntry - Get the contents of aPDB entry based on its offset.

AFS_PTS_ChgEnt NoneECode AFSName HostIDGroupId NewNameNewOwnerId NewId

PR_ChangeEntry - Change an existingPDB entry’s ID, name, owner, or acombination.

AFS_PTS_SetFEnt None ECode AFSName HostIDGroupId

PR_SetFieldsEntry - Changemiscellaneous fields in an existing PDBentry.

AFS_PTS_Del None ECode AFSName HostIDGroupId PR_Delete - Delete an existing PDB entry.

FS_PTS_WheIsIt None ECode AFSName HostIDGroupId Position

PR_WhereIsIt - Get the PDB byte offsetof the entry for a given ID.

AFS_PTS_AdToGrp None ECode AFSName HostIDGroupId UserId PR_AddToGroup - Add a user to a group.

AFS_PTS_RmFmGrp None ECode AFSName HostIDGroupId UserId

PR_RemoveFromGroup - Remove a userfrom a chosen group.

AFS_PTS_LstMax None ECode AFSName HostID PR_ListMax - Get the largest allocateduser and group ID.

AFS_PTS_SetMax None ECode AFSName HostIDGroupId flag

PR_SetMax - Set the largest allocateduser and group ID.

AFS_PTS_LstEle None ECode AFSName HostIDGroupId

PR_ListElements - List all IDs associatedwith a user or group.

AFS_PTS_GetCPS None ECode AFSName HostIDGroupId

PR_GetCPS - Get the CPS (CurrentProtection Subdomain) for the given ID.

AFS_PTS_GetCPS2 None ECode AFSName HostIDGroupId Host

PR_GetCPS2 - Get the CPS for the givenid and host.

AFS_PTS_GetHCPS None ECode AFSName HostIDHost

PR_GetHostCPS - Get the CPS for thegiven host.

AFS_PTS_LstOwn None ECode AFSName HostIDGroupId

PR_ListOwned - Get all IDs owned by thegiven ID.

AFS_PTS_IsMemOf None ECode AFSName HostIDUserId GroupId

PR_IsAMemberOf - Is a given user ID amember of a specified group?

.15 Authentication Events


AFS_KAA_ChPswd S ECode AFSName HostIDname instance

KAA_ChangePassword - Changepassword.

AFS_KAA_Auth A S ECode AFSName HostIDname instance

KAA_Authenticate - Authenticate to thecell.

AFS_KAA_AuthO S ECode AFSName HostIDname instance

KAA_Authenticate_old - Old styleauthentication.

AFS_KAT_GetTkt A S ECode AFSName HostIDname instance

KAT_GetTicket - An attempt was made toget an AFS ticket for some principal listedin the Authentication Database.

AFS_KAT_GetTktO S ECode AFSName HostIDname instance

KAT_GetTicket_old - An attempt wasmade to get an AFS ticket for someprincipal listed in the AuthenticationDatabase.

AFS_KAM_CrUser S P ECode AFSName HostIDname instance KAM_CreateUser - Create a user.



AFS_KAM_DelUser S P ECode AFSName HostIDname instance KAM_DeleteUser - Delete a user.

AFS_KAM_SetPswd S ECode AFSName HostIDname instance

KAM_SetPassword - Set the password fora user.

AFS_KAM_GetPswd S ECode AFSName HostIDname

KAM_GetPassword - Get the password ofa user.

AFS_KAM_GetEnt S ECode AFSName HostIDname instance

KAM_GetEntry - The RPC made by thekas examine command to get one entryfrom the Authentication Database (byindex entry).

AFS_KAM_LstEnt S ECode AFSName HostIDindex

KAM_ListEntry - The RPC made to listone or more entries in the AuthenticationDatabase.

AFS_KAM_Dbg S ECode AFSName HostIDKAM_Debug - The RPC that produces adebugging trace for the AuthenticationServer.

AFS_KAM_SetFld S PECode AFSName HostIDname instance flags datelifetime maxAssoc

KAM_SetFields - The RPC used by thekas setfields command to manipulate theAuthentication Database.

AFS_KAM_GetStat S ECode AFSName HostID KAM_GetStatus - An RPC used to getstatistics on the Authentication Server.

AFS_KAM_GRnKey S ECode AFSName HostID KAM_GetRandomKey - An RPC used togenerate a random encryption key.

AFS_UnlockUser S ECode AFSName HostIDname instance

KAM_Unlock - The RPC used to initiatethe kas unlock command.

AFS_LockStatus None ECode AFSName HostIDname instance

KAM_LockStatus - The RPC used todetermine whether a user’s AuthenticationDatabase entry is locked.

AFS_UseOfPriv P ECode AFSName HostIDname instance cell

An authorized command was issued andallowed because the user had privilege.

AFS_UnAth S ECode AFSName HostIDname instance cell

An authorized command was issued andallowed because the system was runningin noauth mode.

AFS_UDPAuth A S ECode name instance An authentication attempt was made witha Kerberos client.

AFS_UDPGetTckt A S ECode name instance cellname instance

An attempt was made to get a Kerberosticket.

AFS_RunNoAuth S ECode Check was made and some random serveris running noauth.

AFS_NoAuthDsbl S P ECode Server is set to run in authenticated mode.

AFS_NoAuthEnbl S P ECode Server is set to run in unauthenticatedmode.

.16 File Server and Cache Manager Interface Events


AFS_SRX_FchACL None ECode AFSName HostID(FID)

RXAFS_FetchACL - Fetch the ACLassociated with the given AFS fileidentifier.

AFS_SRX_FchStat None ECode AFSName HostID(FID)

RXAFS_FetchStatus - Fetch the statusinformation for a file system object.

AFS_SRX_StACL M ECode AFSName HostID(FID)

RXAFS_StoreACL - Associate an ACLwith the names directory.



AFS_SRX_StStat M ECode AFSName HostID(FID)

RXAFS_StoreStatus - Store statusinformation for the specified file.

AFS_SRX_RmFile O ECode AFSName HostID(FID) name

RXAFS_RemoveFile - Delete the givenfile.

AFS_SRX_CrFile O ECode AFSName HostID(FID) name RXAFS_CreateFile - Create the given file.

AFS_SRX_RNmFile O MECode AFSName HostID(oldFID) oldName(newFID) newName

RXAFS_Rename - Rename the specifiedfile in the given directory.

AFS_SRX_SymLink O ECode AFSName HostID(FID) name

RXAFS_Symlink - Create a symboliclink.

AFS_SRX_Link O ECode AFSName HostID(FID) name (FID) RXAFS_Link - Create a hard link.

AFS_SRX_MakeDir O ECode AFSName HostID(FID) name RXAFS_MakeDir - Create a directory.

AFS_SRX_RmDir O ECode AFSName HostID(FID) name

RXAFS_RemoveDir - Remove adirectory.

AFS_SRX_SetLock None ECode AFSName HostID(FID) type

RXAFS_SetLock - Set an advisory lockon the given file identifier.

AFS_SRX_ExtLock None ECode AFSName HostID(FID)

RXAFS_ExtendLock - Extend anadvisory lock on a file.

AFS_SRX_RelLock None ECode AFSName HostID(FID)

RXAFS_ReleaseLock - Release theadvisory lock on a file.

AFS_SRX_FchData None ECode AFSName HostID(FID)

StartRXAFS_FetchData - Begin a requestto fetch file data.

AFS_SRX_StData O ECode AFSName HostID(FID)

StartRXAFS_StoreData - Begin a requestto store file data.

AFS_SRX_BFchSta None ECode AFSName HostID(FID)

RXAFS_BulkStatus - Fetch statusinformation regarding a set of file systemobjects.

AFS_SRX_SetVolS M ECode AFSName HostIDvolId volName

RXAFS_SetVolumeStatus - Set the basicstatus information for the named volume.

AFS_Priv P ECode viceId callRoutine Checking Permission Rights of user - userhas permissions.

AFS_PrivSet P ECode viceId callRoutine Set the privileges of a user.

.17 BOS Server Events


AFS_BOS_CreBnod P C ECode AFSName HostID BOZO_CreateBnode - Create a processinstance.

AFS_BOS_DelBnod P C ECode AFSName HostIDinstance

BOZO_DeleteBnode - Delete a processinstance.

AFS_BOS_SetReSt P M C ECode AFSName HostID BOZO_Restart - Restart a given processinstance.

AFS_BOS_GetLog P ECode AFSName HostID StartBOZO_GetLog - Pass the IN paramswhen fetching a BOS Server log file.

AFS_BOS_SetStat P M C ECode AFSName HostIDinstance

BOZO_SetStatus - Set process instancestatus and goal.

AFS_BOS_SetTSta P M C ECode AFSName HostIDinstance

BOZO_SetTStatus - Temporarily setprocess instance status and goal.

AFS_BOS_StartAl P C ECode AFSName HostID BOZO_StartupAll - Start all existingprocess instances.



AFS_BOS_ShtdAll P C ECode AFSName HostID BOZO_ShutdownAll - Shut down allprocess instances.

AFS_BOS_ReStAll P C ECode AFSName HostID BOZO_RestartAll - Shut down, thenrestart all process instances.

AFS_BOS_ReBos P C ECode AFSName HostIDBOZO_ReBozo - Shut down, then restartall process instances and the BOS Serveritself.

AFS_BOS_ReBosIn P C ECodeBOZO_ReBozo - Same asAFS_BOS_ReBos but done internally(server restarts).

AFS_BOS_ReStart P C ECode AFSName HostIDinstance

BOZO_Restart - Restart a given processinstance.

AFS_BOS_WaitAll P C ECode AFSName HostID BOZO_WaitAll - Wait until all processinstances have reached their goals.

AFS_BOS_AddSUsr S P ECode AFSName HostID BOZO_AddSUser - Add a user to theUserList.

AFS_BOS_DelSUsr S P ECode AFSName HostID BOZO_DeleteSUser - Delete a user fromthe UserList.

AFS_BOS_LstSUsr None ECode AFSName HostIDBOZO_ListSUsers - Get the name of theuser in the given position in the UserListfile.

AFS_BOS_LstKey P ECode AFSName HostID BOZO_ListKeys - List information aboutthe key at a given index in the key file.

AFS_BOS_LstKeyU P ECode AFSName HostID BOZO_ListKeys - Same asAFS_BOS_LstKey, but unauthorized.

AFS_BOS_AddKey S P ECode AFSName HostID BOZO_AddKey - Add a key to the keyfile.

AFS_BOS_DelKey S P ECode AFSName HostID BOZO_DeleteKey - Delete the entry foran AFS key.

AFS_BOS_SetNoAu S P ECode AFSName HostIDflag

BOZO_SetNoAuthFlag - Enable ordisable authenticated call requirements.

AFS_BOS_SetCell S P ECode AFSName HostIDname

BOZO_SetCellName - Set the name ofthe cell to which the BOS Server belongs.

AFS_BOS_AddHst S P ECode AFSName HostIDname

BOZO_AddCellHost - Add an entry to thelist of database server hosts.

AFS_BOS_DelHst S P ECode AFSName HostIDname

BOZO_DeleteCellHost - Delete an entryfrom the list of database server hosts.

AFS_BOS_Inst P O M ECode AFSName HostIDname

StartBOZO_Install - Pass the INparameters when installing a serverbinary.EndBOZO_Install - Get the OUTparameters when installing a serverbinary.

AFS_BOS_UnInst P O M ECode AFSName HostIDname

BOZO_UnInstall - Roll back from aserver binary installation.

AFS_BOS_PrnLog P O ECode AFSName HostID BOZO_Prune - Throw away old versionsof server binaries and core file.

AFS_BOS_Exec P C ECode AFSName HostIDcmd

BOZO_Exec - Execute a shell commandat the server.

AFS_BOS_DoExec P C ECode exec The bosserver process was restarted.

AFS_BOS_StpProc P C ECode cmd An RPC to stop any process controlled bythe BOS Server.

.18 Volume Location Server Events



AFS_VL_CreEnt P M ECode AFSName HostIDname VL_CreateEntry - Create a VLDB entry.

AFS_VL_DelEnt P M ECode AFSName HostIDvolID VL_DeleteEntry - Delete a VLDB entry.

AFS_VL_GetNVlID None ECode AFSName HostID VL_GetNewVolumeId - Generate a newvolume ID.

AFS_VL_RepEnt P M ECode AFSName HostIDvolID

VL_ReplaceEntry - Replace entirecontents of VLDB entry.

AFS_VL_UpdEnt P M ECode AFSName HostIDvolID

VL_UpdateEntry - Update contents ofVLDB entry.

AFS_VL_SetLck P ECode AFSName HostIDvolID VL_SetLock - Lock VLDB entry.

AFS_VL_RelLck P ECode AFSName HostIDvolID VL_ReleaseLock - Unlock VLDB entry.


Chapter 16

Index

Aa ACL permission, 334A instruction

uss template file, 285access, see ACL

count, in volume header, 111permissions on ACL (see entries: permissions on ACL,

ACL), 333transparent (AFS feature), 5

ACLadding entries, 337auxiliary permissions, 334cleaning, 341clearing, 339compared to UNIX protection, 332copying between directories, 340default on new volume, 93displaying, 336editing entries, 337foreign users on, 335group entries, usefulness, 335normal vs. negative permissions, 335permissions defined, 333removing entries, 337removing obsolete AFS IDs, 341replacing all entries, 339setting entries, 337setting for directory with uss, 281setting on user home directory with uss, 280shorthand notation for grouping permissions, 334system groups on, 335

activeclients statistic from scout program, 202state of fstrace event set, 208

Active DirectoryKerberos KDC, 11

addingACL entry

negative permissions, 338normal permissions, 337

ADMIN flag to Authentication Database entry, 345CellServDB file (server) entry for database server ma-

chine, 63

database server machineto client CellServDB file and kernel memory, 252to server CellServDB file, 63

disk to file server machine, 66members to groups, 324read-only site definition in VLDB, 98server encryption key to KeyFile file, 230system:administrators group members, 344UserList file users, 347

ADMIN flag in Authentication Database entrydisplaying, 345privileges resulting, 345setting or removing, 345

administer ACL permission, see a ACL permissionadministering

server machine, 42user accounts, 298

administrative databaseabout replicating, 51backing up, 55restoring, 55

administrative privilegethree types, 343

AFS, see AFS UIDauditing events on AIX server machines, 225authentication separate from UNIX, 30differences from UNIX summarized, 12global namespace, 16initialization script, 242reducing traffic in, 7root directory (/afs)

in cell filespace, 18on client machine, 16

security features, 36server encryption key, see server encryption keyserver processes used in, 7

afs entry in Authentication Databasedisplaying, 229setting server encryption key, 230

AFS GIDcounter for automatic allocation, displaying and set-

ting, 330definition, 317


displayingfor all groups in Protection Database, 319for one group, 317

removing obsolete from ACL, 341AFS UID

assigningwith pts createuser command, 301with uss, 287

counter for automatic allocation, displaying and set-ting, 330

definition, 317displaying

for all users and machines in Protection Database,319

for one user or machine, 317matching with UNIX UID, 270, 299removing obsolete from ACL, 341reserved

anonymous user, 27system-defined groups, 30

reusing, about, 321setting counters for automatic allocation, 331

AFSCELL environment variable, 166AFSCONF environment variable (NFS/AFS Translator), 350afsd program, 241afsmonitor program

available statistics, 368Cache Manager statistics, 368command syntax, 222creating an output file, 222creating configuration files for, 220features summarized, 215file server statistics, 386requirements for running, 215screen layout, 216setting terminal type, 215stopping, 223

AFSSERVER environment variable (NFS/AFS Translator),350

afszcm.cat file, 242AIX

auditing AFS eventsabout, 225

configuring tape device, 140all shorthand for ACL permissions, 334all-or-nothing release of read-only volumes, 96anonymous user

AFS UID reserved, 27identity assigned to unauthenticated user, 64

archivingtapes in Backup System, 149

ASK instruction in CFG_device_name file, 160assigning

AFS GID to group, 322AFS UID to machine, 320AFS UID to user, 301AFS UID with uss, 270

asynchronyenabling for Cache Manager write operations, 264when AFS files saved on NFS clients, 351

at-sys (@sys) variable in pathnames, 25auditing AFS events on AIX server machines, 225authenticated identity

acquiring with klog command, 33authentication

AFS compared to UNIX, 12AFS separate from UNIX, 30compared to authorization checking, 64consequences of multiple failures, 35improving security, 305

Authentication Databaseafs entry, 228changing username, 310entry

creating with kas create command, 301creating with uss, 287deleting with uss, 290removing, 313

passwordsetting, 309

password lifetime, setting, 35, 308server encryption key

displaying, 229setting, 230

site for AFS server encryption key, 228Authentication Server

about starting and stopping, 78as kaserver process, 75binary in /usr/afs/bin, 43description, see Kerberos KDCdisplaying log file, 88restarting after adding entry to server CellServDB file,

63restarting after removing entry from server CellServDB

file, 63runs on database server machine, 48when to contact, 76

AuthLog file, 46displaying, 88

authorization checkingand restarting processes, 65compared to authentication, 64controlling cell-wide, 65defined, 64disabling, 65enabling, 65

automaticprocess restarts by BOS Server, 87update to admin. databases by Ubik, 51

automatingcreation of backup volumes, 100tape mounting and unmounting by Backup System, 157

AUTOQUERY instruction in CFG_device_name file, 160auxiliary ACL permissions, 334


availability of datainterrupted by dumping, 125

Bbacking up

administrative databases, 55Backup Database to tape, 195data from AFS volumes, 171

backup commandsadddump, 150addhost, 141addvolentry, 144addvolset, 144binary in /usr/afs/bin, 43dbverify, 195deldump, 151deletedump, 197delhost, 141delvolentry, 146delvolset, 145diskrestore, 190dump, 180dumpinfo, 181granting privilege for, 346interactive mode

entering, 167exiting, 167

jobs, 167kill, 169labeltape, 153listdumps, 151listhosts, 142listvolsets, 145quit, 167readlabel, 154restoredb, 197savedb, 196scantape, 185setexp, 150status, 170volinfo, 184volrestore, 188volsetrestore, 193

Backup Databaseadministering, 194backing up, 195described, 135dump hierarchy

displaying, 151dump ID numbers, displaying, 181dump levels

adding, 150deleting, 151displaying, 151

dump recordscreating as part of dump operation, 177displaying, 181

expiration dates, 148changing, 150setting, 150

port offset numbersdisplaying, 142

restoring, 195Tape Coordinator

adding entry, 141removing entry, 141

verifying integrity, 195volume dump history

displaying, 184recovering from tapes, 184

volume entrycreating, 144deleting from volume set, 146displaying, 145

volume setcreating, 144deleting, 145displaying, 145

Backup field in volume header, 110Backup Server

about starting and stopping, 78as buserver process, 74binary in /usr/afs/bin, 43description, 10displaying log file, 88restarting after adding entry to server CellServDB file,



Backup Systemautomating operations, 156automating tape mounting and unmounting, 157Backup Database, see Backup DatabaseBackup Server described, 10configuration overview, 136data

backing up/dumping, 171recovering, 186restoring, 186

dump, see dumpdump hierarchy, see dump hierarchydump history

recovering from tapes, 184dump ID number, see dump

assigning as part of dump operation, 177dump level, see dump hierarchydump name, see dumpdump operation, overview, 174dump records

deleting, 197dump set, see dump setdumps, full and incremental defined, 171


eliminating check for proper tape name, 161eliminating search/prompt for initial tape, 160filemarks, see Tape Coordinatorinteractive mode, 166

canceling operations, 169displaying pending/running operations, 167

interfaces, 165introduction, 132job ID numbers

about, 167displaying, 167

port offsets, see Tape Coordinatorrecycling schedule for tapes, 148reducing operator intervention, 156regular expressions, 143restores

date-specific, 186full, 186

restoringbacked up data, 186backup data, 188data, 186

running in foreign cell, 166scanning tapes, 184suggestions for improving efficiency, 172Tape Coordinator, see Tape Coordinatortape name, see tapesuseCount counter on tape label, 149using default responses to errors, 160volume dump history

recovering from tapes, 184volume entry, see volume entryvolume set, see volume set

backup volumeautomating creation of, 100changing name of, 129creating, 99creating multiple at once, 99defined, 91dumping, 125ID number in volume header, 110mounting, 101moving, 115removed by read/write move, 115removed by read/write removal, 123space-saving nature of, 95suggested schedule for creation of, 100

BackupLog file, 46displaying, 88

BAK version of binary filecreated by bos install command, 57removing obsolete, 60used by bos uninstall command, 58

banner line on the scout program screen, 202basenames in scout program, 201bdb.DB0 file, 46bdb.DBSYS1 file, 46

binary distribution machinedefined, 49identifying with bos status, 50

bos commandsaddhost, 63addkey

basic instructions, 231when handling key emergency, 236

adduser, 347binary in /usr/afs/bin, 43create, 81delete, 83exec, 72getdate, 60getlog, 88getrestart, 87granting privilege for, 346install, 58listhosts, 62listkeys, 229listusers, 346mutual authentication, bypassing, 66prune, 60removehost, 63removekey, 233removeuser, 347restart

excluding BOS Server, 86including BOS Server, 86selected processes, 86with -bosserver flag, 86

salvage, 119setauth, 65setrestart, 87shutdown, 84start, 84startup, 85status, 79stop, 83summary of functions, 74uninstall, 59

BOS Serveras bosserver process, 74binary in /usr/afs/bin, 43description, 8displaying log file, 88maintainer of server process binaries, 57memory state, 78monitoring server processes, 73restart times, displaying and setting, 87role in reboot of server machine, 71use of BosConfig file, 78when to contact, 74

BosConfig file, 45changing status flag from NotRun to Run, 84changing status flag from Run to NotRun, 83creating server process entry, 81


displaying entries, 79information, 77removing server process entry, 83restart times defined, 87

BosLog file, 46displaying, 88

bosserver, see BOS Serverbinary in /usr/afs/bin, 43

bulk mode in uss, 292buserver, see Backup Server

binary in /usr/afs/bin, 43butc command, 169

Ccache

tuning, 249cache files (client), 242Cache Manager

afsd initialization program, 241as interpreter of mount points, 93CellServDB file (client), using, 249collecting data with xstat data collection facility, 223configuring and customizing, 240data cache

displaying size set at reboot, 244data cache size

displaying current, 244resetting to default value (for disk cache only), 245setting in cacheinfo file, 244setting until next reboot, 244

database server processes, contacting, 249described, 240displaying

cache size from cacheinfo file, 244enabling asynchrony for write operations, 264flushing cache, 256functions of, 10interfaces registered with File Server, 260messages displayed, controlling, 262monitoring performance, 200preference ranks for File Server and VL Server, 257setting

cache size in cacheinfo file, 244disk cache location, 244home cell, 255probe interval for File Server, 254

setuid programs, 253system type name stored in kernel memory, 263use of NetInfo file, 260use of NetRestrict file, 260xstat data collection facility libraries, 223xstat data collections, 224xstat example commands, 224xstat_cm_test example command, 225

cacheinfo file, 241displaying contents, 244format, 244

resetting disk cache to size specified, 245setting

cache size, 244disk cache location, 244

CacheItems file, 243caching, 6callback, 7cell, 5

changing list in client kernel memory, 252filespace configuration issues, 18foreign, 5granting local access to foreign users, 18local, 5making foreign visible to local, 17making local visible to foreign, 17name

at second level in file tree, 16, 18choosing, 15setting, 15

setting home cell for client machine, 255CellServDB file (client)

about, 242, 249changing, 252copied into kernel memory, 249correct format, 250displaying, 251global source from AFS Support, 251maintaining, 251updating, 252

CellServDB file (server)about, 44adding database server machine, 63displaying, 62effect of wrong information in, 61importance to Ubik operation, 52maintaining, 61removing database server machine, 63

CellServDB file maintained by the AFS Registraras global update source, 17

CellServDB.local file, 17cellular mount point, see mount pointCFG_device_name file, 157changing

ACL entries, 337data cache size specified in cacheinfo file, 244data cache size temporarily, 244disk cache location, in cacheinfo file, 244disk cache size to default value, 245group ownership to self-owned, 323group-creation quota, 328mount point when renaming user, 312Protection Database entry

name, 327owner, 326

username, 310volume name, 129volume name when renaming user, 312


checksum, 229chgrp command

AFS compared to UNIX, 13chmod command

AFS compared to UNIX, 13choosing

namecell, 15user, 27user volume, 28user volume mount point, 28

chown commandAFS compared to UNIX, 13

clearingall ACL entries, 339contents of trace log (fstrace), 213

clientdefinition, 4machine

definition, 4client machine

/usr/vice/etc directory, 241cache files, 242CellServDB file, displaying, 251changing CellServDB file, 252changing list of cells in kernel memory, 252configuration files, 241configuration issues, 24controlling running of setuid programs, 253data cache size

displaying current, 244setting in cacheinfo file, 244setting until next reboot, 244

data cache size set at rebootdisplaying, 244

database server machines, displaying knowledge of, 251database server processes, contacting, 249disk cache size

resetting to default value, 245disk versus memory cache, 243displaying

data cache size from cacheinfo file, 244enabling access to foreign cell, 25files required on local disk, 24flushing data cache, 256making foreign cell visible, 17messages displayed, controlling, 262monitoring performance, 200setting

data cache size in cacheinfo file, 244disk cache location, 244home cell, 255

setting home cell, 15system type name stored in Cache Manager memory,

263client machines statistic from scout program, 202clocks

need to synchronize for Ubik, 52clone, 6, 95

forcing creation of new, 96cloning

defined, 95for backup, 99for replication, 96

close system callfor files saved on AFS client, 14for files saved on NFS client, 351

cm event set (fstrace), 208cmfx trace log (fstrace), 208collecting

data with xstat data collection facility, 223command interpreters

CellServDB file (client), using, 249command parameters

in BosConfig file, 78command suite

binariesdisplaying time stamp, 59installing, 57removing obsolete, 60uninstalling, 58

commandsafsd, 241afsmonitor, 222backup

to enter interactive mode, 167backup adddump, 150backup addhost, 141backup addvolentry, 144backup addvolset, 144backup dbverify, 195backup deldump, 151backup deletedump, 197backup delhost, 141backup delvolentry, 146backup delvolset, 145backup diskrestore, 190backup dump, 180backup dumpinfo, 181backup interactive, 167backup jobs, 167backup kill, 169backup labeltape, 153backup listdumps, 151backup listhosts, 142backup listvolsets, 145backup quit, 167backup readlabel, 154backup restoredb, 197backup savedb, 196backup scantape, 185backup setexp, 150backup status, 170backup volinfo, 184


backup volrestore, 188backup volsetrestore, 193bos addhost, 63bos addkey, 231bos adduser, 347bos create, 81bos delete, 83bos exec, 72bos getdate, 60bos getlog, 88bos getrestart, 87bos install, 58, 59bos listhosts, 62bos listkeys, 229bos listusers, 346bos prune, 60bos removehost, 63bos removekey, 233bos removeuser, 347bos restart

excluding BOS Server, 86including BOS Server, 86selected processes, 86

bos salvage, 119bos setauth, 65bos setrestart, 87bos shutdown, 84bos start, 84bos startup, 85bos status, 79bos stop, 83butc, 169chgrp (AFS compared to UNIX), 13chmod (AFS compared to UNIX), 13chown (AFS compared to UNIX), 13executing from uss template file, 286fms, 138fs checkservers, 254fs checkvolumes, 256fs cleanacl, 341fs copyacl, 340fs examine, 113, 122fs exportafs, 354fs flush, 256fs flushmount, 257fs flushvolume, 256fs getcacheparms, 244fs getcellstatus, 253fs getclientaddrs, 261fs getserverprefs, 259fs listacl, 336fs listcells, 251fs listquota, 113, 122fs lsmount, 105fs messages, 262fs mkmount, 94

general instructions, 105

when changing username, 312when creating user account, 303when mounting backup volume, 101when renaming volume, 130when restoring volume, 128

fs newcell, 252fs quota, 122fs rmmount, 107, 125, 130, 312, 314fs setacl, 337

with -clear flag, 339with -negative flag, 338

fs setcachesize, 244fs setcell, 254fs setclientaddrs, 262fs setquota, 121fs setserverprefs, 259fs setvol, 121fs storebehind

displaying asynchrony for specific files, 265displaying default asynchrony, 265setting asynchrony for specific files, 264setting default asynchrony, 264

fs sysname, 263fs whereis, 114fsck (AFS compared to UNIX), 13fsck (AFS version), 13fstrace clear, 213fstrace dump, 212fstrace lslog, 211fstrace lsset, 210fstrace setlog, 210fstrace setset, 210groups (AFS compared to UNIX), 13id (AFS compared to UNIX), 13kas create, 303

when changing username, 311kas delete, 313

when changing username, 311kas examine, 229

to display ADMIN flag, 345kas interactive, 66kas setfields, 35

limiting failed authentication attempts, 307prohibiting password reuse, 308setting ADMIN flag, 345setting password lifetime, 308

kas setpassword, 35, 231, 309kas unlock, 308klog, 33klog with -setpag flag, 31klog.krb, 35knfs, 358kpasswd, 35ln (AFS compared to UNIX), 13mount, 355pagsh, 31pagsh.krb, 35


privileged, defined, 64pts adduser, 324

for system:administrators group, 344pts chown, 326pts creategroup, 323pts createuser

machine entry, 321user account, 302

pts delete, 314, 325pts examine, 317pts listentries, 319pts listmax, 330pts listowned, 319pts membership, 318

displaying system:administrators group, 344pts removeuser, 325

for system:administrators group, 345pts rename

machine or group name, 327username, 310

pts setfieldssetting group creation quota, 328setting privacy flags, 329

pts setmax, 331scout, 203share, 353ssh (AFS compared to UNIX), 13sshd (AFS compared to UNIX), 13strings, 61sys, 263tokens, 33tokens.krb, 35udebug, 44umount, 68unlog, 33uss add, 288uss bulk, 296uss delete, 292vos addsite, 98vos backup, 101vos backupsys, 102vos changeaddr, 70vos create

basic instructions, 94when creating user account, 303

vos delentry, 124vos dump, 126vos examine

basic instructions, 112vos listaddrs, 70vos listpart, 66vos listvldb

syntax, 108vos listvol

syntax, 109vos lock, 130vos move

basic instructions, 115vos partinfo, 94vos release

basic instructions, 99vos remove, 314

basic instructions, 124vos remsite, 124vos rename

basic instructions, 129when changing username, 312

vos restore, 127, 128vos syncserv, 117vos syncvldb, 117vos unlock, 131vos unlockvldb, 131vos zap, 124which, 61xstat_cm_test, 225xstat_fs_test, 224

common configuration files (server), 44compilation

date of, listing on binary file, 59configuration file, see CFG_<device_name> configuration

fileconfiguration files

client machine, 241server machine, common, 49

configuringafsmonitor program, 220Cache Manager, 240client machine, issues, 24file server machine, issues, 22filespace, issues, 18trace log (fstrace), 210

Conn statistic from scout program, 202consistency guarantees

administrative databases, 53cached data, 7

constantsuss template file, 274

contacting processesAuthentication Server, 76Backup Server, 74BOS Server, 74File Server, 75Protection Server, 76Salvager, 75Update Server, 77VL Server, 77Volume Server, 75

controllingauthorization checking for entire cell, 65server machine interfaces registered in VLDB, 68

conventionsAFS pathnames, 16cell name, 15volume names, 19, 93


convertingexisting UNIX accounts to AFS accounts

with manual account creation, 300with uss, 272

coordinator (Ubik)defined, 51election procedure described, 54

copyingACL between directories, 340

core filesfor server processes, 46removing from /usr/afs/logs directory, 60

core leakpreventing with scheduled restarts, 87

correspondenceof volumes and directories, 5

corruptionsymptoms and types, 118

counterProtection Database (max user id, max group id), 330

CPS, 315creating

ACL as copy of another, 340ACL entry, 337ACL entry in negative permissions section, 338Authentication Database entry

with kas create command, 301with uss, 287

backup volume, 99cellular mount point, 106common local password file with uss, 270directory with uss, 281file with uss, 282, 283group, self-owned, 323link (hard or symbolic) with uss, 284mount point when changing username, 312mount point with uss, 280multiple backup volumes at once, 99NetInfo file (client version), 261NetInfo file (server version), 69NetRestrict file (client version), 261NetRestrict file (server version), 69PAG with klog or pagsh command, 31Protection Database group entry, 322Protection Database machine entry, 320Protection Database user entry

with pts createuser command, 301with uss, 287

read-only volume, 96read/write or regular mount point, 105read/write volume, 93server encryption key, 230server process, 81standard files in new user account, 29tape label (Backup System), 152user account

with individual commands, 301

with uss, 287user account types with uss, 274user accounts in bulk with uss, 292volume with uss, 278

creation daterecorded in volume header, 111

creatorProtection Database entry

displaying, 317displaying all, 319

criteria for replicating volumes, 97cron process

creating with bos create command, 82cron server process

defining in BosConfig file, 81cron-type server process

defined, 77used to automate volume backup, 100

current protection subgroup, 315curses graphics utility

afsmonitor program, 215scout program requirements, 201

Dd ACL permission, 334D instruction

uss template file, 281daily restart for new binaries

displaying and setting time, 87data

availability interrupted by dumping, 125data cache

changing location of disk cache, 244disk cache size

resetting to default value, 245disk versus memory, 243displaying size specified in cacheinfo file, 244flushing (forcing update), 256size

current, displaying, 244recommendations, 243set at reboot, displaying, 244setting in cacheinfo file, 244setting until next reboot, 244

Vn file in, 243data collection

with xstat data collection facility, 223database files, 45database server machine

addingto client CellServDB file and kernel memory, 252to server CellServDB file, 63

CellServDB file (client), displaying, 251CellServDB file (server) entry

adding, 63removing, 63

client knowledge of, 249


defined, 48displaying list in server CellServDB file, 62identifying with bos status, 49maintaining, 51reason to run three, 23removing

from client CellServDB file and kernel memory, 252from server CellServDB file, 63

use of NetInfo and NetRestrict files, 68database server process

about starting and stopping, 78need to run all on every database server machine, 52restarting after adding entry to server CellServDB file,


file, 63use of CellServDB file, 61

database, distributed, see administrative databasedatabases, distributed, 23date

on binary file, listing, 59date-specific restores, 186default

ACL, 93volume quota, 93, 121

definingdirectory for even distribution of accounts with uss,

278read-only site in VLDB, 98server encryption key, 230server encryption key in Authentication Database, 230server process in BosConfig file, 81

delayed write operationswhen AFS files saved on NFS clients, 351

delete ACL permission, see d ACL permissiondeleting

Authentication Database entry with uss, 290Protection Database user entry with uss, 290user accounts in bulk with uss, 292user accounts with uss, 290

denyingfile access with negative ACL entry, 338

desynchronization of VLDB/volume headersfixing, 116symptoms of, 116

determiningidentity of binary distribution machine, 50identity of database server machines, 49identity of system control machine, 50identity of:

simple file server machines, 50roles taken by server machine, 49success of replication, 96

differencesbetween AFS and UNIX, summarized, 12

directories/afs, 16, 18

/afs/cellname, 16, 18/usr/afs/backup, 140conventional under /afs/cellname, 19for grouping user home directories, 28lost+found, 13

directories (server)/usr/afs/bin, 57

directory/usr/afs/bin on server machines, 43/usr/afs/db on server machines, 45/usr/afs/etc, 44/usr/afs/local on server machines, 44/usr/afs/logs on server machines, 46/usr/vice/cache, 242/usr/vice/etc, 241/vicep on server machines, 47correspondence with volume, 5creating with uss, 281defining for even distribution of accounts with uss, 278disk cache, 242flushing from data cache on client machine, 256overwritten by uss if exists, 269root, 6

directory-level data protectionimplications, 332

directory/file nametranslating to volume ID number, 113translating to volume location, 114translating to volume name, 113

disablingauthorization checking, 65

discardingtokens, 33

diskfile server machine

adding/installing, 66removing, 67

Disk attn statistic from scout program, 202disk partition

displaying size of single, 122grouping related volumes on, 21monitoring usage of, 202moving volumes to reduce overcrowding, 114

display layout in scout program window, 201displaying

ACL entries, 336ADMIN flag in Authentication Database entry, 345AFS user id and max group id counters, 330BOS Server’s automatic restart times, 87Cache Manager preference ranks for file server ma-

chines, 257CellServDB file (client), 251CellServDB file (server), 62client interfaces registered with File Server, 260contents of trace log (fstrace), 212counters for AFS UID and AFS GID, 330creator of Protection Database entry, 317, 319


data cache sizeset at reboot, 244specified in cacheinfo file, 244

data cache size, current, 244database server machines in server CellServDB file, 62disk partition size, 122entries from BosConfig file, 79group-creation quota in Protection Database entry, 317groups owned by a user or group, 319groups to which user or machine belongs, 318KeyFile file, 229log files for server processes, 88members of group, 318membership count in Protection Database entry, 317mount point, 105owner of Protection Database entry, 317, 319privacy flags on Protection Database entry, 317Protection Database entries (all), 319Protection Database entry, 317server encryption key from Authentication Database,

229server encryption keys in KeyFile file, 229server entries from VLDB, 68server process status, 79state of event set (fstrace), 210state of trace log (fstrace), 211system:administrators group members, 344tape label (Backup System), 152time stamp on binary file, 59UserList file, 346VLDB entry

with volume header, 112VLDB entry with volume header, 112VLDB server entries, 68volume header, 109

with VLDB entry, 112volume header with VLDB entry, 112volume information, 107volume quota

percent used, 122with volume & partition info, 122with volume size, 122

volume size, 122volume’s VLDB entry, 108

distributed database, see administrative databasedistributed databases, 23distributed file system, 4

security issues, 38distribution

of CellServDB file (server), 62of databases, 23, 51

dormant (state of fstrace event set), 208dumb terminal

use in scout program, 201use with afsmonitor, 215

dump (Backup System)appended

creating, 178appended, defined, 133creating Backup Database record, 177defined, 133displaying Backup Database record, 181full, defined, 133ID number, described, 133ID number, displaying, 181incremental, defined, 133initial, defined, 133label, described, 134name, described, 133parent, defined, 133set, see dump set

dump hierarchy (Backup System)creating, 147described, 133displaying, 151dump level

defined, 133dump levels

adding, 150deleting, 151

expiration date on dump leveldescribed, 133

expiration dates, 148assigning to dump levels, 147changing, 150setting, 150

dump ID number (Backup System)displaying, 181

dump ID numbers (Backup System), 177dump levels

defined, 133expiration dates, 148

changing, 150setting, 150

in Backup Databaseadding, 150deleting, 151displaying, 151

dump set (Backup System)creating, 171defined, 133deleting from Backup Database, 197full dumps, 171incremental dumps, 171

dumpingBackup Database to tape, 195data, 171dump ID numbers, 177full dumps, 171incremental dumps, 171trace log contents (fstrace), 212volumes

reasons, 125using vos command, 126


without using AFS Backup System, 125dynamic kernel loader programs

directory for AFS library files, 242

EE instruction

uss template file, 283editing

NetInfo file (client version), 261NetInfo file (server version), 69NetRestrict file (client version), 261NetRestrict file (server version), 69

election of Ubik coordinator, 54emergency

server encryption keys mismatched, 233enabling authorization checking, 65encrypted network communication, 36entering

kas interactive mode, 66entry in VLDB

displaying, with volume header, 112for volume, 92locking/unlocking, 130

erasingall ACL entries, 339

etc/exports file, 353etc/fstab file, see file systems registry fileevent set (fstrace)

cm, 208displaying state, 210persistence, 209setting, 210

eventsauditing AFS on AIX server machines, 225

examplesscout program display, 205

executingcommand using uss template line, 286

expiration dates, 148absolute, 148changing, 150relative, 148setting, 150

FF instruction

uss template file, 282failure

of file storage due to full partition, 114of uss account creation

recovering from, 269Fetch statistic from scout program, 202file

creating standard ones in new user account, 29creating with uss, 282, 283flushing from data cache on client machine, 256overwritten by uss if exists, 269

required on client machine local disk, 24file extension

.BAK, 57

.OLD, 57File Server

as part of fs process, 75, 77binary in /usr/afs/bin, 43client interfaces registered, 260collecting data with xstat data collection facility, 223CPS requested from Protection Server, 315description, 8displaying log file, 88interfaces registered in VLDB

listed in sysid file, 45interfaces registered in VLDB server entry, 68monitoring with scout program, 203use of NetInfo file, 68use of NetRestrict file, 68use of sysid file, 68when to contact, 75xstat data collection facility libraries, 223xstat data collections, 224xstat example commands, 224xstat_fs_test example command, 224

file server machine, 4Cache Manager preference ranks for, 257configuration files in /usr/afs/local, 44core files in /usr/afs/logs, 46database files in /usr/afs/db, 45disk

adding/installing, 66removing, 67

displaying log files, 88inode-based, 13installing command and process binaries, 57log files in /usr/afs/logs, 46monitoring outages of, 203namei-based, 13partitions, naming, 47rebooting, about, 24restoring partitions using Backup System, 186salvaging volumes, 118

file server probe intervalsetting for a client machine, 254

file storagefailed due to partition crowding, 114

file systemdefined, 4monitoring activity, 200salvager, see Salvager

file systems registry fileadding new disk to file server machine, 67removing disk from file server machine, 68

file treeconventions

for configuring, 18third level, 19


creating volumes to match top level directories, 19FileLog file, 46

displaying, 88files

/usr/afs/etc/KeyFile, 228AFS initialization script, 242AFS libraries used by dynamic kernel loader programs,

242afsd, 241afszcm.cat, 242AuthLog, 46backup command binary, 43BackupLog, 46bdb.DB0, 46bdb.DBSYS1, 46bos command binary, 43BosConfig, 45, 77BosLog, 46bosserver binary, 43buserver, 43cacheinfo, 241CacheItems, 243CellServDB (client), 242CellServDB (server), 44, 61CellServDB file (client), 249CellServDB.local, 17CFG_<device_name>, 157displaying log files, 88exports, 353file systems registry (fstab), 67FileLog, 46fileserver, 43fms.log, 138FORCESALVAGE, 47global CellServDB, 17kas command binary, 43kaserver binary file, 43kaserver.DB0, 46kaserver.DBSYS1, 46KeyFile, 44NetInfo (client version), 242, 261NetInfo (server version), 45NetRestrict (client version), 242, 261NetRestrict (server version), 45NoAuth, 45prdb.DB0, 46prdb.DBSYS1, 46pts command binary, 44ptserver binary, 44SALVAGE.fs, 45salvage.lock, 45SalvageLog, 46salvager, 44server configuration, in /usr/afs/etc directory, 44sysid, 45tapeconfig, 137ThisCell (client), 242

ThisCell (server), 44udebug, 44upclient, 44upserver, 44UserList, 44V.vol_ID.vol, 47vldb.DB0, 46vldb.DBSYS1, 46VLLog, 46vlserver, 44Vn, 243VolserLog, 47volserver, 44VolumeItems, 243vos command binary, 44

fileserver, see File Serverbinary in /usr/afs/bin, 43

flexible synchronization site (Ubik), 54flushing

data cache on client machine, 256fms command, 138fms.log file, 138FORCESALVAGE file, 47foreign cell, 5

making local cell visible, 17making visible in local cell, 17

format of CellServDB file (client), 250fs commands

checkservers, 254checkvolumes, 256cleanacl, 341copyacl, 340examine, 113, 122exportafs, 354flush, 256flushmount, 257flushvolume, 256getcacheparms, 244getcellstatus, 253getclientaddrs, 261getserverprefs, 259granting privilege for, 344listacl, 336listcells, 251listquota, 113, 122lsmount, 105messages, 262mkmount

for read/write volume, 94general instructions, 105when changing username, 312when creating user account, 303when mounting backup volume, 101when renaming volume, 130when restoring volume, 128

mutual authentication, bypassing, 66newcell, 252


quota, 122rmmount, 107

when changing username, 312when removing user account, 314when removing volume, 125when renaming volume, 130

setacl, 337with -clear flag, 339with -negative flag, 338

setcachesize, 244setcell, 254setclientaddrs, 262setquota, 121setserverprefs, 259setvol, 121storebehind

displaying asynchrony for specific files, 265displaying default asynchrony, 265setting asynchrony for specific files, 264setting default asynchrony, 264

sysname, 263whereis, 114

fs process, 74creating, 82

fs server processdefining in BosConfig file, 81

fs-type server processdefined, 77

fsck commandAFS compared to UNIX, 13AFS version, 13

fstab file, see file systems registry filefstrace commands

clear, 213dump, 212example of use, 213lslog, 211lsset, 210privilege requirements, 209setlog, 210setset, 210

fsync system callfor files saved on AFS client, 14for files saved on NFS client, 351

full dump, 171creating using vos command, 126

full restores, 186

Gglobal namespace, 16granting

file access by setting ACL, 337privilege for backup commands, 346privilege for bos commands, 346privilege for fs commands, 344privilege for kas commands, 345, 346privilege for pts commands, 344

privilege for vos commands, 346group

ACL entry, usefulness of, 335AFS GID, 29AFS GID, assigning, 322creation quota, see quotadefinition, 9group use, 322groups owned, displaying, 319members, adding, 324members, displaying, 318members, removing, 325membership of machine or user, displaying, 318name, assigning, 322orphaned, displaying, 319owned by user or group, displaying, 319owner

displaying, 317displaying for all, 319

privacy flags, 29privacy flags on Protection Database entry


private use, 322Protection Database entry

deleting, 325displaying, 317displaying all, 319name, changing, 327

Protection Database entry, creating, 322Protection Database entry, described, 315regular and prefix-less, defined, 322restrictions, 29rules for naming, 323self-owned, creating, 323shared use, 322system, 316system-defined, 30system-defined on ACLs, 335using effectively, 322

group use of group, 322groups command

AFS compared to UNIX, 13

Hhandling

server encryption key emergency, 233hard link

AFS restrictions on, 14creating with uss, 284overwritten by uss if exists, 269

HeimdalKerberos KDC, 11

highlighting statistics in scout displaysetting thresholds, 204use of reverse video, 203


Ii ACL permission, 334id command

AFS compared to UNIX, 13identifying

binary distribution machine, 50database server machine, 49roles taken by server machine, 49simple file server machine, 50system control machine, 50

implicit ACL permissions, 334inactive (state of fstrace event set), 208incremental dump

creating using vos command, 126creating with Backup System, 171

initialization script for AFS, 242initializing

scout program, 203server process, 81

insert ACL permission, see i ACL permissioninstalling

disk on file server machine, 66server binaries, 57server process binaries, about, 57

intention flag in VLDB entry, 116interactive mode (Backup System)

entering, 167exiting, 167features, 166operations

canceling pending/running, 169displaying pending/running, 167

interactive mode (kas commands), 66Internet

conventions for cell name, 15

Jjob ID numbers (Backup System), 167

displaying, 167operations

canceling, 169

Kk ACL permission, 334kas commands

binary in /usr/afs/bin, 43create, 303

when changing username, 311delete

when changing username, 311when removing user account, 313

examineto display ADMIN flag, 345

examine, to inspect afs key, 229granting privilege for, 345interactive, 66mutual authentication, bypassing, 66

setfields, 35limiting failed authentication attempts, 307prohibiting password reuse, 308setting ADMIN flag, 345setting password lifetime, 308

setpassword, 35, 231, 309setpassword , when handling key emergency, 236unlock, 308

kaserver process, see Authentication Serverbinary in /usr/afs/bin, 43

kaserver.DB0 file, 46kaserver.DBSYS1 file, 46Kerberos

support for in AFS, 35use of usernames, 11

Kerberos KDCdescription, 11

kernel memory (client)CellServDB file, reading into, 249

key version numberdefined, 228

KeyFile fileadding server encryption key, 230displaying, 229function of, 44removing server encryption key, 232storage site for server encryption keys, 228

klog command, 33limiting failed attempts, 305when handling key emergency, 235with -setpag flag, 31

klog.krb command, 35knfs command, 358kpasswd command, 35kpwvalid program, 35kvno, see key version number

Ll ACL permission, 333L instruction

uss template file, 284learning

volume IDgiven directory/file name, 113given volume name, 113

volume locationgiven directory/file name, 114given volume name/ID number, 113

volume namegiven directory/file name, 113given volume ID number, 113

length restriction on volume names, 93libxstat_cm.a library, 223

data collections, 224example command using, 224obtaining more information, 224routines, 224


xstat_cm_test example command, 225libxstat_fs.a library, 223

data collections, 224example command using, 224obtaining more information, 224routines, 224xstat_fs_test example command, 224

listingtokens held by issuer, 33

ln commandAFS compared to UNIX, 13

local cell, 5granting foreign users access to, 18making foreign cells visible in, 17making visible to foreign cells, 17

local configuration files (server), 44local disk

files required on client machine, 24protecting on file server machine, 23

local password filecreating common source version with uss, 270creating entry for AFS user


setting password inwith manual account creation, 299with uss, 270

when not using AFS-modified login utility, 33when using AFS--modified login utility, 32

locationsetting for client, 244standard for uss template file, 275

lock ACL permission, see k ACL permissionlocked VLDB entry

displaying, 108unlocking, 130

lockingVLDB entry, 130

log filesdisplaying, 88fms.log, 138for replicated databases, 45for server processes, 46

loginlimiting failed attempts, 305

login utilityAFS version, 32AFS version’s interaction with local password file, 32

lookup ACL permission, see l ACL permissionlost+found directory, 13

Mmachine

adding to group, 324AFS UID, assigning, 320group memberships

displaying number, 317

group memberships, displaying, 318privacy flags on Protection Database entry


Protection Database entrydeleting, 325displaying, 317displaying all, 319name, changing, 327

Protection Database entry, creating, 320Protection Database entry, described, 315removing from group, 325

maintainingCellServDB file (client), 251synchrony of VLDB with volume headers, 92

majoritydefined for Ubik, 54

mappingAFS ID to group, machine, or username, 317group name to AFS GID, 317machine name to AFS UID, 317username to AFS UID, 317

max group id counter (Protection Database)displaying, 330setting, 331

max user id counter (Protection Database)displaying, 330setting, 331

maximum volume quota, 121MaxQuota field in volume header, 111members

group, adding, 324group, displaying, 317, 318group, removing, 325

membershipsystem groups, 316

memory state of BOS Server, 78message line in scout program display, 203MIT Kerberos

Kerberos KDC, 11mode bits (UNIX)

interpretation in AFS, 342monitoring

Cache Manager performance, 200Cache Manager processes with afsmonitor, 200disk usage with scout program, 202file server processes with afsmonitor, 200file server processes with scout, 200outages with scout program, 203server processes, 73

mount command, 355MOUNT instruction in CFG_device_name file, 157mount point

cellularcreating, 106described, 104

changing when renaming user, 312


choosing name for user volume, 28creating cellular, 106creating multiple per volume, 93creating read/write or regular, 105creating with uss, 280defined, 93definition, 6displaying, 105distinguishing different types, 105flushing from data cache on client machine, 256read/write

creating, 105described, 104

regularcreating, 105described, 103

removing, 107, 123removing when removing user account, 314

mountingbackup volume, 101disk on file server machine, 66foreign volume in local cell, 104read-only volume, 97read/write volume, 94volume

about, 93general instructions, 103

movingvolume, 114

mutual authentication, 38failure due to mismatched keys, 233preventing, 65server encryption key’s role, 227

Nname

Protection Database entrychanging, 327

NAME_CHECK instruction in CFG_device_name file, 161namei

definition, 13needs salvage status flag in volume header, 110negative ACL permissions

defined, 335NetInfo file (client version), 242, 261NetInfo file (server version), 45

creating/editing, 69NetRestrict file (client version), 242, 261NetRestrict file (server version), 45

creating/editing, 69network

defined, 4encrypted communication in AFS, 36reducing traffic through caching, 7

New releasestatus flag on site definition in VLDB entry, 109

New release site flag in VLDB

as indicator of failed replication, 96NFS/AFS Translator, 349

AFSCONF environment variable, 350NoAuth file, 45

creating in emergencies, 235none shorthand for ACL permissions, 334normal ACL permissions

defined, 335Not released

status flag on site definition in VLDB entry, 108NotRun status flag in BosConfig file

changing to Run, 84defined, 77

ntpddescription, 11

number variablesuss template file, 275

OOff-line status flag in volume header, 110Old release

status flag on site definition in VLDB entry, 109Old release site flag in VLDB

as indicator of failed replication, 96OLD version of binary file

created by bos install command, 57removing obsolete, 60used by bos uninstall command, 58

OldFiles directoryas mount point for backup volume, 101

On-line status flag in volume header, 110orphaned group, 319outages

BOS Server role in„ 8due to automatic server restart, 87due to server process restart, 86due to Ubik election, 54monitoring with scout program, 203

overcrowding of disk partitioneffect on users, 114moving volumes to reduce, 114

overwritingexisting directories/files/links with uss, 269

ownerProtection Database entry

changing, 326displaying, 317displaying all, 319rules for assigning, 323

PPAG

creating with klog or pagsh command, 31pagsh command, 31pagsh.krb command, 35participation

in AFS global namespace, 16


partitionhousing AFS volumes, 47restoring contents using Backup System, 186restoring using Backup System

to a new location, 190to the same location, 190

salvaging all volumes, 118passwd file, see local password filepassword

AFS compared to UNIX, 12changing in AFS, 35checking quality of, 35consequences of multiple failed authentication attempts,

35expiration, 35improving security, 305lifetime, 35local password file, 32restricting reuse, 35setting in Authentication Database, 309setting in local password file


performancecache, 249

permissions on ACLdefined, 333

persistent fstrace event set or trace log, 209possible variations

on replication, 98prdb.DB0 file, 46prdb.DBSYS1 file, 46preferences

setting, 259prefix-less group, see grouppreventing

core leaks, with scheduled BOS Server restarts, 87mutual authentication, 65

previewinguser account creation/deletion with uss, 268

privacy flags on Protection Database entry, 29displaying, 317setting, 328

private use of group, 322privilege, see administrative privilege

granting for backup commands, 346granting for bos commands, 346granting for fs commands, 344granting for kas commands, 345granting for pts commands, 344granting for vos commands, 346required for afsmonitor program, 215required for fstrace commands, 209required for scout program, 201required for uss commands, 268

privileged commands, 64process

lightweight Ubik, 52status flag in BosConfig file, 77

process authentication group, see PAGprocesses

Authentication Server, binary in /usr/afs/bin, 43Backup Server, binary in /usr/afs/bin, 43BOS Server, binary in /usr/afs/bin, 43File Server, binary in /usr/afs/bin, 43Protection Server, binary in /usr/afs/bin, 44Salvager, binary in /usr/afs/bin, 44Update Server, binaries in /usr/afs/bin, 44VL Server, binary in /usr/afs/bin, 44Volume Server, binary in /usr/afs/bin, 44

programsafsd, 241bosserver, 43buserver, 43fileserver, 43kaserver, 43ptserver, 44salvager, 44udebug, 44upclient, 44upserver, 44vlserver, 44volserver, 44

protectionAFS compared to UNIX, 12in AFS, 9in UNIX, 9

Protection Database, 9changing username, 310creator of entry


entry namechanging, 327

entry, deleting, 325group creation quota


group entry, 315displaying, 317displaying all, 319

group entry, creating, 322ID counters, setting, 331machine entry

displaying, 317displaying all, 319

machine entry, creating, 320machine entry, described, 315max user id and max group id counters, displaying and

setting, 330membership count

displaying, 317owner of entry

changing, 326



privacy flagsdisplaying, 317setting, 328

settingcounters for AFS UIDs, 331

user entrycreating with pts createuser command, 301creating with uss, 287deleting, 314deleting with uss, 290displaying, 317displaying all, 319

user entry, described, 315protection of file data

AFS compared to UFSACL, 332see also: ACL, 332

Protection Serverabout starting and stopping, 78as ptserver process, 76binary in /usr/afs/bin, 44building CPS, 315description, 9restarting after adding entry to server CellServDB file,



pts commandsadduser, 324

for system:administrators group, 344binary in /usr/afs/bin, 44chown, 326creategroup, 323createuser

machine entry, 321user account, 302

delete, 325when removing user account, 314

examine, 317granting privilege for, 344listentries, 319listmax, 330listowned, 319membership, 318

displaying system:administrators group, 344mutual authentication, bypassing, 66removeuser, 325

for system:administrators group, 345rename

machine or group name, 327username, 310

setfieldssetting group creation quota, 328setting privacy flags, 329

setmax, 331ptserver process, see Protection Server

binary in /usr/afs/bin, 44

Qquota

group-creationdisplaying, 317setting, 328

Rr ACL permission, 334RClone field in volume header, 110read

ACL permission, see r ACL permission)shorthand for ACL permissions, 334

read-only volumechanging name of, 129creating, 96

instructions, 98defined, 91defining site for in VLDB, 98dumping, 125ID number in volume header, 110mounting, 97moving, 115need for atomic release, 96releasing, 99removing

effect of, 123selecting site, 21

read/write mount point, see mount pointread/write volume

changing name of, 129cloning

for backup version, 99for replication, 96

creating, 93defined, 91dumping, 125ID number in volume header, 110mounting, 94moving, 115removing

effect of, 123instructions, 124

replication instructions, 98types suitable for replication, 97

rebootingfile server machine, limiting, 24server machine, instructions, 71

recyclinguseCounts of tapes (Backup System), 149

regular expressionBackup System, 143

regular group, see groupregular mount point, see mount point


releasestatus flags on site definitions in VLDB entry, 108

release stage in replication, 96ReleaseClone, 96ReleaseClone volume

ID number in volume header, 110releasing

read-only volume, 99read-only volume, forcing new cloning, 96read-only volume, need for atomicity, 96

removingACL entry, 337ADMIN flag from Authentication Database entry, 345all ACL entries, 339core files from /usr/afs/logs, 60database server machine

from client CellServDB file and kernel memory, 252from server CellServDB file, 63

disk from file server machine, 67group members, 325mount point, 107, 123

when changing username, 312mount point when removing user account, 314obsolete .BAK and .OLD version of binaries, 60obsolete AFS IDs from ACL, 341Protection Database entry, 314, 325server encryption key from KeyFile file, 232server process from BosConfig file, 83system:administrators group members, 345trace log contents (fstrace), 213user account components, 312UserList file users, 347volume, 123volume when removing user account, 314

renaminguser account components, 310volume, 129volume when changing username, 312

replacingall entries on ACL, 339

replicated database files, 45replication

appropriate volumes, 21defined, 92definition, 6detailed discussion, 96determining success of, 96forcing creation of new clone, 96need for all-or-nothing release, 96release stage, 96role of ReleaseClone, 96site definition stage, 96suitable types of volumes, 97variations possible in, 98

requirementsscout program, 201

resetting

disk cache size to default value, 245resizing

scout display, 203restart time for BOS Server (automatic)

displaying and setting time, 87restart times for BOS Server

about, 24displaying and setting, 87setting, 87

restartingserver process

except BOS Server, 86including BOS Server, 86when changing authorization checking, 65

server processes, 86restoring

administrative databases, 55Backup Database from tape, 195data

that no longer exists, 188data using Backup System, 186existing data

overwriting, 188preserving, 188

synchrony of VLDB and volume headers, 116volumes without using AFS Backup System, 127

restrictionson hard links in AFS, 14on volume names, 20

reverse videouse in scout program display, 203

revertingto old version of server process and command binaries,

58roles for server machine, 47

determining, 49summary, 22

ROnly field in volume header, 110root directory, 6, 93root superuser

limiting logins, 37root volumes (root.afs and root.cell), 20rules

for uss bulk input file, 293group names, assigning, 323uss template file, 276

Run status flag in BosConfig filechanging to NotRun, 83defined, 77

RWrite field in volume header, 110

SS instruction

uss template file, 284SALVAGE.fs file, 45salvage.lock file, 45SalvageLog file, 46


displaying, 88Salvager, see Salvager

as part of fs process, 75, 77binary in /usr/afs/bin, 44description, 10displaying log file, 88instructions for invoking, 118running before VLDB/volume header resynchroniza-

tion, 116when to contact, 75

salvagingvolumes, 118

savingprevious version of server binaries, 57

schedulingcreation of backup volumes, 100

scout program, 200attention levels, setting, 204banner line, 202basename, 201command syntax, 203display layout, 201display, resizing, 203examples (command and display), 205features summarized, 200highlighting in, 203monitoring disk usage, 202outages, monitoring, 203probe reporting line, 203requirements, 201reverse video, 203setting terminal type, 201starting, 203statistics displayed, 202stopping, 205

script for AFS initialization, 242secondary site (Ubik), 51security

AFS features, 36encrypted network communication, 36suggestions for improving, 37

self-owned group, 322server

definition, 4process

definition, 4list of AFS, 7

server encryption key, 38, 44adding to KeyFile file, 230Authentication Database, 228changing frequently, 228checksum displayed, 229defined, 38, 227displaying from Authentication Database, 229displaying from KeyFile file, 229emergency need to replace, 233KeyFile file, 228

password-like nature, 228removing from KeyFile file, 232role in mutual authentication, 227setting in Authentication Database, 230

server entry in VLDB, 68server machine

administering, 42binary distribution role, 49configuration files in /usr/afs/etc, 44configuration issues, 22database server role, 48determining roles, 49first installed, 22monitoring, 24need for consistent version of software, 57protecting directories on local disk, 23rebooting, 71roles for

summary, 22roles summarized, 47setting home cell, 15simple file server role, 48system control role, 49uninstalling command & process binaries, 58

server preference ranks, 257server process

binaries, see server process binariesbosserver, 74buserver, 74creating, 81creating and starting, 81creating ticket (tokens) for, 33cron type, defined, 77defining in BosConfig file, 81different names for, 73displaying entry in BosConfig, 79displaying log files, 88displaying status, 79fs, 74fs type, defined, 77kaserver, 75ptserver, 76removing from BosConfig file, 81, 83restarting

except BOS Server, 86restarting by restarting BOS Server, 86restarting for changed binaries, 57restarting immediately after stopping, 86restarting specific processes, 86simple type, defined, 77starting, 81starting up, 83stopping permanently, 81, 83upclient, 76upserver, 76use of CellServDB file, 61vlserver, 77


server process binariesdisplaying time stamp, 59in /usr/afs/bin, 43installing, 57removing obsolete, 60reverting to old version, 58uninstalling, 58

server ticket, 227server/client model, 4session key, 39, 227setting

ACL entries, 337ACL for directory with uss, 281ACL on home directory with uss, 280ADMIN flag in Authentication Database entry, 345AFS UID and AFS GID counters, 331AFS UID counters, 331AFS user id and max group id counters, 331BOS Server’s automatic restart times, 87Cache Manager preferences for file server machines,

257cell name, 15client interfaces registered with File Server, 260client-to-file-server probe interval, 254counters for AFS UID and AFS GID, 331data cache size in cacheinfo file, 244disk cache location in cacheinfo file, 244event set (fstrace), 210group-creation quota in Protection Database entry, 328home cell for client machine, 255password

in Authentication Database, 309privacy flags on Protection Database entry, 328server machine interfaces registered in VLDB, 68terminal type for scout, 201ThisCell file (client), value in, 255volume quota

on multiple volumes, 121on single volume, 121

volume quota with uss, 279setuid programs, 253

restrictions on, 14setting mode bits, 13

share command, 353shared secret, 38shared use of group, 322shorthand notation

ACL permissions, 334simple file server machine, 48

identifying with bos status, 50simple process

creating with bos create command, 82simple server process

defining in BosConfig file, 81simple-type server process

defined, 77site

count in VLDB, 108volume, defined, 91

site definition stage in replication, 96slowed performance

preventing in AFS, 7ssh command

AFS compared to UNIX, 13sshd command

AFS compared to UNIX, 13stages in volume replication, 96starting

database server process, about, 78scout program, 203server process, 81, 83

statistics display by scout program, 202status

displaying for server process, 79status flag

release, on site definitions in VLDB entry, 108status flag for process in BosConfig file

Run and Not Run, meaning of, 77status flag in BosConfig file

changing NotRun to Run, 84changing Run to NotRun, 83

status flags in volume header, 110stopping

database server process, about, 78server process

permanently, 83server process and immediately restarting, 86

Store statistic from scout program, 202strings command, 61suitability of volumes for replication, 97symbolic link

at second level of AFS pathname, 18creating with uss, 284overwritten by uss if exists, 269

symptomsof VLDB/volume header desynchronization, 116volume corruption, 118

synchronization site (Ubik)defined, 51flexibility, 54

synchronycontrolling for Cache Manager write operations, 264when AFS files saved on NFS clients, 351

synchrony of VLDB and volume headersmaintained by VL and Volume Servers, 92restoring, 116symptoms of lack of, 116

sys (@sys) variable in pathnames, 25sys command, 263sysid file, 45system control machine, 49

as distributor of UserList file, 346CellServDB file, distributing to server machines, 62identifying with bos status, 50


source for common KeyFile file, 228system groups

defined, 316using on ACLs, 335

system outagesdue to automatic server restart, 87due to server process restart, 86due to Ubik election, 54reducing, 8

system:administrators group, 316about, 30members

adding, 344displaying, 344removing, 345

privileges resulting, 344system:anyuser group, 316

about, 30using on ACLs, 335

system:authuser group, 316about, 30using on ACLs, 335

Ttape (Backup System)

automating mounting and unmounting, 157eliminating check for proper name, 161scanning, 184

Tape Coordinator (Backup System)adding to Backup Database, 141assigning file ownership, 140automating tape mounting and unmounting, 157configuring

AIX system, 140machine, 140tape device, 140

described, 135device configuration file (CFG), 157eliminating check for proper tape name, 161eliminating search/prompt for initial tape, 160filemark

described, 134determining size, 137

port offset numberassigning, 137defined, 135displaying, 142

processstarting, 169

removing from Backup Database, 141starting, 169status

displaying, 170stopping, 170task ID numbers, 169using default responses to errors, 160

tape labels

useCounts of tapes, 149tape recycling schedules, 148tapeconfig file, 137

ownership, assigning, 140tapes (Backup System)

archiving, 149capacity

determining, 137recording on label, 152

eliminating search/prompt for initial, 160label

creating, 152described, 134displaying, 152

namesassigning, 152described, 133

task ID numbers (Backup System), 169terminal type

setting for afsmonitor, 215setting for scout program, 201

TGS, 227ThisCell file (client), 242

how used by programs, 16setting value in, 255

ThisCell file (server), 44thresholds for statistics in scout display

setting, 204Ticket Granting Service, 227ticket-granter, 38tickets, see tokenstime stamp

on binary file, listing, 59tokens

command, 33creating for server process, 33data in, 38discarding with knfs command, 359discarding with unlog command, 33displaying for user, 33displaying with knfs command, 358one-per-cell rule, 31setting default lifetimes for users, 34

tokens.krb command, 35trace log (fstrace)

clearing contents, 213configuring, 210displaying state, 211dumping, 212persistence, 209

trace log from (fstrace)cmfx, 208

translatingdirectory/file name to volume ID number, 113directory/file name to volume location, 114directory/file name to volume name, 113volume ID number to name, 113


volume name to ID number, 113volume name/ID number to volume location, 113

translatorNFS/AFS, 349

transparent access as AFS feature, 5turning off authorization checking, 65turning on authorization checking, 65type flag for volume

VLDB entry, 108volume header, 109

UUbik

automatic updates, 51consistency guarantees, 53election of coordinator, 54failure due to mismatched server encryption keys, 233features summarized, 52majority defined, 54operation described, 51requirements summarized, 52server and client portions, 52use of CellServDB file, 61use of NetInfo and NetRestrict files, 68

udebugbinary in /usr/afs/bin, 44

UFSfile protection compared to AFS, 332mode bits, interpretation in AFS, 342

umount command, 68undefined ACL permissions, 334uninstalling

server process and command suite binaries, 58UNIX

differences from AFS summarized, 12mode bits, interpretation in AFS, 342UID

functional difference from AFS UID, 11UNIX UID

difference from AFS UID, 317matching with AFS UID, 270, 299

unlockingVLDB entry, 130

unlog command, 33when handling key emergency, 235

UNMOUNT instruction in CFG_device_name file, 157unmounting

file server machine disk, 67volume, 107, 123

upclient, see Update Serverbinary in /usr/afs/bin, 44

update daterecorded in volume header, 111

Update Serverabout starting and stopping, 78as upserver and upclient processes, 76binaries in /usr/afs/bin, 44

CellServDB file (server), distributing, 62client portion, 10

for binaries, 49for configuration files, 49

description, 10distributing server configuration files, 44distributor of KeyFile file, 228server portion, 10

on binary distribution machine, 49on system control machine, 49

when to contact, 77updating

CellServDB file (client), 252upserver, see Update Server

binary in /usr/afs/bin, 44useCount counter on tape label (Backup System), 149user

account, see user account, see user accountadding to group, 324AFS UID, assigning, 287, 301group memberships

displaying number, 317group memberships, displaying, 318group-creation quota


groups owned, displaying, 319name, see usernameprivacy flags on Protection Database entry


Protection Database entrydeleting, 325displaying, 317displaying all, 319

Protection Database entry, described, 315removing from group, 325

user accountcomponents, 267configuration issues, 26converting existing UNIX to AFS


creatingstandard files in, 29with individual commands, 301with uss, 287

creating different types with uss, 274creating/deleting many at once, 292creation using uss

previewing, 268deleting with uss, 290deletion using uss

previewing, 268matching AFS and UNIX UIDs, 270methods for grouping, 280removing from system, 312


suggestions for grouping home directories, 28two methods for creating and deleting, 267uss commands to create/delete

previewing, 268UserList file, 44

adding users, 347displaying, 346privileges resulting, 346removing users, 347

usernameassigning

with pts createuser command, 301with uss, 287

changing, 310choosing, 27part of volume name, 28use by Kerberos, 11

usr/afs/backup directoryownership, assigning, 140

usr/afs/bin directoryremoving obsolete .BAK and .OLD files, 60

usr/afs/bin directory on server machinescontents listed, 43

usr/afs/bin/bosserver, 74usr/afs/db directory on server machines

contents listed, 45usr/afs/etc directory on server machines

contents listed, 44usr/afs/local directory on server machines

contents listed, 44usr/afs/logs directory on server machines

contents listed, 46usr/vice/cache directory, 242usr/vice/etc directory, 241uss

accountrecovering from account creation failure, 269

AFS UID, assigning, 270command

reissuing, effect of, 269hard link, creating, 284previewing effect of command, 268symbolic link, creating, 284

uss bulk input filerules for constructing, 293

uss commandsACL, setting for directory, 281ACL, setting on home directory, 280add, 288

avoiding interruption, 268advantages over individual account-creation commands,

273bulk, 296command, executing with X instruction, 286converting existing UNIX accounts, 272creating individual user account, 287creating/deleting user accounts in bulk, 292

delete, 292avoiding interruption, 268

deleting individual user account, 290directory

creating, 281distributing evenly with G instruction, 278

file, creating by echoing one line, 283file, creating from prototype, 282local password file

creating common source version, 270overwriting existing account components, 269password/authentication security, setting with A instruc-

tion, 285privilege required, 268volume

creating with V instruction, 278mounting, 280setting quota, 279

uss template fileA instruction, 285ACL, setting

directory created by D instruction, 281user home directory with V instruction, 280

advantages, 273command, executing with X instruction, 286constants, 274D instruction, 281directory

creating with D instruction, 281G instruction for even distribution, 278

E instruction, 283F instruction, 282file

creating by echoing one line, 283creating from prototype, 282

G instruction, 278hard link, creating, 284instructions for different account types, 274instructions summarized, 273L instruction, 284mount point, creating with V instruction, 280number variables, 275password/authentication security, setting with A instruc-

tion, 285quota on volume, setting with V instruction, 279rules for constructing, 276S instruction, 284standard locations, 275symbolic link, creating, 284V instruction, 278variables, 274volume

creating with V instruction, 278X instruction, 286zero-length, 278

V


V.vol_ID.vol file, 47variable

AFSCELL, 166variables

@sys in pathnames, 25in uss template file, 274

variations possiblein replication, 98

vicep directory on server machinescontents listed, 47

VL Serverabout starting and stopping, 78as vlserver process, 77binary in /usr/afs/bin, 44Cache Manager preference ranks for, 257description, 9importance to transparent access, 9restarting after adding entry to server CellServDB file,


file, 63role in VLDB/volume header synchronization, 92runs on database server machine, 48when to contact, 77

VLDB, 9defining read-only site in, 98displaying entry

with volume header, 112displaying volume entry, 108intention flag set by VL Server, 116locking/unlocking entry, 130release status flags in volume entry, 108server machine interfaces registered

listed in sysid file, 45site count for volume, 108synchronizing with volume headers, 92, 116volume entry, 92volume type flags, 108

vldb.DB0 file, 46vldb.DBSYS1 file, 46VLLog file, 46

displaying, 88vlserver, see VL Server

binary in /usr/afs/bin, 44Vn file (data cache), 243vnode index, 95VolserLog file, 47

displaying, 88volserver, see Volume Server

binary in /usr/afs/bin, 44volume

as unit ofbackup, 6replication, 6resource management, 6

as unit of backup, 92as unit of replication, 92

automating creation of backup version, 100backing up using Backup System, 171backup, see backup volumeBackup System dump history, displaying, 184benefits for efficiency, 91correspondence with directory, 5counter in header for number of accesses, 111creating backup version of many at once, 99creating with uss, 278Creation date in volume header, 111defined, 91definition, 5direct access, 107displaying information about, 107dumping without AFS Backup System, 125entry in VLDB, 92flushing from data cache on client machine, 256grouping related on same partition, 21header, see volume headerin load balancing, 5, 91Last Update date in volume header, 111location, see volume locationmounting, 6

about, 93more than once, 93

moving, 114name, see volume nameoverwriting contents during Backup System restore, 188preserving contents during Backup System restore, 188quota, see volume quotaread-only, see read-only volumeread/write, see read/write volumeremoving

alternate commands, 124basic instructions, 123when removing user account, 314

renaming, 129replicating, 96restoring

using Backup System, 186with vos restore command, 127

root (root.afs and root.cell), 20root directory of, 6salvaging, 118separate for each top level directory, 19site, defined, 91size, displaying, 122symptoms of corruption, 118synchronizing VLDB and volume header, 116type to replicate, 21where to place replicated, 21

volume entry (Backup System)creating, 144defined, 133deleting, 146displaying, 145

volume entry (VLDB)


displaying, 108volume header

about, 92displaying

only, 109with VLDB entry, 112

in /vicep directories, 47synchronizing with VLDB, 92, 116

volume ID numberlearning

from volume name, 113learning from directory/file name, 113translating

to volume location, 113to volume name, 113

volume locationlearning from directory/file name, 114learning from volume name/ID number, 113

Volume Location Server, see VL Servervolume name

changingbasic instructions, 129when renaming user, 312

conventions, 93conventions for, 19learning

from directory/file name, 113from volume ID number, 113

restrictions, 20translating

to volume ID number, 113to volume location, 113

two required, 20volume quota

default for new volume, 93displaying

percent used, 122with volume &partition info, 122with volume size, 122

recorded in volume header, 111setting

on multiple volumes, 121on single volume, 121with uss, 279

Volume Serveras part of fs process, 75, 77binary in /usr/afs/bin, 44description, 9displaying log file, 88role in VLDB/volume header synchronization, 92when to contact, 75

volume set (Backup System)creating, 144defined, 133deleting, 145deleting volume entry, 146displaying, 145

volume entry, see volume entryVolumeItems file, 243vos commands

addsite, 98backup, 101backupsys, 102binary in /usr/afs/bin, 44changeaddr, 70create

basic instructions, 94when creating user account, 303

delentry, 124dump, 126examine

basic instructions, 112to learn volume ID, 113to learn volume name, 113

granting privilege for, 346listaddrs, 70listpart, 66listvldb

syntax, 108to learn volume location, 113

listvoloutput with -extended flag, 111output with -fast flag, 110output with -long flag, 110syntax, 109

lock, 130move

basic instructions, 115when removing file server machine disk, 67

mutual authentication, bypassing, 66partinfo, 94release

basic instructions, 99forcing new cloning with -f flag, 96

removebasic instructions, 124when removing user account, 314

remsite, 124rename

basic instructions, 129when changing username, 312

restoreto create new volume, 127to overwrite volume, 128

summary of functions, 77syncserv

effect, 117syntax, 117

syncvldbeffect, 116syntax, 117

unlock, 131unlockvldb, 131zap, 124


Ww ACL permission, 334weekly restart of BOS Server (automatic)

about, 24displaying and setting time, 87

which command, 61window

resizing scout display, 203write

ACL permission, see write ACL permissionoperations delayed from NFS clients, 351shorthand for ACL permissions, 334system call for files saved on AFS client, 14system call for files saved on NFS client, 351

Ws statistic from scout program, 202

XX instruction

uss template file, 286xstat as requirement for running afsmonitor, 215xstat data collection facility, 223

data collections, 224example commands, 224libxstat_cm.a library, 223libxstat_fs.a library, 223obtaining more information, 224xstat_cm_test example command, 225xstat_fs_test example command, 224

OpenAFS Administration Guide

Documents

Transcript of OpenAFS Administration Guide