OpenAFS
OpenAFS distributed network file system
Chapter 4: Common Definitions and Data Structures
This chapter discusses the definitions used in common by the File Server and the Cache Manager. They appear in the common.xg file, used by Rxgen to generate the C code instantiations of these definitions.

Section 4.1: File-Related Definitions

Section 4.1.1: struct AFSFid

This is the type for file system objects within AFS.

Fields
  • unsigned long Volume - This provides the identifier for the volume in which the object resides.
  • unsigned long Vnode - This specifies the index within the given volume corresponding to the object.
  • unsigned long Unique - This is a 'uniquifier' or generation number for the slot identified by the Vnode field.
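As a minimal sketch of how the three fields combine to name one object, the fragment below declares the structure with the field types listed above and compares two identifiers. The helper afsfid_equal() is hypothetical and is not part of the interface.

```c
#include <stdbool.h>

/* Field layout as listed above. */
struct AFSFid {
    unsigned long Volume;  /* volume in which the object resides */
    unsigned long Vnode;   /* index of the object within that volume */
    unsigned long Unique;  /* generation number for the vnode slot */
};

/* Hypothetical helper: two fids name the same object only if all three
 * fields match. Because Unique is a generation number for the slot, a
 * fid with the same Volume and Vnode but a different Unique is treated
 * here as naming a different object. */
static bool afsfid_equal(const struct AFSFid *a, const struct AFSFid *b)
{
    return a->Volume == b->Volume &&
           a->Vnode == b->Vnode &&
           a->Unique == b->Unique;
}
```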

Section 4.2: Callback-related Definitions

Section 4.2.1: Types of Callbacks

There are three types of callbacks defined by AFS-3:
  • EXCLUSIVE: This version of callback has not been implemented. Its intent was to allow a single Cache Manager to have exclusive rights on the associated file data.
  • SHARED: This callback type indicates that the status information kept by a Cache Manager for the associated file is up to date. All cached chunks from this file whose version numbers match the status information are thus guaranteed to also be up to date. This type of callback is non-exclusive, allowing any number of other Cache Managers to have callbacks on this file and cache chunks from the file.
  • DROPPED: This is used to indicate that the given callback promise has been cancelled by the issuing File Server. The Cache Manager is forced to mark the status of its cache entry as unknown, forcing it to stat the file the next time a user attempts to access any chunk from it.

Section 4.2.2: struct AFSCallBack

This is the canonical callback structure passed in many File Server RPC interface calls.
Fields
  • unsigned long CallBackVersion - Callback version number.
  • unsigned long ExpirationTime - Time when the callback expires, measured in seconds.
  • unsigned long CallBackType - The type of callback involved, one of EXCLUSIVE, SHARED, or DROPPED.

Section 4.2.3: Callback Arrays

AFS-3 sometimes does callbacks in bulk. Up to AFSCBMAX (50) callbacks can be handled at once. Layouts for the two related structures implementing callback arrays, struct AFSCBFids and struct AFSCBs, follow below. Note that the callback descriptor in slot i of the array in the AFSCBs structure applies to the file identifier contained in slot i in the fid array in the matching AFSCBFids structure.

Section 4.2.3.1: struct AFSCBFids


Fields

  • u_int AFSCBFids_len - Number of AFS file identifiers stored in the structure, up to a maximum of AFSCBMAX.
  • AFSFid *AFSCBFids_val - Pointer to the first element of the array of file identifiers.

Section 4.2.3.2: struct AFSCBs


Fields

  • u_int AFSCBs_len - Number of AFS callback descriptors stored in the structure, up to a maximum of AFSCBMAX.
  • AFSCallBack *AFSCBs_val - Pointer to the actual array of callback descriptors.
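The slot-for-slot pairing between the two arrays can be sketched as follows. The structure layouts follow the field lists above, with plain C types standing in for the Rxgen-generated typedefs. The helper callback_for_slot() is hypothetical, and its check that both arrays have the same length is an assumption made for the sketch rather than a documented requirement.

```c
#include <stddef.h>

#define AFSCBMAX 50  /* bulk limit quoted in Section 4.2.3 */

struct AFSFid { unsigned long Volume, Vnode, Unique; };

struct AFSCallBack {
    unsigned long CallBackVersion;
    unsigned long ExpirationTime;
    unsigned long CallBackType;
};

struct AFSCBFids {
    unsigned int AFSCBFids_len;
    struct AFSFid *AFSCBFids_val;
};

struct AFSCBs {
    unsigned int AFSCBs_len;
    struct AFSCallBack *AFSCBs_val;
};

/* Hypothetical helper: return the callback descriptor that applies to
 * fids->AFSCBFids_val[i], or NULL if the arrays are oversized, differ
 * in length (an assumption of this sketch), or i is out of range. */
static const struct AFSCallBack *
callback_for_slot(const struct AFSCBFids *fids, const struct AFSCBs *cbs,
                  unsigned int i)
{
    if (fids->AFSCBFids_len > AFSCBMAX ||
        fids->AFSCBFids_len != cbs->AFSCBs_len)
        return NULL;
    if (i >= fids->AFSCBFids_len)
        return NULL;
    return &cbs->AFSCBs_val[i];
}
```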

Section 4.3: Locking Definitions

Section 4.3.1: struct AFSDBLockDesc

This structure describes the state of an AFS lock.
Fields
  • char waitStates - Types of lockers waiting for the lock.
  • char exclLocked - Does anyone have a boosted, shared or write lock? (A boosted lock allows the holder to have data read-locked and then 'boost' up to a write lock on the data without ever relinquishing the lock.)
  • char readersReading - Number of readers that actually hold a read lock on the associated object.
  • char numWaiting - Total number of parties waiting to acquire this lock in some fashion.

Section 4.3.2: struct AFSDBCacheEntry

This structure defines the description of a Cache Manager local cache entry, as made accessible via the RXAFSCB_GetCE() callback RPC call. Note that File Servers do not make the above call. Rather, client debugging programs (such as cmdebug) are the agents which call RXAFSCB_GetCE().
Fields
  • long addr - Memory location in the Cache Manager where this description is located.
  • long cell - Cell part of the fid.
  • AFSFid netFid - Network (standard) part of the fid
  • long Length - Number of bytes in the cache entry.
  • long DataVersion - Data version number for the contents of the cache entry.
  • struct AFSDBLockDesc lock - Status of the lock object controlling access to this cache entry.
  • long callback - Index in callback records for this object.
  • long cbExpires - Time when the callback expires.
  • short refCount - General reference count.
  • short opens - Number of opens performed on this object.
  • short writers - Number of writers active on this object.
  • char mvstat - The file classification, indicating one of normal file, mount point, or volume root.
  • char states - Remembers the state of the given file with a set of bits indicating, from lowest-order to highest order: stat info valid, read-only file, mount point valid, pending core file, wait-for-store, and mapped file.

Section 4.3.3: struct AFSDBLock

This is a fuller description of an AFS lock, including a string name used to identify it.
Fields
  • char name[16] - String name of the lock.
  • struct AFSDBLockDesc lock - Contents of the lock itself.

Section 4.4: Miscellaneous Definitions

Section 4.4.1: Opaque structures

A maximum size for opaque structures passed via the File Server interface is defined as AFSOPAQUEMAX. Currently, this is set to 1,024 bytes. The AFSOpaque typedef is defined for use by those parameters that wish their contents to travel completely uninterpreted across the network.

Section 4.4.2: String Lengths

Two common definitions used to specify basic AFS string lengths are AFSNAMEMAX and AFSPATHMAX. AFSNAMEMAX places an upper limit of 256 characters on such things as file and directory names passed as parameters. AFSPATHMAX defines the longest pathname expected by the system, composed of slash-separated instances of the individual directory and file names mentioned above. The longest acceptable pathname is currently set to 1,024 characters.

Section 4.1: Introduction

This chapter documents three packages defined directly in support of the Rx facility.
  • rx_queue: Doubly-linked queue package.
  • rx_clock: Clock package, using the 4.3BSD interval timer.
  • rx_event: Future events package.
References to constants, structures, and functions defined by these support packages will appear in the following API chapter.

Section 4.2: The rx_queue Package

This package provides a doubly-linked queue structure, along with a full suite of related operations. The main concern behind the coding of this facility was efficiency. All functions are implemented as macros, and it is suggested that only simple expressions be used for all parameters, since a macro may evaluate an argument more than once.
The rx_queue facility is defined by the rx_queue.h include file. Some macros visible in this file are intended for rx_queue internal use only. An understanding of these "hidden" macros is important, so they will also be described by this document.

Section 4.2.1: struct queue

The queue structure provides the linkage information required to maintain a queue of objects. The queue structure is prepended to any user-defined data type which is to be organized in this fashion.
Fields
  • struct queue *prev - Pointer to the previous queue header.
  • struct queue *next - Pointer to the next queue header.
Note that a null Rx queue consists of a single struct queue object whose next and previous pointers refer to itself.

Section 4.2.2: Internal Operations

This section describes the internal operations defined for Rx queues. They will be referenced by the external operations documented in Section 4.2.3.

Section 4.2.2.1: _Q(): Coerce type to a queue element

#define _Q(x) ((struct queue *)(x))
This operation coerces the user structure named by x to a queue element. Any user structure using the rx_queue package must have a struct queue as its first field.
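The requirement that struct queue be the first field can be illustrated with a short sketch. The structure and the _Q() macro are taken from the text above. The type struct item and the two helper functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>

struct queue {
    struct queue *prev;
    struct queue *next;
};

#define _Q(x) ((struct queue *)(x))

/* Hypothetical user type: struct queue is the first member, so a
 * pointer to the item is also a valid pointer to its linkage. */
struct item {
    struct queue q;
    int value;
};

/* A null (empty) queue is a lone header whose pointers refer to itself. */
static void make_null_queue(struct queue *head)
{
    head->prev = head->next = head;
}

static bool is_null_queue(const struct queue *head)
{
    return head->next == head && head->prev == head;
}
```

With these definitions, _Q(&it) and &it.q refer to the same address for any struct item it.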

Section 4.2.2.2: _QA(): Add a queue element before/after another element

#define _QA(q,i,a,b) (((i->a=q->a)->b=i)->b=q, q->a=i)
This operation adds the queue element referenced by i either before or after a queue element represented by q. If the (a, b) argument pair corresponds to an element's (next, prev) fields, the new element at i will be linked after q. If the (a, b) argument pair corresponds to an element's (prev, next) fields, the new element at i will be linked before q.
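The effect of the two argument orders can be seen in a small sketch that wraps the macro in two functions. The wrapper names link_after() and link_before() are hypothetical. The macro itself is reproduced from the definition above.

```c
struct queue {
    struct queue *prev;
    struct queue *next;
};

#define _QA(q,i,a,b) (((i->a=q->a)->b=i)->b=q, q->a=i)

/* Link element i immediately after q: (a, b) = (next, prev). */
static void link_after(struct queue *q, struct queue *i)
{
    _QA(q, i, next, prev);
}

/* Link element i immediately before q: (a, b) = (prev, next). */
static void link_before(struct queue *q, struct queue *i)
{
    _QA(q, i, prev, next);
}
```

When q is a queue header, linking after it places the element at the head of the queue, and linking before it places the element at the tail.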

Section 4.2.2.3: _QR(): Remove a queue element

#define _QR(i) ((_Q(i)->prev->next=_Q(i)->next)->prev=_Q(i)->prev)
This operation removes the queue element referenced by i from its queue. The prev and next fields within queue element i itself are not updated to reflect the fact that it is no longer part of the queue.

Section 4.2.2.4: _QS(): Splice two queues together

#define _QS(q1,q2,a,b) if (queue_IsEmpty(q2)); else ((((q2->a->b=q1)->a->b=q2->b)->a=q1->a, q1->a=q2->a), queue_Init(q2))
This operation takes the queues identified by q1 and q2 and splices them together into a single queue. The order in which the two queues are joined is determined by the a and b arguments. If the (a, b) argument pair corresponds to q1's (next, prev) fields, then the elements of q2 are prepended to q1. If the (a, b) argument pair corresponds to q1's (prev, next) fields, then the elements of q2 are appended to q1. In either case, q2 is reinitialized to an empty queue.
This internal _QS() routine uses two exported queue operations, namely queue_Init() and queue_IsEmpty(), defined in Sections 4.2.3.1 and 4.2.3.16 respectively below.
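To make the pointer updates easier to follow, the sketch below writes out the same assignments that the splice macro performs, one per statement, for each of the two argument orders. The function names are hypothetical, and the functions are an expanded illustration rather than the package's own code.

```c
struct queue {
    struct queue *prev;
    struct queue *next;
};

static void q_init(struct queue *q) { q->prev = q->next = q; }

/* Hypothetical helper used to build test queues: add i at the tail. */
static void q_append(struct queue *q, struct queue *i)
{
    i->prev = q->prev;
    i->next = q;
    q->prev->next = i;
    q->prev = i;
}

/* (a, b) = (next, prev): q2's elements go to the head of q1. */
static void splice_prepend(struct queue *q1, struct queue *q2)
{
    if (q2->next == q2)
        return;                      /* q2 is empty: nothing to do */
    q2->next->prev = q1;             /* first of q2 now follows header q1 */
    q1->next->prev = q2->prev;       /* old first of q1 follows last of q2 */
    q2->prev->next = q1->next;       /* last of q2 leads to old first of q1 */
    q1->next = q2->next;             /* q1 now starts with q2's first */
    q_init(q2);
}

/* (a, b) = (prev, next): q2's elements go to the tail of q1. */
static void splice_append(struct queue *q1, struct queue *q2)
{
    if (q2->next == q2)
        return;
    q2->prev->next = q1;             /* last of q2 now precedes header q1 */
    q1->prev->next = q2->next;       /* old last of q1 leads to first of q2 */
    q2->next->prev = q1->prev;       /* first of q2 follows old last of q1 */
    q1->prev = q2->prev;             /* q1 now ends with q2's last */
    q_init(q2);
}
```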

Section 4.2.3: External Operations

Section 4.2.3.1: queue_Init(): Initialize a queue header

#define queue_Init(q) (_Q(q))->prev = (_Q(q))->next = (_Q(q))
The queue header referred to by the q argument is initialized so that it describes a null (empty) queue. A queue head is simply a queue element.

Section 4.2.3.2: queue_Prepend(): Put an element at the head of a queue

#define queue_Prepend(q,i) _QA(_Q(q),_Q(i),next,prev)
Place queue element i at the head of the queue denoted by q. The new queue element, i, should not currently be on any queue.

Section 4.2.3.3: queue_Append(): Put an element at the tail of a queue

#define queue_Append(q,i) _QA(_Q(q),_Q(i),prev,next)
Place queue element i at the tail of the queue denoted by q. The new queue element, i, should not currently be on any queue.

Section 4.2.3.4: queue_InsertBefore(): Insert a queue element before another element

#define queue_InsertBefore(i1,i2) _QA(_Q(i1),_Q(i2),prev,next)
Insert queue element i2 before element i1 in i1's queue. The new queue element, i2, should not currently be on any queue.

Section 4.2.3.5: queue_InsertAfter(): Insert a queue element after another element

#define queue_InsertAfter(i1,i2) _QA(_Q(i1),_Q(i2),next,prev)
Insert queue element i2 after element i1 in i1's queue. The new queue element, i2, should not currently be on any queue.

Section 4.2.3.6: queue_SplicePrepend(): Splice one queue before another

#define queue_SplicePrepend(q1,q2) _QS(_Q(q1),_Q(q2),next,prev)
Splice the members of the queue located at q2 to the beginning of the queue located at q1, reinitializing queue q2.

Section 4.2.3.7: queue_SpliceAppend(): Splice one queue after another

#define queue_SpliceAppend(q1,q2) _QS(_Q(q1),_Q(q2),prev,next)
Splice the members of the queue located at q2 to the end of the queue located at q1, reinitializing queue q2. Note that the implementation of queue_SpliceAppend() is identical to that of queue_SplicePrepend() except for the order of the next and prev arguments to the internal queue splicer, _QS().

Section 4.2.3.8: queue_Replace(): Replace the contents of a queue with that of another

#define queue_Replace(q1,q2) (*_Q(q1) = *_Q(q2), \
    _Q(q1)->next->prev = _Q(q1)->prev->next = _Q(q1), \
    queue_Init(q2))
Replace the contents of the queue located at q1 with the contents of the queue located at q2. The prev and next fields from q2 are copied into the queue object referenced by q1, and the appropriate element pointers are reassigned. After the replacement has occurred, the queue header at q2 is reinitialized.

Section 4.2.3.9: queue_Remove(): Remove an element from its queue

#define queue_Remove(i) (_QR(i), _Q(i)->next = 0)
This function removes the queue element located at i from its queue. The next field for the removed entry is zeroed. Note that multiple removals of the same queue item are not supported.

Section 4.2.3.10: queue_MoveAppend(): Move an element from its queue to the end of another queue

#define queue_MoveAppend(q,i) (_QR(i), queue_Append(q,i))
This macro removes the queue element located at i from its current queue. Once removed, the element at i is appended to the end of the queue located at q.

Section 4.2.3.11: queue_MovePrepend(): Move an element from its queue to the head of another queue

#define queue_MovePrepend(q,i) (_QR(i), queue_Prepend(q,i))
This macro removes the queue element located at i from its current queue. Once removed, the element at i is inserted at the head of the queue located at q.

Section 4.2.3.12: queue_First(): Return the first element of a queue, coerced to a particular type

#define queue_First(q,s) ((struct s *)_Q(q)->next)
Return a pointer to the first element of the queue located at q. The returned pointer value is coerced to conform to the given s structure. Note that a properly coerced pointer to the queue head is returned if q is empty.

Section 4.2.3.13: queue_Last(): Return the last element of a queue, coerced to a particular type

#define queue_Last(q,s) ((struct s *)_Q(q)->prev)
Return a pointer to the last element of the queue located at q. The returned pointer value is coerced to conform to the given s structure. Note that a properly coerced pointer to the queue head is returned if q is empty.

Section 4.2.3.14: queue_Next(): Return the next element of a queue, coerced to a particular type

#define queue_Next(i,s) ((struct s *)_Q(i)->next)
Return a pointer to the queue element occurring after the element located at i. The returned pointer value is coerced to conform to the given s structure. Note that a properly coerced pointer to the queue head is returned if item i is the last in its queue.

Section 4.2.3.15: queue_Prev(): Return the previous element of a queue, coerced to a particular type

#define queue_Prev(i,s) ((struct s *)_Q(i)->prev)
Return a pointer to the queue element occurring before the element located at i. The returned pointer value is coerced to conform to the given s structure. Note that a properly coerced pointer to the queue head is returned if item i is the first in its queue.

Section 4.2.3.16: queue_IsEmpty(): Is the given queue empty?

#define queue_IsEmpty(q) (_Q(q)->next == _Q(q))
Return a non-zero value if the queue located at q does not have any elements in it. In this case, the queue consists solely of the queue header at q whose next and prev fields reference itself.

Section 4.2.3.17: queue_IsNotEmpty(): Is the given queue not empty?

#define queue_IsNotEmpty(q) (_Q(q)->next != _Q(q))
Return a non-zero value if the queue located at q has at least one element in it other than the queue header itself.

Section 4.2.3.18: queue_IsOnQueue(): Is an element currently queued?

#define queue_IsOnQueue(i) (_Q(i)->next != 0)
This macro returns a non-zero value if the queue item located at i is currently a member of a queue. This is determined by examining its next field. If it is non-null, the element is considered to be queued. Note that any element operated on by queue_Remove() (Section 4.2.3.9) will have had its next field zeroed. Hence, it would cause a zero return from this call.
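The interaction between the removal macro and this membership test can be sketched as below, using the macro definitions given in this chapter. The struct item type and the remove_demo() function are hypothetical. Note that the sketch zeroes the linkage of the new item itself, since the test is only meaningful for elements whose next field starts out null.

```c
struct queue {
    struct queue *prev;
    struct queue *next;
};

#define _Q(x) ((struct queue *)(x))
#define _QA(q,i,a,b) (((i->a=q->a)->b=i)->b=q, q->a=i)
#define _QR(i) ((_Q(i)->prev->next=_Q(i)->next)->prev=_Q(i)->prev)
#define queue_Init(q) (_Q(q))->prev = (_Q(q))->next = (_Q(q))
#define queue_Append(q,i) _QA(_Q(q),_Q(i),prev,next)
#define queue_Remove(i) (_QR(i), _Q(i)->next = 0)
#define queue_IsEmpty(q) (_Q(q)->next == _Q(q))
#define queue_IsOnQueue(i) (_Q(i)->next != 0)

/* Hypothetical queueable type. */
struct item {
    struct queue q;
    int value;
};

/* Returns 1 if the element tests as queued after being appended and as
 * not queued after being removed, leaving the queue empty. */
static int remove_demo(void)
{
    struct queue head;
    struct item it = { { 0, 0 }, 42 };  /* linkage starts out null */
    int before, after;

    queue_Init(&head);
    queue_Append(&head, &it);
    before = queue_IsOnQueue(&it);      /* non-zero: next points at head */
    queue_Remove(&it);
    after = queue_IsOnQueue(&it);       /* zero: next was cleared */
    return before && !after && queue_IsEmpty(&head);
}
```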

Section 4.2.3.19: queue_IsFirst(): Is an element the first on a queue?

#define queue_IsFirst(q,i) (_Q(q)->next == _Q(i))
This macro returns a non-zero value if the queue item located at i is the first element in the queue denoted by q.

Section 4.2.3.20: queue_IsLast(): Is an element the last on a queue?

#define queue_IsLast(q,i) (_Q(q)->prev == _Q(i))
This macro returns a non-zero value if the queue item located at i is the last element in the queue denoted by q.

Section 4.2.3.21: queue_IsEnd(): Is an element the end of a queue?

#define queue_IsEnd(q,i) (_Q(q) == _Q(i))
This macro returns a non-zero value if the queue item located at i is the end of the queue located at q. Basically, it determines whether a queue element in question is also the queue header structure itself, and thus does not represent an actual queue element. This function is useful for terminating an iterative sweep through a queue, identifying when the search has wrapped to the queue header.

Section 4.2.3.22: queue_Scan(): for loop test for scanning a queue in a forward direction

#define queue_Scan(q, qe, next, s) \
    (qe) = queue_First(q, s), next = queue_Next(qe, s); \
    !queue_IsEnd(q, qe); \
    (qe) = (next), next = queue_Next(qe, s)
This macro supplies the three control expressions of a for loop intended to scan through each element in the queue located at q. The qe argument is used as the for loop variable. The next argument is used to store the next value for qe in the upcoming loop iteration. The s argument provides the name of the structure to which each queue element is to be coerced. Thus, the values provided for the qe and next arguments must be of type (struct s *).
An example of how queue_Scan() may be used appears in the code fragment below. It declares a structure named mystruct, which is suitable for queueing. This queueable structure is composed of the queue pointers themselves followed by an integer value. The actual queue header is kept in demoQueue, and the currItemP and nextItemP variables are used to step through the demoQueue. The queue_Scan() macro is used in the for loop to generate references in currItemP to each queue element in turn for each iteration. The loop is used to increment every queued structure's myval field by one.
 struct mystruct { 
        struct queue q; 
        int myval; 
 }; 
 struct queue demoQueue; 
 struct mystruct *currItemP, *nextItemP; 
 ... 
 for (queue_Scan(&demoQueue, currItemP, nextItemP, mystruct)) { 
        currItemP->myval++; 
 } 
Note that extra initializers can be added before the body of the queue_Scan() invocation above, and extra expressions can be added afterwards.
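Because the scan macro captures the successor in its next argument before the loop body runs, the body may unlink the current element without derailing the traversal. The sketch below illustrates this under that reading of the macro. It reuses the macro definitions from this chapter (with line continuations added), while struct item, drop_even(), and queue_length() are hypothetical names.

```c
struct queue {
    struct queue *prev;
    struct queue *next;
};

#define _Q(x) ((struct queue *)(x))
#define _QA(q,i,a,b) (((i->a=q->a)->b=i)->b=q, q->a=i)
#define _QR(i) ((_Q(i)->prev->next=_Q(i)->next)->prev=_Q(i)->prev)
#define queue_Init(q) (_Q(q))->prev = (_Q(q))->next = (_Q(q))
#define queue_Append(q,i) _QA(_Q(q),_Q(i),prev,next)
#define queue_Remove(i) (_QR(i), _Q(i)->next = 0)
#define queue_First(q,s) ((struct s *)_Q(q)->next)
#define queue_Next(i,s) ((struct s *)_Q(i)->next)
#define queue_IsEnd(q,i) (_Q(q) == _Q(i))
#define queue_Scan(q, qe, next, s) \
    (qe) = queue_First(q, s), next = queue_Next(qe, s); \
    !queue_IsEnd(q, qe); \
    (qe) = (next), next = queue_Next(qe, s)

/* Hypothetical queueable type. */
struct item {
    struct queue q;
    int value;
};

/* Remove every element with an even value; return how many were removed.
 * The successor is already saved in nxt when the body runs, so zeroing
 * cur's next field inside queue_Remove() does not break the scan. */
static int drop_even(struct queue *head)
{
    struct item *cur, *nxt;
    int removed = 0;

    for (queue_Scan(head, cur, nxt, item)) {
        if (cur->value % 2 == 0) {
            queue_Remove(cur);
            removed++;
        }
    }
    return removed;
}

/* Count the elements currently on the queue. */
static int queue_length(struct queue *head)
{
    struct item *cur, *nxt;
    int n = 0;

    for (queue_Scan(head, cur, nxt, item))
        n++;
    return n;
}
```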

Section 4.2.3.23: queue_ScanBackwards(): for loop test for scanning a queue in a reverse direction

#define queue_ScanBackwards(q, qe, prev, s) \
    (qe) = queue_Last(q, s), prev = queue_Prev(qe, s); \
    !queue_IsEnd(q, qe); \
    (qe) = prev, prev = queue_Prev(qe, s)
This macro is identical to the queue_Scan() macro described above in Section 4.2.3.22 except for the fact that the given queue is scanned backwards, starting at the last item in the queue.

Section 4.3: The rx_clock Package

This package maintains a clock which is independent of the time of day. It uses the unix 4.3BSD interval timer (e.g., getitimer(), setitimer()) in ITIMER_REAL mode. Its definition and interface may be found in the rx_clock.h include file.

Section 4.3.1: struct clock

This structure is used to represent a clock value as understood by this package. It consists of two fields, storing the number of seconds and microseconds that have elapsed since the associated clock_Init() routine was called.
Fields
  • long sec - Seconds since call to clock_Init().
  • long usec - Microseconds since call to clock_Init().

Section 4.3.2: clock_nUpdates

The integer-valued clock_nUpdates is a variable exported by the rx_clock facility. It records the number of times the clock value is actually updated. It is bumped each time the clock_UpdateTime() routine is called, as described in Section 4.3.3.2.

Section 4.3.3: Operations

Section 4.3.3.1: clock_Init(): Initialize the clock package

This routine uses the unix setitimer() call to initialize the unix interval timer. If the setitimer() call fails, an error message will appear on stderr, and an exit(1) will be executed.

Section 4.3.3.2: clock_UpdateTime(): Compute the current time

The clock_UpdateTime() function calls the unix getitimer() routine in order to update the current time. The exported clock_nUpdates variable is incremented each time the clock_UpdateTime() routine is called.

Section 4.3.3.3: clock_GetTime(): Return the current clock time

This macro updates the current time if necessary, and returns the current time into the cv argument, which is declared to be of type (struct clock *).

Section 4.3.3.4: clock_Sec(): Get the current clock time, truncated to seconds

This macro returns the long value of the sec field of the current time. The recorded time is updated if necessary before the above value is returned.

Section 4.3.3.5: clock_ElapsedTime(): Measure milliseconds between two given clock values

This macro returns the elapsed time in milliseconds between the two clock structure pointers provided as arguments, cv1 and cv2.
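One way such a millisecond difference could be computed from two clock values is sketched below. This is not the package's macro. The function name elapsed_ms() is hypothetical, and the sketch assumes that cv1 is the earlier value and cv2 the later one, which the text above does not state.

```c
/* Clock value as described in Section 4.3.1. */
struct clock {
    long sec;
    long usec;
};

/* Hypothetical helper: milliseconds from *cv1 (assumed earlier) to *cv2
 * (assumed later). The microsecond difference may be negative, in which
 * case it simply reduces the contribution of the seconds difference. */
static long elapsed_ms(const struct clock *cv1, const struct clock *cv2)
{
    return (cv2->sec - cv1->sec) * 1000L +
           (cv2->usec - cv1->usec) / 1000L;
}
```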

Section 4.3.3.6: clock_Advance(): Advance the recorded clock time by a specified clock value

This macro takes a single (struct clock *) pointer argument, cv, and adds this clock value to the internal clock value maintained by the package.

Section 4.3.3.7: clock_Gt(): Is a clock value greater than another?

This macro takes two parameters of type (struct clock *), a and b. It returns a nonzero value if the a parameter points to a clock value which is later than the one pointed to by b.

Section 4.3.3.8: clock_Ge(): Is a clock value greater than or equal to another?

This macro takes two parameters of type (struct clock *), a and b. It returns a nonzero value if the a parameter points to a clock value which is greater than or equal to the one pointed to by b.

Section 4.3.3.9: clock_Eq(): Are two clock values equal?

This macro takes two parameters of type (struct clock *), a and b. It returns a non-zero value if the clock values pointed to by a and b are equal.

Section 4.3.3.10: clock_Le(): Is a clock value less than or equal to another?

This macro takes two parameters of type (struct clock *), a and b. It returns a nonzero value if the a parameter points to a clock value which is less than or equal to the one pointed to by b.

Section 4.3.3.11: clock_Lt(): Is a clock value less than another?

This macro takes two parameters of type (struct clock *), a and b. It returns a nonzero value if the a parameter points to a clock value which is less than the one pointed to by b.

Section 4.3.3.12: clock_IsZero(): Is a clock value zero?

This macro takes a single parameter of type (struct clock *), c. It returns a non-zero value if the c parameter points to a clock value which is equal to zero.

Section 4.3.3.13: clock_Zero(): Set a clock value to zero

This macro takes a single parameter of type (struct clock *), c. It sets the given clock value to zero.

Section 4.3.3.14: clock_Add(): Add two clock values together

This macro takes two parameters of type (struct clock *), c1 and c2. It adds the value of the time in c2 to c1. Both clock values must be positive.

Section 4.3.3.15: clock_Sub(): Subtract two clock values

This macro takes two parameters of type (struct clock *), c1 and c2. It subtracts the value of the time in c2 from c1. The time pointed to by c2 should be less than the time pointed to by c1.
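The addition and subtraction operations both have to keep the microsecond field within one second. The sketch below shows one way to do that with a carry or borrow of a single second. The function names clk_add() and clk_sub() are hypothetical, and the code is an illustration of the arithmetic, not the package's macros. It relies on the stated preconditions: positive inputs for addition, and c2 no later than c1 for subtraction.

```c
/* Clock value as described in Section 4.3.1. */
struct clock {
    long sec;
    long usec;
};

/* Hypothetical helper: c1 += c2, carrying overflowing microseconds. */
static void clk_add(struct clock *c1, const struct clock *c2)
{
    c1->sec += c2->sec;
    c1->usec += c2->usec;
    if (c1->usec >= 1000000L) {
        c1->usec -= 1000000L;
        c1->sec++;
    }
}

/* Hypothetical helper: c1 -= c2, borrowing a second when needed. */
static void clk_sub(struct clock *c1, const struct clock *c2)
{
    c1->sec -= c2->sec;
    c1->usec -= c2->usec;
    if (c1->usec < 0) {
        c1->usec += 1000000L;
        c1->sec--;
    }
}
```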

Section 4.3.3.16: clock_Float(): Convert a clock time into floating point

This macro takes a single parameter of type (struct clock *), c. It expresses the given clock value as a floating point number.

Section 4.4: The rx_event Package

This package maintains an event facility. An event is defined to be something that happens at or after a specified clock time, unless cancelled prematurely. The clock times used are those provided by the rx_clock facility described in Section 4.3 above. A user routine associated with an event is called with the appropriate arguments when that event occurs. There are some restrictions on user routines associated with such events. First, this user-supplied routine should not cause process preemption. Also, the event passed to the user routine is still resident on the event queue at the time of invocation. The user must not remove this event explicitly (via rxevent_Cancel(), see below). However, the user routine may remove or schedule any other event at this time.
The events recorded by this package are kept queued in order of expiration time, so that the first entry in the queue corresponds to the event which is the first to expire. This interface is defined by the rx_event.h include file.

Section 4.4.1: struct rxevent

This structure defines the format of an Rx event record.
Fields
  • struct queue junk - The queue to which this event belongs.
  • struct clock eventTime - The clock time recording when this event comes due.
  • int (*func)() - The user-supplied function to call upon expiration.
  • char *arg - The first argument to the (*func)() function above.
  • char *arg1 - The second argument to the (*func)() function above.

Section 4.4.2: Operations

This section covers the interface routines provided for the Rx event package.

Section 4.4.2.1: rxevent_Init(): Initialize the event package

The rxevent_Init() routine takes two arguments. The first, nEvents, is an integer-valued parameter which specifies the number of event structures to allocate at one time. This specifies the appropriate granularity of memory allocation by the event package. The second parameter, scheduler, is a pointer to an integer-valued function. This function is to be called when an event is posted (added to the set of events managed by the package) that is scheduled to expire before any other existing event.
This routine sets up future event allocation block sizes, initializes the queues used to manage active and free event structures, and records that an initialization has occurred. Thus, this function may be safely called multiple times.

Section 4.4.2.2: rxevent_Post(): Schedule an event

This function constructs a new event based on the information included in its parameters and then schedules it. The rxevent_Post() routine takes four parameters. The first is named when, and is of type (struct clock *). It specifies the clock time at which the event is to occur. The second parameter is named func and is a pointer to the integer-valued function to associate with the event that will be created. When the event comes due, this function will be executed by the event package. The next two arguments to rxevent_Post() are named arg and arg1, and are both of type (char *). They serve as the two arguments that will be supplied to the func routine when the event comes due.
If the given event is set to take place before any other event currently posted, the scheduler routine established when the rxevent_Init() routine was called will be executed. This gives the application a chance to react to this new event in a reasonable way. One might expect that this scheduler routine will alter sleep times used by the application to make sure that it executes in time to handle the new event.
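The posting behavior described above, insertion in expiration order plus a scheduler call when the new event becomes the earliest, can be sketched with a simplified list. This is not the package's implementation: it uses a singly-linked list in place of the rx_queue linkage, and every name in it (struct event, pending, scheduler, post_event(), note_wakeup()) is hypothetical.

```c
#include <stddef.h>

struct clock {
    long sec;
    long usec;
};

/* Simplified stand-in for an event record. */
struct event {
    struct event *next;
    struct clock eventTime;
    int (*func)(char *arg, char *arg1);
    char *arg;
    char *arg1;
};

static struct event *pending;        /* sorted, soonest event first */
static int (*scheduler)(void);       /* called when the head changes */

/* Example scheduler that only counts how often it is invoked. */
static int wakeups;
static int note_wakeup(void) { wakeups++; return 0; }

static int clock_lt(const struct clock *a, const struct clock *b)
{
    return a->sec < b->sec || (a->sec == b->sec && a->usec < b->usec);
}

/* Fill in ev and insert it so the list stays ordered by eventTime.
 * Events with equal times keep their posting order. */
static void post_event(struct event *ev, const struct clock *when,
                       int (*func)(char *, char *), char *arg, char *arg1)
{
    struct event **pp = &pending;

    ev->eventTime = *when;
    ev->func = func;
    ev->arg = arg;
    ev->arg1 = arg1;
    while (*pp != NULL && !clock_lt(when, &(*pp)->eventTime))
        pp = &(*pp)->next;
    ev->next = *pp;
    *pp = ev;

    /* The new event expires before all others: let the application react. */
    if (pending == ev && scheduler != NULL)
        (void)scheduler();
}
```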

Section 4.4.2.3: rxevent_Cancel_1(): Cancel an event (internal use)

This routine removes an event from the set managed by this package. It takes a single parameter named ev of type (struct rxevent *). The ev argument identifies the pending event to be cancelled.
The rxevent_Cancel_1() routine should never be called directly. Rather, it should be accessed through the rxevent_Cancel() macro, described in Section 4.4.2.4 below.

Section 4.4.2.4: rxevent_Cancel(): Cancel an event (external use)

This macro is the proper way to call the rxevent_Cancel_1() routine described in Section 4.4.2.3 above. Like rxevent_Cancel_1(), it takes a single argument. This event_ptr argument is of type (struct rxevent *), and identifies the pending event to be cancelled. This macro first checks to see if event_ptr is null. If not, it calls rxevent_Cancel_1() to perform the real work. The event_ptr argument is zeroed after the cancellation operation completes.

Section 4.4.2.5: rxevent_RaiseEvents(): Process all expired events

This function processes all events that have expired relative to the current clock time maintained by the event package. Each qualifying event is removed from the queue in order, and its user-supplied routine (func()) is executed with the associated arguments.
The rxevent_RaiseEvents() routine takes a single output parameter named next, defined to be of type (struct clock *). Upon completion of rxevent_RaiseEvents(), the relative time to the next event due to expire is placed in next. This knowledge may be used to calculate the amount of sleep time before more event processing is needed. If there is no recorded event which is still pending at this point, rxevent_RaiseEvents() returns a zeroed clock value into next.

Section 4.4.2.6: rxevent_TimeToNextEvent(): Get amount of time until the next event expires

This function returns the time between the current clock value as maintained by the event package and the next event's expiration time. This information is placed in the single output argument, interval, defined to be of type (struct clock *). The rxevent_TimeToNextEvent() function returns integer-valued quantities. If there are no scheduled events, a zero is returned. If there are one or more scheduled events, a 1 is returned. If zero is returned, the interval argument is not updated.
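The return convention described above (0 with the output untouched, or 1 with the output filled in) can be sketched as follows. The function time_to_next() is hypothetical. It takes the current time and the soonest expiry as explicit parameters, whereas the real routine consults state kept inside the event package.

```c
#include <stddef.h>

struct clock {
    long sec;
    long usec;
};

/* Hypothetical stand-in: soonest is the expiry time of the earliest
 * pending event, or NULL when nothing is scheduled. Returns 0 and
 * leaves *interval alone if there is no event, otherwise stores
 * soonest - now in *interval and returns 1. */
static int time_to_next(const struct clock *now, const struct clock *soonest,
                        struct clock *interval)
{
    if (soonest == NULL)
        return 0;
    interval->sec = soonest->sec - now->sec;
    interval->usec = soonest->usec - now->usec;
    if (interval->usec < 0) {        /* borrow one second */
        interval->usec += 1000000L;
        interval->sec--;
    }
    return 1;
}
```

A caller can therefore use the return value to decide whether interval holds a meaningful sleep time.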

Section 4.1: Introduction

The Volume Server allows administrative tasks and probes to be performed on the set of AFS volumes residing on the machine on which it is running. As described in Chapter 2, a distributed database holding volume location info, the VLDB, is used by client applications to locate these volumes. Volume Server functions are typically invoked either directly from authorized users via the vos utility or by the AFS backup system.
This chapter briefly discusses various aspects of the Volume Server's architecture. First, the high-level on-disk representation of volumes is covered. Then, the transactions used in conjunction with volume operations are examined. Then, the program implementing the Volume Server, volserver, is considered. The nature and format of the log file kept by the Volume Server rounds out the description. As with all AFS servers, the Volume Server uses the Rx remote procedure call package for communication with its clients.

Section 4.2: Disk Representation

For each volume on an AFS partition, there exists a file visible in the unix name space which describes the contents of that volume. By convention, each of these files is named by concatenating a prefix string, "V", the numerical volume ID, and the postfix string ".vol". Thus, file V0536870918.vol describes the volume whose numerical ID is 0536870918. Internally, each per-volume descriptor file has such fields as a version number, the numerical volume ID, and the numerical parent ID (useful for read-only or backup volumes). It also has a list of related inodes, namely files which are not visible from the unix name space (i.e., they do not appear as entries in any unix directory object). The set of important related inodes are:
  • Volume info inode: This field identifies the inode which hosts the on-disk representation of the volume's header. It is very similar to the information pointed to by the volume field of the struct volser_trans defined in Section 5.4.1, recording important status information for the volume.
  • Large vnode index inode: This field identifies the inode which holds the list of vnode identifiers for all directory objects residing within the volume. These are "large" since they must also hold the Access Control List (ACL) information for the given AFS directory.
  • Small vnode index inode: This field identifies the inode which holds the list of vnode identifiers for all non-directory objects hosted by the volume.
All of the actual files and directories residing within an AFS volume, as identified by the contents of the large and small vnode index inodes, are also free-floating inodes, not appearing in the conventional unix name space. This is the reason the vendor-supplied fsck program should not be run on partitions containing AFS volumes. Since the inodes making up AFS files and directories, as well as the inodes serving as volume indices for them, are not mapped to any directory, the standard fsck program would throw away all of these "unreferenced" inodes. Thus, a special version of fsck is provided that recognizes partitions containing AFS volumes as well as standard unix partitions.
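The file naming convention described at the start of this section (the prefix "V", the numerical volume ID, and the postfix ".vol") can be sketched as below. Both function names are hypothetical. The zero-padding of the ID to ten digits is an assumption inferred from the single example V0536870918.vol, and the recognizer only checks the prefix and postfix, not the digits in between.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the descriptor file name for a volume ID.
 * The ten-digit zero padding is an assumption based on the example
 * above. Returns 0 on success, -1 if buf is too small. */
static int volume_header_name(unsigned long volid, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "V%010lu.vol", volid);

    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Hypothetical helper: loose check that a name starts with "V", ends
 * with ".vol", and has at least one character in between. */
static int looks_like_volume_header(const char *name)
{
    size_t len = strlen(name);

    return len > 5 && name[0] == 'V' &&
           strcmp(name + len - 4, ".vol") == 0;
}
```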

Section 4.3: Locking Definitions

Each individual volume operation is carried out by the Volume Server as a transaction, but not in the atomic sense of the word. Logically, creating a Volume Server transaction can be equated with performing an "exclusive open" on the given volume before beginning the actual work of the desired volume operation. No other Volume Server (or File Server) operation is allowed on the opened volume until the transaction is terminated. Thus, transactions in the context of the Volume Server serve to provide mutual exclusion without any of the normal atomicity guarantees. Volumes maintain enough internal state to enable recovery from interrupted or failed operations via use of the salvager program. Whenever volume inconsistencies are detected, this salvager program is run, which then attempts to correct the problem.
Volume transactions have timeouts associated with them. This guarantees that the death of the agent performing a given volume operation cannot result in the volume being permanently removed from circulation. There are actually two timeout periods defined for a volume transaction. The first is the warning time, defined to be 5 minutes. If a transaction lasts for more than this time period without making progress, the Volume Server prints a warning message to its log file (see Section 4.5). The second time value associated with a volume transaction is the hard timeout, defined to occur 10 minutes after any progress has been made on the given operation. After this period, the transaction will be unconditionally deleted, and the volume freed for any other operations. Transactions are reference-counted. Progress will be deemed to have occurred for a transaction, and its internal timeclock field will be updated, when:
  • The transaction is first created.
  • A reference is made to the transaction, causing the Volume Server to look it up in its internal tables.
  • The transaction's reference count is decremented.
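The timeout rules above can be summarized in a short sketch. All names here (struct trans_sketch, trans_create(), and so on) are hypothetical and do not reflect the actual Volume Server data structures. The initial reference count of 1 and the exact treatment of the boundary values (exactly 5 or 10 minutes) are assumptions, since the text does not specify them.

```c
#include <time.h>

#define TRANS_WARN_SECS (5 * 60)   /* warning time from the text */
#define TRANS_HARD_SECS (10 * 60)  /* hard timeout from the text */

enum trans_state { TRANS_OK, TRANS_WARN, TRANS_EXPIRED };

/* Hypothetical, simplified stand-in for a transaction record. */
struct trans_sketch {
    time_t last_progress;  /* the "internal timeclock" field */
    int ref_count;
};

/* Event 1: the transaction is first created.
 * Assumption: the creator holds the first reference. */
void trans_create(struct trans_sketch *t, time_t now)
{
    t->ref_count = 1;
    t->last_progress = now;
}

/* Event 2: a reference is made to the transaction (table lookup). */
void trans_get(struct trans_sketch *t, time_t now)
{
    t->ref_count++;
    t->last_progress = now;
}

/* Event 3: the transaction's reference count is decremented. */
void trans_put(struct trans_sketch *t, time_t now)
{
    t->ref_count--;
    t->last_progress = now;
}

/* Classify a transaction by the time elapsed since its last progress. */
enum trans_state trans_check(const struct trans_sketch *t, time_t now)
{
    double idle = difftime(now, t->last_progress);

    if (idle >= TRANS_HARD_SECS)
        return TRANS_EXPIRED;  /* to be deleted, freeing the volume */
    if (idle > TRANS_WARN_SECS)
        return TRANS_WARN;     /* a warning goes to the log file */
    return TRANS_OK;
}
```

A garbage-collection pass such as the one performed by the background daemon would call a check like trans_check() on every live transaction, logging a warning or deleting the transaction according to the result.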

Section 4.4: Miscellaneous Definitions

The volserver user-level program is run on every AFS server machine, and implements the Volume Server agent. It is responsible for providing the Volume Server interface as defined by the volint.xg Rxgen file.
The volserver process defines and launches five threads to perform the bulk of its duties. One thread implements a background daemon whose job it is to garbage-collect timed-out transaction structures. The other four threads are RPC interface listeners, primed to accept remote procedure calls and thus perform the defined set of volume operations.
Certain non-standard configuration settings are made for the RPC subsystem by the volserver program. For example, it chooses to extend the length of time that an Rx connection may remain idle from the default 12 seconds to 120 seconds. The reasoning here is that certain volume operations may take longer than 12 seconds of processing time on the server, and thus the default setting for the connection timeout value would incorrectly terminate an RPC when in fact it was proceeding normally and correctly.
The volserver program takes a single, optional command line argument. If a positive integer value is provided on the command line, then it shall be used to set the debugging level within the Volume Server. By default, a value of zero is used, specifying that no special debugging output will be generated and fed to the Volume Server log file described below.
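The handling of the optional command line argument might be sketched as follows. The function name volser_debug_level() is hypothetical, and the behavior for malformed or non-positive arguments (falling back to the default of zero) is an assumption, as the text only describes the positive-integer case.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Return the debugging level requested on the command line.
 * A positive integer in argv[1] is used as the level; otherwise the
 * default of 0 (no special debugging output) is returned.
 * Assumption: malformed or non-positive values fall back to 0.
 */
int volser_debug_level(int argc, char *const argv[])
{
    char *end;
    long value;

    if (argc < 2)
        return 0;

    errno = 0;
    value = strtol(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0')
        return 0;  /* not a clean decimal integer */
    if (value <= 0 || value > INT_MAX)
        return 0;  /* only positive levels are meaningful */
    return (int)value;
}
```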

Section 4.5: Log File

The Volume Server keeps a log file, recording the set of events of special interest it has encountered. The file is named VolserLog, and is stored in the /usr/afs/logs directory on the local disk of the server machine on which the Volume Server runs. This is a human-readable file, with every entry time-stamped.
Whenever the volserver program restarts, it renames the current VolserLog file to VolserLog.old, and starts up a fresh log. A properly-authorized individual can easily inspect the log file residing on any given server machine. This is made possible by the BOS Server AFS agent running on the machine, which allows the contents of this file to be fetched and displayed on the caller's machine via the bos getlog command.
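The restart-time log rotation described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name rotate_log() is hypothetical, the log directory is passed as a parameter rather than fixed to /usr/afs/logs, and a missing VolserLog file is treated as a normal first start.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Rename <dir>/VolserLog to <dir>/VolserLog.old and open a fresh,
 * empty <dir>/VolserLog for writing.
 * Returns the new log stream, or NULL on failure.
 */
FILE *rotate_log(const char *dir)
{
    char cur[4096], old[4096];
    int n;

    n = snprintf(cur, sizeof cur, "%s/VolserLog", dir);
    if (n < 0 || (size_t)n >= sizeof cur)
        return NULL;
    n = snprintf(old, sizeof old, "%s/VolserLog.old", dir);
    if (n < 0 || (size_t)n >= sizeof old)
        return NULL;

    /* Assumption: a missing current log is not an error (first start). */
    if (rename(cur, old) != 0 && errno != ENOENT)
        return NULL;

    return fopen(cur, "w");
}
```

Note that only one previous log generation survives in this scheme; each rotation replaces whatever VolserLog.old held before.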
An excerpt from a Volume Server log file follows below. The numbers appearing in square brackets at the beginning of each line have been inserted so that we may reference the individual lines of the log excerpt in the following paragraph.
[1] Wed May 8 06:03:00 1991 AttachVolume: Error attaching volume /vicepd/V1969547815.vol; volume needs salvage
[2] Wed May 8 06:03:01 1991 Volser: ListVolumes: Could not attach volume 1969547815
[3] Wed May 8 07:36:13 1991 Volser: Clone: Cloning volume 1969541499 to new volume 1969541501
[4] Wed May 8 11:25:05 1991 AttachVolume: Cannot read volume header /vicepd/V1969547415.vol
[5] Wed May 8 11:25:06 1991 Volser: CreateVolume: volume 1969547415 (bld.dce.s3.dv.pmax_ul3) created
Line [1] indicates that the volume whose numerical ID is 1969547815 could not be attached on partition /vicepd. This error is probably the result of an aborted transaction which left the volume in an inconsistent state, or of actual damage to the volume structure or data. In this case, the Volume Server recommends that the salvager program be run on this volume to restore its integrity. Line [2] records the operation which revealed this situation, namely the invocation of an AFSVolListVolumes() RPC.
Line [4] reveals that the volume header file for a specific volume could not be read. Line [5], like line [2] in the paragraph above, explains why. Someone had called the AFSVolCreateVolume() interface function, and as a precaution, the Volume Server first checked to see if such a volume was already present by attempting to read its header.
Having thus verified that the volume did not previously exist, the Volume Server allowed the AFSVolCreateVolume() call to continue its processing, creating and initializing the proper volume file, V1969547415.vol, and the associated header and index inodes.
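Since every log entry is time-stamped, a tool examining the log can split each entry into its timestamp and its message. The sketch below assumes the timestamp layout seen in the excerpt (weekday, month, day, hh:mm:ss, year) and that the bracketed reference numbers, which were added for this discussion only, are absent. The structure and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical decomposition of one VolserLog entry. */
struct log_entry {
    char weekday[4];      /* e.g. "Wed" */
    char month[4];        /* e.g. "May" */
    int day, hour, min, sec, year;
    const char *message;  /* points into the original line */
};

/*
 * Parse a line such as
 *   "Wed May 8 06:03:00 1991 AttachVolume: Cannot read volume header ..."
 * Returns 0 on success, -1 if the line lacks the expected timestamp.
 */
int parse_log_entry(const char *line, struct log_entry *e)
{
    int consumed = 0;
    int n = sscanf(line, "%3s %3s %d %d:%d:%d %d %n",
                   e->weekday, e->month, &e->day,
                   &e->hour, &e->min, &e->sec, &e->year, &consumed);

    /* %n is not counted in the return value, so 7 fields are expected. */
    if (n != 7 || consumed == 0)
        return -1;
    e->message = line + consumed;
    return 0;
}
```

The message portion begins with the name of the reporting routine (for example AttachVolume or Volser), which makes it straightforward to group entries by the operation that produced them.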