- This chapter considers some of the architectural features of the AFS-3 BOS Server. First, the basic organizational and functional entity employed by the BOS Server, the bnode, is discussed. Next, the set of files with which the server interacts is examined. The notion of restart times is then explained in detail. Finally, the organization and components of the bosserver program itself, which implements the BOS Server, are presented.
Section 2.1: Bnodes
Section 2.1.1: Overview
- The information required to manage each AFS-related program running on a File Server machine is encapsulated in a bnode object. These bnodes serve as the basic building blocks for BOS Server services. Bnodes have two forms of existence:
- On-disk: The BosConfig file (see Section 2.3.4 below) defines the set of bnodes for which the BOS Server running on that machine will be responsible, along with specifying other information such as values for the two restart times. This file provides permanent storage (i.e., between bootups) for the desired list of programs for that server platform.
- In-memory: The contents of the BosConfig file are parsed and internalized by the BOS Server when it starts execution. The basic data for a particular server program is placed into a struct bnode structure.
- The initial contents of the BosConfig file are typically set up during system installation. The BOS Server can be directed, via its RPC interface, to alter existing bnode entries in the BosConfig file, add new ones, and delete old ones. Typically, this file is never edited directly.
Section 2.1.2: Bnode Classes
- The descriptions of the members of the AFS server suite fall into three broad classes of programs:
- Simple programs: This server class is populated by programs that simply need to be kept running, and do not depend on other programs for correctness or effectiveness. Examples of AFS servers falling into this category are the Volume Location Server, Authentication Server, and Protection Server. Since its members exhibit such straightforward behavior, this class of programs is referred to as the simple class.
- Interrelated programs: The File Server program depends on two other programs, and requires that they be executed at the appropriate times and in the appropriate sequence, for correct operation. The first of these programs is the Volume Server, which must be run concurrently with the File Server. The second is the salvager, which repairs the AFS volume metadata on the server partitions should the metadata become damaged. The salvager must not be run at the same time as the File Server. In honor of the File Server trio that inspired it, the class of programs consisting of groups of interrelated processes is named the fs class.
- Periodic programs: Some AFS servers, such as the BackupServer, only need to run every so often, but on a regular and well-defined basis. The name for this class is taken from the unix tool that is typically used to define such regular executions, namely the cron class.
- The recognition and definition of these three server classes is exploited by the BOS Server. Since all of the programs in a given class share certain common characteristics, they may all utilize the same basic data structures to record and manage their special requirements. Thus, it is not necessary to reimplement all the operations required to satisfy the capabilities promised by the BOS Server RPC interface for each and every program the BOS Server manages. Implementing one set of operations for each server class is sufficient to handle any and all server binaries to be run on the platform.
Section 2.1.3: Per-Class Bnode Operations
- As mentioned above, only one set of basic routines must be implemented for each AFS server class. Much like Sun's VFS/vnode interface [8], providing a common set of routines for interacting with a given file system, regardless of its underlying implementation and semantics, the BOS Server defines a common vector of operations for a class of programs to be run under the BOS Server's tutelage. In fact, it was this standardized file system interface that inspired the "bnode" name.
- The BOS Server manipulates the process or processes that are described by each bnode by invoking the proper functions in the appropriate order from the operation vector for that server class. Thus, the BOS Server itself operates in a class-independent fashion. This allows each class to take care of the special circumstances associated with it, yet to have the BOS Server itself be totally unaware of what special actions (if any) are needed for the class. This abstraction also allows more server classes to be implemented without any significant change to the BOS Server code itself.
- There are ten entries in this standardized class function array. The purpose and usage of each individual class function is described in detail in Section 3.3.5. Much like the VFS system, a collection of macros is also provided in order to simplify the invocation of these functions. These macros are described in Section 3.5. The ten function slots are named here for convenience:
- create()
- timeout()
- getstat()
- setstat()
- delete()
- procexit()
- getstring()
- getparm()
- restartp()
- hascore()
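- The class-independent dispatch described above can be illustrated with a minimal C sketch. The structure layouts, the SKETCH_ macro names, and the simple_ functions below are hypothetical simplifications, not the definitions from the BOS Server sources, and only three of the ten slots are shown.

```c
/* Hypothetical, simplified bnode; the real struct bnode carries more state. */
struct bnode;

/* Per-class operation vector (only three of the ten slots are shown). */
struct bnode_ops {
    int (*setstat)(struct bnode *b, int goal);
    int (*getstat)(struct bnode *b, int *status);
    int (*hascore)(struct bnode *b);
};

struct bnode {
    const char *name;
    const struct bnode_ops *ops; /* shared by every bnode of one class */
    int goal;
};

/* Invocation macros in the spirit of the VFS-style helpers (names assumed). */
#define SKETCH_SETSTAT(b, g) ((*(b)->ops->setstat)((b), (g)))
#define SKETCH_GETSTAT(b, s) ((*(b)->ops->getstat)((b), (s)))

/* One possible implementation of the slots for a "simple"-style class. */
static int simple_setstat(struct bnode *b, int goal) { b->goal = goal; return 0; }
static int simple_getstat(struct bnode *b, int *status) { *status = b->goal; return 0; }
static int simple_hascore(struct bnode *b) { (void)b; return 0; }

static const struct bnode_ops simple_ops = {
    simple_setstat, simple_getstat, simple_hascore
};

/* Class-independent caller: it never inspects which class the bnode has. */
int bring_up(struct bnode *b)
{
    return SKETCH_SETSTAT(b, 1);
}
```

- A caller would initialize a bnode with a pointer to simple_ops and then use only bring_up() and the macros; supporting a further class means supplying another operation vector, with no change to the caller.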
Section 2.2: BOS Server Directories
- The BOS Server expects the existence of the following directories on the local disk of the platform on which it runs. These directories define where the system binaries, log files, ubik databases, and other files lie.
- /usr/afs/bin: This directory houses the full set of AFS server binaries. Such executables as bosserver, fileserver, vlserver, volserver, kaserver, and ptserver reside here.
- /usr/afs/db: This directory serves as the well-known location on the server's local disk for the ubik database replicas for this machine. Specifically, the Authentication Server, Protection Server, and the Volume Location Server maintain their local database images here.
- /usr/afs/etc: This directory hosts the files containing the security, cell, and authorized system administrator list for the given machine. Specifically, the CellServDB, KeyFile, License, ThisCell, and UserList files are stored here.
- /usr/afs/local: This directory houses the BosConfig file, which supplies the BOS Server with the permanent set of bnodes to support. Also contained in this directory are files used exclusively by the salvager.
- /usr/afs/logs: All of the AFS server programs that maintain log files deposit them in this directory.
Section 2.3: BOS Server Files
- Several files, some mentioned above, are maintained on the server's local disk and manipulated by the BOS Server. This section examines many of these files, and describes their formats.
Section 2.3.1: /usr/afs/etc/UserList
- This file contains the names of individuals who are allowed to issue "restricted" BOS Server commands (e.g., creating & deleting bnodes, setting cell information, etc.) on the given hardware platform. The format is straightforward, with one administrator name per line. If a cell grants joe and schmoe these rights on a machine, that particular UserList will have the following two lines:
joe
schmoe
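- Because the format is one name per line, a membership check is a simple line scan. The following C sketch assumes the file contents have already been read into a NUL-terminated buffer; the function name userlist_contains() is hypothetical.

```c
#include <string.h>

/* Returns 1 if name appears as a complete line in list, 0 otherwise. */
int userlist_contains(const char *list, const char *name)
{
    size_t nlen = strlen(name);

    while (*list != '\0') {
        size_t len = strcspn(list, "\n"); /* length of the current line */

        if (len == nlen && strncmp(list, name, len) == 0)
            return 1;
        list += len;
        if (*list == '\n')
            list++; /* step over the newline */
    }
    return 0;
}
```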
Section 2.3.2: /usr/afs/etc/CellServDB
- This file identifies the name of the cell to which the given server machine belongs, along with the set of machines on which its ubik database servers are running. Unlike the CellServDB found in /usr/vice/etc on AFS client machines, this file contains only the entry for the home cell. It shares the formatting rules with the /usr/vice/etc version of CellServDB. The contents of the CellServDB file used by the BOS Server on host grand.central.org are:
>grand.central.org #DARPA clearinghouse cell
192.54.226.100 #grand.central.org
192.54.226.101 #penn.central.org
Section 2.3.3: /usr/afs/etc/ThisCell
- The BOS Server obtains its notion of cell membership from the ThisCell file in the specified directory. As with the version of ThisCell found in /usr/vice/etc on AFS client machines, this file simply contains the character string identifying the home cell name. For any server machine in the grand.central.org cell, this file contains the following:
grand.central.org
Section 2.3.4: /usr/afs/local/BosConfig
- The BosConfig file is the on-disk representation of the collection of bnodes this particular BOS Server manages. The BOS Server reads and writes to this file in the normal course of its affairs. The BOS Server itself, in fact, should be the only agent that modifies this file. Any changes to BosConfig should be carried out by issuing the proper RPCs to the BOS Server running on the desired machine.
- The following is the text of the BosConfig file on grand.central.org. A discussion of the contents follows immediately afterwards.
restarttime 11 0 4 0 0
checkbintime 3 0 5 0 0
bnode simple kaserver 1
parm /usr/afs/bin/kaserver
end
bnode simple ptserver 1
parm /usr/afs/bin/ptserver
end
bnode simple vlserver 1
parm /usr/afs/bin/vlserver
end
bnode fs fs 1
parm /usr/afs/bin/fileserver
parm /usr/afs/bin/volserver
parm /usr/afs/bin/salvager
end
bnode simple runntp 1
parm /usr/afs/bin/runntp -localclock transarc.com
end
bnode simple upserver 1
parm /usr/afs/bin/upserver
end
bnode simple budb_server 1
parm /usr/afs/bin/budb_server
end
bnode cron backup 1
parm /usr/afs/backup/clones/lib/backup.csh daily
parm 05:00
end
- The first two lines of this file set the system and new-binary restart times (see Section 2.4, below). They are optional, with the default system restart time being every Sunday at 4:00am and the new-binary restart time being 5:00am daily. Following the reserved words restarttime and checkbintime are five integers, providing the mask, day, hour, minute, and second values (in decimal) for the restart time, as diagrammed below:
restarttime <mask> <day> <hour> <minute> <second>
checkbintime <mask> <day> <hour> <minute> <second>
- The range of acceptable values for these fields is presented in Section 3.3.1. In this example, the restarttime line specifies a (decimal) mask value of 11, selecting the KTIME_HOUR, KTIME_MIN, and KTIME_DAY bits. This indicates that the hour, minute, and day values are the ones to be used when matching times. Thus, this line requests that system restarts occur on day 0 (Sunday), hour 4 (4:00am), and minute 0 within that hour.
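- The role of the mask can be sketched in C as follows. The numeric bit values are assumptions chosen to be consistent with the example (hour + minute + day = 11); the authoritative definitions are those referenced in Section 3.3.1. The structure and function names are hypothetical.

```c
/* Assumed bit values, consistent with the example mask 11 = hour + minute + day. */
#define KTIME_HOUR 1
#define KTIME_MIN  2
#define KTIME_SEC  4
#define KTIME_DAY  8

/* Hypothetical container for the five integers on a restart time line. */
struct restart_time {
    int mask, day, hour, min, sec;
};

/* Returns 1 when every field selected by the mask equals the given value. */
int restart_matches(const struct restart_time *rt,
                    int day, int hour, int min, int sec)
{
    if ((rt->mask & KTIME_DAY) && rt->day != day)
        return 0;
    if ((rt->mask & KTIME_HOUR) && rt->hour != hour)
        return 0;
    if ((rt->mask & KTIME_MIN) && rt->min != min)
        return 0;
    if ((rt->mask & KTIME_SEC) && rt->sec != sec)
        return 0;
    return 1; /* fields not selected by the mask are ignored */
}
```

- Under these assumptions, the values 11 0 4 0 0 match any instant on a Sunday at 4:00am regardless of the seconds value, while 3 0 5 0 0 ignores the day field and so matches 5:00am on every day.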
- The sets of lines that follow define the individual bnodes for the particular machine. The first line of the bnode definition set must begin with the reserved word bnode, followed by the type name, the instance name, and the initial bnode goal:
bnode <type_name> <instance_name> <goal_val>
- The <type_name> and <instance_name> fields are both character strings, and the <goal_val> field is an integer. Acceptable values for <type_name> are simple, fs, and cron. Acceptable values for <goal_val> are defined in Section 3.2.3, and are normally restricted to the integer values representing BSTAT_NORMAL and BSTAT_SHUTDOWN. Thus, the bnode line defining the Authentication Server declares it to be of type simple, with instance name kaserver and a goal of 1 (BSTAT_NORMAL), i.e., it should be brought up and kept running.
- Following the bnode line in the BosConfig file may be one or more parm lines. These entries represent the command line parameters that will be used to invoke the proper related program or programs. The entire text of the line after the parm reserved word up to the terminating newline is stored as the command line string.
- After the parm lines, if any, the reserved word end must appear alone on a line, marking the end of an individual bnode definition.
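- The bnode, parm, and end rules above are enough to sketch a small parser. The C fragment below is illustrative only: it assumes the file contents are already in memory, extracts just the first bnode definition, and uses hypothetical names and size limits (parsed_bnode, MAX_PARMS, MAX_TEXT) that do not come from the BOS Server sources.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PARMS 4   /* illustrative limit, not taken from the sources */
#define MAX_TEXT  128 /* illustrative limit on one line of text */

struct parsed_bnode {
    char type[16];
    char instance[32];
    int goal;
    int nparms;
    char parms[MAX_PARMS][MAX_TEXT];
};

/* Parses the first bnode definition in text.
 * Returns 0 on success, -1 if the definition is malformed or unterminated. */
int parse_first_bnode(const char *text, struct parsed_bnode *out)
{
    char line[MAX_TEXT];
    int in_bnode = 0;

    memset(out, 0, sizeof(*out));
    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

        memcpy(line, text, copy);
        line[copy] = '\0';
        text += len;
        if (*text == '\n')
            text++;

        if (!in_bnode) {
            /* Lines before the first bnode line (restart times) are skipped. */
            if (sscanf(line, "bnode %15s %31s %d",
                       out->type, out->instance, &out->goal) == 3)
                in_bnode = 1;
        } else if (strncmp(line, "parm ", 5) == 0) {
            if (out->nparms == MAX_PARMS)
                return -1;
            /* Everything after the reserved word is the command line string. */
            strcpy(out->parms[out->nparms++], line + 5);
        } else if (strcmp(line, "end") == 0) {
            return 0;
        } else {
            return -1;
        }
    }
    return -1; /* ran out of text before the end line */
}
```

- Applied to the fs definition in the example file, this sketch would yield type fs, instance fs, goal 1, and three command line strings.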
Section 2.3.5: /usr/afs/local/NoAuth
- The presence of this file determines whether the BOS Server insists on properly authenticated connections for its restricted operations or allows any caller to exercise these commands. Not only is the BOS Server affected by the presence of this file, but so are all of the bnode objects the BOS Server starts up. If /usr/afs/local/NoAuth is present, the BOS Server will start all of its bnodes with the -noauth flag.
- Completely unauthenticated AFS operation will result if this file is present when the BOS Server starts execution. The file itself is typically empty. If any data is put into the NoAuth file, it will be ignored by the system.
Section 2.3.6: /usr/afs/etc/KeyFile
- This file stores the set of AFS encryption keys used for file service operations. The file follows the format defined by struct afsconf_key and struct afsconf_keys in include file afs/keys.h. For the reader's convenience, these structures are detailed below:
- The first longword of the file reveals the number of keys that may be found there, with a maximum of AFSCONF_MAXKEYS (8). The keys themselves follow, each preceded by its key version number.
- All information in this file is stored in network byte order. Each BOS Server converts the data to the appropriate host byte order before storing and manipulating it.
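- The layout just described (a key count, then each key preceded by its version number, all in network byte order) can be decoded along the following lines. This is a sketch under stated assumptions: the 8-byte key length and all sketch_ and SKETCH_ names are assumptions made for illustration, and the authoritative definitions are those in afs/keys.h.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXKEYS 8 /* mirrors the stated AFSCONF_MAXKEYS value */
#define SKETCH_KEYLEN  8 /* assumption: 8-byte keys */

struct sketch_key {
    int32_t kvno; /* key version number */
    unsigned char key[SKETCH_KEYLEN];
};

struct sketch_keys {
    int32_t nkeys;
    struct sketch_key key[SKETCH_MAXKEYS];
};

/* Decodes a KeyFile image held in buf. Returns 0 on success, -1 on bad input. */
int decode_keyfile(const unsigned char *buf, size_t len, struct sketch_keys *out)
{
    uint32_t word;
    size_t off;
    int32_t i;

    if (len < 4)
        return -1;
    memcpy(&word, buf, 4);
    out->nkeys = (int32_t)ntohl(word); /* network to host byte order */
    off = 4;
    if (out->nkeys < 0 || out->nkeys > SKETCH_MAXKEYS)
        return -1;

    for (i = 0; i < out->nkeys; i++) {
        if (len - off < 4 + SKETCH_KEYLEN)
            return -1; /* truncated file */
        memcpy(&word, buf + off, 4);
        out->key[i].kvno = (int32_t)ntohl(word);
        memcpy(out->key[i].key, buf + off + 4, SKETCH_KEYLEN);
        off += 4 + SKETCH_KEYLEN;
    }
    return 0;
}
```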
Section 2.4: Restart Times
- It is possible to manually start or restart any server defined within the set of BOS Server bnodes from any AFS client machine, simply by making the appropriate call to the RPC interface while authenticated as a valid administrator (i.e., a principal listed in the UserList file on the given machine). However, two restart situations merit the implementation of special functionality within the BOS Server. There are two common occasions, occurring on a regular basis, where the entire system or individual server programs should be brought down and restarted:
- Complete system restart: To guard against the reliability and performance problems caused by any core leaks in long-running programs, the entire AFS system should be shut down and restarted periodically. This action 'clears out' the memory system, and may result in smaller memory images for these servers, as internal data structures are reinitialized back to their starting sizes. It also allows AFS partitions to be regularly examined, and salvaged if necessary.
- Another reason for performing a complete system restart is to commence execution of a new release of the BOS Server itself. The new-binary restarts described below do not restart the BOS Server if a new version of its software has been installed on the machine.
- New-binary restarts: New server software may be installed at any time with the assistance of the BOS Server. However, it is often not the case that such software installations occur as a result of the discovery of an error in the program or programs requiring immediate restart. On these occasions, restarting the given processes in prime time so that the new binaries may begin execution is counterproductive, causing system downtime and interfering with user productivity. The system administrator may wish to set an off-peak time when the server binaries are automatically compared to the running program images, and restarts take place should the on-disk binary be more recent than the currently running program. These restarts would thus minimize the resulting service disruption.
- Automatically performing these restart functions could be accomplished by creating cron-type bnodes that were defined to execute at the desired times. However, rather than force the system administrator to create and supervise such bnodes, the BOS Server supports the notion of an internal LWP thread with the same effect (see Section 2.5.2). As part of the BosConfig file defined above, the administrator may simply specify the values for the complete system restart and new-binary restart times, and a dedicated BOS Server thread will manage the restarts.
- Unless otherwise instructed, the BOS Server selects default times for the two above restart times. A complete system restart is carried out every Sunday at 4:00am by default, and a new-binary restart is executed for each updated binary at 5:00am every day.
Section 2.5: The bosserver Process
Section 2.5.1: Introduction
- The user-space bosserver process is in charge of managing the AFS server processes and software images, the local security and cell database files, and allowing administrators to execute arbitrary programs on the server machine on which it runs. It also implements the RPC interface defined in the bosint.xg Rxgen definition file.
Section 2.5.2: Threading
- As with all the other AFS server agents, the BOS Server is a multithreaded program. It is configured so that a minimum of two lightweight threads are guaranteed to be allocated to handle incoming RPC calls to the BOS Server, and a maximum of four threads are commissioned for this task.
- In addition to these threads assigned to RPC duties, there is one other thread employed by the BOS Server, the BozoDaemon(). This thread is responsible for keeping track of the two major restart events, namely the system restart and the new binary restart (see Section 2.4). Every 60 seconds, this thread is awakened, at which time it checks to see if either deadline has occurred. If the complete system restart is then due, it invokes internal BOS Server routines to shut down the entire suite of AFS agents on that machine and then reexecute the BOS Server binary, which results in the restart of all of the server processes. If the new-binary time has arrived, the BOS Server shuts down the bnodes for which binaries newer than those running are available, and then invokes the new software.
- In general, the following procedure is used when stopping and restarting processes. First, the restart() operation defined for each bnode's class is invoked via the BOP_RESTART() macro. This allows each server to take any special steps required before cycling its service. After that function completes, the standard mechanisms are used to shut down each bnode's process, wait until it has truly stopped its execution, and then start it back up again.
Section 2.5.3: Initialization Algorithm
- This section describes the procedure followed by the BOS Server from the time when it is invoked to the time it has properly initialized the server machine upon which it is executing.
- The first check performed by the BOS Server is whether or not it is running as root. It needs to manipulate local unix files and directories to which only root has been given access, so it immediately exits with an error message if this is not the case. The BOS Server's unix working directory is then set to be /usr/afs/logs in order to more easily service incoming RPC requests to fetch the contents of the various server log files at this location. Changing the working directory in this fashion also ensures that any core images dumped by the BOS Server's wards are left in /usr/afs/logs.
- The command line is then inspected, and the BOS Server determines whether it will insist on authenticated RPC connections for secure administrative operations by setting up the /usr/afs/local/NoAuth file appropriately (see Section 2.3.5). It initializes the underlying bnode package and installs the three known bnode types (simple, fs, and cron).
- After the bnode package is thus set up, the BOS Server ensures that the set of local directories on which it will depend are present; refer to Section 2.2 for more details on this matter. The license file in /usr/afs/etc/License is then read to determine the number of AFS server machines the site is allowed to operate, and whether the cell is allowed to run the NFS/AFS Translator software. This file is typically obtained in the initial system installation, taken from the installation tape. The BOS Server will exit unless this file exists and is properly formatted.
- In order to record its actions, any existing /usr/afs/logs/BosLog file is moved to BosLog.old, and a new version is opened in append mode. The BOS Server immediately writes a log entry concerning the state of the above set of important directories.
- At this point, the BOS Server reads the BosConfig file, which lists the set of bnodes for which it will be responsible. It starts up the processes associated with the given bnodes. Once accomplished, it invokes its internal system restart LWP thread (covered in Section 2.5.2 above).
- Rx initialization begins at this point, setting up the RPC infrastructure to receive its packets on the AFSCONF_NANNYPORT, UDP port 7007. The local cell database is then read and internalized, followed by acquisition of the AFS encryption keys.
- After all of these steps have been carried out, the BOS Server has gleaned all of the necessary information from its environment and has also started up its wards. The final initialization action required is to start all of its listener LWP threads, which are devoted to executing incoming requests for the BOS Server RPC interface.
Section 2.5.4: Command Line Switches
- The BOS Server recognizes exactly one command line argument: -noauth. By default, the BOS Server attempts to use authenticated RPC connections (unless the /usr/afs/local/NoAuth file is present; see Section 2.3.5). The appearance of the -noauth command line flag signals that this server incarnation is to use unauthenticated connections for even those operations that are normally restricted to system administrators. This switch is essential during the initial AFS system installation, where the procedures followed to bootstrap AFS onto a new machine require the BOS Server to run before some system databases have been created.
Section 2.1: Overview
- The AFS File Server is a user-level process that presides over the raw disk partitions on which it supports one or more volumes. It provides 'half' of the fundamental service of the system, namely exporting and regimenting access to the user data entrusted to it. The Cache Manager provides the other half, acting on behalf of its human users to locate and access the files stored on the file server machines.
- This chapter examines the structure of the File Server process. First, the set of AFS agents with which it must interact are discussed. Next, the threading structure of the server is examined. Some details of its handling of the race conditions created by the callback mechanism are then presented. This is followed by a discussion of the read-only volume synchronization mechanism. This functionality is used in each RPC interface call and intended to detect new releases of read-only volumes. File Servers do not generate callbacks for objects residing in read-only volumes, so this synchronization information is used to implement a 'whole-volume' callback. Finally, the fact that the File Server may drop certain information recorded about the Cache Managers with which it has communicated and yet guarantee correctness of operation is explored.
Section 2.2: Interactions
- By far the most frequent partner in File Server interactions is the set of Cache Managers actively fetching and storing chunks of data files for which the File Server provides central storage facilities. The File Server also periodically probes the Cache Managers recorded in its tables with which it has recently dealt, determining if they are still active or whether their records might be garbage-collected.
- There are two other server entities with which the File Server interacts, namely the Protection Server and the BOS Server. Given a fetch or store request generated by a Cache Manager, the File Server needs to determine if the caller is authorized to perform the given operation. An important step in this process is to determine what is referred to as the caller's Current Protection Subdomain, or CPS. A user's CPS is a list of principals, beginning with the user's internal identifier, followed by the numerical identifiers for all groups to which the user belongs. Once this CPS information is determined, the File Server scans the ACL controlling access to the file system object in question. If it finds that the ACL contains an entry specifying a principal with the appropriate rights which also appears in the user's CPS, then the operation is cleared. Otherwise, it is rejected and a protection violation is reported to the Cache Manager for ultimate reflection back to the caller.
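- The CPS-against-ACL scan can be sketched in C as follows. The sketch follows the description above literally (a single ACL entry must carry all of the needed rights and name a principal in the CPS); the structure, the rights bit encoding, and the function name are hypothetical and do not reproduce the File Server's actual ACL representation.

```c
#include <stddef.h>

/* Hypothetical ACL entry: a principal identifier and a bit mask of rights. */
struct acl_entry {
    int principal;
    unsigned rights;
};

/* Returns 1 if some ACL entry grants all needed rights to a principal
 * that appears in the caller's CPS, 0 otherwise. */
int access_cleared(const int *cps, size_t ncps,
                   const struct acl_entry *acl, size_t nacl,
                   unsigned needed)
{
    size_t i, j;

    for (i = 0; i < nacl; i++) {
        if ((acl[i].rights & needed) != needed)
            continue; /* entry lacks the appropriate rights */
        for (j = 0; j < ncps; j++) {
            if (cps[j] == acl[i].principal)
                return 1; /* operation is cleared */
        }
    }
    return 0; /* protection violation */
}
```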
- The BOS Server performs administrative operations on the File Server process. Thus, their interactions are quite one-sided, and always initiated by the BOS Server. The BOS Server does not utilize the File Server's RPC interface, but rather generates unix signals to achieve the desired effect.
Section 2.3: Threading
- The File Server is organized as a multi-threaded server. Its threaded behavior within a single unix process is achieved by use of the LWP lightweight process facility, as described in detail in the companion "AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure Call Facility" document. The various threads utilized by the File Server are described below:
- WorkerLWP: This lightweight process sleeps until a request to execute one of the RPC interface functions arrives. It pulls the relevant information out of the request, including any incoming data delivered as part of the request, and then executes the server stub routine to carry out the operation. The thread finishes its current activation by feeding the return code and any output data through the RPC channel back to the calling Cache Manager. The File Server initialization sequence specifies that at least three but no more than six of these WorkerLWP threads are to exist at any one time. It is currently not possible to configure the File Server process with a different number of WorkerLWP threads.
- FiveMinuteCheckLWP: This thread runs every five minutes, performing such housekeeping chores as cleaning up timed-out callbacks, setting disk usage statistics, and executing the special handling required by certain AIX implementations. Generally, this thread performs activities that do not take unbounded time to accomplish and do not block the thread. If reassurance is required, FiveMinuteCheckLWP can also be told to print out a banner message to the machine's console every so often, stating that the File Server process is still running. This is not strictly necessary and is an artifact from earlier versions, as the File Server's status is now easily accessible at any time through the BOS Server running on its machine.
- HostCheckLWP: This thread, also activated every five minutes, performs periodic checking of the status of Cache Managers that have been previously contacted and thus appear in this File Server's internal tables. It generates RXAFSCB_Probe() calls from the Cache Manager interface, and may find itself suspended for an arbitrary amount of time when it encounters unreachable Cache Managers.
Section 2.4: Callback Race Conditions
- Callbacks serve to implement the efficient AFS cache consistency mechanism, as described in Section 1.1.1. Because of the asynchronous nature of callback generation and the multi-threaded operation and organization of both the File Server and Cache Manager, race conditions can arise in their use. As an example, consider the case of a client machine fetching a chunk of file X. The File Server thread activated to carry out the operation ships the contents of the chunk and the callback information over to the requesting Cache Manager. Before the corresponding Cache Manager thread involved in the exchange can be scheduled, another request arrives at the File Server, this time storing a modified image of the same chunk from file X. Another worker thread comes to life and completes processing of this second request, including execution of an RXAFSCB_CallBack() to the Cache Manager, which still has not picked up the results of its fetch operation. If the Cache Manager blindly honors the RXAFSCB_CallBack() operation first and then proceeds to process the fetch, it will wind up believing it has a callback on X when in reality it is out of sync with the central copy on the File Server. To resolve this class of callback race condition, the Cache Manager effectively double-checks the callback information received from File Server calls, making sure it has not already been nullified by other file system activity.
Section 2.5: Read-Only Volume Synchronization
- The File Server issues a callback for each file chunk it delivers from a read-write volume, thus allowing Cache Managers to efficiently synchronize their local caches with the authoritative File Server images. However, no callbacks are issued when data from read-only volumes is delivered to clients. Thus, it is possible for a new snapshot of the read-only volume to be propagated to the set of replication sites without Cache Managers becoming aware of the event and marking the appropriate chunks in their caches as stale. Although the Cache Manager refreshes its volume version information periodically (once an hour), there is still a window where a Cache Manager will fail to notice that it has outdated chunks.
- The volume synchronization mechanism was defined to close this window, resulting in what is nearly a 'whole-volume' callback device for read-only volumes. Each File Server RPC interface function handling the transfer of file data is equipped with a parameter (a volSyncP), which carries this volume synchronization information. This parameter is set to a non-zero value by the File Server exclusively when the data being fetched is coming from a read-only volume. Although the struct AFSVolSync (defined in Section 5.1.2.2) passed via a volSyncP consists of six longwords, only the first one is set. This leading longword carries the creation date of the read-only volume. The Cache Manager immediately compares the synchronization value stored in its cached volume information against the one just received. If they are identical, then the operation is free to complete, secure in the knowledge that all the information and files held from that volume are still current. A mismatch, though, indicates that every file chunk from this volume is potentially out of date, having come from a previous release of the read-only volume. In this case, the Cache Manager proceeds to mark every chunk from this volume as suspect. The next time the Cache Manager considers accessing any of these chunks, it first checks with the File Server from which the chunks were obtained to see if they are up to date.
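- The Cache Manager side of this comparison can be sketched in C as follows. The structures and the function name are hypothetical, and recording the new creation date after a mismatch is an assumption of this sketch rather than something stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache bookkeeping; the real Cache Manager structures differ. */
struct cached_chunk {
    int suspect; /* 1 means revalidate with the File Server before use */
};

struct cached_volume {
    uint32_t creation_date; /* last synchronization value seen */
    struct cached_chunk *chunks;
    size_t nchunks;
};

/* Applies the leading longword of the returned synchronization data.
 * Returns the number of chunks marked suspect. */
size_t apply_vol_sync(struct cached_volume *vol, uint32_t sync_date)
{
    size_t i;

    /* Zero means the data did not come from a read-only volume. */
    if (sync_date == 0 || sync_date == vol->creation_date)
        return 0;

    /* Mismatch: every chunk may stem from a previous release. */
    for (i = 0; i < vol->nchunks; i++)
        vol->chunks[i].suspect = 1;
    vol->creation_date = sync_date; /* assumption: remember the new release */
    return vol->nchunks;
}
```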
Section 2.6: Disposal of Cache Manager Records
- Every File Server, when first starting up, will, by default, allocate enough space to record 20,000 callback promises (see Section 5.3 for how to override this default). Should the File Server fully populate its callback records, it will not allocate more, since doing so would allow its memory image to grow in an unbounded fashion. Rather, the File Server chooses to break callbacks until it acquires a free record. All reachable Cache Managers respond by marking their cache entries appropriately, preserving the consistency guarantee. In fact, a File Server may arbitrarily and unilaterally purge itself of all records associated with a particular Cache Manager. Such actions will reduce its performance (forcing these Cache Managers to revalidate items cached from that File Server) without sacrificing correctness.
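- The fixed-size record pool can be illustrated with the following C sketch. The text above does not say which callback is broken when the pool is full; choosing the oldest record is an assumption of this sketch, as are all names and the tiny POOL_SIZE standing in for the 20,000-record default.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4 /* stands in for the 20,000-record default */

struct cb_record {
    int in_use;
    int host;            /* hypothetical Cache Manager identifier */
    int file;            /* hypothetical file identifier */
    unsigned long stamp; /* logical time at which the promise was made */
};

struct cb_pool {
    struct cb_record rec[POOL_SIZE];
    unsigned long clock;
    int broken; /* number of callbacks broken to reclaim records */
};

void cb_init(struct cb_pool *p)
{
    memset(p, 0, sizeof(*p));
}

/* Records a callback promise without ever growing the pool.
 * Returns the host whose callback was broken to free a record, or -1. */
int cb_add(struct cb_pool *p, int host, int file)
{
    size_t i, victim = 0;
    int broken_host = -1;

    for (i = 0; i < POOL_SIZE; i++) {
        if (!p->rec[i].in_use) {
            victim = i; /* a free record is available */
            break;
        }
        if (p->rec[i].stamp < p->rec[victim].stamp)
            victim = i; /* remember the oldest promise so far */
    }
    if (p->rec[victim].in_use) {
        /* Pool is full: break the oldest callback instead of allocating. */
        broken_host = p->rec[victim].host;
        p->broken++;
    }
    p->rec[victim].in_use = 1;
    p->rec[victim].host = host;
    p->rec[victim].file = file;
    p->rec[victim].stamp = ++p->clock;
    return broken_host;
}
```

- In a real server, the returned host would be sent a callback-break notification before its record is reused; the sketch only reports which host is affected.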
Section 2.1: Introduction
- This chapter describes a package allowing multiple threads of control to coexist and cooperate within one unix process. Each such thread of control is also referred to as a lightweight process, in contrast to the traditional unix (heavyweight) process. Except for the limitations of a fixed stack size and non-preemptive scheduling, these lightweight processes possess all the properties usually associated with full-fledged processes in typical operating systems. For the purposes of this document, the terms lightweight process, LWP, and thread are completely interchangeable, and they appear intermixed in this chapter. Included in this lightweight process facility are various sub-packages, including services for locking, I/O control, timers, fast time determination, and preemption.
- The Rx facility is not the only client of the LWP package. Other LWP clients within AFS include the File Server, Protection Server, BOS Server, Volume Server, Volume Location Server, and the Authentication Server, along with many of the AFS application programs.
Section 2.2: BOS Server Directories
Section 2.2.1: LWP Overview
- The LWP package implements primitive functions that provide the basic facilities required to enable procedures written in C to execute concurrently and asynchronously. The LWP package is meant to be general-purpose (note the applications mentioned above), with a heavy emphasis on simplicity. Interprocess communication facilities can be built on top of this basic mechanism and in fact, many different IPC mechanisms could be implemented.
- In order to set up the threading support environment, a one-time invocation of the LWP_InitializeProcessSupport() function must precede the use of the facilities described here. This initialization function carves an initial process out of the currently executing C procedure and returns its thread ID. For symmetry, an LWP_TerminateProcessSupport() function may be used explicitly to release any storage allocated by its counterpart. If this function is used, it must be issued from the thread created by the original LWP_InitializeProcessSupport() invocation.
- When any of the lightweight process functions completes, an integer value is returned to indicate whether an error condition was encountered. By convention, a return value of zero indicates that the operation succeeded.
- Macros, typedefs, and manifest constants for error codes needed by the threading mechanism are exported by the lwp.h include file. A lightweight process is identified by an object of type PROCESS, which is defined in the include file.
- The process model supported by the LWP operations is based on a non-preemptive priority dispatching scheme. A priority is an integer in the range [0..LWP_MAX_PRIORITY], where 0 is the lowest priority. Once a given thread is selected and dispatched, it remains in control until it voluntarily relinquishes its claim on the CPU. Control may be relinquished by either explicit means (LWP_DispatchProcess()) or implicit means (through the use of certain other LWP operations with this side effect). In general, all LWP operations that may cause a higher-priority process to become ready for dispatching preempt the process requesting the service. When this occurs, the dispatcher mechanism takes over and automatically schedules the highest-priority runnable process. Routines in this category, where the scheduler is guaranteed to be invoked in the absence of errors, are:
- LWP_WaitProcess()
- LWP_MwaitProcess()
- LWP_SignalProcess()
- LWP_DispatchProcess()
- LWP_DestroyProcess()
- The following functions are guaranteed not to cause preemption, and so may be issued with no fear of losing control to another thread:
- LWP_InitializeProcessSupport()
- LWP_NoYieldSignal()
- LWP_CurrentProcess()
- LWP_ActiveProcess()
- LWP_StackUsed()
- LWP_NewRock()
- LWP_GetRock()
- The symbol LWP_NORMAL_PRIORITY, whose value is (LWP_MAX_PRIORITY-2), provides a reasonable default value to use for process priorities.
- The lwp_debug global variable can be set to activate or deactivate debugging messages tracing the flow of control within the LWP routines. To activate debugging messages, set lwp_debug to a non-zero value. To deactivate, reset it to zero. All debugging output from the LWP routines is sent to stdout.
- The LWP package checks for stack overflows at each context switch. The variable that controls the action of the package when an overflow occurs is lwp_overflowAction. If it is set to LWP_SOMESSAGE, then a message will be printed on stderr announcing the overflow. If lwp_overflowAction is set to LWP_SOABORT, the abort() LWP routine will be called. Finally, if lwp_overflowAction is set to LWP_SOQUIET, the LWP facility will ignore the errors. By default, the LWP_SOABORT setting is used.
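The dispatching rule described above (always run the highest-priority runnable thread) can be sketched as a scan over a thread table. The struct thread type and pick_next() function are hypothetical, MAX_PRIORITY is assumed to mirror LWP_MAX_PRIORITY, and breaking ties in favor of the earliest table entry is an assumption, not a statement about how the LWP package orders equal-priority threads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIORITY 4   /* assumed to mirror LWP_MAX_PRIORITY */

/* Hypothetical thread table entry (not the real PROCESS layout). */
struct thread {
    const char *name;
    int priority;   /* 0 (lowest) .. MAX_PRIORITY */
    int runnable;
};

/*
 * Return the highest-priority runnable thread, or NULL if none is runnable.
 * Ties go to the earliest table entry (an assumption of this sketch).
 */
struct thread *pick_next(struct thread *tab, size_t n)
{
    struct thread *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(tab[i].priority >= 0 && tab[i].priority <= MAX_PRIORITY);
        if (tab[i].runnable &&
            (best == NULL || tab[i].priority > best->priority))
            best = &tab[i];
    }
    return best;
}
```

Because scheduling is non-preemptive, a function like this would only be consulted when the running thread yields or calls an operation that invokes the scheduler.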
- Here is a sketch of a simple program (using some pseudocode) demonstrating the high-level use of the LWP facility. The opening #include line brings in the exported LWP definitions. Following this, a routine is defined to wait on a "queue" object until something is deposited in it, calling the scheduler as soon as something arrives. Please note that various LWP routines are introduced here. Their definitions will appear later, in Section 2.3.1.
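#include <afs/lwp.h>

static read_process(id)
    int *id;
{
    /* Give up the processor before entering the wait loop. */
    LWP_DispatchProcess();
    for (;;) {
        /* Sleep until something is deposited in the queue. */
        while (empty(q))
            LWP_WaitProcess(q);
        /* Call the scheduler as soon as something arrives. */
        LWP_DispatchProcess();
    }
}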
- The next routine, write_process(), sits in a loop, putting messages on the shared queue and signalling the reader, which is waiting for activity on the queue. Signalling a thread is accomplished via the LWP_SignalProcess() library routine.
static write_process()
{
    ...
    /* Put each message on the shared queue and signal the reader. */
    for (mesg = messages; *mesg != 0; mesg++) {
        insert(q, *mesg);
        LWP_SignalProcess(q);
    }
}
- Finally, here is the main routine for this demo pseudocode. It starts by calling the LWP initialization routine. Next, it creates some number of reader threads with calls to LWP_CreateProcess() in addition to the single writer thread. When all threads terminate, they will signal the main routine on the done variable. Once signalled, the main routine will reap all the threads with the help of the LWP_DestroyProcess() function.
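main(argc, argv)
    int argc;
    char **argv;
{
    PROCESS id;    /* filled in with the ID of the initial thread */

    LWP_InitializeProcessSupport(0, &id);
    /* Create the reader threads, then the single writer thread. */
    for (i = 0; i < nreaders; i++)
        LWP_CreateProcess(read_process, STACK_SIZE, 0, i, "Reader",
                          &readers[i]);
    LWP_CreateProcess(write_process, STACK_SIZE, 1, 0, "Writer", &writer);
    /* Wait for every reader and the writer to signal on done. */
    for (i = 0; i <= nreaders; i++)
        LWP_WaitProcess(&done);
    /* Reap the reader threads. */
    for (i = nreaders-1; i >= 0; i--)
        LWP_DestroyProcess(readers[i]);
}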
Section 2.2.2: Locking
- The LWP locking facility exports a number of routines and macros that allow a C programmer using LWP threading to place read and write locks on shared data structures. This locking facility was also written with simplicity in mind.
- In order to invoke the locking mechanism, an object of type struct Lock must be associated with the object. After being initialized with a call to Lock_Init(), the lock object is used in invocations of various macros, including ObtainReadLock(), ObtainWriteLock(), ReleaseReadLock(), ReleaseWriteLock(), ObtainSharedLock(), ReleaseSharedLock(), and BoostLock().
- Lock semantics specify that any number of readers may hold a lock in the absence of a writer. Only a single writer may acquire a lock at any given time. The lock package guarantees fairness, legislating that each reader and writer will eventually obtain a given lock. However, this fairness is only guaranteed if the priorities of the competing processes are identical. Note that ordering is not guaranteed by this package.
- Shared locks are read locks that can be "boosted" into write locks. These shared locks have an unusual locking matrix. Unboosted shared locks are compatible with read locks, yet incompatible with write locks and other shared locks. In essence, a thread holding a shared lock on an object has effectively read-locked it, and has the option to promote it to a write lock without allowing any other writer to enter the critical region during the boost operation itself.
- It is illegal for a process to request a particular lock more than once without first releasing it. Failure to obey this restriction will cause deadlock. This restriction is not enforced by the LWP code.
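The compatibility rules in the preceding paragraphs can be written down as a small lookup table. The enum and the lock_compatible() function below are hypothetical names used only for illustration; a boosted shared lock is treated as a write lock, and the table encodes only what the text states, not the internals of struct Lock.

```c
#include <assert.h>

/* Lock modes as described in the text; a boosted shared lock acts as LK_WRITE. */
enum lock_mode { LK_READ = 0, LK_SHARED = 1, LK_WRITE = 2 };

/*
 * Return non-zero if a lock of mode `want` may be granted to one thread
 * while a different thread already holds the lock in mode `held`.
 */
int lock_compatible(enum lock_mode held, enum lock_mode want)
{
    /* Rows: mode already held. Columns: READ, SHARED, WRITE requested. */
    static const int ok[3][3] = {
        /* held READ   */ { 1, 1, 0 },
        /* held SHARED */ { 1, 0, 0 },
        /* held WRITE  */ { 0, 0, 0 },
    };

    assert(held >= LK_READ && held <= LK_WRITE);
    assert(want >= LK_READ && want <= LK_WRITE);
    return ok[held][want];
}
```

The asymmetric SHARED row is the "unusual locking matrix" mentioned above: a shared lock admits readers but excludes writers and other shared holders.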
- Here is a simple pseudocode fragment serving as an example of the available locking operations. It defines a struct Vnode object, which contains a lock object. The get_vnode() routine will look up a struct Vnode object by name, and then either read-lock or write-lock it.
- As with the high-level LWP example above, the locking routines introduced here will be fully defined later, in Section 2.3.2.
#include <afs/lock.h>

struct Vnode {
    ...
    struct Lock lock;    /* Used to lock this vnode */
    ...
};

#define READ 0
#define WRITE 1

struct Vnode *get_vnode(name, how)
    char *name;
    int how;
{
    struct Vnode *v;

    v = lookup(name);
    if (how == READ)
        ObtainReadLock(&v->lock);
    else
        ObtainWriteLock(&v->lock);
    return v;    /* hand the locked vnode back to the caller */
}
Section 2.2.3: IOMGR
- The IOMGR facility associated with the LWP service allows threads to wait on various unix events. The exported IOMGR_Select() routine allows a thread to wait on the same set of events as the unix select() call. The parameters to these two routines are identical. IOMGR_Select() puts the calling LWP to sleep until no threads are active. At this point, the built-in IOMGR thread, which runs at the lowest priority, wakes up and coalesces all of the select requests together. It then performs a single select() and wakes up all threads affected by the result.
- The IOMGR_Signal() routine allows an LWP to wait on the delivery of a unix signal. The IOMGR thread installs a signal handler to catch all deliveries of the unix signal. This signal handler posts information about the signal delivery to a global data structure. The next time that the IOMGR thread runs, it delivers the signal to any waiting LWP.
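The coalescing step performed by the IOMGR thread can be sketched as follows. This is not the IOMGR implementation: the struct sel_request type and coalesce() function are hypothetical, descriptor sets are modelled as int-sized bit masks, and choosing the shortest pending timeout for the combined request is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one sleeping thread's select request. */
struct sel_request {
    int rmask, wmask, xmask;   /* read, write, and exception descriptor masks */
    long timeout_ms;           /* negative means "no timeout" */
};

/*
 * Merge all pending requests into a single one: the masks are OR-ed
 * together and the shortest timeout wins (assumption of this sketch).
 */
struct sel_request coalesce(const struct sel_request *reqs, size_t n)
{
    struct sel_request all = { 0, 0, 0, -1 };
    size_t i;

    assert(reqs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        all.rmask |= reqs[i].rmask;
        all.wmask |= reqs[i].wmask;
        all.xmask |= reqs[i].xmask;
        if (reqs[i].timeout_ms >= 0 &&
            (all.timeout_ms < 0 || reqs[i].timeout_ms < all.timeout_ms))
            all.timeout_ms = reqs[i].timeout_ms;
    }
    return all;
}
```

After the single combined select() returns, each sleeping thread whose own masks intersect the result (or whose timeout has passed) would be woken.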
- Here is a pseudocode example of the use of the IOMGR facility, providing the blueprint for an implementation of a thread-level socket listener.
void rpc_SocketListener()
{
    int ReadfdMask, WritefdMask, ExceptfdMask, rc;
    struct timeval *tvp;

    while (TRUE) {
        ...
        ExceptfdMask = ReadfdMask = (1 << rpc_RequestSocket);
        WritefdMask = 0;
        rc = IOMGR_Select(8*sizeof(int), &ReadfdMask, &WritefdMask,
                          &ExceptfdMask, tvp);
        switch (rc) {
        case 0:             /* A timeout occurred */
            continue;
        case -1:            /* An error occurred */
            SystemError("IOMGR_Select");
            exit(-1);
        case 1:             /* Some file descriptors are ready */
            ...
            /* process packet */
            ...
            break;
        default:            /* Should never occur */
            break;
        }
    }
}
Section 2.2.4: Timer
- The timer package exports a number of routines that assist in manipulating lists of objects of type struct TM_Elem. These struct TM_Elem timers are assigned a timeout value by the user and inserted in a package-maintained list. The time remaining to each timer's timeout is kept up to date by the package under user control. There are routines to remove a timer from its list, to return an expired timer from a list, and to return the next timer to expire.
- A timer is commonly used by inserting a field of type struct TM_Elem into a structure. After setting the desired timeout value, the structure is inserted into a list by means of its timer field.
- Here is a simple pseudocode example of how the timer package may be used. After calling the package initialization function, TM_Init(), the pseudocode spins in a loop. First, it updates all the timers via TM_Rescan() calls. Then, it pulls out the first expired timer object with TM_GetExpired() (if any), and processes it.
static struct TM_Elem *requests;
...
TM_Init(&requests);
...
for (;;) {
    TM_Rescan(requests);
    expired = TM_GetExpired(requests);
    if (expired == 0)
        break;
    /* ... process expired element ... */
}
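To make the rescan-then-collect cycle concrete, here is a minimal self-contained model of a timer list. It is a sketch only: struct timer is a hypothetical stand-in, not the real struct TM_Elem layout, and the elapsed time is passed in by the caller instead of being read from the time-of-day clock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer element. */
struct timer {
    struct timer *next;
    long total_time;   /* timeout requested by the user, in seconds */
    long time_left;    /* time remaining until expiry, in seconds */
};

/* Subtract `elapsed` seconds from every timer; return how many are timed out. */
int rescan(struct timer *list, long elapsed)
{
    int expired = 0;
    struct timer *t;

    assert(elapsed >= 0);
    for (t = list; t != NULL; t = t->next) {
        t->time_left -= elapsed;
        if (t->time_left <= 0)
            expired++;
    }
    return expired;
}

/* Return the first timer whose time_left is <= 0, or NULL if none has expired. */
struct timer *get_expired(struct timer *list)
{
    struct timer *t;

    for (t = list; t != NULL; t = t->next)
        if (t->time_left <= 0)
            return t;
    return NULL;
}
```

Embedding such an element as a field of a larger request structure, as the text suggests, lets the same list code serve any object that needs a timeout.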
Section 2.2.5: Fast Time
- The fast time routines allow a caller to determine the current time of day without incurring the expense of a kernel call. They work by mapping the page of the kernel that holds the time-of-day variable and examining it directly. Currently, this package only works on Suns. The routines may be called on other architectures, but they will run more slowly.
- The initialization routine for this package is fairly expensive, since it does a lookup of a kernel symbol via nlist(). If the client application program runs for only a short time, it may wish to call FT_Init() with the notReally parameter set to TRUE in order to prevent the lookup from taking place. This is useful if you are using another package that uses the fast time facility.
Section 2.2.6: Preemption
- The preemption package provides a mechanism by which control can pass between lightweight processes without the need for explicit calls to LWP_DispatchProcess(). This effect is achieved by periodically interrupting the normal flow of control to check if other (higher priority) processes are ready to run.
- The package makes use of the BSD interval timer facilities, and so will cause programs that make their own use of these facilities to malfunction. In particular, use of alarm(3) or explicit handling of SIGALRM is disallowed. Also, calls to sleep(3) may return prematurely.
- Care should be taken that routines are re-entrant where necessary. In particular, note that stdio(3) is not re-entrant in general, and hence multiple threads performing I/O on the same FILE structure may function incorrectly.
- An example pseudocode routine illustrating the use of this preemption facility appears below.
#include <sys/time.h>
#include "preempt.h"
...
struct timeval tv;

LWP_InitializeProcessSupport( ... );
tv.tv_sec = 10;
tv.tv_usec = 0;
PRE_InitPreempt(&tv);
PRE_PreemptMe();
...
PRE_BeginCritical();
...
PRE_EndCritical();
...
PRE_EndPreempt();
Section 2.3: BOS Server Files
Section 2.3.1: /usr/afs/etc/UserList
- This section covers the calling interfaces to the LWP package. Please note that LWP macros (e.g., ActiveProcess) are also included here, rather than being relegated to a different section.
Section 2.3.1.1: LWP_InitializeProcessSupport _ Initialize the LWP package
- int LWP_InitializeProcessSupport(IN int priority; OUT PROCESS *pid)
- Description
- This function initializes the LWP package. In addition, it turns the current thread of control into the initial process with the specified priority. The process ID of this initial thread is returned in the pid parameter. This routine must be called before any other routine in the LWP library. The scheduler will NOT be invoked as a result of calling LWP_InitializeProcessSupport().
- Error Codes
- LWP_EBADPRI The given priority is invalid, either negative or too large.
Section 2.3.1.2: LWP_TerminateProcessSupport _ End process support, perform cleanup
- int LWP_TerminateProcessSupport()
- Description
- This routine terminates the LWP threading support and cleans up after it by freeing any auxiliary storage used. This routine must be called from within the process that invoked LWP_InitializeProcessSupport(). After LWP_TerminateProcessSupport() has been called, it is acceptable to call LWP_InitializeProcessSupport() again in order to restart LWP process support.
- Error Codes
- ---Always succeeds, or performs an abort().
Section 2.3.1.3: LWP_CreateProcess _ Create a new thread
- int LWP_CreateProcess(IN int (*ep)(); IN int stacksize; IN int priority; IN char *parm; IN char *name; OUT PROCESS *pid)
- Description
- This function is used to create a new lightweight process with a given printable name. The ep argument identifies the function to be used as the body of the thread. The argument to be passed to this function is contained in parm. The new thread's stack size in bytes is specified in stacksize, and its execution priority in priority. The pid parameter is used to return the process ID of the new thread.
- If the thread is successfully created, it will be marked as runnable. The scheduler is called before the LWP_CreateProcess() call completes, so the new thread may indeed begin its execution before the completion. Note that the new thread is guaranteed NOT to run before the call completes if the specified priority is lower than the caller's. On the other hand, if the new thread's priority is higher than the caller's, then it is guaranteed to run before the creation call completes.
- Error Codes
- LWP_EBADPRI The given priority is invalid, either negative or too large.
LWP_NOMEM Could not allocate memory to satisfy the creation request.
Section 2.3.1.4: LWP_DestroyProcess _ Destroy a thread
- int LWP_DestroyProcess(IN PROCESS pid)
- Description
- This routine destroys the thread identified by pid. It will be terminated immediately, and its internal storage will be reclaimed. A thread is allowed to destroy itself. In this case, of course, it will only get to see the return code if the operation fails. Note that a thread may also destroy itself by returning from the parent C routine.
- The scheduler is called by this operation, which may cause an arbitrary number of threads to execute before the caller regains the processor.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
Section 2.3.1.5: LWP_WaitProcess _ Wait on an event
- int LWP_WaitProcess(IN char *event)
- Description
- This routine puts the thread making the call to sleep until another LWP calls the LWP_SignalProcess() or LWP_NoYieldSignal() routine with the specified event. Note that signalled events are not queued. If a signal occurs and no thread is awakened, the signal is lost. The scheduler is invoked by the LWP_WaitProcess() routine.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
LWP_EBADEVENT The given event pointer is null.
Section 2.3.1.6: LWP_MwaitProcess _ Wait on a set of events
- int LWP_MwaitProcess(IN int wcount; IN char *evlist[])
- Description
- This function allows a thread to wait for wcount signals on any of the items in the given evlist. Any number of signals of a particular event is counted only once. The evlist is a null-terminated list of events to wait for. The scheduler will be invoked.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
LWP_EBADCOUNT An illegal number of events has been supplied.
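The counting rule for multi-event waits (repeated signals on one event count once) can be modelled as below. The struct mwait type and mwait_signal() function are hypothetical and exist only to illustrate the rule; events are identified by address, as the char * event parameters suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one thread blocked in a multi-event wait. */
struct mwait {
    char **evlist;   /* null-terminated list of events being waited on */
    int *seen;       /* one flag per evlist entry, initially 0 */
    int remaining;   /* wcount minus the distinct events signalled so far */
};

/*
 * Note a signal on `event`. Returns 1 once enough distinct events have
 * been signalled for the waiter to become runnable, 0 otherwise.
 */
int mwait_signal(struct mwait *w, char *event)
{
    size_t i;

    assert(w != NULL && event != NULL);
    for (i = 0; w->evlist[i] != NULL; i++) {
        if (w->evlist[i] == event && !w->seen[i]) {
            w->seen[i] = 1;    /* further signals on this event are ignored */
            w->remaining--;
        }
    }
    return w->remaining <= 0;
}
```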
Section 2.3.1.7: LWP_SignalProcess _ Signal an event
- int LWP_SignalProcess(IN char *event)
- Description
- This routine causes the given event to be signalled. All threads waiting for this event (exclusively) will be marked as runnable, and the scheduler will be invoked. Note that threads waiting on multiple events via LWP_MwaitProcess() may not be marked as runnable. Signals are not queued. Therefore, if no thread is waiting for the signalled event, the signal will be lost.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
LWP_EBADEVENT A null event pointer has been provided.
LWP_ENOWAIT No thread was waiting on the given event.
Section 2.3.1.8: LWP_NoYieldSignal _ Signal an event without invoking scheduler
- int LWP_NoYieldSignal(IN char *event)
- Description
- This function is identical to LWP_SignalProcess() except that the scheduler will not be invoked. Thus, control will remain with the signalling process.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
LWP_EBADEVENT A null event pointer has been provided.
LWP_ENOWAIT No thread was waiting on the given event.
Section 2.3.1.9: LWP_DispatchProcess _ Yield control to the scheduler
- int LWP_DispatchProcess()
- Description
- This routine causes the calling thread to yield voluntarily to the LWP scheduler. If no other thread of appropriate priority is marked as runnable, the caller will continue its execution.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
Section 2.3.1.10: LWP_CurrentProcess _ Get the current thread's ID
- int LWP_CurrentProcess(OUT PROCESS *pid)
- Description
- This call places the current lightweight process ID in the pid parameter.
- Error Codes
- LWP_EINIT The LWP package has not been initialized.
Section 2.3.1.11: LWP_ActiveProcess _ Get the current thread's ID (macro)
- int LWP_ActiveProcess()
- Description
- This macro's value is the current lightweight process ID. It generates a value identical to that acquired by calling the LWP_CurrentProcess() function described above if the LWP package has been initialized. If no such initialization has been done, it will return a value of zero.
Section 2.3.1.12: LWP_StackUsed _ Calculate stack usage
- int LWP_StackUsed(IN PROCESS pid; OUT int *max; OUT int *used)
- Description
- This function returns the amount of stack space allocated to the thread whose identifier is pid, and the amount actually used so far. This is possible if the global variable lwp_stackUseEnabled was TRUE when the thread was created (it is set this way by default). If so, the thread's stack area was initialized with a special pattern. The memory still stamped with this pattern can be determined, and thus the amount of stack used can be calculated. The max parameter is always set to the thread's stack allocation value, and used is set to the computed stack usage if lwp_stackUseEnabled was set when the process was created, or else zero.
- Error Codes
- LWP_NO_STACK Stack usage was not enabled at thread creation time.
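The pattern-stamping technique behind this stack usage calculation can be sketched independently of the LWP package. Everything below is an assumption made for illustration: the fill byte, the function names, and in particular the premise that the stack grows downward, so that untouched memory sits at the low end of the region.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_PATTERN 0xA5   /* hypothetical fill byte */

/* Stamp the whole stack region with the pattern at thread creation time. */
void stamp_stack(unsigned char *stack, size_t size)
{
    memset(stack, STACK_PATTERN, size);
}

/*
 * Estimate how many bytes have been used. Assumes the stack grows
 * downward from stack + size, so bytes still carrying the pattern form
 * a run starting at the lowest address.
 */
size_t stack_used(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;

    assert(stack != NULL);
    while (untouched < size && stack[untouched] == STACK_PATTERN)
        untouched++;
    return size - untouched;
}
```

The estimate is a high-water mark, and it can undercount if the thread happens to write the pattern byte itself at the deepest point of its stack.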
Section 2.3.1.13: LWP_NewRock _ Establish thread-specific storage
- int LWP_NewRock(IN int tag; IN char **value)
- Description
- This function establishes a "rock", or thread-specific information, associating it with the calling LWP. The tag is intended to be any unique integer value, and the value is a pointer to a character array containing the given data.
- Users of the LWP package must coordinate their choice of tag values. Note that a tag's value cannot be changed. Thus, to obtain a mutable data structure, another level of indirection is required. Up to MAXROCKS (4) rocks may be associated with any given thread.
- Error Codes
- LWP_ENOROCKS A rock with the given tag field already exists, or all of the MAXROCKS are in use.
Section 2.3.1.14: LWP_GetRock _ Retrieve thread-specific storage
- int LWP_GetRock(IN int tag; OUT char **value)
- Description
- This routine recovers the thread-specific information associated with the calling process and the given tag, if any. Such a rock had to be established through an LWP_NewRock() call. The rock's value is deposited into value.
- Error Codes
- LWP_EBADROCK A rock has not been associated with the given tag for this thread.
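The rock mechanism amounts to a small per-thread table of tag/value pairs. The following sketch models it with hypothetical names (struct rock_set, new_rock(), get_rock(), and the ROCK_* codes); it follows the documented behaviour of at most MAXROCKS entries and immutable values per tag, but it is not the LWP implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAXROCKS 4

/* Hypothetical result codes for this sketch. */
enum { ROCK_OK = 0, ROCK_EBADROCK = -1, ROCK_ENOROCKS = -2 };

struct rock { int tag; char *value; };

/* Per-thread storage: up to MAXROCKS tag/value pairs. */
struct rock_set {
    struct rock rocks[MAXROCKS];
    int count;
};

/* Add a rock; fails if the tag is already present or the set is full. */
int new_rock(struct rock_set *s, int tag, char *value)
{
    int i;

    assert(s != NULL);
    for (i = 0; i < s->count; i++)
        if (s->rocks[i].tag == tag)
            return ROCK_ENOROCKS;   /* a tag's value cannot be changed */
    if (s->count == MAXROCKS)
        return ROCK_ENOROCKS;
    s->rocks[s->count].tag = tag;
    s->rocks[s->count].value = value;
    s->count++;
    return ROCK_OK;
}

/* Look up the rock for `tag`, depositing its value into *value. */
int get_rock(const struct rock_set *s, int tag, char **value)
{
    int i;

    assert(s != NULL && value != NULL);
    for (i = 0; i < s->count; i++) {
        if (s->rocks[i].tag == tag) {
            *value = s->rocks[i].value;
            return ROCK_OK;
        }
    }
    return ROCK_EBADROCK;
}
```

Since a stored value cannot be replaced, a caller wanting mutable per-thread state would store a pointer to a structure and modify the structure through it.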
Section 2.3.2: /usr/afs/etc/CellServDB
- This section covers the calling interfaces to the locking package. Many of the user-callable routines are actually implemented as macros.
Section 2.3.2.1: Lock_Init _ Initialize lock structure
- void Lock_Init(IN struct Lock *lock)
- Description
- This function must be called on the given lock object before any other operations can be performed on it.
- Error Codes
- ---No value is returned.
Section 2.3.2.2: ObtainReadLock _ Acquire a read lock
- void ObtainReadLock(IN struct Lock *lock)
- Description
- This macro obtains a read lock on the specified lock object. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation. Read locks are incompatible with write and boosted shared locks, but are compatible with other read locks and with unboosted shared locks.
- Error Codes
- ---No value is returned.
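The warning about side-effect producing arguments applies to any macro that mentions its parameter more than once. The sketch below demonstrates the effect with a hypothetical macro, BUMP_READERS(), which is not one of the locking macros; it merely has the same property of expanding its argument twice.

```c
#include <assert.h>
#include <stddef.h>

struct counter { int readers; int touched; };

/* Hypothetical macro that, like the locking macros, uses its argument twice. */
#define BUMP_READERS(c) ((c)->touched = 1, (c)->readers++)

int calls;   /* how many times next_counter() has been invoked */

/* A side-effect producing expression: each call is counted. */
struct counter *next_counter(struct counter *c)
{
    assert(c != NULL);
    calls++;
    return c;
}
```

Writing BUMP_READERS(next_counter(&c)) invokes next_counter() twice, once for each appearance of the parameter in the expansion. The safe pattern is to evaluate the expression into a local pointer first and pass that variable to the macro.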
Section 2.3.2.3: ObtainWriteLock _ Acquire a write lock
- void ObtainWriteLock(IN struct Lock *lock)
- Description
- This macro obtains a write lock on the specified lock object. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation.
- Write locks are incompatible with all other locks.
- Error Codes
- ---No value is returned.
Section 2.3.2.4: ObtainSharedLock _ Acquire a shared lock
- void ObtainSharedLock(IN struct Lock *lock)
- Description
- This macro obtains a shared lock on the specified lock object. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation.
- Shared locks are incompatible with write and boosted shared locks, but are compatible with read locks.
- Error Codes
- ---No value is returned.
Section 2.3.2.5: ReleaseReadLock _ Release read lock
- void ReleaseReadLock(IN struct Lock *lock)
- Description
- This macro releases the specified lock. The lock must have been previously read-locked. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation. The results are also unpredictable if the lock was not previously read-locked by the thread calling ReleaseReadLock().
- Error Codes
- ---No value is returned.
Section 2.3.2.6: ReleaseWriteLock _ Release write lock
- void ReleaseWriteLock(IN struct Lock *lock)
- Description
- This macro releases the specified lock. The lock must have been previously write-locked. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation. The results are also unpredictable if the lock was not previously write-locked by the thread calling ReleaseWriteLock().
- Error Codes
- ---No value is returned.
Section 2.3.2.7: ReleaseSharedLock _ Release shared lock
- void ReleaseSharedLock(IN struct Lock *lock)
- Description
- This macro releases the specified lock. The lock must have been previously share-locked. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation. The results are also unpredictable if the lock was not previously share-locked by the thread calling ReleaseSharedLock().
- Error Codes
- ---No value is returned.
Section 2.3.2.8: CheckLock _ Determine state of a lock
- int CheckLock(IN struct Lock *lock)
- Description
- This macro produces an integer that specifies the status of the indicated lock. The value will be -1 if the lock is write-locked, 0 if unlocked, or otherwise a positive integer that indicates the number of readers (threads holding read locks). Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation.
- Error Codes
- ---Instead of error codes, the lock status described above is produced.
Section 2.3.2.9: BoostLock _ Boost a shared lock
- void BoostLock(IN struct Lock *lock)
- Description
- This macro promotes ("boosts") a shared lock into a write lock. Such a boost operation guarantees that no other writer can get into the critical section in the process. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation.
- Error Codes
- ---No value is returned.
Section 2.3.2.10: UnboostLock _ Unboost a shared lock
- void UnboostLock(IN struct Lock *lock)
- Description
- This macro demotes a boosted shared lock back down into a regular shared lock. Such an unboost operation guarantees that no other writer can get into the critical section in the process. Since this is a macro and not a function call, results are not predictable if the value of the lock parameter is a side-effect producing expression, as it will be evaluated multiple times in the course of the macro interpretation.
- Error Codes
- ---No value is returned.
Section 2.3.3: /usr/afs/etc/ThisCell
- This section covers the calling interfaces to the I/O management package.
Section 2.3.3.1: IOMGR_Initialize _ Initialize the package
- int IOMGR_Initialize()
- Description
- This function initializes the IOMGR package. Its main task is to create the IOMGR thread itself, which runs at the lowest possible priority (0). The remainder of the lightweight processes must be running at priority 1 or greater (up to a maximum of LWP_MAX_PRIORITY (4)) for the IOMGR package to function correctly.
- Error Codes
- -1 The LWP and/or timer package has not been initialized.
<misc> Any errors that may be returned by the LWP_CreateProcess() routine.
Section 2.3.3.2: IOMGR_Finalize _ Clean up the IOMGR facility
- int IOMGR_Finalize()
- Description
- This routine cleans up after the IOMGR package when it is no longer needed. It releases all storage and destroys the IOMGR thread itself.
- Error Codes
- <misc> Any errors that may be returned by the LWP_DestroyProcess() routine.
Section 2.3.3.3: IOMGR_Select _ Perform a thread-level select()
- int IOMGR_Select(IN int numfds; IN int *rfds; IN int *wfds; IN int *xfds; IN struct timeval *timeout)
- Description
- This routine performs an LWP version of the unix select() operation. The parameters have the same meanings as with the unix call. However, the return values will be simplified (see below). If this is a polling select (i.e., timeout specifies a zero interval), it is done and the IOMGR_Select() function returns to the user with the results. Otherwise, the calling thread is put to sleep. If at some point the IOMGR thread is the only runnable process, it will awaken and collect all select requests. The IOMGR will then perform a single select and awaken the appropriate processes. This will force a return from the affected IOMGR_Select() calls.
- Error Codes
- -1 An error occurred.
0 A timeout occurred.
1 Some number of file descriptors are ready.
Section 2.3.3.4: IOMGR_Signal _ Associate unix and LWP signals
- int IOMGR_Signal(IN int signo; IN char *event)
- Description
- This function associates an LWP signal with a unix signal. After this call, when the given unix signal signo is delivered to the (heavyweight unix) process, the IOMGR thread will deliver an LWP signal to the event via LWP_NoYieldSignal(). This wakes up any lightweight processes waiting on the event. Multiple deliveries of the signal may be coalesced into one LWP wakeup. The call to LWP_NoYieldSignal() will happen synchronously. It is safe for an LWP to check for some condition and then go to sleep waiting for a unix signal without having to worry about delivery of the signal happening between the check and the call to LWP_WaitProcess().
- Error Codes
- LWP_EBADSIG The signo value is out of range.
LWP_EBADEVENT The event pointer is null.
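The post-now, deliver-later pattern described for unix signals can be sketched with the standard POSIX sigaction() interface. This is an illustration, not the IOMGR code: watch_signal() and deliver_pending() are hypothetical names, a single flag stands in for the global data structure, and the later LWP wakeup is reduced to a return value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed later by the polling side. */
static volatile sig_atomic_t pending;

/* Handler: only records that the signal arrived. */
static void post_signal(int signo)
{
    (void)signo;
    pending = 1;
}

/* Start catching `signo`. Returns 0 on success, -1 on failure. */
int watch_signal(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = post_signal;
    sigemptyset(&sa.sa_mask);
    pending = 0;
    return sigaction(signo, &sa, NULL);
}

/*
 * Called from ordinary (non-handler) context. Returns 1 if one or more
 * deliveries occurred since the last call, 0 otherwise; several
 * deliveries are coalesced into a single report.
 */
int deliver_pending(void)
{
    if (!pending)
        return 0;
    pending = 0;
    return 1;
}
```

Keeping the handler down to a single flag assignment avoids calling non-reentrant code from signal context; all real work happens when the flag is later examined.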
Section 2.3.3.5: IOMGR_CancelSignal _ Cancel unix and LWP signal association
- int IOMGR_CancelSignal(IN int signo)
- Description
- This routine cancels the association between a unix signal and an LWP event. After calling this function, the unix signal signo will be handled however it was handled before the corresponding call to IOMGR_Signal().
- Error Codes
- LWP_EBADSIG The signo value is out of range.
Section 2.3.3.6: IOMGR_Sleep _ Sleep for a given period
- void IOMGR_Sleep(IN unsigned seconds)
- Description
- This function calls IOMGR_Select() with zero file descriptors and a timeout structure set up to cause the thread to sleep for the given number of seconds.
- Error Codes
- ---No value is returned.
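The same sleeping technique works with the ordinary unix select() call, which makes it easy to try outside the LWP environment. The select_sleep() function below is a hypothetical helper built on plain select(); unlike the thread-level version, it blocks the entire unix process.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Sleep for `seconds` by calling select() with no descriptors and only a
 * timeout. Returns select()'s result: 0 when the timeout elapses, or -1
 * on error (for example, if interrupted by a signal).
 */
int select_sleep(unsigned seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return select(0, NULL, NULL, NULL, &tv);
}
```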
Section 2.3.4: /usr/afs/local/BosConfig
- This section covers the calling interface to the timer package associated with the LWP facility.
Section 2.3.4.1: TM_Init _ Initialize a timer list
- int TM_Init(IN struct TM_Elem **list)
- Description
- This function causes the specified timer list to be initialized. TM_Init() must be called before any other timer operations are applied to the list.
- Error Codes
- -1 A null timer list could not be produced.
Section 2.3.4.2: TM_Final _ Clean up a timer list
- int TM_Final(IN struct TM_Elem **list)
- Description
- This routine is called when the given empty timer list is no longer needed. All storage associated with the list is released.
- Error Codes
- -1 The list parameter is invalid.
Section 2.3.4.3: TM_Insert _ Insert an object into a timer list
- void TM_Insert(IN struct TM_Elem **list; IN struct TM_Elem *elem)
- Description
- This routine enters a new element, elem, into the list denoted by list. Before the new element is queued, its TimeLeft field (the amount of time before the object comes due) is set to the value stored in its TotalTime field. In order to keep TimeLeft fields current, the TM_Rescan() function may be used.
- Error Codes
- ---No return value is generated.
Section 2.3.4.4: TM_Rescan _ Update all timers in the list
- int TM_Rescan(IN struct TM_Elem *list)
- Description
- This function updates the TimeLeft fields of all timers on the given list. This is done by checking the time-of-day clock. Note: this is the only routine other than TM_Init() that updates the TimeLeft field in the elements on the list.
- Instead of returning a value indicating success or failure, TM_Rescan() returns the number of entries that were discovered to have timed out.
- Error Codes
- ---Instead of error codes, the number of entries that were discovered to have timed out is returned.
Section 2.3.4.5: TM GetExpired - Returns an expired timer
- struct TM Elem *TM GetExpired(IN struct TM Elem *list)
- Description
- This routine searches the specified timer list and returns a pointer to an expired timer element from that list. An expired timer is one whose TimeLeft field is less than or equal to zero. If there are no expired timers, a null element pointer is returned.
- Error Codes
- ---Instead of error codes, an expired timer pointer is returned, or a null timer pointer if there are no expired timer objects.
Section 2.3.4.6: TM GetEarliest - Returns earliest unexpired timer
- struct TM Elem *TM GetEarliest(IN struct TM Elem *list)
- Description
- This function returns a pointer to the timer element that will be next to expire on the given list. This is defined to be the timer element with the smallest (positive) TimeLeft field. If there are no timers on the list, or if they are all expired, this function will return a null pointer.
- Error Codes
- ---Instead of error codes, a pointer to the next timer element to expire is returned, or a null timer object pointer if there are no timers on the list or they are all expired.
Section 2.3.4.7: TM eql - Test for equality of two timestamps
- bool TM eql(IN struct timeval *t1; IN struct timeval *t2)
- Description
- This function compares the given timestamps, t1 and t2, for equality. Note that the function return value, bool, has been set via typedef to be equivalent to unsigned char.
- Error Codes
- 0 If the two timestamps differ.
- 1 If the two timestamps are identical.
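- The interplay of the insert, rescan, and lookup routines described above can be sketched with a small self-contained model. This is not the LWP timer code: struct mini_elem, the mini_* functions, the due field, and the use of whole seconds passed in as now (instead of reading the time-of-day clock) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer element; times are whole seconds. */
struct mini_elem {
    struct mini_elem *next;
    long TotalTime;   /* interval requested by the caller */
    long TimeLeft;    /* time remaining; <= 0 means expired */
    long due;         /* absolute due time (bookkeeping assumed by this sketch) */
};

/* Modeled on TM Insert(): TimeLeft starts out equal to TotalTime. */
static void mini_insert(struct mini_elem **list, struct mini_elem *e, long now)
{
    assert(list != NULL && e != NULL);
    e->TimeLeft = e->TotalTime;
    e->due = now + e->TotalTime;
    e->next = *list;
    *list = e;
}

/* Modeled on TM Rescan(): refresh every TimeLeft, return the expired count. */
static int mini_rescan(struct mini_elem *list, long now)
{
    int expired = 0;
    for (struct mini_elem *e = list; e != NULL; e = e->next) {
        e->TimeLeft = e->due - now;
        if (e->TimeLeft <= 0)
            expired++;
    }
    return expired;
}

/* Modeled on TM GetExpired(): an element with TimeLeft <= 0, else NULL. */
static struct mini_elem *mini_get_expired(struct mini_elem *list)
{
    for (struct mini_elem *e = list; e != NULL; e = e->next)
        if (e->TimeLeft <= 0)
            return e;
    return NULL;
}

/* Modeled on TM GetEarliest(): smallest positive TimeLeft, else NULL. */
static struct mini_elem *mini_get_earliest(struct mini_elem *list)
{
    struct mini_elem *best = NULL;
    for (struct mini_elem *e = list; e != NULL; e = e->next)
        if (e->TimeLeft > 0 && (best == NULL || e->TimeLeft < best->TimeLeft))
            best = e;
    return best;
}
```

- In this model a caller inserts elements, calls mini_rescan() periodically, and then drains expired elements with mini_get_expired(). Passing the clock value in explicitly keeps the sketch deterministic.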
Section 2.3.5: /usr/afs/local/NoAuth
- This section covers the calling interface to the fast time package associated with the LWP facility.
Section 2.3.5.1: FT Init - Initialize the fast time package
- int FT Init(IN int printErrors; IN int notReally)
- Description
- This routine initializes the fast time package, mapping in the kernel page containing the time-of-day variable. The printErrors argument, if non-zero, will cause any errors in initialization to be printed to stderr. The notReally parameter specifies whether initialization is really to be done. Other calls in this package will do auto-initialization, and hence the option is offered here.
- Error Codes
- -1 Indicates that future calls to FT GetTimeOfDay() will still work, but will not be able to access the information directly, having to make a kernel call every time.
Section 2.3.5.2: FT GetTimeOfDay - Get the current time of day
- int FT GetTimeOfDay(IN struct timeval *tv; IN struct timezone *tz)
- Description
- This routine is meant to mimic the parameters and behavior of the unix gettimeofday() function. However, as implemented, it simply calls gettimeofday() and then does some bound-checking to make sure the value is reasonable.
- Error Codes
- <misc> Whatever value was returned by gettimeofday() internally.
Section 2.3.6: /usr/afs/etc/KeyFile
- This section covers the calling interface to the preemption package associated with the LWP facility.
Section 2.3.6.1: PRE InitPreempt - Initialize the preemption package
- int PRE InitPreempt(IN struct timeval *slice)
- Description
- This function must be called to initialize the preemption package. It must appear sometime after the call to LWP InitializeProcessSupport() and sometime before the first call to any other preemption routine. The slice argument specifies the time slice size to use. If the slice pointer is set to null in the call, then the default time slice, DEFAULTSLICE (10 milliseconds), will be used. This routine uses the unix interval timer and handling of the unix alarm signal, SIGALRM, to implement this timeslicing.
- Error Codes
- LWP EINIT The LWP package hasn't been initialized.
- LWP ESYSTEM Operations on the signal vector or the interval timer have failed.
Section 2.3.6.2: PRE EndPreempt - Finalize the preemption package
- int PRE EndPreempt()
- Description
- This routine finalizes use of the preemption package. No further preemptions will be made. Note that it is not necessary to make this call before exit. PRE EndPreempt() is provided only for those applications that wish to continue after turning off preemption.
- Error Codes
- LWP EINIT The LWP package hasn't been initialized.
- LWP ESYSTEM Operations on the signal vector or the interval timer have failed.
Section 2.3.6.3: PRE PreemptMe - Mark thread as preemptible
- int PRE PreemptMe()
- Description
- This macro is used to signify the current thread as a candidate for preemption. The LWP InitializeProcessSupport() routine must have been called before PRE PreemptMe().
- Error Codes
- ---No return code is generated.
Section 2.3.6.4: PRE BeginCritical - Enter thread critical section
- int PRE BeginCritical()
- Description
- This macro places the current thread in a critical section. Upon return, and for as long as the thread is in the critical section, involuntary preemptions of this LWP will no longer occur.
- Error Codes
- ---No return code is generated.
Section 2.3.6.5: PRE EndCritical - Exit thread critical section
- int PRE EndCritical()
- Description
- This macro causes the executing thread to leave a critical section previously entered via PRE BeginCritical(). If involuntary preemptions were possible before the matching PRE BeginCritical(), they are once again possible.
- Error Codes
- ---No return code is generated.
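- One plausible way to obtain the critical-section behavior described above is a per-thread nesting counter. The sketch below is an assumption-laden model, not the preemption package itself: struct mini_thread and the mini_* functions are hypothetical names, and the real package is macro-based and driven by SIGALRM.

```c
#include <assert.h>

/* Hypothetical per-thread record; not the real LWP process control block. */
struct mini_thread {
    int preemptible;  /* set once the thread has opted in to preemption */
    int level;        /* critical-section nesting depth */
};

/* Modeled on PRE PreemptMe(): the thread becomes a preemption candidate. */
static void mini_preempt_me(struct mini_thread *t)
{
    t->preemptible = 1;
    t->level = 0;
}

/* Modeled on PRE BeginCritical(): one more level of protection. */
static void mini_begin_critical(struct mini_thread *t)
{
    t->level++;
}

/* Modeled on PRE EndCritical(): undo the matching begin. */
static void mini_end_critical(struct mini_thread *t)
{
    assert(t->level > 0);   /* every end must match an earlier begin */
    t->level--;
}

/* A timer handler would consult this before forcing a thread switch. */
static int mini_may_preempt(const struct mini_thread *t)
{
    return t->preemptible && t->level == 0;
}
```

- With a counter, leaving a critical section restores preemptibility only if the thread was preemptible before entering it, which matches the description of PRE EndCritical().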
Section 2.1: Bnodes
- The Volume Location Server allows AFS agents to query the location and basic status of volumes resident within the given cell. Volume Location Server functions may be invoked directly by authorized users via the vos utility.
- This chapter briefly discusses various aspects of the Volume Location Server's architecture. First, the need for volume location is examined, and the specific parties that call the Volume Location Server interface routines are identified. Then, the database maintained to provide volume location service, the Volume Location Database (VLDB), is examined. Finally, the vlserver process which implements the Volume Location Server is considered.
- As with all AFS servers, the Volume Location Server uses the Rx remote procedure call package for communication with its clients.
Section 2.2: BOS Server Directories
- The Cache Manager agent is the primary consumer of AFS volume location service, on which it is critically dependent for its own operation. The Cache Manager needs to map volume names or numerical identifiers to the set of File Servers on which its instances reside in order to satisfy the file system requests it is processing on behalf of its clients. Each time a Cache Manager encounters a mount point for which it does not have location information cached, it must acquire this information before the pathname resolution may be successfully completed. Once the File Server set is known for a particular volume, the Cache Manager may then select the proper site among them (e.g. choosing the single home for a read-write volume, or randomly selecting a site from a read-only volume's replication set) and begin addressing its file manipulation operations to that specific server.
- While the Cache Manager consults the volume location service, it is not capable of changing the location of volumes and hence modifying the information contained therein. This capability to perform acts which change volume location is concentrated within the Volume Server. The Volume Server process running on each server machine manages all volume operations affecting that platform, including creations, deletions, and movements between servers. It must update the volume location database every time it performs one of these actions.
- None of the other AFS system agents has a need to access the volume location database for its site. Surprisingly, this also applies to the File Server process. It is only aware of the specific set of volumes that reside on the physical disks directly attached to the machine on which it executes. It has no knowledge of the universe of volumes resident on other servers, either within its own cell or in foreign cells.
Section 2.3: BOS Server Files
- The Volume Location Database (VLDB) is used to allow AFS application programs to discover the location of any volume within their cell, along with select information about the nature and state of that volume. It is organized in a very straightforward fashion, and uses the ubik [4] [5] facility to provide replication across multiple server sites.
Section 2.3.1: /usr/afs/etc/UserList
- The VLDB itself is a very simple structure, and synchronized copies may be maintained at two or more sites. Basically, each copy consists of header information, followed by a linear (yet unbounded) array of entries. There are several associated hash tables used to perform lookups into the VLDB. The first hash table looks up volume location information based on the volume's name. There are three other hash tables used for lookup, based on volume ID/type pairs, one for each possible volume type.
- The VLDB for a large site may grow to contain tens of thousands of entries, so some attempts were made to make each entry as small as possible. For example, server addresses within VLDB entries are represented as single-byte indices into a table containing the full longword IP addresses.
- A free list is kept for deleted VLDB entries. The VLDB will not grow unless all the entries on the free list have been exhausted, keeping it as compact as possible.
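- The space saving from single-byte server indices can be illustrated with a minimal sketch. The structure layouts, the names mini_vldb, mini_entry, mini_intern_addr, and mini_site_addr, and the limits MAX_SERVERS and SITES_PER_ENTRY are all hypothetical; they are not the actual VLDB on-disk format.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SERVERS     255   /* illustrative limit implied by a one-byte index */
#define SITES_PER_ENTRY 8     /* illustrative; not taken from the VLDB format */

/* Hypothetical header-level table holding the full 32-bit IP addresses. */
struct mini_vldb {
    uint32_t addr_table[MAX_SERVERS];
    int n_addrs;
};

/* Hypothetical entry: one byte per site instead of a four-byte address. */
struct mini_entry {
    char name[32];
    uint8_t server_index[SITES_PER_ENTRY];
    int n_sites;
};

/* Return the table index for addr, adding it if needed; -1 if the table is full. */
static int mini_intern_addr(struct mini_vldb *db, uint32_t addr)
{
    for (int i = 0; i < db->n_addrs; i++)
        if (db->addr_table[i] == addr)
            return i;
    if (db->n_addrs == MAX_SERVERS)
        return -1;
    db->addr_table[db->n_addrs] = addr;
    return db->n_addrs++;
}

/* Expand the byte index stored for a site back to the full address. */
static uint32_t mini_site_addr(const struct mini_vldb *db,
                               const struct mini_entry *e, int site)
{
    assert(site >= 0 && site < e->n_sites);
    return db->addr_table[e->server_index[site]];
}
```

- Each address is stored once in the shared table, so an entry with several sites spends one byte per site rather than four.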
Section 2.3.2: /usr/afs/etc/CellServDB
- The VLDB, along with other important AFS databases, may be replicated to multiple sites to improve its availability. The ubik replication package is used to implement this functionality for the VLDB. A full description of ubik and of the quorum completion algorithm it implements may be found in [4] and [5]. The basic abstraction provided by ubik is that of a disk file replicated to multiple server locations. One machine is considered to be the synchronization site, handling all write operations on the database file. Read operations may be directed to any of the active members of the quorum, namely a subset of the replication sites large enough to ensure integrity across such failures as individual server crashes and network partitions. All of the quorum members participate in regular elections to determine the current synchronization site. The ubik algorithms allow server machines to enter and exit the quorum in an orderly and consistent fashion. All operations to one of these replicated "abstract files" are performed as part of a transaction. If all the related operations performed under a transaction are successful, then the transaction is committed, and the changes are made permanent. Otherwise, the transaction is aborted, and all of the operations for that transaction are undone.
Section 2.4: Restart Times
- The user-space vlserver process is in charge of providing volume location service for AFS clients. This program maintains the VLDB replica at its particular server, and cooperates with all other vlserver processes running in the given cell to propagate updates to the database. It implements the RPC interface defined in the vldbint.xg definition file for the rxgen RPC stub generator program. As part of its startup sequence, it must discover the VLDB version it has on its local disk, move to join the quorum of replication sites for the VLDB, and get the latest version if the one it came up with was out of date. Eventually, it will synchronize with the other VLDB replication sites, and it will begin accepting calls.
- The vlserver program uses at most three Rx worker threads to listen for incoming Volume Location Server calls. It has a single, optional command line argument. If the string "-noauth" appears when the program is invoked, then vlserver will run in an unauthenticated mode where any individual is considered authorized to perform any VLDB operation. This mode is necessary when first bootstrapping an AFS installation.
Section 3.1: Introduction
- This chapter documents the API for the BOS Server facility, as defined by the bosint.xg Rxgen interface file and the bnode.h include file. Descriptions of all the constants, structures, macros, and interface functions available to the application programmer appear in this chapter.
Section 3.2: Constants
- This section covers the basic constant definitions of interest to the BOS Server application programmer. These definitions appear in the bosint.h file, automatically generated from the bosint.xg Rxgen interface file. Another file is exported to the programmer, namely bnode.h.
- Each subsection is devoted to describing constants falling into each of the following categories:
- Status bits
- Bnode activity bits
- Bnode states
- Pruning server binaries
- Flag bits for struct bnode proc
- One constant of general utility is BOZO BSSIZE, which defines the length in characters of BOS Server character string buffers, including the trailing null. It is defined to be 256 characters.
Section 3.2.1: Status Bits
- The following bit values are used in the flags field of struct bozo status, as defined in Section 3.3.4. They record whether or not the associated bnode process currently has a stored core file, whether the bnode execution was stopped because of an excessive number of errors, and whether the mode bits on server binary directories are incorrect.
- Name - Value - Description
- BOZO HASCORE - 1 - Does this bnode have a stored core file?
- BOZO ERRORSTOP - 2 - Was this bnode execution shut down because of an excessive number of errors (more than 10 in a 10 second period)?
- BOZO BADDIRACCESS - 4 - Are the mode bits on the /usr/afs directory and its descendants (etc, bin, logs, backup, db, local, etc/KeyFile, etc/UserList) incorrectly set?
Section 3.2.2: Bnode Activity Bits
- This section describes the legal values for the bit positions within the flags field of struct bnode, as defined in Section 3.3.8. They specify conditions related to the basic activity of the bnode and of the entities relying on it.
- Name - Value - Description
- BNODE NEEDTIMEOUT - 0x01 - This bnode is utilizing the timeout mechanism for invoking actions on its behalf.
- BNODE ACTIVE - 0x02 - The given bnode is in active service.
- BNODE WAIT - 0x04 - Someone is waiting for a status change in this bnode.
- BNODE DELETE - 0x08 - This bnode should be deleted at the earliest convenience.
- BNODE ERRORSTOP - 0x10 - This bnode was decommissioned because of an excessive number of errors in its associated unix processes.
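- Since each activity value occupies its own bit position, several conditions can be recorded in one flags word at once. The following minimal sketch shows the usual set, clear, and test idiom. The underscore spellings of the constant names are assumed C forms of the names listed above, and the flag_* helpers are hypothetical.

```c
#include <assert.h>

/* Bit values as listed for the bnode activity flags. */
#define BNODE_NEEDTIMEOUT 0x01
#define BNODE_ACTIVE      0x02
#define BNODE_WAIT        0x04
#define BNODE_DELETE      0x08
#define BNODE_ERRORSTOP   0x10

/* Hypothetical helpers operating on a copy of the short flags field. */
static short flag_set(short flags, short bit)
{
    assert(bit != 0);
    return (short)(flags | bit);
}

static short flag_clear(short flags, short bit)
{
    assert(bit != 0);
    return (short)(flags & ~bit);
}

static int flag_test(short flags, short bit)
{
    return (flags & bit) != 0;
}
```

- For example, a bnode that is both in service and using the timeout mechanism carries the value 0x03 in its flags field.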
Section 3.2.3: Bnode States
- The constants defined in this section are used as values within the goal and fileGoal fields within a struct bnode. They specify either the current state of the associated bnode, or the anticipated state. In particular, the fileGoal field, which is the value stored on disk for the bnode, always represents the desired state of the bnode, whether or not it properly reflects the current state. For this reason, only BSTAT SHUTDOWN and BSTAT NORMAL may be used within the fileGoal field. The goal field may take on any of these values, and accurately reflects the current status of the bnode.
- Name - Value - Description
- BSTAT SHUTDOWN - 0 - The bnode's execution has been (should be) terminated.
- BSTAT NORMAL - 1 - The bnode is (should be) executing normally.
- BSTAT SHUTTINGDOWN - 2 - The bnode is currently being shut down; execution has not yet ceased.
- BSTAT STARTINGUP - 3 - The bnode execution is currently being commenced; execution has not yet begun.
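- The restriction that fileGoal may hold only the two stable states, while goal may hold any of the four, can be expressed as a pair of validity checks. This is a sketch under stated assumptions: the underscore spellings are assumed C forms of the names above, and is_valid_goal(), is_valid_file_goal(), and transient_goal_for() are hypothetical helpers, not BOS Server functions.

```c
#include <assert.h>

enum {
    BSTAT_SHUTDOWN     = 0,
    BSTAT_NORMAL       = 1,
    BSTAT_SHUTTINGDOWN = 2,
    BSTAT_STARTINGUP   = 3
};

/* The goal field may hold any of the four states. */
static int is_valid_goal(int g)
{
    return g >= BSTAT_SHUTDOWN && g <= BSTAT_STARTINGUP;
}

/* The fileGoal field records a desired state, so only the stable two apply. */
static int is_valid_file_goal(int g)
{
    return g == BSTAT_SHUTDOWN || g == BSTAT_NORMAL;
}

/* Hypothetical: the transient state passed through on the way to a stored goal. */
static int transient_goal_for(int fileGoal)
{
    assert(is_valid_file_goal(fileGoal));
    return fileGoal == BSTAT_NORMAL ? BSTAT_STARTINGUP : BSTAT_SHUTTINGDOWN;
}
```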
Section 3.2.4: Pruning Server Binaries
- The BOZO Prune() interface function, fully defined in Section 3.6.6.4, allows a properly-authenticated caller to remove ("prune") old copies of server binaries and core files managed by the BOS Server. This section identifies the legal values for the flags argument to the above function, specifying exactly what is to be pruned.
- Name - Value - Description
- BOZO PRUNEOLD - 1 - Prune all server binaries with the *.OLD extension.
- BOZO PRUNEBAK - 2 - Prune all server binaries with the *.BAK extension.
- BOZO PRUNECORE - 4 - Prune core files.
Section 3.2.5: Flag Bits for struct bnode proc
- This section specifies the acceptable bit values for the flags field in the struct bnode proc structure, as defined in Section 3.3.9. Basically, they are used to record whether or not the unix binary associated with the bnode has ever been run, and if so whether it has ever exited.
- Name - Value - Description
- BPROC STARTED - 1 - Has the associated unix process ever been started up?
- BPROC EXITED - 2 - Has the associated unix process ever exited?
Section 3.3: Structures
- This section describes the major exported BOS Server data structures of interest to application programmers.
Section 3.3.1: struct bozo netKTime
- This structure is used to communicate time values to and from the BOS Server. In particular, the BOZO GetRestartTime() and BOZO SetRestartTime() interface functions, described in Sections 3.6.2.5 and 3.6.2.6 respectively, use parameters declared to be of this type.
- Four of the fields in this structure specify the hour, minute, second, and day of the event in question. The first field in the layout serves as a mask, identifying which of the above four fields are to be considered when matching the specified time to a given reference time (most often the current time). The bit values that may be used for the mask field are defined in the afs/ktime.h include file. For convenience, their values are reproduced here, including some special cases at the end of the table.
- Name - Value - Description
- KTIME HOUR - 0x01 - Hour should match.
- KTIME MIN - 0x02 - Minute should match.
- KTIME SEC - 0x04 - Second should match.
- KTIME DAY - 0x08 - Day should match.
- KTIME TIME - 0x07 - All times should match.
- KTIME NEVER - 0x10 - Special case: never matches.
- KTIME NOW - 0x20 - Special case: right now.
Fields
- int mask - A field of bit values used to specify which of the following fields are to be used in computing matches.
- short hour - The hour, ranging in value from 0 to 23.
- short min - The minute, ranging in value from 0 to 59.
- short sec - The second, ranging in value from 0 to 59.
- short day - Zero specifies Sunday, other days follow in order.
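- How the mask selects which fields take part in a match can be sketched as follows. The structure mirrors the field list above, but ktime_matches() is a hypothetical helper written for this illustration, and the precedence given to the two special-case bits is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define KTIME_HOUR  0x01
#define KTIME_MIN   0x02
#define KTIME_SEC   0x04
#define KTIME_DAY   0x08
#define KTIME_TIME  0x07   /* hour, minute, and second together */
#define KTIME_NEVER 0x10
#define KTIME_NOW   0x20

struct bozo_netKTime {
    int mask;
    short hour;   /* 0 to 23 */
    short min;    /* 0 to 59 */
    short sec;    /* 0 to 59 */
    short day;    /* 0 is Sunday */
};

/* Return 1 if the reference time satisfies every field selected by mask. */
static int ktime_matches(const struct bozo_netKTime *k,
                         int hour, int min, int sec, int day)
{
    assert(k != NULL);
    if (k->mask & KTIME_NEVER)
        return 0;                 /* assumption: NEVER wins over everything */
    if (k->mask & KTIME_NOW)
        return 1;
    if ((k->mask & KTIME_HOUR) && k->hour != hour)
        return 0;
    if ((k->mask & KTIME_MIN) && k->min != min)
        return 0;
    if ((k->mask & KTIME_SEC) && k->sec != sec)
        return 0;
    if ((k->mask & KTIME_DAY) && k->day != day)
        return 0;
    return 1;
}
```

- A mask of KTIME_TIME | KTIME_DAY with hour 4 and day 0 therefore describes Sunday at 04:00:00, while KTIME_TIME alone describes 04:00:00 on every day.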
Section 3.3.2: struct bozo key
- This structure defines the format of an AFS encryption key, as stored in the key file located at /usr/afs/etc/KeyFile at the host on which the BOS Server runs. It is used in the argument list of the BOZO ListKeys() and BOZO AddKeys() interface functions, as described in Sections 3.6.4.4 and 3.6.4.5 respectively.
Fields
- char data[8] - The array of 8 characters representing an encryption key.
Section 3.3.3: struct bozo keyInfo
- This structure defines the information kept regarding a given AFS encryption key, as represented by a variable of type struct bozo key, as described in Section 3.3.2 above. A parameter of this type is used by the BOZO ListKeys() function (described in Section 3.6.4.4). It contains fields holding the associated key's modification time, a checksum on the key, and an unused longword field. Note that the mod sec time field listed below is a standard unix time value.
Fields
- long mod sec - The time in seconds when the associated key was last modified.
- long mod usec - The number of microseconds elapsed since the second reported in the mod sec field. This field is never set by the BOS Server, and should always contain a zero.
- unsigned long keyCheckSum - The 32-bit cryptographic checksum of the associated key. A block of zeros is encrypted, and the first four bytes of the result are placed into this field.
- long spare2 - This longword field is currently unused, and is reserved for future use.
Section 3.3.4: struct bozo status
- This structure defines the layout of the information returned by the status parameter for the interface function BOZO GetInstanceInfo(), as defined in Section 3.6.2.3. The enclosed fields include such information as the temporary and long-term goals for the process instance, an array of bit values recording status information, start and exit times, and associated error codes and signals.
Fields
- long goal - The short-term goal for a process instance. Settings for this field are BSTAT SHUTDOWN, BSTAT NORMAL, BSTAT SHUTTINGDOWN, and BSTAT STARTINGUP. These values are fully defined in Section 3.2.3.
- long fileGoal - The long-term goal for a process instance. Accepted settings are restricted to a subset of those used by the goal field above, as explained in Section 3.2.3.
- long procStartTime - The last time the given process instance was started.
- long procStarts - The number of process starts executed on the behalf of the given bnode.
- long lastAnyExit - The last time the process instance exited for any reason.
- long lastErrorExit - The last time a process exited unexpectedly.
- long errorCode - The last exit's return code.
- long errorSignal - The last signal terminating the process.
- long flags - A set of status bits, drawn from BOZO HASCORE, BOZO ERRORSTOP, and BOZO BADDIRACCESS. These constants are fully defined in Section 3.2.1.
- long spare[] - Eight longword spares, currently unassigned and reserved for future use.
Section 3.3.5: struct bnode ops
- This structure defines the base set of operations that each BOS Server bnode type (struct bnode type, see Section 3.3.6 below) must implement. They are called at the appropriate times within the BOS Server code via the BOP * macros (see Section 3.5 and the individual descriptions therein). They allow each bnode type to define its own behavior in response to its particular needs.
Fields
- struct bnode *(*create)() - This function is called whenever a bnode of the given type is created. Typically, this function will create bnode structures peculiar to its own type and initialize the new records. Each type implementation may take a different number of parameters. Note: there is no BOP macro defined for this particular function; it is always called directly.
- int (*timeout)() - This function is called whenever a timeout action must be taken for this bnode type. It takes a single argument, namely a pointer to a type-specific bnode structure. The BOP TIMEOUT macro is defined to simplify the construction of a call to this function.
- int (*getstat)() - This function is called whenever a caller is attempting to get status information concerning a bnode of the given type. It takes two parameters, the first being a pointer to a type-specific bnode structure, and the second being a pointer to a longword in which the desired status value will be placed. The BOP GETSTAT macro is defined to simplify the construction of a call to this function.
- int (*setstat)() - This function is called whenever a caller is attempting to set the status information concerning a bnode of the given type. It takes two parameters, the first being a pointer to a type-specific bnode structure, and the second being a longword from which the new status value is obtained. The BOP SETSTAT macro is defined to simplify the construction of a call to this function.
- int (*delete)() - This function is called whenever a bnode of this type is being deleted. It is expected that the proper deallocation and cleanup steps will be performed here. It takes a single argument, a pointer to a type-specific bnode structure. The BOP DELETE macro is defined to simplify the construction of a call to this function.
- int (*procexit)() - This function is called whenever the unix process implementing the given bnode exits. It takes two parameters, the first being a pointer to a type-specific bnode structure, and the second being a pointer to the struct bnode proc (defined in Section 3.3.9), describing that process in detail. The BOP PROCEXIT macro is defined to simplify the construction of a call to this function.
- int (*getstring)() - This function is called whenever the status string for the given bnode must be fetched. It takes three parameters. The first is a pointer to a type-specific bnode structure, the second is a pointer to a character buffer, and the third is a longword specifying the size, in bytes, of the above buffer. The BOP GETSTRING macro is defined to simplify the construction of a call to this function.
- int (*getparm)() - This function is called whenever a particular parameter string for the given bnode must be fetched. It takes four parameters. The first is a pointer to a type-specific bnode structure, the second is a longword identifying the index of the desired parameter string, the third is a pointer to a character buffer to receive the parameter string, and the fourth and final argument is a longword specifying the size, in bytes, of the above buffer. The BOP GETPARM macro is defined to simplify the construction of a call to this function.
- int (*restartp)() - This function is called whenever the unix process implementing the bnode of this type is being restarted. It is expected that the stored process command line will be parsed in preparation for the coming execution. It takes a single argument, a pointer to a type-specific bnode structure from which the command line can be located. The BOP RESTARTP macro is defined to simplify the construction of a call to this function.
- int (*hascore)() - This function is called whenever it must be determined if the attached process currently has a stored core file. It takes a single argument, a pointer to a type-specific bnode structure from which the name of the core file may be constructed. The BOP HASCORE macro is defined to simplify the construction of a call to this function.
Section 3.3.6: struct bnode type
- This structure encapsulates the defining characteristics for a given bnode type. Bnode types are placed on a singly-linked list within the BOS Server, and are identified by a null-terminated character string name. They also contain the function array defined in Section 3.3.5, that implements the behavior of that object type. There are three predefined bnode types known to the BOS Server. Their names are simple, fs, and cron. It is not currently possible to dynamically define and install new BOS Server types.
Fields
- struct bnode type *next - Pointer to the next bnode type definition structure in the list.
- char *name - The null-terminated string name by which this bnode type is identified.
- struct bnode ops *ops - The function array that defines the behavior of this given bnode type.
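- Locating a bnode type by its string name on a singly-linked list might look like the sketch below. The structure follows the field list above with underscore spellings assumed; find_type() is a hypothetical helper, and the ops structure is left opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bnode_ops;   /* left opaque in this sketch */

struct bnode_type {
    struct bnode_type *next;   /* next type definition on the list */
    char *name;                /* null-terminated type name */
    struct bnode_ops *ops;     /* behavior of this type */
};

/* Walk the list and return the type with the given name, or NULL. */
static struct bnode_type *find_type(struct bnode_type *head, const char *name)
{
    assert(name != NULL);
    for (struct bnode_type *t = head; t != NULL; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}
```

- A caller that receives NULL would presumably report the failure with an error such as BZBADTYPE (unrecognized bnode type).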
Section 3.3.7: struct bnode token
- This structure is used internally by the BOS Server when parsing the command lines with which it will start up process instances. This structure is made externally visible in case additional bnode types are implemented.
Fields
- struct bnode token *next - The next token structure queued to the list.
- char *key - A pointer to the token, or parsed character string, associated with this entry.
Section 3.3.8: struct bnode
- This structure defines the essence of a BOS Server process instance. It contains such important information as the identifying string name, numbers concerning periodic execution on its behalf, the bnode's type, data on start and error behavior, a reference count used for garbage collection, and a set of flag bits.
Fields
- char *name - The null-terminated character string providing the instance name associated with this bnode.
- long nextTimeout - The next time this bnode should be awakened. At the specified time, the bnode's flags field will be examined to see if BNODE NEEDTIMEOUT is set. If so, its timeout() operation will be invoked via the BOP TIMEOUT() macro. This field will then be reset to the current time plus the value kept in the period field.
- long period - This field specifies the time period between timeout calls. It is only used by processes that need to have periodic activity performed.
- long rsTime - The time that the BOS Server started counting restarts for this process instance.
- long rsCount - The count of the number of restarts since the time recorded in the rsTime field.
- struct bnode type *type - The type object defining this bnode's behavior.
- struct bnode ops *ops - This field is a pointer to the function array defining this bnode's basic behavior. Note that this is identical to the value of type->ops.
- This pointer is duplicated here for convenience. All of the BOP * macros, discussed in Section 3.5, reference the bnode's operation array through this pointer.
- long procStartTime - The last time this process instance was started (executed).
- long procStarts - The number of starts (executions) for this process instance.
- long lastAnyExit - The last time this process instance exited for any reason.
- long lastErrorExit - The last time this process instance exited unexpectedly.
- long errorCode - The last exit return code for this process instance.
- long errorSignal - The last signal that terminated this process instance.
- char *lastErrorName - The name of the last core file generated.
- short refCount - A reference count maintained for this bnode.
- short flags - This field contains a set of bit fields that identify additional status information for the given bnode. The legal bit values, explained in Section 3.2.2, are: BNODE NEEDTIMEOUT, BNODE ACTIVE, BNODE WAIT, BNODE DELETE, and BNODE ERRORSTOP.
- char goal - The current goal for the process instance. It may take on any of the values defined in Section 3.2.3, namely BSTAT SHUTDOWN, BSTAT NORMAL, BSTAT SHUTTINGDOWN, and BSTAT STARTINGUP.
- This goal may be changed at will by an authorized caller. Such changes affect the current status of the process instance. See the description of the BOZO SetStatus() and BOZO SetTStatus() interface functions, defined in Sections 3.6.3.1 and 3.6.3.2 respectively, for more details.
- char fileGoal - This field is similar to goal, but represents the goal stored in the on-file BOS Server description of this process instance. As with the goal field, see the descriptions of the BOZO SetStatus() and BOZO SetTStatus() interface functions, defined in Sections 3.6.3.1 and 3.6.3.2 respectively, for more details.
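- The periodic wakeup described for the nextTimeout and period fields can be sketched as below. This is a simplified model, not the BOS Server loop: struct mini_bnode, check_timeout(), and the fired counter are hypothetical, the timeout operation is stored directly instead of behind an ops array, and rescheduling only when BNODE NEEDTIMEOUT is set is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define BNODE_NEEDTIMEOUT 0x01

/* Hypothetical cut-down bnode holding only what the sketch needs. */
struct mini_bnode {
    long nextTimeout;                      /* next wakeup time */
    long period;                           /* interval between timeout calls */
    short flags;
    int (*timeout)(struct mini_bnode *);   /* stands in for ops->timeout */
    int fired;                             /* demo counter, not a real field */
};

/* Example timeout operation: just count the invocations. */
static int count_timeout(struct mini_bnode *b)
{
    b->fired++;
    return 0;
}

/* Returns 1 if the timeout operation was invoked at time now. */
static int check_timeout(struct mini_bnode *b, long now)
{
    assert(b != NULL);
    if (now < b->nextTimeout)
        return 0;                          /* not yet due */
    if (!(b->flags & BNODE_NEEDTIMEOUT))
        return 0;                          /* bnode does not use timeouts */
    b->timeout(b);
    b->nextTimeout = now + b->period;      /* schedule the next wakeup */
    return 1;
}
```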
Section 3.3.9: struct bnode proc
- This structure defines all of the information known about each unix process the BOS Server is currently managing. It contains a reference to the bnode defining the process, along with the command line to be used to start the process, the optional core file name, the unix pid, and such things as a flag field to keep additional state information. The BOS Server keeps these records on a global singly-linked list.
Fields
- struct bnode proc *next - A pointer to the BOS Server's next process record.
- struct bnode *bnode - A pointer to the bnode creating and defining this unix process.
- char *comLine - The text of the command line used to start this process.
- char *coreName - An optional core file component name for this process.
- long pid - The unix pid, if successfully created.
- long lastExit - This field keeps the last termination code for this process.
- long lastSignal - The last signal used to kill this process.
- long flags - A set of bits providing additional process state. These bits are fully defined in Section 3.2.5, and are: BPROC STARTED and BPROC EXITED.
Section 3.4: Error Codes
- This section covers the set of error codes exported by the BOS Server, displaying the printable phrases with which they are associated.
- Name (Value): Description
- BZNOTACTIVE (39424L): process not active.
- BZNOENT (39425L): no such entity.
- BZBUSY (39426L): can't do operation now.
- BZEXISTS (39427L): entity already exists.
- BZNOCREATE (39428L): failed to create entity.
- BZDOM (39429L): index out of range.
- BZACCESS (39430L): you are not authorized for this operation.
- BZSYNTAX (39431L): syntax error in create parameter.
- BZIO (39432L): I/O error.
- BZNET (39433L): network problem.
- BZBADTYPE (39434L): unrecognized bnode type.
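- Since the error codes form a contiguous range, a client can map a code to its printable phrase with a simple table lookup. The sketch below assumes the consecutive values 39424L through 39434L; bz_phrase() is a hypothetical helper name, not a function exported by the BOS Server.

```c
#include <stddef.h>

#define BZNOTACTIVE 39424L
#define BZNOENT     39425L
#define BZBUSY      39426L
#define BZEXISTS    39427L
#define BZNOCREATE  39428L
#define BZDOM       39429L
#define BZACCESS    39430L
#define BZSYNTAX    39431L
#define BZIO        39432L
#define BZNET       39433L
#define BZBADTYPE   39434L

/* Hypothetical helper: return the phrase for a BOS Server error code,
 * or NULL if the code lies outside the BZ* range. */
static const char *bz_phrase(long code)
{
    static const char *const phrases[] = {
        "process not active.",
        "no such entity.",
        "can't do operation now.",
        "entity already exists.",
        "failed to create entity.",
        "index out of range.",
        "you are not authorized for this operation.",
        "syntax error in create parameter.",
        "I/O error.",
        "network problem.",
        "unrecognized bnode type."
    };

    if (code < BZNOTACTIVE || code > BZBADTYPE)
        return NULL;
    return phrases[code - BZNOTACTIVE];
}
```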
Section 3.5: Macros
- The BOS Server defines a set of macros that are externally visible via the bnode.h file. They are used to facilitate the invocation of the members of the struct bnode_ops function array, which defines the basic operations for a given bnode type. Invocations appear throughout the BOS Server code, wherever bnode type-specific operations are required. Note that the only member of the struct bnode_ops function array that does not have a corresponding invocation macro defined is create(), which is always called directly.
Section 3.5.1: BOP_TIMEOUT()
#define BOP_TIMEOUT(bnode) \
((*(bnode)->ops->timeout)((bnode)))
- Execute the bnode type-specific actions required when a timeout action must be taken. This macro takes a single argument, namely a pointer to a type-specific bnode structure.
Section 3.5.2: BOP_GETSTAT()
#define BOP_GETSTAT(bnode, a) \
((*(bnode)->ops->getstat)((bnode),(a)))
- Execute the bnode type-specific actions required when a caller is attempting to get status information concerning the bnode. It takes two parameters, the first being a pointer to a type-specific bnode structure, and the second being a pointer to a longword in which the desired status value will be placed.
Section 3.5.3: BOP_SETSTAT()
#define BOP_SETSTAT(bnode, a) \
((*(bnode)->ops->setstat)((bnode),(a)))
- Execute the bnode type-specific actions required when a caller is attempting to set the status information concerning the bnode. It takes two parameters, the first being a pointer to a type-specific bnode structure, and the second being a longword from which the new status value is obtained.
Section 3.5.4: BOP_DELETE()
#define BOP_DELETE(bnode) \
((*(bnode)->ops->delete)((bnode)))
- Execute the bnode type-specific actions required when a bnode is deleted. This macro takes a single argument, namely a pointer to a type-specific bnode structure.
Section 3.5.5: BOP_PROCEXIT()
#define BOP_PROCEXIT(bnode, a) \
((*(bnode)->ops->procexit)((bnode),(a)))
- Execute the bnode type-specific actions required whenever the unix process implementing the given bnode exits. It takes two parameters, the first being a pointer to a type-specific bnode structure, and the second being a pointer to the struct bnode_proc (defined in Section 3.3.9) describing that process in detail.
Section 3.5.6: BOP_GETSTRING()
#define BOP_GETSTRING(bnode, a, b) \
((*(bnode)->ops->getstring)((bnode),(a), (b)))
- Execute the bnode type-specific actions required when the status string for the given bnode must be fetched. It takes three parameters. The first is a pointer to a type-specific bnode structure, the second is a pointer to a character buffer, and the third is a longword specifying the size, in bytes, of the above buffer.
Section 3.5.7: BOP_GETPARM()
#define BOP_GETPARM(bnode, n, b, l) \
((*(bnode)->ops->getparm)((bnode),(n),(b),(l)))
- Execute the bnode type-specific actions required when a particular parameter string for the given bnode must be fetched. It takes four parameters. The first is a pointer to a type-specific bnode structure, the second is a longword identifying the index of the desired parameter string, the third is a pointer to a character buffer to receive the parameter string, and the fourth and final argument is a longword specifying the size, in bytes, of the above buffer.
Section 3.5.8: BOP_RESTARTP()
#define BOP_RESTARTP(bnode) \
((*(bnode)->ops->restartp)((bnode)))
- Execute the bnode type-specific actions required when the unix process implementing the bnode of this type is restarted. It is expected that the stored process command line will be parsed in preparation for the coming execution. It takes a single argument, a pointer to a type-specific bnode structure from which the command line can be located.
Section 3.5.9: BOP_HASCORE()
#define BOP_HASCORE(bnode) ((*(bnode)->ops->hascore)((bnode)))
- Execute the bnode type-specific actions required when it must be determined whether or not the attached process currently has a stored core file. It takes a single argument, a pointer to a type-specific bnode structure from which the name of the core file may be constructed.
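- The macros in this section all follow one pattern: they reach through the bnode's ops pointer and call the type-specific function with the bnode itself as first argument. The sketch below illustrates that dispatch with a cut-down struct bnode_ops holding only getstat() and setstat(). The "toy" bnode type, its functions, and the reduced struct layouts are assumptions made for illustration and do not match the full definitions in bnode.h.

```c
#include <stddef.h>

struct bnode;

/* Cut-down operations array: only two of the operations are sketched. */
struct bnode_ops {
    int (*getstat)(struct bnode *b, long *status);
    int (*setstat)(struct bnode *b, long status);
};

/* Cut-down bnode: just the ops pointer and a goal value. */
struct bnode {
    struct bnode_ops *ops;
    long goal;
};

#define BOP_GETSTAT(bnode, a) ((*(bnode)->ops->getstat)((bnode), (a)))
#define BOP_SETSTAT(bnode, a) ((*(bnode)->ops->setstat)((bnode), (a)))

/* Hypothetical "toy" bnode type whose status is simply its goal. */
static int toy_getstat(struct bnode *b, long *status)
{
    *status = b->goal;
    return 0;
}

static int toy_setstat(struct bnode *b, long status)
{
    b->goal = status;
    return 0;
}

static struct bnode_ops toy_ops = { toy_getstat, toy_setstat };
```

- With a bnode initialized as { &toy_ops, 0 }, a call such as BOP_SETSTAT(&b, 1) ends up in toy_setstat() without the caller needing to know the bnode's type.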
Section 3.6: Functions
- This section covers the BOS Server RPC interface routines. They are generated from the bosint.xg Rxgen file. At a high level, these functions may be seen as belonging to seven basic classes:
- Creating and removing process entries
- Examining process information
- Starting, stopping, and restarting processes
- Security configuration
- Cell configuration
- Installing/uninstalling server binaries
- Executing commands at the server
- The following is a summary of the interface functions and their purpose, divided according to the above classifications:
- Creating & Removing Process Entries
  - BOZO_CreateBnode(): Create a process instance.
  - BOZO_DeleteBnode(): Delete a process instance.
- Examining Process Information
  - BOZO_GetStatus(): Get status information for the given process instance.
  - BOZO_EnumerateInstance(): Get instance name from the i'th bnode.
  - BOZO_GetInstanceInfo(): Get information on the given process instance.
  - BOZO_GetInstanceParm(): Get text of command line associated with the given process instance.
  - BOZO_GetRestartTime(): Get one of the BOS Server restart times.
  - BOZO_SetRestartTime(): Set one of the BOS Server restart times.
  - BOZO_GetDates(): Get the modification times for versions of a server binary file.
  - StartBOZO_GetLog(): Pass the IN params when fetching a BOS Server log file.
  - EndBOZO_GetLog(): Get the OUT params when fetching a BOS Server log file.
  - BOZO_GetInstanceStrings(): Get strings related to a given process instance.
- Starting, Stopping & Restarting Processes
  - BOZO_SetStatus(): Set process instance status and goal.
  - BOZO_SetTStatus(): Temporarily set process instance status and goal.
  - BOZO_StartupAll(): Start all existing process instances.
  - BOZO_ShutdownAll(): Shut down all process instances.
  - BOZO_RestartAll(): Shut down, then restart all process instances.
  - BOZO_ReBozo(): Shut down, then restart all process instances and the BOS Server itself.
  - BOZO_Restart(): Restart a given process instance.
  - BOZO_WaitAll(): Wait until all process instances have reached their goals.
- Security Configuration
  - BOZO_AddSUser(): Add a user to the UserList.
  - BOZO_DeleteSUser(): Delete a user from the UserList.
  - BOZO_ListSUsers(): Get the name of the user in a given position in the UserList file.
  - BOZO_ListKeys(): List info about the key at a given index in the key file.
  - BOZO_AddKey(): Add a key to the key file.
  - BOZO_DeleteKey(): Delete the entry for an AFS key.
  - BOZO_SetNoAuthFlag(): Enable or disable authenticated call requirements.
- Cell Configuration
  - BOZO_GetCellName(): Get the name of the cell to which the BOS Server belongs.
  - BOZO_SetCellName(): Set the name of the cell to which the BOS Server belongs.
  - BOZO_GetCellHost(): Get the name of a database host given its index.
  - BOZO_AddCellHost(): Add an entry to the list of database server hosts.
  - BOZO_DeleteCellHost(): Delete an entry from the list of database server hosts.
- Installing/Uninstalling Server Binaries
  - StartBOZO_Install(): Pass the IN params when installing a server binary.
  - EndBOZO_Install(): Get the OUT params when installing a server binary.
  - BOZO_UnInstall(): Roll back from a server binary installation.
  - BOZO_Prune(): Throw away old versions of server binaries and core files.
- Executing Commands at the Server
  - BOZO_Exec(): Execute a shell command at the server.
- All of the string parameters in these functions are expected to point to character buffers that are at least BOZO_BSSIZE characters long.
Section 3.6.1: Creating and Removing Processes
- The two interface routines defined in this section are used for creating and deleting bnodes, thus determining which process instances the BOS Server must manage.
Section 3.6.1.1: BOZO_CreateBnode - Create a process instance
int BOZO_CreateBnode(IN struct rx_connection *z_conn,
    IN char *type,
    IN char *instance,
    IN char *p1,
    IN char *p2,
    IN char *p3,
    IN char *p4,
    IN char *p5,
    IN char *p6)
- Description
- This interface function allows the caller to create a bnode (process instance) on the server machine executing the routine.
- The instance's type is declared to be the string referenced in the type argument. There are three supported instance type names, namely simple, fs, and cron (see Section 2.1 for a detailed examination of the types of bnodes available).
- The bnode's name is specified via the instance parameter. Any name may be chosen for a BOS Server instance. However, it is advisable to choose a name related to the name of the actual binary being instantiated. There are eight well-known names already in common use, corresponding to the AFS system agents. They are as follows:
- kaserver for the Authentication Server.
- runntp for the Network Time Protocol Daemon (ntpd).
- ptserver for the Protection Server.
- upclient for the client portion of the UpdateServer, which brings over binary files from the /usr/afs/bin directory and configuration files from the /usr/afs/etc directory on the system control machine.
- upclientbin for the client portion of the UpdateServer, which uses the /usr/afs/bin directory on the binary distribution machine for this platform's CPU/operating system type.
- upclientetc for the client portion of the UpdateServer, which references the /usr/afs/etc directory on the system control machine.
- upserver for the server portion of the UpdateServer.
- vlserver for the Volume Location Server.
- Up to six command-line strings may be communicated in this routine, residing in arguments p1 through p6. Different types of bnodes allow for different numbers of actual server processes to be started, and the command lines required for such instantiation are passed in this manner.
- The given bnode's setstat() routine from its individual ops array will be called in the course of this execution via the BOP SETSTAT() macro.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to create new instances. If successfully created, the new BOS Server instance will be appended to the BosConfig file kept on the machine's local disk. The UserList and BosConfig files are examined in detail in Sections 2.3.1 and 2.3.4 respectively.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZEXISTS The given instance already exists.
BZBADTYPE Illegal value provided in the type parameter.
BZNOCREATE Failed to create desired entry.
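- The check on the type argument described above can be pictured as a lookup against the three supported type names. This is a minimal sketch under that reading of the text: check_bnode_type() is a hypothetical helper, and the real server may perform the check differently (for example, by searching a list of registered bnode types).

```c
#include <string.h>

#define BZBADTYPE 39434L

/* Hypothetical helper: return 0 if type names a supported bnode type,
 * BZBADTYPE otherwise. */
static long check_bnode_type(const char *type)
{
    static const char *const known[] = { "simple", "fs", "cron" };

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(type, known[i]) == 0)
            return 0;
    return BZBADTYPE;
}
```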
Section 3.6.1.2: BOZO_DeleteBnode - Delete a process instance
int BOZO_DeleteBnode(IN struct rx_connection *z_conn, IN char *instance)
- Description
- This routine deletes the BOS Server bnode whose name is specified by the instance parameter. If an instance with that name does not exist, this operation fails. Similarly, if the process or processes associated with the given bnode have not been shut down (see the description of the BOZO_ShutdownAll() interface function), the operation also fails.
- The given bnode's setstat() and delete() routines from its individual ops array will be called in the course of this execution via the BOP SETSTAT() and BOP DELETE() macros.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to delete existing instances. If successfully deleted, the old BOS Server instance will be removed from the BosConfig file kept on the machine's local disk.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZNOENT The given instance name is not registered with the BOS Server.
BZBUSY The process(es) associated with the given instance are still active (i.e., a shutdown has not yet been performed or has not yet completed).
Section 3.6.2: Examining Process Information
- This section describes the ten interface functions that collectively allow callers to obtain and modify the information stored by the BOS Server to describe the set of processes that it manages. Among the operations supported by the functions examined here are getting and setting status information, obtaining the instance parameters, times, and dates, and getting the text of log files on the server machine.
Section 3.6.2.1: BOZO_GetStatus - Get status information for the given process instance
int BOZO_GetStatus(IN struct rx_connection *z_conn,
    IN char *instance,
    OUT long *intStat,
    OUT char **statdescr)
- Description
- This interface function looks up the bnode for the given process instance and places its numerical status indicator into intStat and its status string (if any) into a buffer referenced by statdescr.
- The set of values that may be returned in the intStat argument are defined fully in Section 3.2.3. Briefly, they are BSTAT STARTINGUP, BSTAT NORMAL, BSTAT SHUTTINGDOWN, and BSTAT SHUTDOWN.
- A buffer holding BOZO_BSSIZE (256) characters is allocated, and statdescr is set to point to it. Not all bnode types implement status strings, which are used to provide additional status information for the class. An example of one bnode type that does define these strings is fs, which exports the following status strings:
- "file server running"
- "file server up; volser down"
- "salvaging file system"
- "starting file server"
- "file server shutting down"
- "salvager shutting down"
- "file server shut down"
- The given bnode's getstat() routine from its individual ops array will be called in the course of this execution via the BOP GETSTAT() macro.
- Error Codes
- BZNOENT The given process instance is not registered with the BOS Server.
Section 3.6.2.2: BOZO_EnumerateInstance - Get instance name from i'th bnode
int BOZO_EnumerateInstance(IN struct rx_connection *z_conn,
    IN long instance,
    OUT char **iname)
- Description
- This routine will find the bnode describing process instance number instance and return that instance's name in the buffer to which the iname parameter points. This function is meant to be used to enumerate all process instances at a BOS Server. The first legal instance number value is zero, which will return the instance name from the first registered bnode. Successive values for instance will return information from successive bnodes. When all bnodes have been thus enumerated, the BOZO EnumerateInstance() function will return BZDOM, indicating that the list of bnodes has been exhausted.
- Error Codes
- BZDOM The instance number indicated in the instance parameter does not exist.
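- The enumeration idiom described above (start at zero, increment until BZDOM comes back) is sketched below. To keep the example runnable without an Rx connection, toy_enumerate() is a hypothetical local stand-in for the RPC, backed by a fixed table of instance names; only the loop in count_instances() reflects the intended calling pattern.

```c
#include <stddef.h>

#define BZDOM 39429L

/* Illustrative instance table; a real server reports its own bnodes. */
static const char *const toy_instances[] = { "kaserver", "ptserver", "vlserver" };

/* Hypothetical stand-in for BOZO_EnumerateInstance(): returns 0 and sets
 * *iname for a valid index, BZDOM once the list is exhausted. */
static long toy_enumerate(long instance, const char **iname)
{
    long count = (long)(sizeof toy_instances / sizeof toy_instances[0]);

    if (instance < 0 || instance >= count)
        return BZDOM;
    *iname = toy_instances[instance];
    return 0;
}

/* Walk instance numbers 0, 1, 2, ... until the call stops succeeding. */
static long count_instances(void)
{
    const char *name;
    long i = 0;

    while (toy_enumerate(i, &name) == 0)
        i++;
    return i;
}
```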
Section 3.6.2.3: BOZO_GetInstanceInfo - Get information on the given process instance
int BOZO_GetInstanceInfo(IN struct rx_connection *z_conn,
    IN char *instance,
    OUT char **type,
    OUT struct bozo_status *status)
- Description
- Given the string name of a BOS Server instance, this interface function returns the type of the instance and its associated status descriptor. The set of values that may be placed into the type parameter are simple, fs, and cron (see Section 2.1 for a detailed examination of the types of bnodes available). The status structure filled in by the call includes such information as the goal and file goals, the process start time, the number of times the process has started, exit information, and whether or not the process has a core file.
- Error Codes
- BZNOENT The given process instance is not registered with the BOS Server.
Section 3.6.2.4: BOZO_GetInstanceParm - Get text of command line associated with the given process instance
int BOZO_GetInstanceParm(IN struct rx_connection *z_conn,
    IN char *instance,
    IN long num,
    OUT char **parm)
- Description
- Given the string name of a BOS Server process instance and an index identifying the associated command line of interest, this routine returns the text of the desired command line. The first associated command line text for the instance may be acquired by setting the index parameter, num, to zero. If an index is specified for which there is no matching command line stored in the bnode, then the function returns BZDOM.
- Error Codes
- BZNOENT The given process instance is not registered with the BOS Server.
BZDOM There is no command line text associated with index num for this bnode.
Section 3.6.2.5: BOZO_GetRestartTime - Get one of the BOS Server restart times
int BOZO_GetRestartTime(IN struct rx_connection *z_conn,
    IN long type,
    OUT struct bozo_netKTime *restartTime)
- Description
- The BOS Server maintains two different restart times, for itself and all server processes it manages, as described in Section 2.4. Given which one of the two types of restart time is desired, this routine fetches the information from the BOS Server. The type argument is used to specify the exact restart time to fetch. If type is set to one (1), then the general restart time for all agents on the machine is fetched. If type is set to two (2), then the new-binary restart time is returned. A value other than these two for the type parameter results in a return value of BZDOM.
- Error Codes
- BZDOM An illegal value was passed in via the type parameter.
Section 3.6.2.6: BOZO_SetRestartTime - Set one of the BOS Server restart times
int BOZO_SetRestartTime(IN struct rx_connection *z_conn,
    IN long type,
    IN struct bozo_netKTime *restartTime)
- Description
- This function is the inverse of the BOZO_GetRestartTime() interface routine described in Section 3.6.2.5 above. Given the type of restart time and its new value, this routine will set the desired restart time at the BOS Server receiving this call. The values for the type parameter are identical to those used by BOZO_GetRestartTime(), namely one (1) for the general restart time and two (2) for the new-binary restart time.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to set its restart times.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZDOM An illegal value was passed in via the type parameter.
Section 3.6.2.7: BOZO_GetDates - Get the modification times for versions of a server binary file
int BOZO_GetDates(IN struct rx_connection *z_conn,
    IN char *path,
    OUT long *newtime,
    OUT long *baktime,
    OUT long *oldtime)
- Description
- Given a fully-qualified pathname identifying the particular server binary to examine in the path argument, this interface routine returns the modification time of that file, along with the modification times for the intermediate (.BAK) and old (.OLD) versions. The above-mentioned times are deposited into the newtime, baktime and oldtime arguments. Any one or all of the reported times may be set to zero, indicating that the associated file does not exist.
- Error Codes
- None.
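- On the server side, the behavior described above amounts to three stat() calls, with a zero reported for any file that is missing. The following is a minimal POSIX sketch of that logic and is not the BOS Server's actual code; mtime_or_zero() and get_dates() are hypothetical names, and the 1024-byte path buffer is an arbitrary choice.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: modification time of path, or 0 if stat() fails
 * (for instance, because the file does not exist). */
static long mtime_or_zero(const char *path)
{
    struct stat st;

    return (stat(path, &st) == 0) ? (long)st.st_mtime : 0L;
}

/* Sketch of the lookup: report times for path, path.BAK and path.OLD. */
static void get_dates(const char *path, long *newtime, long *baktime, long *oldtime)
{
    char buf[1024];

    *newtime = mtime_or_zero(path);
    snprintf(buf, sizeof buf, "%s.BAK", path);
    *baktime = mtime_or_zero(buf);
    snprintf(buf, sizeof buf, "%s.OLD", path);
    *oldtime = mtime_or_zero(buf);
}
```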
Section 3.6.2.8: StartBOZO_GetLog - Pass the IN params when fetching a BOS Server log file
int StartBOZO_GetLog(IN struct rx_connection *z_conn, IN char *name)
- Description
- The BOZO GetLog() function defined in the BOS Server Rxgen interface file is used to acquire the contents of the given log file from the machine processing the call. It is defined to be a streamed function, namely one that can return an arbitrary amount of data. For full details on the definition and use of streamed functions, please refer to the Streamed Function Calls section in [4].
- This function is created by Rxgen in response to the BOZO GetLog() interface definition in the bosint.xg file. The StartBOZO GetLog() routine handles passing the IN parameters of the streamed call to the BOS Server. Specifically, the name parameter is used to convey the string name of the desired log file. For the purposes of opening the specified files at the machine being contacted, the current working directory for the BOS Server is considered to be /usr/afs/logs. If the caller is included in the locally-maintained UserList file, any pathname may be specified in the name parameter, and the contents of the given file will be fetched. All other callers must provide a string that does not include the slash character, as it might be used to construct an unauthorized request for a file outside the /usr/afs/logs directory.
- Error Codes
- RXGEN_CC_MARSHAL The transmission of the GetLog() IN parameters failed. This and all rxgen constant definitions are available from the rxgen_consts.h include file.
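- The access rule for the name parameter can be summarized in a few lines. The sketch below is an illustration of the rule as stated in the description, not the server's implementation: log_name_allowed() is a hypothetical helper, and the privileged flag stands in for the UserList lookup.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the caller may fetch the named file,
 * 0 otherwise. privileged is nonzero when the caller is on the UserList. */
static int log_name_allowed(const char *name, int privileged)
{
    if (privileged)
        return 1;                         /* any pathname is acceptable */
    return strchr(name, '/') == NULL;     /* others: no slash anywhere */
}
```

- Rejecting every slash, rather than only leading ones, also blocks relative requests such as ../etc/KeyFile that would otherwise escape the /usr/afs/logs directory.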
Section 3.6.2.9: EndBOZO_GetLog - Get the OUT params when fetching a BOS Server log file
int EndBOZO_GetLog(IN struct rx_connection *z_conn)
- Description
- This function is created by Rxgen in response to the BOZO GetLog() interface definition in the bosint.xg file. The EndBOZO GetLog() routine handles the recovery of the OUT parameters for this interface call (of which there are none). The utility of such functions is often the value they return. In this case, however, EndBOZO GetLog() always returns success. Thus, it is not even necessary to invoke this particular function, as it is basically a no-op.
- Error Codes
- Always returns successfully.
Section 3.6.2.10: BOZO_GetInstanceStrings - Get strings related to a given process instance
int BOZO_GetInstanceStrings(IN struct rx_connection *z_conn,
    IN char *instance,
    OUT char **errorName,
    OUT char **spare1,
    OUT char **spare2,
    OUT char **spare3)
- Description
- This interface function takes the string name of a BOS Server instance and returns a set of strings associated with it. At the current time, there is only one string of interest returned by this routine. Specifically, the errorName parameter is set to the error string associated with the bnode, if any. The other arguments, spare1 through spare3, are set to the null string. Note that memory is allocated for all of the OUT parameters, so the caller should be careful to free them once it is done.
- Error Codes
- BZNOENT The given process instance is not registered with the BOS Server.
Section 3.6.3: Starting, Stopping, and Restarting Processes
- The eight interface functions described in this section allow BOS Server clients to manipulate the execution of the process instances the BOS Server controls.
Section 3.6.3.1: BOZO_SetStatus - Set process instance status and goal
int BOZO_SetStatus(IN struct rx_connection *z_conn,
    IN char *instance,
    IN long status)
- Description
- This routine sets the actual status field, as well as the "file goal", of the given instance to the value supplied in the status parameter. Legal values for status are taken from the set described in Section 3.2.3, specifically BSTAT NORMAL and BSTAT SHUTDOWN. For more information about these constants (and about goals/file goals), please refer to Section 3.2.3.
- The given bnode's setstat() routine from its individual ops array will be called in the course of this execution via the BOP SETSTAT() macro.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to perform this operation. If successfully modified, the BOS Server bnode defining the given instance will be written out to the BosConfig file kept on the machine's local disk.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZNOENT The given instance name is not registered with the BOS Server.
Section 3.6.3.2: BOZO_SetTStatus - Temporarily set process instance status and goal
int BOZO_SetTStatus(IN struct rx_connection *z_conn,
    IN char *instance,
    IN long status)
- Description
- This interface routine is much like BOZO_SetStatus(), defined in Section 3.6.3.1 above, except that its effect is to set the instance status on a temporary basis. Specifically, the status field is set to the given status value, but the "file goal" field is not changed. Thus, the instance's stated goal has not changed, just its current status.
- The given bnode's setstat() routine from its individual ops array will be called in the course of this execution via the BOP SETSTAT() macro.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to perform this operation. If successfully modified, the BOS Server bnode defining the given instance will be written out to the BosConfig file kept on the machine's local disk.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZNOENT The given instance name is not registered with the BOS Server.
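- The difference between the two routines comes down to which fields are updated. The sketch below models this with a two-field structure mirroring the goal and fileGoal members of struct bnode. It is an illustration only: the numeric values given to BSTAT_SHUTDOWN and BSTAT_NORMAL are assumptions (see Section 3.2.3 for the real definitions), and set_status() and set_tstatus() are hypothetical stand-ins for the server-side handling.

```c
/* Assumed values for illustration; the real ones are in Section 3.2.3. */
#define BSTAT_SHUTDOWN 0
#define BSTAT_NORMAL   1

/* Toy model holding only the two fields of interest. */
struct toy_bnode {
    char goal;       /* short-term goal */
    char fileGoal;   /* goal recorded in the on-file description */
};

/* Models BOZO_SetStatus(): both the goal and the file goal change. */
static void set_status(struct toy_bnode *b, char status)
{
    b->goal = status;
    b->fileGoal = status;
}

/* Models BOZO_SetTStatus(): only the short-term goal changes. */
static void set_tstatus(struct toy_bnode *b, char status)
{
    b->goal = status;
}
```

- After set_tstatus(&b, BSTAT_SHUTDOWN) on a bnode whose two fields were BSTAT_NORMAL, the file goal still reads BSTAT_NORMAL, which is why such an instance is eligible to be started again by BOZO_StartupAll().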
Section 3.6.3.3: BOZO_StartupAll - Start all existing process instances
int BOZO_StartupAll(IN struct rx_connection *z_conn)
- Description
- This interface function examines all bnodes and attempts to restart all of those that have not been explicitly marked with the BSTAT_SHUTDOWN file goal. Specifically, BOP_SETSTAT() is invoked, causing the setstat() routine from each bnode's ops array to be called. The bnode's flags field is left with the BNODE_ERRORSTOP bit turned off after this call.
- The given bnode's setstat() routine from its individual ops array will be called in the course of this execution via the BOP SETSTAT() macro.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to start up bnode process instances.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
Section 3.6.3.4: BOZO_ShutdownAll - Shut down all process instances
int BOZO_ShutdownAll(IN struct rx_connection *z_conn)
- Description
- This interface function iterates through all bnodes and attempts to shut them all down. Specifically, the BOP_SETSTAT() macro is invoked, causing the setstat() routine from each bnode's ops array to be called, setting that bnode's goal field to BSTAT_SHUTDOWN.
- The given bnode's setstat() routine from its individual ops array will be called in the course of this execution via the BOP SETSTAT() macro.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to perform this operation.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
Section 3.6.3.5: BOZO_RestartAll - Shut down, then restart all process instances
int BOZO_RestartAll(IN struct rx_connection *z_conn)
- Description
- This interface function shuts down every BOS Server process instance, waits until the shutdown is complete (i.e., all instances are registered as being in state BSTAT_SHUTDOWN), and then starts them all up again. While all the processes known to the BOS Server are thus restarted, the invocation of the BOS Server itself does not share this fate. For simulation of a truly complete machine restart, as is necessary when a far-reaching change to a database file has been made, use the BOZO_ReBozo() interface routine defined in Section 3.6.3.6 below.
- The given bnode's getstat() and setstat() routines from its individual ops array will be called in the course of this execution via the BOP GETSTAT() and BOP SETSTAT() macros.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to restart bnode process instances.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
Section 3.6.3.6: BOZO_ReBozo - Shut down, then restart all process instances and the BOS Server itself
int BOZO_ReBozo(IN struct rx_connection *z_conn)
- Description
- This interface routine is identical to the BOZO RestartAll() call, defined in Section 3.6.3.5 above, except for the fact that the BOS Server itself is restarted in addition to all the known bnodes. All of the BOS Server's open file descriptors are closed, and the standard BOS Server binary image is started via execve().
- The given bnode's getstat() and setstat() routines from its individual ops array will be called in the course of this execution via the BOP GETSTAT() and BOP SETSTAT() macros.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to restart bnode process instances.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
Section 3.6.3.7: BOZO_Restart - Restart a given process instance
int BOZO_Restart(IN struct rx_connection *z_conn, IN char *instance)
- Description
- This interface function is used to shut down and then restart the process instance identified by the instance string passed as an argument.
- The given bnode's getstat() and setstat() routines from its individual ops array will be called in the course of this execution via the BOP GETSTAT() and BOP SETSTAT() macros.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to restart bnode process instances.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZNOENT The given instance name is not registered with the BOS Server.
Section 3.6.3.8: BOZO_WaitAll - Wait until all process instances have reached their goals
int BOZO_WaitAll(IN struct rx_connection *z_conn)
- Description
- This interface function is used to synchronize with the status of the bnodes managed by the BOS Server. Specifically, the BOZO WaitAll() call returns when each bnode's current status matches the value in its short-term goal field. For each bnode it manages, the BOS Server thread handling this call invokes the BOP GETSTAT() macro, waiting until the bnode's status and goals line up.
- Typically, the BOZO WaitAll() routine is used to allow a program to wait until all bnodes have terminated their execution (i.e., all goal fields have been set to BSTAT SHUTDOWN and all corresponding processes have been killed). Note, however, that this routine may also be used to wait until all bnodes start up. The true utility of this application of BOZO WaitAll() is more questionable, since it will return when all bnodes have simply commenced execution, which does not imply that they have completed their initialization phases and are thus rendering their normal services.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to wait on bnodes through this interface function.
- The given bnode's getstat() routine from its individual ops array will be called in the course of this execution via the BOP GETSTAT() macro.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
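- The waiting behavior described above can be pictured as a polling loop that compares each bnode's reported status with its goal. The sketch below is a simplified, single-threaded model and not the BOS Server's code: struct toy_bnode, the ticks_left counter (which simulates a process taking a few polls to reach its goal), and the max_passes bound are all assumptions introduced for illustration.

```c
/* Toy bnode: reaches its goal after ticks_left further status polls. */
struct toy_bnode {
    long status;
    long goal;
    int ticks_left;
};

/* Hypothetical stand-in for BOP_GETSTAT(): each poll counts down, and the
 * status flips to the goal when the counter reaches zero. */
static long toy_getstat(struct toy_bnode *b)
{
    if (b->ticks_left > 0 && --b->ticks_left == 0)
        b->status = b->goal;
    return b->status;
}

/* Poll until every bnode's status matches its goal. Returns the number of
 * passes needed, or -1 if max_passes was not enough. */
static int wait_all(struct toy_bnode *nodes, int n, int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++) {
        int pending = 0;

        for (int i = 0; i < n; i++)
            if (toy_getstat(&nodes[i]) != nodes[i].goal)
                pending++;
        if (pending == 0)
            return pass;
        /* A real server thread would sleep or yield here before retrying. */
    }
    return -1;
}
```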
Section 3.6.4: Security Configuration
- This section describes the seven BOS Server interface functions that allow a properly-authorized person to examine and modify certain data relating to system security. Specifically, it allows for manipulation of the list of adminstratively 'privileged' individuals, the set of Kerberos keys used for file service, and whether authenticated connections should be required by the BOS Server and all other AFS server agents running on the machine.
Section 3.6.4.1: BOZO_AddSUser - Add a user to the UserList
int BOZO_AddSUser(IN struct rx_connection *z_conn, IN char *name);
- Description
- This interface function is used to add the given user name to the UserList file of privileged BOS Server principals. Only individuals already appearing in the UserList are permitted to add new entries. If the given user name already appears in the file, the function fails. Otherwise, the file is opened in append mode and the name is written at the end with a trailing newline.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
EEXIST The individual specified by name is already on the UserList.
EIO The UserList file could not be opened or closed.
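- The duplicate check and append step described above might look like the following sketch. The function name and the path parameter are hypothetical, authorization checking is omitted, and treating a missing file as an empty list is an assumption of this sketch rather than documented behavior.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Append `name` to the user list stored at `path` (a hypothetical
 * parameter; the BOS Server uses its own fixed UserList location).
 * Returns 0 on success, EEXIST if the name is already listed, and
 * EIO if the file cannot be opened, written, or closed. */
int sketch_add_user(const char *path, const char *name)
{
    char line[256];
    FILE *fp = fopen(path, "r");

    if (fp != NULL) {            /* assumption: a missing file has no entries */
        while (fgets(line, sizeof line, fp) != NULL) {
            line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */
            if (strcmp(line, name) == 0) {
                fclose(fp);
                return EEXIST;
            }
        }
        fclose(fp);
    }

    fp = fopen(path, "a");       /* append mode, as in the description */
    if (fp == NULL)
        return EIO;
    if (fprintf(fp, "%s\n", name) < 0) {
        fclose(fp);
        return EIO;
    }
    if (fclose(fp) != 0)
        return EIO;
    return 0;
}
```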
Section 3.6.4.2: BOZO_DeleteSUser - Delete a user from the UserList
int BOZO_DeleteSUser(IN struct rx_connection *z_conn, IN char *name)
- Description
- This interface function is used to delete the given user name from the UserList file of privileged BOS Server principals. Only individuals already appearing in the UserList are permitted to delete existing entries. The file is opened in read mode, and a new file named UserList.NXX is created in the same directory and opened in write mode. The original UserList is scanned, with each entry copied to the new file if it doesn't match the given name. After the scan is done, all files are closed, and the UserList.NXX file is renamed to UserList, overwriting the original.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
-1 The UserList file could not be opened.
EIO The UserList.NXX file could not be opened, or an error occurred in the file close operations.
ENOENT The given name was not found in the original UserList file.
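- The copy-and-rename technique described above can be sketched as follows. The function name and path parameter are hypothetical, and authorization checking is omitted. One assumption is labeled in the code: when the name is not found, this sketch discards the temporary file instead of renaming it, which the text above does not specify.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Remove `name` from the list at `path` by copying every other entry to
 * "<path>.NXX" and renaming that file over the original.
 * Returns 0, -1 (original unreadable), EIO, or ENOENT. */
int sketch_delete_user(const char *path, const char *name)
{
    char tmp[512], line[256];
    int found = 0, close_in, close_out;
    FILE *in, *out;

    if (snprintf(tmp, sizeof tmp, "%s.NXX", path) >= (int)sizeof tmp)
        return EIO;
    in = fopen(path, "r");
    if (in == NULL)
        return -1;
    out = fopen(tmp, "w");
    if (out == NULL) {
        fclose(in);
        return EIO;
    }
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        if (strcmp(line, name) == 0)
            found = 1;                   /* skip the matching entry */
        else
            fprintf(out, "%s\n", line);
    }
    close_in = fclose(in);
    close_out = fclose(out);
    if (close_in != 0 || close_out != 0) {
        remove(tmp);
        return EIO;
    }
    if (!found) {
        remove(tmp);                     /* assumption: leave original untouched */
        return ENOENT;
    }
    if (rename(tmp, path) != 0)          /* replaces the original list */
        return EIO;
    return 0;
}
```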
Section 3.6.4.3: BOZO_ListSUsers - Get the name of the user in the given position in the UserList file
int BOZO_ListSUsers(IN struct rx_connection *z_conn,
IN long an,
OUT char **name)
- Description
- This interface function is used to request the name of the privileged user in the an'th slot in the BOS Server's UserList file. The string placed into the name parameter may be up to 256 characters long, including the trailing null.
- Error Codes
- The UserList file could not be opened, or an invalid value was specified for an.
Section 3.6.4.4: BOZO_ListKeys - List info about the key at a given index in the key file
int BOZO_ListKeys(IN struct rx_connection *z_conn,
IN long an,
OUT long *kvno,
OUT struct bozo_key *key,
OUT struct bozo_keyInfo *keyinfo)
- Description
- This interface function allows its callers to specify the index of the desired AFS encryption key, and to fetch information regarding that key. If the caller is properly authorized, the version number of the specified key is placed into the kvno parameter. Similarly, a description of the given key is placed into the keyinfo parameter. When the BOS Server is running in noauth mode, the key itself will be copied into the key argument; otherwise, the key structure will be zeroed. The data placed into the keyinfo argument, declared as a struct bozo_keyInfo as defined in Section 3.3.3, is obtained as follows. The mod_sec field is taken from the value of st_mtime after stat()ing /usr/afs/etc/KeyFile, and the mod_usec field is zeroed. The keyCheckSum is computed by an Authentication Server routine, which calculates a 32-bit cryptographic checksum of the key by encrypting a block of zeros and then using the first 4 bytes as the checksum.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to obtain information regarding the list of AFS keys held by the given BOS Server.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
BZDOM An invalid index was found in the an parameter.
KABADKEY Defined in the exported kautils.h header file corresponding to the Authentication Server, this return value indicates a problem with generating the checksum field of the keyinfo parameter.
Section 3.6.4.5: BOZO_AddKey - Add a key to the key file
int BOZO_AddKey(IN struct rx_connection *z_conn,
IN long an,
IN struct bozo_key *key)
- Description
- This interface function allows a properly-authorized caller to set the value of key version number an to the given AFS key. If a slot is found in the key file /usr/afs/etc/KeyFile marked as key version number an, its value is overwritten with the key provided. If an entry for the desired key version number does not exist, the key file is grown, and the new entry filled with the specified information.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to add new entries into the list of AFS keys held by the BOS Server.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
AFSCONF_FULL The system key file already contains the maximum number of keys (AFSCONF_MAXKEYS, or 8). These two constant definitions are available from the cellconfig.h and keys.h AFS include files, respectively.
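- The overwrite-or-grow behavior described above can be modeled in memory as in the following sketch. The structures, the 8-byte key size, and the SKETCH_FULL return value are hypothetical stand-ins chosen for illustration; only the limit of 8 keys is taken from the text above.

```c
#include <string.h>

#define SKETCH_MAXKEYS   8     /* limit quoted in the text above */
#define SKETCH_KEY_BYTES 8     /* assumed key size for this sketch */
#define SKETCH_FULL      (-1)  /* hypothetical stand-in for AFSCONF_FULL */

struct sketch_key_entry {
    long kvno;
    unsigned char data[SKETCH_KEY_BYTES];
};

struct sketch_key_file {
    int nkeys;
    struct sketch_key_entry key[SKETCH_MAXKEYS];
};

/* Overwrite the slot holding `kvno` if one exists; otherwise grow the
 * table by one slot, failing when the table is already full. */
int sketch_add_key(struct sketch_key_file *kf, long kvno,
                   const unsigned char *data)
{
    int i;

    for (i = 0; i < kf->nkeys; i++) {
        if (kf->key[i].kvno == kvno)
            break;                       /* existing slot found */
    }
    if (i == kf->nkeys) {
        if (kf->nkeys >= SKETCH_MAXKEYS)
            return SKETCH_FULL;
        kf->nkeys++;                     /* grow the table */
    }
    kf->key[i].kvno = kvno;
    memcpy(kf->key[i].data, data, SKETCH_KEY_BYTES);
    return 0;
}
```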
Section 3.6.4.6: BOZO_DeleteKey - Delete the entry for an AFS key
int BOZO_DeleteKey(IN struct rx_connection *z_conn,
IN long an)
- Description
- This interface function allows a properly-authorized caller to delete key version number an from the key file, /usr/afs/etc/KeyFile. The existing keys are scanned, and if one with key version number an is found, it is removed. Any keys occurring after the deleted one are shifted to remove the file entry entirely.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to delete entries from the list of AFS keys held by the BOS Server.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
AFSCONF_NOTFOUND An entry for key version number an was not found. This constant definition is available from the cellconfig.h AFS include file.
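- The removal and shifting of later entries described above can be sketched as follows. The structure and the SKETCH_NOTFOUND value are hypothetical, and key material is omitted so that only the shifting logic is shown.

```c
#include <string.h>

#define SKETCH_MAXKEYS  8
#define SKETCH_NOTFOUND (-1)   /* hypothetical stand-in for AFSCONF_NOTFOUND */

struct sketch_key_list {
    int nkeys;
    long kvno[SKETCH_MAXKEYS]; /* key material omitted for brevity */
};

/* Remove the entry for `kvno`, sliding later entries down one slot. */
int sketch_delete_key(struct sketch_key_list *kl, long kvno)
{
    int i;

    for (i = 0; i < kl->nkeys; i++) {
        if (kl->kvno[i] == kvno) {
            /* Close the gap left by the deleted entry. */
            memmove(&kl->kvno[i], &kl->kvno[i + 1],
                    (size_t)(kl->nkeys - i - 1) * sizeof kl->kvno[0]);
            kl->nkeys--;
            return 0;
        }
    }
    return SKETCH_NOTFOUND;
}
```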
Section 3.6.4.7: BOZO_SetNoAuthFlag - Enable or disable requirement for authenticated calls
int BOZO_SetNoAuthFlag(IN struct rx_connection *z_conn,
IN long flag)
- Description
- This interface routine controls the level of authentication imposed on the BOS Server and all other AFS server agents on the machine by manipulating the NoAuth file in the /usr/afs/local directory on the server. If the flag parameter is set to zero (0), the NoAuth file will be removed, instructing the BOS Server and AFS agents to authenticate the RPCs they receive. Otherwise, the file is created as an indication to honor all RPC calls to the BOS Server and AFS agents, regardless of the credentials carried by callers.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
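- The use of a marker file's existence as a flag, as described above, can be sketched as follows. The function names and the path parameter are hypothetical, and treating removal of an already-absent file as success is an assumption of this sketch.

```c
#include <errno.h>
#include <stdio.h>

/* flag == 0: remove the marker file, so authentication is required.
 * flag != 0: create the marker file, so all calls are honored.
 * Returns 0 on success, -1 on failure. */
int sketch_set_noauth(const char *marker_path, long flag)
{
    FILE *fp;

    if (flag == 0) {
        /* assumption: an already-missing marker counts as success */
        if (remove(marker_path) != 0 && errno != ENOENT)
            return -1;
        return 0;
    }
    fp = fopen(marker_path, "w");        /* contents are irrelevant */
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}

/* A server agent would consult the marker like this before checking
 * a caller's credentials. Returns 1 if the marker exists. */
int sketch_noauth_enabled(const char *marker_path)
{
    FILE *fp = fopen(marker_path, "r");

    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}
```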
Section 3.6.5: Cell Configuration
- The five interface functions covered in this section all have to do with manipulating the configuration information of the machine on which the BOS Server runs. In particular, one may get and set the cell name for that server machine, enumerate the list of server machines running database servers for the cell, and add and delete machines from this list.
Section 3.6.5.1: BOZO_GetCellName - Get the name of the cell to which the BOS Server belongs
int BOZO_GetCellName(IN struct rx_connection *z_conn, OUT char **name)
- Description
- This interface routine returns the name of the cell to which the given BOS Server belongs. The BOS Server consults a file on its local disk, /usr/afs/etc/ThisCell, to obtain this information. If this file does not exist, then the BOS Server will return a null string.
- Error Codes
- AFSCONF_UNKNOWN The BOS Server could not access the cell name file. This constant definition is available from the cellconfig.h AFS include file.
Section 3.6.5.2: BOZO_SetCellName - Set the name of the cell to which the BOS Server belongs
int BOZO_SetCellName(IN struct rx_connection *z_conn, IN char *name)
- Description
- This interface function allows the caller to set the name of the cell to which the given BOS Server belongs. The BOS Server writes this information to a file on its local disk, /usr/afs/etc/ThisCell. The current contents of this file are first obtained, along with other information about the current cell. If this operation fails, then BOZO_SetCellName() also fails. The string name provided as an argument is then stored in ThisCell.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to set the name of the cell to which the machine executing the given BOS Server belongs.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
AFSCONF_NOTFOUND Information about the current cell could not be obtained. This constant definition, along with AFSCONF_FAILURE appearing below, is available from the cellconfig.h AFS include file.
AFSCONF_FAILURE New cell name could not be written to file.
Section 3.6.5.3: BOZO_GetCellHost - Get the name of a database host given its index
int BOZO_GetCellHost(IN struct rx_connection *z_conn,
IN long awhich,
OUT char **name)
- Description
- This interface routine allows the caller to get the name of the host appearing in position awhich in the list of hosts acting as database servers for the BOS Server's cell. The first valid position in the list is index zero. The host's name is deposited in the character buffer pointed to by name. If the value of the index provided in awhich is out of range, the function fails and a null string is placed in name.
- Error Codes
- BZDOM The host index in awhich is out of range.
AFSCONF_NOTFOUND Information about the current cell could not be obtained. This constant definition may be found in the cellconfig.h AFS include file.
Section 3.6.5.4: BOZO_AddCellHost - Add an entry to the list of database server hosts
int BOZO_AddCellHost(IN struct rx_connection *z_conn, IN char *name)
- Description
- This interface function allows properly-authorized callers to add a name to the list of hosts running AFS database server processes for the BOS Server's home cell. If the given name does not already appear in the database server list, a new entry will be created. Regardless, the mapping from the given name to its IP address will be recomputed, and the cell database file, /usr/afs/etc/CellServDB, will be updated.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to add an entry to the list of host names providing database services for the BOS Server's home cell.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
AFSCONF_NOTFOUND Information about the current cell could not be obtained. This constant definition may be found in the cellconfig.h AFS include file.
Section 3.6.5.5: BOZO_DeleteCellHost - Delete an entry from the list of database server hosts
int BOZO_DeleteCellHost(IN struct rx_connection *z_conn, IN char *name)
- Description
- This interface routine allows properly-authorized callers to remove a given name from the list of hosts running AFS database server processes for the BOS Server's home cell. If the given name does not appear in the database server list, this function will fail. Otherwise, the matching entry will be removed, and the cell database file, /usr/afs/etc/CellServDB, will be updated.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to delete an entry from the list of host names providing database services for the BOS Server's home cell.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
AFSCONF_NOTFOUND Information about the current cell could not be obtained. This constant definition may be found in the cellconfig.h AFS include file.
Section 3.6.6: Installing/Uninstalling Server Binaries
- There are four BOS Server interface routines that allow administrators to install new server binaries and to roll back to older, perhaps more reliable, executables. They also allow for stored images of the old binaries (as well as core files) to be 'pruned', or selectively deleted.
Section 3.6.6.1: StartBOZO_Install - Pass the IN params when installing a server binary
int StartBOZO_Install(IN struct rx_connection *z_conn,
IN char *path,
IN long size,
IN long flags,
IN long date)
- Description
- The BOZO_Install() function defined in the BOS Server Rxgen interface file is used to deliver the executable image of an AFS server process to the given server machine and then install it in the appropriate directory there. It is defined to be a streamed function, namely one that can deliver an arbitrary amount of data. For full details on the definition and use of streamed functions, please refer to the Streamed Function Calls section in [4].
- This function is created by Rxgen in response to the BOZO_Install() interface definition in the bosint.xg file. The StartBOZO_Install() routine handles passing the IN parameters of the streamed call to the BOS Server. Specifically, the path argument specifies the name of the server binary to be installed (including the full pathname prefix, if necessary). Also, the length of the binary is communicated via the size argument, and the modification time the caller wants the given file to carry is placed in date. The flags argument is currently ignored by the BOS Server.
- After the above parameters are delivered with StartBOZO_Install(), the BOS Server creates a file with the name given in the path parameter followed by a .NEW postfix. The size bytes comprising the text of the executable in question are then read over the RPC channel and stuffed into this new file. When the transfer is complete, the file is closed. The existing versions of the server binary are then 'demoted'. The *.BAK version (if it exists) is renamed to *.OLD, overwriting any existing *.OLD version, but only if an *.OLD version does not exist or if the *.BAK file is at least seven days old. The main binary is then renamed to *.BAK. Finally, the *.NEW file is renamed to be the new standard binary image to run, and its modification time is set to date.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to install server software onto the machine on which the BOS Server runs.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
100 An error was encountered when writing the binary image to the local disk file. The truncated file was closed and deleted on the BOS Server.
101 More than size bytes were delivered to the BOS Server in the RPC transfer of the executable image.
102 Fewer than size bytes were delivered to the BOS Server in the RPC transfer of the executable image.
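- The demotion sequence described above can be sketched as follows. The function names and the fixed-size path buffers are hypothetical, the RPC transfer into the *.NEW file is assumed to have completed already, and the final step of setting the modification time to date is left out for brevity.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define SKETCH_WEEK_SECS (7L * 24 * 60 * 60)

/* Decide whether *.BAK should be renamed to *.OLD: yes when no *.OLD
 * exists, or when the *.BAK file is at least seven days old. */
int sketch_should_demote_bak(int old_exists, time_t bak_mtime, time_t now)
{
    if (!old_exists)
        return 1;
    return difftime(now, bak_mtime) >= (double)SKETCH_WEEK_SECS;
}

/* Rotate "<path>.NEW" into place as the running binary image.
 * Returns the result of the final rename (0 on success). */
int sketch_rotate_binary(const char *path, time_t now)
{
    char newp[512], bakp[512], oldp[512];
    struct stat bak_st, old_st;

    snprintf(newp, sizeof newp, "%s.NEW", path);
    snprintf(bakp, sizeof bakp, "%s.BAK", path);
    snprintf(oldp, sizeof oldp, "%s.OLD", path);

    if (stat(bakp, &bak_st) == 0 &&
        sketch_should_demote_bak(stat(oldp, &old_st) == 0,
                                 bak_st.st_mtime, now))
        rename(bakp, oldp);      /* *.BAK becomes *.OLD */
    rename(path, bakp);          /* current binary becomes *.BAK, if present */
    return rename(newp, path);   /* *.NEW becomes the current binary */
}
```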
Section 3.6.6.2: EndBOZO_Install - Get the OUT params when installing a server binary
int EndBOZO_Install(IN struct rx_connection *z_conn)
- Description
- This function is created by Rxgen in response to the BOZO_Install() interface definition in the bosint.xg file. The EndBOZO_Install() routine handles the recovery of the OUT parameters for this interface call, of which there are none. The utility of such functions is often the value they return. In this case, however, EndBOZO_Install() always returns success. Thus, it is not even necessary to invoke this particular function, as it is basically a no-op.
- Error Codes
- Always returns successfully.
Section 3.6.6.3: BOZO_UnInstall - Roll back from a server binary installation
int BOZO_UnInstall(IN struct rx_connection *z_conn, IN char *path)
- Description
- This interface function allows a properly-authorized caller to "roll back" from the installation of a server binary. If the *.BAK version of the server named path exists, it will be renamed to be the main executable file. In this case, the *.OLD version, if it exists, will be renamed to *.BAK. If a *.BAK version of the binary in question is not found, the *.OLD version is renamed as the new standard binary file. If neither a *.BAK nor a *.OLD version of the executable can be found, the function fails, returning the low-level unix error generated at the server.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to roll back server software on the machine on which the BOS Server runs.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
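- The rollback order described above can be sketched with plain rename() calls. The function name and path buffers are hypothetical, authorization is omitted, and this sketch falls back to the *.OLD file on any failure to rename *.BAK, not only when *.BAK is missing.

```c
#include <errno.h>
#include <stdio.h>

/* Roll back the binary at `path`: prefer "<path>.BAK", promoting
 * "<path>.OLD" to .BAK afterwards; otherwise fall back to "<path>.OLD".
 * Returns 0 on success, or the errno value from the failing rename. */
int sketch_uninstall(const char *path)
{
    char bakp[512], oldp[512];

    snprintf(bakp, sizeof bakp, "%s.BAK", path);
    snprintf(oldp, sizeof oldp, "%s.OLD", path);

    if (rename(bakp, path) == 0) {
        rename(oldp, bakp);      /* promote *.OLD if present; failure ignored */
        return 0;
    }
    if (rename(oldp, path) == 0) /* no usable *.BAK: use *.OLD directly */
        return 0;
    return errno;                /* neither saved version could be restored */
}
```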
Section 3.6.6.4: BOZO_Prune - Throw away old versions of server binaries and core files
int BOZO_Prune(IN struct rx_connection *z_conn, IN long flags)
- Description
- This interface routine allows a properly-authorized caller to prune the saved versions of server binaries resident on the machine on which the BOS Server runs. The /usr/afs/bin directory on the server machine is scanned in directory order. If the BOZO_PRUNEOLD bit is set in the flags argument, every file with the *.OLD extension is deleted. If the BOZO_PRUNEBAK bit is set in the flags argument, every file with the *.BAK extension is deleted. Next, the /usr/afs/logs directory is scanned in directory order. If the BOZO_PRUNECORE bit is set in the flags argument, every file with a name beginning with the prefix core is deleted.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to prune server software binary versions and core files on the machine on which the BOS Server runs.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
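- The per-file decision made during the two directory scans described above can be sketched as follows. The function names are hypothetical, and the numeric values given to the flag bits are assumptions made for this sketch; consult the interface definition for the actual constants.

```c
#include <string.h>

/* Flag bits; the numeric values are assumed for this sketch. */
#define SKETCH_PRUNEOLD  1
#define SKETCH_PRUNEBAK  2
#define SKETCH_PRUNECORE 4

/* Return 1 if `name` ends with `suffix`. */
static int sketch_has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);

    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Should a file found in the binary directory be deleted? */
int sketch_prune_bin_entry(const char *name, long flags)
{
    if ((flags & SKETCH_PRUNEOLD) && sketch_has_suffix(name, ".OLD"))
        return 1;
    if ((flags & SKETCH_PRUNEBAK) && sketch_has_suffix(name, ".BAK"))
        return 1;
    return 0;
}

/* Should a file found in the log directory be deleted? */
int sketch_prune_log_entry(const char *name, long flags)
{
    return (flags & SKETCH_PRUNECORE) && strncmp(name, "core", 4) == 0;
}
```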
Section 3.6.7: Executing Commands at the Server
- There is a single interface function defined by the BOS Server that allows execution of arbitrary programs or scripts on any server machine on which a BOS Server process is active.
Section 3.6.7.1: BOZO_Exec - Execute a shell command at the server
int BOZO_Exec(IN struct rx_connection *z_conn, IN char *cmd)
- Description
- This interface routine allows a properly-authorized caller to execute any desired shell command on the server on which the given BOS Server runs. There is currently no provision made to pipe the output of the given command's execution back to the caller through the RPC channel.
- The BOS Server will only allow individuals listed in its locally-maintained UserList file to execute arbitrary shell commands on the server machine on which the BOS Server runs via this call.
- Error Codes
- BZACCESS The caller is not authorized to perform this operation.
Section 3.1: Introduction
- The AFS Cache Manager is a kernel-resident agent with the following duties and responsibilities:
- Users are to be given the illusion that files stored in the AFS distributed file system are in fact part of the local unix file system of their client machine. There are several areas in which this illusion is not fully realized:
- Semantics: Full unix semantics are not maintained by the set of agents implementing the AFS distributed file system. The largest deviation involves the time when changes made to a file are seen by others who also have the file open. In AFS, modifications made to a cached copy of a file are not necessarily reflected immediately to the central copy (the one hosted by File Server disk storage), and thus to other cache sites. Rather, the changes are only guaranteed to be visible to others who simultaneously have their own cached copies open when the modifying process executes a unix close() operation on the file.
- This differs from the semantics expected from the single-machine, local unix environment, where writes performed on one open file descriptor are immediately visible to all processes reading the file via their own file descriptors. Thus, instead of the standard "last writer wins" behavior, users see "last closer wins" behavior on their AFS files. Incidentally, other DFSs, such as NFS, do not implement full unix semantics in this case either.
- Partial failures: A panic experienced by a local, single-machine unix file system will, by definition, cause all local processes to terminate immediately. On the other hand, any hard or soft failure experienced by a File Server process or the machine upon which it is executing does not cause any of the Cache Managers interacting with it to crash. Rather, the Cache Managers will now have to reflect their failures in getting responses from the affected File Server back up to their callers. Network partitions also induce the same behavior. From the user's point of view, part of the file system tree has become inaccessible. In addition, certain system calls (e.g., open() and read()) may return unexpected failures to their users. Thus, certain coding practices that have become common amongst experienced (single-machine) unix programmers (e.g., not checking error codes from operations that "can't" fail) cause these programs to misbehave in the face of partial failures.
- To support this transparent access paradigm, the Cache Manager proceeds to:
- Intercept all standard unix operations directed towards AFS objects, mapping them to references aimed at the corresponding copies in the local cache.
- Keep a synchronized local cache of AFS files referenced by the client machine's users. If the chunks involved in an operation reading data from an object are either stale or do not exist in the local cache, then they must be fetched from the File Server(s) on which they reside. This may require a query to the volume location service in order to locate the place(s) of residence. Authentication challenges from File Servers needing to verify the caller's identity are handled by the Cache Manager, and the chunk is then incorporated into the cache.
- Upon receipt of a unix close, all dirty chunks belonging to the object will be flushed back to the appropriate File Server.
- Callback deliveries and withdrawals from File Servers must be processed, keeping the local cache in close synchrony with the state of affairs at the central store.
- Interfaces are also provided for those principals who wish to perform AFS-specific operations, such as Access Control List (ACL) manipulations or changes to the Cache Manager's configuration.
- This chapter takes a tour of the Cache Manager's architecture, and examines how it supports these roles and responsibilities. First, the set of AFS agents with which it must interact is discussed. Next, some of the Cache Manager's implementation and interface choices are examined. Finally, the server's ability to arbitrarily dispose of callback information without affecting the correctness of the cache consistency algorithm is explained.
Section 3.2: Constants
- The main AFS agent interacting with a Cache Manager is the File Server. The most common operation performed by the Cache Manager is to act as its users' agent in fetching and storing files to and from the centralized repositories. Related to this activity, a Cache Manager must be prepared to answer queries from a File Server concerning its health. It must also be able to accept callback revocation notices generated by File Servers. Since the Cache Manager not only engages in data transfer but must also determine where the data is located in the first place, it also directs inquiries to Volume Location Server agents. There must also be an interface allowing direct interactions with both common and administrative users. Certain AFS-specific operations must be made available to these parties. In addition, administrative users may desire to dynamically reconfigure the Cache Manager. For example, information about a newly-created cell may be added without restarting the client's machine.
Section 3.3: Structures
- The above roles and behaviors for the Cache Manager influenced the implementation choices and methods used to construct it, along with the desire to maximize portability. This section begins by showing how the VFS/vnode interface, pioneered and standardized by Sun Microsystems, provides not only the necessary fine-grain access to user file system operations, but also facilitates Cache Manager ports to new hardware and operating system platforms. Next, the use of unix system calls is examined. Finally, the threading structure employed is described.
Section 3.3.1: struct bozo netKTime
- As mentioned above, Sun Microsystems has introduced and propagated an important concept in the file system world, that of the Virtual File System (VFS) interface. This abstraction defines a core collection of file system functions which cover all operations required for users to manipulate their data. System calls are written in terms of these standardized routines. Also, the associated vnode concept generalizes the original unix inode idea and provides hooks for differing underlying environments. Thus, to port a system to a new hardware platform, the system programmers have only to construct implementations of this base array of functions consistent with the new underlying machine.
- The VFS abstraction also allows multiple file systems (e.g., vanilla unix, DOS, NFS, and AFS) to coexist on the same machine without interference. Thus, to make a machine AFS-capable, a system designer first extends the base vnode structure in well-defined ways in order to store AFS-specific operations with each file description. Then, the base function array is coded so that calls upon the proper AFS agents are made to accomplish each function's standard objectives. In effect, the Cache Manager consists of code that interprets the standard set of unix operations imported through this interface and executes the AFS protocols to carry them out.
Section 3.3.2: struct bozo key
- As mentioned above, many unix system calls are implemented in terms of the base function array of vnode-oriented operations. In addition, one existing system call has been modified and two new system calls have been added to perform AFS-specific operations apart from the Cache Manager's unix 'emulation' activities. The standard ioctl() system call has been augmented to handle AFS-related operations on objects accessed via open unix file descriptors. One of the brand-new system calls is pioctl(), which is much like ioctl() except it names targeted objects by pathname instead of file descriptor. Another is afs_call(), which is used to initialize the Cache Manager threads, as described in the section immediately following.
Section 3.3.3: struct bozo keyInfo
- In order to execute its many roles, the Cache Manager is organized as a multi-threaded entity. It is implemented with (potentially multiple instantiations of) the following three thread classes:
- CallBack Listener: This thread implements the Cache Manager callback RPC interface, as described in Section 6.5.
- Periodic Maintenance: Certain maintenance and checkup activities need to be performed at five set intervals. Currently, the frequency of each of these operations is hard-wired. It would be a simple matter, though, to make these times configurable by adding command-line parameters to the Cache Manager.
- Thirty seconds: Flush pending writes for NFS clients coming in through the NFS-AFS Translator facility.
- One minute: Make sure local cache usage is below the assigned quota, write out dirty buffers holding directory data, and keep flock()s alive.
- Three minutes: Check for the resuscitation of File Servers previously determined to be down, and check the cache of previously computed access information in light of any newly expired tickets.
- Ten minutes: Check health of all File Servers marked as active, and garbage-collect old RPC connections.
- One hour: Check the status of the root AFS volume as well as all cached information concerning read-only volumes.
- Background Operations: The Cache Manager is capable of prefetching file system objects, as well as carrying out delayed stores, occurring sometime after a close() operation. At least two threads are created at Cache Manager initialization time and held in reserve to carry out these objectives. This class of background threads implements the following three operations:
- Prefetch operation: Fetches particular file system object chunks in the expectation that they will soon be needed.
- Path-based prefetch operation: The prefetch daemon mentioned above operates on objects already at least partly resident in the local cache, referenced by their vnode. The path-based prefetch daemon performs the same actions, but on objects named solely by their unix pathname.
- Delayed store operation: Flush all modified chunks from a file system object to the appropriate File Server's disks.
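- The five hard-wired maintenance intervals listed above can be expressed as a small table, as in the sketch below. The table, the bitmask convention, and the function name are hypothetical; the sketch only shows how a once-per-second tick could determine which activities are due, and says nothing about how the Cache Manager actually schedules them.

```c
#include <stddef.h>

/* One periodic activity: how often it runs and what it does. */
struct sketch_task {
    long period_secs;
    const char *what;
};

/* The five intervals named in the text, shortest first. */
static const struct sketch_task sketch_tasks[] = {
    { 30,   "flush pending writes for NFS-AFS Translator clients" },
    { 60,   "check cache quota, write dirty directory buffers, keep flock()s alive" },
    { 180,  "probe File Servers marked down, recheck cached access information" },
    { 600,  "probe active File Servers, garbage-collect old RPC connections" },
    { 3600, "check root volume and cached read-only volume information" },
};

#define SKETCH_NTASKS (sizeof sketch_tasks / sizeof sketch_tasks[0])

/* Return a bitmask of the tasks due `elapsed_secs` seconds after start;
 * bit i corresponds to sketch_tasks[i]. */
unsigned sketch_tasks_due(long elapsed_secs)
{
    unsigned mask = 0;
    size_t i;

    if (elapsed_secs <= 0)
        return 0;
    for (i = 0; i < SKETCH_NTASKS; i++) {
        if (elapsed_secs % sketch_tasks[i].period_secs == 0)
            mask |= 1u << i;
    }
    return mask;
}
```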
Section 3.4: Error Codes
- The Cache Manager is free to throw away any or all of the callbacks it has received from the set of File Servers from which it has cached files. This housecleaning does not in any way compromise the correctness of the AFS cache consistency algorithm. The File Server RPC interface described in this paper provides a call to allow a Cache Manager to advise of such unilateral jettisoning. However, failure to use this routine still leaves the machine's cache consistent.
- Let us examine the case of a Cache Manager on machine C disposing of its callback on file X from File Server F. The next user access on file X on machine C will cause the Cache Manager to notice that it does not currently hold a callback on it (although the File Server will think it does). The Cache Manager on C therefore attempts to revalidate its entry, even though it is entirely possible that the file is still in sync with the central store. In response, the File Server will extend the existing callback information it has and deliver the new promise to the Cache Manager on C.
- Now consider the case where file X is modified by a party on a machine other than C before such an access occurs on C. Under these circumstances, the File Server will break its callback on file X before performing the central update. The Cache Manager on C will receive one of these "break callback" messages. Since it no longer has a callback on file X, the Cache Manager on C will cheerfully acknowledge the File Server's notification and move on to other matters.
- In either case, the callback information for both parties will eventually resynchronize. The only potential penalty is the extra inquiries made by the Cache Manager, which result in reduced performance rather than failure of operation.
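- The two scenarios above can be traced with a toy state model. Everything in this sketch is hypothetical: it tracks a single file with three integers and makes no attempt to represent the real RPC exchange. It only illustrates that discarding a callback costs an extra revalidation and never leaves the client believing in a promise the server has withdrawn.

```c
/* Hypothetical model of one file's callback state on both sides. */
struct sketch_cb_state {
    int client_has_callback;   /* the Cache Manager's own record */
    int server_has_callback;   /* the File Server's record for that client */
    int extra_rpcs;            /* revalidations caused by the disposal */
};

/* The Cache Manager silently drops its callback record. */
void sketch_client_discard(struct sketch_cb_state *s)
{
    s->client_has_callback = 0;
}

/* A local user touches the file on the client machine. */
void sketch_client_access(struct sketch_cb_state *s)
{
    if (!s->client_has_callback) {
        s->extra_rpcs++;               /* revalidate with the File Server */
        s->server_has_callback = 1;    /* server extends or issues its promise */
        s->client_has_callback = 1;
    }
}

/* Another machine stores the file: the server breaks callbacks first. */
void sketch_remote_store(struct sketch_cb_state *s)
{
    if (s->server_has_callback) {
        s->server_has_callback = 0;    /* break message sent and acknowledged */
        s->client_has_callback = 0;    /* harmless if already discarded */
    }
}
```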
Section 3.1: Introduction
- The rxkad security module is offered as one of the built-in Rx authentication models. It is based on the Kerberos system developed by MIT's Project Athena. Readers wishing detailed information regarding Kerberos design and implementation are directed to [2]. This chapter is devoted to defining how Kerberos authentication services are made available as Rx components, and assumes the reader has some familiarity with Kerberos. Included are descriptions of how client-side and server-side Rx security objects (struct rx_securityClass; see Section 5.3.1.1) implementing this protocol may be generated by an Rx application. Also, a description appears of the set of routines available in the associated struct rx_securityOps structures, as covered in Section 5.3.1.2. It is strongly recommended that the reader become familiar with this section on struct rx_securityOps before reading on.
Section 3.2: Constants
- An important set of definitions related to the rxkad security package is provided by the rxkad.h include file. Determined here are various values for ticket lifetimes, along with structures for encryption keys and Kerberos principals. Declarations for the two routines required to generate the different rxkad security objects also appear here. The two functions are named rxkad_NewServerSecurityObject() and rxkad_NewClientSecurityObject(). In addition, type field values, encryption levels, security index operations, and statistics structures may be found in this file.
Section 3.3: Structures
- To be usable as an Rx security module, the rxkad facility exports routines to create server-side and client-side security objects. The server authentication object is incorporated into the server code when calling rx_NewService(). The client authentication object is incorporated into the client code every time a connection is established via rx_NewConnection(). Also, in order to implement these security objects, the rxkad module must provide definitions for some subset of the generic security operations as defined in the appropriate struct rx_securityOps variable.
Section 3.3.1: Server-Side Mechanisms
Section 3.3.1.1: Security Operations
- The server side of the rxkad module fills in all but two of the possible routines associated with an Rx security object, as described in Section 5.3.1.2.
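static struct rx_securityOps rxkad_server_ops = {
    rxkad_Close,
    rxkad_NewConnection,
    rxkad_PreparePacket,
    0,                          /* op_SendPacket: no work needed per transmission */
    rxkad_CheckAuthentication,
    rxkad_CreateChallenge,
    rxkad_GetChallenge,
    0,                          /* op_GetResponse: no work needed before a response */
    rxkad_CheckResponse,
    rxkad_CheckPacket,
    rxkad_DestroyConnection,
    rxkad_GetStats,
};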
- The rxkad service does not need to take any special action each time a packet belonging to a call in an rxkad Rx connection is physically transmitted. Thus, a routine is not supplied for the op SendPacket() function slot. Similarly, no preparatory work needs to be done prior to the reception of a response packet from a security challenge, so the op GetResponse() function slot is also empty.
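The empty slots described above can be pictured as null function pointers that a caller tests before dispatching. The following is a minimal sketch under that assumption; the struct demo_securityOps type, its two slots, and the demo_call_slot() helper are hypothetical stand-ins, not the actual Rx definitions.

```c
#include <stddef.h>

/* Hypothetical, cut-down operations table. The real struct rx_securityOps
 * has more slots and different signatures. */
struct demo_securityOps {
    int (*op_PreparePacket)(int packet_id);
    int (*op_SendPacket)(int packet_id);    /* left empty, as rxkad does */
};

static int demo_prepare_calls = 0;

/* Stand-in for a per-packet preparation routine: it only counts calls. */
static int demo_PreparePacket(int packet_id)
{
    (void)packet_id;
    demo_prepare_calls++;
    return 0;
}

/* A zero entry means "no work is needed for this event". */
static struct demo_securityOps demo_ops = {
    demo_PreparePacket,
    0,
};

/* Invoke a slot only if the module supplied a routine for it.
 * Returns 1 if a routine ran, 0 if the slot was empty. */
static int demo_call_slot(int (*slot)(int), int packet_id)
{
    if (slot == NULL)
        return 0;
    (void)slot(packet_id);
    return 1;
}
```

With this arrangement a security module opts out of an event simply by leaving the slot zero, and the caller needs no per-module knowledge.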
Section 3.3.1.2: Security Object
- The exported routine used to generate an rxkad-specific server-side security class object is named rxkad NewServerSecurityObject(). It is declared with four parameters, as follows:
struct rx_securityClass *
rxkad_NewServerSecurityObject(a_level, a_getKeyRockP, a_getKeyP, a_userOKP)
rxkad_level a_level;
char *a_getKeyRockP;
int (*a_getKeyP)();
int (*a_userOKP)();
- The first argument specifies the desired level of encryption, and may take on the following values (as defined in rxkad.h):
- rxkad clear: Specifies that packets are to be sent entirely in the clear, without any encryption whatsoever.
- rxkad auth: Specifies that packet sequence numbers are to be encrypted.
- rxkad crypt: Specifies that the entire data packet is to be encrypted.
- The second and third parameters represent, respectively, a pointer to a private data area, sometimes called a "rock", and a procedure reference that is called with the key version number accompanying the Kerberos ticket and returns a pointer to the server's decryption key. The fourth argument, if not null, is a pointer to a function that will be called for every new connection with the client's name, instance, and cell. This routine should return zero if the user is not acceptable to the server.
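To make the roles of the rock, the key-lookup procedure, and the user-acceptance procedure more concrete, here is a small sketch. Every name in it (struct demo_key, struct demo_keyTable, demo_getKey(), demo_userOK()) is hypothetical, and the callback argument lists are assumptions: the declaration above only shows untyped function pointers. The sketch delivers the key through an output parameter, which is one plausible reading of "returns a pointer to the server's decryption key".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an 8-byte encryption key. */
struct demo_key {
    char data[8];
};

/* Hypothetical "rock": a small table of keys indexed by version number. */
struct demo_keyTable {
    int nKeys;
    int kvno[4];
    struct demo_key key[4];
};

/* Assumed callback shape: find the key for a_kvno in the rock and copy it
 * to *a_keyP. Returns 0 on success, nonzero if no such key exists. */
static int demo_getKey(char *a_rockP, int a_kvno, struct demo_key *a_keyP)
{
    struct demo_keyTable *tableP = (struct demo_keyTable *)a_rockP;
    int i;

    for (i = 0; i < tableP->nKeys; i++) {
        if (tableP->kvno[i] == a_kvno) {
            memcpy(a_keyP, &tableP->key[i], sizeof(*a_keyP));
            return 0;
        }
    }
    return -1;
}

/* Assumed callback shape: return zero if the user is NOT acceptable.
 * The accepted name and cell are made-up example values. */
static int demo_userOK(const char *a_name, const char *a_inst,
                       const char *a_cell)
{
    (void)a_inst;
    return strcmp(a_cell, "example.org") == 0 &&
           strcmp(a_name, "admin") == 0;
}
```

In a real server these two routines, together with a pointer to the key table cast to char *, would be the second through fourth arguments to the object-creation call.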
Section 3.3.2: Client-Side Mechanisms
Section 3.3.2.1: Security Operations
- The client side of the rxkad module fills in relatively few of the routines associated with an Rx security object, as demonstrated below. The general Rx security object, of which this is an instance, is described in detail in Section 5.3.1.2.
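static struct rx_securityOps rxkad_client_ops = {
    rxkad_Close,
    rxkad_NewConnection,
    rxkad_PreparePacket,
    0,                          /* send packet: no routine supplied */
    0,                          /* check authentication: filled in on the server side only */
    0,                          /* create challenge: filled in on the server side only */
    0,                          /* get challenge: filled in on the server side only */
    rxkad_GetResponse,
    0,                          /* check response: filled in on the server side only */
    rxkad_CheckPacket,
    rxkad_DestroyConnection,
    rxkad_GetStats,
    0,                          /* remaining slots are left empty */
    0,
    0,
};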
- As expected, routines are defined for use when someone destroys a security object (rxkad Close()) and when an Rx connection using the rxkad model creates a new connection (rxkad NewConnection()) or deletes an existing one (rxkad DestroyConnection()). Security-specific operations must also be performed on behalf of rxkad when packets are created (rxkad PreparePacket()) and received (rxkad CheckPacket()). Finally, the client side of an rxkad security object must also be capable of constructing responses to security challenges from the server (rxkad GetResponse()) and be willing to reveal statistics on its own operation (rxkad GetStats()).
Section 3.3.2.2: Security Object
- The exported routine used to generate an rxkad-specific client-side security class object is named rxkad NewClientSecurityObject(). It is declared with five parameters, specified below:
struct rx_securityClass * rxkad_NewClientSecurityObject(
a_level,
a_sessionKeyP,
a_kvno,
a_ticketLen,
a_ticketP
)
rxkad_level a_level;
struct ktc_encryptionKey *a_sessionKeyP;
long a_kvno;
int a_ticketLen;
char *a_ticketP;
- The first parameter, a level, specifies the level of encryption desired for this security object, with legal choices being identical to those defined for the server-side security object described in Section 3.3.1.2. The second parameter, a sessionKeyP, provides the session key to use. The ktc encryptionKey structure is defined in the rxkad.h include file, and consists of an array of 8 characters. The third parameter, a kvno, provides the key version number associated with a sessionKeyP. The fourth argument, a ticketLen, communicates the length in bytes of the data stored in the fifth parameter, a ticketP, which points to the Kerberos ticket to use for the principal for which the security object will operate.
Section 3.1: Introduction
- This chapter documents the API for the Volume Location Server facility, as defined by the vldbint.xg Rxgen interface file and the vldbint.h include file. Descriptions of all the constants, structures, macros, and interface functions available to the application programmer appear here.
- It is expected that Volume Location Server client programs run in user space, as does the associated vos volume utility. However, the kernel-resident Cache Manager agent also needs to call a subset of the Volume Location Server's RPC interface routines. Thus, a second Volume Location Server interface is available, built exclusively to satisfy the Cache Manager's limited needs. This subset interface is defined by the afsvlint.xg Rxgen interface file, and is examined in the final section of this chapter.
Section 3.2: Constants
- This section covers the basic constant definitions of interest to the Volume Location Server application programmer. These definitions appear in the vldbint.h file, automatically generated from the vldbint.xg Rxgen interface file, and in vlserver.h.
- Each subsection is devoted to describing the constants falling into the following categories:
- Configuration and boundary quantities
- Update entry bits
- List-by-attribute bits
- Volume type indices
- States for struct vlentry
- States for struct vldbentry
- ReleaseType argument values
- Miscellaneous items
Section 3.2.1: Configuration and Boundary Quantities
- These constants define some basic system values, including configuration information.
- MAXNAMELEN (65): Maximum size of various character strings, including volume name fields in structures and host names.
- MAXNSERVERS (8): Maximum number of replication sites for a volume.
- MAXTYPES (3): Maximum number of volume types.
- VLDBVERSION (1): VLDB database version number.
- HASHSIZE (8,191): Size of internal Volume Location Server volume name and volume ID hash tables. This must always be a prime number.
- NULLO (0): Specifies a null pointer value.
- VLDBALLOCCOUNT (40): Value used when allocating memory internally for VLDB entry records.
- BADSERVERID (255): Illegal Volume Location Server host ID.
- MAXSERVERID (30): Maximum number of servers appearing in the VLDB.
- MAXSERVERFLAG (0x80): First unused flag value in such fields as serverFlags in struct vldbentry and RepsitesNewFlags in struct VldbUpdateEntry.
- MAXPARTITIONID (126): Maximum number of AFS disk partitions for any one server.
- MAXBUMPCOUNT (0x7fffffff): Maximum amount by which the current high-watermark value for a volume ID can be increased in one operation.
- MAXLOCKTIME (0x7fffffff): Maximum number of seconds that any VLDB entry can remain locked.
- SIZE (1,024): Maximum size of the name field within a struct.
Section 3.2.2: Update Entry Bits
- These constants define bit values for the Mask field in the struct VldbUpdateEntry. Specifically, setting these bits is equivalent to declaring that the corresponding field within an object of type struct VldbUpdateEntry has been set. For example, setting the VLUPDATE VOLUMENAME flag in Mask indicates that the name field contains a valid value.
- VLUPDATE VOLUMENAME (0x0001): If set, indicates that the name field is valid.
- VLUPDATE VOLUMETYPE (0x0002): If set, indicates that the volumeType field is valid.
- VLUPDATE FLAGS (0x0004): If set, indicates that the flags field is valid.
- VLUPDATE READONLYID (0x0008): If set, indicates that the ReadOnlyId field is valid.
- VLUPDATE BACKUPID (0x0010): If set, indicates that the BackupId field is valid.
- VLUPDATE REPSITES (0x0020): If set, indicates that the nModifiedRepsites field is valid.
- VLUPDATE CLONEID (0x0080): If set, indicates that the cloneId field is valid.
- VLUPDATE REPS DELETE (0x0100): Is the replica being deleted?
- VLUPDATE REPS ADD (0x0200): Is the replica being added?
- VLUPDATE REPS MODSERV (0x0400): Is the server part of the replica location correct?
- VLUPDATE REPS MODPART (0x0800): Is the partition part of the replica location correct?
- VLUPDATE REPS MODFLAG (0x1000): Various modification flag values.
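As an illustration of how the Mask bits pair up with fields, the sketch below fills in two fields of an update record and flags each one as valid. The struct demo_VldbUpdateEntry type is a cut-down, hypothetical version of struct VldbUpdateEntry carrying only the fields used here, and demo_prepare_update() is an invented helper.

```c
#include <string.h>

#define VLUPDATE_VOLUMENAME 0x0001
#define VLUPDATE_READONLYID 0x0008
#define MAXNAMELEN          65

/* Cut-down, hypothetical version of struct VldbUpdateEntry. */
struct demo_VldbUpdateEntry {
    unsigned long Mask;
    char name[MAXNAMELEN];
    unsigned long ReadOnlyId;
};

/* Set a new name and read-only ID, marking both fields as valid in Mask.
 * All other fields stay unflagged, so an update would leave them alone. */
static void demo_prepare_update(struct demo_VldbUpdateEntry *u,
                                const char *name, unsigned long roId)
{
    memset(u, 0, sizeof(*u));

    /* The record was zeroed above, so the name stays null-terminated. */
    strncpy(u->name, name, MAXNAMELEN - 1);
    u->Mask |= VLUPDATE_VOLUMENAME;

    u->ReadOnlyId = roId;
    u->Mask |= VLUPDATE_READONLYID;
}
```

A record prepared this way carries a Mask of 0x0009, the inclusive OR of the two bits that were set.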
Section 3.2.3: List-By-Attribute Bits
- These constants define bit values for the Mask field in the struct VldbListByAttributes, selecting which of its fields are to be used in a match. Specifically, setting one of these bits is equivalent to declaring that the corresponding field within an object of type struct VldbListByAttributes is set. For example, setting the VLLIST SERVER flag in Mask indicates that the server field contains a valid value.
- VLLIST SERVER (0x1): If set, indicates that the server field is valid.
- VLLIST PARTITION (0x2): If set, indicates that the partition field is valid.
- VLLIST VOLUMETYPE (0x4): If set, indicates that the volumetype field is valid.
- VLLIST VOLUMEID (0x8): If set, indicates that the volumeid field is valid.
- VLLIST FLAG (0x10): If set, indicates that the flag field is valid.
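The following sketch shows one way a Mask of this kind could drive a match: only the fields whose bits are set take part in the comparison. The struct demo_attrs type and demo_matches() function are hypothetical, and the all-selected-fields-must-agree rule is an assumption about the matching semantics rather than a description of the server's code.

```c
#define VLLIST_SERVER    0x1
#define VLLIST_PARTITION 0x2

/* Hypothetical, cut-down attribute record with two selectable fields. */
struct demo_attrs {
    unsigned long Mask;
    long server;
    long partition;
};

/* Returns 1 if the (server, partition) pair agrees with every field that
 * is selected in Mask; unselected fields are ignored. */
static int demo_matches(const struct demo_attrs *a, long server,
                        long partition)
{
    if ((a->Mask & VLLIST_SERVER) && a->server != server)
        return 0;
    if ((a->Mask & VLLIST_PARTITION) && a->partition != partition)
        return 0;
    return 1;
}
```

Note that an empty Mask selects nothing and therefore matches everything under this scheme.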
Section 3.2.4: Volume Type Indices
- These constants specify the order of entries in the volumeid array in an object of type struct vldbentry. They also identify the three different types of volumes in AFS.
- RWVOL (0): Read-write volume.
- ROVOL (1): Read-only volume.
- BACKVOL (2): Backup volume.
Section 3.2.5: States for struct vlentry
- The following constants appear in the flags field in objects of type struct vlentry. The first three values listed specify the state of the entry, while all the rest stamp the entry with the type of an ongoing volume operation, such as a move, clone, backup, deletion, and dump. These volume operations are the legal values to provide to the voloper parameter of the VL SetLock() interface routine.
- For convenience, the constant VLOP ALLOPERS is defined as the inclusive OR of the values from VLOP MOVE through VLOP DUMP listed below.
- VLFREE (0x1): Entry is in the free list.
- VLDELETED (0x2): Entry is soft-deleted.
- VLLOCKED (0x4): Advisory lock held on the entry.
- VLOP MOVE (0x10): The associated volume is being moved between servers.
- VLOP RELEASE (0x20): The associated volume is being cloned to its replication sites.
- VLOP BACKUP (0x40): A backup volume is being created for the associated volume.
- VLOP DELETE (0x80): The associated volume is being deleted.
- VLOP DUMP (0x100): A dump is being taken of the associated volume.
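The relationship between the state bits, the operation bits, and VLOP ALLOPERS can be sketched as follows. The constant values are those listed in this section; the demo_pending_operation() helper is hypothetical and simply masks out everything except the operation bits.

```c
#define VLFREE       0x1
#define VLDELETED    0x2
#define VLLOCKED     0x4
#define VLOP_MOVE    0x10
#define VLOP_RELEASE 0x20
#define VLOP_BACKUP  0x40
#define VLOP_DELETE  0x80
#define VLOP_DUMP    0x100

/* Inclusive OR of every volume operation bit, as described in the text. */
#define VLOP_ALLOPERS \
    (VLOP_MOVE | VLOP_RELEASE | VLOP_BACKUP | VLOP_DELETE | VLOP_DUMP)

/* Returns the operation bits stamped on an entry's flags word, or 0 if no
 * volume operation is recorded there. State bits are ignored. */
static long demo_pending_operation(long flags)
{
    return flags & VLOP_ALLOPERS;
}
```

Given the five values above, VLOP_ALLOPERS works out to 0x1f0, which leaves the three low-order state bits untouched.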
Section 3.2.6: States for struct vldbentry
- Of the following constants, the first three appear in the flags field within an object of type struct vldbentry, advising of the existence of the basic volume types for the given volume, and hence the validity of the entries in the volumeId array field. The rest of the values provided in this table appear in the serverFlags array field, and apply to the instances of the volume appearing in the various replication sites.
- This structure appears in numerous Volume Location Server interface calls, namely VL CreateEntry(), VL GetEntryByID(), VL GetEntryByName(), VL ReplaceEntry() and VL ListEntry().
- VLF RWEXISTS (0x1000): The read-write volume ID is valid.
- VLF ROEXISTS (0x2000): The read-only volume ID is valid.
- VLF BACKEXISTS (0x4000): The backup volume ID is valid.
- VLSF NEWREPSITE (0x01): Not used; originally intended to mark an entry as belonging to a partially-created volume instance.
- VLSF ROVOL (0x02): A read-only version of the volume appears at this server.
- VLSF RWVOL (0x04): A read-write version of the volume appears at this server.
- VLSF BACKVOL (0x08): A backup version of the volume appears at this server.
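Because the VLF bits advise whether each slot of the volume ID array is meaningful, a reader of the entry would normally test the bit before trusting the slot. The sketch below shows that check. The struct demo_vldbentry type is a hypothetical, cut-down version of struct vldbentry, and demo_get_volume_id() is an invented helper; the constants are the ones given in Sections 3.2.4 and 3.2.6.

```c
#define RWVOL          0
#define ROVOL          1
#define BACKVOL        2
#define MAXTYPES       3
#define VLF_RWEXISTS   0x1000
#define VLF_ROEXISTS   0x2000
#define VLF_BACKEXISTS 0x4000

/* Hypothetical, cut-down version of struct vldbentry. */
struct demo_vldbentry {
    unsigned long volumeId[MAXTYPES];
    long flags;
};

/* Copies the ID for the requested volume type to *idP and returns 1, or
 * returns 0 when the type is out of range or the flags mark that slot as
 * not valid. */
static int demo_get_volume_id(const struct demo_vldbentry *e, int voltype,
                              unsigned long *idP)
{
    static const long existsBit[MAXTYPES] = {
        VLF_RWEXISTS, VLF_ROEXISTS, VLF_BACKEXISTS
    };

    if (voltype < 0 || voltype >= MAXTYPES)
        return 0;
    if (!(e->flags & existsBit[voltype]))
        return 0;
    *idP = e->volumeId[voltype];
    return 1;
}
```

The lookup table relies on RWVOL, ROVOL, and BACKVOL being the array indices 0, 1, and 2.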
Section 3.2.7: ReleaseType Argument Values
- The following values are used in the ReleaseType argument to various Volume Location Server interface routines, namely VL ReplaceEntry(), VL UpdateEntry() and VL ReleaseLock().
- LOCKREL TIMESTAMP (1): Is the LockTimestamp field valid?
- LOCKREL OPCODE (2): Are any of the bits valid in the flags field?
- LOCKREL AFSID (4): Is the LockAfsId field valid?
Section 3.2.8: Miscellaneous
- Miscellaneous values.
- VLREPSITE NEW (1): Has a replication site gotten a new release of a volume?
- A synonym for this constant is VLSF NEWREPSITE.
Section 3.3: Structures
- This section describes the major exported Volume Location Server data structures of interest to application programmers, along with the typedefs based upon those structures.
Section 3.3.1: struct vldbentry
- This structure represents an entry in the VLDB as made visible to Volume Location Server clients. It appears in numerous Volume Location Server interface calls, namely VL CreateEntry(), VL GetEntryByID(), VL GetEntryByName(), VL ReplaceEntry() and VL ListEntry().
Fields
- char name[] - The string name for the volume, with a maximum length of MAXNAMELEN (65) characters, including the trailing null.
- long volumeType - The volume type, one of RWVOL, ROVOL, or BACKVOL.
- long nServers - The number of servers that have an instance of this volume.
- long serverNumber[] - An array of indices into the table of servers, identifying the sites holding an instance of this volume. There are at most MAXNSERVERS (8) of these server sites allowed by the Volume Location Server.
- long serverPartition[] - An array of partition identifiers, corresponding directly to the serverNumber array, specifying the partition on which each of those volume instances is located. As with the serverNumber array, serverPartition has up to MAXNSERVERS (8) entries.
- long serverFlags[] - This array holds one flag value for each of the servers in the previous arrays. Again, there are MAXNSERVERS (8) slots in this array.
- u long volumeId[] - An array of volume IDs, one for each volume type. There are MAXTYPES slots in this array.
- long cloneId - This field is used during a cloning operation.
- long flags - Flags concerning the status of the fields within this structure; see Section 3.2.6 for the bit values that apply.
Section 3.3.2: struct vlentry
- This structure is used internally by the Volume Location Server to fully represent a VLDB entry. The client-visible struct vldbentry represents merely a subset of the information contained herein.
Fields
- u long volumeId[] - An array of volume IDs, one for each of the MAXTYPES of volume types.
- long flags - Flags concerning the status of the fields within this structure; see Section 3.2.5 for the bit values that apply.
- long LockAfsId - The individual who locked the entry. This feature has not yet been implemented.
- long LockTimestamp - Time stamp on the entry lock.
- long cloneId - This field is used during a cloning operation.
- long AssociatedChain - Pointer to the linked list of associated VLDB entries.
- long nextIdHash[] - Array of MAXTYPES next pointers for the ID hash table pointer, one for each related volume ID.
- long nextNameHash - Next pointer for the volume name hash table.
- long spares1[] - Two longword spare fields.
- char name[] - The volume's string name, with a maximum of MAXNAMELEN (65) characters, including the trailing null.
- u char volumeType - The volume's type, one of RWVOL, ROVOL, or BACKVOL.
- u char serverNumber[] - An array of indices into the table of servers, identifying the sites holding an instance of this volume. There are at most MAXNSERVERS (8) of these server sites allowed by the Volume Location Server.
- u char serverPartition[] - An array of partition identifiers, corresponding directly to the serverNumber array, specifying the partition on which each of those volume instances is located. As with the serverNumber array, serverPartition has up to MAXNSERVERS (8) entries.
- u char serverFlags[] - This array holds one flag value for each of the servers in the previous arrays. Again, there are MAXNSERVERS (8) slots in this array.
- u char RefCount - Only valid for read-write volumes, this field serves as a reference count, basically the number of dependent children volumes.
- char spares2[] - This field is used for 32-bit alignment.
Section 3.3.3: struct vital vlheader
- This structure defines the leading section of the VLDB header, of type struct vlheader. It contains frequently-used global variables and general statistics information.
Fields
- long vldbversion - The VLDB version number. This field must appear first in the structure.
- long headersize - The total number of bytes in the header.
- long freePtr - Pointer to the first free entry in the free list, if any.
- long eofPtr - Pointer to the first free byte in the header file.
- long allocs - The total number of calls to the internal AllocBlock() function directed at this file.
- long frees - The total number of calls to the internal FreeBlock() function directed at this file.
- long MaxVolumeId - The largest volume ID ever granted for this cell.
- long totalEntries[] - The total number of VLDB entries by volume type in the VLDB. This array has MAXTYPES slots, one for each volume type.
Section 3.3.4: struct vlheader
- This is the layout of the information stored in the VLDB header. Notice it includes an object of type struct vital vlheader described above (see Section 3.3.3) as the first field.
Fields
- struct vital vlheader vital header - Holds critical VLDB header information.
- u long IpMappedAddr[] - Keeps MAXSERVERID+1 mappings of IP addresses to relative ones.
- long VolnameHash[] - The volume name hash table, with HASHSIZE slots.
- long VolidHash[][] - The volume ID hash table. The first dimension in this array selects which of the MAXTYPES volume types is desired, and the second dimension actually implements the HASHSIZE hash table buckets for the given volume type.
Section 3.3.5: struct VldbUpdateEntry
- This structure is used as an argument to the VL UpdateEntry() routine (see Section 3.6.7). Please note that multiple fields can be updated at once by setting the appropriate Mask bits. The bit values for this purpose are defined in Section 3.2.2.
Fields
- u long Mask - Bit values determining which fields are to be affected by the update operation.
- char name[] - The volume name, up to MAXNAMELEN (65) characters including the trailing null.
- long volumeType - The volume type.
- long flags - This field is used in conjunction with Mask (in fact, one of the Mask bits determines if this field is valid) to choose the valid fields in this record.
- u long ReadOnlyId - The read-only ID.
- u long BackupId - The backup ID.
- long cloneId - The clone ID.
- long nModifiedRepsites - Number of replication sites whose entry is to be changed as below.
- u long RepsitesMask[] - Array of bit masks applying to the up to MAXNSERVERS (8) replication sites involved.
- long RepsitesTargetServer[] - Array of target servers for the operation, at most MAXNSERVERS (8) of them.
- long RepsitesTargetPart[] - Array of target server partitions for the operation, at most MAXNSERVERS (8) of them.
- long RepsitesNewServer[] - Array of new server sites, at most MAXNSERVERS (8) of them.
- long RepsitesNewPart[] - Array of new server partitions for the operation, at most MAXNSERVERS (8) of them.
- long RepsitesNewFlags[] - Flags applying to each of the new sites, at most MAXNSERVERS (8) of them.
Section 3.3.6: struct VldbListByAttributes
- This structure is used by the VL ListAttributes() routine (see Section 3.6.11).
Fields
- u long Mask - Bit mask used to select the following attribute fields on which to match.
- long server - The server address to match.
- long partition - The partition ID to match.
- long volumetype - The volume type to match.
- long volumeid - The volume ID to match.
- long flag - Flags concerning these values.
Section 3.3.7: struct single vldbentry
- This structure is used to construct the vldblist object (see Section 3.3.12), which basically generates a queueable (singly-linked) version of struct vldbentry.
Fields
- vldbentry VldbEntry - The VLDB entry to be queued.
- vldblist next vldb - The next pointer in the list.
Section 3.3.8: struct vldb list
- This structure defines the item returned in linked list form from the VL LinkedList() function (see Section 3.6.12). This same object is also returned in bulk form in calls to the VL ListAttributes() routine (see Section 3.6.11).
Fields
- vldblist node - The body of the first object in the linked list.
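The three list-related types fit together as a conventional singly-linked list: the list object holds the first node, and each node holds an entry plus a pointer to the next node. The sketch below mirrors that shape using demo_ prefixed names. The types are simplified stand-ins for the Rxgen-generated ones, and it is an assumption of this sketch that a null next pointer terminates the list.

```c
#include <stddef.h>

/* Simplified stand-in for struct vldbentry: only the name is kept. */
struct demo_vldbentry {
    char name[65];
};

/* Queueable entry, mirroring the vldblist / single vldbentry pairing. */
typedef struct demo_single_vldbentry *demo_vldblist;

struct demo_single_vldbentry {
    struct demo_vldbentry VldbEntry;    /* the VLDB entry being queued */
    demo_vldblist next_vldb;            /* next node, or NULL at the end */
};

/* List head, mirroring struct vldb list. */
struct demo_vldb_list {
    demo_vldblist node;
};

/* Walks the list from its first node and returns the number of entries. */
static int demo_count_entries(const struct demo_vldb_list *listP)
{
    int n = 0;
    demo_vldblist cur;

    for (cur = listP->node; cur != NULL; cur = cur->next_vldb)
        n++;
    return n;
}
```

A client receiving such a list would typically walk it in this fashion, copying or inspecting each VldbEntry in turn.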
Section 3.3.9: struct vldstats
- This structure defines fields to record statistics on opcode hit frequency. The MAX NUMBER OPCODES constant has been defined as the maximum number of opcodes supported by this structure, and is set to 30.
Fields
- unsigned long start time - Clock time when opcode statistics were last cleared.
- long requests[] - Number of requests received for each of the MAX NUMBER OPCODES opcode types.
- long aborts[] - Number of aborts experienced for each of the MAX NUMBER OPCODES opcode types.
- long reserved[] - These five longword fields are reserved for future use.
Section 3.3.10: bulk
typedef opaque bulk<DEFAULTBULK>;
- This typedef may be used to transfer an uninterpreted set of bytes across the Volume Location Server interface. It may carry up to DEFAULTBULK (10,000) bytes.
Fields
- bulk len - The number of bytes contained within the data pointed to by the next field.
- bulk val - A pointer to a sequence of bulk len bytes.
Section 3.3.11: bulkentries
typedef vldbentry bulkentries<>;
- This typedef is used to transfer an unbounded number of struct vldbentry objects. It appears in the parameter list for the VL ListAttributes() interface function.
Fields
- bulkentries len - The number of vldbentry structures contained within the data pointed to by the next field.
- bulkentries val - A pointer to a sequence of bulkentries len vldbentry structures.
Section 3.3.12: vldblist
typedef struct single_vldbentry *vldblist;
- This typedef defines a queueable struct vldbentry object, referenced by the single vldbentry typedef as well as struct vldb list.
Section 3.3.13: vlheader
typedef struct vlheader vlheader;
- This typedef provides a short name for objects of type struct vlheader (see Section 3.3.4).
Section 3.3.14: vlentry
typedef struct vlentry vlentry;
- This typedef provides a short name for objects of type struct vlentry (see Section 3.3.2).
Section 3.4: Error Codes
- This section covers the set of error codes exported by the Volume Location Server, displaying the printable phrases with which they are associated.
- VL IDEXIST (363520L): Volume Id entry exists in vl database.
- VL IO (363521L): I/O related error.
- VL NAMEEXIST (363522L): Volume name entry exists in vl database.
- VL CREATEFAIL (363523L): Internal creation failure.
- VL NOENT (363524L): No such entry.
- VL EMPTY (363525L): Vl database is empty.
- VL ENTDELETED (363526L): Entry is deleted (soft delete).
- VL BADNAME (363527L): Volume name is illegal.
- VL BADINDEX (363528L): Index is out of range.
- VL BADVOLTYPE (363529L): Bad volume type.
- VL BADSERVER (363530L): Illegal server number (out of range).
- VL BADPARTITION (363531L): Bad partition number.
- VL REPSFULL (363532L): Run out of space for Replication sites.
- VL NOREPSERVER (363533L): No such Replication server site exists.
- VL DUPREPSERVER (363534L): Replication site already exists.
- VL RWNOTFOUND (363535L): Parent R/W entry not found.
- VL BADREFCOUNT (363536L): Illegal Reference Count number.
- VL SIZEEXCEEDED (363537L): Vl size for attributes exceeded.
- VL BADENTRY (363538L): Bad incoming vl entry.
- VL BADVOLIDBUMP (363539L): Illegal max volid increment.
- VL IDALREADYHASHED (363540L): RO/BACK id already hashed.
- VL ENTRYLOCKED (363541L): Vl entry is already locked.
- VL BADVOLOPER (363542L): Bad volume operation code.
- VL BADRELLOCKTYPE (363543L): Bad release lock type.
- VL RERELEASE (363544L): Status report: last release was aborted.
- VL BADSERVERFLAG (363545L): Invalid replication site server flag.
- VL PERM (363546L): No permission access.
- VL NOMEM (363547L): malloc(realloc) failed to alloc enough memory.
Section 3.5: Macros
- The Volume Location Server defines a small number of macros, as described in this section. They are used to update the internal statistics variables and to compute offsets into character strings. All of these macros really refer to internal operations, and strictly speaking should not be exposed in this interface.
Section 3.5.1: COUNT REQ()
#define COUNT_REQ(op) \
    static int this_op = (op) - VL_LOWEST_OPCODE; \
    dynamic_statistics.requests[this_op]++
- Bump the appropriate entry in the variable maintaining opcode usage statistics for the Volume Location Server. Note that a static variable is set up to record this op, namely the index into the opcode monitoring array. This static variable is used by the related COUNT ABO() macro defined below.
Section 3.5.2: COUNT ABO
#define COUNT_ABO dynamic_statistics.aborts[this_op]++
- Bump the appropriate entry in the variable maintaining opcode abort statistics for the Volume Location Server. Note that this macro does not take any arguments. It expects to find a this op variable in its environment, and thus depends on its related macro, COUNT REQ(), to define that variable.
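The interplay between the two counting macros can be seen in a small handler sketch. The macro bodies follow the definitions given in this section; the value chosen for VL_LOWEST_OPCODE, the DEMO_OP_ opcode constants, the struct demo_stats type, and demo_delete_handler() are all assumptions made for illustration. Since the static variable requires a constant initializer in C, the argument passed to COUNT_REQ() must be a compile-time constant.

```c
#define VL_LOWEST_OPCODE   501  /* assumed value, for illustration only */
#define DEMO_OP_CREATE     501  /* hypothetical opcode constants */
#define DEMO_OP_DELETE     502
#define MAX_NUMBER_OPCODES 30

/* Hypothetical stand-in for the opcode statistics structure. */
struct demo_stats {
    long requests[MAX_NUMBER_OPCODES];
    long aborts[MAX_NUMBER_OPCODES];
};

static struct demo_stats dynamic_statistics;

#define COUNT_REQ(op) \
    static int this_op = (op) - VL_LOWEST_OPCODE; \
    dynamic_statistics.requests[this_op]++

#define COUNT_ABO dynamic_statistics.aborts[this_op]++

/* Hypothetical handler: always counts the request, and also counts an
 * abort when the operation fails. COUNT_REQ() comes first because it
 * declares the this_op variable that COUNT_ABO relies on. */
static int demo_delete_handler(int shouldFail)
{
    COUNT_REQ(DEMO_OP_DELETE);

    if (shouldFail) {
        COUNT_ABO;
        return -1;
    }
    return 0;
}
```

Each handler that uses the pair gets its own this_op, scoped to that function, so COUNT_ABO always charges the abort to the opcode of the enclosing routine.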
Section 3.5.3: DOFFSET()
#define DOFFSET(abase, astr, aitem) ((abase)+(((char *)(aitem)) -((char *)(astr))))
- Compute the byte offset of character object aitem within the enclosing object astr, also expressed as a character-based object, then offset the resulting address by abase. This macro is used to compute locations within the VLDB when actually writing out information.
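A short sketch may help show what the macro computes. Here abase is treated as the file position at which a record begins, so the result is the file position of one field inside that record. The struct demo_record type and demo_second_field_pos() are hypothetical; only the macro body comes from the definition above.

```c
#include <stddef.h>

#define DOFFSET(abase, astr, aitem) \
    ((abase) + (((char *)(aitem)) - ((char *)(astr))))

/* Hypothetical record layout used only for this illustration. */
struct demo_record {
    long first;
    long second;
    char name[16];
};

/* Given the position at which a record is stored, return the position of
 * its 'second' field: the base plus the field's byte offset in the struct. */
static long demo_second_field_pos(long recordPos, struct demo_record *recP)
{
    return (long)DOFFSET(recordPos, recP, &recP->second);
}
```

The pointer subtraction yields the same value as offsetof() for the field in question, which is why only that portion of a record needs to be located and rewritten.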
Section 3.6: Functions
- This section covers the Volume Location Server RPC interface routines. The majority of them are generated from the vldbint.xg Rxgen file, and are meant to be used by user-space agents. There is also a subset interface definition provided in the afsvlint.xg Rxgen file. These routines, described in Section 3.7, are meant to be used by a kernel-space agent when dealing with the Volume Location Server; in particular, they are called by the Cache Manager.
Section 3.6.1: VL CreateEntry - Create VLDB entry
int VL CreateEntry(IN struct rx connection *z conn,
IN vldbentry *newentry)
- Description
- This function creates a new entry in the VLDB, as specified in the newentry argument. Both the name and numerical ID of the new volume must be unique (i.e., neither may already appear in the VLDB). For non-read-write entries, the read-write parent volume is accessed so that its reference count can be updated, and the new entry is added to the parent's chain of associated entries. The VLDB is write-locked for the duration of this operation.
- Error Codes
- VL PERM The caller is not authorized to execute this function.
VL NAMEEXIST The volume name already appears in the VLDB.
VL CREATEFAIL Space for the new entry cannot be allocated within the VLDB.
VL BADNAME The volume name is invalid.
VL BADVOLTYPE The volume type is invalid.
VL BADSERVER The indicated server information is invalid.
VL BADPARTITION The indicated partition information is invalid.
VL BADSERVERFLAG The server flag field is invalid.
VL IO An error occurred while writing to the VLDB.
Section 3.6.2: VL DeleteEntry - Delete VLDB entry
int VL DeleteEntry(IN struct rx connection *z conn,
IN long Volid,
IN long voltype)
- Description
- Delete the entry matching the given volume identifier and volume type as specified in the Volid and voltype arguments. For a read-write entry whose reference count is greater than 1, the entry is not actually deleted, since at least one child (read-only or backup) volume still depends on it. For cases of non-read-write volumes, the parent's reference count and associated chains are updated.
- If the associated VLDB entry is already marked as deleted (i.e., its flags field has the VLDELETED bit set), then no further action is taken, and VL ENTDELETED is returned. The VLDB is write-locked for the duration of this operation.
- Error Codes
- VL PERM The caller is not authorized to execute this function.
VL BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL NOENT This volume instance does not appear in the VLDB.
VL ENTDELETED The given VLDB entry has already been marked as deleted.
VL IO An error occurred while writing to the VLDB.
Section 3.6.3: VL GetEntryByID - Get VLDB entry by volume ID/type
int VL GetEntryByID(IN struct rx connection *z conn,
IN long Volid,
IN long voltype,
OUT vldbentry *entry)
- Description
- Given a volume's numerical identifier (Volid) and type (voltype), return a pointer to the entry in the VLDB describing the given volume instance.
- The VLDB is read-locked for the duration of this operation.
- Error Codes
- VL BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL NOENT This volume instance does not appear in the VLDB.
VL ENTDELETED The given VLDB entry has already been marked as deleted.
Section 3.6.4: VL GetEntryByName - Get VLDB entry by volume name
int VL GetEntryByName(IN struct rx connection *z conn,
IN char *volumename,
OUT vldbentry *entry)
- Description
- Given the volume name in the volumename parameter, return a pointer to the entry in the VLDB describing the given volume. The name in volumename may be no longer than MAXNAMELEN (65) characters, including the trailing null. Note that it is legal to use the volume's numerical identifier (in string form) as the volume name.
- The VLDB is read-locked for the duration of this operation.
- This function is closely related to the VL_GetEntryByID() routine. In fact, the by-ID routine is called if the volume name provided in volumename is the string version of the volume's numerical identifier.
- Error Codes
- VL_BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL_NOENT This volume instance does not appear in the VLDB.
VL_ENTDELETED The given VLDB entry has already been marked as deleted.
VL_BADNAME The volume name is invalid.
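- The routing of a by-name request can be sketched as follows. This is a minimal illustration rather than the Volume Location Server's own code: the names classify_lookup and name_is_numeric_id, the result codes, and the rule that a numerical identifier is a non-empty run of decimal digits are all assumptions made for the example. Only the MAXNAMELEN limit of 65 (including the trailing null) comes from the description above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAXNAMELEN 65 /* from the text: the limit includes the trailing null */

/* Hypothetical result codes used only by this sketch. */
enum lookup_kind { LOOKUP_BAD_NAME, LOOKUP_BY_ID, LOOKUP_BY_NAME };

/* Assumption: a name is treated as a numerical ID when it is non-empty
 * and consists solely of decimal digits. */
int name_is_numeric_id(const char *name)
{
    assert(name != NULL);
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return 0;
    }
    return 1;
}

/* Decide which lookup path a by-name request would take. */
enum lookup_kind classify_lookup(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len + 1 > MAXNAMELEN)
        return LOOKUP_BAD_NAME;       /* empty, or too long with its null */
    return name_is_numeric_id(name) ? LOOKUP_BY_ID : LOOKUP_BY_NAME;
}
```

- Under these assumptions, a name such as "536870912" would be handed to the by-ID path, while "root.cell" would be looked up by name.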
Section 3.6.5: VL_GetNewVolumeId - Generate a new volume ID
int VL_GetNewVolumeId(IN struct rx_connection *z_conn,
IN long bumpcount,
OUT long *newvolumid)
- Description
- Acquire bumpcount unused, consecutively-numbered volume identifiers from the Volume Location Server. The lowest-numbered of the newly-acquired set is placed in the newvolumid argument. The largest number of volume IDs that may be generated with any one call is bounded by the MAXBUMPCOUNT constant defined in Section 3.2.1. Currently, there is (effectively) no restriction on the number of volume identifiers that may thus be reserved in a single call.
- The VLDB is write-locked for the duration of this operation.
- Error Codes
- VL_PERM The caller is not authorized to execute this function.
VL_BADVOLIDBUMP The value of the bumpcount parameter exceeds the system limit of MAXBUMPCOUNT.
VL_IO An error occurred while writing to the VLDB.
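- The reservation of a consecutive run of identifiers can be pictured with the sketch below. It is not the server's implementation: the sk_ names, the placeholder limit and return codes, and the rejection of negative counts are assumptions made for illustration. The real limit is the MAXBUMPCOUNT constant of Section 3.2.1.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder limit, chosen small so the check is easy to exercise.
 * It is NOT the real MAXBUMPCOUNT value. */
#define SK_MAXBUMPCOUNT 1000L

/* Placeholder return codes, not the real VL_ error values. */
enum { SK_OK = 0, SK_BADVOLIDBUMP = 1 };

/* Hypothetical stand-in for the counter kept in the VLDB header. */
struct sk_header {
    long next_volume_id; /* lowest identifier not yet handed out */
};

/* Reserve bumpcount consecutive IDs and return the lowest in *newvolumid. */
int sk_get_new_volume_id(struct sk_header *hdr, long bumpcount, long *newvolumid)
{
    assert(hdr != NULL && newvolumid != NULL);
    /* Assumption: negative counts are rejected along with oversized ones. */
    if (bumpcount < 0 || bumpcount > SK_MAXBUMPCOUNT)
        return SK_BADVOLIDBUMP;
    *newvolumid = hdr->next_volume_id;  /* lowest ID of the reserved run */
    hdr->next_volume_id += bumpcount;   /* skip past the reserved run */
    return SK_OK;
}
```

- A caller that asks for three identifiers when the counter stands at 100 receives 100 and may use 100 through 102; the next caller starts at 103.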
Section 3.6.6: VL_ReplaceEntry - Replace entire contents of VLDB entry
int VL_ReplaceEntry(IN struct rx_connection *z_conn,
IN long Volid,
IN long voltype,
IN vldbentry *newentry,
IN long ReleaseType)
- Description
- Perform a wholesale replacement of the VLDB entry corresponding to the volume instance whose identifier is Volid and type voltype with the information contained in the newentry argument. This call cannot change individual VLDB entry fields selectively while preserving the others; VL_UpdateEntry() should be used for that purpose. The permissible values for the ReleaseType parameter are defined in Section 3.2.7.
- The VLDB is write-locked for the duration of this operation. All of the hash tables impacted are brought up to date to incorporate the new information.
- Error Codes
- VL_PERM The caller is not authorized to execute this function.
VL_BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL_BADRELLOCKTYPE An illegal release lock has been specified by the ReleaseType argument.
VL_NOENT This volume instance does not appear in the VLDB.
VL_BADENTRY An attempt was made to change a read-write volume ID.
VL_IO An error occurred while writing to the VLDB.
Section 3.6.7: VL_UpdateEntry - Update contents of VLDB entry
int VL_UpdateEntry(IN struct rx_connection *z_conn,
IN long Volid,
IN long voltype,
IN VldbUpdateEntry *UpdateEntry,
IN long ReleaseType)
- Description
- Update the VLDB entry corresponding to the volume instance whose identifier is Volid and type voltype with the information contained in the UpdateEntry argument. Most of the entry's fields can be modified in a single call to VL_UpdateEntry(). The Mask field within the UpdateEntry parameter selects the fields to update with the values stored within the other UpdateEntry fields. Permissible values for the ReleaseType parameter are defined in Section 3.2.7.
- The VLDB is write-locked for the duration of this operation.
- Error Codes
- VL_PERM The caller is not authorized to execute this function.
VL_BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL_BADRELLOCKTYPE An illegal release lock has been specified by the ReleaseType argument.
VL_NOENT This volume instance does not appear in the VLDB.
VL_IO An error occurred while writing to the VLDB.
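- The role of the Mask field can be illustrated with the sketch below, in which each set bit copies one field from the update record into the entry and every unselected field is left alone. The structures are cut-down stand-ins, and the sk_ names and bit values are hypothetical; the real field list and mask bits accompany the VldbUpdateEntry definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mask bits for this sketch only. */
#define SK_UPD_NAME  0x1UL
#define SK_UPD_FLAGS 0x2UL
#define SK_UPD_ROID  0x4UL

struct sk_entry {          /* cut-down stand-in for a VLDB entry */
    char name[65];
    long flags;
    long readonly_id;
};

struct sk_update {         /* cut-down stand-in for VldbUpdateEntry */
    unsigned long mask;    /* selects which of the fields below apply */
    char name[65];
    long flags;
    long readonly_id;
};

/* Copy only the fields whose mask bit is set; leave the rest untouched. */
void sk_apply_update(struct sk_entry *e, const struct sk_update *u)
{
    assert(e != NULL && u != NULL);
    if (u->mask & SK_UPD_NAME) {
        strncpy(e->name, u->name, sizeof(e->name) - 1);
        e->name[sizeof(e->name) - 1] = '\0';
    }
    if (u->mask & SK_UPD_FLAGS)
        e->flags = u->flags;
    if (u->mask & SK_UPD_ROID)
        e->readonly_id = u->readonly_id;
}
```

- With a mask of SK_UPD_FLAGS | SK_UPD_ROID, the name stored in the update record is ignored even though it is populated, which is the selective behavior that distinguishes an update from a wholesale replacement.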
Section 3.6.8: VL_SetLock - Lock VLDB entry
int VL_SetLock(IN struct rx_connection *z_conn,
IN long Volid,
IN long voltype,
IN long voloper)
- Description
- Lock the VLDB entry matching the given volume ID (Volid) and type (voltype) for volume operation voloper (e.g., VLOP_MOVE or VLOP_RELEASE). If the entry is currently unlocked, then its LockTimestamp will be zero. If the lock is obtained, the given voloper is stamped into the flags field, and the LockTimestamp is set to the time of the call. When the caller attempts to lock the entry for a release operation, special care is taken to abort the operation if the entry has already been locked for a release and that existing lock has timed out. In this case, VL_SetLock() returns VL_RERELEASE.
- The VLDB is write-locked for the duration of this operation.
- Error Codes
- VL_PERM The caller is not authorized to execute this function.
VL_BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL_BADVOLOPER An illegal volume operation was specified in the voloper argument. Legal values are defined in the latter part of the table in Section 3.2.5.
VL_ENTDELETED The given VLDB entry has already been marked as deleted.
VL_ENTRYLOCKED The given VLDB entry is already locked, and that lock has not yet timed out.
VL_RERELEASE A VLDB entry locked for release has timed out, and the caller also wanted to perform a release operation on it.
VL_IO An error was experienced while attempting to write to the VLDB.
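- The locking rules described above can be summarized in the following sketch. It models only the decision logic, not the database update. The sk_ names, operation bits, return codes, and the one-hour timeout are placeholders, and the treatment of a timed-out lock that does not involve a release (it is simply taken over) is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder operation bits, timeout, and return codes for this sketch. */
#define SK_OP_MOVE      0x10L
#define SK_OP_RELEASE   0x20L
#define SK_OP_MASK      0xf0L
#define SK_LOCK_TIMEOUT 3600L   /* assumed lock lifetime, in seconds */
enum { SK_OK, SK_ENTRYLOCKED, SK_RERELEASE, SK_BADVOLOPER };

struct sk_lock_entry {
    long flags;            /* holds the operation the entry is locked for */
    long lock_timestamp;   /* zero means the entry is unlocked */
};

int sk_set_lock(struct sk_lock_entry *e, long voloper, long now)
{
    assert(e != NULL);
    if (voloper != SK_OP_MOVE && voloper != SK_OP_RELEASE)
        return SK_BADVOLOPER;
    if (e->lock_timestamp != 0) {
        int timed_out = (now - e->lock_timestamp) > SK_LOCK_TIMEOUT;
        if (!timed_out)
            return SK_ENTRYLOCKED;
        /* A stale release lock blocks a second release attempt. */
        if ((e->flags & SK_OP_RELEASE) && voloper == SK_OP_RELEASE)
            return SK_RERELEASE;
        /* Assumption: any other timed-out lock may be taken over. */
    }
    e->flags = (e->flags & ~SK_OP_MASK) | voloper;  /* stamp the operation */
    e->lock_timestamp = now;                        /* record time of call */
    return SK_OK;
}
```

- The distinction between the two failure codes is the point of the sketch: a live lock yields the "entry locked" result regardless of operation, whereas a stale release lock meeting a new release request yields the "re-release" result so that the caller can recover from the interrupted release.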
Section 3.6.9: VL_ReleaseLock - Unlock VLDB entry
int VL_ReleaseLock(IN struct rx_connection *z_conn,
IN long Volid,
IN long voltype,
IN long ReleaseType)
- Description
- Unlock the VLDB entry matching the given volume ID (Volid) and type (voltype). The ReleaseType argument determines which VLDB entry fields from flags and LockAfsId will be cleared along with the lock timestamp in LockTimestamp. Permissible values for the ReleaseType parameter are defined in Section 3.2.7.
- The VLDB is write-locked for the duration of this operation.
- Error Codes
- VL_PERM The caller is not authorized to execute this function.
VL_BADVOLTYPE An illegal volume type has been specified by the voltype argument.
VL_BADRELLOCKTYPE An illegal release lock has been specified by the ReleaseType argument.
VL_NOENT This volume instance does not appear in the VLDB.
VL_ENTDELETED The given VLDB entry has already been marked as deleted.
VL_IO An error was experienced while attempting to write to the VLDB.
Section 3.6.10: VL_ListEntry - Get contents of VLDB via index
int VL_ListEntry(IN struct rx_connection *z_conn,
IN long previous_index,
OUT long *count,
OUT long *next_index,
OUT vldbentry *entry)
- Description
- This function assists in the task of enumerating the contents of the VLDB. Given an index into the database, previous_index, this call returns the single VLDB entry at that offset, placing it in the entry argument. The number of VLDB entries left to list is placed in count, and the index of the next entry to request is returned in next_index. If an illegal index is provided, count is set to -1.
- The VLDB is read-locked for the duration of this operation.
- Error Codes
- ---None.
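- An enumeration loop built on this calling pattern might look like the sketch below. The RPC is replaced by a mock, sk_list_entry(), operating on a small in-memory array, so the example says nothing about the real database layout. The sk_ names, the sample volume IDs, and the reading of count as the number of entries remaining after the one returned are assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct sk_vldbentry { long volume_id; };   /* cut-down stand-in */

/* Mock database standing in for the VLDB; the IDs are made up. */
static const struct sk_vldbentry sk_db[] = {
    { 536870912 }, { 536870915 }, { 536870918 }
};
#define SK_DB_LEN ((long)(sizeof(sk_db) / sizeof(sk_db[0])))

/* Mock with the argument shape described above. Assumption: count
 * reports the entries remaining after the one returned. */
int sk_list_entry(long previous_index, long *count, long *next_index,
                  struct sk_vldbentry *entry)
{
    assert(count != NULL && next_index != NULL && entry != NULL);
    if (previous_index < 0 || previous_index >= SK_DB_LEN) {
        *count = -1;                    /* illegal index */
        return 0;
    }
    *entry = sk_db[previous_index];
    *next_index = previous_index + 1;
    *count = SK_DB_LEN - *next_index;
    return 0;
}

/* Visit every entry; return how many were seen and report the last ID. */
long sk_enumerate(long *last_id)
{
    long index = 0, count = 0, next = 0, visited = 0;
    struct sk_vldbentry e;

    assert(last_id != NULL);
    for (;;) {
        if (sk_list_entry(index, &count, &next, &e) != 0 || count < 0)
            break;                      /* call failed or index was illegal */
        visited++;
        *last_id = e.volume_id;
        if (count == 0)
            break;                      /* nothing left to list */
        index = next;                   /* feed next_index back in */
    }
    return visited;
}
```

- The essential pattern is that the caller starts at index 0 and feeds each returned next_index back in as the following call's previous_index, stopping once count shows that nothing remains or signals an illegal index.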
Section 3.6.11: VL_ListAttributes - List all VLDB entries matching given attributes, single return object
int VL_ListAttributes(IN struct rx_connection *z_conn,
IN VldbListByAttributes *attributes,
OUT long *nentries,
OUT bulkentries *blkentries)
- Description
- Retrieve all the VLDB entries that match the attributes listed in the attributes parameter, placing them in the blkentries object. The number of matching entries is placed in nentries. Matching can be done by server number, partition, volume type, flag, or volume ID. The legal values to use in the attributes argument are listed in Section 3.2.3. Note that if the VLLIST_VOLUMEID bit is set in attributes, all other bit values are ignored and the volume ID provided is the sole search criterion.
- The VLDB is read-locked for the duration of this operation.
- Note that VL_ListAttributes() is a potentially expensive function, as a sequential search through all of the VLDB entries is performed in most cases.
- Error Codes
- VL_NOMEM Memory for the blkentries object could not be allocated.
VL_NOENT The specified volume instance does not appear in the VLDB.
VL_SIZEEXCEEDED Ran out of room in the blkentries object.
VL_IO An error occurred while reading from the VLDB.
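- The matching rule, including the way a volume ID selector overrides every other criterion, can be sketched as a predicate applied during a sequential scan. The structures are simplified, and the sk_ names and bit values are hypothetical; the real selector bits are those listed in Section 3.2.3. Only the server, partition, and volume ID criteria are modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector bits for this sketch only. */
#define SK_LIST_SERVER    0x01UL
#define SK_LIST_PARTITION 0x02UL
#define SK_LIST_VOLUMEID  0x08UL

struct sk_attrs {          /* cut-down stand-in for VldbListByAttributes */
    unsigned long mask;
    long server;
    long partition;
    long volume_id;
};

struct sk_site {           /* cut-down stand-in for a VLDB entry */
    long server;
    long partition;
    long volume_id;
};

/* Return 1 when the entry satisfies every selected criterion. */
int sk_matches(const struct sk_site *e, const struct sk_attrs *a)
{
    assert(e != NULL && a != NULL);
    if (a->mask & SK_LIST_VOLUMEID)
        return e->volume_id == a->volume_id;   /* sole criterion */
    if ((a->mask & SK_LIST_SERVER) && e->server != a->server)
        return 0;
    if ((a->mask & SK_LIST_PARTITION) && e->partition != a->partition)
        return 0;
    return 1;
}

/* Sequential scan over all entries, counting the matches. */
long sk_count_matches(const struct sk_site *db, long n, const struct sk_attrs *a)
{
    long i, hits = 0;

    assert(db != NULL && a != NULL);
    for (i = 0; i < n; i++)
        hits += sk_matches(&db[i], a);
    return hits;
}
```

- Because the scan touches every entry whenever the volume ID bit is not set, its cost in this sketch grows linearly with the size of the database, which is the expense the note above warns about.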
Section 3.6.12: VL_LinkedList - List all VLDB entries matching given attributes, linked list return object
int VL_LinkedList(IN struct rx_connection *z_conn,
IN VldbListByAttributes *attributes,
OUT long *nentries,
OUT vldb_list *linkedentries)
- Description
- Retrieve all the VLDB entries that match the attributes listed in the attributes parameter, creating a linked list of entries based in the linkedentries object. The number of matching entries is placed in nentries. Matching can be done by server number, partition, volume type, flag, or volume ID. The legal values to use in the attributes argument are listed in Section 3.2.3. Note that if the VLLIST_VOLUMEID bit is set in attributes, all other bit values are ignored and the volume ID provided is the sole search criterion.
- The VL_LinkedList() function is identical to VL_ListAttributes(), except for the method of delivering the VLDB entries to the caller.
- The VLDB is read-locked for the duration of this operation.
- Error Codes
- VL_NOMEM Memory for an entry in the list based at the linkedentries object could not be allocated.
VL_NOENT The specified volume instance does not appear in the VLDB.
VL_SIZEEXCEEDED Ran out of room in the current list object.
VL_IO An error occurred while reading from the VLDB.
Section 3.6.13: VL_GetStats - Get Volume Location Server statistics
int VL_GetStats(IN struct rx_connection *z_conn,
OUT vldstats *stats,
OUT vital_vlheader *vital_header)
- Description
- Collect the different types of VLDB statistics. Part of the VLDB header is returned in vital_header, which includes such information as the number of allocations and frees performed, and the next volume ID to be allocated. The dynamic per-operation stats are returned in the stats argument, reporting the number and types of operations and aborts.
- The VLDB is read-locked for the duration of this operation.
- Error Codes
- VL_PERM The caller is not authorized to execute this function.
Section 3.6.14: VL_Probe - Verify Volume Location Server connectivity/status
int VL_Probe(IN struct rx_connection *z_conn)
- Description
- This routine serves a 'pinging' function to determine whether the Volume Location Server is still running. If this call succeeds, then the Volume Location Server is shown to be capable of responding to RPCs, thus confirming connectivity and basic operation.
- The VLDB is not locked for this operation.
- Error Codes
- ---None.
Section 3.7: Kernel Interface Subset
- The interface described so far in this document applies to user-level clients, such as the vos utility. However, some volume location operations must be performed from within the kernel. Specifically, the Cache Manager must find out where volumes reside, and otherwise gather information about them, in order to conduct its business with the File Servers holding them. To support Volume Location Server interconnection for agents operating within the kernel, the afsvlint.xg Rxgen interface was built. It is a minimal subset of the user-level vldbint.xg definition. Within afsvlint.xg, there are duplicate definitions for such constants as MAXNAMELEN, MAXNSERVERS, MAXTYPES, VLF_RWEXISTS, VLF_ROEXISTS, VLF_BACKEXISTS, VLSF_NEWREPSITE, VLSF_ROVOL, VLSF_RWVOL, and VLSF_BACKVOL. Since the only operations the Cache Manager must perform are locating a volume given a specific volume ID or name, and detecting unresponsive Volume Location Servers, the following interface routines are duplicated in afsvlint.xg, along with the struct vldbentry declaration:
- VL_GetEntryByID()
- VL_GetEntryByName()
- VL_Probe()