Every file server machine maintains a list of its home cell's database server machines in the file /usr/afs/etc/CellServDB on its local disk. Both database server processes and non-database server processes consult the file:
The database server processes (the Authentication, Backup, Protection, and Volume Location Servers) maintain constant contact with their peers in order to keep their copies of the replicated administrative databases synchronized.
As detailed in Replicating the OpenAFS Administrative Databases, the database server processes use the Ubik utility to synchronize the information in the databases they maintain. The Ubik coordinator at the synchronization site for each database maintains the single read/write copy of the database and distributes changes to the secondary sites as necessary. It must maintain contact with a majority of the secondary sites to remain the coordinator, and consults the CellServDB file to learn how many peers it has and on which machines they are running.
If the coordinator loses contact with the majority of its peers, they all cooperate to elect a new coordinator by majority vote. During the election, all of the Ubik processes consult the CellServDB file to learn where to send their votes, and what number constitutes a majority.
The non-database server processes must know which machines are running the database server processes in order to retrieve information from the databases. For example, the first time that a user accesses an AFS file, the File Server that houses it contacts the Protection Server to obtain a list of the user's group memberships (the list is called a current protection subgroup, or CPS). The File Server uses the CPS to determine whether the access control list (ACL) protecting the file grants the required permissions to the user (for more details, see About the Protection Database).
The consequences of missing or incorrect information in the CellServDB file are as follows:
If the file does not list a machine, then it is effectively not a database server machine even if the database server processes are running. The Ubik coordinator does not send it database updates or include it in the count that establishes a majority. It does not participate in Ubik elections, and so refuses to distribute database information to any client machines that happen to contact it (which they can do if their /usr/vice/etc/CellServDB file lists it). Users of the client machine must wait for a timeout before they can contact a correctly functioning database server machine.
If the file lists a machine that is not running the database server processes, the consequences can be serious. The Ubik coordinator cannot send it database updates, but includes it in the count that establishes a majority. If valid secondary sites go down and stop sending their votes to the coordinator, it can wrongly appear that the coordinator no longer has the majority it needs. The resulting election of a new coordinator causes a service outage during which information from the database becomes unavailable. Furthermore, the lack of a vote from the incorrectly listed site can disturb the election, if it makes the other sites believe that a majority of sites are not voting for the new coordinator.
A more minor consequence is that non-database server processes attempt to contact the database server processes on the machine. They experience a timeout delay because the processes are not running.
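The effect of a wrongly listed machine on the majority count can be illustrated with simple arithmetic. The following sketch is not part of OpenAFS: the helper names majority_needed and has_majority are hypothetical, and the sketch assumes a plain strict majority (more than half of the listed sites), ignoring any tie-breaking rules that Ubik applies.

```shell
#!/bin/sh
# Illustrative arithmetic only; these helpers are hypothetical, not OpenAFS commands.
# Assumption: a plain strict majority of the sites listed in CellServDB.

# majority_needed LISTED: smallest number of sites that is more than half of LISTED.
majority_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_majority LISTED REACHABLE: succeeds if REACHABLE sites (counting the
# coordinator itself) form a majority of the LISTED sites.
has_majority() {
    [ "$2" -ge "$(majority_needed "$1")" ]
}

# Correct file: 3 machines listed, one goes down, 2 remain reachable.
if has_majority 3 2; then
    echo "3 listed, 2 reachable: majority kept"
fi

# Incorrect file: the same 3 machines plus 1 that runs no database server
# processes. With one valid machine down, only 2 of the 4 listed sites respond.
if ! has_majority 4 2; then
    echo "4 listed, 2 reachable: majority lost"
fi
```

In the second case the extra entry raises the number of votes needed from 2 to 3, so the loss of a single valid machine is enough to force an election.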
Note that the /usr/afs/etc/CellServDB file on a server machine is not the same as the /usr/vice/etc/CellServDB file on a client machine. The client version includes entries for foreign cells as well as the local cell. However, it is important to update both versions of the file whenever you change your cell's database server machines. A server machine that is also a client needs to have both files, and you need to update them both. For more information on maintaining the client version of the CellServDB file, see Maintaining Knowledge of Database Server Machines.
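On a machine that holds both files, one way to confirm that they agree about the local cell is to compare the addresses each one lists for it. The following is a minimal sketch, not an OpenAFS tool: the helper names cell_db_addrs and same_db_servers are hypothetical, and the sketch assumes that each cell entry begins with a line of the form ">cellname" followed by one line per database server machine of the form "address #hostname".

```shell
#!/bin/sh
# Sketch only: cell_db_addrs and same_db_servers are hypothetical helpers.
# Assumption: a cell entry starts with a ">cellname" line, followed by one
# "address   #hostname" line per database server machine.

# cell_db_addrs CELL FILE: print the sorted addresses listed for CELL in FILE.
cell_db_addrs() {
    awk -v cell="$1" '
        /^>/    { in_cell = (substr($1, 2) == cell); next }
        in_cell { sub(/#.*/, ""); gsub(/[ \t]/, ""); if ($0 != "") print }
    ' "$2" | sort
}

# same_db_servers CELL SERVER_FILE CLIENT_FILE: succeed if both files list
# the same set of addresses for CELL.
same_db_servers() {
    [ "$(cell_db_addrs "$1" "$2")" = "$(cell_db_addrs "$1" "$3")" ]
}
```

For example, same_db_servers example.com /usr/afs/etc/CellServDB /usr/vice/etc/CellServDB succeeds only when both files name the same addresses for example.com. Entries for foreign cells in the client file are ignored.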
To avoid the negative consequences of incorrect information in the /usr/afs/etc/CellServDB file, you must update it on all of your cell's server machines every time you add or remove a database server machine. The OpenAFS Quick Beginnings provides complete instructions for installing or removing a database server machine and for updating the CellServDB file in that context. This section explains how to distribute the file to your server machines and how to make other cells aware of the changes if you participate in the AFS global name space.
You can use the Update Server to distribute the central copy of the server CellServDB file stored on the cell's system control machine. For instructions on configuring the Update Server, see the OpenAFS Quick Beginnings.
To avoid introducing formatting errors into the file, always use the bos addhost and bos removehost commands rather than editing the file directly. You must also restart the database server processes running on the machine, to initiate a coordinator election among the new set of database server machines. This step is included in the instructions that appear in To add a database server machine to the CellServDB file and To remove a database server machine from the CellServDB file. For instructions on displaying the contents of the file, see To display a cell's database server machines.
If you make your cell accessible to foreign users as part of the AFS global name space, you also need to inform other cells when you change your cell's database server machines. The AFS Support group maintains a CellServDB file that lists all cells that participate in the AFS global name space, and can change your cell's entry at your request. For further details, see Making Your Cell Visible to Others.
Another way to advertise your cell's database server machines is to maintain a copy of the file at the conventional location in your AFS filespace, /afs/cellname/service/etc/CellServDB.local. For further discussion, see The Third Level.
Issue the bos listhosts command. If you have maintained the file properly, the output is the same on every server machine, but the machine name argument enables you to check various machines if you wish.
% bos listhosts <machine name> [<cell name>]
where
Is the shortest acceptable abbreviation of listhosts.
machine name: Specifies the server machine from which to display the /usr/afs/etc/CellServDB file.
cell name: Specifies the complete Internet domain name of a foreign cell. You must already know the name of at least one server machine in the cell, to provide as the machine name argument.
The output lists the machines in the order they appear in the CellServDB file on the specified server machine. It assigns each one a Host index number, as in the following example. There is no implied relationship between the index and a machine's IP address, name, or role as Ubik coordinator or secondary site.
% bos listhosts fs1.example.com
Cell name is example.com
Host 1 is fs1.example.com
Host 2 is fs7.example.com
Host 3 is fs4.example.com
The output lists machines by name rather than IP address as long as the naming service (such as the Domain Name Service or local host table) is functioning properly. To display IP addresses, log in to a server machine as the local superuser root and use a text editor or display command, such as the cat command, to view the /usr/afs/etc/CellServDB file.
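Because the output should be the same on every server machine, the check can be repeated across machines and the results compared. The following is a minimal sketch under stated assumptions: check_listhosts is a hypothetical helper, not an OpenAFS command, and it assumes that the bos binary is on the search path and that every machine named can be queried by the issuer.

```shell
#!/bin/sh
# Sketch only: check_listhosts is a hypothetical helper, not an OpenAFS command.
# Assumption: bos is on PATH and each machine named answers bos listhosts.

# check_listhosts REFERENCE MACHINE...: compare the bos listhosts output of
# each MACHINE with that of REFERENCE; return non-zero on any difference.
check_listhosts() {
    ref_machine=$1
    shift
    ref_output=$(bos listhosts "$ref_machine") || return 2
    status=0
    for machine in "$@"; do
        if ! output=$(bos listhosts "$machine"); then
            echo "cannot query $machine" >&2
            status=1
            continue
        fi
        if [ "$output" != "$ref_output" ]; then
            echo "MISMATCH: $machine differs from $ref_machine"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_listhosts fs1.example.com fs4.example.com fs7.example.com reports every machine whose list differs from the one on fs1.example.com. The comparison is textual, so a difference in the order of entries is also reported as a mismatch.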
Verify that you are listed in the /usr/afs/etc/UserList file. If necessary, issue the bos listusers command, which is fully described in To display the users in the UserList file.
% bos listusers <machine name>
Issue the bos addhost command to add each new database server machine to the CellServDB file. Specify the system control machine as machine name. (If you have forgotten which machine is the system control machine, see The Output on the System Control Machine.)
% bos addhost <machine name> <host name>+
where
Is the shortest acceptable abbreviation of addhost.
machine name: Names the system control machine.
host name: Specifies the fully qualified hostname of each database server machine to add to the CellServDB file (for example: fs4.example.com). The BOS Server uses the gethostbyname() routine to obtain each machine's IP address and records both the name and address automatically.
Restart the Authentication Server, Backup Server, Protection Server, and VL Server on every database server machine, so that the new set of machines participate in the election of a new Ubik coordinator. The instruction uses the conventional names for the processes; make the appropriate substitution if you use different process names. For complete syntax, see Stopping and Immediately Restarting Processes.
Important: Repeat the following command in quick succession on all of the database server machines.
% bos restart <machine name> buserver kaserver ptserver vlserver
Edit the /usr/vice/etc/CellServDB file on each of your cell's client machines. For instructions, see Maintaining Knowledge of Database Server Machines.
If you participate in the AFS global name space, please have one of your cell's designated site contacts register the changes you have made with the AFS Product Support group.
If you maintain a central copy of your cell's server CellServDB file in the conventional location (/afs/cellname/service/etc/CellServDB.local), edit the file to reflect the change.
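The restart step in this procedure must be repeated in quick succession on every database server machine, which lends itself to a loop. The following is a minimal sketch, not an OpenAFS tool: restart_db_servers is a hypothetical helper, and it uses the conventional process names from the instructions above, so substitute your own names if they differ.

```shell
#!/bin/sh
# Sketch only: restart_db_servers is a hypothetical helper, not an OpenAFS command.
# Assumption: the database server processes use the conventional names
# buserver, kaserver, ptserver and vlserver on every machine.

# restart_db_servers MACHINE...: issue bos restart against each database
# server machine in turn, without pausing between machines.
restart_db_servers() {
    failed=0
    for machine in "$@"; do
        echo "restarting database server processes on $machine"
        bos restart "$machine" buserver kaserver ptserver vlserver || failed=1
    done
    return "$failed"
}
```

For example, restart_db_servers fs1.example.com fs4.example.com fs7.example.com restarts the processes on all three machines. The loop continues after a failure so that the remaining machines are still restarted, and the return status indicates whether any restart failed.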
Verify that you are listed in the /usr/afs/etc/UserList file. If necessary, issue the bos listusers command, which is fully described in To display the users in the UserList file.
% bos listusers <machine name>
Issue the bos removehost command to remove each database server machine from the CellServDB file. Specify the system control machine as machine name. (If you have forgotten which machine is the system control machine, see The Output on the System Control Machine.)
% bos removehost <machine name> <host name>+
where
Is the shortest acceptable abbreviation of removehost.
machine name: Names the system control machine.
host name: Specifies the fully qualified hostname of each database server machine to remove from the CellServDB file (for example: fs4.example.com).
Restart the Authentication Server, Backup Server, Protection Server, and VL Server on every database server machine, so that the new set of machines participate in the election of a new Ubik coordinator. The instruction uses the conventional names for the processes; make the appropriate substitution if you use different process names. For complete syntax, see Stopping and Immediately Restarting Processes.
Important: Repeat the following command in quick succession on all of the database server machines.
% bos restart <machine name> buserver kaserver ptserver vlserver
Edit the /usr/vice/etc/CellServDB file on each of your cell's client machines. For instructions, see Maintaining Knowledge of Database Server Machines.
If you participate in the AFS global name space, please have one of your cell's designated site contacts register the changes you have made with the AFS Product Support group.
If you maintain a central copy of your cell's server CellServDB file in the conventional location (/afs/cellname/service/etc/CellServDB.local), edit the file to reflect the change.
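After a removal it can be useful to confirm that the machine no longer appears in the CellServDB file on each server machine. The following is a minimal sketch, not an OpenAFS tool: listed_hosts and host_still_listed are hypothetical helpers, and they assume that bos listhosts prints one line of the form "Host N is name" per machine, as in the example output earlier in this section.

```shell
#!/bin/sh
# Sketch only: listed_hosts and host_still_listed are hypothetical helpers.
# Assumption: bos listhosts prints one "Host N is NAME" line per machine.

# listed_hosts: read bos listhosts output on stdin, print one hostname per line.
listed_hosts() {
    awk '$1 == "Host" && $3 == "is" { print $4 }'
}

# host_still_listed HOST MACHINE: succeed if the CellServDB file on MACHINE
# still lists HOST.
host_still_listed() {
    bos listhosts "$2" | listed_hosts | grep -Fqx "$1"
}
```

For example, after removing fs4.example.com, host_still_listed fs4.example.com fs1.example.com should fail. If it succeeds, the file on fs1.example.com has not yet received the change.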