Administering Database Server Machines


This section explains how to administer database server machines. For installation instructions, see the OpenAFS Quick Beginnings.

Replicating the OpenAFS Administrative Databases

There are several benefits to replicating the AFS administrative databases (the Authentication, Backup, Protection, and Volume Location Databases), as discussed in Replicating the OpenAFS Administrative Databases. For correct cell functioning, the copies of each database must be identical at all times. To keep the databases synchronized, AFS uses a library of utilities called Ubik. Each database server process runs an associated lightweight Ubik process, and client-side programs call Ubik's client-side subroutines when they submit requests to read and change the databases.

Ubik is designed to work with minimal administrator intervention, but there are several configuration requirements, as detailed in Configuring the Cell for Proper Ubik Operation. The following brief overview of Ubik's operation is helpful for understanding the requirements. For more details, see How Ubik Operates Automatically.

Ubik is designed to distribute changes made in an AFS administrative database to all copies as quickly as possible. Only one copy of the database, the synchronization site, accepts change requests from clients; the lightweight Ubik process running there is the Ubik coordinator. To maintain maximum availability, there is a separate Ubik coordinator for each database, and the synchronization site for each of the four databases can be on a different machine. The synchronization site for a database can also move from machine to machine in response to process, machine, or network outages.

The other copies of a database, and the Ubik processes that maintain them, are termed secondary. The secondary sites do not accept database changes directly from client-side programs, but only from the synchronization site.

After the Ubik coordinator records a change in its copy of a database, it immediately sends the change to the secondary sites. During the brief distribution period, clients cannot access any of the copies of the database, even for reading. If the coordinator cannot reach a majority of the secondary sites, it halts the distribution and informs the client that the attempted change failed.

To avoid distribution failures, the Ubik processes maintain constant contact by exchanging time-stamped messages. As long as a majority of the secondary sites respond to the coordinator's messages, there is a quorum of sites that are synchronized with the coordinator. If a process, machine, or network outage breaks the quorum, the Ubik processes attempt to elect a new coordinator in order to establish a new quorum among the highest possible number of sites. See A Flexible Coordinator Boosts Availability.

Configuring the Cell for Proper Ubik Operation

This section describes how to configure your cell to maintain proper Ubik operation.

Run all four database server processes (Authentication Server, Backup Server, Protection Server, and VL Server) on all database server machines.
Both the client and server portions of Ubik expect that all the database server machines listed in the CellServDB file are running all of the database server processes. There is no mechanism for indicating that only some database server processes are running on a machine.
Maintain correct information in the /usr/afs/etc/CellServDB file at all times.
Ubik consults the /usr/afs/etc/CellServDB file to determine the sites with which to establish and maintain a quorum. Incorrect information can result in unsynchronized databases or election of a coordinator in each of several subgroups of machines, because the Ubik processes on various machines do not agree on which machines need to participate in the quorum.
If you use the Update Server, it is simplest to maintain the /usr/afs/etc/CellServDB file on the system control machine, which distributes its copy to all other server machines. The OpenAFS Quick Beginnings explains how to configure the Update Server.
The only reason to alter the file is when configuring or decommissioning a database server machine. Use the appropriate bos commands rather than editing the file by hand. For instructions, see Maintaining the Server CellServDB File. The instructions in Monitoring and Controlling Server Processes for stopping and starting processes remind you to alter the CellServDB file when appropriate, as do the instructions in the OpenAFS Quick Beginnings for installing or decommissioning a database server machine.
(Client processes and the server processes that do not maintain databases also rely on correct information in the CellServDB file for proper operation, but their use of the information does not affect Ubik's operation. See Maintaining the Server CellServDB File and Maintaining Knowledge of Database Server Machines.)
Keep the clocks synchronized on all machines in the cell, especially the database server machines.
Keeping clocks synchronized is important because the Ubik processes at a database's sites timestamp the messages which they exchange to maintain constant contact. Timestamping the messages is necessary because in a networked environment it is not safe to assume that a message reaches its destination instantly. Ubik compares the timestamp on an incoming message with the current time. If the difference is too great, it is possible that an outage is preventing reliable communication between the Ubik sites, which can possibly result in unsynchronized databases. Ubik considers the message invalid, which can prompt it to attempt election of a different coordinator.
Electing a new coordinator is appropriate if a timestamped message is expired due to actual interruption of communication, but not if a message appears expired only because the sender and recipient do not share the same time. For detailed examples of how unsynchronized clocks can destabilize Ubik operation, see How Ubik Uses Timestamped Messages.

How Ubik Operates Automatically

The following Ubik features help keep its maintenance requirements to a minimum:

Ubik's server and client portions operate automatically.
Each database server process runs a lightweight process to call on the server portion of the Ubik library. It is common to refer to this lightweight process itself as Ubik. Because it is lightweight, the Ubik process does not appear in process listings such as those generated by the UNIX ps command. Client-side programs that need to read and change the databases directly call the subroutines in the Ubik library's client portion, rather than running a separate lightweight process. Examples of such programs are the klog command and the commands in the pts suite.
Ubik tracks database version numbers.
As the coordinator records a change to a database, it increments the database's version number. The version number makes it easy for the coordinator to determine whether a site has the most recent version. It also speeds the return to normal functioning after election of a new coordinator or when communication is restored after an outage, because it makes it easy to determine which site has the most current database and which sites need to be updated.
Ubik's use of timestamped messages guarantees that database copies are always synchronized during normal operation.
Replicating a database to increase data availability is pointless if all copies of the database are not the same. Inconsistent performance can result if clients receive different information depending on which copy of the database they access. As previously noted, Ubik sites constantly track the status of their peers by exchanging timestamped messages. For a detailed description, see How Ubik Uses Timestamped Messages.
The ability to move the coordinator maximizes database availability.
Suppose, for example, that in a cell with three database server machines a network partition separates the two secondary sites from the coordinator. The coordinator retires because it is no longer in contact with a majority of the sites listed in the CellServDB file. The two sites on the other side of the partition can elect a new coordinator among themselves, and it can then accept database changes from clients. If the coordinator cannot move in this way, the database has to be read-only until the network partition is repaired. For a detailed description of Ubik's election procedure, see A Flexible Coordinator Boosts Availability.
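The version-number comparison described in the list above can be illustrated with a small model. This is only a sketch under stated assumptions: it represents each version as a plain integer and each site as a hypothetical host name, and the function name find_stale_sites is invented for illustration. It is not Ubik's actual data structure or code.

```python
def find_stale_sites(versions):
    """Given a mapping of site name -> database version number,
    return the newest version, the sites that hold it, and the
    sites that need to be updated.

    Assumption: versions are modeled as plain integers.
    """
    newest = max(versions.values())
    current = sorted(site for site, v in versions.items() if v == newest)
    stale = sorted(site for site, v in versions.items() if v < newest)
    return newest, current, stale


# Hypothetical example: site "db3" missed two changes during an outage.
newest, current, stale = find_stale_sites({"db1": 42, "db2": 42, "db3": 40})
print(newest, current, stale)
```

In this model, only the sites in the stale list need a fresh copy of the database once communication is restored.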

How Ubik Uses Timestamped Messages

Ubik synchronizes the copies of a database by maintaining constant contact between the synchronization site and the secondary sites. The Ubik coordinator frequently sends a time-stamped guarantee message to each of the secondary sites. When the secondary site receives the message, it concludes that it is in contact with the coordinator. It considers its copy of the database to be valid until time T, which is usually 60 seconds from the time the coordinator sent the message. In response, the secondary site returns a vote message that acknowledges the coordinator as valid until a certain time X, which is usually 120 seconds in the future.

The coordinator sends guarantee messages more frequently than every T seconds, so that the expiration periods overlap. There is no danger of expiration unless a network partition or other outage actually interrupts communication. If the guarantee expires, the secondary site's copy of the database is not necessarily current. Nonetheless, the database server continues to service client requests. It is considered better for overall cell functioning that a secondary site remains accessible even if the information it is distributing is possibly out of date. Most of the AFS administrative databases do not change that frequently, in any case, and making a database inaccessible causes a timeout for clients that happen to access that copy.

As previously mentioned, Ubik's use of timestamped messages makes it vital to synchronize the clocks on database server machines. There are two ways that skewed clocks can interrupt normal Ubik functioning, depending on which clock is ahead of the others.

Suppose, for example, that the Ubik coordinator's clock is ahead of the secondary sites: the coordinator's clock says 9:35:30, but the secondary clocks say 9:31:30. The secondary sites send vote messages that acknowledge the coordinator as valid until 9:33:30. This is two minutes in the future according to the secondary clocks, but is already in the past from the coordinator's perspective. The coordinator concludes that it no longer has enough support to remain coordinator and forces election of a new coordinator. Election takes about three minutes, during which time no copy of the database accepts changes.

The opposite possibility is that a secondary site's clock (14:50:00) is ahead of the coordinator's (14:46:30). When the coordinator sends a guarantee message (good until 14:47:30), it has already expired according to the secondary clock. Believing that it is out of contact with the coordinator, the secondary site stops sending votes for the coordinator and tries to get itself elected as coordinator. This is appropriate if the coordinator has actually failed, but is inappropriate when there is no actual outage.

The attempt of a single secondary site to get elected as the new coordinator usually does not affect the performance of the other sites. As long as their clocks agree with the coordinator's, they ignore the other secondary site's request for votes and continue voting for the current coordinator. However, if enough of the secondary sites' clocks get ahead of the coordinator's, they can force election of a new coordinator even though the current one is actually working fine.
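The two clock-skew scenarios above can be reproduced with simple arithmetic. The following sketch uses the usual lifetimes mentioned in this section (60 seconds for a guarantee, 120 seconds for a vote). The function and constant names are invented for illustration, and the model ignores network delay; it is not Ubik's implementation.

```python
from datetime import datetime, timedelta

GUARANTEE_LIFETIME = timedelta(seconds=60)   # "time T" in the text
VOTE_LIFETIME = timedelta(seconds=120)       # "time X" in the text


def clock(hms):
    """Parse an HH:MM:SS wall-clock reading (the date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


def expiry(sender_clock, lifetime):
    """The sender stamps the message using its own clock."""
    return sender_clock + lifetime


def is_expired(expires_at, receiver_clock):
    """The receiver judges the message against its own clock."""
    return receiver_clock >= expires_at


# Scenario 1: the coordinator's clock is ahead of the secondary sites.
vote = expiry(clock("09:31:30"), VOTE_LIFETIME)
print(vote.time(), is_expired(vote, clock("09:35:30")))

# Scenario 2: a secondary site's clock is ahead of the coordinator's.
guarantee = expiry(clock("14:46:30"), GUARANTEE_LIFETIME)
print(guarantee.time(), is_expired(guarantee, clock("14:50:00")))
```

In both scenarios the message is already expired on arrival even though no outage has occurred, which is why synchronized clocks matter.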

A Flexible Coordinator Boosts Availability

Ubik uses timestamped messages to determine when coordinator election is necessary, just as it does to keep the database copies synchronized. As long as the coordinator receives vote messages from a majority of the sites (it implicitly votes for itself), it is appropriate for it to continue as coordinator because it is successfully distributing database changes. A majority is defined as more than 50% of all database sites when there are an odd number of sites; with an even number of sites, the site with the lowest Internet address has an extra vote for breaking ties as necessary.

If the coordinator is not receiving sufficient votes, it retires and the Ubik sites elect a new coordinator. This does not happen spontaneously, but only when the coordinator really fails or stops receiving a majority of the votes. The secondary sites have a built-in bias to continue voting for an existing coordinator, which prevents undue elections.
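The majority rule stated above can be expressed as a short check. This is a simplified model, assuming IPv4 addresses and a hypothetical function name has_majority. It treats the extra vote of the lowest-addressed site purely as a tie-breaker, as the text describes, and does not reproduce Ubik's internal vote accounting.

```python
import ipaddress


def has_majority(all_sites, voters):
    """Return True if the sites in `voters` form a majority of `all_sites`.

    Both arguments are iterables of IPv4 address strings. With an even
    number of sites, an exact 50% split counts as a majority only if it
    includes the site with the lowest address.
    """
    sites = {ipaddress.ip_address(s) for s in all_sites}
    votes = {ipaddress.ip_address(v) for v in voters} & sites
    if 2 * len(votes) > len(sites):
        return True
    if 2 * len(votes) == len(sites):
        return min(sites) in votes   # tie broken by the lowest address
    return False


# Hypothetical four-site cell split evenly by a network partition.
cell = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
print(has_majority(cell, ["192.0.2.1", "192.0.2.2"]))
print(has_majority(cell, ["192.0.2.3", "192.0.2.4"]))
```

Only the half of the partition that contains the lowest-addressed site can hold a majority, so at most one coordinator can exist at a time.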

The election of the new coordinator is by majority vote. The Ubik subprocesses have a bias to vote for the site with the lowest Internet address, which helps it gather the necessary majority more quickly than if all the sites were competing to receive votes themselves. During the election (which normally lasts less than three minutes), clients can read information from the database, but cannot make any changes.

Ubik's election procedure makes it possible for each database server process's coordinator to be on a different machine. For example, if the Ubik coordinators for all four processes start out on machine A and the Protection Server on machine A fails for some reason, then a different site (say machine B) must be elected as the new Protection Database Ubik coordinator. Machine B remains the coordinator for the Protection Database even after the Protection Server on machine A is working again. The failure of the Protection Server has no effect on the Authentication, Backup, or VL Servers, so their coordinators remain on machine A.

Backing Up and Restoring the Administrative Databases

The AFS administrative databases store information that is critical for AFS operation in your cell. If a database becomes corrupted due to a hardware failure or other problem on a database server machine, it is likely to be difficult and time-consuming to recreate all of the information from scratch. To protect yourself against loss of data, back up the administrative databases to a permanent medium, such as tape, on a regular basis. The recommended method is to use a standard local disk backup utility such as the UNIX tar command.

When deciding how often to back up a database, consider the amount of data that you are willing to recreate by hand if it becomes necessary to restore the database from a backup copy. In most cells, the databases differ quite a bit in how often and how much they change. Changes to the Authentication Database are probably the least frequent, and consist mostly of changed user passwords. Protection Database and VLDB changes are probably more frequent, as users add or delete groups and change group memberships, and as you and other administrators create or move volumes. The number and frequency of changes is probably greatest in the Backup Database, particularly if you perform backups every day.

The ease with which you can recapture lost changes also differs for the different databases:

If regular users make a large proportion of the changes to the Authentication Database and Protection Database in your cell, then recovering them possibly requires a large amount of detective work and interviewing of users, assuming that they can even remember what changes they made at what time.
Recovering lost changes to the VLDB is more straightforward, because you can use the vos syncserv and vos syncvldb commands to correct any discrepancies between the VLDB and the actual state of volumes on server machines. Running these commands can be time-consuming, however.
The configuration information in the Backup Database (Tape Coordinator port offsets, volume sets and entries, the dump hierarchy, and so on) probably does not change that often, in which case it is not that hard to recover a few recent changes. In contrast, there are likely to be a large number of new dump records resulting from dump operations. You can recover these records by using the -dbadd argument to the backup scantape command, reading in information from the backup tapes themselves. This can take a long time and require numerous tape changes, however, depending on how much data you back up in your cell and how you append dumps. Furthermore, the backup scantape command is subject to several restrictions. The most basic is that it halts if it finds that an existing dump record in the database has the same dump ID number as a dump on the tape it is scanning. If you want to continue with the scanning operation, you must locate and remove the existing record from the database. For further discussion, see the backup scantape command's reference page in the OpenAFS Administration Reference.

These differences between the databases possibly suggest backing up the databases at different frequencies, ranging from every few days or weekly for the Backup Database to every few weeks for the Authentication Database. On the other hand, it is probably simpler from a logistical standpoint to back them all up at the same time (and frequently), particularly if tape consumption is not a major concern. Also, it is not generally necessary to keep backup copies of the databases for a long time, so you can recycle the tapes fairly frequently.

To back up the administrative databases

Log in as the local superuser root on a database server machine that is not the synchronization site. The machine with the highest IP address is normally the best choice, since it is least likely to become the synchronization site in an election.
Issue the bos shutdown command to shut down the relevant server process on the local machine. For a complete description of the command, see To stop processes temporarily.
For the -instance argument, specify one or more database server process names (buserver for the Backup Server, kaserver for the Authentication Server, ptserver for the Protection Server, or vlserver for the Volume Location Server). Include the -localauth flag because you are logged in as the local superuser root but do not necessarily have administrative tokens.
```
   # bos shutdown <machine name> -instance <instances>+ -localauth [-wait]
```
Use a local disk backup utility, such as the UNIX tar command, to transfer one or more database files to tape. If the local database server machine does not have a tape device attached, use a remote copy command to transfer the file to a machine with a tape device, then use the tar command there.
The following command sequence backs up the complete contents of the /usr/afs/db directory:
```
   # cd /usr/afs/db
   # tar cvf tape_device .
```
To back up individual database files, substitute their names for the period in the preceding tar command:
- bdb.DB0 for the Backup Database
- kaserver.DB0 for the Authentication Database
- prdb.DB0 for the Protection Database
- vldb.DB0 for the VLDB
Issue the bos start command to restart the server processes on the local machine. For a complete description of the command, see To start processes by changing their status flags to Run. Provide the same values for the -instance argument as in Step 2, and the -localauth flag for the same reason.
```
   # bos start <machine name> -instance <server process name>+ -localauth
```
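For sites that prefer to script the archiving step, the following sketch shows one possible approach using Python's standard tarfile module in place of the tar command. It is an illustration only: the function name archive_databases and the dictionary keys are invented here, the archive is written to a file path rather than a tape device, and the script assumes that the relevant server processes have already been shut down with bos shutdown as described above.

```python
import os
import tarfile

# Database file names as listed in the procedure above.
DATABASE_FILES = {
    "backup": "bdb.DB0",
    "authentication": "kaserver.DB0",
    "protection": "prdb.DB0",
    "vldb": "vldb.DB0",
}


def archive_databases(db_dir, archive_path, databases=None):
    """Copy the selected database files from db_dir into a tar archive.

    `databases` is a list of keys from DATABASE_FILES; by default all
    four are attempted. Files that do not exist are skipped. Returns
    the list of file names actually archived.
    """
    selected = databases if databases is not None else list(DATABASE_FILES)
    archived = []
    with tarfile.open(archive_path, "w") as tar:
        for key in selected:
            name = DATABASE_FILES[key]
            path = os.path.join(db_dir, name)
            if os.path.isfile(path):
                # Store only the bare file name, as "cd /usr/afs/db" does.
                tar.add(path, arcname=name)
                archived.append(name)
    return archived
```

A call such as archive_databases("/usr/afs/db", "/var/tmp/afsdb.tar", ["protection"]) would archive only the Protection Database; the output path shown is a placeholder.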

To restore an administrative database

Log in as the local superuser root on each database server machine in the cell.
Working on one of the machines, issue the bos shutdown command once for each database server machine, to shut down the relevant server process on all of them. For a complete description of the command, see To stop processes temporarily.
For the -instance argument, specify one or more database server process names (buserver for the Backup Server, kaserver for the Authentication Server, ptserver for the Protection Server, or vlserver for the Volume Location Server). Include the -localauth flag because you are logged in as the local superuser root but do not necessarily have administrative tokens.
```
   # bos shutdown <machine name> -instance <instances>+ -localauth [-wait]
```

Remove the database from each database server machine by issuing the following commands on each one.
```
   # cd /usr/afs/db
```
For the Backup Database:
```
   # rm bdb.DB0
   # rm bdb.DBSYS1
```
For the Authentication Database:
```
   # rm kaserver.DB0
   # rm kaserver.DBSYS1
```
For the Protection Database:
```
   # rm prdb.DB0
   # rm prdb.DBSYS1
```
For the VLDB:
```
   # rm vldb.DB0
   # rm vldb.DBSYS1
```

Using the local disk backup utility that you used to back up the database, copy the most recently backed-up version of it to the appropriate file on the database server machine with the lowest IP address. The following is an appropriate tar command if the synchronization site has a tape device attached:
```
   # cd /usr/afs/db
   # tar xvf tape_device database_file
```
where database_file is one of the following:
- bdb.DB0 for the Backup Database
- kaserver.DB0 for the Authentication Database
- prdb.DB0 for the Protection Database
- vldb.DB0 for the VLDB
Working on one of the machines, issue the bos start command to restart the server process on each of the database server machines in turn. Start with the machine with the lowest IP address, which becomes the synchronization site for the database being restored. Wait for it to establish itself as the synchronization site before repeating the command to restart the process on the other database server machines. For a complete description of the command, see To start processes by changing their status flags to Run. Provide the same values for the -instance argument as in Step 2, and the -localauth flag for the same reason.
```
   # bos start <machine name> -instance <server process name>+ -localauth
```
If the database has changed since you last backed it up, issue the appropriate commands from the instructions in the indicated sections to recreate the information in the restored database. If issuing pts commands, you must first obtain administrative tokens. The backup and vos commands accept the -localauth flag if you are logged in as the local superuser root, so you do not need administrative tokens. The Authentication Server always performs a separate authentication anyway, so you only need to include the -admin argument if issuing kas commands.
- To define or remove volume sets and volume entries in the Backup Database, see Defining and Displaying Volume Sets and Volume Entries.
- To edit the dump hierarchy in the Backup Database, see Defining and Displaying the Dump Hierarchy.
- To define or remove Tape Coordinator port offset entries in the Backup Database, see Configuring Tape Coordinator Machines and Tape Devices.
- To restore dump records in the Backup Database, see To scan the contents of a tape.
- To recreate Authentication Database entries or password changes for users, see the appropriate section of Administering User Accounts.
- To recreate Protection Database entries or group membership information, see the appropriate section of Administering the Protection Database.
- To synchronize the VLDB with volume headers, see Synchronizing the VLDB and Volume Headers.
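The restart order required by the restore procedure above (lowest IP address first) is easy to get wrong when addresses are compared as text, because "192.0.2.10" sorts before "192.0.2.3" as a string. The following sketch orders machines numerically and prints the corresponding bos start commands in the form given in the procedure. The function name and the host names and addresses in the example are hypothetical, and the script only builds the command strings; it does not run them or wait for the synchronization site to establish itself.

```python
import ipaddress


def restart_commands(machines, instance):
    """Return bos start command lines ordered by ascending IP address.

    `machines` maps each database server machine name to its IP address
    string; `instance` is a process name such as "ptserver".
    """
    ordered = sorted(machines, key=lambda name: ipaddress.ip_address(machines[name]))
    return [f"bos start {name} -instance {instance} -localauth" for name in ordered]


# Hypothetical cell with three database server machines.
cell = {
    "db1.example.com": "192.0.2.20",
    "db2.example.com": "192.0.2.3",
    "db3.example.com": "192.0.2.10",
}
for command in restart_commands(cell, "ptserver"):
    print(command)
```

Issue the first command, wait for that machine to become the synchronization site, and only then issue the remaining commands.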

