Handling Server Encryption Key Emergencies

In rare circumstances, the AFS server processes can become unable to decrypt the server tickets that clients or peer server processes are presenting. Activity in your cell can come to a halt, because the server processes believe that the tickets are forged or expired, and refuse to execute any actions. This can happen on one machine or several; the effect is more serious when more machines are involved.

One common cause of server encryption key problems is that the client's ticket is encrypted with a key that the server process does not know. Usually this means that the /usr/afs/etc/KeyFile on the server machine does not include the key in the afs Authentication Database entry, which the Authentication Server's Ticket Granting Service (TGS) module is using to encrypt server tickets.

Another possibility is that the KeyFile files on different machines do not contain the same keys. In this case, communications among server processes themselves become impossible. For instance, AFS's replicated database mechanism (Ubik) breaks down if the instances of a database server process on the different database server machines are not using the same key.

The appearance of the following error message when you direct a bos command to a file server machine in the local cell is one possible symptom of server encryption key mismatch. (Note, however, that you can also get this message if you forget to include the -cell argument when directing the bos command to a file server machine in a foreign cell.)

   bos: failed to contact host's bosserver (security object was passed a bad ticket).
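
A script that watches for this symptom can match the message text directly. The following is a minimal sketch, not part of the AFS distribution: the function name has_bad_ticket_error is hypothetical, and the host name in the commented usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: exits with status 0 when the text on standard input
# contains the bad-ticket message quoted above, and nonzero otherwise.
has_bad_ticket_error() {
    grep -q "security object was passed a bad ticket"
}

# Example use against a real cell (commented out; fs1.example.com is a
# placeholder machine name):
#   bos status fs1.example.com 2>&1 | has_bad_ticket_error \
#       && echo "possible server encryption key mismatch"
```

Because the same message can also appear when the -cell argument is omitted for a foreign cell, treat a match as a reason to investigate rather than as proof of a key mismatch.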

The solution to server encryption key emergencies is to put a new AFS server encryption key in both the Authentication Database and the KeyFile file on every server machine, so that the TGS and all server processes again share the same key.

Handling key emergencies requires some unusual actions. The reasons for these actions are explained in the following sections; the actual procedures appear in the subsequent instructions.

Prevent Mutual Authentication

While you deal with a key emergency, you must prevent the server processes from trying to mutually authenticate with you, because they might be unable to decrypt your token. When you do not mutually authenticate, the server processes assign you the identity anonymous. To prevent mutual authentication, use the unlog command to discard your tokens and include the -noauth flag on every command where it is available.

Disable Authorization Checking by Hand

Because the server processes recognize you as the user anonymous when you do not mutually authenticate, you must turn off authorization checking. Only with authorization checking disabled do the server processes allow the anonymous user to perform privileged actions such as key creation.

In an emergency, disable authorization checking by creating the file /usr/afs/local/NoAuth by hand. In normal circumstances, use the bos setauth command instead.

Work Quickly on Each Machine

Disabling authorization checking is a serious security exposure, because server processes on the affected machine perform any action for anyone. Disable authorization checking only for as long as necessary, completing all steps in an uninterrupted session and as quickly as possible.

Work at the Console

Working at the console of each server machine on which you disable authorization checking ensures that no one else logs onto the console while you are working there. It does not prevent others from connecting to the machine remotely (using the telnet program, for example), which is why it is important to work quickly. The only way to ensure complete security is to disable network traffic, which is not a viable option in many environments. You can improve security in general by limiting the number of people who can connect remotely to your server machines at any time, as recommended in Improving Security in Your Cell.

Change Individual KeyFile Files

If you use the Update Server to distribute the contents of the /usr/afs/etc directory, an emergency is the only time when it is appropriate to change the KeyFile file on individual machines instead. Updating each machine's file is necessary because mismatched keys can prevent the system control machine's upserver process from mutually authenticating with upclientetc processes on other server machines, in which case the upserver process refuses to distribute its KeyFile file to them.

Even if it appears that the Update Server is working correctly, the only way to verify that is to change the key on the system control machine and wait the standard delay period to see if the upclientetc processes retrieve the key. During an emergency, it does not usually make sense to wait that long; it is more efficient to update the file on each server machine separately. Also, even if the Update Server can distribute the file correctly, other processes can have trouble because of mismatched keys. The following instructions add the new key to the KeyFile file on the system control machine first. If the Update Server is working, it then distributes the same change that you are making on each server machine individually.

If your cell does not use the Update Server, or you always change keys on server machines individually, the following instructions are also appropriate for you.

Two Component Procedures

There are two subprocedures used frequently in the following instructions: disabling authorization checking and reenabling it. For the sake of clarity, the procedures are detailed here; the instructions refer to them as necessary.

Disabling Authorization Checking in an Emergency

  1. Become the local superuser root on the machine, if you are not already, by issuing the su command.

       % su root
       Password: <root_password>
    
  2. Create the file /usr/afs/local/NoAuth to disable authorization checking.

       # touch /usr/afs/local/NoAuth
    
  3. Discard your tokens, in case they were sealed with an incompatible key, which can prevent some commands from executing.

       # unlog
    

Reenabling Authorization Checking in an Emergency

  1. Become the local superuser root on the machine, if you are not already, by issuing the su command.

       % su root
       Password: <root_password>
    
  2. Remove the /usr/afs/local/NoAuth file.

       # rm /usr/afs/local/NoAuth
    
  3. Authenticate as an administrative identity that belongs to the system:administrators group and is listed in the /usr/afs/etc/UserList file.

       # klog <admin_user>
       Password: <admin_password>
    
  4. If appropriate, log out from the console (or close the remote connection you are using), after issuing the unlog command to destroy your tokens.
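
The two subprocedures above can be collected into a pair of shell functions. This is a sketch under stated assumptions, not an official tool: the function names and the NOAUTH_FILE variable are hypothetical (the variable exists only so that the sketch can be tried against a scratch path), and the functions must run as the local superuser root to change the real file. Authenticating with klog is left as a manual step because it prompts for a password.

```shell
#!/bin/sh
# NOAUTH_FILE is a hypothetical override for experimenting in a scratch
# directory; the default is the path named in the procedures above.
NOAUTH_FILE="${NOAUTH_FILE:-/usr/afs/local/NoAuth}"

# Disable authorization checking by hand, then discard any tokens.
disable_authz_emergency() {
    touch "$NOAUTH_FILE" || return 1
    # Run unlog only where the AFS client commands are installed.
    if command -v unlog >/dev/null 2>&1; then
        unlog || echo "warning: unlog failed" >&2
    fi
    return 0
}

# Reenable authorization checking by removing the NoAuth file.
reenable_authz_emergency() {
    rm -f "$NOAUTH_FILE" || return 1
    echo "NoAuth removed; authenticate with klog as an administrator next"
}
```

Keeping the two functions side by side makes it harder to forget the second one; the file should exist only for as long as the emergency work on that machine takes.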

To create a new server encryption key in emergencies

  1. On the system control machine, disable authorization checking as instructed in Disabling Authorization Checking in an Emergency.

  2. Issue the bos listkeys command to display the key version numbers already in use in the KeyFile file, as a first step in choosing the new key's key version number.

       # bos listkeys <machine name> -noauth
    

    where

    listk

    Is the shortest acceptable abbreviation of listkeys.

    machine name

    Specifies a file server machine.

    -noauth

    Bypasses mutual authentication with the BOS Server. Include it in case the key emergency is preventing successful mutual authentication.

  3. Choose a key version number for the new key, based on what you learned in Step 2 plus the following requirements:

    • It is best to keep your key version numbers in sequence by choosing a key version number one greater than the largest existing one.

    • Key version numbers must be integers between 0 and 255 to comply with Kerberos standards.

    • Do not reuse a key version number currently listed in the KeyFile file.

  4. On the system control machine, issue the bos addkey command to create a new AFS server encryption key in the KeyFile file.

       # bos addkey <machine name> -kvno <key version number> -noauth
       input key: <afs_password>
       Retype input key: <afs_password>
    

    where

    addk

    Is the shortest acceptable abbreviation of addkey.

    machine name

    Names the file server machine on which to define the new key in the KeyFile file.

    -kvno

    Specifies the key version number you chose in Step 3, an integer in the range 0 (zero) through 255. You must specify the same number in Steps 7, 8, and 13.

    -noauth

    Bypasses mutual authentication with the BOS Server. Include it in case the key emergency is preventing successful mutual authentication.

    afs_password

    Is a character string similar to a user password, of any length from one to about 1,000 characters. To improve security, make the string as long as is practical, and include nonalphabetic characters.

    Do not type an octal string directly. The BOS Server scrambles the character string into an octal string appropriate for use as an encryption key before recording it in the KeyFile file.

    Remember the string. You need to use it again in Steps 7, 8, and 13.

  5. On every database server machine in your cell (other than the system control machine), disable authorization checking as instructed in Disabling Authorization Checking in an Emergency. Do not repeat the procedure on the system control machine, if it is a database server machine, because you already disabled authorization checking in Step 1. (If you need to learn which machines are database server machines, use the bos listhosts command as described in To locate database server machines.)

  6. Wait at least 90 seconds after finishing Step 5, to allow each of the database server processes (the Authentication, Backup, Protection and Volume Location Servers) to finish electing a new sync site. Then issue the udebug command to verify that the election worked properly. Issue the following commands, substituting each database server machine's name for server machine in turn. Include the system control machine if it is a database server machine.

       # udebug <server machine> buserver
       # udebug <server machine> kaserver
       # udebug <server machine> ptserver
       # udebug <server machine> vlserver
    

    For each process, the output from all of the database server machines must agree on which one is the sync site for the process. It is not, however, necessary that the same machine serve as the sync site for each of the four processes. For each process, the output from only one machine must include the following string:

       I am sync site ...
    

    The output on the other machines instead includes the following line

       I am not sync site
    

    and a subsequent line that begins with the string Sync host and specifies the IP address of the machine claiming to be the sync site.

    If the output does not meet these requirements or seems abnormal in another way, contact AFS Product Support for assistance.

  7. On every database server machine in your cell (other than the system control machine), issue the bos addkey command described in Step 4. Be sure to use the same values for afs_password and kvno as you used in that step.

  8. Issue the kas setpassword command to define the new key in the Authentication Database's afs entry. It must match the key you created in Step 4 and Step 7.

       # kas setpassword -name afs -kvno <key version number> -noauth
       new_password: <afs_password>
       Verifying, please re-enter new_password: <afs_password>
    

    where

    sp

    Is an acceptable alias for setpassword (setp is the shortest acceptable abbreviation).

    -kvno

    Is the same key version number you specified in Step 4.

    afs_password

    Is the same character string you specified as afs_password in Step 4. It does not echo visibly.

  9. On every database server machine in your cell (including the system control machine if it is a database server machine), reenable authorization checking as instructed in Reenabling Authorization Checking in an Emergency. If the system control machine is not a database server machine, do not perform this procedure until Step 11.

  10. Repeat Step 6 to verify that each database server process still agrees on a single sync site now that you have reenabled authorization checking in Step 9.

  11. On the system control machine (if it is not a database server machine), reenable authorization checking as instructed in Reenabling Authorization Checking in an Emergency. If it is a database server machine, you already performed the procedure in Step 9.

  12. On all remaining (simple) file server machines, disable authorization checking as instructed in Disabling Authorization Checking in an Emergency.

  13. On all remaining (simple) file server machines, issue the bos addkey command described in Step 4. Be sure to use the same values for afs_password and kvno as you used in that step.

  14. On all remaining (simple) file server machines, reenable authorization checking as instructed in Reenabling Authorization Checking in an Emergency.
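
Two checks in the procedure lend themselves to scripting: choosing the key version number in Step 3 and verifying the sync site in Step 6. The sketch below is illustrative only. The function names next_kvno and check_sync_site are hypothetical. The parsing assumes that bos listkeys prints one line per key that begins with the word key followed by the key version number, and that the third field of a udebug Sync host line is the IP address; confirm both against the output in your cell before relying on it.

```shell
#!/bin/sh
# Read bos listkeys output on standard input and print a key version number
# one greater than the largest one listed (0 if no keys are listed).
# Fails with status 1 if that number would exceed 255.
next_kvno() {
    awk '
        $1 == "key" && $2 ~ /^[0-9]+$/ {
            seen = 1
            if ($2 + 0 > max) max = $2 + 0
        }
        END {
            candidate = seen ? max + 1 : 0
            if (candidate > 255) exit 1
            print candidate
        }
    '
}

# Read the concatenated udebug output for ONE database server process from
# every database server machine. Succeeds when exactly one machine reports
# "I am sync site" and all "Sync host" lines name the same address.
check_sync_site() {
    awk '
        /^[ \t]*I am sync site/ { sites++ }
        $1 == "Sync" && $2 == "host" {
            if (!($3 in hosts)) { hosts[$3] = 1; nhosts++ }
        }
        END { exit !(sites == 1 && nhosts <= 1) }
    '
}

# Example use (commented out; db1, db2, and db3 are placeholder names):
#   bos listkeys db1 -noauth | next_kvno
#   for h in db1 db2 db3; do udebug "$h" vlserver; done | check_sync_site
```

The check does not confirm that the address on the Sync host lines belongs to the machine claiming to be the sync site, so it supplements rather than replaces reading the udebug output yourself.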