The AFS Backup System is highly flexible, enabling you to control most aspects of the backup process, including how often backups are performed, which volumes are backed up, and whether to dump all of the data in a volume or just the data that has changed since the last dump operation. You can also take advantage of several features that automate much of the backup process.
To administer and use the Backup System most efficiently, it helps to be familiar with its basic features, which are described in the following sections. For pointers to instructions for implementing the features as you configure the Backup System in your cell, see Overview of Backup System Configuration.
When you back up AFS data, you specify which data to include in terms of complete volumes rather than individual files. More precisely, you define groups of volumes called volume sets, each of which includes one or more volumes that you want to back up in a single operation. You must include a volume in a volume set to back it up, because the command that backs up data (the backup dump command) does not accept individual volume names.
A volume set consists of one or more volume entries, each of which specifies which volumes to back up based on their location (file server machine and partition) and volume name. You can use a wildcard notation to include all volumes that share a location, a common character string in their names, or both.
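The way volume entries select volumes can be sketched in a few lines of Python. This is an illustrative model only: it assumes the wildcard notation behaves like a regular expression applied to each of the three fields, and the class and function names (VolumeEntry, volumes_in_set) are hypothetical, not part of AFS.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeEntry:
    """One volume entry: three patterns, modeled here as regular expressions (an assumption)."""
    server: str     # pattern for the file server machine name
    partition: str  # pattern for the partition name
    volume: str     # pattern for the volume name

    def matches(self, server: str, partition: str, volume: str) -> bool:
        # A volume is selected only if all three fields match their patterns in full.
        pairs = ((self.server, server), (self.partition, partition), (self.volume, volume))
        return all(re.fullmatch(pattern, value) is not None for pattern, value in pairs)


def volumes_in_set(entries, known_volumes):
    """Return the (server, partition, volume) tuples selected by at least one entry."""
    return [v for v in known_volumes if any(e.matches(*v) for e in entries)]
```

In this model, an entry such as VolumeEntry(".*", ".*", r"user\..*") selects every volume whose name begins with user. regardless of location.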
For instructions on creating and removing volume sets and volume entries, see Defining and Displaying Volume Sets and Volume Entries.
A dump is the collection of data that results from backing up a volume set. A full dump includes all of the data in every volume in the volume set, as it exists at the time of the dump operation. An incremental dump includes only some of the data from the volumes in the volume set, namely those files and directory structures that have changed since a specified previous dump operation was performed. The previous dump is referred to as the incremental dump's parent dump, and it can be either a full dump or an incremental dump itself.
A dump set is a collection of one or more dumps stored together on one or more tapes. The first dump in the dump set is the initial dump, and any subsequent dump added onto the end of an existing dump set is an appended dump. Appending dumps is always optional, but maximizes use of a tape's capacity. In contrast, creating only initial dumps can result in many partially filled tapes, because an initial dump must always start on a new tape, but does not necessarily extend to the end of the tape. Appended dumps do not have to be related to one another or to the initial dump (they do not have to be dumps of the same or related volume sets), but well-planned appending can reduce the number of times you have to change tapes during a restore operation. For example, it can make sense to append incremental dumps of a volume set together in a single dump set.
All the records for a dump set are indexed together in the Backup Database based on the initial dump (for more on the Backup Database, see The Backup Database and Backup Server Process). To delete the database record of an appended dump, you must delete the initial dump record, and doing so deletes the records for all dumps in the dump set. Similarly, you cannot recycle just one tape in a dump set without deleting the database records of all tapes in the dump set.
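The indexing rule above can be pictured with a small Python model. It is a sketch under stated assumptions, not the Backup Database's actual structure: the database is represented as a dictionary keyed by the initial dump's ID, and the names DumpSet and delete_dump_record are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DumpSet:
    """Records for one dump set, indexed by its initial dump (illustrative model)."""
    initial_dump_id: int
    appended_dump_ids: list = field(default_factory=list)


def delete_dump_record(database: dict, dump_id: int) -> list:
    """Delete a dump set by its initial dump ID and return every dump ID removed.

    Only initial dumps are keys in the model, so asking to delete an appended
    dump on its own is rejected, mirroring the rule described in the text.
    """
    if dump_id not in database:
        raise ValueError(f"{dump_id} is not the initial dump of any dump set")
    dump_set = database.pop(dump_id)
    return [dump_set.initial_dump_id, *dump_set.appended_dump_ids]
```

Deleting the initial dump removes the whole set in one step, which is why appended dumps cannot be removed individually.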
For instructions on creating an initial dump, see Backing Up Data, and to learn how to append dumps, see Appending Dumps to an Existing Dump Set.
A dump hierarchy is a logical structure that defines the relationship between full and incremental dumps; that is, it defines which dump serves as the parent for an incremental dump. Each individual component of a hierarchy is a dump level. When you create a dump by issuing the backup dump command, you specify a volume set name and a dump level name. The Backup System uses the dump level to determine whether the dump is full or incremental, and if incremental, which dump level to use as the parent.
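As a rough illustration of how a dump level can identify its parent, the following Python sketch works on pathname-style level names such as /sunday/friday. It assumes that a one-element pathname denotes a full dump and that the parent of any other level is the pathname with its last element removed; the function names are hypothetical.

```python
def parent_level(level_path: str):
    """Return the parent dump level, or None if the level denotes a full dump."""
    parts = [p for p in level_path.split("/") if p]
    if not level_path.startswith("/") or not parts:
        raise ValueError(f"not a dump level pathname: {level_path!r}")
    if len(parts) == 1:
        return None  # assumed: a single-element pathname is a full dump
    return "/" + "/".join(parts[:-1])


def dump_chain(level_path: str) -> list:
    """List the levels from the full dump down to the requested level."""
    chain = []
    current = level_path
    while current is not None:
        chain.append(current)
        current = parent_level(current)
    return list(reversed(chain))
```

Under these assumptions, dump_chain("/sunday/friday") yields ["/sunday", "/sunday/friday"]: a full dump at /sunday followed by an incremental dump at /sunday/friday.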
You can associate an expiration date with a dump level, to define when a dump created at that level expires. The Backup System refuses to overwrite a tape until all dumps in the dump set to which the tape belongs have expired, so assigning expiration dates automatically determines how you recycle tapes. You can define an expiration date either in absolute terms (for example, 13 January 2000) or relative terms (for example, 30 days from when the dump is created). You can also change the expiration date associated with a dump level (but not with an actual dump that has already been created at that level).
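The two kinds of expiration date, and the rule that a tape is protected until every dump in its dump set has expired, can be sketched as follows. This is a simplified Python model with hypothetical function names; it does not cover dump levels that have no expiration date defined.

```python
from datetime import datetime, timedelta


def expiration_for(created: datetime, absolute: datetime = None,
                   relative: timedelta = None) -> datetime:
    """Compute a dump's expiration from its level's absolute or relative setting."""
    if (absolute is None) == (relative is None):
        raise ValueError("give exactly one of absolute or relative")
    # An absolute date is used as is; a relative one counts from dump creation.
    return absolute if absolute is not None else created + relative


def tape_can_be_overwritten(dump_expirations, now: datetime) -> bool:
    """A tape is recyclable only when every dump in its dump set has expired."""
    return all(expires <= now for expires in dump_expirations)
```

For example, a dump created on 1 January 2000 at a level with a 30-day relative expiration expires on 31 January 2000, and it keeps the whole dump set's tapes from being overwritten until then.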
For instructions on creating dump hierarchies, assigning expiration dates, and establishing a tape recycling schedule, see Defining and Displaying the Dump Hierarchy.
When you create a dump, the Backup System creates a Backup Database record for it, assigning a name comprising the volume set name and the last element in the dump level pathname:
volume_set_name.dump_level_name
For example, a dump of the volume set user at the dump level /sunday/friday is called user.friday. The Backup System also assigns a unique dump ID number to the dump to distinguish it from other dumps with the same name that possibly exist.
The Backup System assigns a similar AFS tape name to each tape that contains a dump set, reflecting the volume set and dump level of the dump set's initial dump, plus a numerical index of the tape's position in the dump set, and a unique dump ID number:
volume_set_name.dump_level_name.tape_index (dump ID)
For example, the second tape in a dump set whose initial dump is of the volume set uservol at the dump level /sunday/friday has an AFS tape name like uservol.friday.2 (914382400).
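The two naming patterns can be reproduced with a short Python sketch. The function names are hypothetical, and the code only formats strings in the way the examples above show; it does not reflect how the Backup System generates dump IDs.

```python
def dump_name(volume_set: str, dump_level: str) -> str:
    """Join the volume set name and the last element of the dump level pathname."""
    last_element = dump_level.rstrip("/").split("/")[-1]
    return f"{volume_set}.{last_element}"


def afs_tape_name(volume_set: str, dump_level: str, tape_index: int, dump_id: int) -> str:
    """Format a tape name from the initial dump's name, the tape index, and the dump ID."""
    return f"{dump_name(volume_set, dump_level)}.{tape_index} ({dump_id})"
```

With the values from the examples, dump_name("user", "/sunday/friday") gives user.friday, and afs_tape_name("uservol", "/sunday/friday", 2, 914382400) gives uservol.friday.2 (914382400).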
In addition to its AFS tape name, a tape can have an optional permanent name that you assign. Unlike the AFS tape name, the permanent name does not have to indicate the volume set and dump level of the initial (or any other) dump, and so does not change depending on the contents of the tape. The Backup System does not require a certain format for permanent names, so you need to make sure that each tape's name is unique. If a tape has a permanent name, the Backup System uses it rather than the AFS tape name when referring to the tape in prompts and the output from most backup commands, but still tracks the AFS tape name internally.
Every tape used in the Backup System has a magnetic label at the beginning that records the tape's name, capacity, and other information. You can write a label with the backup labeltape command; otherwise, the backup dump command creates one automatically when you use an unlabeled tape. The label records the following information:
The tape's permanent name, which you can assign by using the -pname argument to the backup labeltape command. It can be any string of up to 32 characters. If you do not assign a permanent name, the Backup System records the value <NULL> when you use the backup labeltape command to assign an AFS tape name, or when you use the backup dump command to write a dump to the tape.
The tape's AFS tape name, which can be one of three types of values:
A name that reflects the volume set and dump level of the dump set's initial dump and the tape's place in the sequence of tapes for the dump set, as described in Dump Names and Tape Names. If the tape does not have a permanent name, you can assign the AFS tape name by using the -name argument to the backup labeltape command.
The value <NULL>, which results when you assign a permanent name, or provide no value for the backup labeltape command's -name argument.
No AFS tape name at all, indicating that you have never labeled the tape or written a dump to it.
If a tape does not already have an actual AFS tape name when you write a dump to it, the Backup System constructs and records the appropriate AFS tape name. If the tape does have an AFS tape name and you are writing an initial dump, then the name must correctly reflect the dump's volume set and dump level.
The capacity, or size, of the tape, followed by a letter that indicates the unit of measure (k or K for kilobytes, m or M for megabytes, g or G for gigabytes, or t or T for terabytes). The tape's manufacturer determines the tape's capacity. For further discussion of how the Backup System uses the value in the capacity field, see Configuring the tapeconfig File.
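To make the capacity notation concrete, the following Python sketch converts a value such as 2G into kilobytes. It assumes that each unit is 1024 times the previous one and that a unit letter is always present; the function name is hypothetical, and the Backup System's own parsing may differ.

```python
import re

# Multipliers relative to one kilobyte; binary (1024-based) units are an assumption.
_UNITS_IN_KB = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def capacity_in_kb(text: str) -> int:
    """Convert a capacity such as '2G' or '150m' to kilobytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgGtT])\s*", text)
    if match is None:
        raise ValueError(f"unrecognized capacity value: {text!r}")
    number, unit = match.groups()
    # Upper- and lowercase unit letters are treated the same, as in the label format.
    return int(number) * _UNITS_IN_KB[unit.lower()]
```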
For information about labeling tapes, see Writing and Reading Tape Labels.
In addition to the tape label, the Backup System writes a dump label on the tape for every appended dump (the tape label and dump label are the same for the initial dump). A dump label records the following information:
The name of the tape containing the dump
The date and time that the dump operation began
The cell to which the volumes in the dump belong
The dump's size in kilobytes
The dump's dump level
The dump's dump ID
The Backup System writes a filemark (also called an End-of-File or EOF marker) between the data from each volume in a dump. The tape device's manufacturer determines the filemark size, which is typically between 2 KB and 2 MB; in general, the larger the usual capacity of the tapes that the device uses, the larger the filemark size. If a dump contains a small amount of data from each of a large number of volumes, as incremental dumps often do, then the filemark size can significantly affect how much volume data fits on the tape. To enable the Backup System to factor in filemark size as it writes a dump, you can record the filemark size in a configuration file; see Configuring the tapeconfig File.
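The effect of filemark size can be estimated with simple arithmetic. The Python sketch below assumes one filemark per volume in the dump, which slightly overestimates the overhead, and ignores labels and other per-tape data; the function name is hypothetical.

```python
def data_capacity_kb(tape_capacity_kb: int, filemark_kb: int, volume_count: int) -> int:
    """Estimate the space left for volume data after filemark overhead."""
    # One filemark per volume is assumed as a simple upper bound on the overhead.
    overhead_kb = filemark_kb * volume_count
    return max(tape_capacity_kb - overhead_kb, 0)
```

For example, on a 2 GB tape (2097152 KB) with a 2 MB filemark (2048 KB), a dump of 500 volumes spends 1024000 KB on filemarks, leaving 1073152 KB for volume data, roughly half the tape.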
A Tape Coordinator machine is a machine that drives one or more attached tape devices used for backup operations. It must run the AFS client software (the Cache Manager) and must reside in a physically secure location to prevent unauthorized access to its console. Before backup operations can run on a Tape Coordinator machine, each tape device on the machine must be registered in the Backup Database, and certain files and directories must exist on the machine's local disk; for instructions, see To configure a Tape Coordinator machine.
Each tape device on a Tape Coordinator machine listens for backup requests on a different UNIX port. You pick the port indirectly by assigning a port offset number to the tape device. The Backup System sets the device's actual port by adding the port offset to a base port number that it determines internally. For instructions on assigning port offset numbers, see Configuring the tapeconfig File.
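The relationship between port offsets and ports can be sketched in Python as follows. The base port is passed in as a parameter because the Backup System determines it internally; the function name and the device names in the usage note are hypothetical.

```python
def assign_ports(device_offsets: dict, base_port: int) -> dict:
    """Map each tape device to the port obtained by adding its offset to the base port."""
    offsets = list(device_offsets.values())
    if any(offset < 0 for offset in offsets):
        raise ValueError("port offsets must not be negative")
    if len(set(offsets)) != len(offsets):
        # Each device listens on a different port, so offsets must not repeat.
        raise ValueError("each tape device needs its own port offset")
    return {device: base_port + offset for device, offset in device_offsets.items()}
```

With an illustrative base port of 7000, assign_ports({"/dev/rmt0": 0, "/dev/rmt1": 1}, 7000) places the two devices on ports 7000 and 7001.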
For a tape device to perform backup operations, a Backup Tape Coordinator (butc) process dedicated to the device must be running actively on the Tape Coordinator machine. You then direct backup requests to the device's Tape Coordinator by specifying its port offset number with the -portoffset argument to the backup command.
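When backup requests are issued from a script, the command line can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything. The -portoffset argument is named in the text above; the -volumeset and -dump argument names are not, so treat them as assumptions and check them against the backup dump reference page before use. The function name is hypothetical.

```python
import shlex


def backup_dump_argv(volume_set: str, dump_level: str, port_offset: int) -> list:
    """Build an argument list for a backup dump request to one Tape Coordinator."""
    if port_offset < 0:
        raise ValueError("port offset must not be negative")
    return [
        "backup", "dump",
        "-volumeset", volume_set,          # assumed argument name
        "-dump", dump_level,               # assumed argument name
        "-portoffset", str(port_offset),   # selects the device's Tape Coordinator
    ]


def as_shell_command(argv: list) -> str:
    """Render the argument list as a single shell-quoted string for logging."""
    return shlex.join(argv)
```

Keeping the arguments as a list, rather than a single string, avoids quoting problems if the list is later passed to a process-launching function.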
In addition to writing backup data to tape, you can direct it to a backup data file on the local disk of a Tape Coordinator machine. You can then transfer the data to a data-archiving system, such as a hierarchical storage management (HSM) system, that you use in conjunction with AFS and the Backup System. A backup data file has a port offset like a tape device. For instructions on configuring backup data files, see Dumping Data to a Backup Data File.
The Backup Database is a replicated administrative database maintained by the Backup Server process on the cell's database server machines. Like the other AFS database server processes, the Backup Server uses the Ubik utility to keep the various copies of the database synchronized (for a discussion of Ubik, see Replicating the OpenAFS Administrative Databases).
The Backup Database records the following information:
The Tape Coordinator machine's hostname and the port offset number for each tape device used for backup operations
The dump hierarchy, which consists of its component dump levels and their associated expiration dates
The volume sets and their component volume entries
A record for each dump, which includes the name of each tape it appears on, a list of the volumes from which data is included, the dump level, the expiration date, and the dump ID of the initial dump with which the dump is associated
A record for each tape that houses dumped data
The backup suite of commands is the administrative interface to the Backup System. You can issue the commands in a command shell (or invoke them in a shell script) on any AFS client or server machine from which you can access the backup binary. In the conventional configuration, the binary resides on the local disk.
The backup command suite provides an interactive mode, in which you can issue multiple commands over a persistent connection to the Backup Server and the Volume Location (VL) Server. Interactive mode has several convenient features, including the following:
You need to type only the operation code, omitting the initial backup string.
If you assume another AFS identity or specify a foreign cell as you enter interactive mode, it applies to all subsequent commands.
You do not need to enclose shell metacharacters in double quotes.
You can track current and pending operations with the (backup) jobs command, which is available only in this mode.
You can cancel current and pending operations with the (backup) kill command, which is available only in this mode.
Before issuing a command that requires reading or writing a tape (or backup data file), you must also open a connection to the Tape Coordinator machine that is attached to the relevant tape device (or that has the backup data file on its local disk), and issue the butc command to initialize the Tape Coordinator process. The process must continue to run and the connection remain open as long as you need to use the tape device or file for backup operations.
For further discussion and instructions, see Using the Backup System's Interfaces.