OpenAFS
OpenAFS distributed network file system
General Ubik Goal: The goal is to provide reliable operation among N servers, such that any server can crash with the remaining servers continuing operation within a short period of time.
#include <afsconfig.h>
#include <afs/param.h>
#include <roken.h>
#include <afs/opr.h>
#include <lock.h>
#include <rx/rx.h>
#include <afs/afsutil.h>
#include "ubik.h"
#include "ubik_int.h"
Functions | |
int | uvote_ShouldIRun (void) |
Decide if we should try to become sync site. | |
afs_int32 | uvote_GetSyncSite (void) |
Return the current synchronization site, if any. | |
afs_int32 | SVOTE_Beacon (struct rx_call *rxcall, afs_int32 astate, afs_int32 astart, struct ubik_version *avers, struct ubik_tid *atid) |
Called by the sync site to handle vote beacons; if rxcall is null, this is a local call. | |
afs_int32 | SVOTE_SDebug (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug *aparm) |
Handle per-server debug command, where 0 is the first server. | |
afs_int32 | SVOTE_XSDebug (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug *aparm, afs_int32 *isclone) |
afs_int32 | SVOTE_XDebug (struct rx_call *rxcall, struct ubik_debug *aparm, afs_int32 *isclone) |
afs_int32 | SVOTE_Debug (struct rx_call *rxcall, struct ubik_debug *aparm) |
Handle basic network debug command. | |
afs_int32 | SVOTE_SDebugOld (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug_old *aparm) |
afs_int32 | SVOTE_DebugOld (struct rx_call *rxcall, struct ubik_debug_old *aparm) |
Handle basic network debug command. | |
afs_int32 | SVOTE_GetSyncSite (struct rx_call *rxcall, afs_int32 *ahost) |
Get the sync site; called by remote servers to find where they should go. | |
void | ubik_dprint_25 (const char *format,...) |
void | ubik_dprint (const char *format,...) |
void | ubik_vprint (const char *format, va_list ap) |
void | ubik_print (const char *format,...) |
int | uvote_Init (void) |
Called once/run to init the vote module. | |
void | uvote_set_dbVersion (struct ubik_version version) |
int | uvote_eq_dbVersion (struct ubik_version version) |
int | uvote_HaveSyncAndVersion (struct ubik_version version) |
Check if there is a sync site and whether we have a given db version. | |
Variables | |
afs_int32 | ubik_debugFlag = 0 |
print out debugging messages? | |
struct vote_data | vote_globals |
General Ubik Goal: The goal is to provide reliable operation among N servers, such that any server can crash with the remaining servers continuing operation within a short period of time.
While a short outage is acceptable, this time should be on the order of 3 minutes or less.
Theory of operation:
Note: #SMALLTIME and #BIGTIME are essentially the same time value, separated only by the clock skew, #MAXSKEW. In general, if you are making guarantees for someone else, promise them no more than #SMALLTIME seconds of whatever invariant you provide. If you are waiting to be sure some invariant is now false, wait at least #BIGTIME seconds to be sure that #SMALLTIME seconds have passed at the other site.
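The rule in this note can be written as two small helpers. This is a minimal sketch: the constant values and the names EX_SMALLTIME, EX_MAXSKEW, EX_BIGTIME, ex_promise_deadline, and ex_invariant_surely_expired are illustrative assumptions, not the definitions found in the Ubik headers.

```c
#include <time.h>

/* Illustrative values only (assumptions); the real #SMALLTIME, #BIGTIME and
 * #MAXSKEW are defined in the Ubik headers and may differ. */
enum {
    EX_SMALLTIME = 60,                       /* longest promise made to others */
    EX_MAXSKEW   = 10,                       /* assumed worst-case clock skew */
    EX_BIGTIME   = EX_SMALLTIME + EX_MAXSKEW /* how long to wait to be sure */
};

/* Making a guarantee for someone else: it holds strictly before this time. */
static time_t
ex_promise_deadline(time_t granted_at)
{
    return granted_at + EX_SMALLTIME;
}

/* Waiting for a remote guarantee to lapse: true only once the longer
 * interval has passed on the local clock. */
static int
ex_invariant_surely_expired(time_t granted_at, time_t now)
{
    return now >= granted_at + EX_BIGTIME;
}
```

The asymmetry is the point of the sketch: the promiser uses the shorter interval and the waiter uses the longer one, so a skewed clock on either side cannot make the two intervals overlap.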
Now, back to the design: One site in the collection is a special site, designated the sync site. The sync site sends periodic messages, which can be thought of as keep-alive messages. When a non-sync site hears from the sync site, it knows that it is getting updates for the next #SMALLTIME seconds from that sync site.
If a server does not hear from the sync site in #SMALLTIME seconds, it determines that it is no longer getting updates, and thus refuses to give out potentially out-of-date data. If a sync site cannot muster a majority of servers to agree that it is the sync site, then there is a possibility that a network partition has occurred, allowing another server to claim to be the sync site. Thus, any time that the sync site has not heard from a majority of the servers in the last #SMALLTIME seconds, it voluntarily relinquishes its role as sync site.
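Both checks described here reduce to comparing timestamps against #SMALLTIME. The following is a rough sketch, not the code in this file: EX_SMALLTIME, ex_may_serve_data, and ex_may_remain_sync_site are hypothetical names, the value 60 is illustrative, and counting the local server's own vote toward the majority is an assumption.

```c
#include <time.h>

#define EX_SMALLTIME 60 /* illustrative value (assumption) */

/* Non-sync site: data may be served only while the last keep-alive from
 * the sync site is younger than SMALLTIME. */
static int
ex_may_serve_data(time_t last_beacon_arrival, time_t now)
{
    return now < last_beacon_arrival + EX_SMALLTIME;
}

/* Sync site: keep the role only while a strict majority of all servers
 * has been heard from within SMALLTIME. last_heard[] holds one timestamp
 * per *other* server; the local server is assumed to count for itself. */
static int
ex_may_remain_sync_site(const time_t *last_heard, int n_others, time_t now)
{
    int i;
    int recent = 1; /* our own vote (assumption) */
    int total = n_others + 1;

    for (i = 0; i < n_others; i++) {
        if (now < last_heard[i] + EX_SMALLTIME)
            recent++;
    }
    return recent * 2 > total;
}
```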
While attempting to nominate a new sync site, certain rules apply. First, a server cannot reply "ok" (return 1 from ServBeacon) to two different hosts in less than #BIGTIME seconds; this allows a server that has heard affirmative replies from a majority of the servers to know that no other server in the network has heard enough affirmative replies in the last #BIGTIME seconds to become sync site, too. The variables #ubik_lastYesTime and #lastYesHost are used by all servers to keep track of which host they have last replied affirmatively to, when queried by a potential new sync site.
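The voting rule can be sketched as a single function over two pieces of state. This is an illustration under stated assumptions: struct ex_vote_state and ex_try_vote_yes are hypothetical names, the fields only mirror the role of #ubik_lastYesTime and #lastYesHost, and the EX_BIGTIME value is made up.

```c
#include <time.h>

#define EX_BIGTIME 70 /* illustrative value (assumption) */

struct ex_vote_state {
    time_t last_yes_time;       /* plays the role of #ubik_lastYesTime */
    unsigned int last_yes_host; /* plays the role of #lastYesHost */
};

/* Returns 1 and records the vote if `host` may be answered "ok" at `now`;
 * returns 0 if a different host was promised less than BIGTIME ago. */
static int
ex_try_vote_yes(struct ex_vote_state *st, unsigned int host, time_t now)
{
    int promised_elsewhere =
        st->last_yes_host != host && now < st->last_yes_time + EX_BIGTIME;

    if (promised_elsewhere)
        return 0;
    st->last_yes_time = now;
    st->last_yes_host = host;
    return 1;
}
```

Note that a repeated "ok" to the same host renews the promise, which is what lets an established sync site keep its votes.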
Once a sync site has become a sync site, it periodically sends beacon messages with a parameter of 1, indicating that it already has determined it is supposed to be the sync site. The servers treat such a message as a guarantee that no other site will become sync site for the next #SMALLTIME seconds. In the interim, these servers can answer a query concerning which site is the sync site without any communication with any server. The variables #lastBeaconArrival and #lastBeaconHost are used by all servers to keep track of which sync site has last contacted them.
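Answering a "who is the sync site?" query from local state alone might look like the sketch below. The struct, its host_claimed_sync field, and ex_current_sync_site are hypothetical; only the roles of #lastBeaconArrival and #lastBeaconHost come from the text, and 0 is assumed to mean "no sync site known".

```c
#include <time.h>

#define EX_SMALLTIME 60 /* illustrative value (assumption) */

struct ex_beacon_state {
    time_t last_beacon_arrival;    /* plays the role of #lastBeaconArrival */
    unsigned int last_beacon_host; /* plays the role of #lastBeaconHost */
    int host_claimed_sync;         /* nonzero if the beacon's parameter was 1 */
};

/* Returns the sync site's address, or 0 if none is currently known.
 * No network traffic is needed: the last beacon is a guarantee that
 * stays valid for SMALLTIME seconds after its arrival. */
static unsigned int
ex_current_sync_site(const struct ex_beacon_state *st, time_t now)
{
    if (!st->host_claimed_sync)
        return 0;
    if (now >= st->last_beacon_arrival + EX_SMALLTIME)
        return 0;
    return st->last_beacon_host;
}
```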
One complication occurs while nominating a new sync site: each site may be trying to nominate a different site (based on the value of #lastYesHost), yet we must nominate the smallest host (under some order), to prevent this process from looping. The process could loop by having each server give one vote to another server, but with no server getting a majority of the votes. To avoid this, we try to withhold our votes for the server with the lowest internet address (an easy-to-generate order). To this effect, we keep track (in #lowestTime and #lowestHost) of the lowest server trying to become a sync site. We wait for this server unless there is already a sync site (indicated by ServBeacon's parameter being 1).
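The lowest-host rule can be illustrated as follows. This is a simplified sketch, not the logic in this file: struct ex_lowest and both functions are hypothetical, addresses are compared as plain unsigned integers (an assumed ordering), and forgetting the recorded host after EX_BIGTIME seconds of silence is an assumption about how staleness is handled.

```c
#include <time.h>

#define EX_BIGTIME 70 /* illustrative value (assumption) */

struct ex_lowest {
    unsigned int lowest_host; /* plays the role of #lowestHost */
    time_t lowest_time;       /* plays the role of #lowestTime */
};

/* Note a beacon from a would-be sync site. The host is remembered if it
 * is the lowest seen so far, or if the old record has gone stale. */
static void
ex_note_candidate(struct ex_lowest *lo, unsigned int host, time_t now)
{
    int stale = now >= lo->lowest_time + EX_BIGTIME;

    if (stale || host <= lo->lowest_host) {
        lo->lowest_host = host;
        lo->lowest_time = now;
    }
}

/* Vote for a candidate only if it already is the sync site (beacon
 * parameter 1) or it is the lowest host currently known. */
static int
ex_should_vote_for(const struct ex_lowest *lo, unsigned int host,
                   int is_sync_site)
{
    return is_sync_site || host == lo->lowest_host;
}
```

Because every server applies the same ordering, votes converge on one candidate instead of being split one apiece across several.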
afs_int32 SVOTE_Beacon (struct rx_call *rxcall, afs_int32 astate, afs_int32 astart, struct ubik_version *avers, struct ubik_tid *atid)
Called by the sync site to handle vote beacons; if rxcall is null, this is a local call.
afs_int32 SVOTE_Debug (struct rx_call *rxcall, struct ubik_debug *aparm)
Handle basic network debug command.
This is the global state dumper.
afs_int32 SVOTE_DebugOld (struct rx_call *rxcall, struct ubik_debug_old *aparm)
Handle basic network debug command.
This is the global state dumper.
afs_int32 SVOTE_SDebug (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug *aparm)
Handle per-server debug command, where 0 is the first server.
Basic network debugging hooks.
afs_int32 uvote_GetSyncSite (void)
Return the current synchronization site, if any.
Simple approach: if the last guy we voted yes for claims to be the sync site, then we're happy to use that guy for a sync site until the time his mandate expires. If the guy does not claim to be sync site, then, of course, there's none.
In addition, if we lost the sync, we set #urecovery_syncSite to an invalid value, indicating that we no longer know which version of the dbase is the one we should have. We'll get a new one when we next hear from the sync site.
int uvote_HaveSyncAndVersion (struct ubik_version version)
Check if there is a sync site and whether we have a given db version.
int uvote_ShouldIRun (void)
Decide if we should try to become sync site.
The basic rule is that we don't run if there is a valid sync site and it ain't us (we have to run if it is us, in order to keep our votes). If there is no sync site, then we want to run if we're the lowest numbered host running, otherwise we defer to the lowest host. However, if the lowest host hasn't been heard from for a while, then we start running again, in case he crashed.
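The decision described for uvote_ShouldIRun can be sketched as a pure function. This is not the function's actual body: struct ex_run_inputs and ex_should_i_run are hypothetical, 0 is assumed to mean "no valid sync site", and "hasn't been heard from for a while" is assumed to mean at least EX_BIGTIME seconds.

```c
#include <time.h>

#define EX_BIGTIME 70 /* illustrative value (assumption) */

struct ex_run_inputs {
    unsigned int my_host;     /* our own address */
    unsigned int sync_host;   /* current valid sync site, 0 if none */
    unsigned int lowest_host; /* lowest host seen trying to run */
    time_t lowest_time;       /* when that host was last heard from */
};

/* Returns 1 if this server should try to become (or remain) sync site. */
static int
ex_should_i_run(const struct ex_run_inputs *in, time_t now)
{
    if (in->sync_host != 0)
        return in->sync_host == in->my_host; /* run only to keep our votes */
    if (now >= in->lowest_time + EX_BIGTIME)
        return 1; /* the lowest host has gone quiet; it may have crashed */
    return in->my_host <= in->lowest_host; /* otherwise defer to lowest */
}
```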