OpenAFS
OpenAFS distributed network file system
/cygdrive/c/src/openafs/openafs.git/repo/src/ubik/vote.c File Reference

General Ubik Goal: The goal is to provide reliable operation among N servers, such that any server can crash with the remaining servers continuing operation within a short period of time.

#include <afsconfig.h>
#include <afs/param.h>
#include <roken.h>
#include <afs/opr.h>
#include <lock.h>
#include <rx/rx.h>
#include <afs/afsutil.h>
#include "ubik.h"
#include "ubik_int.h"

Functions

int uvote_ShouldIRun (void)
 Decide if we should try to become sync site.
afs_int32 uvote_GetSyncSite (void)
 Return the current synchronization site, if any.
afs_int32 SVOTE_Beacon (struct rx_call *rxcall, afs_int32 astate, afs_int32 astart, struct ubik_version *avers, struct ubik_tid *atid)
 Called by the sync site to handle vote beacons; if rxcall is null, this is a local call.
afs_int32 SVOTE_SDebug (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug *aparm)
 Handle per-server debug command, where 0 is the first server.
afs_int32 SVOTE_XSDebug (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug *aparm, afs_int32 *isclone)
afs_int32 SVOTE_XDebug (struct rx_call *rxcall, struct ubik_debug *aparm, afs_int32 *isclone)
afs_int32 SVOTE_Debug (struct rx_call *rxcall, struct ubik_debug *aparm)
 Handle basic network debug command.
afs_int32 SVOTE_SDebugOld (struct rx_call *rxcall, afs_int32 awhich, struct ubik_sdebug_old *aparm)
afs_int32 SVOTE_DebugOld (struct rx_call *rxcall, struct ubik_debug_old *aparm)
 Handle basic network debug command.
afs_int32 SVOTE_GetSyncSite (struct rx_call *rxcall, afs_int32 *ahost)
 Get the sync site; called by remote servers to find where they should go.
void ubik_dprint_25 (const char *format,...)
void ubik_dprint (const char *format,...)
void ubik_vprint (const char *format, va_list ap)
void ubik_print (const char *format,...)
int uvote_Init (void)
 Called once/run to init the vote module.
void uvote_set_dbVersion (struct ubik_version version)
int uvote_eq_dbVersion (struct ubik_version version)
int uvote_HaveSyncAndVersion (struct ubik_version version)
 Check if there is a sync site and whether we have a given db version.

Variables

afs_int32 ubik_debugFlag = 0
 print out debugging messages?
struct vote_data vote_globals

Detailed Description

General Ubik Goal: The goal is to provide reliable operation among N servers, such that any server can crash with the remaining servers continuing operation within a short period of time.

While a short outage is acceptable, this time should be on the order of 3 minutes or less.

Theory of operation:

Note: #SMALLTIME and #BIGTIME are essentially the same time value, separated only by the clock skew, #MAXSKEW. In general, if you are making guarantees for someone else, promise them no more than #SMALLTIME seconds of whatever invariant you provide. If you are waiting to be sure some invariant is now false, wait at least #BIGTIME seconds to be sure that #SMALLTIME seconds has passed at the other site.
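The promise/wait asymmetry described in the note can be sketched as follows. This is a minimal illustration, not code from vote.c: the EX_ constants use assumed placeholder values, and both helper functions are hypothetical names introduced here.

```c
#include <time.h>

/* Assumed, illustrative values in seconds; the real constants are
 * defined in the Ubik headers and may differ. */
#define EX_SMALLTIME 60
#define EX_MAXSKEW   10
#define EX_BIGTIME   (EX_SMALLTIME + EX_MAXSKEW) /* differ only by the skew */

/* Promise side (hypothetical helper): a guarantee granted at `granted`
 * may be relied on only while fewer than EX_SMALLTIME seconds have passed. */
int ex_promise_still_valid(time_t granted, time_t now)
{
    return now - granted < EX_SMALLTIME;
}

/* Waiting side (hypothetical helper): a promise made at another site is
 * certainly over only after at least EX_BIGTIME seconds on our clock. */
int ex_promise_surely_expired(time_t granted, time_t now)
{
    return now - granted >= EX_BIGTIME;
}
```

Between EX_SMALLTIME and EX_BIGTIME neither predicate holds, which is the window the clock skew leaves uncertain.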

Now, back to the design: One site in the collection is a special site, designated the sync site. The sync site sends periodic messages, which can be thought of as keep-alive messages. When a non-sync site hears from the sync site, it knows that it is getting updates for the next #SMALLTIME seconds from that sync site.

If a server does not hear from the sync site in #SMALLTIME seconds, it concludes that it is no longer getting updates, and thus refuses to give out potentially out-of-date data. If a sync site cannot muster a majority of servers to agree that it is the sync site, then there is a possibility that a network partition has occurred, allowing another server to claim to be the sync site. Thus, any time that the sync site has not heard from a majority of the servers in the last #SMALLTIME seconds, it voluntarily relinquishes its role as sync site.
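The step-down condition for a sync site might look like the sketch below. The struct, the function name, the EX_SMALLTIME value, and the choice to count the local server as always heard from are all assumptions made for illustration; the sketch uses a plain strict majority and is not the quorum computation in the Ubik sources.

```c
#include <stddef.h>
#include <time.h>

#define EX_SMALLTIME 60 /* assumed value, seconds */

/* Hypothetical per-server record: when this remote server last answered us. */
struct ex_server {
    time_t last_heard;
};

/* Returns 1 if strictly more than half of all sites (the remote servers
 * plus ourselves, an assumption of this sketch) were heard from within
 * the last EX_SMALLTIME seconds. A sync site that gets 0 should give up
 * its role. */
int ex_have_majority(const struct ex_server *others, size_t nothers, time_t now)
{
    size_t total = nothers + 1; /* remote servers plus ourselves */
    size_t heard = 1;           /* we always count ourselves */
    size_t i;

    for (i = 0; i < nothers; i++) {
        if (now - others[i].last_heard < EX_SMALLTIME)
            heard++;
    }
    return 2 * heard > total;
}
```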

While attempting to nominate a new sync site, certain rules apply. First, a server cannot reply "ok" (return a nonzero value from SVOTE_Beacon) to two different hosts in less than #BIGTIME seconds; this allows a server that has heard affirmative replies from a majority of the servers to know that no other server in the network has heard enough affirmative replies in the last #BIGTIME seconds to become sync site, too. The variables #ubik_lastYesTime and #lastYesHost are used by all servers to keep track of which host they have last replied affirmatively to, when queried by a potential new sync site.
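The one-vote-per-#BIGTIME rule can be sketched like this. The struct fields stand in for #ubik_lastYesTime and #lastYesHost; the field names, the function name, the constant values, and the use of 0 for "no vote given yet" are assumptions of the sketch rather than the actual implementation.

```c
#include <time.h>

/* Assumed, illustrative values in seconds. */
#define EX_SMALLTIME 60
#define EX_MAXSKEW   10
#define EX_BIGTIME   (EX_SMALLTIME + EX_MAXSKEW)

/* Hypothetical stand-ins for #ubik_lastYesTime and #lastYesHost. */
struct ex_vote_state {
    time_t last_yes_time;
    unsigned int last_yes_host; /* 0 means no vote has been given yet */
};

/* Returns the vote time if we say yes to `candidate`, or 0 if we must
 * refuse because a different host received our vote less than
 * EX_BIGTIME seconds ago. */
time_t ex_try_vote(struct ex_vote_state *st, unsigned int candidate, time_t now)
{
    if (st->last_yes_host != 0 &&
        st->last_yes_host != candidate &&
        now - st->last_yes_time < EX_BIGTIME)
        return 0;

    st->last_yes_host = candidate;
    st->last_yes_time = now;
    return now;
}
```

Repeated votes for the same host are always allowed and refresh the timestamp, so a competing host has to wait a full EX_BIGTIME after the most recent yes.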

Once a sync site has become a sync site, it periodically sends beacon messages with a parameter of 1, indicating that it already has determined it is supposed to be the sync site. The servers treat such a message as a guarantee that no other site will become sync site for the next #SMALLTIME seconds. In the interim, these servers can answer a query concerning which site is the sync site without any communication with any server. The variables #lastBeaconArrival and #lastBeaconHost are used by all servers to keep track of which sync site has last contacted them.

One complication occurs while nominating a new sync site: each site may be trying to nominate a different site (based on the value of #lastYesHost), yet we must nominate the smallest host (under some order), to prevent this process from looping. The process could loop by having each server give one vote to another server, but with no server getting a majority of the votes. To avoid this, we try to reserve our votes for the server with the lowest internet address (an easy-to-generate order). To this effect, we keep track (in #lowestTime and #lowestHost) of the lowest server trying to become a sync site. We wait for this server unless there is already a sync site (indicated by SVOTE_Beacon's astate parameter being 1).
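A rough sketch of the lowest-host bookkeeping follows. The struct mirrors #lowestTime and #lowestHost, but the field and function names are hypothetical, comparing addresses as plain unsigned integers is a simplification, and the EX_BIGTIME staleness threshold is an assumption rather than something stated above.

```c
#include <time.h>

/* Assumed, illustrative values in seconds. */
#define EX_SMALLTIME 60
#define EX_MAXSKEW   10
#define EX_BIGTIME   (EX_SMALLTIME + EX_MAXSKEW)

/* Hypothetical stand-ins for #lowestTime and #lowestHost. */
struct ex_lowest {
    time_t lowest_time;       /* when we last heard from lowest_host */
    unsigned int lowest_host; /* lowest address seen trying to become sync site */
};

/* Record a beacon from `host`. The remembered entry is replaced when the
 * sender is lower or equal, or when the old entry has gone stale. */
void ex_note_candidate(struct ex_lowest *lo, unsigned int host, time_t now)
{
    if (host <= lo->lowest_host || now - lo->lowest_time >= EX_BIGTIME) {
        lo->lowest_host = host;
        lo->lowest_time = now;
    }
}

/* Vote for `host` only if it is the lowest candidate we know of, unless
 * it already claims to be the sync site (is_sync_site != 0). */
int ex_may_vote_for(const struct ex_lowest *lo, unsigned int host, int is_sync_site)
{
    return is_sync_site || host == lo->lowest_host;
}
```

Because every server converges on the same lowest candidate, votes concentrate on one host instead of being split evenly.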


Function Documentation

afs_int32 SVOTE_Beacon ( struct rx_call *rxcall,
afs_int32  astate,
afs_int32  astart,
struct ubik_version *avers,
struct ubik_tid *atid 
)

Called by the sync site to handle vote beacons; if rxcall is null, this is a local call.

Returns:
0 if we are not voting for this sync site; otherwise the time at which we voted yes.

afs_int32 SVOTE_Debug ( struct rx_call *rxcall,
struct ubik_debug *aparm 
)

Handle basic network debug command.

This is the global state dumper.

afs_int32 SVOTE_DebugOld ( struct rx_call *rxcall,
struct ubik_debug_old *aparm 
)

Handle basic network debug command.

This is the global state dumper.

afs_int32 SVOTE_SDebug ( struct rx_call *rxcall,
afs_int32  awhich,
struct ubik_sdebug *aparm 
)

Handle per-server debug command, where 0 is the first server.

Basic network debugging hooks.

afs_int32 uvote_GetSyncSite ( void  )

Return the current synchronization site, if any.

Simple approach: if the last guy we voted yes for claims to be the sync site, then we're happy to use that guy for a sync site until the time his mandate expires. If the guy does not claim to be sync site, then, of course, there's none.

In addition, if we lost the sync, we set #urecovery_syncSite to an invalid value, indicating that we no longer know which version of the dbase is the one we should have. We'll get a new one when we next hear from the sync site.

Returns:
0 or currently valid sync site. It can return our own address, if we're the sync site.
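
The lookup described for uvote_GetSyncSite can be approximated as below. This is a sketch under stated assumptions: the struct and its field names are hypothetical, EX_SMALLTIME is a placeholder value, and the side effect on #urecovery_syncSite is left out.

```c
#include <time.h>

#define EX_SMALLTIME 60 /* assumed value, seconds */

/* Hypothetical record of our most recent yes vote. */
struct ex_beacon_state {
    time_t last_yes_time;       /* when we last voted yes */
    unsigned int last_yes_host; /* whom we voted for */
    int last_yes_claim;         /* nonzero if that host claimed to be sync site */
};

/* Returns the sync site's address, or 0 if there is none right now. */
unsigned int ex_get_sync_site(const struct ex_beacon_state *st, time_t now)
{
    if (!st->last_yes_claim)
        return 0; /* the host we voted for never claimed the role */
    if (now - st->last_yes_time >= EX_SMALLTIME)
        return 0; /* the mandate has expired */
    return st->last_yes_host;
}
```

The answer comes purely from local state, so no network traffic is needed while the mandate is still valid.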
int uvote_HaveSyncAndVersion ( struct ubik_version  version)

Check if there is a sync site and whether we have a given db version.

Returns:
1 if there is a valid sync site, and the given db version matches the sync site's
int uvote_ShouldIRun ( void  )

Decide if we should try to become sync site.

The basic rule is that we don't run if there is a valid sync site and it ain't us (we have to run if it is us, in order to keep our votes). If there is no sync site, then we want to run if we're the lowest numbered host running, otherwise we defer to the lowest host. However, if the lowest host hasn't been heard from for a while, then we start running again, in case he crashed.

Returns:
true if we should run, and false otherwise.
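
The decision rule for uvote_ShouldIRun could be sketched as follows. The function name and parameters are hypothetical, addresses are compared as plain unsigned integers for simplicity, and using EX_BIGTIME as the meaning of "for a while" is an assumption of this sketch.

```c
#include <time.h>

/* Assumed, illustrative values in seconds. */
#define EX_SMALLTIME 60
#define EX_MAXSKEW   10
#define EX_BIGTIME   (EX_SMALLTIME + EX_MAXSKEW)

/* sync_site is 0 when no valid sync site is known. Returns 1 if this
 * server should try to become (or remain) the sync site, 0 otherwise. */
int ex_should_i_run(unsigned int sync_site, unsigned int my_host,
                    unsigned int lowest_host, time_t lowest_time, time_t now)
{
    if (sync_site != 0)
        return sync_site == my_host; /* valid sync site: run only if it is us */
    if (my_host <= lowest_host)
        return 1;                    /* we are the lowest known candidate */
    /* Defer to the lowest host unless it has been silent too long. */
    return now - lowest_time >= EX_BIGTIME;
}
```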