
CS3551 UNIT IV RECOVERY & CONSENSUS


Check pointing and rollback recovery: Introduction - Background and definition - Issues in failure recovery - Checkpoint-based recovery - Log-based rollback recovery - Coordinated check pointing algorithm - Algorithm for asynchronous check pointing and recovery. Consensus and agreement algorithms: Problem definition - Overview of results - Agreement in a failure-free system - Agreement in synchronous systems with failures.

4.1 Check pointing and rollback recovery: Introduction


Rollback recovery protocols restore the system to a consistent state after a failure.
They achieve fault tolerance by periodically saving the state of a process during failure-free execution.
They treat a distributed system application as a collection of processes that communicate over a network.
Checkpoints
The saved state is called a checkpoint, and the procedure of restarting from a previously checkpointed state is called rollback recovery. A checkpoint can be saved on either stable storage or volatile storage.
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation
Rollback propagation
The dependencies among messages may force some of the processes that did not fail to roll back.
This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated check pointing
If each process takes its checkpoints independently, then the system cannot avoid the domino effect; this scheme is called independent or uncoordinated check pointing.
Techniques that avoid domino effect
1. Coordinated check pointing rollback recovery - Processes coordinate their checkpoints to
form a system-wide consistent state
2. Communication-induced check pointing rollback recovery - Forces each process to take
checkpoints based on information piggybacked on application messages.


3. Log-based rollback recovery - Combines check pointing with logging of non-deterministic events; it relies on the piecewise deterministic (PWD) assumption.
4.2 Background and definitions
4.2.1 System model

A distributed system consists of a fixed number of processes that communicate only through messages.


Processes cooperate to execute a distributed application and interact with the outside world
by receiving and sending input and output messages, respectively.
Rollback-recovery protocols generally make assumptions about the reliability of the inter-
process communication.
Some protocols assume that the communication uses first-in-first-out (FIFO) order, while
other protocols assume that the communication subsystem can lose, duplicate, or reorder
messages.
Rollback-recovery protocols therefore must maintain information about the internal
interactions among processes and also the external interactions with the outside world.

An example of a distributed system with three processes.

4.2.2 A local checkpoint


All processes save their local states at certain instants of time
A local checkpoint is a snapshot of the state of the process at a given instant
Assumption
A process stores all local checkpoints on the stable storage
A process is able to roll back to any of its existing local checkpoints


Ci,k : the kth local checkpoint at process Pi
A process Pi takes a checkpoint Ci,0 before it starts execution
4.2.3 Consistent states
A global state of a distributed system is a collection of the individual states of all
participating processes and the states of the communication channels
Consistent global state
A consistent global state is one that may occur during a failure-free execution of the distributed computation. If a process's state reflects a message receipt, then the state of the corresponding sender must reflect the sending of that message
A global checkpoint is a set of local checkpoints, one from each process
A consistent global checkpoint is a global checkpoint such that no message is sent by a
process after taking its local checkpoint that is received by another process before taking its
local checkpoint.


For instance, Figure shows two examples of global states.


The state in fig (a) is consistent and the state in Figure (b) is inconsistent.
Note that the consistent state in Figure (a) shows message m1 to have been sent but not
yet received, but that is alright.
The state in Figure (a) is consistent because, for every message that has been received, there is a corresponding message send event.
The state in Figure (b) is inconsistent because process P2 is shown to have received m2
but the state of process P1 does not reflect having sent it.
Such a state is impossible in any failure-free, correct computation. Inconsistent states
occur because of failures.
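The consistency condition can be checked mechanically. The small sketch below (the helper name is_consistent and the logical checkpoint times are illustrative assumptions, not from the notes) flags a global checkpoint as inconsistent whenever some message is recorded as received before the receiver's checkpoint while its send is not yet recorded at the sender, which is exactly the m2 situation of Figure (b).

def is_consistent(checkpoint_time, messages):
    """
    checkpoint_time: dict process -> local (logical) time of its checkpoint
    messages: list of (sender, send_time, receiver, recv_time) tuples
    """
    for sender, send_time, receiver, recv_time in messages:
        received_before_ckpt = recv_time <= checkpoint_time[receiver]
        sent_before_ckpt = send_time <= checkpoint_time[sender]
        if received_before_ckpt and not sent_before_ckpt:
            return False        # an orphan message makes the state inconsistent
    return True

# Example mirroring Figure (b): m2 is received by P2 before its checkpoint,
# but P1's checkpoint does not yet record the corresponding send.
ckpt = {"P1": 5, "P2": 7}
msgs = [("P1", 6, "P2", 7)]     # the send happens after P1's checkpoint
print(is_consistent(ckpt, msgs))  # False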
4.2.4 Interactions with outside world
A distributed system often interacts with the outside world to receive input data or deliver the
outcome of a computation. If a failure occurs, the outside world cannot be expected to roll back.
For example, a printer cannot roll back the effects of printing a character
Outside World Process (OWP)
It is a special process that interacts with the rest of the system through message passing.
It is therefore necessary that the outside world see a consistent behavior of the system
despite failures.
Thus, before sending output to the OWP, the system must ensure that the state from
which the output is sent will be recovered despite any future failure.
A common approach is to save each input message on the stable storage before allowing the
application program to process it.
An interaction with the outside world to deliver the outcome of a computation is shown on the process-time diagram by the symbol ||.
4.2.5 Different types of Messages

1. In-transit messages - messages that have been sent but not yet received
2. Lost messages - messages whose send is not undone but whose receive is undone due to rollback


3. Delayed messages - messages whose receive is not recorded because the receiving process was either down or the message arrived after rollback
4. Orphan messages - messages whose receive is recorded but whose send is not recorded; they do not arise if processes roll back to a consistent global state


5. Duplicate messages - arise due to message logging and replaying during process recovery

In-transit messages
In the figure, the global state {C1,8, C2,9, C3,8, C4,8} shows that message m1 has been sent but not yet received. We call such a message an in-transit message. Message m2 is also an in-transit message.


Delayed messages
Messages whose receive is not recorded because the receiving process was either down or the
message arrived after the rollback of the receiving process, are called delayed messages. For
example, messages m2 and m5 in Figure are delayed messages.

Lost messages
Messages whose send is not undone but receive is undone due to rollback are called lost messages.
This type of messages occurs when the process rolls back to a checkpoint prior to reception of the
message while the sender does not rollback beyond the send operation of the message. In Figure ,
message m1 is a lost message.
Duplicate messages
Duplicate messages arise due to message logging and replaying during process
recovery. For example, in Figure, message m4 was sent and received before the
rollback. However, due to the rollback of process P4 to C4,8 and process P3 to C3,8,
both send and receipt of message m4 are undone.
When process P3 restarts from C3,8, it will resend message m4.
Therefore, P4 should not replay message m4 from its log.
If P4 replays message m4, then message m4 is called a duplicate message.
4.3 Issues in failure recovery
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery


The computation comprises three processes Pi, Pj, and Pk, connected through a communication network. The processes communicate solely by exchanging messages over fault-free, FIFO communication channels.

Processes Pi, Pj, and Pk have taken checkpoints

The rollback of process Pi to checkpoint Ci,1 created an orphan message H
Orphan message I is created due to the rollback of process Pj to checkpoint Cj,1
Messages C, D, E, and F are potentially problematic
Message C: a delayed message
Message D: a lost message, since the send event for D is recorded in the restored state of Pj, but the receive event has been undone at process Pi.
Lost messages can be handled by having processes keep a message log of all the sent messages
Messages E, F: delayed orphan messages. After resuming execution from their checkpoints, the processes will regenerate both of these messages

4.4 Checkpoint-based recovery


Checkpoint-based rollback-recovery techniques can be classified into three categories:
1. Uncoordinated checkpointing
2. Coordinated checkpointing
3. Communication-induced checkpointing

1. Uncoordinated Checkpointing
Each process has autonomy in deciding when to take checkpoints
Advantages
The lower runtime overhead during normal execution


Disadvantages
1. Domino effect during a recovery
2. Recovery from a failure is slow because processes need to iterate to find a
consistent set of checkpoints
3. Each process maintains multiple checkpoints and periodically invokes a
garbage collection algorithm
4. Not suitable for applications with frequent output commits
The processes record the dependencies among their checkpoints caused by message
exchange during failure-free operation

The following direct dependency tracking technique is commonly used in uncoordinated checkpointing.

Direct dependency tracking technique


Assume each process Pi starts its execution with an initial checkpoint Ci,0
Ii,x : the xth checkpoint interval of Pi, i.e., the interval between checkpoints Ci,x-1 and Ci,x
When Pj receives a message m during Ij,y that was sent by Pi during Ii,x, it records the dependency from Ii,x to Ij,y, which is later saved onto stable storage when Pj takes checkpoint Cj,y

When a failure occurs, the recovering process initiates rollback by broadcasting a


dependency request message to collect all the dependency information maintained by each
process.


When a process receives this message, it stops its execution and replies with the
dependency information saved on the stable storage as well as with the dependency
information, if any, which is associated with its current state.
The initiator then calculates the recovery line based on the global dependency information
and broadcasts a rollback request message containing the recovery line.

Upon receiving this message, a process whose current state belongs to the recovery line
simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
the recovery line.
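A minimal sketch of this dependency-recording idea is shown below (the Process class, field names, and the piggybacked tuple format are assumptions for illustration, not the notes' exact protocol): every message carries the sender's id and its current checkpoint-interval index, the receiver records the dependency, and the dependency is flushed to stable storage at the receiver's next checkpoint.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.interval = 0            # index of the current checkpoint interval
        self.pending_deps = set()    # dependencies recorded in the current interval
        self.stable_deps = []        # (from_interval, to_interval) pairs on stable storage

    def send(self, payload):
        # piggyback (pid, interval) so the receiver can record the dependency
        return (self.pid, self.interval, payload)

    def receive(self, message):
        sender, sender_interval, payload = message
        # dependency from I_{sender, sender_interval} to I_{self, self.interval}
        self.pending_deps.add(((sender, sender_interval), (self.pid, self.interval)))
        return payload

    def take_checkpoint(self):
        # saving the checkpoint also saves the dependencies of the closed interval
        self.stable_deps.extend(self.pending_deps)
        self.pending_deps.clear()
        self.interval += 1

# Example: P2 receives a message sent by P1 in interval 0 during P2's interval 0
p1, p2 = Process(1), Process(2)
msg = p1.send("hello")
p2.receive(msg)
p2.take_checkpoint()
print(p2.stable_deps)   # [((1, 0), (2, 0))]

On a failure, the initiator would collect stable_deps (and any pending dependencies) from all processes and use this global dependency information to compute the recovery line, as described above.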
2. Coordinated Checkpointing
In coordinated checkpointing, processes orchestrate their checkpointing activities so that all
local checkpoints form a consistent global state
Types
1. Blocking Checkpointing: After a process takes a local checkpoint, to prevent orphan
messages, it remains blocked until the entire checkpointing activity is complete
Disadvantages: The computation is blocked during the checkpointing
2. Non-blocking Checkpointing: The processes need not stop their execution while taking
checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process
from receiving application messages that could make the checkpoint inconsistent.
Example (a) : Checkpoint inconsistency
Message m is sent by P0 after receiving a checkpoint request from the checkpoint coordinator
Assume m reaches P1 before the checkpoint request
This situation results in an inconsistent checkpoint, since checkpoint C1,x shows the receipt of message m from P0, while checkpoint C0,x does not show m being sent from P0
Example (b) : A solution with FIFO channels
If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint
message on each channel by a checkpoint request, forcing each process to take a checkpoint
before receiving the first post-checkpoint message


Impossibility of min-process non-blocking checkpointing


A min-process, non-blocking checkpointing algorithm is one that forces only a minimum
number of processes to take a new checkpoint, and at the same time it does not force any
process to suspend its computation.

Algorithm
The algorithm consists of two phases. During the first phase, the checkpoint initiator
identifies all processes with which it has communicated since the last checkpoint and sends
them a request.
Upon receiving the request, each process in turn identifies all processes it has
communicated with since the last checkpoint and sends them a request, and so on, until
no more processes can be identified.
During the second phase, all processes identified in the first phase take a checkpoint. The
result is a consistent checkpoint that involves only the participating processes.


In this protocol, after a process takes a checkpoint, it cannot send any message until the
second phase terminates successfully, although receiving a message after the checkpoint
has been taken is allowable.
3. Communication-induced Checkpointing
Communication-induced checkpointing is another way to avoid the domino effect, while allowing
processes to take some of their checkpoints independently. Processes may be forced to take
additional checkpoints
Two types of checkpoints
1. Autonomous checkpoints
2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that
a process is forced to take are called forced checkpoints.
Communication-induced check pointing piggybacks protocol-related information on each application message
The receiver of each application message uses the piggybacked information to determine
if it has to take a forced checkpoint to advance the global recovery line
The forced checkpoint must be taken before the application may process the contents of
the message
In contrast with coordinated check pointing, no special coordination messages are
exchanged

Two types of communication-induced checkpointing


1. Model-based checkpointing
2. Index-based checkpointing.
Model-based checkpointing
Model-based checkpointing prevents patterns of communications and checkpoints that
could result in inconsistent states among the existing checkpoints.
No control messages are exchanged among the processes during normal operation.
All information necessary to execute the protocol is piggybacked on application
messages


There are several domino-effect-free checkpoint and communication models.


The MRS (mark, send, and receive) model of Russell avoids the domino effect by
ensuring that within every checkpoint interval all message receiving events precede all
message-sending events.
Index-based checkpointing.
Index-based communication-induced checkpointing assigns monotonically increasing
indexes to checkpoints, such that the checkpoints having the same index at different
processes form a consistent state.
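A tiny sketch of the index-based idea, in the spirit of the classic protocols of this family (the class and method names are assumptions for illustration): every message piggybacks the sender's current checkpoint index, and the receiver takes a forced checkpoint tagged with that larger index before delivering the message, so that checkpoints with equal indexes form a consistent global state.

class Proc:
    def __init__(self, pid):
        self.pid = pid
        self.index = 0                        # index of the latest checkpoint
        self.checkpoints = {0: "initial state"}

    def autonomous_checkpoint(self, state):
        self.index += 1                       # checkpoint taken independently
        self.checkpoints[self.index] = state

    def send(self, payload):
        return (self.index, payload)          # piggyback the checkpoint index

    def receive(self, message, current_state):
        sender_index, payload = message
        if sender_index > self.index:
            # forced checkpoint, taken before processing the message contents
            self.index = sender_index
            self.checkpoints[self.index] = current_state
        return payload

p1, p2 = Proc(1), Proc(2)
p1.autonomous_checkpoint("s1")                # p1 advances to index 1
p2.receive(p1.send("m"), "s2")                # p2 is forced to checkpoint first
print(p2.index, p2.checkpoints)               # 1 {0: 'initial state', 1: 's2'}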
4.5 Log-based rollback recovery
A log-based rollback recovery makes use of deterministic and nondeterministic events in a
computation.

Deterministic and non-deterministic events


Log-based rollback recovery exploits the fact that a process execution can be modeled
as a sequence of deterministic state intervals, each starting with the execution of a non-
deterministic event.
A non-deterministic event can be the receipt of a message from another process or an
event internal to the process.
Note that a message send event is not a non-deterministic event.
For example, in Figure, the execution of process P0 is a sequence of four deterministic
intervals. The first one starts with the creation of the process, while the remaining three
start with the receipt of messages m0, m3, and m7, respectively.
Send event of message m2 is uniquely determined by the initial state of P0 and by the
receipt of message m0, and is therefore not a non-deterministic event.
Log-based rollback recovery assumes that all non-deterministic events can be
identified and their corresponding determinants can be logged into the stable storage.
Determinant: the information needed to replay the execution of a non-deterministic event (e.g., message reception).
During failure-free operation, each process logs the determinants of all non-
deterministic events that it observes onto the stable storage. Additionally, each process
also takes checkpoints to reduce the extent of rollback during recovery.


The no-orphans consistency condition


Let e be a non-deterministic event that occurs at process p. We define the following:
Depend(e): the set of processes that are affected by the non-deterministic event e
Log(e): the set of processes that have logged a copy of e's determinant in their volatile memory
Stable(e): a predicate that is true if e's determinant is logged on the stable storage

Suppose a set of processes crashes. A process p becomes an orphan when p itself does not fail and p's state depends on a non-deterministic event e whose determinant cannot be recovered from the stable storage or from the volatile memory of a surviving process. Formally, the always-no-orphans condition can be stated as:
∀e : ¬Stable(e) ⇒ Depend(e) ⊆ Log(e)
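The condition above can be checked directly once Depend, Log, and Stable are represented explicitly; the tiny sketch below (the names and the dictionary layout are illustrative assumptions) does exactly that.

def always_no_orphans(events):
    """events: dict event_id -> {"stable": bool, "depend": set, "log": set}"""
    return all(info["stable"] or info["depend"] <= info["log"]
               for info in events.values())

events = {
    "e1": {"stable": True,  "depend": {"p0", "p1"}, "log": set()},
    "e2": {"stable": False, "depend": {"p2"},       "log": {"p2", "p3"}},
}
print(always_no_orphans(events))   # True: every unstable determinant is fully logged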


Types

1. Pessimistic Logging
Pessimistic logging protocols assume that a failure can occur after any non-deterministic
event in the computation. However, in reality failures are rare
Pessimistic protocols implement the following property, often referred to as synchronous logging, which is stronger than the always-no-orphans condition:
Synchronous logging
∀e : ¬Stable(e) ⇒ |Depend(e)| = 0
That is, if an event has not been logged on the stable storage, then no process can depend on it.
Example:
Suppose processes P1 and P2 fail as shown, restart from checkpoints B and C, and roll forward using their determinant logs to deliver again the same sequence of messages as in the pre-failure execution.
Once the recovery is complete, both processes will be consistent with the state of P0 that includes the receipt of message m7 from P1.


Disadvantage: performance penalty for synchronous logging


Advantages:
immediate output commit
restart from most recent checkpoint
recovery limited to failed process(es)
simple garbage collection
Some pessimistic logging systems reduce the overhead of synchronous logging without
relying on hardware. For example, the sender-based message logging (SBML) protocol
keeps the determinants corresponding to the delivery of each message m in the volatile
memory of its sender.
The sender-based message logging (SBML) protocol
Two steps.
1. First, before sending m, the sender logs its content in volatile memory.
2. Then, when the receiver of m responds with an acknowledgment that includes the order
in which the message was delivered, the sender adds to the determinant the ordering
information.
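The two SBML steps above can be sketched as follows (the Sender/Receiver classes and field names are assumptions; a real implementation would exchange these calls as network messages and acknowledgements).

class Sender:
    def __init__(self):
        self.log = {}                       # msg_id -> {"content":..., "rsn":...}
        self._next_id = 0

    def send(self, content):
        msg_id = self._next_id
        self._next_id += 1
        self.log[msg_id] = {"content": content, "rsn": None}   # step 1: log content
        return msg_id, content              # the message goes onto the network

    def on_ack(self, msg_id, receive_seq_no):
        self.log[msg_id]["rsn"] = receive_seq_no               # step 2: ordering info

class Receiver:
    def __init__(self):
        self.delivered = 0

    def deliver(self, msg_id, content):
        self.delivered += 1
        return msg_id, self.delivered       # the ack carries the delivery order

s, r = Sender(), Receiver()
mid, payload = s.send("m")
s.on_ack(*r.deliver(mid, payload))
print(s.log)    # {0: {'content': 'm', 'rsn': 1}}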
2. Optimistic Logging
Processes log determinants asynchronously to the stable storage
Optimistically assume that logging will be complete before a failure occurs
Do not implement the always-no-orphans condition


To perform rollbacks correctly, optimistic logging protocols track causal dependencies


during failure free execution
Optimistic logging protocols require a non-trivial garbage collection scheme
Pessimistic protocols need only keep the most recent checkpoint of each process, whereas
optimistic protocols may need to keep multiple checkpoints for each process

Consider the example shown in the figure. Suppose process P2 fails before the determinant for
m5 is logged to the stable storage. Process P1 then becomes an orphan process and must
roll back to undo the effects of receiving the orphan message m6. The rollback of P1
further forces P0 to roll back to undo the effects of receiving message m7.
Advantage: better performance in failure-free execution
Disadvantages:
coordination required on output commit
more complex garbage collection
Since determinants are logged asynchronously, output commit in optimistic logging
protocols requires a guarantee that no failure scenario can revoke the output. For example,
if process P0 needs to commit output at state X, it must log messages m4 and m7 to the
stable storage and ask P2 to log m2 and m5. In this case, if any process fails, the
computation can be reconstructed up to state X.


3. Causal Logging
Combines the advantages of both pessimistic and optimistic logging at the expense of a more
complex recovery protocol
Like optimistic logging, it does not require synchronous access to the stable storage except
during output commit
Like pessimistic logging, it allows each process to commit output independently and never
creates orphans, thus isolating processes from the effects of failures at other processes
Make sure that the always-no-orphans property holds
Each process maintains information about all the events that have causally affected its state

Consider the example in the figure. Messages m5 and m6 are likely to be lost on the failures of P1 and P2 at the indicated instants. Process P0 at state X will have logged the determinants of the non-deterministic events that causally precede its state according to Lamport's happened-before relation.
These events consist of the delivery of messages m0, m1, m2, m3, and m4.
The determinant of each of these non-deterministic events is either logged on the stable
storage or is available in the volatile log of process P0.
The determinant of each of these events contains the order in which its original receiver
delivered the corresponding message.


The message sender, as in sender-based message logging, logs the message content. Thus, process P0 will be able to guide the recovery of P1 and P2, since it knows the order in which P1 should replay messages m1 and m3 to reach the state from which P1 sent message m4.
Similarly, P0 has the order in which P2 should replay message m2 to be consistent with
both P0 and P1.
The content of these messages is obtained from the sender log of P0 or regenerated
deterministically during the recovery of P1 and P2.
Note that information about messages m5 and m6 is lost due to failures. These messages
may be resent after recovery possibly in a different order.
However, since they did not causally affect the surviving process or the outside world, the
resulting state is consistent.
Each process maintains information about all the events that have causally affected its state.

4.6 KOO AND TOUEG COORDINATED CHECKPOINTING AND RECOVERY


TECHNIQUE:
Koo and Toueg coordinated check pointing and recovery technique takes a consistent set
of checkpoints and avoids the domino effect and livelock problems during the recovery.
It includes two parts: the check pointing algorithm and the recovery algorithm.

A. The Checkpointing Algorithm


The checkpoint algorithm makes the following assumptions about the distributed system:
Processes communicate by exchanging messages through communication channels.
Communication channels are FIFO.
Assume that end-to-end protocols (such as the sliding window protocol) exist to handle message loss due to rollback recovery and communication failure.
Communication failures do not divide the network.
The checkpoint algorithm takes two kinds of checkpoints on the stable storage: Permanent and
Tentative.


A permanent checkpoint is a local checkpoint at a process and is a part of a consistent global


checkpoint.
A tentative checkpoint is a temporary checkpoint that is made a permanent checkpoint on the
successful termination of the checkpoint algorithm.

The algorithm consists of two phases.

First Phase
1. An initiating process Pi takes a tentative checkpoint and requests all other processes to take
tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative
checkpoint.
2. A process says "no" to a request if it fails to take a tentative checkpoint, which could be due to several reasons depending upon the underlying application.
3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides
that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the
tentative checkpoints should be thrown-away.
Second Phase
1. Pi informs all the processes of the decision it reached at the end of the first phase.
2. A process, on receiving the message from Pi will act accordingly.
3. Either all or none of the processes advance the checkpoint by taking permanent
checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot send messages related to the basic computation until it is informed of Pi's decision.
Correctness: for two reasons
i. Either all or none of the processes take permanent checkpoint
ii. No process sends message after taking permanent checkpoint
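A compact, single-address-space sketch of this two-phase structure is given below (the Proc class, the take_tentative/commit helpers, and the centralized driver are assumptions for illustration, not Koo and Toueg's original code): tentative checkpoints become permanent only if every process succeeded in taking one; otherwise all are discarded.

class Proc:
    def __init__(self, pid):
        self.pid = pid
        self.tentative = None
        self.permanent = []
        self.blocked = False          # no basic-computation messages while blocked

    def take_tentative(self, state):
        # a real process may refuse here (return False), e.g. for lack of resources
        self.tentative = state
        self.blocked = True
        return True

    def commit(self, make_permanent):
        if make_permanent and self.tentative is not None:
            self.permanent.append(self.tentative)
        self.tentative = None         # discarded on abort
        self.blocked = False

def coordinated_checkpoint(initiator, others, states):
    procs = [initiator] + others
    # Phase 1: every process tries to take a tentative checkpoint
    results = [p.take_tentative(states[p.pid]) for p in procs]
    all_ok = all(results)
    # Phase 2: the initiator broadcasts the decision; all commit or all discard
    for p in procs:
        p.commit(all_ok)
    return all_ok

ps = [Proc(i) for i in range(3)]
print(coordinated_checkpoint(ps[0], ps[1:], {0: "s0", 1: "s1", 2: "s2"}))  # True

Blocking the basic computation between the two phases is what rules out orphan messages in this sketch.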
An Optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for consistency. Since taking a checkpoint is an expensive operation, such unnecessary checkpoints should be avoided.


B. The Rollback Recovery Algorithm


The rollback recovery algorithm restores the system state to a consistent state after a failure. The
rollback recovery algorithm assumes that a single process invokes the algorithm. It assumes that
the checkpoint and the rollback recovery algorithms are not invoked concurrently. The rollback
recovery algorithm has two phases.
First Phase
1. An initiating process Pi sends a message to all other processes to check if they all are
willing to restart from their previous checkpoints.
2. A process may reply "no" to a restart request if it is already participating in a check pointing or a recovery process initiated by some other process.
3. If Pi learns that all processes are willing to restart from their previous checkpoints, Pi
decides that all processes should roll back to their previous checkpoints. Otherwise,
4. Pi aborts the roll back attempt and it may attempt a recovery at a later time.
Second Phase
1. Pi propagates its decision to all the processes.
2. On receiving Pi's decision, a process acts accordingly.
3. During the execution of the recovery algorithm, a process cannot send messages related to the underlying computation while it is waiting for Pi's decision.
Correctness: Resume from a consistent state


Optimization: it may not be necessary to recover all processes, since some of them did not change anything since their last checkpoints


In the event of failure of process X, the above protocol will require processes X, Y, and Z to restart from checkpoints x2, y2, and z2, respectively.
However, process Z need not roll back, because there has been no interaction between process Z and the other two processes since the last checkpoint at Z.

4.7 ALGORITHM FOR ASYNCHRONOUS CHECKPOINTING AND RECOVERY:


The algorithm of Juang and Venkatesan is used for recovery in a system that uses asynchronous check pointing.
A. System Model and Assumptions
The algorithm makes the following assumptions about the underlying system:
The communication channels are reliable, deliver the messages in FIFO order and have
infinite buffers.
The message transmission delay is arbitrary, but finite.
Underlying computation/application is event-driven: a process P is at state s, receives message m, processes the message, moves to state s' and sends messages out. So the triplet (s, m, msgs_sent) represents the state of P

Two type of log storage are maintained:


Volatile log: takes a short time to access but is lost if the processor crashes. Its contents are moved to the stable log periodically.
Stable log: takes a longer time to access but is retained even if the processor crashes
B. Asynchronous Check pointing
After executing an event, the triplet is recorded without any synchronization with other processes.
A local checkpoint consists of a set of such records; they are first stored in the volatile log and then periodically moved to the stable log.
C. The Recovery Algorithm
Notations and data structure
The following notations and data structure are used by the algorithm:

RCVDi←j(CkPti) represents the number of messages received by processor pi from processor pj, from the beginning of the computation till the checkpoint CkPti.


SENTi→j(CkPti) represents the number of messages sent by processor pi to processor pj, from the beginning of the computation till the checkpoint CkPti.
Basic idea
Since the algorithm is based on asynchronous check pointing, the main issue in the
recovery is to find a consistent set of checkpoints to which the system can be restored.
The recovery algorithm achieves this by making each processor keep track of both the
number of messages it has sent to other processors as well as the number of messages it
has received from other processors.
Whenever a processor rolls back, it is necessary for all other processors to find out if any
message has become an orphan message. Orphan messages are discovered by comparing
the number of messages sent to and received from neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (i.e., the number of messages received by processor pi from processor pj is greater than the number of messages sent by processor pj to processor pi, according to the current states of the processors), then one or more messages at processor pj are orphan messages.

The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it had failed.
Procedure RollBack_Recovery
processor pi executes the following:
STEP (a)
if processor pi is recovering after a failure then
CkPti := latest event logged in the stable storage
else
CkPti := latest event that took place in pi {The latest event at pi can be either in stable or in
volatile storage.}
end if
STEP (b)
for k = 1 to N {N is the number of processors in the system} do
  for each neighboring processor pj do
    compute SENTi→j(CkPti)
    send a ROLLBACK(i, SENTi→j(CkPti)) message to pj
  end for
  for every ROLLBACK(j, c) message received from a neighbor j do
    if RCVDi←j(CkPti) > c {Implies the presence of orphan messages} then
      find the latest event e such that RCVDi←j(e) = c {Such an event e may be in the volatile storage or stable storage.}
      CkPti := e
    end if
  end for
end for {for k}
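The iterative orphan-detection loop above can be simulated compactly as below (the Processor class, the per-event counter snapshots, and the centralized recover driver are illustrative assumptions): each processor rolls back until the number of messages it has received from a neighbor no longer exceeds the SENT count announced in that neighbor's ROLLBACK message.

class Processor:
    def __init__(self, pid, events):
        # events: list of {"sent": {j: n}, "rcvd": {j: n}} snapshots, in causal order
        self.pid = pid
        self.events = events
        self.ckpt = len(events) - 1            # index of the current recovery point

    def sent_to(self, j):
        return self.events[self.ckpt]["sent"].get(j, 0)

    def rcvd_from(self, j):
        return self.events[self.ckpt]["rcvd"].get(j, 0)

    def handle_rollback(self, j, c):
        # roll back until RCVDi<-j(CkPti) <= c, i.e. no orphan message from j
        while self.rcvd_from(j) > c:
            self.ckpt -= 1

def recover(processors):
    for _ in range(len(processors)):           # N iterations suffice
        announced = {(p.pid, q.pid): p.sent_to(q.pid)
                     for p in processors for q in processors if p is not q}
        for p in processors:
            for q in processors:
                if p is not q:
                    p.handle_rollback(q.pid, announced[(q.pid, p.pid)])
    return {p.pid: p.ckpt for p in processors}

# Tiny usage: p2 has recorded more receives from p1 than p1's recovery point
# has sent, so p2 rolls back one event.
p1 = Processor("p1", [{"sent": {"p2": 0}, "rcvd": {}},
                      {"sent": {"p2": 1}, "rcvd": {}}])
p2 = Processor("p2", [{"sent": {}, "rcvd": {"p1": 0}},
                      {"sent": {}, "rcvd": {"p1": 2}}])
print(recover([p1, p2]))                       # {'p1': 1, 'p2': 0}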
D. An Example
Consider an example shown in Figure 2 consisting of three processors. Suppose processor Y
fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y will restart from the
state corresponding to ey2.

Figure 2: An example of Juan-Venkatesan algorithm.


Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is
initiated at processors X and Z.
Processors X, Y, and Z set their recovery points to their latest logged events, and X, Y, and Z send the following messages during the first iteration:
Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z;


X sends ROLLBACK(X,2) to Y and ROLLBACK(X,0) to Z;


Z sends ROLLBACK(Z,0) to X and ROLLBACK(Z,1) to Y.
Since RCVDX←Y(CkPtX) > 2 (2 is the value received in the ROLLBACK(Y, 2) message from Y), X will set CkPtX to ex2, the latest event satisfying RCVDX←Y(ex2) ≤ 2.
Since RCVDZ←Y(CkPtZ) > 1, Z will set CkPtZ to ez1, the latest event satisfying RCVDZ←Y(ez1) ≤ 1.
At Y, the number of messages received from X and from Z does not exceed the counts announced in their ROLLBACK messages, so Y need not roll back further.


In the second iteration, Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z;

Z sends ROLLBACK(Z,1) to Y and ROLLBACK(Z,0) to X;


X sends ROLLBACK(X,0) to Z and ROLLBACK(X, 1) to Y.
If Y rolls back beyond ey3 and loses the message from X that caused ey3, X can resend this message to Y because ex2 is logged at X and this message is available in the log. The second and third iterations will progress in the same manner. The set of recovery points chosen at the end of the first iteration, {ex2, ey2, ez1}, is consistent, and no further rollback occurs.

4.8 CONSENSUS AND AGREEMENT ALGORITHMS: PROBLEM DEFINITION AND OVERVIEW OF RESULTS

Table: Overview of results on agreement. f denotes the number of failure-prone processes; n is the total number of processes.

Failure mode: No failure
  Synchronous system (message-passing and shared memory): agreement attainable; common knowledge attainable
  Asynchronous system (message-passing and shared memory): agreement attainable; concurrent common knowledge attainable
Failure mode: Crash failure
  Synchronous system: agreement attainable; f < n processes
  Asynchronous system: agreement not attainable
Failure mode: Byzantine failure
  Synchronous system: agreement attainable; f ≤ ⌊(n - 1)/3⌋ Byzantine processes
  Asynchronous system: agreement not attainable


In a failure-free system, consensus can be attained in a straightforward manner.

Consensus Problem (all processes have an initial value)


Agreement: All non-faulty processes must agree on the same (single) value.

Validity: If all the non-faulty processes have the same initial value, then the agreed upon
value by all the non-faulty processes must be that same value.

Termination: Each non-faulty process must eventually decide on a value.

Consensus Problem in Asynchronous Systems.

The overhead bounds are for the given algorithms, and are not necessarily tight bounds for the problem.

Solvable variant: Reliable broadcast
  Failure model and overhead: crash failure, n > f (message-passing)
  Definition: satisfies the validity, agreement, and integrity conditions
Solvable variant: k-set consensus
  Failure model and overhead: crash failure, f < k < n (message-passing and shared memory)
  Definition: the size of the set of values agreed upon must be at most k
Solvable variant: ε-agreement
  Failure model and overhead: crash failure, n ≥ 5f + 1 (message-passing)
  Definition: the values agreed upon are within ε of each other
Solvable variant: Renaming
  Failure model and overhead: up to f fail-stop processes, n ≥ 2f + 1 (message-passing); crash failure, f ≤ n - 1 (shared memory)
  Definition: each process selects a unique name from a set of names

Circumventing the impossibility results for consensus in asynchronous systems: weaker problem variants such as those listed above (reliable broadcast, k-set consensus, ε-agreement, and renaming) are solvable, as is consensus built from shared objects stronger than read/write registers (discussed later in this unit).


STEPS FOR BYZANTINE GENERALS (ITERATIVE FORMULATION),


SYNCHRONOUS, MESSAGE-PASSING:

Byzantine Agreement (single source has an initial value)

Agreement: All non-faulty processes must agree on the same value.

Validity: If the source process is non-faulty, then the agreed upon value by all the non-
faulty processes must be the same as the initial value of the source.


STEPS FOR BYZANTINE GENERALS (RECURSIVE FORMULATION),


SYNCHRONOUS, MESSAGE-PASSING:
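The recursive formulation corresponds to the oral-messages algorithm OM(f) of Lamport, Shostak, and Pease. Below is a compact, single-machine simulation sketch (the function names, the DEFAULT tie-break value, and the crude model of a Byzantine process as one that flips the bit it relays are illustrative assumptions, not the exact steps of the notes): the source sends its value to every lieutenant, each lieutenant recursively acts as the source of the value it received in OM(f-1), and every lieutenant finally takes the majority of the values obtained directly and via the others.

from collections import Counter

DEFAULT = 0                                  # value used when a tie must be broken

def majority(values):
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT                       # no clear majority
    return counts[0][0]

def om(f, source, value, receivers, faulty):
    """Simulate OM(f): returns a dict mapping each receiver to its decided value."""
    def sent_value(sender, v):
        # crude simulation: a Byzantine sender flips the bit it should relay
        return (1 - v) if sender in faulty else v

    if f == 0:
        return {r: sent_value(source, value) for r in receivers}

    relayed = {r: {} for r in receivers}     # relayed[r][q] = value q relayed to r
    for q in receivers:
        vq = sent_value(source, value)       # value q obtained from the source
        others = [r for r in receivers if r != q]
        for r, v in om(f - 1, q, vq, others, faulty).items():
            relayed[r][q] = v
    return {r: majority([sent_value(source, value)] + list(relayed[r].values()))
            for r in receivers}

# n = 4 processes (source 0 and lieutenants 1-3), f = 1 fault, so n > 3f holds.
print(om(1, source=0, value=1, receivers=[1, 2, 3], faulty={2}))
# {1: 1, 2: 1, 3: 1}: the non-faulty lieutenants 1 and 3 agree on the source's value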


CODE FOR THE PHASE KING ALGORITHM:

Each phase has a unique "phase king" derived, say, from PID.

Each phase has two rounds:

1 in 1st round, each process sends its estimate to all other processes.

2 in 2nd round, the "Phase king" process arrives at an estimate based on the values
it received in 1st round, and broadcasts its new estimate to all others.

Fig. Message pattern for the phase-king algorithm.

PHASE KING ALGORITHM CODE:
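As a stand-in for the code, here is a compact simulation of the two-round, (f + 1)-phase structure described above (the phase_king function, the dict-based "broadcast", and the bit-flipping model of malicious behaviour are illustrative assumptions rather than the exact code of the notes).

from collections import Counter

def phase_king(initial, faulty, f):
    """initial: dict pid -> 0/1 value; faulty: set of malicious pids; requires f < n/4."""
    n = len(initial)
    pids = sorted(initial)
    value = dict(initial)
    for phase in range(f + 1):
        king = pids[phase]                       # phase king derived from PID
        # Round 1: all-to-all exchange of the current estimates
        majority_val, mult = {}, {}
        for i in pids:
            received = [(1 - value[j]) if j in faulty else value[j] for j in pids]
            (v, c), = Counter(received).most_common(1)
            majority_val[i], mult[i] = v, c
        # Round 2: the phase king broadcasts its majority value as the tie-breaker
        for i in pids:
            tiebreak = (1 - majority_val[king]) if king in faulty else majority_val[king]
            if mult[i] > n / 2 + f:
                value[i] = majority_val[i]       # strong enough local majority
            else:
                value[i] = tiebreak              # defer to the phase king
    return {i: value[i] for i in pids if i not in faulty}

print(phase_king({0: 1, 1: 0, 2: 1, 3: 1, 4: 0}, faulty={4}, f=1))
# {0: 1, 1: 1, 2: 1, 3: 1}: all non-malicious processes agree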


(f + 1) phases, (f + 1)[(n - 1)(n + 1)] messages, and can tolerate up to f < ⌈n/4⌉ malicious processes

Correctness Argument

1 Among the f + 1 phases, there is at least one phase k in which the phase king is non-malicious.

2 In phase k, all non-malicious processes Pi and Pj will have the same estimate of the consensus value as Pk does. There are three cases:

  Pi and Pj use their own majority values (Pi's mult > n/2 + f).

  Pi uses its majority value; Pj uses the phase king's tie-breaker value (Pi's mult > n/2 + f, Pj's mult > n/2 for the same value).

  Pi and Pj use the phase king's tie-breaker value (in the phase in which Pk is non-malicious, it sends the same value to Pi and Pj).

In all three cases, Pi and Pj end up with the same value as their estimate.

If all non-malicious processes have the value x at the start of a phase, they will continue to have x as the consensus value at the end of the phase.

CODE FOR THE EPSILON CONSENSUS (MESSAGE-PASSING, ASYNCHRONOUS):

ε-Agreement: all non-faulty processes must make a decision, and the values decided upon by any two non-faulty processes must be within ε of each other.

Validity: if a non-faulty process Pi decides on some value vi, then that value must be within the range of values initially proposed by the processes.

Termination: each non-faulty process must eventually decide on a value. The algorithm for the message-passing model assumes n ≥ 5f + 1, although the problem is solvable for n > 3f + 1.

The main loop simulates synchronous rounds.

Main lines (1d)-(1f): processes perform an all-to-all message exchange.

Each process broadcasts its estimate of the consensus value and awaits n - f similar messages from other processes.

The processes' estimates of the consensus value converge at a particular rate, until each estimate is within ε of every other process's estimate.

The number of rounds is determined by lines (1a)-(1c).


TWO-PROCESS WAIT-FREE CONSENSUS USING FIFO QUEUE, COMPARE


& SWAP:

Wait-free Shared Memory Consensus using Shared Objects:

Not possible to go from bivalent to univalent state if even a single failure is allowed.

Difficulty is not being able to read & write a variable atomically.

It is not possible to reach consensus in an asynchronous shared memory system


using Read/Write atomic registers, even if a single process can fail by crashing.

There is no wait-free consensus algorithm for reaching consensus in an


asynchronous shared memory system using Read/Write atomic registers.

To overcome these negative results:

Weakening the consensus problem, e.g., k-set consensus, approximate


consensus, and renaming using atomic registers.

Using memory that is stronger than atomic Read/Write memory to design wait-
free consensus algorithms. Such a memory would need corresponding access
primitives.


Are there objects (with supporting operations) using which there is a wait-free (i.e., (n - 1)-crash resilient) algorithm for reaching consensus in an n-process system? Yes, e.g., Test&Set, Swap, Compare&Swap. The crash failure model requires the solutions to be wait-free.

TWO-PROCESS WAIT-FREE CONSENSUS USING FIFO QUEUE:
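A sketch of the classical construction (after Herlihy) is given below, under assumptions: the shared FIFO queue is pre-loaded with a WIN token followed by a LOSE token, each process first announces its proposal in a shared register and then dequeues, the process that dequeues WIN decides its own proposal, and the other adopts the winner's proposal. The deque and list used here merely stand in for atomic shared objects.

from collections import deque

queue = deque(["WIN", "LOSE"])          # shared FIFO queue, pre-initialized
prefer = [None, None]                   # shared single-writer registers

def decide(i, value):
    """Process i (0 or 1) proposes `value`; returns the agreed value."""
    prefer[i] = value                   # announce my proposal first
    token = queue.popleft()             # atomic dequeue on the shared queue
    if token == "WIN":
        return prefer[i]                # first to dequeue: my value is chosen
    return prefer[1 - i]                # otherwise adopt the winner's proposal

print(decide(0, "a"))                   # 'a'
print(decide(1, "b"))                   # 'a' as well: agreement

Because the queue orders the two dequeue operations, the loser is guaranteed to find the winner's proposal already written when it reads the register.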

WAIT-FREE CONSENSUS USING COMPARE & SWAP:
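A sketch of n-process wait-free consensus from Compare&Swap, under the assumption that compare_and_swap executes atomically: the first successful swap on an initially-unset register fixes the decision, and every later process returns the value already installed.

UNSET = object()                         # sentinel meaning "no decision yet"

class CASRegister:
    def __init__(self):
        self._value = UNSET
    def compare_and_swap(self, expected, new):
        """Return the old value; install `new` only if the old value == expected."""
        old = self._value
        if old is expected:
            self._value = new
        return old

reg = CASRegister()

def decide(value):
    old = reg.compare_and_swap(UNSET, value)
    return value if old is UNSET else old   # first proposer wins; others adopt it

print(decide(10))    # 10
print(decide(42))    # 10: every later process decides the already-installed value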


NONBLOCKING UNIVERSAL ALGORITHM:

Universality of Consensus Objects

An object is defined to be universal if that object along with read/write registers can simulate
any other object in a wait-free manner. In any system containing up to k processes, an object
X such that CN(X) = k is universal.

For any system with up to k processes, the universality of objects X with consensus number k
is shown by giving a universal algorithm to wait-free simulate any object using objects of type
X and read/write registers.

This is shown in two steps.

1 A universal algorithm to wait-free simulate any object whatsoever using


read/write registers and arbitrary k-processor consensus objects is given. This is the
main step.

2 Then, the arbitrary k-process consensus objects are simulated with objects of type
X, having consensus number k. This trivially follows after the first step.

Any object X with consensus number k is universal in a system with n ≤ k processes.

A nonblocking operation, in the context of shared memory operations, is an operation that


may not complete itself but is guaranteed to complete at least one of the pending operations
in a finite number of steps.

Nonblocking Universal Algorithm:

The linked list stores the linearized sequence of operations and states following each
operation.

Operations to the arbitrary object Z are simulated in a nonblocking way using an arbitrary
consensus object (the field op.next in each record) which is accessed via the Decide call.

Each process attempts to thread its own operation next into the linked
list.

There are as many universal objects as there are operations to thread.

A single pointer/counter cannot be used instead of the array Head, because reading and updating the pointer cannot be done atomically in a wait-free manner.

Linearization of the operations is given by the sequence number. As the algorithm is nonblocking, some pending operation always gets threaded in a finite number of steps, although an individual process may be starved.
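The description above can be pictured with the following simplified, single-address-space sketch (Record, Consensus, Head, apply, and linearized are assumed names, after Herlihy's construction, and the Consensus object's decide call is treated as atomic): each process repeatedly tries to thread its own record after the most recent record it can see, and the per-record consensus object decides the unique successor.

class Consensus:
    """A consensus object: decide() returns the first value ever proposed."""
    def __init__(self):
        self.winner = None
    def decide(self, value):                      # assumed atomic
        if self.winner is None:
            self.winner = value
        return self.winner

class Record:
    def __init__(self, operation):
        self.operation = operation                # operation on the simulated object Z
        self.seq = 0                              # 0 until threaded into the list
        self.next = Consensus()                   # decides the successor record

N = 3                                             # number of processes (assumption)
anchor = Record(None); anchor.seq = 1             # sentinel record starting the list
Head = [anchor] * N                               # Head[i]: latest record known to process i

def apply(i, operation):
    """Process i threads `operation` into the linearized sequence (nonblocking)."""
    prefer = Record(operation)
    while prefer.seq == 0:
        head = max(Head, key=lambda r: r.seq)     # most recent record seen by anyone
        winner = head.next.decide(prefer)         # try to thread my record next
        winner.seq = head.seq + 1                 # sequence number = linearization order
        Head[i] = winner                          # help by publishing the new record
    return prefer.seq

def linearized(start):
    """Replay the decided chain to recover the operation sequence."""
    rec, ops = start, []
    while rec.next.winner is not None:
        rec = rec.next.winner
        ops.append(rec.operation)
    return ops

apply(0, "enq(5)")
apply(1, "deq()")
print(linearized(anchor))    # ['enq(5)', 'deq()']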
