DC Unit4
1. In-transit messages: messages that have been sent but not yet received.
2. Lost messages: messages whose send is not undone but whose receive is undone due to rollback.
3. Delayed messages: messages whose receive is not recorded because the receiving process was either down or the message arrived after rollback.
4. Orphan messages: messages whose receive is recorded but whose send is not recorded.
In-transit messages
In the figure, the global state {C1,8, C2,9, C3,8, C4,8} shows that message m1 has been sent but
not yet received. We call such a message an in-transit message. Message m2 is also an in-transit
message.
Delayed messages
Messages whose receive is not recorded because the receiving process was either down or the
message arrived after the rollback of the receiving process are called delayed messages. For
example, messages m2 and m5 in the figure are delayed messages.
Lost messages
Messages whose send is not undone but whose receive is undone due to rollback are called lost messages.
This type of message occurs when the process rolls back to a checkpoint prior to the reception of the
message, while the sender does not roll back beyond the send operation of the message. In the figure,
message m1 is a lost message.
Duplicate messages
Duplicate messages arise due to message logging and replaying during process
recovery. For example, in the figure, message m4 was sent and received before the
rollback. However, due to the rollback of process P4 to C4,8 and of process P3 to C3,8,
both the send and the receipt of message m4 are undone.
When process P3 restarts from C3,8, it will resend message m4.
Therefore, P4 should not replay message m4 from its log.
If P4 replays message m4, then message m4 is called a duplicate message.
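The classification above can be expressed as a simple decision rule. The following Python sketch is illustrative only (the flags and function name are assumptions, not part of the notes): given whether a message's send and receive events survive the rollback, and whether the message had already been delivered before the rollback, it labels the message.

def classify_message(send_survives, receive_survives, delivered_before_rollback):
    """Classify one message with respect to the restored global state.

    send_survives    -- the send event is still part of the restored state
    receive_survives -- the receive event is still part of the restored state
    delivered_before_rollback -- the message had reached its receiver before
                                 the receiver rolled back (or went down)
    """
    if not send_survives and receive_survives:
        return "orphan"        # receive recorded, but the send was undone
    if send_survives and not receive_survives:
        if delivered_before_rollback:
            return "lost"      # receive undone while the send is not undone
        return "in-transit"    # sent but not yet received (delayed if it
                               # arrives only after the receiver's rollback)
    if send_survives and receive_survives:
        return "normal"        # replaying it from a log while the sender
                               # resends it would make it a duplicate
    return "undone"            # both send and receive were rolled back

# m1 in the figure: send recorded, receive undone, already delivered => lost
print(classify_message(True, False, True))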
4.3 Issues in failure recovery
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery.
The computation comprises three processes Pi, Pj, and Pk, connected through a communication
network. The processes communicate solely by exchanging messages over fault-free, FIFO
communication channels.
1. Uncoordinated Checkpointing
Each process has autonomy in deciding when to take checkpoints
Advantages
Lower runtime overhead during normal execution
Disadvantages
1. Domino effect during a recovery
2. Recovery from a failure is slow because processes need to iterate to find a
consistent set of checkpoints
3. Each process maintains multiple checkpoints and must periodically invoke a
garbage collection algorithm
4. Not suitable for applications with frequent output commits
The processes record the dependencies among their checkpoints caused by message
exchange during failure-free operation.
When a failure occurs, the recovering process initiates rollback by broadcasting a
dependency-request message to collect the dependency information maintained by each process.
When a process receives this message, it stops its execution and replies with the
dependency information saved on the stable storage as well as with the dependency
information, if any, associated with its current state.
The initiator then calculates the recovery line based on the global dependency information
and broadcasts a rollback request message containing the recovery line.
Upon receiving this message, a process whose current state belongs to the recovery line
simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
the recovery line.
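The recovery-line calculation can be pictured as an iterated rollback over the collected dependency information. Below is a minimal Python sketch with an assumed data layout (checkpoint indices per process and a list of recorded message dependencies); it illustrates the idea, not the notes' exact algorithm.

def recovery_line(latest, deps):
    """Compute a consistent recovery line by repeatedly undoing orphans.

    latest -- dict: process -> index of its most recent surviving checkpoint
    deps   -- list of (sender, s_ckpt, receiver, r_ckpt) tuples, meaning a
              message whose send is first recorded by the sender's checkpoint
              s_ckpt and whose receive is first recorded by the receiver's
              checkpoint r_ckpt
    """
    line = dict(latest)
    changed = True
    while changed:
        changed = False
        for sender, s_ckpt, receiver, r_ckpt in deps:
            receive_recorded = line[receiver] >= r_ckpt
            send_recorded = line[sender] >= s_ckpt
            if receive_recorded and not send_recorded:
                # Orphan message: roll the receiver back before the receive.
                line[receiver] = r_ckpt - 1
                changed = True
    return line

# P2's checkpoint 3 records a receive whose send is only recorded by P1's
# checkpoint 4, but P1 only has checkpoints up to 3 -> P2 must roll back.
print(recovery_line({"P1": 3, "P2": 3}, [("P1", 4, "P2", 3)]))   # {'P1': 3, 'P2': 2}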
2. Coordinated Checkpointing
In coordinated checkpointing, processes orchestrate their checkpointing activities so that all
local checkpoints form a consistent global state
Types
1. Blocking Checkpointing: After a process takes a local checkpoint, to prevent orphan
messages, it remains blocked until the entire checkpointing activity is complete
Disadvantages: The computation is blocked during the checkpointing
2. Non-blocking Checkpointing: The processes need not stop their execution while taking
checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process
from receiving application messages that could make the checkpoint inconsistent.
Example (a) : Checkpoint inconsistency
Message m is sent by P0 after receiving a checkpoint request from the checkpoint
coordinator
Assume m reaches P1 before the checkpoint request
This situation results in an inconsistent checkpoint, since P1's checkpoint shows the
receipt of message m from P0, while P0's checkpoint does not show m being sent from
P0
Example (b) : A solution with FIFO channels
If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint
message on each channel by a checkpoint request, forcing each process to take a checkpoint
before receiving the first post-checkpoint message
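A rough Python sketch of that FIFO-channel fix follows (the message kinds and class layout are assumptions for illustration): after checkpointing, a process pushes a checkpoint request onto every outgoing channel before any further application message, and a receiver that sees the request checkpoints before delivering whatever follows it.

class Process:
    def __init__(self, name, out_channels):
        self.name = name
        self.out = out_channels          # dict: destination -> FIFO list
        self.checkpointed = False

    def take_checkpoint(self):
        self.checkpointed = True
        print(f"{self.name}: checkpoint taken")
        # Precede the first post-checkpoint message on each channel
        # by a checkpoint request.
        for channel in self.out.values():
            channel.append(("CKPT_REQUEST", None))

    def send(self, dest, payload):
        self.out[dest].append(("APP", payload))

    def receive(self, msg):
        kind, payload = msg
        if kind == "CKPT_REQUEST":
            if not self.checkpointed:
                # Forced to checkpoint before the first post-checkpoint message.
                self.take_checkpoint()
        else:
            print(f"{self.name}: delivered {payload}")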
Algorithm
The algorithm consists of two phases. During the first phase, the checkpoint initiator
identifies all processes with which it has communicated since the last checkpoint and sends
them a request.
Upon receiving the request, each process in turn identifies all processes it has
communicated with since the last checkpoint and sends them a request, and so on, until
no more processes can be identified.
During the second phase, all processes identified in the first phase take a checkpoint. The
result is a consistent checkpoint that involves only the participating processes.
In this protocol, after a process takes a checkpoint, it cannot send any message until the
second phase terminates successfully, although receiving a message after the checkpoint
has been taken is allowable.
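The first phase amounts to a transitive closure over the "has communicated since the last checkpoint" relation. Here is a small Python sketch of that closure, with an assumed data structure (a map from each process to its communication partners):

def processes_to_checkpoint(initiator, communicated_with):
    """Return the set of processes that must join the coordinated checkpoint.

    communicated_with -- dict: process -> set of processes it has exchanged
                         messages with since its last checkpoint
    """
    involved = {initiator}
    frontier = [initiator]
    while frontier:                      # phase 1: the request spreads
        p = frontier.pop()
        for q in communicated_with.get(p, set()):
            if q not in involved:
                involved.add(q)
                frontier.append(q)
    return involved                      # phase 2: exactly these take checkpoints

# P0 talked to P1 and P1 talked to P2 since the last checkpoint; P3 stays out.
print(processes_to_checkpoint("P0", {"P0": {"P1"}, "P1": {"P2"}, "P3": set()}))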
3. Communication-induced Checkpointing
Communication-induced checkpointing is another way to avoid the domino effect, while allowing
processes to take some of their checkpoints independently. Processes may be forced to take
additional checkpoints
Two types of checkpoints
1. Autonomous checkpoints
2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that
a process is forced to take are called forced checkpoints.
Communication-induced checkpointing piggybacks protocol-related information on
each application message
The receiver of each application message uses the piggybacked information to determine
if it has to take a forced checkpoint to advance the global recovery line
The forced checkpoint must be taken before the application may process the contents of
the message
In contrast with coordinated checkpointing, no special coordination messages are
exchanged
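One common way to realize this is an index-based protocol: each process keeps a checkpoint sequence number, piggybacks it on every message, and takes a forced checkpoint before delivering any message that carries a higher number. The notes do not fix a particular protocol, so the following Python sketch is only illustrative:

class CICProcess:
    """Index-based communication-induced checkpointing (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.sn = 0                      # sequence number of the latest checkpoint

    def autonomous_checkpoint(self):
        self.sn += 1
        print(f"{self.name}: autonomous checkpoint #{self.sn}")

    def send(self, payload):
        # Piggyback protocol-related information on the application message.
        return {"payload": payload, "sn": self.sn}

    def receive(self, msg):
        if msg["sn"] > self.sn:
            # Forced checkpoint, taken before the message is processed,
            # which advances the global recovery line.
            self.sn = msg["sn"]
            print(f"{self.name}: forced checkpoint #{self.sn}")
        print(f"{self.name}: delivering {msg['payload']}")

p, q = CICProcess("P"), CICProcess("Q")
p.autonomous_checkpoint()                # P is now at sn = 1, Q still at sn = 0
q.receive(p.send("m"))                   # Q takes a forced checkpoint first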
Log-based rollback recovery
Suppose a set of processes crashes. A process p becomes an orphan when p itself does
not fail and p's state depends on a non-deterministic event e whose determinant
cannot be recovered from the stable storage or from the volatile memory of a surviving process.
Formally, the always-no-orphans condition can be stated as follows:
∀e: ¬Stable(e) ⇒ Depend(e) ⊆ Log(e)
Types
1. Pessimistic Logging
Pessimistic logging protocols assume that a failure can occur after any non-deterministic
event in the computation. However, in reality failures are rare
Pessimistic protocols implement the following property, often referred to as synchronous logging,
which is stronger than the always-no-orphans condition.
Synchronous logging
∀e: ¬Stable(e) ⇒ |Depend(e)| = 0
That is, if an event has not been logged on the stable storage, then no process can depend
on it.
Example:
Suppose processes P1 and P2 fail as shown, restart from checkpoints B and C, and roll forward
using their determinant logs to deliver again the same sequence of messages as in the pre-
failure execution
Once the recovery is complete, both processes will be consistent with the state of P0
that includes the receipt of message m7 from P1
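A minimal Python sketch of synchronous logging follows (the stable-storage interface is an assumption made for illustration): the determinant of each delivery is forced to stable storage before the message is handed to the application, so no process can ever depend on an unlogged event.

class PessimisticLogger:
    def __init__(self, stable_storage):
        self.stable = stable_storage     # e.g. a file-backed append-only log
        self.next_seq = 0

    def deliver(self, process, message, handler):
        # The determinant records everything needed to replay this delivery
        # in the same order during recovery.
        det = {"process": process, "seq": self.next_seq, "msg": message}
        self.stable.append(det)          # synchronous write ...
        self.stable.flush()              # ... forced to stable storage
        self.next_seq += 1
        handler(message)                 # only now may the application see it

class FakeStable(list):                  # stand-in for real stable storage
    def flush(self):
        pass

log = PessimisticLogger(FakeStable())
log.deliver("P1", "m7", handler=lambda m: print("delivered", m))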
2. Optimistic Logging
Consider the example shown in the figure. Suppose process P2 fails before the determinant for
m5 is logged to the stable storage. Process P1 then becomes an orphan process and must
roll back to undo the effects of receiving the orphan message m6. The rollback of P1
further forces P0 to roll back to undo the effects of receiving message m7.
Advantage: better performance in failure-free execution
Disadvantages:
coordination required on output commit
more complex garbage collection
Since determinants are logged asynchronously, output commit in optimistic logging
protocols requires a guarantee that no failure scenario can revoke the output. For example,
if process P0 needs to commit output at state X, it must log messages m4 and m7 to the
stable storage and ask P2 to log m2 and m5. In this case, if any process fails, the
computation can be reconstructed up to state X.
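That output-commit rule reduces to a simple check: before output is committed at a state, every determinant the state causally depends on must already be on stable storage, or must first be forced there. Below is a hedged Python sketch with invented names:

def can_commit_output(depends_on, stable_log):
    """True iff no failure scenario can revoke the output.

    depends_on -- set of determinant ids the committing state depends on
                  (e.g. {"m2", "m4", "m5", "m7"} for P0's state X)
    stable_log -- set of determinant ids already on stable storage
    """
    return depends_on <= stable_log

def commit_output(depends_on, stable_log, force):
    # Force the missing determinants first (asking their owners to log them),
    # and only then release the output to the outside world.
    for det in depends_on - stable_log:
        force(det)
        stable_log.add(det)
    assert can_commit_output(depends_on, stable_log)

commit_output({"m2", "m4", "m5", "m7"}, {"m2"}, force=lambda d: print("log", d))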
3. Causal Logging
Combines the advantages of both pessimistic and optimistic logging at the expense of a more
complex recovery protocol
Like optimistic logging, it does not require synchronous access to the stable storage except
during output commit
Like pessimistic logging, it allows each process to commit output independently and never
creates orphans, thus isolating processes from the effects of failures at other processes
It ensures that the always-no-orphans property holds
Each process maintains information about all the events that have causally affected its state
Consider the example in the figure. Messages m5 and m6 are likely to be lost on the failures
of P1 and P2 at the indicated instants.
Process P0 at state X will have logged the determinants of the non-deterministic events that
causally precede its state according to Lamport's happened-before relation.
These events consist of the delivery of messages m0, m1, m2, m3, and m4.
The determinant of each of these non-deterministic events is either logged on the stable
storage or is available in the volatile log of process P0.
The determinant of each of these events contains the order in which its original receiver
delivered the corresponding message.
Page 17 of 32
CS8603 DS
The message sender, as in sender-based message logging, logs the message content. Thus,
process P0 will be able to guide the recovery of P1 and P2, since it knows the
order in which P1 should replay messages m1 and m3 to reach the state from which P1 sent
message m4.
Similarly, P0 has the order in which P2 should replay message m2 to be consistent with
both P0 and P1.
The content of these messages is obtained from the sender log of P0 or regenerated
deterministically during the recovery of P1 and P2.
Note that information about messages m5 and m6 is lost due to failures. These messages
may be resent after recovery possibly in a different order.
However, since they did not causally affect the surviving process or the outside world, the
resulting state is consistent.
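A rough Python sketch of the causal-logging idea (the data layout is invented for illustration): every message piggybacks the determinants of the not-yet-stable events that causally precede it, so each receiver accumulates what it needs to guide the senders' replay after a failure.

class CausalLogger:
    def __init__(self, name):
        self.name = name
        self.volatile = {}               # determinant id -> determinant (not yet stable)

    def new_event(self, det_id, determinant):
        # A local non-deterministic event, e.g. the order of a message delivery.
        self.volatile[det_id] = determinant

    def send(self, payload):
        # Piggyback every determinant that causally precedes this send
        # and is not yet known to be on stable storage.
        return {"payload": payload, "dets": dict(self.volatile)}

    def receive(self, msg):
        # Merge the piggybacked determinants into the local volatile log, so
        # this process can later guide the recovery of the failed sender.
        self.volatile.update(msg["dets"])
        return msg["payload"]

p1, p0 = CausalLogger("P1"), CausalLogger("P0")
p1.new_event("m1@P1", "P1 delivered m1 first")
p0.receive(p1.send("m4"))                # P0 now holds the determinant of m1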
Koo-Toueg Coordinated Checkpointing Algorithm
First Phase
1. An initiating process Pi takes a tentative checkpoint and requests all other processes to take
tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative
checkpoint.
2. A process says "no" to a request if it fails to take a checkpoint, which could be due to
several reasons depending upon the underlying application.
3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides
that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the
tentative checkpoints should be thrown away.
Second Phase
1. Pi informs all the processes of the decision it reached at the end of the first phase.
2. A process, on receiving the message from Pi will act accordingly.
3. Either all or none of the processes advance the checkpoint by taking permanent
checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot
send messages related to the basic computation until it is informed of Pi's decision.
Correctness: for two reasons
i. Either all or none of the processes take permanent checkpoints
ii. No process sends a message after taking a permanent checkpoint
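A compact Python sketch of the two phases follows, with the message exchange collapsed into direct calls, so it shows the control flow only and is not a distributed implementation (the process interface is assumed):

def coordinated_checkpoint(processes):
    """Two-phase tentative/permanent checkpointing (control-flow sketch).

    processes -- dict: name -> object offering take_tentative(),
                 make_permanent() and discard_tentative()
    """
    # Phase 1: every process tries to take a tentative checkpoint.
    ok = all(p.take_tentative() for p in processes.values())

    # Phase 2: the initiator's decision is applied uniformly --
    # either all tentative checkpoints become permanent or none do.
    for p in processes.values():
        if ok:
            p.make_permanent()
        else:
            p.discard_tentative()
    return ok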
An Optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for
consistency. Since taking a checkpoint is an expensive operation, such unnecessary checkpoints should be avoided.
Optimization: it may not be necessary to roll back all the processes, since some of the processes did not change
anything
In the event of a failure of process X, the above protocol will require
processes X, Y, and Z to restart from checkpoints x2, y2, and z2, respectively.
However, process Z need not roll back, because there has been no interaction between process Z and the
other two processes since the last checkpoint at Z.
Juang-Venkatesan Algorithm for Asynchronous Checkpointing and Recovery
The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it has failed.
Procedure RollBack_Recovery
processor pi executes the following:
STEP (a)
if processor pi is recovering after a failure then
CkPti := latest event logged in the stable storage
else
CkPti := latest event that took place in pi {The latest event at pi can be either in stable or in
volatile storage.}
end if
STEP (b)
for k = 1 to N {N is the number of processors in the system} do
    for each neighboring processor pj do
        compute SENTi→j(CkPti)
        send a ROLLBACK(i, SENTi→j(CkPti)) message to pj
    end for
    for every ROLLBACK(j, c) message received from a neighbor j do
        if RCVDi←j(CkPti) > c {Implies the presence of orphan messages} then
            find the latest event e such that RCVDi←j(e) = c {Such an event e may be in the volatile
            or stable storage.}
            CkPti := e
        end if
    end for
end for {for k}
An Example
Consider the example shown in Figure 2, consisting of three processors. Suppose processor Y
fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y will restart from the
state corresponding to ey2.
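The orphan test in step (b) compares, per channel, how many messages the receiver has recorded as received against the SENT count carried by the neighbour's ROLLBACK message. Below is a small Python sketch of just that test (≤ is used so the search also succeeds when no event has exactly c receives):

def rollback_point(events, rcvd_counts, c):
    """Find the latest local event to roll back to, for one neighbour.

    events      -- this processor's events, oldest first
    rcvd_counts -- dict: event -> number of messages received from that
                   neighbour up to and including the event
    c           -- SENT count carried in the neighbour's ROLLBACK message
    """
    latest = None
    for e in events:
        if rcvd_counts[e] <= c:          # no orphan messages up to this event
            latest = e
    return latest

# Y recorded 2 receives from X by event ey3, but X only sent 1 before failing.
print(rollback_point(["ey1", "ey2", "ey3"],
                     {"ey1": 0, "ey2": 1, "ey3": 2}, c=1))       # -> "ey2"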
f < n processes
Byzantine agreement attainable agreement not attainable
Agreement: All non-faulty processes must agree on the same (single) value.
Validity: If all the non-faulty processes have the same initial value, then the agreed upon
value by all the non-faulty processes must be that same value.
The overhead bounds are for the given algorithms, and not necessarily tight bounds for
the
problem.
Validity: If the source process is non-faulty, then the agreed upon value by all the non-
faulty processes must be the same as the initial value of the source.
Each phase has a unique "phase king" derived, say, from PID. Each phase has two rounds:
1. In the first round, each process sends its estimate to all other processes.
2. In the second round, the "phase king" process arrives at an estimate based on the values
it received in the first round, and broadcasts its new estimate to all others.
The algorithm requires f + 1 phases and (f + 1)[(n - 1)(n + 1)] messages, and can tolerate up to f < ⌈n/4⌉
malicious processes.
Correctness Argument
In the phase in which the phase king Pk is non-malicious, consider any two non-faulty processes Pi and Pj. There are three cases:
1. Pi and Pj both use their own majority values (both Pi's and Pj's mult > n/2 + f).
2. Pi uses its majority value, while Pj uses the phase king's tie-breaker value
(Pi's mult > n/2 + f, and Pj's mult > n/2 for the same value).
3. Pi and Pj both use the phase king's tie-breaker value (in the phase in which Pk is
non-malicious, it sends the same value to Pi and Pj).
In all three cases, Pi and Pj end up with the same value as their estimate.
If all non-malicious processes have the value x at the start of a phase, they will
continue to have x as the consensus value at the end of the phase.
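A minimal, single-machine Python sketch of one phase (two rounds) of the phase-king algorithm follows. All message exchange is simulated with plain dictionaries and every process is assumed non-faulty, so it only illustrates the estimate/tie-breaker rule, not Byzantine behaviour:

from collections import Counter

def phase_king_phase(estimates, king, n, f):
    """One phase of the phase-king algorithm over the current estimates.

    estimates -- dict: process id -> current estimate (e.g. 0 or 1)
    king      -- id of this phase's "phase king"
    """
    # Round 1: each process broadcasts its estimate and tallies what it hears.
    # (Fault-free sketch: every process sees the same multiset of values.)
    tallies = {p: Counter(estimates.values()).most_common(1)[0] for p in estimates}

    # Round 2: the phase king broadcasts its own majority value as tie-breaker.
    king_value, _ = tallies[king]

    new_estimates = {}
    for p, (value, mult) in tallies.items():
        # Keep the own majority value only if its multiplicity exceeds n/2 + f;
        # otherwise adopt the phase king's tie-breaker value.
        new_estimates[p] = value if mult > n / 2 + f else king_value
    return new_estimates

print(phase_king_phase({"P1": 1, "P2": 1, "P3": 0, "P4": 1, "P5": 1},
                       king="P2", n=5, f=1))     # all estimates become 1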
ε-Agreement: All non-faulty processes must make a decision, and the values decided
upon by any two non-faulty processes must be within range ε of each other.
Validity: If a non-faulty process Pi decides on some value vi, then that value must be
within the range of values initially proposed by the processes.
Termination: Each non-faulty process must eventually decide on a value. The algorithm
for the message-passing model assumes n ≥ 5f + 1, although the problem is solvable for n
> 3f + 1.
It is not possible to go from a bivalent to a univalent state if even a single failure is allowed.
Memory that is stronger than atomic read/write memory can be used to design wait-free
consensus algorithms. Such memory would need corresponding access
primitives.
Are there objects (with supporting operations) using which there is a wait-free (i.e., (n - 1)-
crash-resilient) algorithm for reaching consensus in an n-process system? Yes, e.g., Test&Set,
Swap, and Compare&Swap. The crash failure model requires the solutions to be wait-free.
An object is defined to be universal if that object along with read/write registers can simulate
any other object in a wait-free manner. In any system containing up to k processes, an object
X such that CN(X) = k is universal.
For any system with up to k processes, the universality of objects X with consensus number k
is shown by giving a universal algorithm to wait-free simulate any object using objects of type
X and read/write registers. This is done in two steps:
1. First, a universal algorithm is given to wait-free simulate any object using arbitrary
consensus objects and read/write registers.
2. Then, the arbitrary k-process consensus objects are simulated with objects of type
X, having consensus number k. This trivially follows after the first step.
The linked list stores the linearized sequence of operations and states following each
operation.
Operations on the arbitrary object Z are simulated in a non-blocking way using an arbitrary
consensus object (the field op.next in each record), which is accessed via the Decide call.
Each process attempts to thread its own operation next into the linked
list.
A single pointer/counter cannot be used instead of the array Head, because reading
and updating the pointer cannot be done atomically in a wait-free manner.
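A highly simplified Python sketch of that threading step follows (names and structure are invented for illustration; it shows only how a per-slot consensus object decides which operation is appended next, not the full wait-free construction with the Head array):

class ConsensusObject:
    """One-shot consensus: the first proposed value wins (sketch only)."""
    def __init__(self):
        self._decided = None

    def decide(self, proposal):
        if self._decided is None:
            self._decided = proposal
        return self._decided

class UniversalObject:
    def __init__(self, initial_state, apply_op):
        self.log = []                    # linearized sequence of operations
        self.state = initial_state
        self.apply_op = apply_op
        self.next_slot = ConsensusObject()

    def invoke(self, op):
        # Each process proposes its own operation for the next slot; the
        # consensus object (the op.next field above) picks a single winner,
        # which every process appends and applies, keeping the logs identical.
        winner = self.next_slot.decide(op)
        self.log.append(winner)
        self.state = self.apply_op(self.state, winner)
        self.next_slot = ConsensusObject()
        return winner is op              # True if our own operation was threaded

counter = UniversalObject(0, lambda state, op: state + op)
counter.invoke(5)
counter.invoke(2)
print(counter.log, counter.state)        # [5, 2] 7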