CS3551 Unit 2 Lecture Notes

The document discusses the concepts of logical time and global state in distributed systems, focusing on clock synchronization, causality, and the implementation of logical clocks. It covers the Network Time Protocol (NTP) for clock synchronization, the significance of causality in distributed algorithms, and the differences between scalar and vector clocks. Key learning outcomes include understanding the role of clock offset, the challenges of concurrency measurement, and the architecture of logical clocks.

Uploaded by

hariprasad.r.ad

UNIT II Logical Time and Global State:

Logical time: Physical Clock Synchronization: NTP- A Framework for a System of Logical
Clocks – Scalar Time – Vector Time: Message Ordering and Group Communication:
Message Ordering Paradigms – Asynchronous Execution with Synchronous Communication
– Synchronous Program Order on Asynchronous System – Group Communication – Causal
Order – Total Order; Global State and Snapshot Recording Algorithms : Introduction –
System Model and Definitions- Snapshot Algorithms for FIFO channels.

Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.1

Topic Logical Time: Physical Clock Synchronization

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: Describe the role of clock offset and delay estimation in the Network Time Protocol (NTP). (K1)

LO2: Explain how knowledge of causality among events helps in measuring concurrency in a distributed system. (K1)

LO3: Explain why it is not feasible to use a global physical clock in a distributed system. (K1)

LO4: Discuss the significance of causality in distributed systems and how it influences the design of distributed algorithms. Provide examples of how causality can be applied in mutual exclusion, deadlock detection, and maintaining consistency in replicated databases. (K2)

Logical Time:

The concept of causality between events is fundamental to the design and analysis of parallel and distributed computing and operating systems. Causality is usually tracked using physical time; however, in a distributed system it is not possible to have a global physical clock.

Causality among events in a distributed system is a powerful concept for reasoning about, analyzing, and drawing inferences about a computation. Knowledge of the causal precedence relation among the events of processes helps solve a variety of problems in distributed systems.

Examples of such problems:

1) Distributed algorithms design


2) Tracking of dependent events

3) Knowledge about the progress

4) Concurrency measure.

Distributed Algorithms Design:

The knowledge of the causal precedence relation among events helps ensure liveness and fairness in mutual exclusion algorithms, helps maintain consistency in replicated databases, and helps design correct deadlock detection algorithms that avoid phantom and undetected deadlocks.

Tracking of Dependent events:

In distributed debugging, the knowledge of the causal dependency among events helps construct a consistent state for resuming re-execution; in failure recovery, it helps build a checkpoint; in replicated databases, it aids in the detection of file inconsistencies in case of a network partitioning.

Knowledge about the Progress:

The knowledge of the causal dependency among events helps measure the progress
of processes in the distributed computation. This is useful in discarding obsolete
information, garbage collection, and termination detection.

Concurrency Measure:

The knowledge of how many events are causally dependent is useful in measuring
the amount of concurrency in a computation. All events that are not causally related can be
executed concurrently. Thus, an analysis of the causality in a computation gives an idea of
the concurrency in the program.

PHYSICAL CLOCK SYNCHRONIZATION: NETWORK TIME PROTOCOL (NTP)

Centralized systems do not need clock synchronization because they work under a common clock. Distributed systems, however, have no common clock: each system functions based on its own internal clock and its own notion of time. Time in distributed systems is measured in the following contexts:

• The time of day at which an event happened on a specific machine in the network.
• The time interval between two events that happened on different machines in the network.
• The relative ordering of events that happened on different machines in the network.

Clock synchronization is the process of ensuring that physically distributed processors have a common notion of time.

Due to different clock rates, the clocks at various sites may diverge with time, and clock synchronization must be performed periodically to correct this clock skew in distributed systems. Clocks are synchronized to an accurate real-time standard like UTC (Coordinated Universal Time). Clocks that must not only be synchronized with each other but also have to adhere to physical time are termed physical clocks. This degree of synchronization additionally enables the coordination and scheduling of actions between multiple computers connected to a common network.

Basic terminologies:

If Ca and Cb are two different clocks, then:

Time: The time of a clock in a machine p is given by the function Cp(t),where Cp(t)= t for
a perfect clock.

Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of
clock Ca is Ca’(t).

Offset: Clock offset is the difference between the time reported by a clock and the real time. The offset of the clock Ca is given by Ca(t) − t. The offset of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).

Skew: The skew of a clock is the difference in the frequencies of the clock and the perfect
clock. The skew of a clock Ca relative to clock Cb at time t is Ca’(t)- Cb’(t).

Drift (rate): The drift of clock Ca is the second derivative of the clock value with respect to time, i.e., Ca’’(t). The drift of clock Ca relative to clock Cb at time t is Ca’’(t) − Cb’’(t).
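The definitions above can be illustrated with a small numerical sketch. The clock model and all coefficient values below are invented for illustration: a clock is modeled as a quadratic function of real time, so that offset, skew, and drift correspond to the value difference, first-derivative difference, and second derivative, respectively.

```python
# Toy model of an imperfect clock C(t) = 0.5*drift*t^2 + rate*t + c0.
# All coefficients are made-up example values, not measured data.

def make_clock(c0, rate, drift):
    """Return (C, C', C''): clock value, frequency, and drift functions."""
    value = lambda t: 0.5 * drift * t * t + rate * t + c0
    freq = lambda t: drift * t + rate
    return value, freq, (lambda t: drift)

Ca, Ca1, Ca2 = make_clock(c0=2.0, rate=1.001, drift=0.0)   # fast but steady clock
Cb, Cb1, Cb2 = make_clock(c0=0.0, rate=1.0, drift=0.0)     # perfect clock: Cb(t) = t

t = 100.0
offset_a = Ca(t) - t                 # offset of Ca w.r.t. real time: Ca(t) - t
relative_offset = Ca(t) - Cb(t)      # offset of Ca relative to Cb
skew_ab = Ca1(t) - Cb1(t)            # skew of Ca relative to Cb: Ca'(t) - Cb'(t)
print(offset_a, relative_offset, skew_ab)
```

Since Cb here is a perfect clock, the offset relative to Cb coincides with the offset relative to real time, as the definitions predict.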

Clocking Inaccuracies

Physical clocks are synchronized to an accurate real-time standard like UTC (Coordinated Universal Time). Due to the clock inaccuracies discussed above, a timer (clock) is said to be working within its specification if

1 − ρ ≤ dC/dt ≤ 1 + ρ

where ρ is the maximum skew rate.

1. Offset delay estimation

NTP is a time service for the Internet: it synchronizes clients to UTC, and it achieves reliability through redundant servers and paths, scalability, and authentication of time sources.

Architecture: The design of NTP involves a hierarchical tree of time servers. The primary server at the root synchronizes with UTC. The next level contains secondary servers, which act as backups to the primary server. At the lowest level is the synchronization subnet, which contains the clients.
2. Clock offset and delay estimation

A source node cannot accurately estimate the local time on the target node due to varying message and network delays between the nodes. NTP therefore employs the common practice of performing several trials and choosing the trial with the minimum delay.

Fig 1.24: Behavior of clocks
Fig 1.30 a): Offset and delay estimation between processes from the same server
Fig 1.30 b): Offset and delay estimation between processes from different servers
Let T1, T2, T3, T4 be the values of the four most recent timestamps, and assume clocks A and B are stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network delay difference from A to B and from B to A, called the differential delay, is small, the clock offset θ and round-trip delay δ of B relative to A at time T4 are approximately given by:

θ = (a + b)/2,   δ = a − b

Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is determined upon arrival. Thus, both peers A and B can independently calculate delay and offset using a single bidirectional message stream.
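The computation above can be sketched as follows. This is an illustrative sketch, not the actual NTP implementation: the timestamp roles assumed here (consistent with the formulas in the text) are that A sends a request at T3, B receives it at T1, B replies at T2, and A records T4 on arrival; the sample clock values are invented.

```python
# Sketch of the NTP-style offset/delay estimate from the text:
# a = T1 - T3, b = T2 - T4, theta = (a + b)/2, delta = a - b.

def offset_delay(T1, T2, T3, T4):
    a = T1 - T3
    b = T2 - T4
    return (a + b) / 2, a - b      # (offset theta, round-trip delay delta)

def best_estimate(trials):
    # Keep the trial with minimum round-trip delay: it is least affected
    # by queuing variation, as the protocol practice described above.
    return min((offset_delay(*t) for t in trials), key=lambda od: od[1])

# Simulated trials: B's clock is 5.0 s ahead of A's; network delays vary.
trials = [
    # (T1, T2, T3, T4) = (B recv, B send, A send, A recv)
    (105.2, 105.3, 100.0, 100.5),  # symmetric 0.2 s delay each way
    (205.6, 205.7, 200.0, 200.8),  # asymmetric delays skew the estimate
]
theta, delta = best_estimate(trials)
print(theta, delta)   # the symmetric trial wins: offset ~5.0, delay ~0.4
```

Note how the asymmetric trial would have over-estimated the offset; choosing the minimum-delay trial filters it out.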

Assessment questions to the lecture

1. Why is the concept of causality important in distributed systems (DS)? (K1)
a) It helps in synchronizing physical clocks across the network.
b) It assists in reasoning, analyzing, and drawing inferences about computations.
c) It provides a common notion of physical time in distributed systems.
d) It eliminates the need for distributed algorithms.
Answer: b) It assists in reasoning, analyzing, and drawing inferences about computations.

2. What is the purpose of clock synchronization in distributed systems? (K1)
a) To ensure each system has a unique notion of time.
b) To provide a common notion of time across distributed processors.
c) To measure the time of day on each individual machine.
d) To avoid the need for distributed algorithms.
Answer: b) To provide a common notion of time across distributed processors.

3. Which of the following problems can be addressed by understanding the causal precedence relation among events in distributed systems? (K1)
a) Memory management and paging.
b) Liveness and fairness in mutual exclusion algorithms.
c) Centralized system performance optimization.
d) Reducing network bandwidth.
Answer: b) Liveness and fairness in mutual exclusion algorithms.

4. In the context of physical clocks, what is meant by "offset"? (K1)
a) The rate at which a clock progresses.
b) The difference in the frequencies of two clocks.
c) The difference between the time reported by a clock and the real time.
d) The second derivative of the clock value with respect to time.
Answer: c) The difference between the time reported by a clock and the real time.

5. Which term refers to the difference in frequencies of two clocks at a specific time? (K1)
a) Drift.
b) Offset.
c) Skew.
d) Frequency.
Answer: c) Skew.

6. How does the Network Time Protocol (NTP) estimate clock offset and delay between two nodes? (K1)
a) By averaging the results of multiple trials and choosing the maximum delay.
b) By using a single trial to measure the clock offset directly.
c) By performing several trials and choosing the trial with the minimum delay.
d) By using the first trial only, as it is the most accurate.
Answer: c) By performing several trials and choosing the trial with the minimum delay.

Students have to prepare answers for the following questions at the end of the lecture

1. What is the role of clock offset and delay estimation in the Network Time Protocol (NTP)? (2 marks, CO2, K1)

2. How does the knowledge of concurrency among events help in measuring concurrency in a distributed system? (2 marks, CO2, K1)

3. Why is it not feasible to use a global physical clock in a distributed system? (2 marks, CO2, K1)

4. Discuss the significance of causality in distributed systems and how it influences the design of distributed algorithms. Provide examples of how causality can be applied in mutual exclusion, deadlock detection, and maintaining consistency in replicated databases. (13 marks, CO2, K2)

5. Describe the architecture of the Network Time Protocol (NTP) and its hierarchical design. How does this architecture contribute to the reliability, scalability, and accuracy of time synchronization in distributed systems? Include a discussion of how NTP handles clock offset and delay estimation. (13 marks, CO2, K2)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pp. 50, 78–81.
CS68603 DS

Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.2

Topic A Framework for a System of Logical Clocks, Scalar Time – Vector Time

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: State the primary purpose of Lamport's logical clock in distributed systems. (K1)

LO2: List the major limitations of scalar clocks in distributed systems. (K1)

LO3: Discuss the challenges and solutions in maintaining logical time in distributed systems, and how different logical clock systems (e.g., Lamport's scalar clock and vector clocks) address issues related to causality, event ordering, and consistency. (K2)

A Framework for a system of logical clocks

A system of logical clocks consists of a time domain T and a logical clock C. Elements of T
form a partially ordered set over a relation <. This relation is usually called the happened
before or causal precedence.

The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted C(e), such that for any two events ei and ej:

ei → ej ⟹ C(ei) < C(ej)

This monotonicity property is called the clock consistency condition. When T and C satisfy the stronger condition

ei → ej ⟺ C(ei) < C(ej)

then the system of clocks is said to be strongly consistent.

Implementing logical clocks

The two major issues in implementing logical clocks are:

Data structures: the representation of logical time local to each process.

Protocols: rules for updating the data structures so that the consistency conditions are preserved.

Meenakshi R
Data structures:

Each process pi maintains data structures with the given capabilities:

• A local logical clock (lci), that helps process pi measure its own progress.

• A logical global clock (gci), that is a representation of process pi’s local view of the logical

global time. It allows this process to assign consistent timestamps to its local events.

Protocol

The protocol ensures that a process’s logical clock, and thus its view of the global time, is
managed consistently with the following rules:

Rule 1: Decides the updates of the logical clock by a process. It controls send, receive and
other operations.

Rule 2: Decides how a process updates its global logical clock to update its view of the
global time and global progress. It dictates what information about the logical time is
piggybacked in a message and how this information is used by the receiving process to
update its view of the global time.

Scalar Time

Scalar time was designed by Lamport to order all the events in a distributed system. A Lamport logical clock is an incrementing counter maintained in each process. This logical clock has meaning only in relation to messages moving between processes: when a process receives a message, it resynchronizes its logical clock with the sender, thereby maintaining the causal relationship.

Lamport's algorithm can be captured in a few rules:

 All the process counters start with value 0.

 A process increments its counter for each event (internal event, message sending,
message receiving) in that process.

 When a process sends a message, it includes its (incremented) counter value with the
message.

 On receiving a message, the counter of the recipient is updated to the greater of its
current counter and the timestamp in the received message, and then incremented by
one.

If Ci is the local clock for process Pi, then:

• if a and b are two successive events in Pi, then Ci(b) = Ci(a) + d1, where d1 > 0;

• if a is the sending of message m by Pi, then m is assigned timestamp tm = Ci(a);

• if b is the receipt of m by Pj, then Cj(b) = max{Cj(b), tm + d2}, where d2 > 0.

Rules of Lamport’s clock

Rule 1: Ci(b) = Ci(a) + d1, where d1 > 0

Rule 2: The following actions are implemented when pi receives a message m with timestamp Cm:

a) Ci := max(Ci, Cm)

b) execute Rule 1

c) deliver the message
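The rules above (taking d1 = d2 = 1, the usual choice) can be sketched as a minimal scalar clock. The class name and the two-process scenario are illustrative, not from the text.

```python
# A minimal Lamport scalar clock following the rules above, with d1 = d2 = 1.

class LamportClock:
    def __init__(self):
        self.c = 0                 # all process counters start at 0

    def tick(self):                # internal event: Rule 1
        self.c += 1
        return self.c

    def send(self):                # increment, then attach timestamp to message
        self.c += 1
        return self.c

    def receive(self, tm):         # Rule 2: take max with timestamp, then Rule 1
        self.c = max(self.c, tm)
        self.c += 1
        return self.c

p1, p2 = LamportClock(), LamportClock()
p1.tick()                 # p1: internal event, clock becomes 1
tm = p1.send()            # p1 sends m carrying timestamp 2
p2.tick()                 # p2: internal event, clock becomes 1
r = p2.receive(tm)        # p2: max(1, 2) + 1 = 3, preserving causality
print(r)
```

The receive event gets timestamp 3 > 2, so the send causally precedes the receive in scalar time, as required by the consistency condition.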

Fig 1.20: Evolution of scalar time

Basic properties of scalar time:

1. Consistency property: Scalar clocks always satisfy monotonicity: a monotonic clock only increments its timestamp and never jumps backwards. Hence scalar time is consistent.

2. Total Ordering: Scalar clocks order the events in a distributed system, but events on different processes may carry identical timestamps. Hence a tie-breaking mechanism is essential to totally order the events. Ties are broken as follows:

Process identifiers are linearly ordered.

The process with the lower identifier value is given higher priority.

The timestamp of an event is the pair (t, i), where t is its time of occurrence and i is the identity of the process where it occurred. The total order relation ≺ on two events x and y with timestamps (h, i) and (k, j), respectively, is given by:

x ≺ y ⟺ h < k, or (h = k and i < j)

A total order is generally used to ensure liveness properties in distributed algorithms.
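The tie-breaking comparison described above amounts to lexicographic ordering of (t, i) pairs; a minimal sketch (the function name is hypothetical):

```python
# Total order on scalar timestamps (t, i): compare times first,
# break ties with the process identifier (lower id = higher priority).

def total_order_lt(x, y):
    h, i = x
    k, j = y
    return h < k or (h == k and i < j)

print(total_order_lt((3, 2), (3, 5)))   # same time: lower pid is ordered first
```

In Python this is exactly the built-in tuple comparison `(3, 2) < (3, 5)`; the explicit function just makes the two-step rule visible.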

3. Event Counting

If event e has a timestamp h, then h − 1 represents the minimum logical duration, counted in units of events, required before producing event e. This is called the height of the event e: h − 1 events have been produced sequentially before the event e, regardless of the processes that produced these events.

4. No strong consistency

Scalar clocks are not strongly consistent because the logical local clock and the logical global clock of a process are squashed into one, resulting in the loss of causal dependency information among events at different processes.

Vector Time

The ordering from Lamport's clocks is not enough to guarantee that if two events precede one
another in the ordering relation they are also causally related. Vector Clocks use a vector
counter instead of an integer counter. The vector clock of a system with N processes is a
vector of N counters, one counter per process.

Vector counters have to follow the following update rules:

 Initially, all counters are zero.

 Each time a process experiences an event, it increments its own counter in the vector
by one.

 Each time a process sends a message, it includes a copy of its own (incremented)
vector in the message.

 Each time a process receives a message, it increments its own counter in the vector by
one and updates each element in its vector by taking the maximum of the value in its
own vector counter and the value in the vector in the received message.

In vector time, the time domain is represented by a set of n-dimensional non-negative integer vectors.

Rules of Vector Time

Rule 1: Before executing an event, process pi updates its local logical time as follows:

vti[i] := vti[i] + d, where d > 0

Rule 2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On the receipt of such a message (m, vt), process pi executes the following sequence of actions:

1. update its global logical time: vti[k] := max(vti[k], vt[k]) for 1 ≤ k ≤ n

2. execute Rule 1

3. deliver the message m
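The update rules above (taking d = 1) can be sketched as a minimal vector clock. The class and the two-process scenario are illustrative; the comparison function implements the standard "less-than" test on vector timestamps.

```python
# A minimal vector clock for n processes, following the update rules above
# (increment own entry by 1 on each event; component-wise max on receive).

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n               # initially, all counters are zero

    def tick(self):                     # internal event
        self.vt[self.pid] += 1

    def send(self):                     # increment own entry, piggyback a copy
        self.vt[self.pid] += 1
        return list(self.vt)

    def receive(self, m_vt):            # own increment + component-wise max
        self.vt[self.pid] += 1
        self.vt = [max(a, b) for a, b in zip(self.vt, m_vt)]

def happened_before(vh, vk):
    # vh < vk: less-or-equal in every component, strictly less somewhere.
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
p0.tick()                 # p0: [1, 0]
m = p0.send()             # p0: [2, 0]; message carries [2, 0]
p1.receive(m)             # p1: max([0, 1], [2, 0]) = [2, 1]
e1 = list(p1.vt)
p0.tick()                 # p0: [3, 0], concurrent with p1's receive
e2 = list(p0.vt)
print(happened_before([2, 0], e1), happened_before(e1, e2))
```

The send [2, 0] happened before the receive [2, 1], while [2, 1] and [3, 0] are incomparable in both directions, i.e., concurrent.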

Basic properties of vector time

1. Isomorphism:

The relation "→" induces a partial order on the set of events that are produced by a distributed execution. If events x and y are timestamped as vh and vk, then:

x → y ⟺ vh < vk
x ∥ y ⟺ vh ∥ vk

There is an isomorphism between the set of partially ordered events produced by a distributed computation and their vector timestamps.

If the process at which an event occurred is known, the test to compare two timestamps can be simplified: if events x and y occurred at processes pi and pj and are timestamped vh and vk, respectively, then:

x → y ⟺ vh[i] ≤ vk[i]
x ∥ y ⟺ vh[i] > vk[i] and vh[j] < vk[j]
CS68603 DS
2. Strong consistency

The system of vector clocks is strongly consistent; thus, by examining the vector timestamp

of two events, we can determine if the events are causally related.

3. Event counting

If an event e has timestamp vh, vh[j] denotes the number of events executed by process pj
that causally precede e.

Vector clock ordering relation

For two vector timestamps vh and vk, where vh[i] denotes the component for process i:

vh ≤ vk ⟺ ∀i : vh[i] ≤ vk[i]
vh < vk ⟺ vh ≤ vk and ∃i : vh[i] < vk[i]
vh ∥ vk ⟺ ¬(vh < vk) and ¬(vk < vh)

Assessment questions to the lecture

1. What is the key property that must be satisfied by a system of logical clocks to be considered strongly consistent? (K1)
a) The clocks must be synchronized to physical time.
b) The clocks must maintain a total order of events.
c) The clocks must satisfy the clock consistency condition.
d) The clocks must have identical timestamps for all events.
Answer: c) The clocks must satisfy the clock consistency condition.

2. In Lamport's logical clock algorithm, what happens when a process receives a message with a timestamp? (K1)
a) The process resets its local clock to zero.
b) The process ignores the timestamp and uses its current clock value.
c) The process updates its clock to the maximum of its current clock value and the received timestamp, then increments it by one.
d) The process decreases its clock value to match the received timestamp.
Answer: c) The process updates its clock to the maximum of its current clock value and the received timestamp, then increments it by one.

3. What is the main limitation of Lamport's scalar time in distributed systems? (K1)
a) It cannot ensure the consistency property.
b) It fails to provide a total order of events.
c) It does not guarantee strong consistency and may lose causal dependency information.
d) It requires global synchronization of physical clocks.
Answer: c) It does not guarantee strong consistency and may lose causal dependency information.

4. How does a vector clock differ from a Lamport scalar clock in terms of event ordering? (K1)
a) A vector clock uses a single counter, while a scalar clock uses multiple counters.
b) A vector clock provides a total order of events, while a scalar clock does not.
c) A vector clock maintains partial ordering of events and preserves causal relationships, while a scalar clock does not.
d) A vector clock synchronizes with physical time, while a scalar clock does not.
Answer: c) A vector clock maintains partial ordering of events and preserves causal relationships, while a scalar clock does not.

5. What does the vector clock update rule require when a process receives a message with a vector timestamp? (K1)
a) The process increments all elements of its vector by one.
b) The process sets its vector to the received vector without modification.
c) The process updates each element in its vector by taking the maximum of its current value and the corresponding value in the received vector, then increments its own counter.
d) The process resets its vector to all zeros.
Answer: c) The process updates each element in its vector by taking the maximum of its current value and the corresponding value in the received vector, then increments its own counter.

6. Which property of vector clocks allows us to determine if two events are causally related? (K1)
a) Event counting.
b) Total ordering.
c) Strong consistency.
d) Clock skew.
Answer: c) Strong consistency.

7. In vector clocks, what does the vector element vh[j] represent in the timestamp of an event e? (K1)
a) The number of events that causally succeed e at process pj.
b) The number of events executed by process pj that causally precede e.
c) The total number of events in the system.
d) The global time when e occurred.
Answer: b) The number of events executed by process pj that causally precede e.

8. What is the purpose of the tie-breaking mechanism in Lamport's scalar clock system? (K1)
a) To ensure that all events have the same timestamp.
b) To maintain the global synchronization of physical clocks.
c) To order events with identical timestamps based on process identifiers.
d) To reset the clock when two events occur simultaneously.
Answer: c) To order events with identical timestamps based on process identifiers.

Students have to prepare answers for the following questions at the end of the lecture

1. What is the primary purpose of Lamport's logical clock in distributed systems? (2 marks, CO2, K1)

2. What is the major limitation of scalar clocks in distributed systems? (2 marks, CO2, K1)

3. What is the significance of the "isomorphism" property in vector clocks? (2 marks, CO2, K1)

4. Discuss the challenges and solutions in maintaining logical time in distributed systems. How do different logical clock systems (e.g., Lamport's scalar clock and vector clocks) address issues related to causality, event ordering, and consistency? (13 marks, CO2, K2)

5. Discuss the differences between scalar time and vector time in distributed systems. How do these two approaches handle the ordering of events, and what are the advantages and limitations of each? (13 marks, CO2, K1)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pp. 52–59.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.3

Topic MESSAGE ORDERING AND GROUP COMMUNICATION

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: Explain the purpose of the notation a ∼ b in the context of distributed systems. (K1)

LO2: State the significance of the notation mi in distributed systems communication. (K1)

LO3: Explain why internal events are not considered in message ordering definitions in distributed systems. (K1)

MESSAGE ORDERING AND GROUP COMMUNICATION

Inter-process communication via message passing is at the core of any distributed system. Multicasts are required at the application layer when superimposed topologies or overlays are used, as well as at the lower layers of the protocol stack.

Notation:

The distributed system is modeled as a graph (N, L). The following notation is used to refer to messages and events:

 When referring to a message without regard for the identity of the sender and receiver processes, we use mi. For message mi, its send and receive events are denoted as si and ri, respectively.

 More generally, send and receive events are denoted simply as s and r. When the relationship between the message and its send and receive events is to be stressed, we also use M, send(M), and receive(M), respectively.

For any two events a and b, where each can be either a send event or a receive event, the notation a ∼ b denotes that a and b occur at the same process, i.e., a ∈ Ei and b ∈ Ei for some process i. The send and receive events of a message are said to be a pair of corresponding events: the send event corresponds to the receive event, and vice versa. For a given execution E, the set of all send–receive event pairs is denoted T = {(s, r) ∈ Ei × Ej | s corresponds to r}.

When dealing with message ordering definitions, we consider only send and receive events, not internal events, because only communication events are relevant.

As distributed systems are networks of systems at various physical locations, the coordination between them must always be preserved. Message ordering refers to the order in which messages are delivered to the intended recipients. The common message ordering schemes are First-In First-Out (FIFO), non-FIFO, causal order, and synchronous order. In group communication with multicasting, causal and total ordering schemes are followed. It is also essential to define the behaviour of the system in case of failures. The following notations are widely used in this chapter:

 The distributed system is denoted by a graph (N, L).

 The set of events is represented by the event set (E, ≺).

 A message is denoted mi; its send and receive events are si and ri, respectively.

 send(M) and receive(M) indicate that message M is sent and received.

 a ∼ b denotes that a and b occur at the same process.

 The set of send–receive pairs is T = {(s, r) ∈ Ei × Ej | s corresponds to r}.

Assessment questions to the lecture

1. What does the notation (N, L) represent in the context of distributed systems? (K1)
A) The set of nodes and links in a distributed system
B) The set of events and messages in a distributed system
C) The set of processes and their events in a distributed system
D) The set of messages and their corresponding send and receive events
Answer: A) The set of nodes and links in a distributed system

2. What do si and ri represent in the context of message mi? (K1)
A) Send and receive events for message mi
B) Internal events in a process
C) The start and end of a process
D) The set of all events in a process
Answer: A) Send and receive events for message mi

3. What does the notation a ∼ b signify? (K1)
A) Events a and b occur in the same process
B) Events a and b are different send and receive events
C) Events a and b are internal events
D) Events a and b occur in different processes
Answer: A) Events a and b occur in the same process

4. Which of the following is NOT considered when dealing with message ordering definitions? (K1)
A) Send events
B) Receive events
C) Internal events
D) Communication events
Answer: C) Internal events

5. Which message ordering schemes are commonly used in group communication with multicasting? (K1)
A) FIFO and non-FIFO
B) Causal and total ordering
C) Asynchronous and synchronous
D) Internal and external ordering
Answer: B) Causal and total ordering

6. In the context of distributed systems, what does mi represent? (K1)
A) A specific process
B) A specific event
C) A message without regard to the identity of the sender and receiver
D) A network link between processes
Answer: C) A message without regard to the identity of the sender and receiver
Students have to prepare answers for the following questions at the end of the lecture
1. Explain the purpose of the notation a ∼ b in the context of distributed systems. (2 marks, CO2, K1)

2. What is the significance of the notation mi in distributed systems communication? (2 marks, CO2, K1)

3. Why are internal events not considered in message ordering definitions in distributed systems? (2 marks, CO2, K1)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pp. 189–190.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.4

Topic Message Ordering Paradigms

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: State the main difference between FIFO and non-FIFO message ordering in distributed systems. (K1)

LO2: Explain the purpose of using a sequence number in FIFO channels, and how it works. (K1)

LO3: Examine the importance of message ordering in group communication within distributed systems, and how different message ordering paradigms affect the reliability and consistency of group communication. (K2)

Message Ordering Paradigms

The order of delivery of messages in a distributed system is an important aspect of system executions because it determines the messaging behaviour that can be expected by the distributed program; distributed program logic greatly depends on this order of delivery. To simplify the task of the programmer, programming languages in conjunction with the middleware provide certain well-defined message delivery behaviours, and the programmer can then code the program logic with respect to this behaviour.

The message orderings are:

(i) non-FIFO
(ii) FIFO
(iii) causal order
(iv) synchronous order

There is always a trade-off between concurrency and ease of use and implementation.

Asynchronous Executions

An asynchronous execution (or A-execution) is an execution (E, ≺) for which the causality relation is a partial order.

In an A-execution, messages on a logical link may be delivered in any order, even non-FIFO. Although a physical link delivers the messages sent on it in FIFO order due to the physical properties of the medium, a logical link may be formed as a composite of physical links, and multiple paths may exist between the two end points of the logical link.


Fig 2.1: a) FIFO executions b) non-FIFO executions

FIFO executions

A FIFO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T, (s ~ s′ and r ~ r′ and s ≺ s′) ⇒ r ≺ r′.

On any logical link, messages are necessarily delivered in the order in which they are sent; a logical link that does not satisfy this property is non-FIFO.

FIFO logical channels can be realistically assumed when designing distributed algorithms, since most transport layer protocols follow connection-oriented service.

A FIFO logical channel can be created over a non-FIFO channel by using a separate
numbering scheme to sequence the messages on each logical channel.

The sender assigns and appends a <sequence_num, connection_id> tuple to each


message.

The receiver uses a buffer to order the incoming messages as per the sender’s
sequence numbers, and accepts only the “next” message in sequence.
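The sequencing scheme above can be sketched as follows. This is a minimal illustration; the class and method names are my own, not from the textbook.

```python
# Sketch of a FIFO logical channel over a non-FIFO channel (illustrative
# names). The sender numbers messages; the receiver buffers out-of-order
# arrivals and accepts only the "next" message in sequence.

class FifoReceiver:
    def __init__(self):
        self.expected = 0   # next sequence number to accept
        self.buffer = {}    # out-of-order messages, keyed by sequence number

    def on_arrival(self, seq_num, msg):
        """Buffer the arrival, then deliver every message now in sequence."""
        self.buffer[seq_num] = msg
        delivered = []
        while self.expected in self.buffer:
            delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return delivered

rx = FifoReceiver()
print(rx.on_arrival(1, "m1"))   # arrives early: buffered, nothing delivered
print(rx.on_arrival(0, "m0"))   # delivers m0, then the buffered m1
```

A separate `(sequence_num, connection_id)` space per logical channel keeps the scheme correct when one process holds several connections.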

Causally Ordered (CO) executions

A CO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T, (r ~ r′ and s ≺ s′) ⇒ r ≺ r′.

If two send events s and s′ are related by causality ordering (not physical time ordering), then a causally ordered execution requires that their corresponding receive events r and r′ occur in the same order at all common destinations.

If s and s′ are not related by causality, then CO is vacuously (trivially) satisfied.

Causal order is used in applications that update shared data, distributed shared memory, or fair resource allocation.

A message that arrives may be delayed before being given to the application for processing. The event of an application processing an arrived message is referred to as a delivery event.

In a CO execution, no message is overtaken by a chain of messages between the same (sender, receiver) pair.

If send(m1) ≺ send(m2), then for each common destination d of messages m1 and m2, deliverd(m1) ≺ deliverd(m2) must be satisfied.
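This delivery condition can be checked on a trace at one destination. A minimal sketch follows, using vector timestamps of the send events to decide causality; the trace representation here is an assumption for illustration, not part of any protocol in the text.

```python
# Sketch: checking the CO condition at one destination d. Each entry in the
# delivery log is the vector timestamp of the corresponding send event;
# send(m1) causally precedes send(m2) iff vt(m1) < vt(m2) componentwise.

def happens_before(vt1, vt2):
    return all(a <= b for a, b in zip(vt1, vt2)) and vt1 != vt2

def is_causally_ordered(delivery_log):
    """delivery_log lists send timestamps in the order delivered at d."""
    for i in range(len(delivery_log)):
        for j in range(i + 1, len(delivery_log)):
            # a message delivered later must not causally precede an earlier one
            if happens_before(delivery_log[j], delivery_log[i]):
                return False
    return True

# send(m1) ≺ send(m2), delivered in that order: CO holds
print(is_causally_ordered([(1, 0, 0), (1, 1, 0)]))   # True
# m2 delivered before its causal predecessor m1: CO violated
print(is_causally_ordered([(1, 1, 0), (1, 0, 0)]))   # False
```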

Other properties of causal ordering

1. Message Order (MO): A MO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T, s ≺ s′ ⇒ ¬(r′ ≺ r).

2. Empty-Interval (EI) execution: An execution (E, ≺) is an empty-interval (EI) execution if for each pair of events (s, r) ∈ T, the open interval set {x ∈ E | s ≺ x ≺ r} in the partial order is empty.

3. An execution (E, ≺) is CO if and only if for each pair of events (s, r) ∈ T and each event e ∈ E,

weak common past: e ≺ r ⇒ ¬(s ≺ e)

weak common future: s ≺ e ⇒ ¬(e ≺ r)

Synchronous Execution

When all the communication between pairs of processes uses synchronous send and receive primitives, the resulting order is the synchronous order. Synchronous communication always involves a handshake between the receiver and the sender, so the handshake events may appear to be occurring instantaneously and atomically.

The instantaneous communication property of synchronous executions requires a modified definition of the causality relation because, for each (s, r) ∈ T, the send event is not causally ordered before the receive event. The two events are viewed as being atomic and simultaneous, and neither event precedes the other.

Fig 2.2: a) Execution in an asynchronous system b) Equivalent synchronous communication

Causality in a synchronous execution: The synchronous causality relation << on E is the


smallest transitive relation that satisfies the following:

S1: If x occurs before y at the same process, then x << y.

S2: If (s, r) ∈ T, then for all x ∈ E, [(x << s ⇐⇒ x << r) and (s << x ⇐⇒ r << x)].

S3: If x<<y and y<<z, then x<<z.

Synchronous execution: A synchronous execution or S-execution is an execution (E, <<)


for which the causality relation << is a partial order.

Time stamping a synchronous execution: An execution (E, ≺) is synchronous if and only if


there exists a mapping from E to T (scalar timestamps) such that

for any message M, T(s(M)) = T(r(M))

for each process Pi, if ei ≺ ei′, then T(ei) < T(ei′).

Assessment questions to the lecture

1. Which of the following message ordering paradigms ensures that messages sent by the same process are received in the same order they were sent?

 a) Non-FIFO
 b) FIFO
 c) Causal order
 d) Synchronous order

Answer: b) FIFO (K1)

2. In which message ordering paradigm are messages received in an order that preserves the causal relationship between events?

 a) FIFO
 b) Non-FIFO
 c) Causal order
 d) Synchronous order

Answer: c) Causal order (K2)

3. Which of the following is NOT a property of causally ordered (CO) executions?

 a) If two send events are causally related, their corresponding receive events must occur in the same order at all common destinations.
 b) CO executions require that all messages between a sender and a receiver are delivered in FIFO order.
 c) If two send events are not causally related, the order of their corresponding receive events is irrelevant.
 d) CO is vacuously satisfied if there is no causal relationship between send events.

Answer: b) CO executions require that all messages between a sender and a receiver are delivered in FIFO order. (K1)

4. What is the primary characteristic of synchronous execution in distributed systems?

 a) Messages are delivered in FIFO order.
 b) Messages are received in a non-FIFO order.
 c) Communication involves a handshake, and events are viewed as occurring instantaneously and atomically.
 d) It ensures that all messages are causally ordered.

Answer: c) Communication involves a handshake, and events are viewed as occurring instantaneously and atomically. (K1)

5. Which of the following is true about asynchronous executions in distributed systems?

 a) They always maintain a causal relationship between events.
 b) Messages can be delivered in any order, including non-FIFO order.
 c) They require synchronous send and receive operations.
 d) They guarantee a total order of message delivery.

Answer: b) Messages can be delivered in any order, including non-FIFO order. (K1)

Students have to prepare answers for the following questions at the end of the lecture

1. What is the main difference between FIFO and non-FIFO message ordering in distributed systems? (2 marks, CO2, K1)

2. What is the purpose of using a sequence number in FIFO channels, and how does it work? (2 marks, CO2, K1)

3. Examine the importance of message ordering in group communication within distributed systems. How do different message ordering paradigms affect the reliability and consistency of group communication? (13 marks, CO2, K2)
Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pp. 190–195.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.5

Topic Asynchronous execution with synchronous communication

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: Define Realizable Synchronous Communication (RSC) and how it is related to an A-execution (K1)

LO2: Write the purpose of a "crown" in the context of message ordering, and why it is significant (K1)

LO3: Analyze the challenges of ensuring synchronous communication in asynchronous distributed systems; how Realizable Synchronous Communication (RSC) can be achieved, and what role the crown criterion plays in this context (K2)

LO4: Discuss the impact of non-FIFO message ordering on the design and verification of distributed algorithms, and what strategies can be employed to manage the complexity introduced by non-FIFO channels (K2)

Asynchronous execution with synchronous communication

When all the communication between pairs of processes uses synchronous send and receive primitives, the resulting order is the synchronous order. Algorithms designed to run on asynchronous systems will not necessarily work on synchronous systems, and vice versa.

Realizable Synchronous Communication (RSC)

An A-execution that can be realized under synchronous communication is called realizable with synchronous communication (RSC).

An execution can be modeled to give a total order that extends the partial order (E, ≺).

In an A-execution, the messages can be made to appear instantaneous if there exists a linear extension of the execution such that each send event is immediately followed by its corresponding receive event in this linear extension.

A non-separated linear extension of (E, ≺) is a linear extension of (E, ≺) such that for each pair (s, r) ∈ T, the interval {x ∈ E | s ≺ x ≺ r} is empty.

An A-execution (E, ≺) is an RSC execution if and only if there exists a non-separated linear extension of the partial order (E, ≺).

In the non-separated linear extension, if the adjacent send event and its corresponding

receive event are viewed atomically, then that pair of events shares a common past and a
common future with each other.

Crown

Let E be an execution. A crown of size k in E is a sequence <(si, ri), i ∈ {0, …, k−1}> of pairs of corresponding send and receive events such that: s0 ≺ r1, s1 ≺ r2, …, sk−2 ≺ rk−1, sk−1 ≺ r0.

For example, <(s1, r1), (s2, r2)> is a crown of size 2 if s1 ≺ r2 and s2 ≺ r1. A crown thus captures a cyclic dependency among messages. The crown criterion states that an A-computation is RSC, i.e., it can be realized on a system with synchronous communication, if and only if it contains no crown.
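The crown criterion can be tested mechanically: build a directed graph on the messages with an edge i → j whenever si ≺ rj, and look for a cycle. A small sketch follows; the graph encoding of the causality relation is an assumption made for illustration.

```python
# Sketch: the crown criterion as cycle detection. Edge i -> j means the
# send of message i causally precedes the receive of message j; the
# execution is RSC iff this graph is acyclic (no crown).

def has_crown(num_msgs, precedes):
    """precedes: set of (i, j) pairs with s_i ≺ r_j, i != j."""
    graph = {m: [] for m in range(num_msgs)}
    for i, j in precedes:
        graph[i].append(j)

    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * num_msgs

    def dfs(u):
        color[u] = GREY
        for v in graph[u]:
            if color[v] == GREY:          # back edge: cycle, i.e. a crown
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[m] == WHITE and dfs(m) for m in range(num_msgs))

# Crossed messages: s0 ≺ r1 and s1 ≺ r0 form a crown of size 2 (not RSC)
print(has_crown(2, {(0, 1), (1, 0)}))    # True
# A simple causal chain s0 ≺ r1 has no crown (RSC)
print(has_crown(2, {(0, 1)}))            # False
```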

Timestamp criterion for RSC execution

An execution (E, ≺) is RSC if and only if there exists a mapping from E to T (scalar timestamps) such that

for any message M, T(s(M)) = T(r(M))

for each pair (a, b) in (E × E) \ T, a ≺ b ⇒ T(a) < T(b)

Hierarchy of ordering paradigms

The orders of executions are:

Synchronous order (SYNC)

Causal order (CO)

FIFO Order(FIFO)

Non FIFO order (non-FIFO)

The execution classes are related as follows:

For an A-execution, A is RSC if and only if A is an S-execution.

RSC ⊂ CO ⊂ FIFO ⊂ A

This hierarchy is illustrated in Figure 2.3(a), and example executions of each class are shown side-by-side in Figure 2.3(b).

The above hierarchy implies that some executions belonging to a class X will not belong to any of the classes included in X. The degree of concurrency is highest in A and lowest in SYNC.

A program using synchronous communication is easiest to develop and verify.

A program using non-FIFO communication, resulting in an A-execution, is hardest to design and verify.

Fig 2.3: Hierarchy of execution classes

Simulations

The events in the RSC execution are scheduled as per some non-separated linear

extension, and adjacent (s, r) events in this linear extension are executed sequentially in the
synchronous system.

The partial order of the asynchronous execution remains unchanged.

If an A-execution is not RSC, then there is no way to schedule the events to make
them RSC, without actually altering the partial order of the given A-execution.

However, the following indirect strategy that does not alter the partial order can be
used.

Each channel Ci,j is modeled by a control process Pi,j that simulates the channel buffer.

An asynchronous communication from i to j becomes a synchronous communication


from i to Pi,j followed by a synchronous communication from Pi,j to j.

This enables the decoupling of the sender from the receiver, a feature that is essential in asynchronous systems.


Fig 2.4: Modeling channels as processes to simulate an execution using asynchronous


primitives on synchronous system

Synchronous programs on asynchronous systems

A (valid) S-execution can be trivially realized on an asynchronous system by

scheduling the messages in the order in which they appear in the S-execution.

The partial order of the S-execution remains unchanged but the communication
occurs on an asynchronous system that uses asynchronous communication primitives.

Once a message send event is scheduled, the middleware layer waits for
acknowledgment; after the ack is received, the synchronous send primitive completes.
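A minimal sketch of this ack-based scheme, using threads and queues as the asynchronous primitives, is shown below; the helper names are illustrative, not from the textbook.

```python
# Sketch: a blocking synchronous send realized with asynchronous primitives.
# The sender's middleware sends M and blocks until ack(M) arrives; the
# receiver's middleware delivers M and returns the acknowledgment.
import queue
import threading

def sync_send(chan, ack_chan, msg):
    chan.put(msg)                            # asynchronous send of M
    assert ack_chan.get() == ("ack", msg)    # block until ack(M): SEND completes

def receiver(chan, ack_chan, delivered):
    msg = chan.get()               # asynchronous receive of M
    delivered.append(msg)          # delivery event at the receiver
    ack_chan.put(("ack", msg))     # acknowledge back to the sender

chan, ack_chan, delivered = queue.Queue(), queue.Queue(), []
t = threading.Thread(target=receiver, args=(chan, ack_chan, delivered))
t.start()
sync_send(chan, ack_chan, "M")     # returns only after M has been delivered
t.join()
print(delivered)                   # ['M']
```

The partial order of the S-execution is preserved because each send completes only after the matching delivery has occurred.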

Assessment questions to the lecture

1. What is the primary characteristic of synchronous execution in distributed systems?

 a) Messages are delivered in FIFO order.
 b) Messages are received in a non-FIFO order.
 c) Communication involves a handshake, and events are viewed as occurring instantaneously and atomically.
 d) It ensures that all messages are causally ordered.

Answer: c) Communication involves a handshake, and events are viewed as occurring instantaneously and atomically. (K1)

2. What is the significance of a "crown" in the context of message ordering?

 a) It represents a causal relationship between events.
 b) It indicates a cyclic dependency that prevents realizable synchronous communication (RSC).
 c) It ensures that messages are delivered in FIFO order.
 d) It helps to enforce total ordering of messages.

Answer: b) It indicates a cyclic dependency that prevents realizable synchronous communication (RSC). (K1)

3. In a system where synchronous communication is used, which of the following statements is true?

 a) Messages are sent and received without acknowledging the delivery.
 b) The causality relation is partial and not guaranteed.
 c) Messages appear to be sent and received instantaneously, without any delays.
 d) The message order can be non-FIFO.

Answer: c) Messages appear to be sent and received instantaneously, without any delays. (K1)

4. In the hierarchy of execution classes, which ordering paradigm provides the highest degree of concurrency?

 a) Synchronous order (SYNC)
 b) Causal order (CO)
 c) FIFO order
 d) Non-FIFO order (A-execution)

Answer: d) Non-FIFO order (A-execution) (K1)

5. What does the term "Realizable Synchronous Communication (RSC)" refer to in distributed systems?

 a) An execution that can only be achieved using asynchronous communication.
 b) A linear extension of an A-execution where each send event is immediately followed by its corresponding receive event.
 c) A causal order of events that cannot be converted into synchronous communication.
 d) A communication model where messages are always delivered in FIFO order.

Answer: b) A linear extension of an A-execution where each send event is immediately followed by its corresponding receive event. (K1)

6. Which of the following is true for synchronous programs executed on asynchronous systems?

 a) The partial order of the synchronous execution must be altered to fit asynchronous communication.
 b) The communication must occur using asynchronous primitives, with the order of events being preserved.
 c) Synchronous send primitives are always blocked until the message is acknowledged.
 d) Synchronous executions cannot be realized on asynchronous systems.

Answer: b) The communication must occur using asynchronous primitives, with the order of events being preserved. (K1)
Students have to prepare answers for the following questions at the end of the lecture

1. What is Realizable Synchronous Communication (RSC) and how is it related to A-execution? (2 marks, CO2, K1)

2. What is a "crown" in the context of message ordering, and why is it significant? (2 marks, CO2, K1)

3. Analyze the challenges of ensuring synchronous communication in asynchronous distributed systems. How can Realizable Synchronous Communication (RSC) be achieved, and what role does the crown criterion play in this context? (13 marks, CO2, K2)

4. Discuss the impact of non-FIFO message ordering on the design and verification of distributed algorithms. What strategies can be employed to manage the complexity introduced by non-FIFO channels? (13 marks, CO2, K2)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pp. 195–200.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.6

Topic SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS SYSTEM

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: Write the difference between a binary rendezvous and a multi-way rendezvous in synchronous communication (K1)

LO2: Write the purpose of introducing process identifiers in the Bagrodia algorithm (K1)

LO3: Analyze the Bagrodia algorithm for scheduling synchronous communication in distributed systems (K2)

LO4: Describe the process of scheduling synchronous communication events in a distributed system (K2)

SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS SYSTEM

There do not exist real systems with instantaneous communication that allow synchronous communication to be naturally realized. The basic question is how a system with synchronous communication can be implemented. We first examine non-determinism in program execution, and CSP as a representative synchronous programming language, before examining an implementation of synchronous communication.

Non deterministic programs

If a distributed program is deterministic, repeated runs of the same program produce the same partial order of messages. But distributed systems often exhibit non-determinism:

A receive call can receive a message from any sender who has sent a message, if the expected sender is not specified.

Multiple send and receive calls which are enabled at a process can be executed in an interchangeable order.

If process i sends to process j, and j sends to i concurrently using blocking synchronous calls, the result is a deadlock.

However, there is no semantic dependency between the send and the immediately following receive at each of the processes. If the receive call at one of the processes can be scheduled before the send call, then there is no deadlock.
Rendezvous

Rendezvous systems are a form of synchronous communication among an arbitrary number of asynchronous processes. All the processes involved meet with each other, i.e., communicate synchronously with each other at one time. Two types of rendezvous systems are possible:

Binary rendezvous: when two processes agree to synchronize.

Multi-way rendezvous: when more than two processes agree to synchronize.

Features of binary rendezvous:

For the receive command, the sender must be specified. However, multiple receive commands can exist. A type check on the data is implicitly performed.

Send and receive commands may be individually disabled or enabled. A command is disabled if it is guarded and the guard evaluates to false. The guard would likely contain an expression on some local variables.

Synchronous communication is implemented by scheduling messages under the covers using asynchronous communication.

Scheduling involves pairing of matching send and receive commands that are both enabled. The communication events for the control messages under the covers do not alter the partial order of the execution.

Binary rendezvous algorithm

If multiple interactions are enabled, a process chooses one of them and tries to

synchronize with the partner process. The problem reduces to one of scheduling messages
satisfying the following constraints:

Schedule on-line, atomically, and in a distributed manner.

Schedule in a deadlock-free manner (i.e., crown-free).

Schedule to satisfy the progress property in addition to the safety property.

Steps in Bagrodia algorithm

1. Receive commands are forever enabled from all processes.

2. A send command, once enabled, remains enabled until it completes, i.e., it is not possible that a send command gets disabled before the message is sent.

3. To prevent deadlock, process identifiers are used to introduce asymmetry to break potential crowns that arise.

4. Each process attempts to schedule only one send event at any time.

The message types used are: M, ack(M), request(M), and permission(M). Execution events in the synchronous execution are only the send of the message M and the receive of the message M; the send and receive events for the other message types – ack(M), request(M), and permission(M) – are control events. The messages request(M), ack(M), and permission(M) use M's unique tag; the message M itself is not included in these control messages.

(1) Pi wants to execute SEND(M) to a lower priority process Pj:

Pi executes send(M) and blocks until it receives ack(M) from Pj. The send event SEND(M) now completes.

Any message M′ (from a higher priority process) and any request(M′) for synchronization (from a lower priority process) received during the blocking period are queued.

(2) Pi wants to execute SEND(M) to a higher priority process Pj:

(2a) Pi seeks permission from Pj by executing send(request(M)).

(2b) While Pi is waiting for permission, it remains unblocked, to avoid deadlock in which cyclically blocked processes queue messages.

(i) If a message M′ arrives from a higher priority process Pk, Pi accepts M′ by scheduling a RECEIVE(M′) event and then executes send(ack(M′)) to Pk.

(ii) If a request(M′) arrives from a lower priority process Pk, Pi executes send(permission(M′)) to Pk and blocks waiting for the message M′. When M′ arrives, the RECEIVE(M′) event is executed.

(2c) When the permission(M) arrives, Pi knows its partner Pj is synchronized and Pi executes send(M). The SEND(M) now completes.

(3) request(M) arrival at Pi from a lower priority process Pj:

At the time a request(M) is processed by Pi, process Pi executes send(permission(M)) to Pj and blocks waiting for the message M. When M arrives, the RECEIVE(M) event is executed and the process unblocks.

(4) Message M arrival at Pi from a higher priority process Pj:

At the time a message M is processed by Pi, process Pi executes RECEIVE(M) (which is


assumed to be always enabled) and then send(ack(M)) to Pj .

(5) Processing when Pi is unblocked:

When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it
as a message arrival (as per rules 3 or 4).
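The asymmetry at the heart of rules (1) and (2) can be sketched as a single decision. The priority convention below (larger process id = higher priority) is an illustrative assumption; the point is only that identifiers totally order the processes so that cyclic waits (crowns) cannot form.

```python
# Sketch of the asymmetry rule in the Bagrodia algorithm (assumed
# convention: larger process id = higher priority).

def first_message_for_send(sender_id, receiver_id):
    """First message Pi emits to execute SEND(M) to Pj (rules 1 and 2a)."""
    if receiver_id < sender_id:
        # rule (1): lower-priority partner - send M directly, then block
        # until ack(M) arrives
        return "M"
    # rule (2a): higher-priority partner - ask permission first, and stay
    # unblocked while waiting (rule 2b)
    return "request(M)"

print(first_message_for_send(5, 2))   # M
print(first_message_for_send(2, 5))   # request(M)
```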

Assessment questions to the lecture

1. What is a key characteristic of synchronous communication in distributed systems?

 a) It allows for instantaneous communication.
 b) It requires real-time clocks for synchronization.
 c) It involves processes communicating at the same time.
 d) It only works with deterministic systems.

Answer: c) It involves processes communicating at the same time. (K1)

2. What can cause non-determinism in distributed systems?

 a) The use of a single receive call that specifies an expected sender.
 b) Synchronous communication between processes.
 c) Blocking synchronous calls executed concurrently by two processes.
 d) A deterministic sequence of send and receive events.

Answer: c) Blocking synchronous calls executed concurrently by two processes. (K1)

3. In a binary rendezvous system, what is required for the receive command?

 a) The sender must not be specified.
 b) The receiver must be specified.
 c) The receive command must occur before the send command.
 d) The send command must be disabled.

Answer: b) The receiver must be specified. (K1)

4. What is a feature of binary rendezvous systems?

 a) They do not allow type checking on data.
 b) Commands cannot be disabled based on guard conditions.
 c) Synchronous communication is implemented using asynchronous communication under the covers.
 d) They require all processes to synchronize at the same time.

Answer: c) Synchronous communication is implemented using asynchronous communication under the covers. (K1)

5. What does the Bagrodia algorithm ensure in the context of binary rendezvous?

 a) Deadlock-free scheduling of send and receive events.
 b) All processes are blocked until communication is completed.
 c) Processes can execute multiple send events simultaneously.
 d) Send events are executed based on the last process identifier.

Answer: a) Deadlock-free scheduling of send and receive events. (K1)

6. Which of the following is NOT a message type used in the synchronous execution model?

 a) M
 b) ack(M)
 c) sync(M)
 d) permission(M)

Answer: c) sync(M) (K1)
Students have to prepare answers for the following questions at the end of the lecture

1. What is the difference between a binary rendezvous and a multi-way rendezvous in synchronous communication? (2 marks, CO2, K1)

2. What is the purpose of introducing process identifiers in the Bagrodia algorithm? (2 marks, CO2, K1)

3. Analyze the Bagrodia algorithm for scheduling synchronous communication in distributed systems. (13 marks, CO2, K2)

4. Describe the process of scheduling synchronous communication events in a distributed system. (13 marks, CO2, K2)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pp. 200–205.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.7

Topic GROUP COMMUNICATION

Learning Outcome (LO): At the end of this lecture, students will be able to:

LO1: Explain how the closed group multicast algorithm differs from the open group multicast algorithm in terms of sender involvement (K1)

LO2: List the primary purpose of the Raynal–Schiper–Toueg algorithm in the context of causal ordering (K1)

LO3: Explain the Raynal–Schiper–Toueg algorithm in detail, highlighting its role in ensuring causal order in group communication; discuss the challenges involved in implementing this algorithm and the techniques used to reduce overhead in local space and message space (K2)

LO4: Compare and contrast open group and closed group multicast algorithms; discuss their advantages, disadvantages, and the contexts in which each type is most appropriate, with examples (K2)

GROUP COMMUNICATION

Group communication is done by broadcasting messages. A message broadcast is the sending of a message to all members in the distributed system. The communication may also be:

Multicast: a message is sent to a certain subset, or a group.

Unicast: a point-to-point message communication.

The network layer protocol cannot provide the following functionalities:

 Application-specific ordering semantics on the order of delivery of messages.

 Adapting groups to dynamically changing membership.

 Sending multicasts to an arbitrary set of processes at each send event.

 Providing various fault-tolerance semantics.


 The multicast algorithms can be open or closed group.
Differences between closed and open group algorithms:

Closed group algorithms:

 The sender is also one of the receivers in the multicast algorithm.

 They are specific and easy to implement.

 They do not support large systems where client processes have short lives.

Open group algorithms:

 The sender is not a part of the communication group.

 They are more general, but difficult to design and expensive.

 They can support large systems.

CAUSAL ORDER (CO)

In the context of group communication, there are two modes of communication: causal order and total order. Given a system with FIFO channels, causal order needs to be explicitly enforced by a protocol. The following two criteria must be met by a causal ordering protocol:

Safety: In order to prevent causal order from being violated, a message M that arrives at a process may need to be buffered until all system-wide messages sent in the causal past of the send(M) event to that same destination have already arrived. The arrival of a message is transparent to the application process. The delivery event corresponds to the receive event in the execution model.

Liveness: A message that arrives at a process must eventually be delivered to the process.

The Raynal–Schiper–Toueg algorithm

Each message M should carry a log of all other messages sent causally before M's send event, and sent to the same destination dest(M).

The canonical Raynal–Schiper–Toueg algorithm is representative of several algorithms that reduce the size of the local space and message space overhead by various techniques.

This log can then be examined to ensure whether it is safe to deliver a message.

All algorithms aim to reduce this log overhead, and the space and time overhead of maintaining the log information at the processes.
To distribute this log information, broadcast and multicast communication is used.
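A compact sketch of this idea for point-to-point messages follows. The class layout and names are illustrative, and the sketch omits the overhead-reducing optimizations the algorithms above introduce.

```python
# Sketch of the Raynal-Schiper-Toueg idea (illustrative layout). Process Pi
# keeps SENT[j][k] = number of messages Pj has sent to Pk, as far as Pi
# knows, and DELIV[k] = number of messages from Pk delivered here. Each
# message piggybacks the sender's SENT matrix: a message is deliverable at
# Pj only once every message sent to Pj in its causal past has arrived.
import copy

class RSTProcess:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.SENT = [[0] * n for _ in range(n)]
        self.DELIV = [0] * n
        self.pending = []

    def send(self, dest):
        msg = (self.pid, copy.deepcopy(self.SENT))  # piggyback the log
        self.SENT[self.pid][dest] += 1
        return msg

    def deliverable(self, msg):
        sender, sent = msg
        return all(self.DELIV[k] >= sent[k][self.pid] for k in range(self.n))

    def receive(self, msg):
        """Buffer the message, then deliver everything that is now safe."""
        self.pending.append(msg)
        delivered, progress = [], True
        while progress:
            progress = False
            for m in list(self.pending):
                if self.deliverable(m):
                    sender, sent = m
                    self.pending.remove(m)
                    self.DELIV[sender] += 1
                    for j in range(self.n):          # merge piggybacked log
                        for k in range(self.n):
                            self.SENT[j][k] = max(self.SENT[j][k], sent[j][k])
                    delivered.append(sender)
                    progress = True
        return delivered

p0, p1, p2 = (RSTProcess(i, 3) for i in range(3))
m_a = p0.send(2)            # P0 -> P2
m_b = p0.send(1)            # P0 -> P1 (causally after m_a)
p1.receive(m_b)
m_c = p1.send(2)            # P1 -> P2 (causally after m_a)
print(p2.receive(m_c))      # [] : buffered, m_a has not yet arrived
print(p2.receive(m_a))      # [0, 1] : m_a delivered, then the buffered m_c
```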

The hardware-assisted or network layer protocol assisted multicast cannot efficiently


provide features:

Application-specific ordering semantics on the order of delivery of messages.

Adapting groups to dynamically changing membership.

Sending multicasts to an arbitrary set of processes at each send event.

Providing various fault-tolerance semantics

Causal Order (CO)

An optimal CO algorithm stores in local message logs, and propagates on messages, information of the form "d is a destination of M" about a message M sent in the causal past, as long as and only as long as:

Propagation Constraint I: it is not known that the message M is delivered to d.

Propagation Constraint II: it is not known that a message has been sent to d in the causal future of Send(M), and hence it is not guaranteed using a reasoning based on transitivity that the message M will be delivered to d in CO.

Fig 2.6: Conditions for causal ordering

The Propagation Constraints also imply that if either (I) or (II) is false, the information "d ∈ M.Dests" must not be stored or propagated, even to remember that (I) or (II) has been falsified. The information "d ∈ Mi,a.Dests" must thus be available only in the causal future of the send event (i, a), but:

not in the causal future of Deliverd(Mi,a), and

not in the causal future of ek,c, where d ∈ Mk,c.Dests and there is no other message sent causally between Mi,a and Mk,c to the same destination d.

Information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO is explicitly tracked by the algorithm using (source, timestamp, destination) information.

Information about messages already delivered and messages guaranteed to be delivered in CO is implicitly tracked without storing or propagating it, and is derived from the explicit information. The algorithm for the send and receive operations is given in Fig. 2.7 a) and b). Procedure SND is executed atomically. Procedure RCV is executed atomically except for a possible interruption in line 2a where a non-blocking wait is required to meet the Delivery Condition.

Fig 2.7 a) Send algorithm by Kshemkalyani–Singhal to optimally implement causal


ordering

Fig 2.7 b) Receive algorithm by Kshemkalyani–Singhal to optimally implement causal ordering

The data structures maintained are sorted row-major and then column-major:

1. Explicit tracking:

Tracking of (source, timestamp, destination) information for messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO is done explicitly, using the l.Dests field of entries in local logs at nodes and the o.Dests field of entries in messages. The sets li,a.Dests and oi,a.Dests contain explicit information about the destinations to which Mi,a is not guaranteed to be delivered in CO and is not known to be delivered. The information about "d ∈ Mi,a.Dests" is propagated up to the earliest events on all causal paths from (i, a) at which it is known that Mi,a is delivered to d or is guaranteed to be delivered to d in CO.

2. Implicit tracking:

Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be delivered
in CO, is performed implicitly. The information about messages (i) already delivered or (ii)
guaranteed to be delivered in CO is deleted and not propagated because it is redundant as far
as enforcing CO is concerned. It is useful in determining what information that is being
carried in other messages and is being stored in logs at other nodes has become redundant
and thus can be purged.

The semantics are implicitly stored and propagated. This information about messages
that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked without
explicitly storing it.

The algorithm derives it from the existing explicit information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only oi,a.Dests or li,a.Dests, which is a part of the explicit information.

Fig 2.8: Illustration of propagation constraints

Multicasts M5,1 and M4,1

Message M5,1, sent to processes P4 and P6, contains the piggybacked information M5,1.Dests = {P4, P6}. Additionally, at the send event (5, 1), the information M5,1.Dests = {P4, P6} is also inserted in the local log Log5. When M5,1 is delivered to P6, the (new) piggybacked information "P4 ∈ M5,1.Dests" is stored in Log6 as "M5,1.Dests = {P4}"; information about "P6 ∈ M5,1.Dests," which was needed for routing, must not be stored in Log6 because of constraint I. Symmetrically, when M5,1 is delivered to process P4 at event (4, 1), only the new piggybacked information "P6 ∈ M5,1.Dests" is inserted in Log4 as "M5,1.Dests = {P6}," which is later propagated during multicast M4,2.

Multicast M4,3

At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3 only to process P6 to ensure causal delivery using the delivery condition. The piggybacked information on message M4,3 sent to process P3 must not contain this information because of constraint II (the piggybacked information contains "M4,3.Dests = {P6}"; as long as any future message sent to P6 is delivered in causal order w.r.t. M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1 sent to P6). As M5,1 is already delivered to P4, the information "M5,1.Dests = ∅" is piggybacked on M4,3 sent to P3. Similarly, the information "P6 ∈ M5,1.Dests" must be deleted from Log4 as it will no longer be needed, because of constraint II. "M5,1.Dests = ∅" is stored in Log4 to remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its destinations.

Learning implicit information at P2 and P3

When message M4,2 is received by processes P2 and P3, they insert the (new) piggybacked information M5,1.Dests = {P6} in their local logs. They both continue to store this in Log2 and Log3 and propagate this information on multicasts until they learn at events (2, 4) and (3, 2), on receipt of messages M3,3 and M4,3, respectively, that any future message is expected to be delivered in causal order to process P6 w.r.t. M5,1 sent to P6. Hence, by constraint II, this information must be deleted from Log2 and Log3. The flow of events is given by:

When M4,3 with piggybacked information M5,1.Dests = ∅ is received by P3 at (3, 2), this is inferred to be valid current implicit information about multicast M5,1 because the log Log3 already contains the explicit information P6 ∈ M5,1.Dests about that multicast. Therefore, the explicit information in Log3 is inferred to be old and must be deleted to achieve optimality. M5,1.Dests is set to ∅ in Log3.

The logic by which P2 learns this implicit knowledge on the arrival of M3,3 is identical.

Processing at P6

When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the current implicit information that M5,1 has been delivered to P6 by its very absence in the explicit information.

When the information P6 ∈ M5,1.Dests arrives on M4,3, piggybacked as M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the delivery condition, and is not inserted in Log6 (constraint I). Further, the presence of M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit piggybacked information implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4, and, therefore, M5,1.Dests is set to ∅ in Log6.

When the information P6 ∈ M5,1.Dests arrives on M5,2, piggybacked as M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the delivery condition, and is not inserted in Log6 because Log6 contains M5,1.Dests = ∅, which gives the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to both P4 and P6.

Processing at P1

When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new) information is inserted in Log1.

When M6,2 arrives with piggybacked information M5,1.Dests = {P4}, P1 learns the implicit information that M5,1 has been delivered to P6 by the very absence of the explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence marks the information P6 ∈ M5,1.Dests for deletion from Log1. Simultaneously, M5,1.Dests = {P6} in Log1 implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4. Thus, P1 also learns that the explicit piggybacked information M5,1.Dests = {P4} is outdated. M5,1.Dests in Log1 is set to ∅.

The information P6 ∈ M5,1.Dests piggybacked on M2,3, which arrives at P1, is inferred to be outdated using the implicit knowledge derived from M5,1.Dests = ∅ in Log1.

Assessment questions to the lecture

1. Which of the following statements is true about open group algorithms?
a) The sender is also one of the receivers in the multicast algorithm.
b) They are more general, difficult to design, and expensive.
c) They do not support large systems where client processes have a short life.
d) They are specific and easy to implement.
Answer: b) They are more general, difficult to design, and expensive. (K1)

2. What is a key requirement for a causal ordering protocol?
a) Messages must be delivered out of order for faster processing.
b) A message must be buffered until all system-wide messages sent in the causal past of that message have arrived.
c) Messages must be delivered randomly.
d) The arrival of a message must be immediately visible to the application process.
Answer: b) A message must be buffered until all system-wide messages sent in the causal past of that message have arrived. (K1)

3. Which algorithm is mentioned as an example of reducing the size of local space and message space overhead in causal ordering?
a) Raynal–Schiper–Toueg algorithm
b) Dijkstra's algorithm
c) Bellman–Ford algorithm
d) Paxos algorithm
Answer: a) Raynal–Schiper–Toueg algorithm (K1)

4. What is the primary distinction between closed and open group algorithms?
a) Closed group algorithms support large systems; open group algorithms do not.
b) Open group algorithms are specific and easy to implement.
c) In closed group algorithms, the sender is part of the communication group.
d) In open group algorithms, the sender is also one of the receivers.
Answer: c) In closed group algorithms, the sender is part of the communication group. (K1)

5. Which of the following is NOT a functionality that the network layer protocol can provide?
a) Application-specific ordering semantics on the order of delivery of messages.
b) Adapting groups to dynamically changing membership.
c) Sending multicasts to an arbitrary set of processes at each send event.
d) Basic point-to-point communication (unicasting).
Answer: a) Application-specific ordering semantics on the order of delivery of messages. (K1)

6. In the context of causal order (CO), what does Propagation Constraint I specify?
a) It specifies that information about a message that has been delivered must be stored and propagated.
b) It specifies that a message's information should not be stored or propagated if it is known that the message is delivered.
c) It specifies that a message's information must be propagated indefinitely.
d) It specifies that information about a message must always be stored, regardless of its delivery status.
Answer: b) It specifies that a message's information should not be stored or propagated if it is known that the message is delivered. (K1)

Students have to prepare answers for the following questions at the end of the lecture

1. How does the closed group multicast algorithm differ from the open group multicast algorithm in terms of sender involvement? (2 marks, CO2, K1)

2. What is the primary purpose of the Raynal–Schiper–Toueg algorithm in the context of causal ordering? (2 marks, CO2, K1)

3. Explain the Raynal–Schiper–Toueg algorithm in detail, highlighting its role in ensuring causal order in group communication. Discuss the challenges involved in implementing this algorithm and the techniques used to reduce overhead in local space and message space. (13 marks, CO2, K2)

4. Compare and contrast open group and closed group multicast algorithms. Discuss their advantages, disadvantages, and the contexts in which each type is most appropriate. Provide examples to support your analysis. (13 marks, CO2, K2)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pages 206–215.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.8

Topic Total Order

Learning Outcome (LO): At the end of this lecture, students will be able to

LO1: State the main drawback of the centralized algorithm for total ordering in distributed systems. (K1)

LO2: Explain the concept of total order in the context of message delivery. (K1)

LO3: Explain the three-phase distributed algorithm for total ordering in distributed systems, including the sender and receiver sides of the algorithm, the role of timestamps, how message delivery order is ensured, and the complexity and drawbacks of this algorithm compared to centralized algorithms. (K2)

TOTAL ORDER

For each pair of processes Pi and Pj, and for each pair of messages Mx and My that are delivered to both processes, Pi is delivered Mx before My if and only if Pj is delivered Mx before My.

Centralized Algorithm for total ordering

Each process sends the message it wants to broadcast to a centralized process, which relays
all the messages it receives to every other process over FIFO channels.

Complexity: Each message transmission takes two message hops and exactly n messages in a
system of n processes.
Drawbacks: A centralized algorithm has a single point of failure and congestion, and is not
an elegant solution.
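The relay scheme can be sketched as follows (a minimal single-address-space illustration; the class names and message shapes are assumptions, not part of the source):

```python
# Minimal sketch of centralized total ordering: every process sends its
# broadcast to one sequencer, which relays each message to all processes
# over FIFO channels, so everyone delivers in the same sequencer-chosen
# order.

class Sequencer:
    def __init__(self, processes):
        self.processes = processes

    def broadcast(self, sender, msg):
        # Relay in arrival order; FIFO channels preserve this order,
        # so all processes see an identical delivery sequence.
        for p in self.processes:
            p.deliver(sender, msg)

class Process:
    def __init__(self, name):
        self.name = name
        self.delivered = []

    def deliver(self, sender, msg):
        self.delivered.append((sender, msg))

procs = [Process(f"P{i}") for i in range(3)]
seq = Sequencer(procs)
seq.broadcast("P0", "a")
seq.broadcast("P2", "b")
# All delivery sequences are identical:
print(all(p.delivered == procs[0].delivered for p in procs))  # True
```

Each broadcast takes two hops (sender to sequencer, sequencer to each process) and n relayed messages, and the sequencer is the single point of failure and congestion noted above.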

Three phase distributed algorithm

Three phases can be seen on both the sender and the receiver side.

Sender side

Phase 1

In the first phase, a process multicasts the message M with a locally unique tag and

the local timestamp to the group members.

Phase 2

The sender process awaits a reply from all the group members who respond with a

tentative proposal for a revised timestamp for that message M.

The await call is non-blocking.

Phase 3

The process multicasts the final timestamp to the group.

Receiver Side

Fig 2.9: Sender side of three phase distributed algorithm

Phase 1

The receiver receives the message with a tentative timestamp. It updates the variable
priority that tracks the highest proposed timestamp, then revises the proposed timestamp to
the priority, and places the message with its tag and the revised timestamp at the tail of the
queue temp_Q. In the queue, the entry is marked as undeliverable.

Phase 2

The receiver sends the revised timestamp back to the sender. The receiver then waits

in a non-blocking manner for the final timestamp.

Phase 3

The final timestamp is received from the multicaster. The corresponding message

entry in temp_Q is identified using the tag, and is marked as deliverable after the revised
timestamp is overwritten by the final timestamp.

The queue is then resorted using the timestamp field of the entries as the key. As the
queue is already sorted except for the modified entry for the message under consideration,
that message entry has to be placed in its sorted position in the queue.

If the message entry is at the head of the temp_Q, that entry, and all consecutive
subsequent entries that are also marked as deliverable, are dequeued from temp_Q, and
enqueued in deliver_Q.

Complexity

This algorithm uses three phases, and, to send a message to n − 1 processes, it uses 3(n – 1)

messages and incurs a delay of three message hops
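The three phases can be sketched, for a single message, as follows (simulated in one address space; the structures shown are illustrative assumptions, not the textbook pseudocode, and tie-breaking between equal timestamps is omitted):

```python
# Sketch of the three-phase total-ordering exchange for one message.
# Phase 1: sender multicasts (tag, local timestamp); each receiver revises
# it and buffers the message as undeliverable. Phase 2: receivers return
# proposals. Phase 3: sender multicasts the max proposal as the final
# timestamp; receivers re-sort and deliver the deliverable prefix.

class Receiver:
    def __init__(self):
        self.priority = 0      # highest timestamp proposed so far
        self.temp_q = []       # entries: [tag, timestamp, deliverable]
        self.deliver_q = []

    def propose(self, tag, sender_ts):
        # Phase 1 (receiver side): revise timestamp, buffer undeliverable.
        self.priority = max(self.priority, sender_ts) + 1
        self.temp_q.append([tag, self.priority, False])
        return self.priority   # Phase 2: proposal sent back to the sender

    def finalize(self, tag, final_ts):
        # Phase 3: overwrite with the final timestamp, mark deliverable,
        # re-sort, then move the deliverable prefix to deliver_q.
        for entry in self.temp_q:
            if entry[0] == tag:
                entry[1], entry[2] = final_ts, True
        self.temp_q.sort(key=lambda e: e[1])
        while self.temp_q and self.temp_q[0][2]:
            self.deliver_q.append(self.temp_q.pop(0)[0])

receivers = [Receiver() for _ in range(3)]
proposals = [r.propose("M1", sender_ts=4) for r in receivers]  # phases 1-2
final_ts = max(proposals)       # sender picks the highest proposal
for r in receivers:             # sender phase 3: multicast final timestamp
    r.finalize("M1", final_ts)
print([r.deliver_q for r in receivers])  # [['M1'], ['M1'], ['M1']]
```

Because every receiver adopts the same final timestamp, the sorted queues agree at all processes, which is exactly the total order property defined above.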

Assessment questions to the lecture

1. In a system with total ordering, which of the following statements is true?
a) A message can be delivered out of order as long as it eventually reaches all processes.
b) A message is delivered to all processes at the same time.
c) A message Mx is delivered before My at one process if and only if Mx is delivered before My at all other processes.
d) Each process independently determines the order of message delivery.
Answer: c) A message Mx is delivered before My at one process if and only if Mx is delivered before My at all other processes. (K1)

2. What is the main drawback of the centralized algorithm for total ordering in a distributed system?
a) It requires too many messages to be sent.
b) It introduces excessive delays in message delivery.
c) It relies on a single point of failure and can cause congestion.
d) It cannot guarantee message delivery in FIFO order.
Answer: c) It relies on a single point of failure and can cause congestion. (K1)

3. How many message hops does the centralized algorithm require to deliver a message in a system with n processes?
a) 1 message hop
b) 2 message hops
c) n − 1 message hops
d) 3 message hops
Answer: b) 2 message hops (K1)

4. In the three-phase distributed algorithm, what happens during the first phase on the sender side?
a) The sender process awaits a reply from all group members.
b) The sender multicasts the message with a locally unique tag and timestamp.
c) The sender receives the final timestamp from the multicaster.
d) The sender marks the message as deliverable.
Answer: b) The sender multicasts the message with a locally unique tag and timestamp. (K1)

5. During Phase 2 on the receiver side in the three-phase distributed algorithm, what action does the receiver take?
a) It marks the message as deliverable.
b) It multicasts the final timestamp to the group.
c) It sends the revised timestamp back to the sender.
d) It dequeues the message from the temporary queue.
Answer: c) It sends the revised timestamp back to the sender. (K1)

Students have to prepare answers for the following questions at the end of the lecture

1. What is the main drawback of the centralized algorithm for total ordering in distributed systems? (2 marks, CO2, K1)

2. Explain the concept of total order in the context of message delivery. (2 marks, CO2, K1)

3. Explain the Three-Phase Distributed Algorithm for Total Ordering in Distributed Systems. Include a detailed discussion of the sender and receiver sides of the algorithm, the role of timestamps, and how message delivery order is ensured. Discuss the complexity and drawbacks of this algorithm compared to centralized algorithms. (13 marks, CO2, K2)
Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pages 215–220.

Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.9

Topic: Global State and Snapshot Recording Algorithm – Introduction – System Model and Definitions

Learning Outcome (LO): At the end of this lecture, students will be able to

LO1: Define a global state in a distributed system. (K1)

LO2: Define a consistent global state. (K1)

LO3: List the main challenges in recording a global state in a distributed system. (K2)

LO4: Explain global state and snapshot recording in distributed systems, discussing the system model, the concept of consistent global state, and the challenges in recording a global snapshot, using relevant examples and diagrams to illustrate the interpretation of cuts and consistency conditions. (K2)

GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS

A distributed computing system consists of processes that do not share a common

memory and communicate asynchronously with each other by message passing. Each
component has a local state. The state of a process is its local memory and a history of
its activity.

The state of a channel is characterized by the set of messages sent along the channel
less the messages received along the channel. The global state of a distributed system is a
collection of the local states of its components.

If shared memory were available, an up-to-date state of the entire system would be
available to the processes sharing the memory.

The absence of shared memory necessitates ways of getting a coherent and complete
view of the system based on the local states of individual processes.

A meaningful global snapshot can be obtained if the components of the distributed


system record their local states at the same time.

This would be possible if the local clocks at processes were perfectly synchronized
or if there were a global system clock that could be instantaneously read by the processes.

If processes read time from a single common clock, various indeterminate


transmission delays during the read operation will cause the processes to identify various
physical instants as the same time.

System Model

The system consists of a collection of n processes, p1, p2,…,pn that are connected
by channels.

Let Cij denote the channel from process pi to process pj.

Processes and channels have states associated with them.

The state of a process at any time is defined by the contents of processor registers,
stacks, local memory, etc., and may be highly dependent on the local context of the
distributed application.

The state of channel Cij, denoted by SCij, is given by the set of messages in transit
in the channel.

The events that may happen are: internal event, send (send (mij)) and receive

(rec(mij)) events.

The occurrences of events cause changes in the process state.

A channel is a distributed entity and its state depends on the local states of the
processes on which it is incident.

The transit function records the state of the channel Cij.

In the FIFO model, each channel acts as a first-in first-out message queue and, thus,
message ordering is preserved by a channel.

In the non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random order.

A consistent global state

The global state of a distributed system is a collection of the local states of the processes and the channels. The global state GS is given by:

GS = { ∪i LSi, ∪i,j SCij }

The two conditions for a consistent global state are (⊕ denotes exclusive or):

C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj

C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj

Condition C1 preserves the law of conservation of messages. Condition C2 states that in the collected global state, for every effect, its cause must be present.

Law of conservation of messages: every message mij that is recorded as sent in the local state of a process pi must be captured in the state of the channel Cij or in the collected local state of the receiver process pj.

In a consistent global state, every message that is recorded as received is also recorded

as sent. Such a global state captures the notion of causality that a message cannot be
received if it was not sent.
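The two conditions can be checked on a recorded state with a small sketch (the representation of local and channel states as sets of message ids is an assumption for illustration):

```python
# Sketch: checking message-conservation and causality conditions on a
# recorded global state. Representation is assumed: each local state lists
# the messages it has sent/received; each channel state lists messages in
# transit.

def is_consistent(sent, received, in_transit):
    """sent, received: dict process -> set of message ids recorded as
    sent/received in that process's local state.
    in_transit: dict (src, dst) -> set of message ids recorded in the
    channel state."""
    all_sent = set().union(*sent.values())
    all_recv = set().union(*received.values())
    all_chan = set().union(*in_transit.values())
    # C1 (conservation): every message recorded as sent is captured in a
    # channel state or in the receiver's local state.
    c1 = all_sent <= (all_recv | all_chan)
    # C2 (causality): nothing is recorded as received or in transit
    # unless it is also recorded as sent.
    c2 = (all_recv | all_chan) <= all_sent
    return c1 and c2

print(is_consistent({"p1": {"m1"}}, {"p2": {"m1"}},
                    {("p1", "p2"): set()}))  # True
```

A state recording a receive of a message no process recorded as sent (an inconsistent cut) fails the C2 check.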

Consistent global states are meaningful global states and inconsistent global states are not
meaningful in the sense that a distributed system can never be in an inconsistent state.

Interpretation of cuts

Cuts in a space–time diagram provide a powerful graphical aid in representing and

reasoning about the global states of a computation. A cut is a line joining an arbitrary point
on each process line that slices the space–time diagram into a PAST and a FUTURE.

A consistent global state corresponds to a cut in which every message received in the
PAST of the cut has been sent in the PAST of that cut. Such a cut is known as a consistent
cut.

In a consistent snapshot, all the recorded local states of processes are concurrent; that is, the recorded local state of no process causally affects the recorded local state of any other process.

Issues in recording global state

The non-availability of global clock in distributed system, raises the following issues:

Issue 1:

How to distinguish between the messages to be recorded in the snapshot from those

not to be recorded?

Answer:

Any message that is sent by a process before recording its snapshot, must be

recorded in the global snapshot (from C1).

Any message that is sent by a process after recording its snapshot, must not be
recorded in the global snapshot (from C2).

Issue 2:

How to determine the instant when a process takes its snapshot?
Answer:

A process pj must record its snapshot before processing a message mij that was sent by
process pi after recording its snapshot.

Assessment questions to the lecture

1. What does the global state of a distributed system consist of?
a) The state of the central server
b) The collection of local states of all processes and channels
c) The state of the shared memory
d) The state of the operating system
Answer: b) The collection of local states of all processes and channels (K1)

2. In a distributed system, what is the state of a channel defined by?
a) The number of processes in the system
b) The contents of the local memory
c) The set of messages sent along the channel minus the messages received
d) The synchronization level of the clocks
Answer: c) The set of messages sent along the channel minus the messages received (K1)

3. Which of the following conditions must be satisfied for a global state to be consistent?
a) All processes must have the same local state
b) All messages sent must be received in the order they were sent
c) Every message that is recorded as received must also be recorded as sent
d) The system must have a global clock
Answer: c) Every message that is recorded as received must also be recorded as sent (K1)

4. What is a "cut" in the context of a space-time diagram in distributed systems?
a) A method to terminate a process
b) A line that divides the space-time diagram into past and future
c) A signal to synchronize processes
d) A breakpoint in process execution
Answer: b) A line that divides the space-time diagram into past and future (K1)

5. What issue arises due to the absence of a global clock in distributed systems when recording global state?
a) Difficulty in process synchronization
b) Inability to detect process failures
c) Difficulty in determining which messages to include in the snapshot
d) Inefficient message passing
Answer: c) Difficulty in determining which messages to include in the snapshot (K1)

Students have to prepare answers for the following questions at the end of the lecture

1. What is a global state in a distributed system? (2 marks, CO2, K1)

2. Define a consistent global state. (2 marks, CO2, K1)

3. What is the main challenge in recording a global state in a distributed system? (2 marks, CO2, K1)

4. Explain Global State and Snapshot Recording in Distributed Systems. Discuss the system model, the concept of consistent global state, and the challenges in recording a global snapshot. Use relevant examples and diagrams to illustrate the interpretation of cuts and consistency conditions. (13 marks, CO2, K2)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pages 87–93.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.10

Topic Snapshot Algorithms for FIFO Channels

Learning Outcome (LO): At the end of this lecture, students will be able to

LO1: List the purposes of a global snapshot in a distributed system. (K1)

LO2: Explain the role of the marker in the Chandy–Lamport algorithm. (K2)

LO3: Discuss the Chandy–Lamport snapshot algorithm for recording a global state in a distributed system with FIFO channels. (K2)

SNAPSHOT ALGORITHMS FOR FIFO CHANNELS

Each distributed application has a number of processes running on different physical servers. These processes communicate with each other through messaging channels.

A snapshot captures the local states of each process along with the state of each communication channel.

Snapshots are required for:

 Checkpointing

 Garbage collection

 Deadlock detection

 Debugging

Chandy–Lamport algorithm

The algorithm records a global snapshot consisting of the state of each process and the state of each channel.

The Chandy-Lamport algorithm uses a control message, called a marker.


After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.

Since channels are FIFO, a marker separates the messages in the channel into
those to be included in the snapshot from those not to be recorded in the snapshot.

This addresses issue I1. The role of markers in a FIFO system is to act as delimiters
for the messages in the channels so that the channel state recorded by the process at the
receiving end of the channel satisfies the condition C2.

Fig 2.10: Chandy–Lamport algorithm

Initiating a snapshot

 Process Pi initiates the snapshot

 Pi records its own state and prepares a special marker message.

 Send the marker message to all other processes.

 Start recording all incoming messages from channels Cij for j not equal to i.

Propagating a snapshot

For all processes Pj, consider a message on channel Ckj.

If the marker message is seen for the first time:

 Pj records its own state and marks Ckj as empty.

 Send the marker message to all other processes.

 Record all incoming messages from channels Clj for l not equal to j or k.

Else, add all messages from inbound channels to the recorded channel state.

Terminating a snapshot

 All processes have received a marker.

 All processes have received a marker on all the N − 1 incoming channels.

 A central server can gather the partial states to build a global snapshot.
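The marker handling above can be sketched for a single process (illustrative only; the transport, application state, and class layout are simulated assumptions, not the original pseudocode):

```python
# Sketch of Chandy-Lamport marker handling at one process. Channels are
# FIFO, so the marker delimits which in-transit messages belong in the
# recorded channel state.
MARKER = "MARKER"

class Process:
    def __init__(self, name, incoming):
        self.name = name
        self.incoming = incoming     # names of channels into this process
        self.state = None            # recorded local state (None = not yet)
        self.channel_state = {}      # channel -> messages recorded in transit
        self.recording = set()       # channels currently being recorded

    def record_snapshot(self, app_state, send_marker):
        # Record local state, then send a marker on all outgoing channels
        # before any further application messages.
        self.state = app_state
        send_marker(self.name)
        self.recording = set(self.incoming)
        self.channel_state = {c: [] for c in self.incoming}

    def on_message(self, channel, msg, app_state, send_marker):
        if msg == MARKER:
            if self.state is None:              # first marker seen:
                self.record_snapshot(app_state, send_marker)
                self.channel_state[channel] = []  # that channel is empty
            self.recording.discard(channel)     # stop recording this channel
        elif channel in self.recording:
            # Message that precedes the marker: part of the channel state.
            self.channel_state[channel].append(msg)

markers_sent = []
p2 = Process("P2", incoming=["C12", "C32"])
p2.on_message("C12", MARKER, app_state=7, send_marker=markers_sent.append)
p2.on_message("C32", "m", app_state=7, send_marker=markers_sent.append)
p2.on_message("C32", MARKER, app_state=7, send_marker=markers_sent.append)
print(p2.state, p2.channel_state)   # 7 {'C12': [], 'C32': ['m']}
```

Message "m" arrived on C32 before that channel's marker, so it is recorded in the channel state, exactly as condition C1 requires; nothing after a marker is recorded, satisfying C2.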

Correctness of the algorithm

Since a process records its snapshot when it receives the first marker on any
incoming channel, no messages that follow markers on the channels incoming to it are
recorded in the process’s snapshot.

A process stops recording the state of an incoming channel when a marker is


received on that channel.

Due to FIFO property of channels, it follows that no message sent after the marker
on that channel is recorded in the channel state. Thus, condition C2 is satisfied.

When a process pj receives message mij that precedes the marker on channel Cij, it
acts as follows: if process pj has not taken its snapshot yet, then it includes mij in its
recorded snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition
C1 is satisfied.

Complexity

The recording part of a single instance of the algorithm requires O(e) messages

and O(d) time, where e is the number of edges in the network and d is the diameter of the
network.

2.9.2 Properties of the recorded global state


The recorded global state may not correspond to any of the global states that occurred during the computation. This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states. But the system could have passed through the recorded global states in some equivalent executions.

The recorded global state is a valid state in an equivalent execution and if a stable property

(i.e., a property that persists) holds in the system before the snapshot algorithm begins, it
holds in the recorded global snapshot.

Therefore, a recorded global state is useful in detecting stable properties.

Assessment questions to the lecture

1. What is the primary purpose of a snapshot in a distributed system?
A) To increase network bandwidth
B) To capture the local states of each process and the state of communication channels
C) To improve processing speed
D) To enhance memory usage
Answer: B) To capture the local states of each process and the state of communication channels (K1)

2. In the Chandy–Lamport algorithm, what is the role of the marker?
A) To start the snapshot process
B) To synchronize clocks between processes
C) To separate messages in the channel into those to be included in the snapshot and those not to be included
D) To terminate the snapshot process
Answer: C) To separate messages in the channel into those to be included in the snapshot and those not to be included (K1)

3. How does a process initiate a snapshot in the Chandy–Lamport algorithm?
A) By sending a termination message to all other processes
B) By recording its local state and then sending a marker message to all other processes
C) By waiting for a marker message from another process
D) By broadcasting a snapshot request to all processes
Answer: B) By recording its local state and then sending a marker message to all other processes (K1)

4. What condition must be satisfied for the global snapshot recorded by the Chandy–Lamport algorithm to be considered consistent?
A) All processes must record their local states simultaneously
B) No process changes its state during the snapshot
C) Every message received in the recorded snapshot must also be recorded as sent in the global state
D) All messages sent after the snapshot must be discarded
Answer: C) Every message received in the recorded snapshot must also be recorded as sent in the global state (K1)

5. When does a process stop recording the state of an incoming channel in the Chandy–Lamport algorithm?
A) When the snapshot is completed
B) When it receives the first message on the channel
C) When it receives a marker on that channel
D) When all other processes have received the marker
Answer: C) When it receives a marker on that channel (K1)

6. Why might the recorded global state not correspond to any global state that actually occurred during computation?
A) Because of delays in the network
B) Because processes can change state asynchronously before other sites receive the markers
C) Due to the lack of synchronization between process clocks
D) Because the snapshot is only a partial state of the system
Answer: B) Because processes can change state asynchronously before other sites receive the markers (K1)

Students have to prepare answers for the following questions at the end of the lecture

1. What is the purpose of a global snapshot in a distributed system? (2 marks, CO2, K1)

2. Explain the role of the marker in the Chandy–Lamport algorithm. (2 marks, CO2, K1)

3. Discuss the Chandy–Lamport snapshot algorithm for recording a global state in a distributed system with FIFO channels. (13 marks, CO2, K2)

Reference Book:

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms and Systems, pages 93–97.
