CS3551 Unit 2 Lecture Notes
Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks – Scalar Time – Vector Time.
Message Ordering and Group Communication: Message Ordering Paradigms – Asynchronous Execution with Synchronous Communication – Synchronous Program Order on Asynchronous System – Group Communication – Causal Order – Total Order.
Global State and Snapshot Recording Algorithms: Introduction – System Model and Definitions – Snapshot Algorithms for FIFO Channels.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.1
Learning Outcome (LO): At the end of this lecture, students will be able to | Bloom's Knowledge Level
LO1: Describe the role of clock offset and delay estimation in the Network Time Protocol (NTP). | K1
Logical Time:
The concept of causality between events is fundamental to the design and analysis of parallel and distributed computing and operating systems. Usually, causality is tracked using physical time. However, in distributed systems (DS), it is not possible to have a global physical time.
The knowledge of the causal precedence relation among events helps ensure liveness and fairness in mutual exclusion algorithms, helps maintain consistency in replicated databases, and helps design correct deadlock detection algorithms that avoid phantom and undetected deadlocks.
The knowledge of the causal dependency among events helps measure the progress
of processes in the distributed computation. This is useful in discarding obsolete
information, garbage collection, and termination detection.
Concurrency Measure:
The knowledge of how many events are causally dependent is useful in measuring
the amount of concurrency in a computation. All events that are not causally related can be
executed concurrently. Thus, an analysis of the causality in a computation gives an idea of
the concurrency in the program.
Centralized systems do not need clock synchronization, as they work under a common clock. Distributed systems, however, have no common clock: each system functions based on its own internal clock and its own notion of time. Time in distributed systems is measured in the following contexts:
• The time of day at which an event happened on a specific machine in the network.
• The time interval between two events that happened on different machines in the network.
• The relative ordering of events that happened on different machines in the network.
Because clocks run at different rates, the clocks at various sites may diverge with time, so clock synchronization must be performed periodically to correct this clock skew in distributed systems. Clocks are synchronized to an accurate real-time standard such as UTC (Coordinated Universal Time). Clocks that must not only be synchronized with each other but also adhere to physical time are termed physical clocks. This degree of synchronization additionally makes it possible to coordinate and schedule actions among multiple computers connected to a common network.
Basic terminologies:
Time: The time of a clock in a machine p is given by the function Cp(t), where Cp(t) = t for a perfect clock.
Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of clock Ca is Ca'(t).
Offset: Clock offset is the difference between the time reported by a clock and the real time. The offset of the clock Ca is given by Ca(t) − t. The offset of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).
Skew: The skew of a clock is the difference in the frequencies of the clock and the perfect clock. The skew of a clock Ca relative to clock Cb at time t is Ca'(t) − Cb'(t).
Drift (rate): The drift of clock Ca is the second derivative of the clock value with respect to time, Ca''(t). The drift of clock Ca relative to clock Cb at time t is Ca''(t) − Cb''(t).
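Clocking Inaccuracies:
Physical clocks are synchronized to an accurate real-time standard such as UTC (Coordinated Universal Time). Due to the clock inaccuracies discussed above, however, a timer (clock) is said to be working within its specification if 1 − ρ ≤ dC/dt ≤ 1 + ρ, where the constant ρ is the maximum skew rate specified by the manufacturer.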
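As a minimal sketch, assuming a hypothetical clock model C(t) = c0 + (1 + r)·t with a constant rate error r, the following Python code makes the terms concrete: the offset is C(t) − t, the frequency is 1 + r (so the skew relative to a perfect clock is r), and the clock is within specification when 1 − ρ ≤ 1 + r ≤ 1 + ρ. The resynchronization interval δ/(2ρ) is a standard consequence of that bound (two in-spec clocks drifting in opposite directions separate at up to 2ρ per unit time), not something stated in these notes; LinearClock and resync_interval are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class LinearClock:
    """Hypothetical clock model C(t) = c0 + (1 + r) * t (constant rate error r)."""
    c0: float  # initial offset from real time
    r: float   # frequency C'(t) = 1 + r, so the skew vs a perfect clock is r

    def read(self, t: float) -> float:
        return self.c0 + (1.0 + self.r) * t

    def offset(self, t: float) -> float:
        # Offset relative to real time: C(t) - t
        return self.read(t) - t

    def within_spec(self, rho: float) -> bool:
        # Specification: 1 - rho <= dC/dt <= 1 + rho
        rate = 1.0 + self.r
        return 1.0 - rho <= rate <= 1.0 + rho


def resync_interval(delta: float, rho: float) -> float:
    """Longest gap between resynchronizations that keeps two in-spec clocks
    within delta of each other; worst case, they drift apart at rate 2 * rho."""
    return delta / (2.0 * rho)


if __name__ == "__main__":
    fast = LinearClock(c0=0.0, r=1e-5)
    slow = LinearClock(c0=0.0, r=-1e-5)
    print(fast.within_spec(rho=1e-5))
    # Relative offset Ca(t) - Cb(t) after 1000 s of real time
    print(fast.read(1000.0) - slow.read(1000.0))
    print(resync_interval(delta=0.01, rho=1e-5))
```

For example, with ρ = 10⁻⁵ and a tolerated difference δ = 10 ms, the clocks would have to be resynchronized at least every 500 s.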
Network Time Protocol (NTP):
NTP is a time service for the Internet that synchronizes clients to UTC. It achieves reliability through redundant paths, scales to large numbers of clients, and authenticates time sources.
Architecture: The design of NTP involves a hierarchical tree of time servers. The primary server at the root synchronizes with UTC. The next level contains secondary servers, which act as a backup to the primary server. At the lowest level is the synchronization subnet, which contains the clients.
2. Clock offset and delay estimation
A source node cannot accurately estimate the local time on the target node because the message (network) delays between the nodes vary. NTP therefore follows a very common practice: it performs several trials and chooses the trial with the minimum delay.
Fig 1.30: (a) Offset and delay estimation between processes from the same server; (b) offset and delay estimation between processes from different servers.
Let T1, T2, T3, T4 be the values of the four most recent timestamps. Assume that clocks A and B are stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network delay difference from A to B and from B to A, called the differential delay, is small, the clock offset θ and round-trip delay δ of B relative to A at time T4 are approximately given by:
$$\theta = \frac{a + b}{2}, \qquad \delta = a - b$$
Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is determined upon arrival. Thus, both peers A and B can independently calculate delay and offset using a single bidirectional message stream.
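A small sketch of the estimation, assuming the labelling implied by a = T1 − T3 and b = T2 − T4: A sends its request at T3 (A's clock), B receives it at T1 and replies at T2 (B's clock), and A receives the reply at T4 (A's clock). The Trial type and function names are hypothetical. best_estimate applies the minimum-delay rule described above by keeping the trial with the smallest δ.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Trial:
    """One request/reply exchange between A and B."""
    t3: float  # A's clock when A sends the request
    t1: float  # B's clock when the request arrives at B
    t2: float  # B's clock when B sends the reply
    t4: float  # A's clock when the reply arrives at A


def offset_and_delay(tr: Trial) -> Tuple[float, float]:
    a = tr.t1 - tr.t3
    b = tr.t2 - tr.t4
    theta = (a + b) / 2  # estimated offset of B relative to A
    delta = a - b        # round trip (t4 - t3) minus B's processing time (t2 - t1)
    return theta, delta


def best_estimate(trials: Iterable[Trial]) -> Tuple[float, float]:
    # Among several trials, trust the one with the minimum round-trip delay.
    return min((offset_and_delay(tr) for tr in trials), key=lambda od: od[1])
```

With B's clock 5 units ahead of A's and a 1-unit delay each way, Trial(t3=100, t1=106, t2=107, t4=103) gives θ = 5 and δ = 2. An asymmetric trial such as Trial(200, 208, 209, 205) (3 units out, 1 unit back) gives a skewed θ = 6 with the larger δ = 4, so the minimum-delay rule discards it.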
Qn No | Question | Answer | Bloom's Knowledge Level
a) Drift. b) Offset. c) Skew. d) Frequency. | c) Skew. | K1
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, pp. 50, 78–81.
CS68603 DS
Topic A Framework for a System of Logical Clocks, Scalar Time – Vector Time
Learning Outcome (LO): At the end of this lecture, students will be able to | Bloom's Knowledge Level
A system of logical clocks consists of a time domain T and a logical clock C. Elements of T
form a partially ordered set over a relation <. This relation is usually called the happened-before or causal precedence relation.
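The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted as C(e), such that for any two events ei and ej:
ei → ej ⟹ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition. When T and C satisfy the stronger condition ei → ej ⟺ C(ei) < C(ej), the system of clocks is said to be strongly consistent.
Implementing logical clocks requires two things: data structures local to every process to represent logical time, and a protocol, that is, rules for updating the data structures to ensure the consistency condition.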
Meenakshi R
Data structures:
• A local logical clock (lci), that helps process pi measure its own progress.
• A logical global clock (gci), that is a representation of process pi’s local view of the logical
global time. It allows this process to assign consistent timestamps to its local events.
Protocol
The protocol ensures that a process’s logical clock, and thus its view of the global time, is
managed consistently with the following rules:
Rule 1: Decides how a process updates its local logical clock when it executes an event (send, receive, or internal).
Rule 2: Decides how a process updates its global logical clock to update its view of the global time and global progress. It dictates what information about the logical time is piggybacked in a message and how the receiving process uses this information to update its view of the global time.
Scalar Time
A process increments its counter for each event (internal event, message sending,
message receiving) in that process.
When a process sends a message, it includes its (incremented) counter value with the
message.
On receiving a message, the counter of the recipient is updated to the greater of its
current counter and the timestamp in the received message, and then incremented by
one.
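Rule 1: Before executing an event (send, receive, or internal), process pi executes Ci := Ci + d (d > 0; usually d = 1).
Rule 2: Each message piggybacks the clock value of its sender at sending time. When pi receives a message m with timestamp Cm, it executes the following actions:
a) Ci := max(Ci, Cm)
b) Execute Rule 1
c) Deliver the message m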
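Rules 1 and 2 translate directly into a small class. This is a minimal sketch assuming d = 1 and a single-threaded process; the class and method names are illustrative, not taken from any library.

```python
class LamportClock:
    """Scalar (Lamport) clock for one process, with increment d = 1."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Rule 1: increment before every event (internal, send, or receive).
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event: apply Rule 1, then piggyback the new value.
        return self.tick()

    def receive(self, msg_ts: int) -> int:
        # Rule 2: take the max with the piggybacked timestamp, then apply Rule 1.
        self.time = max(self.time, msg_ts)
        return self.tick()
```

For example, if p1 executes an internal event and then a send (timestamps 1 and 2), and p2 has executed one internal event (timestamp 1), p2's receive event gets max(1, 2) + 1 = 3.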
2. Total Ordering: Scalar clocks can be used to totally order the events in a distributed system. However, events at different processes may have identical timestamps, so a tie-breaking mechanism is needed to order them. Ties are broken using process identifiers:
The term (t, i) indicates the timestamp of an event, where t is its time of occurrence and i is the identity of the process where it occurred.
The total order relation ≺ over two events x and y with timestamps (h, i) and (k, j) is given by:
x ≺ y ⟺ (h < k or (h = k and i < j))
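The tie-breaking rule is a lexicographic comparison of (t, i) pairs. The sketch below assumes integer process identifiers, with the lower identifier winning ties as in the formula; the function names are hypothetical.

```python
from typing import List, Tuple

# A timestamp is (t, i): scalar clock value t and process id i.
Timestamp = Tuple[int, int]


def precedes(x: Timestamp, y: Timestamp) -> bool:
    """Total order: x precedes y iff h < k, or h == k and i < j."""
    (h, i), (k, j) = x, y
    return h < k or (h == k and i < j)


def total_order(events: List[Timestamp]) -> List[Timestamp]:
    # Python compares tuples lexicographically, which is exactly the rule above.
    return sorted(events)
```

Because Python tuples already compare lexicographically, sorting the (t, i) pairs yields the same total order that precedes defines pairwise.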
3. Event Counting
If the increment d is always 1 and event e has a timestamp h, then h − 1 represents the minimum logical duration, counted in units of events, required before producing the event e. This is called the height of the event e. In other words, h − 1 events have been produced sequentially before the event e, regardless of the processes that produced these events.
4. No strong consistency
Scalar clocks are not strongly consistent because the logical local clock and the logical global clock of a process are squashed into one, resulting in the loss of causal dependency information among events at different processes. For example, C(ei) < C(ej) does not imply ei → ej.
Vector Time
The ordering from Lamport's clocks is not enough to guarantee that if two events precede one
another in the ordering relation they are also causally related. Vector Clocks use a vector
counter instead of an integer counter. The vector clock of a system with N processes is a
vector of N counters, one counter per process.
Each time a process experiences an event, it increments its own counter in the vector
by one.
Each time a process sends a message, it includes a copy of its own (incremented)
vector in the message.
Each time a process receives a message, it increments its own counter in the vector by
one and updates each element in its vector by taking the maximum of the value in its
own vector counter and the value in the vector in the received message.
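Rule 1: Before executing an event, process pi updates its local logical time as follows: vti[i] := vti[i] + d (d > 0; usually d = 1).
Rule 2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On the receipt of such a message (m, vt), process pi executes the following sequence of actions:
1. Update its global logical time: for 1 ≤ k ≤ n, vti[k] := max(vti[k], vt[k])
2. Execute Rule 1
3. Deliver the message m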
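A minimal sketch of these two rules for one process, assuming d = 1 and processes numbered 0 to n − 1; VectorClock and its methods are hypothetical names.

```python
from typing import List


class VectorClock:
    """Vector clock for process `pid` in a system of `n` processes (d = 1)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vt: List[int] = [0] * n

    def tick(self) -> List[int]:
        # Rule 1: vt[i] := vt[i] + 1 before every event.
        self.vt[self.pid] += 1
        return list(self.vt)  # return a copy, never the live vector

    def send(self) -> List[int]:
        # The send is an event; piggyback a copy of the updated vector.
        return self.tick()

    def receive(self, msg_vt: List[int]) -> List[int]:
        # Rule 2: component-wise max with the piggybacked vector, then Rule 1.
        self.vt = [max(a, b) for a, b in zip(self.vt, msg_vt)]
        return self.tick()
```

For example, with three processes, a send at p0 is stamped [1, 0, 0]; if p1 has already executed one event ([0, 1, 0]), its receive of that message is stamped [1, 2, 0].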
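1. Isomorphism:
"→" induces a partial order on the set of events that are produced by a distributed execution. For vector timestamps vh and vk, vh < vk means vh[x] ≤ vk[x] for all x and vh ≠ vk. The order on vector timestamps is isomorphic to "→": for events x and y with timestamps vh and vk, x → y ⟺ vh < vk, and x ∥ y ⟺ neither vh < vk nor vk < vh.
If the process at which an event occurred is known, the test to compare two timestamps can be simplified: if x and y occurred at processes pi and pj, respectively, then x → y ⟺ vh[i] ≤ vk[i], and x ∥ y ⟺ vh[i] > vk[i] and vh[j] < vk[j].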
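The comparison rules can be sketched as follows, with timestamps as plain lists; the helper names are hypothetical. The first two functions use the full component-wise test, while the last two use the simplified test, which needs only the components of the processes where the (distinct) events occurred.

```python
from typing import Sequence


def vt_less(vh: Sequence[int], vk: Sequence[int]) -> bool:
    """vh < vk: every component <= and the vectors differ (x -> y)."""
    return all(a <= b for a, b in zip(vh, vk)) and list(vh) != list(vk)


def concurrent(vh: Sequence[int], vk: Sequence[int]) -> bool:
    """x || y: neither timestamp is less than the other."""
    return not vt_less(vh, vk) and not vt_less(vk, vh)


def precedes_at(vh: Sequence[int], i: int, vk: Sequence[int]) -> bool:
    """Simplified test for distinct events, x at p_i: x -> y iff vh[i] <= vk[i]."""
    return vh[i] <= vk[i]


def concurrent_at(vh: Sequence[int], i: int, vk: Sequence[int], j: int) -> bool:
    """Simplified test, x at p_i and y at p_j: x || y iff vh[i] > vk[i] and vh[j] < vk[j]."""
    return vh[i] > vk[i] and vh[j] < vk[j]
```

For instance, a send at p0 stamped [1, 0, 0] precedes the matching receive at p1 stamped [1, 2, 0], while that receive and an internal event at p2 stamped [0, 0, 1] are concurrent.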
2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamps of two events, we can determine whether the events are causally related.
3. Event counting
If an event e has timestamp vh, vh[j] denotes the number of events executed by process pj
that causally precede e.
Qn No | Question | Answer | Bloom's Knowledge Level
2 | In Lamport's logical clock algorithm, what happens when a process receives a message with a timestamp? a) The process resets its local clock to zero. b) The process ignores the timestamp and uses its current clock value. c) The process updates its clock to the maximum of its current clock value and the received timestamp, then increments it by one. d) The process decreases its clock value to match the received timestamp. | c) The process updates its clock to the maximum of its current clock value and the received timestamp, then increments it by one. | K1
a) Event counting. b) Total ordering. c) Strong consistency. d) Clock skew. | c) Strong consistency. | K1
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, pp. 52–59.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.3
Learning Outcome (LO): At the end of this lecture, students will be able to | Bloom's Knowledge Level
Inter-process communication via message passing is at the core of any distributed system. Multicasts are required at the application layer when superimposed topologies or overlays are used, as well as at the lower layers of the protocol stack.
Notation:
The distributed system is modeled as a graph (N, L). The following notation is used to refer to messages and events:
• When referring to a message without regard for the identity of the sender and receiver processes, we use mi. For message mi, its send and receive events are denoted as si and ri, respectively.
• More generally, send and receive events are denoted simply as s and r. When the relationship between the message and its send and receive events is to be stressed, we also use M, send(M), and receive(M), respectively.
For any two events a and b, where each can be either a send event or a receive event, the notation a ∼ b denotes that a and b occur at the same process, i.e., a ∈ Ei and b ∈ Ei for some process i. The send and receive event pair for a message is said to be a pair of corresponding events: the send event corresponds to the receive event, and vice versa. For a given execution E, the set of all send-receive event pairs is denoted as T = {(s, r) ∈ Ei × Ej | s corresponds to r}.
When dealing with message ordering definitions, we consider only send and receive events, not internal events, because only communication events are relevant.
As distributed systems are networks of systems at various physical locations, coordination between them must always be preserved. Message ordering refers to the order in which messages are delivered to the intended recipients. The common message ordering schemes are first-in first-out (FIFO), non-FIFO, causal order, and synchronous order. For group communication with multicasting, causal and total ordering schemes are followed. It is also essential to define the behaviour of the system in case of failures. The following notation is widely used in this chapter:
send(M) and receive(M) denote the send and receive events of message M.
Qn No | Question | Answer | Bloom's Knowledge Level
C) Internal events
D) Communication events
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, pp. 189–190.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.4
Learning Outcome (LO): At the end of this lecture, students will be able to | Bloom's Knowledge Level
There is always a trade-off between concurrency and ease of use and implementation.
Asynchronous Executions
An asynchronous execution (or A-execution) is an execution (E, ≺) for which the causality relation is a partial order.
Although a physical link delivers the messages sent on it in FIFO order due to the physical properties of the medium, a logical link may be formed as a composite of physical links, and multiple paths may exist between the two end points of the logical link. Hence, messages sent on a logical link need not be delivered in FIFO order (non-FIFO executions).
FIFO executions
A FIFO logical channel can be created over a non-FIFO channel by using a separate
numbering scheme to sequence the messages on each logical channel.
The receiver uses a buffer to order the incoming messages as per the sender’s
sequence numbers, and accepts only the “next” message in sequence.
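A minimal sketch of the receiver side of this scheme, assuming the sender numbers its messages on the logical channel 0, 1, 2, … and that the channel may reorder messages but never drops or duplicates them. One FifoReceiver (a hypothetical name) would be kept per incoming logical channel.

```python
from typing import Any, Dict, List


class FifoReceiver:
    """Delivers messages from one sender in sequence-number order,
    buffering any that arrive early over a non-FIFO channel."""

    def __init__(self) -> None:
        self.next_seq = 0                 # next sequence number to deliver
        self.buffer: Dict[int, Any] = {}  # early arrivals, keyed by seq

    def on_arrival(self, seq: int, payload: Any) -> List[Any]:
        """Record an arrival; return the payloads that can now be delivered."""
        self.buffer[seq] = payload
        delivered = []
        while self.next_seq in self.buffer:
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

If messages 1 and 2 arrive before message 0, nothing is delivered until 0 arrives, at which point all three are delivered in order.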
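Causally Ordered (CO) executions
If two send events s and s' are related by causality ordering (not physical-time ordering), then a causally ordered execution requires that their corresponding receive events r and r' occur in the same order at all common destinations.
Causal order is used in applications that update shared data, distributed shared memory, or fair resource allocation.
A message m that arrives at a destination before another message sent to that destination in the causal past of send(m) is buffered until that earlier message has been delivered. The delayed message m is then given to the application for processing. The event of an application processing an arrived message is referred to as a delivery event.
Definition of CO for implementations: If send(m1) ≺ send(m2), then for each common destination d of messages m1 and m2, deliver_d(m1) ≺ deliver_d(m2) must be satisfied.
Empty-interval (EI) execution: An execution (E, ≺) is an empty-interval execution if for each pair of events (s, r) ∈ T, the open interval set {x ∈ E | s ≺ x ≺ r} in the partial order is empty.
An execution (E, ≺) is CO if and only if for each pair of events (s, r) ∈ T and each event e ∈ E:
• weak common past: e ≺ r ⟹ ¬(s ≺ e)
• weak common future: s ≺ e ⟹ ¬(e ≺ r)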
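The notes do not give a concrete CO protocol at this point, so the sketch below swaps in a well-known vector-clock technique for the special case of broadcast (every message goes to all processes), in the style of the Birman-Schiper-Stephenson protocol. A message from sender j is delivered only when it is the next broadcast from j and the receiver has already delivered every broadcast the sender had delivered before sending it. Reliable channels are assumed, and all names are hypothetical.

```python
from typing import Any, List, Tuple

Msg = Tuple[int, List[int], Any]  # (sender id, vector timestamp, payload)


class CausalBroadcastProcess:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.delivered = [0] * n     # delivered[k]: broadcasts from k delivered here
        self.pending: List[Msg] = []
        self.log: List[Any] = []     # delivered payloads, in delivery order

    def broadcast(self, payload: Any) -> Msg:
        # Count (and locally deliver) our own broadcast before stamping it.
        self.delivered[self.pid] += 1
        self.log.append(payload)
        return (self.pid, list(self.delivered), payload)

    def _deliverable(self, sender: int, vt: List[int]) -> bool:
        if vt[sender] != self.delivered[sender] + 1:
            return False  # not the next broadcast from this sender
        return all(vt[k] <= self.delivered[k]
                   for k in range(len(vt)) if k != sender)

    def on_receive(self, msg: Msg) -> None:
        self.pending.append(msg)
        progress = True
        while progress:  # one delivery may unblock other buffered messages
            progress = False
            for m in list(self.pending):
                sender, vt, payload = m
                if self._deliverable(sender, vt):
                    self.pending.remove(m)
                    self.delivered[sender] += 1
                    self.log.append(payload)
                    progress = True
```

In a three-process run where p1 broadcasts m2 after delivering p0's m1, a process p2 that receives m2 first buffers it and delivers m1 before m2, as the CO definition requires.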
Synchronous Execution
When all the communication between pairs of processes uses synchronous send and receive primitives, the resulting order is the synchronous order. Synchronous communication always involves a handshake between the receiver and the sender, so the handshake events may appear to be occurring instantaneously and atomically.
This instantaneous communication requires a modified definition of the causality relation because, for each (s, r) ∈ T, the send event is not causally ordered before the receive event. The two events are viewed as being atomic and simultaneous, and neither event precedes the other.
The synchronous causality relation ≪ on E is the smallest transitive relation that satisfies the following:
S1: If x occurs before y at the same process, then x ≪ y.
S2: If (s, r) ∈ T, then for all x ∈ E, [(x ≪ s ⟺ x ≪ r) and (s ≪ x ⟺ r ≪ x)].
S3: If x ≪ y and y ≪ z, then x ≪ z.
Qn No | Question | Answer | Bloom's Knowledge Level
a) Non-FIFO b) FIFO c) Causal order d) Synchronous order | b) FIFO | K1
a) FIFO b) Non-FIFO c) Causal order d) Synchronous order
d) CO is vacuously satisfied if there is no causal relationship between send events.
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, pp. 190–195.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.5
Learning Outcome (LO): At the end of this lecture, students will be able to | Bloom's Knowledge Level
When all the communication between pairs of processes uses synchronous send and receive primitives, the resulting order is the synchronous order. An algorithm designed for an asynchronous system may not work correctly on a synchronous system, and vice versa.
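An A-execution can be modeled to give a total order that extends the partial order (E, ≺); such a total order is called a linear extension of the execution. A non-separated linear extension is a linear extension of the execution such that each send event is immediately followed by its corresponding receive event.
An A-execution (E, ≺) is an RSC (realizable with synchronous communication) execution if and only if there exists a non-separated linear extension of the partial order (E, ≺).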
In the non-separated linear extension, if the adjacent send event and its corresponding
receive event are viewed atomically, then that pair of events shares a common past and a
common future with each other.
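Crown
Let E be an execution. A crown of size k in E is a sequence ⟨(si, ri), i ∈ {0, …, k−1}⟩ of pairs of corresponding send and receive events such that s0 ≺ r1, s1 ≺ r2, …, sk−2 ≺ rk−1, sk−1 ≺ r0.
For example, if s1 ≺ r2 and s2 ≺ r1, then ⟨(s1, r1), (s2, r2)⟩ is a crown. Cyclic dependencies may exist in a crown. The crown criterion states that an A-computation is RSC, i.e., it can be realized on a system with synchronous communication, if and only if it contains no crown.
Timestamp criterion: An execution (E, ≺) is RSC if and only if there exists a mapping from E to T (scalar timestamps) such that for any message M, T(s(M)) = T(r(M)), and for each (a, b) in (E × E) \ T, a ≺ b ⟹ T(a) < T(b).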
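Because a crown is exactly a cycle of messages in which each send precedes the next message's receive, the crown criterion can be checked with an ordinary depth-first cycle search. The sketch assumes the pairs (i, j) with s_i ≺ r_j have already been computed (for example, from vector timestamps); find_crown is a hypothetical helper. A result of None means the execution is crown-free and therefore RSC.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_crown(n: int, send_before_recv: Iterable[Tuple[int, int]]) -> Optional[List[int]]:
    """Messages are numbered 0..n-1; a pair (i, j) with i != j means s_i precedes r_j.
    Returns messages [m0, m1, ..., m(k-1)] forming a crown, or None if crown-free."""
    graph: Dict[int, List[int]] = {m: [] for m in range(n)}
    for i, j in send_before_recv:
        if i != j:                     # s_i precedes r_i always holds; ignore it
            graph[i].append(j)

    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on current DFS path, finished
    color = [WHITE] * n
    path: List[int] = []

    def dfs(u: int) -> Optional[List[int]]:
        color[u] = GREY
        path.append(u)
        for v in graph[u]:
            if color[v] == GREY:       # back edge: path from v to u, plus u -> v
                return path[path.index(v):]
            if color[v] == WHITE:
                found = dfs(v)
                if found is not None:
                    return found
        path.pop()
        color[u] = BLACK
        return None

    for m in range(n):
        if color[m] == WHITE:
            crown = dfs(m)
            if crown is not None:
                return crown
    return None
```

The two-message example above corresponds to find_crown(2, [(0, 1), (1, 0)]), which reports the crown [0, 1]; a chain such as [(0, 1), (1, 2)] has no cycle and yields None.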
Hierarchy of ordering paradigms:
RSC ⊂ CO ⊂ FIFO ⊂ A
This hierarchy is illustrated in Figure 2.3(a), and example executions of each class are shown side by side in Figure 2.3(b).
The above hierarchy implies that some executions belonging to a class X will not
belong to any of the classes included in X. The degree of concurrency is most in A and least
in SYNC.
Simulations
The events in the RSC execution are scheduled as per some non-separated linear
extension, and adjacent (s, r) events in this linear extension are executed sequentially in the
synchronous system.
If an A-execution is not RSC, then there is no way to schedule the events to make
them RSC, without actually altering the partial order of the given A-execution.
However, the following indirect strategy that does not alter the partial order can be
used.
Each channel Ci,j is modeled by a control process Pi,j that simulates the channel buffer.
This decouples the sender from the receiver, a feature that is essential in asynchronous systems.
A synchronous execution (S-execution) can be trivially realized on an asynchronous system by scheduling the messages in the order in which they appear in the S-execution.
The partial order of the S-execution remains unchanged but the communication
occurs on an asynchronous system that uses asynchronous communication primitives.
Once a message send event is scheduled, the middleware layer waits for
acknowledgment; after the ack is received, the synchronous send primitive completes.
Qn No | Question | Answer | Bloom's Knowledge Level
a) It represents a causal relationship between events. b) It indicates a cyclic dependency that prevents realizable synchronous communication (RSC). c) It ensures that messages are delivered in FIFO order. | b) It indicates a cyclic dependency that prevents realizable synchronous communication (RSC). | K1
c) FIFO order
d) A communication model where messages are always delivered in FIFO order.
d) Synchronous executions cannot be realized on asynchronous systems.
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, pp. 195–200.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.6
Learning Outcome (LO): At the end of this lecture, students will be able to | Bloom's Knowledge Level
There are no real systems with instantaneous communication that would allow synchronous communication to be realized naturally. The basic question is how a system with synchronous communication can be implemented. We first examine non-determinism in program execution, and CSP as a representative synchronous programming language, before examining an implementation of synchronous communication.
A distributed program is deterministic if repeated runs of the same program produce the same partial order of events. However, distributed programs sometimes exhibit non-determinism:
A receive call can receive a message from any sender who has sent a message, if the
expected sender is not specified.
Multiple send and receive calls which are enabled at a process can be executed in an
interchangeable order.
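For example, if two processes each execute a synchronous send to the other followed by a receive from the other, both block and the system deadlocks, even though there is no semantic dependency between the send and the immediately following receive at each of the processes. If the receive call at one of the processes can be scheduled before the send call, then there is no deadlock.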
Rendezvous
For the receive command, the sender must be specified. However, multiple receive commands can exist.
Scheduling involves pairing of matching send and receive commands that are both enabled. The communication events for the control messages under the covers do not alter the partial order of the execution.
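If multiple interactions are enabled, a process chooses one of them and tries to synchronize with the partner process. The problem reduces to one of scheduling messages satisfying the following constraints:
• Schedule online, atomically, and in a distributed manner, i.e., the scheduling code at any process does not know the application code of other processes.
• Schedule in a deadlock-free manner (i.e., crown-free), such that both the sender and the receiver are enabled for a message when it is scheduled.
• Schedule to satisfy the progress property (i.e., find a schedule within a bounded number of steps) in addition to the safety (i.e., correctness) property.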
Each process is assumed to have a unique priority; the rules below depend on whether the partner process has a higher or lower priority. The message types used are: M, ack(M), request(M), and permission(M). Execution events in the synchronous execution are only the send of the message M and the receive of the message M. The send and receive events for the other message types, ack(M), request(M), and permission(M), which are control messages, are under the covers and are not included in the synchronous execution. The messages request(M), ack(M), and permission(M) use M's unique tag; the message M is not included in these messages.
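(1) Pi wants to execute SEND(M) to a lower priority process Pj:
Pi executes send(M) and blocks until it receives ack(M) from Pj. The send event SEND(M) now completes.
Any M' message (from a higher priority process) and request(M') request for synchronization (from a lower priority process) received during the blocking period are queued.
(2) Pi wants to execute SEND(M) to a higher priority process Pj:
(2a) Pi seeks permission from Pj by executing send(request(M)).
// to avoid deadlock in which cyclically blocked processes queue messages.
(2b) While Pi is waiting for permission, it remains unblocked.
(i) If a message M' arrives from a higher priority process Pk, Pi accepts M' by scheduling a RECEIVE(M') event and then executes send(ack(M')) to Pk.
(ii) If a request(M') arrives from a lower priority process Pk, Pi executes send(permission(M')) to Pk and blocks waiting for the message M'. When M' arrives, the RECEIVE(M') event is executed.
(2c) When permission(M) arrives, Pi knows that partner Pj is synchronized, and Pi executes send(M). The SEND(M) now completes.
(3) request(M) arrival at Pi from a lower priority process Pj:
At the time a request(M) is processed by Pi, process Pi executes send(permission(M)) to Pj and blocks waiting for the message M. When M arrives, the RECEIVE(M) event is executed and the process unblocks.
(4) Message M arrival at Pi from a higher priority process Pj:
At the time a message M is processed by Pi, process Pi executes RECEIVE(M) (which is assumed to be always enabled) and then send(ack(M)) to Pj.
(5) Processing when Pi is unblocked:
When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it as a message arrival (as per rules 3 or 4).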
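To summarize which control messages accompany a single SEND(M) when no competing requests interfere, the following sketch assumes unique integer priorities where a larger value means a higher priority. The function name is hypothetical, and the sketch deliberately ignores queuing and blocking.

```python
from typing import List


def rendezvous_messages(sender_prio: int, receiver_prio: int, tag: str = "M") -> List[str]:
    """Messages exchanged, in order, for one SEND(M) with no competing requests.
    Assumes unique integer priorities where a larger value is a higher priority."""
    if sender_prio == receiver_prio:
        raise ValueError("priorities must be unique")
    if sender_prio > receiver_prio:
        # Higher-priority sender: send M directly; the receiver answers with ack(M).
        return [tag, f"ack({tag})"]
    # Lower-priority sender: ask the receiver for permission, then send M.
    return [f"request({tag})", f"permission({tag})", tag]
```

A send from a higher- to a lower-priority process therefore costs two messages, and a send in the other direction costs three.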
Qn No | Question | Answer | Bloom's Knowledge Level
C) It involves processes communicating at the same time.
A) M B) ack(M) C) sync(M) D) permission(M) | C) sync(M) | K1
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, pp. 200–205.
Unit LOGICAL TIME AND GLOBAL STATE Lecture No C303.2.7
GROUP COMMUNICATION
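If a multicast algorithm requires the sender to be a part of the destination group, it is a closed group algorithm. Closed group algorithms are specific and easy to implement.
If the sender of the multicast can be outside the destination group, it is an open group algorithm. Open group algorithms are more general, but difficult to design and expensive to implement.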
Two widely used orderings for group communication are causal order and total order.
Causal Order (CO)
Given a system with FIFO channels, causal order needs to be explicitly enforced by a protocol. The following two criteria must be met by a causal ordering protocol:
Safety: In order to prevent causal order from being violated, a message M that arrives at a process may need to be buffered until all system-wide messages sent in the causal past of the send(M) event to that same destination have already arrived. The arrival of a message is transparent to the application process. The delivery event corresponds to the receive event in the execution model.
Liveness: A message that arrives at a process must eventually be delivered to the process.
Each message M should carry a log of all other messages sent causally before M's send event and sent to the same destination as M. This log can then be examined to determine whether it is safe to deliver a message.
All algorithms aim to reduce this log overhead, as well as the space and time overhead of maintaining the log information at the processes.
To distribute this log information, broadcast and multicast communication is used.
Propagation Constraints: Information of the form "d is a destination of M" about a message M sent in the causal past must be stored and piggybacked as long as, and only as long as:
Propagation Constraint I: it is not known that the message M is delivered to d, and
Propagation Constraint II: it is not known that a message has been sent to d in the causal future of send(M), and hence it is not guaranteed, using a reasoning based on transitivity, that the message M will be delivered to d in CO.
The Propagation Constraints also imply that if either (I) or (II) is false, the information "d ∈ M.Dests" must not be stored or propagated, even to remember that (I) or (II) has been falsified.
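In other words, the information "d ∈ Mi,a.Dests" must be available in the causal future of event ei,a, but (i) not in the causal future of deliver_d(Mi,a), and (ii) not in the causal future of ek,c, where d ∈ Mk,c.Dests and there is no other message sent causally between Mi,a and Mk,c to the same destination d.
Information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO is explicitly tracked by the algorithm using (source, timestamp, destination) information.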
The data structures maintained are sorted row–major and then column–major:
1. Explicit tracking:
Tracking of (source, timestamp, destination) information for messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO is done explicitly using the l.Dests field of entries in local logs at nodes and the o.Dests field of entries in messages. The sets li,a.Dests and oi,a.Dests contain explicit information about the destinations to which Mi,a is not guaranteed to be delivered in CO and is not known to be delivered. The information about d ∈ Mi,a.Dests is propagated up to the earliest events on all causal paths from (i, a) at which it is known that Mi,a is delivered to d or is guaranteed to be delivered to d in CO.
2. Implicit tracking:
Tracking of messages that are either (i) already delivered or (ii) guaranteed to be delivered in CO is performed implicitly. The information about such messages is deleted and not propagated, because it is redundant as far as enforcing CO is concerned. It is, however, useful in determining what information that is being carried in other messages and stored in logs at other nodes has become redundant and thus can be purged.
These semantics are implicitly stored and propagated: information about messages that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked without explicitly storing it.
The algorithm derives it from the existing explicit information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only oi,a.Dests or li,a.Dests, which is part of the explicit information.
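The explicit Dests information is what a receiver examines in its Delivery Condition. The sketch below covers only that check, not the full algorithm (which also merges, prunes, and propagates logs). It assumes each piggybacked entry names an earlier multicast by (source, per-source sequence number), and that the receiver records, for each source, the highest sequence number it has delivered from that source; LogEntry and can_deliver are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class LogEntry:
    """Piggybacked explicit information about an earlier multicast M(source, seq)."""
    source: int            # process that multicast the earlier message
    seq: int               # that message's per-source sequence number
    dests: FrozenSet[int]  # destinations not known to have delivered it and
                           # not guaranteed to receive it in CO


def can_deliver(j: int, piggyback: Iterable[LogEntry], delivered: Dict[int, int]) -> bool:
    """Delivery check at process j: every earlier message whose explicit Dests
    still lists j must already have been delivered at j. delivered[s] is the
    highest sequence number from source s delivered at j (0 if none)."""
    return all(delivered.get(e.source, 0) >= e.seq
               for e in piggyback if j in e.dests)
```

In the example that follows, M4,3 sent to P6 carries M5,1.Dests = {P6}, so P6 must have delivered M5,1 before it can deliver M4,3, whereas the copy sent to P3 carries M5,1.Dests = ∅ and passes the check immediately.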
Fig 2.8: Illustration of propagation constraints
Message M5,1 sent to processes P4 and P6 contains the piggybacked information M5,1.Dests = {P4, P6}. Additionally, at the send event (5, 1), the information M5,1.Dests = {P4, P6} is also inserted in the local log Log5. When M5,1 is delivered to P6, the (new) piggybacked information "P4 ∈ M5,1.Dests" is stored in Log6 as "M5,1.Dests = {P4}"; the information "P6 ∈ M5,1.Dests", which was needed for routing, must not be stored in Log6 because of constraint I. Symmetrically, when M5,1 is delivered to process P4 at event (4, 1), only the new piggybacked information "P6 ∈ M5,1.Dests" is inserted in Log4 as "M5,1.Dests = {P6}", which is later propagated during multicast M4,2.
Multicast M4,3
At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3 only to process P6 to ensure causal delivery using the Delivery Condition. The piggybacked information on message M4,3 sent to process P3 must not contain this information because of constraint II. (The piggybacked information contains "M4,3.Dests = {P6}". As long as any future message sent to P6 is delivered in causal order w.r.t. M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1 sent to P6.) And as M5,1 is already delivered to P4, the information "M5,1.Dests = ∅" is piggybacked on M4,3 sent to P3. Similarly, the information "P6 ∈ M5,1.Dests" must be deleted from Log4, as it will no longer be needed because of constraint II. "M5,1.Dests = ∅" is stored in Log4 to remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its destinations.
When message M4,2 is received by processes P2 and P3, they insert the (new) piggybacked information in their local logs as M5,1.Dests = {P6}. They both continue to store this in Log2 and Log3 and propagate this information on multicasts until they learn, at events (2, 4) and (3, 2) on receipt of messages M3,3 and M4,3 respectively, that any future message is expected to be delivered in causal order to process P6 w.r.t. M5,1 sent to P6. Hence, by constraint II, this information must be deleted from Log2 and Log3.
When M4,3 with piggybacked information M5,1.Dests = ∅ is received by P3 at (3, 2), this is inferred to be valid current implicit information about multicast M5,1, because the log Log3 already contains explicit information P6 ∈ M5,1.Dests about that multicast. Therefore, the explicit information in Log3 is inferred to be old and must be deleted to achieve optimality. The logic by which P2 learns this implicit knowledge on the arrival of M3,3 is identical.
Processing at P6
When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the current implicit information that M5,1 has been delivered to P6 by its very absence in the explicit information.
When the information P6 ∈ M5,1.Dests arrives on M4,3 piggybacked as M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the Delivery Condition, and is not inserted in Log6 (constraint I). Further, the presence of M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit piggybacked information implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4, and, therefore, M5,1.Dests is set to ∅ in Log6.
When the information P6 ∈ M5,1.Dests arrives on M5,2 piggybacked as M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the Delivery Condition, and is not inserted in Log6 because Log6 contains M5,1.Dests = ∅, which gives the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to both P4 and P6.
Processing at P1
M5,1.Dests for deletion from Log1. Simultaneously, M5,1.Dests = {P6} in Log1 implies
Qn No | Question | Answer | Bloom's Knowledge Level
c) Bellman-Ford algorithm
d) Paxos algorithm
d) Basic point-to-point
communication (Unicasting).
6. In the context of causal order (CO), what
does Propagation Constraint I specify?
Students have to prepare answers for the following questions at the end of the lecture.
Qn No | Question | Marks | CO | Bloom's Knowledge Level
Reference Book: Distributed Computing: Principles, Algorithms and Systems, Ajay D. Kshemkalyani and Mukesh Singhal, pages 206-215.
Unit: LOGICAL TIME AND GLOBAL STATE    Lecture No: C303.2.8
Learning Outcome (LO): At the end of this lecture, students will be able to:    Bloom’s Knowledge Level
TOTAL ORDER
For each pair of processes Pi and Pj and for each pair of messages Mx and My that are
delivered to both the processes, Pi is delivered Mx before My if and only if Pj is delivered
Mx before My.
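The definition can be checked mechanically on a recorded execution. The sketch below assumes each process's delivery sequence is given as a list of message names with no duplicates; the function name is_total_order is hypothetical. For every pair of processes it projects both sequences onto the messages delivered to both and requires the projections to be identical, which is the same as requiring "Mx before My at Pi if and only if Mx before My at Pj" for every common pair.

```python
from itertools import combinations
from typing import Dict, List


def is_total_order(deliveries: Dict[str, List[str]]) -> bool:
    """True if every pair of processes delivers their common messages
    in the same relative order (assumes no duplicate deliveries)."""
    for pi, pj in combinations(deliveries, 2):
        common = set(deliveries[pi]) & set(deliveries[pj])
        # Keep only messages delivered to both processes, preserving order.
        seq_i = [m for m in deliveries[pi] if m in common]
        seq_j = [m for m in deliveries[pj] if m in common]
        if seq_i != seq_j:
            return False
    return True
```

For example, P1 delivering [Mx, My] and P2 delivering [Mx, Mz, My] satisfies total order, while P2 delivering [My, Mx] would violate it.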
Each process sends the message it wants to broadcast to a centralized process, which relays
all the messages it receives to every other process over FIFO channels.
Complexity: Each message transmission takes two message hops and exactly n messages in a
system of n processes.
Drawbacks: A centralized algorithm has a single point of failure and congestion, and is not
an elegant solution.
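A minimal simulation of the centralized algorithm, under stated assumptions: the centralized process (sequencer) is the n-th process and does not broadcast on its own behalf, the other n − 1 processes are listed in `processes`, each FIFO channel is a `deque`, and messages reach the sequencer in the order given in `sends`. The function name is hypothetical. It illustrates why the order is total (everyone receives the sequencer's single order over FIFO channels) and reproduces the cost of two hops and n messages per broadcast: one message to the sequencer plus n − 1 relays, the original sender included.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def centralized_total_order(
    sends: List[Tuple[str, str]], processes: List[str]
) -> Tuple[Dict[str, List[str]], int]:
    """Simulate a hypothetical sequencer.

    sends: (sender, message) pairs in the order they reach the sequencer.
    processes: the n - 1 non-sequencer processes (every sender is one of them).
    Returns each process's delivery order and the total message count.
    """
    channels: Dict[str, Deque[str]] = {p: deque() for p in processes}  # FIFO
    messages = 0
    for _sender, msg in sends:
        messages += 1                  # hop 1: sender -> sequencer
        for p in processes:            # hop 2: sequencer -> every process,
            channels[p].append(msg)    # including the original sender
            messages += 1
    # Each process delivers in the order its FIFO channel hands messages over.
    delivered = {p: list(channels[p]) for p in processes}
    return delivered, messages
```

With processes P1, P2, P3 plus the sequencer (n = 4) and three broadcasts, every process delivers the same sequence and the count is 3 × 4 = 12 messages.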
Sender side
Phase 1
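In the first phase, a process multicasts the message M with a locally unique tag and a
tentative timestamp to the group members.
Phase 2
The sender process awaits a reply from all the group members, who respond with a revised
(proposed) timestamp for M.
Phase 3
In the third phase, the sender takes the maximum of the revised timestamps as the final
timestamp and multicasts it, with the same tag, to the group.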
Receiver Side
Phase 1
The receiver receives the message with a tentative timestamp. It updates the variable
priority that tracks the highest proposed timestamp, then revises the proposed timestamp to
the priority, and places the message with its tag and the revised timestamp at the tail of the
queue temp_Q. In the queue, the entry is marked as undeliverable.
Phase 2
The receiver sends the revised timestamp back to the sender. The receiver then waits
for the final timestamp for that message from the sender.
Phase 3
The final timestamp is received from the multicaster. The corresponding message
entry in temp_Q is identified using the tag, and is marked as deliverable after the revised
timestamp is overwritten by the final timestamp.
The queue is then re-sorted using the timestamp field of the entries as the key. As the
queue is already sorted except for the modified entry for the message under consideration,
that message entry has to be placed in its sorted position in the queue.
If the message entry is at the head of the temp_Q, that entry, and all consecutive
subsequent entries that are also marked as deliverable, are dequeued from temp_Q, and
enqueued in deliver_Q.
Complexity
This algorithm uses three phases, and, to send a message to n − 1 processes, it uses 3(n − 1)
messages and incurs a delay of three message hops.
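The receiver-side steps can be sketched as follows, with the sender's phase 3 reduced to taking the maximum proposal. Assumptions (not from the notes): timestamps are (counter, proposer id) pairs so that the id breaks ties, a receiver proposes max(priority + 1, tentative) so that its successive proposals are distinct, temp_Q is a dictionary searched for its minimum instead of a physically sorted list, and the class and function names are hypothetical. The demo runs two concurrent multicasts A and B whose messages reach P1 and P2 in opposite orders.

```python
from typing import Dict, List, Tuple

Stamp = Tuple[int, str]  # (counter, proposer id); the id breaks ties


class Receiver:
    def __init__(self, pid: str) -> None:
        self.pid = pid
        self.priority = 0                                # highest timestamp seen
        self.temp_q: Dict[str, Tuple[Stamp, bool]] = {}  # tag -> (stamp, deliverable)
        self.deliver_q: List[str] = []

    def on_revise_ts(self, tag: str, tentative: int) -> Stamp:
        """Phases 1-2: revise the tentative timestamp and return the proposal."""
        self.priority = max(self.priority + 1, tentative)
        stamp = (self.priority, self.pid)
        self.temp_q[tag] = (stamp, False)                # marked undeliverable
        return stamp

    def on_final_ts(self, tag: str, final: Stamp) -> None:
        """Phase 3: overwrite with the final timestamp, then deliver from the head."""
        self.priority = max(self.priority, final[0])
        self.temp_q[tag] = (final, True)                 # marked deliverable
        while self.temp_q:
            head = min(self.temp_q, key=lambda t: self.temp_q[t][0])
            if not self.temp_q[head][1]:
                break                                    # head still undeliverable
            del self.temp_q[head]
            self.deliver_q.append(head)


def choose_final(proposals: List[Stamp]) -> Stamp:
    """Sender, phase 3: the final timestamp is the largest proposal."""
    return max(proposals)


def demo() -> Tuple[List[str], List[str]]:
    r1, r2 = Receiver("P1"), Receiver("P2")
    # Concurrent multicasts A and B reach P1 and P2 in opposite orders.
    a1 = r1.on_revise_ts("A", 1)
    b1 = r1.on_revise_ts("B", 1)
    b2 = r2.on_revise_ts("B", 1)
    a2 = r2.on_revise_ts("A", 1)
    final_a, final_b = choose_final([a1, a2]), choose_final([b1, b2])
    # The final timestamps also arrive in different orders.
    r1.on_final_ts("A", final_a)
    r1.on_final_ts("B", final_b)
    r2.on_final_ts("B", final_b)
    r2.on_final_ts("A", final_a)
    return r1.deliver_q, r2.deliver_q
```

In the demo, A's final timestamp is (2, "P2") and B's is (2, "P1"), so both receivers deliver B before A even though they saw the messages in different orders. Each multicast here costs one REVISE_TS, one reply and one final message per receiver, in line with the 3(n − 1) count.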
Qn No   Question   Answer   Bloom’s Knowledge Level
d) 3 message hops
Students have to prepare answers for the following questions at the end of the lecture:
Qn No   Question   Marks   CO   Bloom’s Knowledge Level
Topic: Global State and Snapshot Recording Algorithm – Introduction – System Model and Definitions
Reference Book: Distributed Computing: Principles, Algorithms and Systems, Ajay D. Kshemkalyani and Mukesh Singhal, pages 215-220.
Learning Outcome (LO): At the end of this lecture, students will be able to:    Bloom’s Knowledge Level
recording a global snapshot. Use relevant examples and diagrams to illustrate the
interpretation of cuts and consistency conditions.
memory and communicate asynchronously with each other by message passing. Each
component of the system has a local state. The state of a process is its local memory and a
history of its activity.
The state of a channel is characterized by the set of messages sent along the channel
less the messages received along the channel. The global state of a distributed system is a
collection of the local states of its components.
If shared memory were available, an up-to-date state of the entire system would be
available to the processes sharing the memory.
The absence of shared memory necessitates ways of getting a coherent and complete
view of the system based on the local states of individual processes.
This would be possible if the local clocks at processes were perfectly synchronized
or if there were a global system clock that could be instantaneously read by the processes.
However, neither is possible in a distributed system.
System Model
The system consists of a collection of n processes, p1, p2, …, pn, that are connected
by channels.
The state of a process at any time is defined by the contents of processor registers,
stacks, local memory, etc., and may be highly dependent on the local context of the
distributed application.
The state of channel Cij, denoted by SCij, is given by the set of messages in transit
in the channel.
The events that may happen are internal events, send events (send(mij)), and receive
events (rec(mij)).
A channel is a distributed entity and its state depends on the local states of the
processes on which it is incident.
In the FIFO model, each channel acts as a first-in first-out message queue and, thus,
message ordering is preserved by a channel.
In the non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random order.
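The two channel models can be contrasted with two tiny classes. This is an illustrative sketch with hypothetical class names: the FIFO channel is a `deque` that always hands over the oldest message, while the non-FIFO channel removes an arbitrary in-transit message, chosen here with `random.Random` (an assumption standing in for "random order"). In both, `state()` returns the messages sent but not yet received, which is the channel state SCij described above.

```python
import random
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class FifoChannel(Generic[T]):
    """Channel C_ij modelled as a first-in first-out queue."""

    def __init__(self) -> None:
        self._q: Deque[T] = deque()

    def send(self, m: T) -> None:
        self._q.append(m)

    def receive(self) -> T:
        return self._q.popleft()        # always the oldest message

    def state(self) -> List[T]:
        return list(self._q)            # SC_ij: messages still in transit


class NonFifoChannel(Generic[T]):
    """Channel C_ij modelled as a set: any in-transit message may come next."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._msgs: List[T] = []
        self._rng = rng or random.Random()

    def send(self, m: T) -> None:
        self._msgs.append(m)

    def receive(self) -> T:
        return self._msgs.pop(self._rng.randrange(len(self._msgs)))

    def state(self) -> List[T]:
        return list(self._msgs)
```

Sending a, b, c on a FifoChannel always yields a, b, c; a NonFifoChannel yields the same three messages in some order, which is why FIFO-based snapshot algorithms cannot be used on it unchanged.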
The global state of a distributed system is a collection of the local states of the
processes and the channels.
Law of conservation of messages: Every message mij that is recorded as sent in the local
state of a process pi must be captured in the state of the channel Cij or in the collected
local state of the receiver process pj.
In a consistent global state, every message that is recorded as received is also recorded
as sent. Such a global state captures the notion of causality that a message cannot be
received if it was not sent.
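The two requirements above can be written compactly. The notation LS_i for the recorded local state of p_i is assumed here (following the reference book; it is not defined elsewhere in these notes), SC_ij is the channel state defined earlier, and ⊕ is exclusive-or. C1 restates the law of conservation of messages; C2 states that a message not recorded as sent is neither in the channel nor recorded as received.

```latex
GS = \Big\{ \bigcup_{i} LS_i ,\; \bigcup_{i,j} SC_{ij} \Big\}

\textbf{C1:}\quad send(m_{ij}) \in LS_i \;\Rightarrow\; \big(m_{ij} \in SC_{ij}\big) \oplus \big(rec(m_{ij}) \in LS_j\big)

\textbf{C2:}\quad send(m_{ij}) \notin LS_i \;\Rightarrow\; m_{ij} \notin SC_{ij} \;\wedge\; rec(m_{ij}) \notin LS_j
```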
Meenakshi.R
Consistent global states are meaningful global states and inconsistent global states are not
meaningful in the sense that a distributed system can never be in an inconsistent state.
Interpretation of cuts
Cuts in a space–time diagram provide a graphical aid for representing and reasoning about
the global states of a computation. A cut is a line joining an arbitrary point on each process
line that slices the space–time diagram into a PAST and a FUTURE.
A consistent global state corresponds to a cut in which every message received in the
PAST of the cut has been sent in the PAST of that cut. Such a cut is known as a consistent
cut.
In a consistent snapshot, all the recorded local states of processes are concurrent; that
is, the recorded local state of no process causally affects the recorded local state of any other
process.
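Checking whether a cut is consistent reduces to looking at each message once. The sketch assumes a simple encoding (hypothetical, for illustration): events on each process line are numbered 1, 2, 3, …, `cut[p]` is the number of events of p in the PAST of the cut, and each message records the event numbers of its send and receive. A message received in the PAST but sent in the FUTURE makes the cut inconsistent; a message sent in the PAST and received in the FUTURE is in transit and belongs to the channel state.

```python
from typing import Dict, List, NamedTuple, Tuple


class Message(NamedTuple):
    sender: str
    send_event: int   # 1-based index of the send event on the sender's line
    receiver: str
    recv_event: int   # 1-based index of the receive event on the receiver's line


def check_cut(cut: Dict[str, int], messages: List[Message]) -> Tuple[bool, List[Message]]:
    """cut[p] = number of events of p in the PAST of the cut.
    Returns (is_consistent, messages in transit across the cut)."""
    consistent = True
    in_transit: List[Message] = []
    for m in messages:
        sent_in_past = m.send_event <= cut[m.sender]
        received_in_past = m.recv_event <= cut[m.receiver]
        if received_in_past and not sent_in_past:
            consistent = False          # received in PAST, sent in FUTURE
        elif sent_in_past and not received_in_past:
            in_transit.append(m)        # recorded as part of the channel state
    return consistent, in_transit
```

For a message sent as p1's 2nd event and received as p2's 1st event, the cut {p1: 2, p2: 0} is consistent with that message in transit, whereas {p1: 1, p2: 1} is inconsistent because the receive lies in the PAST and the send in the FUTURE.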
The non-availability of global clock in distributed system, raises the following issues:
Issue 1:
How to distinguish between the messages to be recorded in the snapshot from those
not to be recorded?
Answer:
Any message that is sent by a process before recording its snapshot must be recorded
in the global snapshot (from C1).
Any message that is sent by a process after recording its snapshot must not be
recorded in the global snapshot (from C2).
Issue 2:
How to determine the instant when a process takes its snapshot?
Answer:
A process pj must record its snapshot before processing a message mij that was sent by
process pi after recording its snapshot.
Assessment questions to the lecture
Qn No   Question   Answer   Bloom’s Knowledge Level
Students have to prepare answers for the following questions at the end of the lecture:
Qn No   Question   Marks   CO   Bloom’s Knowledge Level
Reference Book: Distributed Computing: Principles, Algorithms and Systems, Ajay D. Kshemkalyani and Mukesh Singhal, pages 87-93.
Unit: LOGICAL TIME AND GLOBAL STATE    Lecture No: C303.2.10
Learning Outcome (LO): At the end of this lecture, students will be able to:    Bloom’s Knowledge Level
servers. These processes communicate with each other through messaging channels.
A snapshot captures the local states of each process along with the state of each
communication channel.
Snapshots are useful for:
Checkpointing
Collecting garbage
Detecting deadlocks
Debugging
Chandy–Lamport algorithm
The algorithm records a global snapshot consisting of the local state of each process and
the state of each channel. It uses a control message called a marker.
Since channels are FIFO, a marker separates the messages in the channel into
those to be recorded in the snapshot and those not to be recorded in the snapshot.
This addresses Issue 1. The role of markers in a FIFO system is to act as delimiters
for the messages in the channels so that the channel state recorded by the process at the
receiving end of the channel satisfies the condition C2.
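Initiating a snapshot
Process Pi records its own local state and sends a marker message to all other processes
over its N-1 outgoing channels.
It then starts recording all incoming messages from channels Cji for j not equal to i.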
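Propagating a snapshot
When process Pj receives a marker on channel Ckj for the first time, it:
Records its own local state and records the state of channel Ckj as empty.
Sends the marker message to all other processes.
Records all incoming messages from channels Clj for l not equal to j or k.
If Pj has already recorded its state, it stops recording channel Ckj; the recorded state of
Ckj is the sequence of messages received on Ckj after Pj recorded its state and before the
marker arrived on Ckj.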
Terminating a snapshot
All processes have received a marker on all of their N-1 incoming channels.
A central server can gather the recorded partial states to build a global snapshot.
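The initiating, propagating and terminating rules above can be exercised in a small single-threaded simulation. Assumptions for illustration only: N fully connected processes, one FIFO channel `deque` per ordered pair (chan[(i, j)] is Cij), application messages that transfer money between bank balances, and a round-robin scheduler; `System`, `Process`, `transfer` and `run_to_completion` are hypothetical names. Because money is only moved, never created, a correct snapshot must account for the same total as the initial balances, counting both recorded process states and recorded in-transit messages.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple, Union

MARKER = "MARKER"
Msg = Union[int, str]  # an int is an amount being transferred; MARKER is control


class Process:
    def __init__(self, pid: int, balance: int) -> None:
        self.pid = pid
        self.balance = balance
        self.recorded_state: Optional[int] = None
        self.channel_state: Dict[int, List[int]] = {}  # k -> recorded msgs of C_k,pid
        self.recording: Dict[int, bool] = {}           # still recording C_k,pid?


class System:
    def __init__(self, balances: List[int]) -> None:
        self.n = len(balances)
        self.procs = [Process(i, b) for i, b in enumerate(balances)]
        # chan[(i, j)] is the FIFO channel C_ij from p_i to p_j
        self.chan: Dict[Tuple[int, int], Deque[Msg]] = {
            (i, j): deque() for i in range(self.n) for j in range(self.n) if i != j}

    def transfer(self, i: int, j: int, amount: int) -> None:
        self.procs[i].balance -= amount
        self.chan[(i, j)].append(amount)

    def _record(self, p: Process) -> None:
        # Record own state, then send a marker on every outgoing channel
        # before any further application message is sent.
        p.recorded_state = p.balance
        for k in range(self.n):
            if k != p.pid:
                p.channel_state[k] = []
                p.recording[k] = True
                self.chan[(p.pid, k)].append(MARKER)

    def initiate(self, i: int) -> None:
        self._record(self.procs[i])

    def deliver(self, k: int, j: int) -> None:
        """Deliver the message at the head of C_kj to p_j."""
        msg = self.chan[(k, j)].popleft()
        p = self.procs[j]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record(p)          # first marker: C_kj recorded as empty
            p.recording[k] = False       # stop recording C_kj
        else:
            p.balance += msg
            if p.recorded_state is not None and p.recording[k]:
                p.channel_state[k].append(msg)

    def run_to_completion(self) -> None:
        # Deliver pending messages round-robin until every channel is empty.
        while any(self.chan.values()):
            for (k, j), q in self.chan.items():
                if q:
                    self.deliver(k, j)

    def snapshot_total(self) -> int:
        return sum(p.recorded_state + sum(sum(m) for m in p.channel_state.values())
                   for p in self.procs)
```

For example, with balances [100, 100, 100], transfers 0→1 of 10 and 1→2 of 20, then initiate(0), then a transfer 2→0 of 5 and run_to_completion(), the recorded states are 90, 90 and 95, the 20 is recorded in C12 and the 5 in C20, and the snapshot still totals 300.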
Since a process records its snapshot when it receives the first marker on any
incoming channel, no messages that follow markers on the channels incoming to it are
recorded in the process’s snapshot.
Due to FIFO property of channels, it follows that no message sent after the marker
on that channel is recorded in the channel state. Thus, condition C2 is satisfied.
When a process pj receives message mij that precedes the marker on channel Cij, it
acts as follows: if process pj has not taken its snapshot yet, then it includes mij in its
recorded snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition
C1 is satisfied.
Complexity
The recording part of a single instance of the algorithm requires O(e) messages
and O(d) time, where e is the number of edges in the network and d is the diameter of the
network.
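As a worked instance of these bounds, assume a fully connected network of n processes with one FIFO channel in each direction between every pair. Each process sends exactly one marker on each outgoing channel (when it records its state), so the number of markers equals the number of directed channels, and every process is one hop from the initiator.

```latex
e = n(n-1), \qquad n = 3 \;\Rightarrow\; e = 3 \times 2 = 6 \text{ markers}, \qquad d = 1
```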
The recorded global state may not correspond to any of the global states that actually
occurred during the computation. This happens because a process can change its state
asynchronously before the markers it sent are received by other sites and the other sites
record their states.
But the system could have passed through the recorded global state in some equivalent
execution.
The recorded global state is a valid state in an equivalent execution, and if a stable property
(i.e., a property that persists) holds in the system before the snapshot algorithm begins, it
holds in the recorded global snapshot.
Qn No   Question   Answer   Bloom’s Knowledge Level
Students have to prepare answers for the following questions at the end of the lecture:
Qn No   Question   Marks   CO   Bloom’s Knowledge Level
Reference Book: