Distributed Algorithms
Wan Fokkink
Distributed Algorithms: An Intuitive Approach
MIT Press, 2013 (revised 1st edition, 2015)
1 / 329
Algorithms
A skilled programmer must have good insight into algorithms.
At bachelor level you were offered courses on basic algorithms:
searching, sorting, pattern recognition, graph problems, ...
You learned how to detect such subproblems within your programs,
and solve them effectively.
You’re trained in algorithmic thought for uniprocessor programs
(e.g. divide-and-conquer, greedy, memoization).
2 / 329
Distributed systems
A distributed system is an interconnected collection of
autonomous processes.
Motivation:
- information exchange
- resource sharing
- parallelization to increase performance
- replication to increase reliability
- multicore programming
3 / 329
Distributed versus uniprocessor
Distributed systems differ from uniprocessor systems in three aspects.
- Lack of knowledge of the global state: A process has no up-to-date knowledge of the local states of other processes.
  Example: termination and deadlock detection become an issue.
- Lack of a global time frame: There is no total order on events by their temporal occurrence.
  Example: mutual exclusion becomes an issue.
- Nondeterminism: Execution by processes is nondeterministic, so running a system twice can give different results.
  Example: race conditions.
4 / 329
Aim of this course
This course offers a bird’s-eye view on a wide range of algorithms
for basic and important challenges in distributed systems.
It aims to provide you with an algorithmic frame of mind for
solving fundamental problems in distributed computing.
- Handwaving correctness arguments.
- Back-of-the-envelope complexity calculations.
- Carefully developed exercises to acquaint you with intricacies of distributed algorithms.
5 / 329
Message passing
The two main paradigms to capture communication in
a distributed system are message passing and shared memory.
We’ll only consider message passing.
(The course Concurrency & Multithreading is dedicated to shared memory.)
Asynchronous communication means that sending and receiving
of a message are independent events.
In case of synchronous communication, sending and receiving
of a message are coordinated to form a single event; a message
is only allowed to be sent if its destination is ready to receive it.
We’ll mainly consider asynchronous communication.
6 / 329
Communication protocols
In a computer network, messages are transported through a medium,
which may lose, duplicate or garble these messages.
A communication protocol detects and corrects such flaws during
message passing.
Example: Sliding window protocols.
[Diagram: sender S and receiver R, each with a sliding window over sequence numbers 0–7.]
7 / 329
Assumptions
Unless stated otherwise, we assume:
- a strongly connected network
- each process knows only its neighbors
- message passing communication
- asynchronous communication
- channels are non-FIFO
- the delay of messages in channels is arbitrary but finite
- channels don't lose, duplicate or garble messages
- processes don't crash
- processes have unique id's
8 / 329
Directed versus undirected channels
Channels can be directed or undirected.
Question: What is more general, an algorithm for a directed
or for an undirected network ?
Remarks:
- Algorithms for undirected channels often include ack's.
- Acyclic networks must always be undirected (else the network wouldn't be strongly connected).
9 / 329
Complexity measures
Resource consumption of an execution of a distributed algorithm
can be considered in several ways.
Message complexity: Total number of messages exchanged.
Bit complexity: Total number of bits exchanged.
(Only interesting when messages can be very long.)
Time complexity: Amount of time consumed.
(We assume: (1) event processing takes no time, and
(2) a message is received at most one time unit after it is sent.)
Space complexity: Amount of memory needed for the processes.
Different executions require different consumption of resources.
We consider worst- and average-case complexity (the latter with
a probability distribution over all executions).
10 / 329
Big O notation
Complexity measures state how resource consumption
(messages, time, space) grows in relation to input size.
For example, if an algorithm has a worst-case message complexity of O(n²), then for an input of size n, the algorithm in the worst case takes in the order of n² messages.
Let f, g : N → R_{>0}.
f = O(g) if, for some C > 0, f(n) ≤ C·g(n) for all n ∈ N.
f = Θ(g) if f = O(g) and g = O(f).
11 / 329
Formal framework
Now follows a formal framework for describing distributed systems,
mainly to fix terminology.
In this course, correctness proofs and complexity estimations of
distributed algorithms are presented in an informal fashion.
(The course Protocol Validation treats algorithms and tools to prove correctness
of distributed algorithms and network protocols.)
12 / 329
Transition systems
The (global) state of a distributed system is called a configuration.
The configuration evolves in discrete steps, called transitions.
A transition system consists of:
- a set C of configurations;
- a binary transition relation → on C; and
- a set I ⊆ C of initial configurations.
γ ∈ C is terminal if γ → δ for no δ ∈ C.
13 / 329
Executions
An execution is a sequence γ0 γ1 γ2 · · · of configurations that
either is infinite or ends in a terminal configuration, such that:
- γ0 ∈ I, and
- γi → γi+1 for all i ≥ 0 (excluding, for finite executions, the terminal γi at the end).
A configuration δ is reachable if there is a γ0 ∈ I and
a sequence γ0 γ1 γ2 · · · γk = δ with γi → γi+1 for all 0 ≤ i < k.
14 / 329
States and events
A configuration of a distributed system is composed from
the states at its processes, and the messages in its channels.
A transition is associated to an event (or, in case of synchronous
communication, two events) at one (or two) of its processes.
A process can perform internal, send and receive events.
A process is an initiator if its first event is an internal or send event.
An algorithm is centralized if there is exactly one initiator.
A decentralized algorithm can have multiple initiators.
15 / 329
Assertions
An assertion is a predicate on the configurations of an algorithm.
An assertion is a safety property if it is true in each configuration
of each execution of the algorithm.
“something bad will never happen”
An assertion is a liveness property if it is true in some configuration
of each execution of the algorithm.
“something good will eventually happen”
16 / 329
Invariants
Assertion P on configurations is an invariant if:
- P(γ) for all γ ∈ I, and
- if γ → δ and P(γ), then P(δ).
Each invariant is a safety property.
Question: Give a transition system S and an assertion P
such that P is a safety property but not an invariant for S.
17 / 329
Causal order
In each configuration of an asynchronous system, applicable events
at different processes are independent.
The causal order ≺ on occurrences of events in an execution is
the smallest transitive relation such that:
- if a and b are events at the same process and a occurs before b, then a ≺ b; and
- if a is a send and b the corresponding receive event, then a ≺ b.
This relation is irreflexive.
a ⪯ b denotes a ≺ b ∨ a = b.
18 / 329
Computations
If neither a ⪯ b nor b ⪯ a, then a and b are called concurrent.
A permutation of concurrent events in an execution doesn’t affect
the result of the execution.
These permutations together form a computation.
All executions of a computation start in the same initial configuration,
and if they are finite, they all end in the same terminal configuration.
19 / 329
Question
Consider the finite execution abc.
Let a ≺ b be the only causal relationship.
Which executions are in the same computation ?
20 / 329
Lamport’s clock
A logical clock C maps occurrences of events in a computation
to a partially ordered set such that a ≺ b ⇒ C (a) < C (b).
Lamport's clock LC assigns to each event a the length k of a longest causality chain a_1 ≺ · · · ≺ a_k = a.
LC can be computed at run-time:
Let a be an event, and k the clock value of the previous event at the same process (k = 0 if there is no such previous event).
∗ If a is an internal or send event, then LC(a) = k + 1.
∗ If a is a receive event, and b the send event corresponding to a, then LC(a) = max{k, LC(b)} + 1.
21 / 329
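As a run-time illustration (not from the slides), here is a minimal Python sketch of Lamport's clock; the class and method names are invented for this sketch.

```python
# Minimal sketch of Lamport's clock; names are illustrative assumptions.

class LamportProcess:
    def __init__(self):
        self.clock = 0  # clock value of the previous event (0 if none)

    def internal_or_send(self):
        # LC(a) = k + 1 for an internal or send event a;
        # a send event piggybacks the returned value on its message
        self.clock += 1
        return self.clock

    def receive(self, sender_clock):
        # LC(a) = max{k, LC(b)} + 1, with b the corresponding send event
        self.clock = max(self.clock, sender_clock) + 1
        return self.clock
```

For instance, a receive at a process with clock value 2, of a message sent at clock value 7, yields clock value 8 (as for r3 in the question below).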
Question
Consider the following sequences of events at processes p0 , p1 , p2 :
p0 : a s1 r3 b
p1 : c r2 s3
p2 : r1 d s2 e
si and ri are corresponding send and receive events, for i = 1, 2, 3.
Provide all events with Lamport’s clock values.
Answer: p0 : 1 2 8 9
p1 : 1 6 7
p2 : 3 4 5 6
22 / 329
Vector clock
Given processes p_0, . . . , p_{N−1}.
We define a partial order on N^N by:
(k_0, . . . , k_{N−1}) ≤ (ℓ_0, . . . , ℓ_{N−1}) ⇔ k_i ≤ ℓ_i for all i = 0, . . . , N−1.
Vector clock VC maps each event in a computation to a unique value in N^N such that a ≺ b ⇔ VC(a) < VC(b).
VC(a) = (k_0, . . . , k_{N−1}) where each k_i is the length of a longest causality chain a^i_1 ≺ · · · ≺ a^i_{k_i} of events at process p_i with a^i_{k_i} ⪯ a.
VC can also be computed at run-time.
23 / 329
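A corresponding run-time sketch for the vector clock (again with invented names, and a fixed N for simplicity):

```python
# Minimal sketch of a vector clock for processes p_0 .. p_{N-1};
# the class name and the fixed N are illustrative assumptions.

N = 3

class VectorProcess:
    def __init__(self, i):
        self.i = i            # index of this process
        self.vc = [0] * N     # vector clock of the previous event

    def internal_or_send(self):
        self.vc[self.i] += 1  # one more event at p_i
        return list(self.vc)  # a send piggybacks a copy of the vector

    def receive(self, sender_vc):
        # componentwise maximum, then count the receive event itself
        self.vc = [max(k, l) for k, l in zip(self.vc, sender_vc)]
        self.vc[self.i] += 1
        return list(self.vc)
```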
Question
Consider the same sequences of events at processes p0 , p1 , p2 :
p0 : a s1 r3 b
p1 : c r2 s3
p2 : r1 d s2 e
Provide all events with vector clock values.
Answer: p0 : (1 0 0) (2 0 0) (3 3 3) (4 3 3)
p1 : (0 1 0) (2 2 3) (2 3 3)
p2 : (2 0 1) (2 0 2) (2 0 3) (2 0 4)
24 / 329
Vector clock - Correctness
Let a ≺ b.
Any causality chain for a is also one for b. So VC(a) ≤ VC(b).
At the process where b occurs, there is a longer causality chain for b than for a. So VC(a) < VC(b).
Let VC(a) < VC(b).
Consider the longest causality chain a^i_1 ≺ · · · ≺ a^i_k = a of events at the process p_i where a occurs.
VC(a) < VC(b) implies that the i-th coefficient of VC(b) is ≥ k, so there is a causality chain of at least k events at p_i whose last event is ⪯ b; its k-th event is a.
So a ⪯ b.
Since a and b are distinct, a ≺ b.
25 / 329
Snapshots
A snapshot of an execution of a distributed algorithm should return
a configuration of an execution in the same computation.
Snapshots can be used for:
- Restarting after a failure.
- Off-line determination of stable properties, which remain true as soon as they have become true.
  Examples: deadlock, garbage.
- Debugging.
Challenge: Take a snapshot without freezing the execution.
26 / 329
Snapshots
We distinguish basic messages of the underlying distributed algorithm
and control messages of the snapshot algorithm.
A snapshot of a (basic) execution consists of:
- a local snapshot of the (basic) state of each process, and
- the channel state of (basic) messages in transit for each channel.
A snapshot is meaningful if it is a configuration of an execution
in the same computation as the actual execution.
27 / 329
Snapshots
We need to avoid the following situations.
1. Process p takes a local snapshot, and then sends a message m
to process q, where:
• q takes a local snapshot after the receipt of m,
• or m is included in the channel state of pq.
2. p sends m to q, and then takes a local snapshot, where:
• q takes a local snapshot before the receipt of m,
• and m is not included in the channel state of pq.
28 / 329
Chandy-Lamport algorithm
Consider a directed network with FIFO channels.
Initiators take a local snapshot of their state, and send a control message ⟨marker⟩ to their neighbors.
When a process that hasn't yet taken a snapshot receives ⟨marker⟩, it
- takes a local snapshot of its state, and
- sends ⟨marker⟩ to all its neighbors.
Process q computes as channel state of pq the messages it receives via pq after taking its local snapshot and before receiving ⟨marker⟩ from p.
If channels are FIFO, this produces a meaningful snapshot.
Message complexity: Θ(E)
Worst-case time complexity: O(D)
29 / 329
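The marker handling at a single process can be sketched as follows; the class, the send primitive, and the message names are assumptions of this sketch, not part of the algorithm's original presentation.

```python
# Sketch of Chandy-Lamport marker handling at one process (FIFO channels).

def send(channel, msg):  # network primitive, stubbed for this sketch
    pass

class SnapshotProcess:
    def __init__(self, state, in_channels, out_channels):
        self.state = state
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.recorded_state = None
        self.channel_state = {}  # incoming channel -> recorded messages
        self.recording = set()   # channels whose state is still being recorded

    def take_snapshot(self):     # called spontaneously at an initiator
        self.recorded_state = self.state
        self.recording = set(self.in_channels)
        for c in self.out_channels:
            send(c, "marker")    # <marker> to all neighbors

    def on_marker(self, channel):
        if self.recorded_state is None:
            self.take_snapshot()  # first marker: take a local snapshot
        self.channel_state.setdefault(channel, [])
        self.recording.discard(channel)  # the state of this channel is complete

    def on_basic(self, channel, msg):
        if channel in self.recording:    # after the snapshot, before <marker>
            self.channel_state.setdefault(channel, []).append(msg)
        # ... followed by the normal processing of msg by the basic algorithm
```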
Chandy-Lamport algorithm - Example
[Diagrams: three processes exchange basic messages m1, m2 and ⟨marker⟩ messages; each process takes a local snapshot at its first ⟨marker⟩, and records per incoming channel the basic messages received before the ⟨marker⟩ on that channel. The computed channel states are ∅, ∅, ∅ and {m2}.]
The snapshot (with channel states ∅, ∅, ∅, {m2}) isn't a configuration in the actual execution.
The send of m1 isn't causally before the send of m2.
So the snapshot is a configuration of an execution that is in the same computation as the actual execution.
30 / 329
Chandy-Lamport algorithm - Correctness
Claim: If a post-snapshot event e is causally before an event f ,
then f is also post-snapshot.
This implies that the snapshot is a configuration of an execution
that is in the same computation as the actual execution.
Proof : The case that e and f occur at the same process is trivial.
Let e be a send and f the corresponding receive event.
Let e occur at p and f at q.
e is post-snapshot at p, so p sent ⟨marker⟩ to q before e.
Channels are FIFO, so q receives this ⟨marker⟩ before f .
Hence f is post-snapshot at q.
31 / 329
Lai-Yang algorithm
Suppose channels are non-FIFO. We use piggybacking.
Initiators take a local snapshot of their state.
When a process has taken its local snapshot, it appends true
to each outgoing basic message.
When a process that hasn’t yet taken a snapshot receives a message
with true or a control message (see next slide) for the first time,
it takes a local snapshot of its state before reception of this message.
Process q computes as channel state of pq the basic messages
without the tag true that it receives via pq after its local snapshot.
32 / 329
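A sketch of the piggybacking at one process; the control messages of the next slide are omitted here, and all names and the send primitive are assumptions of the sketch.

```python
# Sketch of Lai-Yang tagging at one process (non-FIFO channels).

def send(channel, msg):  # network primitive, stubbed for this sketch
    pass

class LaiYangProcess:
    def __init__(self, state, in_channels):
        self.state = state
        self.taken = False  # has the local snapshot been taken?
        self.channel_state = {c: [] for c in in_channels}

    def send_basic(self, channel, msg):
        send(channel, (msg, self.taken))  # append true after the snapshot

    def take_snapshot(self):
        self.recorded_state = self.state
        self.taken = True

    def on_basic(self, channel, msg, tag):
        if tag and not self.taken:
            self.take_snapshot()          # snapshot before this reception
        if self.taken and not tag:
            self.channel_state[channel].append(msg)  # msg was in transit
        # ... followed by the normal processing of msg by the basic algorithm
```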
Lai-Yang algorithm - Control messages
Question: How does q know when it can determine the channel state
of pq ?
p sends a control message to q, informing q how many basic messages
without the tag true p sent into pq.
These control messages also ensure that all processes eventually take
a local snapshot.
33 / 329
Lai-Yang algorithm - Multiple snapshots
Question: How can multiple subsequent snapshots be supported ?
Answer: Each snapshot is provided with a sequence number.
Basic messages carry the sequence number of the last snapshot at the sender (instead of true or false).
Control messages carry the sequence number of their snapshot.
34 / 329
What we need from last lecture
fully asynchronous message passing framework
channels are non-FIFO, and can be directed or undirected
configurations and transitions at the global level
states and events (internal/send/receive) at the local level
(non)initiator
(de)centralized algorithm
causal order ≺ on events in an execution
computation of executions, by reordering concurrent events
snapshot algorithm to compute a configuration of a computation
basic/control algorithm
35 / 329
Wave algorithms
A decide event is a special internal event.
In a wave algorithm, each computation (also called wave)
satisfies the following properties:
- termination: it is finite;
- decision: it contains one or more decide events; and
- dependence: for each decide event e and process p, f ≺ e for an event f at p.
36 / 329
Wave algorithms - Example
In the ring algorithm, the initiator sends a token, which is passed on
by all other processes.
The initiator decides after the token has returned.
Question: For each process, which event is causally before
the decide event ?
The ring algorithm is an example of a traversal algorithm.
37 / 329
Traversal algorithms
A traversal algorithm is a centralized wave algorithm;
i.e., there is one initiator, which sends around a token.
- In each computation, the token first visits all processes.
- Finally, the token returns to the initiator, which performs a decide event.
Traversal algorithms build a spanning tree:
- the initiator is the root; and
- each noninitiator has as parent the neighbor from which it received the token first.
38 / 329
Tarry’s algorithm (from 1895)
Consider an undirected network.
R1 A process never forwards the token through the same channel
twice.
R2 A process only forwards the token to its parent when there is
no other option.
The token travels through each channel both ways, and finally
ends up at the initiator.
Message complexity: 2E messages
Time complexity: ≤ 2E time units
[Photo: Gaston Tarry]
39 / 329
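Rules R1 and R2 at a single process can be sketched as follows; the data structures and the send primitive are assumptions of the sketch, and the non-parent channel is picked arbitrarily, as the algorithm allows.

```python
# Sketch of one process's token handling in Tarry's algorithm.

def send(channel, msg):  # network primitive, stubbed for this sketch
    pass

class TarryProcess:
    def __init__(self, channels, is_initiator):
        self.unused = set(channels)  # R1: never use a channel twice
        self.parent = None
        self.is_initiator = is_initiator

    def on_token(self, from_channel=None):
        if self.parent is None and not self.is_initiator:
            self.parent = from_channel  # first receipt fixes the parent
        options = self.unused - {self.parent}
        if options:
            c = options.pop()           # any unused non-parent channel
        elif self.parent in self.unused:
            c = self.parent             # R2: the parent only as last resort
        else:
            return                      # initiator: traversal complete, decide
        self.unused.discard(c)
        send(c, "token")
```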
Tarry’s algorithm - Example
p is the initiator.
[Diagram: the token's walk through the network from p, with arrows numbered 1–12 giving the order in which channels are traversed.]
The network is undirected and unweighted.
Arrows and numbers mark the path of the token.
Solid arrows establish a parent-child relation (in the opposite direction).
40 / 329
Tarry’s algorithm - Spanning tree
The parent-child relation is the reversal of the solid arrows.
p q r
s t
Tree edges, which are part of the spanning tree, are solid.
Frond edges, which aren’t part of the spanning tree, are dashed.
41 / 329
Tarry’s algorithm - Correctness
Claim: The token θ travels through each channel in either direction,
and ends up at the initiator.
Proof : A noninitiator holding θ has received θ once more than it has sent θ.
So by R1, this process can send θ into a channel.
Hence θ ends at the initiator, after traversing all its channels both ways.
Assume some channel isn’t traversed by θ both ways.
Let noninitiator q be the earliest visited process with such a channel.
q sends θ to its parent p. Namely, since θ visits p before q,
it traverses the channel pq both ways.
So by R2, q sends θ into all its channels.
Since q sends and receives θ an equal number of times, it also
receives θ through all its channels.
So θ travels through all channels of q both ways; contradiction.
42 / 329
Question
p q r
s t
Could this spanning tree have been produced by a depth-first search
starting at p ?
43 / 329
Depth-first search
Depth-first search is obtained by adding to Tarry’s algorithm:
R3 When a process receives the token, it immediately sends it
back through the same channel if this is allowed by R1,2.
Example: [Diagram: a depth-first traversal of the same network from p, with arrows numbered 1–12.]
In the spanning tree of a depth-first search, all frond edges connect
an ancestor with one of its descendants in the spanning tree.
44 / 329
Depth-first search with neighbor knowledge
To prevent transmission of the token through a frond edge,
visited processes are included in the token.
The token isn’t forwarded to processes in this list
(except when a process sends the token back to its parent).
Message complexity: 2N − 2 messages
Each tree edge carries 2 tokens.
Time complexity: ≤ 2N − 2 time units
Bit complexity: Up to kN bits per message
(where k bits are needed to represent one process).
45 / 329
Awerbuch’s algorithm
A process holding the token for the first time informs all neighbors
except its parent and the process to which it forwards the token.
The token is only forwarded when these neighbors have all
acknowledged reception.
The token is only forwarded to processes that weren’t yet visited
by the token (except when a process sends the token to its parent).
46 / 329
Awerbuch’s algorithm - Complexity
Message complexity: ≤ 4E messages
Each frond edge carries 2 info and 2 ack messages.
Each tree edge carries 2 tokens, and possibly 1 info/ack pair.
Time complexity: ≤ 4N − 2 time units
Each tree edge carries 2 tokens.
Each process waits at most 2 time units for ack’s to return.
47 / 329
Cidon’s algorithm
Abolish ack’s from Awerbuch’s algorithm.
The token is forwarded without delay.
Each process p records to which process fw_p it forwarded the token last.
Suppose process p receives the token from a process q ≠ fw_p.
Then p marks pq as a frond edge and dismisses the token.
Suppose process q receives an info message from fw_q.
Then q marks the channel to fw_q as a frond edge and continues forwarding the token.
48 / 329
Cidon’s algorithm - Complexity
Message complexity: ≤ 4E messages
Each channel carries at most 2 info messages and 2 tokens.
Time complexity: ≤ 2N − 2 time units
Each tree edge carries 2 tokens.
At least once per time unit, a token is forwarded through a tree edge.
49 / 329
Cidon’s algorithm - Example
[Diagram: a run on a network with nodes p, q, r, s, t; arrows numbered 1–9 show how tokens are forwarded without delay, some of which are dismissed at frond edges.]
50 / 329
Tree algorithm
The tree algorithm is a decentralized wave algorithm
for undirected, acyclic networks.
The local algorithm at a process p:
- p waits until it has received messages from all neighbors except one, which becomes its parent. Then it sends a message to its parent.
- If p receives a message from its parent, it decides. It sends the decision to all neighbors except its parent.
- If p receives a decision from its parent, it passes it on to all other neighbors.
Always two (neighboring) processes decide.
51 / 329
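A sketch of this local algorithm; the names and the send primitive are assumptions, and a driver would call start once at every process so that the leaves can send right away.

```python
# Sketch of the tree algorithm at one process of an undirected, acyclic network.

def send(channel, msg):  # network primitive, stubbed for this sketch
    pass

class TreeProcess:
    def __init__(self, channels):
        self.channels = set(channels)  # channels to all neighbors
        self.received = set()          # channels a wave message arrived on
        self.parent = None

    def start(self):
        # wait until messages arrived from all neighbors except one
        silent = self.channels - self.received
        if len(silent) == 1 and self.parent is None:
            self.parent = silent.pop()
            send(self.parent, "wave")

    def on_wave(self, channel):
        self.received.add(channel)
        if channel == self.parent:     # message from the parent: decide
            print("decide")
            for c in self.channels - {self.parent}:
                send(c, "decision")
        else:
            self.start()               # maybe only one silent neighbor is left

    def on_decision(self, channel):    # pass the decision on to all others
        for c in self.channels - {channel}:
            send(c, "decision")
```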
Tree algorithm - Example
[Diagram: messages travel from the leaves inward; the two neighboring processes in the middle both decide, and the decision is passed back outward.]
52 / 329
Questions
What happens if the tree algorithm is applied to a network
containing a cycle ?
Apply the tree algorithm to compute the size of an undirected,
acyclic network.
53 / 329
Tree algorithm - Correctness
Claim: If the tree algorithm is run on an acyclic network with N > 1,
then exactly two processes decide.
Proof : Suppose some process p never sends a message.
p doesn’t receive a message through two of its channels, qp and rp.
q doesn’t receive a message through two of its channels, pq and sq.
Continuing this argument, we get a cycle of processes that don’t
receive a message through two of their channels.
Since the network topology is a tree, there is no cycle; contradiction.
So each process eventually sends a message.
Clearly each channel carries at least one message.
There are N − 1 channels, so one channel carries two messages.
Only the two processes connected by this channel decide.
54 / 329
Echo algorithm
The echo algorithm is a centralized wave algorithm for undirected networks.
- The initiator sends a message to all neighbors.
- When a noninitiator receives a message for the first time, it makes the sender its parent. Then it sends a message to all neighbors except its parent.
- When a noninitiator has received a message from all neighbors, it sends a message to its parent.
- When the initiator has received a message from all neighbors, it decides.
Message complexity: 2E messages
55 / 329
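A sketch of the echo algorithm at one process; the names and the send primitive are assumptions of the sketch.

```python
# Sketch of the echo algorithm at one process of an undirected network.

def send(channel, msg):  # network primitive, stubbed for this sketch
    pass

class EchoProcess:
    def __init__(self, channels, is_initiator):
        self.channels = set(channels)
        self.is_initiator = is_initiator
        self.parent = None
        self.received = 0

    def start(self):                   # called at the initiator only
        for c in self.channels:
            send(c, "wave")

    def on_wave(self, channel):
        self.received += 1
        if not self.is_initiator and self.parent is None:
            self.parent = channel      # first message fixes the parent
            for c in self.channels - {channel}:
                send(c, "wave")
        if self.received == len(self.channels):
            if self.is_initiator:
                print("decide")        # echoes returned from all neighbors
            else:
                send(self.parent, "wave")
```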
Echo algorithm - Example
[Diagram: the wave floods outward from the initiator and is echoed back; when all echoes have reached the initiator, it decides.]
56 / 329
Questions
Use the echo algorithm to determine the largest process id.
Let each process initiate a run of the echo algorithm, tagged by its id.
Processes only participate in the “largest” wave they have seen so far.
Which of these concurrent waves complete ?
57 / 329
Communication and resource deadlock
A deadlock occurs if there is a cycle of processes waiting until:
- another process on the cycle sends some input (communication deadlock)
- or resources held by other processes on the cycle are released (resource deadlock)
Both types of deadlock are captured by the N-out-of-M model:
A process can wait for N grants out of M requests.
Examples:
- A process is waiting for one message from a group of processes: N = 1.
- A database transaction first needs to lock several files: N = M.
58 / 329
Wait-for graph
A (non-blocked) process can issue a request to M other processes,
and becomes blocked until N of these requests have been granted.
Then it informs the remaining M − N processes that the request
can be dismissed.
Only non-blocked processes can grant a request.
A (directed) wait-for graph captures dependencies between processes.
There is an edge from node p to node q if p sent a request to q
that wasn’t yet dismissed by p or granted by q.
59 / 329
Wait-for graph - Example 1
Suppose process p must wait for a message from process q.
In the wait-for graph, node p sends a request to node q.
Then edge pq is created in the wait-for graph, and p becomes
blocked.
When q sends a message to p, the request of p is granted.
Then edge pq is removed from the wait-for graph, and p becomes
unblocked.
60 / 329
Wait-for graph - Example 2
Suppose two processes p and q want to claim a resource.
In the wait-for graph, nodes u, v representing p, q send a request to
node w representing the resource. Edges uw and vw are created.
Since the resource is free, the resource is given to say p.
So w sends a grant to u. Edge uw is removed.
The basic (mutual exclusion) algorithm requires that the resource
must be released by p before q can claim it.
So w sends a request to u, creating edge wu in the wait-for graph.
After p releases the resource, u grants the request of w .
Edge wu is removed.
The resource is given to q. Hence w grants the request from v .
Edge vw is removed and edge wv is created.
61 / 329
Drawing wait-for graphs
[Diagrams: an AND (3-out-of-3) request and an OR (1-out-of-3) request in a wait-for graph.]
62 / 329
Questions
Draw the wait-for graph for the initial configuration of the tree algorithm,
applied to the following network.
63 / 329
Static analysis on a wait-for graph
A snapshot is taken of the wait-for graph.
A static analysis on the wait-for graph may reveal deadlocks:
- Non-blocked nodes can grant requests.
- When a request is granted, the corresponding edge is removed.
- When an N-out-of-M request has received N grants, the requester becomes unblocked. (The remaining M − N outgoing edges are dismissed.)
When no more grants are possible, nodes that remain blocked in the
wait-for graph are deadlocked in the snapshot of the basic algorithm.
64 / 329
Static analysis - Example 1
[Diagram: a wait-for graph with two blocked nodes b.]
Is there a deadlock ?
Answer: Deadlock
65 / 329
Static analysis - Example 2
[Diagram: a wait-for graph with two blocked nodes b; granting requests one by one eventually unblocks every node.]
No deadlock
66 / 329
Bracha-Toueg deadlock detection algorithm - Snapshot
Given an undirected network, and a basic algorithm.
A process that suspects it is deadlocked, initiates
a (Lai-Yang) snapshot to compute the wait-for graph.
Each node u takes a local snapshot of:
- requests it sent or received that weren't yet granted or dismissed;
- grant and dismiss messages in edges.
Then it computes:
Out_u : the nodes it sent a request to (not granted)
In_u : the nodes it received a request from (not dismissed)
67 / 329
Bracha-Toueg deadlock detection algorithm
requests_u is the number of grants u requires to become unblocked.
When u receives a grant message, requests_u ← requests_u − 1.
If requests_u becomes 0, u sends grant messages to all nodes in In_u.
If after termination of the deadlock detection run requests_u > 0 at the initiator u, then u is deadlocked (in the basic algorithm).
Challenge: The initiator must detect termination of deadlock detection.
68 / 329
Bracha-Toueg deadlock detection algorithm
Initially notified_u = false and free_u = false at all nodes u.
The initiator starts a deadlock detection run by executing Notify.
Notify_u : notified_u ← true
  for all w ∈ Out_u send NOTIFY to w
  if requests_u = 0 then Grant_u
  for all w ∈ Out_u await DONE from w
Grant_u : free_u ← true
  for all w ∈ In_u send GRANT to w
  for all w ∈ In_u await ACK from w
While a node is awaiting DONE or ACK messages,
it can process incoming NOTIFY and GRANT messages.
69 / 329
Bracha-Toueg deadlock detection algorithm
Let u receive NOTIFY.
If notified_u = false, then u executes Notify_u.
u sends back DONE.
Let u receive GRANT.
If requests_u > 0, then requests_u ← requests_u − 1;
if requests_u becomes 0, then u executes Grant_u.
u sends back ACK.
When the initiator has received DONE from all nodes in its Out set, it checks the value of its free field.
If it is still false, the initiator concludes it is deadlocked.
70 / 329
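A sketch of this logic at one node; the awaits of DONE and ACK messages (which hold back the node's own DONE/ACK) are omitted, and all names and the send primitive are assumptions of the sketch.

```python
# Sketch of the Bracha-Toueg Notify/Grant logic at one node.

def send(node, msg):  # network primitive, stubbed for this sketch
    pass

class BTNode:
    def __init__(self, out_nodes, in_nodes, requests):
        self.out_nodes = out_nodes  # Out_u: requests sent, not granted
        self.in_nodes = in_nodes    # In_u: requests received, not dismissed
        self.requests = requests    # grants still needed to become unblocked
        self.notified = False
        self.free = False

    def notify(self):               # Notify_u
        self.notified = True
        for w in self.out_nodes:
            send(w, "NOTIFY")       # each is answered later by DONE
        if self.requests == 0:
            self.grant()

    def grant(self):                # Grant_u
        self.free = True
        for w in self.in_nodes:
            send(w, "GRANT")        # each is answered later by ACK

    def on_notify(self, sender):
        if not self.notified:
            self.notify()
        send(sender, "DONE")

    def on_grant(self, sender):
        if self.requests > 0:
            self.requests -= 1
            if self.requests == 0:
                self.grant()
        send(sender, "ACK")
```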
Bracha-Toueg deadlock detection algorithm - Example
[Diagrams: a run on a wait-for graph with nodes u, v, w, x, where u is the initiator and initially requests_u = 2, requests_v = 1, requests_w = 1, requests_x = 0. NOTIFY messages from u build a tree and are answered by DONE messages; since requests_x = 0, x immediately sends GRANT messages, upon which w, v and finally u see their requests counter drop to 0 and grant in turn; ACK messages complete the grant trees.]
free_u = true, so u concludes that it isn't deadlocked.
71 / 329
Bracha-Toueg deadlock detection algorithm - Correctness
The Bracha-Toueg algorithm is deadlock-free:
The initiator eventually receives DONE’s from all nodes in its Out set.
At that moment the Bracha-Toueg algorithm has terminated.
Two types of trees are constructed, similar to the echo algorithm:
1. NOTIFY/DONE’s construct a tree T rooted in the initiator.
2. GRANT/ACK's construct disjoint trees T_v, rooted in a node v where from the start requests_v = 0.
The NOTIFY/DONE’s only complete when all GRANT/ACK’s
have completed.
72 / 329
Bracha-Toueg deadlock detection algorithm - Correctness
In a deadlock detection run, requests are granted as much as possible.
Therefore, if the initiator has received DONE’s from all nodes
in its Out set and its free field is still false, it is deadlocked.
Vice versa, if its free field is true, there is no deadlock (yet), provided that resource requests are granted nondeterministically.
73 / 329
Question
Could we apply the Bracha-Toueg algorithm to itself, to establish
that it is a deadlock-free algorithm ?
Answer: No.
The Bracha-Toueg algorithm can only establish whether a deadlock
is present in a snapshot of one computation of the basic algorithm.
74 / 329
Lecture in a nutshell
wave algorithm
traversal algorithm
- ring algorithm
- Tarry's algorithm
- depth-first search
tree algorithm
echo algorithm
communication and resource deadlock
wait-for graph
Bracha-Toueg deadlock detection algorithm
75 / 329
Termination detection
The basic algorithm is terminated if (1) each process is passive, and
(2) no basic messages are in transit.
[Diagram: an active process can send and receive; an internal event can make it passive; a receive makes a passive process active again.]
The control algorithm concerns termination detection and announcement.
Announcement is simple; we focus on detection.
Termination detection shouldn’t influence basic computations.
76 / 329
Dijkstra-Scholten algorithm
Requires a centralized basic algorithm, and an undirected network.
A tree T is maintained, which has the initiator p0 as the root, and
includes all active processes. Initially, T consists of p0 .
cc_p estimates (from above) the number of children of process p in T .
- When p sends a basic message, cc_p ← cc_p + 1.
- Let this message be received by q.
  - If q isn't yet in T , it joins T with parent p and cc_q ← 0.
  - If q is already in T , it sends a control message to p that it isn't a new child of p. Upon receipt of this message, cc_p ← cc_p − 1.
- When a noninitiator p is passive and cc_p = 0, it quits T and informs its parent that it is no longer a child.
- When the initiator p0 is passive and cc_p0 = 0, it calls Announce.
77 / 329
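A sketch of the bookkeeping at one process; the send primitive and all names are assumptions of the sketch.

```python
# Sketch of Dijkstra-Scholten child counting at one process.

def send(proc, msg):  # network primitive, stubbed for this sketch
    pass

class DSProcess:
    def __init__(self, is_initiator):
        self.in_tree = is_initiator    # initially T consists of p0 only
        self.parent = None
        self.cc = 0                    # estimate of the number of children
        self.passive = not is_initiator

    def send_basic(self, q, msg):
        self.cc += 1                   # optimistically count q as a new child
        send(q, ("basic", msg))

    def on_basic(self, msg, p):
        self.passive = False
        if not self.in_tree:
            self.in_tree, self.parent, self.cc = True, p, 0
        else:
            send(p, "not-a-child")     # p decrements cc_p on receipt

    def on_not_a_child(self):
        self.cc -= 1
        self.try_quit()

    def try_quit(self):                # also called on becoming passive
        if self.passive and self.cc == 0 and self.in_tree:
            if self.parent is None:    # the initiator detects termination
                print("Announce")
            else:
                send(self.parent, "not-a-child")
                self.in_tree = False
```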
Question
Let the initiator send a basic message and then become passive.
Why doesn’t it immediately detect termination ?
78 / 329
Shavit-Francez algorithm
Allows a decentralized basic algorithm; requires an undirected network.
A forest F of (disjoint) trees is maintained, rooted in initiators.
Initially, each initiator of the basic algorithm constitutes a tree in F .
- When a process p sends a basic message, cc_p ← cc_p + 1.
- Let this message be received by q.
  - If q isn't yet in a tree in F , it joins F with parent p and cc_q ← 0.
  - If q is already in a tree in F , it sends a control message to p that it isn't a new child of p. Upon receipt, cc_p ← cc_p − 1.
- When a noninitiator p is passive and cc_p = 0, it informs its parent that it is no longer a child.
A passive initiator p with cc_p = 0 starts a wave, tagged with its id.
Processes in a tree refuse to participate; a decide event calls Announce.
79 / 329
Rana’s algorithm
Allows a decentralized basic algorithm; requires an undirected network.
Each basic message is acknowledged.
A logical clock provides (basic and control) events with a time stamp.
The time stamp of a process is the highest time stamp of its events
so far (initially it is 0).
If at time t a process becomes quiet, i.e. (1) it has become passive,
and (2) all basic messages it sent have been acknowledged,
it starts a wave (of control messages), tagged with t (and its id).
Only processes that have been quiet from a time ≤ t on take part in
the wave.
If a wave completes, its initiator calls Announce.
80 / 329
Rana’s algorithm - Correctness
Suppose a wave, tagged with some t, doesn’t complete.
Then some process p doesn’t take part in this wave.
Due to this wave, p’s logical time becomes greater than t.
When p becomes quiet, it starts a new wave, tagged with some t′ > t.
81 / 329
Rana’s algorithm - Correctness
Suppose a quiet process q takes part in a wave,
and is later on made active by a basic message from a process p
that wasn’t yet visited by this wave.
Then this wave won’t complete.
Namely, let the wave be tagged with t.
When q takes part in the wave, its logical clock becomes > t.
By the ack from q to p, in response to the basic message from p,
the logical clock of p becomes > t.
So p won’t take part in the wave (because it is tagged with t).
82 / 329
Question
What is a drawback of the Dijkstra-Scholten as well as Rana’s algorithm ?
Answer: Requires one control message for every basic message.
83 / 329
Weight-throwing termination detection
Requires a centralized basic algorithm; allows a directed network.
The initiator has weight 1, all noninitiators have weight 0.
When a process sends a basic message, it transfers part of
its weight to this message.
When a process receives a basic message, it adds the weight of
this message to its own weight.
When a noninitiator becomes passive, it returns its weight to
the initiator.
When the initiator becomes passive, and has regained weight 1,
it calls Announce.
84 / 329
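A sketch with weights as plain floats (a real implementation must handle underflow, as discussed on the next slide, and would use exact fractions rather than floating point); all names and the send primitive are assumptions of the sketch.

```python
# Sketch of weight-throwing termination detection.

def send(proc, msg):  # network primitive, stubbed for this sketch
    pass

class WTProcess:
    def __init__(self, is_initiator, initiator=None):
        self.weight = 1.0 if is_initiator else 0.0
        self.is_initiator = is_initiator
        self.initiator = initiator   # where noninitiators return their weight
        self.passive = not is_initiator

    def send_basic(self, q, msg):
        self.weight /= 2             # transfer half of the weight
        send(q, ("basic", msg, self.weight))

    def on_basic(self, msg, w):
        self.passive = False         # receipt of a basic message activates
        self.weight += w

    def become_passive(self):
        self.passive = True
        if not self.is_initiator:
            send(self.initiator, ("return", self.weight))
            self.weight = 0.0
        self.check()

    def on_returned_weight(self, w):  # at the initiator
        self.weight += w
        self.check()

    def check(self):                  # a passive initiator with full weight
        if self.is_initiator and self.passive and self.weight == 1.0:
            print("Announce")
```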
Weight-throwing termination detection - Underflow
Underflow: The weight of a process can become too small
to be divided further.
Solution 1: The process gives itself extra weight, and informs
the initiator that there is additional weight in the system.
An ack from the initiator is needed before the extra weight
can be used, to avoid race conditions.
Solution 2: The process initiates a weight-throwing termination
detection sub-call, and only returns its weight to the initiator
when it has become passive and this sub-call has terminated.
85 / 329
Question
Why is the following termination detection algorithm not correct ?
- Each basic message is acknowledged.
- If a process becomes quiet, i.e. (1) it has become passive, and (2) all basic messages it sent have been acknowledged, then it starts a wave (tagged with its id).
- Only quiet processes take part in the wave.
- If the wave completes, its initiator calls Announce.
Answer: Let a process p that wasn’t yet visited by the wave
make a quiet process q that was already visited active again.
Next p becomes quiet before the wave arrives.
Now the wave can complete while q is active.
86 / 329
Token-based termination detection
The following centralized termination detection algorithm allows
a decentralized basic algorithm and a directed network.
A process p0 is initiator of a traversal algorithm to check whether
all processes are passive.
Complication 1: Due to the directed channels, reception of
basic messages can’t be acknowledged.
Complication 2: A traversal of only passive processes doesn’t guarantee
termination (even if there are no basic messages in the channels).
87 / 329
Complication 2 - Example
[Diagram: a directed network on p0, q, r, s.]
The token is at p0 ; only s is active.
The token travels to r .
s sends a basic message to q, making q active.
s becomes passive.
The token travels on to p0 , which falsely calls Announce.
88 / 329
Safra’s algorithm
Allows a decentralized basic algorithm and a directed network.
Each process maintains a counter of type Z; initially it is 0.
At each outgoing/incoming basic message, the counter is
increased/decreased.
At any time, the sum of all counters in the network is ≥ 0,
and it is 0 if and only if no basic messages are in transit.
At each round trip, the token carries the sum of the counters
of the processes it has traversed.
Complication: The token may end a round trip with a negative sum,
when a visited passive process becomes active by a basic message,
and sends basic messages that are received by an unvisited process.
89 / 329
Safra’s algorithm
Processes are colored white or black. Initially they are white,
and a process that receives a basic message becomes black.
- When p0 is passive, it sends a white token with counter 0.
- A noninitiator only forwards the token when it is passive.
- When a black process receives the token, the process becomes white and the token black. The token will stay black for the rest of the round trip.
- Eventually the token returns to p0, which waits until it is passive:
  - If the token is white and the sum of all counters is zero, p0 calls Announce.
  - Else, p0 sends a white token again.
90 / 329
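A sketch for a token making round trips on a directed ring; the ring form of the traversal, the forward primitive and all names are assumptions of the sketch.

```python
# Sketch of Safra's algorithm at one process on a directed ring.

def forward(token):  # passes the token to the next process; stubbed
    pass

class SafraProcess:
    def __init__(self, is_initiator):
        self.counter = 0    # basic messages sent minus received
        self.black = False
        self.is_initiator = is_initiator

    def on_basic_send(self):
        self.counter += 1

    def on_basic_receive(self):
        self.counter -= 1
        self.black = True   # a receiver of a basic message becomes black

    def handle_token(self, color, total):
        # a process only handles (and forwards) the token when it is passive
        if self.is_initiator:
            if color == "white" and not self.black and total + self.counter == 0:
                print("Announce")
            else:
                self.black = False
                forward(("white", 0))  # start a new round trip
        else:
            if self.black:
                color = "black"        # the token stays black from here on
                self.black = False     # ... and the process becomes white
            forward((color, total + self.counter))
```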
Safra’s algorithm - Example
The token is at p0 ; only s is active; no messages are in transit;
all processes are white with counter 0.
[Diagram: the same directed network on p0, q, r, s.]
s sends a basic message m to q, setting the counter of s to 1.
s becomes passive.
The token travels around the network, white with sum 1.
The token travels on to r , white with sum 0.
m travels to q and back to s, making them active, black, with counter 0.
s becomes passive.
The token travels from r to p0 , black with sum 0.
q becomes passive.
After two more round trips of the token, p0 calls Announce.
91 / 329
Safra’s algorithm - Correctness
When the system has terminated,
- the token will color all processes white, and
- the counters of the processes sum up to zero.
So the token eventually returns to the initiator white with counter 0.
Suppose a token returns to the initiator white with counter 0.
Since the token is white: if reception of a message is included in
the counter, then sending this message is included in the counter too.
So, since the counter is 0:
- no process was made active after the token's visit, and
- no messages are in transit.
92 / 329
Question
Any suggestions for an optimization of Safra’s algorithm ?
(Hint: Can we do away with black tokens ?)
Answer: When a black process gets the token, it dismisses the token
(and becomes white).
When the process becomes passive, it sends a fresh token,
tagged with its id.
93 / 329
Garbage collection
Processes are provided with memory.
Objects carry pointers to local objects and references to remote objects.
A root object can be created in memory; objects are always accessed
by navigating from a root object.
Aim of garbage collection: To reclaim inaccessible objects.
Three operations by processes to build or delete a reference:
- Creation: The object owner sends a pointer to another process.
- Duplication: A process that isn't the object owner sends a reference to another process.
- Deletion: The reference is deleted at its process.
94 / 329
Reference counting
Reference counting tracks the number of references to an object.
If it drops to zero, and there are no pointers, the object is garbage.
Advantage: Can be performed at run-time.
Drawback: Can’t reclaim cyclic garbage.
95 / 329
Indirect reference counting
A tree is maintained for each object, with the object at the root,
and the references to this object as the other nodes in the tree.
Each object maintains a counter how many references to it
have been created.
Each reference is supplied with a counter how many times
it has been duplicated.
References keep track of their parent in the tree,
where they were duplicated or created from.
96 / 329
Indirect reference counting
If a process receives a reference, but already holds a reference to
or owns this object, it sends back a decrement.
When a duplicated (or created) reference has been deleted,
and its counter is zero, a decrement is sent
to the process it was duplicated from (or to the object owner).
When the counter of the object becomes zero,
and there are no pointers to it, the object can be reclaimed.
97 / 329
Weighted reference counting
Each object carries a total weight (equal to the weights of
all references to the object), and a partial weight.
When a reference is created, the partial weight of the object
is divided over the object and the reference.
When a reference is duplicated, the weight of the reference
is divided over itself and the copy.
When a reference is deleted, the object owner is notified,
and the weight of the deleted reference is subtracted from
the total weight of the object.
If the total weight of the object becomes equal to its partial weight,
and there are no pointers to the object, it can be reclaimed.
98 / 329
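A sketch with weights as plain floats; the classes, and doing the owner notification as a direct call rather than a message, are assumptions of the sketch.

```python
# Sketch of weighted reference counting.

class Obj:
    def __init__(self):
        self.total = 1.0    # total weight (all references plus partial weight)
        self.partial = 1.0  # weight kept by the object itself

    def create_reference(self):
        self.partial /= 2   # split the partial weight with the new reference
        return Ref(self, self.partial)

    def on_delete(self, weight):
        self.total -= weight
        if self.total == self.partial:
            print("no references left: reclaim if there are no pointers")

class Ref:
    def __init__(self, obj, weight):
        self.obj, self.weight = obj, weight

    def duplicate(self):
        self.weight /= 2    # split this reference's weight with the copy
        return Ref(self.obj, self.weight)

    def delete(self):
        self.obj.on_delete(self.weight)  # in reality a message to the owner
```

For instance, after r = Obj().create_reference() and r2 = r.duplicate(), deleting both r2 and r returns the total weight to the partial weight, so the object can be reclaimed.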
Weighted reference counting - Underflow
When the weight of a reference (or object) becomes too small
to be divided further, no more duplication (or creation) is possible.
Solution 1: The reference increases its weight,
and tells the object owner to increase its total weight.
An ack from the object owner to the reference is needed before
the additional weight can be used, to avoid race conditions.
Solution 2: The process at which the underflow occurs
creates an artificial object with a new total weight,
and with a reference to the original object.
Duplicated references are then to the artificial object,
so that references to the original object become indirect.
99 / 329
Question
Why is it much more important to address underflow of weight
than overflow of a reference counter ?
Answer: At each reference creation and duplication, weight decreases
exponentially fast, while the reference counter increases linearly.
100 / 329
Garbage collection ⇒ termination detection
Garbage collection algorithms can be transformed into
(existing and new) termination detection algorithms.
Given a basic algorithm.
Let each process p host one artificial root object Op .
There is also a special non-root object Z .
Initially, only initiators p hold a reference from Op to Z .
Each basic message carries a duplication of the Z -reference.
When a process becomes passive, it deletes its Z -reference.
The basic algorithm is terminated if and only if Z is garbage.
101 / 329
Garbage collection ⇒ termination detection - Examples
Indirect reference counting ⇒ Dijkstra-Scholten termination detection.
Weighted reference counting ⇒ weight-throwing termination detection.
102 / 329
Mark-scan
Mark-scan garbage collection consists of two phases:
- A traversal of all accessible objects, which are marked.
- All unmarked objects are reclaimed.
Drawback: In a distributed setting, mark-scan usually requires
freezing the basic computation.
In mark-copy, the second phase consists of copying all marked objects
to contiguous empty memory space.
In mark-compact, the second phase compacts all marked objects
without requiring empty space.
Copying is significantly faster than compaction, but leads to
fragmentation of the memory space (and uses more memory).
103 / 329
Generational garbage collection
In practice, most objects either can be reclaimed shortly after
their creation, or stay accessible for a very long time.
Garbage collection in Java, which is based on mark-scan,
therefore divides objects into multiple generations.
- Garbage in the youngest generation is collected frequently using mark-copy.
- Garbage in the older generations is collected sporadically using mark-compact.
104 / 329
This lecture in a nutshell
termination detection
- Dijkstra-Scholten algorithm
- Shavit-Francez algorithm
- Rana's algorithm
- weight throwing
- Safra's algorithm
garbage collection ⇒ termination detection
- indirect reference counting
- weighted reference counting
- mark-scan
- generational garbage collection
105 / 329
Routing
Routing means guiding a packet in a network to its destination.
A routing table at node u stores for each v ≠ u a neighbor w of u:
Each packet with destination v that arrives at u is passed on to w.
Criteria for good routing algorithms:
- use of optimal paths
- robust with respect to topology changes in the network
- cope with very large, dynamic networks
- table adaptation to avoid busy edges
106 / 329
Chandy-Misra algorithm
Consider an undirected, weighted network, with weights ω_vw > 0.
A centralized algorithm to compute all shortest paths to initiator u0.
Initially, dist_u0(u0) = 0, dist_v(u0) = ∞ if v ≠ u0, and parent_v(u0) = ⊥.
u0 sends the message ⟨0⟩ to its neighbors.
Let node v receive ⟨d⟩ from neighbor w. If d + ω_vw < dist_v(u0), then:
- dist_v(u0) ← d + ω_vw and parent_v(u0) ← w
- v sends ⟨dist_v(u0)⟩ to its neighbors (except w)
Termination detection by e.g. the Dijkstra-Scholten algorithm.
107 / 329
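A sketch of the relaxation at one node; the graph representation and the send primitive are assumptions of the sketch.

```python
# Sketch of the Chandy-Misra relaxation at one node.

INF = float("inf")

def send(node, msg):  # network primitive, stubbed for this sketch
    pass

class CMNode:
    def __init__(self, weights, is_root):
        self.weights = weights         # neighbor -> edge weight, all > 0
        self.dist = 0 if is_root else INF
        self.parent = None
        if is_root:
            for w in self.weights:
                send(w, 0)             # the initiator u0 sends <0>

    def on_distance(self, d, w):       # <d> received from neighbor w
        if d + self.weights[w] < self.dist:
            self.dist = d + self.weights[w]
            self.parent = w
            for u in self.weights:
                if u != w:
                    send(u, self.dist) # propagate the improvement
```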
Question
Why is Rana’s algorithm not a good choice for detecting termination ?
Answer: Nodes tend to become quiet, and start a wave, often.
108 / 329
Chandy-Misra algorithm - Example
[Diagram: a weighted network on u0, v, w, x, annotated with the successive dist and parent updates at each node; intermediate estimates such as dist_w ← 6 (parent u0) and dist_v ← 4 (parent u0) are later improved, and the final values are dist_x = 1 (parent u0), dist_w = 2 (parent x) and dist_v = 2 (parent x).]
109 / 329
Chandy-Misra algorithm - Complexity
Worst-case message complexity: Exponential
Worst-case message complexity for minimum-hop: O(N²·E)
For each root, the algorithm requires at most O(N·E) messages.
110 / 329
Merlin-Segall algorithm
A centralized algorithm to compute all shortest paths to initiator u0 .
Initially, dist_u0(u0) = 0, dist_v(u0) = ∞ if v ≠ u0, and the parent_v(u0) values form a sink tree with root u0.
Each round, u0 sends ⟨0⟩ to its neighbors.
1. Let node v get ⟨d⟩ from neighbor w.
   If d + ω_vw < dist_v(u0), then dist_v(u0) ← d + ω_vw (and v stores w as future value for parent_v(u0)).
   If w = parent_v(u0), then v sends ⟨dist_v(u0)⟩ to its neighbors except parent_v(u0).
2. When a v ≠ u0 has received a message from all neighbors, it sends ⟨dist_v(u0)⟩ to parent_v(u0), and updates parent_v(u0).
u0 starts a new round after receiving a message from all neighbors.
111 / 329
Merlin-Segall algorithm - Termination + complexity
After i rounds, all shortest paths of ≤ i hops have been computed.
So the algorithm can terminate after N − 1 rounds.
Message complexity: Θ(N²·E)
For each root, the algorithm requires Θ(N·E) messages.
No separate termination detection is needed.
112 / 329
Merlin-Segall algorithm - Example (rounds 1–3)
[Diagrams: three rounds of the algorithm on a weighted network with root u0; in every round the nodes exchange their current distance estimates along the tree, the estimates shrink, and the parent pointers are updated, until after round 3 all shortest paths to u0 have been computed.]
115 / 329
Merlin-Segall algorithm - Topology changes
A number is attached to distance messages.
When an edge fails or becomes operational, adjacent nodes send
a message to u0 via the sink tree.
(If the message meets a failed tree link, it is discarded.)
When u0 receives such a message, it starts a new set of N − 1 rounds,
with a higher number.
If the failed edge is part of the sink tree, the sink tree is rebuilt.
Example: [Diagram: a network on u0, v, w, x, y, z with a sink tree toward u0.]
x informs u0 (via v) that an edge of the sink tree has failed.
116 / 329
Toueg’s algorithm
Computes for each pair u, v a shortest path from u to v .
d_S(u, v), with S a set of nodes, denotes the length of a shortest path from u to v with all intermediate nodes in S.
d_S(u, u) = 0
d_∅(u, v) = ω_uv if u ≠ v and uv ∈ E
d_∅(u, v) = ∞ if u ≠ v and uv ∉ E
d_{S∪{w}}(u, v) = min{ d_S(u, v), d_S(u, w) + d_S(w, v) } if w ∉ S
117 / 329
Floyd-Warshall algorithm
We first discuss a uniprocessor algorithm.
Initially, S = ∅; the first three equations define d_∅.
While S doesn't contain all nodes, a pivot w ∉ S is selected:
- d_{S∪{w}} is computed from d_S using the fourth equation.
- w is added to S.
When S contains all nodes, d_S is the standard distance function.
Time complexity: Θ(N³)
118 / 329
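The uniprocessor computation above fits in a few lines of Python; the graph representation is an assumption of this sketch, and the example weights are the ones that appear in Toueg's example further on.

```python
# The uniprocessor Floyd-Warshall algorithm sketched above.

INF = float("inf")

def floyd_warshall(nodes, weight):
    """weight maps a directed edge (u, v) to its weight."""
    # the first three equations define d_emptyset
    d = {(u, v): 0 if u == v else weight.get((u, v), INF)
         for u in nodes for v in nodes}
    for w in nodes:           # pick the pivots in some fixed order
        for u in nodes:
            for v in nodes:   # the fourth equation: try routing via pivot w
                d[u, v] = min(d[u, v], d[u, w] + d[w, v])
    return d                  # now S contains all nodes

edges = {("u", "v"): 4, ("u", "x"): 1, ("v", "w"): 1, ("w", "x"): 1}
edges.update({(b, a): c for (a, b), c in edges.items()})  # undirected
print(floyd_warshall(["u", "v", "w", "x"], edges)["u", "w"])  # prints 2
```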
Question
Which complications arise when the Floyd-Warshall algorithm
is turned into a distributed algorithm ?
- All nodes must pick the pivots in the same order.
- Each round, nodes need the distance values of the pivot to compute their own routing table.
119 / 329
Toueg’s algorithm
Assumption: Each node knows the id’s of all nodes.
(Because pivots must be picked uniformly at all nodes.)
Initially, at each node u:
- S_u = ∅;
- dist_u(u) = 0 and parent_u(u) = ⊥;
- for each v ≠ u, either dist_u(v) = ω_uv and parent_u(v) = v if there is an edge uv, or dist_u(v) = ∞ and parent_u(v) = ⊥ otherwise.
120 / 329
Toueg’s algorithm
At the w-pivot round, w broadcasts its values dist_w(v), for all nodes v.
If parent_u(w) = ⊥ for a node u ≠ w at the w-pivot round, then dist_u(w) = ∞, so dist_u(w) + dist_w(v) ≥ dist_u(v) for all nodes v.
Hence the sink tree toward w can be used to broadcast dist_w.
If u is in the sink tree toward w, it sends ⟨request, w⟩ to parent_u(w), to let it pass on dist_w.
If u isn't in the sink tree toward w, it proceeds to the next pivot round, with S_u ← S_u ∪ {w}.
121 / 329
Toueg’s algorithm
Consider any node u in the sink tree toward w.
If u ≠ w, it waits for the values dist_w(v) from parent_u(w).
u forwards the values dist_w(v) to neighbors that send ⟨request, w⟩ to u.
If u ≠ w, it checks for each node v whether dist_u(w) + dist_w(v) < dist_u(v).
If so, dist_u(v) ← dist_u(w) + dist_w(v) and parent_u(v) ← parent_u(w).
Finally, u proceeds to the next pivot round, with S_u ← S_u ∪ {w}.
122 / 329
Toueg’s algorithm - Example
[Diagram: a network on u, v, w, x with edge weights ω_uv = 4 and ω_ux = ω_vw = ω_wx = 1.]
pivot u: dist_x(v) ← 5, parent_x(v) ← u; dist_v(x) ← 5, parent_v(x) ← u
pivot v: dist_u(w) ← 5, parent_u(w) ← v; dist_w(u) ← 5, parent_w(u) ← v
pivot w: dist_x(v) ← 2, parent_x(v) ← w; dist_v(x) ← 2, parent_v(x) ← w
pivot x: dist_u(w) ← 2, parent_u(w) ← x; dist_w(u) ← 2, parent_w(u) ← x;
         dist_u(v) ← 3, parent_u(v) ← x; dist_v(u) ← 3, parent_v(u) ← w
123 / 329
Toueg’s algorithm - Complexity + drawbacks
Message complexity: O(N²)
There are N pivot rounds, each taking at most O(N) messages.
Drawbacks:
- Uniform selection of pivots requires that all nodes know the nodes in the network in advance.
- Global broadcast of dist_w at the w-pivot round causes a high bit complexity.
- Not robust with respect to topology changes.
124 / 329
Question
Which addition needs to be made to the algorithm to allow that
a node u can discard the routing table of the pivot w at some point ?
Answer: Next to ⟨request, w⟩, u informs all other neighbors that they do not need to forward w's routing table to u.
Then the message complexity increases to O(N·E).
125 / 329
Toueg’s algorithm - Optimization
Let parent_u(w) = x with x ≠ w at the start of the w-pivot round.
If dist_x(v) doesn't change in this round, then neither does dist_u(v) (for any v).
Upon reception of dist_w, x first updates dist_x, and only forwards values dist_w(v) for which dist_x(v) has changed.
126 / 329
Distance-vector routing
Consider a network in which nodes or links may fail or are added.
Such a change is eventually detected by its neighbors.
In distance-vector routing, at a change in the local topology or
routing table, a node sends its entire routing table to its neighbors.
Each node locally computes shortest paths
(e.g. with the Bellman-Ford algorithm, if links can have negative weights).
127 / 329
Link-state routing
In link-state routing, each node periodically sends a link-state packet, with
- the node's edges and their weights (based on latency, bandwidth)
- a sequence number (which increases with each broadcast)
Link-state packets are flooded through the network.
Nodes store link-state packets, to obtain a view of the entire network.
Sequence numbers avoid that new info is overwritten by old info.
Each node locally computes shortest paths (e.g. with Dijkstra’s alg.).
128 / 329
Question
Flooding entire routing tables (instead of only edges and weights)
tends to produce a less efficient algorithm.
Why is that ?
Answer: A routing table may be based on remote edges that have
recently crashed.
And, of course, the bit complexity increases dramatically.
129 / 329
Link-state routing - Time-to-live
When a node recovers from a crash, its sequence number starts at 0.
So its link-state packets may be ignored for a long time.
Therefore link-state packets carry a time-to-live (TTL) field.
After this time the information from the packet may be discarded.
To reduce flooding, each time a link-state packet is forwarded,
its TTL field decreases.
When it becomes 0, the packet is discarded.
130 / 329
Autonomous systems
The OSPF protocol for routing on the Internet uses link-state routing.
The RIP protocol employs distance-vector routing.
Link-state / distance-vector routing doesn’t scale to the Internet,
because it uses flooding / sends entire routing tables.
Therefore the Internet is divided into autonomous systems.
Each autonomous system uses the OSPF or RIP protocol.
131 / 329
Border gateway protocol
The Border Gateway Protocol routes between autonomous systems.
Routers broadcast updates of their routing tables (a la Chandy-Misra).
A router may update its routing table
- because it detects a topology change, or
- because of an update in the routing table of a neighbor.
132 / 329
Routing on the Internet
133 / 329
Lecture in a nutshell
routing tables to guide a packet to its destination
Chandy-Misra algorithm has exponential worst-case message complexity, but only O(N²·E) for minimum-hop paths
Merlin-Segall algorithm has message complexity Θ(N²·E)
Toueg's algorithm has message complexity O(N²) (but has a high bit complexity, and requires uniform selection of pivots)
link-state / distance-vector routing and the border gateway protocol
employ classical routing algorithms on the Internet
134 / 329
Breadth-first search
Consider an undirected, unweighted network.
A breadth-first search tree is a sink tree in which each tree path
to the root is minimum-hop.
The Chandy-Misra algorithm for minimum-hop paths computes a breadth-first search tree using O(N·E) messages (for each root).
The following centralized algorithm requires O(N·√E) messages (for each root).
135 / 329
Breadth-first search - A “simple” algorithm
Initially (after round 0), the initiator is at distance 0,
noninitiators are at distance ∞, and parents are undefined.
After round n ≥ 0, the tree has been constructed up to depth n.
Nodes at distance n know which neighbors are at distance n − 1.
In round n + 1:
[Diagram: forward messages travel down the tree to the nodes at distance n; explore messages probe the nodes at distance n + 1; reverse messages report back up the tree.]
136 / 329
Breadth-first search - A “simple” algorithm
- Messages ⟨forward, n⟩ travel down the tree, from the initiator to nodes at distance n.
- When a node at distance n gets ⟨forward, n⟩, it sends ⟨explore, n + 1⟩ to neighbors that aren't at distance n − 1.
Let a node v receive ⟨explore, n + 1⟩.
- If dist_v = ∞, then dist_v ← n + 1, the sender becomes v's parent, and v sends back ⟨reverse, true⟩.
- If dist_v = n + 1, then v stores that the sender is at distance n, and v sends back ⟨reverse, false⟩.
- If dist_v = n, then this is a negative ack for the ⟨explore, n + 1⟩ that v sent into this edge.
137 / 329
Breadth-first search - A “simple” algorithm
- A noninitiator at distance n (or < n) waits until all messages ⟨explore, n + 1⟩ (resp. ⟨forward, n⟩) have been answered. Then it sends ⟨reverse, b⟩ to its parent, where b = true if and only if new nodes were added to its subtree.
- The initiator waits until all messages ⟨forward, n⟩ (or, in round 1, ⟨explore, 1⟩) have been answered. If no new nodes were added in round n + 1, it terminates. Else, it continues with round n + 2.
In round n + 2, nodes only send a forward to children that reported
newly discovered nodes in round n + 1.
138 / 329
Breadth-first search - Complexity
Worst-case message complexity: O(N² + E) = O(N²)
There are at most N rounds.
Each round, tree edges carry at most 1 forward and 1 replying reverse.
In total, edges carry 1 explore and 1 replying reverse or explore.
Worst-case time complexity: O(N²)
Round n is completed in at most 2n time units, for n = 1, . . . , N.
139 / 329
Frederickson’s algorithm
Computes ℓ levels per round, with 1 ≤ ℓ < N.
Initially, the initiator is at distance 0, noninitiators are at distance ∞, and parents are undefined.
After round n, the tree has been constructed up to depth ℓn.
In round n + 1:
- ⟨forward, ℓn⟩ travels down the tree, from the initiator to nodes at distance ℓn.
- When a node at distance ℓn gets ⟨forward, ℓn⟩, it sends ⟨explore, ℓn + 1⟩ to neighbors that aren't at distance ℓn − 1.
140 / 329
Frederickson’s algorithm - Complications
Complication 1: In round n + 1, a node at a distance > ℓn may send
multiple explore’s into an edge.
How can this happen ?
Which (small) complication may arise as a result of this ?
How can this be resolved ?
Solution: reverse’s in reply to explore’s are supplied with a distance.
Complication 2: A node w may receive a forward from a non-parent v .
How can this happen ?
Solution: w can dismiss this forward.
In the previous round, w informed v that it is no longer a child.
141 / 329
Frederickson’s algorithm
Let a node v receive ⟨explore, k⟩. We distinguish two cases.
I k < dist v :
dist v ← k, and the sender becomes v ’s parent.
If ℓ doesn’t divide k, then v sends ⟨explore, k + 1⟩ to
its other neighbors.
If ℓ divides k, then v sends back ⟨reverse, k, true⟩.
I k ≥ dist v :
If k = dist v and ℓ divides k, then v sends back ⟨reverse, k, false⟩.
Else v doesn’t send a reply (because it sent ⟨explore, dist v + 1⟩
into this edge).
142 / 329
Frederickson’s algorithm
I Nodes at a distance ℓn < k < ℓ(n + 1) wait until a message
⟨reverse, k + 1, _⟩ or ⟨explore, j⟩ with j ∈ {k, k + 1, k + 2}
has been received from all neighbors.
Then they send ⟨reverse, k, true⟩ to their parent.
I Noninitiators at a distance ℓn (or < ℓn) wait until all messages
⟨explore, ℓn + 1⟩ (resp. ⟨forward, ℓn⟩) have been answered with
a reverse or explore (resp. reverse).
Then they send ⟨reverse, b⟩ to their parent, where b = true if
and only if they received ⟨reverse, _, true⟩ from a child.
143 / 329
Frederickson’s algorithm
I The initiator waits until all messages ⟨forward, ℓn⟩
(or, in round 1, ⟨explore, 1⟩) have been answered.
If it is certain that no unexplored nodes remain, it terminates.
Else, it continues with round n + 2.
In round n + 2, nodes only send a forward to children that reported
newly discovered nodes in round n + 1.
144 / 329
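A minimal sketch of the same simulation with ℓ levels explored per round
(again illustrative; bfs_l_rounds and the graph encoding are mine, and
message passing is collapsed into whole rounds):

from math import inf

def bfs_l_rounds(graph, root, l):
    dist = {v: inf for v in graph}
    parent = {v: None for v in graph}
    dist[root] = 0
    frontier = [root]                        # nodes at distance ℓn
    while frontier:
        for _ in range(l):                   # one round: ℓ further levels
            next_level = []
            for v in frontier:
                for w in graph[v]:
                    if dist[w] == inf:
                        dist[w] = dist[v] + 1
                        parent[w] = v
                        next_level.append(w)
            frontier = next_level
    return dist, parent

g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
dist, _ = bfs_l_rounds(g, 0, 2)
assert dist == {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}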
Question
Apply Frederickson’s algorithm to the network below, with initiator u0
and ℓ = 2:
(figure: a network on the nodes u0 , v , w , x; u0 starts at distance 0,
and v , w , x start at distance ∞.)
Give a computation in which:
I w becomes the parent of x, and
I in round 2, v sends a spurious forward to w .
145 / 329
Frederickson’s algorithm - Complexity
Worst-case message complexity: O(N²/ℓ + ℓ·E)
There are at most ⌈(N−1)/ℓ⌉ + 1 rounds.
Each round, tree edges carry at most 1 forward and 1 replying reverse.
In total, edges carry at most 2ℓ explore’s and 2ℓ replying reverse’s.
(In total, frond edges carry at most 1 spurious forward.)
Worst-case time complexity: O(N²/ℓ)
Round n is completed in at most 2ℓn time units, for n = 1, . . . , ⌈(N−1)/ℓ⌉ + 1.
If ℓ = ⌈N/√E⌉, both message and time complexity are O(N·√E).
146 / 329
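A back-of-the-envelope check of this choice of ℓ (illustrative numbers):
the two terms N²/ℓ and ℓ·E of the message complexity balance exactly
at ℓ = N/√E, where both equal N·√E.

N, E = 1000, 4000
l = N / E ** 0.5              # ℓ = N/√E ≈ 15.81
print(N * N / l, l * E)       # both print 63245.55..., i.e. N·√E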
Questions
What is the optimal value of ℓ in case the network is:
I a complete graph
I acyclic
147 / 329
Question
Even with cycle-free routes, we can still get a deadlock. Why ?
Hint: Consider the (bounded) memory at the processes.
148 / 329
Store-and-forward deadlocks
A store-and-forward deadlock occurs when a group of packets are all
waiting for the use of a buffer slot occupied by a packet in the group.
A controller avoids such deadlocks.
It prescribes whether a packet can be generated or forwarded, and
in which buffer slot it is put next.
149 / 329
Deadlock-free packet switching
Consider an undirected network, supplied with routing tables.
Processes store data packets traveling to their destination in buffers.
Possible events:
I Generation: A new packet is placed in an empty buffer slot.
I Forwarding: A packet is forwarded to an empty buffer slot
of the next node on its route.
I Consumption: A packet at its destination node is removed
from the buffer.
At a node with an empty buffer, packet generation must be allowed.
150 / 329
Synchronous versus asynchronous networks
For simplicity we assume synchronous communication.
In an asynchronous setting, a node can only eliminate a packet
when it is sure that the packet will be accepted at the next node.
Question: How can this be achieved in an undirected network ?
Answer: A packet can only be eliminated by the sender when
its reception has been acknowledged.
151 / 329
Destination controller
The network consists of nodes u0 , . . . , uN−1 .
Ti denotes the sink tree (with respect to the routing tables)
with root ui for i = 0, . . . , N − 1.
In the destination controller, each node carries N buffer slots.
I When a packet with destination ui is generated at v ,
it is placed in the i th buffer slot of v .
I If vw is an edge in Ti , then the i th buffer slot of v is linked
to the i th buffer slot of w .
152 / 329
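A small sketch of the resulting buffer graph (illustrative; the function
name and the encoding of sink trees as next-hop maps are assumptions):

def destination_controller_links(sink_trees):
    # sink_trees[i] maps each node v != u_i to its next hop toward u_i.
    links = []                     # pairs ((v, i), (w, i)) of linked slots
    for i, next_hop in enumerate(sink_trees):
        for v, w in next_hop.items():   # vw is an edge of sink tree T_i
            links.append(((v, i), (w, i)))
    return links

# Triangle u0, u1, u2 with the obvious one-hop sink trees:
trees = [{1: 0, 2: 0}, {0: 1, 2: 1}, {0: 2, 1: 2}]
for (v, i), (w, _) in destination_controller_links(trees):
    print(f"slot {i} of u{v} -> slot {i} of u{w}")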
Destination controller - Correctness
Theorem: The destination controller is deadlock-free.
Proof : Consider a reachable configuration γ.
Make forwarding and consumption transitions to a configuration δ
where no forwarding or consumption is possible.
For each i, since Ti is acyclic, packets in an i th buffer slot can travel
to their destination, where they are consumed.
So in δ, all buffers are empty.
153 / 329
Hops-so-far controller
The network consists of nodes u0 , . . . , uN−1 .
Ti is the sink tree (with regard to the routing tables) with root ui
for i = 0, . . . , N − 1.
K is the length of a longest path in any Ti .
In the hops-so-far controller, each node carries K + 1 buffer slots,
numbered from 0 to K .
I A generated packet is placed in the 0th buffer slot.
I For each edge vw and any j < K , the j th buffer slot of v
is linked to the (j+1)th buffer slot of w , and vice versa.
154 / 329
Hops-so-far controller - Correctness
Theorem: The hops-so-far controller is deadlock-free.
Proof : Consider a reachable configuration γ.
Make forwarding and consumption transitions to a configuration δ
where no forwarding or consumption is possible.
Packets in a K th buffer slot are at their destination.
So in δ, K th buffer slots are all empty.
Suppose all i th buffer slots are empty in δ, for some 1 ≤ i ≤ K .
Then all (i−1)th buffer slots must also be empty in δ.
For else some packet in an (i−1)th buffer slot could be forwarded
or consumed.
Concluding, in δ all buffers are empty.
155 / 329
Acyclic orientation cover
Consider an undirected network.
An acyclic orientation is a directed, acyclic network obtained by
directing all edges.
Let P be a set of paths in the (undirected) network.
An acyclic orientation cover of P consists of acyclic orientations
G0 , . . . , Gn−1 such that each path in P is the concatenation of
paths P0 , . . . , Pn−1 in G0 , . . . , Gn−1 .
156 / 329
Acyclic orientation cover - Example
For each undirected ring there exists a cover, consisting of three
acyclic orientations, of the collection of minimum-hop paths.
For instance, in case of a ring of size six:
(figure: three acyclic orientations G0 , G1 , G2 of the six-node ring.)
157 / 329
Acyclic orientation cover controller
Let P be the set of paths in the network induced by the sink trees.
Let G0 , . . . , Gn−1 be an acyclic orientation cover of P.
In the acyclic orientation cover controller, nodes have n buffer slots,
numbered from 0 to n − 1.
I A generated packet is placed in the 0th buffer slot.
I Let vw be an edge in Gi .
The i th buffer slot of v is linked to the i th buffer slot of w .
Moreover, if i < n − 1, then the i th buffer slot of w is linked to
the (i+1)th buffer slot of v .
158 / 329
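A sketch of the resulting buffer links (illustrative; each orientation
is encoded as a list of directed edges (v, w)):

def cover_controller_links(orientations):
    n = len(orientations)
    links = []
    for i, edges in enumerate(orientations):
        for v, w in edges:                      # vw directed v -> w in G_i
            links.append(((v, i), (w, i)))      # slot i of v -> slot i of w
            if i < n - 1:                       # against the orientation:
                links.append(((w, i), (v, i + 1)))  # climb one slot
    return links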
Acyclic orientation cover controller - Intuition
Consider a packet; it is routed via the sink tree of its destination.
Its path is a concatenation of paths P0 , . . . , Pn−1 in G0 , . . . , Gn−1 .
While the packet is in an i th slot with i < n − 1, it can be forwarded.
If the packet ends up in the (n − 1)th buffer slot at a node,
then it is being routed via the last part Pn−1 of the path.
In that case the packet can be routed to its destination via
(n − 1)th buffer slots.
159 / 329
Acyclic orientation cover controller - Example
For each undirected ring there exists a deadlock-free controller that:
I uses three buffer slots per node, and
I allows packets to travel via minimum-hop paths.
160 / 329
Acyclic orientation cover controller - Correctness
Theorem: Let all packets be routed via paths in P.
Then the acyclic orientation cover controller is deadlock-free.
Proof : Consider a reachable configuration γ.
Make forwarding and consumption transitions to a configuration δ
where no forwarding or consumption is possible.
Since Gn−1 is acyclic, packets in an (n−1)th buffer slot can travel
to their destination. So in δ, all (n−1)th buffer slots are empty.
Suppose all i th buffer slots are empty in δ, for some i = 1, . . . , n−1.
Then all (i−1)th buffer slots must also be empty in δ.
For else, since Gi−1 is acyclic, some packet in an (i−1)th buffer slot
could be forwarded or consumed.
Concluding, in δ all buffers are empty.
161 / 329
Question
Consider an acyclic orientation cover for the minimum-hop paths
in a ring of four nodes.
Show how the resulting acyclic orientation cover controller links
buffer slots.
162 / 329
Slow-start algorithm in TCP
Back to the asynchronous, pragmatic world of the Internet.
To avoid congestion, in TCP, nodes maintain a congestion window
for each of their edges.
It is the maximum number of unacknowledged packets on this edge.
163 / 329
Congestion window
The congestion window grows linearly with each received ack,
up to some threshold.
Question: Explain why the congestion window may double with
every “round trip time”.
The congestion window is reset to the initial size (in TCP Tahoe)
or halved (in TCP Reno) with each lost data packet.
164 / 329
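A toy sketch of the two loss reactions (heavily simplified and
illustrative; real TCP also keeps a slow-start threshold and timers):

def tahoe_step(window, acked, lost, initial=1):
    return initial if lost else window + acked      # Tahoe: reset on loss

def reno_step(window, acked, lost):
    return max(1, window // 2) if lost else window + acked  # Reno: halve

# One ack per outstanding packet per round trip time: the window doubles.
w = 1
for _ in range(4):
    w = tahoe_step(w, acked=w, lost=False)
print(w)   # 16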
Take-home messages of the current lecture
Frederickson’s algorithm to compute a breadth-first search tree
iterative deepening (a la Frederickson’s alg.)
optimization of a parameter (ℓ) based on a complexity analysis
importance of deadlock-free packet switching
acyclic orientation cover controller
congestion window in TCP
165 / 329
Election algorithms
Often a leader process is needed to coordinate a distributed task.
In an election algorithm, each computation should terminate in
a configuration where one process is the leader.
Assumptions:
I All processes have the same local algorithm.
I The algorithm is decentralized:
The initiators can be any non-empty set of processes.
I Process id’s are unique, and from a totally ordered set.
166 / 329
Chang-Roberts algorithm
Consider a directed ring.
Initially only initiators are active, and send a message with their id.
Let an active process p receive a message carrying id q:
I If q < p, then p dismisses the message.
I If q > p, then p becomes passive, and passes on the message.
I If q = p, then p becomes the leader.
Passive processes (including all noninitiators) pass on messages.
Worst-case message complexity: O(N 2 )
Average-case message complexity: O(N log N)
167 / 329
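A round-trip simulation of the algorithm, for the case that every process
initiates (illustrative; channels are modeled by a single queue of
(destination, id) pairs):

def chang_roberts(ids):
    N = len(ids)
    active = [True] * N
    msgs = [((i + 1) % N, ids[i]) for i in range(N)]  # (destination, id)
    while msgs:
        dest, q = msgs.pop(0)
        p = ids[dest]
        if not active[dest]:              # passive processes pass messages on
            msgs.append(((dest + 1) % N, q))
        elif q > p:                       # larger id: become passive, pass on
            active[dest] = False
            msgs.append(((dest + 1) % N, q))
        elif q == p:                      # own id made a round trip: leader
            return p
        # q < p: dismiss the message

assert chang_roberts([3, 1, 4, 2, 5]) == 5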
Chang-Roberts algorithm - Example
All processes are initiators.
(figure: a directed ring with ids 0, 1, . . . , N−1 placed in clockwise order)
anti-clockwise orientation: N(N+1)/2 messages
clockwise orientation: 2N−1 messages
168 / 329
Franklin’s algorithm
Consider an undirected ring.
Each active process p repeatedly compares its own id with
the id’s of its nearest active neighbors on both sides.
If such a neighbor has a larger id, then p becomes passive.
Initially, initiators are active, and noninitiators are passive.
Each round, an active process p:
I sends its id to its neighbors on either side, and
I receives id’s q and r :
- if max{q, r } < p, then p starts another round
- if max{q, r } > p, then p becomes passive
- if max{q, r } = p, then p becomes the leader
Passive processes pass on incoming messages.
169 / 329
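A round-based sketch of Franklin’s algorithm (illustrative; the exchange
of id’s within a round is collapsed into direct lookups of the nearest
active neighbors):

def franklin(ids):
    active = sorted(range(len(ids)))        # all processes initiate
    while True:
        nxt = []
        n = len(active)
        for k, pos in enumerate(active):
            p = ids[pos]
            q = ids[active[(k - 1) % n]]    # nearest active neighbors
            r = ids[active[(k + 1) % n]]
            if max(q, r) < p:
                nxt.append(pos)             # p starts another round
            elif max(q, r) == p:
                return p                    # p's own id came back: leader
            # max(q, r) > p: p becomes passive
        active = nxt

assert franklin([3, 1, 4, 2, 5]) == 5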
Franklin’s algorithm - Complexity
Worst-case message complexity: O(N log N)
In each round, at least half of the active processes become passive.
So there are at most ⌊log2 N⌋ + 1 rounds.
Each round takes 2N messages.
Question: Give an example with N = 4 that takes three rounds.
Question: Show that for any N there is a ring that takes two rounds.
170 / 329
Franklin’s algorithm - Example
(figure: a ring with processes 0, 1, . . . , N−1 placed in order)
After 1 round, only node N−1 is active.
After 2 rounds, node N−1 is the leader.
Suppose this ring is directed with a clockwise orientation.
If a process would only compare its id with the one of its predecessor,
then it would take N rounds to complete.
171 / 329
Dolev-Klawe-Rodeh algorithm
Consider a directed ring.
The comparison of id’s of an active process p and
its nearest active neighbors q and r is performed at r .
(figure: ring segment s → q → p → r → t, with q, p, r active)
- If max{q, r } < p, then r changes its id to p, and sends out p.
- If max{q, r } > p, then r becomes passive.
- If max{q, r } = p, then r announces this id to all processes.
The process that originally had the id p becomes the leader.
Worst-case message complexity: O(N log N)
172 / 329
Dolev-Klawe-Rodeh algorithm - Example
Consider the following clockwise oriented ring.
(figure: a clockwise oriented ring with ids 0–5; active ids are
repeatedly compared and passed on until id 5 makes a round trip, and
the process that originally had id 5 becomes the leader.)
173 / 329
Tree election algorithm for acyclic networks
Question: How can the tree algorithm be used to make the process
with the largest id in an undirected, acyclic network the leader ?
(Be careful that a leaf may be a noninitiator.)
Start with a wake-up phase, driven by the initiators.
I Initially, initiators send a wake-up message to all neighbors.
I When a noninitiator receives a first wake-up message,
it sends a wake-up message to all neighbors.
I A process wakes up when it has received wake-up messages
from all neighbors.
174 / 329
Tree election algorithm
The local algorithm at an awake process p:
I p waits until it has received id’s from all neighbors except one,
which becomes its parent.
I p computes the largest id maxp among the received id’s
and its own id.
I p sends a parent request to its parent, tagged with maxp .
I If p receives a parent request from its parent, tagged with q,
it computes max′p , being the maximum of maxp and q.
I Next p sends an information message to all neighbors except
its parent, tagged with max′p .
I This information message is forwarded through the network.
I The process with id max′p becomes the leader.
Message complexity: 2N − 2 messages
175 / 329
Question
In case a process p receives a parent request from its parent,
why does it need to recompute maxp ?
176 / 329
Tree election algorithm - Example
The wake-up phase is omitted.
(figure: a run on a tree of six processes with ids 1–6, in several
snapshots; parent requests propagate the maximum id seen so far toward
the core, after which information messages tagged with 6 are flooded
and the process with id 6 becomes the leader.)
177 / 329
Echo algorithm with extinction
Each initiator starts a wave, tagged with its id.
Non-initiators join the first wave that hits them.
At any time, each process takes part in at most one wave.
Suppose a process p in wave q is hit by a wave r :
I if q < r , then p changes to wave r
(it abandons all earlier messages);
I if q > r , then p continues with wave q
(it dismisses the incoming message);
I if q = r , then the incoming message is treated according to
the echo algorithm of wave q.
If wave p executes a decide event (at p), p becomes the leader.
Worst-case message complexity: O(N·E )
178 / 329
Minimum spanning trees
Consider an undirected, weighted network.
We assume that different edges have different weights.
(Or weighted edges can be totally ordered by also taking into account
the id’s of endpoints of an edge, and using a lexicographical order.)
In a minimum spanning tree, the sum of the weights of the edges
in the spanning tree is minimal.
Example: (figure: a weighted network with edge weights 1, 3, 8, 9, 10,
together with its minimum spanning tree.)
179 / 329
Fragments
Lemma: Let F be a fragment
(i.e., a connected subgraph of the minimum spanning tree M).
Let e be the lowest-weight outgoing edge of F
(i.e., e has exactly one endpoint in F ).
Then e is in M.
Proof : Suppose not.
Then M ∪ {e} has a cycle,
containing e and another outgoing edge f of F .
Replacing f by e in M gives a spanning tree with a smaller sum of
weights of edges, contradicting that M is a minimum spanning tree.
180 / 329
Kruskal’s algorithm
A uniprocessor algorithm for computing minimum spanning trees.
I Initially, each node forms a separate fragment.
I In each step, a lowest-weight outgoing edge of a fragment
is added to the spanning tree, joining two fragments.
This algorithm also works when edges have the same weight.
Then the minimum spanning tree may not be unique.
181 / 329
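A standard uniprocessor sketch of Kruskal’s algorithm with union-find
(illustrative; not from the slides):

def kruskal(nodes, edges):            # edges: list of (weight, u, v)
    parent = {v: v for v in nodes}    # each node is its own fragment
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    tree = []
    for w, u, v in sorted(edges):     # lowest-weight edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # outgoing edge: joins two fragments
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

print(kruskal({1, 2, 3}, [(5, 1, 2), (3, 2, 3), (9, 1, 3)]))
# [(3, 2, 3), (5, 1, 2)]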
Gallager-Humblet-Spira algorithm
Consider an undirected, weighted network,
in which different edges have different weights.
Distributed computation of a minimum spanning tree:
I Initially, each process is a fragment.
I The processes in a fragment F together search for
the lowest-weight outgoing edge eF .
I When eF has been found, the fragment at the other end
is asked to collaborate in a merge.
Complications: Is an edge outgoing ? Is it lowest-weight ?
182 / 329
Level, name and core edge
Each fragment carries a (unique) name fn : R and a level ℓ : N.
Its level is the maximum number of joins any process in the fragment
has experienced.
Neighboring fragments F = (fn, ℓ) and F′ = (fn′, ℓ′) can be joined
as follows:
I ℓ < ℓ′, and eF leads from F to F′ : F ∪ F′ = (fn′, ℓ′)
I ℓ = ℓ′ and eF = eF′ : F ∪ F′ = (weight eF , ℓ + 1)
The core edge of a fragment is the last edge that connected two
sub-fragments at the same level. Its end points are the core nodes.
183 / 329
Parameters of a process
Its state :
I sleep (for noninitiators)
I find (looking for a lowest-weight outgoing edge)
I found (reported a lowest-weight outgoing edge to the core edge)
The status of its channels :
I basic edge (undecided)
I branch edge (in the spanning tree)
I rejected (not in the spanning tree)
The name and level of its fragment.
Its parent (toward the core edge).
184 / 329
Initialization
Non-initiators wake up when they receive a (connect or test) message.
Each initiator, and noninitiator after it has woken up:
I sets its level to 0
I sets its lowest-weight edge to branch
I sends ⟨connect, 0⟩ into this channel
I sets its other channels to basic
I sets its state to found
185 / 329
Joining two fragments
Let fragments F = (fn, ℓ) and F′ = (fn′, ℓ′) be joined via channel pq.
I If ℓ < ℓ′, then p sent ⟨connect, ℓ⟩ to q.
q sends ⟨initiate, fn′, ℓ′, find/found⟩ to p.
F ∪ F′ inherits the core edge of F′.
I If ℓ = ℓ′, then p and q sent ⟨connect, ℓ⟩ to each other.
They send ⟨initiate, weight pq, ℓ + 1, find⟩ to each other.
F ∪ F′ gets core edge pq.
At reception of ⟨initiate, fn, ℓ, find/found⟩, a process stores fn and ℓ,
sets its state to find or found, and adopts the sender as its parent.
It passes on the message through its other branch edges.
186 / 329
Computing the lowest-weight outgoing edge
In case of ⟨initiate, fn, ℓ, find⟩, p checks in increasing order of weight
if one of its basic edges pq is outgoing, by sending ⟨test, fn, ℓ⟩ to q.
While ℓ > level q , q postpones processing the incoming test message.
Let ℓ ≤ level q .
I If q is in fragment fn, then q replies reject.
In this case p and q will set pq to rejected.
I Else, q replies accept.
When a basic edge is accepted, or there are no basic edges left,
p stops the search.
187 / 329
Questions
Why does q postpone processing the incoming ⟨test, _, ℓ⟩ message
from p while ℓ > level q ?
Answer: p and q might be in the same fragment, in which case
⟨initiate, fn, ℓ, find⟩ is on its way to q.
Why does this postponement not lead to a deadlock ?
Answer: There is always a fragment with a smallest level.
188 / 329
Reporting to the core nodes
I p waits for all branch edges, except its parent, to report.
I p sets its state to found.
I p computes the minimum λ of (1) these reports, and
(2) the weight of its lowest-weight outgoing basic edge
(or ∞, if no such channel was found).
I If λ < ∞, p stores either the branch edge that sent λ,
or its basic edge of weight λ.
I p sends ⟨report, λ⟩ to its parent.
189 / 329
Termination or changeroot at the core nodes
A core node receives reports through all its branch edges, including
the core edge.
I If the minimum reported value µ = ∞, the core nodes terminate.
I If µ < ∞, the core node that received µ first sends changeroot
toward the lowest-weight outgoing basic edge.
(The core edge becomes a regular tree edge.)
Ultimately changeroot reaches the process p that reported
the lowest-weight outgoing basic edge.
p sets this channel to branch, and sends ⟨connect, level p ⟩ into it.
190 / 329
Starting the join of two fragments
When q receives ⟨connect, level p ⟩ from p, level q ≥ level p .
Namely, either level p = 0, or q earlier sent accept to p.
I If level q > level p , then q sets qp to branch
and sends ⟨initiate, name q , level q , find/found⟩ to p.
I As long as level q = level p and qp isn’t a branch edge,
q postpones processing the connect message.
I If level q = level p and qp is a branch edge
(meaning that q sent ⟨connect, level q ⟩ to p), then q sends
⟨initiate, weight qp, level q + 1, find⟩ to p (and vice versa).
In this case pq becomes the core edge.
191 / 329
Questions
If level q = level p and qp isn’t a branch edge, why does q postpone
processing the incoming connect message from p ?
Answer: The fragment of q might be in the process of joining
another fragment at a level ≥ level q .
Then the fragment of p should subsume the name and level of that
joint fragment, instead of joining q’s fragment at an equal level.
Why does this postponement not give rise to a deadlock ?
(I.e., why can’t there be a cycle of fragments waiting for a reply
to a postponed connect message ?)
Answer: Because different channels have different weights.
192 / 329
Question
Suppose a process reported a lowest-weight outgoing basic edge,
and in return receives ⟨initiate, fn, ℓ, find⟩.
Why must it test again whether this basic edge is outgoing ?
Answer: Its fragment may in the meantime have joined
the fragment at the other side of this basic edge.
193 / 329
Gallager-Humblet-Spira algorithm - Example
(figure and message trace: a weighted network on the nodes p, q, r , s, t
with edge weights pq = 5, qt = 15, qr = 7, ps = 9, pr = 11, rs = 3.
First p and q exchange ⟨connect, 0⟩ over pq and join into a level-1
fragment named 5, and r and s do the same over rs, forming a level-1
fragment named 3; t joins the fragment of p and q by sending
⟨connect, 0⟩ over qt, and reports ∞. Both level-1 fragments then find
qr as their lowest-weight outgoing edge (reports 7 and 9), exchange
⟨connect, 1⟩ over it, and join into a level-2 fragment named 7 with
core edge qr. The remaining tests over ps and pr are rejected, all
processes report ∞, and the algorithm terminates.)
194 / 329
Gallager-Humblet-Spira algorithm - Complexity
Worst-case message complexity: O(E + N log N)
I A rejected channel requires a test-reject or test-test pair.
Between two subsequent joins, a process:
I receives one initiate
I sends at most one test that triggers an accept
I sends one report
I sends at most one changeroot or connect
A fragment at level ℓ contains ≥ 2^ℓ processes.
So each process experiences at most ⌊log2 N⌋ joins.
195 / 329
Back to election
By two extra messages at the very end,
the core node with the largest id becomes the leader.
So Gallager-Humblet-Spira induces an election algorithm for
general undirected networks.
(We must impose an order on channels of equal weight.)
Lower bounds for the average-case message complexity
of election algorithms based on comparison of id’s:
Rings: Ω(N log N)
General networks: Ω(E + N log N)
196 / 329
Lecture in a nutshell
leader election
decentralized, uniform local algorithm, unique process id’s
Chang-Roberts and Dolev-Klawe-Rodeh algorithm on directed rings
tree election algorithm
echo algorithm with extinction
Gallager-Humblet-Spira minimum spanning tree algorithm
197 / 329
Election in anonymous networks
In an anonymous network, processes (and channels) have no unique id.
Processes may be anonymous for several reasons:
I Transmitting/storing id’s is too expensive (IEEE 1394 bus).
I Processes don’t want to reveal their id (security protocols).
I Absence of unique hardware id’s (LEGO Mindstorms).
Question: Suppose there is one leader.
How can each process be provided with a unique id ?
198 / 329
Impossibility of election in anonymous rings
Theorem: There is no election algorithm for anonymous rings
that always terminates.
Proof : Consider an anonymous ring of size N.
In a symmetric configuration, all processes are in the same state
and all channels carry the same messages.
I There is a symmetric initial configuration.
I If γ0 is symmetric and γ0 → γ1 , then there are transitions
γ1 → γ2 → · · · → γN with γN symmetric.
In a symmetric configuration there isn’t one leader.
So there is an infinite computation in which no leader is elected.
199 / 329
Fairness
An execution is fair if each event that is applicable in infinitely many
configurations, occurs infinitely often in the computation.
Each election algorithm for anonymous rings has a fair infinite execution.
(Basically because in the proof, γ0 → γ1 can be chosen freely.)
200 / 329
Probabilistic algorithms
In a probabilistic algorithm, a process may flip a coin, and
perform an event based on the outcome of this coin flip.
Probabilistic algorithms where all computations terminate in
a correct configuration aren’t interesting.
Because fixing the coin to e.g. always come up heads would yield
a correct non-probabilistic algorithm.
201 / 329
Las Vegas and Monte Carlo algorithms
A probabilistic algorithm is Las Vegas if:
I the probability that it terminates is greater than zero, and
I all terminal configurations are correct.
It is Monte Carlo if:
I it always terminates, and
I the probability that a terminal configuration is correct is
greater than zero.
202 / 329
Questions
Even if the probability that a Las Vegas algorithm terminates is 1,
this doesn’t always imply termination. Why is that ?
Assume a Monte Carlo algorithm, and a (deterministic) algorithm
to check whether a run of the Monte Carlo algorithm terminated
correctly.
Give a Las Vegas algorithm that terminates with probability 1.
203 / 329
Itai-Rodeh election algorithm
Given an anonymous, directed ring; all processes know the ring size N.
We adapt the Chang-Roberts algorithm: each initiator sends out an id,
and the largest id is the only one making a round trip.
Each initiator selects a random id from {1, . . . , N}.
Complication: Different processes may select the same id.
Solution: Each message is supplied with a hop count.
A message that arrives at its source has hop count N.
If several processes select the same largest id, then they start
a new election round, with a higher round number.
204 / 329
Itai-Rodeh election algorithm
Initially, initiators are active in round 0, and noninitiators are passive.
Let p be active. At the start of election round n, p randomly selects id p ,
sends (n, id p , 1, false), and waits for a message (n′, i, h, b).
The 3rd value is the hop count. The 4th value signals if another process
with the same id was encountered during the round trip.
I p gets (n′, i, h, b) with n′ > n, or n′ = n and i > id p :
it becomes passive and sends (n′, i, h + 1, b).
I p gets (n′, i, h, b) with n′ < n, or n′ = n and i < id p :
it dismisses the message.
I p gets (n, id p , h, b) with h < N: it sends (n, id p , h + 1, true).
I p gets (n, id p , N, true): it proceeds to round n + 1.
I p gets (n, id p , N, false): it becomes the leader.
Passive processes pass on messages, increasing their hop count by one.
205 / 329
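A round-level sketch of the algorithm (illustrative; hop counts and the
bit are collapsed: after a full round trip, exactly the processes holding
the round’s maximum id stay active, and they see bit true precisely if
the maximum was selected more than once):

import random

def itai_rodeh_election(N, rng=random):
    active = list(range(N))             # ring positions of active processes
    while True:
        ids = {p: rng.randint(1, N) for p in active}
        m = max(ids.values())
        winners = [p for p in active if ids[p] == m]
        if len(winners) == 1:
            return winners[0]           # (n, id, N, false): become leader
        active = winners                # (n, id, N, true): next round

random.seed(3)
print(itai_rodeh_election(5))   # ring position of the elected leader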
Itai-Rodeh election algorithm - Correctness
Question: How can an infinite computation occur ?
Correctness: The Itai-Rodeh election algorithm is Las Vegas.
Eventually one leader is elected, with probability 1.
Without rounds, the algorithm would be flawed.
(figure: two processes select the same id i, while the others select
k and `, with i > j and j > k, `; after detecting the tie both
re-select the id j, and without round numbers a message ⟨j, 1, false⟩
still in transit can then wrongly let a process become the leader.)
Average-case message complexity: O(N log N)
206 / 329
Election in arbitrary anonymous networks
The echo algorithm with extinction, with random selection of id’s,
can be used for election in anonymous undirected networks in which
all processes know the network size.
Initially, initiators are active in round 0, and noninitiators are passive.
Each active process selects a random id, and starts a wave,
tagged with its id and round number 0.
Let process p in wave i of round n be hit by wave j of round n′:
I If n′ > n, or n′ = n and j > i, then p adopts wave j of round n′,
and treats the message according to the echo algorithm.
I If n′ < n, or n′ = n and j < i, then p dismisses the message.
I If n0 = n and j = i, then p treats the message according to
the echo algorithm.
207 / 329
Election in arbitrary anonymous networks
Each message sent upwards in the constructed tree reports the size
of its subtree.
All other messages report 0.
When a process decides, it computes the size of the constructed tree.
If the constructed tree covers the network, it becomes the leader.
Else, it selects a new id, and initiates a new wave, in the next round.
208 / 329
Election in arbitrary anonymous networks - Example
i > j > k > ` > m. Only waves that complete are shown.
(figure: two rounds on a six-node network; nodes are labeled with their
(round, id) pair and messages with ⟨round, id, subtree size⟩. In round 0
the wave with id i completes, but its tree doesn’t cover the network;
in round 1 the wave with id ` completes.)
The process at the left computes size 6, and becomes the leader.
209 / 329
Question
Is there another scenario in which the right-hand side node
progresses to round 2 ?
210 / 329
Computing the size of a network
Theorem: There is no Las Vegas algorithm to compute the size of
an anonymous ring.
This implies that there is no Las Vegas algorithm for election in
an anonymous ring if processes don’t know the ring size.
Because when a leader is known, the network size can be computed
using a centralized wave algorithm with the leader as initiator.
211 / 329
Impossibility of computing anonymous network size
Theorem: There is no Las Vegas algorithm to compute the size of
an anonymous ring.
Proof : Consider an anonymous, directed ring p0 , . . . , pN−1 .
Suppose a computation C of a (probabilistic) ring size algorithm
terminates with the correct outcome N.
Consider the ring p0 , . . . , p2N−1 .
Let each event at a pi in C be executed concurrently at pi and pi+N .
This computation terminates with the incorrect outcome N.
212 / 329
Itai-Rodeh ring size algorithm
Each process p maintains an estimate est p of the ring size.
Initially est p = 2. (Always est p ≤ N.)
p initiates an estimate round (1) at the start of the algorithm, and
(2) at each update of est p .
Each round, p selects a random id p in {1, . . . , R}, sends (est p , id p , 1),
and waits for a message (est, id, h). (Always h ≤ est.)
I est < est p . Then p dismisses the message.
I est > est p .
- If h < est, then p sends (est, id, h + 1), and est p ← est.
- If h = est, then est p ← est + 1.
I est = est p .
- If h < est, then p sends (est, id, h + 1).
- If h = est and id ≠ id p , then est p ← est + 1.
- If h = est and id = id p , then p dismisses the message
(possibly its own message returned).
213 / 329
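An event-driven simulation under a literal reading of the rules above
(illustrative; channels are FIFO here, and a random nonempty channel is
served each step):

import random
from collections import deque

def ring_size(N, R, rng=random):
    est = [2] * N
    my_id = [0] * N
    chan = [deque() for _ in range(N)]          # chan[p]: p's inbox

    def start_round(p):                         # new id, new message
        my_id[p] = rng.randint(1, R)
        chan[(p + 1) % N].append((est[p], my_id[p], 1))

    for p in range(N):
        start_round(p)
    while any(chan):
        p = rng.choice([q for q in range(N) if chan[q]])
        e, i, h = chan[p].popleft()
        if e < est[p]:
            continue                            # dismiss
        if e > est[p]:
            est[p] = e if h < e else e + 1      # update the estimate
            if h < e:
                chan[(p + 1) % N].append((e, i, h + 1))
            start_round(p)                      # a round at each update
        else:                                   # e == est[p]
            if h < e:
                chan[(p + 1) % N].append((e, i, h + 1))
            elif i != my_id[p]:
                est[p] = e + 1
                start_round(p)
            # else: possibly p's own message returned; dismiss
    return est

random.seed(0)
print(ring_size(4, R=4))   # estimates, each at most the true ring size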
Itai-Rodeh ring size algorithm - Correctness
When the algorithm terminates, est p ≤ N for all p.
The Itai-Rodeh ring size algorithm is a Monte Carlo algorithm.
Possibly, in the end est p < N.
Example: on a ring of four processes, let opposite processes select
the same pair: (i, 2) and (i, 2) on one diagonal, (j, 2) and (j, 2) on
the other. Every message with hop count 2 then arrives at a process
with the same id, so all processes wrongly conclude that the ring size is 2.
214 / 329
Itai-Rodeh ring size algorithm - Example
(figure: a run on a ring of four processes; the processes’ (id, est)
pairs are shown as the estimates grow from 2 via 3 to the correct
ring size 4.)
215 / 329
Itai-Rodeh ring size algorithm - Termination
Question: Upon message-termination, is est p always the same
at all p ?
There is no Las Vegas algorithm for general termination detection
in anonymous rings.
216 / 329
Itai-Rodeh ring size algorithm - Complexity
The probability of computing an incorrect ring size tends to zero
when R tends to infinity.
Worst-case message complexity: O(N³)
The N processes start at most N − 1 estimate rounds.
Each round they send a message, which takes at most N steps.
217 / 329
Question
Give an (always correctly terminating) algorithm for computing
the network size of anonymous, acyclic networks.
Answer: Use the tree algorithm, whereby each process reports
the size of its subtree to its parent.
218 / 329
IEEE 1394 election algorithm
The IEEE 1394 standard is a serial multimedia bus.
It connects digital devices,
which can be added/removed dynamically.
Transmitting/storing id’s is too expensive,
so the network is anonymous.
The network size is unknown to the processes.
The tree algorithm for undirected, acyclic networks is used.
Networks that contain a cycle give a time-out.
219 / 329
IEEE 1394 election algorithm
When a process has one possible parent, it sends a parent request
to this neighbor. If the request is accepted, an ack is sent back.
The last two parentless processes can send parent requests
to each other simultaneously. This is called root contention.
Each of the two processes in root contention randomly decides
to either immediately send a parent request again, or
to wait some time for a parent request from the other process.
Question: Is it optimal for performance to give probability 0.5
to both sending immediately and waiting for some time ?
220 / 329
Lecture in a nutshell
anonymous network
impossibility of election in anonymous networks
Las Vegas / Monte Carlo algorithms
Itai-Rodeh election algorithm for directed rings (Las Vegas)
echo election algorithm for anonymous networks (Las Vegas)
no Las Vegas algorithm for computing anonymous network size
Itai-Rodeh ring size algorithm (Monte Carlo)
IEEE 1394 election algorithm
221 / 329
Fault tolerance
A process may (1) crash, i.e., execute no further events, or even
(2) be Byzantine, meaning that it can perform arbitrary events.
Assumption: The network is complete, i.e., there is
an undirected channel between each pair of different processes.
So failing processes never make the remaining network disconnected.
Assumption: Crashing of processes can’t be observed.
222 / 329
Consensus
Binary consensus: Initially, all processes randomly select 0 or 1.
Eventually, all correct processes must uniformly decide 0 or 1.
Consensus underlies many important problems in distributed computing:
termination detection, mutual exclusion, leader election, ...
223 / 329
Consensus - Assumptions
k-crash consensus: At most k processes may crash.
Validity: If all processes randomly select the same initial value b,
then all correct processes decide b.
This excludes trivial solutions where e.g. processes always decide 0.
By validity, each k-crash consensus algorithm with k ≥ 1 has
a bivalent initial configuration that can reach terminal configurations
with a decision 0 as well as with a decision 1.
224 / 329
Impossibility of 1-crash consensus
Theorem: No algorithm for 1-crash consensus always terminates.
Idea: A decision is determined by an event e at a process p.
Since p may crash, after e the other processes must be able to decide
without input from p.
225 / 329
b-potent set of processes
A set S of processes is called b-potent, in a configuration, if by only
executing events at processes in S, some process in S can decide b.
Question: Consider any k-crash consensus algorithm.
Why should each set of N − k processes be b-potent for some b ?
226 / 329
Impossibility of 1-crash consensus
Theorem: No algorithm for 1-crash consensus always terminates.
Proof : Consider a 1-crash consensus algorithm.
Let γ be a bivalent configuration: γ → γ0 and γ → γ1 , where
γ0 can lead to decision 0 and γ1 to decision 1.
I Let the transitions correspond to events at different processes.
Then γ0 → δ ← γ1 for some δ. So γ0 or γ1 is bivalent.
I Let the transitions correspond to events at one process p.
In γ, p can crash, so the other processes are b-potent for some b.
Likewise for γ0 and γ1 . It follows that γ1−b is bivalent.
So each bivalent configuration has a transition to a bivalent configuration.
Hence each bivalent initial configuration yields an infinite computation.
There exist fair infinite computations.
227 / 329
Impossibility of 1-crash consensus - Example
Let N = 4. At most one process can crash.
There are voting rounds, in which each process broadcasts its value.
Since one process may crash, in a round, processes can only wait
for three votes.
(figure: four processes, two with value 1 on the left and two with
value 0 on the right.)
The left (resp. right) processes might in every round receive
two 1-votes and one 0-vote (resp. two 0-votes and one 1-vote).
(Admittedly, this scheduling of messages is unfair.)
228 / 329
Impossibility of ⌈N/2⌉-crash consensus
Theorem: Let k ≥ N/2. There is no Las Vegas algorithm for
k-crash consensus.
Proof : Suppose, toward a contradiction, there is such an algorithm.
Divide the set of processes in S and T , with |S| = ⌊N/2⌋ and |T | = ⌈N/2⌉.
Suppose all processes in S select 0 and all processes in T select 1.
Suppose that messages between processes in S and in T are very slow.
Since k ≥ N/2, at some point the processes in S must assume the processes
in T all crashed, and decide 0.
Likewise, at some point the processes in T must assume the processes
in S all crashed, and decide 1.
229 / 329
Question
Give a Monte Carlo algorithm for k-crash consensus for any k.
Answer: Let any process decide for its initial (random) value.
With a (very small) positive probability all correct processes decide
for the same value.
230 / 329
Bracha-Toueg crash consensus algorithm
Let k < N/2. Initially, each correct process randomly selects 0 or 1,
with weight 1. In round n, at each correct, undecided p:
I p sends ⟨n, value p , weight p ⟩ to all processes (including itself).
I p waits until N − k messages ⟨n, b, w⟩ have arrived.
(p dismisses/stores messages from earlier/future rounds.)
If w > N/2 for an ⟨n, b, w⟩, then value p ← b. (This b is unique.)
Else, value p ← 0 if most messages voted 0, value p ← 1 otherwise.
weight p ← the number of incoming votes for value p in round n.
I If w > N/2 for > k incoming messages ⟨n, b, w⟩, then p decides b.
(Note that k < N − k.)
If p decides b, it broadcasts ⟨n + 1, b, N − k⟩ and ⟨n + 2, b, N − k⟩,
and terminates.
231 / 329
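A sketch of one round at a single correct, undecided process
(illustrative; the N − k received (value, weight) pairs are passed in
as a list):

def bracha_toueg_round(N, k, votes):
    heavy = [(b, w) for b, w in votes if w > N / 2]
    if heavy:
        value = heavy[0][0]                 # such a b is unique
    else:
        zeros = sum(1 for b, _ in votes if b == 0)
        value = 0 if zeros > len(votes) - zeros else 1
    weight = sum(1 for b, _ in votes if b == value)
    decided = len(heavy) > k                # > k heavy b-votes: decide b
    return value, weight, decided

# N = 3, k = 1: two 0-votes of weight 2 let a process decide 0.
print(bracha_toueg_round(3, 1, [(0, 2), (0, 2)]))   # (0, 2, True)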
Bracha-Toueg crash consensus algorithm - Example
N = 3 and k = 1. Each round a correct process requires
two incoming messages, and two b-votes with weight 2 to decide b.
(figure: a run in six snapshots. Initially two processes have value 0
and one has value 1, all with weight 1. After the round-0 votes
⟨0, 0, 1⟩ and ⟨0, 1, 1⟩, all correct processes carry value 0; the votes
⟨1, 0, 2⟩ then let a process decide 0, another process crashes, and
the votes ⟨2, 0, 1⟩ and ⟨3, 0, 2⟩ make the last process decide 0 too.)
(Messages of a process to itself aren’t depicted.)
232 / 329
Bracha-Toueg crash consensus algorithm - Correctness
Theorem: Let k < N/2. The Bracha-Toueg k-crash consensus algorithm
is a Las Vegas algorithm that terminates with probability 1.
Proof (part I): Suppose a process decides b in round n.
Then in round n, value q = b and weight q > N/2 for > k processes q.
So in round n, each correct process receives an ⟨n, b, w⟩ with w > N/2.
So in round n + 1, all correct processes vote b.
So in round n + 2, all correct processes vote b with weight N − k.
Hence, after round n + 2, all correct processes have decided b.
Concluding, all correct processes decide for the same value.
233 / 329
Bracha-Toueg crash consensus algorithm - Correctness
Proof (part II): Assumption: Scheduling of messages is fair.
Due to fair scheduling, there is a chance ρ > 0 that in a round n
all processes receive the first N − k messages from the same processes.
After round n, all correct processes have the same value b.
After round n + 1, all correct processes have value b with weight N − k.
After round n + 2, all correct processes have decided b.
Concluding, the algorithm terminates with probability 1.
234 / 329
Impossibility of ⌈N/3⌉-Byzantine consensus
Theorem: Let k ≥ N/3. There is no Las Vegas algorithm for
k-Byzantine consensus.
Proof : Suppose, toward a contradiction, there is such an algorithm.
Since k ≥ N/3, we can choose sets S and T of processes with
|S| = |T | = N − k and |S ∩ T | ≤ k.
Suppose all processes in S select 0 and all processes in T select 1.
Suppose that messages between processes in S and in T are very slow.
Suppose all processes that aren’t in S ∪ T are Byzantine.
The processes in S can then, with the help of the Byzantine processes,
decide 0.
Likewise the processes in T can decide 1.
235 / 329
Bracha-Toueg Byzantine consensus algorithm
Let k < N/3.
Again, in every round, each correct process:
I broadcasts its value,
I waits for N − k incoming messages, and
I changes its value to the majority of votes in the round.
(No weights are needed.)
A correct process decides b if it receives > (N + k)/2 b-votes in one round.
Then more than half of the correct processes voted b in this round.
(Note that (N + k)/2 < N − k.)
236 / 329
Echo mechanism
Complication: A Byzantine process may send different votes
to different processes.
Example: Let N = 4 and k = 1. Each round, a correct process
waits for three votes, and needs three b-votes to decide b.
(figure: a run without echos. The Byzantine process sends vote 1 to
one correct process and vote 0 to the others, so that one correct
process decides 1 while the other two decide 0.)
Solution: Each incoming vote is verified using an echo mechanism.
A vote is accepted after > (N + k)/2 confirming echos.
237 / 329
Bracha-Toueg Byzantine consensus algorithm
Initially, each correct process randomly selects 0 or 1.
In round n, at each correct, undecided p:
I p sends ⟨vote, n, value p ⟩ to all processes (including itself).
I If p receives ⟨vote, m, b⟩ from q, it sends ⟨echo, q, m, b⟩
to all processes (including itself).
I p counts incoming ⟨echo, q, n, b⟩ messages for each q, b.
When > (N + k)/2 such messages arrived, p accepts q’s b-vote.
I The round is completed when p has accepted N − k votes.
If most votes are for 0, then value p ← 0. Else, value p ← 1.
238 / 329
Bracha-Toueg Byzantine consensus algorithm
Processes dismiss/store messages from earlier/future rounds.
If multiple messages ⟨vote, m, _⟩ or ⟨echo, q, m, _⟩ arrive via
the same channel, only the first one is taken into account.
If > (N + k)/2 of the accepted votes are for b, then p decides b.
When p decides b, it broadcasts ⟨decide, b⟩ and terminates.
The other processes interpret ⟨decide, b⟩ as a b-vote by p,
and a b-echo by p for each q, for all rounds to come.
239 / 329
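A sketch of the vote-acceptance test (illustrative; echoes[q][b] counts
the received ⟨echo, q, n, b⟩ messages of the current round):

def accepted_votes(N, k, echoes):
    votes = {}
    for q, counts in echoes.items():
        for b, c in counts.items():
            if c > (N + k) / 2:          # enough confirming echoes
                votes[q] = b
    return votes

# N = 4, k = 1: three echoes are needed to accept a vote.
print(accepted_votes(4, 1, {"q": {0: 3}, "r": {1: 2}}))   # {'q': 0}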
Questions
If an undecided process receives ⟨decide, b⟩, why can it in general
not immediately decide b ?
Answer: The message may originate from a Byzantine process.
What happens if all k Byzantine processes keep silent ?
Answer: The N − k correct processes reach consensus in two rounds.
240 / 329
Bracha-Toueg Byzantine consensus alg. - Example
We study the previous example again, now with verification of votes.
N = 4 and k = 1, so each round a correct process needs:
I > (N + k)/2, i.e. three, confirmations to accept a vote;
I N − k, i.e. three, accepted votes to determine a value; and
I > (N + k)/2, i.e. three, accepted b-votes to decide b.
Only relevant vote messages are depicted (without their round number).
241 / 329
Bracha-Toueg Byzantine consensus alg. - Example
(figure: round zero. The Byzantine process sends vote 1 to the left
bottom process and vote 0 elsewhere; one correct process already
decides 0 after accepting three 0-votes.)
In round zero, the left bottom process doesn’t accept vote 1 by
the Byzantine process, since none of the other two correct processes
confirm this vote. So it waits for (and accepts) vote 0 by
the right bottom process, and thus doesn’t decide 1 in round zero.
(figure: round one. All correct processes vote 0, accept each other’s
0-votes, and decide 0; the ⟨decide, 0⟩ messages complete the run.)
242 / 329
Bracha-Toueg Byzantine consensus alg. - Correctness
Theorem: Let k < N/3. The Bracha-Toueg k-Byzantine consensus
algorithm is a Las Vegas algorithm that terminates with probability 1.
Proof : Each round, the correct processes eventually accept N − k votes,
since there are ≥ N − k correct processes. (Note that N − k > (N + k)/2.)
In round n, let correct processes p and q accept votes for b and b′,
respectively, from a process r .
Then they received > (N + k)/2 messages ⟨echo, r , n, b⟩ resp. ⟨echo, r , n, b′⟩.
> k processes, so at least one correct process, sent such messages to
both p and q.
So b = b′.
243 / 329
Bracha-Toueg Byzantine consensus alg. - Correctness
Suppose a correct process decides b in round n.
In this round it accepts > (N + k)/2 b-votes.
So in round n, correct processes accept > (N + k)/2 − k = (N − k)/2 b-votes.
Hence, after round n, value q = b for each correct q.
So correct processes will vote b in all rounds m > n.
Because they will accept ≥ N − 2k > (N − k)/2 b-votes.
244 / 329
Bracha-Toueg Byzantine consensus alg. - Correctness
Let S be a set of N − k correct processes.
Assuming fair scheduling, there is a chance ρ > 0 that in a round
each process in S accepts N − k votes from the processes in S.
With chance ρ² this happens in consecutive rounds n, n + 1.
After round n, all processes in S have the same value b.
After round n + 1, all processes in S have decided b.
245 / 329
Lecture in a nutshell
crashed / Byzantine processes
complete network / crashes can’t be observed
(binary) consensus
no algorithm for 1-crash consensus always terminates
if k ≥ N/2, there is no Las Vegas algorithm for k-crash consensus
Bracha-Toueg k-crash consensus algorithm for k < N/2
if k ≥ N/3, there is no Las Vegas algorithm for k-Byzantine consensus
Bracha-Toueg k-Byzantine consensus algorithm for k < N/3
246 / 329
Failure detection
A failure detector at a process keeps track which processes have
(or may have) crashed.
Given an upper bound on network latency, and heartbeat messages,
one can implement a failure detector.
With a failure detector, the proof of impossibility of 1-crash consensus
no longer applies.
For this setting, terminating crash consensus algorithms exist.
247 / 329
Failure detection
Aim: To detect crashed processes.
We assume a time domain, with a total order.
F (τ ) is the set of crashed processes at time τ .
τ1 ≤ τ2 ⇒ F (τ1 ) ⊆ F (τ2 ) (i.e., no restart)
Assumption: Processes can’t observe F (τ ).
H(p, τ ) is the set of processes that p suspects to be crashed at time τ .
Each computation is decorated with :
I a failure pattern F
I a failure detector history H
248 / 329
Complete failure detector
We require that failure detectors are complete :
From some time onward, each crashed process is suspected by
each correct process.
249 / 329
Strongly accurate failure detector
A failure detector is strongly accurate if only crashed processes are
ever suspected.
Assumptions:
I Each correct process broadcasts alive every ν time units.
I dmax is a known upper bound on network latency.
A process from which no message is received for ν + dmax time units
has crashed.
This failure detector is complete and strongly accurate.
250 / 329
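A minimal sketch of this failure detector (illustrative; last_heard[p]
is the local arrival time of the last message from process p):

def suspects(now, last_heard, nu, dmax):
    # Suspect p if nothing was heard from p for nu + dmax time units.
    return {p for p, t in last_heard.items() if now - t > nu + dmax}

print(suspects(now=10.0, last_heard={"p": 9.5, "q": 2.0}, nu=1.0, dmax=0.5))
# {'q'}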
Weakly accurate failure detector
A failure detector is weakly accurate if some (correct) process
is never suspected by any process.
Assume a complete and weakly accurate failure detector.
We give a rotating coordinator algorithm for (N − 1)-crash consensus.
251 / 329
Consensus with weakly accurate failure detection
Processes are numbered: p0 , . . . , pN−1 .
Initially, each process randomly selects 0 or 1. In round n :
I pn (if not crashed) broadcasts its value.
I Each process waits :
- either for an incoming message from pn , in which case it adopts
the value of pn ;
- or until it suspects that pn has crashed.
After round N − 1, each correct process decides for its value.
Correctness: Let pj never be suspected.
After round j, all correct processes have the same value b.
Hence, after round N − 1, all correct processes decide b.
252 / 329
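A round-based simulation of this algorithm under a clean crash model
(illustrative; here a crashed coordinator sends nothing and is suspected
by everyone, while in reality it might reach some processes before
crashing):

def rotating_coordinator(values, crashed):
    v = list(values)
    for n in range(len(v)):           # round n, coordinated by p_n
        if n in crashed:
            continue                  # everyone suspects p_n; round passes
        for q in range(len(v)):
            if q not in crashed:
                v[q] = v[n]           # adopt the coordinator's value
    return {q: v[q] for q in range(len(v)) if q not in crashed}

print(rotating_coordinator([1, 0, 1, 0], crashed={0, 2}))   # {1: 0, 3: 0}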
Eventually strongly accurate failure detector
A failure detector is eventually strongly accurate if
from some time onward, only crashed processes are suspected.
Assumptions:
I Each correct process broadcasts alive every ν time units.
I There is an unknown upper bound on network latency.
Each process q initially guesses as network latency dq = 1.
If q receives no message from p for ν + dq time units, then
q suspects that p has crashed.
If q receives a message from a suspected process p, then
p is no longer suspected and dq ← dq + 1.
This failure detector is complete and eventually strongly accurate.
253 / 329
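A sketch of this adaptive detector (illustrative class; the method names
and the tick-based polling are assumptions):

class AdaptiveDetector:
    def __init__(self, nu):
        self.nu = nu                 # heartbeat period
        self.d = {}                  # latency guess d_q per process
        self.last = {}               # arrival time of the last message
        self.suspected = set()

    def on_message(self, p, now):
        if p in self.suspected:      # a suspect turns out to be alive:
            self.suspected.discard(p)
            self.d[p] = self.d.get(p, 1) + 1   # the guess was too low
        self.last[p] = now

    def on_tick(self, now):
        for p, t in self.last.items():
            if now - t > self.nu + self.d.get(p, 1):
                self.suspected.add(p)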
Impossibility of ⌈N/2⌉-crash consensus
Theorem: Let k ≥ N/2. There is no Las Vegas algorithm for k-crash
consensus based on an eventually strongly accurate failure detector.
Proof : Suppose, toward a contradiction, there is such an algorithm.
Divide the set of processes in S and T , with |S| = ⌊N/2⌋ and |T | = ⌈N/2⌉.
Suppose all processes in S select 0 and all processes in T select 1.
Suppose that for a long time the processes in S suspect the processes
in T crashed, and the processes in T suspect the processes in S crashed.
The processes in S can then decide 0, while the processes in T can decide 1.
254 / 329
Chandra-Toueg k-crash consensus algorithm
A failure detector is eventually weakly accurate if
from some time onward some (correct) process is never suspected.
Let k < N/2. A complete and eventually weakly accurate failure detector
is used for k-crash consensus.
Each process q records the last round luq in which it updated value q .
Initially, value q ∈ {0, 1} and luq = −1.
Processes are numbered: p0 , . . . , pN−1 .
Round n is coordinated by pc with c = n mod N.
255 / 329
Chandra-Toueg k-crash consensus algorithm
I In round n, each correct q sends ⟨vote, n, value q , luq ⟩ to pc .
I pc (if not crashed) waits until N − k such messages arrived,
and selects one, say ⟨vote, n, b, ℓ⟩, with ℓ as large as possible.
value pc ← b, lupc ← n, and pc broadcasts ⟨value, n, b⟩.
I Each correct q waits:
- either until ⟨value, n, b⟩ arrives: then value q ← b, luq ← n,
and q sends ⟨ack, n⟩ to pc ;
- or until it suspects pc crashed: then q sends ⟨nack, n⟩ to pc .
I If pc receives > k messages ⟨ack, n⟩, then pc decides b, and
broadcasts ⟨decide, b⟩.
An undecided process that receives ⟨decide, b⟩, decides b.
256 / 329
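A sketch of the coordinator’s selection step (illustrative; votes is the
list of (value, lu) pairs from the N − k received vote messages):

def coordinator_pick(votes):
    value, lu = max(votes, key=lambda v: v[1])   # highest last-update round
    return value

print(coordinator_pick([(1, -1), (0, 3), (1, 2)]))   # 0: it has lu = 3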
Chandra-Toueg algorithm - Correctness
Theorem: Let k < N/2. The Chandra-Toueg algorithm is
an (always terminating) k-crash consensus algorithm.
Proof (part I): If the coordinator in some round n receives > k ack’s,
then (for some b ∈ {0, 1}):
(1) there are > k processes q with luq ≥ n, and
(2) luq ≥ n implies value q = b.
Properties (1) and (2) are preserved in all rounds m > n.
This follows by induction on m − n.
By (1), in round m the coordinator receives a vote with lu ≥ n.
Hence, by (2), the coordinator of round m sets its value to b,
and broadcasts hvalue, m, bi.
So from round n onward, processes can only decide b.
257 / 329
Chandra-Toueg algorithm - Correctness
Proof (part II):
Since the failure detector is eventually weakly accurate,
from some round onward, some process p will never be suspected.
So when p becomes the coordinator, it receives ≥ N − k ack’s.
Since N − k > k, it decides.
All correct processes eventually receive the decide message of p,
and also decide.
258 / 329
Chandra-Toueg algorithm - Example
(figure: a run with N = 3 and k = 1. In round 0 the coordinator gathers
votes ⟨vote, 0, 0, −1⟩, broadcasts ⟨value, 0, 0⟩, receives two ⟨ack, 0⟩’s
and decides 0. After crashes, the round-1 coordinator picks value 0
again, since the vote ⟨vote, 1, 1, −1⟩ loses to a vote with lu = 0;
it broadcasts ⟨value, 1, 0⟩, decides 0, and its ⟨decide, 0⟩ makes the
remaining process decide 0.)
Messages and ack’s that a process sends to itself and ‘irrelevant’ messages are omitted.
259 / 329
Question
Why is it difficult to devise a failure detector for Byzantine processes ?
Answer: Failure detectors are usually based on the absence of events.
260 / 329
Local clocks with bounded drift
Let’s forget about Byzantine processes for a moment.
The time domain is R≥0 .
Each process p has a local clock Cp (τ ), which returns a time value
at real time τ .
Local clocks have bounded drift, compared to real time :
If Cp isn’t adjusted between times τ1 and τ2 , then
(1/ρ)(τ2 − τ1 ) ≤ Cp (τ2 ) − Cp (τ1 ) ≤ ρ(τ2 − τ1 )
for some known ρ > 1.
261 / 329
Clock synchronization
At certain time intervals, the processes synchronize clocks :
They read each other’s clock values, and adjust their local clocks.
The aim is to achieve, for some δ, and all τ ,
|Cp (τ ) − Cq (τ )| ≤ δ
Due to drift, this precision may degrade over time, necessitating
repeated clock synchronizations.
We assume a known bound dmax on network latency.
For simplicity, let dmax be much smaller than δ, so that this latency
can be ignored in the clock synchronization.
262 / 329
Clock synchronization
Suppose that after each synchronization, at say real time τ ,
for all processes p, q:
|Cp (τ ) − Cq (τ )| ≤ δ0
for some δ0 < δ.
Due to ρ-bounded drift of local clocks, at real time τ + R,
|Cp (τ + R) − Cq (τ + R)| ≤ δ0 + (ρ − 1/ρ)R < δ0 + ρR
So synchronizing every (δ − δ0 )/ρ (real) time units suffices.
263 / 329
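Illustrative arithmetic with made-up numbers:

rho, delta, delta0 = 1.01, 10.0, 4.0      # delta, delta0 in milliseconds
R = (delta - delta0) / rho
print(round(R, 2))                        # 5.94: resynchronize every ~5.94 ms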
Impossibility of ⌈N/3⌉-Byzantine synchronizers
Theorem: Let k ≥ N/3. There is no k-Byzantine clock synchronizer.
Proof : Let N = 3, k = 1. Processes are p, q, r ; r is Byzantine.
(The construction below easily extends to general N and k ≥ N/3.)
Let the clock of p run faster than the clock of q.
Suppose a synchronization takes place at real time τ .
r sends Cp (τ ) + δ to p, and Cq (τ ) − δ to q.
p and q can’t recognize that r is Byzantine.
So they have to stay within range δ of the value reported by r .
Hence p can’t decrease, and q can’t increase its clock value.
By repeating this scenario at each synchronization round,
the clock values of p and q get further and further apart.
264 / 329
Mahaney-Schneider synchronizer
Consider a complete network of N processes.
Suppose at most k < N/3 processes are Byzantine.
Each correct process in a synchronization round:
1. Collects the clock values of all processes (waiting for 2dmax ).
2. Discards those reported values τ for which < N − k processes
report a value in the interval [τ − δ, τ + δ].
(They are from Byzantine processes.)
3. Replaces all discarded/non-received values by an accepted value.
4. Takes the average of these N values as its new clock value.
265 / 329
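A sketch of one synchronization round at a correct process (illustrative;
it assumes the ≥ N − k correct values lie within δ of each other, so the
filter accepts something):

def mahaney_schneider_round(reported, delta, N, k):
    # reported[p]: clock value received from p, or None if none arrived.
    vals = {p: t for p, t in reported.items() if t is not None}
    accepted = {p: t for p, t in vals.items()
                if sum(1 for s in vals.values() if abs(s - t) <= delta)
                >= N - k}                                    # step 2: filter
    fill = next(iter(accepted.values()))                     # step 3
    return sum(accepted.get(p, fill) for p in reported) / N  # step 4: average

print(mahaney_schneider_round(
    {"p": 100.0, "q": 101.0, "r": 250.0, "s": 100.5}, 2.0, N=4, k=1))
# 100.375: the Byzantine value 250.0 is filtered out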
Mahaney-Schneider synchronizer - Correctness
Lemma: Let k < N/3. If in some synchronization round values ap and aq
pass the filters of correct processes p and q, respectively, then
|ap − aq | ≤ 2δ
Proof : ≥ N − k processes reported a value in [ap − δ, ap + δ] to p.
And ≥ N − k processes reported a value in [aq − δ, aq + δ] to q.
Since N − 2k > k, at least one correct process r reported
a value in [ap − δ, ap + δ] to p, and in [aq − δ, aq + δ] to q.
Since r reports the same value to p and q, it follows that
|ap − aq | ≤ 2δ
266 / 329
Mahaney-Schneider synchronizer - Correctness
Theorem: Let k < N/3. The Mahaney-Schneider synchronizer is k-Byzantine.
Proof : apr (resp. aqr ) denotes the value that correct process p (resp. q)
accepts from or assigns to process r , in some synchronization round.
By the lemma, for all r , |apr − aqr | ≤ 2δ.
Moreover, apr = aqr for all correct r .
Hence, for all correct p and q,
|(1/N) Σr apr − (1/N) Σr aqr | ≤ (1/N)·k·2δ < (2/3)δ
So we can take δ0 = (2/3)δ.
There should be a synchronization every (δ − δ0 )/ρ = δ/(3ρ) time units.
267 / 329
Lecture in a nutshell (part I)
complete failure detector
strongly accurate failure detector
rotating coordinator crash consensus algorithm with
a weakly accurate failure detector
eventually strongly accurate failure detector
k-crash consensus for k ≥ N/2 remains impossible with
an eventually strongly accurate failure detector
Chandra-Toueg crash consensus algorithm with
an eventually weakly accurate failure detector
268 / 329
Lecture in a nutshell (part II)
local clocks with ρ-bounded drift (where ρ is known)
synchronize clocks so that they stay within δ of each other
for k ≥ N/3, there is no k-Byzantine synchronizer
Mahaney-Schneider k-Byzantine synchronizer
269 / 329
Synchronous networks
Let’s again forget about Byzantine processes for a moment.
A synchronous network proceeds in pulses. In one pulse, each process:
1. sends messages
2. receives messages
3. performs internal events
A message is sent and received in the same pulse.
Such synchrony is called lockstep.
270 / 329
Building a synchronous network
Assume ρ-bounded local clocks with precision δ.
For simplicity, we ignore the network latency.
When a process reads clock value (i − 1)ρ²δ, it starts pulse i.
Key question: Does a process p receive all messages for pulse i
before it starts pulse i + 1 ? That is, for all q,
Cq⁻¹((i − 1)ρ²δ) ≤ Cp⁻¹(iρ²δ)
Because then q starts pulse i no later than p starts pulse i + 1.
(Cr⁻¹(τ ) is the moment in time the clock of r returns τ .)
271 / 329
Building a synchronous network
Since the clock of q is ρ-bounded from below,
Cq⁻¹(τ ) ≤ Cq⁻¹(τ − δ) + ρδ
(figure: the clock Cq against real time; between clock values τ − δ
and τ , at most ρδ real time passes)
Since local clocks have precision δ,
Cq⁻¹(τ − δ) ≤ Cp⁻¹(τ )
Hence, for all τ ,
Cq⁻¹(τ ) ≤ Cp⁻¹(τ ) + ρδ
272 / 329
Building a synchronous network
Since the clock of p is ρ-bounded from above,
Cp⁻¹(τ ) + ρδ ≤ Cp⁻¹(τ + ρ²δ)
(figure: the clock Cp against real time; between clock values τ and
τ + ρ²δ, at least ρδ real time passes)
Hence,
Cq⁻¹((i − 1)ρ²δ) ≤ Cp⁻¹((i − 1)ρ²δ) + ρδ ≤ Cp⁻¹(iρ²δ)
273 / 329
Byzantine consensus for synchronous systems
In a synchronous system, the proof of impossibility of 1-crash consensus
no longer applies.
Because within a pulse, a process is guaranteed to receive a message
from all correct processes.
For this setting, terminating Byzantine consensus algorithms exist.
274 / 329
Byzantine broadcast
Consider a synchronous network with
at most k < N/3 Byzantine processes.
One process g , called the general, is given an input xg ∈ {0, 1}.
The other processes, called lieutenants, know who is the general.
Requirements for k-Byzantine broadcast:
I Termination: Every correct process decides 0 or 1.
I Agreement: All correct processes decide the same value.
I Dependence: If the general is correct, it decides xg .
275 / 329
Impossibility of ⌈N/3⌉-Byzantine broadcast
Theorem: Let k ≥ N/3. There is no k-Byzantine broadcast algorithm
for synchronous networks.
Proof : Divide the processes into three sets S, T and U, each with
≤ k elements. Let g ∈ S.
Scenario 1: xg = 0, and the processes in U are Byzantine:
toward T they behave as U does in scenario 3.
The processes in S and T decide 0.
Scenario 2: xg = 1, and the processes in T are Byzantine:
toward U they behave as T does in scenario 3.
The processes in S and U decide 1.
Scenario 3: the processes in S are Byzantine: toward T they behave
as in scenario 1, and toward U as in scenario 2.
T cannot distinguish scenario 3 from scenario 1, so its processes decide 0,
while U cannot distinguish scenario 3 from scenario 2, so its processes
decide 1, violating agreement.
276 / 329
Lamport-Shostak-Pease Byzantine broadcast algorithm
Broadcast g (N, k) terminates after k + 1 pulses.
Pulse 1: The general g broadcasts and decides xg .
If a lieutenant q receives b from g then xq ← b, else xq ← 0.
If k = 0: each lieutenant q decides xq .
If k > 0: for each lieutenant p, each (correct) lieutenant q takes part
in Broadcast p (N−1, k−1) in pulse 2 (g is excluded).
Pulse k + 1 (k > 0):
Lieutenant q has, for each lieutenant p, computed a value in
Broadcast p (N−1, k−1); it stores this value in Mq [p].
xq ← major (Mq ); lieutenant q decides xq .
(major maps each list over {0, 1} to 0 or 1, such that if more than
half of the elements in the list M are b, then major (M) = b.)
277 / 329
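A compact sketch of Broadcast g (N, k) in Python (an assumption-laden
simulation rather than a distributed implementation: the k + 1 pulses are
folded into recursion, and the hypothetical interface lie(g, q, depth)
fixes the 0/1 bit a Byzantine process sends to each receiver, the same
at every subcall of the same depth).

```python
def major(values):
    # b when a strict majority of the binary values is b; a tie yields 0
    return 1 if 2 * sum(values) > len(values) else 0

def broadcast(general, lieutenants, k, value, byzantine, lie):
    # Simulate Broadcast_general(N, k); returns {correct lieutenant: value}.
    # pulse 1: the general distributes its value
    x = {q: lie(general, q, k) if general in byzantine else value
         for q in lieutenants}
    if k == 0:
        return x                       # each lieutenant decides x[q]
    result = {}
    for q in lieutenants:
        if q in byzantine:
            continue
        M = []
        for p in lieutenants:          # the value q computes in Broadcast_p
            if p == q:
                M.append(x[q])         # q is itself the general of this subcall
            else:
                others = [r for r in lieutenants if r != p]
                M.append(broadcast(p, others, k - 1, x[p], byzantine, lie)[q])
        result[q] = major(M)           # pulse k+1: decide major(M_q)
    return result

# Example 1 on the next slide: N = 4, k = 1, correct general 0 with
# input 1, lieutenant 3 Byzantine; both correct lieutenants decide 1
print(broadcast(0, [1, 2, 3], 1, 1, {3}, lambda g, q, d: q % 2))  # {1: 1, 2: 1}
```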
Lamport-Shostak-Pease broadcast alg. - Example 1
N = 4 and k = 1; general correct.
(Diagram: general g with input 1; lieutenants p and q are correct,
lieutenant r is Byzantine; the situation initially and after pulse 1.)
After pulse 1, g has decided 1, and the correct lieutenants p and q
carry the value 1.
Consider the sub-network without g .
In Broadcast p (3, 0) and Broadcast q (3, 0), p and q both compute 1,
while in Broadcast r (3, 0) they may compute an arbitrary value.
So p and q both build a list [ 1, 1, ∗ ] (the last entry may be arbitrary,
and may even differ at p and q), and decide 1.
278 / 329
Lamport-Shostak-Pease broadcast alg. - Example 2
N = 7 and k = 2; general Byzantine. (Channels are omitted.)
(Diagram: after pulse 1, the Byzantine general g has distributed the
values 1, 1, 0, 0, 0 over the five correct lieutenants; a sixth
lieutenant is Byzantine.)
k − 1 < (N−1)/3, so by induction, the recursive calls Broadcast p (6, 1)
lead to the same list M = [ 1, 1, 0, 0, 0, b ], for some b ∈ {0, 1},
at all correct lieutenants.
So in Broadcast g (7, 2), they all decide major (M).
279 / 329
Lamport-Shostak-Pease broadcast alg. - Example 2
For instance, in Broadcast p (6, 1) with p the Byzantine lieutenant,
the following values could be distributed by p.
(Diagram: the Byzantine lieutenant p distributes the values 1, 0, 0, 1, 1
over the five correct lieutenants.)
Then the five subcalls Broadcast q (5, 0), for the correct lieutenants q,
would at each correct lieutenant lead to the list [ 1, 0, 0, 1, 1 ].
So in that case b = major ( [ 1, 0, 0, 1, 1 ] ) = 1.
280 / 329
Questions
Question: Draw a tree of all recursive subcalls of Broadcast g (7, 2).
Question: Consider Broadcast g (N, 1), with a correct general g .
Let fewer than (N−1)/2 lieutenants be Byzantine.
Argue that all correct processes decide xg .
281 / 329
Lamport-Shostak-Pease broadcast alg. - Correctness
Lemma: If the general g is correct, and fewer than (N−k)/2 lieutenants are Byzantine,
then in Broadcast g (N, k) all correct processes decide xg .
Proof : By induction on k. Case k = 0 is trivial, because g is correct.
Let k > 0.
Since g is correct, in pulse 1, at all correct lieutenants p, xp ← xg .
Since ((N−1)−(k−1))/2 = (N−k)/2, by induction, for all correct lieutenants p,
in Broadcast p (N−1, k−1) the value xp = xg is computed.
Since a majority of the lieutenants is correct (because k > 0),
in pulse k + 1, at each correct lieutenant p, xp ← major (Mp ) = xg .
282 / 329
Lamport-Shostak-Pease broadcast alg. - Correctness
Theorem: Let k < N/3. Broadcast g (N, k) is an (always terminating)
k-Byzantine broadcast algorithm for synchronous networks.
Proof : By induction on k.
If g is correct, the theorem follows from the lemma, because k < N/3
implies k < (N−k)/2.
Let g be Byzantine (so k > 0). Then ≤ k−1 lieutenants are Byzantine.
Since k−1 < (N−1)/3, by induction, for every lieutenant p, all correct
lieutenants compute in Broadcast p (N−1, k−1) the same value.
Hence, all correct lieutenants compute the same list M.
So in pulse k + 1, all correct lieutenants decide major (M).
283 / 329
Partial synchrony
A synchronous system can be obtained if local clocks have known
bounded drift, and there is a known upper bound on network latency.
In a partially synchronous system,
I the bounds on the inaccuracy of local clocks and network latency
are unknown, or
I these bounds are known, but only valid from some unknown point
in time.
Dwork, Lynch and Stockmeyer showed that, for k < N/3, there is
a k-Byzantine broadcast algorithm for partially synchronous systems.
These ideas are at the core of the Paxos consensus protocol.
284 / 329
Public-key cryptosystems
Given a large message domain M.
A public-key cryptosystem consists of functions Sq , Pq : M → M,
for each process q, with
Sq (Pq (m)) = Pq (Sq (m)) = m for all m ∈ M.
Sq is kept secret, Pq is made public.
Underlying assumption: Computing Sq from Pq is very expensive.
p sends a secret message m to q: Pq (m)
p sends a signed message m to q: ⟨m, Sp(m)⟩
Such signatures guarantee that Byzantine processes can’t lie about
the messages they have received.
285 / 329
Lamport-Shostak-Pease authentication algorithm
Pulse 1: The general broadcasts ⟨xg, (Sg(xg), g)⟩, and decides xg.
Pulse i: If a lieutenant q receives a message ⟨v, (σ1, p1) : · · · : (σi, pi)⟩
that is valid, i.e.:
I p1 = g
I p1 , . . . , pi , q are distinct
I Ppk (σk) = v for all k = 1, . . . , i
(each signature σk verifies as v under pk 's public key)
then q includes v in the set Wq .
If i ≤ k, then in pulse i + 1, q sends to all other lieutenants
⟨v, (σ1, p1) : · · · : (σi, pi) : (Sq(v), q)⟩
After pulse k + 1, each correct lieutenant p decides
v if Wp is a singleton {v }, or
0 otherwise (the general is Byzantine)
286 / 329
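The validity check and decision rule can be sketched as follows (a
sketch, not from the slides; the tuple-based sign is a toy stand-in
for a real signature scheme, so verifying Ppk (σk) = v degenerates
into an equality test).

```python
def sign(p, v):
    # toy stand-in for S_p(v); a real system would use public-key signatures
    return ("sig", p, v)

def valid(msg, general, receiver):
    # the check a lieutenant `receiver` applies to a message
    # <v, (s1,p1) : ... : (si,pi)>, represented as (v, [(s1, p1), ...])
    v, chain = msg
    signers = [p for (_, p) in chain]
    return (len(chain) > 0
            and signers[0] == general                            # p1 = g
            and len(set(signers + [receiver])) == len(chain) + 1  # all distinct
            and all(s == sign(p, v) for (s, p) in chain))        # signatures verify

def extend(msg, q):
    # the message q relays in the next pulse: append (S_q(v), q)
    v, chain = msg
    return (v, chain + [(sign(q, v), q)])

def decide(W):
    # after pulse k+1: the unique value if W is a singleton, else 0
    return next(iter(W)) if len(W) == 1 else 0

# pulse 1 of the example below: g sends <1, (S_g(1), g)> to q, who relays it
m = (1, [(sign("g", 1), "g")])
assert valid(m, "g", "q")
assert valid(extend(m, "q"), "g", "p")
print(decide({1}), decide({0, 1}))  # 1 0
```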
Lamport-Shostak-Pease authentication alg. - Example
N = 4 and k = 2.
(Diagram: the general g and lieutenant r are Byzantine;
lieutenants p and q are correct.)
pulse 1: g sends ⟨1, (Sg(1), g)⟩ to p and q
g sends ⟨0, (Sg(0), g)⟩ to r
Wp = Wq = {1}
pulse 2: p broadcasts ⟨1, (Sg(1), g) : (Sp(1), p)⟩
q broadcasts ⟨1, (Sg(1), g) : (Sq(1), q)⟩
r sends ⟨0, (Sg(0), g) : (Sr(0), r)⟩ to q
Wp = {1} and Wq = {0, 1}
pulse 3: q broadcasts ⟨0, (Sg(0), g) : (Sr(0), r) : (Sq(0), q)⟩
Wp = Wq = {0, 1}
p and q decide 0
287 / 329
Lamport-Shostak-Pease authentication alg. - Correctness
Theorem: The Lamport-Shostak-Pease authentication algorithm is
an (always terminating) k-Byzantine broadcast algorithm, for any k.
Proof : If the general is correct, then owing to authentication,
correct lieutenants q only add xg to Wq . So they all decide xg .
Let a correct lieutenant receive a valid message ⟨v, ℓ⟩ in a pulse ≤ k.
In the next pulse, it makes all correct lieutenants p add v to Wp .
Let a correct lieutenant receive a valid message ⟨v, ℓ⟩ in pulse k + 1.
Since ℓ has length k + 1, it contains a correct q.
Then q received a valid message ⟨v, ℓ′⟩, with ℓ′ a prefix of ℓ, in a pulse ≤ k.
In the next pulse, q made all correct lieutenants p add v to Wp .
So after pulse k + 1, Wp is the same for all correct lieutenants p.
288 / 329
Lamport-Shostak-Pease authentication alg. - Optimization
Dolev-Strong optimization: Each correct lieutenant broadcasts
at most two messages, with different values.
Because when it has broadcast two different values,
all correct lieutenants are certain to decide 0.
289 / 329
Lecture in a nutshell
ρ-bounded local clocks with precision δ
synchronous network
Byzantine broadcast
no k-Byzantine broadcast if k ≥ N/3
Lamport-Shostak-Pease broadcast algorithm if k < N/3
Lamport-Shostak-Pease authentication algorithm
290 / 329
Mutual exclusion
Processes contend to enter their critical section.
A process (allowed to be) in its critical section is called privileged.
For each computation we require:
Mutual exclusion: Always at most one process is privileged.
Starvation-freeness: If a process p tries to enter its critical section, and
no process stays privileged forever, then p eventually becomes privileged.
Applications: Distributed shared memory, replicated data, atomic commit.
291 / 329
Mutual exclusion with message passing
Mutual exclusion algorithms with message passing are generally
based on one of the following paradigms.
I Leader election: A process that wants to become privileged
sends a request to the leader.
I Token passing: The process holding the token is privileged.
I Logical clock: Requests to enter a critical section are
prioritized by means of logical time stamps.
I Quorum: To become privileged, a process needs permission
from a quorum of processes.
Each pair of quorums has a non-empty intersection.
292 / 329
Ricart-Agrawala algorithm
When a process pi wants to access its critical section, it sends
request(tsi , i) to all other processes, with tsi its logical time stamp.
When pj receives this request, it sends permission to pi as soon as :
I pj isn’t privileged, and
I pj doesn’t have a pending request with time stamp tsj where
(tsj , j) < (tsi , i) (lexicographical order).
pi enters its critical section when it has received permission from
all other processes.
When pi exits its critical section, it sends permission to all pending
requests.
293 / 329
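One process of the algorithm, sketched in Python (message transport,
the critical section itself and most Lamport-clock bookkeeping are
abstracted into the caller; send(dest, msg) is an assumed delivery hook,
and the class name is hypothetical).

```python
class RicartAgrawala:
    def __init__(self, pid, n, send):
        self.pid, self.n, self.send = pid, n, send   # send(dest, msg)
        self.clock = 0
        self.request = None        # (ts, pid) of our pending request
        self.privileged = False
        self.deferred = []         # requesters to answer after we exit
        self.permissions = set()

    def want_entry(self):
        self.clock += 1
        self.request = (self.clock, self.pid)
        for q in range(self.n):
            if q != self.pid:
                self.send(q, ("request", self.request))

    def on_request(self, q, ts):
        self.clock = max(self.clock, ts[0]) + 1
        # grant unless we are privileged or our own pending request is
        # smaller in the lexicographical order on (time stamp, process id)
        if self.privileged or (self.request is not None and self.request < ts):
            self.deferred.append(q)
        else:
            self.send(q, ("permission", self.pid))

    def on_permission(self, q):
        self.permissions.add(q)
        if len(self.permissions) == self.n - 1:
            self.privileged = True          # enter the critical section

    def exit_cs(self):
        self.privileged, self.request = False, None
        self.permissions = set()
        for q in self.deferred:             # answer all pending requests
            self.send(q, ("permission", self.pid))
        self.deferred = []
```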
Ricart-Agrawala algorithm - Example 1
N = 2, and p0 and p1 both are at logical time 0.
p1 sends request(1, 1) to p0 .
When p0 receives this message, it sends permission to p1 , setting
the time at p0 to 2.
p0 sends request(2, 0) to p1 .
When p1 receives this message, it doesn’t send permission to p0 ,
because (1, 1) < (2, 0).
p1 receives permission from p0 , and enters its critical section.
294 / 329
Ricart-Agrawala algorithm - Example 2
N = 2, and p0 and p1 both are at logical time 0.
p1 sends request(1, 1) to p0 , and p0 sends request(1, 0) to p1 .
When p0 receives the request from p1 ,
it doesn’t send permission to p1 , because (1, 0) < (1, 1).
When p1 receives the request from p0 ,
it sends permission to p0 , because (1, 0) < (1, 1).
p0 and p1 both set their logical time to 2.
p0 receives permission from p1 , and enters its critical section.
295 / 329
Ricart-Agrawala algorithm - Correctness
Mutual exclusion: When p sends permission to q:
I p isn’t privileged; and
I p won’t get permission from q to enter its critical section
until q has entered and left its critical section.
(Because p’s pending or future request is larger than
q’s current request.)
Starvation-freeness: Each request will eventually become
the smallest request in the network.
296 / 329
Ricart-Agrawala algorithm - Optimization
Drawback: High message overhead, because requests must be sent
to all other processes.
Carvalho-Roucairol optimization: After a process q has exited
its critical section, q only needs to send requests to the processes
that q has sent permission to since this exit.
Suppose q is waiting for permissions and didn’t send a request to p.
If p sends a request to q that is smaller than q’s request, then q
sends both permission and a request to p.
This optimization is correct since for each pair of distinct processes,
at least one must ask permission from the other.
297 / 329
Question
Let first p0 and then p1 become privileged.
Next they want to become privileged again.
Which scenarios are possible, if the Carvalho-Roucairol optimization
is employed ?
Answer: p0 needs permission from p1 , but not vice versa.
If p0 ’s request reaches p1 before it wants to become privileged again,
then p1 sends permission and later a request to p0 .
Else p1 enters its critical section, and answers p0 ’s request only
after exiting the critical section.
298 / 329
Raymond’s algorithm
Given an undirected network, with a sink tree.
At any time, the root, holding a token, is privileged.
Each process maintains a FIFO queue, which can contain
id’s of its children, and its own id. Initially, this queue is empty.
Queue maintenance:
I When a non-root wants to enter its critical section,
it adds its id to its own queue.
I When a non-root gets a new head at its (non-empty) queue,
it asks its parent for the token.
I When a process receives a request for the token from a child,
it adds this child to its queue.
299 / 329
Raymond’s algorithm
When the root exits its critical section (and its queue is non-empty),
I it sends the token to the process q at the head of its queue,
I makes q its parent, and
I removes q from the head of its queue.
Let p get the token from its parent, with q at the head of its queue:
I If q ≠ p, then p sends the token to q, and makes q its parent.
I If q = p, then p becomes the root
(i.e., it has no parent, and is privileged).
In both cases, p removes q from the head of its queue.
300 / 329
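A sketch of one node in Python (the send(dest, msg) hook is an
assumption, and two details the slides leave implicit are made explicit
here: an idle root serves an incoming request immediately, and a process
that passes the token on while its queue is still non-empty immediately
asks for it back).

```python
from collections import deque

class Raymond:
    def __init__(self, pid, parent, send):
        self.pid, self.parent, self.send = pid, parent, send
        self.queue = deque()
        self.has_token = parent is None    # the initial root holds the token
        self.in_cs = False

    def _on_new_head(self):
        if len(self.queue) != 1:
            return                         # head unchanged: nothing to do
        if self.has_token:
            if not self.in_cs:
                self._pass_token()         # idle root: serve immediately
        else:
            self.send(self.parent, ("request", self.pid))

    def want_entry(self):
        self.queue.append(self.pid)        # add own id to own queue
        self._on_new_head()

    def on_request(self, child):
        self.queue.append(child)           # add requesting child to queue
        self._on_new_head()

    def on_token(self):
        self.has_token = True
        self._pass_token()

    def exit_cs(self):
        self.in_cs = False
        if self.queue:
            self._pass_token()

    def _pass_token(self):
        q = self.queue.popleft()
        if q == self.pid:
            self.in_cs = True              # we are the root and privileged
        else:
            self.has_token = False
            self.parent = q                # the token receiver becomes our parent
            self.send(q, ("token",))
            if self.queue:                 # still pending requests: ask it back
                self.send(q, ("request", self.pid))
```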
Raymond’s algorithm - Example
2 3
4 5
301 / 329
Raymond’s algorithm - Example
1 3
2 3 5
4 5 5
301 / 329
Raymond’s algorithm - Example
1 3, 2
2 2 3 5
4 5 5
301 / 329
Raymond’s algorithm - Example
1 3, 2
2 2 3 5, 4
4 5 5
4
301 / 329
Raymond’s algorithm - Example
1 2
2 2 3 5, 4, 1
4 5 5
4
301 / 329
Raymond’s algorithm - Example
1 2
2 2 3 4, 1
4 5 3
4
301 / 329
Raymond’s algorithm - Example
1 2
2 2 3 4, 1
4 5
4
301 / 329
Raymond’s algorithm - Example
1 2
2 2 3 1
4 5
3
301 / 329
Raymond’s algorithm - Example
1 2
2 2 3 1
4 5
301 / 329
Raymond’s algorithm - Example
1 2
2 2 3
4 5
301 / 329
Raymond’s algorithm - Example
2 3
4 5
301 / 329
Raymond’s algorithm - Correctness
Raymond’s algorithm provides mutual exclusion, because
at all times there is at most one root.
Raymond’s algorithm is starvation-free, because eventually
each request in a queue moves to the head of this queue,
and a chain of requests never contains a cycle.
Drawback: Sensitive to failures.
302 / 329
Question
What is the Achilles’ heel of a mutual exclusion algorithm based on
a leader ?
Answer: The leader is a single point of failure.
303 / 329
Agrawal-El Abbadi algorithm
To enter a critical section, permission from a quorum is required.
For simplicity we assume that N = 2^k − 1, for some k > 1.
The processes are structured in a binary tree of depth k − 1.
A quorum consists of all processes on a path from the root to a leaf.
If a non-leaf p has crashed (or is unresponsive), permission is asked
from all processes on two paths instead: from each child of p to a leaf.
304 / 329
Agrawal-El Abbadi algorithm - Example
Example: Let N = 7, with the tree
        1
      2   3
     4 5 6 7
Possible quorums are:
I {1, 2, 4}, {1, 2, 5}, {1, 3, 6}, {1, 3, 7}
I if 1 crashed: {2, 4, 3, 6}, {2, 5, 3, 6}, {2, 4, 3, 7}, {2, 5, 3, 7}
I if 2 crashed: {1, 4, 5} (and {1, 3, 6}, {1, 3, 7})
I if 3 crashed: {1, 6, 7} (and {1, 2, 4}, {1, 2, 5})
Question: What are the quorums if 1,2 crashed? And if 1,2,3 crashed?
And if 1,2,4 crashed?
305 / 329
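The quorum structure can be computed as follows (a sketch, not from the
slides; the dict-based tree encoding is an assumption). It reproduces
the quorum lists of the example above.

```python
def quorums(tree, root, crashed):
    # All quorums of the Agrawal-El Abbadi tree: a quorum is a
    # root-to-leaf path, where a crashed non-leaf is replaced by one such
    # path through each of its children; a crashed leaf yields no quorum.
    if root in crashed:
        if root not in tree:               # crashed leaf: dead end
            return []
        left, right = tree[root]
        return [ql | qr for ql in quorums(tree, left, crashed)
                        for qr in quorums(tree, right, crashed)]
    if root not in tree:                   # responsive leaf
        return [{root}]
    left, right = tree[root]
    return ([{root} | q for q in quorums(tree, left, crashed)] +
            [{root} | q for q in quorums(tree, right, crashed)])

tree = {1: (2, 3), 2: (4, 5), 3: (6, 7)}     # the N = 7 example
print(quorums(tree, 1, set()))   # {1,2,4} {1,2,5} {1,3,6} {1,3,7}
print(quorums(tree, 1, {1}))     # {2,4,3,6} {2,4,3,7} {2,5,3,6} {2,5,3,7}
print(quorums(tree, 1, {2}))     # {1,4,5} {1,3,6} {1,3,7}
```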
Agrawal-El Abbadi algorithm
A process p that wants to enter its critical section places the root
of the tree in a queue.
p repeatedly tries to get permission from the head r of its queue.
If successful, r is removed from p’s queue.
If r is a non-leaf, one of r ’s children is appended to p’s queue.
If non-leaf r has crashed, it is removed from p’s queue,
and both of r ’s children are appended at the end of the queue
(in a fixed order, to avoid deadlocks).
If leaf r has crashed, p aborts its attempt to become privileged.
When p’s queue becomes empty, it enters its critical section.
After exiting its critical section, p informs all processes in the quorum
that their permission to p can be withdrawn.
306 / 329
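A sketch of this entry protocol (status(r) is a hypothetical hook that
blocks until r grants permission and then returns True, or returns False
once r is deemed crashed; always descending via the left child is one
arbitrary policy for "one of r's children").

```python
from collections import deque

def acquire(tree, root, status):
    # Assemble a quorum by working through a queue of processes,
    # starting from the root of the tree.
    queue, quorum = deque([root]), []
    while queue:
        r = queue.popleft()
        if status(r):                      # r granted permission
            quorum.append(r)
            if r in tree:
                queue.append(tree[r][0])   # descend toward a leaf (left child)
        elif r in tree:                    # crashed non-leaf: replace by both
            queue.extend(tree[r])          # children, in a fixed order
        else:
            return None                    # crashed leaf: abort the attempt
    return quorum                          # empty queue: enter critical section

tree = {1: (2, 3), 2: (4, 5), 3: (6, 7)}
print(acquire(tree, 1, lambda r: r != 1))  # 1 crashed: quorum [2, 3, 4, 6]
```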
Agrawal-El Abbadi algorithm - Example
        1
      2   3
     4 5 6 7
p and q concurrently want to enter their critical section.
p gets permission from 1, and wants permission from 3.
1 crashes, and q now wants permission from 2 and 3.
q gets permission from 2, and appends 4 to its queue.
q obtains permission from 3, and appends 7 to its queue.
3 crashes, and p now wants permission from 6 and 7.
q gets permission from 4, and now wants permission from 7.
p gets permission from both 6 and 7, and enters its critical section.
307 / 329
Agrawal-El Abbadi algorithm - Mutual exclusion
We prove, by induction on depth k, that each pair of quorums has
a non-empty intersection, and so mutual exclusion is guaranteed.
A quorum with 1 contains a quorum in one of the subtrees below 1,
while a quorum without 1 contains a quorum in both subtrees below 1.
I If two quorums both contain 1, we are done.
I If two quorums both don’t contain 1, then by induction they
have elements in common in the two subtrees below process 1.
I Suppose quorum Q contains 1, while quorum Q 0 doesn’t.
Then Q contains a quorum in one of the subtrees below 1,
and Q 0 also contains a quorum in this subtree.
By induction, they have an element in common in this subtree.
308 / 329
Agrawal-El Abbadi algorithm - Deadlock-freeness
In case of a crashed process, let its left child be put before its right child
in the queue of a process that wants to become privileged.
Let a process p at depth d be greater than any process
I at a depth > d in the binary tree, or
I at depth d and more to the right than p in the binary tree.
A process with permission from r never needs permission from a q < r.
This guarantees that, as long as some leaf is responsive, eventually
some process will become privileged.
Starvation can happen if a process waits infinitely long for a permission.
(This can be easily resolved.)
309 / 329
Self-stabilization
All configurations are initial configurations.
An algorithm is self-stabilizing if every computation eventually reaches
a correct configuration.
Advantages:
I fault tolerance
I straightforward initialization
Self-stabilizing operating systems and databases have been developed.
310 / 329
Self-stabilization - Shared memory
In a message-passing setting, processes might all be initialized
in a state where they are waiting for a message.
Then the self-stabilizing algorithm wouldn’t exhibit any behavior.
Therefore, in self-stabilizing algorithms, processes communicate
via variables in shared memory.
We assume that a process can read the variables of its neighbors.
311 / 329
Dijkstra’s self-stabilizing token ring
Processes p0 , . . . , pN−1 form a directed ring.
Each pi holds a value xi ∈ {0, . . . , K − 1} with K ≥ N.
I pi for each i = 1, . . . , N − 1 is privileged if xi ≠ xi−1 .
I p0 is privileged if x0 = xN−1 .
Each privileged process is allowed to change its value,
causing the loss of its privilege :
I xi ← xi−1 when xi ≠ xi−1 , for each i = 1, . . . , N − 1
I x0 ← (x0 + 1) mod K when x0 = xN−1
If K ≥ N, then Dijkstra’s token ring self-stabilizes.
That is, each computation eventually satisfies mutual exclusion.
Moreover, Dijkstra’s token ring is starvation-free.
312 / 329
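The ring and its stabilization can be simulated directly (a sketch, not
from the slides; the random scheduler stands in for an arbitrary fair
scheduling, and 100 steps comfortably exceed the convergence bound of
the proof below for N = 4).

```python
import random

def privileged(xs):
    # p0 is privileged iff x0 = x_{N-1}; pi (i > 0) iff xi != x_{i-1}
    return [i for i in range(len(xs))
            if (xs[0] == xs[-1] if i == 0 else xs[i] != xs[i - 1])]

def step(xs, K, i):
    # the privileged process i moves, losing its privilege
    xs[i] = (xs[0] + 1) % K if i == 0 else xs[i - 1]

random.seed(1)
N = K = 4
xs = [0, 3, 2, 1]          # the initial configuration of the example
for _ in range(100):       # any scheduling: pick some privileged process
    step(xs, K, random.choice(privileged(xs)))
assert len(privileged(xs)) == 1   # mutual exclusion has been achieved
print(xs, privileged(xs))
```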
Dijkstra’s token ring - Example
Let N = K = 4. Consider the initial configuration
x0 = 0, x1 = 3, x2 = 2, x3 = 1
It isn't hard to see that the ring self-stabilizes. For instance,
it can successively reach
x0 = 0, x1 = 0, x2 = 3, x3 = 2 and then x0 = 0, x1 = 0, x2 = 0, x3 = 3
313 / 329
Dijkstra’s token ring - Correctness
Theorem: If K ≥ N, then Dijkstra’s token ring self-stabilizes.
Proof : In each configuration at least one process is privileged.
An event never increases the number of privileged processes.
Consider an (infinite) computation. After at most (N − 1)N/2 events
at p1 , . . . , pN−1 , an event must happen at p0 .
So during the computation, x0 ranges over all values in {0, . . . , K − 1}.
Since p1 , . . . , pN−1 only copy values, they stick to their ≤ N − 1 values
as long as x0 equals xi for some i = 1, . . . , N − 1.
Since K ≥ N, at some point, x0 ≠ xi for all i = 1, . . . , N − 1.
The next time p0 becomes privileged, clearly xi = x0 for all i.
So then mutual exclusion has been achieved.
314 / 329
Question
Let N ≥ 3. Argue that Dijkstra’s token ring self-stabilizes if K = N − 1.
This lower bound for K is sharp ! (See the next slide.)
Answer: Consider any computation.
At some moment, pN−1 copies the value from pN−2 .
Then p1 , . . . , pN−1 hold ≤ N − 2 different values (because N ≥ 3).
Since p1 , . . . , pN−1 only copy values, they hold these ≤ N − 2 values
as long as x0 equals xi for some i = 1, . . . , N − 1.
Since K ≥ N − 1, at some point, x0 ≠ xi for all i = 1, . . . , N − 1.
315 / 329
Dijkstra’s token ring - Non-stabilization if K = N − 2
Example: Let N ≥ 4 and K = N − 2, and consider the following
initial configuration.
x0 = x1 = xN−1 = N − 3, and xi = N − 2 − i for i = 2, . . . , N − 2
(so x2 = N − 4, x3 = N − 5, . . . , xN−3 = 1, xN−2 = 0).
It doesn’t always self-stabilize.
316 / 329
Afek-Kutten-Yung self-stabilizing spanning tree algorithm
We compute a spanning tree in an undirected network.
As always, each process is supposed to have a unique id.
The process with the largest id becomes the root.
Each process p maintains the following variables :
parent p : its parent in the spanning tree
root p : the root of the spanning tree
dist p : its distance from the root via the spanning tree
317 / 329
Afek-Kutten-Yung spanning tree algorithm - Complications
Due to arbitrary initialization, there are three complications.
Complication 1: Multiple processes may consider themselves root.
Complication 2: There may be a cycle in the spanning tree.
Complication 3: root p may not be the id of any process in the network.
318 / 329
Afek-Kutten-Yung spanning tree algorithm
A non-root p declares itself root, i.e.
parent p ← ⊥ root p ← p dist p ← 0
if it detects an inconsistency in its root or parent value,
or with the root or dist value of its parent :
I root p ≤ p, or
I parent p = ⊥, or
I parent p ≠ ⊥, and parent p isn't a neighbor of p,
or root p ≠ root parent p or dist p ≠ dist parent p + 1.
319 / 329
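The reset condition translates almost literally into code (a sketch,
not from the slides; the state/neighbors encoding is an assumption).
It already reproduces the first steps of the looping example on the
coming slides.

```python
def inconsistent(p, state, neighbors):
    # reset condition for a process p that does not consider itself root
    # (state[r] is a dict with keys 'parent', 'root', 'dist')
    s = state[p]
    return (s['root'] <= p
            or s['parent'] is None
            or s['parent'] not in neighbors[p]
            or s['root'] != state[s['parent']]['root']
            or s['dist'] != state[s['parent']]['dist'] + 1)

# processes 0 and 1 both believe in a non-existent root 2
state = {0: dict(parent=1, root=2, dist=0),
         1: dict(parent=0, root=2, dist=1)}
neighbors = {0: {1}, 1: {0}}
assert inconsistent(0, state, neighbors)       # dist_0 != dist_1 + 1
state[0].update(parent=None, root=0, dist=0)   # 0 declares itself root
assert inconsistent(1, state, neighbors)       # now root_1 != root_0
```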
Question
Suppose that during an application of the Afek-Kutten-Yung algorithm,
the created directed network contains a cycle with a “false” root.
Why is such a cycle always broken ?
Answer: At some p on this cycle, dist p ≠ dist parent p + 1.
So p declares itself root.
320 / 329
Afek-Kutten-Yung spanning tree algorithm
A root p makes a neighbor q its parent if p < root q :
parent p ← q root p ← root q dist p ← dist q + 1
Complication: Processes can infinitely often rejoin a component
with a false root.
321 / 329
Afek-Kutten-Yung spanning tree alg. - Example
Given two processes 0 and 1.
parent 0 = 1 parent 1 = 0 root 0 = root 1 = 2 dist 0 = 0 dist 1 = 1
Since dist 0 ≠ dist 1 + 1, 0 declares itself root :
parent 0 ← ⊥ root 0 ← 0 dist 0 ← 0
Since root 0 < root 1 , 0 makes 1 its parent :
parent 0 ← 1 root 0 ← 2 dist 0 ← 2
Since dist 1 ≠ dist 0 + 1, 1 declares itself root :
parent 1 ← ⊥ root 1 ← 1 dist 1 ← 0
Since root 1 < root 0 , 1 makes 0 its parent :
parent 1 ← 0 root 1 ← 2 dist 1 ← 3 et cetera
322 / 329
Afek-Kutten-Yung spanning tree alg. - Join Requests
Before p makes q its parent, it must wait until q’s component
has a proper root. Therefore p first sends a join request to q.
This request is forwarded through q’s component, toward the root
of this component.
The root sends back an ack toward p, which retraces the path of
the request.
Only when p receives this ack does it make q its parent :
parent p ← q root p ← root q dist p ← dist q + 1
Join requests are only forwarded between “consistent” processes.
323 / 329
Afek-Kutten-Yung spanning tree alg. - Example
Given two processes 0 and 1.
parent 0 = 1 parent 1 = 0 root 0 = root 1 = 2 dist 0 = dist 1 = 0
Since dist 0 ≠ dist 1 + 1, 0 declares itself root :
parent 0 ← ⊥ root 0 ← 0 dist 0 ← 0
Since root 0 < root 1 , 0 sends a join request to 1.
This join request doesn’t immediately trigger an ack.
Since dist 1 ≠ dist 0 + 1, 1 declares itself root :
parent 1 ← ⊥ root 1 ← 1 dist 1 ← 0
Since 1 is now a proper root, it replies to the join request of 0 with
an ack, and 0 makes 1 its parent :
parent 0 ← 1 root 0 ← 1 dist 0 ← 1
324 / 329
Afek-Kutten-Yung spanning tree alg. - Shared memory
A process can be forwarding, and awaiting an ack for, at most
one join request at a time.
(That’s why in the previous example 1 can’t send 0’s join request on to 0.)
Communication is performed using shared memory, so join requests and
ack’s are encoded in shared variables.
The path of a join request is remembered in local variables.
For simplicity, join requests are here presented in a message passing
framework with synchronous communication.
325 / 329
Afek-Kutten-Yung spanning tree alg. - Consistency check
Given a ring with processes p, q, r , and s > p, q, r .
Initially, p and q consider themselves root; r has p as parent and
considers s the root.
Since root r > q, q sends a join request to r .
Without the consistency check, r would forward this join request to p.
Since p considers itself root, it would send back an ack to q (via r ),
and q would make r its parent and consider s the root.
Since root r ≠ root p , r makes itself root.
Now we would have a configuration symmetric to the initial one.
326 / 329
Afek-Kutten-Yung spanning tree alg. - Correctness
Each component in the network with a false root has an inconsistency,
so a process in this component will declare itself root.
Since processes can only be involved in one join request at a time,
each join request is eventually acknowledged.
Since join requests are only passed on between consistent processes,
processes can only finitely often join a component with a false root
(each time due to improper initial values of local variables).
These observations imply that eventually false roots will disappear,
the process with the largest id in the network will declare itself root,
and the network converges to a spanning tree with this process as root.
327 / 329
Lecture in a nutshell
mutual exclusion
Ricart-Agrawala algorithm with a logical clock
Raymond’s algorithm with token passing
Agrawal-El Abbadi algorithm with quorums
self-stabilization
Dijkstra’s self-stabilizing mutual exclusion algorithm
Afek-Kutten-Yung self-stabilizing spanning tree algorithm
328 / 329
Edsger W. Dijkstra prize in distributed computing
2000: Lamport, Time, clocks, and the ordering of events in a distributed system, 1978
2001: Fischer, Lynch, Paterson, Impossibility of distributed consensus with one faulty
process, 1985
2002: Dijkstra, Self-stabilizing systems in spite of distributed control, 1974
2004: Gallager, Humblet, Spira, A distributed algorithm for minimum-weight spanning
trees, 1983
2005: Pease, Shostak, Lamport, Reaching agreement in the presence of faults, 1980
2007: Dwork, Lynch, Stockmeyer, Consensus in the presence of partial synchrony, 1988
2010: Chandra, Toueg, Unreliable failure detectors for reliable distributed systems, 1996
2014: Chandy, Lamport, Distributed snapshots: determining global states of distributed
systems, 1985
2015: Ben-Or, Another advantage of free choice: completely asynchronous agreement
protocols, 1983
The 2003, 2006, 2012 award winners are treated in Concurrency & Multithreading.
329 / 329