Distributed Systems: C5 Basic Distributed Algorithms
Distributed Systems
- Master in CS -
C5
Basic Distributed Algorithms
Fall 2020
Introduction
Program structure
1. Declare the process-local variables, whose scope is global to the process, and the message types.
2. Declare the shared variables, if any (for distributed shared memory systems). They are explicitly labeled as such.
3. Initialization code.
4. The repetitive and the alternative commands are not shown explicitly, as they would be in the presented syntax.
5. The guarded commands are shown as explicit modules or procedures (e.g. lines 1–4 in the Algorithm). The guard usually checks for the arrival of a message of a certain type, perhaps with additional conditions on some parameter values and other local variables.
6. The body of the procedure gives the list of commands to be executed if the guard evaluates to true.
7. Process termination may be explicitly stated in the body of any procedure(s).
8. The symbol ⊥ is used to denote an undefined value. When used in a comparison, its value is −∞.
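To make these conventions concrete, here is a minimal Python sketch of one process written in this style. Everything in it is an illustrative assumption: the `Process` class, the `deliver` dispatcher, the `myroot` variable and the `sent` list standing in for the network are hypothetical names, and ⊥ is modelled as Python's `None`. Guarded-command semantics choose nondeterministically among the enabled guards; the sketch simply takes the first one.

```python
import math

BOTTOM = None  # stands for the undefined value ⊥


def cmp_value(x):
    """Rule 8: in a comparison, ⊥ behaves as -∞."""
    return -math.inf if x is BOTTOM else x


class Process:
    """Hypothetical skeleton of one process Pi written in the style above."""

    def __init__(self, pid, neighbours):
        # Rules 1 and 3: process-local variables and their initialization.
        self.pid = pid
        self.neighbours = set(neighbours)
        self.myroot = BOTTOM
        self.sent = []  # hypothetical outgoing channel: (type, value, destination)
        # Rule 5: each guarded command = (message type, extra condition, body).
        self.guards = [
            ("QUERY", lambda v: cmp_value(self.myroot) < v, self.adopt_root),
            ("QUERY", lambda v: cmp_value(self.myroot) >= v, self.reject),
        ]

    def deliver(self, msg_type, value, sender):
        # Rule 4: the enclosing repetitive/alternative command is implicit.
        # This sketch runs the body of the first enabled guard, atomically.
        for mtype, condition, body in self.guards:
            if mtype == msg_type and condition(value):
                body(value, sender)  # Rule 6: the body of the procedure
                return True
        return False  # no guard is enabled for this message

    def adopt_root(self, value, sender):
        self.myroot = value
        self.sent.append(("ACCEPT", value, sender))

    def reject(self, value, sender):
        self.sent.append(("REJECT", value, sender))
```

Because `myroot` starts as ⊥ (treated as −∞), the first QUERY always enables the first guard.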
Elementary graph algorithms for distributed systems (1)
Objective
– Present the elementary distributed algorithms on graphs
Assumptions
– Unweighted, undirected edges
– Communication by message passing along the edges
Facts about distributed algorithms
– Each node has only a partial view of the graph (system): the set of its immediate neighbors
– A node can communicate only with its immediate neighbors, along the incident edges
1. Synchronous single-initiator spanning tree algorithm using flooding
- The root initiates a flooding of QUERY messages in the graph to identify the tree edges
- The parent of a node is the node from which it first receives a QUERY
- If multiple QUERYs are received in the same round, one of the senders is chosen randomly as the parent
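These rules can be simulated round by round. The Python sketch below is an illustration under our own assumptions, not the lecture's algorithm listing: `sync_flood_spanning_tree` and its graph encoding (a dict from node to neighbour list) are hypothetical, and we assume that, in the round after choosing its parent, a node sends QUERY to every neighbour except that parent. It returns the parent map and the number of QUERY messages sent.

```python
import random


def sync_flood_spanning_tree(adj, root, seed=0):
    """Round-based flooding; adj maps each node to its list of neighbours."""
    rng = random.Random(seed)
    parent = {v: None for v in adj}  # None plays the role of ⊥
    parent[root] = root
    senders = [root]                 # nodes that send QUERY in the current round
    messages = 0
    while senders:
        received = {}                # node -> senders of the QUERYs it got this round
        for u in senders:
            for v in adj[u]:
                if v != parent[u]:   # every neighbour except the parent
                    received.setdefault(v, []).append(u)
                    messages += 1
        senders = []
        for v, froms in received.items():
            if parent[v] is None:    # first QUERY(s) ever received: pick one sender
                parent[v] = rng.choice(froms)
                senders.append(v)
    return parent, messages
```

In this variant a tree edge carries one QUERY and every other edge carries two, so a connected graph with n nodes and l edges produces 2l − (n − 1) messages, which lies between l and 2l.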
Example
Termination
The algorithm terminates after all the rounds have been executed
The algorithm can be modified in a straightforward way so that a process exits after the round in which it sets its parent variable
Complexity
The local space complexity at a node
– of the order of the degree of edge incidence
The local time complexity at a node
– of the order of (diameter + degree of edge incidence)
The global space complexity
– the sum of the local space complexities
The algorithm sends at least one message per edge and at most two messages per edge
– The number of messages is therefore between l and 2l, where l is the number of edges
The message time complexity is
– d rounds or message hops, where d is the diameter of the graph
Other features
The resulting spanning tree
– is a breadth-first search (BFS) tree
The code is the same for all processes, but the pre-designated root executes a different logic
=> In the strictest sense, the algorithm is asymmetric
2. Asynchronous single-initiator spanning tree algorithm using flooding
Assumption
– This algorithm assumes a designated root node
which initiates the algorithm
The pseudo-code for each process Pi is
shown in Algorithm 2
Messages: QUERY, ACCEPT, REJECT
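A minimal Python sketch of the QUERY/ACCEPT/REJECT exchange, assuming the handler structure commonly used for this algorithm: a node adopts the sender of its first QUERY as its parent, answers it with ACCEPT, forwards QUERY to its other neighbours and answers every later QUERY with REJECT; it terminates once every neighbour other than its parent has answered. This is our reconstruction, not a transcription of Algorithm 2; the function name and the random delivery order used to mimic asynchrony are hypothetical.

```python
import random


def async_flood_spanning_tree(adj, root, seed=0):
    """adj maps each node to its neighbours (undirected, connected graph).
    Returns (parent, messages, terminated)."""
    rng = random.Random(seed)
    parent = {v: None for v in adj}          # None plays the role of ⊥
    children = {v: set() for v in adj}
    unrelated = {v: set() for v in adj}
    terminated = set()
    pending = []                             # in-transit messages (kind, src, dst)
    messages = 0

    def send(kind, src, dst):
        nonlocal messages
        pending.append((kind, src, dst))
        messages += 1

    def check_termination(i):
        # Done once every neighbour except the parent has answered.
        if children[i] | unrelated[i] == set(adj[i]) - {parent[i]}:
            terminated.add(i)

    parent[root] = root                      # the designated root initiates
    for k in adj[root]:
        send("QUERY", root, k)

    while pending:
        kind, j, i = pending.pop(rng.randrange(len(pending)))  # arbitrary order
        if kind == "QUERY":
            if parent[i] is None:            # first QUERY: j becomes the parent
                parent[i] = j
                send("ACCEPT", i, j)
                for k in adj[i]:
                    if k != j:
                        send("QUERY", i, k)
                check_termination(i)         # a leaf terminates at once
            else:
                send("REJECT", i, j)
        elif kind == "ACCEPT":
            children[i].add(j)
            check_termination(i)
        else:                                # REJECT
            unrelated[i].add(j)
            check_termination(i)
    return parent, messages, terminated
```

Different seeds give different delivery orders and possibly different trees, but every QUERY receives exactly one reply, so in this sketch a connected graph with n nodes and l edges always produces 2(2l − (n − 1)) messages.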
Asynchronous single-initiator spanning tree algorithm using flooding: example
[Figure 2 (source [3]): asynchronous single-initiator example on a graph with nodes A, B, C, D, E, F]
Example execution of the asynchronous algorithm
The resulting spanning tree rooted at A is shown in boldface
The numbers next to the QUERY messages indicate the approximate chronological order in which the messages are sent
Each procedure is executed atomically
=> a message sent at a particular time is triggered by the receipt of a corresponding message at the same time
Actions taken concurrently and independently
– are indicated by the same number on messages sent by different nodes
ACCEPT and REJECT messages
– are not shown, to keep the figure simple
It does not matter when the ACCEPT and REJECT messages are delivered
3. Asynchronous concurrent-initiator spanning tree algorithm using flooding
Concurrent initiation assumption
– Any node may spontaneously initiate the spanning tree algorithm
– Initiation precondition: the node has not already been invoked locally due to the receipt of a QUERY message
» In other words, two or more processes that are not yet participating in the algorithm may initiate it concurrently
Algorithm 2 is modified into Algorithm 3
– The objective of Algorithm 3 is to construct a single spanning tree
When concurrent initiations are detected, two options are available (see Option 1 and Option 2 in the following slides)
Note
– Even though there can be multiple concurrent initiations, along any single edge only two concurrent initiations will be detected
Option 1
– When two concurrent initiations are
detected by two adjacent nodes that
have sent to each other a QUERY
from different initiations, the two
partially computed spanning trees can
be merged
» This merging cannot be done based only
on local knowledge or there might be
cycles
Example
In Figure 3, consider that the algorithm is initiated concurrently by A, G, and J.
Dotted lines show the portions of the graph covered by the three instances of the algorithm.
At this time, the initiations by A and G are detected along edge BD, the initiations by A and J along edge CF, and the initiations by G and J along edge HI.
If the three partially computed spanning trees are merged along BD, CF, and HI => the result contains a cycle and is no longer a spanning tree
Even if there are just two initiations, the two partially computed trees may “meet” along multiple edges in the graph, and care must be taken not to introduce cycles when merging the trees
Option 2
Suppress the instance initiated by one root and continue the instance initiated by the other root, based on some rule such as tie-breaking using the processor identifier
– It must be ensured that the rule is correct, i.e. that it never suppresses every instance
Example
If the rule is not chosen carefully, A’s initiation may be suppressed due to the conflict detected along BD, G’s initiation due to the conflict detected along HI, and J’s initiation due to the conflict detected along CF
=> the algorithm hangs, so a consistent tie-breaking rule must be used
Algorithm 3 uses Option 2
Algorithm description
Tie rule
– Allows only the algorithm initiated by
the root with the higher processor
identifier to continue
To implement this:
– The messages need to be enhanced with
a parameter that indicates the root node
which initiated that instance of the
algorithm
Algorithm 3 (source [3]): spanning tree algorithm (asynchronous) without assuming a designated root. Initiators use flooding to start the algorithm. The code shown is for processor Pi, where 1 ≤ i ≤ n.
When QUERY(newroot) from j arrives at i, there are three possibilities:
1. newroot > myroot
• i should suppress its current execution due to its lower priority
• i reinitializes its data structures and joins j’s subtree with newroot as the root
2. newroot = myroot
• j’s execution was initiated by the same root as i’s, and i has already identified its parent
• => A REJECT is sent to j
3. newroot < myroot
• j’s root has a lower priority, hence i does not join j’s subtree
• i sends a REJECT
• j will eventually receive a QUERY(myroot) from i and abandon its current execution in favour of i’s myroot (or a larger value)
When ACCEPT(newroot) from j arrives at i, there are three possibilities:
1. newroot = myroot
- The ACCEPT is in response to a QUERY sent by node i
- The ACCEPT is processed normally
2. newroot < myroot
- The ACCEPT is in response to a QUERY i had sent to j earlier, but i has updated its myroot to a higher value since then
- Ignore the ACCEPT message
3. newroot > myroot
- The ACCEPT is in response to a QUERY i had sent earlier
- But i never updates its myroot to a lower value
- => This case cannot arise
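The case analysis above can be checked with a small simulation. The Python sketch below is our reconstruction under stated assumptions, not Algorithm 3 itself: identifiers are comparable integers and the higher root wins; every message carries its root; a node sends ACCEPT to its parent only once its whole subtree has answered (so only the root learns of termination); and, following case 3 of the QUERY analysis, a node answers a lower-priority QUERY with REJECT. The function name and the random delivery order are hypothetical.

```python
import random


def concurrent_spanning_tree(adj, initiators, seed=0):
    """adj maps each id to its neighbours (undirected, connected graph).
    Returns (parent, myroot, finished); finished holds every root that
    declared termination at some point during the run."""
    rng = random.Random(seed)
    parent = {v: None for v in adj}          # None plays the role of ⊥
    myroot = {v: None for v in adj}
    children = {v: set() for v in adj}
    unrelated = {v: set() for v in adj}
    finished = set()
    pending = []                             # in-transit messages (kind, root, src, dst)

    def lower(a, b):
        return a is None or a < b            # ⊥ compares as -infinity

    def report_if_complete(i):
        # Every neighbour except the parent has answered in the current instance.
        if children[i] | unrelated[i] == set(adj[i]) - {parent[i]}:
            if i == myroot[i]:
                finished.add(i)
            else:
                pending.append(("ACCEPT", myroot[i], i, parent[i]))

    for i in initiators:                     # spontaneous initiation (parent still ⊥)
        if parent[i] is None:
            parent[i] = myroot[i] = i
            pending.extend(("QUERY", i, i, k) for k in adj[i])

    while pending:
        kind, r, j, i = pending.pop(rng.randrange(len(pending)))  # arbitrary order
        if kind == "QUERY":
            if lower(myroot[i], r):          # case 1: switch to the higher root
                parent[i], myroot[i] = j, r
                children[i], unrelated[i] = set(), set()
                pending.extend(("QUERY", r, i, k) for k in adj[i] if k != j)
                report_if_complete(i)        # a leaf answers ACCEPT at once
            else:                            # cases 2 and 3: do not join j's instance
                pending.append(("REJECT", r, i, j))
        else:
            assert r <= myroot[i]            # newroot > myroot cannot arise
            if r == myroot[i]:               # replies from abandoned instances are ignored
                (children if kind == "ACCEPT" else unrelated)[i].add(j)
                report_if_complete(i)
    return parent, myroot, finished
```

With this REJECT choice a lower-priority root may briefly believe its own instance has finished before it is overrun, so only the highest initiator's termination is meaningful here.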
Termination
– The main drawback of the algorithm
» only the root knows when its algorithm has terminated
– To inform the other nodes, the root can send a special message along the newly constructed spanning tree edges
Complexity
– The time complexity of the algorithm is O(l)
– The number of messages is O(nl)
4. Broadcast and converge-cast on a tree
Broadcast algorithm
Converge-cast objective
– Collect information from all the nodes at the root node in order to compute some global function
Converge-cast initiation
– Initiated by the leaf nodes of the tree
» usually in response to receiving a request sent by the root using a broadcast
Converge-cast algorithm
Termination
– The termination condition for each
node in a broadcast as well as in a
converge-cast is self-evident.
Complexity
– Message complexity
» Each broadcast and each converge-cast requires n−1 messages
– Time complexity
» The time of each broadcast and each converge-cast is proportional to the maximum height h of the tree, which is O(n)
Examples
Example 1 (converge-cast)
– Assumption
» Each node has an integer variable associated with the application
– Objective
» Calculate the minimum of these variables
– How
» Each leaf node reports its local value to its parent
» When a non-leaf node has received a report from all its children, it
• computes the minimum of those values and its own value
• sends this minimum value to its parent
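Example 1 can be sketched in Python under simplifying assumptions: the spanning tree is already known and given as a hypothetical `children` map, and recursion stands in for the message exchanges, with a counter recording one message per tree edge in each phase (the root's request broadcast, then the converge-cast of the minimum).

```python
def broadcast_then_convergecast_min(children, root, value):
    """children maps a node to its list of children in the spanning tree;
    value maps every node to its local integer. Returns (minimum, messages)."""
    messages = 0

    def broadcast(node):
        # Phase 1: the root's request travels down one tree edge per child.
        nonlocal messages
        for c in children.get(node, []):
            messages += 1
            broadcast(c)

    def convergecast(node):
        # Phase 2: a node reports only after hearing from all of its children.
        nonlocal messages
        reports = [convergecast(c) for c in children.get(node, [])]
        if node != root:
            messages += 1  # one report to the parent
        return min([value[node]] + reports)

    broadcast(root)
    return convergecast(root), messages
```

For a tree with n nodes each phase uses n − 1 messages, so the sketch reports 2(n − 1) in total.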
Example 2 (converge-cast)
– Objective
» Solving the leader election problem (details in a
future lecture)
– How
» Leader election - all the processes agree on a
common distinguished process (the leader)
– A leader is required in many distributed
systems and algorithms because
» algorithms are typically not completely symmetrical
» some process has to take the lead in initiating the
algorithm
» another reason – to save on resources, it is not desirable for all processes to replicate the algorithm initiation
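One simple way to use converge-cast for leader election, assuming a spanning tree is already available and all identifiers are unique and comparable, is to converge-cast the maximum identifier to the root and then broadcast it back down the tree. The sketch below (function name hypothetical) only illustrates this idea; the leader election algorithms themselves are covered in a future lecture.

```python
def elect_leader(children, root):
    """children maps a node id to its children in the spanning tree.
    Returns a dict mapping every node to the leader it has learned."""

    def convergecast(node):
        # Each node reports the largest identifier in its subtree.
        return max([node] + [convergecast(c) for c in children.get(node, [])])

    leader = convergecast(root)
    known = {}

    def broadcast(node):
        # The root announces the result; every node records the same leader.
        known[node] = leader
        for c in children.get(node, []):
            broadcast(c)

    broadcast(root)
    return known
```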
5. Single source shortest path algorithm: synchronous Bellman–Ford
Bellman–Ford sequential shortest path algorithm
– Finds the shortest path from a given node to all other nodes
– Alternative solution to this problem: Dijkstra’s algorithm
– Note. Bellman–Ford searches for shortest paths in ascending order of the number of hops (unlike Dijkstra’s algorithm)
– Usage: Distance Vector Routing (DVR) is based on the Bellman–Ford algorithm
Network topology representation
– A weighted graph with unidirectional links
– Weights (positive) may be lengths, delays or loads on the links
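For reference, a minimal sequential Bellman–Ford in Python. The edge-list encoding (directed links `(u, v, w)`, nodes numbered 0..n−1) and the function name are our own choices. After k passes over the links every shortest path with at most k hops has been found, which is the hop-by-hop order mentioned above. The final pass, which detects a reachable negative-weight cycle, is a standard addition not discussed on this slide.

```python
import math


def bellman_ford(n, links, source):
    """links: directed (u, v, w) triples. Returns (dist, pred)."""
    dist = [math.inf] * n
    pred = [None] * n
    dist[source] = 0
    for _ in range(n - 1):
        # After k passes, all shortest paths with at most k hops are known.
        for u, v, w in links:
            if dist[u] + w < dist[v]:
                dist[v], pred[v] = dist[u] + w, u
    for u, v, w in links:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist, pred
```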
Algorithm 4
– A synchronous distributed algorithm to compute the shortest paths
– Assumptions
» The full topology (N, L) is not known to any process
» There are no cycles with negative total weight
» Each process can communicate only with its neighbors
» Each process is aware only of its incident links and their weights
» The processes know the number of nodes |N| = n
=> The algorithm is not uniform (this assumption on n is required for termination)
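A round-by-round Python simulation of the synchronous algorithm, under assumptions of our own: nodes are numbered 0..n−1, each directed link (u, v, w) lets u send UPDATE to v, and in every round each process first sends its current length on all outgoing links and then relaxes using only the values received in that round. The function name is hypothetical. Counting one UPDATE per link per round gives (n − 1)·l messages.

```python
import math


def sync_bellman_ford(n, links, source):
    """links: directed (u, v, w) triples. Returns (length, parent, messages)."""
    length = [math.inf] * n
    parent = [None] * n
    length[source] = 0
    messages = 0
    for _ in range(n - 1):             # n is known to every process
        # Send phase: UPDATE(own length) on every outgoing link.
        inbox = [[] for _ in range(n)]
        for u, v, w in links:
            inbox[v].append((u, length[u], w))
            messages += 1
        # Receive phase: relax with the start-of-round values only.
        for v in range(n):
            for u, lu, w in inbox[v]:
                if lu + w < length[v]:
                    length[v], parent[v] = lu + w, u
    return length, parent, messages
```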
Discussion
Termination
– As a shortest path has at most n−1 hops, the values of all variables are final after n−1 rounds
Complexity
– Time complexity is n−1 rounds
– Message complexity is (n−1)·l messages
Case of dynamic network graph (1)
Variable length is replaced by an array LENGTH[1..n]
– LENGTH[k] denotes the length measured from node k as the source node
– The LENGTH vector is included in each UPDATE message
– The k-th component of the LENGTH vector received from node m now indicates the length of the shortest path from m to root k
Variable parent is replaced by an array PARENT[1..n]
– PARENT[k] denotes the next hop to which a packet destined for k is routed
– The array PARENT acts as the routing table
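A sketch of this routing-table version in Python, assuming bidirectional links with the same weight in both directions (so the vector received from m can be used to route towards m), nodes numbered 0..n−1, and synchronous rounds. The function name is hypothetical. Each round, every node sends its whole LENGTH vector to its neighbours and updates LENGTH[k] and PARENT[k] from the vectors received.

```python
import math


def distance_vector(n, links):
    """links: (a, b, w) bidirectional links with symmetric weight w.
    LENGTH[i][k] is i's distance to k; PARENT[i][k] is the next hop from i
    towards k (None when k == i or k is unreachable)."""
    nbrs = [[] for _ in range(n)]
    for a, b, w in links:
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    LENGTH = [[0 if i == k else math.inf for k in range(n)] for i in range(n)]
    PARENT = [[None] * n for _ in range(n)]
    for _ in range(n - 1):
        # Vectors carried by this round's UPDATE messages (start-of-round values).
        sent = [row[:] for row in LENGTH]
        for i in range(n):
            for m, w in nbrs[i]:
                for k in range(n):
                    if sent[m][k] + w < LENGTH[i][k]:
                        LENGTH[i][k] = sent[m][k] + w
                        PARENT[i][k] = m  # route packets for k via neighbour m
    return LENGTH, PARENT
```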
6. Single source shortest path algorithm: asynchronous Bellman–Ford
Same assumptions as in
synchronous Bellman-Ford
There is no termination condition for the nodes
– Exercise: modify the algorithm so that each node knows when the shortest path to itself has been determined
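A Python simulation of the asynchronous variant, assuming that a process sends UPDATE on all outgoing links whenever its length decreases and that messages are delivered in an arbitrary (here random) order. The function name and link encoding are hypothetical. The simulation stops only because the pool of in-transit messages empties; the processes themselves never detect termination.

```python
import math
import random


def async_bellman_ford(n, links, source, seed=0):
    """links: directed (u, v, w) triples. Returns (length, parent, messages)."""
    rng = random.Random(seed)
    out = [[] for _ in range(n)]
    for u, v, w in links:
        out[u].append((v, w))
    length = [math.inf] * n
    parent = [None] * n
    length[source] = 0
    # In-transit UPDATE messages: (sender, receiver, sender's length, link weight).
    pending = [(source, v, 0, w) for v, w in out[source]]
    messages = len(pending)
    while pending:
        u, v, lu, w = pending.pop(rng.randrange(len(pending)))
        if lu + w < length[v]:
            length[v], parent[v] = lu + w, u
            for x, wx in out[v]:       # propagate the improvement
                pending.append((v, x, length[v], wx))
                messages += 1
    return length, parent, messages
```

The number of messages depends on the delivery order; with an unlucky order a node may improve its length many times, which is the source of the exponential worst case.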
Complexity
– Exponential complexity in the worst case
– Message complexity: O(c^n) messages
– Time complexity: O(c^n · d)
» c is a constant and d is the diameter of the graph
Notes
– If all links have equal weight, the algorithm effectively computes the minimum-hop path
– The minimum-hop routing tables to all destinations are calculated using O(n^2 · l) messages
Other shortest path algorithms
Centralized Floyd–Warshall algorithm
– Data structures:
– Uses two n x n matrices, LENGTH and VIA
» LENGTH[i,j] is the length of the shortest path from i to j
» LENGTH[i,j] is initialized to the initially known conditions:
• weight(i,j) if i and j are neighbors
• 0 if i = j
• ∞ otherwise
» VIA[i,j] is the first hop on the shortest path from i to j
» VIA[i,j] is initialized to:
• j if i and j are neighbors
• 0 if i = j
• ∞ otherwise
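A compact Python version of the centralized algorithm. Nodes are numbered 0..n−1 here, so the sketch uses `None` in VIA both for i = j and for unreachable pairs instead of the 0 and ∞ entries above; the input format (a dict of directed link weights) and the function name are our own assumptions.

```python
import math


def floyd_warshall(n, weights):
    """weights: dict {(i, j): w} for each directed link i -> j."""
    LENGTH = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    VIA = [[None] * n for _ in range(n)]
    for (i, j), w in weights.items():
        LENGTH[i][j] = w
        VIA[i][j] = j                        # neighbour: the first hop is j itself
    for pivot in range(n):                   # allow paths through nodes 0..pivot
        for s in range(n):
            for t in range(n):
                if LENGTH[s][pivot] + LENGTH[pivot][t] < LENGTH[s][t]:
                    LENGTH[s][t] = LENGTH[s][pivot] + LENGTH[pivot][t]
                    VIA[s][t] = VIA[s][pivot]  # first hop of the path to the pivot
    return LENGTH, VIA
```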
9. Challenges in designing distributed graph algorithms
The graph can change
– when there are link or node failures or, worse still, partitions in the network
– when new links and new nodes are added to the network
The case of mobile systems
– The presented algorithms either fail or require a more complicated redesign if the graph topology is assumed to change dynamically
– The graph (N, L) changes dynamically in the normal course of a distributed execution
– Mobile systems additionally need to deal with a new communication model
» Each node is capable of transmitting data wirelessly, and all nodes within a certain radius can receive it
» This is the unit-disk radius model