Synchronization
Synchronization
Synchronization
CS755! 6-1!
Synchronization Problem
• How processes cooperate and synchronize with one another in a
distributed system !
➡ In single CPU systems, critical regions, mutual exclusion, and other
synchronization problems are solved using methods such as semaphores.!
➡ These methods will not work in distributed systems because they
implicitly rely on the existence of shared memory. !
➡ Examples: !
✦ Two processes interacting using a semaphore must both be able to access the
semaphore. In a centralized system, the semaphore is stored in the kernel and
accessed by the processes using system calls!
✦ If two events occur in a distributed system, it is difficult to determine which
event occurred first.!
• How to decide on relative ordering of events!
➡ Does one event precede another event?!
➡ Difficult to determine if events occur on different machines.!
CS755! 6-2!
Issues
• Part 1 - Clocks!
➡ How to synchronize events based on actual time?!
✦ Clock synchronization!
➡ How to determine relative ordering?!
✦ Logical clocks!
• Part 2 - Global State and Election!
➡ What is the global state of a distributed system?!
➡ How do we determine the coordinator of a distributed system?!
CS755! 6-3!
Clock synchronization
• In a centralized system:!
➡ Time is unambiguous: A process gets the time by issuing a system call to the
kernel. If process A gets the time and latter process B gets the time. The value
B gets is higher than (or possibly equal to) the value A got!
➡ Example: UNIX make examines the times at which all the source and object
files were last modified:!
✦ If time (input.c) > time(input.o) then recompile input.c !
✦ If time (input.c) < time(input.o) then no compilation is needed !
CS755! 6-4!
Clock synchronization (2)
• In a distributed system:!
➡ Achieving agreement on time is not trivial!!
CS755! 6-5!
Logical vs Physical Clocks
• Clock synchronization need not be absolute! (due to Lamport, 1978): !
➡ If two processes do not interact, their clocks need not be synchronized.!
➡ What matters is not that all processes agree on exactly what time is it, but
rather, that they agree on the order in which events occur.!
➡ We will discuss this later under logical clock synchronization .!
CS755! 6-6!
Logical Clock Synchronization
• Happens-before relation!
➡ If a and b are events in the same process, and a happens-before b, then
ab is true!
➡ If a is the event of a message being sent by one process, and b is the event
of the same message being received by another process, then ab is also
true!
➡ If two events, a and b, happen in different processes that do not exchange
messages , then ab is not true, but neither is ba. These events are said
to be concurrent (ab). !
➡ happens-before is transitive: ab and bc ac !
p1
a b m1
p2 Physical
c d time
m2
p3
e f
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-7!
Lamport s Algorithm
• Capturing happens-before relation!
• Each process pi has a local monotonically increasing counter, called its
logical clock Li!
• Each event e that occurs at process pi is assigned a Lamport timestamp Li(e)!
• Rules:!
➡ Li is incremented before event e is issued at pi: Li := Li + 1!
➡ When pi sends message m, it adds t = Li: (m, t) [this is event send(m)]!
➡ On receiving (m, t), pj computes Lj := max(Lj, t); Lj := Lj + 1; timestamp event
receive(m) ! 1 2
p1
a b m1
3 4
Physical
p2
time
c d
m2
1 5
p3
e f
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-8!
0! 1!
Example
2! 0! 1! 2!
0! 0! 0! 0! 0! 0!
6! A! 8! 10! 6! A! 8! 10!
12! 16! 20! 12! 16! 20!
18! 24! B! 30! 18! 24! B! 30!
24! 32! 40! 24! 32! 40!
30! 40! 50! 30! 40! 50!
36! 48! C! 60! 36! 48! C! 60!
42! 56! 70! 42! 61! 70!
48! D! 64! 80! 48! D! 69! 80!
54! 72! 90! 70! 77! 90!
60! 80! 100! 76! 85! 100!
a) Three processes, each with its own clock. The clocks run at different rates.!
b) Lamport's algorithm corrects the clocks.!
• Lamport solution:!
➡ Between every two events, the clock must tick at least once!
➡ No two events occur at exactly the same time. If two events happen in processes
1 and 2, both with time 40, the former becomes 40.1 and the latter becomes 40.2!
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755! © Prentice-Hall, Inc. 2007
6-9!
Lamport Timestamps
a) The updates have to be ordered the same way across the replicas.!
b) This can be achieved by a totally ordered multicast algorithm.!
c) Totally ordered multicasting can be achieved by means of Lamport clocks.!
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755! © Prentice-Hall, Inc. 2007
6-10!
Part 2
Global State & Election
Algorithms
CS755! 6-11!
Global State
• Sometimes it is useful to know the global state of a distributed system!
➡ Garbage collection!
p1! p2!
Object! message
reference!
garbage object
➡ Deadlocks!
p1! wait-for
p2!
wait-for
➡ Termination!
p1! p2!
activate
passive
passive
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-12!
Distributed Snapshot
• Represents a state in which the distributed system might have been.!
➡ Consistent global state: If A sends a message to B and and the system
records B receiving the message, it should also record A sending it.!
!! 0 1 2 3
e1 e1 e1 e1
p1
m1 m2
Physical
p2
0 1 2 time
e2 e2 e2
Inconsistent cut
Consistent cut
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-13!
Election Problem
• Many algorithms require one process to act as a coordinator!
✦ Example: Coordinator in the centralized mutual exclusion algorithm !
• In general, it does not matter which process takes on this special
responsibility, but one has to do it. !
• How to elect a coordinator? !!
CS755! 6-14!
General Approach
• Assumptions:!
➡ Each process has a unique number (e.g., its network address) to distinguish
them and hence to select one!
➡ One process per machine: For simplicity!
➡ Every process knows the process number of every other process!
➡ Processes do not know which processes are currently up and which ones are
currently down!
• Approach:!
➡ Locate the process with the highest process number and designate it as
coordinator.!
➡ Election algorithms differ in the way they do the location !
CS755! 6-15!
The Bully Algorithm
• When a process, P, notices that the coordinator is no longer responding
to requests, it initiates an election:!
➡ P sends an ELECTION message to all processes with higher numbers!
➡ If no one responds, P wins the election and becomes coordinator!
➡ If one of the higher-ups answers, it takes over. P s job is done!
• When a process gets an ELECTION message from one of its lower-
numbered colleagues:!
➡ Receiver sends an OK message back to the sender to indicate that he is
alive and will take over!
➡ Receiver holds an election, unless it is already holding one!
➡ Eventually, all processes give up but one, and that one is the new
coordinator.!
➡ New coordinator announces its victory by sending all processes a
message telling them that starting immediately it is the new coordinator!
• If a process that was previously down comes back:!
➡ It holds an election. If it happens to be the highest process currently
running, it will win the election and take over the coordinator s job!
CS755!
• The biggest guy in town always wins!! 6-16!
Example
2! 1! 2! 1! 2! 1!
4!
Election!
4!
OK! 4!
5! 5! 5!
OK!
Elec
Election!
tion
0! 6! 0! 6! 0! 6!
!
7! 3! 7! 3! 7! 3!
Previous!
coordinator!
has crashed!
2! 1! 2! 1!
4! 5! 4! 5!
OK! Coordinator!
0! 6! 0! 6!
7! 3! 7! 3!
Coordinator!
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755! © Prentice-Hall, Inc. 2002
6-17!
A Ring Algorithm
• Use a ring (processes are physically or logically ordered, so that each
process knows who its successor is). No token is used. !
• Algorithm: !
➡ When any process notices that coordinator is not functioning: !
✦ Builds an ELECTION message (containing its own process number)!
✦ Sends the message to its successor (if successor is down, sender skips over it and
goes to next member along the ring, or the one after that, until a running process
is located)!
✦ At each step, sender adds its own process number to the list in the message!
➡ When the message gets back to the process that started it all: !
✦ Process recognizes the message containing its own process number!
✦ Changes message type to COORDINATOR!
✦ Circulates message once again to inform everyone else: Who the new
coordinator is (list member with highest number); Who the members of the new
ring are!
✦ When message has circulated once, it is removed!
CS755! 6-18!
Example
5! Election message!
6! 1! 2!
0! 2!
0! 3!
2!
5! 3!
6!
Previous coordinator! 7! 4!
has crashed!
No response!
6! 5!
5!
Election
message!
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755! © Prentice-Hall, Inc. 2007
6-19!
Part 3
Sharing in Distributed
Systems
CS755! 6-20!
Mutual Exclusion Problem
• Systems involving multiple processes are often programmed using critical
regions.!
• When a process has to read or update certain shared data structure, it first
enters a critical region to achieve mutual exclusion, i.e., ensure that no
other process will use the shared data structure at the same time !
• In single-processor systems, critical regions are protected using
semaphores, and monitors or similar constructs. How to accomplish the
same in distributed systems?!
➡ We assume, for now, that transactions are not supported by the system!
CS755! 6-21!
Protocols & Requirements
• Application-level protocol !
enter() ! !//enter critical section - block if necessary!
resourceAccesses() !// access shared resources in critical section!
exit() ! !// leave critical section - other processes may enter!
• Requirements!
1. At most one process may execute in the critical section (safety)!
2. Requests to enter and exit the critical section eventually succeed
(liveness)!
3. If one request to enter the critical section happened-before another,
then entry to the critical section is granted in that order ( order)!
CS755! 6-22!
A Centralized Algorithm
(Central Server Algorithm)
• Shortcomings!
➡ Coordinator: A single point of failure; A performance bottleneck!
➡ If processes normally block after making a request, they cannot distinguish a
dead coordinator from permission denied !
CS755! 6-24!
A Distributed Algorithm
• Based on a total ordering of events in a system (happens-before
relation)!
• Algorithm: !
➡ When a process wants to enter a critical region: !
✦ Builds a message: {name of critical region; process number; current time}!
✦ Sends the message to all other processes (assuming reliable transfer)!
➡ When a process receives a request message from another process: !
✦ If the receiver is not in the critical region and does not want to enter it, it
sends back an OK message to the sender!
✦ If the receiver is already in the critical region, it does not reply. Instead it
queues the request!
✦ If the receiver wants to enter the critical region but has not yet done so, it
compares the timestamp with the one contained in the message it has sent
everyone: If the incoming message is lower, the receiver sends back an OK
message; otherwise the receiver queues the request and sends nothing!
CS755! 6-25!
Example
a) Two processes want to enter the same critical region at the same
moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter
the critical region.
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755! © Prentice-Hall, Inc. 2007
6-26!
A Token Ring Algorithm
Processes!
0! 2! 4! 9! 7! 1! 6! 5! 8! 3!
Network! 2!
1! 3!
0! 4!
9! 5!
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755! © Prentice-Hall, Inc. 2007
6-27!
A Token Ring Algorithm (2)
• Pros:!
➡ Guarantees mutual exclusion (only 1 process has the token at any instant)!
➡ No starvation (token circulates among processes in a well-defined order)!
• Cons:!
➡ Regeneration of the token if it is ever lost: Detecting that the token is lost is
difficult, since the amount of time between successive appearances of the
token on the network is unbounded!
➡ Recovery if a process crashes: It is easier than in other cases though. If we
require a process receiving the token to acknowledge receipt. A dead process
is detected when its neighbor tries to give the token and fails. Dead process is
then removed and the token handed to the next process down the line.!
CS755! 6-28!
Comparison of the 3
Algorithms
Messages per Delay before entry
Algorithm Problems
entry/exit (in message times)
Crash of any
Distributed 2(n–1) 2(n–1)
process
Lost token,
Token ring 1 to ' 0 to n – 1
process crash
System may be
System in a temporarily in an System in a
consistent inconsistent state consistent
state during execution state
CS755! 6-30!
Transaction Primitives
• Special primitives required for programming using transactions!
➡ Supplied by the operating system or by the language runtime system!
CS755! 6-31!
Centralized Transaction
Execution
User! User!
…!
Application ! Application !
Read, Write, !
Results!
Abort, EOT!
Scheduler!
(SC)!
Scheduled!
Results!
Operations!
Recovery!
Manager!
(RM)!
CS755! 6-32!
Distributed Transaction
Execution
User application!
Local!
RM! RM! Recovery!
Protocol!
CS755! 6-33!
Properties of Transactions
ATOMICITY!
➡ All or nothing!
➡ Multiple operations combined as an atomic transaction!
CONSISTENCY!
➡ No violation of integrity constraints!
➡ Transactions are correct programs!
DURABILITY!
➡ Committed updates persist!
➡ Database recovery!
CS755! 6-34!
Transactions Provide…
• Atomic and reliable execution in the presence of failures!
• Correct execution in the presence of multiple user accesses !
• Correct management of replicas (if they support it) !
CS755! 6-35!
Lost Update Problem
Initial values: A=100, B=200, C=300
Transaction
T
:
Transaction
U:
balance = b.getBalance();
balance = b.getBalance();
b.setBalance(balance*1.1);
b.setBalance(balance*1.1);
a.withdraw(balance/10);
c.withdraw(balance/10);
balance = b.getBalance();
$200
balance = b.getBalance();
$200
b.setBalance(balance*1.1);
$220
b.setBalance(balance*1.1);
$220
a.withdraw(balance/10);
$80
c.withdraw(balance/10);
$280
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-36!
Inconsistent Retrievals Problem
Initial values: A=200, B=200
Transaction V:
Transaction W:
a.withdraw(100);
aBranch.branchTotal();
b.deposit(100);
a.withdraw(100);
$100
total = a.getBalance();
$100
total = total+b.getBalance();
$300
total = total+c.getBalance();
b.deposit(100);
$300
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-37!
Conflict Rules
Operations of different
Conflict
Reason
transactions
read
read
No
Because the effect of a pair of
read
operations
does not depend on the order in which they are
executed
read
write
Yes
Because the effect of a
read
and a
write
operation
depends on the order of their execution
write
write
Yes
Because the effect of a pair of
write
operations
depends on the order of their execution
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-38!
Resolving Lost Update Example
Transaction T:
Transaction U:
balance = b.getBalance();
balance = b.getBalance();
b.setBalance(balance*1.1);
b.setBalance(balance*1.1);
a.withdraw(balance/10);
c.withdraw(balance/10);
balance = b.getBalance();
$200
b.setBalance(balance*1.1);
$220
balance = b.getBalance();
$220
b.setBalance(balance*1.1);
$242
a.withdraw(balance/10);
$80
c.withdraw(balance/10);
$278
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-39!
Resolving Inconsistent Retrieval
Problem
Transaction V:
Transaction W:
a.withdraw(100);
aBranch.branchTotal();
b.deposit(100);
a.withdraw(100);
$100
b.deposit(100);
$300
total = a.getBalance();
$100
total = total+b.getBalance();
$400
total = total+c.getBalance();
...
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-40!
Distributed Transaction
Serializability
• Somewhat more involved. Two execution histories have to be considered:!
➡ local histories !
➡ global history!
CS755! 6-41!
Locking-Based Algorithms
• Transactions indicate their intentions by requesting locks from the
scheduler (called lock manager).!
• Locks are either read lock (rl) [also called shared lock] or write lock (wl)
[also called exclusive lock]!
• Read locks and write locks conflict (because Read and Write operations are
incompatible)!
! ! ! rl ! wl!
! !rl ! yes !no!
! !wl ! no !no!
• Locking works nicely to allow concurrent processing of transactions.!
CS755! 6-42!
Two-Phase Locking (2PL)
1. A transaction locks an object before using it.!
2. When an object is locked by another transaction, the requesting
transaction must wait.!
3. When a transaction releases a lock, it may not request another
lock.!
Lock point!
Obtain lock!
Release lock!
No. of locks!
Phase 1! Phase 2!
BEGIN! END!
CS755! 6-43!
Strict 2PL
Hold locks until the end.
No. of locks!
Obtain lock!
Release lock!
Transaction!
BEGIN! END! duration!
period of!
use!
CS755! 6-44!
Locking Example
Transaction T:
Transaction U:
balance = b.getBalance();
balance = b.getBalance();
b.setBalance(bal*1.1);
b.setBalance(bal*1.1);
a.withdraw(bal/10);
c.withdraw(bal/10);
Operations
Locks
Operations
Locks
openTransaction
bal = b.getBalance();
write lock B
b.setBalance(bal*1.1);
openTransaction
a.withdraw(bal/10);
write lock A
bal = b.getBalance();
waits for T’s
lock on B
closeTransaction
unlock A, B
write lock B
b.setBalance(bal*1.1);
c.withdraw(bal/10);
write lock C
closeTransaction
unlock B, C
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755! © Addison-Wesley Publishers 2000 6-45!