8 Distributed-Systems
Fall 10/ Lecture 6
Distributed Systems
600.437
Distributed Transactions
Department of Computer Science
The Johns Hopkins University
Yair Amir
Further reading:
Concurrency Control and Recovery in Database Systems,
P. A. Bernstein, V. Hadzilacos, and N. Goodman,
Addison-Wesley, 1987.
Transaction Processing: Concepts and Techniques,
J. Gray and A. Reuter,
Morgan Kaufmann Publishers, 1993.
Transaction Processing System (TPS)
[Figure: clients, the TPS, and its database; the TPS also sends messages to the outside world and performs real actions (e.g., firing a missile).]
Basic Definition
Transaction - a collection of operations on the
physical and abstract application state, with the
following properties:
- Atomicity.
- Consistency.
- Isolation.
- Durability.
These are the ACID properties of a transaction.
Atomicity
Changes to the state are atomic:
- A jump from the initial state to the result state
without any observable intermediate state.
- All or nothing ( Commit / Abort ) semantics.
- Changes include:
  - Database changes.
  - Messages to outside world.
  - Actions on transducers (testable / untestable).
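The all-or-nothing semantics can be illustrated with a minimal sketch. It assumes an in-memory dictionary stands in for the database state, and `run_atomically`, `debit`, and `credit` are hypothetical helpers introduced only for this example. Messages to the outside world and transducer actions are out of scope here.

```python
import copy


def run_atomically(state, operations):
    """Apply every operation to a private copy; install it only if all succeed.

    Returns (new_state, "commit") or (original_state, "abort").
    """
    working = copy.deepcopy(state)  # intermediate states stay private
    try:
        for op in operations:
            op(working)
    except Exception:
        return state, "abort"       # nothing done by the transaction is visible
    return working, "commit"        # a single jump to the result state


def debit(account, amount):
    def op(s):
        if s[account] < amount:
            raise ValueError("insufficient funds")
        s[account] -= amount
    return op


def credit(account, amount):
    def op(s):
        s[account] += amount
    return op


accounts = {"a": 100, "b": 0}
after_ok, outcome_ok = run_atomically(accounts, [debit("a", 30), credit("b", 30)])
# The credit succeeds first, then the debit fails, so the whole transfer aborts.
after_bad, outcome_bad = run_atomically(accounts, [credit("b", 500), debit("a", 500)])
```

In the second call the credit has already been applied to the private copy when the debit fails, yet the caller only ever sees the initial state.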
Consistency
- The transaction is a correct transformation
of the state.
This means that the transaction is a correct
program.
Isolation
Even though transactions execute concurrently,
it appears to the outside observer as if they
execute in some serial order.
Isolation is required to guarantee consistent
input, which is needed for a consistent program
to provide consistent output.
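One way to make "as if they execute in some serial order" concrete is to compare an observed final state against the final states of all serial orders. The sketch below does only that (final-state equivalence, a simplification), and it assumes transactions are pure functions on a small dictionary. The names `serial_outcomes` and `looks_serial` are hypothetical.

```python
from itertools import permutations


def serial_outcomes(initial, transactions):
    """Final state produced by every serial order of the transactions."""
    outcomes = []
    for order in permutations(transactions):
        state = dict(initial)
        for txn in order:
            state = txn(state)
        outcomes.append(state)
    return outcomes


def looks_serial(initial, transactions, observed):
    """True if the observed final state matches some serial execution."""
    return observed in serial_outcomes(initial, transactions)


def add_ten(s):
    return {**s, "x": s["x"] + 10}


def double(s):
    return {**s, "x": s["x"] * 2}


start = {"x": 5}
# Serial orders give x = (5 + 10) * 2 = 30 or x = 5 * 2 + 10 = 20.
ok = looks_serial(start, [add_ten, double], {"x": 30})
# A lost update (both read x = 5, the add's write lands last) gives x = 15.
lost_update = looks_serial(start, [add_ten, double], {"x": 15})
```

The lost-update result is the kind of inconsistent outcome that isolation is meant to rule out.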
Durability
- Once a transaction completes successfully
(commits), its changes to the state survive
failures (what is the failure model?).
- The only way to get rid of what a committed
transaction has done is to execute a
compensating transaction (which is,
sometimes, impossible).
A Distributed Database
[Figure: four Database Managers connected by a network.]
A Distributed Transaction
A distributed transaction is composed of
several subtransactions, each running on a
different site.
Each database manager (DM) can decide to
abort (the veto property).
An Atomic Commitment Protocol (ACP) is run
by each of the DMs to ensure that all the
subtransactions are consistently committed
or aborted.
Atomic Commitment Protocol
A correct ACP guarantees that:
- All the DMs that reach a decision reach the same
decision.
- Decisions are not reversible.
- A Commit decision can only be reached if all the DMs
voted to commit.
- If there are no failures and all the DMs voted to
commit, the decision will be Commit.
- At any point, if all failures are repaired, and no new
failures are introduced, then all the DMs eventually
reach a decision.
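Some of these guarantees can be checked against the record of a single finished run. The sketch below is one hedged way to do so; `check_acp_run` and the violation labels are hypothetical names. Irreversibility and eventual termination concern behavior over time, so a snapshot like this cannot test them.

```python
def check_acp_run(votes, decisions, failures=False):
    """Check one finished run against the guarantees testable on a snapshot.

    votes:     dict DM id -> "commit" or "abort"
    decisions: dict DM id -> "commit", "abort", or None (undecided)
    failures:  whether any failure occurred during the run
    Returns the list of violated guarantees (empty if none).
    """
    violated = []
    reached = {d for d in decisions.values() if d is not None}

    # All DMs that reach a decision reach the same decision.
    if len(reached) > 1:
        violated.append("agreement")

    # Commit can only be reached if all DMs voted to commit.
    all_yes = all(v == "commit" for v in votes.values())
    if "commit" in reached and not all_yes:
        violated.append("commit-needs-all-votes")

    # With no failures and unanimous commit votes, the decision is Commit.
    if not failures and all_yes and any(d != "commit" for d in decisions.values()):
        violated.append("no-failure-commit")

    return violated
```

For example, `check_acp_run({"a": "commit", "b": "abort"}, {"a": "commit", "b": "abort"})` reports both an agreement violation and a commit reached without unanimous votes.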
Two Phase Commit
[Figure, shown over three slides: coordinator and participant timelines, with each log write marked as a forced disk write or a lazy disk write.]
- The coordinator sends prepare to commit.
- Each participant returns its vote (ready or abort).
- The coordinator sends the decision (commit or abort).
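The message flow of the three slides above can be sketched as a single failure-free round. This is a simplification under stated assumptions: participants are modeled as plain vote functions, and disk writes, timeouts, and recovery are left out. `two_phase_commit` is a hypothetical name.

```python
def two_phase_commit(participants):
    """Run one failure-free round of 2PC.

    participants: dict name -> vote function returning "ready" or "abort".
    Returns (decision, log), where log lists (sender, receiver, message).
    """
    log = []

    # Phase 1: the coordinator sends prepare and collects the votes.
    votes = {}
    for name, vote in participants.items():
        log.append(("coordinator", name, "prepare"))
        votes[name] = vote()
        log.append((name, "coordinator", votes[name]))

    # Commit only if every participant voted ready (each one holds a veto).
    decision = "commit" if all(v == "ready" for v in votes.values()) else "abort"

    # Phase 2: the coordinator sends the decision to every participant.
    for name in participants:
        log.append(("coordinator", name, decision))

    return decision, log


all_ready, log_ready = two_phase_commit({"dm1": lambda: "ready", "dm2": lambda: "ready"})
one_veto, log_veto = two_phase_commit({"dm1": lambda: "ready", "dm2": lambda: "abort"})
```

Each participant contributes three messages to the log: the prepare, its vote, and the decision.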
State Diagram for 2PC
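Coordinator (states: Init, Wait, Commit, Abort):
- Init -> Wait: send prepare.
- Wait -> Commit: all voted Ready.
- Wait -> Abort: timeout or abort.
Participant (states: Init, Wait, Commit, Abort):
- Init -> Wait: Ready vote.
- Init -> Abort: Abort vote.
- Wait -> Commit: Commit message.
- Wait -> Abort: Abort message.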
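A state diagram like this maps naturally onto a transition table. The sketch below encodes one reading of the 2PC diagram (the assignment of labels to edges is an assumption based on the standard protocol). The names `COORDINATOR`, `PARTICIPANT`, `step`, and `run` are hypothetical.

```python
# (state, event) -> next state, as read from the 2PC diagram.
COORDINATOR = {
    ("Init", "send prepare"): "Wait",
    ("Wait", "all voted ready"): "Commit",
    ("Wait", "timeout or abort"): "Abort",
}

PARTICIPANT = {
    ("Init", "ready vote"): "Wait",
    ("Init", "abort vote"): "Abort",
    ("Wait", "commit message"): "Commit",
    ("Wait", "abort message"): "Abort",
}


def step(table, state, event):
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return table[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(table, events, state="Init"):
    """Feed a sequence of events through the table, starting from Init."""
    for event in events:
        state = step(table, state, event)
    return state
```

Commit and Abort have no outgoing edges in either table, which reflects that decisions are not reversible.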
Presumed Abort 2PC
When the recovery mechanism has no
information about a transaction, it presumes that
the transaction has been aborted.
Coordinator: forced write if commit, lazy write if abort.
Participant: forced write if commit, lazy write if abort.
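The reason an abort record can be written lazily under Presumed Abort is that losing it does not change the answer recovery gives. The sketch below illustrates this with a dictionary standing in for the stable log. It assumes a forced write always reaches disk before the protocol continues, while a lazy write may be lost in a crash. All names are hypothetical.

```python
PRESUMED = "abort"  # answer given when the log has no record of the transaction


def write_mode(decision):
    """Presumed Abort: force the commit record, write the abort record lazily."""
    return "forced" if decision == "commit" else "lazy"


def log_decision(stable_log, txn_id, decision, crash_before_flush=False):
    """Record a decision; a lazy write is lost if the site crashes before the flush."""
    if write_mode(decision) == "forced" or not crash_before_flush:
        stable_log[txn_id] = decision


def recover_outcome(stable_log, txn_id):
    """Answer an inquiry about txn_id after recovery."""
    return stable_log.get(txn_id, PRESUMED)


disk = {}
log_decision(disk, "t1", "abort", crash_before_flush=True)   # lazy record is lost
log_decision(disk, "t2", "commit", crash_before_flush=True)  # forced record survives
```

After the crash, `t1` has no record at all, and the presumption still yields the correct outcome.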
Presumed Commit 2PC
When the recovery mechanism has no
information about a transaction, it presumes that
the transaction has been committed.
Coordinator: forced write if abort, lazy write if commit.
Participant: forced write if abort, lazy write if commit.
Force write (on recovery, needs to talk to all
the participants!).
Non Blocking ACPs
An ACP is called blocking if the occurrence
of some failures forces the DMs to wait until
failures are repaired before terminating the
transaction.
When a transaction is blocked at the DM, its
locks cannot be released. This may lead to
system blocking.
What can we say about network partitions
and blocking?
Every protocol that tolerates network
partitions is bound to be blocking.
Quorum Based Protocols
Every DM has to agree locally.
A majority of the DMs must agree to abort or
commit after all the DMs agreed locally.
Simple majority can be generalized to
weighted majority.
Majority can be generalized to quorum.
Instead of one quorum, there can be an abort
quorum and a commit quorum.
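The generalizations above can be sketched with per-DM weights and separate thresholds. The condition checked in `quorums_exclusive` (commit quorum plus abort quorum exceeds the total weight) is not stated on the slide; it is the usual requirement for ensuring that two disjoint partitions cannot reach opposite decisions, and is included here as an assumption. All names are hypothetical.

```python
def has_quorum(connected, weights, threshold):
    """True if the connected DMs carry at least `threshold` total weight."""
    return sum(weights[dm] for dm in connected) >= threshold


def quorums_exclusive(weights, commit_quorum, abort_quorum):
    """True if a commit quorum and an abort quorum cannot form in disjoint groups.

    Two disjoint groups have combined weight at most the total, so requiring
    commit_quorum + abort_quorum > total rules out both forming at once.
    """
    return commit_quorum + abort_quorum > sum(weights.values())


equal = {"DM1": 1, "DM2": 1, "DM3": 1}     # simple majority: threshold 2 of 3
weighted = {"A": 3, "B": 1, "C": 1}        # weighted majority: threshold 3 of 5
```

With the weighted assignment, DM `A` alone forms a majority, while `B` and `C` together do not.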
3PC State Diagram (no faults)
Skeen - 1982.
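Coordinator (states: Init, Wait, PC, Commit, Abort):
- Init -> Wait: send prepare.
- Wait -> PC: all voted Ready.
- Wait -> Abort: timeout or abort.
- PC -> Commit: all sent OK.
Participant (states: Init, Wait, PC, Commit, Abort):
- Init -> Wait: Ready vote.
- Init -> Abort: Abort vote.
- Wait -> PC: Prepare Commit message.
- Wait -> Abort: Abort message.
- PC -> Commit.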
Decision Rule for Recovery
Collected States:
- If at least one DM aborted - decide to abort.
- If at least one DM committed - decide to commit.
- Otherwise, if at least one DM is in Pre-Commit and a
quorum of DMs is in Pre-Commit or Wait - move to
Pre-Commit and send prepare commit.
- Otherwise, if there is a quorum of DMs in Wait or
Pre-Abort - move to Pre-Abort and send prepare
abort.
- Otherwise - Block.
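The decision rule can be written as a short function over the collected states. This is a sketch under stated assumptions: states are abbreviated to "Wait", "PC" (Pre-Commit), "PA" (Pre-Abort), "Commit", and "Abort", and the quorum defaults to a simple majority of all DMs. `decide_3pc` is a hypothetical name.

```python
def decide_3pc(states, total, quorum=None):
    """Apply the 3PC recovery decision rule to the collected states.

    states: dict DM id -> "Wait", "PC", "PA", "Commit", or "Abort"
            (only the currently connected DMs)
    total:  number of DMs in the whole system
    quorum: number of DMs needed; defaults to a simple majority (assumption)
    """
    if quorum is None:
        quorum = total // 2 + 1
    collected = list(states.values())

    if "Abort" in collected:
        return "abort"
    if "Commit" in collected:
        return "commit"

    in_pc_or_wait = sum(s in ("PC", "Wait") for s in collected)
    if "PC" in collected and in_pc_or_wait >= quorum:
        return "prepare-commit"

    in_wait_or_pa = sum(s in ("Wait", "PA") for s in collected)
    if in_wait_or_pa >= quorum:
        return "prepare-abort"

    return "block"
```

Note that a connected majority can still block: with 3 DMs, `decide_3pc({"DM1": "PC", "DM3": "PA"}, total=3)` returns `"block"`, because neither counting condition reaches 2.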
3PC Recovery Procedure
- Send state and id.
- The new coordinator collects the states from
all the connected DMs and computes its next
step according to the decision rule.
- Upon receiving a Prepare-Commit/Prepare-Abort,
each DM sends an OK message.
- Upon receiving an OK message from a
quorum, the coordinator commits/aborts and
sends the decision.
3PC Recovery State Diagram
Skeen - 1982.
[Figure: recovery state diagram over the states Wait, PC, PA, Commit, and Abort, with transitions labeled Prepare Commit, Prepare Abort, Commit, and Abort.]
3PC Can Block a Quorum
- Simple majority, 3 DMs,
smallest connected DM is the coordinator.
[Figure: message timeline for DM1, DM3, and DM2, showing Prepare, Ready, Prepare Commit, Prepare Abort, and OK messages and the states W, PC, and PA.]
- DM1 & DM3 are blocked.
Enhanced 3PC Highlights
E3PC: Keidar & Dolev - 1995.
- Uses the same state diagrams as 3PC.
- Uses similar communication to 3PC (with
different message contents).
- Maintains two additional counters:
  - Last_elected - the index of the last election this DM
    participated in.
  - Last_attempt - the election number in the last
    attempt this DM made to commit or abort.
- Uses a different decision rule and recovery
procedure.
E3PC Decision Rule
If at least one DM aborted - decide abort.
If at least one DM committed - decide commit.
If IMAC and there is a quorum - move to
Prepare-Commit.
If not IMAC and there is a quorum - move to
Prepare-Abort.
Otherwise (i.e. no quorum) - Block.
IMAC: a predicate that is true iff all the connected
members with max Last_attempt are in the PC state.
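The E3PC rule and the IMAC predicate can be sketched in a few lines. Assumptions: each connected DM is a dictionary with a `state` ("Wait", "PC", "PA", "Commit", "Abort") and a `last_attempt` counter, and the quorum is a simple majority by count. `imac` and `decide_e3pc` are hypothetical names.

```python
def imac(dms):
    """True iff all connected members with max Last_attempt are in the PC state."""
    top = max(d["last_attempt"] for d in dms.values())
    return all(d["state"] == "PC" for d in dms.values() if d["last_attempt"] == top)


def decide_e3pc(dms, total, quorum=None):
    """Apply the E3PC decision rule to the connected DMs.

    dms:    dict DM id -> {"state": ..., "last_attempt": int}
    total:  number of DMs in the whole system
    quorum: number of DMs needed; defaults to a simple majority (assumption)
    """
    if quorum is None:
        quorum = total // 2 + 1
    states = [d["state"] for d in dms.values()]

    if "Abort" in states:
        return "abort"
    if "Commit" in states:
        return "commit"
    if len(dms) < quorum:
        return "block"
    return "prepare-commit" if imac(dms) else "prepare-abort"


# A connected majority with one DM in PC from an older attempt and one DM in
# PA from a newer attempt (counter values chosen for illustration).
mixed = {
    "DM1": {"state": "PC", "last_attempt": 1},
    "DM3": {"state": "PA", "last_attempt": 2},
}
```

For `mixed`, the member with the highest `last_attempt` is in PA, so IMAC is false and the rule moves to Prepare-Abort instead of blocking. Whenever a quorum is connected, one of the two prepare branches applies.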
E3PC Recovery Procedure
- Elect a coordinator - send state and 2 counters.
- Upon getting the Max_elected from the
coordinator, set Last_elected = Max_elected + 1.
- If the coordinator's decision is not to block, it:
  - Sets Last_attempt = Last_elected.
  - Moves to the calculated state and multicasts the decision.
- Upon receiving Prepare-Commit/Prepare-Abort,
the DM:
  - Sets Last_attempt = Last_elected.
  - Changes state to PC or PA and sends OK.
- If a fault happens - restart the recovery
procedure, otherwise termination is guaranteed.
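The counter updates in this procedure can be traced with a small sketch. It assumes a fault-free attempt in which every connected DM receives every message, and it takes the coordinator's decision as an input instead of computing it. `run_election` and the dictionary layout are hypothetical.

```python
def run_election(dms, decision):
    """Update counters and states for one fault-free recovery attempt.

    dms:      dict DM id -> {"state", "last_elected", "last_attempt"},
              updated in place (only the connected DMs)
    decision: "prepare-commit", "prepare-abort", or "block", as computed by
              the coordinator from the decision rule
    """
    if decision not in ("prepare-commit", "prepare-abort", "block"):
        raise ValueError(f"unsupported decision: {decision!r}")

    # Every connected DM learns Max_elected and advances Last_elected past it.
    max_elected = max(d["last_elected"] for d in dms.values())
    for d in dms.values():
        d["last_elected"] = max_elected + 1

    if decision == "block":
        return dms

    # The coordinator, and each DM on receiving the prepare message, records
    # the attempt and moves to PC or PA.
    new_state = "PC" if decision == "prepare-commit" else "PA"
    for d in dms.values():
        d["last_attempt"] = d["last_elected"]
        d["state"] = new_state
    return dms


group = {
    "DM2": {"state": "Wait", "last_elected": 1, "last_attempt": 0},
    "DM3": {"state": "Wait", "last_elected": 1, "last_attempt": 0},
}
run_election(group, "prepare-abort")
```

Starting from (Last_elected, Last_attempt) = (1, 0), both DMs end at (2, 2) in state PA.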
E3PC Does Not Block a Quorum
- Simple majority, 3 DMs,
smallest connected DM is the coordinator.
[Figure: message timeline for DM1, DM3, and DM2 (Prepare, Ready, Prepare Commit, Prepare Abort, and OK messages; states W, PC, and PA), with each DM annotated by its (last elected, last attempt) pair; the pairs shown are (1,0), (1,1), (2,0), and (2,2).]
- DM1 and DM3 abort.
Summary
Basic approach - Two Phase Commit:
- Works.
- Pays in forced disk writes.
- Vulnerable to coordinator failure at certain times.
Presumed Abort 2PC:
- Saves forced disk writes by invoking lazy writes on
abort.
Presumed Commit 2PC:
- Saves forced disk writes by invoking lazy writes on
commit, but pays a price at recovery.
Three Phase Commit:
- Pays even more in forced disk writes.
- Most of the time solves the vulnerability problem of
2PC when a quorum exists.
Enhanced Three Phase Commit:
- Costs exactly the same as 3PC, but with better logic.
- Always solves the vulnerability problem of 2PC
when a quorum exists.