L20: Replicated State Machines With Paxos: Sam Madden 6.033 Spring 2014

The document discusses replicated state machines and the Paxos algorithm for achieving consensus in distributed systems. It begins by describing the overall goal of building fault-tolerant stateful servers using replicated state machines. It then introduces the key ideas of sending the same commands to all replicas in the same order to keep them in sync, designating a primary node to order transactions, and using a view server to track the current primary. It proceeds to provide examples of how a replicated state machine with a view server would operate. It notes the problem of making the view server fault-tolerant and then introduces Paxos as an algorithm to achieve consensus on values like the next view despite failures. It outlines Paxos' properties and setup with


L20: Replicated state machines with Paxos

Sam Madden
6.033 Spring 2014

Overall Goal
Building a stateful server (e.g., database) that
remains available in the presence of node
failures

Last time: Replicated State Machine (RSM)
Key idea: send all replicas the same commands in the same order
This will keep them in sync

Idea: make one node a primary (coordinator) to order all transactions and send them to the others
What if the primary fails? Introduce a view server that keeps track of the current primary / replica
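The key idea above can be illustrated with a minimal sketch. The `KVReplica` and `Primary` classes and the `put`/`append` commands are hypothetical names chosen for this illustration, not part of the lecture; replicas are plain in-process objects and nothing fails.

```python
class KVReplica:
    """A deterministic state machine: same commands, same order, same state."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "append":
            self.state[key] = self.state.get(key, "") + value
        else:
            raise ValueError(f"unknown op: {op}")


class Primary:
    """Orders client commands and forwards each one to every backup."""

    def __init__(self, backups):
        self.local = KVReplica()
        self.backups = backups

    def submit(self, command):
        # The order in which the primary handles submit() is the order
        # every replica applies.
        self.local.apply(command)
        for backup in self.backups:
            backup.apply(command)


backup = KVReplica()
primary = Primary([backup])
for cmd in [("put", "x", "a"), ("append", "x", "b"), ("put", "y", "c")]:
    primary.submit(cmd)

# A replica that sees the same commands in a different order diverges.
reordered = KVReplica()
for cmd in [("append", "x", "b"), ("put", "x", "a"), ("put", "y", "c")]:
    reordered.apply(cmd)
```

Here `primary.local.state` and `backup.state` end up identical, while `reordered.state` differs, which is why a single node is put in charge of the ordering.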

RSM w/ View Server Example
[Figure sequence: Client, View Server, Replica 1 (Primary) and Replica 2 (Backup), each labeled with the view it holds. A view is written as number: primary, backup.]
Step 1. Every node holds view 1: R1, R2. Client: get view. Client: send request to primary. Primary: send request to backup.
Step 2. The view server's log now reads 1: R1, R2 followed by 2: R2, --. The client and Replica 2 still hold view 1: R1, R2, and requests still go to the primary and on to the backup.
Step 3. Error! Send request to Replica 1 (Replica 2 still holds view 1: R1, R2).
Step 4. Error! Send request to Replica 2 (Replica 2 now holds view 2: R2, --; the client still holds view 1: R1, R2). Moment R2 learns about the new view it becomes active. Replica 2 refuses to process the request.
Step 5. Client: get view. The client now holds view 2: R2, --.
Step 6. Client: send request to primary. (The "Send request to backup" label also appears on this last slide.)
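The view changes in this example can be sketched as follows. This is an illustration under stated assumptions, not the lecture's protocol: the `View`, `ViewServer`, and `Replica` classes are hypothetical, and the rule that every request carries the sender's view number and is refused on a mismatch is one assumed way to obtain the "Replica 2 refuses to process request" behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class View:
    number: int
    primary: str
    backup: Optional[str]


class ViewServer:
    def __init__(self, first):
        self.views = [first]  # log of all views, newest last

    def current(self):
        return self.views[-1]

    def change(self, primary, backup):
        nxt = View(self.current().number + 1, primary, backup)
        self.views.append(nxt)
        return nxt


class Replica:
    def __init__(self, name, view):
        self.name, self.view, self.log = name, view, []

    def handle(self, request, sender_view):
        # Assumption: a replica refuses any request tagged with a view
        # number other than the one it currently holds.
        if sender_view != self.view.number:
            return "refused"
        self.log.append(request)
        return "ok"


vs = ViewServer(View(1, "R1", "R2"))
r1 = Replica("R1", vs.current())
r2 = Replica("R2", vs.current())
client_view = vs.current()

# View 1: client -> primary R1 -> backup R2.
first = (r1.handle("op1", client_view.number), r2.handle("op1", r1.view.number))

# The view server moves to view 2 (R2 alone). R2 learns of it; R1 and the
# client still hold view 1.
r2.view = vs.change("R2", None)
stale = r2.handle("op2", r1.view.number)  # forwarded by the old primary

# The client gets the view again and retries against the new primary.
client_view = vs.current()
replicas = {"R1": r1, "R2": r2}
retry = replicas[client_view.primary].handle("op2", client_view.number)
```

In this sketch `stale` is `"refused"` and `retry` is `"ok"`, so `op2` is applied exactly once, by the primary of view 2.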

Problem: How to make the view server fault tolerant

Paxos Goal
N >= 3 nodes, trying to agree about some value
(e.g., the next view server for the RSM protocol)

Paxos properties
• All nodes agree on a value, despite node failures, network failures, delays
    • E.g., X is the next operation to execute
    • E.g., Y is the next primary
• Fault tolerant: succeeds if fewer than N/2 nodes fail
• Liveness not guaranteed

Assumption: nodes are fail-stop
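The "fewer than N/2 nodes fail" condition is the same as requiring that a majority stays up. A small sketch of that arithmetic follows; the function names are made up for this illustration. It also checks by brute force that any two majorities share at least one node, which is the property the later slides rely on when a new proposer learns an earlier value.

```python
from itertools import combinations


def majority(n):
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """Largest number of failed nodes that still leaves a majority alive."""
    return n - majority(n)


def majorities_always_overlap(n):
    """Brute-force check that every pair of majorities shares a node."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


# n -> (majority size, failures tolerated)
table = {n: (majority(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

For N = 3, 4, 5 this gives majorities of 2, 3, 3 and tolerates 1, 1, 2 failures, so going from 3 to 4 nodes buys no extra fault tolerance.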

Setup
Servers each run 3 processes:
Proposer: proposes new values
Acceptor: accepts (or rejects) proposals
Learner: client waiting for new value

Paxos rule
• If a majority of nodes in an earlier proposal number accepted a value, later proposals must accept the same value

State maintained by acceptor:
• Np: largest proposal seen in prepare
• Na: largest proposal seen in accept
• Va: value accepted for proposal Na

State must be persistent across reboot
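One way to keep Np, Na, and Va across a reboot is to write them to disk before replying to any message. The sketch below is an assumption-laden illustration: the `AcceptorState` class, the JSON file layout, and the initial values of 0 and `None` are all choices made here, not something the slides specify.

```python
import json
import os
import tempfile


class AcceptorState:
    """Np, Na, Va stored in a small JSON file (hypothetical layout)."""

    def __init__(self, path):
        self.path = path
        self.np, self.na, self.va = 0, 0, None  # assumed initial values
        if os.path.exists(path):
            with open(path) as f:
                saved = json.load(f)
            self.np, self.na, self.va = saved["np"], saved["na"], saved["va"]

    def save(self):
        # Write to a temporary file, force it to disk, then rename it over
        # the old file so a crash never leaves a half-written state file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"np": self.np, "na": self.na, "va": self.va}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "acceptor.json")
        before = AcceptorState(path)
        before.np, before.na, before.va = 11, 10, "x"
        before.save()
        after = AcceptorState(path)  # simulates the acceptor restarting
        return after.np, after.na, after.va
```

An acceptor built on this would call `save()` after every change and only then send its reply; otherwise a reboot could make it forget a promise it has already made.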

Paxos

Proposer

Propose(V):
    choose unique N, > Np
    send Prepare(N) to acceptors
    if Prepare_OK(Na, Va) from majority:
        V' = Va with highest Na, or V if none
        send Accept(N, V') to acceptors
        if Accept_OK(N) from majority:
            send Decided(V') to learners

Acceptor

Np: largest proposal seen in prepare
Na: largest proposal seen in accept
Va: value accepted for proposal Na

Prepare(N):
    if N > Np:
        log Np = N
        reply Prepare_OK(Na, Va)

Accept(N, V):
    if N >= Np:
        log Na = N, log Va = V
        reply Accept_OK(Na, Va)
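The Propose, Prepare, and Accept handlers on this slide can be turned into a small runnable sketch. It makes several simplifying assumptions that are not in the lecture: messages are ordinary method calls that never get lost, acceptors keep their state in memory only, proposal numbers are positive integers that the caller supplies and guarantees unique, and a rejection is modeled as returning `None`.

```python
class Acceptor:
    def __init__(self):
        self.np = 0      # largest proposal number seen in Prepare
        self.na = 0      # largest proposal number seen in Accept
        self.va = None   # value accepted for proposal na

    def prepare(self, n):
        if n > self.np:
            self.np = n
            return ("prepare_ok", self.na, self.va)
        return None      # assumed: a rejection is simply "no OK reply"

    def accept(self, n, v):
        if n >= self.np:
            self.na, self.va = n, v
            return ("accept_ok", n)
        return None


def propose(n, v, acceptors):
    """Run one proposal with number n; return the decided value or None."""
    majority = len(acceptors) // 2 + 1

    promises = [r for r in (a.prepare(n) for a in acceptors) if r]
    if len(promises) < majority:
        return None

    # V' = Va with the highest Na among the replies, or V if none accepted.
    _, na, va = max(promises, key=lambda r: r[1])
    chosen = va if na > 0 else v

    acks = [r for r in (a.accept(n, chosen) for a in acceptors) if r]
    return chosen if len(acks) >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(10, "x", acceptors)    # nothing accepted yet, so x is chosen
second = propose(11, "y", acceptors)   # learns x from the acceptors
stale = propose(5, "z", acceptors)     # number too low, Prepare is rejected
```

With three fresh acceptors, `first` is `"x"`, `second` is also `"x"` because the second proposer must adopt the value already accepted, and `stale` is `None`.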


Example 1
[Figure sequence: one proposer, three acceptors, each acceptor with a log.]
Prep(10) to all three acceptors; each logs Np=10.
Each acceptor replies Ok, None.
Acc(10,x) to all three acceptors; each logs Na=10, Va=x.
Commit point when a majority of acceptors log the value: each acceptor replies Ok.
A1-3 can share with learners: Dec(10,x) to all three acceptors/learners.

Example 2
[Figure sequence: proposers 1 and 2, three acceptors, each acceptor with a log.]
Proposer 1: Prep(10) to all three acceptors; each logs Np=10.
Proposer 2: Prep(11) to all three acceptors; each logs Np=11.
Each acceptor replies Ok, None to proposer 2.
P1 attempts commit first, fails: Acc(10,x) to all three acceptors is rejected because Np > 10. (This is why we must record Np.)
Decided message omitted for brevity.
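The interleaving in Example 2 can be replayed with a minimal in-memory acceptor. This is a sketch under assumptions: the `Acceptor` class and its `log` list (meant to mirror the "Log" boxes in the figure) are illustrative, and a rejected message is modeled as a `None` reply.

```python
class Acceptor:
    def __init__(self):
        self.np, self.na, self.va = 0, 0, None
        self.log = []  # mirrors the "Log" boxes in the figure

    def prepare(self, n):
        if n > self.np:
            self.np = n
            self.log.append(f"Np={n}")
            return ("ok", self.va)
        return None

    def accept(self, n, v):
        if n >= self.np:
            self.na, self.va = n, v
            self.log += [f"Na={n}", f"Va={v}"]
            return "ok"
        return None  # rejected: a Prepare with a higher number was seen


acceptors = [Acceptor() for _ in range(3)]

p1_promises = [a.prepare(10) for a in acceptors]      # P1: Prep(10)
p2_promises = [a.prepare(11) for a in acceptors]      # P2: Prep(11)
p1_accepts = [a.accept(10, "x") for a in acceptors]   # P1: Acc(10,x)
```

Every entry of `p1_accepts` is `None`: each acceptor has recorded Np=11, so the late Acc(10,x) is refused and no value has been accepted yet.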

Example 3
[Figure sequence: proposers 1 and 2, three acceptors, each acceptor with a log.]
Proposer 1: Prep(10) to all three acceptors; each logs Np=10.
Each acceptor replies Ok, None.
Acc(10,x) to all three acceptors; each logs Na=10, Va=x.
Each acceptor replies Ok.
Proposer 2: Prep(11) to all three acceptors; each logs Np=11.
New proposer learns previously committed value: each acceptor replies Ok, x.
Decided message omitted for brevity.

Example 4
[Figure sequence: proposers 1 and 2, three acceptors, each acceptor with a log.]
Proposer 1: Prep(10) to all three acceptors; each logs Np=10.
Each acceptor replies Ok, None.
Acc(10,x): two acceptors log Na=10, Va=x; the third acceptor's log still holds only Np=10.
The two acceptors that logged the value reply Ok.
Proposer 2: Prep(11) to all three acceptors; each logs Np=11.
P2 learns value is x; has to propagate it even if it hears from a non-majority: the replies are Ok, x; Ok, x; Ok, none.
P2 learns committed value is x, tells A3: Acc(11,x) to all three acceptors.
Final logs:
A1: Np=10 Na=10 Va=x Np=11 Na=11 Va=x
A2: Np=10 Na=10 Va=x Np=11 Na=11 Va=x
A3: Np=10 Np=11 Na=11 Va=x
Decided message omitted for brevity.
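Example 4 can be replayed step by step to check the final logs. As before this is only a sketch: the `Acceptor` class, its `log` list, and the fallback value `"y"` that P2 would have proposed on its own are assumptions made for the illustration.

```python
class Acceptor:
    def __init__(self):
        self.np, self.na, self.va = 0, 0, None
        self.log = []  # mirrors the "Log" boxes in the figure

    def prepare(self, n):
        if n > self.np:
            self.np = n
            self.log.append(f"Np={n}")
            return (self.na, self.va)
        return None

    def accept(self, n, v):
        if n >= self.np:
            self.na, self.va = n, v
            self.log += [f"Na={n}", f"Va={v}"]
            return "ok"
        return None


a1, a2, a3 = Acceptor(), Acceptor(), Acceptor()

for a in (a1, a2, a3):
    a.prepare(10)                 # P1: Prep(10)
for a in (a1, a2):
    a.accept(10, "x")             # P1: Acc(10,x) never reaches A3

# P2: Prep(11). Two replies carry x, one carries nothing.
replies = [a.prepare(11) for a in (a1, a2, a3)]

# P2 must adopt the value with the highest Na instead of its own "y".
na, va = max(replies, key=lambda r: r[0])
value = va if na > 0 else "y"

acks = [a.accept(11, value) for a in (a1, a2, a3)]   # P2: Acc(11,x)
```

After the last step `a3.log` reads Np=10, Np=11, Na=11, Va=x, matching the third acceptor in the figure: A3 learns x from P2 even though it never saw P1's Accept.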

Summary
• Consistency: single-copy semantics

Replicated state machines provide single-copy
• Key issue: agreeing on order of operations
• Hard case: network partition

Paxos allows replicas to reach consensus, in presence of machine and network failures
• Widely used in practice [Chubby, ZooKeeper, etc.]
