Distributed Systems II: Fault-Tolerant Broadcast (cont.), Prof. Philippas Tsigas
DISTRIBUTED SYSTEMS II
FAULT-TOLERANT BROADCAST (CNT.)
[Figure 11.12: messages T1, T2, F1, F2, F3, C1, C2, C3 delivered at processes P1, P2, P3 over time]
Atomic Broadcast
Requires that all correct processes deliver all messages in the same order.
Implies that all correct processes see the same view of the world.
Atomic Broadcast
Theorem: Atomic broadcast is impossible in asynchronous systems in which a process can crash.
Proof:
It is equivalent to the consensus problem.
Review of Consensus
What is Consensus?
N processes
Each process p has
Consensus (II)
All correct processes propose a value, and must agree on a value related to the proposed values!
Definition: The Consensus problem is specified as follows:
Termination: Every correct process eventually decides some value.
Validity: If all processes that propose a value propose v, then all correct processes eventually decide v.
Agreement: If a correct process decides v, then all correct processes eventually decide v.
Integrity: Every process decides at most once, and if it decides on v (not NU), then some process must have proposed it. (NU is a special value which stands for no unanimity.)
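The four properties can be checked mechanically against a finished run. The sketch below is only an illustration and uses a hypothetical record format: `proposals` maps each process to its proposed value, `decisions` maps each process to the list of values it decided, `correct` is the set of processes that did not crash, and NU is represented by the string "NU".

```python
NU = "NU"  # special "no unanimity" value (representation assumed)

def check_consensus(proposals, decisions, correct):
    """Report which consensus properties hold for one finished run."""
    proposed = set(proposals.values())
    final = {p: ds[0] for p, ds in decisions.items() if ds}
    # Termination: every correct process decided some value.
    termination = all(p in final for p in correct)
    # Validity: if every proposal was v, every correct process decided v.
    validity = True
    if len(proposed) == 1:
        (v,) = proposed
        validity = all(final.get(p) == v for p in correct)
    # Agreement: correct processes that decided all decided the same value.
    agreement = len({final[p] for p in correct if p in final}) <= 1
    # Integrity: at most one decision each; non-NU decisions were proposed.
    integrity = all(len(ds) <= 1 for ds in decisions.values()) and all(
        v == NU or v in proposed for v in final.values()
    )
    return {"termination": termination, "validity": validity,
            "agreement": agreement, "integrity": integrity}
```

For instance, a run in which the crashed process p3 never decides still satisfies all four properties, because termination and agreement only constrain correct processes.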
FLP
Theorem: Consensus is impossible in any asynchronous system if one process can halt. [Fischer, Lynch, Paterson 1985]
Impossibility of distributed consensus with one faulty process (the original paper)
http://dl.acm.org/citation.cfm?id=214121
A Brief Tour of FLP Impossibility
http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/
Possible Homework Assignment Area
Atomic Broadcast
Theorem 1: Any atomic broadcast algorithm solves consensus:
Everybody does an atomic broadcast of its proposal.
Each process decides the first value delivered (all correct processes deliver the same first message, so they decide the same proposed value).
Theorem 2: Atomic broadcast is impossible in any asynchronous system if one process can halt.
Proof: By contradiction, using FLP and Theorem 1.
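The reduction in Theorem 1 is short enough to sketch. The atomic broadcast here is a toy stand-in, not a fault-tolerant implementation: a hypothetical `AtomicBroadcast` class fixes one delivery order (a seeded shuffle) and every process delivers that same sequence. Each process then decides the value of the first message it delivers.

```python
import random

class AtomicBroadcast:
    """Toy stand-in: collects broadcasts and fixes one delivery order that is
    the same at every process. A real implementation would need consensus."""
    def __init__(self, seed=0):
        self.pending = []
        self.rng = random.Random(seed)

    def broadcast(self, sender, value):
        self.pending.append((sender, value))

    def delivery_order(self):
        order = list(self.pending)
        self.rng.shuffle(order)  # any order is allowed, as long as all agree on it
        return order

def consensus_via_atomic_broadcast(proposals, seed=0):
    ab = AtomicBroadcast(seed)
    for p, v in proposals.items():
        ab.broadcast(p, v)                          # everybody does an atomic broadcast
    order = ab.delivery_order()
    delivered = {p: list(order) for p in proposals}  # same sequence at every process
    # Each process decides the value carried by the first message it delivers.
    return {p: delivered[p][0][1] for p in proposals}
```

Agreement follows from the common delivery order and validity from the fact that only proposed values are broadcast; the hard part, producing that common order despite crashes, is exactly what FLP rules out.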
Figure 11.14
Atomic Broadcast
Consensus is solvable in:
Synchronous systems (we will discuss such an algorithm that works in f+1 rounds) [We will come back to that!!!]
Certain semi-synchronous systems
Consensus is also solvable in:
Asynchronous systems with randomization
Asynchronous systems with failure detectors [We will come back to that!!!]
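As a preview of the synchronous f+1-round algorithm, here is a minimal sketch of one well-known variant (often called FloodSet), assuming crash failures only, a known bound f, and lock-step rounds. Each process floods the set of values it has seen; after f+1 rounds at least one round was crash-free, so all surviving processes hold the same set and apply the same deterministic choice (here `min`). The crash-schedule format is a hypothetical one chosen for the simulation.

```python
def floodset(values, f, crashes=None):
    """Simulate FloodSet consensus for crash failures in a synchronous system.

    values:  dict process -> initial value
    f:       maximum number of crashes tolerated
    crashes: dict process -> (round, receivers); in that round the process
             reaches only `receivers`, then halts (hypothetical format).
    Returns the decisions of the processes that never crashed.
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in values.items()}
    alive = set(values)
    for rnd in range(1, f + 2):                  # f + 1 rounds
        inbox = {p: set() for p in alive}
        for p in alive:
            receivers = alive
            if p in crashes and crashes[p][0] == rnd:
                receivers = alive & set(crashes[p][1])  # partial send, then crash
            for q in receivers:
                inbox[q] |= known[p]
        # Processes that crash in this round take no further steps.
        alive = {p for p in alive if not (p in crashes and crashes[p][0] == rnd)}
        for p in alive:
            known[p] |= inbox[p]
    return {p: min(known[p]) for p in alive}
```

With values p1=0, p2=1, p3=1 and p1 crashing in round 1 after reaching only p2, two rounds (f=1) give both survivors the decision 0, while a single round leaves p2 deciding 0 and p3 deciding 1, which is why one round per tolerated crash plus one is needed.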
Teaching material based on Distributed Systems: Concepts and Design, Edition 3, Addison Wesley 2001.
Viewing: These slides must be viewed in slideshow mode.
11.4 Multicast communication
Revision of IP multicast (section 4.5.1, page 154)
How can you restrict a multicast to the local area network?
Give two reasons for restricting the scope of a multicast message.
IP multicast: an implementation of group communication
built on top of IP (note that IP packets are addressed to computers)
allows the sender to transmit a single IP packet to a set of computers that form a multicast group (a class D internet address with first 4 bits 1110)
Dynamic membership of groups: a process can send to a group with or without joining it
To multicast, send a UDP datagram with a multicast address
To join, make a socket join a group (s.joinGroup(group), Fig 4.17), enabling it to receive messages to the group
Multicast routers
Local messages use local multicast capability. Routers make it efficient by choosing other routers on the way.
Failure model
Omission failures: some but not all members may receive a message, e.g. a recipient may drop the message, or a multicast router may fail
IP packets may not arrive in sender order, so group members can receive messages in different orders
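The class D rule (first four bits 1110) is easy to test directly. This small sketch compares a hand-written bit test with the `is_multicast` property of Python's standard `ipaddress` module, which for IPv4 covers the same 224.0.0.0/4 range.

```python
import ipaddress

def is_class_d(addr: str) -> bool:
    """True if the IPv4 address starts with bits 1110 (224.0.0.0-239.255.255.255)."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    return first_octet >> 4 == 0b1110

for a in ["224.0.0.1", "239.255.255.250", "192.168.1.10", "240.0.0.1"]:
    # Both checks should agree for every IPv4 address.
    print(a, is_class_d(a), ipaddress.IPv4Address(a).is_multicast)
```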
Introduction to multicast
What is meant by [the term] broadcast?
Many projects: Amoeba, Isis, Transis, Horus (refs p436)
System model
The system consists of a collection of processes which can communicate reliably over 1-1 channels
Processes fail only by crashing (no arbitrary failures)
Processes are members of groups, which are the destinations of multicast messages
In general, a process p can belong to more than one group
Operations
multicast(g, m) sends message m to all members of process group g
deliver(m) is called to get a multicast message delivered. It is different from receive, as delivery may be delayed to allow for ordering or reliability.
Open groups are useful for notification of events to groups of interested processes
[Figure 11.9: closed group and open group]
Reliability of one-to-one communication (Ch. 2, page 57)
How do we achieve validity? How do we achieve integrity?
The term reliable 1-1 communication is defined in terms of validity and integrity as follows:
validity: any message in the outgoing message buffer is eventually delivered to the incoming message buffer;
integrity: the message received is identical to one sent, and no messages are delivered twice.
validity: by use of acknowledgements and retries
integrity: by use of checksums, and by rejecting duplicates (e.g. due to retries)
If allowing for malicious users, use security techniques
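The two techniques can be sketched over a simulated lossy link. Everything here is hypothetical: each data packet and each acknowledgement is lost independently with probability `loss`, the sender retries until an ack arrives (validity), and the receiver verifies a CRC-32 checksum from `zlib` and discards duplicates by sequence number (integrity). Lost acks are what create duplicates in this model.

```python
import random
import zlib

class Receiver:
    """Receiving end of a hypothetical reliable 1-1 channel."""
    def __init__(self):
        self.seen = set()        # sequence numbers already accepted
        self.delivered = []      # the incoming message buffer

    def on_packet(self, seq, payload, checksum):
        """Return True to acknowledge the packet, False to stay silent."""
        if zlib.crc32(payload) != checksum:
            return False                     # corrupted: no ack, so the sender retries
        if seq not in self.seen:             # integrity: reject duplicates from retries
            self.seen.add(seq)
            self.delivered.append(payload)
        return True                          # ack, including re-acks of duplicates

def reliable_send(receiver, messages, loss=0.5, seed=1):
    """Validity via acknowledgements and retries over a lossy link.

    Each data packet and each ack is lost independently with probability
    `loss` (< 1). Returns the total number of data transmissions.
    """
    rng = random.Random(seed)
    transmissions = 0
    for seq, payload in enumerate(messages):
        acked = False
        while not acked:
            transmissions += 1
            if rng.random() < loss:
                continue                     # data packet lost: time out and retry
            ok = receiver.on_packet(seq, payload, zlib.crc32(payload))
            acked = ok and rng.random() >= loss  # the ack may be lost as well
    return transmissions
```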
What are ack-implosions?
11.4.1 Basic multicast
A correct process will eventually deliver the message provided the multicaster does not crash
(note that IP multicast does not give this guarantee)
Problem: if the number of processes is large, the protocol will suffer from ack-implosion
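Basic multicast amounts to one reliable send per member. The sketch assumes a hypothetical in-memory `Network` whose `send` plays the role of the reliable 1-1 operation from the system model; a multicaster that crashes part-way is modelled by `crash_after`, which shows why the guarantee needs a correct sender. Counting one acknowledgement per member also makes ack-implosion concrete: a single multicast to N members sends N acks back to the sender.

```python
class Network:
    """Hypothetical reliable 1-1 channels: send() always reaches the destination."""
    def __init__(self, members):
        self.inbox = {p: [] for p in members}

    def send(self, dest, msg):
        self.inbox[dest].append(msg)

def b_multicast(net, group, msg, crash_after=None):
    """B-multicast(g, m): send(p, m) for each p in g.

    If crash_after is set, the sender crashes after that many sends.
    Returns the number of acknowledgements the sender would receive.
    """
    acks = 0
    for i, p in enumerate(sorted(group)):
        if crash_after is not None and i == crash_after:
            break                      # sender halted: remaining members miss m
        net.send(p, msg)
        acks += 1                      # one acknowledgement per receiver
    return acks

def b_deliver(net, p):
    """On receive(m) at p, B-deliver(m): here, hand over and clear p's inbox."""
    delivered, net.inbox[p] = net.inbox[p], []
    return delivered
```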
Processes can belong to several closed groups
To R-multicast a message, a process B-multicasts it to the processes in the group, including itself
Figure 11.10
When a message is B-delivered, the recipient (unless it is the original sender) B-multicasts it to the group, then R-delivers it. Duplicates are detected.
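The rule can be simulated with a queue of in-flight messages. This is a sketch under stated assumptions, not the book's code: messages are `(sender, msg_id)` pairs, channels are reliable, and a sender crash is modelled by letting its own B-multicast reach only the members in `first_reach`. Because every other recipient re-B-multicasts before R-delivering, one member that got the message is enough for every correct member to get it.

```python
from collections import deque

def r_multicast_run(group, sender, msg_id, first_reach):
    """Simulate R-multicast of one message and return each member's R-deliveries.

    first_reach: members reached by the sender's own B-multicast; if that is
    not the whole group, the sender is treated as crashed from then on.
    """
    received = {p: set() for p in group}
    r_delivered = {p: [] for p in group}
    m = (sender, msg_id)
    in_flight = deque((q, m) for q in first_reach)   # the sender's B-multicast
    crashed = {sender} if set(first_reach) != set(group) else set()
    while in_flight:
        q, msg = in_flight.popleft()                 # B-deliver(msg) at q
        if q in crashed or msg in received[q]:
            continue                                 # crashed, or duplicate detected
        received[q].add(msg)
        if q != msg[0]:                              # q is not the original sender
            in_flight.extend((r, msg) for r in group)  # B-multicast(g, m)
        r_delivered[q].append(msg)                   # then R-deliver m
    return r_delivered
```

If the sender reaches only p2 before crashing, p2's relay still gets the message to p3, and neither of them delivers it twice.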
Process p maintains S_g^p, a message sequence number, for each group g it belongs to
The piggybacked values in a message allow recipients to learn about messages they have not yet received
The hold-back queue is not strictly necessary for reliability, but in the implementation using IP multicast it simplifies the protocol, allowing sequence numbers to represent sets of messages. Hold-back queues are also used for ordering protocols.
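The sequence-number rule can be sketched for a single member of one group. Assumptions beyond the slide text: besides S_g^p (called `next_seq` here), the member keeps a hypothetical `last_delivered[q]` counter per sender q, sequence numbers start at 0, and a detected gap is only recorded as a negative acknowledgement rather than repaired. Messages that arrive early wait in the hold-back queue until the gap closes.

```python
class IPMulticastMember:
    """One member's view of reliable multicast over lossy, unordered IP multicast."""
    def __init__(self):
        self.next_seq = 0          # S_g^p: number for this member's next multicast
        self.last_delivered = {}   # per sender q: last sequence number delivered (assumed)
        self.hold_back = {}        # (sender, seq) -> message waiting for a gap to close
        self.delivered = []
        self.nacks = []            # (sender, seq) requests for missing messages

    def stamp(self, msg):
        """Attach S_g^p to an outgoing multicast and advance it."""
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return seq, msg

    def on_receive(self, sender, seq, msg):
        expected = self.last_delivered.get(sender, -1) + 1
        if seq < expected or (sender, seq) in self.hold_back:
            return                                  # duplicate: drop it
        if seq > expected:
            self.hold_back[(sender, seq)] = msg     # early: hold back, ask for the gap
            self.nacks.extend((sender, s) for s in range(expected, seq)
                              if (sender, s) not in self.hold_back)
            return
        self._deliver(sender, seq, msg)
        # Deliver any held-back successors that are now in sequence.
        while (sender, self.last_delivered[sender] + 1) in self.hold_back:
            nxt = self.last_delivered[sender] + 1
            self._deliver(sender, nxt, self.hold_back.pop((sender, nxt)))

    def _deliver(self, sender, seq, msg):
        self.last_delivered[sender] = seq
        self.delivered.append((sender, msg))
```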
[Figure 11.11: incoming messages enter the hold-back queue; when delivery guarantees are met they move to the delivery queue and are delivered to message processing]
Causal ordering
If multicast(g, m) → multicast(g, m′), where → is the happened-before relation between messages in group g, then any correct process that delivers m′ will deliver m before m′.
Total ordering
If a correct process delivers message m before it delivers m′, then any other correct process that delivers m′ will deliver m before m′.
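The total-ordering condition is a property of the delivery logs, so it can be tested after a run. This hypothetical checker takes each correct process's delivery sequence and looks for a pair m, m′ that one process delivered in that order while another process delivered m′ without first delivering m. It deliberately ignores causality: logs can be totally ordered yet violate causal order.

```python
from itertools import combinations, permutations

def violates_total_order(logs):
    """logs: dict process -> list of delivered message ids (correct processes only).

    Returns a violating (m, m2, p, q) tuple, or None if the logs respect total order.
    """
    positions = {p: {m: i for i, m in enumerate(log)} for p, log in logs.items()}
    for p, q in permutations(logs, 2):
        pos_q = positions[q]
        for m, m2 in combinations(logs[p], 2):   # p delivered m before m2
            if m2 in pos_q and (m not in pos_q or pos_q[m] > pos_q[m2]):
                return (m, m2, p, q)
    return None
```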
[Figure 11.12]
Ordered multicast delivery is expensive in bandwidth and latency. Therefore the less expensive orderings (e.g. FIFO or causal) are chosen for applications for which they are suitable.
Bulletin board: os.interesting
Item  From         Subject
23    A.Hanlon     Mach
24    G.Joseph     Microkernels
25    A.Hanlon     Re: Microkernels
26    T.L'Heureux  RPC performance
27    M.Walker     Re: Mach
end
total (makes the item numbers the same at all sites)
Figure 11.13
Figure 11.14
Discussion of sequencer protocol
What can the sequencer do about its history buffer becoming full?
Members piggyback on their messages the latest sequence number they have seen.
What happens when some member stops multicasting?
Members that do not multicast send heartbeat messages (with a sequence number).
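Total ordering with a sequencer fits in a few lines. This minimal sketch assumes reliable (B-multicast) delivery but arbitrary arrival order: each message carries a unique id, the sequencer issues order messages with consecutive numbers `s_g`, and a member holds back both data and order messages until the order numbered `r_g` and its data have both arrived. The class and method names are hypothetical.

```python
class Sequencer:
    def __init__(self):
        self.s_g = 0            # next sequence number to assign

    def on_message(self, msg_id):
        """B-deliver(<m, i>) at the sequencer: issue <"order", i, s_g>."""
        order = ("order", msg_id, self.s_g)
        self.s_g += 1
        return order

class Member:
    def __init__(self):
        self.r_g = 0            # sequence number of the next message to TO-deliver
        self.hold_back = {}     # msg_id -> message body
        self.orders = {}        # sequence number -> msg_id
        self.delivered = []

    def on_data(self, msg_id, body):
        self.hold_back[msg_id] = body
        self._try_deliver()

    def on_order(self, order):
        _, msg_id, seq = order
        self.orders[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # TO-deliver while the next order has arrived and its data is held back.
        while self.r_g in self.orders and self.orders[self.r_g] in self.hold_back:
            msg_id = self.orders.pop(self.r_g)
            self.delivered.append(self.hold_back.pop(msg_id))
            self.r_g += 1
```

Two members that receive the same data and order messages in different arrival orders still end up with identical delivery sequences.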
[Figure 11.15: (1) message, (2) proposed sequence numbers, (3) agreed sequence number, exchanged among processes P1-P4]
Latency
Three messages are sent in sequence, so this method has a higher latency than the sequencer method.
This ordering may not be causal or FIFO.
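The three steps of Figure 11.15 can be sketched as follows. Assumptions: the counters `A` (largest agreed number seen) and `P` (largest number this process has proposed) and the tie-break on process id follow the usual presentation of the ISIS algorithm and are not given on the slide; all names are hypothetical. A message is held back until it is at the front of the queue, ordered by (number, process id), and its number is agreed.

```python
class IsisProcess:
    def __init__(self, pid):
        self.pid = pid
        self.A = 0              # largest agreed sequence number seen
        self.P = 0              # largest sequence number proposed so far
        self.queue = {}         # msg_id -> [number, pid, agreed?, body]
        self.delivered = []

    def propose(self, msg_id, body):
        """Step 2: hold back the message and reply with a proposed number."""
        self.P = max(self.A, self.P) + 1
        self.queue[msg_id] = [self.P, self.pid, False, body]
        return (self.P, self.pid)

    def agree(self, msg_id, agreed):
        """Step 3: record the agreed (number, pid), then deliver from the front."""
        self.A = max(self.A, agreed[0])
        entry = self.queue[msg_id]
        entry[0], entry[1], entry[2] = agreed[0], agreed[1], True
        while self.queue:
            front = min(self.queue, key=lambda k: (self.queue[k][0], self.queue[k][1]))
            if not self.queue[front][2]:
                break           # front message is still provisional: wait
            self.delivered.append(self.queue.pop(front)[3])

def isis_multicast(group, msg_id, body):
    """Steps 1-3 for one message, run to completion without interleaving."""
    proposals = [q.propose(msg_id, body) for q in group]
    agreed = max(proposals)     # largest proposal; ties broken by process id
    for q in group:
        q.agree(msg_id, agreed)
```

When two messages are proposed concurrently and processes see them in different orders, the agreed numbers still give every process the same delivery order; as the slide notes, that order need not respect FIFO or causality.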
Figure 11.16
Note: a process can immediately CO-deliver its own messages to itself (not shown)
Comments
After delivering a message from pj, process pi updates its vector timestamp by adding 1 to the jth element of its timestamp
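A minimal sketch of the delivery condition behind Figure 11.16, assuming a fixed group of `n` processes indexed 0..n-1 and reliable B-multicast underneath; the class name is hypothetical. A message from pj stamped V is held back until V[j] = Vi[j] + 1 (it is the next message from pj) and V[k] ≤ Vi[k] for every k ≠ j (pi has already delivered everything pj had delivered when it sent). On delivery pi adds 1 to the jth element, as the comment above says. Following the note, a sender CO-delivers its own message at once, so its own copy is not fed back through `on_b_deliver` here.

```python
class CausalProcess:
    def __init__(self, i, n):
        self.i = i
        self.V = [0] * n        # V[k]: messages from p_k delivered here
        self.hold_back = []     # (sender j, timestamp, message)
        self.delivered = []

    def co_multicast(self, m):
        """Increment own entry, CO-deliver to self, and return the packet to B-multicast."""
        self.V[self.i] += 1
        self.delivered.append(m)
        return (self.i, list(self.V), m)

    def on_b_deliver(self, packet):
        self.hold_back.append(packet)
        progress = True
        while progress:         # deliver everything that has become causally ready
            progress = False
            for pkt in list(self.hold_back):
                j, Vj, m = pkt
                ready = Vj[j] == self.V[j] + 1 and all(
                    Vj[k] <= self.V[k] for k in range(len(self.V)) if k != j)
                if ready:
                    self.hold_back.remove(pkt)
                    self.delivered.append(m)
                    self.V[j] += 1     # add 1 to the jth element
                    progress = True
```

If p1 answers a question from p0 and p2 receives the answer first, p2 holds the answer back until the question arrives and then delivers both in causal order.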
Summary
Multicast communication can specify requirements for reliability and ordering, in
terms of integrity, validity and agreement
B-multicast: a correct process will eventually deliver a message provided the multicaster does not crash
reliable multicast: the correct processes agree on the set of messages to be delivered; we showed two implementations: over B-multicast and over IP multicast
delivery ordering: FIFO, total and causal delivery ordering
FIFO ordering by means of senders' sequence numbers
total ordering by means of a sequencer or by agreement of sequence numbers between processes in a group
causal ordering by means of vector timestamps
DISTRIBUTED SYSTEMS II
FAULT-TOLERANT AGREEMENT
Consensus - Agreement
All correct processes propose a value, and must agree on a value related
to the proposed values!
Definition: The Consensus problem is specified as follows:
Termination: Every correct process eventually decides some value.
Validity: If all processes that propose a value propose v, then all correct processes eventually decide v.
Agreement: If a correct process decides v, then all correct processes eventually decide v.
Integrity: Every process decides at most once, and if it decides on v (not NU), then some process must have proposed it. (NU is a special value which stands for no unanimity.)
[Figure: the battlefield. The blue army and the red army, each led by a general (G), are separated by the enemy; the blue and red generals communicate through messengers]
Rules:
The blue and red armies must attack at the same time
The blue and red generals synchronize through messengers
Messengers (messages) can be lost
RG
attack at 9am
Is this enough??
RG
attack at 9am
ack (red goes at 9am)
Is this enough??
RG
attack at 9am
ack (red goes at 9am)
got ack
Is this enough??
Alternatives??
Probabilistic Approach?
Send as many messages as possible, hope one gets through...
assume blue starts...
BG → RG: attack at 9am, attack at 9am, attack at 9am, attack at 9am
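The probabilistic approach can be quantified under an added assumption (not part of the slide): each messenger is captured independently with probability p. Then the chance that at least one of k messengers gets through is 1 - p^k, which approaches 1 but never reaches it, and blue still never learns whether red actually received the order.

```python
import math

def prob_delivered(p_loss: float, k: int) -> float:
    """Probability that at least one of k independent messengers arrives."""
    return 1 - p_loss ** k

def messengers_needed(p_loss: float, target: float) -> int:
    """Smallest k with prob_delivered(p_loss, k) >= target, for 0 < p_loss < 1."""
    return math.ceil(math.log(1 - target) / math.log(p_loss))

for k in (1, 4, 10):
    print(k, round(prob_delivered(0.5, k), 4))
```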
Eventual Commit
Eventually both sides attack...
assume blue starts...
BG → RG: attack ASAP (retransmits, retransmits)
RG → BG: on my way!
RG
phase 1: ready to attack? (retransmits)
yes, at your disposal
phase 2: attack ASAP (retransmits)
ack
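The two-phase exchange relies on retransmitting until a reply arrives. This minimal simulation assumes each messenger (request or reply) is lost independently with probability `loss` and uses a seeded `random.Random` so runs repeat; the function names are hypothetical. It illustrates eventual commit: with loss < 1 both phases finish with probability 1, but there is no fixed bound on the number of retransmissions.

```python
import random

def exchange(rng, loss):
    """Retransmit a request until one request and its reply both get through.
    Returns the number of request transmissions."""
    sends = 0
    while True:
        sends += 1
        if rng.random() >= loss and rng.random() >= loss:   # request, then reply
            return sends

def two_phase_attack(loss=0.6, seed=7):
    rng = random.Random(seed)
    phase1 = exchange(rng, loss)   # "ready to attack?" / "yes, at your disposal"
    phase2 = exchange(rng, loss)   # "attack ASAP" / "ack"
    return phase1, phase2
```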
Chalmers is surrounded by army units
The armies have to attack simultaneously in order to conquer Chalmers
Communication between generals is by means of messengers
Some generals of the armies are traitors
Byzantine Empire
Number of processes: n
Maximum number of possibly failing processes: f
Necessary and sufficient condition for a solution to Byzantine agreement: f < n/3
Minimal number of rounds in a deterministic solution: f+1
There exist randomized solutions with a lower expected number of rounds
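To make the f < n/3 condition concrete, here is a sketch of Lamport, Shostak and Pease's Oral Messages algorithm OM(m), which the course may or may not use, run with n = 4 and f = 1 (so m = 1, taking f+1 = 2 message rounds). Assumptions: processes are integers, a traitor sends `recipient % 2` instead of the true value (one arbitrary inconsistent strategy), and 0 is the default when there is no strict majority.

```python
from collections import Counter

DEFAULT = 0  # value used when there is no strict majority (assumed)

def majority(values):
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT

def om(m, commander, lieutenants, value, traitors):
    """Return dict lieutenant -> value it uses for this commander's order."""
    def sent(to):
        # A traitorous commander sends inconsistent values (hypothetical strategy).
        return to % 2 if commander in traitors else value
    received = {l: sent(l) for l in lieutenants}
    if m == 0:
        return received
    # Each lieutenant j relays its value to the others by running OM(m - 1).
    relayed = {j: om(m - 1, j, [l for l in lieutenants if l != j], received[j], traitors)
               for j in lieutenants}
    decisions = {}
    for i in lieutenants:
        votes = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(votes)
    return decisions
```

With a loyal commander 0 ordering 1 and traitor 3, both loyal lieutenants take the value 1; with a traitorous commander, all three lieutenants still reach the same value.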
Scenario 1
Scenario 2
[Figures: executions E0 and E1, built from copies of processes A, B and C with initial values such as A:VA=0, A:VA=1, B:VB=0, B:VB=1 and C:VC=0]
Proof
In E0, A and B decide 0.
In E1, B and C decide 1.
In E2, C cannot distinguish E2 from E1, so it has to decide 1, while A cannot distinguish E2 from E0, so it has to decide 0: contradiction!