Distributed Systems
Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science
Room R4.20, steen@[Link]
Chapter 13: Distributed Coordination-Based Systems
Version: December 2, 2009
Contents
Chapter
01: Introduction
02: Architectures
03: Processes
04: Communication
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security
10: Distributed Object-Based Systems
11: Distributed File Systems
12: Distributed Web-Based Systems
13: Distributed Coordination-Based Systems
Coordination-Based Systems 13.1 Coordination Models
Coordination models
Essence
We are trying to separate computation from coordination; coordination
deals with all aspects of communication between processes, as well as
their cooperation.
Couplings
Make a distinction between
Temporal coupling: Are cooperating/communicating processes
alive at the same time?
Referential coupling: Do cooperating/communicating processes
know each other explicitly?
Coordination models
                          Temporally coupled    Temporally decoupled
Referentially coupled     Direct                Mailbox
Referentially decoupled   Meeting-oriented      Generative communication
Coordination-Based Systems 13.2 Architectures
Architectures: Overview
Essence
A data item is described by means of attributes.
When made available, it is said to be published.
A process interested in reading an item must provide a subscription: a
description of the items it wants.
Middleware must match published items and subscriptions.
[Figure: a publisher hands a data item to the publish/subscribe middleware; subscribers hand it their subscriptions; the middleware matches the two and sends a notification, followed by read/delivery of the item.]
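The matching step can be sketched in a few lines. This is a minimal, in-process illustration under assumptions of our own: a data item is a dict of attribute values, a subscription maps attribute names to predicates, and the class `PubSubMiddleware` is hypothetical, not part of any real middleware.

```python
from typing import Any, Callable, Dict, List

DataItem = Dict[str, Any]
Subscription = Dict[str, Callable[[Any], bool]]  # attribute name -> predicate

class PubSubMiddleware:
    """Hypothetical middleware that matches published items against subscriptions."""

    def __init__(self) -> None:
        # Each entry pairs a subscription with the callback that notifies its subscriber.
        self._subs: List[tuple] = []

    def subscribe(self, subscription: Subscription, notify: Callable[[DataItem], None]) -> None:
        self._subs.append((subscription, notify))

    @staticmethod
    def matches(item: DataItem, subscription: Subscription) -> bool:
        # An item matches if it has every attribute the subscription mentions
        # and each attribute value satisfies the corresponding predicate.
        return all(name in item and pred(item[name]) for name, pred in subscription.items())

    def publish(self, item: DataItem) -> int:
        # Deliver the item to every matching subscriber; return how many were notified.
        delivered = 0
        for subscription, notify in self._subs:
            if self.matches(item, subscription):
                notify(item)
                delivered += 1
        return delivered


# Usage: one subscriber wants quotes for ACME below 100.
received: List[DataItem] = []
mw = PubSubMiddleware()
mw.subscribe({"symbol": lambda s: s == "ACME", "price": lambda p: p < 100}, received.append)
mw.publish({"symbol": "ACME", "price": 95})   # matches
mw.publish({"symbol": "ACME", "price": 120})  # price too high
mw.publish({"symbol": "XYZ", "price": 10})    # wrong symbol
print(received)  # [{'symbol': 'ACME', 'price': 95}]
```

Publisher and subscriber never reference each other; only the middleware knows both, which is what gives referential decoupling.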
Example: Jini/Javaspaces
Coordination model
Temporal and referential decoupling by means of JavaSpaces, a
tuple-based storage system.
A tuple is a typed set of references to objects
Tuples are stored in a JavaSpace in serialized (that is, marshaled)
form
To read a tuple, construct a template with some fields left open
Match a template against a tuple through a field-by-field
comparison
Example: Jini/Javaspaces
[Figure: processes write tuples A and B, each inserting a copy into a JavaSpace; another process reads with template T, looking for a tuple that matches T, and tuple C is returned (and optionally removed).]
Write: A copy of a tuple (tuple instance) is stored in a JavaSpace
Read: A template is compared to tuple instances; the first match returns a
tuple instance
Take: A template is compared to tuple instances; the first match returns a
tuple instance and removes the matching instance from the JavaSpace
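Write, Read, and Take can be sketched as a small in-memory tuple space. This is not the JavaSpaces API (the real `JavaSpace` interface is Java, works on `Entry` objects, and adds transactions, leases, and blocking timeouts). As an assumption for illustration, a tuple is a Python tuple, a template is a tuple in which `None` marks an open field, and matching compares type and value field by field. The class `TupleSpace` is hypothetical.

```python
from typing import Any, List, Optional, Tuple

class TupleSpace:
    """Hypothetical in-memory stand-in for a JavaSpace (no transactions or leases)."""

    def __init__(self) -> None:
        self._instances: List[Tuple[Any, ...]] = []

    def write(self, tup: Tuple[Any, ...]) -> None:
        # Store a tuple instance; Python tuples are immutable, so keeping
        # the tuple itself behaves like keeping a copy.
        self._instances.append(tuple(tup))

    @staticmethod
    def _matches(template: Tuple[Any, ...], tup: Tuple[Any, ...]) -> bool:
        # Field-by-field comparison: same arity, and every non-open field
        # must have the same type and the same value.
        if len(template) != len(tup):
            return False
        return all(
            t is None or (type(t) is type(f) and t == f)
            for t, f in zip(template, tup)
        )

    def read(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        # Return the first matching instance and leave it in the space.
        for tup in self._instances:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        # Like read, but also remove the matching instance.
        for i, tup in enumerate(self._instances):
            if self._matches(template, tup):
                return self._instances.pop(i)
        return None


space = TupleSpace()
space.write(("job", 1, "compile"))
space.write(("job", 2, "test"))
print(space.read(("job", None, None)))  # ('job', 1, 'compile'), still stored
print(space.take(("job", 2, None)))     # ('job', 2, 'test'), now removed
print(space.take(("job", 2, None)))     # None
```

Unlike a real JavaSpace, this sketch returns `None` immediately when nothing matches instead of blocking until a matching tuple is written.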
Example: TIB/Rendezvous
Coordination model
Uses subject-based addressing ⇒ publish/subscribe system.
Receiving a message on subject X is possible only if the receiver has
subscribed to X
Publishing a message on subject X ⇒ message is sent to all (currently
running) subscribers to X.
[Figure: every process runs on top of an RV library and an RV daemon; a message published on subject A is multicast over the network to the subscribers to A, and a message published on B to the subscribers to B.]
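Subject-based addressing reduces to a map from subjects to handlers. The sketch below is a hypothetical in-process illustration, not the TIB/Rendezvous API: it leaves out the RV library, the daemons, and network multicast, and keeps only the rule that a message on subject X reaches exactly the processes subscribed to X at publication time. `SubjectBus` and its methods are assumed names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]  # (subject, message) -> None

class SubjectBus:
    """Hypothetical subject-based publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._subscribers[subject].append(handler)

    def unsubscribe(self, subject: str, handler: Handler) -> None:
        self._subscribers[subject].remove(handler)

    def publish(self, subject: str, message: str) -> int:
        # Only handlers subscribed right now receive the message; nothing is
        # stored for processes that subscribe later.
        handlers = list(self._subscribers.get(subject, []))
        for handler in handlers:
            handler(subject, message)
        return len(handlers)


log: List[str] = []
bus = SubjectBus()
bus.publish("A", "early")  # no subscribers yet: the message is lost
bus.subscribe("A", lambda s, m: log.append(f"1:{m}"))
bus.subscribe("A", lambda s, m: log.append(f"2:{m}"))
bus.subscribe("B", lambda s, m: log.append(f"3:{m}"))
bus.publish("A", "hello")
print(log)  # ['1:hello', '2:hello']
```

The lost "early" message shows that this model is referentially decoupled but still temporally coupled.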
Example: Lime
Lime
Every node has its own dataspace:
When two nodes P and Q are in each other’s proximity, their dataspaces become shared
Published data items are stored locally until removed
P can publish data items from a specific process
Reactions describe what to do when a match is found
[Figure: three processes, each with its own local dataspace, connected by wireless links; together the local dataspaces form a transient, shared dataspace.]
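The idea of local dataspaces that merge while nodes are in range, plus reactions, can be sketched as follows. This is a hypothetical illustration, not the API of the actual Lime middleware: `Node`, `engage`, `disengage`, and `react` are assumed names, items are Python tuples with `None` as an open template field, and each reaction fires at most once per matching item.

```python
from typing import Any, Callable, List, Set, Tuple

Item = Tuple[Any, ...]

def matches(template: Item, item: Item) -> bool:
    # None marks an open field; other fields must be equal.
    return len(template) == len(item) and all(
        t is None or t == f for t, f in zip(template, item))

class Node:
    """Hypothetical Lime-style host with its own local dataspace."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local: List[Item] = []        # items stay here until removed
        self.neighbors: Set["Node"] = set()
        self.reactions: List[Tuple[Item, Callable[[str, Item], None]]] = []
        self._fired: Set[Tuple[int, str, Item]] = set()

    def shared_view(self) -> List[Tuple[str, Item]]:
        # Transient shared dataspace: own items plus those of nodes in range.
        view = [(self.name, it) for it in self.local]
        for n in sorted(self.neighbors, key=lambda n: n.name):
            view += [(n.name, it) for it in n.local]
        return view

    def _check_reactions(self) -> None:
        for idx, (template, action) in enumerate(self.reactions):
            for owner, item in self.shared_view():
                key = (idx, owner, item)
                if key not in self._fired and matches(template, item):
                    self._fired.add(key)
                    action(owner, item)

    def react(self, template: Item, action: Callable[[str, Item], None]) -> None:
        self.reactions.append((template, action))
        self._check_reactions()

    def publish(self, item: Item) -> None:
        self.local.append(item)
        for n in [self, *self.neighbors]:
            n._check_reactions()

def engage(p: Node, q: Node) -> None:
    # P and Q come into each other's proximity: their dataspaces become shared.
    p.neighbors.add(q)
    q.neighbors.add(p)
    p._check_reactions()
    q._check_reactions()

def disengage(p: Node, q: Node) -> None:
    p.neighbors.discard(q)
    q.neighbors.discard(p)


p, q = Node("P"), Node("Q")
seen: List[Tuple[str, Item]] = []
p.react(("temp", None), lambda owner, item: seen.append((owner, item)))
q.publish(("temp", 21))   # Q not in range: P's reaction does not fire
print(seen)               # []
engage(p, q)              # dataspaces now shared, so the reaction fires
print(seen)               # [('Q', ('temp', 21))]
disengage(p, q)
print(len(p.shared_view()))  # 0
```

Because Q's item stays in Q's local dataspace, it disappears from P's view as soon as the two nodes move apart.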
Coordination-Based Systems 13.4 Communication
Content-based routing
Observation
When a coordination-based system is built across a wide-area
network, we need an efficient routing mechanism (centralized solutions
won’t do).
Solution
Naive: Broadcast subscriptions to all nodes in the system and let
servers prepend the destination addresses when a data item is published
Refinement: Forward subscriptions to all routers and let them
compute and install filters.
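Filter-based routing can be sketched as follows, under assumptions added for illustration: routers form an acyclic overlay, a subscription is a Python predicate over a dict-valued data item, and each router stores, per outgoing link, the subscriptions reachable through that link. `Router` and its methods are hypothetical; a real system would also try to merge overlapping filters instead of storing every subscription separately.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]
Predicate = Callable[[Item], bool]

class Router:
    """Hypothetical content-based router; assumes an acyclic (tree) overlay."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbors: Dict[str, "Router"] = {}
        self.filters: Dict[str, List[Predicate]] = {}  # per outgoing link
        self.clients: List[tuple] = []                  # (predicate, client id)
        self.delivered: List[tuple] = []                # (client id, item)
        self.received = 0                               # items that reached this router

    def connect(self, other: "Router") -> None:
        self.neighbors[other.name] = other
        other.neighbors[self.name] = self
        self.filters.setdefault(other.name, [])
        other.filters.setdefault(self.name, [])

    def subscribe(self, client: str, pred: Predicate) -> None:
        self.clients.append((pred, client))
        for router in self.neighbors.values():
            router._install(pred, came_from=self.name)

    def _install(self, pred: Predicate, came_from: str) -> None:
        # Items matching pred must be forwarded back over the link it arrived on.
        self.filters[came_from].append(pred)
        for name, router in self.neighbors.items():
            if name != came_from:
                router._install(pred, came_from=self.name)

    def publish(self, item: Item, came_from: Optional[str] = None) -> None:
        self.received += 1
        for pred, client in self.clients:
            if pred(item):
                self.delivered.append((client, item))
        # Forward only over links whose filter matches the item.
        for name, router in self.neighbors.items():
            if name != came_from and any(p(item) for p in self.filters[name]):
                router.publish(item, came_from=self.name)


r1, r2, r3 = Router("R1"), Router("R2"), Router("R3")
r1.connect(r2)
r2.connect(r3)                    # chain R1 - R2 - R3
r3.subscribe("c1", lambda it: it["price"] < 100)
r1.publish({"price": 50})         # forwarded R1 -> R2 -> R3
r1.publish({"price": 500})        # dropped at R1: no filter matches
print(r3.delivered)               # [('c1', {'price': 50})]
print(r2.received)                # 1
```

The non-matching item never leaves R1, which is the bandwidth saving the refinement aims for.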
Content-based routing: naive solution
[Figure: naive content-based routing in an example network with routers R1 and R2.]
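The naive scheme can be sketched as two steps: copy every subscription to every server, then let the server where an item is published compute the destination list itself. `Server`, `broadcast_subscription`, and `publish` are hypothetical names, and destinations are plain strings standing in for network addresses.

```python
from typing import Any, Callable, Dict, List, Tuple

Item = Dict[str, Any]
Predicate = Callable[[Item], bool]

class Server:
    """Hypothetical server in the naive scheme: it knows every subscription."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscriptions: List[Tuple[str, Predicate]] = []  # (destination, predicate)

def broadcast_subscription(servers: List[Server], dest: str, pred: Predicate) -> None:
    # Step 1: every subscription is copied to every server.
    for s in servers:
        s.subscriptions.append((dest, pred))

def publish(server: Server, item: Item) -> Tuple[List[str], Item]:
    # Step 2: the publishing server matches locally and prepends the
    # destination addresses; routers then only look at those addresses.
    dests = sorted({dest for dest, pred in server.subscriptions if pred(item)})
    return dests, item


servers = [Server("S1"), Server("S2"), Server("S3")]
broadcast_subscription(servers, "S2", lambda it: it["topic"] == "news")
broadcast_subscription(servers, "S3", lambda it: it["topic"] == "sports")
print(publish(servers[0], {"topic": "news"}))  # (['S2'], {'topic': 'news'})
```

Every server stores every subscription, so storage and subscription traffic grow with the total number of subscribers; this is what the filter-based refinement avoids.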
Coordination-Based Systems 13.7 Consistency and Replication
Replication: Static approaches
Note
Replicating data items to all machines implies broadcasting removals.
[Figure: a JavaSpace partitioned into subspaces, one per machine. (a) A process doing a write broadcasts the tuple. (b) A process doing a take examines its local JavaSpace, then broadcasts a tuple delete.]
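Full replication can be sketched as follows: writes and removals go to every machine, while reads stay local. `Replica` and `ReplicatedSpace` are hypothetical names, and the `messages` counter simply tallies the broadcasts each machine handles. The sketch deliberately ignores concurrency; two machines doing a take at the same time could both find the same tuple, which a real implementation must prevent.

```python
from typing import Any, List, Optional, Tuple

Item = Tuple[Any, ...]

def matches(template: Item, item: Item) -> bool:
    # None marks an open field; all other fields must be equal.
    return len(template) == len(item) and all(
        t is None or t == f for t, f in zip(template, item))

class Replica:
    """Hypothetical machine holding a full copy of the dataspace."""

    def __init__(self) -> None:
        self.items: List[Item] = []
        self.messages = 0  # write/delete broadcasts this machine handled

class ReplicatedSpace:
    def __init__(self, n: int) -> None:
        self.replicas = [Replica() for _ in range(n)]

    def write(self, item: Item) -> None:
        # (a) A write is broadcast: every machine stores a copy.
        for r in self.replicas:
            r.items.append(item)
            r.messages += 1

    def read(self, at: int, template: Item) -> Optional[Item]:
        # Reads are purely local.
        return next((it for it in self.replicas[at].items if matches(template, it)), None)

    def take(self, at: int, template: Item) -> Optional[Item]:
        # (b) Find the tuple locally, then broadcast a delete for it.
        item = self.read(at, template)
        if item is not None:
            for r in self.replicas:
                r.items.remove(item)
                r.messages += 1
        return item


space = ReplicatedSpace(3)
space.write(("x", 1))
print(space.read(2, ("x", None)))  # ('x', 1): found locally at machine 2
print(space.take(0, ("x", None)))  # ('x', 1): removed everywhere
print(space.read(2, ("x", None)))  # None
print([r.messages for r in space.replicas])  # [2, 2, 2]
```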
Balancing read/write operations
Problem
Find a balance between the cost of reads and the cost of writes/removals ⇒
organize the dataspace as a 2D grid
Example: A writes a data item; B wants to read it.
[Figure: A broadcasts the tuple to the machines in its row of the grid; B broadcasts its template to the machines in its column; the two sets meet at machine C.]
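The grid argument can be checked with a little set arithmetic. Assuming $N = n \times n$ machines, with writes sent along the writer's row and reads along the reader's column (the orientation is a choice made here for illustration), any row and any column intersect in exactly one machine, so a read always finds a written tuple while each operation contacts only $\sqrt{N}$ machines. `write_set` and `read_set` are hypothetical helpers.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a machine in the grid

def write_set(writer: Cell, n: int) -> Set[Cell]:
    # The writer sends the tuple to every machine in its own row.
    row, _ = writer
    return {(row, c) for c in range(n)}

def read_set(reader: Cell, n: int) -> Set[Cell]:
    # The reader sends its template to every machine in its own column.
    _, col = reader
    return {(r, col) for r in range(n)}


n = 4                      # 4 x 4 grid, i.e. N = 16 machines
a, b = (0, 0), (3, 2)      # positions of A and B
meet = write_set(a, n) & read_set(b, n)
print(meet)                                       # {(0, 2)}: machine C
print(len(write_set(a, n)), len(read_set(b, n)))  # 4 4, i.e. sqrt(N) each
```

Full replication is the other extreme: one machine per read but $N$ per write.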
Dynamic replication
Observation: Not all data items are equal
Decide on replication on a per-type basis
Refinement: Let a central component observe read/write patterns and
decide on the replication strategy (self-replication)
[Figure: organization of a node: the application on top; distribution managers, an invocation handler, a distribution policy table, and a dataspace slice below it; all running on the local OS and connected to the network.]
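Per-type self-replication can be sketched as a component that counts reads and writes per data type and picks a strategy from the ratio. Everything here is an assumption for illustration: the class `ReplicationAdvisor`, the threshold value, and the three strategy labels ("full-replication" for broadcast writes with local reads, "grid" for the 2D-grid compromise, "local-only" for keeping items at the writer).

```python
from collections import Counter
from typing import Dict

class ReplicationAdvisor:
    """Hypothetical component choosing a replication strategy per data type."""

    def __init__(self, threshold: float = 4.0) -> None:
        # threshold: reads-per-write ratio beyond which full replication pays
        # off (an illustrative value, not taken from any real system).
        self.threshold = threshold
        self.reads: Counter = Counter()
        self.writes: Counter = Counter()

    def record(self, data_type: str, op: str) -> None:
        if op == "read":
            self.reads[data_type] += 1
        elif op in ("write", "take"):
            self.writes[data_type] += 1
        else:
            raise ValueError(f"unknown operation: {op}")

    def strategy(self, data_type: str) -> str:
        r, w = self.reads[data_type], self.writes[data_type]
        if w == 0:
            return "full-replication" if r > 0 else "local-only"
        ratio = r / w
        if ratio >= self.threshold:
            return "full-replication"  # mostly read: replicate, read locally
        if ratio <= 1 / self.threshold:
            return "local-only"        # mostly written: store at the writer
        return "grid"                  # mixed: balance read and write costs

    def policy_table(self) -> Dict[str, str]:
        types = set(self.reads) | set(self.writes)
        return {t: self.strategy(t) for t in sorted(types)}


adv = ReplicationAdvisor()
for _ in range(20):
    adv.record("config", "read")
adv.record("config", "write")
for _ in range(10):
    adv.record("log", "write")
adv.record("log", "read")
for _ in range(3):
    adv.record("job", "read")
for _ in range(2):
    adv.record("job", "write")
print(adv.policy_table())
# {'config': 'full-replication', 'job': 'grid', 'log': 'local-only'}
```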
Coordination-Based Systems 13.8 Fault Tolerance
Fault tolerance
Observation
In many cases, fault tolerance is achieved by using a primary-backup
approach for a central dataspace server.
Refinement
Decide per data type the required availability, and replicate based on the
availability of nodes:
MTTF: mean time to failure
MTTR: mean time to repair
Node availability: $\frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$
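Let nodes estimate MTTF and MTTR by periodically logging the current time; after a restart, the last logged time tells a node roughly when it went down, and hence how long it was down.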
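The availability formula, and one way to turn it into a replica count, can be sketched as follows. The log format (a list of `(up_since, down_at)` periods) and the helper names are hypothetical, and `replicas_needed` assumes node failures are independent, so that all $k$ replicas are down with probability $(1 - a)^k$.

```python
from typing import List, Tuple

def availability(mttf: float, mttr: float) -> float:
    # Fraction of time a node is up: MTTF / (MTTF + MTTR).
    return mttf / (mttf + mttr)

def estimate(log: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate (MTTF, MTTR) from a hypothetical log of (up_since, down_at) periods.

    down_at is the last time logged before a crash; the repair time is the
    gap until the next period starts.
    """
    uptimes = [down - up for up, down in log]
    repairs = [log[i + 1][0] - log[i][1] for i in range(len(log) - 1)]
    return sum(uptimes) / len(uptimes), sum(repairs) / len(repairs)

def replicas_needed(node_avail: float, target: float) -> int:
    # Smallest k with 1 - (1 - a)^k >= target, assuming independent failures.
    k = 1
    while 1 - (1 - node_avail) ** k < target:
        k += 1
    return k


log = [(0, 90), (100, 190), (200, 290)]  # hours: up 90, down 10, and so on
mttf, mttr = estimate(log)
a = availability(mttf, mttr)
print(mttf, mttr, a)                # 90.0 10.0 0.9
print(replicas_needed(a, 0.995))    # 3
```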
Coordination-Based Systems 13.9 Security
Security
Dilemma
We want anonymity between processes, but security requires that
we authenticate publishers and subscribers ⇒ we need to trust the
servers that match the two.
Information confidentiality: the middleware is not allowed to see
what data is published. In practice, only a restricted number of fields
can be used.
Subscription confidentiality: the middleware is not allowed to see
what subscriptions look like. Solution: match on encrypted data
fields, although this alone will often reveal too much information about
publishers and subscribers.
Publication confidentiality: ensure that specific processes are not
even allowed to see certain messages.
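Matching on encrypted fields can be sketched with deterministic keyed tokens. This swaps in one specific technique for illustration (HMAC-SHA256 of each field value, standing in for deterministic encryption); it is not necessarily the scheme the slides have in mind, and the helper names and key are hypothetical. It also shows the leak: equal values produce equal tokens, so the broker still learns which items and subscriptions share a value. It further assumes that publishers and subscribers share a key.

```python
import hashlib
import hmac
from typing import Dict

def blind(key: bytes, value: str) -> str:
    # Deterministic keyed token: equal plaintexts give equal tokens, so the
    # broker can test equality without learning the value itself.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def blind_fields(key: bytes, fields: Dict[str, str]) -> Dict[str, str]:
    return {name: blind(key, v) for name, v in fields.items()}

def broker_matches(item: Dict[str, str], subscription: Dict[str, str]) -> bool:
    # The broker only ever sees tokens, never plaintext values.
    return all(item.get(name) == token for name, token in subscription.items())


shared_key = b"hypothetical key shared by publishers and subscribers"
item = blind_fields(shared_key, {"topic": "stocks", "symbol": "ACME"})
sub = blind_fields(shared_key, {"symbol": "ACME"})
print(broker_matches(item, sub))  # True
```

Only equality tests survive this kind of blinding; range predicates such as "price < 100" would need a different scheme.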
Secure decoupling
Solution
Let an accounting service manage keys, and re-encrypt a data item before it
is forwarded to a subscriber ⇒ (1) routers work on encrypted data,
(2) publisher and subscriber need not share a key.
[Figure: the publisher obtains an encryption key from the accounting service (AS) and publishes a message encrypted with the publisher's key; a broker in the publish/subscribe middleware has the AS transform the message, and the subscriber, whose key the AS provides, receives it encrypted with the subscriber's key.]
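The re-encryption step can be sketched with a toy cipher. Loud assumption: the cipher below (a SHA-256 counter-mode keystream XORed with the data) is for illustration only and is not secure. `AccountingService`, `key_for`, and `transform` are hypothetical names. In this sketch the AS briefly sees the plaintext while transforming, so it must be trusted; the broker only forwards ciphertext, and publisher and subscriber never share a key.

```python
import hashlib
import secrets
from typing import Dict, Tuple

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode). NOT a secure cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> Tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce, bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))

class AccountingService:
    """Hypothetical AS: hands out per-party keys and re-encrypts on request."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def key_for(self, party: str) -> bytes:
        return self._keys.setdefault(party, secrets.token_bytes(32))

    def transform(self, src: str, dst: str, nonce: bytes, ct: bytes) -> Tuple[bytes, bytes]:
        # Decrypt with the publisher's key, re-encrypt with the subscriber's key.
        return encrypt(self.key_for(dst), decrypt(self.key_for(src), nonce, ct))


as_ = AccountingService()
pub_key = as_.key_for("publisher")    # publisher obtains its key
sub_key = as_.key_for("subscriber")   # subscriber obtains its key
nonce, ct = encrypt(pub_key, b"price=95")
# The broker forwards (nonce, ct) unread and asks the AS to transform it.
nonce2, ct2 = as_.transform("publisher", "subscriber", nonce, ct)
print(decrypt(sub_key, nonce2, ct2))  # b'price=95'
```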
Dilemma
Is security the show-stopper for publish/subscribe systems?