Distributed System Models
From Coulouris, Dollimore, Kindberg and Blair
Distributed Systems: Concepts and Design, 5e
What distributed systems are really about – abstraction (extracting simplicity)
! Extracting simplicity (abstraction) vs. mastering complexity
♦ Network administrators are usually called “masters of complexity”
♦ Why? – they are the only ones who know how to run the system
♦ Why? – networking equipment is vertically integrated, so each device
needs to be managed individually
♦ Software-Defined Networking (SDN) should be able to change this, but
that’s a different story…
♦ Layering provides abstraction – but that’s the data path, not the network
control plane
! Distributed systems are (mostly) software-based
♦ Easier to extract simplicity through abstraction; horizontalisation more
natural; can build more advanced concepts
♦ Open interfaces are the key – know what is provided by the “lower layers”
and use this abstraction, don’t worry about mastering it!
♦ Of course, we will master (some of it) in this course :-)
Cloud and Distributed Computing
Architectural Models
! Architecture of a system is its structure in terms of separately
specified components
! Overall goal is to ensure that the structure will meet present and
likely future demands upon it
! Major concerns – Performance, reliability, availability, cost-
effectiveness
! An architectural model simplifies and abstracts the functions of
the individual distributed system components and then considers:
♦ The placement of components across a network of computers, seeking to
define useful patterns for the distribution of data and workload
♦ The interrelationships between components – i.e., their functional
roles and the patterns of communication between them
Software and hardware service layers in distributed systems
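Applications, services
Middleware
Operating system
Computer and network hardware
[Figure: the layers listed above, from top to bottom; "Platform" labels the two lowest layers, the operating system and the computer and network hardware]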
Platform
! Lowest-level hardware and software layers
! Provide services to layers above them
♦ Implemented independently in each computer
! Bring system’s programming interface up to a level
that facilitates communication and coordination
between processes
♦ E.g. Intel x86/Solaris
System Architectures: Client-Server Model
! Division of responsibilities between system
components and their placement on computers in the
network
♦ Major impact on performance, reliability, and security
! Client processes interact with individual server
processes in separate host computers in order to
access the shared resources that they manage
! Servers may in turn be clients of other servers – e.g. a
web server is often a client of a local file server that
manages the files in which the web pages are stored
! Web servers and most other Internet services are clients
of the Domain Name Service, which translates Internet
domain names to network addresses
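The chain of roles described on this slide can be sketched in a few lines of Python. This is a toy, in-process illustration and not real networking: the class names `NameService`, `FileServer` and `WebServer`, the host name `files.example.org`, the address and the file path are all hypothetical.

```python
class NameService:
    """Stands in for the Domain Name Service: maps names to addresses."""

    def __init__(self, records):
        self._records = dict(records)

    def resolve(self, name):
        return self._records[name]


class FileServer:
    """Manages the files in which the web pages are stored."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


class WebServer:
    """A server to its own clients, and a client of two other servers."""

    def __init__(self, name_service, network, file_host):
        self._name_service = name_service
        self._network = network        # address -> server object (toy "network")
        self._file_host = file_host

    def get(self, path):
        # Client role 1: ask the name service for the file server's address.
        address = self._name_service.resolve(self._file_host)
        # Client role 2: invoke the file server and wait for its result.
        body = self._network[address].read(path)
        return "200 OK\n" + body


name_service = NameService({"files.example.org": "192.0.2.20"})
network = {"192.0.2.20": FileServer({"/index.html": "<h1>hello</h1>"})}
web_server = WebServer(name_service, network, "files.example.org")

reply = web_server.get("/index.html")   # the browser's invocation
print(reply)
```

The same process is a server in one interaction and a client in the next; "client" and "server" name roles in a single invocation, not fixed properties of a machine.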
Clients invoke individual servers
[Figure: client processes send invocations to server processes on separate computers and receive results; a server may in turn invoke another server. Key: process, computer.]
Peer-to-peer Model
! All processes involved in a task or activity play similar roles
♦ Interact cooperatively as peers
♦ No distinction between client and server processes or the computers
upon which they run
! Scales better than client-server
♦ Total system capacity and bandwidth grow with the number of participants,
because the workload is distributed among many participating entities
♦ Today’s desktop systems have more capacity than yesterday's servers
! File sharing applications (Napster, BitTorrent)
! Variations of the p2p theme are used in a number of application
areas today
♦ Application-level routing, p2p media streaming, etc.
A distributed application based on peer processes
[Figure: peers 1 to N, each running the application and holding sharable objects, interact directly with one another]
Variations of the CS model
! Or, how we get from this [figure] to that [figure]:
! Use of multiple servers and caches to increase
performance, availability and resilience
♦ Exploit data/service partition and replication
! Use of mobile code and mobile agents
♦ Can improve interactive response by performing local
operations at the client
! Thin clients
♦ Low-cost computers with limited hardware resources that are
simple to manage
♦ Hold minimum software locally; download OS and application
SW from server
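Two of the variations above, caching and data partitioning, can be illustrated with a minimal sketch. All names here (`OriginServer`, `CachingProxy`, `pick_server`) are hypothetical, and the cache is deliberately naive: it never expires or invalidates entries, which a real proxy would have to do.

```python
import zlib


class OriginServer:
    """The server that actually holds the pages."""

    def __init__(self, pages):
        self._pages = dict(pages)
        self.requests_served = 0

    def get(self, url):
        self.requests_served += 1
        return self._pages[url]


class CachingProxy:
    """Answers repeated requests locally instead of asking the origin again."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, url):
        if url not in self._cache:
            # Cache miss: forward the request to the origin and remember the reply.
            self._cache[url] = self._origin.get(url)
        return self._cache[url]


def pick_server(key, servers):
    """Partitioning: map each key to one server using a stable hash.

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    string changes between Python processes.
    """
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


origin = OriginServer({"/a": "page A"})
proxy = CachingProxy(origin)
replies = [proxy.get("/a") for _ in range(3)]
print(replies, "origin requests:", origin.requests_served)
```

Three client requests reach the origin only once, which is the performance gain the slide refers to. The modulo scheme in `pick_server` is the simplest possible partitioning rule; it reshuffles most keys when the number of servers changes.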
A service provided by multiple servers
[Figure: clients invoke a service that is implemented by several cooperating server processes]
Web proxy server
[Figure: clients send their requests to a proxy server, which sits between them and the web servers]
Web applets
a) Client request results in the downloading of applet code
[Figure: the client fetches the applet code from the web server]
b) Client interacts with the applet
[Figure: the applet now runs inside the client]
e.g., JavaScript, AJAX
Thin clients and compute servers
[Figure: a thin client on a network computer or PC connects over the network to an application process running on a compute server]
Examples: terminals; X11; Virtual Network Computing (VNC); remote desktop protocols; cloud computing
Design Requirements for Distributed Architectures
! Performance Issues – arising from limited processing and
communication capacities
♦ Responsiveness – context switches and data transfers between
processes are slow, which hurts interactivity; use few software layers and
transfer small amounts of data
♦ Throughput – the rate at which computational work is done; the ability of
a distributed system to perform work for all its users – forcing data
through middleware layers can have a negative impact on throughput
♦ Load balancing; Caching and replication
! Quality of service – applies to OS as well as networks
♦ Non-functional properties of systems that affect the quality of the service
experienced by clients and users – reliability, security and
performance
! Dependability
♦ Crucial for safety-critical systems; correctness, security, fault
tolerance
Fundamental models
! All different system models share some fundamental properties
♦ Composed of processes communicating with one another by sending
messages over a network
! Fundamental models: based on fundamental properties that
allow us to be more specific about their characteristics, failures
and security risks
♦ Address correctness, fault tolerance, QoS
! A system model should address:
♦ What are the main entities in the system?
♦ How do they interact?
♦ Which characteristics affect (their) individual and collective behaviour?
! Purpose of a model:
♦ To make explicit all relevant assumptions about the systems
♦ To make generalisations about what is possible or impossible, given
those assumptions (can then formally prove it)
Fundamental models: Interaction
! Computation occurs within processes
! The processes interact by passing messages, resulting
in communication (information flow) and coordination
(synchronisation and ordering of activities) between
processes
♦ Distributed systems design is concerned especially with these
interactions
! Communication takes place with delays that are often of
considerable duration
! The accuracy with which independent processes can be
coordinated is limited by these delays and by the
difficulty of maintaining a common notion of time
across all computers in a distributed system
Interaction Model (cont.)
! Performance of communication channels
♦ The delay between the start of a message’s transmission in one process
and the beginning of receipt by another is referred to as latency. It
includes:
" Propagation time: constant; depends on the physical length of the link and the transmission medium
" Transmission time: (fairly) variable; depends on message size and bandwidth
" Queuing time: (very) variable; depends on network and system load in routers and end systems (OS)
♦ The bandwidth of a computer network is the total amount of information
that can be transmitted over it in a given time
" Shared among competing channels
♦ Jitter is the variation in the time taken to deliver a series of messages
! Clocks and timing events
♦ Each computer has own internal clock; can timestamp events
♦ Clocks have different offsets and drift rates; very hard to synchronise
♦ Solutions exist but have limitations (e.g. GPS needs a clear view of the sky)
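As a worked example of the delay components listed on this slide, the following sketch adds up propagation, transmission and queuing time for one message, and computes one possible jitter measure. The function names and all the numbers are illustrative assumptions; in particular, taking jitter as the standard deviation of delivery times is only one of several definitions in use.

```python
from statistics import pstdev


def one_way_delay(distance_m, signal_speed_m_per_s,
                  message_bits, bandwidth_bits_per_s, queuing_s=0.0):
    """Sum of the three components named on the slide, in seconds."""
    propagation = distance_m / signal_speed_m_per_s      # fixed by the link
    transmission = message_bits / bandwidth_bits_per_s   # grows with message size
    return propagation + transmission + queuing_s        # queuing varies with load


def jitter(delivery_times_s):
    """Variation in delivery time, taken here as the population std deviation."""
    return pstdev(delivery_times_s)


# Assumed scenario: 1000 km of fibre (signal speed about 2e8 m/s), a
# 1500-byte message (12 000 bits) on a 100 Mbit/s link, 1 ms of queuing.
delay = one_way_delay(1_000_000, 2e8, 12_000, 100e6, queuing_s=0.001)
print(f"{delay * 1000:.2f} ms")
```

With these assumed numbers the propagation term (5 ms) dominates the transmission term (0.12 ms), so a faster link would barely help; for a large file on a short link the balance reverses.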
Two variants of the interaction model
! Synchronous distributed systems: strong assumption of time
♦ The time to execute each step of a process has known lower and
upper bounds
♦ Each message transmitted over a channel is received within a known
bounded time
♦ Each process has a local clock whose drift rate from real time has a
known bound
♦ Hard to arrive at realistic values for bounds and provide guarantees
! Asynchronous distributed systems; no bounds on:
♦ Process execution speeds
♦ Message transmission delays
♦ Clock drift rates
♦ E.g. the Internet: no intrinsic bound on server or network load; how
long does it take to download a file?
♦ Any solution valid for an asynchronous distributed system is also valid
for a synchronous one
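One practical consequence of the synchronous assumptions is that a timeout can be chosen that is never too short. The sketch below is an illustration under stated assumptions, not a formula from the slides: a request/reply exchange costs at most two message delays plus one processing step, and the waiting process's clock may run fast by at most the drift-rate bound. The function name `safe_timeout` and the numbers are hypothetical.

```python
def safe_timeout(max_delay_s, max_processing_s, max_drift_rate):
    """Timeout, as measured on the local clock, for one request/reply exchange.

    Assumes a synchronous system: every message arrives within max_delay_s,
    the server answers within max_processing_s, and the local clock gains at
    most max_drift_rate seconds per real second.
    """
    real_time_bound = 2 * max_delay_s + max_processing_s
    # A fast clock reaches the bound early, so stretch the wait accordingly.
    return real_time_bound * (1 + max_drift_rate)


timeout = safe_timeout(max_delay_s=0.05, max_processing_s=0.01,
                       max_drift_rate=1e-5)
print(timeout)
```

If no reply has arrived when this timeout expires, the server must have failed. In an asynchronous system none of the three bounds exists, so no such value can be computed and an expired timeout only justifies a suspicion.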
Interaction Model – Event ordering
! In many cases, we are interested in knowing whether an event (sending or
receiving a message) at one process occurred before, after or
concurrently with another event in another process
♦ The execution of a system can be described in terms of events and their
ordering despite the lack of accurate clocks
♦ Consider the following set of exchanges between a group of email users
X, Y, Z, and A on a mailing list
" User X sends a message with the subject Meeting
" Users Y and Z reply by sending a message with the subject Re: Meeting
♦ In real time, X’s message was sent first; Y reads it and replies; Z reads
both X’s message and Y’s reply and then sends another reply, which
references both X’s and Y’s messages
♦ Because each message is delayed independently, messages may be
delivered in different orders to different users; e.g., A may see Z’s reply
before X’s original message
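The mailing-list exchange can be replayed with logical clocks to show how an ordering can be recovered without accurate physical clocks. Lamport's logical clocks are not described on this slide; they are used here as one well-known technique, and the `Process` class is a hypothetical minimal version of it.

```python
class Process:
    """A process with a Lamport logical clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def send(self, subject):
        self.clock += 1
        # The timestamp travels with the message.
        return (self.clock, self.name, subject)

    def receive(self, message):
        timestamp = message[0]
        # Jump past the sender's timestamp so that receive follows send.
        self.clock = max(self.clock, timestamp) + 1


x, y, z = Process("X"), Process("Y"), Process("Z")

m1 = x.send("Meeting")          # X sends first
y.receive(m1)
m2 = y.send("Re: Meeting")      # Y has read m1
z.receive(m1)
z.receive(m2)
m3 = z.send("Re: Meeting")      # Z has read m1 and m2

inbox_of_a = [m3, m1, m2]       # the network delivers to A out of order
for message in sorted(inbox_of_a):
    print(message)
```

Sorting A's inbox by timestamp puts the messages back in the order X, Y, Z, because each reply carries a timestamp larger than that of every message its sender had already received. Logical clocks capture only this causal order; they say nothing about events that are truly concurrent.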
Real-time ordering of events
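[Figure: timelines for X, Y, Z and A against physical time. X sends m1; Y receives m1 and sends m2; Z receives m1 and m2 and sends m3; A receives the messages in the order m3, m1, m2. Times t1, t2 and t3 are marked on the physical time axis.]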
Fundamental models: Failure
! Correct operation of a distributed system is threatened
whenever a fault occurs in any of the computers on which it
runs or in the network that connects them
♦ The failure model defines and classifies faults
! In a distributed system, both processes and communication
channels may fail – i.e., depart from what is considered
correct or desirable behaviour
! Three types of failures are considered for each type of
component
♦ Omission failures – a process or communication channel fails to
perform actions that it is supposed to do
♦ Arbitrary/Byzantine failures – any type of error may occur
♦ Timing failures – process does not meet its execution deadline
Failure Model (cont.)
! Process omission failure
♦ Process has crashed – it has halted and will not execute any
further steps of its program
♦ Normal detection approach for a crashed process is to observe
that it repeatedly fails to respond to queries – relies upon the
use of timeouts
♦ Fail-stop behaviour: (other) processes can detect with certainty
that a process has crashed
! Communication omission failure
♦ A process p performs a send by inserting message m into its
outgoing message buffer
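♦ The communication channel transports m to q’s incoming
message buffer and delivers it
♦ The channel produces an omission failure if it does not transport m
from p’s outgoing message buffer to q’s incoming message buffer
♦ The message buffers are usually provided by the operating systems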
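The timeout-based detection approach mentioned on this slide can be sketched as follows. `TimeoutFailureDetector` is a hypothetical name, and time is passed in explicitly so that the example is deterministic; a real detector would read a clock.

```python
class TimeoutFailureDetector:
    """Suspects any process that has been silent for longer than the timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_heard = {}

    def heard_from(self, process_id, now_s):
        # Record a reply (or heartbeat) from the process.
        self._last_heard[process_id] = now_s

    def suspects(self, now_s):
        return {p for p, t in self._last_heard.items()
                if now_s - t > self.timeout_s}


detector = TimeoutFailureDetector(timeout_s=3.0)
detector.heard_from("p", now_s=0.0)
detector.heard_from("q", now_s=4.0)

suspected_at_5 = detector.suspects(now_s=5.0)   # p has been silent for 5 s

detector.heard_from("p", now_s=5.5)             # p was only slow, not crashed
suspected_at_6 = detector.suspects(now_s=6.0)
print(suspected_at_5, suspected_at_6)
```

The second call shows why the result is called a suspicion: a slow process or a delayed message looks the same as a crash until a reply finally arrives. Detection is certain, as in fail-stop behaviour, only when delays are known to be bounded.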
Processes and channels
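[Figure: process p sends m via its outgoing message buffer; the communication channel carries m to the incoming message buffer of process q, which receives it. A send-omission failure occurs between p and its outgoing buffer, a channel omission failure within the channel, and a receive-omission failure between the incoming buffer and q.]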
Failure Model (cont.)
! Arbitrary failures
♦ An arbitrary (or Byzantine) process failure is one in which the
process omits intended processing steps or takes unintended
processing steps
♦ Worst possible failure
" Any type of error can occur
" Cannot be detected by seeing whether the process responds to invocations
" E.g. return a wrong value in response to an invocation
♦ Examples of arbitrary communication failure are:
" Message contents are corrupted
" Non-existent messages are delivered
" Real messages delivered more than once
Failure Model (cont.)
! Masking failures
♦ Possible to construct reliable services from components that
exhibit failures
♦ A service masks a failure by hiding it altogether or by
converting it into a more acceptable type of failure
♦ E.g., checksums mask corrupted messages: they convert an
arbitrary failure into an omission failure
! Reliability of one-to-one communication
♦ Reliable communication service can be built by masking some
of the failures of a basic communication channel
♦ Defined in terms of validity and integrity
" Validity: any message in the outgoing message buffer is eventually delivered
to the incoming message buffer
" Integrity: the message received is identical to the one sent, and no message is
delivered twice
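A minimal sketch of masking for the integrity property: a checksum lets the receiver discard corrupted messages, and sequence numbers let it discard duplicates. The names `frame` and `Receiver` and the tuple layout are assumptions made for illustration; `zlib.crc32` is the standard-library CRC-32 function.

```python
import zlib


def frame(seq, payload):
    """Attach a sequence number and a CRC-32 checksum to a payload (bytes)."""
    checksum = zlib.crc32(seq.to_bytes(4, "big") + payload)
    return (seq, payload, checksum)


class Receiver:
    def __init__(self):
        self._seen = set()
        self.delivered = []

    def on_frame(self, received):
        seq, payload, checksum = received
        if zlib.crc32(seq.to_bytes(4, "big") + payload) != checksum:
            # Corrupted: drop it. The arbitrary failure becomes an omission.
            return False
        if seq in self._seen:
            # Already delivered once: drop the duplicate.
            return False
        self._seen.add(seq)
        self.delivered.append(payload)
        return True


receiver = Receiver()
good = frame(1, b"hello")
corrupted = (1, b"jello", good[2])   # payload changed in transit

results = [receiver.on_frame(corrupted),
           receiver.on_frame(good),
           receiver.on_frame(good)]  # same frame delivered twice
print(results, receiver.delivered)
```

This covers integrity only. Validity needs a further mechanism, typically acknowledgements and retransmission, to turn the resulting omissions into eventual delivery; that part is not shown here.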
Security Model
! The security of a distributed system can be achieved by
securing the processes and the channels used for their
interactions and by protecting the objects (and resources of
all types) that they encapsulate against unauthorized access
! Object protection achieved via use of the concepts of
principals and access rights
♦ Principal can be a user or a process
♦ Access rights specify who is allowed to perform the operations of an
object; e.g. who is allowed to read or write the state of an object
! A server is responsible for verifying the identity of the
principal behind each invocation and checking that that
identity has sufficient access rights to perform the requested
operation on the particular object invoked
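The access-rights check performed by the server can be sketched as below. `ProtectedObject`, `AccessDenied` and the principal names are hypothetical, and the sketch assumes that the principal's identity has already been verified; authentication itself is not shown.

```python
class AccessDenied(Exception):
    pass


class ProtectedObject:
    """An object whose operations are guarded by per-principal access rights."""

    def __init__(self, state, access_rights):
        self._state = state
        self._rights = access_rights   # principal -> set of allowed operations

    def invoke(self, principal, operation, value=None):
        # Check the rights of the (already authenticated) principal first.
        if operation not in self._rights.get(principal, set()):
            raise AccessDenied(f"{principal} may not {operation}")
        if operation == "read":
            return self._state
        if operation == "write":
            self._state = value
            return None
        raise ValueError(f"unknown operation: {operation}")


mailbox = ProtectedObject(
    state="draft",
    access_rights={"alice": {"read", "write"}, "bob": {"read"}},
)

mailbox.invoke("alice", "write", "final")
print(mailbox.invoke("bob", "read"))

try:
    mailbox.invoke("bob", "write", "tampered")
except AccessDenied as error:
    print("denied:", error)
```

The check runs inside `invoke`, on the server side, so a client cannot bypass it by skipping a local test. Principals with no entry at all get an empty set of rights, which makes denial the default.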
Objects and principals
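[Figure: a client acting for a principal (user) sends an invocation across the network to a server acting for a principal (server); the server holds the object and its access rights and returns the result]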
The enemy
[Figure: process p sends message m to process q over a communication channel; the enemy can take a copy of m from the channel and inject its own message m’]
Threats
! To processes and channels
♦ Validity, integrity, privacy
! Denial of service
♦ Make excessive and pointless invocations on services or message
transmissions in a network
♦ Results in overload on physical resources; e.g. processing capacity,
network bandwidth
! Mobile code
♦ A problem for any process that receives and executes program code
from elsewhere
♦ Such code may easily play a Trojan horse role (modifies resources
available to the host node, but not to the originator of the code)
! Security and threat models
♦ Basis for the analysis and design of secure systems
♦ Careful analysis of threats (all forms of attack) arising from network,
physical, human environment
♦ Evaluates the risks and consequences of each
Summary
! Most distributed systems are arranged according to one of a variety of
architectural models
! Client-server model prevalent
♦ Use of multiple servers and data partition and replication to accommodate
large demand
! Peer-to-peer model
♦ All processes play similar roles; exploit large number of available resources
! Fundamental models
♦ Interaction: concerned with performance of processes and communication
channels and absence of global clocks
♦ Failure: classifies failures of processes and basic communication
channels
♦ Security: identifies possible threats to processes and communication
channels