CC_Unit 1

Course Plan (SLO-1 topics by session, Units 1–5):

S-2: Unit 1 – Characteristics; Unit 2 – Evolution of Cloud Computing, SOAP – REST; Unit 3 – Basics of Virtualization; Unit 4 – Cloud Management Products; Unit 5 – Google App Engine (GAE)
S-3: Unit 1 – Issues in Distributed Systems; Unit 2 – Cloud Characteristics, Elasticity in Cloud; Unit 3 – Full and Para Virtualization; Unit 4 – Cloud Storage, Provisioning Cloud Storage; Unit 5 – Programming Environment for GAE
S-4–S-5 (Lab): Lab 1 – Implement RPC and Banker's algorithm; Lab 4 – Use Google collaboration tools: create Google Docs, Sheets and Slides and share them with other users; Lab 7 – Create a simple web service using Python Flask/Java/any language [Web Service: client-server model should be implemented using socket/http]; Lab 10 – Use security tools like ACUNETIX, ETTERCAP to scan web applications on the cloud; Lab 13 – Install and configure OpenStack all-in-one using Devstack/Packstack
S-6: Unit 1 – Distributed System Model; Unit 2 – On-demand Provisioning; Unit 3 – Implementation Levels of Virtualization; Unit 4 – Managed and Unmanaged Cloud Storage; Unit 5 – Architecture of GFS
S-7: Unit 1 – Request/Reply Protocols; Unit 2 – NIST Cloud Computing Reference Architecture; Unit 3 – Tools and Mechanisms; Unit 4 – Cloud Security Overview; Unit 5 – Case Studies: OpenStack, Heroku and Docker Containers
S-8: Unit 1 – RMI; Unit 2 – Architectural Design Challenges; Unit 3 – Virtualization of CPU; Unit 4 – Cloud Security Challenges; Unit 5 – Amazon EC2
S-9–S-10 (Lab): Lab 2 – Create and distribute a Torrent file to share a file in a LAN environment; Lab 5 – Explore public cloud services like Amazon, Google, Salesforce, Digital Ocean etc.; Lab 8 – Install Oracle VirtualBox/VMware Workstation and create a chat application [Note: launch two virtual machines for the chat application]; Lab 11 – Cloud networks for finding vulnerabilities, verifying leakage of information to an unauthorized third party; Lab 14 – Launch VMs in OpenStack through dashboard
S-11: Unit 1 – Logical Clocks and Causal Ordering of Events; Unit 2 – Deployment Models: Public, Private and Hybrid Clouds; Unit 3 – Virtualization of Memory and I/O Devices; Unit 4 – Architecture Design, Virtual Machine Security; Unit 5 – AWS
S-12: Unit 1 – RPC, Election Algorithm; Unit 2 – Service Models: IaaS – PaaS – SaaS; Unit 3 – Desktop Virtualization; Unit 4 – Security – Application Security; Unit 5 – Microsoft Azure
S-13: Unit 1 – Distributed Mutual Exclusion, Distributed Deadlock Detection Algorithms; Unit 2 – Benefits of Cloud Computing; Unit 3 – Server Virtualization; Unit 4 – Data Security; Unit 5 – Google Compute Engine
S-14–S-15 (Lab): Lab 3 – Demonstration and assessment of the implemented algorithms; Lab 6 – Quizzes on different service models and deployment models; report submission comparing the services provided by different cloud service providers (configuration of VM, cost, network bandwidth etc.); Lab 9 – Review web services implementation: a proper connection should be established between the client and server to make use of the service offered by the server; review the working of the application in a virtual environment; Lab 12 – Report submission: generate a detailed report describing vulnerabilities along with the suitable action that can be taken to remedy the loopholes; Lab 15 – OpenStack Dashboard should be accessed through a web browser; verify the working of an instance by logging into it/pinging the instance
Cloud Computing – Unit I
Distributed Systems
A distributed system is a collection of autonomous computer systems that are physically
separated but connected by a computer network, and equipped with distributed system
software.
Real-world Applications of Distributed Systems
Cloud Computing
Cloud computing platforms like Amazon Web Services (AWS), Microsoft Azure, and Google
Cloud Platform (GCP) are built on distributed systems. They provide scalable and reliable
infrastructure services, such as storage, computing power, and networking, to businesses and
individuals worldwide.
Online Marketplaces
Amazon is a prime example of an online marketplace that relies on distributed systems. When a
user searches for a product, Amazon's distributed database indexes millions of products across
multiple categories and returns search results quickly and accurately.
Types of Distributed Systems
1. Client/Server Systems: The client-server system is the most basic communication method,
in which the client sends input to the server and the server replies to the client with an output.
The client requests a resource or a task from the server; the server allocates the resource or
performs the task and sends the result back as a response to the client's request. A client-server
system can also be deployed with multiple servers.
2. Peer-to-Peer Systems: All nodes are equal participants in data sharing; tasks are divided
among the nodes, and nodes can both request and provide services to each other.
3. Middleware: Middleware can be thought of as an application that sits between two separate
applications and provides services to both. It acts as a base for interoperability between
applications running on different operating systems; data can be transferred between
applications through this service.
4. Three-tier: A three-tier system uses a separate layer and server for each function of a
program. Client data is stored in the middle tier rather than on the client system or on a
dedicated server, which simplifies development. It includes a Presentation Layer, an
Application Layer, and a Data Layer. This architecture is mostly used in web and online
applications.
5. N-tier: N-tier is also called a multitier distributed system. An N-tier system can contain any
number of tiers/functions in the network and is structurally similar to the three-tier
architecture; each tier can forward requests to another tier to perform a task or provide a
service. N-tier is commonly used in web applications and data systems.
Advantages of Distributed Systems
● All the nodes in the distributed system are connected to each other, so nodes can easily
share data with other nodes.
● More nodes can easily be added to the distributed system, i.e., it can be scaled as required.
● Failure of one node does not lead to the failure of the entire distributed system; other
nodes can still communicate with each other.
● Resources like printers can be shared among multiple nodes rather than being restricted to
just one.
Disadvantages of Distributed Systems
● Overloading may occur in the network if all the nodes of the distributed system try to
send data at once.
Characteristics of Distributed Systems
● Resource Sharing
Resource sharing means that the existing resources in a distributed system can be accessed
remotely across multiple computers in the system.
Hardware resources are shared for reduced cost and convenience; data is shared for
consistency and exchange of information.
Resources are managed by a software module known as a resource manager. Every resource
has its own management policies and methods.
● Heterogeneity
In distributed systems components can have variety and differences in Networks, Computer
hardware, Operating systems, Programming languages and implementations by different
developers.
● Openness
Openness is concerned with extensions and improvements of distributed systems. The distributed
system must be open in terms of hardware and software. To make a distributed system open,
a detailed and well-defined interface of its components must be published.
● Concurrency
Concurrency is a property of a system representing the fact that multiple activities are executed
at the same time. In a distributed system, concurrent activities execute in different components
running on multiple machines, and these activities may interact with one another. Concurrency
reduces latency and increases the throughput of the distributed system.
● Scalability
Scalability concerns how the distributed system handles growth as the number of users of the
system increases. Mostly we scale a distributed system by adding more computers to the
network. Components should not need to be changed when the system is scaled; they should
be designed to be scalable from the start.
● Fault Tolerance
In a distributed system, hardware, software, or the network can fail at any time. The system
must be designed so that it remains available even after something has failed.
● Transparency
Distributed systems should be perceived by users and application programmers as a whole rather
than as a collection of cooperating components. Transparency can be of various types like
access, location, concurrency, replication, etc.
Banker's Algorithm
The banker's algorithm is a resource allocation and deadlock avoidance algorithm. It tests for
safety by simulating the allocation of the predetermined maximum possible amounts of all
resources, then performs a safe-state check to test for possible activities, before deciding whether
the allocation should be allowed to continue.
a) Why Banker’s algorithm is named so?
Banker’s algorithm is named so because it is used in banking system to check whether loan can
be sanctioned to a person or not.
Suppose there are n account holders in a bank and the total sum of their money is S. If a
person applies for a loan, the bank first subtracts the loan amount from the total money the
bank has, and only if the remaining amount is greater than S is the loan sanctioned. This is
done so that if all the account holders come to withdraw their money, the bank can easily pay
them. In other words, the bank never allocates its money in such a way that it can no longer
satisfy the needs of all its customers; it always tries to stay in a safe state.
The following data structures are used to implement the Banker's Algorithm.
Let 'n' be the number of processes in the system and 'm' the number of resource types.
Available:
● A 1-D array of size 'm' indicating the number of available resources of each type.
● Available[j] = k means there are 'k' instances of resource type Rj.
Max:
● A 2-D array of size 'n*m' that defines the maximum demand of each process in the system.
● Max[i, j] = k means process Pi may request at most 'k' instances of resource type Rj.
Allocation:
● A 2-D array of size 'n*m' that defines the number of resources of each type currently
allocated to each process.
● Allocation[i, j] = k means process Pi is currently allocated 'k' instances of resource type Rj.
Need:
● A 2-D array of size 'n*m' that indicates the remaining resource need of each process.
● Need[i, j] = k means process Pi currently needs 'k' instances of resource type Rj
for its execution.
● Need[i, j] = Max[i, j] – Allocation[i, j]
Allocation_i specifies the resources currently allocated to process Pi, and Need_i specifies the
additional resources that process Pi may still request to complete its task.
Banker's algorithm consists of a Safety algorithm and a Resource-Request algorithm.
b) Safety Algorithm
The algorithm for finding out whether or not a system is in a safe state can be described as follows:
1) Let Work and Finish be vectors of length 'm' and 'n' respectively.
Initialize: Work = Available
Finish[i] = false for i = 1, 2, 3, ..., n
2) Find an i such that both
a) Finish[i] = false
b) Need_i <= Work
If no such i exists, go to step 4.
3) Work = Work + Allocation_i
Finish[i] = true
Go to step 2.
4) If Finish[i] = true for all i,
then the system is in a safe state.
c) Resource-Request Algorithm
Let Request_i be the request array for process Pi. Request_i[j] = k means process Pi wants k
instances of resource type Rj. When a request for resources is made by process Pi, the following
actions are taken:
1) If Request_i <= Need_i, go to step 2; otherwise raise an error condition, since the process
has exceeded its maximum claim.
2) If Request_i <= Available, go to step 3; otherwise Pi must wait, since the resources are not
available.
3) Pretend to allocate the requested resources to Pi by modifying the state:
Available = Available – Request_i
Allocation_i = Allocation_i + Request_i
Need_i = Need_i – Request_i
Then run the Safety algorithm. If the resulting state is safe, the resources are allocated to Pi;
otherwise the allocation is rolled back and Pi must wait.
d) Example:
Consider a system with five processes P0 through P4 and three resource types A, B, C.
Resource type A has 10 instances, B has 5 instances and C has 7 instances. Suppose at time
t0 the following snapshot of the system has been taken:
Question 1. What will be the content of the Need matrix?
Question 2. Is the system in a safe state? If yes, then what is the safe sequence?
Question 3. What will happen if process P1 requests one additional instance of resource type
A and two instances of resource type C?
To answer Question 3, we must determine whether this new system state is safe. To do so, we
again execute the Safety algorithm on the updated data structures. The new system state turns
out to be safe, so we can immediately grant the request for process P1.
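A compact Python sketch of both algorithms follows. The snapshot matrices below are an assumption: they are the classic textbook snapshot consistent with the stated totals (A = 10, B = 5, C = 7) and with the request in Question 3, used here purely for illustration.

```python
# Banker's algorithm: safety check plus resource-request check.

def is_safe(available, max_demand, allocation):
    n, m = len(allocation), len(available)
    # Need (Question 1): Need[i][j] = Max[i][j] - Allocation[i][j]
    need = [[max_demand[i][j] - allocation[i][j] for j in range(m)] for i in range(n)]
    work = available[:]           # Work = Available
    finish = [False] * n
    sequence = []
    while len(sequence) < n:
        progressed = False
        for i in range(n):
            if not finish[i] and all(need[i][j] <= work[j] for j in range(m)):
                # Pretend P_i runs to completion and releases its resources.
                for j in range(m):
                    work[j] += allocation[i][j]
                finish[i] = True
                sequence.append(i)
                progressed = True
        if not progressed:
            return False, []      # no process can proceed -> unsafe
    return True, sequence

def request_resources(pid, request, available, max_demand, allocation):
    need = [max_demand[pid][j] - allocation[pid][j] for j in range(len(request))]
    if any(request[j] > need[j] for j in range(len(request))):
        raise ValueError("process has exceeded its maximum claim")
    if any(request[j] > available[j] for j in range(len(request))):
        return False              # P_pid must wait: resources not available
    # Tentatively allocate, then test safety.
    new_avail = [available[j] - request[j] for j in range(len(request))]
    new_alloc = [row[:] for row in allocation]
    for j in range(len(request)):
        new_alloc[pid][j] += request[j]
    safe, _ = is_safe(new_avail, max_demand, new_alloc)
    return safe

# Assumed snapshot (classic example matching the stated totals 10, 5, 7):
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
max_demand = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
available  = [3, 3, 2]

print(is_safe(available, max_demand, allocation))
# (True, [1, 3, 4, 0, 2]) -> Question 2: safe, sequence <P1, P3, P4, P0, P2>
print(request_resources(1, [1, 0, 2], available, max_demand, allocation))
# True -> Question 3: the new state is safe, so P1's request can be granted
```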
Distributed computing is a model in which processing and data storage are distributed across
multiple devices or systems, rather than handled by a single central device.
Types of Distributed Computing System Models
i. Physical Model
ii. Architectural Model
iii. Fundamental Model
i. Physical Model
A physical model represents the underlying hardware elements of a distributed system. It
encompasses the hardware composition of a distributed system in terms of computers and other
devices and their interconnections.
Nodes
Nodes are the end devices that can process data, execute tasks, and communicate with the other
nodes. These end devices are generally the computers at the user end or can be servers,
workstations, etc.
Links
Links are the communication channels between different nodes and intermediate devices. These
may be wired or wireless.
Middleware
Middleware is the software installed and executed on the nodes. By running middleware on each
node, the distributed computing system achieves decentralised control and decision-making. It
handles various tasks such as communication with other nodes, resource management, fault
tolerance, synchronisation of different nodes, and security to prevent malicious and unauthorised
access.
Network Topology
This defines the arrangement of nodes and links in the distributed computing system. The most
common network topologies are bus, star, mesh, ring and hybrid.
Communication Protocols
Communication protocols are the set of rules and procedures for transmitting data over the links.
Examples of these protocols include TCP, UDP, HTTPS, MQTT etc.
ii. Architectural Model
An architectural model describes how the components of the system are organised and interact.
Common styles include:
Client-Server model
It is a centralised approach in which clients initiate requests for services and servers respond
by providing those services. It works on the request-response model: the client sends a request
to the server, and the server processes it and responds to the client accordingly.
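To make the request-response exchange concrete (in the spirit of the socket-based web service suggested in Lab 7), here is a minimal self-contained Python sketch; the host, port, and echo behaviour are assumptions chosen for illustration.

```python
# Minimal TCP request-response exchange: one server, one client,
# both in a single script so the sketch is runnable as-is.
import socket
import threading

HOST, PORT = "127.0.0.1", 5050        # assumed values for illustration

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, PORT))
srv.listen(1)

def serve_one():
    conn, _ = srv.accept()            # server waits for one client
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(("echo: " + request).encode())   # reply to the client

t = threading.Thread(target=serve_one)
t.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect((HOST, PORT))           # client initiates the request
    c.sendall(b"hello server")
    print(c.recv(1024).decode())      # -> echo: hello server

t.join()
srv.close()
```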
Peer-to-peer model
It is a decentralised approach in which all the distributed computing nodes, known as peers, are
equal in terms of computing capabilities and can both request and provide services to other
peers.
Layered model
It involves organising the system into multiple layers, where each layer provides a specific
service to the layer above it.
Micro-services model
In this model, a complex application or task is decomposed into multiple independent services,
and these services run on different servers.
iii. Fundamental Model
Interaction Model
Distributed computing systems are full of many processes interacting with each other in highly
complex ways. The interaction model provides a framework for understanding the mechanisms
and patterns used for communication and coordination among the various processes.
Request/Reply Protocol
Communication protocols for Remote Procedure Calls are:
● The Request Protocol
● The Request/Reply Protocol
● The Request/Reply/Acknowledgement-Reply Protocol
RMI
RMI stands for Remote Method Invocation. It is a mechanism that allows an object residing in
one system (JVM) to access/invoke an object running on another JVM.
RMI is used to build distributed applications; it provides remote communication between Java
programs. It is provided in the package java.rmi.
In an RMI application, we write two programs, a server program (resides on the server) and
a client program (resides on the client).
● Inside the server program, a remote object is created and reference of that object is made
available for the client (using the registry).
● The client program requests the remote objects on the server and tries to invoke its
methods.
The following diagram shows the architecture of an RMI application.
Let us now discuss the components of this architecture.
● Transport Layer − This layer connects the client and the server. It manages the existing
connection and also sets up new connections.
● Stub − A stub is a representation (proxy) of the remote object at client. It resides in the
client system; it acts as a gateway for the client program.
● Skeleton − This is the object that resides on the server side. The stub communicates with
this skeleton to pass requests to the remote object.
● RRL(Remote Reference Layer) − It is the layer which manages the references made
by the client to the remote object.
● The result is passed all the way back to the client.
Whenever a client invokes a method that accepts parameters on a remote object, the parameters
are bundled into a message before being sent over the network. These parameters may be of
primitive type or objects. In case of primitive type, the parameters are put together and a header
is attached to it. In case the parameters are objects, then they are serialized. This process is
known as marshalling.
At the server side, the packed parameters are unbundled and then the required method is
invoked. This process is known as unmarshalling.
d) RMI Registry
The RMI registry is a namespace in which all server objects are placed. Each time the server
creates an object, it registers the object with the RMI registry (using the bind() or rebind()
methods). These objects are registered using a unique name known as the bind name.
To invoke a remote object, the client needs a reference of that object. At that time, the client
fetches the object from the registry using its bind name (using lookup() method).
(Illustration: the server binds the remote object in the registry; the client looks it up by its
bind name and invokes its methods through the stub.)
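RMI itself is Java-specific, but the same bind/lookup/invoke pattern can be sketched in Python using the standard-library xmlrpc module (a swapped-in analogue, not RMI itself): the server registers a callable much as bind() registers a remote object, and the client's proxy plays the role of the stub obtained via lookup().

```python
# Remote invocation sketch with xmlrpc: parameters are marshalled into
# a message, sent to the server, unmarshalled, and the result returned.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

server = SimpleXMLRPCServer(("127.0.0.1", 9000), logRequests=False)
server.register_function(lambda a, b: a + b, "add")   # "bind" the remote method

t = threading.Thread(target=server.handle_request)    # serve exactly one call
t.start()

proxy = ServerProxy("http://127.0.0.1:9000")          # client-side "stub"
print(proxy.add(2, 3))                                # remote invocation -> 5
t.join()
```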
e) Goals of RMI
● To minimize the complexity of the application.
● To preserve type safety.
● Distributed garbage collection.
● To minimize the difference between working with local and remote objects.
Logical Clocks
● Logical clocks refer to implementing a protocol on all machines within your distributed
system, so that the machines are able to maintain a consistent ordering of events within
some virtual timespan.
● Distributed systems may have no physically synchronous global clock, so a logical clock
allows global ordering of events from different processes in such systems.
a) Example
When we go out, we make a plan of which place to visit first, which second, and so on; we do
not visit the second place before the first. We always follow the procedure or organization
planned beforehand. In a similar way, we should perform the operations on our PCs one by
one, in an organized way.
Suppose we have more than 10 PCs in a distributed system and every PC is doing its own work;
how do we make them work together? The solution to this is the LOGICAL CLOCK.
Method-1: Synchronize all the physical clocks.
● This means that if one PC has the time 2:00 pm, then every PC should have the same
time, which is not really possible; not every clock can be kept in sync. So we cannot
follow this method.
Method-2: Assign timestamps to events.
● Taking the example into consideration, this means we assign the first place the number 1,
the second place 2, the third place 3, and so on. Then we always know that the first place
comes first, and so on. Similarly, if we give each PC its own number, execution can be
organized so that the 1st PC completes its process first, then the second, and so on. But
timestamps will only work as long as they obey causality.
b) Causality
● Taking a single PC, if two events A and B occur one after the other, then TS(A) < TS(B).
If A has a timestamp of 1, then B should have a timestamp greater than 1; only then does
the happened-before relationship hold.
● Taking two PCs, with event A on P1 (PC 1) and event B on P2 (PC 2), the condition is
again TS(A) < TS(B). For example, suppose you send a message to someone at 2:00:00
pm, and the other person receives it at 2:00:02 pm. Then it is obvious that TS(sender) <
TS(receiver).
● Transitive Relation
If TS(A) < TS(B) and TS(B) < TS(C), then TS(A) < TS(C).
● Concurrent Events
Not every pair of events is ordered one after the other; some events happen
simultaneously, i.e., A || B.
d) Causal ordering
Causal ordering is a vital tool for thinking about distributed systems. Once you understand it, many
other concepts become much simpler.
(i) The fundamental property of distributed systems:
Messages sent between machines may arrive zero or more times at any point after they are
sent. This is the sole reason that building distributed systems is hard.
For example, because of this property it is impossible for two computers communicating over a
network to agree on the exact time. You can send me a message saying "it is now 10:00:00" but I
don't know how long it took for that message to arrive. We can send messages back and forth all
day but we will never know for sure that we are synchronized.
If we can't agree on the time then we can't always agree on what order things happen in. Suppose
I say "my user logged on at 10:00:00" and you say "my user logged on at 10:00:01". Maybe mine
was first or maybe my clock is just fast relative to yours. The only way to know for sure is if
something connects those two events.
For example, if my user logged on and then sent your user an email and if you received that
email before your user logged on then we know for sure that mine was first.
This concept is called causal ordering and is written like this:
A -> B (event A is causally ordered before event B)
Let's define it a little more formally. We model the world as follows: We have a number of
machines on which we observe a series of events. These events are either specific to one
machine (eg user input) or are communications between machines. We define the causal ordering
of these events by three rules:
If A and B happen on the same machine and A happens before B then A -> B
If I send you some message M and you receive it then (send M) -> (recv M)
If A -> B and B -> C then A -> C
On a single machine causal ordering is exactly the same as time ordering (actually, on a multi-
core machine the situation is more complicated, but let's forget about that for now).
Between machines causal ordering is conveyed by messages. Since sending messages is the only
way for machines to affect each other this gives rise to a nice property:
If not(A -> B) then A cannot possibly have caused B
Since we don't have a single global time this is the only thing that allows us to reason about
causality in a distributed system. This is really important so let's say it again:
Communication bounds causality.
The lack of a total global order is not just an accidental property of computer systems, it is
a fundamental property of the laws of physics. I claimed that understanding causal order makes
many other concepts much simpler.
(ii) Clocks
Lamport clocks and Vector clocks are data-structures which efficiently approximate the causal
ordering and so can be used by programs to reason about causality.
If A -> B then LC_A < LC_B
⮚ Lamport clocks
The algorithm follows some simple rules:
● A process increments its counter before each local event (e.g., a message-sending event).
● When a process sends a message, it includes its counter value with the message (after
executing step 1).
● On receiving a message, the counter of the recipient is updated, if necessary, to the
greater of its current counter and the timestamp in the received message. The counter is
then incremented by 1 before the message is considered received.
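A minimal Python sketch of these rules (the class name and the two-process scenario are illustrative):

```python
# Lamport clock: tick before local/send events; on receive, take the
# max of local and message timestamps, then tick.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1               # rule 1: increment before each local event
        return self.time

    def send(self):
        self.time += 1               # sending is itself an event
        return self.time             # this timestamp travels with the message

    def receive(self, msg_time):
        # rule 3: max of local counter and message timestamp, then +1
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()            # a: 1
b.local_event()         # b: 1
print(b.receive(t))     # b: max(1, 1) + 1 = 2, so send -> receive is preserved
```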
⮚ Vector clocks
● A vector clock is a data structure used for determining the partial ordering of events in a
distributed system and detecting causality violations.
● Just as in Lamport timestamps, inter-process messages contain the state of the sending
process's logical clock.
● A vector clock of a system of N processes is an array/vector of N logical clocks, one
clock per process; a local "largest possible values" copy of the global clock-array is kept
in each process.
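A corresponding vector-clock sketch in Python (illustrative names; the happened_before helper implements the standard element-wise comparison):

```python
# Vector clock for N processes: each process increments its own entry
# on local/send events; on receive it takes the element-wise max, then ticks.
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n

    def tick(self):
        self.clock[self.pid] += 1    # own entry grows on a local or send event
        return list(self.clock)      # copy travels with the message

    def receive(self, other):
        # element-wise max with the incoming vector, then tick own entry
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        self.clock[self.pid] += 1

def happened_before(u, v):
    # u -> v iff u <= v element-wise and u != v; otherwise they are concurrent
    return all(a <= b for a, b in zip(u, v)) and u != v

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
msg = p0.tick()                      # p0 sends: [1, 0]
p1.receive(msg)                      # p1 becomes: [1, 1]
print(happened_before(msg, p1.clock))   # True: the send causally precedes the receive
```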
Consistency
When mutable state is distributed over multiple machines each machine can receive update
events at different times and in different orders.
If the final state is dependent on the order of updates then the system must choose a single
serialisation of the events, imposing a global total order.
A distributed system is consistent exactly when the outside world can never observe two
different serialisations.
The CAP theorem states that a distributed system can provide only two of three properties
simultaneously: consistency, availability, and partition tolerance. The theorem formalizes the
trade-off between consistency and availability when there is a partition.
Now that we have a basic understanding of the CAP theorem, let’s break down the acronym and
discuss the meanings of consistency, availability, and partition tolerance.
Consistency
In a consistent system, all nodes see the same data simultaneously. If we perform a read
operation on a consistent system, it should return the value of the most recent write operation.
The read should cause all nodes to return the same data. All users see the same data at the same
time, regardless of the node they connect to. When data is written to a single node, it is then
replicated across the other nodes in the system.
Availability
When availability is present in a distributed system, it means that the system remains
operational all of the time. Every request will get a response regardless of the individual state
of the nodes. This means that the system will operate even if there are multiple nodes down.
Unlike a consistent system, there’s no guarantee that the response will be the most recent
write operation.
Partition tolerance
Partition tolerance means that the system continues to operate even when messages between
nodes are lost or delayed. When components cannot communicate instantly, a machine that
receives an event must either act on what it has already seen or wait for the others:
The first choice risks violating consistency if some other machine makes the same choice with a
different set of events.
The second violates availability by waiting for every other machine that could possibly have
received a conflicting event before performing the requested action.
There is no need for an actual network partition to happen - the trade-off between availability
and consistency exists whenever communication between components is not instant.
Ordering requires waiting
Even your hardware cannot escape this law. It provides the illusion of synchronous access to
memory at the cost of availability. If you want to write fast parallel programs, then you need to
understand the messaging model used by the underlying hardware.
a) Distributed Algorithm:
❖ A distributed system is a collection of independent computers that do not share memory.
❖ Each processor has its own memory, and processors communicate via communication networks.
❖ Communication in such networks is implemented by a process on one machine
communicating with a process on another machine.
❖ Many algorithms used in distributed systems require a coordinator that performs functions
needed by the other processes in the system.
b) Election Algorithms:
➢ Election algorithms choose a process from a group of processes to act as the coordinator.
If the coordinator process crashes for some reason, a new coordinator is elected on
another processor.
➢ An election algorithm assumes that every active process in the system has a unique
priority number.
➢ The process with the highest priority is chosen as the new coordinator. Hence, when a
coordinator fails, this algorithm elects the active process that has the highest priority
number. This number is then sent to every active process in the distributed system.
We have two election algorithms for two different configurations of distributed system.
1. The Bully Algorithm –
This algorithm applies to systems where every process can send a message to every other process
in the system.
Algorithm – Suppose process P sends a message to the coordinator.
1. If the coordinator does not respond within a time interval T, it is assumed that the
coordinator has failed.
2. Process P then sends an election message to every process with a higher priority number.
3. It waits for responses; if no one responds within time interval T, process P elects itself
as coordinator.
4. It then sends a message to all processes with lower priority numbers announcing that it
has been elected as their new coordinator.
5. However, if an answer is received within time T from any other process Q:
○ (I) Process P waits for another time interval T' to receive a message from Q
announcing that Q has been elected as coordinator.
○ (II) If Q does not respond within time interval T', Q is assumed to have failed
and the algorithm is restarted.
Thus, when any process notices that the coordinator is no longer responding to requests, it
initiates an election.
Example:
We start with 6 processes, all directly connected to each other. Process 6 is the leader,
as it has the highest number.
Process 6 fails.
Process 3 notices that Process 6 does not respond. So it starts an election, notifying
those processes with ids greater than 3.
Both Process 4 and Process 5 respond, telling Process 3 that they'll take over from here.
Process 4 sends election messages to both Process 5 and Process 6.
When Process 6 does not respond, Process 5 declares itself the winner.
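A toy Python simulation of this behaviour (a deliberate simplification: timeouts and real message passing are omitted, and alive processes are assumed to always answer):

```python
# Bully election, simplified: any alive process with a higher id "responds"
# and takes over the election; the highest alive id ends up as coordinator.
def bully_election(initiator, alive_ids):
    """Return the id elected when `initiator` starts an election."""
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        return initiator             # nobody higher answered: elect myself
    # A responder takes over and runs its own election in turn; this
    # unwinds until the highest alive process declares itself coordinator.
    return bully_election(max(higher), alive_ids)

alive = {1, 2, 3, 4, 5}              # process 6 has failed
print(bully_election(3, alive))      # -> 5, as in the example above
```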
2. The Ring Algorithm –
This algorithm applies to systems organized as a logical ring, where each process sends
messages to its neighbour.
Algorithm –
1. If process P1 detects a coordinator failure, it creates a new active list, which is initially
empty. It sends an election message to its neighbour on the right and adds the number 1 to
its active list.
2. If process P2 receives an election message from the process on its left, it responds in one
of three ways:
● (I) If the received message does not already contain P2's number in the active list, P2
adds 2 to the active list and forwards the message.
● (II) If this is the first election message P2 has received or sent, it creates a new active
list with the numbers 1 and 2, and then sends election message 1 followed by 2.
● (III) If process P1 receives its own election message 1 back, the active list for P1 now
contains the numbers of all the active processes in the system. Process P1 then picks
the highest priority number from the list and elects it as the new coordinator.
Example:
We start with 6 processes, connected in a logical ring. Process 6 is the leader, as it has the highest
number.
Process 6 fails
Process 3 notices that Process 6 does not respond. So it starts an election, sending a message
containing its id to the next node in the ring.
Process 5 passes the message on, adding its own id to the message.
Process 0 passes the message on, adding its own id to the message.
Process 1 passes the message on, adding its own id to the message.
Process 4 passes the message on, adding its own id to the message.
When Process 3 receives the message back, it knows the message has gone around the ring, as its
own id is in the list. Picking the highest id in the list, it starts the coordinator message "5 is the
leader" around the ring.
Process 0 passes on the coordinator message.
Process 3 receives the coordinator message, and stops it.
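A toy Python simulation of the ring election matching this example (message passing is simplified to walking the ring; the ring order and the failed node are taken from the walkthrough above):

```python
# Ring election, simplified: the election message collects the ids of the
# alive processes around the ring; back at the initiator, the highest id
# in the collected list becomes the coordinator.
def ring_election(ring, alive, initiator):
    active = [initiator]                 # the initiator's own id starts the list
    i = (ring.index(initiator) + 1) % len(ring)
    while ring[i] != initiator:          # pass the message around the ring
        if ring[i] in alive:             # failed nodes are skipped
            active.append(ring[i])       # each alive node appends its own id
        i = (i + 1) % len(ring)
    return max(active)                   # coordinator message: highest id wins

ring = [3, 5, 0, 1, 4, 6]                # logical ring order from the example
print(ring_election(ring, alive={0, 1, 3, 4, 5}, initiator=3))  # -> 5
```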
Distributed Mutual Exclusion
In a single computer system, memory and other resources are shared between different
processes. The status of shared resources and of users is readily available in shared memory,
so the mutual exclusion problem can easily be solved with the help of shared variables (for
example, semaphores).
In distributed systems we have neither shared memory nor a common physical clock, and
therefore we cannot solve the mutual exclusion problem using shared variables. To solve the
mutual exclusion problem in distributed systems, an approach based on message passing is
used.
A site in a distributed system does not have complete information about the state of the
system, due to the lack of shared memory and a common physical clock.
Requirements of mutual exclusion algorithms:
● No Deadlock:
Two or more sites should not endlessly wait for messages that will never arrive.
● No Starvation:
Every site that wants to execute the critical section should get an opportunity to execute it
in finite time. No site should wait indefinitely to execute the critical section while other
sites repeatedly execute it.
● Fairness:
Each site should get a fair chance to execute the critical section. Requests to execute the
critical section must be executed in the order in which they arrive in the system.
● Fault Tolerance:
In case of a failure, the algorithm should be able to recognize the failure by itself and
continue functioning without any disruption.
As we know, shared variables or a local kernel cannot be used to implement mutual exclusion
in distributed systems. Message passing is the way to implement mutual exclusion. Below are
the three approaches based on message passing to implement mutual exclusion in distributed
systems:
1. Token-based approach:
● A unique token is shared among all the sites; a site is allowed to enter its critical section
only if it possesses the token.
● Example: Suzuki-Kasami's Broadcast Algorithm
2. Non-token-based approach:
● A site communicates with other sites in order to determine which site should execute the
critical section next. This requires the exchange of two or more successive rounds of
messages among sites.
● This approach uses timestamps instead of sequence numbers to order requests for the
critical section. Whenever a site makes a request for the critical section, it gets a
timestamp; the timestamp is also used to resolve any conflict between critical section
requests.
● All algorithms which follow the non-token-based approach maintain a logical clock.
Logical clocks get updated according to Lamport's scheme.
● Example: Lamport's algorithm, Ricart-Agrawala algorithm (see the sketch after this list).
3. Quorum-based approach:
● Instead of requesting permission to execute the critical section from all other sites, each
site requests permission only from a subset of sites, called a quorum.
● Any two quorums contain a common site, and this common site is responsible for
ensuring mutual exclusion.
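A simplified, single-process sketch of the Ricart-Agrawala deferral rule mentioned above (an illustration only: real implementations exchange REQUEST/REPLY messages over the network; the class and method names here are invented for the sketch):

```python
# Ricart-Agrawala core rule: a site replies to another site's request
# immediately unless it has an older pending request of its own;
# ties on the timestamp are broken by site id.
class Site:
    def __init__(self, site_id):
        self.site_id = site_id
        self.clock = 0            # Lamport clock
        self.request_ts = None    # timestamp of our own pending CS request

    def request_cs(self):
        self.clock += 1
        self.request_ts = (self.clock, self.site_id)   # totally ordered
        return self.request_ts

    def grants(self, other_ts):
        # Defer the reply only if our own pending request is older.
        return self.request_ts is None or other_ts < self.request_ts

sites = [Site(i) for i in range(3)]
ts1 = sites[1].request_cs()       # site 1 requests the critical section first
ts2 = sites[2].request_cs()       # then site 2 requests it
print(sites[0].grants(ts1))       # True: site 0 has no pending request
print(sites[2].grants(ts1))       # True: site 1's request is older, so it wins
print(sites[1].grants(ts2))       # False: site 1 defers its reply to site 2
```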
Distributed deadlocks can occur when distributed transactions or concurrency control are used
in distributed systems. They may be detected by a distributed technique like edge chasing, or
by constructing a global wait-for graph (WFG) from the local wait-for graphs at a deadlock
detector. Phantom deadlocks are deadlocks that are detected in a distributed system but do not
actually exist, due to internal delays in the system.
In a distributed system, deadlock can neither be prevented nor avoided, because the system is
too vast. As a result, only deadlock detection is possible. Distributed system deadlock
detection techniques must satisfy two requirements:
1. Progress (every deadlock must eventually be detected)
2. Safety (the technique must not report deadlocks that do not exist)
1. Centralized Approach
In the centralized approach, only one node is responsible for detecting deadlock, which makes
it simple and easy to implement. Still, the disadvantages include the excessive workload on
that single node and the single point of failure (the entire detection scheme depends on one
node, and if that node fails, detection fails with it), making the system less reliable.
2. Hierarchical Approach
In the hierarchical approach, nodes are arranged in a hierarchy, and a deadlock is detected by
the lowest node in the hierarchy that has control over all the sites involved in the deadlock.
This spreads the detection work across several nodes while still avoiding a fully global
detector.
3. Distributed Approach
In the distributed technique, various nodes work to detect deadlocks. There is no single point of
failure as the workload is equally spread among all nodes. It also helps to increase the speed of
deadlock detection.
1. There are mainly three approaches to handling deadlocks: deadlock prevention, deadlock
avoidance, and deadlock detection.
2. Handling deadlock becomes more complex in distributed systems since no site has
complete knowledge of the system's present state and every inter-site communication
entails a limited and unpredictable latency.
3. The operating system uses the deadlock avoidance method to determine whether the
system is in a safe or unsafe state. For this, each process must inform the operating system
of the maximum number of resources it may request to complete its execution.
5. In distributed systems, this method is highly inefficient and impractical.
6. Detecting the presence of a cyclic wait requires an examination of the status of
process-resource interactions.
7. The best way to deal with deadlocks in distributed systems appears to be deadlock
detection.
Detecting deadlocks entails tackling two issues: maintenance of the WFG, and searching the
WFG for the presence of cycles.
In a distributed system, a cycle may span multiple sites, so the search for cycles depends
greatly on how the system's WFG is represented across the system.
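Since cycle search is the core of WFG-based detection, here is a small Python sketch of cycle detection on a wait-for graph (the graph and process names are illustrative):

```python
# Deadlock detection on a wait-for graph (WFG): a deadlock corresponds to
# a cycle. Edge u -> v means "process u is waiting for process v".
def find_cycle(wfg):
    visiting, done = set(), set()

    def dfs(u, path):
        visiting.add(u)
        path.append(u)
        for v in wfg.get(u, []):
            if v in visiting:                        # back edge: cycle found
                return path[path.index(v):] + [v]
            if v not in done:
                cycle = dfs(v, path)
                if cycle:
                    return cycle
        visiting.discard(u)
        done.add(u)
        path.pop()
        return None

    for node in list(wfg):
        if node not in done:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None                                      # no cycle: no deadlock

wfg = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P2"]}
print(find_cycle(wfg))   # -> ['P1', 'P2', 'P3', 'P1']: P1, P2, P3 are deadlocked
```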
Resolving a detected deadlock involves rolling back one or more of the deadlocked processes
and giving their resources to the blocked processes in the deadlock so that they can resume
execution.