
UNIT-1: Architectures of Distributed Systems, System Architecture types, issues in distributed
operating systems, communication networks, communication primitives. Theoretical
Foundations, inherent limitations of a distributed system, Lamport's logical clocks, vector
clocks, causal ordering of messages, global state, cuts of a distributed computation, termination
detection. Distributed Mutual Exclusion, introduction, the classification of mutual exclusion and
associated algorithms, a comparative performance analysis.

A distributed system is a collection of computer programs that utilize computational resources
across multiple, separate computation nodes to achieve a common, shared goal. Also known as
distributed computing or distributed databases, it relies on separate nodes to communicate and
synchronize over a common network. These nodes typically represent separate physical
hardware devices but can also represent separate software processes, or other recursive
encapsulated systems. Distributed systems aim to remove bottlenecks or central points of failure
from a system.
Distributed computing systems have the following characteristics:
Resource sharing – A distributed system can share hardware, software, or data
Simultaneous processing – Multiple machines can process the same function simultaneously
Scalability – The computing and processing capacity can scale up as needed when extended to
additional machines
Error detection – Failures can be more easily detected
Transparency – The collection of nodes appears to its users as a single system; any node can
access and communicate with any other node without exposing the physical separation of machines
Communication networks, communication primitives
Communication networks in distributed systems enable nodes to exchange information and
coordinate their actions. Various communication primitives, or basic operations, are used to
facilitate communication between nodes. Here are some common communication primitives and
an overview of communication networks:
Communication Primitives:
Remote Procedure Call (RPC): Allows a process to execute a procedure (function or method)
on another remote process as if it were a local procedure call.
Use Case: Used for invoking functions on remote servers as if they were local.
Message Passing: Processes communicate by exchanging messages. Message passing can be
either synchronous or asynchronous.
Use Case: Commonly used in distributed systems for inter-process communication.
Publish-Subscribe: A messaging pattern where senders (publishers) of messages do not
program the messages to be sent directly to specific receivers (subscribers).
Use Case: Well-suited for event-driven architectures and decoupled communication.
Request-Reply: A communication pattern where a client sends a request message to a server and
waits for a corresponding reply.
Use Case: Often used in client-server architectures for synchronous communication.
Shared Memory: Processes communicate by reading and writing data to a shared memory
location.
Use Case: Common in parallel computing and multi-core systems.
Socket Communication: Processes communicate over a network using sockets, allowing data to
be sent and received between nodes.
Use Case: Fundamental for network communication in distributed systems.
Barrier Synchronization: Processes synchronize at a designated point (barrier) in their
execution, ensuring that all processes reach the barrier before any can proceed.
Use Case: Used for coordinating activities in parallel or distributed systems (see the sketch
after this list).
Transaction: Ensures that a set of operations is executed atomically, consistently, and isolated
from other transactions.
Use Case: Critical for maintaining data consistency in distributed databases.
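As a concrete illustration of one of these primitives, here is a minimal barrier-synchronization
sketch in C using POSIX threads. It is a single-machine toy under assumed conditions (a fixed
thread count, a POSIX system with barrier support, linking with -lpthread), not a distributed
barrier, but the coordination rule is the same: no thread proceeds until all have arrived.

// Minimal sketch: NTHREADS threads meet at a barrier before proceeding
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d: finished phase 1\n", id);

    // No thread passes this point until all NTHREADS threads have arrived
    pthread_barrier_wait(&barrier);

    printf("thread %d: starting phase 2\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    int ids[NTHREADS];

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

A distributed barrier would implement the same rule with messages: each node notifies a
coordinator (or its peers) on arrival and blocks until it learns that all nodes have arrived.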

Communication Networks:
Local Area Network (LAN): Connects nodes in a limited geographical area, such as a single
building or campus.
Wide Area Network (WAN): Spans a larger geographic area, connecting nodes across cities,
countries, or continents.
Internet: A global network of interconnected networks, enabling communication between nodes
worldwide.
Intranet: A private network within an organization that uses internet technologies for internal
communication.
Wireless Networks: Use radio waves or infrared signals for communication, including Wi-Fi,
Bluetooth, and cellular networks.
Satellite Networks: Communication through satellites, suitable for remote or geographically
dispersed areas.
Sensor Networks: Networks of interconnected sensors that collect and transmit data from the
physical world.
Overlay Networks: Virtual networks created on top of existing networks to provide specific
services, such as content delivery networks (CDNs).
Peer-to-Peer Networks: Nodes communicate directly with each other, without a central server,
common in P2P file-sharing systems.
Understanding these communication primitives and networks is crucial for designing and
implementing effective distributed systems, ensuring efficient and reliable communication
between nodes. The choice of communication mechanisms depends on factors like system
requirements, latency constraints, fault tolerance, and scalability needs.

Types of distributed systems.


Client-server: A client-server architecture is broken down into two primary responsibilities. The
client is responsible for the user interface presentation, which then connects over the network to
the server. The server is responsible for handling business logic and state management. A client-
server architecture can easily degrade into a centralized architecture if the server is not made
redundant. A truly distributed client-server setup will have multiple server nodes to distribute
client connections. Most modern client-server architectures are clients that connect to an
encapsulated distributed system on the server.
Multi-tier: A multi-tier architecture expands on the client-server architecture. The server in a
multi-tier architecture is decomposed into further granular nodes, which decouple additional
backend server responsibilities like data processing and data management. These additional
nodes are used to asynchronously process long-running jobs and free up the remaining backend
nodes to focus on responding to client requests, and interfacing with the data store.
Peer-to-peer: In a peer-to-peer distributed system, each node contains the full instance of an
application. There is no node separation of presentation and data processing. A node contains the
presentation layer and data handling layers. The peer nodes may contain the entire state data of
the entire system.
Peer-to-peer systems have the benefit of extreme redundancy. When a peer-to-peer node is
initialized and brought online, it discovers and connects to other peers and synchronizes its local
state with the state from the greater system. This feature means the failure of one node on a peer-
to-peer system won’t disrupt any of the other nodes. It also means that a peer-to-peer system will
persist as long as any of its nodes remain online.
Service-oriented architecture: Service-oriented architecture (SOA) is a predecessor of
microservices. The main difference between SOA and microservices is node scope – the scope of
microservice nodes exists at the feature level. In microservices, a node encapsulates the business
logic to handle a specific feature set, such as payment processing. Microservices contain multiple
disparate business logic nodes that interface with independent database nodes. Comparatively,
SOA nodes encapsulate an entire application or enterprise division. The service boundary for
SOA nodes typically includes an entire database system within the node.
Microservices have emerged as a more popular alternative to SOA due to their benefits.
Microservices are more composable, allowing teams to reuse functionality provided by the small
service nodes. Microservices are more robust and enable more dynamic vertical and horizontal
scaling.
Inherent limitations of a distributed system
Distributed systems offer numerous advantages, including scalability, fault tolerance, and
improved performance. However, they also come with inherent challenges and limitations. Some
of the key limitations of distributed systems include:
Network Delays and Latency: Communication between nodes in a distributed system occurs over
a network, introducing delays and latency. The time taken to transmit data between nodes can
impact the overall system performance.
Consistency and Coherence: Achieving consistency and coherence of data across distributed
nodes can be challenging. Maintaining a globally consistent state in the face of concurrent
updates is a complex problem.
Fault Tolerance: While distributed systems are designed to be resilient to failures, achieving true
fault tolerance can be difficult. Failures in hardware, software, or the network can still occur,
leading to data inconsistency or service disruptions.
Complexity of Programming and Debugging: Developing distributed systems is inherently more
complex than building centralized systems. Programmers need to deal with issues such as
concurrency control, data consistency, and communication protocols, making development and
debugging more challenging.
Security Concerns: Distributed systems are susceptible to various security threats, including
unauthorized access, data interception, and malicious attacks. Ensuring the security of
communication and data across the network is a continuous challenge.
Scalability Challenges: While distributed systems can scale horizontally by adding more nodes,
achieving seamless scalability is not always straightforward. Bottlenecks, resource contention,
and load balancing issues can limit scalability.
Data Integrity and Reliability: Ensuring the integrity and reliability of data in a distributed
environment, especially during concurrent updates and network partitions, requires sophisticated
mechanisms such as distributed transactions and consensus algorithms.
Consensus and Coordination: Achieving consensus among distributed nodes is a critical
challenge. Consensus algorithms, such as the Paxos or Raft protocols, are used to ensure that
nodes agree on a consistent state, but they add complexity to the system.
Lack of Global Clock: In a distributed system, maintaining a global clock across all nodes is
impractical. This lack of a synchronized global clock can make it challenging to reason about the
order of events and timestamps.
Data Partitioning and Distribution: Deciding how to partition and distribute data across nodes is
a crucial aspect of distributed system design. Poor data distribution strategies can lead to
imbalances, affecting performance and scalability.
Dependency on Network Stability: The performance and reliability of distributed systems heavily
depend on the stability and efficiency of the underlying network. Network outages or
degradation can impact the overall system's availability and responsiveness.
Higher Operational Complexity: Operating and maintaining a distributed system is more
complex than managing a centralized system. Tasks such as configuration management,
monitoring, and debugging become more challenging in a distributed environment.
Addressing these limitations often involves the use of advanced algorithms, protocols, and
architectural patterns. Despite these challenges, the benefits of distributed systems often
outweigh the drawbacks in scenarios where scalability, fault tolerance, and performance are
critical requirements.

Distributed systems use cases


Many modern applications utilize distributed systems. High traffic web and mobile applications
are distributed systems. Users connect in a client-server manner, where the client is a web
browser or a mobile application. The server is then its own distributed system. Modern web
servers follow a multi-tier system pattern. A load balancer is used to delegate requests to many
server logic nodes that communicate over message queue systems.
Kubernetes is a popular tool for distributed systems, since it can create a distributed system from
a collection of containers. The containers create nodes of the distributed system and then
Kubernetes orchestrates network communication between the nodes and also handles the
dynamic horizontal and vertical scaling of nodes in the system.
Another good example of distributed systems are cryptocurrencies like Bitcoin and Ethereum,
which are peer-to-peer distributed systems. Every node in a cryptocurrency network is a self-
contained replication of the full history of the currency ledger. When a currency node is brought
online, it bootstraps by connecting to other nodes and downloading its full copy of the ledger.
Additionally, cryptocurrencies have clients or “wallets” that connect to the ledger nodes via
the JSON-RPC protocol.
Design issues of the distributed system:
Heterogeneity: Heterogeneity is applied to the network, computer hardware, operating system,
and implementation of different developers. A key component of the heterogeneous distributed
system client-server environment is middleware. Middleware is a set of services that enables
applications and end-user to interact with each other across a heterogeneous distributed system.
Openness: The openness of the distributed system is determined primarily by the degree to
which new resource-sharing services can be made available to the users. Open systems are
characterized by the fact that their key interfaces are published. It is based on a uniform
communication mechanism and published interface for access to shared resources. It can be
constructed from heterogeneous hardware and software.
Scalability: The scalability of the system should remain efficient even with a significant increase
in the number of users and resources connected. It shouldn’t matter if a program has 10 or 100
nodes; performance shouldn’t vary. A distributed system’s scaling requires consideration of a
number of elements, including size, geography, and management.
Security: The security of an information system has three components: confidentiality, integrity,
and availability. Encryption protects shared resources and keeps sensitive information secret
when transmitted.
Failure Handling: When faults occur in hardware or software, programs may produce incorrect
results or may stop before they have completed the intended computation, so corrective measures
should be implemented to handle this case. Failure handling is difficult in distributed systems
because failure is partial, i.e., some components fail while others continue to function.
Concurrency: There is a possibility that several clients will attempt to access a shared resource
at the same time. Multiple users make requests on the same resources, i.e. read, write, and
update. Each resource must be safe in a concurrent environment. Any object that represents a
shared resource in a distributed system must ensure that it operates correctly in a concurrent
environment.
Transparency: Transparency ensures that the distributed system is perceived as a single entity
by the users or the application programmers, rather than as a collection of cooperating
autonomous systems. The user should be unaware of where the services are located, and the
transfer from a local machine to a remote one should be transparent.

LAMPORT’S LOGICAL CLOCK


Lamport's Logical Clock was created by Leslie Lamport. It is a procedure to determine the order
of events occurring in a distributed system, and it provides a basis for the more advanced vector
clock algorithm. Because there is no global clock in a distributed operating system, Lamport's
logical clock is needed.
Algorithm:
Happened before relation (->): a -> b, means ‘a’ happened before ‘b’.
Logical Clock: The criteria for the logical clocks are:
[C1]: Ci(a) < Ci(b) [Ci -> logical clock of process Pi; if 'a' happened before 'b' within the same
process, then the time of 'a' will be less than the time of 'b'.]
[C2]: Ci(a) < Cj(b) [If 'a' is the sending of a message by process Pi and 'b' is the receipt of
that message by process Pj, then Pi's clock value at 'a' is less than Pj's clock value at 'b'.]
Reference:
Process: Pi
Event: Eij, where i is the process number and j is the jth event in the ith process.
tm: timestamp carried by message m.
Ci: clock associated with process Pi; in the vector-clock extension, the jth element is Ci[j] and
contains Pi's latest value for the current time in process Pj.
d: drift time, generally d is 1.
Implementation Rules[IR]:
[IR1]: If a -> b ['a' happened before 'b' within the same process], then Ci(b) = Ci(a) + d.
[IR2]: On receiving message m with timestamp tm, the receiving process Pj sets Cj = max(Cj, tm + d),
i.e., it takes the maximum of its own clock value and the message's timestamp plus d.
For Example:

Take the starting value as 1, since it is the 1st event and there is no incoming value at the starting
point:
e11 = 1
e21 = 1
The value of the next point will go on increasing by d (d = 1), if there is no incoming value i.e.,
to follow [IR1].
e12 = e11 + d = 1 + 1 = 2
e13 = e12 + d = 2 + 1 = 3
e14 = e13 + d = 3 + 1 = 4
e15 = e14 + d = 4 + 1 = 5
e16 = e15 + d = 5 + 1 = 6
e22 = e21 + d = 1 + 1 = 2
e24 = e23 + d = 3 + 1 = 4
e26 = e25 + d = 6 + 1 = 7
When there is an incoming value, follow [IR2], i.e., take the maximum value
between Cj and tm + d.
e17 = max (7, 5) = 7, [e16 + d = 6 + 1 = 7, e24 + d = 4 + 1 = 5, maximum among 7 and 5 is 7]
e23 = max (3, 3) = 3, [e22 + d = 2 + 1 = 3, e12 + d = 2 + 1 = 3, maximum among 3 and 3 is 3]
e25 = max (5, 6) = 6, [e24 + d = 4 + 1 = 5, e15 + d = 5 + 1 = 6, maximum among 5 and 6 is 6]
Limitation:
By [IR1] and [IR2], if a -> b, then C(a) < C(b) always holds.
The converse, however, does not: C(a) < C(b) does not imply a -> b, so Lamport clocks cannot
distinguish concurrent events from causally ordered ones.

C
// C program to illustrate Lamport's Logical Clock
#include <stdio.h>

// Function to find the maximum timestamp between 2 events
int max1(int a, int b)
{
    // Return the greater of the two
    if (a > b)
        return a;
    else
        return b;
}

// Function to display the logical timestamps of both processes
void display(int e1, int e2, int p1[], int p2[])
{
    int i;

    printf("\nThe time stamps of events in P1:\n");
    // Print the array p1[]
    for (i = 0; i < e1; i++)
        printf("%d ", p1[i]);

    printf("\nThe time stamps of events in P2:\n");
    // Print the array p2[]
    for (i = 0; i < e2; i++)
        printf("%d ", p2[i]);
    printf("\n");
}

// Function to find the timestamps of events
void lamportLogicalClock(int e1, int e2, int m[5][3])
{
    int i, j, k, p1[e1], p2[e2];

    // Initialize p1[] and p2[]: the jth event starts with timestamp j + 1,
    // which is what repeated application of [IR1] with d = 1 yields
    for (i = 0; i < e1; i++)
        p1[i] = i + 1;
    for (i = 0; i < e2; i++)
        p2[i] = i + 1;

    // Print the message-dependency matrix m[][]
    for (i = 0; i < e2; i++)
        printf("\te2%d", i + 1);
    for (i = 0; i < e1; i++) {
        printf("\ne1%d\t", i + 1);
        for (j = 0; j < e2; j++)
            printf("%d\t", m[i][j]);
    }

    // Apply [IR2] for every message: the receiving event takes
    // max(own timestamp, sender's timestamp + 1), and the new value
    // is propagated to the later events of the receiving process
    for (i = 0; i < e1; i++) {
        for (j = 0; j < e2; j++) {
            if (m[i][j] == 1) {   // message sent from e1(i+1) to e2(j+1)
                p2[j] = max1(p2[j], p1[i] + 1);
                for (k = j + 1; k < e2; k++)
                    p2[k] = p2[k - 1] + 1;
            }
            if (m[i][j] == -1) {  // message received by e1(i+1) from e2(j+1)
                p1[i] = max1(p1[i], p2[j] + 1);
                for (k = i + 1; k < e1; k++)
                    p1[k] = p1[k - 1] + 1;
            }
        }
    }

    display(e1, e2, p1, p2);
}

// Driver Code
int main()
{
    int e1 = 5, e2 = 3, m[5][3];

    /* m[i][j] = 1  if a message is sent from e1(i+1) to e2(j+1)
       m[i][j] = -1 if a message is received by e1(i+1) from e2(j+1)
       m[i][j] = 0  otherwise */
    m[0][0] = 0; m[0][1] = 0;  m[0][2] = 0;
    m[1][0] = 0; m[1][1] = 0;  m[1][2] = 1;
    m[2][0] = 0; m[2][1] = 0;  m[2][2] = 0;
    m[3][0] = 0; m[3][1] = 0;  m[3][2] = 0;
    m[4][0] = 0; m[4][1] = -1; m[4][2] = 0;

    lamportLogicalClock(e1, e2, m);
    return 0;
}

Vector Clock
Vector Clock is an algorithm that generates a partial ordering of events and detects causality
violations in a distributed system. These clocks expand on scalar (Lamport) time to facilitate a
causally consistent view of the distributed system: they detect whether one event has causally
affected another event in the system, essentially capturing all the causal relationships. The
algorithm labels every process with a vector (a list of integers) containing one integer for the
local clock of every process within the system. So for N given processes, each process maintains
a vector/array of size N.
Working of vector clock algorithm:
Initially, all the clocks are set to zero.
Every time an internal event occurs in a process, the process increments its own entry in the
vector by 1.
Every time a process sends a message, it increments its own entry in the vector by 1 and attaches
a copy of its entire vector to the message.
Every time a process receives a message, it updates each element of its vector by taking the
maximum of the value in its own vector clock and the corresponding value in the received
message's vector, and then increments its own entry by 1.
Example: Consider processes P1, P2, and P3, each holding a vector of size N = 3 and executing
the rules above.
[Figure omitted: the original diagram shows the vector clocks being updated after internal, send,
and receive events, with arrows indicating the vectors carried by messages between P1, P2, P3.]
To sum up, Vector clocks algorithms are used in distributed systems to provide a causally
consistent ordering of events but the entire Vector is sent to each process for every message sent,
in order to keep the vector clocks in sync.
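A minimal sketch of these update rules in C follows; the constant N and the function names
(vc_event, vc_send, vc_receive) are illustrative assumptions, not a standard API.

// Minimal vector-clock sketch for N processes
#include <stdio.h>
#include <string.h>

#define N 3  // assumed number of processes

// Internal event at process i: increment its own entry
void vc_event(int vc[N], int i)
{
    vc[i]++;
}

// Send at process i: increment own entry, then copy the vector into the message
void vc_send(int vc[N], int i, int msg_ts[N])
{
    vc[i]++;
    memcpy(msg_ts, vc, sizeof(int) * N);
}

// Receive at process i: element-wise maximum, then increment own entry
void vc_receive(int vc[N], int i, const int msg_ts[N])
{
    for (int k = 0; k < N; k++)
        if (msg_ts[k] > vc[k])
            vc[k] = msg_ts[k];
    vc[i]++;
}

int main(void)
{
    int p0[N] = {0}, p1[N] = {0}, msg[N];

    vc_event(p0, 0);         // internal event at P0: {1, 0, 0}
    vc_send(p0, 0, msg);     // P0 sends:             {2, 0, 0}
    vc_receive(p1, 1, msg);  // P1 receives:          {2, 1, 0}

    printf("P1 clock: %d %d %d\n", p1[0], p1[1], p1[2]);
    return 0;
}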

Causal Ordering of Messages in Distributed System


Causal ordering of messages is one of the four semantics of multicast communication namely
unordered, totally ordered, causal, and sync-ordered communication. Multicast communication
methods vary according to the message’s reliability guarantee and ordering guarantee. The
causal ordering of messages describes the causal relationship between a message send event and
a message receive event.
For example, if send(M1) -> send(M2) then every recipient of both the messages M1 and M2
must receive the message M1 before receiving the message M2. In Distributed Systems the
causal ordering of messages is not automatically guaranteed.
Reasons that may lead to violation of causal ordering of messages
 It may happen due to a transmission delay.
 Congestion in the network.
 Failure of a system.
Protocols that are used to provide causal ordering of messages
 Birman-Schiper-Stephenson protocol
 Schiper-Eggli-Sandoz protocol
Both protocols require that messages be delivered reliably, and both assume that there is no
network partitioning between the systems. The general idea of both protocols is to deliver a
message to a process only if the message immediately preceding it in causal order has been
delivered to that process. Otherwise, the message is not delivered immediately; instead, it is
stored in a buffer until the message preceding it has been delivered, as the sketch below shows.
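Here is a minimal sketch of that buffering rule in C, assuming each message carries the sender's
vector timestamp; the constant N and the function name can_deliver are illustrative.

// Delivery test in the style of the Birman-Schiper-Stephenson protocol.
// delivered[k] counts the messages from process Pk delivered so far.
#include <stdio.h>

#define N 3  // assumed number of processes

// A message from 'sender' with vector timestamp ts[] may be delivered only if
// (1) it is the next message expected from the sender, and
// (2) the receiver has already delivered everything the sender had delivered
//     from the other processes when it sent the message.
int can_deliver(const int delivered[N], const int ts[N], int sender)
{
    if (ts[sender] != delivered[sender] + 1)
        return 0;              // an earlier message from the sender is missing
    for (int k = 0; k < N; k++)
        if (k != sender && ts[k] > delivered[k])
            return 0;          // a causally preceding message is missing
    return 1;                  // safe to deliver; otherwise keep it buffered
}

int main(void)
{
    int delivered[N] = {0, 0, 0};
    int ts[N] = {1, 1, 0};  // message from P0 that causally follows one from P1

    // A message from P1 has not been delivered yet, so this one is buffered
    printf("%s\n", can_deliver(delivered, ts, 0) ? "deliver" : "buffer");
    return 0;
}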
Causal ordering of messages in distributed systems
In distributed systems, causal ordering of messages is a concept that ensures a consistent and
meaningful order of events, particularly with respect to the communication between different
processes or nodes. Causal ordering helps capture the cause-and-effect relationship between
events, ensuring that events that are causally related are ordered in a way that reflects their
logical dependencies.
Key Aspects of Causal Ordering:
Happened-Before Relationship: At the core of causal ordering is the "happened-before"
relationship. If event A happened before event B, there is a causal relationship between them.
Definition of Causality: If event A directly causes or influences event B, then A is said to
causally precede B. This relationship may not be based on real-time ordering but rather on
logical dependencies.
Lamport Clocks: Lamport clocks are often used to implement causal ordering. Each process
maintains a logical clock, and events are timestamped with these logical clocks. The logical
clocks do not necessarily reflect real time but provide a consistent ordering based on causality.
Vector Clocks: Vector clocks extend the idea of Lamport clocks by assigning a vector of logical
clocks to each process. The vector captures the causality between processes and helps establish a
partial ordering of events.
Example Scenario:
Consider three processes, A, B, and C, in a distributed system:
Process A sends a message to Process B.
Process B sends a message to Process C.
In this scenario, the causal order is established based on the relationships between the events:
Event A (sending a message from A to B) causally precedes Event B (receiving a message at B).
Event B causally precedes Event C (sending a message from B to C).
As a result, we can establish the causal order: A -> B -> C.
Importance of Causal Ordering:
Consistency: Causal ordering ensures that events are ordered in a way that makes sense from a
logical standpoint, preserving the causal relationships between them.
Concurrency Control: It helps in managing concurrency by providing a way to reason about the
order of events in a distributed system.
Correctness of Algorithms: Many distributed algorithms, such as consensus algorithms or
distributed databases, rely on causal ordering to ensure correctness and consistency in their
operation.
Debugging and Understanding: Causal ordering aids in debugging distributed systems and
understanding the sequence of events during development and troubleshooting.
Implementing causal ordering typically involves the use of logical clocks, vector clocks, or other
timestamping mechanisms to capture the ordering of events based on their causal relationships
rather than real-time ordering.
GLOBAL STATE OF DISTRIBUTED SYSTEMS
In the context of distributed systems, the term "global state" refers to the collective state of all
individual components or nodes within the system at a specific point in time. It encompasses the
local states of each component and the communication channels between them.
Key Aspects of Global State in Distributed Systems:
Local States: The local state of each individual node or process in the distributed system. It
includes the variables, data, and status information that characterize the state of that specific
node.
Communication Channels: Information about the messages and communication between
nodes. This involves tracking the messages sent and received, their content, and the state of
communication channels.
Consistency: The global state is considered consistent when it accurately reflects the combined
local states and communication events across all nodes. Achieving consistency is crucial for
reasoning about the system's behavior.
Snapshot: A snapshot of the global state is essentially a snapshot of the entire distributed system
at a particular moment. It captures the state of each node and the messages in transit, providing a
coherent view of the system.
Use Cases and Importance of Global State:
Debugging and Analysis: Understanding the global state is essential for debugging distributed
systems. Analyzing the global state allows developers to identify issues, trace the flow of
information, and pinpoint the cause of problems.
Monitoring: Monitoring the global state helps in real-time observation of the distributed
system's behavior. This is critical for ensuring that the system operates within desired
parameters.
Consistency Checking: Verifying the consistency of the global state helps in ensuring that the
distributed system adheres to the desired properties and constraints.
Failure Recovery: When a failure occurs in a distributed system, understanding the global state
at the time of failure can be instrumental in recovery processes. It helps in determining the
impact of failures and the state of the system that needs to be restored.
Distributed Algorithms: Some distributed algorithms, especially those related to consensus,
fault-tolerance, and coordination, rely on the concept of a global state to ensure correctness and
reliability.
Challenges in Capturing Global State:
Concurrency and Ordering: Capturing a consistent global state in the presence of concurrency
and non-deterministic ordering of events is a challenging aspect.
Communication Overhead: Collecting and maintaining information about the global state
might introduce communication overhead, especially in large-scale distributed systems.
Scalability: Ensuring efficient and scalable mechanisms for capturing and managing the global
state becomes increasingly challenging as the size of the distributed system grows.
Different approaches, such as distributed snapshot algorithms and global state recording
techniques, are employed to capture and maintain the global state in distributed systems. These
techniques aim to strike a balance between accuracy, efficiency, and scalability, considering the
complexities inherent in distributed environments.

CUTS OF A DISTRIBUTED COMPUTATION


In the context of distributed systems, a "cut" refers to a division of the global state of the system
at a specific point in time. This division separates the events into two categories: those that have
occurred and those that have not. The concept of cuts is particularly relevant when discussing
distributed algorithms, debugging, and analyzing the state of the system.
There are different types of cuts used in distributed systems, including consistent cuts, global
states, and distributed snapshots. Let's explore these concepts:
Consistent Cut: A consistent cut is a special type of cut that maintains the causal relationship
between events. In other words, if event A causally precedes event B, then A must be included in
the cut before B. Consistent cuts help capture the logical dependencies between events.
Achieving a consistent cut can be challenging, especially in the presence of concurrent and
asynchronous events.
Global State: The global state of a distributed system represents the collective state of all its
nodes at a specific point in time. It is essentially a cut that includes the local states of individual
nodes and the communication channels between them.
Global states are crucial for debugging, monitoring, and analyzing the behavior of a distributed
system.
Distributed Snapshot: A distributed snapshot is a collection of consistent cuts taken from each
individual node in the system. It represents a coordinated and consistent view of the entire
system's state.
Distributed snapshot algorithms, such as the Chandy-Lamport snapshot algorithm, help in
capturing a global snapshot of a distributed system; a structural sketch of its marker rules
follows.
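The sketch below shows the Chandy-Lamport marker rules for a single process in C. Every helper
function is a hypothetical placeholder (printf calls stand in for real state capture and channel
I/O); it illustrates the rules, not a real snapshot library.

// Structural sketch of the Chandy-Lamport marker rules for one process
#include <stdio.h>

#define NCHAN 2                 // assumed number of channels in each direction

static int recorded = 0;        // has this process recorded its own state yet?
static int recording[NCHAN];    // recording in-transit messages on channel c?
                                // (consulted by the message-receive path, omitted here)

static void record_local_state(void)  { printf("local state recorded\n"); }
static void send_marker(int c)        { printf("marker sent on channel %d\n", c); }
static void save_channel_state(int c) { printf("channel %d state closed\n", c); }

// Run by the initiator, or by any process on receiving its first marker
static void start_snapshot(void)
{
    record_local_state();
    recorded = 1;
    for (int c = 0; c < NCHAN; c++) {
        send_marker(c);         // send a marker on every outgoing channel
        recording[c] = 1;       // start recording every incoming channel
    }
}

// A marker arrives on incoming channel c
static void on_marker(int c)
{
    if (!recorded)
        start_snapshot();       // first marker: record own state, relay markers;
                                // channel c's recorded state is empty
    else
        save_channel_state(c);  // later marker: channel c's state is the set of
                                // messages recorded since recording began
    recording[c] = 0;           // either way, stop recording channel c
}

int main(void)
{
    start_snapshot();           // act as the initiator, for illustration
    on_marker(0);
    on_marker(1);
    return 0;
}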
Example Scenario:
Consider a distributed system with three processes, A, B, and C. Each process is responsible for
a specific task, and they communicate with each other. A consistent cut might look like the
following:
Events A1, B2, and C3 have occurred.
Messages from A to B and from B to C are in transit.
This cut reflects the state of the system at that particular moment, capturing the events that have
happened and the ongoing communication.
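The consistency condition behind such a cut can be checked mechanically: a cut is consistent
exactly when no message is received inside the cut but sent outside it. A minimal sketch in C,
with illustrative names, assuming the computation is given as a list of messages:

// A cut is inconsistent if some message's receive lies inside the cut
// while its send lies outside (an effect without its cause).
#include <stdio.h>

struct message {
    int send_in_cut;    // 1 if the send event lies inside the cut
    int recv_in_cut;    // 1 if the receive event lies inside the cut
};

int cut_is_consistent(const struct message *msgs, int nmsgs)
{
    for (int i = 0; i < nmsgs; i++)
        if (msgs[i].recv_in_cut && !msgs[i].send_in_cut)
            return 0;   // orphan receive: the cut is inconsistent
    return 1;           // sent-but-unreceived messages are merely in transit
}

int main(void)
{
    // The second message is received inside the cut but sent outside it
    struct message msgs[] = { {1, 1}, {0, 1} };
    printf("consistent? %d\n", cut_is_consistent(msgs, 2));
    return 0;
}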
Importance of Cuts in Distributed Systems:
Debugging: Cuts are essential for debugging distributed systems. They provide snapshots of the
system's state, aiding in the identification and resolution of issues.
Analysis: Analyzing consistent cuts and global states helps in understanding the logical
dependencies between events and the overall behavior of the system.
Concurrency Control: Cuts are used in distributed systems to reason about concurrency and
ensure consistency in the presence of concurrent events.
Distributed Algorithms: Various distributed algorithms rely on the concept of cuts to ensure
correctness and coordination. For example, distributed snapshot algorithms use cuts to capture a
consistent view of the system.
Capturing and understanding cuts are fundamental for designing and analyzing distributed
systems, as they provide insights into the dynamic and concurrent nature of distributed
computations.

TERMINATION DETECTION OF DISTRIBUTED SYSTEMS


Termination detection in distributed systems refers to the process of determining when a
distributed computation has completed or when a certain condition indicating the end of a
computation has been satisfied. Detecting termination is crucial for various reasons, such as
resource management, ensuring that algorithms have completed their tasks, and avoiding
unnecessary waiting.
Approaches to Termination Detection:
Centralized Approaches: In a centralized approach, a central coordinator or manager is
responsible for monitoring the progress of the distributed system and detecting termination.
The central entity collects information from all nodes, and when it determines that the
computation is complete, it signals termination.
Distributed Approaches: Distributed approaches distribute the responsibility of termination
detection among the nodes in the system.
Nodes communicate with each other to exchange information about their states and progress.
Once all nodes agree that termination has occurred, the distributed system is considered
terminated.
Token-Based Approaches: Token-based approaches involve passing a special token or marker
through the processes or nodes in the distributed system.
When a process has completed its part of the computation, it passes the token to another process.
When the token circulates and reaches a designated point, termination is detected (a simplified
sketch appears after this list).
Local Predicates: Local predicates involve each process or node independently determining
whether it has completed its portion of the computation based on local conditions.
Once a process determines locally that it has finished, it communicates this information to other
processes. Termination is declared when all processes have locally satisfied the termination
condition.
Wait-Die Schemes: Wait-die schemes involve processes indicating their willingness to wait for
other processes to complete before declaring termination.
If a process is waiting for another, and the latter has finished its computation, the waiting process
can proceed. This ensures a synchronized termination.
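As a simplified illustration of the token-based approach above, the following C sketch simulates
one circulation of a token around a ring of processes. Real algorithms, such as the
Dijkstra-Feijen-van Gasteren scheme, add token and process colors to handle processes that are
re-activated by messages during the round; that refinement is omitted here.

// One round of token circulation: termination is declared only if the token
// completes a full round without meeting an active process.
#include <stdio.h>

#define N 4

static int active[N] = {0, 0, 0, 0};  // assumed: all processes are now idle

int main(void)
{
    int clean = 1;                    // the round stays "clean" while the token
                                      // meets only passive processes
    for (int p = 0; p < N; p++) {     // one full circulation starting at P0
        if (active[p])
            clean = 0;                // an active process taints the round
        printf("token at P%d (active=%d)\n", p, active[p]);
    }

    if (clean)
        printf("termination detected after one clean round\n");
    else
        printf("still running: initiate another round later\n");
    return 0;
}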
Challenges and Considerations:
Asynchrony: Dealing with the inherent asynchrony in distributed systems can make termination
detection challenging. Processes operate independently, and messages may be delayed or
delivered out of order.
Fault Tolerance: Termination detection mechanisms need to be resilient to faults, ensuring that
the system can correctly detect termination even in the presence of failures.
Dynamic Systems: In dynamic distributed systems where processes can join or leave
dynamically, adapting termination detection mechanisms to changing system configurations is a
challenge.
Overhead: Some termination detection mechanisms may introduce communication overhead,
impacting the overall performance of the distributed system.
Applications:
Distributed Algorithms: Many distributed algorithms require termination detection to ensure
that the system has reached a consistent state or that a particular task has been completed.
Parallel and Distributed Computing: In parallel and distributed computing environments,
termination detection is crucial for managing resources and signaling when computations are
complete.
Distributed Databases: Termination detection is important in distributed databases to ensure
that distributed transactions have completed and that the system is in a consistent state.
Workflow Systems: In workflow management systems, termination detection is used to
determine when a distributed workflow or business process has reached its end.
Termination detection is a critical aspect of distributed systems, and the choice of a particular
approach depends on the characteristics of the system, the application requirements, and the
desired level of coordination among nodes.

DISTRIBUTED MUTUAL EXCLUSION


Distributed Mutual Exclusion (DME) is a fundamental problem in distributed systems where
multiple processes or nodes share resources, and there is a need to ensure that only one process
accesses a shared resource at a time. Mutual exclusion is a critical requirement for preventing
conflicts, ensuring consistency, and avoiding data corruption in a distributed environment.
Several algorithms have been proposed to address the Distributed Mutual Exclusion problem.
Here are some key approaches:
Centralized Approaches: In a centralized approach, a central coordinator or server is
responsible for managing access to the shared resource. Processes communicate with the central
authority to request permission to access the critical section.
While this approach is straightforward, it introduces a single point of failure and can result in
increased communication overhead.
Distributed Token-Based Approaches: Token-based algorithms involve the passing of a
special token or marker among the processes. A process holding the token has the right to access
the shared resource.
Examples include the Suzuki-Kasami broadcast algorithm and Raymond's tree-based algorithm (the
Ricart-Agrawala and Maekawa algorithms discussed below are, by contrast, permission-based).
These algorithms aim to reduce contention and communication overhead.
Quorum-Based Approaches: Quorum-based algorithms divide the set of processes into
quorums, subsets of processes that overlap. A process can only enter its critical section if it has
permission from a quorum.
Because any two quorums overlap, at most one process can hold permission from a full quorum,
ensuring that only one process accesses the shared resource at any given time.
Lease-Based Approaches: Lease-based algorithms grant leases or time-limited permissions to
processes. Processes must periodically renew their leases to maintain access to the critical
section.
If a process fails or doesn't renew its lease, another process can acquire the lease and access the
resource.
Timestamp-Based Approaches: Timestamp-based algorithms use logical clocks or timestamps
to order process requests. Requests with earlier (smaller) timestamps get priority in accessing
the critical section.
This approach ensures a deterministic order of access to the shared resource.
Voting-Based Approaches: In voting-based algorithms, processes cast votes to indicate their
desire to enter the critical section. If a process receives enough votes, it can enter the critical
section.
This approach allows for a democratic decision-making process.
Challenges and Considerations:
Fault Tolerance: Distributed Mutual Exclusion algorithms must be designed to handle failures,
ensuring that the system remains correct even in the presence of crashed processes or
communication failures.
Scalability: As the number of processes in a distributed system grows, scalability becomes a
concern. The algorithm's efficiency and communication overhead must scale gracefully with the
system size.
Latency: Minimizing latency in accessing the critical section is crucial for performance.
Efficient algorithms aim to reduce delays in granting access while maintaining correctness.
Fairness: Ensuring fairness in granting access to the critical section is important. Unfairness can
lead to some processes being starved, waiting for an extended period to access the shared
resource.
Distributed Mutual Exclusion algorithms are essential for building robust and reliable distributed
systems. The choice of algorithm depends on the specific characteristics of the system, including
fault-tolerance requirements, communication patterns, and the nature of the shared resource.

CLASSIFICATION OF MUTUAL EXCLUSION AND ASSOCIATED ALGORITHMS, A
COMPARATIVE PERFORMANCE ANALYSIS
Mutual Exclusion algorithms in distributed systems can be classified based on various criteria
such as centralized vs. decentralized, token-based vs. quorum-based, lease-based vs. timestamp-
based, and voting-based. Here's a brief classification along with some associated algorithms,
followed by considerations for comparative performance analysis:
1. Centralized vs. Decentralized:
Centralized Mutual Exclusion:
A central coordinator or server manages access to the shared resource.
Example Algorithm: Central Server (coordinator) algorithm.
Decentralized Mutual Exclusion:
No central coordinator; processes coordinate among themselves.
Example Algorithms: Ricart-Agrawala, Maekawa. (The Bully algorithm is a leader-election
algorithm rather than a mutual exclusion algorithm.)
2. Token-Based vs. Quorum-Based:
Token-Based Mutual Exclusion:
Processes pass a token to control access to the critical section.
Example Algorithm: Suzuki-Kasami.
Quorum-Based Mutual Exclusion:
Processes form quorums, and access is granted based on quorum agreement.
Example Algorithm: Maekawa's algorithm.
3. Lease-Based vs. Timestamp-Based:
Lease-Based Mutual Exclusion:
Processes acquire leases for a specified time to access the critical section.
Example Algorithm: Leases in the Google File System.
Timestamp-Based Mutual Exclusion:
Processes use logical clocks or timestamps to order requests.
Example Algorithm: Lamport's Timestamp Algorithm.
4. Voting-Based:
Voting-Based Mutual Exclusion:
Processes cast votes to enter the critical section, and majority voting determines access.
Example Algorithm: majority-voting schemes such as Gifford's weighted voting.
Comparative Performance Analysis Considerations:
Latency: Evaluate the average time it takes for a process to enter the critical section after
making a request. Lower latency is desirable.
Throughput: Measure the number of critical section entries per unit time. Higher throughput
indicates better performance.
Fault Tolerance: Assess how well the algorithm handles process failures, network partitions, or
communication failures without compromising correctness.
Scalability: Examine how the algorithm's performance scales with an increasing number of
processes. Ideally, the algorithm should remain efficient as the system size grows.
Fairness: Analyze whether the algorithm ensures fairness, preventing processes from being
starved of access to the critical section.
Message Complexity: Evaluate the number of messages exchanged between processes for
mutual exclusion. Lower message complexity is generally preferable.
Synchronization Overhead: Consider the overhead introduced by synchronization mechanisms.
Evaluate the impact on the overall system performance.
Ease of Implementation: Assess the complexity and ease of implementation of each algorithm.
Simplicity can be an advantage, especially in terms of system maintenance.
Adaptability to Network Conditions: Consider how well the algorithm adapts to varying
network conditions, including latency and message delivery delays.
Trade-offs: Understand the trade-offs involved in terms of fault tolerance, latency, and
overhead. Some algorithms may prioritize certain aspects over others.
Comparative performance analysis should be conducted in the context of specific application
requirements, system configurations, and expected usage patterns. Additionally, it's essential to
consider the impact of factors like network latency, reliability, and fault tolerance on the overall
performance of mutual exclusion algorithms in distributed systems.
What are the problems with the message-passing communication model? How are these problems
handled in Remote Procedure Call? Explain.
The message passing communication model and Remote Procedure Call (RPC) are both
mechanisms for communication between processes or distributed systems. Each approach has its
advantages and challenges. Let's discuss some common problems associated with the message
passing communication model and how RPC addresses some of these issues:
Problems with Message Passing Communication Model:
Complexity of Communication: Implementing communication between processes using low-level
message passing can be complex. Developers need to manage details such as message encoding,
decoding, and network communication explicitly.
Solution: Higher-level abstractions provided by RPC frameworks can simplify communication
by abstracting away many of these low-level details.
Data Serialization: In a message passing model, data sent between processes needs to be
serialized into a format suitable for transmission. This serialization and deserialization process
can introduce overhead and may require careful handling of complex data structures.
Solution: RPC frameworks often include built-in support for data serialization, making it easier
for developers to pass complex data structures without worrying about the underlying details.
Error Handling: In a message passing system, handling errors and ensuring reliable
communication can be challenging. Detecting and recovering from communication failures
requires additional effort.
Solution: RPC frameworks often provide mechanisms for error handling, including support for
exception propagation across distributed systems. This helps in managing failures and ensuring
more robust communication.
Service Discovery: In a message passing system, processes need to discover the location and
availability of other processes for communication. This can become a challenge, especially in
dynamic and large-scale distributed environments.
Solution: Some RPC frameworks incorporate service discovery mechanisms, making it easier for
processes to locate and communicate with remote services dynamically.
How RPC Addresses These Problems:
Abstraction: RPC provides a higher-level abstraction for communication compared to low-level
message passing. Developers can define remote procedures and invoke them without dealing
with the intricacies of message encoding and network communication.
Data Serialization: RPC frameworks often include built-in support for data serialization,
simplifying the process of passing complex data structures between processes. This helps in
reducing the overhead associated with manual serialization/deserialization.
Error Handling: RPC frameworks typically provide mechanisms for handling errors, including
the ability to propagate exceptions between the client and server. This enhances the reliability
and robustness of the communication.
Service Discovery: Many RPC frameworks incorporate service discovery mechanisms or can be
integrated with service discovery tools, simplifying the process of locating and connecting with
remote services.
In summary, while the message passing communication model can be powerful, RPC offers a
higher-level abstraction that simplifies many aspects of distributed communication, making it
more convenient for developers to build and maintain distributed systems.
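To make the abstraction concrete, here is a toy sketch in C of a client-side stub: the caller
invokes remote_add as if it were a local function, while the stub marshals the arguments into a
byte buffer. The transport is faked by a local function call, and every name in the sketch is
hypothetical rather than taken from a real RPC framework.

// What an RPC stub hides: marshaling, transport, and dispatch
#include <stdio.h>
#include <string.h>

// "Server side": the procedure actually being invoked
static int add_impl(int a, int b) { return a + b; }

// Fake transport: in a real framework this buffer would cross the network
static int fake_transport(const unsigned char *req, size_t len)
{
    int a, b;
    (void)len;                              // a real transport would send len bytes
    memcpy(&a, req, sizeof a);              // unmarshal the arguments
    memcpy(&b, req + sizeof a, sizeof b);
    return add_impl(a, b);
}

// Client-side stub: looks like a local call, hides marshaling and transport
static int remote_add(int a, int b)
{
    unsigned char req[2 * sizeof(int)];
    memcpy(req, &a, sizeof a);              // marshal the arguments
    memcpy(req + sizeof a, &b, sizeof b);
    return fake_transport(req, sizeof req);
}

int main(void)
{
    printf("remote_add(2, 3) = %d\n", remote_add(2, 3));
    return 0;
}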

How to achieve mutual exclusion with non-token based algorithms for distributed systems?
Achieving mutual exclusion in a distributed system without using tokens (token-based
algorithms often involve passing a special token among processes to control access to a critical
section) can be challenging but is certainly possible. Non-token-based algorithms typically rely
on other mechanisms, such as logical clocks or timestamps, to coordinate processes and enforce
mutual exclusion. Here's an overview of two well-known non-token-based algorithms for
achieving mutual exclusion:
Ricart-Agrawala Algorithm: Proposed by Glenn Ricart and Ashok Agrawala, this algorithm is
designed to achieve mutual exclusion in a distributed system without using tokens.
It is based on the concept of requesting and receiving permission from other processes before
entering a critical section.
Algorithm Steps:
Requesting Permission: When a process wants to enter the critical section, it sends a request to
all other processes in the system. The request includes the process's timestamp or logical clock
value.
Receiving Permission: When a process receives a request, it replies immediately if it is neither
in its critical section nor requesting it, or if it is also requesting but the incoming request has
higher priority (an earlier timestamp, with process ids breaking ties). Otherwise, i.e., if it is
inside its critical section or its own pending request has priority, it queues the incoming request.
Entering Critical Section: A process can enter the critical section when it has received replies
from all other processes, granting permission.
Exiting Critical Section: After completing the critical section, the process sends the deferred
replies to all queued requests, allowing those processes to proceed toward their critical sections.
Properties:
Mutual exclusion is achieved because a process can only enter the critical section when it has
received permission from all other processes.
Deadlock-free, but it assumes a reliable communication network.
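A minimal sketch of the reply decision at the heart of this algorithm follows, assuming requests
carry (Lamport timestamp, process id) pairs with ties broken by process id; the state-variable
names are illustrative.

// Ricart-Agrawala reply decision for one process
#include <stdio.h>

static int in_cs = 0;       // currently inside the critical section?
static int want_cs = 0;     // do we have an outstanding request of our own?
static int my_req_ts = 0;   // timestamp of our own pending request
static int my_id = 1;

// Returns 1 if we must reply immediately to request (ts, id), 0 if we defer it
static int reply_now(int ts, int id)
{
    if (in_cs)
        return 0;                           // defer until we leave the CS
    if (!want_cs)
        return 1;                           // not competing: reply at once
    // Both want the CS: the lower (ts, id) pair has priority
    if (ts < my_req_ts || (ts == my_req_ts && id < my_id))
        return 1;                           // the requester wins, reply now
    return 0;                               // we win, defer the request
}

int main(void)
{
    want_cs = 1;
    my_req_ts = 5;
    printf("request (ts=3, id=2): %s\n", reply_now(3, 2) ? "reply" : "defer");
    printf("request (ts=7, id=2): %s\n", reply_now(7, 2) ? "reply" : "defer");
    return 0;
}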
Maekawa's Algorithm: Proposed by M. Maekawa, this algorithm is designed to achieve mutual
exclusion with a reduced number of messages compared to Ricart-Agrawala. It uses a voting
mechanism among a subset of processes.
Algorithm Steps:
Requesting Votes: When a process wants to enter the critical section, it sends a request to a
subset of processes. The subset is determined based on a predefined set of relationships between
processes.
Voting: Each process in the subset votes on whether the requesting process can enter the critical
section; each voter grants its vote to only one requester at a time.
Entering Critical Section: The requesting process can enter the critical section once it has
received positive votes from all members of its subset (quorum).
Exiting Critical Section:
After completing the critical section, the process sends release messages to the processes that
voted positively, allowing them to enter their critical sections.
Properties:
 Mutual exclusion is achieved through a voting mechanism.
 Reduces the number of messages compared to Ricart-Agrawala.

These algorithms demonstrate that it is possible to achieve mutual exclusion in a distributed
system without relying on tokens. However, the choice of algorithm may depend on specific
requirements, communication patterns, and system characteristics.

Define advanced operating systems. Discuss clearly about i) Operating principles of
architecture-driven advanced operating systems. ii) Application-driven advanced operating
systems.
Advanced operating systems refer to sophisticated and feature-rich software that provides a
platform for running applications and managing hardware resources in a computer system. These
operating systems go beyond basic functionalities and often incorporate advanced features to
enhance performance, scalability, security, and user experience. Let's discuss the operating
principles of architecture-driven advanced operating systems and application-driven advanced
operating systems:
i) Operating Principles of Architecture-Driven Advanced Operating Systems:
Microkernel Architecture: Divide the operating system into small, modular components, where
the essential functions are placed in a microkernel. Additional services, like file systems or
device drivers, are implemented as user-level processes.
Advantages: Flexibility, modularity, and the ability to upgrade or replace components without
affecting the entire system.
Distributed Systems: Extend the operating system across multiple machines, creating a
distributed operating system. Nodes in the system communicate to share resources and provide a
unified computing environment.
Advantages: Improved resource utilization, fault tolerance, and scalability.
Parallelism and Multithreading: Exploit parallelism in both hardware and software to enhance
performance. Multithreading allows multiple threads to execute concurrently within a process.
Advantages: Increased throughput, responsiveness, and efficient utilization of multicore
processors.
Virtualization: Create virtual instances of hardware resources, such as virtual machines or
containers, to provide isolated environments for applications. This allows multiple operating
systems or applications to run on the same physical hardware.
Advantages: Resource consolidation, improved flexibility, and efficient use of infrastructure.
Real-time Systems: Meet strict timing requirements for critical tasks in real-time applications,
such as industrial control systems or embedded systems.
Advantages: Predictable and timely response to events, critical for applications with stringent
timing constraints.
ii) Application-Driven Advanced Operating Systems:
Specialized Operating Systems: Design operating systems tailored for specific applications or
domains. Examples include real-time operating systems (RTOS), network operating systems, or
multimedia operating systems.
Advantages: Optimization for specific workloads, improved performance, and resource
efficiency.
Mobile and Embedded Systems: Adapt the operating system to the constraints of mobile devices
and embedded systems, considering factors like power consumption, limited resources, and
mobility.
Advantages: Optimized resource usage, extended battery life, and efficient operation in resource-
constrained environments.
Middleware and Service-Oriented Architectures: Incorporate middleware and service-oriented
architectures to support distributed and networked applications. Services can be dynamically
discovered and utilized.
Advantages: Improved flexibility, scalability, and ease of development for distributed
applications.
Cloud Operating Systems: Address the unique challenges of cloud computing environments,
such as resource provisioning, scalability, and dynamic workload management.
Advantages: Efficient resource utilization, on-demand scalability, and seamless integration with
cloud services.
Security and Trustworthy Computing: Integrate advanced security features, such as secure boot,
encryption, and access controls, to protect against various threats.
Advantages: Enhanced data protection, privacy, and resilience against cyber attacks.
In summary, architecture-driven advanced operating systems focus on the underlying structure
and design principles to provide a flexible and extensible foundation. Application-driven
advanced operating systems, on the other hand, tailor their features to meet the specific
requirements of diverse application domains, ensuring optimal performance and resource
utilization.

EXPLAIN IN DETAIL ABOUT THE ARCHITECTURE OF DISTRIBUTED
OPERATING SYSTEM.
A distributed operating system (DOS) is an operating system that runs on multiple machines and
enables them to work together as a single system. The architecture of a distributed operating
system is designed to manage and coordinate the resources, services, and processes across a
network of interconnected computers. Here are key components and concepts in the architecture
of a distributed operating system:
1. Network Communication:
Inter-Process Communication (IPC): Facilitates communication between processes running on
different machines. This involves message passing, remote procedure calls (RPC), and other
communication mechanisms.
Network Protocols: Distributed systems use network protocols like TCP/IP to ensure reliable
and efficient communication between nodes.
2. Resource Management:
Process Management: The distributed OS manages processes across multiple nodes. This
includes process creation, scheduling, and termination.
Memory Management: Distributes memory resources across machines and ensures efficient
utilization.
File System: A distributed file system allows access to files distributed across different nodes.
Examples include Google File System (GFS) and Hadoop Distributed File System (HDFS).
3. Communication Middleware:
Middleware Services: Middleware provides essential services for communication, such as
message queuing, naming services, and distributed object communication.
Remote Procedure Call (RPC): Allows a process to execute procedures on another machine as
if they were local. Middleware handles the details of marshaling and unmarshaling data.
4. Naming and Directory Services:
Name Resolution: Resolving names to addresses is crucial in a distributed environment.
Naming services provide a way to identify resources and processes across the network.
Directory Services: Centralized or distributed directories store information about resources,
facilitating efficient name resolution.
5. Security and Authentication:
Authentication: Ensures that processes and users are who they claim to be. Secure
communication protocols and authentication mechanisms are crucial in a distributed
environment.
Access Control: Distributed systems employ access control mechanisms to regulate access to
resources and data.
6. Distributed Synchronization:
Clock Synchronization: Ensures that clocks across different nodes are synchronized, enabling
coordination and ordering of events.
Distributed Mutual Exclusion: Ensures that only one process accesses a critical section at a
time, even in a distributed environment.
7. Fault Tolerance:
Replication: Replicating data or processes across multiple nodes ensures fault tolerance and
high availability.
Checkpointing: Periodically saving the state of a distributed system allows for recovery in case
of failures.
8. Load Balancing:
Load Distribution: Distributes the load of tasks evenly across machines to ensure optimal
resource utilization.
Dynamic Resource Allocation: Adjusts resources based on changing workloads.
9. Consistency and Replication:
Consistency Models: Define how distributed systems ensure consistency among replicas of
data. Examples include eventual consistency and strong consistency.
Replication Control: Strategies for managing data replicas, ensuring consistency and fault
tolerance.
10. Distributed Coordination:
Consensus Algorithms: Achieving consensus among distributed nodes is critical for decision-
making. Examples include the Paxos algorithm and Raft protocol.
Atomic Transactions: Ensures that a series of operations either complete entirely or have no
effect.
11. Scalability and Performance:
Scalability Strategies: Distributed systems must be designed to scale horizontally, adding more
machines to handle increased load.
Performance Monitoring: Tools for monitoring and optimizing the performance of distributed
systems.
12. User Interface and Interaction:
User-Level Interface: Provides users with a unified view of the distributed system, abstracting
complexities.
Application Programming Interface (API): Enables developers to build distributed
applications using standardized interfaces.
13. Distributed Algorithms:
Algorithms for Coordination: Distributed systems often employ advanced algorithms for
coordination, consensus, and synchronization.
14. Global State and Event Ordering:
Global State Monitoring: Tools and mechanisms for monitoring the global state of a distributed
system.
Event Ordering: Establishes a consistent order of events in a distributed environment.
The architecture of a distributed operating system is complex and involves a combination of
various components to ensure efficient communication, resource management, fault tolerance,
and coordination across a network of interconnected machines. Designing and implementing a
distributed operating system requires addressing the challenges posed by the distributed nature of
the environment, including issues like network delays, failures, and scalability concerns.
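To make the network-communication component concrete, here is a minimal message-passing sketch in Python (the host, port, and message format are assumptions for illustration), showing one node receiving a JSON-encoded, newline-delimited message and acknowledging it:

import json
import socket

# Server side: accept one connection, read one message, send a reply.
def run_server(host="localhost", port=5000):
    with socket.create_server((host, port)) as srv:
        conn, _addr = srv.accept()
        with conn:
            line = conn.makefile("r").readline()            # one newline-delimited message
            msg = json.loads(line)                          # deserialize
            reply = {"ack": msg["seq"]}                     # acknowledge by sequence number
            conn.sendall((json.dumps(reply) + "\n").encode())

# Client side: send a message and wait for the acknowledgement.
def send_message(host="localhost", port=5000):
    with socket.create_connection((host, port)) as conn:
        conn.sendall((json.dumps({"seq": 1, "body": "hello"}) + "\n").encode())
        print(json.loads(conn.makefile("r").readline()))    # prints {'ack': 1}

Even this tiny example shows the work an application must do by hand in a pure message-passing setting: choosing a wire format, framing messages, and serializing and deserializing data.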
EXPLAIN IN BRIEF ABOUT LAMPORT'S LOGICAL CLOCK WITH SUITABLE EXAMPLES.
Lamport's Logical Clock is a concept introduced by computer scientist Leslie Lamport to order
events in a distributed system. It provides a partial ordering of events that reflects the causality
between them, even in the absence of a global clock. Lamport's Logical Clock is often used to
establish a consistent sequence of events in a distributed system.
Lamport Clock Rules:
Initialization: Each process in the system has its own logical clock, initially set to 0.
Event Timestamping: Every time a process experiences an internal event or sends a message, it
increments its logical clock by 1.
Message Reception: When a process receives a message, it sets its logical clock to max(its own logical clock, the timestamp carried by the message) + 1.
Example:
Consider two processes, A and B, in a distributed system. The logical clocks for each process are
initialized to 0.
Events and Timestamps: Process A sends a message to Process B. After sending the message,
Process A increments its logical clock from 0 to 1.
A sends a message (timestamp: 1)
Process B receives the message. Upon receiving the message, Process B sets its logical clock to
the maximum of its current logical clock (0) and the received timestamp (1) plus 1. Thus, B's
logical clock becomes 2.
B receives the message (timestamp: 2)
Independent Events: If Process B performs an internal event (e.g., a local computation) without
any message involvement, it increments its logical clock.
B performs an internal event (timestamp: 3)
Another Message Exchange: If Process B later sends a message to Process A, it increments its logical clock from 3 to 4 and attaches that timestamp to the message.
B sends another message (timestamp: 4)
When Process A receives the message, it updates its logical clock to 5 (the maximum of its
current logical clock and the received timestamp plus 1).
A receives another message (timestamp: 5)
Observations: Lamport's Logical Clock ensures that events are partially ordered based on causality. If Event A causally precedes Event B, then the timestamp of A is less than the timestamp of B.
If two events are independent (neither one causally precedes the other), their timestamps may not
reflect their real-time order.
The logical clocks help establish a consistent, causality-respecting ordering of events in a distributed system, even when the processes operate independently and do not share a global clock (a total order can be obtained by breaking timestamp ties with process identifiers).
Lamport's Logical Clock provides a simple and effective way to reason about the ordering of
events in a distributed system, especially in scenarios where a fully synchronized global clock is
not feasible.
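The three rules above can be captured in a few lines of code. The following is a minimal sketch (the class and method names are illustrative, not from any particular library):

# Minimal Lamport logical clock sketch implementing the three rules above.
class LamportClock:
    def __init__(self):
        self.time = 0                              # Rule 1: clocks start at 0

    def internal_event(self):
        self.time += 1                             # Rule 2: increment on a local event
        return self.time

    def send_event(self):
        self.time += 1                             # Rule 2: increment before sending
        return self.time                           # timestamp carried by the message

    def receive_event(self, msg_time):
        self.time = max(self.time, msg_time) + 1   # Rule 3: max(own, received) + 1
        return self.time

# Reproducing the example: A sends (ts 1), B receives (ts 2), B internal (ts 3)
a, b = LamportClock(), LamportClock()
ts = a.send_event()                                # 1
print(b.receive_event(ts))                         # 2
print(b.internal_event())                          # 3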
WHAT IS THE IMPORTANCE OF SEQUENCE NUMBER IN TOKEN BASED DISTRIBUTED MUTUAL EXCLUSION ALGORITHMS? EXPLAIN IN DETAIL.
In token-based distributed mutual exclusion algorithms, sequence numbers play a crucial role in
establishing a logical order among the processes and ensuring a fair and orderly access to a
shared resource. These sequence numbers are assigned to processes and are used to determine
the priority or eligibility of a process to enter the critical section. The importance of sequence
numbers lies in achieving a systematic and coordinated access to the shared resource. Here's a
detailed explanation of the significance of sequence numbers:
1. Ordering and Priority: Sequence numbers are assigned to processes in a way that establishes a
logical order or priority among them. The order is typically based on the numerical value of the
sequence numbers.
A process with a lower sequence number has a higher priority or precedence over a process with
a higher sequence number.
2. Token Passing Mechanism: The token is a special marker that circulates among the processes.
The possession of the token grants a process the right to enter the critical section.
Sequence numbers are used in the token passing mechanism to determine the order in which
processes receive and release the token.
3. Token Request and Release: When a process wishes to enter the critical section, it sends a
token request message to the other processes. The request includes the process's own sequence
number.
Processes compare the incoming request with their own sequence numbers to determine if the
requesting process has a higher priority.
4. Token Distribution: The token is passed to the process with the lowest sequence number
among those requesting it. This ensures that the process with the highest priority gains access to
the critical section first.
5. Fairness and Avoidance of Starvation: Sequence numbers help in achieving fairness by
establishing a systematic order for process access. The token is passed in an orderly fashion,
preventing any process from being starved of access to the critical section indefinitely.
6. Avoidance of Deadlocks: Sequence numbers contribute to the prevention of deadlocks by
ensuring that processes request and release the token in a well-defined order. Deadlocks can
occur if processes do not agree on a common order.
7. Dynamic Addition of Processes: Sequence numbers facilitate the dynamic addition of
processes to the system. New processes can be assigned sequence numbers based on the existing
order, ensuring that they integrate seamlessly into the token passing mechanism.
8. Deterministic Execution: The use of sequence numbers provides determinism in the execution
of the distributed mutual exclusion algorithm. The order in which processes access the critical
section is predictable and follows a logical sequence.
Example Scenario: Consider three processes, P1, P2, and P3, with sequence numbers 5, 7, and 2, respectively, all requesting the token, which is initially with P1. When P1 releases the token, P3 receives it first because 2 is the lowest sequence number among the requesters. When P3 releases the token, P1 (sequence number 5) is served before P2 (sequence number 7). This cycle continues, ensuring a fair and orderly access to the critical section.
In summary, sequence numbers in token-based distributed mutual exclusion algorithms are
instrumental in establishing a logical order among processes, determining priority, ensuring
fairness, preventing deadlocks, and facilitating the systematic passing of the token. They
contribute to the overall effectiveness and reliability of the algorithm in coordinating access to
shared resources in a distributed system.
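The token-distribution rule in the scenario above can be sketched as a small simulation; the function and data layout below are illustrative assumptions rather than a specific published algorithm:

# Sketch: pending requesters are served in ascending sequence-number order.
import heapq

def token_service_order(requests):
    """requests: list of (sequence_number, process_name) pairs."""
    heapq.heapify(requests)                    # min-heap keyed on sequence number
    order = []
    while requests:
        _seq, proc = heapq.heappop(requests)   # lowest sequence number first
        order.append(proc)                     # this process holds the token next
    return order

# P1, P2, P3 with sequence numbers 5, 7, 2: token goes to P3, then P1, then P2
print(token_service_order([(5, "P1"), (7, "P2"), (2, "P3")]))  # ['P3', 'P1', 'P2']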
WHAT ARE THE PROBLEMS WITH THE MESSAGE-PASSING COMMUNICATION MODEL? HOW ARE THESE PROBLEMS HANDLED IN REMOTE PROCEDURE CALL? EXPLAIN.
The message-passing communication model, where processes communicate by sending and
receiving messages, is widely used in distributed systems. While this model provides a clear and
intuitive way for processes to exchange information, there are several challenges associated with
it. Remote Procedure Call (RPC) is a higher-level abstraction built on top of message passing
that aims to address some of these challenges. Let's explore the problems with the message-
passing model and how RPC helps handle them:
Problems with Message-Passing Communication Model:
Complexity: Writing communication code using low-level message-passing APIs can be
complex and error-prone. Developers need to manage the details of message creation,
serialization, and deserialization.
Data Representation: In a message-passing model, processes need to agree on a common data
representation for messages. This can be challenging when systems with different architectures
or languages need to communicate.
Error Handling: Handling errors and exceptions in message passing requires additional
messaging for error reporting and may complicate the overall communication code.
Resource Management: Managing resources such as sockets, ports, and buffers manually can
be tedious, leading to potential resource leaks or inefficiencies.
Scalability: As the number of processes and communication links increases, managing the
communication relationships and dealing with potential bottlenecks become more challenging.
Remote Procedure Call (RPC) as a Solution:
RPC is a high-level communication abstraction that allows processes to invoke procedures or
functions on remote machines as if they were local. It addresses some of the problems associated
with the message-passing model:
Abstraction: RPC provides a higher-level abstraction where developers can invoke remote
procedures in a manner similar to calling local procedures. This abstraction hides many of the
complexities of low-level message passing.
Interface Definition: RPC typically uses an Interface Definition Language (IDL) to define the
interfaces and data structures shared between client and server. This ensures a common
understanding of data representations.
Stub Code Generation: RPC frameworks often provide tools to generate stub code
automatically. Stubs handle the marshaling and unmarshaling of data, simplifying the
development process.
Error Handling: RPC frameworks often include mechanisms for handling errors transparently.
For example, they may provide exceptions or error return codes to indicate problems during
remote procedure invocation.
Resource Management: RPC frameworks manage many low-level details such as resource
allocation and deallocation, making it easier for developers to focus on application logic.
Scalability: RPC can be built on top of various communication mechanisms, such as HTTP,
which allows for easier integration with existing network infrastructure. This can enhance
scalability and interoperability.
How RPC Works:
Interface Definition: Developers define interfaces using IDL, specifying the procedures that can
be called remotely and the data structures used in communication.
Stub Code Generation: Tools generate client and server stub code from the interface definition.
The client stub handles the marshaling of parameters and sends the request to the server. The
server stub receives the request, unpacks the parameters, and calls the actual procedure.
Data Marshaling: Parameters and return values are automatically marshaled and unmarshaled
by the stub code, handling the conversion between local and network representations.
Communication: RPC can use various communication protocols (e.g., HTTP, TCP) to transmit
messages between the client and server. This abstracts away the low-level details from the
developer.
Transparency: From the developer's perspective, calling a remote procedure is similar to calling
a local one. The complexity of message passing, serialization, and deserialization is hidden
behind the RPC abstraction.
While RPC simplifies many aspects of distributed communication, it's essential to be aware that
it introduces its own set of challenges, such as dealing with partial failures, ensuring
idempotency, and handling distributed transactions. Different RPC frameworks and
implementations address these challenges in various ways.
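As a concrete illustration of this abstraction, Python's standard-library xmlrpc module lets a client invoke a remote procedure almost as if it were local; the port and procedure name below are assumptions for the sketch:

# Server: registers an ordinary function and exposes it over RPC.
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")   # remote name "add"
server.serve_forever()                 # marshaling/unmarshaling handled by the library

Run in a separate process, the client sees only a proxy object, which plays the role of the client stub:

from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))                 # looks like a local call; prints 5

Serialization, framing, and transport (HTTP in this case) are hidden behind the proxy, which is exactly the simplification RPC promises over raw message passing.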
IN HOW MANY WAYS CAN AN OPERATING SYSTEM BE DESIGNED WITH SEPARATION OF POLICIES AND MECHANISMS? EXPLAIN EACH DESIGN APPROACH IN DETAIL.
The separation of policies and mechanisms is a design principle in operating systems that
involves decoupling the decision-making policies from the low-level implementation
mechanisms. This separation allows for greater flexibility, modularity, and adaptability in the
design of operating systems. There are several ways to achieve this separation, and each
approach emphasizes different aspects of the policy-mechanism distinction. Here are three
common design approaches:
1. Explicit Interface Design: In this approach, the operating system provides explicit interfaces
or APIs (Application Programming Interfaces) for applications and users to interact with the
system. The interfaces encapsulate the mechanisms used to implement various services, and
policies are left to be determined by the applications or users.
Key Characteristics: Well-defined interfaces: The operating system exposes well-defined
interfaces for processes to interact with the system.
Customizable policies: Applications can implement their own policies using the provided
interfaces.
Example: File system interfaces where applications can read, write, and manage files. The
policies regarding file organization, caching, or allocation are left to the application or higher-
level software.
2. Microkernel Architecture: The microkernel architecture is a modular approach where the core
functionality of the operating system is kept minimal, and additional services are implemented as
separate user-level processes or modules. This approach moves many traditional operating
system services out of the kernel into user space.
Key Characteristics: Minimal kernel: The kernel provides only essential services like process
scheduling, interprocess communication, and basic memory management.
Extensible services: Additional services such as file systems, device drivers, and networking are
implemented as separate user-level processes.
Example: In the Mach microkernel, file systems, networking, and other services are
implemented as separate servers running in user space. This allows for easier customization and
replacement of individual services without modifying the kernel.
3. Policy-Driven Design: In a policy-driven design, the operating system is designed to allow
dynamic changes to policies without modifying the underlying mechanisms. Policies are
specified separately from the mechanisms and can be configured or modified at runtime.
Key Characteristics: Policy configurability: Users or administrators can configure policies
without modifying the system's core code.
Run-time adaptability: Changes to policies take effect dynamically without requiring system
restarts.
Example: Linux's sysctl allows users to dynamically adjust kernel parameters at runtime. For
example, changing the scheduling policy, network parameters, or memory management
parameters without recompiling the kernel.
Advantages of Separation of Policies and Mechanisms:
Flexibility and Adaptability: Separating policies from mechanisms allows for easy customization and adaptation of system behavior without modifying the underlying code.
Modularity: Modularity is enhanced, making it easier to replace or upgrade individual
components or services without affecting the entire system.
Ease of Maintenance: Maintenance becomes more manageable as changes to policies can be
isolated from changes to core mechanisms. This reduces the risk of unintended side effects.
Interchangeability: Different policies can be easily interchanged or experimented with,
providing a way to optimize system behavior for different scenarios or user requirements.
Enhanced Security: Security policies can be configured independently of the underlying
mechanisms, allowing for easier adaptation to evolving security requirements.
Challenges:
Complexity: Achieving a clean separation of policies and mechanisms can introduce additional
complexity, and striking the right balance is crucial.
Performance Overhead: In some cases, the separation might introduce additional layers of
abstraction, potentially leading to a performance overhead.
Consistency: Ensuring consistency between different policies and mechanisms can be
challenging, especially in complex systems.
The separation of policies and mechanisms is a powerful design principle that promotes
flexibility and adaptability in operating systems. Different approaches cater to specific design
goals and requirements, and the choice of an approach depends on the system's intended use and
characteristics.
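The principle itself can be illustrated in a few lines. In this minimal sketch (all names are illustrative), the dispatch mechanism is fixed while the scheduling policy is a pluggable function:

# Mechanism: a dispatch loop that runs tasks; it never decides the order itself.
def dispatch(tasks, policy):
    while tasks:
        task = policy(tasks)           # the policy decides *which* task runs next
        tasks.remove(task)
        print("running", task["name"])

# Policies: interchangeable decision rules over the same mechanism.
fifo = lambda tasks: tasks[0]                                      # first-come-first-served
highest_priority = lambda tasks: max(tasks, key=lambda t: t["prio"])

jobs = [{"name": "log-rotate", "prio": 1}, {"name": "db-backup", "prio": 9}]
dispatch(list(jobs), fifo)               # order: log-rotate, db-backup
dispatch(list(jobs), highest_priority)   # order: db-backup, log-rotate

Swapping the policy changes system behavior without touching the dispatch mechanism, which is the essence of the separation described above.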
DEMONSTRATE THE HAPPENED-BEFORE LAW USED IN LOGICAL CLOCKS AND ALSO EXPLAIN HOW IT IS IMPLEMENTED IN DISTRIBUTED SYSTEMS THROUGH LAMPORT'S LOGICAL CLOCKS.
Happened-Before Relationship: The "happened-before" relationship is a partial order defined
on events in a distributed system. It captures the causal relationship between events, providing a
way to compare their occurrences. If event A happened before event B, we say "A → B." This
relationship is crucial in Lamport's Logical Clocks, which leverage the happened-before
relationship to order events in a distributed system.
Lamport’s Logical Clocks and Happened-Before:
In Lamport's Logical Clocks, each process maintains a logical clock, and each event is
timestamped with the process's logical clock value at the time the event occurs. The logical clock
provides a partial ordering of events based on the happened-before relationship.
Event Timestamping: When an event occurs at a process, its logical clock is incremented, and
the event is timestamped with the current logical clock value.
Event A at Process P1: (timestamp: 1)
Message Sending: When a process sends a message, the logical clock value is included in the
message.
P1 sends a message to P2 with timestamp 1.
Message Reception: Upon receiving a message, the logical clock of the receiving process is
updated to be the maximum of its current logical clock value and the timestamp received in the
message plus 1.
P2 receives the message, updates its logical clock to 2.
Causal Relationship: If event A at Process P1 causally precedes event B at Process P2 (A → B), then A's timestamp is lower than B's. Note that the converse does not hold: a lower timestamp alone does not imply causality.
A → B => (timestamp of A at P1) < (timestamp of B at P2)
Demonstration:
Consider two processes, P1 and P2, with their respective logical clocks:
Initial State:
P1: Clock = 0 P2: Clock = 0
Events at P1:
Event A at P1:
P1: Clock = 1 Event A: (timestamp: 1)
Event B at P1:
P1: Clock = 2 Event B: (timestamp: 2)
Messages between P1 and P2:
P1 sends a message to P2 with timestamp 2.
P2 receives the message and updates its logical clock:
P2: Clock = max(0, 2) + 1 = 3
Events at P2:
Event C at P2:
P2: Clock = 4 Event C: (timestamp: 4)
Happened-Before Relationship:
Event A at P1 precedes Event B at P1: A → B
Event B at P1 precedes the message send from P1 to P2: B → Send
The message send from P1 to P2 precedes Event C at P2: Send → C
Implementation in Distributed Systems:
Lamport's Logical Clocks provide a way to implement the happened-before relationship in
distributed systems without relying on a global clock. Each process maintains its logical clock,
and the timestamps attached to events reflect the partial ordering based on causality.
The key concept is that if A→B, then the timestamp of event A should be less than the
timestamp of event B. The logical clocks are updated based on local events and the receipt of
messages, ensuring that the logical clock values reflect the causal relationship between events in
the system.
This logical clock implementation is fundamental for various distributed algorithms, such as
distributed mutual exclusion, distributed snapshots, and consistency protocols, where
understanding the causality of events is essential for correct coordination and communication.
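The demonstration above can be replayed mechanically. Below is a minimal sketch (variable names are illustrative) that applies the same update rule and checks that the timestamps respect the chain A → B → Send → C:

# Replay of the P1/P2 trace using the Lamport update rule max(own, received) + 1.
p1, p2 = 0, 0
p1 += 1; ts_a = p1          # Event A at P1                  -> timestamp 1
p1 += 1; ts_b = p1          # Event B at P1 (value then sent) -> timestamp 2
p2 = max(p2, ts_b) + 1      # P2 receives: max(0, 2) + 1     -> clock becomes 3
ts_recv = p2
p2 += 1; ts_c = p2          # Event C at P2                  -> timestamp 4

# The causal chain A -> B -> Send -> C appears as strictly increasing timestamps.
assert ts_a < ts_b < ts_recv < ts_c    # 1 < 2 < 3 < 4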
EXPLAIN THE ARCHITECTURE AND WORKING PRINCIPLES OF LOOSELY COUPLED AND TIGHTLY COUPLED DISTRIBUTED OPERATING SYSTEMS. COMPARE THEM.
Loosely Coupled and Tightly Coupled Distributed Operating Systems: Distributed operating
systems can be categorized based on the degree of coupling between the nodes in the system.
The terms "loosely coupled" and "tightly coupled" refer to the level of interdependence and
communication between the nodes in a distributed environment.
1. Loosely Coupled Distributed Operating System:
Architecture: Decentralized Structure: Nodes in a loosely coupled system operate somewhat
independently and have a decentralized structure.
Limited Communication: Communication between nodes is limited, and nodes may not have
direct access to each other's resources or memory.
Autonomy: Nodes maintain a higher degree of autonomy, making decisions independently
without tight coordination.
Message Passing: Communication often occurs through message passing, and each node acts as
a separate entity.
Example: A cluster of workstations or a collection of independent servers where each node
operates independently and communicates only when necessary.
Working Principles:
Asynchronous Communication: Nodes operate asynchronously and may not be synchronized
with a global clock.
Resource Independence: Nodes have independent resource management, and resource sharing
is limited.
Fault Tolerance: Loosely coupled systems are often more fault-tolerant, as the failure of one
node may not significantly impact others.
2. Tightly Coupled Distributed Operating System:
Architecture: Centralized or Semi-Centralized Structure: Tightly coupled systems have a
more centralized or semi-centralized structure with a higher level of interdependence between
nodes.
Shared Resources: Nodes typically have direct access to shared resources like memory and
storage, and coordination is more centralized.
Global Clock: Tightly coupled systems may use a global clock or time synchronization
mechanisms for coordinated execution.
Example: Multi-processor systems, mainframes, or tightly integrated clusters where nodes
collaborate closely and share resources extensively.
Working Principles:
Synchronous Communication: Nodes often operate synchronously, and there is tight
coordination between processes or threads.
Resource Sharing: Resources such as memory and storage are shared among nodes, and
processes can communicate through shared memory or other efficient mechanisms.
Performance Optimization: Tightly coupled systems are often designed for optimized
performance, taking advantage of shared resources and parallel processing capabilities.
Comparison:
Communication: Loosely Coupled: Limited and asynchronous communication through
message passing.
Tightly Coupled: More extensive and often synchronous communication, direct access to shared
resources.
Autonomy: Loosely Coupled: Nodes operate with a higher degree of autonomy, making more
independent decisions.
Tightly Coupled: Coordination is more centralized, and nodes may have less autonomy.
Resource Sharing: Loosely Coupled: Limited resource sharing, each node may have its own
resources.
Tightly Coupled: Extensive resource sharing, with direct access to shared resources.
Fault Tolerance: Loosely Coupled: Generally more fault-tolerant, as the failure of one node
may not impact others significantly.
Tightly Coupled: Greater interdependence can lead to more significant impacts in case of
failures.
Global Clock: Loosely Coupled: Typically no global clock or synchronization requirement.
Tightly Coupled: May use a global clock or synchronized clocks for coordinated execution.
Example Systems: Loosely Coupled: Cluster of workstations, collection of independent
servers.
Tightly Coupled: Multi-processor systems, mainframes, tightly integrated clusters.
Considerations:
Flexibility vs. Performance:
Loosely coupled systems provide more flexibility and fault tolerance but may sacrifice
performance.
Tightly coupled systems optimize performance but may have less flexibility and be more
susceptible to the impact of failures.
Scalability: Loosely coupled systems are often more scalable as new nodes can be added
independently.
Tightly coupled systems may face scalability challenges due to increased interdependence.
Application Requirements: The choice between loosely coupled and tightly coupled systems
depends on the specific requirements of the distributed application, including performance, fault
tolerance, and resource utilization.
In summary, the choice between loosely coupled and tightly coupled distributed operating
systems depends on the specific characteristics and requirements of the application or system
being designed. Different use cases may benefit from different degrees of coupling based on
factors such as flexibility, fault tolerance, performance, and scalability.
EXPLAIN THE ENFORCEMENT OF MUTUAL EXCLUSION IN DISTRIBUTED SYSTEMS. DESCRIBE ITS REQUIREMENTS WITH THE IMPLEMENTATION OF RICART AND AGRAWALA'S ALGORITHM.
Mutual exclusion is a fundamental requirement in distributed systems to ensure that concurrent
processes or nodes do not interfere with each other while accessing shared resources or critical
sections. The goal is to guarantee that only one process at a time can execute a critical section,
preventing conflicts and ensuring the correctness of the distributed application. Ricart and
Agrawala's Algorithm is one of the classical algorithms designed to enforce mutual exclusion in
distributed systems.
Requirements for Mutual Exclusion in Distributed Systems:
Safety: Only one process is allowed to enter the critical section at a time. This property ensures
that conflicting operations do not occur simultaneously.
Liveness: If a process requests entry to the critical section and no other process is currently
executing it, then the requesting process should eventually be granted access.
Fault Tolerance: The algorithm should continue to work correctly even in the presence of
failures, such as process crashes or network partitions.
Fairness: The algorithm should strive to provide fair access to the critical section, ensuring that
all processes have an opportunity to execute it.
Ricart and Agrawala’s Algorithm:
Ricart and Agrawala's Mutual Exclusion Algorithm is a decentralized, permission-based (non-token) algorithm that ensures mutual exclusion in a distributed environment. The algorithm assumes a system of N processes, each of which may request access to a critical section.
Basic Idea:
Requesting Entry: When a process wants to enter the critical section, it sends a request message
to all other processes.
Granting Permission: A process, upon receiving a request, compares the timestamp of the
request with its own timestamp. If the requesting process has a higher priority (lower timestamp),
permission is granted immediately; otherwise, the request is deferred.
Release of Critical Section: After a process finishes its critical section, it sends release
messages to all deferred processes, allowing them to enter their critical sections.
Implementation Details:
Timestamps: Each process maintains a Lamport logical clock or a similar mechanism to
generate timestamps.
Request and Reply Messages: Two types of messages: request and reply. A process sends
request messages to all other processes, and upon receiving a request, a process sends a reply
either granting immediate access or deferring the request.
Permission-Based Access: A process enters the critical section only after it has received reply (permission) messages from all other N − 1 processes; there is no circulating token.
Deferred Replies: A process that is executing its critical section, or whose own pending request has higher priority (a smaller timestamp, with ties broken by process id), defers its reply and sends all deferred replies when it exits; this prevents deadlock and starvation.
Algorithm Steps:
Requesting Entry: When a process P_i wants to enter the critical section, it sends a request
message to all other processes, including its own timestamp T_i.
Receiving Requests: When another process P_j receives P_i's request:
If P_j is not interested in entering the critical section, it replies to P_i immediately.
If P_j is also interested, it compares timestamps:
If T_i < T_j, or (T_i = T_j and i < j), P_j replies immediately, granting permission to P_i.
Otherwise, P_j defers its reply until it leaves its own critical section.
Entering Critical Section: After receiving replies from all other processes, if P_i has received
permission from all, it enters the critical section.
Releasing Critical Section: After finishing the critical section, P_i sends release messages to all
deferred processes, allowing them to enter their critical sections.
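As an illustration of the reply-or-defer rule in steps 2 and 3, here is a minimal sketch of the decision a receiving process makes; the function and tuple encoding are assumptions for the sketch, and real deployments exchange these messages over a network:

# Sketch of Ricart-Agrawala's reply-or-defer decision at one process.
def should_defer(my_request, incoming):
    """my_request/incoming: (timestamp, process_id), or None if not requesting.
    Returns True if the incoming request must wait for our release."""
    if my_request is None:
        return False                 # not interested: reply immediately
    return my_request < incoming     # lower (timestamp, id) wins: defer the loser

# P_i (ts 4, id 1) vs P_j (ts 4, id 2): the tie is broken by process id, P_i wins
print(should_defer((4, 1), (4, 2)))  # True  -> P_j's request is deferred
print(should_defer((4, 2), (4, 1)))  # False -> reply to P_i immediately
print(should_defer(None, (4, 1)))    # False -> not requesting, reply now

Python's tuple comparison implements the rule "T_i < T_j or (T_i = T_j and i < j)" directly, which is why the request is encoded as a (timestamp, id) pair.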
Considerations and Limitations:
The algorithm provides mutual exclusion while avoiding deadlock, at a cost of 2(N − 1) messages per critical-section entry (N − 1 requests and N − 1 replies).
It assumes reliable communication channels and does not address Byzantine failures.
The fairness of the algorithm ensures that processes receive access in the order of their request timestamps.
Ricart and Agrawala's Algorithm is an effective solution for enforcing mutual exclusion in
distributed systems while considering factors such as fairness and avoiding deadlocks. It is a
foundational algorithm in the field of distributed computing.
WITH A NEAT SKETCH, EXPLAIN THE FUNCTIONAL COMPONENTS OF DISTRIBUTED SYSTEM ARCHITECTURE.
In a distributed system architecture, functional components work together to enable communication, coordination, and resource sharing among distributed entities. Here are the key functional components, with a sketch illustrating their relationships:
Architecture Sketch:
Nodes: Represented as individual computing devices.
Communication Infrastructure: Network links and communication middleware facilitating
inter-node communication.
Middleware: Abstraction layer providing communication services.
Distributed File System: Enabling distributed file sharing.
Naming Services: Facilitating resource identification and addressing.
Security Services: Ensuring the security of data and resources.
Distributed Database: Managing distributed data storage and retrieval.
Load Balancer: Distributing network traffic across nodes.
Distributed Synchronization: Coordinating activities and ensuring synchronization.
Fault Tolerance: Ensuring resilience to failures.
Cluster Management: Managing resources within distributed clusters.
This sketch illustrates the interconnectedness of the functional components in a distributed
system architecture. These components work together to provide a scalable, reliable, and
efficient computing environment in which multiple nodes collaborate to achieve common goals.