Distributed Mutual Exclusion
Distributed Mutual Exclusion (DME) is the problem of coordinating access to a shared resource among
multiple processes, typically running on different machines, so that only one process uses the
resource at a time. The goal is to ensure that the critical section (the part of the program where
shared resources are accessed) is entered by at most one process at a time, even though the
processes run concurrently on different machines in a network.
Key Challenges:
1. No Shared Memory: In distributed systems, processes do not have access to a common
memory or shared variables, which makes direct coordination difficult.
2. Asynchronous Communication: Processes may not run at the same speed and
communication between them is typically asynchronous.
3. Fault Tolerance: The system should still work properly even if some processes fail, or
messages are lost.
Key Concepts:
• Mutual Exclusion: The idea that only one process can be in the critical section at any time.
• Distributed Coordination: The processes must coordinate with each other using some
form of communication (like message passing) to decide which process should enter the
critical section.
Algorithms for Distributed Mutual Exclusion
There are several well-known algorithms for solving the DME problem. Some of the most
important ones are:
1. Lamport's Algorithm (1978)
• Principle: This is based on the concept of logical clocks. Every process maintains a local
clock that increments with each event (including message sends and receives). Each
process sends a request for entering the critical section with a timestamp, and processes
use these timestamps to determine the order in which requests should be granted.
• Operation:
o When a process wants to enter the critical section, it sends a timestamped request to all
other processes and places the request in its local request queue; each recipient adds the
request to its own queue and sends back a timestamped reply.
o A process enters the critical section when its own request has the smallest timestamp in
its queue and it has received a message with a later timestamp from every other process.
On exit, it removes its request and broadcasts a release message so the other queues can
be updated.
o Because timestamps (with process identifiers as tie-breakers) define a total order, the
critical section is granted in a first-come, first-served manner.
• Pros:
o Simple and works well in most cases.
o Ensures fairness based on the logical timestamps.
• Cons:
o High message overhead: each critical-section entry requires 3(N − 1) messages (a
request, a reply, and a release exchanged with every other process).
o It requires each process to maintain a logical clock and a request queue, which adds
complexity.
2. Ricart-Agrawala Algorithm (1981)
• Principle: This algorithm refines Lamport's scheme by merging the reply and release
messages. When a process wants to enter the critical section, it sends a timestamped
request message to all other processes and waits until it has received a reply from every
one of them before entering.
• Operation:
o When process Pi wants to enter the critical section, it sends a timestamped request
message to all other processes.
o A process that is not interested in the critical section replies immediately. A process that
is also requesting replies immediately only if the incoming request has a smaller
(timestamp, identifier) pair than its own; otherwise it defers its reply until it leaves the
critical section. There is no explicit "deny" message.
o The process enters the critical section once it has received replies from all other
processes; the deferred replies it sends on exit also serve as the release.
• Pros:
o Fairness is ensured by ordering requests by timestamp, with process identifiers
breaking ties.
o More efficient than Lamport’s algorithm: 2(N − 1) messages per critical-section entry
instead of 3(N − 1), since replies double as releases.
• Cons:
o Like Lamport’s algorithm, it still requires contacting every other process, so message
overhead grows with the number of processes.
o It may be inefficient in highly dynamic environments where processes join or leave
frequently.
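The reply-or-defer rule at the heart of the algorithm can be sketched in a few lines. This is an illustrative fragment, not a full protocol: the `RAProcess` class and its fields are hypothetical names, and only the decision a process makes when a foreign request arrives is modeled.

```python
# Sketch of the Ricart-Agrawala deferral rule: reply to a request
# immediately unless our own pending request has priority (smaller
# (timestamp, pid) pair), in which case the reply is deferred until
# we leave the critical section. Illustrative only.

class RAProcess:
    def __init__(self, pid):
        self.pid = pid
        self.requesting = False
        self.my_request = None   # (timestamp, pid) while requesting
        self.deferred = []       # pids whose replies we are holding back

    def on_request(self, ts, sender):
        # Defer only if we are requesting and our request is older.
        if self.requesting and self.my_request < (ts, sender):
            self.deferred.append(sender)   # defer -- never "deny"
            return None
        return ("REPLY", self.pid)

p = RAProcess(1)
p.requesting = True
p.my_request = (5, 1)
print(p.on_request(3, 2))   # ('REPLY', 1): the incoming request is older
print(p.on_request(8, 3))   # None: reply deferred until p exits the CS
print(p.deferred)           # [3]
```

Note that deferral, not denial, is what keeps the algorithm deadlock-free: the lower-priority process simply waits, and its reply is guaranteed to arrive once the critical section is released.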
3. Maekawa’s Algorithm (1985)
• Principle: This is a quorum-based approach. Instead of broadcasting requests to all other
processes, each process is assigned a subset of processes (called a quorum) that it must
communicate with to gain access to the critical section.
• Operation:
o Each process is assigned a quorum of roughly √N processes, constructed so that any
two quorums overlap. To enter the critical section, a process must obtain permission
from every member of its quorum.
o Because any two quorums share at least one member, and each member grants
permission to only one request at a time, no two processes can be in the critical
section simultaneously.
o This reduces the number of messages per entry from O(N) to O(√N).
• Pros:
o More efficient than algorithms like Lamport's in terms of communication overhead,
especially in large systems.
o Reduces the number of messages because each process does not need to
communicate with all other processes.
• Cons:
o The design and management of quorums can be complex, and the basic protocol is
prone to deadlock unless extra coordination messages are added.
o Quorum-based approaches may block if enough members of a quorum become
unavailable, so fault tolerance can be weaker than in other algorithms.
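One common way to build overlapping quorums of size O(√N) is a grid construction, sketched below. This is an illustrative construction consistent with Maekawa's idea, not the only possible one; the function name `grid_quorum` is a hypothetical label.

```python
# Sketch of grid quorums for Maekawa's algorithm: arrange N = k*k
# processes in a k x k grid; each process's quorum is its row plus
# its column (size 2k - 1), so any two quorums share at least one
# member. Illustrative construction only.

def grid_quorum(pid, k):
    row, col = divmod(pid, k)
    row_members = {row * k + c for c in range(k)}   # whole row
    col_members = {r * k + col for r in range(k)}   # whole column
    return row_members | col_members

k = 3  # 9 processes, quorum size 2k - 1 = 5
q0 = grid_quorum(0, k)
q8 = grid_quorum(8, k)
print(sorted(q0))             # [0, 1, 2, 3, 6]
print(sorted(q8))             # [2, 5, 6, 7, 8]
print(sorted(q0 & q8))        # [2, 6]: the overlap enforces exclusion
```

The overlap is the whole argument: processes 0 and 8 sit in different rows and different columns, yet their quorums still intersect, so at least one shared member arbitrates between their requests.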
4. Token-based Algorithms (A Ring-based Algorithm)
• Principle: A token is a special message or object that circulates in the system. Only the
process holding the token is allowed to enter the critical section, so no request/grant
exchange is needed; the main requirement is that a process can eventually obtain the
token when it wants to enter.
• Operation:
o A single token is maintained in the system, and each process must obtain the token
to enter the critical section.
o The token is passed around in a predefined order or in a way that ensures fairness.
o If a process holding the token crashes, a mechanism must be in place to regenerate
or recover the token.
• Pros:
o Low communication overhead: no request/reply traffic is needed, and in a ring each
critical-section entry costs at most N token-passing messages.
o Well-suited for systems where processes need to frequently access the critical
section.
• Cons:
o If the token is lost or the process holding the token fails, the system may need to go
through a recovery mechanism, which can be complex.
o Can be less fault-tolerant if no mechanism for regenerating the token is in place.
Fault Tolerance and Recovery
Distributed systems must handle failures, so fault tolerance is an essential concern. Permission-
based algorithms (e.g., Lamport, Ricart-Agrawala) block if even one process fails to reply, and
token-based algorithms must detect and regenerate a lost token; in practice, all of these
algorithms need additional mechanisms (such as timeouts and failure detectors) to recover from
failed processes or lost messages.
Summary
The problem of Distributed Mutual Exclusion is a fundamental challenge in distributed computing,
and various algorithms offer different trade-offs in terms of message complexity, fairness, and fault
tolerance. The choice of algorithm depends on factors like system size, communication overhead,
and the likelihood of process failures. Each algorithm provides a solution to ensure that only one
process enters the critical section at a time, but they vary in terms of their efficiency and scalability.