UNIT I

Introduction to Distributed Systems


A Distributed System is a collection of independent computers that work together as a single
system. These computers, called nodes, communicate and coordinate their actions through a
network. The key goal is to enable resource sharing, fault tolerance, and scalability.

Features of Distributed Systems:

•  Resource Sharing: Multiple users can share hardware and software resources across different nodes.
•  Concurrency: Different computations can be executed simultaneously on different nodes.
•  Scalability: The system can expand by adding more nodes without significant performance degradation.
•  Fault Tolerance: Even if some nodes fail, the system continues functioning properly.
•  Transparency: Users perceive the system as a single entity rather than multiple interconnected machines.
   o Access Transparency: Users access resources uniformly.
   o Location Transparency: Users don’t need to know where resources are physically located.
   o Replication Transparency: Users do not notice if data is replicated for performance.
   o Failure Transparency: The system recovers automatically from failures.
   o Concurrency Transparency: Multiple users can access shared resources simultaneously.
2. Nodes of a Distributed System
A node in a distributed system is an independent computing entity that participates in the
execution of processes. Nodes communicate via a network to coordinate tasks and share
resources. The different types of nodes in a distributed system are:

Types of Nodes:

•  Client Nodes: These nodes request services from servers. Examples include web browsers accessing websites.
•  Server Nodes: Provide services such as file storage, database management, or computing resources.
•  Middleware Nodes: Act as intermediaries to facilitate communication between clients and servers, ensuring interoperability and security.
•  Storage Nodes: Responsible for managing and storing data, commonly used in cloud storage systems.

Characteristics of Nodes:

•  Autonomy: Each node operates independently but follows coordination protocols.
•  Heterogeneity: Nodes may have different hardware and operating systems.
•  Dynamic Participation: Nodes can join or leave the system dynamically.

3. Distributed Computation Paradigms


Distributed computing follows various paradigms, which define how computation and
communication occur in a distributed environment.

1. Client-Server Model

•  The client requests services from a central server.
•  The server processes the request and sends back the response.
•  Used in web applications, database access, and email services.

Example:

•  A user accesses a website via a browser (client), which sends an HTTP request to a web server.
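
A minimal sketch of the client-server interaction using Python's standard socket module is shown below. The port number, the single-request protocol, and the use of a thread to run both roles in one script are illustrative assumptions, not part of the model itself.

# Client-server sketch: the server answers one request, the client sends it.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9001           # illustrative address and port

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        conn, _ = srv.accept()           # wait for a client connection
        with conn:
            request = conn.recv(1024).decode()
            conn.sendall(("response to: " + request).encode())

def client():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"GET /index.html")  # the client requests a service
        print(cli.recv(1024).decode())   # the server's reply

t = threading.Thread(target=server)
t.start()
time.sleep(0.2)                          # crude wait for the server to start listening
client()
t.join()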
2. Peer-to-Peer (P2P) Model

•  Every node can act as both a client and a server.
•  Nodes share resources directly without a central authority.
•  Used in file-sharing applications like BitTorrent.

Example:

•  A user downloads a file using a P2P protocol where multiple peers contribute parts of the file.
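
A small in-memory sketch of the peer-to-peer idea follows: every Peer object can both serve chunks it already holds and fetch missing chunks from other peers. The Peer class, the chunk layout, and the three-peer setup are illustrative assumptions.

# Peer-to-peer sketch: each peer acts as both client and server for file chunks.
class Peer:
    def __init__(self, name, chunks):
        self.name = name
        self.chunks = dict(chunks)            # chunk_id -> data this peer already holds

    def serve(self, chunk_id):
        """Server role: hand out a chunk if we have it."""
        return self.chunks.get(chunk_id)

    def fetch_missing(self, wanted_ids, peers):
        """Client role: pull missing chunks from whichever peer has them."""
        for cid in wanted_ids:
            if cid in self.chunks:
                continue
            for peer in peers:
                data = peer.serve(cid)
                if data is not None:
                    self.chunks[cid] = data
                    break

# Three peers each start with different parts of the same file.
a = Peer("A", {0: "dis", 1: "tri"})
b = Peer("B", {2: "but"})
c = Peer("C", {3: "ed"})
a.fetch_missing([0, 1, 2, 3], [b, c])
print("".join(a.chunks[i] for i in sorted(a.chunks)))   # prints "distributed"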

3. Three-Tier Architecture

•  Divides the system into three layers:
   1. Presentation Layer (User Interface)
   2. Application Layer (Business Logic)
   3. Database Layer (Data Storage)
•  Used in web applications and enterprise software.

Example:

•  A banking application where the UI handles user input, the application layer processes transactions, and the database stores user account details.
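
A minimal sketch of the three layers as separate Python functions, using the banking example above; the in-memory dictionary standing in for the database layer and the function names are illustrative assumptions.

# Three-tier sketch: presentation -> application -> database.
accounts = {"alice": 100, "bob": 50}           # database layer: data storage (in-memory stand-in)

def transfer(src, dst, amount):                # application layer: business logic
    if accounts.get(src, 0) < amount:
        raise ValueError("insufficient funds")
    accounts[src] -= amount
    accounts[dst] += amount

def handle_form(fields):                       # presentation layer: user input/output
    transfer(fields["from"], fields["to"], int(fields["amount"]))
    return "Transferred " + fields["amount"] + " from " + fields["from"] + " to " + fields["to"]

print(handle_form({"from": "alice", "to": "bob", "amount": "30"}))
print(accounts)                                # {'alice': 70, 'bob': 80}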

4. Multi-Tier Architecture

•  Extends the three-tier model by adding extra layers such as caching, security, or load balancing.
•  Used in large-scale cloud applications.

Example:

•  A cloud-based e-commerce platform where an additional caching layer speeds up page loading.

5. Service-Oriented Architecture (SOA)

•  Services communicate over a network using standard protocols such as SOAP or REST.
•  Enables interoperability between different software components.
•  Used in microservices architectures.

Example:

•  An online travel booking system where separate services handle flight, hotel, and payment processing.

4. Model of Distributed Systems

A distributed system can be modeled as:

•  Physical Models: Depict the actual hardware and interconnections.
•  Architectural Models: Define component interactions (e.g., client-server, peer-to-peer).
•  Fundamental Models: Focus on aspects like communication, failure, and security.

5. Types of Operating Systems in Distributed Systems


1. Centralized Operating System:
o Single system manages all resources.
o Limited scalability.
o Example: Mainframe systems.
2. Network Operating System (NOS):
o Each computer has its own OS but communicates over a network.
o Provides file sharing and communication services.
o Example: Windows Server, UNIX-based systems.
3. Distributed Operating System (DOS):
o Manages resources across multiple computers as a single system.
o Provides load balancing and process migration.
o Example: Google’s Borg, Apache Mesos.
4. Cooperative Autonomous Systems:
o Nodes operate independently but collaborate dynamically.
o Used in IoT networks and self-organizing systems.
o Example: Sensor networks, blockchain networks.

6. Goals of Distributed Systems


The primary goals of a distributed system include:

1. Transparency

•  The system should hide the complexity of distributed processes from users.
•  Different forms of transparency:
   o Access, Location, Replication, Failure, Concurrency.

2. Scalability

•  The system should be able to grow in size without significant performance degradation.
•  Methods: Load balancing, data replication, caching.

3. Openness

•  The system should be able to integrate different hardware and software components.
•  Uses standardized communication protocols.

4. Fault Tolerance

•  The system should continue to function despite node or network failures.
•  Techniques: Redundancy, checkpointing, failover mechanisms.

5. Resource Sharing

•  Enable multiple users to share resources like files, databases, and processing power.

6. Concurrency

•  Allow multiple users and processes to execute simultaneously without conflicts.

7. Security

•  Protect data and communication between nodes from unauthorized access.
•  Techniques: Encryption, authentication, access control.
1. Theoretical Issues in Distributed Systems
1.1 Notions of Time and State

In distributed systems, time and state play a crucial role in coordinating and synchronizing events across multiple nodes. Since there is no global clock, each node maintains its own time, leading to challenges in maintaining a consistent view of the system's state.

•  Time: Ensures that events in a distributed system occur in a logical order.
•  State: Represents the current condition of a system at any given point in time.
•  Challenges:
   o Lack of global synchronization.
   o Message delays causing inconsistency.
   o Difficulty in determining event order.

1.2 States and Events in a Distributed System

A state is a snapshot of the system at a particular moment, while an event represents a change in the system.

Types of Events:

1. Internal Events: Events occurring within a single node without communication with
others.
2. Message Sending Events: When one node sends a message to another.
3. Message Receiving Events: When a node receives and processes a message.

State Transitions:

•  The system state changes whenever an event occurs.
•  Multiple nodes experiencing state transitions can lead to concurrency issues.

Diagram: States and Events in a Distributed System


+---------+    Send Msg     +---------+
| State A | --------------> | State B |
+---------+                 +---------+

1.3 Time, Clocks, and Event Precedence

Since different nodes have their own local clocks, defining the correct order of events
requires logical and physical clock synchronization.

Logical Clocks:

•  Introduced by Leslie Lamport to order events in distributed systems.
•  Follows the happened-before relation (→):
   o If A → B, then A occurred before B.
•  Uses Lamport Timestamps (L):
   o Each process increments its timestamp before sending a message.
   o The receiving process updates its clock to be greater than the sender’s timestamp.

Example:

Process P1              Process P2
  A (1)  ------------->   B (2)

L(A) < L(B) → A happened before B.

Vector Clocks:

•  A more refined approach to track causal relationships.
•  Each node maintains a vector of timestamps.
•  Ensures better event ordering than Lamport timestamps.

Diagram: Logical Clock Event Ordering


P1: 1 --- 2 --- 3
                 \
P2: 1 --- 2 ----- 4

1.4 Recording the State of Distributed Systems

Recording the global state is essential for debugging, fault tolerance, and recovery in
distributed systems.

Methods:

1. Snapshot Algorithm (Chandy-Lamport Algorithm):
   o Captures a consistent global state without halting the system.
   o Uses markers to coordinate state recording across nodes.
2. Checkpointing:
   o Periodically saves the state of nodes.
   o Used for fault recovery.

Diagram: Chandy-Lamport Snapshot Algorithm


+---------+   Marker    +---------+
| Node A  | ----------> | Node B  |
+---------+             +---------+
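
Below is a simplified, single-threaded Python sketch of the marker mechanism, assuming two nodes connected by FIFO channels simulated with deques; the Node class, the account-balance state, and the in-flight transfer of 10 units are illustrative assumptions, not the full algorithm.

# Simplified Chandy-Lamport sketch: markers trigger local-state recording,
# and messages that arrive after recording are logged as channel state.
from collections import deque

class Node:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance
        self.recorded_state = None      # local state captured when the first marker is seen
        self.channel_log = []           # in-transit messages recorded after the snapshot starts

    def receive(self, msg, out_channel=None):
        if msg == "MARKER":
            if self.recorded_state is None:
                self.recorded_state = self.balance       # record local state
                if out_channel is not None:
                    out_channel.append("MARKER")         # forward the marker downstream
        else:
            if self.recorded_state is not None:
                self.channel_log.append(msg)             # message was in flight during the snapshot
            self.balance += msg

a, b = Node("A", 100), Node("B", 40)    # B has already sent 10 units to A (still in flight)
a_to_b, b_to_a = deque(), deque([10])

# A initiates the snapshot: it records its own state and sends a marker to B.
a.recorded_state = a.balance
a_to_b.append("MARKER")

# Deliver all pending messages in FIFO order.
while a_to_b:
    b.receive(a_to_b.popleft(), out_channel=b_to_a)
while b_to_a:
    a.receive(b_to_a.popleft(), out_channel=a_to_b)

print(a.recorded_state, b.recorded_state, a.channel_log)
# Recorded global state: A = 100, B = 40, plus 10 in transit on channel B -> A (total 150).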

2. Election Algorithms
Election algorithms are used in distributed systems to select a coordinator among distributed
nodes.

2.1 Bully Algorithm:

•  The node with the highest ID becomes the leader.
•  If a node detects a failure in the current leader, it initiates an election.
•  The highest-ID node sends messages to lower-ID nodes to announce leadership.

What is the Bully Algorithm for Leader Node Election?

The Bully algorithm is a popular technique for choosing a leader in distributed networks, based on the idea that the highest-priority (highest-ID) node should win. It operates as follows:

Initiation: When a node detects that the current leader has failed (usually through a
timeout mechanism), it initiates an election.

Election Process:

The initiating node sends an “election” message to every node with a higher priority.

Any higher-priority node that is alive responds with an OK message and then takes over the election itself.

If no higher-priority node responds, the initiating node assumes leadership.

Notification: The newly elected leader informs all nodes of its leadership status,
ensuring consistency across the distributed system.

Messages in Bully Algorithm for Leader Node Election

There can be three types of messages that processes exchange with each other in the
bully algorithm:

Election message: Sent to announce election.

OK (Alive) message: Responds to the Election message.

Coordinator (Victory) message: Sent by winner of the election to announce the new
coordinator.

Steps Involved in Bully Algorithm for Leader Node Election
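
As a minimal illustration of these steps, here is a hedged Python sketch in which direct method calls stand in for the Election, OK, and Coordinator messages; the node IDs, the alive flag, and the shared cluster list are assumptions made for the example.

# Bully algorithm sketch: the highest-ID alive node ends up as coordinator.
class Node:
    def __init__(self, node_id, cluster):
        self.id = node_id
        self.cluster = cluster          # shared list of all nodes in the system
        self.alive = True
        self.leader = None

    def election(self):
        """Start an election after suspecting that the current leader has failed."""
        higher = [n for n in self.cluster if n.id > self.id and n.alive]
        if not higher:
            self.announce()             # no higher-priority node answered: this node wins
            return
        for n in higher:                # "Election" message to every higher-ID node;
            n.election()                # a reply counts as an "OK" and that node takes over

    def announce(self):
        """Coordinator (victory) message sent to every alive node."""
        for n in self.cluster:
            if n.alive:
                n.leader = self.id

cluster = []
cluster.extend(Node(i, cluster) for i in (1, 2, 3, 4, 5))
cluster[4].alive = False                # node 5, the old leader, has crashed
cluster[0].election()                   # node 1 detects the failure and starts an election
print([n.leader for n in cluster if n.alive])   # [4, 4, 4, 4]: node 4 is the new coordinator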


2.2 Ring Algorithm:

•  Nodes are arranged in a logical ring.
•  Each node passes an election message around the ring.
•  The node with the highest ID in the ring is elected as the leader.

What is the Ring Election Algorithm?

The Ring Election Algorithm is a method used in distributed systems to elect a leader
among a group of interconnected nodes arranged in a ring-like structure. It ensures
that only one node in the network becomes the leader, facilitating coordination and
decision-making within the system.
How Does Ring Election Algorithm Work?

Below is how the ring election algorithm works:

Step 1: Initialization: Each node in the network is assigned a unique identifier or priority.

Step 2: Message Passing: The algorithm begins when a node initiates an election
process. It sends a special message, often called an "election message" or "token,"
containing its identifier, to its neighboring node(s) in the ring.

Step 3: Comparison and Forwarding: Upon receiving the election message, each node compares the identifier in the message with its own. If the received identifier is greater than its own, it forwards the message unchanged to the next node in the ring. If the received identifier is smaller than its own, it replaces it with its own identifier before forwarding the message.

Step 4: Propagation: This process continues until the message returns to the initiating
node. As the message travels around the ring, each node updates its state to reflect the
highest identifier it has encountered.

Step 5: Leader Election: Once the message returns to the initiating node, the identifier it carries belongs to the highest-ID node in the ring. That node is declared the leader, and a coordinator message is circulated so that every node learns the result.
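
A small Python sketch of this procedure, assuming a fixed ring stored as a list and in-memory message passing; the node IDs and the choice of initiator are illustrative.

# Ring election sketch: the election message carries the highest ID seen so far.
ids = [7, 3, 12, 9, 5]                  # nodes in ring order, each with a unique ID

def ring_election(ids, initiator_index):
    n = len(ids)
    msg = ids[initiator_index]          # the election message starts with the initiator's ID
    pos = (initiator_index + 1) % n
    while pos != initiator_index:       # pass the message once around the ring
        msg = max(msg, ids[pos])        # each node forwards the larger of the two identifiers
        pos = (pos + 1) % n
    leader = msg                        # back at the initiator: msg now holds the highest ID
    return {node: leader for node in ids}   # coordinator message: everyone learns the leader

print(ring_election(ids, initiator_index=1))    # every node learns that 12 is the leader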

3. Physical Clock Synchronization Algorithms


In a distributed system, physical clocks on different nodes must be synchronized.

3.1 Cristian’s Algorithm:

•  A client requests the time from a time server.
•  The server replies with its timestamp.
•  The client adjusts its clock, taking the network delay into account.

Cristian’s Algorithm is a clock synchronization algorithm used by client processes to synchronize their time with a time server. It works well in low-latency networks where the Round Trip Time is short compared to the required accuracy, whereas redundancy-prone distributed systems/applications do not go hand in hand with this algorithm. Here, Round Trip Time refers to the time duration between the start of a Request and the end of the corresponding Response.

Algorithm:
1) The process on the client machine sends a request for the clock time (the time at the server) to the Clock Server at time T0.
2) The Clock Server listens to the request made by the client process and returns the response in the form of its clock time TSERVER.
3) The client process receives the response from the Clock Server at time T1 and calculates the synchronized client clock time using the formula below:

   TCLIENT = TSERVER + (T1 - T0) / 2
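
The sketch below illustrates this calculation in Python, assuming a local function stands in for the network call to the clock server; the simulated server offset and the random request/response delays are made-up values.

# Cristian's algorithm sketch: estimate the server's clock, compensating for half the RTT.
import random
import time

SERVER_OFFSET = 5.0                     # pretend the server's clock runs 5 s ahead (illustrative)

def clock_server_time():
    """Stand-in for the network round trip to the clock server."""
    time.sleep(random.uniform(0.01, 0.05))       # request latency
    t_server = time.time() + SERVER_OFFSET       # server reads its own clock
    time.sleep(random.uniform(0.01, 0.05))       # response latency
    return t_server

t0 = time.time()                        # client sends the request at T0
t_server = clock_server_time()
t1 = time.time()                        # client receives the response at T1

t_client = t_server + (t1 - t0) / 2     # TCLIENT = TSERVER + (T1 - T0) / 2
print("RTT = %.3f s, estimated clock offset = %.3f s" % (t1 - t0, t_client - t1))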

3.2 Berkeley Algorithm:

•  A master node polls worker nodes for their clock times.
•  It computes the average time and sends each node an update to synchronize the clocks.

Details were explained in class; please refer to those notes.
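
A minimal sketch of the averaging step in Python, assuming the master has already collected each node's reported clock value; the clock readings are made-up numbers.

# Berkeley algorithm sketch: the master averages all reported clocks (its own included)
# and sends each node the adjustment it should apply to its local clock.
clocks = {"master": 100.0, "worker1": 103.0, "worker2": 98.0, "worker3": 101.0}

average = sum(clocks.values()) / len(clocks)          # 100.5
adjustments = {node: average - t for node, t in clocks.items()}

print(adjustments)
# {'master': 0.5, 'worker1': -2.5, 'worker2': 2.5, 'worker3': -0.5}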

4. Logical Clock Algorithms


4.1 Lamport’s Logical Clocks:

•  Assigns timestamps to events to establish an order.
•  Maintains a counter that increments with each event.

Lamport’s Logical Clock was created by Leslie Lamport. It is a procedure to determine the order of events occurring in a distributed system, and it provides a basis for the more advanced Vector Clock algorithm. It is needed because there is no global clock in a distributed operating system.

Algorithm:
•  Happened-before relation (->): a -> b means ‘a’ happened before ‘b’.
•  Logical Clock: The criteria for the logical clocks are:
   o [C1]: Ci(a) < Ci(b) [Ci is the logical clock of process Pi; if ‘a’ happened before ‘b’ within the same process, then the time of ‘a’ is less than the time of ‘b’.]
   o [C2]: Ci(a) < Cj(b) [if ‘a’ is the sending of a message by process Pi and ‘b’ is the receipt of that message by process Pj, then the clock value Ci(a) is less than Cj(b).]
Reference:
•  Process: Pi
•  Event: Eij, where i is the process number and j is the jth event in the ith process.
•  tm: vector time span for message m.
•  Ci: vector clock associated with process Pi; the jth element is Ci[j] and contains Pi’s latest value for the current time in process Pj.
•  d: drift time; generally d is 1.
Implementation Rules [IR]:
•  [IR1]: If a -> b [‘a’ happened before ‘b’ within the same process], then Ci(b) = Ci(a) + d.
•  [IR2]: Cj = max(Cj, tm + d) [on receiving message m with timestamp tm = Ci(a), process Pj sets Cj to the maximum of Cj and tm + d].
For Example:

•  Take the starting value as 1, since it is the 1st event and there is no incoming value at the starting point:
   o e11 = 1
   o e21 = 1
•  The value of the next point keeps increasing by d (d = 1) if there is no incoming value, i.e., following [IR1]:
   o e12 = e11 + d = 1 + 1 = 2
   o e13 = e12 + d = 2 + 1 = 3
   o e14 = e13 + d = 3 + 1 = 4
   o e15 = e14 + d = 4 + 1 = 5
   o e16 = e15 + d = 5 + 1 = 6
   o e22 = e21 + d = 1 + 1 = 2
   o e24 = e23 + d = 3 + 1 = 4
   o e26 = e25 + d = 6 + 1 = 7
•  When there is an incoming value, follow [IR2], i.e., take the maximum value between Cj and tm + d:
   o e17 = max(7, 5) = 7 [e16 + d = 6 + 1 = 7, e24 + d = 4 + 1 = 5; the maximum of 7 and 5 is 7]
   o e23 = max(3, 3) = 3 [e22 + d = 2 + 1 = 3, e12 + d = 2 + 1 = 3; the maximum of 3 and 3 is 3]
   o e25 = max(5, 6) = 6 [e24 + d = 4 + 1 = 5, e15 + d = 5 + 1 = 6; the maximum of 5 and 6 is 6]
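
The rules [IR1] and [IR2] can be sketched in a few lines of Python; the Process class, the value d = 1, and the short message exchange below are illustrative and only mirror part of the worked example.

# Lamport logical clock sketch implementing [IR1] and [IR2] with d = 1.
class Process:
    def __init__(self, name, d=1):
        self.name, self.clock, self.d = name, 0, d

    def internal_event(self):
        self.clock += self.d                # [IR1]: a local event advances the clock by d
        return self.clock

    def send(self):
        self.clock += self.d                # sending a message is also an event
        return self.clock                   # the timestamp tm travels with the message

    def receive(self, tm):
        self.clock = max(self.clock, tm + self.d)   # [IR2]: jump past the sender's timestamp
        return self.clock

p1, p2 = Process("P1"), Process("P2")
print(p1.internal_event())   # 1, like e11
print(p2.internal_event())   # 1, like e21
tm = p1.send()               # 2, like e12; the message carries tm = 2
print(p2.receive(tm))        # max(1, 2 + 1) = 3, the same [IR2] rule used for e17, e23, e25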
Limitation:
•  If a -> b, then C(a) < C(b) always holds (this is what [IR1] and [IR2] guarantee).
•  The converse does not hold: C(a) < C(b) does not imply a -> b, since the two events may be concurrent. Lamport clocks therefore cannot by themselves detect causality, which is what vector clocks address.
4.2 Vector Clocks:

•  Each process maintains a vector with timestamps.
•  Allows detection of causal relationships between events.

What are Vector Clocks?

Vector clocks are a mechanism used in distributed systems to track the causality and
ordering of events across multiple nodes or processes. Each process in the system
maintains a vector of logical clocks, with each element in the vector representing the
state of that process’s clock. When events occur, these clocks are incremented, and
the vectors are exchanged and updated during communication between processes.

By comparing vector clocks, the system can identify if an event on one node causally
happened before, after, or concurrently with an event on another node, enabling
effective conflict resolution and ensuring consistency.

Use Cases of Vector Clocks in Distributed Systems

Vector clocks have several important use cases in distributed systems, particularly in
scenarios where tracking the order of events and understanding causality is critical.
Here are some key use cases:
•  Distributed databases such as Cassandra or Amazon DynamoDB use vector clocks to settle disputes that arise when several data replicas are updated separately.
•  Collaborative editing programs like Google Docs, where several people can edit the same document at once.
•  Event-driven systems, like distributed logging or monitoring systems, where the sequence of occurrences is important.
•  Debugging or monitoring distributed systems, where knowing the sequence in which various nodes operate is crucial.
•  Distributed file systems such as the Hadoop Distributed File System (HDFS) or Google File System (GFS), where several clients may read and edit files simultaneously.

Advantages of Vector Clocks in Distributed Systems

Vector clocks offer several advantages in distributed systems, particularly in managing the complexities of tracking and resolving the order of events. Here are the key benefits:

Causality Tracking: Vector clocks allow distributed systems to accurately track the
causal relationships between events. This helps in understanding the sequence of
operations across different nodes, which is critical for maintaining consistency and
preventing conflicts.

Conflict Resolution: Vector clocks provide a systematic way to detect and resolve
conflicts that arise due to concurrent updates or operations in a distributed system.

Efficiency in Event Ordering: Vector clocks efficiently manage event ordering without the need for a central coordinator, which can be a bottleneck in distributed systems.

Fault Tolerance: Vector clocks enhance fault tolerance by enabling the system to
handle network partitions or node failures gracefully. Since each node maintains its
own version of the clock, the system can continue to operate and later reconcile
differences when nodes are reconnected.

Scalability: Vector clocks scale well in large distributed systems because they do not
require global synchronization or coordination. Each process only needs to keep track
of its own events and those of other relevant processes.

Limitations of Vector Clocks in Distributed Systems


Even though they help track causality in distributed systems, vector clocks have several limitations that can influence their applicability and efficiency. Some of the key limitations include:

Scalability: In an n-node system, the size of a vector clock grows linearly with the number of nodes. Memory usage can become large, and communication costs correspondingly high.

Difficulty in Implementation: Correctly implementing vector clocks is challenging, especially when nodes frequently join and leave the system or when network partitions occur often.

Partial Ordering: Vector clocks provide only a partial ordering of events: they capture the causal relationship between some pairs of events, while concurrent events remain unordered. This can make it ambiguous to determine the exact order of events.

Communication Overhead: A vector clock must be sent along with every message exchanged between nodes, adding extra bytes to each message. This can be problematic in systems where bandwidth is limited or in applications that are highly sensitive to latency.

Limited by Network Dynamics: Vector clocks assume a relatively stable set of nodes. In highly dynamic systems, where nodes frequently join and leave, managing vector clocks becomes problematic and inconsistencies can easily develop.

How does the vector clock algorithm work?

Here is how the vector clock algorithm works:

•  Initially, all clocks are set to zero.
•  Every time an internal event occurs in a process, the process increments its own logical clock in the vector by 1.
•  Whenever a process sends a message, it increments its own logical clock in the vector by 1 and attaches the vector to the message.
•  Every time a process receives a message, it increments its own logical clock in the vector by 1.
•  In addition, each element of its vector is updated by taking the maximum of the value in its own vector clock and the corresponding value in the vector carried by the received message.
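
A minimal Python sketch of these rules for a fixed set of three processes follows; the process indices and the particular message pattern are illustrative assumptions.

# Vector clock sketch: increment your own entry on every event;
# on receive, also take an element-wise max with the incoming vector.
class Process:
    def __init__(self, index, n):
        self.i = index
        self.vc = [0] * n                   # all clocks start at zero

    def internal(self):
        self.vc[self.i] += 1                # internal event

    def send(self):
        self.vc[self.i] += 1                # sending is an event of this process
        return list(self.vc)                # the vector travels with the message

    def receive(self, incoming):
        self.vc[self.i] += 1                # receiving is an event of this process
        self.vc = [max(a, b) for a, b in zip(self.vc, incoming)]

p0, p1, p2 = (Process(i, 3) for i in range(3))
p0.internal()                # p0: [1, 0, 0]
msg = p0.send()              # p0: [2, 0, 0]
p1.receive(msg)              # p1: [2, 1, 0] -> p1 now knows about p0's first two events
p2.internal()                # p2: [0, 0, 1] -> concurrent with everything above
print(p0.vc, p1.vc, p2.vc)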
