Module 1

The document provides an overview of distributed computing, defining it as a system of interconnected independent computers that collaborate on tasks. It contrasts centralized and distributed systems, highlighting features such as scalability, fault tolerance, and resource sharing. Additionally, it discusses the importance of inter-process communication, message passing, and the various architectures and advantages of distributed systems.

Uploaded by

shivampoddar171
Copyright
© © All Rights Reserved
Module 1: Introduction

By
Prof. Ankita Mandore,
Assistant Professor,
CSE (Data Science), DSCE
What is Distributed Computing?

 A distributed system is a collection of independent computers, interconnected via a network, capable of collaborating on a task.
 Distributed computing is computing performed in a distributed system.
 Computation originally ran on a single processor. This uniprocessor computing can be termed centralized computing.
Example of Distributed System
Features of Distributed Systems

 Communication is hidden from users
 Applications interact in a uniform and consistent way
 High degree of scalability
 Resource sharing is possible in distributed systems
 Distributed systems act as fault-tolerant systems
 Enhanced performance
Centralized Systems vs Distributed Systems
Centralized Systems | Distributed Systems
Several jobs are done on a particular central processing unit (CPU) | Jobs are distributed among several processors, which are interconnected by a computer network
Centralized control and authority | Decentralized control and authority
Communication flows to central node | Direct communication between nodes
Single point of failure | Redundancy, less vulnerable to single points of failure
Limited scalability due to centralization | Highly scalable, new nodes can be added easily
Relatively simpler to manage | More complex to manage
Relation to Computer System Components
 In a distributed computing system, each node consists of a processor (CPU), local memory, and an interface.
 Communication between any two or more nodes is only by message passing because there is no common memory available.
 Distributed software is also termed middleware. The distributed system uses a layered architecture to break down the complexity of system design.
 Each computer has its own memory and processing unit, and the computers are connected by a communication network. All the computers can communicate with each other over a LAN or WAN.
 A distributed system is an information processing system that contains a number of independent computers that cooperate with one another over a communications network in order to achieve a specific objective.
 A distributed computer system consists of multiple software components that are on multiple computers but run as a single system.
 A distributed system can consist of any number of possible configurations, such as mainframes, personal computers, workstations, minicomputers, and so on.
Motivation
 Economics
 A collection of microprocessors offers a better price/performance ratio than mainframes.
 A low price/performance ratio makes this a cost-effective way to increase computing power.
 Speed
 A distributed system may have more total computing power than a mainframe.
 Scalability
 Distributed systems can be extended through the addition of components, thereby providing better scalability than centralized systems.
 Inherent distribution
 Some applications are inherently distributed, e.g. a supermarket chain.
 Reliability
 If one machine crashes, the system as a whole can still survive. This gives higher availability and improved reliability.
 Incremental growth
 Computing power can be added in small increments.
Need of Distributed System
 Resource sharing is the main motivation for distributed systems. The term resource is a rather abstract one, but it best characterizes the range of things that can usefully be shared in a networked computer system.
 Resources may be software resources or hardware resources. Printers, disks, and CD-ROM drives are examples of hardware resources; data is an example of a software resource.
 A resource manager is a software module that manages a set of resources of a particular type.
 The primary requirements of a distributed system are as follows:
1. Fault tolerance
2. Consistency
3. Security
4. Reliability
5. Replicated data
6. Concurrent transactions
Focus on Resource Sharing

 The term resource is a rather abstract one, but it best characterizes the range of things that can usefully be shared in a networked computer system.
 Equipment is shared to reduce cost. Data shared in databases or web pages are higher-level resources, which are more significant to users without regard for the server or servers that provide them.
 Types of resources
1. Hardware resource: hard disk, printer, camera
2. Data: file, database, web page
3. Service: search engine
Patterns of resource sharing vary widely in their scope and in how closely users work together.
 Search engine:
Users share the service without needing any contact with one another.
 Computer-supported cooperative working (CSCW):
Users who cooperate directly share resources. The mechanisms to coordinate users' actions are determined by the pattern of sharing and the geographic distribution of the users.
For effective sharing, each resource must be managed by a program that offers a communication interface enabling the resource to be accessed and updated reliably and consistently.
 Service:
Manages a collection of related resources and presents their functionality to users and applications.
 Server:
A server is a running program on a networked computer that holds resources and provides services to authenticated clients. It accepts requests from clients, performs the service, and responds to each request.
Example: Apache server
The complete interaction between the server machine and the client machine, from the point when the client sends its request to when it receives the server's response, is called a remote invocation.
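A remote invocation as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names serve_one_request and remote_invocation are hypothetical, the "service" simply upper-cases the request, and a local socket pair with a thread stands in for two networked machines.

```python
import socket
import threading

def serve_one_request(conn: socket.socket) -> None:
    """Hypothetical server: accept one request, perform the service, respond."""
    with conn:
        request = conn.recv(1024).decode("utf-8")
        response = request.upper()          # the "service" performed for the client
        conn.sendall(response.encode("utf-8"))

def remote_invocation(request: str) -> str:
    """Client side: send a request, then wait for the server's response."""
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_end,))
    server.start()
    with client_end:
        client_end.sendall(request.encode("utf-8"))
        response = client_end.recv(1024).decode("utf-8")
    server.join()
    return response

reply = remote_invocation("get index.html")
print(reply)   # GET INDEX.HTML
```

The interval between sendall and the return of recv on the client side corresponds to one remote invocation.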
 Hardware resources
 CPU
a) Computing server: executes processor-intensive applications for clients
b) Remote object server: executes methods on behalf of clients
c) Worm program: shares the CPU capacity of a desktop machine with the local user
 Memory
A cache server holds recently accessed web pages in its RAM, for faster access by other local computers
 Disk
File server, virtual disk server, video-on-demand server
 Screen
Network window systems
 Printer
Networked printers accept print jobs from many computers and manage them with a queuing system
 Software resources
 Web page
Web servers enable multiple clients to share read-only page content.
 File
File servers enable multiple clients to share read-write files.
 Object
The possibilities for software objects are limitless. A shared whiteboard, a shared diary, and a room booking system are examples of this type.
 Database
Databases are intended to record the definitive state of some related sets of data. They have been shared ever since multi-user computers appeared. They include techniques to manage concurrent updates.
 Newsgroup content
The netnews system makes read-only copies of recently posted news items available to clients throughout the internet.
 Video/audio stream
Servers can store entire videos on disk and deliver them at playback speed to multiple clients simultaneously.
Advantages of Distributed Computing
Disadvantages of Distributed Computing
Architectures of Distributed Systems
Inter Process Communication - Shared Data
Inter Process Communication - Message Passing
Message Passing in Distributed System
 A process is a program in execution.
 A resource manager process monitors the current usage status of its local resources. All resource managers communicate with each other from time to time to dynamically balance the system load.
 Therefore a distributed operating system (DOS) needs to provide inter-process communication (IPC) mechanisms for these communication activities.
 IPC basically requires information sharing among two or more processes.
 Two basic methods for information sharing
 Original sharing, or shared-data approach
 Copy sharing, or message-passing approach
 In the shared-data approach, the information to be shared is placed in a common memory area that is accessible to all the processes involved.
 In the message-passing approach, the information to be shared is physically copied from the sender process's address space to the address spaces of all receiver processes.
 This is done by transmitting the data in the form of messages.
 A message is a block of information.
 Communicating processes interact directly with each other.
 Processes in a distributed system communicate by exchanging messages.
 Message passing is the basic IPC mechanism in a distributed system.
 A message-passing system is a subsystem of a distributed operating system that provides a set of message-based IPC protocols.
 It enables processes to communicate using the simple communication primitives send and receive.
 The send primitive is denoted by send() and the receive primitive by receive().
 Message-passing primitive commands: SEND (msg, dest) and RECEIVE (src, buffer)
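The two primitives can be illustrated with a minimal Python sketch. It assumes one in-memory mailbox per process; the process ids "P1" and "P2" and the mailboxes table are hypothetical. Unlike RECEIVE (src, buffer) above, this receive reports the sender as its return value.

```python
import queue

# One mailbox per process, keyed by a hypothetical process id.
mailboxes = {"P1": queue.Queue(), "P2": queue.Queue()}

def send(msg, dest, src):
    """Place a copy of the message in the destination's mailbox (copy sharing)."""
    mailboxes[dest].put((src, bytes(msg)))   # bytes(msg) takes an immutable snapshot

def receive(me, buffer):
    """Block until a message arrives, copy it into buffer, return the sender id."""
    src, msg = mailboxes[me].get()
    buffer[:] = msg
    return src

buf = bytearray()
send(b"hello", dest="P2", src="P1")
sender = receive("P2", buf)
print(sender, bytes(buf))   # P1 b'hello'
```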
Desirable Features of a Good Message-Passing System
1. Simplicity
• A message-passing system (MPS) should be simple and easy to use
• It should be straightforward to construct new applications and to communicate with existing ones by using the primitives provided by the MPS
• Different modules of a distributed application should be able to communicate using simple primitives without worrying about system or network details
• Use of clean and simple semantics of IPC protocols
2. Uniform Semantics
• An MPS is used for two types of communication
• Local communication: the communicating processes are on the same node
• Remote communication: the communicating processes are on different nodes
• The semantics of remote communication should be as close as possible to those of local communication

3. Efficiency
• If the MPS is not efficient, IPC may become so expensive that application designers try to avoid using it in their applications
• An IPC protocol of an MPS can be made efficient by reducing the number of message exchanges during communication
• Some optimizations are
• Avoiding the cost of establishing and terminating connections between the same pair of processes for every message exchange
• Minimizing the cost of maintaining connections
• Piggybacking acknowledgements of previous messages on the next message
4. Reliability
• A reliable IPC protocol can cope with failure problems and guarantees the delivery of a message
• Failures may be due to a node crash or a communication link failure
• Handling of lost messages usually involves acknowledgements and retransmissions on the basis of timeouts
• Another issue related to reliability is duplicate messages
• Duplicate messages may be sent in the event of failures or because of timeouts
• A reliable IPC protocol is also capable of detecting and handling duplicates
• Sequence numbers are used to detect and discard duplicate messages
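The interplay of acknowledgements, timeout-based retransmission, and sequence numbers can be sketched as a small deterministic simulation. Everything here is an assumption made for illustration: LossyLink is a hypothetical link that drops chosen packets, and a "timeout" is modelled simply as moving on to the next attempt.

```python
class Receiver:
    """Delivers each message once, using sequence numbers to drop duplicates."""
    def __init__(self):
        self.delivered = []
        self.seen = set()

    def on_message(self, seq, data):
        if seq not in self.seen:        # first copy: deliver to the application
            self.seen.add(seq)
            self.delivered.append(data)
        return seq                      # always acknowledge, even duplicates

class LossyLink:
    """Hypothetical link that loses the packets whose 0-based index is in drop."""
    def __init__(self, drop):
        self.drop = set(drop)
        self.count = 0

    def deliver(self):
        lost = self.count in self.drop
        self.count += 1
        return not lost

def reliable_send(seq, data, receiver, link, max_tries=5):
    """Retransmit until an acknowledgement comes back; return the number of tries."""
    for attempt in range(1, max_tries + 1):
        if not link.deliver():          # message lost: the sender's timer expires
            continue
        ack = receiver.on_message(seq, data)
        if not link.deliver():          # ack lost: the sender times out and resends
            continue
        if ack == seq:
            return attempt
    raise TimeoutError("no acknowledgement received")

rx = Receiver()
link = LossyLink(drop={0, 2})           # lose the first message and the first ack
tries = reliable_send(1, "debit $10", rx, link)
print(tries, rx.delivered)              # 3 ['debit $10']
```

The second attempt reaches the receiver but its acknowledgement is lost, so the third attempt is a duplicate. The sequence number lets the receiver acknowledge it without delivering the debit twice.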

5. Correctness
• An IPC system often supports group communication
• One sender to multiple receivers, or multiple senders to one receiver
• Correctness is a feature related to IPC protocols for group communication
 Issues related to correctness are
 Atomicity
 Ensures that every message sent to a group of receivers will be delivered to either all of them or none of them
 Ordered delivery
 Ensures that messages arrive at all receivers in an order acceptable to the application
 Survivability
 Guarantees that messages will be delivered despite partial failure of processes, machines, or communication links
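Ordered delivery can be illustrated with a hold-back queue. This sketch assumes that "an order acceptable to the application" means the sender's sequence order, which is only one possible ordering. The class name OrderedReceiver is hypothetical.

```python
class OrderedReceiver:
    """Holds back early arrivals until all earlier sequence numbers are delivered."""
    def __init__(self):
        self.next_seq = 0
        self.held = {}
        self.delivered = []

    def on_message(self, seq, data):
        self.held[seq] = data
        # Deliver every message that is now next in sequence.
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1

r = OrderedReceiver()
for seq, data in [(1, "b"), (2, "c"), (0, "a")]:   # messages arrive out of order
    r.on_message(seq, data)
print(r.delivered)   # ['a', 'b', 'c']
```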

6. Flexibility
• Not all applications require the same degree of reliability and correctness of the
IPC protocols
• Many applications do not require atomicity or ordered delivery of messages
• The IPC primitives should be such that users have the flexibility to choose and
specify the types and levels of reliability and correctness requirements of
applications
• The IPC primitives should also be flexible enough to permit different kinds of control flow, such as synchronous and asynchronous send/receive
7. Security
• An MPS should be capable of providing secure end-to-end communication
• A message in transit on the network should not be accessible to any user other than the sender and those to whom it is addressed
• Steps necessary for secure communication are
• Authentication of the receiver(s) of a message by the sender
• Authentication of the sender of a message by its receiver(s)
• Encryption of a message before sending it over the network

8. Portability
• There are two different aspects of portability
• The MPS itself should be portable: it should be possible to easily construct a new IPC facility on another system by reusing the basic design of the existing MPS
• Applications written using the MPS primitives should also be portable, so heterogeneity must be considered while designing the MPS
Message Structure

 A message is a block of information formatted by a sending process in such a manner that it is meaningful to the receiving process
 It consists of a fixed-length header and a variable-size collection of typed data objects
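A fixed-length header followed by variable-size data can be sketched with Python's struct module. The header fields chosen here (source, destination, sequence number, payload length) and their sizes are assumptions for illustration, not a layout given in these slides.

```python
import struct

# Assumed header layout in network byte order:
# source id (2 bytes), destination id (2), sequence number (4), payload length (2).
HEADER = struct.Struct("!HHIH")

def pack_message(src, dest, seq, payload):
    """Prepend the fixed-length header to a variable-size payload."""
    return HEADER.pack(src, dest, seq, len(payload)) + payload

def unpack_message(raw):
    """Read the header first, then use its length field to slice out the payload."""
    src, dest, seq, length = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    return src, dest, seq, payload

raw = pack_message(1, 2, 7, b"hello")
print(HEADER.size, unpack_message(raw))   # 10 (1, 2, 7, b'hello')
```

Because the header has a known size, the receiver can always parse it before it knows anything about the variable-size part.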
Issues in IPC by message passing
 In a message-oriented IPC protocol, the sending process determines the actual contents of a message
 The receiving process must know how to interpret the contents
 Special primitives are explicitly used for sending and receiving the messages
 The following issues need to be considered in the design of an IPC protocol
 Who is the sender?
 Who is the receiver?
 Is there one receiver or many receivers?
 Is the message guaranteed to have been accepted by its receiver(s)?
 Does the sender need to wait for a reply?
 What should be done in case of failure (node crash or communication link failure)?
 What should be done if the receiver is not ready to accept the message?
 Will the message be discarded or stored in a buffer? In case of buffering, what should be done if the buffer is full?
 If there are several outstanding messages for a receiver, can it choose the order in which to service the messages?
 Synchronization
 A major issue for communicating processes is synchronization
 The semantics of communication primitives are classified as blocking and non-blocking
 A primitive has non-blocking semantics if its invocation does not block the execution of its invoker
 Otherwise the primitive is blocking (execution of the invoker is blocked)
 Both types of semantics can be used for the send and receive primitives
 With a blocking send, after the execution of send, the sending process is blocked until an acknowledgement is received
 With a blocking receive, after execution of the receive statement, the receiving process is blocked until it receives a message
 With a non-blocking send, the sending process is allowed to continue executing right after the send
 With a non-blocking receive, the receiving process proceeds with its execution after executing the receive statement
 An important issue with a non-blocking receive primitive is how the receiving process learns that the message has arrived in the message buffer
 The following two methods are used for this
 Polling
 A test primitive is provided so that the receiver can check the buffer status
 The receiver periodically polls the kernel to check the buffer

 Interrupt
 When the message has been placed in the buffer, a software interrupt is used to notify the receiving process
 This method lets the receiving process continue executing without issuing unsuccessful test requests
 It is highly efficient and allows maximum parallelism
 The drawback is that user-level interrupts make programming difficult
 A variant of the non-blocking receive primitive is the conditional receive primitive
 It returns control immediately, either with a message or with an indicator that no message is available
 A blocking send primitive can use a timeout value so that the sender is not blocked forever
 The value is set by the user or is a default value
 A timeout value can also be used with a blocking receive primitive to prevent the receiving process from being blocked indefinitely
 When both the send and receive primitives of a communication between two processes use blocking semantics, the communication is said to be synchronous
 If it uses non-blocking primitives, the communication is asynchronous
 Synchronous communication is simple and easy to implement
 It provides high reliability
 Drawbacks are
 It limits concurrency and is subject to communication deadlocks
 It is less flexible because the sending process always has to wait for an acknowledgement, even when it is not required
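The receive variants above map onto Python's standard queue module. This is a sketch that assumes a queue.Queue stands in for the kernel's message buffer. The helper names conditional_receive and receive_with_timeout are hypothetical.

```python
import queue

mailbox = queue.Queue()

def conditional_receive(box):
    """Return (True, msg) if a message is waiting, else (False, None); never blocks."""
    try:
        return True, box.get_nowait()
    except queue.Empty:
        return False, None

def receive_with_timeout(box, seconds):
    """Blocking receive that gives up after the timeout instead of waiting forever."""
    try:
        return box.get(timeout=seconds)
    except queue.Empty:
        return None

first = conditional_receive(mailbox)      # nothing has been sent yet
mailbox.put("m1")                         # the sender does not wait for the receiver
second = conditional_receive(mailbox)
third = receive_with_timeout(mailbox, 0.05)
print(first, second, third)               # (False, None) (True, 'm1') None
```

Calling conditional_receive repeatedly in a loop would amount to polling. A plain box.get() with no timeout is the fully blocking receive.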
Buffering
 Message passing copies a message from the address space of the sending process to the address space of the receiving process
 If the receiving process is not ready to receive a message, the message should be saved for later use
 The message-buffering strategy is closely related to the synchronization strategy
 The following are the buffering strategies
1. Null buffer or no buffer
2. Buffer with unbounded capacity
3. Single-message buffer
4. Finite-bound or multiple-message buffer
Null buffer (or no buffering)
 There is no place to temporarily store the message
 One of the following implementation strategies is used
 The message remains in the sender's address space and execution of send is delayed until the receiver executes receive
 The message is simply discarded and a timeout mechanism is used to resend the message after a timeout period
Single-Message buffer
 A buffer with capacity to store a single message is used on the receiver's node
 An application module may have at most one message outstanding at a time
 The idea of the single-message buffer strategy is to keep the message ready for use at the location of the receiver
 The request message is buffered on the receiver's node if the receiver is not
ready to receive the message
 The message buffer may either be located in the kernel's address space or in
the receiver process's address space
Unbounded-capacity buffer

 A sender does not wait for the receiver to be ready
 An unbounded-capacity message buffer can store all unreceived messages
 It assures that all the messages sent to the receiver will be delivered
Finite-bound (or multiple-message) buffer
 Asynchronous modes of communication use finite-bound buffers
 A mechanism is needed to handle the problem of buffer overflow
 Two ways to handle buffer overflow
 Unsuccessful communication
 Message transfers simply fail whenever there is no more buffer space
 The send normally returns an error message to the sending process
 This method is less reliable

 Flow-controlled communication
 The sender is blocked until the receiver accepts some messages
 This method introduces synchronization between sender and receiver
 It can result in unexpected deadlocks
 The amount of buffer space to be allocated depends on the implementation
 A create-buffer system call is provided to the users
 The receiver's mailbox is located in the kernel's address space or in the receiver process's address space
 This buffering provides better concurrency and flexibility
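The two overflow policies can be contrasted with a bounded queue.Queue. This is a sketch under assumptions: the buffer holds two messages, and a timer thread plays the role of a receiver that accepts one message after a short delay.

```python
import queue
import threading

def unsuccessful_send(box, msg):
    """Unsuccessful communication: fail immediately when the buffer is full."""
    try:
        box.put_nowait(msg)
        return True
    except queue.Full:
        return False

def flow_controlled_send(box, msg):
    """Flow-controlled communication: block until the receiver frees a slot."""
    box.put(msg)

box = queue.Queue(maxsize=2)              # finite-bound buffer with two slots
results = [unsuccessful_send(box, m) for m in ("m1", "m2", "m3")]

# A receiver that accepts one message after 50 ms, unblocking the sender.
receiver = threading.Timer(0.05, box.get)
receiver.start()
flow_controlled_send(box, "m4")           # waits until "m1" has been taken
receiver.join()
remaining = [box.get(), box.get()]
print(results, remaining)                 # [True, True, False] ['m2', 'm4']
```

If no receiver ever ran, flow_controlled_send would block forever, which is the deadlock risk noted above.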
Multidatagram Messages
 Every network has an upper bound on the size of data that can be transmitted at a time
 This size is known as the maximum transfer unit (MTU) of the network
 A message larger than the MTU has to be fragmented into pieces that each fit within the MTU
 Each fragment is sent separately
 Each fragment is sent in a packet with control information and data
 Each packet is known as a datagram
 Messages smaller than the MTU of the network can be sent in a single packet and are known as single-datagram messages
 Messages larger than the MTU of the network have to be fragmented and sent in multiple packets and are known as multidatagram messages
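Fragmentation and reassembly can be sketched as follows. As a simplification, mtu here counts only data bytes per datagram, and the control information is reduced to a fragment index and a fragment count. The function names fragment and reassemble are hypothetical.

```python
def fragment(message, mtu):
    """Split a message into datagrams (index, total, data) of at most mtu data bytes."""
    total = max(1, -(-len(message) // mtu))      # ceiling division
    return [(i, total, message[i * mtu:(i + 1) * mtu]) for i in range(total)]

def reassemble(datagrams):
    """Rebuild the message from its fragments, tolerating out-of-order arrival."""
    ordered = sorted(datagrams)                  # sorts by fragment index
    total = ordered[0][1]
    if len(ordered) != total:
        raise ValueError("missing fragments")
    return b"".join(data for _, _, data in ordered)

msg = b"x" * 2500
packets = fragment(msg, mtu=1000)
sizes = [len(data) for _, _, data in packets]
print(len(packets), sizes)                       # 3 [1000, 1000, 500]
restored = reassemble(reversed(packets))         # fragments arrive in reverse order
```

A message of 1000 bytes or fewer yields one packet, which is the single-datagram case.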
Encoding and Decoding of Message Data
 The structure of program objects should be preserved while they are transmitted from the address space of the sending process to that of the receiving process
 This is difficult, especially when the two processes are on computers of different architectures
 There are two reasons
 An absolute pointer value loses its meaning when transferred from one address space to another
 Different program objects occupy varying amounts of storage space, e.g. long int, short int, variable-size character strings
 Because of these problems, program objects are first converted to a stream form suitable for transmission and placed into a message buffer
 This conversion process on the sender side is known as encoding of the message data
 On the receiver side, the stream form is converted back to the original program objects
 This is known as decoding
Two representations used for the encoding and decoding
 Tagged representation
 The type of each program object along with its value is encoded in the message
 The receiving process can check the type of each program object in the message
 This is possible because of the self-describing nature of the coded data format
 Untagged representation
 The message data only contains program objects
 No information is included in the message data to specify the type of each program object
 The receiver process must have prior knowledge of how to decode the message data
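The difference between the two representations can be sketched with Python's struct module. The one-byte tags "i" and "s" and the agreed layout "!iH" are assumptions chosen for this example, not a format defined in these slides.

```python
import struct

def encode_tagged(values):
    """Prefix every value with a one-byte type tag so the data is self-describing."""
    out = b""
    for v in values:
        if isinstance(v, int):
            out += b"i" + struct.pack("!i", v)
        elif isinstance(v, str):
            data = v.encode("utf-8")
            out += b"s" + struct.pack("!H", len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(v).__name__}")
    return out

def decode_tagged(raw):
    """Use the tags in the stream to rebuild the objects without prior knowledge."""
    values, pos = [], 0
    while pos < len(raw):
        tag = raw[pos:pos + 1]
        pos += 1
        if tag == b"i":
            values.append(struct.unpack_from("!i", raw, pos)[0])
            pos += 4
        elif tag == b"s":
            (length,) = struct.unpack_from("!H", raw, pos)
            pos += 2
            values.append(raw[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return values

# Untagged: both sides must already agree on the layout (a 4-byte int, then a 2-byte int).
UNTAGGED = struct.Struct("!iH")

tagged = encode_tagged([42, "ok"])
untagged = UNTAGGED.pack(42, 7)
print(decode_tagged(tagged), len(tagged))        # [42, 'ok'] 10
print(UNTAGGED.unpack(untagged), len(untagged))  # (42, 7) 6
```

The tagged stream is longer but can be decoded by any receiver that knows the tags. The untagged stream is compact but meaningless without the agreed layout.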
Algorithmic challenges in distributed computing

 Designing useful execution models and frameworks
 Dynamic distributed graph algorithms and distributed routing algorithms
 Time and global state in a distributed system
 Synchronization/coordination mechanisms
 Group communication, multicast, and ordered message delivery
 Monitoring distributed events and predicates
 Distributed program design and verification tools
 Debugging distributed programs
 Data replication, consistency models, and caching.
Applications of distributed computing and newer challenges
1. Mobile systems
2. Sensor networks
3. Ubiquitous or pervasive computing
4. Peer-to-peer computing
5. Publish-subscribe, content distribution, and multimedia
6. Distributed agents
7. Distributed data mining
8. Grid computing
9. Security in distributed system
Types of Message Passing in Distributed Systems

 Message passing describes the method by which nodes or processes interact and share information in distributed systems.
 Message passing can be divided into two main categories according to the sender and receiver's timing and synchronization (synchronous and asynchronous), and further classified by the number of receivers (unicast, multicast, and broadcast).
1. Synchronous Message Passing
 Synchronous message passing involves a tightly coordinated interaction between the sender and receiver. The key characteristics include:
 Timing Coordination: Before proceeding with execution, the sender waits for the recipient to confirm receipt of the message or finish processing it.
 Request-Response Pattern: Synchronous systems often use a request-response paradigm in which the sender sends a message requesting something and then waits for the recipient to respond.
 Advantages:
 Ensures precise synchronization between communicating entities.
 Simplifies error handling as the sender knows when the message has been successfully
received or processed.
 Disadvantages:
 May introduce latency if the receiver is busy or unavailable.
 Synchronous blocking can reduce overall system throughput if many processes are
waiting for responses.
2. Asynchronous Message Passing
 Asynchronous message passing allows processes to operate independently of each
other in terms of timing. Key features include:
 Decoupled Timing: The sender does not wait for an immediate response from the
receiver after sending a message. It continues its execution without blocking.
 Event-Driven Model: Communication is often event-driven, where processes
respond to messages or events as they occur asynchronously.
 Advantages:
 Enhances system responsiveness and throughput by allowing processes to execute
concurrently.
 Supports loosely coupled interactions, letting each process handle messages at its own pace.
 Disadvantages:
 Requires additional mechanisms (like callbacks or event handlers) to manage responses
or coordinate actions.
 Handling out-of-order messages or ensuring message delivery reliability can be more
complex compared to synchronous communication.
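The contrast between the two styles can be sketched with one receiver thread and a shared request queue. This is an illustration only: send_sync and send_async are hypothetical helpers, and a per-request reply queue stands in for the receiver's confirmation.

```python
import queue
import threading

requests = queue.Queue()

def receiver():
    """Process messages until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        msg, reply_box = item
        if reply_box is not None:          # only synchronous senders expect a reply
            reply_box.put(f"ack:{msg}")

def send_sync(msg):
    """Synchronous send: block until the receiver confirms the message."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((msg, reply_box))
    return reply_box.get()

def send_async(msg):
    """Asynchronous send: enqueue the message and return immediately."""
    requests.put((msg, None))

worker = threading.Thread(target=receiver)
worker.start()
send_async("log entry")                    # the sender continues without waiting
reply = send_sync("transfer")              # the sender waits for the confirmation
requests.put(None)                         # tell the receiver to stop
worker.join()
print(reply)                               # ack:transfer
```

The asynchronous sender learns nothing about whether "log entry" was processed. Finding that out would need an extra mechanism such as a callback.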
3. Unicast Messaging
 Unicast messaging is a one-to-one communication where a message is sent from a
single sender to a specific receiver. The key characteristics include:
 Direct Communication: The message is targeted at a single, specific node or
endpoint.
 Efficiency for Point-to-Point: Since only one recipient receives the message,
resources are efficiently used for direct, point-to-point communication.
 Advantages:
 Optimized for targeted communication, as the message is only sent to the intended
recipient.
 Minimizes network load compared to group messaging, as it doesn’t broadcast to
unnecessary nodes.
 Disadvantages:
 Not scalable for group communications; sending multiple unicast messages can strain the
system in larger networks.
 Can increase the complexity of managing multiple unicast connections in large-scale
applications.
4. Multicast Messaging

 Multicast messaging enables one-to-many communication, where a message is sent from one
sender to a specific group of receivers. The key characteristics include:
 Group-Based Communication: Messages are delivered to a subset of nodes that have joined
the multicast group.
 Efficient for Groups: Saves bandwidth by sending the message once to all nodes in the group
instead of individually.
 Advantages:
 Reduces network traffic by sending a single message to multiple recipients, making it ideal for
content distribution or group updates.
 Scales efficiently for applications where data needs to reach specific groups, like video
conferencing or online gaming.
 Disadvantages:
 Complex to implement as nodes need mechanisms to manage group memberships and handle
node join/leave requests.
 Not all network infrastructures support multicast natively, which can limit its applicability.
5. Broadcast Messaging
 Broadcast messaging involves sending a message from one sender to all nodes
within the network. The key characteristics include:
 Wide Coverage: The message is sent to every node, ensuring that all nodes in the
network receive it.
 Network-Wide Reach: Suitable for announcements, alerts, or updates intended
for all nodes without targeting specific ones.
 Advantages:
 Guarantees that every node in the network receives the message, which is useful for
critical notifications or status updates.
 Simplifies dissemination of information when all nodes need to be aware of an event or
data change.
 Disadvantages:
 Consumes significant network resources since every node, regardless of relevance,
receives the message.
 Can lead to unnecessary processing at nodes that don’t need the message, potentially
causing inefficiency.
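The three delivery patterns can be compared on a toy in-memory network. The Network class below is hypothetical and ignores real transport details. It only shows which inboxes each kind of send reaches.

```python
class Network:
    """Hypothetical in-memory network in which every node has an inbox list."""
    def __init__(self, nodes):
        self.inbox = {n: [] for n in nodes}
        self.groups = {}

    def join(self, group, node):
        """Add a node to a multicast group, creating the group if needed."""
        self.groups.setdefault(group, set()).add(node)

    def unicast(self, dest, msg):
        """One-to-one: deliver to a single, specific node."""
        self.inbox[dest].append(msg)

    def multicast(self, group, msg):
        """One-to-many: deliver only to nodes that joined the group."""
        for node in self.groups.get(group, ()):
            self.inbox[node].append(msg)

    def broadcast(self, msg):
        """One-to-all: deliver to every node in the network."""
        for box in self.inbox.values():
            box.append(msg)

net = Network(["A", "B", "C"])
net.join("video", "A")
net.join("video", "B")
net.unicast("C", "hi C")
net.multicast("video", "frame 1")
net.broadcast("shutdown at 5pm")
print(net.inbox["A"])   # ['frame 1', 'shutdown at 5pm']
print(net.inbox["C"])   # ['hi C', 'shutdown at 5pm']
```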
