MPI Part2 Updated

The document provides an overview of the Message Passing Interface (MPI), detailing its capabilities for both shared and distributed memory architectures, and its support for various programming languages. It explains the concept of MPI communicators, including the default communicator MPI_COMM_WORLD, and outlines important MPI calls for initiating and finalizing computations, as well as sending and receiving messages. Additionally, it discusses the differences between point-to-point and collective communications, highlighting the benefits of using collective operations for improved code readability and performance.

Distributed Memory Programming Model: MPI

MPI Overview

MPI (Message Passing Interface)

- Can be used for shared-memory as well as distributed-memory architectures (hybrid, if required)
- Supported by Fortran, C, and C++ (modules are also available for Python and Java)
- Hides hardware details of the underlying system, so programs are portable
- Many high-performance libraries have MPI versions of their API calls
- The MPI 3.0 specification has 400+ commands (function calls); knowledge of only 11-12 of them is enough to do the job in more than 90% of cases


MPI Communicators
MPI_COMM_WORLD: name of the default MPI communicator
- A communication universe (communication domain, communication group) for a group of processes
- Communicators are stored in variables of type MPI_Comm
- Communicators are used as arguments to all message-transfer MPI routines
- Each process within a communicator has a rank: a unique integer identifier in the range [0, #processes − 1]
- Multiple communicators can be established in a single MPI program (see the sketch below)
- Intra-communicator: used for communication within a single group
- Inter-communicator: used for communication between two disjoint groups
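A minimal sketch (my own illustration, not from the slides) of establishing an additional communicator: MPI_Comm_split divides MPI_COMM_WORLD into two intra-communicators, one for the even world ranks and one for the odd world ranks, and each process gets a new rank inside its sub-communicator.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank, world_size, sub_rank;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* "color" selects the group; "key" (here the world rank) orders ranks inside it */
    int color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);
    MPI_Comm_rank(sub_comm, &sub_rank);

    printf("World rank %d/%d has rank %d in sub-communicator %d\n",
           world_rank, world_size, sub_rank, color);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}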

MPI Communicators (cont.)

[Figure: MPI_COMM_WORLD containing processes P0-P3, with several sub-communicators (COMM 1 to COMM 5) grouping subsets of these processes]


First Look (hellompi.c)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Hello from %d out of %d\n", my_rank, size);

    MPI_Finalize();

    return 0;
}

mpicc hellompi.c                      # Compilation (mpiCC for C++; also gcc hellompi.c -lmpi)
mpirun -np 4 -hostfile filename a.out # Execution


Configuring a Simple MPI-based Distributed Computing Cluster

Requirements

SSH server:
apt-get install openssh-server
OpenMPI library:
apt-get install openmpi-bin openmpi-doc libopenmpi-dev
NFS (Network File System):
apt-get install nfs-server nfs-client


Configuring a Simple MPI-based Distributed Computing Cluster (cont.)

Transferring Files

There are many ways to transfer files. You can set up an NFS mount point, share files using Dropbox, or send files using scp. The scp method is shown below:
scp /location/of/a.out username@ipaddress:/home/username/a.out
Note: every cluster node must be able to find the executable file at the same location as every other cluster node.


Important MPI Calls

MPI_Init(int *argc, char ***argv);                    // Initiate an MPI computation
MPI_Finalize(void);                                   // Terminate an MPI computation
MPI_Comm_size(MPI_Comm comm, int *size);              // How many processes?
MPI_Comm_rank(MPI_Comm comm, int *rank);              // Who am I?
MPI_Get_processor_name(char *name, int *resultlen);   // What is the hostname?
MPI_Wtime(void);                                      // Elapsed wall-clock time in seconds
MPI_Abort(MPI_Comm comm, int errorcode);              // Terminate all processes
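A minimal sketch (my own illustration, not from the slides) exercising the calls above: it reports the host each rank runs on and times a dummy workload with MPI_Wtime.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, rank, namelen;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(hostname, &namelen);

    double start = MPI_Wtime();
    /* ... some work here ... */
    double elapsed = MPI_Wtime() - start;

    printf("Rank %d of %d on %s took %f s\n", rank, size, hostname, elapsed);

    MPI_Finalize();
    return 0;
}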

Sending/Receiving

What may happen in the code for P0 (left) and P1 (right) below?

P0:                       P1:
int a = 100;              int a;
send(&a, P1);             receive(&a, P0);
a = 0;                    printf("%d\n", a);

(Depending on the send semantics discussed next, P1 may print 100 or, if the send returns before the data has actually been copied out and P0 overwrites a, 0.)


Approaches to Send/Receive

Blocking (Non-Buffered) Send/Receive
- Follows some form of "handshaking" protocol:
  Request to Send → Clear to Send → Send Data → Acknowledgement
- Problem 1: idling overhead (on both the sender and the receiver side)
- Problem 2: deadlock (both sides sending at the same time; see the sketch below)

Blocking (Buffered) Send/Receive
- The send copies the data to a designated buffer and returns after the "copy" operation is completed
- Problem 1: buffer size. If the producer below runs faster than the consumer, the buffer keeps filling up:

P0:                              P1:
for (i = 0; i < 1000; i++) {     for (i = 0; i < 1000; i++) {
    produce_data(&a);                receive(&a, P0);
    send(&a, P1);                    consume_data(&a);
}                                }

- Problem 2: deadlock (both sides sending at the same time)
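A minimal sketch (my own illustration, not from the slides) of the deadlock in Problem 2 with blocking, non-buffered sends: if both ranks call MPI_Send first, each blocks waiting for the matching receive and neither progresses. Reordering the calls on one rank (shown here) breaks the cycle; MPI_Sendrecv is another standard way out. Assumes exactly two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, other, a = 42, b;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                        /* assumes exactly 2 processes */

    /* Deadlock-prone version (both ranks send first):
     *   MPI_Send(&a, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
     *   MPI_Recv(&b, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     * Safe version: reverse the order on one of the ranks. */
    if (rank == 0) {
        MPI_Send(&a, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
        MPI_Recv(&b, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(&b, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&a, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
    }

    printf("Rank %d received %d\n", rank, b);
    MPI_Finalize();
    return 0;
}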


Approaches to Send/Receive (cont.)

Non-Blocking Send/Receive
- Returns from the send/receive operation before it is semantically "safe" to do so
- It is the programmer's responsibility to ensure that the data being sent is not altered before the transfer completes

Blocking operations: safe and easy programming (at the cost of overhead and the risk of deadlocks)
Non-blocking operations: useful for performance optimization and for breaking deadlocks, but they bring in plenty of race conditions if the programmer is not careful (see the sketch below)
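A minimal sketch (my own illustration, not from the slides) of the non-blocking pattern: both ranks post MPI_Irecv and MPI_Isend, may do unrelated work, then call MPI_Waitall before touching the buffers. Because neither call blocks, the "both send first" deadlock above cannot occur. Assumes exactly two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, other, a, b;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                        /* assumes exactly 2 processes */
    a = rank * 100;

    /* Post both operations; neither call blocks */
    MPI_Irecv(&b, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&a, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... useful computation that touches neither a nor b ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   /* now a may be reused and b is valid */
    printf("Rank %d received %d\n", rank, b);

    MPI_Finalize();
    return 0;
}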

Collective Communication
Collective communication involves communication of data among all processes inside a given communicator; the default communicator that contains all available processes is called MPI_COMM_WORLD. Whenever a collective call is made, it must be called by all processes inside the communicator. Collective communications will not interfere with point-to-point communications, nor will point-to-point communications interfere with collective communications. Collective communications also do not need tags. The send and receive buffers used in collective communication calls must match for the call to work, and there is no guarantee that a function will be synchronizing (except for the barrier). Also, the collective communication operations discussed here are blocking. These are some things to keep in mind while using collective communication operations.

Point-to-Point Communication

MPI provides a set of send and receive functions that allow communication of typed data with an associated message tag. Typing of the message contents is necessary for heterogeneous support: the type information is needed so that correct data-representation conversions can be performed as data is sent from one architecture to another. The tag allows selectivity of messages at the receiving end: one can receive on a particular tag, or one can wild-card this quantity, allowing reception of messages with any tag. Message selectivity on the source process of the message is also provided.

Types of Point-to-Point Send/Receive Calls

Synchronous transfer: the send/receive routines return only when the message transfer has completed. This not only transfers data, it also synchronizes the processes.
MPI_Send()   // Blocking send
MPI_Recv()   // Blocking receive

Asynchronous transfer: the send/receive routines do not wait for the data transfer to complete and proceed with the next instruction. (Precaution: do not modify the send/receive buffers until the transfer has completed.)
MPI_Isend()  // Non-blocking send
MPI_Irecv()  // Non-blocking receive

Point-to-Point Communication (cont.)

Sending

int MPI_Send(void *buffer, int count, MPI_Datatype datatype,
             int destination, int tag, MPI_Comm comm);

- Send the data stored in buffer
- count is the number of entries in the buffer
- datatype is the type of the entries in the buffer (MPI_CHAR, MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_LONG, MPI_SHORT, MPI_UNSIGNED_CHAR, etc.)
- destination is the rank of the process, residing in the communication universe comm, to whom the buffer is to be sent
- tag is the tag of the message (to distinguish between different types of messages)


Point-to-Point Communication (cont.)

Receiving

int MPI_Recv(void *buffer, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm,
             MPI_Status *status);

- Store the received message in buffer
- count is the number of entries the buffer can receive. If the incoming message is larger than the capacity of the buffer, an overflow error MPI_ERR_TRUNCATE is returned.
- datatype is the type of the data to be received
- source is the rank of the process, residing in the communication domain comm, from whom the buffer is received. source can be hard-set, or the wild-card MPI_ANY_SOURCE.
- To retrieve a message of a certain type, set the tag argument. If there are many messages with the same tag from the same process, any one of them may be retrieved. If a message of any tag is to be retrieved, use the wild-card MPI_ANY_TAG.
- Store the status of the received message in status (next slide). If it is not needed, use MPI_STATUS_IGNORE.
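A minimal sketch (my own illustration, not from the slides) combining the two calls above: rank 0 sends an array of doubles with a tag, and rank 1 receives it with the wild-cards and inspects the MPI_Status fields. MPI_Get_count (not covered on the slides) reads the number of received entries from the status object.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, count;
    double data[4] = {1.0, 2.0, 3.0, 4.0};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(data, 4, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);   /* tag 99 */
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_DOUBLE, &count);
        printf("Got %d doubles from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}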



Collective Communications

Point-to-point: it is the programmer's responsibility to ensure that all processes participate correctly in a given communication (the programmer's burden). MPI simplifies this using collective communication.

Collective communications transmit data among all processes in a group specified by an intra-communicator object. One function, the barrier function, serves to synchronize processes without passing data: no process returns from the barrier function until all processes in the group have called it. A barrier is a simple way of separating two phases of computation to ensure that the messages generated in the two phases do not intermingle.

Types are:
- Synchronization (barriers): MPI_Barrier()
- Moving data: broadcasting MPI_Bcast(), scattering MPI_Scatter(), gathering MPI_Gather()
- Collective computation: reduction MPI_Reduce()

Differences from point-to-point communications:
- No message tags
- Most calls/versions support blocking communication only

MPI provides the following collective communication functions:

- Barrier synchronization across all group members
- Global communication functions (data-movement routines):
  - Broadcast of the same data from one member to all members of a group
  - Gather data from all group members to one member
  - Scatter different data from one member to the other members of a group
  - A variation on gather where all members of the group receive the result
  - Scatter/gather data from all members to all members of a group (also called complete exchange or all-to-all)
- Global reduction operations, such as sum and product, max and min, bitwise and logical operations, or user-defined functions:
  - A reduction where the result is returned to all group members, and a variation where the result is returned to only one member
  - A combined reduction and scatter operation
Code readability and maintainability
- It is easier to read and maintain code written with collectives. For example, if every process needs to send something to every other process, this would require on the order of N^2 point-to-point communications; with a collective it is one simple call (a sketch follows the list below).

Performance
- MPI implementations provide algorithms that are optimized for collective communication. As mentioned above, we can also save a lot of time by having one call instead of several.
The five major ways of communication that MPI implements are:
- Barriers: wait for the others before proceeding (uses Barrier)
- All-to-one: all processes send data to one (uses Gather and Allgather)
- One-to-all: one process sends data to all processes (uses Broadcast and Scatter)
- All-to-all: all processes send data to all processes (uses Alltoall)
- Combining results: collect results from every process and do something with them (uses Reduce)
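A minimal sketch (my own illustration, not from the slides) of the collective calls named above: rank 0 broadcasts a parameter to every rank, each rank computes a partial value, and MPI_Reduce combines the partial values with a sum on rank 0, replacing what would otherwise be many point-to-point messages.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) n = 1000;                           /* parameter known only to rank 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);      /* now every rank has n */

    int partial = rank * n;                            /* each rank's local contribution */
    int total = 0;
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of rank*n over %d ranks = %d\n", size, total);

    MPI_Finalize();
    return 0;
}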
Communication Domains

A communicator object specifies a communication domain which can be used for point-to-point communications.

An intra-communicator is used for communicating within a single group of processes. The intra-communicator has fixed attributes that, for example, describe the process group and the topology of the processes in the group. Intra-communicators are also used for collective operations within a group of processes.

An inter-communicator is used for point-to-point communication between two disjoint groups of processes. The fixed attributes of an inter-communicator are the two groups. No topology is associated with an inter-communicator.
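A minimal sketch (my own illustration, not from the slides) of an inter-communicator: MPI_Comm_split first forms two disjoint intra-communicators, MPI_Intercomm_create then joins them, and a point-to-point message is sent from the leader of one group to the leader of the other (on an inter-communicator, the destination/source rank refers to the remote group). Assumes at least two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank, color, value;
    MPI_Comm local_comm, inter_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Two disjoint groups: even world ranks (color 0) and odd world ranks (color 1) */
    color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &local_comm);

    /* Build an inter-communicator between the two groups.
     * Local leader: rank 0 of local_comm; remote leader: world rank 1 - color. */
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, 1 - color, 99, &inter_comm);

    /* Point-to-point across the groups: leader of group 0 sends to leader of group 1 */
    if (color == 0 && world_rank == 0) {
        value = 123;
        MPI_Send(&value, 1, MPI_INT, 0, 0, inter_comm);   /* rank 0 of the remote group */
    } else if (color == 1 && world_rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, inter_comm, MPI_STATUS_IGNORE);
        printf("Remote leader received %d\n", value);
    }

    MPI_Comm_free(&inter_comm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}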
