
Message Passing

Outline

• MPI basics,
• Point-to-point communication,
• Collective communication,
• Synchronous/asynchronous send/receive,
• Algorithms for
– gather,
– scatter,
– broadcast, and
– reduce.
Message Passing Interface (MPI) Basics

• To set up a cluster of parallel computers, we must:


– Configure the individual computers
– Establish some form of communication between machines
– Run the program(s) that exploit the above

• What does MPI do?


– MPI allows moving data and commands between processes.
– This includes data that is needed for a computation or produced by a computation.
Message Passing Interface (MPI) Basics

• Message Passing Interface (MPI) is a communication protocol for parallel programming.
• MPI is specifically used to allow applications to run in parallel across a
number of separate computers connected by a network.
• The MPI standard specifies a library of functions that implement the
message-passing model of parallel computation.
• MPI was developed by the MPI Forum, a consortium of parallel computer
vendors and software development specialists.
• As a standard, MPI provides a common high-level view of a message-
passing environment that can be mapped to various physical systems.
• Software implemented using MPI functions can be easily ported among
machines that support the MPI model.

Message Passing Interface (MPI) Basics

MPI includes functions for:

• Point-to-point communication (blocking and nonblocking send/receive, . . .)


• Collective communication (broadcast, gather, scatter, total exchange, . . .)
• Aggregate computation (barrier, reduction, and scan or parallel prefix)
• Group management (group construction, destruction, inquiry, . . .)
• Communicator specification (inter-/intracommunicator construction, destruction, . . .)
• Virtual topology specification (various topology definitions, . . .)

Message Passing Interface (MPI)
• MPI is a specification for the developers and users of message passing libraries.
• MPI primarily addresses the message-passing parallel programming model
- data is moved from the address space of one process to that of another process through
cooperative operations on each process.

Message Passing Interface (MPI)
Commands

MPI_INIT       : Initiate an MPI computation.
MPI_FINALIZE   : Terminate a computation.
MPI_COMM_RANK  : Determine my process identifier.
MPI_COMM_SIZE  : Determine the number of processes.
MPI_SEND       : Send a message.
MPI_RECV       : Receive a message.
MPI_COMM_WORLD, size and ranks
• When a program is run with MPI, all the processes are grouped in what we call a communicator.
• Think of a communicator as a box grouping processes together, allowing them to communicate.
• Every communication is linked to a communicator, allowing the communication to reach different processes.
• Communications can be of two types:
• Point-to-Point: two processes in the same communicator communicate with each other.
• Collective: all the processes in a communicator communicate together.
• The default communicator is called MPI_COMM_WORLD. It groups all the processes created when the program started.
MPI_COMM_WORLD, size and ranks
• MPI_COMM_WORLD is not the only communicator in MPI; for now, simply use MPI_COMM_WORLD wherever a communicator is required.
• The number of processes in a communicator does not change once it is created. That number is called the size of the communicator.
• At the same time, each process inside a communicator has a unique number to identify it.
• This number is called the rank of the process.
• In the slide's figure, the rank of each process is the number inside each circle.
• The rank of a process always ranges from 0 to size-1.
Hello World Program

• MPI must always be initialized and finalized.


• Both operations must be the first and last calls of your
code, always.
• Write a program which performs the following steps (a sketch follows below):
• Initializes MPI.
• Reads the rank of the current process on MPI_COMM_WORLD.
• Prints a greeting message from process #<RANK OF THE PROCESS>.
• Finalizes MPI and exits.
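A minimal sketch of such a program in C is shown below; this is an illustration of the steps above, not the exact code from the slides.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    // Initialize MPI; this must come before any other MPI call.
    MPI_Init(&argc, &argv);

    // Read the rank of the current process on MPI_COMM_WORLD.
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Print the greeting message.
    printf("Hello world, from process #%d\n", rank);

    // Finalize MPI; this must be the last MPI call.
    MPI_Finalize();
    return 0;
}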
Hello World Program
• There is a simple way to compile all MPI codes.
• When you install any implementation, such as OpenMPI or MPICH, wrapper compilers are provided:
• mpicxx -o hello_world hello_world.cpp [for OpenMPI]
• There are three such wrappers to compile in the three languages mainly supported by MPI
implementations : C, C++ and Fortran.
• The respective wrappers are : mpicc, mpicxx and mpifort.
• These wrappers will call compilers using environment variables.
• Run it across multiple processes using the mpirun command.
• For instance: mpirun -np 4 ./hello_world
• Note that -np 4 indicates that the program will be run across 4 processes.

Open MPI is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners.
MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard.
Message Passing

Communication type: Cooperative operations


• The message-passing approach makes the exchange of data
cooperative.
• Data is explicitly sent by one process and received by another.
• An advantage is that any change in the receiving process’s memory is
made with the receiver’s explicit participation.
• Communication and synchronization are combined.
Process 0: Send(data)    Process 1: Receive(data)
Message Passing

Communication type: One-sided operations


• One-sided operations between processes include remote memory
reads and writes
• Only one process needs to explicitly participate.
• An advantage is that communication and synchronization are
decoupled.

Process 0: Put(data) into the memory of Process 1; Get(data) from the memory of Process 1.
Point-to-point communication

• The elementary communication operation in MPI is "point-to-point" communication, that is,
– direct communication between two processors,
– one of which sends and the other receives.

• Point-to-point communication in MPI is "two-sided", meaning that both an explicit send and an explicit receive are required.
– Data are not transferred without the participation of both processors.

• In a generic send or receive, a message consisting of some block of data is transferred between processors.
– A message consists of an envelope, indicating the source and destination processors, and a body, containing the actual data to be sent.
Point-to-point communication

• MPI provides a set of send and receive functions that allow the
communication of typed data with an associated message tag.
• The type information is needed so that correct data representation
conversions can be performed as data is sent from one architecture to
another.
• The tag allows selectivity of messages at the receiving end.
• One can receive on a particular tag, or one can wild-card this quantity,
allowing reception of messages with any tag.
• Message selectivity on the source process of the message is also provided.

Point-to-point communication

Source and destination


• In general, the source and destination processes operate asynchronously.
• That is,
– the source process may complete sending a message long before the
destination process gets around to receiving it, and
– the destination process may initiate receiving a message that has not yet
been sent.

• These sent, but not yet received messages are called pending messages.
• It is an important feature of MPI that pending messages are not maintained
in a simple FIFO queue.
• Instead, each pending message has several attributes and the destination
process (the receiving process) can use the attributes to determine which
message to receive.

Point-to-point communication

Sending and receiving messages


• Sending messages is straightforward; receiving messages is not quite so simple, because a process may have several pending messages.
• To receive a message, a process specifies a message envelope that MPI
compares to the envelopes of pending messages.
– If there is a match, a message is received.
– Otherwise, the receive operation cannot be completed until a matching
message is sent.

• In addition, the process receiving a message must provide storage into which the body of the message can be copied.
• The receiving process must be careful to provide enough storage for the entire message.
Parallel Programming Example (With
Point to Point Communication)
Bridge Construction
• A bridge is to be assembled from girders being constructed at a foundry. These two activities
are organized by providing trucks to transport girders from the foundry to the bridge site.
• This situation is illustrated in the figure overleaf with the foundry and bridge represented as
tasks and the stream of trucks as a channel. Notice that this approach allows assembly of the
bridge and construction of girders to proceed in parallel without any explicit coordination.
• The foundry crew puts girders on trucks as they are produced, and the assembly crew adds
girders to the bridge as and when they arrive.

• Two solutions to the bridge construction problem.


• Both represent the foundry and the bridge
assembly site as separate tasks.

Parallel Programming Example (With
Point to Point Communication)
Bridge Construction
• The first uses a single channel on which girders generated by foundry are transported as fast
as they are generated.
– If foundry generates girders faster than they are consumed by bridge, then girders
accumulate at the construction site.
• The second solution uses a second channel to pass flow control messages from bridge to
foundry so as to avoid overflow.

Parallel Programming Example (With
Point to Point Communication)

Bridge Construction: A coded implementation of the first solution using MPI is

Code for Solution (a) – Requires buffering of messages from foundry


program main
begin
MPI_INIT()
MPI_COMM_SIZE(MPI_COMM_WORLD, count)
if count != 2 then exit /* Must be just 2 processes */
MPI_COMM_RANK(MPI_COMM_WORLD, myid)
if myid = 0 then
foundry(100) /* Execute Foundry */
else
bridge() /* Execute Bridge */
endif
MPI_FINALIZE()
end

procedure foundry(numgirders)
begin
for i = 1 to numgirders
/* Make a girder and send it */
/* MPI_SEND(buf, count, datatype, dest, tag, comm) */
MPI_SEND(i, 1, MPI_INT, 1, 0, MPI_COMM_WORLD)
endfor
i = -1 /* Send shutdown message */
MPI_SEND(i, 1, MPI_INT, 1, 0, MPI_COMM_WORLD)
end

procedure bridge
begin
/* Wait for girders and add them to the bridge */
/* MPI_RECV(buf, count, datatype, source, tag, comm, status) */
MPI_RECV(msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, status)
while msg != -1 do
use_girder(msg)
MPI_RECV(msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, status)
endwhile
end
Processes can use point-to-point communication operations to send a message from one named process
to another.

MPI Blocking and Non-blocking
Blocking: Returns after local actions completed, though the message transfer
may not have been completed.
• i.e. return only when the buffer is ready to be reused.
• Collective communications in MPI are always blocking.
Non-Blocking: Returns immediately

• Blocking communication is done using MPI_Send() and MPI_Recv().


• These functions do not return (i.e., they block) until the communication is finished.
• Simplifying somewhat, this means that the buffer passed to MPI_Send() can be reused, either
because MPI saved it somewhere, or because it has been received by the destination.
• Similarly, MPI_Recv() returns when the receive buffer has been filled with valid data.
• In contrast, non-blocking communication is done using MPI_Isend() and MPI_Irecv().
• These functions return immediately (i.e., they do not block) even if the communication is not
finished yet.
• You must call MPI_Wait() or MPI_Test() to see whether the communication has finished.
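As an illustration, the sketch below (an assumed usage pattern, not code from the slides) posts a non-blocking send and receive, leaves room for overlapping computation, and then waits for both requests to complete:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each process exchanges its rank with a partner without blocking.
    int partner = (rank + 1) % size;
    int sendval = rank, recvval = -1;
    MPI_Request reqs[2];

    MPI_Isend(&sendval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... useful computation could overlap with the transfer here ... */

    // Neither buffer may be reused or read until the requests complete.
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("Process %d received %d from process %d\n", rank, recvval, partner);

    MPI_Finalize();
    return 0;
}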

Synchronous/asynchronous
Send/Receive
• In case of asynchronous send/receive, there is no synchronization
between the sending and receiving processes.
• Due to no synchronization, messages can be in pending state.

• Synchronous mode send requires MPI to synchronize the sending and receiving processes.
• When a synchronous mode send operation is completed, the sending
process may assume the destination process has begun receiving the
message.
• The destination process need not be done receiving the message, but it
must have begun receiving the message.
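Synchronous mode is requested with MPI_Ssend(), which takes the same argument list as MPI_Send(). A hedged sketch (run with at least two processes; the value sent is arbitrary):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 42;
    if (rank == 0) {
        // MPI_Ssend completes only after process 1 has begun receiving,
        // so its completion tells us the receiver reached its MPI_Recv.
        MPI_Ssend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}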

Collective communication

• Collective communications transmit data among all processes in a group specified by an intra-communicator object.
• MPI includes routines for performing collective communications.
• These routines allow larger groups of processors to communicate in various ways, such as:
– one to many,
– many to one, and
– many to many.

• One function, the barrier function, serves to synchronize processes without passing data.
• No process returns from the barrier function until all processes in the group have called it.
• A barrier is a simple way of separating two phases of computation to ensure that the messages generated in the two phases do not intermingle.
Collective communication

MPI provides the following collective communication functions:


• Barrier synchronization across all group members
• Global communication functions – Data Movement Routines
– Broadcast of same data from one member to all members of a group
– Gather data from all group members to one member
– Scatter different data from one member to other members of a group
– A variation on Gather where all members of the group receive the result
– Scatter/Gather data from all members to all members of a group (also called
complete exchange or all-to-all)
• Global reduction operations such as sum and product, max and min, bitwise and
logical, or user-defined functions.
– Reduction where the result is returned to all group members and a variation
where the result is returned to one member
– A combined reduction and scatter operation

Collective communication

Global Operations: Global Synchronisation

MPI_BARRIER(comm)
IN comm communicator(handle)

• This function is used to synchronise execution of a group of processes.


• No process returns from this function until all processes have called it.
• A barrier is a simple way of separating two phases of computation to ensure that the
messages generated in the two phases do not intermingle.
• In many cases, the need for a barrier can be avoided with appropriate use of tags and source specifiers.
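A small sketch of separating two phases of computation with MPI_Barrier (illustrative only; the phases here are just print statements):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Phase 1: every process does some work.
    printf("Process %d finished phase 1\n", rank);

    // No process continues past this point until all have reached it,
    // so phase 2 messages cannot intermingle with phase 1 messages.
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Process %d starting phase 2\n", rank);

    MPI_Finalize();
    return 0;
}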

Collective communication

• Examples of collective communications include


– broadcast operations,
– gather and scatter operations, and
– reduction operations

Advantages of collective
communication over point-to-point
• The possibility of error is significantly reduced: one line of code calling the collective routine replaces the several calls needed with point-to-point operations.

• The source code is much more readable, thus simplifying code debugging
and maintenance.

• Optimized forms of the collective routines are often faster than the
equivalent operation expressed in terms of point-to-point routines.

Broadcast operation
The simplest kind of collective operation is the broadcast. In a broadcast operation a single process
sends a copy of some data to all the other processes in a group.

Here, each row in the figure represents a different process.


Each colored block in a column represents the location of a piece of the data.
Blocks with the same color that are located on multiple processes contain copies of the same
data.

MPI_BCAST broadcasts a message from the process with rank root to all processes of the
group, itself included. It is called by all members of the group using the same arguments for
comm and root. On return, the contents of root's communication buffer have been copied to all
processes.
Broadcast operation
MPI_BCAST(inbuf, incnt, intype, root, comm)
INOUT inbuf address of input buffer, or output buffer at root (choice)
IN incnt number of elements in input buffer (integer)
IN intype datatype of input buffer elements (handle)
IN root process id of root process (integer)
IN comm communicator (handle)

• This function implements a one-to-all broadcast where a single named root process
sends the same data to all other processes.
• At the time of the call, the data is located in inbuf in process root and consists of incnt
items of type intype.
• After the call the data is replicated in inbuf in all processes.
• As inbuf is used for input at the root and for output in other processes, it has type INOUT.
Broadcast Operation

Syntax:
MPI_Bcast( send_buffer, send_count, send_type, rank, comm );
Example:
send_count = 1;
root = 0;
MPI_Bcast( &a, send_count, MPI_INT, root, comm );
Broadcast Operation

The MPI_BCAST routine enables you to copy data from the memory of the root processor to
the same memory locations for other processors in the communicator.

In this example, one data value in processor 0 is broadcast to the same memory locations
in the other 3 processors. Clearly, you could send data to each processor with multiple calls
to one of the send routines. The broadcast routine makes this data motion a bit easier.
Broadcast operation: Example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void my_bcast(void* data, int count, MPI_Datatype datatype, int root,
              MPI_Comm communicator) {
  int world_rank;
  MPI_Comm_rank(communicator, &world_rank);
  int world_size;
  MPI_Comm_size(communicator, &world_size);

  if (world_rank == root) {
    // If we are the root process, send our data to everyone
    int i;
    for (i = 0; i < world_size; i++) {
      if (i != world_rank) {
        MPI_Send(data, count, datatype, i, 0, communicator);
      }
    }
  } else {
    // If we are a receiver process, receive the data from the root
    MPI_Recv(data, count, datatype, root, 0, communicator, MPI_STATUS_IGNORE);
  }
}

int main(int argc, char** argv) {
  MPI_Init(NULL, NULL);

  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  int data;
  if (world_rank == 0) {
    data = 100;
    printf("Process 0 broadcasting data %d\n", data);
    my_bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
  } else {
    my_bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received data %d from root process\n", world_rank, data);
  }

  MPI_Finalize();
}

mpirun -n 4 ./my_bcast
Process 0 broadcasting data 100
Process 2 received data 100 from root process
Process 3 received data 100 from root process
Process 1 received data 100 from root process
Scatter and Gather operation

Another important class of collective operations are those that distribute data from one
processor onto a group of processors or vice versa. These are called scatter and gather
operations.

• In a scatter operation, all of the data (an array of some type) are initially collected on a single
processor (the left side of the figure).
• After the scatter operation, pieces of the data are distributed on different processors (the right side
of the figure).
• The gather operation is the inverse operation to scatter: it collects pieces of the data that are
distributed across a group of processors and reassembles them in the proper order on a single
processor.
Gather operation
• Each process (root process included) sends the contents of its send buffer to the root process.
• The root process receives the messages and stores them in rank order.
• The outcome is as if each of the n processes in the group (including the root process) had executed a call to
    MPI_Send(sendbuf, sendcount, sendtype, root, ...)
and the root had executed n calls to
    MPI_Recv(recvbuf + i*recvcount*extent(recvtype), recvcount, recvtype, i, ...)
where extent(recvtype) is the type extent obtained from a call to MPI_Type_extent().


Gather operation
MPI_GATHER(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
IN inbuf address of input buffer (choice)
IN incnt number of elements sent to each (integer)
IN intype datatype of input buffer elements (handle)
OUT outbuf address of output buffer (choice)
IN outcnt number of elements received from each (integer)
IN outtype datatype of output buffer elements (handle)
IN root process id of root process (integer)
IN comm communicator (handle)

• This function implements an all-to-one gather operation.


• All processes (including the root process) send data located in inbuf to root.
• The root process places the data in contiguous, non-overlapping locations in outbuf, with the
data from process i preceding that from process i+1.
• The outbuf in the root process must be P times larger than inbuf, where P is the number of
processes participating.
• The outbuf in processes other than root is ignored.
Gather operation

Syntax:
MPI_Gather( send_buffer, send_count, send_type, recv_buffer, recv_count, recv_type, recv_rank, comm );
Example:
send_count = 1;
recv_count = 1;
recv_rank = 0;
MPI_Gather( &a, send_count, MPI_REAL, &a, recv_count, MPI_REAL, recv_rank, MPI_COMM_WORLD );
Gather operation

Here, data values A on each processor are gathered and moved to processor 0 into
contiguous memory locations.
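A hedged sketch of this data motion in C, where each process contributes one int and rank 0 gathers them in rank order (the buffer names and values are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int a = 10 * rank;           // each process holds one value
    int* gathered = NULL;
    if (rank == 0) {
        // the output buffer on the root must hold one element per process
        gathered = malloc(size * sizeof(int));
    }

    MPI_Gather(&a, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("gathered[%d] = %d\n", i, gathered[i]);
        free(gathered);
    }

    MPI_Finalize();
    return 0;
}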
Scatter operation

MPI_SCATTER(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)


IN inbuf address of input buffer (choice)
IN incnt number of elements sent to each (integer)
IN intype datatype of input buffer elements (handle)
OUT outbuf address of output buffer (choice)
IN outcnt number of elements received from each (integer)
IN outtype datatype of output buffer elements (handle)
IN root process id of root process (integer)
IN comm communicator (handle)

• The scatter operation is the reverse of MPI_GATHER.


• A specified root process sends data to all processes, sending the ith portion of its inbuf
to process i; each process receives data from root in outbuf.
• Hence the inbuf in the root process must be P times larger than outbuf.
• This function differs from MPI_BCAST in that every process receives a different
value.
Scatter operation

Here, four contiguous data values, elements of processor 0 beginning at A, are copied with
one element going to each processor at location A.



Scatter Operation

Syntax:
MPI_Scatter( send_buffer, send_count, send_type, recv_buffer, recv_count, recv_type, rank, comm );
Example:
send_count = 1;
recv_count = 1;
send_rank = 0;
MPI_Scatter( &a, send_count, MPI_REAL, &a, recv_count, MPI_REAL, send_rank, MPI_COMM_WORLD );
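A hedged sketch in C of scattering one int from rank 0 to every process (buffer names and values are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int* all = NULL;
    if (rank == 0) {
        // the input buffer on the root holds one element per process
        all = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) all[i] = 100 + i;
    }

    int mine;   // every process receives exactly one element
    MPI_Scatter(all, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received %d\n", rank, mine);

    if (rank == 0) free(all);
    MPI_Finalize();
    return 0;
}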
Reduction operation
• A reduction is a collective operation in which a single process (the root process) collects data
from the other processes in a group and combines them into a single data item.
• For example, you might use a reduction to compute the sum of the elements of an array
that is distributed over several processors.
• Operations other than arithmetic ones are also possible, for example, maximum and
minimum, as well as various logical and bit-wise operations.

• The functions MPI_REDUCE and MPI_ALLREDUCE implement reduction operations.


• They combine the values provided in the input buffer of each process, using a specified
operation op, and return the combined value either to the output buffer of the single root
process (in the case of MPI_REDUCE) or to the output buffer of all processes
(MPI_ALLREDUCE).
• The operation is applied pointwise to each of the count values provided by each process.
• All operations return count values with the same datatype as the operands.
The data, which may be array or scalar values, are initially distributed across the processors. After the
reduction operation, the reduced data (array or scalar) are located on the root processor.
Reduction operation
MPI_REDUCE(inbuf, outbuf, count, type, op, root, comm)
MPI_ALLREDUCE(inbuf, outbuf, count, type, op, comm)
IN inbuf address of input buffer (choice)
OUT outbuf address of output buffer (choice)
IN count number of elements in input buffer (integer)
IN type datatype of input buffer elements (handle)
IN op reduction operation (handle)
IN root process id of root process (integer)
IN comm communicator (handle)
Reduction operation
The MPI_REDUCE routine enables you to:
collect data from each processor,
reduce these data to a single value (such as a sum or max), and
store the reduced result on the root processor.

Here, the values of A on each processor are summed and the result is stored in X on processor 0.
Reduction operation

Syntax:
MPI_Reduce( send_buffer, recv_buffer, count, datatype, operation, rank, comm );

Example:

count = 1;
rank = 0;
MPI_Reduce( &a, &x, count, MPI_REAL, MPI_SUM, rank, MPI_COMM_WORLD );
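A complete sketch of a sum reduction in C, in which each process contributes its rank and rank 0 receives the total (names are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int a = rank;      // local contribution
    int x = 0;         // result, meaningful only on the root

    // Combine the 'a' values from all processes with MPI_SUM at rank 0.
    MPI_Reduce(&a, &x, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of ranks 0..%d is %d\n", size - 1, x);

    MPI_Finalize();
    return 0;
}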
Reduction operation
Predefined operations available for MPI_REDUCE include:
MPI_MAX (maximum), MPI_MIN (minimum), MPI_SUM (sum), MPI_PROD (product),
MPI_LAND / MPI_BAND (logical / bitwise AND), MPI_LOR / MPI_BOR (logical / bitwise OR),
MPI_LXOR / MPI_BXOR (logical / bitwise exclusive OR),
MPI_MAXLOC (maximum value and location) and MPI_MINLOC (minimum value and location).
Reduction operation
The operation MPI_MAXLOC combines pairs of values (vi, li) and returns the pair (v, l) such that v is the maximum among all vi's and l is the smallest among all li's for which v = vi.

Similarly, MPI_MINLOC combines pairs of values and returns the pair (v, l) such that v is the minimum among all vi's and l is the smallest among all li's for which v = vi.

One possible application of MPI_MAXLOC or MPI_MINLOC is to compute the maximum or minimum of a list of numbers, each residing on a different process, together with the rank of the first process that stores this maximum or minimum.
Reduction operation
Since both MPI_MAXLOC and MPI_MINLOC require datatypes that correspond to pairs of values, a set of MPI pair datatypes has been defined:
MPI_FLOAT_INT (float and int), MPI_DOUBLE_INT (double and int), MPI_LONG_INT (long and int),
MPI_2INT (pair of int), MPI_SHORT_INT (short and int), and MPI_LONG_DOUBLE_INT (long double and int).
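A hedged sketch using MPI_MAXLOC with the MPI_DOUBLE_INT pair type to find the largest value and the rank that holds it (the local values are arbitrary):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Pair layout required by MPI_DOUBLE_INT: a double value and an int index.
    struct { double value; int rank; } local, global;
    local.value = (rank * 37) % 11;   // some arbitrary per-process value
    local.rank  = rank;

    // After the reduction, every process knows the maximum and who holds it.
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Maximum value %.1f is on process %d\n", global.value, global.rank);

    MPI_Finalize();
    return 0;
}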
Collective communication routines
MPI provides the following collective communication routines:
Broadcast from one process to all other processes
Global reduction operations such as sum, min, max, or user-defined reductions
Gather data from all processes to one process
Scatter data from one process to all processes
Advanced operations where all processes receive the same result from a gather, scatter, or reduction. There is also a vector variant of most collective operations where each message can be a different size.
Example: One-Dimensional Matrix-Vector Multiplication

• A message-passing program using collective communications will be used to multiply a dense n x n matrix A with a vector b, i.e., x = Ab.
• One way of performing this multiplication in parallel is to have each process compute different portions of the product-vector x.
• In particular, each one of the p processes is responsible for computing n/p consecutive elements of x.
• This algorithm can be implemented in MPI by distributing the matrix A in a row-wise fashion, such that each process receives the n/p rows that correspond to the portion of the product-vector x it computes.
• Vector b is distributed in a fashion similar to x (a sketch of this row-wise version follows).
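The following is a hedged sketch of the row-wise version, not the code from the slides; it assumes p divides n and uses illustrative names:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Row-wise 1-D matrix-vector multiplication sketch. Each process owns
   nlocal = n/p consecutive rows of A (row-major in a[]) and nlocal elements
   of b; it computes the corresponding nlocal elements of x. */
void rowwise_matvec(int n, double *a, double *b_local, double *x_local,
                    MPI_Comm comm) {
    int npes;
    MPI_Comm_size(comm, &npes);
    int nlocal = n / npes;                      /* assumes p divides n */

    /* Every process needs all of b, so gather the pieces onto everyone. */
    double *b_full = malloc(n * sizeof(double));
    MPI_Allgather(b_local, nlocal, MPI_DOUBLE,
                  b_full, nlocal, MPI_DOUBLE, comm);

    for (int i = 0; i < nlocal; i++) {          /* multiply only local rows */
        x_local[i] = 0.0;
        for (int j = 0; j < n; j++)
            x_local[i] += a[i * n + j] * b_full[j];
    }
    free(b_full);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, npes;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);

    int n = 8 * npes, nlocal = n / npes;        /* toy problem size */
    double *a = malloc(nlocal * n * sizeof(double));
    double *b = malloc(nlocal * sizeof(double));
    double *x = malloc(nlocal * sizeof(double));
    for (int i = 0; i < nlocal; i++) {
        b[i] = 1.0;                             /* b is all ones ... */
        for (int j = 0; j < n; j++)
            a[i * n + j] = 1.0;                 /* ... and so is A */
    }

    rowwise_matvec(n, a, b, x, MPI_COMM_WORLD); /* every x[i] should equal n */
    printf("Process %d: x[0] = %g (expected %d)\n", rank, x[0], n);

    free(a); free(b); free(x);
    MPI_Finalize();
    return 0;
}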

Example: One-Dimensional Matrix-Vector Multiplication

• An alternate way of computing x is to parallelize the task of performing the dot-product for each element of x.
• That is, for each element xi of vector x, all the processes will compute a part of it, and the result will be obtained by adding up these partial dot-products.
• This algorithm can be implemented in MPI by distributing matrix A in a column-wise fashion.
• Each process gets n/p consecutive columns of A, and the elements of vector b that correspond to these columns.
• Furthermore, at the end of the computation we want the product-vector x to be distributed in a fashion similar to vector b.
Example: One-Dimensional Matrix-Vector Multiplication

• Comparing these two programs for performing matrix-vector multiplication, we see that
• the row-wise version needs to perform only an MPI_Allgather operation, whereas
• the column-wise program needs to perform an MPI_Reduce and an MPI_Scatter operation.
• In general, a row-wise distribution is preferable as it leads to smaller communication overhead.
All-to-All communication

• The all-to-all personalized communication operation is performed in MPI by using the MPI_Alltoall function:

MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
             void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
             MPI_Comm comm)

• Each process sends a different portion of the sendbuf array to each other process, including itself.
• Each process sends to process i sendcount contiguous elements of type senddatatype starting from the i * sendcount location of its sendbuf array.
• The data that are received are stored in the recvbuf array.
• Each process receives from process i recvcount elements of type recvdatatype and stores them in its recvbuf array starting at location i * recvcount.
• MPI_Alltoall must be called by all the processes with the same values for the sendcount, senddatatype, recvcount, recvdatatype, and comm arguments. Note that sendcount and recvcount are the number of elements sent to, and received from, each individual process.
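A hedged sketch in C where every process sends one distinct int to every process, including itself (the values are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // sendbuf[i] is destined for process i; recvbuf[i] will come from process i.
    int* sendbuf = malloc(size * sizeof(int));
    int* recvbuf = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++)
        sendbuf[i] = 100 * rank + i;

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (int i = 0; i < size; i++)
        printf("Process %d got %d from process %d\n", rank, recvbuf[i], i);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}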

Groups and Communicators
• In many parallel algorithms, communication operations need to be restricted
to certain subsets of processes.
• MPI provides several mechanisms for partitioning the group of processes that
belong to a communicator into subgroups each corresponding to a different
communicator.
• A general method for partitioning the group of processes is to use MPI_Comm_split, which is defined as follows:
MPI_Comm_split(MPI_Comm comm, int color, int key,
MPI_Comm *newcomm)

• This function is a collective operation, and thus needs to be called by all the processes in
the communicator comm.
• The function takes color and key as input parameters in addition to the communicator,
and partitions the group of processes in the communicator comm into disjoint subgroups.
• Each subgroup contains all processes that have supplied the same value for the color
parameter.
• Within each subgroup, the processes are ranked in the order defined by the value of the
key parameter, with ties broken according to their rank in the old communicator
(i.e., comm ).
• A new communicator for each subgroup is returned in the newcomm parameter.
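A hedged sketch that splits MPI_COMM_WORLD into two sub-communicators by even/odd rank (the choice of color and key is illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Processes supplying the same color end up in the same new communicator;
    // key orders the ranks inside each subgroup (here, by old rank).
    int color = world_rank % 2;
    MPI_Comm newcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &newcomm);

    int new_rank, new_size;
    MPI_Comm_rank(newcomm, &new_rank);
    MPI_Comm_size(newcomm, &new_size);
    printf("World rank %d -> color %d, new rank %d of %d\n",
           world_rank, color, new_rank, new_size);

    MPI_Comm_free(&newcomm);
    MPI_Finalize();
    return 0;
}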

Groups and Communicators

• If each process called MPI_Comm_split using the values of parameters color and key shown in the figure on the slide, then three communicators would be created, containing processes {0, 1, 2}, {3, 4, 5, 6}, and {7}, respectively.
One-to-All Broadcast and All-to-One
Reduction
• Parallel algorithms often require a single process to send identical data to all other processes
or to a subset of them.
– This operation is known as one-to-all broadcast.
• Initially, only the source process has the data of size m that needs to be broadcast.
• At the termination of the procedure, there are p copies of the initial data – one belonging to
each process.
• The dual of one-to-all broadcast is all-to-one reduction.
• In an all-to-one reduction operation, each of the p participating processes starts with a buffer
M containing m words.
• The data from all processes are combined through an associative operator and accumulated
at a single destination process into one buffer of size m.

THANK YOU
