Case Studies - N-Body Solvers - Tree Search - OpenMP and MPI Implementations and Comparison

This document summarizes parallel programming techniques for n-body solvers and tree search algorithms. It describes OpenMP and MPI implementations of n-body solvers that distribute particle data across processes. It also discusses mapping tree search problems to parallel processes using work stealing and distributing partial tours. Dynamic load balancing is important for efficient parallel tree searches.

Uploaded by

Monika

UNIT – 5 : PARALLEL PROGRAM DEVELOPMENT

Case studies – n-Body solvers – Tree Search – OpenMP and MPI implementations and comparison.

THE N-BODY SOLVERS


 The n-body problem is one of the most famous problems in mathematical physics and
molecular dynamics.
 The goal is to find the positions and velocities of a collection of interacting particles over a
period of time.
 An n-body solver is a program that finds the solution to an n-body problem by simulating
the behavior of the particles.
 For example, an astrophysicist might want to know the positions and velocities of a collection
of stars, while a chemist might want to know the positions and velocities of a collection of
molecules or atoms.

For n = 2, the problem has been completely solved analytically.
For n = 3, closed-form solutions exist only in special cases.
In general, numerical methods must be used to simulate such systems.

Problem Formulation
 The goal is to determine the positions and velocities of the particles.
 The formulation is based on two laws:
Newton’s second law of motion
Newton’s law of universal gravitation
Newton’s second law of motion

The acceleration of an object is dependent upon two variables –
1. the net force acting upon the object
2. the mass of the object
The acceleration of an object depends directly upon the net force acting upon the object,
and inversely upon the mass of the object.
Newton’s law of universal gravitation
Every particle attracts every other particle in the universe with a force that is directly
proportional to the product of their masses and inversely proportional to the square of the
distance between them.
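Suppose we have two particles, q and k, with
Masses: m_q and m_k
Positions: s_q(t) and s_k(t) at time t
The force on particle q exerted by particle k is given by

$$ \mathbf{f}_{qk}(t) = -\frac{G\, m_q m_k}{\left|\mathbf{s}_q(t) - \mathbf{s}_k(t)\right|^{3}} \left[\mathbf{s}_q(t) - \mathbf{s}_k(t)\right] $$

where G is the gravitational constant (approximately 6.67 × 10^-11 m^3/(kg·s^2)).
The total force on any particle is calculated by adding the forces due to all the other particles.
If our n particles are numbered 0, 1, 2, … , n-1, then the total force on particle q is given by

$$ \mathbf{F}_q(t) = \sum_{\substack{k=0 \\ k \ne q}}^{n-1} \mathbf{f}_{qk}(t) = -G\, m_q \sum_{\substack{k=0 \\ k \ne q}}^{n-1} \frac{m_k}{\left|\mathbf{s}_q(t) - \mathbf{s}_k(t)\right|^{3}} \left[\mathbf{s}_q(t) - \mathbf{s}_k(t)\right] $$

Since force equals mass times acceleration (Newton’s second law), the acceleration of particle q is
given by the formula

$$ \mathbf{s}_q''(t) = \frac{\mathbf{F}_q(t)}{m_q} = -G \sum_{\substack{k=0 \\ k \ne q}}^{n-1} \frac{m_k}{\left|\mathbf{s}_q(t) - \mathbf{s}_k(t)\right|^{3}} \left[\mathbf{s}_q(t) - \mathbf{s}_k(t)\right] $$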

Thus Newton’s laws give us a system of differential equations, that is, equations involving
derivatives.
Our job is to find, at each time t of interest, the position and the velocity of every particle.

Basic Algorithm for Computing N-Body Forces

Computation of the forces


We’re assuming that the forces and the positions of the particles are stored as two-dimensional
arrays, forces and pos, respectively.
The x-component of the force on particle q is forces[q][X] and the y-component is forces[q][Y].
Similarly, the components of the position are pos[q][X] and pos[q][Y].

A Reduced Algorithm for Computing N-Body Forces

The individual forces

Euler’s Method
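
Euler's method advances the solution by one timestep using the tangent line: the new position is the old position plus the timestep times the old velocity, and the new velocity is the old velocity plus the timestep times the acceleration (force divided by mass). A minimal sketch of one such step follows; the function name euler_step and the argument layout are assumptions for illustration.

```c
#include <assert.h>

#define DIM 2
enum { X = 0, Y = 1 };
typedef double vect_t[DIM];

/* One Euler step: positions use the old velocities, velocities use the
 * accelerations forces[q] / masses[q] computed for the current time. */
void euler_step(int n, double delta_t, const double masses[],
                vect_t pos[], vect_t vel[], vect_t forces[]) {
    assert(n >= 0);
    for (int q = 0; q < n; q++) {
        pos[q][X] += delta_t * vel[q][X];
        pos[q][Y] += delta_t * vel[q][Y];
        vel[q][X] += delta_t / masses[q] * forces[q][X];
        vel[q][Y] += delta_t / masses[q] * forces[q][Y];
    }
}
```

Euler's method is simple but only first-order accurate, so the timestep has to be small for the simulation to stay close to the true trajectories.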

PARALLELIZING THE N-BODY SOLVERS USING OPENMP
 Apply Foster’s methodology.
 Initially, we want a lot of tasks.
 Start by making our tasks the computations of the positions, the velocities, and the total
forces at each timestep.
Communications Among Tasks in the Basic N-Body Solver

Communications Among Agglomerated Tasks in the Basic N-Body Solver

Communications Among Agglomerated Tasks in the Reduced N-Body Solver

PARALLELIZING THE BASIC SOLVER USING OPENMP

PARALLELIZING THE REDUCED SOLVER USING OPENMP

PARALLELIZING THE BASIC SOLVER USING MPI


 Each process stores the entire global array of particle masses.
 Each process only uses a single n-element array for the positions.
 Each process uses a pointer loc_pos that refers to the start of its block of pos.
 So on process 0 loc_pos = pos; on process 1 loc_pos = pos + loc_n; and so on, where loc_n is
the number of particles assigned to each process.
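
The pointer arithmetic in the last two points can be sketched in plain C. The helper name block_start is hypothetical, and the sketch assumes a block partition in which the number of processes comm_sz evenly divides n. In an MPI program, each process would update its own block through loc_pos and then exchange blocks with a collective such as MPI_Allgather so that every process again holds all n positions.

```c
#include <assert.h>

#define DIM 2
enum { X = 0, Y = 1 };
typedef double vect_t[DIM];

/* Returns the start of the block of pos owned by process my_rank,
 * assuming a block partition with loc_n = n / comm_sz particles each. */
vect_t *block_start(vect_t pos[], int n, int comm_sz, int my_rank) {
    assert(comm_sz > 0 && n % comm_sz == 0);
    assert(my_rank >= 0 && my_rank < comm_sz);
    int loc_n = n / comm_sz;
    return pos + my_rank * loc_n;   /* pointer arithmetic in units of vect_t */
}
```

Because loc_pos points into pos itself, a process that writes loc_pos[i] is updating entry my_rank * loc_n + i of the global array, so no separate copy step is needed before the exchange.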

Communication In A Possible MPI Implementation of the N-Body Solver

PARALLELIZING THE REDUCED SOLVER USING MPI

Run-Times for OpenMP and MPI Versions of N-Body Solvers

TREE SEARCH
Many problems can be solved using a tree search. As a simple example, consider the traveling
salesperson problem, or TSP. In TSP, a salesperson is given a list of cities to visit and a cost
for traveling between each pair of cities. The problem is to visit each city once, returning to
the starting city, with the least possible cost. Thus, the TSP is to find a minimum-cost tour.

TSP is what’s known as an NP-complete problem. This means that there is no algorithm known
for solving it that, in all cases, is significantly better than exhaustive search. Exhaustive search
means examining all possible solutions to the problem and choosing the best. The number of
possible solutions to TSP grows exponentially as the number of cities is increased.

For example, if we add one additional city to an n-city problem, we’ll increase the number of
possible solutions by a factor of n. Thus, although there are only six possible solutions to a
four-city problem, there are 4*6 = 24 to a five-city problem, 5*24 = 120 to a six-city problem,
6*120 = 720 to a seven-city problem, and so on.
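
The counts quoted above (6, 24, 120, 720) are what one gets by fixing the starting city and ordering the remaining n - 1 cities in every possible way, that is, (n - 1)! tours. A small sketch, with the hypothetical helper name tour_count, reproduces them:

```c
#include <assert.h>

/* Number of tours of n cities that start and end at a fixed city:
 * (n - 1)!. Limited to n <= 21 so the result fits in 64 bits. */
unsigned long long tour_count(int n) {
    assert(n >= 1 && n <= 21);
    unsigned long long count = 1;
    for (int i = 2; i <= n - 1; i++)
        count *= (unsigned long long)i;
    return count;
}
```

Already at 21 cities the count is 20!, about 2.4 × 10^18, which shows why exhaustive search quickly becomes impractical.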

Example: Consider a four-city TSP

Solution:
 Start at the origin, here city 0
 Do a depth-first search of the tree of partial tours
 Keep track of the best tour found so far, that is, the complete tour with minimum cost
 If a node is reached whose cost is at least the current minimum cost, do not go deeper

Search tree for four-city TSP

In the example, we’ll start at the root and branch left until we reach the leaf labeled
0→1→2→3→0.

Then we back up to the tree node labeled 0→1, since it is the deepest ancestor node
with unvisited children, and we’ll branch down to get to the leaf labeled 0→1→3→2→0.

Continuing, we’ll back up to the root and branch down to the node labeled 0→2.
When we visit its child, labeled 0→2→1 (with cost 21), we’ll go no further in this subtree,
since we’ve already found a complete tour with cost less than 21. We’ll back up to 0→2 and
branch down to its remaining unvisited child. Continuing in this fashion, we eventually find
the least-cost tour.

Algorithm
 Cities are numbered 0, 1, . . . , n − 1
 A tour stores the number of cities visited so far, the cities in the order visited, and the cost
of the partial tour
 citycount(tour) returns the number of cities in tour
 Initially, tour contains only the first city, 0, and has cost 0
 Besttour(tour) checks whether a complete tour is better than the best tour found so far
 Updatebesttour(tour) replaces the best tour with tour
 Feasible(tour, city) checks whether city has already been visited and, if not, whether it can be
added to tour so that the cost up to city < cost(best tour)
 Add(tour, city) adds city to tour; city must be feasible
 Removelast(tour, city) removes the last city from tour
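
The operations listed above can be combined into a recursive depth-first search. The following is a minimal sketch, not the original listing: the type and function names (tour_t, tsp_t, solve_tsp, and so on) are assumptions, the helper operations are written inline, and the cost matrix is passed as a row-major n × n array.

```c
#include <assert.h>
#include <limits.h>

#define MAX_CITIES 16

typedef struct {
    int cities[MAX_CITIES + 1];  /* cities in visiting order */
    int count;                   /* number of cities in the tour */
    int cost;                    /* cost of the (partial) tour */
} tour_t;

typedef struct { int n; const int *cost; tour_t best; } tsp_t;

static int edge(const tsp_t *p, int from, int to) {
    return p->cost[from * p->n + to];
}

/* Feasible: city is unvisited and adding it keeps the cost below the best. */
static int feasible(const tsp_t *p, const tour_t *tour, int city) {
    for (int i = 0; i < tour->count; i++)
        if (tour->cities[i] == city) return 0;
    int last = tour->cities[tour->count - 1];
    return tour->cost + edge(p, last, city) < p->best.cost;
}

static void depth_first_search(tsp_t *p, tour_t *tour) {
    if (tour->count == p->n) {                    /* all cities visited */
        int total = tour->cost + edge(p, tour->cities[p->n - 1], 0);
        if (total < p->best.cost) {               /* Besttour / Updatebesttour */
            p->best = *tour;
            p->best.cities[p->best.count++] = 0;  /* return to the start */
            p->best.cost = total;
        }
        return;
    }
    for (int city = 1; city < p->n; city++) {
        if (feasible(p, tour, city)) {
            int last = tour->cities[tour->count - 1];
            tour->cities[tour->count++] = city;   /* Add */
            tour->cost += edge(p, last, city);
            depth_first_search(p, tour);
            tour->count--;                        /* Removelast */
            tour->cost -= edge(p, last, city);
        }
    }
}

/* Returns the minimum tour cost and stores the best tour in *best. */
int solve_tsp(int n, const int *cost, tour_t *best) {
    assert(n >= 2 && n <= MAX_CITIES);
    tsp_t p = { n, cost, { {0}, 0, INT_MAX } };
    tour_t tour = { {0}, 1, 0 };                  /* start at city 0, cost 0 */
    depth_first_search(&p, &tour);
    *best = p.best;
    return p.best.cost;
}
```

As a usage example, with an illustrative four-city cost matrix (an assumption, since the example digraph is not reproduced in these notes) whose rows are {0, 1, 3, 8}, {5, 0, 2, 6}, {1, 18, 0, 10}, and {7, 4, 12, 0}, solve_tsp finds the tour 0→3→1→2→0 with cost 15.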

PARALLELIZING TREE SEARCH USING OPENMP

Mapping
 Assume p processes (or threads)
 One process could run the search until there are at least p partial tours in the stack
 These partial tours are then assigned to the processes
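
One way to produce the initial partial tours is a breadth-first expansion that stops as soon as at least p tours are available. The sketch below assumes this approach; the names partial_tour_t and make_initial_tours are hypothetical, and costs are left out to keep the example short.

```c
#include <assert.h>
#include <string.h>

#define MAX_CITIES 16

typedef struct {
    int cities[MAX_CITIES];
    int count;
} partial_tour_t;

static int visited(const partial_tour_t *tour, int city) {
    for (int i = 0; i < tour->count; i++)
        if (tour->cities[i] == city) return 1;
    return 0;
}

/* Expands the tree breadth-first from the tour {0} until at least p
 * partial tours exist (or every tour is complete). The tours are left
 * in out[0 .. return value - 1]. */
int make_initial_tours(int n, int p, partial_tour_t out[], int capacity) {
    assert(n >= 1 && n <= MAX_CITIES && capacity >= 1);
    int head = 0, tail = 0;
    out[tail].cities[0] = 0;
    out[tail].count = 1;
    tail++;
    while (tail - head < p && out[head].count < n) {
        partial_tour_t parent = out[head++];      /* dequeue oldest tour */
        for (int city = 1; city < n; city++) {
            if (!visited(&parent, city)) {
                assert(tail < capacity);
                out[tail] = parent;               /* child = parent + city */
                out[tail].cities[out[tail].count++] = city;
                tail++;
            }
        }
    }
    memmove(out, out + head, (size_t)(tail - head) * sizeof out[0]);
    return tail - head;
}
```

With n = 4 and p = 3 this yields the three tours 0→1, 0→2, and 0→3. The tours can then be handed out to the processes, for example in round-robin order.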

Best Tour
 Processes work independently until each finds its local best tour
 Do a global reduction on process 0 to find the best tour
 This is simple, but a process may search through partial tours that cannot lead to the global
best tour
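
The "local best, then global reduction" scheme can be sketched with OpenMP threads as follows. This is a minimal sketch under stated assumptions: each first-level subtree 0→first is one unit of work, only the cost of the best tour is tracked, and the names search and tsp_best_cost_omp are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Serial depth-first search below a partial tour ending in `last`.
 * visited is a bit mask of cities in the tour; *best is the caller's
 * local best cost, used for pruning and updated in place. */
static void search(int n, const int *cost, unsigned visited, int last,
                   int depth, int cost_so_far, int *best) {
    if (depth == n) {
        int total = cost_so_far + cost[last * n + 0];   /* back to city 0 */
        if (total < *best) *best = total;
        return;
    }
    for (int city = 1; city < n; city++) {
        int c = cost_so_far + cost[last * n + city];
        if (!(visited & (1u << city)) && c < *best)
            search(n, cost, visited | (1u << city), city, depth + 1, c, best);
    }
}

int tsp_best_cost_omp(int n, const int *cost) {
    assert(n >= 2 && n <= 16);
    int global_best = INT_MAX;
    #pragma omp parallel
    {
        int local_best = INT_MAX;            /* private to each thread */
        #pragma omp for nowait
        for (int first = 1; first < n; first++)
            search(n, cost, 1u | (1u << first), first, 2,
                   cost[first], &local_best);
        #pragma omp critical                 /* reduction: keep the minimum */
        if (local_best < global_best) global_best = local_best;
    }
    return global_best;
}
```

Each thread prunes only against its own local_best, which illustrates the drawback noted above: a thread may explore partial tours that another thread's result has already ruled out.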

Dynamic Mapping
 When a process runs out of work, it gets more work from another process
 Each stack entry is a partial tour
 A process can get a partial tour and work on it
 The order in which nodes are visited does not matter

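When a block of code should be executed by only one thread, we can use the OpenMP directive
# pragma omp single

This will ensure that the following structured block of code is executed by one thread in the
team, and the other threads in the team will wait in an implicit barrier at the end of the block
until the executing thread is finished.

When it is the master thread (thread 0) that should execute the code, the test on the thread
rank can also be replaced by the OpenMP directive
# pragma omp master

Unlike single, the master directive has no implicit barrier at the end of its block, so a barrier
directive must be added if the other threads need to wait.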
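
A small sketch of the single directive follows. It assumes a toy workload in which one thread fills a shared array and all threads then add up its entries; the function name sum_items_omp and the constant N_ITEMS are illustrative.

```c
#include <assert.h>

#define N_ITEMS 8

int sum_items_omp(void) {
    int items[N_ITEMS];
    int total = 0;
    #pragma omp parallel
    {
        /* Exactly one thread fills the shared array; the others wait at
         * the implicit barrier at the end of the single block. */
        #pragma omp single
        for (int i = 0; i < N_ITEMS; i++)
            items[i] = i + 1;

        /* Safe to read items here because of that barrier. With
         * "omp master" instead, an explicit "#pragma omp barrier"
         * would be needed at this point. */
        #pragma omp for reduction(+:total)
        for (int i = 0; i < N_ITEMS; i++)
            total += items[i];
    }
    return total;
}
```

The same pattern applies to the tree search: one thread can set up the initial stack of partial tours inside a single block, and the team starts searching only after the barrier.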

PARALLELIZING TREE SEARCH USING MPI


 Process 0 generates partial tours and sends them to the p processes.
 When a process finds a new best tour, it sends the cost of that tour to all other processes.

 A destination process can check periodically for a new best cost using
MPI_Recv(&received_cost, 1, MPI_INT, MPI_ANY_SOURCE, NEW_COST_TAG, comm, &status);
 But the receiving process will block until a matching message arrives.

 Instead, we can use MPI_Iprobe to check, without blocking, whether a message from src with
the given tag is available in communicator comm
 If such a message is available, *msg is set to 1 and status->MPI_SOURCE contains the source;
otherwise *msg is set to 0 (msg and status are the output arguments of MPI_Iprobe)
 To check if there is a message from any source
MPI_Iprobe(MPI_ANY_SOURCE, NEW_COST_TAG, comm, &msg, &status);

 If msg = 1, we can receive the message with
MPI_Recv(&received_cost, 1, MPI_INT, status.MPI_SOURCE, NEW_COST_TAG, comm,
MPI_STATUS_IGNORE);

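After the search finishes, the processes can find the globally best cost and the rank of the
process that owns the best tour with a call to MPI_Allreduce (for example, using the
MPI_MINLOC operator on cost/rank pairs). When the call to MPI_Allreduce returns, we have two
alternatives:
(1) If process 0 already has the best tour, we simply return.
(2) Otherwise, the process owning the best tour sends it to process 0.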
