Chapter 3

Part 2: Parallel Models (II)

1. Classification of Computers According to Memory
2. PRAM
3. Interconnection Networks
Parallel Computer Memory Architectures

There are three classes: Shared Memory, Distributed Memory, and Hybrid Distributed-Shared Memory.


Shared Memory
Advantages:
(1) Global address space provides a user-friendly programming perspective to memory.
(2) Data sharing between tasks is fast.

Disadvantages:
(1) Scalability between memory and CPUs: adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache-coherent systems, geometrically increase the traffic associated with cache/memory management.
(2) The programmer is responsible for synchronization constructs that ensure "correct" access to global memory (see the sketch below).
(3) Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever-increasing numbers of processors.
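
A minimal sketch of such a synchronization construct, using POSIX threads (the choice of pthreads is an assumption; the slides do not name an API): a mutex serializes increments of a shared counter so that no update is lost.

#include <pthread.h>
#include <stdio.h>

long counter = 0;                          /* lives in the shared global address space */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);         /* the synchronization construct */
        counter++;                         /* "correct" access to global memory */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);    /* 400000; without the lock, usually less */
    return 0;
}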
Non-Shared (Distributed) Memory
Advantages:
(1) Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately.
(2) Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
(3) Cost effectiveness: can use commodity, off-the-shelf processors and networking.

Disadvantages:
(1) The programmer is responsible for many of the details associated with data communication between processors (see the message-passing sketch below).
(2) It may be difficult to map existing data structures, based on global memory, onto this memory organization.
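
A minimal message-passing sketch in C with MPI (the use of MPI here is an assumption; the slides say only that the programmer handles communication explicitly): process 0 routes one datum to process 1 through the network.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, datum = 42;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Pi sends a datum from its own memory ... */
        MPI_Send(&datum, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* ... and the network routes it into Pj's memory */
        MPI_Recv(&datum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("P1 received %d from P0\n", datum);
    }
    MPI_Finalize();
    return 0;
}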
Parallel Random Access Machine (PRAM)

Processors: there are n identical processors (PEs), each of which is identical to the RAM processor. Assume that n is (large but) finite.
Memory: common/global memory with M locations.
Memory Access Unit (MAU): similar to the MAU of the RAM, but allows any PE to reach any memory location.
[Figure: processors p0, p1, ..., pn-1 connected to the shared memory through the memory access unit.]

Shared memory operations: each step of a PRAM algorithm consists of three phases.

Read phase: up to n PEs may simultaneously perform one read each from a shared memory cell x into local memory (i.e., a register z): receive(x,z).

Compute phase: every processor is entitled to perform a (small) fixed number of logical or arithmetic operations (e.g., +, *, /) on the contents of its local memory (registers).

Write phase: up to n PEs may simultaneously write a value from local memory (i.e., a register z) to a cell y of the global/common memory: send(z,y).
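
To make the three phases concrete, a sketch of a PRAM-style parallel sum, simulated here with OpenMP (the simulation is an assumption; the slides define the model abstractly): in each step, every active PE reads two cells, adds them, and writes the result back, halving the number of partial sums.

#include <stdio.h>

int main(void) {
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};    /* the shared memory */
    int n = 8;
    for (int stride = 1; stride < n; stride *= 2) {
        /* one PRAM step: all active PEs act simultaneously */
        #pragma omp parallel for
        for (int i = 0; i < n; i += 2 * stride) {
            /* read phase: fetch a[i] and a[i+stride] into registers;
               compute phase: add them; write phase: store into a[i] */
            a[i] = a[i] + a[i + stride];
        }
    }
    printf("sum = %d\n", a[0]);              /* 36, after log2(8) = 3 steps */
    return 0;
}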
Memory Access
There are a number of different ways for the processors to gain access to memory:
(1) Exclusive Read (ER): a memory cell cannot be read simultaneously by several processors.
(2) Exclusive Write (EW): a memory cell cannot be written simultaneously by several processors.
(3) Concurrent Read (CR): a memory cell can be read simultaneously by several processors.
(4) Concurrent Write (CW): a memory cell can be written simultaneously by several processors.
Concurrent Write
a) Priority CW: only the PE with the highest priority succeeds.
b) Common CW: all PEs writing to the same location must write the same value (see the sketch below).
c) Arbitrary CW: one PE, chosen arbitrarily, succeeds.
d) Combining CW:
   i) arithmetic functions (SUM, PRODUCT)
   ii) logical functions (AND, XOR)
   iii) selection/semigroup functions (MAX, MIN)
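
A sequential C simulation (the simulation and the n x n virtual PEs are assumptions, not from the slides) of the classic constant-time CRCW maximum: PE(i,j) writes 1 into loses[i] whenever a[j] > a[i]. All concurrent writers to loses[i] write the same value, so this is legal under Common CW, and the element that never loses is the maximum.

#include <stdio.h>

int main(void) {
    int a[5] = {3, 7, 2, 9, 4};
    int n = 5, loses[5] = {0};
    /* one concurrent-write step: conceptually, all n*n PEs at once */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (a[j] > a[i])
                loses[i] = 1;      /* every writer writes the same value 1 */
    for (int i = 0; i < n; i++)
        if (!loses[i])
            printf("max = %d\n", a[i]);   /* prints 9 */
    return 0;
}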
PRAM models
a) EREW
b) CREW
c) ERCW
d) CRCW
Interconnection Networks
# In the PRAM, all exchange of data among processors takes place through the shared memory.
# Another way for processors to communicate is via direct links connecting them.
# The M locations of memory are distributed among the N processors.
# When processor Pi wishes to send a datum to processor Pj, it uses the network to route the datum from its own memory to that of Pj.
# Two processors directly connected by a link are said to be neighbors.

[Figure: a sample network of six processors p1 through p6 connected by links.]

# The link between Pi and Pj represents two links, namely, one from Pi to Pj and one from Pj to Pi.
There are a number of questions that need to be answered when designing a model of computation of this kind:

(1) What shape should the network have?
How many neighbors should each processor have, how are these neighbors selected, and should all processors have the same number of neighbors?

(2) Can a processor communicate with all of its neighbors at once?
Can a processor send data to all of its neighbors and receive data from all of its neighbors in one time unit?
(3) What is the size of a message that a processor can transmit at a time?
If a datum is considered to have constant size, how many data can be sent in one transmission?

(4) How long does it take for a processor to initiate a transmission?
Is the time required by a processor to start up a communication significant?

(5) How long does it take for a datum to travel between two neighboring processors?
Is the time required by a datum to go from Pi to its neighbor Pj a function of the length of the link connecting Pi and Pj?
(6) How long does it take a processor to receive a datum?
Is the time required by a datum sent by Pi to gain access to processor Pj significant?

(7) Are the paths static or dynamic?
Does the algorithm allow for flexibility in choosing the paths?

(8) Do the processors operate synchronously or asynchronously?

(9) What kind of processor is used by an interconnection network?
Three measures characterize an interconnection network:

Degree of the network: the maximum degree of any PE in the network.

Communication diameter: the maximum, over all pairs of PEs, of the minimum distance between them (a sketch for computing it follows this list).

Bisection width: the minimum number of wires that have to be removed in order to disconnect the network into two "equal"-size subnetworks.
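
As a sketch of how the communication diameter can be computed for a concrete network (the BFS approach and the 6-node ring are illustrative assumptions): run a breadth-first search from every PE over the adjacency matrix and take the largest distance found.

#include <stdio.h>

#define N 6   /* a 6-processor ring */

/* eccentricity of src: the farthest BFS distance from it */
int bfs_ecc(int adj[N][N], int src) {
    int dist[N], queue[N], head = 0, tail = 0, ecc = 0;
    for (int i = 0; i < N; i++) dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < N; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > ecc) ecc = dist[v];
                queue[tail++] = v;
            }
    }
    return ecc;
}

int main(void) {
    int adj[N][N] = {0};
    for (int i = 0; i < N; i++)                  /* ring links: i <-> (i+1) mod N */
        adj[i][(i + 1) % N] = adj[(i + 1) % N][i] = 1;
    int diam = 0;
    for (int s = 0; s < N; s++) {
        int e = bfs_ecc(adj, s);
        if (e > diam) diam = e;
    }
    printf("communication diameter = %d\n", diam);   /* 3 = floor(6/2) */
    return 0;
}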
Scalability
A network model must be scalable, so that more processors can easily be added when new resources become available. The model should also be regular.
Linear array

[Figure: p0 - p1 - p2 - ... - pn-1.]

n processors are connected in the form of a one-dimensional array: pi is connected to pi-1 and pi+1 for all i = 1, ..., n-2; p0 is connected only to p1, and pn-1 only to pn-2.

Degree: 2
Diameter: O(n)
Bisection width: 1

HW: Study the scalability.
Ring

[Figure: p0 - p1 - p2 - ... - pn-1, with a link closing the cycle from pn-1 back to p0.]

n processors are connected in the form of a ring: pi is connected to p(i-1) mod n and p(i+1) mod n.

Degree: 2
Diameter: O(n)
Bisection width: 2

HW: Study the scalability.


Tree

[Figure: a complete binary tree with root p0, internal nodes p1 through p6, and leaves p7 through p14.]

Consists of n = 2^d - 1 processors arranged as a complete binary tree. Each processor at level i is connected by a two-way communication line to its parent at level i+1 and to its two children at level i-1.

Degree: 3
Diameter: O(log n) (about 2 log n: from one leaf up to the root and down to another leaf)
Bisection width: 1

HW: Study the scalability.


Mesh

A two-dimensional network is obtained by arranging the n processors into an m1 x m2 grid. The processor in row i and column j is denoted pi,j and is connected to pi-1,j, pi+1,j, pi,j-1, and pi,j+1, when they exist (see the indexing sketch below).

[Figure: a 4 x 3 mesh, with processors labeled both p0, ..., p11 and P0,0, ..., P3,2.]

Degree: 4
Diameter: O(m1 + m2)
Bisection width: O(m1)

[Figure: two 4 x 3 meshes, p0, ..., p11 and p'0, ..., p'11, shown side by side.]
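
A small sketch of the mesh indexing (the row-major mapping and helper names are illustrative assumptions): map pi,j in an m1 x m2 mesh to a linear index and list its existing neighbors.

#include <stdio.h>

/* row-major linear index of processor p(i,j) in an m1 x m2 mesh */
int idx(int i, int j, int m2) { return i * m2 + j; }

void print_neighbors(int i, int j, int m1, int m2) {
    int di[4] = {-1, 1, 0, 0};   /* up, down, left, right */
    int dj[4] = {0, 0, -1, 1};
    for (int k = 0; k < 4; k++) {
        int ni = i + di[k], nj = j + dj[k];
        if (ni >= 0 && ni < m1 && nj >= 0 && nj < m2)   /* skip missing border links */
            printf("p%d,%d -> p%d,%d (index %d)\n",
                   i, j, ni, nj, idx(ni, nj, m2));
    }
}

int main(void) {
    print_neighbors(1, 1, 4, 3);   /* interior node of a 4 x 3 mesh: 4 neighbors */
    return 0;
}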


Hypercube

[Figure: a 3-dimensional hypercube with processors p0, ..., p7 labeled 000 through 111.]

Consists of n = 2^d processors connected as a d-dimensional cube: processor pi is connected to pj if and only if i and j differ in exactly one bit of their binary representations (see the sketch below).

Degree: log n (= d)
Diameter: O(log n)
Bisection width: n/2

[Figure: a 4-dimensional hypercube with 16 processors labeled 0000 through 1111.]
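
A sketch of the hypercube neighbor rule (the helper function is hypothetical): flipping each of the d bits of a node label with XOR yields exactly the labels that differ from it in one bit.

#include <stdio.h>

/* print the d neighbors of node i in a d-dimensional hypercube */
void hypercube_neighbors(unsigned i, int d) {
    for (int b = 0; b < d; b++)
        printf("neighbor of %u across dimension %d: %u\n",
               i, b, i ^ (1u << b));   /* flip bit b: differs in exactly one bit */
}

int main(void) {
    hypercube_neighbors(5, 3);   /* node 101 -> 100, 111, 001 */
    return 0;
}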


Star

[Figure: a star network with p0 at the center connected to p1 through p8.]

A central processor p0 is connected to each of the other n-1 processors; all communication goes through the center.

Degree: n-1 (at the center p0)
Diameter: 2
Bisection width: floor(n/2) (each leaf separated from the center costs one wire)

HW: Study the scalability.
