Chapter 3
Disadvantages:
(1) Scalability between memory and CPUs. Adding more
CPUs can geometrically increase traffic on the shared
memory-CPU path and, for cache-coherent systems,
geometrically increase the traffic associated with cache/memory
management.
(2) Programmer responsibility for synchronization constructs
that ensure "correct" access to global memory.
(3) Expense: it becomes increasingly difficult and expensive
to design and produce shared memory machines with ever
increasing numbers of processors.
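
Point (2) above can be illustrated with a minimal sketch, assuming Python threads stand in for CPUs and a dictionary entry stands in for a cell of global memory. The function name `run_counter` and its parameters are hypothetical, chosen only for this example.

```python
import threading

def run_counter(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads under a lock."""
    counter = {"value": 0}       # stands in for a cell of global memory
    lock = threading.Lock()      # the synchronization construct

    def worker():
        for _ in range(increments):
            with lock:           # only one thread updates the cell at a time
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

if __name__ == "__main__":
    print(run_counter())
```

The lock is what makes the final count predictable: the read-modify-write on the shared cell is not guaranteed to be atomic without it, and guarding that access is the programmer's job.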
Non-Shared (Distributed) Memory
Advantages:
(1) Memory is scalable with the number of processors. Increase
the number of processors and the size of memory increases
proportionately.
(2) Each processor can rapidly access its own memory
without interference and without the overhead incurred in
trying to maintain cache coherency.
(3) Cost effectiveness: can use commodity, off-the-shelf
processors and networking.
Disadvantages:
(1) The programmer is responsible for many of the details
associated with data communication between processors.
(2) It may be difficult to map existing data structures, based
on global memory, to this memory organization.
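
Both disadvantages can be seen in a small sketch, assuming threads with private inboxes stand in for processors with private memory. The name `distributed_sum` and the round-robin split are hypothetical choices for illustration, not something the slides prescribe.

```python
import queue
import threading

def distributed_sum(data, num_workers=2):
    """Sum `data` by sending slices to workers and collecting partial sums."""
    inboxes = [queue.Queue() for _ in range(num_workers)]  # one per worker
    results = queue.Queue()                                # back to coordinator

    def worker(rank):
        chunk = inboxes[rank].get()    # explicit receive of this worker's data
        results.put(sum(chunk))        # explicit send of the partial result

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    # The programmer decides how the "global" list is split across workers.
    for rank in range(num_workers):
        inboxes[rank].put(data[rank::num_workers])
    total = sum(results.get() for _ in range(num_workers))
    for t in threads:
        t.join()
    return total

if __name__ == "__main__":
    print(distributed_sum(list(range(10)), num_workers=3))
```

Every transfer is written out by hand, and the single list has to be partitioned explicitly before any worker can touch it.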
Parallel Random Access Machine (PRAM)
[Figure: processors p0, p1, ..., pi, pi+1, ..., pn-1 connected to a shared memory holding cells x and y]
Shared memory operations
Each step of a PRAM algorithm consists of:
Read phase - up to n PEs (processing elements) may
simultaneously perform one read from shared memory into
local memory (i.e., a register): receive(x,z)
Compute phase - every processor is entitled to perform a (small) fixed
number of logical or arithmetic operations (+, *, ++, /) on the contents
of its local memory (registers)
Write phase - up to n PEs may simultaneously write a value held in
local memory (i.e., a register) to the global/common memory: send(z,y)
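
The three phases can be sketched as a sequential simulation of a parallel sum. This is only an illustration under stated assumptions: the function `pram_sum`, the tree-shaped pairing of cells, and the initial load of each PE's own cell into its register (not counted as a step) are all choices made for this example.

```python
def pram_sum(values):
    """Sum `values` on a simulated PRAM; returns (total, number_of_steps)."""
    shared = list(values)       # global/common memory
    n = len(shared)
    acc = list(shared)          # acc[i] is PE i's register, preloaded with cell i
    steps = 0
    stride = 1
    while stride < n:
        # PEs that have a partner cell `stride` positions to the right.
        active = [i for i in range(0, n, 2 * stride) if i + stride < n]
        # Read phase: each active PE reads exactly one shared cell.
        x = {i: shared[i + stride] for i in active}
        # Compute phase: one arithmetic operation on local registers.
        for i in active:
            acc[i] = acc[i] + x[i]
        # Write phase: each active PE writes its register to its own cell.
        for i in active:
            shared[i] = acc[i]
        stride *= 2
        steps += 1
    return (shared[0] if n else 0), steps

if __name__ == "__main__":
    print(pram_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

In each step no two PEs read or write the same cell, and the number of steps grows with the logarithm of the input size rather than with the size itself.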
Memory Access
There are a number of different ways for the
processors to gain access to memory.
(1) Exclusive Read
it is not possible for several processors to read a memory cell
simultaneously;
(2) Exclusive Write
it is not possible for several processors to write to a memory cell
simultaneously;
(3) Concurrent Read
it is possible for several processors to read a memory cell
simultaneously;
(4) Concurrent Write
it is possible for several processors to write to a memory cell
simultaneously.
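
The four access modes are usually combined into a read rule plus a write rule. The sketch below assumes the common shorthand EREW, CREW, ERCW and CRCW for those combinations (the shorthand is not used in the slides), and the function name `allowed` is hypothetical.

```python
from collections import Counter

def allowed(model, reads, writes):
    """Return True if one step's accesses are legal under `model`.

    `model` is one of "EREW", "CREW", "ERCW", "CRCW".
    `reads` and `writes` list the memory cell addressed by each processor.
    """
    concurrent_read = any(c > 1 for c in Counter(reads).values())
    concurrent_write = any(c > 1 for c in Counter(writes).values())
    read_ok = model[0] == "C" or not concurrent_read     # first letter: read rule
    write_ok = model[2] == "C" or not concurrent_write   # third letter: write rule
    return read_ok and write_ok

if __name__ == "__main__":
    # Two processors read cell 3 at once, then write to distinct cells.
    print(allowed("CREW", reads=[3, 3], writes=[1, 2]))
    print(allowed("EREW", reads=[3, 3], writes=[1, 2]))
```

The checker only detects conflicts. How a concurrent write is resolved when several processors target the same cell is a separate policy that this sketch does not model.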
(5) How long does it take for a datum to travel between two
neighboring processors?
Is the time required by a datum to go from Pi to its neighbor Pj
a function of the length of the link connecting Pi and Pj?
Interconnection Networks
Scalability of common topologies (n processors):
Topology      Degree    Diameter      Bisection width
Ring          2         O(n)          2
Binary tree   3         O(2 log n)    1
Star          O(n-1)    2             1
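
The degree and diameter figures can be checked on small instances with a breadth-first search. This is a minimal sketch: the builders `ring`, `star` and `binary_tree` and their node numbering are assumptions made for this example, and bisection width is not computed.

```python
from collections import deque

def ring(n):
    """Each node is linked to its two neighbors around the ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def star(n):
    """Node 0 is the hub; every other node is linked only to it."""
    adj = {i: {0} for i in range(1, n)}
    adj[0] = set(range(1, n))
    return adj

def binary_tree(n):
    """Heap-style numbering: the parent of node c is (c - 1) // 2."""
    adj = {i: set() for i in range(n)}
    for child in range(1, n):
        parent = (child - 1) // 2
        adj[child].add(parent)
        adj[parent].add(child)
    return adj

def max_degree(adj):
    return max(len(nbrs) for nbrs in adj.values())

def diameter(adj):
    """Longest shortest path, found by a BFS from every node."""
    def eccentricity(src):
        dist = {src: 0}
        todo = deque([src])
        while todo:
            u = todo.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    todo.append(v)
        return max(dist.values())
    return max(eccentricity(s) for s in adj)

if __name__ == "__main__":
    for name, net in [("ring", ring(8)), ("tree", binary_tree(15)), ("star", star(6))]:
        print(name, max_degree(net), diameter(net))
```

For a ring the exact diameter is n/2 rounded down, which is still O(n). For a complete binary tree the longest path runs leaf to leaf through the root, which gives the factor of 2 in front of log n.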