Chapter 3
Jun Zhang
Laboratory for High Performance Computing & Computer Simulation
Department of Computer Science
University of Kentucky
Lexington, KY 40506
Chapter 3: CS621 1
3.1a: Architecture of a Theoretical Parallel Computer
The Parallel Random Access Machine (PRAM) is a
theoretical model of a parallel computer, with:
1.) p identical processors
2.) a global memory of unbounded size
3.) memory that is uniformly accessible to all processors
3.1b: Illustration of the PRAM Model
3.1c: PRAM Subclasses
Exclusive-read, exclusive-write (EREW) PRAM: Access to a
memory location is exclusive; no concurrent read or write
operations are allowed.
This is the weakest PRAM model, affording minimum concurrency
in memory access.
Concurrent-read, exclusive-write (CREW) PRAM: Multiple read
accesses to a memory location are allowed; multiple write
accesses to a memory location are serialized.
Exclusive-read, concurrent-write (ERCW) PRAM: Multiple write
accesses to a memory location are allowed; multiple read
accesses are serialized.
Concurrent-read, concurrent-write (CRCW) PRAM: Both
multiple read and multiple write accesses to a memory location
are allowed.
This is the most powerful PRAM model.
3.1d: PRAM Semantics
Concurrent read access to a memory location by all
processors is unproblematic.
Concurrent write access to a memory location creates
a semantic discrepancy and requires arbitration.
The most frequently used arbitration protocols are:
Common: The concurrent write is allowed only if all the
writing processors attempt to write the same value
Arbitrary: An arbitrary processor is allowed to
proceed with the write operation, and the rest fail
Priority: Processors are prioritized a priori; the
processor with the highest priority writes and the others
fail
Sum: The sum of all the quantities being written is stored
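The four protocols above can be sketched as a small arbitration routine. This is an illustrative sketch, not part of the PRAM formalism; the function and variable names are my own.

```python
# Sketch of CRCW PRAM write arbitration. Each pending write is a
# (processor_id, priority, value) tuple aimed at one memory location.

def resolve_concurrent_write(writes, protocol):
    """Return the value stored after arbitration under the given protocol."""
    values = [v for (_, _, v) in writes]
    if protocol == "common":
        # Allowed only if every writing processor offers the same value.
        if len(set(values)) != 1:
            raise ValueError("COMMON protocol: writers disagree")
        return values[0]
    if protocol == "arbitrary":
        # Any one writer may succeed; pick the first for determinism here.
        return values[0]
    if protocol == "priority":
        # The writer with the highest priority (largest number here) wins.
        return max(writes, key=lambda w: w[1])[2]
    if protocol == "sum":
        # The sum of all written quantities is stored.
        return sum(values)
    raise ValueError("unknown protocol")

writes = [(0, 3, 10), (1, 7, 20), (2, 5, 10)]
print(resolve_concurrent_write(writes, "priority"))  # processor 1 wins -> 20
print(resolve_concurrent_write(writes, "sum"))       # 10 + 20 + 10 -> 40
```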
3.2: Processor Granularity
3.3a: Interconnection Networks
3.3d: Switch Board Connections
3.3e: Dynamic Interconnection
3.3f: Multistage Dynamic Interconnection
3.3g: Switch Functionalities
3.3h: Cost of a Switch
3.3i: Approximation of Network Costs
3.4a: Network Topologies
3.4b: Bus-Based Networks
3.6b: Bus-Based Interconnect with Cache
3.6c: Crossbar Network
A crossbar network uses a grid of switches or
switching nodes to connect p processors to b
memory banks.
It is a non-blocking network.
The total number of switching nodes is Θ(pb).
In many cases b is at least on the order of p, so the
complexity of the crossbar network is Θ(p^2).
Disadvantage: the switch complexity is difficult to realize
at high data rates.
The crossbar is scalable in terms of performance, but not
scalable in terms of cost.
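The quadratic cost claim is easy to check numerically. A minimal sketch (the function name is my own):

```python
# Switch count of a p x b crossbar: one switching node at every
# processor/memory-bank crossing, i.e. Theta(p*b) switches in total.

def crossbar_switches(p, b):
    return p * b  # one switch per (processor, bank) pair

# With b on the order of p, doubling p quadruples the switch count,
# which is the Theta(p^2) cost-scalability problem noted above.
for p in (4, 8, 16, 32):
    print(p, crossbar_switches(p, p))
```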
3.6d: Crossbar Network (I)
3.6e: Crossbar Network (II)
3.6f: Multistage Networks
3.6g: Multistage Interconnection (I)
3.6h: Multistage Interconnection (II)
3.6i: Omega Network
3.6j: Illustration of Omega Network
3.6k: Perfect Shuffle
3.6l: Four-Stage Omega Network
3.6m: Two Connection Modes
3.6o: Cost and Communication of Omega Network
The cost of an omega network is Θ(p log p).
Data routing in an omega network uses destination-tag
routing: at each stage, a message passes straight through
the switch if the corresponding bits of its source and
destination addresses are the same, and crosses over
otherwise.
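The routing scheme can be simulated in a few lines. This is an illustrative sketch for p = 2^n inputs; the function names are my own.

```python
# Destination-tag routing in an omega network with p = 2^n inputs.
# Between stages, links follow the perfect-shuffle pattern; at each
# stage, a 2x2 switch either passes through or crosses over so that
# the low bit of the current address becomes the next destination bit.

def shuffle(addr, n):
    """Perfect shuffle: left-rotate an n-bit address by one bit."""
    return ((addr << 1) | (addr >> (n - 1))) & ((1 << n) - 1)

def omega_route(src, dst, n):
    """Return the address reached after each of the n switch stages."""
    addr, trace = src, []
    for i in range(n - 1, -1, -1):    # destination bits, MSB first
        addr = shuffle(addr, n)       # shuffle link pattern between stages
        d_bit = (dst >> i) & 1
        # Pass-through if the low bit already matches the destination
        # bit, crossover otherwise; either way the switch sets the low
        # bit of the address to the destination bit.
        addr = (addr & ~1) | d_bit
        trace.append(addr)
    return trace

# Route input 010 to output 110 through a 3-stage (p = 8) omega network.
print(omega_route(0b010, 0b110, 3))
```

After the last stage the message always sits at the destination address, which is the point of the destination-tag scheme.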
3.7: Completely-Connected Network
3.8: Star-Connected Network
3.9: Linear Array and Ring Networks
3.10: 2D Mesh Networks
2D mesh network
3.11: 3D Mesh Network
3.12a: Hypercube Networks
3.12b: Labeling Hypercube Networks
3.12c: 4D Hypercube
3.12d: Partition Hypercube
3.13c: Static and Dynamic Tree Networks
3.13d: Communication in Tree Networks
3.13e: Fat Tree Network
3.14a: Evaluating Static Interconnection Networks
Several criteria characterize the cost and
performance of static interconnection networks:
Diameter
Connectivity
Bisection Width
Bisection Bandwidth
Cost
3.14b: Diameter of a Network
The diameter of a network is the maximum
distance between any two processing nodes
in the network.
The distance between two processing nodes
is defined as the length of the shortest path
between them.
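The definition above translates directly into code: run a breadth-first search from every node and take the largest shortest-path distance found. A minimal sketch (names are my own):

```python
# Diameter of a network, computed from the definition: the maximum over
# all node pairs of the shortest-path distance, found here by BFS from
# every node (graph assumed connected, unit-cost links).
from collections import deque

def diameter(adj):
    """adj: dict mapping each node to a list of its neighbours."""
    best = 0
    for start in adj:
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

# 4-node ring: the farthest pair is two hops apart.
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(diameter(ring4))  # -> 2
```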
3.14c: Diameters of Mesh Networks
3.14d: Diameters of Hypercube and Tree
3.15a: Connectivity of Networks
3.15c: Connectivity of Hypercube & Tree
3.16a: Bisection Width & Channel Width
3.16b: Channel Rate & Channel Bandwidth
The channel rate is the peak rate at which a single
physical wire can deliver bits.
The channel bandwidth is the peak rate at which
data can be communicated between the ends of a
communication link.
Channel bandwidth is the product of the channel rate
and the channel width.
The bisection bandwidth is the minimum volume of
communication allowed between any two halves of
the network.
It is the product of the bisection width and the channel
bandwidth.
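The two products above can be illustrated with a worked example; the numbers are made up for illustration.

```python
# Worked example of the definitions above (all figures assumed):
#   channel bandwidth   = channel rate  x channel width
#   bisection bandwidth = bisection width x channel bandwidth

channel_rate    = 1e9  # bits/s deliverable by one wire (assumed)
channel_width   = 32   # wires per link (assumed)
bisection_width = 8    # links cut by the worst-case bisection (assumed)

channel_bandwidth = channel_rate * channel_width          # = 3.2e10 bits/s
bisection_bandwidth = bisection_width * channel_bandwidth  # = 2.56e11 bits/s

print(channel_bandwidth)
print(bisection_bandwidth)
```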
3.16c: Characteristics of Static Networks
Network          Diameter            Bisection   Arc       Number
                                     Width       Connect.  of Links
Fully connected  1                   p^2/4       p-1       p(p-1)/2
Star             2                   1           1         p-1
Binary tree      2 log((p+1)/2)      1           1         p-1
Linear array     p-1                 1           1         p-1
Ring             floor(p/2)          2           2         p
2D mesh          2(sqrt(p)-1)        sqrt(p)     2         2(p-sqrt(p))
2D mesh, wrap    2 floor(sqrt(p)/2)  2 sqrt(p)   4         2p
Hypercube        log p               p/2         log p     (p log p)/2
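The table's formulas can be evaluated for a concrete p as a sanity check. A sketch for two of the rows (function names are my own; p must be a power of two for the hypercube and a perfect square for the mesh):

```python
# Evaluating two rows of the static-network characteristics table
# for a concrete number of processors p.
import math

def hypercube(p):
    d = int(math.log2(p))
    return {"diameter": d, "bisection_width": p // 2,
            "arc_connectivity": d, "links": p * d // 2}

def mesh_2d_wraparound(p):
    s = int(math.isqrt(p))  # side length of the sqrt(p) x sqrt(p) torus
    return {"diameter": 2 * (s // 2), "bisection_width": 2 * s,
            "arc_connectivity": 4, "links": 2 * p}

print(hypercube(16))
print(mesh_2d_wraparound(16))
```

For p = 16 both topologies happen to have diameter 4 and 32 links, but the hypercube's bisection width grows as p/2 versus the torus's 2*sqrt(p), which is why the hypercube is the more expensive network at scale.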
3.17: Cost of Static Interconnection Networks
The cost of a static network can be defined in
proportion to the number of communication links or
the number of wires required by the network.
Another measure is the bisection bandwidth of a
network, which gives a lower bound on the area of a
2D packaging or the volume of a 3D packaging.
These definitions are in terms of orders of magnitude.
Completely-connected and hypercube networks are
more expensive than the others.
3.18: Evaluating Dynamic Interconnection Networks
We need to consider both the processing nodes and
the switching units.
Criteria similar to those used for static
interconnection networks can be defined.
The diameter is the maximum distance between any
two nodes in the network.
The connectivity is the minimum number of nodes
(or edges) that must fail to disconnect the network.
The cost of a dynamic network is determined by the
number of switching nodes in the network.