
15CS72 ACA Module1 Chapter1FinalCopy

The document discusses various architectures of parallel computers, including shared memory multiprocessors (UMA, NUMA, COMA) and distributed memory multicomputers. It also covers vector and SIMD supercomputers, detailing their operational models and examples of early commercial systems. Additionally, it introduces theoretical models like PRAM and VLSI complexity models for analyzing parallel algorithms and their performance.


1.2 Multiprocessors and Multicomputers

Parallel computers fall into two architectural classes: shared-memory
multiprocessors and unshared, distributed-memory multicomputers.

1.2.1 Shared-Memory Multiprocessors

There are three models of shared-memory multiprocessors:

1. UMA Model
2. NUMA Model
3. COMA Model

UMA (Uniform Memory Access) Model


1. Here the physical memory is uniformly shared by all the processors.
2. All processors have equal access time to every memory word, which is why it is
called uniform memory access.

3. The system interconnect is a common bus, a crossbar switch, or a multistage
network.


4. When all processors have equal access to all peripheral devices, the system is
called a symmetric multiprocessor. In this case, all the processors are equally
capable of running the executive programs, such as OS kernel and I/O service
routines.
5. In an asymmetric multiprocessor, only the master processor can execute the
operating system and handle I/O. The remaining processors have no I/O capability
and are therefore called attached processors; they execute user code under the
supervision of the master processor.
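To make the shared-address-space idea concrete, here is a minimal sketch using POSIX threads (the thread count and array size are illustrative assumptions, not from the original notes); every thread reads and writes the same array at uniform cost, which is exactly the programming model a UMA machine supports:

```c
/* Minimal UMA-style sketch: one shared physical memory, many processors.
 * Compile with: cc uma.c -lpthread. Sizes are illustrative. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define CHUNK    1000

static double shared_mem[NTHREADS * CHUNK];   /* uniformly shared by all threads */

static void *worker(void *arg) {
    long id = (long)arg;
    /* Any thread may touch any word of shared_mem at the same cost. */
    for (long i = id * CHUNK; i < (id + 1) * CHUNK; i++)
        shared_mem[i] = 2.0 * (double)i;
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("shared_mem[42] = %.1f\n", shared_mem[42]);   /* prints 84.0 */
    return 0;
}
```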
NUMA (Non-Uniform Memory Access) Model

Here the memory access time varies with the location of the memory word relative to
the processor, hence the name non-uniform memory access. The two NUMA models are
given below.
1. Shared Local-Memory NUMA Model. Example: the BBN TC-2000 Butterfly.
● Each processor has a local memory that is also shared with the other
processors; the collection of all local memories forms a global address
space accessible by all processors.
● A processor accesses its own local memory fastest. Access to remote
memory attached to another processor takes longer because of the added
delay through the interconnection network.
2. Hierarchical Cluster Model. Example: the Cedar multiprocessor, built at the
University of Illinois.
● Each cluster is a collection of multiple processors.
● All processors belonging to the same cluster uniformly access the
cluster shared memory (CSM).
● All clusters have equal access to the global memory, but the access
time to the cluster memory is shorter than that to the global memory.
● Local memory access is fastest, global memory access is next, and
access to remote (other-cluster) memory is slowest.
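As a hedged illustration of why locality matters on NUMA hardware, the sketch below uses Linux's libnuma to place a buffer on a specific node (the node number and buffer size are illustrative; link with -lnuma). A processor on node 0 then enjoys local-memory latency, while processors on other nodes pay the remote-access penalty described above:

```c
/* Sketch of NUMA-aware allocation with libnuma (Linux; link with -lnuma). */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {               /* no NUMA support on this system */
        fprintf(stderr, "NUMA not supported\n");
        return 1;
    }
    size_t size = 1 << 20;                    /* 1 MiB, illustrative */
    /* Memory placed on node 0: local (fast) for node-0 processors,
     * remote (slower, via the interconnect) for all other nodes. */
    double *buf = numa_alloc_onnode(size, 0);
    if (buf != NULL) {
        buf[0] = 3.14;
        printf("allocated %zu bytes on node 0\n", size);
        numa_free(buf, size);
    }
    return 0;
}
```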

COMA (Cache-Only Memory Architecture) Model


1. Here each processor node has a cache memory; there is no memory hierarchy
at the node.
2. All the caches together form a global address space.
3. Remote access to any cache is assisted by a distributed cache directory.
4. Whenever data in a remote cache is accessed, it migrates to the node where it
will be used. This reduces the number of redundant copies and allows more
efficient use of memory resources.
5. Example: Kendall Square Research's KSR-1 machine.
Some Early Commercial Multiprocessor Systems
1. Sequent Symmetry S81
2. IBM ES/9000
3. BBN TC-2000
1.2.2 Distributed Memory Multicomputers
1. The system consists of multiple computers, often called nodes, which are
interconnected by a message-passing network.
2. Each node is an autonomous computer consisting of a processor, local memory,
and sometimes attached disks or I/O peripherals.
3. Each node has private local memory that is not accessible by other nodes;
hence such systems are also called no-remote-memory-access (NORMA) machines.
4. Internode communication is carried out by passing messages through the static
connection network (a minimal message-passing sketch follows the examples below).
5. The nodes are connected through various static network topologies such as rings,
trees, meshes, tori, hypercubes, and cube-connected cycles.
6. Various communication patterns are demanded among the nodes, such as
one-to-one, broadcasting, permutations, and multicast patterns.
Examples:
1. The Caltech Cosmic Cube and the Intel iPSC/1 used hypercube architectures
with software-controlled message switching.
2. The Intel Paragon and the Parsys SuperNode 1000 used mesh architectures
with hardware-controlled message routing.
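Since NORMA nodes share no memory, all cooperation happens through explicit messages. The following minimal MPI sketch (MPI is the modern descendant of such message-passing systems; the token value is illustrative) passes a token around a ring of nodes, a simple instance of the one-to-one communication pattern mentioned above:

```c
/* Ring-style message passing over a multicomputer's network using MPI.
 * Run with at least 2 processes, e.g.: mpirun -np 4 ./ring */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of nodes */

    int token;
    if (rank == 0) {
        token = 42;                          /* illustrative payload */
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("token came back to node 0: %d\n", token);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```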

Some Early Commercial Multicomputer Systems


1. Intel Paragon XP/S
2. nCUBE/2 6480
3. Parsys SuperNode 1000
Gordon Bell (1992) provided a taxonomy of MIMD machines. Multiprocessors have a
single address space; multiprocessors using centrally shared memory have limited
scalability. Multicomputers use distributed memories with multiple address spaces
and are scalable with distributed memory.
1.3 MultiVector and SIMD Computers
1.3.1 Vector supercomputers
1. The program and data are loaded into main memory from the host computer.
2. All instructions are first decoded by the scalar control unit. If the decoded
instruction is a scalar operation, it is executed directly by the scalar
processor using the scalar functional pipelines.
3. If the instruction decodes as a vector operation, it is sent to the vector
control unit.
4. The vector control unit manages the flow of vector data between the vector
functional units and main memory.
5. There are multiple pipelined vector functional units. Forwarding results from
one vector functional unit directly into another is called vector chaining
(sketched below).
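As a plain-C sketch of what chaining buys (the loop stands in for hardware pipelines; the function name and vector length are illustrative), consider Y = A × B + C: the multiply pipeline's result streams straight into the add pipeline instead of making a round trip through memory:

```c
/* Illustrative semantics of vector chaining for Y = A*B + C.
 * Real hardware overlaps the two pipelines element by element. */
#define VLEN 64   /* a common vector register length, e.g. the Cray series */

void chained_multiply_add(double y[VLEN], const double a[VLEN],
                          const double b[VLEN], const double c[VLEN]) {
    for (int i = 0; i < VLEN; i++) {
        double t = a[i] * b[i];   /* output of the multiply pipeline ... */
        y[i] = t + c[i];          /* ... chained directly into the add pipeline */
    }
}
```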
There are two types of vector processors:
1. Register-to-register vector processors
2. Memory-to-memory vector processors
Register-to-Register Vector Processor
1. Vector registers are used to hold the vector operands, intermediate and final
vector results.
2. The vector functional pipelines retrieve operands from and put results into the
vector registers. All vector registers are programmable for user instructions.
3. The length of each vector register is usually fixed; for example, a vector
register in a Cray Series supercomputer holds sixty-four 64-bit components.

Memory-to-Memory Vector Processor


1. Vector operands and results are retrieved directly from and stored directly into
main memory in superwords of, say, 512 bits, as in the Cyber 205.
Some Early Commercial Vector Supercomputers
1. The DEC VAX 9000 was Digital's largest mainframe system, providing concurrent
scalar/vector and multiprocessing capabilities.
2. The Cray Y-MP family offered both vector and multiprocessing capabilities.

1.3.2 SIMD Supercomputers


The operational model of an SIMD machine is specified by a 5-tuple:
M = (N, C, I, M, R)

1. N is the number of processing elements (PEs) in the machine. For example, the
Illiac IV had 64 PEs and the Connection Machine CM-2 had 65,536 PEs.
2. C is the set of instructions directly executed by the control unit (CU).
3. I is the set of instructions broadcast by the CU to all PEs for parallel execution.
These include arithmetic, logic, data routing, masking, and other local operations
executed by each active PE over data within that PE.
4. M is the set of masking schemes, where each mask partitions the set of PEs into
enabled and disabled subsets.
5. R specifies the data-routing schemes to be followed during inter-PE
communication.
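The role of the masking schemes M and the broadcast instruction set I can be sketched in plain C (the structure names and PE count below are illustrative, not from any real machine): the CU issues one instruction, and only the PEs enabled by the current mask execute it:

```c
/* Sketch of one SIMD broadcast step with masking. The loop stands in
 * for N processing elements acting simultaneously. */
#include <stdio.h>

#define N 8                /* number of PEs; illustrative */

typedef struct {
    int reg;               /* a local register inside the PE */
    int enabled;           /* set by the current masking scheme (M) */
} PE;

/* The CU broadcasts one instruction from I; only enabled PEs execute it. */
void broadcast_add(PE pe[], int operand) {
    for (int i = 0; i < N; i++)
        if (pe[i].enabled)
            pe[i].reg += operand;
}

int main(void) {
    PE pe[N];
    for (int i = 0; i < N; i++) {
        pe[i].reg = i;
        pe[i].enabled = (i % 2 == 0);   /* mask: enable even-numbered PEs */
    }
    broadcast_add(pe, 100);             /* odd-numbered PEs sit this one out */
    for (int i = 0; i < N; i++)
        printf("PE%d: %d\n", i, pe[i].reg);
    return 0;
}
```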
Operational Specification of the MasPar MP-1 Computer
1. The MP-1 had 1,024 to 16,384 PEs.
2. The control unit executed scalar instructions, broadcast vector instructions
to the PE array, and controlled inter-PE communication.
3. Each PE was a register-based load/store RISC processor capable of handling
integer and floating-point computations.
4. The masking scheme was built into each PE and continuously monitored by the
CU, which could set and reset the status of each PE dynamically at run time.
5. The MP-1 had an X-Net mesh network connecting each PE to its 8 nearest
neighbours, plus a global multistage crossbar router for inter-CU-PE
communication.
Some Early Commercial SIMD Supercomputers
1. MasPar Computer Corporation MP-1 family
2. Thinking Machines Corporation CM-2
3. DAP600

1.4 PRAM and VLSI Complexity Models


These are theoretical models of parallel computers that help in developing parallel
algorithms and in analyzing scalability and programmability. No real computer system
can behave exactly like the PRAM, but the model provides a basis for studying
parallel algorithms and their performance in terms of time and space complexity.

1.4.1 Parallel Random Access Machines


An n-processor PRAM has a globally addressable memory, which may be centralized
or distributed among the processors. The n processors operate on a synchronized
read-memory, compute, and write-memory cycle. Four memory-update options are
possible:
Exclusive Read (ER) — allows at most one processor to read from any memory
location in each cycle.
Exclusive Write (EW) — allows at most one processor to write into a memory
location at a time.
Concurrent Read (CR) — allows multiple processors to read the same information
from the same memory cell in the same cycle.
Concurrent Write (CW) — allows simultaneous writes to the same memory location;
some policy must therefore be set up to resolve write conflicts.
Various combinations of the above options lead to several variants of the PRAM
model. Since CR does not create a conflict problem, the variants differ mainly in
how they handle CW conflicts. The four variants below are distinguished by how
memory reads and writes are handled.
EREW-PRAM model → Forbids more than one processor from reading or writing the
same memory cell simultaneously.
CREW-PRAM model → Concurrent reads of the same memory location are allowed;
write conflicts are avoided by mutual exclusion.
ERCW-PRAM model → Allows exclusive reads but concurrent writes to the same
memory location.
CRCW-PRAM model → Allows both concurrent reads and concurrent writes to the
same memory location.
In reality such idealized machines do not exist; the models are used by computer
scientists for complexity, performance, and scalability analysis. An algorithm for
a stronger model can run faster: a CREW algorithm may outperform an equivalent EREW
algorithm, and it has been proved that the best n-processor EREW algorithm can be
at most O(log n) times slower than any n-processor CRCW algorithm.
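A classic illustration of the gap between the models is computing the logical OR of n bits: on a common-CRCW PRAM it takes a single cycle, because every processor holding a 1 may write the same value to one shared cell, while an EREW machine must combine the bits pairwise over O(log n) cycles. The sequential loop below is a sketch that stands in for one parallel cycle:

```c
/* CRCW-PRAM sketch: OR of N bits in one conceptual cycle.
 * Concurrent writes are safe here because all writers write the value 1. */
#include <stdio.h>

#define N 8

int crcw_or(const int bit[N]) {
    int cell = 0;                 /* shared memory cell, initially 0 */
    for (int p = 0; p < N; p++)   /* stands in for N simultaneous processors */
        if (bit[p])
            cell = 1;             /* concurrent write of a common value */
    return cell;
}

int main(void) {
    int bits[N] = {0, 0, 1, 0, 0, 0, 0, 0};
    printf("OR = %d\n", crcw_or(bits));   /* prints OR = 1 */
    return 0;
}
```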

1.4.2 VLSI Complexity Model


Parallel computers use VLSI chips to fabricate processor arrays, memory arrays,
and so on.

AT² Model
Let A be the chip area, and let the latency T be the time required from when the
inputs are applied until all outputs are produced for a single problem instance.
Let s be the size of the problem. Then there exists a lower bound f(s) such that

A · T² ≥ O(f(s))

The chip is represented by the base area in the two horizontal dimensions, and the
vertical dimension corresponds to time. The three-dimensional solid therefore
represents the history of the computation performed by the chip, as shown in
figure 1.15.

Three bounds on VLSI circuits are described below. They are obtained by setting
limits on memory, I/O, and communication when implementing parallel algorithms
with VLSI chips.
Memory Bound on Chip Area A
The amount of memory a chip can hold is limited by how densely information can be
placed on it. As depicted in Fig. 1.15, the memory requirement of a computation
therefore sets a lower bound on the chip area A.
I/O Bound on Volume AT
The product AT represents the volume of the rectangular solid. As information flows
through the chip for a period of time T, the number of input bits cannot exceed this
volume, which represents the amount of information flowing through the chip during
the entire course of the computation.
Bisection Communication Bound on √A · T
The bisection is represented by a vertical slice through the solid, of width √A and
height T. Its cross-sectional area √A · T bounds the maximum amount of information
that can be exchanged between the two halves of the chip circuit during the time
period T.
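The three bounds can be restated compactly as follows (a summary of the statements above, with the problem-dependent quantities on the right):

```latex
% A = chip area, T = latency of one problem instance.
\begin{align*}
A           &\ge \Omega(\text{memory required by the computation}) \\
A\,T        &\ge \Omega(\text{number of bits flowing through the chip}) \\
\sqrt{A}\,T &\ge \Omega(\text{bits exchanged across the bisection})
\end{align*}
```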
Note
The efficiency of an algorithm is measured through its time complexity and space
complexity. The time complexity is a measure of execution time as a function of the
problem size s: a time complexity g(s) is said to be O(f(s)) if there exist positive
constants c1, c2 and s0 such that c1 f(s) ≤ g(s) ≤ c2 f(s) for all non-negative
values of s > s0. The space complexity is likewise defined as a function of the
problem size s.
A deterministic algorithm produces the same output every time the program is run,
whereas a nondeterministic algorithm contains operations whose result is one
outcome from a set of possible outcomes. The set of problems solvable in polynomial
time by deterministic algorithms is called the P class, and the set of problems
solvable by nondeterministic algorithms in polynomial time is called the NP class.
Most computer scientists believe that P ≠ NP, which leads to the conjecture that
there exists a subclass of the hardest problems in NP, called NP-complete (NPC)
problems. Only approximation algorithms can be derived to solve NP-complete
problems in polynomial time.
