CS212
Computer Organization
UNIT-5
Parallel Processing &
Multiprocessor
Topics to be covered
• Flynn's taxonomy
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
• Vector Processing
• Array Processors
Flynn's Taxonomy

                              Data Stream
                        Single        Multiple
Instruction   Single    SISD          SIMD
Stream        Multiple  MISD          MIMD
Single Instruction Single Data (SISD)
• SISD represents the organization of a single
computer containing a control unit, a processor
unit, and a memory unit.
• Instructions are executed sequentially and the
system may or may not have internal parallel
processing capabilities.
Single Instruction Multiple Data (SIMD)
• SIMD represents an organization that includes
many processing units under the supervision of a
common control unit.
• All processors receive the same instruction from
the control unit but operate on different items of
data.
Multiple Instruction Single Data (MISD)
• There is no computer at present that can be
classified as MISD.
• MISD structure is only of theoretical interest since
no practical system has been constructed using
this organization.
Multiple Instruction Multiple Data (MIMD)
• MIMD organization refers to a computer system
capable of processing several programs at the
same time.
• Most multiprocessor and multicomputer systems
can be classified in this category.
• Contains multiple processing units.
• Execution of multiple instructions on multiple
data.
Parallel Processing
• Parallel processing denotes a large class of techniques that carry
out simultaneous data-processing tasks in order to increase the
computational speed of a computer system.
• Its purpose is to speed up the computer's processing capability and
increase its throughput.
• Throughput:
The amount of processing that can be
accomplished during a given interval of time.
Pipelining
• Pipelining is a technique of decomposing a sequential process into
suboperations, with each suboperation executed in a special dedicated
segment that operates concurrently with all other segments.
• A pipeline can be visualized as a collection of processing
segments through which binary information flows.
• Each segment performs partial processing dictated by the way
the task is partitioned.
• The result obtained from the computation in each segment is
transferred to the next segment in the pipeline.
• The registers provide isolation between each segment.
• The technique is efficient for those applications that need to
repeat the same task many times with different sets of data.
Pipelining example
• Combined multiply-and-add operation Ai * Bi + Ci performed with a
stream of operands.
• Segment 1: R1 ← Ai, R2 ← Bi (load the input operands)
• Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and transfer Ci)
• Segment 3: R5 ← R3 + R4 (add Ci to the product)
[Figure: registers R1–R5, with a Multiplier feeding an Adder]
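A minimal Python sketch of the three-segment pipeline above; the seven
data sets and the clock-by-clock register transfers are illustrative
assumptions, not part of the slides:

```python
# Registers R1..R5 are modeled as plain variables; one loop iteration
# is one clock pulse, with all three segments working concurrently
# on different items of data.

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

R1 = R2 = R3 = R4 = R5 = None
results = []

# n items need n + (segments - 1) clock pulses to drain the pipe.
for clock in range(len(A) + 2):
    # Segment 3: R5 <- R3 + R4 (adder)
    if R3 is not None:
        R5 = R3 + R4
        results.append(R5)
    # Segment 2: R3 <- R1 * R2, R4 <- Ci (multiplier)
    if R1 is not None:
        R3, R4 = R1 * R2, C[clock - 1]
    else:
        R3 = None
    # Segment 1: R1 <- Ai, R2 <- Bi (input registers)
    if clock < len(A):
        R1, R2 = A[clock], B[clock]
    else:
        R1 = None

print(results)  # A[i]*B[i] + C[i] for each i
```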
Pipelining
• General structure of a four-segment pipeline:
[Figure: Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4, with a common
Clock driving all the registers]
Space-time Diagram

Clock cycle:  1    2    3    4    5    6    7
Segment 1:   T1   T2   T3   T4
Segment 2:        T1   T2   T3   T4
Segment 3:             T1   T2   T3   T4
Segment 4:                  T1   T2   T3   T4

Non-pipelined architecture: each task needs 4 clock cycles and the
tasks run one after another, so 4 tasks take 4 x 4 = 16 cycles.
Pipelined architecture: the first task completes after 4 cycles and
each later task completes one cycle after the previous one, so 4 tasks
take 4 + (4 - 1) = 7 cycles.
Speedup
• The speedup of pipeline processing over an equivalent non-pipeline
processing is defined by the ratio

      S = n * tn / ((k + n - 1) * tp)

where n is the number of tasks, k is the number of pipeline segments,
tn is the time to complete a task without the pipeline, and tp is the
pipeline clock period.
• If the number of tasks n grows much larger than the number of
segments k, then k + n - 1 approaches n, and under this condition the
speedup becomes

      S = n * tn / (n * tp) = tn / tp

• Assuming the time to process a task is the same in the pipeline and
non-pipeline circuits, tn = k * tp, so

      S = k * tp / tp = k

• The theoretical maximum speedup is therefore k, the number of
segments in the pipeline.
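A small Python sketch that evaluates the speedup formula; the segment
count, clock period, and task counts below are illustrative numbers,
not from the slides:

```python
# Evaluate S = n*tn / ((k + n - 1)*tp) for a few task counts.

k = 4        # number of pipeline segments (assumed)
tp = 20      # pipeline clock period in ns (assumed)
tn = k * tp  # non-pipelined task time, assuming equal stage delays

for n in (4, 100, 10_000):               # number of tasks
    s = (n * tn) / ((k + n - 1) * tp)
    print(f"n = {n:6d}: speedup = {s:.3f}")
# As n grows, the speedup approaches k = 4, the theoretical maximum.
```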
Arithmetic Pipeline
• Arithmetic pipelines are usually found in very high-speed computers.
• They are used to implement floating-point operations, multiplication
of fixed-point numbers, and similar computations.
• Consider an example of floating point addition
and subtraction.
• A and B are two fractions that represent the
mantissas and a and b are the exponents.
Example of Arithmetic Pipeline
• Consider the two normalized floating-point numbers:
X = 0.9504 × 10^3    Y = 0.8200 × 10^2
• Segment-1: The larger exponent is chosen as the
exponent of result.
• Segment-2: Aligning the mantissa numbers
X = 0.9504 × 10^3    Y = 0.0820 × 10^3
• Segment-3: Addition of the two mantissas produces the
sum
Z = 1.0324 × 10^3
• Segment-4: Normalize the result
Z = 0.10324 × 10^4
Example of Arithmetic Pipeline
• The sub-operations that are performed in the
four segments are:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result
[Figure: four-segment pipeline for floating-point addition/subtraction.
Exponents a, b and mantissas A, B enter through input registers R.
Segment 1: compare the exponents by subtraction.
Segment 2: choose the larger exponent; align the mantissas.
Segment 3: add or subtract the mantissas.
Segment 4: adjust the exponent; normalize the result.
Registers R isolate each segment from the next.]
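A rough Python sketch of the four suboperations, applied to the decimal
(mantissa, exponent) pairs from the worked example; real hardware
operates on binary fields, so this is only a behavioral model:

```python
# Four-segment floating-point addition on (mantissa, exponent) pairs.

def fp_add(x, y):
    (A, a), (B, b) = x, y
    # Segment 1: compare the exponents by subtraction.
    diff = a - b
    # Segment 2: choose the larger exponent and align the mantissas.
    if diff >= 0:
        exp, B = a, B / (10 ** diff)
    else:
        exp, A = b, A / (10 ** -diff)
    # Segment 3: add (or subtract) the mantissas.
    Z = A + B
    # Segment 4: normalize so the mantissa magnitude is below 1.
    while abs(Z) >= 1:
        Z, exp = Z / 10, exp + 1
    return Z, exp

print(fp_add((0.9504, 3), (0.8200, 2)))  # -> (0.10324, 4)
```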
Instruction Pipeline
• In the most general case, the computer needs to process each
instruction with the following sequence of steps
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
• Different segments may take different times to operate on the
incoming information.
• Some segments are skipped for certain operations.
• The design of an instruction pipeline will be most efficient if the
instruction cycle is divided into segments of equal duration.
Instruction Pipeline
• Assume that the decoding of the instruction can be
combined with the calculation of the effective address into
one segment.
• Assume further that most of the instructions place the result into a
processor register, so that instruction execution and storing of the
result can be combined into one segment.
• This reduces the instruction pipeline into four segments.
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the effective address
of the operand
3. FO: Fetch the operand
4. EX: Execute the operation
Four segment CPU pipeline
[Flowchart:
Segment 1: fetch instruction from memory.
Segment 2: decode instruction and calculate the effective address;
if the instruction is a branch, empty the pipe and update PC.
Segment 3: fetch operand from memory.
Segment 4: execute instruction; if an interrupt is pending, empty the
pipe and transfer to interrupt handling; otherwise update PC and
continue with the next instruction.]
Space-time Diagram

Step:           1   2   3   4   5   6   7   8   9  10
Instruction 1: FI  DA  FO  EX
            2:     FI  DA  FO  EX
            3:         FI  DA  FO  EX
            4:             FI  DA  FO  EX
            5:                 FI  DA  FO  EX
            6:                     FI  DA  FO  EX
            7:                         FI  DA  FO  EX
Space-time Diagram (instruction 3 is a branch)

Step:           1   2   3   4   5   6   7   8   9  10  11  12  13
Instruction 1: FI  DA  FO  EX
            2:     FI  DA  FO  EX
   (Branch) 3:         FI  DA  FO  EX
            4:             FI   -   -  FI  DA  FO  EX
            5:                  -   -   -  FI  DA  FO  EX
            6:                             FI  DA  FO  EX
            7:                                 FI  DA  FO  EX
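A Python sketch that prints space-time diagrams like the two above; it
assumes the branch target's fetch must wait until the branch leaves EX,
and it omits the squashed fetch shown as dashes in the figure:

```python
# Print an FI/DA/FO/EX space-time diagram for the four-segment
# instruction pipeline, with an optional taken branch.

STAGES = ("FI", "DA", "FO", "EX")

def diagram(n_instr, branch=None):
    start = {i: i for i in range(1, n_instr + 1)}   # cycle of FI
    if branch is not None:
        for i in range(branch + 1, n_instr + 1):
            start[i] += 3       # wait for the branch to finish EX
    last = start[n_instr] + 3
    print("Step:   " + "".join(f"{c:>4}" for c in range(1, last + 1)))
    for i in range(1, n_instr + 1):
        row = {start[i] + j: st for j, st in enumerate(STAGES)}
        cells = "".join(f"{row.get(c, ''):>4}" for c in range(1, last + 1))
        print(f"I{i}:".ljust(8) + cells)

diagram(7)             # no branch: 7 instructions complete in 10 steps
diagram(7, branch=3)   # instruction 3 branches: completion slips to 13
```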
Pipeline Conflict
• There are three major difficulties that cause instruction pipeline
conflicts:
1. Resource conflicts, caused by access to memory by two segments at
the same time. Most of these conflicts can be resolved by using
separate instruction and data memories.
2. Data dependency conflicts, which arise when an instruction depends
on the result of a previous instruction, but that result is not yet
available (a small sketch follows this list).
3. Branch difficulties, which arise from branch and other instructions
that change the value of the program counter (PC).
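As a sketch of case 2, the fragment below checks whether one
instruction reads a register that a still-unfinished earlier
instruction writes; the instruction encoding is purely illustrative:

```python
# Read-after-write (RAW) hazard check between two instructions,
# each encoded as (destination_register, source_registers).

def raw_hazard(earlier, later):
    dest, _ = earlier
    _, sources = later
    return dest in sources       # later reads what earlier writes

i1 = ("R1", ("R2", "R3"))   # R1 <- R2 + R3
i2 = ("R4", ("R1", "R5"))   # R4 <- R1 - R5, needs R1 from i1
print(raw_hazard(i1, i2))   # True: i2 must stall until R1 is ready
```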
Vector Processing
• In many science and engineering applications, the
problems can be formulated in terms of vectors and
matrices that lend themselves to vector processing.
• Applications of Vector processing
1. Long-range weather forecasting
2. Petroleum explorations
3. Seismic data analysis
4. Medical diagnosis
5. Aerodynamics and space flight simulations
6. Artificial intelligence and expert systems
7. Mapping the human genome
8. Image processing
Vector Processing
Matrix Multiplication
• Matrix multiplication is one of the most
computationally intensive operations performed
in computers with vector processors.
• An n x m matrix of numbers has n rows and m
columns and may be considered as constituting a
set of n row vectors or a set of m column vectors.
• Consider, for example, the multiplication of two
3x3 matrices A and B.
Vector Processing
• The product matrix C is a 3 x 3 matrix whose elements are related to
the elements of A and B by the inner product

      cij = ai1 * b1j + ai2 * b2j + ai3 * b3j

(for example, c11 = a11 * b11 + a12 * b21 + a13 * b31).
• Each of the 9 elements of C is an inner product of length 3, so the
total number of multiplications (or additions) required to compute the
matrix product is 9 x 3 = 27.
• The values of A and B are either in memory or in processor
registers.
[Figure: Source A and Source B feed a Multiplier Pipeline whose
products flow into an Adder Pipeline that accumulates the inner
product.]
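A plain Python sketch of the 3 x 3 product, counting multiplications to
confirm the 9 x 3 = 27 figure; in a vector processor these inner
products would stream through the multiplier and adder pipelines shown
above (the matrix values are illustrative):

```python
# 3 x 3 matrix product via inner products, counting multiplications.

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

C = [[0] * 3 for _ in range(3)]
multiplications = 0
for i in range(3):
    for j in range(3):
        for k in range(3):      # inner product of row i and column j
            C[i][j] += A[i][k] * B[k][j]
            multiplications += 1

print(C)
print(multiplications)          # 27
```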
SIMD Array Processor
[Figure: a Master control unit broadcasts instructions to processing
elements PE1, PE2, PE3, ..., PEn, each paired with a local memory
M1, M2, M3, ..., Mn; the array is attached to Main memory.]
Tightly coupled V/S Loosely coupled

Tightly Coupled System:
• Tasks and/or processors communicate in a highly synchronized fashion.
• Communicates through a common shared memory.
• Shared memory system.
• Overhead for data exchange is comparatively lower.

Loosely Coupled System:
• Tasks or processors do not communicate in a synchronized fashion.
• Communicates by message passing packets.
• Distributed memory system.
• Overhead for data exchange is comparatively higher.
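A toy Python contrast of the two styles, using the multiprocessing
module as a stand-in for real hardware; the value 42 and the process
structure are illustrative:

```python
# Shared memory (tightly coupled) vs. message passing (loosely coupled).

from multiprocessing import Process, Queue, Value

def shared_writer(x):
    x.value = 42              # communicate through common shared memory

def message_writer(q):
    q.put(42)                 # communicate by passing a message packet

if __name__ == "__main__":
    x = Value("i", 0)         # a shared-memory word
    p = Process(target=shared_writer, args=(x,))
    p.start()
    p.join()
    print("shared memory:", x.value)

    q = Queue()               # a message channel
    p = Process(target=message_writer, args=(q,))
    p.start()
    print("message passing:", q.get())
    p.join()
```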
Interconnection Structures
1. Time-shared common bus
2. Multiport memory
3. Crossbar switch
4. Multistage switching network
5. Hypercube system
1. Time-shared common bus
[Figure: CPU 1, CPU 2, CPU 3, IOP 1, and IOP 2 all share one common bus
to a single Memory unit.]
2. Multiport Memory
[Figure: memory modules MM 1–MM 4, each with four ports; CPU 1–CPU 4
each have a dedicated bus to every memory module.]
3. Crossbar switch
[Figure: crossbar switch — a grid of crosspoints connecting CPU 1–CPU 4
to memory modules MM 1–MM 4; each crosspoint can close to form an
independent path.]
4. Multistage switching network
Operation of 2 X 2 interchange switch
[Figure: the four states of the switch — A connected to output 0,
A connected to output 1, B connected to output 0, B connected to
output 1.]
4. Multistage switching network
[Figure: a three-stage network of 2 x 2 switches connecting processors
P1 and P2 to eight destinations 000–111; at each stage, the next bit of
the destination address selects the upper (0) or lower (1) switch
output.]
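A small Python sketch of the routing rule suggested by the figure: each
destination-address bit, from most significant to least, picks the
upper (0) or lower (1) output of the switch at that stage:

```python
# Route through a multistage network of 2 x 2 interchange switches.

def route(dest, n_bits=3):
    path = []
    for stage in range(n_bits):
        bit = (dest >> (n_bits - 1 - stage)) & 1   # next address bit
        path.append("lower" if bit else "upper")
    return path

print(route(0b011))   # ['upper', 'lower', 'lower'] -> destination 011
print(route(0b100))   # ['lower', 'upper', 'upper'] -> destination 100
```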
5. Hypercube Interconnection
[Figure: hypercube structures — a one-cube (nodes 0, 1), a two-cube
(nodes 00, 01, 10, 11), and a three-cube (nodes 000–111); each node
connects to every node whose address differs from its own in exactly
one bit.]
• Routing example: 010 XOR 001 = 011, so a message from node 010 to
node 001 must cross the two dimensions marked by the 1 bits.
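A short Python sketch of hypercube routing based on the XOR rule above;
flipping the differing bits lowest-first is an assumption, since any
order of the differing dimensions yields a valid route:

```python
# XOR the source and destination addresses; each 1 bit names a
# dimension whose link the message must cross.

def hypercube_route(src, dst, n=3):
    path = [src]
    diff = src ^ dst                 # e.g. 0b010 ^ 0b001 = 0b011
    for bit in range(n):
        if diff & (1 << bit):
            src ^= (1 << bit)        # flip one differing bit per hop
            path.append(src)
    return [format(node, f"0{n}b") for node in path]

print(hypercube_route(0b010, 0b001))   # ['010', '011', '001']
```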
Cache Coherence Problem
[Figure: Main memory holds a variable X; processors P1, P2, and P3 each
hold a cached copy of X. When one processor writes a new value of X,
with either write-through or write-back, the copies held by the other
caches (and, under write-back, by main memory as well) become stale, so
the processors observe inconsistent values.]
Cache Coherence Solution
• Write Update
• Write Invalidate
• Software approaches
– Compiler based cache coherence mechanism
• Hardware approaches
– Directory protocols
– Snoopy protocols
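A toy Python sketch of the write-invalidate idea, combined here with
write-through for simplicity; the addresses and values are illustrative:

```python
# A write to a shared block invalidates every other cached copy, so
# later readers miss and fetch the up-to-date value from memory.

memory = {"X": 100}
caches = {"P1": {}, "P2": {}, "P3": {}}

def read(p, addr):
    if addr not in caches[p]:               # miss: fetch from memory
        caches[p][addr] = memory[addr]
    return caches[p][addr]

def write(p, addr, value):                  # write-through + invalidate
    memory[addr] = value
    caches[p][addr] = value
    for other, cache in caches.items():
        if other != p:
            cache.pop(addr, None)           # invalidate stale copies

print(read("P2", "X"), read("P3", "X"))     # 100 100
write("P1", "X", 120)
print(read("P2", "X"), read("P3", "X"))     # 120 120 (re-fetched)
```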
Shared Memory Architecture
[Figure: shared-memory multiprocessor. A common shared memory is
attached to the system bus; each local bus holds a CPU, local memory,
possibly an IOP, and a system bus controller that links the local bus
to the system bus.]