EA 2004
Computer Architecture - II
Pipelining
Parallel processing
A parallel processing system is able to perform
simultaneous data processing to achieve faster
execution time
The system may have two or more ALUs and be able
to execute two or more instructions at the same time
The goal is to increase the throughput: the amount of processing that can
be accomplished during a given interval of time.
Parallel processing
Parallel processing involves two basic streams:
instruction stream - the sequence of instructions read from memory
data stream - the operations performed on the data in the processor
Based on the behavior of these two streams, computers can be classified
into 4 different categories (Flynn's classification)
Parallel processing classification
Single instruction stream, single data stream (SISD)
Single instruction stream, multiple data stream (SIMD)
Multiple instruction stream, single data stream (MISD)
Multiple instruction stream, multiple data stream (MIMD)
Single instruction stream, single
data stream SISD
A single control unit
A Processor Unit
A memory unit
Instructions are executed sequentially. Parallel processing
may be achieved by means of multiple functional
units or by pipeline processing
Single instruction stream,
multiple data stream SIMD
A single control unit
Many Processor Units
A memory unit
Includes multiple processing units with a single control
unit. All processors receive the same instruction, but
operate on different data.
Multiple instruction stream,
single data stream MISD
Many Processor Units
Each of which contains
A control unit
A local memory
Theoretical only: processors receive different instructions but operate on
the same data.
e.g. Space shuttle flight control systems
Multiple instruction stream,
multiple data stream MIMD
Many Processor Units
Many Control Units
A computer system capable of processing several
programs at the same time.
Most multiprocessor and supercomputer systems can
be classified in this category
Parallel processing can also be classified via pipelining, which concerns
operational and structural interconnections
What is a Pipeline
Pipelining is used by all modern microprocessors to
enhance performance by overlapping the execution
of instructions.
A common analogy for a pipeline is a factory
assembly line. Assume that there are three stages:
o Welding
o Painting
o Polishing
For simplicity, assume that each task takes one hour.
What is a Pipeline
A single person would take three hours to produce one
product.
With three people, one person could work on each stage; upon completing
their stage they pass the product on to the next person (since each stage
takes one hour there is no waiting).
The line then produces one product per hour, assuming the
assembly line has been filled.
Pipelining: Laundry Example
A small laundry has one washer, one dryer and one operator; it takes 90
minutes to finish one load (loads A, B, C and D):
Washer takes 30 minutes
Dryer takes 40 minutes
Operator folding takes 20 minutes
Sequential Laundry
[Timing diagram: loads A, B, C and D run one after another from 6 PM to
midnight; each load takes 30 + 40 + 20 = 90 minutes, and the next load only
starts when the previous one is completely finished.]
This operator schedules his loads to be delivered to the laundry every 90
minutes, which is the time required to finish one load. In other words, he
will not start a new load until he is done with the previous one.
The process is sequential. Sequential laundry takes 6 hours for 4 loads.
Efficiently scheduled laundry: Pipelined Laundry
[Timing diagram: load A starts washing at 6 PM; as soon as the washer is
free, load B starts washing while load A dries, and so on. After the first
30-minute wash, a load leaves the dryer every 40 minutes.]
Another operator asks for loads to be delivered to the laundry every 40
minutes.
Pipelined laundry takes 3.5 hours for 4 loads.
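The 3.5-hour figure can be checked with a short simulation. The sketch below (Python; the function and variable names are mine, only the 30/40/20-minute stage times come from the slides) starts each stage of each load as soon as both the load's previous stage and the stage's previous load have finished.

# Minimal sketch: simulate sequential vs. pipelined laundry.
# Stage durations in minutes (washer, dryer, folding), from the slides.
STAGES = [30, 40, 20]
LOADS = 4

def pipelined_finish_time(stage_times, n_loads):
    # end_of_stage[s] holds the time at which stage s last became free.
    end_of_stage = [0] * len(stage_times)
    finish = 0
    for _ in range(n_loads):
        t = 0  # earliest time this load can enter the first stage
        for s, duration in enumerate(stage_times):
            start = max(t, end_of_stage[s])  # wait until the stage is free
            t = start + duration
            end_of_stage[s] = t
        finish = t
    return finish

sequential = sum(STAGES) * LOADS                  # 360 minutes = 6 hours
pipelined = pipelined_finish_time(STAGES, LOADS)  # 210 minutes = 3.5 hours
print(sequential / 60, pipelined / 60)            # 6.0 3.5

The dryer (the slowest stage, 40 minutes) sets the rate of the whole line, which is why the pipelined schedule delivers loads every 40 minutes rather than every 30.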
Pipelining Facts
Multiple tasks operate simultaneously.
Pipelining doesn't help the latency of a single task; it helps the throughput
of the entire workload.
Pipeline rate is limited by the slowest pipeline stage (in the laundry
example the washer waits 10 minutes for the dryer).
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to fill the pipeline and time to drain it reduces speedup.
Building a Car
Unpipelined: start and finish a job before moving to the next.
Parallelism = 1 car
Latency = 24 hrs.
Throughput = 1/24 hrs.
[Diagram: jobs vs. time; each car takes 24 hrs. and the next car starts only
after the previous one is finished.]
Latency - the amount of time that a single operation takes to execute
Throughput - the rate at which operations get executed (generally
expressed as operations/second or operations/cycle)
The Assembly Line
Pipelined: break the job into smaller stages (Engine, Body, Paint), 8 hrs. each.
Parallelism = 3 cars
Latency = 24 hrs.
Throughput = 1/8 hrs.
[Diagram: jobs vs. time; cars A, B and C overlap in the Engine, Body and
Paint stages, giving roughly 3X the throughput of the unpipelined shop.]
In a computer...
Unpipelined: start and finish a job before moving to the next.
[Diagram: jobs vs. time; each instruction completes FET, DEC and EXE before
the next instruction starts.]
In a computer...
Pipelined: break the job into smaller stages.
[Diagram: in cycle 1, I1 is in FET; in cycle 2, I1 is in DEC while I2 is in FET;
in cycle 3, I1 is in EXE, I2 in DEC and I3 in FET.]
In a computer...
Unpipelined: start and finish a job before moving to the next.
[Diagram: each instruction takes one 3 ns cycle covering FET, DEC and EXE.]
Clock speed = 1/3 ns = 333 MHz
In a computer...
Pipelined: break the job into smaller stages.
[Diagram: the 3 ns job is split into three 1 ns stages (FET, DEC, EXE); a new
instruction enters the pipeline every cycle.]
Clock speed = 1/1 ns = 1 GHz
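A minimal sketch checking these clock-speed figures (only the 3 ns and 1 ns cycle times come from the slides; the helper name is just illustration):

# Clock speed of the unpipelined vs. the 3-stage pipelined datapath.
def clock_speed_mhz(cycle_time_ns):
    return 1000.0 / cycle_time_ns      # 1 / ns expressed in MHz

print(clock_speed_mhz(3.0))   # ~333 MHz: one long FET+DEC+EXE cycle
print(clock_speed_mhz(1.0))   # 1000 MHz = 1 GHz: cycle set by the 1 ns stage

# Once the pipeline is full, each design completes one instruction per cycle,
# so the pipelined version finishes roughly 3x as many instructions per second.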
Pipelining
Latency - the amount of time that a single operation takes to execute
Throughput - the rate at which operations get executed (generally
expressed as operations/second or operations/cycle)
Clocks and Latches
[Diagram: pipeline Stage 1 and Stage 2, each followed by a latch (L); the
latches are driven by a common clock (Clk).]
Four-segment pipeline:
[Diagram: the input flows through segments S1-S4, with a register (R1-R4)
after each segment; all registers are driven by a common clock.]
Example
Assume a 2 ns flip-flop delay
Characteristics Of Pipelining
Decomposes a sequential process into segments.
Divides the processor into segment processors, each one dedicated to a
particular segment.
Each segment is executed in a dedicated segment processor that operates
concurrently with all other segments.
Information flows through these multiple hardware segments.
If the stages of a pipeline are not balanced and one stage is slower than
another, the throughput of the entire pipeline is affected.
Pipelining
Instruction execution is divided into k segments or
stages
Instruction exits pipe stage k-1 and proceeds into pipe
stage k
All pipe stages take the same amount of time, called one processor cycle.
The length of the processor cycle is determined by the slowest pipe stage.
[Diagram: an instruction flowing through k segments.]
Pipeline Performance
n: number of instructions (equivalent to the number of loads in the
laundry example)
k: number of stages in the pipeline (washing, drying and folding)
tp: clock cycle time (the time of the slowest stage)
Tk: total time using the pipeline

Tk = (k + (n - 1)) * tp
T1 = n * k * tp (time without the pipeline)

Speedup S = T1 / Tk = n * k / (k + (n - 1))
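As a minimal sketch (plain Python; the helper names are mine, not from the slides), these formulas can be written directly and reused for the worked example later in the slides:

# Sketch of the pipeline performance formulas.
# k: number of stages, n: number of tasks, tp: clock cycle (slowest stage).

def pipelined_time(k, n, tp):
    return (k + (n - 1)) * tp       # Tk

def nonpipelined_time(k, n, tp):
    return n * k * tp               # T1

def speedup(k, n):
    return (n * k) / (k + (n - 1))  # S = T1 / Tk; tp cancels out

For very large n the (n - 1) term dominates the denominator, so speedup(k, n) approaches k, as shown in the following slides.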
Efficiently scheduled laundry: Pipelined Laundry
[Recap of the pipelined laundry timing diagram: k = 3 stages (wash, dry,
fold) operating on n = 4 loads, with a new load entering every 40 minutes.]
Speedup
Consider a k-segment pipeline operating on n data
sets. (In the above example, k = 3 and n = 4.)
It takes k clock cycles to fill the pipeline and get the
first result from the output of the pipeline.
After that, the remaining (n - 1) results come out at a rate of one per
clock cycle.
It therefore takes (k + n - 1) clock cycles to complete
the task.
Speedup
If we execute the same task sequentially in a
single processing unit, it takes (k * n) clock
cycles.
The speedup gained by using the pipeline is:
S = k * n / (k + n - 1 )
Speedup
S = k * n / (k + n - 1 )
For n >> k (such as 1 million data sets on a 3-stage
pipeline),
S ≈ k
So for large data sets we gain a speedup approximately equal to the number
of pipeline stages (functional units). This is because the multiple
functional units work in parallel, except during the filling and draining
cycles.
Speedup
Example
- 4-stage pipeline
- sub-operation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in a non-pipelined system: 20 * 4 = 80 ns
Pipelined system:
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
Non-pipelined system:
n * k * tp = 100 * 80 = 8000 ns
Speedup
Sk = 8000 / 2060 = 3.88
As n grows, the speedup approaches k = 4, i.e. the 4-stage pipeline behaves
like a system with 4 identical functional units.
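These numbers can be reproduced directly (or with the helper sketched earlier); the variable names below are illustrative only:

# Reproduce the worked example: 4-stage pipeline, tp = 20 ns, 100 tasks.
k, n, tp = 4, 100, 20

pipelined = (k + n - 1) * tp   # (4 + 99) * 20 = 2060 ns
nonpipelined = n * k * tp      # 100 * 4 * 20 = 8000 ns
print(pipelined, nonpipelined, round(nonpipelined / pipelined, 2))  # 2060 8000 3.88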
Example of Pipelining
Suppose we want to perform the combined
multiply and add operations with a stream
of numbers:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7
Example of Pipelining
The sub-operations performed in each
segment of the pipeline are as follows:
R1 ← Ai, R2 ← Bi
R3 ← R1 * R2, R4 ← Ci
R5 ← R3 + R4
Example of Pipelining
[Diagram: Ai and Bi are loaded into registers R1 and R2 (segment 1); the
multiplier computes R3 ← R1 * R2 while Ci is loaded into R4 (segment 2);
the adder computes R5 ← R3 + R4 (segment 3).]
Content of registers in pipeline example
Clock pulse | Segment 1   | Segment 2      | Segment 3
number      | R1    R2    | R3       R4    | R5
1           | A1    B1    | ----     ----  | ----
2           | A2    B2    | A1*B1    C1    | ----
3           | A3    B3    | A2*B2    C2    | A1*B1+C1
4           | A4    B4    | A3*B3    C3    | A2*B2+C2
5           | A5    B5    | A4*B4    C4    | A3*B3+C3
6           | A6    B6    | A5*B5    C5    | A4*B4+C4
7           | A7    B7    | A6*B6    C6    | A5*B5+C5
8           | ----  ----  | A7*B7    C7    | A6*B6+C6
9           | ----  ----  | ----     ----  | A7*B7+C7
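The table can be checked with a small simulation of the three segments. This is just an illustrative sketch, not the hardware itself: the register names R1-R5 mirror the slides, while the operand values are made up. Each clock pulse moves values one segment down the pipe (segments are updated back to front so that every register uses the value latched on the previous pulse).

# Minimal sketch: simulate the 3-segment pipeline computing Ai*Bi + Ci.
A = [9, 8, 7, 6, 5, 4, 3]          # example operands, i = 1..7
B = [1, 2, 3, 4, 5, 6, 7]
C = [10, 20, 30, 40, 50, 60, 70]

R1 = R2 = R3 = R4 = R5 = None      # pipeline registers, initially empty

results = []
for pulse in range(len(A) + 2):    # k - 1 = 2 extra pulses drain the pipe
    # Segment 3: R5 <- R3 + R4 (values latched on the previous pulse).
    R5 = R3 + R4 if R3 is not None else None
    # Segment 2: R3 <- R1 * R2, R4 <- Ci.
    R3 = R1 * R2 if R1 is not None else None
    R4 = C[pulse - 1] if 1 <= pulse <= len(C) else None
    # Segment 1: R1 <- Ai, R2 <- Bi.
    R1 = A[pulse] if pulse < len(A) else None
    R2 = B[pulse] if pulse < len(B) else None
    if R5 is not None:
        results.append(R5)

print(results)
assert results == [a * b + c for a, b, c in zip(A, B, C)]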
Exercise: Looking at the above example define how the operation of
Ai*Bi + Ci*Di+ Ei
is executed using a pipeline
Arithmetic Pipeline
Arithmetic has been an important aspect of computing from its early days,
yet arithmetic operations consume much of the time within the arithmetic
and logic unit.
Thus pipelining is used to boost the performance of ALUs and has opened
the way to many forms of high-performance computing.
Arithmetic pipelines are generally used for fixed-point and floating-point
operations.
Arithmetic Pipeline: Floating Point Adder
A generic floating point number can be stated as
X = A * 2^a
where X is a binary floating-point value, A is the mantissa and a is the
exponent.
Arithmetic Pipeline: Floating Point Adder
X = A * 2^a
Y = B * 2^b
A floating point addition can be executed via 4 simple sub-operations:
Compare the exponents.
Align the mantissas.
Add or subtract the mantissas.
Normalize the result.
Arithmetic Pipeline: Floating Point Adder
Given below is a simple demonstration of how two decimal floating-point
numbers are added.
Consider the two input values X and Y:
X = 0.9832 * 10^3
Y = 0.8929 * 10^2
Note: decimal numbers are used for simplicity of explanation.
Arithmetic Pipeline: Floating Point Adder
X = 0.9832 * 10^3
Y = 0.8929 * 10^2
In the first segment the two exponents are compared. The larger exponent
is 3, so it is chosen as the exponent of the result.
The difference between the two exponents is 1 (3 - 2).
Arithmetic Pipeline: Floating Point Adder
X = 0.9832 * 10^3
Y = 0.8929 * 10^2
Since Y has the smaller exponent, its mantissa is shifted to the right by one
digit, giving the aligned values
X = 0.9832 * 10^3
Y = 0.08929 * 10^3
Afterwards the two mantissas are simply added, giving
Z = 1.07249 * 10^3
Finally the result is normalized so that the mantissa is a fraction with a
non-zero digit immediately after the decimal point:
Z = 0.107249 * 10^4
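A small sketch of the four sub-operations applied to this decimal example (the function and variable names are mine; Python's Fraction is used only to keep the mantissa arithmetic exact):

# Minimal sketch of the 4-step floating point addition, in decimal,
# using the slide's example X = 0.9832 * 10^3, Y = 0.8929 * 10^2.
from fractions import Fraction

def fp_add(ma, ea, mb, eb):
    # 1. Compare the exponents; keep the larger one for the result.
    if ea < eb:
        ma, ea, mb, eb = mb, eb, ma, ea
    # 2. Align the mantissas: shift the smaller-exponent mantissa right.
    mb = mb / Fraction(10) ** (ea - eb)
    # 3. Add the mantissas.
    mz, ez = ma + mb, ea
    # 4. Normalize so the mantissa is a fraction with a non-zero first digit
    #    (0.1 <= |mantissa| < 1), adjusting the exponent accordingly.
    while abs(mz) >= 1:
        mz, ez = mz / 10, ez + 1
    while 0 < abs(mz) < Fraction(1, 10):
        mz, ez = mz * 10, ez - 1
    return mz, ez

m, e = fp_add(Fraction("0.9832"), 3, Fraction("0.8929"), 2)
print(float(m), e)   # 0.107249 4, i.e. Z = 0.107249 * 10^4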
Arithmetic Pipeline for Floating Point Adder
[Diagram: the exponents a and b and the mantissas A and B enter the
pipeline through registers (R).
Segment 1: compare the exponents by subtraction to find their difference.
Segment 2: choose the larger exponent and align the mantissas.
Segment 3: add or subtract the mantissas.
Segment 4: normalize the result and adjust the exponent.
Registers (R) separate consecutive segments.]
Instruction Pipeline
An instruction pipeline works in a similar manner to the arithmetic
pipeline, although it operates on a stream of instructions rather than a
stream of data.
Instruction Pipeline
Processing an instruction requires the following sequence of steps:
Fetch the instruction from memory.
Decode the instruction.
Calculate the effective address.
Fetch the operands from memory.
Execute the instruction.
Store the result in the proper place.
Instruction Pipeline
Consider a pipeline specified to have 4 separate segments.
In such a system up to 4 different instructions can be processed at the
same time.
Pipeline Conflicts
Difficulties in general can be caused by the following:
Resource conflicts
caused when two segments access memory at the same time.
Data dependency conflicts
occur when an instruction depends on the result of a previous
instruction which is not yet available.
Branch difficulties
arise from branch and other instructions that change the value of
the PC.
Four-segment CPU pipeline for overcoming pipeline conflicts
[Flowchart:
Segment 1: fetch instruction from memory.
Segment 2: decode instruction and calculate the effective address; if the
instruction is a branch, update the PC and empty the pipe.
Segment 3: fetch operand from memory.
Segment 4: execute instruction; if an interrupt is pending, perform
interrupt handling, update the PC and empty the pipe.]
Four-segment CPU pipeline for overcoming pipeline conflicts
Timing of Instruction Pipeline
Step            |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 |
Instruction 1   | FI | DA | FO | EX |    |    |    |    |    |    |    |    |    |
Instruction 2   |    | FI | DA | FO | EX |    |    |    |    |    |    |    |    |
Instruction 3   |    |    | FI | DA | FO | EX |    |    |    |    |    |    |    |
Instruction 4   |    |    |    | FI | -- | -- | FI | DA | FO | EX |    |    |    |
Instruction 5   |    |    |    |    | -- | -- | -- | FI | DA | FO | EX |    |    |
Instruction 6   |    |    |    |    |    |    |    |    | FI | DA | FO | EX |    |
Instruction 7   |    |    |    |    |    |    |    |    |    | FI | DA | FO | EX |
(Instruction 3 is a branch.)
Four-segment CPU pipeline for overcoming pipeline conflicts
The four segments illustrated in the above table have the following
meanings:
FI is the segment that fetches an instruction.
DA is the segment that decodes the instruction and calculates the
effective address.
FO is the segment that fetches the operand.
EX is the segment that executes the instruction.
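As a rough check of the timing, the sketch below schedules each instruction's FI/DA/FO/EX steps and delays the fetch that follows a branch until the branch has executed. It reproduces the step numbers in the table above (the discarded fetch of instruction 4 at step 4 and the "--" bubbles are simply omitted); the function name and structure are illustrative only.

# Minimal sketch: schedule FI/DA/FO/EX for a stream of instructions where
# one instruction is a branch, reproducing the stall pattern in the table.
STAGES = ["FI", "DA", "FO", "EX"]

def schedule(n_instr, branch_at):
    timeline = {}                      # (instruction, stage) -> step number
    fetch_step = 1
    for i in range(1, n_instr + 1):
        for s, name in enumerate(STAGES):
            timeline[(i, name)] = fetch_step + s
        if i == branch_at:
            # The instruction after a branch can only be fetched once the
            # branch has executed and the new PC is known.
            fetch_step = timeline[(i, "EX")] + 1
        else:
            fetch_step += 1
    return timeline

tl = schedule(7, branch_at=3)
for i in range(1, 8):
    print(i, [(name, tl[(i, name)]) for name in STAGES])
# Instruction 4 starts FI at step 7 and finishes EX at step 10; instruction 7
# finishes EX at step 13, matching the table.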
Thank You