PIPELINE
Parallel processing
A parallel processing system performs concurrent data processing to achieve a faster execution time.
The system may have two or more ALUs and be able to execute two or more instructions at the same time.
The goal is to increase throughput: the amount of processing that can be accomplished during a given interval of time.
Parallel processing
classification
Single instruction stream, single data stream – SISD
Single instruction stream, multiple data stream –
SIMD
Multiple instruction stream, single data stream –
MISD
Multiple instruction stream, multiple data stream –
MIMD
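The contrast between the first two categories can be sketched with a toy simulator. Everything below (the `sisd`/`simd` functions and the tiny "programs") is hypothetical, purely to show one instruction stream driving one data item versus many data items.

```python
# SISD: one control unit applies each instruction to a single data stream.
def sisd(program, x):
    for op in program:
        x = op(x)
    return x

# SIMD: one control unit broadcasts each instruction to many processing
# units, each holding its own data element (data-level parallelism).
def simd(program, lanes):
    for op in program:
        lanes = [op(v) for v in lanes]  # same instruction, different data
    return lanes

inc = lambda v: v + 1
dbl = lambda v: v * 2
print(sisd([inc, dbl], 3))          # one data stream -> one result
print(simd([inc, dbl], [1, 2, 3]))  # one instruction stream, three lanes
```

MIMD would correspond to running several independent `sisd` programs at once, which is how most multiprocessors behave.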
Single instruction stream, single data
stream – SISD
Single control unit, single computer, and a
memory unit
Instructions are executed sequentially. Parallel
processing may be achieved by means of
multiple functional units or by pipeline
processing
Single instruction stream, multiple
data stream – SIMD
An organization that includes multiple processing units under the supervision of a common control unit. All processors receive the same instruction, but operate on different data.
Multiple instruction stream, single
data stream – MISD
Theoretical only: processors receive different instructions, but operate on the same data.
Multiple instruction stream,
multiple data stream – MIMD
A computer system capable of processing
several programs at the same time.
Most multiprocessor and multicomputer
systems can be classified in this category
Pipelining: Laundry
Example
A small laundry has one washer, one dryer and one operator; it takes 90 minutes to finish one load (loads A, B, C, D):
Washer takes 30 minutes
Dryer takes 40 minutes
Operator folding takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D each occupy 30 + 40 + 20 = 90 minutes, one after another.]
This operator scheduled his loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task unless he is already done with the previous task.
The process is sequential. Sequential laundry takes 6 hours for 4 loads.
Efficiently scheduled laundry: Pipelined Laundry
The operator starts each load as soon as possible.
[Figure: timeline from 6 PM to 9:30 PM; a new load enters the washer every 40 minutes, so the wash, dry and fold stages of loads A, B, C, D overlap.]
Another operator asks for the delivery of loads to the laundry every 40 minutes.
Pipelined laundry takes 3.5 hours for 4 loads.
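Both totals can be checked with a few lines of arithmetic. This is a sketch using the stage times from the example and the usual fill-then-drain timing model, in which one load completes per slowest-stage interval once the pipeline is full.

```python
# Laundry pipeline timing (stage times in minutes).
stages = [30, 40, 20]          # washer, dryer, folding
n_loads = 4

# Sequential: each load finishes completely before the next starts.
sequential = n_loads * sum(stages)

# Pipelined: after the first load fills the pipe, one load completes
# every max(stages) minutes (the rate is limited by the slowest stage).
pipelined = sum(stages) + (n_loads - 1) * max(stages)

print(sequential)  # 360 minutes = 6 hours
print(pipelined)   # 210 minutes = 3.5 hours
```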
Pipelining Facts
[Figure: the pipelined-laundry timeline again; the washer waits for the dryer for 10 minutes because the stage lengths are unbalanced.]
Multiple tasks operate simultaneously.
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
The pipeline rate is limited by the slowest pipeline stage.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to “fill” the pipeline and time to “drain” it reduce speedup.
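The last three facts can be quantified with the same timing model. The `speedup` helper below is hypothetical (not from the slides) and assumes one task enters the pipeline every `max(stage_times)` time units.

```python
# Pipeline speedup approaches the number of stages only when the stages
# are balanced and the workload is long enough to hide fill/drain time.
def speedup(stage_times, n_tasks):
    seq = n_tasks * sum(stage_times)
    pipe = sum(stage_times) + (n_tasks - 1) * max(stage_times)
    return seq / pipe

balanced = [30, 30, 30]      # three equal stages
unbalanced = [30, 40, 20]    # the laundry stages

# With many tasks, balanced stages approach a speedup of 3 (= stages);
# the unbalanced pipeline is held back by its slowest stage.
print(speedup(balanced, 1000), speedup(unbalanced, 1000))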
9.2 Pipelining
• Decomposes a sequential process into segments.
• Divides the processor into segment processors, each one dedicated to a particular segment.
• Each segment is executed in a dedicated segment processor that operates concurrently with all other segments.
• Information flows through these multiple hardware segments.
5-Stage Pipelining
S1: Fetch Instruction (FI)   S2: Decode Instruction (DI)   S3: Fetch Operand (FO)   S4: Execute Instruction (EI)   S5: Write Operand (WO)

Space-time diagram (cycles left to right; numbers are instructions):
S1: 1 2 3 4 5 6 7 8 9
S2:   1 2 3 4 5 6 7 8
S3:     1 2 3 4 5 6 7
S4:       1 2 3 4 5 6
S5:         1 2 3 4 5
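The diagram follows directly from the rule that instruction i occupies stage s during cycle i + s. A small sketch (the helper name is invented) reproduces it for any pipeline depth:

```python
# Generate the space-time diagram for a k-stage pipeline executing
# n instructions; instruction i is in stage s during cycle i + s.
def space_time(n_instr, stage_names):
    k = len(stage_names)
    cycles = n_instr + k - 1        # total cycles including fill and drain
    rows = []
    for s, name in enumerate(stage_names):
        cells = []
        for c in range(cycles):
            i = c - s               # instruction in this stage this cycle
            cells.append(str(i + 1) if 0 <= i < n_instr else ".")
        rows.append(name + ": " + " ".join(cells))
    return rows

for line in space_time(5, ["FI", "DI", "FO", "EI", "WO"]):
    print(line)
```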
Some definitions
Pipeline: an implementation technique in which multiple instructions are overlapped in execution.
Pipeline stage: the pipeline divides instruction processing into stages.
Each stage completes a part of an instruction and loads a new part in parallel.
Some definitions
Throughput of the instruction pipeline is determined by
how often an instruction exits the pipeline. Pipelining
does not decrease the time for individual instruction
execution. Instead, it increases instruction throughput.
Machine cycle: the time required to move an instruction one step further in the pipeline. The length of the machine cycle is determined by the time required for the slowest pipe stage.
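As a small worked example of both definitions (the stage latencies below are invented for illustration, not taken from the slides):

```python
# Machine cycle = time of the slowest stage; once the pipe is full,
# one instruction exits per machine cycle (hypothetical latencies in ns).
stage_ns = {"FI": 10, "DI": 8, "FO": 10, "EI": 12, "WO": 7}

machine_cycle = max(stage_ns.values())   # set by the slowest stage (EI)
throughput = 1 / (machine_cycle * 1e-9)  # steady-state instructions/second

print(machine_cycle)                     # 12 ns machine cycle
print(round(throughput / 1e6, 1))        # steady-state rate in MIPS
```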
Instruction pipeline versus sequential processing
[Figure: the same program executed by sequential processing (one instruction completes before the next begins) versus an instruction pipeline (instructions overlap).]
Instruction pipeline (Contd.)
Sequential processing is faster for a few instructions.
Instruction processing separates into five steps:
1. Fetch the instruction
2. Decode the instruction
3. Fetch the operands from memory
4. Execute the instruction
5. Store the results in the proper place
Five Stage Instruction Pipeline
Fetch instruction → Decode instruction → Fetch operands → Execute instruction → Write result
Difficulties...
If a complicated memory access occurs in stage 1, stage 2 will be delayed and the rest of the pipe is stalled.
If there is a branch (a conditional or a jump), then some of the instructions that have already entered the pipeline should not be processed.
We need to deal with these difficulties.
Pipeline Hazards
There are situations, called hazards,
that prevent the next instruction in the
instruction stream from executing
during its designated cycle
There are three classes of hazards
Structural hazard
Data hazard
Branch hazard
Pipeline Hazards
Structural hazard
Resource conflicts, when the hardware cannot support all possible combinations of instructions simultaneously
Data hazard
An instruction depends on the result of a previous instruction that has not yet exited the pipeline
Branch hazard
A branch changes the instruction flow, so instructions fetched after it may have to be discarded
Structural hazard
Some pipeline processors share a single memory pipeline for data and instructions
Structural hazard
Memory is accessed in both the FI and FO stages, so two instructions may need it in the same cycle.

S1: Fetch Instruction (FI)   S2: Decode Instruction (DI)   S3: Fetch Operand (FO)   S4: Execute Instruction (EI)   S5: Write Operand (WO)

Space-time diagram (cycles left to right; numbers are instructions):
S1: 1 2 3 4 5 6 7 8 9
S2:   1 2 3 4 5 6 7 8
S3:     1 2 3 4 5 6 7
S4:       1 2 3 4 5 6
S5:         1 2 3 4 5
Structural hazard
To resolve this hazard, we “stall” the pipeline until the resource is freed.
A stall is commonly called a pipeline bubble, since it floats through the pipeline taking space but carrying no useful work.
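A minimal sketch of that stalling policy, assuming FI is stage 0 and FO is stage 2 of the five-stage pipeline and that only some instructions actually read memory in FO (the program below is hypothetical):

```python
# One shared memory port: an instruction cannot be fetched (FI) in the
# same cycle that an older, memory-reading instruction is in FO.
def issue_cycles(reads_mem):
    # reads_mem[i] is True if instruction i reads memory in its FO stage
    issue = []                 # cycle in which each instruction enters FI
    cycle = 0
    for _ in reads_mem:
        # stall (insert a bubble) while the port is taken by an FO access
        while any(issue[j] + 2 == cycle and reads_mem[j]
                  for j in range(len(issue))):
            cycle += 1
        issue.append(cycle)
        cycle += 1
    return issue

# Only instruction 2 reads memory in FO; the fetch of instruction 4
# collides with it and is pushed back by one bubble.
print(issue_cycles([False, False, True, False, False]))  # [0, 1, 2, 3, 5]
```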
Structural hazard
[Figure: the five-stage space-time diagram (FI, DI, FO, EI, WO) with a stall bubble inserted where FI and FO contend for memory.]
Data hazard
Example:
ADD R1 ← R2 + R3
SUB R4 ← R1 - R5
AND R6 ← R1 AND R7
OR R8 ← R1 OR R9
XOR R10 ← R1 XOR R11
Data hazard
FO fetches the data value; WO stores the executed value.

S1: Fetch Instruction (FI)   S2: Decode Instruction (DI)   S3: Fetch Operand (FO)   S4: Execute Instruction (EI)   S5: Write Operand (WO)

[Figure: space-time diagram showing SUB reaching its FO stage before ADD has written R1 in its WO stage.]
Data hazard
The delayed-load approach inserts no-operation instructions to avoid the data conflict:
ADD R1 ← R2 + R3
No-op
No-op
SUB R4 ← R1 - R5
AND R6 ← R1 AND R7
OR R8 ← R1 OR R9
XOR R10 ← R1 XOR R11
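Software can do this no-op insertion mechanically. The sketch below encodes each instruction as a (destination, sources) pair and assumes a result is safe to read once two instruction slots separate producer and consumer, matching the two no-ops in the example.

```python
# Delayed load: insert NOPs (None) so no instruction reads a register
# written by either of the two immediately preceding slots.
def insert_nops(program):
    out = []                              # None represents a no-op
    for dest, srcs in program:
        def hazard():
            for back in (1, 2):           # inspect the two previous slots
                if len(out) >= back and out[-back] is not None \
                        and out[-back][0] in srcs:
                    return True
            return False
        while hazard():
            out.append(None)
        out.append((dest, srcs))
    return out

prog = [("R1", {"R2", "R3"}),    # ADD R1 <- R2 + R3
        ("R4", {"R1", "R5"}),    # SUB R4 <- R1 - R5
        ("R6", {"R1", "R7"}),    # AND R6 <- R1 AND R7
        ("R8", {"R1", "R9"}),    # OR  R8 <- R1 OR R9
        ("R10", {"R1", "R11"})]  # XOR R10 <- R1 XOR R11
scheduled = insert_nops(prog)
# Two NOPs appear between ADD and SUB, exactly as in the slide; the later
# consumers of R1 are already far enough behind the ADD.
print([x[0] if x else "No-op" for x in scheduled])
```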
Data hazard
The data hazard can also be solved by a simple hardware technique called forwarding (also called bypassing or short-circuiting).
The insight behind forwarding is that the result is not really needed by SUB until ADD has actually produced it.
If the forwarding hardware detects that the previous ALU operation has written the register corresponding to a source for the current ALU operation, control logic selects the result from the ALU output instead of the value read from the register file.
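A sketch of that operand-selection logic, using the ADD/SUB pair from the example (the function name, register values and dictionary register file are all hypothetical):

```python
# Forwarding: if the previous ALU result's destination matches a source
# of the current instruction, take the ALU output instead of the stale
# register-file value.
def read_operand(reg, regfile, fwd_dest, fwd_value):
    if reg == fwd_dest:
        return fwd_value      # bypass path: value straight off the ALU
    return regfile[reg]       # normal path: read the register file

regfile = {"R1": 0, "R2": 5, "R3": 7, "R5": 2}

# ADD R1 <- R2 + R3 has just executed; R1 is not yet written back.
fwd_dest, fwd_value = "R1", regfile["R2"] + regfile["R3"]

# SUB R4 <- R1 - R5 reads R1 through the forwarding path:
r1 = read_operand("R1", regfile, fwd_dest, fwd_value)
print(r1 - regfile["R5"])  # correct result despite the stale R1 entry
```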