Chapter 6

Computer Organization and Architecture
Pipeline Hazards
Pipelining
• Start work ASAP!! Do not waste time!

Not pipelined

Assume 30 min. for each stage – wash, dry, fold, store – and that
separate stages use separate hardware and so can be overlapped

Pipelined
Pipelined vs. Single-Cycle Instruction
Execution: the Plan
[Figure: single-cycle timing – lw $1, 100($0); lw $2, 200($0); lw $3, 300($0)
execute strictly one after another, each taking 8 ns (instruction fetch,
register read, ALU, data access, register write)]
Assume 2 ns for memory access and ALU operation, and 1 ns for register access:
therefore the single-cycle clock is 8 ns and the pipelined clock cycle is 2 ns.
[Figure: pipelined timing – the same three lw instructions overlap, with a new
instruction starting every 2 ns; total time is 14 ns instead of 24 ns]
Pipelining: Keep in Mind
• Pipelining does not reduce the latency of a single task; it increases the
throughput of the entire workload
• Pipeline rate is limited by the longest stage
– potential speedup = number of pipe stages
– unbalanced stage lengths reduce the speedup
• Time to fill the pipeline and time to drain it – when there is slack
in the pipeline – also reduce the speedup
Example Problem
• Problem: for the laundry, fill in the following table when
1. the stage lengths are 30, 30, 30, 30 min., respectively
2. the stage lengths are 20, 20, 60, 20 min., respectively

Person | Unpipelined finish time | Pipeline 1 finish time | Ratio unpipelined to pipeline 1 | Pipeline 2 finish time | Ratio unpipelined to pipeline 2
1      |                         |                        |                                 |                        |
2      |                         |                        |                                 |                        |
3      |                         |                        |                                 |                        |
4      |                         |                        |                                 |                        |
…      |                         |                        |                                 |                        |
n      |                         |                        |                                 |                        |
• Come up with a formula for pipeline speed-up!
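The finish times and the speed-up ratio can be computed with a short sketch. This is a minimal model, assuming a synchronous pipeline whose clock period equals the longest stage; the function name and encoding are illustrative, not from the slides:

```python
def finish_times(stages, n):
    """Finish time of the n-th task: unpipelined vs. pipelined.

    Unpipelined: each task runs all its stages back to back.
    Pipelined (synchronous): the clock period is the longest
    stage, and a new task enters the pipeline every period.
    """
    total = sum(stages)      # one task, start to finish
    period = max(stages)     # pipeline clock = longest stage
    k = len(stages)          # number of pipe stages
    unpipelined = n * total               # tasks run strictly in sequence
    pipelined = (k + n - 1) * period      # fill the pipe, then one per period
    return unpipelined, pipelined

# Case 1 (30, 30, 30, 30): finish_times([30] * 4, 4) -> (480, 210)
# Case 2 (20, 20, 60, 20): finish_times([20, 20, 60, 20], 4) -> (480, 420)
```

For large n the ratio n·total / ((k + n − 1)·period) approaches total/period, which equals the number of stages exactly when the stages are balanced.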
Pipelining MIPS

• What makes it easy with MIPS?


– all instructions are same length
• so fetch and decode stages are similar for all instructions
– just a few instruction formats
• simplifies instruction decode and makes it possible in one stage
– memory operands appear only in load/stores
• so memory access can be deferred to exactly one later stage
– operands are aligned in memory
• one data transfer instruction requires one memory access stage
Pipelining MIPS

• What makes it hard?


– structural hazards: different instructions, at different stages, in the
pipeline want to use the same hardware resource
– control hazards: succeeding instruction, to put into pipeline,
depends on the outcome of a previous branch instruction, already
in pipeline
– data hazards: an instruction in the pipeline requires data to be
computed by a previous instruction still in the pipeline

• Before actually building the pipelined datapath and control, we
first briefly examine these potential hazards individually…
Structural Hazards
• Structural hazard: inadequate hardware to simultaneously support all
instructions in the pipeline in the same clock cycle
• E.g., suppose single – not separate – instruction and data memory in
pipeline below with one read port
– then a structural hazard between first and fourth lw instructions

[Figure: pipelined timing for lw $1, lw $2, lw $3, lw $4 – in one clock cycle
the first lw accesses data memory while the fourth lw is being fetched; with a
single memory this is a structural hazard]

• MIPS was designed to be pipelined: structural hazards are easy to avoid!
Control Hazards
• Control hazard: need to make a decision based on the result of a previous
instruction still executing in pipeline
• Solution 1 Stall the pipeline

[Figure: add $4, $5, $6 and then beq $1, $2, 40 proceed normally; the following
lw $3, 300($0) is delayed by one bubble (a 2 ns pipeline stall) because its
fetch must wait for the branch outcome. Note that the branch outcome is
computed in the ID stage with added hardware (later…)]
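The stall timeline above can be checked with a quick cycle count: in a 5-stage pipeline, n instructions take (n + 4) cycles plus one cycle per bubble. A minimal sketch, with illustrative function name and defaults:

```python
def exec_time_ns(n_instr, n_stalls, stages=5, cycle_ns=2):
    """Total pipelined execution time: fill the pipe once
    (stages - 1 cycles), then one instruction completes per
    cycle, plus one extra cycle per stall (bubble)."""
    cycles = (stages - 1) + n_instr + n_stalls
    return cycles * cycle_ns

# add, beq, lw with one bubble after the branch:
# 8 cycles * 2 ns = 16 ns, matching the timeline above.
print(exec_time_ns(3, 1))  # 16
```

Without the bubble the same three instructions would take 7 cycles, i.e. 14 ns.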
Control Hazards
• Solution 2: Predict the branch outcome
– e.g., predict branch-not-taken:
• prediction success: no cycles are lost
• prediction failure: undo (= flush) the lw fetched after the branch
Data Hazards
• Solutions for data hazards
– Stalling
– Forwarding: connect the new value directly to the next stage
– Reordering
Pipeline stages of MIPS instruction
• the 5 steps in instruction execution

1. Instruction Fetch & PC Increment (IF)


2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
Data Hazards - Stalling
• A stall is still required for a load: the loaded data is available only after
the MEM stage
• This is another representation of the data-hazard stall.

Without a stall (incorrect for a load – R1 is not ready in time for SUB):

                 C1   C2   C3   C4    C5   C6   C7   C8
LW  R1, 0(R2)    IF   ID   EX   MEM   WB
SUB R4, R1, R5        IF   ID   EX    MEM  WB
AND R6, R1, R7             IF   ID    EX   MEM  WB
OR  R8, R1, R9                  IF    ID   EX   MEM  WB

With the required one-cycle load-use stall:

                 C1   C2   C3   C4     C5   C6   C7   C8   C9
LW  R1, 0(R2)    IF   ID   EX   MEM    WB
SUB R4, R1, R5        IF   ID   stall  EX   MEM  WB
AND R6, R1, R7             IF   stall  ID   EX   MEM  WB
OR  R8, R1, R9                  stall  IF   ID   EX   MEM  WB
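The one-cycle load-use stall above is what a hazard detection unit enforces. A sketch of the classic condition, assuming the usual MIPS pipeline-register field names (the Python encoding is illustrative):

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Stall one cycle when the instruction now in EX is a load
    (MemRead set) and its destination register rt is a source
    register of the instruction now in ID."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

# LW R1, 0(R2) followed by SUB R4, R1, R5: the load writes R1
# and SUB reads it, so one bubble is inserted.
print(load_use_stall(True, 1, 1, 5))  # True
```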


Forwarding to execution stage
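Forwarding can be sketched as a pair of comparisons against the destination registers sitting in the EX/MEM and MEM/WB pipeline registers. This is a simplified model of the usual forwarding-unit conditions for one ALU input; names and encoding are illustrative:

```python
def forward_a(ex_mem_regwrite, ex_mem_rd,
              mem_wb_regwrite, mem_wb_rd, id_ex_rs):
    """Pick the source of ALU input A for the instruction in EX.
    Prefer the newest value (EX/MEM) over the older one (MEM/WB);
    register 0 is hard-wired to zero and never forwarded."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"   # result just computed by the ALU
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"   # result from two instructions back
    return "REG"          # no hazard: use the register-file value

# SUB R4, R1, R5 right after an ALU op that wrote R1:
print(forward_a(True, 1, False, 0, 1))  # EX/MEM
```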
Reordering Code to Avoid Pipeline Stall
• Example (software solution):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)   <- data hazard: $t2 is loaded by the immediately preceding lw
sw $t0, 4($t1)

• Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)   <- interchanged
sw $t2, 0($t1)
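The effect of the reordering can be checked by counting load-use stalls in each version. A toy sketch with a simplified, hypothetical instruction encoding (op, destination, source registers), used only for this example:

```python
def load_use_stalls(prog):
    """Count one-bubble load-use stalls: a stall occurs when an
    instruction reads a register written by the immediately
    preceding lw (forwarding covers the other cases here)."""
    stalls = 0
    for prev, cur in zip(prog, prog[1:]):
        op_prev, dest_prev, _ = prev
        _, _, srcs_cur = cur
        if op_prev == "lw" and dest_prev in srcs_cur:
            stalls += 1
    return stalls

original = [("lw", "$t0", ("$t1",)),
            ("lw", "$t2", ("$t1",)),
            ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1) reads $t2
            ("sw", None, ("$t0", "$t1"))]   # sw $t0, 4($t1) reads $t0
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original))   # 1
print(load_use_stalls(reordered))  # 0
```

Interchanging the two stores separates the lw $t2 from the sw that needs $t2, so no stall remains.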
Pipelined Datapath
• We now move to actually building a pipelined datapath
• First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
• Review: single-cycle processor
– all 5 steps done in a single clock cycle
– dedicated hardware required for each step
• What happens if we break the execution into multiple cycles, but
keep the extra hardware?
Review - Single-Cycle Datapath “Steps”

[Figure: the single-cycle datapath (PC, instruction memory, register file, ALU,
data memory, sign extend) annotated with the five stages]

IF: Instruction Fetch | ID: Instruction Decode | EX: Execute/Address Calc. |
MEM: Memory Access | WB: Write Back
Pipelined Datapath – Key Idea
• What happens if we break the execution into multiple cycles, but keep the
extra hardware?
– Answer: We may be able to start executing a new instruction at each clock
cycle - pipelining
• …but we shall need extra registers to hold data between cycles – pipeline
registers
Pipelined Datapath
Pipeline registers wide enough to hold the data coming in

[Figure: the same datapath with four pipeline registers inserted between
stages – IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits), MEM/WB (64 bits)]
Only data flowing right to left may cause a hazard… why?
Bug in the Datapath

[Figure: the pipelined datapath with the register-file write register number
taken straight from the IF/ID register]

The write register number comes from another, later instruction!


Corrected Datapath

[Figure: the corrected pipelined datapath – the pipeline registers are now
IF/ID (64 bits), ID/EX (133 bits), EX/MEM (102 bits), MEM/WB (69 bits)]

The destination register number is also passed through the ID/EX, EX/MEM
and MEM/WB registers, which are now wider by 5 bits
Pipelined Example
• Consider the following instruction sequence:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $s0
Single-Clock-Cycle Diagrams: Clock Cycles 1-8

Cycle   IF     ID     EX     MEM    WB
  1     lw
  2     sw     lw
  3     add    sw     lw
  4     sub    add    sw     lw
  5            sub    add    sw     lw
  6                   sub    add    sw
  7                          sub    add
  8                                 sub
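The cycle-by-cycle occupancy above can be reproduced programmatically. A minimal sketch of an ideal (no-stall) 5-stage pipeline, where instruction i occupies stage s during cycle i + s (0-based):

```python
def pipeline_diagram(instrs, stages=("IF", "ID", "EX", "MEM", "WB")):
    """Return, for each clock cycle, which stage each
    instruction occupies in an ideal pipeline with no stalls."""
    n_cycles = len(instrs) + len(stages) - 1
    table = []
    for cycle in range(n_cycles):
        row = {}
        for i, name in enumerate(instrs):
            s = cycle - i
            if 0 <= s < len(stages):
                row[name] = stages[s]
        table.append(row)
    return table

diag = pipeline_diagram(["lw", "sw", "add", "sub"])
# Clock cycle 5 (index 4): all four instructions in flight.
print(diag[4])  # {'lw': 'WB', 'sw': 'MEM', 'add': 'EX', 'sub': 'ID'}
```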
Superscalar and Dynamic Pipelining
• In the interest of even faster processors, there
have been three major directions:
• Superpipelining: It simply means longer
pipelines.
– Since the ideal maximum speedup from pipelining
is related to the number of pipeline stages, some
recent microprocessors have gone to pipelines
with eight or more stages.
Superscalar and Dynamic Pipelining
• Superscalar: … to replicate the internal
components of the computer so that it can
launch multiple instructions in every pipeline
stage.
Superscalar and Dynamic Pipelining
• Dynamic pipeline scheduling / Dynamic pipelining:
… by the hardware to avoid pipeline hazards.
• Dynamic pipelining is normally combined with
extra hardware resources so later instructions can
proceed in parallel.
• Divided into three major units:
– Instruction fetch and issue unit
– Execute units
– Commit unit
Superscalar and Dynamic Pipelining
• In-order completion
• Out-of-order completion
• Dynamic pipelining is more complicated than traditional, or static,
pipelining – why?
– Dynamic scheduling is normally combined with branch
prediction, so the commit unit must be able to discard
all the results in the execution units that were due to
instructions executed after a mispredicted branch.
– Dynamic scheduling is also typically combined with
superscalar execution, so each unit may be issuing or
committing four to six instructions each clock cycle.
• Combining dynamic scheduling with branch
prediction is called speculative execution.
