
Lec13 Pipe Control

This document discusses hazards that can occur in pipelined processors and ways to address them. It begins with a review of pipelining and pipeline control. It then introduces different types of hazards including structural hazards when the hardware cannot support certain combinations of instructions, data hazards when an instruction depends on the result of a prior instruction, and control hazards involving branches. Common solutions like stalling the pipeline by inserting bubbles are presented. Specific examples of structural hazards due to sharing functional units like memory and data hazards when instructions depend on a value not yet computed are described. Forwarding of register values and stalling the pipeline are shown as techniques to resolve such hazards.

Uploaded by

Mahmoud Magdi

ECE 361

Computer Architecture
Lecture 13: Designing a Pipeline Processor

361 hazards.1

Review: A Pipelined Datapath

[Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, and Wr are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers. Control signals shown: RegWr, ExtOp, ALUOp, Branch, RegDst, ALUSrc, MemWr, MemtoReg.]


Review: Pipeline Control “Data Stationary Control”

° The Main Control generates the control signals during Reg/Dec
• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
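
The timing in the bullets above can be sketched in a few lines of Python. This is only an illustration: the `CONTROL` table, the `BUBBLE` word, and the `run` function are hypothetical names, and the signal values are the usual ones for lw, sw, and add rather than anything taken from the slides.

```python
# Hypothetical control words, generated once in Reg/Dec (assumed values).
CONTROL = {
    "lw":  {"ALUSrc": 1, "MemWr": 0, "RegWr": 1},
    "sw":  {"ALUSrc": 1, "MemWr": 1, "RegWr": 0},
    "add": {"ALUSrc": 0, "MemWr": 0, "RegWr": 1},
}
BUBBLE = {"ALUSrc": 0, "MemWr": 0, "RegWr": 0}


def run(program):
    """Return, per clock cycle, the signal each later stage actually uses."""
    id_ex = ex_mem = mem_wr = BUBBLE          # the three pipeline registers
    trace = []
    for op in list(program) + [None] * 3:     # extra cycles drain the pipe
        # Clock edge: every control word moves one register to the right.
        mem_wr, ex_mem, id_ex = ex_mem, id_ex, (CONTROL[op] if op else BUBBLE)
        trace.append({
            "Exec uses ALUSrc": id_ex["ALUSrc"],
            "Mem uses MemWr": ex_mem["MemWr"],
            "Wr uses RegWr": mem_wr["RegWr"],
        })
    return trace


for cycle, used in enumerate(run(["lw", "sw", "add"]), start=1):
    print(cycle, used)
```

Entry k of the trace is the cycle after instruction k was decoded, so lw's RegWr shows up in the third entry, three cycles after its decode. The control word travels with the instruction, which is what the slide's title calls data stationary control.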

[Figure: the Main Control generates all control signals in Reg/Dec. ExtOp, ALUSrc, ALUOp, and RegDst pass through the ID/Ex register and are used in Exec; MemWr and Branch continue through Ex/Mem and are used in Mem; MemtoReg and RegWr continue through Mem/Wr and are used in Wr.]

Review: Pipeline Summary

° Pipeline Processor:
• Natural enhancement of the multiple clock cycle processor
• Each functional unit can only be used once per instruction
• If an instruction is going to use a functional unit:
- it must use it at the same stage as all other instructions
• Pipeline Control:
- Each stage’s control signal depends ONLY on the instruction that is currently in that stage

Outline of Today’s Lecture

° Recap and Introduction

° Introduction to Hazards

° Forwarding

° 1 cycle Load Delay

° 1 cycle Branch Delay

° What makes pipelining hard

° Summary


It’s not that easy for computers

° Limits to pipelining: Hazards prevent the next instruction from executing during its designated clock cycle
• structural hazards: HW cannot support this combination of instructions
• data hazards: instruction depends on the result of a prior instruction still in the pipeline
• control hazards: pipelining of branches & other instructions that change the PC

° Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline

Single Memory is a Structural Hazard

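[Figure: pipeline diagram, time in clock cycles across, instructions in program order down (Load, Instr 1, Instr 2, Instr 3, Instr 4). Each instruction passes through Mem, Reg, ALU, Mem, Reg. With a single memory, Load's data access and Instr 3's instruction fetch need the memory in the same cycle.]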

Option 1: Stall to resolve Memory Structural Hazard

[Figure: the same pipeline diagram with Instr 3 stalled. A bubble takes its place for one cycle, so Instr 3 is fetched one cycle later, after Load has finished its memory access; Instr 4 follows it.]

Option 2: Duplicate to Resolve Structural Hazard
• Separate Instruction Cache (Im) & Data Cache (Dm)
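
A rough way to see why the split helps is to list the cycles in which an instruction fetch and a data access coincide. The sketch below assumes an ideal five-stage pipeline with no stalls, where instruction k (counting from 0) is fetched in cycle k + 1 and reaches its memory stage in cycle k + 4; `memory_conflicts` is a hypothetical helper, not part of the lecture.

```python
def memory_conflicts(ops, split_caches=False):
    """Return the clock cycles in which a fetch and a data access collide."""
    if split_caches:
        return []  # fetches go to Im, data accesses go to Dm
    fetch_cycles = {k + 1 for k in range(len(ops))}
    data_cycles = {k + 4 for k, op in enumerate(ops) if op in ("lw", "sw")}
    return sorted(fetch_cycles & data_cycles)


# Load followed by four instructions that do not touch data memory.
program = ["lw", "add", "sub", "and", "or"]
print(memory_conflicts(program))                      # [4]
print(memory_conflicts(program, split_caches=True))   # []
```

The single conflict in cycle 4 is the one between Load and Instr 3 that the stall in Option 1 works around.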
[Figure: pipeline diagram, time in clock cycles across, instructions in program order down (Load, Instr 1, Instr 2, Instr 3, Instr 4). Each instruction passes through Im, Reg, ALU, Dm, Reg, so an instruction fetch (Im) and a data access (Dm) can take place in the same cycle.]

Data Hazard on r1

add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or  r8, r1, r9
xor r10, r1, r11

Data Hazard on r1: (Figure 6.30, page 397, P&H)

• Dependencies backwards in time are hazards

[Figure: pipeline diagram (stages IF, ID/RF, EX, MEM, WB; units Im, Reg, ALU, Dm, Reg) for add r1,r2,r3 followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11. add writes r1 in WB, so the following instructions that read r1 in an earlier clock cycle depend on a value that is not in the register file yet.]

Option 1: HW Stalls to Resolve Data Hazard

• Dependencies backwards in time are hazards

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB). add r1,r2,r3 proceeds normally; sub r4,r1,r3 is fetched and then held for three bubbles before its register read, so it reads r1 after add has written it. and, or, and xor are delayed by the same three cycles.]

But recall use of “Data Stationary Control”

° The Main Control generates the control signals during Reg/Dec
• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later

[Figure: control pipeline. Each signal generated by the Main Control in Reg/Dec is carried through the ID/Ex, Ex/Mem, and Mem/Wr registers until it reaches the stage that uses it.]

Option 1: How HW really stalls pipeline


• HW doesn’t change PC => keeps fetching same instruction
& sets control signals to benign values (0)
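
The effect of this interlock on the instruction stream can be sketched as follows. The sketch assumes no forwarding and a register file whose new value can only be read in the cycle after write-back, which is what the three-bubble diagrams imply; `insert_stalls`, the `gap` parameter, and the tuple format are hypothetical.

```python
def insert_stalls(program, gap=4):
    """Return the issue order with "stall" rows where the PC is held.

    program: list of (text, dest_register_or_None, [source_registers]).
    gap: minimum distance in rows between a writer and a reader
         (4 rows means three stall rows between back-to-back instructions).
    """
    rows = []
    last_write = {}  # register -> row index of its most recent writer
    for text, dest, sources in program:
        earliest = max(
            [last_write[r] + gap for r in sources if r in last_write],
            default=0,
        )
        while len(rows) < earliest:
            rows.append("stall")  # PC unchanged, control signals forced to 0
        if dest is not None:
            last_write[dest] = len(rows)
        rows.append(text)
    return rows


program = [
    ("add r1,r2,r3", "r1", ["r2", "r3"]),
    ("sub r4,r1,r3", "r4", ["r1", "r3"]),
    ("and r6,r1,r7", "r6", ["r1", "r7"]),
]
print(insert_stalls(program))
# ['add r1,r2,r3', 'stall', 'stall', 'stall', 'sub r4,r1,r3', 'and r6,r1,r7']
```

Once sub has waited, and needs no further stall, because add's result is already in the register file by the time and reads it.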
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB). After add r1,r2,r3, three stall rows each occupy the fetch stage (Im) and then carry bubbles through the remaining stages; sub r4,r1,r3 and and r6,r1,r7 follow.]

Option 2: SW inserts independent instructions

• Worst case inserts NOP instructions

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for add r1,r2,r3; nop; nop; nop; sub r4,r1,r3; and r6,r1,r7. Every instruction, including each nop, passes through Im, Reg, ALU, Dm, Reg.]

Questions and Administrative Matters

Option 3 Insight: Data is available!

• Pipeline registers already contain needed data

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11, each passing through Im, Reg, ALU, Dm, Reg with no stalls.]

HW Change for “Forwarding” (Bypassing):

• Increase multiplexors to add paths from pipeline registers
• Assumes register read during write gets new value (otherwise more results to be forwarded)
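
The select logic for those multiplexors can be sketched as below. This follows the usual textbook form of a forwarding unit and is an assumption, not a transcription of the lecture's hardware; `forward_select` and the dictionary fields `RegWr` and `Rw` are hypothetical names. It also ignores the case where the producer is a load whose data is not yet available.

```python
def forward_select(src_reg, ex_mem, mem_wr):
    """Choose where the instruction in Exec gets register `src_reg` from.

    ex_mem, mem_wr: dicts describing the instruction whose result sits in
    that pipeline register: {"RegWr": bool, "Rw": destination register}.
    """
    if src_reg == 0:
        return "regfile"      # MIPS r0 always reads as zero
    if ex_mem["RegWr"] and ex_mem["Rw"] == src_reg:
        return "Ex/Mem"       # result of the instruction one ahead (newest)
    if mem_wr["RegWr"] and mem_wr["Rw"] == src_reg:
        return "Mem/Wr"       # result of the instruction two ahead
    return "regfile"          # older results are already written back


add = {"RegWr": True, "Rw": 1}    # add r1,r2,r3
sub = {"RegWr": True, "Rw": 4}    # sub r4,r1,r3
none = {"RegWr": False, "Rw": 0}  # nothing useful in that register

print(forward_select(1, add, none))  # sub in Exec: Ex/Mem
print(forward_select(1, sub, add))   # and in Exec: Mem/Wr
```

Checking Ex/Mem before Mem/Wr matters when both hold a write to the same register: the younger result is the one the program order requires.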

From Last Lecture: The Delayed Load Phenomenon

Clock     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
I0: Load  Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 1             Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 2                      Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 3                               Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 4                                        Ifetch   Reg/Dec  Exec     Mem      Wr

° Although Load is fetched during Cycle 1:
• The data is NOT written into the Reg File until the end of Cycle 5
• We cannot read this value from the Reg File until Cycle 6
• 3-instruction delay before the load takes effect


Forwarding reduces Data Hazard to 1 cycle:

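[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9. The loaded value leaves Dm at the end of lw's MEM stage, after sub's EX stage has already started, so forwarding alone cannot supply r1 to sub; and and or can receive it in time.]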

Option 1: HW Stalls to Resolve Data Hazard
• “Interlock”: checks for hazard & stalls

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB). lw r1, 0(r2) is followed by one stall row of bubbles, then sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9.]

Option 2: SW inserts independent instructions


• Worst case inserts NOP instructions
• MIPS I solution: No HW checking
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for lw r1, 0(r2); nop; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9. The nop occupies the slot directly after the load.]

Software Scheduling to Avoid Load Hazards

Try producing fast code for

a = b + c;
d = e - f;

assuming a, b, c, d, e, and f are in memory.

Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd

Software Scheduling to Avoid Load Hazards

Try producing fast code for

a = b + c;
d = e - f;

assuming a, b, c, d, e, and f are in memory.

Slow code:        Fast code:
LW Rb,b           LW Rb,b
LW Rc,c           LW Rc,c
ADD Ra,Rb,Rc      LW Re,e
SW a,Ra           ADD Ra,Rb,Rc
LW Re,e           LW Rf,f
LW Rf,f           SW a,Ra
SUB Rd,Re,Rf      SUB Rd,Re,Rf
SW d,Rd           SW d,Rd
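
The difference between the two sequences can be checked with a small count of load-use pairs. The sketch assumes a pipeline with forwarding and a one-cycle load delay, so a stall (or nop) is needed only when an instruction reads a register loaded by the instruction immediately before it; `load_use_stalls` and the tuple encoding are hypothetical.

```python
# (opcode, destination register or None, [source registers])
SLOW = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("ADD", "Ra", ["Rb", "Rc"]),
    ("SW", None, ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []), ("SUB", "Rd", ["Re", "Rf"]),
    ("SW", None, ["Rd"]),
]
FAST = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("LW", "Rf", []), ("SW", None, ["Ra"]),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]


def load_use_stalls(code):
    """Count instructions that read a register loaded one instruction earlier."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        if prev[0] == "LW" and prev[1] in cur[2]:
            stalls += 1
    return stalls


print(load_use_stalls(SLOW))  # 2
print(load_use_stalls(FAST))  # 0
```

Both sequences contain the same eight instructions; the fast version only moves the loads of e and f so that another instruction sits between each load and its first use.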

Compiler Avoiding Load Stalls:

% loads stalling pipeline:

Program   Unscheduled   Scheduled
gcc       54%           31%
spice     42%           14%
tex       65%           25%

From Last Lecture: The Delayed Branch Phenomenon

Clk                 Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9  Cycle 10 Cycle 11
12: Beq             Ifetch   Reg/Dec  Exec     Mem      Wr
  (target is 1000)
16: R-type                   Ifetch   Reg/Dec  Exec     Mem      Wr
20: R-type                            Ifetch   Reg/Dec  Exec     Mem      Wr
24: R-type                                     Ifetch   Reg/Dec  Exec     Mem      Wr
1000: Target of Br                                      Ifetch   Reg/Dec  Exec     Mem      Wr

° Although Beq is fetched during Cycle 4:
• Target address is NOT written into the PC until the end of Cycle 7
• Branch’s target is NOT fetched until Cycle 8
• 3-instruction delay before the branch takes effect

Control Hazard on Branches: 3 stage stall


Branch Stall Impact

° If CPI = 1, 30% of instructions are branches, and each branch stalls 3 cycles => new CPI = 1 + 0.30 x 3 = 1.9!

° 2 part solution:
• Determine branch taken or not sooner, AND
• Compute taken branch address earlier

° MIPS branch tests = 0 or ≠ 0

° Solution Option 1:
• Move Zero test to ID/RF stage
• Adder to calculate new PC in ID/RF stage
• 1 clock cycle penalty for branch vs. 3
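
The CPI figures on this slide follow from one line of arithmetic, sketched here with a hypothetical helper `cpi_with_branch_stalls`. It assumes every branch pays the full penalty and that there are no other stalls.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Average cycles per instruction when each branch stalls the pipeline."""
    return base_cpi + branch_fraction * stall_cycles


# 30% branches, 3-cycle stall (branch resolved late in the pipeline)
print(f"{cpi_with_branch_stalls(1.0, 0.30, 3):.2f}")  # 1.90
# 30% branches, 1-cycle stall (zero test and target adder moved to ID/RF)
print(f"{cpi_with_branch_stalls(1.0, 0.30, 1):.2f}")  # 1.30
```

Under these assumptions, moving the branch decision to ID/RF brings the CPI from 1.9 down to 1.3.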

Option 1: move HW forward to reduce branch delay

[Figure: datapath with the stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back.]

Branch Delay now 1 clock cycle

[Figure: datapath with the stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back.]

Option 2: Define Branch as Delayed

° Worst case, SW inserts NOP into the branch delay slot

° Where to get instructions to fill the branch delay slot?
• Before branch instruction
• From the target address: only valuable when the branch is taken
• From fall through: only valuable when the branch is not taken

° Compiler effectiveness for single branch delay slot:
• Fills about 60% of branch delay slots
• About 80% of instructions executed in branch delay slots useful in computation
• About 50% (60% x 80%) of slots usefully filled
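
The last bullet is a product of the two before it. The sketch below works it out and, as an added assumption, combines it with the 30% branch frequency used earlier in the lecture to estimate the resulting CPI; `cpi_with_delay_slot` is a hypothetical helper, and it charges one wasted cycle for every delay slot that does no useful work.

```python
fill_rate = 0.60     # fraction of delay slots the compiler fills
useful_rate = 0.80   # fraction of filled slots that do useful work
usefully_filled = fill_rate * useful_rate
print(f"{usefully_filled:.2f}")  # 0.48, the slide's "about 50%"


def cpi_with_delay_slot(base_cpi, branch_fraction, usefully_filled):
    """CPI when each branch delay slot without useful work wastes a cycle."""
    return base_cpi + branch_fraction * (1 - usefully_filled)


print(f"{cpi_with_delay_slot(1.0, 0.30, usefully_filled):.3f}")  # 1.156
```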


When is pipelining hard?

° Interrupts: 5 instructions executing in 5 stage pipeline
• How to stop the pipeline?
• Restart?
• Who caused the interrupt?

Stage   Problem interrupts occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic interrupt
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation

When is pipelining hard?

° Complex Addressing Modes and Instructions

° Address modes: Autoincrement causes register change during instruction execution
• Interrupts?
• Now worry about write hazards since write no longer last stage
- Write After Read (WAR): Write occurs before independent read
- Write After Write (WAW): Writes occur in wrong order, leaving wrong result in registers
- (Previous data hazard called RAW, for Read After Write)

° Memory-memory Move instructions
• Multiple page faults
• How to make progress?


When is pipelining hard?

° Floating Point: long execution time

° Also, may pipeline FP execution unit so that it can initiate new instructions without waiting the full latency

FP Instruction    Latency   Initiation Rate   (MIPS R4000)
Add, Subtract     4         3
Multiply          8         4
Divide            36        35
Square root       112       111
Negate            2         1
Absolute value    2         1
FP compare        3         2

° Divide, Square Root take ~10X to ~30X longer than Add
• Exceptions?
• Adds WAR and WAW hazards since pipelines are no longer same length
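
A short sketch of what the table implies. It assumes the "Initiation Rate" column is the number of cycles that must pass before another operation of the same kind can start, and that an operation issued in cycle c completes in cycle c + latency; `R4000_FP`, `cycles_for`, and `completion_cycle` are hypothetical names.

```python
# (latency, initiation interval) in clock cycles, copied from the table.
R4000_FP = {
    "add": (4, 3), "multiply": (8, 4), "divide": (36, 35),
    "sqrt": (112, 111), "negate": (2, 1), "abs": (2, 1), "compare": (3, 2),
}


def cycles_for(op, n):
    """Cycles to finish n independent back-to-back operations of one kind."""
    latency, interval = R4000_FP[op]
    return latency + (n - 1) * interval


def completion_cycle(op, issue_cycle):
    """Cycle in which an operation issued at issue_cycle completes."""
    return issue_cycle + R4000_FP[op][0]


print(cycles_for("add", 4))                          # 13
print(R4000_FP["divide"][0] / R4000_FP["add"][0])    # 9.0
print(R4000_FP["sqrt"][0] / R4000_FP["add"][0])      # 28.0
# A divide issued first completes long after an add issued one cycle later.
print(completion_cycle("divide", 0), completion_cycle("add", 1))  # 36 5
```

The last line shows why unequal pipeline lengths matter: results complete out of program order, so two instructions writing the same register could leave the older value behind unless the hardware checks for it.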

Hazard Detection

Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.

Rregs(i) = Registers read by instruction i
Wregs(i) = Registers written by instruction i

° A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j)
– Keep a record of pending writes (for inst's in the pipe) and compare with operand regs of current instruction.
– When instruction issues, reserve its result register.
– When an operation completes, remove its write reservation.

° A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j)

° A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j)
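
These definitions translate almost directly into set operations. The sketch below is one possible reading: `hazards` applies the three intersections, and `PendingWrites` is a hypothetical record of reserved result registers following the three dashes under the RAW bullet. The dictionary format with `reads` and `writes` sets is an assumption made for the example.

```python
def hazards(i, j):
    """Registers involved in each hazard between i (issuing) and j (in the pipe)."""
    return {
        "RAW": i["reads"] & j["writes"],
        "WAW": i["writes"] & j["writes"],
        "WAR": i["writes"] & j["reads"],
    }


class PendingWrites:
    """Record of result registers reserved by instructions still in the pipe."""

    def __init__(self):
        self.pending = set()

    def can_issue(self, instr):
        return not (instr["reads"] & self.pending)  # no RAW on a pending write

    def issue(self, instr):
        self.pending |= instr["writes"]             # reserve result register

    def complete(self, instr):
        self.pending -= instr["writes"]             # remove the reservation


add = {"reads": {"r2", "r3"}, "writes": {"r1"}}     # add r1,r2,r3
sub = {"reads": {"r1", "r3"}, "writes": {"r4"}}     # sub r4,r1,r3

print(hazards(sub, add))   # RAW on r1, no WAW or WAR

record = PendingWrites()
record.issue(add)
print(record.can_issue(sub))   # False while add's write to r1 is pending
record.complete(add)
print(record.can_issue(sub))   # True
```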


Avoiding Data Hazards by Design


Suppose instructions are executed in a pipelined fashion such that instructions are initiated in order.

° WAW avoidance: if writes to a particular resource (e.g., reg) are performed in the same stage for all instructions, then no WAW hazards occur.
proof: writes are in the same time sequence as instructions.
I    R/D  E    W
     I    R/D  E    W
          I    R/D  E    W

° WAR avoidance: if in all instructions reads of a resource occur at an earlier stage than writes to that resource occur in any instruction, then no WAR hazards occur.
proof: A successor instruction must issue later, hence it will perform writes only after all reads for the current instruction.

First Generation RISC Pipelines

° All instructions follow same pipeline order (“static schedule”).
° Register write in last stage
– Avoid WAW hazards
° All register reads performed in first stage after issue.
– Avoid WAR hazards
° Memory access in stage 4
– Avoid all memory hazards
° Control hazards resolved by delayed branch (with fast path)
° RAW hazards resolved by bypass, except on load results, which are resolved by fiat (delayed load).

Substantial pipelining with very little cost or complexity.
Machine organization is (slightly) exposed!
Relies very heavily on "hit assumption" of memory accesses in cache


Review: Summary of Pipelining Basics

° Speedup ≤ Pipeline Depth; if ideal CPI is 1, then:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$

° Hazards limit performance on computers:
• structural: need more HW resources
• data: need forwarding, compiler scheduling
• control: early evaluation & PC, delayed branch, prediction

° Increasing length of pipe increases impact of hazards since pipelining helps instruction bandwidth, not latency

° Compilers key to reducing cost of data and control hazards
• load delay slots
• branch delay slots

° Exceptions, Instruction Set, FP make pipelining harder

° Longer pipelines => Branch prediction, more instruction parallelism?
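
The speedup formula from the first bullet can be tried out with a few numbers. `pipeline_speedup` is a hypothetical helper; the default clock values of 1.0 assume the pipelined and unpipelined clock cycles are equal, which is an assumption made here for illustration only.

```python
def pipeline_speedup(depth, stall_cycles_per_instr,
                     clock_unpipelined=1.0, clock_pipelined=1.0):
    """Speedup over the unpipelined machine, assuming an ideal CPI of 1."""
    return (depth / (1 + stall_cycles_per_instr)
            * (clock_unpipelined / clock_pipelined))


# Ideal 5-stage pipeline: speedup equals the pipeline depth.
print(f"{pipeline_speedup(5, 0.0):.2f}")  # 5.00
# 0.9 stall cycles per instruction (30% branches x 3 cycles, as earlier).
print(f"{pipeline_speedup(5, 0.9):.2f}")  # 2.63
# Same program with a 1-cycle branch penalty.
print(f"{pipeline_speedup(5, 0.3):.2f}")  # 3.85
```

With no stalls the speedup reaches the pipeline depth, which is the upper bound stated in the first bullet; every stall cycle per instruction pulls it further below that bound.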
