Introduction to Instruction Pipelining
Pipelining
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
Here we will consider a RISC architecture:
Memory is accessed only by load/store instructions
All other instructions operate on registers.
Pipelining is Natural!
Laundry Example
Ann, Brian, Cathy, Dave A B C D
each have one load of clothes
to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight. Loads A-D run one after another in task order; each takes 30 min wash + 40 min dry + 20 min fold = 90 min.]
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM to 9:30 PM. Loads A-D overlap in task order: 30 min to fill the washer, then one load leaves the dryer every 40 min, plus a final 20 min fold.]
Pipelined laundry takes 3.5 hours for 4 loads
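The two totals above can be checked with a short sketch (times in minutes, taken straight from the example):

```python
# Laundry example: wash 30 min, dry 40 min, fold 20 min; four loads.
WASH, DRY, FOLD = 30, 40, 20
loads = 4

# Sequential: each load finishes all three stages before the next starts.
sequential = loads * (WASH + DRY + FOLD)

# Pipelined: the dryer (slowest stage, 40 min) sets the rate.
# The first wash fills the pipeline, then one load leaves the dryer
# every 40 min, and the last load still needs 20 min of folding.
pipelined = WASH + loads * DRY + FOLD

print(sequential / 60)  # 6.0 hours
print(pipelined / 60)   # 3.5 hours
```

Note that the pipelined time is dominated by the slowest stage, not by the sum of the stages, which is exactly the first pipelining lesson below.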
Pipelining Lessons
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipeline stages
Unbalanced lengths of pipeline stages reduce speedup
Time to "fill" the pipeline and time to "drain" it reduces speedup
Stall for dependencies
The Five Stages of Load
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Load Ifetch Reg/Dec Exec Mem Wr
Ifetch: Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec: Register Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file
Branch instruction – 2 stages
Store instruction – 4 stages
ALU instruction – 5 stages (4th stage is idle)
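The stage usage of each instruction class can be summarized in a small sketch (stage names follow the slide; this is an illustration of the stage counts above, not the actual datapath):

```python
# Stages used by each instruction class in this 5-stage design.
STAGES = {
    "load":   ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],  # all 5 stages
    "store":  ["Ifetch", "Reg/Dec", "Exec", "Mem"],        # no write-back
    "branch": ["Ifetch", "Reg/Dec"],                       # decided in decode
    "alu":    ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],  # Mem stage idle
}

for kind, stages in STAGES.items():
    print(f"{kind}: {len(stages)} stages -> {' '.join(stages)}")
```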
Pipelining
Improves performance by increasing throughput
Ideal speedup is the number of stages in the pipeline.
Do we achieve this? NO!
The pipeline stage time is limited by the slowest resource, either the ALU operation or the memory access
Fill and drain time also reduce the speedup
Single Cycle, Multiple Cycle, vs. Pipeline
Single Cycle Implementation (every instruction takes one long cycle):
Cycle 1: Load    Cycle 2: Store (finishes early; the rest of the cycle is wasted)

Multiple Cycle Implementation (one short cycle per step):
Cycles 1-5: Load (Ifetch Reg Exec Mem Wr)
Cycles 6-9: Store (Ifetch Reg Exec Mem)
Cycle 10:   R-type (Ifetch ...)

Pipeline Implementation (a new instruction starts every cycle):
Load    Ifetch Reg Exec Mem Wr
Store          Ifetch Reg Exec Mem Wr
R-type                Ifetch Reg Exec Mem Wr
Why Pipeline?
Suppose we execute 100 instructions
Single Cycle Machine
45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle Machine
10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
Ideal pipelined machine
10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
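The three totals can be reproduced directly from the numbers on the slide:

```python
# 100-instruction comparison from the slide.
n = 100

single_cycle = 45 * 1 * n        # 45 ns/cycle, CPI 1
multicycle   = 10 * 4.6 * n      # 10 ns/cycle, CPI 4.6 (instruction mix)
pipelined    = 10 * (1 * n + 4)  # 10 ns/cycle, CPI 1, plus 4-cycle drain

print(single_cycle, round(multicycle), pipelined)  # 4500 4600 1040
```

The pipelined machine wins by roughly 4.4x despite using the same 10 ns cycle as the multicycle machine, because (almost) one instruction completes every cycle.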
Why Pipeline? Because the resources are there!
[Figure: five instructions (Inst 0-4) flowing through the Im, Reg, ALU, Dm, Reg stages, each starting one clock cycle after the previous; every stage's hardware is busy every cycle.]
Speedup and Efficiency
A k-stage pipeline processes n tasks in k + (n-1) clock cycles:
k cycles for the first task and 1 cycle each for the remaining n-1 tasks
Total time to process n tasks (clock period t): Tk = [k + (n-1)] t
For the non-pipelined processor: T1 = n k t
Speedup factor: Sk = T1 / Tk = n k t / ([k + (n-1)] t) = n k / (k + n - 1)
If n is very large (n >> k), then Sk ≈ k
Efficiency and Throughput
Efficiency of the k-stage pipeline: Ek = Sk / k = n / (k + n - 1)
Pipeline throughput (the number of tasks per unit time, with f = 1/t):
Hk = n / ([k + (n-1)] t) = n f / (k + n - 1)
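The speedup, efficiency, and throughput formulas can be written out as small functions (t is the clock period, as in the formulas; purely illustrative):

```python
# k-stage pipeline formulas for n tasks with clock period tau.
def speedup(k, n):          # Sk = n*k / (k + n - 1)
    return n * k / (k + n - 1)

def efficiency(k, n):       # Ek = Sk / k = n / (k + n - 1)
    return n / (k + n - 1)

def throughput(k, n, tau):  # Hk = n / ((k + n - 1) * tau), tasks per unit time
    return n / ((k + n - 1) * tau)

# For n >> k the speedup approaches k and the efficiency approaches 1:
print(round(speedup(5, 1_000_000), 3))     # 5.0
print(round(efficiency(5, 1_000_000), 3))  # 1.0
```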
Can pipelining get us into trouble?
Yes: Pipeline Hazards
Structural hazards: attempt to use the same resource two different
ways at the same time
E.g., combined washer/dryer would be a structural hazard or folder busy
doing something else (watching TV)
A single memory for instructions and data causes structural hazards
Data hazards: attempt to use item before it is ready
E.g., one sock of pair in dryer and one in washer; can’t fold until you get
sock from washer through dryer
instruction depends on result of prior instruction still in the pipeline
Control hazards: attempt to make a decision before condition is
evaluated
E.g., washing football uniforms and need to get proper detergent level;
need to see after dryer before next load in
branch instructions
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards
Slow Down From Stalls
• Perfect pipelining with no hazards → an instruction completes every cycle (total cycles ≈ number of instructions): k + (n-1) ≈ n
  → speedup = increase in clock speed = number of pipeline stages: Sk ≈ nk/n ≈ k
• With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes
• Total cycles = number of instructions + stall cycles
• Slowdown because of stalls = 1 / (1 + stall cycles per instruction)
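The effect of stall cycles on the ideal speedup can be sketched as follows (assuming, as above, that total cycles = instructions + stalls and the ideal speedup equals the depth k):

```python
# Pipeline speedup degraded by stalls: ideal speedup k, scaled by
# the slowdown factor 1 / (1 + stall cycles per instruction).
def speedup_with_stalls(k, stalls_per_instr):
    return k / (1 + stalls_per_instr)

print(speedup_with_stalls(5, 0.0))  # 5.0 (ideal, no stalls)
print(speedup_with_stalls(5, 0.5))  # ~3.33 (half a stall per instruction)
```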
Speedup Equation for Pipelining
Compared to unpipelined,
Speedup = (CPI unpipelined x Clock cycle unpipelined) / (CPI pipelined x Clock cycle pipelined)
Now it is evident that
CPI pipelined = Ideal CPI + Pipeline stall cycles per instruction = 1 + Pipeline stall cycles per instruction
Putting this in the equation of speedup (with equal clock cycles and CPI unpipelined equal to the pipeline depth):
Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
Thus, if there are no stalls, the speedup is equal to the number
of pipeline stages, matching our intuition for the ideal case.
Single Memory is a Structural Hazard
[Figure: Load followed by Instr 1-4 in the pipeline; in one cycle the Load's Mem access and Instr 3's instruction fetch both need the single memory at the same time.]
Single Memory is a Structural Hazard
[Figure: same sequence, but Instr 3's fetch is delayed one cycle (a bubble) so it no longer competes with the Load's Mem access.]
One cycle stall for structural hazard
Example: Dual-port vs. Single-port
Machine A: Dual ported memory
Machine B: Single ported memory, but its pipelined implementation has
a 1.05 times faster clock rate
Ideal CPI = 1 for both
Data references make up 40% of the instruction mix
SpeedUpA = [1 / (1 + 0)] x (clock_unpipe / clock_pipe)
         = Pipeline Depth
SpeedUpB = [1 / (1 + 0.4 x 1)] x (clock_unpipe / (clock_unpipe / 1.05))
         = (Pipeline Depth / 1.4) x 1.05
         = 0.75 x Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth / (0.75 x Pipeline Depth) = 1.33
Machine A is 1.33 times faster
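The comparison can be reproduced numerically; the pipeline depth cancels in the ratio, so any value works:

```python
# Dual-ported (A) vs. single-ported (B) memory comparison.
depth = 8  # arbitrary; cancels in the ratio

# Machine A: no structural stalls.
speedup_a = (1 / (1 + 0)) * depth
# Machine B: 40% of instructions are data references, each costing a
# 1-cycle structural stall, but the clock is 1.05x faster.
speedup_b = (1 / (1 + 0.4 * 1)) * depth * 1.05

print(round(speedup_a / speedup_b, 2))  # 1.33
```

The lesson: a 5% faster clock does not come close to paying for a stall on 40% of instructions.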
Control Hazard
When a branch is executed, it may or may not change the
PC to something other than incrementing it.
If a branch changes the PC to its target address, it is
called a taken branch;
If it falls through, it is not taken, or untaken.
If instruction i is a taken branch, then the PC is normally
not changed until the end of ID, after the completion of the
address calculation and comparison.
Control Hazard Solution #1: Stall
[Figure: Add, Beq, then Load in instruction order; the Load's fetch waits two cycles until the Beq outcome is known, losing potential work.]
Stall: wait until decision is clear
Impact: 2 lost cycles (i.e., 3 clock cycles per branch instruction) => slow
Moving the decision to the end of decode by improving the hardware
saves 1 cycle per branch
If 20% of instructions are BEQ and all others have CPI 1, what is the average CPI?
Control Hazard Solution #2: Predict
[Figure: Add, Beq, then Load in instruction order; the Load is fetched immediately after the Beq on the predicted path, with no lost cycle when the prediction is right.]
Predict: guess one direction (taken/untaken) then back up if wrong
Impact: 0 lost cycles per branch instruction if right, 1 if wrong
(right 50% of time)
Need to “Squash” and restart following instruction if wrong
Produce CPI on branch of (1 *.5 + 2 * .5) = 1.5
Total CPI might then be: 1.5 * .2 + 1 * .8 = 1.1 (20% branch)
Control Hazard Solution #3: Delayed Branch
[Figure: Add, Beq, Misc, Load in instruction order; the Misc instruction fills the branch delay slot and executes regardless of the branch outcome.]
Delayed Branch: Redefine branch behavior (takes place after
next instruction)
Impact: 0 extra clock cycles per branch instruction if the compiler can find an
instruction to put in the "slot" (possible about 50% of the time)
The longer the pipeline, the harder to fill
Used by MIPS architecture
Scheduling Branch Delay Slots (Fig A.14)
A. From before branch:
    add $1,$2,$3
    if $2=0 then
        delay slot
becomes:
    if $2=0 then
        add $1,$2,$3

B. From branch target:
    sub $4,$5,$6
    add $1,$2,$3
    if $1=0 then
        delay slot
becomes:
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6

C. From fall through:
    add $1,$2,$3
    if $1=0 then
        delay slot
    sub $4,$5,$6
becomes:
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6
A is the best choice, fills delay slot & reduces instruction count (IC)
In B, the sub instruction may need to be copied, increasing IC
In B and C, must be okay to execute sub when branch fails
More On Delayed Branch
Compiler effectiveness for a single branch delay slot:
Fills about 60% of branch delay slots
About 80% of instructions executed in branch delay slots are useful in computation
About 50% (60% x 80%) of slots are usefully filled
Evaluating Branch Alternatives
A simplified pipeline speedup equation for Branch:
Pipeline speedup = Pipeline depth / (1 + Branch frequency x Branch penalty)
Assume that in a deeper pipeline, it takes at least three pipeline
stages before the branch-target address is known and an
additional cycle before the branch condition is evaluated
Assume 4% unconditional branch
6% conditional branch- untaken
10% conditional branch-taken
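The simplified speedup equation can be evaluated for this branch mix. The penalties below are illustrative assumptions, not values given on the slide:

```python
# speedup = depth / (1 + sum of branch_freq * branch_penalty)
# Branch mix from the slide: 4% unconditional, 6% conditional untaken,
# 10% conditional taken. Penalties here are assumed for illustration.
def speedup(depth, mix_and_penalties):
    stall = sum(freq * penalty for freq, penalty in mix_and_penalties)
    return depth / (1 + stall)

# Stall on every branch, assuming a 3-cycle penalty:
stall_always = [(0.04, 3), (0.06, 3), (0.10, 3)]
# Predict untaken: untaken conditionals are free; taken and
# unconditional branches pay the (assumed) 3-cycle penalty:
predict_untaken = [(0.04, 3), (0.06, 0), (0.10, 3)]

print(round(speedup(5, stall_always), 2))
print(round(speedup(5, predict_untaken), 2))
```

Any scheme that makes some branches free raises the speedup, since it shrinks the average stall term in the denominator.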
Data Hazard on r1
An instruction depends on the result of a previous instruction still in the pipeline
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
Data Hazard on r1:
• Dependencies backwards in time are hazards
[Figure: add writes r1 in WB (cycle 5), but sub, and, and or read r1 in their ID/RF stages (cycles 3-5); their dependences reach backwards in time. Only xor reads r1 safely.]
Data Hazard Solution:
• “Forward” result from one stage to another
[Figure: same sequence, but the ALU result of add is forwarded from the pipeline latches straight to the ALU inputs of sub, and, and or, removing the hazards.]
• "or" is OK if register read/write within a cycle is defined properly (write in the first half of the cycle, read in the second half)
• Forwarding can't prevent all data hazards! What about lw followed by an R-type that uses the loaded value?
Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards
[Figure: lw r1,0(r2) produces r1 at the end of Mem (cycle 4), but sub r4,r1,r3 needs it at the ALU in cycle 4; the dependence still reaches backwards in time even with forwarding.]
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads
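The distinction between forwardable hazards and load-use hazards can be sketched with a toy classifier. This is a simplification that only checks adjacent instruction pairs and assumes full forwarding otherwise; the instruction encoding (op, dest, sources) is invented for illustration:

```python
# Classify RAW hazards between adjacent instructions: an ALU result
# can be forwarded in time, but a load's result is only available
# after Mem, so a use in the very next instruction must stall.
def classify_hazards(instrs):
    events = []
    for i in range(1, len(instrs)):
        op, dest, _ = instrs[i - 1]
        _, _, use_srcs = instrs[i]
        if dest in use_srcs:
            if op == "lw":
                events.append((i, "stall then forward"))
            else:
                events.append((i, "forward"))
    return events

prog = [
    ("lw",  "r1", ["r2"]),        # lw  r1, 0(r2)
    ("sub", "r4", ["r1", "r3"]),  # needs r1 in the next cycle
]
print(classify_hazards(prog))  # [(1, 'stall then forward')]
```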
Forwarding (or Bypassing): What about Loads
• Dependencies backwards in time are hazards
[Figure: the same lw/sub pair, but sub is stalled one cycle; after the stall, the loaded value can be forwarded from Mem to the ALU in time.]
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads
Software Scheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e – f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
    LW   Rb,b
    LW   Rc,c
    ADD  Ra,Rb,Rc
    SW   a,Ra
    LW   Re,e
    LW   Rf,f
    SUB  Rd,Re,Rf
    SW   d,Rd

Fast code:
    LW   Rb,b
    LW   Rc,c
    LW   Re,e
    ADD  Ra,Rb,Rc
    LW   Rf,f
    SW   a,Ra
    SUB  Rd,Re,Rf
    SW   d,Rd
The compiler optimizes for performance by reordering (scheduling) instructions at compile time.
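The benefit of the reordering can be counted mechanically. This sketch assumes, as on the previous slides, one stall whenever an instruction reads a register loaded by the immediately preceding LW; the (op, dest, sources) tuples are an invented encoding of the code above:

```python
# Count load-use stalls: one stall when an instruction reads a
# register loaded by the immediately preceding LW.
def count_load_stalls(prog):
    stalls = 0
    for i in range(1, len(prog)):
        op, dest, _ = prog[i - 1]
        if op == "LW" and dest in prog[i][2]:
            stalls += 1
    return stalls

slow = [
    ("LW", "Rb", []), ("LW", "Rc", []),
    ("ADD", "Ra", ["Rb", "Rc"]),  # uses Rc right after its load: stall
    ("SW", "a", ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []),
    ("SUB", "Rd", ["Re", "Rf"]),  # uses Rf right after its load: stall
    ("SW", "d", ["Rd"]),
]
fast = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]),
    ("LW", "Rf", []),
    ("SW", "a", ["Ra"]),
    ("SUB", "Rd", ["Re", "Rf"]),
    ("SW", "d", ["Rd"]),
]
print(count_load_stalls(slow), count_load_stalls(fast))  # 2 0
```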
Summary: Pipelining
What makes it easy
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores; memory operands are aligned
What makes it hard?
structural hazards: suppose we had only one memory
control hazards: need to worry about branch instructions
data hazards: an instruction depends on a previous instruction
We’ll talk about modern processors and what really
makes it hard:
trying to improve performance with out-of-order execution, etc.
Summary
Pipelining is a fundamental concept
multiple steps using distinct resources
Utilize capabilities of the Datapath by pipelined
instruction processing
start next instruction while working on the current one
limited by length of longest stage (plus fill/flush)
detect and resolve hazards