Module 2

The document discusses pipelining in computer architecture, explaining its basic concepts, characteristics, and advantages, such as increased throughput and reduced cycles per instruction. It details single-cycle and multi-cycle datapaths, highlighting their respective benefits and drawbacks, as well as the various types of hazards that can occur during pipelining, such as structural, data, and control hazards. Additionally, it covers techniques for resolving these hazards, including forwarding and pipeline interlocks.

Uploaded by

anikaarajesh

Pipelining: Basic and Intermediate Concepts
Chappathi Making
[Figures: chappathi-making analogy, continued over three slides]
Pipelining Basics

• Pipelining overlaps the execution of instructions to improve performance.
• Pipelining partitions the system into multiple independent stages, with buffers added between the stages.
• It is similar to an assembly line, where each stage handles part of the task.
• Each stage works concurrently on a different instruction.
• Key metric: throughput, or instructions completed per unit time.
Classic Five-Stage RISC Pipeline

• Five stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), Write-Back (WB).
• Each new instruction begins one cycle after the previous one.
• Multiple instructions progress simultaneously through the pipeline.
Visualizing Pipelining
Pipelining Characteristics

Time Pipelining doesn’t reduce

30 40 40 40 40 20 latency of single task, it

A improves throughput of entire

workload
B
Pipeline rate limited by slowest
C pipeline stage

D Potential speedup = Number of

pipe stages

Unbalanced lengths of pipe stages reduces speedup

Time to “fill” pipeline and time to “drain” it reduces speedup
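The effect of the slowest stage and of fill/drain time can be checked with a small calculation. The sketch below is illustrative only: the function names and the stage latencies are assumptions (the stage times behind the original figure are not recoverable), and pipeline-register overhead is ignored.

```python
def unpipelined_time(stage_times, n_tasks):
    """Total time when each task runs through all stages before the next starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Total time when the clock period is set by the slowest stage.

    The first task needs len(stage_times) cycles to fill the pipeline;
    every later task completes one cycle after its predecessor.
    """
    cycle = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * cycle


def speedup(stage_times, n_tasks):
    return unpipelined_time(stage_times, n_tasks) / pipelined_time(stage_times, n_tasks)


balanced = [10, 10, 10, 10, 10]    # hypothetical stage latencies (ns)
unbalanced = [5, 5, 20, 10, 10]    # same total work, one slow stage

for n in (1, 10, 1000):
    print(n, round(speedup(balanced, n), 2), round(speedup(unbalanced, n), 2))
```

With balanced stages the speedup approaches the number of stages as the number of tasks grows; under these assumed latencies the single slow stage caps it at about half of that.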

Datapaths – single cycle and multi-cycle

● Datapath
– A collection of functional units (such as the ALU, registers, and memory) and their connections that execute instructions.
– It is the "hardware pathway" that data travels through during instruction execution.
– Two kinds of datapaths: single-cycle and multi-cycle.
Single Cycle Datapath
● Every instruction completes in exactly one clock cycle, whether it is a simple ADD or a complex LOAD.
● Key characteristics:
– One long clock cycle that must be long enough for the slowest instruction.
– All steps (fetch, decode, execute, memory access, write-back) happen in that one cycle.
– Hardware must duplicate some resources (e.g., separate instruction and data memory) so that multiple steps can happen at once in that cycle.
Single Cycle Datapath (Contd.)
● Advantages of Single Cycle Datapath
– Simplicity: the design is less complex and easier to develop than the alternatives, which makes it a good fit for simple systems.
– Consistent Timing: all instructions take the same amount of time.
Single Cycle Datapath (Contd.)
● Disadvantages of Single Cycle Datapath
– Inefficient with Complex Instructions: the clock cycle must fit the longest operation, so every instruction, including the basic ones, takes that long.
– Duplicate Hardware Requirement: more hardware is needed because a functional unit cannot be shared or used more than once within the single cycle of an instruction.
Multi-Cycle Datapath
● Each instruction is split into multiple steps, with each step taking one clock cycle.
● Different instructions take different numbers of cycles, depending on complexity.
● Key characteristics:
– Shorter clock cycle (based on the slowest single step, not the slowest whole instruction).
– Can reuse hardware (same memory for instructions and data in different cycles).
– Control logic is more complex (often microprogrammed).
Multi-Cycle Datapath (Contd.)
● Advantages of Multi-Cycle Datapath
– More efficient: the clock period is shorter, so simple instructions finish quickly and only complex ones take longer.
– Less hardware duplication: a functional unit such as the ALU can be used more than once by the same instruction, in different steps.
Multi-Cycle Datapath (Contd.)
● Disadvantages of Multi-Cycle Datapath
– Complex Control Logic: the control unit has to be more complex to sequence multiple cycles per instruction.
– Extra Registers: additional registers are required between steps to hold results that are used in later steps.
– Slower overall throughput compared to pipelining (but better than single-cycle).
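A rough timing comparison makes the trade-off between the two datapaths concrete. This is a sketch under stated assumptions: the step latencies, the steps needed by each instruction class, and the sample program are all hypothetical and not taken from the slides.

```python
# Hypothetical step latencies in ns.
STEP_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Steps each (assumed) instruction class actually needs.
STEPS = {
    "alu":    ["IF", "ID", "EX", "WB"],
    "load":   ["IF", "ID", "EX", "MEM", "WB"],
    "store":  ["IF", "ID", "EX", "MEM"],
    "branch": ["IF", "ID", "EX"],
}


def single_cycle_ns(program):
    """One long cycle per instruction, sized for the slowest instruction."""
    clock = max(sum(STEP_NS[s] for s in steps) for steps in STEPS.values())
    return len(program) * clock


def multi_cycle_ns(program):
    """One short cycle per step, sized for the slowest single step."""
    clock = max(STEP_NS.values())
    return sum(len(STEPS[kind]) for kind in program) * clock


program = ["alu", "alu", "branch", "alu", "branch", "store"]
print(single_cycle_ns(program), multi_cycle_ns(program))
```

Under these assumed latencies a branch is faster on the multi-cycle datapath while a load is slower, so the overall gain depends on the instruction mix and on how evenly the steps are balanced.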
Single Vs Multi-Cycle Datapaths
[Figure not recoverable]
Pipeline – a series of datapaths
[Figure not recoverable]
Datapath with pipeline registers
[Figure not recoverable]
Pipelined RISC Data path
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled elements: Next PC, Next SEQ PC, Adder (+4), instruction Memory, Reg File (RS1, RS2, RD), Sign Extend (Imm), ALU, Zero?, MUX, data Memory, WB Data.]
RISC MIPS Instruction Pipeline
• Each instruction can take at most 5 clock cycles
• Instruction fetch cycle (IF)
– Based on PC, fetch the instruction from memory
– Increment PC
[Figure: Instruction Fetch stage: the PC addresses instruction Memory while an Adder (+4) produces Next PC]
RISC MIPS Instruction Pipeline
• Instruction decode/register fetch cycle (ID)
– Decode the instruction + register read operation
– Fixed field decoding [ADD R1,R2,R3] OR [LW R1,8(R2)]
  Ex: A3.01.02.03 : 10100011 00000001 00000010 00000011
  Ex: 86.01.08.02 : 10000110 00000001 00001000 00000010
[Figure: Instr. Decode / Reg. Fetch stage: Reg File read ports RS1 and RS2, Sign Extend for Imm, with Next SEQ PC and RD passed along]
RISC MIPS Instruction Pipeline
• Execution/Effective address cycle (EX)
– Memory reference: calculate the effective address
  [LW R1,8(R2)] EFF ADDR = [R2] + 8
– Register-register ALU instruction [ADD R1,R2,R3]
[Figure: Execute / Addr. Calc stage: ALU with Zero? output; RD passed along]
RISC MIPS Instruction Pipeline
• Memory access cycle (MEM)
– Load from memory and store in register [LW R1,8(R2)]
– Store the data from the register to memory [SW R3,16(R4)]
[Figure: Memory Access stage: data Memory]
RISC Instruction Pipeline
• Write-back cycle (WB)
– Register-register ALU instruction or load instruction
– Write to register file [LW R1,8(R2)], [ADD R1,R2,R3]
[Figure: Write Back stage: the result is written into the Reg File that is read (RS1, RS2) in Instr. Decode / Reg. Fetch]
Visualizing Pipelining
[Figure: pipeline timing chart over clock cycles CC1 to CC9; each instruction passes through IM, REG, ALU, DM, REG and starts one cycle after the previous one]
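A timing chart of this kind can be generated mechanically. The following is a minimal sketch that assumes an ideal pipeline (no hazards, one instruction issued per cycle); the function names and the sample instructions are illustrative, and the stage names follow the five-stage list given earlier.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1.

    Assumes an ideal pipeline: no hazards, one instruction issued per cycle.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, _ in enumerate(instructions):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage   # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


def render(instructions):
    """Format the chart as aligned text with a CC1..CCn header."""
    rows = pipeline_chart(instructions)
    width = max(len(name) for name in instructions)
    header = " " * width + "".join(f" CC{c + 1:<3}" for c in range(len(rows[0])))
    lines = [header]
    for name, row in zip(instructions, rows):
        lines.append(name.ljust(width) + "".join(f" {cell:<5}" for cell in row))
    return "\n".join(lines)


print(render(["lw  r1,8(r2)", "add r4,r5,r6", "sw  r3,16(r4)"]))
```

Each row is shifted one column to the right of the row above it, which is the staircase pattern the chart shows.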
Benefits of Pipelining

• Increases instruction throughput significantly.
• Can approach ideal speedup = number of pipeline stages.
• Reduction in CPI (cycles per instruction) for many workloads.
• Not visible to the programmer; handled entirely in hardware.
Limits to pipelining
• Hazards: circumstances that would cause incorrect execution if the next instruction is fetched and executed
– Structural hazards: attempting to use the same hardware to do two different things at the same time
– Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
– Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches)
Structural Hazard

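Eg: Uniport Memory

[Figure: pipeline timing chart, time in clock cycles (Cycle 1 to Cycle 7). Rows: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Ifetch, Reg, ALU, DMem, Reg and starting one cycle after the previous row. With a single memory port, the DMem access of Load and the Ifetch of Instr 3 fall in the same cycle (Cycle 4): a structural hazard.]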
Resolving Structural Hazard
• Eliminate the use of the same hardware for two different things at the same time
• Solution 1: Wait
– must detect the hazard
– must have a mechanism to stall
• Solution 2: Duplicate hardware
– Multiple such units let both instructions progress
Handling Structural Hazards

• Duplicate or pipeline functional units to avoid conflicts.
• Use stalls (pipeline bubbles) when conflicts can't be avoided.
• Example: shared memory causing a fetch/store conflict.
• Optimized designs use split caches and buffers to reduce stalls.
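The shared-memory conflict can be sketched as a small scheduling exercise. The model below is an assumption-laden illustration, not a description of any particular processor: it assumes the five-stage pipeline (an instruction fetched in cycle f reaches MEM in cycle f + 3), gives the older load/store priority for the single port, and uses a hypothetical function name.

```python
def fetch_cycles(uses_data_memory, split_memory):
    """Return the cycle in which each instruction is fetched.

    uses_data_memory[i] is True for loads/stores. With a single-ported unified
    memory, a fetch cannot share a cycle with a data access, so the fetch waits
    and a bubble enters the pipeline. With split instruction/data memories
    (split_memory=True) the conflict never arises.
    """
    busy = set()          # cycles in which the port is taken by a MEM stage
    cycles = []
    cycle = 1
    for is_mem in uses_data_memory:
        while not split_memory and cycle in busy:
            cycle += 1    # stall: the port is serving an earlier load/store
        cycles.append(cycle)
        if is_mem:
            busy.add(cycle + 3)   # this instruction's MEM stage
        cycle += 1
    return cycles


program = [True, False, False, False, False]   # a load followed by four ALU ops
print(fetch_cycles(program, split_memory=False))
print(fetch_cycles(program, split_memory=True))
```

In this model the load delays the fetch of the third instruction after it by one cycle when memory is unified, and causes no delay when instruction and data memories are split.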
Detecting & Resolving Structural Hazard

[Figure: pipeline timing chart, time in clock cycles (Cycle 1 to Cycle 7). Rows: Load, Instr 1, Instr 2, Stall, Instr 3. Load, Instr 1 and Instr 2 pass through Ifetch, Reg, ALU, DMem, Reg; a row of bubbles (Stall) is inserted before Instr 3, which is fetched one cycle later.]
Eliminating Structural Hazards at Design

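[Figure: five-stage pipelined datapath with separate instruction cache (Instr Cache) and data cache (Data Cache). Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled elements: Next PC, Next SEQ PC, Adder (+4), Reg File (RS1, RS2, RD), Sign Extend (Imm), ALU, Zero?, MUXes, WB Data.]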
Data Hazard

[Figure: pipeline timing chart, time in clock cycles, stages IF, ID/RF, EX, MEM, WB. Each instruction below passes through Ifetch, Reg, ALU, DMem, Reg; the instructions after the add all read r1.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
Data Hazards and Forwarding
• Data hazard: a later instruction reads a register before it's updated.
• Forwarding (bypassing) sends results directly to dependent stages.
• Prevents stalls for most ALU operations.
• More complex forwarding is needed for multi-cycle instructions.
Three Generic Data Hazards

• Read After Write (RAW)
Instr J tries to read an operand before Instr I writes it

I: add r1,r2,r3
J: sub r4,r1,r3

• Caused by a data dependence
• This hazard results from an actual need for communication.
Three Generic Data Hazards
• Write After Read (WAR)
Instr J writes an operand before Instr I reads it

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

• Called an anti-dependence by compiler writers.
• This results from reuse of the name r1
• Can't happen in the MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5
Three Generic Data Hazards
• Write After Write (WAW)
Instr J writes an operand before Instr I writes it.

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7

• Called an output dependence
• This also results from the reuse of name r1.
• Can't happen in the MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• WAR and WAW hazards happen in out-of-order pipelines
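The three dependence types above can be told apart by comparing destination and source registers. The sketch below is illustrative: the `Instr` tuple and the `dependences` function are hypothetical names, and it only names dependences between two instructions; whether a dependence becomes a hazard depends on the pipeline, as noted above.

```python
from collections import namedtuple

# op: mnemonic, dest: destination register, srcs: tuple of source registers
Instr = namedtuple("Instr", "op dest srcs")


def dependences(first, second):
    """Name the dependences of `second` on an earlier instruction `first`."""
    found = []
    if first.dest in second.srcs:
        found.append("RAW")   # true data dependence
    if second.dest in first.srcs:
        found.append("WAR")   # anti-dependence (name reuse)
    if second.dest == first.dest:
        found.append("WAW")   # output dependence (name reuse)
    return found


# The instruction pairs I and J from the three slides above.
print(dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3"))))
print(dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3"))))
print(dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3"))))
```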
Operand Forwarding to Avoid Data Hazard

[Figure: pipeline timing chart, time in clock cycles. Each instruction below passes through Ifetch, Reg, ALU, DMem, Reg; the value of r1 produced by the add is forwarded to the instructions that follow it.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
Pipeline Interlocks

• Hardware mechanism to detect and resolve hazards.
• Stalls the pipeline when data isn't available in time.
• Ensures correct execution without programmer intervention.
• Example: stall after a load followed by a dependent ALU op.
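The load-followed-by-dependent-op case can be sketched as a simple check on adjacent instructions. This is a minimal illustration under stated assumptions: full forwarding is assumed (so ALU-to-ALU dependences need no stall), instructions are hypothetical (op, dest, sources) tuples, and "lw" is the only load considered.

```python
def needs_load_use_stall(producer, consumer):
    """True if `consumer` directly follows a load whose result it reads.

    With forwarding, an ALU result can be bypassed to the next instruction,
    but a loaded value is not available until the end of MEM.
    """
    op, dest, _ = producer
    return op == "lw" and dest in consumer[2]


def insert_bubbles(program):
    """Return the issue order with a "bubble" where the interlock would stall."""
    issued = []
    for instr in program:
        if issued and issued[-1] != "bubble" and needs_load_use_stall(issued[-1], instr):
            issued.append("bubble")
        issued.append(instr)
    return issued


program = [
    ("lw", "r1", ("r2",)),          # lw  r1,8(r2)
    ("add", "r4", ("r1", "r5")),    # uses r1 right after the load: stall
    ("sub", "r6", ("r4", "r1")),    # uses r4 from the add: forwarded, no stall
]
for slot in insert_bubbles(program):
    print(slot)
```

A real interlock makes this decision in hardware during decode; the list manipulation here only shows where the bubble would appear.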
Branch Hazards

• Branches disrupt sequential instruction flow.
• The target of the branch isn't known until a later stage (ID or EX).
• Stalls or incorrect path execution can occur.
• Common solution: freeze the pipeline or flush incorrect instructions.
Control Hazard on Branches

=> Three Stage Stall

[Figure: pipeline timing chart for the sequence below; each instruction passes through Ifetch, Reg, ALU, DMem, Reg. The three instructions after the branch enter the pipeline before the branch is resolved.]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
Four Branch Hazard Alternatives

#1: Stall until branch direction is clear

#2: Predict Branch Not Taken
• Execute successor instructions in sequence
• "Squash" instructions in the pipeline if the branch is actually taken

#3: Predict Branch Taken
• But the branch target address is not known by the IF stage
• Target is known at the same time as the branch outcome (ID stage)
• MIPS still incurs a 1 cycle branch penalty

#4: Delayed Branch
• Define the branch to take place AFTER one instruction following the branch instruction.
• 1 slot delay allows proper decision and branch target address in a 5 stage pipeline (MIPS uses this approach)
• Where to get instructions to fill the branch delay slot?
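The cost of these alternatives can be compared with the usual estimate, effective CPI = 1 + branch frequency × average branch penalty. The sketch below is only an illustration: the branch frequency and taken fraction are assumed values, not figures from the slides, and the function names are hypothetical.

```python
def pipeline_cpi(branch_freq, penalty_cycles):
    """Effective CPI of an otherwise ideal pipeline (base CPI of 1)."""
    return 1.0 + branch_freq * penalty_cycles


def predict_not_taken_penalty(taken_fraction, misprediction_cycles=1):
    """Average penalty when only taken branches are mispredicted and squashed."""
    return taken_fraction * misprediction_cycles


BRANCH_FREQ = 0.15      # assumed fraction of instructions that are branches
TAKEN_FRACTION = 0.60   # assumed fraction of branches that are taken

print("three-cycle stall :", pipeline_cpi(BRANCH_FREQ, 3))
print("one-cycle stall   :", pipeline_cpi(BRANCH_FREQ, 1))
print("predict not taken :", pipeline_cpi(BRANCH_FREQ, predict_not_taken_penalty(TAKEN_FRACTION)))
```

The comparison shows why resolving the branch earlier and predicting the common case both matter: each reduces the average penalty term.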

Conditional Branches
• When do you know you have a branch?
– During the ID cycle (could you know before that?)
• When do you know if the branch is Taken or Not-Taken?
– During the EXE cycle or the ID stage, depending on the design
• We need more sophisticated solutions for the following cases:
– Modern pipelines are deep (10+ stages)
– Several instructions issued per cycle
– Several predicted branches in flight at the same time
Simple static predictive schemes
• Predict branch Not-Taken (easiest to implement; default for dynamic branch prediction)
– If the prediction is correct, no problem
– If the prediction is incorrect, delay = number of stages between ID and EXE
• Predict branch Taken
– Interesting only if the target address can be computed early
• Prediction depends on the direction of the branch
– Backward-Taken-Forward-Not-Taken (BTFNT)
– Rationale: backward branches at the end of loops are mostly taken
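The BTFNT rule needs only the branch address and its target. A minimal sketch follows; the function names, addresses, and the ten-iteration trace are illustrative assumptions.

```python
def predict_btfnt(branch_pc, target_pc):
    """Backward-Taken-Forward-Not-Taken: predict taken iff the target is not ahead."""
    return target_pc <= branch_pc


def accuracy(trace):
    """trace: iterable of (branch_pc, target_pc, actually_taken)."""
    trace = list(trace)
    hits = sum(predict_btfnt(pc, tgt) == taken for pc, tgt, taken in trace)
    return hits / len(trace)


# A loop-closing branch at address 40 jumping back to 16: taken 9 times,
# then not taken once when the loop exits.
loop_branch = [(40, 16, True)] * 9 + [(40, 16, False)]
print(accuracy(loop_branch))
```

For this trace the only misprediction is the final loop exit, which matches the rationale that backward branches are mostly taken.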
Dynamic branch prediction
• Execution of a branch requires knowledge of:
– Branch instruction: encode whether the instruction is a branch or not, then decide on taken or not taken (i.e., prediction can be done at the IF stage)
– Whether the branch is Taken/Not-Taken (hence a branch prediction mechanism)
– If the branch is taken, what the target address is (can be computed, but can also be "precomputed", i.e., stored in some table)
– If the branch is taken, what the instruction at the branch target address is (saves the fetch cycle for that instruction)
Dynamic branch prediction
• Use a Branch Prediction Buffer (BPB)
– Also called Branch Prediction Table (BPT) or Branch History Table (BHT)
– Records previous outcomes of the branch instruction.
– How to index into the table is an issue.
• A prediction using the BPB is attempted when the branch instruction is fetched (IF stage or equivalent)
• It is acted upon during the ID stage (when we know we have a branch)
Dynamic branch prediction
• Has a prediction been made (Y/N)?
– If not, use the default "Not Taken"
• Is it correct or incorrect?
• Two cases:
– Case 1: Yes and the prediction was correct (known at ID stage), or No but the default was correct: no delay
– Case 2: Yes and the prediction was incorrect, or No and the default was incorrect: delay
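The two cases above can be exercised with a small model of a branch prediction buffer. This is a sketch under assumptions that the slides do not fix: one bit of history per entry (the last outcome), indexing by low-order PC bits, and hypothetical class and method names.

```python
class BranchPredictionBuffer:
    """1-bit branch history table indexed by low-order PC bits (an assumed scheme)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = {}           # index -> last outcome (True = taken)

    def _index(self, pc):
        # Word-aligned PCs assumed; distinct branches may alias to one entry.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return (prediction_made, predicted_taken); the default is Not Taken."""
        idx = self._index(pc)
        if idx in self.table:
            return True, self.table[idx]
        return False, False

    def update(self, pc, taken):
        self.table[self._index(pc)] = taken


def mispredictions(bpb, trace):
    """Count branches in trace [(pc, taken), ...] that would incur a delay (Case 2)."""
    misses = 0
    for pc, taken in trace:
        _, predicted = bpb.predict(pc)
        misses += predicted != taken
        bpb.update(pc, taken)
    return misses


# One loop branch: taken four times, falls through, then the loop runs again.
trace = [(40, True)] * 4 + [(40, False)] + [(40, True)] * 4 + [(40, False)]
print(mispredictions(BranchPredictionBuffer(), trace))
```

With a single history bit, each loop exit costs two mispredictions (the exit itself and the first iteration of the next run), which is one reason wider counters are often used instead.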
RISC Instruction Set

• MIPS 32/64-bit (slide annotation: RISC-V 64-bit) is used as the representative RISC architecture.
• Three main instruction classes: ALU, Load/Store, Branches.
• Fixed instruction format simplifies decoding and pipelining.
• Registers are the only operands for arithmetic/logical ops.
Unpipelined Implementation

• Executes one instruction at a time, over multiple cycles.
• Typical instruction takes 4-5 cycles (CPI ≈ 4.54).
• Easy to understand and implement, but inefficient.
• Used as a baseline to illustrate pipelining benefits.
Unpipelined RISC Data path

[Figure: unpipelined five-step datapath. Steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled elements: Next PC, Next SEQ PC, Adder (+4), Inst Memory, Reg File (RS1, RS2, RD), Sign Extend (Imm), ALU, Zero?, data Memory, LMD, WB Data.]
Implementing the Pipeline

• Split instruction/data memory or cache to avoid memory conflicts.
• Use pipeline registers between stages (e.g., IF/ID, ID/EX).
• Careful control is needed to avoid simultaneous hardware usage.
• Balance stage durations to prevent bottlenecks.
RISC MIPS Instruction Pipeline (slide annotation: RISC-V)
Cycles required to implement different instructions
• Branch instructions – 4 cycles
• Store instructions – 4 cycles
• All other instructions – 5 cycles
[Figure: stage sequence IF, ID, EX, MEM, WB]
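An average CPI for the unpipelined implementation follows from these cycle counts as a weighted average over the instruction mix. The mix below (12% branches, 10% stores) is an assumption, since the slides do not state one, and the function name is hypothetical.

```python
def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction class; fractions must sum to 1."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(mix[kind] * cycles[kind] for kind in mix)


# Assumed instruction mix (not given on the slide).
mix = {"branch": 0.12, "store": 0.10, "other": 0.78}

# Cycle counts as listed above.
slide_cycles = {"branch": 4, "store": 4, "other": 5}
print(round(average_cpi(mix, slide_cycles), 2))

# Same mix, but with branches completing in 2 cycles.
early_branch = {"branch": 2, "store": 4, "other": 5}
print(round(average_cpi(mix, early_branch), 2))
```

Under this assumed mix the listed cycle counts give a CPI of about 4.78, while a CPI of about 4.54 results if branches complete in 2 cycles; the figure obtained therefore depends on both the mix and the branch cycle count.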
Multi-cycle Operations
• Can the EXE stage complete the operation in 1 cycle?
• Some operations require more than 1 clock cycle to complete:
– Floating Point/Integer Multiply
– Floating Point/Integer Divide
– Floating Point Add/Sub
• Dedicated hardware units are available on the processor for performing these operations.
• FP-Mul and FP-Add are fully pipelined, but FP-Div is unpipelined.
Multi-cycle Operations

• Latency: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
• Initiation / Repeat Interval: the number of cycles that must elapse between issuing two operations of a given type.
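The two definitions can be combined to estimate when each operation may issue. The sketch below models an in-order, single-issue pipeline and only these two constraints; the unit names and the latency and interval values are illustrative assumptions, not figures from the slides.

```python
# Hypothetical functional-unit parameters: (latency, initiation_interval) in cycles.
UNITS = {
    "int_alu": (0, 1),
    "fp_add":  (3, 1),    # fully pipelined
    "fp_mul":  (6, 1),    # fully pipelined
    "fp_div":  (24, 25),  # unpipelined: a new divide waits for the previous one
}


def issue_cycles(ops):
    """Earliest issue cycle of each operation.

    ops is a list of (unit, depends_on_previous) pairs. A dependent operation
    waits for the producer's latency; an operation on a busy unit waits for
    that unit's initiation interval.
    """
    cycles = []
    last_issue_on_unit = {}
    for i, (unit, depends_on_previous) in enumerate(ops):
        earliest = cycles[-1] + 1 if cycles else 1
        if depends_on_previous and cycles:
            prev_unit = ops[i - 1][0]
            # latency = number of intervening cycles between producer and consumer
            earliest = max(earliest, cycles[-1] + 1 + UNITS[prev_unit][0])
        if unit in last_issue_on_unit:
            earliest = max(earliest, last_issue_on_unit[unit] + UNITS[unit][1])
        cycles.append(earliest)
        last_issue_on_unit[unit] = earliest
    return cycles


print(issue_cycles([("fp_mul", False), ("fp_add", True)]))    # dependent add waits
print(issue_cycles([("fp_div", False), ("fp_div", False)]))   # second divide waits
print(issue_cycles([("fp_add", False), ("fp_add", False)]))   # pipelined unit
```

Independent operations on a fully pipelined unit issue back to back, while the unpipelined divider forces the second divide to wait for the full initiation interval.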
