Understanding RISC Architecture and MIPS

Uploaded by

Ahmed Sami
CSEN B302

RISC
Reduced Instruction Set Computers
MIPS
Dr. Eng. Amr T. Abdel-Hamid

Spring 2024
Before the RISC era
• Compilers were hard to build, especially for machines with registers
  – Make the machine do more work than the software
  – Have instructions load and store directly to memory (memory-to-memory operations)
• Software costs were rising and hardware costs were dropping
  – Move as much functionality as possible to hardware
• Magnetic core memory was used as main memory, which was slow and expensive
  – Minimize assembly code
• Complex Instruction Set Computers (CISC)
  – Use complex instructions: “MULT”, “ADD”…
Dr. Amr Talaat

Technology was advancing
• Compilers were improving
  – Simple compilers found it difficult to use more complex instructions
  – Optimizing compilers rarely needed more powerful instructions
• Caches
  – Allowed main memory to be accessed at similar speeds to control memory
• Semiconductor memory was replacing magnetic core memory
  – Reduced performance gap between control and main memory

Inception of RISC
• 1974 – John Cocke (IBM) showed that 80% of the work was done using only 20% of the instructions
• Three RISC projects
  – IBM 801 machine (1974)
  – Berkeley’s RISC-I and RISC-II processors (1980)
  – Stanford’s MIPS processor (1981)
• 1986 – announcement of first commercial RISC chip

Relative frequency of high-level language operations (all values are percentages):

        Dynamic Occurrence    Machine Instruction   Memory Reference
                              (Weighted)            (Weighted)
        Pascal     C          Pascal     C          Pascal     C
Assign  45         38         13         13         14         15
Loop    5          3          42         32         33         26
Call    15         12         31         33         44         45
If      29         43         11         21         7          13
GoTo    –          3          –          –          –          –
Other   6          1          3          1          2          1

RISC Approach
• Use only simple instructions that can be executed within one clock cycle
  – Fewer transistors for instructions = more registers
  – Pipelining
• Register-to-register operations
  – Operand reuse
  – Reduction of load/store

Load/Store Architecture
• Individual instructions to store/load data and to perform operations
• All operations are performed on operands in registers
• Main memory is accessed only through load/store instructions

    add $r3, $r2, $r1
    add $r5, $r4, $r3

Only registers can be accessed by instructions other than load/store.

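The load/store rule above can be sketched as a toy interpreter. This is a minimal illustration rather than a MIPS simulator: the tuple encoding, the `run` function, and the register and memory contents are all hypothetical. Only `lw` and `sw` touch memory, while `add` works on registers alone.

```python
# Hypothetical toy machine; the encoding is illustrative, not real MIPS.
def run(program, regs, mem):
    """Execute (op, a, b, c) tuples against register and memory dicts."""
    for op, a, b, c in program:
        if op == "lw":       # lw a, b(c): memory -> register
            regs[a] = mem[regs[c] + b]
        elif op == "sw":     # sw a, b(c): register -> memory
            mem[regs[c] + b] = regs[a]
        elif op == "add":    # add a, b, c: registers only, no memory access
            regs[a] = regs[b] + regs[c]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, mem


regs = {"r0": 0, "r1": 0, "r2": 0, "r3": 0}
mem = {0: 5, 4: 7, 8: 0}           # made-up memory contents
program = [
    ("lw", "r1", 0, "r0"),         # r1 = mem[0]
    ("lw", "r2", 4, "r0"),         # r2 = mem[4]
    ("add", "r3", "r2", "r1"),     # r3 = r2 + r1
    ("sw", "r3", 8, "r0"),         # mem[8] = r3
]
run(program, regs, mem)
```

To add two values that live in memory, the program needs two loads, one register-to-register add, and one store.
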
Pipelining

Sequential
  IF ID OF OE OS
                 IF ID OF OE OS
                                IF ID OF OE OS

Pipelined
  IF ID OF OE OS
     IF ID OF OE OS
        IF ID OF OE OS

The horizontal axis is time, measured in clock cycles.

IF – Instruction Fetch
ID – Instruction Decode
OF – Operand Fetch
OE – Operand Execution
OS – Operation Store
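The difference between the two timing diagrams can be put into numbers with a small sketch. It assumes every stage takes exactly one clock cycle and that no stalls occur; the function names are hypothetical.

```python
def sequential_cycles(n_instructions, n_stages=5):
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=5):
    """The first instruction fills the pipe; each later one adds one cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


three_seq = sequential_cycles(3)    # three instructions, IF ID OF OE OS each
three_pipe = pipelined_cycles(3)
```

For the three instructions drawn in the diagram this gives 15 cycles sequentially and 7 cycles pipelined.
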


Pipelining

Data Dependency
[Figure: timing diagram of three instructions passing through the IF, ID, OF, OE and OS stages when one instruction depends on data produced by another]

Branch Address Dependency
[Figure: timing diagram of two instructions passing through the IF, ID, OF, OE and OS stages when the address of the second depends on a branch]

Pipelining
• Data dependencies can be addressed by reordering the instructions when possible (compiler)
• Performance degradation from branches can be reduced by branch prediction or by executing instructions for both branches until the correct branch is identified

5 Steps of MIPS Datapath
Figure A.2, Page A-8
[Figure: MIPS datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled blocks: Next PC, Next SEQ PC, Adder (+4), instruction Memory (Address, Inst), Reg File (RS1, RS2, RD), Sign Extend (Imm), ALU, Zero?, data Memory (LMD), MUXes, WB Data.]

5 Steps of MIPS Datapath
Figure A.3, Page A-9
[Figure: the same five-step datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the stages. Next SEQ PC and RD are carried along through the pipeline registers. Other labeled blocks: Next PC, Adder (+4), instruction Memory (Address), Reg File (RS1, RS2), Sign Extend (Imm), ALU, Zero?, data Memory, MUXes, WB Data.]

• Data stationary control
  – local decode for each instruction phase / pipeline stage

Visualizing Pipelining
Figure A.2, Page A-8

Time (clock cycles) runs left to right; instruction order runs top to bottom.

Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Ifetch   Reg      ALU      DMem     Reg
         Ifetch   Reg      ALU      DMem     Reg
                  Ifetch   Reg      ALU      DMem     Reg
                           Ifetch   Reg      ALU      DMem     Reg

Pipelining is not quite that easy!
• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  – Data hazards: instruction depends on result of prior instruction still in the pipeline (missing sock)
  – Control hazards: caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

One Memory Port/Structural Hazards
Figure A.4, Page A-14

Time (clock cycles) runs left to right; instruction order runs top to bottom.

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Load      Ifetch   Reg      ALU      DMem     Reg
Instr 1            Ifetch   Reg      ALU      DMem     Reg
Instr 2                     Ifetch   Reg      ALU      DMem     Reg
Instr 3                              Ifetch   Reg      ALU      DMem     Reg
Instr 4                                       Ifetch   Reg      ALU      DMem     Reg

With a single memory port, the Load's data access (DMem) and the fetch of Instr 3 (Ifetch) both need memory in cycle 4.

One Memory Port/Structural Hazards
(Similar to Figure A.5, Page A-15)

Time (clock cycles) runs left to right; instruction order runs top to bottom.

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Load      Ifetch   Reg      ALU      DMem     Reg
Instr 1            Ifetch   Reg      ALU      DMem     Reg
Instr 2                     Ifetch   Reg      ALU      DMem     Reg
Stall                                Bubble   Bubble   Bubble   Bubble   Bubble
Instr 3                                       Ifetch   Reg      ALU      DMem     Reg

How do you “bubble” the pipe?

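Speed Up Equation for Pipelining

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For simple RISC pipeline, Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$
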
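The speedup equation can be sketched as a small function. The function and parameter names are hypothetical, `clock_ratio` stands for Cycle Time unpipelined / Cycle Time pipelined, and the depth of 5 is an assumption made only so that the numbers are concrete. The stall CPI of 0.4 and the clock ratio of 1.05 are the values used in the dual-port vs. single-port example in this deck.

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup = (ideal CPI * depth) / (ideal CPI + stall CPI) * clock ratio."""
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


depth = 5                                        # assumed; cancels in the ratio
speedup_a = pipeline_speedup(depth, stall_cpi=0.0)
speedup_b = pipeline_speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)
ratio = speedup_a / speedup_b
```

Because the depth appears in both speedups, the ratio does not depend on the assumed value.
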
Example: Dual-port vs. Single-port
• Machine A: Dual ported memory (“Harvard Architecture”)
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed

SpeedUpA = Pipeline Depth / (1 + 0) x (clockunpipe / clockpipe)
         = Pipeline Depth
SpeedUpB = Pipeline Depth / (1 + 0.4 x 1) x (clockunpipe / (clockunpipe / 1.05))
         = (Pipeline Depth / 1.4) x 1.05
         = 0.75 x Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth / (0.75 x Pipeline Depth) = 1.33

• Machine A is 1.33 times faster

Data Hazard on R1
Figure A.6, Page A-17

Time (clock cycles) runs left to right; instruction order runs top to bottom.

                  IF       ID/RF    EX       MEM      WB
add r1,r2,r3      Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r3               Ifetch   Reg      ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Reg      ALU      DMem     Reg
or r8,r1,r9                                  Ifetch   Reg      ALU      DMem     Reg
xor r10,r1,r11                                        Ifetch   Reg      ALU      DMem     Reg

Three Generic Data Hazards
• Read After Write (RAW)
  InstrJ tries to read operand before InstrI writes it

    I: add r1,r2,r3
    J: sub r4,r1,r3

• Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards
• Write After Read (WAR)
  InstrJ writes operand before InstrI reads it

    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5

Three Generic Data Hazards
• Write After Write (WAW)
  InstrJ writes operand before InstrI writes it.

    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5
• Will see WAR and WAW in more complicated pipes

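The three definitions can be sketched as register-name comparisons between an earlier instruction I and a later instruction J. This is a minimal illustration: the `(dest, src1, src2)` tuple layout and the function name are hypothetical, and the function only reports which dependences exist. Whether a dependence turns into an actual hazard depends on the pipeline, as the slides note for the MIPS 5 stage pipeline.

```python
def classify_dependences(instr_i, instr_j):
    """instr = (dest, src1, src2); instr_i precedes instr_j in program order."""
    dest_i, *srcs_i = instr_i
    dest_j, *srcs_j = instr_j
    found = set()
    if dest_i in srcs_j:      # J reads what I writes
        found.add("RAW")
    if dest_j in srcs_i:      # J writes what I reads
        found.add("WAR")
    if dest_i == dest_j:      # both write the same register
        found.add("WAW")
    return found


# The I/J pairs from the three slides above
raw = classify_dependences(("r1", "r2", "r3"), ("r4", "r1", "r3"))
war = classify_dependences(("r4", "r1", "r3"), ("r1", "r2", "r3"))
waw = classify_dependences(("r1", "r4", "r3"), ("r1", "r2", "r3"))
```
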
Forwarding to Avoid Data Hazard
Figure A.7, Page A-19

Time (clock cycles) runs left to right; instruction order runs top to bottom.

add r1,r2,r3      Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r3               Ifetch   Reg      ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Reg      ALU      DMem     Reg
or r8,r1,r9                                  Ifetch   Reg      ALU      DMem     Reg
xor r10,r1,r11                                        Ifetch   Reg      ALU      DMem     Reg

[Arrows in the original figure show the result of add being forwarded to the later instructions that read r1.]

HW Change for Forwarding
Figure A.23, Page A-37
[Figure: forwarding hardware. Labeled blocks: NextPC, Registers, ID/EX, mux (three), Immediate, ALU, EX/MEM, Data Memory, MEM/WR.]

What circuit detects and resolves this hazard?

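One common answer to the question above is a forwarding unit that compares register numbers held in the pipeline registers and steers the ALU input multiplexers. The sketch below is an assumption-laden illustration of that comparison, not the circuit from the figure: the dictionary fields `reg_write` and `rd`, the function name, and the returned labels are all hypothetical, and register 0 is treated as the hardwired zero register.

```python
def forward_source(src_reg, ex_mem, mem_wb):
    """Pick where an ALU operand should come from.

    ex_mem / mem_wb describe the instructions one and two stages ahead:
    {'reg_write': bool, 'rd': int}. The newest result (EX/MEM) wins.
    """
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"


# add r1,r2,r3 is in MEM while sub r4,r1,r3 is in EX
add_in_mem = {"reg_write": True, "rd": 1}
nothing = {"reg_write": False, "rd": 0}
sub_r1 = forward_source(1, add_in_mem, nothing)
sub_r3 = forward_source(3, add_in_mem, nothing)

# One cycle later: sub is in MEM, add is in WB, and r6,r1,r7 is in EX
sub_in_mem = {"reg_write": True, "rd": 4}
add_in_wb = {"reg_write": True, "rd": 1}
and_r1 = forward_source(1, sub_in_mem, add_in_wb)
```
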
Forwarding to Avoid LW-SW Data Hazard
Figure A.8, Page A-20

Time (clock cycles) runs left to right; instruction order runs top to bottom.

add r1,r2,r3      Ifetch   Reg      ALU      DMem     Reg
lw r4, 0(r1)               Ifetch   Reg      ALU      DMem     Reg
sw r4,12(r1)                        Ifetch   Reg      ALU      DMem     Reg
or r8,r6,r9                                  Ifetch   Reg      ALU      DMem     Reg
xor r10,r9,r11                                        Ifetch   Reg      ALU      DMem     Reg

Data Hazard Even with Forwarding
Figure A.9, Page A-21

Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw r1, 0(r2)      Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r6               Ifetch   Reg      ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Reg      ALU      DMem     Reg
or r8,r1,r9                                  Ifetch   Reg      ALU      DMem     Reg

Data Hazard Even with Forwarding
(Similar to Figure A.10, Page A-21)

Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw r1, 0(r2)      Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r6               Ifetch   Reg      Bubble   ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Bubble   Reg      ALU      DMem     Reg
or r8,r1,r9                                  Bubble   Ifetch   Reg      ALU      DMem

How is this detected?
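A common way to detect this case is to check, while an instruction is being decoded, whether the instruction just ahead of it is a load whose destination matches one of its source registers. The sketch below illustrates that check under stated assumptions: the field names `mem_read`, `rt` and `rs` and the function name are hypothetical and are not taken from the slides.

```python
def must_stall(id_ex, if_id):
    """Return True when a load-use hazard requires one bubble.

    id_ex: {'mem_read': bool, 'rt': int}  instruction in EX (rt = load target)
    if_id: {'rs': int, 'rt': int}         instruction in ID (its two sources)
    """
    return id_ex["mem_read"] and id_ex["rt"] in (if_id["rs"], if_id["rt"])


# lw r1, 0(r2) followed by sub r4,r1,r6
lw_in_ex = {"mem_read": True, "rt": 1}
sub_in_id = {"rs": 1, "rt": 6}
stall_after_load = must_stall(lw_in_ex, sub_in_id)

# An ALU instruction writing r1 ahead of the same sub needs no stall
alu_in_ex = {"mem_read": False, "rt": 1}
stall_after_alu = must_stall(alu_in_ex, sub_in_id)
```

When the check fires, the instruction in decode is held for one cycle and a bubble is sent down the pipeline, after which forwarding can supply the loaded value.
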


Control Hazard on Branches
Three Stage Stall

10: beq r1,r3,36    Ifetch   Reg      ALU      DMem     Reg
14: and r2,r3,r5             Ifetch   Reg      ALU      DMem     Reg
18: or r6,r1,r7                       Ifetch   Reg      ALU      DMem     Reg
22: add r8,r1,r9                               Ifetch   Reg      ALU      DMem     Reg
36: xor r10,r1,r11                                      Ifetch   Reg      ALU      DMem

What do you do with the 3 instructions in between?
How do you do it?

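The cost of a three stage stall can be estimated with the CPI equation from the speedup slide. The sketch below assumes that every branch pays the full stall and that nothing else stalls the pipeline; the function name is hypothetical, and the 20% branch fraction in the usage line is an invented input, not a figure from the slides.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles=3, ideal_cpi=1.0):
    """CPI = ideal CPI + (fraction of branches) * (stall cycles per branch)."""
    return ideal_cpi + branch_fraction * stall_cycles


no_branches = cpi_with_branch_stalls(0.0)
example_cpi = cpi_with_branch_stalls(0.2)   # hypothetical 20% branches
```
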
Pipelined MIPS Datapath
Figure A.24, page A-38
[Figure: pipelined MIPS datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled blocks: Next PC, Next SEQ PC, two Adders (one with +4), Zero?, instruction Memory (Address), Reg File (RS1, RS2, RD), Sign Extend (Imm), ALU, data Memory, MUXes, WB Data, and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

• Interplay of instruction set design and cycle time.

Other Advantages
• New microprocessors can be developed and tested more quickly if being less complicated is one of its aims
• Smaller instruction sets are easier for compiler programmers to use

Use of RISC today
• CISC and RISC architectures are nearly indistinguishable
• CISC processors use pipelining and can complete multiple instructions per cycle
• Transistor technology has allowed more room on chips, allowing RISC to have more CISC-like instructions

Thanks & Good Luck