Slides Chapter 5 Basic Processing Unit

The document describes the basic processing unit of a processor. It discusses how instructions are executed in multiple stages, including fetching instructions from memory, decoding, executing operations, accessing memory if needed, and writing results back to registers. It provides details on the typical components involved in each stage, such as the program counter, instruction register, register file, ALU, and control circuitry. The document uses a 5-stage RISC processor as an example to illustrate how instruction execution can be divided into separate hardware stages in a pipelined fashion.

Uploaded by Win War

Basic Processing Unit

Credits: Federico Baronti


Processing Unit
• A processor reads program instructions from
the computer’s memory and executes them.
This includes the following basic phases:
– Fetching and decoding the instruction
– Executing the instruction, which includes:
1. Reading one or more registers (in the register file)
2. Doing some computation (in the ALU)
3. Accessing the memory
4. Writing a register (in the register file)
Processor’s building blocks

• PC provides instruction
address
• Instruction is fetched into
IR
• Instruction address
generator updates PC
• ALU performs some
computation during
execution
• Control circuitry interprets
instruction and generates
control signals to perform
the actions needed.
A digital processing system
(Figure: datapath)
A multi-stage digital processing system
(Figure: datapath)
Why multi-stage?
• Processing moves from one stage to the next in
each clock cycle
• Such a multi-stage system is the basis for
pipelined operation
– High-performance processors have a pipelined
organization
– Pipelining enables the execution of successive
instructions to be overlapped
• We will get back to pipeline later. Let’s now
focus on the basics of the multi-stage
architecture of a RISC-style processor
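As a rough illustration of why overlapping successive instructions helps, the sketch below counts clock cycles for a run of instructions with and without overlap. It assumes an ideal pipeline (one instruction enters per clock cycle, no stalls); the function names are illustrative and do not come from the slides.

```python
def cycles_sequential(n_instructions: int, n_stages: int = 5) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int = 5) -> int:
    """Ideal overlap: the first instruction fills the stages,
    then one instruction completes in every following clock cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)
```

For example, under these assumptions 10 instructions take 50 cycles without overlap and 14 cycles with it.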
Instruction execution
• Pipelined organization is most effective if all
instructions can be executed in the same number of
steps.
• Each step is carried out in a separate hardware
stage.
• Processor design will be illustrated using five
hardware stages.
• How can instruction execution be divided into five
steps?
– Let’s start from some representative RISC instructions
A memory access instruction:
Load R5, X(R7)
1. Fetch the instruction and increment the
program counter.
2. Decode the instruction and read the contents
of register R7 in the register file.
3. Compute the effective address = X + [R7].
4. Read the memory source operand.
5. Load the operand into the destination
register, R5.
A computational instruction:
Add R3, R4, R5
1. Fetch the instruction and increment the program
counter.
2. Decode the instruction and read registers
R4 and R5.
3. Compute the sum [R4] + [R5].
4. No action.
5. Load the result into the destination register, R3.

• Stage 4 (memory access) is not involved in this
instruction.
5-stage Architecture of a
RISC Processor
1. Fetch an instruction and increment the program
counter.
2. Decode the instruction and read registers from the
register file.
3. Perform an ALU operation.
4. Read or write memory data if the instruction involves a
memory operand.
5. Write the result into the destination register.

• This sequence determines the hardware stages needed.


Hardware components: Register file

• A 2-port register file
is needed to read the
two source registers
at the same time.
• It may be
implemented using a
2-port memory.
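A minimal behavioural sketch of such a register file is given below. The number of registers (32), the 32-bit width, and the names `RegisterFile`, `read`, `write` and `rf_write` are assumptions made for illustration.

```python
from typing import Tuple


class RegisterFile:
    """Sketch of a register file with two read ports and one write port."""

    def __init__(self, n_registers: int = 32):
        self.regs = [0] * n_registers

    def read(self, addr_a: int, addr_b: int) -> Tuple[int, int]:
        # Both source registers are read in the same step.
        return self.regs[addr_a], self.regs[addr_b]

    def write(self, addr_c: int, value: int, rf_write: bool) -> None:
        # The destination register changes only when the write signal is active.
        if rf_write:
            self.regs[addr_c] = value & 0xFFFFFFFF
```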
Hardware components: ALU (1)
• Both source operands
and the destination
location are in the
register file.
• [RA] and [RB] denote the
values of the registers that
are identified by
addresses A and B.
• New [RC] denotes the
result that is stored to
the register identified
by address C.
Hardware components: ALU (2)
• In this case, one of
the source
operands is the
immediate value
in the IR.
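The two ALU configurations above (second operand taken from the register file or from the immediate field of the IR) can be sketched as a single function with a selector. The selector name `b_select`, its encoding, and the set of operations are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF


def alu_stage(ra: int, rb: int, immediate: int, b_select: int, op: str) -> int:
    """Return the ALU result for one instruction.

    b_select = 0 uses [RB] as the second operand (assumed encoding),
    b_select = 1 uses the immediate value from the IR.
    """
    in_b = rb if b_select == 0 else immediate
    if op == "add":
        result = ra + in_b
    elif op == "sub":
        result = ra - in_b
    elif op == "and":
        result = ra & in_b
    elif op == "or":
        result = ra | in_b
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result & MASK32  # keep the result within 32 bits
```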
A 5-stage implementation of
a RISC processor
• Instruction processing
moves from stage to stage
in every clock cycle,
starting with fetch.

• The instruction is decoded
and the source registers
are read in stage 2.
• Computation takes place
in the ALU in stage 3.
A 5-stage implementation of
a RISC processor

• …

• If a memory operation is
involved, it takes place in
stage 4.

• The result of the
instruction is written in
the destination register in
stage 5.
The datapath – Stages 2 to 5

• Register file,
used in stages 2 and 5
– (Inter-stage registers RA, RB, RZ, RM, RY
needed to carry data from one stage to
the next)

• ALU stage

• Memory stage

• Final stage to store result
to the register file
Memory stage
• For a calculation
instruction:
– MuxY selects [RZ] to be
placed in RY.
• For a memory instruction:
– RZ provides memory address,
and MuxY selects read data
to be placed in RY.
– RM provides data for a
memory write operation.
• In subroutine calls or
exception handling:
– Input 2 of MuxY is used
(return address stored in the
register file)
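The selection made by MuxY can be sketched as follows. The slides state only that input 2 carries the return address; assigning input 0 to [RZ] and input 1 to the memory read data is an assumption, as is the name `y_select`.

```python
def mux_y(y_select: int, rz: int, memory_data: int, return_address: int) -> int:
    """Choose the value that is placed in RY at the end of the memory stage."""
    if y_select == 0:       # calculation instruction (assumed input number)
        return rz
    if y_select == 1:       # memory read (assumed input number)
        return memory_data
    if y_select == 2:       # subroutine call or exception handling
        return return_address
    raise ValueError(f"invalid MuxY selection: {y_select}")
```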
Instruction Fetch Stage (1)
• MuxMA selects the PC as the
memory address when fetching
instructions, and RZ in the memory
stage (we assume a single memory
for instructions and data, i.e. no
Harvard architecture)
• The Instruction address
generator increments the
PC after fetching an
instruction
– It also generates branch
and subroutine addresses.
Instruction Fetch Stage (2)
• When an instruction is
read, it is placed in IR.
• The control circuitry
decodes the instruction.
– It generates the control
signals that drive all
units.
• The Immediate block
extends the immediate
operand to 32 bits,
according to the type of
instruction.
Instruction address generator
• Connections to
registers RY and RA
are used to support
subroutine call and
return instructions
Example: Add R3, R4, R5
1. Memory address ←[PC],
Read memory,
IR←Memory data,
PC ← [PC] + 4
2. Decode instruction,
RA ← [R4], RB ← [R5]
3. RZ ← [RA] + [RB]
4. RY ← [RZ]
5. R3 ← [RY]
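Steps 2 to 5 of the Add example can be traced with a small sketch that models the inter-stage registers as local variables. Instruction fetch is omitted, and the register file is modelled as a plain Python list; both are simplifications made here.

```python
def execute_add(regs, dst, src1, src2):
    """Trace Add dst, src1, src2 through steps 2 to 5."""
    ra, rb = regs[src1], regs[src2]   # step 2: RA <- [src1], RB <- [src2]
    rz = (ra + rb) & 0xFFFFFFFF       # step 3: RZ <- [RA] + [RB]
    ry = rz                           # step 4: RY <- [RZ]
    regs[dst] = ry                    # step 5: dst <- [RY]
    return {"RA": ra, "RB": rb, "RZ": rz, "RY": ry}
```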
Example: Load R5, X(R7)
1. Memory address ← [PC],
Read memory,
IR ← Memory data,
PC ← [PC] + 4
2. Decode instruction,
RA ← [R7]
3. RZ ← [RA] + Immediate value X
4. Memory address ← [RZ], Read memory,
RY ← Memory data
5. R5 ← [RY]
Example: Store R6, X(R8)
1. Memory address ← [PC],
Read memory,
IR ← Memory data,
PC ← [PC] + 4
2. Decode instruction,
RA ← [R8], RB ← [R6]
3. RZ ← [RA] + Immediate value
X, RM ← [RB]
4. Memory address ←[RZ],
Memory data ← [RM], Write
memory
5. No action
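The Load and Store sequences can be sketched in the same way. Memory is modelled here as a dictionary from address to word, and instruction fetch is omitted; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def execute_load(regs, memory, dst, base, x):
    """Load dst, x(base): steps 2 to 5."""
    ra = regs[base]              # step 2: RA <- [base]
    rz = (ra + x) & MASK32       # step 3: RZ <- [RA] + immediate value X
    ry = memory[rz]              # step 4: read memory at address [RZ]
    regs[dst] = ry               # step 5: dst <- [RY]
    return rz


def execute_store(regs, memory, src, base, x):
    """Store src, x(base): steps 2 to 5."""
    ra, rb = regs[base], regs[src]   # step 2: RA <- [base], RB <- [src]
    rz = (ra + x) & MASK32           # step 3: effective address
    rm = rb                          # step 3: RM <- [RB]
    memory[rz] = rm                  # step 4: write memory at address [RZ]
    return rz                        # step 5: no action
```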
Unconditional branch
1. Memory address ←[PC], Read memory,
IR ← Memory data, PC ←[PC] + 4
2. Decode instruction
3. PC ← [PC] + Branch offset
4. No action
5. No action
Conditional branch: Branch_if_[R5]=[R6] LOOP
1. Memory address ← [PC], Read memory,
IR ← Memory data, PC ←[PC] + 4
2. Decode instruction, RA ← [R5], RB ←[R6]
3. Compare [RA] to [RB],
If [RA] = [RB], then
PC ← [PC] + Branch offset
4. No action
5. No action
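Step 3 of the conditional branch can be sketched as below. The `pc` argument is the value already incremented in step 1, and the branch offset is treated as a signed number of bytes; the function name is illustrative.

```python
def branch_if_equal(pc: int, ra: int, rb: int, branch_offset: int) -> int:
    """Return the new PC after step 3 of Branch_if_[RA]=[RB]."""
    if ra == rb:
        # Branch taken: PC <- [PC] + Branch offset
        return (pc + branch_offset) & 0xFFFFFFFF
    # Branch not taken: the PC keeps the value set in step 1.
    return pc
```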
Subroutine call with indirection: Call_register R9
1. Memory address ← [PC], Read memory,
IR ← Memory data, PC ←[PC] + 4
2. Decode instruction, RA ← [R9]
3. PC-Temp ← [PC],
PC ← [RA]
4. RY ← [PC-Temp]
5. Register LINK ← [RY]
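The call sequence can be sketched as follows. The index of the LINK register is passed in as a parameter because the slides do not say which register it is; the value 31 used in any example call is purely hypothetical.

```python
def call_register(pc, regs, target_reg, link_reg):
    """Steps 2 to 5 of Call_register; pc is the value incremented in step 1."""
    ra = regs[target_reg]    # step 2: RA <- [target register]
    pc_temp = pc             # step 3: PC-Temp <- [PC] (return address)
    new_pc = ra              # step 3: PC <- [RA]
    ry = pc_temp             # step 4: RY <- [PC-Temp]
    regs[link_reg] = ry      # step 5: LINK <- [RY]
    return new_pc
```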
Control signals
• Select multiplexer inputs to route the flow of
data

• Set the function performed by the ALU

• Determine when data are written into the PC,
the IR, the register file, and the memory
Register file control signals

Generated by decoding
the OPCODE field of the
instruction held in the
IR register
Instruction formats: R and I
ALU control signals

Generated by decoding
the OPCODE field of the
instruction held in the
IR register

Analyzed by the
CONTROL CIRCUITRY
during the execution
of a branch
instruction
Result selection

Generated by decoding
the OPCODE field of the
instruction held in the
IR register
Memory access
• When data are found in the cache, access to
memory can be completed in one clock cycle.
• Otherwise, read and write operations may require
several clock cycles to load data from main memory
into the cache.
• A control signal, MFC (Memory Function Completed), is needed to indicate
that the memory operation has finished. E.g., for step 1:
1. Memory address ← [PC], Read memory,
Wait for MFC,
IR ← Memory data, PC ← [PC] + 4
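The effect of waiting for MFC in step 1 can be modelled with a short sketch. A cache hit is assumed to complete in one clock cycle, as stated above; the miss latency of 4 cycles and the names used are hypothetical values chosen for illustration.

```python
def fetch_step(pc, memory, cached_addresses, miss_latency=4):
    """Model step 1 and return (ir, new_pc, cycles_spent)."""
    latency = 1 if pc in cached_addresses else miss_latency
    cycles = 0
    mfc = False
    while not mfc:                 # Wait for MFC
        cycles += 1
        mfc = cycles >= latency    # memory signals completion
    ir = memory[pc]                # IR <- Memory data
    return ir, pc + 4, cycles      # PC <- [PC] + 4
```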
Memory and IR control signals
1. Imm 16-bit, sign extended
2. Imm 16-bit, unsigned extended
3. Imm 16-bit, “high” extended
4. Imm 26-bit in CALL instr., which is specially extended
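The first three extension options can be sketched as below. Two points are assumptions: that "unsigned extended" means padding with zeros on the left, and that "high" extended means placing the 16 bits in the upper half of the word with zeros below. The 26-bit CALL case is left out because the slides do not specify how it is extended.

```python
def extend_immediate(imm16: int, mode: str) -> int:
    """Extend a 16-bit immediate to 32 bits according to the selected mode."""
    imm16 &= 0xFFFF
    if mode == "sign":
        # Replicate bit 15 into the upper 16 bits.
        return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16
    if mode == "zero":
        # Upper 16 bits are zero (assumed meaning of "unsigned extended").
        return imm16
    if mode == "high":
        # Immediate occupies the upper 16 bits (assumed meaning of "high").
        return imm16 << 16
    raise ValueError(f"unknown extension mode: {mode}")
```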
Control signals of instruction address generator
Control signal generation
• Circuitry must be implemented to generate control
signals so actions take place in correct sequence and at
correct time.
• There are two basic approaches:
hardwired control and microprogramming
• Hardwired control involves implementing circuitry that
considers step (ring) counter, IR, ALU result, and external
inputs.
• Step (Ring) counter keeps track of execution progress,
one clock cycle for each of the five steps described
(unless a memory access takes longer than one cycle).
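A step counter of this kind can be sketched as a small class that produces one-hot timing signals T1 to T5 and holds its state while a memory access is pending. The class and parameter names are illustrative assumptions.

```python
class StepCounter:
    """Sketch of a step (ring) counter for a five-step instruction."""

    def __init__(self, n_steps: int = 5):
        self.n_steps = n_steps
        self.step = 1

    def signals(self) -> dict:
        # One-hot timing signals: exactly one of T1..Tn is active.
        return {f"T{i}": i == self.step for i in range(1, self.n_steps + 1)}

    def clock(self, wait_for_mfc: bool = False, mfc: bool = True) -> None:
        # Hold the current step while a memory access is still pending.
        if wait_for_mfc and not mfc:
            return
        self.step = 1 if self.step == self.n_steps else self.step + 1
```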
Hardwired generation of control signals

E.g.
RF_write = T5&(ALU | Load | Call);
PC_enable = T1&MFC | T3&(BR | Ret | Call);
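The two example equations above translate directly into Boolean functions. The inputs are assumed to be Python booleans: timing signals from the step counter, instruction-type signals from the decoder, and MFC from the memory.

```python
def rf_write(t5: bool, alu: bool, load: bool, call: bool) -> bool:
    """The register file is written in step 5 of ALU, Load and Call instructions."""
    return t5 and (alu or load or call)


def pc_enable(t1: bool, mfc: bool, t3: bool, br: bool, ret: bool, call: bool) -> bool:
    """The PC is written in step 1 once memory is done, or in step 3 of a
    branch, return or call instruction."""
    return (t1 and mfc) or (t3 and (br or ret or call))
```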
CISC processors
• CISC-style processors have more complex
instructions.
• The full set of instructions cannot all be
implemented in a fixed number of steps.
• Execution steps for different instructions do not
all follow a prescribed sequence of actions.
• Hardware organization should therefore enable
a flexible flow of data and actions to
accommodate CISC.
Hardware organization for a CISC computer
• Main difference between the
5-stage RISC organization
and the CISC organization,
where a datapath cannot
be identified easily
• Temporary registers hold temporary results
during instruction
execution
Bus
• An example of an interconnection network.
• When functional units are connected to a
common bus, tri-state drivers are needed.

A 3-bus interconnection network
Example 1: Add R5, R6
1. Memory address ← [PC],
Read memory, Wait for
MFC, IR ← Memory data,
PC ← [PC] + 4
2. Decode instruction
3. R5 ← [R5] + [R6]
A 3-bus interconnection network
Example 2: And X(R7), R9
1. Memory address ← [PC], Read memory,
Wait for MFC,
IR ← Memory data,
PC ← [PC] + 4
2. Decode instruction
3. Memory address ← [PC], Read memory,
Wait for MFC,
Temp1 ← Memory data,
PC ← [PC] + 4
4. Temp2 ← [Temp1] + [R7]
5. Memory address ← [Temp2], Read
memory, Wait for MFC, Temp1 ← Memory
data
6. Temp1 ←[Temp1] AND [R9]
7. Memory address ← [Temp2], Memory data
← [Temp1], Write memory, Wait for MFC

X is stored as a second word of the
instruction
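Steps 3 to 7 of Example 2 can be traced with the sketch below. Memory is modelled as a dictionary from address to word, `pc` points at the second instruction word (which holds X), and waiting for MFC is not modelled; these are simplifications made here.

```python
def execute_and_memory(pc, regs, memory, base, src):
    """And X(base), src: steps 3 to 7. Returns the updated PC."""
    temp1 = memory[pc]                            # step 3: Temp1 <- X
    pc += 4                                       # step 3: PC <- [PC] + 4
    temp2 = (temp1 + regs[base]) & 0xFFFFFFFF     # step 4: effective address
    temp1 = memory[temp2]                         # step 5: read memory operand
    temp1 &= regs[src]                            # step 6: Temp1 <- [Temp1] AND [src]
    memory[temp2] = temp1                         # step 7: write result to memory
    return pc
```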
References
• C. Hamacher, Z. Vranesic, S. Zaky, N. Manjikian,
“Computer Organization and Embedded Systems,”
McGraw-Hill International Edition
– Chapter V: Basic Processing Unit
