Lecture 7
Processor Storage
We are building a microprocessor (CPU).
[Figure: abstraction layers, top to bottom: Assembly Language; Processors; Arithmetic Logic Units and Finite State Machines; Devices and Flip-flops; Circuits; Logic Gates; Transistors]
Building a Microprocessor
§ A microprocessor is programmable.
§ It receives a set of instructions and inputs.
§ It executes these instructions on the inputs
to do some computation and outputs the
result.
Building a Microprocessor
§ We separated the CPU into 3 parts:
ú One handles computation.
ú One handles storage.
ú One orchestrates the process.
§ Last week we learned about the ALU.
[Figure: the Controller Thing, Storage Thing, and Arithmetic Thing; the Arithmetic Thing (ALU) is labeled with A, B, func, flags, and G]
This Week: Finishing the CPU
[Figure: the MIPS control unit alongside the Storage Thing and the Arithmetic Thing]
Our Goal: The MIPS CPU
[Figure: MIPS CPU datapath. The Control Unit (the Controller Thing) takes the Opcode, Instruction [31-26], and is labeled with the signals PCWriteCond, PCWrite, PCSource, ALUOp, IorD, ALUSrcA, ALUSrcB, MemRead, MemWrite, MemtoReg, RegWrite, RegDst, and IRWrite. Other labeled parts: PC, Registers, Shift left 2, the Arithmetic Thing, and the Storage Things.]
The Register File
(part of “the storage thing”)
Memory and registers
§ CPUs have registers that each store a single value.
ú Program counters, instruction registers, etc.
§ But we need to store large amounts of data.
§ There are also units that do this:
ú Register file: a small number of fast memory units.
Allows multiple values to be read and written
simultaneously.
ú Cache memory: a larger grid of slower memory cells.
ú Main memory: an even larger grid of even slower memory
cells.
Stores most of the information to be processed by the
CPU.
The Register File
§ An array of registers in the CPU
§ Each register has an “address”: a number for
the register
ú With k address bits you get 2^k registers
ú MIPS has k=5 → 32 registers
ú x86-64: 16 registers
§ Each register is n bits wide.
ú For MIPS-32: n=32 bits
ú For x86-64: n=64 bits
Register File Functionality
[Figure: register file with registers R0 to R3, a WriteEnable input, a decoder driven by the 2-bit destination register address, and a 2-bit Reg B select]
Data, A and B, and all the registers (R0 to R3) have the same bitwidth (n bits).
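Register File – Read Operation
[Figure: register file with a Data input, Load Enable, registers R0 to R3 each with a Load input, two 4-input multiplexers controlled by the 2-bit Reg A select and Reg B select that drive outputs A and B, and a decoder driven by the 2-bit destination register address]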
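Register File – Write Operation
[Figure: register file with a Data input, Load Enable, registers R0 to R3 each with a Load input, two 4-input multiplexers controlled by the 2-bit Reg A select and Reg B select that drive outputs A and B, and a decoder driven by the 2-bit destination register address]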
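The read and write operations can be mimicked in a few lines of Python. This is only a behavioral sketch, not the circuit from the slides: the class name `RegisterFile`, the method names, and the 8-bit default width are assumptions made for illustration. Reads go through two select inputs (the multiplexers), and a write only happens when Load Enable is high, at the register chosen by the destination address (the decoder).

```python
class RegisterFile:
    """Toy model: 2^k registers, two read ports, one write port."""

    def __init__(self, address_bits=2, width=8):
        self.width_mask = (1 << width) - 1      # keeps values to n bits
        self.regs = [0] * (1 << address_bits)   # 2^k registers, all zero

    def read(self, reg_a_select, reg_b_select):
        # Two multiplexers pick the registers that drive outputs A and B.
        return self.regs[reg_a_select], self.regs[reg_b_select]

    def clock_edge(self, load_enable, dest_address, data):
        # The decoder selects one register; it loads only if Load Enable is high.
        if load_enable:
            self.regs[dest_address] = data & self.width_mask


rf = RegisterFile()
rf.clock_edge(load_enable=1, dest_address=2, data=0x5A)
rf.clock_edge(load_enable=0, dest_address=3, data=0xFF)  # ignored: enable is low
print(rf.read(2, 3))  # (90, 0)
```

Both outputs are available from a single `read` call, which mirrors the point that a register file allows multiple values to be read simultaneously.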
Register File
a modern phone
can store a billion 32-bit numbers.
Main Memory
(part of “the storage thing”)
[Figure: memory drawn as a grid of rows, ending at Row 2^m-1, with data lines D0, D1, D2, ..., Dn-1]
§ There are 2^m rows.
ú m is the address width
§ Each row contains n bits.
ú n is the data width
§ What’s the size of this memory?
ú 2^m * n bits => 2^m * n / 8 bytes
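As a quick check of the size formula, here is a small Python sketch. The function name `memory_size` and the example values m = 10, n = 8 are chosen only for illustration.

```python
def memory_size(address_width_m, data_width_n):
    """Return (bits, bytes) for a memory with 2^m rows of n bits each."""
    rows = 2 ** address_width_m
    bits = rows * data_width_n
    return bits, bits // 8  # assumes the bit count is a multiple of 8


bits, size_bytes = memory_size(10, 8)
print(bits, size_bytes)  # 8192 1024
```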
Storage cells
§ Each row is made of n storage cells.
ú Each cell stores a single bit of information.
§ Multiple ways of building these cells.
ú e.g. RAM cell, DRAM IC cell
[Figure: circuits for a RAM cell and a DRAM IC cell; labels include Select, B, C, an SR latch (S, R, Q), and a capacitor C]
Memory Array
[Figure: memory array. A decoder selects one wordline (WL), the row (word) to read/write; the bitlines (BL) carry the read/write data. The array is shown with wordlines 1 to 3 and Cell 2, Cell 1, Cell 0 in each row.]
[Figure: buffer with input A and output Y; Y = 0 when A = 0, and Y = 1 when A = 1]
Controlling the Flow
§ Since some lines (buses) will now be used for both input
and output, we introduce a (sort of) new gate called the
tri-state buffer.
§ When the WE (write enable) signal is low, the buffer output is
a high impedance “signal” Z.
ú The output is floating: connected neither to high voltage
nor to the ground.
ú This is called “high Z”
WE  A  Y
0   X  Z
1   0  0
1   1  1
[Figure: tri-state buffer with input A, enable WE, and output Y, shown for WE = 1 and for WE = 0]
Control the flow using tri-state buffer
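The tri-state buffer's truth table is easy to model in Python. In this sketch the string "Z" stands in for a floating output, and the helper `bus_value` is a hypothetical addition (not from the slides) that shows why only one buffer should drive a shared line at a time.

```python
HIGH_Z = "Z"  # stands in for a floating (undriven) output


def tri_state(we, a):
    """Pass A through when WE is high; otherwise the output floats."""
    return a if we else HIGH_Z


def bus_value(driver_outputs):
    """Resolve a shared line: at most one driver may be enabled."""
    driven = [v for v in driver_outputs if v != HIGH_Z]
    if not driven:
        return HIGH_Z
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0]


print(tri_state(0, 1))  # Z
print(tri_state(1, 0))  # 0
print(tri_state(1, 1))  # 1
print(bus_value([tri_state(0, 1), tri_state(1, 0)]))  # 0
```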
Timing Is Everything
§ RAM is slow.
§ Flip-flops store
and read data in
a single clock cycle.
§ RAM is slower
and further away
from the CPU.
§ We need to coordinate when to read and write
data and addresses.
Summary: Memory vs Registers
§ Memory houses most of the data values
being used by a program.
ú And the program instructions themselves!
§ Registers are for local / temporary data
stores, meant to be used to execute an
instruction.
Example:
SRAM
Static Random Access Memory
Asynchronous SRAM Interface – An example
[Figure: SRAM block with an n-bit Address input, an m-bit Data port, and the control inputs CE', Read/Write', and OE']
Read/Write SRAM - Timing waveforms
[Figure: timing waveforms for an SRAM read followed by an SRAM write, showing Clock, Address, CE', Read/Write, OE', and Data; the Data line is marked hi-Z at three points]
Example:
DRAM
Dynamic Random Access Memory
Memory Technology: DRAM
[Figure: DRAM cell connected to a bitline]
§ Capacitor leaks through the RC path
ú DRAM cell loses charge over time
ú DRAM cell needs to be refreshed
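The effect of leakage and refresh can be illustrated with a toy model. Everything numeric here is an assumption made for the sketch: the exponential decay, the time constant of 1.0, and the read threshold of 0.5 are not taken from the slides or from any real device, and the class name `DramCell` is hypothetical.

```python
import math


class DramCell:
    """Toy DRAM cell: a stored charge that decays over time."""

    def __init__(self, time_constant=1.0, threshold=0.5):
        self.time_constant = time_constant  # illustrative RC constant
        self.threshold = threshold          # illustrative sensing threshold
        self.charge = 0.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def wait(self, elapsed):
        # The capacitor leaks: charge decays exponentially with time.
        self.charge *= math.exp(-elapsed / self.time_constant)

    def read(self):
        return 1 if self.charge > self.threshold else 0

    def refresh(self):
        # Read the bit and write it back at full strength.
        self.write(self.read())


leaky = DramCell()
leaky.write(1)
leaky.wait(0.5)
leaky.wait(0.5)
print(leaky.read())  # 0: the stored 1 has leaked away

refreshed = DramCell()
refreshed.write(1)
refreshed.wait(0.5)
refreshed.refresh()
refreshed.wait(0.5)
print(refreshed.read())  # 1: the refresh restored the charge in time
```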
DRAM – Dynamic Random Access Memory
Array of Values: 3455434, 43543, 98734, 0, 847, 42, 873909, 1729
READ address
WRITE address, value
Accessing any location takes the same amount of time
Data needs to be constantly refreshed
DRAM in Today’s Systems
Factors that Affect Choice of
Memory
1. Speed
ú Should be reasonably fast compared to processor
2. Capacity
ú Should be large enough to fit programs and data
3. Cost
ú Should be cheap
Why DRAM?
[Figure: memory technologies compared by cost and access latency, with flip-flops toward the higher-cost end]
SRAM vs DRAM
§ SRAM:
ú ~6 transistors per cell
ú Retains its data bits as long as power is being
supplied
ú Used in caches (holds a small amount of data)
ú Faster access times and more expensive
§ DRAM:
ú 1 transistor + 1 capacitor per cell
ú Must be periodically refreshed to retain its data,
which increases the power usage.
DRAM capacitors have a tendency to leak electrons
and lose their charge.
ú Used in main memory (holds much more data)
ú Slower access times and cheaper
Memory Hierarchy
[Figure: system overview with callouts showing that registers and caches are in the CPU, and showing where memory is]
Memory Hierarchy as Food
(In terms of access speed)
§ Registers: food in your mouth, ready for chewing
§ Cache: food on your plate
§ Memory: food in your fridge
§ Hard disk: grocery store down the street
§ Network: the farm
But … memory is far away
§ Most processors spend most of their time
waiting.
ú ... often for memory. This delay is referred to as
the “memory wall”.
First, some key terms …
§ The cache has a few sets of blocks
§ In a direct mapped cache, each set has one
block
§ In an N-way set associative cache, each set has
N blocks.
§ A fully associative cache has one set with all
the blocks.
§ A memory address gets “hashed” to a set.
§ Different memory addresses may be hashed
to the same set.
Addresses and Caches
§ Each load fetches an entire cache block -- not just
a single value.
ú The size of a cache “block” is dependent on the cache.
ú A “block” is a set of words with closely related
addresses.
ú Why fetch a whole block when you just need part of it?
Spatial locality: addresses near the one just used are likely to be used soon.
Bit Masking
§ A bit vector is an integer that should be interpreted
as a sequence of bits.
ú We can think of an address as a bit vector.
§ A mask is a value that can be used to turn specific
bits in a bit vector on or off.
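A short Python sketch of masking, using an 8-bit address as the bit vector. The variable names and the choice of a mask covering the low 3 bits are illustrative assumptions.

```python
address = 0b10101010     # the bit vector
low3_mask = 0b00000111   # mask that selects the low 3 bits

kept = address & low3_mask               # keep only the masked bits
turned_on = address | low3_mask          # turn the masked bits on
turned_off = address & ~low3_mask & 0xFF # turn the masked bits off (8-bit result)

print(format(kept, "08b"))        # 00000010
print(format(turned_on, "08b"))   # 10101111
print(format(turned_off, "08b"))  # 10101000
```

AND with a mask extracts bits, OR sets them, and AND with the inverted mask clears them.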
Cache Associativity
A small example
§ Consider an 8-bit memory address (byte-addressable)
§ e.g. 10101010; 2^8 = 256 different addresses
§ What if we divide 256 addresses into 8-byte
blocks?
§ How many blocks are there? (256 / 8 = 32)
§ The address is now “hierarchical”:
ú block number (upper 5 bits)
ú offset within the block (lower 3 bits)
§ 10101 010
§ block number, block offset
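The 8-bit example can be checked with a shift and a mask. The names `split_address`, `BLOCK_SIZE`, and `OFFSET_BITS` are chosen for this sketch only.

```python
BLOCK_SIZE = 8    # bytes per block
OFFSET_BITS = 3   # log2(8): bits needed to pick a byte within a block


def split_address(address):
    """Split an 8-bit address into (block number, block offset)."""
    block_number = address >> OFFSET_BITS       # upper 5 bits
    block_offset = address & (BLOCK_SIZE - 1)   # lower 3 bits
    return block_number, block_offset


block_number, block_offset = split_address(0b10101010)
print(format(block_number, "05b"), format(block_offset, "03b"))  # 10101 010
print(256 // BLOCK_SIZE)  # 32 blocks
```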
Exercise: Cache Masking
Given a 32-bit address space, identify the tag,
set, and block offset for a (direct mapped)
cache that stores 16 32-byte blocks.
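One way to work through this exercise is sketched below. It assumes all sizes are powers of two and that the address is laid out as tag, then set, then block offset from the most significant bit down; the function names `field_widths` and `split` are hypothetical.

```python
def field_widths(address_bits, num_blocks, block_bytes, ways=1):
    """Return (tag, set, offset) widths; sizes must be powers of two."""
    offset_bits = (block_bytes - 1).bit_length()  # log2(block size)
    num_sets = num_blocks // ways                 # direct mapped: ways = 1
    set_bits = (num_sets - 1).bit_length()        # log2(number of sets)
    tag_bits = address_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits


def split(address, set_bits, offset_bits):
    """Mask and shift an address into (tag, set index, block offset)."""
    offset = address & ((1 << offset_bits) - 1)
    set_index = (address >> offset_bits) & ((1 << set_bits) - 1)
    tag = address >> (offset_bits + set_bits)
    return tag, set_index, offset


print(field_widths(32, 16, 32))  # (23, 4, 5)
tag, set_index, offset = split(0x12345678, 4, 5)
print(set_index, offset)  # 3 24
```

With 32-byte blocks the offset needs 5 bits, 16 sets need 4 bits, and the remaining 23 bits of the 32-bit address form the tag.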
Cache Loading and Evicting
§ Each cache has a finite size.
ú It can store some maximum number of blocks.
ú Based on its associativity, it can store a set
number of blocks with a specific hash.
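To see loading and evicting in action, here is a minimal sketch of a direct mapped cache that tracks only which block (tag) each set holds. The class name, the default sizes (16 sets, 32-byte blocks), and the returned strings are assumptions for illustration; no data is stored.

```python
class DirectMappedCache:
    """Toy direct mapped cache: one block per set, tags only."""

    def __init__(self, num_sets=16, block_bytes=32):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.tags = [None] * num_sets  # None means the set is empty

    def access(self, address):
        block_number = address // self.block_bytes
        set_index = block_number % self.num_sets   # the "hash" to a set
        tag = block_number // self.num_sets
        if self.tags[set_index] == tag:
            return "hit"
        evicted = self.tags[set_index] is not None
        self.tags[set_index] = tag                 # load the new block
        return "miss (evicts old block)" if evicted else "miss"


cache = DirectMappedCache()
print(cache.access(0x000))  # miss
print(cache.access(0x004))  # hit: same 32-byte block
print(cache.access(0x200))  # miss (evicts old block): also maps to set 0
print(cache.access(0x000))  # miss (evicts old block)
```

Addresses 0x000 and 0x200 map to the same set, so in a direct mapped cache each one evicts the other.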
How do we choose what to evict?
§ Ideally, we’d kick out data we never need again.
[Figure: CPU overview with the Controller Thing and the data path, which contains the Storage Thing and the Arithmetic Thing]