
Lecture 7:
Processor Storage

We are building a microprocessor (CPU).

[Diagram: the abstraction stack, from top to bottom —
Assembly Language
Processors (Arithmetic Logic Units, Finite State Machines)
Devices (Flip-flops)
Circuits (Logic Gates)
Transistors]
Building a Microprocessor
§ A microprocessor is programmable.
§ It receives a set of instructions and inputs.
§ It executes these instructions on the inputs to do some
computation and outputs the result.
Building a Microprocessor
§ We separated the CPU into 3 parts:
ú One handles computation.
ú One handles storage.
ú One orchestrates the process.
§ Last week we learned about the ALU (the "Arithmetic Thing").

[Diagram: the Controller Thing directing the Storage Thing and the
Arithmetic Thing; the ALU takes inputs A and B plus a func signal,
and produces result G and flags.]
This Week: Finishing the CPU
[Diagram: the Controller Thing corresponds to the MIPS control unit;
the Storage Things (register file, memory) and the Arithmetic Thing
together form the MIPS datapath.]
Our Goal: The MIPS CPU Controller Thing
[Diagram: the multicycle MIPS datapath with its Control Unit. The
Control Unit reads the opcode (instruction bits [31-26]) and drives
the control signals: PCWriteCond, PCWrite, PCSource, ALUOp, IorD,
ALUSrcB, ALUSrcA, MemRead, MemWrite, RegWrite, MemtoReg, RegDst, and
IRWrite. The datapath connects the PC, the memory, the instruction
register, the memory data register, the register file (read reg 1,
read reg 2, write reg, write data), sign extend, shift left 2, and
the ALU with its A and B input registers and its ALUOut register.
The register file and memory are the Storage Things; the ALU is the
Arithmetic Thing.]
The Register File
(part of “the storage thing”)
Memory and registers
§ CPUs have registers that store a single value.
ú Program counters, instruction registers, etc.
§ But we need to store large amounts of data.
§ There are also units that do this:
ú Register file: a small number of fast memory units. Allows
multiple values to be read and written simultaneously.
ú Cache memory: a larger grid of slower memory cells.
ú Main memory: an even larger grid of even slower memory cells.
Stores most of the information to be processed by the CPU.
The Register File
§ An array of registers in the CPU.
§ Each register has an "address": a number for the register.
ú With k address bits you get 2^k registers.
ú MIPS has k=5 → 32 registers.
ú x86-64: 16 registers.
§ Each register is n bits wide.
ú For MIPS-32: n = 32 bits.
ú For x86-64: n = 64 bits.
Register File Functionality

WriteEnable Register 0

Destination Reg. Register 1


(k-bit address)
Register 2
n-bit value to write

Register
File n-bit value
Register A …
(k-bit address) from Reg. A
Register 2k-1
Register B n-bit value
(k-bit address) Register File from Reg. B
Register File Functionality
[Same diagram as above.] This register file has two read ports and
one write port.
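A minimal behavioral sketch of this interface in Python (our own
illustration; the class and parameter names are not from the
lecture):

class RegisterFile:
    def __init__(self, k, n):
        self.n = n
        self.regs = [0] * (2 ** k)   # 2^k registers, each n bits wide

    def read(self, reg_a, reg_b):
        # Two read ports: both values come out simultaneously,
        # no enable signal needed.
        return self.regs[reg_a], self.regs[reg_b]

    def write(self, dest, value, write_enable):
        # One write port: only updates when WriteEnable is high.
        if write_enable:
            self.regs[dest] = value & ((1 << self.n) - 1)  # keep n bits

rf = RegisterFile(k=5, n=32)          # MIPS-like: 32 registers, 32 bits
rf.write(dest=3, value=42, write_enable=True)
print(rf.read(3, 0))                  # (42, 0)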
Register File Structure
[Diagram: a 4-register (R0-R3) register file. The n-bit Data input
feeds every register. A decoder turns the 2-bit Destination Reg.
address into one of four select lines; each, gated with Load Enable,
drives that register's Load input. Two 4-to-1 multiplexers, driven
by the 2-bit Reg A select and Reg B select signals, route register
outputs to ports A and B.]
Data, A and B, and all the registers (R0 to R3) have the same
bitwidth (n bits).
Register File – Read Operation
[Diagram: to read, the Reg A select and Reg B select addresses steer
the two output multiplexers, so the selected registers' values
appear on outputs A and B. No Load Enable is needed for reading.]
Register File – Write Operation
[Diagram: to write, the decoder activates the select line for the
Destination Reg. address; with Load Enable asserted, exactly one
register latches the value on the Data input.]
Register File
§ So many wires for only 4 registers.
§ MIPS has 32 registers.
§ What if we want 1 billion registers?
ú This is realistic: a modern phone can store a billion 32-bit
numbers.
Main Memory
(part of "the storage thing")

[Photo: 4K character memory from IBM 1401]
http://www.righto.com/2015/08/examining-core-memory-module-inside.html
Large Memory
§ We want to store millions and billions of bits.
§ Register files are fast but too costly for storing lots of data.
ú Too many wires.
ú Flip-flops use too many gates.
§ We need something else.
Main Memory and Addressing
§ A collection of addressable memory units.
§ Usually addressed in units of bytes.
ú Byte = 8 bits.
§ Every group of 4 bytes is one 32-bit word.
ú For us: word = 32 bits = 4 bytes.
ú Other definitions exist.
[Diagram: a column of byte-addressed cells (address 0 holds
01001010, address 1 holds 11110000, ...); each group of 4
consecutive bytes forms one word.]
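As a quick illustration of that convention (our own sketch, assuming
word = 4 bytes as above), a byte address splits into a word number
and a byte offset within the word:

byte_address = 11
word_number = byte_address // 4   # which 32-bit word (here: word 2)
byte_offset = byte_address % 4    # which byte inside it (here: byte 3)
print(word_number, byte_offset)   # 2 3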
Electronic Memory
§ Like register files, main memory is made up of a decoder and rows
of memory units.
[Diagram: m address lines enter a decoder that selects one of Row 0
through Row 2^m-1; each row drives n data lines D0 ... Dn-1.]
§ There are 2^m rows.
ú m is the address width.
§ Each row contains n bits.
ú n is the data width.
§ What's the size of this memory?
ú 2^m * n bits => 2^m * n / 8 bytes.
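A one-line check of that size formula (a sketch; the m and n values
are just examples, not from the slides):

m, n = 16, 8                  # 16 address lines, 8-bit rows
size_bits = (2 ** m) * n      # number of rows times bits per row
size_bytes = size_bits // 8
print(size_bits, size_bytes)  # 524288 bits = 65536 bytes (64 KB)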
Storage cells
§ Each row is made of n storage cells.
ú Each cell stores a single bit of information.
§ Multiple ways of building these cells.
ú e.g. a latch-based RAM cell vs. a DRAM IC cell.
[Circuit diagrams: a RAM cell built from an SR latch with a Select
line and bit lines B and C; a DRAM IC cell built from a Select
transistor and a capacitor C on bitline B.]
Memory Array
[Diagram: a decoder on the left selects one row of a grid of cells;
each row contains one cell per bit column.]
Memory Array – Main Signals
§ Wordline (WL): selects which memory row (word) to read/write. One
wordline per row, driven by the decoder.
§ Bitline (BL): carries the data being read or written; one bitline
per bit column.
§ There is also a read/write signal: are we reading or writing?
How we read from wordline 2
[Diagram: the decoder asserts wordline 2 while the other wordlines
stay low; each cell in that row places its stored bit on its bitline
(bitline 2, bitline 1, bitline 0).]
Data Bus
§ Communication between components takes place through groups of
shared wires called a shared bus (or data bus).
§ Multiple components can read from a bus at the same time.
§ Only one can write to a bus at a time.
ú The writing component is also called the bus driver.
§ How do we make it so?
Buffer
[Diagram: a buffer gate with input A and output Y.]

A | Y
0 | 0
1 | 1
Controlling the Flow
§ Since some lines (buses) will now be used for both input and
output, we introduce a (sort of) new gate called the tri-state
buffer.
§ When the WE (write enable) signal is low, the buffer output is the
high-impedance "signal" Z.
ú The output is floating: connected neither to high voltage nor to
ground.
ú This is called "high Z".

WE | A | Y
0  | X | Z
1  | 0 | 0
1  | 1 | 1
[Diagram: tri-state buffer behavior. When WE = 1, the buffer acts as
a wire and Y follows A. When WE = 0, the buffer acts as an open
circuit and Y floats (high Z).]
Control the flow using tri-state buffers
§ Control c0, c1 and c2 so that only one of the devices' outputs is
written to the bus.
§ In general, the bus can be read by multiple devices at the same
time but can only be written by one device at a time.
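A behavioral sketch of this rule (our own Python model; "Z" stands
for high impedance, and the function names are illustrative):

def tri_state(we, a):
    # Tri-state buffer: pass the input when WE = 1, float otherwise.
    return a if we else "Z"

def bus_value(controls, outputs):
    # controls = [c0, c1, c2]; outputs = each device's output bit.
    driven = [tri_state(c, a) for c, a in zip(controls, outputs)]
    drivers = [v for v in driven if v != "Z"]
    assert len(drivers) <= 1, "bus conflict: more than one driver!"
    return drivers[0] if drivers else "Z"   # floating if nobody drives

print(bus_value([0, 1, 0], [1, 0, 1]))       # only device 1 drives: 0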
Timing Is Everything
§ RAM is slow.
§ Flip-flops store and read data in a single clock cycle.
§ RAM is slower and further away from the CPU.
§ We need to coordinate when to read and write data and addresses.
Summary: Memory vs Registers
§ Memory houses most of the data values
being used by a program.
ú And the program instructions themselves!
§ Registers are for local / temporary data
stores, meant to be used to execute an
instruction.
Example: SRAM
Static Random Access Memory

§ There are other types of RAMs such as DRAM, SDRAM, DDR SDRAM,
RDRAM, VRAM, etc.
Asynchronous SRAM Interface – An example
[Diagram: an SRAM chip with an n-bit Address input, an m-bit
bidirectional Data bus, and control inputs CE' (Chip Enable, active
low), Read/Write', and OE' (Output Enable, active low).]

Chip Enable' (CE') | Read/Write' | Output Enable' (OE') | Access Type
0                  | 0           | X                    | SRAM Write
0                  | 1           | 0                    | SRAM Read
1                  | X           | X                    | SRAM not enabled
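The same truth table as code (our own sketch of the interface above;
CE' and OE' are active low, so 0 means asserted):

def sram_access(ce_n, rw, oe_n):
    # Decode the asynchronous SRAM control signals from the table.
    if ce_n == 1:
        return "not enabled"    # chip disabled, regardless of RW/OE'
    if rw == 0:
        return "write"          # CE' = 0, R/W' = 0: write cycle
    if oe_n == 0:
        return "read"           # CE' = 0, R/W' = 1, OE' = 0: read
    return "outputs hi-Z"       # not in the table; assumed floating

print(sram_access(0, 1, 0))      # read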
Read/Write SRAM – Timing waveforms
[Waveforms: Clock, Address, CE', Read/Write, OE', and Data during an
SRAM read (data driven from the SRAM) followed by an SRAM write
(data driven to the SRAM); the Data bus is hi-Z between accesses.]
§ Reading and writing of signals takes time. To make sure things are
read/written correctly, we must control the timing carefully.
§ 6 transistors per memory cell.
Example: DRAM
Dynamic Random Access Memory

§ There are other types of RAMs such as SDRAM, DDR SDRAM, RDRAM,
VRAM, etc.
Memory Technology: DRAM
§ Dynamic random access memory.
§ Capacitor charge state indicates stored value.
ú Whether the capacitor is charged or discharged indicates storage
of 1 or 0.
ú 1 capacitor.
ú 1 access transistor.
[Circuit diagram: a DRAM cell — a row enable line gates one access
transistor connecting the capacitor to the bitline.]
§ Capacitor leaks through the RC path.
ú DRAM cell loses charge over time.
ú DRAM cell needs to be refreshed.
DRAM – Dynamic Random Access Memory
[Diagram: an array of stored values with two operations —
READ address, and WRITE address, value.]
§ Accessing any location takes the same amount of time.
§ Data needs to be constantly refreshed.
DRAM in Today's Systems

Why DRAM? Why not some other memory?
Factors that Affect Choice of Memory
1. Speed
ú Should be reasonably fast compared to the processor.
2. Capacity
ú Should be large enough to fit programs and data.
3. Cost
ú Should be cheap.
Why DRAM?
[Chart: cost vs. access latency. From highest cost / lowest latency
to lowest cost / highest latency: flip-flops, SRAM, DRAM, Flash,
Disk. DRAM sits at a favorable point in the trade-off spectrum:
higher access latency than SRAM, but much lower cost.]
SRAM vs DRAM
§ SRAM:
ú ~6 transistors per cell.
ú Retains its data as long as power is being supplied.
ú Used in caches (holds a small amount of data).
ú Faster access times and more expensive.
§ DRAM:
ú 1 transistor + 1 capacitor per cell.
ú Must be periodically refreshed to retain its data, which
increases power usage. DRAM capacitors have a tendency to leak
electrons and lose their charge.
ú Used in main memory (holds much more data).
ú Slower access times and cheaper.
Memory Hierarchy
§ There are in fact multiple levels of memory.
ú We saw only two.
§ Memory sorted by access speed:
1. Register file or "registers" (in this course)
2. Cache (several levels) (in this course)
3. RAM or "memory": off-chip (in this course)
4. Hard disk: requires OS support
5. Network: quite slow
[Photos: registers and caches are inside the CPU chip; memory (DRAM)
chips are plugged into slots on the motherboard.]
Memory Hierarchy as Food
(In terms of access speed)
§ Registers: food in your mouth, ready for chewing.
§ Cache: food on your plate.
§ Memory: food in your fridge.
§ Hard disk: grocery store down the street.
§ Network: the farm.
But … memory is far away
§ Most processors spend most of their time waiting.
ú ... often for memory. This delay is referred to as the "memory
wall".
§ No matter how fast we make a processor, if memory is too far
away, we'll just spend more time waiting.
ú As processors get faster, more processor cycles can be executed
before a load completes.
Scaling the Memory Wall
§ Caches are a structure that makes it appear that memory is closer
than it is.
§ Every load from memory fetches more than just the value that is
loaded.
ú In fact, a lot of values -- a block (or line) -- are brought from
the memory to a location close to the processor.
ú Why? The cost of a bus is its length. Making it wider is
inexpensive.
ú The closer location is called a cache. It stores the value that
was loaded and the values near it, in case they are needed soon.
Big Idea: Locality
§ Caches rely on spatial and temporal locality.
§ This is a Big Idea in computing. Basically: if we used something
recently, we're likely to use it again (or something near it) soon.
§ "It or something near it" is spatial locality.
§ "… soon" is temporal locality.
Examples of Locality
§ “Iterating over an array” exhibits both
temporal and spatial locality.
§ “Executing code” often exhibits temporal and
spatial locality.
§ “Accessing items from a dictionary” does not:
the items in the dictionary may not be close
to each other in memory.
§ Linked lists and other dynamically allocated
structures can also cause locality problems.
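A small sketch of those access patterns (our own illustration,
written as the byte addresses a program would touch, assuming 4-byte
elements):

array_base = 0x1000
array_walk = [array_base + 4 * i for i in range(8)]  # neighbors: spatial
repeat_use = [array_base] * 3                        # same address: temporal
print([hex(a) for a in array_walk])  # 0x1000, 0x1004, ... one block serves many
print([hex(a) for a in repeat_use])  # 0x1000 thrice: cached after first access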
First, some key terms …
§ Address: the unique memory location of data.
§ Tag: a unique identifier for a block of words (data) in the cache.
§ Block: the basic unit of cache storage (contains a group of
words).
§ Set: caches are organized into sets, each of which holds one or
more blocks.
§ Associativity: how blocks are organized into sets in the cache.
§ Offset: the index of the word within a block of words.
(Read Textbook Chapter 8.3 for detailed definitions!)
First, some key terms …
§ The cache has a few sets of blocks.
§ In a direct-mapped cache, each set has one block.
§ In an N-way set associative cache, each set has N blocks.
§ A fully associative cache has one set with all the blocks.
§ A memory address gets "hashed" to a set.
§ Different memory addresses may be hashed to the same set.
Addresses and Caches
§ Each load fetches an entire cache block -- not just a single
value.
ú The size of a cache "block" depends on the cache.
ú A "block" is a set of words with closely related addresses.
ú Why fetch a whole block when you just need part of it? Spatial
locality.
§ The easiest way to define a block is to look at its mask.
Bit Masking
§ A bit vector is an integer that should be interpreted as a
sequence of bits.
ú We can think of an address as a bit vector.
§ A mask is a value that can be used to turn specific bits in a bit
vector on or off.
§ For example, let's build a mod-16 mask.

value = 0b10110101        # any integer (example value)
mod_16 = 15               # 0x0000000F
print(value & mod_16)     # Only the bottom 4 bits; "&" is bitwise AND
Cache Associativity

A small example
§ Consider an 8-bit memory address (byte-addressable).
§ e.g. 10101010; there are 256 different addresses.
§ What if we divide the 256 addresses into 8-byte blocks?
§ How many blocks are there? 256 / 8 = 32 blocks.
§ The address is now "hierarchical":
ú block number
ú offset within the block
§ 10101 010
ú block number (5 bits), block offset (3 bits)
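The same split in code (a sketch of the example above): 3 offset
bits for 8-byte blocks, and the remaining 5 bits as the block
number.

address = 0b10101010              # the 8-bit example address
block_offset = address & 0b111    # low 3 bits (8-byte blocks)
block_number = address >> 3       # remaining 5 bits (32 blocks)
print(block_number, block_offset) # 21 2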
Exercise: Cache Masking
Given a 32-bit address space, identify the tag,
set, and block offset for a (direct mapped)
cache that stores 16 32-byte blocks.

00000000 00000000 00000000 00000000 <- 32 bits

Exercise: Cache Masking
Given a 32-bit address space, identify the tag, set, and block
offset for a (direct mapped) cache that stores 16 32-byte blocks.

In a direct-mapped cache, we use part of the address as an index
into the cache. Since there are 16 storage locations in this cache,
we need 4 (2^4 = 16) bits of the address as the index.
Exercise: Cache Masking
Given a 32-bit address space, identify the tag, set, and block
offset for a (direct mapped) cache that stores 16 32-byte blocks.

00000000 00000000 00000000 00000000 <- 32 bits
                              ^^^^^ <- offset into a block (5 bits)
                         ^ ^^^      <- set (4 bits)

Everything else is the tag. We match the tag to make sure the
memory address matches.
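Putting the whole exercise into code (a sketch following the field
widths above: 5 offset bits, 4 set bits, and a 23-bit tag):

def split_address(addr):
    # Direct-mapped cache, 16 blocks of 32 bytes (from the exercise).
    offset = addr & 0x1F             # low 5 bits: byte within block
    set_index = (addr >> 5) & 0xF    # next 4 bits: one of 16 sets
    tag = addr >> 9                  # remaining 23 bits: the tag
    return tag, set_index, offset

tag, set_index, offset = split_address(0x12345678)
print(hex(tag), hex(set_index), hex(offset))   # 0x91a2b 0x3 0x18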
Associativity
§ Most caches use some form of hashing.
ú Caches are smaller than the memory they are caching, so they
can't store everything!
§ If two blocks hash to the same value, they can't both be stored.
To avoid that, caches are often associative.
ú A 2-way set associative cache can store two blocks that hash to
the same value.
ú A fully associative cache doesn't have to worry about hash
collisions at all.
[Diagram: a 2-way set associative cache — how it's done in hardware.
Each set holds two ways; both ways' tags are compared in parallel,
and a multiplexer selects the data from the way that hits.]
Cache Loading and Evicting
§ Each cache has a finite size.
ú It can store some maximum number of blocks.
ú Based on its associativity, it can store a set number of blocks
with a specific hash.
§ Every time a load is performed from memory, the block must be
stored.
ú This means that another block might need to be evicted.
How do we choose what to evict?
§ Ideally, we'd kick out data we never need again.
§ But we can't see the future, so we do the next best thing. We
rely on locality and kick out … something old.
§ The most common heuristic is "least recently used" (LRU).
ú The cache block that was accessed the longest time ago is
dropped.
§ The "first in first out" (FIFO) heuristic:
ú FIFO: the first block that was inserted in the cache will be
evicted (the block that has been in the cache the longest).
§ Other heuristics include "least frequently used" (LFU).
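An LRU eviction sketch (our own illustration, using the standard
library's OrderedDict to track recency; not how hardware does it):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # order tracks recency of use

    def access(self, block_addr):
        if block_addr in self.blocks:
            self.blocks.move_to_end(block_addr)  # most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict the LRU block
        self.blocks[block_addr] = True
        return "miss"

cache = LRUCache(capacity=2)
print([cache.access(a) for a in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss'] -- block 2 was evicted by 3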
§ Now we know what the Arithmetic and Storage Things do.
§ How does the Controller Thing work?
§ Next lecture!

[Diagram: the Controller Thing directing the datapath — the Storage
Thing and the Arithmetic Thing.]