The Memory System
3. DDR
4. Cache memory
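Figure: Connection of the memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W, MFC, etc.) coordinate the transfers. The memory has up to $2^k$ addressable locations.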
Figure 5.2. Organization of bit cells in a memory chip. Address lines A0-A3 feed an address decoder that selects one of the word lines W0-W15; each bit column has a Sense/Write circuit connected to one of the data input/output lines b7 to b0; the control inputs are R/W and CS.
- 16 words of 8 bits each form a 16×8 memory organization. It has 16 external connections: address 4, data 8, control 2 (R/W and CS), power/ground 2.
- 1K memory cells organized as a 128×8 memory: how many external connections? 19 (7 + 8 + 2 + 2).
- The same 1K cells organized as 1K×1: 15 (10 + 1 + 2 + 2).
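The pin counts above follow one rule: enough address lines to number every word, one data line per bit, plus two control lines (R/W, CS) and two power/ground lines. A minimal Python sketch of that arithmetic, assuming exactly those two control and two power/ground lines (the function name is hypothetical):

```python
def external_connections(words: int, bits_per_word: int) -> int:
    """Pin count of a words x bits_per_word memory chip.

    Assumes 2 control lines (R/W, CS) and 2 power/ground lines,
    as in the 16x8, 128x8 and 1Kx1 examples.
    """
    address_lines = (words - 1).bit_length()  # bits needed to number every word
    data_lines = bits_per_word                # one line per bit of a word
    control_lines = 2                         # R/W and CS
    power_lines = 2                           # power and ground
    return address_lines + data_lines + control_lines + power_lines


for words, bits in [(16, 8), (128, 8), (1024, 1)]:
    print(f"{words}x{bits}: {external_connections(words, bits)} external connections")
```

The 1K×1 organization trades data lines for address lines, which is why it needs fewer pins than 128×8 for the same number of cells.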
A Memory Chip
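Figure: Organization of a 1K × 1 memory chip. The 10-bit address is split into a 5-bit row address, which a 5-bit decoder turns into one of the word lines W0-W31 of a 32 × 32 memory cell array, and a 5-bit column address, which drives a 32-to-1 output multiplexer and input demultiplexer connected to the Sense/Write circuitry. The control inputs are R/W and CS, and a single data input/output line connects the chip to the outside.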
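To make the row/column split concrete, here is a small Python sketch for the 32 × 32 array. It assumes the high-order 5 bits of the 10-bit address form the row address and the low-order 5 bits the column address; the figure only says the address is split 5 + 5, so that bit assignment and the function name are hypothetical.

```python
ROW_BITS = 5     # selects one of the word lines W0..W31
COLUMN_BITS = 5  # drives the 32-to-1 output multiplexer / input demultiplexer


def split_address(address: int) -> tuple[int, int]:
    """Split a 10-bit address into (row, column) for a 32 x 32 cell array.

    Assumes the high-order 5 bits are the row address.
    """
    if not 0 <= address < 1 << (ROW_BITS + COLUMN_BITS):
        raise ValueError("address must fit in 10 bits")
    row = address >> COLUMN_BITS
    column = address & ((1 << COLUMN_BITS) - 1)
    return row, column


print(split_address(0b1001100101))  # row 0b10011 = 19, column 0b00101 = 5
```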
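Figure: A static RAM cell. Two transistors, T1 and T2, connect the cell's latch nodes X and Y to a pair of bit lines when the word line is activated.
Figure: A single-transistor dynamic RAM cell. Transistor T connects capacitor C to the bit line when the word line is activated.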
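Figure: Internal organization of a 2M × 8 dynamic memory chip. The row address (A20-9) is loaded into the row address latch by RAS (Row Address Strobe) and decoded by the row decoder to select a row of the cell array. The column address (A8-0) is loaded into the column address latch by CAS (Column Address Strobe) and decoded by the column decoder, which selects through the Sense/Write circuits the byte that appears on D7-D0. The control inputs are CS and R/W.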
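Because the row address (A20-9) and the column address (A8-0) are latched at different times, they can share one set of address pins: the row is presented first and latched on RAS, then the column is presented and latched on CAS. A Python sketch of what appears on those shared pins for a 21-bit address, assuming this multiplexed scheme (names are hypothetical):

```python
ROW_BITS = 12     # A20-9
COLUMN_BITS = 9   # A8-0
ADDRESS_PINS = max(ROW_BITS, COLUMN_BITS)  # shared pins carry the wider field


def multiplexed_address(address: int) -> list[tuple[str, int]]:
    """Return the two values placed on the shared address pins, in order.

    The row address is latched on RAS, then the column address on CAS.
    """
    if not 0 <= address < 1 << (ROW_BITS + COLUMN_BITS):
        raise ValueError("address must fit in 21 bits")
    row = address >> COLUMN_BITS
    column = address & ((1 << COLUMN_BITS) - 1)
    return [("RAS", row), ("CAS", column)]


print(ADDRESS_PINS, multiplexed_address(0x1ABCD))  # 12 [('RAS', 213), ('CAS', 461)]
```

With this scheme the chip needs 12 address pins instead of 21, at the cost of sending each address in two steps.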
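Figure: Synchronous DRAM (SDRAM). The chip is driven by a clock. The multiplexed row/column address goes to a row address latch, which feeds the row decoder and the cell array, and to a column address counter, which feeds the column decoder and the read/write circuits and latches. A mode register and timing control unit receives Clock, RAS, CAS, R/W and CS, and data passes through a data input register and a data output register.
Timing diagram: a burst read in which, after RAS and CAS, the words D0, D1, D2 and D3 appear on the data lines in consecutive clock cycles.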
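Figure 5.10. Organization of a 2M × 32 memory module using 512K × 8 static memory chips. Address bits A20 and A19 drive a 2-bit decoder whose outputs serve as chip selects, choosing one of four rows of chips; the remaining 19 address bits (A18-0) go to every 512K × 8 memory chip. Each row of four chips supplies data bits D31-24, D23-16, D15-8 and D7-0.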
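A Python sketch of how the module in Figure 5.10 interprets a 21-bit address: the top two bits pick the row of chips, the low 19 bits go to every chip in that row, and each chip stores one byte lane of the 32-bit word. The function names are hypothetical.

```python
CHIP_ADDRESS_BITS = 19  # 512K = 2**19 words per chip
CHIP_ROWS = 4           # selected by the 2-bit decoder on A20, A19


def decode_module_address(address: int) -> tuple[int, int]:
    """Map a 21-bit module address to (chip row, address inside each chip)."""
    if not 0 <= address < CHIP_ROWS << CHIP_ADDRESS_BITS:
        raise ValueError("address must fit in 21 bits")
    chip_row = address >> CHIP_ADDRESS_BITS              # A20, A19 -> decoder -> chip select
    internal = address & ((1 << CHIP_ADDRESS_BITS) - 1)  # A18-0 go to every chip
    return chip_row, internal


def byte_lanes(word: int) -> list[int]:
    """Split a 32-bit word into the bytes held by the four chips of a row (D31-24 ... D7-0)."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


print(decode_module_address(0x1FFFFF))     # (3, 524287): last word of the last row
print([hex(b) for b in byte_lanes(0x11223344)])
```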
Memory System Considerations
The choice of a RAM chip for a given application depends on
several factors:
Cost, speed, power, size…
SRAMs are faster and more expensive, and offer smaller capacity.
DRAMs are slower and cheaper, and offer larger capacity.
Which one is used for the cache, and which for the main memory?
Refresh overhead: suppose an SDRAM whose cells are in 8K rows, and 4 clock cycles are needed to access each row. Then refreshing all rows takes $8192 \times 4 = 32{,}768$ cycles. At a clock rate of 133 MHz, this takes $32{,}768 / (133 \times 10^{6}) \approx 246 \times 10^{-6}$ seconds, i.e. 0.246 ms. If the typical refresh period is 64 ms, the refresh overhead is $0.246 / 64 \approx 0.0038$, which is less than 0.4% of the total time available for accessing the memory.
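The same refresh arithmetic as a short Python sketch, so the parameters (8K rows, 4 cycles per row, 133 MHz, 64 ms) can be varied; the function name is hypothetical.

```python
def refresh_overhead(rows: int, cycles_per_row: int, clock_hz: float, period_s: float) -> float:
    """Fraction of each refresh period spent refreshing every row once."""
    refresh_cycles = rows * cycles_per_row     # 8192 * 4 = 32,768 cycles
    refresh_time_s = refresh_cycles / clock_hz  # about 246 microseconds at 133 MHz
    return refresh_time_s / period_s


overhead = refresh_overhead(rows=8192, cycles_per_row=4, clock_hz=133e6, period_s=64e-3)
print(f"{overhead:.4f} ({overhead:.2%})")  # prints 0.0038 (0.38%)
```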
Memory Controller
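Figure: Use of a memory controller. The processor sends the address, R/W and a request signal to the memory controller, which generates the multiplexed row/column address, RAS, CAS, R/W and CS for the memory. Both the controller and the memory receive the clock, and data passes directly between the processor and the memory.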
Figure: Memory hierarchy, from the cache down to magnetic disks and magnetic tapes.
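Cache Memories
What is a cache? (see page 315)
Why do we need it?
Locality of reference (very important)
- temporal: a recently accessed item is likely to be accessed again soon
- spatial: items at addresses near a recently accessed one are likely to be accessed soon
Cache block (cache line): a set of contiguous address locations of some size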
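A small Python illustration of why blocks exploit spatial locality: a run of consecutive word addresses touches only a few blocks, so each block transfer from main memory serves many later references. The 16-word block size is an assumption chosen to match the 4-bit word field used in the mapping examples, and the names are hypothetical.

```python
BLOCK_SIZE = 16  # words per block (assumed)


def block_of(address: int) -> tuple[int, int]:
    """Return (block number, word offset within the block) for a word address."""
    return divmod(address, BLOCK_SIZE)


# Twenty consecutive references, e.g. a loop walking through an array.
addresses = range(100, 120)
blocks = {block_of(a)[0] for a in addresses}
print(sorted(blocks))  # [6, 7]: only two block transfers are needed
```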
Figure: Use of a cache memory. The cache sits between the processor and the main memory.
Replacement algorithm
Hit / miss
Write-through / Write-back
Load through
Cache Memory
High speed (towards CPU speed)
Small size (power & cost)
Figure: The CPU (fast) accesses the cache (fast). On a hit, the cache supplies the data; on a miss, the data is fetched from main memory (slow).
Figure: Main memory addresses run from 00000000 to 3FFFFFFF, while cache addresses run from 00000 to FFFFF.
Direct Mapping
Main memory address fields: Tag (5 bits), Block (7 bits), Word (4 bits)
11101,1111111,1100
Tag: 11101
Block: 1111111 = 127, block 127 of the cache
Word: 1100 = 12, word 12 of block 127 in the cache
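A Python sketch of the direct-mapped lookup for the 5/7/4 split above: the block field picks the single cache block the address can live in, and a hit requires the stored tag to match. The names are hypothetical, and data storage is omitted; only tags are tracked.

```python
TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4


def direct_mapped_fields(address: int) -> tuple[int, int, int]:
    """Split a 16-bit address into (tag, block, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


# One stored tag per cache block; None means the block holds nothing yet.
tags = [None] * (1 << BLOCK_BITS)


def access(address: int) -> bool:
    """Return True on a hit; on a miss the incoming block replaces the old one."""
    tag, block, _ = direct_mapped_fields(address)
    if tags[block] == tag:
        return True
    tags[block] = tag  # direct mapping: there is only one possible location
    return False


print(direct_mapped_fields(0b11101_1111111_1100))  # (29, 127, 12)
```

No replacement algorithm is needed here, since each main memory block has exactly one candidate position in the cache.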
Associative Mapping
Figure 5.16. Associative-mapped cache. Main memory has 4096 = $2^{12}$ blocks, and each of the cache's 128 blocks (Block 0 to Block 127) carries its own tag, so a main memory block can be placed in any cache block.
Main memory address fields: Tag (12 bits), Word (4 bits)
111011111111,1100
Tag: 111011111111
Word: 1100 = 12, word 12 of the block in the cache
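In an associative-mapped cache the 12-bit tag must be compared with the tags of all 128 cache blocks. The Python sketch below does the comparison sequentially, whereas hardware compares all tags in parallel; it loads missing blocks into the first free position and does not model replacement. The class and method names are hypothetical.

```python
from typing import List, Optional

WORD_BITS = 4
CACHE_BLOCKS = 128


def associative_fields(address: int) -> tuple[int, int]:
    """Split a 16-bit address into (12-bit tag, 4-bit word)."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


class AssociativeCache:
    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * CACHE_BLOCKS

    def lookup(self, address: int) -> Optional[int]:
        """Return the cache block holding the address, or None on a miss."""
        tag, _ = associative_fields(address)
        for slot, stored in enumerate(self.tags):  # every block is a candidate
            if stored == tag:
                return slot
        return None

    def load(self, address: int) -> int:
        """Place the block in the first free cache block (raises ValueError if full)."""
        tag, _ = associative_fields(address)
        slot = self.tags.index(None)
        self.tags[slot] = tag
        return slot


cache = AssociativeCache()
address = 0b111011111111_1100
print(associative_fields(address))  # (3839, 12)
print(cache.lookup(address))        # None: miss
slot = cache.load(address)
print(cache.lookup(address) == slot)  # True
```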
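Set-Associative Mapping
Figure 5.17. Set-associative-mapped cache with two blocks per set. The cache's 128 blocks form 64 sets of two blocks each (Set 0 holds cache blocks 0 and 1, Set 1 holds blocks 2 and 3, and so on). Main memory block j maps to set j mod 64, so blocks 0, 64, 128, ... share Set 0 and blocks 1, 65, ... share Set 1. The 6-bit tag is compared with the tags of both blocks in the set to check whether the desired block is present ($4096/64 = 2^6$ main memory blocks map to each set).
Main memory address fields: Tag (6 bits), Set (6 bits), Word (4 bits)
111011,111111,1100
Tag: 111011
Set: 111111 = 63, set 63 of the cache
Word: 1100 = 12, word 12 of the selected block in set 63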
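A Python sketch of the two-way set-associative lookup: the 6-bit set field selects a set, and only the two tags in that set are compared. When a full set misses, the sketch evicts the older of its two tags; that FIFO choice is an assumption made only to keep the example short, and any replacement algorithm could be used instead. Names are hypothetical.

```python
TAG_BITS, SET_BITS, WORD_BITS = 6, 6, 4
BLOCKS_PER_SET = 2


def set_associative_fields(address: int) -> tuple[int, int, int]:
    """Split a 16-bit address into (tag, set, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


# Each set holds the tags of its (up to) two resident blocks, oldest first.
sets: list = [[] for _ in range(1 << SET_BITS)]


def access(address: int) -> bool:
    """Return True on a hit; on a miss load the block, evicting the older tag if the set is full."""
    tag, set_index, _ = set_associative_fields(address)
    resident = sets[set_index]
    if tag in resident:  # compare against the tags of this set only
        return True
    if len(resident) == BLOCKS_PER_SET:
        resident.pop(0)
    resident.append(tag)
    return False


address = 0b111011_111111_1100
print(set_associative_fields(address))           # (59, 63, 12)
print((address >> WORD_BITS) % (1 << SET_BITS))  # main memory block 3839 maps to set 63
```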
Replacement Algorithms
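It is difficult to determine which block to evict when a new block must be brought in.
Least Recently Used (LRU) block: replace the block that has gone unreferenced for the longest time.
The cache controller tracks references to all blocks as computation proceeds.
Tracking counters are incremented or cleared when a hit or miss occurs.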
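One common way to realize such counters in a four-block set, sketched in Python under an assumed scheme (counter 0 marks the most recently used block): on a hit, counters lower than the referenced block's are incremented and the referenced block's counter is cleared; on a miss, all counters are incremented and the new block gets 0, after first evicting the block whose counter has reached 3 if the set is full. The class name is hypothetical.

```python
class LRUCounterSet:
    """One cache set whose blocks carry LRU counters (0 = most recently used)."""

    def __init__(self, blocks: int = 4) -> None:
        self.blocks = blocks
        self.counters: dict = {}  # block tag -> counter value

    def access(self, tag: str) -> bool:
        if tag in self.counters:  # hit
            referenced = self.counters[tag]
            for other, value in list(self.counters.items()):
                if value < referenced:  # blocks used more recently age by one
                    self.counters[other] = value + 1
            self.counters[tag] = 0
            return True
        # Miss: evict the block whose counter equals blocks - 1 if the set is full.
        if len(self.counters) == self.blocks:
            victim = max(self.counters, key=self.counters.get)
            del self.counters[victim]
        for other in list(self.counters):
            self.counters[other] += 1
        self.counters[tag] = 0
        return False


lru_set = LRUCounterSet()
print([lru_set.access(ref) for ref in "ABCADEADCF"])
print(lru_set.counters)  # {'A': 3, 'C': 1, 'D': 2, 'F': 0}
```

The counters always hold distinct values 0 to 3 in a full set, so the block with counter 3 is exactly the least recently used one.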
FIFO example (cache holding four blocks)
CPU reference:  A     B     C     A     D     E     A     D     C     F
Result:         Miss  Miss  Miss  Hit   Miss  Miss  Miss  Hit   Hit   Miss
Block 1:        A     A     A     A     A     E     E     E     E     E
Block 2:              B     B     B     B     B     A     A     A     A
Block 3:                    C     C     C     C     C     C     C     F
Block 4:                                D     D     D     D     D     D
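The FIFO trace can be reproduced with a short Python simulation, assuming a cache of four blocks and the reference string above (the function name is hypothetical). Note that a hit does not change the eviction order under FIFO.

```python
from collections import deque


def simulate_fifo(references, capacity=4):
    """Return 'Hit'/'Miss' for each reference under FIFO replacement."""
    resident = deque()  # blocks in the order they were loaded
    results = []
    for ref in references:
        if ref in resident:
            results.append("Hit")  # FIFO order is not updated on a hit
        else:
            results.append("Miss")
            if len(resident) == capacity:
                resident.popleft()  # evict the block loaded earliest
            resident.append(ref)
    return results


print(" ".join(simulate_fifo("ABCADEADCF")))
# Miss Miss Miss Hit Miss Miss Miss Hit Hit Miss
```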
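LRU example (cache holding four blocks; each column lists the blocks from most to least recently used)
CPU reference:  A     B     C     A     D     E     A     D     C     F
Result:         Miss  Miss  Miss  Hit   Miss  Miss  Hit   Hit   Hit   Miss
Order 1 (MRU):  A     B     C     A     D     E     A     D     C     F
Order 2:              A     B     C     A     D     E     A     D     C
Order 3:                    A     B     C     A     D     E     A     D
Order 4 (LRU):                          B     C     C     C     E     A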
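The LRU trace can be reproduced in Python with an `OrderedDict` kept in recency order, assuming a cache of four blocks (the function name is hypothetical). Each snapshot lists the resident blocks from most to least recently used, matching one column of the table.

```python
from collections import OrderedDict


def simulate_lru(references, capacity=4):
    """Return (results, snapshots) under LRU replacement."""
    resident = OrderedDict()  # keys ordered from least to most recently used
    results, snapshots = [], []
    for ref in references:
        if ref in resident:
            resident.move_to_end(ref)  # now the most recently used
            results.append("Hit")
        else:
            if len(resident) == capacity:
                resident.popitem(last=False)  # evict the least recently used block
            resident[ref] = None
            results.append("Miss")
        snapshots.append(list(reversed(resident)))  # MRU first
    return results, snapshots


results, snapshots = simulate_lru("ABCADEADCF")
print(" ".join(results))  # Miss Miss Miss Hit Miss Miss Hit Hit Hit Miss
print(snapshots[-1])      # ['F', 'C', 'D', 'A']
```

Compared with FIFO, the reference to A after E becomes a hit because the earlier hit on A moved it to the most recently used position.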
Performance Considerations
Overview
Two key factors: performance and cost, i.e. the price/performance ratio.
Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.
For a memory hierarchy, it is beneficial if transfers to and from the faster unit can be done at a rate equal to that of the faster unit.
This is not possible if both the slow and the fast units are accessed in the same manner.
However, it can be achieved when parallelism is used in the organization of the slower unit.
Interleaving
If the main memory is structured as a collection of physically separate modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.
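Two ways to divide an MM (main memory) address between the modules:
- Module (k bits) | Address in module (m bits): the high-order k bits select the module, so consecutive words lie in the same module.
- Address in module (m bits) | Module (k bits): the low-order k bits select the module, so consecutive words lie in consecutive modules (interleaving).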
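A Python sketch of the two address layouts, using hypothetical sizes of four modules (k = 2) with eight words each (m = 3). It shows that with high-order module bits a sequential run of addresses stays in one module, while with low-order module bits it cycles through all modules, which is what lets accesses overlap.

```python
def split_mm_address(address: int, k: int, m: int, interleaved: bool) -> tuple[int, int]:
    """Return (module, address in module) for a (k + m)-bit main memory address."""
    if interleaved:
        # Low-order k bits select the module.
        return address & ((1 << k) - 1), address >> k
    # High-order k bits select the module.
    return address >> m, address & ((1 << m) - 1)


# Hypothetical sizes: 4 modules (k = 2) of 8 words each (m = 3).
for interleaved in (False, True):
    modules = [split_mm_address(a, k=2, m=3, interleaved=interleaved)[0] for a in range(8)]
    print("interleaved" if interleaved else "high-order", modules)
```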
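Average access time with a cache: $T_{ave} = hC + (1 - h)M$, where $h$ is the hit rate, $C$ is the time to access information in the cache, and $M$ is the miss penalty, the time to access information that is not in the cache.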
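Evaluating $T_{ave} = hC + (1 - h)M$ for a few hit rates shows how strongly the miss penalty dominates. The Python sketch below uses hypothetical values of a 1-cycle cache access and a 17-cycle miss penalty, chosen only for illustration.

```python
def average_access_time(h: float, C: float, M: float) -> float:
    """T_ave = hC + (1 - h)M for hit rate h, cache access time C, miss penalty M."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit rate h must be between 0 and 1")
    return h * C + (1 - h) * M


# Hypothetical values: 1-cycle cache access, 17-cycle miss penalty.
for h in (0.90, 0.95, 0.99):
    print(f"h = {h:.2f}: T_ave = {average_access_time(h, C=1, M=17):.2f} cycles")
```

Raising the hit rate from 0.90 to 0.95 cuts the average from 2.6 to 1.8 cycles in this example, since each miss costs 17 times as much as a hit.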