CS 152/252A Computer Architecture and Engineering
Sophia Shao
Lecture 6 – Memory
NASA, Microchip, SiFive Announce Partnership for RISC-V Spaceflight Computing Platform
NASA has confirmed a partnership with Microchip and SiFive to create a space-centric processor built around the free and open-source RISC-V architecture: the High-Performance Spaceflight Computing (HPSC) chip.
https://www.hackster.io/news/nasa-microchip-sifive-announces-partnership-for-risc-v-spaceflight-computing-platform-f52c55cf14f6
Last time in Lecture 4
§ Handling exceptions in pipelined machines by passing
exceptions down pipeline until instructions cross commit
point in order
§ Can use values before commit through bypass network
§ Pipeline hazards can be avoided through software
techniques: scheduling, loop unrolling
§ Decoupled architectures use queues between “access”
and “execute” pipelines to tolerate long memory latency
§ Regularizing all functional units to have the same latency
simplifies more complex pipeline designs by avoiding
structural hazards; this approach extends to in-order
superscalar designs
More Complex In-Order Pipeline
[Pipeline diagram: PC → Inst. Mem → Decode (GPR read) → X1 → X2 (Data Mem) → X3 → W]
In-Order Superscalar Pipeline
[Pipeline diagram: dual-issue fetch and decode; an integer pipeline X1 → X2 (Data Mem) → X3 → W reading the GPRs, and a floating-point pipeline X1 → X2 (FAdd) → X3 → W reading the FPRs]
Early Read-Only Memory Technologies
[Photos: IBM Balanced Capacitor ROS; IBM Card Capacitor ROS]
Early Read/Write Main Memory Technologies
[Photos: Babbage, 1800s: digits stored on mechanical wheels; Williams Tube, Manchester Mark 1, 1947; rotating magnetic drum memory on the IBM 650, 1954]
MIT Whirlwind Core Memory, 1950
Core Memory
§ Core memory was the first large-scale reliable main memory
– invented by Forrester in the late 1940s/early 1950s at MIT for the Whirlwind project
§ Bits stored as magnetization polarity on small ferrite cores threaded onto a two-dimensional grid of wires
§ Coincident current pulses on the X and Y wires would write a cell and also sense its original state (destructive reads)
§ Robust, non-volatile storage
§ Used on Space Shuttle computers
§ Cores threaded onto wires by hand (25 billion a year at peak production)
§ Core access time ~1µs
[Photo: DEC PDP-8/E board, 4K words × 12 bits (1968)]
Semiconductor Memory
§ Semiconductor memory began to be competitive in the early 1970s
– Intel was formed to exploit the market for semiconductor memory
– Early semiconductor memory was Static RAM (SRAM). SRAM cell internals are similar to a latch (cross-coupled inverters).
One-Transistor Dynamic RAM [Dennard, IBM]
[1-T DRAM cell diagram: the word line gates a single access transistor connecting the bit line to a storage capacitor (built as a FET gate, trench, or stack), e.g., TiN top electrode at VREF, Ta2O5 dielectric, poly-W bottom electrode]
Modern DRAM Structure
[Block diagram: an (N+M)-bit address arrives multiplexed over ~12 address pins. The N-bit row address drives a row decoder that selects one of 2^N rows of memory cells (one bit each); sense amplifiers latch the whole row; the M-bit column address then drives a column decoder that selects the D data bits. Each DRAM chip also has ~7 clock and control signals and a 4b/8b/16b/32b data bus]
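As a concrete sketch of the multiplexed addressing above (a minimal illustration in Python; the row/column widths are assumptions, not taken from any particular part):

```python
# A DRAM chip delivers an (N+M)-bit cell address over ~12 shared address
# pins in two halves: the row half with RAS, then the column half with CAS.
N, M = 12, 10  # assumed: 2^12 rows x 2^10 columns per bank

def split_address(addr):
    """Split a flat cell address into the two halves sent over the
    multiplexed address pins."""
    row = (addr >> M) & ((1 << N) - 1)  # high bits: row address (RAS phase)
    col = addr & ((1 << M) - 1)         # low bits: column address (CAS phase)
    return row, col

row, col = split_address(0x2A5F3)
print(f"row={row:#x} (RAS), col={col:#x} (CAS)")
```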
DRAM Packaging, Apple M1
§ 128b databus, running at 4.2Gb/s
§ 68GB/s bandwidth
DRAM Operation
§ Three steps in read/write access to a given bank
§ Row access (RAS)
– decode row address, enable addressed row (often multiple Kb in row)
– bitlines share charge with storage cell
– small change in voltage detected by sense amplifiers which latch whole row of bits
– sense amplifiers drive bitlines full rail to recharge storage cells
§ Column access (CAS)
– decode column address to select small number of sense amplifier latches (4, 8, 16,
or 32 bits depending on DRAM package)
– on read, send latched bits out to chip pins
– on write, change sense amplifier latches which then charge storage cells to
required value
– can perform multiple column accesses on same row without another row access
(burst mode)
§ Precharge
– charges bit lines to known value, required before next row access
§ Each step has a latency of around 15-20ns in modern DRAMs
§ Various DRAM standards (DDR, RDRAM) have different ways of
encoding the signals for transmission to the DRAM, but all share
the same core architecture
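To make the three-step protocol concrete, here is a minimal sketch of one bank with an open-row policy; the ~15ns per-step latency comes from the text above, while the rest of the model is an illustrative assumption:

```python
# Toy model of a single DRAM bank: row access (RAS), column access (CAS),
# and precharge each cost ~15 ns; accesses to the already-open row pay
# only the column-access latency (burst mode).
T_RAS = T_CAS = T_PRE = 15  # ns per step (~15-20 ns in modern DRAMs)

class Bank:
    def __init__(self):
        self.open_row = None            # row latched in the sense amplifiers

    def access(self, row):
        """Return the latency (ns) of reading one column from `row`."""
        if self.open_row == row:        # row-buffer hit: CAS only
            return T_CAS
        latency = T_RAS + T_CAS         # row miss: activate the row, then CAS
        if self.open_row is not None:   # another row open: precharge first
            latency += T_PRE
        self.open_row = row
        return latency

bank = Bank()
print(bank.access(7))  # 30 ns: RAS + CAS on a closed bank
print(bank.access(7))  # 15 ns: same-row (burst-mode) access, CAS only
print(bank.access(9))  # 45 ns: precharge + RAS + CAS
```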
Double-Data Rate (DDR2) DRAM
[Timing diagram: data is transferred on both edges of a 200MHz clock, giving a 400Mb/s per-pin data rate. From the Micron 256Mb DDR2 SDRAM datasheet]
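A quick sanity check of the data-rate figures on the last two slides (the factor of 2 is the double-data-rate transfer on both clock edges):

```python
# DDR2: data toggles on both edges of the clock.
clock_mhz = 200
print(clock_mhz * 2, "Mb/s per pin")    # 400 Mb/s, matching the datasheet figure

# Apple M1: 128-bit data bus at 4.2 Gb/s per pin.
bus_bits, pin_gbps = 128, 4.2
print(bus_bits * pin_gbps / 8, "GB/s")  # 67.2 GB/s, i.e., the quoted ~68 GB/s
```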
Computer Architecture Terminology
Latency (in seconds or cycles): Time taken for a single
operation from start to finish (initiation to usable result)
Bandwidth (in operations/second or operations/cycle): Rate
at which operations can be performed
Occupancy (in seconds or cycles): Time during which the
unit is blocked on an operation (structural hazard)
Note, for a single functional unit:
§ Occupancy can be much less than latency (how?)
§ Occupancy can be greater than latency (how?)
§ Bandwidth can be greater than 1/latency (how?)
§ Bandwidth can be less than 1/latency (how?)
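One possible answer to the bandwidth questions, as a toy model: a pipelined unit can accept a new operation every cycle (occupancy of one cycle) even though each operation takes three cycles to complete, so sustained bandwidth exceeds 1/latency. The numbers here are assumptions for illustration:

```python
LATENCY, OCCUPANCY = 3, 1  # cycles (assumed pipelined functional unit)

def finish_times(n_ops):
    """Completion cycle of each of n_ops back-to-back operations."""
    return [i * OCCUPANCY + LATENCY for i in range(n_ops)]

times = finish_times(10)
print(times)           # [3, 4, 5, ..., 12]: one result per cycle after fill
print(10 / times[-1])  # ~0.83 ops/cycle, versus 1/latency = 0.33
```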
CS152 Administrivia
§ HW1 released
– Due Today
§ Lab1 released
– Due Feb 09
§ Lab reports must be readable English summaries – not dumps of log files!
– We will reward good reports, and penalize undecipherable reports
– Page limit (check lab spec/Ed)
§ Lecture Ed thread
– One thread per lecture
– Post your questions following the format:
• [Slide #] Your question
– The staff team will address and clarify the questions asynchronously.
§ Tell us what you think
– http://tinyurl.com/cs152feedback
CS252 Administrivia
§ CS252 Readings on
– https://ucb-cs252-sp23.hotcrp.com/u/0/
– Use hotcrp to upload reviews before Wednesday:
• Write one paragraph on the main content of the paper, including its good/bad points
• Also, answer/ask 1-3 questions about the paper for discussion
• First two: “360 Architecture”, “VAX11-780”
– 2-3pm Wednesday, Soda 606/Zoom
§ CS252 Project Timeline
– Proposal Wed Feb 22
– Use 252A GSIs (Abe and Prashanth) and my OHs to get feedback.
CPU-Memory Bottleneck
[Diagram: CPU connected to Memory]
Processor-DRAM Gap (latency)
[Log-scale plot, 1980-2000: processor performance improves at ~60%/year while DRAM latency improves at ~7%/year, so the processor-memory performance gap grows at ~50%/yr]
Four-issue 3GHz superscalar accessing 100ns DRAM could execute 1,200 instructions during the time for one memory access!
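The 1,200-instruction figure checks out:

```python
issue_width = 4          # instructions per cycle
clock_hz = 3e9           # 3 GHz
dram_latency_s = 100e-9  # 100 ns memory access
print(issue_width * clock_hz * dram_latency_s)  # 1200.0 instructions
```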
Physical Size Affects Latency
[Diagram: a CPU next to a small memory vs. a CPU next to a big memory; signals must travel farther across the larger array, so the bigger memory is slower]
Memory Hierarchy
[Diagram: CPU connected to a small, fast memory (RF, SRAM), which in turn connects to a big, slow memory (DRAM)]
Management of Memory Hierarchy
§ Small/fast storage, e.g., registers
– Address usually specified in instruction
– Generally implemented directly as a register file
• but hardware might do things behind software’s
back, e.g., stack management, register renaming
Real Memory Reference Patterns
[Figure: memory address (one dot per access) vs. time. Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)]
Typical Memory Reference Patterns
Instruction
fetches
subroutine subroutine
call return
Stack
accesses
argument access
cess
r ac
Data ecto
v
accesses scalar accesses
Time
27
Two predictable properties of memory references:
§ Temporal Locality: If a location is referenced, it is likely to be referenced again in the near future.
§ Spatial Locality: If a location is referenced, it is likely that nearby locations will be referenced in the near future.
Memory Reference Patterns
[Figure: the Hatfield/Gerald address-vs-time plot annotated with temporal locality (repeated accesses to the same addresses over time) and spatial locality (accesses clustered at nearby addresses). Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)]
Caches exploit both types of predictability:
§ Exploit temporal locality by remembering the contents of recently accessed locations.
§ Exploit spatial locality by fetching blocks of data around recently accessed locations.
Inside a Cache
[Diagram: the processor sends addresses to the cache and exchanges data with it; on a miss the cache forwards the address to main memory and transfers the data. Each cache entry holds an address tag alongside its data block, a copy of the corresponding memory block]
Cache Algorithm (Read)
Look at the processor address and search the cache tags to find a match. Then either:
§ HIT: return a copy of the data from the cache
§ MISS: read the block of data from main memory, install it in the cache, and return the data
Direct-Mapped Cache
[Diagram: the address splits into a t-bit tag, a k-bit index, and a b-bit block offset; the index selects one of 2^k lines, each holding a valid bit, a tag, and a data block; the stored tag is compared (=) with the address tag to signal a hit]
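A minimal sketch of the direct-mapped lookup (including the read algorithm from the earlier slide); the block and index sizes are assumptions for illustration:

```python
B, K = 4, 8  # assumed: 16-byte blocks (b=4 offset bits), 2^8 = 256 lines
lines = [{"valid": False, "tag": None, "data": None} for _ in range(2 ** K)]

def lookup(addr):
    """Return (hit, line): the index picks one line, the stored tag must match."""
    index = (addr >> B) & ((1 << K) - 1)  # k middle bits select one of 2^k lines
    tag = addr >> (B + K)                 # t high bits compared with stored tag
    line = lines[index]
    return (line["valid"] and line["tag"] == tag), line

def fill(addr, data):
    """On a miss: install the block fetched from main memory."""
    index = (addr >> B) & ((1 << K) - 1)
    lines[index].update(valid=True, tag=addr >> (B + K), data=data)

hit, _ = lookup(0x1234)          # cold miss
fill(0x1234, "block from memory")
hit, _ = lookup(0x1234)
print(hit)                       # True: valid bit set and tags match
```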
Direct-Map Address Selection
[Diagram: the same 2^k-line direct-mapped cache indexed with higher-order vs. lower-order address bits, with the remaining t bits compared (=) as the tag]
2-Way Set-Associative Cache
[Diagram: the address splits into tag (t bits), index (k bits), and block offset (b bits); the index selects one set holding two {valid, tag, data block} ways; both stored tags are compared (=) in parallel, and a match selects the data word or byte and raises HIT]
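Extending the direct-mapped sketch above to two ways: both tags in the selected set are checked (sequentially here, in parallel in hardware), with LRU replacement as an assumed policy:

```python
B, K, WAYS = 4, 7, 2  # assumed: 2^7 sets x 2 ways = 256 blocks total
cache = [[{"valid": False, "tag": None} for _ in range(WAYS)]
         for _ in range(2 ** K)]

def access(addr):
    """Return True on a hit; on a miss, evict the LRU way. MRU is kept first."""
    index = (addr >> B) & ((1 << K) - 1)
    tag = addr >> (B + K)
    ways = cache[index]
    for i, way in enumerate(ways):
        if way["valid"] and way["tag"] == tag:
            ways.insert(0, ways.pop(i))  # move the hit way to the MRU slot
            return True
    ways.pop()                           # evict the LRU (last) way
    ways.insert(0, {"valid": True, "tag": tag})
    return False

# Two addresses that map to the same set coexist in a 2-way cache:
print(access(0x100), access(0x900), access(0x100))  # False False True
```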
Fully Associative Cache
[Diagram: no index bits; every line's {valid, tag, data block} entry compares (=) its stored tag against the address tag in parallel; any match raises HIT and the offset (b bits) selects the data word or byte]
Acknowledgements
§ This course is partly inspired by previous MIT 6.823 and
Berkeley CS252 computer architecture courses created by
my collaborators and colleagues:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)