CENG 3420
Computer Organization & Design
Lecture 13: Memory Organization-1
Bei Yu
CSE Department, CUHK
[email protected] (Textbook: Chapters 5.1–5.2 & A.8–A.9)
Spring 2022
Introduction
Review: Major Components of a Computer
[Figure: a computer consists of a Processor (Control and Datapath), Memory (Cache, Main Memory, Secondary Memory (Disk)), and Devices (Input and Output).]
Why We Need Memory?
Combinational Circuit:
• Always gives the same output for a given set of inputs
• E.g., adders
Sequential Circuit:
• Store information
• Output depends on stored information
• E.g., counter
• Need a storage element
Who Cares About the Memory Hierarchy?
[Figure: Processor-DRAM Memory Performance Gap. Performance (log scale, 1 to 1000) vs. time (1980 to 2000).]
• Processor ("Moore's Law"): performance grows 60%/yr. (2x/1.5 yr)
• DRAM: performance grows 9%/yr. (2x/10 yrs)
• Processor-Memory Performance Gap: grows 50% / year
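The growth rates quoted in the figure can be checked with a little compound-growth arithmetic. The sketch below is only an illustration: the function name `growth_after` and the 20-year horizon are assumptions made here, and the two rates are the ones stated on the slide.

```python
def growth_after(years, annual_rate):
    """Relative performance after `years` of compound growth at `annual_rate`."""
    return (1.0 + annual_rate) ** years

cpu_rate = 0.60   # processor: 60% per year (from the slide)
dram_rate = 0.09  # DRAM: 9% per year (from the slide)

# How fast the ratio processor/DRAM grows each year.
gap_per_year = (1.0 + cpu_rate) / (1.0 + dram_rate)

# Processor improvement over 1.5 years (the slide quotes "2x/1.5 yr").
cpu_after_18_months = growth_after(1.5, cpu_rate)

# Accumulated gap over a hypothetical 20-year span (1980 to 2000).
gap_after_20_years = growth_after(20, cpu_rate) / growth_after(20, dram_rate)

print(f"gap grows by a factor of {gap_per_year:.3f} per year")
print(f"processor speedup after 1.5 years: {cpu_after_18_months:.2f}x")
print(f"gap after 20 years: {gap_after_20_years:.0f}x")
```

The per-year factor comes out a little below 1.5, which is consistent with the slide's rounded "grows 50% / year".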
Memory System Revisited
• Maximum size of memory is determined by addressing scheme
E.g., 16-bit addresses can address only $2^{16} = 65536$ memory locations
• Most machines are byte-addressable
• each memory address location refers to a byte
• Most machines retrieve/store data in words
• Common abbreviations
• 1k ≈ $2^{10}$ (kilo)
• 1M ≈ $2^{20}$ (Mega)
• 1G ≈ $2^{30}$ (Giga)
• 1T ≈ $2^{40}$ (Tera)
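As a quick check of the addressing arithmetic, the sketch below computes how many locations a k-bit address can reach and prints the result with the abbreviations listed above. The helper names `addressable_locations` and `format_size` are hypothetical, chosen for this example only.

```python
KILO, MEGA, GIGA, TERA = 2**10, 2**20, 2**30, 2**40

def addressable_locations(address_bits):
    """Number of distinct locations a k-bit address can select: 2^k."""
    return 2 ** address_bits

def format_size(count):
    """Express a count using the k/M/G/T abbreviations (powers of two)."""
    for name, unit in (("T", TERA), ("G", GIGA), ("M", MEGA), ("k", KILO)):
        if count >= unit:
            return f"{count / unit:g} {name}"
    return f"{count:g}"

for bits in (16, 32):
    n = addressable_locations(bits)
    print(f"{bits}-bit addresses: {n} locations = {format_size(n)}")
```

On a byte-addressable machine each of these locations is one byte, so a 32-bit address reaches 4 G bytes.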
Simplified View
Data transfer takes place through
• MAR: memory address register
• MDR: memory data register
[Figure: the processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W, MFC, etc.) run between processor and memory.]
• Up to $2^k$ addressable locations
• Word length = n bits
• Several addressable locations (bytes) are grouped into a word
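The MAR/MDR interface can be sketched as a toy model. This is an illustration under simplifying assumptions, not a description of real hardware: the class `SimpleMemory` is hypothetical, each addressable location holds one n-bit word (the byte-to-word grouping is ignored), and control signals such as R/W and MFC are reduced to separate `read` and `write` methods.

```python
class SimpleMemory:
    """Toy memory with a k-bit address and n-bit words, accessed via MAR/MDR."""

    def __init__(self, address_bits, word_bits):
        self.address_bits = address_bits
        self.word_bits = word_bits
        self.cells = [0] * (2 ** address_bits)  # up to 2^k locations
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def _load_mar(self, address):
        if not 0 <= address < len(self.cells):
            raise ValueError(f"address {address} does not fit in {self.address_bits} bits")
        self.mar = address

    def read(self, address):
        """Place the address in MAR, then fetch the word into MDR."""
        self._load_mar(address)
        self.mdr = self.cells[self.mar]
        return self.mdr

    def write(self, address, value):
        """Place the address in MAR and the (truncated) word in MDR, then store it."""
        self._load_mar(address)
        self.mdr = value & ((1 << self.word_bits) - 1)
        self.cells[self.mar] = self.mdr


mem = SimpleMemory(address_bits=4, word_bits=8)
mem.write(3, 0x1FF)          # only the low 8 bits fit in a word
print(hex(mem.read(3)))
```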
Big Picture
Processor usually runs much faster than main memory:
• Small memories are fast, large memories are slow.
• Use a cache memory in the processor to store data that is likely to be used.
Main memory is limited:
• Use virtual memory to increase the apparent size of physical memory by moving
unused sections of memory to disk (automatically).
• A translation between virtual and physical addresses is done by a memory
management unit (MMU)
• To be discussed in later lectures
Characteristics of the Memory Hierarchy
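[Figure: levels of the hierarchy, in order of increasing distance from the processor in access time: Processor, L1$, L2$, Main Memory, Secondary Memory. The (relative) size of the memory grows at each level.]
Unit of transfer between adjacent levels:
• Processor and L1$: 4-8 bytes (word)
• L1$ and L2$: 8-32 bytes (block)
• L2$ and Main Memory: 1 to 4 blocks
• Main Memory and Secondary Memory: 1,024+ bytes (disk sector = page)
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM.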
Memory Hierarchy: Why Does it Work?
Temporal Locality (locality in time)
If a memory location is referenced then it will tend to be referenced again soon
• Keep most recently accessed data items closer to the processor
Spatial Locality (locality in space)
If a memory location is referenced, the locations with nearby addresses will tend to be
referenced soon
• Move blocks consisting of contiguous words closer to the processor
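Both kinds of locality can be observed with a small simulation. The sketch below is an assumption-laden illustration, not a cache design from the lecture: `hit_rate` is a hypothetical helper that keeps the most recently used blocks in a small store and evicts the least recently used one, and the two access patterns are made up for the demonstration.

```python
from collections import OrderedDict

def hit_rate(addresses, block_size, num_blocks):
    """Fraction of accesses found in a small store of recently used blocks."""
    store = OrderedDict()  # block number -> True, ordered oldest to newest
    hits = 0
    for addr in addresses:
        block = addr // block_size        # contiguous addresses share a block
        if block in store:
            hits += 1
            store.move_to_end(block)      # mark as most recently used
        else:
            store[block] = True
            if len(store) > num_blocks:
                store.popitem(last=False) # evict the least recently used block
    return hits / len(addresses)

loop = [0, 1, 2, 3] * 16      # temporal locality: the same words are reused
sequential = list(range(64))  # spatial locality: neighbours are touched next

print("loop, 1-word blocks:      ", hit_rate(loop, block_size=1, num_blocks=4))
print("sequential, 1-word blocks:", hit_rate(sequential, block_size=1, num_blocks=4))
print("sequential, 8-word blocks:", hit_rate(sequential, block_size=8, num_blocks=4))
```

The loop benefits from keeping recent items, while the sequential scan only starts to hit once whole blocks of contiguous words are brought in together.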
Memory Hierarchy
Taking advantage of the principle of locality:
• Present the user with as much memory as is available in the cheapest technology.
• Provide access at the speed offered by the fastest technology
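[Figure: the memory hierarchy, from closest to the processor outward.]
Levels: Registers (in the processor, next to Control and Datapath), On-Chip Cache, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Tape)
Speed, from fastest to slowest: ~1 ns, Tens ns, Hundreds ns to 1 us, Tens ms, Tens sec
Size (bytes), from smallest to largest: Hundreds, Mega's, Giga's, Tera's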
https://youtu.be/p3q5zWCw8J4
Terminology
Random Access Memory (RAM)
Property: comparable access time for any memory location
Block (or line)
the minimum unit of information that is present (or not) in a cache
• Hit Rate: the fraction of memory accesses found in a level of the memory hierarchy
• Miss Rate: the fraction of memory accesses not found in a level of the memory
hierarchy, i.e. 1 - (Hit Rate)
Hit Time
Time to access the block + Time to determine hit/miss
Miss Penalty
Time to replace a block in that level with the corresponding block from a lower level
Hit Time << Miss Penalty
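These terms combine into a simple cost estimate. The sketch below computes hit and miss rates from access counts and then the average access time using the standard textbook relation (hit time + miss rate × miss penalty), which is not stated on this slide. The function names and the numbers (1000 accesses, 950 hits, 1-cycle hit time, 100-cycle miss penalty) are hypothetical values chosen for illustration.

```python
def hit_rate(hits, accesses):
    """Fraction of accesses found in this level."""
    return hits / accesses

def miss_rate(hits, accesses):
    """Fraction of accesses not found in this level: 1 - hit rate."""
    return 1.0 - hit_rate(hits, accesses)

def average_access_time(hit_time, miss_rate_value, miss_penalty):
    """Standard estimate: every access pays hit_time, misses also pay the penalty."""
    return hit_time + miss_rate_value * miss_penalty

# Hypothetical measurements for one level of the hierarchy.
accesses, hits = 1000, 950
hit_time_cycles, miss_penalty_cycles = 1, 100

mr = miss_rate(hits, accesses)
avg = average_access_time(hit_time_cycles, mr, miss_penalty_cycles)
print(f"hit rate = {hit_rate(hits, accesses):.2f}, miss rate = {mr:.2f}")
print(f"average access time = {avg:.2f} cycles")
```

Because Hit Time << Miss Penalty, even a small miss rate dominates the average.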
Bandwidth vs. Latency
Example
• Mary acts FAST but she’s always LATE.
• Peter is always PUNCTUAL but he is SLOW.
Bandwidth:
• the number of bits/bytes per second delivered when transferring a block of
data steadily.
Latency:
• the amount of time to transfer the first word of a block after issuing the access signal.
• Usually measured in “number of clock cycles” or in ns/µs.
Question:
Suppose the clock rate is 500 MHz. What are the latency and the bandwidth,
assuming that each data item is 64 bits?
[Figure: timing diagram with three signals: Clock, Row Access Strobe, and Data (d0, d1, d2).]
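• clock period = 1 / (500 MHz) = $2.0 \times 10^{-9}$ second
• latency = 5 cycles = $5 \times 2.0 \times 10^{-9} = 10^{-8}$ second
• bandwidth = $\frac{8}{2 \times 10^{-9}} = 4 \times 10^{9}$ byte / second (each 64-bit data item is 8 bytes)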
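The arithmetic in this answer can be reproduced in a few lines. In the sketch below the variable names are assumptions of this example; the 5-cycle latency is taken from the answer above, and one 64-bit data item per clock cycle is assumed, as the bandwidth formula implies.

```python
import math

clock_hz = 500e6                 # 500 MHz clock rate
cycle_time = 1.0 / clock_hz      # seconds per clock cycle

latency_cycles = 5               # from the answer above (read off the timing diagram)
latency = latency_cycles * cycle_time

bits_per_item = 64
bytes_per_item = bits_per_item // 8
bandwidth = bytes_per_item / cycle_time   # assumes one data item per cycle

print(f"cycle time = {cycle_time:.1e} s")
print(f"latency    = {latency:.1e} s")
print(f"bandwidth  = {bandwidth:.1e} byte/s")
```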
Information Storage
Storage based on Feedback
• What if we add feedback to a pair of inverters?
• Usually drawn as a ring of cross-coupled inverters
• Stable way to store one bit of information (as long as power is supplied)
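Why the inverter ring stores a bit can be seen by enumerating its possible states. The sketch below is a purely logical model (no timing or voltages), and the names `inverter` and `stable_states` are assumptions of this example: a state (Q, Q') is stable when each inverter's output agrees with the value already on its wire.

```python
def inverter(x):
    """Logical NOT on a single bit."""
    return 1 - x

# Each inverter drives the other one's input: Q' = NOT Q and Q = NOT Q'.
stable_states = [
    (q, q_bar)
    for q in (0, 1)
    for q_bar in (0, 1)
    if inverter(q) == q_bar and inverter(q_bar) == q
]

print(stable_states)
```

Exactly two self-consistent states exist, one for a stored 0 and one for a stored 1, which is what makes the ring a one-bit storage element.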
How to change the value stored?
• Replace inverter with NOR gate
• SR-Latch
QUESTION:
What’s the Q value based on different R, S inputs?
• R=S=1:
• S=0, R=1:
• S=1, R=0:
• R=S=0:
How to remember?
• S: set
• R: reset
• R=S=1: not determined, not allowed
• S=0, R=1: set value to 0
• S=1, R=0: set value to 1
• R=S=0: latch holds current value
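The behaviour listed above can be reproduced by simulating two cross-coupled NOR gates. This is a minimal logical sketch, assuming the usual wiring Q = NOR(R, Q') and Q' = NOR(S, Q); the function names are hypothetical and gate delays are modelled only as repeated evaluation until the outputs stop changing.

```python
def nor(a, b):
    """Two-input NOR gate on bits."""
    return int(not (a or b))

def sr_latch(s, r, q, q_bar):
    """Settle a NOR-based SR latch from state (q, q_bar) under inputs s, r."""
    for _ in range(4):  # a few passes are enough for the outputs to settle
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

state = (0, 1)                                  # start with Q = 0
for s, r, label in [(1, 0, "set"), (0, 0, "hold"), (0, 1, "reset"), (0, 0, "hold")]:
    state = sr_latch(s, r, *state)
    print(f"S={s} R={r} ({label}): Q={state[0]} Q'={state[1]}")

# R=S=1 forces both outputs to 0, so Q and Q' are no longer complementary.
print("S=1 R=1:", sr_latch(1, 1, *state))
```

With R=S=1 both outputs are driven low, which is why that input combination is not allowed.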