The Memory System

The document discusses the memory system, including different memory technologies like RAM, DRAM, and cache memory. It describes the internal organization of memory chips and how they are accessed. It explains the differences between static and dynamic RAM, synchronous DRAM operation, and concepts like latency and bandwidth that impact memory performance.


The Memory System

Adapted for lecture use


Arsitektur dan Organisasi Komputer
Universitas Pertamina
Objectives
 Basic memory circuits
 Organization of the main memory
 Memory technology
 Direct memory access as an I/O mechanism
 Cache memory
Pre-test
Pre-test: Briefly explain the following terms:
1. Volatile vs. non-volatile memory

2. Static RAM and dynamic RAM

3. DDR

4. Cache memory

5. The difference between cache memory and main memory
Basic Concepts
 The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations
 Most modern computers are byte addressable.
[Figure: byte-address assignments within 32-bit words. Word addresses are 0, 4, ..., 2^k - 4; in (a) the big-endian assignment, the bytes of word 0 are addressed 0, 1, 2, 3, while in (b) the little-endian assignment they are addressed 3, 2, 1, 0.]
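The two byte orderings can be seen directly in Python, whose int.to_bytes exposes both (a minimal sketch; the 32-bit value 0x0A0B0C0D is arbitrary):

```python
# A 4-byte word laid out under each byte-address assignment.
value = 0x0A0B0C0D

big = value.to_bytes(4, byteorder="big")        # big-endian: MSB at lowest byte address
little = value.to_bytes(4, byteorder="little")  # little-endian: LSB at lowest byte address

print([hex(b) for b in big])     # ['0xa', '0xb', '0xc', '0xd']
print([hex(b) for b in little])  # ['0xd', '0xc', '0xb', '0xa']
```

In the big-endian assignment the most significant byte occupies the lowest byte address of the word; the little-endian assignment is the mirror image.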


Traditional Architecture

[Figure 5.1. Connection of the memory to the processor: the processor's MAR drives a k-bit address bus and its MDR an n-bit data bus, giving up to 2^k addressable locations with a word length of n bits; control lines (R/W, MFC, etc.) coordinate the transfers.]


Basic Concepts
 “Block transfer” – bulk data transfer
 Memory access time
 Memory cycle time
 RAM – any location can be accessed for a
Read or Write operation in some fixed
amount of time that is independent of the
location’s address.
 Cache memory
 Virtual memory, memory management unit
Semiconductor RAM Memories
Internal Organization of Memory Chips
b7 b7 b1 b1 b0 b0

W0




FF FF
A0 W1




A1
Address Memory
• • • • • • cells
decoder • • • • • •
A2
• • • • • •
A3

W15 •


16 words of 8 bits each: 16x8 memory
org.. It has 16 external connections:
Sense / Write Sense / Write Sense / Write R /W
circuit circuit circuit
addr. 4, data 8, control: 2, CS

power/ground: 2
1K memory cells: 128x8 memory,
Data input/output lines: b 7 b1 b0
external connections: ? 19(7+8+2+2)
1Kx1:? 15 (10+1+2+2) Figure 5.2. Organization of bit cells in a memory chip.
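The connection counts above follow one formula: log2(words) address pins, one pin per data bit, two control pins (R/W and CS), and power plus ground. A small sketch (the helper name is ours):

```python
# Count external connections of a memory chip organized as (words x bits-per-word).
import math

def external_connections(words, bits_per_word):
    addr = int(math.log2(words))          # address pins
    return addr + bits_per_word + 2 + 2   # + data pins + control (R/W, CS) + power/ground

print(external_connections(16, 8))    # 16  (4 + 8 + 2 + 2)
print(external_connections(128, 8))   # 19  (7 + 8 + 2 + 2)
print(external_connections(1024, 1))  # 15  (10 + 1 + 2 + 2)
```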
A Memory Chip

[Figure 5.3. Organization of a 1K x 1 memory chip: the 10-bit address splits into a 5-bit row address, decoded to select one of the 32 word lines W0-W31 of a 32 x 32 memory cell array, and a 5-bit column address that drives a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry, R/W, and CS control the single data input/output line.]


Static Memories
 The circuits are capable of retaining their state as long as power
is applied.
[Figure 5.4. A static RAM cell: two cross-coupled inverters latch complementary values at points X and Y; transistors T1 and T2 connect the cell to the bit lines b and b' when the word line is activated.]


Asynchronous DRAMs
 Static RAMs are fast, but their cells take more chip area and they are more expensive.
 Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely – they need to be refreshed periodically.
[Figure 5.6. A single-transistor dynamic memory cell: transistor T connects the storage capacitor C to the bit line when the word line is activated.]


A Dynamic Memory Chip

[Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip: RAS (Row Address Strobe) latches the row address A20-9 into a row address latch whose decoder selects one row of the 4096 x (512 x 8) cell array; CAS (Column Address Strobe) latches the column address A8-0, whose decoder selects 8 of the sensed bits for the data lines D7-D0 via the Sense/Write circuits, under CS and R/W control.]


Fast Page Mode
 When the DRAM on the previous slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0.
 Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row.
 A latch is added at the output of the sense amplifier
in each column.
 Good for bulk transfer.
Synchronous DRAMs
 The operations of SDRAM are controlled by a clock signal.
[Figure 5.8. Synchronous DRAM: a refresh counter and the row/column address lines feed a row address latch and row decoder on one side of the cell array, and a column address counter and column decoder on the other; read/write circuits and latches connect the array to data input and output registers; a mode register and timing-control block, driven by the clock, RAS, CAS, R/W, and CS, sequences the operations.]

Synchronous DRAMs

[Figure 5.9. Burst read of length 4 in an SDRAM: the row address is latched on RAS and the column address on CAS; the four data words D0-D3 then appear on successive clock cycles with no further CAS pulses.]


Synchronous DRAMs
 No CAS pulses are needed in burst operation.
 Refresh circuits are included (every 64ms).
 Clock frequency > 100 MHz
 Intel PC100 and PC133
Latency and Bandwidth
 The speed and efficiency of data transfers among
memory, processor, and disk have a large impact on
the performance of a computer system.
 Memory latency – the amount of time it takes to
transfer a word of data to or from the memory.
 Memory bandwidth – the number of bits or bytes
that can be transferred in one second. It is used to
measure how much time is needed to transfer an
entire block of data.
 Bandwidth is not determined solely by memory. It is
the product of the rate at which data are transferred
(and accessed) and the width of the data bus.
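As an illustration of that product (the bus figures below are assumed for the example, not taken from the text):

```python
# Bandwidth = transfer rate x data-bus width.
clock_hz = 133e6       # e.g. a PC133-class memory bus
bus_width_bits = 64    # assumed data-bus width

bandwidth_bytes_per_s = clock_hz * bus_width_bits / 8
print(bandwidth_bytes_per_s / 1e6)  # 1064.0 (MB/s)
```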
DDR SDRAM
 Double-Data-Rate SDRAM
 Standard SDRAM performs all actions on the rising
edge of the clock signal.
 DDR SDRAM accesses the cell array in the same
way, but transfers the data on both edges of the
clock.
 The cell array is organized in two banks. Each can
be accessed separately.
 DDR SDRAMs and standard SDRAMs are most
efficiently used in applications where block transfers
are prevalent.
Structures of Larger Memories

[Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips: of the 21 address bits A0-A20, the low-order 19 bits form the internal chip address shared by all chips, while A19 and A20 drive a 2-bit decoder whose outputs act as chip selects for the four rows of chips; each row of four 512K x 8 chips supplies one 32-bit word on D31-24, D23-16, D15-8, and D7-0.]
Memory System Considerations
 The choice of a RAM chip for a given application depends on
several factors:
Cost, speed, power, size…
 SRAMs are faster, more expensive, smaller.
 DRAMs are slower, cheaper, larger.
 Which one for cache and main memory, respectively?
 Refresh overhead – suppose an SDRAM whose cells are organized in 8K rows; 4 clock cycles are needed to access each row, so it takes 8192 × 4 = 32,768 cycles to refresh all rows; at a clock rate of 133 MHz this takes 32,768/(133 × 10^6) = 246 × 10^-6 seconds; with a typical refreshing period of 64 ms, the refresh overhead is 0.246/64 = 0.0038, i.e. less than 0.4% of the total time available for accessing the memory.
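The refresh-overhead figures can be checked directly:

```python
# Refresh overhead of an SDRAM with 8K rows, 4 cycles per row, 133 MHz clock.
rows = 8192
cycles_per_row = 4
clock_hz = 133e6
refresh_period_s = 64e-3              # refresh must complete every 64 ms

refresh_time_s = rows * cycles_per_row / clock_hz   # time spent refreshing
overhead = refresh_time_s / refresh_period_s        # fraction of time lost

print(round(refresh_time_s * 1e6))  # 246 (microseconds)
print(round(overhead, 4))           # 0.0038, i.e. under 0.4%
```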
Memory Controller

[Figure 5.11. Use of a memory controller: the processor issues a full address, R/W, and a request together with the clock; the controller splits the address into row and column parts, generates RAS and CAS at the proper times, and forwards R/W, CS, and the clock to the memory, while data pass directly between processor and memory.]


Read-Only Memories
 Volatile / non-volatile memory
 ROM
 PROM: programmable ROM
 EPROM: erasable, reprogrammable ROM
 EEPROM: can be programmed and erased
electrically

Figure 5.12. A ROM cell.


Flash Memory
 Similar to EEPROM
 Difference: only possible to write an entire
block of cells instead of a single cell
 Low power
 Used in portable equipment
 Implementation of such modules
 Flash cards
 Flash drives
Memory Hierarchy
Speed, Size, and Cost

Memory Hierarchy

[Figure: the memory hierarchy – the CPU and its cache at the top, main memory (shared with the I/O processor) below, and magnetic disks and magnetic tapes at the bottom.]
Cache Memories
Cache
 What is cache? (Page 315)
 Why do we need it?
 Locality of reference (very important)
- temporal
- spatial
 Cache block – cache line
 A set of contiguous address locations of some size
Cache

[Figure 5.14. Use of a cache memory: the cache sits between the processor and the main memory.]

 Replacement algorithm
 Hit / miss
 Write-through / Write-back
 Load through
Cache Memory
 High speed (towards CPU speed)
 Small size (power & cost)
[Figure: the CPU first checks the fast cache; a hit is served from the cache, while a miss fetches the data from the slow main memory.]

With a 95% hit ratio:

Access = 0.95 × Cache + 0.05 × Mem
Cache Memory

[Figure: the CPU issues a 30-bit address to a 1 Gword main memory, but the 1 Mword cache needs only 20 address bits.]
Cache Memory

[Figure: cache locations 00000-FFFFF (1M words) must hold data from main-memory locations 00000000-3FFFFFFF (1G words) – an address mapping is required.]
Direct Mapping

[Figure 5.15. Direct-mapped cache: block j of main memory (Block 0 ... Block 4095) maps onto block j modulo 128 of the cache; each cache block stores a tag identifying which main-memory block it currently holds.]

 Word field, 4 bits: selects one of 16 words (each block has 16 = 2^4 words).
 Block field, 7 bits: points to a particular block in the cache (128 = 2^7 blocks).
 Tag field, 5 bits: compared with the tag bits stored at that cache location to identify which of the 32 main-memory blocks that map there (4096/128 = 32) is resident.

Tag Block Word
5 7 4 Main memory address
Direct Mapping
Tag Block Word
5 7 4 Main memory address

11101,1111111,1100

 Tag: 11101
 Block: 1111111 = 127, block 127 of the cache
 Word: 1100 = 12, word 12 of that block
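This address split can be sketched in a few lines (field widths as in Figure 5.15; the helper name is ours):

```python
# Decode a 16-bit main-memory address for the direct-mapped cache:
# 5 tag bits | 7 block bits | 4 word bits.
def split_direct(addr):
    word = addr & 0xF            # low 4 bits: word within the 16-word block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block number (j mod 128)
    tag = addr >> 11             # top 5 bits: tag
    return tag, block, word

addr = 0b11101_1111111_1100
print(split_direct(addr))  # (29, 127, 12)
```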
Associative Mapping

[Figure 5.16. Associative-mapped cache: any main-memory block (Block 0 ... Block 4095) can be placed in any cache block (Block 0 ... Block 127), each tagged with the block it holds.]

 Word field, 4 bits: selects one of 16 words (each block has 16 = 2^4 words).
 Tag field, 12 bits: identifies which of the 4096 = 2^12 main-memory blocks is resident in that cache block.

Tag Word
12 4 Main memory address
Associative Mapping
Tag Word
12 4 Main memory address

111011111111,1100

 Tag: 111011111111
 Word: 1100 = 12, word 12 of a block in the cache
Set-Associative Mapping

[Figure 5.17. Set-associative-mapped cache with two blocks per set: the 128 cache blocks form 64 sets of two; main-memory block j maps onto set j modulo 64 and may occupy either block of that set.]

 Word field, 4 bits: selects one of 16 words (each block has 16 = 2^4 words).
 Set field, 6 bits: points to a particular set in the cache (128/2 = 64 = 2^6 sets).
 Tag field, 6 bits: compared with the tags of both blocks in the set to check whether the desired block is present (4096/64 = 64 main-memory blocks map to each set).

Tag Set Word
6 6 4 Main memory address
Set-Associative Mapping
Tag Set Word
6 6 4 Main memory address

111011,111111,1100

 Tag: 111011
 Set: 111111 = 63, set 63 of the cache
 Word: 1100 = 12, word 12 of a block in set 63
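The same style of decode with the set-associative field widths (a sketch; the helper name is ours):

```python
# Decode a 16-bit main-memory address for the two-way set-associative cache:
# 6 tag bits | 6 set bits | 4 word bits.
def split_set_assoc(addr):
    word = addr & 0xF           # low 4 bits: word within the block
    s = (addr >> 4) & 0x3F      # next 6 bits: set number (64 sets)
    tag = addr >> 10            # top 6 bits: tag
    return tag, s, word

addr = 0b111011_111111_1100
print(split_set_assoc(addr))  # (59, 63, 12)
```

Note how the same 16-bit address yields different fields under each mapping: the fewer the places a block may go, the more address bits select the location and the fewer remain for the tag.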
Replacement Algorithms
 Difficult to determine which blocks to evict
 Least Recently Used (LRU) block
 The cache controller tracks references to all blocks as computation proceeds.
 Recency counters are incremented or cleared as hits and misses occur.
Replacement Algorithms

 For Associative & Set-Associative Cache


Which location should be emptied when the cache
is full and a miss occurs?
 First In First Out (FIFO)

 Least Recently Used (LRU)

 Distinguish an empty location from a full one
 Valid bit
Replacement Algorithms
CPU Reference:  A    B    C    A    D    E    A    D    C    F
                Miss Miss Miss Hit  Miss Miss Miss Hit  Hit  Miss

Cache (FIFO):   A    A    A    A    A    E    E    E    E    E
                     B    B    B    B    B    A    A    A    A
                          C    C    C    C    C    C    C    F
                                    D    D    D    D    D    D

Hit Ratio = 3 / 10 = 0.3
Replacement Algorithms
CPU Reference:  A    B    C    A    D    E    A    D    C    F
                Miss Miss Miss Hit  Miss Miss Hit  Hit  Hit  Miss

Cache (LRU,     A    B    C    A    D    E    A    D    C    F
 most recent         A    B    C    A    D    E    A    D    C
 first):                  A    B    C    A    D    E    A    D
                                    B    C    C    C    E    A

Hit Ratio = 4 / 10 = 0.4
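Both tables can be reproduced with a small simulation of a 4-block cache (an illustrative sketch, not code from the text):

```python
# Replay a reference string through a fixed-size cache under FIFO or LRU
# replacement and report the hit ratio.
def hit_ratio(refs, capacity, policy):
    cache, hits = [], 0
    for r in refs:
        if r in cache:
            hits += 1
            if policy == "LRU":          # move to most-recently-used position
                cache.remove(r)
                cache.append(r)
        else:
            if len(cache) == capacity:   # evict oldest entry (FIFO) or
                cache.pop(0)             # least-recently-used entry (LRU)
            cache.append(r)
    return hits / len(refs)

refs = list("ABCADEADCF")
print(hit_ratio(refs, 4, "FIFO"))  # 0.3
print(hit_ratio(refs, 4, "LRU"))   # 0.4
```

Under FIFO a hit does not change the eviction order, so the list order is insertion order; under LRU every hit moves the block to the back of the list, so the front is always the least recently used.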
Performance Considerations
Overview
 Two key factors: performance and cost
 Price/performance ratio
 Performance depends on how fast machine
instructions can be brought into the processor for
execution and how fast they can be executed.
 For a memory hierarchy, it is beneficial if transfers between a faster and a slower unit can proceed at the rate of the faster unit.
 This is not possible if both the slow and the fast units are accessed in the same manner.
 However, it can be achieved when parallelism is used in the organization of the slower unit.
Interleaving
 If the main memory is structured as a collection of physically separate modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.

[Figure 5.25. Addressing multiple-module memory systems: (a) consecutive words in a module – the high-order k bits select one of the n modules and the remaining m bits give the address within that module; (b) consecutive words in consecutive modules – the low-order k bits select one of the 2^k modules and the high-order m bits give the address within it.]
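The two layouts differ only in which address bits select the module. A sketch, assuming 4 modules of 1024 words each (figures chosen for illustration):

```python
# Map a flat word address to (module, address-in-module) for the two layouts
# of Figure 5.25.
MODULES, WORDS_PER_MODULE = 4, 1024

def consecutive_in_module(addr):
    # Layout (a): high-order bits pick the module.
    return addr // WORDS_PER_MODULE, addr % WORDS_PER_MODULE

def consecutive_across_modules(addr):
    # Layout (b): low-order bits pick the module, so successive
    # addresses land in successive modules.
    return addr % MODULES, addr // MODULES

for a in (0, 1, 2, 3):
    # e.g. address 1 -> module 0, word 1 in layout (a),
    #                   module 1, word 0 in layout (b)
    print(a, consecutive_in_module(a), consecutive_across_modules(a))
```

Layout (b) is what makes interleaving effective for block transfers: consecutive words come from different modules, so their accesses overlap.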
Hit Rate and Miss Penalty
 The success rate in accessing information at various
levels of the memory hierarchy – hit rate / miss rate.
 Ideally, the entire memory hierarchy would appear to
the processor as a single memory unit that has the
access time of a cache on the processor chip and
the size of a magnetic disk – depends on the hit rate
(>>0.9).
 A miss causes extra time needed to bring the
desired information into the cache.
Performance

Tave = hC + (1 - h)M

 Tave: average access time experienced by the processor
 h: hit rate
 M: miss penalty, the time to access information in the main memory
 C: the time to access information in the cache
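Plugging sample numbers into the formula (the timings below are assumed for illustration, not from the text):

```python
# Average access time: Tave = h*C + (1 - h)*M
def t_ave(h, C, M):
    return h * C + (1 - h) * M

# e.g. 95% hit rate, 1-cycle cache access, 10-cycle miss penalty (assumed)
print(round(t_ave(0.95, 1, 10), 2))  # 1.45 (cycles)
```

Even a small miss rate dominates: here 5% of accesses contribute a third of the average time, which is why improving the hit rate pays off so strongly.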
Performance
How to Improve Hit Rate?
 Use larger cache – increased cost
 Increase the block size while keeping the
total cache size constant.
 However, if the block size is too large, some
items may not be referenced before the block
is replaced – miss penalty increases.
 Load-through approach
