OPERATING SYSTEMS AND NETWORKS
Prof. Dr. Andreas Wölfl
Department of Applied Computer Science,
Deggendorf Institute of Technology
[email protected]
MEMORY MANAGEMENT
MEMORY MANAGEMENT
640K ought to be enough for anybody.
(Bill Gates, 1981)
MEMORY MANAGEMENT
1. CACHING
CACHING
Cache
Hardware or software used to store data temporarily in a computing context.
• Small amount of faster, more expensive memory used to improve the performance of frequently accessed data.
• Local to a cache client.
• Commonly used by the CPU and operating systems.
• Decreases data access times and reduces latency.
[Figure: the cache sits between the CPU and the RAM.]
CACHING
Data access using caches:
• When a cache client attempts to access data, it first checks the cache.
• If the data is found, that is referred to as a cache hit.
• If the data is not found, that is referred to as a cache miss.
• In case of a hit, the data is taken directly from the cache.
• In case of a miss, the data is fetched the ordinary way and put into the cache.
• How data gets evicted from the cache depends on the replacement
algorithm1 .
1
We will learn some algorithms later in the lecture
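To make the hit/miss flow above concrete, here is a minimal sketch of a cache client in C. The direct-mapped layout and the backing_store_read() stand-in for the ordinary (slow) access path are assumptions for illustration only:

#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 256                         /* hypothetical cache capacity */

struct cache_line {
    bool     valid;                             /* does this line hold data?   */
    uint32_t tag;                               /* which address is cached     */
    uint32_t data;
};

static struct cache_line cache[CACHE_LINES];

static uint32_t backing_store_read(uint32_t addr)
{
    return addr * 2u;                           /* dummy slow path (RAM/disk)  */
}

uint32_t cached_read(uint32_t addr)
{
    struct cache_line *line = &cache[addr % CACHE_LINES];

    if (line->valid && line->tag == addr)       /* cache hit: take the data directly  */
        return line->data;

    uint32_t value = backing_store_read(addr);  /* cache miss: fetch the ordinary way */
    line->valid = true;                         /* put it into the cache, evicting    */
    line->tag   = addr;                         /* whatever occupied the line before  */
    line->data  = value;
    return value;
}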
CACHING
Caches used in a modern CPU:
• L1-Cache: Built onto the CPU core itself. Used to speed up frequently and
recently used instructions and data.
• L2-Cache: Separate from the CPU cores, but built onto the CPU package.
Used to speed up references to main memory.
• LLC/L3-Cache: Built onto the CPU package and shared among cores.
Mainly used to tackle the cache coherence problem, which aims to keep all
cached copies of the same memory identical.
CACHING
Performance difference between L1/L2/L3:
• Register: 1 ns
• L1 cache access: 2 ns
• L2 cache access: 5 ns
• L3 cache access: 10 ns
• RAM access: 100 ns
[Figure: cache hierarchy — each core has its own registers, L1 cache, and L2 cache; the L3 cache and the RAM are shared between the cores.]
Example: Intel Core i9 13900K
• L1: 80kB per core.
• L2: 2MB per core.
• LLC: 36MB shared.
MEMORY MANAGEMENT
2. ADDRESS SPACES
ADDRESS SPACES
Major drawbacks of exposing physical memory to processes:
• Processes may trash the operating system (i.e., write garbage to memory
locations in kernel space → compromise the kernel).
• It is difficult to have multiple programs running at the same time (as two
processes may reference the same physical memory location).
Challenge: How can multiple programs execute without interfering with each
other when referencing main memory?
ADDRESS SPACES
Address Space
The set of (virtual) addresses that a process can use to address memory.
• An address space is a kind of abstract memory for programs to live in.
• Isolation: each process is assigned an individual address space.
• The operating system is responsible for mapping the addresses in an address
space to physical memory locations in RAM.
→ For example, address 0x0028 in one program maps to a different physical
location than address 0x0028 in another program.
ADDRESS SPACES
Relocation
Maps each process’s address space onto a different part of physical memory.
• Static Relocation: Add an offset equal to the starting location in memory to
all memory addresses during loading. E.g., for a program starting at 0x4000,
the address 0x0028 is rewritten to 0x4028.
• Dynamic Relocation: Uses two hardware registers, base and limit, which
define consecutive memory locations wherever there is room. Every time a
process references memory, the base value is added to the address. If the
result exceeds the limit, a fault is generated and the access is aborted.
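A minimal sketch of dynamic relocation in C, assuming the hypothetical register values of Process B from the relocation figure that follows (base 0x4000, limit 0x7FF8):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct relocation {
    uint32_t base;                              /* start of the process in physical memory */
    uint32_t limit;                             /* highest physical address it may touch   */
};

/* What the hardware does on every memory reference. */
uint32_t translate(const struct relocation *r, uint32_t vaddr)
{
    uint32_t paddr = r->base + vaddr;           /* add the base value            */
    if (paddr > r->limit) {                     /* result exceeds limit -> fault */
        fprintf(stderr, "fault: access to 0x%04X aborted\n", (unsigned)vaddr);
        exit(EXIT_FAILURE);
    }
    return paddr;
}

int main(void)
{
    struct relocation proc_b = { .base = 0x4000, .limit = 0x7FF8 };
    printf("0x0028 -> 0x%04X\n", (unsigned)translate(&proc_b, 0x0028));  /* prints 0x0028 -> 0x4028 */
}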
ADDRESS SPACES
[Figure: dynamic relocation — Process A (base 0x0000, limit 0x3FFC) and Process B (base 0x4000, limit 0x7FF8) both use program addresses 0x0000–0x3FFC; because the base register is added on every reference, Process B's instructions end up at 0x4000 and above in main memory.]
ADDRESS SPACES
Swapping
Temporarily writing a process’s address space in its entirety to secondary
memory so that main memory can be made available for other processes.
• Simplest strategy to deal with memory overload that occurs if the physical
memory is not large enough to hold all processes.
• Causes idle processes to be stored mostly on disk, so they do not take up
any memory when not running.
• Swap out: Writing the address space to disk and removing it from memory.
• Swap in: Reading the address space from disk and loading it to memory.
ADDRESS SPACES
Methods to select the process to swap out:
• FIFO (First-in First-Out): selects the oldest process.
• More replacement algorithms are discussed later in the lecture.
Methods to select the location where a process is swapped in:
• First fit: Scan the memory bottom-up until a hole is found that is big enough.
• Next fit: Variant of first fit where the scan starts from the place where it left
off the last time (instead of always at the beginning).
• Best fit: Searches the entire memory for the smallest hole that is adequate.
• Worst fit: Searches the entire memory for the largest hole that is adequate.
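The placement strategies can be sketched over a free list; the singly linked list of holes below is an assumed data structure, not something the slide prescribes:

#include <stddef.h>
#include <stdint.h>

struct hole {                                   /* one free region of memory */
    uint32_t     start;
    uint32_t     size;
    struct hole *next;
};

/* First fit: take the first hole that is big enough. */
struct hole *first_fit(struct hole *free_list, uint32_t request)
{
    for (struct hole *h = free_list; h != NULL; h = h->next)
        if (h->size >= request)
            return h;
    return NULL;                                /* no adequate hole found */
}

/* Best fit: search the entire list for the smallest adequate hole. */
struct hole *best_fit(struct hole *free_list, uint32_t request)
{
    struct hole *best = NULL;
    for (struct hole *h = free_list; h != NULL; h = h->next)
        if (h->size >= request && (best == NULL || h->size < best->size))
            best = h;
    return best;
}

Next fit only differs from first fit in where the scan starts; worst fit replaces the < comparison in best fit with >.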
ADDRESS SPACES
[Figure: snapshots of main memory at times t0–t7 — processes A, B, C, and D are swapped in and out above the operating system, leaving holes of unused memory between them.]
ADDRESS SPACES
External Fragmentation
Situation where there is a sufficient quantity of free memory available to satisfy
a process’s memory requirements, but the process cannot be loaded because
the free memory is not contiguous.
• Caused by swapping processes in and out.
• Leads to significantly reduced capacity and performance.
• Techniques such as memory compaction, which relocate all processes
downward as far as possible, are impractical2 .
2
For example, on a 16GB machine that can copy 8 bytes in 8 ns, it would take about 16 seconds to
compact all of memory.
ADDRESS SPACES
Preliminary conclusion: Realizing the concept of individual address spaces in
multiprogramming systems constitutes a major challenge.
Open problems:
• External fragmentation wastes a considerable amount of memory.
• All active processes must fit into memory, limiting the degree of
multiprogramming (swapping actively running processes is impractical).
• A program that is larger than main memory cannot execute.
• Dynamically allocating memory3 is not taken into consideration.
→ Nevertheless, the idea of address spaces is good!
3
Requesting a certain amount of memory from the operating system during program execution.
TASK
Task
Solve the task ”Swapping” in iLearn.
MEMORY MANAGEMENT
3. VIRTUAL MEMORY
VIRTUAL MEMORY
Virtual Memory
A memory allocation scheme in which secondary memory can be addressed as
though it were part of main memory.
• Each process is assigned a full virtual address space4 .
• The virtual address space consists of fixed-sized units called pages.
• The corresponding units in physical memory are called page frames.
• Both pages and page frames are generally the same size5 .
• Pages are mapped on the fly to either page frames or to disk.
4
From 0 to a maximum, even if a lesser amount of RAM is installed.
5
4kB in most operating systems
VIRTUAL MEMORY
[Figure: a 64K virtual address space divided into sixteen 4K pages (0K–4K up to 60K–64K) next to a 32K physical memory divided into eight page frames; each page is either mapped to a page frame in RAM or resides on disk (pages marked X are not mapped to RAM).]
VIRTUAL MEMORY
The mapping is programmed by the operating system in software, but performed
by the Memory Management Unit (MMU) in hardware:
[Figure: the control unit (CU) presents virtual addresses to the MMU; the MMU (with its TLB) translates them and sends physical addresses to the RAM.]
VIRTUAL MEMORY
Example: The program tries to access address 8196₁₀ using the instruction:

MOV EAX, 0x2004 # load memory location 0x2004 into register EAX

• The CU passes the virtual address 8196₁₀ to the MMU.
• The MMU identifies page 2 for 8196₁₀, which maps to page frame 6.
• It transforms the address 8196₁₀ and outputs address 24580₁₀ onto the bus.
• The RAM knows nothing at all about the MMU and just sees a request for
reading address 24580₁₀, which it honors.
TASK
Task
Solve the task ”Address Translation” in iLearn.
VIRTUAL MEMORY
Page Tables
Data structure in RAM maintained by the operating system that holds the
mapping between pages and page frames.
• The page number is used as an index into the page table, yielding the
number of the corresponding page frame as well as a present/absent bit.
• If the present/absent bit is 0, a kernel trap called page fault is generated.
• If the present/absent bit is 1, the page frame number found in the page table
is copied to the most significant 3 bits of the output, along with the 12 bit
offset, which is copied unmodified from the incoming virtual address.
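The lookup can be sketched in a few lines of C. The page table contents below are made up for illustration; only the mapping of page 2 to frame 6 follows the lecture example:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u                         /* 4 kB pages                  */
#define NUM_PAGES   16u                         /* 64 kB virtual address space */

struct pte {
    uint8_t frame;                              /* page frame number           */
    uint8_t present;                            /* present/absent bit          */
};

static const struct pte page_table[NUM_PAGES] = {
    [0] = {1, 1}, [1] = {1, 1}, [2] = {6, 1}, [4] = {4, 1}, [5] = {3, 1},
};

uint32_t mmu_translate(uint32_t vaddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;        /* page number = index into the table */
    uint32_t offset = vaddr % PAGE_SIZE;        /* 12-bit offset, copied unmodified   */

    if (!page_table[page].present) {            /* absent -> page fault trap          */
        fprintf(stderr, "page fault on page %u\n", (unsigned)page);
        exit(EXIT_FAILURE);
    }
    return page_table[page].frame * PAGE_SIZE + offset;
}

int main(void)
{
    printf("%u -> %u\n", 8196u, (unsigned)mmu_translate(8196u));   /* prints 8196 -> 24580 */
}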
VIRTUAL MEMORY
[Figure: internal operation of the MMU — the incoming virtual address 8196₁₀ (binary 0010 000000000100) is split into the virtual page number 2, used as an index into the page table, and a 12-bit offset. The table entry for page 2 holds frame number 6 (binary 110) with present/absent bit 1, so the MMU outputs the physical address 24580₁₀ (binary 110 000000000100); the 12-bit offset is copied directly from input to output. If the present bit were 0, a page fault would be generated.]
VIRTUAL MEMORY
Structure of a page table entry:
[Figure: fields of an entry — page frame number, present/absent bit, protection bits, modified bit, referenced bit.]
• Protection: 3 bits that indicate what kinds of access are permitted (rwx).
• Referenced (R bit): Set if a page is accessed by the current instruction.
• Modified (M bit): Set as soon as a page is written to (stays set until unset).
VIRTUAL MEMORY
What happens in case of a page fault trap?
• The operating system picks a page frame to be replaced.
• If the page frame is not already stored on disk or the M bit is set (page dirty),
the contents of the page frame are written to disk.
• The operating system fetches from disk the absent page that was just
referenced into the page frame just freed.
• Finally, the operating system updates the respective page table
entry6 and restarts the trapped instruction.
6
Writes the page frame number, sets the P/A bit and unsets the M bit.
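Written out as a hedged C sketch, the steps look roughly as follows; the helper functions and data structures are assumptions for illustration, not actual kernel code:

#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES  16
#define NUM_FRAMES  8

struct pte { uint8_t frame, present, modified; };

static struct pte page_table[NUM_PAGES];
static int frame_owner[NUM_FRAMES];                 /* which page occupies each frame */

static int  pick_victim_frame(void) { return 0; }   /* stand-in for the replacement algorithm */
static void write_page_to_disk(int page)            { printf("write back page %d\n", page); }
static void read_page_from_disk(int page, int f)    { printf("load page %d into frame %d\n", page, f); }

void handle_page_fault(int faulting_page)
{
    int frame  = pick_victim_frame();               /* 1. pick a page frame to be replaced */
    int victim = frame_owner[frame];

    if (page_table[victim].modified)                /* 2. dirty page: write it to disk     */
        write_page_to_disk(victim);
    page_table[victim].present = 0;

    read_page_from_disk(faulting_page, frame);      /* 3. fetch the absent page            */

    page_table[faulting_page].frame    = (uint8_t)frame;   /* 4. update the page table entry */
    page_table[faulting_page].present  = 1;
    page_table[faulting_page].modified = 0;
    frame_owner[frame] = faulting_page;
    /* ...then the trapped instruction is restarted. */
}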
VIRTUAL MEMORY
Question: How does the operating system pick the page to be replaced?
VIRTUAL MEMORY
The optimal page frame replacement algorithm:
• Label each page with the number of instructions that will be executed before
that page is first referenced (e.g., one page will not be used for 8 million
instructions, another page will not be used for 6 million instructions).
• On page fault: Pick the page with the highest label (i.e., push the page fault
that will fetch it back as far into the future as possible).
Problem: This algorithm is unrealizable. At the time of the page fault, the
operating system has no way of knowing when each of the pages will be
referenced next.
VIRTUAL MEMORY
Observation:
• Pages that have been heavily in use in the last few instructions will probably
be heavily used again soon.
• Conversely, pages that have not been used for ages will probably remain
unused for a long time.
The Least Recently Used (LRU) algorithm:
• Selects the page that has been unused for the longest time.
• Good approximation to the optimal algorithm (the best that we know).
VIRTUAL MEMORY
Example with 3 page frames (most recently used page on top; Hit = page available, Miss = page fault):

Incoming pages:  1    2    3    2    1    5    2    3
Frame (newest):  -    -    3    2    1    5    2    3
Frame:           -    2    2    3    2    1    5    2
Frame (oldest):  1    1    1    1    3    2    1    5
Result:          Miss Miss Miss Hit  Hit  Miss Hit  Miss
→ Problem: Special hardware required for an efficient implementation.
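A simple software model reproduces the trace above; it keeps the most recently used page at index 0 (efficient hardware support is exactly what this model lacks):

#include <stdio.h>

#define FRAMES 3

static int frames[FRAMES] = { -1, -1, -1 };     /* frames[0] = most recently used  */

/* Returns 1 on a hit, 0 on a page fault. */
int lru_reference(int page)
{
    int pos = -1;
    for (int i = 0; i < FRAMES; i++)
        if (frames[i] == page)
            pos = i;                            /* page is already resident        */

    int hit = (pos >= 0);
    if (pos < 0)
        pos = FRAMES - 1;                       /* miss: evict the least recently used */

    for (int i = pos; i > 0; i--)               /* move the others down one slot   */
        frames[i] = frames[i - 1];
    frames[0] = page;                           /* referenced page becomes newest  */
    return hit;
}

int main(void)
{
    int refs[] = { 1, 2, 3, 2, 1, 5, 2, 3 };    /* reference string from the slide */
    for (int i = 0; i < 8; i++)
        printf("%d: %s\n", refs[i], lru_reference(refs[i]) ? "Hit" : "Miss");
}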
VIRTUAL MEMORY
[Figure: page table entry extended with a software counter, next to the page frame number and the referenced bit.]
The Not Frequently Used algorithm (attempt to simulate LRU in software):
• With each entry in the page table, a counter is associated (initially 0).
• At each clock interrupt, the operating system scans all pages in memory and
adds the R bit to the counter (i.e., the counter roughly keeps track of how
often each page has been referenced).
• Select the page with the lowest counter.
Problem: It never forgets anything → may keep undesired pages.
VIRTUAL MEMORY
[Figure: aging counter of one page table entry over the ticks t0–t5, with the R bit shown next to each value:
t0: 10000000 (R = 1)    t1: 01000000 (R = 0)    t2: 00100000 (R = 0)
t3: 10100000 (R = 1)    t4: 11010000 (R = 1)    t5: 01101000 (R = 0)]
The Aging algorithm (small modification to NFU, simulates LRU quite well):
• Shift the counter to the right before appending the R bit to the left.
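A sketch of the aging step in C; the counter width (8 bits) matches the figure, while the number of frames is an assumption:

#include <stdint.h>

#define NUM_FRAMES 8

static uint8_t age[NUM_FRAMES];                 /* aging counters, initially 0         */
static uint8_t r_bit[NUM_FRAMES];               /* R bits gathered since the last tick */

/* Called periodically: shift the counter right and put the R bit in front. */
void aging_tick(void)
{
    for (int i = 0; i < NUM_FRAMES; i++) {
        age[i]   = (uint8_t)((age[i] >> 1) | (r_bit[i] << 7));
        r_bit[i] = 0;                           /* start collecting R bits anew        */
    }
}

/* On a page fault: the page with the lowest counter is the eviction candidate. */
int aging_select_victim(void)
{
    int victim = 0;
    for (int i = 1; i < NUM_FRAMES; i++)
        if (age[i] < age[victim])
            victim = i;
    return victim;
}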
EXCURSUS
Page replacement in the Linux kernel (roughly described):
• The kernel tracks pages using a pair of LRU lists (active/inactive).
• Pages that have been recently accessed are kept on the ”active” list.
• Pages taken off the ”active” list are added to the ”inactive” list.
• If an ”inactive” page is accessed, it is promoted to the ”active” list.
• Pages are evicted only from the ”inactive” list.
• Important: Newly allocated pages first join the ”inactive” list.
→ Efficient in situations when a program reads sequentially through a file (the
contents are loaded into memory, but will never be used again).
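A toy model of the two-list scheme just described (fixed-size arrays instead of real kernel lists; not actual kernel code):

#define MAX_PAGES 8

static int active[MAX_PAGES],   n_active;
static int inactive[MAX_PAGES], n_inactive;

static int remove_at(int *list, int *n, int i)
{
    int page = list[i];
    for (; i < *n - 1; i++)
        list[i] = list[i + 1];
    (*n)--;
    return page;
}

/* Newly allocated pages first join the inactive list. */
void page_allocated(int page)
{
    if (n_inactive < MAX_PAGES)
        inactive[n_inactive++] = page;
}

/* An accessed inactive page is promoted to the active list. */
void page_accessed(int page)
{
    for (int i = 0; i < n_inactive; i++)
        if (inactive[i] == page && n_active < MAX_PAGES) {
            active[n_active++] = remove_at(inactive, &n_inactive, i);
            return;
        }
}

/* Pages taken off the active list are added back to the inactive list. */
void deactivate_one(void)
{
    if (n_active > 0 && n_inactive < MAX_PAGES)
        inactive[n_inactive++] = remove_at(active, &n_active, 0);
}

/* Pages are evicted only from the inactive list. */
int evict_one(void)
{
    return n_inactive > 0 ? remove_at(inactive, &n_inactive, 0) : -1;
}

Pages that a program touches only once (e.g., a sequential read through a file) never leave the inactive list and are therefore evicted first.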
EXCURSUS
The multi-generational LRU patch of Yu Zhao (Google, 2021)7 :
• Adds more LRU lists to cover a range of page ages (generations).
• Changes the way page scanning is done to reduce its overhead.
This work results in:
• Spectrum of page ages (highest access frequency in the oldest generation).
• The generations are smaller in size (compared to the previous lists).
• Only the youngest generation is scanned → efficiency gain.
Note: Linus Torvalds merged it into kernel 6.1. Benchmarks8 look promising.
7
https://lore.kernel.org/lkml/
[email protected]/T/
8
https://www.phoronix.com/news/MGLRU-Reaches-mm-stable
TASK
Task
Solve the task ”Page Replacement” in iLearn.
VIRTUAL MEMORY
Translation Lookaside Buffers (TLB)
Small table inside the MMU where each entry contains the information about
one page, including the page number, the M bit, the protection bits, and the
page frame.
• Based on the observation that most programs tend to make a large number
of references to a small number of pages (and not the other way around).
• Typically 256 entries, implemented in hardware.
• Can be searched in parallel.
• Huge speedup.
VIRTUAL MEMORY
Example:
Valid Page Number M Bit Protection Page Frame
1 140 1 RW 31
1 20 0 RX 38
1 130 1 RW 29
1 129 1 RW 62
1 19 0 RX 50
1 21 0 RX 45
1 860 1 RW 14
1 861 1 RW 75
VIRTUAL MEMORY
Page frame lookup with TLB:
• When a virtual address is presented to the MMU for translation, the
hardware first checks to see if its page number is present in the TLB.
• If a valid item is found and the access does not violate the protection bits,
the page frame is taken directly from the TLB.
• If the page number is not in the TLB, the MMU detects the miss and does an
ordinary page table lookup.
• It then evicts one of the entries from the TLB and replaces it with the page
table entry just looked up.
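A software model of this lookup; a real TLB searches all entries in parallel in hardware, and the entry layout here is a simplified assumption:

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 256                         /* typical size mentioned above */

struct tlb_entry {
    bool     valid;
    uint32_t page;                              /* virtual page number          */
    uint32_t frame;                             /* corresponding page frame     */
};

static struct tlb_entry tlb[TLB_ENTRIES];
static unsigned next_evict;                     /* trivial replacement pointer  */

/* Returns true on a TLB hit and stores the page frame in *frame. */
bool tlb_lookup(uint32_t page, uint32_t *frame)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;              /* hit: frame taken directly from the TLB  */
            return true;
        }
    return false;                               /* miss: do the ordinary page table lookup */
}

/* After the page table walk, one TLB entry is evicted and replaced. */
void tlb_refill(uint32_t page, uint32_t frame)
{
    tlb[next_evict] = (struct tlb_entry){ .valid = true, .page = page, .frame = frame };
    next_evict = (next_evict + 1) % TLB_ENTRIES;
}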
TASK
Task
Solve the task ”Translation Lookaside Buffer” in iLearn.
MEMORY MANAGEMENT
4. MEMORY ALLOCATION
MEMORY ALLOCATION
Segments of a process’ address space:
• Text: The instructions of the program.
• Data: Values of global and static variables.
• Heap: Area used for dynamic memory allocation.
• Stack: Area used for static memory allocation.
• Kernel Space: The kernel (e.g., system call procedures) is located at the top of the address
space. Device addresses are at the bottom.
[Figure: address space layout from the low address 0x0000 to the high address 0xFFFF — a kernel-reserved region at the bottom, then Text, Data, and Heap, the Stack above them, and Kernel Space at the top.]
MEMORY ALLOCATION
Heap
Memory region from which a running program can request chunks.
Example (in the C programming language):
int* ptr = (int*) malloc(50);  // requests 50 bytes from the OS
free(ptr);                     // releases the 50 bytes to the OS
Note: How heap memory is organized depends on the programming language in
which a program is written.
MEMORY ALLOCATION
Internal Fragmentation
Situation where more memory is allocated than requested.
• Memory allocation is said to be n-byte aligned when memory is allocated in
blocks that are multiples of n bytes.
• Object-oriented programming languages such as C++, Java, and Python use
an 8-byte alignment for object allocation.
• Drawback: Memory alignment wastes memory due to internal fragmentation.
• Aligned access is faster because the bus to memory is typically 64 bits wide.
This implies that misaligned access can require 2 reads from memory.
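A small worked example of 8-byte alignment and the internal fragmentation it causes; the align_up helper is a common idiom, not something taken from the slides:

#include <stddef.h>
#include <stdio.h>

/* Round size up to the next multiple of n (n must be a power of two). */
static size_t align_up(size_t size, size_t n)
{
    return (size + n - 1) & ~(n - 1);
}

int main(void)
{
    size_t requested = 50;
    size_t allocated = align_up(requested, 8);           /* 8-byte alignment */
    printf("requested %zu, allocated %zu, wasted %zu bytes\n",
           requested, allocated, allocated - requested); /* 50, 56, 6        */
}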
MEMORY ALLOCATION
Example:
[Figure: aligned vs. unaligned transfers on an 8-byte wide bus — an aligned byte, half word, or word is read in a single transfer, whereas an unaligned half word or word straddles two transfers (upper and lower bytes read separately).]
MEMORY ALLOCATION
Stack
Memory region where data is added or removed in last-in-first-out (LIFO)
manner.
• Each entry of the stack is a structure called frame.
• A frame stores information about an invocation of a function.
• Keeps track of the nested sequence of currently active procedures.
• The top of the stack is the frame of the running call, i.e., the procedure that
currently has control over the execution.
MEMORY ALLOCATION
Usually9 , a stack frame includes:
• Parameters (variables declared in a procedure signature).
• Local variables (variables declared inside a procedure).
• The return address, i.e., the memory location at which execution continues
once the procedure returns (right after the instruction that caused the invocation).
Note: In many programming languages, variables and parameters hold memory
locations referencing the actual value in heap memory.
9
Varies among programming languages
MEMORY ALLOCATION
Example (The diagram represents the stack during execution of line 2):
 1  int foo(int a, int b) {
 2      return a + b;
 3  }
 4
 5  void bar(int x, int y) {
 6      int n = x + 10;
 7      int z = foo(n, y);
 8      printf("z=%d", z);
 9  }
10
11  int main() {
12      bar(5, 5);
13  }

[Figure: the stack at this point, top to bottom — stack frame of foo(): return address 0x2016, parameters int a = 15, int b = 5; stack frame of bar(): local int n = 15, return address 0x2008, parameters int x = 5, int y = 5; stack frame of main(): locals of main, return address 0x2000, parameters of main.]
TASK
Task
Solve the task ”Fragmentation” in iLearn.
MEMORY MANAGEMENT
5. SUMMARY
SUMMARY
Summary
You should have acquired the competencies to
• Differentiate between different types and levels of memory.
• Explain the notion of an address space and understand why we need it.
• Recite techniques to realize an address space in an operating system.
• Explain and understand the concept of virtual memory.
• Describe the main functions of a Memory Management Unit (MMU).
• Distinguish and assess page replacement algorithms.
• Know the difference between text, data, stack, and heap memory.
• Differentiate between internal and external fragmentation and describe
techniques to tackle them.