1
2
C++ advanced memory management
with allocators.
3
About the speaker
Andrii Radchenko
• C++ Senior Software Engineer, GlobalLogic;
• 4 years of experience in C++;
• Has experience working with Big Data;
• Currently working on a memory-management core
library for a protocol analyzer application.
4
Agenda
1. Memory management details in OS
2. Allocators in C++
3. Memory fragmentation problem and how to solve it in C++17
4. Memory-mapped files. Clustering Big Data – an example of memory
management
5
Attention
• Some concepts, code and other information in this presentation are
not cross-platform.
• Allocators have also changed across C++ standards, so the
implementation depends on the standard version as well.
• Slides with platform-specific code carry an icon for the corresponding
platform or standard.
• The presentation is mostly about development but there
are some info for as well.
66
Part 1. Memory management
details in OS
7
Part 1. Memory management details in OS
1. Memory allocation basics
2. Virtual memory
• Motivation
• Main ideas
• Some implementation details on how it works
• C++ code – low level virtual allocation
• Peculiarities about Linux and Windows virtual memory
88
Memory allocation basics
9
Memory allocation basics
Stack allocation | Heap allocation
Size known at compile time | Size known at runtime
Memory is managed automatically | The programmer must manage allocation and deallocation
Not suitable for big allocations because of the stack size limit | No fixed size limit: you can use up to your RAM + swap size
Exists while the function that created it is running | Exists until deallocation and is usable anywhere, independent of the function that created it
Usually cheaper: allocating is just a stack-pointer adjustment | If you fail to deallocate, your program will have a memory leak
1010
Virtual memory
11
Virtual memory motivation
Using physical RAM addresses directly in each program leads to the following
problems:
1. Not enough RAM to fit the whole address space
2. Holes in the address space
3. Programs writing over each other
12
Not enough RAM
Virtual Address (4 GB)
0x00000000
0xFFFFFFFF
Physical Address (1 GB)
???
13
Holes in the address space
Physical Address (4 GB)
Program 1
Program 2
• Run Program 1, which requires 1 GB of space
• Run Program 2, which requires 2 GB of space
• Exit Program 1
• Now 2 GB of RAM are actually free, but we can't run
any task that requires more than 1 GB of contiguous space
Program 3
Can’t find
2 GB free space
1 GB
2 GB
1 GB
14
Programs writing over each other
Virtual Address (4 GB)
0x1234 1024 | 9999
Program 1
Write to address 0x1234
Program 2
Write to address 0x1234
• Each program can
access any address
• If 2 programs
write to the same
address, the data will be
corrupted
15
Virtual memory motivation
The solution is a separate virtual address space for each process.
• Somehow map each virtual address to RAM (a physical address)
• We can use disk storage when we run out of memory
1616
“All problems in computer science
can be solved by another level of
indirection”
1717
Virtual memory
• Virtual memory is a memory management capability of an OS that allows
temporarily transferring data from random access memory (RAM) to disk
storage.
• The virtual address space can be larger than RAM: active pages stay in RAM,
inactive pages go to disk, and the process still sees contiguous addresses.
18
Virtual memory
Virtual Address
Physical Address
Secondary Memory
A
B
?
19
Virtual memory
Virtual Address
Physical Address
Mapping
20
Virtual memory
Virtual Address
Physical Address
Secondary Memory
A
B
Mapping
21
Virtual memory
Let's consider how this approach helps to solve the 3 problems:
1. Not enough RAM to fit the whole address space.
- When we don't have enough memory, the hard drive is used.
2. Holes in the address space
3. Programs writing over each other
22
Holes in the address space
Physical Address (4 GB)
Program 2 Virtual address
2 GB
Program 3 Virtual address
2 GB
1 GB
2 GB
1 GB
Mapping
Mapping
23
Programs writing over each other
Physical Address (4 GB)
…
Program 2 Virtual address
…
0x1234
…
Program 1 Virtual address
…
0x1234
…
Mapping
Mapping
Write to address
0x1234
Write to address
0x1234
• Each program can access any address in its own virtual address space
• If 2 programs write to the same virtual address, the data is protected:
the addresses map to different physical locations
2424
Implementation details
25
Naive mapping implementation
Virtual Address 4 GB
…
512
514
516
…
Physical Address 1 GB
0
2
…
Mapping
512 0
514 2
516 d
… …
Disk
What is the physical size of the mapping table?
4 GB of address space ~ 2 billion virtual addresses, because
our memory is word-aligned (2-byte words, as in the diagram).
At least one 4-byte pointer for each address ~ 8 GB
26
Page tables
Virtual Address 4 GB
0-4095
4096-8191
8192-12287
12288-16383
…
Physical Address 1 GB
0
4096 - 8191
8192 - 12287
…
Page table
0-4095 4096-8191
4096-8191 8192-12287
8192-12287 disk
… …
Disk
Page size is 4 KB (x64 also supports
2 MB large pages)
What is the physical size of the
mapping table?
4 GB of address space ~ 1 million virtual pages.
At least one 4-byte pointer for each page ~ 4 MB
Page table entry (PTE)
27
Address translation in details
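32-bit address space, 1 GB of RAM installed and 4 KB pages
Virtual address (32 bits): virtual page number (20 bits) + page offset (12 bits)
Physical address (30 bits for 1 GB): physical page number (18 bits) + page offset (12 bits)
The page table maps the virtual page number to the physical page number; the page offset is copied unchanged.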
28
Address translation in details
Translation of address 0x1300
Virtual pages: 0 (0x0000-0x0FFF), 1 (0x1000-0x1FFF), 2 (0x2000-0x2FFF)
Physical pages: 0 (0x0000-0x0FFF), 1 (0x1000-0x1FFF), 2 (0x2000-0x2FFF)
Address 0x1300: page number 1, offset 0x0300
29
Page table
Virtual page number Physical page number
0x0 2
0x1 1
0x2 Disk
0x3 3
0x4 0
… …
0xFFFFF 42
The page table contains the mapping from a 20-bit virtual page number to a RAM page number
• There are 2^20 entries
• Each entry needs 18 bits with 1 GB of RAM installed,
but in practice a 32-bit entry is used.
• Total size = 2^20 × 4 bytes = 4 MB
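The translation and the size estimate can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names `translate`, `kPageTable` and the constants are hypothetical, only the first five entries of the table above are modeled, and an empty entry stands for "Disk".

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint32_t kPageBits = 12;                       // 4 KB pages
constexpr std::uint32_t kPageSize = 1u << kPageBits;          // 4096
constexpr std::uint32_t kOffsetMask = kPageSize - 1;          // low 12 bits
constexpr std::uint64_t kVirtualPages = (1ull << 32) / kPageSize;

// 32-bit address space with 4 KB pages: 2^20 entries, 4 bytes each = 4 MB.
static_assert(kVirtualPages == (1u << 20), "2^20 virtual pages");
static_assert(kVirtualPages * sizeof(std::uint32_t) == 4u * 1024 * 1024, "4 MB table");

// Physical page number, or empty when the page lives on disk.
using Entry = std::optional<std::uint32_t>;

// Hypothetical model of the first five rows of the page table slide.
const std::array<Entry, 5> kPageTable = {{Entry{2}, Entry{1}, Entry{}, Entry{3}, Entry{0}}};

// Returns the physical address, or an empty optional to signal a page fault.
std::optional<std::uint32_t> translate(std::uint32_t virtual_address) {
    const std::uint32_t page = virtual_address >> kPageBits;     // upper 20 bits
    const std::uint32_t offset = virtual_address & kOffsetMask;  // lower 12 bits
    assert(page < kPageTable.size() && "only five entries are modeled");
    const Entry frame = kPageTable[page];
    if (!frame) {
        return std::nullopt;  // the entry says "disk": a page fault would be raised
    }
    return (*frame << kPageBits) | offset;  // the offset is copied unchanged
}
```

With this table, `translate(0x1300)` selects virtual page 1, which maps to physical page 1, while `translate(0x2000)` hits the entry marked as disk and returns an empty optional.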
30
Page fault
• The page table entry says "disk", so the page is actually on disk
• The hardware generates a page fault exception
• The OS page fault handler runs:
- The OS chooses a page in RAM to evict to disk
- It writes this page from RAM to disk if required
- The OS reads the requested page from disk and puts it in RAM
- It updates the page table
• The OS jumps back to the instruction that caused the page fault, which is then re-executed
3131
Virtual memory summary
• Virtual memory is an abstraction that makes it possible to form contiguous
address ranges even when the amount of data is huge;
• It is used when a big allocation is required and RAM is not enough;
• Access is slower than using RAM directly when pages have to be brought in from disk;
• Memory is split into pages to keep the page table small;
• Pay attention on Page size - for your.
3232
Let's dive into code
33
Virtual memory allocation examples
void *p = VirtualAlloc(nullptr, allocation_size, MEM_COMMIT | MEM_RESERVE,
PAGE_READWRITE);
if (p == nullptr)
{
// use GetLastError() to obtain error code
}
// ...
if (VirtualFree(p, 0, MEM_RELEASE) == FALSE)
{
// use GetLastError() to obtain error code
}
Here reserve and commit are
done as a single operation.
34
Virtual memory allocation examples
• To allocate memory in the address space of another process, use the
VirtualAllocEx function.
• VirtualAlloc can also perform MEM_RESERVE and MEM_COMMIT as 2 separate
operations.
• VirtualFree supports MEM_DECOMMIT for returning pages to the reserved state
and MEM_RELEASE for releasing the whole region.
35
Virtual memory allocation examples
void *reserved = VirtualAlloc(nullptr, allocation_size, MEM_RESERVE, PAGE_NOACCESS);
if (reserved == nullptr)
{
// use GetLastError() to obtain error code
}
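// ...
// when memory is required, commit pages inside the reserved range:
void *p = VirtualAlloc(reserved, allocation_size, MEM_COMMIT, PAGE_READWRITE);
if (p == nullptr)
{
// use GetLastError() to obtain error code
}
// ...
// releasing the reservation also decommits its committed pages:
if (VirtualFree(reserved, 0, MEM_RELEASE) == FALSE)
{
// use GetLastError() to obtain error code
}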
// Details in the article “Reserving and Committing Memory” on MSDN.
36
Virtual memory allocation examples
#include <sys/mman.h>
void *p = mmap(nullptr, allocation_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, // MAP_PRIVATE or MAP_SHARED is mandatory
-1, 0); // file descriptor and offset are not required
// because of anonymous mapping
if (p == MAP_FAILED)
{
// errno holds the error code; use perror() to print it
}
// The region is also automatically unmapped when the process is terminated.
// On the other hand, closing the file descriptor does not unmap the region.
if (munmap(p, allocation_size) != 0)
{
// errno holds the error code; use perror() to print it
}
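The mmap()/munmap() pair can also be wrapped in a small RAII class so that the region is always unmapped. The sketch below is an illustration, not part of the original talk: the class name `AnonymousMapping` is hypothetical, and it assumes a POSIX system where `MAP_ANONYMOUS` is available. Note that mmap() requires exactly one of `MAP_PRIVATE` or `MAP_SHARED` in its flags.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <system_error>

// Hypothetical RAII wrapper around an anonymous, private mapping.
class AnonymousMapping {
public:
    explicit AnonymousMapping(std::size_t size) : size_(size) {
        assert(size > 0 && "mmap rejects zero-length mappings");
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            // errno is set by mmap; convert it into a C++ exception
            throw std::system_error(errno, std::generic_category(), "mmap");
        }
        data_ = p;
    }

    ~AnonymousMapping() { munmap(data_, size_); }

    // A mapping has a single owner in this sketch.
    AnonymousMapping(const AnonymousMapping&) = delete;
    AnonymousMapping& operator=(const AnonymousMapping&) = delete;

    void* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    void* data_ = nullptr;
    std::size_t size_;
};
```

Constructing `AnonymousMapping m(1 << 20);` yields one megabyte of zero-filled memory at `m.data()`, which is unmapped when `m` goes out of scope.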
37
Let's compare platform-specific details
Windows | UNIX
A virtual memory range can be just reserved or actually committed | Creating a mapping both reserves the address space and allows access to its contents
Each mapping is a separate allocation. The number of calls to VirtualFree() must match the number of calls to VirtualAlloc() | Mappings can be merged and split as needed by mmap() and munmap()
Mappings always start at a 64 KB-aligned boundary | Mappings can start at an arbitrary page-aligned address. You can map single pages one by one with no ill effects on the address space
No limit on the number of mappings, only performance issues | There is a limit on the number of mappings (pretty low: 65530 by default on Linux). You cannot easily know the number of mappings
You can reserve up to 128 TB of virtual address space; however, the commit limit depends on your physical RAM and swap size | Since all mappings are always committed, the limit on the virtual address space is much lower and depends on the RAM and swap size (by default)
38
Questions and discussion
3939
Part 2. Allocators in C++
40
Part 2. Allocators in C++
1. Classic allocators
• Motivation
• Hello World example
• Disadvantages
2. Polymorphic allocators in C++17
• Main ideas
• Hello World example
3. Comparison of these 2 approaches
4141
Allocators
4242
“All problems in computer science
can be solved by another level of
indirection”
4343
Allocators motivation
A few examples where allocators can be useful:
You need "special" memory:
• Use virtual memory for standard containers;
• Use a separate block of memory for your containers in order to make them
available to other processes;
• On specific hardware or OS, allocation can use a different API (not malloc/free).
Performance for specific cases:
• A big std::list requires a huge number of small allocations and deallocations. It
would be more efficient to do one big allocation instead of the classic flow.
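The std::list point can be checked with a small counting allocator. The sketch below is illustrative only: `CountingAllocator` and the two helper functions are hypothetical names, and the exact counts depend on the standard library implementation, so only a lower bound is claimed for the list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Hypothetical allocator that counts allocate() calls through a shared counter.
template <class T>
struct CountingAllocator {
    using value_type = T;
    std::size_t* calls;  // shared between all rebound copies

    explicit CountingAllocator(std::size_t* counter) noexcept : calls(counter) {
        assert(counter != nullptr);
    }
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept : calls(other.calls) {}

    T* allocate(std::size_t n) {
        ++*calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return a.calls == b.calls;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return !(a == b);
}

// Number of allocate() calls needed to fill a std::list with n ints.
std::size_t count_list_allocations(std::size_t n) {
    std::size_t calls = 0;
    std::list<int, CountingAllocator<int>> items(CountingAllocator<int>{&calls});
    for (std::size_t i = 0; i < n; ++i) items.push_back(static_cast<int>(i));
    return calls;
}

// Number of allocate() calls for a std::vector that reserves up front.
std::size_t count_vector_allocations(std::size_t n) {
    std::size_t calls = 0;
    std::vector<int, CountingAllocator<int>> items(CountingAllocator<int>{&calls});
    items.reserve(n);
    for (std::size_t i = 0; i < n; ++i) items.push_back(static_cast<int>(i));
    return calls;
}
```

For 1000 elements the list needs at least one allocation per node, while the vector with `reserve` gets by with a single block; this gap is what a custom allocator can close.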
44
Allocators motivation
Let's consider the following code in order to understand the purpose of
allocators in C++ in more detail.
std::vector<T> v;
v.reserve(4); // (1) allocate but do not construct
v.push_back(T{}); // (2) construct in already allocated memory
v.push_back(T{}); // (3) construct in already allocated memory
v.clear(); // (4) destroy but do not deallocate the memory
// ...
// the vector's destructor deallocates the memory
4545
Allocation and deallocation
are separated from
construction and destruction
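The four separate steps can be written out by hand with std::allocator and std::allocator_traits, which is roughly what a container does internally. This is a minimal sketch; the function name `separate_steps` is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Performs allocation, construction, destruction and deallocation as
// four distinct steps, the way a container uses its allocator.
std::string separate_steps() {
    using Alloc = std::allocator<std::string>;
    using Traits = std::allocator_traits<Alloc>;
    Alloc alloc;

    // (1) allocate raw storage for 4 strings; no constructor runs here
    std::string* storage = Traits::allocate(alloc, 4);
    assert(storage != nullptr);

    // (2) construct two objects in the already allocated storage
    Traits::construct(alloc, storage, "hello");
    Traits::construct(alloc, storage + 1, "world");

    std::string result = storage[0] + " " + storage[1];

    // (3) destroy the objects; the storage itself stays allocated
    Traits::destroy(alloc, storage + 1);
    Traits::destroy(alloc, storage);

    // (4) give the raw storage back
    Traits::deallocate(alloc, storage, 4);
    return result;
}
```

Only two of the four slots are ever constructed, which mirrors a vector whose capacity is larger than its size.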
4646
Allocators basics
• An allocator is an abstraction over memory management in STL collections.
• Collections don't use new/delete directly: all such calls go
through allocators.
• std::allocator is the default allocator, usually with simple new/delete inside.
• For a user-defined allocator, in theory, only 2 methods are required:
allocate and deallocate
- T* allocate(std::size_t n)
- void deallocate(T* p, std::size_t)
• In practice, you also need constructors, operator== and operator!=,
and a few more details.
C++11
47
Allocators Hello World
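#include <cstdlib> // std::malloc, std::free
#include <limits> // std::numeric_limits
#include <new> // std::bad_alloc
template <class T>
struct Mallocator
{
using value_type = T;
Mallocator() = default;
template <class U> constexpr Mallocator(const Mallocator<U>&) noexcept {}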
T* allocate(std::size_t n) {
if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
throw std::bad_alloc();
if (auto p = static_cast<T*>(std::malloc(n * sizeof(T))))
return p;
throw std::bad_alloc();
}
void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};
C++11
48
Allocators Hello World
template <class T, class U>
bool operator==(const Mallocator <T>&, const Mallocator <U>&) { return true; }
template <class T, class U>
bool operator!=(const Mallocator <T>&, const Mallocator <U>&) { return false; }
// ...
std::vector<int, Mallocator<int>> v{ 1,2,3,4,5,6 };
C++11
49
Allocators disadvantages
• A lot of stuff to implement besides the allocate/deallocate methods:
- operator== and operator!=: required to know whether memory allocated
by the first object can be deallocated by the second
- construct(ptr, args) and destroy(ptr): call the constructor and destructor
- max_size(): returns the maximum allowed allocation size
- Additional using declarations, like pointer, value_type and others
- The type is important: allocate returns T*
• For a particular collection we can't choose the allocator based on runtime
conditions, because it's a template parameter
C++11
Some of these members are deprecated in C++17 (for std::allocator); since C++11, std::allocator_traits supplies defaults for them.
5050
Polymorphic Allocators
51
Polymorphic Allocators
• In order to implement more dynamic behavior with allocators you can use
std::pmr::polymorphic_allocator
• The idea is really simple. A polymorphic allocator holds a pointer to a
memory_resource, which actually works with memory. A memory resource
implements the do_allocate and do_deallocate methods.
• Now the allocator for the same vector type (for example std::vector<int,
std::pmr::polymorphic_allocator<int>>) can be chosen dynamically.
(There is a shortcut: std::pmr::vector<T>.)
• The difference between an allocator and a memory_resource:
- An allocator uses templates
- A memory_resource uses polymorphism
C++17
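A short sketch of what "chosen dynamically" means in practice: the container type stays `std::pmr::vector<int>` while the memory source is picked by a runtime flag. The function names and the 4096-byte buffer size are assumptions made for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Works with any memory resource; the vector type does not change.
int sum_first_n(int n, std::pmr::memory_resource* resource) {
    assert(resource != nullptr);
    std::pmr::vector<int> values(resource);
    for (int i = 1; i <= n; ++i) values.push_back(i);
    return std::accumulate(values.begin(), values.end(), 0);
}

// Picks the resource at runtime: a stack-backed arena or plain new/delete.
int sum_with_runtime_choice(int n, bool use_stack_buffer) {
    std::array<std::byte, 4096> buffer;  // hypothetical arena size
    // If the buffer runs out, this resource falls back to the default resource.
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size());

    std::pmr::memory_resource* chosen = std::pmr::new_delete_resource();
    if (use_stack_buffer) {
        chosen = &arena;
    }
    return sum_first_n(n, chosen);
}
```

With a classic allocator the two branches would need two different vector types, because the allocator is part of the container's template signature.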
52
Allocator vs pmr and memory resource
Allocator | Memory resource
Type of allocator is known at compile time | Type of memory resource is known at runtime; the allocator template parameter is std::pmr::polymorphic_allocator
Based on templates | Based on polymorphism
Overcomplicated: a pile of additional functions to implement | Easy to use: only 3 functions to override (do_allocate, do_deallocate, do_is_equal)
A lot of features and customization options that aren't really useful in most cases | An extremely useful set of features that is simple as well
Has a template type, so allocate returns T* | Type is not important: do_allocate returns void*, and the developer doesn't have to care about types, just work with memory
C++17
5353
Let's dive into code
54
Polymorphic Allocators Hello World
#include <memory_resource>
class HelloWorldResource : public std::pmr::memory_resource
{
void* do_allocate(std::size_t bytes, std::size_t alignment) override;
void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override;
bool do_is_equal(const memory_resource& other) const noexcept override;
};
55
Polymorphic Allocators Hello World
void* HelloWorldResource::do_allocate(std::size_t bytes, std::size_t alignment)
{
return std::pmr::new_delete_resource()->allocate(bytes, alignment);
}
void HelloWorldResource::do_deallocate(void* p, std::size_t bytes, std::size_t alignment)
{
std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
}
bool HelloWorldResource::do_is_equal(const memory_resource& other) const noexcept
{
return std::pmr::new_delete_resource()->is_equal(other);
}
56
Polymorphic Allocators Hello World
HelloWorldResource r;
std::pmr::vector<int> v(1000, &r);
57
Questions and discussion
5858
Part 3. Memory fragmentation problem
and how to solve it in C++17
59
Part 3. Memory fragmentation problem and how to solve it in C++17
1. Fragmentation problem
2. Memory resource in Standard library
• synchronized_pool_resource
• unsynchronized_pool_resource
• monotonic_buffer_resource
3. Benchmarks on the efficiency of these resources
6060
Fragmentation
61
Memory fragmentation problem
The memory can be virtual where page faults are expensive!
6262
Memory resources already implemented
in the standard library
63
Polymorphic Allocators
• Resources returned by global functions:
- new_delete_resource(): uses global operator new and operator delete; it is the initial default resource (used by a default-constructed polymorphic_allocator)
- null_memory_resource(): allocate always throws std::bad_alloc
• synchronized_pool_resource
• unsynchronized_pool_resource
• monotonic_buffer_resource
C++17
64
Pool resource
Upstream
resource
65
Synchronized and unsynchronized versions
A synchronized_pool_resource may be accessed from multiple threads without
external synchronization and may have thread-specific pools to reduce
synchronization costs. An unsynchronized_pool_resource class may not be
accessed from multiple threads simultaneously and thus avoids the cost of
synchronization entirely in single-threaded applications.
66
Monotonic buffer resource
do_deallocate does nothing!
Freed memory is not reused; it is returned upstream only when the resource is destroyed or release() is called
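This behaviour is easy to observe directly. In the sketch below (the function name and the 256-byte buffer are assumptions for illustration) a block is allocated, deallocated and allocated again; a monotonic resource hands out a new address the second time because the freed block is never recycled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Returns true if the second allocation reuses the block freed by the first.
bool monotonic_reuses_freed_block() {
    std::array<std::byte, 256> buffer;
    // null_memory_resource as upstream: any attempt to grow would throw,
    // so every allocation below must come from the local buffer.
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    void* first = arena.allocate(64);
    arena.deallocate(first, 64);  // no effect for a monotonic resource
    void* second = arena.allocate(64);

    assert(first != nullptr && second != nullptr);
    return first == second;
}
```

The function returns false: the resource only moves its internal pointer forward, which makes allocation very cheap at the cost of holding memory until the resource is released.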
67
Upstream resource
• The three memory resources above have constructors that take a
std::pmr::memory_resource* upstream parameter.
• The upstream resource makes it possible to combine sophisticated memory models
like the pool or monotonic resources with a custom memory_resource instead of
new/delete.
• The constructors that do not take an upstream memory resource pointer use the
return value of std::pmr::get_default_resource() as the upstream memory
resource.
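To make the upstream relationship concrete, the sketch below puts a counting resource underneath a monotonic_buffer_resource and fills a std::pmr::list through it. `CountingResource` and `upstream_calls_for_list` are hypothetical names; the exact number of upstream calls depends on the library's growth policy, so only rough bounds should be expected.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>

// Hypothetical resource that forwards to another one and counts allocations.
class CountingResource : public std::pmr::memory_resource {
public:
    explicit CountingResource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_(upstream) {
        assert(upstream != nullptr);
    }
    std::size_t allocations() const noexcept { return allocations_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++allocations_;
        return upstream_->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        upstream_->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
};

// Fills a list of n ints through a monotonic resource and reports how many
// times the monotonic resource had to ask its upstream for memory.
std::size_t upstream_calls_for_list(std::size_t n) {
    CountingResource counter;
    {
        std::pmr::monotonic_buffer_resource arena(&counter);
        std::pmr::list<int> items(&arena);
        for (std::size_t i = 0; i < n; ++i) items.push_back(static_cast<int>(i));
    }  // the list is destroyed first, then the arena returns its blocks upstream
    return counter.allocations();
}
```

For 10000 list nodes the upstream resource is asked for memory far fewer than 10000 times, because the monotonic resource requests progressively larger blocks and carves the nodes out of them.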
68
Examples
[Chart: Filling and destroying std::list. X axis: list size (16384 to 1048576); Y axis: time, ms (0 to 100); series: Monotonic resource, Default new/delete]
69
Examples
[Chart: Filling and destroying std::unordered_map. X axis: map size (16384 to 1048576); Y axis: time, ms (0 to 400); series: Monotonic resource, Default new/delete]
70
Examples
[Chart: Writing std::list. X axis: list size (16384 to 1048576); Y axis: time, ms (0 to 4.5); series: Monotonic resource, Default new/delete, Pool]
71
Examples
[Chart: Writing std::unordered_map. X axis labeled "List size" (16384 to 1048576); Y axis: time, ms (0 to 45); series: Monotonic resource, Default new/delete, Pool]
72
Examples
[Chart: Reading std::list. X axis: list size (16384 to 1048576); Y axis: time, ms (0 to 5); series: Monotonic resource, Default new/delete, Pool]
73
Examples
[Chart: Reading std::unordered_map. X axis labeled "List size" (16384 to 1048576); Y axis: time, ms (0 to 45); series: Monotonic resource, Default new/delete, Pool]
7474
Let's debug a bit to see how the monotonic
resource works
75
Questions and discussion
7676
Part 4. Clustering Big Data.
Memory-mapped files
An example of memory management
77
Part 4. Clustering Big Data. An example of memory management
1. Memory-mapped files mechanism
2. Data clustering
• What is it? Task
• Main problems and their solutions
• Algorithm
3. Implementation
7878
Memory-mapped files
79
Motivation
• RAM + swap size is not enough
• More space on disk is required and
• CPU performance is sometimes less critical than memory consumption
• Virtual memory stores data on disk but we don't have any control and don't
have any access to it
• Communication between processes
80
Memory-mapped files
Physical memory
Process 1 Virtual memory
Process 2 Virtual memory
File on disk
File mapping
81
Memory-mapped files
The idea of mapping to disk is similar to virtual memory, but there are some
differences:
• All data is mapped to a particular physical file instead of the swap file;
• You can leave the data on disk and load it later;
• You can create views in order to use parts of files;
• The size is bounded by the disk and the virtual address space rather than by RAM + swap, so you can fully use the hard drive.
82
Memory-mapped files workflow in Windows
1. Open file: ::CreateFile(…)
2. Create mapping: ::CreateFileMapping(…)
3. Create views: ::MapViewOfFile(…) ::MapViewOfFile(…) ::MapViewOfFile(…)
4. Destroy all file views: ::UnmapViewOfFile(…) ::UnmapViewOfFile(…) ::UnmapViewOfFile(…)
5. Close mapping and file handles: ::CloseHandle(…)
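The same open, map, use, unmap, close sequence has a close POSIX counterpart. The sketch below is not the Windows workflow from the slide; it is an analogous illustration that assumes a POSIX system and a writable /tmp directory, and the function name is hypothetical. POSIX has no separate "create mapping" object: mmap() creates the view directly from the file descriptor.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Writes text into a file through a shared mapping, then reads it back
// with an ordinary file read. Returns an empty string on any failure.
std::string round_trip_through_mapped_file(const std::string& text) {
    assert(!text.empty());
    char path[] = "/tmp/mmap_sketch_XXXXXX";  // hypothetical temporary file
    const int fd = mkstemp(path);             // open file
    if (fd == -1) return {};

    std::string result;
    // The file must be as large as the view before it can be mapped.
    if (ftruncate(fd, static_cast<off_t>(text.size())) == 0) {
        void* view = mmap(nullptr, text.size(), PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);  // create view
        if (view != MAP_FAILED) {
            std::memcpy(view, text.data(), text.size());  // plain memory write
            msync(view, text.size(), MS_SYNC);            // flush to the file
            munmap(view, text.size());                    // destroy view

            result.resize(text.size());
            const ssize_t n = pread(fd, result.data(), result.size(), 0);
            if (n != static_cast<ssize_t>(result.size())) result.clear();
        }
    }
    close(fd);     // close file handle
    unlink(path);  // remove the temporary file
    return result;
}
```

The data written through the pointer ends up in the file itself, which is what lets a distance matrix larger than RAM + swap live on disk while still being addressed like memory.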
8383
Advanced example:
k-medoids clustering algorithm
84
Clustering
85
Formulation of the problem
Input:
• Dataset
- Euclidean points (usually 2D or 3D vectors)
- Strings
- Complex objects like molecules, genes or other domain-specific objects
• Distance metric
- Symmetric function between points
Output:
• Classification for each point: in fact, just the number of its cluster
• Points in one cluster should be as close as possible with respect to the
given distance function
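The input/output contract can be illustrated with the assignment step that medoid-based clustering repeats: given a distance function and a set of medoid indices, label every point with its nearest medoid. This is a sketch of that single step only, not the full k-medoids algorithm used in the talk; the names `Distance` and `assign_to_nearest_medoid` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Distance between two points identified by their indices in the dataset.
using Distance = std::function<double(std::size_t, std::size_t)>;

// Returns one label per point: the position (in `medoids`) of the closest medoid.
std::vector<std::size_t> assign_to_nearest_medoid(
    std::size_t point_count,
    const std::vector<std::size_t>& medoids,
    const Distance& distance) {
    assert(!medoids.empty());
    std::vector<std::size_t> labels(point_count, 0);
    for (std::size_t p = 0; p < point_count; ++p) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < medoids.size(); ++c) {
            const double d = distance(p, medoids[c]);
            if (d < best) {  // strictly closer medoid found
                best = d;
                labels[p] = c;
            }
        }
    }
    return labels;
}
```

Because the function only sees indices and a distance callback, it works equally for Euclidean points, strings or domain-specific objects, and the callback can be backed by a precalculated matrix.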
86
Problems
• Usually the input dataset is a kind of Big Data;
• The distance function is usually hard to calculate, especially for complex objects;
• Precalculating all distances is efficient, but requires a huge amount of
memory;
• Of course, there are algorithmic and accuracy problems as well, but they aren't
considered here.
87
Solution
• Let's precalculate the distance matrix using memory-mapped files, because the
whole matrix can't be placed into RAM or even into RAM + swap;
• The k-medoids algorithm uses the distance matrix;
• Let's compare performance.
88
Solution
• What is usually stored in memory:
- Dataset
- Distance matrix
- Labels (clustering result)
• Let's consider tasks where the dataset itself is not really big. In the simulated dataset it is
100,000 points.
• The distance matrix is large: 100,000 × 100,000 × 8 bytes / 2 ≈ 40 GB (the matrix is symmetric, so half of it is enough).
• Precalculating the distance matrix is a common approach when the metric is complex
and expensive.
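The size estimate and the "store only half" idea can be sketched as follows. The layout is an assumption made for this illustration (a packed upper triangle without the diagonal, 8-byte doubles), and the function names are hypothetical; the talk does not specify the actual layout.

```cpp
#include <cassert>
#include <cstdint>

// Number of stored distances for n points: pairs (i, j) with i < j.
constexpr std::uint64_t packed_entries(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// Size of the packed matrix in bytes, with one double per pair.
constexpr std::uint64_t packed_bytes(std::uint64_t n) {
    return packed_entries(n) * sizeof(double);
}

// Position of the distance between points i and j (i != j) in the packed
// array. Rows are stored one after another: (0,1), (0,2), ..., (1,2), ...
inline std::uint64_t packed_index(std::uint64_t n, std::uint64_t i, std::uint64_t j) {
    assert(i != j && i < n && j < n);
    if (i > j) {  // the metric is symmetric, so order the pair
        const std::uint64_t tmp = i;
        i = j;
        j = tmp;
    }
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// 100,000 points: 39,999,600,000 bytes, i.e. roughly the 40 GB quoted above.
static_assert(packed_bytes(100000) == 39999600000ull, "about 40 GB");
```

An index like this maps directly onto an offset inside a memory-mapped file, so a lookup becomes a single read at `base + packed_index(n, i, j)` on a `double*`.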
89
Algorithm illustration
90
91
pyclustering
The algorithm from an open-source clustering library has been used.
https://github.com/annoviko/pyclustering
92
Questions and discussion
93
Thank you
9494
References
• https://www.youtube.com/watch?v=qcBIvnQt0Bw&list=PLiwt1iVUib9s2Uo5BeYmwkDFUh70fJPxX
• Scott Meyers, Effective STL
• https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
• https://docs.microsoft.com/en-us/windows/win32/memory/reserving-and-committing-memory
• https://rcl-rs-vvg.blogspot.com/2018/08/allocating-memory-on-linux-and-windows.html
• Some talks at CppCon and other conferences:
- https://www.youtube.com/watch?v=nZNd5FjSquk
- https://www.youtube.com/watch?v=IejdKidUwIg
- https://www.youtube.com/watch?v=FLbXjNrAjbc
- https://channel9.msdn.com/events/GoingNative/CppCon-2017/119?term=Pablo%20Halpern
95

C++ Advanced Memory Management With Allocators

  • 1.
  • 2.
    2 C++ advanced memorymanagement with allocators.
  • 3.
    3 About the speaker AndriiRadchenko • C++ Senior Software Engineer, GlobalLogic; • 4 years of experience in C++; • Have an experience working with Big Data; • Currently working on memory-management core library for protocol analyzer application.
  • 4.
    4 Agenda 1. Memory managementdetails in OS 2. Allocators in C++ 3. Memory fragmentation problem and how to solve it in C++17 4. Memory-mapped files. Clustering Big Data – an example of memory management
  • 5.
    5 Attention • Some concepts,code and other information in this presentation is not cross-platform. • Allocators are also changing during C++ standard development, so implementation depends on it as well. • On the slides with platform-specific code corresponding icon for platform or standard will be used. • The presentation is mostly about development but there are some info for as well.
  • 6.
    66 Part 1. Memorymanagement details in OS
  • 7.
    7 Part 1. Memorymanagement details in OS 1. Memory allocation basics 2. Virtual memory • Motivation • Main ideas • Some implementation details on how it works • C++ code – low level virtual allocation • Peculiarities about Linux and Windows virtual memory
  • 8.
  • 9.
    9 Memory allocation basics Stackallocation Heap allocation Size known at compile-time Size known at runtime Memory is managed automatically Programmer should manage allocation and deallocation Not suitable for big allocations, because of stack limit. No size limits - you can use your RAM+swap size without any restrictions. Exist while the created function is running Exists until deallocation and anywhere, not depending on created function Always cheaper because well- optimized If you fail to deallocate, your program will have a memory leak
  • 10.
  • 11.
    11 Virtual memory motivation Usingphysical RAM addresses directly in each program leads to the following problems: 1. Not enough RAM to fit the whole address space 2. Holes in the address space 3. Programs writing over each other
  • 12.
    12 Not enough RAM VirtualAddress (4 GB) 0x00000000 0xFFFFFFFF Physical Address (1 GB) ???
  • 13.
    13 Holes in theaddress space Physical Address (4 GB) Program 1 Program 2 • Run Program 1 requires 1 GB space • Run Program 2 requires 2 GB space • Exit Program 1 • Now we actually have 2 GB of RAM, but we can’t run any task requires more than 1 GB Program 3 Can’t find 2 GB free space 1 GB 2 GB 1 GB
  • 14.
    14 Programs writing overeach other Virtual Address (4 GB) 0x1234 1024 | 9999 Program 1 Write to address 0x1234 Program 2 Write to address 0x1234 • Each program can access any address • In case 2 programs write to the same address data will be corrupted
  • 15.
    15 Virtual memory motivation Solutionis own virtual memory space for each process. • Somehow map each virtual address to RAM (physical address) • We can use disk storage in when we run out of memory
  • 16.
    1616 “All problems incomputer science can be solved by another level of indirection”
  • 17.
    1717 Virtual memory • Virtualmemory is a memory management capability of an OS that allows temporarily transferring data from random access memory (RAM) to disk storage. • Virtual address space is increased using active memory in RAM and inactive memory in hard disk drives (HDDs) to form contiguous addresses.
  • 18.
    18 Virtual memory Virtual Address PhysicalAddress Secondary Memory A B ?
  • 19.
  • 20.
    20 Virtual memory Virtual Address PhysicalAddress Secondary Memory A B Mapping
  • 21.
    21 Virtual memory Lets considerhow such an approach can help to solve 3 problems: 1. Not enough RAM to fit the whole address space. - In cases when we don’t have enough memory hard drive is used. 2. Holes in the address space 3. Programs writing over each other
  • 22.
    22 Holes in theaddress space Physical Address (4 GB) Program 2 Virtual address 2 GB Program 3 Virtual address 2 GB 1 GB 2 GB 1 GB Mapping Mapping
  • 23.
    23 Programs writing overeach other Physical Address (4 GB) … Program 2 Virtual address … 0x1234 … Program 1 Virtual address … 0x1234 … Mapping Mapping Write to address 0x1234 Write to address 0x1234 • Each program can access any address • In case 2 programs write to the same address data will be protected
  • 24.
  • 25.
    25 Naive mapping implementation VirtualAddress 4 GB … 512 514 516 … Physical Address 1 GB 0 2 … Mapping 512 0 514 2 516 d … … DiskWhat is the physical size of Mapping table? 4 GB size ~ 2 billions virtual addresses because our memory is word-aligned. At least 1 pointer for each address ~ 8 GB size
  • 26.
    26 Page tables Virtual Address4 GB 0-4095 4096-8191 8192-12287 12288-16383 … Physical Address 1 GB 0 4096 - 8191 8192 - 12287 … Page table 0-4095 4096-8191 4096-8191 8192-12287 8192-12287 disk … … Disk Page size is 4 KB (can be even 2 MB for x64 software) What is the physical size of Mapping table? 4 GB size ~ 1 million virtual pages. At least 1 pointer for each page ~ 4 MB size Page table entry (PTE)
  • 27.
    27 Address translation indetails 32 bits address space, 1 GB RAM installed and 4 KB pages 20 bits Virtual address Physical address 18 bits 12 bits 12 bits 30 bits for 1 GB 32 bits Page table Page table number Page offset
  • 28.
    28 Address translation indetails Translation of address 0x1300 0x0000-0x0FFF 0x0000-0x0FFF 0x1000-0x1FFF 0x1000-0x1FFF 0x2000-0x2FFF 0x2000-0x2FFF 0 0 1 1 2 2 Page table numbers: Offset 0x0300
  • 29.
    29 Page table Virtual pagenumber Physical page number 0x0 2 0x1 1 0x2 Disk 0x3 3 0x4 0 … … 0xFFFFF 42 Page table contains mapping from 20-bit virtual page number to RAM page number • There are 220 entries • Each entry size is 18 bits in case of 1 GB installed RAM, but actually 32-bits pointer is required. • Total size = 4 MB
  • 30.
    30 Page fault • Pagetable Entry have disk, so the page is actually on disk • Hardware generates a page fault exception • Os page fault handler - The OS choose page on RAM to put it to disk - Write this page from RAM to disk if required - The OS reads page from disk and puts page in RAM - Update Page Table • OS jumps back to the instruction before page fault
  • 31.
    3131 Virtual memory summary •Virtual memory is an abstraction that allows to form contiguous addresses when the amount of data is huge; • It is used when big allocation is required and RAM is not enough; • Performance of using virtual memory is lower than using RAM directly; • Memory pages are required to decrease Page Table size; • Pay attention on Page size - for your.
  • 32.
  • 33.
    33 Virtual memory allocationexamples void *p = VirtualAlloc(nullptr, allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); if (p == nullptr) { // use GetLastError() to obtain error code } // ... if (VirtualFree(p, 0, MEM_RELEASE) == FALSE) { // use GetLastError() to obtain error code } Here reserve and commit is done as single operation.
  • 34.
    34 Virtual memory allocationexamples • To allocate memory in the address space of another process, use the VirtualAllocEx function. • VirtualAlloc supports MEM_COMMIT and MEM_RESERVE in 2 separate operations • VirtualFree supports MEM_DECOMMIT for returning page to the reserved state and MEM_RELEASE for total deallocate.
  • 35.
    35 Virtual memory allocationexamples void *reserved = VirtualAlloc(nullptr, allocation_size, MEM_RESERVE, PAGE_NOACCESS); if (reserved == nullptr) { // use GetLastError() to obtain error code } // ... // when memory is required: void *p = VirtualAlloc(reserved, allocation_size, MEM_COMMIT, PAGE_READWRITE); if (VirtualFree(reserved, 0, MEM_RELEASE) == FALSE) { // use GetLastError() to obtain error code } // Details in the article “Reserving and Committing Memory” on MSDN.
  • 36.
    36 Virtual memory allocationexamples #include <sys/mman.h> void *p = mmap(nullptr, allocation_size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS, -1, 0); // file descriptor and offset are not required // because of anonymous mapping if (p == MAP_FAILED) { // use perror() to obtain error code } // The region is also automatically unmapped when the process is terminated. // On the other hand, closing the file descriptor does not unmap the region. if (munmap(p, allocation_size) != 0) { // use perror() to obtain error code }
  • 37.
    37 Lets compare platform-specificdetails Windows UNIX Virtual memory range can be just reserved or actually mapped Creating a mapping both reserves the address space and allows access to its contents Each mapping is a separate allocation. Number of calls to VirtualFree() must match the number of calls to VirtualAlloc() Mappings can be merged and split as needed by mmap() and munmap(). Mappings always start at a 64KB-aligned boundary. Mappings can start at an arbitrary page-aligned address. You can map single pages one by one with no ill effects on the address space. No limit on the number of mappings, only performance issues. There is a limit on the number of mappings (pretty low, 65530). You cannot easily know the number of mappings. You can reserve up to 128TB of virtual address space, however the commit limit depends on your physical RAM and swap size. Since all mappings are always committed, the limit on the virtual address space is much lower and depends on the RAM and swap size (by default).
  • 38.
  • 39.
  • 40.
    40 Part 2. Allocatorsin C++ 1. Classic allocators • Motivation • Hello World example • Disadvantages 2. Polymorphic allocators in C++17 • Main ideas • Hello World example 3. Comparison of these 2 approaches
  • 41.
  • 42.
    4242 “All problems incomputer science can be solved by another level of indirection”
  • 43.
    4343 Allocators motivation Few exampleswhere allocators can be useful: You need “special” memory: • Use virtual memory for standard containers; • Use separate block of memory for your containers in order to make them available for other processes. • On specific hardware or OS allocation can use different API (not malloc/free) Performance for specific cases: • Big std::list require huge amount of small allocations and deallocations. It would be efficient to do only one big allocation instead of the classic flow;
  • 44.
    44 Allocators motivation std::vector<T> v; v.reserve(4);// (1) allocate but not construct v.push_back(T{}); // (2) construct on already allocated memory v.push_back(T{}); // (3) construct on already allocated memory v.clear(); // (4) destroy but not deallocate the memory // ... // vector’s destructor should deallocate the memory Lets consider next code in order to understand more details about allocators purpose in C++.
  • 45.
    45 Allocation and deallocation are separated from construction and destruction
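The four numbered steps of the vector example map directly onto allocator calls. The sketch below is an illustration under stated assumptions, not code from the talk: `Tracked` and `allocation_vs_construction` are hypothetical names, and `std::allocator` stands in for whatever allocator a container would use. It records how many objects are alive after each step.

```cpp
#include <array>
#include <memory>

// Counts constructed, not yet destroyed objects.
struct Tracked {
    static inline int alive = 0;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};

// Returns the number of live objects observed after each of the four steps.
std::array<int, 4> allocation_vs_construction() {
    using Alloc  = std::allocator<Tracked>;
    using Traits = std::allocator_traits<Alloc>;
    Alloc alloc;
    std::array<int, 4> alive_after{};

    Tracked* p = Traits::allocate(alloc, 2);  // (1) raw memory only, no constructors run
    alive_after[0] = Tracked::alive;

    Traits::construct(alloc, p);              // (2) construct in already allocated memory
    Traits::construct(alloc, p + 1);
    alive_after[1] = Tracked::alive;

    Traits::destroy(alloc, p + 1);            // (3) destroy, the memory is still owned
    Traits::destroy(alloc, p);
    alive_after[2] = Tracked::alive;

    Traits::deallocate(alloc, p, 2);          // (4) release the memory
    alive_after[3] = Tracked::alive;
    return alive_after;
}
```

Only step (2) changes the number of live objects upwards and only step (3) brings it back down; steps (1) and (4) deal with memory alone.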
  • 46.
    46 Allocators basics (C++11)
    • An allocator is an abstraction over memory management in STL collections.
    • Collections don't use new/delete directly: all such calls go through allocators.
    • std::allocator is the default allocator, usually with plain new/delete inside.
    • For a user-defined allocator, in theory, only 2 methods are required, allocate and deallocate:
      - T* allocate(std::size_t n)
      - void deallocate(T* p, std::size_t)
    • In practice, you also need constructors, operator== and operator!=, and a few more details.
  • 47.
    47 Allocators Hello World (C++11)

        #include <cstddef>
        #include <cstdlib>
        #include <limits>
        #include <new>

        template <class T>
        struct Mallocator {
            using value_type = T;

            Mallocator() = default;
            template <class U>
            constexpr Mallocator(const Mallocator<U>&) noexcept {}

            T* allocate(std::size_t n) {
                // Guard against overflow in n * sizeof(T).
                if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
                    throw std::bad_alloc();
                if (auto p = static_cast<T*>(std::malloc(n * sizeof(T))))
                    return p;
                throw std::bad_alloc();
            }

            void deallocate(T* p, std::size_t) noexcept { std::free(p); }
        };
  • 48.
    48 Allocators Hello World (C++11)

        // All Mallocator instances are interchangeable: memory allocated by one
        // can be freed by any other.
        template <class T, class U>
        bool operator==(const Mallocator<T>&, const Mallocator<U>&) { return true; }

        template <class T, class U>
        bool operator!=(const Mallocator<T>&, const Mallocator<U>&) { return false; }

        // ...
        std::vector<int, Mallocator<int>> v{1, 2, 3, 4, 5, 6};
  • 49.
    49 Allocators disadvantages (C++11)
    • A lot of stuff to implement besides the allocate/deallocate methods:
      - bool operator== and operator!=: required to know whether memory allocated by the first object can be deallocated by the second
      - construct(ptr, args) and destroy(ptr): call the constructor and the destructor
      - max_size(): returns the maximum allowed size
      - Additional using declarations, like pointer, value_type and others
      - The type is important: allocate returns T*
    • For a particular collection we can't choose the allocator based on runtime conditions, because it is a template parameter.
    Some of these members (construct, destroy, max_size and the nested pointer types) are deprecated in std::allocator in C++17; std::allocator_traits supplies defaults for them.
  • 50.
  • 51.
    51 Polymorphic Allocators (C++17)
    • To get more dynamic behavior from allocators, you can use std::pmr::polymorphic_allocator.
    • The idea is really simple. A polymorphic allocator holds a pointer to a memory_resource, which does the actual work with memory. The resource implements the do_allocate and do_deallocate methods.
    • Now the resource behind the same vector type (for example std::vector<int, std::pmr::polymorphic_allocator<int>>) can be chosen dynamically. There is a shortcut: std::pmr::vector<T>.
    • The difference between an allocator and a memory_resource:
      - An allocator uses templates
      - A memory_resource uses polymorphism
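Choosing the resource at runtime while keeping a single container type can be sketched as follows. This is a minimal illustration: `choose_resource` and `make_numbers` are hypothetical helper names, and the `use_custom` flag stands in for any runtime condition.

```cpp
#include <memory_resource>
#include <vector>

// Hypothetical helper: pick a memory resource from a runtime flag.
std::pmr::memory_resource* choose_resource(bool use_custom,
                                           std::pmr::memory_resource* custom) {
    return use_custom ? custom : std::pmr::new_delete_resource();
}

// The return type is the same whichever resource is passed in.
std::pmr::vector<int> make_numbers(std::pmr::memory_resource* resource) {
    std::pmr::vector<int> numbers(resource);
    for (int i = 0; i < 100; ++i) {
        numbers.push_back(i);
    }
    return numbers;
}
```

Both vectors produced this way have the type `std::pmr::vector<int>`, so they can be stored in the same collection or passed to the same function, which is not possible when the allocator is a distinct template argument.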
  • 52.
    52 Allocator vs pmr and memory resource (C++17)
    Allocator: The type of the allocator is known at compile time.
    Memory resource: The type of the memory resource is known at runtime; the allocator template parameter is the pmr allocator.

    Allocator: Based on templates.
    Memory resource: Based on polymorphism.

    Allocator: Overcomplicated: a pile of additional functions to implement.
    Memory resource: Easy to use: only a few virtual functions to override (do_allocate, do_deallocate, do_is_equal).

    Allocator: A lot of features and customization options that aren't really useful in most cases.
    Memory resource: An extremely useful set of features that is simple as well.

    Allocator: Has a template type, so allocate returns T*.
    Memory resource: The type is not important. do_allocate returns void*, and the developer doesn't have to care about types, just work with memory.
  • 53.
  • 54.
    54 Polymorphic Allocators Hello World

        #include <memory_resource>

        class HelloWorldResource : public std::pmr::memory_resource {
            void* do_allocate(std::size_t bytes, std::size_t alignment) override;
            void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override;
            bool do_is_equal(const memory_resource& other) const noexcept override;
        };
  • 55.
    55 Polymorphic Allocators Hello World

        // Every call is simply forwarded to the standard new/delete resource.
        void* HelloWorldResource::do_allocate(std::size_t bytes, std::size_t alignment) {
            return std::pmr::new_delete_resource()->allocate(bytes, alignment);
        }

        void HelloWorldResource::do_deallocate(void* p, std::size_t bytes, std::size_t alignment) {
            std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
        }

        bool HelloWorldResource::do_is_equal(const memory_resource& other) const noexcept {
            return std::pmr::new_delete_resource()->is_equal(other);
        }
  • 56.
    56 Polymorphic Allocators Hello World

        HelloWorldResource r;
        std::pmr::vector<int> v(1000, &r);  // 1000 ints allocated through r
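A forwarding resource like the one above becomes useful once it does something besides forwarding. The following sketch is an assumption-laden illustration, not part of the talk: `TrackingResource` and `build_and_drop` are hypothetical names. It forwards to an upstream resource and keeps track of how many bytes are currently in use and what the peak was.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical resource: forwards to an upstream resource and tracks bytes in use.
class TrackingResource : public std::pmr::memory_resource {
public:
    explicit TrackingResource(
        std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
        : upstream_(upstream) {}

    std::size_t bytes_in_use() const { return in_use_; }
    std::size_t peak_bytes() const { return peak_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = upstream_->allocate(bytes, alignment);  // may throw; counters stay untouched then
        in_use_ += bytes;
        peak_ = std::max(peak_, in_use_);
        return p;
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        upstream_->deallocate(p, bytes, alignment);
        in_use_ -= bytes;
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;  // only the same object can free our memory
    }

    std::pmr::memory_resource* upstream_;
    std::size_t in_use_ = 0;
    std::size_t peak_ = 0;
};

// Builds a vector of `count` ints on `resource` and lets it go out of scope.
void build_and_drop(std::pmr::memory_resource* resource, std::size_t count) {
    std::pmr::vector<int> values(count, resource);
}
```

After `build_and_drop(&tracker, 1000)` the tracker reports zero bytes in use and a peak of at least `1000 * sizeof(int)`, which is a cheap way to check a container's memory footprint.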
  • 57.
  • 58.
    58 Part 3. Memory fragmentation problem and how to solve it in C++17
  • 59.
    59 Part 3. Memory fragmentation problem and how to solve it in C++17
    1. Fragmentation problem
    2. Memory resources in the Standard library
       • synchronized_pool_resource
       • unsynchronized_pool_resource
       • monotonic_buffer_resource
    3. Benchmarks on the efficiency of these resources
  • 60.
  • 61.
    61 Memory fragmentation problem
    The memory can be virtual, where page faults are expensive!
  • 62.
  • 63.
    63 Polymorphic Allocators (C++17)
    • Basic memory resources:
      - new_delete_resource: direct calls to new and delete. Unless changed with set_default_resource, this is the default resource used by a default-constructed polymorphic_allocator.
      - null_memory_resource: allocate always throws std::bad_alloc
    • synchronized_pool_resource
    • unsynchronized_pool_resource
    • monotonic_buffer_resource
  • 64.
  • 65.
    65 Synchronized and unsynchronized versions
    A synchronized_pool_resource may be accessed from multiple threads without external synchronization and may have thread-specific pools to reduce synchronization costs.
    An unsynchronized_pool_resource may not be accessed from multiple threads simultaneously and thus avoids the cost of synchronization entirely in single-threaded applications.
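The benefit of a pool for node-based containers can be observed by counting how often the pool has to go to its upstream resource. The sketch below is an illustration only: `CountingUpstream` and `upstream_calls_for_list` are hypothetical names, and the `pool_options` values are arbitrary hints that an implementation is free to adjust.

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>

// Hypothetical upstream that counts how often the pool asks for more memory.
struct CountingUpstream : std::pmr::memory_resource {
    std::size_t calls = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++calls;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Number of upstream allocations needed to build a list of `n` nodes via a pool.
std::size_t upstream_calls_for_list(std::size_t n) {
    CountingUpstream upstream;
    {
        std::pmr::pool_options options;
        options.max_blocks_per_chunk = 1024;        // hint: blocks fetched per upstream call
        options.largest_required_pool_block = 256;  // hint: larger requests bypass the pools
        std::pmr::unsynchronized_pool_resource pool(options, &upstream);

        std::pmr::list<int> values(&pool);          // every node is a small allocation
        for (std::size_t i = 0; i < n; ++i) {
            values.push_back(static_cast<int>(i));
        }
    }  // the list is destroyed first, then the pool returns its chunks upstream
    return upstream.calls;
}
```

For a list of several thousand nodes the count stays far below the number of nodes, because the pool serves most node allocations from chunks it already holds. In a multithreaded program the same code would use `std::pmr::synchronized_pool_resource` instead.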
  • 66.
    66 Monotonic buffer resource
    do_deallocate does nothing! Deallocated memory stays unused until the resource is released or destroyed.
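This behaviour is easy to observe directly on the resource. The following is a minimal sketch under stated assumptions: `MonotonicTrace` and `trace_monotonic` are hypothetical names, and the 1024-byte stack buffer is an arbitrary choice.

```cpp
#include <cstddef>
#include <memory_resource>

struct MonotonicTrace {
    bool second_reuses_first;     // did allocate-after-deallocate return the same address?
    bool release_rewinds_buffer;  // did release() make the start of the buffer usable again?
};

MonotonicTrace trace_monotonic() {
    alignas(std::max_align_t) std::byte buffer[1024];
    std::pmr::monotonic_buffer_resource arena(buffer, sizeof buffer);

    void* first = arena.allocate(64);
    arena.deallocate(first, 64);        // no-op: the 64 bytes are not handed out again
    void* second = arena.allocate(64);  // comes from further along the buffer

    arena.release();                    // resets the resource to its initial buffer
    void* third = arena.allocate(64);

    return {second == first, third == first};
}
```

The deallocated block is not reused by the next allocation; only `release()` (or destroying the resource) makes the memory available again, which is why this resource suits containers that are filled once and then dropped as a whole.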
  • 67.
    67 Upstream resource
    • The three memory resources above have a constructor that takes std::pmr::memory_resource* upstream_resource.
    • The upstream resource makes it possible to combine sophisticated memory models, like the pool or monotonic resources, with a custom memory_resource instead of new/delete.
    • The constructors that don't take an upstream memory resource pointer use the return value of std::pmr::get_default_resource() as the upstream memory resource.
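One way to see the upstream resource at work is to give a monotonic resource a small fixed buffer and `std::pmr::null_memory_resource()` as upstream, so that any overflow fails instead of silently falling back to the heap. This is a sketch, not code from the talk: `fits_in_fixed_buffer` is a hypothetical name and the 256-byte buffer is arbitrary.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Returns true if a vector of `count` ints fits into a 256-byte stack buffer.
// With null_memory_resource as upstream, running out of buffer throws std::bad_alloc.
bool fits_in_fixed_buffer(std::size_t count) {
    alignas(std::max_align_t) std::byte buffer[256];
    std::pmr::monotonic_buffer_resource arena(buffer, sizeof buffer,
                                              std::pmr::null_memory_resource());
    try {
        std::pmr::vector<int> values(&arena);
        values.reserve(count);  // a single request of count * sizeof(int) bytes
        for (std::size_t i = 0; i < count; ++i) {
            values.push_back(static_cast<int>(i));
        }
        return true;
    } catch (const std::bad_alloc&) {
        return false;           // the request was forwarded upstream and rejected
    }
}
```

Swapping the third constructor argument for a custom resource (or leaving it out to use the default resource) changes where the overflow goes without touching the container code.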
  • 68.
    68 Examples: filling and destroying std::list
    [Chart: time in ms (0 to 100) against list size (16384 to 1048576). Series: monotonic resource, default new/delete.]
  • 69.
    69 Examples: filling and destroying std::unordered_map
    [Chart: time in ms (0 to 400) against map size (16384 to 1048576). Series: monotonic resource, default new/delete.]
  • 70.
    70 Examples: writing std::list
    [Chart: time in ms (0 to 4.5) against list size (16384 to 1048576). Series: monotonic resource, default new/delete, pool.]
  • 71.
    71 Examples: writing std::unordered_map
    [Chart: time in ms (0 to 45) against size (16384 to 1048576; the axis is labelled "List size"). Series: monotonic resource, default new/delete, pool.]
  • 72.
    72 Examples: reading std::list
    [Chart: time in ms (0 to 5) against list size (16384 to 1048576). Series: monotonic resource, default new/delete, pool.]
  • 73.
    73 Examples: reading std::unordered_map
    [Chart: time in ms (0 to 45) against size (16384 to 1048576; the axis is labelled "List size"). Series: monotonic resource, default new/delete, pool.]
  • 74.
    74 Let's debug a bit to see how the monotonic resource works
  • 75.
  • 76.
    76 Part 4. Clustering Big Data. Memory-mapped files
    An example of memory management
  • 77.
    77 Part 4. Clustering Big Data. An example of memory management
    1. Memory-mapped files mechanism
    2. Data clustering
       • What is it? Task
       • Main problems and their solutions
       • Algorithm
    3. Implementation
  • 78.
  • 79.
    79 Motivation
    • RAM + swap size is not enough, and more space on disk is required
    • CPU performance is sometimes not as critical as memory consumption
    • Virtual memory stores data on disk, but we have no control over it and no access to it
    • Communication between processes
  • 80.
    80 Memory-mapped files
    [Diagram: File on disk, File mapping, Physical memory, Process 1 virtual memory, Process 2 virtual memory.]
  • 81.
    81 Memory-mapped files
    The idea of mapping to disk is similar to virtual memory, but there are some details:
    • All data is mapped to a particular physical file instead of the swap file;
    • You can leave the data on disk and load it later;
    • You can create views in order to use parts of files;
    • The size is not limited by RAM + swap, so you can make full use of the hard drive.
  • 82.
    82 Memory-mapped files workflow in Windows
    1. Open file: ::CreateFile(…)
    2. Create mapping: ::CreateFileMapping(…)
    3. Create views: ::MapViewOfFile(…), ::MapViewOfFile(…), ::MapViewOfFile(…)
    4. Destroy all file views: ::UnmapViewOfFile(…), ::UnmapViewOfFile(…), ::UnmapViewOfFile(…)
    5. Close mapping and file handles: ::CloseHandle(…)
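For comparison, the same open, map, use, unmap, close sequence can be sketched with the POSIX API. This is an assumption on my part rather than the code behind the slides, which target Windows: `write_through_mapping` and `read_through_mapping` are hypothetical helpers, and POSIX has no separate mapping object, so `mmap` covers both the "create mapping" and "create view" steps.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Writes `text` into the file at `path` through a shared mapping.
bool write_through_mapping(const char* path, const std::string& text) {
    if (text.empty()) return false;                             // zero-length mappings are rejected
    int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);    // open file
    if (fd < 0) return false;
    bool ok = ::ftruncate(fd, static_cast<off_t>(text.size())) == 0;  // give the file its size
    if (ok) {
        void* view = ::mmap(nullptr, text.size(), PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);                 // create the view
        ok = view != MAP_FAILED;
        if (ok) {
            std::memcpy(view, text.data(), text.size());        // plain memory write
            ok = ::msync(view, text.size(), MS_SYNC) == 0;      // flush to disk
            ok = (::munmap(view, text.size()) == 0) && ok;      // destroy the view
        }
    }
    ok = (::close(fd) == 0) && ok;                              // close the file
    return ok;
}

// Reads the whole file back through a read-only mapping.
std::string read_through_mapping(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return {};
    std::string result;
    struct stat info{};
    if (::fstat(fd, &info) == 0 && info.st_size > 0) {
        const std::size_t size = static_cast<std::size_t>(info.st_size);
        void* view = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (view != MAP_FAILED) {
            result.assign(static_cast<const char*>(view), size);
            ::munmap(view, size);
        }
    }
    ::close(fd);
    return result;
}
```

The data written by the first function stays in the file after the mapping is gone, so a later run, or another process, can map the same file and read it back.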
  • 83.
  • 84.
  • 85.
    85 Formulation of the problem
    Input
    • Dataset
      - Euclidean points (usually 2D or 3D vectors)
      - Strings
      - Complex objects like molecules, genes or other domain-specific objects
    • Distance metric
      - Symmetric function between points
    Output
    • Classification for each point: in fact, just the number of its cluster
    • Points in one cluster should be as close as possible with respect to the given distance function
  • 86.
    86 Problems
    • Usually the input dataset is a kind of Big Data;
    • The distance function is usually expensive to calculate, especially for complex objects;
    • Precalculating all distances is efficient, but requires a huge amount of memory;
    • Of course, there are algorithm and accuracy problems as well, but they aren't considered here.
  • 87.
    87 Solution
    • Let's precalculate the distance matrix using memory-mapped files, because the whole matrix doesn't fit into RAM or even into RAM + swap;
    • The K-medoids algorithm uses the distance matrix;
    • Let's compare performance.
  • 88.
    88 Solution
    • What is usually stored in memory:
      - Dataset
      - Distance matrix
      - Labels (clustering result)
    • Let's consider tasks where the dataset itself is not really big. The simulated dataset has 100,000 points.
    • The distance matrix is large: 100,000 * 100,000 * 8 / 2 ≈ 40 GB.
    • Precalculating the distance matrix is a common approach when the metric is complex and expensive.
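The size estimate and the halving by symmetry can be checked with a few lines of arithmetic. The sketch below rests on assumptions not stated on the slide: `triangle_bytes` and `triangle_index` are hypothetical helpers, each distance is taken to be 8 bytes (for example a `double`), and the diagonal is left out because the distance from a point to itself is zero, so the result is slightly below the rounded figure above.

```cpp
#include <cstdint>

// Bytes needed for all pairwise distances of `n` points when only one
// triangle of the symmetric matrix (pairs with i < j) is stored.
constexpr std::uint64_t triangle_bytes(std::uint64_t n, std::uint64_t bytes_per_value) {
    return n * (n - 1) / 2 * bytes_per_value;
}

// Position of the pair (i, j), i != j, in a row-major upper triangle
// without the diagonal. The order of i and j does not matter.
constexpr std::uint64_t triangle_index(std::uint64_t i, std::uint64_t j, std::uint64_t n) {
    const std::uint64_t lo = i < j ? i : j;
    const std::uint64_t hi = i < j ? j : i;
    // Rows 0..lo-1 hold (n-1) + (n-2) + ... entries, i.e. lo * (2n - lo - 1) / 2.
    return lo * (2 * n - lo - 1) / 2 + (hi - lo - 1);
}

// 100,000 points, 8 bytes per distance: just under 40 GB.
static_assert(triangle_bytes(100000, 8) == 39999600000ULL);
```

With such an index function the triangle can live in one flat array, for example a view of a memory-mapped file, and `triangle_index(i, j, n)` gives the element to read for any pair of points.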
  • 89.
  • 90.
  • 91.
    91 pyclustering
    The algorithm from the open-source clustering library pyclustering has been used.
    https://github.com/annoviko/pyclustering
  • 92.
  • 93.
  • 94.
    94 References
    • https://www.youtube.com/watch?v=qcBIvnQt0Bw&list=PLiwt1iVUib9s2Uo5BeYmwkDFUh70fJPxX
    • Effective STL, Scott Meyers
    • https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
    • https://docs.microsoft.com/en-us/windows/win32/memory/reserving-and-committing-memory
    • https://rcl-rs-vvg.blogspot.com/2018/08/allocating-memory-on-linux-and-windows.html
    • Some talks at CppCon and other conferences:
      - https://www.youtube.com/watch?v=nZNd5FjSquk
      - https://www.youtube.com/watch?v=IejdKidUwIg
      - https://www.youtube.com/watch?v=FLbXjNrAjbc
      - https://channel9.msdn.com/events/GoingNative/CppCon-2017/119?term=Pablo%20Halpern
  • 95.