path: root/src/backend/utils/mmgr/slab.c
author     David Rowley    2022-08-29 05:15:00 +0000
committer  David Rowley    2022-08-29 05:15:00 +0000
commit     c6e0fe1f2a08505544c410f613839664eea9eb21
tree       29cc826395108c44cfe0796009dacb2467534784 /src/backend/utils/mmgr/slab.c
parent     d2169c998553a6945fd51b8a1e5e9e1384283fdd
Improve performance of and reduce overheads of memory management
Whenever we palloc a chunk of memory, traditionally, we prefix the returned pointer with a pointer to the memory context to which the chunk belongs. This is required so that we're able to easily determine the owning context when performing operations such as pfree() and repalloc().

For the AllocSet context, prior to this commit we additionally prefixed the pointer to the owning context with the size of the chunk. This made the header 16 bytes in size. This 16-byte overhead was required for all AllocSet allocations regardless of the allocation size.

For the generation context, the problem was worse; in addition to the pointer to the owning context and chunk size, we also stored a pointer to the owning block so that we could track the number of freed chunks on a block. The slab allocator had a 16-byte chunk header.

The changes being made here reduce the chunk header size down to just 8 bytes for all 3 of our memory context types. For small to medium sized allocations, this significantly increases the number of chunks that we can fit on a given block, which results in much more efficient use of memory.

Additionally, this commit completely changes the rule that pointers to palloc'd memory must be directly prefixed by a pointer to the owning memory context and instead, we now insist that they're directly prefixed by an 8-byte value where the least significant 3 bits are set to a value to indicate which type of memory context the pointer belongs to. Using those 3 bits as an index (known as MemoryContextMethodID) into a new array which stores the methods for each memory context type, we're now able to pass the pointer given to functions such as pfree() and repalloc() to the function specific to that context implementation to allow them to devise their own methods of finding the memory context which owns the given allocated chunk of memory.

The reason we're able to reduce the chunk header down to just 8 bytes is because of the way we make use of the remaining 61 bits of the required 8-byte chunk header. Here we also implement a general-purpose MemoryChunk struct which makes use of those 61 remaining bits to allow the storage of a 30-bit value which the MemoryContext is free to use as it pleases, and also the number of bytes which must be subtracted from the chunk to get a reference to the block that the chunk is stored on (also 30 bits). The 1 additional remaining bit is to denote if the chunk is an "external" chunk or not. External here means that the chunk header does not store the 30-bit value or the block offset. The MemoryContext can use these external chunks at any time, but must use them if any of the two 30-bit fields are not large enough for the value(s) that need to be stored in them. When the chunk is marked as external, it is up to the MemoryContext to devise its own means to determine the block offset.

Using 3 bits for the MemoryContextMethodID does mean we're limiting ourselves to only having a maximum of 8 different memory context types. We could reduce the bit space for the 30-bit value a little to make way for more than 3 bits, but it seems like it might be better to do that only if we ever need more than 8 context types. This would only be a problem if some future memory context type which does not use MemoryChunk really couldn't give up any of the 61 remaining bits in the chunk header.

With this MemoryChunk, each of our 3 memory context types can quickly obtain a reference to the block any given chunk is located on.
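To make the header layout described above concrete, here is a minimal, self-contained sketch of how a 3-bit method ID, a 30-bit spare value, and a 30-bit block offset can be packed into a single 8-byte word, with bit 63 reserved for the "external" flag. All DEMO_/demo_ names, bit positions, and macros below are illustrative assumptions for this page, not the actual definitions, which live in src/include/utils/memutils_memorychunk.h.

    #include <assert.h>
    #include <stdint.h>

    /*
     * Assumed layout for illustration: bits 0-2 hold the method ID, bits
     * 3-32 the context-private 30-bit value, bits 33-62 the 30-bit block
     * offset, and bit 63 the "external" flag.
     */
    #define DEMO_METHODID_BITS  3
    #define DEMO_VALUE_BITS     30
    #define DEMO_BLOCKOFF_BITS  30

    #define DEMO_METHODID_MASK  ((UINT64_C(1) << DEMO_METHODID_BITS) - 1)
    #define DEMO_VALUE_MASK     ((UINT64_C(1) << DEMO_VALUE_BITS) - 1)
    #define DEMO_BLOCKOFF_MASK  ((UINT64_C(1) << DEMO_BLOCKOFF_BITS) - 1)
    #define DEMO_EXTERNAL_BIT   (UINT64_C(1) << 63)

    /* Pack a non-external chunk header: method ID, spare value, block offset. */
    static inline uint64_t
    demo_hdrmask_pack(unsigned method_id, uint64_t value, uint64_t blockoffset)
    {
        assert(method_id <= DEMO_METHODID_MASK);
        assert(value <= DEMO_VALUE_MASK);
        assert(blockoffset <= DEMO_BLOCKOFF_MASK);

        return (uint64_t) method_id |
            (value << DEMO_METHODID_BITS) |
            (blockoffset << (DEMO_METHODID_BITS + DEMO_VALUE_BITS));
    }

    /* The low 3 bits stay readable whether or not the chunk is external. */
    static inline unsigned
    demo_hdrmask_method_id(uint64_t hdrmask)
    {
        return (unsigned) (hdrmask & DEMO_METHODID_MASK);
    }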
AllocSet is able to find the context that owns a chunk by first obtaining a reference to the block (subtracting the block offset stored in the 'hdrmask' field) and then referencing the block's 'aset' field. The Generation context uses the same method, but GenerationBlock did not have a field pointing back to the owning context, so one is added by this commit.

In aset.c and generation.c, all allocations larger than allocChunkLimit are stored on dedicated blocks. When there's just a single chunk on a block like this, it's easy to find the block from the chunk: we just subtract the size of the block header from the chunk pointer. The size of these chunks is also known as we store the endptr on the block, so we can just subtract the pointer to the allocated memory from that. Because we can easily find the owning block and the size of the chunk for these dedicated blocks, we just always use external chunks for allocation sizes larger than allocChunkLimit. For generation.c, this sidesteps the problem of non-external MemoryChunks being unable to represent chunk sizes >= 1GB.

This is less of a problem for aset.c as we store the free list index in the MemoryChunk's spare 30-bit field (the value of which will never be close to using all 30 bits). We can easily reverse engineer the chunk size from this when needed. Storing this saves AllocSetFree() from having to make a call to AllocSetFreeIndex() to determine which free list to put the newly freed chunk on.

For the slab allocator, this commit adds a new restriction that slab chunks cannot be >= 1GB in size. If there happened to be any users of slab.c which used chunk sizes this large, they really should be using AllocSet instead.

Here we also add a restriction that normal non-dedicated blocks cannot be 1GB or larger. It's now not possible to pass a 'maxBlockSize' >= 1GB during the creation of an AllocSet or Generation context. Allocations can still be larger than 1GB, it's just that these will always be on dedicated blocks (which do not have the 1GB restriction).

Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com
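As a rough companion to the description above, the following sketch shows how a pfree()-style entry point can read the 3-bit method ID from the word stored directly before the allocated pointer, and how a slab-style context can then subtract the stored block offset to reach its block and follow the block's back-pointer to the owning context (the field this commit adds to SlabBlock). The demo_* types, names, and bit positions are illustrative assumptions, not the actual PostgreSQL definitions.

    #include <stdint.h>

    /* Simplified stand-ins for SlabContext, SlabBlock, and the chunk header. */
    typedef struct DemoSlabContext { const char *name; } DemoSlabContext;
    typedef struct DemoSlabBlock   { DemoSlabContext *slab; } DemoSlabBlock;
    typedef struct DemoChunkHdr    { uint64_t hdrmask; } DemoChunkHdr;

    #define DEMO_METHODID_MASK  UINT64_C(0x7)               /* low 3 bits */
    #define DEMO_BLOCKOFF_SHIFT 33                          /* assumed bit position */
    #define DEMO_BLOCKOFF_MASK  ((UINT64_C(1) << 30) - 1)

    /* pfree()-style dispatch: the method ID sits directly before the pointer. */
    static unsigned
    demo_method_id(void *pointer)
    {
        DemoChunkHdr *hdr = (DemoChunkHdr *) ((char *) pointer - sizeof(DemoChunkHdr));

        return (unsigned) (hdr->hdrmask & DEMO_METHODID_MASK);
    }

    /*
     * Slab-style owner lookup: subtract the 30-bit block offset to reach the
     * block, then follow the block's pointer to its owning context.
     */
    static DemoSlabContext *
    demo_slab_owner(void *pointer)
    {
        DemoChunkHdr  *hdr = (DemoChunkHdr *) ((char *) pointer - sizeof(DemoChunkHdr));
        uint64_t       blockoffset = (hdr->hdrmask >> DEMO_BLOCKOFF_SHIFT) & DEMO_BLOCKOFF_MASK;
        DemoSlabBlock *block = (DemoSlabBlock *) ((char *) hdr - blockoffset);

        return block->slab;
    }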
Diffstat (limited to 'src/backend/utils/mmgr/slab.c')
-rw-r--r--  src/backend/utils/mmgr/slab.c  200
1 file changed, 86 insertions(+), 114 deletions(-)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 67d97b22e56..ae1a735b8cb 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -55,6 +55,8 @@
#include "lib/ilist.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
+#include "utils/memutils_memorychunk.h"
+#include "utils/memutils_internal.h"
/*
* SlabContext is a specialized implementation of MemoryContext.
@@ -90,33 +92,17 @@ typedef struct SlabBlock
dlist_node node; /* doubly-linked list */
int nfree; /* number of free chunks */
int firstFreeChunk; /* index of the first free chunk in the block */
-} SlabBlock;
-
-/*
- * SlabChunk
- * The prefix of each piece of memory in a SlabBlock
- *
- * Note: to meet the memory context APIs, the payload area of the chunk must
- * be maxaligned, and the "slab" link must be immediately adjacent to the
- * payload area (cf. GetMemoryChunkContext). Since we support no machines on
- * which MAXALIGN is more than twice sizeof(void *), this happens without any
- * special hacking in this struct declaration. But there is a static
- * assertion below that the alignment is done correctly.
- */
-typedef struct SlabChunk
-{
- SlabBlock *block; /* block owning this chunk */
SlabContext *slab; /* owning context */
- /* there must not be any padding to reach a MAXALIGN boundary here! */
-} SlabChunk;
+} SlabBlock;
+#define Slab_CHUNKHDRSZ sizeof(MemoryChunk)
#define SlabPointerGetChunk(ptr) \
- ((SlabChunk *)(((char *)(ptr)) - sizeof(SlabChunk)))
+ ((MemoryChunk *)(((char *)(ptr)) - sizeof(MemoryChunk)))
#define SlabChunkGetPointer(chk) \
- ((void *)(((char *)(chk)) + sizeof(SlabChunk)))
+ ((void *)(((char *)(chk)) + sizeof(MemoryChunk)))
#define SlabBlockGetChunk(slab, block, idx) \
- ((SlabChunk *) ((char *) (block) + sizeof(SlabBlock) \
+ ((MemoryChunk *) ((char *) (block) + sizeof(SlabBlock) \
+ (idx * slab->fullChunkSize)))
#define SlabBlockStart(block) \
((char *) block + sizeof(SlabBlock))
@@ -124,42 +110,6 @@ typedef struct SlabChunk
(((char *) chunk - SlabBlockStart(block)) / slab->fullChunkSize)
/*
- * These functions implement the MemoryContext API for Slab contexts.
- */
-static void *SlabAlloc(MemoryContext context, Size size);
-static void SlabFree(MemoryContext context, void *pointer);
-static void *SlabRealloc(MemoryContext context, void *pointer, Size size);
-static void SlabReset(MemoryContext context);
-static void SlabDelete(MemoryContext context);
-static Size SlabGetChunkSpace(MemoryContext context, void *pointer);
-static bool SlabIsEmpty(MemoryContext context);
-static void SlabStats(MemoryContext context,
- MemoryStatsPrintFunc printfunc, void *passthru,
- MemoryContextCounters *totals,
- bool print_to_stderr);
-#ifdef MEMORY_CONTEXT_CHECKING
-static void SlabCheck(MemoryContext context);
-#endif
-
-/*
- * This is the virtual function table for Slab contexts.
- */
-static const MemoryContextMethods SlabMethods = {
- SlabAlloc,
- SlabFree,
- SlabRealloc,
- SlabReset,
- SlabDelete,
- SlabGetChunkSpace,
- SlabIsEmpty,
- SlabStats
-#ifdef MEMORY_CONTEXT_CHECKING
- ,SlabCheck
-#endif
-};
-
-
-/*
* SlabContextCreate
* Create a new Slab context.
*
@@ -168,8 +118,7 @@ static const MemoryContextMethods SlabMethods = {
* blockSize: allocation block size
* chunkSize: allocation chunk size
*
- * The chunkSize may not exceed:
- * MAXALIGN_DOWN(SIZE_MAX) - MAXALIGN(sizeof(SlabBlock)) - sizeof(SlabChunk)
+ * The MAXALIGN(chunkSize) may not exceed MEMORYCHUNK_MAX_VALUE
*/
MemoryContext
SlabContextCreate(MemoryContext parent,
@@ -184,19 +133,17 @@ SlabContextCreate(MemoryContext parent,
SlabContext *slab;
int i;
- /* Assert we padded SlabChunk properly */
- StaticAssertStmt(sizeof(SlabChunk) == MAXALIGN(sizeof(SlabChunk)),
- "sizeof(SlabChunk) is not maxaligned");
- StaticAssertStmt(offsetof(SlabChunk, slab) + sizeof(MemoryContext) ==
- sizeof(SlabChunk),
- "padding calculation in SlabChunk is wrong");
+ /* ensure MemoryChunk's size is properly maxaligned */
+ StaticAssertStmt(Slab_CHUNKHDRSZ == MAXALIGN(Slab_CHUNKHDRSZ),
+ "sizeof(MemoryChunk) is not maxaligned");
+ Assert(MAXALIGN(chunkSize) <= MEMORYCHUNK_MAX_VALUE);
/* Make sure the linked list node fits inside a freed chunk */
if (chunkSize < sizeof(int))
chunkSize = sizeof(int);
/* chunk, including SLAB header (both addresses nicely aligned) */
- fullChunkSize = sizeof(SlabChunk) + MAXALIGN(chunkSize);
+ fullChunkSize = Slab_CHUNKHDRSZ + MAXALIGN(chunkSize);
/* Make sure the block can store at least one chunk. */
if (blockSize < fullChunkSize + sizeof(SlabBlock))
@@ -265,7 +212,7 @@ SlabContextCreate(MemoryContext parent,
/* Finally, do the type-independent part of context creation */
MemoryContextCreate((MemoryContext) slab,
T_SlabContext,
- &SlabMethods,
+ MCTX_SLAB_ID,
parent,
name);
@@ -279,7 +226,7 @@ SlabContextCreate(MemoryContext parent,
* The code simply frees all the blocks in the context - we don't keep any
* keeper blocks or anything like that.
*/
-static void
+void
SlabReset(MemoryContext context)
{
int i;
@@ -322,7 +269,7 @@ SlabReset(MemoryContext context)
* SlabDelete
* Free all memory which is allocated in the given context.
*/
-static void
+void
SlabDelete(MemoryContext context)
{
/* Reset to release all the SlabBlocks */
@@ -336,12 +283,12 @@ SlabDelete(MemoryContext context)
* Returns pointer to allocated memory of given size or NULL if
* request could not be completed; memory is added to the slab.
*/
-static void *
+void *
SlabAlloc(MemoryContext context, Size size)
{
SlabContext *slab = castNode(SlabContext, context);
SlabBlock *block;
- SlabChunk *chunk;
+ MemoryChunk *chunk;
int idx;
Assert(slab);
@@ -370,6 +317,7 @@ SlabAlloc(MemoryContext context, Size size)
block->nfree = slab->chunksPerBlock;
block->firstFreeChunk = 0;
+ block->slab = slab;
/*
* Put all the chunks on a freelist. Walk the chunks and point each
@@ -378,7 +326,7 @@ SlabAlloc(MemoryContext context, Size size)
for (idx = 0; idx < slab->chunksPerBlock; idx++)
{
chunk = SlabBlockGetChunk(slab, block, idx);
- *(int32 *) SlabChunkGetPointer(chunk) = (idx + 1);
+ *(int32 *) MemoryChunkGetPointer(chunk) = (idx + 1);
}
/*
@@ -426,8 +374,8 @@ SlabAlloc(MemoryContext context, Size size)
* Remove the chunk from the freelist head. The index of the next free
* chunk is stored in the chunk itself.
*/
- VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));
- block->firstFreeChunk = *(int32 *) SlabChunkGetPointer(chunk);
+ VALGRIND_MAKE_MEM_DEFINED(MemoryChunkGetPointer(chunk), sizeof(int32));
+ block->firstFreeChunk = *(int32 *) MemoryChunkGetPointer(chunk);
Assert(block->firstFreeChunk >= 0);
Assert(block->firstFreeChunk <= slab->chunksPerBlock);
@@ -464,47 +412,47 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = 0;
/* Prepare to initialize the chunk header. */
- VALGRIND_MAKE_MEM_UNDEFINED(chunk, sizeof(SlabChunk));
-
- chunk->block = block;
- chunk->slab = slab;
+ VALGRIND_MAKE_MEM_UNDEFINED(chunk, Slab_CHUNKHDRSZ);
+ MemoryChunkSetHdrMask(chunk, block, MAXALIGN(slab->chunkSize),
+ MCTX_SLAB_ID);
#ifdef MEMORY_CONTEXT_CHECKING
/* slab mark to catch clobber of "unused" space */
- if (slab->chunkSize < (slab->fullChunkSize - sizeof(SlabChunk)))
+ if (slab->chunkSize < (slab->fullChunkSize - Slab_CHUNKHDRSZ))
{
- set_sentinel(SlabChunkGetPointer(chunk), size);
+ set_sentinel(MemoryChunkGetPointer(chunk), size);
VALGRIND_MAKE_MEM_NOACCESS(((char *) chunk) +
- sizeof(SlabChunk) + slab->chunkSize,
+ Slab_CHUNKHDRSZ + slab->chunkSize,
slab->fullChunkSize -
- (slab->chunkSize + sizeof(SlabChunk)));
+ (slab->chunkSize + Slab_CHUNKHDRSZ));
}
#endif
+
#ifdef RANDOMIZE_ALLOCATED_MEMORY
/* fill the allocated space with junk */
- randomize_mem((char *) SlabChunkGetPointer(chunk), size);
+ randomize_mem((char *) MemoryChunkGetPointer(chunk), size);
#endif
Assert(slab->nblocks * slab->blockSize == context->mem_allocated);
- return SlabChunkGetPointer(chunk);
+ return MemoryChunkGetPointer(chunk);
}
/*
* SlabFree
* Frees allocated memory; memory is removed from the slab.
*/
-static void
-SlabFree(MemoryContext context, void *pointer)
+void
+SlabFree(void *pointer)
{
int idx;
- SlabContext *slab = castNode(SlabContext, context);
- SlabChunk *chunk = SlabPointerGetChunk(pointer);
- SlabBlock *block = chunk->block;
+ MemoryChunk *chunk = PointerGetMemoryChunk(pointer);
+ SlabBlock *block = MemoryChunkGetBlock(chunk);
+ SlabContext *slab = block->slab;
#ifdef MEMORY_CONTEXT_CHECKING
/* Test for someone scribbling on unused space in chunk */
- if (slab->chunkSize < (slab->fullChunkSize - sizeof(SlabChunk)))
+ if (slab->chunkSize < (slab->fullChunkSize - Slab_CHUNKHDRSZ))
if (!sentinel_ok(pointer, slab->chunkSize))
elog(WARNING, "detected write past chunk end in %s %p",
slab->header.name, chunk);
@@ -560,13 +508,13 @@ SlabFree(MemoryContext context, void *pointer)
{
free(block);
slab->nblocks--;
- context->mem_allocated -= slab->blockSize;
+ slab->header.mem_allocated -= slab->blockSize;
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
Assert(slab->nblocks >= 0);
- Assert(slab->nblocks * slab->blockSize == context->mem_allocated);
+ Assert(slab->nblocks * slab->blockSize == slab->header.mem_allocated);
}
/*
@@ -582,13 +530,14 @@ SlabFree(MemoryContext context, void *pointer)
* rather pointless - Slab is meant for chunks of constant size, and moreover
* realloc is usually used to enlarge the chunk.
*/
-static void *
-SlabRealloc(MemoryContext context, void *pointer, Size size)
+void *
+SlabRealloc(void *pointer, Size size)
{
- SlabContext *slab = castNode(SlabContext, context);
+ MemoryChunk *chunk = PointerGetMemoryChunk(pointer);
+ SlabBlock *block = MemoryChunkGetBlock(chunk);
+ SlabContext *slab = block->slab;
Assert(slab);
-
/* can't do actual realloc with slab, but let's try to be gentle */
if (size == slab->chunkSize)
return pointer;
@@ -598,14 +547,32 @@ SlabRealloc(MemoryContext context, void *pointer, Size size)
}
/*
+ * SlabGetChunkContext
+ * Return the MemoryContext that 'pointer' belongs to.
+ */
+MemoryContext
+SlabGetChunkContext(void *pointer)
+{
+ MemoryChunk *chunk = PointerGetMemoryChunk(pointer);
+ SlabBlock *block = MemoryChunkGetBlock(chunk);
+ SlabContext *slab = block->slab;
+
+ Assert(slab != NULL);
+
+ return &slab->header;
+}
+
+/*
* SlabGetChunkSpace
* Given a currently-allocated chunk, determine the total space
* it occupies (including all memory-allocation overhead).
*/
-static Size
-SlabGetChunkSpace(MemoryContext context, void *pointer)
+Size
+SlabGetChunkSpace(void *pointer)
{
- SlabContext *slab = castNode(SlabContext, context);
+ MemoryChunk *chunk = PointerGetMemoryChunk(pointer);
+ SlabBlock *block = MemoryChunkGetBlock(chunk);
+ SlabContext *slab = block->slab;
Assert(slab);
@@ -616,7 +583,7 @@ SlabGetChunkSpace(MemoryContext context, void *pointer)
* SlabIsEmpty
* Is a Slab empty of any allocated space?
*/
-static bool
+bool
SlabIsEmpty(MemoryContext context)
{
SlabContext *slab = castNode(SlabContext, context);
@@ -635,7 +602,7 @@ SlabIsEmpty(MemoryContext context)
* totals: if not NULL, add stats about this context into *totals.
* print_to_stderr: print stats to stderr if true, elog otherwise.
*/
-static void
+void
SlabStats(MemoryContext context,
MemoryStatsPrintFunc printfunc, void *passthru,
MemoryContextCounters *totals,
@@ -697,7 +664,7 @@ SlabStats(MemoryContext context,
* find yourself in an infinite loop when trouble occurs, because this
* routine will be entered again when elog cleanup tries to release memory!
*/
-static void
+void
SlabCheck(MemoryContext context)
{
int i;
@@ -728,6 +695,11 @@ SlabCheck(MemoryContext context)
elog(WARNING, "problem in slab %s: number of free chunks %d in block %p does not match freelist %d",
name, block->nfree, block, i);
+ /* make sure the slab pointer correctly points to this context */
+ if (block->slab != slab)
+ elog(WARNING, "problem in slab %s: bogus slab link in block %p",
+ name, block);
+
/* reset the bitmap of free chunks for this block */
memset(slab->freechunks, 0, (slab->chunksPerBlock * sizeof(bool)));
idx = block->firstFreeChunk;
@@ -742,7 +714,7 @@ SlabCheck(MemoryContext context)
nfree = 0;
while (idx < slab->chunksPerBlock)
{
- SlabChunk *chunk;
+ MemoryChunk *chunk;
/* count the chunk as free, add it to the bitmap */
nfree++;
@@ -750,8 +722,8 @@ SlabCheck(MemoryContext context)
/* read index of the next free chunk */
chunk = SlabBlockGetChunk(slab, block, idx);
- VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));
- idx = *(int32 *) SlabChunkGetPointer(chunk);
+ VALGRIND_MAKE_MEM_DEFINED(MemoryChunkGetPointer(chunk), sizeof(int32));
+ idx = *(int32 *) MemoryChunkGetPointer(chunk);
}
for (j = 0; j < slab->chunksPerBlock; j++)
@@ -759,19 +731,19 @@ SlabCheck(MemoryContext context)
/* non-zero bit in the bitmap means the chunk is used */
if (!slab->freechunks[j])
{
- SlabChunk *chunk = SlabBlockGetChunk(slab, block, j);
-
- /* chunks have both block and slab pointers, so check both */
- if (chunk->block != block)
+ MemoryChunk *chunk = SlabBlockGetChunk(slab, block, j);
+ SlabBlock *chunkblock = (SlabBlock *) MemoryChunkGetBlock(chunk);
+
+ /*
+ * check the chunk's blockoffset correctly points back to
+ * the block
+ */
+ if (chunkblock != block)
elog(WARNING, "problem in slab %s: bogus block link in block %p, chunk %p",
name, block, chunk);
- if (chunk->slab != slab)
- elog(WARNING, "problem in slab %s: bogus slab link in block %p, chunk %p",
- name, block, chunk);
-
/* there might be sentinel (thanks to alignment) */
- if (slab->chunkSize < (slab->fullChunkSize - sizeof(SlabChunk)))
+ if (slab->chunkSize < (slab->fullChunkSize - Slab_CHUNKHDRSZ))
if (!sentinel_ok(chunk, slab->chunkSize))
elog(WARNING, "problem in slab %s: detected write past chunk end in block %p, chunk %p",
name, block, chunk);