1996-08-27 21:59:28 -04:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
*
|
1999-02-13 18:22:53 -05:00
|
|
|
* memutils.h
|
2000-06-27 23:33:33 -04:00
|
|
|
* This file contains declarations for memory allocation utility
|
|
|
|
|
* functions. These are functions that are not quite widely used
|
|
|
|
|
* enough to justify going in utils/palloc.h, but are still part
|
|
|
|
|
* of the API of the memory management subsystem.
|
1996-08-27 21:59:28 -04:00
|
|
|
*
|
|
|
|
|
*
|
2025-01-01 11:21:55 -05:00
|
|
|
* Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
|
2000-01-26 00:58:53 -05:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
1996-08-27 21:59:28 -04:00
|
|
|
*
|
2010-09-20 16:08:53 -04:00
|
|
|
* src/include/utils/memutils.h
|
1996-08-27 21:59:28 -04:00
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
#ifndef MEMUTILS_H
|
|
|
|
|
#define MEMUTILS_H
|
|
|
|
|
|
2000-06-27 23:33:33 -04:00
|
|
|
#include "nodes/memnodes.h"
|
1996-08-27 21:59:28 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2013-06-27 14:53:57 -04:00
|
|
|
* MaxAllocSize, MaxAllocHugeSize
|
|
|
|
|
* Quasi-arbitrary limits on size of allocations.
|
1996-08-27 21:59:28 -04:00
|
|
|
*
|
|
|
|
|
* Note:
|
2013-06-27 14:53:57 -04:00
|
|
|
* There is no guarantee that smaller allocations will succeed, but
|
|
|
|
|
* larger requests will be summarily denied.
|
|
|
|
|
*
|
|
|
|
|
* palloc() enforces MaxAllocSize, chosen to correspond to the limiting size
|
|
|
|
|
* of varlena objects under TOAST. See VARSIZE_4B() and related macros in
|
|
|
|
|
* postgres.h. Many datatypes assume that any allocatable size can be
|
|
|
|
|
* represented in a varlena header. This limit also permits a caller to use
|
|
|
|
|
* an "int" variable for an index into or length of an allocation. Callers
|
|
|
|
|
* careful to avoid these hazards can access the higher limit with
|
|
|
|
|
* MemoryContextAllocHuge(). Both limits permit code to assume that it may
|
|
|
|
|
* compute twice an allocation's size without overflow.
|
1996-08-27 21:59:28 -04:00
|
|
|
*/
|
2001-02-05 20:53:53 -05:00
|
|
|
#define MaxAllocSize ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
|
1996-08-27 21:59:28 -04:00
|
|
|
|
2004-06-05 15:48:09 -04:00
|
|
|
#define AllocSizeIsValid(size) ((Size) (size) <= MaxAllocSize)
|
1996-08-27 21:59:28 -04:00
|
|
|
|
Improve performance of and reduce overheads of memory management
Whenever we palloc a chunk of memory, traditionally, we prefix the
returned pointer with a pointer to the memory context to which the chunk
belongs. This is required so that we're able to easily determine the
owning context when performing operations such as pfree() and repalloc().
For the AllocSet context, prior to this commit we additionally prefixed
the pointer to the owning context with the size of the chunk. This made
the header 16 bytes in size. This 16-byte overhead was required for all
AllocSet allocations regardless of the allocation size.
For the generation context, the problem was worse; in addition to the
pointer to the owning context and chunk size, we also stored a pointer to
the owning block so that we could track the number of freed chunks on a
block.
The slab allocator had a 16-byte chunk header.
The changes being made here reduce the chunk header size down to just 8
bytes for all 3 of our memory context types. For small to medium sized
allocations, this significantly increases the number of chunks that we can
fit on a given block which results in much more efficient use of memory.
Additionally, this commit completely changes the rule that pointers to
palloc'd memory must be directly prefixed by a pointer to the owning
memory context and instead, we now insist that they're directly prefixed
by an 8-byte value where the least significant 3-bits are set to a value
to indicate which type of memory context the pointer belongs to. Using
those 3 bits as an index (known as MemoryContextMethodID) to a new array
which stores the methods for each memory context type, we're now able to
pass the pointer given to functions such as pfree() and repalloc() to the
function specific to that context implementation to allow them to devise
their own methods of finding the memory context which owns the given
allocated chunk of memory.
The reason we're able to reduce the chunk header down to just 8 bytes is
because of the way we make use of the remaining 61 bits of the required
8-byte chunk header. Here we also implement a general-purpose MemoryChunk
struct which makes use of those 61 remaining bits to allow the storage of
a 30-bit value which the MemoryContext is free to use as it pleases, and
also the number of bytes which must be subtracted from the chunk to get a
reference to the block that the chunk is stored on (also 30 bits). The 1
additional remaining bit is to denote if the chunk is an "external" chunk
or not. External here means that the chunk header does not store the
30-bit value or the block offset. The MemoryContext can use these
external chunks at any time, but must use them if any of the two 30-bit
fields are not large enough for the value(s) that need to be stored in
them. When the chunk is marked as external, it is up to the MemoryContext
to devise its own means to determine the block offset.
Using 3-bits for the MemoryContextMethodID does mean we're limiting
ourselves to only having a maximum of 8 different memory context types.
We could reduce the bit space for the 30-bit value a little to make way
for more than 3 bits, but it seems like it might be better to do that only
if we ever need more than 8 context types. This would only be a problem
if some future memory context type which does not use MemoryChunk really
couldn't give up any of the 61 remaining bits in the chunk header.
With this MemoryChunk, each of our 3 memory context types can quickly
obtain a reference to the block any given chunk is located on. AllocSet
is able to find the context to which the chunk is owned, by first
obtaining a reference to the block by subtracting the block offset as is
stored in the 'hdrmask' field and then referencing the block's 'aset'
field. The Generation context uses the same method, but GenerationBlock
did not have a field pointing back to the owning context, so one is added
by this commit.
In aset.c and generation.c, all allocations larger than allocChunkLimit
are stored on dedicated blocks. When there's just a single chunk on a
block like this, it's easy to find the block from the chunk, we just
subtract the size of the block header from the chunk pointer. The size of
these chunks is also known as we store the endptr on the block, so we can
just subtract the pointer to the allocated memory from that. Because we
can easily find the owning block and the size of the chunk for these
dedicated blocks, we just always use external chunks for allocation sizes
larger than allocChunkLimit. For generation.c, this sidesteps the problem
of non-external MemoryChunks being unable to represent chunk sizes >= 1GB.
This is less of a problem for aset.c as we store the free list index in
the MemoryChunk's spare 30-bit field (the value of which will never be
close to using all 30-bits). We can easily reverse engineer the chunk size
from this when needed. Storing this saves AllocSetFree() from having to
make a call to AllocSetFreeIndex() to determine which free list to put the
newly freed chunk on.
For the slab allocator, this commit adds a new restriction that slab
chunks cannot be >= 1GB in size. If there happened to be any users of
slab.c which used chunk sizes this large, they really should be using
AllocSet instead.
Here we also add a restriction that normal non-dedicated blocks cannot be
1GB or larger. It's now not possible to pass a 'maxBlockSize' >= 1GB
during the creation of an AllocSet or Generation context. Allocations can
still be larger than 1GB, it's just these will always be on dedicated
blocks (which do not have the 1GB restriction).
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com
2022-08-29 01:15:00 -04:00
|
|
|
/* Must be less than SIZE_MAX */
|
2017-09-01 13:52:53 -04:00
|
|
|
#define MaxAllocHugeSize (SIZE_MAX / 2)
|
2013-06-27 14:53:57 -04:00
|
|
|
|
Improve performance of and reduce overheads of memory management
Whenever we palloc a chunk of memory, traditionally, we prefix the
returned pointer with a pointer to the memory context to which the chunk
belongs. This is required so that we're able to easily determine the
owning context when performing operations such as pfree() and repalloc().
For the AllocSet context, prior to this commit we additionally prefixed
the pointer to the owning context with the size of the chunk. This made
the header 16 bytes in size. This 16-byte overhead was required for all
AllocSet allocations regardless of the allocation size.
For the generation context, the problem was worse; in addition to the
pointer to the owning context and chunk size, we also stored a pointer to
the owning block so that we could track the number of freed chunks on a
block.
The slab allocator had a 16-byte chunk header.
The changes being made here reduce the chunk header size down to just 8
bytes for all 3 of our memory context types. For small to medium sized
allocations, this significantly increases the number of chunks that we can
fit on a given block which results in much more efficient use of memory.
Additionally, this commit completely changes the rule that pointers to
palloc'd memory must be directly prefixed by a pointer to the owning
memory context and instead, we now insist that they're directly prefixed
by an 8-byte value where the least significant 3-bits are set to a value
to indicate which type of memory context the pointer belongs to. Using
those 3 bits as an index (known as MemoryContextMethodID) to a new array
which stores the methods for each memory context type, we're now able to
pass the pointer given to functions such as pfree() and repalloc() to the
function specific to that context implementation to allow them to devise
their own methods of finding the memory context which owns the given
allocated chunk of memory.
The reason we're able to reduce the chunk header down to just 8 bytes is
because of the way we make use of the remaining 61 bits of the required
8-byte chunk header. Here we also implement a general-purpose MemoryChunk
struct which makes use of those 61 remaining bits to allow the storage of
a 30-bit value which the MemoryContext is free to use as it pleases, and
also the number of bytes which must be subtracted from the chunk to get a
reference to the block that the chunk is stored on (also 30 bits). The 1
additional remaining bit is to denote if the chunk is an "external" chunk
or not. External here means that the chunk header does not store the
30-bit value or the block offset. The MemoryContext can use these
external chunks at any time, but must use them if any of the two 30-bit
fields are not large enough for the value(s) that need to be stored in
them. When the chunk is marked as external, it is up to the MemoryContext
to devise its own means to determine the block offset.
Using 3-bits for the MemoryContextMethodID does mean we're limiting
ourselves to only having a maximum of 8 different memory context types.
We could reduce the bit space for the 30-bit value a little to make way
for more than 3 bits, but it seems like it might be better to do that only
if we ever need more than 8 context types. This would only be a problem
if some future memory context type which does not use MemoryChunk really
couldn't give up any of the 61 remaining bits in the chunk header.
With this MemoryChunk, each of our 3 memory context types can quickly
obtain a reference to the block any given chunk is located on. AllocSet
is able to find the context to which the chunk is owned, by first
obtaining a reference to the block by subtracting the block offset as is
stored in the 'hdrmask' field and then referencing the block's 'aset'
field. The Generation context uses the same method, but GenerationBlock
did not have a field pointing back to the owning context, so one is added
by this commit.
In aset.c and generation.c, all allocations larger than allocChunkLimit
are stored on dedicated blocks. When there's just a single chunk on a
block like this, it's easy to find the block from the chunk, we just
subtract the size of the block header from the chunk pointer. The size of
these chunks is also known as we store the endptr on the block, so we can
just subtract the pointer to the allocated memory from that. Because we
can easily find the owning block and the size of the chunk for these
dedicated blocks, we just always use external chunks for allocation sizes
larger than allocChunkLimit. For generation.c, this sidesteps the problem
of non-external MemoryChunks being unable to represent chunk sizes >= 1GB.
This is less of a problem for aset.c as we store the free list index in
the MemoryChunk's spare 30-bit field (the value of which will never be
close to using all 30-bits). We can easily reverse engineer the chunk size
from this when needed. Storing this saves AllocSetFree() from having to
make a call to AllocSetFreeIndex() to determine which free list to put the
newly freed chunk on.
For the slab allocator, this commit adds a new restriction that slab
chunks cannot be >= 1GB in size. If there happened to be any users of
slab.c which used chunk sizes this large, they really should be using
AllocSet instead.
Here we also add a restriction that normal non-dedicated blocks cannot be
1GB or larger. It's now not possible to pass a 'maxBlockSize' >= 1GB
during the creation of an AllocSet or Generation context. Allocations can
still be larger than 1GB, it's just these will always be on dedicated
blocks (which do not have the 1GB restriction).
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com
2022-08-29 01:15:00 -04:00
|
|
|
#define InvalidAllocSize SIZE_MAX
|
|
|
|
|
|
2013-06-27 14:53:57 -04:00
|
|
|
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
|
|
|
|
|
|
1999-08-24 16:11:19 -04:00
|
|
|
|
1999-02-06 11:50:34 -05:00
|
|
|
/*
|
2000-06-27 23:33:33 -04:00
|
|
|
* Standard top-level memory contexts.
|
1999-08-24 16:11:19 -04:00
|
|
|
*
|
2000-06-27 23:33:33 -04:00
|
|
|
* Only TopMemoryContext and ErrorContext are initialized by
|
|
|
|
|
* MemoryContextInit() itself.
|
1999-02-06 11:50:34 -05:00
|
|
|
*/
|
2007-07-25 08:22:54 -04:00
|
|
|
extern PGDLLIMPORT MemoryContext TopMemoryContext;
|
|
|
|
|
extern PGDLLIMPORT MemoryContext ErrorContext;
|
|
|
|
|
extern PGDLLIMPORT MemoryContext PostmasterContext;
|
|
|
|
|
extern PGDLLIMPORT MemoryContext CacheMemoryContext;
|
|
|
|
|
extern PGDLLIMPORT MemoryContext MessageContext;
|
|
|
|
|
extern PGDLLIMPORT MemoryContext TopTransactionContext;
|
|
|
|
|
extern PGDLLIMPORT MemoryContext CurTransactionContext;
|
2003-08-03 20:43:34 -04:00
|
|
|
|
2007-03-12 20:33:44 -04:00
|
|
|
/* This is a transient link to the active portal's memory context: */
|
2007-07-25 08:22:54 -04:00
|
|
|
extern PGDLLIMPORT MemoryContext PortalContext;
|
1999-02-06 11:50:34 -05:00
|
|
|
|
|
|
|
|
|
1996-08-27 21:59:28 -04:00
|
|
|
/*
|
2000-06-27 23:33:33 -04:00
|
|
|
* Memory-context-type-independent functions in mcxt.c
|
1996-08-27 21:59:28 -04:00
|
|
|
*/
|
2000-06-27 23:33:33 -04:00
|
|
|
extern void MemoryContextInit(void);
|
|
|
|
|
extern void MemoryContextReset(MemoryContext context);
|
|
|
|
|
extern void MemoryContextDelete(MemoryContext context);
|
2015-02-27 18:09:42 -05:00
|
|
|
extern void MemoryContextResetOnly(MemoryContext context);
|
2000-06-27 23:33:33 -04:00
|
|
|
extern void MemoryContextResetChildren(MemoryContext context);
|
|
|
|
|
extern void MemoryContextDeleteChildren(MemoryContext context);
|
Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:47 -04:00
|
|
|
extern void MemoryContextSetIdentifier(MemoryContext context, const char *id);
|
2011-09-11 16:29:42 -04:00
|
|
|
extern void MemoryContextSetParent(MemoryContext context,
|
|
|
|
|
MemoryContext new_parent);
|
Improve performance of and reduce overheads of memory management
Whenever we palloc a chunk of memory, traditionally, we prefix the
returned pointer with a pointer to the memory context to which the chunk
belongs. This is required so that we're able to easily determine the
owning context when performing operations such as pfree() and repalloc().
For the AllocSet context, prior to this commit we additionally prefixed
the pointer to the owning context with the size of the chunk. This made
the header 16 bytes in size. This 16-byte overhead was required for all
AllocSet allocations regardless of the allocation size.
For the generation context, the problem was worse; in addition to the
pointer to the owning context and chunk size, we also stored a pointer to
the owning block so that we could track the number of freed chunks on a
block.
The slab allocator had a 16-byte chunk header.
The changes being made here reduce the chunk header size down to just 8
bytes for all 3 of our memory context types. For small to medium sized
allocations, this significantly increases the number of chunks that we can
fit on a given block which results in much more efficient use of memory.
Additionally, this commit completely changes the rule that pointers to
palloc'd memory must be directly prefixed by a pointer to the owning
memory context and instead, we now insist that they're directly prefixed
by an 8-byte value where the least significant 3-bits are set to a value
to indicate which type of memory context the pointer belongs to. Using
those 3 bits as an index (known as MemoryContextMethodID) to a new array
which stores the methods for each memory context type, we're now able to
pass the pointer given to functions such as pfree() and repalloc() to the
function specific to that context implementation to allow them to devise
their own methods of finding the memory context which owns the given
allocated chunk of memory.
The reason we're able to reduce the chunk header down to just 8 bytes is
because of the way we make use of the remaining 61 bits of the required
8-byte chunk header. Here we also implement a general-purpose MemoryChunk
struct which makes use of those 61 remaining bits to allow the storage of
a 30-bit value which the MemoryContext is free to use as it pleases, and
also the number of bytes which must be subtracted from the chunk to get a
reference to the block that the chunk is stored on (also 30 bits). The 1
additional remaining bit is to denote if the chunk is an "external" chunk
or not. External here means that the chunk header does not store the
30-bit value or the block offset. The MemoryContext can use these
external chunks at any time, but must use them if any of the two 30-bit
fields are not large enough for the value(s) that need to be stored in
them. When the chunk is marked as external, it is up to the MemoryContext
to devise its own means to determine the block offset.
Using 3-bits for the MemoryContextMethodID does mean we're limiting
ourselves to only having a maximum of 8 different memory context types.
We could reduce the bit space for the 30-bit value a little to make way
for more than 3 bits, but it seems like it might be better to do that only
if we ever need more than 8 context types. This would only be a problem
if some future memory context type which does not use MemoryChunk really
couldn't give up any of the 61 remaining bits in the chunk header.
With this MemoryChunk, each of our 3 memory context types can quickly
obtain a reference to the block any given chunk is located on. AllocSet
is able to find the context to which the chunk is owned, by first
obtaining a reference to the block by subtracting the block offset as is
stored in the 'hdrmask' field and then referencing the block's 'aset'
field. The Generation context uses the same method, but GenerationBlock
did not have a field pointing back to the owning context, so one is added
by this commit.
In aset.c and generation.c, all allocations larger than allocChunkLimit
are stored on dedicated blocks. When there's just a single chunk on a
block like this, it's easy to find the block from the chunk, we just
subtract the size of the block header from the chunk pointer. The size of
these chunks is also known as we store the endptr on the block, so we can
just subtract the pointer to the allocated memory from that. Because we
can easily find the owning block and the size of the chunk for these
dedicated blocks, we just always use external chunks for allocation sizes
larger than allocChunkLimit. For generation.c, this sidesteps the problem
of non-external MemoryChunks being unable to represent chunk sizes >= 1GB.
This is less of a problem for aset.c as we store the free list index in
the MemoryChunk's spare 30-bit field (the value of which will never be
close to using all 30-bits). We can easily reverse engineer the chunk size
from this when needed. Storing this saves AllocSetFree() from having to
make a call to AllocSetFreeIndex() to determine which free list to put the
newly freed chunk on.
For the slab allocator, this commit adds a new restriction that slab
chunks cannot be >= 1GB in size. If there happened to be any users of
slab.c which used chunk sizes this large, they really should be using
AllocSet instead.
Here we also add a restriction that normal non-dedicated blocks cannot be
1GB or larger. It's now not possible to pass a 'maxBlockSize' >= 1GB
during the creation of an AllocSet or Generation context. Allocations can
still be larger than 1GB, it's just these will always be on dedicated
blocks (which do not have the 1GB restriction).
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com
2022-08-29 01:15:00 -04:00
|
|
|
extern MemoryContext GetMemoryChunkContext(void *pointer);
|
2002-08-11 20:36:12 -04:00
|
|
|
extern Size GetMemoryChunkSpace(void *pointer);
|
2011-09-16 00:42:53 -04:00
|
|
|
extern MemoryContext MemoryContextGetParent(MemoryContext context);
|
2004-09-16 16:17:49 -04:00
|
|
|
extern bool MemoryContextIsEmpty(MemoryContext context);
|
2019-10-05 14:49:39 -04:00
|
|
|
extern Size MemoryContextMemAllocated(MemoryContext context, bool recurse);
|
2024-01-29 11:53:03 -05:00
|
|
|
extern void MemoryContextMemConsumed(MemoryContext context,
|
|
|
|
|
MemoryContextCounters *consumed);
|
2000-06-27 23:33:33 -04:00
|
|
|
extern void MemoryContextStats(MemoryContext context);
|
Avoid recursion in MemoryContext functions
You might run out of stack space with recursion, which is not nice in
functions that might be used e.g. at cleanup after transaction
abort. MemoryContext contains pointer to parent and siblings, so we
can traverse a tree of contexts iteratively, without using
stack. Refactor the functions to do that.
MemoryContextStats() still recurses, but it now has a limit to how
deep it recurses. Once the limit is reached, it prints just a summary
of the rest of the hierarchy, similar to how it summarizes contexts
with lots of children. That seems good anyway, because a context dump
with hundreds of nested contexts isn't very readable.
Report by Egor Chindyaskin and Alexander Lakhin.
Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
Author: Heikki Linnakangas
Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov, Tom Lane
2024-03-08 06:01:36 -05:00
|
|
|
extern void MemoryContextStatsDetail(MemoryContext context,
|
|
|
|
|
int max_level, int max_children,
|
Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.
This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.
Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.
When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.
There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.
Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 00:44:15 -04:00
|
|
|
bool print_to_stderr);
|
2014-06-30 03:13:48 -04:00
|
|
|
extern void MemoryContextAllowInCriticalSection(MemoryContext context,
|
|
|
|
|
bool allow);
|
2002-09-04 16:31:48 -04:00
|
|
|
|
2002-08-11 20:36:12 -04:00
|
|
|
#ifdef MEMORY_CONTEXT_CHECKING
|
Here is the patch with memory leak checker. This checker allow detect
in-chunk leaks, overwrite-next-chunk leaks and overwrite block-freeptr leaks.
A in-chunk leak --- if something overwrite space after wanted (via palloc()
size, but it is still inside chunk. For example
x = palloc(12); /* create 16b chunk */
memset(x, '#', 13);
this leak is in the current source total invisible, because chunk is 16b and
leak is in the "align space".
For this feature I add data_size to StandardChunk, and all memory which go
from AllocSetAlloc() is marked as 0x7F.
The MemoryContextCheck() is compiled '#ifdef USE_ASSERT_CHECKING'.
I add this checking to 'tcop/postgres.c' and is active after each backend
query, but it is probably not sufficient, because some MemoryContext exist
only during memory processing --- will good if someone who known where
it is needful (Tom:-) add it for others contexts;
A problem in the current source is that we have still some malloc()
allocation that is not needful and this allocation is total invisible for
all context routines. For example Dllist in backend (pretty dirty it is in
catcache where values in Dllist are palloc-ed, but list is malloc-ed).
--- and BTW. this Dllist design stand in the way for query cache :-)
Tom, if you agree I start replace some mallocs.
BTW. --- Tom, have you idea for across transaction presistent allocation for
SQL functions? (like regex - now it is via malloc)
I almost forget. I add one if() to AllocSetAlloc(), for 'size' that are
greater than ALLOC_BIGCHUNK_LIMIT is not needful check AllocSetFreeIndex(),
because 'fidx' is always 'ALLOCSET_NUM_FREELISTS - 1'. It a little brisk up
allocation for very large chunks. Right?
Karel
2000-07-11 10:30:37 -04:00
|
|
|
extern void MemoryContextCheck(MemoryContext context);
|
2002-08-11 20:36:12 -04:00
|
|
|
#endif
|
1996-08-27 21:59:28 -04:00
|
|
|
|
Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:47 -04:00
|
|
|
/* Handy macro for copying and assigning context ID ... but note double eval */
|
2018-04-06 12:10:00 -04:00
|
|
|
#define MemoryContextCopyAndSetIdentifier(cxt, id) \
|
Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:47 -04:00
|
|
|
MemoryContextSetIdentifier(cxt, MemoryContextStrdup(cxt, id))
|
|
|
|
|
|
Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.
This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.
Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.
When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.
There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.
Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 00:44:15 -04:00
|
|
|
extern void HandleLogMemoryContextInterrupt(void);
|
|
|
|
|
extern void ProcessLogMemoryContextInterrupt(void);
|
1996-08-27 21:59:28 -04:00
|
|
|
|
2000-06-27 23:33:33 -04:00
|
|
|
/*
|
|
|
|
|
* Memory-context-type-specific functions
|
|
|
|
|
*/
|
1996-08-27 21:59:28 -04:00
|
|
|
|
2000-06-27 23:33:33 -04:00
|
|
|
/* aset.c */
|
2018-10-12 14:26:56 -04:00
|
|
|
extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
|
Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:12 -05:00
|
|
|
const char *name,
|
|
|
|
|
Size minContextSize,
|
|
|
|
|
Size initBlockSize,
|
|
|
|
|
Size maxBlockSize);
|
|
|
|
|
|
|
|
|
|
/*
|
Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:47 -04:00
|
|
|
* This wrapper macro exists to check for non-constant strings used as context
|
|
|
|
|
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
|
2018-10-12 14:26:56 -04:00
|
|
|
* want to provide a variable identifier.)
|
Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:12 -05:00
|
|
|
*/
|
|
|
|
|
#ifdef HAVE__BUILTIN_CONSTANT_P
|
2018-10-12 14:26:56 -04:00
|
|
|
#define AllocSetContextCreate(parent, name, ...) \
|
Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:12 -05:00
|
|
|
(StaticAssertExpr(__builtin_constant_p(name), \
|
Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:47 -04:00
|
|
|
"memory context names must be constant strings"), \
|
2018-10-12 14:26:56 -04:00
|
|
|
AllocSetContextCreateInternal(parent, name, __VA_ARGS__))
|
Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:12 -05:00
|
|
|
#else
|
2018-10-12 14:26:56 -04:00
|
|
|
#define AllocSetContextCreate \
|
|
|
|
|
AllocSetContextCreateInternal
|
Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:12 -05:00
|
|
|
#endif
|
1996-08-27 21:59:28 -04:00
|
|
|
|
Add "Slab" MemoryContext implementation for efficient equal-sized allocations.
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small, and have varying
lifetimes. There tends to be a lot of fragmentation, larger
allocations always directly go to libc rather than have their cost
amortized over several pallocs.
These problems lead to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.
While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.
This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly. It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations. In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.
There likely are further potentialy uses of this allocator besides
reorderbuffer.c.
There's potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.
Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
2017-02-27 06:41:44 -05:00
|
|
|
/* slab.c */
|
|
|
|
|
extern MemoryContext SlabContextCreate(MemoryContext parent,
|
|
|
|
|
const char *name,
|
|
|
|
|
Size blockSize,
|
|
|
|
|
Size chunkSize);
|
|
|
|
|
|
2017-11-22 13:45:07 -05:00
|
|
|
/* generation.c */
|
|
|
|
|
extern MemoryContext GenerationContextCreate(MemoryContext parent,
|
|
|
|
|
const char *name,
|
2022-04-04 04:53:13 -04:00
|
|
|
Size minContextSize,
|
|
|
|
|
Size initBlockSize,
|
|
|
|
|
Size maxBlockSize);
|
2017-11-22 13:45:07 -05:00
|
|
|
|
Introduce a bump memory allocator
This introduces a bump MemoryContext type. The bump context is best
suited for short-lived memory contexts which require only allocations
of memory and never a pfree or repalloc, which are unsupported.
Memory palloc'd into a bump context has no chunk header. This makes
bump a useful context type when lots of small allocations need to be
done without any need to pfree those allocations. Allocation sizes are
rounded up to the next MAXALIGN boundary, so with this and no chunk
header, allocations are very compact indeed.
Allocations are also very fast as bump does not check any freelists to
try and make use of previously free'd chunks. It just checks if there
is enough room on the current block, and if so it bumps the freeptr
beyond this chunk and returns the value that the freeptr was previously
pointing to. Simple and fast. A new block is malloc'd when there's not
enough space in the current block.
Code using the bump allocator must take care never to call any functions
which could try to call realloc() (or variants), pfree(),
GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated
chunk. Due to lack of chunk headers, these operations are unsupported.
To increase the chances of catching such issues, when compiled with
MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and
any attempt to perform an unsupported operation will result in an ERROR.
Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported
operation could result in a segfault.
A follow-on commit will implement the first user of bump.
Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-07 08:02:43 -04:00
|
|
|
/* bump.c */
|
|
|
|
|
extern MemoryContext BumpContextCreate(MemoryContext parent,
|
|
|
|
|
const char *name,
|
|
|
|
|
Size minContextSize,
|
|
|
|
|
Size initBlockSize,
|
|
|
|
|
Size maxBlockSize);
|
|
|
|
|
|
2000-06-27 23:33:33 -04:00
|
|
|
/*
|
|
|
|
|
* Recommended default alloc parameters, suitable for "ordinary" contexts
|
|
|
|
|
* that might hold quite a lot of data.
|
|
|
|
|
*/
|
2002-12-15 16:01:34 -05:00
|
|
|
#define ALLOCSET_DEFAULT_MINSIZE 0
|
2000-06-27 23:33:33 -04:00
|
|
|
#define ALLOCSET_DEFAULT_INITSIZE (8 * 1024)
|
|
|
|
|
#define ALLOCSET_DEFAULT_MAXSIZE (8 * 1024 * 1024)
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
|
|
|
#define ALLOCSET_DEFAULT_SIZES \
|
|
|
|
|
ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE
|
1996-08-27 21:59:28 -04:00
|
|
|
|
2002-12-15 16:01:34 -05:00
|
|
|
/*
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
|
|
|
* Recommended alloc parameters for "small" contexts that are never expected
|
2002-12-15 16:01:34 -05:00
|
|
|
* to contain much data (for example, a context to contain a query plan).
|
|
|
|
|
*/
|
|
|
|
|
#define ALLOCSET_SMALL_MINSIZE 0
|
|
|
|
|
#define ALLOCSET_SMALL_INITSIZE (1 * 1024)
|
|
|
|
|
#define ALLOCSET_SMALL_MAXSIZE (8 * 1024)
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
|
|
|
#define ALLOCSET_SMALL_SIZES \
|
|
|
|
|
ALLOCSET_SMALL_MINSIZE, ALLOCSET_SMALL_INITSIZE, ALLOCSET_SMALL_MAXSIZE
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Recommended alloc parameters for contexts that should start out small,
|
|
|
|
|
* but might sometimes grow big.
|
|
|
|
|
*/
|
|
|
|
|
#define ALLOCSET_START_SMALL_SIZES \
|
|
|
|
|
ALLOCSET_SMALL_MINSIZE, ALLOCSET_SMALL_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE
|
|
|
|
|
|
2002-12-15 16:01:34 -05:00
|
|
|
|
Fix bogus "out of memory" reports in tuplestore.c.
The tuplesort/tuplestore memory management logic assumed that the chunk
allocation overhead for its memtuples array could not increase when
increasing the array size. This is and always was true for tuplesort,
but we (I, I think) blindly copied that logic into tuplestore.c without
noticing that the assumption failed to hold for the much smaller array
elements used by tuplestore. Given rather small work_mem, this could
result in an improper complaint about "unexpected out-of-memory situation",
as reported by Brent DeSpain in bug #13530.
The easiest way to fix this is just to increase tuplestore's initial
array size so that the assumption holds. Rather than relying on magic
constants, though, let's export a #define from aset.c that represents
the safe allocation threshold, and make tuplestore's calculation depend
on that.
Do the same in tuplesort.c to keep the logic looking parallel, even though
tuplesort.c isn't actually at risk at present. This will keep us from
breaking it if we ever muck with the allocation parameters in aset.c.
Back-patch to all supported versions. The error message doesn't occur
pre-9.3, not so much because the problem can't happen as because the
pre-9.3 tuplestore code neglected to check for it. (The chance of
trouble is a great deal larger as of 9.3, though, due to changes in the
array-size-increasing strategy.) However, allowing LACKMEM() to become
true unexpectedly could still result in less-than-desirable behavior,
so let's patch it all the way back.
2015-08-04 18:18:46 -04:00
|
|
|
/*
|
|
|
|
|
* Threshold above which a request in an AllocSet context is certain to be
|
|
|
|
|
* allocated separately (and thereby have constant allocation overhead).
|
|
|
|
|
* Few callers should be interested in this, but tuplesort/tuplestore need
|
|
|
|
|
* to know it.
|
|
|
|
|
*/
|
|
|
|
|
#define ALLOCSET_SEPARATE_THRESHOLD 8192
|
|
|
|
|
|
Add "Slab" MemoryContext implementation for efficient equal-sized allocations.
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small, and have varying
lifetimes. There tends to be a lot of fragmentation, larger
allocations always directly go to libc rather than have their cost
amortized over several pallocs.
These problems lead to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.
While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.
This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly. It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations. In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.
There likely are further potentialy uses of this allocator besides
reorderbuffer.c.
There's potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.
Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
2017-02-27 06:41:44 -05:00
|
|
|
#define SLAB_DEFAULT_BLOCK_SIZE (8 * 1024)
|
|
|
|
|
#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024)
|
|
|
|
|
|
2024-10-31 22:35:46 -04:00
|
|
|
/*
|
Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
2024-11-17 20:08:20 -05:00
|
|
|
* pg_memory_is_all_zeros
|
|
|
|
|
*
|
2024-10-31 22:35:46 -04:00
|
|
|
* Test if a memory region starting at "ptr" and of size "len" is full of
|
|
|
|
|
* zeroes.
|
Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
2024-11-17 20:08:20 -05:00
|
|
|
*
|
|
|
|
|
* The test is divided into multiple cases for safety reason and multiple
|
|
|
|
|
* phases for efficiency.
|
|
|
|
|
*
|
|
|
|
|
* Case 1: len < sizeof(size_t) bytes, then byte-by-byte comparison.
|
|
|
|
|
* Case 2: len < (sizeof(size_t) * 8 - 1) bytes:
|
|
|
|
|
* - Phase 1: byte-by-byte comparison, until the pointer is aligned.
|
|
|
|
|
* - Phase 2: size_t comparisons, with aligned pointers, up to the last
|
|
|
|
|
* location possible.
|
|
|
|
|
* - Phase 3: byte-by-byte comparison, until the end location.
|
|
|
|
|
* Case 3: len >= (sizeof(size_t) * 8) bytes, same as case 2 except that an
|
|
|
|
|
* additional phase is placed between Phase 1 and Phase 2, with
|
|
|
|
|
* (8 * sizeof(size_t)) comparisons using bitwise OR to encourage
|
|
|
|
|
* compilers to use SIMD instructions if available, up to the last
|
|
|
|
|
* aligned location possible.
|
|
|
|
|
*
|
|
|
|
|
* Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
|
|
|
|
|
* memory area. This is portable for 32-bit and 64-bit architectures.
|
|
|
|
|
*
|
|
|
|
|
* Caller must ensure that "ptr" is not NULL.
|
2024-10-31 22:35:46 -04:00
|
|
|
*/
|
|
|
|
|
static inline bool
|
|
|
|
|
pg_memory_is_all_zeros(const void *ptr, size_t len)
|
|
|
|
|
{
|
Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
2024-11-17 20:08:20 -05:00
|
|
|
const unsigned char *p = (const unsigned char *) ptr;
|
|
|
|
|
const unsigned char *end = &p[len];
|
|
|
|
|
const unsigned char *aligned_end = (const unsigned char *)
|
|
|
|
|
((uintptr_t) end & (~(sizeof(size_t) - 1)));
|
|
|
|
|
|
|
|
|
|
if (len < sizeof(size_t))
|
|
|
|
|
{
|
|
|
|
|
while (p < end)
|
|
|
|
|
{
|
|
|
|
|
if (*p++ != 0)
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* "len" in the [sizeof(size_t), sizeof(size_t) * 8 - 1] range */
|
|
|
|
|
if (len < sizeof(size_t) * 8)
|
|
|
|
|
{
|
|
|
|
|
/* Compare bytes until the pointer "p" is aligned */
|
|
|
|
|
while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
|
|
|
|
|
{
|
|
|
|
|
if (p == end)
|
|
|
|
|
return true;
|
|
|
|
|
if (*p++ != 0)
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Compare remaining size_t-aligned chunks.
|
|
|
|
|
*
|
|
|
|
|
* There is no risk to read beyond the memory area, as "aligned_end"
|
|
|
|
|
* cannot be higher than "end".
|
|
|
|
|
*/
|
|
|
|
|
for (; p < aligned_end; p += sizeof(size_t))
|
|
|
|
|
{
|
|
|
|
|
if (*(size_t *) p != 0)
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Compare remaining bytes until the end */
|
|
|
|
|
while (p < end)
|
|
|
|
|
{
|
|
|
|
|
if (*p++ != 0)
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* "len" in the [sizeof(size_t) * 8, inf) range */
|
2024-10-31 22:35:46 -04:00
|
|
|
|
Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
2024-11-17 20:08:20 -05:00
|
|
|
/* Compare bytes until the pointer "p" is aligned */
|
|
|
|
|
while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
|
2024-10-31 22:35:46 -04:00
|
|
|
{
|
Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
2024-11-17 20:08:20 -05:00
|
|
|
if (p == end)
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
if (*p++ != 0)
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Compare 8 * sizeof(size_t) chunks at once.
|
|
|
|
|
*
|
|
|
|
|
* For performance reasons, we manually unroll this loop and purposefully
|
|
|
|
|
* use bitwise-ORs to combine each comparison. This prevents boolean
|
|
|
|
|
* short-circuiting and lets the compiler know that it's safe to access
|
|
|
|
|
* all 8 elements regardless of the result of the other comparisons. This
|
|
|
|
|
* seems to be enough to coax a few compilers into using SIMD
|
|
|
|
|
* instructions.
|
|
|
|
|
*/
|
|
|
|
|
for (; p < aligned_end - (sizeof(size_t) * 7); p += sizeof(size_t) * 8)
|
|
|
|
|
{
|
|
|
|
|
if ((((size_t *) p)[0] != 0) | (((size_t *) p)[1] != 0) |
|
|
|
|
|
(((size_t *) p)[2] != 0) | (((size_t *) p)[3] != 0) |
|
|
|
|
|
(((size_t *) p)[4] != 0) | (((size_t *) p)[5] != 0) |
|
|
|
|
|
(((size_t *) p)[6] != 0) | (((size_t *) p)[7] != 0))
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Compare remaining size_t-aligned chunks.
|
|
|
|
|
*
|
|
|
|
|
* There is no risk to read beyond the memory area, as "aligned_end"
|
|
|
|
|
* cannot be higher than "end".
|
|
|
|
|
*/
|
|
|
|
|
for (; p < aligned_end; p += sizeof(size_t))
|
|
|
|
|
{
|
|
|
|
|
if (*(size_t *) p != 0)
|
2024-10-31 22:35:46 -04:00
|
|
|
return false;
|
|
|
|
|
}
|
Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
2024-11-17 20:08:20 -05:00
|
|
|
|
|
|
|
|
/* Compare remaining bytes until the end */
|
|
|
|
|
while (p < end)
|
|
|
|
|
{
|
|
|
|
|
if (*p++ != 0)
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2024-10-31 22:35:46 -04:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
1996-08-27 21:59:28 -04:00
|
|
|
#endif /* MEMUTILS_H */
|