postgresql/src/backend
Tomas Vondra a1b4f289be Consider BufFiles when adjusting hashjoin parameters
Until now ExecChooseHashTableSize() considered only the size of the
in-memory hash table, and ignored the memory needed for the batch files.
That can be a significant amount, because each batch needs two BufFiles
(each with a BLCKSZ buffer). The same issue applies to increasing the
number of batches during execution.
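
To make the overhead concrete, here is a minimal sketch of the arithmetic.
It assumes the default 8 kB block size (BLCKSZ is a build-time setting);
SKETCH_BLCKSZ and batch_buffer_bytes() are hypothetical names used only for
this illustration, not identifiers from the patch.

```c
#include <stdint.h>

/* Assumed default PostgreSQL block size; BLCKSZ is set at build time. */
#define SKETCH_BLCKSZ 8192

/*
 * Memory taken by batch file buffers alone: two BufFiles per batch,
 * each holding one buffer of BLCKSZ bytes.
 */
static uint64_t
batch_buffer_bytes(uint64_t nbatch)
{
    return nbatch * 2 * (uint64_t) SKETCH_BLCKSZ;
}
```

Under these assumptions a single batch costs 16 kB of buffers, and
1,048,576 batches cost 16 GB, before counting the hash table itself.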

It's also possible to trigger a "batch explosion", e.g. due to duplicate
values or skew. We've seen reports of joins with hundreds of thousands
(or even millions) of batches, consuming gigabytes of memory, triggering
OOM errors. These cases may be fairly rare, but it's clearly possible to
hit them.

These issues can't be prevented during planning. Even if we improve
that, it does not help with execution-time batch explosion. We can
however reduce the impact and use as little memory as possible.

This patch improves the behavior by adjusting how the memory is divided
between the hash table and batch files. It may be better to use fewer
batch files, even if it means the hash table will exceed the limit.

The capacity of the hash node may be increased either by doubling the
number of batches, or doubling the size of the in-memory hash table. The
resulting capacity is the same, but the memory usage may be very
different. For low nbatch values it's better to add batches, for high
nbatch values it's better to allow a larger hash table.

The patch considers both options, both during the initial sizing and
then during execution, to minimize how much the limit gets exceeded.
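
The trade-off between the two options can be sketched as follows. This is
an illustration only, not the code from the patch: the HashSizing struct,
total_bytes(), balance_sizing() and SKETCH_BLCKSZ are hypothetical names,
and the default 8 kB block size is assumed.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ 8192  /* assumed default block size */

/* Hypothetical container for the two parameters being traded off. */
typedef struct HashSizing
{
    uint64_t hash_table_bytes;  /* memory for the in-memory hash table */
    uint64_t nbatch;            /* number of batches (a power of two) */
} HashSizing;

/* Total memory: hash table plus two BLCKSZ buffers per batch. */
static uint64_t
total_bytes(HashSizing s)
{
    return s.hash_table_bytes + s.nbatch * 2 * (uint64_t) SKETCH_BLCKSZ;
}

/*
 * Halve nbatch and double the hash table for as long as that lowers the
 * total memory. The capacity (hash_table_bytes * nbatch) stays the same.
 */
static HashSizing
balance_sizing(HashSizing s)
{
    while (s.nbatch > 1)
    {
        HashSizing alt = { s.hash_table_bytes * 2, s.nbatch / 2 };

        if (total_bytes(alt) >= total_bytes(s))
            break;
        s = alt;
    }
    return s;
}
```

With a 4 MB hash table and 1,048,576 batches, this sketch settles on
16,384 batches and a 256 MB hash table, for 512 MB in total instead of
roughly 16 GB. With only 64 batches it changes nothing, because the batch
buffers are already small compared to the hash table.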

It might seem this patch is relaxing the memory limit - allowing it to
be exceeded. But that's not really the case. The limit could always be
exceeded; the memory used by batch files just was not counted.

Allowing the hash table to grow may also prevent the batch explosion.
If there's a large batch that can't be split (due to hash collisions or
duplicate values), at some point the memory limit will increase enough
for the batch to fit into the hash table.

This patch was in the works for a long time. The early versions were
posted in 2019, and revived every year or two when we happened to get
the next report of OOM due to a hashjoin batch explosion. Each of those
patch versions was reviewed by a couple of people. I'm mentioning only
Melanie Plageman and Robert Haas, because they reviewed the last
version, and the older patches are very different.

Reviewed-by: Melanie Plageman, Robert Haas
Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me
Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development
Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development
2025-02-19 21:08:20 +01:00
access Invalidate inactive replication slots. 2025-02-19 09:29:50 +05:30
archive Update copyright for 2025 2025-01-01 11:21:55 -05:00
backup Update copyright for 2025 2025-01-01 11:21:55 -05:00
bootstrap Remove unnecessary (char *) casts [mem] 2025-02-12 08:50:13 +01:00
catalog Remove unnecessary (char *) casts [xlog] 2025-02-13 10:57:07 +01:00
commands Add ATAlterConstraint struct for ALTER .. CONSTRAINT 2025-02-19 13:06:13 +01:00
executor Consider BufFiles when adjusting hashjoin parameters 2025-02-19 21:08:20 +01:00
foreign Update copyright for 2025 2025-01-01 11:21:55 -05:00
jit Simplify executor's handling of CaseTestExpr & CoerceToDomainValue. 2025-01-30 13:21:42 -05:00
lib Update copyright for 2025 2025-01-01 11:21:55 -05:00
libpq Fix translator notes in comments 2025-02-17 20:23:34 +01:00
main Update copyright for 2025 2025-01-01 11:21:55 -05:00
nodes Move CompareType to separate header file 2025-02-02 08:11:57 +01:00
optimizer Fix freeing a child join's SpecialJoinInfo 2025-02-19 10:02:32 +09:00
parser Add ATAlterConstraint struct for ALTER .. CONSTRAINT 2025-02-19 13:06:13 +01:00
partitioning Track unpruned relids to avoid processing pruned relations 2025-02-07 17:15:09 +09:00
po Update copyright for 2025 2025-01-01 11:21:55 -05:00
port Update copyright for 2025 2025-01-01 11:21:55 -05:00
postmaster Eagerly scan all-visible pages to amortize aggressive vacuum 2025-02-11 13:53:48 -05:00
regex Support PG_UNICODE_FAST locale in the builtin collation provider. 2025-01-17 15:56:30 -08:00
replication Add a test for commit ac0e33136a using the injection point. 2025-02-19 15:02:22 +05:30
rewrite Implement Self-Join Elimination 2025-02-17 12:44:12 +02:00
snowball Update to latest Snowball sources. 2025-02-18 21:13:54 -05:00
statistics Lock table in ShareUpdateExclusive when importing index stats. 2025-02-10 12:58:13 -08:00
storage Fix unsafe access to BufferDescriptors 2025-02-19 11:05:35 +09:00
tcop Remove unnecessary (char *) casts [mem] 2025-02-12 08:50:13 +01:00
tsearch Add is_analyze parameter to vacuum_delay_point(). 2025-02-11 16:38:14 -06:00
utils Improve statistics estimation for single-column GROUP BY in sub-queries 2025-02-19 11:59:30 +02:00
.gitignore Add .gitignore entries for AIX-specific intermediate build artifacts. 2015-07-08 20:44:22 -04:00
common.mk Blind attempt to fix LLVM dependency in the backend 2022-09-15 10:53:48 +07:00
Makefile Update copyright for 2025 2025-01-01 11:21:55 -05:00
meson.build Update copyright for 2025 2025-01-01 11:21:55 -05:00
nls.mk Return yyparse() result not via global variable 2025-01-24 06:55:39 +01:00