postgresql/src/backend
Peter Geoghegan 9a2e2a285a Improve nbtree array primitive scan scheduling.
Add a new scheduling heuristic: don't end the ongoing primitive index
scan immediately (at the point where _bt_advance_array_keys notices that
the next set of matching tuples must be on a later page) if the primscan
already managed to step right/left from its first leaf page.  Schedule a
recheck against the next sibling leaf page's finaltup instead.

The new heuristic tends to avoid scenarios where the top-level scan
repeatedly starts and ends primitive index scans that each read only one
leaf page from a group of neighboring leaf pages.  Affected top-level
scans will now tend to step forward (or backward) through the index
instead, without wasting cycles on descending the index anew.
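
The scheduling rule described above can be sketched as a toy model. This is
only an illustration under stated assumptions, not the nbtree code: the
names ToyPrimScan, SchedDecision, and toy_schedule are hypothetical, and the
real decision in _bt_advance_array_keys weighs more state than a page count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the scheduling decision; not an nbtree type. */
typedef enum
{
    SCHED_END_PRIMSCAN,      /* end this primitive scan, descend the index anew */
    SCHED_RECHECK_NEXT_PAGE  /* step to the sibling page, recheck its finaltup */
} SchedDecision;

/* Hypothetical toy state for one primitive index scan. */
typedef struct
{
    int pages_read;  /* leaf pages read so far, including the current one */
} ToyPrimScan;

/*
 * Called at the point where the next matching tuples are known to be on
 * some later page.  A scan still on its first leaf page ends right away
 * (the pre-existing behavior); a scan that already stepped to a sibling
 * page keeps going and schedules a recheck instead.
 */
SchedDecision
toy_schedule(const ToyPrimScan *scan)
{
    assert(scan->pages_read >= 1);

    if (scan->pages_read > 1)
        return SCHED_RECHECK_NEXT_PAGE;
    return SCHED_END_PRIMSCAN;
}
```

In this model, a scan with pages_read == 1 ends its primitive scan, while
any scan with pages_read > 1 defers the decision to a recheck on the next
sibling page.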

The recheck mechanism isn't exactly new.  But up until now it has only
been used to deal with edge cases involving high key finaltups with one
or more truncated -inf attributes that _bt_advance_array_keys deemed
"provisionally satisfied" (satisfied for the purposes of allowing the
scan to step onto the next page, subject to recheck once on that page).
The mechanism was added by commit 5bf748b8, which invented the general
concept of primitive scan scheduling.  It was later enhanced by commit
79fa7b3b, which taught it about cases involving -inf attributes that
satisfy inequality scan keys required in the opposite-to-scan direction
only (arguably, they should have been covered by the earliest version).
Now the recheck mechanism can be applied based on scan-level heuristics,
which have nothing to do with truncated high keys.  Now rechecks might
be performed by _bt_readpage when scanning in _either_ scan direction.

The theory behind the new heuristic is that any primitive scan that
makes it past its first leaf page is one that is already likely to have
arrays whose key values match index tuples that are closely clustered
together in the index.  The rules that determine whether we ever get
past the first page are still conservative (that'll still only happen
when pstate.finaltup strongly suggests that it's the right thing to do).
Surviving past the first leaf page is a strong signal in itself.

This is preparation for an upcoming patch that will add skip scan
optimizations to nbtree.  That'll work by adding skip arrays, which
behave similarly to SAOP arrays, but generate their elements
procedurally and on-demand.
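
The contrast between stored and procedurally generated array elements can
be illustrated with a toy sketch over integers. Everything here is an
assumption made for illustration: ToySaopArray, ToySkipArray, and both
functions are hypothetical and do not reflect how the upcoming patch
represents skip arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy SAOP-style array: every element is stored up front. */
typedef struct
{
    const int *elems;   /* sorted, explicit elements, e.g. from IN (1, 2, 5) */
    size_t     nelems;
    size_t     cur;     /* index of the next element to return */
} ToySaopArray;

/* Toy skip-style array: only the position and the upper bound are stored. */
typedef struct
{
    int  next;   /* next element to return */
    int  high;   /* highest possible element (inclusive) */
    bool done;   /* set once "high" has been returned */
} ToySkipArray;

/* Return the next stored element; false once the array is exhausted. */
bool
toy_saop_next(ToySaopArray *arr, int *out)
{
    if (arr->cur >= arr->nelems)
        return false;
    *out = arr->elems[arr->cur++];
    return true;
}

/* Compute the next element on demand; false once past "high". */
bool
toy_skip_next(ToySkipArray *arr, int *out)
{
    if (arr->done)
        return false;
    assert(arr->next <= arr->high);
    *out = arr->next;
    if (arr->next == arr->high)
        arr->done = true;   /* avoids incrementing past the upper bound */
    else
        arr->next++;
    return true;
}
```

Both toy arrays expose the same "give me the next element" interface, which
is the sense in which one kind of array can behave like the other from the
caller's point of view.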

Note that this commit isn't specifically concerned with skip arrays; the
scheduling logic doesn't (and won't) condition anything on whether the
scan uses skip arrays, SAOP arrays, or some combination of the two
(which seems like a good general principle for _bt_advance_array_keys).
While the problems that this commit ameliorates are more likely with
skip arrays (at least in practice), scans that use SAOP arrays (at least
those with very dense, contiguous array elements) are also affected.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzkz0wPe6+02kr+hC+JJNKfGtjGTzpG3CFVTQmKwWNrXNw@mail.gmail.com
2025-03-22 13:02:18 -04:00
access Improve nbtree array primitive scan scheduling. 2025-03-22 13:02:18 -04:00
archive Update copyright for 2025 2025-01-01 11:21:55 -05:00
backup pg_noreturn to replace pg_attribute_noreturn() 2025-03-13 12:37:26 +01:00
bootstrap Remove unnecessary (char *) casts [mem] 2025-02-12 08:50:13 +01:00
catalog Label the contents of pg_*_d.h files a little better. 2025-03-21 15:09:46 -04:00
commands Change one loop in ATRewriteTable to use 1-based attnums 2025-03-21 10:55:06 +01:00
executor Ensure first ModifyTable rel initialized if all are pruned 2025-03-19 12:14:24 +09:00
foreign Update copyright for 2025 2025-01-01 11:21:55 -05:00
jit Add special case fast-paths for strict functions 2025-03-11 12:02:42 +01:00
lib Update copyright for 2025 2025-01-01 11:21:55 -05:00
libpq Modularize log_connections output 2025-03-12 11:35:21 -04:00
main Update copyright for 2025 2025-01-01 11:21:55 -05:00
nodes Introduce squashing of constant lists in query jumbling 2025-03-18 18:56:11 +01:00
optimizer Revert workarounds for -Wmissing-braces false positives on old GCC 2025-03-20 11:25:58 +01:00
parser Update a code comment 2025-03-19 10:39:06 +01:00
partitioning Fix bug in cbc127917 to handle nested Append correctly 2025-02-25 09:24:42 +09:00
po Update copyright for 2025 2025-01-01 11:21:55 -05:00
port Update copyright for 2025 2025-01-01 11:21:55 -05:00
postmaster Introduce squashing of constant lists in query jumbling 2025-03-18 18:56:11 +01:00
regex Support PG_UNICODE_FAST locale in the builtin collation provider. 2025-01-17 15:56:30 -08:00
replication Add GUC option to control maximum active replication origins. 2025-03-21 12:20:15 -07:00
rewrite Fix incorrect handling of subquery pullup 2025-03-13 16:36:03 +09:00
snowball Update to latest Snowball sources. 2025-02-18 21:13:54 -05:00
statistics Address stats import review comments. 2025-03-05 23:07:25 -08:00
storage Fix ps display for IO workers. 2025-03-22 10:13:23 +13:00
tcop aio: Infrastructure for io_method=worker 2025-03-18 11:54:01 -04:00
tsearch Clear errno before calling strtol() in spell.c. 2025-03-08 11:24:25 -05:00
utils Add GUC option to control maximum active replication origins. 2025-03-21 12:20:15 -07:00
.gitignore
common.mk Blind attempt to fix LLVM dependency in the backend 2022-09-15 10:53:48 +07:00
Makefile Update copyright for 2025 2025-01-01 11:21:55 -05:00
meson.build Update copyright for 2025 2025-01-01 11:21:55 -05:00
nls.mk Return yyparse() result not via global variable 2025-01-24 06:55:39 +01:00