Commit graph

63715 commits

Author SHA1 Message Date
Peter Eisentraut
c79e414127 Fix typo
Mistake in commit e2f289e5b9: SOFT_ERROR_OCCURRED was called with the
wrong fcinfo field.

Reported-by: Jianghua Yang <yjhjstz@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAZLFmSGti716gWeY%3DDCZ9TTVOixnHZ4_4V4tDzoeE86D64vOA%40mail.gmail.com
2026-03-25 07:09:44 +01:00
Amit Kapila
6b5b7eae3a pg_createsubscriber: Add -l/--logdir option to redirect output to files.
This commit introduces a -l (or --logdir) argument to pg_createsubscriber,
allowing users to specify a directory for log files.

When enabled, a timestamped subdirectory is created within the specified
log directory, containing:

pg_createsubscriber_server.log: Captures logs from the standby server
during its start/stop cycles.
pg_createsubscriber_internal.log: Captures the tool's own internal
diagnostic and progress messages.

This ensures that transient server and utility messages are preserved for
troubleshooting after the subscriber creation process completes or errored
out.

Author: Gyan Sreejith <gyan.sreejith@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEqnbaUthOQARV1dscGvB_EsqC-YfxiM6rWkVDHc+G+f4oSUHw@mail.gmail.com
2026-03-25 11:22:07 +05:30
John Naylor
be6a7494d2 Refactor handling of x86 CPUID instructions
Introduce two helpers for CPUID, pg_cpuid and pg_cpuid_subleaf that wrap
the platform specific __get_cpuid/__cpuid and __get_cpuid_count/__cpuidex
functions.

Additionally, use macros to specify registers names (e.g. EAX) for clarity,
instead of numeric integers into the result array.

Author: Lukas Fittl <lukas@fittl.com>
Suggested-By: John Naylor <john.naylor@postgresql.org>
Discussion: https://postgr.es/m/CANWCAZZ+Crjt5za9YmFsURRMDW7M4T2mutDezd_3s1gTLnrzGQ@mail.gmail.com
2026-03-25 12:32:36 +07:00
Michael Paquier
7c64d56fd9 Remove isolation test lock-stats
This test is proving to be unstable in the CI for Windows, at least.
The origin of the issue is that the deadlock_timeout requests may not
be processed, causing the lock stats to not be updated.  This could be
mitigated by making the hardcoded sleep longer, however this would cost
in runtime on fast machines.  On slow machines, there is no guarantee
that an augmented sleep would be enough.

An isolation test may not be the best method to write this test
(TAP test with injection point with a NOTICE+wait_for_log before
processing the deadlock_timeout request should remove the need of a
sleep).  As we are late in the release cycle, I am removing the test for
now to keep the CI and the buildfarm a maximum stable.  Let's revisit
this part later.

Discussion: https://postgr.es/m/hlkdrplgrmudbspibsuq6xooxrqxqsgwo6x5b6x5ptvkgjbe7w@xogt6xgua6dz
2026-03-25 08:48:15 +09:00
Jeff Davis
11f8018ee6 Refactor to remove ForeignServerName().
Callers either have a ForeignServer object or can readily construct
one.

Discussion: https://postgr.es/m/CAExHW5vV5znEvecX=ra2-v7UBj9-M6qvdDzuB78M-TxbYD1PEA@mail.gmail.com
Suggested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
2026-03-24 15:20:28 -07:00
Jeff Davis
f16f5d608c GetSubscription(): use per-object memory context.
Constructing a Subcription object uses a number of small or temporary
allocations. Use a per-object memory context for easy cleanup.

Get rid of FreeSubscription() which did not free all the allocations
anyway. Also get rid of the PG_TRY()/PG_CATCH() logic in
ForeignServerConnectionString() which were used to avoid leaks during
GetSubscription().

Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/xvdjrdqnpap3uq7owbaox3r7p5gf7sv62aaqf2ju3vb6yglatr%40kvvwhoudrlxq
Discussion: https://postgr.es/m/CAA4eK1K=WjZ1maBCmj=5ZdO66AwPORK5ZBxVKedS0xdCcb621A@mail.gmail.com
2026-03-24 15:11:45 -07:00
Melanie Plageman
a881cc9c7e Remove XLOG_HEAP2_VISIBLE entirely
There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Bumps XLOG_PAGE_MAGIC because we removed a WAL record type.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2026-03-24 17:58:12 -04:00
Melanie Plageman
a759ced2f1 WAL log VM setting for empty pages in XLOG_HEAP2_PRUNE_VACUUM_SCAN
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in a
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no real independent benefit, but empty pages were the last user
of XLOG_HEAP2_VISIBLE, so by making this change we can next remove all
of the XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Earlier version Reviewed-by: Robert Haas <robertmhaas@gmail.com>
2026-03-24 17:30:54 -04:00
Melanie Plageman
1252a4ee28 WAL log VM setting during vacuum phase I in XLOG_HEAP2_PRUNE_VACUUM_SCAN
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications. This reduces WAL volume produced by vacuum.

For now, vacuum is still the only user of heap_page_prune_and_freeze()
allowed to set the VM. On-access pruning is not yet able to set the VM.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Earlier version Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2026-03-24 16:49:46 -04:00
Robert Haas
dc47beacaa get_memoize_path: Don't exit quickly when PGS_NESTLOOP_PLAIN is unset.
This function exits early in the case where the number of inner rows
is estimated to be less than 2, on the theory that in that case a
Nested Loop with inner Memoize must lose to a plain Nested Loop.
But since commit 4020b370f2 it's
possible for a plain Nested Loop to be disabled, while a Nested Loop
with inner Memoize is still enabled. In that case, this reasoning
is not valid, so adjust the code not to exit early in that case.

This issue was revealed by a test_plan_advice failure on buildfarm
member skink, where NESTED_LOOP_MEMOIZE() couldn't be enforced on
replanning due to this early exit.

Discussion: http://postgr.es/m/CA+TgmoZUN8FT1Ah=m6Uis5bHa4FUa+_hMDWtcABG17toEfpiUg@mail.gmail.com
2026-03-24 16:17:26 -04:00
Melanie Plageman
9ba3ec076a Keep newest live XID up-to-date even if page not all-visible
During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain set_all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
2026-03-24 15:37:18 -04:00
Melanie Plageman
dd5716f3c7 Use GlobalVisState in vacuum to determine page level visibility
During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
2026-03-24 14:50:59 -04:00
Álvaro Herrera
f227b7b20c
Avoid including clog.h in proc.h
The number of .c files that must include access/clog.h can currently be
counted on one's fingers and miss only one (assuming one has the usual
number of hands).  However, due to indirect inclusion via proc.h,
there's a lot of files that are pointlessly including it.  This is easy
to avoid with the easy trick implemented by this commit.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202603221856.iwlhitt6dxxx@alvherre.pgsql
2026-03-24 17:31:16 +01:00
Tom Lane
6e243d81c5 Fix poorly-sized buffers in astreamer compression modules.
astreamer_gzip.c and astreamer_lz4.c left their decompression
output buffers at StringInfo's default allocation, merely 1kB.
This results in a lot of ping-ponging between the decompressor
and the next astreamer filter.  This patch increases these buffer
sizes to 256kB.  In a simple test this had a small but measurable
effect (saving a few percent) on the overall runtime of pg_waldump
for the gzipped-data case; I didn't bother measuring for lz4.

astreamer_zstd.c used ZSTD_DStreamOutSize() to size its
compression output buffer, but the libzstd API says you should use
ZSTD_CStreamOutSize(); ZSTD_DStreamOutSize() is for decompression.
The two functions seem to produce the same value (256kB) here, so
this is just cosmetic, but nonetheless we should play by the rules.

While these issues are old, they don't seem significant enough to
warrant back-patching.

Discussion: https://postgr.es/m/3424809.1774234940@sss.pgh.pa.us
2026-03-24 12:17:12 -04:00
Tom Lane
ca1f1ade3f Remove read_archive_file()'s "count" parameter.
Instead, always try to fill the allocated buffer completely.
The previous coding apparently intended (though it's undocumented)
to read only small amounts of data until we are able to identify the
WAL segment size and begin filtering out unwanted segments.  However
this extra complication has no measurable value according to simple
testing here, and it could easily be a net loss if there is a
substantial amount of non-WAL data in the archive file before the
first WAL file.

Discussion: https://postgr.es/m/3341199.1774221191@sss.pgh.pa.us
2026-03-24 12:17:12 -04:00
Álvaro Herrera
2102ebb195
Don't include storage/lock.h in so many headers
Since storage/locktags.h was added by commit 322bab7974, many headers
can be made leaner by depending on that instead of on storage/lock.h,
which has many other dependencies.

(In fact, some of these changes were possible even before that.)

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/abvrRZo52Yx9ZzWQ@ip-10-97-1-34.eu-west-3.compute.internal
2026-03-24 17:11:12 +01:00
Álvaro Herrera
5f2350a043
Fix dereference in a couple of GUC check hooks
check_backtrace_functions() and check_archive_directory() were doing an
empty-string check this way:
    *newval[0] == '\0'
which, because of operator precedence, is interpreted as *(newval[0])
instead of (*newval)[0] -- but these variables are pointers to C-strings
and we want to check the first character therein, rather than check the
first pointer of the array, so that interpretation is wrong.  This would
be wrong for any index element other than 0, as evidenced by every other
dereference of the same variable in check_backtrace_functions, which use
parentheses.

Add parentheses to make the intended dereference explicit.

This is just cosmetic at this stage, so no backpatch, although it's been
"wrong" for a long time.

Author: Zhang Hu <kongbaik228@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/CAB5m2QssN6UO+ckr6ZCcV0A71mKUB6WdiTw1nHo43v4DTW1Dfg@mail.gmail.com
2026-03-24 16:45:39 +01:00
Nathan Bossart
c7b9f16113 test_bloomfilter: Fix error message.
The error message in question uses the wrong format specifier and
variable.  This has been wrong for a while, but since it's in a
test module and wasn't noticed until just now, no back-patch.

Oversight in commit 51bc271790.

Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmS2OMiwe65gdm-MKgO%3DLnKatGMSK6JWxhycGN3TWrhbnw%40mail.gmail.com
2026-03-24 09:32:15 -05:00
Robert Haas
4647ee2da3 Add a test for creating an index on a whole-row expression.
Surprisingly, we have no existing test for this. Had this test
been present before commit 570e2fcc04
the Assert added in commit c98ad086ad
would have caught the bug.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com
2026-03-24 10:06:46 -04:00
Peter Eisentraut
6bc7449eac Fix accidentally casting away const
Recently introduced in commit 4b5ba0c4ca.
2026-03-24 14:34:50 +01:00
Fujii Masao
1c162c965a Report detailed errors from XLogFindNextRecord() failures.
Previously, XLogFindNextRecord() did not return detailed error information
when it failed to find a valid WAL record. As a result, callers such as
the WAL summarizer, pg_waldump, and pg_walinspect could only report generic
errors (e.g., "could not find a valid record after ..."), making
troubleshooting difficult.

This commit fix the issue by extending XLogFindNextRecord() to return
detailed error information on failure, and updating its callers to include
those details in their error messages.

For example, when pg_waldump is run on a WAL file with an invalid magic number,
it now reports not only the generic error but also the specific cause
(e.g., "invalid magic number").

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqoxJXddcT4wkd9Xd+cD6Sz-fyspRGuV4Bq-wbXG4pVNzA@mail.gmail.com
2026-03-24 22:33:09 +09:00
Robert Haas
c98ad086ad Bounds-check access to TupleDescAttr with an Assert.
The second argument to TupleDescAttr should always be at least zero
and less than natts; otherwise, we index outside of the attribute
array. Assert that this is the case.

Various violations, or possible violations, of this rule that are
currently in the tree are actually harmless, because while
we do call TupleDescAttr() before verifying that the argument is
within range, we don't actually dereference it unless the argument
was within range all along. Nonetheless, the Assert means we
should be more careful, so tidy up accordingly.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com
2026-03-24 08:58:50 -04:00
Peter Eisentraut
e2f289e5b9 Make many cast functions error safe
This adjusts many C functions underlying casts to support soft errors.
This is in preparation for a future feature where conversion errors in
casts can be caught.

This patch covers cast functions that can be adjusted easily by
changing ereport to ereturn or making other light changes.  The
underlying helper functions were already changed to support soft
errors some time ago as part of soft error support in type input
functions.

Other casts and types will require some more work and are being kept
as separate patches.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-24 12:08:22 +01:00
Robert Haas
570e2fcc04 Prevent spurious "indexes on virtual generated columns are not supported".
Both of the checks in DefineIndex() that can produce this error
message have a guard against negative attribute numbers, but lack a
guard to ensure that attno is non-zero. As a result, we can index
off the beginning of the TupleDesc and read a garbage byte for
attgenerated. If that byte happens to be 'v', we'll incorrectly
produce the error mentioned above.

The first call site is easy to hit: any attempt to create an
expression index does so. The second one is not currently hit in
the regression tests, but can be hit by something like
CREATE INDEX ON some_table ((some_function(some_table))).

Found by study of a test_plan_advice failure on buildfarm member
skink, though this issue has nothing to do with test_plan_advice
and seems to have only been revealed by happenstance.

Backpatch-through: 18
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com
2026-03-24 06:28:33 -04:00
John Naylor
d2a1aa77c2 Fix copy-paste error in test_ginpostinglist
The check for a mismatch on the second decoded item pointer
was an exact copy of the first item pointer check, comparing
orig_itemptrs[0] with decoded_itemptrs[0] instead of orig_itemptrs[1]
with decoded_itemptrs[1].  The error message also reported (0, 1) as
the expected value instead of (blk, off).  As a result, any decoding
error in the second item pointer (where the varbyte delta encoding
is exercised) would go undetected.

This has been wrong since commit bde7493d1, so backpatch to all
supported versions.

Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmSOD8R7tZjRLZsmpKtJLoqjgawAaM-Pne1j8B_Q2aQK8w@mail.gmail.com
Backpatch-through: 14
2026-03-24 17:14:11 +07:00
Alexander Korotkov
6888658516 Further improve commentary about ChangeVarNodesWalkExpression()
The updated comment explains why we use ChangeVarNodes_walker() instead of
expression_tree_walker(), and provides a bit more detail about the differences
in processing top-level Query and subqueries.

Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAPpHfdvbjq342WTQ705Wmqhe8794pcp7wospz%2BWUJ2qB7vuOqA%40mail.gmail.com
Backpatch-through: 18
2026-03-24 09:54:00 +02:00
Michael Paquier
4019f725f5 Add support for lock statistics in pgstats
This commit adds a new stats kind, called PGSTAT_KIND_LOCK, implementing
statistics for lock tags, as reported by pg_locks.  The implementation
is fixed-sized, as the data is caped based on the number of lock tags in
LockTagType.

The new statistics kind records the following fields, providing insight
regarding lock behavior, while avoiding impact on performance-critical
code paths (such as fast-path lock acquisition):
- waits and wait_time: respectively track the number of times a lock
required waiting and the total time spent acquiring it.  These metrics
are only collected once a lock is successfully acquired and after
deadlock_timeout has been exceeded.
fastpath_exceeded: counts how often a lock could not be acquired via
the fast path due to the max_locks_per_transaction slot limits.

A new view called pg_stat_lock can be used to access this data, coupled
with a SQL function called pg_stat_get_lock().

Bump stat file format PGSTAT_FILE_FORMAT_ID.
Bump catalog version.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aIyNxBWFCybgBZBS%40ip-10-97-1-34.eu-west-3.compute.internal
2026-03-24 15:32:09 +09:00
Michael Paquier
a90d865182 Move some code blocks in lock.c and proc.c
This change will simplify an upcoming change that will introduce lock
statistics, reducting code churn.

This commit means that we begin to calculate the time it took to acquire
a lock after the deadlock check interrupt has run should log_lock_waits
be off, when taken in isolation.  This is not a performance-critical
code path, and note that log_lock_waits is enabled by default since
2aac62be8c.

Extracted from a larger patch by the same author.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aIyNxBWFCybgBZBS@ip-10-97-1-34.eu-west-3.compute.internal
2026-03-24 13:34:54 +09:00
Michael Paquier
3d10ece612 Make implementation of SASLprep compliant for ASCII characters
This commit makes our implementation of SASLprep() compliant with RFC
3454 (Stringprep) and RFC 4013 (SASLprep).  Originally, as introduced in
60f11b87a2, the operation considered a password made of only ASCII
characters as valid, performing an optimization for this case to skip
the internal NFKC transformation.

However, the RFCs listed above use a different definition, with the
following characters being prohibited:
- 0x00~0x1F (0~31), control characters.
- 0x7F (127, DEL).

In its SCRAM protocol, Postgres has the idea to apply a password as-is
if SASLprep() is not a success, so this change is safe on
backward-compatibility grounds:
- A libpq client with the compliant SASLprep can connect to a server
with a non-compliant SASLprep.
- A libpq client with the non-compliant SASLprep can connect to a server
with a compliant SASLprep.

This commit removes the all-ASCII optimization used in pg_saslprep() and
applies SASLprep even if a password is made only of ASCII characters,
making the operation compatible with the RFC.  All the in-core callers
of pg_saslprep() do that:
- pg_be_scram_build_secret() in auth-scram.c, when generating a
SCRAM verifier for rolpassword in the backend.
- scram_init() in fe-auth-scram.c, when starting the SASL exchange.
- pg_fe_scram_build_secret() in fe-auth-scram.c, when generating a SCRAM
verifier for the frontend with libpq, to generate it for a ALTER/CREATE
ROLE command for example.

The test module test_saslprep shows the difference this change is
leading to.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aaEJ-El2seZHeFcG@paquier.xyz
2026-03-24 08:29:23 +09:00
Tom Lane
2e123e3c2b Silence compiler warning from older compilers.
Our RHEL7-vintage buildfarm animals are complaining about
"the comparison will always evaluate as true" for a usage of
SOFT_ERROR_OCCURRED() on a local variable.  This is the same
issue addressed in 7bc88c3d6 and some earlier commits, so solve
it the same way: write "escontext.error_occurred" instead.

Problem dates to recent commit a0b6ef29a, no need for back-patch.
2026-03-23 17:25:12 -04:00
Tom Lane
7c08a7e809 Doc: minor improvements to SNI documentation.
My attention was drawn to this new documentation by overlength-line
complaints in the PDF docs builds: the synopsis for hostname lines was
too wide.  I initially thought of shortening the parameter names to
fit, but it turns out that adding <optional> markup is enough to
persuade DocBook to break the line, and that seems more helpful
anyway.

While here, I couldn't resist some copy-editing, mostly being
consistent about whether to use Oxford commas or not.  The biggest
change was to re-order the entries in the hostname-values table to
match the running text.
2026-03-23 15:33:51 -04:00
Tom Lane
99d6aa64ef Doc: document how EXPLAIN ANALYZE reports parallel queries.
This wasn't covered anywhere before...

Reported-by: Marcos Pegoraro <marcos@f10.com.br>
Author: Maciek Sakrejda <maciek@pganalyze.com>
Reviewed-by: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAB-JLwYCgdiB=trauAV1HN5rAWQdvDGgaaY_mqziN88pBTvqqg@mail.gmail.com
2026-03-23 14:48:58 -04:00
Bruce Momjian
0a68fd70cb doc: make "datadir" argument specification more specific
Previously these cases were listed as "directory".

Author: Peter Smith

Discussion: https://postgr.es/m/CAHut+PvCOQqMi0zRk3GecbYzm5xX1wQixxm9Qs3oXXr5fFCUgw@mail.gmail.com
2026-03-23 12:13:31 -04:00
Tom Lane
360dd6f7b4 Improve commentary about ChangeVarNodesWalkExpression().
IMO the proximate cause of the bug fixed in commit 07b7a964d
was sloppy thinking about what ChangeVarNodesWalkExpression()
is to be used for.  Flesh out its header comment to try to
improve that situation.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1607553.1774017006@sss.pgh.pa.us
Backpatch-through: 18
2026-03-23 11:14:24 -04:00
Michael Paquier
93b76db0ac Fix invalid value of pg_aios.pid, function pg_get_aios()
When the value of pg_aios.pid is found to be 0, the function had the
idea to set "nulls" to "false" instead of "true", without setting the
value stored in the tuplestore.  This could lead to the display of buggy
data.  The intention of the code is clearly to display NULL when a PID
of 0 is found, and this commit adjusts the logic to do so.

Issue introduced by 60f566b4f2.

Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by:  Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_7D61A85D6143AD57CA8D8C00DEC541869D06@qq.com
Backpatch-through: 18
2026-03-23 18:13:56 +09:00
Peter Eisentraut
085a531983 ci: Run headerscheck and cpluspluscheck in parallel
This can save several seconds of wall-clock time for that task.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/b49e74d4-3cf9-4d1c-9dce-09f75e55d026%40eisentraut.org
2026-03-23 08:40:29 +01:00
Peter Eisentraut
0f17d1dbfa headerscheck: Get CXXFLAGS from Makefile.global
headerscheck in C++ mode (cpluspluscheck) previously hardcoded
CXXFLAGS and documented that you might need to override them manually
from the environment.  Now that we have better C++ support in the
build system, we can just get CXXFLAGS from Makefile.global, like we
do for other variables.

Furthermore, this is necessary in some configurations to make
cpluspluscheck work under meson, because under meson, some -I options
end up in CXXFLAGS where under make they would be in CPPFLAGS.
Therefore, getting the correct CXXFLAGS is required in those cases.

Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMSWrt-PoQt4sHryWrB1ViuGBJF_PpbjoSGrWR2Ry47bHNLDqg%40mail.gmail.com
2026-03-23 07:53:43 +01:00
Amit Kapila
d6628a5ea0 pg_createsubscriber: Introduce module-specific logging functions.
Replace generic pg_log_* calls with report_createsub_log() and
report_createsub_fatal(). This refactor provides the necessary
infrastructure to support logging to external files via the -l option.

These new functions enable the utility to route messages to both the
terminal and a log file based on the logging configuration and verbosity
levels provided by the user.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Gyan Sreejith <gyan.sreejith@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAEqnbaUthOQARV1dscGvB_EsqC-YfxiM6rWkVDHc+G+f4oSUHw@mail.gmail.com
2026-03-23 09:23:20 +05:30
Michael Paquier
ded9754804 Add missing deflateEnd() for server-side gzip base backups
The gzip basebackup sink called deflateInit2() in begin_archive() but
never called deflateEnd(), leaking zlib's internal compression state
(~256KB per archive) until the memory context of the base backup is
destroyed.

The code tree has already a matching deflateEnd() call for each
deflateInit[2]() call (pgrypto, etc.), except for the file touched in
this commit, so this brings more consistency for all the compression
methods.  The server-side LZ4 and zstd implementations require a
dedicated cleanup callback as they allocate their state outside the
context of a palloc().

As currently used, deflateInit2() is called once per tablespace in a
single backup.  Memory would slightly bloat only when dealing with many
tablespaces at once, not across multiple base backups so this is not
worth a backpatch.  This change could matter for future uses of this
code.

zlib allows the definition of memory allocation and free callbacks in
the z_stream object given to a deflateInit[2]().  The base backup
backend code relies on palloc() for the allocations and deflateEnd()
internally only cleans up memory (no fd allocation for example).

Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmQNJ0QNArpWEOZXwv=vbumcWKEHz-b1me5gBqRqG67EwQ@mail.gmail.com
2026-03-23 09:04:44 +09:00
Tom Lane
69c57466a7 Fix another buglet in archive_waldump.c.
While re-reading 860359ea0, I noticed another problem: when
spilling to a temp file, it did not bother to check the result
of fclose().  This is bad since write errors (like ENOSPC)
may not be reported until close time.
2026-03-22 18:48:38 -04:00
Tom Lane
860359ea02 Fix assorted bugs in archive_waldump.c.
1. archive_waldump.c called astreamer_finalize() nowhere.  This meant
that any data retained in decompression buffers at the moment we
detect archive EOF would never reach astreamer_waldump_content(),
resulting in surprising failures if we actually need the last few
bytes of the archive file.

To fix that, make read_archive_file() do the finalize once it detects
EOF.  Change its API to return a boolean "yes there's more data"
rather than the entirely-misleading raw count of bytes read.

2. init_archive_reader() relied on privateInfo->cur_file to track
which WAL segment was being read, but cur_file can become NULL if a
member trailer is processed during a read_archive_file() call.  This
could cause unreproducible "could not find WAL in archive" failures,
particularly with compressed archives where all the WAL data fits in
a small number of compressed bytes.

Fix by scanning the hash table after each read to find any cached
WAL segment with sufficient data, instead of depending on cur_file.
Also reduce the minimum data requirement from XLOG_BLCKSZ to
sizeof(XLogLongPageHeaderData), since we only need the long page
header to extract the segment size.

We likewise need to fix init_archive_reader() to scan the whole
hash table for irrelevant entries, since we might have already
loaded more than one entry when the data is compressible enough.

3. get_archive_wal_entry() relied on tracking cur_file to identify
WAL hash table entries that need to be spilled to disk.  However,
this can't work for entries that are read completely within a
single read_archive_file call: the caller will never see cur_file
pointing at such an entry.  Instead, scan the WAL hash table to
find entries we should spill.  This also fixes a buglet that any
hash table entries completely loaded during init_archive_reader
were never considered for spilling.

Also, simplify the logic tremendously by not attempting to spill
entries that haven't been read fully.  I am not convinced that the old
logic handled that correctly in every path, and it's really not worth
the complication and risk of bugs to try to spill entries on the fly.
We can just write them in a single go once they are no longer the
cur_file.

4. Fix a rather critical performance problem: the code thought that
resetStringInfo() will reclaim storage, but it doesn't.  So by the
end of the run we'd have consumed storage space equal to the total
amount of WAL read, negating all the effort of the spill logic.

Also document the contract that cur_file can change (or become NULL)
during a single read_archive_file() call, since the decompression
pipeline may produce enough output to trigger multiple astreamer
callbacks.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
2026-03-22 18:24:42 -04:00
Tom Lane
5868372bbf Remove nonfunctional tar file trailer size check.
The ASTREAMER_ARCHIVE_TRAILER case in astreamer_tar_parser_content()
intended to reject tar files whose trailer exceeded 2 blocks.  However,
the check compared 'len' after astreamer_buffer_bytes() had already
consumed all the data and set len to 0, so the pg_fatal() could never
fire.

Moreover, per the POSIX specification for the ustar format, the last
physical block of a tar archive is always full-sized, and "logical
records after the two zero logical records may contain undefined data."
GNU tar, for example, zero-pads its output to a 10kB boundary by
default.  So rejecting extra data after the two zero blocks would be
wrong even if the check worked.  (But if the check had worked, it
would have alerted us to the bug just fixed in 9aa1fcc54.)

Remove the dead check and update the comment to explain why trailing
data is expected and harmless.

Per report from Tom Lane.

Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
2026-03-22 18:13:41 -04:00
Tom Lane
9aa1fcc547 Fix finalization of decompressor astreamers.
Send the correct amount of data to the next astreamer, not the
whole allocated buffer size.  This bug escaped detection because
in present uses the next astreamer is always a tar-file parser
which is insensitive to trailing garbage.  But that may not
be true in future uses.

Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Backpatch-through: 15
2026-03-22 18:06:48 -04:00
Peter Geoghegan
e5836f7b7d Add fake LSN support to hash index AM.
Use fake LSNs in all hash AM critical sections that write a WAL record.
This gives us a reliable way (a way that works during scans of both
logged and unlogged relations) to detect when an index page was
concurrently modified during the window between when the page is
initially read (by _hash_readpage) and when the page has any known-dead
items LP_DEAD-marked (by _hash_kill_items).

Preparation for an upcoming patch that makes the hash index AM use the
amgetbatch interface, enabling I/O prefetching during hash index scans.

The amgetbatch design imposes certain rules on index AMs with respect to
how they hold on to index page buffer pins (at least in the case of pins
held as an interlock against unsafe concurrent TID recycling by VACUUM).
These rules have consequences for routines that set LP_DEAD bits on
index tuples from an amgetbatch index AM: such routines have an inherent
need to reason about concurrent TID recycling by VACUUM, but can no
longer rely on their amgettuple routine holding on to a buffer pin
(during the aforementioned window) as an interlock against such
recycling.  Instead, they have to follow a new, standardized approach.

The new approach taken by amgetbatch index AMs when setting LP_DEAD bits
is heavily based on the current nbtree dropPin design, which was added
by commit 2ed5b87f.  It also works by checking if the page's LSN
advanced during the window where unsafe concurrent TID recycling might
have taken place.

This commit is similar to commit 8a879119, which taught nbtree to use
fake LSNs to improve its dropPin behavior.  However, unlike that commit,
this is not an independently useful enhancement, since hash doesn't
implement anything like nbtree's dropPin behavior (not yet).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
2026-03-22 17:31:43 -04:00
Melanie Plageman
01b7e4a46d Add pruning fast path for all-visible and all-frozen pages
Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid this, if a page is already all-frozen or
it is all-visible and no freezing will be attempted, exit early. We
can't exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
2026-03-22 15:46:50 -04:00
Peter Geoghegan
f026fbf059 Make IndexScanInstrumentation a pointer in executor scan nodes.
Change the IndexScanInstrumentation fields in IndexScanState,
IndexOnlyScanState, and BitmapIndexScanState from inline structs to
pointers.  This avoids additional space overhead whenever new fields are
added to IndexScanInstrumentation in the future, at least in the common
case where the instrumentation isn't used (i.e. when the executor node
isn't being run through an EXPLAIN ANALYZE).

Preparation for an upcoming patch series that will add index
prefetching.  The new slot-based interface that will enable index
prefetching necessitates that we add at least one more field to
IndexScanInstrumentation (to count heap fetches during index-only
scans).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
2026-03-22 13:20:29 -04:00
Melanie Plageman
4f7ecca84d Detect and fix visibility map corruption in more cases
Move VM corruption detection and repair into heap page pruning. This
allows VM repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or deleted by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once,
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
2026-03-22 11:52:40 -04:00
Heikki Linnakangas
516310ed4d Don't reset 'latest_page_number' when replaying multixid truncation
'latest_page_number' is set to the correct value, according to
nextOffset, early at system startup. Contrary to the comment, it hence
should be set up correctly by the time we get to WAL replay.

This was committed to back-branches earlier already (commit
817f74600d), to fix a bug in a backwards-compatibility codepath. We
don't have that bug on 'master', but the change nevertheless makes
sense on 'master' too.

Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/20260214090150.GC2297@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://www.postgresql.org/message-id/e1787b17-dc93-4621-a5a1-c713d1ac6a1b@iki.fi
2026-03-22 14:23:54 +02:00
Michael Paquier
1f7947a48d Add test for single-page VACUUM of hash index on INSERT
_hash_vacuum_one_page() in hashinsert.c is a routine related to hash
indexes that can perform a single-page VACUUM when dead tuples are
detected during index insertion.  This routine previously had no test
coverage, and this commit adds a test case for that purpose.

To safely create dead tuples in a way that works with parallel tests,
this uses a technique based on a rollbacked INSERT, following a
suggestion by Heikki Linnakangas.

Author: Alexander Kuzmenkov <akuzmenkov@tigerdata.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CALzhyqxrc1ZHYmf5V8NE+yMboqVg7xZrQM7K2c7VS0p1v8z42w@mail.gmail.com
2026-03-22 15:24:33 +09:00
Michael Paquier
322bab7974 Move declarations related to locktags from lock.h to new locktag.h
This commit moves all the declarations related to locktags from lock.h
to a new header called locktag.h.  This header is useful so as code
paths that care about locktags but not the lock hashtable can know about
these without having to include lock.h and all its set of dependencies.

This move includes the basic locktag structures and the set of macros to
fill in the locktag fields before attempting to acquire a lock.

Based on a suggestion from me, suggestion done while discussing a
different feature.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/abufUya2oK-_PJ3E@paquier.xyz
2026-03-21 14:34:47 +09:00