Commit graph

55729 commits

Author SHA1 Message Date
Thomas Munro
963fccc09c jit: No backport::SectionMemoryManager for LLVM 22.
LLVM 22 has the fix that we copied into our tree in commit 9044fc1d and
a new function to reach it[1][2], so we only need to use our copy for
Aarch64 + LLVM < 22.  The only change to the final version that our copy
didn't get is a new LLVM_ABI macro, but that isn't appropriate for us.
Our copy is hopefully now frozen and would only need maintenance if bugs
are found in the upstream code.

Non-Aarch64 systems now also use the new API with LLVM 22.  It allocates
all sections with one contiguous mmap() instead of one per
section.  We could have done that earlier, but commit 9044fc1d wanted to
limit the blast radius to the affected systems.  We might as well
benefit from that small improvement everywhere now that it is available
out of the box.

We can't delete our copy until LLVM 22 is our minimum supported version,
or we switch to the newer JITLink API for at least Aarch64.

[1] https://github.com/llvm/llvm-project/pull/71968
[2] https://github.com/llvm/llvm-project/pull/174307

Backpatch-through: 14
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com
2026-04-03 15:02:45 +13:00
Thomas Munro
c00ea2b5b4 jit: Stop emitting lifetime.end for LLVM 22.
The lifetime.end intrinsic can now only be used for stack memory
allocated with alloca[1][2][3].  We use it to tell LLVM about the
lifetime of function arguments/isnull values that we keep in palloc'd
memory, so that it can avoid spilling registers to memory.

We might need to rearrange things and put them on the stack, but that'll
take some research.  In the meantime, unbreak the build on LLVM 22.

[1] https://github.com/llvm/llvm-project/pull/149310
[2] https://llvm.org/docs/LangRef.html#llvm-lifetime-end-intrinsic
[3] https://llvm.org/docs/LangRef.html#i-alloca

Backpatch-through: 14
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> (earlier attempt)
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> (earlier attempt)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier attempt)
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com
2026-04-02 15:55:21 +13:00
Nathan Bossart
ba8b891236 doc: Add missing description for DROP SUBSCRIPTION IF EXISTS.
Oversight in commit 665d1fad99.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHut%2BPv72haFerrCdYdmF6hu6o2jKcGzkXehom%2BsP-JBBmOVDg%40mail.gmail.com
Backpatch-through: 14
2026-04-01 09:48:48 -05:00
Tom Lane
811f3263a4 Be more careful to preserve consistency of a tuplestore.
Several places in tuplestore.c would leave the tuplestore data
structure effectively corrupt if some subroutine were to throw
an error.  Notably, if WRITETUP() failed after some number of
successful calls within dumptuples(), the tuplestore would
contain some memtuples pointers that were apparently live
entries but in fact pointed to pfree'd chunks.

In most cases this sort of thing is fine because transaction
abort cleanup is not too picky about the contents of memory that
it's going to throw away anyway.  There's at least one exception
though: if a Portal has a holdStore, we're going to call
tuplestore_end() on that, even during transaction abort.
So it's not cool if that tuplestore is corrupt, and that means
tuplestore.c has to be more careful.

This oversight demonstrably leads to crashes in v15 and before,
if a holdable cursor fails to persist its data due to an undersized
temp_file_limit setting.  Very possibly the same thing can happen in
v16 and v17 as well, though the specific test case submitted failed
to fail there (cf. 095555daf).  The failure is accidentally dodged
as of v18 because 590b045c3 got rid of tuplestore_end's retail tuple
deletion loop.  Still, it seems unwise to permit tuplestores to become
internally inconsistent in any branch, so I've applied the same fix
across the board.

Since the known test case for this is rather expensive and doesn't
fail in recent branches, I've omitted it.

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 14
2026-03-30 13:59:54 -04:00
David Rowley
6b2e091f02 Fix datum_image_*()'s inability to detect sign-extension variations
Functions such as hash_numeric() are not careful to use the correct
PG_RETURN_*() macro according to the return type of that function as
defined in pg_proc.  Because that function is meant to return int32,
when the hashed value exceeds 2^31, the 64-bit Datum value won't wrap to
a negative number, which means the Datum won't have the same value as it
would have had it been cast to int32 on a two's complement machine.  This
isn't harmless as both datum_image_eq() and datum_image_hash() may receive
a Datum that's been formed and deformed from a tuple in some cases, and
not in other cases.  When formed into a tuple, the Datum value will be
coerced into an integer according to the attlen as specified by the
TupleDesc.  This can result in two Datums that should be equal being
classed as not equal, which could result in (but not limited to) an error
such as:

ERROR:  could not find memoization table entry

Here we fix this by ensuring we cast the Datum value to a signed integer
according to the typLen specified in the datum_image_eq/datum_image_hash
function call before comparing or hashing.

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/CAHewXNmcXVFdB9_WwA8Ez0P+m_TQy_KzYk5Ri5dvg+fuwjD_yw@mail.gmail.com
2026-03-30 16:17:39 +13:00
Andrew Dunstan
d3bb7841b4 Fix multiple bugs in astreamer pipeline code.
astreamer_tar_parser_content() sent the wrong data pointer when
forwarding MEMBER_TRAILER padding to the next streamer.  After
astreamer_buffer_until() buffers the padding bytes, the 'data'
pointer has been advanced past them, but the code passed 'data'
instead of bbs_buffer.data.  This caused the downstream consumer
to receive bytes from after the padding rather than the padding
itself, and could read past the end of the input buffer.

astreamer_gzip_decompressor_content() only checked for
Z_STREAM_ERROR from inflate(), silently ignoring Z_DATA_ERROR
(corrupted data) and Z_MEM_ERROR (out of memory).  Fix by
treating any return other than Z_OK, Z_STREAM_END, and
Z_BUF_ERROR as fatal.

astreamer_gzip_decompressor_free() missed calling inflateEnd() to
release zlib's internal decompression state.

astreamer_tar_parser_free() neglected to pfree() the streamer
struct itself, leaking it.

astreamer_extractor_content() did not check the return value of
fclose() when closing an extracted file.  A deferred write error
(e.g., disk full on buffered I/O) would be silently lost.

Discussion: https://postgr.es/m/results/98c6b630-acbb-44a7-97fa-1692ce2b827c@dunslane.net

Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>

Backpatch-through: 15
2026-03-29 09:12:40 -04:00
Heikki Linnakangas
92cf11171b Avoid memory leak on error while parsing pg_stat_statements dump file
By using palloc() instead of raw malloc().

Reported-by: Gaurav Singh <gaurav.singh@yugabyte.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAEcQ1bYR9s4eQLFDjzzJHU8fj-MTbmRpW-9J-r2gsCn+HEsynw@mail.gmail.com
Backpatch-through: 14
2026-03-27 12:22:21 +02:00
Fujii Masao
246c296f00 Fix premature NULL lag reporting in pg_stat_replication
pg_stat_replication is documented to keep the last measured lag values for
a short time after the standby catches up, and then set them to NULL when
there is no WAL activity. However, previously lag values could become NULL
prematurely even while WAL activity was ongoing, especially in logical
replication.

This happened because the code cleared lag when two consecutive reply messages
indicated that the apply location had caught up with the send location.
It did not verify that the reported positions were unchanged, so lag could be
cleared even when positions had advanced between messages. In logical
replication, where the apply location often quickly catches up, this issue was
more likely to occur.

This commit fixes the issue by clearing lag only when the standby reports that
it has fully replayed WAL (i.e., both flush and apply locations have caught up
with the send location) and the write/flush/apply positions remain unchanged
across two consecutive reply messages.

The second message with unchanged positions typically results from
wal_receiver_status_interval, so lag values are cleared after that interval
when there is no activity. This avoids showing stale lag data while preventing
premature NULL values.

Even with this fix, lag may rarely become NULL during activity if identical
position reports are sent repeatedly. Eliminating such duplicate messages
would address this fully, but that change is considered too invasive for stable
branches and will be handled in master only later.

Backpatch to all supported branches.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
Backpatch-through: 14
2026-03-26 20:50:38 +09:00
John Naylor
9cee8644f1 Fix copy-paste error in test_ginpostinglist
The check for a mismatch on the second decoded item pointer
was an exact copy of the first item pointer check, comparing
orig_itemptrs[0] with decoded_itemptrs[0] instead of orig_itemptrs[1]
with decoded_itemptrs[1].  The error message also reported (0, 1) as
the expected value instead of (blk, off).  As a result, any decoding
error in the second item pointer (where the varbyte delta encoding
is exercised) would go undetected.

This has been wrong since commit bde7493d1, so backpatch to all
supported versions.

Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmSOD8R7tZjRLZsmpKtJLoqjgawAaM-Pne1j8B_Q2aQK8w@mail.gmail.com
Backpatch-through: 14
2026-03-24 17:17:48 +07:00
Heikki Linnakangas
a5f412107f Fix multixact backwards-compatibility with CHECKPOINT race condition
If a CHECKPOINT record with nextMulti N is written to the WAL before
the CREATE_ID record for N, and N happens to be the first multixid on
an offset page, the backwards compatibility logic to tolerate WAL
generated by older minor versions (before commit 789d65364c) failed to
compensate for the missing XLOG_MULTIXACT_ZERO_OFF_PAGE record. In
that case, the latest_page_number was initialized at the start of WAL
replay to the page for nextMulti from the CHECKPOINT record, even if
we had not seen the CREATE_ID record for that multixid yet, which
fooled the backwards compatibility logic to think that the page was
already initialized.

To fix, track the last XLOG_MULTIXACT_ZERO_OFF_PAGE that we've seen
separately from latest_page_number. If we haven't seen any
XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, use
SimpleLruDoesPhysicalPageExist() to check if the page needs to be
initialized.

Reported-by: duankunren.dkr <duankunren.dkr@alibaba-inc.com>
Analyzed-by: duankunren.dkr <duankunren.dkr@alibaba-inc.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/c4ef1737-8cba-458e-b6fd-4e2d6011e985.duankunren.dkr@alibaba-inc.com
Backpatch-through: 14-18
2026-03-23 12:02:35 +02:00
Tom Lane
5540f9c430 Fix finalization of decompressor astreamers.
Send the correct amount of data to the next astreamer, not the
whole allocated buffer size.  This bug escaped detection because
in present uses the next astreamer is always a tar-file parser
which is insensitive to trailing garbage.  But that may not
be true in future uses.

Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Backpatch-through: 15
2026-03-22 18:06:48 -04:00
Jeff Davis
3a35ab1d01 Fix dependency on FDW handler.
ALTER FOREIGN DATA WRAPPER could drop the dependency on the handler
function if it wasn't explicitly specified.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/35c44a4b7fb76d35418c4d66b775a88f4ce60c86.camel@j-davis.com
Backpatch-through: 14
2026-03-19 15:01:22 -07:00
Fujii Masao
fa9f2e3175 Fix WAL flush LSN used by logical walsender during shutdown
Commit 6eedb2a5fd made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.

A similar issue previously existed in the GiST code. Commit b1f14c9672
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().

This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.

Backpatch to all supported versions.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux
Backpatch-through: 14
2026-03-17 08:12:41 +09:00
Tomas Vondra
34baa313e3 Tighten asserts on ParallelWorkerNumber
The comment about ParallelWorkerNumbr in parallel.c says:

  In parallel workers, it will be set to a value >= 0 and < the number
  of workers before any user code is invoked; each parallel worker will
  get a different parallel worker number.

However asserts in various places collecting instrumentation allowed
(ParallelWorkerNumber == num_workers). That would be a bug, as the value
is used as index into an array with num_workers entries.

Fixed by adjusting the asserts accordingly. Backpatch to all supported
versions.

Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 14
2026-03-14 15:33:14 +01:00
Tomas Vondra
ce06b5740e Use GetXLogInsertEndRecPtr in gistGetFakeLSN
The function used GetXLogInsertRecPtr() to generate the fake LSN. Most
of the time this is the same as what XLogInsert() would return, and so
it works fine with the XLogFlush() call. But if the last record ends at
a page boundary, GetXLogInsertRecPtr() returns LSN pointing after the
page header. In such case XLogFlush() fails with errors like this:

  ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000

Such failures are very hard to trigger, particularly outside aggressive
test scenarios.

Fixed by introducing GetXLogInsertEndRecPtr(), returning the correct LSN
without skipping the header. This is the same as GetXLogInsertRecPtr(),
except that it calls XLogBytePosToEndRecPtr().

Initial investigation by me, root cause identified by Andres Freund.

This is a long-standing bug in gistGetFakeLSN(), probably introduced by
c6b92041d3 in PG13. Backpatch to all supported versions.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
Backpatch-through: 14
2026-03-13 23:26:36 +01:00
Michael Paquier
0b0041b942 xml2: Fix failure with xslt_process() under -fsanitize=undefined
The logic of xslt_process() has never considered the fact that
xsltSaveResultToString() would return NULL for an empty string (the
upstream code has always done so, with a string length of 0).  This
would cause memcpy() to be called with a NULL pointer, something
forbidden by POSIX.

Like 46ab07ffda and similar fixes, this is backpatched down to all the
supported branches, with a test case to cover this scenario.  An empty
string has been always returned in xml2 in this case, based on the
history of the module, so this is an old issue.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/c516a0d9-4406-47e3-9087-5ca5176ebcf9@gmail.com
Backpatch-through: 14
2026-03-13 16:06:50 +09:00
Fujii Masao
d0f4b6350d doc: Document IF NOT EXISTS option for ALTER FOREIGN TABLE ADD COLUMN.
Commit 2cd40adb85 added the IF NOT EXISTS option to ALTER TABLE ADD COLUMN.
This also enabled IF NOT EXISTS for ALTER FOREIGN TABLE ADD COLUMN,
but the ALTER FOREIGN TABLE documentation was not updated to mention it.

This commit updates the documentation to describe the IF NOT EXISTS option for
ALTER FOREIGN TABLE ADD COLUMN.

While updating that section, also this commit clarifies that the COLUMN keyword
is optional in ALTER FOREIGN TABLE ADD/DROP COLUMN. Previously, part of
the documentation could be read as if COLUMN were required.

This commit adds regression tests covering these ALTER FOREIGN TABLE syntaxes.

Backpatch to all supported versions.

Suggested-by: Fujii Masao <masao.fujii@gmail.com>
Author: Chao Li <lic@highgo.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFk=rrhrwGwPtQxBesbT4DzSZ86Q3ftcwCu3AR5bOiXLw@mail.gmail.com
Backpatch-through: 14
2026-03-09 18:25:15 +09:00
Michael Paquier
46c93b7056 Fix size underestimation of DSA pagemap for odd-sized segments
When make_new_segment() creates an odd-sized segment, the pagemap was
only sized based on a number of usable_pages entries, forgetting that a
segment also contains metadata pages, and that the FreePageManager uses
absolute page indices that cover the entire segment.  This
miscalculation could cause accesses to pagemap entries to be out of
bounds.  During subsequent reuse of the allocated segment, allocations
landing on pages with indices higher than usable_pages could cause
out-of-bounds pagemap reads and/or writes.  On write, 'span' pointers
are stored into the data area, corrupting the allocated objects.  On
read (aka during a dsa_free), garbage is interpreted as a span pointer,
typically crashing the server in dsa_get_address().

The normal geometric path correctly sizes the pagemap for all pages in
the segment.  The odd-sized path needs to do the same, but it works
forward from usable_pages rather than backward from total_size.

This commit fixes the sizing of the odd-sized case by adding pagemap
entries for the metadata pages after the initial metadata_bytes
calculation, using an integer ceiling division to compute the exact
number of additional entries needed in one go, avoiding any iteration in
the calculation.

An assertion is added in the code path for odd-sized segments, ensuring
that the pagemap includes the metadata area, and that the result is
appropriately sized.

This problem would show up depending on the size requested for the
allocation of a DSA segment.  The reporter has noticed this issue when a
parallel hash join makes a DSA allocation large enough to trigger the
odd-sized segment path, but it could happen for anything that does a DSA
allocation.

A regression test is added to test_dsa, down to v17 where the test
module has been introduced.  This adds a set of cheap tests to check the
problem, the new assertion being useful for this purpose.  Sami has
proposed a test that took a longer time than what I have done here; the
test committed is faster and good enough to check the odd-sized
allocation path.

Author: Paul Bunn <paul.bunn@icloud.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/044401dcabac$fe432490$fac96db0$@icloud.com
Backpatch-through: 14
2026-03-09 13:46:37 +09:00
Fujii Masao
42734f2966 Fix publisher shutdown hang caused by logical walsender busy loop.
Previously, when logical replication was running, shutting down
the publisher could cause the logical walsender to enter a busy loop
and prevent the publisher from completing shutdown.

During shutdown, the logical walsender waits for all pending WAL
to be written out. However, some WAL records could remain unflushed,
causing the walsender to wait indefinitely.

The issue occurred because the walsender used XLogBackgroundFlush() to
flush pending WAL. This function does not guarantee that all WAL is written.
For example, WAL generated by a transaction without an assigned
transaction ID that aborts might not be flushed.

This commit fixes the bug by making the logical walsender call XLogFlush()
instead, ensuring that all pending WAL is written and preventing
the busy loop during shutdown.

Backpatch to all supported versions.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAO6_Xqo3co3BuUVEVzkaBVw9LidBgeeQ_2hfxeLMQcXwovB3GQ@mail.gmail.com
Backpatch-through: 14
2026-03-06 16:44:59 +09:00
Tom Lane
9a42888a32 Exit after fatal errors in client-side compression code.
It looks like whoever wrote the astreamer (nee bbstreamer) code
thought that pg_log_error() is equivalent to elog(ERROR), but
it's not; it just prints a message.  So all these places tried to
continue on after a compression or decompression error return,
with the inevitable result being garbage output and possibly
cascading error messages.  We should use pg_fatal() instead.

These error conditions are probably pretty unlikely in practice,
which no doubt accounts for the lack of field complaints.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1531718.1772644615@sss.pgh.pa.us
Backpatch-through: 15
2026-03-05 14:43:21 -05:00
Alexander Korotkov
8bfaae6fb2 Fix handling of updated tuples in the MERGE statement
This branch missed the IsolationUsesXactSnapshot() check.  That led to EPQ on
repeatable read and serializable isolation levels.  This commit fixes the
issue and provides a simple isolation check for that.  Backpatch through v15
where MERGE statement was introduced.

Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAPpHfdvzZSaNYdj5ac-tYRi6MuuZnYHiUkZ3D-AoY-ny8v%2BS%2Bw%40mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Backpatch-through: 15
2026-03-05 19:57:32 +02:00
Fujii Masao
87da83bde9 doc: Clarify that COLUMN is optional in ALTER TABLE ... ADD/DROP COLUMN.
In ALTER TABLE ... ADD/DROP COLUMN, the COLUMN keyword is optional. However,
part of the documentation could be read as if COLUMN were required, which may
mislead users about the command syntax.

This commit updates the ALTER TABLE documentation to clearly state that
COLUMN is optional for ADD and DROP.

Also this commit adds regression tests covering ALTER TABLE ... ADD/DROP
without the COLUMN keyword.

Backpatch to all supported versions.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2n6ShLMOnjOtf63TjjgGbgiTVT5OMsSOFmbjGb6Xue1Bw@mail.gmail.com
Backpatch-through: 14
2026-03-05 12:57:52 +09:00
Michael Paquier
270e7b4ff5 Fix rare instability in recovery TAP test 004_timeline_switch
This fixes a problem similar to ad8c86d22c.  In this case, the test
could fail under the following circumstances:
- The primary is stopped with teardown_node(), meaning that it may not
be able to send all its WAL records to standby_1 and standby_2.
- If standby_2 receives more records than standby_1, attempting to
reconnect standby_2 to the promoted standby_1 would fail because of a
timeline fork.

This race condition is fixed with a simple trick: instead of tearing
down the primary, it is stopped cleanly so as all the WAL records of the
primary are received and flushed by both standby_1 and standby_2.  Once
we do that, there is no need for a wait_for_catchup() before stopping
the node.  The test wants to check that a timeline jump can be achieved
when reconnecting a standby to a promoted standby in the same cluster,
hence an immediate stop of the primary is not required.

This failure is harder to reach than the previous instability of
009_twophase, still the buildfarm has been able to detect this failure
at least once.  I have tried Alexander Lakhin's test trick with the
bgwriter and very aggressive standby snapshots, but I could not
reproduce it directly.  It is reachable, as the buildfarm has proved.

Backpatch down to all supported branches, and this problem can lead to
spurious failures in the buildfarm.

Discussion: https://postgr.es/m/493401a8-063f-436a-8287-a235d9e065fc@gmail.com
Backpatch-through: 14
2026-03-05 10:06:06 +09:00
Tom Lane
4548e87466 Fix yet another bug in archive streamer with LZ4 decompression.
The code path in astreamer_lz4_decompressor_content() that updated
the output pointers when the output buffer isn't full was wrong.
It advanced next_out by bytes_written, which could include previous
decompression output not just that of the current cycle.  The
correct amount to advance is out_size.  While at it, make the
output pointer updates look more like the input pointer updates.

This bug is pretty hard to reach, as it requires consecutive
compression frames that are too small to fill the output buffer.
pg_dump could have produced such data before 66ec01dc4, but
I'm unsure whether any files we use astreamer with would be
likely to contain problematic data.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0594CC79-1544-45DD-8AA4-26270DE777A7@gmail.com
Backpatch-through: 15
2026-03-04 12:08:37 -05:00
Álvaro Herrera
988b9588da
Don't malloc(0) in EventTriggerCollectAlterTSConfig
Author: Florin Irion <florin.irion@enterprisedb.com>
Discussion: https://postgr.es/m/c6fff161-9aee-4290-9ada-71e21e4d84de@gmail.com
2026-03-04 15:04:53 +01:00
Heikki Linnakangas
a6b11ac4c4 Skip prepared_xacts test if max_prepared_transactions < 2
This reduces maintenance overhead, as we no longer need to update the
dummy expected output file every time the .sql file changes.

Discussion: https://www.postgresql.org/message-id/1009073.1772551323@sss.pgh.pa.us
Backpatch-through: 14
2026-03-04 11:21:33 +02:00
Michael Paquier
be233b301e Fix rare instability in recovery TAP test 009_twophase
The phase of the test where we want to check that 2PC transactions
prepared on a primary can be committed on a promoted standby relied on
an immediate stop of the primary.  This logic has a race condition: it
could be possible that some records (most likely standby snapshot
records) are generated on the primary before it finishes its shutdown,
without the promoted standby know about them.  When the primary is
recycled as new standby, the test could fail because of a timeline fork
as an effect of these extra records.

This fix takes care of the instability by doing a clean stop of the
primary instead of a teardown (aka immediate stop), so as all records
generated on the primary are sent to the promoted standby and flushed
there.  There is no need for a teardown of the primary in this test
scenario: the commit of 2PC transactions on a promoted standby do not
care about the state of the primary, only of the standby.

This race is very hard to hit in practice, even slow buildfarm members
like skink have a very low rate of reproduction.  Alexander Lakhin has
come up with a recipe to improve the reproduction rate a lot:
- Enable -DWAL_DEBUG.
- Patch the bgwriter so as standby snapshots are generated every
milliseconds.
- Run 009_twophase tests under heavy parallelism.

With this method, the failure appears after a couple of iterations.
With the fix in place, I have been able to run more than 50 iterations
of the parallel test sequence, without seeing a failure.

Issue introduced in 30820982b2, due to a copy-pasto coming from the
surrounding tests.  Thanks also to Hayato Kuroda for digging into the
details of the failure.  He has proposed a fix different than the one of
this commit.  Unfortunately, it relied on injection points, feature only
available in v17.  The solution of this commit is simpler, and can be
applied to v14~v16.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b0102688-6d6c-c86a-db79-e0e91d245b1a@gmail.com
Backpatch-through: 14
2026-03-04 16:31:03 +09:00
Fujii Masao
4574dd44da doc: Clarify that empty COMMENT string removes the comment.
Clarify the documentation of COMMENT ON to state that specifying an empty
string is treated as NULL, meaning that the comment is removed.

This makes the behavior explicit and avoids possible confusion about how
empty strings are handled.

Also adds regress test cases that use empty string to remove a comment.

Backpatch to all supported versions.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Shengbin Zhao <zshengbin91@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: zhangqiang <zhang_qiang81@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/26476097-B1C1-4BA8-AA92-0AD0B8EC7190@gmail.com
Backpatch-through: 14
2026-03-03 14:47:10 +09:00
Nathan Bossart
8fc45ac5d9 basic_archive: Allow archive directory to be missing at startup.
Presently, the GUC check hook for basic_archive.archive_directory
checks that the specified directory exists.  Consequently, if the
directory does not exist at server startup, archiving will be stuck
indefinitely, even if it appears later.  To fix, remove this check
from the hook so that archiving will resume automatically once the
directory is present.  basic_archive must already be prepared to
deal with the directory disappearing at any time, so no additional
special handling is required.

Reported-by: Олег Самойлов <splarv@ya.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/73271769675212%40mail.yandex.ru
Backpatch-through: 15
2026-03-02 13:12:25 -06:00
Michael Paquier
8eedbc2cc4 test_custom_types: Test module with fancy custom data types
This commit adds a new test module called "test_custom_types", that can
be used to stress code paths related to custom data type
implementations.

Currently, this is used as a test suite to validate the set of fixes
done in 3b7a6fa157, that requires some typanalyze callbacks that can
force very specific backend behaviors, as of:
- typanalyze callback that returns "false" as status, to mark a failure
in computing statistics.
- typanalyze callback that returns "true" but let's the backend know
that no interesting stats could be computed, with stats_valid set to
"false".

This could be extended more in the future if more problems are found.
For simplicity, the module uses a fake int4 data type, that requires a
btree operator class to be usable with extended statistics.  The type is
created by the extension, and its properties are altered in the test.

Like 3b7a6fa157, this module is backpatched down to v14, for coverage
purposes.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aaDrJsE1I5mrE-QF@paquier.xyz
Backpatch-through: 14
2026-03-02 11:10:39 +09:00
Michael Paquier
f033abc6c4 Fix set of issues with extended statistics on expressions
This commit addresses two defects regarding extended statistics on
expressions:
- When building extended statistics in lookup_var_attr_stats(), the call
to examine_attribute() did not account for the possibility of a NULL
return value.  This can happen depending on the behavior of a typanalyze
callback — for example, if the callback returns false, if no rows are
sampled, or if no statistics are computed.  In such cases, the code
attempted to build MCV, dependency, and ndistinct statistics using a
NULL pointer, incorrectly assuming valid statistics were available,
which could lead to a server crash.
- When loading extended statistics for expressions,
statext_expressions_load() did not account for NULL entries in the
pg_statistic array storing expression statistics.  Such NULL entries can
be generated when statistics collection fails for an expression, as may
occur during the final step of serialize_expr_stats().  An extended
statistics object defining N expressions requires N corresponding
elements in the pg_statistic array stored for the expressions, and some
of these elements can be NULL.  This situation is reachable when a
typanalyze callback returns true, but sets stats_valid to indicate that
no useful statistics could be computed.

While these scenarios cannot occur with in-core typanalyze callbacks, as
far as I have analyzed, they can be triggered by custom data types with
custom typanalyze implementations, at least.

No tests are added in this commit.  A follow-up commit will introduce a
test module that can be extended to cover similar edge cases if
additional issues are discovered.  This takes care of the core of the
problem.

Attribute and relation statistics already offer similar protections:
- ANALYZE detects and skips the build of invalid statistics.
- Invalid catalog data is handled defensively when loading statistics.

This issue exists since the support for extended statistics on
expressions has been added, down to v14 as of a4d75c86bf.  Backpatch
to all supported stable branches.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aaDrJsE1I5mrE-QF@paquier.xyz
Backpatch-through: 14
2026-03-02 09:38:46 +09:00
Etsuro Fujita
cf370919a9 postgres_fdw: Fix thinko in comment for UserMappingPasswordRequired().
This commit also rephrases this comment to improve readability.

Oversight in commit 6136e94dc.

Reported-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Author: Andreas Karlsson <andreas@proxel.se>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK16pDnM_wU3kmquPj-M9MYqG3y0BdntRZ0eytqbCaFY3WQ%40mail.gmail.com
Backpatch-through: 14
2026-02-27 17:05:05 +09:00
Jeff Davis
2b993167fc Fix more multibyte issues in ltree.
Commit 84d5efa7e3 missed some multibyte issues caused by short-circuit
logic in the callers. The callers assumed that if the predicate string
is longer than the label string, then it couldn't possibly be a match,
but it can be when using case-insensitive matching (LVAR_INCASE) if
casefolding changes the byte length.

Fix by refactoring to get rid of the short-circuit logic as well as
the function pointer, and consolidate the logic in a replacement
function ltree_label_match().

Discussion: https://postgr.es/m/02c6ef6cf56a5013ede61ad03c7a26affd27d449.camel@j-davis.com
Backpatch-through: 14
2026-02-26 12:26:13 -08:00
Tom Lane
e22821c7bd Fix Solution.pm for change in pg_config.h contents.
In commits 1d97e4788 et al, I forgot that pre-v17 branches
require updating Solution.pm when changing the set of
symbols generated in pg_config.h.  Per buildfarm.
2026-02-26 12:26:52 -05:00
Tom Lane
41d75b9a4a Use CXXFLAGS instead of CFLAGS for linking C++ code
Otherwise, this would break if using C and C++ compilers from
different families and they understand different options.  It already
used the right flags for compiling, this is only for linking.  Also,
the meson setup already did this correctly.

Back-patch of v18 commit 365b5a345 into older supported branches.
At the time we were only aware of trouble in v18, but as shown
by buildfarm member siren, older branches can hit the problem too.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/228700.1722717983@sss.pgh.pa.us
Discussion: https://postgr.es/m/3109540.1771698685@sss.pgh.pa.us
Backpatch-through: 14-17
2026-02-26 12:06:58 -05:00
Noah Misch
44d29f5c6c EUC_CN, EUC_JP, EUC_KR, EUC_TW: Skip U+00A0 tests instead of failing.
Settings that ran the new test euc_kr.sql to completion would fail these
older src/pl tests.  Use alternative expected outputs, for which psql
\gset and \if have reduced the maintenance burden.  This fixes
"LANG=ko_KR.euckr LC_MESSAGES=C make check-world".  (LC_MESSAGES=C fixes
IO::Pty usage in tests 010_tab_completion and 001_password.)  That file
is new in commit c67bef3f32.  Back-patch
to v14, like that commit.

Discussion: https://postgr.es/m/20260217184758.da.noahmisch@microsoft.com
Backpatch-through: 14
2026-02-25 18:13:35 -08:00
Fujii Masao
e81c61ee4a doc: Clarify INCLUDING COMMENTS behavior in CREATE TABLE LIKE.
The documentation for the INCLUDING COMMENTS option of the LIKE clause
in CREATE TABLE was inaccurate and incomplete. It stated that comments for
copied columns, constraints, and indexes are copied, but regarding comments
on constraints in reality only comments on CHECK and NOT NULL constraints
are copied; comments on other constraints (such as primary keys) are not.
In addition, comments on extended statistics are copied, but this was not
documented.

The CREATE FOREIGN TABLE documentation had a similar omission: comments
on extended statistics are also copied, but this was not mentioned.

This commit updates the documentation to clarify the actual behavior.
The CREATE TABLE reference now specifies that comments on copied columns,
CHECK constraints, NOT NULL constraints, indexes, and extended statistics are
copied. The CREATE FOREIGN TABLE reference now notes that comments on
extended statistics are copied as well.

Backpatch to all supported versions. Documentation updates related to
CREATE FOREIGN TABLE LIKE and NOT NULL constraint comment copying are
not applied to v17 and earlier, since those features were introduced in v18.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHSOSGcaYDvHF8EYCUCfGPjbRwGFsJ23cx5KbJ1X6JouQ@mail.gmail.com
Backpatch-through: 14
2026-02-26 09:03:34 +09:00
Fujii Masao
74ee3e9e73 Fix ProcWakeup() resetting wrong waitStart field.
Previously, when one process woke another that was waiting on a lock,
ProcWakeup() incorrectly cleared its own waitStart field (i.e.,
MyProc->waitStart) instead of that of the process being awakened.
As a result, the awakened process retained a stale lock-wait start timestamp.

This did not cause user-visible issues. pg_locks.waitstart was reported as
NULL for the awakened process (i.e., when pg_locks.granted is true),
regardless of the waitStart value.

This bug was introduced by commit 46d6e5f567.

This commit fixes this by resetting the waitStart field of the process
being awakened in ProcWakeup().

Backpatch to all supported branches.

Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: ji xu <thanksgreed@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/537BD852-EC61-4D25-AB55-BE8BE46D07D7@gmail.com
Backpatch-through: 14
2026-02-26 08:51:01 +09:00
Tom Lane
7e788a585a Allow PG_PRINTF_ATTRIBUTE to be different in C and C++ code.
Although clang claims to be compatible with gcc's printf format
archetypes, this appears to be a falsehood: it likes __syslog__
(which gcc does not, on most platforms) and doesn't accept
gnu_printf.  This means that if you try to use gcc with clang++
or clang with g++, you get compiler warnings when compiling
printf-like calls in our C++ code.  This has been true for quite
awhile, but it's gotten more annoying with the recent appearance
of several buildfarm members that are configured like this.

To fix, run separate probes for the format archetype to use with the
C and C++ compilers, and conditionally define PG_PRINTF_ATTRIBUTE
depending on __cplusplus.

(We could alternatively insist that you not mix-and-match C and
C++ compilers; but if the case works otherwise, this is a poor
reason to insist on that.)

This commit back-patches 0909380e4 into supported branches.

Discussion: https://postgr.es/m/986485.1764825548@sss.pgh.pa.us
Discussion: https://postgr.es/m/3988414.1771950285@sss.pgh.pa.us
Backpatch-through: 14-18
2026-02-25 11:57:26 -05:00
Tom Lane
cc9c22a516 Fix some cases of indirectly casting away const.
Newest versions of gcc+glibc are able to detect cases where code
implicitly casts away const by assigning the result of strchr() or
a similar function applied to a "const char *" value to a target
variable that's just "char *".  This of course creates a hazard of
not getting a compiler warning about scribbling on a string one was
not supposed to, so fixing up such cases is good.

This patch fixes a dozen or so places where we were doing that.
Most are trivial additions of "const" to the target variable,
since no actually-hazardous change was occurring.

Thanks to Bertrand Drouvot for finding a couple more spots than
I had.

This commit back-patches relevant portions of 8f1791c61 and
9f7565c6c into supported branches.  However, there are two
places in ecpg (in v18 only) where a proper fix is more
complicated than seems appropriate for a back-patch.  I opted
to silence those two warnings by adding casts.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
Discussion: https://postgr.es/m/3988414.1771950285@sss.pgh.pa.us
Backpatch-through: 14-18
2026-02-25 11:19:50 -05:00
Tom Lane
b57d35dde7 Stabilize output of new isolation test insert-conflict-do-update-4.
The test added by commit 4b760a181 assumed that a table's physical
row order would be predictable after an UPDATE.  But a non-heap table
AM might produce some other order.  Even with heap AM, the assumption
seems risky; compare a3fd53bab for instance.  Adding an ORDER BY is
cheap insurance and doesn't break any goal of the test.

Author: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALT9ZEHcE6tpvumScYPO6pGk_ASjTjWojLkodHnk33dvRPHXVw@mail.gmail.com
Backpatch-through: 14
2026-02-25 10:51:42 -05:00
Jacob Champion
e726620d20 pg_upgrade: Use max_protocol_version=3.0 for older servers
The grease patch in 4966bd3ed found its first problem: prior to the
February 2018 patch releases, no server knew how to negotiate protocol
versions, so pg_upgrade needs to take that into account when speaking to
those older servers.

This will be true even after the grease feature is reverted; we don't
need anyone to trip over this again in the future. Backpatch so that all
supported versions of pg_upgrade can gracefully handle an update to the
default protocol version. (This is needed for any distributions that
link older binaries against newer libpqs, such as Debian.) Branches
prior to 18 need an additional version check, for the existence of
max_protocol_version.

Per buildfarm member crake.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOYmi%2B%3D4QhCjssfNEoZVK8LPtWxnfkwT5p-PAeoxtG9gpNjqOQ%40mail.gmail.com
Backpatch-through: 14
2026-02-24 14:01:54 -08:00
Tom Lane
3eb6f61958 Stamp 15.17. 2026-02-23 17:01:47 -05:00
Peter Eisentraut
7063b9e92b Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 6b4371ce13670e5660c8660b2984e780610b2a77
2026-02-23 14:03:47 +01:00
Tom Lane
8dc47ebb10 Release notes for 18.3, 17.9, 16.13, 15.17, 14.22. 2026-02-22 12:22:41 -05:00
Thomas Munro
e3bfa4f589 Fix test_valid_server_encoding helper function.
Commit c67bef3f32 introduced this test helper function for use by
src/test/regress/sql/encoding.sql, but its logic was incorrect.  It
confused an encoding ID for a boolean so it gave the wrong results for
some inputs, and also forgot the usual return macro.  The mistake didn't
affect values actually used in the test, so there is no change in
behavior.

Also drop it and another missed function at the end of the test, for
consistency.

Backpatch-through: 14
Author: Zsolt Parragi <zsolt.parragi@percona.com>
2026-02-17 16:26:19 +13:00
Noah Misch
ec86152e02 Suppress new "may be used uninitialized" warning.
Various buildfarm members, having compilers like gcc 8.5 and 6.3, fail
to deduce that text_substring() variable "E" is initialized if
slice_size!=-1.  This suppression approach quiets gcc 8.5; I did not
reproduce the warning elsewhere.  Back-patch to v14, like commit
9f4fd119b2.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1157953.1771266105@sss.pgh.pa.us
Backpatch-through: 14
2026-02-16 18:05:02 -08:00
Michael Paquier
63c05e03bc hstore: Fix NULL pointer dereference with receive function
The receive function of hstore was not able to handle correctly
duplicate key values when a new duplicate links to a NULL value, where a
pfree() could be attempted on a NULL pointer, crashing due to a pointer
dereference.

This problem would happen for a COPY BINARY, when stacking values like
that:
aa => 5
aa => null

The second key/value pair is discarded and pfree() calls are attempted
on its key and its value, leading to a pointer dereference for the value
part as the value is NULL.  The first key/value pair takes priority when
a duplicate is found.

Per offline report.

Reported-by: "Anemone" <vergissmeinnichtzh@gmail.com>
Reported-by: "A1ex" <alex000young@gmail.com>
Backpatch-through: 14
2026-02-17 08:41:35 +09:00
Heikki Linnakangas
899de38d8f Don't reset 'latest_page_number' when replaying multixid truncation
'latest_page_number' is set to the correct value, according to
nextOffset, early at system startup. Contrary to the comment, it hence
should be set up correctly by the time we get to WAL replay.

This fixes a failure to replay WAL generated on older minor versions,
before commit 789d65364c (18.2, 17.8, 16.12, 15.16, 14.21). The
failure occurs after a truncation record has been replayed and looks
like this:

    FATAL:  could not access status of transaction 858112
    DETAIL:  Could not read from file "pg_multixact/offsets/000D" at offset 24576: read too few bytes.
    CONTEXT:  WAL redo at 3/2A3AB408 for MultiXact/CREATE_ID: 858111 offset 6695072 nmembers 5: 1048228 (sh) 1048271 (keysh) 1048316 (sh) 1048344 (keysh) 1048370 (sh)

Reported-by: Sebastian Webber <sebastian@swebber.me>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/20260214090150.GC2297@p46.dedyn.io;lightning.p46.dedyn.io
Backpatch-through: 14-18
2026-02-16 17:20:51 +02:00
Michael Paquier
0bc0fc789d pgcrypto: Tweak error message for incorrect session key length
The error message added in 379695d3cc referred to the public key being
too long.  This is confusing as it is in fact the session key included
in a PGP message which is too long.  This is harmless, but let's be
precise about what is wrong.

Per offline report.

Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Backpatch-through: 14
2026-02-16 12:18:33 +09:00