Commit graph

44089 commits

Amit Kapila
3510ebeb0d Prevent invalidation of newly created replication slots.
A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.

Previously, if the required WAL was removed, we retried the reservation
process. However, the slot could still be invalidated before the retry if
the WAL was not yet removed but the checkpoint advanced the redo pointer
beyond the slot's intended restart LSN and computed the minimum LSN that
needs to be preserved for the slots.

The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation, and a shared lock during the minimum LSN
calculation at checkpoints to serialize the process. This ensures that, if
WAL reservation occurs first, the checkpoint waits until restart_lsn is
updated before calculating the minimum LSN. If the checkpoint runs first,
subsequent WAL reservations pick a position at or after the latest
checkpoint's redo pointer.
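
To make the serialization scheme concrete, here is a minimal standalone
sketch using a pthreads rwlock in place of ReplicationSlotAllocationLock
(function names are invented; this is not the backend code):

    #include <pthread.h>

    static pthread_rwlock_t alloc_lock = PTHREAD_RWLOCK_INITIALIZER;

    void reserve_wal_for_slot(void)          /* slot-creation path */
    {
        pthread_rwlock_wrlock(&alloc_lock);  /* exclusive lock */
        /* pick a restart LSN at or after the latest redo pointer */
        pthread_rwlock_unlock(&alloc_lock);
    }

    void checkpoint_compute_min_lsn(void)    /* checkpointer path */
    {
        pthread_rwlock_rdlock(&alloc_lock);  /* shared: waits out any
                                              * in-flight reservation */
        /* compute the minimum LSN to preserve for the slots */
        pthread_rwlock_unlock(&alloc_lock);
    }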

We used a similar fix in HEAD (via commit 006dd4b2e5) and 18. The
difference is that in 17 and prior branches we need to additionally handle
the race condition with the slots' minimum LSN computation during checkpoints.

Reported-by: suyu.cmj <mengjuan.cmj@alibaba-inc.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan.cmj@alibaba-inc.com
2026-01-08 07:17:56 +00:00
David Rowley
bb08ac7ac3 Fix issue with EVENT TRIGGERS and ALTER PUBLICATION
When processing the "publish" options of an ALTER PUBLICATION command,
we call SplitIdentifierString() to split the options into a List of
strings.  Since SplitIdentifierString() replaces each delimiter
character with a NUL, this would overwrite the memory
of the AlterPublicationStmt.  Later in AlterPublicationOptions(), the
modified AlterPublicationStmt is copied for event triggers, which would
result in the event trigger only seeing the first "publish" option
rather than all options that were specified in the command.

To fix this, make a copy of the string before passing it to
SplitIdentifierString().
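
A standalone sketch of the hazard and the fix, using strtok(), which,
like SplitIdentifierString(), writes NULs over its input (invented data;
not the actual publicationcmds.c code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char stmt_options[] = "insert,update,delete"; /* owned by the stmt */
        char *copy = strdup(stmt_options);            /* the fix: split a copy */

        for (char *tok = strtok(copy, ","); tok; tok = strtok(NULL, ","))
            printf("option: %s\n", tok);

        /* stmt_options is untouched, so a later copy of the statement
         * (e.g. for event triggers) still sees all options */
        printf("statement still holds: %s\n", stmt_options);
        free(copy);
        return 0;
    }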

Here we also adjust a similar case in the pgoutput plugin.  There are
no known issues caused by SplitIdentifierString() here, so this is being
done out of paranoia.

Thanks to Henson Choi for putting together an example case showing the
ALTER PUBLICATION issue.

Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Backpatch-through: 14
2026-01-06 17:30:08 +13:00
Fujii Masao
358784645e Add TAP test for GUC settings passed via CONNECTION in logical replication.
Commit d926462d81 restored the behavior of passing GUC settings from
the CONNECTION string to the publisher's walsender, allowing per-connection
configuration.

This commit adds a TAP test to verify that behavior works correctly.

Since commit d926462d81 was recently applied and backpatched to v15,
this follow-up commit is also backpatched accordingly.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com
Backpatch-through: 15
2026-01-06 11:58:00 +09:00
Fujii Masao
7a990e801a Honor GUC settings specified in CREATE SUBSCRIPTION CONNECTION.
Prior to v15, GUC settings supplied in the CONNECTION clause of
CREATE SUBSCRIPTION were correctly passed through to
the publisher's walsender. For example:

        CREATE SUBSCRIPTION mysub
            CONNECTION 'options=''-c wal_sender_timeout=1000'''
            PUBLICATION ...

would cause wal_sender_timeout to take effect on the publisher's walsender.

However, commit f3d4019da5 changed the way logical replication
connections are established, forcing the publisher's relevant
GUC settings (datestyle, intervalstyle, extra_float_digits) to
override those provided in the CONNECTION string. As a result,
from v15 through v18, GUC settings in the CONNECTION string were
always ignored.

This regression prevented per-connection tuning of logical replication,
for example using a shorter timeout for a walsender connecting to a
nearby subscriber and a longer one for a walsender connecting to a
remote subscriber.

This commit restores the intended behavior by ensuring that
GUC settings in the CONNECTION string are again passed through
and applied by the walsender, allowing per-connection configuration.

Backpatch to v15, where the regression was introduced.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com
Backpatch-through: 15
2026-01-06 11:54:14 +09:00
Thomas Munro
174bbc0677 jit: Fix jit_profiling_support when unavailable.
jit_profiling_support=true captures profile data for Linux perf.  On
other platforms, LLVMCreatePerfJITEventListener() returns NULL and the
attempt to register the listener would crash.

Fix by ignoring the setting in that case.  The documentation already
says that it only has an effect if perf support is present, and we
already did the same for older LLVM versions that lacked support.

No field reports, unsurprisingly for an obscure developer-oriented
setting.  Noticed in passing while working on commit 1a28b4b4.

Backpatch-through: 14
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJgB6gvrdDohgwLfCwzVQm%3DVMtb9m0vzQn%3DCwWn-kwG9w%40mail.gmail.com
2025-12-31 14:52:21 +13:00
Masahiko Sawada
123b851abd Fix a race condition in updating procArray->replication_slot_xmin.
Previously, ReplicationSlotsComputeRequiredXmin() computed the oldest
xmin across all slots without holding ProcArrayLock (when
already_locked is false), acquiring the lock just before updating the
replication slot xmin.

This could lead to a race condition: if a backend creates a new slot
and updates the global replication slot xmin, another backend
concurrently running ReplicationSlotsComputeRequiredXmin() could
overwrite that update with an invalid or stale value. This happens
because the concurrent backend might have computed the aggregate xmin
before the new slot was accounted for, but applied the update after
the new slot had already updated the global value.

In the reported failure, a walsender for an apply worker computed
InvalidTransactionId as the oldest xmin and overwrote a valid
replication slot xmin value computed by a walsender for a tablesync
worker. Consequently, the tablesync worker computed a transaction ID
via GetOldestSafeDecodingTransactionId() effectively without
considering the replication slot xmin. This led to the error "cannot
build an initial slot snapshot as oldest safe xid %u follows
snapshot's xmin %u", which was an assertion failure prior to commit
240e0dbacd.

To fix this, we acquire ReplicationSlotControlLock in exclusive mode
during slot creation to perform the initial update of the slot
xmin. In ReplicationSlotsComputeRequiredXmin(), we hold
ReplicationSlotControlLock in shared mode until the global slot xmin
is updated in ProcArraySetReplicationSlotXmin(). This prevents
concurrent computations and updates of the global xmin by other
backends during the initial slot xmin update process, while still
permitting concurrent calls to ReplicationSlotsComputeRequiredXmin().
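
The essence of the fix, as a minimal standalone sketch (a pthread mutex
stands in for the LWLocks involved; names are invented): the aggregate
must be computed and published under the same lock, otherwise a stale
result can overwrite a newer one.

    #include <pthread.h>

    static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
    static int global_xmin;   /* stands in for the shared slot xmin */

    void compute_required_xmin(int slot_xmins[], int nslots)
    {
        pthread_mutex_lock(&slot_lock);
        int agg = 0;
        for (int i = 0; i < nslots; i++)      /* scan all slots */
            if (agg == 0 || slot_xmins[i] < agg)
                agg = slot_xmins[i];
        global_xmin = agg;    /* publish under the same lock; the buggy
                               * version computed agg before locking */
        pthread_mutex_unlock(&slot_lock);
    }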

Backpatch to all supported versions.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Pradeep Kumar <spradeepkumar29@gmail.com>
Reviewed-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=ZUWYAYmdfw@mail.gmail.com
Backpatch-through: 14
2025-12-30 10:56:25 -08:00
Thomas Munro
a989237aea jit: Remove -Wno-deprecated-declarations in 18+.
REL_18_STABLE and master have commit ee485912, so they always use the
newer LLVM opaque pointer functions.  Drop -Wno-deprecated-declarations
(commit a56e7b660) for code under jit/llvm in those branches, to catch
any new deprecation warnings that arrive in future versions of LLVM.

Older branches continue to use functions marked deprecated in LLVM 14
and 15 (i.e., they switch to the newer functions only for LLVM 16+), as a
precaution against unforeseen compatibility problems with bitcode
already shipped.  In those branches, the comment about warning
suppression is updated to explain that situation better.  In theory we
could suppress warnings only for LLVM 14 and 15 specifically, but that
isn't done here.

Backpatch-through: 14
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1407185.1766682319%40sss.pgh.pa.us
2025-12-30 14:32:01 +13:00
Thomas Munro
b3c8119e28 Fix Mkvcbuild.pm builds of test_cloexec.c.
Mkvcbuild.pm scrapes Makefile contents, but couldn't understand the
change made by commit bec2a0aa.  Revealed by BF animal hamerkop in
branch REL_16_STABLE.

1.  It used += instead of =, which didn't match the pattern that
Mkvcbuild.pm looks for.  Drop the +.

2.  Mkvcbuild.pm doesn't link PROGRAM executables with libpgport.  Apply
a local workaround to REL_16_STABLE only (later branches dropped
Mkvcbuild.pm).

Backpatch-through: 16
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/175163.1766357334%40sss.pgh.pa.us
2025-12-29 15:28:17 +13:00
Michael Paquier
52b27f5859 Fix pg_stat_get_backend_activity() to use multi-byte truncated result
pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure
that the reported query string is correctly truncated when it finishes
with an incomplete multi-byte sequence.  However, the result returned by
the function was not what pgstat_clip_activity() generated, but the
original, non-truncated contents of PgBackendStatus.st_activity_raw.
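
The shape of the bug, reduced to a standalone sketch (invented names;
strndup() stands in for pgstat_clip_activity()):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *raw = "SELECT ... long query text";  /* st_activity_raw */
        char *clipped = strndup(raw, 10);  /* the truncation actually computed */

        puts(clipped);   /* fixed: report the clipped result */
        /* the buggy version effectively did puts(raw), returning the
         * raw buffer and discarding the truncation work */
        free(clipped);
        return 0;
    }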

Oversight in 54b6cd589a, so backpatch all the way down.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com
Backpatch-through: 14
2025-12-27 17:23:53 +09:00
Amit Kapila
a1cdb81201 Update comments to reflect changes in 8e0d32a4a1.
Commit 8e0d32a4a1 fixed an issue by allowing the replication origin to be
created while marking the table sync state as SUBREL_STATE_DATASYNC.
Update the comment in check_old_cluster_subscription_state() to accurately
describe this corrected behavior.

Author: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 17, where the code was introduced
Discussion: https://postgr.es/m/CAA4eK1+KaSf5nV_tWy+SDGV6MnFnKMhdt41jJjSDWm6yCyOcTw@mail.gmail.com
Discussion: https://postgr.es/m/aUTekQTg4OYnw-Co@paquier.xyz
2025-12-24 09:55:03 +00:00
Amit Kapila
0ed8f1afb1 Don't advance origin during apply failure.
The logical replication parallel apply worker could incorrectly advance
the origin progress during an error or failed apply. This behavior risks
transaction loss because such transactions will not be resent by the
server.

Commit 3f28b2fcac addressed a similar issue for both the apply worker and
the table sync worker by registering a before_shmem_exit callback to reset
origin information. This prevents the worker from advancing the origin
during transaction abortion on shutdown. This patch applies the same fix
to the parallel apply worker, ensuring consistent behavior across all
worker types.

As with 3f28b2fcac, we are backpatching through version 16, since parallel
apply mode was introduced there and the issue only occurs when changes are
applied before the transaction end record (COMMIT or ABORT) is received.

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16
Discussion: https://postgr.es/m/TY4PR01MB169078771FB31B395AB496A6B94B4A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
2025-12-24 04:05:06 +00:00
Heikki Linnakangas
bb87d7fef1 Fix bug in following update chain when locking a heap tuple
After waiting for a concurrent updater to finish, heap_lock_tuple()
followed the update chain to lock all tuple versions. However, when
stepping from the initial tuple to the next one, it failed to check
that the next tuple's XMIN matches the initial tuple's XMAX. That's an
important check whenever following an update chain, and the recursive
part that follows the chain did it, but the initial step missed it.
Without the check, if the updating transaction aborts and the updated
tuple is vacuumed away and replaced by an unrelated tuple, that
unrelated tuple might get incorrectly locked.
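
Schematically, the missing check looks like this (invented types, not
the heapam.c code):

    #include <stdio.h>

    struct tup { int xmin, xmax; struct tup *next; };

    /* Step to the next tuple version only if its XMIN matches our
     * XMAX, proving it really is our successor. */
    static struct tup *chain_next(struct tup *cur)
    {
        struct tup *nxt = cur->next;
        if (nxt == NULL || nxt->xmin != cur->xmax)
            return NULL;   /* successor vacuumed away, slot reused */
        return nxt;
    }

    int main(void)
    {
        struct tup unrelated = { 900, 0, NULL };       /* recycled slot */
        struct tup initial   = { 100, 200, &unrelated };
        puts(chain_next(&initial) ? "follow" : "stop");   /* prints "stop" */
        return 0;
    }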

Author: Jasper Smit <jasper.smit@servicenow.com>
Discussion: https://www.postgresql.org/message-id/CAOG+RQ74x0q=kgBBQ=mezuvOeZBfSxM1qu_o0V28bwDz3dHxLw@mail.gmail.com
Backpatch-through: 14
2025-12-23 13:37:25 +02:00
Tom Lane
4c9f262ba8 Add missing .gitignore for src/test/modules/test_cloexec. 2025-12-23 15:00:13 +09:00
Michael Paquier
e063ccc722 Fix orphaned origin in shared memory after DROP SUBSCRIPTION
Since ce0fdbfe97, a replication slot and an origin are created by each
tablesync worker, whose information is stored in both a catalog and
shared memory (once the origin is set up in the latter case).  The
transaction where the origin is created is the same as the one that runs
the initial COPY, with the catalog state of the origin becoming visible
for other sessions only once the COPY transaction has committed.  The
catalog state is coupled with a state in shared memory, initialized at
the same time as the origin created in the catalogs.  Note that the
transaction doing the initial data sync can take a long time, depending
on the amount of data to transfer from a publication node to its
subscriber node.

Now, when a DROP SUBSCRIPTION is executed, all its workers are stopped
with the origins removed.  The removal of each origin relies on a
catalog lookup.  A worker still running the initial COPY would fail its
transaction, with the catalog state of the origin rolled back while the
shared memory state remains around.  The session running the DROP
SUBSCRIPTION should be in charge of cleaning up the catalog and the
shared memory state, but as there is no data in the catalogs the shared
memory state is not removed.  This issue would leave orphaned origin
data in shared memory, leading to a confusing state as it would still
show up in pg_replication_origin_status.  Note that this shared memory
data is sticky, being flushed on disk in replorigin_checkpoint at
checkpoint.  This prevents other origins from reusing a slot position
in the shared memory data.

To address this problem, the commit moves the creation of the origin at
the end of the transaction that precedes the one executing the initial
COPY, making the origin immediately visible in the catalogs for other
sessions, giving DROP SUBSCRIPTION a way to know about it.  A different
solution would have been to clean up the shared memory state using an
abort callback within the tablesync worker.  The solution of this commit
is more consistent with the apply worker that creates an origin in a
short transaction.

A test is added in the subscription test 004_sync.pl, which was able to
display the problem.  The test fails when this commit is reverted.

Reported-by: Tenglong Gu <brucegu@amazon.com>
Reported-by: Daisuke Higuchi <higudai@amazon.com>
Analyzed-by: Michael Paquier <michael@paquier.xyz>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aUTekQTg4OYnw-Co@paquier.xyz
Backpatch-through: 14
2025-12-23 14:32:21 +09:00
Thomas Munro
d4549176ea Fix printf format string warning on MinGW.
This is a back-patch of 1319997d to branches 14-17 to fix an old warning
about a printf type mismatch on MinGW, in anticipation of a potential
expansion of the scope of CI's CompilerWarnings checks.  Though CI began
in 15, BF animal fairywren also shows the warning in 14, so we might as
well fix that too.

Original commit message (except for new "Backpatch-through" tag):

Commit 517bf2d91 changed a printf format string to placate MinGW, which
at the time warned about "%lld".  Current MinGW is now warning about the
replacement "%I64d".  Reverting the change clears the warning on the
MinGW CI task, and hopefully it will clear it on build farm animal
fairywren too.

Backpatch-through: 14-17
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TYAPR01MB5866A71B744BE01B3BF71791F5AEA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
2025-12-21 21:28:12 +13:00
Thomas Munro
0451859131 Clean up test_cloexec.c and Makefile.
An unused variable caused a compiler warning on BF animal fairywren, an
snprintf() call was redundant, and some buffer sizes were inconsistent.
Per code review from Tom Lane.

The Makefile's test ifeq ($(PORTNAME), win32) never succeeded due to a
circularity, so only Meson builds were actually compiling the new test
code, partially explaining why CI didn't tell us about the warning
sooner (the other problem being that CompilerWarnings only makes
world-bin, a problem for another commit).  Simplify.

Backpatch-through: 16, like commit c507ba55
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <tmunro@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1086088.1765593851%40sss.pgh.pa.us
2025-12-21 17:35:46 +13:00
Fujii Masao
699293d274 Add guard to prevent recursive memory context logging.
Previously, if memory context logging was triggered repeatedly and
rapidly while a previous request was still being processed, it could
result in recursive calls to ProcessLogMemoryContextInterrupt().
This could lead to infinite recursion and potentially crash the process.

This commit adds a guard to prevent such recursion.
If ProcessLogMemoryContextInterrupt() is already in progress and
logging memory contexts, subsequent calls will exit immediately,
avoiding unintended recursive calls.
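
The guard pattern, as a minimal standalone sketch (assumed shape, not
the actual ProcessLogMemoryContextInterrupt() code):

    #include <stdbool.h>
    #include <stdio.h>

    static bool logging_in_progress = false;

    static void process_log_request(void)
    {
        if (logging_in_progress)
            return;                /* a dump is already underway: bail */
        logging_in_progress = true;
        puts("dumping memory contexts ...");   /* the real work */
        logging_in_progress = false;           /* the real code must also
                                                * reset this on error exit */
    }

    int main(void)
    {
        process_log_request();
        return 0;
    }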

While this scenario is unlikely in practice, it's not impossible.
This change adds a safety check to prevent such failures.

Back-patch to v14, where memory context logging was introduced.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Artem Gavrilov <artem.gavrilov@percona.com>
Discussion: https://postgr.es/m/CA+TgmoZMrv32tbNRrFTvF9iWLnTGqbhYSLVcrHGuwZvCtph0NA@mail.gmail.com
Backpatch-through: 14
2025-12-19 12:07:21 +09:00
Noah Misch
1cdc07ad5a Sort DO_SUBSCRIPTION_REL dump objects independent of OIDs.
Commit 0decd5e89d missed
DO_SUBSCRIPTION_REL, leading to assertion failures.  In the unlikely use
case of diffing "pg_dump --binary-upgrade" output, spurious diffs were
possible.  As part of fixing that, align the DumpableObject naming and
sort order with DO_PUBLICATION_REL.  The overall effect of this commit
is to change sort order from (subname, srsubid) to (rel, subname).
Since DO_SUBSCRIPTION_REL is only for --binary-upgrade, accept that
larger-than-usual dump order change.  Back-patch to v17, where commit
9a17be1e24 introduced DO_SUBSCRIPTION_REL.

Reported-by: vignesh C <vignesh21@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2x3rd7C0_HjUpJFbxpAqXgm=QtoKfkEWDVA8h+JFpa_w@mail.gmail.com
Backpatch-through: 17
2025-12-18 10:23:51 -08:00
Heikki Linnakangas
4b6d096a0f Do not emit WAL for unlogged BRIN indexes
Operations on unlogged relations should not be WAL-logged. The
brin_initialize_empty_new_buffer() function didn't get the memo.

The function is only called when a concurrent update to a brin page
uses up space that we're just about to insert to, which makes it
pretty hard to hit. If you do manage to hit it, a full-page WAL record
is erroneously emitted for the unlogged index. If you then crash,
crash recovery will fail on that record with an error like this:

    FATAL:  could not create file "base/5/32819": File exists

Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/CALdSSPhpZXVFnWjwEBNcySx_vXtXHwB2g99gE6rK0uRJm-3GgQ@mail.gmail.com
Backpatch-through: 14
2025-12-18 15:09:23 +02:00
Noah Misch
bcb784e7d2 Assert lack of hazardous buffer locks before possible catalog read.
Commit 0bada39c83 fixed a bug of this kind,
which existed in all branches for six days before detection.  While the
probability of reaching the trouble was low, the disruption was extreme.  No
new backends could start, and service restoration needed an immediate
shutdown.  Hence, add this to catch the next bug like it.

The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1 that led to commit
0bada39.  This also checks in a number of similar places.  It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.

Back-patch to v14 - v17.  This is a back-patch of commit
f4ece891fc (from before v18 branched) to
all supported branches, to accompany the back-patch of commits 243e9b4
and 0bada39.  For catalog indexes, the bttextcmp() behavior that
motivated IsCatalogTextUniqueIndexOid() was v18-specific.  Hence, this
back-patch doesn't need that or its correction from commit
4a4ee0c2c1.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Backpatch-through: 14-17
2025-12-16 16:13:54 -08:00
Noah Misch
d3e5d89504 WAL-log inplace update before revealing it to other sessions.
A buffer lock won't stop a reader having already checked tuple
visibility.  If a vac_update_datfrozenxid() and then a crash happened
during inplace update of a relfrozenxid value, datfrozenxid could
overtake relfrozenxid.  That could lead to "could not access status of
transaction" errors.

Back-patch to v14 - v17.  This is a back-patch of commits:

- 8e7e672cda
  (main change, on master, before v18 branched)
- 8180136652
  (defect fix, on master, before v18 branched)

It reverses commit bc6bad8857, my revert
of the original back-patch.

In v14, this also back-patches the assertion removal from commit
7fcf2faf9c.

Discussion: https://postgr.es/m/20240620012908.92.nmisch@google.com
Backpatch-through: 14-17
2025-12-16 16:13:54 -08:00
Noah Misch
0f69beddea For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.

Back-patch to v14 - v17.  This is a back-patch of commits:

- 243e9b40f1
  (main change, on master, before v18 branched)
- 0bada39c83
  (defect fix, on master, before v18 branched)
- bae8ca82fd
  (cosmetics from post-commit review, on REL_18_STABLE)

It reverses commit c1099dd745, my revert
of the original back-patch of 243e9b4.

This back-patch omits the non-comment heap_decode() changes.  I find
those changes removed harmless code that was last necessary in v13.  See
discussion thread for details.  The back branches aren't the place to
remove such code.

Like the original back-patch, this doesn't change WAL, because these
branches use end-of-recovery SIResetAll().  All branches change the ABI
of extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.  Expect
".abi-compliance-history" edits to follow.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
2025-12-16 16:13:54 -08:00
Robert Haas
1d0fc2499f Switch memory contexts in ReinitializeParallelDSM.
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.

That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Backpatch-through: 14
Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
2025-12-16 10:59:04 -05:00
Michael Paquier
f5927da4ff Fail recovery when missing redo checkpoint record without backup_label
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record could not be found.  This is the same level of failure as when a
checkpoint record is missing.  This check is added when a cluster is
started without a backup_label, after retrieving its checkpoint record.
The redo LSN used for the check is retrieved from the checkpoint record
successfully read.

In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.

Previously, the presence of the redo record was not checked.  If the
redo and checkpoint records were located on different WAL segments, it
would be possible to miss an entire range of WAL records that should have
been replayed but were just ignored.  The consequences of missing the
redo record depend on the version involved, and become worse the older
the version:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the
beginning of the redo loop, as the redo record is expected but cannot be
found.  These versions behave best, because the failure is detected
before doing anything, even if it is misleading in the form of a
segmentation fault that gives no hint that the redo record is missing.
- In v16 and v15, problems show at the end of recovery within
FinishWalRecovery(), the startup process using a buggy LSN to decide
from where to start writing WAL.  The cluster gets corrupted, but at
least it is noisy about it.
- v14 and older versions are worse: a cluster gets corrupted but it is
entirely silent about the matter.  The missing redo record causes the
startup process to skip recovery entirely, because a missing record is
treated the same as no redo being required at all.  This leads to data
loss, as everything between the redo record and the checkpoint record is
missed.

Note that I have tested that down to 9.4, reproducing the issue with a
version of the author's reproducer slightly modified.  The code is wrong
since at least 9.2, but I did not look at the exact point of origin.

This problem has been found by debugging a cluster where the WAL segment
including the redo record was missing due to an operator error, leading
to a crash, based on an investigation in v15.

Requesting archive recovery with the creation of a recovery.signal or
a standby.signal even without a backup_label would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists.  This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record.  The check
introduced by this commit causes the segment to be retrieved earlier,
to make sure that the redo record can be found.

On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.

Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
2025-12-16 13:29:39 +09:00
Heikki Linnakangas
cd1a887fe9 Clarify comment on multixid offset wraparound check
Coverity complained that offset cannot be 0 here because there's an
explicit check for "offset == 0" earlier in the function, but it
didn't see the possibility that offset could've wrapped around to 0.
The code is correct, but clarify the comment about it.

The same code exists in backbranches in the server
GetMultiXactIdMembers() function and in 'master' in the pg_upgrade
GetOldMultiXactIdSingleMember function. In backbranches Coverity
didn't complain about it because the check was merely an assertion,
but change the comment in all supported branches for consistency.

Per Tom Lane's suggestion.

Discussion: https://www.postgresql.org/message-id/1827755.1765752936@sss.pgh.pa.us
2025-12-15 11:47:58 +02:00
Michael Paquier
0bab0c3b74 Fix allocation formula in llvmjit_expr.c
An array of LLVMBasicBlockRef was allocated using an element size of
"LLVMBasicBlockRef *" rather than "LLVMBasicBlockRef".  Since
LLVMBasicBlockRef is itself a pointer type, the two sizes happen to
match and this caused no direct problem, but it is still incorrect.
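
The bug pattern, standalone (invented names; harmless here for the same
reason, because the element type is itself a pointer):

    #include <stdlib.h>

    typedef struct block *block_ref;   /* opaque handle, like LLVMBasicBlockRef */

    int main(void)
    {
        int n = 16;
        block_ref *bad  = malloc(sizeof(block_ref *) * n);  /* wrong element size */
        block_ref *good = malloc(sizeof(block_ref) * n);    /* correct */
        free(bad);
        free(good);
        return 0;
    }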

This issue has been spotted while reviewing a different patch, and
exists since 2a0faed9d7, so backpatch all the way down.

Discussion: https://postgr.es/m/CA+hUKGLngd9cKHtTUuUdEo2eWEgUcZ_EQRbP55MigV2t_zTReg@mail.gmail.com
Backpatch-through: 14
2025-12-11 10:25:46 +09:00
Heikki Linnakangas
807b2f261d Fix bogus extra arguments to query_safe in test
The test seemed to incorrectly think that query_safe() takes an
argument that describes what the query does, similar to e.g.
command_ok(). Until commit bd8d9c9bdf the extra arguments were
harmless and were just ignored, but when commit bd8d9c9bdf introduced
a new optional argument to query_safe(), the extra arguments started
clashing with that, causing the test to fail.

Backpatch to v17, which is the oldest branch where the test exists. The
extra arguments didn't cause any trouble on the older branches, but
they were clearly bogus anyway.
2025-12-10 19:39:00 +02:00
Heikki Linnakangas
998d100cdb Fix some near-bugs related to ResourceOwner function arguments
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future callers and extensions.
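
The near-bug, in miniature (invented names; an int pointer stands in for
the ResourceOwner):

    #include <stdio.h>

    static int current_owner;   /* stands in for CurrentResourceOwner */

    static void remember(int *owner, int item)
    {
        if (owner == NULL)
            owner = &current_owner;
        *owner += item;   /* the buggy version used current_owner here
                           * regardless of the parameter */
    }

    int main(void)
    {
        int mine = 0;
        remember(&mine, 1);
        remember(NULL, 1);
        printf("mine=%d current=%d\n", mine, current_owner);   /* 1 1 */
        return 0;
    }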

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
2025-12-10 11:44:07 +02:00
Michael Paquier
d0518e965e Fix failures with cross-version pg_upgrade tests
Buildfarm members skimmer and crake have reported that pg_upgrade
running from v18 fails due to the changes of d52c24b0f8, with the
expectation that the objects removed in the test module
injection_points should still be present after the upgrade, but the test
module does not have them anymore.

The origin of the issue is that the following test modules depend on
injection_points, but they do not drop the extension once the tests
finish, leaving its traces in the dumps used for the upgrades:
- gin, down to v17
- typcache, down to v18
- nbtree, HEAD-only
Test modules have no upgrade requirements, as they are used only for
tests, so there is no point in keeping them around.

An alternative solution would be to drop the databases created by these
modules in AdjustUpgrade.pm, but the solution of this commit to drop the
extension is simpler.  Note that there would be a catch if using a
solution based on AdjustUpgrade.pm as the database name used for the
test runs differs between configure and meson:
- configure relies on USE_MODULE_DB for database name uniqueness, which
builds a database name based on the *first* entry of REGRESS, the
variable that lists all the SQL tests.
- meson relies on a "name" field.

For example, for the test module "gin", the regression database is named
"regression_gin" under meson, while it is more complex for configure, as
of "contrib_regression_gin_incomplete_splits".  So a AdjustUpgrade.pm
would need a set of DROP DATABASE IF EXISTS to solve this issue, to cope
with each build system.

The failure has been caused by d52c24b0f8, and the problem can happen
with upgrade dumps from v17 and v18 to HEAD.  This problem is not
currently reachable in the back-branches, but it could be possible that
a future change in injection_points in stable branches invalidates this
theory, so this commit is applied down to v17 in the test modules that
matter.

Per discussion with Tom Lane and Heikki Linnakangas.

Discussion: https://postgr.es/m/2899652.1765167313@sss.pgh.pa.us
Backpatch-through: 17
2025-12-10 12:47:23 +09:00
Thomas Munro
f24af0e04c Fix O_CLOEXEC flag handling in Windows port.
PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE
when opening files on Windows, making all file descriptors inheritable
by child processes.  This meant the O_CLOEXEC flag, added to many call
sites by commit 1da569ca1f (v16), was silently ignored.

The original commit included a comment suggesting that our open()
replacement doesn't create inheritable handles, but it was a mis-
understanding of the code path.  In practice, the code was creating
inheritable handles in all cases.

This hasn't caused widespread problems because most child processes
(archive_command, COPY PROGRAM, etc.) operate on file paths passed as
arguments rather than inherited file descriptors.  Even if a child
wanted to use an inherited handle, it would need to learn the numeric
handle value, which isn't passed through our IPC mechanisms.

Nonetheless, the current behavior is wrong.  It violates documented
O_CLOEXEC semantics, contradicts our own code comments, and makes
PostgreSQL behave differently on Windows than on Unix.  It also creates
potential issues with future code or security auditing tools.

To fix, define O_CLOEXEC to _O_NOINHERIT in master, taking the bit
value previously used by O_DSYNC.  We use different values in the back
branches to preserve
existing values.  In pgwin32_open_handle() we set bInheritHandle
according to whether O_CLOEXEC is specified, for the same atomic
semantics as POSIX in multi-threaded programs that create processes.
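
For reference, the POSIX contract the Windows port now matches, in a
standalone Unix-side sketch (not the src/port/open.c change itself):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
        if (fd < 0)
            return 1;
        /* FD_CLOEXEC is set atomically at open(); there is no separate
         * fcntl() step, hence no race with a concurrent fork/exec */
        printf("cloexec set: %d\n", (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0);
        close(fd);
        return 0;
    }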

Backpatch-through: 16
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments)
Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com
2025-12-10 09:10:31 +13:00
Amit Kapila
f2818868ae Fix LOCK_TIMEOUT handling in slotsync worker.
Previously, the slotsync worker relied on SIGINT for graceful shutdown
during promotion. However, SIGINT is also used by the LOCK_TIMEOUT handler
to cancel queries. Since the slotsync worker can lock catalog tables while
parsing libpq tuples, this overlap caused it to ignore LOCK_TIMEOUT
signals and potentially wait indefinitely on locks.

This patch replaces the slotsync worker's SIGINT handler with
StatementCancelHandler to correctly process query-cancel interrupts.
Additionally, the startup process now uses SIGUSR1 to signal the slotsync
worker to stop during promotion. The worker exits after detecting that the
shared memory flag stopSignaled is set.
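
The resulting signal wiring, as a minimal standalone analog (invented
handler names; not the slotsync.c code):

    #include <signal.h>

    static volatile sig_atomic_t shutdown_requested = 0;

    static void handle_sigint(int sig)  { (void) sig; /* cancel query */ }
    static void handle_sigusr1(int sig) { (void) sig; shutdown_requested = 1; }

    int main(void)
    {
        /* SIGINT -> query cancel (StatementCancelHandler's role);
         * SIGUSR1 -> stop request during promotion, acted on together
         * with a shared flag like stopSignaled */
        signal(SIGINT, handle_sigint);
        signal(SIGUSR1, handle_sigusr1);
        return 0;
    }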

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/TY4PR01MB169078F33846E9568412D878C94A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
2025-12-09 07:02:08 +00:00
David Rowley
ca98d8ba10 Doc: fix typo in hash index documentation
Plus a similar fix to the README.

Backpatch as far back as the sgml issue exists.  The README issue does
exist in v14, but that seems unlikely to harm anyone.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
2025-12-09 14:42:40 +13:00
Heikki Linnakangas
cad40cec24 Fix setting next multixid's offset at offset wraparound
In commit 789d65364c, we started updating the next multixid's offset
too when recording a multixid, so that it can always be used to
calculate the number of members. I got it wrong at offset wraparound:
we need to skip over offset 0. Fix that.
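
The wraparound rule, in miniature (assumed semantics: offset 0 is
reserved as "invalid", so an offset that wraps must skip over it):

    #include <inttypes.h>
    #include <stdio.h>

    static uint32_t next_offset(uint32_t offset, uint32_t nmembers)
    {
        uint32_t next = offset + nmembers;   /* wraps modulo 2^32 */
        if (next == 0)
            next = 1;                        /* skip the reserved value */
        return next;
    }

    int main(void)
    {
        printf("%" PRIu32 "\n", next_offset(UINT32_MAX, 1));   /* 1, not 0 */
        return 0;
    }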

Discussion: https://www.postgresql.org/message-id/d9996478-389a-4340-8735-bfad456b313c@iki.fi
Backpatch-through: 14
2025-12-05 11:35:44 +02:00
Michael Paquier
9d4f6d17f5 Show version of nodes in output of TAP tests
This commit adds the version information of a node initialized by
Cluster.pm, which may vary depending on the install_path given by the
test.  Previously, the code dumped the node information, which includes
the version number, before the version number was set.

This is particularly useful for the pg_upgrade TAP tests, that may mix
several versions for cross-version runs.  The TAP infrastructure also
allows mixing nodes with different versions, so this information can be
useful for out-of-core tests.

Backpatch down to v15, where Cluster.pm and the pg_upgrade TAP tests
have been introduced.

Author: Potapov Alexander <a.potapov@postgrespro.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/e59bb-692c0a80-5-6f987180@170377126
Backpatch-through: 15
2025-12-05 09:21:18 +09:00
Heikki Linnakangas
8ba61bc063 Set next multixid's offset when creating a new multixid
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.

The waiting mechanism was broken in a few scenarios:

- When nextMulti was advanced without WAL-logging the next
  multixid. For example, if a later multixid was already assigned and
  WAL-logged before the previous one was WAL-logged, and then the
  server crashed. In that case the next offset would never be set in
  the offsets SLRU, and a query trying to read it would get stuck
  waiting for it. Same thing could happen if pg_resetwal was used to
  forcibly advance nextMulti.

- In hot standby mode, a deadlock could happen where one backend waits
  for the next multixid assignment record, but WAL replay is not
  advancing because of a recovery conflict with the waiting backend.

The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.

Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
2025-12-03 19:15:21 +02:00
Dean Rasheed
c090965036 Avoid rewriting data-modifying CTEs more than once.
Formerly, when updating an auto-updatable view, or a relation with
rules, if the original query had any data-modifying CTEs, the rewriter
would rewrite those CTEs multiple times as RewriteQuery() recursed
into the product queries. In most cases that was harmless, because
RewriteQuery() is mostly idempotent. However, if the CTE involved
updating an always-generated column, it would trigger an error because
any subsequent rewrite would appear to be attempting to assign a
non-default value to the always-generated column.

This could perhaps be fixed by attempting to make RewriteQuery() fully
idempotent, but that looks quite tricky to achieve, and would probably
be quite fragile, given that more generated-column-type features might
be added in the future.

Instead, fix by arranging for RewriteQuery() to rewrite each CTE
exactly once (by tracking the number of CTEs already rewritten as it
recurses). This has the advantage of being simpler and more efficient,
but it does make RewriteQuery() dependent on the order in which
rewriteRuleAction() joins the CTE lists from the original query and
the rule action, so care must be taken if that is ever changed.

Reported-by: Bernice Southey <bernice.southey@gmail.com>
Author: Bernice Southey <bernice.southey@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAEDh4nyD6MSH9bROhsOsuTqGAv_QceU_GDvN9WcHLtZTCYM1kA@mail.gmail.com
Backpatch-through: 14
2025-11-29 12:32:12 +00:00
Tom Lane
e79b276621 Allow indexscans on partial hash indexes with implied quals.
Normally, if a WHERE clause is implied by the predicate of a partial
index, we drop that clause from the set of quals used with the index,
since it's redundant to test it if we're scanning that index.
However, if it's a hash index (or any !amoptionalkey index), this
could result in dropping all available quals for the index's first
key, preventing us from generating an indexscan.

It's fair to question the practical usefulness of this case.  Since
hash only supports equality quals, the situation could only arise
if the index's predicate is "WHERE indexkey = constant", implying
that the index contains only one hash value, which would make hash
a really poor choice of index type.  However, perhaps there are
other !amoptionalkey index AMs out there with which such cases are
more plausible.

To fix, just don't filter the candidate indexquals this way if
the index is !amoptionalkey.  That's a bit hokey because it may
result in testing quals we didn't need to test, but to do it
more accurately we'd have to redundantly identify which candidate
quals are actually usable with the index, something we don't know
at this early stage of planning.  Doesn't seem worth the effort.

Reported-by: Sergei Glukhov <s.glukhov@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/e200bf38-6b45-446a-83fd-48617211feff@postgrespro.ru
Backpatch-through: 14
2025-11-27 13:09:59 -05:00
Amit Langote
b5511fed50 Fix error reporting for SQL/JSON path type mismatches
transformJsonFuncExpr() used exprType()/exprLocation() on the
possibly coerced path expression, which could be NULL when
coercion to jsonpath failed, leading to "cache lookup failed
for type 0" errors.

Preserve the original expression node so that type and location
in the "must be of type jsonpath" error are reported correctly.
Add regression tests to cover these cases.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CACJufxHunVg81JMuNo8Yvv_hJD0DicgaVN2Wteu8aJbVJPBjZA@mail.gmail.com
Backpatch-through: 17
2025-11-27 11:59:36 +09:00
Nathan Bossart
2fc5c50622 Teach DSM registry to retry entry initialization if needed.
If DSM registry entry initialization fails, backends could try to
use an uninitialized DSM segment, DSA, or dshash table (since the
entry is still added to the registry).  To fix, restructure the
code so that the registry retries initialization as needed.  This
commit also modifies pg_get_dsm_registry_allocations() to leave out
partially-initialized entries, as they shouldn't have any allocated
memory.

DSM registry entry initialization shouldn't fail often in practice,
but retrying was deemed better than leaving entries in a
permanently failed state (as was done by commit 1165a933aa, which
has since been reverted).

Suggested-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
2025-11-26 15:12:25 -06:00
Nathan Bossart
c7e0f263d6 Revert "Teach DSM registry to ERROR if attaching to an uninitialized entry."
This reverts commit 1165a933aa (and the corresponding commits on
the back-branches).  In a follow-up commit, we'll teach the
registry to retry entry initialization instead of leaving it in a
permanently failed state.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
2025-11-26 11:37:21 -06:00
Andres Freund
427e886a79 lwlock: Fix, currently harmless, bug in LWLockWakeup()
The code in LWLockWakeup() accidentally checked the list of to-be-woken
processes to see if LW_FLAG_HAS_WAITERS should be unset. That means that
HAS_WAITERS would not get unset immediately, but only during the next,
unnecessary, call to LWLockWakeup().

Luckily, as the code stands, this is just a small efficiency issue.

However, if there were (as in a patch of mine) a case in which LWLockWakeup()
would not find any backend to wake, despite the wait list not being empty,
we'd wrongly unset LW_FLAG_HAS_WAITERS, leading to potentially hanging.
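
Schematically, the fix decides the flag from the queue that remains, not
from the batch just dequeued (invented types; not the lwlock.c source):

    #include <stdio.h>

    #define HAS_WAITERS 0x1

    struct lwlock { unsigned flags; int nwaiters; };

    static void wakeup(struct lwlock *lock, int nwoken)
    {
        lock->nwaiters -= nwoken;          /* dequeue the woken batch */
        if (lock->nwaiters == 0)           /* no waiters remain after dequeue */
            lock->flags &= ~HAS_WAITERS;   /* only now safe to clear */
    }

    int main(void)
    {
        struct lwlock lock = { HAS_WAITERS, 3 };
        wakeup(&lock, 3);
        printf("flags=%#x\n", lock.flags);   /* 0: flag cleared promptly */
        return 0;
    }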

While the consequences in the backbranches are limited, the code as-is
is confusing, and it is possible that there are workloads where the
additional wait list lock acquisitions hurt; therefore, backpatch.

Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Backpatch-through: 14
2025-11-24 17:39:57 -05:00
David Rowley
232e0f5de4 Fix incorrect IndexOptInfo header comment
The comment incorrectly indicated that indexcollations[] stored
collations for both key columns and INCLUDE columns, but in reality it
only has elements for the key columns.  canreturn[] didn't get a mention,
so add that while we're here.

Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com
Backpatch-through: 14
2025-11-24 17:01:13 +13:00
Thomas Munro
60215eae7c jit: Adjust AArch64-only code for LLVM 21.
LLVM 21 changed the arguments of RTDyldObjectLinkingLayer's
constructor, breaking compilation with the backported
SectionMemoryManager from commit 9044fc1d.

cd585864c0

Backpatch-through: 14
Author: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/d25e6e4a-d1b4-84d3-2f8a-6c45b975f53d%40applied-asynchrony.com
2025-11-22 21:22:37 +13:00
Heikki Linnakangas
f2e0ca0af9 Print new OldestXID value in pg_resetwal when it's being changed
Commit 74cf7d46a9 added the --oldest-transaction-id option to
pg_resetwal, but forgot to update the code that prints all the new
values that are being set. Fix that.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/5461bc85-e684-4531-b4d2-d2e57ad18cba@iki.fi
Backpatch-through: 14
2025-11-19 18:06:23 +02:00
Tom Lane
075a763e2d Don't allow CTEs to determine semantic levels of aggregates.
The fix for bug #19055 (commit b0cc0a71e) allowed CTE references in
sub-selects within aggregate functions to affect the semantic levels
assigned to such aggregates.  It turns out this broke some related
cases, leading to assertion failures or strange planner errors such
as "unexpected outer reference in CTE query".  After experimenting
with some alternative rules for assigning the semantic level in
such cases, we've come to the conclusion that changing the level
is more likely to break things than be helpful.

Therefore, this patch undoes what b0cc0a71e changed, and instead
installs logic to throw an error if there is any reference to a
CTE that's below the semantic level that standard SQL rules would
assign to the aggregate based on its contained Var and Aggref nodes.
(The SQL standard disallows sub-selects within aggregate functions,
so it can't reach the troublesome case and hence has no rule for
what to do.)

Perhaps someone will come along with a legitimate query that this
logic rejects, and if so probably the example will help us craft
a level-adjustment rule that works better than what b0cc0a71e did.
I'm not holding my breath for that though, because the previous
logic had been there for a very long time before bug #19055 without
complaints, and that bug report sure looks to have originated from
fuzzing not from real usage.

Like b0cc0a71e, back-patch to all supported branches, though
sadly that no longer includes v13.

Bug: #19106
Reported-by: Kamil Monicz <kamil@monicz.dev>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19106-9dd3668a0734cd72@postgresql.org
Backpatch-through: 14
2025-11-18 12:56:55 -05:00
Thomas Munro
d66a922f92 Define PS_USE_CLOBBER_ARGV on GNU/Hurd.
Until d2ea2d310d, the PS_USE_PS_STRINGS
option was used on the GNU/Hurd. As this option got removed and
PS_USE_CLOBBER_ARGV appears to work fine nowadays on the Hurd, define
this one to re-enable process title changes on this platform.

In the 14 and 15 branches, the existing test for __hurd__ (added 25
years ago by commit 209aa77d, removed in 16 by the above commit) is left
unchanged for now as it was activating slightly different code paths and
would need investigation by a Hurd user.

Author: Michael Banck <mbanck@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGJMNGUAqf27WbckYFrM-Mavy0RKJvocfJU%3DJ2XcAZyv%2Bw%40mail.gmail.com
Backpatch-through: 16
2025-11-17 12:48:37 +13:00
Dean Rasheed
d6c415c4b4 Fix Assert failure in EXPLAIN ANALYZE MERGE with a concurrent update.
When instrumenting a MERGE command containing both WHEN NOT MATCHED BY
SOURCE and WHEN NOT MATCHED BY TARGET actions using EXPLAIN ANALYZE, a
concurrent update of the target relation could lead to an Assert
failure in show_modifytable_info(). In a non-assert build, this would
lead to an incorrect value for "skipped" tuples in the EXPLAIN output,
rather than a crash.

This could happen if the concurrent update caused a matched row to no
longer match, in which case ExecMerge() treats the single originally
matched row as a pair of not matched rows, and potentially executes 2
not-matched actions for the single source row. This could then lead to
a state where the number of rows processed by the ModifyTable node
exceeds the number of rows produced by its source node, causing
"skipped_path" in show_modifytable_info() to be negative, triggering
the Assert.

Fix this in ExecMergeMatched() by incrementing the instrumentation
tuple count on the source node whenever a concurrent update of this
kind is detected, if both kinds of merge actions exist, so that the
number of source rows matches the number of actions potentially
executed, and the "skipped" tuple count is correct.

Back-patch to v17, where support for WHEN NOT MATCHED BY SOURCE
actions was introduced.

Bug: #19111
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/19111-5b06624513d301b3@postgresql.org
Backpatch-through: 17
2025-11-16 22:15:45 +00:00
Nathan Bossart
505ce19a20 Add note about CreateStatistics()'s selective use of check_rights.
Commit 5e4fcbe531 added a check_rights parameter to this function
for use by ALTER TABLE commands that re-create statistics objects.
However, we intentionally ignore check_rights when verifying
relation ownership because this function's lookup could return a
different answer than the caller's.  This commit adds a note to
this effect so that we remember it down the road.

Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 14
2025-11-14 13:20:09 -06:00
Fujii Masao
5bc251b288 pgbench: Fix assertion failure with multiple \syncpipeline in pipeline mode.
Previously, when pgbench ran a custom script that triggered retriable errors
(e.g., deadlocks) followed by multiple \syncpipeline commands in pipeline mode,
the following assertion failure could occur:

    Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3594.

The issue was that discardUntilSync() assumed a pipeline sync result
(PGRES_PIPELINE_SYNC) would always be followed by either another sync result
or NULL. This assumption was incorrect: when multiple sync requests were sent,
a sync result could instead be followed by another result type. In such cases,
discardUntilSync() mishandled the results, leading to the assertion failure.

This commit fixes the issue by making discardUntilSync() correctly handle cases
where a pipeline sync result is followed by other result types. It now continues
discarding results until another pipeline sync followed by NULL is reached.
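
The corrected loop shape, sketched with real libpq calls (a fragment
that assumes a final sync is always pending; not the pgbench source):

    #include <libpq-fe.h>

    static void discard_until_sync(PGconn *conn)
    {
        int saw_sync = 0;

        for (;;)
        {
            PGresult *res = PQgetResult(conn);

            if (res == NULL)
            {
                if (saw_sync)
                    break;      /* a sync followed by NULL: fully drained */
                continue;       /* NULL between per-query results */
            }
            saw_sync = (PQresultStatus(res) == PGRES_PIPELINE_SYNC);
            PQclear(res);
        }
    }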

Backpatched to v17, where support for the \syncpipeline command in
pgbench was introduced.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251111105037.f3fc554616bc19891f926c5b@sraoss.co.jp
Backpatch-through: 17
2025-11-14 22:42:02 +09:00
Nathan Bossart
ac2800ddc1 Teach DSM registry to ERROR if attaching to an uninitialized entry.
If DSM entry initialization fails, backends could try to use an
uninitialized DSM segment, DSA, or dshash table (since the entry is
still added to the registry).  To fix, keep track of whether
initialization completed, and ERROR if a backend tries to attach to
an uninitialized entry.  We could instead retry initialization as
needed, but that seemed complicated, error prone, and unlikely to
help most cases.  Furthermore, such problems probably indicate a
coding error.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/dd36d384-55df-4fc2-825c-5bc56c950fa9%40gmail.com
Backpatch-through: 17
2025-11-12 14:30:11 -06:00