Commit graph

63813 commits

Author SHA1 Message Date
Amit Kapila
5984ea868e Change syntax of EXCEPT TABLE clause in publication commands.
Adjust the syntax of the EXCEPT clause in CREATE/ALTER PUBLICATION
added in commits fd366065e0 and 493f8c6439 to move the TABLE keyword
inside the relation list.

Old syntax:
CREATE PUBLICATION ... FOR ALL TABLES EXCEPT TABLE (t1, ...);
ALTER PUBLICATION  ... SET ALL TABLES EXCEPT TABLE (t1, ...);

New syntax:
CREATE PUBLICATION ... FOR ALL TABLES EXCEPT (TABLE t1, ...);
ALTER PUBLICATION  ... SET ALL TABLES EXCEPT (TABLE t1, ...);

This is to ensure that inclusion and exclusion list can be specified in
a same way. Previously, the exclusion table list can be specified as
TABLE (t1, t2, t3) and inclusion list can be specified as TABLE t1, t2,
t3, or TABLE t1, TABLE t2, TABLE t3.

This change is purely syntactic and does not alter behavior.

Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCC8XuwfX62qKBSfHUAoww_XB3_84HjswgL9jxQy696yw@mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm3=JrucjhiiwsYQw5-PGtBHFONa6F7hhWCXMsGvh=tamA@mail.gmail.com
2026-03-31 09:40:51 +05:30
Tom Lane
786552e7f2 Doc: update ddl.sgml's description of cmin and cmax.
We long ago folded these two tuple header fields into one field
to save space.  However, nothing was done to the user-facing
documentation about them, perhaps with the idea that we'd add
code to emit something approximating the original definitions.
That never happened and presumably never will, so update the
text to reflect current reality.

Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+renyVYYboiTayRRE0j1oKpeB+NjEBSUXfwgEu6O0JESSmauQ@mail.gmail.com
2026-03-30 18:25:19 -04:00
Peter Eisentraut
c73e8ee061 Add warning option -Wold-style-declaration
This warning has been triggered a few times via the buildfarm (see
commits 8212625e53, 2b7259f855, afe86a9e73), so we might as well
add it so that everyone sees it.

(This is completely separate from the recently added
-Wold-style-definition.)

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/aa73q1aT0A3/vke/%40ip-10-97-1-34.eu-west-3.compute.internal
2026-03-30 23:34:13 +02:00
Jacob Champion
993368113c libpq: Add oauth_ca_file option to change CAs without debugging
PG18 hid the PGOAUTHCAFILE envvar behind PGOAUTHDEBUG=UNSAFE, because I
thought that any "real" production usage of private CA certificates
would have them added to the Curl system trust store. But there are use
cases, such as containerized environments, that prefer to manage custom
CA settings more granularly; some of them consider envvar configuration
of certificates to be standard practice.

Move PGOAUTHCAFILE out from under the debug flag, and add an
oauth_ca_file option to libpq to configure trusted CAs per connection.

Patch by Jonathan Gonzalez V., with some additional wordsmithing and
test organization by me.

Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/16a91d02795cb991963326a902afa764e4d721db.camel%40gmail.com
2026-03-30 14:14:45 -07:00
Nathan Bossart
bab2f27eaa Remove bits* typedefs.
In addition to removing the bits8, bits16, and bits32 typedefs,
this commit replaces all uses with uint8, uint16, or uint32.  bits*
provided little benefit beyond establishing the intent of the
variable, and they were inconsistently used for that purpose.
Third-party code should instead use the corresponding uint*
typedef.

Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/absbX33E4eaA0Ity%40nathan
2026-03-30 16:12:08 -05:00
Heikki Linnakangas
40c41dc773 Use ShmemInitStruct to allocate shmem for semaphores
This makes them visible in pg_shmem_allocations

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
2026-03-30 23:39:35 +03:00
Melanie Plageman
378a216187 Set pd_prune_xid on insert
Now that on-access pruning can update the visibility map (VM) during
read-only queries, set the page’s pd_prune_xid hint during INSERT and on
the new page during UPDATE.

This allows heap_page_prune_and_freeze() to set the VM the first time a
page is read after being filled with tuples. This may avoid I/O
amplification by setting the page all-visible when it is still in shared
buffers and allowing later vacuums to skip scanning the page. It also
enables index-only scans of newly inserted data much sooner.

As a side benefit, this addresses a long-standing note in heap_insert()
and heap_multi_insert(): aborted inserts can now be pruned on-access
rather than lingering until the next VACUUM.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2026-03-30 16:07:11 -04:00
Melanie Plageman
b46e1e54d0 Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans, tid
range scans, sample scans, bitmap heap scans, and the underlying heap
relation in index scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which could
trigger a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2026-03-30 15:47:07 -04:00
Nathan Bossart
e3637a05dc Add commit 874da8b1f6 to .git-blame-ignore-revs. 2026-03-30 14:35:24 -05:00
Peter Eisentraut
488ab592d9 configure: Apply -Werror=vla to C++ as well as C
The comment part of d9dd406fe2 mentioned that -Wvla is not applicable
for C++. That is not fully correct: it is true that VLAs are not part of the
C++ standard, and g++ with -pedantic will also warn about them as a non-standard
extension.  However, -Wvla itself works fine in C++ and will catch VLA
usage just as in C.

Fix configure.ac to apply -Werror=vla to C++ as well. There is no need to
fix meson.build as it already includes it in common_warning_flags.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/7bf60ab1-2b5d-4a77-93ce-815072a0a014%40eisentraut.org
2026-03-30 20:55:16 +02:00
Tom Lane
7394773450 Be more careful to preserve consistency of a tuplestore.
Several places in tuplestore.c would leave the tuplestore data
structure effectively corrupt if some subroutine were to throw
an error.  Notably, if WRITETUP() failed after some number of
successful calls within dumptuples(), the tuplestore would
contain some memtuples pointers that were apparently live
entries but in fact pointed to pfree'd chunks.

In most cases this sort of thing is fine because transaction
abort cleanup is not too picky about the contents of memory that
it's going to throw away anyway.  There's at least one exception
though: if a Portal has a holdStore, we're going to call
tuplestore_end() on that, even during transaction abort.
So it's not cool if that tuplestore is corrupt, and that means
tuplestore.c has to be more careful.

This oversight demonstrably leads to crashes in v15 and before,
if a holdable cursor fails to persist its data due to an undersized
temp_file_limit setting.  Very possibly the same thing can happen in
v16 and v17 as well, though the specific test case submitted failed
to fail there (cf. 095555daf).  The failure is accidentally dodged
as of v18 because 590b045c3 got rid of tuplestore_end's retail tuple
deletion loop.  Still, it seems unwise to permit tuplestores to become
internally inconsistent in any branch, so I've applied the same fix
across the board.

Since the known test case for this is rather expensive and doesn't
fail in recent branches, I've omitted it.

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 14
2026-03-30 13:59:58 -04:00
Heikki Linnakangas
681774315d Replace getopt() with our re-entrant variant in the backend
Some of these probably could continue using non-re-entrant getopt()
even if we start using threads in the future, but it seems better to
make them all anyway, so that we have a clear-cut rule of "no plain
getopt() in the postgres binary".

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/d1da5f0e-0d68-47c9-a882-eb22f462752f@iki.fi
2026-03-30 20:47:16 +03:00
Heikki Linnakangas
fd8e3f7cee Invent a variant of getopt(3) that is thread-safe
The standard getopt(3) function is not re-entrant nor thread-safe.
That's OK for current usage, but it's one more little thing we need to
change in order to make the server multi-threaded.

There's no standard getopt_r() function on any platform, I presume
because command line arguments are usually parsed early when you start
a program, before launching any threads, so there isn't much need for
it. However, we call it at backend startup to parse options from the
startup packet. Because there's no standard, we're free to define our
own.

The pg_getopt_start/next() implementation is based on the old getopt
implementation, I just gathered all the state variables to a struct.
The non-re-entrant getopt() function is now a wrapper around the
re-entrant variant, on platforms that don't have getopt(3).
getopt_long() is not used in the server, so we don't need to provide a
re-entrant variant of that.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/d1da5f0e-0d68-47c9-a882-eb22f462752f@iki.fi
2026-03-30 20:47:13 +03:00
Heikki Linnakangas
c5f7820e57 Fix latent bug in get_stats_option_name()
The function is supposed to look at the passed in 'arg' argument, but
peeks at the 'optarg' global variable that's part of getopt()
instead. It happened to work anyway, because all callers passed
'optarg' as the argument.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/d1da5f0e-0d68-47c9-a882-eb22f462752f@iki.fi
2026-03-30 20:34:48 +03:00
Melanie Plageman
50eb5faea2 Pass down information on table modification to scan nodes
Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
2026-03-30 13:27:34 -04:00
Álvaro Herrera
349bd88202
Don't use bits32 in table AM interface
Seems there's near-universal dislike for the bitsXX typedefs.
Revert that part of commit 1bd6f22f43 in favor of using plain uint32.
2026-03-30 19:06:33 +02:00
Melanie Plageman
dcd8cc1c85 Thread flags through begin-scan APIs
Add an AM user-settable flags parameter to several of the table scan
functions, one table AM callback, and index_beginscan(). This allows
users to pass additional context to be used when building the scan
descriptors.

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.

This introduces an extension point for follow-up work to pass per-scan
information (such as whether the relation is read-only for the current
query) from the executor to the AM layer.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2be31f17-5405-4de9-8d73-90ebc322f7d8%40vondra.me
2026-03-30 12:27:24 -04:00
Tom Lane
095555daf1 Detect pfree or repalloc of a previously-freed memory chunk.
Before the major rewrite in commit c6e0fe1f2, AllocSetFree() would
typically crash when asked to free an already-free chunk.  That was
an ugly but serviceable way of detecting coding errors that led to
double pfrees.  But since that rewrite, double pfrees went through
just fine, because the "hdrmask" of a freed chunk isn't changed at all
when putting it on the freelist.  We'd end with a corrupt freelist
that circularly links back to the doubly-freed chunk, which would
usually result in trouble later, far removed from the actual bug.

This situation is no good at all for debugging purposes.  Fortunately,
we can fix it at low cost in MEMORY_CONTEXT_CHECKING builds by making
AllocSetFree() check for chunk->requested_size == InvalidAllocSize,
relying on the pre-existing code that sets it that way just below.

I investigated the alternative of changing a freed chunk's methodid
field, which would allow detection in non-MEMORY_CONTEXT_CHECKING
builds too.  But that adds measurable overhead.  Seeing that we didn't
notice this oversight for more than three years, it's hard to argue
that detecting this type of bug is worth any extra overhead in
production builds.

Likewise fix AllocSetRealloc() to detect repalloc() on a freed chunk,
and apply similar changes in generation.c and slab.c.  (generation.c
would hit an Assert failure anyway, but it seems best to make it act
like aset.c.)  bump.c doesn't need changes since it doesn't support
pfree in the first place.  Ideally alignedalloc.c would receive
similar changes, but in debugging builds it's impossible to reach
AlignedAllocFree() or AlignedAllocRealloc() on a pfreed chunk, because
the underlying context's pfree would have wiped the chunk header of
the aligned chunk.  But that means we should get an error of some
sort, so let's be content with that.

Per investigation of why the test case for bug #19438 didn't appear to
fail in v16 and up, even though the underlying bug was still present.
(This doesn't fix the underlying double-free bug, just cause it to
get detected.)

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 16
2026-03-30 12:02:08 -04:00
Heikki Linnakangas
bd365b1ae5 Fix outdated comment on MainLWLockArray
It's no longer passed to child processes down via BackendParameters in
EXEC_BACKEND mode.

Reported-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA5RZ0vPWNMvTBqyH7nqDRrHd6Y4Et5iNqXFuwpbsPOk3cL4rQ@mail.gmail.com
2026-03-30 17:13:11 +03:00
Robert Haas
e2ee95233c pg_plan_advice: Avoid assertion failure with partitionwise aggregate.
An Append node that is part of a partitionwise aggregate has no
apprelids. If such a node was elided, the previous coding would
attempt to call unique_nonjoin_rtekind() on a NULL pointer, which
leads to an assertion failure. Insert a NULL check to prevent that.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/0afba1ce-c946-4131-972d-191d9a1c097c@gmail.com
2026-03-30 09:58:25 -04:00
Melanie Plageman
39dcd10a2c Remove PlannedStmt->resultRelations in favor of resultRelationRelids
PlannedStmt->resultRelations was an integer list of range table indexes
because at the time it was added (to Query), the Bitmapset data type did
not yet exist in Postgres.

0f4c170cf3 added a Bitmapset of result relations, so remove the integer
list of RTIs and use the more compact resultRelationRelids.

Discussion: https://postgr.es/m/CAApHDvqAOeOwCKh9g0gfxWa040%3DHyc7_oA%3DC59rjod8kXJDWyw%40mail.gmail.com
2026-03-30 09:51:28 -04:00
Melanie Plageman
0f4c170cf3 Make it cheap to check if a relation is modified by a query
Save the range table indexes of result relations and row mark relations
in separate bitmapsets in the PlannedStmt. Precomputing them allows
cheap membership checks during execution. Together, these two groups
approximate all relations that will be modified by a query. This
includes relations targeted by INSERT, UPDATE, DELETE, and MERGE as well
as relations with any row mark (like SELECT FOR UPDATE).

Future work will use information on whether or not a relation is
modified by a query in a heuristic.

PlannedStmt->resultRelations is only used in a membership check, so it
will be removed in a separate commit.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
2026-03-30 09:38:03 -04:00
Álvaro Herrera
1bd6f22f43
Have table_insert and siblings use an unsigned type for options
Using signed types can lead to bugs, such as the one fixed by commit
2a2e1b470b.

Discussion: https://postgr.es/m/44e6ze3kuunhky63wmfjxrmn72pds2whwf5ok6hpz7c4my7k2h@l65zhpcuasnf
2026-03-30 13:58:16 +02:00
Peter Eisentraut
c546f008cd headerscheck: Avoid mutual inclusion of pg_config.h and c.h
Headers that c.h includes early should not have another header
included before them in the headerscheck test file, especially not
c.h.

A particular instance of a problem is that pg_config.h defines some
symbols that c.h later undefines in some cases, such as in the code
added by commit cd083b54bd, but there were also some before that.
This only works correctly if pg_config.h is included first.

pg_config_manual.h and pg_config_os.h are closely related to
pg_config.h and should be treated the same way.

postgres_ext.h is meant to be usable standalone, so testing it with
c.h included first defeats the point.

c.h also includes port.h, but this commit leaves that alone, since
port.h does need some of c.h to be processed first.  (But because of
header guards, testing port.h separately is probably ineffective.)

Discussion: https://www.postgresql.org/message-id/flat/579116be-5016-4dbc-aed0-c06f8d9f5bbb%40eisentraut.org
2026-03-30 10:14:41 +02:00
Peter Eisentraut
b36b956404 Make cast functions to type money error safe
This converts the cast functions from types integer, bigint, and
numeric to type money to support soft errors.

Note: Casting from type money to type numeric (the other way, function
cash_numeric) is not yet error safe.

Author: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-30 10:10:56 +02:00
John Naylor
ec5981c381 Remove extraneous PGDLLIMPORT
Oversight from commit 3c6e8c1238. Should be harmless, so no
backpatch.

Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAN4CZFM8jmh4+1rUR2c++JWK9sV85T8_mqmwHMvM0YWkTm4_dQ@mail.gmail.com
2026-03-30 14:39:13 +07:00
Peter Eisentraut
75a5914d00 Fix accidentally casting away const
Recently introduced in commit b15c151398.
2026-03-30 09:24:10 +02:00
Peter Eisentraut
26f9012bee Make cast function from circle to polygon error safe
Previously, the function casting type circle to type polygon could not
be made error safe, because it is an SQL language function.

This refactors it as a C/internal function, by sharing code with the
C/internal function that the SQL function previously wrapped, and soft
error support is added.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-30 09:11:08 +02:00
Fujii Masao
2497dac556 Fix FK triggers losing DEFERRABLE/INITIALLY DEFERRED when marked ENFORCED again
Previously, a foreign key defined as DEFERRABLE INITIALLY DEFERRED could
behave as NOT DEFERRABLE after being set to NOT ENFORCED and then back
to ENFORCED.

This happened because recreating the FK triggers on re-enabling the constraint
forgot to restore the tgdeferrable and tginitdeferred fields in pg_trigger.

Fix this bug by properly setting those fields when the foreign key constraint
is marked ENFORCED again and its triggers are recreated, so the original
DEFERRABLE and INITIALLY DEFERRED properties are preserved.

Backpatch to v18, where NOT ENFORCED foreign keys were introduced.

Author: Yasuo Honda <yasuo.honda@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAKmOUTms2nkxEZDdcrsjq5P3b2L_PR266Hv8kW5pANwmVaRJJQ@mail.gmail.com
Backpatch-through: 18
2026-03-30 14:37:33 +09:00
David Rowley
0d866282b8 Fix datum_image_*()'s inability to detect sign-extension variations
Functions such as hash_numeric() are not careful to use the correct
PG_RETURN_*() macro according to the return type of that function as
defined in pg_proc.  Because that function is meant to return int32,
when the hashed value exceeds 2^31, the 64-bit Datum value won't wrap to
a negative number, which means the Datum won't have the same value as it
would have had it been cast to int32 on a two's complement machine.  This
isn't harmless as both datum_image_eq() and datum_image_hash() may receive
a Datum that's been formed and deformed from a tuple in some cases, and
not in other cases.  When formed into a tuple, the Datum value will be
coerced into an integer according to the attlen as specified by the
TupleDesc.  This can result in two Datums that should be equal being
classed as not equal, which could result in (but not limited to) an error
such as:

ERROR:  could not find memoization table entry

Here we fix this by ensuring we cast the Datum value to a signed integer
according to the typLen specified in the datum_image_eq/datum_image_hash
function call before comparing or hashing.

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/CAHewXNmcXVFdB9_WwA8Ez0P+m_TQy_KzYk5Ri5dvg+fuwjD_yw@mail.gmail.com
2026-03-30 16:14:34 +13:00
Fujii Masao
1a11405a43 psql: Make \d+ inheritance tables list formatting consistent with other objects
This followw up on the previous change (commit 7bff9f106a) for partitions by
applying the same formatting to inheritance tables lists.

Previously, \d+ <table> displayed inheritance tables differently from other
object lists: the first inheritance table appeared on the same line as the
"Inherits" header. For example:

    Inherits: test_like_5,
              test_like_5x

This commit updates the output so that inheritance tables are listed
consistently with other objects, with each entry on its own line starting
below the header:

    Inherits:
        test_like_5
        test_like_5x

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
2026-03-30 11:21:22 +09:00
Fujii Masao
7bff9f106a psql: Make \d+ partition list formatting consistent with other objects
Previously, \d+ <table> displayed partitions differently from other object
lists: the first partition appeared on the same line as the "Partitions"
header. For example:

    Partitions: pt12 FOR VALUES IN (1, 2),
                pt34 FOR VALUES IN (3, 4)

This commit updates the output so that partitions are listed consistently
with other objects, with each entry on its own line starting below the header:

    Partitions:
        pt12 FOR VALUES IN (1, 2)
        pt34 FOR VALUES IN (3, 4)

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
2026-03-30 11:06:42 +09:00
Amit Langote
c57d8178eb Doc: fix stale text about partition locking with cached plans
Commit 121d774cae added text to master describing pruning-aware
locking behavior introduced by 525392d57.  That behavior was
reverted in May 2025, making the text incorrect.  Replace it with
the text used in back branches, which correctly describes current
behavior: pruned partitions are still locked at the beginning of
execution.

Discussion: https://postgr.es/m/CA+HiwqFT0fPPoYBr0iUFWNB-Og7bEXB9hB=6ogk_qD9=OM8Vbw@mail.gmail.com
2026-03-30 10:29:21 +09:00
Amit Langote
1ad7191f7e Add comment explaining fire_triggers=false in ri_PerformCheck()
The reason for passing fire_triggers=false to SPI_execute_snapshot()
in ri_PerformCheck() was not documented, making it unclear why it was
done that way.  Add a comment explaining that it ensures AFTER triggers
on rows modified by the RI action are queued in the outer query's
after-trigger context and fire only after all RI updates on the same
row are complete.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Surya Poondla <suryapoondla4@gmail.com>
Discussion: https://postgr.es/m/20250331212648.ad4ab804559001d7f0788741@sraoss.co.jp
2026-03-30 10:10:17 +09:00
Peter Eisentraut
45cdaf3665 Make geometry cast functions error safe
This adjusts cast functions of the geometry types to support soft
errors.  This requires refactoring of various helper functions to
support error contexts.  Also make the float8 to float4 cast error
safe.  It requires some of the same helper functions.

This is in preparation for a future feature where conversion errors in
casts can be caught.

(The function casting type circle to type polygon is not yet made error
safe, because it is an SQL language function.)

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-29 20:40:50 +02:00
Tom Lane
d4cb9c3776 Doc: document more incompatible pg_restore option pairs.
Most of the pairs of incompatible options (such as --file and --dbname)
are pretty obvious and need no explanation.  But it may not be obvious
that --single-transaction cannot be used together with --create or
multiple jobs, so let's mention that in the documentation.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAExHW5ti5igDwOOde6shgfS7JPtCY9gNrkB3xNr=FuGTYVDSjQ@mail.gmail.com
2026-03-29 14:06:50 -04:00
Tom Lane
e7b809ae75 Doc: clarify introductory description of pg_dumpall.
Add a sentence that describes the parts of a cluster's state that are
*not* included in the output.

Also swap two sentences in the introductory paragraph.  Without that,
it is not clear what the "it" at the beginning of the second sentence
is referring to.  Also add a reference to pg_restore, since not all
output formats are restored with pg_dump.

Also clarify the recently-added text about where different output
formats go, and relocate it above the ancillary text about having
to run as superuser.

Reported-by: Dimitre Radoulov <cichomitiko@gmail.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAGJBphSX2oMPPu=VM4U8NP4+qffFH_483tFQCJ_s-mOcN3DLDw@mail.gmail.com
2026-03-29 13:53:17 -04:00
Andrew Dunstan
01d58d7e3f Fix multiple bugs in astreamer pipeline code.
astreamer_tar_parser_content() sent the wrong data pointer when
forwarding MEMBER_TRAILER padding to the next streamer.  After
astreamer_buffer_until() buffers the padding bytes, the 'data'
pointer has been advanced past them, but the code passed 'data'
instead of bbs_buffer.data.  This caused the downstream consumer
to receive bytes from after the padding rather than the padding
itself, and could read past the end of the input buffer.

astreamer_gzip_decompressor_content() only checked for
Z_STREAM_ERROR from inflate(), silently ignoring Z_DATA_ERROR
(corrupted data) and Z_MEM_ERROR (out of memory).  Fix by
treating any return other than Z_OK, Z_STREAM_END, and
Z_BUF_ERROR as fatal.

astreamer_gzip_decompressor_free() missed calling inflateEnd() to
release zlib's internal decompression state.

astreamer_tar_parser_free() neglected to pfree() the streamer
struct itself, leaking it.

astreamer_extractor_content() did not check the return value of
fclose() when closing an extracted file.  A deferred write error
(e.g., disk full on buffered I/O) would be silently lost.

Discussion: https://postgr.es/m/results/98c6b630-acbb-44a7-97fa-1692ce2b827c@dunslane.net

Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>

Backpatch-through: 15
2026-03-29 09:01:47 -04:00
Álvaro Herrera
0841b219bf
Sort InternalBGWorkers list alphabetically
This simplifies deciding where to add a new one.
2026-03-29 14:15:00 +02:00
Peter Eisentraut
10e4d8aaf4 Make cast functions from jsonb error safe
This adjusts cast functions from jsonb to other types to support soft
errors.  This just involves some refactoring of the underlying helper
functions to use ereturn.

This is in preparation for a future feature where conversion errors in
casts can be caught.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-28 15:44:13 +01:00
Andres Freund
999dec9ec6 aio: Don't wait for already in-progress IO
When a backend attempts to start a read IO and finds the first buffer already
has I/O in progress, previously it waited for that I/O to complete before
initiating reads for any of the subsequent buffers.

Although it must wait for the I/O to finish when acquiring the buffer, there's
no reason for it to wait when setting up the read operation. Waiting at this
point prevents starting I/O on subsequent buffers and can significantly reduce
concurrency.

This matters in two workloads:
1) When multiple backends scan the same relation concurrently.
2) When a single backend requests the same block multiple times within the
   readahead distance.

Waiting each time an in-progress read is encountered effectively degenerates
the access pattern into synchronous I/O.

To fix this, when encountering an already in-progress IO for the head buffer,
the wait reference is now recorded and waiting is deferred until
WaitReadBuffers(), when the buffer actually needs to be acquired.

In rare cases, a backend may still need to wait synchronously at IO
start time: If another backend has set BM_IO_IN_PROGRESS on the buffer
but has not yet set the wait reference. Such windows should be brief and
uncommon.

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw%403p3zu522yykv
2026-03-27 19:53:32 -04:00
Andres Freund
74eafeab1a bufmgr: Improve StartBufferIO interface
Until now StartBufferIO() had a few weaknesses:

- As it did not submit staged IOs, it was not safe to call StartBufferIO()
  where there was a potential for unsubmitted IO, which required
  AsyncReadBuffers() to use a wrapper (ReadBuffersCanStartIO()) around
  StartBufferIO().

- With nowait = true, the boolean return value did not allow to distinguish
  between no IO being necessary and having to wait, which would lead
  ReadBuffersCanStartIO() to unnecessarily submit staged IO.

- Several callers needed to handle both local and shared buffers, requiring
  the caller to differentiate between StartBufferIO() and StartLocalBufferIO()

- In a future commit some callers of StartBufferIO() want the BufferDesc's
  io_wref to be returned, to asynchronously wait for in-progress IO

- Indicating whether to wait with the nowait parameter was somewhat confusing
  compared to a wait parameter

Address these issues as follows:

- StartBufferIO() is renamed to StartSharedBufferIO()

- A new StartBufferIO() is introduced that supports both shared and local
  buffers

- The boolean return value has been replaced with an enum, indicating whether
  the IO is already done, already in progress or that the buffer has been
  readied for IO

- A new PgAioWaitRef * argument allows the caller to get the wait reference is
  desired.  All current callers pass NULL, a user of this will be introduced
  subsequently

- Instead of the nowait argument there now is wait

  This probably would not have been worthwhile on its own, but since all these
  lines needed to be touched anyway...

Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-27 19:08:12 -04:00
Heikki Linnakangas
2407c8db15 Fix RequestNamedLWLockTranche in single-user mode
PostmasterContext is not available in single-user mode, use
TopMemoryContext instead. Also make sure that we use the correct
memory context in the lappend().

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/acb_Eo1XtmCO_9z7@nathan
2026-03-28 01:02:11 +02:00
Andres Freund
1f6f200cab test_aio: Add read_stream test infrastructure & tests
While we have a lot of indirect coverage of read streams, there are corner
cases that are hard to test when only indirectly controlling and observing the
read stream.  This commit adds an SQL callable SRF interface for a read stream
and uses that in a few tests.

To make some of the tests possible, the injection point infrastructure in
test_aio had to be expanded to allow blocking IO completion.

While at it, fix a wrong debug message in inj_io_short_read_hook().

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-27 18:52:43 -04:00
Andres Freund
020c02bd90 test_aio: Add basic tests for StartReadBuffers()
Upcoming commits will change StartReadBuffers() and its building blocks,
making it worthwhile to directly test StartReadBuffers().

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-27 18:52:43 -04:00
Tom Lane
00c025a001 Doc: split functions-posix-regexp section into multiple subsections.
Create a <sect4> section for each function that the previous text
described in one long series of paragraphs.  Also split the functions'
previously in-line syntax summaries into <synopsis> clauses, which is
more readable and allows us to sneak in an explicit mention of the
result data type.

This change gives us an opportunity to make cross-reference links
more specific, too, so do that.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxFuk9P=P4=BZ=qCkgvo6im8aL8NnCkjxx2S2MQDWNdouw@mail.gmail.com
2026-03-27 17:41:08 -04:00
Andres Freund
f39cb8c011 bufmgr: Make UnlockReleaseBuffer() more efficient
Now that the buffer content lock is implemented as part of BufferDesc.state,
releasing the lock and unpinning the buffer can be implemented as a single
atomic operation.

This improves workloads that have heavy contention on a small number of
buffers substantially, I e.g., see a ~20% improvement for pipelined readonly
pgbench on an older two socket machine.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
2026-03-27 15:56:29 -04:00
Andres Freund
8df3c48e46 Use UnlockReleaseBuffer() in more places
An upcoming commit will make UnlockReleaseBuffer() considerably faster and
more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK); ReleaseBuffer();. But
it's a small performance benefit even as-is.

Most of the callsites changed in this patch are not performance sensitive,
however some, like the nbtree ones, are in critical paths.

This patch changes all the easily convertible places over to
UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and
reducing cases where the operations are done separately makes the checking
easier.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
2026-03-27 15:56:29 -04:00
Andres Freund
41d3d64e87 bufmgr: Don't copy pages while writing out
After the series of preceding commits introducing and using
BufferBeginSetHintBits()/BufferSetHintBits16(), hint bits are not set anymore
while IO is going on. Therefore we do not need to copy pages while they are
being written out anymore.

For the same reason XLogSaveBufferForHint() now does not need to operate on a
copy of the page anymore, but can instead use the normal XLogRegisterBuffer()
mechanism. For that the assertions and comments to XLogRegisterBuffer() had to
be updated to allow share-exclusive locked buffers to be registered.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
2026-03-27 15:56:29 -04:00
Tom Lane
79ac82125e pgindent: ensure all C files end with a newline.
Not only is this good style, but it dodges some obscure bugs within
pg_bsd_indent.  We could try to fix said bugs, but the amount of
effort required seems far out of proportion to the benefit.

Reported-by: Akshay Joshi <akshay.joshi@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CANxoLDfca8O5SkeDxB_j6SVNXd+pNKaDmVmEW+2yyicdU8fy0w@mail.gmail.com
2026-03-27 15:38:48 -04:00