63786 commits

Author SHA1 Message Date
Peter Eisentraut
26f9012bee Make cast function from circle to polygon error safe
Previously, the function casting type circle to type polygon could not
be made error safe, because it is an SQL language function.

This refactors it as a C/internal function, by sharing code with the
C/internal function that the SQL function previously wrapped, and soft
error support is added.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-30 09:11:08 +02:00
Fujii Masao
2497dac556 Fix FK triggers losing DEFERRABLE/INITIALLY DEFERRED when marked ENFORCED again
Previously, a foreign key defined as DEFERRABLE INITIALLY DEFERRED could
behave as NOT DEFERRABLE after being set to NOT ENFORCED and then back
to ENFORCED.

This happened because recreating the FK triggers on re-enabling the constraint
forgot to restore the tgdeferrable and tginitdeferred fields in pg_trigger.

Fix this bug by properly setting those fields when the foreign key constraint
is marked ENFORCED again and its triggers are recreated, so the original
DEFERRABLE and INITIALLY DEFERRED properties are preserved.

Backpatch to v18, where NOT ENFORCED foreign keys were introduced.

Author: Yasuo Honda <yasuo.honda@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAKmOUTms2nkxEZDdcrsjq5P3b2L_PR266Hv8kW5pANwmVaRJJQ@mail.gmail.com
Backpatch-through: 18
2026-03-30 14:37:33 +09:00
David Rowley
0d866282b8 Fix datum_image_*()'s inability to detect sign-extension variations
Functions such as hash_numeric() are not careful to use the correct
PG_RETURN_*() macro according to the return type of that function as
defined in pg_proc.  Because that function is meant to return int32,
when the hashed value exceeds 2^31, the 64-bit Datum value won't wrap to
a negative number, which means the Datum won't have the same value as it
would have had it been cast to int32 on a two's complement machine.  This
isn't harmless as both datum_image_eq() and datum_image_hash() may receive
a Datum that's been formed and deformed from a tuple in some cases, and
not in other cases.  When formed into a tuple, the Datum value will be
coerced into an integer according to the attlen as specified by the
TupleDesc.  This can result in two Datums that should be equal being
classed as not equal, which could result in (but not limited to) an error
such as:

ERROR:  could not find memoization table entry

Here we fix this by ensuring we cast the Datum value to a signed integer
according to the typLen specified in the datum_image_eq/datum_image_hash
function call before comparing or hashing.

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/CAHewXNmcXVFdB9_WwA8Ez0P+m_TQy_KzYk5Ri5dvg+fuwjD_yw@mail.gmail.com
2026-03-30 16:14:34 +13:00
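
The sign-extension mismatch described in commit 0d866282b8 can be illustrated with a small standalone sketch. This is not the PostgreSQL implementation: `DemoDatum`, `demo_normalize_datum()` and `demo_datum_image_eq()` are hypothetical names, and the sketch assumes a 64-bit Datum and a two's complement machine. It only shows why casting to a signed integer of the declared type length before comparing makes a zero-extended and a sign-extended image of the same int32 compare equal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit pass-by-value Datum. */
typedef uint64_t DemoDatum;

/*
 * Sign-extend the low typlen bytes of a Datum.  Converting an out-of-range
 * value to a narrower signed type is implementation-defined in C; common
 * compilers wrap modulo 2^N, which is what this sketch assumes.
 */
static DemoDatum
demo_normalize_datum(DemoDatum d, int typlen)
{
    switch (typlen)
    {
        case 1:
            return (DemoDatum) (int64_t) (int8_t) d;
        case 2:
            return (DemoDatum) (int64_t) (int16_t) d;
        case 4:
            return (DemoDatum) (int64_t) (int32_t) d;
        default:
            assert(typlen == 8);
            return d;
    }
}

/* Compare two by-value Datums after normalizing both to the same form. */
static int
demo_datum_image_eq(DemoDatum a, DemoDatum b, int typlen)
{
    return demo_normalize_datum(a, typlen) == demo_normalize_datum(b, typlen);
}
```

With a hash value of 0x80000001, the zero-extended Datum and the sign-extended Datum differ as raw 64-bit values, but `demo_datum_image_eq()` with a type length of 4 treats them as equal.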
Fujii Masao
1a11405a43 psql: Make \d+ inheritance tables list formatting consistent with other objects
This follows up on the previous change (commit 7bff9f106a) for partitions by
applying the same formatting to inheritance table lists.

Previously, \d+ <table> displayed inheritance tables differently from other
object lists: the first inheritance table appeared on the same line as the
"Inherits" header. For example:

    Inherits: test_like_5,
              test_like_5x

This commit updates the output so that inheritance tables are listed
consistently with other objects, with each entry on its own line starting
below the header:

    Inherits:
        test_like_5
        test_like_5x

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
2026-03-30 11:21:22 +09:00
Fujii Masao
7bff9f106a psql: Make \d+ partition list formatting consistent with other objects
Previously, \d+ <table> displayed partitions differently from other object
lists: the first partition appeared on the same line as the "Partitions"
header. For example:

    Partitions: pt12 FOR VALUES IN (1, 2),
                pt34 FOR VALUES IN (3, 4)

This commit updates the output so that partitions are listed consistently
with other objects, with each entry on its own line starting below the header:

    Partitions:
        pt12 FOR VALUES IN (1, 2)
        pt34 FOR VALUES IN (3, 4)

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
2026-03-30 11:06:42 +09:00
Amit Langote
c57d8178eb Doc: fix stale text about partition locking with cached plans
Commit 121d774cae added text to master describing pruning-aware
locking behavior introduced by 525392d57.  That behavior was
reverted in May 2025, making the text incorrect.  Replace it with
the text used in back branches, which correctly describes current
behavior: pruned partitions are still locked at the beginning of
execution.

Discussion: https://postgr.es/m/CA+HiwqFT0fPPoYBr0iUFWNB-Og7bEXB9hB=6ogk_qD9=OM8Vbw@mail.gmail.com
2026-03-30 10:29:21 +09:00
Amit Langote
1ad7191f7e Add comment explaining fire_triggers=false in ri_PerformCheck()
The reason for passing fire_triggers=false to SPI_execute_snapshot()
in ri_PerformCheck() was not documented, making it unclear why it was
done that way.  Add a comment explaining that it ensures AFTER triggers
on rows modified by the RI action are queued in the outer query's
after-trigger context and fire only after all RI updates on the same
row are complete.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Surya Poondla <suryapoondla4@gmail.com>
Discussion: https://postgr.es/m/20250331212648.ad4ab804559001d7f0788741@sraoss.co.jp
2026-03-30 10:10:17 +09:00
Peter Eisentraut
45cdaf3665 Make geometry cast functions error safe
This adjusts cast functions of the geometry types to support soft
errors.  This requires refactoring of various helper functions to
support error contexts.  Also make the float8 to float4 cast error
safe.  It requires some of the same helper functions.

This is in preparation for a future feature where conversion errors in
casts can be caught.

(The function casting type circle to type polygon is not yet made error
safe, because it is an SQL language function.)

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-29 20:40:50 +02:00
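
The "soft error" idea behind commit 45cdaf3665 can be sketched in isolation, using the float8 to float4 cast it mentions. Everything here is an assumption made for illustration: `DemoErrorContext` and `demo_float8_to_float4()` are hypothetical and do not mirror PostgreSQL's actual error-context API. The sketch only checks overflow, and it reports the failure through the caller-supplied context instead of aborting when one is provided.

```c
#include <float.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error sink; a NULL pointer means "fail hard". */
typedef struct
{
    bool        error_occurred;
    const char *message;
} DemoErrorContext;

/*
 * Convert a double to a float.  Returns true on success.  On overflow,
 * either records a soft error in *escontext and returns false, or exits
 * if the caller passed no context.
 */
static bool
demo_float8_to_float4(double in, float *out, DemoErrorContext *escontext)
{
    /* Infinities are passed through; only finite overflow is an error. */
    bool        is_infinite = (in > DBL_MAX || in < -DBL_MAX);

    if (!is_infinite && (in > FLT_MAX || in < -FLT_MAX))
    {
        if (escontext == NULL)
        {
            fprintf(stderr, "ERROR: value out of range: overflow\n");
            exit(EXIT_FAILURE);
        }
        escontext->error_occurred = true;
        escontext->message = "value out of range: overflow";
        return false;
    }

    *out = (float) in;
    return true;
}
```

A caller that wants to catch conversion errors passes a context and inspects `error_occurred` afterwards; a caller that passes NULL keeps the old hard-failure behavior.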
Tom Lane
d4cb9c3776 Doc: document more incompatible pg_restore option pairs.
Most of the pairs of incompatible options (such as --file and --dbname)
are pretty obvious and need no explanation.  But it may not be obvious
that --single-transaction cannot be used together with --create or
multiple jobs, so let's mention that in the documentation.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAExHW5ti5igDwOOde6shgfS7JPtCY9gNrkB3xNr=FuGTYVDSjQ@mail.gmail.com
2026-03-29 14:06:50 -04:00
Tom Lane
e7b809ae75 Doc: clarify introductory description of pg_dumpall.
Add a sentence that describes the parts of a cluster's state that are
*not* included in the output.

Also swap two sentences in the introductory paragraph.  Without that,
it is not clear what the "it" at the beginning of the second sentence
is referring to.  Also add a reference to pg_restore, since not all
output formats are restored with psql.

Also clarify the recently-added text about where different output
formats go, and relocate it above the ancillary text about having
to run as superuser.

Reported-by: Dimitre Radoulov <cichomitiko@gmail.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAGJBphSX2oMPPu=VM4U8NP4+qffFH_483tFQCJ_s-mOcN3DLDw@mail.gmail.com
2026-03-29 13:53:17 -04:00
Andrew Dunstan
01d58d7e3f Fix multiple bugs in astreamer pipeline code.
astreamer_tar_parser_content() sent the wrong data pointer when
forwarding MEMBER_TRAILER padding to the next streamer.  After
astreamer_buffer_until() buffers the padding bytes, the 'data'
pointer has been advanced past them, but the code passed 'data'
instead of bbs_buffer.data.  This caused the downstream consumer
to receive bytes from after the padding rather than the padding
itself, and could read past the end of the input buffer.

astreamer_gzip_decompressor_content() only checked for
Z_STREAM_ERROR from inflate(), silently ignoring Z_DATA_ERROR
(corrupted data) and Z_MEM_ERROR (out of memory).  Fix by
treating any return other than Z_OK, Z_STREAM_END, and
Z_BUF_ERROR as fatal.

astreamer_gzip_decompressor_free() missed calling inflateEnd() to
release zlib's internal decompression state.

astreamer_tar_parser_free() neglected to pfree() the streamer
struct itself, leaking it.

astreamer_extractor_content() did not check the return value of
fclose() when closing an extracted file.  A deferred write error
(e.g., disk full on buffered I/O) would be silently lost.

Discussion: https://postgr.es/m/results/98c6b630-acbb-44a7-97fa-1692ce2b827c@dunslane.net

Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>

Backpatch-through: 15
2026-03-29 09:01:47 -04:00
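
The inflate() return-code fix in commit 01d58d7e3f can be sketched as a pair of predicates. To keep the sketch free of a zlib dependency, the `DEMO_Z_*` constants below are local copies of the corresponding values from zlib.h, and both function names are hypothetical. The old check flags only Z_STREAM_ERROR; the new one treats anything other than Z_OK, Z_STREAM_END and Z_BUF_ERROR as fatal.

```c
#include <stdbool.h>

/* Local mirrors of the zlib.h return codes used in this sketch. */
enum
{
    DEMO_Z_OK = 0,
    DEMO_Z_STREAM_END = 1,
    DEMO_Z_STREAM_ERROR = -2,
    DEMO_Z_DATA_ERROR = -3,
    DEMO_Z_MEM_ERROR = -4,
    DEMO_Z_BUF_ERROR = -5
};

/* Old behavior: only a stream error was treated as fatal. */
static bool
demo_old_inflate_result_is_fatal(int rc)
{
    return rc == DEMO_Z_STREAM_ERROR;
}

/* New behavior: everything except the three benign codes is fatal. */
static bool
demo_inflate_result_is_fatal(int rc)
{
    return !(rc == DEMO_Z_OK ||
             rc == DEMO_Z_STREAM_END ||
             rc == DEMO_Z_BUF_ERROR);
}
```

Expressing the check as an allow-list means that corrupted data and out-of-memory conditions are no longer silently ignored.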
Álvaro Herrera
0841b219bf Sort InternalBGWorkers list alphabetically
This simplifies deciding where to add a new one.
2026-03-29 14:15:00 +02:00
Peter Eisentraut
10e4d8aaf4 Make cast functions from jsonb error safe
This adjusts cast functions from jsonb to other types to support soft
errors.  This just involves some refactoring of the underlying helper
functions to use ereturn.

This is in preparation for a future feature where conversion errors in
casts can be caught.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
2026-03-28 15:44:13 +01:00
Andres Freund
999dec9ec6 aio: Don't wait for already in-progress IO
When a backend attempts to start a read IO and finds the first buffer already
has I/O in progress, previously it waited for that I/O to complete before
initiating reads for any of the subsequent buffers.

Although it must wait for the I/O to finish when acquiring the buffer, there's
no reason for it to wait when setting up the read operation. Waiting at this
point prevents starting I/O on subsequent buffers and can significantly reduce
concurrency.

This matters in two workloads:
1) When multiple backends scan the same relation concurrently.
2) When a single backend requests the same block multiple times within the
   readahead distance.

Waiting each time an in-progress read is encountered effectively degenerates
the access pattern into synchronous I/O.

To fix this, when encountering an already in-progress IO for the head buffer,
the wait reference is now recorded and waiting is deferred until
WaitReadBuffers(), when the buffer actually needs to be acquired.

In rare cases, a backend may still need to wait synchronously at IO
start time: If another backend has set BM_IO_IN_PROGRESS on the buffer
but has not yet set the wait reference. Such windows should be brief and
uncommon.

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw%403p3zu522yykv
2026-03-27 19:53:32 -04:00
Andres Freund
74eafeab1a bufmgr: Improve StartBufferIO interface
Until now StartBufferIO() had a few weaknesses:

- As it did not submit staged IOs, it was not safe to call StartBufferIO()
  where there was a potential for unsubmitted IO, which required
  AsyncReadBuffers() to use a wrapper (ReadBuffersCanStartIO()) around
  StartBufferIO().

- With nowait = true, the boolean return value did not make it possible to
  distinguish between no IO being necessary and having to wait, which would
  lead ReadBuffersCanStartIO() to unnecessarily submit staged IO.

- Several callers needed to handle both local and shared buffers, requiring
  the caller to differentiate between StartBufferIO() and StartLocalBufferIO()

- In a future commit some callers of StartBufferIO() want the BufferDesc's
  io_wref to be returned, to asynchronously wait for in-progress IO

- Indicating whether to wait with the nowait parameter was somewhat confusing
  compared to a wait parameter

Address these issues as follows:

- StartBufferIO() is renamed to StartSharedBufferIO()

- A new StartBufferIO() is introduced that supports both shared and local
  buffers

- The boolean return value has been replaced with an enum, indicating whether
  the IO is already done, already in progress or that the buffer has been
  readied for IO

- A new PgAioWaitRef * argument allows the caller to get the wait reference if
  desired.  All current callers pass NULL; a user of this will be introduced
  subsequently

- Instead of the nowait argument there now is wait

  This probably would not have been worthwhile on its own, but since all these
  lines needed to be touched anyway...

Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-27 19:08:12 -04:00
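
The move from a boolean to a three-way result in commit 74eafeab1a can be sketched as follows. The type and function names (`DemoStartIOResult`, `DemoBuffer`, `demo_start_buffer_io()`) are hypothetical and much simpler than the real buffer manager; the sketch models only the non-waiting case and shows how an enum lets the caller tell "nothing to do" apart from "someone else is already doing the IO".

```c
#include <stdbool.h>

/* Hypothetical three-way result replacing a plain boolean. */
typedef enum
{
    DEMO_IO_ALREADY_DONE,       /* buffer is valid, no IO needed */
    DEMO_IO_IN_PROGRESS,        /* another actor is performing the IO */
    DEMO_IO_READY_FOR_IO        /* caller now owns the IO */
} DemoStartIOResult;

/* Hypothetical, heavily simplified buffer state. */
typedef struct
{
    bool        valid;
    bool        io_in_progress;
} DemoBuffer;

/* Try to start IO on a buffer without waiting. */
static DemoStartIOResult
demo_start_buffer_io(DemoBuffer *buf)
{
    if (buf->valid)
        return DEMO_IO_ALREADY_DONE;
    if (buf->io_in_progress)
        return DEMO_IO_IN_PROGRESS;

    buf->io_in_progress = true;
    return DEMO_IO_READY_FOR_IO;
}
```

With a boolean, the first two outcomes would collapse into a single "false", which is the ambiguity the commit message describes.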
Heikki Linnakangas
2407c8db15 Fix RequestNamedLWLockTranche in single-user mode
PostmasterContext is not available in single-user mode, use
TopMemoryContext instead. Also make sure that we use the correct
memory context in the lappend().

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/acb_Eo1XtmCO_9z7@nathan
2026-03-28 01:02:11 +02:00
Andres Freund
1f6f200cab test_aio: Add read_stream test infrastructure & tests
While we have a lot of indirect coverage of read streams, there are corner
cases that are hard to test when only indirectly controlling and observing the
read stream.  This commit adds an SQL callable SRF interface for a read stream
and uses that in a few tests.

To make some of the tests possible, the injection point infrastructure in
test_aio had to be expanded to allow blocking IO completion.

While at it, fix a wrong debug message in inj_io_short_read_hook().

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-27 18:52:43 -04:00
Andres Freund
020c02bd90 test_aio: Add basic tests for StartReadBuffers()
Upcoming commits will change StartReadBuffers() and its building blocks,
making it worthwhile to directly test StartReadBuffers().

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-27 18:52:43 -04:00
Tom Lane
00c025a001 Doc: split functions-posix-regexp section into multiple subsections.
Create a <sect4> section for each function that the previous text
described in one long series of paragraphs.  Also split the functions'
previously in-line syntax summaries into <synopsis> clauses, which is
more readable and allows us to sneak in an explicit mention of the
result data type.

This change gives us an opportunity to make cross-reference links
more specific, too, so do that.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxFuk9P=P4=BZ=qCkgvo6im8aL8NnCkjxx2S2MQDWNdouw@mail.gmail.com
2026-03-27 17:41:08 -04:00
Andres Freund
f39cb8c011 bufmgr: Make UnlockReleaseBuffer() more efficient
Now that the buffer content lock is implemented as part of BufferDesc.state,
releasing the lock and unpinning the buffer can be implemented as a single
atomic operation.

This substantially improves workloads that have heavy contention on a small
number of buffers; e.g., I see a ~20% improvement for pipelined readonly
pgbench on an older two socket machine.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
2026-03-27 15:56:29 -04:00
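
Why keeping the content lock and the pin count in one state word allows a single atomic operation (commit f39cb8c011) can be sketched with C11 atomics. The bit layout and all `DEMO_`/`demo_` names below are assumptions made for illustration and do not match the real BufferDesc.state layout; waking up waiters and all other bookkeeping are left out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: pin count in the low bits, lock flag at bit 32. */
#define DEMO_REFCOUNT_ONE   UINT64_C(1)
#define DEMO_LOCK_EXCLUSIVE (UINT64_C(1) << 32)

typedef struct
{
    _Atomic uint64_t state;
} DemoBufferDesc;

/* Two-step pattern: clear the lock flag, then drop the pin. */
static uint64_t
demo_unlock_then_release(DemoBufferDesc *buf)
{
    atomic_fetch_and(&buf->state, ~DEMO_LOCK_EXCLUSIVE);
    return atomic_fetch_sub(&buf->state, DEMO_REFCOUNT_ONE) - DEMO_REFCOUNT_ONE;
}

/*
 * One-step pattern: because the lock flag is set and the pin count is at
 * least one, subtracting both fields at once cannot borrow across them.
 * Returns the state word as it is after the operation.
 */
static uint64_t
demo_unlock_release(DemoBufferDesc *buf)
{
    const uint64_t delta = DEMO_LOCK_EXCLUSIVE | DEMO_REFCOUNT_ONE;

    return atomic_fetch_sub(&buf->state, delta) - delta;
}
```

Both functions leave the state word with the same value, but the second touches the contended cache line with one read-modify-write instead of two.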
Andres Freund
8df3c48e46 Use UnlockReleaseBuffer() in more places
An upcoming commit will make UnlockReleaseBuffer() considerably faster and
more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK); ReleaseBuffer();. But
it's a small performance benefit even as-is.

Most of the callsites changed in this patch are not performance sensitive,
however some, like the nbtree ones, are in critical paths.

This patch changes all the easily convertible places over to
UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and
reducing cases where the operations are done separately makes the checking
easier.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
2026-03-27 15:56:29 -04:00
Andres Freund
41d3d64e87 bufmgr: Don't copy pages while writing out
After the series of preceding commits introducing and using
BufferBeginSetHintBits()/BufferSetHintBits16(), hint bits are not set anymore
while IO is going on. Therefore we do not need to copy pages while they are
being written out anymore.

For the same reason XLogSaveBufferForHint() now does not need to operate on a
copy of the page anymore, but can instead use the normal XLogRegisterBuffer()
mechanism. For that the assertions and comments to XLogRegisterBuffer() had to
be updated to allow share-exclusive locked buffers to be registered.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
2026-03-27 15:56:29 -04:00
Tom Lane
79ac82125e pgindent: ensure all C files end with a newline.
Not only is this good style, but it dodges some obscure bugs within
pg_bsd_indent.  We could try to fix said bugs, but the amount of
effort required seems far out of proportion to the benefit.

Reported-by: Akshay Joshi <akshay.joshi@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CANxoLDfca8O5SkeDxB_j6SVNXd+pNKaDmVmEW+2yyicdU8fy0w@mail.gmail.com
2026-03-27 15:38:48 -04:00
Masahiko Sawada
e752a2ccc9 doc: Clarify collation requirements for base32hex sortability.
While fixing the base32hex UUID sortability test in commit
89210037a0, it turned out that the expected lexicographical order is
only maintained under the C collation (or an equivalent byte-wise
collation). Natural language collations may employ different rules,
breaking the sortability.

This commit updates the documentation to explicitly state that
base32hex is "byte-wise sortable", ensuring users do not fall into the
trap of using natural language collations when querying their encoded
data.

Co-Authored-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
2026-03-27 12:13:29 -07:00
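
The "byte-wise sortable" property from commit e752a2ccc9 can be checked with a small sketch. The encoder below follows the RFC 4648 base32hex alphabet but omits '=' padding, which is an assumption of this sketch, and `demo_base32hex_encode()` and `demo_bytewise_cmp()` are hypothetical helpers rather than PostgreSQL functions. Because the alphabet is in ascending ASCII order, comparing equal-length encodings byte by byte (as strcmp() does, matching the C collation) gives the same order as comparing the raw bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char demo_b32hex_alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/*
 * Encode len bytes as base32hex without padding.  The output buffer must
 * hold (len * 8 + 4) / 5 characters plus a terminating NUL.
 */
static void
demo_base32hex_encode(const unsigned char *in, size_t len, char *out)
{
    uint32_t    acc = 0;        /* bit accumulator */
    int         bits = 0;       /* number of pending bits in acc */
    size_t      o = 0;

    for (size_t i = 0; i < len; i++)
    {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 5)
        {
            out[o++] = demo_b32hex_alphabet[(acc >> (bits - 5)) & 0x1F];
            bits -= 5;
        }
    }
    if (bits > 0)               /* left-align the final partial group */
        out[o++] = demo_b32hex_alphabet[(acc << (5 - bits)) & 0x1F];
    out[o] = '\0';
}

/* Byte-wise comparison, i.e. what the C collation does. */
static int
demo_bytewise_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}
```

A natural-language collation is free to order these characters by different rules, which is why the ordering guarantee only holds for byte-wise comparison.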
Nathan Bossart
d7965d65fc Add rudimentary table prioritization to autovacuum.
Autovacuum workers scan pg_class twice to collect the set of tables
to process.  The first pass is for plain relations and materialized
views, and the second is for TOAST tables.  When the worker finds a
table to process, it adds it to the end of a list.  Later on, it
processes the tables in the same order as the list.  This simple
strategy has worked surprisingly well for a long time, but there
have been many discussions over the years about trying to improve
it.

This commit introduces a scoring system that is used to sort the
aforementioned list of tables to process.  The idea is to have
autovacuum workers prioritize tables that are furthest beyond their
thresholds (e.g., a table nearing transaction ID wraparound should
be vacuumed first).  This prioritization scheme is certainly far
from perfect; there are simply too many possibilities for any
scoring technique to work across all workloads, and the situation
might change significantly between the time we calculate the score
and the time that autovacuum processes it.  However, we have
attempted to develop something that is expected to work for a large
portion of workloads with reasonable parameter settings.

The score is calculated as the maximum of the ratios of each of the
table's relevant values to its threshold.  For example, if the
number of inserted tuples is 100, and the insert threshold for the
table is 80, the insert score is 1.25.  If all other scores are
below that value, the table's score will be 1.25.  The other
criteria considered for the score are the table ages (both
relfrozenxid and relminmxid) compared to the corresponding
freeze-max-age setting, the number of updated/deleted tuples
compared to the vacuum threshold, and the number of
inserted/updated/deleted tuples compared to the analyze threshold.

One exception to the previous paragraph is for tables nearing
wraparound, i.e., those that have surpassed the effective failsafe
ages.  In that case, the relfrozenxid/relminmxid-based score is
scaled aggressively so that the table has a decent chance of
sorting to the front of the list.

To adjust how strongly each component contributes to the score, the
following parameters can be adjusted from their default of 1.0 to
anywhere between 0.0 and 10.0 (inclusive).  Setting all of these to
0.0 restores pre-v19 prioritization behavior:

	autovacuum_freeze_score_weight
	autovacuum_multixact_freeze_score_weight
	autovacuum_vacuum_score_weight
	autovacuum_vacuum_insert_score_weight
	autovacuum_analyze_score_weight

This is intended to be a baby step towards smarter autovacuum
workers.  Possible future improvements include, but are not limited
to, periodic reprioritization, automatic cost limit adjustments,
and better observability (e.g., a system view that shows current
scores).  While we do not expect this commit to produce any
earth-shattering improvements, it is arguably a prerequisite for
the aforementioned follow-up changes.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/aOaAuXREwnPZVISO%40nathan
2026-03-27 10:17:05 -05:00
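
The scoring rule described in commit d7965d65fc (the maximum of each value-to-threshold ratio) can be sketched as below. The struct and function names are hypothetical, and two simplifications are assumptions of this sketch: each weight is applied as a plain multiplier on its ratio, and the aggressive scaling for tables past the failsafe age is left out.

```c
#include <stddef.h>

/* Hypothetical description of one score component. */
typedef struct
{
    double      value;          /* e.g. number of inserted tuples */
    double      threshold;      /* e.g. insert threshold for the table */
    double      weight;         /* e.g. a *_score_weight setting */
} DemoScoreComponent;

/* Return the largest weighted ratio across all components. */
static double
demo_table_score(const DemoScoreComponent *components, size_t n)
{
    double      best = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        double      score;

        if (components[i].threshold <= 0.0)
            continue;           /* skip components without a usable threshold */

        score = components[i].weight *
            (components[i].value / components[i].threshold);
        if (score > best)
            best = score;
    }
    return best;
}
```

With the example from the commit message (100 inserted tuples against an insert threshold of 80, default weight 1.0, all other ratios lower), the function returns 1.25. Under the multiplier assumption, a weight of 0.0 removes that component from consideration.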
Peter Eisentraut
9a9998163b Align tests for stored and virtual generated columns
These tests were intended to be aligned with each other, but
additional tests for virtual generated columns disrupted that
alignment.  The test confirming that user-defined types are not
allowed in virtual generated columns has also been moved to the
generated_virtual.sql-specific section.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://www.postgresql.org/message-id/flat/20250808115142.e9ccb81f35466a9a131a4c55@sraoss.co.jp
2026-03-27 15:49:34 +01:00
Peter Eisentraut
6857947db5 pgindent: Always clean up .BAK files from pg_bsd_indent
The previous commit let pgindent clean up File::Temp files on SIGINT.
This extends that to also cleaning up the .BAK files, created by
pg_bsd_indent.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/DFCDD5H4J7VX.3GJKRBBDCKQ86@jeltef.nl
2026-03-27 14:26:43 +01:00
Peter Eisentraut
801de0bd44 pgindent: Clean up temp files created by File::Temp on SIGINT
When pressing Ctrl+C while running pgindent, it would often leave around
files like pgtypedefAXUEEA. This slightly changes SIGINT handling so
those files are cleaned up.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/DFCDD5H4J7VX.3GJKRBBDCKQ86@jeltef.nl
2026-03-27 14:26:43 +01:00
Heikki Linnakangas
3fd0577728 Refactor PredicateLockShmemInit to not reuse var for different things
The PredicateLockShmemInit function is pretty complicated, and one
source of confusion is that it reuses the same local variable for
sizes of things. Replace the different uses with separate variables
for clarity.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/113724ab-0028-493f-9605-6e8570f0939f@iki.fi
2026-03-27 13:24:34 +02:00
Heikki Linnakangas
3c74cb5762 Avoid memory leak on error while parsing pg_stat_statements dump file
By using palloc() instead of raw malloc().

Reported-by: Gaurav Singh <gaurav.singh@yugabyte.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAEcQ1bYR9s4eQLFDjzzJHU8fj-MTbmRpW-9J-r2gsCn+HEsynw@mail.gmail.com
Backpatch-through: 14
2026-03-27 12:25:10 +02:00
Peter Eisentraut
288ae96872 Add a graph pattern variable only once
An element pattern variable may be repeated in the path pattern.
GraphTableParseState maintains a list of all variable names used in
the graph pattern.  Add a new variable name to that list only when it
is not present already.  This isn't a problem right now, but it could
be in the future.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5tR4O0vjeqTCPr2VB5pYjNYbJgbCBEQf63NtU5Pz1MiOQ%40mail.gmail.com
2026-03-27 10:55:17 +01:00
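
The "add only when not already present" rule from commit 288ae96872 amounts to a membership check before appending. The sketch below uses a fixed-size array and hypothetical names (`DemoVarList`, `demo_add_variable_once()`); it is not how GraphTableParseState stores its list.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_VARS 16

/* Hypothetical list of variable names seen in a graph pattern. */
typedef struct
{
    const char *names[DEMO_MAX_VARS];
    int         count;
} DemoVarList;

/* Append name unless it is already listed; returns true if it was added. */
static bool
demo_add_variable_once(DemoVarList *list, const char *name)
{
    for (int i = 0; i < list->count; i++)
    {
        if (strcmp(list->names[i], name) == 0)
            return false;       /* repeated variable: keep a single entry */
    }

    assert(list->count < DEMO_MAX_VARS);
    list->names[list->count++] = name;
    return true;
}
```

A path pattern that repeats the same element variable therefore contributes one list entry instead of two.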
Heikki Linnakangas
98993150c0 Minor comment fixes to yesterday's LWLock tranche refactoring
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA5RZ0sLENRM+BicUjQFs_rP38oPx3gm0SsGrD0-jMhhM+HZ_w@mail.gmail.com
2026-03-27 11:44:10 +02:00
Peter Eisentraut
720f0f89d6 Reject consecutive element patterns of same kind
Adding an implicit empty vertex pattern when a path pattern starts or
ends with an edge pattern or when two consecutive edge patterns appear
in the pattern is not supported right now.  Prohibit such path
patterns.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/72a23702-6d96-4103-a54b-057c2352e885%2540eisentraut.org
2026-03-27 10:31:53 +01:00
Peter Eisentraut
b4a1320224 Enable warning like -Wstrict-prototypes on MSVC as well
This adds an MSVC warning option equivalent to those added in commit
29bf4ee749 for GCC/Clang.

Note that this requires commit bccfc73acd (Disable warnings in system
headers in MSVC).

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/aa73q1aT0A3/vke/%40ip-10-97-1-34.eu-west-3.compute.internal
2026-03-27 08:28:07 +01:00
Robert Haas
874da8b1f6 pg_plan_advice: pgindent
Reported-by: Lukas Fittl <lukas@fittl.com>
2026-03-26 20:10:13 -04:00
Heikki Linnakangas
30d432502b Use ShmemInitStruct to allocate lwlock.c's shared memory
It's nice to have them show up in pg_shmem_allocations like all other
shmem areas. ShmemInitStruct() depends on ShmemIndexLock, but only
after postmaster startup.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
2026-03-26 23:51:41 +02:00
Heikki Linnakangas
06d859aaf4 Move ShmemIndexLock into ShmemAllocator
This makes shmem.c independent of the main LWLock array. That makes it
possible to stop passing MainLWLockArray through BackendParameters in
the next commit.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
2026-03-26 23:51:41 +02:00
Heikki Linnakangas
12e3e0f2c8 Use a separate spinlock to protect LWLockTranches
Previously we reused the shmem allocator's ShmemLock to also protect
lwlock.c's shared memory structures. Introduce a separate spinlock for
lwlock.c for the sake of modularity. Now that lwlock.c has its own
shared memory struct (LWLockTranches), this is easy to do.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
2026-03-26 23:50:59 +02:00
Heikki Linnakangas
d6eba30a24 Refactor how user-defined LWLock tranches are stored in shmem
Merge the LWLockTranches and NamedLWLockTrancheRequest data structures
in shared memory into one array of user-defined tranches. The
NamedLWLockTrancheRequest list is now only used in postmaster, to hold
the requests until shared memory is initialized.

Introduce a C struct, LWLockTranches, to hold all the different fields
kept in shared memory. This gives an easier overview of what are all
the things kept in shared memory. Previously, we had separate pointers
for LWLockTrancheNames, LWLockCounter and the (shared memory copy of)
NamedLWLockTrancheRequestArray.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
2026-03-26 23:47:22 +02:00
Heikki Linnakangas
cc88481aeb Rename MAX_NAMED_TRANCHES to MAX_USER_DEFINED_TRANCHES
The "named tranches" term is a little confusing. In most places it
refers to tranches requested with RequestNamedLWLockTranche(), even
though all built-in tranches and tranches allocated with
LWLockNewTrancheId() also have a name. But in MAX_NAMED_TRANCHES, it
refers to tranches requested with either RequestNamedLWLockTranche()
or LWLockNewTrancheId(), as it's the maximum of all of those in total.

The "user defined" term is already used in
LWTRANCHE_FIRST_USER_DEFINED, so let's standardize on that to mean
tranches allocated with either RequestNamedLWLockTranche() or
LWLockNewTrancheId().

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
2026-03-26 23:46:04 +02:00
Tom Lane
a6d26e0fb2 Doc: declutter CREATE TABLE synopsis.
Factor out the "persistence mode" and storage/compression parts
of the syntax synopsis to reduce line lengths and increase
readability.  Also add an introductory para about the persistence
modes so that the Description section still lines up with the
synopsis.

Author: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwYfMV-2SdrP-umr5SVNSqTn378BUvHsebetp5=DhT494w@mail.gmail.com
2026-03-26 17:27:40 -04:00
Robert Haas
6455e55b0d pg_plan_advice: Invent DO_NOT_SCAN(relation_identifier).
The premise of src/test/modules/test_plan_advice is that if we plan
a query once, generate plan advice, and then replan it using that
same advice, all of that advice should apply cleanly, since the
settings and everything else are the same. Unfortunately, that's
not the case: the test suite is the main regression tests, and
concurrent activity can change the statistics on tables involved
in the query, especially system catalogs. That's OK as long as it
only affects costing, but in a few cases, it affects which relations
appear in the final plan at all.

In the buildfarm failures observed to date, this happens because
we consider alternative subplans for the same portion of the query;
in theory, MinMaxAggPath is vulnerable to a similar hazard. In both
cases, the planner clones an entire subquery, and the clone has a
different plan name, and therefore different range table identifiers,
than the original. If a cost change results in flipping between one
of these plans and the other, the test_plan_advice tests will fail,
because the range table identifiers to which advice was applied won't
even be present in the output of the second planning cycle.

To fix, invent a new DO_NOT_SCAN advice tag. When generating advice,
emit it for relations that should not appear in the final plan at
all, because some alternative version of that relation was used
instead. When DO_NOT_SCAN is supplied, disable all scan methods for
that relation.

To make this work, we reuse a bunch of the machinery that previously
existed for the purpose of ensuring that we build the same set of
relation identifiers during planning as we do from the final
PlannedStmt. In the process, this commit slightly weakens the
cross-check mechanism: before this commit, it would fire whenever
the pg_plan_advice module was loaded, even if pg_plan_advice wasn't
actually doing anything; now, it will only engage when we have some
other reason to create a pgpa_planner_state. The old way was complex
and didn't add much useful test coverage, so this seems like an
acceptable sacrifice.

Discussion: http://postgr.es/m/CA+TgmoYuWmN-00Ec5pY7zAcpSFQUQLbgAdVWGR9kOR-HM-fHrA@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
2026-03-26 17:09:57 -04:00
Robert Haas
26255a3207 Add an alternative_plan_name field to PlannerInfo.
Typically, we have only one PlannerInfo for any given subquery, but
when we are considering a MinMaxAggPath or a hashed subplan, we end
up creating a second PlannerInfo for the same portion of the query,
with a clone of the original range table. In fact, in the MinMaxAggPath
case, we might end up creating several clones, one per aggregate.

At present, there's no easy way for a plugin, such as pg_plan_advice,
to understand the relationships between the original range table and
the copies of it that are created in these cases.  To fix, add an
alternative_plan_name field to PlannerInfo. For a hashed subplan, this
is the plan name for the non-hashed alternative; for minmax aggregates,
this is the plan_name from the parent PlannerInfo; otherwise, it's the
same as plan_name.

Discussion: http://postgr.es/m/CA+TgmoYuWmN-00Ec5pY7zAcpSFQUQLbgAdVWGR9kOR-HM-fHrA@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
2026-03-26 16:45:17 -04:00
Tom Lane
10e2a8ac6a Doc: commit performs rollback of aborted transactions.
The COMMIT command handles an aborted transaction in the same
manner as the ROLLBACK command, but this wasn't explained in
its official reference page.  Also mention that behavior in
the tutorial's material on transactions.

Also add a comment mentioning that we don't raise an exception
for COMMIT within an aborted transaction, as the SQL standard
would have us do.

Hyperlink a couple of cross-references while we're at it.

Author: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKFQuwYgYR3rWt6vFXw=ZWZ__bv7PqvdOnHujG+UyqE11f+3sg@mail.gmail.com
2026-03-26 15:14:27 -04:00
Andres Freund
698ab40469 Address perlcritic complaint in response to 906a046972
2026-03-26 15:03:47 -04:00
Andres Freund
8a1a1d6ab8 bufmgr: Restructure AsyncReadBuffers()
Restructure AsyncReadBuffers() to use early return when the head buffer is
already valid, instead of using a did_start_io flag and if/else branches. Also
move around a bit of the code to be located closer to where it is used. This
is a refactor only.
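
As a rough illustration of the shape of this change (not the actual
AsyncReadBuffers() code), the sketch below contrasts the flag-based and
early-return styles. DemoBuffer and both function names are hypothetical
stand-ins invented for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a buffer; not PostgreSQL's real buffer descriptor. */
typedef struct DemoBuffer
{
    bool valid;       /* are the contents already in memory? */
    int  io_started;  /* number of reads issued, kept only for demonstration */
} DemoBuffer;

/* Flag style: a did_start_io variable carries the result through if/else. */
static bool
start_read_with_flag(DemoBuffer *buf)
{
    bool did_start_io;

    if (buf->valid)
        did_start_io = false;
    else
    {
        buf->io_started++;
        did_start_io = true;
    }
    return did_start_io;
}

/* Early-return style: the already-valid case exits first, un-nesting the I/O path. */
static bool
start_read_early_return(DemoBuffer *buf)
{
    if (buf->valid)
        return false;

    buf->io_started++;
    return true;
}
```

Both functions return the same result for the same input; only the control
flow differs, which is what "refactor only" means here.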

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-26 12:07:05 -04:00
Andres Freund
df09452c32 bufmgr: Make buffer hit helper
Two places already count buffer hits, requiring quite a few lines of
code since we do accounting in so many places. Future commits will add
more locations, so refactor into a helper.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-26 12:07:05 -04:00
Andres Freund
c2a68e08b1 bufmgr: Pass io_object and io_context through to PinBufferForBlock()
PinBufferForBlock() is always_inline and called in a loop in
StartReadBuffersImpl(). Previously it computed io_context and io_object
internally, which required calling IOContextForStrategy() -- a non-inline
function the compiler cannot prove is side-effect-free. This could potentially
cause redundant function calls.

Compute io_context and io_object in the callers instead, allowing
StartReadBuffersImpl() to do so once before entering the loop.
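
The effect can be sketched with a toy example that counts how often the
lookup runs. All names below (demo_context_for_strategy, pin_block_*,
pin_all_*) are hypothetical and only mimic the calling pattern; they are not
the bufmgr functions.

```c
/* Counts calls to the lookup so the difference is observable. */
static int context_lookups = 0;

/* Hypothetical stand-in for a non-inline lookup such as IOContextForStrategy(). */
static int
demo_context_for_strategy(int strategy)
{
    context_lookups++;
    return strategy == 0 ? 0 : 1;
}

/* Helper that derives the context itself on every call. */
static int
pin_block_computing(int strategy, int blockno)
{
    int io_context = demo_context_for_strategy(strategy);

    return io_context * 1000 + blockno;
}

/* Helper that receives the context from its caller. */
static int
pin_block_with_context(int io_context, int blockno)
{
    return io_context * 1000 + blockno;
}

/* Loop where the lookup runs once per block. */
static long
pin_all_before(int strategy, int nblocks)
{
    long sum = 0;

    for (int i = 0; i < nblocks; i++)
        sum += pin_block_computing(strategy, i);
    return sum;
}

/* Loop where the lookup is hoisted and runs once in total. */
static long
pin_all_after(int strategy, int nblocks)
{
    int  io_context = demo_context_for_strategy(strategy);
    long sum = 0;

    for (int i = 0; i < nblocks; i++)
        sum += pin_block_with_context(io_context, i);
    return sum;
}
```

With eight blocks, the first loop performs eight lookups and the second
performs one, while both produce the same result.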

Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
2026-03-26 12:07:05 -04:00
Robert Haas
5dcb15e89a pg_plan_advice: Refactor to invent pgpa_planner_info
pg_plan_advice tracks two pieces of per-PlannerInfo data: (1) for each
RTI, the corresponding relation identifier, for purposes of
cross-checking those calculations against the final plan; and (2) the
set of semijoins seen during planning for which the strategy of making
one side unique was considered. The former is tracked using a hash
table that uses <plan_name, RTI> as the key, and the latter is
tracked using a List of <plan_name, relids>.

It seems better to track both of these things in the same way and
to try to reuse some code instead of having everything be completely
separate, so invent pgpa_planner_info; we'll create one every time we
see a new PlannerInfo and need to associate some data with it, and
we'll use the plan_name field to distinguish between PlannerInfo
objects, as it should always be unique. Then, refactor the two
systems mentioned above to use this new infrastructure.

(Note that the adjustment in pgpa_plan_walker is necessary in order
to avoid spuriously triggering the sanity check in that function,
in the case where a pgpa_planner_info is created for a purpose not
related to sj_unique_rels.)

Discussion: https://postgr.es/m/CA+TgmoaK=4w7-qknUo3QhUJ53pXZq=c=KgZmRyD+k7ytqfmgSg@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
2026-03-26 11:57:33 -04:00
Tom Lane
41d69e6dcc Add labels to help make psql's hidden queries more understandable.
We recommend looking at psql's "-E" output to help understand the
system catalogs, but in some cases (particularly table displays)
there's a bunch of rather impenetrable SQL there.  As a small
improvement, label each query issued by describe.c with a short
description of its purpose.  The code is arranged so that the
labels also appear as SQL comments in the server log, if the
server is logging these commands.

We could expand this policy to every use of PSQLexec(), but most of
the ones outside describe.c are issuing simple commands like "BEGIN"
or "COMMIT", which don't seem to need such glosses.  I did add
labels to the commands issued by \sf, \sv and friends.

Also, make the -E and log output for hidden queries say
"INTERNAL QUERY" not just "QUERY", to distinguish them from
user-written queries.

Author: Greg Sabino Mullane <htamfids@gmail.com>
Co-authored-by: David Christensen <david+pg@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKAnmmJz8Hh=8Ru8jgzySPWmLBhnv4=oc_0KRiz-UORJ0Dex+w@mail.gmail.com
2026-03-26 11:36:52 -04:00