postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-07-09 17:51:00 -04:00

Author	SHA1	Message	Date
Richard Guo	5a55ea507a	Consider collation when proving uniqueness from unique indexes relation_has_unique_index_for() has long had an XXX noting that it doesn't check collations when matching a unique index's columns against equality clauses. This was benign as long as all collations in play reduced to the same notion of equality, but has been incorrect since nondeterministic collations were introduced in PG 12: a unique index under a deterministic collation does not prove uniqueness under a nondeterministic collation, nor vice versa. The consequence is wrong query results for any planner optimization that consumes the faulty proof, including inner-unique join execution (which stops the inner search after the first match per outer row), useless-left-join removal, semijoin-to-innerjoin reduction, and self-join elimination. Fix by requiring the index's collation to agree on equality with the clause's input collation. Two collations agree on equality if either is InvalidOid (denoting a non-collation-sensitive operation, which cannot conflict with the other side), if they have the same OID, or if both are deterministic: by definition a deterministic collation treats two strings as equal iff they are byte-wise equal (see CREATE COLLATION), so any two deterministic collations share the same equality relation and the uniqueness proof carries over. Any mismatch involving a nondeterministic collation is rejected. Back-patch to all supported branches; the bug has existed since nondeterministic collations were introduced in PG 12. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com Backpatch-through: 14	2026-05-05 10:22:53 +09:00
Tom Lane	93da297366	Declare load_hosts() as returning HostsFileLoadResult. This function returns some value of enum HostsFileLoadResult, but for reasons lost in the development process was declared to return "int". Fix that, for clarity and so that our typedefs collection tooling sees the typedef as used. Also fix the variable that the sole call assigns into. Move the typedef to the header file that declares load_hosts() to avoid creating header dependency problems. Discussion: https://postgr.es/m/359138.1777922557@sss.pgh.pa.us	2026-05-04 18:33:06 -04:00
Álvaro Herrera	b5f92b8eb4	Fix off-by-one in repack index loop A blunder of mine (Álvaro) in commit `28d534e2ae`. Author: Lakshmi N <lakshmin.jhs@gmail.com> Reviewed-by: Xiaopeng Wang <wxp_728@163.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CA+3i_M9ytFufvD8Tm0rhpfxuC4XrpgQDBHxM7NJQYxv488JW7w@mail.gmail.com	2026-05-04 20:01:19 +02:00
Peter Eisentraut	dc9e7c9ed9	Handle nodes that may appear in GraphPattern expression trees expression_tree_mutator_impl() did not handle T_GraphPattern, T_GraphElementPattern, and T_GraphPropertyRef. The corresponding expression_tree_walker_impl() already handles all three node types. This causes an "unrecognized node type" error whenever a GRAPH_TABLE appeared in an expression tree. While at it, also update raw_expression_tree_walker() and expression_tree_walker() to handle missing nodes that may appear in GraphPattern expression trees. When raw_expression_tree_walker() is called, GraphElementPattern::labelexpr contains ColumnRefs instead of GraphLabelRefs. Hence those are not handled in raw_expression_tree_walker(). Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDc97WFTSkXg%3Dg_ZAH8GnY2gJrvq72cs%2BYjqEAuZgXnkAQ%40mail.gmail.com	2026-05-04 17:34:32 +02:00
Peter Eisentraut	891a57c739	Do not define type for a property graph Even though a property graph is defined in pg_class it does not contain any rows by itself and need not have a type defined. Avoid creating a type for it. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ucu7ZTgYkO6rB_1ShJP3e%3DGAT2T3CP4XWN8rUVEsiJoA%40mail.gmail.com	2026-05-04 15:45:56 +02:00
Peter Eisentraut	b83a94a73b	Add missing serial commas	2026-05-04 11:53:04 +02:00
Amit Kapila	bf3ead6075	Simplify translatable messages for tuple value details in conflict.c. append_tuple_value_detail() constructed user-visible messages using separately translated fragments such as ": ", ", ", and ".",. This makes correct translation difficult or impossible in some languages. Refactor append_tuple_value_detail() to move all punctuation and sentence construction to the callers, which now use a single translatable string with a %s placeholder for the tuple data. Reported-by: David Rowley <dgrowleyml@gmail.com> Author: vignesh C <vignesh21@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/227279.1775956328%40sss.pgh.pa.us#8f3a5f50543556c60cc5a13270cb7ba4 Discussion: https://postgr.es/m/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com	2026-05-04 12:06:41 +05:30
Alexander Korotkov	c06d1a4ba6	Mark modified the FSM buffer as dirty during recovery The XLogRecordPageWithFreeSpace function updates the freespace map (FSM) data while replaying data-level WAL records during the recovery. If the FSM block is updated, it needs to be marked as modified. Currently, this is done with the MarkBufferDirtyHint call (as in all other cases for modifying FSM data). However, in the recovery context, this function will actually do nothing if checksums are enabled. It's assumed that the page should not be dirtied during recovery while modifying hints to protect against torn pages, since no new WAL data can be generated at this point to store FPI. Such logic does not seem fully aligned with the FSM case, as its blocks could be simply zeroed if a checksum mismatch is detected. Currently, changes to an FSM block could be lost if each change to that block occurs infrequently enough to allow it to be evicted from the cache. To persist the change, the modification needs to be performed while the FSM block is still kept in buffers and marked as dirty after receiving its FPI. If the block has already been cleaned, the change won't be persisted, so stored FSM blocks may remain in an obsolete state. If a large number of discrepancies between the data in leaf FSM blocks and the actual data blocks accumulate on the replica server, this could cause significant delays in insert operations after switchover. Such an insert operation may need to visit many data blocks marked as having sufficient space in the FSM, only to discover that the information is incorrect and the FSM records need to be corrected. In a heavily trafficked insert-only table with many concurrent clients performing inserts, this has been observed to cause several-second stalls, causing visible application malfunction. The desire to avoid such cases was the reason behind the commit `ab7dbd681`, which introduced an update of FSM data during the heap_xlog_visible invocation. However, an update to the FSM data on the standby side could be lost due to a missing 'dirty' flag, so there is still a possibility that a large number of FSM records will contain incorrect data. Note that having a zeroed FSM page in such a case (due to a checksum mismatch) is preferable, as a zero value will be interpreted as an indication of full data blocks, and the inserter will be routed to the next FSM block or to the end of the table. Given that FSM is ready to handle torn page writes and XLogRecordPageWithFreeSpace is called only during the recovery, there seems to be no reason to use MarkBufferDirtyHint here instead of a regular MarkBufferDirty call. Discussion: https://postgr.es/m/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2026-05-03 20:23:50 +03:00
Alexander Korotkov	e7cd592174	Wake standby_write/standby_flush waiters from the WAL replay loop The startup process only woke STANDBY_REPLAY waiters after replaying each WAL record. STANDBY_WRITE and STANDBY_FLUSH waiters depended only on walreceiver write/flush callbacks. As a result, replay progress alone did not wake those waiters, and in pure archive recovery (where no walreceiver exists) they could sleep until timeout. Fix by also calling WaitLSNWakeup() for STANDBY_WRITE and STANDBY_FLUSH after each replay. For the replay-floor semantics used by GetCurrentLSNForWaitType(), replay progress is a valid lower bound for both modes: WAL cannot be replayed unless it has already been written and flushed locally. This works together with the replay-position floor in GetCurrentLSNForWaitType(). The getter ensures that a waiter woken by replay can recheck successfully; the replay-side wakeups ensure that a waiter already asleep is notified when replay reaches its target. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2026-05-03 16:22:02 +03:00
Alexander Korotkov	cba67b5b87	Use replay position as floor for WAIT FOR LSN standby_(write\|flush) GetCurrentLSNForWaitType() for standby_write and standby_flush modes returned only the walreceiver position, which may lag behind WAL already present on the standby from a base backup, archive restore, or prior streaming. This could cause unnecessary blocking if the target LSN falls between the walreceiver's tracked position and the replay position. Fix by returning the maximum of the walreceiver position and the replay position. WAL up to the replay point is physically on disk regardless of its origin, so there is no reason to wait for the walreceiver to re-receive it. This complements `29e7dbf5e4`, which seeded writtenUpto to receiveStart in RequestXLogStreaming() to fix the most common hang scenario. The getter-level floor handles the remaining edge cases: targets between receiveStart and the replay position, and standbys running with archive recovery only (no walreceiver). Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2026-05-03 16:22:02 +03:00
Alexander Korotkov	df9f938ca2	Remove redundant WAIT FOR LSN caller-side pre-checks All five wakeup call sites duplicate WaitLSNWakeup()'s internal fast-path minWaitedLSN check and add an unnecessary NULL check on waitLSNState. Remove the inline pre-checks and call WaitLSNWakeup() directly. The fast-path check inside WaitLSNWakeup() already returns early when no waiter's target has been reached, so there is no performance difference. The waitLSNState NULL checks are also unnecessary: shared memory is fully initialized before any backend or auxiliary process starts, so waitLSNState is always non-NULL at these call sites. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/jzq5shdewncpxc35r3s2mcfsmo4bjovkza5mnqf5bdfumhfi3g%40bglckf7dxmw5 Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2026-05-03 16:22:02 +03:00
Alexander Korotkov	a80a593ab6	Fix memory ordering in WAIT FOR LSN wakeup mechanism WAIT FOR LSN uses a Dekker-style handshake: the waker stores an LSN position then reads minWaitedLSN; the waiter stores its target into minWaitedLSN then reads the position. Without a barrier between each side's store and load, a CPU may satisfy the load before the store becomes globally visible, causing either side to miss a concurrent update. The result is a missed wakeup: the waiter sleeps indefinitely until the next unrelated event. Fix by embedding the required barriers into the atomic operations on minWaitedLSN: - In updateMinWaitedLSN(), use pg_atomic_write_membarrier_u64() so the waiter's preceding heap update is visible before the new minWaitedLSN value is published. - In WaitLSNWakeup(), use pg_atomic_read_membarrier_u64() in the fast-path check so the waker's preceding position store is globally visible before minWaitedLSN is read. The waiter side is also covered by the barrier semantics already present in GetCurrentLSNForWaitType(): GetWalRcvWriteRecPtr() uses an explicit read barrier (from patch 0001), while the remaining getters acquire a spinlock, which implies the same ordering. Also call ResetLatch() unconditionally after WaitLatch(), following the standard latch loop pattern. WaitLatch() does not guarantee that all simultaneously true wake conditions are reported in one return, so a timeout can race with SetLatch(). If we skip ResetLatch() on a timeout return, the code performs further asynchronous-state checks before consuming the latch, violating the latch API's required wait/reset pattern. That can leave the latch set across loop exit and cause a later unrelated WaitLatch() in the same backend to return immediately. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2026-05-03 16:22:02 +03:00
Alexander Korotkov	dfb690dd52	Use barrier semantics when reading/writing writtenUpto The walreceiver publishes its write position lock-free via writtenUpto. On weakly-ordered architectures (ARM, PowerPC), both sides of this handshake need explicit barriers so that the lock-less reader sees a consistent state. Use pg_atomic_write_membarrier_u64() at both write sites and pg_atomic_read_membarrier_u64() in GetWalRcvWriteRecPtr(). This matches the barrier semantics that GetWalRcvFlushRecPtr() and other LSN-position functions get implicitly from their spinlock acquire/release, and protects from bugs caused by expectations of similar barrier guarantees from different LSN-position functions. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2026-05-03 16:22:02 +03:00
Andrew Dunstan	b772f3fcad	Only show signal-sender PID/UID detail in server log The errdetail() added in `55890a9194` (and reworked in `3e2a1496ba`) exposed the operating-system PID and UID of whoever sent the termination signal directly to the affected client. Discussion suggested this should not be sent to the client, but only recorded in the server log where the admin can use it for diagnosis. Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/E5CA274C-74BD-4067-8B73-A3AD8C080EFA@gmail.com	2026-05-01 13:20:08 -04:00
Fujii Masao	c0b24b32b0	Avoid blocking indefinitely while finishing walsender shutdown When walsender finishes streaming during shutdown, it sends a CommandComplete message to tell the receiver that WAL streaming is done. Previously, that path used EndCommand() followed by pq_flush(). Those functions can block indefinitely waiting for the socket to become writeable. As a result, even when wal_sender_shutdown_timeout is set, walsender could remain stuck while sending the final completion message, and the shutdown timeout would not be enforced. Fix this by introducing EndCommandExtended(), which allows CommandComplete to be queued with pq_putmessage_noblock(), and by using the walsender nonblocking flush path instead of pq_flush(), so the shutdown timeout continues to be checked while pending output is flushed. Per CI testing on FreeBSD. Reported-by: Andres Freund <andres@anarazel.de> Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/vwlugmsogfn36jhm56zwrgd7m6xe6ircltvfh3kzt6kldvbtht@f45dgow5uhnx	2026-05-01 12:12:44 +09:00
Richard Guo	f76686ce7f	Fix HAVING-to-WHERE pushdown with nondeterministic collations When GROUP BY uses a nondeterministic collation, the planner's optimization of moving HAVING clauses to WHERE can produce incorrect query results. The HAVING clause may apply a stricter collation that distinguishes values the GROUP BY considers equal. Pushing such a clause to WHERE causes it to filter individual rows before grouping, potentially eliminating group members and changing aggregate results. Fix this by detecting collation conflicts before flatten_group_exprs, while the HAVING clause still contains GROUP Vars (Vars referencing RTE_GROUP). At that point, each GROUP Var directly carries the GROUP BY collation as its varcollid, making it straightforward to compare against the operator's inputcollid. A mismatch where the GROUP BY collation is nondeterministic means the clause is unsafe to push down. RowCompareExpr is treated specially, since it carries per-column inputcollids[] rather than a single inputcollid. The conflicting clause indices are recorded in a Bitmapset and consulted during the existing HAVING-to-WHERE loop, so that only affected clauses are kept in HAVING; other safe clauses in the same query are still pushed. Back-patch to v18 only. The fix relies on the RTE_GROUP mechanism introduced in v18 (commit `247dea89f`), which is what lets us identify grouping expressions and their resolved collations via GROUP Vars on pre-flatten havingQual. Pre-v18 branches lack that machinery, so a back-patch there would need a different approach. Given the absence of field reports of this bug on back branches, the risk of carrying a different fix on stable branches is not justified. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://postgr.es/m/CAMbWs48Dn2wW6XM94GZsoyMiH42=KgMo+WcobPKuWvGYnWaPOQ@mail.gmail.com Backpatch-through: 18	2026-05-01 11:13:50 +09:00
Amit Langote	410013d2a5	Use "concurrent delete" in serialization error for TM_Deleted cases In ExecLockRows() and ri_LockPKTuple(), the TM_Deleted code path was using the same "could not serialize access due to concurrent update" message as the TM_Updated path. Use "concurrent delete" instead, since the tuple was deleted, not updated. The ExecLockRows() instance was likely a copy-paste error per Andres; the ri_LockPKTuple() instance was carried over from the same pattern in commit `2da86c1ef9`. Update affected isolation test expected files accordingly and add a new test to fk-concurrent-pk-upd.spec with concurrent delete of the PK row. The ExecLockRows() change is master-only for lack of user complaints and to avoid breaking anything that might match on the error text. Reported-by: Jian He <jian.universality@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CACJufxEG1JTCq4A1gnNAu-bGAq9Xn=Xkf7kC3TRWFz6iuUOuRA@mail.gmail.com	2026-05-01 10:00:29 +09:00
Richard Guo	8d829f5a02	Fix JSON_ARRAY(query) empty set handling and view deparsing According to the SQL/JSON standard, JSON_ARRAY(query) must return an empty JSON array ('[]') when the subquery returns zero rows. Previously, the parser rewrote JSON_ARRAY(query) into a JSON_ARRAYAGG aggregate function. Because this aggregate evaluates to NULL over an empty set without a GROUP BY clause, the constructor erroneously returned NULL. Additionally, this premature rewrite baked physical implementation details into the catalog, preventing ruleutils.c from deparsing the original syntax for views. This patch resolves both issues by introducing a new JSCTOR_JSON_ARRAY_QUERY constructor type. The parser builds the executable form --- a COALESCE-wrapped JSON_ARRAYAGG subquery --- from raw parse nodes via transformExprRecurse, and stores it in the func field. The original transformed Query is kept in a new orig_query field so that ruleutils.c can deparse the original syntax for views. During planning, eval_const_expressions replaces the node with the pre-built func expression. The deparsing issue was reported by Tom Lane. Bump catalog version. Bug: #19418 Reported-by: Lukas Eder <lukas.eder@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/19418-591ba1f29862ef5b@postgresql.org	2026-05-01 09:42:00 +09:00
Álvaro Herrera	6ca631b990	REPACK CONCURRENTLY: fix processing of toasted tuples In order to process tuples inserted or updated while REPACK executes, we write those tuples to disk and later restore them; however, some forms of toasted tuples were not being processed correctly. Fix that. Also expand the tests a bit for better coverage. Author: Satya Narlapuram <satyanarlapuram@gmail.com> Author: Antonin Houska <ah@cybertec.at> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAHg+QDeXb9HM2VGKXQedyCp52GzajJK5KOUdNi6oLjsS0nerQw@mail.gmail.com	2026-04-30 23:32:57 +02:00
Andrew Dunstan	6cf49e804c	Fix attnum remapping in generateClonedExtStatsStmt() When cloning extended statistics via CREATE TABLE ... LIKE ... INCLUDING STATISTICS, stxkeys holds attribute numbers from the source (parent) table, but get_attname() was being called with the child relation's OID. If the parent has dropped columns, the child's attribute numbers are renumbered sequentially and no longer match, so the lookup either returns the wrong column name (silent corruption) or errors out when the attnum does not exist in the child. Fix it by remapping the parent attnum through attmap before the lookup, consistent with how expression statistics are already handled a few lines below. Add a regression test covering both manifestations: a 3-column parent where the stale attnum refers to no child column (cache-lookup error), and a 4-column parent where the stale attnum silently refers to the wrong child column. Author: Julien Tachoires <julmon@gmail.com> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Discussion: https://postgr.es/m/20260415105718.tomuncfbmlt67oel@poseidon.home.virt Backpatch-through: 14	2026-04-30 11:04:57 -04:00
Andrew Dunstan	5642a0367c	Avoid SIGSEGV in pg_get_database_ddl() on NULL tablespace There is a narrow race in which a concurrent ALTER DATABASE ... SET TABLESPACE moves the database off the tablespace and a DROP TABLESPACE removes it between the syscache lookup and the catalog scan. If that happens, output an error. Author: Chao Li <lic@highgo.com> Reviewed-by: Jack Bonatakis <jack@bonatak.is> Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/573E45C1-31A4-4885-A00C-1A2171159A2A@gmail.com	2026-04-30 10:14:52 -04:00
Daniel Gustafsson	75152c5dc5	Fix data_checksum GUC show_hook Commit `f19c0eccae` erroneously omitted the show_hook for the data_checksum GUC. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se	2026-04-30 13:41:57 +02:00
Daniel Gustafsson	1df361e3d8	Improve database detection logic in datachecksumsworker The worker need to know whether a database which failed checksum processing still exists, or has been dropped. This improves the detection logic by checking for being partially dropped. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se	2026-04-30 13:41:55 +02:00
Daniel Gustafsson	bf25e5571b	Improve handling of concurrent checksum requests When pg_{enable\|disable}_data_checksums is called while checksums are being enabled or disabled, the already running launcher is detected and the new desired state is recorded. Processing will then pick up the new state and change its operation to fulfill the new request. If the same state is requested but with different cost values, the new cost values will take effect on the next relation processed. The previous coding had a complex logic of starting a new launcher for this, which is now avoided with the shared mem structure instead used to signal current processing. This makes the logic more robust, and fixes a bug where the launcher would erroneously revert back to the "off" state. Access to the shared memory is also protected with LWLocks in all cases. Since the shmem structure is used for signalling between the worker and the launcher, and there can be only one of each, there were no concurrency issues detected but it's better to stick to proper locking protocol should this ever be updated to handle multiple workers. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se	2026-04-30 13:41:53 +02:00
Daniel Gustafsson	381d19da15	Typo and spelling fixups for online checksums A collection of spelling, wording and punctuation fixups for the code documentation from postcommit review. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se	2026-04-30 13:41:50 +02:00
Daniel Gustafsson	25b922ec58	Fix invalid checksum state transition in checkpoints Commit 78e950cb8 added checksum state handling to all XLOG_CHECKPOINT records which caused unnecessary state transitions and emission of procsignal barriers. Remove as only the _REDO record need to handle checksum state. Barrier emission is also consistently made after controlfile updates to avoid race conditions. Additionally, interrupts are held between calling ProcSignalInit and InitLocalDataChecksumState to remove a window where otherwise invalid state transitions can happen. Also remove a pointless assertion on Controlfile which will never hit. Author: Tomas Vondra <tomas@vondra.me> Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se	2026-04-30 13:41:48 +02:00
Daniel Gustafsson	8fb8ded889	Handle data_checksum state changes during launcher_exit When erroring out from the datachecksums launcher during data checksum enabling, before state has transitioned to "on", we revert back to the "off" state. Since checksums weren't enabled, there is no use staying in an inprogress state since the checksum launcher currently doesn't support restarting from where it left off. Should restartability get added in the future, this would need to be revisited. This state transition was however missing from the allowed transitions in the statemachine causing an error. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se	2026-04-30 13:41:46 +02:00
Daniel Gustafsson	b120358c61	Prevent pg_enable/disable_data_checksums() on standby These functions missed a RecoveryInProgress() check, allowing them to be called on a hot standby. Enabling, or disabling, checksums on the standby only would cause the cluster to get out of sync and replaying checksum transitions to fail. Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAHg+QDfRk4-S7DMmdbXJnQ-xF=sUpMAKuh8b83ObLqYVKx5QLA@mail.gmail.com	2026-04-30 13:41:41 +02:00
Amit Kapila	2bf6c9ff71	Fix double table_close of sequence_rel in copy_sequences(). sequence_rel was declared at batch scope, so when a row is skipped due to concurrent drop or insufficient privileges, the end-of-row cleanup closes the stale pointer from the previous row, tripping the relcache refcount assertion. Move sequence_rel inside the per-row loop. Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/CAJTYsWWOuw-yfmzotV4jCJ6LLxEsb=STLcGtDYXOxRcU9Te3Pw@mail.gmail.com	2026-04-30 16:39:39 +05:30
Michael Paquier	5941e7f092	Fix errno check based on EINTR in pg_flush_data() Upon a failure of sync_file_range(), EINTR was checked based on the returned result of the routine rather than its errno. sync_file_range() returns -1 on failure, making the check a no-op, invalidating the retry attempt in this case. Oversight in `0d369ac650`. Author: DaeMyung Kang <charsyam@gmail.com> Discussion: https://postgr.es/m/20260429151811.1810874-1-charsyam@gmail.com Backpatch-through: 16	2026-04-30 18:44:38 +09:00
Michael Paquier	ac59a90bef	Adjust some incorrect GetDatum() macros This reverts portions of commit `6dcfac9696`, which is wrong in trying to use a GetDatum() that matches with the C types of the values read. GetDatum() should match with the output argument types of the SQL functions. The portions of `6dcfac9696` that are right regarding this rule are: - gistget.c, where the GiST support functions use DatumGetUInt16() to retrieve the strategy number. - The BRIN code for strategynum, used in syscache lookups. The adjustments done in this commit are for pageinspect, pg_buffercache and pg_lock_status(). While double-checking the whole state of the tree regarding non-matching pairs of DatumGet() and *GetDatum(), I have found much more code paths that are incorrect, unrelated to `6dcfac9696`. These may be adjusted in the future, in a different patch (perhaps not for v19, as we are already past feature freeze). Reported-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/97f9375a-be61-4272-a44d-408337fe8fa6@eisentraut.org Discussion: https://postgr.es/m/CAJ7c6TMcGu8qmRe1gZfJ-gOzVnZq-t=fwn-UuyStx1w6ZyydMw@mail.gmail.com	2026-04-30 13:10:19 +09:00
Michael Paquier	4bfd0f1b76	Fix error of pg_stat_reset_shared() "lock" a values is supported since `4019f725f5`, but the error message of the function used when specifying an incorrect value forgot about it. Author: Maksim Logvinenko <logvinenko-ms@yandex.ru> Discussion: https://postgr.es/m/433431777389005@mail.yandex.ru	2026-04-30 11:12:56 +09:00
Peter Geoghegan	748d871b7c	Fix nbtree skip array parallel alloc accounting. btestimateparallelscan neglected to add btps_arrElems[] space overhead for skip array scan keys that were later output by nbtree preprocessing. Skip arrays don't actually need to use this space, but a scan with a subsequent SAOP array will need to subscript btps_arrElems[] using a simple so->arrayKeys[]-wise offset. so->arrayKeys[] has entries for both kinds of arrays. As a result of this oversight, it was possible for an index scan with a skip array and a lower-order SAOP array to write past the allocated shared memory boundary when storing the SAOP array's cur_elem. In practice the problem seems to be limited to scans with many skipped index columns, since our general approach to estimating the amount of shared memory that will be required is fairly conservative. To fix, have btestimateparallelscan request an extra sizeof(int) space for key columns that might require a skip array later on. Oversight in commit `92fe23d9`, which added the nbtree skip scan optimization. Author: Siddharth Kothari <sidkot@google.com> Discussion: https://postgr.es/m/CAGCUe0Lwk3C0qdkBa+OLpYc7yXwW=pbaz8Sju4xMXEQAmyp+5g@mail.gmail.com Backpatch-through: 18	2026-04-29 11:22:23 -04:00
John Naylor	ca9807dfec	Cosmetic fixes for radix sort Do minor comment fixes and remove implicit cast to Datum. While here, let's prefer crashing instead of entering an infinite loop in case of future programming mistakes when computing next_level, suggested by ChangAo Chen. Discussion: https://postgr.es/m/tencent_49E3F11E74D8A584A2144ED532A490CBC40A@qq.com	2026-04-29 16:14:25 +07:00
John Naylor	a0302eac78	Remove unused ByteaSortSupport.abbreviate field Oversight in commit `9303d62c6`. Author: Aleksander Alekseev <aleksander@tigerdata.com> Discussion: https://postgr.es/m/CAJ7c6TOsKmmgyA6EwxKVsNeHFHrWXYdgZivgjo_ujf890BpeeA@mail.gmail.com	2026-04-29 13:57:07 +07:00
Amit Kapila	c210647aeb	Fix xid_advance_interval when max_retention_duration is 0. When a subscription has retain_dead_tuples enabled and maxretention is zero (unlimited), adjust_xid_advance_interval() mistakenly caps xid_advance_interval to zero. This zero interval forces get_candidate_xid() to evaluate TimestampDifferenceExceeds() as always true, causing the apply worker to call GetOldestActiveTransactionId() for every WAL message. This leads to unnecessary ProcArrayLock acquisitions. Fix this by only capping the interval when maxretention > 0, allowing the exponential back-off to function properly. Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Discussion: https://postgr.es/m/CAHg+QDdKVnCLHot=AcoPpEiSyDzGz7wGYjAFHVOw57oDtmUDWQ@mail.gmail.com	2026-04-28 14:51:38 +05:30
Amit Kapila	7424aac088	Fix wrong datum conversion for subretentionactive in CreateSubscription. Use BoolGetDatum() instead of Int32GetDatum() when storing the boolean subretentionactive column in pg_subscription. This was an oversight in `a850be2fe6`. Author: Lakshmi N <lakshmin.jhs@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Discussion: https://postgr.es/m/CA+3i_M98-XjE-_fw0p+8xOnw64y2_YLtJfcwvCfsVMn-z2ZjGg@mail.gmail.com	2026-04-28 13:13:47 +05:30
Álvaro Herrera	832e220d99	REPACK CONCURRENTLY: Don't use deferrable primary keys Similarly to logical replication, REPACK CONCURRENTLY needs to ability to reliably locate a tuple based on an identity. A replica identity index is okay. Primary keys normally also are, except when they are deferrable, because a tuple being modified might not yet be indexed, causing REPACK to fail. Change the REPACK CONCURRENTLY code to use GetRelationIdentityOrPK(), similar to what the logical replication code does. (Though we don't yet support locating tuples based on arbitrary indexes for replica identity FULL.) While at it, add a few more test cases for situations that aren't supported by REPACK, to improve coverage. Author: Chao Li <lic@highgo.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Yuchen Li <liyuchen_xyz@163.com> Discussion: https://postgr.es/m/10DD5E13-B45D-44F1-BE08-C63E00ABCAC0@gmail.com	2026-04-27 18:22:03 +02:00
Peter Eisentraut	33db6c4baf	Fix DELETE/UPDATE FOR PORTION OF with rules Previously, these test cases would give internal errors or crash. The fix is to add some missing fields of ForPortionOfExpr to expression_tree_walker. Author: jian he <jian.universality@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Discussion: https://postgr.es/m/CACJufxHs1Hs00EqsZ4NbuAjmYzMzjJyP1sAj12Ne=cBsEVmQOA@mail.gmail.com	2026-04-27 10:34:06 +02:00
Richard Guo	c66d6d19eb	Fix bogus calls in remove_self_join_rel() remove_self_join_rel() called adjust_relid_set() on all_result_relids and leaf_result_relids but threw away the return value. Since adjust_relid_set() returns a freshly-built Relids and does not modify the input in place, the calls did nothing. This has been the case since the SJE feature went in (commit `fc069a3a6`). There has been no observable misbehavior, because the relid being passed is guaranteed not to be a member of either set. At the point remove_self_join_rel() runs, those sets contain only resultRelation; inheritance children have not been added yet, as that happens later in query_planner(), in expand_single_inheritance_child() called from add_other_rels_to_query(). And remove_self_joins_recurse() rejects parse->resultRelation as an SJE candidate to preserve the EvalPlanQual mechanism. Even with the result assigned, the calls would be no-ops in practice. Rather than make the calls do the cleanup they pretend to do, replace them with assertions of the invariant. Any future loosening of the SJE candidate filter -- for instance to allow eliminating a result relation under provable conditions -- will trip the assertion and force whoever does it to revisit this code. Additionally, decorate adjust_relid_set() with pg_nodiscard so that any future accidental discard of its return value is caught at compile time. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMbWs49fYQcqJfJ_Gtn8r1GFNoYtb1=2AUab4ieuqY4Zid9ocQ@mail.gmail.com	2026-04-27 10:40:37 +09:00
Michael Paquier	b801d5eef1	Fix some memory leaks in the WAL receiver These are old leaks, that can pile up if a WAL receiver stays alive, waiting for new WAL data after the sender has switched to a new timeline. While this is technically a bug, the impact is minimal and would only become noticeable if the WAL sender handles a lot of timeline switches, so no backpatch is done. Note that in most cases, primary_conninfo would be updated in a standby to point to a new sender, meaning a restart of the WAL receiver. Let's be clean on HEAD, though. Author: DaeMyung Kang <charsyam@gmail.com> Discussion: https://postgr.es/m/20260426170100.847923-1-charsyam@gmail.com Discussion: https://postgr.es/m/20260426170219.849330-1-charsyam@gmail.com	2026-04-27 10:32:45 +09:00
Peter Eisentraut	9d2979dd68	pg_get_viewdef() and lateral references in COLUMNS of GRAPH_TABLE Expressions in GRAPH_TABLE COLUMNS list may have lateral references. get_rule_expr() requires lateral namespaces to deparse such references. get_from_clause_item() does not pass them when processing the expressions in COLUMNS list causing ERROR "bogus varlevelsup: 0 offset 0". Fix get_from_clause_item() to pass input deparse_context containing lateral namespaces to get_rule_expr() instead of the dummy context. Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDcLVa2iBnggkHxY4itZbXtDMfsYHEjnCUYe9hNbnxDi-w%40mail.gmail.com	2026-04-24 09:12:03 +02:00
Peter Eisentraut	ac3bcc041c	Fix collation of expressions in GRAPH_TABLE COLUMNS clause GRAPH_TABLE clause is converted into a rangetable entry, which is ignored by assign_query_collations(). Hence we assign collations while transforming its parts. But expressions in COLUMNS clause missed that treatment, so fix that. While at it, also add comments about collation assignment to the parts of GRAPH_TABLE clause, and also fix a small grammar issue. Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAHg+QDc4aaiufYSgrwMMPMMRTPtQ66SghcrPFbWJFZMqNaG+BA@mail.gmail.com	2026-04-24 08:43:26 +02:00
Peter Eisentraut	9082680c34	Fix typos and grammar in graph table rewrite code Reported-by: Lakshmi N <lakshmin.jhs@gmail.com> Author: Lakshmi N <lakshmin.jhs@gmail.com> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CA+3i_M9gpUGjH-BkJk=UFjK16jq9fEQHpmZ1cxpJO+xM4hWC+A@mail.gmail.com	2026-04-24 08:27:04 +02:00
Peter Eisentraut	2ff289d039	Check for stack overflow when rewriting graph queries generate_queries_for_path_pattern_recurse() and generate_setop_from_pathqueries() are recursive functions. For a property graph with hundreds of tables, a graph pattern with a handful element patterns can cause stack overflow. Fix it by calling check_stack_depth() at the beginning of these functions. Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAHg+QDfgK0xddH8f3eAb+UVn7sBDOnv8RvM6OkP4HtHAt6aD7w@mail.gmail.com	2026-04-24 08:18:21 +02:00
David Rowley	94219a73f7	Fix incorrect logic for hashed IN / NOT IN with non-strict operators ExecEvalHashedScalarArrayOp(), when using a strict equality function, performs a short-circuit when looking up NULL values. When the function is non-strict, the code incorrectly looked up the hash table for a zero-valued Datum, which could have resulted in an accidental true return if the hash table contained zero valued Datum, or could result in a crash for non-byval types. Here we fix this by adding an extra step when we build the hash table to check what the result of a NULL lookup would be. This requires looping over the array and checking what the non-hashed version of the code would do. We cache the results of that in the expression so that we can reuse the result any time we're asked to search for a NULL value. It's important to note that non-strict equality functions are free to treat any NULL value as equal to any non-NULL value. For example, someone may wish to design a type that treats an empty string and NULL as equal. All built-in types have strict equality functions, so this could affect custom / user-defined types. Author: Chengpeng Yan <chengpeng_yan@outlook.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: ChangAo Chen <cca5507@qq.com> Discussion: https://postgr.es/m/A16187AE-2359-4265-9F5E-71D015EC2B2D@outlook.com Backpatch-through: 14	2026-04-24 14:03:12 +12:00
Heikki Linnakangas	713bce9484	Don't call CheckAttributeType() with InvalidOid on dropped cols If CheckAttributeType() is called with InvalidOid, it performs a bunch of pointless, futile syscache lookups with InvalidOid, but ultimately tolerates it and has no effect. We were calling it with InvalidOid on dropped columns, but it seems accidental that it works, so let's stop doing it. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi Backpatch-through: 14	2026-04-23 21:28:26 +03:00
Heikki Linnakangas	dd40691976	Don't allow composite type to be member of itself via multirange CheckAttributeType() checks that a composite type is not made a member of itself with ALTER TABLE ADD COLUMN or ALTER TYPE ADD ATTRIBUTE, even indirectly via a domain, array, another composite type or a range type. But it missed checking for multiranges. That was a simple oversight when multiranges were added. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi Backpatch-through: 14	2026-04-23 21:28:11 +03:00
David Rowley	4f0cbc6fb5	Fix new-to-v19 -Wshadow warnings There's some talk about upgrading our current -Wshadow=compatible-local up to -Wshadow. There's some pending questions as to whether the churn and extra backpatching pain are worthwhile for doing all of them. We can't use the latter argument for ones that are new to v19, providing we fix them now. So let's fix those ones so that the problem is not any worse for if we decide to fix the remainder for v20. Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Yuchen Li <liyuchen_xyz@163.com> Discussion: https://postgr.es/m/CAApHDvp=rx5GxM=yW8QhFF3noXtYt7LkOxJ7zkaPOzpti4Gm8w@mail.gmail.com	2026-04-23 16:49:29 +12:00
Jeff Davis	dbf217c1c7	catcache.c: use C_COLLATION_OID for texteqfast/texthashfast. The problem report was about setting GUCs in the startup packet for a physical replication connection. Setting the GUC required an ACL check, which performed a lookup on pg_parameter_acl.parname. The catalog cache was hardwired to use DEFAULT_COLLATION_OID for texteqfast() and texthashfast(), but the database default collation was uninitialized because it's a physical walsender and never connects to a database. In versions 18 and later, this resulted in a NULL pointer dereference, while in version 17 it resulted in an ERROR. As the comments stated, using DEFAULT_COLLATION_OID was arbitrary anyway: if the collation actually mattered, it should have used the column's actual collation. (In the catalog, some text columns are the default collation and some are "C".) Fix by using C_COLLATION_OID, which doesn't require any initialization and is always available. When any deterministic collation will do, it's best to consistently use the simplest and fastest one, so this is a good idea anyway. Another problem was raised in the thread, which this commit doesn't fix (see second discussion link). Reported-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/D18AD72A-5004-4EF8-AF80-10732AF677FA@yandex-team.ru Discussion: https://postgr.es/m/4524ed61a015d3496fc008644dcb999bb31916a7.camel%40j-davis.com Backpatch-through: 17	2026-04-22 10:22:44 -07:00

1 2 3 4 5 ...

28539 commits