postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-22 10:30:21 -04:00

Author	SHA1	Message	Date
Peter Eisentraut	2f094e7ac6	SQL Property Graph Queries (SQL/PGQ) Implementation of SQL property graph queries, according to SQL/PGQ standard (ISO/IEC 9075-16:2023). This adds: - GRAPH_TABLE table function for graph pattern matching - DDL commands CREATE/ALTER/DROP PROPERTY GRAPH - several new system catalogs and information schema views - psql \dG command - pg_get_propgraphdef() function for pg_dump and psql A property graph is a relation with a new relkind RELKIND_PROPGRAPH. It acts like a view in many ways. It is rewritten to a standard relational query in the rewriter. Access privileges act similar to a security invoker view. (The security definer variant is not currently implemented.) Starting documentation can be found in doc/src/sgml/ddl.sgml and doc/src/sgml/queries.sgml. Author: Peter Eisentraut <peter@eisentraut.org> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com> Reviewed-by: Henson Choi <assam258@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org	2026-03-16 10:14:18 +01:00
Fujii Masao	fd6ecbfa75	Ensure "still waiting on lock" message is logged only once per wait. When log_lock_waits is enabled, the "still waiting on lock" message is normally emitted only once while a session continues waiting. However, if the wait is interrupted, for example by wakeups from client_connection_check_interval, SIGHUP for configuration reloads, or similar events, the message could be emitted again each time the wait resumes. For example, with very small client_connection_check_interval values (e.g., 100 ms), this behavior could flood the logs with repeated messages, making them difficult to use. To prevent this, this commit guards the "still waiting on lock" message so it is reported at most once during a lock wait, even if the wait is interrupted. This preserves the intended behavior when no interrupts occur. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Hüseyin Demir <huseyin.d3r@gmail.com> Discussion: https://postgr.es/m/CAHGQGwHZUmg+r4kMcPYt_Z-txxVX+CJJhfra+qemxKXvAxYbpw@mail.gmail.com	2026-03-16 18:10:57 +09:00
Michael Paquier	c336133c65	Reject ALTER TABLE .. CLUSTER earlier for partitioned tables ALTER TABLE .. CLUSTER ON and SET WITHOUT CLUSTER are not supported for partitioned tables and already fail with a check happening when the sub-command is executed, not when it is prepared. This commit moves the relkind check for partitioned tables to happen when the sub-command is prepared in ATSimplePermissions(). This matches with the practice of the other sub-commands of ALTER TABLE, shaving one translatable string. mark_index_clustered() can be a bit simplified, switching one elog(ERROR) to an assertion. Note that mark_index_clustered() can also be called through a CLUSTER command, but it cannot be reached for a partitioned table, per the assertion based on the relkind in cluster_rel(), and there is only one caller of rebuild_relation(). Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAEoWx2kggo1N2kDH6OSfXHL_5gKg3DqQ0PdNuL4LH4XSTKJ3-g@mail.gmail.com	2026-03-16 17:48:39 +09:00
Fujii Masao	8fe315f18d	Add stats_reset column to pg_statio_all_sequences pg_statio_all_sequences lacked a stats_reset column, unlike the other pg_statio_* views that already expose it. This commit adds the column so users can see when the statistics in this view were last reset. Also this commit updates the documentation for pg_stat_reset_single_table_counters() to clarify that it can reset statistics for sequences and materialized views as well. Catalog version bumped. Author: Sami Imseih <samimseih@gmail.com> Co-authored-by: Shihao Zhong <zhong950419@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0v0OPGyDpwxkX81CtTt9xsj9-TNxhm=8JdOvEKPsVVFNg@mail.gmail.com	2026-03-16 17:24:08 +09:00
Peter Eisentraut	a41bc38439	Fix accidentally casting away const Recently introduced in commit `8c2b30487c`.	2026-03-16 07:37:03 +01:00
Amit Kapila	5f39698c90	Remove obsolete speculative insert cleanup in ReorderBuffer. Commit `4daa140a2f` introduced proper decoding for speculative aborts. As a result, the internal state is guaranteed to be clean when a new speculative insert is encountered. This patch removes the defensive cleanup code that is no longer reachable. Author: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/23256.1772702981@localhost	2026-03-16 10:14:22 +05:30
Michael Paquier	bfa3c4f106	Optimize hash index bulk-deletion with streaming read This commit refactors hashbulkdelete() to use streaming reads, improving the efficiency of the operation by prefetching upcoming buckets while processing a current bucket. There are some specific changes required to make sure that the cleanup work happens in accordance to the data pushed to the stream read callback. When the cached metadata page is refreshed to be able to process the next set of buckets, the stream is reset and the data fed to the stream read callback has to be updated. The reset needs to happen in two code paths, when _hash_getcachedmetap() is called. The author has seen better performance numbers than myself on this one (with tweaks similar to `6c228755ad`). The numbers are good enough for both of us that this change is worth doing, in terms of IO and runtime. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com	2026-03-16 09:22:09 +09:00
Tom Lane	82ff54377e	Move -ffast-math defense to float.c and remove the configure check. We had defenses against -ffast-math in timestamp-related files, which is a pretty obsolete place for them since we've not supported floating-point timestamps in a long time. Remove those and instead put one in float.c, which is still broken by using this switch. Add some commentary to put more color on why it's a bad idea. Also remove the check from configure. That was just there to fail faster, but it doesn't really seem necessary anymore, and besides we have no corresponding check in meson.build. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Suggested-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/abFXfKC8zR0Oclon%40ip-10-97-1-34.eu-west-3.compute.internal	2026-03-15 19:34:52 -04:00
Tom Lane	c675d80d72	Be more careful about int vs. Oid in ecpglib. Print an OID value inserted into a SQL query with %u not %d. The existing code accidentally fails to malfunction when given an OID above 2^31, but only accidentally; future changes to our SQL parser could perhaps break it. Declare the Oid values that ecpg_type_infocache_push() and ecpg_is_type_an_array() work with as "Oid" not "int". This doesn't have any functional effect, but it's clearer. At the moment I don't see a need to back-patch this. Bug: #19429 Author: fairyfar@msn.com Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19429-aead3b1874be1a99@postgresql.org	2026-03-15 18:55:46 -04:00
David Rowley	c456e39113	Optimize tuple deformation This commit includes various optimizations to improve the performance of tuple deformation. We now precalculate CompactAttribute's attcacheoff, which allows us to remove the code from the deform routines which was setting the attcacheoff. Setting the attcacheoff is now handled by TupleDescFinalize(), which must be called before the TupleDesc is used for anything. Having TupleDescFinalize() means we can store the first attribute in the TupleDesc which does not have an offset cached. That allows us to add a dedicated deforming loop to deform all attributes up to the final one with an attcacheoff set, or up to the first NULL attribute, whichever comes first. Here we also improve tuple deformation performance of tuples with NULLs. Previously, if the HEAP_HASNULL bit was set in the tuple's t_infomask, deforming would, one-by-one, check each and every bit in the NULL bitmap to see if it was zero. Now, we process the NULL bitmap 1 byte at a time rather than 1 bit at a time to find the attnum with the first NULL. We can now deform the tuple without checking for NULLs up to just before that attribute. We also record the maximum attribute number which is guaranteed to exist in the tuple, that is, has a NOT NULL constraint and isn't an atthasmissing attribute. When deforming only attributes prior to the guaranteed attnum, we've no need to access the tuple's natt count. As an additional optimization, we only count fixed-width columns when calculating the maximum guaranteed column, as this eliminates the need to emit code to fetch byref types in the deformation loop for guaranteed attributes. Some locations in the code deform tuples that have yet to go through NOT NULL constraint validation. We're unable to perform the guaranteed attribute optimization when that's the case. This optimization is opt-in via the TupleTableSlot using the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS flag. This commit also adds a more efficient way of populating the isnull array by using a bit-wise SWAR trick which performs multiplication on the inverse of the tuple's bitmap byte and masking out all but the lower bit of each of the boolean's byte. This results in much more optimal code when compared to determining the NULLness via att_isnull(). 8 isnull elements are processed at once using this method, which means we need to round the tts_isnull array size up to the next 8 bytes. The palloc code does this anyway, but the round-up needed to be formalized so as not to overwrite the sentinel byte in MEMORY_CONTEXT_CHECKING builds. Doing this also allows the NULL-checking deforming loop to more efficiently check the isnull array, rather than doing the bit-wise processing for each attribute that att_isnull() does. The level of performance improvement from these changes seems to vary depending on the CPU architecture. Apple's M chips seem particularly fond of the changes, with some of the tested deform-heavy queries going over twice as fast as before. With x86-64, the speedups aren't quite as large. With tables containing only a small number of columns, the speedups will be less. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com	2026-03-16 11:46:00 +13:00
David Rowley	503620311e	Add all required calls to TupleDescFinalize() As of this commit all TupleDescs must have TupleDescFinalize() called on them once the TupleDesc is set up and before BlessTupleDesc() is called. In this commit, TupleDescFinalize() does nothing. This change has only been separated out from the commit that properly implements this function to make the change more obvious. Any extension which makes its own TupleDesc will need to be modified to call the new function. The follow-up commit which properly implements TupleDescFinalize() will cause any code which forgets to do this to fail in assert-enabled builds in BlessTupleDesc(). It may still be worth mentioning this change in the release notes so that extension authors update their code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com	2026-03-16 11:45:49 +13:00
Tom Lane	e5a77d876d	Save a few bytes per CatCTup. CatalogCacheCreateEntry() computed the space needed for a CatCTup as sizeof(CatCTup) + MAXIMUM_ALIGNOF. That's not our usual style, and it wastes memory by allocating more padding than necessary. On 64-bit machines sizeof(CatCTup) would be maxaligned already since it contains pointer fields, therefore this code is wasting 8 bytes compared to the more usual MAXALIGN(sizeof(CatCTup)). While at it, we don't really need to do MemoryContextSwitchTo() when we're only allocating one block. Author: ChangAo Chen <cca5507@qq.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/tencent_A42E0544C6184FE940CD8E3B14A3F0A39605@qq.com	2026-03-15 18:05:38 -04:00
Tom Lane	bb53b8d359	Fix small memory leak in get_dbname_oid_list_from_mfile(). Coverity complained that this function leaked the dumpdirpath string, which it did. But we don't need to make a copy at all, because there's not really any point in trimming trailing slashes from the directory name here. If that were needed, the initial file_exists_in_directory() test would have failed, since it doesn't bother with that (and neither does anyplace else in this file). Moreover, if we did want that, reimplementing canonicalize_path() poorly is not the way to proceed. Arguably, all of this code should be reexamined with an eye to using src/port/path.c's facilities, but for today I'll settle for getting rid of the memory leak.	2026-03-15 15:24:04 -04:00
Andrew Dunstan	a793677e57	pg_restore: Remove dead code in restore_all_databases() Cleanup from commit `763aaa06f0`. Author: Mahendra Singh Thalor <mahi6run@gmail.com> Discussion: https://postgr.es/m/CAKYtNAqN49Hqd4v0wWH3uW6d6QsH+8e8bR_MVf4CboTZSzd+Aw@mail.gmail.com	2026-03-15 12:13:19 -04:00
Melanie Plageman	99bf1f8aa6	Save vmbuffer in heap-specific scan descriptors for on-access pruning Future commits will use the visibility map in on-access pruning to fix VM corruption and set the VM if the page is all-visible. Saving the vmbuffer in the scan descriptor reduces the number of times it would need to be pinned and unpinned, making the overhead of doing so negligible. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/C3AB3F5B-626E-4AAA-9529-23E9A20C727F%40gmail.com	2026-03-15 11:09:10 -04:00
Melanie Plageman	8d2c1df4f4	Avoid BufferGetPage() calls in heap_update() BufferGetPage() isn't cheap and heap_update() calls it multiple times when it could just save the page from a single call. Do that. While we are at it, make separate variables for old and new page in heap_xlog_update(). It's confusing to reuse "page" for both pages. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_a%2BhO4PCptyaPR7AMZd7FjcHfOFKKJT8ouU3KedMud0tQ%40mail.gmail.com	2026-03-15 10:42:34 -04:00
Melanie Plageman	a3511443e5	Initialize missing fields in CreateExecutorState() `d47cbf474e` and `cbc127917e` forgot to initialize a few fields they introduced in the EState, so do that now. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com	2026-03-15 10:13:14 -04:00
Peter Eisentraut	cd083b54bd	Make typeof and typeof_unqual fallback definitions work on C++11 These macros were unintentionally using C++14 features. This replaces them with valid C++11 code. Tested locally by compiling with -std=c++11 (which reproduced the original issue). Author: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org	2026-03-15 07:36:27 +01:00
Tom Lane	0123ce131f	Switch the semaphore API on Solaris to unnamed POSIX. Solaris descendants (Illumos, OpenIndiana, OmniOS, etc.) hit System V semaphore limits ("No space left on device" from semget) when running many parallel test scripts under default system settings. We could tell people to raise those settings, but there's a better answer. Unnamed POSIX semaphores have been available on Solaris for decades and work well, so prefer them, as was recently done for AIX. This patch also updates the documentation to remove now-unnecessary advice about raising project.max-sem-ids and project.max-msg-ids. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Burd <greg@burd.me> Discussion: https://postgr.es/m/470305.1772417108@sss.pgh.pa.us	2026-03-14 14:10:32 -04:00
Tom Lane	2eb87345e1	Fix aclitemout() to work during early bootstrap. "initdb -d" has been broken since commit `f95d73ed4`, because I changed aclitemin to work in bootstrap mode but failed to consider aclitemout. That routine isn't reached by default, but it is if the elog message level is high enough, so it needs to work without catalog access too. This patch just makes it use its existing code paths to print role OIDs numerically. We could alternatively invent an inverse of boot_get_role_oid() and print them symbolically, but that would take more code and it's not apparent that it'd be any better for debugging purposes. Reported-by: Greg Burd <greg@burd.me> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/4416.1773328045@sss.pgh.pa.us	2026-03-14 13:46:54 -04:00
Tomas Vondra	02eecead86	Tighten asserts on ParallelWorkerNumber The comment about ParallelWorkerNumbr in parallel.c says: In parallel workers, it will be set to a value >= 0 and < the number of workers before any user code is invoked; each parallel worker will get a different parallel worker number. However asserts in various places collecting instrumentation allowed (ParallelWorkerNumber == num_workers). That would be a bug, as the value is used as index into an array with num_workers entries. Fixed by adjusting the asserts accordingly. Backpatch to all supported versions. Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Backpatch-through: 14	2026-03-14 15:26:39 +01:00
Michael Paquier	ae58189a4d	pgstattuple: Optimize pgstattuple_approx() with streaming read This commit plugs into pgstattuple_approx(), the SQL function faster than pgstattuple() that returns approximate results, the streaming read APIs. A callback is used to be able to skip all-visible pages via VM lookup, to match with the logic prior to this commit. Under test conditions similar to `6c228755ad` (some dm_delay and debug_io_direct=data), this can substantially improve the execution time of the function, particularly for large relations. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com	2026-03-14 15:06:13 +09:00
David Rowley	4deecb52af	Allow sibling call optimization in slot_getsomeattrs_int() This changes the TupleTableSlotOps contract to make it so the getsomeattrs() function is in charge of calling slot_getmissingattrs(). Since this removes all code from slot_getsomeattrs_int() aside from the getsomeattrs() call itself, we may as well adjust slot_getsomeattrs() so that it calls getsomeattrs() directly. We leave slot_getsomeattrs_int() intact as this is still called from the JIT code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com	2026-03-14 13:52:09 +13:00
Peter Geoghegan	8a879119a1	Use fake LSNs to improve nbtree dropPin behavior. Use fake LSNs in all nbtree critical sections that write a WAL record. That way we can safely apply the _bt_killitems LSN trick with logged and unlogged indexes alike. This brings the same benefits to plain scans of unlogged relations that commit `2ed5b87f` brought to plain scans of logged relations: scans will drop their leaf page pin eagerly (by applying the "dropPin" optimization), which avoids blocking progress by VACUUM. This is particularly helpful with applications that allow a scrollable cursor to remain idle for long periods. Preparation for an upcoming commit that will add the amgetbatch interface, and switch nbtree over to it (from amgettuple) to enable I/O prefetching. The index prefetching read stream's effective prefetch distance is adversely affected by any buffer pins held by the index AM. At the same time, it can be useful for prefetching to read dozens of leaf pages ahead of the scan to maintain an adequate prefetch distance. The index prefetching patch avoids this tension by always eagerly dropping index page pins of the kind traditionally held as an interlock against unsafe concurrent TID recycling by VACUUM (essentially the same way that amgetbitmap routines have always avoided holding onto pins). The work from this commit makes that possible during scans of nbtree unlogged indexes -- without our having to give up on setting LP_DEAD bits on index tuples altogether. Follow-up to commit `d774072f`, which moved the fake LSN infrastructure out of GiST so that it could be used by other index AMs. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com	2026-03-13 20:37:39 -04:00
Peter Geoghegan	d774072f00	Move fake LSN infrastructure out of GiST. Move utility functions used by GiST to generate fake LSNs into xlog.c and xloginsert.c, so that other index AMs can also generate fake LSNs. Preparation for an upcoming commit that will add support for fake LSNs to nbtree, allowing its dropPin optimization to be used during scans of unlogged relations. That commit is itself preparation for another upcoming commit that will add a new amgetbatch/btgetbatch interface to enable I/O prefetching. Bump XLOG_PAGE_MAGIC due to XLOG_GIST_ASSIGN_LSN becoming XLOG_ASSIGN_LSN. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com	2026-03-13 19:38:17 -04:00
Jeff Davis	9b860373da	Add error code to user-visible message. Reported-by: Alexander Lakhin <exclusion@gmail.com>	2026-03-13 16:07:54 -07:00
Tomas Vondra	b1f14c9672	Use GetXLogInsertEndRecPtr in gistGetFakeLSN The function used GetXLogInsertRecPtr() to generate the fake LSN. Most of the time this is the same as what XLogInsert() would return, and so it works fine with the XLogFlush() call. But if the last record ends at a page boundary, GetXLogInsertRecPtr() returns LSN pointing after the page header. In such case XLogFlush() fails with errors like this: ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000 Such failures are very hard to trigger, particularly outside aggressive test scenarios. Fixed by introducing GetXLogInsertEndRecPtr(), returning the correct LSN without skipping the header. This is the same as GetXLogInsertRecPtr(), except that it calls XLogBytePosToEndRecPtr(). Initial investigation by me, root cause identified by Andres Freund. This is a long-standing bug in gistGetFakeLSN(), probably introduced by `c6b92041d3` in PG13. Backpatch to all supported versions. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo Backpatch-through: 14	2026-03-13 23:25:24 +01:00
Heikki Linnakangas	311a851436	Free memory allocated for unrecognized_protocol_options Since `4966bd3ed9` Valgrind started to warn about little amount of memory being leaked in ProcessStartupPacket(). This is not critical but the warnings may distract from real issues. Fix it by freeing the list after use. Author: Aleksander Alekseev <aleksander@tigerdata.com> Discussion: https://www.postgresql.org/message-id/CAJ7c6TN3Hbb5p=UHx0SPVN+h_JwPAV6rxoqOm7gHBMFKfnGK-Q@mail.gmail.com	2026-03-13 23:37:19 +02:00
Nathan Bossart	233bbdf031	Add convenience view to stats import test. Presently, many statements in stats_import.sql select all columns from the pg_stats system view. A proposed follow-up commit would add columns to this view (some of which are not stable across test runs), breaking all of these tests. This commit introduces a convenience view for those statements so that future changes are minimally disruptive. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com	2026-03-13 15:04:10 -05:00
Andres Freund	ce5d489166	Fix bug due to confusion about what IsMVCCSnapshot means In `0b96e734c5` I (Andres) relied on page_collect_tuples() being called only with an MVCC snapshot, and added assertions to that end, but did not realize that IsMVCCSnapshot() allows both proper MVCC snapshots and historical snapshots, which behave quite similarly to MVCC snapshots. Unfortunately that can lead to incorrect visibility results during logical decoding, as a historical snapshot is interpreted as a plain MVCC snapshot. The only reason this wasn't noticed earlier is that it's hard to reach as most of the time there are no sequential scans during logical decoding. To fix the bug and avoid issues like this in the future, split IsMVCCSnapshot() into IsMVCCSnapshot() and IsMVCCLikeSnapshot(), where now only the latter includes historic snapshots. One effect of this is that during logical decoding no page-at-a-time snapshots are used, as otherwise runtime branches to handle historic snapshots would be needed in some performance critical paths. Given how uncommon sequential scans are during logical decoding, that seems acceptable. Author: Antonin Houska <ah@cybertec.at> Reported-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/61812.1770637345@localhost	2026-03-13 13:53:19 -04:00
Jacob Champion	b634c4e0e8	libpq-oauth: Fix Makefile dependencies As part of `6225403f2`, I'd removed the override for the `stlib` target, since NAME no longer contains a major version number. But I forgot that its dependencies are declared before Makefile.shlib is included; those dependencies were then omitted entirely. Per buildfarm member indri, which appears to be the only system so far that's bothered by an empty archive.	2026-03-13 10:34:03 -07:00
Jacob Champion	dba3560448	libpq-oauth: Never link against libpq's encoding functions Now that libpq-oauth doesn't have to match the major version of libpq, some things in pg_wchar.h are technically unsafe for us to use. (See `b6c7cfac8` for a fuller discussion.) This is unlikely to be a problem -- we only care about UTF-8 in the context of OAuth right now -- but if anyone did introduce a way to hit it, it'd be extremely difficult to debug or reproduce, and it'd be a potential security vulnerability to boot. Define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries to add a dependency on the exported APIs will simply fail to link the shared module. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com	2026-03-13 09:38:04 -07:00
Jacob Champion	6225403f27	libpq-oauth: Use the PGoauthBearerRequestV2 API Switch the private libpq-oauth ABI to a public one, based on the new PGoauthBearerRequestV2 API. A huge amount of glue code can be removed as part of this, and several code paths can be deduplicated. Additionally, the shared library no longer needs to change its name for every major release; it's now just "libpq-oauth.so". Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com	2026-03-13 09:37:59 -07:00
Nathan Bossart	e0a3a3fd53	Optimize COPY FROM (FORMAT {text,csv}) using SIMD. Presently, such commands scan the input buffer one byte at a time looking for special characters. This commit adds a new path that uses SIMD instructions to skip over chunks of data without any special characters. This can be much faster. To avoid regressions, SIMD processing is disabled for the remainder of the COPY FROM command as soon as we encounter a short line or a special character (except for end-of-line characters, else we'd always disable it after the first line). This is perhaps too conservative, but it could probably be made more lenient in the future via fine-tuned heuristics. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Co-authored-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Ayoub Kazar <ma_kazar@esi.dz> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Neil Conway <neil.conway@gmail.com> Reviewed-by: Greg Burd <greg@burd.me> Tested-by: Manni Wood <manni.wood@enterprisedb.com> Tested-by: Mark Wong <markwkm@gmail.com> Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com	2026-03-13 11:07:32 -05:00
Peter Eisentraut	8c2b30487c	Factor out constructSetOpTargetlist() from transformSetOperationTree() This would be used separately by a future patch. It also makes a little smaller. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org	2026-03-13 16:16:40 +01:00
Heikki Linnakangas	f9de9bf302	Add callback for I/O error messages in SLRUs Historically, all SLRUs were addressed by transaction IDs, but that hasn't been true for a long time. However, the error message on I/O error still always talked about accessing a transaction ID. This commit adds a callback that allows subsystems to construct their own error messages, which can then correctly refer to a transaction ID, multixid or whatever else is used to address the particular SLRU. Author: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://www.postgresql.org/message-id/CACG=ezZZfurhYV+66ceubxQAyWqv9vaUi0yoO4-t48OE5xc0DQ@mail.gmail.com	2026-03-13 16:21:06 +02:00
Fujii Masao	723619eaa3	Add stats_reset column to pg_stat_database_conflicts. This commit adds a stats_reset column to pg_stat_database_conflicts, allowing users to see when the statistics in this view were last reset. This makes the view consistent with pg_stat_database and other statistics views. Catalog version bumped. Author: Shihao Zhong <zhong950419@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAGRkXqS98OebEWjax99_LVAECsxCB8i=BfsdAL34i-5QHfwyOQ@mail.gmail.com	2026-03-13 22:17:14 +09:00
Heikki Linnakangas	2e1dcf8c54	Check for interrupts during non-fast-update GIN insertion ginExtractEntries() can produce a lot of entries for a single item. During index build, we check for interrupts between entries, and the fast-update codepath does it as part of vacuum_delay_point(), but the non-fast update insertion codepath was uninterruptible. Add CHECK_FOR_INTERRUPTS() between entries in the non-fast update codepath too. Author: Vinod Sridharan <vsridh90@gmail.com> Discussion: https://www.postgresql.org/message-id/CAFMdLD6mQvAuStiOGvBJxAEfo6wdjZhj3+JveTLxOX8MVn4zmA@mail.gmail.com	2026-03-13 15:12:32 +02:00
Alexander Korotkov	fa6f2f624c	Rework ginScanToDelete() to pass Buffers instead of BlockNumbers. Previously, ginScanToDelete() and ginDeletePage() passed BlockNumbers and re-read pages that were already pinned and locked during the tree walk. The caller ginVacuumPostingTree()) held a cleanup-locked root buffer, yet ginScanToDelete() re-read it by block number with special-case code to skip re-locking. At first, this commit gives both functions more appropriate names, ginScanPostingTreeToDelete() and ginDeletePostingPage(), indicating they deal with posting trees/pages. This is more descriptive and similar to the way we name other GIN functions, for instance, ginVacuumPostingTree() and ginVacuumPostingTreeLeaves(). Then rework both functions to pass Buffers directly. DataPageDeleteStack now carries buffer, myoff (downlink offset in parent), and isRoot per level, so ginScanPostingTreeToDelete() takes only GinVacuumState and DataPageDeleteStack pointers. Also, ginDeletePostingPage() receives the three Buffers directly, and no longer reads or releases them itself. The caller reads and locks child pages before recursing, and manages buffer lifecycle afterward. This eliminates the confusing isRoot special cases in buffer management, including the apparent (but unreachable) double release of the root buffer identified by Andres Freund. Add comments explaining the locking protocol and the DataPageDeleteStack structure. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/utrlxij43fbguzw4kldte2spc4btoldizutcqyrfakqnbrp3ir@ph3sphpj4asz Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Jinbinge <jinbinge@126.com>	2026-03-13 13:50:13 +02:00
Heikki Linnakangas	f30cebb954	Fix pointer type of ShmemAllocatorData->index This went unnoticed in commit `e2362eb2bd` because the pointer is cast to/from a void pointer.	2026-03-13 11:00:15 +02:00
Peter Eisentraut	59292f7aac	Change copyObject() to use typeof_unqual Currently, when the argument of copyObject() is const-qualified, the return type is also, because the use of typeof carries over all the qualifiers. This is incorrect, since the point of copyObject() is to make a copy to mutate. But apparently no code ran into it. The new implementation uses typeof_unqual, which drops the qualifiers, making this work correctly. typeof_unqual is standardized in C23, but all recent versions of all the usual compilers support it even in non-C23 mode, at least as __typeof_unqual__. We add a configure/meson test for typeof_unqual and __typeof_unqual__ and use it if it's available, else we use the existing fallback of just returning void *. This is the second attempt, after the first attempt in commit `4cfce4e62c` was reverted. The following two points address problems with the earlier version: We test the underscore variant first so that there is a higher chance that clang used for bitcode also supports it, since we don't test that separately. Unlike the typeof test, the typeof_unqual test also tests with a void pointer similar to how copyObject() would use it, because that is not handled by MSVC, so we want the test to fail there. Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org	2026-03-13 07:06:57 +01:00
Andrew Dunstan	a0b6ef29a5	Enable fast default for domains with non-volatile constraints Previously, ALTER TABLE ADD COLUMN always forced a table rewrite when the column type was a domain with constraints (CHECK or NOT NULL), even if the default value satisfied those constraints. This was because contain_volatile_functions() considers CoerceToDomain immutable, so the code conservatively assumed any constrained domain might fail. Improve this by using soft error handling (ErrorSaveContext) to evaluate the CoerceToDomain expression at ALTER TABLE time. If the default value passes the domain's constraints, the value is stored as a "missing" attribute default and no table rewrite is needed. If the constraint check fails, we fall back to a table rewrite, preserving the historical behavior that constraint violations are only raised when the table actually contains rows. Domains with volatile constraint expressions always require a table rewrite since the constraint result could differ per evaluation and cannot be cached. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io> Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com	2026-03-12 18:05:01 -04:00
Andrew Dunstan	487cf2cbd2	Extend DomainHasConstraints() to optionally check constraint volatility Add an optional bool *has_volatile output parameter to DomainHasConstraints(). When non-NULL, the function checks whether any CHECK constraint contains a volatile expression. Callers that don't need this information pass NULL and get the same behavior as before. This is needed by a subsequent commit that enables the fast default optimization for domains with non-volatile constraints: we can safely evaluate such constraints once at ALTER TABLE time, but volatile constraints require a full table rewrite. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io> Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com	2026-03-12 18:04:16 -04:00
Peter Geoghegan	a367c433ad	Use simplehash for backend-private buffer pin refcounts. Replace dynahash with simplehash for the per-backend PrivateRefCountHash overflow table. Simplehash generates inlined, open-addressed lookup code, avoiding the per-call overhead of dynahash that becomes noticeable when many buffers are pinned with a CPU-bound workload. Motivated by testing of the index prefetching patch, which pins many more buffers concurrently than typical index scans. Author: Peter Geoghegan <pg@bowt.ie> Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com	2026-03-12 13:26:16 -04:00
Peter Geoghegan	d071e1cfec	nbtree: Avoid allocating _bt_search stack. Avoid allocating memory for an nbtree descent stack during index scans. We only require a descent stack during inserts, when it is used to determine where to insert a new pivot tuple/downlink into the target leaf page's parent page in the event of a page split. (Page deletion's first phase also performs a _bt_search that requires a descent stack.) This optimization improves performance by minimizing palloc churn. It speeds up index scans that call _bt_search frequently/descend the index many times, especially when the cost of scanning the index dominates (e.g., with index-only skip scans). Testing has shown that the underlying issue causes performance problems for an upcoming patch that will replace btgettuple with a new btgetbatch interface to enable I/O prefetching. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAH2-Wzmy7NMba9k8m_VZ-XNDZJEUQBU8TeLEeL960-rAKb-+tQ@mail.gmail.com	2026-03-12 13:22:36 -04:00
Robert Haas	5883ff30b0	Add pg_plan_advice contrib module. Provide a facility that (1) can be used to stabilize certain plan choices so that the planner cannot reverse course without authorization and (2) can be used by knowledgeable users to insist on plan choices contrary to what the planner believes best. In both cases, terrible outcomes are possible: users should think twice and perhaps three times before constraining the planner's ability to do as it thinks best; nevertheless, there are problems that are much more easily solved with these facilities than without them. This patch takes the approach of analyzing a finished plan to produce textual output, which we call "plan advice", that describes key decisions made during plan; if that plan advice is provided during future planning cycles, it will force those key decisions to be made in the same way. Not all planner decisions can be controlled using advice; for example, decisions about how to perform aggregation are currently out of scope, as is choice of sort order. Plan advice can also be edited by the user, or even written from scratch in simple cases, making it possible to generate outcomes that the planner would not have produced. Partial advice can be provided to control some planner outcomes but not others. Currently, plan advice is focused only on specific outcomes, such as the choice to use a sequential scan for a particular relation, and not on estimates that might contribute to those outcomes, such as a possibly-incorrect selectivity estimate. While it would be useful to users to be able to provide plan advice that affects selectivity estimates or other aspects of costing, that is out of scope for this commit. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Dian Fay <di@nmfay.com> Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com	2026-03-12 13:00:43 -04:00
Michael Paquier	6c228755ad	Use streaming read for VACUUM cleanup of GIN This commit replace the synchronous ReadBufferExtended() loop done in ginvacuumcleanup() with the streaming read equivalent, to improve I/O efficiency during GIN index vacuum cleanup operations. With dm_delay to emulate some latency and debug_io_direct=data to force synchronous writes and force the read path to be exercised, the author has noticed a 5x improvement in runtime, with a substantial reduction in IO stats numbers. I have reproduced similar numbers while running similar tests, with improvements becoming better with more tuples and more pages manipulated. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com	2026-03-12 11:48:31 +09:00
Richard Guo	383eb21ebf	Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. The logic for verifying the non-nullability of the subquery outputs was adapted from prior work by David Rowley and Tom Lane. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com	2026-03-12 09:45:18 +09:00
Andres Freund	6322a028fa	bufmgr: Fix use of wrong variable in GetPrivateRefCountEntrySlow() Unfortunately, in `30df61990c`, I made GetPrivateRefCountEntrySlow() set a wrong cache hint when moving entries from the hash table to the faster array. There are no correctness concerns due to this, just an unnecessary loss of performance. Noticed while testing the index prefetching patch. Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com	2026-03-11 17:52:21 -04:00
Jeff Davis	547c15f9f8	Fix use of volatile. Commit `8185bb5347` misused volatile. Fix it. See also `6307b096e2`. Reported-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1bb21c7d-885f-4f07-a3ed-21b60d7c92c6@eisentraut.org	2026-03-11 14:27:58 -07:00

1 2 3 4 5 ...

47361 commits