Commit graph

63904 commits

Author SHA1 Message Date
Etsuro Fujita
de28140ded postgres_fdw: Inherit the local transaction's access/deferrable modes.
READ ONLY transactions should prevent modifications to foreign data as
well as local data, but a postgres_fdw transaction declared as READ ONLY
that referenced a foreign table mapped to a remote view executing
volatile functions could still modify data on remote servers, because
postgres_fdw opened remote transactions in READ WRITE mode.

Similarly, DEFERRABLE transactions should not abort due to a
serialization failure even when accessing foreign data, but a
postgres_fdw transaction declared as DEFERRABLE could abort due to such
a failure on a remote server, because postgres_fdw opened remote
transactions in NOT DEFERRABLE mode.

To fix, modify postgres_fdw to open remote transactions in the same
access/deferrable modes as the local transaction.  This commit also
modifies it to open remote subtransactions in the same access mode as
the local subtransaction.
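
For illustration, a minimal sketch of how the remote START TRANSACTION
command could now be assembled.  XactReadOnly and XactDeferrable are the
existing globals tracking the local transaction's modes; the function
name is hypothetical and the fixed isolation level is a simplification
(postgres_fdw derives the remote isolation level from the local one):

    static void
    build_remote_begin(StringInfo sql)
    {
        /* Mirror the local transaction's properties on the remote side. */
        appendStringInfoString(sql,
                               "START TRANSACTION ISOLATION LEVEL REPEATABLE READ");
        appendStringInfoString(sql, XactReadOnly ? " READ ONLY" : " READ WRITE");
        if (XactDeferrable)
            appendStringInfoString(sql, " DEFERRABLE");
    }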

This commit changes the behavior of READ ONLY/DEFERRABLE transactions
using postgres_fdw; in particular, it no longer allows READ ONLY
transactions to modify data on remote servers, so such
transactions should be redeclared as READ WRITE or rewritten using other
tools like dblink.  The release notes should mention this as an
incompatibility.

These issues have existed since the introduction of postgres_fdw, but to avoid
the incompatibility in the back branches, fix them in master only.

Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
Discussion: https://postgr.es/m/E1uLe9X-000zsY-2g%40gemulon.postgresql.org
2026-04-05 18:55:00 +09:00
Thomas Munro
fc44f10665 aio: Simplify pgaio_worker_submit().
Merge pgaio_worker_submit_internal() and pgaio_worker_submit().  The
separation didn't serve any purpose.

Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com
2026-04-05 18:07:21 +12:00
Andres Freund
f63ca33790 read_stream: Only increase read-ahead distance when waiting for IO
This avoids increasing the distance to the maximum in cases where the I/O
subsystem is already keeping up. This turns out to be important for
performance for two reasons:

- Pinning a lot of buffers is not cheap. If additional pins allow us to avoid
  IO waits, it's definitely worth it, but if we can already do all the
  necessary readahead at a distance of 16, reading ahead 512 buffers can
  increase the CPU overhead substantially.  This is particularly noticeable
  when the to-be-read blocks are already in the kernel page cache.

- If the read stream is read to completion, reading in data earlier than
  needed is of limited consequence, leaving aside the CPU costs mentioned
  above. But if the read stream will not be fully consumed, e.g. because it is
  on the inner side of a nested loop join, the additional IO can be a serious
  performance issue. This is not commonly a problem for current read stream
  users, but the upcoming work to use a read stream to fetch table pages as
  part of an index scan frequently encounters this.
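
A hypothetical sketch of the adjusted heuristic (field and variable names
are illustrative, not the committed code):

    /* Grow the look-ahead distance only when a read had to wait for IO. */
    if (had_to_wait_for_io)
        stream->distance = Min(stream->distance * 2,
                               stream->max_pinned_buffers);
    /*
     * Otherwise the IO subsystem is keeping up; keep the distance -- and
     * with it the number of pinned buffers -- where it is.
     */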

Note that this commit would have substantial performance downsides without
earlier commits:

- Commit 6e36930f9a, which avoids decreasing the readahead distance when
  there was recent IO, is crucial: without it, this commit would very often
  end up not reading ahead aggressively enough, because the distance is now
  increased less often.

- "read stream: Split decision about look ahead for AIO and combining" is
  important as we would otherwise not perform IO combining when the IO
  subsystem can keep up.

- "aio: io_uring: Trigger async processing for large IOs" is important to
  continue to benefit from memory copy parallelism when using fewer IOs.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
2026-04-05 00:43:54 -04:00
Andres Freund
8ca147d582 read stream: Split decision about look ahead for AIO and combining
In a subsequent commit the read-ahead distance will only be increased when
waiting for IO. Without further work that would cause a regression: As IO
combining and read-ahead are currently controlled by the same mechanism, we
would end up not allowing IO combining when never needing to wait for IO (as
the distance ends up too small to allow for full sized IOs), which can
increase CPU overhead. A typical reason to not have to wait for IO completion
at a low look-ahead distance is use of io_uring with the to-be-read data in
the page cache. But even with io_method=worker the IO submission rate may be
low enough for the workers to keep up.

One might think that we could just always perform IO combining, but doing so
at the start of a scan can cause performance regressions:

1) Performing a large IO commonly has a higher latency than smaller IOs. That
   is not a problem once reading ahead far enough, but at the start of a stream
   it can lead to longer waits for IO completion.

2) Sometimes read streams will not be read to completion. Immediately starting
   with full sized IOs leads to more wasted effort. This is not commonly an
   issue with existing read stream users, but the upcoming use of read streams
   to fetch table pages as part of an index scan frequently encounters this.

Solve this issue by splitting ReadStream->distance into ->combine_distance and
->readahead_distance. Right now they are increased/decreased at the same time,
but that will change in the next commit.
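
An illustrative sketch of the split (the real struct has many more fields;
only the two distances are shown):

    typedef struct ReadStream
    {
        int16       combine_distance;   /* window within which neighboring
                                         * blocks may be merged into one
                                         * larger IO */
        int16       readahead_distance; /* how far ahead to start IOs in
                                         * order to hide IO latency */
    } ReadStream;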

One of the comments in read_stream_should_look_ahead() refers to a motivation
that only really exists as of the next commit, but without that comment the
code wouldn't make sense on its own.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
2026-04-05 00:43:54 -04:00
Andres Freund
434dab76ba read_stream: Move logic about IO combining & issuing to helpers
The long if statements were hard to read and hard to document. Splitting them
into inline helpers makes it much easier to explain each part separately.

This is done in preparation for making the logic more complicated...

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
2026-04-05 00:43:54 -04:00
Andres Freund
a9ee668817 aio: io_uring: Trigger async processing for large IOs
io_method=io_uring has a heuristic to trigger asynchronous processing of IOs
once the IO depth is a bit larger. That heuristic is important when doing
buffered IO from the kernel page cache, to allow parallelizing of the memory
copy, as otherwise io_method=io_uring would be a lot slower than
io_method=worker in that case.

An upcoming commit will make read_stream.c only increase the read-ahead
distance if we needed to wait for IO to complete. If to-be-read data is in the
kernel page cache, io_uring will synchronously execute IO, unless the IO is
flagged as async.  Therefore the aforementioned change to the read_stream.c
heuristic would lead to a substantial performance regression with io_uring
when data is in the page cache, as we would never reach a deep enough queue to
actually trigger the existing heuristic.

Parallelizing the copy from the page cache is mainly important when doing a
lot of IO, which commonly is only possible when doing largely sequential IO.

The reason we don't just mark all io_uring IOs as asynchronous is that the
dispatch to a kernel thread has overhead. This overhead is mostly noticeable
with small random IOs with a low queue depth, as in that case the gain from
parallelizing the memory copy is small and the latency cost high.

The facts from the two prior paragraphs show a way out: Use the size of the IO
in addition to the depth of the queue to trigger asynchronous processing.
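
A hypothetical sketch of that combined trigger (the threshold names and
counters are illustrative; IOSQE_ASYNC is the real io_uring flag for
forcing asynchronous processing):

    /* Punt to a kernel worker if the queue is deep or the IO is large. */
    if (in_flight_count > ASYNC_DEPTH_THRESHOLD ||
        io_length >= ASYNC_SIZE_THRESHOLD)
        sqe->flags |= IOSQE_ASYNC;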

One might think that just using the IO size would be enough, but
experimentation has shown that not to be the case: with deep look-ahead
distances, being able to parallelize the memory copy is important even with
smaller IOs.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
2026-04-05 00:43:54 -04:00
John Naylor
2849fe4c97 Fix unused function warning on Arm platforms
Guard the definition of pg_pmull_available() on compile-time availability
of PMULL. Oversight in fbc57f2bc. In passing, remove the "inline" hint
for consistency.

Reported-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/f153d5a4-a9be-4211-b0b2-7e99b56d68d5@vondra.me
2026-04-05 08:49:47 +07:00
Álvaro Herrera
69c11f0545 Modernize struct declarations in snapbuild.h
Just a cosmetic cleanup.
2026-04-05 00:21:53 +02:00
Álvaro Herrera
33bf7318f9 Make index_concurrently_create_copy more general
Also rename it to index_create_copy.  Add a boolean 'concurrent' option,
and make it work for both cases: in concurrent mode, just create the
catalog entries; caller is responsible for the actual building later.
In non-concurrent mode, the index is built right away.

This allows it to be reused for other purposes -- specifically, for
concurrent REPACK.

(With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes.  Instead, it needs to build a separate set of
indexes, including their system catalog entries, *before* the actual
swap, to reduce the time for which AccessExclusiveLock needs to be held.
This approach is different from what CREATE INDEX CONCURRENTLY does.)

Per a suggestion from Mihail Nikalayeu.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/41104.1754922120@localhost
2026-04-04 20:38:26 +02:00
Peter Geoghegan
2d3490dd99 heapam: Keep buffer pins across index scan resets.
Avoid dropping the heap page pin (xs_cbuf) and visibility map pin
(xs_vmbuffer) within heapam_index_fetch_reset.  Retaining these pins
saves cycles during certain nested loop joins and merge joins that
frequently restore a saved mark: cases where the next tuple fetched
after a reset often falls on the same heap page will now avoid the cost
of repeated pinning and unpinning.

Retaining the scan's heap page buffer pin is preparation for an
upcoming patch that will add I/O prefetching to index scans.  Testing of
that patch (which makes heapam tend to pin more buffers concurrently
than was typical before now) shows that the aforementioned cases get a
small but clearly measurable benefit from this optimization.

Upcoming work to add a slot-based table AM interface for index scans
(which is further preparation for prefetching) will move VM checks for
index-only scans out of the executor and into heapam.  That will expand
the role of xs_vmbuffer to include VM lookups for index-only scans (the
field won't just be used for setting pages all-visible during on-access
pruning via the enhancement recently introduced by commit b46e1e54).
Retaining the xs_vmbuffer pin will preserve the historical
behavior of nodeIndexonlyscan.c, which always kept this pin on a rescan;
that aspect of this commit isn't really new.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
2026-04-04 13:49:37 -04:00
Heikki Linnakangas
fda5300132 Remove unnecessary #include "spin.h" from shmem.h
Commit 6b8238cb6a removed the last usage of slock_t from the
file.  proc.c was relying on the indirect #include, so add it to proc.c
directly.
2026-04-04 20:22:04 +03:00
Peter Geoghegan
c7d09595e4 heapam: Track heap block in IndexFetchHeapData.
Add an explicit BlockNumber field (xs_blk) to IndexFetchHeapData that
tracks which heap block is currently pinned in xs_cbuf.

heapam_index_fetch_tuple now uses xs_blk to determine when buffer
switching is needed, replacing the previous approach that compared
buffer identities via ReleaseAndReadBuffer on every non-HOT-chain call.

This is preparatory work for an upcoming commit that will add index
prefetching using a read stream.  Delegating the release of a currently
pinned buffer to ReleaseAndReadBuffer won't work anymore -- at least not
when the next buffer that the scan needs to pin is one returned by
read_stream_next_buffer (not a buffer returned by ReadBuffer).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
2026-04-04 11:45:33 -04:00
Peter Geoghegan
a29fdd6c8d Move heapam_handler.c index scan code to new file.
Move the heapam index fetch callbacks (index_fetch_begin,
index_fetch_reset, index_fetch_end, and index_fetch_tuple) into a new
dedicated file.  Also move heap_hot_search_buffer over.  This is a
purely mechanical move with no functional impact.

Upcoming work to add a slot-based table AM interface for index scans
will substantially expand this code.  Keeping it in heapam_handler.c
would clutter a file whose primary role is to wire up the TableAmRoutine
callbacks.  Bitmap heap scans and sequential scans would benefit from
similar separation in the future.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh
2026-04-04 11:30:41 -04:00
Peter Geoghegan
1adff1a0c5 Rename heapam_index_fetch_tuple argument for clarity.
Rename heapam_index_fetch_tuple's call_again argument to heap_continue,
for consistency with the pointed-to variable name (IndexScanDescData's
xs_heap_continue field).

Preparation for an upcoming commit that will move index scan related
heapam functions into their own file.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh
2026-04-04 11:30:05 -04:00
John Naylor
519acd1be5 Fix indentation
Per buildfarm member koel
2026-04-04 21:50:54 +07:00
John Naylor
fbc57f2bc2 Compute CRC32C on ARM using the Crypto Extension where available
In a similar vein to commit 3c6e8c123, the ARMv8 cryptography extension
has 64x64 -> 128-bit carryless multiplication instructions suitable
for computing CRC. This was tested to be around twice as fast as
scalar CRC instructions for longer inputs.

We now do a runtime check, even for builds that target "armv8-a+crc",
but those builds can still use a direct call for constant inputs,
which we assume are short.
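
An illustrative sketch of the resulting dispatch (pg_pmull_available() is
the runtime check; the pg_comp_crc32c_pmull() name and the length
threshold are simplifications, and pg_comp_crc32c_armv8() is the existing
scalar-instruction path):

    pg_crc32c
    pg_comp_crc32c_arm(pg_crc32c crc, const void *data, size_t len)
    {
        /* Carryless multiplication only pays off on longer inputs. */
        if (len >= PMULL_THRESHOLD && pg_pmull_available())
            return pg_comp_crc32c_pmull(crc, data, len);
        return pg_comp_crc32c_armv8(crc, data, len);
    }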

As for x86, the MIT-licensed implementation was generated with the
"generate" program from

https://github.com/corsix/fast-crc32/

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CANWCAZaKhE+RD5KKouUFoxx1EbUNrNhcduM1VQ=DkSDadNEFng@mail.gmail.com
2026-04-04 20:47:01 +07:00
John Naylor
5e13b0f240 Use AVX2 for calculating page checksums where available
We already rely on autovectorization for computing page checksums,
but on x86 we can get a further several-fold performance increase by
annotating pg_checksum_block() with a function target attribute for
the AVX2 instruction set extension. Not only does that use 256-bit
registers, it can also use vector multiplication rather than the
vector shifts and adds used in SSE2.

Similar to other hardware-specific paths, we set a function pointer
on first use. We don't bother to avoid this on platforms without AVX2
since the overhead of indirect calls doesn't matter for multi-kilobyte
inputs. However, we do arrange things so that only core has the function
pointer mechanism. External programs will continue to build a normal
static function and don't need to be aware of this.
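
A sketch of the two pieces, the target attribute and the first-use
dispatch (cpu_has_avx2() and the AVX2 variant's name are hypothetical;
pg_checksum_block() is the existing shared loop):

    typedef uint32 (*checksum_block_fn) (const PGChecksummablePage *page);

    static uint32 checksum_block_choose(const PGChecksummablePage *page);
    static checksum_block_fn checksum_block = checksum_block_choose;

    /*
     * The shared loop is a static inline; compiling this wrapper with the
     * AVX2 target attribute lets the compiler autovectorize the inlined
     * body using 256-bit registers.
     */
    __attribute__((target("avx2")))
    static uint32
    pg_checksum_block_avx2(const PGChecksummablePage *page)
    {
        return pg_checksum_block(page);
    }

    static uint32
    checksum_block_choose(const PGChecksummablePage *page)
    {
        /* Pick an implementation on first use, then call through it. */
        checksum_block = cpu_has_avx2() ? pg_checksum_block_avx2
                                        : pg_checksum_block;
        return checksum_block(page);
    }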

This matters most when using io_uring since in that case the checksum
computation is not done in parallel by IO workers.

Co-authored-by: Matthew Sterrett <matthewsterrett2@gmail.com>
Co-authored-by: Andrew Kim <andrew.kim@intel.com>
Reviewed-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Tested-by: Ants Aasma <ants.aasma@cybertec.at>
Tested-by: Stepan Neretin <slpmcf@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA+vA85_5GTu+HHniSbvvP+8k3=xZO=WE84NPwiKyxztqvpfZ3Q@mail.gmail.com
Discussion: https://postgr.es/m/20250911054220.3784-1-root%40ip-172-31-36-228.ec2.internal
2026-04-04 18:07:15 +07:00
Heikki Linnakangas
c06443063f Add missing shmem size estimate for fast-path locking struct
It's been missing ever since fast-path locking was introduced. It's a
small discrepancy, about 4 kB, but let's be tidy. This doesn't seem
worth backpatching, however; in stable branches we were less precise
about the estimates and e.g. added a 10% margin to the hash table
estimates, which is usually much bigger than this discrepancy.
2026-04-04 11:46:11 +03:00
Thomas Munro
bab656bb87 More tar portability adjustments.
For the three implementations that have caused problems so far:

* GNU and BSD (libarchive) tar both understand --format=ustar
* ustar doesn't support large UID/GID values, so set them to 0 to
  avoid a hard error from at least GNU tar
* OpenBSD tar needs -F ustar, and it appears to warn but carry
  on with "nobody" if a UID is too large
* -f /dev/null is a more portable way to throw away the output, since
  the default destination might be a tape device depending on build
  options that a distribution might change
* Windows ships BSD tar but lacks /dev/null, so ask perl for its name

Based on their manuals, the other two implementations the tests are
likely to encounter in the wild don't seem to need any special handling:

* Solaris/illumos tar uses ustar and replaces large UIDs with 60001
* AIX tar uses ustar (unless --format=pax) and truncates large UIDs

Backpatch-through: 18
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Sami Imseih <samimseih@gmail.com> (large UIDs)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version)
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> (OpenBSD)
Reviewed-by: Andrew Dunstan <andrew@dunslane.net> (Windows)
Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CAA5RZ0tt89MgNi4-0F4onH%2B-TFSsysFjMM-tBc6aXbuQv5xBXw%40mail.gmail.com
2026-04-04 13:54:21 +13:00
Heikki Linnakangas
4953a25b7f Remove HASH_DIRSIZE, always use the default algorithm to select it
It's not very useful to specify a non-standard directory size. The
HASH_DIRSIZE option was only used for shared memory hash tables, and
those always used hash_select_dirsize() to choose the size, which in
turn just uses the default algorithm anyway. That assumption was
ingrained in hash_estimate_size(), too.

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
2026-04-04 02:40:28 +03:00
Heikki Linnakangas
9fe9ecd516 Allocate all parts of shmem hash table from a single contiguous area
Previously, the shared header (HASHHDR) and the directory were
allocated by the caller, and passed to hash_create(), while the actual
elements were allocated separately with ShmemAlloc(). After this
commit, all the memory needed by the header, the directory, and all
the elements is allocated using a single ShmemInitStruct() call, and
the different parts are carved out of that allocation. This way the
ShmemIndex entries (and thus pg_shmem_allocations) reflect the size of
the whole hash table, rather than just the directories.

Commit f5930f9a98 attempted this earlier, but it had to be reverted.
The new strategy is to let dynahash.c perform all the allocations with
the alloc function, but have the alloc function carve out the parts
from the one larger allocation. The shared header and the directory
are now also allocated with alloc calls, instead of passing the area
for those directly from the caller.
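
An illustrative sketch of the carve-out strategy (names hypothetical; the
real code tracks the area and remaining size more carefully):

    static char *hash_area;         /* the single ShmemInitStruct() area */
    static Size hash_area_used;     /* bytes handed out so far */

    static void *
    carve_alloc(Size size)
    {
        void       *ptr = hash_area + hash_area_used;

        /* Advance by MAXALIGN so every chunk stays maximally aligned. */
        hash_area_used += MAXALIGN(size);
        return ptr;
    }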

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
2026-04-04 02:40:25 +03:00
Heikki Linnakangas
999e9ebb51 Prevent shared memory hash tables from growing beyond initial size
Set HASH_FIXED_SIZE on all shared memory hash tables, to prevent them
from growing after the initial allocation. The old behavior was weirdly
nondeterministic: if one hash table used up all the unused shared
memory, that space could not be used for other things anymore until
restart. We just got rid of that behavior for the LOCK and PROCLOCK
tables, but it's similarly weird for all other hash tables.

Increase SHMEM_INDEX_SIZE because we were already above the max size
for that table, and the max is now a hard limit.

Some callers of ShmemInitHash() still pass HASH_FIXED_SIZE, but that's
now unnecessary. Those flags could perhaps be removed now, but passing
the flag does no harm either.

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
2026-04-04 02:40:24 +03:00
Heikki Linnakangas
9ebe1c4f2c Merge init and max size options on shmem hash tables
Replace the separate init and max size options with a single size
option. We didn't make much use of the feature; all callers except the
ones in wait_event.c already used the same size for both, and the hash
tables in wait_event.c are small so there's little harm in just
allocating them at the max size.

The only reason why you might want to not reserve the max size upfront
is to make the memory available for other hash tables to grow beyond
their max size. Letting hash tables grow much beyond their max size is
bad for performance, however, because we cannot resize the directory,
and we never had very much "wiggle room" to grow into anyway, so you
couldn't really rely on it. We recently marked the LOCK and PROCLOCK
tables with HASH_FIXED_SIZE, so there's nothing left in core that would
benefit from more unallocated shared memory.

Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
2026-04-04 02:40:20 +03:00
Jacob Champion
d438a36591 oauth: Let validators provide failure DETAILs
At the moment, the only way for a validator module to report error
details on failure is to log them separately before returning from
validate_cb. Independently of that problem, the ereport() calls that we
make during validation failure partially duplicate some of the work of
auth_failed().

The end result is overly verbose and confusing for readers of the logs:

    [768233] LOG:  [my_validator] bad signature in bearer token
    [768233] LOG:  OAuth bearer authentication failed for user "jacob"
    [768233] DETAIL:  Validator failed to authorize the provided token.
    [768233] FATAL:  OAuth bearer authentication failed for user "jacob"
    [768233] DETAIL:  Connection matched file ".../pg_hba.conf" line ...

Solve both problems by making use of the existing logdetail pointer
that's provided by ClientAuthentication. Validator modules may set
ValidatorModuleResult->error_detail to override our default generic
message.

The end result looks something like

    [242284] FATAL:  OAuth bearer authentication failed for user "jacob"
    [242284] DETAIL:  [my_validator] bad signature in bearer token
        Connection matched file ".../pg_hba.conf" line ...
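
A minimal sketch of a validator using the new field (the callback shape
follows the existing validator module API; check_signature() is
hypothetical):

    static bool
    my_validate(const ValidatorModuleState *state,
                const char *token, const char *role,
                ValidatorModuleResult *res)
    {
        if (!check_signature(token))
        {
            res->authorized = false;
            /* Surfaced as the DETAIL of the authentication failure. */
            res->error_detail =
                psprintf("[my_validator] bad signature in bearer token");
            return true;        /* validation ran; authorization denied */
        }

        res->authorized = true;
        return true;
    }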

Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/202601241015.y5uvxd7oxnfs%40alvherre.pgsql
2026-04-03 16:05:33 -07:00
Daniel Gustafsson
0036232ba8 Make data checksum tests more resilient for slow machines
The test for re-running checksum enabling was only checking that the
data checksum state transitions to 'on', but didn't account for
whether the launcher process had had time to exit, thus getting an
error instead of the expected no-op.  Adding a pg_stat_activity check
for the launcher having exited resolves the error, as verified by
inducing a delay in the launcher.

Also wrap a variable only used in injection point tests within the
correct USE macros to avoid a warning for an unused variable.

All per the buildfarm.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Buildfarm
Discussion: https://postgr.es/m/1CB288C9-564B-4664-B096-C2F4377D17AB@yesql.se
2026-04-04 00:25:07 +02:00
Nathan Bossart
01876ace13 Add elevel parameter to relation_needs_vacanalyze().
This will be used in a follow-up commit to avoid emitting debug
logs from this function.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-03 17:04:28 -05:00
Nathan Bossart
53b8ca6881 Teach relation_needs_vacanalyze() to always compute scores.
Presently, this function only computes component scores when the
corresponding threshold is reached.  A follow-up commit will add a
view that shows tables' autovacuum scores, and we anticipate that
users will want to use this view to discover tables that are
nearing autovacuum eligibility.  This commit teaches this function
to always compute autovacuum scores, even when a threshold has not
been reached or autovacuum is disabled.

The restructuring in this commit revealed an interesting edge case.
If the table needs vacuuming for wraparound prevention and
autovacuum is disabled for it, we might still choose to analyze it.
It's not clear if this is intentional, but it has been this way for
nearly 20 years, so it seems best to avoid changing it without
further discussion.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-03 16:44:41 -05:00
Daniel Gustafsson
f19c0eccae Online enabling and disabling of data checksums
This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Prior to this, data checksums could only be enabled during initdb or
when the cluster is offline using the pg_checksums application. This
commit introduces functionality to enable, or disable, data checksums
while the cluster is running, regardless of how it was initialized.

A background worker launcher process is responsible for launching a
dynamic per-database background worker which will mark all buffers
dirty for all relations with storage in order for them to have data
checksums calculated on write.  Once all relations in all databases
have been processed, the data_checksums state will be set to on and
the cluster will at that point be identical to one which had data
checksums enabled during initialization or via offline processing.

When data checksums are being enabled, concurrent I/O operations
from backends other than the data checksums worker will write the
checksums but not verify them on reading.  Only when all backends
have absorbed the ProcSignalBarrier for setting data_checksums to
on will they also start verifying checksums on reading.  The same
process is repeated during disabling; all backends write checksums
but do not verify them until the barrier for setting the state to
off has been absorbed by all.  This in-progress state is used to
ensure there are no false negatives (or positives) due to reading
a checksum which is not in sync with the page.
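
A hypothetical sketch of the resulting read/write decision (function
names are illustrative, not the committed API):

    /* During enabling/disabling, checksums are always written... */
    if (DataChecksumsNeedWrite())
        PageSetChecksumInplace(page, blkno);

    /* ...but only verified once the cluster-wide state is fully "on". */
    if (DataChecksumsNeedVerify())
        verify_page_checksum(page, blkno);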

A new test module, test_checksums, is introduced with an extensive
set of tests covering both online and offline data checksum mode
changes.  The tests which run concurrent pgbench during online
processing are gated behind the PG_TEST_EXTRA flag due to being
very expensive to run.  Two levels of PG_TEST_EXTRA flags exist
to turn on either a subset of the expensive tests or the full suite
of multiple runs.

This work is based on an earlier version of this patch which was
reviewed by among others Heikki Linnakangas, Robert Haas, Andres
Freund, Tomas Vondra, Michael Banck and Andrey Borodin.  During
the work on this new version, Tomas Vondra has given invaluable
assistance with not only coding and reviewing but very in-depth
testing.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Magnus Hagander <magnus@hagander.net>
Co-authored-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
2026-04-03 22:58:51 +02:00
Nathan Bossart
8261ee24fe Refactor relation_needs_vacanalyze().
This commit adds an early return to this function, allowing us to
remove a level of indentation on a decent chunk of code.  This is
preparatory work for follow-up commits that will add a new system
view to show tables' autovacuum scores.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-03 14:03:12 -05:00
Heikki Linnakangas
79534f9065 Change default of max_locks_per_transaction to 128
The previous commits reduced the amount of memory available for locks
by eliminating the "safety margins" and by settling the split between
LOCK and PROCLOCK tables at startup. The allocation is now more
deterministic, but it also means that you often hit one of the limits
sooner than before. To compensate for that, bump up
max_locks_per_transaction from 64 to 128. With that, there is a little
more space in both hash tables than the effective maximum
size for either table before the previous commits.

This only changes the default, so if you had changed
max_locks_per_transaction in postgresql.conf, you will still have
fewer locks available than before for the same setting value. This
should be noted in the release notes. A good rule of thumb is that if
you double max_locks_per_transaction, you should be able to get as
many locks as before.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
2026-04-03 20:27:46 +03:00
Heikki Linnakangas
e1ad034809 Make the lock hash tables fixed-sized
This prevents the LOCK table from "stealing" space that was originally
calculated for the PROCLOCK table, and vice versa. That was weirdly
nondeterministic: if you e.g. took a lot of locks, consuming all
the available shared memory for the LOCK table, subsequent
transactions that needed more space for the PROCLOCK table would
fail, but if you restarted the system then the space would be
available for PROCLOCK again. Better to be strict and predictable,
even though that means that in many cases you can acquire far fewer
locks than before.

This also prevents the lock hash tables from using up the
general-purpose 100 kB reserve we set aside for "stuff that's too
small to bother estimating" in CalculateShmemSize(). We are pretty
good at accounting for everything nowadays, so we could probably make
that reservation smaller, but I'll leave that for another commit.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
2026-04-03 20:27:16 +03:00
Heikki Linnakangas
3e854d2ff1 Remove 10% safety margin from lock manager hash table estimates
As the comment says, the hash table sizes are just estimates, but that
doesn't mean we need a "safety margin" here. hash_estimate_size()
estimates the needed size in bytes pretty accurately for the given
number of elements, so if we wanted room for more elements in the
table, we should just use larger max_table_size in the
hash_estimate_size() call.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
2026-04-03 20:26:18 +03:00
Heikki Linnakangas
feb03dfecd Remove bogus "safety margin" from predicate.c shmem estimates
The 10% safety margin was copy-pasted from lock.c when the predicate
locking code was originally added. However, we later (commit
7c797e7194) added the HASH_FIXED_SIZE flag to the hash tables, which
means that they cannot actually use the safety margin that we're
calculating for them.

The extra memory was mainly used by the main lock manager, which has
the only shmem hash tables of non-trivial size that do not use the
HASH_FIXED_SIZE flag.
manager, we should reserve it directly in lock.c. After this commit,
the lock manager will just have less memory available than before.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
2026-04-03 20:25:57 +03:00
Amit Langote
b7b27eb41a Optimize fast-path FK checks with batched index probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch.

On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry.  When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does.  Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.

For single-column foreign keys, the new ri_FastPathFlushArray()
builds an ArrayType from the buffered FK values (casting to the
PK-side type if needed) and constructs a scan key with the
SK_SEARCHARRAY flag.  The index AM sorts and deduplicates the array
internally, then walks matching leaf pages in one ordered traversal
instead of descending from the root once per row.  A matched[] bitmap
tracks which batch items were satisfied; the first unmatched item is
reported as a violation.  Multi-column foreign keys fall back to
per-row probing via the new ri_FastPathFlushLoop().
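
An illustrative sketch of the single-column probe setup (simplified;
values, nvalues, collation, and eq_proc are assumed to be in scope, and
an int4 key column is assumed):

    ArrayType  *arr = construct_array(values, nvalues, INT4OID,
                                      sizeof(int32), true, TYPALIGN_INT);
    ScanKeyData skey;

    /* One SK_SEARCHARRAY scan key covers the whole buffered batch. */
    ScanKeyEntryInitialize(&skey, SK_SEARCHARRAY,
                           1,                   /* PK index column */
                           BTEqualStrategyNumber,
                           InvalidOid,          /* no subtype */
                           collation,
                           eq_proc,             /* equality support proc */
                           PointerGetDatum(arr));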

The fast path introduced in the previous commit (2da86c1ef9) yields
~1.8x speedup.  This commit adds ~1.6x on top of that, for a combined
~2.9x speedup over the unpatched code (int PK / int FK, 1M rows, PK
table and index cached in memory).

FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.

The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead.  The
snapshot and IndexScanDesc are taken fresh per flush.  The entry is
not subject to cache invalidation: cached relations are held with
locks for the transaction duration, and the entry's lifetime is
bounded by the trigger-firing cycle.

Lifecycle management for RI_FastPathEntry relies on three new
mechanisms:

  - AfterTriggerBatchCallback: A new general-purpose callback
    mechanism in trigger.c.  Callbacks registered via
    RegisterAfterTriggerBatchCallback() fire at the end of each
    trigger-firing batch (AfterTriggerEndQuery for immediate
    constraints, AfterTriggerFireDeferred at COMMIT, and
    AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE).  The RI
    code registers ri_FastPathEndBatch as a batch callback.

  - Batch callbacks only fire at the outermost query level
    (checked inside FireAfterTriggerBatchCallbacks), so nested
    queries from SPI inside other AFTER triggers do not tear down
    the cache mid-batch.

  - XactCallback: ri_FastPathXactCallback NULLs the static cache
    pointer at transaction end, handling the abort path where the
    batch callback never fired.

  - SubXactCallback: ri_FastPathSubXactCallback NULLs the static
    cache pointer on subtransaction abort, preventing the batch
    callback from accessing already-released resources.

  - AfterTriggerBatchIsActive(): A new exported accessor that
    returns true when afterTriggers.query_depth >= 0.  During
    ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
    called directly outside the after-trigger framework, so batch
    callbacks would never fire.  The fast-path code uses this to
    fall back to the non-cached per-invocation path in that
    context.

ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources.  Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.

The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row, because
RI triggers are called outside the after-trigger framework there
and batch callbacks would never fire to flush the buffer.

Suggested-by: David Rowley <dgrowleyml@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Co-authored-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
2026-04-03 14:33:53 +09:00
Thomas Munro
be21341e13 jit: No backport::SectionMemoryManager for LLVM 22.
LLVM 22 has the fix that we copied into our tree in commit 9044fc1d and
a new function to reach it[1][2], so we only need to use our copy for
Aarch64 + LLVM < 22.  The only change to the final version that our copy
didn't get is a new LLVM_ABI macro, but that isn't appropriate for us.
Our copy is hopefully now frozen and would only need maintenance if bugs
are found in the upstream code.

Non-Aarch64 systems now also use the new API with LLVM 22.  It allocates
all sections with one contiguous mmap() instead of one per
section.  We could have done that earlier, but commit 9044fc1d wanted to
limit the blast radius to the affected systems.  We might as well
benefit from that small improvement everywhere now that it is available
out of the box.

We can't delete our copy until LLVM 22 is our minimum supported version,
or we switch to the newer JITLink API for at least Aarch64.

[1] https://github.com/llvm/llvm-project/pull/71968
[2] https://github.com/llvm/llvm-project/pull/174307

Backpatch-through: 14
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com
2026-04-03 14:55:11 +13:00
Tom Lane
ebba64c08d Further harden tests that might use not-so-compatible tar versions.
Buildfarm testing shows that OpenSUSE (and perhaps related platforms?)
configures GNU tar in such a way that it'll archive sparse WAL files
by default, thus triggering the pax-extension detection code added by
bc30c704a.  Thus, we need something similar to 852de579a but for
GNU tar's option set.  "--format=ustar" seems to do the trick.

Moreover, the buildfarm shows that pg_verifybackup's 003_corruption.pl
test script is also triggering creation of pax-format tar files on
that platform.  We had not noticed because those test cases all fail
(intentionally) before getting to the point of trying to verify WAL
data.

Since that means two TAP scripts need this option-selection logic, and
plausibly more will do so in future, factor it out into a subroutine
in Test::Utils.  We also need to back-patch the 003_corruption.pl fix
into v18, where it's also failing.

While at it, clean up some places where guards for $tar being empty
or undefined were incomplete or even outright backwards.  Presumably,
we missed noticing because the set of machines that run TAP tests
and don't have tar installed is empty.  But if we're going to try
to handle that scenario, we should do it correctly.

Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/02770bea-b3f3-4015-8a43-443ae345379c@vondra.me
Backpatch-through: 18
2026-04-02 17:21:27 -04:00
Andrew Dunstan
bd4f879a9c Add additional jsonpath string methods
Add the following jsonpath methods:

*   l/r/btrim()
*   lower(), upper()
*   initcap()
*   replace()
*   split_part()

Each simply dispatches to the standard string processing functions.
These depend on the locale, but since it's set at `initdb`, they can be
considered immutable and therefore allowed in any jsonpath expression.

Author: Florents Tselai <florents.tselai@gmail.com>
Co-authored-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA+v5N40sJF39m0v7h=QN86zGp0CUf9F1WKasnZy9nNVj_VhCZQ@mail.gmail.com
2026-04-02 15:19:49 -04:00
Andrew Dunstan
a35c9d524e Rename jsonpath method arg tokens
This is just cleanup in the jsonpath grammar.

Rename the `csv_` tokens to `int_`, because they represent signed or
unsigned integers, as follows:

*   `csv_elem` => `int_elem`
*   `csv_list` => `int_list`
*   `opt_csv_list` => `opt_int_list`

Rename the `datetime_precision` tokens to `uint_` names, as they represent
unsigned integers and will be useful for other methods in the future, as
follows:

*   `datetime_precision` => `uint_elem`
*   `opt_datetime_precision` => `opt_uint_arg`

Rename the `datetime_template` tokens to `str_` names, as they represent
strings and will be useful for other methods in the future, as follows:

*   `datetime_template` => `str_elem`
*   `opt_datetime_template` => `opt_str_arg`

Author: David E. Wheeler <david@justatheory.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA+v5N40sJF39m0v7h=QN86zGp0CUf9F1WKasnZy9nNVj_VhCZQ@mail.gmail.com
2026-04-02 15:19:49 -04:00
Masahiko Sawada
fd7a25af11 Add target_relid parameter to pg_get_publication_tables().
When a tablesync worker checks whether a specific table is published,
it previously issued a query to the publisher calling
pg_get_publication_tables() and filtering the result by relid via a
WHERE clause. Because the function itself was fully evaluated before
the filter was applied, this forced the publisher to enumerate all
tables in the publication. For publications covering a large number of
tables, this resulted in expensive catalog scans and unnecessary CPU
overhead on the publisher.

This commit adds a new overloaded form of pg_get_publication_tables()
that accepts an array of publication names and a target table
OID. Instead of enumerating all published tables, it evaluates
membership for the specified relation via syscache lookups, using the
new is_table_publishable_in_publication() helper. This helper
correctly accounts for publish_via_partition_root, ALL TABLES with
EXCEPT clauses, schema publications, and partition inheritance, while
avoiding the overhead of building the complete published table list.

The existing VARIADIC array form of pg_get_publication_tables() is
preserved for backward compatibility. Tablesync workers use the new
two-argument form when connected to a publisher running PostgreSQL 19
or later.

Bump catalog version.

Reported-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Haoyan Wang <wanghaoyan20@163.com>
Discussion: https://postgr.es/m/CAB-JLwbBFNuASyEnZWP0Tck9uNkthBZqi6WoXNevUT6+mV8XmA@mail.gmail.com
2026-04-02 11:34:50 -07:00
Tom Lane
bc30c704ad Harden astreamer tar parsing logic against archives it can't handle.
Previously, there was essentially no verification in this code that
the input is a tar file at all, let alone that it fits into the
subset of valid tar files that we can handle.  This was exposed by
the discovery that we couldn't handle files that FreeBSD's tar
makes, because it's fairly aggressive about converting sparse WAL
files into sparse tar entries.  To fix:

* Bail out if we find a pax extension header.  This covers the
sparse-file case, and also protects us against scenarios where
the pax header changes other file properties that we care about.
(Eventually we may extend the logic to actually handle such
headers, but that won't happen in time for v19.)

* Be more wary about tar file type codes in general: do not assume
that anything that's neither a directory nor a symlink must be a
regular file.  Instead, we just ignore entries that are none of the
three supported types.

* Apply pg_dump's isValidTarHeader to verify that a purported
header block is actually in tar format.  To make this possible,
move isValidTarHeader into src/port/tar.c, which is probably where
it should have been since that file was created.

I also took the opportunity to const-ify the arguments of
isValidTarHeader and tarChecksum, and to use symbols not hard-wired
constants inside tarChecksum.

Back-patch to v18 but not further.  Although this code exists inside
pg_basebackup in older branches, it's not really exposed in that
usage to tar files that weren't generated by our own code, so it
doesn't seem worth back-porting these changes across 3c9056981
and f80b09bac.  I did choose to include a back-patch of 5868372bb
into v18 though, to minimize cosmetic differences between these
two branches.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/3049460.1775067940@sss.pgh.pa.us
Backpatch-through: 18
2026-04-02 12:20:36 -04:00
Fujii Masao
5770679918 Remove redundant SetLatch() calls in interrupt handling functions
Interrupt handling functions (e.g., HandleCatchupInterrupt(),
HandleParallelApplyMessageInterrupt()) are called only by
procsignal_sigusr1_handler(), which already calls SetLatch()
for the current process at the end of its processing.
Therefore, these interrupt handling functions do not need to
call SetLatch() themselves.

However, previously, some of these functions redundantly
called SetLatch(). This commit removes those unnecessary
calls.

While duplicate SetLatch() calls are redundant, they are
harmless, so this change is not backpatched.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CALj2ACWd5apddj6Cd885WwJ6LquYu_G81C4GoR4xSoDV1x-FEA@mail.gmail.com
2026-04-02 23:55:30 +09:00
John Naylor
effaa464af Check for __cpuidex and __get_cpuid_count separately
Previously we would only check for the availability of __cpuidex if
the related __get_cpuid_count was not available on a platform.

Future commits will need to access hypervisor information about
the TSC frequency of x86 CPUs. For that case __cpuidex is the only
viable option for accessing a high leaf (e.g. 0x40000000), since
__get_cpuid_count does not allow that.

__cpuidex is defined in cpuid.h for gcc/clang, but in intrin.h
for MSVC, so adjust the tests to suit. We also need to cast the array
of unsigned ints to signed, since gcc (with -Wall) and clang emit
warnings otherwise.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <john.naylor@postgresql.org>
Discussion: https://postgr.es/m/CAP53PkyooCeR8YV0BUD_xC7oTZESHz8OdA=tP7pBRHFVQ9xtKg@mail.gmail.com
2026-04-02 19:39:57 +07:00
Andrew Dunstan
bb6ae9707c Use command_ok for pg_regress calls in 002_pg_upgrade and 027_stream_regress
Now that command_ok() captures and displays failure output, use it
instead of system() plus manual diff-dumping in these two tests.  This
simplifies both scripts and produces consistent, truncated output on
failure.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
2026-04-02 08:13:44 -04:00
Andrew Dunstan
b8da9869b8 perl tap: Use croak instead of die in our helper modules
Replace die with croak throughout Cluster.pm and Utils.pm (except in
INIT blocks and signal handlers, where die is correct) so that error
messages report the test script's line number rather than the helper
module's.

Add @CARP_NOT in Utils.pm listing PostgreSQL::Test::Cluster, so that
when a Utils function is called through a Cluster.pm wrapper, croak
skips both packages and reports the actual test-script caller.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
2026-04-02 08:13:44 -04:00
Andrew Dunstan
76540fdedf perl tap: Show die reason in TAP output
Install a $SIG{__DIE__} handler in the INIT block of Utils.pm that emits
the die message as a TAP diagnostic.  Previously, an unexpected die
(e.g. from safe_psql) produced only "no plan was declared" with no
indication of the actual error.  The handler also calls done_testing()
to suppress that confusing message.

Dies during compilation ($^S undefined) and inside eval ($^S == 1) are
left alone.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
Discussion: https://postgr.es/m/20220222181924.eehi7o4pmneeb4hm%40alap3.anarazel.de
2026-04-02 08:13:44 -04:00
Andrew Dunstan
1402b8d2fc perl tap: Show failed command output
Capture stdout and stderr from command_ok() and command_fails() and emit
them as TAP diagnostics on failure.  Output is truncated to the first
and last 30 lines per channel to avoid flooding.

A new helper _diag_command_output() is introduced in Utils.pm so
both functions share the same truncation and formatting logic.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
2026-04-02 08:13:44 -04:00
Andrew Dunstan
5720ae0143 pg_regress: Include diffs in TAP output
When pg_regress fails it is often tedious to find the actual diffs,
especially in CI where you must navigate a file browser.  Emit the first
80 lines of the combined regression.diffs as TAP diagnostics so the
failure reason is visible directly in the test output.

The line limit is across all failing tests in a single pg_regress run to
avoid flooding when a crash causes every subsequent test to fail.

New DIAG_DETAIL / DIAG_END TAP output types are added, mirroring the
existing NOTE_DETAIL / NOTE_END pair, so that long diff lines can be
emitted without spurious '#' prefixes on continuation lines.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
2026-04-02 08:13:44 -04:00
Tomas Vondra
7f8c88c2b8 jit: Change the default to off.
While JIT can speed up large analytical queries, it can also cause
serious performance issues on otherwise very fast queries. Compiling
and optimizing the expressions may be so expensive that it completely
outweighs the JIT benefits for shorter queries.

Ideally, we'd address this in the cost model, but the part deciding
whether to enable JIT for a query is rather simple, partially because we
don't have any reliable estimates of how expensive the LLVM compilation
and optimization is.

Sometimes seemingly unrelated changes (for example a couple of additional
INSERTs into a table) increase the cost just enough to enable JIT,
resulting in a performance cliff.

Because of these risks, most large-scale deployments already disable JIT
by default. Notably, this includes all hyperscalers.

This commit changes our default to align with that established practice.
If we improve the JIT (be it better costing or cheaper execution), we
can consider enabling it by default again.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/DG1VZJEX1AQH.2EH4OKGRUDB71@jeltef.nl
2026-04-02 13:40:29 +02:00
Heikki Linnakangas
148fe2b05d Test pg_stat_statements across crash restart
Add 'pg_stat_statements' to the crash restart test, to test that
shared memory and LWLock initialization works across crash restart in
a library listed in shared_preload_libraries. We had no test coverage
for that.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
2026-04-02 13:33:06 +03:00
Amit Kapila
4441d6b2e4 Doc: Fix oversight in commit 55cefadde8.
The documentation said that pg_publication_rel.prrelid refers to sequences, whereas it stores information only about tables.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pv1UKR_bxmN7wcCCpQveHoYprvH-hbdFq8gsaH1Ye7B_w@mail.gmail.com
2026-04-02 10:16:53 +05:30