postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-29 10:11:47 -04:00

Author	SHA1	Message	Date
Tom Lane	6da2ba1d8a	Fix detection and handling of strchrnul() for macOS 15.4. As of 15.4, macOS has strchrnul(), but access to it is blocked behind a check for MACOSX_DEPLOYMENT_TARGET >= 15.4. But our does-it-link configure check finds it, so we try to use it, and fail with the present default deployment target (namely 15.0). This accounts for today's buildfarm failures on indri and sifaka. This is the identical problem that we faced some years ago when Apple introduced preadv and pwritev in the same way. We solved that in commit `f014b1b9b` by using AC_CHECK_DECLS instead of AC_CHECK_FUNCS to check the functions' availability. So do the same now for strchrnul(). Interestingly, we already had a workaround for "the link check doesn't agree with <string.h>" cases with glibc, which we no longer need since only the header declaration is being checked. Testing this revealed that the meson version of this check has never worked, because it failed to use "-Werror=unguarded-availability-new". (Apparently nobody's tried to build with meson on macOS versions that lack preadv/pwritev as standard.) Adjust that while at it. Also, we had never put support for "-Werror=unguarded-availability-new" into v13, but we need that now. Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/385134.1743523038@sss.pgh.pa.us Backpatch-through: 13	2025-04-01 16:50:09 -04:00
Andrew Dunstan	c313fa4602	Use workaround of __builtin_setjmp only on MINGW on MSVCRT. MSVCRT is not present on Windows/ARM64, and the workaround is not necessary on any UCRT-based toolchain. Author: Lars Kanis <lars@greiz-reinsdorf.de> Discussion: https://postgr.es/m/CAHXCYb2OjNHtoGVKyXtXmw4B3bUXwJX6M-Lcp1KcMCRUMLOocA@mail.gmail.com	2025-04-01 16:24:59 -04:00
Andres Freund	e19dc74491	aio: Minor comment improvements Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/usbwzckj7q3jhfx3ann3nrfnukmupbs35axvq5zfyeo6nvrzrm@onjhxs2du4st	2025-04-01 16:06:48 -04:00
Andres Freund	fdd146a8ef	aio: Add README.md explaining higher level design Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-04-01 16:06:48 -04:00
Andres Freund	00066aa173	md: Add comment & assert to buffer-zeroing path in md[start]readv() mdreadv() has a codepath to zero out buffers when a read returns zero bytes, guarded by a check for zero_damaged_pages || InRecovery. The InRecovery codepath to zero out buffers in mdreadv() appears to be unreachable. The only known paths to reach mdreadv()/mdstartreadv() in recovery are XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf(), each of which takes care to extend the relation if necessary. This looks to either have been the case for a long time, or the code was never reachable. The zero_damaged_pages path is incomplete, as missing segments are not created. Putting blocks into the buffer-pool that do not exist on disk is rather problematic, as such blocks will, at least initially, not be found by scans that rely on smgrnblocks(), as they are beyond EOF. It also can cause weird problems with relation extension, as relation extension does not expect blocks beyond EOF to exist. Therefore we would like to remove that path. mdstartreadv(), which I added in e5fe570b51c, does not implement this zeroing logic. I had started a discussion about that a while ago (linked below), but forgot to act on the conclusion of the discussion, namely to disable the in-memory-zeroing behavior. We could certainly implement equivalent zeroing logic in mdstartreadv(), but it would have to be more complicated due to potential differences in the zero_damaged_pages setting between the definer and completor of IO. Given that we want to remove the logic, that does not seem worth implementing the necessary logic. For now, put an Assert(false) and comments documenting this choice into mdreadv() and comments documenting the deprecation of the path in mdreadv() and the non-implementation of it in mdstartreadv(). If we, during testing, discover that we do need the path, we can implement it at that time. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/postgr.es/m/20250330024513.ac.nmisch@google.com Discussion: https://postgr.es/m/postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd	2025-04-01 13:50:39 -04:00
Andres Freund	93bc3d75d8	aio: Add test_aio module To make the tests possible, a few functions from bufmgr.c/localbuf.c had to be exported, via buf_internals.h. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Andres Freund <andres@anarazel.de> Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-04-01 13:47:46 -04:00
Andres Freund	60f566b4f2	aio: Add pg_aios view The new view lists all IO handles that are currently in use and is mainly useful for PG developers, but may also be useful when tuning PG. Bumps catversion. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-04-01 13:30:33 -04:00
Álvaro Herrera	172259afb5	Verify roundtrip dump/restore of regression database Add a test to pg_upgrade's test suite that verifies that dump-restore-dump of regression database produces equivalent output to dumping it directly. This was already being tested by running pg_upgrade itself, but non-binary-upgrade mode was not being covered. The regression database has accrued, over time, a sufficient collection of interesting objects to ensure good coverage, but there hasn't been a concerted effort to be completely exhaustive, so it is likely still possible to have more. This'd belong more naturally in the pg_dump test suite, but we chose to put it in src/bin/pg_upgrade/t/002_pg_upgrade.pl because we need a run of the regression tests which is already done here, so this has less total test runtime impact. Also, experiments have shown that using parallel dump/restore is slightly faster, so we use --format=directory -j2. This test has already reported pg_dump bugs, as fixed in `fd41ba93e4`, `74563f6b90`, `d611f8b158`, `4694aedf63`. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com	2025-04-01 18:50:40 +02:00
Peter Eisentraut	764d501d24	Remove a stray "pgrminclude" annotation We don't use those anymore. Fix for commit `8492feb98f`.	2025-04-01 15:28:22 +02:00
Peter Eisentraut	113ecf1f8c	Fix minor C type confusion Returning false instead of NULL gets a compiler error under gcc-14 -std=gnu23, and it appears to have been unintentional. Fix for commit `8492feb98f`.	2025-04-01 15:28:22 +02:00
Heikki Linnakangas	2904324a88	heapam: Only set tuple's block once per page in pagemode Due to splitting the block id into two 16 bit integers, BlockIdSet() is more expensive than one might think. Doing it once per returned tuple shows up as a small but reliably reproducible cost. It's simple enough to set the block number just once per block in pagemode, so do so. Author: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6	2025-04-01 13:24:27 +03:00
John Naylor	af0c248557	Use function attributes for SSE 4.2 even when targeting that extension On Red Hat 9 systems (or similar), the packaged gcc targets x86-64-v2, but clang does not. This has caused build failures in the wake of commit `e2809e3a1` when building --with-llvm. The most expedient fix is to use the same function attributes for the inlined function as we do for the global function. Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (plus members skimmer and bumblebee) Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Todd Cook <cookt@blackduck.com> Discussion: https://postgr.es/m/CANWCAZZSxs3a1YRKehkgk2OHKbrVn+xZ+AWW8Co2R_f70NqqmA@mail.gmail.com	2025-04-01 12:01:58 +07:00
David Rowley	3dbdf86c63	Fix failing regression test on x86-32 machines `95d6e9af0` added code to display the tuplestore storage type for WindowAgg nodes and added a test to ensure the "Disk" storage method was working correctly by setting work_mem to 64 and running a test which caused the WindowAgg to go to disk. Seemingly, the number of rows chosen there wasn't quite enough for that to happen in x86 32-bit. Fix this by increasing the number of rows slightly. I suspect the buildfarm didn't catch this as MEMORY_CONTEXT_CHECKING builds will use a bit more memory for MemoryChunks to store the requested_size and also because of the additional space to store the chunk's sentinel byte. Reported-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/Z-q3ZAM4OhE-4UiI@msg.df7cb.de	2025-04-01 10:52:25 +13:00
Tom Lane	2fd3e2fa5c	Fix accidentally-harmless thinko in psqlscan_test_variable(). This code was passing literal strings to psqlscan_emit, which is quite contrary to that function's specification: "If you pass it something that is not part of the yytext string, you are making a mistake". It accidentally worked anyway, even in non-safe_encoding mode. psqlscan_emit would compute a garbage "reference" pointer, but would never dereference that since the passed string is all-ASCII. So there's no live bug today, but that is a happenstance outcome of psqlscan_emit's current implementation. Let's make psqlscan_test_variable do what it's supposed to, namely append directly to the output buffer. This is just future-proofing against possible changes in psqlscan_emit, so I don't feel a need to back-patch.	2025-03-31 12:16:32 -04:00
John Naylor	e2809e3a10	Inline CRC computation for small fixed-length input on x86 pg_crc32c.h now has a simplified copy of the loop in pg_crc32c_sse42.c suitable for inlining where possible. This may slightly reduce contention for the WAL insertion lock, but that hasn't been tested. The motivation for this change is to avoid regressing for a future commit that will use a function pointer for non-constant input in all x86 builds. While it's technically possible to make a similar change for Arm and LoongArch, there are some questions about how inlining should work since those platforms prefer stricter alignment. There are also no immediate plans to add additional implementations for them. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> Discussion: https://postgr.es/m/CANWCAZZEiTzhZcuwTiJ2=opiNpAUn1vuDRu1N02z61AthwRZLA@mail.gmail.com Discussion: https://postgr.es/m/CANWCAZYRhLHArpyfV4uRK-Rw9N5oV5HMkkKtBehcuTjNOMwCZg@mail.gmail.com	2025-03-31 13:17:21 +07:00
Jeff Davis	4694aedf63	Add relallfrozen to pg_dump statistics. Author: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CADkLM=desCuf3dVHasADvdUVRmb-5gO0mhMO5u9nzgv6i7U86Q@mail.gmail.com	2025-03-30 22:14:06 -07:00
Andres Freund	2a5e709e72	Enable IO concurrency on all systems Previously effective_io_concurrency and maintenance_io_concurrency could not be set above 0 on machines without fadvise support. AIO enables IO concurrency without such support, via io_method=worker. Currently only subsystems using the read stream API will take advantage of this. Other users of maintenance_io_concurrency (like recovery prefetching) which leverage OS advice directly will not benefit from this change. In those cases, maintenance_io_concurrency will have no effect on I/O behavior. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CAAKRu_atGgZePo=_g6T3cNtfMf0QxpvoUh5OUqa_cnPdhLd=gw@mail.gmail.com	2025-03-30 19:16:47 -04:00
Andres Freund	ae3df4b341	read_stream: Introduce and use optional batchmode support Submitting IO in larger batches can be more efficient than doing so one-by-one, particularly for many small reads. It does, however, require the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO batching (cf. pgaio_enter_batchmode()). Basically, the callback may not: a) block without first calling pgaio_submit_staged(), unless a to-be-waited-on lock cannot be part of a deadlock, e.g. because it is never held while waiting for IO. b) directly or indirectly start another batch via pgaio_enter_batchmode(). As this requires care and is nontrivial in some cases, batching is only used with explicit opt-in. This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and uses it where appropriate. There are two cases where batching would likely be beneficial, but where we aren't using it yet: 1) bitmap heap scans, because the callback reads the VM. This should soon be solved, because we are planning to remove the use of the VM, due to that not being sound. 2) The first phase of heap vacuum. This could be made to support batchmode, but would require some care. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-03-30 18:36:41 -04:00
Andres Freund	f4d0730bbc	aio: Basic read_stream adjustments for real AIO Adapt the read stream logic for real AIO: - If AIO is enabled, we shouldn't issue advice, but if it isn't, we should continue issuing advice - AIO benefits from reading ahead with direct IO - If effective_io_concurrency=0, pass READ_BUFFERS_SYNCHRONOUSLY to StartReadBuffers() to ensure synchronous IO execution There are further improvements we should consider: - While in read_stream_look_ahead(), we can use AIO batch submission mode for increased efficiency. That however requires care to avoid deadlocks and is thus done separately. - It can be beneficial to defer starting new IOs until we can issue multiple IOs at once. That however requires non-trivial heuristics to decide when to do so. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Andres Freund <andres@anarazel.de> Co-authored-by: Thomas Munro <thomas.munro@gmail.com>	2025-03-30 18:26:44 -04:00
Andres Freund	12ce89fd07	bufmgr: Use AIO in StartReadBuffers() This finally introduces the first actual use of AIO. StartReadBuffers() now uses the AIO routines to issue IO. As the implementation of StartReadBuffers() is also used by the functions for reading individual blocks (StartReadBuffer() and through that ReadBufferExtended()) this means all buffered read IO passes through the AIO paths. However, as those are synchronous reads, actually performing the IO asynchronously would be rarely beneficial. Instead such IOs are flagged to always be executed synchronously. This way we don't have to duplicate a fair bit of code. When io_method=sync is used, the IO patterns generated after this change are the same as before, i.e. actual reads are only issued in WaitReadBuffers() and StartReadBuffers() may issue prefetch requests. This allows to bypass most of the actual asynchronicity, which is important to make a change as big as this less risky. One thing worth calling out is that, if IO is actually executed asynchronously, the precise meaning of what track_io_timing is measuring has changed. Previously it tracked the time for each IO, but that does not make sense when multiple IOs are executed concurrently. Now it only measures the time actually spent waiting for IO. A subsequent commit will adjust the docs for this. While AIO is now actually used, the logic in read_stream.c will often prevent using sufficiently many concurrent IOs. That will be addressed in the next commit. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-30 18:02:23 -04:00
Andres Freund	047cba7fa0	bufmgr: Implement AIO read support This commit implements the infrastructure to perform asynchronous reads into the buffer pool. To do so, it: - Adds readv AIO callbacks for shared and local buffers It may be worth calling out that shared buffer completions may be run in a different backend than where the IO started. - Adds an AIO wait reference to BufferDesc, to allow backends to wait for in-progress asynchronous IOs - Adapts StartBufferIO(), WaitIO(), TerminateBufferIO(), and their localbuf.c equivalents, to be able to deal with AIO - Moves the code to handle BM_PIN_COUNT_WAITER into a helper function, as it now also needs to be called on IO completion As of this commit, nothing issues AIO on shared/local buffers. A future commit will update StartReadBuffers() to do so. Buffer reads executed through this infrastructure will report invalid page / checksum errors / warnings differently than before: In the error case the error message will cover all the blocks that were included in the read, rather than just reporting the first invalid block. If more than one block is invalid, the error will include information about the range of the read, the first invalid block and the number of invalid pages, with a HINT towards the server log for per-block details. For the warning case (i.e. zero_damaged_pages) we would previously emit one warning message for each buffer in a multi-block read. Now there is only a single warning message for the entire read, again referring to the server log for more details in case of multiple checksum failures within a single larger read. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-30 17:28:03 -04:00
Andres Freund	ef64fe26ba	aio: Add WARNING result status If an IO succeeds, but issues a warning, e.g. due to a page verification failure with zero_damaged_pages, we want to issue that warning in the context of the issuer of the IO, not the process that executes the completion (always the case for worker). It's already possible for a completion callback to report a custom error message, we just didn't have a result status that allowed a user of AIO to know that a warning should be emitted even though the IO request succeeded. All that's needed for that is a dedicated PGAIO_RS_ value. Previously there were not enough bits in PgAioResult.id for the new value. Increase. While at that, add defines for the amount of bits and static asserts to check that the widths are appropriate. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com	2025-03-30 16:27:10 -04:00
Andres Freund	d445990adc	Let caller of PageIsVerified() control ignore_checksum_failure For AIO the completion of a read into shared buffers (i.e. verifying the page including the checksum, updating the BufferDesc to reflect the IO) can happen in a different backend than the backend that started the IO. As ignore_checksum_failure can differ between backends, we need to allow the caller of PageIsVerified() control whether to ignore checksum failures. The commit leaves a gap in the PIV_* values, as an upcoming commit, which depends on this commit, will add PIV_LOG_LOG, which better fits just after PIV_LOG_WARNING. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com	2025-03-30 16:27:10 -04:00
Andres Freund	b96d3c3897	pgstat: Allow checksum errors to be reported in critical sections For AIO we execute completion callbacks in critical sections (to ensure that AIO can in the future be used for WAL, which in turn requires that we can call completion callbacks in critical sections, to get the resources for WAL io). To report checksum errors a backend now has to call pgstat_prepare_report_checksum_failure(), before entering a critical section, which guarantees the relevant pgstats entry is in shared memory, the relevant DSM segment is mapped into the backend's memory and the address is known via a PgStat_EntryRef. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/wkjj4p2rmkevutkwc6tewoovdqznj6c6nvjmvii4oo5wmbh5sr@retq7d6uqs4j	2025-03-30 16:12:04 -04:00
Andres Freund	4244cf6876	Add errhint_internal() We have errmsg_internal(), errdetail_internal(), but not errhint_internal(). Sometimes it is useful to output a hint with an already translated format string (e.g. because there are different messages depending on the condition). For message/detail we do that with the _internal() variants, but we can't do that with hint today. It's possible to work around that by using something like str = psprintf(translated_format, args); ereport(... errhint("%s", str); but that's not exactly pretty and makes it harder to avoid memory leaks. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/ym3dqpa4xcvoeknewcw63x77vnqdosbqcetjinb2zfoh65k55m@m4ozmwhr6lk6	2025-03-30 16:10:51 -04:00
Andres Freund	d6d8054dc7	localbuf: Track pincount in BufferDesc as well For AIO on temporary table buffers the AIO subsystem needs to be able to ensure a pin on a buffer while AIO is going on, even if the IO issuing query errors out. Tracking the buffer in LocalRefCount does not work, as it would cause CheckForLocalBufferLeaks() to assert out. Instead, also track the refcount in BufferDesc.state, not just LocalRefCount. This also makes local buffers behave a bit more akin to shared buffers. Note that we still don't need locking, AIO completion callbacks for local buffers are executed in the issuing session (i.e. nobody else has access to the BufferDesc). Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-03-29 16:36:51 -04:00
Andres Freund	08ccd56ac7	aio, bufmgr: Comment fixes/improvements Some of these comments have been wrong for a while (`12f3867f55`), some I recently introduced (`da7226993f`, `55b454d0e1`). This includes an update to a comment in FlushBuffer(), which will be copied in a future commit. These changes seem big enough to be worth doing in separate commits. Suggested-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250319212530.80.nmisch@google.com	2025-03-29 14:45:42 -04:00
Andres Freund	50cb7505b3	aio: Implement support for reads in smgr/md/fd This implements the following: 1) An smgr AIO target, for AIO on smgr files. This should be usable not just for md.c but also other SMGR implementations if we ever get them. 2) readv support in fd.c, which requires a small bit of infrastructure work in fd.c 3) smgr.c and md.c support for readv There still is nothing performing AIO, but as of this commit it would be possible. As part of this change FileGetRawDesc() actually ensures that the file is opened - previously it was basically not usable. It's used to reopen a file in IO workers. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-29 13:38:35 -04:00
Andres Freund	dee8002468	Fix mis-attribution of checksum failure stats to the wrong database Checksum failure stats could be attributed to the wrong database in two cases: - when a read of a shared relation encountered a checksum error, it would be attributed to the current database, instead of the "database" representing shared relations - when using CREATE DATABASE ... STRATEGY WAL_LOG checksum errors in the source database would be attributed to the current database The checksum stats reporting via PageIsVerifiedExtended(PIV_REPORT_STAT) does not have access to the information about what database a page belongs to. This fixes the issue by removing PIV_REPORT_STAT and delegating the responsibility to report stats to the caller, which now can learn about the number of stats via a new optional argument. As this changes the signature of PageIsVerifiedExtended() and all callers should adapt to the new signature, use the occasion to rename the function to PageIsVerified() and remove the compatibility macro. We could instead have fixed this by adding information about the database to the args of PageIsVerified(), but there are soon-to-be-applied patches that need to separate the stats reporting from the PageIsVerified() call anyway. Those patches also include testing for the failure paths, something we inexplicably have not had. As there is no caller of pgstat_report_checksum_failure() left, remove it. It'd be possible, but awkward to fix this in the back branches. We considered doing the work not quite worth it, as mis-attributed stats should still elicit concern. The emitted error messages do allow the errors to be attributed correctly. Discussion: https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga@ptwkinxqo3az Discussion: https://postgr.es/m/mglpvvbhighzuwudjxzu4br65qqcxsnyvio3nl4fbog3qknwhg@e4gt7npsohuz	2025-03-29 13:38:35 -04:00
Andres Freund	116e851db5	Fix "‘static’ is not at beginning of declaration" warning `b98be8a2a2` used "const static" instead of "static const". We normally use the latter form. Discussion: https://postgr.es/m/z4mc2hzecahyq3paupfsouhuupmzmgum45md3k5my6bmo7gvn7@z5j26doqamqy	2025-03-29 10:48:59 -04:00
Tomas Vondra	14ffaece0f	amcheck: Add gin_index_check() to verify GIN index Adds a new function, validating two kinds of invariants on a GIN index: - parent-child consistency: Paths in a GIN graph have to contain consistent keys. Tuples on parent pages consistently include tuples from child pages; parent tuples do not require any adjustments. - balanced-tree / graph: Each internal page has at least one downlink, and can reference either only leaf pages or only internal pages. The GIN verification is based on work by Grigory Kryachko, reworked by Heikki Linnakangas and with various improvements by Andrey Borodin. Investigation and fixes for multiple bugs by Kirill Reshke. Author: Grigory Kryachko <GSKryachko@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Andrey Borodin <amborodin@acm.org> Reviewed-By: José Villanova <jose.arthur@gmail.com> Reviewed-By: Aleksander Alekseev <aleksander@timescale.com> Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-By: Kirill Reshke <reshkekirill@gmail.com> Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru	2025-03-29 15:44:29 +01:00
Peter Eisentraut	53a2a1564a	pgbench: Make set_random_seed() 64-bit everywhere. Delete an intermediate variable, a redundant cast, a use of long and a use of long long. scanf() the seed directly into a uint64, now that we can do that with SCNu64 from <inttypes.h>. The previous coding was from pre-C99 times when %lld might not have been there, so it read into an unsigned long. Therefore behavior varied by OS, and --random-seed would accept either 32 or 64 bit seeds. Now it's the same everywhere. Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org	2025-03-29 15:24:42 +01:00
Tomas Vondra	d70b17636d	amcheck: Move common routines into a separate module Before performing checks on an index, we need to take some safety measures that apply to all index AMs. This includes: * verifying that the index can be checked - Only selected AMs are supported by amcheck (right now only B-Tree). The index has to be valid and not a temporary index from another session. * changing (and then restoring) user's security context * obtaining proper locks on the index (and table, if needed) * discarding GUC changes from the index functions Until now this was implemented in the B-Tree amcheck module, but it's something every AM will have to do. So relocate the code into a new module verify_common for reuse. The shared steps are implemented by amcheck_lock_relation_and_check(), receiving the AM-specific verification as a callback. Custom parameters may be supplied using a pointer. Author: Andrey Borodin <amborodin@acm.org> Reviewed-By: José Villanova <jose.arthur@gmail.com> Reviewed-By: Aleksander Alekseev <aleksander@timescale.com> Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru	2025-03-29 15:14:49 +01:00
Tomas Vondra	fb9dff7663	Fix grammar in GIN README Author: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/CALdSSPgu9uAhVYojQ0yjG%3Dq5MaqmiSLUJPhz%2B-u7cA6K6Mc9UA%40mail.gmail.com	2025-03-29 15:14:25 +01:00
Dean Rasheed	8b6a0e2392	Fix MERGE with DO NOTHING actions into a partitioned table. ExecInitPartitionInfo() duplicates much of the logic in ExecInitMerge(), except that it failed to handle DO NOTHING actions. This would cause an "unknown action in MERGE WHEN clause" error if a MERGE with any DO NOTHING actions attempted to insert into a partition not already initialised by ExecInitModifyTable(). Bug: #18871 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Gurjeet Singh <gurjeet@singh.im> Discussion: https://postgr.es/m/18871-b44e3c96de3bd2e8%40postgresql.org Backpatch-through: 15	2025-03-29 09:58:40 +00:00
Peter Eisentraut	a0ed19e0a9	Use PRI?64 instead of "ll?" in format strings (continued). Continuation of work started in commit `15a79c73`, after initial trial. Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org	2025-03-29 10:43:57 +01:00
Jeff Davis	a0a4601765	Matview statistics depend on matview data. REFRESH MATERIALIZED VIEW replaces the storage, which resets statistics, so statistics must be restored afterward. If both statistics and data are being dumped for a materialized view, add a dependency from the former to the latter. Defer the statistics to SECTION_POST_DATA, and use RESTORE_PASS_POST_ACL. Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com	2025-03-28 16:12:55 -07:00
Alexander Korotkov	775a06d44c	Make group_similar_or_args() reorder clause list as little as possible Currently, group_similar_or_args() permutes original positions of clauses regardless of whether it manages to find any groups of similar clauses. While we are not providing any strict guarantees of preserving the original order of OR-clauses, it is preferred that the original order be modified as little as possible. This commit changes the reordering algorithm of group_similar_or_args() in the following way. We reorder each group of similar clauses so that the first item of the group stays in place, but all the other items are moved after it. So, if there are no similar clauses, the order of clauses stays the same. When there are some groups, only required reordering happens while the rest of the clauses remain in their places. Reported-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/3ac7c436-81e1-4191-9caf-b0dd70b51511%40gmail.com Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>	2025-03-28 23:37:49 +02:00
Nathan Bossart	519338ace4	Optimize popcount functions with ARM SVE intrinsics. This commit introduces SVE implementations of pg_popcount{32,64}. Unlike the Neon versions, we need an additional configure-time check to determine if the compiler supports SVE intrinsics, and we need a runtime check to determine if the current CPU supports SVE instructions. Our testing showed that the SVE implementations are much faster for larger inputs and are comparable to the status quo for smaller inputs. Author: "Devanga.Susmitha@fujitsu.com" <Devanga.Susmitha@fujitsu.com> Co-authored-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com> Co-authored-by: "Malladi, Rama" <ramamalladi@hotmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com	2025-03-28 16:20:20 -05:00
Peter Eisentraut	3c8e463b0d	Revert "Tidy up locale thread safety in ECPG library." This reverts commit `8e993bff53`. It causes various build failures on the buildfarm, to be investigated. Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech	2025-03-28 21:27:37 +01:00
Nathan Bossart	6be53c2767	Optimize popcount functions with ARM Neon intrinsics. This commit introduces Neon implementations of pg_popcount{32,64}, pg_popcount(), and pg_popcount_masked(). As in simd.h, we assume that all available AArch64 hardware supports Neon, so we don't need any new configure-time or runtime checks. Some compilers already emit Neon instructions for these functions, but our hand-rolled implementations for pg_popcount() and pg_popcount_masked() performed better in testing, likely due to better instruction-level parallelism. Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com	2025-03-28 14:49:35 -05:00
Heikki Linnakangas	51a0382e8d	Fix crash if LockErrorCleanup() is called twice The refactoring in commit `3c0fd64fec` removed the clearing of awaitedLock from LockErrorCleanup(). It's still needed, otherwise LockErrorCleanup() during abort processing will try to update the LOCALLOCK struct even after the lock has already been released. Put it back. Reported-by: Richard Guo <guofenglinux@gmail.com> Reported-by: Robins Tharakan <tharakan@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4_dNX1SzBmvFdoY-LxJh_4W_BjtVd5i008ihfU-wFF=eg@mail.gmail.com Discussion: https://www.postgresql.org/message-id/18832-38e5575b1bbd7277@postgresql.org Discussion: https://www.postgresql.org/message-id/e11a30e5-c0d8-491d-8546-3a1b50c10ad4@gmail.com	2025-03-28 20:19:17 +02:00
Nathan Bossart	9ac6f7e7ce	Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64. This macro protects x86_64-specific code, and a subsequent commit will introduce AArch64-specific versions of that code. To prevent confusion, let's rename it to clearly indicate that it's for x86_64. We should likely move this code to its own file (perhaps merging it with the AVX-512 popcount code), but that is left as a future exercise. Reviewed-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com	2025-03-28 12:27:47 -05:00
Masahiko Sawada	a5419bc72e	Fix timestamp overflow in UUIDv7 implementation. The uuidv7_interval() function previously converted a shifted microsecond-precision timestamp (64-bit integer) to another 64-bit integer representing a timestamp with nanosecond precision. This conversion caused overflow for dates beyond the year 2262. The millisecond and sub-millisecond parts were then extracted from this nanosecond-precision timestamp and stored in UUIDv7 values. With this commit, the millisecond and sub-millisecond parts are stored directly into the UUIDv7 value without being converted back to a nanosecond precision timestamp. Following RFC 9562, the timestamp is stored as an unsigned integer, enabling support for dates up to the year 10889. Reported and fixed by Andrey Borodin, with cosmetic changes and regression tests by me. Reported-by: Andrey Borodin <x4mmm@yandex-team.ru> Author: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/96DEC2D9-659A-40E8-B7BA-AF5D162A9E21@yandex-team.ru	2025-03-28 09:39:11 -07:00
Peter Eisentraut	8e993bff53	Tidy up locale thread safety in ECPG library. Remove setlocale() and _configthreadlocal() as fallback strategy on systems that don't have uselocale(), where ECPG tries to control LC_NUMERIC formatting on input and output of floating point numbers. It was probably broken on some systems (NetBSD), and the code was also quite messy and complicated, with obsolete configure tests (Windows). It was also arguably broken, or at least had unstated environmental requirements, if pgtypeslib code was called directly. Instead, introduce PG_C_LOCALE to refer to the "C" locale as a locale_t value. It maps to the special constant LC_C_LOCALE when defined by libc (macOS, NetBSD), or otherwise uses a process-lifetime locale_t that is allocated on first use, just as ECPG previously did itself. The new replacement might be more widely useful. Then change the float parsing and printing code to pass that to _l() functions where appropriate. Unfortunately the portability of those functions is a bit complicated. First, many obvious and useful _l() functions are missing from POSIX, though most standard libraries define some of them anyway. Second, although the thread-safe save/restore technique can be used to replace the missing ones, Windows and NetBSD refused to implement standard uselocale(). They might have a point: "wide scope" uselocale() is hard to combine with other code and error-prone, especially in library code. Luckily they have the _l() functions we want so far anyway. So we have to be prepared for both ways of doing things: 1. In ECPG, use strtod_l() for parsing, and supply a port.h replacement using uselocale() over a limited scope if missing. 2. Inside our own snprintf.c, use three different approaches to format floats. For frontend code, call libc's snprintf_l(), or wrap libc's snprintf() in uselocale() if it's missing. 
For backend code, snprintf.c can keep assuming that the global locale's LC_NUMERIC is "C" and call libc's snprintf() without change, for now. (It might eventually be possible to call our in-tree Ryū routines to display floats in snprintf.c, given the C-locale-always remit of our in-tree snprintf(), but this patch doesn't risk changing anything that complicated.) Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tristan Partin <tristan@partin.io> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech	2025-03-28 16:18:36 +01:00
Peter Eisentraut	2247281c47	Cast result of i64abs() back to int64 Without the cast, the return type could be long or long long, depending on what int64 is underneath. This doesn't affect code correctness, but it could result in format-mismatch warnings when attempting to printf such values using PRId64. Reported-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+hUKGJc4s+Wyb3EFOQNN9VVK+Qv40r2LK41o9PkS9ThxviTvQ@mail.gmail.com	2025-03-28 14:34:57 +01:00
Peter Eisentraut	cdc168ad4b	Add support for not-null constraints on virtual generated columns This was left out of the original patch for virtual generated columns (commit `83ea6c5402`). This just involves a bit of extra work in the executor to expand the generation expressions and run a "IS NOT NULL" test against them. There is also a bit of work to make sure that not-null constraints are checked during a table rewrite. Author: jian he <jian.universality@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Navneet Kumar <thanit3111@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com	2025-03-28 13:53:37 +01:00
Peter Eisentraut	747ddd38cb	Modernize some code a bit Modernize code in ExecRelCheck() and ExecConstraints() a bit, preparing the way for some new code. Co-authored-by: jian he <jian.universality@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Navneet Kumar <thanit3111@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com	2025-03-28 10:49:15 +01:00
Peter Eisentraut	9a9ead1105	Rename a node field for clarity Rename ResultRelInfo.ri_ConstraintExprs to ri_CheckConstraintExprs. This reflects its specific purpose better and avoids confusion with adjacent fields with similar but distinct purposes. Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com	2025-03-28 09:50:01 +01:00
Amit Kapila	fb2ea12f42	pg_createsubscriber: Add '--all' option. The '--all' option indicates that the tool queries the source server (publisher) for all databases and creates subscriptions on the target server (subscriber) for databases with matching names. Without this user needs to explicitly specify all databases by using -d option for each database. This simplifies converting a physical standby to a logical subscriber, particularly during upgrades. The options '--database', '--publication', '--subscription', and '--replication-slot' cannot be used when '--all' is specified. Author: Shubham Khanna <khannashubham1197@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Discussion: https://postgr.es/m/CAHv8RjKhA=_h5vAbozzJ1Opnv=KXYQHQ-fJyaMfqfRqPpnC2bA@mail.gmail.com	2025-03-28 12:26:39 +05:30
Peter Eisentraut	890fc826c9	Use thread-safe strftime_l() instead of strftime(). This removes some setlocale() calls and a lot of commentary about how dangerous that is. strftime_l() is from POSIX 2008, and on Windows we use _wcsftime_l(). While here, adjust error message for strftime_l() failure: it does not in practice set errno (even though POSIX says it could), so no %m. Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com	2025-03-28 07:13:43 +01:00
Amit Kapila	474d7a1fd8	Stabilize tests added in `3abe9dc188`. The problem is that after the ALTER SUBSCRIPTION tap_sub SET PUBLICATION command, we didn't wait for the new walsender to start on the publisher. Immediately after ALTER, we performed an INSERT and expected it to replicate. However, the replication could start from a point after the INSERT location, and as the subscription isn't copying initial data, we could miss such an INSERT. The fix is to wait for the connection to be established between publisher and subscriber before starting DML operations that are expected to replicate. As per CI. Reported-by: Andres Freund <andres@anarazel.de> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/CALDaNm2ms1deM5EYNLFEfESv_Kw=Y4AiTB0LP=qGS-UpFwGbPg@mail.gmail.com	2025-03-28 11:03:05 +05:30
Daniel Gustafsson	058b5152f0	Fix guc_malloc calls for consistency and OOM checks check_createrole_self_grant and check_synchronized_standby_slots were allocating memory on a LOG elevel without checking if the allocation succeeded or not, which would have led to a segfault on allocation failure. On top of that, a number of callsites were using the ERROR level, relying on erroring out rather than returning false to allow the GUC machinery to handle it gracefully. Other callsites used WARNING instead of LOG. While neither is wrong, this changes all check_ functions to do it consistently with LOG. init_custom_variable gets a promoted elevel to FATAL to keep the guc_malloc error handling in line with the rest of the error handling in that function, which already calls FATAL. If we encounter an OOM in this callsite there is no graceful handling to be had, better to error out hard. Backpatch the fix to check_createrole_self_grant down to v16 and the fix to check_synchronized_standby_slots down to v17 where they were introduced. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Nikita <pm91.arapov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Bug: #18845 Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org Backpatch-through: 16	2025-03-27 22:57:34 +01:00
Melanie Plageman	043799fa08	Use streaming read I/O in heap amcheck Instead of directly invoking ReadBuffer() for each unskippable block in the heap relation, verify_heapam() now uses the read stream API to acquire the next buffer to check for corruption. Author: Matheus Alcantara <matheusssilv97@gmail.com> Co-authored-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/flat/CAFY6G8eLyz7%2BsccegZYFj%3D5tAUR-GZ9uEq4Ch5gvwKqUwb_hCA%40mail.gmail.com	2025-03-27 14:04:14 -04:00
Tom Lane	d66997dfe8	Avoid mixing designated and non-designated field initializers. As revised by commit `9324c8c58`, PG_MODULE_MAGIC constructed a struct initializer containing both designated fields and a non-designated "0". That's okay in C, but not in C++, with the result that extensions written in C++ failed to compile. Change it to use only designated field initializers. Author: Yurii Rashkovskii <yrashk@omnigres.com> Discussion: https://postgr.es/m/CAG=VW14mctsR543gpzLCuJ9JgJqwa=ptmBfGvxEjs+k8Jf7-Bg@mail.gmail.com	2025-03-27 11:06:30 -04:00
Daniel Gustafsson	0f3604a518	psql: Fix incorrect equality comparison Commit `1a759c8327` contained an incorrect equality comparison which was discovered by Coverity. Reported-by: Ranier Vilela <ranier.vf@gmail.com> Discussion: https://postgr.es/m/CAEudQApfAWzLo+oSuy2byXktdr7R8KJC_ACT5VV8fontrL35Pw@mail.gmail.com	2025-03-27 14:09:25 +01:00
Álvaro Herrera	9fbd53dea5	Remove the query_id_squash_values GUC Commit `62d712ecfd` introduced the capability to calculate the same queryId for queries with different lengths of constants in a list for an IN clause. This behavior was originally enabled with a GUC query_id_squash_values. After a discussion about the value of such a GUC, it was decided to back out of the use of a GUC and make the squashing behavior the only available option. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com	2025-03-27 13:33:37 +01:00
Peter Eisentraut	5d5f415816	Expand test a bit Make pg_constraint output in inherit test show the convalidated column as well. This shows the interaction between convalidated and conenforced. This is extracted from a larger patch so that this reformatting isn't distracting there. Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com	2025-03-27 12:11:15 +01:00
Peter Eisentraut	b98be8a2a2	Provide thread-safe pg_localeconv_r(). This involves four different implementation strategies: 1. For Windows, we now require _configthreadlocale() to be available and work (commit `f1da075d9a`), and the documentation says that the object returned by localeconv() is in thread-local memory. 2. For glibc, we translate to nl_langinfo_l() calls, because it offers the same information that way as an extension, and that API is thread-safe. 3. For macOS/*BSD, use localeconv_l(), which is thread-safe. 4. For everything else, use uselocale() to set the locale for the thread, and use a big ugly lock to defend against the returned object being concurrently clobbered. In practice this currently means only Solaris. The new call is used in pg_locale.c, replacing calls to setlocale() and localeconv(). Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com	2025-03-27 10:54:28 +01:00
Álvaro Herrera	4a02af8b1a	Simplify syntax for ALTER TABLE ALTER CONSTRAINT NO INHERIT Commit `d45597f72f` introduced the ability to change a not-null constraint from NO INHERIT to INHERIT and vice versa, but we included the SET noise word in the syntax for it. The SET turns out not to be necessary and goes against what the SQL standard says for other ALTER TABLE subcommands, so remove it. This changes the way this command is processed for constraint types other than not-null, so there are some error message changes. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Suraj Kharage <suraj.kharage@enterprisedb.com> Discussion: https://postgr.es/m/202503251602.vsxaehsyaoac@alvherre.pgsql	2025-03-27 09:24:52 +01:00
Michael Paquier	72c2f36d57	libpq: Add TAP tests for service files and names This commit adds a set of regression tests that checks various patterns with service names and service files, with: - Service file with no contents, used as default for PGSERVICEFILE to prevent any lookups at the HOME directory of an environment where the test is run. - Service file with valid service name and its section. - Service file at the root of PGSYSCONFDIR, named pg_service.conf. - Missing service file. - Service name defined as a connection parameter or as PGSERVICE. Note that PGSYSCONFDIR is set to always point at a temporary directory created by the test, so as we never try to look at SYSCONFDIR. This set of tests has come up as a useful independent addition while discussing a patch that adds an equivalent of PGSERVICEFILE as a connection parameter as there have never been any tests for service files and service names. Torsten Foertsch and Ryo Kanbayashi have provided a basic implementation, that I have expanded to what is introduced in this commit. Author: Torsten Foertsch <tfoertsch123@gmail.com> Author: Ryo Kanbayashi <kanbayashi.dev@gmail.com> Author: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAKkG4_nCjx3a_F3gyXHSPWxD8Sd8URaM89wey7fG_9g7KBkOCQ@mail.gmail.com	2025-03-27 16:01:38 +09:00
David Rowley	ad9a23bc4f	Optimize Query jumble `f31aad9b0` adjusted query jumbling so it no longer ignores NULL nodes during the jumble. This added some overhead. Here we tune a few things to make jumbling faster again. This makes jumbling perform similar or even slightly faster than prior to that change. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAApHDvreP04nhTKuYsPw0F-YN+4nr4f=L72SPeFb81jfv+2c7w@mail.gmail.com	2025-03-27 18:34:34 +13:00
David Rowley	f31aad9b07	Fix query jumbling to account for NULL nodes Previously NULL nodes were ignored. This could cause the computed query ID to match for queries where two fields sit next to each other in their Node struct and one field was NULL and the other non-NULL. For example, the Query struct had distinctClause and sortClause next to each other. If someone wrote: SELECT DISTINCT c1 FROM t; and then: SELECT c1 FROM t ORDER BY c1; these would produce the same query ID since, in the first query, we ignored the NULL sortClause and appended the jumble bytes for the distinctClause. In the latter query, we did nothing for the NULL distinctClause and then jumbled the non-NULL sortClause, and since the node representation stored is the same in both cases, the query IDs were identical. Here we fix this by always accounting for NULL nodes by recording that we saw a NULL in the jumble buffer. This fixes the issue as the order in which the NULL is recorded isn't the same in the above two queries. Author: Bykov Ivan <i.bykov@modernsys.ru> Author: Michael Paquier <michael@paquier.xyz> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain	2025-03-27 18:23:00 +13:00
Michael Paquier	44fe6ceb51	doc: Correct description of values used in FSM for indexes The implementation of FSM for indexes is simpler than heap, where 0 is used to track if a page is in-use and (BLCKSZ - 1) if a page is free. One comment in indexfsm.c and one description in the documentation of pg_freespacemap were incorrect about that. Author: Alex Friedman <alexf01@gmail.com> Discussion: https://postgr.es/m/71eef655-c192-453f-ac45-2772fec2cb04@gmail.com Backpatch-through: 13	2025-03-27 10:20:41 +09:00
Andres Freund	c325a7633f	aio: Add io_method=io_uring Performing AIO using io_uring can be considerably faster than io_method=worker, particularly when lots of small IOs are issued, as a) the context-switch overhead for worker based AIO becomes more significant b) the number of IO workers can become limiting io_uring, however, is linux specific and requires an additional compile-time dependency (liburing). This implementation is fairly simple and there are substantial optimization opportunities. The description of the existing AIO_IO_COMPLETION wait event is updated to make the difference between it and the new AIO_IO_URING_EXECUTION clearer. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-26 19:49:13 -04:00
Andres Freund	8eadd5c73c	aio: Add liburing dependency Will be used in a subsequent commit, to implement io_method=io_uring. Kept separate for easier review. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-03-26 19:45:32 -04:00
Andres Freund	9469d7fdd2	aio: Rename pgaio_io_prep_* to pgaio_io_start_* The old naming pattern (mirroring liburing's naming) was inconsistent with the (not yet introduced) callers. It seems better to get rid of the inconsistency now than to grow more users of the odd naming. Reported-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250326001915.bc.nmisch@google.com	2025-03-26 16:10:29 -04:00
Andres Freund	f321ec237a	aio: Pass result of local callbacks to ->report_return Otherwise the results of e.g. temp table buffer verification errors will not reach bufmgr.c. Obviously that's not right. Found while expanding the tests for invalid buffer contents. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250326001915.bc.nmisch@google.com	2025-03-26 16:06:54 -04:00
Andres Freund	96da9050a5	aio: Be more paranoid about interrupts As reported by Noah, it's possible, although practically very unlikely, that interrupts could be processed in between pgaio_io_reopen() and pgaio_io_perform_synchronously(). Prevent that by explicitly holding interrupts. It also seems good to add an assertion to pgaio_io_before_prep() to ensure that interrupts are held, as otherwise FDs referenced by the IO could be closed during interrupt processing. All code in the aio series currently runs the code with interrupts held, but it seems better to be paranoid. Reviewed-by: Noah Misch <noah@leadboat.com> Reported-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250324002939.5c.nmisch@google.com	2025-03-26 16:06:54 -04:00
Robert Haas	8d5ceb113e	pg_overexplain: Additional EXPLAIN options for debugging. There's a fair amount of information in the Plan and PlanState trees that isn't printed by any existing EXPLAIN option. This means that, when working on the planner, it's often necessary to rely on facilities such as debug_print_plan, which produce excessively voluminous output. Hence, use the new EXPLAIN extension facilities to implement EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core EXPLAIN facility. A great deal more could be done here, and the specific choices about what to print and how are definitely arguable, but this is at least a starting point for discussion and a jumping-off point for possible future improvements. Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> (who didn't like it) Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com	2025-03-26 13:52:21 -04:00
Tomas Vondra	818245506c	Keep the decompressed filter in brin_bloom_union The brin_bloom_union() function combines two BRIN summaries, by merging one filter into the other. With bloom, we have to decompress the filters first, but the function failed to update the summary to store the merged filter. As a consequence, the index may be missing some of the data, and return false negatives. This issue exists since BRIN bloom indexes were introduced in Postgres 14, but at that point the union function was called only when two sessions happened to summarize a range concurrently, which is rare. It got much easier to hit in 17, as parallel builds use the union function to merge summaries built by workers. Fixed by storing a pointer to the decompressed filter, and freeing the original one. Free the second filter too, if it was decompressed. The freeing is not strictly necessary, because the union is called in short-lived contexts, but it's tidy. Backpatch to 14, where BRIN bloom indexes were introduced. Reported by Arseniy Mukhin, investigation and fix by me. Reported-by: Arseniy Mukhin Discussion: https://postgr.es/m/18855-1cf1c8bcc22150e6%40postgresql.org Backpatch-through: 14	2025-03-26 17:01:41 +01:00
Tom Lane	55527368bd	Use PG_MODULE_MAGIC_EXT in our installable shared libraries. It seems potentially useful to label our shared libraries with version information, now that a facility exists for retrieving that. This patch labels them with the PG_VERSION string. There was some discussion about using semantic versioning conventions, but that doesn't seem terribly helpful for modules with no SQL-level presence; and for those that do have SQL objects, we typically expect them to support multiple revisions of the SQL definitions, so it'd still not be very helpful. I did not label any of src/test/modules/. It seems unnecessary since we don't install those, and besides there ought to be someplace that still provides test coverage for the original PG_MODULE_MAGIC macro. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com	2025-03-26 11:11:02 -04:00
Tom Lane	9324c8c580	Introduce PG_MODULE_MAGIC_EXT macro. This macro allows dynamically loaded shared libraries (modules) to provide a wired-in module name and version, and possibly other compile-time-constant fields in future. This information can be retrieved with the new pg_get_loaded_modules() function. This feature is expected to be particularly useful for modules that do not have any exposed SQL functionality and thus are not associated with a SQL-level extension object. But even for modules that do belong to extensions, being able to verify the actual code version can be useful. Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com	2025-03-26 11:06:12 -04:00
Daniel Gustafsson	e92c0632c1	Move GSSAPI includes into its own header Due to a conflict in macro names on Windows between <wincrypt.h> and <openssl/ssl.h> these headers need to be included using a predictable pattern with an undef to handle that. The GSSAPI header <gssapi.h> does include <wincrypt.h>, which causes problems with compiling PostgreSQL using MSVC when OpenSSL and GSSAPI are both enabled in the tree. Rather than fixing piecemeal for each file including gssapi headers, move the includes and undef to a new file which should be used to centralize the logic. This patch is a reworked version of a patch by Imran Zaheer proposed earlier in the thread. Once this has proven effective in master we should look at backporting this, as the problem has existed at least since v16. Author: Daniel Gustafsson <daniel@yesql.se> Co-authored-by: Imran Zaheer <imran.zhir@gmail.com> Reported-by: Dave Page <dpage@pgadmin.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/20240708173204.3f3xjilglx5wuzx6@awork3.anarazel.de	2025-03-26 15:31:46 +01:00
Daniel Gustafsson	1eb399366e	psql: Make test robust against locale variations The test committed in `1a759c8327` was prone to failing when using locales with a different decimal separator. Since the test value isn't the important part, change to using an integer instead. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://postgr.es/m/CAFj8pRDE=7uW7QP4rg-OQLE2i-puYsUUt+eHE-L6_b_J9w=eWg@mail.gmail.com	2025-03-26 13:20:56 +01:00
Dean Rasheed	a3b6dfd410	Add support for gamma() and lgamma() functions. These are useful general-purpose math functions which are included in POSIX and C99, and are commonly included in other math libraries, so expose them as SQL-callable functions. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: https://postgr.es/m/CAEZATCXpGyfjXCirFk9au+FvM0y2Ah+2-0WSJx7MO368ysNUPA@mail.gmail.com	2025-03-26 09:35:53 +00:00
Richard Guo	7c82b4f711	Fix integer-overflow problem in scram_SaltedPassword() Setting the iteration count for SCRAM secret generation to INT_MAX will cause an infinite loop in scram_SaltedPassword() due to integer overflow, as the loop uses the "i <= iterations" comparison. To fix, use "i < iterations" instead. Back-patch to v16 where the user-settable GUC scram_iterations has been added. Author: Kevin K Biju <kevinkbiju@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAM45KeEMm8hnxdTOxA98qhfZ9CzGDdgy3mxgJmy0c+2WwjA6Zg@mail.gmail.com	2025-03-26 17:46:51 +09:00
Michael Paquier	787514b30b	Use relation name instead of OID in query jumbling for RangeTblEntry custom_query_jumble (introduced in `5ac462e2b7` as a node field attribute) is now assigned to the expanded reference name "eref" of RangeTblEntry, adding in the query jumble computation the non-qualified aliased relation name, without the list of column names. The relation OID is removed from the query jumbling. The effects of this change can be seen in the tests added by `3430215fe3`, where pg_stat_statements (PGSS) entries are now grouped using the relation name, ignoring the relation search_path may point at. For example, these two relations are different, but are now grouped in a single PGSS entry as they are assigned the same query ID: CREATE TABLE foo1.tab (a int); CREATE TABLE foo2.tab (b int); SET search_path = 'foo1'; SELECT count(*) FROM tab; SET search_path = 'foo2'; SELECT count(*) FROM tab; SELECT count(*) FROM foo1.tab; SELECT count(*) FROM foo2.tab; SELECT query, calls FROM pg_stat_statements WHERE query ~ 'FROM tab'; query | calls --------------------------+------- SELECT count(*) FROM tab | 4 (1 row) It is still possible to use an alias in the FROM clause to split these. This behavior is useful for relations re-created with the same name, where queries based on such relations would be grouped in the same PGSS entry. For permanent schemas, it should not really matter in practice. The main benefit is for workloads that use a lot of temporary relations, which are usually re-created with the same name continuously. These can be a heavy source of bloat in PGSS depending on the workload. Such entries can now be grouped together, improving the user experience. The original idea from Christoph Berg used catalog lookups to find temporary relations, something that the query jumble has never done, and it could cause some performance regressions. 
The idea to use RangeTblEntry.eref and the relation name, applying the same rules for all relations, temporary and not temporary, has been proposed by Tom Lane. The documentation additions have been suggested by Sami Imseih. Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Christoph Berg <myon@debian.org> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de	2025-03-26 15:21:05 +09:00
Jeff Davis	bde2fb797a	Add pg_dump --with-{schema\|data\|statistics} options. By adding the positive variants of options, in addition to the negative variants that already exist, users can be explicit about what pg_dump should produce. Discussion: https://postgr.es/m/bd0513e4b1ea2b2f2d06f02720c6579711cb62a6.camel@j-davis.com Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>	2025-03-25 17:36:38 -07:00
Michael Paquier	27ee6ede6b	Fix two issues with custom_query_jumble in gen_node_support.pl A node field marked with custom_query_jumble and query_jumble_ignore would generate some code of a custom routine. The script is changed so that custom_query_jumble behaves like the other options in this case, query_jumble_ignore taking priority, with no code generated. A comment related to the code generated for node types was misplaced. Thinkos introduced in `5ac462e2b7`. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1324036.1742945060@sss.pgh.pa.us	2025-03-26 09:06:36 +09:00
Tom Lane	cb36f8ec21	Fix order of -I switches for building pg_regress.o. We need the -I switch for libpq_srcdir to come before any -I switches injected by configure. Otherwise there is a risk of pulling in a mismatched version of libpq-fe.h from someplace like /usr/local/include, if the platform has another Postgres version installed there. This evidently accounts for today's buildfarm failures on "anaconda". In principle the -I switch for src/port/ is at similar hazard, and has been for a very long time. But the only .h files we keep there are pg_config_paths.h and pthread-win32.h, neither of which get installed on Unix-ish systems, so the odds of picking up a conflicting header seem pretty small. That doubtless accounts for the lack of prior reports. Back-patch to v17 where pg_regress acquired a build dependency on libpq-fe.h. We could go back further to fix the hazard for src/port/ in older branches, but it seems unlikely to be worth troubling over. Reported-by: Nathan Bossart <nathandbossart@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/Z-MhRzoc7t-nPUQG@nathan Backpatch-through: 17	2025-03-25 20:03:56 -04:00
Nathan Bossart	626d7236b6	pg_upgrade: Add --swap for faster file transfer. This new option instructs pg_upgrade to move the data directories from the old cluster to the new cluster and then to replace the catalog files with those generated for the new cluster. This mode can outperform --link, --clone, --copy, and --copy-file-range, especially on clusters with many relations. However, this mode creates many garbage files in the old cluster, which can prolong the file synchronization step if --sync-method=syncfs is used. To handle that, we recommend using --sync-method=fsync with this mode, and pg_upgrade internally uses "initdb --sync-only --no-sync-data-files" for file synchronization. pg_upgrade will synchronize the catalog files as they are transferred. We assume that the database files transferred from the old cluster were synchronized prior to upgrade. This mode also complicates reverting to the old cluster, so we recommend restoring from backup upon failure during or after file transfer. We did consider teaching pg_upgrade how to generate a revert script for such failures, but we decided against it due to the rarity of failing during file transfer, the complexity of generating the script, and the potential for misusing the script. The new mode is limited to clusters located in the same file system. With some effort, we could probably support upgrades between different file systems, but this mode is unlikely to offer much benefit if we have to copy the files across file system boundaries. It is also limited to upgrades from version 10 or newer. There are a few known obstacles for using swap mode to upgrade from older versions. For example, the visibility map format changed in v9.6, and the sequence tuple format changed in v10. In fact, swap mode omits the --sequence-data option in its uses of pg_dump and instead reuses the old cluster's sequence data files. 
While teaching swap mode to deal with these kinds of changes is surely possible (and we may have to deal with similar problems in the future, anyway), it doesn't seem worth the effort to support upgrades from long-unsupported versions. Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan	2025-03-25 16:02:35 -05:00
Nathan Bossart	9c49f0e8cd	pg_dump: Add --sequence-data. This new option instructs pg_dump to dump sequence data when the --no-data, --schema-only, or --statistics-only option is specified. This was originally considered for commit `a7e5457db8`, but it was left out at that time because there was no known use-case. A follow-up commit will use this to optimize pg_upgrade's file transfer step. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan	2025-03-25 16:02:35 -05:00
Nathan Bossart	cf131fa942	initdb: Add --no-sync-data-files. This new option instructs initdb to skip synchronizing any files in database directories, the database directories themselves, and the tablespace directories, i.e., everything in the base/ subdirectory and any other tablespace directories. Other files, such as those in pg_wal/ and pg_xact/, will still be synchronized unless --no-sync is also specified. --no-sync-data-files is primarily intended for internal use by tools that separately ensure the skipped files are synchronized to disk. A follow-up commit will use this to help optimize pg_upgrade's file transfer step. The --sync-method=fsync implementation of this option makes use of a new exclude_dir parameter for walkdir(). When not NULL, exclude_dir specifies a directory to skip processing. The --sync-method=syncfs implementation of this option just skips synchronizing the non-default tablespace directories. This means that initdb will still synchronize some or all of the database files, but there's not much we can do about that. Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan	2025-03-25 16:02:35 -05:00
Jeff Davis	650ab8aaf1	Stats: use schemaname/relname instead of regclass. For import and export, use schemaname/relname rather than regclass. This is more natural during export, fits with the other arguments better, and it gives better control over error handling in case we need to downgrade more errors to warnings. Also, use text for the argument types for schemaname, relname, and attname so that casts to "name" are not required. Author: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CADkLM=ceOSsx_=oe73QQ-BxUFR2Cwqum7-UP_fPe22DBY0NerA@mail.gmail.com	2025-03-25 11:16:06 -07:00
Daniel Gustafsson	1a759c8327	psql: Make default \watch interval configurable The default interval for \watch to wait between executing queries, when executed without a specified interval, was hardcoded to two seconds. This adds the new variable WATCH_INTERVAL which is used to set the default interval, making it configurable for the user. This makes \watch the first command which has a user configurable default setting. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/B2FD26B4-8F64-4552-A603-5CC3DF1C7103@yesql.se	2025-03-25 17:53:33 +01:00
Daniel Gustafsson	a19db08274	pg_basebackup: Add missing PQclear in error path This adds a missing PQclear in the error path of StreamLogicalLog, a fix in the same vein as `e889422d98` with an equivalent low impact. Author: Steven Niu <niushiji@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/c4b1c627-a3e4-4347-a670-1e28a43ce0eb@gmail.com	2025-03-25 17:24:23 +01:00
Peter Eisentraut	ef7a5af77d	refactor: Pass relation OID instead of Relation to createForeignKeyCheckTriggers() Currently, createForeignKeyCheckTriggers() takes a Relation type as its first argument, but it doesn't use that argument directly. Instead, it fetches the relation OID by calling RelationGetRelid(). Therefore, it would be more consistent with other functions (e.g., createForeignKeyActionTriggers()) to pass the relation OID directly instead of the whole Relation. Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com	2025-03-25 17:04:12 +01:00
Peter Eisentraut	639238b978	refactor: Split ATExecAlterConstraintInternal() Split ATExecAlterConstraintInternal() into two functions: ATExecAlterConstrDeferrability() and ATExecAlterConstrInheritability(). This simplifies the code and avoids unnecessary confusion caused by recursive code, which isn't needed for ATExecAlterConstrInheritability(). (This also takes over the changes in commit `64224a834c`, as the new AlterConstrDeferrabilityRecurse() is essentially the old ATExecAlterChildConstr().) Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com	2025-03-25 16:18:00 +01:00
Peter Eisentraut	a3280e2a49	refactor: Move some code that updates pg_constraint to a separate function This extracts common/duplicate code for different ALTER CONSTRAINT variants into a common function. We plan to add more variants that would use the same code. Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com	2025-03-25 14:37:22 +01:00
Peter Eisentraut	f4b2a62ae3	Small fixes for Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERIT Small fixes for commit `f4e53e10b6`: Add missing calls to InvokeObjectPostAlterHook() and also CacheInvalidateRelcache(). The former change could have a user-visible effect. The latter omission might have caused other bugs, but it is not clear whether one actually existed. With these changes, the code is now more consistent with similar ALTER CONSTRAINT variants, especially the ones that set the deferrability. Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com	2025-03-25 13:40:24 +01:00
Thomas Munro	3c86223c99	libpq: Deprecate pg_int64. Previously we used pg_int64 in three function prototypes in libpq. It was added by commit `461ef73f` to expose the platform-dependent type used for int64 in the C89 era. As of commit `962da900` it is defined as standard int64_t, and the dust seems to have settled. Let's just use int64_t directly in these three client-facing functions instead of (yet) another name. We've required C99 and thus <stdint.h> since PostgreSQL 12, C89 and C++98 compilers are long gone, and client applications very likely use standard types for their own 64-bit needs. This also cleans up the obscure placement of a new #include <stdint.h> directive in postgres_ext.h, required for the new definition. The typedef was hiding in there for historical reasons, but it doesn't fit postgres_ext.h's own description of its purpose and there is no evidence of client applications including postgres_ext.h directly to see it. Keep a typedef marked deprecated for backward compatibility, but move it into libpq-fe.h where it was used. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGKn_EkNNGMY5RzMcKP%2Ba6urT4JF%3DCPhw_zHtQwjvX6P2g%40mail.gmail.com	2025-03-25 21:40:00 +13:00
Peter Eisentraut	be1cc9aaf5	Generalize index support in network support function The network (inet) support functions currently only supported a hardcoded btree operator family. With the generalized compare type facility, we can generalize this to support any operator family from any index type that supports the required operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-03-25 07:11:56 +01:00
Michael Paquier	5ac462e2b7	Add support for custom_query_jumble as a node field attribute This option gives the possibility for query jumble to define a custom routine for the field of a Node, extending support for custom_query_jumble as a node field attribute. When dealing with complex node structures, this can be simpler than having to enforce a custom function across a full node. Custom functions need to be defined in queryjumblefuncs.c, named as _jumble${node}_${field}(), and take as input the JumbleState, the node and its field. The field is not really required if we have the Node, but it makes custom implementations somewhat easier to think about. The code generated by gen_node_support.pl uses a macro called JUMBLE_CUSTOM(), hiding the internals of the logic inside queryjumblefuncs.c. This will be used by an upcoming patch adding a custom routine to a field of RangeTblEntry, but this facility can become useful in more cases. Reviewed-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/Z9y43-dRvb4EtxQ0@paquier.xyz	2025-03-25 14:18:00 +09:00
Jeff Davis	626df47ad9	Remove 'additional' pointer from TupleHashEntryData. Reduces memory required for hash aggregation by avoiding an allocation and a pointer in the TupleHashEntryData structure. That structure is used for all buckets, whether occupied or not, so the savings is substantial. Discussion: https://postgr.es/m/AApHDvpN4v3t_sdz4dvrv1Fx_ZPw=twSnxuTEytRYP7LFz5K9A@mail.gmail.com Reviewed-by: David Rowley <dgrowleyml@gmail.com>	2025-03-24 22:06:02 -07:00
Jeff Davis	a0942f441e	Add ExecCopySlotMinimalTupleExtra(). Allows an "extra" argument that allocates extra memory at the end of the MinimalTuple. This is important for callers that need to store additional data, but do not want to perform an additional allocation. Suggested-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvppeqw2pNM-+ahBOJwq2QmC0hOAGsmCpC89QVmEoOvsdg@mail.gmail.com	2025-03-24 22:05:53 -07:00
Jeff Davis	4d143509cb	Create accessor functions for TupleHashEntry. Refactor for upcoming optimizations. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/1cc3b400a0e8eead18ff967436fa9e42c0c14cfb.camel@j-davis.com	2025-03-24 22:05:41 -07:00
Jeff Davis	cc721c459d	HashAgg: use Bump allocator for hash TupleHashTable entries. The entries aren't freed until the entire hash table is destroyed, so use the Bump allocator to improve allocation speed, avoid wasting space on the chunk header, and avoid wasting space due to the power-of-two allocations. Discussion: https://postgr.es/m/CAApHDvqv1aNB4cM36FzRwivXrEvBO_LsG_eQ3nqDXTjECaatOQ@mail.gmail.com Reviewed-by: David Rowley	2025-03-24 22:05:33 -07:00
Amit Kapila	cc4331605a	Fix the typo in the test case added in `73eba5004a`. Author: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CALDaNm2ms1deM5EYNLFEfESv_Kw=Y4AiTB0LP=qGS-UpFwGbPg@mail.gmail.com Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com	2025-03-25 09:39:53 +05:30
Amit Kapila	b87ced747d	Fix an oversight in `3abe9dc188`. Forgot to update the comment atop one of the functions. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/OSCPR01MB1496623BE1125B44614494E7AF5A72@OSCPR01MB14966.jpnprd01.prod.outlook.com	2025-03-25 09:26:23 +05:30
Andres Freund	adb5f85fa5	Redefine max_files_per_process to control additionally opened files Until now max_files_per_process=N limited each backend to open N files in total (minus a safety factor), even if there were already more files opened in postmaster and inherited by backends. Change max_files_per_process to control how many additional files each process is allowed to open. The main motivation for this is the patch to add io_method=io_uring, which needs to open one file for each backend. Without this patch, even if RLIMIT_NOFILE is high enough, postmaster will fail in set_max_safe_fds() if started with a high max_connections. The cause of the failure is that, until now, set_max_safe_fds() subtracted the already open files from max_files_per_process. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/w6uiicyou7hzq47mbyejubtcyb2rngkkf45fk4q7inue5kfbeo@bbfad3qyubvs Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d=h9J=D1rStFCMR+t7yh_Kwj-g87aLQ@mail.gmail.com	2025-03-24 18:20:18 -04:00
Nathan Bossart	7d559c8580	Expand comment for isset_offset. This field was added in commit `0164a0f9ee` to provide a way to determine whether a storage parameter was explicitly set for the relation or if it just picked up the default value. In most cases, this can be accomplished by giving the storage parameter a special out-of-range default value (e.g., the autovacuum_vacuum_insert_threshold storage parameter defaults to -2), but this approach doesn't work in all cases. For example, a Boolean storage parameter cannot be given an out-of-range default, so we need another way to discover the source of its value. Reported-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/CAKFQuwYKtEUYKS%2B18gRs-xPhn0qOJgM2KGyyWVCODHuVn9F-XQ%40mail.gmail.com	2025-03-24 15:47:02 -05:00
Melanie Plageman	aea916fe55	Fix bitmapheapscan incorrect recheck of NULL tuples The bitmap heap scan skip fetch optimization skips fetching the heap block when a page is set all-visible in the visibility map and no columns from the table are needed to satisfy the query. `2b73a8cd33` and `c3953226a0` changed the control flow of bitmap heap scan to use the read stream API. The read stream API returns buffers containing blocks to the user. To make this work with the skip fetch optimization, we keep a count of the empty tuples we need to emit for all the blocks skipped and only emit the empty tuples after processing the next block fetched from the heap or at the end of the scan. It's incorrect to recheck NULL tuples, so we must set `recheck` to false before yielding control back to BitmapHeapNext(). This was done before emitting any remaining empty tuples at the end of the scan but not for empty tuples emitted during the scan. This meant that if a page fetched from the heap did require recheck and set `recheck` to true and then we emitted empty tuples for subsequent blocks, we would get wrong results. Fix this by always setting `recheck` to false before emitting empty tuples. Reported-by: Alexander Lakhin <exclusion@gmail.com> Tested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/496f7acd-881c-4df3-9bd3-8f8534dfec26%40gmail.com	2025-03-24 16:40:59 -04:00
Álvaro Herrera	0e3e0ec06b	Fix typo	2025-03-24 17:36:44 +01:00
Fujii Masao	c68100aa43	Allow pg_recvlogical --drop-slot to work without --dbname. When pg_recvlogical was introduced in 9.4, the --dbname option was not required for --drop-slot. Without it, pg_recvlogical --drop-slot connected using a replication connection (not tied to a specific database) and was able to drop both physical and logical replication slots, similar to pg_receivewal --drop-slot. However, commit `0c013e08cf` unintentionally changed this behavior in 9.5, making pg_recvlogical always check whether it's connected to a specific database and fail if it's not. This change was expected for --create-slot and --start, which handle logical replication slots and require a database connection, but it was unnecessary for --drop-slot, which should work with any replication connection. As a result, --dbname became a required option for --drop-slot. This commit removes that restriction, restoring the original behavior and allowing pg_recvlogical --drop-slot to work without specifying --dbname. Although this issue originated from an unintended change, it has existed for a long time without complaints or bug reports, and the documentation never explicitly stated that --drop-slot should work without --dbname. Therefore, the change is not treated as a bug fix and is applied only to master. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/b15ecf4f-e5af-4fbb-82c2-a425f453e0b2@oss.nttdata.com	2025-03-25 00:18:27 +09:00
Magnus Hagander	a8eeb22f17	psql: use consistent alias for pg_description Author: Jelte Fennema-Nio <github-tech@jeltef.nl> Suggested-By: Michael Banck <mbanck@gmx.net> Discussion: https://www.postgresql.org/message-id/67813520.170a0220.183245.7bf0%40mx.google.com	2025-03-24 14:31:28 +01:00
Magnus Hagander	d696406a9b	psql: show default extension version in \dx output Reviewed-By: Julien Rouhaud <rjuju123@gmail.com> Reviewed-By: Michael Banck <mbanck@gmx.net> Reviewed-By: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/CABUevEyTMyXC6OvCWkj+rPnHrfi8_Rw_+DD_jzgFFNPqgf+Oig@mail.gmail.com	2025-03-24 14:25:05 +01:00
Heikki Linnakangas	19c6eb06c5	Add test case for when subscriber table is missing a column We haven't had bugs in this area, but there's some not-entirely trivial code to detect that case, so it seems good to have test coverage for it. Author: Peter Smith <smithpb2250@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://www.postgresql.org/message-id/CAHut%2BPtX8P0EGhsk9p%3DhQGUHrzxeCSzANXSMKOvYiLX-EjdyNw@mail.gmail.com	2025-03-24 12:13:32 +02:00
Amit Kapila	73eba5004a	Detect and Log multiple_unique_conflicts type conflict. Introduce a new conflict type, multiple_unique_conflicts, to handle cases where an incoming row during logical replication violates multiple UNIQUE constraints. Previously, the apply worker detected and reported only the first encountered key conflict (insert_exists/update_exists), causing repeated failures as each constraint violation needs to be handled one by one making the process slow and error-prone. With this patch, the apply worker checks all unique constraints upfront once the first key conflict is detected and reports multiple_unique_conflicts if multiple violations exist. This allows users to resolve all conflicts at once by deleting all conflicting tuples rather than dealing with them individually or skipping the transaction. In the future, this will also allow us to specify different resolution handlers for such a conflict type. Add the stats for this conflict type in pg_stat_subscription_stats. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com	2025-03-24 12:30:44 +05:30
David Rowley	35a92b7c25	Add tests for POSITION(bytea, bytea) Previously there was no coverage for this function. Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Rustam ALLAKOV <rustamallakov@gmail.com> Discussion: https://postgr.es/m/CAJ7c6TMT6XCooMVKnCd_tR2oBdGcnjefSeCDCv8jzKy9VkWA5w@mail.gmail.com	2025-03-24 19:32:02 +13:00
Michael Paquier	2a0cd38da5	Allow plugins to set a 64-bit plan identifier in PlannedStmt This field can be optionally set in a PlannedStmt through the planner hook, giving extensions the possibility to assign an identifier related to a computed plan. The backend is changed to report it in the backend entry of a process running (including the extended query protocol), with semantics and APIs to set or get it similar to what is used for the existing query ID (introduced in the backend via `4f0b0966c8`). The plan ID is reset at the same timing as the query ID. Currently, this information is not added to the system view pg_stat_activity; extensions can access it through PgBackendStatus. Some patches have been proposed to provide some features in the planning area, where a plan identifier is used as a key to know the plan involved (for statistics, plan storage and manipulations, etc.), and the point of this commit is to provide an anchor in the backend that extensions can rely on for future work. The reset of the plan identifier is controlled by core and follows the same pattern as the query identifier added in `4f0b0966c8`. The contents of this commit are extracted from a larger set proposed originally by Lukas Fittl, that Sami Imseih has proposed as an independent change, with a few tweaks sprinkled by me. Author: Lukas Fittl <lukas@fittl.com> Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com Discussion: https://postgr.es/m/CAA5RZ0vyWd4r35uUBUmhngv8XqeiJUkJDDKkLf5LCoWxv-t_pw@mail.gmail.com	2025-03-24 13:23:42 +09:00
Tom Lane	8a3e4011f0	psql: Add tab completion for VACUUM and ANALYZE ... ONLY option. Improve psql's tab completion for VACUUM and ANALYZE by supporting the ONLY option introduced in `62ddf7ee9`. In passing, simplify some of the VACUUM patterns by making use of MatchAnyN. Author: Umar Hayat <postgresql.wizard@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Discussion: https://postgr.es/m/CAD68Dp3L6yW_nWs+MWBs6s8tKLRzXaQdQgVRm4byZe0L-hRD8g@mail.gmail.com	2025-03-23 17:16:08 -04:00
Heikki Linnakangas	2817525f0d	Fix rare assertion failure in standby, if primary is restarted During hot standby, ExpireAllKnownAssignedTransactionIds() and ExpireOldKnownAssignedTransactionIds() functions mark old transactions as no-longer running, but they failed to update xactCompletionCount and latestCompletedXid. AFAICS it would not lead to incorrect query results, because those functions effectively turn in-progress transactions into aborted transactions and an MVCC snapshot considers both as "not visible". But it could surprise GetSnapshotDataReuse() and trigger the "TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin))" assertion in it, if the apparent xmin in a backend would move backwards. We saw this happen when GetCatalogSnapshot() would reuse an older catalog snapshot, when GetTransactionSnapshot() had already advanced TransactionXmin. The bug goes back all the way to commit `623a9ba79b` in v14 that introduced the snapshot reuse mechanism, but it started to happen more frequently with commit `952365cded` which removed a GetTransactionSnapshot() call from backend startup. That made it more likely for ExpireOldKnownAssignedTransactionIds() to be called between GetCatalogSnapshot() and the first GetTransactionSnapshot() in a backend. Andres Freund first spotted this assertion failure on buildfarm member 'skink'. Reproduction and analysis by Tomas Vondra. Backpatch-through: 14 Discussion: https://www.postgresql.org/message-id/oey246mcw43cy4qw2hqjmurbd62lfdpcuxyqiu7botx3typpax%40h7o7mfg5zmdj	2025-03-23 20:41:16 +02:00
Noah Misch	f0446384ea	Fix "make clean" for new TAP suite. Commit `28f04984f0` missed this.	2025-03-23 06:12:02 -07:00
Andres Freund	ca3067cc57	aio: Change prefix of PgAioResultStatus values to PGAIO_RS_ The previous prefix wasn't consistent with the naming of other AIO related enum values. It seems best to rename it before the users are introduced. Reported-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_Yb+JzQpNsgUxCB0gBi+sE-mi_HmcJF6ALnmO4W+UgwpA@mail.gmail.com	2025-03-22 17:30:44 -04:00
Tom Lane	58fdca2204	plpgsql: make WHEN OTHERS distinct from WHEN SQLSTATE '00000'. The catchall exception condition OTHERS was represented as sqlerrstate == 0, which was a poor choice because that comes out the same as SQLSTATE '00000'. While we don't issue that as an error code ourselves, there isn't anything particularly stopping users from doing so. Use -1 instead, which can't match any allowed SQLSTATE string. While at it, invent a macro PLPGSQL_OTHERS to use instead of a hard-coded magic number. While this seems like a bug fix, I'm inclined not to back-patch. It seems barely possible that someone has written code like this and would be annoyed by changing the behavior in a minor release. Reported-by: David Fiedler <david.fido.fiedler@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAHjN70-=H5EpTOuZVbC8mPvRS5EfZ4MY2=OUdVDWoyGvKhb+Rw@mail.gmail.com	2025-03-22 14:17:00 -04:00
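The collision described in 58fdca2204 can be illustrated with a small sketch. It assumes SQLSTATE strings are packed into an int six bits per character, modeled on the encoding PostgreSQL uses for error codes. The macro and function names below (SKETCH_SIXBIT, SKETCH_PACK_SQLSTATE, SKETCH_OLD_OTHERS, SKETCH_NEW_OTHERS, pack_sqlstate, condition_matches) are local to this sketch and are not the plpgsql source; only the marker values 0 and -1 come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch-local packing: six bits per character, five characters. */
#define SKETCH_SIXBIT(ch) (((ch) - '0') & 0x3F)
#define SKETCH_PACK_SQLSTATE(c1, c2, c3, c4, c5) \
    (SKETCH_SIXBIT(c1) + (SKETCH_SIXBIT(c2) << 6) + (SKETCH_SIXBIT(c3) << 12) + \
     (SKETCH_SIXBIT(c4) << 18) + (SKETCH_SIXBIT(c5) << 24))

#define SKETCH_OLD_OTHERS 0     /* old marker: same value as packed '00000' */
#define SKETCH_NEW_OTHERS (-1)  /* new marker: packed values use 30 bits and are never negative */

/* Packs a five-character SQLSTATE string such as "22012". */
int
pack_sqlstate(const char *s)
{
    assert(s != NULL);
    return SKETCH_PACK_SQLSTATE(s[0], s[1], s[2], s[3], s[4]);
}

/*
 * Hypothetical matcher: a handler condition matches if it is the
 * catch-all marker or if it equals the raised error code.
 */
bool
condition_matches(int cond, int others_marker, int raised)
{
    return cond == others_marker || cond == raised;
}
```

Under the old marker, a handler written as WHEN SQLSTATE '00000' packs to 0 and is therefore indistinguishable from OTHERS, so it would match any raised error. With -1 as the marker, the same handler only matches code '00000'.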
Peter Geoghegan	9a2e2a285a	Improve nbtree array primitive scan scheduling. Add a new scheduling heuristic: don't end the ongoing primitive index scan immediately (at the point where _bt_advance_array_keys notices that the next set of matching tuples must be on a later page) if the primscan already managed to step right/left from its first leaf page. Schedule a recheck against the next sibling leaf page's finaltup instead. The new heuristic tends to avoid scenarios where the top-level scan repeatedly starts and ends primitive index scans that each read only one leaf page from a group of neighboring leaf pages. Affected top-level scans will now tend to step forward (or backward) through the index instead, without wasting cycles on descending the index anew. The recheck mechanism isn't exactly new. But up until now it has only been used to deal with edge cases involving high key finaltups with one or more truncated -inf attributes that _bt_advance_array_keys deemed "provisionally satisfied" (satisfied for the purposes of allowing the scan to step onto the next page, subject to recheck once on that page). The mechanism was added by commit `5bf748b8`, which invented the general concept of primitive scan scheduling. It was later enhanced by commit `79fa7b3b`, which taught it about cases involving -inf attributes that satisfy inequality scan keys required in the opposite-to-scan direction only (arguably, they should have been covered by the earliest version). Now the recheck mechanism can be applied based on scan-level heuristics, which have nothing to do with truncated high keys. Now rechecks might be performed by _bt_readpage when scanning in _either_ scan direction. The theory behind the new heuristic is that any primitive scan that makes it past its first leaf page is one that is already likely to have arrays whose key values match index tuples that are closely clustered together in the index. 
The rules that determine whether we ever get past the first page are still conservative (that'll still only happen when pstate.finaltup strongly suggests that it's the right thing to do). Surviving past the first leaf page is a strong signal in itself. Preparation for an upcoming patch that will add skip scan optimizations to nbtree. That'll work by adding skip arrays, which behave similarly to SAOP arrays, but generate their elements procedurally and on-demand. Note that this commit isn't specifically concerned with skip arrays; the scheduling logic doesn't (and won't) condition anything on whether the scan uses skip arrays, SAOP arrays, or some combination of the two (which seems like a good general principle for _bt_advance_array_keys). While the problems that this commit ameliorates are more likely with skip arrays (at least in practice), SAOP arrays (or those with very dense, contiguous array elements) are also affected. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wzkz0wPe6+02kr+hC+JJNKfGtjGTzpG3CFVTQmKwWNrXNw@mail.gmail.com	2025-03-22 13:02:18 -04:00
Melanie Plageman	e215166c9c	Use streaming read I/O in SP-GiST vacuuming Like `69273b818b` did for GiST vacuuming, make SP-GiST vacuum use the read stream API for vacuuming physically contiguous index pages. Concurrent insertions may cause SP-GiST index tuples to be redirected. While vacuuming, these are added to a pending list which is later processed to ensure no dead tuples are left behind. Pages containing such tuples are still read by directly calling ReadBuffer() and do not use the read stream API. Author: Andrey M. Borodin <x4mmm@yandex-team.ru> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/37432403-8657-403B-9CDF-5A642BECDD81%40yandex-team.ru	2025-03-21 17:51:22 -04:00
Thomas Munro	e51ca405ed	Fix ps display for IO workers. This code must have missed a memo about the backend type description being supplied automatically these days, and was duplicating that information. Before: "io worker io worker: N" After: "io worker N"	2025-03-22 10:13:23 +13:00
Tom Lane	16a3ae504e	Revert inappropriate weakening of an Assert in plpgsql. Commit `682ce911f` modified exec_save_simple_expr to accept a Param in the tlist of a Gather node, rather than the normal case of a Var referencing the Gather's input. It turns out that this was a kluge to work around the bug later fixed in `0f7ec8d9c`, namely that setrefs.c was failing to replace Params in upper plan nodes with Var references to the same Params appearing in the child tlists. With that fixed, there seems no reason to continue to allow a Param here. (Moreover, even if we did expect a Param here, the semantically correct thing to do would be to take the Param as the expression being sought. Whatever it may represent, it is not a reference to the child.) Hence, revert that part of `682ce911f`. That all happened a long time ago. However, since the net effect here is just to tighten an Assert condition, I'm content to change it only in master. Discussion: https://postgr.es/m/1565347.1742572349@sss.pgh.pa.us	2025-03-21 15:55:06 -04:00
Masahiko Sawada	04ff636cbc	Add GUC option to control maximum active replication origins. This commit introduces a new GUC option max_active_replication_origins to control the maximum number of active replication origins. Previously, this was controlled by 'max_replication_slots'. Having a separate GUC option provides better flexibility for setting up subscribers, as they may not require replication slots (for cascading replication) but always require replication origins. Author: Euler Taveira <euler@eulerto.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/b81db436-8262-4575-b7c4-bc0c1551000b@app.fastmail.com	2025-03-21 12:20:15 -07:00
Tom Lane	0e032a2240	Place "extern" declaration in the right part of pg_class.h. errdetail_relkind_not_supported() was declared within EXPOSE_TO_CLIENT_CODE, which is mistaken since that function isn't available client-side. While relatively harmless, this isn't good precedent. Discussion: https://postgr.es/m/1134562.1742507765@sss.pgh.pa.us	2025-03-21 15:14:15 -04:00
Tom Lane	cd72c1b76e	Label the contents of pg__d.h files a little better. Make genbki.pl emit some boilerplate comments identifying the sections of the pg__d.h files that it generates. This is in hopes of making them slightly more readable, in case people look at those files and not the pg_.h/pg_.dat originals. Discussion: https://postgr.es/m/1134562.1742507765@sss.pgh.pa.us	2025-03-21 15:09:46 -04:00
Melanie Plageman	69273b818b	Use streaming read I/O in GiST vacuuming Like `c5c239e26e` did for btree vacuuming, make GiST vacuum use the read stream API for sequentially processed pages. Because it is possible for concurrent insertions to relocate unprocessed index entries to already vacuumed pages, GiST vacuum must backtrack and reprocess those pages. These pages are still read with explicit ReadBuffer() calls. Author: Andrey M. Borodin <x4mmm@yandex-team.ru> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/EFEBED92-18D1-4C0F-A4EB-CD47072EF071%40yandex-team.ru	2025-03-21 14:06:45 -04:00
Melanie Plageman	3f850c3fc5	Assorted trivial cleanup of `c5c239e26e` `c5c239e26e` made btree vacuum use the read stream API. Though it used functions declared in read_stream.h, it relied on transitively including it. Explicitly include that file. Also remove an extraneous newline and decrease the scope of one of the local variables in btvacuumscan().	2025-03-21 14:06:40 -04:00
Tom Lane	7fe312f609	Fix plpgsql's handling of simple expressions in scrollable cursors. exec_save_simple_expr did not account for the possibility that standard_planner would stick a Materialize node atop the plan of even a simple Result, if CURSOR_OPT_SCROLL is set. This led to an "unexpected plan node type" error. This is a very old bug, but it'd only be reached by declaring a cursor for a "SELECT simple-expression" query and explicitly marking it scrollable, which is an odd thing to do. So the lack of prior reports isn't too surprising. Bug: #18859 Reported-by: Olleg Samoylov <splarv@ya.ru> Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18859-0d5f28ac99a37059@postgresql.org Backpatch-through: 13	2025-03-21 11:30:42 -04:00
Melanie Plageman	c5c239e26e	Use streaming read I/O in btree vacuuming Btree vacuum processes all index pages in physical order. Now it uses the read stream API to get the next buffer instead of explicitly invoking ReadBuffer(). It is possible for concurrent insertions to cause page splits during index vacuuming. This can lead to index entries that have yet to be vacuumed being moved to pages that have already been vacuumed. Btree vacuum code handles this by backtracking to reprocess those pages. So, while sequentially encountered pages are now read through the read stream API, backtracked pages are still read with explicit ReadBuffer() calls. Author: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_bW1UOyup%3DjdFw%2BkOF9bCaAm%3D9UpiyZtbPMn8n_vnP%2Big%40mail.gmail.com#3b3a84132fc683b3ee5b40bc4c2ea2a5	2025-03-21 09:09:39 -04:00
Álvaro Herrera	1d617a2028	Change one loop in ATRewriteTable to use 1-based attnums All TupleDescAttr() calls in tablecmds.c that aren't in loops across all attributes use AttrNumber-style indexes (1-based); there was only one place in ATRewriteTable that was stashing 0-based indexes in a list for later processing. Switch that to use attnums for consistency. Author: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEoYA5ScUr2=CmA1xcpaS_1ixneDbEkVU77X1ctGxY2mA@mail.gmail.com	2025-03-21 10:55:06 +01:00
Thomas Munro	ce1a75c4fe	Support buffer forwarding in StartReadBuffers(). StartReadBuffers() reports a short read when it finds a cached block that ends a range needing I/O by updating the caller's nblocks. It doesn't want to have to unpin the trailing hit that it knows the caller wants, so the v17 version used sleight of hand in the name of simplicity: it included it in nblocks as if it were part of the I/O, but internally tracked the shorter real I/O size in io_buffers_len (now removed). This API change "forwards" the delimiting buffer to the next call. It's still pinned, and still stored in the caller's array, but *nblocks no longer includes stray buffers that are not really part of the operation. The expectation is that the caller still wants the rest of the blocks and will call again starting from that point, and now it can pass the already pinned buffer back in (or choose not to and release it). The change is needed for the coming asynchronous I/O version's larger version of the problem: by definition it must move BM_IO_IN_PROGRESS negotiation from WaitReadBuffers() to StartReadBuffers(), but it might already have many buffers pinned before it discovers a need to split an I/O. (The current synchronous I/O version hides that detail from callers by looping over smaller reads if required to make all covered buffers valid in WaitReadBuffers(), so it looks like one operation but it might occasionally be several under the covers.) Aside from avoiding unnecessary pin traffic, this will also be important for later work on out-of-order streams: you can't prioritize data that is already available right now if that fact is hidden from you. The new API is natural for read_stream.c (see `ed0b87ca`). After a short read it leaves forwarded buffers where they fell in its circular queue for the continuing call to pick up. Single-block StartReadBuffer() and traditional ReadBuffer() share code but are not affected by the change. They don't do multi-block I/O. 
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com	2025-03-21 20:43:59 +13:00
Thomas Munro	ed0b87caac	Support buffer forwarding in read_stream.c. In preparation for a follow-up change to the buffer manager, teach read_stream.c to manage buffers "forwarded" from one StartReadBuffers() call to the next after a short read. This involves a small amount of extra book-keeping, and opens the way for lower levels to split I/O operations without having to drop pins, as required for efficient handling of various edge cases. Concretely, the "buffers" argument will change from an out parameter to an in/out parameter. Buffer queue elements must be initialized on first use and cleared after they're consumed, but forwarded buffers are left where they fall ahead of the current pending read in the queue, ready for use by the operation that continues where a short read left off. The stream also needs to count them for pin limit management and release them on reset/early end. Tested-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com	2025-03-21 18:44:47 +13:00
David Rowley	00b52c3db6	Simplify EXPLAIN code for Memoize This removes a needless special case for Memoize's FORMAT TEXT EXPLAIN output. ExplainPropertyText() outputs the same thing in text mode as the special-case code was doing, so removing the special-case code results in the same EXPLAIN output, just with less code. It seems like a good idea to fix this to help prevent future changes in this area from copying the same pattern. Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/88a71bcd-0b5c-4d0b-8107-757e96f402d5@tantorlabs.com	2025-03-21 13:40:05 +13:00
Andres Freund	202b12774d	bufmgr: Improve stats when a buffer is read in concurrently Previously we would have the following inaccuracies when a backend tried to read in a buffer, but that buffer was read in concurrently by another backend: - the read IO was double-counted in the global buffer access stats (pgBufferUsage) - the buffer hit was not accounted for in: - global buffer access statistics - pg_stat_io - relation level IO stats - vacuum cost balancing While trying to read in a buffer that is concurrently read in by another backend is not a common occurrence, it's also not that rare, e.g. due to concurrent sequential scans on the same relation. This scenario has become more likely in PG 17, due to the introduction of read streams, which can pin multiple buffers before calling StartBufferIO() for all the buffers. This behaviour has historically grown, but there doesn't seem to be any reason to continue with the wrong accounting. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_Zk-B08AzPsO-6680LUHLOCGaNJYofaxTFseLa=OepV1g@mail.gmail.com	2025-03-20 19:58:22 -04:00
Andres Freund	fc51a60dd4	smgr: Hold interrupts in most smgr functions We need to hold interrupts across most of the smgr.c/md.c functions, as otherwise interrupt processing, e.g. due to a < ERROR elog/ereport, can trigger procsignal processing, which in turn can trigger smgrreleaseall(). As the relevant code is not reentrant, we quickly end up in a bad situation. The only reason we haven't noticed this before is that there is only one non-error ereport called in affected routines, in register_dirty_segments(), and that one is extremely rarely reached. If one enables fd.c's FDDEBUG it's easy to reproduce crashes. It seems better to put the HOLD_INTERRUPTS()/RESUME_INTERRUPTS() in smgr.c, instead of trying to push them down to md.c where possible: For one, every smgr implementation would be vulnerable, for another, a good bit of smgr.c code itself is affected too. Eventually we might want a more targeted solution, allowing e.g. a networked smgr implementation to be interrupted, but many other, more complicated, problems would need to be fixed for that to be viable (e.g. smgr.c is often called with interrupts already held). One could argue this should be backpatched, but the existing < ERROR elog/ereports that can be reached with unmodified sources are unlikely to be reached. On balance the risk of backpatching seems higher than the gain - at least for now. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl	2025-03-20 17:33:57 -04:00
Robert Haas	50ba65e733	Add an additional hook for EXPLAIN option validation. Commit `c65bc2e1d1` made it possible for loadable modules to add EXPLAIN options. Normally, any necessary validation can be performed by the hook function passed to RegisterExtensionExplainOption, but if a loadable module wants to sanity check options against each other, that needs to be done after the entire options list has been processed. So, add an additional hook for that purpose. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CAA5RZ0vOcJF91O2e5AQN+V6guMNLMhJx83dxALf-iUZ-hLGO_Q@mail.gmail.com	2025-03-20 13:47:55 -04:00
Nathan Bossart	af0d4901c1	Add test for pg_upgrade file transfer modes. This new test checks all of pg_upgrade's file transfer modes. For each mode, we verify that pg_upgrade either succeeds (and some test objects successfully reach the new version) or fails with an error that indicates the mode is not supported on the current platform. For cross-version tests, we also check that pg_upgrade transfers non-default tablespaces. (Tablespaces can't be tested on same version upgrades because of the version-specific subdirectory conflict, but we might be able to enable such tests once we teach pg_upgrade how to handle in-place tablespaces.) Suggested-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan	2025-03-20 11:08:42 -05:00
Nathan Bossart	0164a0f9ee	Add vacuum_truncate configuration parameter. This new parameter works just like the storage parameter of the same name: if set to true (which is the default), autovacuum and VACUUM attempt to truncate any empty pages at the end of the table. It is primarily intended to help users avoid locking issues on hot standbys. The setting can be overridden with the storage parameter or VACUUM's TRUNCATE option. Since there's presently no way to determine whether a Boolean storage parameter is explicitly set or has just picked up the default value, this commit also introduces an isset_offset member to relopt_parse_elt. Suggested-by: Will Storey <will@summercat.com> Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Robert Treat <rob@xzilla.net> Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null	2025-03-20 10:16:50 -05:00
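The override rules described in commit 0164a0f9ee can be pictured with a small sketch. This is not the PostgreSQL implementation: the type and function names below are hypothetical, and the relative priority of VACUUM's TRUNCATE option over the storage parameter is an assumption (the commit message only states that both override the configuration parameter). The `isset` field stands in for the idea behind the new `isset_offset` member, namely telling an explicitly set Boolean storage parameter apart from one that merely holds its default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state for an option given on the VACUUM command. */
typedef enum { TRUNC_UNSPECIFIED, TRUNC_OFF, TRUNC_ON } TruncOption;

/* Hypothetical per-table storage parameter with an "explicitly set" flag. */
typedef struct RelTruncOpt
{
    bool value;   /* value of the storage parameter */
    bool isset;   /* true only if the user set it explicitly */
} RelTruncOpt;

/*
 * Decide whether to truncate, using the assumed priority:
 * command option, then explicitly set storage parameter, then the
 * server-wide configuration parameter.
 */
bool
resolve_vacuum_truncate(TruncOption cmd_opt, const RelTruncOpt *relopt,
                        bool guc_value)
{
    assert(cmd_opt == TRUNC_UNSPECIFIED || cmd_opt == TRUNC_OFF ||
           cmd_opt == TRUNC_ON);

    if (cmd_opt != TRUNC_UNSPECIFIED)
        return cmd_opt == TRUNC_ON;
    if (relopt != NULL && relopt->isset)
        return relopt->value;
    return guc_value;
}
```

Without the `isset` flag, a storage parameter left at its default of true could not be distinguished from one explicitly set to true, so the configuration parameter could never take effect for such a table.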
Peter Eisentraut	618c64ffd3	Revert workarounds for -Wmissing-braces false positives on old GCC We have collected several instances of a workaround for GCC bug 53119, which caused false-positive compiler warnings. This bug has long been fixed, but was still seen on the buildfarm, most recently on lapwing with gcc (Debian 4.7.2-5). (The GCC bug tracker mentions that a fix was backported to 4.7.4 and 4.8.3.) That compiler no longer runs warning-free since commit `6fdd5d9563`, so we don't need to keep these workarounds. And furthermore, the consensus appears to be that we don't want to keep supporting that era of platform anymore at all. This reverts the following commits: `d937904cce` `506428d091` `b449afb582` `6392f2a096` `bad0763a4d` `5e0c761d0a` and makes a few similar fixes to newer code. Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com	2025-03-20 11:25:58 +01:00
Peter Eisentraut	b7076c1e7f	Fix extension control path tests Change expected extension to be installed from amcheck to plpgsql since not all build farm animals have the contrib module installed. Author: Matheus Alcantara <mths.dev@pm.me> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com	2025-03-20 10:53:59 +01:00
Peter Eisentraut	47929324c5	Fix typo in comment	2025-03-20 10:44:12 +01:00
Amit Kapila	e5aeed4b80	pg_createsubscriber: Add -R publications option. This patch introduces a new '-R'/'--remove' option in the 'pg_createsubscriber' utility to specify the object types to be removed from the subscriber. Currently, we add support to specify 'publications' as an object type. In the future, other object types like failover-slots could be added. This feature optionally allows removing publications on the subscriber that were replicated from the primary server (before running this tool) during physical replication. Users may want to retain these publications in case they want some pre-existing subscribers to point to the newly created subscriber. Author: Shubham Khanna <khannashubham1197@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg@mail.gmail.com	2025-03-20 12:21:54 +05:30
Andres Freund	5941946d09	meson: Flush stdout in testwrap Otherwise the progress won't reliably be displayed during a test. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/kx6xu7suexal5vwsxpy7ybgkcznx6hgywbuhkr6qabcwxjqax2@i4pcpk75jvaa Backpatch-through: 16	2025-03-19 09:04:09 -04:00
Peter Eisentraut	190dc27998	Update a code comment The comment explained that ALTER TABLE ADD CONSTRAINT USING INDEX is only supported with a btree index. (This is not being changed.) The reason is to keep upgrades robust, as explained there. The other part of the comment, that btree is the only unique index kind anyway, is somewhat less true as we're trying to enable unique indexes other than btree, and it's irrelevant to this check. There is a check for indisunique earlier already. So just remove this part of the comment. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-03-19 10:39:06 +01:00
Peter Eisentraut	4f7f7b0375	extension_control_path The new GUC extension_control_path specifies a path to look for extension control files. The default value is $system, which looks in the compiled-in location, as before. The path search uses the same code and works in the same way as dynamic_library_path. Some use cases of this are: (1) testing extensions during package builds, (2) installing extensions outside security-restricted containers like Python.app (on macOS), (3) adding extensions to PostgreSQL running in a Kubernetes environment using operators such as CloudNativePG without having to rebuild the base image for each new extension. There is also a tweak in Makefile.global so that it is possible to install extensions using PGXS into a different directory than the default, using 'make install prefix=/else/where'. This previously only worked when specifying the subdirectories, like 'make install datadir=/else/where/share pkglibdir=/else/where/lib', for purely implementation reasons. (Of course, without the path feature, installing elsewhere was rarely useful.) Author: Peter Eisentraut <peter@eisentraut.org> Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: David E. Wheeler <david@justatheory.com> Reviewed-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> Reviewed-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com	2025-03-19 07:03:20 +01:00
Michael Paquier	2cce0fe440	psql: Allow queries terminated by semicolons while in pipeline mode Currently, the only way to pipe queries in an ongoing pipeline (in a \startpipeline block) is to leverage the meta-commands able to create extended queries such as \bind, \parse or \bind_named. While this is good enough for testing the backend with pipelines, it has been mentioned that it can also be very useful to allow queries terminated by semicolons to be appended to a pipeline. For example, it would be possible to migrate existing psql scripts to use pipelines by just adding a set of \startpipeline and \endpipeline meta-commands, making such scripts more efficient. Making such a change proves to be simple in psql: queries terminated by semicolons can be executed through PQsendQueryParams() without any parameters set when the pipeline mode is active, instead of PQsendQuery(), the default, like pgbench. \watch is still forbidden while in a pipeline, as it expects its results to be processed synchronously. A large portion of this commit consists of additional test coverage, with mixes of extended queries appended in a pipeline by \bind and friends, and queries terminated by semicolons. This improvement has been suggested by Daniel Vérité. Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/d67b9c19-d009-4a50-8020-1a0ea92366a1@manitou-mail.org	2025-03-19 13:34:59 +09:00
Thomas Munro	0b53c08677	Fix compiler warning for commit `434dbf69`. Reported-by: Tom Lane <tgl@sss.pgh.pa.us>	2025-03-19 17:26:16 +13:00
Thomas Munro	1cf4c56480	oauth: Simplify copy of PGoauthBearerRequest Follow-up to `03366b61d`. Since there are no more const members in the PGoauthBearerRequest struct, the previous memcpy() can be replaced with simple assignment. Author: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq	2025-03-19 16:59:25 +13:00
Thomas Munro	434dbf6907	oauth: Fix postcondition for set_timer on macOS On macOS, readding an EVFILT_TIMER to a kqueue does not appear to clear out previously queued timer events, so checks for timer expiration do not work correctly during token retrieval. Switching to IPv4-only communication exposes the problem, because libcurl is no longer clearing out other timeouts related to Happy Eyeballs dual-stack handling. Fully remove and re-register the kqueue timer events during each call to set_timer(), to clear out any stale expirations. Author: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/CAOYmi%2Bn4EDOOUL27_OqYT2-F2rS6S%2B3mK-ppWb2Ec92UEoUbYA%40mail.gmail.com	2025-03-19 16:45:01 +13:00
Thomas Munro	8d9d5843b5	oauth: Use IPv4-only issuer in oauth_validator tests The test authorization server implemented in oauth_server.py does not listen on IPv6. Most of the time, libcurl happily falls back to IPv4 after failing its initial connection, but on NetBSD, something is consistently showing up on the unreserved IPv6 port and causing a test failure. Rather than deal with dual-stack details across all test platforms, change the issuer to enforce the use of IPv4 only. (This elicits more punishing timeout behavior from libcurl, so it's a useful change from the testing perspective as well.) Author: Jacob Champion <jacob.champion@enterprisedb.com> Reported-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAOYmi%2Bn4EDOOUL27_OqYT2-F2rS6S%2B3mK-ppWb2Ec92UEoUbYA%40mail.gmail.com	2025-03-19 16:45:01 +13:00
Amit Langote	28317de723	Ensure first ModifyTable rel initialized if all are pruned Commit `cbc127917e` introduced tracking of unpruned relids to avoid processing pruned relations, and changed ExecInitModifyTable() to initialize only unpruned result relations. As a result, MERGE statements that prune all target partitions can now lead to crashes or incorrect behavior during execution. The crash occurs because some executor code paths rely on ModifyTableState.resultRelInfo[0] being present and initialized, even when no result relations remain after pruning. For example, ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo to determine the appropriate action. Similarly, ExecInitPartitionInfo() assumes that at least one result relation exists. To preserve these assumptions, ExecInitModifyTable() now includes the first result relation in the initialized result relation list if all result relations for that ModifyTable were pruned. To enable that, ExecDoInitialPruning() ensures the first relation is locked if it was pruned and locking is necessary. To support this exception to the pruning logic, PlannedStmt now includes a list of RT indexes identifying the first result relation of each ModifyTable node in the plan. This allows ExecDoInitialPruning() to check whether each such relation was pruned and, if so, lock it if necessary. Bug: #18830 Reported-by: Robins Tharakan <tharakan@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org	2025-03-19 12:14:24 +09:00
Thomas Munro	06fb5612c9	Increase io_combine_limit range to 1MB. The default of 128kB is unchanged, but the upper limit is changed from 32 blocks to 128 blocks, unless the operating system's IOV_MAX is too low. Some other RDBMSes seem to cap their multi-block buffer pool I/O around this number, and it seems useful to allow experimentation. The concrete change is to our definition of PG_IOV_MAX, which provides the maximum for io_combine_limit and io_max_combine_limit. It also affects a couple of other places that work with arrays of struct iovec or smaller objects on the stack, so we still don't want to use the system IOV_MAX directly without a clamp: it is not under our control and likely to be 1024. 128 seems acceptable for our current usage. For Windows, we can't use real scatter/gather yet, so we continue to define our own IOV_MAX value of 16 and emulate preadv()/pwritev() with loops. Someone would need to research the trade-offs of raising that number. NB if trying to see this working: you might temporarily need to hack BAS_BULKREAD to be bigger, since otherwise the obvious way of "a very big SELECT" is limited by that for now. Suggested-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com	2025-03-19 15:40:35 +13:00
Thomas Munro	10f6646847	Introduce io_max_combine_limit. The existing io_combine_limit can be changed by users. The new io_max_combine_limit is fixed at server startup time, and functions as a silent clamp on the user setting. That in itself is probably quite useful, but the primary motivation is: aio_init.c allocates shared memory for all asynchronous IOs including some per-block data, and we didn't want to waste memory you'd never used by assuming they could be up to PG_IOV_MAX. This commit already halves the size of 'AioHandleIov' and 'AioHandleData'. A follow-up commit can now expand PG_IOV_MAX without affecting that. Since our GUC system doesn't support dependencies or cross-checks between GUCs, the user-settable one now assigns a "raw" value to io_combine_limit_guc, and the lower of io_combine_limit_guc and io_max_combine_limit is maintained in io_combine_limit. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com	2025-03-19 15:23:54 +13:00
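The "silent clamp" described in commit 10f6646847 can be sketched in a few lines. This is a simplified model under stated assumptions, not the server code: the variable names follow the commit message, but the initial values, the setter functions, and the recompute helper are hypothetical stand-ins for the GUC machinery, which is not shown here.

```c
#include <assert.h>

/* Stand-ins for the variables named in the commit message (units: blocks). */
static int io_combine_limit_guc = 16;  /* raw value assigned by the user setting */
static int io_max_combine_limit = 16;  /* fixed at server startup */
static int io_combine_limit = 16;      /* effective value: the lower of the two */

/* Hypothetical helper: recompute the effective limit after either input changes. */
static void
recompute_io_combine_limit(void)
{
    assert(io_combine_limit_guc > 0 && io_max_combine_limit > 0);
    io_combine_limit = (io_combine_limit_guc < io_max_combine_limit)
        ? io_combine_limit_guc
        : io_max_combine_limit;
}

/* Hypothetical setter for the user-settable value; returns the effective limit. */
int
set_io_combine_limit_guc(int newval)
{
    io_combine_limit_guc = newval;
    recompute_io_combine_limit();
    return io_combine_limit;
}

/* Hypothetical setter for the startup-time maximum; returns the effective limit. */
int
set_io_max_combine_limit(int newval)
{
    io_max_combine_limit = newval;
    recompute_io_combine_limit();
    return io_combine_limit;
}
```

Keeping the raw user value separately means a request above the maximum is neither rejected nor lost: the effective limit is simply the smaller of the two, which is why no cross-check between the two settings is needed.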
Michael Paquier	17d8bba6da	Fix copy-paste error related to the autovacuum launcher in pgstat_io.c Autovacuum launchers perform no WAL IO reads, but pgstat_tracks_io_op() was tracking them as an allowed combination for the "init" and "normal" contexts. This caused the "read", "read_bytes" and "read_time" attributes of pg_stat_io to show zeros for the autovacuum launcher rather than NULL. NULL means that a combination of IO object, IO context and IO operation has no meaning for a backend type. Zero implies that the combination is relevant, i.e. that WAL reads are possible in an autovacuum launcher, which is not the case. Copy-pasto introduced in `a051e71e28`. Author: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAEudQAopEMAPiUqE7BvDV+x2fUPmKmb9RrsaoDR+hhQzLKg4PQ@mail.gmail.com	2025-03-19 08:52:10 +09:00
Masahiko Sawada	f4290f20dd	Fix assertion failure in parallel vacuum with minimal maintenance_work_mem setting. `bbf668d66f` lowered the minimum value of maintenance_work_mem to 64kB. However, in parallel vacuum cases, since the initial underlying DSA size is 256kB, it attempts to perform a cycle of index vacuuming and table vacuuming with an empty TID store, resulting in an assertion failure. This commit ensures that at least one page is processed before index vacuuming and table vacuuming begins. Backpatch to 17, where the minimum maintenance_work_mem value was lowered. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAD21AoCEAmbkkXSKbj4dB+5pJDRL4ZHxrCiLBgES_g_g8mVi1Q@mail.gmail.com Backpatch-through: 17	2025-03-18 16:37:02 -07:00
Michael Paquier	6d3ea48ff1	Optimize check for pending backend IO stats This commit changes the backend stats code so that it relies on a single boolean rather than a repeated check based on pg_memory_is_all_zeros() in the code, making it cheaper should PgStat_PendingIO get bigger in size. The frequency of backend stats reports is not a bottleneck, but there is no reason not to make it cheaper, and the logic is simple as the only entry points updating backend IO stats are pgstat_count_backend_io_op() and pgstat_count_backend_io_op_time(). Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal	2025-03-19 08:03:06 +09:00
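The idea in commit 6d3ea48ff1, tracking pending work with one boolean instead of scanning a struct for non-zero bytes, can be illustrated roughly as follows. Everything here is a hypothetical miniature: the struct, its size, and the function names are invented for the sketch and are not the pgstat data structures or entry points.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_IO_OPS 4  /* hypothetical number of tracked operations */

/* Hypothetical pending-counter struct; the real one is much larger. */
typedef struct PendingIO
{
    long counts[N_IO_OPS];
} PendingIO;

static PendingIO pending_io;
static bool have_pending_io = false;  /* set by every entry point that counts */

/* Count one operation and remember that there is something to flush. */
void
count_io_op(int op)
{
    assert(op >= 0 && op < N_IO_OPS);
    pending_io.counts[op]++;
    have_pending_io = true;
}

/*
 * Add pending counts to totals. The flag makes the "nothing to do" case
 * a single test, independent of how large PendingIO grows.
 * Returns true if anything was flushed.
 */
bool
flush_pending_io(long totals[N_IO_OPS])
{
    if (!have_pending_io)
        return false;

    for (int i = 0; i < N_IO_OPS; i++)
        totals[i] += pending_io.counts[i];

    memset(&pending_io, 0, sizeof(pending_io));
    have_pending_io = false;
    return true;
}
```

The approach stays correct only while every code path that modifies the counters also sets the flag, which is why the commit message stresses that there are just two entry points.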
Nathan Bossart	c9d502eb68	Update guidance for running vacuumdb after pg_upgrade. Now that pg_upgrade can carry over most optimizer statistics, we should recommend using vacuumdb's new --missing-stats-only option to only analyze relations that are missing statistics. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan	2025-03-18 16:32:56 -05:00
Nathan Bossart	edba754f05	vacuumdb: Add option for analyzing only relations missing stats. This commit adds a new --missing-stats-only option that can be used with --analyze-only or --analyze-in-stages. When this option is specified, vacuumdb will analyze a relation if it lacks any statistics for a column, expression index, or extended statistics object. This new option is primarily intended for use after pg_upgrade (since it can now retain most optimizer statistics), but it might be useful in other situations, too. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan	2025-03-18 16:32:56 -05:00
Nathan Bossart	9c03c8d187	vacuumdb: Teach vacuum_one_database() to reuse query results. Presently, each call to vacuum_one_database() queries the catalogs to retrieve the list of tables to process. A follow-up commit will add a "missing stats only" feature to --analyze-in-stages, which requires saving the catalog query results (since tables without statistics will have them after the first stage). This commit adds a new parameter to vacuum_one_database() that specifies either a previously-retrieved list or a place to return the catalog query results. Note that nothing uses this new parameter yet. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan	2025-03-18 16:32:55 -05:00
Andres Freund	499faf9063	smgr: Make SMgrRelation initialization safer against errors In case the smgr_open callback failed, the ->pincount field would not be initialized and the relation would not be put onto the unpinned_relns list. This buglet was introduced in `21d9c3ee4e`, in 17. Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl Backpatch-through: 17	2025-03-18 14:04:44 -04:00
Álvaro Herrera	62d712ecfd	Introduce squashing of constant lists in query jumbling pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is individually jumbled. Most of the time that's undesirable, especially if the list becomes too large. Fix this by introducing a new GUC query_id_squash_values which modifies the node jumbling code to only consider the first and last element of a list of constants, rather than each list element individually. This affects both the query_id generated by query jumbling, as well as pg_stat_statements query normalization so that it suppresses printing of the individual elements of such a list. The default value is off, meaning the previous behavior is maintained. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Sergey Dudoladov (mysterious, off-list) Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Marcos Pegoraro <marcos@f10.com.br> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Tested-by: Yasuo Honda <yasuo.honda@gmail.com> Tested-by: Sergei Kornilov <sk@zsrv.org> Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com> Tested-by: Chengxi Sun <sunchengxi@highgo.com> Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com	2025-03-18 18:56:11 +01:00
Andres Freund	247ce06b88	aio: Add io_method=worker The previous commit introduced the infrastructure to start io_workers. This commit actually makes the workers execute IOs. IO workers consume IOs from a shared memory submission queue, run traditional synchronous system calls, and perform the shared completion handling immediately. Client code submits most requests by pushing IOs into the submission queue, and waits (if necessary) using condition variables. Some IOs cannot be performed in another process due to lack of infrastructure for reopening the file, and must be processed synchronously by the client code when submitted. For now the default io_method is changed to "worker". We should re-evaluate that around beta1, we might want to be careful and set the default to "sync" for 18. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-18 11:54:01 -04:00
Andres Freund	55b454d0e1	aio: Infrastructure for io_method=worker This commit contains the basic, system-wide, infrastructure for io_method=worker. It does not yet actually execute IO, this commit just provides the infrastructure for running IO workers, kept separate for easier review. The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually we'd like to make the number of workers dynamically scale up/down based on the current "IO load". To allow the number of IO workers to be increased without a restart, we need to reserve PGPROC entries for the workers unconditionally. This has been judged to be worth the cost. If it turns out to be problematic, we can introduce a PGC_POSTMASTER GUC to control the maximum number. As io workers might be needed during shutdown, e.g. for AIO during the shutdown checkpoint, a new PMState phase is added. IO workers are shut down after the shutdown checkpoint has been performed and walsender/archiver have shut down, but before the checkpointer itself shuts down. See also `87a6690cc6`. Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-18 11:54:01 -04:00
Jeff Davis	549ea06e42	Fix headerscheck warning. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/93731.1742310701@sss.pgh.pa.us	2025-03-18 08:37:07 -07:00
Tom Lane	4078da6c47	Silence compiler warning. Assorted buildfarm members are complaining about "'process_list' may be used uninitialized in this function" since `f76892c9f`, presumably because they don't trust that the switch case labels are exhaustive. We can silence that by initializing the variable to NULL. Should a switch fall-through actually happen, we'll get SIGSEGV at the first use, which is as good as an Assert.	2025-03-18 10:54:10 -04:00
Daniel Gustafsson	daa02c6bd9	Add X25519 to the default set of curves Since many clients default to the X25519 curve in the TLS handshake, the fact that the server by default doesn't support it causes an extra roundtrip for each TLS connection. By adding multiple curves, which is supported since `3d1ef3a15c`, we can reduce the risk of extra roundtrips. Author: Daniel Gustafsson <daniel@yesql.se> Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/20240616234612.6cslu7nqexquvwj7@awork3.anarazel.de	2025-03-18 15:26:27 +01:00
Robert Haas	4fd02bf7cf	Add some new hooks so extensions can add details to EXPLAIN. Specifically, add a per-node hook that is called after the per-node information has been displayed but before we display children, and a per-query hook that is called after existing query-level information is printed. This assumes that extension-added information should always go at the end rather than the beginning or the middle, but that seems like an acceptable limitation for simplicity. It also assumes that extensions will only want to add information, not remove or reformat existing details; those also seem like acceptable restrictions, at least for now. If multiple EXPLAIN extensions are used, the order in which any additional details are printed is likely to depend on the order in which the modules are loaded. That seems OK, since the user may have opinions about the order in which output should appear, and the extension author can't really know whether their stuff is more or less important to a particular user than some other extension. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>	2025-03-18 09:28:01 -04:00
Álvaro Herrera	f76892c9ff	Simplify reindexdb coding get_parallel_object_list() was trying to serve two masters, and it was doing a bad job at both. In particular, it treated the given user_list as an output argument, but only sometimes. This was confusing, and the two paths through it didn't really have all that much in common, so the complexity wasn't buying us much. Split it in two: get_parallel_tables_list() handles the straightforward cases for schemas, databases and tables, takes one list as argument and returns another list. A new function get_parallel_tabidx_list() handles the case for indexes. This takes a list as argument and outputs two lists, just like get_parallel_object_list used to do, but now the API is clearer (IMO anyway). Another difference is that accompanying the list of indexes now we have a list of tables as an OID list rather than a fully-qualified table name list. This makes some comparisons easier, and we don't really need the names of the tables, just their OIDs. (This requires atooid, which requires <stdlib.h>). Author: Ranier Vilela <ranier.vf@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAEudQArfqr0-s0VVPSEh=0kgOgBJvFNdGW=xSL5rBcr0WDMQYQ@mail.gmail.com	2025-03-18 14:21:26 +01:00
Melanie Plageman	cc6be07ebd	Increase default maintenance_io_concurrency to 16 Since its introduction in `fc34b0d9de`, the default maintenance_io_concurrency has been larger than the default effective_io_concurrency. maintenance_io_concurrency primarily controlled prefetching done on behalf of the whole system, for operations like recovery. Therefore it makes sense for it to have a value equal to or greater than effective_io_concurrency, which controls I/O concurrency for reading a relation in a bitmap heap scan. `ff79b5b2ab` increased effective_io_concurrency to 16, so we'll increase maintenance_io_concurrency as well. For now, though, we'll keep the defaults of effective_io_concurrency and maintenance_io_concurrency equal to one another (16). On fast, high IOPs systems, significantly higher values of maintenance_io_concurrency are observably beneficial [1]. However, such values would flood low IOPs systems and increase overall system I/O latency. It is worth mentioning that since `9256822608` and `c3e775e608`, maintenance_io_concurrency also controls the I/O concurrency of each vacuum worker. Since many autovacuum workers may be simultaneously issuing I/Os, we want to keep maintenance_io_concurrency appropriately conservative. [1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com Suggested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com	2025-03-18 09:08:10 -04:00
Robert Haas	796bdda484	Fix indentation again. Because somehow I manage to keep forgetting this.	2025-03-18 09:02:36 -04:00
Robert Haas	c65bc2e1d1	Make it possible for loadable modules to add EXPLAIN options. Modules can use RegisterExtensionExplainOption to register new EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState, and SetExplainExtensionState to store related state inside the ExplainState object. Since this substantially increases the amount of code that needs to handle ExplainState-related tasks, move a few bits of existing code to a new file explain_state.c and add the rest of this infrastructure there. See the comments at the top of explain_state.c for further explanation of how this mechanism works. This does not yet provide a way for such options to do anything useful. The intention is that we'll add hooks for that purpose in a separate commit. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>	2025-03-18 08:41:12 -04:00
Peter Eisentraut	9d6db8bec1	Allow non-btree unique indexes for matviews We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. Stop rejecting these indexes, but instead rely on all unique indexes having equality operators. Unique indexes must have equality operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-03-18 11:29:15 +01:00
Peter Eisentraut	f278e1fe30	Allow non-btree unique indexes for partition keys We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. The problem of not knowing the strategy number for equality in other index AMs is already resolved. Stop rejecting the indexes upfront, and instead reject any for which the equality operator lookup fails. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-03-18 11:25:36 +01:00
Peter Eisentraut	7317e64126	Add some opfamily support functions to lsyscache.c Add get_opfamily_method() and get_opfamily_member_for_cmptype() in lsyscache.c. No callers yet, but we'll add some soon. This is part of generalizing some parts of the code away from having btree hardcoded and use CompareType instead. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-03-18 11:17:43 +01:00
Amit Kapila	122a9af5de	Fix typo. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com	2025-03-18 14:18:09 +05:30
Amit Kapila	01e27aab05	Use correct variable name in publicationcmds.c. subid was used in a few places for publicationid in publicationcmds.c/.h. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com	2025-03-18 14:06:51 +05:30
Masahiko Sawada	c462b054ba	Fix the test 005_char_signedness. pg_upgrade test 005_char_signedness was leaving files like delete_old_cluster.sh in the source directory for VPATH and meson builds. The fix is to change the directory to tmp_check before running the test. Reported-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: http://postgr.es/m/CA+TgmoYg5e4oznn0XGoJ3+mceG1qe_JJt34rF2JLwvGS5T1hgQ@mail.gmail.com	2025-03-17 21:34:10 -07:00
Michael Paquier	17caf66445	psql: Add \sendpipeline to send query buffers while in a pipeline In the initial pipeline support for psql added in `41625ab8ea`, \g was used as the way to push extended query into an ongoing pipeline. \gx was blocked. These two meta-commands have format-related options that can be applied when fetching a query result (expanded, etc.). As the results of a pipeline are fetched asynchronously, not at the moment of the meta-command execution but at the moment of a \getresults or a \endpipeline, authorizing \g while blocking \gx leads to a confusing implementation, making one think that psql should be smart enough to remember the output format options defined from the time when \g or \gx were executed. Doing so would lead to more code complications when retrieving a batch of results. There is an extra argument other than simplicity here: the output format options defined at the point of a \getresults or a \endpipeline execution should be what affect the output format for a batch of results. To avoid any confusion, we have settled on the introduction of a new meta-command called \sendpipeline, replacing \g when within a pipeline. An advantage of this design is that it is possible to add new options specific to pipelines when sending a query buffer, independent of \g and \gx, should it prove to be necessary. Most of the changes of this commit happen in the regression tests, where \g is replaced by \sendpipeline. More tests are added to check that \g is not allowed. Per discussion between the author, Daniel Vérité and me. Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/ad4b9f1a-f7fe-4ab8-8546-90754726d0be@manitou-mail.org	2025-03-18 09:41:21 +09:00
Andres Freund	da7226993f	aio: Add core asynchronous I/O infrastructure The main motivations to use AIO in PostgreSQL are: a) Reduce the time spent waiting for IO by issuing IO sufficiently early. In a few places we have approximated this using posix_fadvise() based prefetching, but that is fairly limited (no completion feedback, double the syscalls, only works with buffered IO, only works on some OSs). b) Allow to use Direct-I/O (DIO). DIO can offload most of the work for IO to hardware and thus increase throughput / decrease CPU utilization, as well as reduce latency. While we have gained the ability to configure DIO in `d4e71df6`, it is not yet usable for real world workloads, as every IO is executed synchronously. For portability, the new AIO infrastructure allows to implement AIO using different methods. The choice of the AIO method is controlled by the new io_method GUC. As of this commit, the only implemented method is "sync", i.e. AIO is not actually executed asynchronously. The "sync" method exists to allow to bypass most of the new code initially. Subsequent commits will introduce additional IO methods, including a cross-platform method implemented using worker processes and a linux specific method using io_uring. To allow different parts of postgres to use AIO, the core AIO infrastructure does not need to know what kind of files it is operating on. The necessary behavioral differences for different files are abstracted as "AIO Targets". One example target would be smgr. For boring portability reasons, all targets currently need to be added to an array in aio_target.c. This commit does not implement any AIO targets, just the infrastructure for them. The smgr target will be added in a later commit. Completion (and other events) of IOs for one type of file (i.e. one AIO target) need to be reacted to differently, based on the IO operation and the callsite. This is made possible by callbacks that can be registered on IOs. E.g. 
an smgr read into a local buffer does not need to update the corresponding BufferDesc (as there is none), but a read into shared buffers does. This commit does not contain any callbacks, they will be added in subsequent commits. For now the AIO infrastructure only understands READV and WRITEV operations, but it is expected that more operations will be added. E.g. fsync/fdatasync, flush_range and network operations like send/recv. As of this commit, nothing uses the AIO infrastructure. Later commits will add an smgr target, md.c and bufmgr.c callbacks and then finally use AIO for read_stream.c IO, which, in one fell swoop, will convert all read stream users to AIO. The goal is to use AIO in many more places. There are patches to use AIO for checkpointer and bgwriter that are reasonably close to being ready. There also are prototypes to use it for WAL, relation extension, backend writes and many more. Those prototypes were important to ensure the design of the AIO subsystem is not too limiting (e.g. WAL writes need to happen in critical sections, which influenced a lot of the design). A future commit will add an AIO README explaining the AIO architecture and how to use the AIO subsystem. The README is added later, as it references details only added in later commits. Many many more people than the folks named below have contributed with feedback, work on semi-independent patches etc. E.g. various folks have contributed patches to use the read stream infrastructure (added by Thomas in `b5a9b18cd0`) in more places. Similarly, a lot of folks have contributed to the CI infrastructure, which I had started to work on to make adding AIO feasible. Some of the work by contributors has gone into the "v1" prototype of AIO, which heavily influenced the current design of the AIO subsystem. None of the code from that directly survives, but without the prototype, the current version of the AIO infrastructure would not exist. 
Similarly, the reviewers below have not necessarily looked at the current design or the whole infrastructure, but have provided very valuable input. I am to blame for problems, not they. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Co-authored-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m	2025-03-17 18:51:33 -04:00
Andres Freund	02844012b3	aio: Basic subsystem initialization This commit just does the minimal wiring up of the AIO subsystem, added in the next commit, to the rest of the system. The next commit contains more details about motivation and architecture. This commit is kept separate to make it easier to review, separating the changes across the tree, from the implementation of the new subsystem. We discussed squashing this commit with the main commit before merging AIO, but there has been a mild preference for keeping it separate. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-03-17 18:51:33 -04:00
Robert Haas	203c1b4cc4	Fix indentation. Commit `99aeb84703` wasn't fully reindented prior to commit.	2025-03-17 16:06:17 -04:00
Nathan Bossart	7e05df430b	pg_upgrade: Remove some dead code. Since commit `e469f0aaf3`, tablespace_suffix can't be empty. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/Z9hc3mkYFKR56Xof%40nathan	2025-03-17 13:18:14 -05:00
Andres Freund	1a22a8a0f1	tests: Expand temp table tests to some pin related matters Added tests: - recovery from running out of unpinned local buffers - that we don't run out of unpinned buffers due to read stream (only recently fixed, in `92fc6856cb`) - temp tables can't be dropped while in use by cursors Discussion: weskknhckugbdm2yt7sa2uq53xlsax67gcdkac34sanb7qpd3p@hcc2wadao5wy Discussion: https://postgr.es/m/ge6nsuddurhpmll3xj22vucvqwp4agqz6ndtcf2mhyeydzarst@l75dman5x53p	2025-03-17 14:12:44 -04:00
Robert Haas	99aeb84703	pg_combinebackup: Add -k, --link option. This is similar to pg_upgrade's --link option, except that here we won't typically be able to use it for every input file: sometimes we will need to reconstruct a complete backup from blocks stored in different files. However, when a whole file does need to be copied, we can use an optimized copying strategy: see the existing --clone and --copy-file-range options and the code to use CopyFile() on Windows. This commit adds a new strategy: add a hard link to an existing file. Making a hard link doesn't actually copy anything, but it makes sense for the code to treat it as doing so. This is useful when the input directories are merely staging directories that will be removed once the restore is complete. In such cases, there is no need to actually copy the data, and making a bunch of new hard links can be very quick. However, it would be quite dangerous to use it if the input directories might later be reused for any other purpose, since starting postgres on the output directory would destructively modify the input directories. For that reason, using this new option causes pg_combinebackup to emit a warning about the danger involved. Author: Israel Barth Rubio <barthisrael@gmail.com> Co-authored-by: Robert Haas <robertmhaas@gmail.com> (cosmetic changes) Reviewed-by: Vignesh C <vignesh21@gmail.com> Discussion: http://postgr.es/m/CA+TgmoaEFsYHsMefNaNkU=2SnMRufKE3eVJxvAaX=OWgcnPmPg@mail.gmail.com	2025-03-17 14:03:14 -04:00
Tom Lane	ed762e9425	Unify wording of user-facing "row security" messages. Row-level security is mostly referred to as "row security" in user-facing messages. Commit `cd3c45125` introduced one inconsistent use of "row level security"; make that one match the rest. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20250317.135305.573764276033358827.horikyota.ntt@gmail.com	2025-03-17 12:53:50 -04:00
Michael Paquier	3943f5cff6	Fix inconsistent quoting for some options in TAP tests This commit addresses some inconsistencies with how the options of some routines from PostgreSQL/Test/ are written, mainly for init() and init_from_backup() in Cluster.pm. These are written as unquoted, except in the locations updated here. Changes extracted from a larger patch by the same author. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org	2025-03-17 14:07:12 +09:00
Michael Paquier	19c6e92b13	Apply more consistent style for command options in TAP tests This commit reshapes the grammar of some commands to apply a more consistent style across the board, following rules similar to `ce1b0f9da0`: - Elimination of some pointless used-once variables. - Use of long options, to self-document better the options used. - Use of fat commas to link option names and their assigned values, including redirections, so as perltidy can be tricked to put them together. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org	2025-03-17 12:42:23 +09:00
Michael Paquier	5721e5453e	Revert "Add redo LSN to pgstats files" This reverts commit `b860848232`, that was added as a prerequisite for the support of pgstats data flush across checkpoints, linking a pgstats file to a specific checkpoint redo LSN. As reported, this is proving to be currently problematic when going through a pg_upgrade, that does direct manipulations of the control file in the new cluster. The LSN stored in the pgstats file is not able to cope with any changes done in the control file by pg_upgrade yet, causing the pgstats file to be discarded when starting the new cluster after overriding its redo LSN (one is a `pg_resetwal -l` where the new cluster's start LSN is bumped by a hardcoded value of 8 segments, see copy_xact_xlog_xid). The least painful path going forward is likely going to be a refactor of the pgstats code so as it is possible to read and write some of its data with some routines in src/common/, so as pg_upgrade or pg_resetwal are able to update its data. The main point is that we are going to need a LSN in the stats file should we make it written at checkpoint time and not only as part of a shutdown sequence. It is too late to dive into these details for v18, so let's revert the change, and let's try to figure out all the details in the next release cycle. The pgstats file is currently only written as part of a shutdown sequence, and its contents are still lost on crash, same as older releases. Bump PGSTAT_FILE_FORMAT_ID. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2563883.1741826489@sss.pgh.pa.us	2025-03-17 08:35:12 +09:00
Tom Lane	cd3c45125d	pg_dump, pg_dumpall, pg_restore: Add --no-policies option. Add --no-policies option to control row level security policy handling in dump and restore operations. When this option is used, both CREATE POLICY commands and ALTER TABLE ... ENABLE ROW LEVEL SECURITY commands are excluded from dumps and skipped during restores. This is useful in scenarios where policies need to be redefined in the target system or when moving data between environments with different security requirements. Author: Nikolay Samokhvalov <nik@postgres.ai> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com> Discussion: https://postgr.es/m/CAM527d8kG2qPKvbfJ=OYJkT7iRNd623Bk+m-a4ngm+nyHYsHog@mail.gmail.com	2025-03-16 18:08:15 -04:00
Alexander Korotkov	682c5be25c	reindexdb: Fix the index-level REINDEX with multiple jobs `47f99a407d` introduced a parallel index-level REINDEX. The code was written assuming that running run_reindex_command() with 'async == true' can schedule a number of queries for a connection. That's not true, and the second query sent using run_reindex_command() will wait for the completion of the previous one. This commit fixes that by putting REINDEX commands for the same table into a single query. Also, this commit removes the 'async' argument from run_reindex_command(), as its only call always passes 'async == true'. Reported-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/202503071820.j25zn3lo4hvn%40alvherre.pgsql Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Backpatch-through: 17	2025-03-16 13:29:15 +02:00
Michael Paquier	83e5763d4d	pg_createsubscriber: Remove some code bloat in the atexit() callback This commit adjusts some code added by `e117cfb2f6` in the atexit() callback of pg_createsubscriber.c, in charge of performing post-failure cleanup actions. The code loops over all the databases specified, and it is changed here to rely on a single LogicalRepInfo for each database rather than always using LogicalRepInfos, simplifying its logic. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PtdBSVi4iH7BObDVwDNVwOpn+H3fezOBdSTtENx+rhNMw@mail.gmail.com	2025-03-16 19:20:49 +09:00
Andres Freund	771ba90298	localbuf: Introduce StartLocalBufferIO() To initiate IO on a shared buffer we have StartBufferIO(). For temporary table buffers no similar function exists - likely because the code for that currently is very simple due to the lack of concurrency. However, the upcoming AIO support will make it possible to re-encounter a local buffer, while the buffer already is the target of IO. In that case we need to wait for already in-progress IO to complete. This commit makes it easier to add the necessary code, by introducing StartLocalBufferIO(). Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	4b4d33b9ea	localbuf: Introduce FlushLocalBuffer() Previously we had two paths implementing writing out temporary table buffers. For shared buffers, the logic for that is centralized in FlushBuffer(). Introduce FlushLocalBuffer() to do the same for local buffers. Besides being a nice cleanup on its own, it also makes an upcoming change slightly easier. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	dd6f2618f6	localbuf: Introduce TerminateLocalBufferIO() Previously TerminateLocalBufferIO() was open-coded in multiple places, which doesn't seem like a great idea. While TerminateLocalBufferIO() currently is rather simple, an upcoming patch requires additional code to be added to TerminateLocalBufferIO(), making this modification particularly worthwhile. For some reason FlushRelationBuffers() previously cleared BM_JUST_DIRTIED, even though that's never set for temporary buffers. This is not carried over as part of this change. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	0762a151b0	localbuf: Introduce InvalidateLocalBuffer() Previously, there were three copies of this code, two of them identical. There's no good reason for that. This change is nice on its own, but the main motivation is the AIO patchset, which needs to add extra checks to the deduplicated code, which of course is easier if there is only one version. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	fa6af9b25e	localbuf: Fix dangerous coding pattern in GetLocalVictimBuffer() If PinLocalBuffer() were to modify the buf_state, the buf_state in GetLocalVictimBuffer() would be out of date. Currently that does not happen, as PinLocalBuffer() only modifies the buf_state if adjust_usagecount=true and GetLocalVictimBuffer() passes false. However, it's easy to make this not the case anymore - it cost me a few hours to debug the consequences. The minimal fix would be to just refetch the buf_state after calling PinLocalBuffer(), but the same danger exists in later parts of the function. Instead, declare buf_state in the narrower scopes and re-read the state in conditional branches. Besides being safer, it also fits well with an upcoming set of cleanup patches that move the contents of the conditional branches in GetLocalVictimBuffer() into helper functions. I "broke" this in `794f259447`. Arguably this should be backpatched, but as the relevant functions are not exported and there is no actual misbehaviour, I chose to not backpatch, at least for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andrew Dunstan	5eabd91a83	Silence perl critic Commit `27bdec0684` uses a loop variable that is not strictly local to the loop. Perlcritic disapproves, and there's really no reason as the variable is not used outside the loop. Per buildfarm animals koel and crake.	2025-03-15 17:41:54 -04:00
Jeff Davis	27bdec0684	Optimization for lower(), upper(), casefold() functions. Improve performance and reduce table sizes for case mapping. The main case mapping table stores only 16-bit offsets, which can be used to look up the mapped code point in any of the case tables (fold, lower, upper, or title case). Simple case pairs point to the same offsets. Generate a function in generate-unicode_case_table.pl that consists of nested branches to test for specific codepoint ranges that determine the offset in the main table. Other approaches were considered, such as representing these ranges as another structure (rather than branches in a generated function), or a different approach such as a radix tree, or perfect hashing. The author implemented and tested these alternatives and settled on the generated branches. Author: Alexander Borisov <lex.borisov@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com	2025-03-15 13:00:50 -07:00
Melanie Plageman	c3953226a0	Remove table AM callback scan_bitmap_next_block After pushing the bitmap iterator into table-AM specific code (as part of making bitmap heap scan use the read stream API in `2b73a8cd33`), scan_bitmap_next_block() no longer returns the current block number. Since scan_bitmap_next_block() isn't returning any relevant information to bitmap table scan code, it makes more sense to get rid of it. Now, bitmap table scan code only calls table_scan_bitmap_next_tuple(), and the heap AM implementation of scan_bitmap_next_block() is a local helper in heapam_handler.c. Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com	2025-03-15 10:37:46 -04:00
Melanie Plageman	2b73a8cd33	BitmapHeapScan uses the read stream API Make Bitmap Heap Scan use the read stream API instead of invoking ReadBuffer() for each block indicated by the bitmap. The read stream API handles prefetching, so remove all of the explicit prefetching from bitmap heap scan code. Now, heap table AM implements a read stream callback which uses the bitmap iterator to return the next required block to the read stream code. Tomas Vondra conducted extensive regression testing of this feature. Andres Freund, Thomas Munro, and I analyzed regressions and Thomas Munro patched the read stream API. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Tested-by: Tomas Vondra <tomas@vondra.me> Tested-by: Andres Freund <andres@anarazel.de> Tested-by: Thomas Munro <thomas.munro@gmail.com> Tested-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com	2025-03-15 10:34:42 -04:00
Melanie Plageman	944e81bf99	Separate TBM[Shared|Private]Iterator and TBMIterateResult Remove the TBMIterateResult member from the TBMPrivateIterator and TBMSharedIterator and make tbm_[shared|private_]iterate() take a TBMIterateResult as a parameter. This allows tidbitmap API users to manage multiple TBMIterateResults per scan. This is required for bitmap heap scan to use the read stream API, with which there may be multiple I/Os in flight at once, each one with a TBMIterateResult. Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me	2025-03-15 10:11:19 -04:00
Thomas Munro	799959dc7c	Simplify distance heuristics in read_stream.c. Make the distance control heuristics simpler and more aggressive in preparation for asynchronous I/O. The v17 version of read_stream.c made a conservative choice to limit the look-ahead distance when streaming sequential blocks, because it couldn't benefit very much from looking ahead further yet. It had a three-behavior model where only random I/O would rapidly increase the look-ahead distance, to support read-ahead advice. Sequential I/O would move it towards the io_combine_limit setting, just enough to build one full-sized synchronous I/O at a time, and then expect kernel read-ahead to avoid I/O stalls. That already left I/O performance on the table with advice-based I/O concurrency, since sequential blocks could be followed by random jumps, eg with the proposed streaming Bitmap Heap Scan patch. It is time to delete the cautious middle option and adjust the distance based on recent I/O needs only, since asynchronous reads will need to be started ahead of time whether random or sequential. It is still limited by io_combine_limit, *_io_concurrency, buffer availability and strategy ring size, as before. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Tested-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com	2025-03-16 03:05:07 +13:00

45349 commits