postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-21 22:28:15 -04:00

Author	SHA1	Message	Date
Thomas Munro	a8458f508a	Fix lost Windows socket EOF events. Winsock only signals an FD_CLOSE event once if the other end of the socket shuts down gracefully. Because each WaitLatchOrSocket() call constructs and destroys a new event handle every time, with unlucky timing we can lose it and hang. We get away with this only if the other end disconnects non-gracefully, because FD_CLOSE is repeatedly signaled in that case. To fix this design flaw in our Windows socket support fundamentally, we'd probably need to rearchitect it so that a single event handle exists for the lifetime of a socket, or switch to completely different multiplexing or async I/O APIs. That's going to be a bigger job and probably wouldn't be back-patchable. This brute force kludge closes the race by explicitly polling with MSG_PEEK before sleeping. Back-patch to all supported releases. This should hopefully clear up some random build farm and CI hang failures reported over the years. It might also allow us to try using graceful shutdown in more places again (reverted in commit `29992a6`) to fix instability in the transmission of FATAL error messages, but that isn't done by this commit. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/176008.1715492071%40sss.pgh.pa.us	2024-07-13 14:59:46 +12:00
Alvaro Herrera	8391779138	Fix ALTER TABLE DETACH for inconsistent indexes When a partitioned table has an index that doesn't support a constraint, but a partition has an equivalent index that does, then a DETACH operation would misbehave: a crash in assertion-enabled systems (because we fail to find the constraint in the parent that we expect to), or a broken coninhcount value (-1) in production systems (because we blindly believe that we've successfully detached the parent). While we should reject an ATTACH of a partition with such an index, we have failed to do so in existing releases, so adding an error in stable releases might break the (unlikely) existing applications that rely on this behavior. At this point I don't even want to reject them in master, because it'd break pg_upgrade if such databases exist, and there would be no easy way to fix existing databases without expensive index rebuilds. (Later on we could add ALTER TABLE ... ADD CONSTRAINT USING INDEX to partitioned tables, which would allow the user to fix such patterns. At that point we could add more restrictions to prevent the problem from its root.) Also, add a test case that leaves one table in this condition, so that we can verify that pg_upgrade continues to work if we later decide to change the policy on the master branch. Backpatch to all supported branches. Co-authored-by: Tender Wang <tndrwang@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/18500-62948b6fe5522f56@postgresql.org	2024-07-12 12:54:01 +02:00
Michael Paquier	734c057a89	Add assertion in pgstat_write_statsfile() about processes allowed This routine can currently only be called from the postmaster in single-user mode or the checkpointer, but there was no sanity check to make sure that this was always the case. This has proved to be useful when hacking the zone (at least to me), to make sure that the write of the pgstats file happens at shutdown, as wanted by design, in the correct process context. Discussion: https://postgr.es/m/ZnEiqAITL-VgZDoY@paquier.xyz	2024-07-12 15:09:53 +09:00
Amit Kapila	63909da978	Fix a typo in logicalrep_write_typ(). Author: ChangAo Chen Discussion: https://postgr.es/m/tencent_CDECB843B30A8B6B5152FA6458F0F00FDE09@qq.com	2024-07-12 10:20:59 +05:30
Richard Guo	22d946b0f8	Consider materializing the cheapest inner path in parallel nestloop When generating non-parallel nestloop paths for each available outer path, we always consider materializing the cheapest inner path if feasible. Similarly, in this patch, we also consider materializing the cheapest inner path when building partial nestloop paths. This approach potentially reduces the need to rescan the inner side of a partial nestloop path for each outer tuple. Author: Tender Wang Reviewed-by: Richard Guo, Robert Haas, David Rowley, Alena Rybakina Reviewed-by: Tomasz Rybak, Paul Jungwirth, Yuki Fujii Discussion: https://postgr.es/m/CAHewXNkPmtEXNfVQMou_7NqQmFABca9f4etjBtdbbm0ZKDmWvw@mail.gmail.com	2024-07-12 11:16:43 +09:00
Michael Paquier	72c0b24b2d	Improve comment of pgstat_read_statsfile() The comment at the top of pgstat_read_statsfile() mentioned that the stats are read from the on-disk file into the pgstats dshash. This is incorrect for fix-numbered stats as these are loaded directly into shared memory. This commit simplifies the comment to be more general. Author: Bertrand Drouvot Discussion: https://postgr.es/m/Zo/eJIHUcqKxeSgv@ip-10-97-1-34.eu-west-3.compute.internal	2024-07-12 09:31:33 +09:00
Tom Lane	0d8bd0a72e	Improve logical replication connection-failure messages. These messages mostly said "could not connect to the publisher: %s" which is lacking context. Add some verbiage to indicate which subscription or worker process is failing. Nisha Moond Discussion: https://postgr.es/m/CABdArM7q1=zqL++cYd0hVMg3u_tc0S=0Of=Um-KvDhLony0cSg@mail.gmail.com	2024-07-11 13:21:13 -04:00
Tom Lane	a0f1fce80c	Add min and max aggregates for composite types (records). Like min/max for arrays, these are just thin wrappers around the existing btree comparison function for records. Aleksander Alekseev Discussion: https://postgr.es/m/CAO=iB8L4WYSNxCJ8GURRjQsrXEQ2-zn3FiCsh2LMqvWq2WcONg@mail.gmail.com	2024-07-11 11:50:50 -04:00
Masahiko Sawada	bb19b70081	Fix possibility of logical decoding partial transaction changes. When creating and initializing a logical slot, the restart_lsn is set to the latest WAL insertion point (or the latest replay point on standbys). Subsequently, WAL records are decoded from that point to find the start point for extracting changes in the DecodingContextFindStartpoint() function. Since the initial restart_lsn could be in the middle of a transaction, the start point must be a consistent point where we won't see the data for partial transactions. Previously, when not building a full snapshot, serialized snapshots were restored, and the SnapBuild jumps to the consistent state even while finding the start point. Consequently, the slot's restart_lsn and confirmed_flush could be set to the middle of a transaction. This could lead to various unexpected consequences. Specifically, there were reports of logical decoding decoding partial transactions, and assertion failures occurred because only subtransactions were decoded without decoding their top-level transaction until decoding the commit record. To resolve this issue, the changes prevent restoring the serialized snapshot and jumping to the consistent state while finding the start point. On v17 and HEAD, a flag indicating whether snapshot restores should be skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION has been bumpded. On backbranches, the flag is stored in the LogicalDecodingContext instead, preserving on-disk compatibility. Backpatch to all supported versions. Reported-by: Drew Callahan Reviewed-by: Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com Backpatch-through: 12	2024-07-11 22:48:23 +09:00
Michael Paquier	9e4664d950	Add a new 'F' entry type for fixed-numbered stats in pgstats file This new entry type is used for all the fixed-numbered statistics, making possible support for custom pluggable stats. In short, we need to be able to detect more easily if a stats kind exists or not when reading back its data from the pgstats file without a dependency on the order of the entries read. The kind ID of the stats is added to the data written. The data is written in the same fashion as previously, with the fixed-numbered stats first and the dshash entries next. The read part becomes more flexible, loading fixed-numbered stats into shared memory based on the new entry type found. Bump PGSTAT_FILE_FORMAT_ID. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-11 16:12:44 +09:00
Michael Paquier	21471f18e9	Add PgStat_KindInfo.init_shmem_cb This new callback gives fixed-numbered stats the possibility to take actions based on the area of shared memory allocated for them. This removes from pgstat_shmem.c any knowledge specific to the types of fixed-numbered stats, and the initializations happen in their own files. Like `b68b29bc8f`, this change is useful to make this area of the code more pluggable, so as custom fixed-numbered stats can take actions after their shared memory area is initialized. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-11 09:21:40 +09:00
Dean Rasheed	0dcf753bd8	Improve the numeric width_bucket() computation. Formerly, the computation of the bucket index involved calling div_var() with a scale determined by select_div_scale(), and then taking the floor of the result. That involved computing anything from 16 to 1000 digits after the decimal point, only for floor_var() to throw them away. In addition, the quotient was computed with rounding in the final digit, which meant that in rare cases the whole result could round up to the wrong bucket, and could exceed count. Thus it was also necessary to clamp the result to the range [1, count], though that didn't prevent the result being in the wrong internal bucket. Instead, compute the quotient using floor division, which guarantees the correct result, as specified by the SQL spec, and doesn't need to be clamped. This is both much simpler and more efficient, since it no longer computes any quotient digits after the decimal point. In addition, it is not necessary to have separate code to handle reversed bounds, since the signs cancel out when dividing. As with `b0e9e4d76c` and `a2a0c7c29e`, no back-patch. Dean Rasheed, reviewed by Joel Jacobson. Discussion: https://postgr.es/m/CAEZATCVbJH%2BLE9EXW8Rk3AxLe%3DjbOk2yrT_AUJGGh5Rah6zoeg%40mail.gmail.com	2024-07-10 20:07:20 +01:00
Michael Paquier	d898665bf7	Extend pg_get_acl() to handle sub-object IDs This patch modifies the pg_get_acl() function to accept a third argument called "objsubid", bringing it on par with similar functions in this area like pg_describe_object(). This enables the retrieval of ACLs for relation attributes when scanning dependencies. Bump catalog version. Author: Joel Jacobson Discussion: https://postgr.es/m/f2539bff-64be-47f0-9f0b-df85d3cc0432@app.fastmail.com	2024-07-10 10:14:37 +09:00
Jeff Davis	b3bd18294e	Fix missing invalidations for search_path cache. Reported-by: Noah Misch Discussion: https://postgr.es/m/20240630223047.1f.nmisch@google.com Backpatch-through: 17	2024-07-09 12:37:05 -07:00
Tom Lane	e7192486dd	Suppress "chunk is not well balanced" errors from libxml2. libxml2 2.13 has an entirely different rule than earlier versions about when to emit "chunk is not well balanced" errors. This causes regression test output discrepancies for three test cases that formerly provoked that error (along with others) and now don't. Closer inspection shows that at least in 2.13, this error is pretty useless because it can only be emitted after some other more-relevant error. So let's get rid of the cross-version discrepancy by just suppressing it. In case some older libxml2 version is capable of emitting this error by itself, suppress only when some other error has already been captured. Like `066e8ac6e` and `6082b3d5d`, this will need to be back-patched, but let's check the results in HEAD first. (The patch for xml_2.out, in particular, is blind since I can't test it here.) Erik Wienhold and Tom Lane, per report from Frank Streitzig. Discussion: https://postgr.es/m/trinity-b0161630-d230-4598-9ebc-7a23acdb37cb-1720186432160@3c-app-gmx-bap25 Discussion: https://postgr.es/m/trinity-361ba18b-541a-4fe7-bc63-655ae3a7d599-1720259822452@3c-app-gmx-bs01	2024-07-09 15:01:13 -04:00
Nathan Bossart	ccd38024bc	Introduce pg_signal_autovacuum_worker. Since commit `3a9b18b309`, roles with privileges of pg_signal_backend cannot signal autovacuum workers. Many users treated the ability to signal autovacuum workers as a feature instead of a bug, so we are reintroducing it via a new predefined role. Having privileges of this new role, named pg_signal_autovacuum_worker, only permits signaling autovacuum workers. It does not permit signaling other types of superuser backends. Bumps catversion. Author: Kirill Reshke Reviewed-by: Anthony Leung, Michael Paquier, Andrey Borodin Discussion: https://postgr.es/m/CALdSSPhC4GGmbnugHfB9G0%3DfAxjCSug_-rmL9oUh0LTxsyBfsg%40mail.gmail.com	2024-07-09 13:03:40 -05:00
Fujii Masao	629520be5f	Fix comment in libpqrcv_check_conninfo(). Previously, the comment incorrectly stated that libpqrcv_check_conninfo() returns true or false based on the connection string check. However, this function actually has a void return type and raises an error if the check fails. Author: Rintaro Ikeda Reviewed-by: Jelte Fennema-Nio, Fujii Masao Discussion: https://postgr.es/m/6a1ca81b27fec4da0ccdfaaaec787982@oss.nttdata.com	2024-07-09 21:30:18 +09:00
Dean Rasheed	ca481d3c9a	Optimise numeric multiplication for short inputs. When either input has a small number of digits, and the exact product is requested, the speed of numeric multiplication can be increased significantly by using a faster direct multiplication algorithm. This works by fully computing each result digit in turn, starting with the least significant, and propagating the carry up. This save cycles by not requiring a temporary buffer to store digit products, not making multiple passes over the digits of the longer input, and not requiring separate carry-propagation passes. For now, this is used when the shorter input has 1-4 NBASE digits (up to 13-16 decimal digits), and the longer input is of any size, which covers a lot of common real-world cases. Also, the relative benefit increases as the size of the longer input increases. Possible future work would be to try extending the technique to larger numbers of digits in the shorter input. Joel Jacobson and Dean Rasheed. Discussion: https://postgr.es/m/44d2ffca-d560-4919-b85a-4d07060946aa@app.fastmail.com	2024-07-09 10:00:42 +01:00
Amit Kapila	571f7f7086	To improve the code, move the error check in logical_read_xlog_page(). Commit `0fdab27ad6` changed the code to wait for WAL to be available before determining the timeline but forgot to move the failure check. This change is to make the related code easier to understand and enhance otherwise there is no bug in the current code. In the passing, improve the nearby comments to explain why we determine am_cascading_walsender after waiting for the required WAL. Author: Peter Smith Reviewed-by: Bertrand Drouvot, Amit Kapila Discussion: https://postgr.es/m/CAHut+PvqX49fusLyXspV1Mmd_EekPtXG0oT146vZjcb9XDvNgw@mail.gmail.com	2024-07-09 09:00:45 +05:30
Michael Paquier	b68b29bc8f	Use pgstat_kind_infos to write fixed shared statistics This is similar to `9004abf620`, but this time for the write part of the stats file. The code is changed so as, rather than referring to individual members of PgStat_Snapshot in an order based on their PgStat_Kind value, a loop based on pgstat_kind_infos is used to retrieve the contents to write from the snapshot structure, for a size of PgStat_KindInfo's shared_data_len. This requires the addition to PgStat_KindInfo of an offset to track the location of each fixed-numbered stats in PgStat_Snapshot. This change is useful to make this area of the code more easily pluggable, and reduces the knowledge of specific fixed-numbered kinds in pgstat.c. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-09 10:27:12 +09:00
Fujii Masao	c8d5d6c78a	Fix limit block handling in pg_wal_summary_contents(). Previously, pg_wal_summary_contents() had two issues, causing discrepancies between pg_wal_summary_contents() and the pg_walsummary command on the same WAL summary file: (1) It did not emit the limit block when that's the only data for a particular relation fork. (2) It emitted the same limit block multiple times if the list of block numbers was long enough. This commit fixes these issues. Backpatch to v17 where pg_wal_summary_contents() was added. Author: Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/90980ee6-2da6-42f6-a7b0-b7bae62ae279@oss.nttdata.com	2024-07-09 09:26:54 +09:00
David Rowley	5a1e6df3b8	Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE Nodes like Memoize report the cache stats for each parallel worker, so it makes sense to show the exact and lossy pages in Parallel Bitmap Heap Scan in a similar way. Likewise, Sort shows the method and memory used for each worker. There was some discussion on whether the leader stats should include the totals for each parallel worker or not. I did some analysis on this to see what other parallel node types do and it seems only Parallel Hash does anything like this. All the rest, per what's supported by ExecParallelRetrieveInstrumentation() are consistent with each other. Author: David Geier <geidav.pg@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Donghang Lin <donghanglin@gmail.com> Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Christofides <michael@pgmustard.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Donghang Lin <donghanglin@gmail.com> Reviewed-by: Masahiro Ikeda <Masahiro.Ikeda@nttdata.com> Discussion: https://postgr.es/m/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com	2024-07-09 12:15:47 +12:00
David Rowley	036bdcec9f	Teach planner how to estimate rows for timestamp generate_series This provides the planner with row estimates for generate_series(TIMESTAMP, TIMESTAMP, INTERVAL), generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL) and generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL, TEXT) when the input parameter values can be estimated during planning. Author: David Rowley Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CAApHDvrBE%3D%2BASo_sGYmQJ3GvO8GPvX5yxXhRS%3Dt_ybd4odFkhQ%40mail.gmail.com	2024-07-09 09:54:59 +12:00
Nathan Bossart	64f34eb2e2	Use CREATE DATABASE ... STRATEGY = FILE_COPY in pg_upgrade. While this strategy is ordinarily quite costly because it requires performing two checkpoints, testing shows that it tends to be a faster choice than WAL_LOG during pg_upgrade, presumably because fsync is turned off. Furthermore, we can skip the checkpoints altogether because the problems they are intended to prevent don't apply to pg_upgrade. Instead, we just need to CHECKPOINT once in the new cluster after making any changes to template0 and before restoring the rest of the databases. This ensures that said template0 changes are written out to disk prior to creating the databases via FILE_COPY. Co-authored-by: Matthias van de Meent Reviewed-by: Ranier Vilela, Dilip Kumar, Robert Haas, Michael Paquier Discussion: https://postgr.es/m/Zl9ta3FtgdjizkJ5%40nathan	2024-07-08 16:18:00 -05:00
Tom Lane	6082b3d5d3	Use xmlParseInNodeContext not xmlParseBalancedChunkMemory. xmlParseInNodeContext has basically the same functionality with a different API: we have to supply an xmlNode that's attached to a document rather than just the document. That's not hard though. The benefits are two: * Early 2.13.x releases of libxml2 contain a bug that causes xmlParseBalancedChunkMemory to return the wrong status value in some cases. This breaks our regression tests. While that bug is now fixed upstream and will probably never be seen in any production-oriented distro, it is currently a problem on some more-bleeding-edge-friendly platforms. * xmlParseBalancedChunkMemory is considered to depend on libxml2's semi-deprecated SAX1 APIs, and will go away when and if they do. There may already be libxml2 builds out there that lack this function. So there are both short- and long-term reasons to make this change. While here, avoid allocating an xmlParserCtxt in DOCUMENT parse mode, since that code path is not going to use it. Like `066e8ac6e`, this will need to be back-patched. This is just a trial commit to see if the buildfarm agrees that we can use xmlParseInNodeContext unconditionally. Erik Wienhold and Tom Lane, per report from Frank Streitzig. Discussion: https://postgr.es/m/trinity-b0161630-d230-4598-9ebc-7a23acdb37cb-1720186432160@3c-app-gmx-bap25 Discussion: https://postgr.es/m/trinity-361ba18b-541a-4fe7-bc63-655ae3a7d599-1720259822452@3c-app-gmx-bs01	2024-07-08 14:04:00 -04:00
Dean Rasheed	1ff39f4ff2	Fix scale clamping in numeric round() and trunc(). The numeric round() and trunc() functions clamp the scale argument to the range between +/- NUMERIC_MAX_RESULT_SCALE (2000), which is much smaller than the actual allowed range of type numeric. As a result, they return incorrect results when asked to round/truncate more than 2000 digits before or after the decimal point. Fix by using the correct upper and lower scale limits based on the actual allowed (and documented) range of type numeric. While at it, use the new NUMERIC_WEIGHT_MAX constant instead of SHRT_MAX in all other overflow checks, and fix a comment thinko in power_var() introduced by `e54a758d24` -- the minimum value of ln_dweight is -NUMERIC_DSCALE_MAX (-16383), not -SHRT_MAX, though this doesn't affect the point being made in the comment, that the resulting local_rscale value may exceed NUMERIC_MAX_DISPLAY_SCALE (1000). Back-patch to all supported branches. Dean Rasheed, reviewed by Joel Jacobson. Discussion: https://postgr.es/m/CAEZATCXB%2BrDTuMjhK5ZxcouufigSc-X4tGJCBTMpZ3n%3DxxQuhg%40mail.gmail.com	2024-07-08 17:48:45 +01:00
David Rowley	7340d9362a	Widen lossy and exact page counters for Bitmap Heap Scan Both of these counters were using the "long" data type. On MSVC that's a 32-bit type. On modern hardware, I was able to demonstrate that we can wrap those counters with a query that only takes 15 minutes to run. This issue may manifest itself either by not showing the values of the counters because they've wrapped and are less than zero, resulting in them being filtered by the > 0 checks in show_tidbitmap_info(), or bogus numbers being displayed which are modulus 2^32 of the actual number. Widen these counters to uint64. Discussion: https://postgr.es/m/CAApHDvpS_97TU+jWPc=T83WPp7vJa1dTw3mojEtAVEZOWh9bjQ@mail.gmail.com	2024-07-08 14:43:09 +12:00
Richard Guo	d7db04dfda	Remove an extra period in code comment Author: Junwang Zhao Discussion: https://postgr.es/m/CAEG8a3L9GgfKc+XT+NMHPY7atAOVYqjUqKEFQKhcPHFYRW=PuQ@mail.gmail.com	2024-07-08 11:17:22 +09:00
Richard Guo	0ffc0acaf3	Fix right-anti-joins when the inner relation is proven unique For an inner_unique join, we always assume that the executor will stop scanning for matches after the first match. Therefore, for a mergejoin that is inner_unique and whose mergeclauses are sufficient to identify a match, we set the skip_mark_restore flag to true, indicating that the executor need not do mark/restore calls. However, merge-right-anti-join did not get this memo and continues scanning the inner side for matches after the first match. If there are duplicates in the outer scan, we may incorrectly skip matching some inner tuples, which can lead to wrong results. Here we fix this issue by ensuring that merge-right-anti-join also advances to next outer tuple after the first match in inner_unique cases. This also saves cycles by avoiding unnecessary scanning of inner tuples after the first match. Although hash-right-anti-join does not suffer from this wrong results issue, we apply the same change to it as well, to help save cycles for the same reason. Per bug #18522 from Antti Lampinen, and bug #18526 from Feliphe Pozzer. Back-patch to v16 where right-anti-join was introduced. Author: Richard Guo Discussion: https://postgr.es/m/18522-c7a8956126afdfd0@postgresql.org	2024-07-08 10:11:46 +09:00
Tom Lane	066e8ac6ea	Use xmlAddChildList not xmlAddChild in XMLSERIALIZE. It looks like we should have been doing this all along, but we got away with the wrong coding until libxml2 2.13.0 tightened up xmlAddChild's behavior. There is more stuff to be fixed to be compatible with 2.13.0, and it will all need to be back-patched. This is just a trial commit to see if the buildfarm agrees that we can use xmlAddChildList unconditionally. Erik Wienhold, per report from Frank Streitzig. Discussion: https://postgr.es/m/trinity-b0161630-d230-4598-9ebc-7a23acdb37cb-1720186432160@3c-app-gmx-bap25 Discussion: https://postgr.es/m/trinity-361ba18b-541a-4fe7-bc63-655ae3a7d599-1720259822452@3c-app-gmx-bs01	2024-07-06 15:16:13 -04:00
David Rowley	04bcf9e19a	Adjust tuplestore.c not to allocate BufFiles in generation context `590b045c3` made it so tuplestore.c would store tuples inside a generation.c memory context. After fixing a bug report in `97651b013`, it seems that it's probably best not to allocate BufFile related allocations in that context. Let's keep it just for tuple data. This adjusts the code to switch to the Tuplestorestate.context's parent, which is the MemoryContext that tuplestore_begin_common() was called in. It does not seem worth adding a new field in Tuplestorestate to store this when we can access it by looking at the Tuplestorestate's context's parent. Discussion: https://postgr.es/m/CAApHDvqFt_CdJtSr+E9YLZb7jZAyRCy3hjQ+ktM+dcOFVq-xkg@mail.gmail.com	2024-07-06 17:40:05 +12:00
David Rowley	97651b0139	Fix incorrect sentinel byte logic in GenerationRealloc() This only affects MEMORY_CONTEXT_CHECKING builds. This fixes an off-by-one issue in GenerationRealloc() where the fast-path code which tries to reuse the existing allocation if the existing chunk is >= the new requested size. The code there thought it was always ok to use the existing chunk, but when oldsize == size there isn't enough space to store the sentinel byte. If both sizes matched exactly set_sentinel() would overwrite the first byte beyond the chunk and then subsequent GenerationRealloc() calls could then fail the Assert(chunk->requested_size < oldsize) check which is trying to ensure the chunk is large enough to store the sentinel. The same issue does not exist in aset.c as the sentinel checking code only adds a sentinel byte if there's enough space in the chunk. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/49275921-7b39-41af-5eb8-97b50ce3312e@gmail.com Backpatch-through: 16, where the problem was introduced by `0e480385e`	2024-07-06 13:59:34 +12:00
Nathan Bossart	0b1fe1413e	Remove check hooks for GUCs that contribute to MaxBackends. Each of max_connections, max_worker_processes, autovacuum_max_workers, and max_wal_senders has a GUC check hook that verifies the sum of those GUCs does not exceed a hard-coded limit (see the comment for MAX_BACKENDS in postmaster.h). In general, the hooks effectively guard against egregious misconfigurations. However, this approach has some problems. Since these check hooks are called as each GUC is assigned its user-specified value, only one of the hooks will be called with all the relevant GUCs set. If one or more of the user-specified values are less than the initial values of the GUCs' underlying variables, false positives can occur. Furthermore, the error message emitted when one of the check hooks fails is not tremendously helpful. For example, the command $ pg_ctl -D . start -o "-c max_connections=262100 -c max_wal_senders=10000" fails with the following error: FATAL: invalid value for parameter "max_wal_senders": 10000 Fortunately, there is an extra copy of this check in InitializeMaxBackends() that we can rely on, so this commit removes the aforementioned GUC check hooks in favor of that one. It also enhances the error message to clearly show the values of the relevant GUCs and the hard-coded limit their sum may not exceed. The downside of this change is that server startup progresses further before failing due to such misconfigurations (thus taking longer), but these failures are expected to be rare, so we don't anticipate any real harm in practice. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/ZnMr2k-Nk5vj7T7H%40nathan	2024-07-05 14:42:55 -05:00
Michael Paquier	4b211003ec	Support loading of injection points This can be used to load an injection point and prewarm the backend-level cache before running it, to avoid issues if the point cannot be loaded due to restrictions in the code path where it would be run, like a critical section where no memory allocation can happen (load_external_function() can do allocations when expanding a library name). Tests can use a macro called INJECTION_POINT_LOAD() to load an injection point. The test module injection_points gains some tests, and a SQL function able to load an injection point. Based on a request from Andrey Borodin, who has implemented a test for multixacts requiring this facility. Reviewed-by: Andrey Borodin Discussion: https://postgr.es/m/ZkrBE1e2q2wGvsoN@paquier.xyz	2024-07-05 18:09:03 +09:00
Heikki Linnakangas	98347b5a3a	Lift limitation that PGPROC->links must be the first field Since commit `5764f611e1`, we've been using the ilist.h functions for handling the linked list. There's no need for 'links' to be the first element of the struct anymore, except for one call in InitProcess where we used a straight cast from the 'dlist_node ' to PGPROC , without the dlist_container() macro. That was just an oversight in commit `5764f611e1`, fix it. There no imminent need to move 'links' from being the first field, but let's be tidy. Reviewed-by: Aleksander Alekseev, Andres Freund Discussion: https://www.postgresql.org/message-id/22aa749e-cc1a-424a-b455-21325473a794@iki.fi	2024-07-05 11:21:46 +03:00
David Rowley	590b045c37	Improve memory management and performance of tuplestore.c Here we make tuplestore.c use a generation.c memory context rather than allocating tuples into the CurrentMemoryContext, which primarily is the ExecutorState or PortalHoldContext memory context. Not having a dedicated context can cause the CurrentMemoryContext context to become bloated when pfree'd chunks are not reused by future tuples. Using generation speeds up users of tuplestore.c, such as the Materialize, WindowAgg and CTE Scan executor nodes. The main reason for the speedup is due to generation.c being more memory efficient than aset.c memory contexts. Specifically, generation does not round sizes up to the next power of 2 value. This both saves memory, allowing more tuples to fit in work_mem, but also makes the memory usage more compact and fit on fewer cachelines. One benchmark showed up to a 22% performance increase in a query containing a Materialize node. Much higher gains are possible if the memory reduction prevents tuplestore.c from spilling to disk. This is especially true for WindowAgg nodes where improvements of several thousand times are possible if the memory reductions made here prevent tuplestore from spilling to disk. Additionally, a generation.c memory context is much better suited for this job as it works well with FIFO palloc/pfree patterns, which is exactly how tuplestore.c uses it. Because of the way generation.c allocates memory, tuples consecutively stored in tuplestores are much more likely to be stored consecutively in memory. This allows the CPU's hardware prefetcher to work more efficiently as it provides a more predictable pattern to allow cachelines for the next tuple to be loaded from RAM in advance of them being needed by the executor. Using a dedicated memory context for storing tuples also allows us to more efficiently clean up the memory used by the tuplestore as we can reset or delete the context rather than looping over all stored tuples and pfree'ing them one by one. Also, remove a badly placed USEMEM call in readtup_heap(). The tuple wasn't being allocated in the Tuplestorestate's context, so no need to adjust the memory consumed by the tuplestore there. Author: David Rowley Reviewed-by: Matthias van de Meent, Dmitry Dolgov Discussion: https://postgr.es/m/CAApHDvp5Py9g4Rjq7_inL3-MCK1Co2CRt_YWFwTU2zfQix0p4A@mail.gmail.com	2024-07-05 17:51:27 +12:00
David Rowley	53abb1e0eb	Fix newly introduced issue in EXPLAIN for Materialize nodes The code added in `1eff8279d` was lacking a check to see if the tuplestore had been created. In nodeMaterial.c this is done by ExecMaterial() rather than by ExecInitMaterial(), so the tuplestore won't be created unless the node has been executed at least once, as demonstrated by Alexander in his report. Here we skip showing any of the new EXPLAIN ANALYZE information when the Materialize node has not been executed. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/fe7fc8fb-86e5-ecb0-3cb2-dd2c9a6c482f@gmail.com	2024-07-05 16:56:16 +12:00
David Rowley	1eff8279d4	Add memory/disk usage for Material nodes in EXPLAIN Up until now, there was no ability to easily determine if a Material node caused the underlying tuplestore to spill to disk or even see how much memory the tuplestore used if it didn't. Here we add some new functions to tuplestore.c to query this information and add some additional output in EXPLAIN ANALYZE to display this information for the Material node. There are a few other executor node types that use tuplestores, so we could also consider adding these details to the EXPLAIN ANALYZE for those nodes too. Let's consider those independently from this. Having the tuplestore.c infrastructure in to allow that is step 1. Author: David Rowley Reviewed-by: Matthias van de Meent, Dmitry Dolgov Discussion: https://postgr.es/m/CAApHDvp5Py9g4Rjq7_inL3-MCK1Co2CRt_YWFwTU2zfQix0p4A@mail.gmail.com	2024-07-05 14:05:08 +12:00
Richard Guo	aa86129e19	Support "Right Semi Join" plan shapes Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com	2024-07-05 09:26:48 +09:00
Michael Paquier	4564f1cebd	Add pg_get_acl() to get the ACL for a database object This function returns the ACL for a database object, specified by catalog OID and object OID. This is useful to be able to retrieve the ACL associated to an object specified with a (class_id,objid) couple, similarly to the other functions for object identification, when joined with pg_depend or pg_shdepend. Original idea by Álvaro Herrera. Bump catalog version. Author: Joel Jacobson Reviewed-by: Isaac Morland, Michael Paquier, Ranier Vilela Discussion: https://postgr.es/m/80b16434-b9b1-4c3d-8f28-569f21c2c102@app.fastmail.com	2024-07-04 17:09:06 +09:00
Amit Langote	3a8a1f3254	SQL/JSON: Fix some obsolete comments. JSON_OBJECT(), JSON_OBJETAGG(), JSON_ARRAY(), and JSON_ARRAYAGG() added in `7081ac46ac` are not transformed into direct calls to user-defined functions as the comments claim. Fix by mentioning instead that they are transformed into JsonConstructorExpr nodes, which may call them, for example, for the *AGG() functions. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/058c856a-e090-ac42-ff00-ffe394f52a87%40gmail.com Backpatch-through: 16	2024-07-04 16:05:35 +09:00
Michael Paquier	b81a71aa05	Assign error codes where missing for user-facing failures All the errors triggered in the code paths patched here would cause the backend to issue an internal_error errcode, which is a state that should be used only for "can't happen" situations. However, these code paths are reachable by the regression tests, and could be seen by users in valid cases. Some regression tests expect internal errcodes as they manipulate the backend state to cause corruption (like checksums), or use elog() because it is more convenient (like injection points), these have no need to change. This reduces the number of internal failures triggered in a check-world by more than half, while providing correct errcodes for these valid cases. Reviewed-by: Robert Haas Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz	2024-07-04 09:48:40 +09:00
Alexander Korotkov	6897f0ec02	Optimize memory access in GetRunningTransactionData() `e85662df44` made GetRunningTransactionData() calculate the oldest running transaction id within the current database. This commit optimized this calculation by performing a cheap transaction id comparison before fetching the process database id, while the latter could cause extra cache misses. Reported-by: Noah Misch Discussion: https://postgr.es/m/20240630231816.bf.nmisch%40google.com	2024-07-04 02:05:37 +03:00
Alexander Korotkov	6c1af5482e	Fix typo in GetRunningTransactionData() `e85662df44` made GetRunningTransactionData() calculate the oldest running transaction id within the current database. However, because of the typo, the new code uses oldestRunningXid instead of oldestDatabaseRunningXid in comparison before updating oldestDatabaseRunningXid. This commit fixes that issue. Reported-by: Noah Misch Discussion: https://postgr.es/m/20240630231816.bf.nmisch%40google.com Backpatch-through: 17	2024-07-04 02:05:27 +03:00
David Rowley	4331a11c62	Remove incorrect Asserts in buffile.c Both BufFileSize() and BufFileAppend() contained Asserts to ensure the given BufFile(s) had a valid fileset. A valid fileset isn't required in either of these functions, so remove the Asserts and adjust the comments accordingly. This was noticed while work was being done on a new patch to call BufFileSize() on a BufFile without a valid fileset. It seems there's currently no code in the tree which could trigger these Asserts, so no need to backpatch this, for now. Reviewed-by: Peter Geoghegan, Matthias van de Meent, Tom Lane Discussion: https://postgr.es/m/CAApHDvofgZT0VzydhyGH5MMb-XZzNDqqAbzf1eBZV5HDm3%2BosQ%40mail.gmail.com	2024-07-04 09:44:34 +12:00
Heikki Linnakangas	f3412a61f3	Avoid 0-length memcpy to NULL with EXEC_BACKEND memcpy(NULL, src, 0) is forbidden by POSIX, even though every production version of libc allows it. Let's be tidy. Per report from Thomas Munro, running UBSan with EXEC_BACKEND. Backpatch to v17, where this code was added. Discussion: https://www.postgresql.org/message-id/CA%2BhUKG%2Be-dV7YWBzfBZXsgovgRuX5VmvmOT%2Bv0aXiZJ-EKbXcw@mail.gmail.com	2024-07-03 15:58:14 +03:00
Heikki Linnakangas	a06e8f84a1	Tighten check for --forkchild argument when spawning child process Commit `aafc05de1b` removed all the other --fork* arguments. Altough this is inconsequential, backpatch to v17 since this is new. Author: Nathan Bossart Discussion: https://www.postgresql.org/message-id/ZnCCEN0l3qWv-XpW@nathan	2024-07-03 15:53:30 +03:00
Michael Paquier	9fd0252579	Replace hardcoded identifiers of pgstats file by #defines This changes pgstat.c so as the three types of entries that can exist in a pgstats file are not hardcoded anymore, replacing them with descriptively-named macros, when reading and writing stats files: - 'N' for named entries, like replication slot stats. - 'S' for entries identified by a hash. - 'E' for the end-of-file This has come up while working on making this area of the code more pluggable. The format of the stats file is unchanged, hence there is no need to bump PGSTAT_FILE_FORMAT_ID. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-07-03 13:09:20 +09:00
Heikki Linnakangas	eb21f5bc67	Remove redundant SetProcessingMode(InitProcessing) calls After several refactoring iterations, auxiliary processes are no longer initialized from the bootstrapper. Using the InitProcessing mode for initializing auxiliary processes is more appropriate. Since the global variable Mode is initialized to InitProcessing, we can just remove the redundant calls of SetProcessingMode(InitProcessing). Author: Xing Guo <higuoxing@gmail.com> Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com	2024-07-02 20:14:40 +03:00
Heikki Linnakangas	4d22173ec0	Move bgworker specific logic to bgworker.c For clarity, we've been slowly moving functions that are not called from the postmaster process out of postmaster.c. Author: Xing Guo <higuoxing@gmail.com> Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com	2024-07-02 20:12:05 +03:00

1 2 3 4 5 ...

25635 commits