postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-14 06:32:18 -04:00

Author	SHA1	Message	Date
Jeff Davis	86a5d6006a	Refactor string case conversion into provider-specific files. Create API entry points pg_strlower(), etc., that work with any provider and give the caller control over the destination buffer. Then, move provider-specific logic into pg_locale_builtin.c, pg_locale_icu.c, and pg_locale_libc.c as appropriate. Discussion: https://postgr.es/m/7aa46d77b377428058403723440862d12a8a129a.camel@j-davis.com	2024-12-16 09:35:18 -08:00
Michael Paquier	39240bcad5	Print out error position for CREATE DOMAIN This is simply done by pushing down the ParseState available in ProcessUtility() to DefineDomain(), giving more information about the position of an error when running a CREATE DOMAIN query. Most of the queries impacted by this change have been added previously in `0172b4c944`. Author: Kirill Reshke, Jian He Reviewed-by: Álvaro Herrera, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com	2024-12-16 14:52:11 +09:00
Michael Paquier	4766438aa3	Adjust some comments about structure properties in pg_stat.h One comment of PgStat_TableCounts mentioned that its pending stats use memcmp() to check for the all-zero case if there is any activity. This is not true since `07e9e28b56`, as pg_memory_is_all_zeros() is used. PgStat_FunctionCounts incorrectly documented that it relied on memcpy(). This has never been correct, and not relevant because function statistics do not have an all-zero check for pending stats. Checkpoint and bgwriter statistics have been always relying on memcmp() or pg_memory_is_all_zeros() (since `07e9e28b56` for the latter), and never mentioned the dependency on event counters for their all-zero checks. Let's document these properties, like the table statistics. Author: Bertrand Drouvot Discussion: https://postgr.es/m/Z1hNLvcPgVLPxCoc@ip-10-97-1-34.eu-west-3.compute.internal	2024-12-12 16:59:22 +09:00
David Rowley	bd10ec5297	Detect redundant GROUP BY columns using UNIQUE indexes `d4c3a156c` added support that when the GROUP BY contained all of the columns belonging to a relation's PRIMARY KEY, all other columns belonging to that relation would be removed from the GROUP BY clause. That's possible because all other columns are functionally dependent on the PRIMARY KEY and those columns alone ensure the groups are distinct. Here we expand on that optimization and allow it to work for any unique indexes on the table rather than just the PRIMARY KEY index. This normally requires that all columns in the index are defined with NOT NULL, however, we can relax that requirement when the index is defined with NULLS NOT DISTINCT. When there are multiple suitable indexes to allow columns to be removed, we prefer the index with the least number of columns as this allows us to remove the highest number of GROUP BY columns. One day, we may want to revisit that decision as it may make more sense to use the narrower set of columns in terms of the width of the data types and stored/queried data. This also adjusts the code to make use of RelOptInfo.indexlist rather than looking up the catalog tables. In passing, add another short-circuit path to allow bailing out earlier in cases where it's certainly not possible to remove redundant GROUP BY columns. This early exit is now cheaper to do than when this code was originally written as `00b41463c` made it cheaper to check for empty Bitmapsets. Patch originally by Zhang Mingli and later worked on by jian he, but after I (David) worked on it, there was very little of the original left. Author: Zhang Mingli, jian he, David Rowley Reviewed-by: jian he, Andrei Lepikhov Discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0%40Spark	2024-12-12 15:28:38 +13:00
David Rowley	430a5952de	Defer remove_useless_groupby_columns() work until query_planner() Traditionally, remove_useless_groupby_columns() was called during grouping_planner() directly after the call to preprocess_groupclause(). While in many ways, it made sense to populate the field and remove the functionally dependent columns from processed_groupClause at the same time, it's just that doing so had the disadvantage that remove_useless_groupby_columns() was being called before the RelOptInfos were populated for the relations mentioned in the query. Not having RelOptInfos available meant we needed to manually query the catalog tables to get the required details about the primary key constraint for the table. Here we move the remove_useless_groupby_columns() call to query_planner() and put it directly after the RelOptInfos are populated. This is fine to do as processed_groupClause still isn't final at this point as it can still be modified inside standard_qp_callback() by make_pathkeys_for_sortclauses_extended(). This commit is just a refactor and simply moves remove_useless_groupby_columns() into initsplan.c. A planned follow-up commit will adjust that function so it uses RelOptInfo instead of doing catalog lookups and also teach it how to use unique indexes as proofs to expand the cases where we can remove functionally dependent columns from the GROUP BY. Reviewed-by: Andrei Lepikhov, jian he Discussion: https://postgr.es/m/CAApHDvqLezKwoEBBQd0dp4Y9MDkFBDbny0f3SzEeqOFoU7Z5+A@mail.gmail.com	2024-12-12 14:22:15 +13:00
Masahiko Sawada	78c5e141e9	Add UUID version 7 generation function. This commit introduces the uuidv7() SQL function, which generates UUID version 7 as specified in RFC 9652. UUIDv7 combines a Unix timestamp in milliseconds and random bits, offering both uniqueness and sortability. In our implementation, the 12-bit sub-millisecond timestamp fraction is stored immediately after the timestamp, in the space referred to as "rand_a" in the RFC. This ensures additional monotonicity within a millisecond. The rand_a bits also function as a counter. We select a sub-millisecond timestamp so that it monotonically increases for generated UUIDs within the same backend, even when the system clock goes backward or when generating UUIDs at very high frequency. Therefore, the monotonicity of generated UUIDs is ensured within the same backend. This commit also expands the uuid_extract_timestamp() function to support UUID version 7. Additionally, an alias uuidv4() is added for the existing gen_random_uuid() SQL function to maintain consistency. Bump catalog version. Author: Andrey Borodin Reviewed-by: Sergey Prokhorenko, Przemysław Sztoch, Nikolay Samokhvalov Reviewed-by: Peter Eisentraut, Jelte Fennema-Nio, Aleksander Alekseev Reviewed-by: Masahiko Sawada, Lukas Fittl, Michael Paquier, Japin Li Reviewed-by: Marcos Pegoraro, Junwang Zhao, Stepan Neretin Reviewed-by: Daniel Vérité Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com	2024-12-11 15:54:41 -08:00
Masahiko Sawada	398d3e3b5b	Unmark gen_random_uuid() function leakproof. The functions without arguments don't need to be marked leakproof. This commit unmarks gen_random_uuid() leakproof for consistency with upcoming UUID generation functions. Also, this commit adds a regression test to prevent reintroducing such cases. Bump catalog version. Reported-by: Peter Eisentraut Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAD21AoBE1ePPWY1NQEgk3DkqjYzLPZwYTzCySHm0e%2B9a69PfZw%40mail.gmail.com	2024-12-11 10:35:57 -08:00
David Rowley	0f5738202b	Use ExprStates for hashing in GROUP BY and SubPlans This speeds up obtaining hash values for GROUP BY and hashed SubPlans by using the ExprState support for hashing, thus allowing JIT compilation for obtaining hash values for these operations. This, even without JIT compilation, has been shown to improve Hash Aggregate performance in some cases by around 15% and hashed NOT IN queries in one case by over 30%, however, real-world cases are likely to see smaller gains as the test cases used were purposefully designed to have high hashing overheads by keeping the hash table small to prevent additional memory overheads that would be a factor when working with large hash tables. In passing, fix a hypothetical bug in ExecBuildHash32Expr() so that the initial value is stored directly in the ExprState's result field if there are no expressions to hash. None of the current users of this function use an initial value, so the bug is only hypothetical. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CAApHDvpYSO3kc9UryMevWqthTBrxgfd9djiAjKHMPUSQeX9vdQ@mail.gmail.com	2024-12-11 13:47:16 +13:00
Peter Eisentraut	a2a475b011	Replace get_equal_strategy_number_for_am() by get_equal_strategy_number() get_equal_strategy_number_for_am() gets the equal strategy number for an AM. This currently only supports btree and hash. In the more general case, this also depends on the operator class (see for example GistTranslateStratnum()). To support that, replace this function with get_equal_strategy_number() that takes an opclass and derives it from there. (This function already existed before as a static function, so the signature is kept for simplicity.) This patch is only a refactoring, it doesn't add support for other index AMs such as gist. This will be done separately. Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-12-10 12:53:27 +01:00
Masahiko Sawada	724890ffb7	Include necessary header files in radixtree.h. When #include'ing radixtree.h with RT_SHMEM, it could happen to raise compiler errors due to missing some declarations of types and functions. This commit also removes the inclusion of postgres.h since it's against our usual convention. Backpatch to v17, where radixtree.h was introduced. Reviewed-by: Heikki Linnakangas, Álvaro Herrera Discussion: https://postgr.es/m/CAD21AoCU9YH%2Bb9Rr8YRw7UjmB%3D1zh8GKQkWNiuN9mVhMvkyrRg%40mail.gmail.com Backpatch-through: 17	2024-12-09 13:07:06 -08:00
Nathan Bossart	0a27c3d0f7	Fix various overflow hazards in date and timestamp functions. This commit makes use of the overflow-aware routines in int.h to fix a variety of reported overflow bugs in the date and timestamp code. It seems unlikely that this fixes all such bugs in this area, but since the problems seem limited to cases that are far beyond any realistic usage, I'm not going to worry too much. Note that for one bug, I've chosen to simply add a comment about the overflow hazard because fixing it would require quite a bit of code restructuring that doesn't seem worth the risk. Since this is a bug fix, it could be back-patched, but given the risk of conflicts with the new routines in int.h and the overall risk/reward ratio of this patch, I've opted not to do so for now. Fixes bug #18585 (except for the one case that's just commented). Reported-by: Alexander Lakhin Author: Matthew Kim, Nathan Bossart Reviewed-by: Joseph Koshakow, Jian He Discussion: https://postgr.es/m/31ad2cd1-db94-bdb3-f91a-65ffdb4bef95%40gmail.com Discussion: https://postgr.es/m/18585-db646741dd649abd%40postgresql.org	2024-12-09 13:47:23 -06:00
Tom Lane	3eea7a0c97	Simplify executor's determination of whether to use parallelism. Our parallel-mode code only works when we are executing a query in full, so ExecutePlan must disable parallel mode when it is asked to do partial execution. The previous logic for this involved passing down a flag (variously named execute_once or run_once) from callers of ExecutorRun or PortalRun. This is overcomplicated, and unsurprisingly some of the callers didn't get it right, since it requires keeping state that not all of them have handy; not to mention that the requirements for it were undocumented. That led to assertion failures in some corner cases. The only state we really need for this is the existing QueryDesc.already_executed flag, so let's just put all the responsibility in ExecutePlan. (It could have been done in ExecutorRun too, leading to a slightly shorter patch -- but if there's ever more than one caller of ExecutePlan, it seems better to have this logic in the subroutine than the callers.) This makes those ExecutorRun/PortalRun parameters unnecessary. In master it seems okay to just remove them, returning the API for those functions to what it was before parallelism. Such an API break is clearly not okay in stable branches, but for them we can just leave the parameters in place after documenting that they do nothing. Per report from Yugo Nagata, who also reviewed and tested this patch. Back-patch to all supported branches. Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp	2024-12-09 14:38:19 -05:00
Heikki Linnakangas	4d8275046c	Remove remants of "snapshot too old" Remove the 'whenTaken' and 'lsn' fields from SnapshotData. After the removal of the "snapshot too old" feature, they were never set to a non-zero value. This largely reverts commit `3e2f3c2e42`, which added the OldestActiveSnapshot tracking, and the init_toast_snapshot() function. That was only required for setting the 'whenTaken' and 'lsn' fields. SnapshotToast is now a constant again, like SnapshotSelf and SnapshotAny. I kept a thin get_toast_snapshot() wrapper around SnapshotToast though, to check that you have a registered or active snapshot. That's still a useful sanity check. Reviewed-by: Nathan Bossart, Andres Freund, Tom Lane Discussion: https://www.postgresql.org/message-id/cd4b4f8c-e63a-41c0-95f6-6e6cd9b83f6d@iki.fi	2024-12-09 18:13:03 +02:00
Michael Paquier	f0c569d715	Fix memory leak in pgoutput with publication list cache The pgoutput module caches publication names in a list and frees it upon invalidation. However, the code forgot to free the actual publication names within the list elements, as publication names are pstrdup()'d in GetPublication(). This would cause memory to leak in CacheMemoryContext, bloating it over time as this context is not cleaned. This is a problem for WAL senders running for a long time, as an accumulation of invalidation requests would bloat its cache memory usage. A second case, where this leak is easier to see, involves a backend calling SQL functions like pg_logical_slot_{get,peek}_changes() which create a new decoding context with each execution. More publications create more bloat. To address this, this commit adds a new memory context within the logical decoding context and resets it each time the publication names cache is invalidated, based on a suggestion from Amit Kapila. This ensures that the lifespan of the publication names aligns with that of the logical decoding context. This solution changes PGOutputData, which is fine for HEAD but it could cause an ABI breakage in stable branches as the structure size would change, so these are left out for now. Analyzed-by: Michael Paquier, Jeff Davis Author: Zhijie Hou Reviewed-by: Michael Paquier, Masahiko Sawada, Euler Taveira Discussion: https://postgr.es/m/Z0khf9EVMVLOc_YY@paquier.xyz	2024-12-09 16:41:46 +09:00
Tom Lane	3f9b962176	Ensure that pg_amop/amproc entries depend on their lefttype/righttype. Usually an entry in pg_amop or pg_amproc does not need a dependency on its amoplefttype/amoprighttype/amproclefttype/amprocrighttype types, because there is an indirect dependency via the argument types of its referenced operator or procedure, or via the opclass it belongs to. However, for some support procedures in some index AMs, the argument types of the support procedure might not mention the column data type at all. Also, the amop/amproc entry might be treated as "loose" in the opfamily, in which case it lacks a dependency on any particular opclass; or it might be a cross-type entry having a reference to a datatype that is not its opclass' opcintype. The upshot of all this is that there are cases where a datatype can be dropped while leaving behind amop/amproc entries that mention it, because there is no path in pg_depend showing that those entries depend on that type. Such entries are harmless in normal activity, because they won't get used, but they cause problems for maintenance actions such as dropping the operator family. They also cause pg_dump to produce bogus output. The previous commit put a band-aid on the DROP OPERATOR FAMILY failure, but a real fix is needed. To fix, add pg_depend entries showing that a pg_amop/pg_amproc entry depends on its lefttype/righttype. To avoid bloating pg_depend too much, skip this if the referenced operator or function has that type as an input type. (I did not bother with considering the possible indirect dependency via the opclass' opcintype; at least in the reported case, that wouldn't help anyway.) Probably, the reason this has escaped notice for so long is that add-on datatypes and relevant opclasses/opfamilies are usually packaged as extensions nowadays, so that there's no way to drop a type without dropping the referencing opclasses/opfamilies too. Still, in the absence of pg_depend entries there's nothing that constrains DROP EXTENSION to drop the opfamily entries before the datatype, so it seems possible for a DROP failure to occur anyway. The specific case that was reported doesn't fail in v13, because v13 prefers to attach the support procedure to the opclass not the opfamily. But it's surely possible to construct other edge cases that do fail in v13, so patch that too. Per report from Yoran Heling. Back-patch to all supported branches. Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021	2024-12-07 15:56:28 -05:00
Thomas Munro	71cb352904	Fix header inclusion order in c.h. Commit `962da900a` added #include <stdint.h> to postgres_ext.h, which broke c.h's header ordering rule. The system headers on some systems would then lock down off_t's size in private macros, before they'd had a chance to see our definition of _FILE_OFFSET_BITS (and presumably other things). This was picked up by perl's ABI compatibility checks on some 32 bit systems in the build farm. Move #include "postgres_ext.h" down below the system header section, and make the comments clearer (thanks to Tom for the new wording). Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2397643.1733347237%40sss.pgh.pa.us	2024-12-05 14:31:39 +13:00
Nathan Bossart	76fd342496	Provide a better error message for misplaced dispatch options. Before this patch, misplacing a special must-be-first option for dispatching to a subprogram (e.g., postgres -D . --single) would fail with an error like FATAL: --single requires a value This patch adjusts this error to more accurately complain that the special option wasn't listed first. The aforementioned error message now looks like FATAL: --single must be first argument The dispatch option parsing code has been refactored for use wherever ParseLongOption() is called. Beyond the obvious advantage of avoiding code duplication, this should prevent similar problems when new dispatch options are added. Note that we assume that none of the dispatch option names match another valid command-line argument, such as the name of a configuration parameter. Ideally, we'd remove this must-be-first requirement for these options, but after some investigation, we decided that wasn't worth the added complexity and behavior changes. Author: Nathan Bossart, Greg Sabino Mullane Reviewed-by: Greg Sabino Mullane, Peter Eisentraut, Álvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com	2024-12-04 15:04:15 -06:00
Peter Eisentraut	7727049e8f	Simplify IsIndexUsableForReplicaIdentityFull() Take Relation as argument instead of IndexInfo. Building the IndexInfo is an unnecessary intermediate step here. A future patch wants to get some information that is in the relcache but not in IndexInfo, so this will also help there. Discussion: https://www.postgresql.org/message-id/333d3886-b737-45c3-93f4-594c96bb405d@eisentraut.org	2024-12-04 08:33:28 +01:00
Amit Kapila	87ce27de69	Ensure stored generated columns must be published when required. Ensure stored generated columns that are part of REPLICA IDENTITY must be published explicitly for UPDATE and DELETE operations to be published. We can publish generated columns by listing them in the column list or by enabling the publish_generated_columns option. This commit changes the behavior of the test added in commit `adedf54e65` by giving an ERROR for the UPDATE operation in such cases. There is no way to trigger the bug reported in commit `adedf54e65` but we didn't remove the corresponding code change because it is still relevant when replicating changes from a publisher with version less than 18. We decided not to backpatch this behavior change to avoid the risk of breaking existing output plugins that may be sending generated columns by default although we are not aware of any such plugin. Also, we didn't see any reports related to this on STABLE branches which is another reason not to backpatch this change. Author: Shlok Kyal, Hou Zhijie Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/CANhcyEVw4V2Awe2AB6i0E5AJLNdASShGfdBLbUd1XtWDboymCA@mail.gmail.com	2024-12-04 09:45:18 +05:30
Thomas Munro	962da900ac	Use <stdint.h> and <inttypes.h> for c.h integers. Redefine our exact width types with standard C99 types and macros, including int64_t, INT64_MAX, INT64_C(), PRId64 etc. We were already using <stdint.h> types in a few places. One complication is that Windows' <inttypes.h> uses format strings like "%I64d", "%I32", "%I" for PRI64, PRI32, PTR*PTR, instead of mapping to other standardized format strings like "%lld" etc as seen on other known systems. Teach our snprintf.c to understand them. This removes a lot of configure clutter, and should also allow 64-bit numbers and other standard types to be used in localized messages without casting. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM	2024-12-04 15:05:38 +13:00
Jeff Davis	7167e05fc7	Move check for ucol_strcollUTF8 to pg_locale_icu.c The result of the check is only used by pg_locale_icu.c. Author: Andreas Karlsson Discussion: https://postgr.es/m/4548a168-62cd-457b-8d06-9ba7b985c477@proxel.se	2024-12-03 11:36:21 -08:00
Álvaro Herrera	4cc2a44980	Update obsolete comment Commit `3aa0395d4e` made worrying about BKI_ROWTYPE_OID matching no longer necessary, but this comment didn't get the memo.	2024-12-03 14:46:31 +01:00
Jeff Davis	e3fa2b037c	Fix unintentional behavior change in commit `e9931bfb75`. Prior to that commit, there was special case to use ASCII case mapping behavior for the libc provider with a single-byte encoding when that's the default collation. Commit `e9931bfb75` mistakenly eliminated that special case; this commit restores it. Discussion: https://postgr.es/m/01a104f0d2179d756261e90d96fd65c36ad6fcf0.camel@j-davis.com	2024-12-02 21:59:02 -08:00
David Rowley	4171c44c9b	Revert "Introduce CompactAttribute array in TupleDesc" This reverts commit `d28dff3f6c`. Quite a large number of buildfarm members didn't like this commit and it's not yet clear why. Reverting this before too many animals turn red. Discussion: https://postgr.es/m/CAApHDvr9i6T5=iAwQCxFDgMsthr_obVxgwBaEJkC8KUH6yM3Hw@mail.gmail.com	2024-12-03 17:12:38 +13:00
David Rowley	d28dff3f6c	Introduce CompactAttribute array in TupleDesc The new compact_attrs array stores a few select fields from FormData_pg_attribute in a more compact way, using only 16 bytes per column instead of the 104 bytes that FormData_pg_attribute uses. Using CompactAttribute allows performance-critical operations such as tuple deformation to be performed without looking at the FormData_pg_attribute element in TupleDesc which means fewer cacheline accesses. With this change, NAMEDATALEN could be increased with a much smaller negative impact on performance. For some workloads, tuple deformation can be the most CPU intensive part of processing the query. Some testing with 16 columns on a table where the first column is variable length showed around a 10% increase in transactions per second for an OLAP type query performing aggregation on the 16th column. However, in certain cases, the increases were much higher, up to ~25% on one AMD Zen4 machine. This also makes pg_attribute.attcacheoff redundant. A follow-on commit will remove it, thus shrinking the FormData_pg_attribute struct by 4 bytes. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com Reviewed-by: Andres Freund, Victor Yegorov	2024-12-03 16:50:59 +13:00
Michael Paquier	08691ea958	Rework some code handling pg_subscription data in psql and pg_dump This commit fixes some inconsistencies found in the frontend code when dealing with subscription catalog data. The following changes are done: - pg_subscription.h gains a EXPOSE_TO_CLIENT_CODE, so as more content defined in pg_subscription.h becomes available in pg_subscription_d.h for the frontend. - In psql's describe.c, substream can be switched to use CppAsString2() with its three LOGICALREP_STREAM_* values, with pg_subscription_d.h included. - pg_dump.c included pg_subscription.h, which is a header that should only be used in the backend code. The code is updated to use pg_subscription_d.h instead. - pg_dump stored all the data from pg_subscription in SubscriptionInfo with only strings, and a good chunk of them are boolean and char values. Using strings is not necessary, complicates the code (see for example two_phase_disabled[] removed here), and is inconsistent with the way other catalogs' data is handled. The fields of SubscriptionInfo are reordered to match with the order in its catalog, while on it. Reviewed-by: Hayato Kuroda Discussion: https://postgr.es/m/Z0lB2kp0ksHgmVuk@paquier.xyz	2024-12-03 09:48:12 +09:00
Nathan Bossart	db6a4a985b	Deprecate MD5 passwords. MD5 has been considered to be unsuitable for use as a cryptographic hash algorithm for some time. Furthermore, MD5 password hashes in PostgreSQL are vulnerable to pass-the-hash attacks, i.e., knowing the username and hashed password is sufficient to authenticate. The SCRAM-SHA-256 method added in v10 is not subject to these problems and is considered to be superior to MD5. This commit marks MD5 password support in PostgreSQL as deprecated and to be removed in a future release. The documentation now contains several deprecation notices, and CREATE ROLE and ALTER ROLE now emit deprecation warnings when setting MD5 passwords. The warnings can be disabled by setting the md5_password_warnings parameter to "off". Reviewed-by: Greg Sabino Mullane, Jim Nasby Discussion: https://postgr.es/m/ZwbfpJJol7lDWajL%40nathan	2024-12-02 13:30:07 -06:00
Dean Rasheed	97173536ed	Add a planner support function for numeric generate_series(). This allows the planner to estimate the number of rows returned by generate_series(numeric, numeric[, numeric]), when the input values can be estimated at plan time. Song Jinzhou, reviewed by Dean Rasheed and David Rowley. Discussion: https://postgr.es/m/tencent_F43E7F4DD50EF5986D1051DE8DE547910206%40qq.com Discussion: https://postgr.es/m/tencent_1F6D5B9A1545E02FD7D0EE508DFD056DE50A%40qq.com	2024-12-02 11:37:57 +00:00
Daniel Gustafsson	0c01f509a3	Fix wording in comment Author: Peter Smith <smithpb2250@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CAHut+PvE+2T2etdTaHi3n+xbCG_UYrshQuCbaAdJCFPpQGLwgQ@mail.gmail.com	2024-11-28 15:17:49 +01:00
Peter Eisentraut	7f798aca1d	Remove useless casts to (void *) Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org	2024-11-28 08:27:20 +01:00
Thomas Munro	97525bc5c8	Require sizeof(bool) == 1. The C standard says that sizeof(bool) is implementation-defined, but we know of no current systems where it is not 1. The last known systems seem to have been Apple macOS/PowerPC 10.5 and Microsoft Visual C++ 4, both long defunct. PostgreSQL has always required sizeof(bool) == 1 for the definition of bool that it used, but previously it would define its own type if the system-provided bool had a different size. That was liable to cause memory layout problems when interacting with system and third-party libraries on (by now hypothetical) computers with wider _Bool, and now C23 has introduced a new problem by making bool a built-in datatype (like C++), so the fallback code doesn't even compile. We could probably work around that, but then we'd be writing new untested code for a computer that doesn't exist. Instead, delete the unreachable and C23-uncompilable fallback code, and let existing static assertions fail if the system-provided bool is too wide. If we ever get a problem report from a real system, then it will be time to figure out what to do about it in a way that also works on modern compilers. Note on C++: Previously we avoided including <stdbool.h> or trying to define a new bool type in headers that might be included by C++ code. These days we might as well just include <stdbool.h> unconditionally: it should be visible to C++11 but do nothing, just as in C23. We already include <stdint.h> without C++ guards in c.h, and that falls under the same C99-compatibility section of the C++11 standard as <stdbool.h>, so let's remove the guards here too. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3198438.1731895163%40sss.pgh.pa.us	2024-11-28 12:01:14 +13:00
Andrew Dunstan	5c32c21afe	jsonapi: add lexer option to keep token ownership Commit `0785d1b8b` adds support for libpq as a JSON client, but allocations for string tokens can still be leaked during parsing failures. This is tricky to fix for the object_field semantic callbacks: the field name must remain valid until the end of the object, but if a parsing error is encountered partway through, object_field_end() won't be invoked and the client won't get a chance to free the field name. This patch adds a flag to switch the ownership of parsed tokens to the lexer. When this is enabled, the client must make a copy of any tokens it wants to persist past the callback lifetime, but the lexer will handle necessary cleanup on failure. Backend uses of the JSON parser don't need to use this flag, since the parser's allocations will occur in a short lived memory context. A -o option has been added to test_json_parser_incremental to exercise the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP tests make use of it. (The test program now cleans up allocated memory, so that tests can be usefully run under leak sanitizers.) Author: Jacob Champion Discussion: https://postgr.es/m/CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com	2024-11-27 12:07:14 -05:00
Nathan Bossart	61171a632d	Look up backend type in pg_signal_backend() more cheaply. Commit `ccd38024bc`, which introduced the pg_signal_autovacuum_worker role, added a call to pgstat_get_beentry_by_proc_number() for the purpose of determining whether the process is an autovacuum worker. This function calls pgstat_read_current_status(), which can be fairly expensive and may return cached, out-of-date information. Since we just need to look up the target backend's BackendType, and we already know its ProcNumber, we can instead inspect the BackendStatusArray directly, which is much less expensive and possibly more up-to-date. There are some caveats with this approach (which are documented in the code), but it's still substantially better than before. Reported-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/ujenaa2uabzfkwxwmfifawzdozh3ljr7geozlhftsuosgm7n7q%40g3utqqyyosb6	2024-11-27 10:32:25 -06:00
Thomas Munro	f1da075d9a	Remove configure check for _configthreadlocale(). All modern Windows systems have _configthreadlocale(). It was first introduced in msvcr80.dll from Visual Studio 2005. Historically, MinGW was stuck on even older msvcrt.dll, but added its own dummy implementation of the function when using msvcrt.dll years ago anyway, effectively rendering the configure test useless. In practice we don't encounter the dummy anymore because modern MinGW uses ucrt. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech	2024-11-27 23:12:19 +13:00
Álvaro Herrera	e6c32d9fad	Clean up newlines following left parentheses Most came in during the 17 cycle, so backpatch there. Some (particularly reorderbuffer.h) are very old, but backpatching doesn't seem useful. Like commits `c9d2977519`, `c4f113e8fe`.	2024-11-26 17:10:07 +01:00
Peter Eisentraut	2a7b2d9717	Improve InitShmemAccess() prototype The code comment said, 'the argument should be declared "PGShmemHeader *seghdr", but we use void to avoid having to include ipc.h in shmem.h.' We can achieve the original goal with a struct forward declaration. (ipc.h was also not the correct header file.) Discussion: https://www.postgresql.org/message-id/flat/cnthxg2eekacrejyeonuhiaezc7vd7o2uowlsbenxqfkjwgvwj@qgzu6eoqrglb	2024-11-26 08:46:22 +01:00
Richard Guo	a8ccf4e93a	Reordering DISTINCT keys to match input path's pathkeys The ordering of DISTINCT items is semantically insignificant, so we can reorder them as needed. In fact, in the parser, we absorb the sorting semantics of the sortClause as much as possible into the distinctClause, ensuring that one clause is a prefix of the other. This can help avoid a possible need to re-sort. In this commit, we attempt to adjust the DISTINCT keys to match the input path's pathkeys. This can likewise help avoid re-sorting, or allow us to use incremental-sort to save efforts. For DISTINCT ON expressions, the parser already ensures that they match the initial ORDER BY expressions. When reordering the DISTINCT keys, we must ensure that the resulting pathkey list matches the initial distinctClause pathkeys. This introduces a new GUC, enable_distinct_reordering, which allows the optimization to be disabled if needed. Author: Richard Guo Reviewed-by: Andrei Lepikhov Discussion: https://postgr.es/m/CAMbWs48dR26cCcX0f=8bja2JKQPcU64136kHk=xekHT9xschiQ@mail.gmail.com	2024-11-26 09:25:18 +09:00
Tom Lane	5b8728cd7f	Fix NULLIF()'s handling of read-write expanded objects. If passed a read-write expanded object pointer, the EEOP_NULLIF code would hand that same pointer to the equality function and then (unless equality was reported) also return the same pointer as its value. This is no good, because a function that receives a read-write expanded object pointer is fully entitled to scribble on or even delete the object, thus corrupting the NULLIF output. (This problem is likely unobservable with the equality functions provided in core Postgres, but it's easy to demonstrate with one coded in plpgsql.) To fix, make sure the pointer passed to the equality function is read-only. We can still return the original read-write pointer as the NULLIF result, allowing optimization of later operations. Per bug #18722 from Alexander Lakhin. This has been wrong since we invented expanded objects, so back-patch to all supported branches. Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org	2024-11-25 18:09:09 -05:00
Thomas Munro	bc5a4dfcf7	Assume that <stdbool.h> conforms to the C standard. Previously we checked "for <stdbool.h> that conforms to C99" using autoconf's AC_HEADER_STDBOOL macro. We've required C99 since PostgreSQL 12, so the test was redundant, and under C23 it was broken: autoconf 2.69's implementation doesn't understand C23's new empty header (the macros it's looking for went away, replaced by language keywords). Later autoconf versions fixed that, but let's just remove the anachronistic test. HAVE_STDBOOL_H and HAVE__BOOL will no longer be defined, but they weren't directly tested in core or likely extensions (except in 11, see below). PG_USE_STDBOOL (or USE_STDBOOL in 11 and 12) is still defined when sizeof(bool) is 1, which should be true on all modern systems. Otherwise we define our own bool type and values of size 1, which would fail to compile under C23 as revealed by the broken test. (We'll probably clean that dead code up in master, but here we want a minimal back-patchable change.) This came to our attention when GCC 15 recently started using using C23 by default and failed to compile the replacement code, as reported by Sam James and build farm animal alligator. Back-patch to all supported releases, and then two older versions that also know about <stdbool.h>, per the recently-out-of-support policy[1]. 12 requires C99 so it's much like the supported releases, but 11 only assumes C89 so it now uses AC_CHECK_HEADERS instead of the overly picky AC_HEADER_STDBOOL. (I could find no discussion of which historical systems had <stdbool.h> but failed the conformance test; if they ever existed, they surely aren't relevant to that policy's goals.) [1] https://wiki.postgresql.org/wiki/Committing_checklist#Policies Reported-by: Sam James <sam@gentoo.org> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> (master version) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (approach) Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org	2024-11-25 20:54:28 +13:00
Amit Kapila	d05a387d9d	Doc: Clarify the `inactive_since` field description. Updated to specify that it represents the exact time a slot became inactive, rather than the period of inactivity. Reported-by: Peter Smith Author: Bruce Momjian, Nisha Moond Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 17 Discussion: https://postgr.es/m/CAHut+PuvsyA5v8y7rYoY9mkDQzUhwaESM05yCByTMaDoRh30tA@mail.gmail.com	2024-11-25 11:12:32 +05:30
Alexander Korotkov	ae4569161a	Teach bitmap path generation about transforming OR-clauses to SAOP's When optimizer generates bitmap paths, it considers breaking OR-clause arguments one-by-one. But now, a group of similar OR-clauses can be transformed into SAOP during index matching. So, bitmap paths should keep up. This commit teaches bitmap paths generation machinery to group similar OR-clauses into dedicated RestrictInfos. Those RestrictInfos are considered both to match index as a whole (as SAOP), or to match as a set of individual OR-clause argument one-by-one (the old way). Therefore, bitmap path generation will takes advantage of OR-clauses to SAOP's transformation. The old way of handling them is also considered. So, there shouldn't be planning regression. Discussion: https://postgr.es/m/CAPpHfdu5iQOjF93vGbjidsQkhHvY2NSm29duENYH_cbhC6x%2BMg%40mail.gmail.com Author: Alexander Korotkov, Andrey Lepikhov Reviewed-by: Alena Rybakina, Andrei Lepikhov, Jian he, Robert Haas Reviewed-by: Peter Geoghegan	2024-11-24 01:41:45 +02:00
Nathan Bossart	efdc7d7475	Add INT64_HEX_FORMAT and UINT64_HEX_FORMAT to c.h. Like INT64_FORMAT and UINT64_FORMAT, these macros produce format strings for 64-bit integers. However, INT64_HEX_FORMAT and UINT64_HEX_FORMAT generate the output in hexadecimal instead of decimal. Besides introducing these macros, this commit makes use of them in several places. This was originally intended to be part of commit `5d6187d2a2`, but I left it out because I felt there was a nonzero chance that back-patching these new macros into c.h could cause problems with third-party code. We tend to be less cautious with such changes in new major versions. Note that UINT64_HEX_FORMAT was originally added in commit `ee1b30f128`, but it was placed in test_radixtree.c, so it wasn't widely available. This commit moves UINT64_HEX_FORMAT to c.h. Discussion: https://postgr.es/m/ZwQvtUbPKaaRQezd%40nathan	2024-11-22 12:41:57 -06:00
Heikki Linnakangas	8fb5936703	Make the memory layout of Port struct independent of USE_OPENSSL Commit `d39a49c1e4` added new fields to the struct, but missed the "keep these last" comment on the previous fields. Add placeholder variables so that the offsets of the fields are the same whether you build with USE_OPENSSL or not. This is a courtesy to extensions that might peek at the fields, to make the ABI the same regardless of the options used to build PostgreSQL. In reality, I don't expect any extensions to look at the 'raw_buf' fields. Firstly, they are new in v17, so no one's written such extensions yet. Secondly, extensions should have no business poking at those fields anyway. Nevertheless, fix this properly on 'master'. On v17, we mustn't change the memory layout, so just fix the comments. Author: Jacob Champion Discussion: https://www.postgresql.org/message-id/raw/CAOYmi%2BmKVJNzn5_TD_MK%3DhqO64r_w8Gb0FHCLk0oAkW-PJv8jQ@mail.gmail.com	2024-11-22 17:43:04 +02:00
Thomas Munro	aac831cafa	Use auxv to check for CRC32 instructions on ARM. Previously we probed for CRC32 instructions by testing if they caused SIGILL. Some have expressed doubts about that technique, the Linux documentation advises not to use it, and it's not exactly beautiful. Now that more operating systems expose CPU features to userspace via the ELF loader in approximately the same way, let's use that instead. This is expected to work on Linux, FreeBSD and recent OpenBSD. OpenBSD/ARM has not been tested and is not present in our build farm, but the API matches FreeBSD. On macOS, compilers use a more recent baseline ISA so the runtime test mechanism isn't reached. (A similar situation is expected for Windows/ARM when that port lands.) On NetBSD, runtime feature probing is lost for armv8-a builds. It looks potentially doable with sysctl following the example of the cpuctl program; patches are welcome. No back-patch for now, since we don't have any evidence of actual breakage from the previous technique. Suggested-by: Bastien Roucariès <rouca@debian.org> Discussion: https://postgr.es/m/4496616.iHFcN1HehY%40portable-bastien	2024-11-22 21:45:25 +13:00
Michael Paquier	c06e71d1ac	Add write_to_file to PgStat_KindInfo for pgstats kinds This new field controls if entries of a stats kind should be written or not to the on-disk pgstats file when shutting down an instance. This affects both fixed and variable-numbered kinds. This is useful for custom statistics by itself, and a patch is under discussion to add a new builtin stats kind where the write of the stats is not necessary. All the built-in stats kinds, as well as the two custom stats kinds in the test module injection_points, set this flag to "true" for now, so as stats entries are written to the on-disk pgstats file. Author: Bertrand Drouvot Reviewed-by: Nazir Bilal Yavuz Discussion: https://postgr.es/m/Zz7T47nHwYgeYwOe@ip-10-97-1-34.eu-west-3.compute.internal	2024-11-22 10:12:26 +09:00
Bruce Momjian	f95da9f0e0	More logically order libpq func. includes, e.g., group GUC vals Reported-by: David Zhang Discussion: https://postgr.es/m/65909efe-97c6-4863-af4e-21eb5a26dd1e@highgo.ca Co-authored-by: David Zhang Backpatch-through: master	2024-11-20 17:09:17 -05:00
Michael Paquier	5be1dabd2a	Optimize pg_memory_is_all_zeros() in memutils.h pg_memory_is_all_zeros() is currently implemented to do only a byte-per-byte comparison. While being sufficient for its existing callers for pgstats entries, it could lead to performance regressions should it be used for larger memory areas, like 8kB blocks, or even future commits. This commit optimizes the implementation of this function to be more efficient for larger sizes, written in a way so that compilers can optimize the code. This is portable across 32b and 64b architectures. The implementation handles three cases, depending on the size of the input provided: * If less than sizeof(size_t), do a simple byte-by-byte comparison. * If between sizeof(size_t) and (sizeof(size_t) * 8 - 1): Phase 1: byte-by-byte comparison, until the pointer is aligned. Phase 2: size_t comparisons, with aligned pointers, up to the last aligned location possible. ** Phase 3: byte-by-byte comparison, until the end location. * If more than (sizeof(size_t) * 8) bytes, this is the same as case 2 except that an additional phase is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers to use SIMD instructions if available. The last improvement proves to be at least 3 times faster than the size_t comparisons, which is something currently used for the all-zero page check in PageIsVerifiedExtended(). The optimization tricks that would encourage the use of SIMD instructions have been suggested by David Rowley. Author: Bertrand Drouvot Reviewed-by: Michael Paquier, Ranier Vilela Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com	2024-11-18 10:08:38 +09:00
Tom Lane	b69bdcee9c	Avoid assertion due to disconnected NFA sub-graphs in regex parsing. In commit `08c0d6ad6` which introduced "rainbow" arcs in regex NFAs, I didn't think terribly hard about what to do when creating the color complement of a rainbow arc. Clearly, the complement cannot match any characters, and I took the easy way out by just not building any arcs at all in the complement arc set. That mostly works, but Nikolay Shaplov found a case where it doesn't: if we decide to delete that sub-NFA later because it's inside a "{0}" quantifier, delsub() suffered an assertion failure. That's because delsub() relies on the target sub-NFA being fully connected. That was always true before, and the best fix seems to be to restore that property. Hence, invent a new arc type CANTMATCH that can be generated in place of an empty color complement, and drop it again later when we start NFA optimization. (At that point we don't need to do delsub() any more, and besides there are other cases where NFA optimization can lead to disconnected subgraphs.) It appears that this bug has no consequences in a non-assert-enabled build: there will be some transiently leaked NFA states/arcs, but they'll get cleaned up eventually. Still, we don't like assertion failures, so back-patch to v14 where rainbow arcs were introduced. Per bug #18708 from Nikolay Shaplov. Discussion: https://postgr.es/m/18708-f94f2599c9d2c005@postgresql.org	2024-11-15 18:23:38 -05:00
Michael Paquier	818119afcc	Fix race conditions with drop of reused pgstats entries This fixes a set of race conditions with cumulative statistics where a shared stats entry could be dropped while it should still be valid in the event when it is reused: an entry may refer to a different object but requires the same hash key. This can happen with various stats kinds, like: - Replication slots that compute internally an index number, for different slot names. - Stats kinds that use an OID in the object key, where a wraparound causes the same key to be used if an OID is used for the same object. - As of PostgreSQL 18, custom pgstats kinds could also be an issue, depending on their implementation. This issue is fixed by introducing a counter called "generation" in the shared entries via PgStatShared_HashEntry, initialized at 0 when an entry is created and incremented when the same entry is reused, to avoid concurrent issues on drop because of other backends still holding a reference to it. This "generation" is copied to the local copy that a backend holds when looking at an object, then cross-checked with the shared entry to make sure that the entry is not dropped even if its "refcount" justifies that if it has been reused. This problem could show up when a backend shuts down and needs to discard any entries it still holds, causing statistics to be removed when they should not, or even an assertion failure. Another report involved a failure in a standby after an OID wraparound, where the startup process would FATAL on a "can only drop stats once", stopping recovery abruptly. The buildfarm has been sporadically complaining about the problem, as well, but the window is hard to reach with the in-core tests. Note that the issue can be reproduced easily by adding a sleep before dshash_find() in pgstat_release_entry_ref() to enlarge the problematic window while repeating test_decoding's isolation test oldest_xmin a couple of times, for example, as pointed out by Alexander Lakhin. Reported-by: Alexander Lakhin, Peter Smith Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org Backpatch-through: 15	2024-11-15 11:31:58 +09:00
Heikki Linnakangas	5b00786857	Pass MyPMChildSlot as an explicit argument to child process All the other global variables passed from postmaster to child have the same value in all the processes, while MyPMChildSlot is more like a parameter to each child process. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:12:32 +02:00
Heikki Linnakangas	a78af04270	Assign a child slot to every postmaster child process Previously, only backends, autovacuum workers, and background workers had an entry in the PMChildFlags array. With this commit, all postmaster child processes, including all the aux processes, have an entry. Dead-end backends still don't get an entry, though, and other processes that don't touch shared memory will never mark their PMChildFlags entry as active. We now maintain separate freelists for different kinds of child processes. That ensures that there are always slots available for autovacuum and background workers. Previously, pre-authentication backends could prevent autovacuum or background workers from starting up, by using up all the slots. The code to manage the slots in the postmaster process is in a new pmchild.c source file. Because postmaster.c is just so large. Assigning pmsignal slot numbers is now pmchild.c's responsibility. This replaces the PMChildInUse array in pmsignal.c. Some of the comments in postmaster.c still talked about the "stats process", but that was removed in commit `5891c7a8ed`. Fix those while we're at it. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:12:28 +02:00
Heikki Linnakangas	18d67a8d7d	Replace postmaster.c's own backend type codes with BackendType Introduce a separate BackendType for dead-end children, so that we don't need a separate dead_end flag. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:06:16 +02:00
Amit Langote	bfeeb065ea	Add missing word in comment Discussion: https://postgr.es/m/CA+HiwqFgdp8=0_gi+DU0fPWZbg7qY3KZ_c1Wj1DEvzXC4BCnMQ@mail.gmail.com	2024-11-12 20:39:57 +09:00
Tom Lane	5a2fed911a	Fix improper interactions between session_authorization and role. The SQL spec mandates that SET SESSION AUTHORIZATION implies SET ROLE NONE. We tried to implement that within the lowest-level functions that manipulate these settings, but that was a bad idea. In particular, guc.c assumes that it doesn't matter in what order it applies GUC variable updates, but that was not the case for these two variables. This problem, compounded by some hackish attempts to work around it, led to some security-grade issues: * Rolling back a transaction that had done SET SESSION AUTHORIZATION would revert to SET ROLE NONE, even if that had not been the previous state, so that the effective user ID might now be different from what it had been. * The same for SET SESSION AUTHORIZATION in a function SET clause. * If a parallel worker inspected current_setting('role'), it saw "none" even when it should see something else. Also, although the parallel worker startup code intended to cope with the current role's pg_authid row having disappeared, its implementation of that was incomplete so it would still fail. Fix by fully separating the miscinit.c functions that assign session_authorization from those that assign role. To implement the spec's requirement, teach set_config_option itself to perform "SET ROLE NONE" when it sets session_authorization. (This is undoubtedly ugly, but the alternatives seem worse. In particular, there's no way to do it within assign_session_authorization without incompatible changes in the API for GUC assign hooks.) Also, improve ParallelWorkerMain to directly set all the relevant user-ID variables instead of relying on some of them to get set indirectly. That allows us to survive not finding the pg_authid row during worker startup. In v16 and earlier, this includes back-patching `9987a7bf3` which fixed a violation of GUC coding rules: SetSessionAuthorization is not an appropriate place to be throwing errors from. Security: CVE-2024-10978	2024-11-11 10:29:54 -05:00
Michael Paquier	e7a9496de9	Add two attributes to pg_stat_database for parallel workers activity Two attributes are added to pg_stat_database: * parallel_workers_to_launch, counting the total number of parallel workers that were planned to be launched. * parallel_workers_launched, counting the total number of parallel workers actually launched. The ratio of both fields can provide hints that there are not enough slots available when launching parallel workers, also useful when pg_stat_statements is not deployed on an instance (i.e. `cf54a2c002`). This commit relies on `de3a2ea3b2`, that has added two fields to EState, that get incremented when executing Gather or GatherMerge nodes. A test is added in select_parallel, where parallel workers are spawned. Bump catalog version. Author: Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com	2024-11-11 10:40:48 +09:00
Thomas Munro	29d66b2d2f	jit: Remove obsolete LLVM version guard. Commit `9044fc1d` needed a version guard when back-patched, but it is redundant in master as of commit `972c2cd2`, and I accidentally left it in there.	2024-11-11 12:07:24 +13:00
Nathan Bossart	0fa6884065	Fix sign-compare warnings in pg_iovec.h. The code in question (pg_preadv() and pg_pwritev()) has been around for a while, but commit `15c9ac3629` moved it to a header file. If third-party code that includes this header file is built with -Wsign-compare on a system without preadv() or pwritev(), warnings ensue. This commit fixes said warnings by casting the result of pg_pread()/pg_pwrite() to size_t, which should be safe because we will have already checked for a negative value. Author: Wolfgang Walther Discussion: https://postgr.es/m/16989737-1aa8-48fd-8dfe-b7ada06509ab%40technowledgy.de Backpatch-through: 17	2024-11-08 16:11:08 -06:00
Álvaro Herrera	14e87ffa5c	Add pg_constraint rows for not-null constraints We now create contype='n' pg_constraint rows for not-null constraints on user tables. Only one such constraint is allowed for a column. We propagate these constraints to other tables during operations such as adding inheritance relationships, creating and attaching partitions and creating tables LIKE other tables. These related constraints mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations: for example, as opposed to CHECK constraints, we don't match not-null ones by name when descending a hierarchy to alter or remove it, instead matching by the name of the column that they apply to. This means we don't require the constraint names to be identical across a hierarchy. The inheritance status of these constraints can be controlled: now we can be sure that if a parent table has one, then all children will have it as well. They can optionally be marked NO INHERIT, and then children are free not to have one. (There's currently no support for altering a NO INHERIT constraint into inheriting down the hierarchy, but that's a desirable future feature.) This also opens the door for having these constraints be marked NOT VALID, as well as allowing UNIQUE+NOT NULL to be used for functional dependency determination, as envisioned by commit `e49ae8d3bc`. It's likely possible to allow DEFERRABLE constraints as followup work, as well. psql shows these constraints in \d+, though we may want to reconsider if this turns out to be too noisy. Earlier versions of this patch hid constraints that were on the same columns of the primary key, but I'm not sure that that's very useful. If clutter is a problem, we might be better off inventing a new \d++ command and not showing the constraints in \d+. For now, we omit these constraints on system catalog columns, because they're unlikely to achieve anything. The main difference to the previous attempt at this (`b0e96f3119`) is that we now require that such a constraint always exists when a primary key is in the column; we didn't require this previously which had a number of unpalatable consequences. With this requirement, the code is easier to reason about. For example: - We no longer have "throwaway constraints" during pg_dump. We needed those for the case where a table had a PK without a not-null underneath, to prevent a slow scan of the data during restore of the PK creation, which was particularly problematic for pg_upgrade. - We no longer have to cope with attnotnull being set spuriously in case a primary key is dropped indirectly (e.g., via DROP COLUMN). Some bits of code in this patch were authored by Jian He. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql	2024-11-08 13:28:48 +01:00
Nathan Bossart	f78667bd91	Use __attribute__((target(...))) for AVX-512 support. Presently, we check for compiler support for the required intrinsics both with and without extra compiler flags (e.g., -mxsave), and then depending on the results of those checks, we pick which files to compile with which flags. This is tedious and complicated, and it results in unsustainable coding patterns such as separate files for each portion of code may need to be built with different compiler flags. This commit introduces support for __attribute__((target(...))) and uses it for the AVX-512 code. This simplifies both the configure-time checks and the build scripts, and it allows us to place the functions that use the intrinsics in files that we otherwise do not want to build with special CPU instructions. We are careful to avoid using __attribute__((target(...))) on compilers that do not understand it, but we still perform the configure-time checks in case the compiler allows using the intrinsics without it (e.g., MSVC). A similar change could likely be made for some of the CRC-32C code, but that is left as a future exercise. Suggested-by: Andres Freund Reviewed-by: Raghuveer Devulapalli, Andres Freund Discussion: https://postgr.es/m/20240731205254.vfpap7uxwmebqeaf%40awork3.anarazel.de	2024-11-07 13:58:43 -06:00
Amit Kapila	7054186c4e	Replicate generated columns when 'publish_generated_columns' is set. This patch builds on the work done in commit `745217a051` by enabling the replication of generated columns alongside regular column changes through a new publication parameter: publish_generated_columns. Example usage: CREATE PUBLICATION pub1 FOR TABLE tab_gencol WITH (publish_generated_columns = true); The column list takes precedence. If the generated columns are specified in the column list, they will be replicated even if 'publish_generated_columns' is set to false. Conversely, if generated columns are not included in the column list (assuming the user specifies a column list), they will not be replicated even if 'publish_generated_columns' is true. Author: Vignesh C, Shubham Khanna Reviewed-by: Peter Smith, Amit Kapila, Hayato Kuroda, Shlok Kyal, Ajin Cherian, Hou Zhijie, Masahiko Sawada Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com	2024-11-07 08:58:49 +05:30
Thomas Munro	9044fc1d45	Monkey-patch LLVM code to fix ARM relocation bug. Supply a new memory manager for RuntimeDyld, to avoid crashes in generated code caused by memory placement that can overflow a 32 bit data type. This is a drop-in replacement for the llvm::SectionMemoryManager class in the LLVM library, with Michael Smith's proposed fix from https://www.github.com/llvm/llvm-project/pull/71968. We hereby slurp it into our own source tree, after moving into a new namespace llvm::backport and making some minor adjustments so that it can be compiled with older LLVM versions as far back as 12. It's harder to make it work on even older LLVM versions, but it doesn't seem likely that people are really using them so that is not investigated for now. The problem could also be addressed by switching to JITLink instead of RuntimeDyld, and that is the LLVM project's recommended solution as the latter is about to be deprecated. We'll have to do that soon enough anyway, and then when the LLVM version support window advances far enough in a few years we'll be able to delete this code. Unfortunately that wouldn't be enough for PostgreSQL today: in most relevant versions of LLVM, JITLink is missing or incomplete. Several other projects have already back-ported this fix into their fork of LLVM, which is a vote of confidence despite the lack of commit into LLVM as of today. We don't have our own copy of LLVM so we can't do exactly what they've done; instead we have a copy of the whole patched class so we can pass an instance of it to RuntimeDyld. The LLVM project hasn't chosen to commit the fix yet, and even if it did, it wouldn't be back-ported into the releases of LLVM that most of our users care about, so there is not much point in waiting any longer for that. If they make further changes and commit it to LLVM 19 or 20, we'll still need this for older versions, but we may want to resynchronize our copy and update some comments. The changes that we've had to make to our copy can be seen by diffing our SectionMemoryManager.{h,cpp} files against the ones in the tree of the pull request. Per the LLVM project's license requirements, a copy is in SectionMemoryManager.LICENSE. This should fix the spate of crash reports we've been receiving lately from users on large memory ARM systems. Back-patch to all supported releases. Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> (license aspects) Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com	2024-11-06 23:17:18 +13:00
Alexander Korotkov	3a7ae6b3d9	Revert pg_wal_replay_wait() stored procedure This commit reverts `3c5db1d6b0`, and subsequent improvements and fixes including `8036d73ae3`, `867d396ccd`, `3ac3ec580c`, `0868d7ae70`, `85b98b8d5a`, `2520226c95`, `014f9f34d2`, `e658038772`, `e1555645d7`, `5035172e4a`, `6cfebfe88b`, `73da6b8d1b`, and `e546989a26`. The reason for reverting is a set of remaining issues. Most notably, the stored procedure appears to need more effort than the utility statement to turn the backend into a "snapshot-less" state. This makes an approach to use stored procedures questionable. Catversion is bumped. Discussion: https://postgr.es/m/Zyhj2anOPRKtb0xW%40paquier.xyz	2024-11-04 22:47:57 +02:00
Heikki Linnakangas	3c0fd64fec	Split ProcSleep function into JoinWaitQueue and ProcSleep Split ProcSleep into two functions: JoinWaitQueue and ProcSleep. JoinWaitQueue is called while holding the partition lock, and inserts the current process to the wait queue, while ProcSleep() does the actual sleeping. ProcSleep() is now called without holding the partition lock, and it no longer re-acquires the partition lock before returning. That makes the wakeup a little cheaper. Once upon a time, re-acquiring the partition lock was needed to prevent a signal handler from longjmping out at a bad time, but these days our signal handlers just set flags, and longjmping can only happen at points where we explicitly run CHECK_FOR_INTERRUPTS(). If JoinWaitQueue detects an "early deadlock" before even joining the wait queue, it returns without changing the shared lock entry, leaving the cleanup of the shared lock entry to the caller. This makes the handling of an early deadlock the same as the dontWait=true case. One small user-visible side-effect of this refactoring is that we now only set the 'ps' title to say "waiting" when we actually enter the sleep, not when the lock is skipped because dontWait=true, or when a deadlock is detected early before entering the sleep. This eliminates the 'lockAwaited' global variable in proc.c, which was largely redundant with 'awaitedLock' in lock.c Note: Updating the local lock table is now the caller's responsibility. JoinWaitQueue and ProcSleep are now only responsible for modifying the shared state. Seems a little nicer that way. Based on Thomas Munro's earlier patch and observation that ProcSleep doesn't really need to re-acquire the partition lock. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi	2024-11-04 17:59:24 +02:00
Noah Misch	0bada39c83	Fix inplace update buffer self-deadlock. A CacheInvalidateHeapTuple* callee might call CatalogCacheInitializeCache(), which needs a relcache entry. Acquiring a valid relcache entry might scan pg_class. Hence, to prevent undetected LWLock self-deadlock, CacheInvalidateHeapTuple* callers must not hold BUFFER_LOCK_EXCLUSIVE on buffers of pg_class. Move the CacheInvalidateHeapTupleInplace() before the BUFFER_LOCK_EXCLUSIVE. No back-patch, since I've reverted commit `243e9b40f1` from non-master branches. Reported by Alexander Lakhin. Reviewed by Alexander Lakhin. Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com	2024-11-02 09:04:56 -07:00
Heikki Linnakangas	368d8270c8	Rename two functions that wake up other processes Instead of talking about setting latches, which is a pretty low-level mechanism, emphasize that they wake up other processes. This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi	2024-11-01 13:47:24 +02:00
Heikki Linnakangas	a9c546a5a3	Use ProcNumbers instead of direct Latch pointers to address other procs This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi	2024-11-01 13:47:20 +02:00
Michael Paquier	07e9e28b56	Add pg_memory_is_all_zeros() in memutils.h This new function tests if a memory region starting at a given location for a defined length is made only of zeroes. This unifies in a single path the all-zero checks that were happening in a couple of places of the backend code: - For pgstats entries of relation, checkpointer and bgwriter, where some "all_zeroes" variables were previously used with memcpy(). - For all-zero buffer pages in PageIsVerifiedExtended(). This new function uses the same forward scan as the check for all-zero buffer pages, applying it to the three pgstats paths mentioned above. Author: Bertrand Drouvot Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Peter Smith Discussion: https://postgr.es/m/ZupUDDyf1hHI4ibn@ip-10-97-1-34.eu-west-3.compute.internal	2024-11-01 11:35:46 +09:00
Michael Paquier	49d6c7d8da	Add SQL function array_reverse() This function takes in input an array, and reverses the position of all its elements. This operation only affects the first dimension of the array, like array_shuffle(). The implementation structure is inspired by array_shuffle(), with a subroutine called array_reverse_n() that may come in handy in the future, should more functions able to reverse portions of arrays be introduced. Bump catalog version. Author: Aleksander Alekseev Reviewed-by: Ashutosh Bapat, Tom Lane, Vladlen Popolitov Discussion: https://postgr.es/m/CAJ7c6TMpeO_ke+QGOaAx9xdJuxa7r=49-anMh3G5476e3CX1CA@mail.gmail.com	2024-11-01 10:32:19 +09:00
Tom Lane	89e51abcb2	Add a parse location field to struct FunctionParameter. This allows an error cursor to be supplied for a bunch of bad-function-definition errors that previously lacked one, or that cheated a bit by pointing at the contained type name when the error isn't really about that. Bump catversion from an abundance of caution --- I don't think this node type can actually appear in stored views/rules, but better safe than sorry. Jian He and Tom Lane (extracted from a larger patch by Jian, with some additional work by me) Discussion: https://postgr.es/m/CACJufxEmONE3P2En=jopZy1m=cCCUs65M4+1o52MW5og9oaUPA@mail.gmail.com	2024-10-31 16:09:27 -04:00
David Rowley	3974bc3196	Remove unused field from SubPlanState struct `bf6c614a2` did some conversion work to use ExprState instead of manually calling equality functions to check if one set of values is not distinct from another set. That patch removed many of the fields that became redundant as a result of that change, but it forgot to remove SubPlanState.tab_eq_funcs. Fix that. In passing, fix the header comment for TupleHashEntryData to correctly spell the field name it's talking about. Author: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CA+FpmFeycdombFzrjZw7Rmc29CVm4OOzCWwu=dVBQ6q=PX8SvQ@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvrWR2jYVhec=COyF2g2BE_ns91NDsCHAMFiXbyhEujKdQ@mail.gmail.com	2024-10-31 13:44:15 +13:00
Peter Geoghegan	763d65ae25	Fix bug in nbtree array primitive scan scheduling. A bug in nbtree's handling of primitive index scan scheduling could lead to wrong answers when a scrollable cursor was used with an index scan that had a SAOP index qual. Wrong answers were only possible when the scan direction changed after a primitive scan was scheduled, but before _bt_next was asked to fetch the next tuple in line (i.e. for things to break, _bt_next had to be denied the opportunity to step off the page in the same direction as the one used when the primscan was scheduled). Furthermore, the issue only occurred when the page in question happened to be the first page to be visited by the entire top-level scan; the issue hinged upon the cursor backing up to the absolute beginning of the key space that it returns tuples from (fetching in the opposite scan direction across a "primitive scan boundary" always worked correctly). To fix, make _bt_next unset the "needs primitive index scan" flag when it detects that the current scan direction is not the one that was used by _bt_readpage back when the primitive scan in question was scheduled. This fixes the cases that are known to be faulty, and also seems like a good idea on general robustness grounds. Affected scrollable cursor cases now avoid a spurious primitive index scan when they fetch backwards to the absolute start of the key space to be visited by their cursor. Fetching backwards now only returns those tuples at the start of the scan, as expected. It'll also be okay to once again fetch forwards from the start at that point, since the scan will be left in a state that's exactly consistent with the state it was in before any tuples were ever fetched, as expected. Oversight in commit `5bf748b8`, which enhanced nbtree ScalarArrayOp execution. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wznv49bFsE2jkt4GuZ0tU2C91dEST=50egzjY2FeOcHL4Q@mail.gmail.com Backpatch: 17-, where commit `5bf748b8` first appears.	2024-10-30 10:57:19 -04:00
Amit Kapila	745217a051	Replicate generated columns when specified in the column list. This commit allows logical replication to publish and replicate generated columns when explicitly listed in the column list. We also ensured that the generated columns were copied during the initial tablesync when they were published. We will allow to replicate generated columns even when they are not specified in the column list (via a new publication option) in a separate commit. The motivation of this work is to allow replication for cases where the client doesn't have generated columns. For example, the case where one is trying to replicate data from Postgres to the non-Postgres database. Author: Shubham Khanna, Vignesh C, Hou Zhijie Reviewed-by: Peter Smith, Hayato Kuroda, Shlok Kyal, Amit Kapila Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com	2024-10-30 12:36:26 +05:30
Noah Misch	30d47ec8c6	Unpin buffer before inplace update waits for an XID to end. Commit `a07e03fd8f` changed inplace updates to wait for heap_update() commands like GRANT TABLE and GRANT DATABASE. By keeping the pin during that wait, a sequence of autovacuum workers and an uncommitted GRANT starved one foreground LockBufferForCleanup() for six minutes, on buildfarm member sarus. Prevent, at the cost of a bit of complexity. Back-patch to v12, like the earlier commit. That commit and heap_inplace_lock() have not yet appeared in any release. Discussion: https://postgr.es/m/20241026184936.ae.nmisch@google.com	2024-10-29 09:39:55 -07:00
Michael Paquier	4b7bba49e7	doc: Add better description for rewrite functions in event triggers There are two functions that can be used in event triggers to get more details about a rewrite happening on a relation. Both had a limited documentation: - pg_event_trigger_table_rewrite_reason() and pg_event_trigger_table_rewrite_oid() were not mentioned in the main event trigger section in the paragraph dedicated to the event table_rewrite. - pg_event_trigger_table_rewrite_reason() returns an integer which is a bitmap of the reasons why a rewrite happens. There was no explanation about the meaning of these values, forcing the reader to look at the code to find out that these are defined in event_trigger.h. While on it, let's add a comment in event_trigger.h where the AT_REWRITE_* are defined, telling to update the documentation when these values are changed. Backpatch down to 13 as a consequence of `1ad23335f3`, where this area of the documentation has been heavily reworked. Author: Greg Sabino Mullane Discussion: https://postgr.es/m/CAKAnmmL+Z6j-C8dAx1tVrnBmZJu+BSoc68WSg3sR+CVNjBCqbw@mail.gmail.com Backpatch-through: 13	2024-10-29 15:35:01 +09:00
Tom Lane	11b7de4a78	Unify src/common/'s definitions of MaxAllocSize. As threatened in the previous patch, define MaxAllocSize in src/include/common/fe_memutils.h rather than having several copies of it in different src/common/*.c files. This also provides an opportunity to document it better. While this would probably be safe to back-patch, I'll refrain (for now anyway).	2024-10-28 14:39:01 -04:00
Michael Paquier	6b652e6ce8	Set query ID for inner queries of CREATE TABLE AS and DECLARE Some utility statements contain queries that can be planned and executed: CREATE TABLE AS and DECLARE CURSOR. This commit adds query ID computation for the inner queries executed by these two utility commands, with and without EXPLAIN. This change leads to four new callers of JumbleQuery() and post_parse_analyze_hook() so as extensions can decide what to do with this new data. Previously, extensions relying on the query ID, like pg_stat_statements, were not able to track these nested queries as the query_id was 0. For pg_stat_statements, this commit leads to additions under !toplevel when pg_stat_statements.track is set to "all", as shown in its regression tests. The output of EXPLAIN for these two utilities gains a "Query Identifier" if compute_query_id is enabled. Author: Anthonin Bonnefoy Reviewed-by: Michael Paquier, Jian He Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com	2024-10-28 09:03:20 +09:00
Melanie Plageman	de380a62b5	Make table_scan_bitmap_next_block() async-friendly Move all responsibility for indicating a block is exhuasted into table_scan_bitmap_next_tuple() and advance the main iterator in heap-specific code. This flow control makes more sense and is a step toward using the read stream API for bitmap heap scans. Previously, table_scan_bitmap_next_block() returned false to indicate table_scan_bitmap_next_tuple() should not be called for the tuples on the page. This happened both when 1) there were no visible tuples on the page and 2) when the block returned by the iterator was past the end of the table. BitmapHeapNext() (generic bitmap table scan code) handled the case when the bitmap was exhausted. It makes more sense for table_scan_bitmap_next_tuple() to return false when there are no visible tuples on the page and table_scan_bitmap_next_block() to return false when the bitmap is exhausted or there are no more blocks in the table. As part of this new design, TBMIterateResults are no longer used as a flow control mechanism in BitmapHeapNext(), so we removed table_scan_bitmap_next_tuple's TBMIterateResult parameter. Note that the prefetch iterator is still saved in the BitmapHeapScanState node and advanced in generic bitmap table scan code. This is because 1) it was not necessary to change the prefetch iterator location to change the flow control in BitmapHeapNext() 2) modifying prefetch iterator management requires several more steps better split over multiple commits and 3) the prefetch iterator will be removed once the read stream API is used. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas, Mark Dilger Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-10-25 10:11:58 -04:00
Melanie Plageman	7bd7aa4d30	Move EXPLAIN counter increment to heapam_scan_bitmap_next_block Increment the lossy and exact page counters for EXPLAIN of bitmap heap scans in heapam_scan_bitmap_next_block(). Note that other table AMs will need to do this as well Pushing the counters into heapam_scan_bitmap_next_block() is required to be able to use the read stream API for bitmap heap scans. The bitmap iterator must be advanced from inside the read stream callback, so TBMIterateResults cannot be used as a flow control mechanism in BitmapHeapNext(). Author: Melanie Plageman Reviewed-by: Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-10-25 10:11:46 -04:00
Noah Misch	243e9b40f1	For inplace update, send nontransactional invalidations. The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption. Back-patch to v12 (all supported versions). The back branch versions don't change WAL, because those branches just added end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Reviewed by Nitin Motiani and (in earlier versions) Andres Freund. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com	2024-10-25 06:51:02 -07:00
Michael Paquier	248c2d1923	Refactor code converting a publication name List to a StringInfo The existing get_publications_str() is renamed to GetPublicationsStr() and is moved to pg_subscription.c, so as it is possible to reuse it at two locations of the tablesync code where the same logic was duplicated. fetch_remote_table_info() was doing two List->StringInfo conversions when dealing with a server of version 15 or newer. The conversion happens only once now. This refactoring leads to less code overall. Author: Peter Smith Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/CAHut+PtJMk4bKXqtpvqVy9ckknCgK9P6=FeG8zHF=6+Em_Snpw@mail.gmail.com	2024-10-25 12:02:04 +09:00
Jeff Davis	d32d146399	Add functions pg_restore_relation_stats(), pg_restore_attribute_stats(). Similar to the pg_set_*_stats() functions, except with a variadic signature that's designed to be more future-proof. Additionally, most problems are reported as WARNINGs rather than ERRORs, allowing most stats to be restored even if some cannot. These functions are intended to be called from pg_dump to avoid the need to run ANALYZE after an upgrade. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com	2024-10-24 12:08:00 -07:00
Daniel Gustafsson	45188c2ea2	Support configuring TLSv1.3 cipher suites The ssl_ciphers GUC can only set cipher suites for TLSv1.2, and lower, connections. For TLSv1.3 connections a different OpenSSL API must be used. This adds a new GUC, ssl_tls13_ciphers, which can be used to configure a colon separated list of cipher suites to support when performing a TLSv1.3 handshake. Original patch by Erica Zhang with additional hacking by me. Author: Erica Zhang <ericazhangy2021@qq.com> Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/tencent_063F89FA72CCF2E48A0DF5338841988E9809@qq.com	2024-10-24 15:20:32 +02:00
Daniel Gustafsson	6c66b7443c	Raise the minimum supported OpenSSL version to 1.1.1 Commit `a70e01d430` retired support for OpenSSL 1.0.2 in order to get rid of the need for manual initialization of the library. This left our API usage compatible with 1.1.0 which was defined as the minimum required version. Also mention that 3.4 is the minimum version required when using LibreSSL. An upcoming commit will introduce support for configuring TLSv1.3 cipher suites which require an API call in OpenSSL 1.1.1 and onwards. In order to support this setting this commit will set v1.1.1 as the new minimum required version. The version-specific call for randomness init added in commit `c3333dbc0c` is removed as it's no longer needed. Author: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/909A668B-06AD-47D1-B8EB-A164211AAD16@yesql.se Discussion: https://postgr.es/m/tencent_063F89FA72CCF2E48A0DF5338841988E9809@qq.com	2024-10-24 15:20:19 +02:00
Alexander Korotkov	e546989a26	Add 'no_error' argument to pg_wal_replay_wait() This argument allow skipping throwing an error. Instead, the result status can be obtained using pg_wal_replay_wait_status() function. Catversion is bumped. Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZtUF17gF0pNpwZDI%40paquier.xyz Reviewed-by: Pavel Borisov	2024-10-24 15:02:21 +03:00
Alexander Korotkov	73da6b8d1b	Refactor WaitForLSNReplay() to return the result of waiting Currently, WaitForLSNReplay() immediately throws an error if waiting for LSN replay is not successful. This commit teaches WaitForLSNReplay() to return the result of waiting, while making pg_wal_replay_wait() responsible for throwing an appropriate error. This is preparation to adding 'no_error' argument to pg_wal_replay_wait() and new function pg_wal_replay_wait_status(), which returns the last wait result status. Additionally, we stop distinguishing situations when we find our instance to be not in a recovery state before entering the waiting loop and inside the waiting loop. Standby promotion may happen at any moment, even between issuing a procedure call statement and pg_wal_replay_wait() doing a first check of recovery status. Thus, there is no pointing distinguishing these situations. Also, since we may exit the waiting loop and see our instance not in recovery without throwing an error, we need to deleteLSNWaiter() in that case. We do this unconditionally for the sake of simplicity, even if standby was already promoted after reaching the target LSN, the startup process surely already deleted us. Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZtUF17gF0pNpwZDI%40paquier.xyz Reviewed-by: Michael Paquier, Pavel Borisov	2024-10-24 14:38:27 +03:00
Alexander Korotkov	5035172e4a	Move LSN waiting declarations and definitions to better place `3c5db1d6b` implemented the pg_wal_replay_wait() stored procedure. Due to the patch development history, the implementation resided in src/backend/commands/waitlsn.c (src/include/commands/waitlsn.h for headers). `014f9f34d` moved pg_wal_replay_wait() itself to src/backend/access/transam/xlogfuncs.c near to the WAL-manipulation functions. But most of the implementation stayed in place. The code in src/backend/commands/waitlsn.c has nothing to do with commands, but is related to WAL. So, this commit moves this code into src/backend/access/transam/xlogwait.c (src/include/access/xlogwait.h for headers). Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org Reviewed-by: Pavel Borisov	2024-10-24 14:37:53 +03:00
Alexander Korotkov	b85a9d046e	Avoid looping over all type cache entries in TypeCacheRelCallback() Currently, when a single relcache entry gets invalidated, TypeCacheRelCallback() has to loop over all type cache entries to find appropriate typentry to invalidate. Unfortunately, using the syscache here is impossible, because this callback could be called outside a transaction and this makes impossible catalog lookups. This is why present commit introduces RelIdToTypeIdCacheHash to map relation OID to its composite type OID. We are keeping RelIdToTypeIdCacheHash entry while corresponding type cache entry have something to clean. Therefore, RelIdToTypeIdCacheHash shouldn't get bloat in the case of temporary tables flood. There are many places in lookup_type_cache() where syscache invalidation, user interruption, or even error could occur. In order to handle this, we keep an array of in-progress type cache entries. In the case of lookup_type_cache() interruption this array is processed to keep RelIdToTypeIdCacheHash in a consistent state. Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru Author: Teodor Sigaev Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov Reviewed-by: Andrei Lepikhov, Pavel Borisov, Jian He, Alexander Lakhin Reviewed-by: Artur Zakirov	2024-10-24 14:35:52 +03:00
Michael Paquier	499edb0974	Track more precisely query locations for nested statements Previously, a Query generated through the transform phase would have unset stmt_location, tracking the starting point of a query string. Extensions relying on the statement location to extract its relevant parts in the source text string would fallback to use the whole statement instead, leading to confusing results like in pg_stat_statements for queries relying on nested queries, like: - EXPLAIN, with top-level and nested query using the same query string, and a query ID coming from the nested query when the non-top-level entry. - Multi-statements, with only partial portions of queries being normalized. - COPY TO with a query, SELECT or DMLs. This patch improves things by keeping track of the statement locations and propagate it to Query during transform, allowing PGSS to only show the relevant part of the query for nested query. This leads to less bloat in entries for non-top-level entries, as queries can now be grouped within the same (toplevel, queryid) duos in pg_stat_statements. The result gives a stricter one-one mapping between query IDs and its query strings. The regression tests introduced in `45e0ba30fc` produce differences reflecting the new logic. Author: Anthonin Bonnefoy Reviewed-by: Michael Paquier, Jian He Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com	2024-10-24 09:29:54 +09:00
Masahiko Sawada	7b8b8dddd6	Fix typo in tidstore.h. An oversight in commit `f6bef362c`. Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAD21AoB8MJ5OHtpUw1UEGf7spioFmP3PNH44KNx6Yb3FiZSwKA%40mail.gmail.com	2024-10-23 15:37:00 -07:00
Daniel Gustafsson	6d16f9deba	Make SASL max message length configurable The proposed OAUTHBEARER SASL mechanism will need to allow larger messages in the exchange, since tokens are sent directly by the client. Move this limit into the pg_be_sasl_mech struct so that it can be changed per-mechanism. Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAOYmi+nqX_5=Se0W0Ynrr55Fha3CMzwv_R9P3rkpHb=1kG7ZTQ@mail.gmail.com	2024-10-23 16:10:27 +02:00
Jeff Davis	ce207d2a79	Add functions pg_set_attribute_stats() and pg_clear_attribute_stats(). Enable manipulation of attribute statistics. Only superficial validation is performed, so it's possible to add nonsense, and it's up to the planner (or other users of statistics) to behave reasonably in that case. Bump catalog version. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com	2024-10-22 15:06:55 -07:00
Jeff Davis	dbe6bd4343	Change pg_*_relation_stats() functions to return type to void. These functions will either raise an ERROR or run to normal completion, so no return value is necessary. Bump catalog version. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=cBF8rnphuTyHFi3KYzB9ByDgx57HwK9Rz2yp7S+Om87w@mail.gmail.com	2024-10-22 12:48:01 -07:00
Peter Eisentraut	d2b4b4c225	Fix C23 compiler warning The approach of declaring a function pointer with an empty argument list and hoping that the compiler will not complain about casting it to another type no longer works with C23, because foo() is now equivalent to foo(void). We don't need to do this here. With a few struct forward declarations we can supply a correct argument list without having to pull in another header file. (This is the only new warning with C23. Together with the previous fix `a67a49648d`, this makes the whole code compile cleanly under C23.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/95c6a9bf-d306-43d8-b880-664ef08f2944%40eisentraut.org	2024-10-22 08:17:42 +02:00
Amit Langote	11c87216d1	SQL/JSON: Fix some oversights in commit `b6e1157e7` The decision in `b6e1157e7` to ignore raw_expr when evaluating a JsonValueExpr was incorrect. While its value is not ultimately used (since formatted_expr's value is), failing to initialize it can lead to problems, for instance, when the expression tree in raw_expr contains Aggref nodes, which must be initialized to ensure the parent Agg node works correctly. Also, optimize eval_const_expressions_mutator()'s handling of JsonValueExpr a bit. Currently, when formatted_expr cannot be folded into a constant, we end up processing it twice -- once directly in eval_const_expressions_mutator() and again recursively via ece_generic_processing(). This recursive processing is required to handle raw_expr. To avoid the redundant processing of formatted_expr, we now process raw_expr directly in eval_const_expressions_mutator(). Finally, update the comment of JsonValueExpr to describe the roles of raw_expr and formatted_expr more clearly. Bug: #18657 Reported-by: Alexander Lakhin <exclusion@gmail.com> Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org Backpatch-through: 16	2024-10-20 12:20:55 +09:00
Tom Lane	52475b4d30	Fix comment about pg_authid. pg_shadow is not "publicly readable". (pg_group is, but there seems no need to make that distinction here.) Seems to be a thinko dating clear back to `7762619e9`. Antonin Houska Discussion: https://postgr.es/m/31926.1729252247@antos	2024-10-19 11:44:14 -04:00
Peter Geoghegan	1bd4bc85ca	Optimize nbtree backwards scans. Make nbtree backwards scans optimistically access the next page to be read to the left by following a prevPage block number that's now stashed in currPos when the leaf page is first read. This approach matches the one taken during forward scans, which follow a symmetric nextPage block number from currPos. We stash both a prevPage and a nextPage, since the scan direction might change (when fetching from a scrollable cursor). Backwards scans will no longer need to lock the same page twice, except in rare cases where the scan detects a concurrent page split (or page deletion). Testing has shown this optimization to be particularly effective during parallel index-only backwards scans: ~12% reductions in query execution time are quite possible. We're much better off being optimistic; concurrent left sibling page splits are rare in general. It's possible that we'll need to lock more pages than the pessimistic approach would have, but only when there are _multiple_ concurrent splits of the left sibling page we now start at. If there's just a single concurrent left sibling page split, the new approach to scanning backwards will at least break even relative to the old one (we'll acquire the same number of leaf page locks as before). The optimization from this commit has long been contemplated by comments added by commit `2ed5b87f96`, which changed the rules for locking/pinning during nbtree index scans. The approach that that commit introduced to leaf level link traversal when scanning forwards is now more or less applied all the time, regardless of the direction we're scanning in. Following uniform conventions around sibling link traversal is simpler. The only real remaining difference between our forward and backwards handling is that our backwards handling must still detect and recover from any concurrent left sibling splits (and concurrent page deletions), as documented in the nbtree README. That is structured as a single, isolated extra step that takes place in _bt_readnextpage. Also use this opportunity to further simplify the functions that deal with reading pages and traversing sibling links on the leaf level, and to document their preconditions and postconditions (with respect to things like buffer locks, buffer pins, and seizing the parallel scan). This enhancement completely supersedes the one recently added by commit `3f44959f`. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com	2024-10-18 11:25:32 -04:00
Jeff Davis	eecd9138a0	Improve ThrowErrorData() comments for use with soft errors. Reviewed-by: Corey Huinker Discussion: https://postgr.es/m/901ab7cf01957f92ea8b30b6feeb0eacfb7505fc.camel@j-davis.com	2024-10-17 14:56:44 -07:00
Peter Eisentraut	eafda78fc4	Improve node type forward reference Instead of using Node *, we can use an incomplete struct. That way, everything has the correct type and fewer casts are required. This technique is already used elsewhere in node type definitions. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/637eeea8-5663-460b-a114-39572c0f6c6e%40eisentraut.org	2024-10-17 08:36:48 +02:00
David Rowley	9ca67658d1	Don't store intermediate hash values in ExprState->resvalue `adf97c156` made it so ExprStates could support hashing and changed Hash Join to use that instead of manually extracting Datums from tuples and hashing them one column at a time. When hashing multiple columns or expressions, the code added in that commit stored the intermediate hash value in the ExprState's resvalue field. That was a mistake as steps may be injected into the ExprState between each hashing step that look at or overwrite the stored intermediate hash value. EEOP_PARAM_SET is an example of such a step. Here we fix this by adding a new dedicated field for storing intermediate hash values and adjust the code so that all apart from the final hashing step store their result in the intermediate field. In passing, rename a variable so that it's more aligned to the surrounding code and also so a few lines stay within the 80 char margin. Reported-by: Andres Freund Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAApHDvqo9eenEFXND5zZ9JxO_k4eTA4jKMGxSyjdTrsmYvnmZw@mail.gmail.com	2024-10-17 14:25:08 +13:00
Peter Geoghegan	79fa7b3b1a	Normalize nbtree truncated high key array behavior. Commit `5bf748b8` taught nbtree ScalarArrayOp index scans to decide when and how to start the next primitive index scan based on physical index characteristics. This included rules for deciding whether to start a new primitive index scan (or whether to move onto the right sibling leaf page instead) that specifically consider truncated lower-order columns (-inf columns) from leaf page high keys. These omitted columns were treated as satisfying the scan's required scan keys, though only for scan keys marked required in the current scan direction (forward). Scan keys that didn't get this behavior (those marked required in the backwards direction only) usually didn't give the scan reasonable cause to reposition itself to a later leaf page (via another descent of the index in _bt_first), but _bt_advance_array_keys would nevertheless always give up by forcing another call to _bt_first. _bt_advance_array_keys was unwilling to allow the scan to continue onto the next leaf page, to reconsider whether we really should start another primitive scan based on the details of the sibling page's tuples. This didn't match its behavior with similar cases involving keys required in the current scan direction (forward), which seems unprincipled. It led to an excessive number of primitive scans/index descents for queries with a higher-order = array scan key (with dense, contiguous values) mixed with a lower-order required > or >= scan key. Bring > and >= strategy scan keys in line with other required scan key types: treat truncated -inf scan keys as having satisfied scan keys required in either scan direction (forwards and backwards alike) during array advancement. That way affected scans can continue to the right sibling leaf page. Advancement must now schedule an explicit recheck of the right sibling page's high key in cases involving > or >= scan keys. The recheck gives the scan a way to back out and start another primitive index scan (we can't just rely on _bt_checkkeys with > or >= scan keys). This work can be considered a stand alone optimization on top of the work from commit `5bf748b8`. But it was written in preparation for an upcoming patch that will add skip scan to nbtree. In practice scans that use "skip arrays" will tend to be much more sensitive to any implementation deficiencies in this area. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAH2-Wz=9A_UtM7HzUThSkQ+BcrQsQZuNhWOvQWK06PRkEp=SKQ@mail.gmail.com	2024-10-16 12:17:49 -04:00
Nathan Bossart	d5ca15ee54	Add type cast to foreach_internal's loop variable. C++ requires explicitly casting void pointers to the appropriate pointer type, which means the foreach_ptr macro cannot be used in C++ code without this change. Author: Jelte Fennema-Nio Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/CAGECzQSYG3QfHrc-rOk2KbnB9iJOd7Qu-Xii1s-GTA%3D3JFt49Q%40mail.gmail.com Backpatch-through: 17	2024-10-15 16:20:49 -05:00
David Rowley	2453196107	Move clause_sides_match_join() into restrictinfo.h Two near-identical copies of clause_sides_match_join() existed in joinpath.c and analyzejoins.c. Deduplicate this by moving the function into restrictinfo.h. It isn't quite clear that keeping the inline property of this function is worthwhile, but this commit is just an exercise in code deduplication. More effort would be required to determine if the inline property is worth keeping. Author: James Hunter <james.hunter.pg@gmail.com> Discussion: https://postgr.es/m/CAJVSvF7Nm_9kgMLOch4c-5fbh3MYg%3D9BdnDx3Dv7Fcb64zr64Q%40mail.gmail.com	2024-10-15 21:14:21 +13:00
Masahiko Sawada	7cdfeee320	Add contrib/pg_logicalinspect. This module provides SQL functions that allow to inspect logical decoding components. It currently allows to inspect the contents of serialized logical snapshots of a running database cluster, which is useful for debugging or educational purposes. Author: Bertrand Drouvot Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith, Peter Eisentraut Reviewed-by: David G. Johnston Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal	2024-10-14 17:22:02 -07:00
Masahiko Sawada	e2fd615ecc	Move SnapBuild and SnapBuildOnDisk structs to snapshot_internal.h. This commit moves the definitions of the SnapBuild and SnapBuildOnDisk structs, related to logical snapshots, to the snapshot_internal.h file. This change allows external tools, such as pg_logicalinspect (with an upcoming patch), to access and utilize the contents of logical snapshots. Author: Bertrand Drouvot Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal	2024-10-14 17:19:33 -07:00
Masahiko Sawada	7be4ba4a9d	Remove obsolete comment in reorderbuffer.h. Commit `9fab40ad32` changed ReorderBuffer to use Slab Context for allocating ReorderBufferTXN entries instead of using a caching mechanism. The txn->node is no longer used as an element of the list of preallocated ReorderBufferTXNs. Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAD21AoB1CTnX66Ji3zTCnjoPVC9OzYe0B6LygUHcxEB2RV-hFw%40mail.gmail.com	2024-10-14 09:53:05 -07:00
Peter Eisentraut	c594f1ad2b	Track scan reversals in MergeJoin The MergeJoin struct was tracking "mergeStrategies", which were an array of btree strategy numbers, purely for the purpose of comparing it later against btree strategies to determine if the scan direction was forward or reverse. Change that. Instead, track "mergeReversals", an array of bool, to indicate the same without an unfortunate assumption that a strategy number refers specifically to a btree strategy. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:18 +02:00
Peter Eisentraut	0d2aa4d493	Track sort direction in SortGroupClause Functions make_pathkey_from_sortop() and transformWindowDefinitions(), which receive a SortGroupClause, were determining the sort order (ascending vs. descending) by comparing that structure's operator strategy to BTLessStrategyNumber, but could just as easily have gotten it from the SortGroupClause object, if it had such a field, so add one. This reduces the number of places that hardcode the assumption that the strategy refers specifically to a btree strategy, rather than some other index AM's operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:02 +02:00
Peter Eisentraut	a2d9a9b95a	Remove traces of BeOS. Commit `15abc7788e` tolerated namespace pollution from BeOS system headers. Commit `44f902122` de-supported BeOS. Since that stuff didn't make it into the Meson build system, synchronize by removing from configure. Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (the idea, not the patch) Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM	2024-10-14 08:33:36 +02:00
Jeff Davis	e839c8ecc9	Create functions pg_set_relation_stats, pg_clear_relation_stats. These functions are used to tweak statistics on any relation, provided that the user has MAINTAIN privilege on the relation, or is the database owner. Bump catalog version. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com	2024-10-11 16:55:11 -07:00
Daniel Gustafsson	6f782a2a17	Avoid mixing custom and OpenSSL BIO functions PostgreSQL has for a long time mixed two BIO implementations, which can lead to subtle bugs and inconsistencies. This cleans up our BIO by just just setting up the methods we need. This patch does not introduce any functionality changes. The following methods are no longer defined due to not being needed: - gets: Not used by libssl - puts: Not used by libssl - create: Sets up state not used by libpq - destroy: Not used since libpq use BIO_NOCLOSE, if it was used it close the socket from underneath libpq - callback_ctrl: Not implemented by sockets The following methods are defined for our BIO: - read: Used for reading arbitrary length data from the BIO. No change in functionality from the previous implementation. - write: Used for writing arbitrary length data to the BIO. No change in functionality from the previous implementation. - ctrl: Used for processing ctrl messages in the BIO (similar to ioctl). The only ctrl message which matters is BIO_CTRL_FLUSH used for writing out buffered data (or signal EOF and that no more data will be written). BIO_CTRL_FLUSH is mandatory to implement and is implemented as a no-op since there is no intermediate buffer to flush. BIO_CTRL_EOF is the out-of-band method for signalling EOF to read_ex based BIO's. Our BIO is not read_ex based but someone could accidentally call BIO_CTRL_EOF on us so implement mainly for completeness sake. As the implementation is no longer related to BIO_s_socket or calling SSL_set_fd, methods have been renamed to reference the PGconn and Port types instead. This also reverts back to using BIO_set_data, with our fallback, as a small optimization as BIO_set_app_data require the ex_data mechanism in OpenSSL. Author: David Benjamin <davidben@google.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAF8qwaCZ97AZWXtg_y359SpOHe+HdJ+p0poLCpJYSUxL-8Eo8A@mail.gmail.com	2024-10-11 21:58:58 +02:00
Nathan Bossart	4e1fad3787	Add pg_ls_summariesdir(). This function returns the name, size, and last modification time of each regular file in pg_wal/summaries. This allows administrators to grant privileges to view the contents of this directory without granting privileges on pg_ls_dir(), which allows listing the contents of many other directories. This commit also gives the pg_monitor predefined role EXECUTE privileges on the new pg_ls_summariesdir() function. Bumps catversion. Author: Yushi Ogiwara Reviewed-by: Michael Paquier, Fujii Masao Discussion: https://postgr.es/m/a0a3af15a9b9daa107739eb45aa9a9bc%40oss.nttdata.com	2024-10-11 11:02:09 -05:00
Álvaro Herrera	fd64ed60b6	Unbreak overflow test for attinhcount/coninhcount Commit `90189eefc1` narrowed pg_attribute.attinhcount and pg_constraint.coninhcount from 32 to 16 bits, but kept other related structs with 32-bit wide fields: ColumnDef and CookedConstraint contain an int 'inhcount' field which is itself checked for overflow on increments, but there's no check that the values aren't above INT16_MAX before assigning to the catalog columns. This means that a creative user can get a inconsistent table definition and override some protections. Fix it by changing those other structs to also use int16. Also, modernize style by using pg_add_s16_overflow for overflow testing instead of checking for negative values. We also have Constraint.inhcount, which is here removed completely. This was added by commit `b0e96f3119` and not removed by its revert at `6f8bb7c1e9`. It is not needed by the upcoming not-null constraints patch. This is mostly academic, so we agreed not to backpatch to avoid ABI problems. Bump catversion because of the changes to parse nodes. Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Co-authored-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202410081611.up4iyofb5ie7@alvherre.pgsql	2024-10-10 17:41:01 +02:00
Fujii Masao	1909835c28	Improve descriptions of some pg_stat_checkpoints functions in pg_proc.dat. Previously, the descriptions of pg_stat_get_checkpointer_num_requested(), pg_stat_get_checkpointer_restartpoints_requested(), and pg_stat_get_checkpointer_restartpoints_performed() in pg_proc.dat referred to "backend". This was misleading because these functions report the number of checkpoints or restartpoints requested or performed by other than backends as well. This commit removes "backend" from these descriptions to avoid confusion. Bump catalog version. Idea from Anton A. Melnikov Author: Fujii Masao Reviewed-by: Anton A. Melnikov Discussion: https://postgr.es/m/8e5f353f-8b31-4a8e-9cfa-c037f22b4aee@postgrespro.ru	2024-10-11 00:12:29 +09:00
Michael Paquier	de3a2ea3b2	Introduce two fields in EState to track parallel worker activity These fields can be set by executor nodes to record how many parallel workers were planned to be launched and how many of them have been actually launched within the number initially planned. This data is able to give an approximation of the parallel worker draught a system is facing, making easier the tuning of related configuration parameters. These fields will be used by some follow-up patches to populate other parts of the system with their data. Author: Guillaume Lelarge, Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com	2024-10-09 08:07:48 +09:00
Tom Lane	2d24fd942c	Add min and max aggregates for bytea type. Similar to `a0f1fce80`, although we chose to duplicate logic rather than invoke byteacmp, primarily to avoid repeat detoasting. Marat Buharov, Aleksander Alekseev Discussion: https://postgr.es/m/CAPCEVGXiASjodos4P8pgyV7ixfVn-ZgG9YyiRZRbVqbGmfuDyg@mail.gmail.com	2024-10-08 13:52:14 -04:00
Andres Freund	57f3702471	Use aux process resource owner in walsender AIO will need a resource owner to do IO. Right now we create a resowner on-demand during basebackup, and we could do the same for AIO. But it seems easier to just always create an aux process resowner. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Andres Freund	755a4c10d1	bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers() With real AIO it doesn't make sense to cross segment boundaries with one IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be merged. We could continue to cross segment boundaries when not using AIO, but it doesn't really make sense, because md.c will never be able to perform the read across the segment boundary in one system call. Which means we'll mark more buffers as undergoing IO than really makes sense - if another backend desires to read the same blocks, it'll be blocked longer than necessary. So it seems better to just never cross the boundary. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Heikki Linnakangas	2bbc261ddb	Use an shmem_exit callback to remove backend from PMChildFlags on exit This seems nicer than having to duplicate the logic between InitProcess() and ProcKill() for which child processes have a PMChildFlags slot. Move the MarkPostmasterChildActive() call earlier in InitProcess(), out of the section protected by the spinlock. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-10-08 15:06:34 +03:00
Fujii Masao	4ac2a9bece	Add REJECT_LIMIT option to the COPY command. Previously, when ON_ERROR was set to 'ignore', the COPY command would skip all rows with data type conversion errors, with no way to limit the number of skipped rows before failing. This commit introduces the REJECT_LIMIT option, allowing users to specify the maximum number of erroneous rows that can be skipped. If more rows encounter data type conversion errors than allowed by REJECT_LIMIT, the COPY command will fail with an error, even when ON_ERROR = 'ignore'. Author: Atsushi Torikoshi Reviewed-by: Junwang Zhao, Kirill Reshke, jian he, Fujii Masao Discussion: https://postgr.es/m/63f99327aa6b404cc951217fa3e61fe4@oss.nttdata.com	2024-10-08 18:19:58 +09:00
Nathan Bossart	8275325a06	Restrict password hash length. Commit `6aa44060a3` removed pg_authid's TOAST table because the only varlena column is rolpassword, which cannot be de-TOASTed during authentication because we haven't selected a database yet and cannot read pg_class. Since that change, attempts to set password hashes that require out-of-line storage will fail with a "row is too big" error. This error message might be confusing to users. This commit places a limit on the length of password hashes so that attempts to set long password hashes will fail with a more user-friendly error. The chosen limit of 512 bytes should be sufficient to avoid "row is too big" errors independent of BLCKSZ, but it should also be lenient enough for all reasonable use-cases (or at least all the use-cases we could imagine). Reviewed-by: Tom Lane, Jonathan Katz, Michael Paquier, Jacob Champion Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com	2024-10-07 10:56:16 -05:00
Heikki Linnakangas	094ae07160	Remove unneeded #include Unneeded since commit `d72731a704`. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi	2024-10-05 15:09:32 +03:00
Heikki Linnakangas	6c0c49f7d3	Remove unused latch It was left unused by commit `bc971f4025`, which replaced the latch usage with a condition variable Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi	2024-10-05 15:09:27 +03:00
Fujii Masao	e7834a1a25	Add log_verbosity = 'silent' support to COPY command. Previously, when the on_error option was set to ignore, the COPY command would always log NOTICE messages for input rows discarded due to data type incompatibility. Users had no way to suppress these messages. This commit introduces a new log_verbosity setting, 'silent', which prevents the COPY command from emitting NOTICE messages when on_error = 'ignore' is used, even if rows are discarded. This feature is particularly useful when processing malformed files frequently, where a flood of NOTICE messages can be undesirable. For example, when frequently loading malformed files via the COPY command or querying foreign tables using file_fdw (with an upcoming patch to add on_error support for file_fdw), users may prefer to suppress these messages to reduce log noise and improve clarity. Author: Atsushi Torikoshi Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com	2024-10-03 15:55:37 +09:00
Robert Haas	d94cf5ca7f	File size in a backup manifest should use uint64, not size_t. size_t is the size of an object in memory, not the size of a file on disk. Thanks to Tom Lane for noting the error. Discussion: http://postgr.es/m/1865585.1727803933@sss.pgh.pa.us	2024-10-02 09:59:04 -04:00
Fujii Masao	17cc5f666f	Fix inconsistent reporting of checkpointer stats. Previously, the pg_stat_checkpointer view and the checkpoint completion log message could show different numbers for buffers written during checkpoints. The view only counted shared buffers, while the log message included both shared and SLRU buffers, causing inconsistencies. This commit resolves the issue by updating both the view and the log message to separately report shared and SLRU buffers written during checkpoints. A new slru_written column is added to the pg_stat_checkpointer view to track SLRU buffers, while the existing buffers_written column now tracks only shared buffers. This change would help users distinguish between the two types of buffers, in the pg_stat_checkpointer view and the checkpoint complete log message, respectively. Bump catalog version. Author: Nitin Jadhav Reviewed-by: Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Robert Haas Reviewed-by: Andres Freund, vignesh C, Fujii Masao Discussion: https://postgr.es/m/CAMm1aWb18EpT0whJrjG+-nyhNouXET6ZUw0pNYYAe+NezpvsAA@mail.gmail.com	2024-10-02 11:17:47 +09:00
Peter Eisentraut	10b721821d	Use macro to define the number of enum values Refactoring in the interest of code consistency, a follow-up to `2e068db56e`. The argument against inserting a special enum value at the end of the enum definition is that a switch statement might generate a compiler warning unless it has a default clause. Aleksander Alekseev, reviewed by Michael Paquier, Dean Rasheed, Peter Eisentraut Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com	2024-10-01 09:30:24 -04:00
Peter Eisentraut	9c2a6c5a5f	Simplify checking for xlocale.h Instead of XXX_IN_XLOCALE_H for several features XXX, let's just include <xlocale.h> if HAVE_XLOCALE_H. The reason for the extra complication was apparently that some old glibc systems also had an <xlocale.h>, and you weren't supposed to include it directly, but it's gone now (as far as I can tell it was harmless to do so anyway). Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech	2024-10-01 07:23:45 -04:00
Peter Eisentraut	ee4859123e	jit: Use opaque pointers in all supported LLVM versions. LLVM's opaque pointer change began in LLVM 14, but remained optional until LLVM 16. When commit `37d5babb` added opaque pointer support, we didn't turn it on for LLVM 14 and 15 yet because we didn't want to risk weird bitcode incompatibility problems in released branches of PostgreSQL. (That might have been overly cautious, I don't know.) Now that PostgreSQL 18 has dropped support for LLVM versions < 14, and since it hasn't been released yet and no extensions or bitcode have been built against it in the wild yet, we can be more aggressive. We can rip out the support code and build system clutter that made opaque pointer use optional. Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussions: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com	2024-10-01 06:10:15 -04:00
Michael Paquier	4c7cd07aa6	Bump catalog version for change in VariableSetStmt Oversight in `dc68515968`, as this breaks SQL functions with a SET command. Reported-by: Tom Lane Discussion: https://postgr.es/m/1364409.1727673407@sss.pgh.pa.us	2024-09-30 14:52:03 +09:00
Michael Paquier	dc68515968	Show values of SET statements as constants in pg_stat_statements This is a continuation of work like `11c34b342b`, done to reduce the bloat of pg_stat_statements by applying more normalization to query entries. This commit is able to detect and normalize values in VariableSetStmt, resulting in: SET conf_param = $1 Compared to other parse nodes, VariableSetStmt is embedded in much more places in the parser, impacting many query patterns in pg_stat_statements. A custom jumble function is used, with an extra field in the node to decide if arguments should be included in the jumbling or not, a location field being not enough for this purpose. This approach allows for a finer tuning. Clauses relying on one or more keywords are not normalized, for example: * DEFAULT * FROM CURRENT * List of keywords. SET SESSION CHARACTERISTICS AS TRANSACTION, where it is critical to differentiate different sets of options, is a good example of why normalization should not happen. Some queries use VariableSetStmt for some subclauses with SET, that also have their values normalized: - ALTER DATABASE - ALTER ROLE - ALTER SYSTEM - CREATE/ALTER FUNCTION `ba90eac7a9` has added test coverage for most of the existing SET patterns. The expected output of these tests shows the difference this commit creates. Normalization could be perhaps applied to more portions of the grammar but what is done here is conservative, and good enough as a starting point. Author: Greg Sabino Mullane, Michael Paquier Discussion: https://postgr.es/m/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com Discussion: https://postgr.es/m/B44FA29D-EBD0-4DD9-ABC2-16F1CB087074@amazon.com Discussion: https://postgr.es/m/CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com	2024-09-30 14:02:00 +09:00
Fujii Masao	559efce1d6	Add num_done counter to the pg_stat_checkpointer view. Checkpoints can be skipped when the server is idle. The existing num_timed and num_requested counters in pg_stat_checkpointer track both completed and skipped checkpoints, but there was no way to count only the completed ones. This commit introduces the num_done counter, which tracks only completed checkpoints, making it easier to see how many were actually performed. Bump catalog version. Author: Anton A. Melnikov Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com	2024-09-30 11:56:05 +09:00
Tom Lane	a3179ab692	Recalculate where-needed data accurately after a join removal. Up to now, remove_rel_from_query() has done a pretty shoddy job of updating our where-needed bitmaps (per-Var attr_needed and per-PlaceHolderVar ph_needed relid sets). It removed direct mentions of the to-be-removed baserel and outer join, which is the minimum amount of effort needed to keep the data structures self-consistent. But it didn't account for the fact that the removed join ON clause probably mentioned Vars of other relations, and those Vars might now not be needed as high up in the join tree as before. It's easy to show cases where this results in failing to remove a lower outer join that could also have been removed. To fix, recalculate the where-needed bitmaps from scratch after each successful join removal. This sounds expensive, but it seems to add only negligible planner runtime. (We cheat a little bit by preserving "relation 0" entries in the bitmaps, allowing us to skip re-scanning the targetlist and HAVING qual.) The submitted test case drew attention because we had successfully optimized away the lower join prior to v16. I suspect that that's somewhat accidental and there are related cases that were never optimized before and now can be. I've not tried to come up with one, though. Perhaps we should back-patch this into v16 and v17 to repair the performance regression. However, since it took a year for anyone to notice the problem, it can't be affecting too many people. Let's let the patch bake awhile in HEAD, and see if we get more complaints. Per bug #18627 from Mikaël Gourlaouen. No back-patch for now. Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org	2024-09-27 16:04:04 -04:00
Robert Haas	8dfd312902	pg_verifybackup: Verify tar-format backups. This also works for compressed tar-format backups. However, -n must be used, because we use pg_waldump to verify WAL, and it doesn't yet know how to verify WAL that is stored inside of a tarfile. Amul Sul, reviewed by Sravan Kumar and by me, and revised by me.	2024-09-27 08:40:24 -04:00
Michael Paquier	f762d99c87	Fix catalog data of new LO privilege functions This commit improves the catalog data in pg_proc for the three functions for has_largeobject_privilege(), introduced in `4eada203a5`: - Fix their descriptions (typos and consistency). - Reallocate OIDs to be within the 8000-9999 range as required by `a6417078c4`. Bump catalog version. Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/ZvUYR0V0dzWaLnsV@paquier.xyz	2024-09-27 07:26:29 +09:00
Alexander Korotkov	e658038772	Update oid for pg_wal_replay_wait() procedure Use an oid from 8000-9999 range, as required by `98eab30b93`. Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZvUY6bfTwB0GsyzP%40paquier.xyz	2024-09-26 11:49:41 +03:00
Noah Misch	aac2c9b4fd	For inplace update durability, make heap_update() callers wait. The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit `86dc90056d`, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com	2024-09-24 15:25:18 -07:00
Noah Misch	a07e03fd8f	Fix data loss at inplace update after heap_update(). As previously-added tests demonstrated, heap_inplace_update() could instead update an unrelated tuple of the same catalog. It could lose the update. Losing relhasindex=t was a source of index corruption. Inplace-updating commands like VACUUM will now wait for heap_update() commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a long-running GRANT already hurts VACUUM progress more just by keeping an XID running. The VACUUM will behave like a DELETE or UPDATE waiting for the uncommitted change. For implementation details, start at the systable_inplace_update_begin() header comment and README.tuplock. Back-patch to v12 (all supported versions). In back branches, retain a deprecated heap_inplace_update(), for extensions. Reported by Smolkin Grigory. Reviewed by Nitin Motiani, (in earlier versions) Heikki Linnakangas, and (in earlier versions) Alexander Lakhin. Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com	2024-09-24 15:25:18 -07:00
Jeff Davis	ac30021356	Allow length=-1 for NUL-terminated input to pg_strncoll(), etc. Like ICU, allow a length of -1 to be specified for NUL-terminated arguments to pg_strncoll(), pg_strnxfrm(), and pg_strnxfrm_prefix(). Simplifies the code and comments. Discussion: https://postgr.es/m/2d758e07dff26bcc7cbe2aec57431329bfe3679a.camel@j-davis.com	2024-09-24 15:15:18 -07:00
Jeff Davis	ceeaaed87a	Tighten up make_libc_collator() and make_icu_collator(). Ensure that error paths within these functions do not leak a collator, and return the result rather than using an out parameter. (Error paths in the caller may still result in a leaked collator, which will be addressed separately.) In make_libc_collator(), if the first newlocale() succeeds and the second one fails, close the first locale_t object. The function make_icu_collator() doesn't have any external callers, so change it to be static. Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.camel@j-davis.com	2024-09-24 12:01:45 -07:00
Nathan Bossart	6aa44060a3	Remove pg_authid's TOAST table. pg_authid's only varlena column is rolpassword, which unfortunately cannot be de-TOASTed during authentication because we haven't selected a database yet and cannot read pg_class. By removing pg_authid's TOAST table, attempts to set password hashes that require out-of-line storage will fail with a "row is too big" error instead. We may want to provide a more user-friendly error in the future, but for now let's just remove the useless TOAST table. Bumps catversion. Reported-by: Alexander Lakhin Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com	2024-09-21 15:17:46 -05:00
Tomas Vondra	c4d5cb71d2	Increase the number of fast-path lock slots Replace the fixed-size array of fast-path locks with arrays, sized on startup based on max_locks_per_transaction. This allows using fast-path locking for workloads that need more locks. The fast-path locking introduced in 9.2 allowed each backend to acquire a small number (16) of weak relation locks cheaply. If a backend needs to hold more locks, it has to insert them into the shared lock table. This is considerably more expensive, and may be subject to contention (especially on many-core systems). The limit of 16 fast-path locks was always rather low, because we have to lock all relations - not just tables, but also indexes, views, etc. For planning we need to lock all relations that might be used in the plan, not just those that actually get used in the final plan. So even with rather simple queries and schemas, we often need significantly more than 16 locks. As partitioning gets used more widely, and the number of partitions increases, this limit is trivial to hit. Complex queries may easily use hundreds or even thousands of locks. For workloads doing a lot of I/O this is not noticeable, but for workloads accessing only data in RAM, the access to the shared lock table may be a serious issue. This commit removes the hard-coded limit of the number of fast-path locks. Instead, the size of the fast-path arrays is calculated at startup, and can be set much higher than the original 16-lock limit. The overall fast-path locking protocol remains unchanged. The variable-sized fast-path arrays can no longer be part of PGPROC, but are allocated as a separate chunk of shared memory and then references from the PGPROC entries. The fast-path slots are organized as a 16-way set associative cache. You can imagine it as a hash table of 16-slot "groups". Each relation is mapped to exactly one group using hash(relid), and the group is then processed using linear search, just like the original fast-path cache. With only 16 entries this is cheap, with good locality. Treating this as a simple hash table with open addressing would not be efficient, especially once the hash table gets almost full. The usual remedy is to grow the table, but we can't do that here easily. The access would also be more random, with worse locality. The fast-path arrays are sized using the max_locks_per_transaction GUC. We try to have enough capacity for the number of locks specified in the GUC, using the traditional 2^n formula, with an upper limit of 1024 lock groups (i.e. 16k locks). The default value of max_locks_per_transaction is 64, which means those instances will have 64 fast-path slots. The main purpose of the max_locks_per_transaction GUC is to size the shared lock table. It is often set to the "average" number of locks needed by backends, with some backends using significantly more locks. This should not be a major issue, however. Some backens may have to insert locks into the shared lock table, but there can't be too many of them, limiting the contention. The only solution is to increase the GUC, even if the shared lock table already has sufficient capacity. That is not free, especially in terms of memory usage (the shared lock table entries are fairly large). It should only happen on machines with plenty of memory, though. In the future we may consider a separate GUC for the number of fast-path slots, but let's try without one first. Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com	2024-09-21 20:09:35 +02:00
Tom Lane	54562c9cfa	Improve Asserts checking relation matching in parallel scans. table_beginscan_parallel and index_beginscan_parallel contain Asserts checking that the relation a worker will use in a parallel scan is the same one the leader intended. However, they were checking for relation OID match, which was not strong enough to detect the mismatch problem fixed in `126ec0bc7`. What would be strong enough is to compare relfilenodes instead. Arguably, that's a saner definition anyway, since a scan surely operates on a physical relation not a logical one. Hence, store and compare RelFileLocators not relation OIDs. Also ensure that index_beginscan_parallel checks the index identity not just the table identity. Discussion: https://postgr.es/m/2127254.1726789524@sss.pgh.pa.us	2024-09-20 16:37:55 -04:00
Alexander Korotkov	014f9f34d2	Move pg_wal_replay_wait() to xlogfuncs.c This commit moves pg_wal_replay_wait() procedure to be a neighbor of WAL-related functions in xlogfuncs.c. The implementation of LSN waiting continues to reside in the same place. By proposal from Michael Paquier. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org	2024-09-19 14:26:11 +03:00
Nathan Bossart	b52c4fc3c0	Add TOAST table to pg_index. This change allows pg_index rows to use out-of-line storage for the "indexprs" and "indpred" columns, which enables use-cases such as very large index expressions. This system catalog was previously not given a TOAST table due to a fear of circularity issues (see commit `96cdeae07f`). Testing has not revealed any such problems, and it seems unlikely that the entries for system indexes could ever need out-of-line storage. In any case, it is still early in the v18 development cycle, so committing this now will hopefully increase the chances of finding any unexpected problems prior to release. Bumps catversion. Reported-by: Jonathan Katz Reviewed-by: Tom Lane Discussion: https://postgr.es/m/b611015f-b423-458c-aa2d-be0e655cc1b4%40postgresql.org	2024-09-18 14:42:57 -05:00
Michael Paquier	b14e9ce7d5	Extend PgStat_HashKey.objid from 4 to 8 bytes This opens the possibility to define keys for more types of statistics kinds in PgStat_HashKey, the first case being 8-byte query IDs for statistics like pg_stat_statements. This increases the size of PgStat_HashKey from 12 to 16 bytes, while PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps the same size due to alignment. xl_xact_stats_item, that tracks the stats items to drop in commit WAL records, is increased from 12 to 16 bytes. Note that individual chunks in commit WAL records should be multiples of sizeof(int), hence 8-byte object IDs are stored as two uint32, based on a suggestion from Heikki Linnakangas. While on it, the field of PgStat_HashKey is renamed from "objoid" to "objid", as for some stats kinds this field does not refer to OIDs but just IDs, like for replication slot stats. This commit bumps the following format variables: - PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file for non-serialized stats kinds in the dshash table. - XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item. - Catalog version, for the SQL function pg_stat_have_stats(). Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz	2024-09-18 12:44:15 +09:00
Thomas Munro	70d38e3d8a	Allow ReadStream to be consumed as raw block numbers. Commits `041b9680` and `6377e12a` changed the interface of scan_analyze_next_block() to take a ReadStream instead of a BlockNumber and a BufferAccessStrategy, and to return a value to indicate when the stream has run out of blocks. This caused integration problems for at least one known extension that uses specially encoded BlockNumber values that map to different underlying storage, because acquire_sample_rows() sets up the stream so that read_stream_next_buffer() reads blocks from the main fork of the relation's SMgrRelation. Provide read_stream_next_block(), as a way for such an extension to access the stream of raw BlockNumbers directly and forward them to its own ReadBuffer() calls after decoding, as it could in earlier releases. The new function returns the BlockNumber and BufferAccessStrategy that were previously passed directly to scan_analyze_next_block(). Alternatively, an extension could wrap the stream of BlockNumbers in another ReadStream with a callback that performs any decoding required to arrive at real storage manager BlockNumber values, so that it could benefit from the I/O combining and concurrency provided by read_stream.c. Another class of table access method that does nothing in scan_analyze_next_block() because it is not block-oriented could use this function to control the number of block sampling loops. It could match the previous behavior with "return read_stream_next_block(stream, &bas) != InvalidBlockNumber". Ongoing work is expected to provide better ANALYZE support for table access methods that don't behave like heapam with respect to storage blocks, but that will be for future releases. Back-patch to 17. Reported-by: Mats Kindahl <mats@timescale.com> Reviewed-by: Mats Kindahl <mats@timescale.com> Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com	2024-09-18 11:34:28 +12:00
Peter Eisentraut	89f908a6d0	Add temporal FOREIGN KEY contraints Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality. This feature matches the behavior of the SQL standard temporal foreign keys, but it works on PostgreSQL's native ranges instead of SQL's "periods", which don't exist in PostgreSQL (yet). Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are not supported yet. (previously committed as `34768ee361`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as `46a0cd4cef`, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	7406ab623f	Add stratnum GiST support function This is support function 12 for the GiST AM and translates "well-known" RT*StrategyNumber values into whatever strategy number is used by the opclass (since no particular numbers are actually required). We will use this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality. This commit adds two implementations, one for internal GiST opclasses (just an identity function) and another for btree_gist opclasses. It updates btree_gist from 1.7 to 1.8, adding the support function for all its opclasses. (previously committed as `6db4598fcb`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:29 +02:00
Jeff Davis	b0c30612c5	Simplify checks for deterministic collations. Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit `e9931bfb75`. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:56 -07:00
Fujii Masao	4eada203a5	Add has_largeobject_privilege function. This function checks whether a user has specific privileges on a large object, identified by OID. The user can be provided by name, OID, or default to the current user. If the specified large object doesn't exist, the function returns NULL. It raises an error for a non-existent user name. This behavior is basically consistent with other privilege inquiry functions like has_table_privilege. Bump catalog version. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:51:26 +09:00
Fujii Masao	412229d197	Deduplicate code in LargeObjectExists and myLargeObjectExists. myLargeObjectExists() and LargeObjectExists() had nearly identical code, except for handling snapshots. This commit renames myLargeObjectExists() to LargeObjectExistsWithSnapshot() and refactors LargeObjectExists() to call it internally, reducing duplication. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:45:42 +09:00
Peter Eisentraut	23d0b48468	Remove hardcoded hash opclass function signature exceptions hashvalidate(), which validates the signatures of support functions for the hash AM, contained several hardcoded exceptions. For example, hash/date_ops support function 1 was hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(). hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat because some extensions currently use them to build hash functions for their own types, and we need to keep exposing these functions as "LANGUAGE internal" functions for that to continue to work. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org	2024-09-12 12:57:43 +02:00
Michael Paquier	00c76cf21c	Move logic related to WAL replay of Heap/Heap2 into its own file This brings more clarity to heapam.c, by cleanly separating all the logic related to WAL replay and the rest of Heap and Heap2, similarly to other RMGRs like hash, btree, etc. The header reorganization is also nice in heapam.c, cutting half of the headers required. Author: Li Yong Reviewed-by: Sutou Kouhei, Michael Paquier Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com	2024-09-12 13:32:05 +09:00
David Rowley	9fba1ed294	Adjust tuplestore stats API `1eff8279d` added an API to tuplestore.c to allow callers to obtain storage telemetry data. That API wasn't quite good enough for callers that perform tuplestore_clear() as the telemetry functions only accounted for the current state of the tuplestore, not the maximums before tuplestore_clear() was called. There's a pending patch that would like to add tuplestore telemetry output to EXPLAIN ANALYZE for WindowAgg. That node type uses tuplestore_clear() before moving to the next window partition and we want to show the maximum space used, not the space used for the final partition. Reviewed-by: Tatsuo Ishii, Ashutosh Bapat Discussion: https://postgres/m/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com	2024-09-12 16:02:01 +12:00
Peter Eisentraut	0785d1b8b2	common/jsonapi: support libpq as a client Based on a patch by Michael Paquier. For libpq, use PQExpBuffer instead of StringInfo. This requires us to track allocation failures so that we can return JSON_OUT_OF_MEMORY as needed rather than exit()ing. Author: Jacob Champion <jacob.champion@enterprisedb.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Co-authored-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2024-09-11 09:01:07 +02:00
Peter Eisentraut	56fead44dc	Add amgettreeheight index AM API routine The only current implementation is for btree where it calls _bt_getrootheight(). Other index types can now also use this to pass information to their amcostestimate routine. Previously, btree was hardcoded and other index types could not hook into the optimizer at this point. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-09-10 10:03:23 +02:00
Richard Guo	f5050f795a	Mark expressions nullable by grouping sets When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys, we failed to realize that the grouping/ordering expressions might be nullable by grouping sets. As a result, we may incorrectly deem that the PathKeys are redundant by EquivalenceClass processing and thus remove them from the pathkeys list. That would lead to wrong results in some cases. To fix this issue, we mark the grouping expressions nullable by grouping sets if that is the case. If the grouping expression is a Var or PlaceHolderVar or constructed from those, we can just add the RT index of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we have to add a PlaceHolderVar to carry on the nullingrel bit. However, we have to manually remove this nullingrel bit from expressions in various cases where these expressions are logically below the grouping step, such as when we generate groupClause pathkeys for grouping sets, or when we generate PathTarget for initial input to grouping nodes. Furthermore, in set_upper_references, the targetlist and quals of an Agg node should have nullingrels that include the effects of the grouping step, ie they will have nullingrels equal to the input Vars/PHVs' nullingrels plus the nullingrel bit that references the grouping RTE. In order to perform exact nullingrels matches, we also need to manually remove this nullingrel bit. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:36:48 +09:00
Richard Guo	247dea89f7	Introduce an RTE for the grouping step If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:35:34 +09:00
Robert Haas	cdb6b0fdb0	Add PQfullProtocolVersion() to surface the precise protocol version. The existing function PQprotocolVersion() does not include the minor version of the protocol. In preparation for pending work that will bump that number for the first time, add a new function to provide it to clients that may care, using the (major * 10000 + minor) convention already used by PQserverVersion(). Jacob Champion based on earlier work by Jelte Fennema-Nio Discussion: http://postgr.es/m/CAOYmi+mM8+6Swt1k7XsLcichJv8xdhPnuNv7-02zJWsezuDL+g@mail.gmail.com	2024-09-09 11:54:55 -04:00
Michael Paquier	fc415edf8c	Add callbacks to control flush of fixed-numbered stats This commit adds two callbacks in pgstats to have a better control of the flush timing of pgstat_report_stat(), whose operation depends on the three PGSTAT_*_INTERVAL variables: - have_fixed_pending_cb(), to check if a stats kind has any pending data waiting for a flush. This is used as a fast path if there are no pending statistics to flush, and this check is done for fixed-numbered statistics only if there are no variable-numbered statistics to flush. A flush will need to happen if at least one callback reports any pending data. - flush_fixed_cb(), to do the actual flush. These callbacks are currently used by the SLRU, WAL and IO statistics, generalizing the concept for all stats kinds (builtin and custom). The SLRU and IO stats relied each on one global variable to determine whether a flush should happen; these are now local to pgstat_slru.c and pgstat_io.c, cleaning up a bit how the pending flush states are tracked in pgstat.c. pgstat_flush_io() and pgstat_flush_wal() are still required, but we do not need to check their return result anymore. Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi Discussion: https://postgr.es/m/ZtaVO0N-aTwiAk3w@paquier.xyz	2024-09-09 11:12:29 +09:00
Jeff Davis	51edc4ca54	Remove lc_ctype_is_c(). Instead always fetch the locale and look at the ctype_is_c field. hba.c relies on regexes working for the C locale without needing catalog access, which worked before due to a special case for C_COLLATION_OID in lc_ctype_is_c(). Move the special case to pg_set_regex_collation() now that lc_ctype_is_c() is gone. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-06 13:23:21 -07:00
Amit Langote	3422f5f93f	Update comment about ExprState.escontext The updated comment provides more helpful guidance by mentioning that escontext should be set when soft error handling is needed. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-09-06 10:13:53 +09:00
Michael Paquier	1b373aed20	Add callback for backend initialization in pgstats pgstat_initialize() is currently used by the WAL stats as a code path to take some custom actions when a backend starts. A callback is added to generalize the concept so as all stats kinds can do the same, for builtin and custom kinds, if set. Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi Discussion: https://postgr.es/m/ZtZr1K4PLdeWclXY@paquier.xyz	2024-09-05 16:05:21 +09:00
David Rowley	908a968612	Optimize WindowAgg's use of tuplestores When WindowAgg finished one partition of a PARTITION BY, it previously would call tuplestore_end() to purge all the stored tuples before again calling tuplestore_begin_heap() and carefully setting up all of the tuplestore read pointers exactly as required for the given frameOptions. Since the frameOptions don't change between partitions, this part does not make much sense. For queries that had very few rows per partition, the overhead of this was very large. It seems much better to create the tuplestore and the read pointers once and simply call tuplestore_clear() at the end of each partition. tuplestore_clear() moves all of the read pointers back to the start position and deletes all the previously stored tuples. A simple test query with 1 million partitions and 1 tuple per partition has been shown to run around 40% faster than without this change. The additional effort seems to have mostly been spent in malloc/free. Making this work required adding a new bool field to WindowAggState which had the unfortunate effect of being the 9th bool field in a group resulting in the struct being enlarged. Here we shuffle the fields around a little so that the two bool fields for runcondition relating stuff fit into existing padding. Also, move the "runcondition" field to be near those. This frees up enough space with the other bool fields so that the newly added one fits into the padding bytes. This was done to address a very small but apparent performance regression with queries containing a large number of rows per partition. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Discussion: https://postgr.es/m/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com	2024-09-05 16:18:30 +12:00
Jeff Davis	06421b0843	Remove lc_collate_is_c(). Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-04 14:35:25 -07:00
Michael Paquier	b4db64270e	Apply more quoting to GUC names in messages This is a continuation of `17974ec259`. More quotes are applied to GUC names in error messages and hints, taking care of what seems to be all the remaining holes currently in the tree for the GUCs. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2024-09-04 13:50:44 +09:00
Amit Kapila	6c2b5edecc	Collect statistics about conflicts in logical replication. This commit adds columns in view pg_stat_subscription_stats to show the number of times a particular conflict type has occurred during the application of logical replication changes. The following columns are added: confl_insert_exists: Number of times a row insertion violated a NOT DEFERRABLE unique constraint. confl_update_origin_differs: Number of times an update was performed on a row that was previously modified by another origin. confl_update_exists: Number of times that the updated value of a row violates a NOT DEFERRABLE unique constraint. confl_update_missing: Number of times that the tuple to be updated is missing. confl_delete_origin_differs: Number of times a delete was performed on a row that was previously modified by another origin. confl_delete_missing: Number of times that the tuple to be deleted is missing. The update_origin_differs and delete_origin_differs conflicts can be detected only when track_commit_timestamp is enabled. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith, Anit Kapila Discussion: https://postgr.es/m/OS0PR01MB57160A07BD575773045FC214948F2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-09-04 08:55:21 +05:30
Noah Misch	c582b75851	Add block_range_read_stream_cb(), to deduplicate code. This replaces two functions for iterating over all blocks in a range. A pending patch will use this instead of adding a third. Nazir Bilal Yavuz Discussion: https://postgr.es/m/20240820184742.f2.nmisch@google.com	2024-09-03 10:46:20 -07:00
Peter Eisentraut	2b5f57977f	Add const qualifiers to XLogRegister*() functions Add const qualifiers to XLogRegisterData() and XLogRegisterBufData(). Several unconstify() calls can be removed. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org	2024-09-03 08:06:03 +02:00
Michael Paquier	4236825197	Fix typos and grammar in code comments and docs Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com	2024-09-03 14:49:04 +09:00
Michael Paquier	c7cd2d6ed0	Define PG_TBLSPC_DIR for path pg_tblspc/ in data folder Similarly to `2065ddf5e3`, this introduces a define for "pg_tblspc". This makes the style more consistent with the existing PG_STAT_TMP_DIR, for example. There is a difference with the other cases with the introduction of PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-09-03 09:11:54 +09:00
Daniel Gustafsson	a70e01d430	Remove support for OpenSSL older than 1.1.0 OpenSSL 1.0.2 has been EOL from the upstream OpenSSL project for some time, and is no longer the default OpenSSL version with any vendor which package PostgreSQL. By retiring support for OpenSSL 1.0.2 we can remove a lot of no longer required complexity for managing state within libcrypto which is now handled by OpenSSL. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com	2024-09-02 13:51:48 +02:00
Peter Eisentraut	4d5111b3f1	More use of getpwuid_r() directly Remove src/port/user.c, call getpwuid_r() directly. This reduces some complexity and allows better control of the error behavior. For example, the old code would in some circumstances silently truncate the result string, or produce error message strings that the caller wouldn't use. src/port/user.c used to be called src/port/thread.c and contained various portability complications to support thread-safety. These are all obsolete, and all but the user-lookup functions have already been removed. This patch completes this by also removing the user-lookup functions. Also convert src/backend/libpq/auth.c to use getpwuid_r() for thread-safety. Originally, I tried to be overly correct by using sysconf(_SC_GETPW_R_SIZE_MAX) to get the buffer size for getpwuid_r(), but that doesn't work on FreeBSD. All the OS where I could find the source code internally use 1024 as the suggested buffer size, so I just ended up hardcoding that. The previous code used BUFSIZ, which is an unrelated constant from stdio.h, so its use seemed inappropriate. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/5f293da9-ceb4-4937-8e52-82c25db8e4d3%40eisentraut.org	2024-09-02 09:04:30 +02:00
Michael Paquier	c39afc38cf	Define PG_LOGICAL_DIR for path pg_logical/ in data folder This is similar to `2065ddf5e3`, but this time for pg_logical/ itself and its contents, like the paths for snapshots, mappings or origin checkpoints. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-30 15:25:12 +09:00
Michael Paquier	2065ddf5e3	Define PG_REPLSLOT_DIR for path pg_replslot/ in data folder This commit replaces most of the hardcoded values of "pg_replslot" by a new PG_REPLSLOT_DIR #define. This makes the style more consistent with the existing PG_STAT_TMP_DIR, for example. More places will follow a similar change. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-30 10:42:21 +09:00
Michael Paquier	a83a944e9f	Rename pg_sequence_read_tuple() to pg_get_sequence_data() This commit removes log_cnt from the tuple returned by the SQL function. This field is an internal counter that tracks when a WAL record should be generated for a sequence, and it is reset each time the sequence is restored or recovered. It is not necessary to rebuild the sequence DDL commands for pg_dump and pg_upgrade where this function is used. The field can still be queried with a scan of the "table" created under-the-hood for a sequence. Issue noticed while hacking on a feature that can rely on this new function rather than pg_sequence_last_value(), aimed at making sequence computation more easily pluggable. Bump catalog version. Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/Zsvka3r-y2ZoXAdH@paquier.xyz	2024-08-30 08:49:24 +09:00
Heikki Linnakangas	478846e768	Rename some shared memory initialization routines To make them follow the usual naming convention where FoobarShmemSize() calculates the amount of shared memory needed by Foobar subsystem, and FoobarShmemInit() performs the initialization. I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are a little special. They need to be called before any of the other ShmemInit() functions, because they set up the shared memory bookkeeping itself. I also didn't rename InitProcGlobal(), because unlike other Shmeminit functions, it's not called by individual backends. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:21 +03:00
Heikki Linnakangas	fbce7dfc77	Refactor lock manager initialization to make it a bit less special Split the shared and local initialization to separate functions, and follow the common naming conventions. With this, we no longer create the LockMethodLocalHash hash table in the postmaster process, which was always pointless. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:06 +03:00
Amit Kapila	640178c92e	Rename the conflict types for the origin differ cases. The conflict types 'update_differ' and 'delete_differ' indicate that a row to be modified was previously altered by another origin. Rename those to 'update_origin_differs' and 'delete_origin_differs' to clarify their meaning. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith Discussion: https://postgr.es/m/CAA4eK1+HEKwG_UYt4Zvwh5o_HoCKCjEGesRjJX38xAH3OxuuYA@mail.gmail.com	2024-08-29 09:12:12 +05:30
Peter Eisentraut	6654bb9204	Add prefetching support on macOS macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE command does the same thing. Some related documentation has been generalized to not mention posix_advise() specifically anymore. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org	2024-08-28 07:28:27 +02:00
Alexander Korotkov	3890d90c15	Revert support for ALTER TABLE ... MERGE/SPLIT PARTITION(S) commands This commit reverts `1adf16b8fb`, `87c21bb941`, and subsequent fixes and improvements including `df64c81ca9`, `c99ef1811a`, `9dfcac8e15`, `885742b9f8`, `842c9b2705`, `fcf80c5d5f`, `96c7381c4c`, `f4fc7cb54b`, `60ae37a8bc`, `259c96fa8f`, `449cdcd486`, `3ca43dbbb6`, `2a679ae94e`, `3a82c689fd`, `fbd4321fd5`, `d53a4286d7`, `c086896625`, `4e5d6c4091`, `04158e7fa3`. The reason for reverting is security issues related to repeatable name lookups (CVE-2014-0062). Even though `04158e7fa3` solved part of the problem, there are still remaining issues, which aren't feasible to even carefully analyze before the RC deadline. Reported-by: Noah Misch, Robert Haas Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Backpatch-through: 17	2024-08-24 18:48:48 +03:00
Peter Eisentraut	a2bbc58f74	thread-safety: gmtime_r(), localtime_r() Use gmtime_r() and localtime_r() instead of gmtime() and localtime(), for thread-safety. There are a few affected calls in libpq and ecpg's libpgtypes, which are probably effectively bugs, because those libraries already claim to be thread-safe. There is one affected call in the backend. Most of the backend otherwise uses the custom functions pg_gmtime() and pg_localtime(), which are implemented differently. While we're here, change the call in the backend to gmtime() instead of localtime(), since for that use time zone behavior is irrelevant, and this side-steps any questions about when time zones are initialized by localtime_r() vs localtime(). Portability: gmtime_r() and localtime_r() are in POSIX but are not available on Windows. Windows has functions gmtime_s() and localtime_s() that can fulfill the same purpose, so we add some small wrappers around them. (Note that these _s() functions are also different from the _s() functions in the bounds-checking extension of C11. We are not using those here.) On MinGW, you can get the POSIX-style *_r() functions by defining _POSIX_C_SOURCE appropriately before including <time.h>. This leads to a conflict at least in plpython because apparently _POSIX_C_SOURCE gets defined in some header there, and then our replacement definitions conflict with the system definitions. To avoid that sort of thing, we now always define _POSIX_C_SOURCE on MinGW and use the POSIX-style functions here. Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/eba1dc75-298e-4c46-8869-48ba8aad7d70@eisentraut.org	2024-08-23 07:43:04 +02:00
Alexander Korotkov	04158e7fa3	Avoid repeated table name lookups in createPartitionTable() Currently, createPartitionTable() opens newly created table using its name. This approach is prone to privilege escalation attack, because we might end up opening another table than we just created. This commit address the issue above by opening newly created table by its OID. It appears to be tricky to get a relation OID out of ProcessUtility(). We have to extend TableLikeClause with new newRelationOid field, which is filled within ProcessUtility() to be further accessed by caller. Security: CVE-2014-0062 Reported-by: Noah Misch Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Reviewed-by: Pavel Borisov, Dmitry Koval	2024-08-22 09:50:48 +03:00
Michael Paquier	490f869d92	Create syscache entries for pg_extension Two syscache identifiers are added for extension names and OIDs. Shared libraries of extensions might want to invalidate or update their own caches whenever a CREATE, ALTER or DROP EXTENSION command is run for their extension (in any backend). Right now this is non-trivial to do correctly and efficiently, but, if an extension catalog is part of a syscache, this could simply be done by registering an callback using CacheRegisterSyscacheCallback for the relevant syscache. Another case where this is useful is a loaded library where some of its code paths rely on some objects of the extension to exist; it can be simpler and more efficient to do an existence check directly on the extension through the syscache. Author: Jelte Fennema-Nio Reviewed-by: Alexander Korotkov, Pavel Stehule Discussion: https://postgr.es/m/CAGECzQTWm9sex719Hptbq4j56hBGUti7J9OWjeMobQ1ccRok9w@mail.gmail.com	2024-08-22 10:48:25 +09:00
Robert Haas	c01743aa48	Show number of disabled nodes in EXPLAIN ANALYZE output. Now that disable_cost is not included in the cost estimate, there's no visible sign in EXPLAIN output of which plan nodes are disabled. Fix that by propagating the number of disabled nodes from Path to Plan, and then showing it in the EXPLAIN output. There is some question about whether this is a desirable change. While I personally believe that it is, it seems best to make it a separate commit, in case we decide to back out just this part, or rework it. Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:14:35 -04:00
Robert Haas	e222534679	Treat number of disabled nodes in a path as a separate cost metric. Previously, when a path type was disabled by e.g. enable_seqscan=false, we either avoided generating that path type in the first place, or more commonly, we added a large constant, called disable_cost, to the estimated startup cost of that path. This latter approach can distort planning. For instance, an extremely expensive non-disabled path could seem to be worse than a disabled path, especially if the full cost of that path node need not be paid (e.g. due to a Limit). Or, as in the regression test whose expected output changes with this commit, the addition of disable_cost can make two paths that would normally be distinguishible in cost seem to have fuzzily the same cost. To fix that, we now count the number of disabled path nodes and consider that a high-order component of both the startup cost and the total cost. Hence, the path list is now sorted by disabled_nodes and then by total_cost, instead of just by the latter, and likewise for the partial path list. It is important that this number is a count and not simply a Boolean; else, as soon as we're unable to respect disabled path types in all portions of the path, we stop trying to avoid them where we can. Because the path list is now sorted by the number of disabled nodes, the join prechecks must compute the count of disabled nodes during the initial cost phase instead of postponing it to final cost time. Counts of disabled nodes do not cross subquery levels; at present, there is no reason for them to do so, since the we do not postpone path selection across subquery boundaries (see make_subplan). Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:12:30 -04:00
Amit Kapila	3f28b2fcac	Don't advance origin during apply failure. We advance origin progress during abort on successful streaming and application of ROLLBACK in parallel streaming mode. But the origin shouldn't be advanced during an error or unsuccessful apply due to shutdown. Otherwise, it will result in a transaction loss as such a transaction won't be sent again by the server. Reported-by: Hou Zhijie Author: Hayato Kuroda and Shveta Malik Reviewed-by: Amit Kapila Backpatch-through: 16 Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com	2024-08-21 09:22:32 +05:30
Michael Paquier	15c1abd977	Remove _PG_fini() `ab02d702ef` has removed from the backend the code able to support the unloading of modules, because this has never worked. This removes the last references to _PG_fini(), that could be used as a callback for modules to manipulate the stack when unloading a library. The test module ldap_password_func had the idea to declare it, doing nothing. The function declaration in fmgr.h is gone. It was left around in 2022 to avoid breaking extension code, but at this stage there are also benefits in letting extension developers know that keeping the unloading code is pointless and this move leads to less maintenance. Reviewed-by: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/ZsQfi0AUJoMF6NSd@paquier.xyz	2024-08-21 07:24:03 +09:00
Amit Kapila	9758174e2e	Log the conflicts while applying changes in logical replication. This patch provides the additional logging information in the following conflict scenarios while applying changes: insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint. update_differ: Updating a row that was previously modified by another origin. update_exists: The updated row value violates a NOT DEFERRABLE unique constraint. update_missing: The tuple to be updated is missing. delete_differ: Deleting a row that was previously modified by another origin. delete_missing: The tuple to be deleted is missing. For insert_exists and update_exists conflicts, the log can include the origin and commit timestamp details of the conflicting key with track_commit_timestamp enabled. update_differ and delete_differ conflicts can only be detected when track_commit_timestamp is enabled on the subscriber. We do not offer additional logging for exclusion constraint violations because these constraints can specify rules that are more complex than simple equality checks. Resolving such conflicts won't be straightforward. This area can be further enhanced if required. Author: Hou Zhijie Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-08-20 08:35:11 +05:30
David Rowley	adf97c1562	Speed up Hash Join by making ExprStates support hashing Here we add ExprState support for obtaining a 32-bit hash value from a list of expressions. This allows both faster hashing and also JIT compilation of these expressions. This is especially useful when hash joins have multiple join keys as the previous code called ExecEvalExpr on each hash join key individually and that was inefficient as tuple deformation would have only taken into account one key at a time, which could lead to walking the tuple once for each join key. With the new code, we'll determine the maximum attribute required and deform the tuple to that point only once. Some performance tests done with this change have shown up to a 20% performance increase of a query containing a Hash Join without JIT compilation and up to a 26% performance increase when JIT is enabled and optimization and inlining were performed by the JIT compiler. The performance increase with 1 join column was less with a 14% increase with and without JIT. This test was done using a fairly small hash table and a large number of hash probes. The increase will likely be less with large tables, especially ones larger than L3 cache as memory pressure is more likely to be the limiting factor there. This commit only addresses Hash Joins, but lays expression evaluation and JIT compilation infrastructure for other hashing needs such as Hash Aggregate. Author: David Rowley Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net> Reviewed-by: Tels <nospam-pg-abuse@bloodgate.com> Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com	2024-08-20 13:38:22 +12:00
Nathan Bossart	9e9a2b7031	Remove dependence on -fwrapv semantics in a few places. This commit attempts to update a few places, such as the money, numeric, and timestamp types, to no longer rely on signed integer wrapping for correctness. This is intended to move us closer towards removing -fwrapv, which may enable some compiler optimizations. However, there is presently no plan to actually remove that compiler option in the near future. Besides using some of the existing overflow-aware routines in int.h, this commit introduces and makes use of some new ones. Specifically, it adds functions that accept a signed integer and return its absolute value as an unsigned integer with the same width (e.g., pg_abs_s64()). It also adds functions that accept an unsigned integer, store the result of negating that integer in a signed integer with the same width, and return whether the negation overflowed (e.g., pg_neg_u64_overflow()). Finally, this commit adds a couple of tests for timestamps near POSTGRES_EPOCH_JDATE. Author: Joseph Koshakow Reviewed-by: Tom Lane, Heikki Linnakangas, Jian He Discussion: https://postgr.es/m/CAAvxfHdBPOyEGS7s%2Bxf4iaW0-cgiq25jpYdWBqQqvLtLe_t6tw%40mail.gmail.com	2024-08-15 15:47:31 -05:00
David Rowley	80ffcb8427	Improve ALTER PUBLICATION validation and error messages Attempting to add a system column for a table to an existing publication would result in the not very intuitive error message of: ERROR: negative bitmapset member not allowed Here we improve that to have it display the same error message as a user would see if they tried adding a system column for a table when adding it to the publication in the first place. Doing this requires making the function which validates the list of columns an extern function. The signature of the static function wasn't an ideal external API as it made the code more complex than it needed to be. Here we adjust the function to have it populate a Bitmapset of attribute numbers. Doing it this way allows code simplification. There was no particular bug here other than the weird error message, so no backpatch. Bug: #18558 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Peter Smith, David Rowley Discussion: https://postgr.es/m/18558-411bc81b03592125@postgresql.org	2024-08-15 13:10:25 +12:00
Peter Eisentraut	5304fec4d8	Apply PGDLLIMPORT markings to some GUC variables According to the commit message in `8ec569479`, we must have all variables in header files marked with PGDLLIMPORT. In commit `d3cc5ffe81` some variables were moved from launch_backend.c file to several header files. This adds PGDLLIMPORT to moved variables. Author: Sofia Kopikova <s.kopikova@postgrespro.ru> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/e0b17014-5319-4dd6-91cd-93d9c8fc9539%40postgrespro.ru	2024-08-14 11:36:12 +02:00
Peter Eisentraut	c8e2d422fd	Remove TRACE_SORT macro The TRACE_SORT macro guarded the availability of the trace_sort GUC setting. But it has been enabled by default ever since it was introduced in PostgreSQL 8.1, and there have been no reports that someone wanted to disable it. So just remove the macro to simplify things. (For the avoidance of doubt: The trace_sort GUC is still there. This only removes the rarely-used macro guarding it.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/be5f7162-7c1d-44e3-9a78-74dcaa6529f2%40eisentraut.org	2024-08-14 08:07:52 +02:00
Masahiko Sawada	c584781bcc	Use pgBufferUsage for buffer usage tracking in analyze. Previously, (auto)analyze used global variables VacuumPageHit, VacuumPageMiss, and VacuumPageDirty to track buffer usage. However, pgBufferUsage provides a more generic way to track buffer usage with support functions. This change replaces those global variables with pgBufferUsage in analyze. Since analyze was the sole user of those variables, it removes their declarations. Vacuum previously used those variables but replaced them with pgBufferUsage as part of a bug fix, commit `5cd72cc0c`. Additionally, it adjusts the buffer usage message in both vacuum and analyze for better consistency. Author: Anthonin Bonnefoy Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com	2024-08-13 18:49:45 -07:00
Thomas Munro	14c648ff00	All POSIX systems have langinfo.h and CODESET. We don't need configure probes for HAVE_LANGINFO_H (it is implied by !WIN32), and we don't need to consider systems that have it but don't define CODESET (that was for OpenBSD in commit `81cca218`, but it has now had it for 19 years). Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com	2024-08-13 22:13:52 +12:00
Peter Geoghegan	1343ae954c	Give nbtree move right function internal linkage. Declare _bt_moveright() static. This is a minor modularity win; the routine was already private to nbtsearch.c for all practical purposes. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2WgWVzCNEXQB_op5MMZMDgJ3fg3AhVm6bq2iZPpJNXGhWw@mail.gmail.com	2024-08-12 14:36:55 -04:00
Nathan Bossart	760162fedb	Add user-callable CRC functions. We've had code for CRC-32 and CRC-32C for some time (for WAL records, etc.), but there was no way for users to call it, despite apparent popular demand. The new crc32() and crc32c() functions accept bytea input and return bigint (to avoid returning negative values). Bumps catversion. Author: Aleksander Alekseev Reviewed-by: Peter Eisentraut, Tom Lane Discussion: https://postgr.es/m/CAJ7c6TNMTGnqnG%3DyXXUQh9E88JDckmR45H2Q%2B%3DucaCLMOW1QQw%40mail.gmail.com	2024-08-12 10:35:06 -05:00
David Rowley	313df8f5ad	Fix outdated comments A few fields in ResultRelInfo are now also used for MERGE. Update the comments to mention that. Reported-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxH8-NvFhLcSZZTTW+1M9AfS4+SOTKmyPG7ZhzNvN=+NkA@mail.gmail.com:wq	2024-08-12 23:41:13 +12:00
David Rowley	ffabb56c94	Fix a series of typos and outdated references Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/c1d63754-cb85-2d8a-8409-bde2c4d2d04b@gmail.com	2024-08-12 23:27:09 +12:00
David Rowley	f0d1127595	Remove "parent" column from pg_backend_memory_contexts `32d3ed816` added the "path" column to pg_backend_memory_contexts to allow a stable method of obtaining the parent MemoryContext of a given row in the view. Using the "path" column is now the preferred method of obtaining the parent row. Previously, any queries which were self-joining to this view using the "name" and "parent" columns could get incorrect results due to the fact that names are not unique. Here we aim to explicitly break such queries so that they can be corrected and use the "path" column instead. It is possible that there are more innocent users of the parent column that just need an indication of the parent and having to write out a self-joining CTE may be an unnecessary hassle for those cases. Let's remove the column for now and see if anyone comes back with any complaints. This does seem like a good time to attempt to get rid of the column as we still have around 1 year to revert this if someone comes back with a valid complaint. Plus this view is new to v14 and is quite niche, so perhaps not many people will be affected. Author: Melih Mutlu <m.melihmutlu@gmail.com> Discussion: https://postgr.es/m/CAGPVpCT7NOe4fZXRL8XaoxHpSXYTu6GTpULT_3E-HT9hzjoFRA@mail.gmail.com	2024-08-12 15:42:16 +12:00
Tom Lane	364de74cff	Allow adjusting session_authorization and role in parallel workers. The code intends to allow GUCs to be set within parallel workers via function SET clauses, but not otherwise. However, doing so fails for "session_authorization" and "role", because the assign hooks for those attempt to set the subsidiary "is_superuser" GUC, and that call falls foul of the "not otherwise" prohibition. We can't switch to using GUC_ACTION_SAVE for this, so instead add a new GUC variable flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set anyway. (This is okay because is_superuser has context PGC_INTERNAL and thus only hard-wired calls can change it. We'd need more thought before applying the flag to other GUCs; but maybe there are other use-cases.) This isn't the prettiest fix perhaps, but other alternatives we thought of would be much more invasive. While here, correct a thinko in commit `059de3ca4`: when rejecting a GUC setting within a parallel worker, we should return 0 not -1 if the ereport doesn't longjmp. (This seems to have no consequences right now because no caller cares, but it's inconsistent.) Improve the comments to try to forestall future confusion of the same kind. Despite the lack of field complaints, this seems worth back-patching. Thanks to Nathan Bossart for the idea to invent a new flag, and for review. Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us	2024-08-10 15:51:30 -04:00
Heikki Linnakangas	28a520c0b7	Refactor code to handle death of a backend or bgworker in postmaster Currently, when a child process exits, the postmaster first scans through BackgroundWorkerList, to see if it the child process was a background worker. If not found, then it scans through BackendList to see if it was a regular backend. That leads to some duplication between the bgworker and regular backend cleanup code, as both have an entry in the BackendList that needs to be cleaned up in the same way. Refactor that so that we scan just the BackendList to find the child process, and if it was a background worker, do the additional bgworker-specific cleanup in addition to the normal Backend cleanup. Change HandleChildCrash so that it doesn't try to handle the cleanup of the process that already exited, only the signaling of all the other processes. When called for any of the aux processes, the caller had already cleared the *PID global variable, so the code in HandleChildCrash() to do that was unused. On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's now logged with that exit code, instead of 0. Also, if a bgworker exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is restarted. Previously it was treated as a normal exit. If a child process is not found in the BackendList, the log message now calls it "untracked child process" rather than "server process". Arguably that should be a PANIC, because we do track all the child processes in the list, so failing to find a child process is highly unexpected. But if we want to change that, let's discuss and do that as a separate commit. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi	2024-08-10 00:04:43 +03:00
Heikki Linnakangas	b43100fa71	Make BackgroundWorkerList doubly-linked This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit() to take a RegisteredBgWorker pointer as argument, rather than a list iterator. That feels a little more natural. But more importantly, this paves the way for more refactoring in the next commit. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi	2024-08-09 22:44:20 +03:00
Peter Eisentraut	7da1bdc2c2	Remove obsolete RECHECK keyword completely This used to be part of CREATE OPERATOR CLASS and ALTER OPERATOR FAMILY, but it has done nothing (except issue a NOTICE) since PostgreSQL 8.4. Commit `30e7c175b8` removed support for dumping from pre-9.2 servers, so this no longer serves any need. This now removes it completely, and you'd get a normal parse error if you used it. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/113ef2d2-3657-4353-be97-f28fceddbca1%40eisentraut.org	2024-08-09 07:18:51 +02:00
Robert Haas	22b4a1b561	Improve file header comments for astramer code. Make it clear that "astreamer" stands for "archive streamer". Generalize comments that still believe this code can only be used by pg_basebackup. Add some comments explaining the asymmetry between the gzip, lz4, and zstd astreamers, in the hopes of making life easier for anyone who hacks on this code in the future. Robert Haas, reviewed by Amul Sul. Discussion: http://postgr.es/m/CAAJ_b97O2kkKVTWxt8MxDN1o-cDfbgokqtiN2yqFf48=gXpcxQ@mail.gmail.com	2024-08-07 08:49:41 -04:00
Alexander Korotkov	d0f020037e	Introduce hash_search_with_hash_value() function This new function iterates hash entries with given hash values. This function is designed to avoid full sequential hash search in the syscache invalidation callbacks. Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru Author: Teodor Sigaev Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov Reviewed-by: Andrei Lepikhov	2024-08-07 07:06:17 +03:00
Heikki Linnakangas	d5f139cb68	Constify fields and parameters in spell.c I started by marking VoidString as const, and fixing the fallout by marking more fields and function arguments as const. It proliferated quite a lot, but all within spell.c and spell.h. A more narrow patch to get rid of the static VoidString buffer would be to replace it with '#define VoidString ""', as C99 allows assigning "" to a non-const pointer, even though you're not allowed to modify it. But it seems like good hygiene to mark all these as const. In the structs, the pointers can point to the constant VoidString, or a buffer allocated with palloc(), or with compact_palloc(), so you should not modify them. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:04:51 +03:00
Heikki Linnakangas	85829c973c	Make nullSemAction const, add 'const' decorators to related functions To make it more clear that these should never be modified. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:04:22 +03:00
Heikki Linnakangas	1e35951e71	Turn a few 'validnsps' static variables into locals There was no need for these to be static buffers, local variables work just as well. I think they were marked as 'static' to imply that they are read-only, but 'const' is more appropriate for that, so change them to const. To make it possible to mark the variables as 'const', also add 'const' decorations to the transformRelOptions() signature. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:03:43 +03:00
Heikki Linnakangas	a54d4ed183	Fix datatypes in comments in instr_time.h The INSTR_TIME_GET_NANOSEC(t) and INSTR_TIME_GET_MICROSEC(t) macros return a signed int64. Discussion: https://www.postgresql.org/message-id/ZrHkv3MAQfwNSmTG@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-06 22:15:55 +03:00
Heikki Linnakangas	39a138fbef	Revert "Fix comments in instr_time.h and remove an unneeded cast to int64" This reverts commit `3dcb09de7b`. Tom Lane pointed out that it broke the abstraction provided by the macros. The callers should not need to know what the internal type is. This commit is an exact revert, the next commit will fix the comments on the macros that incorrectly claim that they return uint64. Discussion: https://www.postgresql.org/message-id/ZrHkv3MAQfwNSmTG@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-06 22:15:46 +03:00
Heikki Linnakangas	3dcb09de7b	Fix comments in instr_time.h and remove an unneeded cast to int64 `03023a2664` represented time as an int64 on all platforms but forgot to update the comment related to INSTR_TIME_GET_MICROSEC() and provided an incorrect comment for INSTR_TIME_GET_NANOSEC(). In passing remove an unneeded cast to int64. Author: Bertrand Drouvot Discussion: https://www.postgresql.org/message-id/ZrHkv3MAQfwNSmTG@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-06 14:28:02 +03:00
Robert Haas	f80b09bac8	Move astreamer (except astreamer_inject) to fe_utils. This allows the code to be used by other frontend applications. Amul Sul, reviewed by Sravan Kumar, Andres Freund (whose input I specifically solicited regarding the meson.build changes), and me. Discussion: http://postgr.es/m/CAAJ_b94StvLWrc_p4q-f7n3OPfr6GhL8_XuAg2aAaYZp1tF-nw@mail.gmail.com	2024-08-05 11:41:57 -04:00
Masahiko Sawada	66e94448ab	Restrict accesses to non-system views and foreign tables during pg_dump. When pg_dump retrieves the list of database objects and performs the data dump, there was possibility that objects are replaced with others of the same name, such as views, and access them. This vulnerability could result in code execution with superuser privileges during the pg_dump process. This issue can arise when dumping data of sequences, foreign tables (only 13 or later), or tables registered with a WHERE clause in the extension configuration table. To address this, pg_dump now utilizes the newly introduced restrict_nonsystem_relation_kind GUC parameter to restrict the accesses to non-system views and foreign tables during the dump process. This new GUC parameter is added to back branches too, but these changes do not require cluster recreation. Back-patch to all supported branches. Reviewed-by: Noah Misch Security: CVE-2024-7348 Backpatch-through: 12	2024-08-05 06:05:33 -07:00
Amit Kapila	b5df24e520	Fix typo in bufpage.h. Author: Senglee Choi Reviewed-by: Tender Wang Discussion: https://postgr.es/m/CACUsy79U0=S5zWEf6D57F=vB7rOEa86xFY6oovDZ58jRcROCxQ@mail.gmail.com	2024-08-05 14:38:00 +05:30
Michael Paquier	2eff9e678d	Add helper routines to retrieve data for custom fixed-numbered pgstats This is useful for extensions to get snapshot and shmem data for custom cumulative statistics when these have a fixed number of objects, so as these do not need to know about the snapshot internals, aka pgStatLocal. An upcoming commit introducing an example template for custom cumulative stats with fixed-numbered objects will make use of these. I have noticed that this is useful for extension developers while hacking my own example, actually. Author: Michael Paquier Reviewed-by: Dmitry Dolgov, Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-08-05 11:43:33 +09:00
Michael Paquier	7949d95945	Introduce pluggable APIs for Cumulative Statistics This commit adds support in the backend for $subject, allowing out-of-core extensions to plug their own custom kinds of cumulative statistics. This feature has come up a few times into the lists, and the first, original, suggestion came from Andres Freund, about pg_stat_statements to use the cumulative statistics APIs in shared memory rather than its own less efficient internals. The advantage of this implementation is that this can be extended to any kind of statistics. The stats kinds are divided into two parts: - The in-core "builtin" stats kinds, with designated initializers, able to use IDs up to 128. - The "custom" stats kinds, able to use a range of IDs from 128 to 256 (128 slots available as of this patch), with information saved in TopMemoryContext. This can be made larger, if necessary. There are two types of cumulative statistics in the backend: - For fixed-numbered objects (like WAL, archiver, etc.). These are attached to the snapshot and pgstats shmem control structures for efficiency, and built-in stats kinds still do that to avoid any redirection penalty. The data of custom kinds is stored in a first array in snapshot structure and a second array in the shmem control structure, both indexed by their ID, acting as an equivalent of the builtin stats. - For variable-numbered objects (like tables, functions, etc.). These are stored in a dshash using the stats kind ID in the hash lookup key. Internally, the handling of the builtin stats is unchanged, and both fixed and variabled-numbered objects are supported. Structure definitions for builtin stats kinds are renamed to reflect better the differences with custom kinds. Like custom RMGRs, custom cumulative statistics can only be loaded with shared_preload_libraries at startup, and must allocate a unique ID shared across all the PostgreSQL extension ecosystem with the following wiki page to avoid conflicts: https://wiki.postgresql.org/wiki/CustomCumulativeStats This makes the detection of the stats kinds and their handling when reading and writing stats much easier than, say, allocating IDs for stats kinds from a shared memory counter, that may change the ID used by a stats kind across restarts. When under development, extensions can use PGSTAT_KIND_EXPERIMENTAL. Two examples that can be used as templates for fixed-numbered and variable-numbered stats kinds will be added in some follow-up commits, with tests to provide coverage. Some documentation is added to explain how to use this plugin facility. Author: Michael Paquier Reviewed-by: Dmitry Dolgov, Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-08-04 19:41:24 +09:00
Alexander Korotkov	3c5db1d6b0	Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. The queue of waiters is stored in the shared memory as an LSN-ordered pairing heap, where the waiter with the nearest LSN stays on the top. During the replay of WAL, waiters whose LSNs have already been replayed are deleted from the shared memory pairing heap and woken up by setting their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working without an active snapshot, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi	2024-08-02 21:16:56 +03:00
Heikki Linnakangas	ef4c35b416	Fix outdated comment; all running bgworkers are in BackendList Before commit `8a02b3d732`, only bgworkers that connected to a database had an entry in the Backendlist. Commit `8a02b3d732` changed that, but forgot to update this comment. Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi	2024-08-01 23:23:47 +03:00
Michael Paquier	3188a4582a	Switch PgStat_Kind from an enum to a uint32 type A follow-up patch is planned to make cumulative statistics pluggable, and using a type is useful in the internal routines used by pgstats as PgStat_Kind may have a value that was not originally in the enum removed here, once made pluggable. While on it, this commit switches pgstat_is_kind_valid() to use PgStat_Kind rather than an int, to be more consistent with its existing callers. Some loops based on the stats kind IDs are switched to use PgStat_Kind rather than int, for consistency with the new time. Author: Michael Paquier Reviewed-by: Dmitry Dolgov, Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-08-02 04:49:34 +09:00
Michael Paquier	b860848232	Add redo LSN to pgstats files This is used in the startup process to check that the pgstats file we are reading includes the redo LSN referring to the shutdown checkpoint where it has been written. The redo LSN in the pgstats file needs to match with what the control file has. This is intended to be used for an upcoming change that will extend the write of the stats file to happen during checkpoints, rather than only shutdown sequences. Bump PGSTAT_FILE_FORMAT_ID. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zp8o6_cl0KSgsnvS@paquier.xyz	2024-08-02 01:57:28 +09:00
Etsuro Fujita	e66b32e43b	Update comment in portal.h. We store tuples into the portal's tuple store for a PORTAL_ONE_MOD_WITH query as well. Back-patch to all supported branches. Reviewed by Andy Fan. Discussion: https://postgr.es/m/CAPmGK14HVYBZYZtHabjeCd-e31VT%3Dwx6rQNq8QfehywLcpZ2Hw%40mail.gmail.com	2024-08-01 17:45:00 +09:00
Peter Eisentraut	a292c98d62	Convert node test compile-time settings into run-time parameters This converts COPY_PARSE_PLAN_TREES WRITE_READ_PARSE_PLAN_TREES RAW_EXPRESSION_COVERAGE_TEST into run-time parameters debug_copy_parse_plan_trees debug_write_read_parse_plan_trees debug_raw_expression_coverage_test They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS. The compile-time symbols are kept for build farm compatibility, but they now just determine the default value of the run-time settings. Furthermore, support for these settings is not compiled in at all unless assertions are enabled, or the new symbol DEBUG_NODE_TESTS_ENABLED is defined at compile time, or any of the legacy compile-time setting symbols are defined. So there is no run-time overhead in production builds. (This is similar to the handling of DISCARD_CACHES_ENABLED.) Discussion: https://www.postgresql.org/message-id/flat/30747bd8-f51e-4e0c-a310-a6e2c37ec8aa%40eisentraut.org	2024-08-01 10:09:18 +02:00
Andres Freund	a7f107df2b	Evaluate arguments of correlated SubPlans in the referencing ExprState Until now we generated an ExprState for each parameter to a SubPlan and evaluated them one-by-one ExecScanSubPlan. That's sub-optimal as creating lots of small ExprStates a) makes JIT compilation more expensive b) wastes memory c) is a bit slower to execute This commit arranges to evaluate parameters to a SubPlan as part of the ExprState referencing a SubPlan, using the new EEOP_PARAM_SET expression step. We emit one EEOP_PARAM_SET for each argument to a subplan, just before the EEOP_SUBPLAN step. It likely is worth using EEOP_PARAM_SET in other places as well, e.g. for SubPlan outputs, nestloop parameters and - more ambitiously - to get rid of ExprContext->domainValue/caseValue/ecxt_agg*. But that's for later. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Discussion: https://postgr.es/m/20230225214401.346ancgjqc3zmvek@awork3.anarazel.de	2024-07-31 19:54:46 -07:00
Jeff Davis	ca2eea3ac8	Add is_create parameter to RefreshMatviewByOid(). RefreshMatviewByOid is used for both REFRESH and CREATE MATERIALIZED VIEW. This flag is currently just used for handling internal error messages, but also aimed to improve code-readability. Author: Yugo Nagata Discussion: https://postgr.es/m/20240726122630.70e889f63a4d7e26f8549de8@sraoss.co.jp	2024-07-31 16:42:19 -07:00
Jeff Davis	f683d3a4ca	Remove unused ParamListInfo argument from ExecRefreshMatView. Author: Yugo Nagata Discussion: https://postgr.es/m/20240726122630.70e889f63a4d7e26f8549de8@sraoss.co.jp	2024-07-31 16:37:53 -07:00
Nathan Bossart	c8b06bb969	Introduce pg_sequence_read_tuple(). This new function returns the data for the given sequence, i.e., the values within the sequence tuple. Since this function is a substitute for SELECT from the sequence, the SELECT privilege is required on the sequence in question. It returns all NULLs for sequences for which we lack privileges, other sessions' temporary sequences, and unlogged sequences on standbys. This function is primarily intended for use by pg_dump in a follow-up commit that will use it to optimize dumpSequenceData(). Like pg_sequence_last_value(), which is a support function for the pg_sequences system view, pg_sequence_read_tuple() is left undocumented. Bumps catversion. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20240503025140.GA1227404%40nathanxps13	2024-07-31 10:12:42 -05:00
Heikki Linnakangas	47c98035c6	Remove leftover function declaration Commit `9d9b9d46f3` removed the function (or rather, moved it to a different source file and renamed it to SendCancelRequest), but forgot the declaration in the header file.	2024-07-30 15:19:46 +03:00
Thomas Munro	83aadbeb96	Require memory barrier support. Previously we had a fallback implementation that made a harmless system call, based on the assumption that system calls must contain a memory barrier. That shouldn't be reached on any current system, and it seems highly likely that we can easily find out how to request explicit memory barriers, if we've already had to find out how to do atomics on a hypothetical new system. Removed comments and a function name referred to a spinlock used for fallback memory barriers, but that changed in `1b468a13`, which left some misleading words behind in a few places. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 23:01:55 +12:00
Thomas Munro	a011dc399c	Require compiler barrier support. Previously we had a fallback implementation of pg_compiler_barrier() that called an empty function across a translation unit boundary so the compiler couldn't see what it did. That shouldn't be needed on any current systems, and might not even work with a link time optimizer. Since we now require compiler-specific knowledge of how to implement atomics, we should also know how to implement compiler barriers on a hypothetical new system. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:59:30 +12:00
Thomas Munro	8138526136	Remove --disable-atomics, require 32 bit atomics. Modern versions of all relevant architectures and tool chains have atomics support. Since `edadeb07`, there is no remaining reason to carry code that simulates atomic flags and uint32 imperfectly with spinlocks. 64 bit atomics are still emulated with spinlocks, if needed, for now. Any modern compiler capable of implementing C11 <stdatomic.h> must have the underlying operations we need, though we don't require C11 yet. We detect certain compilers and architectures, so hypothetical new systems might need adjustments here. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:58:57 +12:00
Thomas Munro	e25626677f	Remove --disable-spinlocks. A later change will require atomic support, so it wouldn't make sense for a hypothetical new system not to be able to implement spinlocks. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:58:37 +12:00
Jeff Davis	72fe6d24a3	Make collation not depend on setlocale(). Now that the result of pg_newlocale_from_collation() is always non-NULL, then we can move the collate_is_c and ctype_is_c flags into pg_locale_t. That simplifies the logic in lc_collate_is_c() and lc_ctype_is_c(), removing the dependence on setlocale(). This commit also eliminates the multi-stage initialization of the collation cache. As long as we have catalog access, then it's now safe to call pg_newlocale_from_collation() without checking lc_collate_is_c() first. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-30 00:58:06 -07:00
Richard Guo	9b282a9359	Fix partitionwise join with partially-redundant join clauses To determine if the two relations being joined can use partitionwise join, we need to verify the existence of equi-join conditions involving pairs of matching partition keys for all partition keys. Currently we do that by looking through the join's restriction clauses. However, it has been discovered that this approach is insufficient, because there might be partition keys known equal by a specific EC, but they do not form a join clause because it happens that other members of the EC than the partition keys are constrained to become a join clause. To address this issue, in addition to examining the join's restriction clauses, we also check if any partition keys are known equal by ECs, by leveraging function exprs_known_equal(). To accomplish this, we enhance exprs_known_equal() to check equality per the semantics of the opfamily, if provided. It could be argued that exprs_known_equal() could be called O(N^2) times, where N is the number of partition key expressions, resulting in noticeable performance costs if there are a lot of partition key expressions. But I think this is not a problem. The number of a joinrel's partition key expressions would only be equal to the join degree, since each base relation within the join contributes only one partition key expression. That is to say, it does not scale with the number of partitions. A benchmark with a query involving 5-way joins of partitioned tables, each with 3 partition keys and 1000 partitions, shows that the planning time is not significantly affected by this patch (within the margin of error), particularly when compared to the impact caused by partitionwise join. Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to check if partition keys are known equal by ECs. Author: Richard Guo, Tom Lane Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com	2024-07-30 15:51:54 +09:00
Amit Langote	7f56eaff2f	SQL/JSON: Fix casting for integer EXISTS columns in JSON_TABLE The current method of coercing the boolean result value of JsonPathExists() to the target type specified for an EXISTS column, which is to call the type's input function via json_populate_type(), leads to an error when the target type is integer, because the integer input function doesn't recognize boolean literal values as valid. Instead use the boolean-to-integer cast function for coercion in that case so that using integer or domains thereof as type for EXISTS columns works. Note that coercion for ON ERROR values TRUE and FALSE already works like that because the parser creates a cast expression including the cast function, but the coercion of the actual result value is not handled by the parser. Tests by Jian He. Reported-by: Jian He <jian.universality@gmail.com> Author: Jian He <jian.universality@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-07-30 10:34:17 +09:00
Heikki Linnakangas	9d9b9d46f3	Move cancel key generation to after forking the backend Move responsibility of generating the cancel key to the backend process. The cancel key is now generated after forking, and the backend advertises it in the ProcSignal array. When a cancel request arrives, the backend handling it scans the ProcSignal array to find the target pid and cancel key. This is similar to how this previously worked in the EXEC_BACKEND case with the ShmemBackendArray, just reusing the ProcSignal array. One notable change is that we no longer generate cancellation keys for non-backend processes. We generated them before just to prevent a malicious user from canceling them; the keys for non-backend processes were never actually given to anyone. There is now an explicit flag indicating whether a process has a valid key or not. I wrote this originally in preparation for supporting longer cancel keys, but it's a nice cleanup on its own. Reviewed-by: Jelte Fennema-Nio Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi	2024-07-29 15:37:48 +03:00
Richard Guo	513f4472a4	Reduce memory used by partitionwise joins In try_partitionwise_join, we aim to break down the join between two partitioned relations into joins between matching partitions. To achieve this, we iterate through each pair of partitions from the two joining relations and create child-join relations for them. With potentially thousands of partitions, the local objects allocated in each iteration can accumulate significant memory usage. Therefore, we opt to eagerly free these local objects at the end of each iteration. In line with this approach, this patch frees the bitmap set that represents the relids of child-join relations at the end of each iteration. Additionally, it modifies build_child_join_rel() to reuse the AppendRelInfo structures generated within each iteration. Author: Ashutosh Bapat Reviewed-by: David Christensen, Richard Guo Discussion: https://postgr.es/m/CAExHW5s4EqY43oB=ne6B2=-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ@mail.gmail.com	2024-07-29 11:35:51 +09:00
Jeff Davis	1c461a8d8d	Refactor: make default_locale internal to pg_locale.c. Discussion: https://postgr.es/m/2228884bb1f1a02614b39f71a90c94d2cc8a3a2f.camel@j-davis.com Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-28 13:07:25 -07:00
David Rowley	17a5871d9d	Optimize escaping of JSON strings There were quite a few places where we either had a non-NUL-terminated string or a text Datum which we needed to call escape_json() on. Many of these places required that a temporary string was created due to the fact that escape_json() needs a NUL-terminated cstring. For text types, those first had to be converted to cstring before calling escape_json() on them. Here we introduce two new functions to make escaping JSON more optimal: escape_json_text() can be given a text Datum to append onto the given buffer. This is more optimal as it foregoes the need to convert the text Datum into a cstring. A temporary allocation is only required if the text Datum needs to be detoasted. escape_json_with_len() can be used when the length of the cstring is already known or the given string isn't NUL-terminated. Having this allows various places which were creating a temporary NUL-terminated string to just call escape_json_with_len() without any temporary memory allocations. Discussion: https://postgr.es/m/CAApHDvpLXwMZvbCKcdGfU9XQjGCDm7tFpRdTXuB9PVgpNUYfEQ@mail.gmail.com Reviewed-by: Melih Mutlu, Heikki Linnakangas	2024-07-27 23:46:07 +12:00
Robert Haas	8a53539bd6	Wait for WAL summarization to catch up before creating .partial file. When a standby is promoted, CleanupAfterArchiveRecovery() may decide to rename the final WAL file from the old timeline by adding ".partial" to the name. If WAL summarization is enabled and this file is renamed before its partial contents are summarized, WAL summarization breaks: the summarizer gets stuck at that point in the WAL stream and just errors out. To fix that, first make the startup process wait for WAL summarization to catch up before renaming the file. Generally, this should be quick, and if it's not, the user can shut off summarize_wal and try again. To make this fix work, also teach the WAL summarizer that after a promotion has occurred, no more WAL can appear on the previous timeline: previously, the WAL summarizer wouldn't switch to the new timeline until we actually started writing WAL there, but that meant that when the startup process was waiting for the WAL summarizer, it was waiting for an action that the summarizer wasn't yet prepared to take. In the process of fixing these bugs, I realized that the logic to wait for WAL summarization to catch up was spread out in a way that made it difficult to reuse properly, so this code refactors things to make it easier. Finally, add a test case that would have caught this bug and the previously-fixed bug that WAL summarization sometimes needs to back up when the timeline changes. Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com	2024-07-26 15:00:48 -04:00
Daniel Gustafsson	161c73462b	Fix macro placement in pg_config.h.in Commit `274bbced85` accidentally placed the pg_config.h.in for SSL_CTX_set_num_tickets on the wrong line wrt where autoheader places it. Fix by re-arranging and backpatch to the same level as the original commit. Reported-by: Marina Polyakova <m.polyakova@postgrespro.ru> Discussion: https://postgr.es/m/48cebe8c3eaf308bae253b1dbf4e4a75@postgrespro.ru Backpatch-through: v12	2024-07-26 16:25:28 +02:00
Heikki Linnakangas	20e0e7da9b	Add test for early backend startup errors The new test tests the libpq fallback behavior on an early error, which was fixed in the previous commit. This adds an IS_INJECTION_POINT_ATTACHED() macro, to allow writing injected test code alongside the normal source code. In principle, the new test could've been implemented by an extra test module with a callback that sets the FrontendProtocol global variable, but I think it's more clear to have the test code right where the injection point is, because it has pretty intimate knowledge of the surrounding context it runs in. Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CAOYmi%2Bnwvu21mJ4DYKUa98HdfM_KZJi7B1MhyXtnsyOO-PB6Ww%40mail.gmail.com	2024-07-26 15:12:21 +03:00
Heikki Linnakangas	b9e5249c29	Fix using injection points at backend startup in EXEC_BACKEND mode Commit `86db52a506` changed the locking of injection points to use only atomic ops and spinlocks, to make it possible to define injection points in processes that don't have a PGPROC entry (yet). However, it didn't work in EXEC_BACKEND mode, because the pointer to shared memory area was not initialized until the process "attaches" to all the shared memory structs. To fix, pass the pointer to the child process along with other global variables that need to be set up early. Backpatch-through: 17	2024-07-26 15:11:50 +03:00
Daniel Gustafsson	274bbced85	Disable all TLS session tickets OpenSSL supports two types of session tickets for TLSv1.3, stateless and stateful. The option we've used only turns off stateless tickets leaving stateful tickets active. Use the new API introduced in 1.1.1 to disable all types of tickets. Backpatch to all supported versions. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240617173803.6alnafnxpiqvlh3g@awork3.anarazel.de Backpatch-through: v12	2024-07-26 11:09:45 +02:00
Tom Lane	580f8727ca	Add argument names to the regexp_XXX functions. This change allows these functions to be called using named-argument notation, which can be helpful for readability, particularly for the ones with many arguments. There was considerable debate about exactly which names to use, but in the end we settled on the names already shown in our documentation table 9.10. The citext extension provides citext-aware versions of some of these functions, so add argument names to those too. In passing, fix table 9.10's syntax synopses for regexp_match, which were slightly wrong about which combinations of arguments are allowed. Jian He, reviewed by Dian Fay and others Discussion: https://postgr.es/m/CACJufxG3NFKKsh6x4fRLv8h3V-HvN4W5dA=zNKMxsNcDwOKang@mail.gmail.com	2024-07-25 14:51:46 -04:00
David Rowley	32d3ed8165	Add path column to pg_backend_memory_contexts view "path" provides a reliable method of determining the parent/child relationships between memory contexts. Previously this could be done in a non-reliable way by writing a recursive query and joining the "parent" and "name" columns. This wasn't reliable as the names were not unique, which could result in joining to the wrong parent. To make this reliable, "path" stores an array of numerical identifiers starting with the identifier for TopLevelMemoryContext. It contains an element for each intermediate parent between that and the current context. Incompatibility: Here we also adjust the "level" column to make it 1-based rather than 0-based. A 1-based level provides a convenient way to access elements in the "path" array. e.g. path[level] gives the identifier for the current context. Identifiers are not stable across multiple evaluations of the view. In an attempt to make these more stable for ad-hoc queries, the identifiers are assigned breadth-first. Contexts closer to TopLevelMemoryContext are less likely to change between queries and during queries. Author: Melih Mutlu <m.melihmutlu@gmail.com> Discussion: https://postgr.es/m/CAGPVpCThLyOsj3e_gYEvLoHkr5w=tadDiN_=z2OwsK3VJppeBA@mail.gmail.com Reviewed-by: Andres Freund, Stephen Frost, Atsushi Torikoshi, Reviewed-by: Michael Paquier, Robert Haas, David Rowley	2024-07-25 15:03:28 +12:00
Thomas Munro	f6bef362ca	Refactor tidstore.c iterator buffering. Previously, TidStoreIterateNext() would expand the set of offsets for each block into an internal buffer that it overwrote each time. In order to be able to collect the offsets for multiple blocks before working with them, change the contract. Now, the offsets are obtained by a separate call to TidStoreGetBlockOffsets(), which can be called at a later time. TidStoreIteratorResult objects are safe to copy and store in a queue. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CAAKRu_bbkmwAzSBgnezancgJeXrQZXy4G4kBTd+5=cr86H5yew@mail.gmail.com	2024-07-24 17:32:35 +12:00
Amit Kapila	1462aad2e4	Allow altering of two_phase option of a SUBSCRIPTION. The two_phase option is controlled by both the publisher (as a slot option) and the subscriber (as a subscription option), so the slot option must also be modified. Changing the 'two_phase' option for a subscription from 'true' to 'false' is permitted only when there are no pending prepared transactions corresponding to that subscription. Otherwise, the changes of already prepared transactions can be replicated again along with their corresponding commit leading to duplicate data or errors. To avoid data loss, the 'two_phase' option for a subscription can only be changed from 'false' to 'true' once the initial data synchronization is completed. Therefore this is performed later by the logical replication worker. Author: Hayato Kuroda, Ajin Cherian, Amit Kapila Reviewed-by: Peter Smith, Hou Zhijie, Amit Kapila, Vitaly Davydov, Vignesh C Discussion: https://postgr.es/m/8fab8-65d74c80-1-2f28e880@39088166	2024-07-24 10:13:36 +05:30
Peter Eisentraut	774d47b6c0	Move all extern declarations for GUC variables to header files Add extern declarations in appropriate header files for global variables related to GUC. In many cases, this was handled quite inconsistently before, with some GUC variables declared in a header file and some only pulled in via ad-hoc extern declarations in various .c files. Also add PGDLLIMPORT qualifications to those variables. These were previously missing because src/tools/mark_pgdllimport.pl has only been used with header files. This also fixes -Wmissing-variable-declarations warnings for GUC variables (not yet part of the standard warning options). Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-24 06:31:07 +02:00
Peter Eisentraut	d3cc5ffe81	Move extern declarations for EXEC_BACKEND to header files This fixes warnings from -Wmissing-variable-declarations (not yet part of the standard warning options) under EXEC_BACKEND. The NON_EXEC_STATIC variables need a suitable declaration in a header file under EXEC_BACKEND. Also fix the inconsistent application of the volatile qualifier for PMSignalState, which was revealed by this change. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-23 15:07:10 +02:00
Peter Eisentraut	935e675f3c	Get rid of a global variable bootstrap_data_checksum_version can just as easily be passed to where it is used via function arguments. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-23 10:00:41 +02:00
Michael Paquier	ffb0603929	Improve comments in slru.{c,h} about segment name format slru.h described incorrectly how SLRU segment names are formatted depending on the segment number and if long or short segment names are used. This commit closes the gap with a better description, fitting with the reality. Reported-by: Noah Misch Author: Aleksander Alekseev Discussion: https://postgr.es/m/20240626002747.dc.nmisch@google.com Backpatch-through: 17	2024-07-23 16:54:51 +09:00
Peter Eisentraut	4d130b2872	Windows replacement for strtok_r() They spell it "strtok_s" there. There are currently no uses, but some will be added soon. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-07-23 09:20:22 +02:00
Richard Guo	581df21487	Fix rowcount estimate for gather (merge) paths In the case of a parallel plan, when computing the number of tuples processed per worker, we divide the total number of tuples by the parallel_divisor obtained from get_parallel_divisor(), which accounts for the leader's contribution in addition to the number of workers. Accordingly, when estimating the number of tuples for gather (merge) nodes, we should multiply the number of tuples per worker by the same parallel_divisor to reverse the division. However, currently we use parallel_workers rather than parallel_divisor for the multiplication. This could result in an underestimation of the number of tuples for gather (merge) nodes, especially when there are fewer than four workers. This patch fixes this issue by using the same parallel_divisor for the multiplication. There is one ensuing plan change in the regression tests, but it looks reasonable and does not compromise its original purpose of testing parallel-aware hash join. In passing, this patch removes an unnecessary assignment for path.rows in create_gather_merge_path, and fixes an uninitialized-variable issue in generate_useful_gather_paths. No backpatch as this could result in plan changes. Author: Anthonin Bonnefoy Reviewed-by: Rafia Sabih, Richard Guo Discussion: https://postgr.es/m/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com	2024-07-23 10:33:26 +09:00
Robert Haas	e4326fbc60	Remove grotty use of disable_cost for TID scan plans. Previously, the code charged disable_cost for CurrentOfExpr, and then subtracted disable_cost from the cost of a TID path that used CurrentOfExpr as the TID qual, effectively disabling all paths except that one. Now, we instead suppress generation of the disabled paths entirely, and generate only the one that the executor will actually understand. With this approach, we do not need to rely on disable_cost being large enough to prevent the wrong path from being chosen, and we save some CPU cycle by avoiding generating paths that we can't actually use. In my opinion, the code is also easier to understand like this. Patch by me. Review by Heikki Linnakangas. Discussion: http://postgr.es/m/591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi	2024-07-22 14:57:53 -04:00
Peter Eisentraut	683be87fbb	Add port/ replacement for strsep() from OpenBSD, similar to strlcat, strlcpy There are currently no uses, but some will be added soon. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-07-22 09:50:30 +02:00
Noah Misch	a858be17c3	Add a way to create read stream object by using SMgrRelation. Currently read stream object can be created only by using Relation. Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	af07a827b9	Refactor PinBufferForBlock() to remove checks about persistence. There are checks in PinBufferForBlock() function to set persistence of the relation. This function is called for each block in the relation. Instead, set persistence of the relation before PinBufferForBlock(). Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	e00c45f685	Remove "smgr_persistence == 0" dead code. Reaching that code would have required multiple processes performing relation extension during recovery, which does not happen. That caller has the persistence available, so pass it. This was dead code as soon as commit `210622c60e` added it. Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Heikki Linnakangas	5784a493f1	Move resowner from common JitContext to LLVM specific Only the LLVM specific code uses it since resource owners were made extensible in commit `b8bff07daa`. This is new in v17, so backpatch there to keep the branches from diverging just yet. Author: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/fd3a2a00-6605-4e30-a118-48418b478e6e@proxel.se	2024-07-19 10:27:06 +03:00
Robert Haas	402b586d0a	Do not summarize WAL if generated with wal_level=minimal. To do this, we must include the wal_level in the first WAL record covered by each summary file; so add wal_level to struct Checkpoint and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY. This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the Checkpoint is also stored in the control file, also PG_CONTROL_VERSION. It's not great to do that so late in the release cycle, but the alternative seems to ship v17 without robust protections against this scenario, which could result in corrupted incremental backups. A side effect of this patch is that, when a server with wal_level=replica is started with summarize_wal=on for the first time, summarization will no longer begin with the oldest WAL that still exists in pg_wal, but rather from the first checkpoint after that. This change should be harmless, because a WAL summary for a partial checkpoint cycle can never make an incremental backup possible when it would otherwise not have been. Report by Fujii Masao. Patch by me. Review and/or testing by Jakub Wartak and Fujii Masao. Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com	2024-07-18 12:09:48 -04:00
Michael Paquier	a0a5869a85	Add INJECTION_POINT_CACHED() to run injection points directly from cache This new macro is able to perform a direct lookup from the local cache of injection points (refreshed each time a point is loaded or run), without touching the shared memory state of injection points at all. This works in combination with INJECTION_POINT_LOAD(), and it is better than INJECTION_POINT() in a critical section due to the fact that it would avoid all memory allocations should a concurrent detach happen since a LOAD(), as it retrieves a callback from the backend-private memory. The documentation is updated to describe in more details how to use this new macro with a load. Some tests are added to the module injection_points based on a new SQL function that acts as a wrapper of INJECTION_POINT_CACHED(). Based on a suggestion from Heikki Linnakangas. Author: Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/58d588d0-e63f-432f-9181-bed29313dece@iki.fi	2024-07-18 09:50:41 +09:00
Nathan Bossart	a99cc6c6b4	Use PqMsg_* macros in more places. Commit `f4b54e1ed9`, which introduced macros for protocol characters, missed updating a few places. It also did not introduce macros for messages sent from parallel workers to their leader processes. This commit adds a new section in protocol.h for those. Author: Aleksander Alekseev Discussion: https://postgr.es/m/CAJ7c6TNTd09AZq8tGaHS3LDyH_CCnpv0oOz2wN1dGe8zekxrdQ%40mail.gmail.com Backpatch-through: 17	2024-07-17 10:51:00 -05:00
Jeff Davis	4b74ebf726	When creating materialized views, use REFRESH to load data. Previously, CREATE MATERIALIZED VIEW ... WITH DATA populated the MV the same way as CREATE TABLE ... AS. Instead, reuse the REFRESH logic, which locks down security-restricted operations and restricts the search_path. This reduces the chance that a subsequent refresh will fail. Reported-by: Noah Misch Backpatch-through: 17 Discussion: https://postgr.es/m/20240630222344.db.nmisch@google.com	2024-07-16 15:41:29 -07:00
Tom Lane	a0f1fce80c	Add min and max aggregates for composite types (records). Like min/max for arrays, these are just thin wrappers around the existing btree comparison function for records. Aleksander Alekseev Discussion: https://postgr.es/m/CAO=iB8L4WYSNxCJ8GURRjQsrXEQ2-zn3FiCsh2LMqvWq2WcONg@mail.gmail.com	2024-07-11 11:50:50 -04:00
Masahiko Sawada	bb19b70081	Fix possibility of logical decoding partial transaction changes. When creating and initializing a logical slot, the restart_lsn is set to the latest WAL insertion point (or the latest replay point on standbys). Subsequently, WAL records are decoded from that point to find the start point for extracting changes in the DecodingContextFindStartpoint() function. Since the initial restart_lsn could be in the middle of a transaction, the start point must be a consistent point where we won't see the data for partial transactions. Previously, when not building a full snapshot, serialized snapshots were restored, and the SnapBuild jumps to the consistent state even while finding the start point. Consequently, the slot's restart_lsn and confirmed_flush could be set to the middle of a transaction. This could lead to various unexpected consequences. Specifically, there were reports of logical decoding decoding partial transactions, and assertion failures occurred because only subtransactions were decoded without decoding their top-level transaction until decoding the commit record. To resolve this issue, the changes prevent restoring the serialized snapshot and jumping to the consistent state while finding the start point. On v17 and HEAD, a flag indicating whether snapshot restores should be skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION has been bumpded. On backbranches, the flag is stored in the LogicalDecodingContext instead, preserving on-disk compatibility. Backpatch to all supported versions. Reported-by: Drew Callahan Reviewed-by: Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com Backpatch-through: 12	2024-07-11 22:48:23 +09:00
Michael Paquier	9e4664d950	Add a new 'F' entry type for fixed-numbered stats in pgstats file This new entry type is used for all the fixed-numbered statistics, making possible support for custom pluggable stats. In short, we need to be able to detect more easily if a stats kind exists or not when reading back its data from the pgstats file without a dependency on the order of the entries read. The kind ID of the stats is added to the data written. The data is written in the same fashion as previously, with the fixed-numbered stats first and the dshash entries next. The read part becomes more flexible, loading fixed-numbered stats into shared memory based on the new entry type found. Bump PGSTAT_FILE_FORMAT_ID. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-11 16:12:44 +09:00
Michael Paquier	21471f18e9	Add PgStat_KindInfo.init_shmem_cb This new callback gives fixed-numbered stats the possibility to take actions based on the area of shared memory allocated for them. This removes from pgstat_shmem.c any knowledge specific to the types of fixed-numbered stats, and the initializations happen in their own files. Like `b68b29bc8f`, this change is useful to make this area of the code more pluggable, so as custom fixed-numbered stats can take actions after their shared memory area is initialized. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-11 09:21:40 +09:00
Michael Paquier	d898665bf7	Extend pg_get_acl() to handle sub-object IDs This patch modifies the pg_get_acl() function to accept a third argument called "objsubid", bringing it on par with similar functions in this area like pg_describe_object(). This enables the retrieval of ACLs for relation attributes when scanning dependencies. Bump catalog version. Author: Joel Jacobson Discussion: https://postgr.es/m/f2539bff-64be-47f0-9f0b-df85d3cc0432@app.fastmail.com	2024-07-10 10:14:37 +09:00
Nathan Bossart	ccd38024bc	Introduce pg_signal_autovacuum_worker. Since commit `3a9b18b309`, roles with privileges of pg_signal_backend cannot signal autovacuum workers. Many users treated the ability to signal autovacuum workers as a feature instead of a bug, so we are reintroducing it via a new predefined role. Having privileges of this new role, named pg_signal_autovacuum_worker, only permits signaling autovacuum workers. It does not permit signaling other types of superuser backends. Bumps catversion. Author: Kirill Reshke Reviewed-by: Anthony Leung, Michael Paquier, Andrey Borodin Discussion: https://postgr.es/m/CALdSSPhC4GGmbnugHfB9G0%3DfAxjCSug_-rmL9oUh0LTxsyBfsg%40mail.gmail.com	2024-07-09 13:03:40 -05:00
Michael Paquier	b68b29bc8f	Use pgstat_kind_infos to write fixed shared statistics This is similar to `9004abf620`, but this time for the write part of the stats file. The code is changed so as, rather than referring to individual members of PgStat_Snapshot in an order based on their PgStat_Kind value, a loop based on pgstat_kind_infos is used to retrieve the contents to write from the snapshot structure, for a size of PgStat_KindInfo's shared_data_len. This requires the addition to PgStat_KindInfo of an offset to track the location of each fixed-numbered stats in PgStat_Snapshot. This change is useful to make this area of the code more easily pluggable, and reduces the knowledge of specific fixed-numbered kinds in pgstat.c. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-09 10:27:12 +09:00
David Rowley	5a1e6df3b8	Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE Nodes like Memoize report the cache stats for each parallel worker, so it makes sense to show the exact and lossy pages in Parallel Bitmap Heap Scan in a similar way. Likewise, Sort shows the method and memory used for each worker. There was some discussion on whether the leader stats should include the totals for each parallel worker or not. I did some analysis on this to see what other parallel node types do and it seems only Parallel Hash does anything like this. All the rest, per what's supported by ExecParallelRetrieveInstrumentation() are consistent with each other. Author: David Geier <geidav.pg@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Donghang Lin <donghanglin@gmail.com> Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Christofides <michael@pgmustard.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Donghang Lin <donghanglin@gmail.com> Reviewed-by: Masahiro Ikeda <Masahiro.Ikeda@nttdata.com> Discussion: https://postgr.es/m/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com	2024-07-09 12:15:47 +12:00
David Rowley	e41f713097	Perform forgotten cat version bump I missed this in `036bdcec9`	2024-07-09 09:56:46 +12:00
David Rowley	036bdcec9f	Teach planner how to estimate rows for timestamp generate_series This provides the planner with row estimates for generate_series(TIMESTAMP, TIMESTAMP, INTERVAL), generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL) and generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL, TEXT) when the input parameter values can be estimated during planning. Author: David Rowley Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CAApHDvrBE%3D%2BASo_sGYmQJ3GvO8GPvX5yxXhRS%3Dt_ybd4odFkhQ%40mail.gmail.com	2024-07-09 09:54:59 +12:00
Michael Paquier	e311c6e539	Renumber pg_get_acl() in pg_proc.dat `a6417078c4` has introduced as project policy that new features committed during the development cycle should use new OIDs in the [8000,9999] range. `4564f1cebd` did not respect that rule, so let's renumber pg_get_acl() to use an OID in the correct range. Bump catalog version.	2024-07-08 15:34:33 +09:00
David Rowley	7340d9362a	Widen lossy and exact page counters for Bitmap Heap Scan Both of these counters were using the "long" data type. On MSVC that's a 32-bit type. On modern hardware, I was able to demonstrate that we can wrap those counters with a query that only takes 15 minutes to run. This issue may manifest itself either by not showing the values of the counters because they've wrapped and are less than zero, resulting in them being filtered by the > 0 checks in show_tidbitmap_info(), or bogus numbers being displayed which are modulus 2^32 of the actual number. Widen these counters to uint64. Discussion: https://postgr.es/m/CAApHDvpS_97TU+jWPc=T83WPp7vJa1dTw3mojEtAVEZOWh9bjQ@mail.gmail.com	2024-07-08 14:43:09 +12:00
Thomas Munro	2a5ef09830	Cope with <regex.h> name clashes. macOS 15's SDK pulls in headers related to <regex.h> when we include <xlocale.h>. This causes our own regex_t implementation to clash with the OS's regex_t implementation. Luckily our function names already had pg_ prefixes, but the macros and typenames did not. Include <regex.h> explicitly on all POSIX systems, and fix everything that breaks. Then we can prove that we are capable of fully hiding and replacing the system regex API with our own. 1. Deal with standard-clobbering macros by undefining them all first. POSIX says they are "symbolic constants". If they are macros, this allows us to redefine them. If they are enums or variables, our macros will hide them. 2. Deal with standard-clobbering types by giving our types pg_ prefixes, and then using macros to redirect xxx_t -> pg_xxx_t. After including our "regex/regex.h", the system <regex.h> is hidden, because we've replaced all the standard names. The PostgreSQL source tree and extensions can continue to use standard prefix-less type and macro names, but reach our implementation, if they included our "regex/regex.h" header. Back-patch to all supported branches, so that macOS 15's tool chain can build them. Reported-by: Stan Hu <stanhu@gmail.com> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://postgr.es/m/CAMBWrQnEwEJtgOv7EUNsXmFw2Ub4p5P%2B5QTBEgYwiyjy7rAsEQ%40mail.gmail.com	2024-07-06 10:27:16 +12:00
Nathan Bossart	0b1fe1413e	Remove check hooks for GUCs that contribute to MaxBackends. Each of max_connections, max_worker_processes, autovacuum_max_workers, and max_wal_senders has a GUC check hook that verifies the sum of those GUCs does not exceed a hard-coded limit (see the comment for MAX_BACKENDS in postmaster.h). In general, the hooks effectively guard against egregious misconfigurations. However, this approach has some problems. Since these check hooks are called as each GUC is assigned its user-specified value, only one of the hooks will be called with all the relevant GUCs set. If one or more of the user-specified values are less than the initial values of the GUCs' underlying variables, false positives can occur. Furthermore, the error message emitted when one of the check hooks fails is not tremendously helpful. For example, the command $ pg_ctl -D . start -o "-c max_connections=262100 -c max_wal_senders=10000" fails with the following error: FATAL: invalid value for parameter "max_wal_senders": 10000 Fortunately, there is an extra copy of this check in InitializeMaxBackends() that we can rely on, so this commit removes the aforementioned GUC check hooks in favor of that one. It also enhances the error message to clearly show the values of the relevant GUCs and the hard-coded limit their sum may not exceed. The downside of this change is that server startup progresses further before failing due to such misconfigurations (thus taking longer), but these failures are expected to be rare, so we don't anticipate any real harm in practice. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/ZnMr2k-Nk5vj7T7H%40nathan	2024-07-05 14:42:55 -05:00
Michael Paquier	4b211003ec	Support loading of injection points This can be used to load an injection point and prewarm the backend-level cache before running it, to avoid issues if the point cannot be loaded due to restrictions in the code path where it would be run, like a critical section where no memory allocation can happen (load_external_function() can do allocations when expanding a library name). Tests can use a macro called INJECTION_POINT_LOAD() to load an injection point. The test module injection_points gains some tests, and a SQL function able to load an injection point. Based on a request from Andrey Borodin, who has implemented a test for multixacts requiring this facility. Reviewed-by: Andrey Borodin Discussion: https://postgr.es/m/ZkrBE1e2q2wGvsoN@paquier.xyz	2024-07-05 18:09:03 +09:00
Heikki Linnakangas	98347b5a3a	Lift limitation that PGPROC->links must be the first field Since commit `5764f611e1`, we've been using the ilist.h functions for handling the linked list. There's no need for 'links' to be the first element of the struct anymore, except for one call in InitProcess where we used a straight cast from the 'dlist_node ' to PGPROC , without the dlist_container() macro. That was just an oversight in commit `5764f611e1`, fix it. There no imminent need to move 'links' from being the first field, but let's be tidy. Reviewed-by: Aleksander Alekseev, Andres Freund Discussion: https://www.postgresql.org/message-id/22aa749e-cc1a-424a-b455-21325473a794@iki.fi	2024-07-05 11:21:46 +03:00
David Rowley	1eff8279d4	Add memory/disk usage for Material nodes in EXPLAIN Up until now, there was no ability to easily determine if a Material node caused the underlying tuplestore to spill to disk or even see how much memory the tuplestore used if it didn't. Here we add some new functions to tuplestore.c to query this information and add some additional output in EXPLAIN ANALYZE to display this information for the Material node. There are a few other executor node types that use tuplestores, so we could also consider adding these details to the EXPLAIN ANALYZE for those nodes too. Let's consider those independently from this. Having the tuplestore.c infrastructure in to allow that is step 1. Author: David Rowley Reviewed-by: Matthias van de Meent, Dmitry Dolgov Discussion: https://postgr.es/m/CAApHDvp5Py9g4Rjq7_inL3-MCK1Co2CRt_YWFwTU2zfQix0p4A@mail.gmail.com	2024-07-05 14:05:08 +12:00
Richard Guo	aa86129e19	Support "Right Semi Join" plan shapes Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com	2024-07-05 09:26:48 +09:00
Alvaro Herrera	768f0c3e21	Remove bogus assertion in pg_atomic_monotonic_advance_u64 This code wanted to ensure that the 'exchange' variable passed to pg_atomic_compare_exchange_u64 has correct alignment, but apparently platforms don't actually require anything that doesn't come naturally. While messing with pg_atomic_monotonic_advance_u64: instead of using Max() to determine the value to return, just use pg_atomic_compare_exchange_u64()'s return value to decide; also, use pg_atomic_compare_exchange_u64 instead of the _impl version; also remove the unnecessary underscore at the end of variable name "target". Backpatch to 17, where this code was introduced by commit bf3ff7bf83bc. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/36796438-a718-cf9b-2071-b2c1b947c1b5@gmail.com	2024-07-04 13:25:31 +02:00
Michael Paquier	4564f1cebd	Add pg_get_acl() to get the ACL for a database object This function returns the ACL for a database object, specified by catalog OID and object OID. This is useful to be able to retrieve the ACL associated to an object specified with a (class_id,objid) couple, similarly to the other functions for object identification, when joined with pg_depend or pg_shdepend. Original idea by Álvaro Herrera. Bump catalog version. Author: Joel Jacobson Reviewed-by: Isaac Morland, Michael Paquier, Ranier Vilela Discussion: https://postgr.es/m/80b16434-b9b1-4c3d-8f28-569f21c2c102@app.fastmail.com	2024-07-04 17:09:06 +09:00
Peter Eisentraut	8f8bcb8883	Improve some global variable declarations We have in launch_backend.c: /* * The following need to be available to the save/restore_backend_variables * functions. They are marked NON_EXEC_STATIC in their home modules. / extern slock_t ShmemLock; extern slock_t ProcStructLock; extern PGPROC AuxiliaryProcs; extern PMSignalData PMSignalState; extern pg_time_t first_syslogger_file_time; extern struct bkend ShmemBackendArray; extern bool redirection_done; That comment is not completely true: ShmemLock, ShmemBackendArray, and redirection_done are not in fact NON_EXEC_STATIC. ShmemLock once was, but was then needed elsewhere. ShmemBackendArray was static inside postmaster.c before launch_backend.c was created. redirection_done was never static. This patch moves the declaration of ShmemLock and redirection_done to a header file. ShmemBackendArray gets a NON_EXEC_STATIC. This doesn't make a difference, since it only exists if EXEC_BACKEND anyway, but it makes it consistent. After that, the comment is now correct. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-02 07:26:22 +02:00
Tom Lane	edadeb0710	Remove support for HPPA (a/k/a PA-RISC) architecture. This old CPU architecture hasn't been produced in decades, and whatever instances might still survive are surely too underpowered for anyone to consider running Postgres on in production. We'd nonetheless continued to carry code support for it (largely at my insistence), because its unique implementation of spinlocks seemed like a good edge case for our spinlock infrastructure. However, our last buildfarm animal of this type was retired last year, and it seems quite unlikely that another will emerge. Without the ability to run tests, the argument that this is useful test code fails to hold water. Furthermore, carrying code support for an untestable architecture has costs not to be ignored. So, remove HPPA-specific code, in the same vein as commits `718aa43a4` and `92d70b77e`. Discussion: https://postgr.es/m/3351991.1697728588@sss.pgh.pa.us	2024-07-01 13:55:52 -04:00
Nathan Bossart	7967d10c5b	Remove redundant privilege check from pg_sequences system view. This commit adjusts pg_sequence_last_value() to return NULL instead of ERROR-ing for sequences for which the current user lacks privileges. This allows us to remove the call to has_sequence_privilege() in the definition of the pg_sequences system view. Bumps catversion. Suggested-by: Michael Paquier Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20240501005730.GA594666%40nathanxps13	2024-07-01 11:47:40 -05:00
David Rowley	12227a1d5f	Add context type field to pg_backend_memory_contexts Since we now (as of v17) have 4 MemoryContext types, the type of context seems like useful information to include in the pg_backend_memory_contexts view. Here we add that. Reviewed-by: David Christensen, Michael Paquier Discussion: https://postgr.es/m/CAApHDvrXX1OR09Zjb5TnB0AwCKze9exZN%3D9Nxxg1ZCVV8W-3BA%40mail.gmail.com	2024-07-01 21:19:01 +12:00
Amit Kapila	2357c9223b	Rename standby_slot_names to synchronized_standby_slots. The standby_slot_names GUC allows the specification of physical standby slots that must be synchronized before the logical walsenders associated with logical failover slots. However, for this purpose, the GUC name is too generic. Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Masahiko Sawada Backpatch-through: 17 Discussion: https://postgr.es/m/ZnWeUgdHong93fQN@momjian.us	2024-07-01 11:36:56 +05:30
Michael Paquier	9004abf620	Use pgstat_kind_infos to read fixed shared statistics Shared statistics with a fixed number of objects are read from the stats file in pgstat_read_statsfile() using members of PgStat_ShmemControl and following an order based on their PgStat_Kind value. Instead of being explicit, this commit changes the stats read to iterate over the pgstat_kind_infos array to find the memory locations to read into, based on a new shared_ctl_off in PgStat_KindInfo that can be used to define the position of this stats kind in shared memory. This makes the read logic simpler, and eases the introduction of future improvements aimed at making this area more pluggable for external modules. Original idea suggested by Andres Freund. Author: Tristan Partin Reviewed-by: Andres Freund, Michael Paquier Discussion: https://postgr.es/m/D12SQ7OYCD85.20BUVF3DWU5K7@neon.tech	2024-07-01 14:26:25 +09:00
Michael Paquier	b19db55bd6	Remove PgStat_KindInfo.named_on_disk This field is used to track if a stats kind can use a custom format representation on disk when reading or writing its stats case. On HEAD, this exists for replication slots stats, that need a mapping between an internal index ID and the slot names. named_on_disk is currently used nowhere and the callbacks to_serialized_name and from_serialized_name are in charge of checking if the serialization of the stats data should apply, so let's remove it. Reviewed-by: Andres Freund Discussion: https://postgr.es/m/ZmKVlSX_T5YvIOsd@paquier.xyz	2024-07-01 09:35:36 +09:00
Amit Langote	716bd12d22	SQL/JSON: Always coerce JsonExpr result at runtime Instead of looking up casts at parse time for converting the result of JsonPath* query functions to the specified or the default RETURNING type, always perform the conversion at runtime using either the target type's input function or the function json_populate_type(). There are two motivations for this change: 1. json_populate_type() coerces to types with typmod such that any string values that exceed length limit cause an error instead of silent truncation, which is necessary to be standard-conforming. 2. It was possible to end up with a cast expression that doesn't support soft handling of errors causing bugs in the of handling ON ERROR clause. JsonExpr.coercion_expr which would store the cast expression is no longer necessary, so remove. Bump catversion because stored rules change because of the above removal. Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: Discussion: https://postgr.es/m/202405271326.5a5rprki64aw%40alvherre.pgsql	2024-06-28 21:58:13 +09:00
Robert Haas	065583cf46	Prevent summarizer hang when summarize_wal turned off and back on. Before this commit, when the WAL summarizer started up or recovered from an error, it would resume summarization from wherever it left off. That was OK normally, but wrong if summarize_wal=off had been turned off temporary, allowing some WAL to be removed, and then turned back on again. In such cases, the WAL summarizer would simply hang forever. This commit changes the reinitialization sequence for WAL summarizer to rederive the starting position in the way we were already doing at initial startup, fixing the problem. Per report from Israel Barth Rubio. Reviewed by Tom Lane. Discussion: http://postgr.es/m/CA+TgmoYN6x=YS+FoFOS6=nr6=qkXZFWhdiL7k0oatGwug2hcuA@mail.gmail.com	2024-06-28 08:29:05 -04:00
Noah Misch	4a7f91b3d3	Remove comment about xl_heap_inplace "AT END OF STRUCT". Commit `2c03216d83` moved the tuple data from there to the buffer-0 data. Back-patch to v12 (all supported versions), the plan for the next change to this struct. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com	2024-06-27 19:21:06 -07:00
Noah Misch	f9f47f0d93	Cope with inplace update making catcache stale during TOAST fetch. This extends `ad98fb1422` to invals of inplace updates. Trouble requires an inplace update of a catalog having a TOAST table, so only pg_database was at risk. (The other catalog on which core code performs inplace updates, pg_class, has no TOAST table.) Trouble would require something like the inplace-inval.spec test. Consider GRANT ... ON DATABASE fetching a stale row from cache and discarding a datfrozenxid update that vac_truncate_clog() has already relied upon. Back-patch to v12 (all supported versions). Reviewed (in an earlier version) by Robert Haas. Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com	2024-06-27 19:21:06 -07:00
Noah Misch	0cecc908e9	Lock before setting relhassubclass on RELKIND_PARTITIONED_INDEX. Commit `5b562644fe` added a comment that SetRelationHasSubclass() callers must hold this lock. When commit `17f206fbc8` extended use of this column to partitioned indexes, it didn't take the lock. As the latter commit message mentioned, we currently never reset a partitioned index to relhassubclass=f. That largely avoids harm from the lock omission. The cause for fixing this now is to unblock introducing a rule about locks required to heap_update() a pg_class row. This might cause more deadlocks. It gives minor user-visible benefits: - If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks instead of failing with "tuple concurrently updated". (Many cases of DDL concurrency still fail that way.) - Match ALTER INDEX ATTACH PARTITION in choosing to lock the index. While not user-visible today, we'll need this if we ever make something set the flag to false for a partitioned index, like ANALYZE does today for tables. Back-patch to v12 (all supported versions), the plan for the commit relying on the new rule. In back branches, add LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter. Reviewed (in an earlier version) by Robert Haas. Discussion: https://postgr.es/m/20240611024525.9f.nmisch@google.com	2024-06-27 19:21:05 -07:00
Noah Misch	bb93640a68	Add wait event type "InjectionPoint", a custom type like "Extension". Both injection points and customization of type "Extension" are new in v17, so this just changes a detail of an unreleased feature. Reported by Robert Haas. Reviewed by Michael Paquier. Discussion: https://postgr.es/m/CA+TgmobfMU5pdXP36D5iAwxV5WKE_vuDLtp_1QyH+H5jMMt21g@mail.gmail.com	2024-06-27 19:21:05 -07:00
Heikki Linnakangas	cbfbda7841	Fix MVCC bug with prepared xact with subxacts on standby We did not recover the subtransaction IDs of prepared transactions when starting a hot standby from a shutdown checkpoint. As a result, such subtransactions were considered as aborted, rather than in-progress. That would lead to hint bits being set incorrectly, and the subtransactions suddenly becoming visible to old snapshots when the prepared transaction was committed. To fix, update pg_subtrans with prepared transactions's subxids when starting hot standby from a shutdown checkpoint. The snapshots taken from that state need to be marked as "suboverflowed", so that we also check the pg_subtrans. Backport to all supported versions. Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi	2024-06-27 21:09:58 +03:00
Alexander Korotkov	70a845c04a	Remove extra comment at TableAmRoutine.scan_analyze_next_block The extra comment was accidentally copied here by `6377e12a` from heapam_scan_analyze_next_block(). Reported-by: Matthias van de Meent Discussion: https://postgr.es/m/CAEze2WjC5PiweG-4oe0hB_Zr59iF3tRE1gURm8w4Cg5b6JEBGw%40mail.gmail.com	2024-06-22 16:17:50 +03:00
John Naylor	fd49e8f323	Prevent access of uninitialized memory in radix tree nodes RT_NODE_16_SEARCH_EQ() performs comparisions using vector registers on x64-64 and aarch64. We apply a mask to the resulting bitfield to eliminate irrelevant bits that may be set. This ensures correct behavior, but Valgrind complains of the partially-uninitialised values. So far the warnings have only occurred on aarch64, which explains why this hasn't been seen earlier. To fix this warning, initialize the whole fixed-sized part of the nodes upon allocation, rather than just do the minimum initialization to function correctly. The initialization for node48 is a bit different in that the 256-byte slot index array must be populated with "invalid index" rather than zero. Experimentation has shown that compilers tend to emit code that uselessly memsets that array twice. To avoid pessimizing this path, swap the order of the slot_idxs[] and isset[] arrays so we can initialize with two non-overlapping memset calls. Reported by Tomas Vondra Analysis and patch by Tom Lane, reviewed by Masahiko Sawada. I investigated the behavior of memset calls to overlapping regions, leading to the above tweaks to node48 as discussed in the thread. Discussion: https://postgr.es/m/120c63ad-3d12-415f-a7bf-3da451c31bf6%40enterprisedb.com	2024-06-21 17:29:39 +07:00
Peter Eisentraut	02bbc3c83a	parse_manifest: Use const char * This adapts the manifest parsing code to take advantage of the const-ified jsonapi. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org	2024-06-21 07:53:30 +02:00
Peter Eisentraut	15cd9a3881	jsonapi: Use const char * Apply const qualifiers to char * arguments and fields throughout the jsonapi. This allows the top-level APIs such as pg_parse_json_incremental() to declare their input argument as const. It also reduces the number of unconstify() calls. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org	2024-06-21 07:53:30 +02:00
Peter Eisentraut	0b06bf9fa9	jsonapi: Use size_t Use size_t instead of int for object sizes in the jsonapi. This makes the API better self-documenting. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org	2024-06-21 07:53:30 +02:00
Amit Langote	0f271e8e8d	SQL/JSON: Correct jsonpath variable name matching Previously, GetJsonPathVar() allowed a jsonpath expression to reference any prefix of a PASSING variable's name. For example, the following query would incorrectly work: SELECT JSON_QUERY(context_item, jsonpath '$xy' PASSING val AS xyz); The fix ensures that the length of the variable name mentioned in a jsonpath expression matches exactly with the name of the PASSING variable before comparing the strings using strncmp(). Reported-by: Alvaro Herrera (off-list) Discussion: https://postgr.es/m/CA+HiwqFGkLWMvELBH6E4SQ45qUHthgcRH6gCJL20OsYDRtFx_w@mail.gmail.com	2024-06-19 15:22:06 +09:00
Tom Lane	92c49d1062	Fix insertion of SP-GiST REDIRECT tuples during REINDEX CONCURRENTLY. Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may insert some REDIRECT tuples. This will typically happen in a transaction that lacks an XID, which leads either to assertion failure in spgFormDeadTuple or to insertion of a REDIRECT tuple with zero xid. The latter's not good either, since eventually VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid, resulting in either an assertion failure or a garbage answer. In practice, since REINDEX CONCURRENTLY locks out index scans till it's done, it doesn't matter whether it inserts REDIRECTs or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM reduces such a REDIRECT to a PLACEHOLDER. So in non-assert builds there's no observable problem here, other than perhaps a little index bloat. But it's not behaving as intended. To fix, remove the failing Assert in spgFormDeadTuple, acknowledging that we might sometimes insert a zero XID; and guard VACUUM's GlobalVisTestIsRemovableXid() call with a test for valid XID, ensuring that we'll reduce such a REDIRECT the first time VACUUM sees it. (Versions before v14 use TransactionIdPrecedes here, which won't fail on zero xid, so they really have no bug at all in non-assert builds.) Another solution could be to not create REDIRECTs at all during REINDEX CONCURRENTLY, making the relevant code paths treat that case like index build (which likewise knows that no concurrent index scans can be happening). That would allow restoring the Assert in spgFormDeadTuple, but we'd still need the VACUUM change because redirection tuples with zero xid may be out there already. But there doesn't seem to be a nice way for spginsert() to tell that it's being called in REINDEX CONCURRENTLY without some API changes, so we'll leave that as a possible future improvement. In HEAD, also rename the SpGistState.myXid field to redirectXid, which seems less misleading (since it might not in fact be our transaction's XID) and is certainly less uninformatively generic. Per bug #18499 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org	2024-06-17 14:30:59 -04:00
Tom Lane	35dd40d34c	Improve tracking of role dependencies of pg_init_privs entries. Commit `534287403` invented SHARED_DEPENDENCY_INITACL entries in pg_shdepend, but installed them only for non-owner roles mentioned in a pg_init_privs entry. This turns out to be the wrong thing, because there is nothing to cue REASSIGN OWNED to go and update pg_init_privs entries when the object's ownership is reassigned. That leads to leaving dangling entries in pg_init_privs, as reported by Hannu Krosing. Instead, install INITACL entries for all roles mentioned in pg_init_privs entries (except pinned roles), and change ALTER OWNER to not touch them, just as it doesn't touch pg_init_privs entries. REASSIGN OWNED will now substitute the new owner OID for the old in pg_init_privs entries. This feels like perhaps not quite the right thing, since pg_init_privs ought to be a historical record of the state of affairs just after CREATE EXTENSION. However, it's hard to see what else to do, if we don't want to disallow dropping the object's original owner. In any case this is better than the previous do-nothing behavior, and we're unlikely to come up with a superior solution in time for v17. While here, tighten up some coding rules about how ACLs in pg_init_privs should never be null or empty. There's not any obvious reason to allow that, and perhaps asserting that it's not so will catch some bugs. (We were previously inconsistent on the point, with some code paths taking care not to store empty ACLs and others not.) This leaves recordExtensionInitPrivWorker not doing anything with its ownerId argument, but we'll deal with that separately. catversion bump forced because of change of expected contents of pg_shdepend when pg_init_privs entries exist. Discussion: https://postgr.es/m/CAMT0RQSVgv48G5GArUvOVhottWqZLrvC5wBzBa4HrUdXe9VRXw@mail.gmail.com	2024-06-17 12:55:10 -04:00
Masahiko Sawada	f1affb6705	Reintroduce dead tuple counter in pg_stat_progress_vacuum. Commit `667e65aac3` changed both num_dead_tuples and max_dead_tuples columns to dead_tuple_bytes and max_dead_tuple_bytes columns, respectively. But as per discussion, the number of dead tuples collected still provides meaningful insights for users. This commit reintroduces the column for the count of dead tuples, renamed as num_dead_item_ids. It avoids confusion with the number of dead tuples removed by VACUUM, which includes dead heap-only tuples but excludes any pre-existing LP_DEAD items left behind by opportunistic pruning. Bump catalog version. Reviewed-by: Peter Geoghegan, Álvaro Herrera, Andrey Borodin Discussion: https://postgr.es/m/CAD21AoBL5sJE9TRWPyv%2Bw7k5Ee5QAJqDJEDJBUdAaCzGWAdvZw%40mail.gmail.com	2024-06-14 10:08:15 +09:00
Peter Geoghegan	6207f08f70	Harmonize function parameter names for Postgres 17. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 17 development. pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit `035ce1fe`).	2024-06-12 17:01:51 -04:00
Peter Eisentraut	3482bab5e3	meson: Restore implicit warning/debug/optimize flags for extensions Meson uses warning/debug/optimize flags such as "-Wall", "-g", and "-O2" automatically based on "--warnlevel" and "--buildtype" options. And we use "--warning_level=1" and "--buildtype=debugoptimized" by default. But we need these flags for Makefile.global (for extensions) and pg_config, so we need to compute them manually based on the higher-level options. Without this change, extensions building using pgxs wouldn't get -Wall or optimization options. Author: Sutou Kouhei <kou@clear-code.com> Discussion: https://www.postgresql.org/message-id/flat/20240122.141139.931086145628347157.kou%40clear-code.com	2024-06-07 09:36:26 +02:00
Alexander Korotkov	505c008ca3	Restore preprocess_groupclause() `0452b461bc` made optimizer explore alternative orderings of group-by pathkeys. It eliminated preprocess_groupclause(), which was intended to match items between GROUP BY and ORDER BY. Instead, get_useful_group_keys_orderings() function generates orderings of GROUP BY elements at the time of grouping paths generation. The get_useful_group_keys_orderings() function takes into account 3 orderings of GROUP BY pathkeys and clauses: original order as written in GROUP BY, matching ORDER BY clauses as much as possible, and matching the input path as much as possible. Given that even before `0452b461b`, preprocess_groupclause() could change the original order of GROUP BY clauses we don't need to consider it apart from ordering matching ORDER BY clauses. This commit restores preprocess_groupclause() to provide an ordering of GROUP BY elements matching ORDER BY before generation of paths. The new version of preprocess_groupclause() takes into account an incremental sort. The get_useful_group_keys_orderings() function now takes into 2 orderings of GROUP BY elements: the order generated preprocess_groupclause() and the order matching the input path as much as possible. Discussion: https://postgr.es/m/CAPpHfdvyWLMGwvxaf%3D7KAp-z-4mxbSH8ti2f6mNOQv5metZFzg%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Andrei Lepikhov, Pavel Borisov	2024-06-06 13:44:34 +03:00
Alexander Korotkov	0c1af2c35c	Rename PathKeyInfo to GroupByOrdering `0452b461bc` made optimizer explore alternative orderings of group-by pathkeys. The PathKeyInfo data structure was used to store the particular ordering of group-by pathkeys and corresponding clauses. It turns out that PathKeyInfo is not the best name for that purpose. This commit renames this data structure to GroupByOrdering, and revises its comment. Discussion: https://postgr.es/m/db0fc3a4-966c-4cec-a136-94024d39212d%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov	2024-06-06 13:43:24 +03:00
Alexander Korotkov	199012a3d8	Fix asymmetry in setting EquivalenceClass.ec_sortref `0452b461bc` made get_eclass_for_sort_expr() always set EquivalenceClass.ec_sortref if it's not done yet. This leads to an asymmetric situation when whoever first looks for the EquivalenceClass sets the ec_sortref. It is also counterintuitive that get_eclass_for_sort_expr() performs modification of data structures. This commit makes make_pathkeys_for_sortclauses_extended() responsible for setting EquivalenceClass.ec_sortref. Now we set the EquivalenceClass.ec_sortref's needed to explore alternative GROUP BY ordering specifically during building pathkeys by the list of grouping clauses. Discussion: https://postgr.es/m/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov	2024-06-06 13:41:34 +03:00
Dean Rasheed	5c5bccef21	Fix another couple of outdated comments for MERGE RETURNING. Oversights in `c649fa24a4` which added RETURNING support to MERGE. Discussion: https://postgr.es/m/CAApHDvpqp6vtUzG-_josUEiBGyqnrnVxJ-VdF+hJLXjHdHzsyQ@mail.gmail.com	2024-06-04 09:29:42 +01:00
David Rowley	8559252095	Fix a couple of outdated comments now that we have MERGE RETURNING This has been supported since `c649fa24a`. Discussion: https://postgr.es/m/CAApHDvpqp6vtUzG-_josUEiBGyqnrnVxJ-VdF+hJLXjHdHzsyQ@mail.gmail.com	2024-05-23 15:24:54 +12:00
Robert Haas	12933dc604	Re-allow planner to use Merge Append to efficiently implement UNION. This reverts commit `7204f35919`, thus restoring `66c0185a3` (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits `d5d2205c8`, `3b1a7eb28`, `7487044d6`. Per further discussion on pgsql-release, we wish to ship beta1 with this feature, and patch the bug that was found just before wrap, rather than shipping beta1 with the feature reverted.	2024-05-21 12:44:51 -04:00
Tom Lane	7204f35919	Revert commit `66c0185a3` and follow-on patches. This reverts `66c0185a3` (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits `d5d2205c8`, `3b1a7eb28`, `7487044d6`. In addition to those, `07746a8ef` had to be removed then re-applied in a different place, because `66c0185a3` moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)	2024-05-20 15:08:30 -04:00
Peter Eisentraut	be5942aee7	Remove unused typedefs There were a few typedefs that were never used to define a variable or field. This had the effect that the buildfarm's typedefs.list would not include them, and so they would need to be re-added manually to keep the overall pgindent result perfectly clean. We can easily get rid of these typedefs to avoid the issue. In a few cases, we just let the enum or struct stand on its own without a typedef around it. In one case, an enum was abused to define flag bits; that's better done with macro definitions. This fixes all the remaining issues with missing entries in the buildfarm's typedefs.list. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/1919000.1715815925@sss.pgh.pa.us	2024-05-17 07:36:12 +02:00
Michael Paquier	110eb4aefb	Remove enum WaitEventExtension This enum was used to determine the first ID to use when assigning a custom wait event for extensions, which is always 1. It was kept so as it would be possible to add new in-core wait events in the category "Extension". There is no such thing currently, so let's remove this enum until a case justifying it pops up. This makes the code simpler and easier to understand. This has as effect to switch back autoprewarm.c to use PG_WAIT_EXTENSION rather than WAIT_EVENT_EXTENSION, on par with v16 and older stable branches. Thinko in `c9af054653`. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/195c6c45-abce-4331-be6a-e87724e1d060@eisentraut.org	2024-05-17 12:29:57 +09:00
Peter Eisentraut	8aee330af5	Revert temporal primary keys and foreign keys This feature set did not handle empty ranges correctly, and it's now too late for PostgreSQL 17 to fix it. The following commits are reverted: `6db4598fcb` Add stratnum GiST support function `46a0cd4cef` Add temporal PRIMARY KEY and UNIQUE constraints `86232a49a4` Fix comment on gist_stratnum_btree `030e10ff1a` Rename pg_constraint.conwithoutoverlaps to conperiod `a88c800deb` Use daterange and YMD in without_overlaps tests instead of tsrange. `5577a71fb0` Use half-open interval notation in without_overlaps tests `34768ee361` Add temporal FOREIGN KEY contraints `482e108cd3` Add test for REPLICA IDENTITY with a temporal key `c3db1f30cb` doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE `144c2ce0cc` Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org	2024-05-16 08:17:46 +02:00
David Rowley	8cb653b245	Fix incorrect year in some copyright notices added this year A few patches that were written <= 2023 and committed in 2024 still contained 2023 copyright year. Fix that. Discussion: https://postgr.es/m/CAApHDvr5egyW3FmHbAg-Uq2p_Aizwco1Zjs6Vbq18KqN64-hRA@mail.gmail.com	2024-05-15 15:01:21 +12:00
Tom Lane	245f1cec59	Do pre-release housekeeping on catalog data. Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta tasks specified by RELEASE_CHANGES. For reference, the command was ./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6300	2024-05-14 16:46:38 -04:00
Tom Lane	da256a4a7f	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. The pgindent part of this is pretty small, consisting mainly of fixing up self-inflicted formatting damage from patches that hadn't bothered to add their new typedefs to typedefs.list. In order to keep it from making anything worse, I manually added a dozen or so typedefs that appeared in the existing typedefs.list but not in the buildfarm's list. Perhaps we should formalize that, or better find a way to get those typedefs into the automatic list. pgperltidy is as opinionated as always, and reformat-dat-files too.	2024-05-14 16:34:50 -04:00
Tom Lane	1f7452fa59	Remove COMMAND_TAG_NEXTTAG from enum CommandTag. COMMAND_TAG_NEXTTAG isn't really a valid command tag. Declaring it as if it were one prompts complaints from Coverity and perhaps other static analyzers. Our only use of it is in an entirely-unnecessary array sizing declaration, so let's just drop it. Ranier Vilela Discussion: https://postgr.es/m/CAEudQAoY0xrKuTAX7W10zsjjUpKBPFRtdCyScb3Z0FB2v6HNmQ@mail.gmail.com	2024-05-13 13:52:17 -04:00
Alvaro Herrera	6f8bb7c1e9	Revert structural changes to not-null constraints There are some problems with the new way to handle these constraints that were detected at the last minute, and require fixes that appear too invasive to be doing this late in the cycle. Revert this (again) for now, we'll try again with these problems fixed. The following commits are reverted: `b0e96f3119` Catalog not-null constraints `9b581c5341` Disallow changing NO INHERIT status of a not-null constraint `d0ec2ddbe0` Fix not-null constraint test `ac22a9545c` Move privilege check to the right place `b0f7dd915b` Check stack depth in new recursive functions `3af7217942` Update information_schema definition for not-null constraints `c3709100be` Fix propagating attnotnull in multiple inheritance `d9f686a72e` Fix restore of not-null constraints with inheritance `d72d32f52d` Don't try to assign smart names to constraints `0cd711271d` Better handle indirect constraint drops `13daa33fa5` Disallow NO INHERIT not-null constraints on partitioned tables `d45597f72f` Disallow direct change of NO INHERIT of not-null constraints `21ac38f498` Fix inconsistencies in error messages Discussion: https://postgr.es/m/202405110940.joxlqcx4dogd@alvherre.pgsql	2024-05-13 11:31:09 +02:00
Michael Paquier	33181b48fd	Introduce private data area for injection points This commit extends the backend-side infrastructure of injection points so as it becomes possible to register some input data when attaching a point. This private data can be registered with the function name and the library name of the callback when attaching a point, then it is given as input argument to the callback. This gives the possibility for modules to pass down custom data at runtime when attaching a point without managing that internally, in a manner consistent with the callback entry retrieved from the hash shmem table storing the injection point data. InjectionPointAttach() gains two arguments, to be able to define the private data contents and its size. A follow-up commit will rely on this infrastructure to close a race condition with the injection point detach in the module injection_points. While on it, this changes InjectionPointDetach() to return a boolean, returning false if a point cannot be detached. This has been mentioned by Noah as useful when it comes to implement more complex tests with concurrent point detach, solid with the automatic detach done for local points in the test module. Documentation is adjusted in consequence. Per discussion with Noah Misch. Reviewed-by: Noah Misch Discussion: https://postgr.es/m/20240509031553.47@rfd.leadboat.com	2024-05-12 18:53:06 +09:00
Heikki Linnakangas	407e0b023c	Change ALPN protocol ID to IANA-approved "postgresql" "TBD-pgsql" was a placeholder until the IANA registration was approved. Discussion: https://www.postgresql.org/message-id/87jzk2hj2n.fsf%40wibble.ilmari.org Discussion: https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/	2024-05-11 18:48:19 +03:00
Tom Lane	9effc4608e	Repair ALTER EXTENSION ... SET SCHEMA. It turns out that we broke this in commit `e5bc9454e`, because the code was assuming that no dependent types would appear among the extension's direct dependencies, and now they do. This isn't terribly hard to fix: just skip dependent types, expecting that we will recurse to them when we process the parent object (which should also be among the direct dependencies). But a little bit of refactoring is needed so that we can avoid duplicating logic about what is a dependent type. Although there is some testing of ALTER EXTENSION SET SCHEMA, it failed to cover interesting cases, so add more tests. Discussion: https://postgr.es/m/930191.1715205151@sss.pgh.pa.us	2024-05-09 12:19:52 -04:00
Peter Eisentraut	509199587d	Fix assorted bugs related to identity column in partitioned tables When changing the data type of a column of a partitioned table, craft the ALTER SEQUENCE command only once. Partitions do not have identity sequences of their own and thus do not need a ALTER SEQUENCE command for each partition. Fix getIdentitySequence() to fetch the identity sequence associated with the top-level partitioned table when a Relation of a partition is passed to it. While doing so, translate the attribute number of the partition into the attribute number of the partitioned table. Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Discussion: https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7d59@gmail.com	2024-05-07 22:50:00 +02:00
Nathan Bossart	521a7156ab	Fix privilege checks in pg_stats_ext and pg_stats_ext_exprs. The catalog view pg_stats_ext fails to consider privileges for expression statistics. The catalog view pg_stats_ext_exprs fails to consider privileges and row-level security policies. To fix, restrict the data in these views to table owners or roles that inherit privileges of the table owner. It may be possible to apply less restrictive privilege checks in some cases, but that is left as a future exercise. Furthermore, for pg_stats_ext_exprs, do not return data for tables with row-level security enabled, as is already done for pg_stats_ext. On the back-branches, a fix-CVE-2024-4317.sql script is provided that will install into the "share" directory. This file can be used to apply the fix to existing clusters. Bumps catversion on 'master' branch only. Reported-by: Lukas Fittl Reviewed-by: Noah Misch, Tomas Vondra, Tom Lane Security: CVE-2024-4317 Backpatch-through: 14	2024-05-06 09:00:00 -05:00
Alexander Korotkov	d1d286d83c	Revert: Remove useless self-joins This commit reverts `d3d55ce571` and subsequent fixes `2b26a69455`, `93c85db3b5`, `b44a1708ab`, `b7f315c9d7`, `8a8ed916f7`, `b5fb6736ed`, `0a93f803f4`, `e0477837ce`, `a7928a57b9`, `5ef34a8fc3`, `30b4955a46`, `8c441c0827`, `028b15405b`, `fe093994db`, `489072ab7a`, and `466979ef03`. We are quite late in the release cycle and new bugs continue to appear. Even though we have fixes for all known bugs, there is a risk of throwing many bugs to end users. The plan for self-join elimination would be to do more review and testing, then re-commit in the early v18 cycle. Reported-by: Tom Lane Discussion: https://postgr.es/m/2422119.1714691974%40sss.pgh.pa.us	2024-05-06 14:36:36 +03:00
David Rowley	7d2c7f08d9	Fix query pullup issue with WindowClause runCondition `94985c210` added code to detect when WindowFuncs were monotonic and allowed additional quals to be "pushed down" into the subquery to be used as WindowClause runConditions in order to short-circuit execution in nodeWindowAgg.c. The Node representation of runConditions wasn't well selected and because we do qual pushdown before planning the subquery, the planning of the subquery could perform subquery pull-up of nested subqueries. For WindowFuncs with args, the arguments could be changed after pushing the qual down to the subquery. This was made more difficult by the fact that the code duplicated the WindowFunc inside an OpExpr to include in the WindowClauses runCondition field. This could result in duplication of subqueries and a pull-up of such a subquery could result in another initplan parameter being issued for the 2nd version of the subplan. This could result in errors such as: ERROR: WindowFunc not found in subplan target lists To fix this, we change the node representation of these run conditions and instead of storing an OpExpr containing the WindowFunc in a list inside WindowClause, we now store a new node type named WindowFuncRunCondition within a new field in the WindowFunc. These get transformed into OpExprs later in planning once subquery pull-up has been performed. This problem did exist in v15 and v16, but that was fixed by `9d36b883b` and e5d20bbd. Cat version bump due to new node type and modifying WindowFunc struct. Bug: #18305 Reported-by: Zuming Jiang Discussion: https://postgr.es/m/18305-33c49b4c830b37b3%40postgresql.org	2024-05-05 12:54:46 +12:00
David Rowley	a42fc1c903	Fix an assortment of typos Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com	2024-05-04 02:33:25 +12:00
Alvaro Herrera	d45597f72f	Disallow direct change of NO INHERIT of not-null constraints We support changing NO INHERIT constraint to INHERIT for constraints in child relations when adding a constraint to some ancestor relation, and also during pg_upgrade's schema restore; but other than those special cases, command ALTER TABLE ADD CONSTRAINT should not be allowed to change an existing constraint from NO INHERIT to INHERIT, as that would require to process child relations so that they also acquire an appropriate constraint, which we may not be in a position to do. (It'd also be surprising behavior.) It is conceivable that we want to allow ALTER TABLE SET NOT NULL to make such a change; but in that case some more code is needed to implement it correctly, so for now I've made that throw the same error message. Also, during the prep phase of ALTER TABLE ADD CONSTRAINT, acquire locks on all descendant tables; otherwise we might operate on child tables on which no locks are held, particularly in the mode where a primary key causes not-null constraints to be created on children. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/7d923a66-55f0-3395-cd40-81c142b5448b@gmail.com	2024-05-02 17:26:30 +02:00
David Rowley	a63224be49	Ensure we allocate NAMEDATALEN bytes for names in Index Only Scans As an optimization, we store "name" columns as cstrings in btree indexes. Here we modify it so that Index Only Scans convert these cstrings back to names with NAMEDATALEN bytes rather than storing the cstring in the tuple slot, as was happening previously. Bug: #17855 Reported-by: Alexander Lakhin Reviewed-by: Alexander Lakhin, Tom Lane Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org Backpatch-through: 12, all supported versions	2024-05-01 13:21:21 +12:00
Tom Lane	5342874039	Fix failure to track role dependencies of pg_init_privs entries. If an ACL recorded in pg_init_privs mentions a non-pinned role, that reference must also be noted in pg_shdepend so that we know that the role can't go away without removing the ACL reference. Otherwise, DROP ROLE could succeed and leave dangling entries behind, which is what's causing the recent upgrade-check failures on buildfarm member copperhead. This has been wrong since pg_init_privs was introduced, but it's escaped notice because typical pg_init_privs entries would only mention the bootstrap superuser (pinned) or at worst the owner of the extension (who can't go away before the extension does). We lack even a representation of such a role reference for pg_shdepend. My first thought for a solution was entries listing pg_init_privs in classid, but that doesn't work because then there's noplace to put the granted-on object's classid. Rather than adding a new column to pg_shdepend, let's add a new deptype code SHARED_DEPENDENCY_INITACL. Much of the associated boilerplate code can be cribbed from code for SHARED_DEPENDENCY_ACL. A lot of the bulk of this patch just stems from the new need to pass the object's owner ID to recordExtensionInitPriv, so that we can consult it while updating pg_shdepend. While many callers have that at hand already, a few places now need to fetch the owner ID of an arbitrary privilege-bearing object. For that, we assume that there is a catcache on the relevant catalog's OID column, which is an assumption already made in ExecGrant_common so it seems okay here. We do need an entirely new routine RemoveRoleFromInitPriv to perform cleanup of pg_init_privs ACLs during DROP OWNED BY. It's analogous to RemoveRoleFromObjectACL, but we can't share logic because that function operates by building a command parsetree and invoking existing GRANT/REVOKE infrastructure. There is of course no SQL command that would update pg_init_privs entries when we're not in process of creating their extension, so we need a routine that can do the updates directly. catversion bump because this changes the expected contents of pg_shdepend. For the same reason, there's no hope of back-patching this, even though it fixes a longstanding bug. Fortunately, the case where it's a problem seems to be near nonexistent in the field. If it weren't for the buildfarm breakage, I'd have been content to leave this for v18. Patch by me; thanks to Daniel Gustafsson for review and discussion. Discussion: https://postgr.es/m/1745535.1712358659@sss.pgh.pa.us	2024-04-29 19:26:19 -04:00
Peter Eisentraut	592a228372	Revert "Add GUC backtrace_on_internal_error" This reverts commit `a740b213d4`. Subsequent discussion showed that there was interest in a more general facility to configure when server log events would produce backtraces, and this existing limited way couldn't be extended in a compatible way. So the consensus was to revert this for PostgreSQL 17 and reconsider this topic for PostgreSQL 18. Discussion: https://www.postgresql.org/message-id/flat/CAGECzQTChkvn5Xj772LB3%3Dxo2x_LcaO5O0HQvXqobm1xVp6%2B4w%40mail.gmail.com#764bcdbb73e162787e1ad984935e51e3	2024-04-29 10:49:42 +02:00
John Naylor	ed52df3b19	Small cosmetic fixes in radix tree template - Bring memory context names in line with other naming - Fix typos, reported off-list by Alexander Lakhin - Remove copy-paste errors from comments - Remove duplicate #undef	2024-04-27 14:42:01 +07:00
Masahiko Sawada	bb7f195ff7	radixtree: Fix SIGSEGV at update of embeddable value to non-embeddable. Also, fix a memory leak when updating from non-embeddable to embeddable. Both were unreachable without adding C code. Reported-by: Noah Misch Author: Noah Misch Reviewed-by: Masahiko Sawada, John Naylor Discussion: https://postgr.es/m/20240424210319.4c.nmisch%40google.com	2024-04-25 21:48:52 +09:00
Amit Kapila	db08e8c6fa	Post-commit review fixes for slot synchronization. Allow pg_sync_replication_slots() to error out during promotion of standby. This makes the behavior of the SQL function consistent with the slot sync worker. We also ensured that pg_sync_replication_slots() cannot be executed if sync_replication_slots is enabled and the slotsync worker is already running to perform the synchronization of slots. Previously, it would have succeeded in cases when the worker is idle and failed when it is performing sync which could confuse users. This patch fixes another issue in the slot sync worker where SignalHandlerForShutdownRequest() needs to be registered before setting SlotSyncCtx->pid, otherwise, the slotsync worker could miss handling SIGINT sent by the startup process(ShutDownSlotSync) if it is sent before worker could register SignalHandlerForShutdownRequest(). To be consistent, all signal handlers' registration is moved to a prior location before we set the worker's pid. Ensure that we clean up synced temp slots at the end of pg_sync_replication_slots() to avoid such slots being left over after promotion. Ensure that ShutDownSlotSync() captures SlotSyncCtx->pid under spinlock to avoid accessing invalid value as it can be reset by concurrent slot sync exit due to an error. Author: Shveta Malik Reviewed-by: Hou Zhijie, Bertrand Drouvot, Amit Kapila, Masahiko Sawada Discussion: https://postgr.es/m/CAJpy0uBefXUS_TSz=oxmYKHdg-fhxUT0qfjASW3nmqnzVC3p6A@mail.gmail.com	2024-04-25 14:01:44 +05:30
Michael Paquier	ee3ef4af19	Improve comment of DeallocateStmt->isall This field is not used directly in the code, but it is important for query jumbling to be able to make a difference between a named DEALLOCATE and DEALLOCATE ALL (see `bb45156f34`). This behavior is tracked in the regression tests of pg_stat_statements, but the reason why this field is important can be easily missed, as a recent discussion has proved, so let's improve its comment to document the reason why it needs to be around. Wording has been suggested by Tom Lane Discussion: https://postgr.es/m/Zih1ATt37YFda8_p@paquier.xyz	2024-04-25 10:20:49 +09:00
Tom Lane	b7d35d393e	Remove some unnecessary fields from executor nodes. JsonExprState.input_finfo is only assigned to, never read, and it's really fairly useless since the value can be gotten out of the adjacent input_fcinfo field. Let's remove it before someone starts to depend on it. While here, also remove TidScanState.tss_htup and AggState.combinedproj, which are referenced nowhere. Those should have been removed by the commits that caused them to become disused, but were not. I don't think a catversion bump is necessary here, since plan trees are never stored on disk. Matthias van de Meent Discussion: https://postgr.es/m/CAEze2WjsY4d0TBymLNGK4zpttUcg_YZaTjyWz2VfDUV6YH8wXQ@mail.gmail.com	2024-04-23 12:55:26 -04:00
Michael Paquier	06a0f4d52b	Remove resowner_private.h This header is not used since the refactoring of resource owners done in `b8bff07daa`, and all the functions declared in it became (well, mostly) static inline local to each resowner kind's code path. Author: Xing Guo Discussion: https://postgr.es/m/CACpMh+BFmtK5Z=b6PvH4HLKhUpWa_VtRTZSrB4-yK-tQejpWGw@mail.gmail.com	2024-04-20 18:01:03 +09:00
Tomas Vondra	41d2c6f952	Add missing index_insert_cleanup calls The optimization for inserts into BRIN indexes added by `c1ec02be1d` relies on a cache that needs to be explicitly released after calling index_insert(). The commit however failed to invoke the cleanup in validate_index(), which calls index_insert() indirectly through table_index_validate_scan(). After inspecting index_insert() callers, it seems unique_key_recheck() is missing the call too. Fixed by adding the two missing index_insert_cleanup() calls. The commit does two additional improvements. The aminsertcleanup() signature is modified to have the index as the first argument, to make it more like the other AM callbacks. And the aminsertcleanup() callback is invoked even if the ii_AmCache is NULL, so that it can decide if the cleanup is necessary. Author: Alvaro Herrera, Tomas Vondra Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql	2024-04-19 16:08:34 +02:00
Dean Rasheed	2e068db56e	Use macro NUM_MERGE_MATCH_KINDS instead of '3' in MERGE code. Code quality improvement for `0294df2f1f`. Aleksander Alekseev, reviewed by Richard Guo. Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com	2024-04-19 09:40:20 +01:00
Daniel Gustafsson	f6e8451336	Remove unused function prototype Commit `aafc05de1b` removed StartSlotSyncWorker() but mistakenly left the prototype in slotsync.h. Fix by removing. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-19 09:58:04 +02:00
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Alvaro Herrera	d9f686a72e	Fix restore of not-null constraints with inheritance In tables with primary keys, pg_dump creates tables with primary keys by initially dumping them with throw-away not-null constraints (marked "no inherit" so that they don't create problems elsewhere), to later drop them once the primary key is restored. Because of a unrelated consideration, on tables with children we add not-null constraints to all columns of the primary key when it is created. If both a table and its child have primary keys, and pg_dump happens to emit the child table first (and its throw-away not-null) and later its parent table, the creation of the parent's PK will fail because the throw-away not-null constraint collides with the permanent not-null constraint that the PK wants to add, so the dump fails to restore. We can work around this problem by letting the primary key "take over" the child's not-null. This requires no changes to pg_dump, just two changes to ALTER TABLE: first, the ability to convert a no-inherit not-null constraint into a regular inheritable one (including recursing down to children, if there are any); second, the ability to "drop" a constraint that is defined both directly in the table and inherited from a parent (which simply means to mark it as no longer having a local definition). Secondarily, change ATPrepAddPrimaryKey() to acquire locks all the way down the inheritance hierarchy, in case we need to recurse when propagating constraints. These two changes allow pg_dump to reproduce more cases involving inheritance from versions 16 and older. Lastly, make two changes to pg_dump: 1) do not try to drop a not-null constraint that's marked as inherited; this allows a dump to restore with no errors if a table with a PK inherits from another which also has a PK; 2) avoid giving inherited constraints throwaway names, for the rare cases where such a constraint survives after the restore. Reported-by: Andrew Bille <andrewbille@gmail.com> Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAJnzarwkfRu76_yi3dqVF_WL-MpvT54zMwAxFwJceXdHB76bOA@mail.gmail.com Discussion: https://postgr.es/m/Zh0aAH7tbZb-9HbC@pryzbyj2023	2024-04-18 15:35:15 +02:00
Amit Langote	ef744ebb73	SQL/JSON: Miscellaneous fixes and improvements This addresses some post-commit review comments for commits `6185c973`, `de3600452`, and 9425c596a0, with the following changes: * Fix JSON_TABLE() syntax documentation to use the term "path_expression" for JSON path expressions instead of "json_path_specification" to be consistent with the other SQL/JSON functions. * Fix a typo in the example code in JSON_TABLE() documentation. * Rewrite some newly added comments in jsonpath.h. * In JsonPathQuery(), add missing cast to int before printing an enum value. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com	2024-04-18 14:46:43 +09:00
Amit Langote	b4fad46b6b	SQL/JSON: Improve some error messages This improves some error messages emitted by SQL/JSON query functions by mentioning column name when available, such as when they are invoked as part of evaluating JSON_TABLE() columns. To do so, a new field column_name is added to both JsonFuncExpr and JsonExpr that is only populated when creating those nodes for transformed JSON_TABLE() columns. While at it, relevant error messages are reworded for clarity. Reported-by: Jian He <jian.universality@gmail.com> Suggested-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com	2024-04-18 14:45:48 +09:00
Andres Freund	3ab8cf9275	Remove GlobalVisTestNonRemovable[Full]Horizon, not used anymore GlobalVisTestNonRemovableHorizon() was only used for the implementation of snapshot_too_old, which was removed in `f691f5b80a`. As using GlobalVisTestNonRemovableHorizon() is not particularly efficient, no new uses for it should be added. Therefore remove. Discussion: https://postgr.es/m/20240415185720.q4dg4dlcyvvrabz4@awork3.anarazel.de	2024-04-17 11:21:17 -07:00
Peter Eisentraut	2ea5d8bece	Mark some new location fields as ParseLoc Some new code probably didn't see `605721f819` and continued to use type int for parse location fields. Fix those.	2024-04-16 20:22:41 +02:00
Tom Lane	ec07d0d7fa	Clean up more indent breakage from `6377e12a5`. Per buildfarm member koel.	2024-04-16 13:00:40 -04:00
Peter Geoghegan	f698704155	Fix nbtree posting list comment. Oversight in commit `0d861bbb70`.	2024-04-16 11:20:41 -04:00
Peter Geoghegan	aa1def44c3	Fix nbtree page recycling comment. Oversight in commit `e5d8a99903`.	2024-04-16 11:14:36 -04:00
Alexander Korotkov	6377e12a5a	revert: Generalize relation analyze in table AM interface This commit reverts `27bc1772fc` and `dd1f6b0c17`. Per review by Andres Freund. Discussion: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de	2024-04-16 13:14:20 +03:00
Michael Paquier	768ceeeaa1	Add missing PGDLLIMPORT markings All backend-side variables should be marked with PGDLLIMPORT, as per policy introduced in `8ec569479f`. `aafc05de1b` has forgotten MyClientSocket, and `05c3980e7f` LoadedSSL. These can be spotted with a command like this one (be careful of not switching __pg_log_level): src/tools/mark_pgdllimport.pl $(git ls-files src/include/) Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/ZhzkRzrkKhbeQMRm@paquier.xyz	2024-04-16 09:38:46 +09:00
Alvaro Herrera	cee8db3f68	ATTACH PARTITION: Don't match a PK with a UNIQUE constraint When matching constraints in AttachPartitionEnsureIndexes() we weren't testing the constraint type, which could make a UNIQUE key lacking a not-null constraint incorrectly satisfy a primary key requirement. Fix this by testing that the constraint types match. (Other possible mismatches are verified by comparing index properties.) Discussion: https://postgr.es/m/202402051447.wimb4xmtiiyb@alvherre.pgsql	2024-04-15 15:07:47 +02:00
Andrew Dunstan	661ab4e185	Fix some memory leaks associated with parsing json and manifests Coverity complained about not freeing some memory associated with incrementally parsing backup manifests. To fix that, provide and use a new shutdown function for the JsonManifestParseIncrementalState object, in line with a suggestion from Tom Lane. While analysing the problem, I noticed a buglet in freeing memory for incremental json lexers. To fix that remove a bogus condition on freeing the memory allocated for them.	2024-04-12 10:32:30 -04:00
David Rowley	3af7040985	Fix IS [NOT] NULL qual optimization for inheritance tables `b262ad440` added code to have the planner remove redundant IS NOT NULL quals and eliminate needless scans for IS NULL quals on tables where the qual's column has a NOT NULL constraint. That commit failed to consider that an inheritance parent table could have differing NOT NULL constraints between the parent and the child. This caused issues as if we eliminated a qual on the parent, when applying the quals to child tables in apply_child_basequals(), the qual might not have been added to the parent's baserestrictinfo. Here we fix this by not applying the optimization to remove redundant quals to RelOptInfos belonging to inheritance parents and applying the optimization again in apply_child_basequals(). Effectively, this means that the parent and child are considered independently as the parent has both an inh=true and inh=false RTE and we still apply the optimization to the RelOptInfo corresponding to the inh=false RTE. We're able to still apply the optimization in add_base_clause_to_rel() for partitioned tables as the NULLability of partitions must match that of their parent. And, if we ever expand restriction_is_always_false() and restriction_is_always_true() to handle partition constraints then we can apply the same logic as, even in multi-level partitioned tables, there's no way to route values to a partition when the qual does not match the partition qual of the partitioned table's parent partition. The same is true for CHECK constraints as those must also match between arent partitioned tables and their partitions. Author: Richard Guo, David Rowley Discussion: https://postgr.es/m/CAMbWs4930gQSZmjR7aANzEapdy61gCg6z8dT-kAEYD0sYWKPdQ@mail.gmail.com	2024-04-12 20:07:53 +12:00
Alexander Korotkov	772faafca1	Revert: Implement pg_wal_replay_wait() stored procedure This commit reverts `06c418e163`, `e37662f221`, `bf1e650806`, `25f42429e2`, `ee79928441`, and `74eaf66f98` per review by Heikki Linnakangas. Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi	2024-04-11 17:28:15 +03:00
Alexander Korotkov	922c4c461d	Revert: Allow table AM to store complex data structures in rd_amcache This commit reverts `02eb07ea89` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:02:49 +03:00
Alexander Korotkov	8dd0bb84da	Revert: Allow table AM tuple_insert() method to return the different slot This commit reverts `c35a3fb5e0` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:02:45 +03:00
Alexander Korotkov	193e6d18e5	Revert: Allow locking updated tuples in tuple_update() and tuple_delete() This commit reverts `87985cc925` and `818861eb57` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:01:34 +03:00
Alexander Korotkov	da841aa4dc	Revert: Let table AM insertion methods control index insertion This commit reverts `b1484a3f19` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:01:30 +03:00
Alexander Korotkov	bc1e2092eb	Revert: Custom reloptions for table AM This commit reverts `9bd99f4c26` and `422041542f` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 15:46:35 +03:00
Masahiko Sawada	810f64a015	Revert indexed and enlargable binary heap implementation. This reverts commit `b840508644` and `bcb14f4abc`. These commits were made for commit `5bec1d6bc5` (Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions). However, per discussion, commit `efb8acc0d0` replaced binary heap + index with pairing heap, and made these commits unnecessary. Reported-by: Jeff Davis Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com	2024-04-11 17:18:05 +09:00
Masahiko Sawada	efb8acc0d0	Replace binaryheap + index with pairingheap in reorderbuffer.c A pairing heap can perform the same operations as the binary heap + index, with as good or better algorithmic complexity, and that's an existing data structure so that we don't need to invent anything new compared to v16. This commit makes the new binaryheap functionality that was added in commits `b840508644` and `bcb14f4abc` unnecessary, but they will be reverted separately. Remove the optimization to only build and maintain the heap when the amount of memory used is close to the limit, becuase the bookkeeping overhead with the pairing heap seems to be small enough that it doesn't matter in practice. Reported-by: Jeff Davis Author: Heikki Linnakangas Reviewed-by: Michael Paquier, Hayato Kuroda, Masahiko Sawada Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com	2024-04-11 17:04:38 +09:00
Alexander Korotkov	ff9f72c68f	revert: Transform OR clauses to ANY expression This commit reverts `72bd38cc99` due to implementation and design issues. Reported-by: Tom Lane Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us	2024-04-10 02:28:09 +03:00
John Naylor	0fe5f64367	Teach radix tree to embed values at runtime Previously, the decision to store values in leaves or within the child pointer was made at compile time, with variable length values using leaves by necessity. This commit allows introspecting the length of variable length values at runtime for that decision. This requires the ability to tell whether the last-level child pointer is actually a value, so we use a pointer tag in the lowest level bit. Use this in TID store. This entails adding a byte to the header to reserve space for the tag. Commit `f35bd9bf3` stores up to three offsets within the header with no bitmap, and now the header can be embedded as above. This reduces worst-case memory usage when TIDs are sparse. Reviewed (in an earlier version) by Masahiko Sawada Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com	2024-04-08 18:54:35 +07:00
Alexander Korotkov	dd1f6b0c17	Provide a way block-level table AMs could re-use acquire_sample_rows() While keeping API the same, this commit provides a way for block-level table AMs to re-use existing acquire_sample_rows() by providing custom callbacks for getting the next block and the next tuple. Reported-by: Andres Freund Discussion: https://postgr.es/m/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de Reviewed-by: Pavel Borisov	2024-04-08 14:39:48 +03:00
Alexander Korotkov	df64c81ca9	Fix some grammer errors from error messages and codes comments Discussion: https://postgr.es/m/CAHewXNkGMPU50QG7V6Q60JGFORfo8LfYO1_GCkCa0VWbmB-fEw%40mail.gmail.com Author: Tender Wang	2024-04-08 14:39:41 +03:00
Alexander Korotkov	422041542f	Fill CommonRdOptions with default values in extract_autovac_opts() Reported-by: Thomas Munro Reported-by: Pavel Borisov Discussion: https://postgr.es/m/CA%2BhUKGLZzLR50RBvuqOO3MZ%3DF54ETz-rTp1PDX9uDGP_GqyYqA%40mail.gmail.com	2024-04-08 12:18:23 +03:00
Alexander Korotkov	9bd99f4c26	Custom reloptions for table AM Let table AM define custom reloptions for its tables. This allows specifying AM-specific parameters by the WITH clause when creating a table. The reloptions, which could be used outside of table AM, are now extracted into the CommonRdOptions data structure. These options could be by decision of table AM directly specified by a user or calculated in some way. The new test module test_tam_options evaluates the ability to set up custom reloptions and calculate fields of CommonRdOptions on their base. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis	2024-04-08 11:23:28 +03:00
John Naylor	8a1b31e6e5	Use bump context for TID bitmaps stored by vacuum Vacuum does not pfree individual entries, and only frees the entire storage space when finished with it. This allows using a bump context, eliminating the chunk header in each leaf allocation. Most leaf allocations will be 16 to 32 bytes, so that's a significant savings. TidStoreCreateLocal gets a boolean parameter to indicate that the created store is insert-only. This requires a separate tree context for iteration, since we free the iteration state after iteration completes. Discussion: https://postgr.es/m/CANWCAZac%3DpBePg3rhX8nXkUuaLoiAJJLtmnCfZsPEAS4EtJ%3Dkg%40mail.gmail.com Discussion: https://postgr.es/m/CANWCAZZQFfxvzO8yZHFWtQV+Z2gAMv1ku16Vu7KWmb5kZQyd1w@mail.gmail.com	2024-04-08 14:39:49 +07:00
Amit Langote	bb766cde63	JSON_TABLE: Add support for NESTED paths and columns A NESTED path allows to extract data from nested levels of JSON objects given by the parent path expression, which are projected as columns specified using a nested COLUMNS clause, just like the parent COLUMNS clause. Rows comprised from a NESTED columns are "joined" to the row comprised from the parent columns. If a particular NESTED path evaluates to 0 rows, then the nested COLUMNS will emit NULLs, making it an OUTER join. NESTED columns themselves may include NESTED paths to allow extracting data from arbitrary nesting levels, which are likewise joined against the rows at the parent level. Multiple NESTED paths at a given level are called "sibling" paths and their rows are combined by UNIONing them, that is, after being joined against the parent row as described above. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-04-08 16:14:13 +09:00
Thomas Munro	13453eedd3	Add pg_buffercache_evict() function for testing. When testing buffer pool logic, it is useful to be able to evict arbitrary blocks. This function can be used in SQL queries over the pg_buffercache view to set up a wide range of buffer pool states. Of course, buffer mappings might change concurrently so you might evict a block other than the one you had in mind, and another session might bring it back in at any time. That's OK for the intended purpose of setting up developer testing scenarios, and more complicated interlocking schemes to give stronger guararantees about that would likely be less flexible for actual testing work anyway. Superuser-only. Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks) Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com> Reviewed-by: Jim Nasby <jim.nasby@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com	2024-04-08 16:23:40 +12:00
Andres Freund	af7e90a277	simplehash: Free collisions array in SH_STAT While SH_STAT() is only used for debugging, the allocated array can be large, and therefore should be freed. It's unclear why coverity started warning now. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Coverity Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us Backpatch: 12-	2024-04-07 19:08:41 -07:00
Heikki Linnakangas	91044ae4ba	Send ALPN in TLS handshake, require it in direct SSL connections libpq now always tries to send ALPN. With the traditional negotiated SSL connections, the server accepts the ALPN, and refuses the connection if it's not what we expect, but connecting without ALPN is still OK. With the new direct SSL connections, ALPN is mandatory. NOTE: This uses "TBD-pgsql" as the protocol ID. We must register a proper one with IANA before the release! Author: Greg Stark, Heikki Linnakangas Reviewed-by: Matthias van de Meent, Jacob Champion	2024-04-08 04:24:51 +03:00
Heikki Linnakangas	d39a49c1e4	Support TLS handshake directly without SSLRequest negotiation By skipping SSLRequest, you can eliminate one round-trip when establishing a TLS connection. It is also more friendly to generic TLS proxies that don't understand the PostgreSQL protocol. This is disabled by default in libpq, because the direct TLS handshake will fail with old server versions. It can be enabled with the sslnegotation=direct option. It will still fall back to the negotiated TLS handshake if the server rejects the direct attempt, either because it is an older version or the server doesn't support TLS at all, but the fallback can be disabled with the sslnegotiation=requiredirect option. Author: Greg Stark, Heikki Linnakangas Reviewed-by: Matthias van de Meent, Jacob Champion	2024-04-08 04:24:49 +03:00
Thomas Munro	041b96802e	Use streaming I/O in ANALYZE. The ANALYZE command prefetches and reads sample blocks chosen by a BlockSampler algorithm. Instead of calling [Prefetch\|Read]Buffer() for each block, ANALYZE now uses the streaming API introduced in `b5a9b18cd0`. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com	2024-04-08 13:16:28 +12:00
Alexander Korotkov	72bd38cc99	Transform OR clauses to ANY expression Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) on the preliminary stage of optimization when we are still working with the expression tree. Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op' is an operator which returns boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). Sometimes it can lead to not optimal plan. This is why there is a or_to_any_transform_limit GUC. It specifies a threshold value of length of arguments in an OR expression that triggers the OR-to-ANY transformation. Generally, more groupable OR arguments mean that transformation will be more likely to win than to lose. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>	2024-04-08 01:27:52 +03:00
Thomas Munro	b7b0f3f272	Use streaming I/O in sequential scans. Instead of calling ReadBuffer() for each block, heap sequential scans and TID range scans now use the streaming API introduced in `b5a9b18cd0`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com	2024-04-08 01:53:57 +12:00
David Rowley	6ed83d5fa5	Use bump memory context for tuplesorts `29f6a959c` added a bump allocator type for efficient compact allocations. Here we make use of this for non-bounded tuplesorts to store tuples. This is very space efficient when storing narrow tuples due to bump.c not having chunk headers. This means we can fit more tuples in work_mem before spilling to disk, or perform an in-memory sort touching fewer cacheline. Author: David Rowley Reviewed-by: Nathan Bossart Reviewed-by: Matthias van de Meent Reviewed-by: Tomas Vondra Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com	2024-04-08 00:32:26 +12:00
Alvaro Herrera	f3ff7bf83b	Add XLogCtl->logInsertResult This tracks the position of WAL that's been fully copied into WAL buffers by all processes emitting WAL. (For some reason we call that "WAL insertion"). This is updated using atomic monotonic advance during WaitXLogInsertionsToFinish, which is not when the insertions actually occur, but it's the only place where we know where have all the insertions have completed. This value is useful in WALReadFromBuffers, which can verify that callers don't try to read past what has been inserted. (However, more infrastructure is needed in order to actually use WAL after the flush point, since it could be lost.) The value is also useful in WaitXLogInsertionsToFinish() itself, since we can now exit quickly when all WAL has been already inserted, without even having to take any locks.	2024-04-07 14:06:30 +02:00
David Rowley	29f6a959cf	Introduce a bump memory allocator This introduces a bump MemoryContext type. The bump context is best suited for short-lived memory contexts which require only allocations of memory and never a pfree or repalloc, which are unsupported. Memory palloc'd into a bump context has no chunk header. This makes bump a useful context type when lots of small allocations need to be done without any need to pfree those allocations. Allocation sizes are rounded up to the next MAXALIGN boundary, so with this and no chunk header, allocations are very compact indeed. Allocations are also very fast as bump does not check any freelists to try and make use of previously free'd chunks. It just checks if there is enough room on the current block, and if so it bumps the freeptr beyond this chunk and returns the value that the freeptr was previously pointing to. Simple and fast. A new block is malloc'd when there's not enough space in the current block. Code using the bump allocator must take care never to call any functions which could try to call realloc() (or variants), pfree(), GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated chunk. Due to lack of chunk headers, these operations are unsupported. To increase the chances of catching such issues, when compiled with MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and any attempt to perform an unsupported operation will result in an ERROR. Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported operation could result in a segfault. A follow-on commit will implement the first user of bump. Author: David Rowley Reviewed-by: Nathan Bossart Reviewed-by: Matthias van de Meent Reviewed-by: Tomas Vondra Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com	2024-04-08 00:02:43 +12:00
David Rowley	0ba8b75e7e	Enlarge bit-space for MemoryContextMethodID Reserve 4 bits for MemoryContextMethodID rather than 3. 3 bits did technically allow a maximum of 8 memory context types, however, we've opted to reserve some bit patterns which left us with only 4 slots, all of which were used. Here we add another bit which frees up 8 slots for future memory context types. In passing, adjust the enum names in MemoryContextMethodID to make it more clear which ones can be used and which ones are reserved. Author: Matthias van de Meent, David Rowley Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com	2024-04-07 23:32:00 +12:00
Nathan Bossart	41c51f0c68	Optimize visibilitymap_count() with AVX-512 instructions. Commit `792752af4e` added infrastructure for using AVX-512 intrinsic functions, and this commit uses that infrastructure to optimize visibilitymap_count(). Specificially, a new pg_popcount_masked() function is introduced that applies a bitmask to every byte in the buffer prior to calculating the population count, which is used to filter out the all-visible or all-frozen bits as needed. Platforms without AVX-512 support should also see a nice speedup due to the reduced number of calls to a function pointer. Co-authored-by: Ants Aasma Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com	2024-04-06 22:58:23 -05:00
Nathan Bossart	792752af4e	Optimize pg_popcount() with AVX-512 instructions. Presently, pg_popcount() processes data in 32-bit or 64-bit chunks when possible. Newer hardware that supports AVX-512 instructions can use 512-bit chunks, which provides a nice speedup, especially for larger buffers. This commit introduces the infrastructure required to detect compiler and CPU support for the required AVX-512 intrinsic functions, and it adds a new pg_popcount() implementation that uses these functions. If CPU support for this optimized implementation is detected at runtime, a function pointer is updated so that it is used by subsequent calls to pg_popcount(). Most of the existing in-tree calls to pg_popcount() should benefit from these instructions, and calls with smaller buffers should at least not regress compared to v16. The new infrastructure introduced by this commit can also be used to optimize visibilitymap_count(), but that is left for a follow-up commit. Co-authored-by: Paul Amonson, Ants Aasma Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com	2024-04-06 21:56:23 -05:00
Tomas Vondra	04e72ed617	BitmapHeapScan: Push skip_fetch optimization into table AM Commit `7c70996ebf` introduced an optimization to allow bitmap scans to operate like index-only scans by not fetching a block from the heap if none of the underlying data is needed and the block is marked all visible in the visibility map. With the introduction of table AMs, a FIXME was added to this code indicating that the skip_fetch logic should be pushed into the table AM-specific code, as not all table AMs may use a visibility map in the same way. This commit resolves this FIXME for the current block. The layering violation is still present in BitmapHeapScans's prefetching code, which uses the visibility map to decide whether or not to prefetch a block. However, this can be addressed independently. Author: Melanie Plageman Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com	2024-04-07 00:24:14 +02:00
Alexander Korotkov	87c21bb941	Implement ALTER TABLE ... SPLIT PARTITION ... command This new DDL command splits a single partition into several parititions. Just like ALTER TABLE ... MERGE PARTITIONS ... command, new patitions are created using createPartitionTable() function with parent partition as the template. This commit comprises quite naive implementation which works in single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation come in handy in certain cases even as is. Also, it could be used as a foundation for future implementations with lesser locking and possibly parallel. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires	2024-04-07 01:18:44 +03:00
Alexander Korotkov	1adf16b8fb	Implement ALTER TABLE ... MERGE PARTITIONS ... command This new DDL command merges several partitions into the one partition of the target table. The target partition is created using new createPartitionTable() function with parent partition as the template. This commit comprises quite naive implementation which works in single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation come in handy in certain cases even as is. Also, it could be used as a foundation for future implementations with lesser locking and possibly parallel. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires	2024-04-07 01:18:43 +03:00
Alexander Korotkov	ee79928441	Clarify what is protected by WaitLSNLock Not just WaitLSNState.waitersHeap, but also WaitLSNState.procInfos and updating of WaitLSNState.minWaitedLSN is protected by WaitLSNLock. There is one now documented exclusion on fast-path checking of WaitLSNProcInfo.inHeap flag. Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql	2024-04-07 00:49:53 +03:00
Alexander Korotkov	25f42429e2	Use an LWLock instead of a spinlock in waitlsn.c This should prevent busy-waiting when number of waiting processes is high. Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql Author: Alvaro Herrera	2024-04-07 00:49:53 +03:00
Peter Geoghegan	5bf748b86b	Enhance nbtree ScalarArrayOp execution. Commit `9e8da0f7` taught nbtree to handle ScalarArrayOpExpr quals natively. This works by pushing down the full context (the array keys) to the nbtree index AM, enabling it to execute multiple primitive index scans that the planner treats as one continuous index scan/index path. This earlier enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed scans with ScalarArrayOp quals to return ordered results (with some notable restrictions, described further down). Take this general approach a lot further: teach nbtree SAOP index scans to decide how to execute ScalarArrayOp scans (when and where to start the next primitive index scan) based on physical index characteristics. This can be far more efficient. All SAOP scans will now reliably avoid duplicative leaf page accesses (just like any other nbtree index scan). SAOP scans whose array keys are naturally clustered together now require far fewer index descents, since we'll reliably avoid starting a new primitive scan just to get to a later offset from the same leaf page. The scan's arrays now advance using binary searches for the array element that best matches the next tuple's attribute value. Required scan key arrays (i.e. arrays from scan keys that can terminate the scan) ratchet forward in lockstep with the index scan. Non-required arrays (i.e. arrays from scan keys that can only exclude non-matching tuples) "advance" without the process ever rolling over to a higher-order array. Naturally, only required SAOP scan keys trigger skipping over leaf pages (non-required arrays cannot safely end or start primitive index scans). Consequently, even index scans of a composite index with a high-order inequality scan key (which we'll mark required) and a low-order SAOP scan key (which we won't mark required) now avoid repeating leaf page accesses -- that benefit isn't limited to simpler equality-only cases. In general, all nbtree index scans now output tuples as if they were one continuous index scan -- even scans that mix a high-order inequality with lower-order SAOP equalities reliably output tuples in index order. This allows us to remove a couple of special cases that were applied when building index paths with SAOP clauses during planning. Bugfix commit `807a40c5` taught the planner to avoid generating unsafe path keys: path keys on a multicolumn index path, with a SAOP clause on any attribute beyond the first/most significant attribute. These cases are now all safe, so we go back to generating path keys without regard for the presence of SAOP clauses (just like with any other clause type). Affected queries can now exploit scan output order in all the usual ways (e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early). Also undo changes from follow-up bugfix commit `a4523c5a`, which taught the planner to produce alternative index paths, with path keys, but without low-order SAOP index quals (filter quals were used instead). We'll no longer generate these alternative paths, since they can no longer offer any meaningful advantages over standard index qual paths. Affected queries thereby avoid all of the disadvantages that come from using filter quals within index scan nodes. They can avoid extra heap page accesses from using filter quals to exclude non-matching tuples (index quals will never have that problem). They can also skip over irrelevant sections of the index in more cases (though only when nbtree determines that starting another primitive scan actually makes sense). There is a theoretical risk that removing restrictions on SAOP index paths from the planner will break compatibility with amcanorder-based index AMs maintained as extensions. Such an index AM could have the same limitations around ordered SAOP scans as nbtree had up until now. Adding a pro forma incompatibility item about the issue to the Postgres 17 release notes seems like a good idea. Author: Peter Geoghegan <pg@bowt.ie> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com	2024-04-06 11:47:10 -04:00
John Naylor	a365d9e2e8	Speed up tail processing when hashing aligned C strings, take two After encountering the NUL terminator, the word-at-a-time loop exits and we must hash the remaining bytes. Previously we calculated the terminator's position and re-loaded the remaining bytes from the input string. This was slower than the unaligned case for very short strings. We already have all the data we need in a register, so let's just mask off the bytes we need and hash them immediately. In addition to endianness issues, the previous attempt upset valgrind in the way it computed the mask. Whether by accident or by wisdom, the author's proposed method passes locally with valgrind 3.22. Ants Aasma, with cosmetic adjustments by me Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com	2024-04-06 17:14:28 +07:00
John Naylor	0c25fee359	Teach fasthash_accum to use platform endianness for bytewise loads This function previously used a mix of word-wise loads and bytewise loads. The bytewise loads happened to be little-endian regardless of platform. This in itself is not a problem. However, a future commit will require the same result whether A) the input is loaded as a word with the relevent bytes masked-off, or B) the input is loaded one byte at a time. While at it, improve debuggability of the internal hash state. Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com	2024-04-06 17:14:28 +07:00
Thomas Munro	3bd8439ed6	Allow BufferAccessStrategy to limit pin count. While pinning extra buffers to look ahead, users of strategies are in danger of using too many buffers. For some strategies, that means "escaping" from the ring, and in others it means forcing dirty data to disk very frequently with associated WAL flushing. Since external code has no insight into any of that, allow individual strategy types to expose a clamp that should be applied when deciding how many buffers to pin at once. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com	2024-04-06 23:11:45 +13:00
John Naylor	f956ecd035	Convert uses of hash_string_pointer to fasthash equivalent Remove duplicate hash_string_pointer() function definitions by creating a new inline function hash_string() for this purpose. This has the added advantage of avoiding strlen() calls when doing hash lookup. It's not clear how many of these are perfomance-sensitive enough to benefit from that, but the simplification is worth it on its own. Reviewed by Jeff Davis Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com	2024-04-06 12:20:40 +07:00

... 6 7 8 9 10 ...

12197 commits