postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-23 10:55:21 -04:00

Author	SHA1	Message	Date
Richard Guo	24225ad9aa	Pathify RHS unique-ification for semijoin planning There are two implementation techniques for semijoins: one uses the JOIN_SEMI jointype, where the executor emits at most one matching row per left-hand side (LHS) row; the other unique-ifies the right-hand side (RHS) and then performs a plain inner join. The latter technique currently has some drawbacks related to the unique-ification step. * Only the cheapest-total path of the RHS is considered during unique-ification. This may cause us to miss some optimization opportunities; for example, a path with a better sort order might be overlooked simply because it is not the cheapest in total cost. Such a path could help avoid a sort at a higher level, potentially resulting in a cheaper overall plan. * We currently rely on heuristics to choose between hash-based and sort-based unique-ification. A better approach would be to generate paths for both methods and allow add_path() to decide which one is preferable, consistent with how path selection is handled elsewhere in the planner. * In the sort-based implementation, we currently pay no attention to the pathkeys of the input subpath or the resulting output. This can result in redundant sort nodes being added to the final plan. This patch improves semijoin planning by creating a new RelOptInfo for the RHS rel to represent its unique-ified version. It then generates multiple paths that represent elimination of distinct rows from the RHS, considering both a hash-based implementation using the cheapest total path of the original RHS rel, and sort-based implementations that either exploit presorted input paths or explicitly sort the cheapest total path. All resulting paths compete in add_path(), and those deemed worthy of consideration are added to the new RelOptInfo. Finally, the unique-ified rel is joined with the other side of the semijoin using a plain inner join. As a side effect, most of the code related to the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or RHS path should be made unique -- has been removed. Besides, the T_Unique path now has the same meaning for both semijoins and upper DISTINCT clauses: it represents adjacent-duplicate removal on presorted input. This patch unifies their handling by sharing the same data structures and functions. This patch also removes the UNIQUE_PATH_NOOP related code along the way, as it is dead code -- if the RHS rel is provably unique, the semijoin should have already been simplified to a plain inner join by analyzejoins.c. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com	2025-08-19 09:35:40 +09:00
Etsuro Fujita	5a8ab650a7	Update obsolete comments in ResultRelInfo struct. Commit `c5b7ba4e6` changed things so that the ri_RootResultRelInfo field of this struct is set for both partitions and inheritance children and used for tuple routing and transition capture (before that commit, it was only set for partitions to route tuples into), but failed to update these comments. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAPmGK14NF5CcdCmTZpxrvpvBiT0y4EqKikW1r_wAu1CEHeOmUA%40mail.gmail.com Backpatch-through: 14	2025-08-17 19:40:00 +09:00
Tom Lane	2a600a93c7	Make type Datum be 8 bytes wide everywhere. This patch makes sizeof(Datum) be 8 on all platforms including 32-bit ones. The objective is to allow USE_FLOAT8_BYVAL to be true everywhere, and in consequence to remove a lot of code that is specific to pass-by-reference handling of float8, int8, etc. The code for abbreviated sort keys can be simplified similarly. In this way we can reduce the maintenance effort involved in supporting 32-bit platforms, without going so far as to actually desupport them. Since Datum is strictly an in-memory concept, this has no impact on on-disk storage, though an initdb or pg_upgrade will be needed to fix affected catalog entries. We have required platforms to support [u]int64 for ages, so this breaks no supported platform. We can expect that this change will make 32-bit builds a bit slower and more memory-hungry, although being able to use pass-by-value handling of 8-byte types may buy back some of that. But we stopped optimizing for 32-bit cases a long time ago, and this seems like just another step on that path. This initial patch simply forces the correct type definition and USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler complaints that ensued. This is sufficient for testing purposes. In the wake of a bunch of Datum-conversion cleanups by Peter Eisentraut, this now compiles cleanly with gcc on a 32-bit platform. (I'd only tested the previous version with clang, which it turns out is less picky than gcc about width-changing coercions.) There is a good deal of now-dead code that I'll remove in separate follow-up patches. A catversion bump is required because this affects initial catalog contents (on 32-bit machines) in two ways: pg_type.typbyval changes for some built-in types, and Const nodes in stored views/rules will now have 8 bytes not 4 for pass-by-value types. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us	2025-08-13 17:18:22 -04:00
Michael Paquier	e125e36002	Rename CachedPlanType to PlannedStmtOrigin for PlannedStmt Commit `719dcf3c42` introduced a field called CachedPlanType in PlannedStmt to allow extensions to determine whether a cached plan is generic or custom. After discussion, the concepts that we want to track are a bit wider than initially anticipated, as it is closer to knowing from which "source" or "origin" a PlannedStmt has been generated or retrieved. Custom and generic cached plans are a subset of that. Based on the state of HEAD, we have been able to define two more origins: - "standard", for the case where PlannedStmt is generated in standard_planner(), the most common case. - "internal", for the fake PlannedStmt generated internally by some query patterns. This could be tuned in the future depending on what is needed. This looks like a good starting point, at least. The default value is called "UNKNOWN", provided as fallback value. This value is not used in the core code, the idea is to let extensions building their own PlannedStmts know about this new field. Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/aILaHupXbIGgF2wJ@paquier.xyz	2025-07-31 10:06:34 +09:00
David Rowley	4bc62b8684	Display Memoize planner estimates in EXPLAIN There've been a few complaints that it can be overly difficult to figure out why the planner picked a Memoize plan. To help address that, here we adjust the EXPLAIN output to display the following additional details: 1) The estimated number of cache entries that can be stored at once 2) The estimated number of unique lookup keys that we expect to see 3) The number of lookups we expect 4) The estimated hit ratio Technically #4 can be calculated using #1, #2 and #3, but it's not a particularly obvious calculation, so we opt to display it explicitly. The original patch by Lukas Fittl only displayed the hit ratio, but there was a fear that might lead to more questions about how that was calculated. The idea with displaying all 4 is to be transparent which may allow queries to be tuned more easily. For example, if #2 isn't correct then maybe extended statistics or a manual n_distinct estimate can be used to help fix poor plan choices. Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com	2025-07-29 15:18:01 +12:00
Michael Paquier	719dcf3c42	Introduce field tracking cached plan type in PlannedStmt PlannedStmt gains a new field, called CachedPlanType, able to track if a given plan tree originates from the cache and if we are dealing with a generic or custom cached plan. This field can be used for monitoring or statistical purposes, in the executor hooks, for example, based on the planned statement attached to a QueryDesc. A patch is under discussion for pg_stat_statements to provide an equivalent of the counters in pg_prepared_statements for custom and generic plans, to provide a more global view of such data, as this data is now restricted to the current session. The concept introduced in this commit is useful on its own, and has been extracted from a larger patch by the same author. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAA5RZ0uFw8Y9GCFvafhC=OA8NnMqVZyzXPfv_EePOt+iv1T-qQ@mail.gmail.com	2025-07-24 15:41:18 +09:00
Richard Guo	e2debb6438	Reduce "Var IS [NOT] NULL" quals during constant folding In commit `b262ad440`, we introduced an optimization that reduces an IS [NOT] NULL qual on a NOT NULL column to constant true or constant false, provided we can prove that the input expression of the NullTest is not nullable by any outer joins or grouping sets. This deduction happens quite late in the planner, during the distribution of quals to rels in query_planner. However, this approach has some drawbacks: we can't perform any further folding with the constant, and it turns out to be prone to bugs. Ideally, this deduction should happen during constant folding. However, the per-relation information about which columns are defined as NOT NULL is not available at that point. This information is currently collected from catalogs when building RelOptInfos for base or "other" relations. This patch moves the collection of NOT NULL attribute information for relations before pull_up_sublinks, storing it in a hash table keyed by relation OID. It then uses this information to perform the NullTest deduction for Vars during constant folding. This also makes it possible to leverage this information to pull up NOT IN subqueries. Note that this patch does not get rid of restriction_is_always_true and restriction_is_always_false. Removing them would prevent us from reducing some IS [NOT] NULL quals that we were previously able to reduce, because (a) the self-join elimination may introduce new IS NOT NULL quals after constant folding, and (b) if some outer joins are converted to inner joins, previously irreducible NullTest quals may become reducible. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com	2025-07-22 11:21:36 +09:00
Nathan Bossart	a4f126516e	Add option list to CHECKPOINT command. This commit adds the boilerplate code for supporting a list of options in CHECKPOINT commands. No actual options are supported yet, but follow-up commits will add support for MODE and FLUSH_UNLOGGED. While at it, this commit refactors the code for executing CHECKPOINT commands to its own function since it's about to become significantly larger. Author: Christoph Berg <myon@debian.org> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de	2025-07-11 11:51:25 -05:00
Michael Paquier	8ec04c8577	Refactor subtype field of AlterDomainStmt AlterDomainStmt.subtype used characters for its subtypes of commands, SET\|DROP DEFAULT\|NOT NULL and ADD\|DROP\|VALIDATE CONSTRAINT, which were hardcoded in a couple of places of the code. The code is improved by using an enum instead, with the same character values as the original code. Note that the field was documented in parsenodes.h and that it forgot to mention 'V' (VALIDATE CONSTRAINT). Author: Quan Zongliang <quanzongliang@yeah.net> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/41ff310b-16bd-44b9-a3ef-97e20f14b709@yeah.net	2025-07-03 16:34:28 +09:00
Peter Eisentraut	f039c22441	meson: Increase minimum version to 0.57.2 The previous minimum was to maintain support for Python 3.5, but we now require Python 3.6 anyway (commit `45363fca63`), so that reason is obsolete. A small raise to Meson 0.57 allows getting rid of a fair amount of version conditionals and silences some future-deprecated warnings. With the version bump, the following deprecation warnings appeared and are fixed: WARNING: Project targets '>=0.57' but uses feature deprecated since '0.55.0': ExternalProgram.path. use ExternalProgram.full_path() instead WARNING: Project targets '>=0.57' but uses feature deprecated since '0.56.0': meson.build_root. use meson.project_build_root() or meson.global_build_root() instead. It turns out that meson 0.57.0 and 0.57.1 are buggy for our use, so the minimum is actually set to 0.57.2. This is specific to this version series; in the future we won't necessarily need to be this precise. Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/42e13eb0-862a-441e-8d84-4f0fd5f6def0%40eisentraut.org	2025-07-02 11:14:53 +02:00
Peter Eisentraut	de5aa15209	Reformat some node comments Use per-field comments for IndexInfo, instead of one big header comment listing all the fields. This makes the relevant comments easier to find, and it will also make it less likely that comments are not updated when fields are added or removed, as has happened in the past. Author: Japin Li <japinli@hotmail.com> Discussion: https://www.postgresql.org/message-id/flat/ME0P300MB04453E6C7EA635F0ECF41BFCB6832%40ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM	2025-07-02 09:50:51 +02:00
Michael Paquier	bee23ea4dd	Show sizes of FETCH queries as constants in pg_stat_statements Prior to this patch, every FETCH call would generate a unique queryId with a different size specified. Depending on the workloads, this could lead to a significant bloat in pg_stat_statements, as repeatedly calling a specific cursor would result in a new queryId each time. For example, FETCH 1 c1; and FETCH 2 c1; would produce different queryIds. This patch improves the situation by normalizing the fetch size, so as semantically similar statements generate the same queryId. As a result, statements like the below, which differ syntactically but have the same effect, will now share a single queryId: FETCH FROM c1 FETCH NEXT c1 FETCH 1 c1 In order to do a normalization based on the keyword used in FETCH, FetchStmt is tweaked with a new FetchDirectionKeywords. This matters for "howMany", which could be set to a negative value depending on the direction, and we want to normalize the queries with enough information about the direction keywords provided, including RELATIVE, ABSOLUTE or all the ALL variants. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0tA6LbHCg2qSS+KuM850BZC_+ZgHV7Ug6BXw22TNyF+MA@mail.gmail.com	2025-07-02 08:39:25 +09:00
Peter Eisentraut	184595836b	Update comment for IndexInfo.ii_NullsNotDistinct Commit `7a7b3e11e6` added the ii_NullsNotDistinct field, but the comment was not updated. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ME0P300MB04453E6C7EA635F0ECF41BFCB6832%40ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM	2025-07-01 23:12:24 +02:00
Peter Eisentraut	7a7b3e11e6	Update comment for IndexInfo.ii_WithoutOverlaps Commit `fc0438b4e8` added the ii_WithoutOverlaps field, but the comment was not updated. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ME0P300MB04453E6C7EA635F0ECF41BFCB6832%40ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM	2025-07-01 20:37:24 +02:00
Peter Eisentraut	9e5fee8228	Fix outdated comment for IndexInfo Commit `7841623571` removed the ii_OpclassOptions field, but the comment was not updated. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ME0P300MB04453E6C7EA635F0ECF41BFCB6832%40ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM	2025-07-01 20:12:36 +02:00
Álvaro Herrera	c2da1a5d63	Make query jumbling also squash PARAM_EXTERN params Commit `62d712ecfd` made query jumbling squash lists of Consts as a single element, but there's no reason not to treat PARAM_EXTERN parameters the same. For these purposes, these values are indeed constants for any particular execution of a query. In particular, this should make list squashing more useful for applications using extended query protocol, which would use parameters extensively. A complication arises: if a query has both external parameters and squashable lists, then the parameter number used as placeholder for the squashed list might be inconsistent with regards to the parameter numbers used by the query literal. To reduce the surprise factor, all parameters are renumbered starting from 1 in that case. Author: Sami Imseih <samimseih@gmail.com> Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAA5RZ0tRXoPG2y6bMgBCWNDt0Tn=unRerbzYM=oW0syi1=C1OA@mail.gmail.com	2025-06-24 19:36:32 +02:00
Álvaro Herrera	0f65f3eec4	Fix squashing algorithm for query texts The algorithm to squash lists of constants added by commit `62d712ecfd` was a bit too simplistic; we wanted to avoid adding unnecessary complexity, but cases like direct function calls of typecasting functions (and others) were missed, and bogus SQL syntax was being shown in pg_stat_statements normalized query text field. To fix normalization for those cases, we need the parser to transmit information about were each list of constant values starts and ends, so add that to a couple of nodes. Also add a few more test cases to make sure we're doing the right thing. The patch initially submitted by Sami added a new private struct in gram.y to carry the start/end information for A_Expr, but I (Álvaro) decided that a better fix was to remove the parser indirection via the in_expr production, and instead create separate components in the a_expr rule. I'm surprised that this works and doesn't require more changes, but I assume (without checking) that the grammar used to be more complex and got simplified at some point. Bump catversion. Author: Sami Imseih <samimseih@gmail.com> Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAA5RZ0tRXoPG2y6bMgBCWNDt0Tn=unRerbzYM=oW0syi1=C1OA@mail.gmail.com	2025-06-12 14:21:21 +02:00
Michael Paquier	f85f6ab051	Revert support for improved tracking of nested queries This commit reverts the two following commits: - `499edb0974`, track more precisely query locations for nested statements. - `06450c7b8c`, a follow-up fix of `499edb0974` with query locations. The test introduced in this commit is not reverted. This is proving useful to track a problem that only pgaudit was able to detect. These prove to have issues with the tracking of SELECT statements, when these use multiple parenthesis which is something supported by the grammar. Incorrect location and lengths are causing pg_stat_statements to become confused, failing its job in query normalization with potential out-of-bound writes because the location and the length may not match with what can be handled. A lot of the query patterns discussed when this issue was reported have no test coverage in the main regression test suite, or the recovery test 027_stream_regress.pl would have caught the problems as pg_stat_statements is loaded by the node running the regression tests. A first step would be to improve the test coverage to stress more the query normalization logic. A different portion of this work was done in `45e0ba30fc`, with the addition of tests for nested queries. These can be left in the tree. They are useful to track the way inner queries are currently tracked by PGSS with non-top-level entries, and will be useful when reconsidering in the future the work reverted here. Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru> Discussion: https://postgr.es/m/18947-cdd2668beffe02bf@postgresql.org	2025-06-12 10:08:55 +09:00
Michael Paquier	e050af2868	Change internal plan ID type from uint64 to int64 uint64 was chosen to be consistent with the type used by the query ID, but the conclusion of a recent discussion for the query ID is that int64 is a better fit as the signed form is shown to the user, for PGSS or EXPLAIN outputs. This commit changes the plan ID to use int64, following `c3eda50b06` that has done the same for the query ID. The plan ID is new to v18, introduced in `2a0cd38da5`. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/aCvzJNwetyEI3Sgo@paquier.xyz	2025-05-31 09:40:45 +09:00
David Rowley	c3eda50b06	Change internal queryid type from uint64 to int64 uint64 was perhaps chosen in `cff440d36` as the type was uint32 prior to that widening work. Having this as uint64 doesn't make much sense and just adds the overhead of having to remember that we always output this in its signed form. Let's remove that overhead. The signed form output is seemingly required since we have no way to represent the full range of uint64 in an SQL type. We use BIGINT in places like pg_stat_statements, which maps directly to int64. The release notes "Source Code" section may want to mention this adjustment as some extensions may wish to adjust their code. Author: David Rowley <dgrowleyml@gmail.com> Suggested-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/50cb0c8b-994b-48f9-a1c4-13039eb3536b@eisentraut.org	2025-05-30 22:59:39 +12:00
Amit Langote	1722d5eb05	Revert "Don't lock partitions pruned by initial pruning" As pointed out by Tom Lane, the patch introduced fragile and invasive design around plan invalidation handling when locking of prunable partitions was deferred from plancache.c to the executor. In particular, it violated assumptions about CachedPlan immutability and altered executor APIs in ways that are difficult to justify given the added complexity and overhead. This also removes the firstResultRels field added to PlannedStmt in commit `28317de72`, which was intended to support deferred locking of certain ModifyTable result relations. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us	2025-05-22 17:02:35 +09:00
Richard Guo	c06e909c26	Track the number of presorted outer pathkeys in MergePath When creating an explicit Sort node for the outer path of a mergejoin, we need to determine the number of presorted keys of the outer path to decide whether explicit incremental sort can be applied. Currently, this is done by repeatedly calling pathkeys_count_contained_in. This patch caches the number of presorted outer pathkeys in MergePath, allowing us to save several calls to pathkeys_count_contained_in. It can be considered a complement to the changes in commit `828e94c9d`. Reported-by: David Rowley <dgrowleyml@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAApHDvqvBireB_w6x8BN5txdvBEHxVgZBt=rUnpf5ww5P_E_ww@mail.gmail.com	2025-05-08 18:21:32 +09:00
Michael Paquier	88e947136b	Fix typos and grammar in the code The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com	2025-04-19 19:17:42 +09:00
David Rowley	d69d45a5a9	Speedup child EquivalenceMember lookup in planner When planning queries to partitioned tables, we clone all EquivalenceMembers belonging to the partitioned table into em_is_child EquivalenceMembers for each non-pruned partition. For partitioned tables with large numbers of partitions, this meant the ec_members list could become large and code searching that list would become slow. Effectively, the more partitions which were present, the more searches needed to be performed for operations such as find_ec_member_matching_expr() during create_plan() and the more partitions present, the longer these searches would take, i.e., a quadratic slowdown. To fix this, here we adjust how we store EquivalenceMembers for em_is_child members. Instead of storing these directly in ec_members, these are now stored in a new array of Lists in the EquivalenceClass, which is indexed by the relid. When we want to find EquivalenceMembers belonging to a certain child relation, we can narrow the search to the array element for that relation. To make EquivalenceMember lookup easier and to reduce the amount of code change, this commit provides a pair of functions to allow iteration over the EquivalenceMembers of an EC which also handles finding the child members, if required. Callers that never need to look at child members can remain using the foreach loop over ec_members, which will now often be faster due to only parent-level members being stored there. The actual performance increases here are highly dependent on the number of partitions and the query being planned. Performance increases can be visible with as few as 8 partitions, but the speedup is marginal for such low numbers of partitions. The speedups become much more visible with a few dozen to hundreds of partitions. With some tested queries using 56 partitions, the planner was around 3x faster than before. For use cases with thousands of partitions, these are likely to become significantly faster. Some testing has shown planner speedups of 60x or more with 8192 partitions. Author: Yuya Watari <watari.yuya@gmail.com> Co-authored-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Tested-by: Thom Brown <thom@linux.com> Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com> Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com	2025-04-08 18:09:57 +12:00
Peter Eisentraut	8123e91f5a	Convert PathKey to use CompareType Change the PathKey struct to use CompareType to record the sort direction instead of hardcoding btree strategy numbers. The CompareType is then converted to the index-type-specific strategy when the plan is created. This reduces the number of places btree strategy numbers are hardcoded, and it's a self-contained subset of a larger effort to allow non-btree indexes to behave like btrees. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-04-04 11:22:20 +02:00
Amit Langote	88f55bc976	Make derived clause lookup in EquivalenceClass more efficient Derived clauses are stored in ec_derives, a List of RestrictInfos. These clauses are later looked up by matching the left and right EquivalenceMembers along with the clause's parent EC. This linear search becomes expensive in queries with many joins or partitions, where ec_derives may contain thousands of entries. In particular, create_join_clause() can spend significant time scanning this list. To improve performance, introduce a hash table (ec_derives_hash) that is built when the list reaches 32 entries -- the same threshold used for join_rel_hash. The original list is retained alongside the hash table to support EC merging and serialization (_outEquivalenceClass()). Each clause is stored in the hash table using a canonicalized key: the EquivalenceMember with the lower memory address is placed in the key before the one with the higher memory address. This avoids storing or searching for both permutations of the same clause. For clauses involving a constant EM, the key places NULL in the first slot and the non-constant EM in the second. The hash table is initialized using list_length(ec_derives_list) as the size hint. simplehash internally adjusts this to the next power of two after dividing by the fillfactor, so this typically results in at least 64 buckets near the threshold -- avoiding immediate resizing while adapting to the actual number of entries. The lookup logic for derived clauses is now centralized in ec_search_derived_clause_for_ems(), which consults the hash table when available and falls back to the list otherwise. The new ec_clear_derived_clauses() always frees ec_derives_list, even though some of the original code paths that cleared the old ec_derives field did not. This ensures consistent cleanup and avoids leaking memory when large lists are discarded. An assertion originally placed in find_derived_clause_for_ec_member() is moved into ec_search_derived_clause_for_ems() so that it is enforced consistently, regardless of whether the hash table or list is used for lookup. This design incorporates suggestions by David Rowley, who proposed both the key canonicalization and the initial sizing approach to balance memory usage and CPU efficiency. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Tested-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Tested-by: Amit Langote <amitlangote09@gmail.com> Tested-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com	2025-04-04 10:45:05 +09:00
Peter Eisentraut	748e98d05b	Fix code comment The changes made in commit `d2b4b4c225` contained incorrect comments: They said that certain forward declarations were necessary to "avoid including pathnodes.h here", but the file is itself pathnodes.h! So change the comment to just say it's a forward declaration in one case, and in the other case we don't need the declaration at all because it already appeared earlier in the file.	2025-04-02 14:46:47 +02:00
Peter Eisentraut	eec0040c4b	Add support for NOT ENFORCED in foreign key constraints This expands the NOT ENFORCED constraint flag, previously only supported for CHECK constraints (commit `ca87c415e2`), to foreign key constraints. Normally, when a foreign key constraint is created on a table, action and check triggers are added to maintain data integrity. With this patch, if a constraint is marked as NOT ENFORCED, integrity checks are no longer required, making these triggers unnecessary. Consequently, when creating a NOT ENFORCED foreign key constraint, triggers will not be created, and the constraint will be marked as NOT VALID. Similarly, if an existing foreign key constraint is changed to NOT ENFORCED, the associated triggers will be dropped, and the constraint will also be marked as NOT VALID. Conversely, if a NOT ENFORCED foreign key constraint is changed to ENFORCED, the necessary triggers will be created, and the will be changed to VALID by performing necessary validation. Since not-enforced foreign key constraints have no triggers, the shortcut used for example in psql and pg_dump to skip looking for foreign keys if the relation is known not to have triggers no longer applies. (It already didn't work for partitioned tables.) Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Isaac Morland <isaac.morland@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Tested-by: Triveni N <triveni.n@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com	2025-04-02 13:36:44 +02:00
Peter Eisentraut	cdc168ad4b	Add support for not-null constraints on virtual generated columns This was left out of the original patch for virtual generated columns (commit `83ea6c5402`). This just involves a bit of extra work in the executor to expand the generation expressions and run a "IS NOT NULL" test against them. There is also a bit of work to make sure that not-null constraints are checked during a table rewrite. Author: jian he <jian.universality@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Navneet Kumar <thanit3111@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com	2025-03-28 13:53:37 +01:00
Peter Eisentraut	9a9ead1105	Rename a node field for clarity Rename ResultRelInfo.ri_ConstraintExprs to ri_CheckConstraintExprs. This reflects its specific purpose better and avoids confusion with adjacent fields with similar but distinct purposes. Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com	2025-03-28 09:50:01 +01:00
Álvaro Herrera	9fbd53dea5	Remove the query_id_squash_values GUC Commit `62d712ecfd` introduced the capability to calculate the same queryId for queries with different lengths of constants in a list for an IN clause. This behavior was originally enabled with a GUC query_id_squash_values. After a discussion about the value of such a GUC, it was decided to back out of the use of a GUC and make the squashing behavior the only available option. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com	2025-03-27 13:33:37 +01:00
David Rowley	f31aad9b07	Fix query jumbling to account for NULL nodes Previously NULL nodes were ignored. This could cause issues where the computed query ID could match for queries where fields that are next to each other in their Node struct where one field was NULL and the other non-NULL. For example, the Query struct had distinctClause and sortClause next to each other. If someone wrote; SELECT DISTINCT c1 FROM t; and then; SELECT c1 FROM t ORDER BY c1; these would produce the same query ID since, in the first query, we ignored the NULL sortClause and appended the jumble bytes for the distictClause. In the latter query, since we did nothing for the NULL distinctClause then jumble the non-NULL sortClause, and since the node representation stored is the same in both cases, the query IDs were identical. Here we fix this by always accounting for NULL nodes by recording that we saw a NULL in the jumble buffer. This fixes the issue as the order that the NULL is recorded isn't the same in the above two queries. Author: Bykov Ivan <i.bykov@modernsys.ru> Author: Michael Paquier <michael@paquier.xyz> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain	2025-03-27 18:23:00 +13:00
Michael Paquier	787514b30b	Use relation name instead of OID in query jumbling for RangeTblEntry custom_query_jumble (introduced in `5ac462e2b7` as a node field attribute) is now assigned to the expanded reference name "eref" of RangeTblEntry, adding in the query jumble computation the non-qualified aliased relation name, without the list of column names. The relation OID is removed from the query jumbling. The effects of this change can be seen in the tests added by `3430215fe3`, where pg_stat_statements (PGSS) entries are now grouped using the relation name, ignoring the relation search_path may point at. For example, these two relations are different, but are now grouped in a single PGSS entry as they are assigned the same query ID: CREATE TABLE foo1.tab (a int); CREATE TABLE foo2.tab (b int); SET search_path = 'foo1'; SELECT count() FROM tab; SET search_path = 'foo2'; SELECT count() FROM tab; SELECT count() FROM foo1.tab; SELECT count() FROM foo2.tab; SELECT query, calls FROM pg_stat_statements WHERE query ~ 'FROM tab'; query \| calls --------------------------+------- SELECT count(*) FROM tab \| 4 (1 row) It is still possible to use an alias in the FROM clause to split these. This behavior is useful for relations re-created with the same name, where queries based on such relations would be grouped in the same PGSS entry. For permanent schemas, it should not really matter in practice. The main benefit is for workloads that use a lot of temporary relations, which are usually re-created with the same name continuously. These can be a heavy source of bloat in PGSS depending on the workload. Such entries can now be grouped together, improving the user experience. The original idea from Christoph Berg used catalog lookups to find temporary relations, something that the query jumble has never done, and it could cause some performance regressions. The idea to use RangeTblEntry.eref and the relation name, applying the same rules for all relations, temporary and not temporary, has been proposed by Tom Lane. The documentation additions have been suggested by Sami Imseih. Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Christoph Berg <myon@debian.org> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/Z9iWXKGwkm8RAC93@msg.df7cb.de	2025-03-26 15:21:05 +09:00
Michael Paquier	5ac462e2b7	Add support for custom_query_jumble as a node field attribute This option gives the possibility for query jumble to define a custom routine for the field of a Node, extending support for custom_query_jumble as a node field attribute. When dealing with complex node structures, this can be simpler than having to enforce a custom function across a full node. Custom functions need to be defined in queryjumblefuncs.c, named as _jumble${node}_${field}(), and use in input the JumbleState, the node and its field. The field is not really required if we have the Node, but it makes custom implementations somewhat easier to think about. The code generated by gen_node_support.pl uses a macro called JUMBLE_CUSTOM(), hiding the internals of the logic inside queryjumblefuncs.c. This will be used by an upcoming patch manipulating adding a custom routine into a field of RangeTblEntry, but this facility can become useful in more cases. Reviewed-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/Z9y43-dRvb4EtxQ0@paquier.xyz	2025-03-25 14:18:00 +09:00
Jeff Davis	626df47ad9	Remove 'additional' pointer from TupleHashEntryData. Reduces memory required for hash aggregation by avoiding an allocation and a pointer in the TupleHashEntryData structure. That structure is used for all buckets, whether occupied or not, so the savings is substantial. Discussion: https://postgr.es/m/AApHDvpN4v3t_sdz4dvrv1Fx_ZPw=twSnxuTEytRYP7LFz5K9A@mail.gmail.com Reviewed-by: David Rowley <dgrowleyml@gmail.com>	2025-03-24 22:06:02 -07:00
Jeff Davis	4d143509cb	Create accessor functions for TupleHashEntry. Refactor for upcoming optimizations. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/1cc3b400a0e8eead18ff967436fa9e42c0c14cfb.camel@j-davis.com	2025-03-24 22:05:41 -07:00
Jeff Davis	cc721c459d	HashAgg: use Bump allocator for hash TupleHashTable entries. The entries aren't freed until the entire hash table is destroyed, so use the Bump allocator to improve allocation speed, avoid wasting space on the chunk header, and avoid wasting space due to the power-of-two allocations. Discussion: https://postgr.es/m/CAApHDvqv1aNB4cM36FzRwivXrEvBO_LsG_eQ3nqDXTjECaatOQ@mail.gmail.com Reviewed-by: David Rowley	2025-03-24 22:05:33 -07:00
Michael Paquier	2a0cd38da5	Allow plugins to set a 64-bit plan identifier in PlannedStmt This field can be optionally set in a PlannedStmt through the planner hook, giving extensions the possibility to assign an identifier related to a computed plan. The backend is changed to report it in the backend entry of a process running (including the extended query protocol), with semantics and APIs to set or get it similar to what is used for the existing query ID (introduced in the backend via `4f0b0966c8`). The plan ID is reset at the same timing as the query ID. Currently, this information is not added to the system view pg_stat_activity; extensions can access it through PgBackendStatus. Some patches have been proposed to provide some features in the planning area, where a plan identifier is used as a key to know the plan involved (for statistics, plan storage and manipulations, etc.), and the point of this commit is to provide an anchor in the backend that extensions can rely on for future work. The reset of the plan identifier is controlled by core and follows the same pattern as the query identifier added in `4f0b0966c8`. The contents of this commit are extracted from a larger set proposed originally by Lukas Fittl, that Sami Imseih has proposed as an independent change, with a few tweaks sprinkled by me. Author: Lukas Fittl <lukas@fittl.com> Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com Discussion: https://postgr.es/m/CAA5RZ0vyWd4r35uUBUmhngv8XqeiJUkJDDKkLf5LCoWxv-t_pw@mail.gmail.com	2025-03-24 13:23:42 +09:00
Amit Langote	28317de723	Ensure first ModifyTable rel initialized if all are pruned Commit `cbc127917e` introduced tracking of unpruned relids to avoid processing pruned relations, and changed ExecInitModifyTable() to initialize only unpruned result relations. As a result, MERGE statements that prune all target partitions can now lead to crashes or incorrect behavior during execution. The crash occurs because some executor code paths rely on ModifyTableState.resultRelInfo[0] being present and initialized, even when no result relations remain after pruning. For example, ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo to determine the appropriate action. Similarly, ExecInitPartitionInfo() assumes that at least one result relation exists. To preserve these assumptions, ExecInitModifyTable() now includes the first result relation in the initialized result relation list if all result relations for that ModifyTable were pruned. To enable that, ExecDoInitialPruning() ensures the first relation is locked if it was pruned and locking is necessary. To support this exception to the pruning logic, PlannedStmt now includes a list of RT indexes identifying the first result relation of each ModifyTable node in the plan. This allows ExecDoInitialPruning() to check whether each such relation was pruned and, if so, lock it if necessary. Bug: #18830 Reported-by: Robins Tharakan <tharakan@gmail.com> Diagnozed-by: Tender Wang <tndrwang@gmail.com> Diagnozed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org	2025-03-19 12:14:24 +09:00
Álvaro Herrera	62d712ecfd	Introduce squashing of constant lists in query jumbling pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is individually jumbled. Most of the time that's undesirable, especially if the list becomes too large. Fix this by introducing a new GUC query_id_squash_values which modifies the node jumbling code to only consider the first and last element of a list of constants, rather than each list element individually. This affects both the query_id generated by query jumbling, as well as pg_stat_statements query normalization so that it suppresses printing of the individual elements of such a list. The default value is off, meaning the previous behavior is maintained. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Sergey Dudoladov (mysterious, off-list) Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Marcos Pegoraro <marcos@f10.com.br> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Tested-by: Yasuo Honda <yasuo.honda@gmail.com> Tested-by: Sergei Kornilov <sk@zsrv.org> Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com> Tested-by: Chengxi Sun <sunchengxi@highgo.com> Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com	2025-03-18 18:56:11 +01:00
Robert Haas	c65bc2e1d1	Make it possible for loadable modules to add EXPLAIN options. Modules can use RegisterExtensionExplainOption to register new EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState, and SetExplainExtensionState to store related state inside the ExplainState object. Since this substantially increases the amount of code that needs to handle ExplainState-related tasks, move a few bits of existing code to a new file explain_state.c and add the rest of this infrastructure there. See the comments at the top of explain_state.c for further explanation of how this mechanism works. This does not yet provide a way for such such options to do anything useful. The intention is that we'll add hooks for that purpose in a separate commit. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>	2025-03-18 08:41:12 -04:00
Melanie Plageman	2b73a8cd33	BitmapHeapScan uses the read stream API Make Bitmap Heap Scan use the read stream API instead of invoking ReadBuffer() for each block indicated by the bitmap. The read stream API handles prefetching, so remove all of the explicit prefetching from bitmap heap scan code. Now, heap table AM implements a read stream callback which uses the bitmap iterator to return the next required block to the read stream code. Tomas Vondra conducted extensive regression testing of this feature. Andres Freund, Thomas Munro, and I analyzed regressions and Thomas Munro patched the read stream API. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Tested-by: Tomas Vondra <tomas@vondra.me> Tested-by: Andres Freund <andres@anarazel.de> Tested-by: Thomas Munro <thomas.munro@gmail.com> Tested-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com	2025-03-15 10:34:42 -04:00
Melanie Plageman	944e81bf99	Separate TBM[Shared\|Private]Iterator and TBMIterateResult Remove the TBMIterateResult member from the TBMPrivateIterator and TBMSharedIterator and make tbm_[shared\|private_]iterate() take a TBMIterateResult as a parameter. This allows tidbitmap API users to manage multiple TBMIterateResults per scan. This is required for bitmap heap scan to use the read stream API, with which there may be multiple I/Os in flight at once, each one with a TBMIterateResult. Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me	2025-03-15 10:11:19 -04:00
Peter Eisentraut	05cbd6cb22	Swap order of extern/static and pg_nodiscard When pg_nodiscard was first added, the C standard draft had it as a function specifier, and so the code comment about placement was written with that in mind. The final C23 standard has it as an attribute and the placement rules are a bit different for that. Specifically, it needs to be before extern or static. (Or at least both current clang and gcc require that.) So just swap these. (To be clear: The current implementation with gcc attributes doesn't care. This change is just for maximum forward compatibility for non-gcc compilers.) This also keeps the order consistent with the previously introduced pg_noreturn. Also update the code comment to reflect the mentioned developments since its introduction. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw	2025-03-14 07:18:07 +01:00
Tom Lane	8b1b342544	Improve EXPLAIN's display of window functions. Up to now we just punted on showing the window definitions used in a plan, with window function calls represented as "OVER (?)". To improve that, show the window definition implemented by each WindowAgg plan node, and reference their window names in OVER. For nameless window clauses generated by "OVER (...)", assign unique names w1, w2, etc. In passing, re-order the properties shown for a WindowAgg node so that the Run Condition (if any) appears after the Window property and before the Filter (if any). This seems more sensible since the Run Condition is associated with the Window and acts before the Filter. Thanks to David G. Johnston and Álvaro Herrera for design suggestions. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us	2025-03-11 11:19:54 -04:00
Peter Geoghegan	0fbceae841	Show index search count in EXPLAIN ANALYZE, take 2. Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit `5bf748b8`). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit `5ead85fb` (though that was quickly reverted by commit `d00107cd` following "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit `5a1e6df3`, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com	2025-03-11 09:20:50 -04:00
Álvaro Herrera	f4e53e10b6	Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERIT This allows to redefine an existing non-inheritable constraint to be inheritable, which allows to straighten up situations with NO INHERIT constraints so that thay can become normal constraints without having to re-verify existing data. For existing inheritance children this may require creating additional constraints, if they don't exist already. It also allows to do the opposite, if only for symmetry. Author: Suraj Kharage <suraj.kharage@enterprisedb.com> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com	2025-03-05 13:50:22 +01:00
Richard Guo	1e4351af32	Expand virtual generated columns in the planner Commit `83ea6c540` added support for virtual generated columns that are computed on read. All Var nodes in the query that reference virtual generated columns must be replaced with the corresponding generation expressions. Currently, this replacement occurs in the rewriter. However, this approach has several issues. If a Var referencing a virtual generated column has any varnullingrels, those varnullingrels need to be propagated into the generation expression. Failing to do so can lead to "wrong varnullingrels" errors and improper outer-join removal. Additionally, if such a Var comes from the nullable side of an outer join, we may need to wrap the generation expression in a PlaceHolderVar to ensure that it is evaluated at the right place and hence is forced to null when the outer join should do so. In certain cases, such as when the query uses grouping sets, we also need a PlaceHolderVar for anything that is not a simple Var to isolate subexpressions. Failure to do so can result in incorrect results. To fix these issues, this patch expands the virtual generated columns in the planner rather than in the rewriter, and leverages the pullup_replace_vars architecture to avoid code duplication. The generation expressions will be correctly marked with nullingrel bits and wrapped in PlaceHolderVars when needed by the pullup_replace_vars callback function. This requires handling the OLD/NEW RETURNING list Vars in pullup_replace_vars_callback, as it may now deal with Vars referencing the result relation instead of a subquery. The "wrong varnullingrels" error was reported by Alexander Lakhin. The incorrect result issue and the improper outer-join removal issue were reported by Richard Guo. Author: Richard Guo <guofenglinux@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com	2025-02-25 16:10:25 +09:00
Melanie Plageman	bfe56cdf9a	Delay extraction of TIDBitmap per page offsets Pages from the bitmap created by the TIDBitmap API can be exact or lossy. The TIDBitmap API extracts the tuple offsets from exact pages into an array for the convenience of the caller. This was done in tbm_private\|shared_iterate() right after advancing the iterator. However, as long as tbm_private\|shared_iterate() set a reference to the PagetableEntry in the TBMIterateResult, the offset extraction can be done later. Waiting to extract the tuple offsets has a few benefits. For the shared iterator case, it allows us to extract the offsets after dropping the shared iterator state lock, reducing time spent holding a contended lock. Separating the iteration step and extracting the offsets later also allows us to avoid extracting the offsets for prefetched blocks. Those offsets were never used, so the overhead of extracting and storing them was wasted. The real motivation for this change, however, is that future commits will make bitmap heap scan use the read stream API. This requires a TBMIterateResult per issued block. By removing the array of tuple offsets from the TBMIterateResult and only extracting the offsets when they are used, we reduce the memory required for per buffer data substantially. Suggested-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com	2025-02-24 16:10:19 -05:00
Melanie Plageman	b8778c4cd8	Add lossy indicator to TBMIterateResult TBMIterateResult->ntuples is -1 when the page in the bitmap is lossy. Add an explicit lossy indicator so that we can move ntuples out of the TBMIterateResult in a future commit. Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com	2025-02-24 16:10:13 -05:00
Amit Langote	525392d572	Don't lock partitions pruned by initial pruning Before executing a cached generic plan, AcquireExecutorLocks() in plancache.c locks all relations in a plan's range table to ensure the plan is safe for execution. However, this locks runtime-prunable relations that will later be pruned during "initial" runtime pruning, introducing unnecessary overhead. This commit defers locking for such relations to executor startup and ensures that if the CachedPlan is invalidated due to concurrent DDL during this window, replanning is triggered. Deferring these locks avoids unnecessary locking overhead for pruned partitions, resulting in significant speedup, particularly when many partitions are pruned during initial runtime pruning. * Changes to locking when executing generic plans: AcquireExecutorLocks() now locks only unprunable relations, that is, those found in PlannedStmt.unprunableRelids (introduced in commit `cbc127917e`), to avoid locking runtime-prunable partitions unnecessarily. The remaining locks are taken by ExecDoInitialPruning(), which acquires them only for partitions that survive pruning. This deferral does not affect the locks required for permission checking in InitPlan(), which takes place before initial pruning. ExecCheckPermissions() now includes an Assert to verify that all relations undergoing permission checks, none of which can be in the set of runtime-prunable relations, are properly locked. * Plan invalidation handling: Deferring locks introduces a window where prunable relations may be altered by concurrent DDL, invalidating the plan. A new function, ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle invalidation caused by deferred locking. If invalidation occurs, ExecutorStartCachedPlan() updates CachedPlan using the new UpdateCachedPlan() function and retries execution with the updated plan. To ensure all code paths that may be affected by this handle invalidation properly, all callers of ExecutorStart that may execute a PlannedStmt from a CachedPlan have been updated to use ExecutorStartCachedPlan() instead. UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new CachedPlan.stmt_context, created as a child of CachedPlan.context, allows freeing old PlannedStmts while preserving the CachedPlan structure and its statement list. This ensures that loops over statements in upstream callers of ExecutorStartCachedPlan() remain intact. ExecutorStart() and ExecutorStart_hook implementations now return a boolean value indicating whether plan initialization succeeded with a valid PlanState tree in QueryDesc.planstate, or false otherwise, in which case QueryDesc.planstate is NULL. Hook implementations are required to call standard_ExecutorStart() at the beginning, and if it returns false, they should do the same without proceeding. * Testing: To verify these changes, the delay_execution module tests scenarios where cached plans become invalid due to changes in prunable relations after deferred locks. * Note to extension authors: ExecutorStart_hook implementations must verify plan validity after calling standard_ExecutorStart(), as explained earlier. For example: if (prev_ExecutorStart) plan_valid = prev_ExecutorStart(queryDesc, eflags); else plan_valid = standard_ExecutorStart(queryDesc, eflags); if (!plan_valid) return false; <extension-code> return true; Extensions accessing child relations, especially prunable partitions, via ExecGetRangeTableRelation() must now ensure their RT indexes are present in es_unpruned_relids (introduced in commit `cbc127917e`), or they will encounter an error. This is a strict requirement after this change, as only relations in that set are locked. The idea of deferring some locks to executor startup, allowing locks for prunable partitions to be skipped, was first proposed by Tom Lane. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-02-20 17:09:48 +09:00
Álvaro Herrera	80d7f99049	Add ATAlterConstraint struct for ALTER .. CONSTRAINT Replace the use of Constraint with a new ATAlterConstraint struct, which allows us to pass additional information. No functionality is added by this commit. This is necessary for future work that allows altering constraints in other ways. I (Álvaro) took the liberty of restructuring the code for ALTER CONSTRAINT beyond what Amul did. The original coding before Amul's patch was unnecessarily baroque, and this change makes things simpler by removing one level of subroutine. Also, partly remove the assumption that only partitioned tables are relevant (by passing sensible 'recurse' arguments) and no longer ignore whether ONLY was specified. I say 'partly' because the current coding only walks down via the 'conparentid' relationship, which is only used for partitioned tables; but future patches could handle ONLY or not for other types of constraint changes for legacy inheritance trees too. Author: Amul Sul <sulamul@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com	2025-02-19 13:06:13 +01:00
Alexander Korotkov	fc069a3a63	Implement Self-Join Elimination The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-02-17 12:44:12 +02:00
Amit Langote	75dfde1363	Fix an oversight in `cbc127917` to handle MERGE correctly ExecInitModifyTable() forgot to trim MERGE-related lists to exclude entries for result relations pruned during initial pruning, so fix that. While at it, make the function's use of the pruned resultRelations list, rather than ModifyTable.resultRelations, more consistent. Reported-by: Alexander Lakhin <exclusion@gmail.com> (via sqlsmith) Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/e72c94d9-e5f9-4753-9bc1-69d72bd54b8a@gmail.com	2025-02-17 16:12:03 +09:00
Tom Lane	c366d2bdba	Allow extension functions to participate in in-place updates. Commit `1dc5ebc90` allowed PL/pgSQL to perform in-place updates of expanded-object variables that are being updated with assignments like "x := f(x, ...)". However this was allowed only for a hard-wired list of functions f(), since we need to be sure that f() will not modify the variable if it fails. It was always envisioned that we should make that extensible, but at the time we didn't have a good way to do so. Since then we've invented the idea of "support functions" to allow attaching specialized optimization knowledge to functions, and that is a perfect mechanism for doing this. Hence, adjust PL/pgSQL to use a support function request instead of hard-wired logic to decide if in-place update is safe. Preserve the previous optimizations by creating support functions for the three functions that were previously hard-wired. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com	2025-02-11 12:49:34 -05:00
Michael Paquier	c9238ad853	Fix indentation of comment in plannodes.h Oversight in commit `3d17d7d7fb`. Worth noting that pgindent was fine as-is. Author: Sami Imseih Discussion: https://postgr.es/m/CAA5RZ0t80hP2aTv97QtEJy39GkxKmDBVDiTBApfiuTa4O=TEWQ@mail.gmail.com	2025-02-11 07:40:03 +09:00
Michael Paquier	3d17d7d7fb	Reformat node comments in plannodes.h This is similar to `d575051b9a` but this time for the comments in plannodes.h to avoid long lines, which is useful if adding per-field annotations with pg_node_attr() to these planner structures. Some patches are under discussion to add such properties to planner fields, which is something that may or may not happen, and this change makes future proposals easier to work on and review, which being more consistent in style with the parse nodes. Author: Sami Imseih Discussion: https://postgr.es/m/Z5xTb5iBHVGns35R@paquier.xyz	2025-02-10 09:58:25 +09:00
Peter Eisentraut	83ea6c5402	Virtual generated columns This adds a new variant of generated columns that are computed on read (like a view, unlike the existing stored generated columns, which are computed on write, like a materialized view). The syntax for the column definition is ... GENERATED ALWAYS AS (...) VIRTUAL and VIRTUAL is also optional. VIRTUAL is the default rather than STORED to match various other SQL products. (The SQL standard makes no specification about this, but it also doesn't know about VIRTUAL or STORED.) (Also, virtual views are the default, rather than materialized views.) Virtual generated columns are stored in tuples as null values. (A very early version of this patch had the ambition to not store them at all. But so much stuff breaks or gets confused if you have tuples where a column in the middle is completely missing. This is a compromise, and it still saves space over being forced to use stored generated columns. If we ever find a way to improve this, a bit of pg_upgrade cleverness could allow for upgrades to a newer scheme.) The capabilities and restrictions of virtual generated columns are mostly the same as for stored generated columns. In some cases, this patch keeps virtual generated columns more restricted than they might technically need to be, to keep the two kinds consistent. Some of that could maybe be relaxed later after separate careful considerations. Some functionality that is currently not supported, but could possibly be added as incremental features, some easier than others: - index on or using a virtual column - hence also no unique constraints on virtual columns - extended statistics on virtual columns - foreign-key constraints on virtual columns - not-null constraints on virtual columns (check constraints are supported) - ALTER TABLE / DROP EXPRESSION - virtual column cannot have domain type - virtual columns are not supported in logical replication The tests in generated_virtual.sql have been copied over from generated_stored.sql with the keyword replaced. This way we can make sure the behavior is mostly aligned, and the differences can be visible. Some tests for currently not supported features are currently commented out. Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org	2025-02-07 09:46:59 +01:00
Amit Langote	cbc127917e	Track unpruned relids to avoid processing pruned relations This commit introduces changes to track unpruned relations explicitly, making it possible for top-level plan nodes, such as ModifyTable and LockRows, to avoid processing partitions pruned during initial pruning. Scan-level nodes, such as Append and MergeAppend, already avoid the unnecessary processing by accessing partition pruning results directly via part_prune_index. In contrast, top-level nodes cannot access pruning results directly and need to determine which partitions remain unpruned. To address this, this commit introduces a new bitmapset field, es_unpruned_relids, which the executor uses to track the set of unpruned relations. This field is referenced during plan initialization to skip initializing certain nodes for pruned partitions. It is initialized with PlannedStmt.unprunableRelids, a new field that the planner populates with RT indexes of relations that cannot be pruned during runtime pruning. These include relations not subject to partition pruning and those required for execution regardless of pruning. PlannedStmt.unprunableRelids is computed during set_plan_refs() by removing the RT indexes of runtime-prunable relations, identified from PartitionPruneInfos, from the full set of relation RT indexes. ExecDoInitialPruning() then updates es_unpruned_relids by adding partitions that survive initial pruning. To support this, PartitionedRelPruneInfo and PartitionedRelPruningData now include a leafpart_rti_map[] array that maps partition indexes to their corresponding RT indexes. The former is used in set_plan_refs() when constructing unprunableRelids, while the latter is used in ExecDoInitialPruning() to convert partition indexes returned by get_matching_partitions() into RT indexes, which are then added to es_unpruned_relids. These changes make it possible for ModifyTable and LockRows nodes to process only relations that remain unpruned after initial pruning. ExecInitModifyTable() trims lists, such as resultRelations, withCheckOptionLists, returningLists, and updateColnosLists, to consider only unpruned partitions. It also creates ResultRelInfo structs only for these partitions. Similarly, child RowMarks for pruned relations are skipped. By avoiding unnecessary initialization of structures for pruned partitions, these changes improve the performance of updates and deletes on partitioned tables during initial runtime pruning. Due to ExecInitModifyTable() changes as described above, EXPLAIN on a plan for UPDATE and DELETE that uses runtime initial pruning no longer lists partitions pruned during initial pruning. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-02-07 17:15:09 +09:00
Peter Eisentraut	119fc30dd5	Move CompareType to separate header file We'll want to make use of it in more places, and we'd prefer to not have to include all of primnodes.h everywhere. Author: Mark Dilger <mark.dilger@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-02 08:11:57 +01:00
Tom Lane	041e8b95b8	Get rid of our dependency on type "long" for memory size calculations. Consistently use "Size" (or size_t, or in some places int64 or double) as the type for variables holding memory allocation sizes. In most places variables' data types were fine already, but we had an ancient habit of computing bytes from kilobytes-units GUCs with code like "work_mem * 1024L". That risks overflow on Win64 where they did not make "long" as wide as "size_t". We worked around that by restricting such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB on Win64. This patch removes that restriction, after replacing such calculations with "work_mem * (Size) 1024" or variants of that. It should be noted that this patch was constructed by searching outwards from the GUCs that have MAX_KILOBYTES as upper limit. So I can't positively guarantee there are no other places doing memory-size arithmetic in int or long variables. I do however feel pretty confident that increasing MAX_KILOBYTES on Win64 is safe now. Also, nothing in our code should be dealing in multiple-gigabyte allocations without authorization from a relevant GUC, so it seems pretty likely that this search caught everything that could be at risk of overflow. Author: Vladlen Popolitov <v.popolitov@postgrespro.ru> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217	2025-01-31 13:52:40 -05:00
Amit Langote	d47cbf474e	Perform runtime initial pruning outside ExecInitNode() This commit builds on the prior change that moved PartitionPruneInfos out of individual plan nodes into a list in PlannedStmt, making it possible to initialize PartitionPruneStates without traversing the plan tree and perform runtime initial pruning before ExecInitNode() initializes the plan trees. These tasks are now handled in a new routine, ExecDoInitialPruning(), which is called by InitPlan() before calling ExecInitNode() on various plan trees. ExecDoInitialPruning() performs the initial pruning and saves the result -- a Bitmapset of indexes for surviving child subnodes -- in es_part_prune_results, a list in EState. PartitionPruneStates created for initial pruning are stored in es_part_prune_states, another list in EState, for later use during exec pruning. Both lists are parallel to es_part_prune_infos, which holds the PartitionPruneInfos from PlannedStmt, enabling shared indexing. PartitionPruneStates initialized in ExecDoInitialPruning() now include only the PartitionPruneContexts for initial pruning steps. Exec pruning contexts are initialized later in ExecInitPartitionExecPruning() when the parent plan node is initialized, as the exec pruning step expressions depend on the parent node's PlanState. The existing function PartitionPruneFixSubPlanMap() has been repurposed for this initialization to avoid duplicating a similar loop structure for finding PartitionedRelPruningData to initialize exec pruning contexts for. It has been renamed to InitExecPruningContexts() to reflect its new primary responsibility. The original logic to "fix subplan maps" remains intact but is now encapsulated within the renamed function. This commit removes two obsolete Asserts in partkey_datum_from_expr(). The ExprContext used for pruning expression evaluation is now independent of the parent PlanState, making these Asserts unnecessary. By centralizing pruning logic and decoupling it from the plan initialization step (ExecInitNode()), this change sets the stage for future patches that will use the result of initial pruning to save the overhead of redundant processing for pruned partitions. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-01-31 15:47:15 +09:00
Amit Langote	bb3ec16e14	Move PartitionPruneInfo out of plan nodes into PlannedStmt This moves PartitionPruneInfo from plan nodes to PlannedStmt, simplifying traversal by centralizing all PartitionPruneInfo structures in a single list in it, which holds all instances for the main query and its subqueries. Instead of plan nodes (Append or MergeAppend) storing PartitionPruneInfo pointers, they now reference an index in this list. A bitmapset field is added to PartitionPruneInfo to store the RT indexes corresponding to the apprelids field in Append or MergeAppend. This allows execution pruning logic to verify that it operates on the correct plan node, mainly to facilitate debugging. Duplicated code in set_append_references() and set_mergeappend_references() is refactored into a new function, register_pruneinfo(). This updates RT indexes by applying rtoffet and adds PartitionPruneInfo to the global list in PlannerGlobal. By allowing pruning to be performed without traversing the plan tree, this change lays the groundwork for runtime initial pruning to occur independently of plan tree initialization. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-01-30 11:57:32 +09:00
John Naylor	128897b101	Fix grammatical typos around possessive "its" Some places spelled it "it's", which is short for "it is". In passing, fix a couple other nearby grammatical errors. Author: Jacob Brazeal <jacob.brazeal@gmail.com> Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com	2025-01-29 14:39:14 +07:00
Dean Rasheed	80feb727c8	Add OLD/NEW support to RETURNING in DML queries. This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries to explicitly return old and new values by using the special aliases "old" and "new", which are automatically added to the query (if not already defined) while parsing its RETURNING list, allowing things like: RETURNING old.colname, new.colname, ... RETURNING old., new. Additionally, a new syntax is supported, allowing the names "old" and "new" to be changed to user-supplied alias names, e.g.: RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ... This is useful when the names "old" and "new" are already defined, such as inside trigger functions, allowing backwards compatibility to be maintained -- the interpretation of any existing queries that happen to already refer to relations called "old" or "new", or use those as aliases for other relations, is not changed. For an INSERT, old values will generally be NULL, and for a DELETE, new values will generally be NULL, but that may change for an INSERT with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule changes the command type. Therefore, we put no restrictions on the use of old and new in any DML queries. Dean Rasheed, reviewed by Jian He and Jeff Davis. Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com	2025-01-16 14:57:35 +00:00
Peter Eisentraut	630f9a43ce	Change gist stratnum function to use CompareType This changes commit `7406ab623f` in that the gist strategy number mapping support function is changed to use the CompareType enum as input, instead of the "well-known" RT*StrategyNumber strategy numbers. This is a bit cleaner, since you are not dealing with two sets of strategy numbers. Also, this will enable us to subsume this system into a more general system of using CompareType to define operator semantics across index methods. Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-01-15 11:34:04 +01:00
Peter Eisentraut	6339f6468e	Rename RowCompareType to CompareType RowCompareType served as a way to describe the fundamental meaning of an operator, notionally independent of an operator class (although so far this was only really supported for btrees). Its original purpose was for use inside RowCompareExpr, and it has also found some small use outside, such as for get_op_btree_interpretation(). We want to expand this now, as a more general way to describe operator semantics for other index access methods, including gist (to improve GistTranslateStratnum()) and others not written yet. To avoid future confusion, we rename the type to CompareType and the symbols from ROWCOMPARE_XXX to COMPARE_XXX to reflect their more general purpose. Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-01-15 08:44:01 +01:00
Jeff Davis	b4a07f532b	Revert "TupleHashTable: store additional data along with tuple." This reverts commit `e0ece2a981` due to performance regressions. Reported-by: David Rowley	2025-01-13 14:14:33 -08:00
Richard Guo	6e826278f1	Fix pgindent damage Oversight in commit `e0ece2a98`.	2025-01-13 11:27:32 +09:00
Peter Eisentraut	ca87c415e2	Add support for NOT ENFORCED in CHECK constraints This adds support for the NOT ENFORCED/ENFORCED flag for constraints, with support for check constraints. The plan is to eventually support this for foreign key constraints, where it is typically more useful. Note that CHECK constraints do not currently support ALTER operations, so changing the enforceability of an existing constraint isn't possible without dropping and recreating it. This could be added later. Author: Amul Sul <amul.sul@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Tested-by: Triveni N <triveni.n@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com	2025-01-11 10:52:30 +01:00
Jeff Davis	e0ece2a981	TupleHashTable: store additional data along with tuple. Previously, the caller needed to allocate the memory and the TupleHashTable would store a pointer to it. That wastes space for the palloc overhead as well as the size of the pointer itself. Now, the TupleHashTable relies on the caller to correctly specify the additionalsize, and allocates that amount of space. The caller can then request a pointer into that space. Discussion: https://postgr.es/m/b9cbf0219a9859dc8d240311643ff4362fd9602c.camel@j-davis.com Reviewed-by: Heikki Linnakangas	2025-01-10 17:14:37 -08:00
Jeff Davis	8a96faedc4	Remove unused TupleHashTableData->entrysize. Discussion: https://postgr.es/m/7530bd8783b1a78d53a3c70383e38d8da0a5ffe5.camel%40j-davis.com	2025-01-07 14:49:18 -08:00
Bruce Momjian	50e6eb731d	Update copyright for 2025 Backpatch-through: 13	2025-01-01 11:21:55 -05:00
Tom Lane	2762792952	Convert SetOp to read its inputs as outerPlan and innerPlan. The original design for set operations involved appending the two input relations into one and adding a flag column that allows distinguishing which side each row came from. Then the SetOp node pries them apart again based on the flag. This is bizarre. The only apparent reason to do it is that when sorting, we'd only need one Sort node not two. But since sorting is at least O(N log N), sorting all the data is actually worse than sorting each side separately --- plus, we have no chance of taking advantage of presorted input. On top of that, adding the flag column frequently requires an additional projection step that adds cycles, and then the Append node isn't free either. Let's get rid of all of that and make the SetOp node have two separate children, using the existing outerPlan/innerPlan infrastructure. This initial patch re-implements nodeSetop.c and does a bare minimum of work on the planner side to generate correctly-shaped plans. In particular, I've tried not to change the cost estimates here, so that the visible changes in the regression test results will only involve removal of useless projection steps and not any changes in whether to use sorted vs hashed mode. For SORTED mode, we combine successive identical tuples from each input into groups, and then merge-join the groups. The tuple comparisons now use SortSupport instead of simple equality, but the group-formation part should involve roughly the same number of tuple comparisons as before. The cross-comparisons between left and right groups probably add to that, but I'm not sure to quantify how many more comparisons we might need. For HASHED mode, nodeSetop's logic is almost the same as before, just refactored into two separate loops instead of one loop that has an assumption that it will see all the left-hand inputs first. In both modes, I added early-exit logic to not bother reading the right-hand relation if the left-hand input is empty, since neither INTERSECT nor EXCEPT modes can produce any output if the left input is empty. This could have been done before in the hashed mode, but not in sorted mode. Sorted mode can also stop as soon as it exhausts the left input; any remaining right-hand tuples cannot have matches. Also, this patch adds some infrastructure for detecting whether child plan nodes all output the same type of tuple table slot. If they do, the hash table logic can use slightly more efficient code based on assuming that that's the input slot type it will see. We'll make use of that infrastructure in other plan node types later. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us	2024-12-19 16:23:45 -05:00
Melanie Plageman	1a0da347a7	Bitmap Table Scans use unified TBMIterator With the repurposing of TBMIterator as an interface for both parallel and serial iteration through TIDBitmaps in commit `7f9d4187e7`, bitmap table scans may now use it. Modify bitmap table scan code to use the TBMIterator. This requires moving around a bit of code, so a few variables are initialized elsewhere. Author: Melanie Plageman Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me	2024-12-18 18:43:39 -05:00
Melanie Plageman	7f9d4187e7	Add common interface for TBMIterators Add and use TBMPrivateIterator, which replaces the current TBMIterator for serial use cases, and repurpose TBMIterator to be a unified interface for both the serial ("private") and parallel ("shared") TID Bitmap iterator interfaces. This encapsulation simplifies call sites for callers supporting both parallel and serial TID Bitmap access. TBMIterator is not yet used in this commit. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-12-18 18:19:28 -05:00
David Rowley	bd10ec5297	Detect redundant GROUP BY columns using UNIQUE indexes `d4c3a156c` added support that when the GROUP BY contained all of the columns belonging to a relation's PRIMARY KEY, all other columns belonging to that relation would be removed from the GROUP BY clause. That's possible because all other columns are functionally dependent on the PRIMARY KEY and those columns alone ensure the groups are distinct. Here we expand on that optimization and allow it to work for any unique indexes on the table rather than just the PRIMARY KEY index. This normally requires that all columns in the index are defined with NOT NULL, however, we can relax that requirement when the index is defined with NULLS NOT DISTINCT. When there are multiple suitable indexes to allow columns to be removed, we prefer the index with the least number of columns as this allows us to remove the highest number of GROUP BY columns. One day, we may want to revisit that decision as it may make more sense to use the narrower set of columns in terms of the width of the data types and stored/queried data. This also adjusts the code to make use of RelOptInfo.indexlist rather than looking up the catalog tables. In passing, add another short-circuit path to allow bailing out earlier in cases where it's certainly not possible to remove redundant GROUP BY columns. This early exit is now cheaper to do than when this code was originally written as `00b41463c` made it cheaper to check for empty Bitmapsets. Patch originally by Zhang Mingli and later worked on by jian he, but after I (David) worked on it, there was very little of the original left. Author: Zhang Mingli, jian he, David Rowley Reviewed-by: jian he, Andrei Lepikhov Discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0%40Spark	2024-12-12 15:28:38 +13:00
David Rowley	0f5738202b	Use ExprStates for hashing in GROUP BY and SubPlans This speeds up obtaining hash values for GROUP BY and hashed SubPlans by using the ExprState support for hashing, thus allowing JIT compilation for obtaining hash values for these operations. This, even without JIT compilation, has been shown to improve Hash Aggregate performance in some cases by around 15% and hashed NOT IN queries in one case by over 30%, however, real-world cases are likely to see smaller gains as the test cases used were purposefully designed to have high hashing overheads by keeping the hash table small to prevent additional memory overheads that would be a factor when working with large hash tables. In passing, fix a hypothetical bug in ExecBuildHash32Expr() so that the initial value is stored directly in the ExprState's result field if there are no expressions to hash. None of the current users of this function use an initial value, so the bug is only hypothetical. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CAApHDvpYSO3kc9UryMevWqthTBrxgfd9djiAjKHMPUSQeX9vdQ@mail.gmail.com	2024-12-11 13:47:16 +13:00
Álvaro Herrera	14e87ffa5c	Add pg_constraint rows for not-null constraints We now create contype='n' pg_constraint rows for not-null constraints on user tables. Only one such constraint is allowed for a column. We propagate these constraints to other tables during operations such as adding inheritance relationships, creating and attaching partitions and creating tables LIKE other tables. These related constraints mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations: for example, as opposed to CHECK constraints, we don't match not-null ones by name when descending a hierarchy to alter or remove it, instead matching by the name of the column that they apply to. This means we don't require the constraint names to be identical across a hierarchy. The inheritance status of these constraints can be controlled: now we can be sure that if a parent table has one, then all children will have it as well. They can optionally be marked NO INHERIT, and then children are free not to have one. (There's currently no support for altering a NO INHERIT constraint into inheriting down the hierarchy, but that's a desirable future feature.) This also opens the door for having these constraints be marked NOT VALID, as well as allowing UNIQUE+NOT NULL to be used for functional dependency determination, as envisioned by commit `e49ae8d3bc`. It's likely possible to allow DEFERRABLE constraints as followup work, as well. psql shows these constraints in \d+, though we may want to reconsider if this turns out to be too noisy. Earlier versions of this patch hid constraints that were on the same columns of the primary key, but I'm not sure that that's very useful. If clutter is a problem, we might be better off inventing a new \d++ command and not showing the constraints in \d+. For now, we omit these constraints on system catalog columns, because they're unlikely to achieve anything. The main difference to the previous attempt at this (`b0e96f3119`) is that we now require that such a constraint always exists when a primary key is in the column; we didn't require this previously which had a number of unpalatable consequences. With this requirement, the code is easier to reason about. For example: - We no longer have "throwaway constraints" during pg_dump. We needed those for the case where a table had a PK without a not-null underneath, to prevent a slow scan of the data during restore of the PK creation, which was particularly problematic for pg_upgrade. - We no longer have to cope with attnotnull being set spuriously in case a primary key is dropped indirectly (e.g., via DROP COLUMN). Some bits of code in this patch were authored by Jian He. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql	2024-11-08 13:28:48 +01:00
Tom Lane	89e51abcb2	Add a parse location field to struct FunctionParameter. This allows an error cursor to be supplied for a bunch of bad-function-definition errors that previously lacked one, or that cheated a bit by pointing at the contained type name when the error isn't really about that. Bump catversion from an abundance of caution --- I don't think this node type can actually appear in stored views/rules, but better safe than sorry. Jian He and Tom Lane (extracted from a larger patch by Jian, with some additional work by me) Discussion: https://postgr.es/m/CACJufxEmONE3P2En=jopZy1m=cCCUs65M4+1o52MW5og9oaUPA@mail.gmail.com	2024-10-31 16:09:27 -04:00
David Rowley	3974bc3196	Remove unused field from SubPlanState struct `bf6c614a2` did some conversion work to use ExprState instead of manually calling equality functions to check if one set of values is not distinct from another set. That patch removed many of the fields that became redundant as a result of that change, but it forgot to remove SubPlanState.tab_eq_funcs. Fix that. In passing, fix the header comment for TupleHashEntryData to correctly spell the field name it's talking about. Author: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CA+FpmFeycdombFzrjZw7Rmc29CVm4OOzCWwu=dVBQ6q=PX8SvQ@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvrWR2jYVhec=COyF2g2BE_ns91NDsCHAMFiXbyhEujKdQ@mail.gmail.com	2024-10-31 13:44:15 +13:00
Melanie Plageman	de380a62b5	Make table_scan_bitmap_next_block() async-friendly Move all responsibility for indicating a block is exhuasted into table_scan_bitmap_next_tuple() and advance the main iterator in heap-specific code. This flow control makes more sense and is a step toward using the read stream API for bitmap heap scans. Previously, table_scan_bitmap_next_block() returned false to indicate table_scan_bitmap_next_tuple() should not be called for the tuples on the page. This happened both when 1) there were no visible tuples on the page and 2) when the block returned by the iterator was past the end of the table. BitmapHeapNext() (generic bitmap table scan code) handled the case when the bitmap was exhausted. It makes more sense for table_scan_bitmap_next_tuple() to return false when there are no visible tuples on the page and table_scan_bitmap_next_block() to return false when the bitmap is exhausted or there are no more blocks in the table. As part of this new design, TBMIterateResults are no longer used as a flow control mechanism in BitmapHeapNext(), so we removed table_scan_bitmap_next_tuple's TBMIterateResult parameter. Note that the prefetch iterator is still saved in the BitmapHeapScanState node and advanced in generic bitmap table scan code. This is because 1) it was not necessary to change the prefetch iterator location to change the flow control in BitmapHeapNext() 2) modifying prefetch iterator management requires several more steps better split over multiple commits and 3) the prefetch iterator will be removed once the read stream API is used. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas, Mark Dilger Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-10-25 10:11:58 -04:00
Michael Paquier	499edb0974	Track more precisely query locations for nested statements Previously, a Query generated through the transform phase would have unset stmt_location, tracking the starting point of a query string. Extensions relying on the statement location to extract its relevant parts in the source text string would fallback to use the whole statement instead, leading to confusing results like in pg_stat_statements for queries relying on nested queries, like: - EXPLAIN, with top-level and nested query using the same query string, and a query ID coming from the nested query when the non-top-level entry. - Multi-statements, with only partial portions of queries being normalized. - COPY TO with a query, SELECT or DMLs. This patch improves things by keeping track of the statement locations and propagate it to Query during transform, allowing PGSS to only show the relevant part of the query for nested query. This leads to less bloat in entries for non-top-level entries, as queries can now be grouped within the same (toplevel, queryid) duos in pg_stat_statements. The result gives a stricter one-one mapping between query IDs and its query strings. The regression tests introduced in `45e0ba30fc` produce differences reflecting the new logic. Author: Anthonin Bonnefoy Reviewed-by: Michael Paquier, Jian He Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com	2024-10-24 09:29:54 +09:00
Peter Eisentraut	d2b4b4c225	Fix C23 compiler warning The approach of declaring a function pointer with an empty argument list and hoping that the compiler will not complain about casting it to another type no longer works with C23, because foo() is now equivalent to foo(void). We don't need to do this here. With a few struct forward declarations we can supply a correct argument list without having to pull in another header file. (This is the only new warning with C23. Together with the previous fix `a67a49648d`, this makes the whole code compile cleanly under C23.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/95c6a9bf-d306-43d8-b880-664ef08f2944%40eisentraut.org	2024-10-22 08:17:42 +02:00
Amit Langote	11c87216d1	SQL/JSON: Fix some oversights in commit `b6e1157e7` The decision in `b6e1157e7` to ignore raw_expr when evaluating a JsonValueExpr was incorrect. While its value is not ultimately used (since formatted_expr's value is), failing to initialize it can lead to problems, for instance, when the expression tree in raw_expr contains Aggref nodes, which must be initialized to ensure the parent Agg node works correctly. Also, optimize eval_const_expressions_mutator()'s handling of JsonValueExpr a bit. Currently, when formatted_expr cannot be folded into a constant, we end up processing it twice -- once directly in eval_const_expressions_mutator() and again recursively via ece_generic_processing(). This recursive processing is required to handle raw_expr. To avoid the redundant processing of formatted_expr, we now process raw_expr directly in eval_const_expressions_mutator(). Finally, update the comment of JsonValueExpr to describe the roles of raw_expr and formatted_expr more clearly. Bug: #18657 Reported-by: Alexander Lakhin <exclusion@gmail.com> Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org Backpatch-through: 16	2024-10-20 12:20:55 +09:00
Jeff Davis	eecd9138a0	Improve ThrowErrorData() comments for use with soft errors. Reviewed-by: Corey Huinker Discussion: https://postgr.es/m/901ab7cf01957f92ea8b30b6feeb0eacfb7505fc.camel@j-davis.com	2024-10-17 14:56:44 -07:00
Peter Eisentraut	eafda78fc4	Improve node type forward reference Instead of using Node *, we can use an incomplete struct. That way, everything has the correct type and fewer casts are required. This technique is already used elsewhere in node type definitions. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/637eeea8-5663-460b-a114-39572c0f6c6e%40eisentraut.org	2024-10-17 08:36:48 +02:00
Nathan Bossart	d5ca15ee54	Add type cast to foreach_internal's loop variable. C++ requires explicitly casting void pointers to the appropriate pointer type, which means the foreach_ptr macro cannot be used in C++ code without this change. Author: Jelte Fennema-Nio Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/CAGECzQSYG3QfHrc-rOk2KbnB9iJOd7Qu-Xii1s-GTA%3D3JFt49Q%40mail.gmail.com Backpatch-through: 17	2024-10-15 16:20:49 -05:00
Peter Eisentraut	c594f1ad2b	Track scan reversals in MergeJoin The MergeJoin struct was tracking "mergeStrategies", which were an array of btree strategy numbers, purely for the purpose of comparing it later against btree strategies to determine if the scan direction was forward or reverse. Change that. Instead, track "mergeReversals", an array of bool, to indicate the same without an unfortunate assumption that a strategy number refers specifically to a btree strategy. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:18 +02:00
Peter Eisentraut	0d2aa4d493	Track sort direction in SortGroupClause Functions make_pathkey_from_sortop() and transformWindowDefinitions(), which receive a SortGroupClause, were determining the sort order (ascending vs. descending) by comparing that structure's operator strategy to BTLessStrategyNumber, but could just as easily have gotten it from the SortGroupClause object, if it had such a field, so add one. This reduces the number of places that hardcode the assumption that the strategy refers specifically to a btree strategy, rather than some other index AM's operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:02 +02:00
Álvaro Herrera	fd64ed60b6	Unbreak overflow test for attinhcount/coninhcount Commit `90189eefc1` narrowed pg_attribute.attinhcount and pg_constraint.coninhcount from 32 to 16 bits, but kept other related structs with 32-bit wide fields: ColumnDef and CookedConstraint contain an int 'inhcount' field which is itself checked for overflow on increments, but there's no check that the values aren't above INT16_MAX before assigning to the catalog columns. This means that a creative user can get a inconsistent table definition and override some protections. Fix it by changing those other structs to also use int16. Also, modernize style by using pg_add_s16_overflow for overflow testing instead of checking for negative values. We also have Constraint.inhcount, which is here removed completely. This was added by commit `b0e96f3119` and not removed by its revert at `6f8bb7c1e9`. It is not needed by the upcoming not-null constraints patch. This is mostly academic, so we agreed not to backpatch to avoid ABI problems. Bump catversion because of the changes to parse nodes. Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Co-authored-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202410081611.up4iyofb5ie7@alvherre.pgsql	2024-10-10 17:41:01 +02:00
Michael Paquier	de3a2ea3b2	Introduce two fields in EState to track parallel worker activity These fields can be set by executor nodes to record how many parallel workers were planned to be launched and how many of them have been actually launched within the number initially planned. This data is able to give an approximation of the parallel worker draught a system is facing, making easier the tuning of related configuration parameters. These fields will be used by some follow-up patches to populate other parts of the system with their data. Author: Guillaume Lelarge, Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com	2024-10-09 08:07:48 +09:00
Michael Paquier	dc68515968	Show values of SET statements as constants in pg_stat_statements This is a continuation of work like `11c34b342b`, done to reduce the bloat of pg_stat_statements by applying more normalization to query entries. This commit is able to detect and normalize values in VariableSetStmt, resulting in: SET conf_param = $1 Compared to other parse nodes, VariableSetStmt is embedded in much more places in the parser, impacting many query patterns in pg_stat_statements. A custom jumble function is used, with an extra field in the node to decide if arguments should be included in the jumbling or not, a location field being not enough for this purpose. This approach allows for a finer tuning. Clauses relying on one or more keywords are not normalized, for example: * DEFAULT * FROM CURRENT * List of keywords. SET SESSION CHARACTERISTICS AS TRANSACTION, where it is critical to differentiate different sets of options, is a good example of why normalization should not happen. Some queries use VariableSetStmt for some subclauses with SET, that also have their values normalized: - ALTER DATABASE - ALTER ROLE - ALTER SYSTEM - CREATE/ALTER FUNCTION `ba90eac7a9` has added test coverage for most of the existing SET patterns. The expected output of these tests shows the difference this commit creates. Normalization could be perhaps applied to more portions of the grammar but what is done here is conservative, and good enough as a starting point. Author: Greg Sabino Mullane, Michael Paquier Discussion: https://postgr.es/m/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com Discussion: https://postgr.es/m/B44FA29D-EBD0-4DD9-ABC2-16F1CB087074@amazon.com Discussion: https://postgr.es/m/CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com	2024-09-30 14:02:00 +09:00
Noah Misch	aac2c9b4fd	For inplace update durability, make heap_update() callers wait. The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit `86dc90056d`, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com	2024-09-24 15:25:18 -07:00
Peter Eisentraut	89f908a6d0	Add temporal FOREIGN KEY contraints Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality. This feature matches the behavior of the SQL standard temporal foreign keys, but it works on PostgreSQL's native ranges instead of SQL's "periods", which don't exist in PostgreSQL (yet). Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are not supported yet. (previously committed as `34768ee361`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as `46a0cd4cef`, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Richard Guo	247dea89f7	Introduce an RTE for the grouping step If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:35:34 +09:00
Amit Langote	3422f5f93f	Update comment about ExprState.escontext The updated comment provides more helpful guidance by mentioning that escontext should be set when soft error handling is needed. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-09-06 10:13:53 +09:00
David Rowley	908a968612	Optimize WindowAgg's use of tuplestores When WindowAgg finished one partition of a PARTITION BY, it previously would call tuplestore_end() to purge all the stored tuples before again calling tuplestore_begin_heap() and carefully setting up all of the tuplestore read pointers exactly as required for the given frameOptions. Since the frameOptions don't change between partitions, this part does not make much sense. For queries that had very few rows per partition, the overhead of this was very large. It seems much better to create the tuplestore and the read pointers once and simply call tuplestore_clear() at the end of each partition. tuplestore_clear() moves all of the read pointers back to the start position and deletes all the previously stored tuples. A simple test query with 1 million partitions and 1 tuple per partition has been shown to run around 40% faster than without this change. The additional effort seems to have mostly been spent in malloc/free. Making this work required adding a new bool field to WindowAggState which had the unfortunate effect of being the 9th bool field in a group resulting in the struct being enlarged. Here we shuffle the fields around a little so that the two bool fields for runcondition relating stuff fit into existing padding. Also, move the "runcondition" field to be near those. This frees up enough space with the other bool fields so that the newly added one fits into the padding bytes. This was done to address a very small but apparent performance regression with queries containing a large number of rows per partition. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Discussion: https://postgr.es/m/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com	2024-09-05 16:18:30 +12:00
Alexander Korotkov	3890d90c15	Revert support for ALTER TABLE ... MERGE/SPLIT PARTITION(S) commands This commit reverts `1adf16b8fb`, `87c21bb941`, and subsequent fixes and improvements including `df64c81ca9`, `c99ef1811a`, `9dfcac8e15`, `885742b9f8`, `842c9b2705`, `fcf80c5d5f`, `96c7381c4c`, `f4fc7cb54b`, `60ae37a8bc`, `259c96fa8f`, `449cdcd486`, `3ca43dbbb6`, `2a679ae94e`, `3a82c689fd`, `fbd4321fd5`, `d53a4286d7`, `c086896625`, `4e5d6c4091`, `04158e7fa3`. The reason for reverting is security issues related to repeatable name lookups (CVE-2014-0062). Even though `04158e7fa3` solved part of the problem, there are still remaining issues, which aren't feasible to even carefully analyze before the RC deadline. Reported-by: Noah Misch, Robert Haas Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Backpatch-through: 17	2024-08-24 18:48:48 +03:00

1 2 3 4 5 ...

2060 commits