postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-15 22:10:45 -04:00

Author	SHA1	Message	Date
Etsuro Fujita	a0bf7a0ecc	Remove new structure member from ResultRelInfo. In commit `ffbb7e65a`, I added a ModifyTableState member to ResultRelInfo to save the owning ModifyTableState for use by nodeModifyTable.c when performing batch inserts, but as pointed out by Tom Lane, that changed the array stride of es_result_relations, and that would break any previously-compiled extension code that accesses that array. Fix by removing that member from ResultRelInfo and instead adding a List member at the end of EState to save such ModifyTableStates. Per report from Tom Lane. Back-patch to v14, like the previous commit; I chose to apply the patch to HEAD as well, to make back-patching easy. Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us	2022-12-08 16:15:01 +09:00
David Rowley	2a535620ce	Fix 32-bit build dangling pointer issue in WindowAgg `9d9c02ccd` added window "run conditions", which allows the evaluation of monotonic window functions to be skipped when the run condition is no longer true. Prior to this commit, once the run condition was no longer true and we stopped evaluating the window functions, we simply just left the ecxt_aggvalues[] and ecxt_aggnulls[] arrays alone to store whatever value was stored there the last time the window function was evaluated. Leaving a stale value in there isn't really a problem on 64-bit builds as all of the window functions which we recognize as monotonic all return int8, which is passed by value on 64-bit builds. However, on 32-bit builds, this was a problem as the value stored in the ecxt_aggvalues[] element would be a by-ref value and it would be pointing to some memory which would get reset once the tuple context is destroyed. Since the WindowAgg node will output these values in the resulting tupleslot, this could be problematic for the top-level WindowAgg node which must look at these values to filter out the rows that don't meet its filter condition. Here we fix this by just zeroing the ecxt_aggvalues[] and setting the ecxt_aggnulls[] array to true when the run condition first becomes false. This results in the WindowAgg's output having NULLs for the WindowFunc's columns rather than a stale value or a pointer to possibly freed memory. These tuples with the NULLs can only make it as far as the top-level WindowAgg node before they're filtered out. To ensure that these tuples are always filtered out, we now insist that OpExprs making up the run condition are strict OpExprs. Currently, all the window functions which the planner recognizes as monotonic return INT8 and the operator which is used for the run condition must be a member of a btree opclass. In reality, these restrictions exclude nothing that's built-in to Postgres and are unlikely to exclude anyone's custom operators due to the requirement that the operator is part of a btree opclass. It would be unusual if those were not strict. Reported-by: Sergey Shinderuk, using valgrind Reviewed-by: Richard Guo, Sergey Shinderuk Discussion: https://postgr.es/m/29184c50-429a-ebd7-f1fb-0589c6723a35@postgrespro.ru Backpatch-through: 15, where `9d9c02ccd` was added	2022-12-07 00:10:21 +13:00
Tom Lane	5dfc2b753b	Prevent clobbering of utility statements in SQL function caches. This is an oversight in commit `7c337b6b5`: I apparently didn't think about the possibility of a SQL function being executed multiple times within a query. In that case, functions.c's primitive caching mechanism allows the same utility parse tree to be presented for execution more than once. We have to tell ProcessUtility to make a working copy of the parse tree, or bad things happen. Normally I'd add a regression test, but I think the reported crasher is dependent on some rather random implementation choices that are nowhere near functions.c, so its usefulness as a long-lived test feels questionable. In any case, this fix is clearly correct given the design choices of `7c337b6b5`. Per bug #17702 from Xin Wen. Thanks to Daniel Gustafsson for analysis. Back-patch to v14 where the faulty commit came in (before that, the responsibility for copying scribble-able utility parse trees lay elsewhere). Discussion: https://postgr.es/m/17702-ad24fdcdd1e9047a@postgresql.org	2022-11-29 11:46:33 -05:00
Etsuro Fujita	fc02019c09	Fix handling of pending inserts in nodeModifyTable.c. Commit `b663a4136`, which allowed FDWs to INSERT rows in bulk, added to nodeModifyTable.c code to flush pending inserts to the foreign-table result relation(s) before completing processing of the ModifyTable node, but the code failed to take into account the case where the INSERT query has modifying CTEs, leading to incorrect results. Also, that commit failed to flush pending inserts before firing BEFORE ROW triggers so that rows are visible to such triggers. In that commit we scanned through EState's es_tuple_routing_result_relations or es_opened_result_relations list to find the foreign-table result relations to which pending inserts are flushed, but that would be inefficient in some cases. So to fix, 1) add a List member to EState to record the insert-pending result relations, and 2) modify nodeModifyTable.c so that it adds the foreign-table result relation to the list in ExecInsert() if appropriate, and flushes pending inserts properly using the list where needed. While here, fix a copy-and-pasteo in a comment in ExecBatchInsert(), which was added by that commit. Back-patch to v14 where that commit appeared. Discussion: https://postgr.es/m/CAPmGK16qutyCmyJJzgQOhfBq%3DNoGDqTB6O0QBZTihrbqre%2BoxA%40mail.gmail.com	2022-11-25 17:45:01 +09:00
Alvaro Herrera	3d45edcef0	Fix MERGE tuple count with DO NOTHING Reporting tuples for which nothing is done is useless and goes against the documented behavior, so don't do it. Backpatch to 15. Reported by: Luca Ferrari Discussion: https://postgr.es/m/CAKoxK+42MmACUh6s8XzASQKizbzrtOGA6G1UjzCP75NcXHsiNw@mail.gmail.com	2022-11-17 18:56:11 +01:00
Alvaro Herrera	cefe182533	Fix outdated comment in ExecDelete This comment references a struct that disappeared before MERGE was merged ... and ExecDelete is not called by the committed MERGE anyway. Revert to the original wording. Backpatch to 15	2022-11-17 12:52:20 +01:00
Daniel Gustafsson	1eaa48e998	doc: Fix wording of MERGE actions in README UPDATE was listed twice and DELETE was omitted, replace one UPDATE with DELETE instead. Backpatch through v15 where MERGE was added. Author: Myo Wai Thant <myo.waithant@fujitsu.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/OSAPR01MB43247E46931E9E9CFC4AA0F29A079@OSAPR01MB4324.jpnprd01.prod.outlook.com Backpatch-through: 15	2022-11-17 10:07:06 +01:00
Etsuro Fujita	d5e1748f02	Fix copy-and-pasteo in comment.	2022-11-02 18:15:01 +09:00
Etsuro Fujita	d460faf002	Update comment in ExecInsert() regarding batch insertion. Remove the stale text that is a leftover from an earlier version of the patch to add support for batch insertion, and adjust the wording in the remaining text. Back-patch to v14 where batch insertion came in. Review and wording adjustment by Tom Lane. Discussion: https://postgr.es/m/CAPmGK14goatHPHQv2Aeu_UTKqZ%2BBO%2BP%2Bzd3HKv5D%2BdyyfWKDSw%40mail.gmail.com	2022-09-29 16:55:01 +09:00
David Rowley	f7ae8a2e18	Restrict Datum sort optimization to byval types only `91e9e89dc` modified nodeSort.c so that it used datum sorts when the targetlist of the outer node contained only a single column. That commit failed to recognise that the Datum returned by tuplesort_getdatum() must be pfree'd when the type is a byref type. Ronan Dunklau did originally propose the patch with that restriction, but that, probably through my own fault, got lost during further development work. Due to the timing of this report (PG15 RC1 is almost out the door), let's just restrict the datum sort optimization to apply for byval types only. We might want to look harder into making this work for byref types in PG16. Reported-by: Önder Kalacı Diagnosis-by: Tom Lane Discussion: https://postgr.es/m/CACawEhVxe0ufR26UcqtU7GYGRuubq3p6ZWPGXL4cxy_uexpAAQ@mail.gmail.com Backpatch-through: 15, where `91e9e89dc` was introduced.	2022-09-29 11:43:40 +13:00
Peter Eisentraut	517484b582	Message style improvements	2022-09-24 18:38:35 -04:00
Tom Lane	c403f97b4e	Future-proof the recursion inside ExecShutdownNode(). The API contract for planstate_tree_walker() callbacks is that they take a PlanState pointer and a context pointer. Somebody figured they could save a couple lines of code by ignoring that, and passing ExecShutdownNode itself as the walker even though it has but one argument. Somewhat remarkably, we've gotten away with that so far. However, it seems clear that the upcoming C2x standard means to forbid such cases, and compilers that actively break such code likely won't be far behind. So spend the extra few lines of code to do it honestly with a separate walker function. In HEAD, we might as well go further and remove ExecShutdownNode's useless return value. I left that as-is in back branches though, to forestall complaints about ABI breakage. Back-patch, with the thought that this might become of practical importance before our stable branches are all out of service. It doesn't seem to be fixing any live bug on any currently known platform, however. Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us	2022-09-19 12:16:02 -04:00
Andrew Dunstan	96ef3237bf	Revert SQL/JSON features This reverts the following and makes some associated cleanups: commit `f79b803dc`: Common SQL/JSON clauses commit `f4fb45d15`: SQL/JSON constructors commit `5f0adec25`: Make STRING an unreserved_keyword. commit `33a377608`: IS JSON predicate commit `1a36bc9db`: SQL/JSON query functions commit `606948b05`: SQL JSON functions commit `49082c2cc`: RETURNING clause for JSON() and JSON_SCALAR() commit `4e34747c8`: JSON_TABLE commit `fadb48b00`: PLAN clauses for JSON_TABLE commit `2ef6f11b0`: Reduce running time of jsonb_sqljson test commit `14d3f24fa`: Further improve jsonb_sqljson parallel test commit `a6baa4bad`: Documentation for SQL/JSON features commit `b46bcf7a4`: Improve readability of SQL/JSON documentation. commit `112fdb352`: Fix finalization for json_objectagg and friends commit `fcdb35c32`: Fix transformJsonBehavior commit `4cd8717af`: Improve a couple of sql/json error messages commit `f7a605f63`: Small cleanups in SQL/JSON code commit `9c3d25e17`: Fix JSON_OBJECTAGG uniquefying bug commit `a79153b7a`: Claim SQL standard compliance for SQL/JSON features commit `a1e7616d6`: Rework SQL/JSON documentation commit `8d9f9634e`: Fix errors in copyfuncs/equalfuncs support for JSON node types. commit `3c633f32b`: Only allow returning string types or bytea from json_serialize commit `67b26703b`: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size. The release notes are also adjusted. Backpatch to release 15. Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org	2022-09-01 17:10:42 -04:00
Amit Kapila	76d2579259	Fix replica identity check for a partitioned table. The current publisher code checks if UPDATE or DELETE can be executed with the replica identity of the table even if it's a partitioned table. We can skip checking the replica identity for partitioned tables because the operations are actually performed on the leaf partitions (not the partitioned table). Reported-by: Brad Nicholson Author: Hou Zhijie Reviewed-by: Peter Smith, Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAMMnM%3D8i5DohH%3DYKzV0_wYuYSYvuOJoL9F5nzXTc%2ByzsG1f6rg%40mail.gmail.com	2022-08-16 15:14:27 +05:30
Tom Lane	aee9543736	Avoid misbehavior when hash_table_bytes < bucket_size. It's possible to reach this case when work_mem is very small and tupsize is (relatively) very large. In that case ExecChooseHashTableSize would get an assertion failure, or with asserts off it'd compute nbuckets = 0, which'd likely cause misbehavior later (I've not checked). To fix, clamp the number of buckets to be at least 1. This is due to faulty conversion of old my_log2() coding in `28d936031`. Back-patch to v13, as that was. Zhang Mingli Discussion: https://postgr.es/m/beb64ca0-91e2-44ac-bf4a-7ea36275ec02@Spark	2022-08-13 16:59:58 -04:00
Tom Lane	ad3e07c156	Fix handling of R/W expanded datums that are passed to SQL functions. fmgr_sql must make expanded-datum arguments read-only, because it's possible that the function body will pass the argument to more than one callee function. If one of those functions takes the datum's R/W property as license to scribble on it, then later callees will see an unexpected value, leading to wrong answers. From a performance standpoint, it'd be nice to skip this in the common case that the argument value is passed to only one callee. However, detecting that seems fairly hard, and certainly not something that I care to attempt in a back-patched bug fix. Per report from Adam Mackler. This has been broken since we invented expanded datums, so back-patch to all supported branches. Discussion: https://postgr.es/m/WScDU5qfoZ7PB2gXwNqwGGgDPmWzz08VdydcPFLhOwUKZcdWbblbo-0Lku-qhuEiZoXJ82jpiQU4hOjOcrevYEDeoAvz6nR0IU4IHhXnaCA=@mackler.email Discussion: https://postgr.es/m/187436.1660143060@sss.pgh.pa.us	2022-08-10 13:37:25 -04:00
Tom Lane	3419d51e19	Fix check_exclusion_or_unique_constraint for UNIQUE NULLS NOT DISTINCT. Adjusting this function was overlooked in commit `94aa7cc5f`. The only visible symptom (so far) is that INSERT ... ON CONFLICT could go into an endless loop when inserting a null that has a conflict. Richard Guo and Tom Lane, per bug #17558 from Andrew Kesper Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org	2022-08-04 14:16:26 -04:00
Tom Lane	4c7b16312e	Add CHECK_FOR_INTERRUPTS in ExecInsert's speculative insertion loop. Ordinarily the functions called in this loop ought to have plenty of CFIs themselves; but we've now seen a case where no such CFI is reached, making the loop uninterruptible. Even though that's from a recently-introduced bug, it seems prudent to install a CFI at the loop level in all branches. Per discussion of bug #17558 from Andrew Kesper (an actual fix for that bug will follow). Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org	2022-08-04 14:10:06 -04:00
Peter Eisentraut	b681ca7635	Add another SQL/JSON error code A code comment said that the standard does not define a number for ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE, but this was fixed in a later draft version of the standard, so use that number now.	2022-07-18 14:27:53 +02:00
David Rowley	30efc3b5a3	Remove size increase in ExprEvalStep caused by hashed saops `50e17ad28` increased the size of ExprEvalStep from 64 bytes up to 88 bytes. Lots of effort was spent during the development of the current expression evaluation code to make an instance of this struct as small as possible. Making this struct larger than needed reduces CPU cache efficiency during expression evaluation which causes noticeable slowdowns during query execution. In order to reduce the size of the struct, here we remove the fn_addr field. The values from this field can be obtained via fcinfo, just with some extra pointer dereferencing. The extra indirection does not seem to cause any noticeable slowdowns. Various other fields have been moved into the ScalarArrayOpExprHashTable struct. These fields are only used when the ScalarArrayOpExprHashTable pointer has already been dereferenced, so no additional pointer dereferences occur for these. Here we also make hash_fcinfo_data the last field in ScalarArrayOpExprHashTable so that we can avoid a further pointer dereference to get the FunctionCallInfoBaseData. This also saves a call to palloc(). `50e17ad28` was added in 14, but it's too late to adjust the size of the ExprEvalStep in that version, so here we just backpatch to 15, which is currently in beta. Author: Andres Freund, David Rowley Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Backpatch-through: 15	2022-07-06 19:41:09 +12:00
Andres Freund	5a1ab894f7	expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size. The new expression step types increased the size of ExprEvalStep by ~4 for all types of expression steps, slowing down expression evaluation noticeably. Move them out of line. There's other issues with these expression steps, but addressing them is largely independent of this aspect. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Backpatch: 15-	2022-07-05 11:26:27 -07:00
Tom Lane	1218780cce	Un-break whole-row Vars referencing domain-over-composite types. In commit `ec62cb0aa`, I foolishly replaced ExecEvalWholeRowVar's lookup_rowtype_tupdesc_domain call with just lookup_rowtype_tupdesc, because I didn't see how a domain could be involved there, and there were no regression test cases to jog my memory. But the existing code was correct, so revert that change and add a test case showing why it's necessary. (Note: per comment in struct DatumTupleFields, it is correct to produce an output tuple that's labeled with the base composite type, not the domain; hence just blindly looking through the domain is correct here.) Per bug #17515 from Dan Kubb. Back-patch to v11 where domains over composites became a thing. Discussion: https://postgr.es/m/17515-a24737438363aca0@postgresql.org	2022-06-10 10:35:57 -04:00
David Rowley	fa5185b26c	Harden Memoization code against broken data types Bug #17512 highlighted that a suitably broken data type could cause the backend to crash if either the hash function or equality function were in some way non-deterministic based on their input values. Such a data type could cause a crash of the backend due to some code which assumes that we'll always find a hash table entry corresponding to an item in the Memoize LRU list. Here we remove the assumption that we'll always find the entry corresponding to the given LRU list item and add run-time checks to verify we have found the given item in the cache. This is not a fix for bug #17512, but it will turn the crash reported by that bug report into an internal ERROR. Reported-by: Ales Zeleny Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAApHDvpxFSTwvoYWT7kmFVSZ9zLAeHb=S9vrz=RExMgSkQNWqw@mail.gmail.com Backpatch-through: 14, where Memoize was added.	2022-06-08 12:39:09 +12:00
Tom Lane	a916cb9d5a	Avoid overflow hazard when clamping group counts to "long int". Several places in the planner tried to clamp a double value to fit in a "long" by doing (long) Min(x, (double) LONG_MAX); This is subtly incorrect, because it casts LONG_MAX to double and potentially back again. If long is 64 bits then the double value is inexact, and the platform might round it up to LONG_MAX+1 resulting in an overflow and an undesirably negative output. While it's not hard to rewrite the expression into a safe form, let's put it into a common function to reduce the risk of someone doing it wrong in future. In principle this is a bug fix, but since the problem could only manifest with group count estimates exceeding 2^63, it seems unlikely that anyone has actually hit this or will do so anytime soon. We're fixing it mainly to satisfy fuzzer-type tools. That being the case, a HEAD-only fix seems sufficient. Andrey Lepikhov Discussion: https://postgr.es/m/ebbc2efb-7ef9-bf2f-1ada-d6ec48f70e58@postgrespro.ru	2022-05-21 13:13:44 -04:00
Alvaro Herrera	c4f113e8fe	Clean up newlines following left parentheses Like commit `c9d2977519`.	2022-05-13 23:52:35 +02:00
Tom Lane	3ab9a63cb6	Rename JsonIsPredicate.value_type, fix JSON backend/nodes/ infrastructure. I started out with the intention to rename value_type to item_type to avoid a collision with a typedef name that appears on some platforms. Along the way, I noticed that the adjacent field "format" was not being correctly handled by the backend/nodes/ infrastructure functions: copyfuncs.c erroneously treated it as a scalar, while equalfuncs, outfuncs, and readfuncs omitted handling it at all. This looks like it might be cosmetic at the moment because the field is always NULL after parse analysis; but that's likely a bug in itself, and the code's certainly not very future-proof. Let's fix it while we can still do so without forcing an initdb on beta testers. Further study found a few other inconsistencies in the backend/nodes/ infrastructure for the recently-added JSON node types, so fix those too. catversion bumped because of potential change in stored rules. Discussion: https://postgr.es/m/526703.1652385613@sss.pgh.pa.us	2022-05-13 11:40:08 -04:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
Michael Paquier	45edde037e	Fix typos and grammar in code and test comments This fixes the grammar of some comments in a couple of tests (SQL and TAP), and in some C files. Author: Justin Pryzby Discussion: https://postgr.es/m/20220511020334.GH19626@telsasoft.com	2022-05-11 15:38:55 +09:00
Peter Eisentraut	755df30e48	Fix incorrect format placeholders	2022-04-27 09:49:10 +02:00
Alvaro Herrera	a87e759569	Move ModifyTableContext->lockmode to UpdateContext Should have been done this way to start with, but I failed to notice. This way we avoid some pointless initialization, and better confine the variable to the scope where it is really used. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql	2022-04-20 11:18:04 +02:00
Alvaro Herrera	3dcc6bf406	ExecModifyTable: use context.planSlot instead of planSlot There's no reason to keep a separate local variable when we have a place for it elsewhere. This allows simplifying some code. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql	2022-04-20 10:34:58 +02:00
Alvaro Herrera	24d2b2680a	Remove extraneous blank lines before block-closing braces These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch	2022-04-13 19:16:02 +02:00
Alvaro Herrera	183c869e1c	adjust_partition_colnos mustn't be called if not needed Add an assert to make this very explicit, as well as a code comment. The former should silence Coverity complaining about this. Introduced by `7103ebb7aa`. Reported-by: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqTTAOzXiYybab+1DQOb3ZUuK99=p_KD+yrRFhcDbd0jg@mail.gmail.com	2022-04-12 15:19:57 +02:00
David Rowley	b0e5f02ddc	Fix various typos and spelling mistakes in code comments Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com	2022-04-11 20:49:41 +12:00
Michael Paquier	efb0ef909f	Track I/O timing for temporary file blocks in EXPLAIN (BUFFERS) Previously, the output of EXPLAIN (BUFFERS) option showed only the I/O timing spent reading and writing shared and local buffers. This commit adds on top of that the I/O timing for temporary buffers in the output of EXPLAIN (for spilled external sorts, hashes, materialization, etc.). This can be helpful for users in cases where the I/O related to temporary buffers is the bottleneck. Like its cousin, this information is available only when track_io_timing is enabled. Testing with the patch shows an extra overhead of up to 1%, even when using gettimeofday() as the implementation for interval timings; this is roughly within the usual noise range, but still measurable. Author: Masahiko Sawada Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud, Ranier Vilela Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com	2022-04-08 11:27:21 +09:00
David Rowley	9d9c02ccd1	Teach planner and executor about monotonic window funcs Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT *, row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WindowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count() and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com	2022-04-08 10:34:36 +12:00
Alvaro Herrera	a90641eac2	Revert "Rewrite some RI code to avoid using SPI" This reverts commit `99392cdd78`. We'd prefer to rewrite ri_triggers.c as a whole rather than piecemeal. Discussion: https://postgr.es/m/E1ncXX2-000mFt-Pe@gemulon.postgresql.org	2022-04-07 23:42:13 +02:00
Alvaro Herrera	99392cdd78	Rewrite some RI code to avoid using SPI Modify the subroutines called by RI trigger functions that want to check if a given referenced value exists in the referenced relation to simply scan the foreign key constraint's unique index, instead of using SPI to execute SELECT 1 FROM referenced_relation WHERE ref_key = $1 This saves a lot of work, especially when inserting into or updating a referencing relation. This rewrite allows fixing a PK row visibility bug caused by a partition descriptor hack which requires ActiveSnapshot to be set to come up with the correct set of partitions for the RI query running under REPEATABLE READ isolation. We now set that snapshot independently of the snapshot to be used by the PK index scan, so the two no longer interfere. The buggy output in src/test/isolation/expected/fk-snapshot.out of the relevant test case added by commit `00cb86e75d` has been corrected. (The bug still exists in branch 14, however, but this fix is too invasive to backpatch.) Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Li Japin <japinli@hotmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com	2022-04-07 21:10:03 +02:00
Tomas Vondra	2c7ea57e56	Revert "Logical decoding of sequences" This reverts a sequence of commits, implementing features related to logical decoding and replication of sequences: - `0da92dc530` - `80901b3291` - `b779d7d8fd` - `d5ed9da41d` - `a180c2b34d` - `75b1521dae` - `2d2232933b` - `002c9dd97a` - `05843b1aa4` The implementation has issues, mostly due to combining transactional and non-transactional behavior of sequences. It's not clear how this could be fixed, but it'll require reworking significant part of the patch. Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com	2022-04-07 20:06:36 +02:00
Alvaro Herrera	297daa9d43	Refactor and cleanup runtime partition prune code a little * Move the execution pruning initialization steps that are common between both ExecInitAppend() and ExecInitMergeAppend() into a new function ExecInitPartitionPruning() defined in execPartition.c. Those steps include creation of a PartitionPruneState to be used for all instances of pruning and determining the minimal set of child subplans that need to be initialized by performing initial pruning if needed, and finally adjusting the subplan_map arrays in the PartitionPruneState to reflect the new set of subplans remaining after initial pruning if it was indeed performed. ExecCreatePartitionPruneState() is no longer exported out of execPartition.c and has been renamed to CreatePartitionPruneState() as a local sub-routine of ExecInitPartitionPruning(). * Likewise, ExecFindInitialMatchingSubPlans() that was in charge of performing initial pruning no longer needs to be exported. In fact, since it would now have the same body as the more generally named ExecFindMatchingSubPlans(), except differing in the value of initial_prune passed to the common subroutine find_matching_subplans_recurse(), it seems better to remove it and add an initial_prune argument to ExecFindMatchingSubPlans(). * Add an ExprContext field to PartitionPruneContext to remove the implicit assumption in the runtime pruning code that the ExprContext to use to compute pruning expressions that need one can always rely on the PlanState providing it. A future patch will allow runtime pruning (at least the initial pruning steps) to be performed without the corresponding PlanState yet having been created, so this will help. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com	2022-04-05 11:46:48 +02:00
Andrew Dunstan	4e34747c88	JSON_TABLE This feature allows jsonb data to be treated as a table and thus used in a FROM clause like other tabular data. Data can be selected from the jsonb using jsonpath expressions, and hoisted out of nested structures in the jsonb to form multiple rows, more or less like an outer join. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu (whose name I previously misspelled), Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru	2022-04-04 16:03:47 -04:00
David Rowley	40af10b571	Use Generation memory contexts to store tuples in sorts The general usage pattern when we store tuples in tuplesort.c is that we store a series of tuples one by one then either perform a sort or spill them to disk. In the common case, there is no pfreeing of already stored tuples. For the common case since we do not individually pfree tuples, we have very little need for aset.c memory allocation behavior which maintains freelists and always rounds allocation sizes up to the next power of 2 size. Here we conditionally use generation.c contexts for storing tuples in tuplesort.c when the sort will never be bounded. Unfortunately, the memory context to store tuples is already created by the time any calls would be made to tuplesort_set_bound(), so here we add a new sort option that allows callers to specify if they're going to need a bounded sort or not. We'll use a standard aset.c allocator when this sort option is not set. Extension authors must ensure that the TUPLESORT_ALLOWBOUNDED flag is used when calling tuplesort_begin_* for any sorts that make a call to tuplesort_set_bound(). Author: David Rowley Reviewed-by: Andy Fan Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com	2022-04-04 22:52:35 +12:00
David Rowley	77bae396df	Adjust tuplesort API to have bitwise option flags This replaces the bool flag for randomAccess. An upcoming patch requires adding another option, so instead of breaking the API for that, then breaking it again one day if we add more options, let's just break it once. Any boolean options we add in the future will just make use of an unused bit in the flags. Any extensions making use of tuplesorts will need to update their code to pass TUPLESORT_RANDOMACCESS instead of true for randomAccess. TUPLESORT_NONE can be used for a set of empty options. Author: David Rowley Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf%2B%2B8JJ0paw%2B03dk%2BW25tQEcNQ%40mail.gmail.com	2022-04-04 22:24:59 +12:00
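The switch from a bool argument to bitwise option flags described in `77bae396df` can be illustrated with a minimal C sketch. The names below (`SORTOPT_*`, `has_option`) and the bit values are hypothetical stand-ins chosen for illustration; they are not the actual tuplesort definitions.

```c
#include <stdbool.h>

/* Hypothetical option bits, for illustration only (not PostgreSQL's macros). */
#define SORTOPT_NONE          0          /* empty set of options */
#define SORTOPT_RANDOMACCESS  (1 << 0)   /* replaces the former bool argument */
#define SORTOPT_ALLOWBOUNDED  (1 << 1)   /* a later option takes an unused bit */

/* Return true if 'flag' is set in the caller-supplied option mask. */
bool has_option(int options, int flag)
{
    return (options & flag) != 0;
}
```

A caller that used to pass `true` would pass `SORTOPT_RANDOMACCESS` in this sketch, and further boolean options can be combined with `|` without changing the function signature again.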
Andrew Dunstan	606948b058	SQL JSON functions This patch introduces three SQL standard JSON functions: JSON() (incorrectly mentioned in my commit message for `f4fb45d15c`) JSON_SCALAR() JSON_SERIALIZE() JSON() produces json values from text, bytea, json or jsonb values, and has facilities for handling duplicate keys. JSON_SCALAR() produces a json value from any scalar sql value, including json and jsonb. JSON_SERIALIZE() produces text or bytea from input which contains or represents json or jsonb. For the most part these functions don't add any significant new capabilities, but they will be of use to users wanting standard compliant JSON handling. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-30 16:30:37 -04:00
Andrew Dunstan	1a36bc9dba	SQL/JSON query functions This introduces the SQL/JSON functions for querying JSON data using jsonpath expressions. The functions are: JSON_EXISTS() JSON_QUERY() JSON_VALUE() All of these functions only operate on jsonb. The workaround for now is to cast the argument to jsonb. JSON_EXISTS() tests if the jsonpath expression applied to the jsonb value yields any values. JSON_VALUE() must return a single value, and an error occurs if it tries to return multiple values. JSON_QUERY() must return a json object or array, and there are various WRAPPER options for handling scalar or multi-value results. Both these functions have options for handling EMPTY and ERROR conditions. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-29 16:57:13 -04:00
Andrew Dunstan	33a377608f	IS JSON predicate This patch introduces the SQL standard IS JSON predicate. It operates on text and bytea values representing JSON as well as on the json and jsonb types. Each test has an IS and IS NOT variant. The tests are: IS JSON [VALUE] IS JSON ARRAY IS JSON OBJECT IS JSON SCALAR IS JSON WITH | WITHOUT UNIQUE KEYS These are mostly self-explanatory, but note that IS JSON WITHOUT UNIQUE KEYS is true whenever IS JSON is true, and IS JSON WITH UNIQUE KEYS is true whenever IS JSON is true except if IS JSON OBJECT is true and there are duplicate keys (which is never the case when applied to jsonb values). Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-28 15:37:08 -04:00
Alvaro Herrera	7103ebb7aa	Add support for MERGE SQL command MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example, MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL. MERGE does not support targeting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them. 
Author: Pavan Deolasee <pavan.deolasee@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Amit Langote <amitlangote09@gmail.com> Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql	2022-03-28 16:47:48 +02:00
Andrew Dunstan	f4fb45d15c	SQL/JSON constructors This patch introduces the SQL/JSON standard constructors for JSON: JSON() JSON_ARRAY() JSON_ARRAYAGG() JSON_OBJECT() JSON_OBJECTAGG() For the most part these functions provide facilities that mimic existing json/jsonb functions. However, they also offer some useful additional functionality. In addition to text input, the JSON() function accepts bytea input, which it will decode and construct a json value from. The other functions provide useful options for handling duplicate keys and null values. This series of patches will be followed by a consolidated documentation patch. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-27 17:03:34 -04:00
Andrew Dunstan	f79b803dcc	Common SQL/JSON clauses This introduces some of the building blocks used by the SQL/JSON constructor and query functions. Specifically, it provides node executor and grammar support for the FORMAT JSON [ENCODING foo] clause, and values decorated with it, and for the RETURNING clause. The following SQL/JSON patches will leverage these. Nikita Glukhov (who probably deserves an award for perseverance). Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-27 17:03:33 -04:00
Michael Paquier	411b91360f	Fix comment in execParallel.c `0f61727` has made this comment incorrect. Author: Julien Rouhaud Reviewed-by: Matthias van de Meent Discussion: https://postgr.es/m/20220326160117.qtp5nkuku6cvhcby@jrouhaud	2022-03-27 18:22:22 +09:00
Tomas Vondra	923def9a53	Allow specifying column lists for logical replication This allows specifying an optional column list when adding a table to logical replication. The column list may be specified after the table name, enclosed in parentheses. Columns not included in this list are not sent to the subscriber, allowing the schema on the subscriber to be a subset of the publisher schema. For UPDATE/DELETE publications, the column list needs to cover all REPLICA IDENTITY columns. For INSERT publications, the column list is arbitrary and may omit some REPLICA IDENTITY columns. Furthermore, if the table uses REPLICA IDENTITY FULL, column list is not allowed. The column list can contain only simple column references. Complex expressions, function calls etc. are not allowed. This restriction could be relaxed in the future. During the initial table synchronization, only columns included in the column list are copied to the subscriber. If the subscription has several publications, containing the same table with different column lists, columns specified in any of the lists will be copied. This means all columns are replicated if the table has no column list at all (which is treated as column list with all columns), or when one of the publications is defined as FOR ALL TABLES (possibly IN SCHEMA that matches the schema of the table). For partitioned tables, publish_via_partition_root determines whether the column list for the root or the leaf relation will be used. If the parameter is 'false' (the default), the list defined for the leaf relation is used. Otherwise, the column list for the root partition will be used. Psql commands \dRp+ and \d <table-name> now display any column lists. 
Author: Tomas Vondra, Alvaro Herrera, Rahila Syed Reviewed-by: Peter Eisentraut, Alvaro Herrera, Vignesh C, Ibrar Ahmed, Amit Kapila, Hou zj, Peter Smith, Wang wei, Tang, Shi yu Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com	2022-03-26 01:01:27 +01:00
Tomas Vondra	75b1521dae	Add decoding of sequences to built-in replication This commit adds support for decoding of sequences to the built-in replication (the infrastructure was added by commit `0da92dc530`). The syntax and behavior mostly mimics handling of tables, i.e. a publication may be defined as FOR ALL SEQUENCES (replicating all sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating all sequences in a particular schema) or individual sequences. To publish sequence modifications, the publication has to include 'sequence' action. The protocol is extended with a new message, describing sequence increments. A new system view pg_publication_sequences lists all the sequences added to a publication, both directly and indirectly. Various psql commands (\d and \dRp) are improved to also display publications including a given sequence, or sequences included in a publication. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres Freund, Petr Jelinek Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca	2022-03-24 18:49:27 +01:00
Andrew Dunstan	1460fc5942	Revert "Common SQL/JSON clauses" This reverts commit `865fe4d5df`. This has caused issues with a significant number of buildfarm members	2022-03-22 19:56:14 -04:00
Andrew Dunstan	865fe4d5df	Common SQL/JSON clauses This introduces some of the building blocks used by the SQL/JSON constructor and query functions. Specifically, it provides node executor and grammar support for the FORMAT JSON [ENCODING foo] clause, and values decorated with it, and for the RETURNING clause. The following SQL/JSON patches will leverage these. Nikita Glukhov (who probably deserves an award for perseverance). Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu and Himanshu Upadhyaya. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-22 17:32:54 -04:00
Alvaro Herrera	2d655a08d5	Blind fix for uninitialized memory bug in `ba9a7e3921` Valgrind animal skink shows a crash in this new code. I couldn't reproduce the problem locally, but going by blind code inspection, initializing insert_destrel should be sufficient to fix the problem.	2022-03-20 22:10:24 +01:00
Alvaro Herrera	ba9a7e3921	Enforce foreign key correctly during cross-partition updates When an update on a partitioned table referenced in foreign key constraints causes a row to move from one partition to another, the fact that the move is implemented as a delete followed by an insert on the target partition causes the foreign key triggers to have surprising behavior. For example, a given foreign key's delete trigger which implements the ON DELETE CASCADE clause of that key will delete any referencing rows when triggered for that internal DELETE, although it should not, because the referenced row is simply being moved from one partition of the referenced root partitioned table into another, not being deleted from it. This commit teaches trigger.c to skip queuing such delete trigger events on the leaf partitions in favor of an UPDATE event fired on the root target relation. Doing so is sensible because both the old and the new tuple "logically" belong to the root relation. The after trigger event queuing interface now allows passing the source and the target partitions of a particular cross-partition update when registering the update event for the root partitioned table. Along with the two ctids of the old and the new tuple, the after trigger event now also stores the OIDs of those partitions. The tuples fetched from the source and the target partitions are converted into the root table format, if necessary, before they are passed to the trigger function. The implementation currently has a limitation that only the foreign keys pointing into the query's target relation are considered, not those of its sub-partitioned partitions. That seems like a reasonable limitation, because it sounds rare to have distinct foreign keys pointing to sub-partitioned partitions instead of to the root table. 
This misbehavior stems from commit `f56f8f8da6` (which added support for foreign keys to reference partitioned tables) not paying sufficient attention to commit `2f17844104` (which had introduced cross-partition updates a year earlier). Even though the former commit goes back to Postgres 12, we're not backpatching this fix at this time for fear of destabilizing things too much, and because there are a few ABI breaks in it that we'd have to work around in older branches. It also depends on commit `f4566345cf`, which had its own share of backpatchability issues as well. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Eduard Català <eduard.catala@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com	2022-03-20 18:43:40 +01:00
Alvaro Herrera	a1fc50672c	Fix an outdated and grammatically wrong comment Authored by Amit Langote and myself independently Discussion: https://postgr.es/m/CA+HiwqGCjcH0gG-=tM7hhP7TEDmzrHMHJbPGSHtHgFmx9mnFkg@mail.gmail.com	2022-03-19 19:34:04 +01:00
Tom Lane	ec62cb0aac	Revert applying column aliases to the output of whole-row Vars. In commit `bf7ca1587`, I had the bright idea that we could make the result of a whole-row Var (that is, `foo.*`) track any column aliases that had been applied to the FROM entry the Var refers to. However, that's not terribly logically consistent, because now the output of the Var is no longer of the named composite type that the Var claims to emit. `bf7ca1587` tried to handle that by changing the output tuple values to be labeled with a blessed RECORD type, but that's really pretty disastrous: we can wind up storing such tuples onto disk, whereupon they're not readable by other sessions. The only practical fix I can see is to give up on what `bf7ca1587` tried to do, and say that the column names of tuples produced by a whole-row Var are always those of the underlying named composite type, query aliases or no. While this introduces some inconsistencies, it removes others, so it's not that awful in the abstract. What is kind of awful is to make such a behavioral change in a back-patched bug fix. But corrupt data is worse, so back-patched it will be. (A workaround available to anyone who's unhappy about this is to introduce an extra level of sub-SELECT, so that the whole-row Var is referring to the sub-SELECT's output and not to a named table type. Then the Var is of type RECORD to begin with and there's no issue.) Per report from Miles Delahunty. The faulty commit dates to 9.5, so back-patch to all supported branches. Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us	2022-03-17 18:18:05 -04:00
Alvaro Herrera	25e777cf8e	Split ExecUpdate and ExecDelete into reusable pieces Create subroutines ExecUpdatePrologue / ExecUpdateAct / ExecUpdateEpilogue, and similar for ExecDelete. Introduce a new struct to be used internally in nodeModifyTable.c, dubbed ModifyTableContext, which contains all context information needed to perform these operations, as well as ExecInsert and others. This allows using a different schedule and a different way of evaluating the results of these operations, which can be exploited by a later commit introducing support for MERGE. It also makes ExecUpdate and ExecDelete proper shorter and (hopefully) simpler. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/202202271724.4z7xv3cf46kv@alvherre.pgsql	2022-03-17 11:47:04 +01:00
Michael Paquier	c28839c832	Improve comment in execReplication.c Author: Peter Smith Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CAHut+PuRVf3ghNTg8EV5XOQu6unGSZma0ahsRoz-haaOFZe-1A@mail.gmail.com	2022-03-08 14:29:03 +09:00
Peter Eisentraut	791b1b71da	Parse/analyze function renaming There are three parallel ways to call parse/analyze: with fixed parameters, with variable parameters, and by supplying your own parser callback. Some of the involved functions were confusingly named and made this API structure more confusing. This patch renames some functions to make this clearer: parse_analyze() -> parse_analyze_fixedparams() pg_analyze_and_rewrite() -> pg_analyze_and_rewrite_fixedparams() (Otherwise one might think this variant doesn't accept parameters, but in fact all three ways accept parameters.) pg_analyze_and_rewrite_params() -> pg_analyze_and_rewrite_withcb() (Before, and also when considering pg_analyze_and_rewrite(), one might think this is the only way to pass parameters. Moreover, the parser callback doesn't necessarily need to parse only parameters, it's just one of the things it could do.) parse_fixed_parameters() -> setup_parse_fixed_parameters() parse_variable_parameters() -> setup_parse_variable_parameters() (These functions don't actually do any parsing, they just set up callbacks to use during parsing later.) This patch also adds some const decorations to the fixed-parameters API, so the distinction from the variable-parameters API is more clear. Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com	2022-03-04 14:50:22 +01:00
Tom Lane	12d768e704	Don't use static storage for SaveTransactionCharacteristics(). This is pretty queasy-making on general principles, and the more so once you notice that CommitTransactionCommand() is actually stomping on the values saved by _SPI_commit(). It's okay as long as the active values didn't change during HoldPinnedPortals(); but that's a larger assumption than I think we want to make, especially since the fix is so simple. Discussion: https://postgr.es/m/1533956.1645731245@sss.pgh.pa.us	2022-02-28 12:54:12 -05:00
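The hazard described in `12d768e704` (a second save overwriting values held in static storage) can be sketched in C as follows. Everything here is hypothetical: `TxnSettings`, `save_settings` and `restore_settings` are illustrative names, not the xact.c API. The sketch only shows why caller-owned storage lets nested saves coexist.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the session's active transaction settings. */
typedef struct TxnSettings
{
    int  isolation;
    bool read_only;
} TxnSettings;

TxnSettings active_settings = {1, false};

/* Save into storage owned by the caller rather than a static variable. */
void save_settings(TxnSettings *saved)
{
    *saved = active_settings;
}

void restore_settings(const TxnSettings *saved)
{
    active_settings = *saved;
}

/* Two nested save/restore pairs: the inner save cannot clobber the outer one. */
bool nested_save_is_safe(void)
{
    TxnSettings outer;
    TxnSettings inner;

    active_settings.isolation = 3;
    save_settings(&outer);

    active_settings.isolation = 2;      /* settings change between the saves */
    save_settings(&inner);

    restore_settings(&inner);
    restore_settings(&outer);
    return active_settings.isolation == 3;
}
```

With a single static save area, the inner save in this sketch would have replaced the outer values and the final restore would yield 2 instead of 3.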
Tom Lane	2e517818f4	Fix SPI's handling of errors during transaction commit. SPI_commit previously left it up to the caller to recover from any error occurring during commit. Since that's complicated and requires use of low-level xact.c facilities, it's not too surprising that no caller got it right. Let's move the responsibility for cleanup into spi.c. Doing that requires redefining SPI_commit as starting a new transaction, so that it becomes equivalent to SPI_commit_and_chain except that you get default transaction characteristics instead of preserving the prior transaction's characteristics. We can make this pretty transparent API-wise by redefining SPI_start_transaction() as a no-op. Callers that expect to do something in between might be surprised, but available evidence is that no callers do so. Having made that API redefinition, we can fix this mess by having SPI_commit[_and_chain] trap errors and start a new, clean transaction before re-throwing the error. Likewise for SPI_rollback[_and_chain]. Some cleanup is also needed in AtEOXact_SPI, which was nowhere near smart enough to deal with SPI contexts nested inside a committing context. While plperl and pltcl need no changes beyond removing their now-useless SPI_start_transaction() calls, plpython needs some more work because it hadn't gotten the memo about catching commit/rollback errors in the first place. Such an error resulted in longjmp'ing out of the Python interpreter, which leaks Python stack entries at present and is reported to crash Python 3.11 altogether. Add the missing logic to catch such errors and convert them into Python exceptions. We are probably going to have to back-patch this once Python 3.11 ships, but it's a sufficiently basic change that I'm a bit nervous about doing so immediately. Let's let it bake awhile in HEAD first. 
Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org	2022-02-28 12:45:36 -05:00
Daniel Gustafsson	2313a3ee22	Fix statenames in mergejoin comments In a few places, the state names in the comments did not match the documented state names. Author: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CALNJ-vQVthfQXVqmrHR8BKHtC4fMGbhM1xbvJNJAPexTq_dH=w@mail.gmail.com	2022-02-23 10:54:03 +01:00
Amit Kapila	52e4f0cd47	Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. 
This means all the other filters become redundant if (a) one of the publications has no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com	2022-02-22 08:11:50 +05:30
John Naylor	4b35408f1e	Use bitwise rotate functions in more places There were a number of places in the code that used bespoke bit-twiddling expressions to do bitwise rotation. While we've had pg_rotate_right32() for a while now, we hadn't gotten around to standardizing on that. Do so now. Since many potential call sites look more natural with the "left" equivalent, add that function too. Reviewed by Tom Lane and Yugo Nagata Discussion: https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com	2022-02-20 13:22:08 +07:00
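As a rough illustration of what `4b35408f1e` standardizes, 32-bit left and right rotation can be written in portable C as below. The function names `rotl32` and `rotr32` are hypothetical and the bodies are a generic idiom, not copied from PostgreSQL's `pg_rotate_right32()` or its new "left" counterpart.

```c
#include <stdint.h>

/* Rotate 'word' left by n bits; n is reduced modulo 32 so n == 0 is safe. */
uint32_t rotl32(uint32_t word, unsigned n)
{
    n &= 31;
    return (word << n) | (word >> ((32 - n) & 31));
}

/* Rotate 'word' right by n bits; n is reduced modulo 32 so n == 0 is safe. */
uint32_t rotr32(uint32_t word, unsigned n)
{
    n &= 31;
    return (word >> n) | (word << ((32 - n) & 31));
}
```

Masking the shift counts avoids the undefined behavior of shifting a 32-bit value by 32, which ad hoc rotation expressions can hit when the rotation count is zero.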
Alexander Korotkov	3f74daa8df	Fix memory leak in IndexScan node with reordering Fix ExecReScanIndexScan() to free the referenced tuples while emptying the priority queue. Backpatch to all supported versions. Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com Author: Aliaksandr Kalenik Reviewed-by: Tom Lane, Alexander Korotkov Backpatch-through: 10	2022-02-14 04:17:04 +03:00
Tom Lane	5e26aa641e	Test, don't just Assert, that mergejoin's inputs are in order. There are two Asserts in nodeMergejoin.c that are reachable if the input data is not in the expected order. This seems way too fragile. Alexander Lakhin reported a case where the assertions could be triggered with misconfigured foreign-table partitions, and bitter experience with unstable operating system collation definitions suggests another easy route to hitting them. Neither Assert is in a place where we can't afford one more test-and-branch, so replace 'em with plain test-and-elog logic. Per bug #17395. While the reported symptom is relatively recent, collation changes could happen anytime, so back-patch to all supported branches. Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org	2022-02-05 11:59:29 -05:00
Andres Freund	7c1aead6cb	Fix compiler warning in non-assert builds, introduced in `f862d57057`. Discussion: https://postgr.es/m/20220203183655.ralgkh54sdcgysmn@alap3.anarazel.de Backpatch: 14-, like `f862d57057`	2022-02-03 10:44:26 -08:00
Etsuro Fujita	f862d57057	Further fix for EvalPlanQual with mix of local and foreign partitions. We assume that direct-modify ForeignScan nodes cannot be re-evaluated during EvalPlanQual processing, but the rework for inherited UPDATE/DELETE in commit `86dc90056` changed things, without considering that, so that such ForeignScan nodes get called as part of the EvalPlanQual subtree during EvalPlanQual processing in the case of an inherited UPDATE/DELETE where the inheritance set contains foreign target relations. To avoid re-evaluating such ForeignScan nodes during EvalPlanQual processing, commit `c3928b467` modified nodeForeignscan.c, but the assumption made there that ExecForeignScan() should never be called for such ForeignScan nodes during EvalPlanQual processing turned out to be wrong in some cases, leading to a segmentation fault or a "cannot re-evaluate a Foreign Update or Delete during EvalPlanQual" error. Fix by modifying nodeForeignscan.c further to avoid re-evaluating such ForeignScan nodes even in ExecForeignScan()/ExecReScanForeignScan() during EvalPlanQual processing. Since this makes non-reachable the test-and-elog added to ForeignNext() by commit `c3928b467` that produced the aforesaid error, convert the test-and-elog to an Assert. Per bug #17355 from Alexander Lakhin. Back-patch to v14 where both commits came in. Patch by me, reviewed and tested by Alexander Lakhin and Amit Langote. Discussion: https://postgr.es/m/17355-de8e362eb7001a96@postgresql.org	2022-02-03 15:15:00 +09:00
Etsuro Fujita	eabcfd99ed	Fix typo in comment.	2022-01-28 15:45:00 +09:00
Peter Geoghegan	db6736c93c	Fix memory leak in indexUnchanged hint mechanism. Commit `9dc718bd` added a "logically unchanged by UPDATE" hinting mechanism, which is currently used within nbtree indexes only (see commit `d168b666`). This mechanism determined whether or not the incoming item is a logically unchanged duplicate (a duplicate needed only for MVCC versioning purposes) once per row updated per non-HOT update. This approach led to memory leaks which were noticeable with an UPDATE statement that updated sufficiently many rows, at least on tables that happen to have an expression index. On HEAD, fix the issue by adding a cache to the executor's per-index IndexInfo struct. Take a different approach on Postgres 14 to avoid an ABI break: simply pass down the hint to all indexes unconditionally with non-HOT UPDATEs. This is deemed acceptable because the hint is currently interpreted within btinsert() as "perform a bottom-up index deletion pass if and when the only alternative is splitting the leaf page -- prefer to delete any LP_DEAD-set items first". nbtree must always treat the hint as a noisy signal about what might work, as a strategy of last resort, with costs imposed on non-HOT updaters. (The same thing might not be true within another index AM that applies the hint, which is why the original behavior is preserved on HEAD.) Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Klaudie Willis <Klaudie.Willis@protonmail.com> Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/261065.1639497535@sss.pgh.pa.us Backpatch: 14-, where the hinting mechanism was added.	2022-01-12 15:41:04 -08:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Tom Lane	9a3ddeb519	Fix index-only scan plans, take 2. Commit `4ace45677` failed to fix the problem fully, because the same issue of attempting to fetch a non-returnable index column can occur when rechecking the indexqual after using a lossy index operator. Moreover, it broke EXPLAIN for such indexquals (which indicates a gap in our test cases :-(). Revert the code changes of `4ace45677` in favor of adding a new field to struct IndexOnlyScan, containing a version of the indexqual that can be executed against the index-returned tuple without using any non-returnable columns. (The restrictions imposed by check_index_only guarantee this is possible, although we may have to recompute indexed expressions.) Support construction of that during setrefs.c processing by marking IndexOnlyScan.indextlist entries as resjunk if they can't be returned, rather than removing them entirely. (We could alternatively require setrefs.c to look up the IndexOptInfo again, but abusing resjunk this way seems like a reasonably safe way to avoid needing to do that.) This solution isn't great from an API-stability standpoint: if there are any extensions out there that build IndexOnlyScan structs directly, they'll be broken in the next minor releases. However, only a very invasive extension would be likely to do such a thing. There's no change in the Path representation, so typical planner extensions shouldn't have a problem. As before, back-patch to all supported branches. Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org	2022-01-03 15:42:27 -05:00
Tom Lane	bbc227e951	Always use ReleaseTupleDesc after lookup_rowtype_tupdesc et al. The API spec for lookup_rowtype_tupdesc previously said you could use either ReleaseTupleDesc or DecrTupleDescRefCount. However, the latter choice means the caller must be certain that the returned tupdesc is refcounted. I don't recall right now whether that was always true when this spec was written, but it's certainly not always true since we introduced shared record typcaches for parallel workers. That means that callers using DecrTupleDescRefCount are dependent on typcache behavior details that they probably shouldn't be. Hence, change the API spec to say that you must call ReleaseTupleDesc, and fix the half-dozen callers that weren't. AFAICT this is just future-proofing, there's no live bug here. So no back-patch. Per gripe from Chapman Flack. Discussion: https://postgr.es/m/61B901A4.1050808@anastigmatix.net	2021-12-15 18:58:20 -05:00
Tom Lane	3804539e48	Replace random(), pg_erand48(), etc with a better PRNG API and algorithm. Standardize on xoroshiro128 as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128 is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo	2021-11-28 21:33:07 -05:00
David Rowley	411137a429	Flush Memoize cache when non-key parameters change, take 2 It's possible that a subplan below a Memoize node contains a parameter from above the Memoize node. If this parameter changes then cache entries may become out-dated due to the new parameter value. Previously Memoize was mistakenly not aware of this. We fix this here by flushing the cache whenever a parameter that's not part of the cache key changes. Bug: #17213 Reported by: Elvis Pranskevichus Author: David Rowley Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org Backpatch-through: 14, where Memoize was added	2021-11-24 23:29:14 +13:00
David Rowley	dad20ad470	Revert "Flush Memoize cache when non-key parameters change" This reverts commit `1050048a31`.	2021-11-24 15:27:43 +13:00
David Rowley	1050048a31	Flush Memoize cache when non-key parameters change It's possible that a subplan below a Memoize node contains a parameter from above the Memoize node. If this parameter changes then cache entries may become out-dated due to the new parameter value. Previously Memoize was mistakenly not aware of this. We fix this here by flushing the cache whenever a parameter that's not part of the cache key changes. Bug: #17213 Reported by: Elvis Pranskevichus Author: David Rowley Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org Backpatch-through: 14, where Memoize was added	2021-11-24 14:56:18 +13:00
David Rowley	e502150f7d	Allow Memoize to operate in binary comparison mode Memoize would always use the hash equality operator for the cache key types to determine if the current set of parameters were the same as some previously cached set. Certain types such as floating points where -0.0 and +0.0 differ in their binary representation but are classed as equal by the hash equality operator may cause problems as unless the join uses the same operator it's possible that whichever join operator is being used would be able to distinguish the two values. In that case we may accidentally return incorrect rows out of the cache. To fix this here we add a binary mode to Memoize to allow it to compare the current set of parameters to previously cached values by comparing bit-by-bit rather than logically using the hash equality operator. This binary mode is always used for LATERAL joins and it's used for normal joins when any of the join operators are not hashable. Reported-by: Tom Lane Author: David Rowley Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us Backpatch-through: 14, where Memoize was added	2021-11-24 10:06:59 +13:00
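The -0.0 versus +0.0 case described in the Memoize commit above can be illustrated in plain C. This is a standalone sketch, not PostgreSQL code: the helper names `logically_equal`, `binary_equal`, and `check_zero_signs` are hypothetical, and it assumes IEEE 754 doubles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Logical comparison: IEEE 754 treats -0.0 and +0.0 as equal. */
static bool
logically_equal(double a, double b)
{
    return a == b;
}

/* Binary comparison: compares the stored bytes, so the sign bit matters. */
static bool
binary_equal(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* The two zeros match logically but not bit-by-bit. */
static void
check_zero_signs(void)
{
    assert(logically_equal(0.0, -0.0));
    assert(!binary_equal(0.0, -0.0));
}
```

A cache keyed with the logical comparison would treat the two zeros as the same entry, while a join operator that can tell them apart would expect different rows. Comparing the raw bytes avoids that mismatch.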
Tom Lane	01fc652703	Fix variable lifespan in ExecInitCoerceToDomain(). This undoes a mistake in `1ec7679f1`: domainval and domainnull were meant to live across loop iterations, but they were incorrectly moved inside the loop. The effect was only to emit useless extra EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless, back-patch to v13 where the mistake was introduced. Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqXuhbkaAp-sGH6dR6Nsq7v28_0TPexHOm6FiDYqwQD-w@mail.gmail.com	2021-11-02 13:36:47 -04:00
Tom Lane	e9d9ba2a4d	Avoid some other O(N^2) hazards in list manipulation. In the same spirit as `6301c3ada`, fix some more places where we were using list_delete_first() in a loop and thereby risking O(N^2) behavior. It's not clear that the lists manipulated in these spots can get long enough to be really problematic ... but it's not clear that they can't, either, and the fixes are simple enough. As before, back-patch to v13. Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com	2021-11-01 16:24:39 -04:00
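The O(N^2) hazard mentioned in the commit above can be sketched as follows, assuming an array-backed list in which removing the head shifts the remaining elements down. The `IntList` type and all function names here are hypothetical and only model the cost; they are not the PostgreSQL List API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical array-backed list of ints with a fixed capacity. */
typedef struct IntList
{
    int     items[64];
    size_t  length;
} IntList;

/*
 * Remove the first element by shifting the rest down one slot.
 * Returns the number of elements moved, which is O(N) per call.
 */
static size_t
delete_first(IntList *list)
{
    size_t  moved;

    assert(list->length > 0);
    moved = list->length - 1;
    memmove(&list->items[0], &list->items[1], moved * sizeof(int));
    list->length--;
    return moved;
}

/* Drain the list with delete_first in a loop: N*(N-1)/2 moves in total. */
static size_t
drain_by_delete_first(IntList *list)
{
    size_t  total_moves = 0;

    while (list->length > 0)
        total_moves += delete_first(list);
    return total_moves;
}

/* Visit each element by index and truncate once at the end: no moves. */
static size_t
drain_by_index(IntList *list, long *sum)
{
    size_t  i;

    for (i = 0; i < list->length; i++)
        *sum += list->items[i];
    list->length = 0;
    return 0;
}
```

For a list of 10 elements the first approach moves 9 + 8 + ... + 0 = 45 elements, while the index walk moves none. The gap grows quadratically with the list length.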
Tom Lane	3e310d837a	Fix assignment to array of domain over composite. An update such as "UPDATE ... SET fld[n].subfld = whatever" failed if the array elements were domains rather than plain composites. That's because isAssignmentIndirectionExpr() failed to cope with the CoerceToDomain node that would appear in the expression tree in this case. The result would typically be a crash, and even if we accidentally didn't crash, we'd not correctly preserve other fields of the same array element. Per report from Onder Kalaci. Back-patch to v11 where arrays of domains came in. Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com	2021-10-19 13:54:45 -04:00
Heikki Linnakangas	c4649cce39	Refactor LogicalTapeSet/LogicalTape interface. All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now take a LogicalTape as argument, instead of LogicalTapeSet+tape number. You can create any number of LogicalTapes in a single LogicalTapeSet, and you don't need to decide the number upfront, when you create the tape set. This makes the tape management in hash agg spilling in nodeAgg.c simpler. Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor	2021-10-18 14:46:01 +03:00
Robert Haas	46846433a0	shm_mq: Update mq_bytes_written less often. Do not update shm_mq's mq_bytes_written until we have written an amount of data greater than 1/4th of the ring size, unless the caller of shm_mq_send(v) requests a flush at the end of the message. This reduces the number of calls to SetLatch(), and also the number of CPU cache misses, considerably, and thus makes shm_mq significantly faster. Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some minor cosmetic changes by me. Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com	2021-10-14 16:13:36 -04:00
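The deferred-update rule in the shm_mq commit above can be modeled with a small counter sketch. The `SketchQueue` struct and `sketch_send` function are hypothetical; the model ignores ring capacity, wraparound, and real latches, and only counts how often the shared counter would be published.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sender's view of the queue. */
typedef struct SketchQueue
{
    size_t      ring_size;      /* total ring buffer size in bytes */
    size_t      bytes_written;  /* shared counter seen by the receiver */
    size_t      pending_bytes;  /* written locally, not yet published */
    unsigned    notify_count;   /* how often the receiver was signaled */
} SketchQueue;

/*
 * Record nbytes as written.  Publish the shared counter, and signal the
 * receiver, only when more than 1/4 of the ring is pending or the caller
 * explicitly asks for a flush.
 */
static void
sketch_send(SketchQueue *q, size_t nbytes, bool force_flush)
{
    assert(q->ring_size > 0);

    q->pending_bytes += nbytes;
    if (force_flush || q->pending_bytes > q->ring_size / 4)
    {
        q->bytes_written += q->pending_bytes;
        q->pending_bytes = 0;
        q->notify_count++;
    }
}
```

With a 1024-byte ring, three 100-byte sends cause a single publish (on the third, once 300 exceeds 256) rather than three, which is the kind of reduction in signaling the commit message describes.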
Michael Paquier	68f7c4b57a	Clean up more code using "(expr) ? true : false" This is similar to `fd0625c`, taking care of any remaining code paths that are worth the cleanup. This also changes some cases using opposite expression patterns. Author: Justin Pryzby, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCdF8dnUvr-BUWWGvA_XhKSoANacBMZb6jKyCk4TYfQ2Q@mail.gmail.com	2021-10-11 09:36:42 +09:00
Tom Lane	a0558cfa39	Fix checking of query type in plpgsql's RETURN QUERY command. Prior to v14, we insisted that the query in RETURN QUERY be of a type that returns tuples. (For instance, INSERT RETURNING was allowed, but not plain INSERT.) That happened indirectly because we opened a cursor for the query, so spi.c checked SPI_is_cursor_plan(). As a consequence, the error message wasn't terribly on-point, but at least it was there. Commit `2f48ede08` lost this detail. Instead, plain RETURN QUERY insisted that the query be a SELECT (by checking for SPI_OK_SELECT) while RETURN QUERY EXECUTE failed to check the query type at all. Neither of these changes was intended. The only convenient place to check this in the EXECUTE case is inside _SPI_execute_plan, because we haven't done parse analysis until then. So we need to pass down a flag saying whether to enforce that the query returns tuples. Fortunately, we can squeeze another boolean into struct SPIExecuteOptions without an ABI break, since there's padding space there. (It's unlikely that any extensions would already be using this new struct, but preserving ABI in v14 seems like a smart idea anyway.) Within spi.c, it seemed like _SPI_execute_plan's parameter list was already ridiculously long, and I didn't want to make it longer. So I thought of passing SPIExecuteOptions down as-is, allowing that parameter list to become much shorter. This makes the patch a bit more invasive than it might otherwise be, but it's all internal to spi.c, so that seems fine. Per report from Marc Bachmann. Back-patch to v14 where the faulty code came in. Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com	2021-10-03 13:21:20 -04:00
Michael Paquier	e767ddcd35	Fix typos and grammar in code comments Several mistakes have piled up in the code comments over time, including incorrect grammar, function names and simple typos. This commit takes care of a portion of these. No backpatch is done as this is only cosmetic. Author: Justin Pryzby Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com	2021-09-27 14:21:28 +09:00
Tom Lane	e3ec3c00d8	Remove arbitrary 64K-or-so limit on rangetable size. Up to now the size of a query's rangetable has been limited by the constants INNER_VAR et al, which mustn't be equal to any real rangetable index. 65000 doubtless seemed like enough for anybody, and it still is orders of magnitude larger than the number of joins we can realistically handle. However, we need a rangetable entry for each child partition that is (or might be) processed by a query. Queries with a few thousand partitions are getting more realistic, so that the day when that limit becomes a problem is in sight, even if it's not here yet. Hence, let's raise the limit. Rather than just increase the values of INNER_VAR et al, this patch adopts the approach of making them small negative values, so that rangetables could theoretically become as long as INT_MAX. The bulk of the patch is concerned with changing Var.varno and some related variables from "Index" (unsigned int) to plain "int". This is basically cosmetic, with little actual effect other than to help debuggers print their values nicely. As such, I've only bothered with changing places that could actually see INNER_VAR et al, which the parser and most of the planner don't. We do have to be careful in places that are performing less/greater comparisons on varnos, but there are very few such places, other than the IS_SPECIAL_VARNO macro itself. A notable side effect of this patch is that while it used to be possible to add INNER_VAR et al to a Bitmapset, that will now draw an error. I don't see any likelihood that it wouldn't be a bug to include these fake varnos in a bitmapset of real varnos, so I think this is all to the good. Although this touches outfuncs/readfuncs, I don't think a catversion bump is required, since stored rules would never contain Vars with these fake varnos. Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru	2021-09-15 14:11:21 -04:00
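The switch from large positive sentinels to small negative ones in the rangetable commit above can be sketched like this. All `SKETCH_`/`sketch_` names and the specific values are illustrative assumptions; the old threshold is taken from the "65000" figure in the message, and the real definitions should be checked in the PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sentinel values only; not copied from the PostgreSQL headers. */
#define SKETCH_INNER_VAR    (-1)
#define SKETCH_OUTER_VAR    (-2)
#define SKETCH_INDEX_VAR    (-3)

/* Assumed old-style threshold, based on the "65000" mentioned in the message. */
#define SKETCH_OLD_FIRST_SPECIAL    65000U

/* Old scheme: any index at or above the threshold is treated as special. */
static bool
sketch_old_is_special(unsigned int varno)
{
    return varno >= SKETCH_OLD_FIRST_SPECIAL;
}

/* New scheme: only negative values are special, so real indexes can grow. */
static bool
sketch_new_is_special(int varno)
{
    return varno < 0;
}

/* An index of 70000 collides with the old sentinels but not the new ones. */
static void
check_large_rangetable_index(void)
{
    assert(sketch_old_is_special(70000U));
    assert(!sketch_new_is_special(70000));
    assert(sketch_new_is_special(SKETCH_INNER_VAR));
}
```

Because the new check is a sign test, it depends on the variable being a signed `int`, which is in line with the message's note about changing Var.varno from "Index" to plain "int".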
Peter Eisentraut	639a86e36a	Remove Value node struct The Value node struct is a weird construct. It is its own node type, but most of the time, it actually has a node type of Integer, Float, String, or BitString. As a consequence, the struct name and the node type don't match most of the time, and so it has to be treated specially a lot. There doesn't seem to be any value in the special construct. There is very little code that wants to accept all Value variants but nothing else (and even if it did, this doesn't provide any convenient way to check it), and most code wants either just one particular node type (usually String), or it accepts a broader set of node types besides just Value. This change removes the Value struct and node type and replaces them by separate Integer, Float, String, and BitString node types that are proper node types and structs of their own and behave mostly like normal node types. Also, this removes the T_Null node tag, which was previously also a possible variant of Value but wasn't actually used outside of the Value contained in A_Const. Replace that by an isnull field in A_Const. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com	2021-09-09 08:36:53 +02:00
Michael Paquier	fd0625c7a9	Clean up some code using "(expr) ? true : false" All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com	2021-09-08 09:44:04 +09:00
Heikki Linnakangas	6ac763f19a	Fix missing words in comment. Introduced by commit `c3928b467a`, backpatch to v14 like that one. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA+HiwqFQgNLS6VGntMcuJV6erBFV425xA6wBVnY=41GK4zC0Bw@mail.gmail.com	2021-09-07 10:28:55 +03:00
Tom Lane	26ae660903	Improve error messages about misuse of SELECT INTO. Improve two places in plpgsql, and one in spi.c, where an error message would confusingly tell you that you couldn't use a SELECT query, when what you had written was a SELECT query. The actual problem is that you can't use SELECT ... INTO in these contexts, but the messages failed to make that apparent. Special-case SELECT INTO to make these errors more helpful. Also, fix the same spots in plpgsql, as well as several messages in exec_eval_expr(), to not quote the entire complained-of query or expression in the primary error message. That behavior very easily led to violating our message style guideline about keeping the primary error message short and single-line. Also, since the important part of the message was after the inserted text, it could make the real problem very hard to see. We can report the query or expression as the first line of errcontext instead. Per complaint from Roger Mason. Back-patch to v14, since (a) some of these messages are new in v14 and (b) v14's translatable strings are still somewhat in flux. The problem's older than that of course, but I'm hesitant to change the behavior further back. Discussion: https://postgr.es/m/1914708.1629474624@sss.pgh.pa.us	2021-08-21 10:22:14 -04:00
Tomas Vondra	650663b4cb	Use appropriate tuple descriptor in FDW batching The FDW batching code was using the same tuple descriptor for all slots (both regular and plan slots), but that's incorrect - the subplan may use a different descriptor. Currently this is benign, because batching is used only for INSERTs, and in that case the descriptors always match. But that would change if we allow batching UPDATEs. Fix by copying the appropriate tuple descriptor. Backpatch to 14, where the FDW batching was implemented. Author: Amit Langote Backpatch-through: 14, where FDW batching was added Discussion: https://postgr.es/m/CA%2BHiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95%3DJcwUnyV%3Df0rAKQ%40mail.gmail.com	2021-08-12 22:10:06 +02:00
Heikki Linnakangas	c3928b467a	Fix segfault during EvalPlanQual with mix of local and foreign partitions. It's not sensible to re-evaluate a direct-modify Foreign Update or Delete during EvalPlanQual. However, ExecInitForeignScan() can still get called if a table mixes local and foreign partitions. EvalPlanQualStart() left the es_result_relations array uninitialized in the child EPQ EState, but ExecInitForeignScan() still expected to find it. That caused a segfault. Fix by skipping the es_result_relations lookup during EvalPlanQual processing. To make things a bit more robust, also skip the BeginDirectModify calls, and add a runtime check that ExecForeignScan() is not called on direct-modify foreign scans during EvalPlanQual processing. This is new in v14, commit `1375422c78`. Before that, EvalPlanQualStart() copied the whole ResultRelInfo array to the EPQ EState. Backpatch to v14. Report and diagnosis by Andrey Lepikhov. Discussion: https://www.postgresql.org/message-id/cb2b808d-cbaa-4772-76ee-c8809bafcf3d%40postgrespro.ru	2021-08-12 11:02:29 +03:00
Peter Eisentraut	2226b4189b	Change SeqScan node to contain Scan node This makes the structure of all Scan-derived nodes the same, independent of whether they have additional fields. Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com	2021-08-08 18:46:34 +02:00
Etsuro Fujita	a8ed9bd59d	Fix oversight in commit `1ec7fca859`. I failed to account for the possibility that when ExecAppendAsyncEventWait() notifies multiple async-capable nodes using postgres_fdw, a preceding node might invoke process_pending_request() to process a pending asynchronous request made by a succeeding node. In that case the succeeding node should produce a tuple to return to the parent Append node from tuples fetched by process_pending_request() when notified. Repair. Per buildfarm via Michael Paquier. Back-patch to v14, like the previous commit. Thanks to Tom Lane for testing. Discussion: https://postgr.es/m/YQP0UPT8KmPiHTMs%40paquier.xyz	2021-08-02 12:45:00 +09:00
Etsuro Fujita	1ec7fca859	postgres_fdw: Fix handling of pending asynchronous requests. A pending asynchronous request is handled by process_pending_request(), which previously not only processed an in-progress remote query but performed ExecForeignScan() to produce a tuple to return to the local server asynchronously from the result of the remote query. But that led to a server crash when executing a query or led to an "InstrStartNode called twice in a row" or "InstrEndLoop called on running node" failure when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it contained multiple async-capable nodes accessing the same initplan/subplan that contained multiple async-capable nodes scanning the same foreign tables as for the parent async-capable nodes, as reported by Andrey Lepikhov. The reason is that the second step in process_pending_request() invoked when executing the initplan/subplan for one of the parent async-capable nodes caused recursive execution of the initplan/subplan for another of the parent async-capable nodes. To fix, split process_pending_request() into the two steps and postpone the second step until ForeignAsyncConfigureWait() is called for each of the pending asynchronous requests. Also, in ExecAppendAsyncEventWait() we assumed that FDWs would register at least one wait event in a WaitEventSet created there when they were called from ForeignAsyncConfigureWait() in that function, but allow FDWs to register zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait() to just return in that case. Oversight in commit `27e1f1456`. Back-patch to v14 where that commit went in. Andrey Lepikhov and Etsuro Fujita Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru	2021-07-30 17:00:00 +09:00
Tom Lane	28d936031a	Get rid of artificial restriction on hash table sizes on Windows. The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that. Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day. This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values. Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in. Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com	2021-07-25 14:02:27 -04:00
David Rowley	91e9e89dcc	Make nodeSort.c use Datum sorts for single column sorts Datum sorts can be significantly faster than tuple sorts, especially when the data type being sorted is a pass-by-value type. Something in the region of 50-70% performance improvements appear to be possible. Just in case there's any confusion; the Datum sort is only used when the targetlist of the Sort node contains a single column, not when there's a single column in the sort key and multiple items in the target list. Author: Ronan Dunklau Reviewed-by: James Coleman, David Rowley, Ranier Vilela, Hou Zhijie Tested-by: John Naylor Discussion: https://postgr.es/m/3177670.itZtoPt7T5@aivenronan	2021-07-22 14:03:19 +12:00


2377 commits