postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-29 13:53:46 -04:00

Author	SHA1	Message	Date
Álvaro Herrera	fba4233c83	Reduce header inclusions via execnodes.h Remove a bunch of #include lines from execnodes.h. Most of these requier suitable typedefs to be added, so that it still compiles standalone. In one case, the fix is to move a struct definition to the one .c file where it is needed. Also some light clean up in plannodes.h and genam.h, though not as extensive as in execnodes.h. Author: Álvaro Herrera <alvherre@kurilemu.de> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql	2026-03-16 14:34:57 +01:00
Peter Eisentraut	2f094e7ac6	SQL Property Graph Queries (SQL/PGQ) Implementation of SQL property graph queries, according to SQL/PGQ standard (ISO/IEC 9075-16:2023). This adds: - GRAPH_TABLE table function for graph pattern matching - DDL commands CREATE/ALTER/DROP PROPERTY GRAPH - several new system catalogs and information schema views - psql \dG command - pg_get_propgraphdef() function for pg_dump and psql A property graph is a relation with a new relkind RELKIND_PROPGRAPH. It acts like a view in many ways. It is rewritten to a standard relational query in the rewriter. Access privileges act similar to a security invoker view. (The security definer variant is not currently implemented.) Starting documentation can be found in doc/src/sgml/ddl.sgml and doc/src/sgml/queries.sgml. Author: Peter Eisentraut <peter@eisentraut.org> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com> Reviewed-by: Henson Choi <assam258@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org	2026-03-16 10:14:18 +01:00
David Rowley	c456e39113	Optimize tuple deformation This commit includes various optimizations to improve the performance of tuple deformation. We now precalculate CompactAttribute's attcacheoff, which allows us to remove the code from the deform routines which was setting the attcacheoff. Setting the attcacheoff is now handled by TupleDescFinalize(), which must be called before the TupleDesc is used for anything. Having TupleDescFinalize() means we can store the first attribute in the TupleDesc which does not have an offset cached. That allows us to add a dedicated deforming loop to deform all attributes up to the final one with an attcacheoff set, or up to the first NULL attribute, whichever comes first. Here we also improve tuple deformation performance of tuples with NULLs. Previously, if the HEAP_HASNULL bit was set in the tuple's t_infomask, deforming would, one-by-one, check each and every bit in the NULL bitmap to see if it was zero. Now, we process the NULL bitmap 1 byte at a time rather than 1 bit at a time to find the attnum with the first NULL. We can now deform the tuple without checking for NULLs up to just before that attribute. We also record the maximum attribute number which is guaranteed to exist in the tuple, that is, has a NOT NULL constraint and isn't an atthasmissing attribute. When deforming only attributes prior to the guaranteed attnum, we've no need to access the tuple's natt count. As an additional optimization, we only count fixed-width columns when calculating the maximum guaranteed column, as this eliminates the need to emit code to fetch byref types in the deformation loop for guaranteed attributes. Some locations in the code deform tuples that have yet to go through NOT NULL constraint validation. We're unable to perform the guaranteed attribute optimization when that's the case. This optimization is opt-in via the TupleTableSlot using the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS flag. This commit also adds a more efficient way of populating the isnull array by using a bit-wise SWAR trick which performs multiplication on the inverse of the tuple's bitmap byte and masking out all but the lower bit of each of the boolean's byte. This results in much more optimal code when compared to determining the NULLness via att_isnull(). 8 isnull elements are processed at once using this method, which means we need to round the tts_isnull array size up to the next 8 bytes. The palloc code does this anyway, but the round-up needed to be formalized so as not to overwrite the sentinel byte in MEMORY_CONTEXT_CHECKING builds. Doing this also allows the NULL-checking deforming loop to more efficiently check the isnull array, rather than doing the bit-wise processing for each attribute that att_isnull() does. The level of performance improvement from these changes seems to vary depending on the CPU architecture. Apple's M chips seem particularly fond of the changes, with some of the tested deform-heavy queries going over twice as fast as before. With x86-64, the speedups aren't quite as large. With tables containing only a small number of columns, the speedups will be less. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com	2026-03-16 11:46:00 +13:00
David Rowley	503620311e	Add all required calls to TupleDescFinalize() As of this commit all TupleDescs must have TupleDescFinalize() called on them once the TupleDesc is set up and before BlessTupleDesc() is called. In this commit, TupleDescFinalize() does nothing. This change has only been separated out from the commit that properly implements this function to make the change more obvious. Any extension which makes its own TupleDesc will need to be modified to call the new function. The follow-up commit which properly implements TupleDescFinalize() will cause any code which forgets to do this to fail in assert-enabled builds in BlessTupleDesc(). It may still be worth mentioning this change in the release notes so that extension authors update their code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com	2026-03-16 11:45:49 +13:00
Melanie Plageman	a3511443e5	Initialize missing fields in CreateExecutorState() `d47cbf474e` and `cbc127917e` forgot to initialize a few fields they introduced in the EState, so do that now. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com	2026-03-15 10:13:14 -04:00
Tomas Vondra	02eecead86	Tighten asserts on ParallelWorkerNumber The comment about ParallelWorkerNumbr in parallel.c says: In parallel workers, it will be set to a value >= 0 and < the number of workers before any user code is invoked; each parallel worker will get a different parallel worker number. However asserts in various places collecting instrumentation allowed (ParallelWorkerNumber == num_workers). That would be a bug, as the value is used as index into an array with num_workers entries. Fixed by adjusting the asserts accordingly. Backpatch to all supported versions. Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Backpatch-through: 14	2026-03-14 15:26:39 +01:00
David Rowley	4deecb52af	Allow sibling call optimization in slot_getsomeattrs_int() This changes the TupleTableSlotOps contract to make it so the getsomeattrs() function is in charge of calling slot_getmissingattrs(). Since this removes all code from slot_getsomeattrs_int() aside from the getsomeattrs() call itself, we may as well adjust slot_getsomeattrs() so that it calls getsomeattrs() directly. We leave slot_getsomeattrs_int() intact as this is still called from the JIT code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com	2026-03-14 13:52:09 +13:00
Andrew Dunstan	a0b6ef29a5	Enable fast default for domains with non-volatile constraints Previously, ALTER TABLE ADD COLUMN always forced a table rewrite when the column type was a domain with constraints (CHECK or NOT NULL), even if the default value satisfied those constraints. This was because contain_volatile_functions() considers CoerceToDomain immutable, so the code conservatively assumed any constrained domain might fail. Improve this by using soft error handling (ErrorSaveContext) to evaluate the CoerceToDomain expression at ALTER TABLE time. If the default value passes the domain's constraints, the value is stored as a "missing" attribute default and no table rewrite is needed. If the constraint check fails, we fall back to a table rewrite, preserving the historical behavior that constraint violations are only raised when the table actually contains rows. Domains with volatile constraint expressions always require a table rewrite since the constraint result could differ per evaluation and cannot be cached. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io> Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com	2026-03-12 18:05:01 -04:00
Andrew Dunstan	487cf2cbd2	Extend DomainHasConstraints() to optionally check constraint volatility Add an optional bool *has_volatile output parameter to DomainHasConstraints(). When non-NULL, the function checks whether any CHECK constraint contains a volatile expression. Callers that don't need this information pass NULL and get the same behavior as before. This is needed by a subsequent commit that enables the fast default optimization for domains with non-volatile constraints: we can safely evaluate such constraints once at ALTER TABLE time, but volatile constraints require a full table rewrite. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io> Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com	2026-03-12 18:04:16 -04:00
Álvaro Herrera	868825aaeb	Don't include wait_event.h in pgstat.h wait_event.h itself includes wait_event_types.h, which is a generated file, so it's nice that we can avoid compiling >10% of the tree just because that file is regenerated. To avoid breaking too many third-party modules, we now #include utils/wait_classes.h in storage/latch.h. Then, the very common case of doing WaitLatch(..., PG_WAIT_EXTENSION) continues to work by including just storage/latch.h. (I didn't try to determine how many modules would actually break if we don't do this, but this seems a convenient and low-impact measure.) Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/202602181214.gcmhx2vhlxzp@alvherre.pgsql	2026-03-06 16:24:58 +01:00
Alexander Korotkov	177037341a	Fix handling of updated tuples in the MERGE statement This branch missed the IsolationUsesXactSnapshot() check. That led to EPQ on repeatable read and serializable isolation levels. This commit fixes the issue and provides a simple isolation check for that. Backpatch through v15 where MERGE statement was introduced. Reported-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAPpHfdvzZSaNYdj5ac-tYRi6MuuZnYHiUkZ3D-AoY-ny8v%2BS%2Bw%40mail.gmail.com Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Backpatch-through: 15	2026-03-05 19:49:28 +02:00
Álvaro Herrera	a2c89835f5	Don't include proc.h in shm_mq.h This prevents proliferation of proc.h to tons of other places; shm_mq.h is widely included. Discussion: https://postgr.es/m/202602261733.s2rkxezwuif6@alvherre.pgsql	2026-02-27 10:53:47 +01:00
Andres Freund	9d6294c09e	instrumentation: Drop INSTR_TIME_SET_CURRENT_LAZY macro This macro had exactly one user in InstrStartNode, and the caller can instead use INSTR_TIME_IS_ZERO / INSTR_TIME_SET_CURRENT directly. This supports a future change that intends to modify the time source being used in the InstrStartNode case. Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53Pkx1bK1FB71_nBqYmzvSSXnp_MbE0ZDnU+baPJF6Ud2WDA@mail.gmail.com	2026-02-26 10:39:29 -05:00
Andres Freund	3218825271	instrumentation: Rename INSTR_TIME_LT macro to INSTR_TIME_GT This was incorrectly named "LT" for "larger than" in `e5a5e0a907`, but that is against existing conventions, where "LT" means "less than". Clarify by using "GT" for "greater than" in macro name, and add a missing comment at the top of instr_time.h to note the macro's existence. Reported by: Peter Smith <smithpb2250@gmail.com> Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAHut%2BPut94CTpjQsqOJHdHkgJ2ZXq%2BqVSfMEcmDKLiWLW-hPfA%40mail.gmail.com	2026-02-26 10:38:59 -05:00
Peter Eisentraut	8354b9d6b6	Use fallthrough attribute instead of comment Instead of using comments to mark fallthrough switch cases, use the fallthrough attribute. This will (in the future, not here) allow supporting other compilers besides gcc. The commenting convention is only supported by gcc, the attribute is supported by clang, and in the fullness of time the C23 standard attribute would allow supporting other compilers as well. Right now, we package the attribute into a macro called pg_fallthrough. This commit defines that macro and replaces the existing comments with that macro invocation. We also raise the level of the gcc -Wimplicit-fallthrough= option from 3 to 5 to enforce the use of the attribute. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org	2026-02-19 08:51:12 +01:00
Álvaro Herrera	b7271aa1d7	Use a bitmask for ExecInsertIndexTuples options ... instead of passing a bunch of separate booleans. Also, rearrange the argument list in a hopefully more sensible order. Discussion: https://postgr.es/m/202602111846.xpvuccb3inbx@alvherre.pgsql Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> (older version)	2026-02-17 17:59:45 +01:00
Dean Rasheed	88327092ff	Add support for INSERT ... ON CONFLICT DO SELECT. This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which returns the pre-existing rows when conflicts are detected. The INSERT statement must have a RETURNING clause, when DO SELECT is specified. The optional FOR UPDATE/SHARE clause allows the rows to be locked before they are are returned. As with a DO UPDATE conflict action, an optional WHERE clause may be used to prevent rows from being selected for return (but as with a DO UPDATE action, rows filtered out by the WHERE clause are still locked). Bumps catversion as stored rules change. Author: Andreas Karlsson <andreas@proxel.se> Author: Marko Tiikkaja <marko@joh.to> Author: Viktor Holmberg <v@viktorh.net> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com	2026-02-12 09:57:04 +00:00
Michael Paquier	9181c870ba	Improve type handling of varlena structures This commit changes the definition of varlena to a typedef, so as it becomes possible to remove "struct" markers from various declarations in the code base. Historically, "struct" markers are not the project style for variable declarations, so this update simplifies the code and makes it more consistent across the board. This change has an impact on the following structures, simplifying declarations using them: - varlena - varatt_indirect - varatt_external This cleanup has come up in a different path set that played with TOAST and varatt.h, independently worth doing on its own. Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz	2026-02-11 07:33:24 +09:00
Peter Eisentraut	137d05df2f	Rename AssertVariableIsOfType to StaticAssertVariableIsOfType This keeps run-time assertions and static assertions clearly separate. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org	2026-02-03 08:45:24 +01:00
Tom Lane	da7a1dc0d6	Refactor att_align_nominal() to improve performance. Separate att_align_nominal() into two macros, similarly to what was already done with att_align_datum() and att_align_pointer(). The inner macro att_nominal_alignby() is really just TYPEALIGN(), while att_align_nominal() retains its previous API by mapping TYPALIGN_xxx values to numbers of bytes to align to and then calling att_nominal_alignby(). In support of this, split out tupdesc.c's logic to do that mapping into a publicly visible function typalign_to_alignby(). Having done that, we can replace performance-critical uses of att_align_nominal() with att_nominal_alignby(), where the typalign_to_alignby() mapping is done just once outside the loop. In most places I settled for doing typalign_to_alignby() once per function. We could in many places pass the alignby value in from the caller if we wanted to change function APIs for this purpose; but I'm a bit loath to do that, especially for exported APIs that extensions might call. Replacing a char typalign argument by a uint8 typalignby argument would be an API change that compilers would fail to warn about, thus silently breaking code in hard-to-debug ways. I did revise the APIs of array_iter_setup and array_iter_next, moving the element type attribute arguments to the former; if any external code uses those, the argument-count change will cause visible compile failures. Performance testing shows that ExecEvalScalarArrayOp is sped up by about 10% by this change, when using a simple per-element function such as int8eq. I did not check any of the other loops optimized here, but it's reasonable to expect similar gains. Although the motivation for creating this patch was to avoid a performance loss if we add some more typalign values, it evidently is worth doing whether that patch lands or not. Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us	2026-02-02 14:39:50 -05:00
Masahiko Sawada	1fdbca159e	Standardize replication origin naming to use "ReplOrigin". The replication origin code was using inconsistent naming conventions. Functions were typically prefixed with 'replorigin', while typedefs and constants used "RepOrigin". This commit unifies the naming convention by renaming RepOriginId to ReplOriginId. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com	2026-01-28 11:03:29 -08:00
Álvaro Herrera	e6d6e32f42	Fix duplicate arbiter detection during REINDEX CONCURRENTLY on partitions Commit `90eae926a` fixed ON CONFLICT handling during REINDEX CONCURRENTLY on partitioned tables by treating unparented indexes as potential arbiters. However, there's a remaining race condition: when pg_inherits records are swapped between consecutive calls to get_partition_ancestors(), two different child indexes can appear to have the same parent, causing duplicate entries in the arbiter list and triggering "invalid arbiter index list" errors. Note that this is not a new problem introduced by `90eae926a`. The same error could occur before that commit in a slightly different scenario: an index is selected during planning, then index_concurrently_swap() commits, and a subsequent call to get_partition_ancestors() uses a new catalog snapshot that sees zero ancestors for that index. Fix by tracking which parent indexes have already been processed. If a subsequent call to get_partition_ancestors() returns a parent we've already seen, treat that index as unparented instead, allowing it to be matched via IsIndexCompatibleAsArbiter() like other concurrent reindex scenarios. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/e5a8c1df-04e5-4343-85ef-5df2a7e3d90c@gmail.com	2026-01-28 14:38:53 +01:00
Peter Eisentraut	5ca5f12c2c	Fix accidentally cast away qualifiers This fixes cases where a qualifier (const, in all cases here) was dropped by a cast, but the cast was otherwise necessary or desirable, so the straightforward fix is to add the qualifier into the cast. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org	2026-01-26 16:02:31 +01:00
David Rowley	83a53572a6	Always inline SeqNext and SeqRecheck The intention of the work done in `fb9f95502` was that these functions are inlined. I noticed my compiler isn't doing this on -O2 (gcc version 15.2.0). Also, clang version 20.1.8 isn't inlining either. Fix by marking both of these functions as pg_attribute_always_inline to avoid leaving this up to the compiler's heuristics. A quick test with a Seq Scan on a table with a single int column running a query that filters all 1 million rows in the WHERE clause yields a 3.9% speedup on my Zen4 machine. Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrL7Q41B=gv+3wc8+AJGKZugGegUbBo8FPQ+3+NGTPb+w@mail.gmail.com	2026-01-26 14:29:10 +13:00
Amit Langote	f9a468c664	Fix bogus ctid requirement for dummy-root partitioned targets ExecInitModifyTable() unconditionally required a ctid junk column even when the target was a partitioned table. This led to spurious "could not find junk ctid column" errors when all children were excluded and only the dummy root result relation remained. A partitioned table only appears in the result relations list when all leaf partitions have been pruned, leaving the dummy root as the sole entry. Assert this invariant (nrels == 1) and skip the ctid requirement. Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for partitioned tables, which is safe since no rows will be processed in this case. Bug: #19099 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org Backpatch-through: 14	2026-01-23 10:23:30 +09:00
Tom Lane	4b760a181a	Remove faulty Assert in partitioned INSERT...ON CONFLICT DO UPDATE. Commit `f16241bef` mistakenly supposed that INSERT...ON CONFLICT DO UPDATE rejects partitioned target tables. (This may have been accurate when the patch was written, but it was already obsolete when committed.) Hence, there's an assertion that we can't see ItemPointerIndicatesMovedPartitions() in that path, but the assertion is triggerable. Some other places throw error if they see a moved-across-partitions tuple, but there seems no need for that here, because if we just retry then we get the same behavior as in the update-within-partition case, as demonstrated by the new isolation test. So fix by deleting the faulty Assert. (The fact that this is the fix doubtless explains why we've heard no field complaints: the behavior of a non-assert build is fine.) The TM_Deleted case contains a cargo-culted copy of the same Assert, which I also deleted to avoid confusion, although I believe that one is actually not triggerable. Per our code coverage report, neither the TM_Updated nor the TM_Deleted case were reached at all by existing tests, so this patch adds tests for both. Reported-by: Dmitry Koval <d.koval@postgrespro.ru> Author: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/f5fffe4b-11b2-4557-a864-3587ff9b4c36@postgrespro.ru Backpatch-through: 14	2026-01-22 18:35:31 -05:00
Amit Langote	889676a0d5	Fix rowmark handling for non-relation RTEs during executor init Commit `cbc127917e` introduced tracking of unpruned relids to skip processing of pruned partitions. PlannedStmt.unprunableRelids is computed as the difference between PlannerGlobal.allRelids and prunableRelids, but allRelids only contains RTE_RELATION entries. This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are never included in unprunableRelids, and consequently not in es_unpruned_relids at runtime. As a result, rowmarks attached to non-relation RTEs were incorrectly skipped during executor initialization. This affects any DML statement that has rowmarks on such RTEs, including MERGE with a VALUES or subquery source, and UPDATE/DELETE with joins against subqueries or CTEs. When a concurrent update triggers an EPQ recheck, the missing rowmark leads to incorrect results. Fix by restricting the es_unpruned_relids membership check to RTE_RELATION entries only, since partition pruning only applies to actual relations. Rowmarks for other RTE kinds are now always processed. Bug: #19355 Reported-by: Bihua Wang <wangbihua.cn@gmail.com> Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org Backpatch-through: 18	2026-01-16 14:53:50 +09:00
Álvaro Herrera	35e3fae738	Remove #include <math.h> where not needed Liujinyang reported the one in binaryheap.c, I then found and analyzed the rest. For future patches, we require git archaelogical analysis before we accept patches of this nature. Co-authored-by: liujinyang <21043272@qq.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com	2026-01-15 19:09:47 +01:00
Jeff Davis	ed425b5a20	Remove redundant assignment in CreateWorkExprContext In CreateWorkExprContext(), maxBlockSize is initialized to ALLOCSET_DEFAULT_MAXSIZE, and it then immediately reassigned, thus the initialization is a redundant. Author: Andreas Karlsson <andreas@proxel.se> Reported-by: Chao Li <lic@highgo.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/83a14f3c-f347-4769-9c01-30030b31f1eb@gmail.com	2026-01-14 12:01:36 -08:00
Álvaro Herrera	225d1df1d2	Stop including {brin,gin}_tuple.h in tuplesort.h Doing this meant that those two headers, which are supposed to be internal to their corresponding index AMs, were being included pretty much universally, because tuplesort.h is included by execnodes.h which is very widely used. Stop that, and fix fallout. We also change indexing.h to no longer include execnodes.h (tuptable.h is sufficient), and relscan.h to no longer include buf.h (pointless since `c2fe139c20`). Author: Mario González <gonzalemario@gmail.com> Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com	2026-01-12 18:09:49 +01:00
Andres Freund	e5a5e0a907	instrumentation: Keep time fields as instrtime, convert in callers Previously the instrumentation logic always converted to seconds, only for many of the callers to do unnecessary division to get to milliseconds. As an upcoming refactoring will split the Instrumentation struct, utilize instrtime always to keep things simpler. It's also a bit faster to not have to first convert to a double in functions like InstrEndLoop(), InstrAggNode(). Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com	2026-01-09 13:38:00 -05:00
Andres Freund	75609fded3	Fix buggy interaction between array subscripts and subplan params In `a7f107df2` I changed subplan param evaluation to happen within the containing expression. As part of that, ExecInitSubPlanExpr() was changed to evaluate parameters via a new EEOP_PARAM_SET expression step. These parameters were temporarily stored into ExprState->resvalue/resnull, with some reasoning why that would be fine. Unfortunately, that analysis was wrong - ExecInitSubscriptionRef() evaluates the input array into "resv"/"resnull", which will often point to ExprState->resvalue/resnull. This means that the EEOP_PARAM_SET, if inside an array subscript, would overwrite the input array to array subscript. The fix is fairly simple - instead of evaluating into ExprState->resvalue/resnull, store the temporary result of the subplan in the subplan's return value. Bug: #19370 Reported-by: Zepeng Zhang <redraiment@gmail.com> Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Diagnosed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/19370-7fb7a5854b7618f1@postgresql.org Backpatch-through: 18	2026-01-06 19:51:10 -05:00
Amit Kapila	3f906d3af9	Improve the comments atop build_replindex_scan_key(). Author: zourenli <398740848@qq.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/tencent_C2DC8157CC05C8F5C36E12678A7864554809@qq.com	2026-01-05 03:06:55 +00:00
Bruce Momjian	451c43974f	Update copyright for 2026 Backpatch-through: 14	2026-01-01 13:24:10 -05:00
Tom Lane	bc6374cd76	Change IndexAmRoutines to be statically-allocated structs. Up to now, index amhandlers were expected to produce a new, palloc'd struct on each call. That requires palloc/pfree overhead, and creates a risk of memory leaks if the caller fails to pfree, and the time taken to fill such a large structure isn't nil. Moreover, we were storing these things in the relcache, eating several hundred bytes for each cached index. There is not anything in these structs that needs to vary at runtime, so let's change the definition so that an amhandler can return a pointer to a "static const" struct of which there's only one copy per index AM. Mark all the core code's IndexAmRoutine pointers const so that we catch anyplace that might still try to change or pfree one. (This is similar to the way we were already handling TableAmRoutine structs. This commit does fix one comment that was infelicitously copied-and-pasted into tableamapi.c.) This commit needs to be called out in the v19 release notes as an API change for extension index AMs. An un-updated AM will still work (as of now, anyway) but it risks memory leaks and will be slower than necessary. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com	2025-12-30 18:26:23 -05:00
Tom Lane	58dad7f349	Update typedefs.list to match what the buildfarm currently reports. The current list from the buildfarm includes quite a few typedef names that it used to miss. The reason is a bit obscure, but it seems likely to have something to do with our recent increased use of palloc_object and palloc_array. In any case, this makes the relevant struct declarations be much more nicely formatted, so I'll take it. Install the current list and re-run pgindent to update affected code. Syncing with the current list also removes some obsolete typedef names and fixes some alphabetization errors. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us	2025-12-14 17:03:53 -05:00
Michael Paquier	4f7dacc5b8	Use palloc_object() and palloc_array(), the last change This is the last batch of changes that have been suggested by the author, this part covering the non-trivial changes. Some of the changes suggested have been discarded as they seem to lead to more instructions generated, leaving the parts that can be qualified as in-place replacements. Similar work has been done in `1b105f9472`, `0c3c5c3b06` and `31d3847a37`. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-11 14:29:12 +09:00
Michael Paquier	1b105f9472	Use palloc_object() and palloc_array() in backend code The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). This batch of changes includes most of the trivial changes suggested by the author for src/backend/. A total of 334 files are updated here. Among these files, 48 of them have their build change slightly; these are caused by line number changes as the new allocation formulas are simpler, shaving around 100 lines of code in total. Similar work has been done in `0c3c5c3b06` and `31d3847a37`. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-10 07:36:46 +09:00
Álvaro Herrera	90eae926ab	Fix ON CONFLICT with REINDEX CONCURRENTLY and partitions When planning queries with ON CONFLICT on partitioned tables, the indexes to consider as arbiters for each partition are determined based on those found in the parent table. However, it's possible for an index on a partition to be reindexed, and in that case, the auxiliary indexes created on the partition must be considered as arbiters as well; failing to do that may result in spurious "duplicate key" errors given sufficient bad luck. We fix that in this commit by matching every index that doesn't have a parent to each initially-determined arbiter index. Every unparented matching index is considered an additional arbiter index. Closely related to the fixes in `bc32a12e0d` and `2bc7e886fc`, and for identical reasons, not backpatched (for now) even though it's a longstanding issue. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com	2025-12-02 13:51:53 +01:00
Peter Eisentraut	4f941d432b	Remove useless casting to same type This removes some casts where the input already has the same type as the type specified by the cast. Their presence could cause risks of hiding actual type mismatches in the future or silently discarding qualifiers. It also improves readability. Same kind of idea as `7f798aca1d` and `ef8fe69360`. (This does not change all such instances, but only those hand-picked by the author.) Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal	2025-12-02 10:09:32 +01:00
David Rowley	0ca3b16973	Add parallelism support for TID Range Scans In v14, `bb437f995` added support for scanning for ranges of TIDs using a dedicated executor node for the purpose. Here, we allow these scans to be parallelized. The range of blocks to scan is divvied up similarly to how a Parallel Seq Scans does that, where 'chunks' of blocks are allocated to each worker and the size of those chunks is slowly reduced down to 1 block per worker by the time we're nearing the end of the scan. Doing that means workers finish at roughly the same time. Allowing TID Range Scans to be parallelized removes the dilemma from the planner as to whether a Parallel Seq Scan will cost less than a non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan (disk costs are not divided by the number of workers). It was possible the planner could choose the Parallel Seq Scan which would result in reading additional blocks during execution than the TID Scan would have. Allowing Parallel TID Range Scans removes the trade-off the planner makes when choosing between reduced CPU costs due to parallelism vs additional I/O from the Parallel Seq Scan due to it scanning blocks from outside of the required TID range. There is also, of course, the traditional parallelism performance benefits to be gained as well, which likely doesn't need to be explained here. Author: Cary Huang <cary.huang@highgo.ca> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca	2025-11-27 14:05:04 +13:00
Álvaro Herrera	417ac9c1ee	Improve test case stability Given unlucky timing, some of the new tests added by commit `bc32a12e0d` can fail spuriously. We haven't seen such failures yet in buildfarm, but allegedly we can prevent them with this tweak. While at it, remove an unused injection point I (Álvaro) added. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Discussion: https://postgr.es/m/CADzfLwUc=jtSUEaQCtyt8zTeOJ-gHZ8=w_KJsVjDOYSLqaY9Lg@mail.gmail.com Discussion: https://postgr.es/m/CADzfLwV5oQq-Vg_VmG_o4SdL6yHjDoNO4T4pMtgJLzYGmYf74g@mail.gmail.com	2025-11-25 18:20:06 +01:00
Álvaro Herrera	bc32a12e0d	Fix infer_arbiter_index during concurrent index operations Previously, we would only consider indexes marked indisvalid as usable for INSERT ON CONFLICT. But that's problematic during CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY, because concurrent transactions would end up with inconsistents lists of inferred indexes, leading to deadlocks and spurious errors about unique key violations (because two transactions are operating on different indexes for the speculative insertion tokens). Change this function to return indexes even if invalid. This fixes the spurious errors and deadlocks. Because such indexes might not be complete, we still need uniqueness to be verified in a different way. We do that by requiring that at least one index marked valid is part of the set of indexes returned. It is that index that is going to help ensure that the inserted tuple is indeed unique. This does not fix similar problems occurring with partitioned tables or with named constraints. These problems will be fixed in follow-up commits. We have no user report of this problem, even though it exists in all branches. Because of that and given that the fix is somewhat tricky, I decided not to backpatch for now. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com	2025-11-24 17:17:29 +01:00
Richard Guo	4d3f623ea8	Fix typo in nodeHash.c Replace "overlow" with "overflow". Author: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAHewXNnzFjAjYLTkP78HE2PQ17MjBqFdQQg+0X6Wo7YMUb68xA@mail.gmail.com	2025-11-19 11:04:03 +09:00
Álvaro Herrera	77fb3959a4	Fix typo	2025-11-18 19:31:23 +01:00
Dean Rasheed	1b92fe7bb9	Fix Assert failure in EXPLAIN ANALYZE MERGE with a concurrent update. When instrumenting a MERGE command containing both WHEN NOT MATCHED BY SOURCE and WHEN NOT MATCHED BY TARGET actions using EXPLAIN ANALYZE, a concurrent update of the target relation could lead to an Assert failure in show_modifytable_info(). In a non-assert build, this would lead to an incorrect value for "skipped" tuples in the EXPLAIN output, rather than a crash. This could happen if the concurrent update caused a matched row to no longer match, in which case ExecMerge() treats the single originally matched row as a pair of not matched rows, and potentially executes 2 not-matched actions for the single source row. This could then lead to a state where the number of rows processed by the ModifyTable node exceeds the number of rows produced by its source node, causing "skipped_path" in show_modifytable_info() to be negative, triggering the Assert. Fix this in ExecMergeMatched() by incrementing the instrumentation tuple count on the source node whenever a concurrent update of this kind is detected, if both kinds of merge actions exist, so that the number of source rows matches the number of actions potentially executed, and the "skipped" tuple count is correct. Back-patch to v17, where support for WHEN NOT MATCHED BY SOURCE actions was introduced. Bug: #19111 Reported-by: Dilip Kumar <dilipbalaut@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/19111-5b06624513d301b3@postgresql.org Backpatch-through: 17	2025-11-16 22:14:06 +00:00
Daniel Gustafsson	3a872ddd64	Fix typos in nodeWindowAgg comments One of them submitted by the author, with another one other spotted during review so this fixes both. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAHewXN=eNx2oJ_hzxJrkSvy-1A5Qf45SM8pxERWXE+6RoZyFrw@mail.gmail.com	2025-11-10 12:51:47 +01:00
Etsuro Fujita	a3ebec4e4c	Update obsolete comment in ExecScanReScan(). Commit `27cc7cd2b` removed the epqScanDone flag from the EState struct, and instead added an equivalent flag named relsubs_done to the EPQState struct; but it failed to update this comment. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAPmGK152zJ3fU5avDT5udfL0namrDeVfMTL3dxdOXw28SOrycg%40mail.gmail.com Backpatch-through: 13	2025-11-06 12:25:00 +09:00
Tom Lane	8f29467c57	Change "long" numGroups fields to be Cardinality (i.e., double). We've been nibbling away at removing uses of "long" for a long time, since its width is platform-dependent. Here's one more: change the remaining "long" fields in Plan nodes to Cardinality, since the three surviving examples all represent group-count estimates. The upstream planner code was converted to Cardinality some time ago; for example the corresponding fields in Path nodes are type Cardinality, as are the arguments of the make_foo_path functions. Downstream in the executor, it turns out that these all feed to the table-size argument of BuildTupleHashTable. Change that to "double" as well, and fix it so that it safely clamps out-of-range values to the uint32 limit of simplehash.h, as was not being done before. Essentially, this is removing all the artificial datatype-dependent limitations on these values from upstream processing, and applying just one clamp at the moment where we're forced to do so by the datatype choices of simplehash.h. Also, remove BuildTupleHashTable's misguided attempt to enforce work_mem/hash_mem_limit. It doesn't have enough information (particularly not the expected tuple width) to do that accurately, and it has no real business second-guessing the caller's choice. For all these plan types, it's really the planner's responsibility to not choose a hashed implementation if the hashtable is expected to exceed hash_mem_limit. The previous patch improved the accuracy of those estimates, and even if BuildTupleHashTable had more information it should arrive at the same conclusions. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com	2025-11-02 16:57:43 -05:00
Tom Lane	1ea5bdb00b	Improve planner's estimates of tuple hash table sizes. For several types of plan nodes that use TupleHashTables, the planner estimated the expected size of the table as basically numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)). This is pretty far off, especially for small data widths, because it doesn't account for the overhead of the simplehash.h hash table nor for any per-tuple "additional space" the plan node may request. Jeff Janes noted a case where the estimate was off by about a factor of three, even though the obvious hazards such as inaccurate estimates of numEntries or dataWidth didn't apply. To improve matters, create functions provided by the relevant executor modules that can estimate the required sizes with reasonable accuracy. (We're still not accounting for effects like allocator padding, but this at least gets the first-order effects correct.) I added functions that can estimate the tuple table sizes for nodeSetOp and nodeSubplan; these rely on an estimator for TupleHashTables in general, and that in turn relies on one for simplehash.h hash tables. That feels like kind of a lot of mechanism, but if we take any short-cuts we're violating modularity boundaries. The other places that use TupleHashTables are nodeAgg, which took pains to get its numbers right already, and nodeRecursiveunion. I did not try to improve the situation for nodeRecursiveunion because there's nothing to improve: we are not making an estimate of the hash table size, and it wouldn't help us to do so because we have no non-hashed alternative implementation. On top of that, our estimate of the number of entries to be hashed in that module is so suspect that we'd likely often choose the wrong implementation if we did have two ways to do it. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com	2025-11-02 16:57:26 -05:00

1 2 3 4 5 ...

2762 commits