postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-29 05:43:27 -04:00

Author	SHA1	Message	Date
Peter Geoghegan	f026fbf059	Make IndexScanInstrumentation a pointer in executor scan nodes. Change the IndexScanInstrumentation fields in IndexScanState, IndexOnlyScanState, and BitmapIndexScanState from inline structs to pointers. This avoids additional space overhead whenever new fields are added to IndexScanInstrumentation in the future, at least in the common case where the instrumentation isn't used (i.e. when the executor node isn't being run through an EXPLAIN ANALYZE). Preparation for an upcoming patch series that will add index prefetching. The new slot-based interface that will enable index prefetching necessitates that we add at least one more field to IndexScanInstrumentation (to count heap fetches during index-only scans). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com	2026-03-22 13:20:29 -04:00
David Rowley	07d5bffe75	Fix new tuple deforming code so it can support cstrings again In `c456e3911`, I mistakenly thought that the deformer code would never see cstrings and that I could use pg_assume() to have the compiler omit producing code for attlen == -2 attributes. That saves bloating the deforming code a bit with the extra check and strlen() call. While this is ok to do for tuples from the heap, it's not ok to do for MinimalTuples as those can contain cstrings and tts_minimal_getsomeattrs() implements deforming by inlining the (slightly misleadingly named) slot_deform_heap_tuple() code. To fix, add a new parameter to the slot_deform_heap_tuple() and have the callers define which code to inline. Because this new parameter is passed as a const, the compiler can choose to emit or not emit the cstring-related code based on the parameter's value. Author: David Rowley <dgrowleyml@gmail.com> Reported-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAHewXNmSK+gKziAt_WvQoMVWt3_LRVMmRYY9dAbMPMcpPV0QmA@mail.gmail.com	2026-03-20 14:16:06 +13:00
Tom Lane	1811f1af98	Improve hash join's handling of tuples with null join keys. In a plain join, we can just summarily discard an input tuple with null join key(s), since it cannot match anything from the other side of the join (assuming a strict join operator). However, if the tuple comes from the outer side of an outer join then we have to emit it with null-extension of the other side. Up to now, hash joins did that by inserting the tuple into the hash table as though it were a normal tuple. This is unnecessarily inefficient though, since the required processing is far simpler than for a potentially-matchable tuple. Worse, if there are a lot of such tuples they will bloat the hash bucket they go into, possibly causing useless repeated attempts to split that bucket or increase the number of batches. We have a report of a large join vainly creating many thousands of batches when faced with such input. This patch improves the situation by keeping such tuples out of the hash table altogether, instead pushing them into a separate tuplestore from which we return them later. (One might consider trying to return them immediately; but that would require substantial refactoring, and it doesn't work anyway for cases where we rescan an unmodified hash table.) This works even in parallel hash joins, because whichever worker reads a null-keyed tuple can just return it; there's no need for consultation with other workers. Thus the tuplestores are local storage even in a parallel join. A pre-existing buglet that I noticed while analyzing the code's behavior is that ExecHashRemoveNextSkewBucket fails to decrement hashtable->skewTuples for tuples moved into the main hash table from the skew hash table. This invalidates ExecHashTableInsert's calculation of the number of main-hash-table tuples, though probably not by a lot since we expect the skew table to be small relative to the main one. Nonetheless, let's fix that too while we're here. Bug: #18909 Reported-by: Sergey Koposov <Sergey.Koposov@ed.ac.uk> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/3061845.1746486714@sss.pgh.pa.us	2026-03-19 15:21:36 -04:00
Andrew Dunstan	3b4c2b9db2	Allow IS JSON predicate to work with domain types The IS JSON predicate only accepted the base types text, json, jsonb, and bytea. Extend it to also accept domain types over those base types by resolving through getBaseType() during parse analysis. The base type OID is stored in the JsonIsPredicate node (as exprBaseType) so the executor can dispatch to the correct validation path without repeating the domain lookup at runtime. When a non-supported type (or domain over a non-supported type) is used, the error message displays the original type name as written by the user, rather than the resolved base type. Author: jian he <jian.universality@gmail.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/CACJufxEk34DnJFG72CRsPPT4tsJL9arobX0tNPsn7yH28J=zQg@mail.gmail.com	2026-03-17 15:20:22 -04:00
David Rowley	d8a859d22b	Reduce size of CompactAttribute struct to 8 bytes Previously, this was 16 bytes. With the use of some bitflags and by reducing the attcacheoff field size to a 16-bit type, we can halve the size of the struct. It's unlikely that caching the offsets for offsets larger than what will fit in a 16-bit int will help much as the tuple is very likely to have some non-fixed-width types anyway, the offsets of which we cannot cache. Shrinking this down to 8 bytes helps by accessing fewer cachelines when performing tuple deformation. The fields used there are all fully fledged fields, which don't require any bitmasking to extract the value of. It also helps to more efficiently calculate the address of a compact_attrs[] element in TupleDesc as the x86 LEA instruction can work with 8 byte offsets, which allows the element address to be calculated from the TupleDesc's address in a single instruction using LEA's concurrent shift and add. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com	2026-03-17 15:06:31 +13:00
Álvaro Herrera	fba4233c83	Reduce header inclusions via execnodes.h Remove a bunch of #include lines from execnodes.h. Most of these requier suitable typedefs to be added, so that it still compiles standalone. In one case, the fix is to move a struct definition to the one .c file where it is needed. Also some light clean up in plannodes.h and genam.h, though not as extensive as in execnodes.h. Author: Álvaro Herrera <alvherre@kurilemu.de> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql	2026-03-16 14:34:57 +01:00
Peter Eisentraut	2f094e7ac6	SQL Property Graph Queries (SQL/PGQ) Implementation of SQL property graph queries, according to SQL/PGQ standard (ISO/IEC 9075-16:2023). This adds: - GRAPH_TABLE table function for graph pattern matching - DDL commands CREATE/ALTER/DROP PROPERTY GRAPH - several new system catalogs and information schema views - psql \dG command - pg_get_propgraphdef() function for pg_dump and psql A property graph is a relation with a new relkind RELKIND_PROPGRAPH. It acts like a view in many ways. It is rewritten to a standard relational query in the rewriter. Access privileges act similar to a security invoker view. (The security definer variant is not currently implemented.) Starting documentation can be found in doc/src/sgml/ddl.sgml and doc/src/sgml/queries.sgml. Author: Peter Eisentraut <peter@eisentraut.org> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com> Reviewed-by: Henson Choi <assam258@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org	2026-03-16 10:14:18 +01:00
David Rowley	c456e39113	Optimize tuple deformation This commit includes various optimizations to improve the performance of tuple deformation. We now precalculate CompactAttribute's attcacheoff, which allows us to remove the code from the deform routines which was setting the attcacheoff. Setting the attcacheoff is now handled by TupleDescFinalize(), which must be called before the TupleDesc is used for anything. Having TupleDescFinalize() means we can store the first attribute in the TupleDesc which does not have an offset cached. That allows us to add a dedicated deforming loop to deform all attributes up to the final one with an attcacheoff set, or up to the first NULL attribute, whichever comes first. Here we also improve tuple deformation performance of tuples with NULLs. Previously, if the HEAP_HASNULL bit was set in the tuple's t_infomask, deforming would, one-by-one, check each and every bit in the NULL bitmap to see if it was zero. Now, we process the NULL bitmap 1 byte at a time rather than 1 bit at a time to find the attnum with the first NULL. We can now deform the tuple without checking for NULLs up to just before that attribute. We also record the maximum attribute number which is guaranteed to exist in the tuple, that is, has a NOT NULL constraint and isn't an atthasmissing attribute. When deforming only attributes prior to the guaranteed attnum, we've no need to access the tuple's natt count. As an additional optimization, we only count fixed-width columns when calculating the maximum guaranteed column, as this eliminates the need to emit code to fetch byref types in the deformation loop for guaranteed attributes. Some locations in the code deform tuples that have yet to go through NOT NULL constraint validation. We're unable to perform the guaranteed attribute optimization when that's the case. This optimization is opt-in via the TupleTableSlot using the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS flag. This commit also adds a more efficient way of populating the isnull array by using a bit-wise SWAR trick which performs multiplication on the inverse of the tuple's bitmap byte and masking out all but the lower bit of each of the boolean's byte. This results in much more optimal code when compared to determining the NULLness via att_isnull(). 8 isnull elements are processed at once using this method, which means we need to round the tts_isnull array size up to the next 8 bytes. The palloc code does this anyway, but the round-up needed to be formalized so as not to overwrite the sentinel byte in MEMORY_CONTEXT_CHECKING builds. Doing this also allows the NULL-checking deforming loop to more efficiently check the isnull array, rather than doing the bit-wise processing for each attribute that att_isnull() does. The level of performance improvement from these changes seems to vary depending on the CPU architecture. Apple's M chips seem particularly fond of the changes, with some of the tested deform-heavy queries going over twice as fast as before. With x86-64, the speedups aren't quite as large. With tables containing only a small number of columns, the speedups will be less. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com	2026-03-16 11:46:00 +13:00
David Rowley	503620311e	Add all required calls to TupleDescFinalize() As of this commit all TupleDescs must have TupleDescFinalize() called on them once the TupleDesc is set up and before BlessTupleDesc() is called. In this commit, TupleDescFinalize() does nothing. This change has only been separated out from the commit that properly implements this function to make the change more obvious. Any extension which makes its own TupleDesc will need to be modified to call the new function. The follow-up commit which properly implements TupleDescFinalize() will cause any code which forgets to do this to fail in assert-enabled builds in BlessTupleDesc(). It may still be worth mentioning this change in the release notes so that extension authors update their code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com	2026-03-16 11:45:49 +13:00
Melanie Plageman	a3511443e5	Initialize missing fields in CreateExecutorState() `d47cbf474e` and `cbc127917e` forgot to initialize a few fields they introduced in the EState, so do that now. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com	2026-03-15 10:13:14 -04:00
Tomas Vondra	02eecead86	Tighten asserts on ParallelWorkerNumber The comment about ParallelWorkerNumbr in parallel.c says: In parallel workers, it will be set to a value >= 0 and < the number of workers before any user code is invoked; each parallel worker will get a different parallel worker number. However asserts in various places collecting instrumentation allowed (ParallelWorkerNumber == num_workers). That would be a bug, as the value is used as index into an array with num_workers entries. Fixed by adjusting the asserts accordingly. Backpatch to all supported versions. Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Backpatch-through: 14	2026-03-14 15:26:39 +01:00
David Rowley	4deecb52af	Allow sibling call optimization in slot_getsomeattrs_int() This changes the TupleTableSlotOps contract to make it so the getsomeattrs() function is in charge of calling slot_getmissingattrs(). Since this removes all code from slot_getsomeattrs_int() aside from the getsomeattrs() call itself, we may as well adjust slot_getsomeattrs() so that it calls getsomeattrs() directly. We leave slot_getsomeattrs_int() intact as this is still called from the JIT code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com	2026-03-14 13:52:09 +13:00
Andrew Dunstan	a0b6ef29a5	Enable fast default for domains with non-volatile constraints Previously, ALTER TABLE ADD COLUMN always forced a table rewrite when the column type was a domain with constraints (CHECK or NOT NULL), even if the default value satisfied those constraints. This was because contain_volatile_functions() considers CoerceToDomain immutable, so the code conservatively assumed any constrained domain might fail. Improve this by using soft error handling (ErrorSaveContext) to evaluate the CoerceToDomain expression at ALTER TABLE time. If the default value passes the domain's constraints, the value is stored as a "missing" attribute default and no table rewrite is needed. If the constraint check fails, we fall back to a table rewrite, preserving the historical behavior that constraint violations are only raised when the table actually contains rows. Domains with volatile constraint expressions always require a table rewrite since the constraint result could differ per evaluation and cannot be cached. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io> Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com	2026-03-12 18:05:01 -04:00
Andrew Dunstan	487cf2cbd2	Extend DomainHasConstraints() to optionally check constraint volatility Add an optional bool *has_volatile output parameter to DomainHasConstraints(). When non-NULL, the function checks whether any CHECK constraint contains a volatile expression. Callers that don't need this information pass NULL and get the same behavior as before. This is needed by a subsequent commit that enables the fast default optimization for domains with non-volatile constraints: we can safely evaluate such constraints once at ALTER TABLE time, but volatile constraints require a full table rewrite. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io> Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com	2026-03-12 18:04:16 -04:00
Álvaro Herrera	868825aaeb	Don't include wait_event.h in pgstat.h wait_event.h itself includes wait_event_types.h, which is a generated file, so it's nice that we can avoid compiling >10% of the tree just because that file is regenerated. To avoid breaking too many third-party modules, we now #include utils/wait_classes.h in storage/latch.h. Then, the very common case of doing WaitLatch(..., PG_WAIT_EXTENSION) continues to work by including just storage/latch.h. (I didn't try to determine how many modules would actually break if we don't do this, but this seems a convenient and low-impact measure.) Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/202602181214.gcmhx2vhlxzp@alvherre.pgsql	2026-03-06 16:24:58 +01:00
Alexander Korotkov	177037341a	Fix handling of updated tuples in the MERGE statement This branch missed the IsolationUsesXactSnapshot() check. That led to EPQ on repeatable read and serializable isolation levels. This commit fixes the issue and provides a simple isolation check for that. Backpatch through v15 where MERGE statement was introduced. Reported-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAPpHfdvzZSaNYdj5ac-tYRi6MuuZnYHiUkZ3D-AoY-ny8v%2BS%2Bw%40mail.gmail.com Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Backpatch-through: 15	2026-03-05 19:49:28 +02:00
Álvaro Herrera	a2c89835f5	Don't include proc.h in shm_mq.h This prevents proliferation of proc.h to tons of other places; shm_mq.h is widely included. Discussion: https://postgr.es/m/202602261733.s2rkxezwuif6@alvherre.pgsql	2026-02-27 10:53:47 +01:00
Andres Freund	9d6294c09e	instrumentation: Drop INSTR_TIME_SET_CURRENT_LAZY macro This macro had exactly one user in InstrStartNode, and the caller can instead use INSTR_TIME_IS_ZERO / INSTR_TIME_SET_CURRENT directly. This supports a future change that intends to modify the time source being used in the InstrStartNode case. Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53Pkx1bK1FB71_nBqYmzvSSXnp_MbE0ZDnU+baPJF6Ud2WDA@mail.gmail.com	2026-02-26 10:39:29 -05:00
Andres Freund	3218825271	instrumentation: Rename INSTR_TIME_LT macro to INSTR_TIME_GT This was incorrectly named "LT" for "larger than" in `e5a5e0a907`, but that is against existing conventions, where "LT" means "less than". Clarify by using "GT" for "greater than" in macro name, and add a missing comment at the top of instr_time.h to note the macro's existence. Reported by: Peter Smith <smithpb2250@gmail.com> Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAHut%2BPut94CTpjQsqOJHdHkgJ2ZXq%2BqVSfMEcmDKLiWLW-hPfA%40mail.gmail.com	2026-02-26 10:38:59 -05:00
Peter Eisentraut	8354b9d6b6	Use fallthrough attribute instead of comment Instead of using comments to mark fallthrough switch cases, use the fallthrough attribute. This will (in the future, not here) allow supporting other compilers besides gcc. The commenting convention is only supported by gcc, the attribute is supported by clang, and in the fullness of time the C23 standard attribute would allow supporting other compilers as well. Right now, we package the attribute into a macro called pg_fallthrough. This commit defines that macro and replaces the existing comments with that macro invocation. We also raise the level of the gcc -Wimplicit-fallthrough= option from 3 to 5 to enforce the use of the attribute. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org	2026-02-19 08:51:12 +01:00
Álvaro Herrera	b7271aa1d7	Use a bitmask for ExecInsertIndexTuples options ... instead of passing a bunch of separate booleans. Also, rearrange the argument list in a hopefully more sensible order. Discussion: https://postgr.es/m/202602111846.xpvuccb3inbx@alvherre.pgsql Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> (older version)	2026-02-17 17:59:45 +01:00
Dean Rasheed	88327092ff	Add support for INSERT ... ON CONFLICT DO SELECT. This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which returns the pre-existing rows when conflicts are detected. The INSERT statement must have a RETURNING clause, when DO SELECT is specified. The optional FOR UPDATE/SHARE clause allows the rows to be locked before they are are returned. As with a DO UPDATE conflict action, an optional WHERE clause may be used to prevent rows from being selected for return (but as with a DO UPDATE action, rows filtered out by the WHERE clause are still locked). Bumps catversion as stored rules change. Author: Andreas Karlsson <andreas@proxel.se> Author: Marko Tiikkaja <marko@joh.to> Author: Viktor Holmberg <v@viktorh.net> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com	2026-02-12 09:57:04 +00:00
Michael Paquier	9181c870ba	Improve type handling of varlena structures This commit changes the definition of varlena to a typedef, so as it becomes possible to remove "struct" markers from various declarations in the code base. Historically, "struct" markers are not the project style for variable declarations, so this update simplifies the code and makes it more consistent across the board. This change has an impact on the following structures, simplifying declarations using them: - varlena - varatt_indirect - varatt_external This cleanup has come up in a different path set that played with TOAST and varatt.h, independently worth doing on its own. Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz	2026-02-11 07:33:24 +09:00
Peter Eisentraut	137d05df2f	Rename AssertVariableIsOfType to StaticAssertVariableIsOfType This keeps run-time assertions and static assertions clearly separate. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org	2026-02-03 08:45:24 +01:00
Tom Lane	da7a1dc0d6	Refactor att_align_nominal() to improve performance. Separate att_align_nominal() into two macros, similarly to what was already done with att_align_datum() and att_align_pointer(). The inner macro att_nominal_alignby() is really just TYPEALIGN(), while att_align_nominal() retains its previous API by mapping TYPALIGN_xxx values to numbers of bytes to align to and then calling att_nominal_alignby(). In support of this, split out tupdesc.c's logic to do that mapping into a publicly visible function typalign_to_alignby(). Having done that, we can replace performance-critical uses of att_align_nominal() with att_nominal_alignby(), where the typalign_to_alignby() mapping is done just once outside the loop. In most places I settled for doing typalign_to_alignby() once per function. We could in many places pass the alignby value in from the caller if we wanted to change function APIs for this purpose; but I'm a bit loath to do that, especially for exported APIs that extensions might call. Replacing a char typalign argument by a uint8 typalignby argument would be an API change that compilers would fail to warn about, thus silently breaking code in hard-to-debug ways. I did revise the APIs of array_iter_setup and array_iter_next, moving the element type attribute arguments to the former; if any external code uses those, the argument-count change will cause visible compile failures. Performance testing shows that ExecEvalScalarArrayOp is sped up by about 10% by this change, when using a simple per-element function such as int8eq. I did not check any of the other loops optimized here, but it's reasonable to expect similar gains. Although the motivation for creating this patch was to avoid a performance loss if we add some more typalign values, it evidently is worth doing whether that patch lands or not. Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us	2026-02-02 14:39:50 -05:00
Masahiko Sawada	1fdbca159e	Standardize replication origin naming to use "ReplOrigin". The replication origin code was using inconsistent naming conventions. Functions were typically prefixed with 'replorigin', while typedefs and constants used "RepOrigin". This commit unifies the naming convention by renaming RepOriginId to ReplOriginId. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com	2026-01-28 11:03:29 -08:00
Álvaro Herrera	e6d6e32f42	Fix duplicate arbiter detection during REINDEX CONCURRENTLY on partitions Commit `90eae926a` fixed ON CONFLICT handling during REINDEX CONCURRENTLY on partitioned tables by treating unparented indexes as potential arbiters. However, there's a remaining race condition: when pg_inherits records are swapped between consecutive calls to get_partition_ancestors(), two different child indexes can appear to have the same parent, causing duplicate entries in the arbiter list and triggering "invalid arbiter index list" errors. Note that this is not a new problem introduced by `90eae926a`. The same error could occur before that commit in a slightly different scenario: an index is selected during planning, then index_concurrently_swap() commits, and a subsequent call to get_partition_ancestors() uses a new catalog snapshot that sees zero ancestors for that index. Fix by tracking which parent indexes have already been processed. If a subsequent call to get_partition_ancestors() returns a parent we've already seen, treat that index as unparented instead, allowing it to be matched via IsIndexCompatibleAsArbiter() like other concurrent reindex scenarios. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/e5a8c1df-04e5-4343-85ef-5df2a7e3d90c@gmail.com	2026-01-28 14:38:53 +01:00
Peter Eisentraut	5ca5f12c2c	Fix accidentally cast away qualifiers This fixes cases where a qualifier (const, in all cases here) was dropped by a cast, but the cast was otherwise necessary or desirable, so the straightforward fix is to add the qualifier into the cast. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org	2026-01-26 16:02:31 +01:00
David Rowley	83a53572a6	Always inline SeqNext and SeqRecheck The intention of the work done in `fb9f95502` was that these functions are inlined. I noticed my compiler isn't doing this on -O2 (gcc version 15.2.0). Also, clang version 20.1.8 isn't inlining either. Fix by marking both of these functions as pg_attribute_always_inline to avoid leaving this up to the compiler's heuristics. A quick test with a Seq Scan on a table with a single int column running a query that filters all 1 million rows in the WHERE clause yields a 3.9% speedup on my Zen4 machine. Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrL7Q41B=gv+3wc8+AJGKZugGegUbBo8FPQ+3+NGTPb+w@mail.gmail.com	2026-01-26 14:29:10 +13:00
Amit Langote	f9a468c664	Fix bogus ctid requirement for dummy-root partitioned targets ExecInitModifyTable() unconditionally required a ctid junk column even when the target was a partitioned table. This led to spurious "could not find junk ctid column" errors when all children were excluded and only the dummy root result relation remained. A partitioned table only appears in the result relations list when all leaf partitions have been pruned, leaving the dummy root as the sole entry. Assert this invariant (nrels == 1) and skip the ctid requirement. Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for partitioned tables, which is safe since no rows will be processed in this case. Bug: #19099 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org Backpatch-through: 14	2026-01-23 10:23:30 +09:00
Tom Lane	4b760a181a	Remove faulty Assert in partitioned INSERT...ON CONFLICT DO UPDATE. Commit `f16241bef` mistakenly supposed that INSERT...ON CONFLICT DO UPDATE rejects partitioned target tables. (This may have been accurate when the patch was written, but it was already obsolete when committed.) Hence, there's an assertion that we can't see ItemPointerIndicatesMovedPartitions() in that path, but the assertion is triggerable. Some other places throw error if they see a moved-across-partitions tuple, but there seems no need for that here, because if we just retry then we get the same behavior as in the update-within-partition case, as demonstrated by the new isolation test. So fix by deleting the faulty Assert. (The fact that this is the fix doubtless explains why we've heard no field complaints: the behavior of a non-assert build is fine.) The TM_Deleted case contains a cargo-culted copy of the same Assert, which I also deleted to avoid confusion, although I believe that one is actually not triggerable. Per our code coverage report, neither the TM_Updated nor the TM_Deleted case were reached at all by existing tests, so this patch adds tests for both. Reported-by: Dmitry Koval <d.koval@postgrespro.ru> Author: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/f5fffe4b-11b2-4557-a864-3587ff9b4c36@postgrespro.ru Backpatch-through: 14	2026-01-22 18:35:31 -05:00
Amit Langote	889676a0d5	Fix rowmark handling for non-relation RTEs during executor init Commit `cbc127917e` introduced tracking of unpruned relids to skip processing of pruned partitions. PlannedStmt.unprunableRelids is computed as the difference between PlannerGlobal.allRelids and prunableRelids, but allRelids only contains RTE_RELATION entries. This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are never included in unprunableRelids, and consequently not in es_unpruned_relids at runtime. As a result, rowmarks attached to non-relation RTEs were incorrectly skipped during executor initialization. This affects any DML statement that has rowmarks on such RTEs, including MERGE with a VALUES or subquery source, and UPDATE/DELETE with joins against subqueries or CTEs. When a concurrent update triggers an EPQ recheck, the missing rowmark leads to incorrect results. Fix by restricting the es_unpruned_relids membership check to RTE_RELATION entries only, since partition pruning only applies to actual relations. Rowmarks for other RTE kinds are now always processed. Bug: #19355 Reported-by: Bihua Wang <wangbihua.cn@gmail.com> Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org Backpatch-through: 18	2026-01-16 14:53:50 +09:00
Álvaro Herrera	35e3fae738	Remove #include <math.h> where not needed Liujinyang reported the one in binaryheap.c, I then found and analyzed the rest. For future patches, we require git archaelogical analysis before we accept patches of this nature. Co-authored-by: liujinyang <21043272@qq.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com	2026-01-15 19:09:47 +01:00
Jeff Davis	ed425b5a20	Remove redundant assignment in CreateWorkExprContext In CreateWorkExprContext(), maxBlockSize is initialized to ALLOCSET_DEFAULT_MAXSIZE, and it then immediately reassigned, thus the initialization is a redundant. Author: Andreas Karlsson <andreas@proxel.se> Reported-by: Chao Li <lic@highgo.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/83a14f3c-f347-4769-9c01-30030b31f1eb@gmail.com	2026-01-14 12:01:36 -08:00
Álvaro Herrera	225d1df1d2	Stop including {brin,gin}_tuple.h in tuplesort.h Doing this meant that those two headers, which are supposed to be internal to their corresponding index AMs, were being included pretty much universally, because tuplesort.h is included by execnodes.h which is very widely used. Stop that, and fix fallout. We also change indexing.h to no longer include execnodes.h (tuptable.h is sufficient), and relscan.h to no longer include buf.h (pointless since `c2fe139c20`). Author: Mario González <gonzalemario@gmail.com> Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com	2026-01-12 18:09:49 +01:00
Andres Freund	e5a5e0a907	instrumentation: Keep time fields as instrtime, convert in callers Previously the instrumentation logic always converted to seconds, only for many of the callers to do unnecessary division to get to milliseconds. As an upcoming refactoring will split the Instrumentation struct, utilize instrtime always to keep things simpler. It's also a bit faster to not have to first convert to a double in functions like InstrEndLoop(), InstrAggNode(). Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com	2026-01-09 13:38:00 -05:00
Andres Freund	75609fded3	Fix buggy interaction between array subscripts and subplan params In `a7f107df2` I changed subplan param evaluation to happen within the containing expression. As part of that, ExecInitSubPlanExpr() was changed to evaluate parameters via a new EEOP_PARAM_SET expression step. These parameters were temporarily stored into ExprState->resvalue/resnull, with some reasoning why that would be fine. Unfortunately, that analysis was wrong - ExecInitSubscriptionRef() evaluates the input array into "resv"/"resnull", which will often point to ExprState->resvalue/resnull. This means that the EEOP_PARAM_SET, if inside an array subscript, would overwrite the input array to array subscript. The fix is fairly simple - instead of evaluating into ExprState->resvalue/resnull, store the temporary result of the subplan in the subplan's return value. Bug: #19370 Reported-by: Zepeng Zhang <redraiment@gmail.com> Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Diagnosed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/19370-7fb7a5854b7618f1@postgresql.org Backpatch-through: 18	2026-01-06 19:51:10 -05:00
Amit Kapila	3f906d3af9	Improve the comments atop build_replindex_scan_key(). Author: zourenli <398740848@qq.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/tencent_C2DC8157CC05C8F5C36E12678A7864554809@qq.com	2026-01-05 03:06:55 +00:00
Bruce Momjian	451c43974f	Update copyright for 2026 Backpatch-through: 14	2026-01-01 13:24:10 -05:00
Tom Lane	bc6374cd76	Change IndexAmRoutines to be statically-allocated structs. Up to now, index amhandlers were expected to produce a new, palloc'd struct on each call. That requires palloc/pfree overhead, and creates a risk of memory leaks if the caller fails to pfree, and the time taken to fill such a large structure isn't nil. Moreover, we were storing these things in the relcache, eating several hundred bytes for each cached index. There is not anything in these structs that needs to vary at runtime, so let's change the definition so that an amhandler can return a pointer to a "static const" struct of which there's only one copy per index AM. Mark all the core code's IndexAmRoutine pointers const so that we catch anyplace that might still try to change or pfree one. (This is similar to the way we were already handling TableAmRoutine structs. This commit does fix one comment that was infelicitously copied-and-pasted into tableamapi.c.) This commit needs to be called out in the v19 release notes as an API change for extension index AMs. An un-updated AM will still work (as of now, anyway) but it risks memory leaks and will be slower than necessary. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com	2025-12-30 18:26:23 -05:00
Tom Lane	58dad7f349	Update typedefs.list to match what the buildfarm currently reports. The current list from the buildfarm includes quite a few typedef names that it used to miss. The reason is a bit obscure, but it seems likely to have something to do with our recent increased use of palloc_object and palloc_array. In any case, this makes the relevant struct declarations be much more nicely formatted, so I'll take it. Install the current list and re-run pgindent to update affected code. Syncing with the current list also removes some obsolete typedef names and fixes some alphabetization errors. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us	2025-12-14 17:03:53 -05:00
Michael Paquier	4f7dacc5b8	Use palloc_object() and palloc_array(), the last change This is the last batch of changes that have been suggested by the author, this part covering the non-trivial changes. Some of the changes suggested have been discarded as they seem to lead to more instructions generated, leaving the parts that can be qualified as in-place replacements. Similar work has been done in `1b105f9472`, `0c3c5c3b06` and `31d3847a37`. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-11 14:29:12 +09:00
Michael Paquier	1b105f9472	Use palloc_object() and palloc_array() in backend code The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). This batch of changes includes most of the trivial changes suggested by the author for src/backend/. A total of 334 files are updated here. Among these files, 48 of them have their build change slightly; these are caused by line number changes as the new allocation formulas are simpler, shaving around 100 lines of code in total. Similar work has been done in `0c3c5c3b06` and `31d3847a37`. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-10 07:36:46 +09:00
Álvaro Herrera	90eae926ab	Fix ON CONFLICT with REINDEX CONCURRENTLY and partitions When planning queries with ON CONFLICT on partitioned tables, the indexes to consider as arbiters for each partition are determined based on those found in the parent table. However, it's possible for an index on a partition to be reindexed, and in that case, the auxiliary indexes created on the partition must be considered as arbiters as well; failing to do that may result in spurious "duplicate key" errors given sufficient bad luck. We fix that in this commit by matching every index that doesn't have a parent to each initially-determined arbiter index. Every unparented matching index is considered an additional arbiter index. Closely related to the fixes in `bc32a12e0d` and `2bc7e886fc`, and for identical reasons, not backpatched (for now) even though it's a longstanding issue. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com	2025-12-02 13:51:53 +01:00
Peter Eisentraut	4f941d432b	Remove useless casting to same type This removes some casts where the input already has the same type as the type specified by the cast. Their presence could cause risks of hiding actual type mismatches in the future or silently discarding qualifiers. It also improves readability. Same kind of idea as `7f798aca1d` and `ef8fe69360`. (This does not change all such instances, but only those hand-picked by the author.) Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal	2025-12-02 10:09:32 +01:00
David Rowley	0ca3b16973	Add parallelism support for TID Range Scans In v14, `bb437f995` added support for scanning for ranges of TIDs using a dedicated executor node for the purpose. Here, we allow these scans to be parallelized. The range of blocks to scan is divvied up similarly to how a Parallel Seq Scans does that, where 'chunks' of blocks are allocated to each worker and the size of those chunks is slowly reduced down to 1 block per worker by the time we're nearing the end of the scan. Doing that means workers finish at roughly the same time. Allowing TID Range Scans to be parallelized removes the dilemma from the planner as to whether a Parallel Seq Scan will cost less than a non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan (disk costs are not divided by the number of workers). It was possible the planner could choose the Parallel Seq Scan which would result in reading additional blocks during execution than the TID Scan would have. Allowing Parallel TID Range Scans removes the trade-off the planner makes when choosing between reduced CPU costs due to parallelism vs additional I/O from the Parallel Seq Scan due to it scanning blocks from outside of the required TID range. There is also, of course, the traditional parallelism performance benefits to be gained as well, which likely doesn't need to be explained here. Author: Cary Huang <cary.huang@highgo.ca> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca	2025-11-27 14:05:04 +13:00
Álvaro Herrera	417ac9c1ee	Improve test case stability Given unlucky timing, some of the new tests added by commit `bc32a12e0d` can fail spuriously. We haven't seen such failures yet in buildfarm, but allegedly we can prevent them with this tweak. While at it, remove an unused injection point I (Álvaro) added. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Discussion: https://postgr.es/m/CADzfLwUc=jtSUEaQCtyt8zTeOJ-gHZ8=w_KJsVjDOYSLqaY9Lg@mail.gmail.com Discussion: https://postgr.es/m/CADzfLwV5oQq-Vg_VmG_o4SdL6yHjDoNO4T4pMtgJLzYGmYf74g@mail.gmail.com	2025-11-25 18:20:06 +01:00
Álvaro Herrera	bc32a12e0d	Fix infer_arbiter_index during concurrent index operations Previously, we would only consider indexes marked indisvalid as usable for INSERT ON CONFLICT. But that's problematic during CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY, because concurrent transactions would end up with inconsistents lists of inferred indexes, leading to deadlocks and spurious errors about unique key violations (because two transactions are operating on different indexes for the speculative insertion tokens). Change this function to return indexes even if invalid. This fixes the spurious errors and deadlocks. Because such indexes might not be complete, we still need uniqueness to be verified in a different way. We do that by requiring that at least one index marked valid is part of the set of indexes returned. It is that index that is going to help ensure that the inserted tuple is indeed unique. This does not fix similar problems occurring with partitioned tables or with named constraints. These problems will be fixed in follow-up commits. We have no user report of this problem, even though it exists in all branches. Because of that and given that the fix is somewhat tricky, I decided not to backpatch for now. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com	2025-11-24 17:17:29 +01:00
Richard Guo	4d3f623ea8	Fix typo in nodeHash.c Replace "overlow" with "overflow". Author: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAHewXNnzFjAjYLTkP78HE2PQ17MjBqFdQQg+0X6Wo7YMUb68xA@mail.gmail.com	2025-11-19 11:04:03 +09:00
Álvaro Herrera	77fb3959a4	Fix typo	2025-11-18 19:31:23 +01:00

1 2 3 4 5 ...

2767 commits