Commit graph

2060 commits

Author SHA1 Message Date
Robert Haas
0fbfd37cef Allow extensions to mark an individual index as disabled.
Up until now, the only way for a loadable module to disable the use of a
particular index was to use build_simple_rel_hook (or, previous to
yesterday's commit, get_relation_info_hook) to remove it from the index
list. While that works, it has some disadvantages. First, the index
becomes invisible for all purposes, and can no longer be used for
optimizations such as self-join elimination or left join removal, which
can severely degrade the resulting plan.

Second, if the module attempts to compel the use of a certain index
by removing all other indexes from the index list and disabling
other scan types, but the planner is unable to use the chosen index
for some reason, it will fall back to a sequential scan, because that
is only disabled, whereas the other indexes are, from the planner's
point of view, completely gone. While this situation ideally shouldn't
occur, it's hard for a loadable module to be completely sure whether
the planner will view a certain index as usable for a certain query.
If it isn't, it may be better to fall back to a scan using a disabled
index rather than falling back to an also-disabled sequential scan.

Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA%2BTgmoYS4ZCVAF2jTce%3DbMP0Oq_db_srocR4cZyO0OBp9oUoGg%40mail.gmail.com
2026-03-10 08:33:55 -04:00
Jeff Davis
8185bb5347 CREATE SUBSCRIPTION ... SERVER.
Allow CREATE SUBSCRIPTION to accept a foreign server using the SERVER
clause instead of a raw connection string using the CONNECTION clause.

  * Enables a user with sufficient privileges to create a subscription
    using a foreign server by name without specifying the connection
    details.

  * Integrates with user mappings (and other FDW infrastructure) using
    the subscription owner.

  * Provides a layer of indirection to manage multiple subscriptions
    to the same remote server more easily.

Also add CREATE FOREIGN DATA WRAPPER ... CONNECTION clause to specify
a connection_function. To be eligible for a subscription, the foreign
server's foreign data wrapper must specify a connection_function.

Add connection_function support to postgres_fdw, and bump postgres_fdw
version to 1.3.

Bump catversion.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/61831790a0a937038f78ce09f8dd4cef7de7456a.camel@j-davis.com
2026-03-06 08:27:56 -08:00
Amit Kapila
fd366065e0 Allow table exclusions in publications via EXCEPT TABLE.
Extend CREATE PUBLICATION ... FOR ALL TABLES to support the EXCEPT TABLE
syntax. This allows one or more tables to be excluded. The publisher will
not send the data of excluded tables to the subscriber.

To support this, pg_publication_rel now includes a prexcept column to flag
excluded relations. For partitioned tables, the exclusion is applied at
the root level; specifying a root table excludes all current and future
partitions in that tree.

Follow-up work will implement ALTER PUBLICATION support for managing these
exclusions.

Author: vignesh C <vignesh21@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3=JrucjhiiwsYQw5-PGtBHFONa6F7hhWCXMsGvh=tamA@mail.gmail.com
2026-03-04 15:56:48 +05:30
Michael Paquier
462fe0ff62 Fix variety of typos and grammar mistakes
This commit includes a batch of fixes for various minor typos and
grammar mistakes, that have been proposed to the hackers mailing list
since the beginning of January.

Similar batches are planned on a bi-monthly basis depending on the
amount received, with the next one for the end of April.
2026-02-24 13:26:37 +09:00
Dean Rasheed
88327092ff Add support for INSERT ... ON CONFLICT DO SELECT.
This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which
returns the pre-existing rows when conflicts are detected. The INSERT
statement must have a RETURNING clause, when DO SELECT is specified.

The optional FOR UPDATE/SHARE clause allows the rows to be locked
before they are are returned. As with a DO UPDATE conflict action, an
optional WHERE clause may be used to prevent rows from being selected
for return (but as with a DO UPDATE action, rows filtered out by the
WHERE clause are still locked).

Bumps catversion as stored rules change.

Author: Andreas Karlsson <andreas@proxel.se>
Author: Marko Tiikkaja <marko@joh.to>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark
Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark
Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se
Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com
2026-02-12 09:57:04 +00:00
Robert Haas
7358abcc60 Store information about Append node consolidation in the final plan.
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.

To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.

This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.

Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-02-10 17:55:59 -05:00
Robert Haas
0d4391b265 Store information about elided nodes in the final plan.
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.

To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable.  So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.

Also, update pg_overexplain to display these details.

Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-02-10 16:46:05 -05:00
Robert Haas
adbad833f3 Store information about range-table flattening in the final plan.
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.

Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.

To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.

This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.

Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-02-10 15:33:39 -05:00
Peter Eisentraut
0af05b5dbb Revert "Change copyObject() to use typeof_unqual"
This reverts commit 4cfce4e62c.

This implementation fails to compile on newer MSVC that support
__typeof_unqual__.  (Older versions did not support it and compiled
fine.)  Revert for now and research further.

Reported-by: Bryan Green <dbryan.green@gmail.com>
Discussion: https://www.postgresql.org/message-id/b03ddcd4-2a16-49ee-b105-e7f609f3c514%40gmail.com
2026-02-07 10:08:38 +01:00
Peter Eisentraut
4cfce4e62c Change copyObject() to use typeof_unqual
Currently, when the argument of copyObject() is const-qualified, the
return type is also, because the use of typeof carries over all the
qualifiers.  This is incorrect, since the point of copyObject() is to
make a copy to mutate.  But apparently no code ran into it.

The new implementation uses typeof_unqual, which drops the qualifiers,
making this work correctly.

typeof_unqual is standardized in C23, but all recent versions of all
the usual compilers support it even in non-C23 mode, at least as
__typeof_unqual__.  We add a configure/meson test for typeof_unqual
and __typeof_unqual__ and use it if it's available, else we use the
existing fallback of just returning void *.

Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
2026-02-04 09:22:41 +01:00
Robert Haas
4020b370f2 Allow for plugin control over path generation strategies.
Each RelOptInfo now has a pgs_mask member which is a mask of acceptable
strategies. For most rels, this is populated from PlannerGlobal's
default_pgs_mask, which is computed from the values of the enable_*
GUCs at the start of planning.

For baserels, get_relation_info_hook can be used to adjust pgs_mask for
each new RelOptInfo, at least for rels of type RTE_RELATION. Adjusting
pgs_mask is less useful for other types of rels, but if it proves to
be necessary, we can revisit the way this hook works or add a new one.

For joinrels, two new hooks are added. joinrel_setup_hook is called each
time a joinrel is created, and one thing that can be done from that hook
is to manipulate pgs_mask for the new joinrel. join_path_setup_hook is
called each time we're about to add paths to a joinrel by considering
some particular combination of an outer rel, an inner rel, and a join
type. It can modify the pgs_mask propagated into JoinPathExtraData to
restrict strategy choice for that particular combination of rels.

To make joinrel_setup_hook work as intended, the existing calls to
build_joinrel_partition_info are moved later in the calling functions;
this is because that function checks whether the rel's pgs_mask includes
PGS_CONSIDER_PARTITIONWISE, so we want it to only be called after
plugins have had a chance to alter pgs_mask.

Upper rels currently inherit pgs_mask from the input relation. It's
unclear that this is the most useful behavior, but at the moment there
are no hooks to allow the mask to be set in any other way.

Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-01-28 11:28:34 -05:00
Michael Paquier
e217dc7484 Fix query jumbling with GROUP BY clauses
RangeTblEntry.groupexprs was marked with the node attribute
query_jumble_ignore, causing a list of GROUP BY expressions to be
ignored during the query jumbling.  For example, these two queries could
be grouped together within the same query ID:
SELECT count(*) FROM t GROUP BY a;
SELECT count(*) FROM t GROUP BY b;

However, as such queries use different GROUP BY clauses, they should be
split across multiple entries.

This fixes an oversight in 247dea89f7, that has introduced an RTE for
GROUP BY clauses.  Query IDs are documented as being stable across minor
releases, but as this is a regression new to v18 and that we are still
early in its support cycle, a backpatch is exceptionally done as this
has broken a behavior that exists since query jumbling is supported in
core, since its introduction in pg_stat_statements.

The tests of pg_stat_statements are expanded to cover this area, with
patterns involving GROUP BY and GROUPING clauses.

Author: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com
Backpatch-through: 18
2026-01-14 08:44:12 +09:00
Álvaro Herrera
225d1df1d2
Stop including {brin,gin}_tuple.h in tuplesort.h
Doing this meant that those two headers, which are supposed to be
internal to their corresponding index AMs, were being included pretty
much universally, because tuplesort.h is included by execnodes.h which
is very widely used.  Stop that, and fix fallout.

We also change indexing.h to no longer include execnodes.h (tuptable.h
is sufficient), and relscan.h to no longer include buf.h (pointless
since c2fe139c20).

Author: Mario González <gonzalemario@gmail.com>
Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com
2026-01-12 18:09:49 +01:00
Álvaro Herrera
2defd00062
Move instrumentation-related structs to instrument_node.h
Some structs and enums related to parallel query instrumentation had
organically grown scattered across various files, and were causing
header pollution especially through execnodes.h.  Create a single file
where they can live together.

This only moves the structs to the new file; cleaning up the pollution
by removing no-longer-necessary cross-header inclusion will be done in
future commits.

Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Mario González <gonzalemario@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql
Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com
2026-01-12 16:59:28 +01:00
Michael Paquier
b8cfcb9e00 Fix typos and inconsistencies in code and comments
This change is a cocktail of harmonization of function argument names,
grammar typos, renames for better consistency and unused code (see
ltree).  All of these have been spotted by the author.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
2026-01-05 09:19:15 +09:00
Tom Lane
62299bbd90 Add parse location to IndexElem.
This patch mostly just fills in the field, although a few error
reports in resolve_unique_index_expr() are adjusted to use it.
The next commit will add more uses.

catversion bump out of an abundance of caution: I'm not sure
IndexElem can appear in stored rules, but I'm not sure it can't
either.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxH3OgXF1hrzGAaWyNtye2jHEmk9JbtrtGv-KJK6tsGo5w@mail.gmail.com
Discussion: https://postgr.es/m/202512121327.f2zimsr6guso@alvherre.pgsql
2026-01-04 14:16:20 -05:00
Bruce Momjian
451c43974f Update copyright for 2026
Backpatch-through: 14
2026-01-01 13:24:10 -05:00
Tom Lane
58dad7f349 Update typedefs.list to match what the buildfarm currently reports.
The current list from the buildfarm includes quite a few typedef
names that it used to miss.  The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array.  In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it.  Install the current list and re-run pgindent
to update affected code.

Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.

Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
2025-12-14 17:03:53 -05:00
Alexander Korotkov
4b3d173629 Implement ALTER TABLE ... SPLIT PARTITION ... command
This new DDL command splits a single partition into several partitions.  Just
like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are
created using the createPartitionTable() function with the parent partition
as the template.

This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing.  This is why the new DDL command
can't be recommended for large, partitioned tables under high load.  However,
this implementation comes in handy in certain cases, even as it is.  Also, it
could serve as a foundation for future implementations with less locking and
possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
2025-12-14 13:29:38 +02:00
Alexander Korotkov
f2e4cc4279 Implement ALTER TABLE ... MERGE PARTITIONS ... command
This new DDL command merges several partitions into a single partition of the
target table.  The target partition is created using the new
createPartitionTable() function with the parent partition as the template.

This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing.  This is why this new DDL
command can't be recommended for large partitioned tables under a high load.
However, this implementation comes in handy in certain cases, even as it is.
Also, it could serve as a foundation for future implementations with less
locking and possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
2025-12-14 13:29:17 +02:00
Peter Eisentraut
9790affcce Fix stray references to SubscriptRef
This type never existed.  SubscriptingRef was meant instead.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org
2025-12-03 14:44:14 +01:00
David Rowley
0ca3b16973 Add parallelism support for TID Range Scans
In v14, bb437f995 added support for scanning for ranges of TIDs using a
dedicated executor node for the purpose.  Here, we allow these scans to
be parallelized.  The range of blocks to scan is divvied up similarly to
how a Parallel Seq Scans does that, where 'chunks' of blocks are
allocated to each worker and the size of those chunks is slowly reduced
down to 1 block per worker by the time we're nearing the end of the
scan.  Doing that means workers finish at roughly the same time.

Allowing TID Range Scans to be parallelized removes the dilemma from the
planner as to whether a Parallel Seq Scan will cost less than a
non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan
(disk costs are not divided by the number of workers).  It was possible
the planner could choose the Parallel Seq Scan which would result in
reading additional blocks during execution than the TID Scan would have.
Allowing Parallel TID Range Scans removes the trade-off the planner
makes when choosing between reduced CPU costs due to parallelism vs
additional I/O from the Parallel Seq Scan due to it scanning blocks from
outside of the required TID range.  There is also, of course, the
traditional parallelism performance benefits to be gained as well, which
likely doesn't need to be explained here.

Author: Cary Huang <cary.huang@highgo.ca>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
2025-11-27 14:05:04 +13:00
David Rowley
42473b3b31 Have the planner replace COUNT(ANY) with COUNT(*), when possible
This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport
functions to receive an Aggref and allow them to determine if there is a
way that the Aggref call can be optimized.

Also added is a support function to allow transformation of COUNT(ANY)
into COUNT(*).  This is possible to do when the given "ANY" cannot be
NULL and also that there are no ORDER BY / DISTINCT clauses within the
Aggref.  This is a useful transformation to do as it is common that
people write COUNT(1), which until now has added unneeded overhead.
When counting a NOT NULL column.  The overheads can be worse as that
might mean deforming more of the tuple, which for large fact tables may
be many columns in.

It may be possible to add prosupport functions for other aggregates.  We
could consider if ORDER BY could be dropped for some calls, e.g. the
ORDER BY is quite useless in MAX(c ORDER BY c).

There is a little bit of passing fallout from adjusting
expr_is_nonnullable() to handle Const which results in a plan change in
the aggregates.out regression test.  Previously, nothing was able to
determine that "One-Time Filter: (100 IS NOT NULL)" was always true,
therefore useless to include in the plan.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com
2025-11-27 10:43:28 +13:00
David Rowley
07d1dc3aeb Fix incorrect IndexOptInfo header comment
The comment incorrectly indicated that indexcollations[] stored
collations for both key columns and INCLUDE columns, but in reality it
only has elements for the key columns.  canreturn[] didn't get a mention,
so add that while we're here.

Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com
Backpatch-through: 14
2025-11-24 17:00:01 +13:00
Tom Lane
b140c8d7a3 Add SupportRequestInlineInFrom planner support request.
This request allows a support function to replace a function call
appearing in FROM (typically a set-returning function) with an
equivalent SELECT subquery.  The subquery will then be subject
to the planner's usual optimizations, potentially allowing a much
better plan to be generated.  While the planner has long done this
automatically for simple SQL-language functions, it's now possible
for extensions to do it for functions outside that group.
Notably, this could be useful for functions that are presently
implemented in PL/pgSQL and work by generating and then EXECUTE'ing
a SQL query.

Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
2025-11-22 19:33:34 -05:00
Alexander Korotkov
447aae13b0 Implement WAIT FOR command
WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed.  This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

WAIT FOR needs to wait without any snapshot held.  Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why separate utility command seems appears to be the most robust
way to implement this functionality.  It's not possible to implement this as
a function.  Previous experience shows that stored procedures also have
limitation in this aspect.

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Tom Lane
8f29467c57 Change "long" numGroups fields to be Cardinality (i.e., double).
We've been nibbling away at removing uses of "long" for a long time,
since its width is platform-dependent.  Here's one more: change the
remaining "long" fields in Plan nodes to Cardinality, since the three
surviving examples all represent group-count estimates.  The upstream
planner code was converted to Cardinality some time ago; for example
the corresponding fields in Path nodes are type Cardinality, as are
the arguments of the make_foo_path functions.  Downstream in the
executor, it turns out that these all feed to the table-size argument
of BuildTupleHashTable.  Change that to "double" as well, and fix it
so that it safely clamps out-of-range values to the uint32 limit of
simplehash.h, as was not being done before.

Essentially, this is removing all the artificial datatype-dependent
limitations on these values from upstream processing, and applying
just one clamp at the moment where we're forced to do so by the
datatype choices of simplehash.h.

Also, remove BuildTupleHashTable's misguided attempt to enforce
work_mem/hash_mem_limit.  It doesn't have enough information
(particularly not the expected tuple width) to do that accurately,
and it has no real business second-guessing the caller's choice.
For all these plan types, it's really the planner's responsibility
to not choose a hashed implementation if the hashtable is expected
to exceed hash_mem_limit.  The previous patch improved the
accuracy of those estimates, and even if BuildTupleHashTable had
more information it should arrive at the same conclusions.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
2025-11-02 16:57:43 -05:00
Tom Lane
c106ef0807 Use BumpContext contexts in TupleHashTables, and do some code cleanup.
For all extant uses of TupleHashTables, execGrouping.c itself does
nothing with the "tablecxt" except to allocate new hash entries in it,
and the callers do nothing with it except to reset the whole context.
So this is an ideal use-case for a BumpContext, and the hash tables
are frequently big enough for the savings to be significant.

(Commit cc721c459 already taught nodeAgg.c this idea, but neglected
the other callers of BuildTupleHashTable.)

While at it, let's clean up some ill-advised leftovers from rebasing
TupleHashTables on simplehash.h:

* Many comments and variable names were based on the idea that the
tablecxt holds the whole TupleHashTable, whereas now it only holds the
hashed tuples (plus any caller-defined "additional storage").  Rename
to names like tuplescxt and tuplesContext, and adjust the comments.
Also adjust the memory context names to be like "<Foo> hashed tuples".

* Make ResetTupleHashTable() reset the tuplescxt rather than relying
on the caller to do so; that was fairly bizarre and seems like a
recipe for leaks.  This is less efficient in the case where nodeAgg.c
uses the same tuplescxt for several different hashtables, but only
microscopically so because mcxt.c will short-circuit the extra resets
via its isReset flag.  I judge the extra safety and intellectual
cleanliness well worth those few cycles.

* Remove the long-obsolete "allow_jit" check added by ac88807f9;
instead, just Assert that metacxt and tuplescxt are different.
We need that anyway for this definition of ResetTupleHashTable() to
be safe.

There is a side issue of the extent to which this change invalidates
the planner's estimates of hashtable memory consumption.  However,
those estimates are already pretty bad, so improving them seems like
it can be a separate project.  This change is useful to do first to
establish consistent executor behavior that the planner can expect.

A loose end not addressed here is that the "entrysize" calculation
in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) +
additionalsize" corresponds neither to the size of the simplehash
entries nor to the total space needed per tuple.  It's questionable
why BuildTupleHashTable is second-guessing its caller's nbuckets
choice at all, since the original source of the number should have had
more information.  But that all seems wrapped up with the planner's
estimation logic, so let's leave it for the planned followup patch.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us
2025-10-30 11:21:22 -04:00
Peter Eisentraut
8ce795fcb7 Fix some confusing uses of const
There are a few places where we have

    typedef struct FooData { ... } FooData;
    typedef FooData *Foo;

and then function declarations with

    bar(const Foo x)

which isn't incorrect but probably meant

    bar(const FooData *x)

meaning that the thing x points to is immutable, not x itself.

This patch makes those changes where appropriate.  In one
case (execGrouping.c), the thing being pointed to was not immutable,
so in that case remove the const altogether, to avoid further
confusion.

Co-authored-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
2025-10-30 11:20:04 +01:00
Amit Kapila
f0b3573c3a Introduce "REFRESH SEQUENCES" for subscriptions.
This patch adds support for a new SQL command:
ALTER SUBSCRIPTION ... REFRESH SEQUENCES
This command updates the sequence entries present in the
pg_subscription_rel catalog table with the INIT state to trigger
resynchronization.

In addition to the new command, the following subscription commands have
been enhanced to automatically refresh sequence mappings:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION
ALTER SUBSCRIPTION ... ADD PUBLICATION
ALTER SUBSCRIPTION ... DROP PUBLICATION
ALTER SUBSCRIPTION ... SET PUBLICATION

These commands will perform the following actions:
Add newly published sequences that are not yet part of the subscription.
Remove sequences that are no longer included in the publication.

This ensures that sequence replication remains aligned with the current
state of the publication on the publisher side.

Note that the actual synchronization of sequence data/values will be
handled in a subsequent patch that introduces a dedicated sequence sync
worker.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-23 08:30:27 +00:00
David Rowley
86d118f9a6 Englishify comment wording
Switch to using the English word here rather than using a verbified
function name.

The full word still fits within a single comment line, so it's probably
better just to use that instead of trying to shorten it, which might
cause confusion.

Author: Rafia Sabih <rafia.pghackers@gmail.com>
Discussion: https://postgr.es/m/CA+FpmFe7LnRF2NA_QfARjkSWme4mNt+Udwbh2Yb=zZm35Ji31w@mail.gmail.com
2025-10-18 12:50:14 +13:00
Nathan Bossart
812221b204 Remove partColsUpdated.
This information appears to have been unused since commit
c5b7ba4e67.  We could not find any references in third-party code,
either.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan
2025-10-16 11:31:38 -05:00
Amit Kapila
2436b8c047 Standardize use of REFRESH PUBLICATION in code and messages.
This patch replaces ALTER SUBSCRIPTION REFRESH with
ALTER SUBSCRIPTION REFRESH PUBLICATION in comments and error messages to
improve clarity and support future extensibility. The change aligns with
upcoming addition REFRESH SEQUENCES for sequence synchronization.

Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-15 03:42:27 +00:00
Richard Guo
1206df04c2 Rename apply_at to apply_agg_at for clarity
The field name "apply_at" in RelAggInfo was a bit ambiguous.  Rename
it to "apply_agg_at" to improve clarity and make its purpose clearer.

Per complaint from David Rowley, Robert Haas.

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+TgmoZ0KR2_XCWHy17=HHcQ3p2Mamc9c6Dnnhf1J6wPYFD9ng@mail.gmail.com
2025-10-14 16:35:22 +09:00
Amit Kapila
96b3784973 Add "ALL SEQUENCES" support to publications.
This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences that is useful for
upgrades.

Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
\d shows publications for a given sequence.
\dRp indicates if a publication includes all sequences.

ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in future.

The view pg_publication_sequences provides information about the mapping
between publications and sequences.

This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.

Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-09 03:48:54 +00:00
Robert Haas
4685977cc5 Add extension_state member to PlannedStmt.
Extensions can stash data computed at plan time into this list using
planner_shutdown_hook (or perhaps other mechanisms) and then access
it from any code that has access to the PlannedStmt (such as explain
hooks), allowing for extensible debugging and instrumentation of
plans.

Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-08 09:07:49 -04:00
Richard Guo
8e11859102 Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined.  Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.

In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations.  To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node.  During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied.  If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.

Grouped relation paths can be generated in two ways.  The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths.  To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation.  Joining two grouped relations
is currently not supported.

To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.

For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators.  This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does.  This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.

One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation.  Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.

If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end.  The final paths will compete in the
usual way with paths built from regular planning.

The patch was originally proposed by Antonin Houska in 2017.  This
commit reworks various important aspects and rewrites most of the
current code.  However, the original patch and reviews were very
useful.

Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
2025-10-08 17:04:23 +09:00
Robert Haas
64095d1574 Remove PlannerInfo's join_search_private method.
Instead, use the new mechanism that allows planner extensions to store
private state inside a PlannerInfo, treating GEQO as an in-core planner
extension.  This is a useful test of the new facility, and also buys
back a few bytes of storage.

To make this work, we must remove innerrel_is_unique_ext's hack of
testing whether join_search_private is set as a proxy for whether
the join search might be retried. Add a flag that extensions can
use to explicitly signal their intentions instead.

Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-07 12:43:45 -04:00
Robert Haas
0132dddab3 Allow private state in certain planner data structures.
Extension that make extensive use of planner hooks may want to
coordinate their efforts, for example to avoid duplicate computation,
but that's currently difficult because there's no really good way to
pass data between different hooks.

To make that easier, allow for storage of extension-managed private
state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very
similar lines to what we have permitted for ExplainState since commit
c65bc2e1d1.

Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-07 12:09:30 -04:00
Robert Haas
8c49a484e8 Assign each subquery a unique name prior to planning it.
Previously, subqueries were given names only after they were planned,
which makes it difficult to use information from a previous execution of
the query to guide future planning. If, for example, you knew something
about how you want "InitPlan 2" to be planned, you won't know whether
the subquery you're currently planning will end up being "InitPlan 2"
until after you've finished planning it, by which point it's too late to
use the information that you had.

To fix this, assign each subplan a unique name before we begin planning
it. To improve consistency, use textual names for all subplans, rather
than, as we did previously, a mix of numbers (such as "InitPlan 1") and
names (such as "CTE foo"), and make sure that the same name is never
assigned more than once.

We adopt the somewhat arbitrary convention of using the type of sublink
to set the plan name; for example, a query that previously had two
expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up
named expr_1 and expr_2. Because names are assigned before rather than
after planning, some of the regression test outputs show the numerical
part of the name switching positions: what was previously SubPlan 2 was
actually the first one encountered, but we finished planning it later.

We assign names even to subqueries that aren't shown as such within the
EXPLAIN output. These include subqueries that are a FROM clause item or
a branch of a set operation, rather than something that will be turned
into an InitPlan or SubPlan. The purpose of this is to make sure that,
below the topmost query level, there's always a name for each subquery
that is stable from one planning cycle to the next (assuming no changes
to the query or the database schema).

Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
2025-10-07 09:18:54 -04:00
Tatsuo Ishii
25a30bbd42 Add IGNORE NULLS/RESPECT NULLS option to Window functions.
Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead,
lag, first_value, last_value and nth_value window functions.  If
unspecified, the default is RESPECT NULLS which includes NULL values
in any result calculation. IGNORE NULLS ignores NULL values.

Built-in window functions are modified to call new API
WinCheckAndInitializeNullTreatment() to indicate whether they accept
IGNORE NULLS/RESPECT NULLS option or not (the API can be called by
user defined window functions as well).  If WinGetFuncArgInPartition's
allowNullTreatment argument is true and IGNORE NULLS option is given,
WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return
evaluated function's argument expression on specified non NULL row (if
it exists) in the partition or the frame.

When IGNORE NULLS option is given, window functions need to visit and
evaluate same rows over and over again to look for non null rows. To
mitigate the issue, 2-bit not null information array is created while
executing window functions to remember whether the row has been
already evaluated to NULL or NOT NULL. If already evaluated, we could
skip the evaluation work, thus we could get better performance.

Author: Oliver Ford <ojford@gmail.com>
Co-authored-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com>
Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
2025-10-03 09:47:36 +09:00
Álvaro Herrera
3bf31dd243
Do a tiny bit of header file maintenance
Stop including utils/relcache.h in access/genam.h, and stop including
htup_details.h in nodes/tidbitmap.h.  Both these files (genam.h and
tidbitmap.h) are widely used in other header files, so it's in our best
interest that they remain as lean as reasonable.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
2025-09-30 12:28:29 +02:00
Tom Lane
ef38a4d975 Add GROUP BY ALL.
GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does
not contain an aggregate or window function into the groupClause of
the query, making it exactly equivalent to specifying those same
expressions in an explicit GROUP BY list.

This feature is useful for certain kinds of data exploration.  It's
already present in some other DBMSes, and the SQL committee recently
accepted it into the standard, so we can be reasonably confident in
the syntax being stable.  We do have to invent part of the semantics,
as the standard doesn't allow for expressions in GROUP BY, so they
haven't specified what to do with window functions.  We assume that
those should be treated like aggregates, i.e., left out of the
constructed GROUP BY list.

In passing, wordsmith some existing documentation about GROUP BY,
and update some neglected synopsis entries in select_into.sgml.

Author: David Christensen <david@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com
2025-09-29 16:55:17 -04:00
Masahiko Sawada
66cdef4425 Remove unused for_all_tables field from AlterPublicationStmt.
No backpatch as AlterPublicationStmt struct is exposed.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAD21AoC6B6AuxWOST-TkxUbDgp8FwX=BLEJZmKLG_VrH-hfxpA@mail.gmail.com
2025-09-26 09:23:00 -07:00
Álvaro Herrera
81fc3e28e3
Update some more forward declarations to use typedef
As commit d4d1fc527b.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/202509191025.22agk3fvpilc@alvherre.pgsql
2025-09-25 14:33:19 +02:00
Robert Haas
f2bae51dfd Keep track of what RTIs a Result node is scanning.
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.

Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.

Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted.  Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.

The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.

Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
2025-09-23 09:07:55 -04:00
Richard Guo
e3a0304eba Fix misleading comment in RangeTblEntry
The comment describing join_using_alias incorrectly referred to the
alias field as being defined "below", when it actually appears earlier
in the RangeTblEntry struct.  This patch fixes that.

Author: Steve Lau <stevelauc@outlook.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/TYWPR01MB10612B020C33FD08F729415CEB613A@TYWPR01MB10612.jpnprd01.prod.outlook.com
2025-09-22 10:04:39 +09:00
Peter Eisentraut
d4d1fc527b Update various forward declarations to use typedef
There are a number of forward declarations that use struct but not the
customary typedef, because that could have led to repeat typedefs,
which was not allowed.  This is now allowed in C11, so we can update
these to provide the typedefs as well, so that the later uses of the
types look more consistent.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
2025-09-15 11:04:10 +02:00
Peter Eisentraut
1e3b5edb8e Remove workarounds against repeat typedefs
This is allowed in C11, so we don't need the workarounds anymore.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
2025-09-15 11:04:10 +02:00
Tom Lane
bdc6cfcd12 Eliminate duplicative hashtempcxt in nodeSubplan.c.
Instead of building a separate memory context that's used just
for running hash functions, make the hash functions run in the
per-tuple context of the node's innerecontext.  This saves a
little space at runtime, and it avoids needing to reset two
contexts instead of one inside buildSubPlanHash's main loop.

This largely reverts commit 133924e13.  That's safe to do now
because bf6c614a2 decoupled the evaluation context used by
TupleHashTableMatch from that used for hash function evaluation,
so that there's no longer a risk of resetting the innerecontext
too soon.

Per discussion of bug #19040, although this is not directly
a fix for that.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reviewed-by: Fei Changhong <feichanghong@qq.com>
Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org
2025-09-10 16:15:08 -04:00