Commit graph

2331 commits

Author SHA1 Message Date
Alvaro Herrera
7103ebb7aa
Add support for MERGE SQL command
MERGE performs actions that modify rows in the target table using a
source table or query. MERGE provides a single SQL statement that can
conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise
require multiple PL statements.  For example,

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular tables, partitioned tables and inheritance
hierarchies, including column and row security enforcement, as well as
support for row and statement triggers and transition tables therein.

MERGE is optimized for OLTP and is parameterizable, though also useful
for large scale ETL/ELT. MERGE is not intended to be used in preference
to existing single SQL commands for INSERT, UPDATE or DELETE since there
is some overhead.  MERGE can be used from PL/pgSQL.

MERGE does not support targetting updatable views or foreign tables, and
RETURNING clauses are not allowed either.  These limitations are likely
fixable with sufficient effort.  Rewrite rules are also not supported,
but it's not clear that we'd want to support them.

Author: Pavan Deolasee <pavan.deolasee@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
2022-03-28 16:47:48 +02:00
Andrew Dunstan
f4fb45d15c SQL/JSON constructors
This patch introduces the SQL/JSON standard constructors for JSON:

JSON()
JSON_ARRAY()
JSON_ARRAYAGG()
JSON_OBJECT()
JSON_OBJECTAGG()

For the most part these functions provide facilities that mimic
existing json/jsonb functions. However, they also offer some useful
additional functionality. In addition to text input, the JSON() function
accepts bytea input, which it will decode and constuct a json value from.
The other functions provide useful options for handling duplicate keys
and null values.

This series of patches will be followed by a consolidated documentation
patch.

Nikita Glukhov

Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-27 17:03:34 -04:00
Andrew Dunstan
f79b803dcc Common SQL/JSON clauses
This introduces some of the building blocks used by the SQL/JSON
constructor and query functions. Specifically, it provides node
executor and grammar support for the FORMAT JSON [ENCODING foo]
clause, and values decorated with it, and for the RETURNING clause.

The following SQL/JSON patches will leverage these.

Nikita Glukhov (who probably deserves an award for perseverance).

Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-27 17:03:33 -04:00
Michael Paquier
411b91360f Fix comment in execParallel.c
0f61727 has made this comment incorrect.

Author: Julien Rouhaud
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/20220326160117.qtp5nkuku6cvhcby@jrouhaud
2022-03-27 18:22:22 +09:00
Tomas Vondra
923def9a53 Allow specifying column lists for logical replication
This allows specifying an optional column list when adding a table to
logical replication. The column list may be specified after the table
name, enclosed in parentheses. Columns not included in this list are not
sent to the subscriber, allowing the schema on the subscriber to be a
subset of the publisher schema.

For UPDATE/DELETE publications, the column list needs to cover all
REPLICA IDENTITY columns. For INSERT publications, the column list is
arbitrary and may omit some REPLICA IDENTITY columns. Furthermore, if
the table uses REPLICA IDENTITY FULL, column list is not allowed.

The column list can contain only simple column references. Complex
expressions, function calls etc. are not allowed. This restriction could
be relaxed in the future.

During the initial table synchronization, only columns included in the
column list are copied to the subscriber. If the subscription has
several publications, containing the same table with different column
lists, columns specified in any of the lists will be copied.

This means all columns are replicated if the table has no column list
at all (which is treated as column list with all columns), or when of
the publications is defined as FOR ALL TABLES (possibly IN SCHEMA that
matches the schema of the table).

For partitioned tables, publish_via_partition_root determines whether
the column list for the root or the leaf relation will be used. If the
parameter is 'false' (the default), the list defined for the leaf
relation is used. Otherwise, the column list for the root partition
will be used.

Psql commands \dRp+ and \d <table-name> now display any column lists.

Author: Tomas Vondra, Alvaro Herrera, Rahila Syed
Reviewed-by: Peter Eisentraut, Alvaro Herrera, Vignesh C, Ibrar Ahmed,
Amit Kapila, Hou zj, Peter Smith, Wang wei, Tang, Shi yu
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com
2022-03-26 01:01:27 +01:00
Tomas Vondra
75b1521dae Add decoding of sequences to built-in replication
This commit adds support for decoding of sequences to the built-in
replication (the infrastructure was added by commit 0da92dc530).

The syntax and behavior mostly mimics handling of tables, i.e. a
publication may be defined as FOR ALL SEQUENCES (replicating all
sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating
all sequences in a particular schema) or individual sequences.

To publish sequence modifications, the publication has to include
'sequence' action. The protocol is extended with a new message,
describing sequence increments.

A new system view pg_publication_sequences lists all the sequences
added to a publication, both directly and indirectly. Various psql
commands (\d and \dRp) are improved to also display publications
including a given sequence, or sequences included in a publication.

Author: Tomas Vondra, Cary Huang
Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres
             Freund, Petr Jelinek
Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com
Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca
2022-03-24 18:49:27 +01:00
Andrew Dunstan
1460fc5942 Revert "Common SQL/JSON clauses"
This reverts commit 865fe4d5df.

This has caused issues with a significant number of buildfarm members
2022-03-22 19:56:14 -04:00
Andrew Dunstan
865fe4d5df Common SQL/JSON clauses
This introduces some of the building blocks used by the SQL/JSON
constructor and query functions. Specifically, it provides node
executor and grammar support for the FORMAT JSON [ENCODING foo]
clause, and values decorated with it, and for the RETURNING clause.

The following SQL/JSON patches will leverage these.

Nikita Glukhov (who probably deserves an award for perseverance).

Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup. Erik Rijkers, Zihong Yu and
Himanshu Upadhyaya.

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-22 17:32:54 -04:00
Alvaro Herrera
2d655a08d5
Blind fix for uninitialized memory bug in ba9a7e3921
Valgrind animal skink shows a crash in this new code.  I couldn't
reproduce the problem locally, but going by blind code inspection,
initializing insert_destrel should be sufficient to fix the problem.
2022-03-20 22:10:24 +01:00
Alvaro Herrera
ba9a7e3921
Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior.  For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.

This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation.  Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.

The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table.  Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.

The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions.  That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.

This misbehavior stems from commit f56f8f8da6 (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f17844104 (which had introduced cross-partition
updates a year earlier).  Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches.  It also depends on
commit f4566345cf, which had its own share of backpatchability issues
as well.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
Alvaro Herrera
a1fc50672c
Fix an outdated and grammatically wrong comment
Authored by Amit Langote and myself independently
Discussion: https://postgr.es/m/CA+HiwqGCjcH0gG-=tM7hhP7TEDmzrHMHJbPGSHtHgFmx9mnFkg@mail.gmail.com
2022-03-19 19:34:04 +01:00
Tom Lane
ec62cb0aac Revert applying column aliases to the output of whole-row Vars.
In commit bf7ca1587, I had the bright idea that we could make the
result of a whole-row Var (that is, foo.*) track any column aliases
that had been applied to the FROM entry the Var refers to.  However,
that's not terribly logically consistent, because now the output of
the Var is no longer of the named composite type that the Var claims
to emit.  bf7ca1587 tried to handle that by changing the output
tuple values to be labeled with a blessed RECORD type, but that's
really pretty disastrous: we can wind up storing such tuples onto
disk, whereupon they're not readable by other sessions.

The only practical fix I can see is to give up on what bf7ca1587
tried to do, and say that the column names of tuples produced by
a whole-row Var are always those of the underlying named composite
type, query aliases or no.  While this introduces some inconsistencies,
it removes others, so it's not that awful in the abstract.  What *is*
kind of awful is to make such a behavioral change in a back-patched
bug fix.  But corrupt data is worse, so back-patched it will be.

(A workaround available to anyone who's unhappy about this is to
introduce an extra level of sub-SELECT, so that the whole-row Var is
referring to the sub-SELECT's output and not to a named table type.
Then the Var is of type RECORD to begin with and there's no issue.)

Per report from Miles Delahunty.  The faulty commit dates to 9.5,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
2022-03-17 18:18:05 -04:00
Alvaro Herrera
25e777cf8e
Split ExecUpdate and ExecDelete into reusable pieces
Create subroutines ExecUpdatePrologue / ExecUpdateAct /
ExecUpdateEpilogue, and similar for ExecDelete.

Introduce a new struct to be used internally in nodeModifyTable.c,
dubbed ModifyTableContext, which contains all context information needed
to perform these operations, as well as ExecInsert and others.

This allows using a different schedule and a different way of evaluating
the results of these operations, which can be exploited by a later
commit introducing support for MERGE.  It also makes ExecUpdate and
ExecDelete proper shorter and (hopefully) simpler.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/202202271724.4z7xv3cf46kv@alvherre.pgsql
2022-03-17 11:47:04 +01:00
Michael Paquier
c28839c832 Improve comment in execReplication.c
Author: Peter Smith
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/CAHut+PuRVf3ghNTg8EV5XOQu6unGSZma0ahsRoz-haaOFZe-1A@mail.gmail.com
2022-03-08 14:29:03 +09:00
Peter Eisentraut
791b1b71da Parse/analyze function renaming
There are three parallel ways to call parse/analyze: with fixed
parameters, with variable parameters, and by supplying your own parser
callback.  Some of the involved functions were confusingly named and
made this API structure more confusing.  This patch renames some
functions to make this clearer:

parse_analyze() -> parse_analyze_fixedparams()
pg_analyze_and_rewrite() -> pg_analyze_and_rewrite_fixedparams()

(Otherwise one might think this variant doesn't accept parameters, but
in fact all three ways accept parameters.)

pg_analyze_and_rewrite_params() -> pg_analyze_and_rewrite_withcb()

(Before, and also when considering pg_analyze_and_rewrite(), one might
think this is the only way to pass parameters.  Moreover, the parser
callback doesn't necessarily need to parse only parameters, it's just
one of the things it could do.)

parse_fixed_parameters() -> setup_parse_fixed_parameters()
parse_variable_parameters() -> setup_parse_variable_parameters()

(These functions don't actually do any parsing, they just set up
callbacks to use during parsing later.)

This patch also adds some const decorations to the fixed-parameters
API, so the distinction from the variable-parameters API is more
clear.

Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
2022-03-04 14:50:22 +01:00
Tom Lane
12d768e704 Don't use static storage for SaveTransactionCharacteristics().
This is pretty queasy-making on general principles, and the more so
once you notice that CommitTransactionCommand() is actually stomping
on the values saved by _SPI_commit().  It's okay as long as the
active values didn't change during HoldPinnedPortals(); but that's
a larger assumption than I think we want to make, especially since
the fix is so simple.

Discussion: https://postgr.es/m/1533956.1645731245@sss.pgh.pa.us
2022-02-28 12:54:12 -05:00
Tom Lane
2e517818f4 Fix SPI's handling of errors during transaction commit.
SPI_commit previously left it up to the caller to recover from any error
occurring during commit.  Since that's complicated and requires use of
low-level xact.c facilities, it's not too surprising that no caller got
it right.  Let's move the responsibility for cleanup into spi.c.  Doing
that requires redefining SPI_commit as starting a new transaction, so
that it becomes equivalent to SPI_commit_and_chain except that you get
default transaction characteristics instead of preserving the prior
transaction's characteristics.  We can make this pretty transparent
API-wise by redefining SPI_start_transaction() as a no-op.  Callers
that expect to do something in between might be surprised, but
available evidence is that no callers do so.

Having made that API redefinition, we can fix this mess by having
SPI_commit[_and_chain] trap errors and start a new, clean transaction
before re-throwing the error.  Likewise for SPI_rollback[_and_chain].
Some cleanup is also needed in AtEOXact_SPI, which was nowhere near
smart enough to deal with SPI contexts nested inside a committing
context.

While plperl and pltcl need no changes beyond removing their now-useless
SPI_start_transaction() calls, plpython needs some more work because it
hadn't gotten the memo about catching commit/rollback errors in the
first place.  Such an error resulted in longjmp'ing out of the Python
interpreter, which leaks Python stack entries at present and is reported
to crash Python 3.11 altogether.  Add the missing logic to catch such
errors and convert them into Python exceptions.

We are probably going to have to back-patch this once Python 3.11 ships,
but it's a sufficiently basic change that I'm a bit nervous about doing
so immediately.  Let's let it bake awhile in HEAD first.

Peter Eisentraut and Tom Lane

Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com
Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org
2022-02-28 12:45:36 -05:00
Daniel Gustafsson
2313a3ee22 Fix statenames in mergejoin comments
The names in the comments were on a few states not consistent with
the documented state.

Author: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CALNJ-vQVthfQXVqmrHR8BKHtC4fMGbhM1xbvJNJAPexTq_dH=w@mail.gmail.com
2022-02-23 10:54:03 +01:00
Amit Kapila
52e4f0cd47 Allow specifying row filters for logical replication of tables.
This feature adds row filtering for publication tables. When a publication
is defined or modified, an optional WHERE clause can be specified. Rows
that don't satisfy this WHERE clause will be filtered out. This allows a
set of tables to be partially replicated. The row filter is per table. A
new row filter can be added simply by specifying a WHERE clause after the
table name. The WHERE clause must be enclosed by parentheses.

The row filter WHERE clause for a table added to a publication that
publishes UPDATE and/or DELETE operations must contain only columns that
are covered by REPLICA IDENTITY. The row filter WHERE clause for a table
added to a publication that publishes INSERT can use any column. If the
row filter evaluates to NULL, it is regarded as "false". The WHERE clause
only allows simple expressions that don't have user-defined functions,
user-defined operators, user-defined types, user-defined collations,
non-immutable built-in functions, or references to system columns. These
restrictions could be addressed in the future.

If you choose to do the initial table synchronization, only data that
satisfies the row filters is copied to the subscriber. If the subscription
has several publications in which a table has been published with
different WHERE clauses, rows that satisfy ANY of the expressions will be
copied. If a subscriber is a pre-15 version, the initial table
synchronization won't use row filters even if they are defined in the
publisher.

The row filters are applied before publishing the changes. If the
subscription has several publications in which the same table has been
published with different filters (for the same publish operation), those
expressions get OR'ed together so that rows satisfying any of the
expressions will be replicated.

This means all the other filters become redundant if (a) one of the
publications have no filter at all, (b) one of the publications was
created using FOR ALL TABLES, (c) one of the publications was created
using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

If your publication contains a partitioned table, the publication
parameter publish_via_partition_root determines if it uses the partition's
row filter (if the parameter is false, the default) or the root
partitioned table's row filter.

Psql commands \dRp+ and \d <table-name> will display any row filters.

Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian
Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 08:11:50 +05:30
John Naylor
4b35408f1e Use bitwise rotate functions in more places
There were a number of places in the code that used bespoke bit-twiddling
expressions to do bitwise rotation. While we've had pg_rotate_right32()
for a while now, we hadn't gotten around to standardizing on that. Do so
now. Since many potential call sites look more natural with the "left"
equivalent, add that function too.

Reviewed by Tom Lane and Yugo Nagata

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com
2022-02-20 13:22:08 +07:00
Alexander Korotkov
3f74daa8df Fix memory leak in IndexScan node with reordering
Fix ExecReScanIndexScan() to free the referenced tuples while emptying the
priority queue.  Backpatch to all supported versions.

Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com
Author: Aliaksandr Kalenik
Reviewed-by: Tom Lane, Alexander Korotkov
Backpatch-through: 10
2022-02-14 04:17:04 +03:00
Tom Lane
5e26aa641e Test, don't just Assert, that mergejoin's inputs are in order.
There are two Asserts in nodeMergejoin.c that are reachable if
the input data is not in the expected order.  This seems way too
fragile.  Alexander Lakhin reported a case where the assertions
could be triggered with misconfigured foreign-table partitions,
and bitter experience with unstable operating system collation
definitions suggests another easy route to hitting them.  Neither
Assert is in a place where we can't afford one more test-and-branch,
so replace 'em with plain test-and-elog logic.

Per bug #17395.  While the reported symptom is relatively recent,
collation changes could happen anytime, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
2022-02-05 11:59:29 -05:00
Andres Freund
7c1aead6cb Fix compiler warning in non-assert builds, introduced in f862d57057.
Discussion: https://postgr.es/m/20220203183655.ralgkh54sdcgysmn@alap3.anarazel.de
Backpatch: 14-, like f862d57057
2022-02-03 10:44:26 -08:00
Etsuro Fujita
f862d57057 Further fix for EvalPlanQual with mix of local and foreign partitions.
We assume that direct-modify ForeignScan nodes cannot be re-evaluated
during EvalPlanQual processing, but the rework for inherited
UPDATE/DELETE in commit 86dc90056 changed things, without considering
that, so that such ForeignScan nodes get called as part of the
EvalPlanQual subtree during EvalPlanQual processing in the case of an
inherited UPDATE/DELETE where the inheritance set contains foreign
target relations.  To avoid re-evaluating such ForeignScan nodes during
EvalPlanQual processing, commit c3928b467 modified nodeForeignscan.c,
but the assumption made there that ExecForeignScan() should never be
called for such ForeignScan nodes during EvalPlanQual processing turned
out to be wrong in some cases, leading to a segmentation fault or a
"cannot re-evaluate a Foreign Update or Delete during EvalPlanQual"
error.

Fix by modifying nodeForeignscan.c further to avoid re-evaluating such
ForeignScan nodes even in ExecForeignScan()/ExecReScanForeignScan()
during EvalPlanQual processing.  Since this makes non-reachable the
test-and-elog added to ForeignNext() by commit c3928b467 that produced
the aforesaid error, convert the test-and-elog to an Assert.

Per bug #17355 from Alexander Lakhin.  Back-patch to v14 where both
commits came in.

Patch by me, reviewed and tested by Alexander Lakhin and Amit Langote.

Discussion: https://postgr.es/m/17355-de8e362eb7001a96@postgresql.org
2022-02-03 15:15:00 +09:00
Etsuro Fujita
eabcfd99ed Fix typo in comment. 2022-01-28 15:45:00 +09:00
Peter Geoghegan
db6736c93c Fix memory leak in indexUnchanged hint mechanism.
Commit 9dc718bd added a "logically unchanged by UPDATE" hinting
mechanism, which is currently used within nbtree indexes only (see
commit d168b666).  This mechanism determined whether or not the incoming
item is a logically unchanged duplicate (a duplicate needed only for
MVCC versioning purposes) once per row updated per non-HOT update.  This
approach led to memory leaks which were noticeable with an UPDATE
statement that updated sufficiently many rows, at least on tables that
happen to have an expression index.

On HEAD, fix the issue by adding a cache to the executor's per-index
IndexInfo struct.

Take a different approach on Postgres 14 to avoid an ABI break: simply
pass down the hint to all indexes unconditionally with non-HOT UPDATEs.
This is deemed acceptable because the hint is currently interpreted
within btinsert() as "perform a bottom-up index deletion pass if and
when the only alternative is splitting the leaf page -- prefer to delete
any LP_DEAD-set items first".  nbtree must always treat the hint as a
noisy signal about what might work, as a strategy of last resort, with
costs imposed on non-HOT updaters.  (The same thing might not be true
within another index AM that applies the hint, which is why the original
behavior is preserved on HEAD.)

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Klaudie Willis <Klaudie.Willis@protonmail.com>
Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/261065.1639497535@sss.pgh.pa.us
Backpatch: 14-, where the hinting mechanism was added.
2022-01-12 15:41:04 -08:00
Bruce Momjian
27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Tom Lane
9a3ddeb519 Fix index-only scan plans, take 2.
Commit 4ace45677 failed to fix the problem fully, because the
same issue of attempting to fetch a non-returnable index column
can occur when rechecking the indexqual after using a lossy index
operator.  Moreover, it broke EXPLAIN for such indexquals (which
indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field
to struct IndexOnlyScan, containing a version of the indexqual that
can be executed against the index-returned tuple without using any
non-returnable columns.  (The restrictions imposed by check_index_only
guarantee this is possible, although we may have to recompute indexed
expressions.)  Support construction of that during setrefs.c
processing by marking IndexOnlyScan.indextlist entries as resjunk
if they can't be returned, rather than removing them entirely.
(We could alternatively require setrefs.c to look up the IndexOptInfo
again, but abusing resjunk this way seems like a reasonably safe way
to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there
are any extensions out there that build IndexOnlyScan structs directly,
they'll be broken in the next minor releases.  However, only a very
invasive extension would be likely to do such a thing.  There's no
change in the Path representation, so typical planner extensions
shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-03 15:42:27 -05:00
Tom Lane
bbc227e951 Always use ReleaseTupleDesc after lookup_rowtype_tupdesc et al.
The API spec for lookup_rowtype_tupdesc previously said you could use
either ReleaseTupleDesc or DecrTupleDescRefCount.  However, the latter
choice means the caller must be certain that the returned tupdesc is
refcounted.  I don't recall right now whether that was always true
when this spec was written, but it's certainly not always true since
we introduced shared record typcaches for parallel workers.  That means
that callers using DecrTupleDescRefCount are dependent on typcache
behavior details that they probably shouldn't be.  Hence, change the API
spec to say that you must call ReleaseTupleDesc, and fix the half-dozen
callers that weren't.

AFAICT this is just future-proofing, there's no live bug here.
So no back-patch.

Per gripe from Chapman Flack.

Discussion: https://postgr.es/m/61B901A4.1050808@anastigmatix.net
2021-12-15 18:58:20 -05:00
Tom Lane
3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
David Rowley
411137a429 Flush Memoize cache when non-key parameters change, take 2
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 23:29:14 +13:00
David Rowley
dad20ad470 Revert "Flush Memoize cache when non-key parameters change"
This reverts commit 1050048a31.
2021-11-24 15:27:43 +13:00
David Rowley
1050048a31 Flush Memoize cache when non-key parameters change
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 14:56:18 +13:00
David Rowley
e502150f7d Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set.  Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values.  In which case we may
accidentally return in the incorrect rows out of the cache.

To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator.  This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:06:59 +13:00
Tom Lane
01fc652703 Fix variable lifespan in ExecInitCoerceToDomain().
This undoes a mistake in 1ec7679f1: domainval and domainnull were
meant to live across loop iterations, but they were incorrectly
moved inside the loop.  The effect was only to emit useless extra
EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless,
back-patch to v13 where the mistake was introduced.

Ranier Vilela

Discussion: https://postgr.es/m/CAEudQAqXuhbkaAp-sGH6dR6Nsq7v28_0TPexHOm6FiDYqwQD-w@mail.gmail.com
2021-11-02 13:36:47 -04:00
Tom Lane
e9d9ba2a4d Avoid some other O(N^2) hazards in list manipulation.
In the same spirit as 6301c3ada, fix some more places where we were
using list_delete_first() in a loop and thereby risking O(N^2)
behavior.  It's not clear that the lists manipulated in these spots
can get long enough to be really problematic ... but it's not clear
that they can't, either, and the fixes are simple enough.

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-01 16:24:39 -04:00
Tom Lane
3e310d837a Fix assignment to array of domain over composite.
An update such as "UPDATE ... SET fld[n].subfld = whatever"
failed if the array elements were domains rather than plain
composites.  That's because isAssignmentIndirectionExpr()
failed to cope with the CoerceToDomain node that would appear
in the expression tree in this case.  The result would typically
be a crash, and even if we accidentally didn't crash, we'd not
correctly preserve other fields of the same array element.

Per report from Onder Kalaci.  Back-patch to v11 where arrays of
domains came in.

Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com
2021-10-19 13:54:45 -04:00
Heikki Linnakangas
c4649cce39 Refactor LogicalTapeSet/LogicalTape interface.
All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now
take a LogicalTape as argument, instead of LogicalTapeSet+tape number.
You can create any number of LogicalTapes in a single LogicalTapeSet, and
you don't need to decide the number upfront, when you create the tape set.

This makes the tape management in hash agg spilling in nodeAgg.c simpler.

Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi
Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
2021-10-18 14:46:01 +03:00
Robert Haas
46846433a0 shm_mq: Update mq_bytes_written less often.
Do not update shm_mq's mq_bytes_written until we have written
an amount of data greater than 1/4th of the ring size, unless
the caller of shm_mq_send(v) requests a flush at the end of
the message. This reduces the number of calls to SetLatch(),
and also the number of CPU cache misses, considerably, and thus
makes shm_mq significantly faster.

Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some
minor cosmetic changes by me.

Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com
2021-10-14 16:13:36 -04:00
Michael Paquier
68f7c4b57a Clean up more code using "(expr) ? true : false"
This is similar to fd0625c, taking care of any remaining code paths that
are worth the cleanup.  This also changes some cases using opposite
expression patterns.

Author: Justin Pryzby, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCdF8dnUvr-BUWWGvA_XhKSoANacBMZb6jKyCk4TYfQ2Q@mail.gmail.com
2021-10-11 09:36:42 +09:00
Tom Lane
a0558cfa39 Fix checking of query type in plpgsql's RETURN QUERY command.
Prior to v14, we insisted that the query in RETURN QUERY be of a type
that returns tuples.  (For instance, INSERT RETURNING was allowed,
but not plain INSERT.)  That happened indirectly because we opened a
cursor for the query, so spi.c checked SPI_is_cursor_plan().  As a
consequence, the error message wasn't terribly on-point, but at least
it was there.

Commit 2f48ede08 lost this detail.  Instead, plain RETURN QUERY
insisted that the query be a SELECT (by checking for SPI_OK_SELECT)
while RETURN QUERY EXECUTE failed to check the query type at all.
Neither of these changes was intended.

The only convenient place to check this in the EXECUTE case is inside
_SPI_execute_plan, because we haven't done parse analysis until then.
So we need to pass down a flag saying whether to enforce that the
query returns tuples.  Fortunately, we can squeeze another boolean
into struct SPIExecuteOptions without an ABI break, since there's
padding space there.  (It's unlikely that any extensions would
already be using this new struct, but preserving ABI in v14 seems
like a smart idea anyway.)

Within spi.c, it seemed like _SPI_execute_plan's parameter list
was already ridiculously long, and I didn't want to make it longer.
So I thought of passing SPIExecuteOptions down as-is, allowing that
parameter list to become much shorter.  This makes the patch a bit
more invasive than it might otherwise be, but it's all internal to
spi.c, so that seems fine.

Per report from Marc Bachmann.  Back-patch to v14 where the
faulty code came in.

Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
2021-10-03 13:21:20 -04:00
Michael Paquier
e767ddcd35 Fix typos and grammar in code comments
Several mistakes have piled in the code comments over the time,
including incorrect grammar, function names and simple typos.  This
commit takes care of a portion of these.

No backpatch is done as this is only cosmetic.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
2021-09-27 14:21:28 +09:00
Tom Lane
e3ec3c00d8 Remove arbitrary 64K-or-so limit on rangetable size.
Up to now the size of a query's rangetable has been limited by the
constants INNER_VAR et al, which mustn't be equal to any real
rangetable index.  65000 doubtless seemed like enough for anybody,
and it still is orders of magnitude larger than the number of joins
we can realistically handle.  However, we need a rangetable entry
for each child partition that is (or might be) processed by a query.
Queries with a few thousand partitions are getting more realistic,
so that the day when that limit becomes a problem is in sight,
even if it's not here yet.  Hence, let's raise the limit.

Rather than just increase the values of INNER_VAR et al, this patch
adopts the approach of making them small negative values, so that
rangetables could theoretically become as long as INT_MAX.

The bulk of the patch is concerned with changing Var.varno and some
related variables from "Index" (unsigned int) to plain "int".  This
is basically cosmetic, with little actual effect other than to help
debuggers print their values nicely.  As such, I've only bothered
with changing places that could actually see INNER_VAR et al, which
the parser and most of the planner don't.  We do have to be careful
in places that are performing less/greater comparisons on varnos,
but there are very few such places, other than the IS_SPECIAL_VARNO
macro itself.

A notable side effect of this patch is that while it used to be
possible to add INNER_VAR et al to a Bitmapset, that will now
draw an error.  I don't see any likelihood that it wouldn't be a
bug to include these fake varnos in a bitmapset of real varnos,
so I think this is all to the good.

Although this touches outfuncs/readfuncs, I don't think a catversion
bump is required, since stored rules would never contain Vars
with these fake varnos.

Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut

Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
2021-09-15 14:11:21 -04:00
Peter Eisentraut
639a86e36a Remove Value node struct
The Value node struct is a weird construct.  It is its own node type,
but most of the time, it actually has a node type of Integer, Float,
String, or BitString.  As a consequence, the struct name and the node
type don't match most of the time, and so it has to be treated
specially a lot.  There doesn't seem to be any value in the special
construct.  There is very little code that wants to accept all Value
variants but nothing else (and even if it did, this doesn't provide
any convenient way to check it), and most code wants either just one
particular node type (usually String), or it accepts a broader set of
node types besides just Value.

This change removes the Value struct and node type and replaces them
by separate Integer, Float, String, and BitString node types that are
proper node types and structs of their own and behave mostly like
normal node types.

Also, this removes the T_Null node tag, which was previously also a
possible variant of Value but wasn't actually used outside of the
Value contained in A_Const.  Replace that by an isnull field in
A_Const.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com
2021-09-09 08:36:53 +02:00
Michael Paquier
fd0625c7a9 Clean up some code using "(expr) ? true : false"
All the code paths simplified here were already using a boolean or used
an expression that led to zero or one, making the extra bits
unnecessary.

Author: Justin Pryzby
Reviewed-by: Tom Lane, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
2021-09-08 09:44:04 +09:00
Heikki Linnakangas
6ac763f19a Fix missing words in comment.
Introduced by commit c3928b467a, backpatch to v14 like that one.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA+HiwqFQgNLS6VGntMcuJV6erBFV425xA6wBVnY=41GK4zC0Bw@mail.gmail.com
2021-09-07 10:28:55 +03:00
Tom Lane
26ae660903 Improve error messages about misuse of SELECT INTO.
Improve two places in plpgsql, and one in spi.c, where an error
message would confusingly tell you that you couldn't use a SELECT
query, when what you had written *was* a SELECT query.  The actual
problem is that you can't use SELECT ... INTO in these contexts,
but the messages failed to make that apparent.  Special-case
SELECT INTO to make these errors more helpful.

Also, fix the same spots in plpgsql, as well as several messages
in exec_eval_expr(), to not quote the entire complained-of query or
expression in the primary error message.  That behavior very easily
led to violating our message style guideline about keeping the primary
error message short and single-line.  Also, since the important part
of the message was after the inserted text, it could make the real
problem very hard to see.  We can report the query or expression as
the first line of errcontext instead.

Per complaint from Roger Mason.  Back-patch to v14, since (a) some
of these messages are new in v14 and (b) v14's translatable strings
are still somewhat in flux.  The problem's older than that of
course, but I'm hesitant to change the behavior further back.

Discussion: https://postgr.es/m/1914708.1629474624@sss.pgh.pa.us
2021-08-21 10:22:14 -04:00
Tomas Vondra
650663b4cb Use appropriate tuple descriptor in FDW batching
The FDW batching code was using the same tuple descriptor both for all
slots (regular and plan slots), but that's incorrect - the subplan may
use a different descriptor. Currently this is benign, because batching
is used only for INSERTs, and in that case the descriptors always match.
But that would change if we allow batching UPDATEs.

Fix by copying the appropriate tuple descriptor. Backpatch to 14, where
the FDW batching was implemented.

Author: Amit Langote
Backpatch-through: 14, where FDW batching was added
Discussion: https://postgr.es/m/CA%2BHiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95%3DJcwUnyV%3Df0rAKQ%40mail.gmail.com
2021-08-12 22:10:06 +02:00
Heikki Linnakangas
c3928b467a Fix segfault during EvalPlanQual with mix of local and foreign partitions.
It's not sensible to re-evaluate a direct-modify Foreign Update or Delete
during EvalPlanQual. However, ExecInitForeignScan() can still get called
if a table mixes local and foreign partitions. EvalPlanQualStart() left
the es_result_relations array uninitialized in the child EPQ EState, but
ExecInitForeignScan() still expected to find it. That caused a segfault.

Fix by skipping the es_result_relations lookup during EvalPlanQual
processing. To make things a bit more robust, also skip the
BeginDirectModify calls, and add a runtime check that ExecForeignScan()
is not called on direct-modify foreign scans during EvalPlanQual
processing.

This is new in v14, commit 1375422c78. Before that, EvalPlanQualStart()
copied the whole ResultRelInfo array to the EPQ EState. Backpatch to v14.

Report and diagnosis by Andrey Lepikhov.

Discussion: https://www.postgresql.org/message-id/cb2b808d-cbaa-4772-76ee-c8809bafcf3d%40postgrespro.ru
2021-08-12 11:02:29 +03:00
Peter Eisentraut
2226b4189b Change SeqScan node to contain Scan node
This makes the structure of all Scan-derived nodes the same,
independent of whether they have additional fields.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
Etsuro Fujita
a8ed9bd59d Fix oversight in commit 1ec7fca859.
I failed to account for the possibility that when
ExecAppendAsyncEventWait() notifies multiple async-capable nodes using
postgres_fdw, a preceding node might invoke process_pending_request() to
process a pending asynchronous request made by a succeeding node.  In
that case the succeeding node should produce a tuple to return to the
parent Append node from tuples fetched by process_pending_request() when
notified.  Repair.

Per buildfarm via Michael Paquier.  Back-patch to v14, like the previous
commit.

Thanks to Tom Lane for testing.

Discussion: https://postgr.es/m/YQP0UPT8KmPiHTMs%40paquier.xyz
2021-08-02 12:45:00 +09:00
Etsuro Fujita
1ec7fca859 postgres_fdw: Fix handling of pending asynchronous requests.
A pending asynchronous request is handled by process_pending_request(),
which previously not only processed an in-progress remote query but
performed ExecForeignScan() to produce a tuple to return to the local
server asynchronously from the result of the remote query.  But that led
to a server crash when executing a query or led to an "InstrStartNode
called twice in a row" or "InstrEndLoop called on running node" failure
when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it
contained multiple async-capable nodes accessing the same
initplan/subplan that contained multiple async-capable nodes scanning
the same foreign tables as for the parent async-capable nodes, as
reported by Andrey Lepikhov.  The reason is that the second step in
process_pending_request() invoked when executing the initplan/subplan
for one of the parent async-capable nodes caused recursive execution of
the initplan/subplan for another of the parent async-capable nodes.

To fix, split process_pending_request() into the two steps and postpone
the second step until ForeignAsyncConfigureWait() is called for each of
the pending asynchronous requests.  Also, in ExecAppendAsyncEventWait()
we assumed that FDWs would register at least one wait event in a
WaitEventSet created there when they were called from
ForeignAsyncConfigureWait() in that function, but allow FDWs to register
zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait()
to just return in that case.

Oversight in commit 27e1f1456.  Back-patch to v14 where that commit went
in.

Andrey Lepikhov and Etsuro Fujita

Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru
2021-07-30 17:00:00 +09:00
Tom Lane
28d936031a Get rid of artificial restriction on hash table sizes on Windows.
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need.  However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int".  As
written, the same clamp was applied to hash_mem.  This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched.  However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.

This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.

Per gripe from Laurent Hasson.  Back-patch to v13 where the
behavior change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-25 14:02:27 -04:00
David Rowley
91e9e89dcc Make nodeSort.c use Datum sorts for single column sorts
Datum sorts can be significantly faster than tuple sorts, especially when
the data type being sorted is a pass-by-value type.  Something in the
region of 50-70% performance improvements appear to be possible.

Just in case there's any confusion; the Datum sort is only used when the
targetlist of the Sort node contains a single column, not when there's a
single column in the sort key and multiple items in the target list.

Author: Ronan Dunklau
Reviewed-by: James Coleman, David Rowley, Ranier Vilela, Hou Zhijie
Tested-by: John Naylor
Discussion: https://postgr.es/m/3177670.itZtoPt7T5@aivenronan
2021-07-22 14:03:19 +12:00
Peter Eisentraut
d9a38c52ce Rename NodeTag of ExprState
Rename from tag to type, for consistency with all other node structs.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-07-21 08:48:33 +02:00
Peter Eisentraut
81d5995b4b More improvements of error messages about mismatching relkind
Follow-up to 2ed532ee8c, a few error
messages in the logical replication area currently only deal with
tables, but if we're anticipating more relkinds such as sequences
being handled, then these messages also fall into the category
affected by the previous patch, so adjust them too.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/c9ba5c6a-4bd5-e12c-1b3c-edbcaedbf392@enterprisedb.com
2021-07-21 07:52:10 +02:00
Peter Eisentraut
2b00db4fb0 Use l*_node() family of functions where appropriate
Instead of castNode(…, lfoo(…))

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/87eecahraj.fsf@wibble.ilmari.org
2021-07-19 08:20:24 +02:00
David Rowley
83f4fcc655 Change the name of the Result Cache node to Memoize
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough.  That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize".  People seem to like "Memoize", so let's do the rename.

Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
2021-07-14 12:43:58 +12:00
David Rowley
29f45e299e Use a hash table to speed up NOT IN(values)
Similar to 50e17ad28, which allowed hash tables to be used for IN clauses
with a set of constants, here we add the same feature for NOT IN clauses.

NOT IN evaluates the same as: WHERE a <> v1 AND a <> v2 AND a <> v3.
Obviously, if we're using a hash table we must be exactly equivalent to
that and return the same result taking into account that either side of
the condition could contain a NULL.  This requires a little bit of
special handling to make work with the hash table version.

When processing NOT IN, the ScalarArrayOpExpr's operator will be the <>
operator.  To be able to build and lookup a hash table we must use the
<>'s negator operator.  The planner checks if that exists and is hashable
and sets the relevant fields in ScalarArrayOpExpr to instruct the executor
to use hashing.

Author: David Rowley, James Coleman
Reviewed-by: James Coleman, Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvoF1mum_FRk6D621edcB6KSHBi2+GAgWmioj5AhOu2vwQ@mail.gmail.com
2021-07-07 16:29:17 +12:00
Tom Lane
955b3e0f92 Allow CustomScan providers to say whether they support projections.
Previously, all CustomScan providers had to support projections,
but there may be cases where this is inconvenient.  Add a flag
bit to say if it's supported.

Important item for the release notes: this is non-backwards-compatible
since the default is now to assume that CustomScan providers can't
project, instead of assuming that they can.  It's fail-soft, but could
result in visible performance penalties due to adding unnecessary
Result nodes.

Sven Klemm, reviewed by Aleksander Alekseev; some cosmetic fiddling
by me.

Discussion: https://postgr.es/m/CAMCrgp1kyakOz6c8aKhNDJXjhQ1dEjEnp+6KNT3KxPrjNtsrDg@mail.gmail.com
2021-07-06 18:10:20 -04:00
David Rowley
63b1af9437 Cleanup some aggregate code in the executor
Here we alter the code that calls build_pertrans_for_aggref() so that the
function no longer needs to special-case whether it's dealing with an
aggtransfn or an aggcombinefn.  This allows us to reuse the
build_aggregate_transfn_expr() function and just get rid of the
build_aggregate_combinefn_expr() completely.

All of the special case code that was in build_pertrans_for_aggref() has
been moved up to the calling functions.

This saves about a dozen lines of code in nodeAgg.c and a few dozen more
in parse_agg.c

Also, rename a few variables in nodeAgg.c to try to make it more clear
that we're working with either a aggtransfn or an aggcombinefn.  Some of
the old names would have you believe that we were always working with an
aggtransfn.

Discussion: https://postgr.es/m/CAApHDvptMQ9FmF0D67zC_w88yVnoNVR2+kkOQGUrCmdxWxLULQ@mail.gmail.com
2021-07-04 18:47:31 +12:00
Andrew Dunstan
e1c1c30f63
Pre branch pgindent / pgperltidy run
Along the way make a slight adjustment to
src/include/utils/queryjumble.h to avoid an unused typedef.
2021-06-28 11:05:54 -04:00
Tom Lane
7c337b6b52 Centralize the logic for protective copying of utility statements.
In the "simple Query" code path, it's fine for parse analysis or
execution of a utility statement to scribble on the statement's node
tree, since that'll just be thrown away afterwards.  However it's
not fine if the node tree is in the plan cache, as then it'd be
corrupted for subsequent executions.  Up to now we've dealt with
that by having individual utility-statement functions apply
copyObject() if they were going to modify the tree.  But that's
prone to errors of omission.  Bug #17053 from Charles Samborski
shows that CREATE/ALTER DOMAIN didn't get this memo, and can
crash if executed repeatedly from plan cache.

In the back branches, we'll just apply a narrow band-aid for that,
but in HEAD it seems prudent to have a more principled fix that
will close off the possibility of other similar bugs in future.
Hence, let's hoist the responsibility for doing copyObject up into
ProcessUtility from its children, thus ensuring that it happens for
all utility statement types.

Also, modify ProcessUtility's API so that its callers can tell it
whether a copy step is necessary.  It turns out that in all cases,
the immediate caller knows whether the node tree is transient, so
this doesn't involve a huge amount of code thrashing.  In this way,
while we lose a little bit in the execute-from-cache code path due
to sometimes copying node trees that wouldn't be mutated anyway,
we gain something in the simple-Query code path by not copying
throwaway node trees.  Statements that are complex enough to be
expensive to copy are almost certainly ones that would have to be
copied anyway, so the loss in the cache code path shouldn't be much.

(Note that this whole problem applies only to utility statements.
Optimizable statements don't have the issue because we long ago made
the executor treat Plan trees as read-only.  Perhaps someday we will
make utility statement execution act likewise, but I'm not holding
my breath.)

Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us
Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
2021-06-18 11:22:58 -04:00
Tomas Vondra
99cea49d65 Fix copying data into slots with FDW batching
Commit b676ac443b optimized handling of tuple slots with bulk inserts
into foreign tables, so that the slots are initialized only once and
reused for all batches. The data was however copied into the slots only
after the initialization, inserting duplicate values when the slot gets
reused. Fixed by moving the ExecCopySlot outside the init branch.

The existing postgres_fdw tests failed to catch this due to inserting
data into foreign tables without unique indexes, and then checking only
the number of inserted rows. This adds a new test with both a unique
index and a check of inserted values.

Reported-by: Alexander Pyhalov
Discussion: https://postgr.es/m/7a8cf8d56b3d18e5c0bccd6cd42d04ac%40postgrespro.ru
2021-06-16 23:49:25 +02:00
Tomas Vondra
b676ac443b Optimize creation of slots for FDW bulk inserts
Commit b663a41363 introduced bulk inserts for FDW, but the handling of
tuple slots turned out to be problematic for two reasons. Firstly, the
slots were re-created for each individual batch. Secondly, all slots
referenced the same tuple descriptor - with reasonably small batches
this is not an issue, but with large batches this triggers O(N^2)
behavior in the resource owner code.

These two issues work against each other - to reduce the number of times
a slot has to be created/dropped, larger batches are needed. However,
the larger the batch, the more expensive the resource owner gets. For
practical batch sizes (100 - 1000) this would not be a big problem, as
the benefits (latency savings) greatly exceed the resource owner costs.
But for extremely large batches it might be much worse, possibly even
losing with non-batching mode.

Fixed by initializing tuple slots only once (and reusing them across
batches) and by using a new tuple descriptor copy for each slot.

Discussion: https://postgr.es/m/ebbbcc7d-4286-8c28-0272-61b4753af761%40enterprisedb.com
2021-06-11 20:23:33 +02:00
David Rowley
04539e73fa Use the correct article for abbreviations
We've accumulated quite a mix of instances of "an SQL" and "a SQL" in the
documents.  It would be good to be a bit more consistent with these.

The most recent version of the SQL standard I looked at seems to prefer
"an SQL".  That seems like a good lead to follow, so here we change all
instances of "a SQL" to become "an SQL".  Most instances correctly use
"an SQL" already, so it also makes sense to use the dominant variation in
order to minimise churn.

Additionally, there were some other abbreviations that needed to be
adjusted. FSM, SSPI, SRF and a few others.  Also fix some pronounceable,
abbreviations to use "a" instead of "an".  For example, "a SASL" instead
of "an SASL".

Here I've only adjusted the documents and error messages.  Many others
still exist in source code comments.  Translator hint comments seem to be
the biggest culprit.  It currently does not seem worth the churn to change
these.

Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com
2021-06-11 13:38:04 +12:00
Tom Lane
e56bce5d43 Reconsider the handling of procedure OUT parameters.
Commit 2453ea142 redefined pg_proc.proargtypes to include the types of
OUT parameters, for procedures only.  While that had some advantages
for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty
disastrous from a number of other perspectives.  Notably, since the
primary key of pg_proc is name + proargtypes, this made it possible to
have multiple procedures with identical names + input arguments and
differing output argument types.  That would make it impossible to call
any one of the procedures by writing just NULL (or "?", or any other
data-type-free notation) for the output argument(s).  The change also
seems likely to cause grave confusion for client applications that
examine pg_proc and expect the traditional definition of proargtypes.

Hence, revert the definition of proargtypes to what it was, and
undo a number of complications that had been added to support that.

To support the SQL-spec behavior of DROP PROCEDURE, when there are
no argmode markers in the command's parameter list, we perform the
lookup both ways (that is, matching against both proargtypes and
proallargtypes), succeeding if we get just one unique match.
In principle this could result in ambiguous-function failures
that would not happen when using only one of the two rules.
However, overloading of procedure names is thought to be a pretty
rare usage, so this shouldn't cause many problems in practice.
Postgres-specific code such as pg_dump can defend against any
possibility of such failures by being careful to specify argmodes
for all procedure arguments.

This also fixes a few other bugs in the area of CALL statements
with named parameters, and improves the documentation a little.

catversion bump forced because the representation of procedures
with OUT arguments changes.

Discussion: https://postgr.es/m/3742981.1621533210@sss.pgh.pa.us
2021-06-10 17:11:36 -04:00
Tom Lane
bb4aed46a5 Shut down EvalPlanQual machinery when LockRows node reaches the end.
Previously, we left the EPQ sub-executor alone until ExecEndLockRows.
This caused any buffer pins or other resources that it might hold to
remain held until ExecutorEnd, which in some code paths means that
they are held till the Portal is closed.  That can cause user-visible
problems, such as blocking VACUUM; and it's unlike the behavior of
ordinary table-scanning nodes, which will have released all buffer
pins by the time they return an EOF indication.

We can make LockRows work more like other plan nodes by calling
EvalPlanQualEnd just before returning NULL.  We still need to call it
in ExecEndLockRows in case the node was not run to completion, but in
the normal case the second call does nothing and costs little.

Per report from Yura Sokolov.  In principle this is a longstanding
bug, but in view of the lack of other complaints and the low severity
of the consequences, I chose not to back-patch.

Discussion: https://postgr.es/m/4aa370cb91ecf2f9885d98b80ad1109c@postgrespro.ru
2021-06-10 11:15:13 -04:00
Etsuro Fujita
f3baaf28a6 Fix rescanning of async-aware Append nodes.
In cases where run-time pruning isn't required, the synchronous and
asynchronous subplans for an async-aware Append node determined using
classify_matching_subplans() should be re-used when rescanning the node,
but the previous code re-determined them using that function repeatedly
each time when rescanning the node, leading to incorrect results in a
normal build and an Assert failure in an Assert-enabled build as that
function doesn't assume that it's called repeatedly in such cases.  Fix
the code as mentioned above.

My oversight in commit 27e1f1456.

While at it, initialize async-related pointers/variables to NULL/zero
explicitly in ExecInitAppend() and ExecReScanAppend(), just to be sure.
(The variables would have been set to zero before we get to the latter
function, but let's do so.)

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAPmGK16Q4B2_KY%2BJH7rb7wQbw54AUprp7TMekGTd2T1B62yysQ%40mail.gmail.com
2021-06-07 12:45:00 +09:00
David Rowley
9e215378d7 Fix planner's use of Result Cache with unique joins
When the planner considered using a Result Cache node to cache results
from the inner side of a Nested Loop Join, it failed to consider that the
inner path's parameterization may not be the entire join condition.  If
the join was marked as inner_unique then we may accidentally put the cache
in singlerow mode.  This meant that entries would be marked as complete
after caching the first row.  That was wrong as if only part of the join
condition was parameterized then the uniqueness of the unique join was not
guaranteed at the Result Cache's level.  The uniqueness is only guaranteed
after Nested Loop applies the join filter.  If subsequent rows were found,
this would lead to:

ERROR: cache entry already complete

This could have been fixed by only putting the cache in singlerow mode if
the entire join condition was parameterized.  However, Nested Loop will
only read its inner side so far as the first matching row when the join is
unique, so that might mean we never get an opportunity to mark cache
entries as complete.  Since non-complete cache entries are useless for
subsequent lookups, we just don't bother considering a Result Cache path
in this case.

In passing, remove the XXX comment that claimed the above ERROR might be
better suited to be an Assert.  After there being an actual case which
triggered it, it seems better to keep it an ERROR.

Reported-by: David Christensen
Discussion: https://postgr.es/m/CAOxo6X+dy-V58iEPFgst8ahPKEU+38NZzUuc+a7wDBZd4TrHMQ@mail.gmail.com
2021-05-22 16:22:27 +12:00
Tom Lane
2b0ee126bb Fix usage of "tableoid" in GENERATED expressions.
We consider this supported (though I've got my doubts that it's a
good idea, because tableoid is not immutable).  However, several
code paths failed to fill the field in soon enough, causing such
a GENERATED expression to see zero or the wrong value.  This
occurred when ALTER TABLE adds a new GENERATED column to a table
with existing rows, and during regular INSERT or UPDATE on a
foreign table with GENERATED columns.

Noted during investigation of a report from Vitaly Ustinov.
Back-patch to v12 where GENERATED came in.

Discussion: https://postgr.es/m/CAM_DEiWR2DPT6U4xb-Ehigozzd3n3G37ZB1+867zbsEVtYoJww@mail.gmail.com
2021-05-21 15:02:06 -04:00
Tom Lane
84f5c2908d Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-21 14:03:59 -04:00
David Rowley
6cb93beddd Convert misleading while loop into an if condition
This seems to be leftover from ea15e1867 and from when we used to evaluate
SRFs at each node.

Since there is an unconditional "return" at the end of the loop body, only
1 loop is ever possible, so we can just change this into an if condition.

There is no actual bug being fixed here so no back-patch. It seems fine to
just fix this anomaly in master only.

Author: Greg Nancarrow
Discussion: https://postgr.es/m/CAJcOf-d7T1q0az-D8evWXnsuBZjigT04WkV5hCAOEJQZRWy28w@mail.gmail.com
2021-05-14 12:26:11 +12:00
Tom Lane
def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Etsuro Fujita
a363bc6da9 Fix EXPLAIN ANALYZE for async-capable nodes.
EXPLAIN ANALYZE for an async-capable ForeignScan node associated with
postgres_fdw is done just by using instrumentation for ExecProcNode()
called from the node's callbacks, causing the following problems:

1) If the remote table to scan is empty, the node is incorrectly
   considered as "never executed" by the command even if the node is
   executed, as ExecProcNode() isn't called from the node's callbacks at
   all in that case.
2) The command fails to collect timings for things other than
   ExecProcNode() done in the node, such as creating a cursor for the
   node's remote query.

To fix these problems, add instrumentation for async-capable nodes, and
modify postgres_fdw accordingly.

My oversight in commit 27e1f1456.

While at it, update a comment for the AsyncRequest struct in execnodes.h
and the documentation for the ForeignAsyncRequest API in fdwhandler.sgml
to match the code in ExecAsyncAppendResponse() in nodeAppend.c, and fix
typos in comments in nodeAppend.c.

Per report from Andrey Lepikhov, though I didn't use his patch.

Reviewed-by: Andrey Lepikhov
Discussion: https://postgr.es/m/2eb662bb-105d-fc20-7412-2f027cc3ca72%40postgrespro.ru
2021-05-12 14:00:00 +09:00
Tom Lane
049e1e2edb Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists.
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns.  That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.

This had escaped notice through a confluence of missing sanity checks,
including

* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype.  While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.

* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers.  Add assertions there too.

* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist.  That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.

In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns.  This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.

In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot.  (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10.  In 9.6, projections work much differently and
we can't cheaply give them such an option.  The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.

In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table.  Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated.  Fix by preserving
resjunk columns in that routine.  (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)

Per report from Andres Freund.  Back-patch to all supported branches.

Security: CVE-2021-32028
2021-05-10 11:02:29 -04:00
Tom Lane
f02b9085ad Prevent integer overflows in array subscripting calculations.
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked.  This allows situations where
lower_bound + dimension overflows an integer.  It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.

Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow.  (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX.  In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)

Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps.  Fix that too.

Security: CVE-2021-32027
2021-05-10 10:44:38 -04:00
David Rowley
92c4c269d2 Move memory accounting Asserts for Result Cache code
In 9eacee2e6, I included some code to verify the cache's memory tracking
is correct by counting up the number of entries and the memory they use
each time we evict something from the cache.  Those values are then
compared to the expected values using Assert.  The problem is that this
requires looping over the entire cache hash table each time we evict an
entry from the cache.  That can be pretty expensive, as noted by Pavel
Stehule.

Here we move this memory accounting checking code so that we only verify
it on cassert builds once when shutting down the Result Cache node.

Aside from the performance increase, this has two distinct advantages:

1) We do the memory checks at the last possible moment before destroying
   the cache.  This means we'll now catch accounting problems that might
   sneak in after a cache eviction.

2) We now do the memory Assert checks when there were no cache evictions.
   This increases the coverage.

One small disadvantage is that we'll now miss any memory tracking issues
that somehow managed to resolve themselves by the end of execution.
However, it seems to me that such a memory tracking problem would be quite
unlikely, and likely somewhat less harmful if one were to exist.

In passing, adjust the loop over the hash table to use the standard
simplehash.h method of iteration.

Reported-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRAzgoSkdEiqrKbT=7yG9FA5fjUAP3jmJywuDqYq6Ki5ug@mail.gmail.com
2021-05-09 11:37:18 +12:00
Michael Paquier
9b5558e7ad Fix come comments in execMain.c
1375422 has refactored this area of the executor code, and some comments
went out-of-sync.

Author: Yukun Wang
Reviewed-by: Amul Sul
Discussion: https://postgr.es/m/OS0PR01MB60033394FCAEF79B98F078F5B4459@OS0PR01MB6003.jpnprd01.prod.outlook.com
2021-04-24 15:07:04 +09:00
Etsuro Fujita
bb684c82f7 Minor code cleanup in asynchronous execution support.
This is cleanup for commit 27e1f1456:

* ExecAppendAsyncEventWait(), which was modified a bit further by commit
  a8af856d3, duplicated the same nevents calculation.  Simplify the code
  a little bit to avoid the duplication.  Update comments there.
* Add an assertion to ExecAppendAsyncRequest().
* Update a comment about merging the async_capable options from input
  relations in merge_fdw_options(), per complaint from Kyotaro Horiguchi.
* Add a comment for fetch_more_data_begin().

Author: Etsuro Fujita
Discussion: https://postgr.es/m/CAPmGK1637W30Wx3MnrReewhafn6F_0J76mrJGoFXFnpPq4QfvA%40mail.gmail.com
2021-04-23 12:00:00 +09:00
Tom Lane
d479d00285 Don't crash on reference to an un-available system column.
Adopt a more consistent policy about what slot-type-specific
getsysattr functions should do when system attributes are not
available.  To wit, they should all throw the same user-oriented
error, rather than variously crashing or emitting developer-oriented
messages.

This closes a identifiable problem in commits a71cfc56b and
3fb93103a (in v13 and v12), so back-patch into those branches,
along with a test case to try to ensure we don't break it again.
It is not known that any of the former crash cases are reachable
in HEAD, but this seems like a good safety improvement in any case.

Discussion: https://postgr.es/m/141051591267657@mail.yandex.ru
2021-04-22 17:30:55 -04:00
Alvaro Herrera
8aba932251
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partition exist, and those
where detached partitions are included).  When a partdesc-without-
detached-partitions is requested, we create one afresh each time; also,
those partdescs are stored in PortalContext instead of
CacheMemoryContext.

find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that name captures desired the semantics more naturally.
CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are
identically renamed.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
2021-04-22 15:13:25 -04:00
Peter Eisentraut
544b28088f doc: Improve hyphenation consistency 2021-04-21 08:14:43 +02:00
Bruce Momjian
9660834dd8 adjust query id feature to use pg_stat_activity.query_id
Previously, it was pg_stat_activity.queryid to match the
pg_stat_statements queryid column.  This is an adjustment to patch
4f0b0966c8.  This also adjusts some of the internal function calls to
match.  Catversion bumped.

Reported-by: Álvaro Herrera, Julien Rouhaud

Discussion: https://postgr.es/m/20210408032704.GA7498@alvherre.pgsql
2021-04-20 12:22:26 -04:00
Michael Paquier
7ef8b52cf0 Fix typos and grammar in comments and docs
Author: Justin Pryzby
Discussion: https://postgr.es/m/20210416070310.GG3315@telsasoft.com
2021-04-19 11:32:30 +09:00
Tom Lane
83efce7a1e Revert "Cope with NULL query string in ExecInitParallelPlan()."
This reverts commit b3ee4c5038.
We don't need it in the wake of the preceding commit, which
added an upstream check that the querystring isn't null.

Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us
2021-04-15 17:17:45 -04:00
Tom Lane
1111b2668d Undo decision to allow pg_proc.prosrc to be NULL.
Commit e717a9a18 changed the longstanding rule that prosrc is NOT NULL
because when a SQL-language function is written in SQL-standard style,
we don't currently have anything useful to put there.  This seems a poor
decision though, as it could easily have negative impacts on external
PLs (opening them to crashes they didn't use to have, for instance).
SQL-function-related code can just as easily test "is prosqlbody not
null" as "is prosrc null", so there's no real gain there either.
Hence, revert the NOT NULL marking removal and adjust related logic.

For now, we just put an empty string into prosrc for SQL-standard
functions.  Maybe we'll have a better idea later, although the
history of things like pg_attrdef.adsrc suggests that it's not
easy to maintain a string equivalent of a node tree.

This also adds an assertion that queryDesc->sourceText != NULL
to standard_ExecutorStart.  We'd been silently relying on that
for awhile, so let's make it less silent.

Also fix some overlooked documentation and test cases.

Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us
2021-04-15 17:17:20 -04:00
Tom Lane
c2db458c10 Redesign the caching done by get_cached_rowtype().
Previously, get_cached_rowtype() cached a pointer to a reference-counted
tuple descriptor from the typcache, relying on the ExprContextCallback
mechanism to release the tupdesc refcount when the expression tree
using the tupdesc was destroyed.  This worked fine when it was designed,
but the introduction of within-DO-block COMMITs broke it.  The refcount
is logged in a transaction-lifespan resource owner, but plpgsql won't
destroy simple expressions made within the DO block (before its first
commit) until the DO block is exited.  That results in a warning about
a leaked tupdesc refcount when the COMMIT destroys the original resource
owner, and then an error about the active resource owner not holding a
matching refcount when the expression is destroyed.

To fix, get rid of the need to have a shutdown callback at all, by
instead caching a pointer to the relevant typcache entry.  Those
survive for the life of the backend, so we needn't worry about the
pointer becoming stale.  (For registered RECORD types, we can still
cache a pointer to the tupdesc, knowing that it won't change for the
life of the backend.)  This mechanism has been in use in plpgsql
and expandedrecord.c since commit 4b93f5799, and seems to work well.

This change requires modifying the ExprEvalStep structs used by the
relevant expression step types, which is slightly worrisome for
back-patching.  However, there seems no good reason for extensions
to be familiar with the details of these particular sub-structs.

Per report from Rohit Bhogate.  Back-patch to v11 where within-DO-block
COMMITs became a thing.

Discussion: https://postgr.es/m/CAAV6ZkQRCVBh8qAY+SZiHnz+U+FqAGBBDaDTjF2yiKa2nJSLKg@mail.gmail.com
2021-04-13 13:37:07 -04:00
Bruce Momjian
0f61727b75 Fixes for query_id feature
Ignore parallel workers in pg_stat_statements
  Oversight in 4f0b0966c8 which exposed queryid in parallel workers.
  Counters are aggregated by the main backend process so parallel workers
  would report duplicated activity, and could also report activity for the
  wrong entry as they are only aware of the top level queryid.

Fix thinko in pg_stat_get_activity when retrieving the queryid.

Remove unnecessary call to pgstat_report_queryid().

Reported-by: Amit Kapila, Andres Freund, Thomas Munro

Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax5gd@alap3.anarazel.de p634GTSOqnDW86Owrn6qDAVosC5dJjXjp7BMfc5Gz1Q@mail.gmail.com

Author: Julien Rouhaud
2021-04-08 11:16:01 -04:00
David Rowley
50e17ad281 Speedup ScalarArrayOpExpr evaluation
ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array.  When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.

Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.

The planner is in charge of determining when this optimization is possible
and it enables it by setting hashfuncid in the ScalarArrayOpExpr.  The
executor will only perform the hash table evaluation when the hashfuncid
is set.

This means that not all cases are optimized. For example CHECK constraints
containing an IN clause won't go through the planner, so won't get the
hashfuncid set.  We could maybe do something about that at some later
date.  The reason we're not doing it now is from fear that we may slow
down cases where the expression is evaluated only once.  Those cases can
be common, for example, a single row INSERT to a table with a CHECK
constraint containing an IN clause.

In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there is at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array.  The threshold is
currently set to 9.

Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
2021-04-08 23:51:22 +12:00
Andres Freund
b3ee4c5038 Cope with NULL query string in ExecInitParallelPlan().
It's far from clear that this is the right approach - but a good
portion of the buildfarm has been red for a few hours, on the last day
of the CF. And this fixes at least the obvious crash. So let's go with
that for now.

Discussion: https://postgr.es/m/20210407225806.majgznh4lk34hjvu%40alap3.anarazel.de
2021-04-07 22:08:24 -07:00
Peter Eisentraut
e717a9a18b SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

    CREATE FUNCTION add(a integer, b integer) RETURNS integer
        LANGUAGE SQL
        RETURN a + b;

or as a block

    CREATE PROCEDURE insert_data(a integer, b integer)
    LANGUAGE SQL
    BEGIN ATOMIC
      INSERT INTO tbl VALUES (a);
      INSERT INTO tbl VALUES (b);
    END;

The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody.  So at run time,
no further parsing is required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com
2021-04-07 21:47:55 +02:00
Bruce Momjian
4f0b0966c8 Make use of in-core query id added by commit 5fd9dfa5f5
Use the in-core query id computation for pg_stat_activity,
log_line_prefix, and EXPLAIN VERBOSE.

Similar to other fields in pg_stat_activity, only the queryid from the
top level statements are exposed, and if the backends status isn't
active then the queryid from the last executed statements is displayed.

Add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

For EXPLAIN VERBOSE, if a query identifier has been computed, either by
enabling compute_query_id or using a third-party module, display it.

Bump catalog version.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 14:04:06 -04:00
Tom Lane
0d46771eaa Comment cleanup for a1115fa07.
Amit Langote

Discussion: https://postgr.es/m/CA+HiwqEcawatEaUh1uTbZMEZTJeLzbroRTz9_X9Z5CFjTWJkhw@mail.gmail.com
2021-04-07 12:22:02 -04:00
Tom Lane
a1115fa078 Postpone some more stuff out of ExecInitModifyTable.
Delay creation of the projections for INSERT and UPDATE tuples
until they're needed.  This saves a pretty fair amount of work
when only some of the partitions are actually touched.

The logic associated with identifying junk columns in UPDATE/DELETE
is moved to another loop, allowing removal of one loop over the
target relations; but it didn't actually change at all.

Extracted from a larger patch, which seemed to me to be too messy
to push in one commit.

Amit Langote, reviewed at different times by Heikki Linnakangas and
myself

Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
2021-04-06 18:13:17 -04:00
Tom Lane
c5b7ba4e67 Postpone some stuff out of ExecInitModifyTable.
Arrange to do some things on-demand, rather than immediately during
executor startup, because there's a fair chance of never having to do
them at all:

* Don't open result relations' indexes until needed.

* Don't initialize partition tuple routing, nor the child-to-root
tuple conversion map, until needed.

This wins in UPDATEs on partitioned tables when only some of the
partitions will actually receive updates; with larger partition
counts the savings is quite noticeable.  Also, we can remove some
sketchy heuristics in ExecInitModifyTable about whether to set up
tuple routing.

Also, remove execPartition.c's private hash table tracking which
partitions were already opened by the ModifyTable node.  Instead
use the hash added to ModifyTable itself by commit 86dc90056.

To allow lazy computation of the conversion maps, we now set
ri_RootResultRelInfo in all child ResultRelInfos.  We formerly set it
only in some, not terribly well-defined, cases.  This has user-visible
side effects in that now more error messages refer to the root
relation instead of some partition (and provide error data in the
root's column order, too).  It looks to me like this is a strict
improvement in consistency, so I don't have a problem with the
output changes visible in this commit.

Extracted from a larger patch, which seemed to me to be too messy
to push in one commit.

Amit Langote, reviewed at different times by Heikki Linnakangas and
myself

Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
2021-04-06 15:57:11 -04:00
Tom Lane
789d81de8a Fix missing #include in nodeResultCache.h.
Per cpluspluscheck.
2021-04-06 11:23:56 -04:00
Tom Lane
091e22b2e6 Clean up treatment of missing default and CHECK-constraint records.
Andrew Gierth reported that it's possible to crash the backend if no
pg_attrdef record is found to match an attribute that has atthasdef set.
AttrDefaultFetch warns about this situation, but then leaves behind
a relation tupdesc that has null "adbin" pointer(s), which most places
don't guard against.

We considered promoting the warning to an error, but throwing errors
during relcache load is pretty drastic: it effectively locks one out
of using the relation at all.  What seems better is to leave the
load-time behavior as a warning, but then throw an error in any code
path that wants to use a default and can't find it.  This confines
the error to a subset of INSERT/UPDATE operations on the table, and
in particular will at least allow a pg_dump to succeed.

Also, we should fix AttrDefaultFetch to not leave any null pointers
in the tupdesc, because that just creates an untested bug hazard.

While at it, apply the same philosophy of "warn at load, throw error
only upon use of the known-missing info" to CHECK constraints.
CheckConstraintFetch is very nearly the same logic as AttrDefaultFetch,
but for reasons lost in the mists of time, it was throwing ERROR for
the same cases that AttrDefaultFetch treats as WARNING.  Make the two
functions more nearly alike.

In passing, get rid of potentially-O(N^2) loops in equalTupleDesc
by making AttrDefaultFetch sort the entries after fetching them,
so that equalTupleDesc can assume that entries in two equal tupdescs
must be in matching order.  (CheckConstraintFetch already was sorting
CHECK constraints, but equalTupleDesc hadn't been told about it.)

There's some argument for back-patching this, but with such a small
number of field reports, I'm content to fix it in HEAD.

Discussion: https://postgr.es/m/87pmzaq4gx.fsf@news-spur.riddles.org.uk
2021-04-06 10:34:39 -04:00
Etsuro Fujita
a8af856d32 Adjust input value to WaitEventSetWait() in ExecAppendAsyncEventWait().
Adjust the number of events given to WaitEventSetWait() so that it
doesn't exceed the maximum number of events in the WaitEventSet given
to that function (set->nevents_space) in hopes of making the buildfarm
green.

Per valgrind failure report from Tom Lane and the buildfarm.

Author: Etsuro Fujita
Discussion: https://postgr.es/m/3411577.1617289776%40sss.pgh.pa.us
2021-04-06 19:15:00 +09:00
David Rowley
1267d9862f Remove useless Asserts in Result Cache code
Testing if an unsigned variable is >= 0 is pretty pointless.

There's likely enough code in remove_cache_entry() to verify the cache
memory accounting is correct in assert enabled builds. These Asserts
were not adding much extra cover, even if they had been checking >= 0 on a
signed variable.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210402204734.6mo3nfacnljlicgn@alap3.anarazel.de
2021-04-03 10:41:43 +13:00