Commit graph

10489 commits

Author SHA1 Message Date
Daniel Gustafsson
970cd5c62b Fix macro placement in pg_config.h.in
Commit 274bbced85 accidentally placed the pg_config.h.in
for SSL_CTX_set_num_tickets on the wrong line wrt where autoheader
places it.  Fix by re-arranging and backpatch to the same level as
the original commit.

Reported-by: Marina Polyakova <m.polyakova@postgrespro.ru>
Discussion: https://postgr.es/m/48cebe8c3eaf308bae253b1dbf4e4a75@postgrespro.ru
Backpatch-through: v12
2024-07-26 14:16:40 +02:00
Daniel Gustafsson
118ec331bf Disable all TLS session tickets
OpenSSL supports two types of session tickets for TLSv1.3, stateless
and stateful. The option we've used only turns off stateless tickets
leaving stateful tickets active. Use the new API introduced in 1.1.1
to disable all types of tickets.

Backpatch to all supported versions.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240617173803.6alnafnxpiqvlh3g@awork3.anarazel.de
Backpatch-through: v12
2024-07-26 11:09:45 +02:00
Melanie Plageman
dc6354c670 Ensure vacuum removes all visibly dead tuples older than OldestXmin
If vacuum fails to remove a tuple with xmax older than
VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed,
it will loop infinitely in lazy_scan_prune(), which compares tuples'
visibility information to OldestXmin.

Starting in version 14, which uses GlobalVisState for visibility testing
during pruning, it is possible for GlobalVisState->maybe_needed to
precede OldestXmin if maybe_needed is forced to go backward while vacuum
is running. This can happen if a disconnected standby with a running
transaction older than VacuumCutoffs->OldestXmin reconnects to the
primary after vacuum initially calculates GlobalVisState and OldestXmin.

Fix this by having vacuum always remove tuples older than OldestXmin
during pruning. This is okay because the standby won't replay the tuple
removal until the tuple is removable. Thus, the worst that can happen is
a recovery conflict.

Fixes BUG# 17257

Back-patched in versions 14-17

Author: Melanie Plageman
Reviewed-by: Noah Misch, Peter Geoghegan, Robert Haas, Andres Freund, and Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
2024-07-19 12:05:51 -04:00
Masahiko Sawada
aee8c2b954 Fix possibility of logical decoding partial transaction changes.
When creating and initializing a logical slot, the restart_lsn is set
to the latest WAL insertion point (or the latest replay point on
standbys). Subsequently, WAL records are decoded from that point to
find the start point for extracting changes in the
DecodingContextFindStartpoint() function. Since the initial
restart_lsn could be in the middle of a transaction, the start point
must be a consistent point where we won't see the data for partial
transactions.

Previously, when not building a full snapshot, serialized snapshots
were restored, and the SnapBuild jumps to the consistent state even
while finding the start point. Consequently, the slot's restart_lsn
and confirmed_flush could be set to the middle of a transaction. This
could lead to various unexpected consequences. Specifically, there
were reports of logical decoding decoding partial transactions, and
assertion failures occurred because only subtransactions were decoded
without decoding their top-level transaction until decoding the commit
record.

To resolve this issue, the changes prevent restoring the serialized
snapshot and jumping to the consistent state while finding the start
point.

On v17 and HEAD, a flag indicating whether snapshot restores should be
skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION
has been bumpded.

On backbranches, the flag is stored in the LogicalDecodingContext
instead, preserving on-disk compatibility.

Backpatch to all supported versions.

Reported-by: Drew Callahan
Reviewed-by: Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com
Backpatch-through: 12
2024-07-11 22:48:16 +09:00
Thomas Munro
467d77bb16 Cope with <regex.h> name clashes.
macOS 15's SDK pulls in headers related to <regex.h> when we include
<xlocale.h>.  This causes our own regex_t implementation to clash with
the OS's regex_t implementation.  Luckily our function names already had
pg_ prefixes, but the macros and typenames did not.

Include <regex.h> explicitly on all POSIX systems, and fix everything
that breaks.  Then we can prove that we are capable of fully hiding and
replacing the system regex API with our own.

1.  Deal with standard-clobbering macros by undefining them all first.
POSIX says they are "symbolic constants".  If they are macros, this
allows us to redefine them.  If they are enums or variables, our macros
will hide them.

2.  Deal with standard-clobbering types by giving our types pg_
prefixes, and then using macros to redirect xxx_t -> pg_xxx_t.

After including our "regex/regex.h", the system <regex.h> is hidden,
because we've replaced all the standard names.  The PostgreSQL source
tree and extensions can continue to use standard prefix-less type and
macro names, but reach our implementation, if they included our
"regex/regex.h" header.

Back-patch to all supported branches, so that macOS 15's tool chain can
build them.

Reported-by: Stan Hu <stanhu@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAMBWrQnEwEJtgOv7EUNsXmFw2Ub4p5P%2B5QTBEgYwiyjy7rAsEQ%40mail.gmail.com
2024-07-06 10:53:13 +12:00
Noah Misch
2ca8ca4827 Remove comment about xl_heap_inplace "AT END OF STRUCT".
Commit 2c03216d83 moved the tuple data
from there to the buffer-0 data.  Back-patch to v12 (all supported
versions), the plan for the next change to this struct.

Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
2024-06-27 19:21:11 -07:00
Noah Misch
b08a4b6163 Cope with inplace update making catcache stale during TOAST fetch.
This extends ad98fb1422 to invals of
inplace updates.  Trouble requires an inplace update of a catalog having
a TOAST table, so only pg_database was at risk.  (The other catalog on
which core code performs inplace updates, pg_class, has no TOAST table.)
Trouble would require something like the inplace-inval.spec test.
Consider GRANT ... ON DATABASE fetching a stale row from cache and
discarding a datfrozenxid update that vac_truncate_clog() has already
relied upon.  Back-patch to v12 (all supported versions).

Reviewed (in an earlier version) by Robert Haas.

Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
2024-06-27 19:21:11 -07:00
Noah Misch
a338e41374 Lock before setting relhassubclass on RELKIND_PARTITIONED_INDEX.
Commit 5b562644fe added a comment that
SetRelationHasSubclass() callers must hold this lock.  When commit
17f206fbc8 extended use of this column to
partitioned indexes, it didn't take the lock.  As the latter commit
message mentioned, we currently never reset a partitioned index to
relhassubclass=f.  That largely avoids harm from the lock omission.  The
cause for fixing this now is to unblock introducing a rule about locks
required to heap_update() a pg_class row.  This might cause more
deadlocks.  It gives minor user-visible benefits:

- If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE
  ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks
  instead of failing with "tuple concurrently updated".  (Many cases of
  DDL concurrency still fail that way.)

- Match ALTER INDEX ATTACH PARTITION in choosing to lock the index.

While not user-visible today, we'll need this if we ever make something
set the flag to false for a partitioned index, like ANALYZE does today
for tables.  Back-patch to v12 (all supported versions), the plan for
the commit relying on the new rule.  In back branches, add
LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter.

Reviewed (in an earlier version) by Robert Haas.

Discussion: https://postgr.es/m/20240611024525.9f.nmisch@google.com
2024-06-27 19:21:10 -07:00
Heikki Linnakangas
0e2f3d78bf Fix MVCC bug with prepared xact with subxacts on standby
We did not recover the subtransaction IDs of prepared transactions
when starting a hot standby from a shutdown checkpoint. As a result,
such subtransactions were considered as aborted, rather than
in-progress. That would lead to hint bits being set incorrectly, and
the subtransactions suddenly becoming visible to old snapshots when
the prepared transaction was committed.

To fix, update pg_subtrans with prepared transactions's subxids when
starting hot standby from a shutdown checkpoint. The snapshots taken
from that state need to be marked as "suboverflowed", so that we also
check the pg_subtrans.

Backport to all supported versions.

Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi
2024-06-27 21:10:31 +03:00
Tom Lane
f550833193 Fix insertion of SP-GiST REDIRECT tuples during REINDEX CONCURRENTLY.
Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may
insert some REDIRECT tuples.  This will typically happen in
a transaction that lacks an XID, which leads either to assertion
failure in spgFormDeadTuple or to insertion of a REDIRECT tuple
with zero xid.  The latter's not good either, since eventually
VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid,
resulting in either an assertion failure or a garbage answer.

In practice, since REINDEX CONCURRENTLY locks out index scans
till it's done, it doesn't matter whether it inserts REDIRECTs
or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM
reduces such a REDIRECT to a PLACEHOLDER.  So in non-assert builds
there's no observable problem here, other than perhaps a little
index bloat.  But it's not behaving as intended.

To fix, remove the failing Assert in spgFormDeadTuple, acknowledging
that we might sometimes insert a zero XID; and guard VACUUM's
GlobalVisTestIsRemovableXid() call with a test for valid XID,
ensuring that we'll reduce such a REDIRECT the first time VACUUM
sees it.  (Versions before v14 use TransactionIdPrecedes here,
which won't fail on zero xid, so they really have no bug at all
in non-assert builds.)

Another solution could be to not create REDIRECTs at all during
REINDEX CONCURRENTLY, making the relevant code paths treat that
case like index build (which likewise knows that no concurrent
index scans can be happening).  That would allow restoring the
Assert in spgFormDeadTuple, but we'd still need the VACUUM change
because redirection tuples with zero xid may be out there already.
But there doesn't seem to be a nice way for spginsert() to tell that
it's being called in REINDEX CONCURRENTLY without some API changes,
so we'll leave that as a possible future improvement.

In HEAD, also rename the SpGistState.myXid field to redirectXid,
which seems less misleading (since it might not in fact be our
transaction's XID) and is certainly less uninformatively generic.

Per bug #18499 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org
2024-06-17 14:30:59 -04:00
Tom Lane
4ac385adc5 Account for optimized MinMax aggregates during SS_finalize_plan.
We are capable of optimizing MIN() and MAX() aggregates on indexed
columns into subqueries that exploit the index, rather than the normal
thing of scanning the whole table.  When we do this, we replace the
Aggref node(s) with Params referencing subquery outputs.  Such Params
really ought to be included in the per-plan-node extParam/allParam
sets computed by SS_finalize_plan.  However, we've never done so
up to now because of an ancient implementation choice to perform
that substitution during set_plan_references, which runs after
SS_finalize_plan, so that SS_finalize_plan never sees these Params.

The cleanest fix would be to perform a separate tree walk to do
these substitutions before SS_finalize_plan runs.  That seems
unattractive, first because a whole-tree mutation pass is expensive,
and second because we lack infrastructure for visiting expression
subtrees in a Plan tree, so that we'd need a new function knowing
as much as SS_finalize_plan knows about that.  I also considered
swapping the order of SS_finalize_plan and set_plan_references,
but that fell foul of various assumptions that seem tricky to fix.
So the approach adopted here is to teach SS_finalize_plan itself
to check for such Aggrefs.  I refactored things a bit in setrefs.c
to avoid having three copies of the code that does that.

Back-patch of v17 commits d0d44049d and 779ac2c74.  When d0d44049d
went in, there was no evidence that it was fixing a reachable bug,
so I refrained from back-patching.  Now we have such evidence.

Per bug #18465 from Hal Takahara.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18465-2fae927718976b22@postgresql.org
Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us
2024-05-18 14:31:35 -04:00
David Rowley
52f21f9287 Ensure we allocate NAMEDATALEN bytes for names in Index Only Scans
As an optimization, we store "name" columns as cstrings in btree
indexes.

Here we modify it so that Index Only Scans convert these cstrings back
to names with NAMEDATALEN bytes rather than storing the cstring in the
tuple slot, as was happening previously.

Bug: #17855
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Tom Lane
Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org
Backpatch-through: 12, all supported versions
2024-05-01 13:22:16 +12:00
Andres Freund
dcb7cf945c simplehash: Free collisions array in SH_STAT
While SH_STAT() is only used for debugging, the allocated array can be large,
and therefore should be freed.

It's unclear why coverity started warning now.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
2024-04-07 19:09:04 -07:00
Alexander Korotkov
c2faf48fa3 Fix the parameters order for TableAmRoutine.relation_copy_for_cluster()
Specify OldTable first, NewTable second as used by
table_relation_copy_for_cluster() and as implemented in
heapam_relation_copy_for_cluster().

Backpatch to PostgreSQL 12, where TableAmRoutine was introduced.

Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
Author: Japin Li
Reviewed-by: Pavel Borisov
Backpatch-through: 12
2024-04-03 21:31:06 +03:00
Tom Lane
4fb56a734d Avoid deadlock during orphan temp table removal.
If temp tables have dependencies (such as sequences) then it's
possible for autovacuum's cleanup of orphan temp tables to deadlock
against an incoming backend that's trying to clean out the temp
namespace for its own use.  That can happen because RemoveTempRelations'
performDeletion call can visit objects within the namespace in
an order different from the order in which a per-table deletion
will visit them.

To fix, observe that performDeletion will begin by taking an exclusive
lock on the temp namespace (even though it won't actually delete it).
So, if we can get a shared lock on the namespace, we can be sure we're
not running concurrently with RemoveTempRelations, while also not
conflicting with ordinary use of the namespace.  This requires
introducing a conditional version of LockDatabaseObject, but that's no
big deal.  (It's surprising we've got along without that this long.)

Report and patch by Mikhail Zhilin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/c43ce028-2bc2-4865-9b89-3f706246eed5@postgrespro.ru
2024-04-02 14:59:04 -04:00
Tom Lane
6f66fadad9 Fix confusion about the return rowtype of SQL-language procedures.
There is a very ancient hack in check_sql_fn_retval that allows a
single SELECT targetlist entry of composite type to be taken as
supplying all the output columns of a function returning composite.
(This is grotty and fundamentally ambiguous, but it's really hard
to do nested composite-returning functions without it.)

As far as I know, that doesn't cause any problems in ordinary
functions.  It's disastrous for procedures however.  All procedures
that have any output parameters are labeled with prorettype RECORD,
and the CALL code expects it will get back a record with one column
per output parameter, regardless of whether any of those parameters
is composite.  Doing something else leads to an assertion failure
or core dump.

This is simple enough to fix: we just need to not apply that rule
when considering procedures.  However, that requires adding another
argument to check_sql_fn_retval, which at least in principle might be
getting called by external callers.  Therefore, in the back branches
convert check_sql_fn_retval into an ABI-preserving wrapper around a
new function check_sql_fn_retval_ext.

Per report from Yahor Yuzefovich.  This has been broken since we
implemented procedures, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com
2024-03-12 18:16:10 -04:00
Michael Paquier
e82b64a92c Revert "Fix parallel-safety check of expressions and predicate for index builds"
This reverts commit eae7be600b, following a discussion with Tom Lane,
due to concerns that this impacts the decisions made by the planner for
the number of workers spawned based on the inlining and const-folding of
index expressions and predicate for cases that would have worked until
this commit.

Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us
Backpatch-through: 12
2024-03-07 08:31:04 +09:00
Michael Paquier
50b5913a87 Fix parallel-safety check of expressions and predicate for index builds
As coded, the planner logic that calculates the number of parallel
workers to use for a parallel index build uses expressions and
predicates from the relcache, which are flattened for the planner by
eval_const_expressions().

As reported in the bug, an immutable parallel-unsafe function flattened
in the relcache would become a Const, which would be considered as
parallel-safe, even if the predicate or the expressions including the
function are not safe in parallel workers.  Depending on the expressions
or predicate used, this could cause the parallel build to fail.

Tests are included that check parallel index builds with parallel-unsafe
predicate and expressions.  Two routines are added to lsyscache.h to be
able to retrieve expressions and predicate of an index from its pg_index
data.

Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com
Backpatch-through: 12
2024-03-06 17:24:06 +09:00
Noah Misch
8fa4a1ac61 Sync PG_VERSION file in CREATE DATABASE.
An OS crash could leave PG_VERSION empty or missing.  The same symptom
appeared in a backup by block device snapshot, taken after the next
checkpoint and before the OS flushes the PG_VERSION blocks.  Device
snapshots are not a documented backup method, however.  Back-patch to
v15, where commit 9c08aea6a3 introduced
STRATEGY=WAL_LOG and made it the default.

Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
2024-02-01 13:44:23 -08:00
Nathan Bossart
3726c1cb0e Move is_valid_ascii() to ascii.h.
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h.  Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics).  Moving the function to the much
less popular ascii.h resolves these issues for now.

This commit is back-patched for the benefit of the aforementioned
third-party tool.  The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists.  This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code.  It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().

Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
2024-01-29 12:09:08 -06:00
Michael Paquier
1cf2dba84b Add try_index_open(), conditional variant of index_open()
try_index_open() is able to open an index if its relkind fits, except
that it would return NULL instead of generated an error if the relation
does not exist.  This new routine will be used by an upcoming patch to
make REINDEX on partitioned relations more robust when an index in a
partition tree is dropped.

Extracted from a larger patch by the same author.

Author: Fei Changhong
Discussion: https://postgr.es/m/tencent_6A52106095ACDE55333E3AD33F304C0C3909@qq.com
Backpatch-through: 14
2024-01-18 15:04:35 +09:00
Andres Freund
f374fb4aab lwlock: Fix quadratic behavior with very long wait lists
Until now LWLockDequeueSelf() sequentially searched the list of waiters to see
if the current proc is still is on the list of waiters, or has already been
removed. In extreme workloads, where the wait lists are very long, this leads
to a quadratic behavior. #backends iterating over a list #backends
long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in
the first place also increases with the increased length of the wait queue, as
it becomes more likely that a lock is released while waiting for the wait list
lock, which is held for longer during lock release.

Due to the exponential back-off in perform_spin_delay() this is surprisingly
hard to detect. We should make that easier, e.g. by adding a wait event around
the pg_usleep() - but that's a separate patch.

The fix is simple - track whether a proc is currently waiting in the wait list
or already removed but waiting to be woken up in PGPROC->lwWaiting.

In some workloads with a lot of clients contending for a small number of
lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

This has been originally fixed for 16~ with a4adc31f69 without a
backpatch, and we have heard complaints from users impacted by this
quadratic behavior in older versions as well.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
Backpatch-through: 12
2024-01-18 11:12:31 +09:00
Tom Lane
5dd30bb54b Use BIO_{get,set}_app_data instead of BIO_{get,set}_data.
We should have done it this way all along, but we accidentally got
away with using the wrong BIO field up until OpenSSL 3.2.  There,
the library's BIO routines that we rely on use the "data" field
for their own purposes, and our conflicting use causes assorted
weird behaviors up to and including core dumps when SSL connections
are attempted.  Switch to using the approved field for the purpose,
i.e. app_data.

While at it, remove our configure probes for BIO_get_data as well
as the fallback implementation.  BIO_{get,set}_app_data have been
there since long before any OpenSSL version that we still support,
even in the back branches.

Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor
change in an error message spelling that evidently came in with 3.2.

Tristan Partin and Bo Andreson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
2023-11-28 12:34:03 -05:00
Heikki Linnakangas
2873fbfe0d Fix assertions with RI triggers in heap_update and heap_delete.
If the tuple being updated is not visible to the crosscheck snapshot,
we return TM_Updated but the assertions would not hold in that case.
Move them to before the cross-check.

Fixes bug #17893. Backpatch to all supported versions.

Author: Alexander Lakhin
Backpatch-through: 12
Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
2023-11-28 11:59:50 +02:00
Daniel Gustafsson
aef521849b llvmjit: Use explicit LLVMContextRef for inlining
When performing inlining LLVM unfortunately "leaks" types (the
types survive and are usable, but a new round of inlining will
recreate new structurally equivalent types). This accumulation
will over time amount to a memory leak which for some queries
can be large enough to trigger the OOM process killer.

To avoid accumulation of types, all IR related data is stored
in an LLVMContextRef which is dropped and recreated in order
to release all types.  Dropping and recreating incurs overhead,
so it will be done only after 100 queries. This is a heuristic
which might be revisited, but until we can get the size of the
context from LLVM we are flying a bit blind.

This issue has been reported several times, there may be more
references to it in the archives on top of the threads linked
below.

This is a backpatch of 9dce22033d to all supported branches.

Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Reported-By: Kurt Roeckx <kurt@roeckx.be>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Reported-By: Lauri Laanmets <pcspets@gmail.com>
Author: Andres Freund and Daniel Gustafsson
Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
Backpatch-through: v12
2023-11-17 10:21:34 +01:00
Tom Lane
9057ddbefe Ensure we preprocess expressions before checking their volatility.
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

Two actual bugs of this ilk are fixed here.  We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest.  Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism.  A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.

In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after.  The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.

Per bug #18097 from Jim Keener.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-11-16 10:05:14 -05:00
Nathan Bossart
18f47989ec Fix fallback implementation for pg_atomic_test_set_flag().
The fallback implementation of pg_atomic_test_set_flag() that uses
atomic-exchange gives pg_atomic_exchange_u32_impl() an extra
argument.  This issue has been present since the introduction of
the atomics API in commit b64d92f1a5.

Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
Backpatch-through: 12
2023-11-15 15:04:34 -06:00
David Rowley
456d697bae Ensure we use the correct spelling of "ensure"
We seem to have accidentally used "insure" in a few places.  Correct
that.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
2023-11-10 00:17:07 +13:00
Dean Rasheed
308a69a987 Fix corner-case 64-bit integer subtraction bug on some platforms.
When computing "0 - INT64_MIN", most platforms would report an
overflow error, which is correct. However, platforms without integer
overflow builtins or 128-bit integers would fail to spot the overflow,
and incorrectly return INT64_MIN.

Back-patch to all supported branches.

Patch be me. Thanks to Jian He for initial investigation, and Laurenz
Albe and Tom Lane for review.

Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
2023-11-09 09:54:22 +00:00
Tom Lane
3bc6bc3ee2 Detect integer overflow while computing new array dimensions.
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds.  In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen.  Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.

To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.

The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text.  v11 psql
lacks that parameter, so omit the tests in that branch.

Our thanks to Pedro Gallegos for reporting this problem.

Security: CVE-2023-5869
2023-11-06 10:56:43 -05:00
Tom Lane
ae33659d42 Be more wary about NULL values for GUC string variables.
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val.  Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice.  Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.

Xing Guo, Aleksander Alekseev, Tom Lane

Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02 11:47:33 -04:00
Tom Lane
1268e73781 Fix problems when a plain-inheritance parent table is excluded.
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not.  (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.)  In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:

1. The wrong table's statement-level triggers would get fired.

2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan.  This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.

In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.

While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch.  Given the lack of field complaints about the
trigger issue, I'll desist for now.  (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)

Per bug #18147 from Hans Buschmann.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
2023-10-24 14:48:34 -04:00
Thomas Munro
b2e0977886 jit: Supply LLVMGlobalGetValueType() for LLVM < 8.
Commit 37d5babb used this C API function while adding support for LLVM
16 and opaque pointers, but it's not available in LLVM 7 and older.
Provide it in our own llvmjit_wrap.cpp.  It just calls a C++ function
that pre-dates LLVM 3.9, our minimum target.

Back-patch to 12, like 37d5babb.

Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com
2023-10-19 03:03:27 +13:00
Thomas Munro
eed1feb3fe jit: Support opaque pointers in LLVM 16.
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].

 * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
 * For LLVM == 15, we'll continue to do the same, explicitly opting
   out of opaque pointer mode.
 * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
   the extra type argument.

The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc.  The change is mostly mechanical,
except that at each site the correct type had to be provided.

In some places we needed to do some extra work to get functions types,
including some new wrappers for C++ APIs that are not yet exposed by in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.

Back-patch to 12, because it's a little tricker in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.

[1] https://llvm.org/docs/OpaquePointers.html

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
2023-10-18 22:59:46 +13:00
Nathan Bossart
c9265ae80b Avoid calling proc_exit() in processes forked by system().
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system().  This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers.  If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit().  This can end badly.  For
example, both processes will try to remove themselves from the
PGPROC shared array.

To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before.  If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.

This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.

Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
2023-10-17 10:42:06 -05:00
Noah Misch
782be0f712 Dissociate btequalimage() from interval_ops, ending its deduplication.
Under interval_ops, some equal values are distinguishable.  One such
pair is '24:00:00' and '1 day'.  With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function.  This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes.  This fix makes interval_ops simply omit the support function,
like numeric_ops does.  Back-pack to v13, where btequalimage() first
appeared.  In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval".  Going
forward, back-branch initdb will include the catalog change.

Reviewed by Peter Geoghegan.

Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com
2023-10-14 16:33:54 -07:00
Dean Rasheed
3c1a1af91d Fix EvalPlanQual rechecking during MERGE.
Under some circumstances, concurrent MERGE operations could lead to
inconsistent results, that varied according the plan chosen. This was
caused by a lack of rowmarks on the source relation, which meant that
EvalPlanQual rechecking was not guaranteed to return the same source
tuples when re-running the join query.

Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for
all non-target relations used in MERGE, in the same way that it does
for UPDATE and DELETE.

Per bug #18103. Back-patch to v15, where MERGE was introduced.

Dean Rasheed, reviewed by Richard Guo.

Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org
2023-09-30 10:55:24 +01:00
Peter Geoghegan
cac37c1a1b Fix btmarkpos/btrestrpos array key wraparound bug.
nbtree's mark/restore processing failed to correctly handle an edge case
involving array key advancement and related search-type scan key state.
Scans with ScalarArrayScalarArrayOpExpr quals requiring mark/restore
processing (for a merge join) could incorrectly conclude that an
affected array/scan key must not have advanced during the time between
marking and restoring the scan's position.

As a result of all this, array key handling within btrestrpos could skip
a required call to _bt_preprocess_keys().  This confusion allowed later
primitive index scans to overlook tuples matching the true current array
keys.  The scan's search-type scan keys would still have spurious values
corresponding to the final array element(s) -- not values matching the
first/now-current array element(s).

To fix, remember that "array key wraparound" has taken place during the
ongoing btrescan in a flag variable stored in the scan's state, and use
that information at the point where btrestrpos decides if another call
to _bt_preprocess_keys is required.

Oversight in commit 70bc5833, which taught nbtree to handle array keys
during mark/restore processing, but missed this subtlety.  That commit
was itself a bug fix for an issue in commit 9e8da0f7, which taught
nbtree to handle ScalarArrayOpExpr quals natively.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com
Backpatch: 11- (all supported branches).
2023-09-28 16:29:32 -07:00
Tom Lane
8700851352 Avoid unnecessary plancache revalidation of utility statements.
Revalidation of a plancache entry (after a cache invalidation event)
requires acquiring a snapshot.  Normally that is harmless, but not
if the cached statement is one that needs to run without acquiring a
snapshot.  We were already aware of that for TransactionStmts,
but for some reason hadn't extrapolated to the other statements that
PlannedStmtRequiresSnapshot() knows mustn't set a snapshot.  This can
lead to unexpected failures of commands such as SET TRANSACTION
ISOLATION LEVEL.  We can fix it in the same way, by excluding those
command types from revalidation.

However, we can do even better than that: there is no need to
revalidate for any statement type for which parse analysis, rewrite,
and plan steps do nothing interesting, which is nearly all utility
commands.  To mechanize this, invent a parser function
stmt_requires_parse_analysis() that tells whether parse analysis does
anything beyond wrapping a CMD_UTILITY Query around the raw parse
tree.  If that's what it does, then rewrite and plan will just
skip the Query, so that it is not possible for the same raw parse
tree to produce a different plan tree after cache invalidation.

stmt_requires_parse_analysis() is basically equivalent to the
existing function analyze_requires_snapshot(), except that for
obscure reasons that function omits ReturnStmt and CallStmt.
It is unclear whether those were oversights or intentional.
I have not been able to demonstrate a bug from not acquiring a
snapshot while analyzing these commands, but at best it seems mighty
fragile.  It seems safer to acquire a snapshot for parse analysis of
these commands too, which allows making stmt_requires_parse_analysis
and analyze_requires_snapshot equivalent.

In passing this fixes a second bug, which is that ResetPlanCache
would exclude ReturnStmts and CallStmts from revalidation.
That's surely *not* safe, since they contain parsable expressions.

Per bug #18059 from Pavel Kulakov.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18059-79c692f036b25346@postgresql.org
2023-08-24 12:02:40 -04:00
Peter Eisentraut
908f711d23 Update DECLARE_INDEX documentation
Update source code comment changes belonging to the changes in
6a6389a08b.

Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2023-08-24 14:00:54 +02:00
Etsuro Fujita
9f495270cc Update comments on CustomPath struct.
Commit e7cb7ee14 allowed custom scan providers to create CustomPath
paths for join relations as well, but missed updating the comments.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAPmGK15ODkN%2B%3DhkBCufj1HBW0x5OTb65Xuy7ryXchMdiCMpx_g%40mail.gmail.com
2023-08-03 17:15:02 +09:00
Etsuro Fujita
d1ef5631e6 Disallow replacing joins with scans in problematic cases.
Commit e7cb7ee14, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
2023-07-28 15:45:02 +09:00
Amit Kapila
2864eb977a Fix the display of UNKNOWN message type in apply worker.
We include the message type while displaying an error context in the
apply worker. Now, while retrieving the message type string if the
message type is unknown we throw an error that will hide the original
error. So, instead, we need to simply return the string indicating an
unknown message type.

Reported-by: Ashutosh Bapat
Author: Euler Taveira, Amit Kapila
Reviewed-by: Ashutosh Bapat
Backpatch-through: 15
Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L+ci2zreYWebpzxYsA@mail.gmail.com
2023-07-25 08:50:37 +05:30
Andres Freund
f66403749d Handle DROP DATABASE getting interrupted
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.

To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.

An invalid database cannot be connected to anymore, but can still be
dropped.

Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.

Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.

Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
2023-07-13 13:04:45 -07:00
Andres Freund
d12d1a9278 Fix type of iterator variable in SH_START_ITERATE
Also add comment to make the reasoning behind the Assert() more explicit (per
Tom).

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAocXNJ6s1VLz+hMamLAQAiewRoW17OJ6-+9GACKfj6iPQ@mail.gmail.com
Backpatch: 11-
2023-07-06 09:57:31 -07:00
Tom Lane
4729d1e8aa Fix misbehavior of EvalPlanQual checks with multiple result relations.
The idea of EvalPlanQual is that we replace the query's scan of the
result relation with a single injected tuple, and see if we get a
tuple out, thereby implying that the injected tuple still passes the
query quals.  (In join cases, other relations in the query are still
scanned normally.)  This logic was not updated when commit 86dc90056
made it possible for a single DML query plan to have multiple result
relations, when the query target relation has inheritance or partition
children.  We replaced the output for the current result relation
successfully, but other result relations were still scanned normally;
thus, if any other result relation contained a tuple satisfying the
quals, we'd think the EPQ check passed, even if it did not pass for
the injected tuple itself.  This would lead to update or delete
actions getting performed when they should have been skipped due to
a conflicting concurrent update in READ COMMITTED isolation mode.

Fix by blocking all sibling result relations from emitting tuples
during an EvalPlanQual recheck.  In the back branches, the fix is
complicated a bit by the need to not change the size of struct
EPQState (else we'd have ABI-breaking changes in offsets in
struct ModifyTableState).  Like the back-patches of 3f7836ff6
and 4b3e37993, add a separately palloc'd struct to avoid that.
The logic is the same as in HEAD otherwise.

This is only a live bug back to v14 where 86dc90056 came in.
However, I chose to back-patch the test cases further, on the
grounds that this whole area is none too well tested.  I skipped
doing so in v11 though because none of the test applied cleanly,
and it didn't quite seem worth extra work for a branch with only
six months to live.

Per report from Ante Krešić (via Aleksander Alekseev)

Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com
2023-05-19 14:26:34 -04:00
Michael Paquier
2dd7782217 pageinspect: Fix gist_page_items() with included columns
Non-leaf pages of GiST indexes contain key attributes, leaf pages
contain both key and non-key attributes, and gist_page_items() ignored
the handling of non-key attributes.  This caused a few problems when
using gist_page_items() on a GiST index with INCLUDE:
- On a non-leaf page, the function would crash.
- On a leaf page, the function would work, but miss to display all the
values for included attributes.

This commit fixes gist_page_items() to handle such cases in a more
appropriate way, and now displays the values of key and non-key
attributes for each item separately in a style consistent with what
ruleutils.c would generate for the attribute list, depending on the page
type dealt with.  In a way similar to how a record is displayed, values
would be double-quoted for key or non-key attributes if required.

ruleutils.c did not provide a routine able to control if non-key
attributes should be displayed, so an extended() routine for index
definitions is added to work around the leaf and non-leaf page
differences.

While on it, this commit fixes a third problem related to the amount of
data reported for key attributes.  The code originally relied on
BuildIndexValueDescription() (used for error reports on constraints)
that would not print all the data stored in the index but the index
opclass's input type, so this limited the amount of information
available.  This switch makes gist_page_items() much cheaper as there is
no need to run ACL checks for each item printed, which is not an issue
anyway as superuser rights are required to execute the functions of
pageinspect.  Opclasses whose data cannot be displayed can rely on
gist_page_items_bytea().

The documentation of this function was slightly incorrect for the
output results generated on HEAD and v15, so adjust it on these
branches.

Author: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org
Backpatch-through: 14
2023-05-19 12:38:15 +09:00
Tomas Vondra
e187693239 Fix handling of empty ranges and NULLs in BRIN
BRIN indexes did not properly distinguish between summaries for empty
(no rows) and all-NULL ranges, treating them as essentially the same
thing. Summaries were initialized with allnulls=true, and opclasses
simply reset allnulls to false when processing the first non-NULL value.
This however produces incorrect results if the range starts with a NULL
value (or a sequence of NULL values), in which case we forget the range
contains NULL values when adding the first non-NULL value.

This happens because the allnulls flag is used for two separate
purposes - to mark empty ranges (not representing any rows yet) and
ranges containing only NULL values.

Opclasses don't know which of these cases it is, and so don't know
whether to set hasnulls=true. Setting the flag in both cases would make
it correct, but it would also make BRIN indexes useless for queries with
IS NULL clauses. All ranges start empty (and thus allnulls=true), so all
ranges would end up with either allnulls=true or hasnulls=true.

The severity of the issue is somewhat reduced by the fact that it only
happens when adding values to an existing summary with allnulls=true.
This can happen e.g. for small tables (because a summary for the first
range exists for all BRIN indexes), or for tables with large fraction of
NULL values in the indexed columns.

Bulk summarization (e.g. during CREATE INDEX or automatic summarization)
that processes all values at once is not affected by this issue. In this
case the flags were updated in a slightly different way, not forgetting
the NULL values.

To identify empty ranges we use a new flag, stored in an unused bit in
the BRIN tuple header so the on-disk format remains the same. A matching
flag is added to BrinMemTuple, into a 3B gap after bt_placeholder.
That means there's no risk of ABI breakage, although we don't actually
pass the BrinMemTuple to any public API.

We could also skip storing index tuples for empty summaries, but then
we'd have to always process such ranges - even if there are no rows in
large parts of the table (e.g. after a bulk DELETE), it would still
require reading the pages etc. So we store them, but ignore them when
building the bitmap.

Backpatch to 11. The issue exists since BRIN indexes were introduced in
9.5, but older releases are already EOL.

Backpatch-through: 11
Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera
Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
2023-05-19 00:15:13 +02:00
Michael Paquier
ccd21e1cfa Fix assertion failure when updating stats_fetch_consistency in a transaction
An update of the GUC stats_fetch_consistency in a transaction would be
able to trigger an assertion when doing cache->snapshot.  In this case,
when retrieving a pgstat entry after the switch, a new snapshot would be
rebuilt, confusing pgstat_build_snapshot() because a snapshot is already
cached with an unexpected mode ("cache").

In order to fix this problem, this commit adds a flag to force a
snapshot clear each time this GUC is changed.  Some tests are added to
check, while on it.

Some optimizations in avoiding the snapshot clear should be possible
depending on what is cached and the current GUC value, I guess, but this
solution is simple, and ensures that the state of the cache is updated
each time a new pgstat entry is fetched, hence being consistent with the
level wanted by the client that has set the GUC.

Note that cache->none and snapshot->none would not cause issues, as
fetching a pgstat entry would be retrieved from shared memory on the
second attempt, however a snapshot would still be cached.  Similarly,
none->snapshot and none->cache would build a new snapshot on the second
fetch attempt.  Finally, snapshot->cache would cache a new snapshot on
the second attempt.

Reported-by: Alexander Lakhin
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org
backpatch-through: 15
2023-05-10 11:24:40 +09:00
Michael Paquier
b9ad73ad25 Fix crashes with CREATE SCHEMA AUTHORIZATION and schema elements
CREATE SCHEMA AUTHORIZATION with appended schema elements can lead to
crashes when comparing the schema name of the query with the schemas
used in the qualification of some clauses in the elements' queries.

The origin of the problem is that the transformation routine for the
elements listed in a CREATE SCHEMA query uses as new, expected, schema
name the one listed in CreateSchemaStmt itself.  However, depending on
the query, CreateSchemaStmt.schemaname may be NULL, being computed
instead from the role specification of the query given by the
AUTHORIZATION clause, that could be either:
- A user name string, with the new schema name being set to the same
value as the role given.
- Guessed from CURRENT_ROLE, SESSION_ROLE or CURRENT_ROLE, with a new
schema name computed from the security context where CREATE SCHEMA is
running.

Regression tests are added for CREATE SCHEMA with some appended elements
(some of them with schema qualifications), covering also some role
specification patterns.

While on it, this simplifies the context structure used during the
transformation of the elements listed in a CREATE SCHEMA query by
removing the fields for the role specification and the role type.  They
were not used, and for the role specification this could be confusing as
the schema name may by extracted from that at the beginning of
CreateSchemaCommand().

This issue exists for a long time, so backpatch down to all the versions
supported.

Reported-by: Song Hongyu
Author: Michael Paquier
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/17909-f65c12dfc5f0451d@postgresql.org
Backpatch-through: 11
2023-04-28 19:29:36 +09:00
Michael Paquier
5c32549460 Fix detection of unseekable files for fseek() and ftello() with MSVC
Calling fseek() or ftello() on a handle to a non-seeking device such as
a pipe or a communications device is not supported.  Unfortunately,
MSVC's flavor of these routines, _fseeki64() and _ftelli64(), do not
return an error when given a pipe as handle.  Some of the logic of
pg_dump and restore relies on these routines to check if a handle is
seekable, causing failures when passing the contents of pg_dump to
pg_restore through a pipe, for example.

This commit introduces wrappers for fseeko() and ftello() on MSVC so as
any callers are able to properly detect the cases of non-seekable
handles.  This relies mainly on GetFileType(), sharing a bit of code
with the MSVC port for fstat().  The code in charge of getting a file
type is refactored into a new file called win32common.c, shared by
win32stat.c and the new win32fseek.c.  It includes the MSVC ports for
fseeko() and ftello().

Like 765f5df, this is backpatched down to 14, where the fstat()
implementation for MSVC is able to understand about files larger than
4GB in size.  Using a TAP test for that is proving to be tricky as
IPC::Run handles the pipes by itself, still I have been able to check
the fix manually.

Reported-by: Daniel Watzinger
Author: Juan José Santamaría Flecha, Michael Paquier
Discussion: https://postgr.es/m/CAC+AXB26a4EmxM2suXxPpJaGrqAdxracd7hskLg-zxtPB50h7A@mail.gmail.com
Backpatch-through: 14
2023-04-12 09:09:53 +09:00
Thomas Munro
c03c6e8cf6 Fix race in parallel hash join batch cleanup, take II.
With unlucky timing and parallel_leader_participation=off (not the
default), PHJ could attempt to access per-batch shared state just as it
was being freed.  There was code intended to prevent that by checking
for a cleared pointer, but it was racy.  Fix, by introducing an extra
barrier phase.  The new phase PHJ_BUILD_RUNNING means that it's safe to
access the per-batch state to find a batch to help with, and
PHJ_BUILD_DONE means that it is too late.  The last to detach will free
the array of per-batch state as before, but now it will also atomically
advance the phase, so that late attachers can avoid the hazard.  This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).

An earlier attempt to fix this (commit 3b8981b6, later reverted) missed
one special case.  When the inner side is empty (the "empty inner
optimization), the build barrier would only make it to
PHJ_BUILD_HASHING_INNER phase before workers attempted to detach from
the hashtable.  In that case, fast-forward the build barrier to
PHJ_BUILD_RUNNING before proceeding, so that our later assertions hold
and we can still negotiate who is cleaning up.

Revealed by build farm failures, where BarrierAttach() failed a sanity
check assertion, because the memory had been clobbered by dsa_free().
In non-assert builds, the result could be a segmentation fault.

Back-patch to all supported releases.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Reported-by: David Geier <geidav.pg@gmail.com>
Tested-by: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
2023-03-21 14:32:14 +13:00
Jeff Davis
8b87e92919 Fix t_isspace(), etc., when datlocprovider=i and datctype=C.
Check whether the datctype is C to determine whether t_isspace() and
related functions use isspace() or iswspace().

Previously, t_isspace() checked whether the database default collation
was C; which is incorrect when the default collation uses the ICU
provider.

Discussion: https://postgr.es/m/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com
Reviewed-by: Peter Eisentraut
Backpatch-through: 15
2023-03-17 12:07:47 -07:00
Dean Rasheed
7d9a75713a Fix concurrent update issues with MERGE.
If MERGE attempts an UPDATE or DELETE on a table with BEFORE ROW
triggers, or a cross-partition UPDATE (with or without triggers), and
a concurrent UPDATE or DELETE happens, the merge code would fail.

In some cases this would lead to a crash, while in others it would
cause the wrong merge action to be executed, or no action at all. The
immediate cause of the crash was the trigger code calling
ExecGetUpdateNewTuple() as part of the EPQ mechanism, which fails
because during a merge ri_projectNew is NULL, since merge has its own
per-action projection information, which ExecGetUpdateNewTuple() knows
nothing about.

Fix by arranging for the trigger code to exit early, returning the
TM_Result and TM_FailureData information, if a concurrent modification
is detected, allowing the merge code to do the necessary EPQ handling
in its own way. Similarly, prevent the cross-partition update code
from doing any EPQ processing for a merge, allowing the merge code to
work out what it needs to do.

This leads to a number of simplifications in nodeModifyTable.c. Most
notably, the ModifyTableContext->GetUpdateNewTuple() callback is no
longer needed, and mergeGetUpdateNewTuple() can be deleted, since
there is no longer any requirement for get-update-new-tuple during a
merge. Similarly, ModifyTableContext->cpUpdateRetrySlot is no longer
needed. Thus ExecGetUpdateNewTuple() and the retry_slot handling of
ExecCrossPartitionUpdate() can be restored to how they were in v14,
before the merge code was added, and ExecMergeMatched() no longer
needs any special-case handling for cross-partition updates.

While at it, tidy up ExecUpdateEpilogue() a bit, making it handle
recheckIndexes locally, rather than passing it in as a parameter,
ensuring that it is freed properly. This dates back to when it was
split off from ExecUpdate() to support merge.

Per bug #17809 from Alexander Lakhin, and follow-up investigation of
bug #17792, also from Alexander Lakhin.

Back-patch to v15, where MERGE was introduced, taking care to preserve
backwards-compatibility of the trigger API in v15 for any extensions
that might use it.

Discussion:
  https://postgr.es/m/17809-9e6650bef133f0fe%40postgresql.org
  https://postgr.es/m/17792-0f89452029662c36%40postgresql.org
2023-03-13 10:23:42 +00:00
Tom Lane
f61e60102f Avoid failure when altering state of partitioned foreign-key triggers.
Beginning in v15, if you apply ALTER TABLE ENABLE/DISABLE TRIGGER to
a partitioned table, it also affects the partitions' cloned versions
of the affected trigger(s).  The initial implementation of this
located the clones by name, but that fails on foreign-key triggers
which have names incorporating their own OIDs.  We can fix that, and
also make the behavior more bulletproof in the face of user-initiated
trigger renames, by identifying the cloned triggers by tgparentid.

Following the lead of earlier commits in this area, I took care not
to break ABI in the v15 branch, even though I rather doubt there
are any external callers of EnableDisableTrigger.

While here, update the documentation, which was not touched when
the semantics were changed.

Per bug #17817 from Alan Hodgson.  Back-patch to v15; older versions
do not have this behavior.

Discussion: https://postgr.es/m/17817-31dfb7c2100d9f3d@postgresql.org
2023-03-04 13:32:35 -05:00
Tom Lane
a033f9165c Fix MULTIEXPR_SUBLINK with partitioned target tables, yet again.
We already tried to fix this in commits 3f7323cbb et al (and follow-on
fixes), but now it emerges that there are still unfixed cases;
moreover, these cases affect all branches not only pre-v14.  I thought
we had eliminated all cases of making multiple clones of an UPDATE's
target list when we nuked inheritance_planner.  But it turns out we
still do that in some partitioned-UPDATE cases, notably including
INSERT ... ON CONFLICT UPDATE, because ExecInitPartitionInfo thinks
it's okay to clone and modify the parent's targetlist.

This fix is based on a suggestion from Andres Freund: let's stop
abusing the ParamExecData.execPlan mechanism, which was only ever
meant to handle initplans, and instead solve the execution timing
problem by having the expression compiler move MULTIEXPR_SUBLINK steps
to the front of their expression step lists.  This is feasible because
(a) all branches still in support compile the entire targetlist of
an UPDATE into a single ExprState, and (b) we know that all
MULTIEXPR_SUBLINKs do need to be evaluated --- none could be buried
inside a CASE, for example.  There is a minor semantics change
concerning the order of execution of the MULTIEXPR's subquery versus
other parts of the parent targetlist, but that seems like something
we can get away with.  By doing that, we no longer need to worry
about whether different clones of a MULTIEXPR_SUBLINK share output
Params; their usage of that data structure won't overlap.

Per bug #17800 from Alexander Lakhin.  Back-patch to all supported
branches.  In v13 and earlier, we can revert 3f7323cbb and follow-on
fixes; however, I chose to keep the SubPlan.subLinkId field added
in ccbb54c72.  We don't need that anymore in the core code, but it's
cheap enough to fill, and removing a plan node field in a minor
release seems like it'd be asking for trouble.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/17800-ff90866b3906c964@postgresql.org
2023-02-25 14:44:14 -05:00
Michael Paquier
5fd61055ea Fix handling of SCRAM-SHA-256's channel binding with RSA-PSS certificates
OpenSSL 1.1.1 and newer versions have added support for RSA-PSS
certificates, which requires the use of a specific routine in OpenSSL to
determine which hash function to use when compiling it when using
channel binding in SCRAM-SHA-256.  X509_get_signature_nid(), that is the
original routine the channel binding code has relied on, is not able to
determine which hash algorithm to use for such certificates.  However,
X509_get_signature_info(), new to OpenSSL 1.1.1, is able to do it.  This
commit switches the channel binding logic to rely on
X509_get_signature_info() over X509_get_signature_nid(), which would be
the choice when building with 1.1.1 or newer.

The error could have been triggered on the client or the server, hence
libpq and the backend need to have their related code paths patched.
Note that attempting to load an RSA-PSS certificate with OpenSSL 1.1.0
or older leads to a failure due to an unsupported algorithm.

The discovery of relying on X509_get_signature_info() comes from Jacob,
the tests have been written by Heikki (with few tweaks from me), while I
have bundled the whole together while adding the bits needed for MSVC
and meson.

This issue exists since channel binding exists, so backpatch all the way
down.  Some tests are added in 15~, triggered if compiling with OpenSSL
1.1.1 or newer, where the certificate and key files can easily be
generated for RSA-PSS.

Reported-by: Gunnar "Nick" Bluth
Author: Jacob Champion, Heikki Linnakangas
Discussion: https://postgr.es/m/17760-b6c61e752ec07060@postgresql.org
Backpatch-through: 11
2023-02-15 10:12:31 +09:00
Robert Haas
ecb01e6ebb Un-revert "Disable STARTUP_PROGRESS_TIMEOUT in standby mode."
This reverts commit 1eadfbdd7e
and thus reinstates commit 98e7234242.

It's a better time to commit this now that the release is over.

Discussion: http://postgr.es/m/3509384.1675878203@sss.pgh.pa.us
2023-02-10 16:27:05 -05:00
Robert Haas
1eadfbdd7e Revert "Disable STARTUP_PROGRESS_TIMEOUT in standby mode."
This reverts commit 98e7234242. I
forgot that we're about to wrap a release, and this fix isn't
critical enough to justify committing it right before we wrap
a release.

Discussion: http://postgr.es/m/2676424.1675700113@sss.pgh.pa.us
2023-02-06 11:16:03 -05:00
Robert Haas
98e7234242 Disable STARTUP_PROGRESS_TIMEOUT in standby mode.
In standby mode, we don't actually report progress of recovery,
but up until now, startup_progress_timeout_handler() nevertheless
got called every log_startup_progress_interval seconds. That's
an unnecessary expense, so avoid it.

Report by Thomas Munro. Patch by Bharath Rupireddy, reviewed by
Simon Riggs, Thomas Munro, and me. Back-patch to v15, where
the problem was introduced.

Discussion: https://www.postgresql.org/message-id/CA%2BhUKGKCHSffAj8zZJKJvNX7ygnQFxVD6wm1d-2j3fVw%2BMafPQ%40mail.gmail.com
2023-02-06 10:55:42 -05:00
Tom Lane
9a40a03119 Make our back branches build under -fkeep-inline-functions.
Add "#ifndef FRONTEND" where necessary to make pg_waldump build
on compilers that don't elide unused static-inline functions.

This back-patches relevant parts of commit 3e9ca5260, fixing build
breakage from dc7420c2c and back-patching of f10f0ae42.

Per recently-resurrected buildfarm member castoroides.  We aren't
expecting castoroides to build anything newer than v11, but we
might as well clean up the intermediate branches while at it.
2023-01-20 11:58:12 -05:00
Tom Lane
a8f7687a0b Make new GENERATED-expressions code more bulletproof.
In commit 8bf6ec3ba I assumed that no code path could reach
ExecGetExtraUpdatedCols without having gone through
ExecInitStoredGenerated.  That turns out not to be the case in
logical replication: if there's an ON UPDATE trigger on the target
table, trigger.c will call this code before anybody has set up its
generated columns.  Having seen that, I don't have a lot of faith in
there not being other such paths.  ExecGetExtraUpdatedCols can call
ExecInitStoredGenerated for itself, as long as we are willing to
assume that it is only called in CMD_UPDATE operations, which on
the whole seems like a safer leap of faith.

Per report from Vitaly Davydov.

Discussion: https://postgr.es/m/d259d69652b8c2ff50e14cda3c236c7f@postgrespro.ru
2023-01-15 14:06:46 -05:00
Tom Lane
3706cc97aa Fix calculation of which GENERATED columns need to be updated.
We were identifying the updatable generated columns of inheritance
children by transposing the calculation made for their parent.
However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field
in favor of identifying which generated columns depend on updated
columns during executor startup.  In HEAD we can remove
extraUpdatedCols altogether; in back branches, it's still there but
always empty.  Another difference between the HEAD and back-branch
versions of this patch is that in HEAD we can add the new bitmap field
to ResultRelInfo, but that would cause an ABI break in back branches.
Like 4b3e37993, add a List field at the end of struct EState instead.

Back-patch to v13.  The bogus calculation is also being made in v12,
but it doesn't have the same visible effect because we don't use it
to decide which generated columns to recalculate; as a consequence of
which the patch doesn't apply easily.  I think that there might still
be a demonstrable bug associated with trigger firing conditions, but
that's such a weird corner-case usage that I'm content to leave it
unfixed in v12.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com
Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
2023-01-05 14:12:17 -05:00
Andres Freund
c6e75e4c27 perl: Hide warnings inside perl.h when using gcc compatible compiler
New versions of perl trigger warnings within perl.h with our compiler
flags. At least -Wdeclaration-after-statement, -Wshadow=compatible-local are
known to be problematic.

To avoid these warnings, conditionally use #pragma GCC system_header before
including plperl.h.

Alternatively, we could add the include paths for problematic headers with
-isystem, but that is a larger hammer and is harder to search for.

A more granular alternative would be to use #pragma GCC diagnostic
push/ignored/pop, but gcc warns about unknown warnings being ignored, so every
to-be-ignored-temporarily compiler warning would require its own pg_config.h
symbol and #ifdef.

As the warnings are voluminous, it makes sense to backpatch this change. But
don't do so yet, we first want gather buildfarm coverage - it's e.g. possible
that some compiler claiming to be gcc compatible has issues with the pragma.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: Discussion: https://postgr.es/m/20221228182455.hfdwd22zztvkojy2@awork3.anarazel.de
2023-01-02 15:49:33 -08:00
Tom Lane
18431ee6f5 Rethink handling of [Prevent|Is]InTransactionBlock in pipeline mode.
Commits f92944137 et al. made IsInTransactionBlock() set the
XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false",
on the grounds that that kept its API promises equivalent to those of
PreventInTransactionBlock().  This turns out to be a bad idea though,
because it allows an ANALYZE in a pipelined series of commands to
cause an immediate commit, which is unexpected.

Furthermore, if we return "false" then we have another issue,
which is that ANALYZE will decide it's allowed to do internal
commit-and-start-transaction sequences, thus possibly unexpectedly
committing the effects of previous commands in the pipeline.

To fix the latter situation, invent another transaction state flag
XACT_FLAGS_PIPELINING, which explicitly records the fact that we
have executed some extended-protocol command and not yet seen a
commit for it.  Then, require that flag to not be set before allowing
InTransactionBlock() to return "false".

Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT
without fear of causing problems.  This means that the API guarantees
of IsInTransactionBlock now diverge from PreventInTransactionBlock,
which is mildly annoying, but it seems OK given the very limited usage
of IsInTransactionBlock.  (In any case, a caller preferring the old
behavior could always set NEEDIMMEDIATECOMMIT for itself.)

For consistency also require XACT_FLAGS_PIPELINING to not be set
in PreventInTransactionBlock.  This too is meant to prevent commands
such as CREATE DATABASE from silently committing previous commands
in a pipeline.

Per report from Peter Eisentraut.  As before, back-patch to all
supported branches (which sadly no longer includes v10).

Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com
2022-12-13 14:23:59 -05:00
Etsuro Fujita
a0bf7a0ecc Remove new structure member from ResultRelInfo.
In commit ffbb7e65a, I added a ModifyTableState member to ResultRelInfo
to save the owning ModifyTableState for use by nodeModifyTable.c when
performing batch inserts, but as pointed out by Tom Lane, that changed
the array stride of es_result_relations, and that would break any
previously-compiled extension code that accesses that array.  Fix by
removing that member from ResultRelInfo and instead adding a List member
at the end of EState to save such ModifyTableStates.

Per report from Tom Lane.  Back-patch to v14, like the previous commit;
I chose to apply the patch to HEAD as well, to make back-patching easy.

Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us
2022-12-08 16:15:01 +09:00
Tom Lane
c959f84c2b Fix Memoize to work with partitionwise joining.
A couple of places weren't up to speed for this.  By sheer good
luck, we didn't fail but just selected a non-memoized join plan,
at least in the test case we have.  Nonetheless, it's a bug,
and I'm not quite sure that it couldn't have worse consequences
in other examples.  So back-patch to v14 where Memoize came in.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs48GkNom272sfp0-WeD6_0HSR19BJ4H1c9ZKSfbVnJsvRg@mail.gmail.com
2022-12-05 12:36:41 -05:00
Amit Kapila
ebf87c019c Fix incorrect output from pgoutput when using column lists.
For Updates and Deletes, we were not honoring the columns list for old
tuple values while sending tuple data via pgoutput. This results in
pgoutput emitting more columns than expected.

This is not a problem for built-in logical replication as we simply ignore
additional columns based on the relation information sent previously which
didn't have those columns. However, some other users of pgoutput plugin
may expect the columns as per the column list. Also, sending extra columns
unnecessarily consumes network bandwidth defeating the purpose of the
column list feature.

Reported-by: Gunnar Morling
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Backpatch-through: 15
Discussion: https://postgr.es/m/CADGJaX9kiRZ-OH0EpWF5Fkyh1ZZYofoNRCrhapBfdk02tj5EKg@mail.gmail.com
2022-12-02 11:14:42 +05:30
Tom Lane
55fa993d7e Improve heuristics for compressing the KnownAssignedXids array.
Previously, we'd compress only when the active range of array entries
reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids).
If max_connections is large, the first term could result in not
compressing for a long time, resulting in much wastage of cycles in
hot-standby backends scanning the array to take snapshots.  Get rid
of that term, and just bound it to 2 * pArray->numKnownAssignedXids.

That however creates the opposite risk, that we might spend too much
effort compressing.  Hence, consider compressing only once every 128
commit records.  (This frequency was chosen by benchmarking.  While
we only tried one benchmark scenario, the results seem stable over
a fairly wide range of frequencies.)

Also, force compression when processing RecoveryInfo WAL records
(which should be infrequent); the old code could perform compression
then, but would do so only after the same array-range check as for
the transaction-commit path.

Also, opportunistically run compression if the startup process is about
to wait for WAL, though not oftener than once a second.  This should
prevent cases where we waste lots of time by leaving the array
not-compressed for long intervals due to low WAL traffic.

Lastly, add a simple check to keep us from uselessly compressing
when the array storage is already compact.

Back-patch, as the performance problem is worse in pre-v14 branches
than in HEAD.

Simon Riggs and Michail Nikolaev, with help from Tom Lane and
Andres Freund.

Discussion: https://postgr.es/m/CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
2022-11-29 15:43:17 -05:00
Etsuro Fujita
fc02019c09 Fix handling of pending inserts in nodeModifyTable.c.
Commit b663a4136, which allowed FDWs to INSERT rows in bulk, added to
nodeModifyTable.c code to flush pending inserts to the foreign-table
result relation(s) before completing processing of the ModifyTable node,
but the code failed to take into account the case where the INSERT query
has modifying CTEs, leading to incorrect results.

Also, that commit failed to flush pending inserts before firing BEFORE
ROW triggers so that rows are visible to such triggers.

In that commit we scanned through EState's
es_tuple_routing_result_relations or es_opened_result_relations list to
find the foreign-table result relations to which pending inserts are
flushed, but that would be inefficient in some cases.  So to fix, 1) add
a List member to EState to record the insert-pending result relations,
and 2) modify nodeModifyTable.c so that it adds the foreign-table result
relation to the list in ExecInsert() if appropriate, and flushes pending
inserts properly using the list where needed.

While here, fix a copy-and-pasteo in a comment in ExecBatchInsert(),
which was added by that commit.

Back-patch to v14 where that commit appeared.

Discussion: https://postgr.es/m/CAPmGK16qutyCmyJJzgQOhfBq%3DNoGDqTB6O0QBZTihrbqre%2BoxA%40mail.gmail.com
2022-11-25 17:45:01 +09:00
Tom Lane
0353db996e Add comments and a missing CHECK_FOR_INTERRUPTS in ts_headline.
I just spent an annoying amount of time reverse-engineering the
100%-undocumented API between ts_headline and the text search
parser's prsheadline function.  Add some commentary about that
while it's fresh in mind.  Also remove some unused macros in
wparser_def.c.

While at it, I noticed that when commit 78e73e875 added a
CHECK_FOR_INTERRUPTS call in TS_execute_recurse, it missed
doing so in the parallel function TS_phrase_execute, which
surely needs one just as much.

Back-patch because of the missing CHECK_FOR_INTERRUPTS.
Might as well back-patch the rest of this too.
2022-11-21 17:07:07 -05:00
Tom Lane
2c6d43650d Fix CREATE DATABASE so we can pg_upgrade DBs with OIDs above 2^31.
Commit aa0105141 repeated one of the oldest mistakes in our book:
thinking that OID is the same as int32.  It isn't of course, and
unsurprisingly the first person who came along with a database
OID above 2 billion broke it.  Repair.

Per bug #17677 from Sergey Pankov.  Back-patch to v15.

Discussion: https://postgr.es/m/17677-a99fa067d7ed71c9@postgresql.org
2022-11-04 10:39:52 -04:00
Tom Lane
725cd4d2e4 Add casts to simplehash.h to silence C++ warnings.
Casting the result of palloc etc. to the intended type is more per
project style anyway.

(The fact that cpluspluscheck doesn't notice these problems is
because it doesn't expand any macros, which seems like a troubling
shortcoming.  Don't have a good idea about improving that.)

Back-patch to v13, which is as far as the patch applies cleanly;
doesn't seem worth working harder.

David Geier

Discussion: https://postgr.es/m/aa5d88a3-71f4-3455-11cf-82de0372c941@gmail.com
2022-11-03 10:47:31 -04:00
Tom Lane
a5737e765d Allow use of __sync_lock_test_and_set for spinlocks on any machine.
If we have no special-case code in s_lock.h for the current platform,
but the compiler has __sync_lock_test_and_set, use that instead of
failing.  It's unlikely that anybody's __sync_lock_test_and_set
would be so awful as to be worse than our semaphore-based fallback,
but if it is, they can (continue to) use --disable-spinlocks.

This allows removal of the RISC-V special case installed by commit
c32fcac56, which generated exactly the same code but only on that
platform.  Usefully, the RISC-V buildfarm animals should now test
at least the int variant of this patch.

I've manually tested both variants on ARM by dint of removing the
ARM-specific stanza.  We don't want to drop that, because it already
has some special knowledge and is likely to grow more over time.
Likewise, this is not meant to preclude installing special cases
for other arches if that proves worthwhile.

Per discussion of a request to install the same code for loongarch64.
Like the previous patch, we might as well back-patch to supported
branches.

Discussion: https://postgr.es/m/761ac43d44b84d679ba803c2bd947cc0@HSMAILSVR04.hs.handsome.com.cn
2022-11-02 17:37:26 -04:00
David Rowley
23f4427612 Fix outdated comment in tuplesort.h
This was outdated by 77bae396d.

Backpatch-through: 15, where 77bae396d was added
2022-11-02 15:29:49 +13:00
Alvaro Herrera
fb2a83b2b7
Update some comments that should've covered MERGE
Oversight in 7103ebb7aa.  Backpatch to 15.

Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48gnDjZXq3-b56dVpQCNUJ5hD9kdtWN4QFwKCEapspNsA@mail.gmail.com
2022-10-24 12:52:43 +02:00
Robert Haas
5c013e620c pg_basebackup: Fix cross-platform tablespace relocation.
Specifically, when pg_basebackup is invoked with -Tx=y, don't error
out if x could plausibly be an absolute path either on Windows or on
non-Windows systems. We don't know whether the remote system is
running the same OS as the local system, so it's not appropriate to
assume that our local rule about absolute pathnames is the same as
the rule on the remote system.

Patch by me, reviewed by Tom Lane, Andrew Dunstan, and
Davinder Singh.

Discussion: http://postgr.es/m/CA+TgmoY+jC3YiskomvYKDPK3FbrmsDU7_8+wMHt02HOdJeRb0g@mail.gmail.com
2022-10-21 08:31:39 -04:00
Michael Paquier
f2f7e509e6 Rename SetSingleFuncCall() to InitMaterializedSRF()
Per discussion, the existing routine name able to initialize a SRF
function with materialize mode is unpopular, so rename it.  Equally, the
flags of this function are renamed, as of:
- SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC
- SRF_SINGLE_BLESS -> MAT_SRF_BLESS
The previous function and flags introduced in 9e98583 are kept around
for compatibility purposes, so as any extension code already compiled
with v15 continues to work as-is.  The declarations introduced here for
compatibility will be removed from HEAD in a follow-up commit.

The new names have been suggested by Andres Freund and Melanie
Plageman.

Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de
Backpatch-through: 15
2022-10-18 10:22:40 +09:00
Tom Lane
24c4c26171 Rename parser token REF to REF_P to avoid a symbol conflict.
In the latest version of Apple's macOS SDK, <sys/socket.h>
fails to compile if "REF" is #define'd as something.
Apple may or may not agree that this is a bug, and even if
they do accept the bug report I filed, they probably won't
fix it very quickly.  In the meantime, our back branches will all
fail to compile gram.y.  v15 and HEAD currently escape the problem
thanks to the refactoring done in 98e93a1fc, but that's purely
accidental.  Moreover, since that patch removed a widely-visible
inclusion of <netdb.h>, back-patching it seems too likely to break
third-party code.

Instead, change the token's code name to REF_P, following our usual
convention for naming parser tokens that are likely to have symbol
conflicts.  The effects of that should be localized to the grammar
and immediately surrounding files, so it seems like a safer answer.

Per project policy that we want to keep recently-out-of-support
branches buildable on modern systems, back-patch all the way to 9.2.

Discussion: https://postgr.es/m/1803927.1665938411@sss.pgh.pa.us
2022-10-16 15:27:04 -04:00
Andres Freund
c3b5992b91 pgstat: Prevent stats reset from corrupting slotname by removing slotname
Previously PgStat_StatReplSlotEntry contained the slotname, which was mainly
used when writing out the stats during shutdown, to identify the slot in the
serialized data (at runtime the index in ReplicationSlotCtl->replication_slots
is used, but that can change during a restart). Unfortunately the slotname was
overwritten when the slot's stats were reset.

That turned out to only cause "real" problems if the slot was active during
the reset, triggering an assertion failure at the next
pgstat_report_replslot(). In other paths the stats were re-initialized during
pgstat_acquire_replslot().

Fix this by removing slotname from PgStat_StatReplSlotEntry. Instead we can
get the slot's name from the slot itself. Besides fixing a bug, this also is
architecturally cleaner (a name is not really statistics). This is safe
because stats, for a slot removed while shut down, will not be restored at
startup.

In 15 the slotname is not removed, but renamed, to avoid changing the stats
format. In master, bump PGSTAT_FILE_FORMAT_ID.

This commit does not contain a test for the fix. I think this can only be
tested by a tap test starting pg_recvlogical in the background and checking
pg_recvlogical's output. That type of test is notoriously hard to be reliable,
so committing it shortly before the release is wrapped seems like a bad idea.

Reported-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/YxfagaTXUNa9ggLb@ahch-to
Backpatch: 15-, where the bug was introduced in 5891c7a8ed
2022-10-08 09:39:34 -07:00
Tom Lane
443df6e2db Revert "Optimize order of GROUP BY keys".
This reverts commit db0d67db24 and
several follow-on fixes.  The idea of making a cost-based choice
of the order of the sorting columns is not fundamentally unsound,
but it requires cost information and data statistics that we don't
really have.  For example, relying on procost to distinguish the
relative costs of different sort comparators is pretty pointless
so long as most such comparator functions are labeled with cost 1.0.
Moreover, estimating the number of comparisons done by Quicksort
requires more than just an estimate of the number of distinct values
in the input: you also need some idea of the sizes of the larger
groups, if you want an estimate that's good to better than a factor of
three or so.  That's data that's often unknown or not very reliable.
Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.
Even if all the inputs are perfectly reliable, the cost algorithm
as-implemented cannot offer useful information about how to order
sorting columns beyond the point at which the average group size
is estimated to drop to 1.

Close inspection of the code added by db0d67db2 shows that there
are also multiple small bugs.  These could have been fixed, but
there's not much point if we don't trust the estimates to be
accurate in-principle.

Finally, the changes in cost_sort's behavior made for very large
changes (often a factor of 2 or so) in the cost estimates for all
sorting operations, not only those for multi-column GROUP BY.
That naturally changes plan choices in many situations, and there's
precious little evidence to show that the changes are for the better.
Given the above doubts about whether the new estimates are really
trustworthy, it's hard to summon much confidence that these changes
are better on the average.

Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.

Note: in v15, I left T_PathKeyInfo in place in nodes.h even though
it's unreferenced.  Removing it would be an ABI break, and it seems
a bit late in the release cycle for that.

Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
2022-10-03 10:56:16 -04:00
Peter Geoghegan
1ef00afe2e Make check_usermap() parameter names consistent.
The function has a bool argument named "case_insensitive", but that was
spelled "case_sensitive" in the declaration.  Make them consistent now
to avoid confusion in the future.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Backpatch: 10-
2022-09-17 16:54:16 -07:00
Peter Eisentraut
7fe55d5e12 Expand palloc/pg_malloc API for more type safety
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.

Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as its first argument and cast the return as a pointer to
that type.  There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.

Inspired by the talloc library.

This is backpatched from master so that future backpatchable code can
make use of these APIs.  This patch by itself does not contain any
users of these APIs.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
2022-09-14 06:04:24 +02:00
Michael Paquier
53332eacaf Simplify handling of compression level with compression specifications
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given.  This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification.  This results in complaining at an earlier
stage than previously if a build supports a compression method or not,
aka when parsing a specification in the backend or the frontend, and not
when processing it.  zstd, lz4 and zlib are able to handle in their
respective routines setting up the compression level the case of a
default value, hence the backend or frontend code (pg_receivewal or
pg_basebackup) has now no need to know what the default compression
level should be if nothing is specified: the logic is now done so as the
specification parsing assigns it.  It can also be enforced by passing
down a "level" set to the default value, that the backend will accept
(the replication protocol is for example able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).

This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The origin of the confusion comes from the handling of the default
compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of
0 was getting assigned, which is what walmethods.c would consider
as equivalent to no compression when streaming WAL segments with its tar
methods.  Assigning always the compression level removes the confusion
of some code paths considering a value of 0 set in a specification as
either no compression or a default compression level.

Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip.  This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.

Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
2022-09-14 12:17:03 +09:00
Thomas Munro
dd38ff28ad Fix recovery_prefetch with low maintenance_io_concurrency.
We should process completed IOs *before* trying to start more, so that
it is always possible to decode one more record when the decoded record
queue is empty, even if maintenance_io_concurrency is set so low that a
single earlier WAL record might have saturated the IO queue.

That bug was hidden because the effect of maintenance_io_concurrency was
arbitrarily clamped to be at least 2.  Fix the ordering, and also remove
that clamp.  We need a special case for 0, which is now treated the same
as recovery_prefetch=off, but otherwise the number is used directly.
This allows for testing with 1, which would have made the problem
obvious in simple test scenarios.

Also add an explicit error message for missing contrecords.  It was a
bit strange that we didn't report an error already, and became a latent
bug with prefetching, since the internal state that tracks aborted
contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment.  Reporting an error
prevents that.

Back-patch to 15.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
2022-09-08 20:36:44 +12:00
Tom Lane
e25631265a Fix new pg_publication_tables query.
The addition of published column names forgot to filter on attisdropped,
leading to cases where you could see "........pg.dropped.1........"
or the like as a reportedly-published column.

While we're here, rewrite the new subquery to get a more efficient plan
for it.

Hou Zhijie, per report from Jaime Casanova.  Back-patch to v15 where
the bug was introduced.  (Sadly, this means we need a post-beta4
catversion bump before beta4 has even hit the streets.  I see no
good alternative though.)

Discussion: https://postgr.es/m/Yxa1SU4nH2HfN3/i@ahch-to
2022-09-06 18:00:32 -04:00
Thomas Munro
d0d9344900 Fix cache invalidation bug in recovery_prefetch.
XLogPageRead() can retry internally after a pread() system call has
succeeded, in the case of short reads, and page validation failures
while in standby mode (see commit 0668719801).  Due to an oversight in
commit 3f1ce973, these cases could leave stale data in the internal
cache of xlogreader.c without marking it invalid.  The main defense
against stale cached data on failure to read a page was in the error
handling path of the calling function ReadPageInternal(), but that
wasn't quite enough for errors handled internally by XLogPageRead()'s
retry loop if we then exited with XLREAD_WOULDBLOCK.

1.  ReadPageInternal() now marks the cache invalid before calling the
    page_read callback, by setting state->readLen to 0.  It'll be set to
    a non-zero value only after a successful read.  It'll stay valid as
    long as the caller requests data in the cached range.

2.  XLogPageRead() no long performs internal retries while reading
    ahead.  While such retries should work, the general philosophy is
    that we should give up prefetching if anything unusual happens so we
    can handle it when recovery catches up, to reduce the complexity of
    the system.  Let's do that here too.

3.  While here, a new function XLogReaderResetError() improves the
    separation between xlogrecovery.c and xlogreader.c, where the former
    previously clobbered the latter's internal error buffer directly.
    The new function makes this more explicit, and also clears a related
    flag, without which a standby would needlessly retry in the outer
    function.

Thanks to Noah Misch for tracking down the conditions required for a
rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and
providing a reproducer.

Back-patch to 15.

Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com
2022-09-03 13:29:36 +12:00
Andrew Dunstan
96ef3237bf Revert SQL/JSON features
The reverts the following and makes some associated cleanups:

    commit f79b803dc: Common SQL/JSON clauses
    commit f4fb45d15: SQL/JSON constructors
    commit 5f0adec25: Make STRING an unreserved_keyword.
    commit 33a377608: IS JSON predicate
    commit 1a36bc9db: SQL/JSON query functions
    commit 606948b05: SQL JSON functions
    commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
    commit 4e34747c8: JSON_TABLE
    commit fadb48b00: PLAN clauses for JSON_TABLE
    commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
    commit 14d3f24fa: Further improve jsonb_sqljson parallel test
    commit a6baa4bad: Documentation for SQL/JSON features
    commit b46bcf7a4: Improve readability of SQL/JSON documentation.
    commit 112fdb352: Fix finalization for json_objectagg and friends
    commit fcdb35c32: Fix transformJsonBehavior
    commit 4cd8717af: Improve a couple of sql/json error messages
    commit f7a605f63: Small cleanups in SQL/JSON code
    commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
    commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
    commit a1e7616d6: Rework SQL/JSON documentation
    commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
    commit 3c633f32b: Only allow returning string types or bytea from json_serialize
    commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.

The release notes are also adjusted.

Backpatch to release 15.

Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
2022-09-01 17:10:42 -04:00
Andres Freund
045ec34785 pgstat: Acquire lock when reading variable-numbered stats
Somewhere during the development of the patch acquiring a lock during read
access to variable-numbered stats got lost. The missing lock acquisition won't
cause corruption, but can lead to reading torn values when accessing
stats. Add the missing lock acquisitions.

Reported-by: Greg Stark <stark@mit.edu>
Reviewed-by: "Drouvot, Bertrand" <bdrouvot@amazon.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CAM-w4HMYkM_DkYhWtUGV+qE_rrBxKOzOF0+5faozxO3vXrc9wA@mail.gmail.com
Backpatch: 15-
2022-08-22 20:17:45 -07:00
Michael Paquier
522ae011fa Allow event trigger table_rewrite for ALTER MATERIALIZED VIEW
This event can happen when using SET ACCESS METHOD, as the data files of
the materialized need a full refresh but this command tag was not
updated to reflect that.  The documentation is updated to track this
behavior.

Author: Onder Kalaci
Discussion: https://postgr.es/m/CACawEhXwHN3X34FiwoYG8vXR-oyUdrp7qcfRWSzS+NPahS5gSw@mail.gmail.com
Backpatch-through: 15
2022-08-17 14:55:24 +09:00
Amit Kapila
272248a0c1 Fix catalog lookup with the wrong snapshot during logical decoding.
Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION
records to know if the transaction has modified the catalog, and that
information is not serialized to snapshot. Therefore, after the restart,
if the logical decoding decodes only the commit record of the transaction
that has actually modified a catalog, we will miss adding its XID to the
snapshot. Thus, we will end up looking at catalogs with the wrong
snapshot.

To fix this problem, this changes the snapshot builder so that it
remembers the last-running-xacts list of the decoded RUNNING_XACTS record
after restoring the previously serialized snapshot. Then, we mark the
transaction as containing catalog changes if it's in the list of initial
running transactions and its commit record has XACT_XINFO_HAS_INVALS. To
avoid ABI breakage, we store the array of the initial running transactions
in the static variables InitialRunningXacts and NInitialRunningXacts,
instead of storing those in SnapBuild or ReorderBuffer.

This approach has a false positive; we could end up adding the transaction
that didn't change catalog to the snapshot since we cannot distinguish
whether the transaction has catalog changes only by checking the COMMIT
record. It doesn't have the information on which (sub) transaction has
catalog changes, and XACT_XINFO_HAS_INVALS doesn't necessarily indicate
that the transaction has catalog change. But that won't be a problem since
we use snapshot built during decoding only to read system catalogs.

On the master branch, we took a more future-proof approach by writing
catalog modifying transactions to the serialized snapshot which avoids the
above false positive. But we cannot backpatch it because of a change in
the SnapBuild.

Reported-by: Mike Oh
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi
Backpatch-through: 10
Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
2022-08-11 09:56:49 +05:30
Robert Haas
f88798c098 Move basebackup code to new directory src/backend/backup
Reviewed by David Steele and Justin Pryzby

Discussion: http://postgr.es/m/CA+TgmoafqboATDSoXHz8VLrSwK_MDhjthK4hEpYjqf9_1Fmczw%40mail.gmail.com
2022-08-10 14:03:09 -04:00
Tom Lane
cc7e0feba5 In extensions, don't replace objects not belonging to the extension.
Previously, if an extension script did CREATE OR REPLACE and there was
an existing object not belonging to the extension, it would overwrite
the object and adopt it into the extension.  This is problematic, first
because the overwrite is probably unintentional, and second because we
didn't change the object's ownership.  Thus a hostile user could create
an object in advance of an expected CREATE EXTENSION command, and would
then have ownership rights on an extension object, which could be
modified for trojan-horse-type attacks.

Hence, forbid CREATE OR REPLACE of an existing object unless it already
belongs to the extension.  (Note that we've always forbidden replacing
an object that belongs to some other extension; only the behavior for
previously-free-standing objects changes here.)

For the same reason, also fail CREATE IF NOT EXISTS when there is
an existing object that doesn't belong to the extension.

Our thanks to Sven Klemm for reporting this problem.

Security: CVE-2022-2625
2022-08-08 11:12:31 -04:00
Alvaro Herrera
e78fd90845
Fix ENABLE/DISABLE TRIGGER to handle recursion correctly
Using ATSimpleRecursion() in ATPrepCmd() to do so as bbb927b4db did is
not correct, because ATPrepCmd() can't distinguish between triggers that
may be cloned and those that may not, so would wrongly try to recurse
for the latter category of triggers.

So this commit restores the code in EnableDisableTrigger() that
86f575948c had added to do the recursion, which would do it only for
triggers that may be cloned, that is, row-level triggers.  This also
changes tablecmds.c such that ATExecCmd() is able to pass the value of
ONLY flag down to EnableDisableTrigger() using its new 'recurse'
parameter.

This also fixes what seems like an oversight of 86f575948c that the
recursion to partition triggers would only occur if EnableDisableTrigger()
had actually changed the trigger.  It is more apt to recurse to inspect
partition triggers even if the parent's trigger didn't need to be
changed: only then can we be certain that all descendants share the same
state afterwards.

Backpatch all the way back to 11, like bbb927b4db.  Care is taken not
to break ABI compatibility (and that no catversion bump is needed.)

Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://postgr.es/m/CA+HiwqG-cZT3XzGAnEgZQLoQbyfJApVwOTQaCaas1mhpf+4V5A@mail.gmail.com
2022-08-05 09:47:15 +02:00
Tom Lane
14168d57bb Fix incorrect tests for SRFs in relation_can_be_sorted_early().
Commit fac1b470a thought we could check for set-returning functions
by testing only the top-level node in an expression tree.  This is
wrong in itself, and to make matters worse it encouraged others
to make the same mistake, by exporting tlist.c's special-purpose
IS_SRF_CALL() as a widely-visible macro.  I can't find any evidence
that anyone's taken the bait, but it was only a matter of time.

Use expression_returns_set() instead, and stuff the IS_SRF_CALL()
genie back in its bottle, this time with a warning label.  I also
added a couple of cross-reference comments.

After a fair amount of fooling around, I've despaired of making
a robust test case that exposes the bug reliably, so no test case
here.  (Note that the test case added by fac1b470a is itself
broken, in that it doesn't notice if you remove the code change.
The repro given by the bug submitter currently doesn't fail either
in v15 or HEAD, though I suspect that may indicate an unrelated bug.)

Per bug #17564 from Martijn van Oosterhout.  Back-patch to v13,
as the faulty patch was.

Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
2022-08-03 17:33:42 -04:00
Tom Lane
a0c632c1de Force immediate commit after CREATE DATABASE etc in extended protocol.
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol.  In extended protocol, we didn't commit
until receiving a Sync message.  Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK.  In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.

This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could reach it
before.

To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit.  This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.

While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining.  That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.

Per bug #17434 from Yugo Nagata.  It's been wrong for ages,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
2022-07-26 13:07:03 -04:00
Tom Lane
31ed3cf746 Process session_preload_libraries within InitPostgres's transaction.
Previously we did this after InitPostgres, at a somewhat randomly chosen
place within PostgresMain.  However, since commit a0ffa885e doing this
outside a transaction can cause a crash, if we need to check permissions
while replacing a placeholder GUC.  (Besides which, a preloaded library
could itself want to do database access within _PG_init.)

To avoid needing an additional transaction start/end in every session,
move the process_session_preload_libraries call to within InitPostgres's
transaction.  That requires teaching the code not to call it when
InitPostgres is called from somewhere other than PostgresMain, since
we don't want session_preload_libraries to affect background workers.
The most future-proof solution here seems to be to add an additional
flag parameter to InitPostgres; fortunately, we're not yet very worried
about API stability for v15.

Doing this also exposed the fact that we're currently honoring
session_preload_libraries in walsenders, even those not connected to
any database.  This seems, at minimum, a POLA violation: walsenders
are not interactive sessions.  Let's stop doing that.

(All these comments also apply to local_preload_libraries, of course.)

Per report from Gurjeet Singh (thanks also to Nathan Bossart and Kyotaro
Horiguchi for review).  Backpatch to v15 where a0ffa885e came in.

Discussion: https://postgr.es/m/CABwTF4VEpwTHhRQ+q5MiC5ucngN-whN-PdcKeufX7eLSoAfbZA@mail.gmail.com
2022-07-25 10:27:43 -04:00
Tom Lane
a2944d8724 Fix missed corner cases for grantable permissions on GUCs.
We allow users to set the values of not-yet-loaded extension GUCs,
remembering those values in "placeholder" GUC entries.  When/if
the extension is loaded later in the session, we need to verify that
the user had permissions to set the GUC.  That was done correctly
before commit a0ffa885e, but as of that commit, we'd check the
permissions of the active role when the LOAD happens, not the role
that had set the value.  (This'd be a security bug if it had made it
into a released version.)

In principle this is simple enough to fix: we just need to remember
the exact role OID that set each GUC value, and use that not
GetUserID() when verifying permissions.  Maintaining that data in
the guc.c data structures is slightly tedious, but fortunately it's
all basically just copy-n-paste of the logic for tracking the
GucSource of each setting, as we were already doing.

Another oversight is that validate_option_array_item() hadn't
been taught to check for granted GUC privileges.  This appears
to manifest only in that ALTER ROLE/DATABASE RESET ALL will
fail to reset settings that the user should be allowed to reset.

Patch by myself and Nathan Bossart, per report from Nathan Bossart.
Back-patch to v15 where the faulty code came in.

Discussion: https://postgr.es/m/20220706224727.GA2158260@nathanxps13
2022-07-19 17:22:31 -04:00
Andres Freund
92f478657c windows: msvc: Define STDIN/OUT/ERR_FILENO.
Because they are not available we've used _fileno(stdin) in some places, but
that doesn't reliably work, because stdin might be closed. This is the
prerequisite of the subsequent commit, fixing a failure introduced in
76e38b37a5.

Author: Andres Freund <andres@anarazel.de>
Reported-By: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>
Message-Id: 20220520164558.ozb7lm6unakqzezi@alap3.anarazel.de (on pgsql-packagers)
Backpatch: 15-, where 76e38b37a5 came in
2022-07-18 17:22:35 -07:00
David Rowley
8c297dd5f4 Fix inconsistent parameter names between prototype and declaration
Noticed while working in this area.  This code was introduced in PG15,
which is still in beta, so backpatch to there for consistency.

Backpatch-through: 15
2022-07-15 15:27:14 +12:00