Supply a new memory manager for RuntimeDyld, to avoid crashes in
generated code caused by memory placement that can overflow a 32 bit
data type. This is a drop-in replacement for the
llvm::SectionMemoryManager class in the LLVM library, with Michael
Smith's proposed fix from
https://www.github.com/llvm/llvm-project/pull/71968.
We hereby slurp it into our own source tree, after moving into a new
namespace llvm::backport and making some minor adjustments so that it
can be compiled with older LLVM versions as far back as 12. It's harder
to make it work on even older LLVM versions, but it doesn't seem likely
that people are really using them so that is not investigated for now.
The problem could also be addressed by switching to JITLink instead of
RuntimeDyld, and that is the LLVM project's recommended solution as
the latter is about to be deprecated. We'll have to do that soon enough
anyway, and then when the LLVM version support window advances far
enough in a few years we'll be able to delete this code. Unfortunately
that wouldn't be enough for PostgreSQL today: in most relevant versions
of LLVM, JITLink is missing or incomplete.
Several other projects have already back-ported this fix into their fork
of LLVM, which is a vote of confidence despite the lack of commit into
LLVM as of today. We don't have our own copy of LLVM so we can't do
exactly what they've done; instead we have a copy of the whole patched
class so we can pass an instance of it to RuntimeDyld.
The LLVM project hasn't chosen to commit the fix yet, and even if it
did, it wouldn't be back-ported into the releases of LLVM that most of
our users care about, so there is not much point in waiting any longer
for that. If they make further changes and commit it to LLVM 19 or 20,
we'll still need this for older versions, but we may want to
resynchronize our copy and update some comments.
The changes that we've had to make to our copy can be seen by diffing
our SectionMemoryManager.{h,cpp} files against the ones in the tree of
the pull request. Per the LLVM project's license requirements, a copy
is in SectionMemoryManager.LICENSE.
This should fix the spate of crash reports we've been receiving lately
from users on large memory ARM systems.
Back-patch to all supported releases.
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se> (license aspects)
Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
This reverts commit 95c5acb3fc (v17) and
counterparts in each other non-master branch. If released, that commit
would have caused a worst-in-years minor release regression, via
undetected LWLock self-deadlock. This commit and its self-deadlock fix
warrant more bake time in the master branch.
Reported by Alexander Lakhin.
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Commit a07e03fd8f changed inplace updates
to wait for heap_update() commands like GRANT TABLE and GRANT DATABASE.
By keeping the pin during that wait, a sequence of autovacuum workers
and an uncommitted GRANT starved one foreground LockBufferForCleanup()
for six minutes, on buildfarm member sarus. Prevent, at the cost of a
bit of complexity. Back-patch to v12, like the earlier commit. That
commit and heap_inplace_lock() have not yet appeared in any release.
Discussion: https://postgr.es/m/20241026184936.ae.nmisch@google.com
There are two functions that can be used in event triggers to get more
details about a rewrite happening on a relation. Both had a limited
documentation:
- pg_event_trigger_table_rewrite_reason() and
pg_event_trigger_table_rewrite_oid() were not mentioned in the main
event trigger section in the paragraph dedicated to the event
table_rewrite.
- pg_event_trigger_table_rewrite_reason() returns an integer which is a
bitmap of the reasons why a rewrite happens. There was no explanation
about the meaning of these values, forcing the reader to look at the
code to find out that these are defined in event_trigger.h.
While on it, let's add a comment in event_trigger.h where the
AT_REWRITE_* are defined, telling to update the documentation when
these values are changed.
Backpatch down to 13 as a consequence of 1ad23335f3, where this area
of the documentation has been heavily reworked.
Author: Greg Sabino Mullane
Discussion: https://postgr.es/m/CAKAnmmL+Z6j-C8dAx1tVrnBmZJu+BSoc68WSg3sR+CVNjBCqbw@mail.gmail.com
Backpatch-through: 13
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f. That is a
source of index corruption. Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll(). All branches change the ABI of
extern function PrepareToInvalidateCacheTuple(). No PGXN extension
calls that, and there's no apparent use case in extensions.
Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
An inplace update's invalidation messages are part of its transaction's
commit record. However, the update survives even if its transaction
aborts or we stop recovery before replaying its transaction commit.
After recovery, a backend that started in recovery could update the row
without incorporating the inplace update. That could result in a table
with an index, yet relhasindex=f. That is a source of index corruption.
This bulk invalidation avoids the functional consequences. A future
change can fix the !RecoveryInProgress() scenario without changing the
WAL format. Back-patch to v17 - v12 (all supported versions). v18 will
instead add invalidations to WAL.
Discussion: https://postgr.es/m/20240618152349.7f.nmisch@google.com
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect. While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance, when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.
Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit. Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing(). This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now process raw_expr directly in eval_const_expressions_mutator().
Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.
Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
The previous commit fixed some ways of losing an inplace update. It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update(). This includes MERGE commands. To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update. The
v14+ UPDATE fix needs commit 86dc90056d,
and it wasn't worth reimplementing that fix without such infrastructure.
Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.
Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
As previously-added tests demonstrated, heap_inplace_update() could
instead update an unrelated tuple of the same catalog. It could lose
the update. Losing relhasindex=t was a source of index corruption.
Inplace-updating commands like VACUUM will now wait for heap_update()
commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a
long-running GRANT already hurts VACUUM progress more just by keeping an
XID running. The VACUUM will behave like a DELETE or UPDATE waiting for
the uncommitted change.
For implementation details, start at the systable_inplace_update_begin()
header comment and README.tuplock. Back-patch to v12 (all supported
versions). In back branches, retain a deprecated heap_inplace_update(),
for extensions.
Reported by Smolkin Grigory. Reviewed by Nitin Motiani, (in earlier
versions) Heikki Linnakangas, and (in earlier versions) Alexander
Lakhin.
Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com
We advance origin progress during abort on successful streaming and
application of ROLLBACK in parallel streaming mode. But the origin
shouldn't be advanced during an error or unsuccessful apply due to
shutdown. Otherwise, it will result in a transaction loss as such a
transaction won't be sent again by the server.
Reported-by: Hou Zhijie
Author: Hayato Kuroda and Shveta Malik
Reviewed-by: Amit Kapila
Backpatch-through: 16
Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
The code intends to allow GUCs to be set within parallel workers
via function SET clauses, but not otherwise. However, doing so fails
for "session_authorization" and "role", because the assign hooks for
those attempt to set the subsidiary "is_superuser" GUC, and that call
falls foul of the "not otherwise" prohibition. We can't switch to
using GUC_ACTION_SAVE for this, so instead add a new GUC variable
flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set
anyway. (This is okay because is_superuser has context PGC_INTERNAL
and thus only hard-wired calls can change it. We'd need more thought
before applying the flag to other GUCs; but maybe there are other
use-cases.) This isn't the prettiest fix perhaps, but other
alternatives we thought of would be much more invasive.
While here, correct a thinko in commit 059de3ca4: when rejecting
a GUC setting within a parallel worker, we should return 0 not -1
if the ereport doesn't longjmp. (This seems to have no consequences
right now because no caller cares, but it's inconsistent.) Improve
the comments to try to forestall future confusion of the same kind.
Despite the lack of field complaints, this seems worth back-patching.
Thanks to Nathan Bossart for the idea to invent a new flag,
and for review.
Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us
When pg_dump retrieves the list of database objects and performs the
data dump, there was possibility that objects are replaced with others
of the same name, such as views, and access them. This vulnerability
could result in code execution with superuser privileges during the
pg_dump process.
This issue can arise when dumping data of sequences, foreign
tables (only 13 or later), or tables registered with a WHERE clause in
the extension configuration table.
To address this, pg_dump now utilizes the newly introduced
restrict_nonsystem_relation_kind GUC parameter to restrict the
accesses to non-system views and foreign tables during the dump
process. This new GUC parameter is added to back branches too, but
these changes do not require cluster recreation.
Back-patch to all supported branches.
Reviewed-by: Noah Misch
Security: CVE-2024-7348
Backpatch-through: 12
Commit 274bbced85 accidentally placed the pg_config.h.in
for SSL_CTX_set_num_tickets on the wrong line wrt where autoheader
places it. Fix by re-arranging and backpatch to the same level as
the original commit.
Reported-by: Marina Polyakova <m.polyakova@postgrespro.ru>
Discussion: https://postgr.es/m/48cebe8c3eaf308bae253b1dbf4e4a75@postgrespro.ru
Backpatch-through: v12
OpenSSL supports two types of session tickets for TLSv1.3, stateless
and stateful. The option we've used only turns off stateless tickets
leaving stateful tickets active. Use the new API introduced in 1.1.1
to disable all types of tickets.
Backpatch to all supported versions.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240617173803.6alnafnxpiqvlh3g@awork3.anarazel.de
Backpatch-through: v12
If vacuum fails to remove a tuple with xmax older than
VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed,
it will loop infinitely in lazy_scan_prune(), which compares tuples'
visibility information to OldestXmin.
Starting in version 14, which uses GlobalVisState for visibility testing
during pruning, it is possible for GlobalVisState->maybe_needed to
precede OldestXmin if maybe_needed is forced to go backward while vacuum
is running. This can happen if a disconnected standby with a running
transaction older than VacuumCutoffs->OldestXmin reconnects to the
primary after vacuum initially calculates GlobalVisState and OldestXmin.
Fix this by having vacuum always remove tuples older than OldestXmin
during pruning. This is okay because the standby won't replay the tuple
removal until the tuple is removable. Thus, the worst that can happen is
a recovery conflict.
Fixes BUG# 17257
Back-patched in versions 14-17
Author: Melanie Plageman
Reviewed-by: Noah Misch, Peter Geoghegan, Robert Haas, Andres Freund, and Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
When creating and initializing a logical slot, the restart_lsn is set
to the latest WAL insertion point (or the latest replay point on
standbys). Subsequently, WAL records are decoded from that point to
find the start point for extracting changes in the
DecodingContextFindStartpoint() function. Since the initial
restart_lsn could be in the middle of a transaction, the start point
must be a consistent point where we won't see the data for partial
transactions.
Previously, when not building a full snapshot, serialized snapshots
were restored, and the SnapBuild jumps to the consistent state even
while finding the start point. Consequently, the slot's restart_lsn
and confirmed_flush could be set to the middle of a transaction. This
could lead to various unexpected consequences. Specifically, there
were reports of logical decoding decoding partial transactions, and
assertion failures occurred because only subtransactions were decoded
without decoding their top-level transaction until decoding the commit
record.
To resolve this issue, the changes prevent restoring the serialized
snapshot and jumping to the consistent state while finding the start
point.
On v17 and HEAD, a flag indicating whether snapshot restores should be
skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION
has been bumpded.
On backbranches, the flag is stored in the LogicalDecodingContext
instead, preserving on-disk compatibility.
Backpatch to all supported versions.
Reported-by: Drew Callahan
Reviewed-by: Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com
Backpatch-through: 12
macOS 15's SDK pulls in headers related to <regex.h> when we include
<xlocale.h>. This causes our own regex_t implementation to clash with
the OS's regex_t implementation. Luckily our function names already had
pg_ prefixes, but the macros and typenames did not.
Include <regex.h> explicitly on all POSIX systems, and fix everything
that breaks. Then we can prove that we are capable of fully hiding and
replacing the system regex API with our own.
1. Deal with standard-clobbering macros by undefining them all first.
POSIX says they are "symbolic constants". If they are macros, this
allows us to redefine them. If they are enums or variables, our macros
will hide them.
2. Deal with standard-clobbering types by giving our types pg_
prefixes, and then using macros to redirect xxx_t -> pg_xxx_t.
After including our "regex/regex.h", the system <regex.h> is hidden,
because we've replaced all the standard names. The PostgreSQL source
tree and extensions can continue to use standard prefix-less type and
macro names, but reach our implementation, if they included our
"regex/regex.h" header.
Back-patch to all supported branches, so that macOS 15's tool chain can
build them.
Reported-by: Stan Hu <stanhu@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAMBWrQnEwEJtgOv7EUNsXmFw2Ub4p5P%2B5QTBEgYwiyjy7rAsEQ%40mail.gmail.com
This extends ad98fb1422 to invals of
inplace updates. Trouble requires an inplace update of a catalog having
a TOAST table, so only pg_database was at risk. (The other catalog on
which core code performs inplace updates, pg_class, has no TOAST table.)
Trouble would require something like the inplace-inval.spec test.
Consider GRANT ... ON DATABASE fetching a stale row from cache and
discarding a datfrozenxid update that vac_truncate_clog() has already
relied upon. Back-patch to v12 (all supported versions).
Reviewed (in an earlier version) by Robert Haas.
Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
Commit 5b562644fe added a comment that
SetRelationHasSubclass() callers must hold this lock. When commit
17f206fbc8 extended use of this column to
partitioned indexes, it didn't take the lock. As the latter commit
message mentioned, we currently never reset a partitioned index to
relhassubclass=f. That largely avoids harm from the lock omission. The
cause for fixing this now is to unblock introducing a rule about locks
required to heap_update() a pg_class row. This might cause more
deadlocks. It gives minor user-visible benefits:
- If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE
ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks
instead of failing with "tuple concurrently updated". (Many cases of
DDL concurrency still fail that way.)
- Match ALTER INDEX ATTACH PARTITION in choosing to lock the index.
While not user-visible today, we'll need this if we ever make something
set the flag to false for a partitioned index, like ANALYZE does today
for tables. Back-patch to v12 (all supported versions), the plan for
the commit relying on the new rule. In back branches, add
LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter.
Reviewed (in an earlier version) by Robert Haas.
Discussion: https://postgr.es/m/20240611024525.9f.nmisch@google.com
We did not recover the subtransaction IDs of prepared transactions
when starting a hot standby from a shutdown checkpoint. As a result,
such subtransactions were considered as aborted, rather than
in-progress. That would lead to hint bits being set incorrectly, and
the subtransactions suddenly becoming visible to old snapshots when
the prepared transaction was committed.
To fix, update pg_subtrans with prepared transactions's subxids when
starting hot standby from a shutdown checkpoint. The snapshots taken
from that state need to be marked as "suboverflowed", so that we also
check the pg_subtrans.
Backport to all supported versions.
Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi
Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may
insert some REDIRECT tuples. This will typically happen in
a transaction that lacks an XID, which leads either to assertion
failure in spgFormDeadTuple or to insertion of a REDIRECT tuple
with zero xid. The latter's not good either, since eventually
VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid,
resulting in either an assertion failure or a garbage answer.
In practice, since REINDEX CONCURRENTLY locks out index scans
till it's done, it doesn't matter whether it inserts REDIRECTs
or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM
reduces such a REDIRECT to a PLACEHOLDER. So in non-assert builds
there's no observable problem here, other than perhaps a little
index bloat. But it's not behaving as intended.
To fix, remove the failing Assert in spgFormDeadTuple, acknowledging
that we might sometimes insert a zero XID; and guard VACUUM's
GlobalVisTestIsRemovableXid() call with a test for valid XID,
ensuring that we'll reduce such a REDIRECT the first time VACUUM
sees it. (Versions before v14 use TransactionIdPrecedes here,
which won't fail on zero xid, so they really have no bug at all
in non-assert builds.)
Another solution could be to not create REDIRECTs at all during
REINDEX CONCURRENTLY, making the relevant code paths treat that
case like index build (which likewise knows that no concurrent
index scans can be happening). That would allow restoring the
Assert in spgFormDeadTuple, but we'd still need the VACUUM change
because redirection tuples with zero xid may be out there already.
But there doesn't seem to be a nice way for spginsert() to tell that
it's being called in REINDEX CONCURRENTLY without some API changes,
so we'll leave that as a possible future improvement.
In HEAD, also rename the SpGistState.myXid field to redirectXid,
which seems less misleading (since it might not in fact be our
transaction's XID) and is certainly less uninformatively generic.
Per bug #18499 from Alexander Lakhin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org
We are capable of optimizing MIN() and MAX() aggregates on indexed
columns into subqueries that exploit the index, rather than the normal
thing of scanning the whole table. When we do this, we replace the
Aggref node(s) with Params referencing subquery outputs. Such Params
really ought to be included in the per-plan-node extParam/allParam
sets computed by SS_finalize_plan. However, we've never done so
up to now because of an ancient implementation choice to perform
that substitution during set_plan_references, which runs after
SS_finalize_plan, so that SS_finalize_plan never sees these Params.
The cleanest fix would be to perform a separate tree walk to do
these substitutions before SS_finalize_plan runs. That seems
unattractive, first because a whole-tree mutation pass is expensive,
and second because we lack infrastructure for visiting expression
subtrees in a Plan tree, so that we'd need a new function knowing
as much as SS_finalize_plan knows about that. I also considered
swapping the order of SS_finalize_plan and set_plan_references,
but that fell foul of various assumptions that seem tricky to fix.
So the approach adopted here is to teach SS_finalize_plan itself
to check for such Aggrefs. I refactored things a bit in setrefs.c
to avoid having three copies of the code that does that.
Back-patch of v17 commits d0d44049d and 779ac2c74. When d0d44049d
went in, there was no evidence that it was fixing a reachable bug,
so I refrained from back-patching. Now we have such evidence.
Per bug #18465 from Hal Takahara. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18465-2fae927718976b22@postgresql.org
Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us
As an optimization, we store "name" columns as cstrings in btree
indexes.
Here we modify it so that Index Only Scans convert these cstrings back
to names with NAMEDATALEN bytes rather than storing the cstring in the
tuple slot, as was happening previously.
Bug: #17855
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Tom Lane
Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org
Backpatch-through: 12, all supported versions
While SH_STAT() is only used for debugging, the allocated array can be large,
and therefore should be freed.
It's unclear why coverity started warning now.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
If temp tables have dependencies (such as sequences) then it's
possible for autovacuum's cleanup of orphan temp tables to deadlock
against an incoming backend that's trying to clean out the temp
namespace for its own use. That can happen because RemoveTempRelations'
performDeletion call can visit objects within the namespace in
an order different from the order in which a per-table deletion
will visit them.
To fix, observe that performDeletion will begin by taking an exclusive
lock on the temp namespace (even though it won't actually delete it).
So, if we can get a shared lock on the namespace, we can be sure we're
not running concurrently with RemoveTempRelations, while also not
conflicting with ordinary use of the namespace. This requires
introducing a conditional version of LockDatabaseObject, but that's no
big deal. (It's surprising we've got along without that this long.)
Report and patch by Mikhail Zhilin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/c43ce028-2bc2-4865-9b89-3f706246eed5@postgrespro.ru
Up to now, all of the "catcache list" objects within a catalog cache
were just chained together on a single dlist, requiring O(N) time to
search. Remarkably, we've not had serious performance problems with
that so far; but we got a complaint of a bad performance regression
from v15 in a case with a large number of roles in the system, which
traced down to O(N^2) total time when we probed N catcache lists.
Replace that data structure with a hashtable having an enlargeable
number of dlists, in an exactly parallel way to the data structure
we've used for years for the plain CatCTup cache members. The extra
cost of maintaining a hash table seems negligible, since we were
already computing a hash value for list searches.
Normally this'd be HEAD-only material, but in view of the performance
regression it seems advisable to back-patch into v16. In the v16
version of the patch, leave the dead cc_lists field where it is and
add the new fields at the end of struct catcache, to avoid possible
ABI breakage in case any external code is looking at these structs.
(We assume no external code is actually allocating new catcache
structs.)
Per report from alex work.
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz=vggErdSG7pv2s6vmmTOLJSg@mail.gmail.com
There is a very ancient hack in check_sql_fn_retval that allows a
single SELECT targetlist entry of composite type to be taken as
supplying all the output columns of a function returning composite.
(This is grotty and fundamentally ambiguous, but it's really hard
to do nested composite-returning functions without it.)
As far as I know, that doesn't cause any problems in ordinary
functions. It's disastrous for procedures however. All procedures
that have any output parameters are labeled with prorettype RECORD,
and the CALL code expects it will get back a record with one column
per output parameter, regardless of whether any of those parameters
is composite. Doing something else leads to an assertion failure
or core dump.
This is simple enough to fix: we just need to not apply that rule
when considering procedures. However, that requires adding another
argument to check_sql_fn_retval, which at least in principle might be
getting called by external callers. Therefore, in the back branches
convert check_sql_fn_retval into an ABI-preserving wrapper around a
new function check_sql_fn_retval_ext.
Per report from Yahor Yuzefovich. This has been broken since we
implemented procedures, so back-patch to all supported branches.
Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com
This reverts commit eae7be600b, following a discussion with Tom Lane,
due to concerns that this impacts the decisions made by the planner for
the number of workers spawned based on the inlining and const-folding of
index expressions and predicate for cases that would have worked until
this commit.
Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us
Backpatch-through: 12
As coded, the planner logic that calculates the number of parallel
workers to use for a parallel index build uses expressions and
predicates from the relcache, which are flattened for the planner by
eval_const_expressions().
As reported in the bug, an immutable parallel-unsafe function flattened
in the relcache would become a Const, which would be considered as
parallel-safe, even if the predicate or the expressions including the
function are not safe in parallel workers. Depending on the expressions
or predicate used, this could cause the parallel build to fail.
Tests are included that check parallel index builds with parallel-unsafe
predicate and expressions. Two routines are added to lsyscache.h to be
able to retrieve expressions and predicate of an index from its pg_index
data.
Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com
Backpatch-through: 12
Commit 5579388d removed code that supplied a fallback implementation of
getaddrinfo(), which was dead code on modern systems. One tiny piece of
the removed code was still doing something useful on Windows, though:
that OS's own gai_strerror()/gai_strerrorA() function returns a pointer
to a static buffer that it overwrites each time, so it's not
thread-safe. In rare circumstances, a multi-threaded client program
could get an incorrect or corrupted error message.
Restore the replacement gai_strerror() function, though now that it's
only for Windows we can put it into a win32-specific file and cut it
down to the errors that Windows documents. The error messages here are
taken from FreeBSD, because Windows' own messages seemed too verbose.
Back-patch to 16.
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKz%2BF9d2PTiXwfYV7qJw%2BWg2jzACgSDgPizUw7UG%3Di58A%40mail.gmail.com
An OS crash could leave PG_VERSION empty or missing. The same symptom
appeared in a backup by block device snapshot, taken after the next
checkpoint and before the OS flushes the PG_VERSION blocks. Device
snapshots are not a documented backup method, however. Back-patch to
v15, where commit 9c08aea6a3 introduced
STRATEGY=WAL_LOG and made it the default.
Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h. Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics). Moving the function to the much
less popular ascii.h resolves these issues for now.
This commit is back-patched for the benefit of the aforementioned
third-party tool. The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists. This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code. It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().
Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
try_index_open() is able to open an index if its relkind fits, except
that it would return NULL instead of generated an error if the relation
does not exist. This new routine will be used by an upcoming patch to
make REINDEX on partitioned relations more robust when an index in a
partition tree is dropped.
Extracted from a larger patch by the same author.
Author: Fei Changhong
Discussion: https://postgr.es/m/tencent_6A52106095ACDE55333E3AD33F304C0C3909@qq.com
Backpatch-through: 14
We should have done it this way all along, but we accidentally got
away with using the wrong BIO field up until OpenSSL 3.2. There,
the library's BIO routines that we rely on use the "data" field
for their own purposes, and our conflicting use causes assorted
weird behaviors up to and including core dumps when SSL connections
are attempted. Switch to using the approved field for the purpose,
i.e. app_data.
While at it, remove our configure probes for BIO_get_data as well
as the fallback implementation. BIO_{get,set}_app_data have been
there since long before any OpenSSL version that we still support,
even in the back branches.
Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor
change in an error message spelling that evidently came in with 3.2.
Tristan Partin and Bo Andreson. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
If the tuple being updated is not visible to the crosscheck snapshot,
we return TM_Updated but the assertions would not hold in that case.
Move them to before the cross-check.
Fixes bug #17893. Backpatch to all supported versions.
Author: Alexander Lakhin
Backpatch-through: 12
Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
When performing inlining LLVM unfortunately "leaks" types (the
types survive and are usable, but a new round of inlining will
recreate new structurally equivalent types). This accumulation
will over time amount to a memory leak which for some queries
can be large enough to trigger the OOM process killer.
To avoid accumulation of types, all IR related data is stored
in an LLVMContextRef which is dropped and recreated in order
to release all types. Dropping and recreating incurs overhead,
so it will be done only after 100 queries. This is a heuristic
which might be revisited, but until we can get the size of the
context from LLVM we are flying a bit blind.
This issue has been reported several times, there may be more
references to it in the archives on top of the threads linked
below.
This is a backpatch of 9dce22033d to all supported branches.
Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Reported-By: Kurt Roeckx <kurt@roeckx.be>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Reported-By: Lauri Laanmets <pcspets@gmail.com>
Author: Andres Freund and Daniel Gustafsson
Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
Backpatch-through: v12
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions). Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.
Two actual bugs of this ilk are fixed here. We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest. Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism. A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.
In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after. The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.
Per bug #18097 from Jim Keener. Back-patch to all supported versions.
Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
The fallback implementation of pg_atomic_test_set_flag() that uses
atomic-exchange gives pg_atomic_exchange_u32_impl() an extra
argument. This issue has been present since the introduction of
the atomics API in commit b64d92f1a5.
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
Backpatch-through: 12
When computing "0 - INT64_MIN", most platforms would report an
overflow error, which is correct. However, platforms without integer
overflow builtins or 128-bit integers would fail to spot the overflow,
and incorrectly return INT64_MIN.
Back-patch to all supported branches.
Patch be me. Thanks to Jian He for initial investigation, and Laurenz
Albe and Tom Lane for review.
Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds. In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen. Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.
To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.
The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text. v11 psql
lacks that parameter, so omit the tests in that branch.
Our thanks to Pedro Gallegos for reporting this problem.
Security: CVE-2023-5869
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val. Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice. Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.
Xing Guo, Aleksander Alekseev, Tom Lane
Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not. (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.) In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:
1. The wrong table's statement-level triggers would get fired.
2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan. This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.
In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.
While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch. Given the lack of field complaints about the
trigger issue, I'll desist for now. (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)
Per bug #18147 from Hans Buschmann.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].
* For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
* For LLVM == 15, we'll continue to do the same, explicitly opting
out of opaque pointer mode.
* For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
the extra type argument.
The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc. The change is mostly mechanical,
except that at each site the correct type had to be provided.
In some places we needed to do some extra work to get functions types,
including some new wrappers for C++ APIs that are not yet exposed by in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.
Back-patch to 12, because it's a little tricker in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.
[1] https://llvm.org/docs/OpaquePointers.html
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system(). This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers. If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit(). This can end badly. For
example, both processes will try to remove themselves from the
PGPROC shared array.
To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before. If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.
This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.
Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
Under interval_ops, some equal values are distinguishable. One such
pair is '24:00:00' and '1 day'. With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function. This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes. This fix makes interval_ops simply omit the support function,
like numeric_ops does. Back-pack to v13, where btequalimage() first
appeared. In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval". Going
forward, back-branch initdb will include the catalog change.
Reviewed by Peter Geoghegan.
Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com