Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
While determining xid horizons, we skip over backends that are running
Vacuum. We also ignore Create Index Concurrently, or Reindex Concurrently
for the purposes of computing Xmin for Vacuum. But we were not setting the
flags corresponding to these operations when they are performed in
parallel which was preventing Xid horizon from advancing.
The optimization related to skipping Create Index Concurrently, or Reindex
Concurrently operations was implemented in PG-14 but the fix is the same
for the Parallel Vacuum as well so back-patched till PG-13.
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/CAD21AoCLQqgM1sXh9BrDFq0uzd3RBFKi=Vfo6cjjKODm0Onr5w@mail.gmail.com
Currently, lastOverflowedXid is never reset. It's just adjusted on new
transactions known to be overflowed. But if there are no overflowed
transactions for a long time, snapshots could be mistakenly marked as
suboverflowed due to wraparound.
This commit fixes this issue by resetting lastOverflowedXid when needed
altogether with KnownAssignedXids.
Backpatch to all supported versions.
Reported-by: Stan Hu
Discussion: https://postgr.es/m/CAMBWrQ%3DFp5UAsU_nATY7EMY7NHczG4-DTDU%3DmCvBQZAQ6wa2xQ%40mail.gmail.com
Author: Kyotaro Horiguchi, Alexander Korotkov
Reviewed-by: Stan Hu, Simon Riggs, Nikolay Samokhvalov, Andrey Borodin, Dmitry Dolgov
If lo_export() fails to open the target file or to write to it, it leaks
the created LargeObjectDesc and its snapshot in the top-transaction
context and resource owner. That's pretty harmless, it's a small leak
after all, but it gives the user a "Snapshot reference leak" warning.
Fix by using a short-lived memory context and no resource owner for
transient LargeObjectDescs that are opened and closed within one function
call. The leak is easiest to reproduce with lo_export() on a directory
that doesn't exist, but in principle the other lo_* functions could also
fail.
Backpatch to all supported versions.
Reported-by: Andrew B
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/32bf767a-2d65-71c4-f170-122f416bab7e@iki.fi
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked. This is a slightly
larger change than in those cases. We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries. Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().
list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy. I found a couple of existing
uses of the latter that could profitably use the new function. (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)
As before, back-patch to v13.
Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
When replaying a transaction that held many exclusive locks on the
primary, a standby server's startup process would expend O(N^2)
effort on manipulating the list of locks. This code was fine when
written, but commit 1cff1b95a made repetitive list_delete_first()
calls inefficient, as explained in its commit message. Fix by just
iterating the list normally, and releasing storage only when done.
(This'd be inadequate if we needed to recover from an error occurring
partway through; but we don't.)
Back-patch to v13 where 1cff1b95a came in.
Nathan Bossart
Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
Users sometimes get concerned whe they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.
To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.
To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.
Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.
Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
The purpose of commit 8a54e12a38 was to
fix this, and it sufficed when the PREPARE TRANSACTION completed before
the CIC looked for lock conflicts. Otherwise, things still broke. As
before, in a cluster having used CIC while having enabled prepared
transactions, queries that use the resulting index can silently fail to
find rows. It may be necessary to reindex to recover from past
occurrences; REINDEX CONCURRENTLY suffices. Fix this for future index
builds by making CIC wait for arbitrarily-recent prepared transactions
and for ordinary transactions that may yet PREPARE TRANSACTION. As part
of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC
before it calls ProcArrayClearTransaction(). Back-patch to 9.6 (all
supported versions).
Andrey Borodin, reviewed (in earlier versions) by Andres Freund.
Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru
Do not update shm_mq's mq_bytes_written until we have written
an amount of data greater than 1/4th of the ring size, unless
the caller of shm_mq_send(v) requests a flush at the end of
the message. This reduces the number of calls to SetLatch(),
and also the number of CPU cache misses, considerably, and thus
makes shm_mq significantly faster.
Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some
minor cosmetic changes by me.
Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area, taking advantage of the
work done in 0c39c29 and 0bd305e. This is useful for users to estimate
the amount of huge pages required for a server as it becomes possible to
do an estimation without having to start the server and potentially
allocate a large chunk of shared memory.
The number of huge pages is calculated based on the existing GUC
huge_page_size if set, or by using the system's default by looking at
/proc/meminfo on Linux. There is nothing new here as this commit reuses
the existing calculation methods, and just exposes this information
directly to the user. The routine calculating the huge page size is
refactored to limit the number of files with platform-specific flags.
This new GUC's name was the most popular choice based on the discussion
done. This is only supported on Linux.
I have taken the time to test the change on Linux, Windows and MacOS,
though for the last two ones large pages are not supported. The first
one calculates correctly the number of pages depending on the existing
GUC huge_page_size or the system's default.
Thanks to Andres Freund, Robert Haas, Kyotaro Horiguchi, Tom Lane,
Justin Pryzby (and anybody forgotten here) for the discussion.
Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
ProcArrayGroupClearXid function has a parameter named "proc",
but the same name was used for its local variables. This commit fixes
this variable shadowing, to improve code readability.
Back-patch to all supported versions, to make future back-patching
easy though this patch is classified as refactoring only.
Reported-by: Ranier Vilela
Author: Ranier Vilela, Aleksander Alekseev
https://postgr.es/m/CAEudQAqyoTZC670xWi6w-Oe2_Bk1bfu2JzXz6xRfiOUzm7xbyQ@mail.gmail.com
All size_t variables declared in procarray.c are actually int ones.
Let's use int instead of size_t for those variables. Which would
reduce Wsign-compare compiler warnings.
Back-patch to v14 where commit 941697c3c1 added size_t variables
in procarray.c, to make future back-patching easy though
this patch is classified as refactoring only.
Reported-by: Ranier Vilela
Author: Ranier Vilela, Aleksander Alekseev
https://postgr.es/m/CAEudQAqyoTZC670xWi6w-Oe2_Bk1bfu2JzXz6xRfiOUzm7xbyQ@mail.gmail.com
We don't allow relations to exceed 2^32-1 blocks, because block
numbers are 32 bits and the last possible block number is reserved
to mean InvalidBlockNumber. There is a check for this in mdextend,
but that's really way too late, because the smgr API requires us to
create a buffer for the block-to-be-added, and we do not want to
have any buffer with blocknum InvalidBlockNumber. (Such a case
can trigger assertions in bufmgr.c, plus I think it might confuse
ReadBuffer's logic for data-past-EOF later on.) So put the check
into ReadBuffer.
Per report from Christoph Berg. It's been like this forever,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/YTn1iTkUYBZfcODk@msg.credativ.de
This runtime-computed GUC shows the size of the server's main shared
memory area, taking into account the amount of shared memory allocated
by extensions as this is calculated after processing
shared_preload_libraries.
Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
All the code paths simplified here were already using a boolean or used
an expression that led to zero or one, making the extra bits
unnecessary.
Author: Justin Pryzby
Reviewed-by: Tom Lane, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
This change refactors the shared memory size calculation in
CreateSharedMemoryAndSemaphores() to its own function. This is intended
for use in a future change related to the setup of huge pages and shared
memory with some GUCs, while useful on its own for extensions.
Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
We had a complaint that the postmaster fails to start if the invoking
program closes stdin. That happens because count_usable_fds expects
to be able to dup(0), and if it can't, we conclude there are no free
FDs and go belly-up. So far as I can find, though, there is no other
place in the server that touches stdin, and it's not unreasonable to
expect that a daemon wouldn't use that file.
As a simple improvement, let's dup FD 2 (stderr) instead. Unlike stdin,
it *is* reasonable for us to expect that stderr be open; even if we are
configured not to touch it, common libraries such as libc might try to
write error messages there.
Per gripe from Mario Emmenlauer. Given the lack of previous complaints,
I'm not excited about pushing this into stable branches, but it seems
OK to squeeze it into v14.
Discussion: https://postgr.es/m/48bafc63-c30f-3962-2ded-f2e985d93e86@emmenlauer.de
Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be
created under the same fileset and the files will be deleted
after the transaction is completed.
This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.
Author: Dilip Kumar, based on suggestion by Thomas Munro
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.
Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
Since a partitioned index doesn't have storage, getting the number of
blocks from it will not give sensible results. Existing callers
already check that they don't call it that way, so there doesn't
appear to be a live problem. But for correctness, handle
RELKIND_PARTITIONED_INDEX together with the other non-storage
relkinds.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/1d3a5fbe-f48b-8bea-80da-9a5c4244aef9@enterprisedb.com
Commit 80abbeba2 evidently didn't bother checking this code.
Also, list the generated executable in .gitignore (so it's
been a REALLY long time since anyone tried this).
Noted while trying out RISC-V spinlock patch. Given that
this has been broken for 5 years and nobody noticed, it's
likely not worth back-patching.
As reported by a few OSX buildfarm animals there exist at least one path where
temporary files exist during AtProcExit_Files() processing. As temporary file
cleanup causes pgstat reporting, the assertions added in ee3f8d3d3a caused
failures.
This is not an OSX specific issue, we were just lucky that timing on OSX
reliably triggered the problem. The known way to cause this is a FATAL error
during perform_base_backup() with a MANIFEST used - adding an elog(FATAL)
after InitializeBackupManifest() reliably reproduces the problem in isolation.
The problem is that the temporary file created in InitializeBackupManifest()
is not cleaned up via resource owner cleanup as WalSndResourceCleanup()
currently is only used for non-FATAL errors. That then allows to reach
AtProcExit_Files() with existing temporary files, causing the assertion
failure.
To fix this problem, move temporary file cleanup to a before_shmem_exit() hook
and add assertions ensuring that no temporary files are created before / after
temporary file management has been initialized / shut down. The cleanest way
to do so seems to be to split fd.c initialization into two, one for plain file
access and one for temporary file access.
Right now there's no need to perform further fd.c cleanup during process exit,
so I just renamed AtProcExit_Files() to BeforeShmemExit_Files(). Alternatively
we could perform another pass through the files to check that no temporary
files exist, but the added assertions seem to provide enough protection
against that.
It might turn out that the assertions added in ee3f8d3d3a will cause too much
noise - in that case we'll have to downgrade them to a WARNING, at least
temporarily.
This commit is not necessarily the best approach to address this issue, but it
should resolve the buildfarm failures. We can revise later.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210807190131.2bm24acbebl4wl6i@alap3.anarazel.de
For EXEC_BACKEND InitProcess()/InitAuxiliaryProcess() needs to have been
called well before we call BaseInit(), as SubPostmasterMain() needs LWLocks to
work. Having the order of initialization differ between platforms makes it
unnecessarily hard to understand the system and to add initialization points
for new subsystems without a lot of duplication.
To be able to change the order, BaseInit() cannot trigger
CreateSharedMemoryAndSemaphores() anymore - obviously that needs to have
happened before we can call InitProcess(). It seems cleaner to create shared
memory explicitly in single user/bootstrap mode anyway.
After this change the separation of bufmgr initialization into
InitBufferPoolAccess() / InitBufferPoolBackend() is not meaningful anymore so
the latter is removed.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
These have been unrelated since bgwriter and checkpointer were split into two
processes in 806a2aee37. As there several pending patches (shared memory
stats, extending the set of tracked IO / buffer statistics) that are made a
bit more awkward by the grouping, split them. Done separately to make
reviewing easier.
This does *not* change the contents of pg_stat_bgwriter or move fields out of
bgwriter/checkpointer stats that arguably do not belong in either. However
pgstat_fetch_global() was renamed and split into
pgstat_fetch_stat_checkpointer() and pgstat_fetch_stat_bgwriter().
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Start up the checkpointer and bgwriter during crash recovery (except in
--single mode), as we do for replication. This wasn't done back in
commit cdd46c76 out of caution. Now it seems like a better idea to make
the environment as similar as possible in both cases. There may also be
some performance advantages.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com
They are used in code that runs both during normal operation and during
WAL replay, and needs to behave differently during replay. Move them to
xlogutils.c, because that's where we have other helper functions used by
redo routines.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need. However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int". As
written, the same clamp was applied to hash_mem. This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.
Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched. However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.
This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.
Per gripe from Laurent Hasson. Back-patch to v13 where the
behavior change came in.
Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
5a1e1d8302 was a minimal bug fix for dc7420c2c9. To avoid future bugs of
that kind, deduplicate the choice of a relation's horizon into a new helper,
GlobalVisHorizonKindForRel().
As the code in question was only introduced in dc7420c2c9 it seems worth
backpatching this change as well, otherwise 14 will look different from all
other branches.
A different approach to this was suggested by Matthias van de Meent.
Author: Andres Freund
Discussion: https://postgr.es/m/20210621122919.2qhu3pfugxxp3cji@alap3.anarazel.de
Backpatch: 14, like 5a1e1d8302
We have an implementation restriction that PREPARE TRANSACTION can't
handle cases where both session-lifespan and transaction-lifespan locks
are held on the same lockable object. (That's because we'd otherwise
need to acquire a new PROCLOCK entry during post-prepare cleanup, which
is an operation that might fail. The situation can only arise with odd
usages of advisory locks, so removing the restriction is probably not
worth the amount of effort it would take.) AtPrepare_Locks attempted
to enforce this, but its logic was many bricks shy of a load, because
it only detected cases where the session and transaction locks had the
same lockmode. Locks of different modes on the same object would lead
to the rather unhelpful message "PANIC: we seem to have dropped a bit
somewhere".
To fix, build a transient hashtable with one entry per locktag,
not one per locktag + mode, and use that to detect conflicts.
Per bug #17122 from Alexander Pyhalov. This bug is ancient,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17122-04f3c32098a62233@postgresql.org
No concrete problem reported, but in the past it's been known to cause
problems on some compilers so let's avoid doing that.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/234364.1626704007%40sss.pgh.pa.us
A couple of open flags used in an assertion didn't exist in macOS 10.4.
Per build farm animal prairiedog. Also add O_EXCL while here (there are
a few more standard flags but they're not relevant and likely to be
missing).
Macs don't understand O_DIRECT, but they can disable caching with a
separate fcntl() call. Extend the file opening functions in fd.c to
handle this for us if the caller passes in PG_O_DIRECT.
For now, this affects only WAL data and even then only if you set:
max_wal_senders=0
wal_level=minimal
This is not expected to be very useful on its own, but later proposed
patches will make greater use of direct I/O, and it'll be useful for
testing if developers on Macs can see the effects.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BADiyyHe0cun2wfT%2BSVnFVqNYPxoO6J9zcZkVO7%2BNGig%40mail.gmail.com
As of v14, pg_depend contains almost 7000 "pin" entries recording
the OIDs of built-in objects. This is a fair amount of bloat for
every database, and it adds time to pg_depend lookups as well as
initdb. We can get rid of all of those entries in favor of an OID
range check, i.e. "OIDs below FirstUnpinnedObjectId are pinned".
(template1 and the public schema are exceptions. Those exceptions
are now wired into IsPinnedObject() instead of initdb's code for
filling pg_depend, but it's the same amount of cruft either way.)
The contents of pg_shdepend are modified likewise.
Discussion: https://postgr.es/m/3737988.1618451008@sss.pgh.pa.us
The idea behind this patch is to design out bugs like the one fixed
by commit 9d523119f. Previously, once one did RelationOpenSmgr(rel),
it was considered okay to access rel->rd_smgr directly for some
not-very-clear interval. But since that pointer will be cleared by
relcache flushes, we had bugs arising from overreliance on a previous
RelationOpenSmgr call still being effective.
Now, very little code except that in rel.h and relcache.c should ever
touch the rd_smgr field directly. The normal coding rule is to use
RelationGetSmgr(rel) and not expect the result to be valid for longer
than one smgr function call. There are a couple of places where using
the function every single time seemed like overkill, but they are now
annotated with large warning comments.
Amul Sul, after an idea of mine.
Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
Several places were performing a tight loop to determine the first power
of 2 number that's > or >= the required memory. Instead of using a loop
for that, we can use pg_nextpower2_32 or pg_nextpower2_64. When we need a
power of 2 number equal to or greater than a given amount, we just pass
the amount to the nextpower2 function. When we need a power of 2 greater
than the amount, we just pass the amount + 1.
Additionally, in tsearch there were a couple of locations that were
performing a while loop when a simple "if" would have done. In both of
these locations only 1 item is being added, so the loop could only have
ever iterated once. Changing the loop into an if statement makes the code
very slightly more optimal as the condition is checked once rather than
twice.
There are quite a few remaining locations that increase the size of the
buffer in the following form:
while (reqsize >= buflen)
{
buflen *= 2;
buf = repalloc(buf, buflen);
}
These are not touched in this commit. repalloc will error out for sizes
larger than MaxAllocSize. Changing these to use pg_nextpower2_32 would
remove the chance of that error being raised. It's unclear from the code
if the sizes could ever become that large, so err on the side of caution.
Discussion: https://postgr.es/m/CAApHDvp=tns7RL4PH0ZR0M+M-YFLquK7218x=0B_zO+DbOma+w@mail.gmail.com
Reviewed-by: Zhihong Yu
In dc7420c2c9 I (Andres) accidentally used
RelationIsAccessibleInLogicalDecoding() as the sole condition to use the
non-shared catalog horizon in GetOldestNonRemovableTransactionId(). That is
incorrect, as RelationIsAccessibleInLogicalDecoding() checks whether wal_level
is logical.
The correct check, as done e.g. in GlobalVisTestFor(), is to check
IsCatalogRelation() and RelationIsAccessibleInLogicalDecoding().
The observed misbehavior of this bug was that there could be an endless loop
in lazy_scan_prune(), because the horizons used in heap_page_prune() and the
individual tuple liveliness checks did not match. Likely there are other
potential consequences as well.
A later commit will unify the determination which horizon has to be used, and
add additional assertions to make it easier to catch a bug like this.
Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Diagnosed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2Wg32Y9+WJfw=aofkRx1ZRFt_Ev6bNPc4PSaz7PjSFtZgQ@mail.gmail.com
It was unable to wait on a backend that had already left the procarray.
Users tolerant of that limitation can poll pg_stat_activity. Other
users can employ the "timeout" argument of pg_terminate_backend().
Reviewed by Bharath Rupireddy.
Discussion: https://postgr.es/m/20210605013236.GA208701@rfd.leadboat.com
Revert the pg_description entry to its v13 form, since those messages
usually remain shorter and don't discuss individual parameters. No
catversion bump, since pg_description content does not impair backend
compatibility or application compatibility.
Justin Pryzby
Discussion: https://postgr.es/m/20210612182743.GY16435@telsasoft.com
941697c3c1 changed ProcArrayAdd()/Remove() substantially. As reported by
zhanyi, I missed that due to the introduction of the PGPROC->pgxactoff
ProcArrayRemove() does not need to search for the position in
procArray->pgprocnos anymore - that's pgxactoff. Remove the search loop.
The removal of the search loop reduces assertion coverage a bit - but we can
easily do better than before by adding assertions to other loops over
pgprocnos elements.
Also do a bit of cleanup, mostly by reducing line lengths by introducing local
helper variables and adding newlines.
Author: zhanyi <w@hidva.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/tencent_5624AA3B116B3D1C31CA9744@qq.com