Failing to do so results in inability of logical decoding to process the
WAL stream. Handle it by doing nothing.
Backpatch all the way back.
Reported-by: Petr Jelínek <petr.jelinek@enterprisedb.com>
During a replication slot creation, an ERROR generated in the same
transaction as the one creating a to-be-exported snapshot would have
left the backend in an inconsistent state, as the associated static
export snapshot state was not being reset on transaction abort, but only
on the follow-up command received by the WAL sender that created this
snapshot on replication slot creation. This would trigger inconsistency
failures if this session tried to export again a snapshot, like during
the creation of a replication slot.
Note that a snapshot export cannot happen in a transaction block, so
there is no need to worry resetting this state for subtransaction
aborts. Also, this inconsistent state would very unlikely show up to
users. For example, one case where this could happen is an
out-of-memory error when building the initial snapshot to-be-exported.
Dilip found this problem while poking at a different patch, that caused
an error in this code path for reasons unrelated to HEAD.
Author: Dilip Kumar
Reviewed-by: Michael Paquier, Zhihong Yu
Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com
Backpatch-through: 9.6
Previously, walreceiver always closed the currently-opened WAL segment
and created its archive notification file, after it finished writing
the current segment up and received any WAL data that should be
written into the next segment. If walreceiver exited just before
any WAL data in the next segment arrived at standby, it did not
create the archive notification file of the current segment
even though that's known completed. This behavior could cause
WAL archiving of the segment to be delayed until subsequent
restartpoints or checkpoints created its notification file.
To fix the issue, this commit changes walreceiver so that it creates
an archive notification file of a current WAL segment immediately
if that's known completed before receiving next WAL data.
Back-patch to all supported branches.
Reported-by: Kyotaro Horiguchi
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200630.165503.1465894182551545886.horikyota.ntt@gmail.com
The existing code tried to do syscache lookups in an already-failed
transaction, which is problematic to say the least. After some
consideration of alternatives, the best fix seems to be to just drop
type names from the error message altogether. The table and column
names seem like sufficient localization. If the user is unsure what
types are involved, she can check the local and remote table
definitions.
Having done that, we can also discard the LogicalRepTypMap hash
table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE
replication messages are now obsolete as well; but we should
probably keep them in case some other use emerges. (The complexity
of removing something from the replication protocol would likely
outweigh any savings anyhow.)
Masahiko Sawada and Bharath Rupireddy, per complaint from Andres
Freund. Back-patch to v10 where this code originated.
Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
A walsender process that has executed a SQL command left the text of
that command in pg_stat_activity.query indefinitely, which is quite
confusing if it's in RUNNING state but not doing that query. An easy
and useful fix is to treat replication commands as if they were SQL
queries, and show them in pg_stat_activity according to the same rules
as for regular queries. While we're at it, it seems also sensible to
set debug_query_string, allowing error logging and debugging to see
the replication command.
While here, clean up assorted silliness in exec_replication_command:
* Clean up SQLCmd code path, and fix its only-accidentally-not-buggy
memory management.
* Remove useless duplicate call of SnapBuildClearExportedSnapshot().
* replication_scanner_finish() was never called.
Back-patch of commit f560209c6 into v10-v13. I'd originally felt
that this didn't merit back-patching, but subsequent confusion
while debugging walsender problems suggests that it'll be useful.
Also, the original commit has now aged long enough to provide some
comfort that it won't cause problems.
Discussion: https://postgr.es/m/2673480.1624557299@sss.pgh.pa.us
Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us
During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.
The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.
Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
Using an Assert to check the validity of incoming messages is an
extremely poor decision. In a debug build, it should not be that easy
for a broken or malicious remote client to crash the logrep worker.
The consequences could be even worse in non-debug builds, which will
fail to make such checks at all, leading to who-knows-what misbehavior.
Hence, promote every Assert that could possibly be triggered by wrong
or out-of-order replication messages to a full test-and-ereport.
To avoid bloating the set of messages the translation team has to cope
with, establish a policy that replication protocol violation error
reports don't need to be translated. Hence, all the new messages here
use errmsg_internal(). A couple of old messages are changed likewise
for consistency.
Along the way, fix some non-idiomatic or outright wrong uses of
hash_search().
Most of these mistakes are new with the "streaming replication"
patch (commit 464824323), but a couple go back a long way.
Back-patch as appropriate.
Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us
The worker.c global wrconn is only meant to be used by logical apply/
tablesync workers, but there are other variables with the same name. To
reduce future confusion rename the global from "wrconn" to
"LogRepWorkerWalRcvConn".
While this is just cosmetic, it seems better to backpatch it all the way
back to 10 where this code appeared, to avoid future backpatching
issues.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com
We leaked the error report from PQconninfoParse, when there was
one. It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.
Found via valgrind testing.
Back-patch to v10 where this code was added.
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
Document that though the history file content is marked as bytea, it is
the same a text, and neither is btyea-escaped or encoding converted.
Reported-by: Brar Piening
Discussion: https://postgr.es/m/6a1b9cd9-17e3-df67-be55-86102af6bdf5@gmx.de
Backpatch-through: 13 - 9.5 (not master)
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format. This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.
Two of these call sites were in fact wrong:
* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.
* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended. Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units. This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.
There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds. It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.
Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.
Alexey Kondratov and Tom Lane
Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
Previously the standby server didn't archive timeline history files
streamed from the primary even when archive_mode is set to "always",
while it archives the streamed WAL files. This could cause the PITR to
fail because there was no required timeline history file in the archive.
The cause of this issue was that walreceiver didn't mark those files as
ready for archiving.
This commit makes walreceiver mark those streamed timeline history
files as ready for archiving if archive_mode=always. Then the archiver
process archives the marked timeline history files.
Back-patch to all supported versions.
Reported-by: Grigory Smolkin
Author: Grigory Smolkin, Fujii Masao
Reviewed-by: David Zhang, Anastasia Lubennikova
Discussion: https://postgr.es/m/54b059d4-2b48-13a4-6f43-95a087c92367@postgrespro.ru
The code recorded cache invalidation events by zeroing the "localreloid"
field of affected cache entries. However, it's possible for an inval
event to occur even while we have the entry open and locked. So an
ill-timed inval could result in "cache lookup failed for relation 0"
errors, if the worker's code tried to use the cleared field. We can
fix that by creating a separate bool field to record whether the entry
needs to be revalidated. (In the back branches, cram the bool into
what had been padding space, to avoid an ABI break in the somewhat
unlikely event that any extension is looking at this struct.)
Also, rearrange the logic in logicalrep_rel_open so that it
does the right thing in cases where table_open would fail.
We should retry the lookup by name in that case, but we didn't.
The real-world impact of this is probably small. In the first place,
the error conditions are very low probability, and in the second place,
the worker would just exit and get restarted. We only noticed because
in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly,
preventing the worker from making progress. Nonetheless, it's clearly
a bug, and it impedes a useful type of testing; so back-patch to v10
where this code was introduced.
Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us
Commit 3f60f690f only partially fixed the broken-status-tracking
issue in LogicalRepApplyLoop: we need ping_sent to have the same
lifetime as last_recv_timestamp. The effects are much less serious
than what that commit fixed, though. AFAICS this would just lead to
extra ping requests being sent, once per second until the sender
responds. Still, it's a bug, so backpatch to v10 as before.
Discussion: https://postgr.es/m/959627.1599248476@sss.pgh.pa.us
This is like CVE-2018-1058 commit
582edc369c. Today, a malicious user of a
publisher or subscriber database can invoke arbitrary SQL functions
under an identity running replication, often a superuser. This fix may
cause "does not exist" or "no schema has been selected to create in"
errors in a replication process. After upgrading, consider watching
server logs for these errors. Objects accruing schema qualification in
the wake of the earlier commit are unlikely to need further correction.
Back-patch to v10, which introduced logical replication.
Security: CVE-2020-14349
Ashutosh Bapat noticed that when logical walsender needs to wait for
WAL, and it realizes that it must send a keepalive message to
walreceiver to update the sent-LSN, which *does not* request a reply
from walreceiver, it wrongly sets the flag that it's going to wait for
that reply. That means that any future would-be sender of feedback
messages ends up not sending a feedback message, because they all
believe that a reply is expected.
With built-in logical replication there's not much harm in this, because
WalReceiverMain will send a ping-back every wal_receiver_timeout/2
anyway; but with other logical replication systems (e.g. pglogical) it
can cause significant pain.
This problem was introduced in commit 41d5f8ad73, where the
request-reply flag was changed from true to false to WalSndKeepalive,
without at the same time removing the line that sets
waiting_for_ping_response.
Just removing that line would be a sufficient fix, but it seems better
to shift the responsibility of setting the flag to WalSndKeepalive
itself instead of requiring caller to do it; this is clearly less
error-prone.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
Backpatch: 9.5 and up
Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
Commit b9c130a1f failed to apply the publisher-to-subscriber column
mapping while checking which columns were updated. Perhaps less
significantly, it didn't exclude dropped columns either. This could
result in an incorrect updated-columns bitmap and thus wrong decisions
about whether to fire column-specific triggers on the subscriber while
applying updates. In HEAD (since commit 9de77b545), it could also
result in accesses off the end of the colstatus array, as detected by
buildfarm member skink. Fix the logic, and adjust 003_constraints.pl
so that the problem is exposed in unpatched code.
In HEAD, also add some assertions to check that we don't access off
the ends of these newly variable-sized arrays.
Back-patch to v10, as b9c130a1f was.
Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com
Fix some more violations of the "only straight-line code inside a
spinlock" rule. These are hazardous not only because they risk
holding the lock for an excessively long time, but because it's
possible for palloc to throw elog(ERROR), leaving a stuck spinlock
behind.
copy_replication_slot() had two separate places that did pallocs
while holding a spinlock. We can make the code simpler and safer
by copying the whole ReplicationSlot struct into a local variable
while holding the spinlock, and then referencing that copy.
(While that's arguably more cycles than we really need to spend
holding the lock, the struct isn't all that big, and this way seems
far more maintainable than copying fields piecemeal. Anyway this
is surely much cheaper than a palloc.) That bug goes back to v12.
InvalidateObsoleteReplicationSlots() not only did a palloc while
holding a spinlock, but for extra sloppiness then leaked the memory
--- probably for the lifetime of the checkpointer process, though
I didn't try to verify that. Fortunately that silliness is new
in HEAD.
pg_get_replication_slots() had a cosmetic violation of the rule,
in that it only assumed it's safe to call namecpy() while holding
a spinlock. Still, that's a hazard waiting to bite somebody, and
there were some other cosmetic coding-rule violations in the same
function, so clean it up. I back-patched this as far as v10; the
code exists before that but it looks different, and this didn't
seem important enough to adapt the patch further back.
Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
The previous coding zeroed out offsetof(ReplicationStateCtl, states)
more bytes than it was entitled to, as a consequence of starting the
zeroing from the wrong pointer (or, if you prefer, using the wrong
calculation of how much to zero).
It's unsurprising that this has not caused any reported problems,
since it can be expected that the newly-allocated block is at the end
of what we've used in shared memory, and we always make the shmem
block substantially bigger than minimally necessary. Nonetheless,
this is wrong and it could bite us someday; plus it's a dangerous
model for somebody to copy.
This dates back to the introduction of this code (commit 5aa235042),
so back-patch to all supported branches.
We have repeatedly seen the buildfarm reach the Assert(false) in
SyncRepGetSyncStandbysPriority. This apparently is due to failing to
consider the possibility that the sync_standby_priority values in
shared memory might be inconsistent; but they will be whenever only
some of the walsenders have updated their values after a change in
the synchronous_standby_names setting. That function is vastly too
complex for what it does, anyway, so rewriting it seems better than
trying to apply a band-aid fix.
Furthermore, the API of SyncRepGetSyncStandbys is broken by design:
it returns a list of WalSnd array indexes, but there is nothing
guaranteeing that the contents of the WalSnd array remain stable.
Thus, if some walsender exits and then a new walsender process
takes over that WalSnd array slot, a caller might make use of
WAL position data that it should not, potentially leading to
incorrect decisions about whether to release transactions that
are waiting for synchronous commit.
To fix, replace SyncRepGetSyncStandbys with a new function
SyncRepGetCandidateStandbys that copies all the required data
from shared memory while holding the relevant mutexes. If the
associated walsender process then exits, this data is still safe to
make release decisions with, since we know that that much WAL *was*
sent to a valid standby server. This incidentally means that we no
longer need to treat sync_standby_priority as protected by the
SyncRepLock rather than the per-walsender mutex.
SyncRepGetSyncStandbys is no longer used by the core code, so remove
it entirely in HEAD. However, it seems possible that external code is
relying on that function, so do not remove it from the back branches.
Instead, just remove the known-incorrect Assert. When the bug occurs,
the function will return a too-short list, which callers should treat
as meaning there are not enough sync standbys, which seems like a
reasonably safe fallback until the inconsistent state is resolved.
Moreover it's bug-compatible with what has been happening in non-assert
builds. We cannot do anything about the walsender-replacement race
condition without an API/ABI break.
The bogus assertion exists back to 9.6, but 9.6 is sufficiently
different from the later branches that the patch doesn't apply at all.
I chose to just remove the bogus assertion in 9.6, feeling that the
probability of a bad outcome from the walsender-replacement race
condition is too low to justify rewriting the whole patch for 9.6.
Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us
When SaveSlotToPath() is called with elevel=LOG, the early exits didn't
release the slot's io_in_progress_lock.
This could result in a walsender being stuck on the lock forever. A
possible way to get into this situation is if the offending code paths
are triggered in a low disk space situation.
Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com>
Reported-by: Craig Ringer <craig@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com
Back-patch a subset of commit 9155580fd5
to v11, v10, 9.6, and 9.5. Include the latest repairs to this function.
Use a new XLOG_FPI_MULTI value instead of reusing XLOG_FPI. That way,
if an older server reads WAL from this function, that server will PANIC
instead of applying just one page of the record. The next commit adds a
call to this function.
Discussion: https://postgr.es/m/20200304.162919.898938381201316571.horikyota.ntt@gmail.com
Manifested as
ERROR: subtransaction logged without previous top-level txn record
this check forbids legit behaviours like
- First xl_xact_assignment record is beyond reading, i.e. earlier
restart_lsn.
- After restart_lsn there is some change of a subxact.
- After that, there is second xl_xact_assignment (for another subxact)
revealing the relationship between top and first subxact.
Such a transaction won't be streamed anyway because we hadn't seen it in
full. Saying for sure whether xact of some record encountered after
the snapshot was deserialized can be streamed or not requires to know
whether it wrote something before deserialization point --if yes, it
hasn't been seen in full and can't be decoded. Snapshot doesn't have such
info, so there is no easy way to relax the check.
Reported-by: Hsu, John
Diagnosed-by: Arseny Sher
Author: Arseny Sher, Amit Kapila
Reviewed-by: Amit Kapila, Dilip Kumar
Backpatch-through: 9.5
Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
... specifically, set it incrementally as each individual change is
spilled down to disk. This way, it is set correctly when the
transaction disappears without trace, ie. without leaving an XACT_ABORT
wal record. (This happens when the server crashes midway through a
transaction.)
Failing to have final_lsn prevents ReorderBufferRestoreCleanup() from
working, since it needs the final_lsn in order to know the endpoint of
its iteration through spilled files.
Commit df9f682c7b already tried to fix the problem, but it didn't set
the final_lsn in all cases. Revert that, since it's no longer needed.
Author: Vignesh C
Reviewed-by: Amit Kapila, Dilip Kumar
Discussion: https://postgr.es/m/CALDaNm2CLk+K9JDwjYST0sPbGg5AQdvhUt0jbKyX_HdAE0jk3A@mail.gmail.com
On the publisher, it was assumed that an INSERT change cannot happen for
a relation with no replica identity. However this is true only for a
change that needs references to old rows, aka UPDATE or DELETE, so
trying to use logical replication with a relation that has no replica
identity led to an assertion failure in the publisher when issuing an
INSERT. This commit removes the incorrect assertion, and adds more
regression tests to provide coverage for relations without replica
identity.
Reported-by: Neha Sharma
Author: Dilip Kumar, Michael Paquier
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CANiYTQsL1Hb8_Km08qd32svrqNumXLJeoGo014O7VZymgOhZEA@mail.gmail.com
Backpatch-through: 10
Currently while decoding changes, if the number of changes exceeds a
certain threshold, we spill those to disk. And this happens for each
(sub)transaction. Now, while reading all these files, we don't close them
until we read all the files. While reading these files, if the number of
such files exceeds the maximum number of file descriptors, the operation
errors out.
Use PathNameOpenFile interface to open these files as that internally has
the mechanism to release kernel FDs as needed to get us under the
max_safe_fds limit.
Reported-by: Amit Khandekar
Author: Amit Khandekar
Reviewed-by: Amit Kapila
Backpatch-through: 9.4
Discussion: https://postgr.es/m/CAJ3gD9c-sECEn79zXw4yBnBdOttacoE-6gAyP0oy60nfs_sabQ@mail.gmail.com
This patch allows building the local relmap cache for a subscribed
relation after processing pending invalidation messages and potential
relcache updates. Without this, the attributes in the local cache don't
tally with the updated relcache entry leading to invalid memory access.
Reported-by Jehan-Guillaume de Rorthais
Author: Jehan-Guillaume de Rorthais and Vignesh C
Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/20191025175929.7e90dbf5@firost
slot_modify_cstrings seriously abused the TupleTableSlot API by relying
on a slot's underlying data to stay valid across ExecClearTuple. Since
this abuse was also quite undocumented, it's little surprise that the
case got broken during the v12 slot rewrites. As reported in bug #16129
from Ondřej Jirman, this could lead to crashes or data corruption when
a logical replication subscriber processes a row update. Problems would
only arise if the subscriber's table contained columns of pass-by-ref
types that were not being copied from the publisher.
Fix by explicitly copying the datum/isnull arrays from the source slot
that the old row was in already. This ends up being about the same
thing that happened pre-v12, but hopefully in a less opaque and
fragile way.
We might've caught the problem sooner if there were any test cases
dealing with updates involving non-replicated or dropped columns.
Now there are.
Back-patch to v10 where this code came in. Even though the failure
does not manifest before v12, IMO this code is too fragile to leave
as-is. In any case we certainly want the additional test coverage.
Patch by me; thanks to Tomas Vondra for initial investigation.
Discussion: https://postgr.es/m/16129-a0c0f48e71741e5f@postgresql.org
This happens when we add a replica identity column on a subscriber
that does not yet exist on the publisher, according to the mapping
maintained by the subscriber. Code that checks whether the target
relation on the subscriber is updatable would check the replica
identity attribute bitmap with a column number -1, which would result
in an error. To fix, skip such columns in the bitmap lookup and
consider the relation not updatable. The result is consistent with
the rule that the replica identity columns on the subscriber must be a
subset of those on the publisher, since if the column doesn't exist on
the publisher, the column set on the subscriber can't be a subset.
Reported-by: Tim Clarke <tim.clarke@minerva.info>
Analyzed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Discussion: https://www.postgresql.org/message-id/flat/a9139c29-7ddd-973b-aa7f-71fed9c38d75%40minerva.info
When sending data for logical decoding using the streaming replication
protocol via a WAL sender, the timestamp of the sent write message is
allocated at the beginning of the message when preparing for the write,
and actually computed when the write message is ready to be sent.
The timestamp was getting computed after sending the message. This
impacts anything using logical decoding, causing for example logical
replication to report mostly NULL for last_msg_send_time in
pg_stat_subscription.
This commit makes sure that the timestamp is computed before sending the
message. This is wrong since 5a991ef, so backpatch down to 9.4.
Author: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1z=WMn8jt7iEdC5sYNaPgAgOASb_OW5JYv-vMdYaJSL-w@mail.gmail.com
Backpatch-through: 9.4
When a backend exits, it gets deleted from the syncrep queue if present.
The queue was checked without SyncRepLock taken in exclusive mode, so it
would have been possible for a backend to remove itself after a WAL
sender already did the job. Fix this issue based on a suggestion from
Fujii Masao, by first checking the queue without the lock. Then, if the
backend is present in the queue, take the lock and perform an additional
lookup check before doing the element deletion.
Author: Dongming Liu
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/a0806273-8bbb-43b3-bbe1-c45a58f6ae21.lingce.ldm@alibaba-inc.com
Backpatch-through: 9.4
The timestamp tracking the last moment a message is received in a
logical replication worker was initialized in each loop checking if a
message was received or not, causing wal_receiver_timeout to be ignored
in basically any logical replication deployments. This also broke the
ping sent to the server when reaching half of wal_receiver_timeout.
This simply moves the initialization of the timestamp out of the apply
loop to the beginning of LogicalRepApplyLoop().
Reported-by: Jehan-Guillaume De Rorthais
Author: Julien Rouhaud
Discussion: https://postgr.es/m/CAOBaU_ZHESFcWva8jLjtZdCLspMj7vqaB2k++rjHLY897ZxbYw@mail.gmail.com
Backpatch-through: 10
Logical walsender should exit when it catches up with sending WAL during
shutdown; but there was a rare corner case when it failed to because of
a race condition that puts it back to wait for more WAL instead -- but
since there wasn't any, it'd not shut down immediately. It would only
continue the shutdown when wal_sender_timeout terminates the sleep,
which causes annoying waits during shutdown procedure. Restructure the
code so that we no longer forget to set WalSndCaughtUp in that case.
This was an oversight in commit c6c333436.
Backpatch all the way down to 9.4.
Author: Craig Ringer, Álvaro Herrera
Discussion: https://postgr.es/m/CAMsr+YEuz4XwZX_QmnX_-2530XhyAmnK=zCmicEnq1vLr0aZ-g@mail.gmail.com
Most WAL records are ignored in early SnapBuild snapshot build phases.
But it's critical to process some of them, so that later messages have
the correct transaction state after the snapshot is completely built; in
particular, XLOG_XACT_ASSIGNMENT messages are critical in order for
sub-transactions to be correctly assigned to their parent transactions,
or at least one assert misbehaves, as reported by Ildar Musin.
Diagnosed-by: Masahiko Sawada
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAONYFtOv+Er1p3WAuwUsy1zsCFrSYvpHLhapC_fMD-zNaRWxYg@mail.gmail.com
Some of these are quite old, but that doesn't make them not bugs.
We'd rather report a failure via elog than SIGSEGV.
While at it, uniformly spell the error check as !RelationIsValid(rel)
rather than a bare rel == NULL test. The machine code is the same
but it seems better to be consistent.
Coverity complained about this today, not sure why, because the
mistake is in fact old.
The old code didn't differentiate between a read error and a
concurrent truncation. fread reports both of these by returning 0;
you have to use feof() or ferror() to distinguish between them,
which this code did not do.
It might be a better idea to use read() rather than fread() here,
so that we can display a less-generic error message, but I'm not
sure that would qualify as a back-patchable bug fix, so just do
this much for now.
Jeevan Chalke, reviewed by Jeevan Ladhe and by me.
Discussion: http://postgr.es/m/CA+TgmobG4ywMzL5oQq2a8YKp8x2p3p1LOMMcGqpS7aekT9+ETA@mail.gmail.com
In early development patches, "replication origins" were called "identifiers";
almost everything was renamed, but these references to the old terminology
went unnoticed.
Reported-by: Craig Ringer
This is quite unsafe, even for the case of ereport(FATAL) where we won't
return control to the interrupted code, and despite this code's use of
a flag to restrict the areas where we'd try to do it. It's possible
for example that we interrupt malloc or free while that's holding a lock
that's meant to protect against cross-thread interference. Then, any
attempt to do malloc or free within ereport() will result in a deadlock,
preventing the walreceiver process from exiting in response to SIGTERM.
We hypothesize that this explains some hard-to-reproduce failures seen
in the buildfarm.
Hence, get rid of the immediate-exit code in WalRcvShutdownHandler,
as well as the logic associated with WalRcvImmediateInterruptOK.
Instead, we need to take care that potentially-blocking operations
in the walreceiver's data transmission logic (libpqwalreceiver.c)
will respond reasonably promptly to the process's latch becoming
set and then call ProcessWalRcvInterrupts. Much of the needed code
for that was already present in libpqwalreceiver.c. I refactored
things a bit so that all the uses of PQgetResult use latch-aware
waiting, but didn't need to do much more.
These changes should be enough to ensure that libpqwalreceiver.c
will respond promptly to SIGTERM whenever it's waiting to receive
data. In principle, it could block for a long time while waiting
to send data too, and this patch does nothing to guard against that.
I think that that hazard is mostly theoretical though: such blocking
should occur only if we fill the kernel's data transmission buffers,
and we don't generally send enough data to make that happen without
waiting for input. If we find out that the hazard isn't just
theoretical, we could fix it by using PQsetnonblocking, but that
would require more ticklish changes than I care to make now.
Back-patch of commit a1a789eb5. This problem goes all the way back
to the origins of walreceiver; but given the substantial reworking
the module received during the v10 cycle, it seems unsafe to assume
that our testing on HEAD validates this patch for pre-v10 branches.
And we'd need to back-patch some prerequisite patches (at least
597a87ccc and its followups, maybe other things), increasing the risk
of problems. Given the dearth of field reports matching this problem,
it's not worth much risk. Hence back-patch to v10 and v11 only.
Patch by me; thanks to Thomas Munro for review.
Discussion: https://postgr.es/m/20190416070119.GK2673@paquier.xyz
Only hand-assigned type OIDs should be presumed to match across different
PG servers; those assigned during genbki.pl or during initdb are likely
to change due to addition or removal of unrelated objects.
This means that the cutoff should be FirstGenbkiObjectId (in HEAD)
or FirstBootstrapObjectId (before that), not FirstNormalObjectId.
Compare postgres_fdw's is_builtin() test.
It's likely that this error has no observable consequence in a
normally-functioning system, since ATM the only affected type OIDs are
system catalog rowtypes and information_schema types, which would not
typically be interesting for logical replication. But you could
probably break it if you tried hard, so back-patch.
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us