Mirror of https://github.com/postgres/postgres.git (synced 2026-04-05 09:15:40 -04:00)
Fix premature NULL lag reporting in pg_stat_replication

pg_stat_replication is documented to keep the last measured lag values for a short time after the standby catches up, and then set them to NULL when there is no WAL activity. However, previously lag values could become NULL prematurely even while WAL activity was ongoing, especially in logical replication.

This happened because the code cleared lag when two consecutive reply messages indicated that the apply location had caught up with the send location. It did not verify that the reported positions were unchanged, so lag could be cleared even when positions had advanced between messages. In logical replication, where the apply location often quickly catches up, this issue was more likely to occur.

This commit fixes the issue by clearing lag only when the standby reports that it has fully replayed WAL (i.e., both flush and apply locations have caught up with the send location) and the write/flush/apply positions remain unchanged across two consecutive reply messages.

The second message with unchanged positions typically results from wal_receiver_status_interval, so lag values are cleared after that interval when there is no activity. This avoids showing stale lag data while preventing premature NULL values.

Even with this fix, lag may rarely become NULL during activity if identical position reports are sent repeatedly. Eliminating such duplicate messages would address this fully, but that change is considered too invasive for stable branches and will be handled in master only later.

Backpatch to all supported branches.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
Backpatch-through: 14

parent 791ff1df1e
commit fdce5de552
1 changed file with 19 additions and 16 deletions

@@ -2436,7 +2436,9 @@ ProcessStandbyReplyMessage(void)
 	TimestampTz now;
 	TimestampTz replyTime;
 
-	static bool fullyAppliedLastTime = false;
+	static XLogRecPtr prevWritePtr = InvalidXLogRecPtr;
+	static XLogRecPtr prevFlushPtr = InvalidXLogRecPtr;
+	static XLogRecPtr prevApplyPtr = InvalidXLogRecPtr;
 
 	/* the caller already consumed the msgtype byte */
 	writePtr = pq_getmsgint64(&reply_message);
@@ -2469,22 +2471,23 @@ ProcessStandbyReplyMessage(void)
 	applyLag = LagTrackerRead(SYNC_REP_WAIT_APPLY, applyPtr, now);
 
 	/*
-	 * If the standby reports that it has fully replayed the WAL in two
-	 * consecutive reply messages, then the second such message must result
-	 * from wal_receiver_status_interval expiring on the standby. This is a
-	 * convenient time to forget the lag times measured when it last
-	 * wrote/flushed/applied a WAL record, to avoid displaying stale lag data
-	 * until more WAL traffic arrives.
+	 * If the standby reports that it has fully replayed the WAL, and the
+	 * write/flush/apply positions remain unchanged across two consecutive
+	 * reply messages, forget the lag times measured when it last
+	 * wrote/flushed/applied a WAL record.
+	 *
+	 * The second message with unchanged positions typically results from
+	 * wal_receiver_status_interval expiring on the standby, so lag values are
+	 * usually cleared after that interval when there is no activity. This
+	 * avoids displaying stale lag data until more WAL traffic arrives.
 	 */
-	clearLagTimes = false;
-	if (applyPtr == sentPtr)
-	{
-		if (fullyAppliedLastTime)
-			clearLagTimes = true;
-		fullyAppliedLastTime = true;
-	}
-	else
-		fullyAppliedLastTime = false;
+	clearLagTimes = (applyPtr == sentPtr && flushPtr == sentPtr &&
+					 writePtr == prevWritePtr && flushPtr == prevFlushPtr &&
+					 applyPtr == prevApplyPtr);
+
+	prevWritePtr = writePtr;
+	prevFlushPtr = flushPtr;
+	prevApplyPtr = applyPtr;
 
 	/* Send a reply if the standby requested one. */
 	if (replyRequested)
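
The difference between the old and new clearing rules may be easier to see outside the walsender. What follows is a minimal standalone sketch, not PostgreSQL code: `WalPos`, `Reply`, `OldState`, `NewState`, `old_should_clear` and `new_should_clear` are hypothetical names invented for illustration, and the initial value 0 is assumed to play the role of `InvalidXLogRecPtr`. It only mirrors the two conditions shown in the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position (XLogRecPtr in the diff). */
typedef uint64_t WalPos;

/* One standby reply as seen by the sender; field names are illustrative. */
typedef struct Reply
{
	WalPos		sent;		/* how far the sender has sent */
	WalPos		write;		/* reported write position */
	WalPos		flush;		/* reported flush position */
	WalPos		apply;		/* reported apply position */
} Reply;

/* State kept between replies under the old rule. */
typedef struct OldState
{
	bool		fully_applied_last_time;
} OldState;

/* State kept between replies under the new rule. */
typedef struct NewState
{
	WalPos		prev_write;
	WalPos		prev_flush;
	WalPos		prev_apply;
} NewState;

/*
 * Old rule: clear lag when two consecutive replies both have
 * apply == sent, even if the positions advanced in between.
 */
static bool
old_should_clear(OldState *st, const Reply *r)
{
	bool		clear = false;

	if (r->apply == r->sent)
	{
		if (st->fully_applied_last_time)
			clear = true;
		st->fully_applied_last_time = true;
	}
	else
		st->fully_applied_last_time = false;

	return clear;
}

/*
 * New rule: clear lag only when flush and apply have both caught up with
 * sent AND write/flush/apply are unchanged since the previous reply.
 */
static bool
new_should_clear(NewState *st, const Reply *r)
{
	bool		clear = (r->apply == r->sent && r->flush == r->sent &&
						 r->write == st->prev_write &&
						 r->flush == st->prev_flush &&
						 r->apply == st->prev_apply);

	/* Remember this reply's positions for the next comparison. */
	st->prev_write = r->write;
	st->prev_flush = r->flush;
	st->prev_apply = r->apply;

	return clear;
}
```

Feeding both rules a caught-up reply at position 100 followed by a caught-up reply at position 200 makes the old rule clear the lag on the second reply although WAL advanced in between, while the new rule waits until the reply at 200 is repeated with unchanged positions. The residual case mentioned in the commit message also shows up here: a duplicate report sent during activity would still satisfy the new condition.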