postgresql/src/include
Michael Paquier ec59500a17 Fix race with synchronous_standby_names at startup
synchronous_standby_names cannot be reloaded safely by backends, so the
checkpointer is in charge of updating a state in WalSndCtl in shared
memory that records whether the GUC is enabled, letting backends know
whether they should wait for a given LSN.  This provides strict control
over the timing of the waiting queues when the GUC is enabled or
disabled and then reloaded.  The checkpointer is also in charge of
waking up backends that could be waiting for an LSN when the GUC is
disabled.

This logic had a race condition at startup, where backends could skip
waiting for an LSN even though synchronous_standby_names is enabled.
This could cause visibility issues for transactions that should have
waited but did not.  The problem lasts until the checkpointer does its
initial update of the shared memory state when it loads
synchronous_standby_names.
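
A minimal C sketch of why the pre-fix scheme has this window, under
simplifying assumptions: the struct and functions below are hypothetical
stand-ins (not the real WalSndCtlData or syncrep.c code), and locking,
wait queues, and LSN tracking are left out.  A single boolean cannot
tell "not initialized yet" apart from "GUC is empty".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the old shared-memory state: one boolean
 * written only by the checkpointer.  Shared memory starts zeroed, so
 * the field reads as false until the first update. */
typedef struct OldSyncRepShmem
{
    bool sync_standbys_defined;
} OldSyncRepShmem;

/* Checkpointer side: publish whether synchronous_standby_names is set. */
static void
old_checkpointer_update(OldSyncRepShmem *shmem, bool guc_defined)
{
    shmem->sync_standbys_defined = guc_defined;
}

/* Backend side: decide whether a commit must wait for a sync standby.
 * Before the checkpointer's first update the field is still false, so
 * the backend skips waiting even when the GUC is set. */
static bool
old_backend_should_wait(const OldSyncRepShmem *shmem)
{
    return shmem->sync_standbys_defined;
}
```

With the GUC set, `old_backend_should_wait()` returns false on a freshly
zeroed struct and only returns true once `old_checkpointer_update(&shmem,
true)` has run; that gap is the startup window.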

In order to take care of this problem, the shared memory state in
WalSndCtl is extended to record whether it has been initialized by the
checkpointer, rather than only whether synchronous_standby_names is
defined.  In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 that tracks two states:
- Whether the shared memory state has been initialized.  This flag is
set once by the checkpointer at startup, and never removed.
- Whether synchronous_standby_names is defined, as known by the shared
memory state.  This is the same as the previous sync_standbys_defined
in WalSndCtl.

This gives backends a way to decide what to do before the shared memory
area is initialized: in that case, they now fall back to a check on the
GUC value, which is the best that can be done.
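
The two-flag status and the backend fallback can be sketched in C as
follows.  This is a simplified model under stated assumptions: the flag
macros, struct, and functions are hypothetical names (not the actual
definitions in the PostgreSQL tree), `uint8_t` stands in for bits8, and
locking, the wait queues, and LSN comparisons are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for the status field (illustrative names). */
#define SKETCH_SYNC_STANDBY_INIT    (1 << 0) /* checkpointer has initialized the state */
#define SKETCH_SYNC_STANDBY_DEFINED (1 << 1) /* synchronous_standby_names is set */

typedef struct SketchSyncRepShmem
{
    uint8_t sync_standbys_status;   /* plays the role of a bits8 */
} SketchSyncRepShmem;

/* Checkpointer side: set INIT (never cleared afterwards) and set or
 * clear DEFINED to mirror the current GUC value. */
static void
sketch_checkpointer_update(SketchSyncRepShmem *shmem, bool guc_defined)
{
    uint8_t status = shmem->sync_standbys_status | SKETCH_SYNC_STANDBY_INIT;

    if (guc_defined)
        status |= SKETCH_SYNC_STANDBY_DEFINED;
    else
        status &= (uint8_t) ~SKETCH_SYNC_STANDBY_DEFINED;
    shmem->sync_standbys_status = status;
}

/* Backend side: once INIT is set, trust the shared DEFINED flag;
 * before that, fall back to the backend's own view of the GUC. */
static bool
sketch_backend_should_wait(const SketchSyncRepShmem *shmem,
                           bool local_guc_defined)
{
    if (shmem->sync_standbys_status & SKETCH_SYNC_STANDBY_INIT)
        return (shmem->sync_standbys_status & SKETCH_SYNC_STANDBY_DEFINED) != 0;
    return local_guc_defined;
}
```

Once INIT is set, the backend ignores its local GUC value, so the
checkpointer stays the single writer deciding when waiting starts or
stops; the local GUC only matters during the short window before the
first update.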

Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window, for example by making
the checkpointer wait with a hardcoded sleep at the beginning of
SyncRepUpdateSyncStandbysDefined(), and doing so has shown that a 2PC
visibility test does indeed fail.  On sufficiently slow machines, this
bug could cause spurious failures.

In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an attached
injection point optionally persist across restarts, something that
would work for this case as it involves the checkpointer.  This issue
is quite old, and can be reproduced on all the stable branches.

Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13
2025-04-11 10:02:18 +09:00
access At update of non-LP_NORMAL TID, fail instead of corrupting page header. 2025-01-25 11:28:19 -08:00
backup Move basebackup code to new directory src/backend/backup 2022-08-10 14:03:09 -04:00
bootstrap Apply PGDLLIMPORT markings broadly. 2022-04-08 08:16:38 -04:00
catalog Fix broken handling of domains in atthasmissing logic. 2025-03-03 12:43:29 -05:00
commands doc: Add better description for rewrite functions in event triggers 2024-10-29 15:35:19 +09:00
common Fix corner-case 64-bit integer subtraction bug on some platforms. 2023-11-09 09:54:22 +00:00
datatype Avoid using timezone Asia/Manila in regression tests. 2025-01-20 15:47:53 -05:00
executor Simplify executor's determination of whether to use parallelism. 2024-12-09 14:38:19 -05:00
fe_utils Specify the encoding of input to fmtId() 2025-02-10 10:03:39 -05:00
foreign Update copyright for 2022 2022-01-07 19:04:57 -05:00
jit Monkey-patch LLVM code to fix ARM relocation bug. 2024-11-06 23:09:28 +13:00
lib simplehash: Free collisions array in SH_STAT 2024-04-07 19:09:04 -07:00
libpq Make dblink interruptible, via new libpqsrv APIs. 2025-04-03 09:34:01 -07:00
mb Add pg_encoding_set_invalid() 2025-02-10 10:03:39 -05:00
nodes Repair commits 317aba70e et al for -DWRITE_READ_PARSE_PLAN_TREES. 2025-03-13 12:13:07 -04:00
optimizer Account for optimized MinMax aggregates during SS_finalize_plan. 2024-05-18 14:31:35 -04:00
parser Handle default NULL insertion a little better. 2025-01-29 15:31:55 -05:00
partitioning Refactor and cleanup runtime partition prune code a little 2022-04-05 11:46:48 +02:00
port Provide 64-bit ftruncate() and lseek() on Windows. 2025-01-09 14:58:18 +13:00
portability Update copyright for 2022 2022-01-07 19:04:57 -05:00
postmaster Un-revert "Disable STARTUP_PROGRESS_TIMEOUT in standby mode." 2023-02-10 16:27:05 -05:00
regex Avoid assertion due to disconnected NFA sub-graphs in regex parsing. 2024-11-15 18:23:38 -05:00
replication Fix race with synchronous_standby_names at startup 2025-04-11 10:02:18 +09:00
rewrite Fix calculation of which GENERATED columns need to be updated. 2023-01-05 14:12:17 -05:00
snowball Update copyright for 2022 2022-01-07 19:04:57 -05:00
statistics Add stxdinherit flag to pg_statistic_ext_data 2022-01-16 13:38:01 +01:00
storage Restore smgrtruncate() prototype in back-branches. 2025-01-08 10:47:43 +13:00
tcop Restrict accesses to non-system views and foreign tables during pg_dump. 2024-08-05 06:05:25 -07:00
tsearch Add comments and a missing CHECK_FOR_INTERRUPTS in ts_headline. 2022-11-21 17:07:07 -05:00
utils Fix catcache invalidation of a list entry that's being built 2025-01-14 14:29:11 +02:00
.gitignore Refactor dlopen() support 2018-09-06 11:33:04 +02:00
c.h Assume that <stdbool.h> conforms to the C standard. 2024-11-25 20:53:55 +13:00
fmgr.h Pre-beta mechanical code beautification. 2022-05-12 15:17:30 -04:00
funcapi.h Rename SetSingleFuncCall() to InitMaterializedSRF() 2022-10-18 10:22:40 +09:00
getaddrinfo.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
getopt_long.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
Makefile Build in some knowledge about foreign-key relationships in the catalogs. 2021-02-02 17:11:55 -05:00
miscadmin.h Exclude parallel workers from connection privilege/limit checks. 2024-12-28 16:08:50 -05:00
pg_config.h.in Fix detection and handling of strchrnul() for macOS 15.4. 2025-04-01 16:49:51 -04:00
pg_config_ext.h.in Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
pg_config_manual.h Fix old-fd issues using global barriers everywhere. 2022-05-07 16:47:29 +12:00
pg_getopt.h Apply PGDLLIMPORT markings broadly. 2022-04-08 08:16:38 -04:00
pg_trace.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
pgstat.h Fix assertion failure when updating stats_fetch_consistency in a transaction 2023-05-10 11:24:40 +09:00
pgtar.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
pgtime.h Apply PGDLLIMPORT markings broadly. 2022-04-08 08:16:38 -04:00
port.h Avoid breaking SJIS encoding while de-backslashing Windows paths. 2025-01-29 14:24:36 -05:00
postgres.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
postgres_ext.h Phase 2 of pgindent updates. 2017-06-21 15:19:25 -04:00
postgres_fe.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
rusagestub.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
windowapi.h Update copyright for 2022 2022-01-07 19:04:57 -05:00