postgresql/src/include
Heikki Linnakangas 321ec54625 Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:

test=# listen c21;
ERROR:  58P01: could not access status of transaction 14279685
DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:1087
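
The file name in the DETAIL line follows from the XID in the ERROR line. A minimal sketch of that arithmetic, assuming the default 8 kB block size (2 status bits per transaction, hence 32768 transactions per page) and 32 pages per SLRU segment. The helper name is hypothetical and is not a PostgreSQL function:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed defaults: BLCKSZ = 8192 and 2 status bits per transaction. */
#define XACTS_PER_PAGE      (8192 * 4)
#define PAGES_PER_SEGMENT   32

/* Hypothetical helper: format the pg_xact segment file that holds xid. */
static void
clog_segment_name(uint32_t xid, char *buf, size_t buflen)
{
    uint32_t    pageno = xid / XACTS_PER_PAGE;
    uint32_t    segno = pageno / PAGES_PER_SEGMENT;

    /* Segment files are named with four uppercase hex digits. */
    snprintf(buf, buflen, "pg_xact/%04X", (unsigned int) segno);
}
```

For XID 14279685 this gives page 435 and segment 13, which is 000D in hex and matches the missing file in the report above.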

To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.
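
The freezing step can be sketched roughly as below. This is a simplified illustration and not the code from the commit: the entry struct, the array-based queue, and the toy CLOG (a bool array indexed by XID) are hypothetical stand-ins. It also assumes that an old XID that committed becomes FrozenTransactionId and one that did not becomes InvalidTransactionId.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical, simplified queue entry; the real one carries more fields. */
typedef struct
{
    TransactionId xid;          /* XID of the sender */
} QueueEntry;

/* Wraparound-aware "a is older than b"; valid for normal XIDs only. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Replace every normal XID older than new_cutoff, so that no entry still
 * refers to CLOG that is about to be truncated. clog_committed is a toy
 * stand-in for a CLOG lookup and must still cover the old XIDs, which is
 * why this has to run before the truncation.
 */
static void
freeze_queue_xids(QueueEntry *entries, size_t n, TransactionId new_cutoff,
                  const bool *clog_committed)
{
    for (size_t i = 0; i < n; i++)
    {
        TransactionId xid = entries[i].xid;

        if (xid < FirstNormalTransactionId)
            continue;           /* already frozen or invalid */
        if (!xid_precedes(xid, new_cutoff))
            continue;           /* CLOG for this XID is being kept */

        entries[i].xid = clog_committed[xid] ? FrozenTransactionId
                                             : InvalidTransactionId;
    }
}
```

Once an entry holds one of the two special values, a reader can decide whether the sender committed without opening any pg_xact file.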

Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.

This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:
- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
  for investigating and providing reproducible test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
  patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
  as committed in the notification queue, and
- Joel Jacobson for the final patch version. I hope I didn't forget
  anyone.

Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e027221d, which introduced the SLRU-based async
notification queue.

Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14
2025-11-12 21:00:42 +02:00
access Introduce XLogRecPtrIsValid() 2025-11-06 19:08:29 +01:00
archive Update copyright for 2025 2025-01-01 11:21:55 -05:00
backup Update copyright for 2025 2025-01-01 11:21:55 -05:00
bootstrap pg_noreturn to replace pg_attribute_noreturn() 2025-03-13 12:37:26 +01:00
catalog Fix a deadlock during ALTER SUBSCRIPTION ... DROP PUBLICATION. 2025-08-01 07:46:22 +00:00
commands Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY 2025-11-12 21:00:42 +02:00
common Use 'void *' for arbitrary buffers, 'uint8 *' for byte arrays 2025-05-08 22:01:25 +03:00
datatype Avoid using timezone Asia/Manila in regression tests. 2025-01-20 15:47:53 -05:00
executor Fix EvalPlanQual handling of foreign/custom joins in ExecScanFetch. 2025-10-15 17:15:01 +09:00
fe_utils Remove inappropriate inclusions of c.h and postgres_fe.h. 2025-04-27 16:58:57 -04:00
foreign Update copyright for 2025 2025-01-01 11:21:55 -05:00
jit Don't use double-quotes in #include's of system headers, redux. 2025-04-27 13:23:19 -04:00
lib Fix reset of incorrect hash iterator in GROUPING SETS queries 2025-10-18 16:07:41 +13:00
libpq Use 'void *' for arbitrary buffers, 'uint8 *' for byte arrays 2025-05-08 22:01:25 +03:00
mb With GB18030, prevent SIGSEGV from reading past end of allocation. 2025-05-05 04:52:04 -07:00
nodes Update obsolete comments in ResultRelInfo struct. 2025-08-17 19:40:01 +09:00
optimizer Disallow collecting transition tuples from child foreign tables. 2025-08-08 10:50:01 +09:00
parser Revert support for improved tracking of nested queries 2025-06-12 10:08:55 +09:00
partitioning Fix incorrect #endif comment 2025-03-10 13:36:04 +13:00
pch Update copyright for 2025 2025-01-01 11:21:55 -05:00
port Fix generic read and write barriers for Clang. 2025-11-08 12:28:15 +13:00
portability Update copyright for 2025 2025-01-01 11:21:55 -05:00
postmaster Fix incorrect const qualifier 2025-09-16 07:23:50 +02:00
regex Update copyright for 2025 2025-01-01 11:21:55 -05:00
replication Make invalid primary_slot_name follow standard GUC error reporting. 2025-10-22 20:10:58 +09:00
rewrite Refactor ChangeVarNodesExtended() using the custom callback 2025-05-07 11:10:16 +03:00
snowball Update to latest Snowball sources. 2025-02-18 21:13:54 -05:00
statistics Fix redefinition of typedef RangeVar. 2025-10-15 13:14:00 -05:00
storage aio: Stop using enum bitfields due to bad code generation 2025-08-27 19:12:50 -04:00
tcop Sync typedefs.list with the buildfarm. 2025-06-15 13:04:24 -04:00
tsearch Update copyright for 2025 2025-01-01 11:21:55 -05:00
utils Fix incorrect message-printing in win32security.c. 2025-10-13 17:56:45 -04:00
.gitignore Use <stdint.h> and <inttypes.h> for c.h integers. 2024-12-04 15:05:38 +13:00
c.h Remove INT64_HEX_FORMAT and UINT64_HEX_FORMAT 2025-08-06 10:58:06 +02:00
fmgr.h Avoid mixing designated and non-designated field initializers. 2025-03-27 11:06:30 -04:00
funcapi.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
getopt_long.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
Makefile Use <stdint.h> and <inttypes.h> for c.h integers. 2024-12-04 15:05:38 +13:00
meson.build Update copyright for 2025 2025-01-01 11:21:55 -05:00
miscadmin.h Avoid mixing void and integer in a conditional expression. 2025-11-02 12:31:01 -05:00
pg_config.h.in aio: Combine io_uring memory mappings, if supported 2025-07-07 21:04:03 -04:00
pg_config_manual.h Avoid invalidating all RelationSyncCache entries on publication rename. 2025-03-13 09:16:33 +05:30
pg_getopt.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
pg_trace.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
pgstat.h Don't include execnodes.h in replication/conflict.h 2025-09-25 14:52:19 +02:00
pgtar.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
pgtime.h Seek zone abbreviations in the IANA data before timezone_abbreviations. 2025-01-16 14:11:19 -05:00
port.h Add timingsafe_bcmp(), for constant-time memory comparison 2025-04-02 15:32:40 +03:00
postgres.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
postgres_ext.h Move pg_int64 back to postgres_ext.h 2025-09-16 10:48:44 +02:00
postgres_fe.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
varatt.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
windowapi.h Update copyright for 2025 2025-01-01 11:21:55 -05:00