postgresql/src
Amit Kapila 4909b38af0 Fix data loss in logical replication.
Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ...  that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.

This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.

We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.

The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
2025-04-10 13:14:40 +05:30
..
backend Fix data loss in logical replication. 2025-04-10 13:14:40 +05:30
bin Fix incorrect format placeholders 2025-04-10 08:04:35 +02:00
common Use XLOG_CONTROL_FILE macro consistently for control file name. 2025-04-07 09:27:33 +09:00
fe_utils add new list type simple_oid_string_list to fe-utils/simple_list 2025-04-04 16:01:22 -04:00
include Fix data loss in logical replication. 2025-04-10 13:14:40 +05:30
interfaces libpq: Fix some issues in TAP tests for service files 2025-04-07 12:55:09 +09:00
makefiles Add support for basic NUMA awareness 2025-04-07 23:08:17 +02:00
pl plpython: Add test for returning Python set from SETOF function 2025-04-03 11:09:50 +02:00
port Cleanup of pg_numa.c 2025-04-09 21:50:17 +02:00
template thread-safety: gmtime_r(), localtime_r() 2024-08-23 07:43:04 +02:00
test Fix performance issue in deadlock-parallel isolation test. 2025-04-09 12:28:34 -04:00
timezone pg_noreturn to replace pg_attribute_noreturn() 2025-03-13 12:37:26 +01:00
tools Introduce file_copy_method setting. 2025-04-08 21:35:38 +12:00
tutorial Doc: simplify the tutorial's window-function examples. 2025-01-21 14:43:21 -05:00
.gitignore
DEVELOPERS
Makefile Remove distprep 2023-11-06 15:18:04 +01:00
Makefile.global.in Add support for basic NUMA awareness 2025-04-07 23:08:17 +02:00
Makefile.shlib Remove AIX support 2024-02-28 15:17:23 +04:00
meson.build Update copyright for 2025 2025-01-01 11:21:55 -05:00
nls-global.mk Remove distprep 2023-11-06 15:18:04 +01:00