postgresql/src
Michael Paquier 7c525519d8 Avoid duplicate XIDs at recovery when building initial snapshot
On a primary, XLOG_RUNNING_XACTS records are generated periodically to
allow recovery to build the initial state of transactions for a hot
standby.  The set of transaction IDs is created by scanning all the
entries in ProcArray.  However, this logic never accounted for the fact
that a two-phase transaction finishing its prepare can leave ProcArray
in a state where two entries share the same transaction ID: one for the
original transaction, which is cleared when the prepare finishes, and a
second, dummy, entry that tracks that the transaction is still running
after the prepare finishes.  This ensures the transaction is present
continuously, so that callers of, for example,
TransactionIdIsInProgress() always see it as alive.
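To make the duplicate concrete, here is a minimal sketch, not PostgreSQL code, of a snapshot scan over a simplified process array. `ProcSlot`, `INVALID_XID` and `collect_running_xids()` are hypothetical names; the only assumption carried over from PostgreSQL is that a transaction ID is a 32-bit unsigned integer with 0 meaning "invalid". The scan copies every valid XID it sees without deduplicating, so a preparing transaction whose original slot and dummy slot both still carry its XID is reported twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 32-bit XID, with 0 as the invalid value. */
typedef uint32_t TransactionId;
#define INVALID_XID ((TransactionId) 0)

/* Hypothetical, heavily simplified ProcArray slot. */
typedef struct
{
    TransactionId xid;      /* XID advertised by this slot, or INVALID_XID */
} ProcSlot;

/*
 * Copy every valid XID from slots[] into out[], in slot order, and return
 * how many were copied.  No deduplication is done: if a preparing
 * transaction's original slot and its dummy slot both advertise the same
 * XID at scan time, that XID is copied twice.
 */
size_t
collect_running_xids(const ProcSlot *slots, size_t nslots, TransactionId *out)
{
    size_t n = 0;

    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].xid != INVALID_XID)
            out[n++] = slots[i].xid;
    }
    return n;
}
```

With slots advertising {100, 101, 101, invalid}, the sketch returns three XIDs, two of them 101, which is the kind of duplicated list the commit describes.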

So, if a standby snapshot is taken (generating XLOG_RUNNING_XACTS) while
a two-phase transaction finishes its prepare, the record can end up
with duplicated XIDs, a state that is expected by design.  If this
record is applied on a standby to initialize its recovery state,
recovery simply fails.  The window is narrow, so the odds of facing
this failure are very low in practice.  It would be tempting to change
the generation of XLOG_RUNNING_XACTS so that duplicates are removed on
the source, but this would require holding ProcArrayLock for longer,
which would impact all workloads, particularly those making heavy use
of two-phase transactions.

XLOG_RUNNING_XACTS is also only actually used to initialize the standby
state at recovery, so the chosen solution is instead to discard
duplicates when applying the initial snapshot.
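Discarding duplicates on the applying side can be done cheaply once the XID list is sorted, because equal XIDs then sit next to each other and one linear pass can drop them. The following is a minimal sketch of that idea, not the patch itself; `sort_and_dedup_xids()` and `xid_cmp()` are hypothetical names, and the comparator deliberately orders XIDs by raw numeric value rather than with wraparound-aware comparison, since only grouping of equal values matters here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t TransactionId;

/* qsort() comparator on raw XID values (not wraparound-aware). */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    if (xa < xb)
        return -1;
    if (xa > xb)
        return 1;
    return 0;
}

/*
 * Sort xids[] in place, then compact it so each XID appears once.
 * Returns the number of distinct XIDs now at the front of the array.
 */
size_t
sort_and_dedup_xids(TransactionId *xids, size_t n)
{
    size_t nkeep;

    assert(xids != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(xids, n, sizeof(TransactionId), xid_cmp);

    /* After sorting, duplicates are adjacent: keep the first of each run. */
    nkeep = 1;
    for (size_t i = 1; i < n; i++)
    {
        if (xids[i] != xids[nkeep - 1])
            xids[nkeep++] = xids[i];
    }
    return nkeep;
}
```

Deduplicating at apply time keeps the cost on the standby, which only processes this record when building its initial snapshot, instead of lengthening the time ProcArrayLock is held on the primary.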

Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru
Backpatch-through: 9.3
2018-10-14 22:23:54 +09:00
backend Avoid duplicate XIDs at recovery when building initial snapshot 2018-10-14 22:23:54 +09:00
bin Initialize random() in bootstrap/stand-alone postgres and in initdb. 2018-09-23 22:56:57 -07:00
common Enlarge find_other_exec's meager fgets buffer 2018-04-19 10:45:15 -03:00
include Back-patch addition of the ALLOCSET_FOO_SIZES macros. 2018-10-12 14:49:33 -04:00
interfaces Reduce an unnecessary O(N^3) loop in lexer. 2018-08-23 21:33:38 +01:00
makefiles Prevent accidental linking of system-supplied copies of libpq.so etc. 2018-07-09 17:23:32 -04:00
pl Make some fixes to allow building Postgres on macOS 10.14 ("Mojave"). 2018-09-25 13:23:29 -04:00
port Set snprintf.c's maximum number of NL arguments to be 31. 2018-10-02 12:41:28 -04:00
template Make some fixes to allow building Postgres on macOS 10.14 ("Mojave"). 2018-09-25 13:23:29 -04:00
test Remove abstime, reltime, tinterval tables from old regression databases. 2018-10-12 19:33:57 -04:00
timezone Update time zone data files to tzdata release 2018e. 2018-05-09 13:56:00 -04:00
tools Support building with Visual Studio 2017 2018-09-11 16:03:42 -04:00
tutorial pgindent run for 9.4 2014-05-06 12:12:18 -04:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
bcc32.mak Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
DEVELOPERS Replace a couple of references to files that no longer exist in the source 2009-05-04 08:08:47 +00:00
Makefile Install TAP test infrastructure so it's available for extension testing. 2016-09-23 15:50:00 -04:00
Makefile.global.in Make some fixes to allow building Postgres on macOS 10.14 ("Mojave"). 2018-09-25 13:23:29 -04:00
Makefile.shlib Prevent accidental linking of system-supplied copies of libpq.so etc. 2018-07-09 17:23:32 -04:00
nls-global.mk nls-global.mk: search build dir for source files, too 2016-06-07 18:55:18 -04:00
win32.mak Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00