postgresql/src
Peter Geoghegan 9a9db08ae4 Fix replica backward scan race condition.
It was possible for the logic used by backward scans (which must reason
about concurrent page splits/deletions in its own peculiar way) to
become confused when running on a replica.  Concurrent replay of a WAL
record that describes the second phase of page deletion could cause
_bt_walk_left() to get confused.  btree_xlog_unlink_page() simply failed
to adhere to the same locking protocol that we use on the primary, which
is obviously wrong once you consider these two disparate functions
together.  This bug is present in all stable branches.

More concretely, the problem was that nothing stopped _bt_walk_left()
from observing inconsistencies between the deletion's target page and
its original sibling pages when running on a replica.  This is true even
though the second phase of page deletion is supposed to work as a single
atomic action.  Queries running on replicas raised "could not find left
sibling of block %u in index %s" can't-happen errors when they went back
to their scan's "original" page and observed that the page had not been
marked deleted (even though it really was concurrently deleted).
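The failure mode can be illustrated with a toy model.  This is only a
sketch under stated assumptions: ToyPage, toy_walk_left() and the retry
bound of 4 are hypothetical stand-ins invented for illustration, not the
nbtree page layout or the real _bt_walk_left() code.  It shows why a
reader that sees the sibling links already rewired, while the target is
not yet marked deleted, has no way to recover.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a btree page (not PostgreSQL's layout). */
typedef struct ToyPage
{
    struct ToyPage *prev;       /* left sibling link */
    struct ToyPage *next;       /* right sibling link */
    bool            deleted;    /* set once the page is fully deleted */
} ToyPage;

typedef enum { WALK_FOUND, WALK_RETRY, WALK_ERROR } WalkResult;

/*
 * Toy version of the check a backward scan makes.  Starting from the left
 * link remembered for "orig", step right a bounded number of times looking
 * for a live page whose right link still points at orig.
 */
static WalkResult
toy_walk_left(const ToyPage *orig, const ToyPage *remembered_left)
{
    const ToyPage *p = remembered_left;

    for (int tries = 0; p != NULL && tries < 4; tries++)
    {
        if (!p->deleted && p->next == orig)
            return WALK_FOUND;
        p = p->next;
    }

    /*
     * Nothing points at orig any more.  That is explainable only if orig
     * was deleted or its left link changed; the caller can then retry.
     */
    if (orig->deleted || orig->prev != remembered_left)
        return WALK_RETRY;

    /* Links were rewired but orig still looks live: the can't-happen case. */
    return WALK_ERROR;
}
```

In this model, a state where the left sibling already points past the
target while the target is still unmarked yields WALK_ERROR, whereas the
same links with the target marked deleted yield WALK_RETRY.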

There is no evidence that this actually happened in the real world.  The
issue came to light during unrelated feature development work.  Note
that _bt_walk_left() is the only code that cares about the difference
between a half-dead page and a fully deleted page that isn't also
exclusively used by nbtree VACUUM (unless you include contrib/amcheck
code).  It seems very likely that backward scans are the only thing that
could become confused by the inconsistency.  Even amcheck's complex
bt_right_page_check_scankey() dance was unaffected.

To fix, teach btree_xlog_unlink_page() to lock the left sibling, target,
and right sibling pages in that order before releasing any locks (just
like _bt_unlink_halfdead_page()).  This is the simplest possible
approach.  There doesn't seem to be any opportunity to be more clever
about lock acquisition in the REDO routine, and it hardly seems worth
the trouble in any case.
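The locking order described above can be sketched as follows.  This is a
minimal illustration, not the actual REDO routine: LockedPage,
locked_page_init() and toy_unlink_page() are hypothetical names, and
pthread mutexes stand in for buffer locks.  It assumes the caller
supplies all three pages, that the left sibling may be absent, and that
a right sibling always exists.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page plus its buffer lock. */
typedef struct LockedPage
{
    pthread_mutex_t     lock;
    struct LockedPage  *prev;       /* left sibling link */
    struct LockedPage  *next;       /* right sibling link */
    bool                deleted;
} LockedPage;

static void
locked_page_init(LockedPage *p, LockedPage *prev, LockedPage *next)
{
    pthread_mutex_init(&p->lock, NULL);
    p->prev = prev;
    p->next = next;
    p->deleted = false;
}

/*
 * Unlink "target" as one atomic action: take the locks left to right,
 * make every change, and only then release anything.
 */
static void
toy_unlink_page(LockedPage *left, LockedPage *target, LockedPage *right)
{
    assert(target != NULL && right != NULL);

    /* Acquire in a fixed order: left sibling, target, right sibling. */
    if (left != NULL)
        pthread_mutex_lock(&left->lock);
    pthread_mutex_lock(&target->lock);
    pthread_mutex_lock(&right->lock);

    /* All three pages change while every lock is still held. */
    if (left != NULL)
        left->next = right;
    right->prev = left;
    target->deleted = true;

    /* No lock is released until the whole update is in place. */
    pthread_mutex_unlock(&right->lock);
    pthread_mutex_unlock(&target->lock);
    if (left != NULL)
        pthread_mutex_unlock(&left->lock);
}
```

Because a reader must take a page's lock to inspect it, it sees the three
pages either entirely before or entirely after the unlink in this model.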

This fix might enable contrib/amcheck verification of leaf page sibling
links with only an AccessShareLock on the relation.  An amcheck patch
from Andrey Borodin was rejected back in January because it clashed with
btree_xlog_unlink_page()'s lax approach to locking pages.  It now seems
likely that the real problem was with btree_xlog_unlink_page(), not the
patch.

This is a low severity, low likelihood bug, so no backpatch.

Author: Michail Nikolaev
Diagnosed-By: Michail Nikolaev
Discussion: https://postgr.es/m/CANtu0ohkR-evAWbpzJu54V8eCOtqjJyYp3PQ_SGoBTRGXWhWRw@mail.gmail.com
2020-08-03 15:54:38 -07:00
Name                Last commit date            Last commit message
backend             2020-08-03 15:54:38 -07:00  Fix replica backward scan race condition.
bin                 2020-08-03 14:02:35 -04:00  Remove unnecessary "DISTINCT" in psql's queries for \dAc and \dAf.
common              2020-06-30 13:26:11 +09:00  Prevent compilation of frontend-only files in src/common/ with backend
fe_utils            2020-06-29 17:12:38 -04:00  Mop up some no-longer-necessary hacks around printf %.*s format.
include             2020-08-03 12:23:05 +12:00  Correct comment in simplehash.h.
interfaces          2020-08-03 09:46:12 -04:00  Fix behavior of ecpg's "EXEC SQL elif name".
makefiles           2020-01-15 15:06:12 +01:00  Remove libpq.rc, use win32ver.rc for libpq
pl                  2020-07-14 19:55:25 +02:00  Fix -Wcast-function-type warnings
port                2020-07-25 14:50:59 -07:00  Remove optimization for RAND_poll() failing.
template            2019-12-22 23:20:00 +01:00  Fix compiler warning for ppoll() on Cygwin
test                2020-08-03 12:49:36 +12:00  Fix rare failure in LDAP tests.
timezone            2020-07-17 11:03:55 -04:00  Ensure that distributed timezone abbreviation files are plain ASCII.
tools               2020-07-24 10:42:08 +02:00  Rename configure.in to configure.ac
tutorial            2020-01-01 12:21:45 -05:00  Update copyrights for 2020
.gitignore          2010-09-22 12:57:04 +02:00  Convert cvsignore to gitignore, and add .gitignore for build targets.
DEVELOPERS          2009-05-04 08:08:47 +00:00  Replace a couple of references to files that no longer exist in the source
Makefile            2018-04-09 16:42:10 -04:00  Fix partial-build problems introduced by having more generated headers.
Makefile.global.in  2020-04-24 09:52:59 +02:00  Update Unicode data to Unicode 13.0.0 and CLDR 37
Makefile.shlib      2020-02-28 13:12:21 +01:00  Add PostgreSQL home page to --help output
nls-global.mk       2019-09-23 09:04:20 +02:00  NLS: Fix backend gettext triggers