postgresql/src
Alvaro Herrera a5736bf754 Fix traversal of half-frozen update chains
When some tuple versions in an update chain are frozen due to them being
older than freeze_min_age, the xmax/xmin trail can become broken.  This
breaks HOT (and probably other things).  A subsequent VACUUM can break
things in more serious ways, such as leaving orphan heap-only tuples
whose root HOT redirect items were removed.  This can be seen because
index creation (or REINDEX) complain like
  ERROR:  XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t"

Because of relfrozenxid contraints, we cannot avoid the freezing of the
early tuples, so we must cope with the results: whenever we see an Xmin
of FrozenTransactionId, consider it a match for whatever the previous
Xmax value was.

This problem seems to have appeared in 9.3 with multixact changes,
though strictly speaking it seems unrelated.

Since 9.4 we have commit 37484ad2a "Change the way we mark tuples as
frozen", so the fix is simple: just compare the raw Xmin (still stored
in the tuple header, since freezing merely set an infomask bit) to the
Xmax.  But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so
the original value is lost and we have nothing to compare the Xmax with.
To cope with that case we need to compare the Xmin with FrozenXid,
assume it's a match, and hope for the best.  Sadly, since you can
pg_upgrade a 9.3 instance containing half-frozen pages to newer
releases, we need to keep the old check in newer versions too, which
seems a bit brittle; I hope we can somehow get rid of that.

I didn't optimize the new function for performance.  The new coding is
probably a bit slower than before, since there is a function call rather
than a straight comparison, but I'd rather have it work correctly than
be fast but wrong.

This is a followup after 20b6552242 fixed a few related problems.
Apparently, in 9.6 and up there are more ways to get into trouble, but
in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so
there must be a separate bug.

Reported-by: Peter Geoghegan
Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood,
	Yi Wen Wong, Álvaro
Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
2017-10-06 17:20:01 +02:00
..
backend Fix traversal of half-frozen update chains 2017-10-06 17:20:01 +02:00
bin Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h. 2017-10-01 15:36:14 -07:00
common Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h. 2017-10-01 15:36:14 -07:00
fe_utils Provide a test for variable existence in psql 2017-09-21 19:02:23 -04:00
include Fix traversal of half-frozen update chains 2017-10-06 17:20:01 +02:00
interfaces Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h. 2017-10-01 15:36:14 -07:00
makefiles Always use -fPIC, not -fpic, when building shared libraries with gcc. 2017-06-01 13:32:55 -04:00
pl Use Py_RETURN_NONE where suitable 2017-09-29 16:51:39 -04:00
port Yet another pg_bswap typo in a windows only file. 2017-10-01 20:05:27 -07:00
template Remove "sco" and "unixware" ports. 2016-10-11 11:26:04 -04:00
test Basic partition-wise join functionality. 2017-10-06 11:11:10 -04:00
timezone Sync our copy of the timezone library with IANA tzcode master. 2017-09-22 00:04:29 -04:00
tools Attempt to adapt windows build for 212e6f34d5. 2017-10-04 09:32:02 -07:00
tutorial Distinguish selectivity of < from <= and > from >=. 2017-09-13 11:12:39 -04:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
DEVELOPERS Replace a couple of references to files that no longer exist in the source 2009-05-04 08:08:47 +00:00
Makefile Remove redundant coverage target 2017-02-17 08:56:57 -05:00
Makefile.global.in Add PostgreSQL version to coverage output 2017-09-29 08:54:47 -04:00
Makefile.shlib Remove support for bcc and msvc standalone libpq builds 2017-04-11 15:22:21 +02:00
nls-global.mk nls-global.mk: search build dir for source files, too 2016-06-07 18:55:18 -04:00