postgresql/src/backend
Tom Lane 7573d122f1 Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running.  However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running.  If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin.  Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages.  The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.

The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages.  But that would add cycles,
as well as contention for the ProcArrayLock.  We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all.  In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load.  Light testing
says that this change is a wash performance-wise for normal loads.

I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.

Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.

Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 14:28:33 -04:00
..
access Fix access-off-end-of-array in clog.c. 2017-10-06 12:20:13 -04:00
bootstrap Protect against multixact members wraparound 2015-04-28 11:32:53 -03:00
catalog Include foreign tables in information_schema.table_privileges 2017-08-15 19:32:52 -04:00
commands Fix low-probability loss of NOTIFY messages due to XID wraparound. 2017-10-11 14:28:33 -04:00
executor Fix traversal of half-frozen update chains 2017-10-06 17:14:42 +02:00
foreign Arrange to cache FdwRoutine structs in foreign tables' relcache entries. 2013-03-06 23:48:09 -05:00
lib Misc comment typo fixes. 2014-12-16 16:39:33 +02:00
libpq Fix saving and restoring umask 2017-09-23 10:05:40 -04:00
main Make fallback implementation of pg_memory_barrier() work in 9.2 and 9.3. 2016-04-16 10:42:07 -04:00
nodes Fix improper repetition of previous results from a hashed aggregate. 2016-08-24 14:37:51 -04:00
optimizer Spelling fixes 2017-03-14 13:45:45 -04:00
parser Add missing ALTER USER variants 2017-08-03 21:29:36 -04:00
po Translation updates 2017-08-28 10:09:04 -04:00
port Avoid depending on non-POSIX behavior of fcntl(2). 2017-04-21 15:55:56 -04:00
postmaster On Windows, retry process creation if we fail to reserve shared memory. 2017-07-10 11:00:09 -04:00
regex Fix regexport.c to behave sanely with lookaround constraints. 2017-04-13 17:18:35 -04:00
replication Fix coding rules violations in walreceiver.c 2017-10-03 14:58:25 +02:00
rewrite Fix multiple assignments to a column of a domain type. 2017-07-11 16:48:59 -04:00
snowball Fix ancient encoding error in hungarian.stop. 2014-06-10 22:48:39 -04:00
storage Fix race condition in predicate-lock init code in EXEC_BACKEND builds. 2017-07-24 16:45:47 -04:00
tcop Unify SIGHUP handling between normal and walsender backends. 2017-06-05 19:18:16 -07:00
tsearch Reduce memory usage of tsvector type analyze function. 2017-07-12 22:04:08 +03:00
utils Fix low-probability loss of NOTIFY messages due to XID wraparound. 2017-10-11 14:28:33 -04:00
.gitignore Add gitignore for mingw/cygwin build outputs 2011-06-09 18:11:47 +02:00
common.mk Call check_keywords.pl in maintainer-check 2012-02-27 13:53:12 +02:00
Makefile AIX: Link the postgres executable with -Wl,-brtllib. 2015-07-15 21:00:30 -04:00
nls.mk xlogreader.c: Fix report_invalid_record translatability flag 2015-01-09 12:34:24 -03:00