postgresql/src
Michael Paquier c030db3495 Rework order of end-of-recovery actions to delay timeline history write
A critical failure in one of the end-of-recovery actions that run before
the end-of-recovery record is written can cause PostgreSQL to behave
inconsistently with the rest of the cluster if a crash happens before
that final record is written.  Two examples of such failures are an
error while processing two-phase state files and an error while
operating on recovery.conf.  With this commit, the failures are still
considered FATAL, but the write of the timeline history file is delayed
as much as possible, so that the window between the moment the file is
written and the moment the end-of-recovery record is generated is
minimized.  This way, in the event of a crash or a failure, the new
timeline decided at promotion will not appear as taken to other nodes in
the cluster.  It is not really possible to reduce this window to zero,
hence one could still see failures if a crash happens between the
history file write and the end-of-recovery record, so any future code
should be careful when adding new end-of-recovery actions.  In the
original report from Magnus Hagander, the end-of-recovery failure was a
renamed recovery.conf: the timeline was already seen as taken, but the
subsequent processing of the now-missing recovery.conf caused the
startup process to stop with a FATAL error, and at the next startup the
system was inconsistent because of the on-disk changes which had already
happened.
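
The reordering described above can be illustrated with a toy model.  This
is only a sketch, not PostgreSQL code: FatalError, finish_recovery and the
event strings are hypothetical names chosen for illustration.  It shows
the idea that every action which can still fail runs first, and the
timeline history file is written only right before the end-of-recovery
record.

```python
class FatalError(Exception):
    """Stands in for a FATAL error raised by the startup process."""


def finish_recovery(fallible_actions, events):
    """Toy model of the reordered end-of-recovery sequence.

    All names here are hypothetical; this is not PostgreSQL code.
    """
    # 1. Run everything that can still fail (for example two-phase state
    #    processing or handling of recovery.conf) before touching the
    #    timeline.  An exception here leaves the timeline unclaimed.
    for name, action in fallible_actions:
        action()
        events.append(name)

    # 2. Only now claim the new timeline on disk ...
    events.append("write timeline history file")
    # 3. ... immediately followed by the end-of-recovery record.
    events.append("write end-of-recovery record")
    return events


def ok():
    pass


def missing_recovery_conf():
    raise FatalError("recovery.conf not found")


# A failing action aborts before the history file is written.
events = []
try:
    finish_recovery(
        [("process two-phase state", ok),
         ("handle recovery.conf", missing_recovery_conf)],
        events,
    )
except FatalError:
    pass
```

In this model a failure in any earlier action never leaves a history file
behind.  The window between steps 2 and 3 still exists, which matches the
caveat in the commit message that it cannot be reduced to zero.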

Processing of two-phase state files still needs some work, as corrupted
entries are simply ignored for now.  This is left as a future item; this
commit fixes the original complaint.

Reported-by: Magnus Hagander
Author: Heikki Linnakangas
Reviewed-by: Alexander Korotkov, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com
2018-07-09 10:26:18 +09:00
backend Rework order of end-of-recovery actions to delay timeline history write 2018-07-09 10:26:18 +09:00
bin Correct handling of fsync failures with tar mode of walmethods.c 2018-06-26 09:56:55 +09:00
common Fix error message on short read of pg_control 2018-05-18 17:53:12 +02:00
fe_utils Empty search_path in Autovacuum and non-psql/pgbench clients. 2018-02-26 07:39:47 -08:00
include Improve the performance of relation deletes during recovery. 2018-07-05 02:26:22 +09:00
interfaces Add PGTYPESchar_free() to avoid cross-module problems on Windows. 2018-06-26 19:49:52 +12:00
makefiles Always use -fPIC, not -fpic, when building shared libraries with gcc. 2017-06-01 13:32:55 -04:00
pl Fix misidentification of SQL statement type in plpgsql's exec_stmt_execsql. 2018-05-25 14:31:06 -04:00
port Fix simple_prompt() to disable echo on Windows when stdin != terminal. 2018-05-23 19:04:34 -04:00
template Remove "sco" and "unixware" ports. 2016-10-11 11:26:04 -04:00
test Prevent references to invalid relation pages after fresh promotion 2018-07-05 10:47:01 +09:00
timezone Update time zone data files to tzdata release 2018e. 2018-05-09 13:55:42 -04:00
tools Clear severity 5 perlcritic warnings from vcregress.pl 2018-05-06 07:39:05 -04:00
tutorial Phase 2 of pgindent updates. 2017-06-21 15:19:25 -04:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
DEVELOPERS Replace a couple of references to files that no longer exist in the source 2009-05-04 08:08:47 +00:00
Makefile Build src/test/isolation during "make" and "make install". 2017-11-22 20:18:52 -08:00
Makefile.global.in Be more thorough about cleaning out gcov litter. 2017-08-11 17:39:27 -04:00
Makefile.shlib Fix make rules that generate multiple output files. 2018-03-23 13:45:38 -04:00
nls-global.mk nls-global.mk: search build dir for source files, too 2016-06-07 18:55:18 -04:00