Previously it was impossible to terminate these programs via control-C
while they were prompting for a password. We can fix that trivially
for their initial password prompts, by moving setup of the SIGINT
handler from just before to just after their initial GetConnection()
calls.
This fix doesn't permit escaping out of later re-prompts, but those
should be exceedingly rare, since the user's password or the server's
authentication setup would have to have changed meanwhile. We
considered applying a fix similar to commit 46d665bc2, but that
seemed more complicated than it'd be worth. Moreover, this way is
back-patchable, which that wasn't.
The misbehavior exists in all supported versions, so back-patch to all.
Tom Lane and Nathan Bossart
Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
The error handling here was a mess, as a result of a fundamentally
bad design (relying on errno to keep its value much longer than is
safe to assume) as well as a lot of just plain sloppiness, both as
to noticing errors at all and as to reporting the correct errno.
Moreover, the recent addition of LZ4 compression broke things
completely, because liblz4 doesn't use errno to report errors.
To improve matters, keep the error state in the DirectoryMethodData or
TarMethodData struct, and add a string field so we can handle cases
that don't set errno. (The tar methods already had a version of this,
but it can be done more efficiently since all these cases use a
constant error string.) Make the dir and tar methods handle errors
in basically identical ways, which they didn't before.
This requires copying errno into the state struct in a lot of places,
which is a bit tedious, but it has the virtue that we can get rid of
ad-hoc code to save and restore errno in a number of places ... not
to mention that it fixes other places that should've saved/restored
errno but neglected to.
In passing, fix some pointlessly static buffers to be ordinary
local variables.
There remains an issue about exactly how to handle errors from
fsync(), but that seems like material for its own patch.
While the LZ4 problems are new, all the rest of this is fixes for
old bugs, so backpatch to v10 where walmethods.c was introduced.
Patch by me; thanks to Michael Paquier for review.
Discussion: https://postgr.es/m/1343113.1636489231@sss.pgh.pa.us
Make sure that the string parsing is limited by the size of the
destination buffer.
In pg_basebackup the available values sent from the server
is limited to two characters so there was no risk of overflow.
In pg_dump the buffer is bounded by MAXPGPATH, and thus the limit
must be inserted via preprocessor expansion and the buffer increased
by one to account for the terminator. There is no risk of overflow
here, since in this case, the buffer scanned is smaller than the
destination buffer.
Backpatch the pg_basebackup fix to 11 where it was introduced, and
the pg_dump fix all the way down to 9.6.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/B14D3D7B-F98C-4E20-9459-C122C67647FB@yesql.se
Backpatch-through: 11 and 9.6
These have been introduced by 7fbe0c8, and could happen for
pg_basebackup and pg_receivewal.
Per report from Coverity for the ones in walmethods.c, I have spotted
the ones in receivelog.c after more review.
Backpatch-through: 10
The logic handling the opening of new WAL segments was fuzzy when using
--compress if a partial, non-compressed, segment with the same base name
existed in the repository storing those files. In this case, using
--compress would cause the code to first check for the existence and the
size of a non-compressed segment, followed by the opening of a new
compressed, partial, segment. The code was accidentally working
correctly on most platforms as the buildfarm has proved, except
bowerbird where gzflush() could fail in this code path. It is wrong
anyway to take the code path used pre-padding when creating a new
partial, non-compressed, segment, so let's fix it.
Note that this issue exists when users mix successive runs of
pg_receivewal with or without compression, as discovered with the tests
introduced by ffc9dda.
While on it, this refactors the code so as code paths that need to know
about the ".gz" suffix are down from four to one in walmethods.c, easing
a bit the introduction of new compression methods. This addresses a
second issue where log messages generated for an unexpected failure
would not show the compressed segment name involved, which was
confusing, printing instead the name of the non-compressed equivalent.
Reported-by: Georgios Kokolatos
Discussion: https://postgr.es/m/YPDLz2x3o1aX2wRh@paquier.xyz
Backpatch-through: 10
This commit back-patches the v12-era commits a73d08319, cc92cca43,
and 7570df0f3 into supported pre-v12 branches. The net effect is to
eliminate our former dependency on the never-standard sys_siglist[]
array, instead using POSIX-standard strsignal(3).
What motivates doing this now is that glibc just removed sys_siglist[]
from the set of symbols available to newly-built programs. While our
code can survive without sys_siglist[], it then fails to print any
description of the signal that killed a child process, which is a
non-negligible loss of friendliness. We can expect that people will
be wanting to build the back branches on platforms that include this
change, so we need to do something.
Since strsignal(3) has existed for quite a long time, and we've not
had any trouble with these patches so far in v12, it seems safe to
back-patch into older branches.
Discussion: https://postgr.es/m/3179114.1594853308@sss.pgh.pa.us
A few places calling fwrite and gzwrite were not setting errno to ENOSPC
when reporting errors, as is customary; this led to some failures being
reported as
"could not write file: Success"
which makes us look silly. Make a few of these places in pg_dump and
pg_basebackup use our customary pattern.
Backpatch-to: 9.5
Author: Justin Pryzby <pryzby@telsasoft.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200611153753.GU14879@telsasoft.com
The defect suppressed a Standby Status Update message when bytes flushed
to disk had changed but bytes received had not changed. If
pg_recvlogical then exited with no intervening Standby Status Update,
the next pg_recvlogical repeated already-flushed records. The defect
could also cause superfluous messages, which are functionally harmless.
Back-patch to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20200502221647.GA3941274@rfd.leadboat.com
pg_recvlogical merely called PQfinish(), so the backend sent messages
after the disconnect. When that caused EPIPE in internal_flush(),
before a LogicalConfirmReceivedLocation(), the next pg_recvlogical would
repeat already-acknowledged records. Whether or not the defect causes
EPIPE, post-disconnect messages could contain an ErrorResponse that the
user should see. One properly ends PGRES_COPY_OUT by repeating
PQgetCopyData() until it returns a negative value. Augment one of the
tests to cover the case of WAL past --endpos. Back-patch to v10, where
commit 7c030783a5 first appeared. Before
that commit, pg_recvlogical never reached PGRES_COPY_OUT.
Reported by Thomas Munro.
Discussion: https://postgr.es/m/CAEepm=1MzM2Z_xNe4foGwZ1a+MO_2S9oYDq3M5D11=JDU_+0Nw@mail.gmail.com
The result of the query used to retrieve the WAL segment size from the
backend was not getting freed in two code paths. Both pg_basebackup and
pg_receivewal exit immediately if a failure happened on this query, so
this was not an actual problem, but it could be an issue if this code
gets used for other tools in different ways, be they future tools in
this code tree or external, existing, ones.
Oversight in commit fc49e24, so backpatch down to 11.
Author: Jie Zhang
Discussion: https://postgr.es/m/970ad9508461469b9450b64027842331@G08CNEXMBPEKD06.g08.fujitsu.local
Backpatch-through: 11
An instance of PostgreSQL crashing with a bad timing could leave behind
temporary pg_internal.init files, potentially causing failures when
verifying checksums. As the same exclusion lists are used between
pg_rewind, pg_checksums and basebackup.c, all those tools are extended
with prefix checks to keep everything in sync, with dedicated checks
added for pg_internal.init.
Backpatch down to 11, where pg_checksums (pg_verify_checksums in 11) and
checksum verification for base backups have been introduced.
Reported-by: Michael Banck
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, David Steele
Discussion: https://postgr.es/m/62031974fd8e941dd8351fbc8c7eff60d59c5338.camel@credativ.de
Backpatch-through: 11
Since the addition of fsync requests in bc34223 to make base backup data
consistent on disk once pg_basebackup finishes, each tablespace tar file
is individually flushed once completed, with an additional flush of the
parent directory when the base backup finishes. While holding a
connection to the server, a fsync request taking a long time may cause a
failure of the base backup, which is annoying for any integration. A
recent example of breakage can involve tcp_user_timeout, but
wal_sender_timeout can cause similar problems.
While reviewing the code, there was a second issue causing too many
fsync requests to be done for the same WAL data. As recursive fsyncs
are done at the end of the backup for both the plain and tar formats
from the base target directory where everything is written, it is fine
to disable fsyncs when fetching or streaming WAL.
Reported-by: Ryohei Takahashi
Author: Michael Paquier
Reviewed-by: Ryohei Takahashi
Discussion: https://postgr.es/m/OSBPR01MB4550DAE2F8C9502894A45AAB82BE0@OSBPR01MB4550.jpnprd01.prod.outlook.com
Backpatch-through: 10
ee9e145 has fixed the tests of pg_basebackup for checksums a first time,
still one seek() call missed the shot. Also, the data written in files
to emulate corruptions was not actually writing zeros as the quoting
style was incorrect.
Author: Michael Banck
Discussion: https://postgr.es/m/1550153276.796.35.camel@credativ.de
Backpatch-through: 11
This reverts commit f02259fe93, in the
v11 branch only.
The hack this required in initdb.c should probably have clued us that it
wasn't really ready, but we didn't get the hint. Subsequent developments
have made clear that it affected text-vs-binary behavior in a lot of
places, and there's no reason to think that any of those behavioral changes
are desirable. There's no time to fix this before 11beta4, so just revert
for the moment. We can keep working on this in HEAD, and maybe reconsider
a back-patch once we're satisfied things are stable.
(I take the blame for this fiasco, having encouraged Michael to back-patch
a change at the last possible moment before beta wrap.)
PostgreSQL uses a custom wrapper for open() and fopen() which is
concurrent-safe, allowing multiple processes to open and work on the
same file. This has a couple of advantages:
- pg_test_fsync does not handle O_DSYNC correctly otherwise, leading to
false claims that disks are unsafe.
- TAP tests can run into race conditions when a postmaster and pg_ctl
open postmaster.pid, fixing some random failures in the buildfam.
pg_upgrade is one frontend tool using workarounds to bypass file locking
issues with the log files it generates, however the interactions with
pg_ctl are proving to be tedious to get rid of, so this is left for
later.
Author: Laurenz Albe
Reviewed-by: Michael Paquier, Kuntal Ghosh
Discussion: https://postgr.es/m/1527846213.2475.31.camel@cybertec.at
Discussion: https://postgr.es/m/16922.1520722108@sss.pgh.pa.us
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd. However, that policy's
been ignored in an increasing number of places. We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway. But this is not
something to rely on. Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.
To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.
I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster. I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.
Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.
Patch by me; thanks to Michael Paquier for review.
Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
6cb3372 enforces errno to ENOSPC when less bytes than what is expected
have been written when it is unset, though it forgot to properly reset
errno before doing a system call to write(), causing errno to
potentially come from a previous system call.
Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us
This file has been missing the fact that it needs to report back to
callers a proper failure on fsync calls. I have spotted the one in
tar_finish() while Kuntal has spotted the one in tar_close().
Backpatch down to 10 where this code has been introduced.
Reported by: Michael Paquier, Kuntal Ghosh
Author: Michael Paquier
Reviewed-by: Kuntal Ghosh, Magnus Hagander
Discussion: https://postgr.es/m/20180625024356.GD1146@paquier.xyz
System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set. Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.
This patch uses the brute-force approach of correcting all those code
paths. Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.
Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
The vertical tightness settings collapse vertical whitespace between
opening and closing brackets (parentheses, square brakets and braces).
This can make data structures in particular harder to read, and is not
very consistent with our style in non-Perl code. This patch restricts
that setting to parentheses only, and reformats all the perl code
accordingly. Not applying this to parentheses has some unfortunate
effects, so the consensus is to keep the setting for parentheses and not
for the others.
The diff for this patch does highlight some places where structures
should have trailing commas. They can be added manually, as there is no
automatic tool to do so.
Discussion: https://postgr.es/m/a2f2b87c-56be-c070-bfc0-36288b4b41c1@2ndQuadrant.com
The predecessor test boiled down to "PQserverVersion(NULL) >= 100000",
which is always false. No release includes that, so it could not have
reintroduced CVE-2018-1058. Back-patch to 9.4, like the addition of the
predecessor in commit 8d2814f274.
Discussion: https://postgr.es/m/20180422215551.GB2676194@rfd.leadboat.com