Starting on 2023-08-03, this intermittently terminated a "pgbench -C"
test in CI. It could affect a high-client-count "pgbench" without "-C".
While parallel reindexdb and vacuumdb reach the same problematic check,
sufficient client count and/or connection turnover is less plausible for
them. Given the lack of examples from the buildfarm or from manual
builds, reproducing this must entail rare operating system
configurations. Also correct the associated error message, which was
wrong for non-Windows. Back-patch to v12, where the pgbench check first
appeared. While v11 vacuumdb has the problematic check, reaching it
with typical vacuumdb usage is implausible.
Reviewed by Thomas Munro.
Discussion: https://postgr.es/m/CA+hUKG+JwvTNdcyJTriy9BbtzF1veSRQ=9M_ZKFn9_LqE7Kp7Q@mail.gmail.com
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.
To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.
An invalid database cannot be connected to anymore, but can still be
dropped.
Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.
Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.
Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
Slow hosts may avoid load-induced, spurious failures by setting
environment variable PG_TEST_TIMEOUT_DEFAULT to some number of seconds
greater than 180. Developers may see faster failures by setting that
environment variable to some lesser number of seconds. In tests, write
$PostgreSQL::Test::Utils::timeout_default wherever the convention has
been to write 180. This change raises the default for some briefer
timeouts. Back-patch to v10 (all supported versions).
Discussion: https://postgr.es/m/20220218052842.GA3627003@rfd.leadboat.com
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.
This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them. However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.
Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.) Those functions were already
made proof against this class of problem, cf CVE-2006-2313.
To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it. (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem. So we just want a
version for those callers that don't have any better way of handling
this issue.)
Per private report from houjingyi. Back-patch to all supported branches.
The error messages, docs, and one of the options were using
'parallel degree' to indicate parallelism used by vacuum command. We
normally use 'parallel workers' at other places so change it for parallel
vacuum accordingly.
Author: Bharath Rupireddy
Reviewed-by: Dilip Kumar, Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/CALj2ACWz=PYrrFXVsEKb9J1aiX4raA+UBe02hdRp_zqDkrWUiw@mail.gmail.com
The same test for REINDEX (VERBOSE) was done twice, while it is clear
that the second test should use --concurrently. Issue introduced in
5dc92b8, for what looks like a copy-paste mistake.
Reviewed-by: Mark Dilger
Discussion: https://postgr.es/m/A7AE97EA-F4B0-4CAB-8FFF-3FECD31F9D63@enterprisedb.com
Backpatch-through: 12
We found last February that the error-case tests added by commit
008cf0409 failed on OpenBSD, because that platform doesn't really
check locale names. At the time it seemed that that was only an issue
for LC_CTYPE, but testing on a more recent version of OpenBSD shows
that it's now equally lax about LC_COLLATE.
Rather than dropping the LC_COLLATE test too, put back LC_CTYPE
(reverting c4b0edb07), and adjust these tests to accept the different
error message that we get if setlocale() doesn't reject a bogus locale
name. The point of these tests is not really what the backend does
with the locale name, but to show that createdb quotes funny locale
names safely; so we're not losing test reliability this way.
Back-patch as appropriate.
Discussion: https://postgr.es/m/231373.1610058324@sss.pgh.pa.us
When told to process all databases, clusterdb, reindexdb, and vacuumdb
would reconnect by replacing their --maintenance-db parameter with the
name of the target database. If that parameter is a connstring (which
has been allowed for a long time, though we failed to document that
before this patch), we'd lose any other options it might specify, for
example SSL or GSS parameters, possibly resulting in failure to connect.
Thus, this is the same bug as commit a45bc8a4f fixed in pg_dump and
pg_restore. We can fix it in the same way, by using libpq's rules for
handling multiple "dbname" parameters to add the target database name
separately. I chose to apply the same refactoring approach as in that
patch, with a struct to handle the command line parameters that need to
be passed through to connectDatabase. (Maybe someday we can unify the
very similar functions here and in pg_dump/pg_restore.)
Per Peter Eisentraut's comments on bug #16604. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/16604-933f4b8791227b15@postgresql.org
Any libpq client can use the header. Clients include backend components
postgres_fdw, dblink, and logical replication apply worker. Back-patch
to v10, because another fix needs this. In released branches, just copy
the header and keep the original.
As it stands, this flag is only set when we've successfully sent a
cancel request, not if we get SIGINT and then fail to send a cancel.
However, for almost all callers, that's the Wrong Thing: we'd prefer
to abort processing after control-C even if no cancel could be sent.
As an example, since commit 1d468b9ad "pgbench -i" fails to give up
sending COPY data even after control-C, if the postmaster has been
stopped, which is clearly not what the code intends and not what anyone
would want. (The fact that it keeps going at all is the fault of a
separate bug in libpq, but not letting CancelRequested become set is
clearly not what we want here.)
The sole exception, as far as I can find, is that scripts_parallel.c's
ParallelSlotsGetIdle tries to consume a query result after issuing a
cancel, which of course might not terminate quickly if no cancel
happened. But that behavior was poorly thought out too. No user of
ParallelSlotsGetIdle tries to continue processing after a cancel,
so there is really no point in trying to clear the connection's state.
Moreover this has the same defect as for other users of cancel.c,
that if the cancel request fails for some reason then we end up with
control-C being completely ignored. (On top of that, select_loop failed
to distinguish clearly between SIGINT and other reasons for select(2)
failing, which means that it's possible that the existing code would
think that a cancel has been sent when it hasn't.)
Hence, redefine CancelRequested as simply meaning that SIGINT was
received. We could add a second flag with the other meaning, but
in the absence of any compelling argument why such a flag is needed,
I think it would just offer an opportunity for future callers to
get it wrong. Also remove the consumeQueryResult call in
ParallelSlotsGetIdle's failure exit. In passing, simplify the
API of select_loop.
It would now be possible to re-unify psql's cancel_pressed with
CancelRequested, partly undoing 5d43c3c54. But I'm not really
convinced that that's worth the trouble, so I left psql alone,
other than fixing a misleading comment.
This code is new in v13 (cf a4fd3aa71), so no need for back-patch.
Per investigation of a complaint from Andres Freund.
Discussion: https://postgr.es/m/20200603201242.ofvm4jztpqytwfye@alap3.anarazel.de
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
OpenBSD falls back to "C" when using an incorrect input with setlocale()
and LC_CTYPE, causing this test, introduced by 008cf04, to fail. This
removes the culprit test to avoid the portability issue.
Per report from Robert Haas, via buildfarm member curculio.
Discussion: https://postgr.es/m/CA+TgmoZ6ddh3mHD9gU8DvNYoFmuJaYYn1+4AvZNp25vTdRwCAQ@mail.gmail.com
Backpatch-through: 11
The original coding failed to properly quote those arguments, leading to
failures when using quotes in the values used. As the quoting can be
encoding-sensitive, the connection to the backend needs to be taken
before applying the correct quoting.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200214041004.GB1998@paquier.xyz
Backpatch-through: 9.5
Commit 40d964ec99 allowed vacuum command to leverage multiple CPUs by
invoking parallel workers to process indexes. This commit provides a
'--parallel' option to specify the parallel degree used by vacuum command.
Author: Masahiko Sawada, with few modifications by me
Reviewed-by: Mahendra Singh and Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
This variable is now part of the refactored code for query cancellation
in fe_utils. This fixes an oversight in commit a4fd3aa. While on it,
improve some header includes in bin/scripts/.
Author: Michael Paquier
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/20191203101625.GF1634@paquier.xyz
Originally, this code was duplicated in src/bin/psql/ and
src/bin/scripts/, but it can be useful for other frontend applications,
like pgbench. This refactoring offers the possibility to setup a custom
callback which would get called in the signal handler for SIGINT or when
the interruption console events happen on Windows.
Author: Fabien Coelho, with contributions from Michael Paquier
Reviewed-by: Álvaro Herrera, Ibrar Ahmed
Discussion: https://postgr.es/m/alpine.DEB.2.21.1910311939430.27369@lancre
This commit revert the commits to add a test case that tests the 'force'
option when there is an active backend connected to the database being
dropped.
This feature internally sends SIGTERM to all the backends connected to the
database being dropped and then the same is reported to the client. We
found that on Windows, the client end of the socket is not able to read
the data once we close the socket in the server which leads to loss of
error message which is not what we expect. We also observed similar
behavior in other cases like pg_terminate_backend(),
pg_ctl kill TERM <pid>. There are probably a few others like that. The
fix for this requires further study.
Discussion: https://postgr.es/m/E1iaD8h-0004us-K9@gemulon.postgresql.org
Specifying '-f' will add the 'force' option to the DROP DATABASE command
sent to the server. This will try to terminate all existing connections
to the target database before dropping it.
Author: Pavel Stehule
Reviewed-by: Vignesh C and Amit Kapila
Discussion: https://postgr.es/m/CAP_rwwmLJJbn70vLOZFpxGw3XD7nLB_7+NKz46H5EOO2k5H7OQ@mail.gmail.com