Incrementing the level of the call stack reported is useful for
debugging purposes as it allows to control which part of the test is
exactly failing, especially if a test is structured with subroutines
that call routines from Test::More.
This adds more incrementations of $Test::Builder::Level where debugging
gets improved (for example it does not make sense for some paths like
pg_rewind where long subroutines are used).
A note is added to src/test/perl/README about that, based on a
suggestion from Andrew Dunstan and a wording coming from both of us.
Usage of Test::Builder::Level has spread in 12, so a backpatch down to
this version is done.
Reviewed-by: Andrew Dunstan, Peter Eisentraut, Daniel Gustafsson
Discussion: https://postgr.es/m/YV1CCFwgM1RV1LeS@paquier.xyz
Backpatch-through: 12
This fixes a loss of precision that occurs when the first input is
very close to 1, so that its logarithm is very small.
Formerly, during the initial low-precision calculation to estimate the
result weight, the logarithm was computed to a local rscale that was
capped to NUMERIC_MAX_DISPLAY_SCALE (1000). However, the base may be
as close as 1e-16383 to 1, hence its logarithm may be as small as
1e-16383, and so the local rscale needs to be allowed to exceed 16383,
otherwise all precision is lost, leading to a poor choice of rscale
for the full-precision calculation.
Fix this by removing the cap on the local rscale during the initial
low-precision calculation, as we already do in the full-precision
calculation. This doesn't change the fact that the initial calculation
is a low-precision approximation, computing the logarithm to around 8
significant digits, which is very fast, especially when the base is
very close to 1.
Patch by me, reviewed by Alvaro Herrera.
Discussion: https://postgr.es/m/CAEZATCV-Ceu%2BHpRMf416yUe4KKFv%3DtdgXQAe5-7S9tD%3D5E-T1g%40mail.gmail.com
3c5b0685b9 used setFilePointer() to set the position of the filehandle, but
passed the wrong filehandle, always leaving the position at 0. Instead of just
fixing that, remove use of setFilePointer(), we have a perl fd at this point,
so we can just use perl's seek().
Additionally, the perl filehandle wasn't closed, just the windows filehandle.
Reviewed-By: Andrew Dunstan <andrew@dunslane.net>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20211003173038.64mmhgxctfqn7wl6@alap3.anarazel.de
Backpatch: 9.6-, like 3c5b0685b9
Computing related timestamps by subtracting "N days" is sensitive
to the prevailing timezone, since we interpret that as "same local
time on the N'th prior day". Even though the intervals in question
are only two to four days, through remarkable bad luck they managed
to cross the end of Ramadan in 2014, causing the test's output to
change if timezone is set to Africa/Casablanca. (Maybe in other
Muslim areas as well; I didn't check.) There's absolutely no reason
for this test to exercise interval subtraction, so just get rid of
that and use plain timestamptz constants representing the intended
values.
Per report from Andres Freund. Back-patch to v10 where this test
script came in.
Discussion: https://postgr.es/m/20210930183641.7lh4jhvpipvromca@alap3.anarazel.de
Physical replication always ships WAL segment files to replicas once
they are complete. This is a problem if one WAL record is split across
a segment boundary and the primary server crashes before writing down
the segment with the next portion of the WAL record: WAL writing after
crash recovery would happily resume at the point where the broken record
started, overwriting that record ... but any standby or backup may have
already received a copy of that segment, and they are not rewinding.
This causes standbys to stop following the primary after the latter
crashes:
LOG: invalid contrecord length 7262 at A8/D9FFFBC8
because the standby is still trying to read the continuation record
(contrecord) for the original long WAL record, but it is not there and
it will never be. A workaround is to stop the replica, delete the WAL
file, and restart it -- at which point a fresh copy is brought over from
the primary. But that's pretty labor intensive, and I bet many users
would just give up and re-clone the standby instead.
A fix for this problem was already attempted in commit 515e3d84a0, but
it only addressed the case for the scenario of WAL archiving, so
streaming replication would still be a problem (as well as other things
such as taking a filesystem-level backup while the server is down after
having crashed), and it had performance scalability problems too; so it
had to be reverted.
This commit fixes the problem using an approach suggested by Andres
Freund, whereby the initial portion(s) of the split-up WAL record are
kept, and a special type of WAL record is written where the contrecord
was lost, so that WAL replay in the replica knows to skip the broken
parts. With this approach, we can continue to stream/archive segment
files as soon as they are complete, and replay of the broken records
will proceed across the crash point without a hitch.
Because a new type of WAL record is added, users should be careful to
upgrade standbys first, primaries later. Otherwise they risk the standby
being unable to start if the primary happens to write such a record.
A new TAP test that exercises this is added, but the portability of it
is yet to be seen.
This has been wrong since the introduction of physical replication, so
backpatch all the way back. In stable branches, keep the new
XLogReaderState members at the end of the struct, to avoid an ABI
break.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
Contrary to the output of native perl, Msys perl generates outputs with
CRLFs characters. There are already places in the TAP code where CRLFs
(\r\n) are automatically converted to LF (\n) on Msys, but we missed a
couple of places when running commands and using their output for
comparison, that would lead to failures.
This problem has been found thanks to the test added in 5adb067 using
TestLib::command_checks_all(), but after a closer look more code paths
were missing a filter.
This is backpatched all the way down to prevent any surprises if a new
test is introduced in stable branches.
Reviewed-by: Andrew Dunstan, Álvaro Herrera
Discussion: https://postgr.es/m/1252480.1631829409@sss.pgh.pa.us
Backpatch-through: 9.6
Casting a value that's already of a type with a specific typmod
to an unspecified typmod doesn't do anything so far as run-time
behavior is concerned. However, it really ought to change the
exposed type of the expression to match. Up to now,
coerce_type_typmod hasn't bothered with that, which creates gotchas
in contexts such as recursive unions. If for example one side of
the union is numeric(18,3), but it needs to be plain numeric to
match the other side, there's no direct way to express that.
This is easy enough to fix, by inserting a RelabelType to update the
exposed type of the expression. However, it's a bit nervous-making
to change this behavior, because it's stood for a really long time.
But no complaints have emerged about 14beta3, so go ahead and
back-patch.
Back-patch of 5c056b0c2 into previous supported branches.
Discussion: https://postgr.es/m/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ=uWWWfQ@mail.gmail.com
Discussion: https://postgr.es/m/1488389.1631984807@sss.pgh.pa.us
Commit 55dc86eca changed pull_varnos to use (if possible) the associated
ph_eval_at for a PlaceHolderVar. I missed a fine point though: we might
be looking at a PHV in the quals or tlist of a child appendrel, in which
case we need to compute a ph_eval_at value that's been translated in the
same way that the PHV itself has been (cf. adjust_appendrel_attrs).
Fortunately, enough info is available in the PlaceHolderInfo to make
such translation possible without additional outside data, so we don't
need another round of uglification of planner APIs. This is a little
bit complicated, but since it's a hard-to-hit corner case, I'm not much
worried about adding cycles here.
Per report from Jaime Casanova. Back-patch to v12, like the previous
commit.
Discussion: https://postgr.es/m/20210915230959.GB17635@ahch-to
We have long forbidden fetching backwards from a NO SCROLL cursor,
but the prohibition didn't extend to cases in which we rewind the
query altogether and then re-fetch forwards. I think the reason is
that this logic was mainly meant to protect plan nodes that can't
be run in the reverse direction. However, re-reading the query output
is problematic if the query is volatile (which includes SELECT FOR
UPDATE, not just queries with volatile functions): the re-read can
produce different results, which confuses the cursor navigation logic
completely. Another reason for disliking this approach is that some
code paths will either fetch backwards or rewind-and-fetch-forwards
depending on the distance to the target row; so that seemingly
identical use-cases may or may not draw the "cursor can only scan
forward" error. Hence, let's clean things up by disallowing rewind
as well as fetch-backwards in a NO SCROLL cursor.
Ordinarily we'd only make such a definitional change in HEAD, but
there is a third reason to consider this change now. Commit ba2c6d6ce
created some new user-visible anomalies for non-scrollable cursors
WITH HOLD, in that navigation in the cursor result got confused if the
cursor had been partially read before committing. The only good way
to resolve those anomalies is to forbid rewinding such a cursor, which
allows removal of the incorrect cursor state manipulations that
ba2c6d6ce added to PersistHoldablePortal.
To minimize the behavioral change in the back branches (including
v14), refuse to rewind a NO SCROLL cursor only when it has a holdStore,
ie has been held over from a previous transaction due to WITH HOLD.
This should avoid breaking most applications that have been sloppy
about whether to declare cursors as scrollable. We'll enforce the
prohibition across-the-board beginning in v15.
Back-patch to v11, as ba2c6d6ce was.
Discussion: https://postgr.es/m/3712911.1631207435@sss.pgh.pa.us
If we copy data-modifying CTEs from the original query to a replacement
query (from a DO INSTEAD rule), we must set hasModifyingCTE properly
in the replacement query. Failure to do this can cause various
unpleasantness, such as unsafe usage of parallel plans. The code also
neglected to propagate hasRecursive, though that's only cosmetic at
the moment.
A difficulty arises if the rule action is an INSERT...SELECT. We
attach the original query's RTEs and CTEs to the sub-SELECT Query, but
data-modifying CTEs are only allowed to appear in the topmost Query.
For the moment, throw an error in such cases. It would probably be
possible to avoid this error by attaching the CTEs to the top INSERT
Query instead; but that would require a bunch of new code to adjust
ctelevelsup references. Given the narrowness of the use-case, and
the need to back-patch this fix, it does not seem worth the trouble
for now. We can revisit this if we get field complaints.
Per report from Greg Nancarrow. Back-patch to all supported branches.
(The test case added here does not fail before v10, but there are
plenty of places checking top-level hasModifyingCTE in 9.6, so I have
no doubt that this code change is necessary there too.)
Greg Nancarrow and Tom Lane
Discussion: https://postgr.es/m/CAJcOf-f68DT=26YAMz_i0+Au3TcLO5oiHY5=fL6Sfuits6r+_w@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
Updates/Deletes on a relation were allowed even without replica identity
after we define the publication for all tables. This would later lead to
an error on subscribers. The reason was that for such publications we were
not invalidating the relcache and the publication information for
relations was not getting rebuilt. Similarly, we were not invalidating the
relcache after dropping of such publications which will prohibit
Updates/Deletes without replica identity even without any publication.
Author: Vignesh C and Hou Zhijie
Reviewed-by: Hou Zhijie, Kyotaro Horiguchi, Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/CALDaNm0pF6zeWqCA8TCe2sDuwFAy8fCqba=nHampCKag-qLixg@mail.gmail.com
Modern POSIX seems to require strtod() to accept "-NaN", but there's
nothing about NaN in SUSv2, and some of our oldest buildfarm members
don't like it. Let's try writing it as -'NaN' instead; that seems
to produce the same result, at least on Intel hardware.
Per buildfarm.
The IEEE 754 standard allows a wide variety of bit patterns for NaNs,
of which at least two ("NaN" and "-NaN") are pretty easy to produce
from SQL on most machines. This is problematic because our btree
comparison functions deem all NaNs to be equal, but our float hash
functions know nothing about NaNs and will happily produce varying
hash codes for them. That causes unexpected results from queries
that hash a column containing different NaN values. It could also
produce unexpected lookup failures when using a hash index on a
float column, i.e. "WHERE x = 'NaN'" will not find all the rows
it should.
To fix, special-case NaN in the float hash functions, not too much
unlike the existing special case that forces zero and minus zero
to hash the same. I arranged for the most vanilla sort of NaN
(that coming from the C99 NAN constant) to still have the same
hash code as before, to reduce the risk to existing hash indexes.
I dithered about whether to back-patch this into stable branches,
but ultimately decided to do so. It's a clear improvement for
queries that hash internally. If there is anybody who has -NaN
in a hash index, they'd be well advised to re-index after applying
this patch ... but the misbehavior if they don't will not be much
worse than the misbehavior they had before.
Per bug #17172 from Ma Liangzhu.
Discussion: https://postgr.es/m/17172-7505bea9e04e230f@postgresql.org
The check to test whether the subscription workers were restarting after a
change in the subscription was failing. The reason was that the test was
assuming the walsender started before it reaches the 'streaming' state and
the walsender was exiting due to an error before that. Now, the walsender
was erroring out before reaching the 'streaming' state because it tries to
acquire the slot before the previous walsender has exited.
In passing, improve the die messages so that it is easier to investigate
the failures in the future if any.
Reported-by: Michael Paquier, as per buildfarm
Author: Ajin Cherian
Reviewed-by: Masahiko Sawada, Amit Kapila
Backpatch-through: 10, where this test was introduced
Discussion: https://postgr.es/m/YRnhFxa9bo73wfpV@paquier.xyz
Commit 5be8ce82e8 added a new role to the stats_ext regression suite,
but the role name did not start with regress_ causing failures when
running with ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS. Fixed by
renaming the role to start with the expected regress_ prefix.
Backpatch-through: 10, same as the new regression test
Discussion: https://postgr.es/m/1F238937-7CC2-4703-A1B1-6DC225B8978A%40enterprisedb.com
When an ownership check on extended statistics object failed, the code
was calling aclcheck_error_type to report the failure, which is clearly
wrong, resulting in cache lookup errors. Fix by calling aclcheck_error.
This issue exists since the introduction of extended statistics, so
backpatch all the way back to PostgreSQL 10. It went unnoticed because
there were no tests triggering the error, so add one.
Reported-by: Mark Dilger
Backpatch-through: 10, where extended stats were introduced
Discussion: https://postgr.es/m/1F238937-7CC2-4703-A1B1-6DC225B8978A%40enterprisedb.com
There are two identical error messages about valid value of modulus for
hash partition, in PostgreSQL source code. Commit 0e1275fb07 improved
only one of them so that ambiguous word "positive" was avoided there,
and forgot to improve the other. This commit improves the other.
Which would reduce translator burden.
Back-pach to v11 where the error message exists.
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210819.170315.1413060634876301811.horikyota.ntt@gmail.com
Regexps like "(.){0}...\1" drew an "invalid backreference number".
That's not unreasonable on its face, since the capture group will
never be matched if it's iterated zero times. However, other engines
such as Perl's don't complain about this, nor do we throw an error for
related cases such as "(.)|\1", even though that backref can never
succeed either. Also, if the zero-iterations case happens at runtime
rather than compile time --- say, "(x)*...\1" when there's no "x" to
be found --- that's not an error, we just deem the backref to not
match. Making this even less defensible, no error was thrown for
nested cases such as "((.)){0}...\2"; and to add insult to injury,
those cases could result in assertion failures instead. (It seems
that nothing especially bad happened in non-assert builds, though.)
Let's just fix it so that no error is thrown and instead the backref
is deemed to never match, so that compile-time detection of no
iterations behaves the same as run-time detection.
Per report from Mark Dilger. This appears to be an aboriginal error
in Spencer's library, so back-patch to all supported versions.
Pre-v14, it turns out to also be necessary to back-patch one aspect of
commits cb76fbd7e/00116dee5, namely to create capture-node subREs with
the begin/end states of their subexpressions, not the current lp/rp
of the outer parseqatom invocation. Otherwise delsub complains that
we're trying to disconnect a state from itself. This is a bit scary
but code examination shows that it's safe: in the pre-v14 code, if we
want to wrap iteration around the subexpression, the first thing we do
is overwrite the atom's begin/end fields with new states. So the
bogus values didn't survive long enough to be used for anything, except
if no iteration is required, in which case it doesn't matter.
Discussion: https://postgr.es/m/A099E4A8-4377-4C64-A98C-3DEDDC075502@enterprisedb.com
The recursion in cdissect() was careless about clearing match data
for capturing parentheses after rejecting a partial match. This
could allow a later back-reference to succeed when by rights it
should fail for lack of a defined referent.
To fix, think a little more rigorously about what the contract
between different levels of cdissect's recursion needs to be.
With the right spec, we can fix this using fewer rather than more
resets of the match data; the key decision being that a failed
sub-match is now explicitly responsible for clearing any matches
it may have set.
There are enough other cross-checks and optimizations in the code
that it's not especially easy to exhibit this problem; usually, the
match will fail as-expected. Plus, regexps that are even potentially
vulnerable are most likely user errors, since there's just not much
point in writing a back-ref that doesn't always have a referent.
These facts perhaps explain why the issue hasn't been detected,
even though it's almost certainly a couple of decades old.
Discussion: https://postgr.es/m/151435.1629733387@sss.pgh.pa.us
transformLockingClause neglected to exclude the pseudo-RTEs for
OLD/NEW when processing a rule's query. This led to odd errors
or even crashes later on. This bug is very ancient, but it's
not terribly surprising that nobody noticed, since the use-case
for SELECT FOR UPDATE in a non-view rule is somewhere between
thin and non-existent. Still, crashing is not OK.
Per bug #17151 from Zhiyong Wu. Thanks to Masahiko Sawada
for analysis of the problem.
Discussion: https://postgr.es/m/17151-c03a3e6e4ec9aadb@postgresql.org
Recursion into the FILTER clause was mis-implemented, such that a
relevant Var or Aggref at the very top of the FILTER clause would
be ignored. (Of course, that'd have to be a plain boolean Var or
boolean-returning aggregate.) The consequence would be
mis-identification of the correct semantic level of the aggregate,
which could lead to not-per-spec query behavior. If the FILTER
expression is an aggregate, this could also lead to failure to issue
an expected "aggregate function calls cannot be nested" error, which
would likely result in a core dump later on, since the planner and
executor aren't expecting such cases to appear.
The root cause is that commit b560ec1b0 blindly copied some code
that assumed it's recursing into a List, and thus didn't examine the
top-level node. To forestall questions about why this call doesn't
look like the others, as well as possible future copy-and-paste
mistakes, let's change all three check_agg_arguments_walker calls in
check_agg_arguments, even though only the one for the filter clause
is really broken.
Per bug #17152 from Zhiyong Wu. This has been wrong since we
implemented FILTER, so back-patch to all supported versions.
(Testing suggests that pre-v11 branches manage to avoid crashing
in the bad-Aggref case, thanks to "redundant" checks in ExecInitAgg.
But I'm not sure how thorough that protection is, and anyway the
wrong-behavior issue remains, so fix 9.6 and 10 too.)
Discussion: https://postgr.es/m/17152-c7f906cc1a88e61b@postgresql.org
Rather than trying to pick table aliases that won't conflict with
any possible user-defined matview column name, adjust the queries'
syntax so that the aliases are only used in places where they can't be
mistaken for column names. Mostly this consists of writing "alias.*"
not just "alias", which adds clarity for humans as well as machines.
We do have the issue that "SELECT alias.*" acts differently from
"SELECT alias", but we can use the same hack ruleutils.c uses for
whole-row variables in SELECT lists: write "alias.*::compositetype".
We might as well revert to the original aliases after doing this;
they're a bit easier to read.
Like 75d66d10e, back-patch to all supported branches.
Discussion: https://postgr.es/m/2488325.1628261320@sss.pgh.pa.us
Formerly, the numeric code tested whether an integer value of a larger
type would fit in a smaller type by casting it to the smaller type and
then testing if the reverse conversion produced the original value.
That's perfectly fine, except that it caused a test failure on
buildfarm animal castoroides, most likely due to a compiler bug.
Instead, do these tests by comparing against PG_INT16/32_MIN/MAX. That
matches existing code in other places, such as int84(), which is more
widely tested, and so is less likely to go wrong.
While at it, add regression tests covering the numeric-to-int8/4/2
conversions, and adjust the recently added tests to the style of
434ddfb79a (on the v11 branch) to make failures easier to diagnose.
Per buildfarm via Tom Lane, reviewed by Tom Lane.
Discussion: https://postgr.es/m/2394813.1628179479%40sss.pgh.pa.us
This fixes a long-standing bug when using to_char() to format a
numeric value in scientific notation -- if the value's exponent is
less than -NUMERIC_MAX_DISPLAY_SCALE-1 (-1001), it produced a
division-by-zero error.
The reason for this error was that get_str_from_var_sci() divides its
input by 10^exp, which it produced using power_var_int(). However, the
underflow test in power_var_int() causes it to return zero if the
result scale is too small. That's not a problem for power_var_int()'s
only other caller, power_var(), since that limits the rscale to 1000,
but in get_str_from_var_sci() the exponent can be much smaller,
requiring a much larger rscale. Fix by introducing a new function to
compute 10^exp directly, with no rscale limit. This also allows 10^exp
to be computed more efficiently, without any numeric multiplication,
division or rounding.
Discussion: https://postgr.es/m/CAEZATCWhojfH4whaqgUKBe8D5jNHB8ytzemL-PnRx+KCTyMXmg@mail.gmail.com
This fixes a couple of related problems that arise when raising
numbers to very large powers.
Firstly, when raising a negative number to a very large integer power,
the result should be well-defined, but the previous code would only
cope if the exponent was small enough to go through power_var_int().
Otherwise it would throw an internal error, attempting to take the
logarithm of a negative number. Fix this by adding suitable handling
to the general case in power_var() to cope with negative bases,
checking for integer powers there.
Next, when raising a (positive or negative) number whose absolute
value is slightly less than 1 to a very large power, the result should
approach zero as the power is increased. However, in some cases, for
sufficiently large powers, this would lose all precision and return 1
instead of 0. This was due to the way that the local_rscale was being
calculated for the final full-precision calculation:
local_rscale = rscale + (int) val - ln_dweight + 8
The first two terms on the right hand side are meant to give the
number of significant digits required in the result ("val" being the
estimated result weight). However, this failed to account for the fact
that rscale is clipped to a maximum of NUMERIC_MAX_DISPLAY_SCALE
(1000), and the result weight might be less then -1000, causing their
sum to be negative, leading to a loss of precision. Fix this by
forcing the number of significant digits calculated to be nonnegative.
It's OK for it to be zero (when the result weight is less than -1000),
since the local_rscale value then includes a few extra digits to
ensure an accurate result.
Finally, add additional underflow checks to exp_var() and power_var(),
so that they consistently return zero for cases like this where the
result is indistinguishable from zero. Some paths through this code
already returned zero in such cases, but others were throwing overflow
errors.
Dean Rasheed, reviewed by Yugo Nagata.
Discussion: http://postgr.es/m/CAEZATCW6Dvq7+3wN3tt5jLj-FyOcUgT5xNoOqce5=6Su0bCR0w@mail.gmail.com
The error messages using the word "non-negative" are confusing
because it's ambiguous about whether it accepts zero or not.
This commit improves those error messages by replacing it with
less ambiguous word like "greater than zero" or
"greater than or equal to zero".
Also this commit added the note about the word "non-negative" to
the error message style guide, to help writing the new error messages.
When postgres_fdw option fetch_size was set to zero, previously
the error message "fetch_size requires a non-negative integer value"
was reported. This error message was outright buggy. Therefore
back-patch to all supported versions where such buggy error message
could be thrown.
Reported-by: Hou Zhijie
Author: Bharath Rupireddy
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/OS0PR01MB5716415335A06B489F1B3A8194569@OS0PR01MB5716.jpnprd01.prod.outlook.com
We have an implementation restriction that PREPARE TRANSACTION can't
handle cases where both session-lifespan and transaction-lifespan locks
are held on the same lockable object. (That's because we'd otherwise
need to acquire a new PROCLOCK entry during post-prepare cleanup, which
is an operation that might fail. The situation can only arise with odd
usages of advisory locks, so removing the restriction is probably not
worth the amount of effort it would take.) AtPrepare_Locks attempted
to enforce this, but its logic was many bricks shy of a load, because
it only detected cases where the session and transaction locks had the
same lockmode. Locks of different modes on the same object would lead
to the rather unhelpful message "PANIC: we seem to have dropped a bit
somewhere".
To fix, build a transient hashtable with one entry per locktag,
not one per locktag + mode, and use that to detect conflicts.
Per bug #17122 from Alexander Pyhalov. This bug is ancient,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17122-04f3c32098a62233@postgresql.org
We don't allow to create replication slot_name as an empty string ('') via
SQL API pg_create_logical_replication_slot() but it is allowed to be set
via Alter Subscription command. This will lead to apply worker repeatedly
keep trying to stream data via slot_name '' and the user is not allowed to
create the slot with that name.
Author: Japin Li
Reviewed-By: Ranier Vilela, Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/MEYP282MB1669CBD98E721C77CA696499B61A9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
pg_dump failed to preserve the 'enabled' flag (which can be not only
disabled, but also REPLICA or ALWAYS) for partitions which had it
changed from their respective parents. Attempt to handle that by
including a definition for such triggers in the dump, but replace the
standard CREATE TRIGGER line with an ALTER TRIGGER line.
Backpatch to 11, where these triggers can exist. In branches 11 and 12,
pick up a few test lines from commit b9b408c487 to verify that
pg_upgrade is okay with these arrangements.
Co-authored-by: Justin Pryzby <pryzby@telsasoft.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200930223450.GA14848@telsasoft.com
When triggers are cloned from partitioned tables to their partitions,
the 'tgenabled' flag (origin/replica/always/disable) was not propagated.
Make it so that the flag on the trigger on partition is initially set to
the same value as on the partitioned table.
Add a test case to verify the behavior.
Backpatch to 11, where this appeared in commit 86f575948c.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200930223450.GA14848@telsasoft.com
Some commands of ALTER TABLE could fail with the following error:
ERROR: "tab" is of the wrong type
This error is unexpected, as all the code paths leading to
ATWrongRelkindError() should use a supported set of relkinds to generate
correct error messages. This commit closes the gap with such mistakes,
by adding all the missing relkind combinations. Tests are added to
check all the problems found. Note that some combinations are not used,
but these are left around as it could have an impact on applications
relying on this code.
2ed532e has done a much larger refactoring on HEAD to make such error
messages easier to manage in the long-term, so nothing is needed there.
Author: Kyotaro Horiguchi
Reviewed-by: Peter Eisentraut, Ahsan Hadi, Michael Paquier
Discussion: https://postgr.es/m/20210216.181415.368926598204753659.horikyota.ntt@gmail.com
Backpatch-through: 11
This fixes an overflow error when using the numeric * operator if the
result has more than 16383 digits after the decimal point by rounding
the result. Overflow errors should only occur if the result has too
many digits *before* the decimal point.
Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
Since the executor can't cope with a utility statement appearing
as a node of a plan tree, we can't support cases where a rewrite
rule inserts a NOTIFY into an INSERT/UPDATE/DELETE command appearing
in a WITH clause of a larger query. (One can imagine ways around
that, but it'd be a new feature not a bug fix, and so far there's
been no demand for it.) RewriteQuery checked for this, but it
missed the case where the DML command rewrites to *only* a NOTIFY.
That'd lead to crashes later on in planning. Add the missed check,
and improve the level of testing of this area.
Per bug #17094 from Yaoguang Chen. It's been busted since WITH
was introduced, so back-patch to all supported branches.
Discussion: https://postgr.es/m/17094-bf15dff55eaf2e28@postgresql.org
Due to how pg_size_pretty(bigint) was implemented, it's possible that when
given a negative number of bytes that the returning value would not match
the equivalent positive return value when given the equivalent positive
number of bytes. This was due to two separate issues.
1. The function used bit shifting to convert the number of bytes into
larger units. The rounding performed by bit shifting is not the same as
dividing. For example -3 >> 1 = -2, but -3 / 2 = -1. These two
operations are only equivalent with positive numbers.
2. The half_rounded() macro rounded towards positive infinity. This meant
that negative numbers rounded towards zero and positive numbers rounded
away from zero.
Here we fix#1 by dividing the values instead of bit shifting. We fix#2
by adjusting the half_rounded macro always to round away from zero.
Additionally, adjust the pg_size_pretty(numeric) function to be more
explicit that it's using division rather than bit shifting. A casual
observer might have believed bit shifting was used due to a static
function being named numeric_shift_right. However, that function was
calculating the divisor from the number of bits and performed division.
Here we make that more clear. This change is just cosmetic and does not
affect the return value of the numeric version of the function.
Here we also add a set of regression tests both versions of
pg_size_pretty() which test the values directly before and after the
function switches to the next unit.
This bug was introduced in 8a1fab36a. Prior to that negative values were
always displayed in bytes.
Author: Dean Rasheed, David Rowley
Discussion: https://postgr.es/m/CAEZATCXnNW4HsmZnxhfezR5FuiGgp+mkY4AzcL5eRGO4fuadWg@mail.gmail.com
Backpatch-through: 9.6, where the bug was introduced.
The prove_installcheck recipe in src/Makefile.global.in was emitting
bogus paths for a couple of elements when used with PGXS. Here we create
a separate recipe for the PGXS case that does it correctly. We also take
the opportunity to make the make the file more readable by breaking up
the prove_installcheck and prove_check recipes across several lines, and
to remove the setting for REGRESS_SHLIB to src/test/recovery/Makefile,
which is the only set of tests that actually need it.
Backpatch to all live branches
Discussion: https://postgr.es/m/f2401388-936b-f4ef-a07c-a0bcc49b3300@dunslane.net
83158f7 has improved index_set_state_flags() so as it is possible to use
transactional updates when updating pg_index state flags, but there was
not really a test case which stressed directly the possibility it fixed.
This commit adds such a test, using a predicate that looks valid in
appearance but calls a stable function.
Author: Andrey Lepikhov
Discussion: https://postgr.es/m/9b905019-5297-7372-0ad2-e1a4bb66a719@postgrespro.ru
Backpatch-through: 9.6
specscanner.l leaked a kilobyte of memory per token of the spec file.
Apparently somebody thought that the introductory code block would be
executed once; but it's once per yylex() call.
A couple of functions in isolationtester.c leaked small amounts of
memory due to not bothering to free one-time allocations. Might
as well improve these so that valgrind gives this program a clean
bill of health. Also get rid of an ugly static variable.
Coverity complained about one of the one-time leaks, which led me
to try valgrind'ing isolationtester, which led to discovery of the
larger leak.
For no obvious reason, isolationtester has always insisted that
session and step names be written with double quotes. This is
fairly tedious and does little for test readability, especially
since the names that people actually choose almost always look
like normal identifiers. Hence, let's tweak the lexer to allow
SQL-like identifiers not only double-quoted strings.
(They're SQL-like, not exactly SQL, because I didn't add any
case-folding logic. Also there's no provision for U&"..." names,
not that anyone's likely to care.)
There is one incompatibility introduced by this change: if you write
"foo""bar" with no space, that used to be taken as two identifiers,
but now it's just one identifier with an embedded quote mark.
I converted all the src/test/isolation/ specfiles to remove
unnecessary double quotes, but stopped there because my
eyes were glazing over already.
Like 741d7f104, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.
Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
Previously, isolationtester displayed SQL query results using some
ad-hoc code that clearly hadn't had much effort expended on it.
Field values longer than 14 characters weren't separated from
the next field, and usually caused misalignment of the columns
too. Also there was no visual separation of a query's result
from subsequent isolationtester output. This made test result
files confusing and hard to read.
To improve matters, let's use libpq's PQprint() function. Although
that's long since unused by psql, it's still plenty good enough
for the purpose here.
Like 741d7f104, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.
Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
We've long contended with isolation test results that aren't entirely
stable. Some test scripts insert long delays to try to force stable
results, which is not terribly desirable; but other erratic failure
modes remain, causing unrepeatable buildfarm failures. I've spent a
fair amount of time trying to solve this by improving the server-side
support code, without much success: that way is fundamentally unable
to cope with diffs that stem from chance ordering of arrival of
messages from different server processes.
We can improve matters on the client side, however, by annotating
the test scripts themselves to show the desired reporting order
of events that might occur in different orders. This patch adds
three types of annotations to deal with (a) test steps that might or
might not complete their waits before the isolationtester can see them
waiting; (b) test steps in different sessions that can legitimately
complete in either order; and (c) NOTIFY messages that might arrive
before or after the completion of a step in another session. We might
need more annotation types later, but this seems to be enough to deal
with the instabilities we've seen in the buildfarm. It also lets us
get rid of all the long delays that were previously used, cutting more
than a minute off the runtime of the isolation tests.
Back-patch to all supported branches, because the buildfarm
instabilities affect all the branches, and because it seems desirable
to keep isolationtester's capabilities the same across all branches
to simplify possible future back-patching of tests.
Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
Ordinarily, a pg_policy.polroles array wouldn't list the same role
more than once; but CREATE POLICY does not prevent that. If we
perform DROP OWNED BY on a role that is listed more than once,
RemoveRoleFromObjectPolicy either suffered an assertion failure
or encountered a tuple-updated-by-self error. Rewrite it to cope
correctly with duplicate entries, and add a CommandCounterIncrement
call to prevent the other problem.
Per discussion, there's other cleanup that ought to happen here,
but this seems like the minimum essential fix.
Per bug #17062 from Alexander Lakhin. It's been broken all along,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
The fast default code added in Release 11 omitted to check that the
table a fast default was being added to was a plain table. Thus one
could be added to a foreign table, which predicably blows up. Here we
perform that check.
In addition, on the back branches, since some of these might have
escaped into the wild, if we encounter a missing value for
an attribute of something other than a plain table we ignore it.
Fixes bug #17056
Backpatch to release 11,
Reviewed by: Andres Freund, Álvaro Herrera and Tom Lane
This is useful for developers to find out if an isolation spec is
over-engineered or if it needs more work by warning at the end of a
test run if a step is not used, generating a failure with extra diffs.
While on it, clean up all the specs which include steps not used in any
permutations to simplify them.
This is a backpatch of 989d23b and 06fdc4e, as it is becoming useful to
make all the branches consistent for an upcoming patch that will improve
the output generated by isolationtester.
Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6
The original purpose of the dry-run mode is to be able to print all the
possible permutations from a spec file, but it has become less useful
since isolation tests have improved regarding deadlock detection as one
step not wanted by the author could block indefinitely now (originally
the step blocked would have been detected rather quickly). Per
discussion, let's remove it.
This is a backpatch of 9903338 for 9.6~12. It is proving to become
useful to have on those branches so as the code gets consistent across
all supported versions, as a matter of improving the output generated by
isolationtester.
Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6
TestLib::perl2host can take a file argument as well as a directory
argument, so that code becomes substantially simpler. Also add comments
on why we're using forward slashes, and why we're setting
PERL_BADLANG=0.
Discussion: https://postgr.es/m/e9947bcd-20ee-027c-f0fe-01f736b7e345@dunslane.net
Translate path slashes on target directory path. This was confusing old
branches, but is applied to all branches for the sake of uniformity.
Perl is perfectly able to understand paths with forward slashes.
Along the way, restore the previous archive_wait query, for the sake of
uniformity with other tests, per gripe from Tom Lane.
This is similar to the work done in 8279f68 for TestLib.pm, where
environment variables set may cause unwanted failures if using a
temporary installation with pg_regress. The list of variables reset is
adjusted in each stable branch depending on what is supported.
Comments are added to remember that the lists in TestLib.pm and
pg_regress.c had better be kept in sync.
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/YMNR9GYDn+fHlMta@paquier.xyz
Backpatch-through: 9.6
Several TAP tests use poll_query_until() to wait for the postmaster
to restart. They were checking to see if a trivial query
(e.g. "SELECT 1") succeeds. However, that's problematic in the wake
of commit 11e9caff8, because now that we feed said query to psql
via stdin, we risk IPC::Run whining about a SIGPIPE failure if psql
quits before reading the query. Hence, we can't use a nonempty
query in cases where we need to wait for connection failures to
stop happening.
Per the precedent of commits c757a3da0 and 6d41dd045, we can pass
"undef" as the query in such cases to ensure that IPC::Run has
nothing to write. However, then we have to say that the expected
output is empty, and this exposes a deficiency in poll_query_until:
if psql fails altogether and returns empty stdout, poll_query_until
will treat that as a success! That's because, contrary to its
documentation, it makes no actual check for psql failure, looking
neither at the exit status nor at stderr.
To fix that, adjust poll_query_until to insist on empty stderr as
well as a stdout match. (I experimented with checking exit status
instead, but it seems that psql often does exit(1) in cases that we
need to consider successes. That might be something to fix someday,
but it would be a non-back-patchable behavior change.)
Back-patch to v10. The test cases needing this exist only as far
back as v11, but it seems wise to keep poll_query_until's behavior
the same in v10, in case we back-patch another such test case in
future. (9.6 does not currently need this change, because in that
branch poll_query_until can't be told to accept empty stdout as
a success case.)
Per assorted buildfarm failures, mostly on hoverfly.
Discussion: https://postgr.es/m/CAA4eK1+zM6L4QSA1XMvXY_qqWwdUmqkOS1+hWvL8QcYEBGA1Uw@mail.gmail.com
Commit caba8f0d43 wasn't quite right for msys, as demonstrated by
several buildfarm animals, including jacana and fairywren. We need to
use the msys perl in the archive command, but call it in such a way that
Windows will understand the path. Furthermore, inside the copy script we
need to convert a Windows path to an msys path.