Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
Previously this code assumed that all IndexScan nodes supported
mark/restore, which is not true since it depends on optional index AM
support functions. This could lead to errors about missing support
functions in rare edge cases of mergejoins with no sort keys, where an
unordered non-btree index scan was placed on the inner path without a
protecting Materialize node. (Normally, the fact that merge join
requires ordered input would avoid this error.)
Backpatch all the way since this bug is ancient.
Per report from Eugen Konkov on irc.
Discussion: https://postgr.es/m/87o8jn50be.fsf@news-spur.riddles.org.uk
Commit 502898192 was too careless about the order of execution of the
additional ALTER TABLE operations generated by expandTableLikeClause.
It just stuck them all at the end, which seems okay for most purposes.
But it falls down in the case where LIKE is importing a primary key
or unique index and the outer CREATE TABLE includes a FOREIGN KEY
constraint that needs to depend on that index. Weird as that is,
it used to work, so we ought to keep it working.
To fix, make parse_utilcmd.c insert LIKE clauses between index-creation
and FK-creation commands in the transformed list of commands, and change
utility.c so that the commands generated by expandTableLikeClause are
executed immediately not at the end. One could imagine scenarios where
this wouldn't work either; but currently expandTableLikeClause only
makes column default expressions, CHECK constraints, and indexes, and
this ordering seems fine for those.
Per bug #16730 from Sofoklis Papasofokli. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/16730-b902f7e6e0276b30@postgresql.org
Otherwise, if FDDEBUG is enabled, the debugging output fails because
it tries to read the fileName, which isn't set up yet (and should in
fact always be NULL).
AFAICT, this has been wrong since Berkeley. Before 96bf88d52,
it would accidentally fail to crash on platforms where snprintf()
is forgiving about being passed a NULL pointer for %s; but the
file name intended to be included in the debug output wouldn't
ever have shown up.
Report and fix by Greg Nancarrow. Although this is only visibly
broken in custom-made builds, it still seems worth back-patching
to all supported branches, as the FDDEBUG code is pretty useless
as it stands.
Discussion: https://postgr.es/m/CAJcOf-cUDgm9qYtC_B6XrC6MktMPNRby2p61EtSGZKnfotMArw@mail.gmail.com
Document that though the history file content is marked as bytea, it is
the same a text, and neither is btyea-escaped or encoding converted.
Reported-by: Brar Piening
Discussion: https://postgr.es/m/6a1b9cd9-17e3-df67-be55-86102af6bdf5@gmx.de
Backpatch-through: 13 - 9.5 (not master)
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format. This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.
Two of these call sites were in fact wrong:
* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.
* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended. Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units. This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.
There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds. It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.
Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.
Alexey Kondratov and Tom Lane
Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries. An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser. One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW. (Don't restore from
pg_dump, since it runs some of those commands.) Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object. Performance may degrade quickly under this workaround,
however. Back-patch to 9.5 (all supported versions).
Reviewed by Robert Haas. Reported by Etienne Stalmans.
Security: CVE-2020-25695
brin_form_tuple failed to consider the values may be toasted, inserting
the toast pointer into the index. This may easily result in index
corruption, as the toast data may be deleted and cleaned up by vacuum.
The cleanup however does not care about indexes, leaving invalid toast
pointers behind, which triggers errors like this:
ERROR: missing chunk number 0 for toast value 16433 in pg_toast_16426
A less severe consequence are inconsistent failures due to the index row
being too large, depending on whether brin_form_tuple operated on plain
or toasted version of the row. For example
CREATE TABLE t (val TEXT);
INSERT INTO t VALUES ('... long value ...')
CREATE INDEX idx ON t USING brin (val);
would likely succeed, as the row would likely include toast pointer.
Switching the order of INSERT and CREATE INDEX would likely fail:
ERROR: index row size 8712 exceeds maximum 8152 for index "idx"
because this happens before the row values are toasted.
The bug exists since PostgreSQL 9.5 where BRIN indexes were introduced.
So backpatch all the way back.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Backpatch-through: 9.5
Discussion: https://postgr.es/m/20201001184133.oq5uq75sb45pu3aw@development
Discussion: https://postgr.es/m/20201104010544.zexj52mlldagzowv%40development
Revert 59ab4ac32, as well as the followup fix 33862cb9c, in all
branches. We need to think a bit harder about what the behavior
of LOCK TABLE on views should be, and there's no time for that
before next week's releases. We'll take another crack at this
later.
Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
The intention in commit 491c029db was to require superuserness to
change the BYPASSRLS property, but the actual effect of the coding
in AlterRole() was to require superuserness to change anything at all
about a BYPASSRLS role. Other properties of a BYPASSRLS role should
be changeable under the same rules as for a normal role, though.
Fix that, and also take care of some documentation omissions related
to BYPASSRLS and REPLICATION role properties.
Tom Lane and Stephen Frost, per bug report from Wolfgang Walther.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/a5548a9f-89ee-3167-129d-162b5985fcf8@technowledgy.de
EventTriggerAlterTableEnd neglected to make sure that it built its
output list in the right context. In simple cases this was masked
because the function is called in PortalContext which will be
sufficiently long-lived anyway; but that doesn't make it not a bug.
Commit ced138e8c fixed this in HEAD and v13, but mistakenly chose
not to back-patch further. Back-patch the same code change all
the way (I didn't bother with the test case though, as it would
prove nothing in pre-v13 branches).
Per report from Arseny Sher.
Original fix by Jehan-Guillaume de Rorthais.
Discussion: https://postgr.es/m/877drcyprb.fsf@ars-thinkpad
Discussion: https://postgr.es/m/20200902193715.6e0269d4@firost
The restriction that only tables and views can be locked by LOCK TABLE
is quite arbitrary, since the underlying mechanism can lock any relation
type. Drop the restriction so that programs such as pg_dump can lock
all relations they're interested in, preventing schema changes that
could cause a dump to fail after expending much effort.
Backpatch to 9.5.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Wells Oliver <wells.oliver@gmail.com>
Discussion: https://postgr.es/m/20201021200659.GA32358@alvherre.pgsql
In shm_mq_receive(), a huge payload could trigger an unjustified
"invalid memory alloc request size" error due to the way the buffer
size is increased.
Add error checks (documenting the upper limit) and avoid the error by
limiting the allocation size to MaxAllocSize.
Author: Markus Wanner <markus.wanner@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/3bb363e7-ac04-0ac4-9fe8-db1148755bfa%402ndquadrant.com
In the planner, it was possible, given an extreme enough case containing a
large number of joins for the number of estimated rows to become infinite.
This could cause problems in initial_cost_mergejoin() where we perform
some calculations based on those row estimates.
A problem case, presented by Onder Kalaci showed an Assert failure from
an Assert checking outerstartsel <= outerendsel. In his test case this
was effectively NaN <= Inf, which is false. The NaN outerstartsel came
from multiplying the infinite outer_path_rows by 0.0.
In master, this problem was fixed by a90c950fc, however, that fix was too
invasive for the backbranches. Here we just relax the Asserts to allow
them to pass. The worst that appears to happen from this is that we show
NaN cost values and infinite row estimates in EXPLAIN. add_path() would
have had a hard time doing anything useful with such costs, but that does
not really matter as if the row estimates were even close to accurate,
such plan would not complete this side of the heat death of the universe.
Reported-by: Onder Kalaci
Backpatch: 9.5 to 13
Discussion: https://postgr.es/m/DM6PR21MB1211FF360183BCA901B27F04D80B0@DM6PR21MB1211.namprd21.prod.outlook.com
The prohibitValueChange code paths in set_config_option(), which
are executed whenever we re-read a PGC_POSTMASTER variable from
postgresql.conf, neglected to free anything before exiting. Thus
we'd leak the proposed new value of a PGC_STRING variable, as noted
by BoChen in bug #16666. For all variable types, if the check hook
creates an "extra" chunk, we'd also leak that.
These are malloc not palloc chunks, so there is no mechanism for
recovering the leaks before process exit. Fortunately, the values
are typically not very large, meaning you'd have to go through an
awful lot of SIGHUP configuration-reload cycles to make the leakage
amount to anything. Still, for a long-lived postmaster process it
could potentially be a problem.
Oversight in commit 2594cf0e8. Back-patch to all supported branches.
Discussion: https://postgr.es/m/16666-2c41a4eec61b03e1@postgresql.org
It appears that commit cf63c641c, which intended to prevent
misoptimization of the result-building step in makeOrderedSetArgs,
didn't go far enough: buildfarm member hornet's version of xlc
is now optimizing back to the old, broken behavior in which
list_length(directargs) is fetched only after list_concat() has
changed that value. I'm not entirely convinced whether that's
an undeniable compiler bug or whether it can be justified by a
sufficiently aggressive interpretation of C sequence points.
So let's just change the code to make it harder to misinterpret.
Back-patch to all supported versions, just in case.
Discussion: https://postgr.es/m/1830491.1601944935@sss.pgh.pa.us
get_eclass_for_sort_expr() computes expr_relids and nullable_relids
early on, even though they won't be needed unless we make a new
EquivalenceClass, which we often don't. Aside from the probably-minor
inefficiency, there's a memory management problem: these bitmapsets will
be built in the caller's context, leading to dangling pointers if that
is shorter-lived than root->planner_cxt. This would be a live bug if
get_eclass_for_sort_expr() could be called with create_it = true during
GEQO join planning. So far as I can find, the core code never does
that, but it's hard to be sure that no extensions do, especially since
the comments make it clear that that's supposed to be a supported case.
Fix by not computing these values until we've switched into planner_cxt
to build the new EquivalenceClass.
generate_join_implied_equalities() uses inner_rel->relids to look up
relevant eclasses, but it ought to be using nominal_inner_relids.
This is presently harmless because a child RelOptInfo will always have
exactly the same eclass_indexes as its topmost parent; but that might
not be true forever, and anyway it makes the code confusing.
The first of these is old (introduced by me in f3b3b8d5b), so back-patch
to all supported branches. The second only dates to v13, but we might
as well back-patch it to keep the code looking similar across branches.
Discussion: https://postgr.es/m/1508010.1601832581@sss.pgh.pa.us
Previously, a conversion such as
to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years. Fix the off-by-one problem.
Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD. This is how
the negative-century case works, so it seems sane to do likewise.
Continue to read "year 0000" as 1 BC. Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.
Per bug #16419 from Saeed Hubaishan. Back-patch to all supported
branches.
Dar Alathar-Yemen and Tom Lane
Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
Previously the standby server didn't archive timeline history files
streamed from the primary even when archive_mode is set to "always",
while it archives the streamed WAL files. This could cause the PITR to
fail because there was no required timeline history file in the archive.
The cause of this issue was that walreceiver didn't mark those files as
ready for archiving.
This commit makes walreceiver mark those streamed timeline history
files as ready for archiving if archive_mode=always. Then the archiver
process archives the marked timeline history files.
Back-patch to all supported versions.
Reported-by: Grigory Smolkin
Author: Grigory Smolkin, Fujii Masao
Reviewed-by: David Zhang, Anastasia Lubennikova
Discussion: https://postgr.es/m/54b059d4-2b48-13a4-6f43-95a087c92367@postgrespro.ru
This function leaked some memory while loading qual clauses for
an RLS policy. While ordinarily negligible, that could build up
in some repeated-reload cases, as reported by Konstantin Knizhnik.
We can improve matters by borrowing the coding long used in
RelationBuildRuleLock: build stringToNode's result directly in
the target context, and remember to explicitly pfree the
input string.
This patch by no means completely guarantees zero leaks within
this function, since we have no real guarantee that the catalog-
reading subroutines it calls don't leak anything. However,
practical tests suggest that this is enough to resolve the issue.
In any case, any remaining leaks are similar to those risked by
RelationBuildRuleLock and other relcache-loading subroutines.
If we need to fix them, we should adopt a more global approach
such as that used by the RECOVER_RELATION_BUILD_MEMORY hack.
While here, let's remove the need for an expensive PG_TRY block by
using MemoryContextSetParent to reparent an initially-short-lived
context for the RLS data.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/21356c12-8917-8249-b35f-1c447231922b@postgrespro.ru
Harmonize behavior by moving reponsibility for fsyncing directories down
into slru.c. In 10 and later, only the multixact directories were
missed (see commit 1b02be21), and in older branches all SLRUs were
missed.
Back-patch to all supported releases.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGLtsTUOScnNoSMZ-2ZLv%2BwGh01J6kAo_DM8mTRq1sKdSQ%40mail.gmail.com
tsearch_readline() saves the string pointer it returns to the caller
for possible use in the associated error context callback. However,
the caller will usually pfree that string sometime before it next
calls tsearch_readline(), so that there is a window where an ereport
will try to print an already-freed string.
The built-in users of tsearch_readline() happen to all do that pfree
at the bottoms of their loops, so that the window is effectively
empty for them. However, this is not documented as a requirement,
and contrib/dict_xsyn doesn't do it like that, so it seems likely
that third-party dictionaries might have live bugs here.
The practical consequences of this seem pretty limited in any case,
since production builds wouldn't clobber the freed string immediately,
besides which you'd not expect syntax errors in dictionary files
being used in production. Still, it's clearly a bug waiting to bite
somebody.
Fix by pstrdup'ing the string to be saved for the error callback,
and then pfree'ing it next time through. It's been like this for
a long time, so back-patch to all supported branches.
Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
transformCreateStmt() adjusts the transformed statement's RangeVar
to specify the target schema explicitly, for the express reason
of making sure that auxiliary statements derived by parse
transformation operate on the right table. But the refactoring
I did in commit 502898192 got this wrong and passed the untransformed
RangeVar to expandTableLikeClause(). This could lead to assertion
failures or weird misbehavior if the wrong table was accessed.
Per report from Alexander Lakhin. Like the previous patch, back-patch
to all supported branches.
Discussion: https://postgr.es/m/05051f9d-b32b-cb35-6735-0e9f2ab86b5f@gmail.com
Bring the signal handling for startup-packet collection into line
with the policy established in commits bedadc732 and 8e19a8264,
namely don't risk running atexit callbacks when handling SIGQUIT.
Ideally, we'd not do so for SIGTERM or timeout interrupts either,
but that change seems a bit too risky for the back branches.
For now, just improve the comments in this area to describe the risk.
Also relocate where BackendInitialize re-disables these interrupts,
to minimize the code span where they're active. This doesn't buy
a whole lot of safety, but it can't hurt.
In passing, rename startup_die() to remove confusion about whether
it is for the startup process.
Like the previous commits, back-patch to all supported branches.
Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
Commit 8e19a8264 changed the SIGQUIT handlers of almost all server
processes not to run atexit callbacks. The archiver process was
skipped, perhaps because it's not connected to shared memory; but
it's just as true here that running atexit callbacks in a signal
handler is unsafe. So let's make it work like the rest.
In HEAD and v13, we can use the common SignalHandlerForCrashExit
handler. Before that, just tweak pgarch_exit to use _exit(2)
explicitly.
Like the previous commit, back-patch to all supported branches.
Kyotaro Horiguchi, back-patching by me
Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
We reported the wrong types when complaining that an aggregate's
moving-aggregate implementation is inconsistent with its regular
implementation.
This was wrong since the feature was introduced, so back-patch
to all supported branches.
Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1x808LH=LPhZp9mNSP0Xd1xDqEd+XeGcvEe48dfE6xV=A@mail.gmail.com
Because sigsetjmp() will restore the initial state with signals blocked,
the code path in bgworker.c for reporting an error and exiting would
execute that way. Usually this is fairly harmless; but if a parallel
worker had an error message exceeding the shared-memory communication
buffer size (16K) it would lock up, because it would wait for a
resume-sending signal from its parallel leader which it would never
detect.
To fix, just unblock signals at the appropriate point.
This can be shown to fail back to 9.6. The lack of parallel query
infrastructure makes it difficult to provide a simple test case for
9.5; but I'm pretty sure the issue exists in some form there as well,
so apply the code change there too.
Vignesh C, reviewed by Bharath Rupireddy, Robert Haas, and myself
Discussion: https://postgr.es/m/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com
collectMatchBitmap() needs to re-find the index tuple it was previously
looking at, after transiently dropping lock on the index page it's on.
The tuple should still exist and be at its prior position or somewhere
to the right of that, since ginvacuum never removes tuples but
concurrent insertions could add one. However, there was a thinko in
that logic, to the effect of expecting any inserted tuples to have the
same index "attnum" as what we'd been scanning. Since there's no
physical separation of tuples with different attnums, it's not terribly
hard to devise scenarios where this fails, leading to transient "lost
saved point in index" errors. (While I've duplicated this with manual
testing, it seems impossible to make a reproducible test case with our
available testing technology.)
Fix by just continuing the scan when the attnum doesn't match.
While here, improve the error message used if we do fail, so that it
matches the wording used in btree for a similar case.
collectMatchBitmap()'s posting-tree code path was previously not
exercised at all by our regression tests. While I can't make
a regression test that exhibits the bug, I can at least improve
the code coverage here, so do that. The test case I made for this
is an extension of one added by 4b754d6c1, so it only works in
HEAD and v13; didn't seem worth trying hard to back-patch it.
Per bug #16595 from Jesse Kinkead. This has been broken since
multicolumn capability was added to GIN (commit 27cb66fdf),
so back-patch to all supported branches.
Discussion: https://postgr.es/m/16595-633118be8eef9ce2@postgresql.org
The trouble with doing this is that an apparently-constant subquery
output column isn't really constant if it is a grouping column that
appears in only some of the grouping sets. A qual using such a
column would be subject to incorrect const-folding after push-down,
as seen in bug #16585 from Paul Sivash.
To fix, just disable qual pushdown altogether if the sub-query has
nonempty groupingSets. While we could imagine far less restrictive
solutions, there is not much point in working harder right now,
because subquery_planner() won't move HAVING clauses to WHERE within
such a subquery. If the qual stays in HAVING it's not going to be
a lot more useful than if we'd kept it at the outer level.
Having said that, this restriction could be removed if we used a
parsetree representation that distinguished such outputs from actual
constants, which is something I hope to do in future. Hence, make
the patch a minimal addition rather than integrating it more tightly
(e.g. by renumbering the existing items in subquery_is_pushdown_safe's
comment).
Back-patch to 9.5 where grouping sets were introduced.
Discussion: https://postgr.es/m/16585-9d8c340d23ade8c1@postgresql.org
If a CREATE TABLE command uses both LIKE and traditional inheritance,
Vars in CHECK constraints and expression indexes that are absorbed
from a LIKE parent table tended to get mis-numbered, resulting in
wrong answers and/or bizarre error messages (though probably not any
actual crashes, thanks to validation occurring in the executor).
In v12 and up, the same could happen to Vars in GENERATED expressions,
even in cases with no LIKE clause but multiple traditional-inheritance
parents.
The cause of the problem for LIKE is that parse_utilcmd.c supposed
it could renumber such Vars correctly during transformCreateStmt(),
which it cannot since we have not yet accounted for columns added via
inheritance. Fix that by postponing processing of LIKE INCLUDING
CONSTRAINTS, DEFAULTS, GENERATED, INDEXES till after we've performed
DefineRelation().
The error with GENERATED and multiple inheritance is a simple oversight
in MergeAttributes(); it knows it has to renumber Vars in inherited
CHECK constraints, but forgot to apply the same processing to inherited
GENERATED expressions (a/k/a defaults).
Per bug #16272 from Tom Gottfried. The non-GENERATED variants of the
issue are ancient, presumably dating right back to the addition of
CREATE TABLE LIKE; hence back-patch to all supported branches.
Discussion: https://postgr.es/m/16272-6e32da020e9a9381@postgresql.org
Several PGXN modules reference LockTagType values; renumbering would
force a recompile of those modules. Oversight in back-patch of today's
commit 566372b3d6. Back-patch to released
branches, v12 through 9.5.
Reported by Tom Lane.
Discussion: https://postgr.es/m/921383.1597523945@sss.pgh.pa.us
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery. However, the planner only went as far
as verifying that the clauses were all binary OpExprs. This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10. To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things. Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).
While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns. Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining. There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly. Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.
This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
When a table is scanned by heapam_index_build_range_scan (née
IndexBuildHeapScan) and the table lock being held allows concurrent data
changes, it is possible for new HOT chains to sprout in a page that were
unknown when the scan of a page happened. This leads to an error such
as
ERROR: failed to find parent tuple for heap-only tuple at (X,Y) in table "tbl"
because the root tuple was not present when we first obtained the list
of the page's root tuples. This can be fixed by re-obtaining the list
of root tuples, if we see that a heap-only tuple appears to point to a
non-existing root.
This was reported by Anastasia as occurring for BRIN summarization
(which exists since 9.5), but I think it could theoretically also happen
with CREATE INDEX CONCURRENTLY (much older) or REINDEX CONCURRENTLY
(very recent). It seems a happy coincidence that BRIN forces us to
backpatch this all the way to 9.5.
Reported-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Diagnosed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Co-authored-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/602d8487-f0b2-5486-0088-0f372b2549fa@postgrespro.ru
Backpatch: 9.5 - master
If a page range is desummarized at just the right time concurrently with
an index walk, BRIN would raise an error indicating index corruption.
This is scary and unhelpful; silently returning that the page range is
not summarized is sufficient reaction.
This bug was introduced by commit 975ad4e602 as additional protection
against a bug whose actual fix was elsewhere. Backpatch equally.
Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
Backpatch: 9.5 - master
Hostile objects located within the installation-time search_path could
capture references in an extension's installation or upgrade script.
If the extension is being installed with superuser privileges, this
opens the door to privilege escalation. While such hazards have existed
all along, their urgency increases with the v13 "trusted extensions"
feature, because that lets a non-superuser control the installation path
for a superuser-privileged script. Therefore, make a number of changes
to make such situations more secure:
* Tweak the construction of the installation-time search_path to ensure
that references to objects in pg_catalog can't be subverted; and
explicitly add pg_temp to the end of the path to prevent attacks using
temporary objects.
* Disable check_function_bodies within installation/upgrade scripts,
so that any security gaps in SQL-language or PL-language function bodies
cannot create a risk of unwanted installation-time code execution.
* Adjust lookup of type input/receive functions and join estimator
functions to complain if there are multiple candidate functions. This
prevents capture of references to functions whose signature is not the
first one checked; and it's arguably more user-friendly anyway.
* Modify various contrib upgrade scripts to ensure that catalog
modification queries are executed with secure search paths. (These
are in-place modifications with no extension version changes, since
it is the update process itself that is at issue, not the end result.)
Extensions that depend on other extensions cannot be made fully secure
by these methods alone; therefore, revert the "trusted" marking that
commit eb67623c9 applied to earthdistance and hstore_plperl, pending
some better solution to that set of issues.
Also add documentation around these issues, to help extension authors
write secure installation scripts.
Patch by me, following an observation by Andres Freund; thanks
to Noah Misch for review.
Security: CVE-2020-14350
Ashutosh Bapat noticed that when logical walsender needs to wait for
WAL, and it realizes that it must send a keepalive message to
walreceiver to update the sent-LSN, which *does not* request a reply
from walreceiver, it wrongly sets the flag that it's going to wait for
that reply. That means that any future would-be sender of feedback
messages ends up not sending a feedback message, because they all
believe that a reply is expected.
With built-in logical replication there's not much harm in this, because
WalReceiverMain will send a ping-back every wal_receiver_timeout/2
anyway; but with other logical replication systems (e.g. pglogical) it
can cause significant pain.
This problem was introduced in commit 41d5f8ad73, where the
request-reply flag was changed from true to false to WalSndKeepalive,
without at the same time removing the line that sets
waiting_for_ping_response.
Just removing that line would be a sufficient fix, but it seems better
to shift the responsibility of setting the flag to WalSndKeepalive
itself instead of requiring caller to do it; this is clearly less
error-prone.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
Backpatch: 9.5 and up
Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
The new hlCover() algorithm that I introduced in commit c9b0c678d
turns out to potentially take O(N^2) or worse time on long documents,
if there are many occurrences of individual query words but few or no
substrings that actually satisfy the query. (One way to hit this
behavior is with a "common_word & rare_word" type of query.) This
seems unavoidable given the original goal of checking every substring
of the document, so we have to back off that idea. Fortunately, it
seems unlikely that anyone would really want headlines spanning all of
a long document, so we can avoid the worse-than-linear behavior by
imposing a maximum length of substring that we'll consider.
For now, just hard-wire that maximum length as a multiple of max_words
times max_fragments. Perhaps at some point somebody will argue for
exposing it as a ts_headline parameter, but I'm hesitant to make such
a feature addition in a back-patched bug fix.
I also noted that the hlFirstIndex() function I'd added in that
commit was unnecessarily stupid: it really only needs to check whether
a HeadlineWordEntry's item pointer is null or not. This wouldn't make
all that much difference in typical cases with queries having just
a few terms, but a cycle shaved is a cycle earned.
In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse.
This ensures that hlCover's loop is cancellable if it manages to take
a long time, and it may protect some other TS_execute callers as well.
Back-patch to 9.6 as the previous commit was. I also chose to add the
CHECK_FOR_INTERRUPTS call to 9.5. The old hlCover() algorithm seems
to avoid the O(N^2) behavior, at least on the test case I tried, but
nonetheless it's not very quick on a long document.
Per report from Stephen Frost.
Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
Backpatch an assertion (that was originally added to Postgres 12 by
commit dd299df818) that seems broadly useful. The assertion can detect
violations of the HOT invariant (i.e. no two index tuples can point to
the same heap TID) when CREATE INDEX somehow incorrectly allows that to
take place.
For example, a IndexBuildHeapScan/heapam_index_build_range_scan bug
might result in two tuples that both point to the same heap TID. If
these two tuples also happen to be duplicates, the assertion will fail.
Discussion: https://postgr.es/m/CAH2-WzmBxu4o=pMsniur+bwHqCGCmV_AOLkuK6BuU7ngA6evqw@mail.gmail.com
Backpatch: 9.5-11 only
This commit back-patches the v12-era commits a73d08319, cc92cca43,
and 7570df0f3 into supported pre-v12 branches. The net effect is to
eliminate our former dependency on the never-standard sys_siglist[]
array, instead using POSIX-standard strsignal(3).
What motivates doing this now is that glibc just removed sys_siglist[]
from the set of symbols available to newly-built programs. While our
code can survive without sys_siglist[], it then fails to print any
description of the signal that killed a child process, which is a
non-negligible loss of friendliness. We can expect that people will
be wanting to build the back branches on platforms that include this
change, so we need to do something.
Since strsignal(3) has existed for quite a long time, and we've not
had any trouble with these patches so far in v12, it seems safe to
back-patch into older branches.
Discussion: https://postgr.es/m/3179114.1594853308@sss.pgh.pa.us
An ALTER TABLE to validate a foreign key in which another subcommand
already caused a pending table rewrite could fail due to ALTER TABLE
attempting to validate the foreign key before the actual table rewrite
takes place. This situation could result in an error such as:
ERROR: could not read block 0 in file "base/nnnnn/nnnnn": read only 0 of 8192 bytes
The failure here was due to the SPI call which validates the foreign key
trying to access an index which is yet to be rebuilt.
Similarly, we also incorrectly tried to validate CHECK constraints before
the heap had been rewritten.
The fix for both is to delay constraint validation until phase 3, after
the table has been rewritten. For CHECK constraints this means a slight
behavioral change. Previously ALTER TABLE VALIDATE CONSTRAINT on
inheritance tables would be validated from the bottom up. This was
different from the order of evaluation when a new CHECK constraint was
added. The changes made here aligns the VALIDATE CONSTRAINT evaluation
order for inheritance tables to be the same as ADD CONSTRAINT, which is
generally top-down.
Reported-by: Nazli Ugur Koyluoglu, using SQLancer
Discussion: https://postgr.es/m/CAApHDvp%3DZXv8wiRyk_0rWr00skhGkt8vXDrHJYXRMft3TjkxCA%40mail.gmail.com
Backpatch-through: 9.5 (all supported versions)
The qual pushdown logic assumed that all Vars in a restriction clause
must be Vars referencing subquery outputs; but since we introduced
LATERAL, it's possible for such a Var to be a lateral reference instead.
This led to an assertion failure in debug builds. In a non-debug
build, there might be no ill effects (if qual_is_pushdown_safe decided
the qual was unsafe anyway), or we could get failures later due to
construction of an invalid plan. I've not gone to much length to
characterize the possible failures, but at least segfaults in the
executor have been observed.
Given that this has been busted since 9.3 and it took this long for
anybody to notice, I judge that the case isn't worth going to great
lengths to optimize. Hence, fix by just teaching qual_is_pushdown_safe
that such quals are unsafe to push down, matching the previous behavior
when it accidentally didn't fail.
Per report from Tom Ellis. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20200713175124.GQ8220@cloudinit-builder
After running GetForeignRelSize for a foreign table, adjust rel->tuples
to be at least as large as rel->rows. This prevents bizarre behavior
in estimate_num_groups() and perhaps other places, especially in the
scenario where rel->tuples is zero because pg_class.reltuples is
(suggesting that ANALYZE has never been run for the table). As things
stood, we'd end up estimating one group out of any GROUP BY on such a
table, whereas the default group-count estimate is more likely to result
in a sane plan.
Also, clarify in the documentation that GetForeignRelSize has the option
to override the rel->tuples value if it has a better idea of what to use
than what is in pg_class.reltuples.
Per report from Jeff Janes. Back-patch to all supported branches.
Patch by me; thanks to Etsuro Fujita for review
Discussion: https://postgr.es/m/CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com
Commit ecd9e9f0b fixed the problem in the wrong place, causing unwanted
side-effects on the behavior of GetNextTempTableSpace(). Instead,
let's make SharedFileSetInit() responsible for subbing in the value
of MyDatabaseTableSpace when the default tablespace is called for.
The convention about what is in the tempTableSpaces[] array is
evidently insufficiently documented, so try to improve that.
It also looks like SharedFileSetInit() is doing the wrong thing in the
case where temp_tablespaces is empty. It was hard-wiring use of the
pg_default tablespace, but it seems like using MyDatabaseTableSpace
is more consistent with what happens for other temp files.
Back-patch the reversion of PrepareTempTablespaces()'s behavior to
9.5, as ecd9e9f0b was. The changes in SharedFileSetInit() go back
to v11 where that was introduced. (Note there is net zero code change
before v11 from these two patch sets, so nothing to release-note.)
Magnus Hagander and Tom Lane
Discussion: https://postgr.es/m/CABUevExg5YEsOvqMxrjoNvb3ApVyH+9jggWGKwTDFyFCVWczGQ@mail.gmail.com
A likely copy/paste error in 98e8b48053 from back in 2004 would
cause temp tablespace to be reset to InvalidOid if temp_tablespaces
was set to the same value as the primary tablespace in the database.
This would cause shared filesets (such as for parallel hash joins)
to ignore them, putting the temporary files in the default tablespace
instead of the configured one. The bug is in the old code, but it
appears to have been exposed only once we had shared filesets.
Reviewed-By: Daniel Gustafsson
Discussion: https://postgr.es/m/CABUevExg5YEsOvqMxrjoNvb3ApVyH+9jggWGKwTDFyFCVWczGQ@mail.gmail.com
Backpatch-through: 9.5
explain_get_index_name() applied quote_identifier() to the index name.
This is fine for text output, but the non-text output formats all have
their own quoting conventions and would much rather start from the
actual index name. For example in JSON you'd get something like
"Index Name": "\"My Index\"",
which is surely not desirable, especially when the same does not
happen for table names. Hence, move the responsibility for applying
quoting out to the callers, where it can go into already-existing
special code paths for text format.
This changes the API spec for users of explain_get_index_name_hook:
before, they were supposed to apply quote_identifier() if necessary,
now they should not. Research suggests that the only publicly
available user of the hook is hypopg, and it actually forgot to
apply quoting anyway, so it's fine. (In any case, there's no
behavioral change for the output of a hook as seen in non-text
EXPLAIN formats, so this won't break any case that programs should
be relying on.)
Digging in the commit logs, it appears that quoting was included in
explain_get_index_name's duties when commit 604ffd280 invented it;
and that was fine at the time because we only had text output format.
This should have been rethought when non-text formats were invented,
but it wasn't.
This is a fairly clear bug for users of non-text EXPLAIN formats,
so back-patch to all supported branches.
Per bug #16502 from Maciek Sakrejda. Patch by me (based on
investigation by Euler Taveira); thanks to Julien Rouhaud for review.
Discussion: https://postgr.es/m/16502-57bd1c9f913ed1d1@postgresql.org