Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists,
which meant they were left with the wrong varnos when the RETURNING list
was in a subquery. That was never possible before writable CTEs, of
course, but now it's broken. The executor fails to notice any problem
because ExecEvalVar just references the ecxt_scantuple for any normal
varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a
recent complaint from Bartosz Dmytrak.
Since the eventual rtoffset of the subquery is not known at the time
we are preparing its plan node, the previous scheme of executing
set_returning_clause_references() at that time cannot handle this
adjustment. Fortunately, it turns out that we don't really need to do it
that way, because all the needed information is available during normal
setrefs.c execution; we just have to dig it out of the ModifyTable node.
So, do that, and get rid of the kluge of early setrefs processing of
RETURNING lists. (This is a little bit of a cheat in the case of inherited
UPDATE/DELETE, because we are not passing a "root" struct that corresponds
exactly to what the subplan was built with. But that doesn't matter, and
anyway this is less ugly than early setrefs processing was.)
Back-patch to 9.1, where the problem became possible to hit.
We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.
The comment about PL/Java using set_stack_base() is not yet true. As the
code stands, PL/java still modifies the stack_base_ptr variable directly.
However, it's been discussed in the PL/Java mailing list that it should be
changed to use the function, because PL/Java is currently oblivious to the
register stack used on Itanium. There's another issues with PL/Java, namely
that the stack base pointer it sets is not really the base of the stack, it
could be something close to the bottom of the stack. That's a separate issue
that might need some further changes to this code, but that's a different
story.
Backpatch to all supported releases.
For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible. When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.
In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup. It looks like everyplace else that touches
PlaceHolderVars got it right, though.
Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.
In commit 57664ed25e I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
Phil Sorber reported that a rewriting ALTER TABLE within an extension
update script failed, because it creates and then drops a placeholder
table; the drop was being disallowed because the table was marked as an
extension member. We could hack that specific case but it seems likely
that there might be related cases now or in the future, so the most
practical solution seems to be to create an exception to the general rule
that extension member objects can only be dropped by dropping the owning
extension. To wit: if the DROP is issued within the extension's own
creation or update scripts, we'll allow it, implicitly performing an
"ALTER EXTENSION DROP object" first. This will simplify cases such as
extension downgrade scripts anyway.
No docs change since we don't seem to have documented the idea that you
would need ALTER EXTENSION DROP for such an action to begin with.
Also, arrange for explicitly temporary tables to not get linked as
extension members in the first place, and the same for the magic
pg_temp_nnn schemas that are created to hold them. This prevents assorted
unpleasant results if an extension script creates a temp table: the forced
drop at session end would either fail or remove the entire extension, and
neither of those outcomes is desirable. Note that this doesn't fix the
ALTER TABLE scenario, since the placeholder table is not temp (unless the
table being rewritten is).
Back-patch to 9.1.
Both libpq and the backend would truncate a common name extracted from a
certificate at 32 bytes. Replace that fixed-size buffer with dynamically
allocated string so that there is no hard limit. While at it, remove the
code for extracting peer_dn, which we weren't using for anything; and
don't bother to store peer_cn longer than we need it in libpq.
This limit was not so terribly unreasonable when the code was written,
because we weren't using the result for anything critical, just logging it.
But now that there are options for checking the common name against the
server host name (in libpq) or using it as the user's name (in the server),
this could result in undesirable failures. In the worst case it even seems
possible to spoof a server name or user name, if the correct name is
exactly 32 bytes and the attacker can persuade a trusted CA to issue a
certificate in which that string is a prefix of the certificate's common
name. (To exploit this for a server name, he'd also have to send the
connection astray via phony DNS data or some such.) The case that this is
a realistic security threat is a bit thin, but nonetheless we'll treat it
as one.
Back-patch to 8.4. Older releases contain the faulty code, but it's not
a security problem because the common name wasn't used for anything
interesting.
Reported and patched by Heikki Linnakangas
Security: CVE-2012-0867
In the Fedora variant of MinGW, the openssl libraries have their normal
names, not libeay32 and libssleay32. Adjust configure probes to allow
that, per bug #6486.
Tomasz Ostrowski
This extends the changes of commit 6252c4f9e2
so that we run the cleanup hook earlier for failure cases as well as
success cases. As before, the point is to avoid an assertion failure from
an Assert I added in commit a874fe7b4c, which
was meant to check that no user-written code can be called during portal
cleanup. This fixes a case reported by Pavan Deolasee in which the Assert
could be triggered during backend exit (see the new regression test case),
and also prevents the possibility that the cleanup hook is run after
portions of the portal's state have already been recycled. That doesn't
really matter in current usage, but it foreseeably could matter in the
future.
Back-patch to 9.1 where the Assert in question was added.
We log AccessExclusiveLocks for replay onto standby nodes,
but because of timing issues on ProcArray it is possible to
log a lock that is still held by a just committed transaction
that is very soon to be removed. To avoid any timing issue we
avoid applying locks made by transactions with InvalidXid.
Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee
In commit 7b0d0e9356, I made CLUSTER and
VACUUM FULL try to preserve toast value OIDs from the original toast table
to the new one. However, if we have to copy both live and recently-dead
versions of a row that has a toasted column, those versions may well
reference the same toast value with the same OID. The patch then led to
duplicate-key failures as we tried to insert the toast value twice with the
same OID. (The previous behavior was not very desirable either, since it
would have silently inserted the same value twice with different OIDs.
That wastes space, but what's worse is that the toast values inserted for
already-dead heap rows would not be reclaimed by subsequent ordinary
VACUUMs, since they go into the new toast table marked live not deleted.)
To fix, check if the copied OID already exists in the new toast table, and
if so, assume that it stores the desired value. This is reasonably safe
since the only case where we will copy an OID from a previous toast pointer
is when toast_insert_or_update was given that toast pointer and so we just
pulled the data from the old table; if we got two different values that way
then we have big problems anyway. We do have to assume that no other
backend is inserting items into the new toast table concurrently, but
that's surely safe for CLUSTER and VACUUM FULL.
Per bug #6393 from Maxim Boguk. Back-patch to 9.0, same as the previous
patch.
Historically we've used the SWPB instruction for TAS() on ARM, but this
is deprecated and not available on ARMv6 and later. Instead, make use
of a GCC builtin if available. We'll still fall back to SWPB if not,
so as not to break existing ports using older GCC versions.
Eventually we might want to try using __sync_lock_test_and_set() on some
other architectures too, but for now that seems to present only risk and
not reward.
Back-patch to all supported versions, since people might want to use any
of them on more recent ARM chips.
Martin Pitt
I forgot to change the functions to use the PG_GETARG_INET_PP() macro,
when I changed DatumGetInetP() to unpack the datum, like Datum*P macros
usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP()
macro, and didn't notice because it wasn't used.
This fixes the memory leak when sorting inet values, as reported
by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like
the previous patch that broke it.
The EvalPlanQual machinery assumes that whole-row Vars generated for the
outputs of non-table RTEs will be of composite types. However, for the
case where the RTE is a function call returning a scalar type, we were
doing the wrong thing, as a result of sharing code with a parser case
where the function's scalar output is wanted. (Or at least, that's what
that case has done historically; it does seem a bit inconsistent.)
To fix, extend makeWholeRowVar's API so that it can support both use-cases.
This fixes Belinda Cussen's report of crashes during concurrent execution
of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED
mode, we'd run the EvalPlanQual machinery after a conflicting row update
commits, and it was expecting to get a HeapTuple not a scalar datum from
the "wholerowN" variable referencing the function RTE.
Back-patch to 9.0 where the current EvalPlanQual implementation appeared.
In 9.1 and up, this patch also fixes failure to attach the correct
collation to the Var generated for a scalar-result case. An example:
regression=# select upper(x.*) from textcat('ab', 'cd') x;
ERROR: could not determine which collation to use for upper() function
Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output
expressions appear non-constant and distinct from each other. This makes
the world safe for add_child_rel_equivalences to do what it does. Before,
it was possible for that function to add identical expressions to different
EquivalenceClasses, which logically should imply merging such ECs, which
would be wrong; or to improperly add a constant to an EquivalenceClass,
drastically changing its behavior. Per report from Teodor Sigaev.
The only currently known consequence of this bug is "MergeAppend child's
targetlist doesn't match MergeAppend" planner failures in 9.1 and later.
I am suspicious that there may be other failure modes that could affect
older release branches; but in the absence of any hard evidence, I'll
refrain from back-patching further than 9.1.
a new macro, DatumGetInetPP(), that does not. This brings these macros
in line with other DatumGet*P() macros.
Backpatch to 8.3, where 1-byte header varlenas were introduced.
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation. The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.
Bug report by Chris Redekop
If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.
The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.
The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it. (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field. We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.) An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.
Back-patch to all supported versions. The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results. Per report from Marti Raudsepp,
though this is not his proposed patch.
Back-patch to 9.0, where both these features were introduced. In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
This was broken in commit 53dbc27c62, which
introduced unlogged tables. Fortunately, as debugging tools go, this one
is pretty cheap, which is probably why it took nine months for someone to
notice, but it's not intended to be enabled by default, so revert.
Noted by Fujii Masao.
CREATE EXTENSION needs to transiently set search_path, as well as
client_min_messages and log_min_messages. We were doing this by the
expedient of saving the current string value of each variable, doing a
SET LOCAL, and then doing another SET LOCAL with the previous value at
the end of the command. This is a bit expensive though, and it also fails
badly if there is anything funny about the existing search_path value,
as seen in a recent report from Roger Niederland. Fortunately, there's a
much better way, which is to piggyback on the GUC infrastructure previously
developed for functions with SET options. We just open a new GUC nesting
level, do our assignments with GUC_ACTION_SAVE, and then close the nesting
level when done. This automatically restores the prior settings without a
re-parsing pass, so (in principle anyway) there can't be an error. And
guc.c still takes care of cleanup in event of an error abort.
The CREATE EXTENSION code for this was modeled on some much older code in
ri_triggers.c, which I also changed to use the better method, even though
there wasn't really much risk of failure there. Also improve the comments
in guc.c to reflect this additional usage.
dots. I previously worked around this in initdb, mapping the known
problematic locale names to aliases that work, but Hiroshi Inoue pointed
out that that's not enough because even if you use one of the aliases, like
"Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie.
"Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by
passing that value back to setlocale(), it fails. Note that you are affected
by this bug also if you use one of those short-form names manually, so just
reverting the hack in initdb won't fix it.
To work around that, move the locale name mapping from initdb to a wrapper
around setlocale(), so that the mapping is invoked on every setlocale() call.
Also, add a few checks for failed setlocale() calls in the backend. These
calls shouldn't fail, and if they do there isn't much we can do about it,
but at least you'll get a warning.
Backpatch to 9.1, where the initdb hack was introduced. The Windows bug
affects older versions too if you set locale manually to one of the aliases,
but given the lack of complaints from the field, I'm hesitent to backpatch.
ifdef block. It has nothing to do with whether the replacement snprintf
function is used. It caused no live bug, because the replacement snprintf
function is always used on Win32, but it was nevertheless misplaced.
Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update. Fix by not trying to overload the use of
estate->es_trig_tuple_slot. Per report from Yoran Heling.
Back-patch to 9.0, when trigger WHEN conditions were introduced.
The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale. The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.
Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages. This is more straightforward, and might even be a bit
faster since only one unlink call is needed.
This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
Fix a whole bunch of signal handlers that had been hacked to do things that
might change errno, without adding the necessary save/restore logic for
errno. Also make some minor fixes in unix_latch.c, and clean up bizarre
and unsafe scheme for disowning the process's latch. While at it, rename
the PGPROC latch field to procLatch for consistency with 9.2.
Issues noted while reviewing a patch by Peter Geoghegan.
Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code. Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.
This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them. In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.
A PlaceHolderVar's expression might contain another, lower-level
PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one
certainly will be also, and so we have to make sure that both of them get
into the placeholder_list with correct ph_may_need values during the
initial pre-scan of the query (before deconstruct_jointree starts).
We did this correctly for PlaceHolderVars appearing in the query quals,
but overlooked the issue for those appearing in the top-level targetlist;
with the result that nested placeholders referenced only in the targetlist
did not work correctly, as illustrated in bug #6154.
While at it, add some error checking to find_placeholder_info to ensure
that we don't try to create new placeholders after it's too late to do so;
they have to all be created before deconstruct_jointree starts.
Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
Somebody thought it'd be cute to invent a set of Node tag numbers that were
defined independently of, and indeed conflicting with, the main tag-number
list. While this accidentally failed to fail so far, it would certainly
lead to trouble as soon as anyone wanted to, say, apply copyObject to these
node types. Clang was already complaining about the use of makeNode on
these tags, and I think quite rightly so. Fix by pushing these node
definitions into the mainstream, including putting replnodes.h where it
belongs.
This kluge was inserted in a spot apparently chosen at random: the lock
manager's state is not yet fully set up for the wait, and in particular
LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
will not get cleaned up if the ereport is thrown. This seems to not cause
any observable problem in trivial test cases, because LockReleaseAll will
silently clean up the debris; but I was able to cause failures with tests
involving subtransactions.
Fixes breakage induced by commit c85c941470.
Back-patch to all affected branches.
It was initialized in the wrong place and to the wrong value. With bad
luck this could result in incorrect query-cancellation failures in hot
standby sessions, should a HS backend be holding pin on buffer number 1
while trying to acquire a lock.
The original implementation simply did nothing when replacing an existing
object during CREATE EXTENSION. The folly of this was exposed by a report
from Marc Munro: if the existing object belongs to another extension, we
are left in an inconsistent state. We should insist that the object does
not belong to another extension, and then add it to the current extension
if not already a member.
This function supports untranslated detail messages, in the same way that
errmsg_internal supports untranslated primary messages. We've needed this
for some time IMO, but discussion of some cases in the SSI code provided
the impetus to actually add it.
Kevin Grittner, with minor adjustments by me
Regular aggregate functions in combination with, or within the arguments
of, window functions are OK per spec; they have the semantics that the
aggregate output rows are computed and then we run the window functions
over that row set. (Thus, this combination is not really useful unless
there's a GROUP BY so that more than one aggregate output row is possible.)
The case without GROUP BY could fail, as recently reported by Jeff Davis,
because sloppy construction of the Agg node's targetlist resulted in extra
references to possibly-ungrouped Vars appearing outside the aggregate
function calls themselves. See the added regression test case for an
example.
Fixing this requires modifying the API of flatten_tlist and its underlying
function pull_var_clause. I chose to make pull_var_clause's API for
aggregates identical to what it was already doing for placeholders, since
the useful behaviors turn out to be the same (error, report node as-is, or
recurse into it). I also tightened the error checking in this area a bit:
if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
that was a long time ago, so complain instead of ignoring them.
Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing
that it only occurs in a basically-useless corner case, it doesn't seem
worth the risks of changing a function API in a minor release. There might
be third-party code using pull_var_clause.
We were using GetConfigOption to collect the old value of each setting,
overlooking the possibility that it didn't exist yet. This does happen
in the case of adding a new entry within a custom variable class, as
exhibited in bug #6097 from Maxim Boguk.
To fix, add a missing_ok parameter to GetConfigOption, but only in 9.1
and HEAD --- it seems possible that some third-party code is using that
function, so changing its API in a minor release would cause problems.
In 9.0, create a near-duplicate function instead.
transactions might not match the order the work done in those transactions
become visible to others. The logic in SSI, however, assumed that it does.
Fix that by having two sequence numbers for each serializable transaction,
one taken before a transaction becomes visible to others, and one after it.
This is easier than trying to make the the transition totally atomic, which
would require holding ProcArrayLock and SerializableXactHashLock at the same
time. By using prepareSeqNo instead of commitSeqNo in a few places where
commit sequence numbers are compared, we can make those comparisons err on
the safe side when we don't know for sure which committed first.
Per analysis by Kevin Grittner and Dan Ports, but this approach to fix it
is different from the original patch.
Per discussion, this structure seems more understandable than what was
there before. Make config.sgml and postgresql.conf.sample agree.
In passing do a bit of editorial work on the variable descriptions.