In e6799d5a53 I removed vacuumlazy.c trickery around re-checking
whether a page is actually empty after acquiring an extension lock on
the relation, because the page is not PageInit()ed anymore, and
entries in the FSM ought not to lead to user-visible errors.
As reported by various buildfarm animals that is not correct, given
the way to code currently stands: If vacuum processes a page that's
just been newly added by either RelationGetBufferForTuple() or
RelationAddExtraBlocks(), it could add that page to the FSM and it
could be reused by other backends, before those two functions check
whether the newly added page is actually new. That's a relatively
narrow race, but several buildfarm machines appear to be able to hit
it.
While it seems wrong that the FSM, given it's lack of durability and
approximative nature, can trigger errors like this, that seems better
fixed in a separate commit. Especially given that a good portion of
the buildfarm is red, and this is just re-introducing logic that
existed a few hours ago.
Author: Andres Freund
Discussion: https://postgr.es/m/20190128222259.zhi7ovzgtkft6em6@alap3.anarazel.de
In batching mode, COPY was using the same (per-tuple) memory context for
allocations with longer lifetime. This was confusing but harmless, until
commit 31f3817402 added COPY FROM ... WHERE feature, introducing a risk
of memory leak.
The "per-tuple" memory context was reset only when starting new batch,
but as the rows may be filtered out by the WHERE clauses, that may not
happen at all. The WHERE clause however has to be evaluated for all
rows, before filtering them out.
This commit separates the per-tuple and per-batch contexts, removing the
ambiguity. Expressions (both defaults and WHERE clause) are evaluated
in the per-tuple context, while tuples are formed in the batch context.
This allows resetting the contexts at appropriate times.
The main complexity is related to partitioning, in which case we need to
reset the batch context after forming the tuple (which happens before
routing to leaf partition). Instead of switching between two contexts
as before, we simply copy the last tuple aside, reset the context and
then copy the tuple back. The performance impact is negligible, and
juggling with two contexts is not free either.
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
Previously we initialized pages when bulk extending in
RelationAddExtraBlocks(). That has a major disadvantage: It ties
RelationAddExtraBlocks() to heap, as other types of storage are likely
to need different amounts of special space, have different amount of
free space (previously determined by PageGetHeapFreeSpace()).
That we're relying on initializing pages, but not WAL logging the
initialization, also means the risk for getting
"WARNING: relation \"%s\" page %u is uninitialized --- fixing"
style warnings in vacuums after crashes/immediate shutdowns, is
considerably higher. The warning sounds much more serious than what
they are.
Fix those two issues together by not initializing pages in
RelationAddExtraPages() (but continue to do so in
RelationGetBufferForTuple(), which is linked much more closely to
heap), and accepting uninitialized pages as normal in
vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it
to the FSM, but does nothing else. We chose to not issue a debug
message, much less a warning in that case - it seems rarely useful,
and quite likely to scare people unnecessarily.
For now empty pages aren't added to the VM, because standbys would not
re-discover such pages after a promotion. In contrast to other sources
for empty pages, there's no corresponding WAL records triggering FSM
updates during replay.
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.
Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation. We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.
Author: John Naylor with design inputs and some code contribution by Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
Before this change FunctionCallInfoData, the struct arguments etc for
V1 function calls are stored in, always had space for
FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two
arrays. For nearly every function call 100 arguments is far more than
needed, therefore wasting memory. Arg and argnull being two separate
arrays also guarantees that to access a single argument, two
cachelines have to be touched.
Change the layout so there's a single variable-length array with pairs
of value / isnull. That drastically reduces memory consumption for
most function calls (on x86-64 a two argument function now uses
64bytes, previously 936 bytes), and makes it very likely that argument
value and its nullness are on the same cacheline.
Arguments are stored in a new NullableDatum struct, which, due to
padding, needs more memory per argument than before. But as usually
far fewer arguments are stored, and individual arguments are cheaper
to access, that's still a clear win. It's likely that there's other
places where conversion to NullableDatum arrays would make sense,
e.g. TupleTableSlots, but that's for another commit.
Because the function call information is now variable-length
allocations have to take the number of arguments into account. For
heap allocations that can be done with SizeForFunctionCallInfoData(),
for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro
that helps to allocate an appropriately sized and aligned variable.
Some places with stack allocation function call information don't know
the number of arguments at compile time, and currently variably sized
stack allocations aren't allowed in postgres. Therefore allow for
FUNC_MAX_ARGS space in these cases. They're not that common, so for
now that seems acceptable.
Because of the need to allocate FunctionCallInfo of the appropriate
size, older extensions may need to update their code. To avoid subtle
breakages, the FunctionCallInfoData struct has been renamed to
FunctionCallInfoBaseData. Most code only references FunctionCallInfo,
so that shouldn't cause much collateral damage.
This change is also a prerequisite for more efficient expression JIT
compilation (by allocating the function call information on the stack,
allowing LLVM to optimize it away); previously the size of the call
information caused problems inside LLVM's optimizer.
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
Since LISTEN is (still) disallowed, UNLISTEN must be a no-op in a
hot-standby session, and so there's no harm in allowing it. This
change allows client code to not worry about whether it's connected
to a primary or standby server when performing session-state-reset
type activities. (Note that DISCARD ALL, which includes UNLISTEN,
was already allowed, making it inconsistent to reject UNLISTEN.)
Per discussion, back-patch to all supported versions.
Shay Rojansky, reviewed by Mi Tar
Discussion: https://postgr.es/m/CADT4RqCf2gA_TJtPAjnGzkC3ZiexfBZiLmA-mV66e4UyuVv8bA@mail.gmail.com
There were two flags used to track the access to temporary tables and
to the temporary namespace of a session which are used to restrict
PREPARE TRANSACTION, however the first control flag is a concept
included in the second. This removes the flag for temporary table
tracking, keeping around only the one at namespace level.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190118053126.GH1883@paquier.xyz
This change allows callers of query_tree_walker() to choose whether
to visit an RTE before or after visiting the contents of the RTE
(i.e., prefix or postfix tree order). All existing users of
QTW_EXAMINE_RTES want the QTW_EXAMINE_RTES_BEFORE behavior, but
an upcoming patch will want QTW_EXAMINE_RTES_AFTER, and it seems
like a potentially useful change on its own.
Andreas Karlsson (extracted from CTE inlining patch)
Discussion: https://postgr.es/m/8810.1542402910@sss.pgh.pa.us
While it's perhaps unlikely that users would write an explicit test
like "ctid IS NULL", this function is also used in range estimation,
and an incorrect answer can throw off the results for tight ranges.
Anyway it's not much code so we might as well do it.
Edmund Horner
Discussion: https://postgr.es/m/CAMyN-kCa3BFUFrCTtQeprxTU1anCd3Pua7zXstGCKq4pXgjukw@mail.gmail.com
Heikki's compiler doesn't complain about end_ptr, apparently,
but mine does.
In passing, I failed to resist the temptation to remove the
no-longer-used fldnum variable, and relocate chunk_len's
declaration to a narrower scope.
The old implementation first converted the input strings to arrays of
wchars, and performed the conversion on those. However, the conversion is
expensive, and for a large input string, consumes a lot of memory.
Allocating the large arrays also meant that these functions could not be
used on strings larger 1 GB / pg_encoding_max_length() (256 MB for UTF-8).
Avoid the conversion, and instead use the single-byte algorithm even with
multibyte encodings. That can get fooled, if there is a matching byte
sequence in the middle of a multi-byte character, so to eliminate false
positives like that, we verify any matches by walking the string character
by character with pg_mblen(). Also, if the caller needs the position of
the match, as a character-offset, we also need to walk the string to count
the characters.
Performance testing shows that walking the whole string with pg_mblen() is
somewhat slower than converting the whole string to wchars. It's still
often a win, though, because we don't need to do it if there is no match,
and even when there is, we only need to walk up to the point where the
match is, not the whole string. Even in the worst case, there would be
room for optimization: Much of the CPU time in the current loop with
pg_mblen() is function call overhead, and could be improved by inlining
pg_mblen() and/or the encoding-specific mblen() functions. But I didn't
attempt to do that as part of this patch.
Most of the callers of text_position_setup/next functions were actually
not interested in the position of the match, counted in characters. To
cater for them, refactor the text_position_next() interface into two
parts: searching for the next match (text_position_next()), and returning
the current match's position as a pointer (text_position_get_match_ptr())
or as a character offset (text_position_get_match_pos()). Getting the
pointer to the match is a more convenient API for many callers, and with
UTF-8, it allows skipping the character-walking step altogether, because
UTF-8 can't have false matches even when treated like raw byte strings.
Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/3173d989-bc1c-fc8a-3b69-f24246f73876%40iki.fi
GB18030's mblen() function looks at the first and the second byte of the
multibyte character, to determine its length. copy.c had made the
assumption that mblen() only looks at the first byte, but it turns out to
work out fine, because of the way the GB18030 encoding works. COPY will
see a 4-byte encoded character as two 2-byte encoded characters, which is
enough for COPY's purposes. It cannot mix those up with delimiter or
escaping characters, because only single-byte ASCII characters are
supported as delimiters or escape characters.
Discussion: https://www.postgresql.org/message-id/7704d099-9643-2a55-fb0e-becd64400dcb%40iki.fi
Previously, only literals were allowed. This change allows general
expressions, including functions calls, which are evaluated at the
time the DDL command is executed.
Besides offering some more functionality, it simplifies the parser
structures and removes some inconsistencies in how the literals were
handled.
Author: Kyotaro Horiguchi, Tom Lane, Amit Langote
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/9f88b5e0-6da2-5227-20d0-0d7012beaa1c@lab.ntt.co.jp/
We were failing to set conislocal correctly for constraints in
partitions after partition detach, leading to those constraints becoming
undroppable. Fix by setting the flag correctly. Existing databases
might contain constraints with the conislocal wrongly set to false, for
partitions that were detached; this situation should be fixable by
applying an UPDATE on pg_constraint to set conislocal true. This
problem should otherwise be innocuous and should disappear across a
dump/restore or pg_upgrade.
Secondarily, when constraint drop was attempted in a partitioned table,
ATExecDropConstraint would try to recurse to partitions after doing
performDeletion() of the constraint in the partitioned table itself; but
since the constraint in the partitions are dropped by the initial call
of performDeletion() (because of following dependencies), the recursion
step would fail since it would not find the constraint, causing the
whole operation to fail. Fix by preventing recursion.
Reported-by: Amit Langote
Diagnosed-by: Amit Langote
Author: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/f2b8ead5-4131-d5a8-8016-2ea0a31250af@lab.ntt.co.jp
Given these routines are heap specific, and that there will be more
generic visibility support in via table AM, it makes sense to move the
prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will
not be exposed in a generic fashion, because they are too storage
specific).
Similarly, the code in tqual.c is specific to heap, so moving it into
access/heap/ makes sense.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
The code in tqual.c is largely heap specific. Due to the upcoming
pluggable storage work, it therefore makes sense to move it into
access/heap/ (as the file's header notes, the tqual name isn't very
good).
But the various statically allocated snapshot and snapshot
initialization functions are now (see previous commit) generic and do
not depend on functions declared in tqual.h anymore. Therefore move.
Also move XidInMVCCSnapshot as that's useful for future AMs, and
already used outside of tqual.c.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
This is in preparation for allowing the same snapshot be used for
different table AMs. With the current callback based approach we would
need one callback for each supported AM, which clearly would not be
extensible. Thus add a new Snapshot->snapshot_type field, and move
the dispatch into HeapTupleSatisfiesVisibility() (which is now a
function). Later work will then dispatch calls to
HeapTupleSatisfiesVisibility() and other AMs visibility functions
depending on the type of the table. The central SnapshotType enum
also seems like a good location to centralize documentation about the
intended behaviour of various types of snapshots.
As tqual.h isn't included by bufmgr.h any more (as HeapTupleSatisfies*
isn't referenced by TestForOldSnapshot() anymore) a few files now need
to include it directly.
Author: Andres Freund, loosely based on earlier work by Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.dehttps://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
Detaching a partition from a partitioned table that's constrained by
foreign keys requires additional action triggers on the referenced side;
otherwise, DELETE/UPDATE actions there fail to notice rows in the table
that was partition, and so are incorrectly allowed through. With this
commit, those triggers are now created. Conversely, when a table that
has a foreign key is attached as a partition to a table that also has
the same foreign key, those action triggers are no longer needed, so we
remove them.
Add a minimal test case verifying (part of) this.
Authors: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/f2b8ead5-4131-d5a8-8016-2ea0a31250af@lab.ntt.co.jp
Back in commit 100340e2dc, we made relcache entries keep lists of the
foreign keys applying to the relation -- but we forgot to update
CacheInvalidateHeapTuple to flush those entries when new FKs got created
or existing ones updated/deleted. No bugs appear to have been reported
that would be explained by this ommission, but I noticed the problem
while working on an unrelated bugfix which clearly showed it. Fix by
adding relcache flush on relevant foreign key changes.
Backpatch to 9.6, like the aforementioned commit.
Discussion: https://postgr.es/m/201901211927.7mmhschxlejh@alvherre.pgsql
Reviewed-by: Tom Lane
Most of these had been obsoleted by 568d4138c / the SnapshotNow
removal.
This is is preparation for moving most of tqual.[ch] into either
snapmgr.h or heapam.h, which in turn is in preparation for pluggable
table AMs.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
access/heapam contains functions that are very storage specific (say
heap_insert() and a lot of lower level functions), and fairly generic
infrastructure like relation_open(), heap_open() etc. In the upcoming
pluggable storage work we're introducing a layer between table
accesses in general and heapam, to allow for different storage
methods. For a bit cleaner separation it thus seems advantageous to
move generic functions like the aforementioned to their own headers.
access/relation.h will contain relation_open() etc, and access/table.h
will contain table_open() (formerly known as heap_open()). I've decided
for table.h not to include relation.h, but we might change that at a
later stage.
relation.h already exists in another directory, but the other
plausible name (rel.h) also conflicts. It'd be nice if there were a
non-conflicting name, but nobody came up with a suggestion. It's
possible that the appropriate way to address the naming conflict would
be to rename nodes/relation.h, which isn't particularly well named.
To avoid breaking a lot of extensions that just use heap_open() etc,
table.h has macros mapping the old names to the new ones, and heapam.h
includes relation, table.h. That also allows to keep the
bulk renaming of existing callers in a separate commit.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
Historically, the notices output by DROP CASCADE tended to come out
in uncertain order, and in some cases you might get different claims
about which object depends on which other one. This is because we
just traversed the dependency tree in the order in which pg_depend
entries are seen, and nbtree has never promised anything about the
order of equal-keyed index entries. We've put up with that for years,
hacking regression tests when necessary to prevent them from emitting
unstable output. However, it's a problem for pending work that will
change nbtree's behavior for equal keys, as that causes unexpected
changes in the regression test results.
Hence, adjust findDependentObjects to sort the results of each
indexscan before processing them. The sort is on descending OID of
the dependent objects, hence more or less reverse creation order.
While this rule could still result in bogus regression test failures
if an OID wraparound occurred mid-test, that seems unlikely to happen
in any plausible development or packaging-test scenario.
This is enough to ensure output stability for ordinary DROP CASCADE
commands, but not for DROP OWNED BY, because that has a different
code path with the same problem. We might later choose to sort in
the DROP OWNED BY code as well, but this patch doesn't do so.
I've also not done anything about reverting the existing hacks to
suppress unstable DROP CASCADE output in specific regression tests.
It might be worth undoing those, but it seems like a distinct question.
The first indexscan loop in findDependentObjects is not touched,
meaning there is a hazard of unstable error reports from that too.
However, said hazard is not the fault of that code: it was designed
on the assumption that there could be at most one "owning" object
to complain about, and that assumption does not seem unreasonable.
The recent patch that added the possibility of multiple
DEPENDENCY_INTERNAL_AUTO links broke that assumption, but we should
fix that situation not band-aid around it. That's a matter for
another patch, though.
Discussion: https://postgr.es/m/12244.1547854440@sss.pgh.pa.us
Previously, in set_append_rel_size(), we generated tlists and EC members
for dummy children for possible use by partition-wise join, even if
partition-wise join was disabled or the top parent was not a partitioned
table, but adding such EC members causes noticeable planning speed
degradation for queries with certain kinds of join quals like
"(foo.x + bar.y) = constant" where foo and bar are partitioned tables in
cases where there are lots of dummy children, as the EC members lists
grow huge, especially for the ECs derived from such join quals, which
makes the search for the parent EC members in add_child_rel_equivalences()
very time-consuming. Postpone the work until such children are actually
involved in a partition-wise join.
Reported-by: Sanyo Capobiango
Analyzed-by: Justin Pryzby and Alvaro Herrera
Author: Amit Langote, with a few additional changes by me
Reviewed-by: Ashutosh Bapat
Backpatch-through: v11 where partition-wise join was added
Discussion: https://postgr.es/m/CAO698qZnrxoZu7MEtfiJmpmUtz3AVYFVnwzR%2BpqjF%3DrmKBTgpw%40mail.gmail.com
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.). Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.
Author: Surafel Temesgen
Reviewed-by: Tomas Vondra, Masahiko Sawada, Lim Myungkyu
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
Commit c0d0e54084 replaced the ones in the documentation, but missed out
on the ones in the code. Replace those as well, but unlike c0d0e54084,
don't backpatch the code changes to avoid breaking translations.
When creating a foreign key in a partitioned table, if some partitions
already have equivalent constraints, we wastefully create duplicates of
the constraints instead of attaching to the existing ones. That's
inconsistent with the de-duplication that is applied when a table is
attached as a partition. To fix, reuse the FK-cloning code instead of
having a separate code path.
Backpatch to Postgres 11. This is a subtle behavior change, but surely
a welcome one since there's no use in having duplicate foreign keys.
Discovered by Álvaro Herrera while thinking about a different problem
reported by Jesper Pedersen (bug #15587).
Author: Álvaro Herrera
Discussion: https://postgr.es/m/201901151935.zfadrzvyof4k@alvherre.pgsql
My commit 3de241dba8 introduced some code to create a clone of a
foreign key to a partition, but I put it in pg_constraint.c because it
was too close to the contents of the pg_constraint row. With the
previous commit that split out the constraint tuple deconstruction into
its own routine, it makes more sense to have the FK-cloning function in
tablecmds.c, mostly because its static subroutine can then be used by a
future bugfix.
My initial posting of this patch had this routine as static in
tablecmds.c, but sadly this function is already part of the Postgres 11
ABI as exported from pg_constraint.c, so keep it as exported also just
to avoid breaking any possible users of it.
My commit 3de241dba8 introduced some code (in tablecmds.c) to obtain
data from a pg_constraint row for a foreign key, that already existed in
ri_triggers.c. Split it out into its own routine in pg_constraint.c,
where it naturally belongs.
No functional code changes, only code movement.
Backpatch to pg11, because a future bugfix is simpler after this.
A cascaded drop might find independent reasons to drop both a table
and some column of the table (for instance, a schema drop might include
dropping a data type used in some table in the schema). Depending on
the order of visitation of pg_depend entries, we might report the
table column and the whole table as separate objects-to-be-dropped,
or we might only report the table. This is confusing and leads to
unstable regression test output, so fix it to report only the table
regardless of visitation order.
Per gripe from Peter Geoghegan. This is just cosmetic from a user's
standpoint, and we haven't actually seen regression test problems in
practice (yet), so I'll refrain from back-patching.
Discussion: https://postgr.es/m/15908.1547762076@sss.pgh.pa.us
Attempting to use a temporary table within a two-phase transaction is
forbidden for ages. However, there have been uncovered grounds for
a couple of other object types and commands which work on temporary
objects with two-phase commit. In short, trying to create, lock or drop
an object on a temporary schema should not be authorized within a
two-phase transaction, as it would cause its state to create
dependencies with other sessions, causing all sorts of side effects with
the existing session or other sessions spawned later on trying to use
the same temporary schema name.
Regression tests are added to cover all the grounds found, the original
report mentioned function creation, but monitoring closer there are many
other patterns with LOCK, DROP or CREATE EXTENSION which are involved.
One of the symptoms resulting in combining both is that the session
which used the temporary schema is not able to shut down completely,
waiting for being able to drop the temporary schema, something that it
cannot complete because of the two-phase transaction involved with
temporary objects. In this case the client is able to disconnect but
the session remains alive on the backend-side, potentially blocking
connection backend slots from being used. Other problems reported could
also involve server crashes.
This is back-patched down to v10, which is where 9b013dc has introduced
MyXactFlags, something that this patch relies on.
Reported-by: Alexey Bashtanov
Author: Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc
Backpatch-through: 10
Previously, parseCheckAggregates was run before
assign_query_collations, but this causes problems if any expression
has already had a collation assigned by some transform function (e.g.
transformCaseExpr) before parseCheckAggregates runs. The differing
collations would cause expressions not to be recognized as equal to
the ones in the GROUP BY clause, leading to spurious errors about
unaggregated column references.
The result was that CASE expr WHEN val ... would fail when "expr"
contained a GROUPING() expression or matched one of the group by
expressions, and where collatable types were involved; whereas the
supposedly identical CASE WHEN expr = val ... would succeed.
Backpatch all the way; this appears to have been wrong ever since
collations were introduced.
Per report from Guillaume Lelarge, analysis and patch by me.
Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com
Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk