Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result. A tiny minority of calls
creatively did it differently. Bring them into line with the rest.
This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.
Back-patch to v11 just to keep the branches in sync. (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)
Per searching to see if anyplace else had made the same error
repaired in 62148c352.
The interaction of these parameters was a bit confused/confusing,
and in fact v11 entirely misses the opportunity to apply partition
constraints when a partition is accessed directly (rather than
indirectly from its parent).
In HEAD, establish the principle that enable_partition_pruning controls
partition pruning and nothing else. When accessing a partition via its
parent, we do partition pruning (if enabled by enable_partition_pruning)
and then there is no need to consider partition constraints in the
constraint_exclusion logic. When accessing a partition directly, its
partition constraints are applied by the constraint_exclusion logic,
only if constraint_exclusion = on.
In v11, we can't have such a clean division of these GUCs' effects,
partly because we don't want to break compatibility too much in a
released branch, and partly because the clean coding requires
inheritance_planner to have applied partition pruning to a partitioned
target table, which it doesn't in v11. However, we can tweak things
enough to cover the missed case, which seems like a good idea since
it's potentially a performance regression from v10. This patch keeps
v11's previous behavior in which enable_partition_pruning overrides
constraint_exclusion for an inherited target table, though.
In HEAD, also teach relation_excluded_by_constraints that it's okay to use
inheritable constraints when trying to prune a traditional inheritance
tree. This might not be thought worthy of effort given that that feature
is semi-deprecated now, but we have enough infrastructure that it only
takes a couple more lines of code to do it correctly.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/9813f079-f16b-61c8-9ab7-4363cab28d80@lab.ntt.co.jp
Discussion: https://postgr.es/m/29069.1555970894@sss.pgh.pa.us
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after istarting up just
the first child node not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
The assertions added by commit b04aeb0a0 exposed that there are some
code paths wherein the executor will try to open an index without
holding any lock on it. We do have some lock on the index's table,
so it seems likely that there's no fatal problem with this (for
instance, the index couldn't get dropped from under us). Still,
it's bad practice and we should fix it.
To do so, remove the optimizations in ExecInitIndexScan and friends
that tried to avoid taking a lock on an index belonging to a target
relation, and just take the lock always. In non-bug cases, this
will result in no additional shared-memory access, since we'll find
in the local lock table that we already have a lock of the desired
type; hence, no significant performance degradation should occur.
Also, adjust the planner and executor so that the type of lock taken
on an index is always identical to the type of lock taken for its table,
by relying on the recently added RangeTblEntry.rellockmode field.
This avoids some corner cases where that might not have been true
before (possibly resulting in extra locking overhead), and prevents
future maintenance issues from having multiple bits of logic that
all needed to be in sync. In addition, this change removes all core
calls to ExecRelationIsTargetRelation, which avoids a possible O(N^2)
startup penalty for queries with large numbers of target relations.
(We'd probably remove that function altogether, were it not that we
advertise it as something that FDWs might want to use.)
Also adjust some places in selfuncs.c to not take any lock on indexes
they are transiently opening, since we can assume that plancat.c
did that already.
In passing, change gin_clean_pending_list() to take RowExclusiveLock
not AccessShareLock on its target index. Although it's not clear that
that's actually a bug, it seemed very strange for a function that's
explicitly going to modify the index to use only AccessShareLock.
David Rowley, reviewed by Julien Rouhaud and Amit Langote,
a bit of further tweaking by me
Discussion: https://postgr.es/m/19465.1541636036@sss.pgh.pa.us
This moves bitmap heap scan support to below an optional tableam
callback. It's optional as the whole concept of bitmap heapscans is
fairly block specific.
This basically moves the work previously done in bitgetpage() into the
new scan_bitmap_next_block callback, and the direct poking into the
buffer done in BitmapHeapNext() into the new scan_bitmap_next_tuple()
callback.
The abstraction is currently somewhat leaky because
nodeBitmapHeapscan.c's prefetching and visibilitymap based logic
remains - it's likely that we'll later have to move more into the
AM. But it's not trivial to do so without introducing a significant
amount of code duplication between the AMs, so that's a project for
later.
Note that now nodeBitmapHeapscan.c and the associated node types are a
bit misnamed. But it's not clear whether renaming wouldn't be a cure
worse than the disease. Either way, that'd be best done in a separate
commit.
Author: Andres Freund
Reviewed-By: Robert Haas (in an older version)
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
We can set this up once and for all in subquery_planner's initial survey
of the flattened rangetable, rather than incrementally adjusting it in
build_simple_rel. The previous approach made it rather hard to reason
about exactly when the value would be available, and we were definitely
using it in some places before the final value was computed.
Noted while fooling around with Amit Langote's patch to delay creation
of inheritance child rels. That didn't break this code, but it made it
even more fragile, IMO.
This just moves the table/matview[/toast] determination of relation
size to a callback, and uses a copy of the existing logic to implement
that callback for heap.
It probably would make sense to also move the index specific logic
into a callback, so the metapage handling (and probably more) can be
index specific. But that's a separate task.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
Previously, the planner created RangeTblEntry and RelOptInfo structs
for every partition of a partitioned table, even though many of them
might later be deemed uninteresting thanks to partition pruning logic.
This incurred significant overhead when there are many partitions.
Arrange to postpone creation of these data structures until after
we've processed the query enough to identify restriction quals for
the partitioned table, and then apply partition pruning before not
after creation of each partition's data structures. In this way
we need not open the partition relations at all for partitions that
the planner has no real interest in.
For queries that can be proven at plan time to access only a small
number of partitions, this patch improves the practical maximum
number of partitions from under 100 to perhaps a few thousand.
Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen,
Yoshikazu Imai, and David Rowley
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.
This implements one kind of generated column: stored (computed on
write). Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
Up to now, otherrel RelOptInfos were built at the same time as baserel
RelOptInfos, thanks to recursion in build_simple_rel(). However,
nothing in query_planner's preprocessing cares at all about otherrels,
only baserels, so we don't really need to build them until just before
we enter make_one_rel. This has two benefits:
* create_lateral_join_info did a lot of extra work to propagate
lateral-reference information from parents to the correct children.
But if we delay creation of the children till after that, it's
trivial (and much harder to break, too).
* Since we have all the restriction quals correctly assigned to
parent appendrels by this point, it'll be possible to do plan-time
pruning and never make child RelOptInfos at all for partitions that
can be pruned away. That's not done here, but will be later on.
Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen,
Yoshikazu Imai, and David Rowley
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
We've been creating duplicate RTEs for partitioned tables just
because we do so for regular inheritance parent tables. But unlike
regular-inheritance parents which are themselves regular tables
and thus need to be scanned, partitioned tables don't need the
extra RTE.
This makes the conditions for building a child RTE the same as those
for building an AppendRelInfo, allowing minor simplification in
expand_single_inheritance_child. Since the planner's actual processing
is driven off the AppendRelInfo list, nothing much changes beyond that,
we just have one fewer useless RTE entry.
Amit Langote, reviewed and hacked a bit by me
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
If there's only one child relation, the Append or MergeAppend isn't
doing anything useful, and can be elided. It does have a purpose
during planning though, which is to serve as a buffer between parent
and child Var numbering. Therefore we keep it all the way through
to setrefs.c, and get rid of it only after fixing references in the
plan level(s) above it. This works largely the same as setrefs.c's
ancient hack to get rid of no-op SubqueryScan nodes, and can even
share some code with that.
Note the change to make setrefs.c use apply_tlist_labeling rather than
ad-hoc code. This has the effect of propagating the child's resjunk
and ressortgroupref labels, which formerly weren't propagated when
removing a SubqueryScan. Doing that is demonstrably necessary for
the [Merge]Append cases, and seems harmless for SubqueryScan, if only
because trivial_subqueryscan is afraid to collapse cases where the
resjunk marking differs. (I suspect that restriction could now be
removed, though it's unclear that it'd make any new matches possible,
since the outer query can't have references to a child resjunk column.)
David Rowley, reviewed by Alvaro Herrera and Tomas Vondra
Discussion: https://postgr.es/m/CAKJS1f_7u8ATyJ1JGTMHFoKDvZdeF-iEBhs+sM_SXowOr9cArg@mail.gmail.com
I (tgl) remain dubious that it's a good idea for PartitionDirectory
to hold a pin on a relcache entry throughout planning, rather than
copying the data or using some kind of refcount scheme. However, it's
certainly the responsibility of the PartitionDirectory code to ensure
that what it's handing back is a stable data structure, not that of
its caller. So this is a pretty clear oversight in commit 898e5e329,
and one that can cost a lot of performance when there are many
partitions.
Amit Langote (extracted from a much larger patch set)
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
We still require AccessExclusiveLock on the partition itself, because
otherwise an insert that violates the newly-imposed partition
constraint could be in progress at the same time that we're changing
that constraint; only the lock level on the parent relation is
weakened.
To make this safe, we have to cope with (at least) three separate
problems. First, relevant DDL might commit while we're in the process
of building a PartitionDesc. If so, find_inheritance_children() might
see a new partition while the RELOID system cache still has the old
partition bound cached, and even before invalidation messages have
been queued. To fix that, if we see that the pg_class tuple seems to
be missing or to have a null relpartbound, refetch the value directly
from the table. We can't get the wrong value, because DETACH PARTITION
still requires AccessExclusiveLock throughout; if we ever want to
change that, this will need more thought. In testing, I found it quite
difficult to hit even the null-relpartbound case; the race condition
is extremely tight, but the theoretical risk is there.
Second, successive calls to RelationGetPartitionDesc might not return
the same answer. The query planner will get confused if lookup up the
PartitionDesc for a particular relation does not return a consistent
answer for the entire duration of query planning. Likewise, query
execution will get confused if the same relation seems to have a
different PartitionDesc at different times. Invent a new
PartitionDirectory concept and use it to ensure consistency. This
ensures that a single invocation of either the planner or the executor
sees the same view of the PartitionDesc from beginning to end, but it
does not guarantee that the planner and the executor see the same
view. Since this allows pointers to old PartitionDesc entries to
survive even after a relcache rebuild, also postpone removing the old
PartitionDesc entry until we're certain no one is using it.
For the most part, it seems to be OK for the planner and executor to
have different views of the PartitionDesc, because the executor will
just ignore any concurrently added partitions which were unknown at
plan time; those partitions won't be part of the inheritance
expansion, but invalidation messages will trigger replanning at some
point. Normally, this happens by the time the very next command is
executed, but if the next command acquires no locks and executes a
prepared query, it can manage not to notice until a new transaction is
started. We might want to tighten that up, but it's material for a
separate patch. There would still be a small window where a query
that started just after an ATTACH PARTITION command committed might
fail to notice its results -- but only if the command starts before
the commit has been acknowledged to the user. All in all, the warts
here around serializability seem small enough to be worth accepting
for the considerable advantage of being able to add partitions without
a full table lock.
Although in general the consequences of new partitions showing up
between planning and execution are limited to the query not noticing
the new partitions, run-time partition pruning will get confused in
that case, so that's the third problem that this patch fixes.
Run-time partition pruning assumes that indexes into the PartitionDesc
are stable between planning and execution. So, add code so that if
new partitions are added between plan time and execution time, the
indexes stored in the subplan_map[] and subpart_map[] arrays within
the plan's PartitionedRelPruneInfo get adjusted accordingly. There
does not seem to be a simple way to generalize this scheme to cope
with partitions that are removed, mostly because they could then get
added back again with different bounds, but it works OK for added
partitions.
This code does not try to ensure that every backend participating in
a parallel query sees the same view of the PartitionDesc. That
currently doesn't matter, because we never pass PartitionDesc
indexes between backends. Each backend will ignore the concurrently
added partitions which it notices, and it doesn't matter if different
backends are ignoring different sets of concurrently added partitions.
If in the future that matters, for example because we allow writes in
parallel query and want all participants to do tuple routing to the same
set of partitions, the PartitionDirectory concept could be improved to
share PartitionDescs across backends. There is a draft patch to
serialize and restore PartitionDescs on the thread where this patch
was discussed, which may be a useful place to start.
Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs,
Amit Langote, and Michael Paquier for discussion, and to Alvaro
Herrera for some review.
Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
In particular, make it possible to prove/refute "x IS NULL" and
"x IS NOT NULL" predicates from a clause involving a ScalarArrayOpExpr
even when we are unable or unwilling to deconstruct the expression
into an AND/OR tree. This avoids a former unexpected degradation of
plan quality when the size of an ARRAY[] expression or array constant
exceeded the arbitrary MAX_SAOP_ARRAY_SIZE limit. For IS-NULL proofs,
we don't really care about the values of the individual array elements;
at most, we care whether there are any, and for some common cases we
needn't even know that.
The main user-visible effect of this is to let the optimizer recognize
applicability of partial indexes with "x IS NOT NULL" predicates to
queries with "x IN (array)" clauses in some cases where it previously
failed to recognize that. The structure of predtest.c is such that a
bunch of related proofs will now also succeed, but they're probably
much less useful in the wild.
James Coleman, reviewed by David Rowley
Discussion: https://postgr.es/m/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg@mail.gmail.com
Previously, this function acquired locks in the order using
find_all_inheritors(), which locks the children of each table that it
processes in ascending OID order, and which processes the inheritance
hierarchy as a whole in a breadth-first fashion. Now, it processes
the inheritance hierarchy in a depth-first fashion, and at each level
it proceeds in the order in which tables appear in the PartitionDesc.
If table inheritance rather than table partitioning is used, the old
order is preserved.
This change moves the locking of any given partition much closer to
the code that actually expands that partition. This seems essential
if we ever want to allow concurrent DDL to add or remove partitions,
because if the set of partitions can change, we must use the same data
to decide which partitions to lock as we do to decide which partitions
to expand; otherwise, we might expand a partition that we haven't
locked. It should hopefully also facilitate efforts to postpone
inheritance expansion or locking for performance reasons, because
there's really no way to postpone locking some partitions if
we're blindly locking them all using find_all_inheritors().
The only downside of this change which is known to me is that it
further deviates from the principle that we should always lock the
inheritance hierarchy in find_all_inheritors() order to avoid deadlock
risk. However, we've already crossed that bridge in commit
9eefba181f and there are futher patches
pending that make similar changes, so this isn't really giving up
anything that we haven't surrendered already -- and it seems entirely
worth it, given the performance benefits some of those changes seem
likely to bring.
Patch by me; thanks to David Rowley for discussion of these issues.
Discussion: http://postgr.es/m/CAKJS1f_eEYVEq5tM8sm1k-HOwG0AyCPwX54XG9x4w0zy_N4Q_Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
This is similar in spirit to the existing partbounds.c file in the
same directory, except that there's a lot less code in the new file
created by this commit. Pending work in this area proposes to add a
bunch more code related to PartitionDescs, though, and this will give
us a good place to put it.
Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
PG type coercions are generally strict, ie a NULL input must produce
a NULL output (or, in domain cases, possibly an error). The planner's
understanding of that was a bit incomplete though, so improve it:
* Teach contain_nonstrict_functions() that CoerceViaIO can always be
considered strict. Previously it believed that only if the underlying
I/O functions were marked strict, which is often but not always true.
* Teach clause_is_strict_for() that CoerceViaIO, ArrayCoerceExpr,
ConvertRowtypeExpr, CoerceToDomain can all be considered strict.
Previously it knew nothing about any of them.
The main user-visible impact of this is that IS NOT NULL predicates
can be proven to hold from expressions involving casts in more cases
than before, allowing partial indexes with such predicates to be used
without extra pushups. This reduces the surprise factor for users,
who may well be used to ordinary (function-call-based) casts being
known to be strict.
Per a gripe from Samuel Williams. This doesn't rise to the level of
a bug, IMO, so no back-patch.
Discussion: https://postgr.es/m/27571.1550617881@sss.pgh.pa.us
The recursion in contain_nonstrict_functions_walker() was done wrong,
causing the strictness check to be bypassed for a parse node that
is the immediate input of an ArrayCoerceExpr node. This could allow,
for example, incorrect decisions about whether a strict SQL function
can be inlined.
I didn't add a regression test, because (a) the bug is so narrow
and (b) I couldn't think of a test case that wasn't dependent on a
large number of other behaviors, to the point where it would likely
soon rot to the point of not testing what it was intended to.
I broke this in commit c12d570fa, so back-patch to v11.
Discussion: https://postgr.es/m/27571.1550617881@sss.pgh.pa.us
For a long time, indxpath.c has had the ability to extract derived (lossy)
index conditions from certain operators such as LIKE. For just as long,
it's been obvious that we really ought to make that capability available
to extensions. This commit finally accomplishes that, by adding another
API for planner support functions that lets them create derived index
conditions for their functions. As proof of concept, the hardwired
"special index operator" code formerly present in indxpath.c is pushed
out to planner support functions attached to LIKE and other relevant
operators.
A weak spot in this design is that an extension needs to know OIDs for
the operators, datatypes, and opfamilies involved in the transformation
it wants to make. The core-code prototypes use hard-wired OID references
but extensions don't have that option for their own operators etc. It's
usually possible to look up the required info, but that may be slow and
inconvenient. However, improving that situation is a separate task.
I want to do some additional refactorization around selfuncs.c, but
that also seems like a separate task.
Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
Add support function requests for estimating the selectivity, cost,
and number of result rows (if a SRF) of the target function.
The lack of a way to estimate selectivity of a boolean-returning
function in WHERE has been a recognized deficiency of the planner
since Berkeley days. This commit finally fixes it.
In addition, non-constant estimates of cost and number of output
rows are now possible. We still fall back to looking at procost
and prorows if the support function doesn't service the request,
of course.
To make concrete use of the possibility of estimating output rowcount
for SRFs, this commit adds support functions for array_unnest(anyarray)
and the integer variants of generate_series; the lack of plausible
rowcount estimates for those, even when it's obvious to a human,
has been a repeated subject of complaints. Obviously, much more
could now be done in this line, but I'm mostly just trying to get
the infrastructure in place.
Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
Rename/repurpose pg_proc.protransform as "prosupport". The idea is
still that it names an internal function that provides knowledge to
the planner about the behavior of the function it's attached to;
but redesign the API specification so that it's not limited to doing
just one thing, but can support an extensible set of requests.
The original purpose of simplifying a function call is handled by
the first request type to be invented, SupportRequestSimplify.
Adjust all the existing transform functions to handle this API,
and rename them fron "xxx_transform" to "xxx_support" to reflect
the potential generalization of what they do. (Since we never
previously provided any way for extensions to add transform functions,
this change doesn't create an API break for them.)
Also add DDL and pg_dump support for attaching a support function to a
user-defined function. Unfortunately, DDL access has to be restricted
to superusers, at least for now; but seeing that support functions
will pretty much have to be written in C, that limitation is just
theoretical. (This support is untested in this patch, but a follow-on
patch will add cases that exercise it.)
Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
In place of three separate but interrelated lists (indexclauses,
indexquals, and indexqualcols), an IndexPath now has one list
"indexclauses" of IndexClause nodes. This holds basically the same
information as before, but in a more useful format: in particular, there
is now a clear connection between an indexclause (an original restriction
clause from WHERE or JOIN/ON) and the indexquals (directly usable index
conditions) derived from it.
We also change the ground rules a bit by mandating that clause commutation,
if needed, be done up-front so that what is stored in the indexquals list
is always directly usable as an index condition. This gets rid of repeated
re-determination of which side of the clause is the indexkey during costing
and plan generation, as well as repeated lookups of the commutator
operator. To minimize the added up-front cost, the typical case of
commuting a plain OpExpr is handled by a new special-purpose function
commute_restrictinfo(). For RowCompareExprs, generating the new clause
properly commuted to begin with is not really any more complex than before,
it's just different --- and we can save doing that work twice, as the
pretty-klugy original implementation did.
Tracking the connection between original and derived clauses lets us
also track explicitly whether the derived clauses are an exact or lossy
translation of the original. This provides a cheap solution to getting
rid of unnecessary rechecks of boolean index clauses, which previously
seemed like it'd be more expensive than it was worth.
Another pleasant (IMO) side-effect is that EXPLAIN now always shows
index clauses with the indexkey on the left; this seems less confusing.
This commit leaves expand_indexqual_conditions() and some related
functions in a slightly messy state. I didn't bother to change them
any more than minimally necessary to work with the new data structure,
because all that code is going to be refactored out of existence in
a follow-on patch.
Discussion: https://postgr.es/m/22182.1549124950@sss.pgh.pa.us
Up to now postgres_fdw has been using create_foreignscan_path() to
generate not only base-relation paths, but also paths for foreign joins
and foreign upperrels. This is wrong, because create_foreignscan_path()
calls get_baserel_parampathinfo() which will only do the right thing for
baserels. It accidentally fails to fail for unparameterized paths, which
are the only ones postgres_fdw (thought it) was handling, but we really
need different APIs for the baserel and join cases.
In HEAD, the best thing to do seems to be to split up the baserel,
joinrel, and upperrel cases into three functions so that they can
have different APIs. I haven't actually given create_foreign_join_path
a different API in this commit: we should spend a bit of time thinking
about just what we want to do there, since perhaps FDWs would want to
do something different from the build-up-a-join-pairwise approach that
get_joinrel_parampathinfo expects. In the meantime, since postgres_fdw
isn't prepared to generate parameterized joins anyway, just give it a
defense against trying to plan joins with lateral refs.
In addition (and this is what triggered this whole mess) fix bug #15613
from Srinivasan S A, by teaching file_fdw and postgres_fdw that plain
baserel foreign paths still have outer refs if the relation has
lateral_relids. Add some assertions in relnode.c to catch future
occurrences of the same error --- in particular, to catch other FDWs
doing that, but also as backstop against core-code mistakes like the
one fixed by commit bdd9a99aa.
Bug #15613 also needs to be fixed in the back branches, but the
appropriate fix will look quite a bit different there, since we don't
want to assume that existing FDWs get the word right away.
Discussion: https://postgr.es/m/15613-092be1be9576c728@postgresql.org
The old name of this file was never a very good indication of what it
was for. Now that there's also access/relation.h, we have a potential
confusion hazard as well, so let's rename it to something more apropos.
Per discussion, "pathnodes.h" is reasonable, since a good fraction of
the file is Path node definitions.
While at it, tweak a couple of other headers that were gratuitously
importing relation.h into modules that don't need it.
Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
Create a new header optimizer/optimizer.h, which exposes just the
planner functions that can be used "at arm's length", without need
to access Paths or the other planner-internal data structures defined
in nodes/relation.h. This is intended to provide the whole planner
API seen by most of the rest of the system; although FDWs still need
to use additional stuff, and more thought is also needed about just
what selfuncs.c should rely on.
The main point of doing this now is to limit the amount of new
#include baggage that will be needed by "planner support functions",
which I expect to introduce later, and which will be in relevant
datatype modules rather than anywhere near the planner.
This commit just moves relevant declarations into optimizer.h from
other header files (a couple of which go away because everything
got moved), and adjusts #include lists to match. There's further
cleanup that could be done if we want to decide that some stuff
being exposed by optimizer.h doesn't belong in the planner at all,
but I'll leave that for another day.
Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
Move a few very simple node-creation and node-type-testing functions
from the planner's clauses.c to nodes/makefuncs and nodes/nodeFuncs.
There's nothing planner-specific about them, as evidenced by the
number of other places that were using them.
While at it, rename and_clause() etc to is_andclause() etc, to clarify
that they are node-type-testing functions not node-creation functions.
And use "static inline" implementations for the shortest ones.
Also, modify flatten_join_alias_vars() and some subsidiary functions
to take a Query not a PlannerInfo to define the join structure that
Vars should be translated according to. They were only using the
"parse" field of the PlannerInfo anyway, so this just requires removing
one level of indirection. The advantage is that now parse_agg.c can
use flatten_join_alias_vars() without the horrid kluge of creating an
incomplete PlannerInfo, which will allow that file to be decoupled from
relation.h in a subsequent patch.
Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
Previously, only literals were allowed. This change allows general
expressions, including functions calls, which are evaluated at the
time the DDL command is executed.
Besides offering some more functionality, it simplifies the parser
structures and removes some inconsistencies in how the literals were
handled.
Author: Kyotaro Horiguchi, Tom Lane, Amit Langote
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/9f88b5e0-6da2-5227-20d0-0d7012beaa1c@lab.ntt.co.jp/
Up to now, createplan.c attempted to share PARAM_EXEC slots for
NestLoopParams across different plan levels, if the same underlying Var
was being fed down to different righthand-side subplan trees by different
NestLoops. This was, I think, more of an artifact of using subselect.c's
PlannerParamItem infrastructure than an explicit design goal, but anyway
that was the end result.
This works well enough as long as the plan tree is executing synchronously,
but the feature whereby Gather can execute the parallelized subplan locally
breaks it. An upper NestLoop node might execute for a row retrieved from
a parallel worker, and assign a value for a PARAM_EXEC slot from that row,
while the leader's copy of the parallelized subplan is suspended with a
different active value of the row the Var comes from. When control
eventually returns to the leader's subplan, it gets the wrong answers if
the same PARAM_EXEC slot is being used within the subplan, as reported
in bug #15577 from Bartosz Polnik.
This is pretty reminiscent of the problem fixed in commit 46c508fbc, and
the proper fix seems to be the same: don't try to share PARAM_EXEC slots
across different levels of controlling NestLoop nodes.
This requires decoupling NestLoopParam handling from PlannerParamItem
handling, although the logic remains somewhat similar. To avoid bizarre
division of labor between subselect.c and createplan.c, I decided to move
all the param-slot-assignment logic for both cases out of those files
and put it into a new file paramassign.c. Hopefully it's a bit better
documented now, too.
A regression test case for this might be nice, but we don't know a
test case that triggers the problem with a suitably small amount
of data.
Back-patch to 9.6 where we added Gather nodes. It's conceivable that
related problems exist in older branches; but without some evidence
for that, I'll leave the older branches alone.
Discussion: https://postgr.es/m/15577-ca61ab18904af852@postgresql.org
This commit moves expand_inherited_tables and underlings from
optimizer/prep/prepunionc.c to optimizer/utils/inherit.c.
Also, all of the AppendRelInfo-based expression manipulation routines
are moved to optimizer/utils/appendinfo.c.
No functional code changes. One exception is the introduction of
make_append_rel_info, but that's still just moving around code.
Also, stop including <limits.h> in prepunion.c, which no longer needs
it since 3fc6e2d7f5. I (Álvaro) noticed this because Amit was copying
that to inherit.c, which likewise doesn't need it.
Author: Amit Langote
Discussion: https://postgr.es/m/3be67028-a00a-502c-199a-da00eec8fb6e@lab.ntt.co.jp
MinMaxExpr invokes the btree comparison function for its input datatype,
so it's only leakproof if that function is. Many such functions are
indeed leakproof, but others are not, and we should not just assume that
they are. Hence, adjust contain_leaked_vars to verify the leakproofness
of the referenced function explicitly.
I didn't add a regression test because it would need to depend on
some particular comparison function being leaky, and that's a moving
target, per discussion.
This has been wrong all along, so back-patch to supported branches.
Discussion: https://postgr.es/m/31042.1546194242@sss.pgh.pa.us
Up to now we've not worried much about joins where the join key is a
relation's CTID column, reasoning that storing a table's CTIDs in some
other table would be pretty useless. However, there are use-cases for
this sort of query involving self-joins, so that argument doesn't really
hold water.
This patch allows generating plans for joins on CTID that use a nestloop
with inner TidScan, similar to what we might do with an index on the join
column. This is the most efficient way to join when the outer side of
the nestloop is expected to yield relatively few rows.
This change requires upgrading tidpath.c and the generated TidPaths
to work with RestrictInfos instead of bare qual clauses, but that's
long-postponed technical debt anyway.
Discussion: https://postgr.es/m/17443.1545435266@sss.pgh.pa.us
Doing this requires an assumption that the invoked btree comparison
function is immutable. We could check that explicitly, but in other
places such as contain_mutable_functions we just assume that it's true,
so we may as well do likewise here. (If the comparison function's
behavior isn't immutable, the sort order in indexes built with it would
be unstable, so it seems certainly wrong for it not to be so.)
Vik Fearing
Discussion: https://postgr.es/m/c6e8504c-4c43-35fa-6c8f-3c0b80a912cc@2ndquadrant.com
If a domain has no constraints, then CoerceToDomain doesn't really do
anything and can be simplified to a RelabelType. This not only
eliminates cycles at execution, but allows the planner to optimize better
(for instance, match the coerced expression to an index on the underlying
column). However, we do have to support invalidating the plan later if
a constraint gets added to the domain. That's comparable to the case of
a change to a SQL function that had been inlined into a plan, so all the
necessary logic already exists for plans depending on functions. We
need only duplicate or share that logic for domains.
ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval
messages for the domain's pg_type entry, since those operations don't
update that row. (ALTER DOMAIN SET/DROP NOT NULL do update that row,
so no code change is needed for them.)
Testing this revealed what's really a pre-existing bug in plpgsql:
it caches the SQL-expression-tree expansion of type coercions and
had no provision for invalidating entries in that cache. Up to now
that was only a problem if such an expression had inlined a SQL
function that got changed, which is unlikely though not impossible.
But failing to track changes of domain constraints breaks an existing
regression test case and would likely cause practical problems too.
We could fix that locally in plpgsql, but what seems like a better
idea is to build some generic infrastructure in plancache.c to store
standalone expressions and track invalidation events for them.
(It's tempting to wonder whether plpgsql's "simple expression" stuff
could use this code with lower overhead than its current use of the
heavyweight plancache APIs. But I've left that idea for later.)
Other stuff fixed in passing:
* Allow estimate_expression_value() to drop CoerceToDomain
unconditionally, effectively assuming that the coercion will succeed.
This will improve planner selectivity estimates for cases involving
estimatable expressions that are coerced to domains. We could have
done this independently of everything else here, but there wasn't
previously any need for eval_const_expressions_mutator to know about
CoerceToDomain at all.
* Use a dlist for plancache.c's list of cached plans, rather than a
manually threaded singly-linked list. That eliminates a potential
performance problem in DropCachedPlan.
* Fix a couple of inconsistencies in typecmds.c about whether
operations on domains drop RowExclusiveLock on pg_type. Our common
practice is that DDL operations do drop catalog locks, so standardize
on that choice.
Discussion: https://postgr.es/m/19958.1544122124@sss.pgh.pa.us
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
The planner doesn't make any use of root->total_table_pages until it
estimates costs of indexscans, so we don't need to compute it as
early as that's currently done. By doing the calculation between
set_base_rel_sizes and set_base_rel_pathlists, we can omit relations
that get removed from the query by partition pruning or constraint
exclusion, which seems like a more accurate basis for costing.
(Historical note: I think at the time this code was written, there
was not a separation between the "set sizes" and "set pathlists"
steps, so that this approach would have been impossible at the time.
But now that we have that separation, this is clearly the better way
to do things.)
David Rowley, reviewed by Edmund Horner
Discussion: https://postgr.es/m/CAKJS1f-NG1mRM0VOtkAG7=ZLQWihoqees9R4ki3CKBB0-fRfCA@mail.gmail.com
A ConvertRowtypeExpr is used to translate a whole-row reference of a
child to that of a parent. The planner produces nested
ConvertRowtypeExpr while translating whole-row reference of a leaf
partition in a multi-level partition hierarchy. Executor then
translates the whole-row reference from the leaf partition into all
the intermediate parent's whole-row references before arriving at the
final whole-row reference. It could instead translate the whole-row
reference from the leaf partition directly to the top-most parent's
whole-row reference skipping any intermediate translations.
Ashutosh Bapat, with tests by Kyotaro Horiguchi and some
editorialization by me. Reviewed by Andres Freund, Pavel Stehule,
Kyotaro Horiguchi, Dmitry Dolgov, Tom Lane.
An array-type coercion appearing within a CASE that has a constant
(after const-folding) test expression was mangled by the planner, causing
all the elements of the resulting array to be equal to the coerced value
of the CASE's test expression. This is my oversight in commit c12d570fa:
that changed ArrayCoerceExpr to use a subexpression involving a
CaseTestExpr, and I didn't notice that eval_const_expressions needed an
adjustment to keep from folding such a CaseTestExpr to a constant when
it's inside a suitable CASE.
This is another in what's getting to be a depressingly long line of bugs
associated with misidentification of the referent of a CaseTestExpr.
We're overdue to redesign that mechanism; but any such fix is unlikely
to be back-patchable into v11. As a stopgap, fix eval_const_expressions
to do what it must here. Also add a bunch of comments pointing out the
restrictions and assumptions that are needed to make this work at all.
Also fix a related oversight: contain_context_dependent_node() was not
aware of the relationship of ArrayCoerceExpr to CaseTestExpr. That was
somewhat fail-soft, in that the outcome of a wrong answer would be to
prevent optimizations that could have been made, but let's fix it while
we're at it.
Per bug #15471 from Matt Williams. Back-patch to v11 where the faulty
logic came in.
Discussion: https://postgr.es/m/15471-1117f49271989bad@postgresql.org