Per buildfarm member mandrill, it seems that
max_parallel_workers_per_gather may not always be set to the default value
of 2 when the new test added in 16fd03e95 is executed. Here let's just
explicitly set that to 2 so that the planner never opts to use more than
that many parallel workers.
This adds combine, serial and deserial functions for the array_agg() and
string_agg() aggregate functions, thus allowing these aggregates to
partake in partial aggregations. This allows both parallel aggregation to
take place when these aggregates are present and also allows additional
partition-wise aggregation plan shapes to include plans that require
additional aggregation once the partially aggregated results from the
partitions have been combined.
Author: David Rowley
Reviewed-by: Andres Freund, Tomas Vondra, Stephen Frost, Tom Lane
Discussion: https://postgr.es/m/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A@mail.gmail.com
In 1349d2790, we gave the planner the ability to provide ORDER BY/DISTINCT
Aggrefs with presorted input so that nodeAgg would not have to perform
sorts during execution. That commit failed to properly consider the
implications of if the Aggref had a volatile function in its ORDER
BY/DISTINCT clause. As it happened, this resulted in an ERROR about the
volatile function being missing from the targetlist.
Here, instead of adding the volatile function to the targetlist, we just
never consider an Aggref with a volatile function in its ORDER BY/DISTINCT
clause when choosing which Aggrefs we should sort by. We do this as if we
were to choose a plan which provided these aggregates with presorted
input, then if there were many such aggregates which could all share the
same sort order, then it may be surprising if they all shared the same
sort sometimes and didn't at other times when some other set of aggregates
were given presorted results. We can avoid this inconsistency by just
never providing these volatile function aggregates with presorted input.
Reported-by: Dean Rasheed
Discussion: https://postgr.es/m/CAEZATCWETioXs5kY8vT6BVguY41_wD962VDk=u_Nvd7S1UXzuQ@mail.gmail.com
1349d279 added query planner support to allow more efficient execution of
aggregate functions which have an ORDER BY or a DISTINCT clause. Prior to
that commit, the planner would only request that the lower planner produce
a plan with the order required for the GROUP BY clause and it would be
left up to nodeAgg.c to perform the final sort of records within each
group so that the aggregate transition functions were called in the
correct order. Now that the planner requests the lower planner produce a
plan with the GROUP BY and the ORDER BY / DISTINCT aggregates in mind,
there is the possibility that the planner chooses a plan which could be
less efficient than what would have been produced before 1349d279.
While developing 1349d279, I had in mind that Incremental Sort would help
us in cases where an index exists only on the GROUP BY column(s).
Incremental Sort would just replace the implicit tuplesorts which are
being performed in nodeAgg.c. However, because the planner has the
flexibility to instead choose a plan which just performs a full sort on
both the GROUP BY and ORDER BY / DISTINCT aggregate columns, there is
potential for the planner to make a bad choice. The costing for
Incremental Sort is not perfect as it assumes an even distribution of rows
to sort within each sort group.
Here we add an escape hatch in the form of the enable_presorted_aggregate
GUC. This will allow users to get the pre-PG16 behavior in cases where
they have no other means to convince the query planner to produce a plan
which only sorts on the GROUP BY column(s).
Discussion: https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com
This reverts commit db0d67db24 and
several follow-on fixes. The idea of making a cost-based choice
of the order of the sorting columns is not fundamentally unsound,
but it requires cost information and data statistics that we don't
really have. For example, relying on procost to distinguish the
relative costs of different sort comparators is pretty pointless
so long as most such comparator functions are labeled with cost 1.0.
Moreover, estimating the number of comparisons done by Quicksort
requires more than just an estimate of the number of distinct values
in the input: you also need some idea of the sizes of the larger
groups, if you want an estimate that's good to better than a factor of
three or so. That's data that's often unknown or not very reliable.
Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.
Even if all the inputs are perfectly reliable, the cost algorithm
as-implemented cannot offer useful information about how to order
sorting columns beyond the point at which the average group size
is estimated to drop to 1.
Close inspection of the code added by db0d67db2 shows that there
are also multiple small bugs. These could have been fixed, but
there's not much point if we don't trust the estimates to be
accurate in-principle.
Finally, the changes in cost_sort's behavior made for very large
changes (often a factor of 2 or so) in the cost estimates for all
sorting operations, not only those for multi-column GROUP BY.
That naturally changes plan choices in many situations, and there's
precious little evidence to show that the changes are for the better.
Given the above doubts about whether the new estimates are really
trustworthy, it's hard to summon much confidence that these changes
are better on the average.
Since we're hard up against the release deadline for v15, let's
revert these changes for now. We can always try again later.
Note: in v15, I left T_PathKeyInfo in place in nodes.h even though
it's unreferenced. Removing it would be an ABI break, and it seems
a bit late in the release cycle for that.
Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been
executed by always performing a sort in nodeAgg.c to sort the tuples in
the current group into the correct order before calling the transition
function on the sorted tuples. This was not great as often there might be
an index that could have provided pre-sorted input and allowed the
transition functions to be called as the rows come in, rather than having
to store them in a tuplestore in order to sort them once all the tuples
for the group have arrived.
Here we change the planner so it requests a path with a sort order which
supports the most amount of ORDER BY / DISTINCT aggregate functions and
add new code to the executor to allow it to support the processing of
ORDER BY / DISTINCT aggregates where the tuples are already sorted in the
correct order.
Since there can be many ORDER BY / DISTINCT aggregates in any given query
level, it's very possible that we can't find an order that suits all of
these aggregates. The sort order that the planner chooses is simply the
one that suits the most aggregate functions. We take the most strictly
sorted variation of each order and see how many aggregate functions can
use that, then we try again with the order of the remaining aggregates to
see if another order would suit more aggregate functions. For example:
SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
would request the sort order to be {a, b} because {a} is a subset of the
sort order of {a,b}, but;
SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...
would just pick a plan ordered by {a} (we give precedence to aggregates
which are earlier in the targetlist).
SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...
would choose to order by {b} since two aggregates suit that vs just one
that requires input ordered by {a}.
Author: David Rowley
Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may be heavily dependent on the order in which the keys are
compared when building the groups. Grouping does not imply any ordering,
so we're allowed to compare the keys in arbitrary order, and a Hash Agg
leverages this. But for Group Agg, we simply compared keys in the order
as specified in the query. This commit explores alternative ordering of
the keys, trying to find a cheaper one.
In principle, we might generate grouping paths for all permutations of
the keys, and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a couple interesting orderings based
on both local and global information.
When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be better
to perform more expensive comparisons (for complex data types etc.)
last, because maybe the cheaper comparisons will be enough. Similarly,
the higher the cardinality of a key, the lower the probability we’ll
need to compare more keys. The patch generates and costs various
orderings, picking the cheapest ones.
The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. E.g. there
may be an explicit ORDER BY clause, or some other ordering-dependent
operation, higher up in the query, and using the same ordering may allow
using either incremental sort or even eliminate the sort entirely.
The patch generates orderings and picks those minimizing the comparison
cost (for various pathkeys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, on the assumption
the user might have additional insights.
This introduces a new GUC enable_group_by_reordering, so that the
optimization may be disabled if needed.
The original patch was proposed by Teodor Sigaev, and later improved and
reworked by Dmitry Dolgov. Reviews by a number of people, including me,
Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu.
Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra
Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu
Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
The idea behind this patch is to make it possible to run individual
test scripts without running the entire core test suite. Making all
the scripts completely independent would involve a massive rewrite,
and would probably be worse for coverage of things like concurrent DDL.
So this patch just does what seems practical with limited changes.
The net effect is that any test script can be run after running
limited earlier dependencies:
* all scripts depend on test_setup
* many scripts depend on create_index
* other dependencies are few in number, and are documented in
the parallel_schedule file.
To accomplish this, I chose a small number of commonly-used tables
and moved their creation and filling into test_setup. Later scripts
are expected not to modify these tables' data contents, for fear of
affecting other scripts' results. Also, our former habit of declaring
all C functions in one place is now gone in favor of declaring them
where they're used, if that's just one script, or in test_setup if
necessary.
There's more that could be done to remove some of the remaining
inter-script dependencies, but significantly more-invasive changes
would be needed, and at least for now it doesn't seem worth it.
Discussion: https://postgr.es/m/1114748.1640383217@sss.pgh.pa.us
Recursion into the FILTER clause was mis-implemented, such that a
relevant Var or Aggref at the very top of the FILTER clause would
be ignored. (Of course, that'd have to be a plain boolean Var or
boolean-returning aggregate.) The consequence would be
mis-identification of the correct semantic level of the aggregate,
which could lead to not-per-spec query behavior. If the FILTER
expression is an aggregate, this could also lead to failure to issue
an expected "aggregate function calls cannot be nested" error, which
would likely result in a core dump later on, since the planner and
executor aren't expecting such cases to appear.
The root cause is that commit b560ec1b0 blindly copied some code
that assumed it's recursing into a List, and thus didn't examine the
top-level node. To forestall questions about why this call doesn't
look like the others, as well as possible future copy-and-paste
mistakes, let's change all three check_agg_arguments_walker calls in
check_agg_arguments, even though only the one for the filter clause
is really broken.
Per bug #17152 from Zhiyong Wu. This has been wrong since we
implemented FILTER, so back-patch to all supported versions.
(Testing suggests that pre-v11 branches manage to avoid crashing
in the bad-Aggref case, thanks to "redundant" checks in ExecInitAgg.
But I'm not sure how thorough that protection is, and anyway the
wrong-behavior issue remains, so fix 9.6 and 10 too.)
Discussion: https://postgr.es/m/17152-c7f906cc1a88e61b@postgresql.org
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough. That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize". People seem to like "Memoize", so let's do the rename.
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
Here we add a new executor node type named "Result Cache". The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins. This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again. Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.
For certain data sets, this can significantly improve the performance of
joins. The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join. In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch. Merge joins would have to
skip over all of the unmatched rows. If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join. The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large. Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join. This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does. The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables. Smaller hash tables generally perform better.
The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size. We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.
For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node. We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be. Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.
For now, the planner will only consider using a result cache for
parameterized nested loop joins. This works for both normal joins and
also for LATERAL type joins to subqueries. It is possible to use this new
node for other uses in the future. For example, to cache results from
correlated subqueries. However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio. Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.
The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations. With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be. In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%. Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join. However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values. If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join. Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature. Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.
For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache. However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default. There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression. Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default. It remains to be seen if we'll
maintain that setting for the release.
Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch. Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people. If there's some consensus on a better name, then we can
change it before the release. Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.
Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
This removes "Add Result Cache executor node". It seems that something
weird is going on with the tracking of cache hits and misses as
highlighted by many buildfarm animals. It's not yet clear what the
problem is as other parts of the plan indicate that the cache did work
correctly, it's just the hits and misses that were being reported as 0.
This is especially a bad time to have the buildfarm so broken, so
reverting before too many more animals go red.
Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
Here we add a new executor node type named "Result Cache". The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins. This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again. Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.
For certain data sets, this can significantly improve the performance of
joins. The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join. In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch. Merge joins would have to
skip over all of the unmatched rows. If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join. The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large. Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join. This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does. The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables. Smaller hash tables generally perform better.
The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size. We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.
For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node. We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be. Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.
For now, the planner will only consider using a result cache for
parameterized nested loop joins. This works for both normal joins and
also for LATERAL type joins to subqueries. It is possible to use this new
node for other uses in the future. For example, to cache results from
correlated subqueries. However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio. Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.
The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations. With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be. In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%. Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join. However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values. If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join. Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature. Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.
For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache. However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default. There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression. Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default. It remains to be seen if we'll
maintain that setting for the release.
Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch. Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people. If there's some consensus on a better name, then we can
change it before the release. Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.
Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
SERIALIZABLE no longer inhibits parallelism, so we can drop some
outdated workarounds and comments from regression tests. The change
came in release 12, commit bb16aba5, but it's not really worth
back-patching.
Also fix a typo.
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJUaHeK%3DHLATxF1JOKDjKJVrBKA-zmbPAebOM0Se2FQRg%40mail.gmail.com
Add infinities that behave the same as they do in the floating-point
data types. Aside from any intrinsic usefulness these may have,
this closes an important gap in our ability to convert floating
values to numeric and/or replace float-based APIs with numeric.
The new values are represented by bit patterns that were formerly
not used (although old code probably would take them for NaNs).
So there shouldn't be any pg_upgrade hazard.
Patch by me, reviewed by Dean Rasheed and Andrew Gierth
Discussion: https://postgr.es/m/606717.1591924582@sss.pgh.pa.us
When there is just one non-null input value, and it is infinity or NaN,
aggregates such as stddev_pop and covar_pop should produce a NaN
result, because the calculation is not well-defined. They used to do
so, but since we adopted Youngs-Cramer aggregation in commit e954a727f,
they produced zero instead. That's an oversight, so fix it. Add tests
exercising these edge cases.
Affected aggregates are
var_pop(double precision)
stddev_pop(double precision)
var_pop(real)
stddev_pop(real)
regr_sxx(double precision,double precision)
regr_syy(double precision,double precision)
regr_sxy(double precision,double precision)
regr_r2(double precision,double precision)
regr_slope(double precision,double precision)
regr_intercept(double precision,double precision)
covar_pop(double precision,double precision)
corr(double precision,double precision)
Back-patch to v12 where the behavior change was accidentally introduced.
Report and patch by me; thanks to Dean Rasheed for review.
Discussion: https://postgr.es/m/353062.1591898766@sss.pgh.pa.us
When merging two NumericAggStates, the code missed adding the new
state's NaNcount unless its N was also nonzero; since those counts
are independent, this is wrong.
This would only have visible effect if some partial aggregate scans
found only NaNs while earlier ones found only non-NaNs; then we could
end up falsely deciding that there were no NaNs and fail to return a
NaN final result as expected. That's pretty improbable, so it's no
surprise this hasn't been reported from the field. Still, it's a bug.
I didn't try to produce a regression test that would show the bug,
but I did notice that these functions weren't being reached at all
in our regression tests, so I improved the tests to at least
exercise them. With these additions, I see pretty complete code
coverage on the aggregation-related functions in numeric.c.
Back-patch to 9.6 where this code was introduced. (I only added
the improved test case as far back as v10, though, since the
relevant part of aggregates.sql isn't there at all in 9.6.)
Eliminate enable_groupingsets_hash_disk, which was primarily useful
for testing grouping sets that use HashAgg and spill. Instead, hack
the table stats to convince the planner to choose hashed aggregation
for grouping sets that will spill to disk. Suggested by Melanie
Plageman.
Rename enable_hashagg_disk to hashagg_avoid_disk_plan, and invert the
meaning of on/off. The new name indicates more strongly that it only
affects the planner. Also, the word "avoid" is less definite, which
should avoid surprises when HashAgg still needs to use the
disk. Change suggested by Justin Pryzby, though I chose a different
GUC name.
Discussion: https://postgr.es/m/CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA%40mail.gmail.com
Discussion: https://postgr.es/m/20200610021544.GA14879@telsasoft.com
Backpatch-through: 13
This patch adds the pseudo-types anycompatible, anycompatiblearray,
anycompatiblenonarray, and anycompatiblerange. They work much like
anyelement, anyarray, anynonarray, and anyrange respectively, except
that the actual input values need not match precisely in type.
Instead, if we can find a common supertype (using the same rules
as for UNION/CASE type resolution), then the parser automatically
promotes the input values to that type. For example,
"myfunc(anycompatible, anycompatible)" can match a call with one
integer and one bigint argument, with the integer automatically
promoted to bigint. With anyelement in the definition, the user
would have had to cast the integer explicitly.
The new types also provide a second, independent set of type variables
for function matching; thus with "myfunc(anyelement, anyelement,
anycompatible) returns anycompatible" the first two arguments are
constrained to be the same type, but the third can be some other
type, and the result has the type of the third argument. The need
for more than one set of type variables was foreseen back when we
first invented the polymorphic types, but we never did anything
about it.
Pavel Stehule, revised a bit by me
Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".
In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.
The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).
Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
If we have, say, an int column that is left-joined to a bigint column
with USING, the merged column is the int column promoted to bigint.
GROUP BY's tests for whether grouping on the merged column allows a
reference to the underlying column, or vice versa, should know about
that relationship --- and they do. But I nearly broke this case with
an ill-advised optimization, so the lack of any test coverage for it
seems like a bad idea.
d4c3a156c added code to remove columns that were not part of a table's
PRIMARY KEY constraint from the GROUP BY clause when all the primary key
columns were present in the group by. This is fine to do since we know
that there will only be one row per group coming from this relation.
However, the logic failed to consider inheritance parent relations. These
can have child relations without a primary key, but even if they did, they
could duplicate one of the parent's rows or one from another child
relation. In this case, those additional GROUP BY columns are required.
Fix this by disabling the optimization for inheritance parent tables.
In v11 and beyond, partitioned tables are fine since partitions cannot
overlap and before v11 partitioned tables could not have a primary key.
Reported-by: Manuel Rigger
Discussion: http://postgr.es/m/CA+u7OA7VLKf_vEr6kLF3MnWSA9LToJYncgpNX2tQ-oWzYCBQAw@mail.gmail.com
Backpatch-through: 9.6
When there were duplicate columns in the hash key list, the array
sizes could be miscomputed, resulting in access off the end of the
array. Adjust the computation to ensure the array is always large
enough.
(I considered whether the duplicates could be removed in planning, but
I can't rule out the possibility that duplicate columns might have
different hash functions assigned. Simpler to just make sure it works
at execution time regardless.)
Bug apparently introduced in fc4b3dea2 as part of narrowing down the
tuples stored in the hashtable. Reported by Colm McHugh of Salesforce,
though I didn't use their patch. Backpatch back to version 10 where
the bug was introduced.
Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
For partial aggregation combine steps,
AggStatePerTrans->numTransInputs was set to the transition function's
number of inputs, rather than the combine function's number of
inputs (always 1).
That lead to partial aggregates with strict combine functions to
wrongly check for NOT NULL input as required by strictness. When the
aggregate wasn't exactly passed one argument, the strictness check was
either omitted (in the 0 args case) or too many arguments were
checked. In the latter case we'd read beyond the end of
FunctionCallInfoData->args (only in master).
AggStatePerTrans->numTransInputs actually has been wrong since since
9.6, where partial aggregates were added. But it turns out to not be
an active problem in 9.6 and 10, because numTransInputs wasn't used at
all for combine functions: Before c253b722f6 there simply was no NULL
check for the input to strict trans functions, and after that the
check was simply hardcoded for the right offset in fcinfo, as it's
done by code specific to combine functions.
In bf6c614a2f (11) the strictness check was generalized, with common
code doing the strictness checks for both plain and combine transition
functions, based on numTransInputs. For combine functions this lead to
not emitting an expression step to check for strict input in the 0
arguments case, and in the > 1 arguments case, we'd check too many
arguments.Due to the fact that the relevant fcinfo->isnull[2..] was
always zero-initialized (more or less by accident, by being part of
the AggStatePerTrans struct, which is palloc0'ed), there was no
observable damage in the latter case before a9c35cf85c, we just
checked too many array elements.
Due to the changes in a9c35cf85c, > 1 argument bug became visible,
because these days fcinfo is a) dynamically allocated without being
zeroed b) exactly the length required for the number of specified
arguments (hardcoded to 2 in this case).
This commit only contains a fairly minimal fix, setting numTransInputs
to a hardcoded 1 when building a pertrans for a combine function. It
seems likely that we'll want to clean this up further (e.g. the
arguments build_pertrans_for_aggref() aren't particularly meaningful
for combine functions). But the wrap date for 12 beta1 is coming up
fast, so it seems good to have a minimal fix in place.
Backpatch to 11. While AggStatePerTrans->numTransInputs was set
wrongly before that, the value was not used for combine functions.
Reported-By: Rajkumar Raghuwanshi
Diagnosed-By: Kyotaro Horiguchi, Jeevan Chalke, Andres Freund, David Rowley
Author: David Rowley, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/CAKcux6=uZEyWyLw0N7HtR9OBc-sWEFeByEZC7t-KDf15FKxVew@mail.gmail.com
Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).
Unfortunately, decimal rounded output is slow enough to become a
noticable bottleneck when dealing with large result sets or COPY of
large tables when many floating-point values are involved.
Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.
Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.
We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.
Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:
- Output format is changed to use fixed-point notation for small
exponents, as printf would, and also to use lowercase 'e', a
minimum of 2 exponent digits, and a mandatory sign on the exponent,
to keep the formatting as close as possible to previous output.
- The output of exact midpoint values is disabled as noted above.
- The integer fast-path code is changed somewhat (since we have
fixed-point output and the upstream did not).
- Our project style has been largely applied to the code with the
exception of C99 declaration-after-statement, which has been
retained as an exception to our present policy.
- Most of upstream's debugging and conditionals are removed, and we
use our own configure tests to determine things like uint128
availability.
Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).
Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.
This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.
Code by me, adapting the work of Ulf Adams and other contributors.
References:
https://dl.acm.org/citation.cfm?id=3192369
Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
Previously, parseCheckAggregates was run before
assign_query_collations, but this causes problems if any expression
has already had a collation assigned by some transform function (e.g.
transformCaseExpr) before parseCheckAggregates runs. The differing
collations would cause expressions not to be recognized as equal to
the ones in the GROUP BY clause, leading to spurious errors about
unaggregated column references.
The result was that CASE expr WHEN val ... would fail when "expr"
contained a GROUPING() expression or matched one of the group by
expressions, and where collatable types were involved; whereas the
supposedly identical CASE WHEN expr = val ... would succeed.
Backpatch all the way; this appears to have been wrong ever since
collations were introduced.
Per report from Guillaume Lelarge, analysis and patch by me.
Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com
Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk
I (Andres) broke this unintentionally in 69c3936a14, by checking
strictness for all input expressions computed for an aggregate, rather
than just the input for the aggregate transition function.
Reported-By: Ondřej Bouda
Bisected-By: Tom Lane
Diagnosed-By: Andrew Gierth
Discussion: https://postgr.es/m/2a505161-2727-2473-7c46-591ed108ac52@email.cz
Backpatch: 11-, like 69c3936a14
When computing statistical aggregates like variance, the common
schoolbook algorithm which computes the sum of the squares of the
values and subtracts the square of the mean can lead to a large loss
of precision when using floating point arithmetic, because the
difference between the two terms is often very small relative to the
terms themselves.
To avoid this, re-work these aggregates to use the Youngs-Cramer
algorithm, which is a proven, numerically stable algorithm that
directly aggregates the sum of the squares of the differences of the
values from the mean in a single pass over the data.
While at it, improve the test coverage to test the aggregate combine
functions used during parallel aggregation.
Per report and suggested algorithm from Erich Schubert.
Patch by me, reviewed by Madeleine Thompson.
Discussion: https://postgr.es/m/153313051300.1397.9594490737341194671@wrigleys.postgresql.org
A typo in numeric_poly_combine caused bogus results for queries using
it, but of course would only manifest if parallel aggregation is
performed. Reported by Rajkumar Raghuwanshi.
David Rowley did the diagnosis and the fix; I editorialized rather
heavily on his regression test additions.
Back-patch to v10 where the breakage was introduced (by 9cca11c91).
Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
The query didn't really have a preferred index, leading to platform-
specific choices of which one to use. Adjust it to make sure tenk1_hundred
is always chosen.
Per buildfarm.
When strict aggregate combine functions, used in multi-stage/parallel
aggregation, returned NULL, we didn't check for that, invoking the
combine function with NULL the next round, despite it being strict.
The equivalent code invoking normal transition functions has a check
for that situation, which did not get copied in a7de3dc5c3. Fix the
bug by adding the equivalent check.
Based on a quick look I could not find any strict combine functions in
core actually returning NULL, and it doesn't seem very likely external
users have done so. So this isn't likely to have caused issues in
practice.
Add tests verifying transition / combine functions returning NULL is
tested.
Reported-By: Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de
Backpatch: 9.6, where parallel aggregation was introduced
The built-in OSAs all share the same transition function, so they can
share transition state as long as the final functions cooperate to not
do the sort step more than once. To avoid running the tuplesort object
in randomAccess mode unnecessarily, add a bit of infrastructure to
nodeAgg.c to let the aggregate functions find out whether the transition
state is actually being shared or not.
This doesn't work for the hypothetical aggregates, since those inject
a hypothetical row that isn't traceable to the shared input state.
So they remain marked aggfinalmodify = 'w'.
Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com
An aggregate's input expression(s) are not supposed to be evaluated
at all for a row where its FILTER test fails ... but commit 8ed3f11bb
overlooked that requirement. Reshuffle so that aggregates having a
filter clause evaluate their arguments separately from those without.
This still gets the benefit of doing only one ExecProject in the
common case of multiple Aggrefs, none of which have filters.
While at it, arrange for filter clauses to be included in the common
ExecProject evaluation, thus perhaps buying a little bit even when
there are filters.
Back-patch to v10 where the bug was introduced.
Discussion: https://postgr.es/m/30065.1508161354@sss.pgh.pa.us
This ought to work, but the built-in OSAs are not capable of coping,
because their final-functions destructively modify their transition
state (specifically, the contained tuplesort object). That was fine
when those functions were written, but commit 804163bc2 moved the
goalposts without telling orderedsetaggs.c.
We should fix the built-in OSAs to support this, but it will take
a little work, especially if we don't want to sacrifice performance
in the normal non-shared-state case. Given that it took a year after
9.6 release for anyone to notice this bug, we should not prioritize
sharable-state over nonsharable-state performance. And a proper fix
is likely to be more complicated than we'd want to back-patch, too.
Therefore, let's just put in this stop-gap patch to prevent nodeAgg.c
from choosing to use shared state for OSAs. We can revert it in HEAD
when we get a better fix.
Report from Lukas Eder, diagnosis by me, patch by David Rowley.
Back-patch to 9.6 where the problem was introduced.
Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com
If a query contained two aggregates that could share the transition value,
we would correctly collect the input into a tuplesort only once, but
incorrectly run the transition function over the accumulated input twice,
in finalize_aggregates(). That caused a crash, when we tried to call
tuplesort_performsort() on an already-freed NULL tuplestore.
Backport to 9.6, where sharing of transition state and this bug were
introduced.
Analysis by Tom Lane.
Discussion: https://www.postgresql.org/message-id/ac5b0b69-744c-9114-6218-8300ac920e61@iki.fi
ExecReScanAgg's check for whether it could re-use a previously calculated
hashtable neglected the possibility that the Agg node might reference
PARAM_EXEC Params that are not referenced by its input plan node. That's
okay if the Params are in upper tlist or qual expressions; but if one
appears in aggregate input expressions, then the hashtable contents need
to be recomputed when the Param's value changes.
To avoid unnecessary performance degradation in the case of a Param that
isn't within an aggregate input, add logic to the planner to determine
which Params are within aggregate inputs. This requires a new field in
struct Agg, but fortunately we never write plans to disk, so this isn't
an initdb-forcing change.
Per report from Jeevan Chalke. This has been broken since forever,
so back-patch to all supported branches.
Andrew Gierth, with minor adjustments by me
Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>
Commit 3fc6e2d7f5 introduced new "upper"
RelOptInfo structures but didn't set consider_parallel for them
correctly, a point I completely missed when reviewing it. Later,
commit e06a38965b made the situation
worse by doing it incorrectly for the grouping relation. Try to
straighten all of that out. Along the way, get rid of the annoying
wholePlanParallelSafe flag, which was only necessarily because of
the fact that upper planning stages didn't use paths at the time
that code was written.
The most important immediate impact of these changes is that
force_parallel_mode will provide useful test coverage in quite a few
more scenarios than it did previously, but it's also necessary
preparation for fixing some problems related to subqueries.
Patch by me, reviewed by Tom Lane.
If we need to use a gating Result node for pseudoconstant quals,
create_scan_plan() intentionally suppresses use_physical_tlist's checks
on whether there are matches for sortgroupref labels, on the grounds that
we don't need matches because we can label the Result's projection output
properly. However, it then called apply_pathtarget_labeling_to_tlist
anyway. This oversight was harmless when written, but in commit aeb9ae645
I made that function throw an error if there was no match. Thus, the
combination of a table scan, pseudoconstant quals, and a non-simple-Var
sortgroupref column threw the dreaded "ORDER/GROUP BY expression not found
in targetlist" error. To fix, just skip applying the labeling in this
case. Per report from Rushabh Lathia.
Report: <CAGPqQf2iLB8t6t-XrL-zR233DFTXxEsfVZ4WSqaYfLupEoDxXA@mail.gmail.com>
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.
This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses. Thus, redundant GROUP BY items are not uncommon. Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.
David Rowley, reviewed by Julien Rouhaud
If there are two different aggregates in the query with same inputs, and
the aggregates have the same initial condition and transition function,
only calculate the state value once, and only call the final functions
separately. For example, AVG(x) and SUM(x) aggregates have the same
transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore
accumulate the state function only once, which gives a nice speedup.
David Rowley, reviewed and edited by me.
The code for advancing through the input rows overlooked the case that we
might already be past the first row of the row pair now being considered,
in case the previous percentile also fell between the same two input rows.
Report and patch by Andrew Gierth; logic rewritten a bit for clarity by me.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
Make the COPY test, which loads most of the large static tables used in
the tests, also explicitly ANALYZE those tables. This allows us to get
rid of various ad-hoc, and rather redundant, ANALYZE commands that had
gotten stuck into various test scripts over time to ensure we got
consistent plan choices. (We could have done a database-wide ANALYZE,
but that would cause stats to get attached to the small static tables
too, which results in plan changes compared to the historical behavior.
I'm not sure that's a good idea, so not going that far for now.)
Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing
regression tests for lack of an "ANALYZE tenk1" in the subselect test.
There's no need for this in 8.4 since we didn't print any plans back
then.
This patch improves performance of most built-in aggregates that formerly
used a NUMERIC or NUMERIC array as their transition type; this includes
not only aggregates on numeric inputs, but some aggregates on integer
inputs where overflow of an int8 value is a possibility. The code now
uses a special-purpose data structure to avoid array construction and
deconstruction overhead, as well as packing and unpacking overhead for
numeric values.
These aggregates' transition type is now declared as INTERNAL, since
it doesn't correspond to any SQL data type. To keep the planner from
thinking that that means a lot of storage will be used, we make use
of the just-added pg_aggregate.aggtransspace feature. The space estimate
is set to 128 bytes, which is at least in the right ballpark.
Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
There's no inherent reason why an aggregate function can't be variadic
(even VARIADIC ANY) if its transition function can handle the case.
Indeed, this patch to add the feature touches none of the planner or
executor, and little of the parser; the main missing stuff was DDL and
pg_dump support.
It is true that variadic aggregates can create the same sort of ambiguity
about parameters versus ORDER BY keys that was complained of when we
(briefly) had both one- and two-argument forms of string_agg(). However,
the policy formed in response to that discussion only said that we'd not
create any built-in aggregates with varying numbers of arguments, not that
we shouldn't allow users to do it. So the logical extension of that is
we can allow users to make variadic aggregates as long as we're wary about
shipping any such in core.
In passing, this patch allows aggregate function arguments to be named, to
the extent of remembering the names in pg_proc and dumping them in pg_dump.
You can't yet call an aggregate using named-parameter notation. That seems
like a likely future extension, but it'll take some work, and it's not what
this patch is really about. Likewise, there's still some work needed to
make window functions handle VARIADIC fully, but I left that for another
day.
initdb forced because of new aggvariadic field in Aggref parse nodes.
This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.
catversion bump due to change in Aggref and WindowFunc.
David Fetter, reviewed by Dean Rasheed.