postgresql/src/include
Tomas Vondra db0d67db24 Optimize order of GROUP BY keys
When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may be heavily dependent on the order in which the keys are
compared when building the groups. Grouping does not imply any ordering,
so we're allowed to compare the keys in arbitrary order, and a Hash Agg
leverages this. But for Group Agg, we simply compared keys in the order
specified in the query. This commit explores alternative orderings of
the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of
the keys, and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a few interesting orderings based on
both local and global information.

When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be better
to perform more expensive comparisons (for complex data types etc.)
last, because maybe the cheaper comparisons will be enough. Similarly,
the higher the cardinality of a key, the lower the probability we’ll
need to compare more keys. The patch generates and costs various
orderings, picking the cheapest ones.
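
The intuition above can be sketched with a toy cost model. This is an
illustration only, not the planner's actual cost function: it assumes
uniformly distributed, independent keys, so two rows tie on a key with
probability 1/ndistinct, and a key is compared only when all earlier
keys tied. The names Key, expected_cmp_cost and cheapest_order are
hypothetical.

```python
from itertools import permutations
from typing import NamedTuple, Sequence, Tuple


class Key(NamedTuple):
    name: str
    ndistinct: float  # estimated number of distinct values
    cmp_cost: float   # relative cost of one comparison call


def expected_cmp_cost(keys: Sequence[Key]) -> float:
    """Expected cost of comparing two rows key by key, in the given order."""
    total = 0.0
    p_reach = 1.0  # probability that all earlier keys compared equal
    for key in keys:
        total += p_reach * key.cmp_cost
        p_reach *= 1.0 / key.ndistinct  # assumed tie probability for this key
    return total


def cheapest_order(keys: Sequence[Key]) -> Tuple[Key, ...]:
    """Brute force over all permutations; only sensible for a few keys."""
    return min(permutations(keys), key=expected_cmp_cost)


# Hypothetical statistics: an expensive text key, a boolean, a near-unique id.
keys = [Key("note", 1000, 10.0), Key("flag", 2, 1.0), Key("id", 100000, 1.0)]
best = cheapest_order(keys)
print([k.name for k in best])
```

Under these assumed statistics the cheap, high-cardinality key sorts
first and the expensive key last, which matches the reasoning in the
paragraph above.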

The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. E.g. there
may be an explicit ORDER BY clause, or some other ordering-dependent
operation, higher up in the query, and using the same ordering may allow
an incremental sort or even eliminate the sort entirely.

The patch generates orderings and picks those minimizing the comparison
cost (for various pathkeys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, on the assumption
the user might have additional insights.
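
A minimal sketch of how such a candidate set might be assembled,
assuming the same toy cost model (tie probability 1/ndistinct per key).
The function names, the stats dictionary and the rule for ordering the
keys after the ORDER BY prefix are all hypothetical and are not taken
from the patch.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-column statistics: name -> (ndistinct, comparison cost).
Stats = Dict[str, Tuple[float, float]]


def order_cost(order: Sequence[str], stats: Stats) -> float:
    """Toy expected cost of one row comparison for a given key order."""
    total, p_reach = 0.0, 1.0
    for col in order:
        ndistinct, cmp_cost = stats[col]
        total += p_reach * cmp_cost
        p_reach /= ndistinct
    return total


def candidate_orderings(group_keys: List[str], order_by: List[str],
                        stats: Stats) -> List[Tuple[str, ...]]:
    candidates: List[Tuple[str, ...]] = []

    def add(order: Sequence[str]) -> None:
        if tuple(order) not in candidates:  # skip duplicates
            candidates.append(tuple(order))

    # 1. The ordering with the lowest estimated comparison cost.
    add(min(permutations(group_keys), key=lambda o: order_cost(o, stats)))

    # 2. An ordering that starts with the ORDER BY columns, so a sort
    #    higher up in the plan could be incremental or skipped.
    prefix: List[str] = []
    for col in order_by:
        if col not in group_keys or col in prefix:
            break
        prefix.append(col)
    if prefix:
        add(prefix + [k for k in group_keys if k not in prefix])

    # 3. Always keep the ordering written in the query.
    add(group_keys)
    return candidates


stats = {"a": (2, 1.0), "b": (1000, 1.0), "c": (10, 5.0)}
result = candidate_orderings(["a", "b", "c"], ["c"], stats)
print(result)
```

Each candidate would then be costed as a full path, so the final choice
stays with the optimizer rather than with this pre-selection step.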

This introduces a new GUC enable_group_by_reordering, so that the
optimization may be disabled if needed.

The original patch was proposed by Teodor Sigaev, and later improved and
reworked by Dmitry Dolgov. Reviews by a number of people, including me,
Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu.

Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra
Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu
Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
2022-03-31 01:13:33 +02:00
access Revert "Fix replay of create database records on standby" 2022-03-29 15:36:21 +02:00
bootstrap Update copyright for 2022 2022-01-07 19:04:57 -05:00
catalog Add range_agg with multirange inputs 2022-03-30 20:16:23 +02:00
commands Add header matching mode to COPY FROM 2022-03-30 09:02:31 +02:00
common Allow parallel zstd compression when taking a base backup. 2022-03-30 09:41:26 -04:00
datatype Update copyright for 2022 2022-01-07 19:04:57 -05:00
executor SQL JSON functions 2022-03-30 16:30:37 -04:00
fe_utils Allow pgbench to retry in some cases. 2022-03-23 19:05:45 +09:00
foreign Update copyright for 2022 2022-01-07 19:04:57 -05:00
jit Update copyright for 2022 2022-01-07 19:04:57 -05:00
lib dshash: Add sequential scan support. 2022-03-10 12:57:05 -08:00
libpq Add system view pg_ident_file_mappings 2022-03-29 10:15:48 +09:00
mb Update copyright for 2022 2022-01-07 19:04:57 -05:00
nodes Optimize order of GROUP BY keys 2022-03-31 01:13:33 +02:00
optimizer Optimize order of GROUP BY keys 2022-03-31 01:13:33 +02:00
parser SQL JSON functions 2022-03-30 16:30:37 -04:00
partitioning Update copyright for 2022 2022-01-07 19:04:57 -05:00
port Refactor DLSUFFIX handling 2022-03-25 08:56:02 +01:00
portability Update copyright for 2022 2022-01-07 19:04:57 -05:00
postmaster Allow archiving via loadable modules. 2022-02-03 14:05:02 -05:00
regex Update copyright for 2022 2022-01-07 19:04:57 -05:00
replication Skip empty transactions for logical replication. 2022-03-30 07:41:05 +05:30
rewrite Update copyright for 2022 2022-01-07 19:04:57 -05:00
snowball Update copyright for 2022 2022-01-07 19:04:57 -05:00
statistics Add stxdinherit flag to pg_statistic_ext_data 2022-01-16 13:38:01 +01:00
storage Add new block-by-block strategy for CREATE DATABASE. 2022-03-29 11:48:36 -04:00
tcop Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
tsearch Update copyright for 2022 2022-01-07 19:04:57 -05:00
utils Optimize order of GROUP BY keys 2022-03-31 01:13:33 +02:00
.gitignore Refactor dlopen() support 2018-09-06 11:33:04 +02:00
c.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
fmgr.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
funcapi.h Create routine able to set single-call SRFs for Materialize mode 2022-03-07 10:26:29 +09:00
getaddrinfo.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
getopt_long.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
Makefile Build in some knowledge about foreign-key relationships in the catalogs. 2021-02-02 17:11:55 -05:00
miscadmin.h Remove MaxBackends variable in favor of GetMaxBackends() function. 2022-02-08 15:53:19 -05:00
pg_config.h.in Refactor DLSUFFIX handling 2022-03-25 08:56:02 +01:00
pg_config_ext.h.in Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
pg_config_manual.h Fix DROP {DATABASE,TABLESPACE} on Windows. 2022-02-12 10:21:23 +13:00
pg_getopt.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
pg_trace.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
pgstat.h pgstat: reorder pgstat.[ch] contents. 2022-03-21 16:21:00 -07:00
pgtar.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
pgtime.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
port.h Clean up messy API for src/port/thread.c. 2022-01-11 13:46:20 -05:00
postgres.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
postgres_ext.h Phase 2 of pgindent updates. 2017-06-21 15:19:25 -04:00
postgres_fe.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
rusagestub.h Update copyright for 2022 2022-01-07 19:04:57 -05:00
windowapi.h Update copyright for 2022 2022-01-07 19:04:57 -05:00