In a plain join, we can just summarily discard an input tuple with null join key(s), since it cannot match anything from the other side of the join (assuming a strict join operator). However, if the tuple comes from the outer side of an outer join then we have to emit it with null-extension of the other side. Up to now, hash joins did that by inserting the tuple into the hash table as though it were a normal tuple. This is unnecessarily inefficient though, since the required processing is far simpler than for a potentially-matchable tuple. Worse, if there are a lot of such tuples they will bloat the hash bucket they go into, possibly causing useless repeated attempts to split that bucket or increase the number of batches. We have a report of a large join vainly creating many thousands of batches when faced with such input. This patch improves the situation by keeping such tuples out of the hash table altogether, instead pushing them into a separate tuplestore from which we return them later. (One might consider trying to return them immediately; but that would require substantial refactoring, and it doesn't work anyway for cases where we rescan an unmodified hash table.) This works even in parallel hash joins, because whichever worker reads a null-keyed tuple can just return it; there's no need for consultation with other workers. Thus the tuplestores are local storage even in a parallel join. A pre-existing buglet that I noticed while analyzing the code's behavior is that ExecHashRemoveNextSkewBucket fails to decrement hashtable->skewTuples for tuples moved into the main hash table from the skew hash table. This invalidates ExecHashTableInsert's calculation of the number of main-hash-table tuples, though probably not by a lot since we expect the skew table to be small relative to the main one. Nonetheless, let's fix that too while we're here. Bug: #18909 Reported-by: Sergey Koposov <Sergey.Koposov@ed.ac.uk> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/3061845.1746486714@sss.pgh.pa.us |
||
|---|---|---|
| .github | ||
| config | ||
| contrib | ||
| doc | ||
| src | ||
| .cirrus.star | ||
| .cirrus.tasks.yml | ||
| .cirrus.yml | ||
| .dir-locals.el | ||
| .editorconfig | ||
| .git-blame-ignore-revs | ||
| .gitattributes | ||
| .gitignore | ||
| .mailmap | ||
| aclocal.m4 | ||
| configure | ||
| configure.ac | ||
| COPYRIGHT | ||
| GNUmakefile.in | ||
| HISTORY | ||
| Makefile | ||
| meson.build | ||
| meson_options.txt | ||
| README.md | ||
PostgreSQL Database Management System
This directory contains the source code distribution of the PostgreSQL database management system.
PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.
Copyright and license information can be found in the file COPYRIGHT.
General documentation about this version of PostgreSQL can be found at https://www.postgresql.org/docs/devel/. In particular, information about building PostgreSQL from the source code can be found at https://www.postgresql.org/docs/devel/installation.html.
The latest version of this software, and related software, may be obtained at https://www.postgresql.org/download/. For more information look at our web site located at https://www.postgresql.org/.