mirror of https://github.com/postgres/postgres.git, synced 2026-03-22 10:30:21 -04:00
This commit includes various optimizations to improve the performance of tuple deformation.

We now precalculate CompactAttribute's attcacheoff, which allows us to remove the code from the deform routines which was setting the attcacheoff. Setting the attcacheoff is now handled by TupleDescFinalize(), which must be called before the TupleDesc is used for anything. Having TupleDescFinalize() means we can store the first attribute in the TupleDesc which does not have an offset cached. That allows us to add a dedicated deforming loop to deform all attributes up to the final one with an attcacheoff set, or up to the first NULL attribute, whichever comes first.

Here we also improve tuple deformation performance of tuples with NULLs. Previously, if the HEAP_HASNULL bit was set in the tuple's t_infomask, deforming would, one-by-one, check each and every bit in the NULL bitmap to see if it was zero. Now, we process the NULL bitmap 1 byte at a time rather than 1 bit at a time to find the attnum with the first NULL. We can now deform the tuple without checking for NULLs up to just before that attribute.

We also record the maximum attribute number which is guaranteed to exist in the tuple, that is, has a NOT NULL constraint and isn't an atthasmissing attribute. When deforming only attributes prior to the guaranteed attnum, we've no need to access the tuple's natt count. As an additional optimization, we only count fixed-width columns when calculating the maximum guaranteed column, as this eliminates the need to emit code to fetch byref types in the deformation loop for guaranteed attributes.

Some locations in the code deform tuples that have yet to go through NOT NULL constraint validation. We're unable to perform the guaranteed attribute optimization when that's the case. This optimization is opt-in via the TupleTableSlot using the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS flag.
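The byte-at-a-time search for the first NULL attribute described above can be illustrated with a small standalone sketch. This is not the code from the commit: `first_null_attnum` is a hypothetical helper name, and it assumes the heap convention that a set bit in the NULL bitmap means "not NULL" and that padding bits past `natts` may hold any value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the PostgreSQL source): return the
 * zero-based index of the first NULL attribute, or natts if there is none.
 *
 * Assumption: a bit value of 1 means "not NULL", 0 means "NULL".
 */
static int
first_null_attnum(const uint8_t *bits, int natts)
{
	int			nbytes = (natts + 7) / 8;

	assert(natts >= 0);

	for (int i = 0; i < nbytes; i++)
	{
		/* Invert so that NULL attributes become set bits. */
		uint8_t		inv = (uint8_t) ~bits[i];

		if (inv != 0)
		{
			int			bit = 0;

			/* Only now do we look at individual bits of this byte. */
			while ((inv & 1) == 0)
			{
				inv >>= 1;
				bit++;
			}

			/* Padding bits beyond natts must not be reported as NULLs. */
			return (i * 8 + bit < natts) ? i * 8 + bit : natts;
		}
	}

	return natts;
}
```

Whole bytes equal to 0xFF are skipped with a single comparison, so the per-bit work happens at most once, in the byte that contains the first NULL. A caller could then deform every attribute before the returned index without any per-attribute NULL check.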
This commit also adds a more efficient way of populating the isnull array by using a bit-wise SWAR trick which performs multiplication on the inverse of the tuple's bitmap byte and masking out all but the lower bit of each of the boolean's byte. This results in much more optimal code when compared to determining the NULLness via att_isnull(). 8 isnull elements are processed at once using this method, which means we need to round the tts_isnull array size up to the next 8 bytes. The palloc code does this anyway, but the round-up needed to be formalized so as not to overwrite the sentinel byte in MEMORY_CONTEXT_CHECKING builds. Doing this also allows the NULL-checking deforming loop to more efficiently check the isnull array, rather than doing the bit-wise processing for each attribute that att_isnull() does.

The level of performance improvement from these changes seems to vary depending on the CPU architecture. Apple's M chips seem particularly fond of the changes, with some of the tested deform-heavy queries going over twice as fast as before. With x86-64, the speedups aren't quite as large. With tables containing only a small number of columns, the speedups will be less.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
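The SWAR trick mentioned in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the commit: the names `expand_null_byte` and `populate_isnull` and the choice of constants are assumptions. It also assumes that a set bitmap bit means "not NULL", that `bool` is one byte wide, and that the output array is rounded up to a multiple of 8 elements, as the message requires for tts_isnull.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(bool) == 1, "sketch assumes one-byte bool");

/*
 * Hypothetical helper: expand one NULL-bitmap byte into 8 bool values.
 * The bitmap byte is inverted first, so a NULL attribute yields true.
 */
static void
expand_null_byte(uint8_t bitmap_byte, bool *isnull)
{
	uint8_t		inv = (uint8_t) ~bitmap_byte;
	uint64_t	spread;

	/*
	 * Multiplying the low 7 bits by a constant with bits 0, 7, 14, ..., 49
	 * set places copies of them at 7-bit strides, so bit j of inv lands on
	 * bit 8*j of the product.  The copies never overlap, so no carries
	 * occur.  The mask keeps only the lowest bit of every byte.
	 */
	spread = ((uint64_t) (inv & 0x7F) * UINT64_C(0x0002040810204081)) &
		UINT64_C(0x0101010101010101);

	/* The eighth bit is handled separately to keep the multiply carry-free. */
	spread |= (uint64_t) (inv >> 7) << 56;

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	/* Byte 0 in memory must correspond to bit 0 of the bitmap byte. */
	spread = __builtin_bswap64(spread);
#endif

	memcpy(isnull, &spread, sizeof(spread));
}

/*
 * Hypothetical helper: fill isnull[] for natts attributes, 8 at a time.
 * The caller must size isnull[] to natts rounded up to a multiple of 8.
 */
static void
populate_isnull(const uint8_t *bits, int natts, bool *isnull)
{
	assert(natts >= 0);

	for (int i = 0; i < (natts + 7) / 8; i++)
		expand_null_byte(bits[i], isnull + i * 8);
}
```

Because every iteration stores a full 8 bytes, the last store can run past `natts` elements. That is why the round-up of the array size has to be guaranteed rather than left to the allocator's behavior.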
201 lines
5.8 KiB
C
/*-------------------------------------------------------------------------
 *
 * nodeWorktablescan.c
 *	  routines to handle WorkTableScan nodes.
 *
 * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/executor/nodeWorktablescan.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include "executor/executor.h"
#include "executor/nodeWorktablescan.h"

static TupleTableSlot *WorkTableScanNext(WorkTableScanState *node);

/* ----------------------------------------------------------------
 *		WorkTableScanNext
 *
 *		This is a workhorse for ExecWorkTableScan
 * ----------------------------------------------------------------
 */
static TupleTableSlot *
WorkTableScanNext(WorkTableScanState *node)
{
	TupleTableSlot *slot;
	Tuplestorestate *tuplestorestate;

	/*
	 * get information from the estate and scan state
	 *
	 * Note: we intentionally do not support backward scan. Although it would
	 * take only a couple more lines here, it would force nodeRecursiveunion.c
	 * to create the tuplestore with backward scan enabled, which has a
	 * performance cost. In practice backward scan is never useful for a
	 * worktable plan node, since it cannot appear high enough in the plan
	 * tree of a scrollable cursor to be exposed to a backward-scan
	 * requirement. So it's not worth expending effort to support it.
	 *
	 * Note: we are also assuming that this node is the only reader of the
	 * worktable. Therefore, we don't need a private read pointer for the
	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
	 */
	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));

	tuplestorestate = node->rustate->working_table;

	/*
	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
	 */
	slot = node->ss.ss_ScanTupleSlot;
	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
	return slot;
}

/*
 * WorkTableScanRecheck -- access method routine to recheck a tuple in EvalPlanQual
 */
static bool
WorkTableScanRecheck(WorkTableScanState *node, TupleTableSlot *slot)
{
	/* nothing to check */
	return true;
}

/* ----------------------------------------------------------------
 *		ExecWorkTableScan(node)
 *
 *		Scans the worktable sequentially and returns the next qualifying tuple.
 *		We call the ExecScan() routine and pass it the appropriate
 *		access method functions.
 * ----------------------------------------------------------------
 */
static TupleTableSlot *
ExecWorkTableScan(PlanState *pstate)
{
	WorkTableScanState *node = castNode(WorkTableScanState, pstate);

	/*
	 * On the first call, find the ancestor RecursiveUnion's state via the
	 * Param slot reserved for it. (We can't do this during node init because
	 * there are corner cases where we'll get the init call before the
	 * RecursiveUnion does.)
	 */
	if (node->rustate == NULL)
	{
		WorkTableScan *plan = (WorkTableScan *) node->ss.ps.plan;
		EState	   *estate = node->ss.ps.state;
		ParamExecData *param;

		param = &(estate->es_param_exec_vals[plan->wtParam]);
		Assert(param->execPlan == NULL);
		Assert(!param->isnull);
		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
		Assert(node->rustate);

		/*
		 * The scan tuple type (ie, the rowtype we expect to find in the work
		 * table) is the same as the result rowtype of the ancestor
		 * RecursiveUnion node. Note this depends on the assumption that
		 * RecursiveUnion doesn't allow projection.
		 */
		ExecAssignScanType(&node->ss,
						   ExecGetResultType(&node->rustate->ps));

		/*
		 * Now we can initialize the projection info. This must be completed
		 * before we can call ExecScan().
		 */
		ExecAssignScanProjectionInfo(&node->ss);
	}

	return ExecScan(&node->ss,
					(ExecScanAccessMtd) WorkTableScanNext,
					(ExecScanRecheckMtd) WorkTableScanRecheck);
}


/* ----------------------------------------------------------------
 *		ExecInitWorkTableScan
 * ----------------------------------------------------------------
 */
WorkTableScanState *
ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
{
	WorkTableScanState *scanstate;

	/* check for unsupported flags */
	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));

	/*
	 * WorkTableScan should not have any children.
	 */
	Assert(outerPlan(node) == NULL);
	Assert(innerPlan(node) == NULL);

	/*
	 * create new WorkTableScanState for node
	 */
	scanstate = makeNode(WorkTableScanState);
	scanstate->ss.ps.plan = (Plan *) node;
	scanstate->ss.ps.state = estate;
	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
	scanstate->rustate = NULL;	/* we'll set this later */

	/*
	 * Miscellaneous initialization
	 *
	 * create expression context for node
	 */
	ExecAssignExprContext(estate, &scanstate->ss.ps);

	/*
	 * tuple table initialization
	 */
	ExecInitResultTypeTL(&scanstate->ss.ps);

	/* signal that return type is not yet known */
	scanstate->ss.ps.resultopsset = true;
	scanstate->ss.ps.resultopsfixed = false;

	ExecInitScanTupleSlot(estate, &scanstate->ss, NULL, &TTSOpsMinimalTuple, 0);

	/*
	 * initialize child expressions
	 */
	scanstate->ss.ps.qual =
		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);

	/*
	 * Do not yet initialize projection info, see ExecWorkTableScan() for
	 * details.
	 */

	return scanstate;
}

/* ----------------------------------------------------------------
 *		ExecReScanWorkTableScan
 *
 *		Rescans the relation.
 * ----------------------------------------------------------------
 */
void
ExecReScanWorkTableScan(WorkTableScanState *node)
{
	if (node->ss.ps.ps_ResultTupleSlot)
		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);

	ExecScanReScan(&node->ss);

	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
	if (node->rustate)
		tuplestore_rescan(node->rustate->working_table);
}