Doc: improve explanation of GiST compress/decompress methods.

The docs previously didn't explain that leaf and non-leaf keys
could be treated differently, even though many of our opclasses
do exactly that.  It also wasn't explained how that relates to
the STORAGE option, particularly since only one storage type
can be specified for both leaf and non-leaf keys.

While here, reorganize the text slightly, rather than sticking
additional detail into what's supposed to be a brief summary
paragraph.

Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+renyWs5Np+FLSYfL+eu20S4U671A3fQGb-+7e22HLrD1NbYw@mail.gmail.com
Tom Lane 2026-03-31 11:23:20 -04:00
parent 7b424e3108
commit fb7a9050d5
2 changed files with 31 additions and 9 deletions
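
As a rough illustration of the point made in the commit message, the sketch below shows a leaf key and a higher-level key that differ in meaning but share one physical layout. It is a standalone model, not PostgreSQL code: `DemoPoint`, `DemoBox`, and all `demo_*` functions are hypothetical names invented for this example, and `DemoBox` merely plays the role that a `STORAGE` type would play in a real operator class.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef struct { double x, y; } DemoPoint;        /* the indexed data type */
typedef struct { DemoPoint low, high; } DemoBox;  /* plays the STORAGE type */

/* True when a stored key has the leaf form (a zero-area box). */
static bool demo_is_leaf_form(DemoBox b)
{
    return b.low.x == b.high.x && b.low.y == b.high.y;
}

/* Leaf "compress": store a point as a zero-area box. */
static DemoBox demo_compress_leaf(DemoPoint p)
{
    DemoBox b;

    b.low = p;
    b.high = p;
    return b;
}

/* Leaf "decompress": recover the original point from a leaf-form key. */
static DemoPoint demo_decompress_leaf(DemoBox b)
{
    assert(demo_is_leaf_form(b));
    return b.low;
}

/* Key for a higher-level page: the bounding box of two stored keys. */
static DemoBox demo_union(DemoBox a, DemoBox b)
{
    DemoBox u;

    u.low.x  = a.low.x  < b.low.x  ? a.low.x  : b.low.x;
    u.low.y  = a.low.y  < b.low.y  ? a.low.y  : b.low.y;
    u.high.x = a.high.x > b.high.x ? a.high.x : b.high.x;
    u.high.y = a.high.y > b.high.y ? a.high.y : b.high.y;
    return u;
}
```

Both kinds of key occupy the same fixed-size struct, which is the property the `STORAGE` type is said to govern; what the bytes mean on each level is left to the compress and decompress logic.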


@@ -273,14 +273,10 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
index will depend on the <function>penalty</function> and <function>picksplit</function>
methods.
Two optional methods are <function>compress</function> and
-<function>decompress</function>, which allow an index to have internal tree data of
-a different type than the data it indexes. The leaves are to be of the
-indexed data type, while the other tree nodes can be of any C struct (but
-you still have to follow <productname>PostgreSQL</productname> data type rules here,
-see about <literal>varlena</literal> for variable sized data). If the tree's
-internal data type exists at the SQL level, the <literal>STORAGE</literal> option
-of the <command>CREATE OPERATOR CLASS</command> command can be used.
-The optional eighth method is <function>distance</function>, which is needed
+<function>decompress</function>, which allow an index to store keys that
+are of a different type than the data it indexes, or are a compressed
+representation of that type.
+The optional eighth method <function>distance</function> is needed
if the operator class wishes to support ordered scans (nearest-neighbor
searches). The optional ninth method <function>fetch</function> is needed if the
operator class wishes to support index-only scans, except when the
@@ -294,6 +290,7 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
<filename>src/include/access/cmptype.h</filename>) into strategy numbers
used by the operator class. This lets the core code look up operators for
temporal constraint indexes.
+All these methods are described in more detail below.
</para>
<variablelist>
@@ -484,6 +481,24 @@ my_union(PG_FUNCTION_ARGS)
in the index without modification.
</para>
+<para>
+Use the <literal>STORAGE</literal> option of the <command>CREATE
+OPERATOR CLASS</command> command to define the data type that is
+stored in the index, if it is different from the data type being
+indexed. Be aware however that the <literal>STORAGE</literal> data
+type is only used to define the physical properties of the index
+entries (their <replaceable>typlen</replaceable>,
+<replaceable>typbyval</replaceable>,
+and <replaceable>typalign</replaceable> attributes). What is
+actually in the index datums is under the control of the
+<function>compress</function> and <function>decompress</function>
+methods, so long as the stored datums match those properties.
+It is allowed for <function>compress</function> to produce different
+representations for leaf keys than for keys on higher-level index
+pages, so long as both representations match
+the <literal>STORAGE</literal> data type.
+</para>
<para>
The <acronym>SQL</acronym> declaration of the function must look like this:


@@ -10,9 +10,13 @@ GiST stands for Generalized Search Tree. It was introduced in the seminal paper
Jeffrey F. Naughton, Avi Pfeffer:
http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
+Concurrency support was described in "Concurrency and Recovery in Generalized
+Search Trees", 1997, Marcel Kornacker, C. Mohan, Joseph M. Hellerstein:
+https://dsf.berkeley.edu/papers/sigmod97-gist.pdf
-and implemented by J. Hellerstein and P. Aoki in an early version of
+GiST was implemented by J. Hellerstein and P. Aoki in an early version of
PostgreSQL (more details are available from The GiST Indexing Project
at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
project it had a limited number of features and was in rare use.
@@ -55,6 +59,9 @@ The original algorithms were modified in several ways:
it is now a single-pass algorithm.
* Since the papers were theoretical, some details were omitted and we
had to find out ourself how to solve some specific problems.
+* The 1997 paper above (but not the 1995 one) states that leaf pages should
+store the original key. While that can be done in PostgreSQL, it is
+also possible to use a compressed representation in leaf pages.
Because of the above reasons, we have revised the interaction of GiST
core and PostgreSQL WAL system. Moreover, we encountered (and solved)