mirror of
https://github.com/postgres/postgres.git
synced 2026-04-14 13:37:39 -04:00
Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit5bf748b8). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit5ead85fb(though that was quickly reverted by commitd00107cdfollowing "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit5a1e6df3, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
301 lines
10 KiB
Text
301 lines
10 KiB
Text
<!-- doc/src/sgml/bloom.sgml -->
|
|
|
|
<sect1 id="bloom" xreflabel="bloom">
|
|
<title>bloom — bloom filter index access method</title>
|
|
|
|
<indexterm zone="bloom">
|
|
<primary>bloom</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<literal>bloom</literal> provides an index access method based on
|
|
<ulink url="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</ulink>.
|
|
</para>
|
|
|
|
<para>
|
|
A Bloom filter is a space-efficient data structure that is used to test
|
|
whether an element is a member of a set. In the case of an index access
|
|
method, it allows fast exclusion of non-matching tuples via signatures
|
|
whose size is determined at index creation.
|
|
</para>
|
|
|
|
<para>
|
|
A signature is a lossy representation of the indexed attribute(s), and as
|
|
such is prone to reporting false positives; that is, it may be reported
|
|
that an element is in the set, when it is not. So index search results
|
|
must always be rechecked using the actual attribute values from the heap
|
|
entry. Larger signatures reduce the odds of a false positive and thus
|
|
reduce the number of useless heap visits, but of course also make the index
|
|
larger and hence slower to scan.
|
|
</para>
|
|
|
|
<para>
|
|
This type of index is most useful when a table has many attributes and
|
|
queries test arbitrary combinations of them. A traditional btree index is
|
|
faster than a bloom index, but it can require many btree indexes to support
|
|
all possible queries where one needs only a single bloom index. Note
|
|
however that bloom indexes only support equality queries, whereas btree
|
|
indexes can also perform inequality and range searches.
|
|
</para>
|
|
|
|
<sect2 id="bloom-parameters">
|
|
<title>Parameters</title>
|
|
|
|
<para>
|
|
A <literal>bloom</literal> index accepts the following parameters in its
|
|
<literal>WITH</literal> clause:
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><literal>length</literal></term>
|
|
<listitem>
|
|
<para>
|
|
Length of each signature (index entry) in bits. It is rounded up to the
|
|
nearest multiple of <literal>16</literal>. The default is
|
|
<literal>80</literal> bits and the maximum is <literal>4096</literal>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><literal>col1 — col32</literal></term>
|
|
<listitem>
|
|
<para>
|
|
Number of bits generated for each index column. Each parameter's name
|
|
refers to the number of the index column that it controls. The default
|
|
is <literal>2</literal> bits and the maximum is <literal>4095</literal>.
|
|
Parameters for index columns not actually used are ignored.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2 id="bloom-examples">
|
|
<title>Examples</title>
|
|
|
|
<para>
|
|
This is an example of creating a bloom index:
|
|
</para>
|
|
|
|
<programlisting>
|
|
CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
|
|
WITH (length=80, col1=2, col2=2, col3=4);
|
|
</programlisting>
|
|
|
|
<para>
|
|
The index is created with a signature length of 80 bits, with attributes
|
|
i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could
|
|
have omitted the <literal>length</literal>, <literal>col1</literal>,
|
|
and <literal>col2</literal> specifications since those have the default values.
|
|
</para>
|
|
|
|
<para>
|
|
Here is a more complete example of bloom index definition and usage, as
|
|
well as a comparison with equivalent btree indexes. The bloom index is
|
|
considerably smaller than the btree index, and can perform better.
|
|
</para>
|
|
|
|
<programlisting>
|
|
=# CREATE TABLE tbloom AS
|
|
SELECT
|
|
(random() * 1000000)::int as i1,
|
|
(random() * 1000000)::int as i2,
|
|
(random() * 1000000)::int as i3,
|
|
(random() * 1000000)::int as i4,
|
|
(random() * 1000000)::int as i5,
|
|
(random() * 1000000)::int as i6
|
|
FROM
|
|
generate_series(1,10000000);
|
|
SELECT 10000000
|
|
</programlisting>
|
|
|
|
<para>
|
|
A sequential scan over this large table takes a long time:
|
|
<programlisting>
|
|
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
|
|
QUERY PLAN
|
|
-------------------------------------------------------------------&zwsp;-----------------------------------
|
|
Seq Scan on tbloom (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0.00 loops=1)
|
|
Filter: ((i2 = 898732) AND (i5 = 123451))
|
|
Rows Removed by Filter: 10000000
|
|
Buffers: shared hit=63744
|
|
Planning Time: 0.346 ms
|
|
Execution Time: 357.076 ms
|
|
(6 rows)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Even with the btree index defined the result will still be a
|
|
sequential scan:
|
|
<programlisting>
|
|
=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
|
|
CREATE INDEX
|
|
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
|
|
pg_size_pretty
|
|
----------------
|
|
386 MB
|
|
(1 row)
|
|
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
|
|
QUERY PLAN
|
|
-------------------------------------------------------------------&zwsp;-----------------------------------
|
|
Seq Scan on tbloom (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0.00 loops=1)
|
|
Filter: ((i2 = 898732) AND (i5 = 123451))
|
|
Rows Removed by Filter: 10000000
|
|
Buffers: shared hit=63744
|
|
Planning Time: 0.138 ms
|
|
Execution Time: 351.035 ms
|
|
(6 rows)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Having the bloom index defined on the table is better than btree in
|
|
handling this type of search:
|
|
<programlisting>
|
|
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
|
|
CREATE INDEX
|
|
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
|
|
pg_size_pretty
|
|
----------------
|
|
153 MB
|
|
(1 row)
|
|
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
|
|
QUERY PLAN
|
|
-------------------------------------------------------------------&zwsp;--------------------------------------------------
|
|
Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0.00 loops=1)
|
|
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
|
|
Rows Removed by Index Recheck: 2300
|
|
Heap Blocks: exact=2256
|
|
Buffers: shared hit=21864
|
|
-> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300.00 loops=1)
|
|
Index Cond: ((i2 = 898732) AND (i5 = 123451))
|
|
Index Searches: 1
|
|
Buffers: shared hit=19608
|
|
Planning Time: 0.099 ms
|
|
Execution Time: 22.632 ms
|
|
(11 rows)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Now, the main problem with the btree search is that btree is inefficient
|
|
when the search conditions do not constrain the leading index column(s).
|
|
A better strategy for btree is to create a separate index on each column.
|
|
Then the planner will choose something like this:
|
|
<programlisting>
|
|
=# CREATE INDEX btreeidx1 ON tbloom (i1);
|
|
CREATE INDEX
|
|
=# CREATE INDEX btreeidx2 ON tbloom (i2);
|
|
CREATE INDEX
|
|
=# CREATE INDEX btreeidx3 ON tbloom (i3);
|
|
CREATE INDEX
|
|
=# CREATE INDEX btreeidx4 ON tbloom (i4);
|
|
CREATE INDEX
|
|
=# CREATE INDEX btreeidx5 ON tbloom (i5);
|
|
CREATE INDEX
|
|
=# CREATE INDEX btreeidx6 ON tbloom (i6);
|
|
CREATE INDEX
|
|
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
|
|
QUERY PLAN
|
|
-------------------------------------------------------------------&zwsp;--------------------------------------------------------
|
|
Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.032..0.033 rows=0.00 loops=1)
|
|
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
|
|
Buffers: shared read=6
|
|
-> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.047..0.047 rows=0.00 loops=1)
|
|
Buffers: shared hit=6
|
|
-> Bitmap Index Scan on btreeidx5 (cost=0.00..4.52 rows=11 width=0) (actual time=0.026..0.026 rows=7.00 loops=1)
|
|
Index Cond: (i5 = 123451)
|
|
Index Searches: 1
|
|
Buffers: shared hit=3
|
|
-> Bitmap Index Scan on btreeidx2 (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8.00 loops=1)
|
|
Index Cond: (i2 = 898732)
|
|
Index Searches: 1
|
|
Buffers: shared hit=3
|
|
Planning Time: 0.264 ms
|
|
Execution Time: 0.047 ms
|
|
(15 rows)
|
|
</programlisting>
|
|
Although this query runs much faster than with either of the single
|
|
indexes, we pay a penalty in index size. Each of the single-column
|
|
btree indexes occupies 88.5 MB, so the total space needed is 531 MB,
|
|
over three times the space used by the bloom index.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="bloom-operator-class-interface">
|
|
<title>Operator Class Interface</title>
|
|
|
|
<para>
|
|
An operator class for bloom indexes requires only a hash function for the
|
|
indexed data type and an equality operator for searching. This example
|
|
shows the operator class definition for the <type>text</type> data type:
|
|
</para>
|
|
|
|
<programlisting>
|
|
CREATE OPERATOR CLASS text_ops
|
|
DEFAULT FOR TYPE text USING bloom AS
|
|
OPERATOR 1 =(text, text),
|
|
FUNCTION 1 hashtext(text);
|
|
</programlisting>
|
|
</sect2>
|
|
|
|
<sect2 id="bloom-limitations">
|
|
<title>Limitations</title>
|
|
<para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Only operator classes for <type>int4</type> and <type>text</type> are
|
|
included with the module.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Only the <literal>=</literal> operator is supported for search. But
|
|
it is possible to add support for arrays with union and intersection
|
|
operations in the future.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
<literal>bloom</literal> access method doesn't support
|
|
<literal>UNIQUE</literal> indexes.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
<literal>bloom</literal> access method doesn't support searching for
|
|
<literal>NULL</literal> values.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="bloom-authors">
|
|
<title>Authors</title>
|
|
|
|
<para>
|
|
Teodor Sigaev <email>teodor@postgrespro.ru</email>,
|
|
Postgres Professional, Moscow, Russia
|
|
</para>
|
|
|
|
<para>
|
|
Alexander Korotkov <email>a.korotkov@postgrespro.ru</email>,
|
|
Postgres Professional, Moscow, Russia
|
|
</para>
|
|
|
|
<para>
|
|
Oleg Bartunov <email>obartunov@postgrespro.ru</email>,
|
|
Postgres Professional, Moscow, Russia
|
|
</para>
|
|
</sect2>
|
|
|
|
</sect1>
|