Clarify the documentation of COMMENT ON to state that specifying an empty
string is treated as NULL, meaning that the comment is removed.
This makes the behavior explicit and avoids possible confusion about how
empty strings are handled.
Also adds regress test cases that use empty string to remove a comment.
Backpatch to all supported versions.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Shengbin Zhao <zshengbin91@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: zhangqiang <zhang_qiang81@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/26476097-B1C1-4BA8-AA92-0AD0B8EC7190@gmail.com
Backpatch-through: 14
Transform low_compare and high_compare nbtree skip array inequalities
(with opclasses that offer skip support) in such a way as to allow
_bt_first to consistently apply later keys when it descends the tree.
This can lower the number of index searches for multi-column scans that
use a ">" key on one of the index's prefix columns (or use a "<" key,
when scanning backwards) when it precedes some later lower-order key.
For example, an index qual "WHERE a > 5 AND b = 2" will now be converted
to "WHERE a >= 6 AND b = 2" by a new preprocessing step that takes place
after low_compare and high_compare have been finalized. That way, the
initial call to _bt_first can use "WHERE a >= 6 AND b = 2" to find an
initial position, rather than just using "WHERE a > 5" -- "b = 2" can be
applied during every _bt_first call. There's a decent chance that this
will allow such a scan to avoid the extra search that might otherwise be
needed to determine the lowest "a" value still satisfying "WHERE a > 5".
The transformation process can only lower the total number of index
pages read when the use of a more restrictive set of initial positioning
keys in _bt_first actually allows the scan to land on some later leaf
page directly, relative to the unoptimized case (or on an earlier leaf
page directly, when scanning backwards). But the savings can really add
up in cases where an affected skip array comes after some other array.
For example, a scan indexqual "WHERE x IN (1, 2, 3) AND y > 5 AND z = 2"
can save as many as 3 _bt_first calls by applying the new transformation
to its "y" array (up to 1 extra search can be avoided per "x" element).
Follow-up to commit 92fe23d9, which added nbtree skip scan.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=FJ78K3WsF3iWNxWnUCY9f=Jdg3QPxaXE=uYUbmuRz5Q@mail.gmail.com
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns. When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan. The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.
A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays. Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys. This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.
Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan). In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.
The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index. Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument. These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).
A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66". Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66". However,
improved performance isn't guaranteed. Much depends on physical index
characteristics.
B-Tree preprocessing is optimistic about skipping working out: it
applies static, generic rules when determining where to generate skip
arrays, which assumes that the runtime overhead of maintaining skip
arrays will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
Currently, group_similar_or_args() permutes original positions of clauses
independently on whether it manages to find any groups of similar clauses.
While we are not providing any strict warranties on saving the original order
of OR-clauses, it is preferred that the original order be modified as little
as possible.
This commit changes the reordering algorithm of group_similar_or_args() in
the following way. We reorder each group of similar clauses so that the
first item of the group stays in place, but all the other items are moved
after it. So, if there are no similar clauses, the order of clauses stays
the same. When there are some groups, only required reordering happens while
the rest of the clauses remain in their places.
Reported-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/3ac7c436-81e1-4191-9caf-b0dd70b51511%40gmail.com
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
This commit allows transformation of OR-clauses into SAOP's for index scans
within nested loop joins. That required the following changes.
1. Make match_orclause_to_indexcol() and group_similar_or_args() understand
const-ness in the same way as match_opclause_to_indexcol(). This
generally makes our approach more uniform.
2. Make match_join_clauses_to_index() pass OR-clauses to
match_clause_to_index().
3. Also switch match_join_clauses_to_index() to use list_append_unique_ptr()
for adding clauses to *joinorclauses. That avoids possible duplicates
when processing the same clauses with different indexes. Previously such
duplicates were elimited in match_clause_to_index(), but now
group_similar_or_args() each time generates distinct copies of grouped
OR clauses.
Discussion: https://postgr.es/m/CAPpHfdv%2BjtNwofg-p5z86jLYZUTt6tR17Wy00ta0dL%3DwHQN3ZA%40mail.gmail.com
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Move nbtree's detection of RowCompare quals that are unsatisfiable due
to having a NULL in their first row element: rather than detecting these
cases at the point where _bt_first builds its insertion scan key, do so
earlier, during preprocessing proper. This brings the RowCompare case
in line every other case involving an unsatisfiable-due-to-NULL qual.
nbtree now consistently detects such unsatisfiable quals -- even when
they happen to involve a key that isn't examined by _bt_first at all.
Affected cases thereby avoid useless full index scans that cannot
possibly return any matching rows.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmySVXst2hFrOATC-zw1Byg1XC-jYUS314=mzuqsNwk+Q@mail.gmail.com
There is no point in transforming OR-clauses into SAOP's if the target index
doesn't support SAOP scans anyway. This commit adds corresponding checks
to match_orclause_to_indexcol() and group_similar_or_args(). The first check
fixes the actual bug, while the second just saves some cycles.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/8174de69-9e1a-0827-0e81-ef97f56a5939%40gmail.com
Author: Alena Rybakina
Reviewed-by: Ranier Vilela, Alexander Korotkov, Andrei Lepikhov
When optimizer generates bitmap paths, it considers breaking OR-clause
arguments one-by-one. But now, a group of similar OR-clauses can be
transformed into SAOP during index matching. So, bitmap paths should
keep up.
This commit teaches bitmap paths generation machinery to group similar
OR-clauses into dedicated RestrictInfos. Those RestrictInfos are considered
both to match index as a whole (as SAOP), or to match as a set of individual
OR-clause argument one-by-one (the old way).
Therefore, bitmap path generation will takes advantage of OR-clauses to SAOP's
transformation. The old way of handling them is also considered. So, there
shouldn't be planning regression.
Discussion: https://postgr.es/m/CAPpHfdu5iQOjF93vGbjidsQkhHvY2NSm29duENYH_cbhC6x%2BMg%40mail.gmail.com
Author: Alexander Korotkov, Andrey Lepikhov
Reviewed-by: Alena Rybakina, Andrei Lepikhov, Jian he, Robert Haas
Reviewed-by: Peter Geoghegan
This commit makes match_clause_to_indexcol() match
"(indexkey op C1) OR (indexkey op C2) ... (indexkey op CN)" expression
to the index while transforming it into "indexkey op ANY(ARRAY[C1, C2, ...])"
(ScalarArrayOpExpr node).
This transformation allows handling long OR-clauses with single IndexScan
avoiding diving them into a slower BitmapOr.
We currently restrict Ci to be either Const or Param to apply this
transformation only when it's clearly beneficial. However, in the future,
we might switch to a liberal understanding of constants, as it is in other
cases.
Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina, Andrey Lepikhov, Alexander Korotkov
Reviewed-by: Peter Geoghegan, Ranier Vilela, Alexander Korotkov, Robert Haas
Reviewed-by: Jian He, Tom Lane, Nikolay Shaplov
pg_authid's only varlena column is rolpassword, which unfortunately
cannot be de-TOASTed during authentication because we haven't
selected a database yet and cannot read pg_class. By removing
pg_authid's TOAST table, attempts to set password hashes that
require out-of-line storage will fail with a "row is too big"
error instead. We may want to provide a more user-friendly error
in the future, but for now let's just remove the useless TOAST
table.
Bumps catversion.
Reported-by: Alexander Lakhin
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
on the preliminary stage of optimization when we are still working with the
expression tree.
Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op'
is an operator which returns boolean result and has a commuter (for the case
of reverse order of constant and non-constant parts of the expression,
like 'Cn op expr').
Sometimes it can lead to not optimal plan. This is why there is a
or_to_any_transform_limit GUC. It specifies a threshold value of length of
arguments in an OR expression that triggers the OR-to-ANY transformation.
Generally, more groupable OR arguments mean that transformation will be more
likely to win than to lose.
Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively. This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).
Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient. All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.
The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value. Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan. Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.
Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.
Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute. These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).
Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes. They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem). They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).
There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions. Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.
Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
A unique index which is created with non-distinct NULLS cannot be
used for backing a primary key constraint. Make sure to disallow
such table alterations and teach pg_dump to drop the non-distinct
NULLS clause on indexes where this has been set.
Bug: 17720
Reported-by: Reiner Peterke <zedaardv@drizzle.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/17720-dab8ee0fa85d316d@postgresql.org
This addresses a couple of bugs in the REINDEX grammar, introduced by
83011ce:
- A name was never specified for DATABASE/SYSTEM, even if the query
included one. This caused such REINDEX queries to always work with any
object name, but we should complain if the object name specified does
not match the name of the database we are connected to. A test is added
for this case in the main regression test suite, provided by Álvaro.
- REINDEX SYSTEM CONCURRENTLY [name] was getting rejected in the
parser. Concurrent rebuilds are not supported for catalogs but the
error provided at execution time is more helpful for the user, and
allowing this flavor results in a simplification of the parsing logic.
- REINDEX DATABASE CONCURRENTLY was rebuilding the index in a
non-concurrent way, as the option was not being appended correctly in
the list of DefElems in ReindexStmt (REINDEX (CONCURRENTLY) DATABASE was
working fine. A test is added in the TAP tests of reindexdb for this
case, where we already have a REINDEX DATABASE CONCURRENTLY query
running on a small-ish instance. This relies on the work done in
2cbc3c1 for SYSTEM, but here we check if the OIDs of the index relations
match or not after the concurrent rebuild. Note that in order to get
this part to work, I had to tweak the tests so as the index OID and
names are saved separately. This change not affect the reliability or
of the coverage of the existing tests.
While on it, I have implemented a tweak in the grammar to reduce the
parsing by one branch, simplifying things even more.
Author: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/YttqI6O64wDxGn0K@paquier.xyz
This is to gather some more evidence about why buildfarm member wrasse
is failing. We should revert it (or at least scale it way back) once
that's resolved.
Discussion: https://postgr.es/m/1346227.1649887693@sss.pgh.pa.us
The idea behind this patch is to make it possible to run individual
test scripts without running the entire core test suite. Making all
the scripts completely independent would involve a massive rewrite,
and would probably be worse for coverage of things like concurrent DDL.
So this patch just does what seems practical with limited changes.
The net effect is that any test script can be run after running
limited earlier dependencies:
* all scripts depend on test_setup
* many scripts depend on create_index
* other dependencies are few in number, and are documented in
the parallel_schedule file.
To accomplish this, I chose a small number of commonly-used tables
and moved their creation and filling into test_setup. Later scripts
are expected not to modify these tables' data contents, for fear of
affecting other scripts' results. Also, our former habit of declaring
all C functions in one place is now gone in favor of declaring them
where they're used, if that's just one script, or in test_setup if
necessary.
There's more that could be done to remove some of the remaining
inter-script dependencies, but significantly more-invasive changes
would be needed, and at least for now it doesn't seem worth it.
Discussion: https://postgr.es/m/1114748.1640383217@sss.pgh.pa.us
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not. Different
implementations have different behaviors. In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
This patch adds this option to PostgreSQL. The default behavior
remains UNIQUE NULLS DISTINCT. Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
The opclass parameter Datums from the old index are fetched in the same
way as for predicates and expressions, by grabbing them directly from
the system catalogs. They are then copied into the new IndexInfo that
will be used for the creation of the new copy.
This caused the new index to be rebuilt with default parameters rather
than the ones pre-defined by a user. The only way to get back a new
index with correct opclass parameters would be to recreate a new index
from scratch.
The issue has been introduced by 911e702.
Author: Michael Paquier
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/YX0CG/QpLXcPr8HJ@paquier.xyz
Backpatch-through: 13
83158f7 has improved index_set_state_flags() so as it is possible to use
transactional updates when updating pg_index state flags, but there was
not really a test case which stressed directly the possibility it fixed.
This commit adds such a test, using a predicate that looks valid in
appearance but calls a stable function.
Author: Andrey Lepikhov
Discussion: https://postgr.es/m/9b905019-5297-7372-0ad2-e1a4bb66a719@postgrespro.ru
Backpatch-through: 9.6
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose. We're out of time for this release so we'll revert
and try again.
Commits reverted:
1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.
Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
recordMultipleDependencies had the wrong scope for its "version"
variable, allowing a version label to leak from the collation entry it
was meant for to subsequent non-collation entries. This is relatively
hard to trigger because of the OID-descending order that the inputs
will normally arrive in: subsequent non-collation items will tend to
be pinned. But it can be exhibited easily with a custom collation.
Also, don't special-case the default collation, but instead ignore
pinned-ness of a collation when we've found a version for it. This
avoids creating useless pg_depend entries, and removes a not-very-
future-proof assumption that C, POSIX, and DEFAULT are the only
pinned collations.
A small problem is that, because the default collation may or may
not have a version, the regression tests can't assume anything about
whether dependency entries will be made for it. This seems OK though
since it's now handled just the same as other collations, and we have
test cases for both versioned and unversioned collations.
Fixes oversights in commit 257836a75. Thanks to Julien Rouhaud
for review.
Discussion: https://postgr.es/m/3564817.1618420687@sss.pgh.pa.us
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS. This data was lost on the new index after REINDEX
CONCURRENTLY.
The update of this field is done when the old and new indexes are
swapped to make the fix back-patchable. Another approach we could look
after in the long-term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage this can be done only on HEAD.
Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
CREATE TABLE AS has been preferred over SELECT INTO (outside of ecpg
and PL/pgSQL) for a long time. There were still a few uses of SELECT
INTO in tests and documentation, some old, some more recent. This
changes them to CREATE TABLE AS. Some occurrences in the tests remain
where they are specifically testing SELECT INTO parsing or similar.
Discussion: https://www.postgresql.org/message-id/flat/96dc0df3-e13a-a85d-d045-d6e2c85218da%40enterprisedb.com
This changes CLUSTER and REINDEX so as a parenthesized grammar becomes
possible for options, while unifying the grammar parsing rules for
option lists with the existing ones.
This is a follow-up of the work done in 873ea9e for VACUUM, ANALYZE and
EXPLAIN. This benefits REINDEX for a potential backend-side filtering
for collatable-sensitive indexes and TABLESPACE, while CLUSTER would
benefit from the latter.
Author: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
Historically these were called >^ and <^, but that is inconsistent
with the similar box, polygon, and circle operators, which are named
|>> and <<| respectively. Worse, the >^ and <^ names are used for
*not* strict above/below tests for the box type.
Hence, invent new operators following the more common naming. The
old operators remain available for now, and are still accepted by
the relevant index opclasses too. But there's a deprecation notice,
so maybe we can get rid of them someday.
Emre Hasegeli, reviewed by Pavel Borisov
Discussion: https://postgr.es/m/24348.1587444160@sss.pgh.pa.us
Statistics associated to an index got lost after running REINDEX
CONCURRENTLY, while the non-concurrent case preserves these correctly.
The concurrent and non-concurrent operations need to be consistent for
the end-user, and missing statistics would force to wait for a new
analyze to happen, which could take some time depending on the activity
of the existing autovacuum workers. This issue is fixed by copying any
existing entries in pg_statistic associated to the old index to the new
one. Note that this copy is already done with the data of the index in
the stats collector.
Reported-by: Fabrízio de Royes Mello
Author: Michael Paquier, Fabrízio de Royes Mello
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAFcNs+qpFPmiHd1oTXvcPdvAHicJDA9qBUSujgAhUMJyUMb+SA@mail.gmail.com
Backpatch-through: 12
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned
index and table in input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage to minimize the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event a cancellation or failure in-flight, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being working on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
If the flag value is lost, logical decoding would work the same way as
REPLICA IDENTITY NOTHING, meaning that no old tuple values would be
included in the changes anymore produced by logical decoding.
Author: Michael Paquier
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/20200603065340.GK89559@paquier.xyz
Backpatch-through: 12
If the flag value is lost, a CLUSTER query following REINDEX
CONCURRENTLY could fail. Non-concurrent REINDEX is already handling
this case consistently.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200229024202.GH29456@telsasoft.com
Backpatch-through: 12
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
If CheckAttributeType() threw an error about the datatype of an
index expression column, it would report an empty column name,
which is pretty unhelpful and certainly not the intended behavior.
I (tgl) evidently broke this in commit cfc5008a5, by not noticing
that the column's attname was used above where I'd placed the
assignment of it.
In HEAD and v12, this is trivially fixable by moving up the
assignment of attname. Before v12 the code is a bit more messy;
to avoid doing substantial refactoring, I took the lazy way out
and just put in two copies of the assignment code.
Report and patch by Amit Langote. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/CA+HiwqFA+BGyBFimjiYXXMa2Hc3fcL0+OJOyzUNjhU4NCa_XXw@mail.gmail.com
When creating a uniqueness constraint using a pre-existing index,
we have always required that the index have the same properties you'd
get if you just let a new index get built. However, when collations
were added, we forgot to add the index's collation to that check.
It's hard to trip over this without intentionally trying to break it:
you'd have to explicitly specify a different collation in CREATE
INDEX, then convert it to a pkey or unique constraint. Still, if you
did that, pg_dump would emit a script that fails to reproduce the
index's collation. The main practical problem is that after a
pg_upgrade the index would be corrupt, because its actual physical
order wouldn't match what pg_index says. A more theoretical issue,
which is new as of v12, is that if you create the index with a
nondeterministic collation then it wouldn't be enforcing the normal
notion of uniqueness, causing the constraint to mean something
different from a normally-created constraint.
To fix, just add collation to the conditions checked for index
acceptability in ADD PRIMARY KEY/UNIQUE USING INDEX. We won't try
to clean up after anybody who's already created such a situation;
it seems improbable enough to not be worth the effort involved.
(If you do get into trouble, a REINDEX should be enough to fix it.)
In principle this is a long-standing bug, but I chose not to
back-patch --- the odds of causing trouble seem about as great
as the odds of preventing it, and both risks are very low anyway.
Per report from Alexey Bashtanov, though this is not his preferred
fix.
Discussion: https://postgr.es/m/b05ce36a-cefb-ca5e-b386-a400535b1c0b@imap.cc
get_relkind_objtype, and hence get_object_type, failed when applied to a
toast table. This is not a good thing, because it prevents reporting of
perfectly legitimate permissions errors. (At present, these functions
are in fact *only* used to determine the ObjectType argument for
acl_error() calls.) It seems best to have them fall back to returning
OBJECT_TABLE in every case where they can't determine an object type
for a pg_class entry, so do that.
In passing, make some edits to alter.c to make it more obvious that
those calls of get_object_type() are used only for error reporting.
This might save a few cycles in the non-error code path, too.
Back-patch to v11 where this issue originated.
John Hsu, Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/C652D3DF-2B0C-4128-9420-FB5379F6B1E4@amazon.com
When swapping the dependencies of the old and new indexes, the code has
been correctly switching all links in pg_depend from the old to the new
index for both referencing and referenced entries. However it forgot
the fact that the new index may itself have existing entries in
pg_depend, like references to the parent table attributes. This
resulted in duplicated entries in pg_depend after running REINDEX
CONCURRENTLY.
Fix this problem by removing any existing entries in pg_depend on the
new index before switching the dependencies of the old index to the new
one. More regression tests are added to check the consistency of
entries in pg_depend for indexes, including partition indexes.
Author: Michael Paquier
Discussion: https://postgr.es/m/20191025064318.GF8671@paquier.xyz
Backpatch-through: 12
When copying the definition of an index rebuilt concurrently for the new
entry, the index information was taken directly from the old index using
the relation cache. In this case, predicates and expressions have
some post-processing to prepare things for the planner, which loses some
information including the collations added in any of them.
This inconsistency can cause issues when attempting for example a table
rewrite, and makes the new indexes rebuilt concurrently inconsistent
with the old entries.
In order to fix the problem, fetch expressions and predicates directly
from the catalog of the old entry, and fill in IndexInfo for the new
index with that. This makes the process more consistent with
DefineIndex(), and the code is refactored with the addition of a routine
to create an IndexInfo node.
Reported-by: Manuel Rigger
Author: Michael Paquier
Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com
Backpatch-through: 12
This improves the user experience when it comes to restrict several
flavors of REINDEX CONCURRENTLY. First, for INDEX, remove a restriction
on shared relations as we already check after catalog relations. Then,
for TABLE, add a proper error message when attempting to run the command
on system catalogs. The code path of CREATE INDEX CONCURRENTLY already
complains about that, but if a REINDEX is issued then then the error
generated is confusing.
While on it, add more tests to check restrictions on catalog indexes and
on toast table/index for catalogs. Some error messages are improved,
with wording suggestion coming from Tom Lane.
Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/23694.1556806002@sss.pgh.pa.us
The tests turn out to cause deadlocks in some circumstances. Fairly
reproducibly so with -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE. Some of the deadlocks may be hard to fix
without disproportionate measures, but others probably should be fixed
- but not in 12.
We discussed removing the new tests until we can fix the issues
underlying the deadlocks, but results from buildfarm animal
markhor (which runs with CLOBBER_CACHE_ALWAYS) indicates that there
might be a more severe, as of yet undiagnosed, issue (including on
stable branches) with reindexing catalogs. The failure is:
ERROR: could not read block 0 in file "base/16384/28025": read only 0 of 8192 bytes
Therefore it seems advisable to keep the tests.
It's not certain that running the tests in isolation removes the risk
of deadlocks. It's possible that additional locks are needed to
protect against a concurrent auto-analyze or such.
Per discussion with Tom Lane.
Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
Backpatch: 9.4-, like 3dbb317d3
When reindexing individual indexes on pg_class it was possible to
either trigger an assertion failure:
TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((index)->rd_id)))
That's because reindex_index() called SetReindexProcessing() - which
enables an asserts ensuring no index insertions happen into the index
- before calling RelationSetNewRelfilenode(). That not correct for
indexes on pg_class, because RelationSetNewRelfilenode() updates the
relevant pg_class row, which needs to update the indexes.
The are two reasons this wasn't noticed earlier. Firstly the bug
doesn't trigger when reindexing all of pg_class, as reindex_relation
has code "hiding" all yet-to-be-reindexed indexes. Secondly, the bug
only triggers when the the update to pg_class doesn't turn out to be a
HOT update - otherwise there's no index insertion to trigger the
bug. Most of the time there's enough space, making this bug hard to
trigger.
To fix, move RelationSetNewRelfilenode() to before the
SetReindexProcessing() (and, together with some other code, to outside
of the PG_TRY()).
To make sure the error checking intended by SetReindexProcessing() is
more robust, modify CatalogIndexInsert() to check
ReindexIsProcessingIndex() even when the update is a HOT update.
Also add a few regression tests for REINDEXing of system catalogs.
The last two improvements would have prevented some of the issues
fixed in 5c1560606d from being introduced in the first place.
Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
Backpatch: 9.4-, the bug is present in all branches
Per discussion with others, allowing REINDEX INDEX CONCURRENTLY to work
for invalid indexes when working directly on them can have a lot of
value to unlock situations with invalid indexes without having to use a
dance involving DROP INDEX followed by an extra CREATE INDEX
CONCURRENTLY (which would not work for indexes with constraint
dependency anyway). This also does not create extra bloat on the
relation involved as this works on individual indexes, so let's enable
it.
Note that REINDEX TABLE CONCURRENTLY still bypasses invalid indexes as
we don't want to bloat the number of indexes defined on a relation in
the event of multiple and successive failures of REINDEX CONCURRENTLY.
More regression tests are added to cover those behaviors, using an
invalid index created with CREATE INDEX CONCURRENTLY.
Reported-by: Dagfinn Ilmari Mannsåker, Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20190411134947.GA22043@alvherre.pgsql
In case of a partition index, when swapping the old and new index, we
also need to attach the new index as a partition and detach the old
one. Also, to handle partition indexes, we not only need to change
dependencies referencing the index, but also dependencies of the index
referencing something else. The previous code did this only
specifically for a constraint, but we also need to do this for
partitioned indexes. So instead write a generic function that does it
for all dependencies.
Author: Michael Paquier <michael@paquier.xyz>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/DF4PR8401MB11964EDB77C860078C343BEBEE5A0%40DF4PR8401MB1196.NAMPRD84.PROD.OUTLOOK.COM#154df1fedb735190a773481765f7b874
The point of this change is to increase the potential for parallelism
while running the core regression tests. Most people these days are
using parallel testing modes on multi-core machines, so we might as
well try a bit harder to keep multiple cores busy. Hence, a test that
runs much longer than others in its parallel group is a candidate to
be sub-divided.
In this patch, create_index.sql and join.sql are split up.
I haven't changed the content of the tests in any way, just
moved them.
I moved create_index.sql's SP-GiST-related tests into a new script
create_index_spgist, and moved its btree multilevel page deletion test
over to the existing script btree_index. (btree_index is a more natural
home for that test, and it's shorter than others in its parallel group,
so this doesn't hurt total runtime of that group.) There might be
room for more aggressive splitting of create_index, but this is enough
to improve matters considerably.
Likewise, I moved join.sql's "exercises for the hash join code" into
a new file join_hash. Those exercises contributed three-quarters of
the script's runtime. Which might well be excessive ... but for the
moment, I'm satisfied with shoving them into a different parallel
group, where they can share runtime with the roughly-equally-lengthy
gist test.
(Note for anybody following along at home: there are interesting
interactions between the runtimes of create_index and anything running
in parallel with it, because the tests of CREATE INDEX CONCURRENTLY
in that file will repeatedly block waiting for concurrent transactions
to commit. As committed in this patch, create_index and
create_index_spgist have roughly equal runtimes, but that's mostly an
artifact of forced synchronization of the CONCURRENTLY tests; when run
serially, create_index is much faster. A followup patch will reduce
the runtime of create_index_spgist and thereby also create_index.)
Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
Commit c1afd175, which added support for rootdescend verification to
amcheck, added only minimal regression test coverage. Address this by
making sure that rootdescend verification is run on a multi-level index.
In passing, simplify some of the regression tests that exercise
multi-level nbtree page deletion.
Both issues spotted while rereviewing coverage of the nbtree patch
series using gcov.