Over the past few releases, we've added a huge amount of complexity
to our popcount implementations. Commits fbe327e5b4, 79e232ca01,
8c6653516c, and 25dc485074 did some preliminary refactoring, but
many opportunities remain. In particular, if we disclaim interest
in micro-optimizing this code for 32-bit builds and in unnecessary
alignment checks on x86-64, we can remove a decent chunk of code.
I cannot find public discussion or benchmarks for the code this
commit removes, but it seems unlikely that this change will
noticeably impact performance on affected systems.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which
returns the pre-existing rows when conflicts are detected. The INSERT
statement must have a RETURNING clause when DO SELECT is specified.
The optional FOR UPDATE/SHARE clause allows the rows to be locked
before they are returned. As with a DO UPDATE conflict action, an
optional WHERE clause may be used to prevent rows from being selected
for return (but as with a DO UPDATE action, rows filtered out by the
WHERE clause are still locked).
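As a sketch of the intended usage (the table and column names here are
hypothetical, and the exact placement of the optional clauses may differ
slightly):
    INSERT INTO users (email, name)
    VALUES ('alice@example.org', 'Alice')
    ON CONFLICT (email) DO SELECT FOR UPDATE
    RETURNING id, email, name;
On conflict, the statement returns (and, with FOR UPDATE, locks) the
pre-existing conflicting row instead of inserting a new one.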
Bumps catversion as stored rules change.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Marko Tiikkaja <marko@joh.to>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark
Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark
Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se
Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com
The only place that used p_is_insert was transformAssignedExpr(),
which used it to distinguish INSERT from UPDATE when handling
indirection on assignment target columns -- see commit c1ca3a19df.
However, this information is already available to
transformAssignedExpr() via its exprKind parameter, which is always
either EXPR_KIND_INSERT_TARGET or EXPR_KIND_UPDATE_TARGET.
As noted in the commit message for c1ca3a19df, this use of
p_is_insert isn't particularly pretty, so have transformAssignedExpr()
use the exprKind parameter instead. This then allows p_is_insert to be
removed entirely, which simplifies state management in a few other
places across the parser.
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/badc3b4c-da73-4000-b8d3-638a6f53a769@Spark
This commit adds a new parameter called
password_expiration_warning_threshold that controls when the server
begins emitting imminent-password-expiration warnings upon
successful password authentication. By default, this parameter is
set to 7 days, but this functionality can be disabled by setting it
to 0. This patch also introduces a new "connection warning"
infrastructure that can be reused elsewhere. For example, we may
want to warn about the use of MD5 passwords for a couple of
releases before removing MD5 password support.
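For example, a sketch of how the new parameter might be adjusted (the
time-unit syntax accepted by the GUC is assumed here):
    -- start warning two weeks before a password expires
    ALTER SYSTEM SET password_expiration_warning_threshold = '14d';
    -- disable the warnings entirely
    ALTER SYSTEM SET password_expiration_warning_threshold = 0;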
Author: Gilles Darold <gilles@darold.net>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com>
Reviewed-by: liu xiaohui <liuxh.zj.cn@gmail.com>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/129bcfbf-47a6-e58a-190a-62fc21a17d03%40migops.com
* Remove an unused variable
* Use "default log level" consistently (instead of "generic")
* Keep the process types in alphabetical order (missed one place in the
SGML docs)
* Since the log_min_messages type was changed from enum to string, it
is a good idea to add single quotes when printing it out. Otherwise,
copying and pasting the SHOW output into SET fails except in the
simplest case. Using single quotes reduces that confusion.
* Use lowercase string for the burned-in default value, to keep the same
output as previous versions.
Author: Euler Taveira <euler@eulerto.com>
Author: Man Zeng <zengman@halodbtech.com>
Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202602091250.genyflm2d5dw@alvherre.pgsql
It protects the freeProcs and some other fields in ProcGlobal, so
let's move it there. It's good for cache locality to have it next to
the thing it protects, and it just makes more sense anyway. I believe it
was allocated as a separate shared memory area just for historical
reasons.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/b78719db-0c54-409f-b185-b0d59261143f@iki.fi
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.
To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.
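As a sketch, the extra information can be inspected with something like
the following (the table names are hypothetical, and the exact output
format is not shown here):
    LOAD 'pg_overexplain';
    EXPLAIN (RANGE_TABLE)
    SELECT * FROM measurements m JOIN sensors s USING (sensor_id);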
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
This commit changes the definition of varlena to a typedef, so that it
becomes possible to remove "struct" markers from various declarations in
the code base. Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.
This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
This cleanup came up in a different patch set that played with
TOAST and varatt.h, and is independently worth doing on its own.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.
To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable. So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
Also, update pg_overexplain to display these details.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.
Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.
To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Commit 94f3ad3961 failed to do this
because I couldn't think of a use for the information, but this has
proven to be short-sighted. Best to fix it before this code is
officially released.
Now, the only argument to standard_planner that isn't passed to
planner_setup_hook is boundParams, but that is accessible via
glob->boundParams, and so doesn't need to be passed separately.
Discussion: https://www.postgresql.org/message-id/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Two changes here:
1. Introduce a separate RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK flag to
indicate a suspected deadlock that involves a buffer pin. Previously
the startup process used the same flag for a deadlock involving just
regular locks, and to check for deadlocks involving the buffer
pin. The cases are handled separately in the startup process, but the
receiving backend had to deduce which one it was based on
HoldingBufferPinThatDelaysRecovery(). With a separate flag, the
receiver doesn't need to guess.
2. Rewrite the ProcessRecoveryConflictInterrupt() function to not rely
on fall-through in the switch statement. That was difficult to
read.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
This commit adds three attributes to the system view pg_stats_ext_exprs,
whose data can exist when an expression involves a range type:
range_length_histogram
range_empty_frac
range_bounds_histogram
These statistics fields have existed since 918eee0c49, and later became
viewable in pg_stats in bc3c8db8ae. This puts the definition of
pg_stats_ext_exprs on par with pg_stats.
This issue showed up during the discussion about the restore of
extended statistics for expressions, which requires being able to query
the stats data to restore from the catalogs. Having access to this data
is useful on its own, without the restore part.
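For example, the new attributes can be queried like the pre-existing
ones (the statistics object name is hypothetical):
    SELECT statistics_name, expr,
           range_length_histogram, range_empty_frac, range_bounds_histogram
    FROM pg_stats_ext_exprs
    WHERE statistics_name = 'reservations_stats';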
Some documentation and some tests are added, written by me. Corey has
authored the part in system_views.sql.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aYmCUx9VvrKiZQLL@paquier.xyz
They are not and never have been used by any known code -- apparently we
just cargo-culted them in commit 37484ad2aa (or their ancestor macros
anyway, which begat these functions in commit 34694ec888). Allegedly
they're also potentially dangerous; users are better off going through
HeapTupleSetHintBits instead.
Author: Andy Fan <zhihuifan1213@163.com>
Discussion: https://postgr.es/m/87sejogt4g.fsf@163.com
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer (as part of zeroday.cloud)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
Change log_min_messages from being a single element to a comma-separated
list of type:level elements, with 'type' representing a process type,
and 'level' being a log level to use for that type of process. The list
must also have a freestanding level specification, used for
process types not listed, which conveniently makes the whole thing
backwards-compatible.
Some choices made here could be contested; for instance, we use the
process type `backend` to affect regular backends as well as dead-end
backends and the standalone backend, and `autovacuum` means both the
launcher and the workers. I think it's largely sensible though, and it
can easily be tweaked if desired.
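For example, a sketch using the process types mentioned above (the
exact spellings accepted by the GUC are assumed):
    -- default level is warning; autovacuum launcher and workers use DEBUG1
    ALTER SYSTEM SET log_min_messages = 'warning, autovacuum:debug1';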
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Tan Yang <332696245@qq.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix by converting all callers to one of the
following:
1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.
2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.
3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Commit 176dffdf7 added a NULL array pointer check before performing
a qsort in order to prevent undefined behavior when passing a NULL
pointer and zero length. To head off future degenerate cases, check
that there are at least two elements to sort before proceeding with
insertion sort. This has the added advantage of allowing us to remove
four equivalent checks that guarded against recursion/iteration.
There might be a tiny performance penalty from unproductive
recursions, but we can buy that back by increasing the insertion sort
threshold. That is left for future work.
Discussion: https://postgr.es/m/CANWCAZZWvds_35nXc4vXD-eBQa_=mxVtqZf-PM_ps=SD7ghhJg@mail.gmail.com
This commit adjusts a few debugging macros to match the style of
those in pg_config_manual.h. Like commits 123661427b and
b4cbc106a6, these were discovered while reviewing Aleksander
Alekseev's proposed changes to pgindent.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan
The main reason that libpq doesn't request protocol version 3.2 by
default is because other proxy/server implementations don't implement
the negotiation. This is a bit of a chicken-and-egg problem: We don't
bump the default version that libpq requests, but other implementations
may not be incentivized to implement version negotiation if their users
never run into issues.
One established practice to combat this is to flip Postel's Law on its
head, by sending parameters that the server cannot possibly support. If
the server fails the handshake instead of correctly negotiating, then
the problem is surfaced naturally. If the server instead claims to
support the bogus parameters, then we fail the connection to make the
lie obvious. This is called "grease" (or "greasing"), after the GREASE
mechanism in TLS that popularized the concept:
https://www.rfc-editor.org/rfc/rfc8701.html
This patch reserves 3.9999 as an explicitly unsupported protocol version
number and `_pq_.test_protocol_negotiation` as an explicitly unsupported
protocol extension. A later commit will send these by default in order
to stress-test the ecosystem during the beta period; that commit will
then be reverted before 19 RC1, so that we can decide what to do with
whatever data has been gathered.
The _pq_.test_protocol_negotiation change here is intentionally docs-
only: after its implementation is reverted, the parameter should remain
reserved.
Extracted/adapted from a patch by Jelte Fennema-Nio.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
Provide a way to disable the use of posix_fallocate() for relation
files. It was introduced by commit 4d330a61bb. The new setting
file_extend_method=write_zeros can be used as a workaround for problems
reported from the field:
* BTRFS compression is disabled by the use of posix_fallocate()
* XFS could produce spurious ENOSPC errors in some Linux kernel
versions, though that problem is reported to have been fixed
The default is file_extend_method=posix_fallocate if available, as
before. The write_zeros option is similar to PostgreSQL < 16, except
that now it's multi-block.
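For example, the workaround can be applied with (whether a reload or a
restart is needed depends on the GUC's context, which is not covered
here):
    ALTER SYSTEM SET file_extend_method = 'write_zeros';
    -- revert to the default behavior
    ALTER SYSTEM RESET file_extend_method;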
Backpatch-through: 16
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical slots
provided that all logical slots have caught up (i.e., they have no
pending decodable WAL records). Previously, this verification was done
by checking each slot individually, which could be time-consuming if
there were many logical slots to migrate.
This commit optimizes the check to avoid reading the same WAL stream
multiple times. It performs the check only for the slot with the
minimum confirmed_flush_lsn and applies the result to all other slots
in the same database. This limits the check to at most one logical
slot per database.
During the check, we identify the last decodable WAL record's LSN to
report any slots with unconsumed records, consistent with the existing
error reporting behavior. Additionally, the maximum
confirmed_flush_lsn among all logical slots on the database is used as
an early scan cutoff; finding a decodable WAL record beyond this point
implies that no slot has caught up.
Performance testing demonstrated that the execution time remains
stable regardless of the number of slots in the database.
Note that we do not distinguish slots based on their output plugins. A
hypothetical plugin might use a replication origin filter that filters
out changes from a specific origin. In such cases, we might get a
false positive (erroneously considering a slot caught up). However,
this is safe from a data integrity standpoint, such scenarios are
rare, and the impact of a false positive is minimal.
This optimization is applied only when the old cluster is version 19
or later.
Bump catalog version.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBZ0LAcw1OHGEKdW7S5TRJaURdhEk3CLAW69_siqfqyAg@mail.gmail.com
We can immediately make use of it in pg_signal_backend(), which
previously fetched the process type from the backend status array with
pgstat_get_backend_type_by_proc_number(). That was correct but felt a
little questionable to me: backend status should be for observability
purposes only, not for permission checks.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/b77e4962-a64a-43db-81a1-580444b3e8f5@iki.fi
Currently, when the argument of copyObject() is const-qualified, the
return type is also, because the use of typeof carries over all the
qualifiers. This is incorrect, since the point of copyObject() is to
make a copy to mutate. But apparently no code ran into it.
The new implementation uses typeof_unqual, which drops the qualifiers,
making this work correctly.
typeof_unqual is standardized in C23, but all recent versions of all
the usual compilers support it even in non-C23 mode, at least as
__typeof_unqual__. We add a configure/meson test for typeof_unqual
and __typeof_unqual__ and use it if it's available, else we use the
existing fallback of just returning void *.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
I don't understand how to reach errdetail_abort() with
MyProc->recoveryConflictPending set. If a recovery conflict signal is
received, ProcessRecoveryConflictInterrupt() raises an ERROR or FATAL
error to cancel the query or connection, and abort processing clears
the flag. The error message from ProcessRecoveryConflictInterrupt() is
very clear that the query or connection was terminated because of
recovery conflict.
The only way to reach it AFAICS is with a race condition, if the
startup process sends a recovery conflict signal when the transaction
has just entered aborted state for some other reason. And in that case
the detail would be misleading, as the transaction was already aborted
for some other reason, not because of the recovery conflict.
errdetail_abort() was the only user of the recoveryConflictPending
flag in PGPROC, so we can remove that and all the related code too.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
When using ALTER TABLE ... ADD CONSTRAINT to add a not-null constraint
with an explicit name, we have to ensure that if the column is already
marked NOT NULL, the provided name matches the existing constraint name.
Failing to do so could lead to confusion regarding which constraint
object actually enforces the rule.
This patch adds a check to throw an error if the user tries to add a
named not-null constraint to a column that already has one with a
different name.
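A sketch of the new behavior (table and constraint names are
hypothetical):
    CREATE TABLE t (a int CONSTRAINT t_a_not_null NOT NULL);
    -- now raises an error, because column a's not-null constraint
    -- already exists under a different name:
    ALTER TABLE t ADD CONSTRAINT a_not_null NOT NULL a;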
Reported-by: yanliang lei <msdnchina@163.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org
Separate att_align_nominal() into two macros, similarly to what
was already done with att_align_datum() and att_align_pointer().
The inner macro att_nominal_alignby() is really just TYPEALIGN(),
while att_align_nominal() retains its previous API by mapping
TYPALIGN_xxx values to numbers of bytes to align to and then
calling att_nominal_alignby(). In support of this, split out
tupdesc.c's logic to do that mapping into a publicly visible
function typalign_to_alignby().
Having done that, we can replace performance-critical uses of
att_align_nominal() with att_nominal_alignby(), where the
typalign_to_alignby() mapping is done just once outside the loop.
In most places I settled for doing typalign_to_alignby() once
per function. We could in many places pass the alignby value
in from the caller if we wanted to change function APIs for this
purpose; but I'm a bit loath to do that, especially for exported
APIs that extensions might call. Replacing a char typalign
argument by a uint8 typalignby argument would be an API change
that compilers would fail to warn about, thus silently breaking
code in hard-to-debug ways. I did revise the APIs of array_iter_setup
and array_iter_next, moving the element type attribute arguments to
the former; if any external code uses those, the argument-count
change will cause visible compile failures.
Performance testing shows that ExecEvalScalarArrayOp is sped up by
about 10% by this change, when using a simple per-element function
such as int8eq. I did not check any of the other loops optimized
here, but it's reasonable to expect similar gains.
Although the motivation for creating this patch was to avoid a
performance loss if we add some more typalign values, it evidently
is worth doing whether that patch lands or not.
Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
Up to now we've used GNU-style local labels for branch targets
in s_lock.h's assembly blocks. But there's an alternative style,
which I for one didn't know about till recently: use regular
assembler labels, and insert a per-asm-block number in them
using %= to ensure they are distinct across multiple TAS calls
within one source file. gcc has had %= since gcc 2.0, and
I've verified that clang knows it too.
While the immediate motivation for changing this is that AIX's
assembler doesn't do local labels, it seems to me that this is a
superior solution anyway. There is nothing mnemonic about "1:",
while a regular label can convey something useful, and at least
to me it feels less error-prone. Therefore let's standardize on
this approach, also converting the one other usage in s_lock.h.
Discussion: https://postgr.es/m/399291.1769998688@sss.pgh.pa.us
The build generates four files based on the wait event contents stored
in wait_event_names.txt:
- wait_event_types.h
- pgstat_wait_event.c
- wait_event_funcs_data.c
- wait_event_types.sgml
The SGML file is generated as part of a documentation build, with its
data stored in doc/src/sgml/ for meson and configure. The three others
are handled differently for meson and configure:
- In configure, all the files are created in src/backend/utils/activity/.
A link to wait_event_types.h is created in src/include/utils/.
- In meson, all the files are created in src/include/utils/.
The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are
then included in wait_event.c and wait_event_funcs.c, respectively,
without the "utils/" path.
For configure, this does not present a problem. For meson, this has to
be combined with a trick in src/backend/utils/activity/meson.build,
where include_directories needs to point to include/utils/ to make the
inclusion of the C files work properly. This causes builds to pull in
PostgreSQL headers rather than system headers in some build paths, as
src/include/utils/ would take priority.
In order to fix this issue, this commit reworks the way the C/H files
are generated, becoming consistent with guc_tables.inc.c:
- For meson, basically nothing changes. The files are still generated
in src/include/utils/. The trick with include_directories is removed.
- For configure, the files are now generated in src/backend/utils/, with
links in src/include/utils/ pointing to the ones in src/backend/. This
requires extra rules in src/backend/utils/activity/Makefile so that a
make command in this sub-directory is able to work.
- The three files now fall under header-stamp, which is actually simpler
as guc_tables.inc.c does the same.
- wait_event_funcs_data.c and pgstat_wait_event.c are now included with
"utils/" in their path.
This problem has not been an issue in the buildfarm; it has been noted
with AIX and a conflict with float.h. This issue could, however, create
conflicts in the buildfarm depending on the environment with unexpected
headers pulled in, so this fix is backpatched down to where the
generation of the wait-event files has been introduced.
While on it, this commit simplifies wait_event_names.txt to mention just
the names of the generated files rather than their paths; the paths
listed there had become incorrect, and the SGML path in particular was
wrong.
This change has been tested in the CI, down to v17. Locally, I have run
tests with configure (with and without VPATH), as well as meson, on the
three branches.
Combo oversight in fa88928470 and 1e68e43d3f.
Reported-by: Aditya Kamath <aditya.kamath1@ibm.com>
Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com
Backpatch-through: 17
For readability. It was a slight modularity violation to have fields
in PGShmemHeader that were only used by the allocator code in
shmem.c. And it was inconsistent that ShmemLock was nevertheless not
stored there. Moving all the allocator-related fields to a separate
struct makes it more consistent and modular, and removes the need to
allocate and pass ShmemLock separately via BackendParameters.
Merge InitShmemAccess() and InitShmemAllocation() into a single
function that initializes the struct when called from postmaster, and
when called from backends in EXEC_BACKEND mode, re-establishes the
global variables. That's similar to all the *ShmemInit() functions
that we have.
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5uNRB9oT4pdo54qAo025MXFX4MfYrD9K15OCqe-ExnNvg@mail.gmail.com
Previously, the CheckXidAlive check was performed within the table_scan*next*
functions. This caused the check to be executed for every fetched tuple, an
unnecessary overhead.
To fix, move the check to table_beginscan* so it is performed once per scan
rather than once per row.
Note: table_tuple_fetch_row_version() does not use a scan descriptor;
therefore, the CheckXidAlive check is retained in that function. The overhead
is unlikely to be relevant for the existing callers.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Amit Kapila <akapila@postgresql.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/tlpltqm5jjwj7mp66dtebwwhppe4ri36vdypux2zoczrc2i3mp%40dhv4v4nikyfg
Commit 6ceef9408 was still one brick shy of a load, because it caused
any usage at all of PGIOAlignedBlock or PGAlignedXLogBlock to fail
under older g++. Notably, this broke "headerscheck --cplusplus".
We can permit references to these structs as abstract structs though;
only actual declaration of such a variable needs to be forbidden.
Discussion: https://www.postgresql.org/message-id/3119480.1769189606@sss.pgh.pa.us
This commit adds support for the restore of extended statistics of the
kind "mcv", aka most-common values.
This format is different from the n_distinct and dependencies stat types
in that it combines three of the four different arrays from the
pg_stats_ext view, which are translated into three input arguments of
pg_restore_extended_statistics():
- "most_common_vals", acting as a leader of the others. It is a
2-dimension array, that includes the common values.
- "most_common_freqs", 1-dimension array of float8[], with a number of
elements that has to match with "most_common_vals".
- "most_common_base_freqs", 1-dimension array of float8[], with a number
of elements that has to match with "most_common_vals".
All three arrays are required to achieve the restore of this type of
extended statistics (if "most_common_vals" happens to be NULL in the
catalogs, the rest is NULL by design).
Note that "most_common_val_nulls" is not required as input; its data is
rebuilt by decomposing the "most_common_vals" array based on its text[]
representation. The initial versions of the patch accepted this as
input, but it is not required, and omitting it simplifies the result
considerably.
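The three inputs correspond to columns of the pg_stats_ext view, for
example (the statistics object name is hypothetical):
    SELECT most_common_vals, most_common_freqs, most_common_base_freqs
    FROM pg_stats_ext
    WHERE statistics_name = 'orders_stats';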
Support in pg_dump, when --statistics is used, is added down to v13,
which is where this type of extended statistics was introduced. This
means that upgrades and dumps can restore extended statistics
data transparently, like "dependencies", "ndistinct", attribute and
relation statistics. For MCV, the values are directly queried from the
relevant catalogs.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
This commit moves the separate global variables for replication origin
state into a single ReplOriginXactState struct. This groups logically
related variables, which improves code readability and simplifies
state management (e.g., resetting the state) by handling them as a
unit.
Author: Chao Li <lic@highgo.com>
Suggested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com
Factor out common logic for clearing replorigin_session_* variables
into a dedicated helper function, replorigin_xact_clear().
This removes duplicated assignments of these variables across multiple
call sites, and makes the intended scope of each reset explicit.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com
The replication origin code was using inconsistent naming
conventions. Functions were typically prefixed with 'replorigin',
while typedefs and constants used "RepOrigin".
This commit unifies the naming convention by renaming RepOriginId to
ReplOriginId.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com
Each RelOptInfo now has a pgs_mask member which is a mask of acceptable
strategies. For most rels, this is populated from PlannerGlobal's
default_pgs_mask, which is computed from the values of the enable_*
GUCs at the start of planning.
For baserels, get_relation_info_hook can be used to adjust pgs_mask for
each new RelOptInfo, at least for rels of type RTE_RELATION. Adjusting
pgs_mask is less useful for other types of rels, but if it proves to
be necessary, we can revisit the way this hook works or add a new one.
For joinrels, two new hooks are added. joinrel_setup_hook is called each
time a joinrel is created, and one thing that can be done from that hook
is to manipulate pgs_mask for the new joinrel. join_path_setup_hook is
called each time we're about to add paths to a joinrel by considering
some particular combination of an outer rel, an inner rel, and a join
type. It can modify the pgs_mask propagated into JoinPathExtraData to
restrict strategy choice for that particular combination of rels.
To make joinrel_setup_hook work as intended, the existing calls to
build_joinrel_partition_info are moved later in the calling functions;
this is because that function checks whether the rel's pgs_mask includes
PGS_CONSIDER_PARTITIONWISE, so we want it to only be called after
plugins have had a chance to alter pgs_mask.
Upper rels currently inherit pgs_mask from the input relation. It's
unclear that this is the most useful behavior, but at the moment there
are no hooks to allow the mask to be set in any other way.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
A race condition could cause a newly synced replication slot to become
invalidated between its initial sync and the checkpoint.
When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote_restart_lsn. Because slot
sync happens asynchronously, this value can lag behind the standby's
current redo pointer. Without any interlocking between WAL reservation and
checkpoints, a checkpoint may remove WAL required by the newly synced
slot, causing the slot to be invalidated.
To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
if WAL reservation happens first, the checkpoint process must wait for
slotsync to update the slot's restart_lsn before it computes the minimum
required LSN.
However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
protect a newly synced slot if a checkpoint has already run
CheckPointReplicationSlots() before slotsync updates the slot. In such
cases, the remote restart_lsn may be stale and earlier than the current
redo pointer. To prevent relying on an outdated LSN, we use the oldest
WAL location available if it is greater than the remote restart_lsn.
This ensures that newly synced slots always start with a safe, non-stale
restart_lsn and are not invalidated by concurrent checkpoints.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 17
Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a3, when we did not pin the
VM page until deciding we must read it. Now that the VM page is already
pinned, there is no meaningful benefit to relying on a cached VM status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, which would make the logic harder to reason about. And
eliminating it enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible to set the VM
after fixing corruption (if pruning found the page all-visible).
Now that no callers of visibilitymap_set() use its return value, change
its (and visibilitymap_set_vmbits()) return type to void.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
This fixes cases where a qualifier (const, in all cases here) was
dropped by a cast, but the cast was otherwise necessary or desirable,
so the straightforward fix is to add the qualifier into the cast.
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org
Fix for commit a9bdb63bba. The previous plan of redefining alignas
didn't work, because it interfered with other C++ header files (e.g.,
LLVM). So now the new workaround is to just disable the affected
typedefs under the affected compilers. These are not typically used
in extensions anyway.
Discussion: https://www.postgresql.org/message-id/3119480.1769189606%40sss.pgh.pa.us
This function closely mirrors its relation and attribute counterparts,
but targets extended statistics (i.e. CREATE STATISTICS) objects, making
it possible to restore extended statistics for such an object. Like the
other functions, the goal of this feature is to ease dumps and upgrades
of clusters so that ANALYZE is no longer required after these
operations, the stats being loaded directly into the target cluster
without any post-dump/upgrade computation.
The caller of this function needs the following arguments for the
extended stats to restore:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- If the stats are inherited or not.
- One or more extended stats kinds, each with its data.
This commit adds only support for the restore of the extended statistics
kind "n_distinct", building the basic infrastructure for the restore
of more extended statistics kinds in follow-up commits, including MCV
and dependencies.
The support for "n_distinct" in this commit is eased by previous work,
particularly commits 1f927cce44 and 44eba8f06e, which added the input
function for the type pg_ndistinct, used as the input data type of this
new restore function.
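As a sketch only, assuming the function follows the name/value-pair
calling convention of pg_restore_relation_stats() and
pg_restore_attribute_stats() (the argument names below are guesses based
on the list above, and the pg_ndistinct literal is illustrative):
    SELECT pg_restore_extended_statistics(
        'schemaname', 'public',
        'relname', 'orders',
        'statistics_schemaname', 'public',
        'statistics_name', 'orders_stats',
        'inherited', false,
        'n_distinct', '{"1, 2": 1000}'::pg_ndistinct);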
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Based on the name of the macro, it was implied that it could be used for
all mmap() calls on portability grounds. However, its use is limited to
sysv_shmem.c, for CreateAnonymousSegment(). This commit removes the
declaration, reducing the confusion around it being a portability tweak
that is limited to SysV-style shared memory.
This macro was originally introduced in b0fc0df936 for sysv_shmem.c, and
was moved to mem.h in 0ac5e5a7e1 a bit later.
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vTWABxuM5fbQcFkGuTLwaxuZDEE2vtx2WuMUWk6JnF4g@mail.gmail.com
Discussion: https://postgr.es/m/12add41a-7625-4639-a394-a5563e349322@eisentraut.org