This commit replaces the implementations of pg_popcount{32,64} with
branchless ones in plain C. While these new implementations do not
make use of more sophisticated population count instructions
available on some CPUs, testing indicates they perform well,
especially now that they are inlined. Newer versions of popular
compilers will automatically replace these with special
instructions if possible, anyway. A follow-up commit will replace
various loops over these functions with calls to pg_popcount(),
leaving us little reason to worry about micro-optimizing them
further.
Since this commit removes the only uses of the popcount builtins,
we can also remove the corresponding configuration checks.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
conflict.h currently includes utils/timestamp.h despite only requiring
basic timestamp type definitions. This creates unnecessary overhead.
Replace the include with datatype/timestamp.h to provide the necessary
types. This change requires explicitly including utils/timestamp.h in
test_custom_fixed_stats.c, which previously relied on the indirect
inclusion.
Extracted from the larger patch by Andres Freund.
Discussion: https://postgr.es/m/aY-UE-4t7FiYgH3t@alap3.anarazel.de
On common architectures, the PGPROC struct happened to be a multiple
of 64 bytes on PG 18, but it's changed on 'master' since. There was
worry that changing the alignment might hurt performance, due to false
cacheline sharing across elements in the proc array. However, there
was no explicit alignment, so any alignment to cache lines was
accidental. Add explicit alignment to remove worry about false
sharing.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi
The ordering was pretty random, making it hard to get an overview of
what's in it. Group related fields together, and add comments to act
as separators between the groups.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi
If a CREATE TABLE statement defined a constraint whose name is identical
to the name generated for a NOT NULL constraint, we'd throw an
(unnecessary) unique key violation error on
pg_constraint_conrelid_contypid_conname_index: this can easily be
avoided by choosing a different name for the NOT NULL constraint.
Fix by passing the constraint names already created by
AddRelationNewConstraints() to AddRelationNotNullConstraints(), so that
the latter can avoid name collisions with them.
Bug: #19393
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reported-by: Hüseyin Demir <huseyin.d3r@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/19393-6a82427485a744cf@postgresql.org
The field was mainly used for the position in a LOCK's wait queue, but
also as the position in a the freelist when the PGPROC entry was not
in use. The reuse saves some memory at the expense of readability,
which seems like a bad tradeoff. If we wanted to make the struct
smaller there's other things we could do, but we're actually just
discussing adding padding to the struct for performance reasons.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi
pgstat.h is a widely included header. Including worker_internal.h there is
unnecessary and creates tight coupling. By refactoring
pgstat_report_subscription_error() to fetch the required
LogicalRepWorkerType internally rather than receiving it as an argument,
we can eliminate the need for the internal header.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/aY-UE-4t7FiYgH3t@alap3.anarazel.de
S_LOCK_FREE() is used by the test program in s_lock.c, but nobody
has voiced concerns about losing some coverage there.
SpinLockFree() appears to have been unused since it was introduced
by commit 499abb0c0f. There was agreement to remove these in 2020,
but it never happened. Since we still have agreement for removal
in 2026, let's do that now.
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/aZX2oUcKf7IzHnnK%40nathan
Discussion: https://postgr.es/m/20200608225338.m5zho424w6lpwb2d%40alap3.anarazel.de
This has been a keyword since C99, and we now require C11, so we no
longer need to use __inline__ or to check for it at configure time.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aZdGbDaV4_yKCMc-%40nathan
This commit allows setting wal_receiver_timeout per subscription
using the CREATE SUBSCRIPTION and ALTER SUBSCRIPTION commands.
The value is stored in the subwalrcvtimeout column of the pg_subscription
catalog.
When set, this value overrides the global wal_receiver_timeout for
the subscription's apply worker. The default is -1, which means the
global setting (from the server configuration, command line, role,
or database) remains in effect.
This feature is useful for configuring different timeout values for
each subscription, especially when connecting to multiple publisher
servers, to improve failure detection.
Bump catalog version.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/a1414b64-bf58-43a6-8494-9704975a41e9@oss.nttdata.com
Instead of using comments to mark fallthrough switch cases, use the
fallthrough attribute. This will (in the future, not here) allow
supporting other compilers besides gcc. The commenting convention is
only supported by gcc, the attribute is supported by clang, and in the
fullness of time the C23 standard attribute would allow supporting
other compilers as well.
Right now, we package the attribute into a macro called
pg_fallthrough. This commit defines that macro and replaces the
existing comments with that macro invocation.
We also raise the level of the gcc -Wimplicit-fallthrough= option from
3 to 5 to enforce the use of the attribute.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org
Up to now, to create such a function, one had to make a pg_proc.dat
entry and then overwrite it with a CREATE OR REPLACE command in
system_functions.sql. That's error-prone (cf. bug #19409) and
results in leaving dead rows in the initial contents of pg_proc.
Manual maintenance of pg_node_tree strings seems entirely impractical,
and parsing expressions during bootstrap would be extremely difficult
as well. But Andres Freund observed that all the current use-cases
are simple constants, and building a Const node is well within the
capabilities of bootstrap mode. So this patch invents a special case:
if bootstrap mode is asked to ingest a non-null value for
pg_proc.proargdefaults (which would otherwise fail in
pg_node_tree_in), it parses the value as an array literal and then
feeds the element strings to the input functions for the corresponding
parameter types. Then we can build a suitable pg_node_tree string
with just a few more lines of code.
This allows removing all the system_functions.sql entries that are
just there to set up default arguments, replacing them with
proargdefaults fields in pg_proc.dat entries. The old technique
remains available in case someone needs a non-constant default.
The initial contents of pg_proc are demonstrably the same after
this patch, except that (1) json_strip_nulls and jsonb_strip_nulls
now have the correct provolatile setting, as per bug #19409;
(2) pg_terminate_backend, make_interval, and drandom_normal
now have defaults that don't include a type coercion, which is
how they should have been all along.
In passing, remove some unused entries from bootstrap.c's TypInfo[]
array. I had to add some new ones because we'll now need an entry for
each default-possessing system function parameter, but we shouldn't
carry more than we need there; it's just a maintenance gotcha.
Bug: #19409
Reported-by: Lucio Chiessi <lucio.chiessi@trustly.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/183292bb-4891-4c96-a3ca-e78b5e0e1358@dunslane.net
Discussion: https://postgr.es/m/19409-e16cd2605e59a4af@postgresql.org
The main purpose of this change is to allow an ABI checker to understand
when the list of SysCacheIdentifier changes, by switching all the
routine declarations that relied on a signed integer for a syscache ID
to this new type. This is going to be useful in the long-term for
versions newer than v19 so as we will be able to check when the list of
values in SysCacheIdentifier is updated in a non-ABI compliant fashion.
Most of the changes of this commit are due to the new definition of
SyscacheCallbackFunction, where a SysCacheIdentifier is now required for
the syscache ID. It is a mechanical change, still slightly invasive.
There are more areas in the tree that could be improved with an ABI
checker in mind; this takes care of only one area.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/289125.1770913057@sss.pgh.pa.us
... instead of passing a bunch of separate booleans.
Also, rearrange the argument list in a hopefully more sensible order.
Discussion: https://postgr.es/m/202602111846.xpvuccb3inbx@alvherre.pgsql
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> (older version)
syscache_info.h was installed into $installdir/include/server/catalog
if you use a non-VPATH autoconf build, but not if you use a VPATH
build or meson. That happened because the makefiles blindly install
src/include/catalog/*.h, and in a non-VPATH build the generated
header files would be swept up in that. While it's hard to conjure
a reason to need syscache_info.h outside of backend build, it's
also hard to get the makefiles to skip syscache_info.h, so let's
go the other way and install it in the other two cases too.
Another problem, new in v19, was that meson builds install a copy of
src/include/catalog/README, while autoconf builds do not. The issue
here is that that file is new and wasn't added to meson.build's
exclusion list.
While it's clearly a bug if different build methods don't install
the same set of files, I doubt anyone would thank us for changing
the behavior in released branches. Hence, fix in master only.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/946828.1771185367@sss.pgh.pa.us
This completes the work started by commit 75f49221c2.
In basebackup.c, changing the StaticAssertStmt to StaticAssertDecl
results in having the same StaticAssertDecl() in 2 functions. So, it
makes more sense to move it to file scope instead.
Also, as it depends on some computations based on 2 tar blocks, define
TAR_NUM_TERMINATION_BLOCKS.
In deadlock.c, change the StaticAssertStmt to StaticAssertDecl and
keep it in the function scope. Add new braces to avoid warning from
-Wdeclaration-after-statement.
In aset.c, change the StaticAssertStmt to StaticAssertDecl and move it
to file scope.
Finally, update the comments in c.h a bit.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/aYH6ii46AvGVCB84%40ip-10-97-1-34.eu-west-3.compute.internal
Radix sort can be much faster than quicksort, but for our purposes it
is limited to sequences of unsigned bytes. To make tuples with other
types amenable to this technique, several features of tuple comparison
must be accounted for, i.e. the sort key must be "normalized":
1. Signedness -- It's possible to modify a signed integer such that
it can be compared as unsigned. For example, a signed char has range
-128 to 127. If we cast that to unsigned char and add 128, the range
of values becomes 0 to 255 while preserving order.
2. Direction -- SQL allows specification of ASC or DESC. The
descending case is easily handled by taking the complement of the
unsigned representation.
3. NULL values -- NULLS FIRST and NULLS LAST must work correctly.
This commmit only handles the case where datum1 is pass-by-value
Datum (possibly abbreviated) that compares like an ordinary
integer. (Abbreviations of values of type "numeric" are a convenient
counterexample.) First, tuples are partitioned by nullness in the
correct NULL ordering. Then the NOT NULL tuples are sorted with radix
sort on datum1. For tiebreaks on subsequent sortkeys (including the
first sort key if abbreviated), we divert to the usual qsort.
ORDER BY queries on pre-warmed buffers are up to 2x faster on high
cardinality inputs with radix sort than the sort specializations added
by commit 697492434, so get rid of them. It's sufficient to fall back
to qsort_tuple() for small arrays. Moderately low cardinality inputs
show more modest improvents. Our qsort is strongly optimized for very
low cardinality inputs, but radix sort is usually equal or very close
in those cases.
The changes to the regression tests are caused by under-specified sort
orders, e.g. "SELECT a, b from mytable order by a;". For unstable
sorts, such as our qsort and this in-place radix sort, there is no
guarantee of the order of "b" within each group of "a".
The implementation is taken from ska_byte_sort() (Boost licensed),
which is similar to American flag sort (an in-place radix sort) with
modifications to make it better suited for modern pipelined CPUs.
The technique of normalization described above can also be extended
to the case of multiple keys. That is left for future work (Thanks
to Peter Geoghegan for the suggestion to look into this area).
Reviewed-by: Chengpeng Yan <chengpeng_yan@outlook.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com
The uses of these functions do not justify the level of
micro-optimization we've done and may even hurt performance in some
cases (e.g., due to using function pointers). This commit removes
all architecture-specific implementations of pg_popcount{32,64} and
converts the portable ones to inlined functions in pg_bitutils.h.
These inlined versions should produce the same code as before (but
inlined), so in theory this is a net gain for many machines. A
follow-up commit will replace the remaining loops over these
word-length popcount functions with calls to pg_popcount(), further
reducing the need for architecture-specific implementations.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
Over the past few releases, we've added a huge amount of complexity
to our popcount implementations. Commits fbe327e5b4, 79e232ca01,
8c6653516c, and 25dc485074 did some preliminary refactoring, but
many opportunities remain. In particular, if we disclaim interest
in micro-optimizing this code for 32-bit builds and in unnecessary
alignment checks on x86-64, we can remove a decent chunk of code.
I cannot find public discussion or benchmarks for the code this
commit removes, but it seems unlikely that this change will
noticeably impact performance on affected systems.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which
returns the pre-existing rows when conflicts are detected. The INSERT
statement must have a RETURNING clause, when DO SELECT is specified.
The optional FOR UPDATE/SHARE clause allows the rows to be locked
before they are are returned. As with a DO UPDATE conflict action, an
optional WHERE clause may be used to prevent rows from being selected
for return (but as with a DO UPDATE action, rows filtered out by the
WHERE clause are still locked).
Bumps catversion as stored rules change.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Marko Tiikkaja <marko@joh.to>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark
Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark
Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se
Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com
The only place that used p_is_insert was transformAssignedExpr(),
which used it to distinguish INSERT from UPDATE when handling
indirection on assignment target columns -- see commit c1ca3a19df.
However, this information is already available to
transformAssignedExpr() via its exprKind parameter, which is always
either EXPR_KIND_INSERT_TARGET or EXPR_KIND_UPDATE_TARGET.
As noted in the commit message for c1ca3a19df, this use of
p_is_insert isn't particularly pretty, so have transformAssignedExpr()
use the exprKind parameter instead. This then allows p_is_insert to be
removed entirely, which simplifies state management in a few other
places across the parser.
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/badc3b4c-da73-4000-b8d3-638a6f53a769@Spark
This commit adds a new parameter called
password_expiration_warning_threshold that controls when the server
begins emitting imminent-password-expiration warnings upon
successful password authentication. By default, this parameter is
set to 7 days, but this functionality can be disabled by setting it
to 0. This patch also introduces a new "connection warning"
infrastructure that can be reused elsewhere. For example, we may
want to warn about the use of MD5 passwords for a couple of
releases before removing MD5 password support.
Author: Gilles Darold <gilles@darold.net>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com>
Reviewed-by: liu xiaohui <liuxh.zj.cn@gmail.com>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/129bcfbf-47a6-e58a-190a-62fc21a17d03%40migops.com
* Remove an unused variable
* Use "default log level" consistently (instead of "generic")
* Keep the process types in alphabetical order (missed one place in the
SGML docs)
* Since log_min_messages type was changed from enum to string, it
is a good idea to add single quotes when printing it out. Otherwise
it fails if the user copies and pastes from the SHOW output to SET,
except in the simplest case. Using single quotes reduces confusion.
* Use lowercase string for the burned-in default value, to keep the same
output as previous versions.
Author: Euler Taveira <euler@eulerto.com>
Author: Man Zeng <zengman@halodbtech.com>
Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202602091250.genyflm2d5dw@alvherre.pgsql
It protects the freeProcs and some other fields in ProcGlobal, so
let's move it there. It's good for cache locality to have it next to
the thing it protects, and just makes more sense anyway. I believe it
was allocated as a separate shared memory area just for historical
reasons.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/b78719db-0c54-409f-b185-b0d59261143f@iki.fi
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.
To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
This commit changes the definition of varlena to a typedef, so as it
becomes possible to remove "struct" markers from various declarations in
the code base. Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.
This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
This cleanup has come up in a different path set that played with
TOAST and varatt.h, independently worth doing on its own.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.
To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable. So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
Also, update pg_overexplain to display these details.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.
Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.
To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Commit 94f3ad3961 failed to do this
because I couldn't think of a use for the information, but this has
proven to be short-sighted. Best to fix it before this code is
officially released.
Now, the only argument to standard_planenr that isn't passed to
planner_setup_hook is boundParams, but that is accessible via
glob->boundParams, and so doesn't need to be passed separately.
Discussion: https://www.postgresql.org/message-id/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Two changes here:
1. Introduce a separate RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK flag to
indicate a suspected deadlock that involves a buffer pin. Previously
the startup process used the same flag for a deadlock involving just
regular locks, and to check for deadlocks involving the buffer
pin. The cases are handled separately in the startup process, but the
receiving backend had to deduce which one it was based on
HoldingBufferPinThatDelaysRecovery(). With a separate flag, the
receiver doesn't need to guess.
2. Rewrite the ProcessRecoveryConflictInterrupt() function to not rely
on fallthrough through the switch-statement. That was difficult to
read.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
This commit adds three attributes to the system view pg_stats_ext_exprs,
whose data can exist when involving a range type in an expression:
range_length_histogram
range_empty_frac
range_bounds_histogram
These statistics fields exist since 918eee0c49, and have become
viewable in pg_stats later in bc3c8db8ae. This puts the definition of
pg_stats_ext_exprs on par with pg_stats.
This issue has showed up during the discussion about the restore of
extended statistics for expressions, so as it becomes possible to query
the stats data to restore from the catalogs. Having access to this data
is useful on its own, without the restore part.
Some documentation and some tests are added, written by me. Corey has
authored the part in system_views.sql.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aYmCUx9VvrKiZQLL@paquier.xyz
They are not and never have been used by any known code -- apparently we
just cargo-culted them in commit 37484ad2aa (or their ancestor macros
anyway, which begat these functions in commit 34694ec888). Allegedly
they're also potentially dangerous; users are better off going through
HeapTupleSetHintBits instead.
Author: Andy Fan <zhihuifan1213@163.com>
Discussion: https://postgr.es/m/87sejogt4g.fsf@163.com
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
Change log_min_messages from being a single element to a comma-separated
list of type:level elements, with 'type' representing a process type,
and 'level' being a log level to use for that type of process. The list
must also have a freestanding level specification which is used for
process types not listed, which convenientely makes the whole thing
backwards-compatible.
Some choices made here could be contested; for instance, we use the
process type `backend` to affect regular backends as well as dead-end
backends and the standalone backend, and `autovacuum` means both the
launcher and the workers. I think it's largely sensible though, and it
can easily be tweaked if desired.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Tan Yang <332696245@qq.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix, by converting all callers to one of the
following:
1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.
2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.
3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Commit 176dffdf7 added a NULL array pointer check before performing
a qsort in order to prevent undefined behavior when passing NULL
pointer and zero length. To head off future degenerate cases, check
that there are at least two elements to sort before proceeding with
insertion sort. This has the added advantage of allowing us to remove
four equivalent checks that guarded against recursion/iteration.
There might be a tiny performance penalty from unproductive
recursions, but we can buy that back by increasing the insertion sort
threshold. That is left for future work.
Discussion: https://postgr.es/m/CANWCAZZWvds_35nXc4vXD-eBQa_=mxVtqZf-PM_ps=SD7ghhJg@mail.gmail.com
This commit adjusts a few debugging macros to match the style of
those in pg_config_manual.h. Like commits 123661427b and
b4cbc106a6, these were discovered while reviewing Aleksander
Alekseev's proposed changes to pgindent.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan
The main reason that libpq doesn't request protocol version 3.2 by
default is because other proxy/server implementations don't implement
the negotiation. This is a bit of a chicken-and-egg problem: We don't
bump the default version that libpq requests, but other implementations
may not be incentivized to implement version negotiation if their users
never run into issues.
One established practice to combat this is to flip Postel's Law on its
head, by sending parameters that the server cannot possibly support. If
the server fails the handshake instead of correctly negotiating, then
the problem is surfaced naturally. If the server instead claims to
support the bogus parameters, then we fail the connection to make the
lie obvious. This is called "grease" (or "greasing"), after the GREASE
mechanism in TLS that popularized the concept:
https://www.rfc-editor.org/rfc/rfc8701.html
This patch reserves 3.9999 as an explicitly unsupported protocol version
number and `_pq_.test_protocol_negotiation` as an explicitly unsupported
protocol extension. A later commit will send these by default in order
to stress-test the ecosystem during the beta period; that commit will
then be reverted before 19 RC1, so that we can decide what to do with
whatever data has been gathered.
The _pq_.test_protocol_negotiation change here is intentionally docs-
only: after its implementation is reverted, the parameter should remain
reserved.
Extracted/adapted from a patch by Jelte Fennema-Nio.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
Provide a way to disable the use of posix_fallocate() for relation
files. It was introduced by commit 4d330a61bb. The new setting
file_extend_method=write_zeros can be used as a workaround for problems
reported from the field:
* BTRFS compression is disabled by the use of posix_fallocate()
* XFS could produce spurious ENOSPC errors in some Linux kernel
versions, though that problem is reported to have been fixed
The default is file_extend_method=posix_fallocate if available, as
before. The write_zeros option is similar to PostgreSQL < 16, except
that now it's multi-block.
Backpatch-through: 16
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical slots
provided that all logical slots have caught up (i.e., they have no
pending decodable WAL records). Previously, this verification was done
by checking each slot individually, which could be time-consuming if
there were many logical slots to migrate.
This commit optimizes the check to avoid reading the same WAL stream
multiple times. It performs the check only for the slot with the
minimum confirmed_flush_lsn and applies the result to all other slots
in the same database. This limits the check to at most one logical
slot per database.
During the check, we identify the last decodable WAL record's LSN to
report any slots with unconsumed records, consistent with the existing
error reporting behavior. Additionally, the maximum
confirmed_flush_lsn among all logical slots on the database is used as
an early scan cutoff; finding a decodable WAL record beyond this point
implies that no slot has caught up.
Performance testing demonstrated that the execution time remains
stable regardless of the number of slots in the database.
Note that we do not distinguish slots based on their output plugins. A
hypothetical plugin might use a replication origin filter that filters
out changes from a specific origin. In such cases, we might get a
false positive (erroneously considering a slot caught up). However,
this is safe from a data integrity standpoint, such scenarios are
rare, and the impact of a false positive is minimal.
This optimization is applied only when the old cluster is version 19
or later.
Bump catalog version.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBZ0LAcw1OHGEKdW7S5TRJaURdhEk3CLAW69_siqfqyAg@mail.gmail.com
We can immediately make use of it in pg_signal_backend(), which
previously fetched the process type from the backend status array with
pgstat_get_backend_type_by_proc_number(). That was correct but felt a
little questionable to me: backend status should be for observability
purposes only, not for permission checks.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/b77e4962-a64a-43db-81a1-580444b3e8f5@iki.fi
Currently, when the argument of copyObject() is const-qualified, the
return type is also, because the use of typeof carries over all the
qualifiers. This is incorrect, since the point of copyObject() is to
make a copy to mutate. But apparently no code ran into it.
The new implementation uses typeof_unqual, which drops the qualifiers,
making this work correctly.
typeof_unqual is standardized in C23, but all recent versions of all
the usual compilers support it even in non-C23 mode, at least as
__typeof_unqual__. We add a configure/meson test for typeof_unqual
and __typeof_unqual__ and use it if it's available, else we use the
existing fallback of just returning void *.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
I don't understand how to reach errdetail_abort() with
MyProc->recoveryConflictPending set. If a recovery conflict signal is
received, ProcessRecoveryConflictInterrupt() raises an ERROR or FATAL
error to cancel the query or connection, and abort processing clears
the flag. The error message from ProcessRecoveryConflictInterrupt() is
very clear that the query or connection was terminated because of
recovery conflict.
The only way to reach it AFAICS is with a race condition, if the
startup process sends a recovery conflict signal when the transaction
has just entered aborted state for some other reason. And in that case
the detail would be misleading, as the transaction was already aborted
for some other reason, not because of the recovery conflict.
errdetail_abort() was the only user of the recoveryConflictPending
flag in PGPROC, so we can remove that and all the related code too.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
When using ALTER TABLE ... ADD CONSTRAINT to add a not-null constraint
with an explicit name, we have to ensure that if the column is already
marked NOT NULL, the provided name matches the existing constraint name.
Failing to do so could lead to confusion regarding which constraint
object actually enforces the rule.
This patch adds a check to throw an error if the user tries to add a
named not-null constraint to a column that already has one with a
different name.
Reported-by: yanliang lei <msdnchina@163.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-bu: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org