The previous code could allocate pgpa_sj_unique_rel objects in a context
that had too short a lifespan. Fix by allocating them (and any
associated List-related allocations) in the same context as the
pgpa_planner_state to which they are attached. We also need to copy
uniquerel->relids, because the associated RelOptInfo may also be
allocated within a short-lived context.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/a6e6d603-e847-44dc-acd5-879fb4570062@gmail.com
The Makefile failed to set HEADERS_pg_plan_advice, so the header wasn't
installed. Fixing that reveals another problem: since this is just a
loadable module, not an extension, the header file is installed into
$(includedir_server)/contrib rather than $(includedir_server)/extension.
While we have no existing cases of installing header files there, it
appears to be the intent of pgxs.mk. However, this is inconsistent with
meson.build, which was using dir_include_extension. Changing that to
dir_include_server / 'contrib' makes the install locations consistent
across the two builds.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: http://postgr.es/m/CAN4CZFP6NOjv__4Mx+iQD8StdpbHvzDAatEQn2n15UKJ=MySSQ@mail.gmail.com
TOK_IDENT allows only non-keywords; identifier should be used
any place where either keywords or non-keywords should be accepted.
Hence, without this commit, any string that happens to be a keyword
can't be used as a partition schema, partition name, or plan name,
which is incorrect.
Author: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CAP53PkzKeD=t90OfeMsniYrcRe2THQbUx3g6wV17Y=ZtiwmWTQ@mail.gmail.com
Remove a bunch of #include lines from execnodes.h. Most of these
requier suitable typedefs to be added, so that it still compiles
standalone. In one case, the fix is to move a struct definition to the
one .c file where it is needed.
Also some light clean up in plannodes.h and genam.h, though not as
extensive as in execnodes.h.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql
Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
Commit 2a525cc97e introduced the ON_ERROR = 'set_null' option for COPY,
allowing it to be used with foreign tables backed by file_fdw. However,
unlike ON_ERROR = 'ignore', no regression test was added to verify
this behavior for file_fdw.
This commit adds a regression test to ensure that foreign tables using
file_fdw work correctly with ON_ERROR = 'set_null', improving test coverage.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yi Ding <dingyi_yale@163.com>
Discussion: https://postgr.es/m/CAHGQGwGmPc6aHpA5=WxKreiDePiOEitfOFsW2dSo5m81xWXgRA@mail.gmail.com
As of this commit all TupleDescs must have TupleDescFinalize() called on
them once the TupleDesc is set up and before BlessTupleDesc() is called.
In this commit, TupleDescFinalize() does nothing. This change has only
been separated out from the commit that properly implements this function
to make the change more obvious. Any extension which makes its own
TupleDesc will need to be modified to call the new function.
The follow-up commit which properly implements TupleDescFinalize() will
cause any code which forgets to do this to fail in assert-enabled builds in
BlessTupleDesc(). It may still be worth mentioning this change in the
release notes so that extension authors update their code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
This commit plugs into pgstattuple_approx(), the SQL function faster
than pgstattuple() that returns approximate results, the streaming read
APIs. A callback is used to be able to skip all-visible pages via VM
lookup, to match with the logic prior to this commit.
Under test conditions similar to 6c228755ad (some dm_delay and
debug_io_direct=data), this can substantially improve the execution time
of the function, particularly for large relations.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
Since commit 5883ff30b0, some compilers have been warning that the
rtekind variable in unique_nonjoin_rtekind() may be used
uninitialized. There doesn't appear to be any actual risk, so
let's just initialize it to something to silence the compiler
warnings.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sieVNfniCKMDdDjuXGd1OuzMQfTS5%3D9vX3sa-iiujKUA%40mail.gmail.com
The logic of xslt_process() has never considered the fact that
xsltSaveResultToString() would return NULL for an empty string (the
upstream code has always done so, with a string length of 0). This
would cause memcpy() to be called with a NULL pointer, something
forbidden by POSIX.
Like 46ab07ffda and similar fixes, this is backpatched down to all the
supported branches, with a test case to cover this scenario. An empty
string has been always returned in xml2 in this case, based on the
history of the module, so this is an old issue.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/c516a0d9-4406-47e3-9087-5ca5176ebcf9@gmail.com
Backpatch-through: 14
This commit replaces the synchronous ReadBufferExtended() loops with the
streaming read routines, affecting pgstatindex() (for btree) and
pgstathashindex() (for hash indexes).
Under test conditions similar to 6c228755ad (some dm_delay and
debug_io_direct=data), this can result in nice runtime and IO gains.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
Avoid allocating memory for an nbtree descent stack during index scans.
We only require a descent stack during inserts, when it is used to
determine where to insert a new pivot tuple/downlink into the target
leaf page's parent page in the event of a page split. (Page deletion's
first phase also performs a _bt_search that requires a descent stack.)
This optimization improves performance by minimizing palloc churn. It
speeds up index scans that call _bt_search frequently/descend the index
many times, especially when the cost of scanning the index dominates
(e.g., with index-only skip scans). Testing has shown that the
underlying issue causes performance problems for an upcoming patch that
will replace btgettuple with a new btgetbatch interface to enable I/O
prefetching.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-Wzmy7NMba9k8m_VZ-XNDZJEUQBU8TeLEeL960-rAKb-+tQ@mail.gmail.com
Provide a facility that (1) can be used to stabilize certain plan choices
so that the planner cannot reverse course without authorization and
(2) can be used by knowledgeable users to insist on plan choices contrary
to what the planner believes best. In both cases, terrible outcomes are
possible: users should think twice and perhaps three times before
constraining the planner's ability to do as it thinks best; nevertheless,
there are problems that are much more easily solved with these facilities
than without them.
This patch takes the approach of analyzing a finished plan to produce
textual output, which we call "plan advice", that describes key
decisions made during plan; if that plan advice is provided during
future planning cycles, it will force those key decisions to be made in
the same way. Not all planner decisions can be controlled using advice;
for example, decisions about how to perform aggregation are currently
out of scope, as is choice of sort order. Plan advice can also be edited
by the user, or even written from scratch in simple cases, making it
possible to generate outcomes that the planner would not have produced.
Partial advice can be provided to control some planner outcomes but not
others.
Currently, plan advice is focused only on specific outcomes, such as
the choice to use a sequential scan for a particular relation, and not
on estimates that might contribute to those outcomes, such as a
possibly-incorrect selectivity estimate. While it would be useful to
users to be able to provide plan advice that affects selectivity
estimates or other aspects of costing, that is out of scope for this
commit.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Dian Fay <di@nmfay.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
This commit replaces the synchronous ReadBufferExtended() loops done in
blbulkdelete() and blvacuumcleanup() with the streaming read equivalent,
to improve I/O efficiency during bloom index vacuum cleanup operations.
Under the same test conditions as 6c228755ad, the runtime is proving
to gain around 30% better, with most the benefits coming from a large
reduction of the IO operation based on the stats retrieved in the
scenarios run.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
On platforms where we can read or write the whole LSN atomically, we do
not need to lock the buffer header to prevent torn LSNs. We can do this
only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
pd_lsn field is properly aligned.
For historical reasons the PageXLogRecPtr was defined as a struct with
two uint32 fields. This replaces it with a single uint64 value, to make
the intent clearer. To prevent issues with weak typedefs the value is
still wrapped in a struct.
This also adjusts heapfuncs() in pageinspect, to ensure proper alignment
when reading the LSN from a page on alignment-sensitive hardware.
Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
Peter Geoghegan. Minor tweaks by me.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/b6610c3b-3f59-465a-bdbb-8e9259f0abc4@proxel.se
This commit replaces the per-page buffer read look in blgetbitmap() with
a reading stream, to improve scan efficiency, particularly useful for
large bloom indexes. Some benchmarking with a large number of rows has
shown a very nice improvement in terms of runtime and IO read reduction
with test cases up to 10M rows for a bloom index scan.
For the io_uring method, The author has reported a 3x in runtime with
io_uring while I was at close to a 7x. For the worker method with 3
workers, the author has reported better numbers than myself in runtime,
with the reduction in IO stats being appealing for all the cases
measured.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
Allow CREATE SUBSCRIPTION to accept a foreign server using the SERVER
clause instead of a raw connection string using the CONNECTION clause.
* Enables a user with sufficient privileges to create a subscription
using a foreign server by name without specifying the connection
details.
* Integrates with user mappings (and other FDW infrastructure) using
the subscription owner.
* Provides a layer of indirection to manage multiple subscriptions
to the same remote server more easily.
Also add CREATE FOREIGN DATA WRAPPER ... CONNECTION clause to specify
a connection_function. To be eligible for a subscription, the foreign
server's foreign data wrapper must specify a connection_function.
Add connection_function support to postgres_fdw, and bump postgres_fdw
version to 1.3.
Bump catversion.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/61831790a0a937038f78ce09f8dd4cef7de7456a.camel@j-davis.com
wait_event.h itself includes wait_event_types.h, which is a generated
file, so it's nice that we can avoid compiling >10% of the tree just
because that file is regenerated.
To avoid breaking too many third-party modules, we now #include
utils/wait_classes.h in storage/latch.h. Then, the very common case
of doing
WaitLatch(..., PG_WAIT_EXTENSION)
continues to work by including just storage/latch.h. (I didn't try to
determine how many modules would actually break if we don't do this, but
this seems a convenient and low-impact measure.)
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202602181214.gcmhx2vhlxzp@alvherre.pgsql
We hadn't noticed this violation of -Wshadow=compatible-local
because this function isn't compiled without -DTRGM_REGEXP_DEBUG.
As long as we have to clean it up, let's do so by converting all
this function's loops to use C99 loop-local control variables.
Reported-by: Sergei Kornilov <sk@zsrv.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3009911772478436@08341ecb-668d-43a9-af4d-b45f00c72521
Presently, the GUC check hook for basic_archive.archive_directory
checks that the specified directory exists. Consequently, if the
directory does not exist at server startup, archiving will be stuck
indefinitely, even if it appears later. To fix, remove this check
from the hook so that archiving will resume automatically once the
directory is present. basic_archive must already be prepared to
deal with the directory disappearing at any time, so no additional
special handling is required.
Reported-by: Олег Самойлов <splarv@ya.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/73271769675212%40mail.yandex.ru
Backpatch-through: 15
This commit updates the frontend tools (src/bin/, contrib/ and
src/test/) to use the memory allocation variants based on
pg_malloc_object() and pg_malloc_array() in various code paths. This
does not cover all the allocations, but a good chunk of them.
Like all the changes of this kind (31d3847a37, etc.), this should
encourage any future code to use this new style.
Author: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/cfb645da-6b3a-4f22-9bcc-5bc46b0e9c61@proxel.se
Commit 84d5efa7e3 missed some multibyte issues caused by short-circuit
logic in the callers. The callers assumed that if the predicate string
is longer than the label string, then it couldn't possibly be a match,
but it can be when using case-insensitive matching (LVAR_INCASE) if
casefolding changes the byte length.
Fix by refactoring to get rid of the short-circuit logic as well as
the function pointer, and consolidate the logic in a replacement
function ltree_label_match().
Discussion: https://postgr.es/m/02c6ef6cf56a5013ede61ad03c7a26affd27d449.camel@j-davis.com
Backpatch-through: 14
This commit includes a batch of fixes for various minor typos and
grammar mistakes, that have been proposed to the hackers mailing list
since the beginning of January.
Similar batches are planned on a bi-monthly basis depending on the
amount received, with the next one for the end of April.
If the 'bounds' array needs to be expanded, because the input contains
more trigrams than the initial guess, the code didn't return the
reallocated array correctly to the caller. That could lead to a crash
in the rare case that the input string becomes longer when it's
lower-cased. The only known instance of that is when an ICU locale is
used with certain single-byte encodings. This was an oversight in
commit 00896ddaf4.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Backpatch-through: 18
Instead of using comments to mark fallthrough switch cases, use the
fallthrough attribute. This will (in the future, not here) allow
supporting other compilers besides gcc. The commenting convention is
only supported by gcc, the attribute is supported by clang, and in the
fullness of time the C23 standard attribute would allow supporting
other compilers as well.
Right now, we package the attribute into a macro called
pg_fallthrough. This commit defines that macro and replaces the
existing comments with that macro invocation.
We also raise the level of the gcc -Wimplicit-fallthrough= option from
3 to 5 to enforce the use of the attribute.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org
The main purpose of this change is to allow an ABI checker to understand
when the list of SysCacheIdentifier changes, by switching all the
routine declarations that relied on a signed integer for a syscache ID
to this new type. This is going to be useful in the long-term for
versions newer than v19 so as we will be able to check when the list of
values in SysCacheIdentifier is updated in a non-ABI compliant fashion.
Most of the changes of this commit are due to the new definition of
SyscacheCallbackFunction, where a SysCacheIdentifier is now required for
the syscache ID. It is a mechanical change, still slightly invasive.
There are more areas in the tree that could be improved with an ABI
checker in mind; this takes care of only one area.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/289125.1770913057@sss.pgh.pa.us
The receive function of hstore was not able to handle correctly
duplicate key values when a new duplicate links to a NULL value, where a
pfree() could be attempted on a NULL pointer, crashing due to a pointer
dereference.
This problem would happen for a COPY BINARY, when stacking values like
that:
aa => 5
aa => null
The second key/value pair is discarded and pfree() calls are attempted
on its key and its value, leading to a pointer dereference for the value
part as the value is NULL. The first key/value pair takes priority when
a duplicate is found.
Per offline report.
Reported-by: "Anemone" <vergissmeinnichtzh@gmail.com>
Reported-by: "A1ex" <alex000young@gmail.com>
Backpatch-through: 14
The error message added in 379695d3cc referred to the public key being
too long. This is confusing as it is in fact the session key included
in a PGP message which is too long. This is harmless, but let's be
precise about what is wrong.
Per offline report.
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Backpatch-through: 14
This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which
returns the pre-existing rows when conflicts are detected. The INSERT
statement must have a RETURNING clause, when DO SELECT is specified.
The optional FOR UPDATE/SHARE clause allows the rows to be locked
before they are are returned. As with a DO UPDATE conflict action, an
optional WHERE clause may be used to prevent rows from being selected
for return (but as with a DO UPDATE action, rows filtered out by the
WHERE clause are still locked).
Bumps catversion as stored rules change.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Marko Tiikkaja <marko@joh.to>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark
Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark
Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se
Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com
The buildfarm occasionally shows a variant row order in the output
of this UPDATE ... RETURNING, implying that the preceding INSERT
dropped one of the rows into some free space within the table rather
than appending them all at the end. It's not entirely clear why that
happens some times and not other times, but we have established that
it's affected by concurrent activity in other databases of the
cluster. In any case, the behavior is not wrong; the test is at fault
for presuming that a seqscan will give deterministic row ordering.
Add an ORDER BY atop the update to stop the buildfarm noise.
The buildfarm seems to have shown this only in v18 and master
branches, but just in case the cause is older, back-patch to
all supported branches.
Discussion: https://postgr.es/m/3866274.1770743162@sss.pgh.pa.us
Backpatch-through: 14
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.
To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
This commit changes the definition of varlena to a typedef, so as it
becomes possible to remove "struct" markers from various declarations in
the code base. Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.
This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
This cleanup has come up in a different path set that played with
TOAST and varatt.h, independently worth doing on its own.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.
To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable. So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
Also, update pg_overexplain to display these details.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.
Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.
To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
The IS DISTINCT FROM construct compares values acting as though NULL
were a normal data value, rather than "unknown". Semantically, "x IS
DISTINCT FROM y" yields true if the values differ or if exactly one is
NULL, and false if they are equal or both NULL. Unlike ordinary
comparison operators, it never returns NULL.
Previously, the planner only simplified this construct if all inputs
were constants, folding it to a constant boolean result. This patch
extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable. For cases
where both inputs are guaranteed not to be NULL, the expression
becomes semantically equivalent to "x <> y", and the DistinctExpr is
converted into an inequality OpExpr.
This transformation provides several benefits. It converts the
comparison into a standard operator, allowing the use of partial
indexes and constraint exclusion. Furthermore, if the clause is
negated (i.e., "IS NOT DISTINCT FROM"), it simplifies to an equality
operator. This enables the planner to generate better plans using
index scans, merge joins, hash joins, and EC-based qual deduction.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
pgp_sym_decrypt() and pgp_pub_decrypt() will raise such errors, while
bytea variants will not. The existing "dat3" test decrypted to non-UTF8
text, so switch that query to bytea.
The long-term intent is for type "text" to always be valid in the
database encoding. pgcrypto has long been known as a source of
exceptions to that intent, but a report about exploiting invalid values
of type "text" brought this module to the forefront. This particular
exception is straightforward to fix, with reasonable effect on user
queries. Back-patch to v14 (all supported versions).
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Author: shihao zhong <zhong950419@gmail.com>
Reviewed-by: cary huang <hcary328@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqRZyo0gLxPJqUsDqtWYBbgM14betsHiLRPj9mo2=z9VvA@mail.gmail.com
Backpatch-through: 14
Security: CVE-2026-2006
A security patch changed them today, so close the coverage gap now.
Test that buffer overrun is avoided when pg_mblen*() requires more
than the number of bytes remaining.
This does not cover the calls in dict_thesaurus.c or in dict_synonym.c.
That code is straightforward. To change that code's input, one must
have access to modify installed OS files, so low-privilege users are not
a threat. Testing this would likewise require changing installed
share/postgresql/tsearch_data, which was enough of an obstacle to not
bother.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix, by converting all callers to one of the
following:
1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.
2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.
3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:
latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
lower
-----------
i\x1A\x1A
(1 row)
In this example, lower-casing expands the single input character into
three characters.
The generate_trgm_only() function relied on that assumption in two
ways:
- It used "slen * pg_database_encoding_max_length() + 4" to allocate
the buffer to hold the lowercased and blank-padded string. That
formula accounts for expansion if the lower-case characters are
longer (in bytes) than the originals, but it's still not enough if
the lower-cased string contains more *characters* than the original.
- Its callers sized the output array to hold the trigrams extracted
from the input string with the formula "(slen / 2 + 1) * 3", where
'slen' is the input string length in bytes. (The formula was
generous to account for the possibility that RPADDING was set to 2.)
That's also not enough if one input byte can turn into multiple
characters.
To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.
Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
The function assumed that if charlen == bytelen, there are no
multibyte characters in the string. That's sensible, but the callers
were a little careless in how they calculated the lengths. The callers
converted the string to lowercase before calling make_trigram(), and
the 'charlen' value was calculated *before* the conversion to
lowercase while 'bytelen' was calculated after the conversion. If the
lowercased string had a different number of characters than the
original, make_trigram() might incorrectly apply the fastpath and
treat all the bytes as single-byte characters, or fail to apply the
fastpath (which is harmless), or it might hit the "Assert(bytelen ==
charlen)" assertion. I'm not aware of any locale / character
combinations where you could hit that assertion in practice,
i.e. where a string converted to lowercase would have fewer characters
than the original, but it seems best to avoid making that assumption.
To fix, remove the 'charlen' argument. To keep the performance when
there are no multibyte characters, always try the fast path first, but
check the input for multibyte characters as we go. The check on each
byte adds some overhead, but it's close enough. And to compensate, the
find_word() function no longer needs to count the characters.
This fixes one small bug in make_trigrams(): in the multibyte
codepath, it peeked at the byte just after the end of the input
string. When compiled with IGNORECASE, that was harmless because there
is always a NUL byte or blank after the input string. But with
!IGNORECASE, the call from generate_wildcard_trgm() doesn't guarantee
that.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.
Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
pgp_pub_decrypt_bytea() was missing a safeguard for the session key
length read from the message data, that can be given in input of
pgp_pub_decrypt_bytea(). This can result in the possibility of a buffer
overflow for the session key data, when the length specified is longer
than PGP_MAX_KEY, which is the maximum size of the buffer where the
session data is copied to.
A script able to rebuild the message and key data that can trigger the
overflow is included in this commit, based on some contents provided by
the reporter, heavily editted by me. A SQL test is added, based on the
data generated by the script.
Reported-by: Team Xint Code as part of zeroday.cloud
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2005
Backpatch-through: 14