Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
In 0b96e734c5 I (Andres) relied on page_collect_tuples() being called only
with an MVCC snapshot, and added assertions to that end, but did not realize
that IsMVCCSnapshot() allows both proper MVCC snapshots and historical
snapshots, which behave quite similarly to MVCC snapshots.
Unfortunately that can lead to incorrect visibility results during logical
decoding, as a historical snapshot is interpreted as a plain MVCC
snapshot. The only reason this wasn't noticed earlier is that it's hard to
reach as most of the time there are no sequential scans during logical
decoding.
To fix the bug and avoid issues like this in the future, split
IsMVCCSnapshot() into IsMVCCSnapshot() and IsMVCCLikeSnapshot(), where now
only the latter includes historic snapshots.
One effect of this is that during logical decoding no page-at-a-time snapshots
are used, as otherwise runtime branches to handle historic snapshots would be
needed in some performance critical paths. Given how uncommon sequential scans
are during logical decoding, that seems acceptable.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/61812.1770637345@localhost
Add an optional bool *has_volatile output parameter to
DomainHasConstraints(). When non-NULL, the function checks whether any
CHECK constraint contains a volatile expression. Callers that don't
need this information pass NULL and get the same behavior as before.
This is needed by a subsequent commit that enables the fast default
optimization for domains with non-volatile constraints: we can safely
evaluate such constraints once at ALTER TABLE time, but volatile
constraints require a full table rewrite.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io>
Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com
The planner has historically been unable to convert "x NOT IN (SELECT
y ...)" sublinks into anti-joins. This is because standard SQL
semantics for NOT IN require that if the comparison "x = y" returns
NULL, the "NOT IN" expression evaluates to NULL (effectively false),
causing the row to be discarded. In contrast, an anti-join preserves
the row if no match is found. Due to this semantic mismatch regarding
NULL handling, the conversion was previously considered unsafe.
However, if we can prove that neither side of the comparison can yield
NULL values, and further that the operator itself cannot return NULL
for non-null inputs, the behavior of NOT IN and anti-join becomes
identical. Enabling this conversion allows the planner to treat the
sublink as a first-class relation rather than an opaque SubPlan
filter. This unlocks global join ordering optimization and permits
the selection of the most efficient join algorithm based on cost,
often yielding significant performance improvements for large
datasets.
This patch verifies that neither side of the comparison can be NULL
and that the operator is safe regarding NULL results before performing
the conversion.
To verify operator safety, we require that the operator be a member of
a B-tree or Hash operator family. This serves as a proxy for standard
boolean behavior, ensuring the operator does not return NULL on valid
non-null inputs, as doing so would break index integrity.
For operand non-nullability, this patch makes use of several existing
mechanisms. It leverages the outer-join-aware-Var infrastructure to
verify that a Var does not come from the nullable side of an outer
join, and consults the NOT-NULL-attnums hash table to efficiently
verify schema-level NOT NULL constraints. Additionally, it employs
find_nonnullable_vars to identify Vars forced non-nullable by qual
clauses, and expr_is_nonnullable to deduce non-nullability for other
expression types.
The logic for verifying the non-nullability of the subquery outputs
was adapted from prior work by David Rowley and Tom Lane.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com
REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command. Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.
We retain those older commands, but de-emphasize them in the
documentation, in favor of REPACK; the difference between VACUUM FULL
and CLUSTER (namely, the fact that tuples are written in a specific
ordering) is neatly absorbed as two different modes of REPACK.
This allows us to introduce further functionality in the future that
works regardless of whether an ordering is being applied, such as (and
especially) a concurrent mode.
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
The main purpose of this change is to allow an ABI checker to understand
when the list of SysCacheIdentifier changes, by switching all the
routine declarations that relied on a signed integer for a syscache ID
to this new type. This is going to be useful in the long-term for
versions newer than v19 so as we will be able to check when the list of
values in SysCacheIdentifier is updated in a non-ABI compliant fashion.
Most of the changes of this commit are due to the new definition of
SyscacheCallbackFunction, where a SysCacheIdentifier is now required for
the syscache ID. It is a mechanical change, still slightly invasive.
There are more areas in the tree that could be improved with an ABI
checker in mind; this takes care of only one area.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/289125.1770913057@sss.pgh.pa.us
Radix sort can be much faster than quicksort, but for our purposes it
is limited to sequences of unsigned bytes. To make tuples with other
types amenable to this technique, several features of tuple comparison
must be accounted for, i.e. the sort key must be "normalized":
1. Signedness -- It's possible to modify a signed integer such that
it can be compared as unsigned. For example, a signed char has range
-128 to 127. If we cast that to unsigned char and add 128, the range
of values becomes 0 to 255 while preserving order.
2. Direction -- SQL allows specification of ASC or DESC. The
descending case is easily handled by taking the complement of the
unsigned representation.
3. NULL values -- NULLS FIRST and NULLS LAST must work correctly.
This commmit only handles the case where datum1 is pass-by-value
Datum (possibly abbreviated) that compares like an ordinary
integer. (Abbreviations of values of type "numeric" are a convenient
counterexample.) First, tuples are partitioned by nullness in the
correct NULL ordering. Then the NOT NULL tuples are sorted with radix
sort on datum1. For tiebreaks on subsequent sortkeys (including the
first sort key if abbreviated), we divert to the usual qsort.
ORDER BY queries on pre-warmed buffers are up to 2x faster on high
cardinality inputs with radix sort than the sort specializations added
by commit 697492434, so get rid of them. It's sufficient to fall back
to qsort_tuple() for small arrays. Moderately low cardinality inputs
show more modest improvents. Our qsort is strongly optimized for very
low cardinality inputs, but radix sort is usually equal or very close
in those cases.
The changes to the regression tests are caused by under-specified sort
orders, e.g. "SELECT a, b from mytable order by a;". For unstable
sorts, such as our qsort and this in-place radix sort, there is no
guarantee of the order of "b" within each group of "a".
The implementation is taken from ska_byte_sort() (Boost licensed),
which is similar to American flag sort (an in-place radix sort) with
modifications to make it better suited for modern pipelined CPUs.
The technique of normalization described above can also be extended
to the case of multiple keys. That is left for future work (Thanks
to Peter Geoghegan for the suggestion to look into this area).
Reviewed-by: Chengpeng Yan <chengpeng_yan@outlook.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com
* Remove an unused variable
* Use "default log level" consistently (instead of "generic")
* Keep the process types in alphabetical order (missed one place in the
SGML docs)
* Since log_min_messages type was changed from enum to string, it
is a good idea to add single quotes when printing it out. Otherwise
it fails if the user copies and pastes from the SHOW output to SET,
except in the simplest case. Using single quotes reduces confusion.
* Use lowercase string for the burned-in default value, to keep the same
output as previous versions.
Author: Euler Taveira <euler@eulerto.com>
Author: Man Zeng <zengman@halodbtech.com>
Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202602091250.genyflm2d5dw@alvherre.pgsql
This commit changes the definition of varlena to a typedef, so as it
becomes possible to remove "struct" markers from various declarations in
the code base. Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.
This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
This cleanup has come up in a different path set that played with
TOAST and varatt.h, independently worth doing on its own.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
Change log_min_messages from being a single element to a comma-separated
list of type:level elements, with 'type' representing a process type,
and 'level' being a log level to use for that type of process. The list
must also have a freestanding level specification which is used for
process types not listed, which convenientely makes the whole thing
backwards-compatible.
Some choices made here could be contested; for instance, we use the
process type `backend` to affect regular backends as well as dead-end
backends and the standalone backend, and `autovacuum` means both the
launcher and the workers. I think it's largely sensible though, and it
can easily be tweaked if desired.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Tan Yang <332696245@qq.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
We can immediately make use of it in pg_signal_backend(), which
previously fetched the process type from the backend status array with
pgstat_get_backend_type_by_proc_number(). That was correct but felt a
little questionable to me: backend status should be for observability
purposes only, not for permission checks.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/b77e4962-a64a-43db-81a1-580444b3e8f5@iki.fi
Separate att_align_nominal() into two macros, similarly to what
was already done with att_align_datum() and att_align_pointer().
The inner macro att_nominal_alignby() is really just TYPEALIGN(),
while att_align_nominal() retains its previous API by mapping
TYPALIGN_xxx values to numbers of bytes to align to and then
calling att_nominal_alignby(). In support of this, split out
tupdesc.c's logic to do that mapping into a publicly visible
function typalign_to_alignby().
Having done that, we can replace performance-critical uses of
att_align_nominal() with att_nominal_alignby(), where the
typalign_to_alignby() mapping is done just once outside the loop.
In most places I settled for doing typalign_to_alignby() once
per function. We could in many places pass the alignby value
in from the caller if we wanted to change function APIs for this
purpose; but I'm a bit loath to do that, especially for exported
APIs that extensions might call. Replacing a char typalign
argument by a uint8 typalignby argument would be an API change
that compilers would fail to warn about, thus silently breaking
code in hard-to-debug ways. I did revise the APIs of array_iter_setup
and array_iter_next, moving the element type attribute arguments to
the former; if any external code uses those, the argument-count
change will cause visible compile failures.
Performance testing shows that ExecEvalScalarArrayOp is sped up by
about 10% by this change, when using a simple per-element function
such as int8eq. I did not check any of the other loops optimized
here, but it's reasonable to expect similar gains.
Although the motivation for creating this patch was to avoid a
performance loss if we add some more typalign values, it evidently
is worth doing whether that patch lands or not.
Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
The build generates four files based on the wait event contents stored
in wait_event_names.txt:
- wait_event_types.h
- pgstat_wait_event.c
- wait_event_funcs_data.c
- wait_event_types.sgml
The SGML file is generated as part of a documentation build, with its
data stored in doc/src/sgml/ for meson and configure. The three others
are handled differently for meson and configure:
- In configure, all the files are created in src/backend/utils/activity/.
A link to wait_event_types.h is created in src/include/utils/.
- In meson, all the files are created in src/include/utils/.
The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are
then included in respectively wait_event.c and wait_event_funcs.c,
without the "utils/" path.
For configure, this does not present a problem. For meson, this has to
be combined with a trick in src/backend/utils/activity/meson.build,
where include_directories needs to point to include/utils/ to make the
inclusion of the C files work properly, causing builds to pull in
PostgreSQL headers rather than system headers in some build paths, as
src/include/utils/ would take priority.
In order to fix this issue, this commit reworks the way the C/H files
are generated, becoming consistent with guc_tables.inc.c:
- For meson, basically nothing changes. The files are still generated
in src/include/utils/. The trick with include_directories is removed.
- For configure, the files are now generated in src/backend/utils/, with
links in src/include/utils/ pointing to the ones in src/backend/. This
requires extra rules in src/backend/utils/activity/Makefile so as a
make command in this sub-directory is able to work.
- The three files now fall under header-stamp, which is actually simpler
as guc_tables.inc.c does the same.
- wait_event_funcs_data.c and pgstat_wait_event.c are now included with
"utils/" in their path.
This problem has not been an issue in the buildfarm; it has been noted
with AIX and a conflict with float.h. This issue could, however, create
conflicts in the buildfarm depending on the environment with unexpected
headers pulled in, so this fix is backpatched down to where the
generation of the wait-event files has been introduced.
While on it, this commit simplifies wait_event_names.txt regarding the
paths of the files generated, to mention just the names of the files
generated. The paths where the files are generated became incorrect.
The path of the SGML path was wrong.
This change has been tested in the CI, down to v17. Locally, I have run
tests with configure (with and without VPATH), as well as meson, on the
three branches.
Combo oversight in fa88928470 and 1e68e43d3f.
Reported-by: Aditya Kamath <aditya.kamath1@ibm.com>
Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com
Backpatch-through: 17
This fixes cases where a qualifier (const, in all cases here) was
dropped by a cast, but the cast was otherwise necessary or desirable,
so the straightforward fix is to add the qualifier into the cast.
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org
Continuing to support this backwards-compatibility feature has
nontrivial costs; in particular it is potentially a security hazard
if an application somehow gets confused about which setting the
server is using. We changed the default to ON fifteen years ago,
which seems like enough time for applications to have adapted.
Let's remove support for the legacy string syntax.
We should not remove the GUC altogether, since client-side code will
still test it, pg_dump scripts will attempt to set it to ON, etc.
Instead, just prevent it from being set to OFF. There is precedent
for this approach (see commit de66987ad).
This patch does remove the related GUC escape_string_warning, however.
That setting does nothing when standard_conforming_strings is on,
so it's now useless. We could leave it in place as a do-nothing
setting to avoid breaking clients that still set it, if there are any.
But it seems likely that any such client is also trying to turn off
standard_conforming_strings, so it'll need work anyway.
The client-side changes in this patch are pretty minimal, because even
though we are dropping the server's support, most of our clients still
need to be able to talk to older server versions. We could remove
dead client code only once we disclaim compatibility with pre-v19
servers, which is surely years away. One change of note is that
pg_dump/pg_dumpall now set standard_conforming_strings = on in their
source session, rather than accepting the source server's default.
This ensures that literals in view definitions and such will be
printed in a way that's acceptable to v19+. In particular,
pg_upgrade will work transparently even if the source installation has
standard_conforming_strings = off. (However, pg_restore will behave
the same as before if given an archive file containing
standard_conforming_strings = off. Such an archive will not be safely
restorable into v19+, but we shouldn't break the ability to extract
valid data from it for use with an older server.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
From the user's point of view these are just Boolean values; from the
implementation side we can now distinguish an option that hasn't been
set. Reimplement the vacuum_truncate reloption using this type.
This could also be used for reloptions vacuum_index_cleanup and
buffering, but those additionally need a per-option "alias" for the
state where the variable is unset (currently the value "auto").
Author: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3474141.usfYGdeWWP@thinkpad-pgpro
Liujinyang reported the one in binaryheap.c, I then found and analyzed
the rest.
For future patches, we require git archaelogical analysis before we
accept patches of this nature.
Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
Doing this meant that those two headers, which are supposed to be
internal to their corresponding index AMs, were being included pretty
much universally, because tuplesort.h is included by execnodes.h which
is very widely used. Stop that, and fix fallout.
We also change indexing.h to no longer include execnodes.h (tuptable.h
is sufficient), and relscan.h to no longer include buf.h (pointless
since c2fe139c20).
Author: Mario González <gonzalemario@gmail.com>
Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com
Some structs and enums related to parallel query instrumentation had
organically grown scattered across various files, and were causing
header pollution especially through execnodes.h. Create a single file
where they can live together.
This only moves the structs to the new file; cleaning up the pollution
by removing no-longer-necessary cross-header inclusion will be done in
future commits.
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Mario González <gonzalemario@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql
Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com
This change is a cocktail of harmonization of function argument names,
grammar typos, renames for better consistency and unused code (see
ltree). All of these have been spotted by the author.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
Up to now, index amhandlers were expected to produce a new, palloc'd
struct on each call. That requires palloc/pfree overhead, and creates
a risk of memory leaks if the caller fails to pfree, and the time
taken to fill such a large structure isn't nil. Moreover, we were
storing these things in the relcache, eating several hundred bytes for
each cached index. There is not anything in these structs that needs
to vary at runtime, so let's change the definition so that an
amhandler can return a pointer to a "static const" struct of which
there's only one copy per index AM. Mark all the core code's
IndexAmRoutine pointers const so that we catch anyplace that might
still try to change or pfree one.
(This is similar to the way we were already handling TableAmRoutine
structs. This commit does fix one comment that was infelicitously
copied-and-pasted into tableamapi.c.)
This commit needs to be called out in the v19 release notes as an API
change for extension index AMs. An un-updated AM will still work
(as of now, anyway) but it risks memory leaks and will be slower than
necessary.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.
This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
dynamically activated when needed when wal_level is set to
'replica'.
When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.
While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.
Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct.
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
4ba012a8ed defined the "header" (pointer to the stats data) of
from_serialized_data() as a const, even though it is fine (and
expected!) for the callback to modify the shared memory entry when
loading the stats at startup.
While on it, this commit updates the callback to_serialized_data() in
the test module test_custom_stats to make the data extracted from the
"header" parameter a const since it should never be modified: the stats
are written to disk and no modifications are expected in the shared
memory entry.
This clarifies the API contract of these new callbacks.
Reported-By: Peter Eisentraut <peter@eisentraut.org>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/d87a93b0-19c7-4db6-b9c0-d6827e7b2da1@eisentraut.org
ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object when required.
We should consider a better solution in the future, such as decoding
the text to UTF-32 and using u_tolower(). That would be a behavior
change and require additional infrastructure though; so for now, just
avoid the global LC_CTYPE dependency.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Previously, libc's tolower() was always used for lowercasing
identifiers, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
downcasing.
For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().
One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to downcase the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding. It rewrites other comments.
Back-patch to v18, where commit 243e9b40f1
first appeared. Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN
contains no calls to that function.
Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
Cumulative stats kinds gain the capability to write additional per-entry
data when flushing the stats at shutdown, and read this data when
loading back the stats at startup. This can be fit for example in the
case of variable-length data (like normalized query strings), so as it
becomes possible to link the shared memory stats entries to data that is
stored in a different area, like a DSA segment.
Three new optional callbacks are added to PgStat_KindInfo, available to
variable-numbered stats kinds:
* to_serialized_data: writes auxiliary data for an entry.
* from_serialized_data: reads auxiliary data for an entry.
* finish: performs actions after read/write/discard operations. This is
invoked after processing all the entries of a kind, allowing extensions
to close file handles and clean up resources.
Stats kinds have the option to store this data in the existing pgstats
file, but can as well store it in one or more additional files whose
names can be built upon the entry keys. The new serialized callbacks
are called once an entry key is read or written from the main stats
file. A file descriptor to the main pgstats file is available in the
arguments of the callbacks.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
The current list from the buildfarm includes quite a few typedef
names that it used to miss. The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array. In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it. Install the current list and re-run pgindent
to update affected code.
Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.
Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
When relptr.h was added (commit fbc1c12a94), there was no check for
HAVE_TYPEOF, so it used HAVE__BUILTIN_TYPES_COMPATIBLE_P, which
already existed (commit ea473fb2de) and which was thought to cover
approximately the same compilers. But the guarded code can also work
without HAVE__BUILTIN_TYPES_COMPATIBLE_P, and we now have a check for
HAVE_TYPEOF (commit 4cb824699e), so let's fix this up to use the
correct logic.
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN%2BLq341V7QL775DA%40mail.gmail.com
It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Instead of passing "JsonbParseState **" to pushJsonbValue(),
pass a pointer to a JsonbInState, which will contain the
parseState stack pointer as well as other useful fields.
Also, instead of returning a JsonbValue pointer that is often
meaningless/ignored, return the top-level JsonbValue pointer
in the "result" field of the JsonbInState.
This involves a lot of (mostly mechanical) edits, but I think
the results are notationally cleaner and easier to understand.
Certainly the business with sometimes capturing the result of
pushJsonbValue() and sometimes not was bug-prone and incapable of
mechanical verification. In the new arrangement, JsonbInState.result
remains null until we've completed a valid sequence of pushes, so
that an incorrect sequence will result in a null-pointer dereference,
not mistaken use of a partial result.
However, this isn't simply an exercise in prettier notation.
The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext. That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.
I tried to improve the comments in this area too.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
pg_type.typlen says 12 for the size of timetz, but sizeof(TimeTzADT)
will be 16 on most platforms due to alignment padding. Using the
sizeof number is no problem for usages such as palloc'ing a result
datum, but in usages such as datumCopy we really ought to match
what pg_type says. Add a macro TIMETZ_TYPLEN so that we have a
symbolic way to write that rather than hard-coding "12".
I cannot find any place where we've needed this so far, but an
upcoming patch requires it.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2329959.1765047648@sss.pgh.pa.us
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Presently, this view reports NULL for the size of DSAs and dshash
tables because 1) the current backend might not be attached to them
and 2) the registry doesn't save the pointers to the dsa_area or
dshash_table in local memory. Also, the view doesn't show
partially-initialized entries to avoid ambiguity, since those
entries would report a NULL size as well.
This commit introduces a function that looks up the size of a DSA
given its handle (transiently attaching to the control segment if
needed) and teaches pg_dsm_registry_allocations to use it to show
the size of successfully-initialized DSA and dshash entries.
Furthermore, the view now reports partially-initialized entries
with a NULL size.
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan
This commit updates two functions that convert "timestamptz" to
"timestamp", and vice-versa, to use the soft error reporting rather than
a their own logic to do the same. These are now named as follows:
- timestamp2timestamptz_safe()
- timestamptz2timestamp_safe()
These functions were suffixed with "_opt_overflow", previously.
This shaves some code, as it is possible to detect how a timestamp[tz]
overflowed based on the returned value rather than a custom state. It
is optionally possible for the callers of these functions to rely on the
error generated internally by these functions, depending on the error
context.
Similar work has been done in d03668ea05 and 4246a977ba.
Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz
The regex mechanism scans through the first "max_chr" character values
to cache character property ranges (isalpha, etc.). For single-byte
encodings, there's no sense in scanning beyond UCHAR_MAX; but for
UTF-8 it makes sense to cache higher code point values (though not all
of them; only up to MAX_SIMPLE_CHR).
Prior to 5a38104b36, the logic about how many character values to scan
was based on the pg_regex_strategy, which was dependent on the
provider. Commit 5a38104b36 preserved that logic exactly, allowing
different providers to define the "max_chr".
Now, change it to depend only on the encoding and whether
ctype_is_c. For this specific calculation, distinguishing between
providers creates more complexity than it's worth.
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
This commit changes some functions related to the data types date and
timestamp to use the soft error reporting rather than a custom boolean
flag called "overflow", used to let the callers of these functions know
if an overflow happens.
This results in the removal of some boilerplate code, as it is possible
to rely on an error context rather than a custom state, with the
possibility to use the error generated inside the functions updated
here, if necessary.
These functions were suffixed with "_opt_overflow". They are now
renamed to use "_safe" as suffix.
This work is similar to 4246a977ba.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com
The code comment said, "It is expected that this routine will
eventually be replaced with the C99 hypot() function.", so let's do
that now.
This function is tested via the geometry regression test, so if it is
faulty on any platform, it will show up there.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
This commit renames write_chunk and read_chunk to respectively
pgstat_write_chunk() and pgstat_read_chunk(), along with the *_s
convenience macros.
These are made available for plug-ins, so as any code that decides to
write and/or read stats data can rely on a single code path for this
work.
Extracted from a larger patch by the same author.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com