Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.
This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
dynamically activated when needed when wal_level is set to
'replica'.
When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.
While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.
Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct.
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
Correct the referenced location of the RangeTblEntry definition
in the pg_overexplain documentation.
Backpatched to v18, where pg_overexplain was introduced.
Author: Julien Tachoires <julien@tachoires.me>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251218092319.tht64ffmcvzqdz7u@poseidon.home.virt
Backpatch-through: 18
In the wake of commit db6a4a985, remove most use of 'md5' from the
example configuration file. The only remainder is an example exception
for a client that doesn't support SCRAM.
Author: Mikael Gustavsson <mikael.gustavsson@smhi.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/176595607507.978865.11597773194269211255@wrigleys.postgresql.org
Discussion: https://postgr.es/m/4ed268473fdb4cf9b0eced6c8019d353@smhi.se
Backpatch-through: 18
The definition of PGoauthBearerRequest uses a temporary SOCKTYPE macro
to hide the difference between Windows and Berkeley socket handles,
since we don't surface pgsocket in our public API. This macro doesn't
need to escape the header, because implementers will choose the correct
socket type based on their platform, so I #undef'd it immediately after
use.
I didn't namespace that helper, though, so if anyone else needs a
SOCKTYPE macro, libpq-fe.h will now unhelpfully get rid of it. This
doesn't seem too far-fetched, given its proximity to existing POSIX
macro names.
Add a PQ_ prefix to avoid collisions, update and improve the surrounding
documentation, and backpatch.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
Allow pg_createsubscriber to reuse existing publications instead of
failing when they already exist on the publisher.
Previously, pg_createsubscriber would fail if any specified publication
already existed. Now, existing publications are reused as-is with their
current configuration, and non-existing publications are created
automatically with FOR ALL TABLES.
This change provides flexibility when working with mixed scenarios of
existing and new publications. Users should verify that existing
publications have the desired configuration before reusing them, and can
use --dry-run with verbose mode to see which publications will be reused
and which will be created.
Only publications created by pg_createsubscriber are cleaned up during
error cleanup operations. Pre-existing publications are preserved unless
'--clean=publications' is explicitly specified, which drops all
publications.
This feature would be helpful for pub-sub configurations where users want
to subscribe to a subset of tables from the publisher.
Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: tianbing <tian_bing_0531@163.com>
Discussion: https://postgr.es/m/CAHv8Rj%2BsxWutv10WiDEAPZnygaCbuY2RqiLMj2aRMH-H3iZwyA%40mail.gmail.com
Commit 119fc30 moved CompareType to cmptype.h but the mention in
the docs still refered to primnodes.h
Author: Daisuke Higuchi <higuchi.daisuke11@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAEVT6c8guXe5P=L_Un5NUUzCgEgbHnNcP+Y3TV2WbQh-xjiwqA@mail.gmail.com
Backpatch-through: 18
This commit adds a new "void *arg" parameter to
GetNamedDSMSegment() that is passed to the initialization callback
function. This is useful for reusing an initialization callback
function for multiple DSM segments.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com
Previously, pg_sync_replication_slots() would finish without synchronizing
slots that didn't meet requirements, rather than failing outright. This
could leave some failover slots unsynchronized if required catalog rows or
WAL segments were missing or at risk of removal, while the standby
continued removing needed data.
To address this, the function now waits for the primary slot to advance to
a position where all required data is available on the standby before
completing synchronization. It retries cyclically until all failover slots
that existed on the primary at the start of the call are synchronized.
Slots created after the function begins are not included. If the standby
is promoted during this wait, the function exits gracefully and the
temporary slots will be removed.
Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
This new DDL command splits a single partition into several partitions. Just
like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are
created using the createPartitionTable() function with the parent partition
as the template.
This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing. This is why the new DDL command
can't be recommended for large, partitioned tables under high load. However,
this implementation comes in handy in certain cases, even as it is. Also, it
could serve as a foundation for future implementations with less locking and
possibly parallelism.
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
This new DDL command merges several partitions into a single partition of the
target table. The target partition is created using the new
createPartitionTable() function with the parent partition as the template.
This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing. This is why this new DDL
command can't be recommended for large partitioned tables under a high load.
However, this implementation comes in handy in certain cases, even as it is.
Also, it could serve as a foundation for future implementations with less
locking and possibly parallelism.
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
The reference to the test module test_custom_stats should have been
added under the section "Custom Cumulative Statistics", but the section
"Injection Points" has been updated instead, reversing the references
for both test modules.
d52c24b0f8 has removed a paragraph that was correct, and 31280d96a6
has added a paragraph that was incorrect.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4heX926+ZNh63u12gLd9jgauU6yiirKc7xGo1G01PXQ@mail.gmail.com
This new option instructs vacuumdb to print, but not execute, the
VACUUM and ANALYZE commands that would've been sent to the server.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
The new column, started_by, indicates the initiator of the
analyze ('manual' or 'autovacuum'), helping users and monitoring tools
to better understand ANALYZE behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAA5RZ0suoicwxFeK_eDkUrzF7s0BVTaE7M%2BehCpYcCk5wiECpw%40mail.gmail.com
The new columns, mode and started_by, indicate the vacuum
mode ('normal', 'aggressive', or 'failsafe') and the initiator of the
vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'),
respectively. This allows users and monitoring tools to better
understand VACUUM behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com
This eliminates MultiXactOffset wraparound and the 2^32 limit on the
total number of multixid members. Multixids are still limited to 2^31,
but this is a nice improvement because 'members' can grow much faster
than the number of multixids. On such systems, you can now run longer
before hitting hard limits or triggering anti-wraparound vacuums.
Not having to deal with MultiXactOffset wraparound also simplifies the
code and removes some gnarly corner cases.
We no longer need to perform emergency anti-wraparound freezing
because of running out of 'members' space, so the offset stop limit is
gone. But you might still not want 'members' to consume huge amounts
of disk space. For that reason, I kept the logic for lowering vacuum's
multixid freezing cutoff if a large amount of 'members' space is
used. The thresholds for that are roughly the same as the "safe" and
"danger" thresholds used before, 2 billion transactions and 4 billion
transactions. This keeps the behavior for the freeze cutoff roughly
the same as before. It might make sense to make this smarter or
configurable, now that the threshold is only needed to manage disk
usage, but that's left for the future.
Add code to pg_upgrade to convert multitransactions from the old to
the new format, rewriting the pg_multixact SLRU files. Because
pg_upgrade now rewrites the files, we can get rid of some hacks we had
put in place to deal with old bugs and upgraded clusters. Bump catalog
version for the pg_multixact/offsets format change.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
The description of deferrable constraints in create_table.sgml states
that deferrable constraints cannot be used as conflict arbitrators in
an INSERT with an ON CONFLICT DO UPDATE clause, but in fact this
restriction applies to all ON CONFLICT clauses, not just those with DO
UPDATE. Fix this, and while at it, change the word "arbitrators" to
"arbiters", to match the terminology used elsewhere.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWsybvZP3ce8rGcVNx-QHuDOJZDz8y=p1SzqHwjRXyV4Q@mail.gmail.com
Backpatch-through: 14
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().
The following paths are included in this batch, treating all the areas
proposed by the author for the most trivial changes, except src/backend
(by far the largest batch):
src/bin/
src/common/
src/fe_utils/
src/include/
src/pl/
src/test/
src/tutorial/
Similar work has been done in 31d3847a37.
The code compiles the same before and after this commit, with the
following exceptions due to changes in line numbers because some of the
new allocation formulas are shorter:
blkreftable.c
pgfnames.c
pl_exec.c
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Plus a similar fix to the README.
Backpatch as far back as the sgml issue exists. The README issue does
exist in v14, but that seems unlikely to harm anyone.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
This test module acts as a replacement that existed prior to
d52c24b0f8 in the test module injection_points. It uses a more
flexible structure than its ancestor:
- Two libraries are built, one for fixed-sized stats and one for
variable-sized stats.
- No GUCs required. The stats are enabled only if one or both libraries
are loaded with shared_preload_libraries.
- Same kind IDs reserved: 25 (variable-sized) and 26 (fixed-sized)
The goal of this redesign is to be able to easier extend the code
coverage provided by this module for other changes that are currently
under discussion, and injection_points was not suited for these.
Injection points are also now widely used in the tree now, so extending
more the test coverage for custom pgstats in the test module
injection_points would be a riskier long-term move.
The new code is mostly a copy of what existed previously in the test
module injection_points, with the same callbacks defined for fixed-sized
and variable-sized stats, but a simpler overall structure in terms of
the stats counters updated.
The test coverage should remain the same as previously: one TAP test is
used to check data reports, crash recovery and clean restart scenarios.
Tests are added for the manual reset of fixed-sized stats, something
not tested until now.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
The test module injection_points has been used as a landing spot to
provide coverage for the custom pgstats APIs, for both fixed-sized and
variable-sized stats kinds. Some recent work related to pgstats is
proving that this structure makes the implementation of new tests
harder.
This commit removes the code related to pgstats from injection_points,
and an equivalent will be reintroduced as a separate test module in a
follow-up commit. This removal is done in its own commit for clarity.
Using injection_points for this test coverage was perhaps not the best
way to design things, but this was good enough while working on the
first flavor of the custom pgstats APIs. Using a new test module will
make easier the introduction of new tests, and we will not need to worry
about the impact of new changes related to custom pgstats could have
with the internals of injection_points.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
This function reads the lines from a file and filters its contents to
report its head and tail contents. The amount of contents to read from
a file can be tuned by the environment variable PG_TEST_FILE_READ_LINES,
that can be used to override the default of 50 lines. If the file whose
content is read has less lines than two times PG_TEST_FILE_READ_LINES,
the whole file is returned.
This will be used in a follow-up commit to limit the amount of
information reported by some of the TAP tests on failure, where we have
noticed that the contents reported by the buildfarm can be heavily
bloated in some cases, with the head and tail contents of a report being
able to provide enough information to be useful for debugging.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
Commit 76b78721ca introduced two new columns in pg_stat_replication_slots
to improve monitoring of slot synchronization. One of these columns was
named slotsync_skip_at, which is inconsistent with the naming convention
used for similar columns in other system views.
Columns that store timestamps of the most recent event typically use the
'last_' in the column name (e.g., last_autovacuum, checksum_last_failure).
Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern,
making the purpose of the column clearer and improving overall consistency
across the views.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
This problem came in with commit 5ae2087202, which introduced
uniqueness check. Backpatch to 17.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
The majority of cases already used "restartpoint" with just a few
instances of "restart point". Changing the latter spelling to the
former ensures consistency in the user facing documentation. Code
comments are not affected by this since it is not worth the churn
to change anything there.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0F6E38D0-649F-4489-B2C1-43CD937E6636@yesql.se
Presently, this view reports NULL for the size of DSAs and dshash
tables because 1) the current backend might not be attached to them
and 2) the registry doesn't save the pointers to the dsa_area or
dshash_table in local memory. Also, the view doesn't show
partially-initialized entries to avoid ambiguity, since those
entries would report a NULL size as well.
This commit introduces a function that looks up the size of a DSA
given its handle (transiently attaching to the control segment if
needed) and teaches pg_dsm_registry_allocations to use it to show
the size of successfully-initialized DSA and dshash entries.
Furthermore, the view now reports partially-initialized entries
with a NULL size.
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan
Introduce a new column, slotsync_skip_reason, in the pg_replication_slots
view. This column records the reason why the last slot synchronization was
skipped. It is primarily relevant for logical replication slots on standby
servers where the 'synced' field is true. The value is NULL when
synchronization succeeds.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
This commit introduces three new functions for marking shared buffers as
dirty by using the functions introduced in 9660906dbd:
* pg_buffercache_mark_dirty() for one shared buffer.
- pg_buffercache_mark_dirt_relation() for all the shared buffers in a
relation.
* pg_buffercache_mark_dirty_all() for all the shared buffers in pool.
The "_all" and "_relation" flavors are designed to address the
inefficiency of repeatedly calling pg_buffercache_mark_dirty() for each
individual buffer, which can be time-consuming when dealing with with
large shared buffers pool.
These functions are intended as developer tools and are available only
to superusers. There is no need to bump the version of pg_buffercache,
4b203d499c having done this job in this release cycle.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
The documentation for CREATE/ALTER PUBLICATION previously showed:
[ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [ WHERE ( expression ) ] [, ... ]
to indicate that the table/column specification could be repeated.
However, placing [, ... ] directly after a multi-part construct was
misleading and made it unclear which portion was repeatable.
This commit introduces a new term, table_and_columns, to represent:
[ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [ WHERE ( expression ) ]
and updates the synopsis to use:
table_and_columns [, ... ]
which clearly identifies the repeatable element.
Backpatched to v15, where the misleading syntax was introduced.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+PtsyvYL3KmA6C8f0ZpXQ=7FEqQtETVy-BOF+cm9WPvfMQ@mail.gmail.com
Backpatch-through: 15
In v14, bb437f995 added support for scanning for ranges of TIDs using a
dedicated executor node for the purpose. Here, we allow these scans to
be parallelized. The range of blocks to scan is divvied up similarly to
how a Parallel Seq Scans does that, where 'chunks' of blocks are
allocated to each worker and the size of those chunks is slowly reduced
down to 1 block per worker by the time we're nearing the end of the
scan. Doing that means workers finish at roughly the same time.
Allowing TID Range Scans to be parallelized removes the dilemma from the
planner as to whether a Parallel Seq Scan will cost less than a
non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan
(disk costs are not divided by the number of workers). It was possible
the planner could choose the Parallel Seq Scan which would result in
reading additional blocks during execution than the TID Scan would have.
Allowing Parallel TID Range Scans removes the trade-off the planner
makes when choosing between reduced CPU costs due to parallelism vs
additional I/O from the Parallel Seq Scan due to it scanning blocks from
outside of the required TID range. There is also, of course, the
traditional parallelism performance benefits to be gained as well, which
likely doesn't need to be explained here.
Author: Cary Huang <cary.huang@highgo.ca>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
There is no straightforward way to determine if a cluster is running
in EXEC_BACKEND mode or not, which is useful for tests to know. This
adds a GUC debug_exec_backend similar to debug_assertions which will
be true when the server is running in EXEC_BACKEND mode.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5F301096-921A-427D-8EC1-EBAEC2A35082@yesql.se
When running on Windows (or EXEC_BACKEND) the SSL configuration will
be reloaded on each backend start, so the passphrase command will be
reloaded along with it. This implies that passphrase command reload
must be enabled on Windows for connections to work at all. Document
this since it wasn't mentioned explicitly, and will there add markup
for parameter value to match the rest of the docs.
Backpatch to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5F301096-921A-427D-8EC1-EBAEC2A35082@yesql.se
Backpatch-through: 14
This patch adds two new columns to the pg_stat_replication_slots view:
slotsync_skip_count - the total number of times a slotsync operation was
skipped.
slotsync_skip_at - the timestamp of the most recent skip.
These additions provide better visibility into replication slot
synchronization behavior.
A future patch will introduce the slotsync_skip_reason column in
pg_replication_slots to capture the reason for skip.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
ba2a3c2302 has added a way to check if a buffer is spread across
multiple pages with some NUMA information, via a new view
pg_buffercache_numa that depends on pg_buffercache_numa_pages(), a SQL
function. These can only be queried when support for libnuma exists,
generating an error if not.
However, it can be useful to know how shared buffers and OS pages map
when NUMA is not supported or not available. This commit expands the
capabilities around pg_buffercache_numa:
- pg_buffercache_numa_pages() is refactored as an internal function,
able to optionally process NUMA. Its SQL definition prior to this
commit is still around to ensure backward-compatibility with v1.6.
- A SQL function called pg_buffercache_os_pages() is added, able to work
with or without NUMA.
- The view pg_buffercache_numa is redefined to use
pg_buffercache_os_pages().
- A new view is added, called pg_buffercache_os_pages. This ignores
NUMA for its result processing, for a better efficiency.
The implementation is done so as there is no code duplication between
the NUMA and non-NUMA views/functions, relying on one internal function
that does the job for all of them. The module is bumped to v1.7.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z/fFA2heH6lpSLlt@ip-10-97-1-34.eu-west-3.compute.internal
This request allows a support function to replace a function call
appearing in FROM (typically a set-returning function) with an
equivalent SELECT subquery. The subquery will then be subject
to the planner's usual optimizations, potentially allowing a much
better plan to be generated. While the planner has long done this
automatically for simple SQL-language functions, it's now possible
for extensions to do it for functions outside that group.
Notably, this could be useful for functions that are presently
implemented in PL/pgSQL and work by generating and then EXECUTE'ing
a SQL query.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
The existing range_minus function raises an exception when the range is
"split", because then the result can't be represented by a single range.
For example '[0,10)'::int4range - '[4,5)' would be '[0,4)' and '[5,10)'.
This commit adds new set-returning functions so that callers can get
results even in the case of splits. There is no risk of an exception for
multiranges, but a set-returning function lets us handle them the same
way we handle ranges.
Both functions return zero results if the subtraction would give an
empty range/multirange.
The main use-case for these functions is to implement UPDATE/DELETE FOR
PORTION OF, which must compute the application-time of "temporal
leftovers": the part of history in an updated/deleted row that was not
changed. To preserve the untouched history, we will implicitly insert
one record for each result returned by range/multirange_minus_multi.
Using a set-returning function will also let us support user-defined
types for application-time update/delete in the future.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
A set of wording improvements and spelling fixes.
Author: Oleg Sibiryakov <o.sibiryakov@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/e62bedb5-c26f-4d37-b4ed-ce9b55f1e980@postgrespro.ru
Commit 792353f7d5 updated the pg_dump and pg_dumpall documentation to
clarify which statistics are not included in their output. The pg_upgrade
documentation contained a nearly identical description, but it was not updated
at the same time.
This commit updates the pg_upgrade documentation to match those changes.
Backpatch to v18, where commit 792353f7d5 was backpatched to.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Discussion: https://postgr.es/m/CAHGQGwFnfgdGz8aGWVzgFCFwoWQU7KnFFjmxinf4RkQAkzmR+w@mail.gmail.com
Backpatch-through: 18
This patch renames the sync_error_count column to sync_table_error_count
in the pg_stat_subscription_stats view. The new name makes the purpose
explicit now that a separate column exists to track sequence
synchronization errors.
Additionally, the column seq_sync_error_count is renamed to
sync_seq_error_count to maintain a consistent naming pattern, making it
easier for users to group, and query synchronization related counters.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com
The documentation did not previously mention the default values for
the --fsync-interval and --plugin options, even though pg_recvlogical --help
shows them. This omission made it harder for users to understand
the tool's behavior from the documentation alone.
This commit adds the missing default value descriptions for both options
to the pg_recvlogical documentation.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAHGQGwFqssPBjkWMFofGq32e_tANOeWN-cM=6biAP3nnFUXMRw@mail.gmail.com
The existing format of pg_dependencies uses a single-object JSON
structure, with each key value embedding all the knowledge about the
set attributes tracked, like:
{"1 => 5": 1.000000, "5 => 1": 0.423130}
While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object, particularly when
tracking multiple attributes.
The new output format introduced in this commit is a JSON array of
objects, with:
- A key named "degree", with a float value.
- A key named "attributes", with an array of attribute numbers.
- A key named "dependency", with an attribute number.
The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"degree": 1.000000, "attributes": [1], "dependency": 5},
{"degree": 0.423130, "attributes": [5], "dependency": 1}]
This new format will become handy for a follow-up set of changes, so as
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
This format has been suggested by Tomas Vondra. The key names are
defined in the header introduced by 1f927cce44, to ease the
integration of frontend-specific changes that are still under
discussion. (Again a personal note: if anybody comes up with better
name for the keys, of course feel free.)
The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
The existing format of pg_ndistinct uses a single-object JSON structure
where each key is itself a comma-separated list of attnums, like:
{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object.
The new output format introduced in this commit is an array of objects,
with:
- A key named "attributes", that contains an array of attribute numbers.
- A key named "ndistinct", represented as an integer.
The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"ndistinct": 11, "attributes": [3,4]},
{"ndistinct": 11, "attributes": [3,6]},
{"ndistinct": 11, "attributes": [4,6]},
{"ndistinct": 11, "attributes": [3,4,6]}]
This new format will become handy for a follow-up set of changes, so as
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
This format has been suggested by Tomas Vondra. The key names are
defined in a new header, to ease with the integration of
frontend-specific changes that are still under discussion. (Personal
note: I am not specifically wedded to these key names, but if there are
better name suggestions for this release, feel free.)
The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Path expansion might expose characters like spaces which would cause
command failure, so double-quote the examples. While %f doesn't need
quoting since it uses a fixed character set, it is best to be
consistent.
Discussion: https://postgr.es/m/aROPCQCfvKp9Htk4@momjian.us
Backpatch-through: master
This reverts commit 1fd981f053, based on concerns that the logging
improvements do not justify the protocol breakage of dropping an unnamed
portal once its execution has completed.
It seems unlikely that one would try to send an execute or describe
message after the portal has been used, but if they do such
post-completion messages would not be able to process as the previous
versions. Let's revert this change for now so as we keep compatibility
and consider a different solution.
The tests added by 76bba03312 track the pre-1fd981f05369 behavior, and
are still valid.
Discussion: https://postgr.es/m/CA+TgmoYFJyJNQw3RT7veO3M2BWRE9Aw4hprC5rOcawHZti-f8g@mail.gmail.com
Much of the "Replication Slot" chapter applies to physical and logical
slots, but it was sloppy in mentioning mostly physical slots. This
patch clarified which parts of the text apply to which slot types.
This chapter is referenced from the logical slot/subscriber chapter, so
it needs to do double duty.
Backpatch-through: master
Also mention that logical replication slots are created by default when
subscriptions are created. This should clarify the text.
Backpatch-through: master
Previously it was not clear that "physical" replication slots were being
discussed, and that they needed to be created on the primary and not the
standby.
Backpatch-through: master
On the CREATE POLICY page, the "Policies Applied by Command Type"
table was missing MERGE ... THEN DELETE and some of the policies
applied during INSERT ... ON CONFLICT and MERGE. Fix that, and try to
improve readability by listing the various MERGE cases separately,
rather than together with INSERT/UPDATE/DELETE. Mention COPY ... TO
along with SELECT, since it behaves in the same way. In addition,
document which policy violations cause errors to be thrown, and which
just cause rows to be silently ignored.
Also, a paragraph above the table states that INSERT ... ON CONFLICT
DO UPDATE only checks the WITH CHECK expressions of INSERT policies
for rows appended to the relation by the INSERT path, which is
incorrect -- all rows proposed for insertion are checked, regardless
of whether they end up being inserted. Fix that, and also mention that
the same applies to INSERT ... ON CONFLICT DO NOTHING.
In addition, in various other places on that page, clarify how the
different types of policy are applied to different commands, and
whether or not errors are thrown when policy checks do not pass.
Backpatch to all supported versions. Prior to v17, MERGE did not
support RETURNING, and so MERGE ... THEN INSERT would never check new
rows against SELECT policies. Prior to v15, MERGE was not supported at
all.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWqnfeChjK=n1V_dYZT4rt4mnq+ybf9c0qXDYTVMsy8pg@mail.gmail.com
Backpatch-through: 14
Explicitly document that privileges are transferred along with the
ownership. Backpatch to all supported versions since this behavior
has always been present.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Josef Šimánek <josef.simanek@gmail.com>
Reported-by: Gilles Parc <gparc@free.fr>
Discussion: https://postgr.es/m/2023185982.281851219.1646733038464.JavaMail.root@zimbra15-e2.priv.proxad.net
Backpatch-through: 14
Add documentation describing sequence synchronization support in logical
replication. It explains how sequence changes are synchronized from the
publisher to the subscriber, the configuration requirements, and provide
examples illustrating setup and usage.
Additionally, document the pg_get_sequence_data() function, which allows
users to query sequence details on the publisher to determine when to
refresh corresponding sequences on the subscriber.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
The synopsis for the ALTER PUBLICATION ... DROP ... command incorrectly
implied that a column list and WHERE clause could be specified as part of
the publication object. However, these options are not allowed for
DROP operations, making the documentation misleading.
This commit corrects the synopsis to clearly show only the valid forms
of publication objects.
Backpatched to v15, where the incorrect synopsis was introduced.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsPu+47Q7b0o6h1r-qSt90U3zgbAHMHUag5o5E1Lo+=uw@mail.gmail.com
Backpatch-through: 15
The documentation stated that the directory specified by --file
must not exist, but pg_dump does allow for empty directories to
be specified and used.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Discussion: https://postgr.es/m/534AA60D-CF6B-432F-9882-E9737B33D1B7@gmail.com
This commit adds the --continue-on-error option, allowing pgbench clients
to continue running even when SQL statements fail for reasons other than
serialization or deadlock errors. Without this option (by default),
the clients aborts in such cases, which was the only available behavior
previously.
This option is useful for benchmarks using custom scripts that may
raise errors, such as unique constraint violations, where users want
pgbench to complete the run despite individual statement failures.
Author: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/44334231a4d214fac382a69cceb7d9fc@oss.nttdata.com
This commit adds a new column, seq_sync_error_count, to the
pg_stat_subscription_stats view. This counter tracks the number of errors
encountered by the sequence synchronization worker during operation.
Since a single worker handles the synchronization of all sequences, this
value may reflect errors from multiple sequences. This addition improves
observability of sequence synchronization behavior and helps monitor
potential issues during replication.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
The following parameters can only be set at server start because
their context is PGC_POSTMASTER, but this information was missing
or incorrectly documented. This commit adds or corrects
that information for the following parameters:
* debug_io_direct
* dynamic_shared_memory_type
* event_source
* huge_pages
* io_max_combine_limit
* max_notify_queue_pages
* shared_memory_type
* track_commit_timestamp
* wal_decode_buffer_size
Backpatched to all supported branches.
Author: Karina Litskevich <litskevichkarina@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGfPzcin-_6XwPgVbWTOUFVZgHF5g9ROrwLUdCTfjy=0A@mail.gmail.com
Backpatch-through: 13
If these parameters are set without units, the values are interpreted
as blocks. This detail was previously missing from the documentation,
so this commit adds it.
Backpatch to v17 where io_combine_limit was added.
Author: Karina Litskevich <litskevichkarina@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACiT8iZCDkz1bNYQNQyvGhXWJExSnJULRTYT894u4-Ti7Yh6jw@mail.gmail.com
Backpatch-through: 17
Use uppercase SQL keywords consistently throughout the documentation to
ease reading. Also add whitespace in a couple of places where it
improves readability.
Author: Erik Wienhold <ewie@ewie.name>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/82eb512b-8ed2-46be-b311-54ffd26978c4%40ewie.name
This section introduces temporal tables, with a focus on Application
Time (which we support) and only a brief mention of System Time (which
we don't). It covers temporal primary keys, unique constraints, and
temporal foreign keys. We will document temporal update/delete and
periods as we add those features.
This commit also adds glossary entries for temporal table, application
time, and system time.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024@illuminatedcomputing.com
WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.
WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why separate utility command seems appears to be the most robust
way to implement this functionality. It's not possible to implement this as
a function. Previous experience shows that stored procedures also have
limitation in this aspect.
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Previously, unnamed portals were kept until the next Bind message or the
end of the transaction. This could cause temporary files to persist
longer than expected and make logging not reflect the actual SQL
responsible for the temporary file.
This patch changes exec_execute_message() to drop unnamed portals
immediately after execution to completion at the end of an Execute
message, making their removal more aggressive. This forces temporary
file cleanups to happen at the same time as the completion of the portal
execution, with statement logging correctly reflecting to which
statements these temporary files were attached to (see the diffs in the
TAP test updated by this commit for an idea).
The documentation is updated to describe the lifetime of unnamed
portals, and test cases are updated to verify temporary file removal and
proper statement logging after unnamed portal execution. This changes
how unnamed portals are handled in the protocol, hence no backpatch is
done.
Author: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Co-Authored-by: Sami Imseih <samimseih@gmail.com>
Co-Authored-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tTrTUoEr3kDXCuKsvqYGq8OOHiBwoD-dyJocq95uEOTQ%40mail.gmail.com
We have never had a SET syntax that allows setting a GUC_LIST_INPUT
parameter to be an empty list. A locution such as
SET search_path = '';
doesn't mean that; it means setting the GUC to contain a single item
that is an empty string. (For search_path the net effect is much the
same, because search_path ignores invalid schema names and '' must be
invalid.) This is confusing, not least because configuration-file
entries and the set_config() function can easily produce empty-list
values.
We considered making the empty-string syntax do this, but that would
foreclose ever allowing empty-string items to be valid in list GUCs.
While there isn't any obvious use-case for that today, it feels like
the kind of restriction that might hurt someday. Instead, let's
accept the forbidden-up-to-now value NULL and treat that as meaning an
empty list. (An objection to this could be "what if we someday want
to allow NULL as a GUC value?". That seems unlikely though, and even
if we did allow it for scalar GUCs, we could continue to treat it as
meaning an empty list for list GUCs.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Klychkov <andrew.a.klychkov@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/CA+mfrmwsBmYsJayWjc8bJmicxc3phZcHHY=yW5aYe=P-1d_4bg@mail.gmail.com
Several functions in the codebase accept "Datum *" parameters but do
not modify the pointed-to data. These have been updated to take
"const Datum *" instead, improving type safety and making the
interfaces clearer about their intent. This change helps the compiler
catch accidental modifications and better documents immutability of
arguments.
Most of "Datum *" parameters have a pairing "bool *isnull" parameter,
they are constified as well.
No functional behavior is changed by this patch.
Author: Chao Li <lic@highgo.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
The docs for max_protocol_version suggested PQprotocolVersion()
instead of PQfullProtocolVersion() to find out the exact protocol
version. Since PQprotocolVersion() only returns the major protocol
version, that is bad advice.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQSKFxQsYAgr11PhdOr-RtPZEdAXZnHx6U3avLuk3xQaTQ%40mail.gmail.com
The new wal_fpi_bytes counter calculates the total amount of full page
images inserted in WAL records, in bytes. This commit exposes this
information in EXPLAIN (ANALYZE, WAL) alongside the existing counters,
for both the text and JSON/YAML outputs, building upon f9a09aa295.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discusssion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
The new %S substitution shows the current value of search_path.
Note that this only works when connected to Postgres v18 or newer,
since search_path was first marked as GUC_REPORT in commit
28a1121fd9. On older versions that don't report search_path, %S is
replaced with a question mark.
Suggested-by: Lauri Siltanen <lauri.siltanen@gmail.com>
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CANsM767JhTKCRagTaq5Lz52fVwLPVkhSpyD1C%2BOrridGv0SO0A%40mail.gmail.com
This new counter, called "wal_fpi_bytes", tracks the total amount in
bytes of full page images (FPIs) generated in WAL. This data becomes
available globally via pg_stat_wal, and for backend statistics via
pg_stat_get_backend_wal().
Previously, this information could only be retrieved with pg_waldump or
pg_walinspect, which may not be available depending on the environment,
and are expensive to execute. It offers hints about how much FPIs
impact the WAL generated, which could be a large percentage for some
workloads, as well as the effects of wal_compression or page holes.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in
PgStat_WalCounters.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
This commit makes the way WAL segments are handled from the source to
the target server slightly smarter: the copy of the WAL segments is now
skipped if these have been created before the point where source and
target have diverged (the WAL segment where the point of divergence
exists is still copied), because we know that such segments exist on
both the target and source. Note that the on-disk size of the WAL
segments on the source and target need to match. Hence, only the
segments generated after the point of divergence are now copied. A
segment existing on the source but not the target is copied.
Previously, all the WAL segments were just copied in full. This change
can make the rewind operation cheaper in some configurations, especially
for setups where some WAL retention causes many segments to remain on
the source server even after the promotion of a standby used as source
to rewind a previous primary.
A TAP test is added to track these new behaviors. The file map printed
with --debug now includes all the information related to WAL segments,
to be able to track if these are copied or skipped, and the test relies
on the debug output generated.
Author: John Hsu <johnhyvr@gmail.com>
Author: Justin Kwan <justinpkwan@outlook.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/181b4c6fa9c.b8b725681941212.7547232617810891479@viggy28.dev
The project Git server hasn't supported cloning with the Git protocol
in a very long time, but the documentation never got the memo. Remove
the mention of using the Git protocol, and while there wrap a mention
of Git in <productname> tags.
Backpatch down to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABwTF4WMiMb-KT2NRcib5W0C8TQF6URMb+HK9a_=rnZnY8Q42w@mail.gmail.com
Backpatch-through: 13
This patch adds support for a new SQL command:
ALTER SUBSCRIPTION ... REFRESH SEQUENCES
This command updates the sequence entries present in the
pg_subscription_rel catalog table with the INIT state to trigger
resynchronization.
In addition to the new command, the following subscription commands have
been enhanced to automatically refresh sequence mappings:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION
ALTER SUBSCRIPTION ... ADD PUBLICATION
ALTER SUBSCRIPTION ... DROP PUBLICATION
ALTER SUBSCRIPTION ... SET PUBLICATION
These commands will perform the following actions:
Add newly published sequences that are not yet part of the subscription.
Remove sequences that are no longer included in the publication.
This ensures that sequence replication remains aligned with the current
state of the publication on the publisher side.
Note that the actual synchronization of sequence data/values will be
handled in a subsequent patch that introduces a dedicated sequence sync
worker.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
Previously, COPY TO command didn't support directly specifying
partitioned tables so users had to use COPY (SELECT ...) TO variant.
This commit adds direct COPY TO support for partitioned
tables, improving both usability and performance. Performance tests
show it's faster than the COPY (SELECT ...) TO variant as it avoids
the overheads of query processing and sending results to the COPY TO
command.
When used with partitioned tables, COPY TO copies the same rows as
SELECT * FROM table. Row-level security policies of the partitioned
table are applied in the same way as when executing COPY TO on a plain
table.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com
Previously, attempting to use pg_checksums on a cluster with a control
file whose version does not match with what thetool is able to support
would lead to the following error:
pg_checksums: error: pg_control CRC value is incorrect
This is confusing, because it would look like the control file is
corrupted. However, the contents of the control file are correct,
pg_checksums not being able to understand how the past control file is
shaped.
This commit adds a check based on PG_VERSION, using the facility added
by cd0be131ba, using the same error message as some of the other
frontend tools. A note is added in the documentation about the major
version requirement.
Author: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/68f1ff21.170a0220.2c9b5f.4df5@mx.google.com
Improve the documentation of pg_stat_replication to explain when
the backend_xmin column becomes NULL. This happens when
a replication slot is used (the xmin is then shown in pg_replication_slots)
or when hot_standby_feedback is disabled.
Author: Renzo Dani <arons7@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CA+XOKQAMXzskpdUmj2sg03_5fmiXc2Gs0r3TX1_rmcFcqh+=xQ@mail.gmail.com
The log output functionality of log_autovacuum_min_duration applies to
both VACUUM and ANALYZE, so it is not possible to separate the VACUUM
and ANALYZE log output thresholds. Logs are likely to be output only for
VACUUM and not for ANALYZE.
Therefore, we decided to separate the threshold for log output of VACUUM
by autovacuum (log_autovacuum_min_duration) and the threshold for log
output of ANALYZE by autovacuum (log_autoanalyze_min_duration).
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com
There was some confusion around how to adjust the n_distinct estimates
for partitioned tables. Here we try and clarify that
n_distinct_inherited needs to be adjusted rather than n_distinct.
Also fix some slightly misleading text which was talking about table
size rather than table rows, fix a grammatical error, and adjust some
text which indicated that ANALYZE was performing calculations based on
the n_distinct settings. Really it's the query planner that does this
and ANALYZE only stores the overridden n_distinct estimate value in
pg_statistic.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvrL7a-ZytM1SP8Uk9nEw9bR2CPzVb+uP+bcNj=_q-ZmVw@mail.gmail.com
Since protocol version 3.2 the CancelRequest does not have a fixed size
length anymore. The protocol docs still listed the length field to be a
constant number though. This fixes that.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reported-by: Dmitry Igrishin <dmitigr@gmail.com>
Backpatch-through: 18