Commit graph

2512 commits

Author SHA1 Message Date
Robert Haas
7358abcc60 Store information about Append node consolidation in the final plan.
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.

To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
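A minimal sketch of the idea. It assumes a 64-bit mask as a stand-in for Bitmapset and a fixed-size array in place of a List. Only the field names apprelids and child_append_relid_sets come from this commit; SketchAppend and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means RTI i is present. */
typedef uint64_t RelidSet;

#define MAX_SETS 16

/* Simplified Append node; only the two field names follow the commit. */
typedef struct SketchAppend
{
    RelidSet apprelids;                         /* RTIs this node scans */
    RelidSet child_append_relid_sets[MAX_SETS]; /* one set per pulled-up rel */
    int      n_child_sets;
} SketchAppend;

static void
add_set(SketchAppend *parent, RelidSet s)
{
    assert(parent->n_child_sets < MAX_SETS);
    parent->child_append_relid_sets[parent->n_child_sets++] = s;
}

/*
 * Pull up the children of 'child' into 'parent'.  Rather than losing the
 * child's apprelids, keep it as its own set so the grouping survives (one
 * RTI for a partitionwise scan, several for a partitionwise join).  Sets
 * the child had already inherited are carried over as well.
 */
static void
pull_up_child_append(SketchAppend *parent, const SketchAppend *child)
{
    add_set(parent, child->apprelids);
    for (int i = 0; i < child->n_child_sets; i++)
        add_set(parent, child->child_append_relid_sets[i]);
}
```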

This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.

Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-02-10 17:55:59 -05:00
Michael Paquier
9181c870ba Improve type handling of varlena structures
This commit changes the definition of varlena to a typedef, so that it
becomes possible to remove "struct" markers from various declarations in
the code base.  Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.

This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
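For illustration only, here is a simplified C sketch of what the typedef allows. The layout is modelled on the historical struct varlena, although the real header uses FLEXIBLE_ARRAY_MEMBER. The two helper functions are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified: the struct tag and the typedef share the name "varlena". */
typedef struct varlena
{
    char vl_len_[4];    /* length word; encoding details omitted */
    char vl_dat[];      /* flexible array member holding the data */
} varlena;

/* Before: every declaration needed the "struct" keyword. */
static const struct varlena *
identity_old(const struct varlena *v)
{
    return v;
}

/* After: the typedef name alone works, and both spellings are one type. */
static const varlena *
identity_new(const varlena *v)
{
    return v;
}
```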

This cleanup came up in a different patch set that played with
TOAST and varatt.h, but it is worth doing on its own.

Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
2026-02-11 07:33:24 +09:00
Robert Haas
0d4391b265 Store information about elided nodes in the final plan.
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.

To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable.  So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
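Below is a rough sketch of such a side list, assuming a bitmask for the RTIs. ElidedNodeInfo, ElidedList, the enum values and both functions are hypothetical names, not the fields this commit adds:

```c
#include <assert.h>
#include <stdint.h>

typedef enum
{
    ELIDED_SUBQUERY_SCAN,
    ELIDED_APPEND,
    ELIDED_MERGE_APPEND
} ElidedType;

/* Hypothetical record for one plan node that setrefs.c removed. */
typedef struct ElidedNodeInfo
{
    int        plan_node_id;  /* surviving node the removed one belonged to */
    ElidedType elided_type;   /* kind of node that was removed */
    uint64_t   relids;        /* RTIs it would have scanned (bitmask stand-in) */
} ElidedNodeInfo;

#define MAX_ELIDED 32

typedef struct ElidedList
{
    ElidedNodeInfo items[MAX_ELIDED];
    int            n;
} ElidedList;

/* Called when a node is dropped; the executor never reads this list. */
static void
record_elided(ElidedList *list, int child_plan_node_id,
              ElidedType type, uint64_t relids)
{
    assert(list->n < MAX_ELIDED);
    list->items[list->n++] = (ElidedNodeInfo) {child_plan_node_id, type, relids};
}

/* What an EXPLAIN extension might ask: all RTIs elided above one node. */
static uint64_t
elided_relids_for(const ElidedList *list, int plan_node_id)
{
    uint64_t result = 0;

    for (int i = 0; i < list->n; i++)
        if (list->items[i].plan_node_id == plan_node_id)
            result |= list->items[i].relids;
    return result;
}
```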

Also, update pg_overexplain to display these details.

Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-02-10 16:46:05 -05:00
Robert Haas
adbad833f3 Store information about range-table flattening in the final plan.
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.

Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.

To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
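A hedged sketch of how a consumer might map a flattened RTI back to its subquery. It assumes the entries are sorted by rtoffset and that each subquery's RTIs occupy a contiguous range. Only the struct name comes from the text; its field names and local_rti() are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Modelled on the description above; field names are assumptions. */
typedef struct SubPlanRTInfo
{
    const char *plan_name;  /* name of the subplan */
    int         rtoffset;   /* where its range table starts in the flat table */
    bool        dummy;      /* dummy subplan: some RTIs may be missing */
} SubPlanRTInfo;

/*
 * Map a flattened (1-based) RTI to the RTI local to its subquery.  The
 * top-level query has no entry and starts at offset 0, so RTIs before the
 * first entry's range belong to it and *name is left NULL.
 */
static int
local_rti(const SubPlanRTInfo *infos, int ninfos, int flat_rti,
          const char **name)
{
    int offset = 0;

    *name = NULL;
    for (int i = 0; i < ninfos && infos[i].rtoffset < flat_rti; i++)
    {
        offset = infos[i].rtoffset;
        *name = infos[i].plan_name;
    }
    return flat_rti - offset;
}
```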

This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.

Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
2026-02-10 15:33:39 -05:00
Heikki Linnakangas
17f51ea818 Separate RecoveryConflictReasons from procsignals
Share the same PROCSIG_RECOVERY_CONFLICT flag for all recovery
conflict reasons. To distinguish, have a bitmask in PGPROC to indicate
the reason(s).
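One way such a bitmask might be handled, sketched with C11 atomics. The reason bits, SketchProc and both functions are hypothetical, and the actual signal sending is omitted:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reason bits; the real set of reasons is not listed here. */
enum
{
    CONFLICT_REASON_SNAPSHOT   = 1u << 0,
    CONFLICT_REASON_LOCK       = 1u << 1,
    CONFLICT_REASON_TABLESPACE = 1u << 2
};

/* Stand-in for the PGPROC field: several reasons can be pending at once. */
typedef struct SketchProc
{
    _Atomic uint32_t pending_conflicts;
} SketchProc;

/* Sender: OR in the reason, then raise the single shared signal (omitted). */
static void
request_conflict(SketchProc *proc, uint32_t reason)
{
    atomic_fetch_or(&proc->pending_conflicts, reason);
}

/* Receiver: fetch and clear all pending reasons in one atomic step. */
static uint32_t
consume_conflicts(SketchProc *proc)
{
    return atomic_exchange(&proc->pending_conflicts, 0);
}
```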

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
2026-02-10 16:23:08 +02:00
Tom Lane
8ebdf41c26 Harden _int_matchsel() against being attached to the wrong operator.
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)

We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.

Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension.  (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)

This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name.  This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
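A simplified model of the lookup-plus-cache pattern, with a static array standing in for the scan over the extension's objects. All names and OIDs here are made up, and finding the extension that owns the calling function is not modelled:

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical catalog content: types owned by the calling extension. */
typedef struct ExtObject { const char *typname; Oid oid; } ExtObject;

static const ExtObject ext_objects[] = {
    {"some_other_type", 16390}, {"query_int", 16385},
};
static int catalog_scans = 0;   /* counts trips down the expensive path */

/* Expensive path: thumb through every object owned by the extension. */
static Oid
scan_extension_types(const char *typname)
{
    catalog_scans++;
    for (size_t i = 0; i < sizeof(ext_objects) / sizeof(ext_objects[0]); i++)
        if (strcmp(ext_objects[i].typname, typname) == 0)
            return ext_objects[i].oid;
    return InvalidOid;
}

/* Simple cache in front of the scan; assumes names are string literals. */
#define CACHE_SIZE 8
static struct { const char *typname; Oid oid; } cache[CACHE_SIZE];
static int ncached = 0;

static Oid
lookup_extension_type(const char *typname)
{
    for (int i = 0; i < ncached; i++)
        if (strcmp(cache[i].typname, typname) == 0)
            return cache[i].oid;

    Oid oid = scan_extension_types(typname);
    if (oid != InvalidOid && ncached < CACHE_SIZE)
    {
        cache[ncached].typname = typname;
        cache[ncached].oid = oid;
        ncached++;
    }
    return oid;
}

/* The estimator's guard: proceed only if the Const really is query_int. */
static int
const_has_expected_type(Oid const_type)
{
    return const_type == lookup_extension_type("query_int");
}
```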

Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
2026-02-09 10:14:22 -05:00
Heikki Linnakangas
00896ddaf4 Fix buffer overflows in pg_trgm due to lower-casing
The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:

    latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
       lower
    -----------
     i\x1A\x1A
    (1 row)

In this example, lower-casing expands the single input character into
three characters.

The generate_trgm_only() function relied on that assumption in two
ways:

- It used "slen * pg_database_encoding_max_length() + 4" to allocate
  the buffer to hold the lowercased and blank-padded string. That
  formula accounts for expansion if the lower-case characters are
  longer (in bytes) than the originals, but it's still not enough if
  the lower-cased string contains more *characters* than the original.

- Its callers sized the output array to hold the trigrams extracted
  from the input string with the formula "(slen / 2 + 1) * 3", where
  'slen' is the input string length in bytes. (The formula was
  generous to account for the possibility that RPADDING was set to 2.)
  That's also not enough if one input byte can turn into multiple
  characters.

To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.
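A minimal growable array in the same spirit. It assumes trigrams are plain three-byte groups and that allocation failure may simply abort. TrgmArray and both functions are hypothetical; no buffer size is computed from slen up front:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified trigram: three raw bytes. */
typedef struct { char bytes[3]; } trgm;

/* Hypothetical growable array; capacity is never guessed in advance. */
typedef struct
{
    trgm  *items;
    size_t len;
    size_t cap;
} TrgmArray;

static void
trgm_array_push(TrgmArray *a, const char *three)
{
    if (a->len == a->cap)
    {
        size_t newcap = a->cap ? a->cap * 2 : 16;
        trgm  *p = realloc(a->items, newcap * sizeof(trgm));

        if (p == NULL)
            abort();            /* backend code would raise an error instead */
        a->items = p;
        a->cap = newcap;
    }
    memcpy(a->items[a->len++].bytes, three, 3);
}

/* Extract every trigram of an already lower-cased, blank-padded string. */
static void
extract_trigrams(TrgmArray *a, const char *s, size_t n)
{
    for (size_t i = 0; i + 3 <= n; i++)
        trgm_array_push(a, s + i);
}
```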

Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.

Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
2026-02-09 12:08:58 +13:00
Heikki Linnakangas
e2362eb2bd Move shmem allocator's fields from PGShmemHeader to its own struct
For readability. It was a slight modularity violation to have fields
in PGShmemHeader that were only used by the allocator code in
shmem.c. And it was inconsistent that ShmemLock was nevertheless not
stored there. Moving all the allocator-related fields to a separate
struct makes it more consistent and modular, and removes the need to
allocate and pass ShmemLock separately via BackendParameters.

Merge InitShmemAccess() and InitShmemAllocation() into a single
function that initializes the struct when called from postmaster, and
when called from backends in EXEC_BACKEND mode, re-establishes the
global variables. That's similar to all the *ShmemInit() functions
that we have.
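Below is a sketch of the general shape, assuming a bump allocator guarded by a spinlock. It shows the lock living in the same struct as the state it protects. ShmemAllocatorData and its fields are illustrative names, not necessarily those used by shmem.c:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical grouping of allocator state plus its lock. */
typedef struct ShmemAllocatorData
{
    atomic_flag lock;       /* stand-in for ShmemLock, kept with its data */
    size_t      free_offset;
    size_t      end_offset;
} ShmemAllocatorData;

static void
allocator_init(ShmemAllocatorData *a, size_t start, size_t end)
{
    atomic_flag_clear(&a->lock);
    a->free_offset = start;
    a->end_offset = end;
}

/* Bump allocation under the struct's own lock; (size_t) -1 if full. */
static size_t
allocator_alloc(ShmemAllocatorData *a, size_t size)
{
    size_t result = (size_t) -1;

    size = (size + 7) & ~(size_t) 7;        /* 8-byte alignment, assumed */
    while (atomic_flag_test_and_set(&a->lock))
        ;                                   /* spin */
    if (a->end_offset - a->free_offset >= size)
    {
        result = a->free_offset;
        a->free_offset += size;
    }
    atomic_flag_clear(&a->lock);
    return result;
}
```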

Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5uNRB9oT4pdo54qAo025MXFX4MfYrD9K15OCqe-ExnNvg@mail.gmail.com
2026-01-30 18:22:56 +02:00
Masahiko Sawada
8f1e2dfe03 Consolidate replication origin session globals into a single struct.
This commit moves the separate global variables for replication origin
state into a single ReplOriginXactState struct. This groups logically
related variables, which improves code readability and simplifies
state management (e.g., resetting the state) by handling them as a
unit.
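A minimal illustration of resetting grouped state as a unit. The field names and type widths are assumptions; only the ReplOriginXactState and ReplOriginId names appear in these commits:

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t ReplOriginId;   /* width assumed for this sketch */
typedef uint64_t XLogRecPtr;
typedef int64_t  TimestampTz;

/* Hypothetical field names; the commit only names the struct. */
typedef struct ReplOriginXactState
{
    ReplOriginId origin;
    XLogRecPtr   origin_lsn;
    TimestampTz  origin_timestamp;
} ReplOriginXactState;

static ReplOriginXactState repl_origin_state;

/* Resetting becomes a single assignment instead of separate stores. */
static void
reset_repl_origin_state(void)
{
    repl_origin_state = (ReplOriginXactState) {0};
}
```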

Author: Chao Li <lic@highgo.com>
Suggested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com
2026-01-28 12:26:22 -08:00
Masahiko Sawada
1fdbca159e Standardize replication origin naming to use "ReplOrigin".
The replication origin code was using inconsistent naming
conventions. Functions were typically prefixed with 'replorigin',
while typedefs and constants used "RepOrigin".

This commit unifies the naming convention by renaming RepOriginId to
ReplOriginId.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com
2026-01-28 11:03:29 -08:00
Michael Paquier
0e80f3f88d Add pg_restore_extended_stats()
This function closely mirrors its relation and attribute counterparts,
but for extended statistics (i.e. CREATE STATISTICS) objects, making it
possible to restore the statistics of an extended stats object.  Like
the other functions, the goal of this feature is to ease the dump or
upgrade of clusters so that ANALYZE is no longer required after these
operations, with stats loaded directly into the target cluster without
any post-dump/upgrade computation.

The caller of this function needs the following arguments for the
extended stats to restore:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- If the stats are inherited or not.
- One or more extended stats kinds with their data.

This commit only adds support for restoring the extended statistics
kind "n_distinct", building the basic infrastructure for restoring
more extended statistics kinds in follow-up commits, including MCV
and dependencies.

Support for "n_distinct" is made easier in this commit thanks to the
previous work done particularly in commits 1f927cce44 and
44eba8f06e, which added the input function for the type
pg_ndistinct, used as the input data type of this new restore function.

Bump catalog version.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-26 15:08:15 +09:00
Álvaro Herrera
4d6a66f675 Allow Boolean reloptions to have ternary values
From the user's point of view these are just Boolean values; from the
implementation side we can now distinguish an option that hasn't been
set.  Reimplement the vacuum_truncate reloption using this type.

This could also be used for reloptions vacuum_index_cleanup and
buffering, but those additionally need a per-option "alias" for the
state where the variable is unset (currently the value "auto").
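Below is a sketch of how a ternary value separates "unset" from "false". It assumes, hypothetically, that an unset option falls back to a server-wide default. The enum and function names are invented:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; from the user's side only true/false can be set. */
typedef enum
{
    TERNARY_UNSET = -1,
    TERNARY_FALSE = 0,
    TERNARY_TRUE = 1
} ternary_t;

/* Parse a user-supplied value (only a few spellings); 0 on bad input. */
static int
parse_ternary_reloption(const char *value, ternary_t *result)
{
    if (strcmp(value, "true") == 0 || strcmp(value, "on") == 0)
        *result = TERNARY_TRUE;
    else if (strcmp(value, "false") == 0 || strcmp(value, "off") == 0)
        *result = TERNARY_FALSE;
    else
        return 0;
    return 1;
}

/* A caller can now tell "not set on the table" apart from "set to false". */
static int
effective_vacuum_truncate(ternary_t reloption, int server_default)
{
    return reloption == TERNARY_UNSET ? server_default
                                      : reloption == TERNARY_TRUE;
}
```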

Author: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3474141.usfYGdeWWP@thinkpad-pgpro
2026-01-21 20:06:01 +01:00
Tom Lane
282b1cde9d Optimize LISTEN/NOTIFY via shared channel map and direct advancement.
This patch reworks LISTEN/NOTIFY to avoid waking backends that have
no need to process the notification messages we just sent.

The primary change is to create a shared hash table that tracks
which processes are listening to which channels (where a "channel" is
defined by a database OID and channel name).  This allows a notifying
process to accurately determine which listeners are interested,
replacing the previous weak approximation that listeners in other
databases couldn't be interested.

Secondly, if a listener is known not to be interested and is
currently stopped at the old queue head, we avoid waking it at all
and just directly advance its queue pointer past the notifications
we inserted.
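A much-simplified model of both ideas. It uses a linear array instead of a shared hash table and integer queue positions. Every name here is hypothetical, and backends that lag behind the old head are simply left alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;
#define MAX_LISTENERS 8
#define MAX_CHANNELS  16

/* A channel is identified by (database OID, channel name). */
typedef struct ChannelKey { Oid dboid; char name[64]; } ChannelKey;

typedef struct ChannelEntry
{
    ChannelKey key;
    bool       listening[MAX_LISTENERS];   /* indexed by backend slot */
} ChannelEntry;

static ChannelEntry channels[MAX_CHANNELS];
static int  nchannels;
static long queue_pos[MAX_LISTENERS];      /* each backend's read position */

static ChannelEntry *
find_channel(Oid dboid, const char *name, bool create)
{
    for (int i = 0; i < nchannels; i++)
        if (channels[i].key.dboid == dboid &&
            strcmp(channels[i].key.name, name) == 0)
            return &channels[i];
    if (!create || nchannels == MAX_CHANNELS)
        return NULL;
    ChannelEntry *e = &channels[nchannels++];
    memset(e, 0, sizeof(*e));
    e->key.dboid = dboid;
    strncpy(e->key.name, name, sizeof(e->key.name) - 1);
    return e;
}

static void
sketch_listen(int backend, Oid dboid, const char *name)
{
    find_channel(dboid, name, true)->listening[backend] = true;
}

/*
 * After inserting notifications between head_before and head_after:
 * interested backends are returned as a bitmask to signal; uninterested
 * ones parked exactly at head_before are advanced without any wakeup.
 */
static unsigned
sketch_notify(Oid dboid, const char *name, long head_before, long head_after,
              int nbackends)
{
    ChannelEntry *e = find_channel(dboid, name, false);
    unsigned wake = 0;

    for (int b = 0; b < nbackends; b++)
    {
        if (e != NULL && e->listening[b])
            wake |= 1u << b;
        else if (queue_pos[b] == head_before)
            queue_pos[b] = head_after;      /* direct advancement */
    }
    return wake;
}
```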

These changes permit very significant improvements (integer multiples)
in NOTIFY throughput, as well as a noticeable reduction in latency,
when there are many listeners but only a few are interested in any
specific message.  There is no improvement for the simplest case where
every listener reads every message, but any loss seems below the noise
level.

Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
2026-01-15 14:12:15 -05:00
Andres Freund
0b96e734c5 heapam: Add batch mode mvcc check and use it in page mode
There are two reasons for doing so:

1) It is generally faster to perform checks in a batched fashion and making
   sequential scans faster is nice.

2) We would like to stop setting hint bits while pages are being written
   out. The necessary locking becomes visible for page mode scans, if done for
   every tuple. With batching, the overhead can be amortized to only happen
   once per page.

There are substantial further optimization opportunities along these
lines:

- Right now HeapTupleSatisfiesMVCCBatch() simply uses the single-tuple
  HeapTupleSatisfiesMVCC(), relying on the compiler to inline it. We could
  instead write an explicitly optimized version that avoids repeated xid
  tests.

- Introduce batched version of the serializability test

- Introduce batched version of HeapTupleSatisfiesVacuum
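A sketch of the batching shape, using a drastically simplified visibility rule. It ignores in-progress and aborted transactions and hint bits. The types and function names are hypothetical; only HeapTupleSatisfiesMVCCBatch() comes from the text:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Very simplified tuple header and snapshot, for illustration only. */
typedef struct { TransactionId xmin, xmax; } TupleHdr;   /* xmax 0 = live */
typedef struct { TransactionId xmax; } SketchSnapshot;   /* xids >= xmax unseen */

/* Single-tuple check, inlined into the batch loop below. */
static inline bool
satisfies_mvcc(const TupleHdr *t, const SketchSnapshot *s)
{
    if (t->xmin >= s->xmax)
        return false;                            /* inserted after snapshot */
    return t->xmax == 0 || t->xmax >= s->xmax;   /* not deleted, or later */
}

/*
 * Batched variant: check a whole page's tuples in one call and return the
 * indexes of visible ones.  Any per-page work, such as the locking needed
 * before setting hint bits, could be paid once here instead of per tuple.
 */
static int
satisfies_mvcc_batch(const TupleHdr *tuples, int ntuples,
                     const SketchSnapshot *s, int *visible)
{
    int nvisible = 0;

    for (int i = 0; i < ntuples; i++)
        if (satisfies_mvcc(&tuples[i], s))
            visible[nvisible++] = i;
    return nvisible;
}
```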

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
2026-01-12 13:22:04 -05:00
Nathan Bossart
a516b3f00d MSVC: Support building for AArch64.
This commit does the following to get tests passing for
MSVC/AArch64:

* Implements spin_delay() with an ISB instruction (like we do for
gcc/clang on AArch64).

* Sets USE_ARMV8_CRC32C unconditionally.  Vendor-supported versions
of Windows for AArch64 require at least ARMv8.1, which is where CRC
extension support became mandatory.

* Implements S_UNLOCK() with _InterlockedExchange().  The existing
implementation for MSVC uses _ReadWriteBarrier() (a compiler
barrier), which is insufficient for this purpose on non-TSO
architectures.
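The ordering issue behind the S_UNLOCK() change can be sketched with portable C11 atomics instead of MSVC intrinsics. This is an analogue, not the code in s_lock.h, and the function names are hypothetical. A compiler barrier alone does not stop a weakly ordered CPU from making critical-section writes visible after the unlock. A release store does:

```c
#include <assert.h>
#include <stdatomic.h>

typedef atomic_int slock_t;

/* Test-and-set with acquire ordering; returns nonzero if already held. */
static int
tas(slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

static void
s_lock(slock_t *lock)
{
    while (tas(lock))
        ;   /* real code would call a spin-delay hint here (ISB on AArch64) */
}

/* Release ordering: earlier writes are visible before the lock reads free. */
static void
s_unlock(slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```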

There are likely other changes required to take full advantage of
the hardware (e.g., atomics/arch-arm.h, simd.h,
pg_popcount_aarch64.c), but those can be dealt with later.

Author: Niyas Sait <niyas.sait@linaro.org>
Co-authored-by: Greg Burd <greg@burd.me>
Co-authored-by: Dave Cramer <davecramer@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/A6152C7C-F5E3-4958-8F8E-7692D259FF2F%40greg.burd.me
Discussion: https://postgr.es/m/CAFPTBD-74%2BAEuN9n7caJ0YUnW5A0r-KBX8rYoEJWqFPgLKpzdg%40mail.gmail.com
2026-01-07 13:42:57 -06:00
Jeff Davis
c4ff35f104 ICU: use UTF8-optimized case conversion API
Initializes a UCaseMap object once for use across calls, and uses
UTF8-optimized APIs.

Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/5a010b27-8ed9-4739-86fe-1562b07ba564@proxel.se
2026-01-06 14:09:07 -08:00
Michael Paquier
b8cfcb9e00 Fix typos and inconsistencies in code and comments
This change is a mix of harmonized function argument names, grammar
fixes, renames for better consistency, and removal of unused code (see
ltree).  All of these were spotted by the author.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
2026-01-05 09:19:15 +09:00
Bruce Momjian
451c43974f Update copyright for 2026
Backpatch-through: 14
2026-01-01 13:24:10 -05:00
Andrew Dunstan
f3c9e341cd Add paths of extensions to pg_available_extensions
Add a new "location" column to the pg_available_extensions and
pg_available_extension_versions views, exposing the directory where
the extension is located.

The default system location is shown as '$system', the same value
that can be used to configure the extension_control_path GUC.

User-defined locations are only visible to superusers; otherwise
'<insufficient privilege>' is returned as the column value, the same
behaviour that we already use in pg_stat_activity.

I failed to resist the temptation to do a little extra editorializing of
the TAP test script.

Catalog version bumped.

Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Reviewed-By: Rohit Prasad <rohit.prasad@arm.com>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Manni Wood <manni.wood@enterprisedb.com>
Reviewed-By: Euler Taveira <euler@eulerto.com>
Reviewed-By: Quan Zongliang <quanzongliang@yeah.net>
2026-01-01 12:13:59 -05:00
Peter Eisentraut
b63443718a Remove MsgType type
Presumably, the C type MsgType was meant to hold the protocol message
type in the pre-version-3 era, but this was never fully developed even
then, and the name is pretty confusing nowadays.  It has only one
vestigial use for cancel requests that we can get rid of.  Since a
cancel request is indicated by a special protocol version number, we
can use the ProtocolVersion type, which MsgType was based on.
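For reference, here is a self-contained sketch of how protocol versions and the cancel request code are packed. It is modelled on the definitions in PostgreSQL's pqcomm.h. is_cancel_request() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ProtocolVersion;

/* Major version in the high 16 bits, minor version in the low 16 bits. */
#define PG_PROTOCOL(m, n)     (((m) << 16) | (n))
#define PG_PROTOCOL_MAJOR(v)  ((v) >> 16)
#define PG_PROTOCOL_MINOR(v)  ((v) & 0x0000ffff)

/* A cancel request is recognised by this reserved "version" number. */
#define CANCEL_REQUEST_CODE   PG_PROTOCOL(1234, 5678)

static int
is_cancel_request(ProtocolVersion v)
{
    return v == CANCEL_REQUEST_CODE;
}
```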

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/505e76cb-0ca2-4e22-ba0f-772b5dc3f230%40eisentraut.org
2025-12-27 23:46:28 +01:00
Masahiko Sawada
67c20979ce Toggle logical decoding dynamically based on logical slot presence.
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.

This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
activated dynamically, when needed, while wal_level is set to
'replica'.

When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.

While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling of logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
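A toy model of the synchronous-enable, lazy-disable policy. The counter, flag and function names are hypothetical, and real slot invalidation and locking are ignored:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared state; names are illustrative only. */
typedef struct LogicalDecodingCtl
{
    int  n_logical_slots;
    bool logical_enabled;          /* what effective_wal_level would reflect */
} LogicalDecodingCtl;

/* Creating the first slot enables logical WAL before returning. */
static void
on_logical_slot_created(LogicalDecodingCtl *c)
{
    if (c->n_logical_slots++ == 0)
        c->logical_enabled = true;
}

/* Dropping the last slot only makes deactivation possible. */
static void
on_logical_slot_dropped(LogicalDecodingCtl *c)
{
    assert(c->n_logical_slots > 0);
    c->n_logical_slots--;
}

/*
 * Run later by the checkpointer: disable only if no slots exist by then and
 * recovery is over, so a drop-then-create sequence never toggles the state.
 */
static void
checkpointer_maybe_disable(LogicalDecodingCtl *c, bool in_recovery)
{
    if (!in_recovery && c->n_logical_slots == 0)
        c->logical_enabled = false;
}
```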

The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
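The per-transaction freezing rule can be sketched as below. The variable and function names are hypothetical stand-ins for XLogLogicalInfo and XID assignment:

```c
#include <assert.h>
#include <stdbool.h>

static bool shared_logical_enabled;   /* may change at any time */

/* Per-transaction view; hypothetical names. */
typedef struct XactWalState
{
    bool xid_assigned;
    bool logical_info;                /* frozen once an XID exists */
} XactWalState;

static bool
xlog_logical_info(XactWalState *x)
{
    if (!x->xid_assigned)
        x->logical_info = shared_logical_enabled;  /* still follows changes */
    return x->logical_info;
}

static void
assign_xid(XactWalState *x)
{
    x->logical_info = shared_logical_enabled;      /* freeze the mode */
    x->xid_assigned = true;
}
```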

A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.

Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct.

Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
2025-12-23 10:13:16 -08:00
John Naylor
9303d62c6d Separate out bytea sort support from varlena.c
In the wake of commit b45242fd3, bytea_sortsupport() still called out
to varstr_sortsupport(). Treating bytea as a kind of text/varchar
required varstr_sortsupport() to allow for the possibility of
NUL bytes, but only for C collation. This was confusing. For
better separation of concerns, create an independent sortsupport
implementation in bytea.c.
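Below is a sketch of the general abbreviated-key technique for raw bytes; it is not the code in bytea.c. The leading bytes are packed big-endian so that integer order matches memcmp() order. NUL bytes need no special treatment, and ties fall back to a full comparison. Both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack the first 8 bytes, zero-padded, into an order-preserving key. */
static uint64_t
bytea_abbrev_key(const unsigned char *data, size_t len)
{
    uint64_t key = 0;

    for (size_t i = 0; i < 8; i++)
        key = (key << 8) | (i < len ? data[i] : 0);
    return key;
}

/* Full comparison, used when abbreviated keys are equal. */
static int
bytea_cmp(const unsigned char *a, size_t alen,
          const unsigned char *b, size_t blen)
{
    int r = memcmp(a, b, alen < blen ? alen : blen);

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}
```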

The heuristics for bytea_abbrev_abort() remain the same as for
varstr_abbrev_abort(). It's possible that the bytea case warrants
different treatment, but that is left for future investigation.

In passing, adjust some strange looking comparisons in
varstr_abbrev_abort().

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
2025-12-16 15:19:16 +07:00
Michael Paquier
4ba012a8ed Allow cumulative statistics to read/write auxiliary data from/to disk
Cumulative stats kinds gain the capability to write additional per-entry
data when flushing the stats at shutdown, and read this data when
loading back the stats at startup.  This is useful, for example, for
variable-length data (like normalized query strings), so that it
becomes possible to link the shared memory stats entries to data that is
stored in a different area, like a DSA segment.

Three new optional callbacks are added to PgStat_KindInfo, available to
variable-numbered stats kinds:
* to_serialized_data: writes auxiliary data for an entry.
* from_serialized_data: reads auxiliary data for an entry.
* finish: performs actions after read/write/discard operations.  This is
invoked after processing all the entries of a kind, allowing extensions
to close file handles and clean up resources.
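A hypothetical sketch of the callback shape, writing a length-prefixed string after the entry key in the same file. The signatures, types and names are assumptions, not the actual PgStat_KindInfo member types:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry and callback signatures. */
typedef struct StatsEntry { unsigned key; char query[64]; } StatsEntry;

typedef struct KindCallbacks
{
    void (*to_serialized_data)(const StatsEntry *e, FILE *fp);
    int  (*from_serialized_data)(StatsEntry *e, FILE *fp);
    void (*finish)(void);
} KindCallbacks;

static int finish_calls;

/* Write the variable-length part right after the entry key. */
static void
write_query(const StatsEntry *e, FILE *fp)
{
    unsigned short len = (unsigned short) strlen(e->query);

    fwrite(&len, sizeof(len), 1, fp);
    fwrite(e->query, 1, len, fp);
}

/* Read it back; returns 0 on truncated or oversized data. */
static int
read_query(StatsEntry *e, FILE *fp)
{
    unsigned short len;

    if (fread(&len, sizeof(len), 1, fp) != 1 || len >= sizeof(e->query))
        return 0;
    if (fread(e->query, 1, len, fp) != len)
        return 0;
    e->query[len] = '\0';
    return 1;
}

static void
close_aux_files(void)
{
    finish_calls++;   /* e.g. close extra files opened by the callbacks */
}

static const KindCallbacks query_kind = {write_query, read_query, close_aux_files};
```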

Stats kinds have the option to store this data in the existing pgstats
file, but can as well store it in one or more additional files whose
names can be built from the entry keys.  The new serialized callbacks
are called once an entry key has been read from or written to the main
stats file.  A file descriptor to the main pgstats file is available in the
arguments of the callbacks.

Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
2025-12-15 09:40:56 +09:00
Tom Lane
58dad7f349 Update typedefs.list to match what the buildfarm currently reports.
The current list from the buildfarm includes quite a few typedef
names that it used to miss.  The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array.  In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it.  Install the current list and re-run pgindent
to update affected code.

Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.

Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
2025-12-14 17:03:53 -05:00
Tom Lane
fe7ede45f1 Looks like we can't test NLS on machines that lack any es_ES locale.
While commit 5b275a6e1 fixed a few unhappy buildfarm animals,
it looks like the remainder simply don't have any es_ES locale
at all.  There's little point in running the test in that case,
so minimize the number of variant expected-files by bailing out.
Also emit a log entry so that it's possible to tell from buildfarm
postmaster logs which case occurred.

Possibly, the scope of this testing could be improved by providing
additional translations.  But I think it's likely that the failing
animals have no non-C locales installed at all.

In passing, update typedefs.list so that koel doesn't think
regress.c is misformatted.

Discussion: https://postgr.es/m/E1vUpNU-000kcQ-1D@gemulon.postgresql.org
2025-12-14 14:30:50 -05:00
Andres Freund
edbaaea0a9 bufmgr: Separate keys for private refcount infrastructure
This makes lookups faster, due to allowing auto-vectorized lookups. It is also
beneficial for an upcoming patch, independent of auto-vectorization, as the
upcoming patch wants to track more information for each pinned buffer, making
the existing loop, iterating over an array of PrivateRefCountEntry, more
expensive due to increasing its size.
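A sketch of the struct-of-arrays layout, assuming eight slots and 0 as the unused key. Whether the loop actually vectorizes depends on the compiler and flags. The names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef int Buffer;
#define NUM_ENTRIES 8

/*
 * Keys live in their own dense array, apart from the per-entry data, so
 * a key scan touches only keys and does not slow down if the data grows.
 */
typedef struct PrivateRefCounts
{
    Buffer  keys[NUM_ENTRIES];      /* 0 = unused slot (assumed) */
    int32_t refcount[NUM_ENTRIES];
} PrivateRefCounts;

static int
find_slot(const PrivateRefCounts *p, Buffer buf)
{
    int found = -1;

    /* No early exit: a branch-free scan is friendlier to vectorizers. */
    for (int i = 0; i < NUM_ENTRIES; i++)
        if (p->keys[i] == buf)
            found = i;
    return found;
}
```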

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-14 13:09:43 -05:00
Alexander Korotkov
4b3d173629 Implement ALTER TABLE ... SPLIT PARTITION ... command
This new DDL command splits a single partition into several partitions.  Just
like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are
created using the createPartitionTable() function with the parent partition
as the template.

This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing.  This is why the new DDL command
can't be recommended for large, partitioned tables under high load.  However,
this implementation comes in handy in certain cases, even as it is.  Also, it
could serve as a foundation for future implementations with less locking and
possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
2025-12-14 13:29:38 +02:00
Heikki Linnakangas
bd8d9c9bdf Widen MultiXactOffset to 64 bits
This eliminates MultiXactOffset wraparound and the 2^32 limit on the
total number of multixid members. Multixids are still limited to 2^31,
but this is a nice improvement because 'members' can grow much faster
than the number of multixids when multixids have many members. Such
systems can now run longer before hitting hard limits or triggering
anti-wraparound vacuums.
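The wraparound difference is simple modular arithmetic. This hypothetical pair of helpers shows a 32-bit offset wrapping where a 64-bit one keeps counting:

```c
#include <assert.h>
#include <stdint.h>

/* Each multixact advances the member offset by its number of members. */
static uint32_t
next_offset32(uint32_t off, uint32_t nmembers)
{
    return off + nmembers;          /* wraps modulo 2^32 */
}

static uint64_t
next_offset64(uint64_t off, uint32_t nmembers)
{
    return off + nmembers;          /* no practical wraparound */
}
```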

Not having to deal with MultiXactOffset wraparound also simplifies the
code and removes some gnarly corner cases.

We no longer need to perform emergency anti-wraparound freezing
because of running out of 'members' space, so the offset stop limit is
gone. But you might still not want 'members' to consume huge amounts
of disk space. For that reason, I kept the logic for lowering vacuum's
multixid freezing cutoff if a large amount of 'members' space is
used. The thresholds for that are roughly the same as the "safe" and
"danger" thresholds used before, 2 billion transactions and 4 billion
transactions. This keeps the behavior for the freeze cutoff roughly
the same as before. It might make sense to make this smarter or
configurable, now that the threshold is only needed to manage disk
usage, but that's left for the future.

Add code to pg_upgrade to convert multitransactions from the old to
the new format, rewriting the pg_multixact SLRU files. Because
pg_upgrade now rewrites the files, we can get rid of some hacks we had
put in place to deal with old bugs and upgraded clusters. Bump catalog
version for the pg_multixact/offsets format change.

Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
2025-12-09 13:53:03 +02:00
Michael Paquier
31280d96a6 test_custom_stats: Test module for custom cumulative statistics
This test module acts as a replacement for what existed in the test
module injection_points prior to d52c24b0f8.  It uses a more flexible
structure than its ancestor:
- Two libraries are built, one for fixed-sized stats and one for
variable-sized stats.
- No GUCs required.  The stats are enabled only if one or both libraries
are loaded with shared_preload_libraries.
- The same kind IDs are reserved: 25 (variable-sized) and 26 (fixed-sized).

The goal of this redesign is to make it easier to extend the code
coverage provided by this module for other changes that are currently
under discussion, and injection_points was not suited for these.
Injection points are also now widely used in the tree, so further
extending the test coverage for custom pgstats in the test module
injection_points would be a riskier long-term move.

The new code is mostly a copy of what existed previously in the test
module injection_points, with the same callbacks defined for fixed-sized
and variable-sized stats, but a simpler overall structure in terms of
the stats counters updated.

The test coverage should remain the same as before: one TAP test is
used to check data reports, crash recovery and clean restart scenarios.
Tests are added for the manual reset of fixed-sized stats, something
not tested until now.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
2025-12-08 15:23:09 +09:00
Michael Paquier
d52c24b0f8 injection_points: Remove portions related to custom pgstats
The test module injection_points has been used as a landing spot to
provide coverage for the custom pgstats APIs, for both fixed-sized and
variable-sized stats kinds.  Some recent work related to pgstats is
proving that this structure makes the implementation of new tests
harder.

This commit removes the code related to pgstats from injection_points,
and an equivalent will be reintroduced as a separate test module in a
follow-up commit.  This removal is done in its own commit for clarity.

Using injection_points for this test coverage was perhaps not the best
way to design things, but it was good enough while working on the
first flavor of the custom pgstats APIs.  A new test module will make
it easier to introduce new tests, and we will not need to worry about
the impact that new changes related to custom pgstats could have on
the internals of injection_points.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
2025-12-08 12:45:20 +09:00
Peter Eisentraut
40bdd839f5 headerscheck ccache support
Currently, headerscheck and cpluspluscheck are very slow, and they
defeat use of ccache.  This fixes that, and now they are much faster.

The problem was that the test files are created in a randomly-named
directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory is
mentioned on the compiler command line, which is part of the cache
key.

The solution is to create the test files in the build directory.  For
example, for src/include/storage/ipc.h, we generate

    tmp_headerscheck_c/src_include_storage_ipc_h.c (or .cpp)

Now ccache works.  (And it's also a bit easier to debug everything
with this naming.)

(The subdirectory is used to keep the cleanup trap simple.)
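
The naming scheme can be illustrated with a small C sketch that derives the test file name from a header path. The real logic lives in the headerscheck shell script, and the function name here is hypothetical. Only the .c suffix is handled. Because the result depends only on the header path, the compiler command line is the same from run to run, so ccache can reuse earlier results.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a header path to its test file name by turning
 * every "/" and "." into "_" and placing the result in tmp_headerscheck_c/
 * with a .c suffix.  Returns 0 on success, -1 if a buffer is too small.
 */
int
headerscheck_test_file(const char *header, char *buf, size_t buflen)
{
	char		mangled[1024];
	size_t		len = strlen(header);

	if (len >= sizeof(mangled))
		return -1;

	/* Copy including the terminating NUL, replacing path punctuation. */
	for (size_t i = 0; i <= len; i++)
		mangled[i] = (header[i] == '/' || header[i] == '.') ? '_' : header[i];

	int			n = snprintf(buf, buflen, "tmp_headerscheck_c/%s.c", mangled);

	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```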

The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
is from about 1min 20s to only 20s.  In local use, the speedups are
similar.

Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/b49e74d4-3cf9-4d1c-9dce-09f75e55d026%40eisentraut.org
2025-12-04 11:23:23 +01:00
Peter Eisentraut
d0b7a0b4c8 headerscheck: Use LLVM_CPPFLAGS
Otherwise, headerscheck will fail if the LLVM headers are in a
location not reached by the normal CFLAGS/CPPFLAGS.

Discussion: https://www.postgresql.org/message-id/flat/b49e74d4-3cf9-4d1c-9dce-09f75e55d026%40eisentraut.org
2025-12-04 10:58:15 +01:00
Andres Freund
6c5c393b74 Rename BUFFERPIN wait event class to BUFFER
In an upcoming patch, more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-03 18:38:20 -05:00
Andres Freund
156680055d bufmgr: Turn BUFFER_LOCK_* into an enum
It seems cleaner to use an enum to tie the different values together. It also
helps to have a more descriptive type in the argument to various functions.
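
A sketch of the idea follows. It uses a hypothetical enum tag. The member names are the existing BUFFER_LOCK_* names, but the tag and layout chosen by the actual commit may differ.

```c
/* Hypothetical enum tag; member names follow the BUFFER_LOCK_* macros. */
typedef enum SketchBufferLockMode
{
	BUFFER_LOCK_UNLOCK,
	BUFFER_LOCK_SHARE,
	BUFFER_LOCK_EXCLUSIVE,
} SketchBufferLockMode;

/* The parameter type documents which values a caller may pass. */
const char *
buffer_lock_mode_name(SketchBufferLockMode mode)
{
	switch (mode)
	{
		case BUFFER_LOCK_UNLOCK:
			return "unlock";
		case BUFFER_LOCK_SHARE:
			return "share";
		case BUFFER_LOCK_EXCLUSIVE:
			return "exclusive";
	}
	return "unknown";
}
```

A switch over an enum without a default branch has a side benefit: compilers such as GCC and Clang can warn (-Wswitch) when a member is left unhandled.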

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-03 18:38:20 -05:00
Amit Kapila
e68b6adad9 Add slotsync_skip_reason column to pg_replication_slots view.
Introduce a new column, slotsync_skip_reason, in the pg_replication_slots
view. This column records the reason why the last slot synchronization was
skipped. It is primarily relevant for logical replication slots on standby
servers where the 'synced' field is true. The value is NULL when
synchronization succeeds.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-11-28 05:21:35 +00:00
David Rowley
42473b3b31 Have the planner replace COUNT(ANY) with COUNT(*), when possible
This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport
functions to receive an Aggref and determine whether the Aggref call
can be optimized.

Also added is a support function to allow transformation of COUNT(ANY)
into COUNT(*).  This is possible when the given "ANY" argument cannot be
NULL and there are no ORDER BY / DISTINCT clauses within the Aggref.
This is a useful transformation because people commonly write
COUNT(1), which until now has added unneeded overhead.  When counting
a NOT NULL column, the overhead can be worse, as that might mean
deforming more of the tuple, which for large fact tables may mean
deforming many columns.
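
The decision rule can be modeled in a few lines. This sketch uses a hypothetical struct rather than PostgreSQL's Aggref node. COUNT(1) qualifies because the constant 1 is never NULL. COUNT(x) counts only non-NULL values of x, so a never-NULL argument makes it equivalent to COUNT(*), provided the other conditions stated above also hold.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an aggregate call. */
typedef struct SketchAggCall
{
	bool		arg_can_be_null;	/* could the argument ever be NULL? */
	bool		has_order_by;		/* ORDER BY inside the call */
	bool		has_distinct;		/* DISTINCT inside the call */
} SketchAggCall;

/* True if COUNT(arg) may be rewritten as COUNT(*) under the stated rules. */
bool
can_replace_with_count_star(const SketchAggCall *call)
{
	return !call->arg_can_be_null &&
		!call->has_order_by &&
		!call->has_distinct;
}
```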

It may be possible to add prosupport functions for other aggregates.  We
could consider whether ORDER BY could be dropped for some calls, e.g. the
ORDER BY is quite useless in MAX(c ORDER BY c).

There is a little bit of passing fallout from adjusting
expr_is_nonnullable() to handle Const, which results in a plan change in
the aggregates.out regression test.  Previously, nothing was able to
determine that "One-Time Filter: (100 IS NOT NULL)" was always true,
and therefore useless to include in the plan.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com
2025-11-27 10:43:28 +13:00
Daniel Gustafsson
1cdb84bb1b Check for correct version of perltidy
pgperltidy requires a particular version of perltidy, but unlike
pgindent, which checks the version of the underlying indent binary, it
did not check the version. Fix by checking the version of perltidy and
erroring out if an incorrect version is used.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1209850.1764092152@sss.pgh.pa.us
2025-11-26 20:43:09 +01:00
Michael Paquier
e1405aa5e3 Add input function for data type pg_dependencies
pg_dependencies is used as the data type for the contents of dependencies
extended statistics.  This new input function consumes the format that
has been established by e76defbcf0 for the output function of
pg_dependencies, enforcing some sanity checks for:
- The input object, which should be a one-dimensional array
with correct attributes and values.
- The key names: "attributes", "dependency", "degree".  All are
required, other key names are blocked.
- Value types for each key: "attributes" requires an array of integers,
"dependency" an attribute number, "degree" a float.
- List of attributes.  In this case, it is possible that some
dependencies are not listed in the statistics data, as items with a
degree of 0 are discarded when building the statistics.  This commit
includes checks for simple scenarios, like duplicated attributes, or
overlapping values between the list of "attributes" and the "dependency"
value.  Even if the input function considers the input valid, a value
still needs to be cross-checked with the attributes defined in a
statistics object at import.
- Based on the discussion, the checks on the values are loose, as there
is also an argument for allowing stats injection.  For example,
"degree" should be defined in [0.0,1.0], but this is not enforced.
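
The two structural checks on an item's attributes can be sketched in C over an already-parsed item. The struct is a hypothetical in-memory form of one {"attributes", "dependency", "degree"} object, not the actual parser's representation. In line with the loose value checks described above, "degree" is deliberately not range-checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed form of one pg_dependencies item. */
typedef struct SketchDependencyItem
{
	const int  *attributes;
	size_t		nattributes;
	int			dependency;
	double		degree;
} SketchDependencyItem;

/* Reject duplicated attributes and a "dependency" that is also listed. */
bool
dependency_item_is_valid(const SketchDependencyItem *item)
{
	for (size_t i = 0; i < item->nattributes; i++)
	{
		if (item->attributes[i] == item->dependency)
			return false;		/* overlaps with "dependency" */
		for (size_t j = i + 1; j < item->nattributes; j++)
			if (item->attributes[i] == item->attributes[j])
				return false;	/* duplicated attribute */
	}
	return true;				/* degree intentionally unchecked */
}
```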

This is required for a follow-up patch that aims to implement the import
of extended statistics.  Some tests are added to check the code paths of
the JSON parser checking the shape of the pg_dependencies inputs, with
91% of code coverage reached.  The tests are located in their own new
test file, for clarity.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2025-11-26 10:53:16 +09:00
Michael Paquier
44eba8f06e Add input function for data type pg_ndistinct
pg_ndistinct is used as the data type for the contents of ndistinct extended
statistics.  This new input function consumes the format that has been
established by 1f927cce44 for the output function of pg_ndistinct,
enforcing some sanity checks for:
- The input object, which should be a one-dimensional array
with correct attributes and values.
- The key names: "attributes", "ndistinct".  Both are required, other
key names are blocked.
- Value types for each key: "attributes" requires an array of integers,
and "ndistinct" an integer.
- List of attributes.  Note that this enforces a check that each
attribute list is a subset of the longest attribute list found.
This does not enforce that a full group of attribute sets exists, based
on how the groups are generated when the ndistinct objects are
generated, making the list of ndistinct items a bit loose.  Note that a
check would still be required at import to see whether the listed
attributes match the attribute numbers set in the definition of a
statistics object.
- Based on the discussion, the checks on the values are loose, as there
is also an argument for allowing stats injection.  The relation and
attribute level stats follow the same line of argument for the values.
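
The subset rule can be expressed as a small C sketch over plain integer arrays. The function names are hypothetical. Each item's attribute list must be contained in the longest list found. Note that a complete set of groups is not required.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every element of sub[] also appears in super[]. */
bool
attributes_are_subset(const int *sub, size_t nsub,
					  const int *super, size_t nsuper)
{
	for (size_t i = 0; i < nsub; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < nsuper && !found; j++)
			found = (sub[i] == super[j]);
		if (!found)
			return false;
	}
	return true;
}

/*
 * True if every list is a subset of the longest one.  lists[k] holds
 * lengths[k] attribute numbers; nlists may be zero.
 */
bool
ndistinct_lists_are_consistent(const int *const *lists,
							   const size_t *lengths, size_t nlists)
{
	size_t		longest = 0;

	for (size_t k = 1; k < nlists; k++)
		if (lengths[k] > lengths[longest])
			longest = k;
	for (size_t k = 0; k < nlists; k++)
		if (!attributes_are_subset(lists[k], lengths[k],
								   lists[longest], lengths[longest]))
			return false;
	return true;
}
```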

This is required for a follow-up patch that aims to implement the import
of extended statistics.  Some tests are added to check the code paths of
the JSON parser checking the shape of the pg_ndistinct inputs, with 90%
of code coverage reached.  The tests are located in their own new test
file, for clarity.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2025-11-26 10:13:18 +09:00
Michael Paquier
4b203d499c pg_buffercache: Add pg_buffercache_os_pages
ba2a3c2302 added a way to check whether a buffer is spread across
multiple pages with some NUMA information, via a new view
pg_buffercache_numa that depends on pg_buffercache_numa_pages(), a SQL
function.  These can only be queried when support for libnuma exists,
generating an error if not.

However, it can be useful to know how shared buffers and OS pages map
when NUMA is not supported or not available.  This commit expands the
capabilities around pg_buffercache_numa:
- pg_buffercache_numa_pages() is refactored as an internal function,
able to optionally process NUMA.  Its SQL definition prior to this
commit is still around to ensure backward-compatibility with v1.6.
- A SQL function called pg_buffercache_os_pages() is added, able to work
with or without NUMA.
- The view pg_buffercache_numa is redefined to use
pg_buffercache_os_pages().
- A new view is added, called pg_buffercache_os_pages.  This ignores
NUMA in its result processing, for better efficiency.

The implementation avoids code duplication between the NUMA and
non-NUMA views/functions by relying on one internal function that does
the job for all of them.  The module is bumped to v1.7.
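
The buffer-to-OS-page mapping comes down to address arithmetic. This sketch assumes a contiguous buffer pool starting at a known address. Each buffer is one block (BLCKSZ, 8192 bytes by default). The function name and parameters are hypothetical; a real implementation would obtain these values from the server and the OS. With 4kB OS pages a buffer spans two pages, while with 2MB huge pages many buffers share one page.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the first and last OS page numbers that
 * buffer buf_id occupies, given the pool start address, the block size
 * and the OS page size.
 */
void
buffer_os_page_range(uintptr_t pool_start, size_t buf_id, size_t block_size,
					 size_t os_page_size,
					 uintptr_t *first_page, uintptr_t *last_page)
{
	uintptr_t	start = pool_start + buf_id * block_size;
	uintptr_t	end = start + block_size - 1;	/* last byte of the buffer */

	*first_page = start / os_page_size;
	*last_page = end / os_page_size;
}
```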

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z/fFA2heH6lpSLlt@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-24 14:29:15 +09:00
Tom Lane
b140c8d7a3 Add SupportRequestInlineInFrom planner support request.
This request allows a support function to replace a function call
appearing in FROM (typically a set-returning function) with an
equivalent SELECT subquery.  The subquery will then be subject
to the planner's usual optimizations, potentially allowing a much
better plan to be generated.  While the planner has long done this
automatically for simple SQL-language functions, it's now possible
for extensions to do it for functions outside that group.
Notably, this could be useful for functions that are presently
implemented in PL/pgSQL and work by generating and then EXECUTE'ing
a SQL query.

Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
2025-11-22 19:33:34 -05:00
Bruce Momjian
c0bc9af151 tools: remove src/tools/codelines
This is a one-line script that never gained general usage after being
added in 2005.

Backpatch-through: master
2025-11-22 12:02:14 -05:00
Andrew Dunstan
51da766494 Add 'make check-tests' behavior to the meson based builds
There was no easy way to run specific tests in the meson based builds.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAExHW5tK-QqayUN0%2BN3MF5bjV6vLKDCkRuGwoDJwc7vGjwCygQ%40mail.gmail.com
2025-11-21 17:12:22 -05:00
Peter Eisentraut
e6be84356b Update timezone to C99
This reverts changes done in PostgreSQL over the upstream code to
avoid relying on C99 <stdint.h> and <inttypes.h>.

In passing, there were a few other minor and cosmetic changes that I
left in to improve alignment with upstream, including some C11 feature
use (_Noreturn).

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/9ad2749f-77ab-4ecb-a321-1ca915480b05%40eisentraut.org
2025-11-21 13:07:40 +01:00
Bruce Momjian
5d4dc112c7 tools: update tools/codelines to use "git ls-files"
This generates a more accurate code count because 'make distclean'
doesn't always remove build files.

Author: idea from David Rowley

Discussion: https://postgr.es/m/aR4hoOotVHB7TXo5@momjian.us

Backpatch-through: master
2025-11-20 15:23:39 -05:00
Melanie Plageman
1937ed7062 Refactor heap_page_prune_and_freeze() parameters into a struct
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters, and upcoming work to handle VM updates in this function will
add even more.

Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
2025-11-20 10:32:14 -05:00
Tom Lane
057012b205 Speed up eqjoinsel() with lots of MCV entries.
If both sides of the operator have most-common-value statistics,
eqjoinsel wants to check which MCVs have matches on the other side.
Formerly it did this with a dumb compare-all-the-entries loop,
which had O(N^2) behavior for long MCV lists.  When that code was
written, twenty-plus years ago, that seemed tolerable; but nowadays
people frequently use much larger statistics targets, so that the
O(N^2) behavior can hurt quite a bit.

To add insult to injury, when asked for semijoin semantics, the
entire comparison loop was done over, even though we frequently
know that it will yield exactly the same results.

To improve matters, switch to using a hash table to perform the
matching.  Testing suggests that depending on the data type, we may
need up to about 100 MCVs on each side to amortize the extra costs
of setting up the hash table and performing hash-value computations;
so continue to use the old looping method when there are fewer MCVs
than that.
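
The switch between the old loop and a hash table can be sketched with integers standing in for MCV values. This sketch uses == in place of the datatype's equality operator and a MurmurHash3-style finalizer in place of its hash function. It computes match flags for the outer side only. The cutoff of 100 follows the text, but the constant name and the open-addressing layout are assumptions, not the committed code.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_HASH_THRESHOLD 100	/* hypothetical name for the cutoff */

/* MurmurHash3-style 32-bit finalizer. */
static unsigned int
hash_int(int key)
{
	unsigned int h = (unsigned int) key;

	h ^= h >> 16;
	h *= 0x85ebca6bU;
	h ^= h >> 13;
	h *= 0xc2b2ae35U;
	h ^= h >> 16;
	return h;
}

/* Set matched[i] if outer[i] appears in inner[]; false on OOM. */
bool
match_mcvs(const int *outer, size_t nouter, const int *inner, size_t ninner,
		   bool *matched)
{
	if (nouter < SKETCH_HASH_THRESHOLD || ninner < SKETCH_HASH_THRESHOLD)
	{
		/* Short lists: plain O(N*M) comparison loop. */
		for (size_t i = 0; i < nouter; i++)
		{
			matched[i] = false;
			for (size_t j = 0; j < ninner && !matched[i]; j++)
				matched[i] = (outer[i] == inner[j]);
		}
		return true;
	}

	/* Power-of-two table at least twice ninner, so it never fills up. */
	size_t		size = 1;

	while (size < 2 * ninner)
		size <<= 1;

	int		   *slots = malloc(size * sizeof(int));
	bool	   *used = calloc(size, sizeof(bool));

	if (slots == NULL || used == NULL)
	{
		free(slots);
		free(used);
		return false;
	}

	/* Build: insert each inner value once, using linear probing. */
	for (size_t j = 0; j < ninner; j++)
	{
		size_t		pos = hash_int(inner[j]) & (size - 1);

		while (used[pos] && slots[pos] != inner[j])
			pos = (pos + 1) & (size - 1);
		used[pos] = true;
		slots[pos] = inner[j];
	}

	/* Probe: expected O(1) per outer value instead of O(ninner). */
	for (size_t i = 0; i < nouter; i++)
	{
		size_t		pos = hash_int(outer[i]) & (size - 1);

		matched[i] = false;
		while (used[pos] && !matched[i])
		{
			matched[i] = (slots[pos] == outer[i]);
			pos = (pos + 1) & (size - 1);
		}
	}

	free(slots);
	free(used);
	return true;
}
```

Below the threshold, the setup and hashing costs are not repaid, which matches the reasoning given for keeping the old loop for short MCV lists.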

Also, refactor so that we don't repeat the matching work unless
we really need to, which occurs only in the uncommon case where
eqjoinsel_semi decides to truncate the set of inner MCVs it
considers.  The refactoring also got rid of the need to use the
presented operator's commutator.  Real-world operators that are
using eqjoinsel should pretty much always have commutators, but
at the very least this saves a few syscache lookups.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Co-authored-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20ea8bf5-3569-4e46-92ef-ebb2666debf6@tantorlabs.com
2025-11-19 13:22:12 -05:00
Michael Paquier
84fb27511d Replace off_t by pgoff_t in I/O routines
PostgreSQL's Windows port has never been able to handle files larger
than 2GB due to the use of off_t for file offsets, which is only 32 bits
on Windows.  This causes signed integer overflow at exactly 2^31 bytes
when trying to handle files larger than 2GB, for the routines touched
by this commit.
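
The 2^31 boundary can be shown with a small sketch. Here int64_t plays the role of pgoff_t, which the commit describes as __int64 on Windows, and the function names are hypothetical. The last 8kB block of a 1GB segment (block 131071) still fits in a signed 32-bit offset. Block 262144 starts at exactly 2^31 bytes, which does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would this offset fit in a signed 32-bit off_t, as on Windows? */
bool
offset_fits_in_32_bits(int64_t offset)
{
	return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Byte offset of block blkno within a file, computed in 64 bits. */
int64_t
block_byte_offset(uint32_t blkno, int64_t block_size)
{
	return (int64_t) blkno * block_size;
}
```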

Note that large files are forbidden by ./configure (3c6248a828) and
meson (recent change, see 79cd66f28c).  This restriction also exists
in v16 and older versions for the now-dead MSVC scripts.

The code base already defines pgoff_t as __int64 (64-bit) on Windows for
this purpose, and some function declarations in headers use it, but many
internals still rely on off_t.  This commit switches more routines to
use pgoff_t, offering more portability, for areas mainly related to file
extensions and storage.

These changes are not yet critical for WAL segments, whose maximum
allowed size is currently 1GB (though this opens the door to allowing a
larger size for them).  They matter more for segment files if we want
to lift the large file restriction in ./configure and meson in the
future, a restriction that would make sense to remove once/if all traces
of off_t are gone from the tree.  This can additionally matter for
out-of-core code that may want files larger than 2GB in places where
off_t is four bytes in size.

Note that off_t is still used in other parts of the tree like
buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc.
These other code paths can be addressed separately, and their update
will be required if we want to remove the large file restriction in the
future.  This commit is a good first cut in itself towards more
portability, hopefully.

On Unix-like systems, pgoff_t is defined as off_t, so this change only
affects Windows behavior.

Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
2025-11-13 12:41:40 +09:00
Álvaro Herrera
78aae29830 Change coding pattern for CURL_IGNORE_DEPRECATION()
Instead of having to write a semicolon inside the macro argument, we can
insert a semicolon with another macro layer.  This no longer gives
pg_bsd_indent indigestion, so we can remove the digestive aids that had
to be installed in the pgindent Perl script.
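
The extra macro layer works as follows. This sketch uses hypothetical macro names, not the ones in the tree. The wrapped macro still receives a statement ending in a semicolon, but the caller no longer writes that semicolon inside the argument list.

```c
/*
 * Hypothetical stand-in for a library macro that brackets statements
 * (for example with warning-suppressing pragmas).  Its argument must
 * carry its own trailing semicolon.
 */
#define LIB_IGNORE_WARNINGS(statements) do { statements } while (0)

/* Extra layer: callers pass a plain expression; the ';' is added here. */
#define MY_IGNORE_WARNINGS(x) LIB_IGNORE_WARNINGS(x;)

int
bump_twice(int counter)
{
	/* Without the layer, this would be LIB_IGNORE_WARNINGS(counter++;); */
	MY_IGNORE_WARNINGS(counter++);
	MY_IGNORE_WARNINGS(counter++);
	return counter;
}
```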

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202511111134.njrwf5w5nbjm@alvherre.pgsql
Backpatch-through: 18
2025-11-12 12:35:14 +01:00
Thomas Munro
b498af4204 ci: Improve OpenBSD core dump backtrace handling.
Since OpenBSD core dumps do not embed executable paths, the script now
searches for the corresponding binary manually within the specified
directory before invoking LLDB.  This is imperfect but should find the
right executable in practice, as needed for meaningful backtraces.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ36R74TZ8RKsFueYwLxGKDAm3LU2FHM_ZUCSB6imd3vYA@mail.gmail.com
Backpatch-through: 18
2025-11-06 21:14:05 +13:00