Commit graph

2385 commits

Author SHA1 Message Date
Andres Freund
c325a7633f aio: Add io_method=io_uring
Performing AIO using io_uring can be considerably faster than
io_method=worker, particularly when lots of small IOs are issued, as
a) the context-switch overhead for worker based AIO becomes more significant
b) the number of IO workers can become limiting

io_uring, however, is linux specific and requires an additional compile-time
dependency (liburing).

This implementation is fairly simple and there are substantial optimization
opportunities.

The description of the existing AIO_IO_COMPLETION wait event is updated to
make the difference between it and the new AIO_IO_URING_EXECUTION clearer.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-26 19:49:13 -04:00
Robert Haas
8d5ceb113e pg_overexplain: Additional EXPLAIN options for debugging.
There's a fair amount of information in the Plan and PlanState trees
that isn't printed by any existing EXPLAIN option. This means that,
when working on the planner, it's often necessary to rely on facilities
such as debug_print_plan, which produce excessively voluminous
output. Hence, use the new EXPLAIN extension facilities to implement
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core
EXPLAIN facility.

A great deal more could be done here, and the specific choices about
what to print and how are definitely arguable, but this is at least
a starting point for discussion and a jumping-off point for possible
future improvements.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviweed-by: Andrei Lepikhov <lepihov@gmail.com> (who didn't like it)
Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com
2025-03-26 13:52:21 -04:00
Tom Lane
9324c8c580 Introduce PG_MODULE_MAGIC_EXT macro.
This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future.  This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object.  But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:06:12 -04:00
Amit Kapila
73eba5004a Detect and Log multiple_unique_conflicts type conflict.
Introduce a new conflict type, multiple_unique_conflicts, to handle cases
where an incoming row during logical replication violates multiple UNIQUE
constraints.

Previously, the apply worker detected and reported only the first
encountered key conflict (insert_exists/update_exists), causing repeated
failures as each constraint violation needs to be handled one by one
making the process slow and error-prone.

With this patch, the apply worker checks all unique constraints upfront
once the first key conflict is detected and reports
multiple_unique_conflicts if multiple violations exist. This allows users
to resolve all conflicts at once by deleting all conflicting tuples rather
than dealing with them individually or skipping the transaction.

In the future, this will also allow us to specify different resolution
handlers for such a conflict type.

Add the stats for this conflict type in pg_stat_subscription_stats.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com
2025-03-24 12:30:44 +05:30
Andres Freund
5941946d09 meson: Flush stdout in testwrap
Otherwise the progress won't reliably be displayed during a test.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/kx6xu7suexal5vwsxpy7ybgkcznx6hgywbuhkr6qabcwxjqax2@i4pcpk75jvaa
Backpatch-through: 16
2025-03-19 09:04:09 -04:00
Andres Freund
247ce06b88 aio: Add io_method=worker
The previous commit introduced the infrastructure to start io_workers. This
commit actually makes the workers execute IOs.

IO workers consume IOs from a shared memory submission queue, run traditional
synchronous system calls, and perform the shared completion handling
immediately.  Client code submits most requests by pushing IOs into the
submission queue, and waits (if necessary) using condition variables.  Some
IOs cannot be performed in another process due to lack of infrastructure for
reopening the file, and must processed synchronously by the client code when
submitted.

For now the default io_method is changed to "worker". We should re-evaluate
that around beta1, we might want to be careful and set the default to "sync"
for 18.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18 11:54:01 -04:00
Robert Haas
c65bc2e1d1 Make it possible for loadable modules to add EXPLAIN options.
Modules can use RegisterExtensionExplainOption to register new
EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState,
and SetExplainExtensionState to store related state inside the
ExplainState object.

Since this substantially increases the amount of code that needs
to handle ExplainState-related tasks, move a few bits of existing
code to a new file explain_state.c and add the rest of this
infrastructure there.

See the comments at the top of explain_state.c for further
explanation of how this mechanism works.

This does not yet provide a way for such such options to do anything
useful. The intention is that we'll add hooks for that purpose in a
separate commit.

Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18 08:41:12 -04:00
Andres Freund
da7226993f aio: Add core asynchronous I/O infrastructure
The main motivations to use AIO in PostgreSQL are:

a) Reduce the time spent waiting for IO by issuing IO sufficiently early.

   In a few places we have approximated this using posix_fadvise() based
   prefetching, but that is fairly limited (no completion feedback, double the
   syscalls, only works with buffered IO, only works on some OSs).

b) Allow to use Direct-I/O (DIO).

   DIO can offload most of the work for IO to hardware and thus increase
   throughput / decrease CPU utilization, as well as reduce latency.  While we
   have gained the ability to configure DIO in d4e71df6, it is not yet usable
   for real world workloads, as every IO is executed synchronously.

For portability, the new AIO infrastructure allows to implement AIO using
different methods. The choice of the AIO method is controlled by the new
io_method GUC. As of this commit, the only implemented method is "sync",
i.e. AIO is not actually executed asynchronously. The "sync" method exists to
allow to bypass most of the new code initially.

Subsequent commits will introduce additional IO methods, including a
cross-platform method implemented using worker processes and a linux specific
method using io_uring.

To allow different parts of postgres to use AIO, the core AIO infrastructure
does not need to know what kind of files it is operating on. The necessary
behavioral differences for different files are abstracted as "AIO
Targets". One example target would be smgr. For boring portability reasons,
all targets currently need to be added to an array in aio_target.c.  This
commit does not implement any AIO targets, just the infrastructure for
them. The smgr target will be added in a later commit.

Completion (and other events) of IOs for one type of file (i.e. one AIO
target) need to be reacted to differently, based on the IO operation and the
callsite. This is made possible by callbacks that can be registered on
IOs. E.g. an smgr read into a local buffer does not need to update the
corresponding BufferDesc (as there is none), but a read into shared buffers
does.  This commit does not contain any callbacks, they will be added in
subsequent commits.

For now the AIO infrastructure only understands READV and WRITEV operations,
but it is expected that more operations will be added. E.g. fsync/fdatasync,
flush_range and network operations like send/recv.

As of this commit, nothing uses the AIO infrastructure. Later commits will add
an smgr target, md.c and bufmgr.c callbacks and then finally use AIO for
read_stream.c IO, which, in one fell swoop, will convert all read stream users
to AIO.

The goal is to use AIO in many more places. There are patches to use AIO for
checkpointer and bgwriter that are reasonably close to being ready. There also
are prototypes to use it for WAL, relation extension, backend writes and many
more. Those prototypes were important to ensure the design of the AIO
subsystem is not too limiting (e.g. WAL writes need to happen in critical
sections, which influenced a lot of the design).

A future commit will add an AIO README explaining the AIO architecture and how
to use the AIO subsystem. The README is added later, as it references details
only added in later commits.

Many many more people than the folks named below have contributed with
feedback, work on semi-independent patches etc. E.g. various folks have
contributed patches to use the read stream infrastructure (added by Thomas in
b5a9b18cd0) in more places. Similarly, a *lot* of folks have contributed to
the CI infrastructure, which I had started to work on to make adding AIO
feasible.

Some of the work by contributors has gone into the "v1" prototype of AIO,
which heavily influenced the current design of the AIO subsystem. None of the
code from that directly survives, but without the prototype, the current
version of the AIO infrastructure would not exist.

Similarly, the reviewers below have not necessarily looked at the current
design or the whole infrastructure, but have provided very valuable input. I
am to blame for problems, not they.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-17 18:51:33 -04:00
Andres Freund
02844012b3 aio: Basic subsystem initialization
This commit just does the minimal wiring up of the AIO subsystem, added in the
next commit, to the rest of the system. The next commit contains more details
about motivation and architecture.

This commit is kept separate to make it easier to review, separating the
changes across the tree, from the implementation of the new subsystem.

We discussed squashing this commit with the main commit before merging AIO,
but there has been a mild preference for keeping it separate.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-17 18:51:33 -04:00
Michael Paquier
19c6e92b13 Apply more consistent style for command options in TAP tests
This commit reshapes the grammar of some commands to apply a more
consistent style across the board, following rules similar to
ce1b0f9da0:
- Elimination of some pointless used-once variables.
- Use of long options, to self-document better the options used.
- Use of fat commas to link option names and their assigned values,
including redirections, so as perltidy can be tricked to put them
together.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org
2025-03-17 12:42:23 +09:00
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
Richard Guo
cc5d98525d Fix incorrect handling of subquery pullup
When pulling up a subquery, if the subquery's target list items are
used in grouping set columns, we need to wrap them in PlaceHolderVars.
This ensures that expressions retain their separate identity so that
they will match grouping set columns when appropriate.

In 90947674f, we decided to wrap subquery outputs that are non-var
expressions in PlaceHolderVars.  This prevents const-simplification
from merging them into the surrounding expressions after subquery
pullup, which could otherwise lead to failing to match those
subexpressions to grouping set columns, with the effect that they'd
not go to null when expected.

However, that left some loose ends.  If the subquery's target list
contains two or more identical Var expressions, we can still fail to
match the Var expression to the expected grouping set expression.
This is not related to const-simplification, but rather to how we
match expressions to lower target items in setrefs.c.

For sort/group expressions, we use ressortgroupref matching, which
works well.  For other expressions, we primarily rely on comparing the
expressions to determine if they are the same.  Therefore, we need a
way to prevent setrefs.c from matching the expression to some other
identical ones.

To fix, wrap all subquery outputs in PlaceHolderVars if the parent
query uses grouping sets, ensuring that they preserve their separate
identity throughout the whole planning process.

Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-meSahaanKskpBn0KKxdHAXC1_EJCVWHxEodqirrGJnw@mail.gmail.com
2025-03-13 16:36:03 +09:00
Melanie Plageman
18cd15e706 Add connection establishment duration logging
Add log_connections option 'setup_durations' which logs durations of
several key parts of connection establishment and backend setup.

For an incoming connection, starting from when the postmaster gets a
socket from accept() and ending when the forked child backend is first
ready for query, there are multiple steps that could each take longer
than expected due to external factors. This logging provides visibility
into authentication and fork duration as well as the end-to-end
connection establishment and backend initialization time.

To make this portable, the timings captured in the postmaster (socket
creation time, fork initiation time) are passed through the
BackendStartupData.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:27 -04:00
Melanie Plageman
9219093cab Modularize log_connections output
Convert the boolean log_connections GUC into a list GUC comprised of the
connection aspects to log.

This gives users more control over the volume and kind of connection
logging.

The current log_connections options are 'receipt', 'authentication', and
'authorization'. The empty string disables all connection logging. 'all'
enables all available connection logging.

For backwards compatibility, the most common values for the
log_connections boolean are still supported (on, off, 1, 0, true, false,
yes, no). Note that previously supported substrings of on, off, true,
false, yes, and no are no longer supported.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:21 -04:00
Peter Geoghegan
0fbceae841 Show index search count in EXPLAIN ANALYZE, take 2.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan/index-only scan/bitmap index scan nodes.  This
information is particularly useful with scans that use ScalarArrayOp
quals, where the number of index searches can be unpredictable due to
implementation details that interact with physical index characteristics
(at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8).
The information shown also provides useful context when EXPLAIN ANALYZE
runs a plan with an index scan node that successfully applied the skip
scan optimization (set to be added to nbtree by an upcoming patch).

The instrumentation works by teaching all index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs already increment
the pg_stat_*_indexes.idx_scan counter (we're counting the same event,
but at the scan level rather than the relation level).  Parallel queries
have workers copy their local counter struct into shared memory when an
index scan node ends -- even when it isn't a parallel aware scan node.
An earlier version of this patch that only worked with parallel aware
scans became commit 5ead85fb (though that was quickly reverted by commit
d00107cd following "debug_parallel_query=regress" buildfarm failures).

Our approach doesn't match the approach used when tracking other index
scan related costs (e.g., "Rows Removed by Filter:").  It is comparable
to the approach used in similar cases involving costs that are only
readily accessible inside an access method, not from the executor proper
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently
enhanced to show per-worker costs by commit 5a1e6df3, using essentially
the same scheme as the one used here).  It is necessary for index AMs to
have direct responsibility for maintaining the new counter, since the
counter might need to be incremented multiple times per amgettuple call
(or per amgetbitmap call).  But it is also necessary for the executor
proper to manage the shared memory now used to transfer each worker's
counter struct to the leader.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11 09:20:50 -04:00
Tomas Vondra
0b2a45a5d1 Compress TID lists when writing GIN tuples to disk
When serializing GIN tuples to tuplesorts during parallel index builds,
we can significantly reduce the amount of data by compressing the TID
lists. The GIN opclasses may produce a lot of data (depending on how
many keys are extracted from each row), and the TID compression is very
efficient and effective.

If the number of distinct keys is high, the first worker pass (reading
data from the table and writing them into a private tuplesort) may not
benefit from the compression very much. It is likely to spill data to
disk before the TID lists get long enough for the compression to help.
The second pass (writing the merged data into the shared tuplesort) is
more likely to benefit from compression.

The compression can be seen as a way to reduce the amount of disk space
needed by the parallel builds, because the data is written twice. First
into the per-worker tuplesorts, then into the shared tuplesort.

Author: Tomas Vondra
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-04 19:02:05 +01:00
Thomas Munro
b6904afae4 ci: Use a RAM disk for NetBSD and OpenBSD.
Put the RAM disk setup for all three *BSD CI tasks into a common script,
replacing the old FreeBSD-specific one from commit 0265e5c1.  This makes
them run 3 times and a bit over 2 times faster, respectively.

NetBSD and FreeBSD now share the same one-liner to mount tmpfs.  OpenBSD
needs a GCP-image specific recipe that knows where to steal an unused
disk partition needed to reserve swap space for an mfs RAM disk, because
its tmpfs is deprecated and currently broken.  The configured size is
enough for our current tests but could potentially need future
expansion.  Thanks to Bilal for the disklabel incantation.

Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJJ-XrPhN%2BQA4ZUfYAAXcwOSDty9t0vE9Z8__AdacKnQg%40mail.gmail.com
2025-03-04 11:29:21 +13:00
Tomas Vondra
8492feb98f Allow parallel CREATE INDEX for GIN indexes
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.

The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.

The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).

Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-03 16:53:06 +01:00
Masahiko Sawada
7717f63006 Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.

This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.

Similar to 2e4127b6d2, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-28 10:29:36 -08:00
Masahiko Sawada
2e4127b6d2 Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.

This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.

Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-27 15:03:52 -08:00
Michael Paquier
0e42d31b0b Adding new PgStat_WalCounters structure in pgstat.h
This new structure contains the counters and the data related to the WAL
activity statistics gathered from WalUsage, separated into its own
structure so as it can be shared across more than one Stats structure in
pg_stat.h.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics.

Bump PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-26 16:48:54 +09:00
Andres Freund
ecbff4378b Change _mdfd_segpath() to return paths by value
This basically mirrors the changes done in the predecessor commit. While there
isn't currently a need to get these paths in critical sections, it seems a
shame to unnecessarily allocate memory in these paths now that relpath()
doesn't allocate anymore.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Andres Freund
37c87e63f9 Change relpath() et al to return path by value
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.

The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.

We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.

To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.

As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Michael Paquier
2421e9a51d Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time

a051e71e28 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects.  So, keeping the same information in pg_stat_wal
has little benefits.

Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.

The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-24 09:51:56 +09:00
Andrew Dunstan
8e4d72573c Ignore blank lines in pgindent exclude files
Currently a blank line matches everything, which is almost never what
someone would want. If they really want that they can use a wildcard
regex to do it.

Author: Zsolt Parragi <zsolt.parragi@percona.com>

Discussion: https://postgr.es/m/CAN4CZFNka+2q3=-Dithr4w65RJfwPaV92T62spEzLn+T4MgcMg@mail.gmail.com
2025-02-20 11:36:07 -05:00
Daniel Gustafsson
b3f0be788a Add support for OAUTHBEARER SASL mechanism
This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device
Authorization Grants, RFC 8628.  In order to use this there is a
new pg_hba auth method called oauth.  When speaking to a OAuth-
enabled server, it looks a bit like this:

  $ psql 'host=example.org oauth_issuer=... oauth_client_id=...'
  Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG

Device authorization is currently the only supported flow so the
OAuth issuer must support that in order for users to authenticate.
Third-party clients may however extend this and provide their own
flows.  The built-in device authorization flow is currently not
supported on Windows.

In order for validation to happen server side a new framework for
plugging in OAuth validation modules is added.  As validation is
implementation specific, with no default specified in the standard,
PostgreSQL does not ship with one built-in.  Each pg_hba entry can
specify a specific validator or be left blank for the validator
installed as default.

This adds a requirement on libcurl for the client side support,
which is optional to build, but the server side has no additional
build requirements.  In order to run the tests, Python is required
as this adds a https server written in Python.  Tests are gated
behind PG_TEST_EXTRA as they open ports.

This patch has been a multi-year project with many contributors
involved with reviews and in-depth discussions:  Michael Paquier,
Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey
Chudnovsky and Stephen Frost to name a few.  While Jacob Champion
is the main author there have been some levels of hacking by others.
Daniel Gustafsson contributed the validation module and various bits
and pieces; Thomas Munro wrote the client side support for kqueue.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
2025-02-20 16:25:17 +01:00
Jeff Davis
1fd1bd8710 Transfer statistics during pg_upgrade.
Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.

Some statistics are not transferred, such as extended statistics or
statistics with a custom stakind.

Now pg_dump accepts the options --schema-only, --no-schema,
--data-only, --no-data, --statistics-only, and --no-statistics; which
allow all combinations of schema, data, and/or stats. The options are
named this way to preserve compatibility with the previous
--schema-only and --data-only options.

Statistics are in SECTION_DATA, unless the object itself is in
SECTION_POST_DATA.

The stats are represented as calls to pg_restore_relation_stats() and
pg_restore_attribute_stats().

Author: Corey Huinker, Jeff Davis
Reviewed-by: Jian He
Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com
2025-02-20 01:29:06 -08:00
Álvaro Herrera
80d7f99049
Add ATAlterConstraint struct for ALTER .. CONSTRAINT
Replace the use of Constraint with a new ATAlterConstraint struct, which
allows us to pass additional information.  No functionality is added by
this commit.  This is necessary for future work that allows altering
constraints in other ways.

I (Álvaro) took the liberty of restructuring the code for ALTER
CONSTRAINT beyond what Amul did.  The original coding before Amul's
patch was unnecessarily baroque, and this change makes things simpler
by removing one level of subroutine.  Also, partly remove the assumption
that only partitioned tables are relevant (by passing sensible 'recurse'
arguments) and no longer ignore whether ONLY was specified.  I say
'partly' because the current coding only walks down via the 'conparentid'
relationship, which is only used for partitioned tables; but future
patches could handle ONLY or not for other types of constraint changes
for legacy inheritance trees too.

Author: Amul Sul <sulamul@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com
2025-02-19 13:06:13 +01:00
Amit Kapila
ac0e33136a Invalidate inactive replication slots.
This commit introduces idle_replication_slot_timeout GUC that allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen checkpoint_timeout intervals, there can be some lag
between when the idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.

Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.

The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.

Then, external tools could create replication slots (e.g., for migrations
or upgrades) that may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.

As the duration of idle_replication_slot_timeout is in minutes, any test
using that would be time-consuming. We are planning to commit a follow up
patch for tests by using the injection point framework.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-02-19 09:29:50 +05:30
Daniel Gustafsson
95ef3d9029 Add PGErrorVerbosity to typedefs.list
PGErrorVerbosity was missing which resulted in incorrect whitespace
alignment going back all the way to e3860ffa4d.  No backpatch for
this though since we don't pgindent backbranches.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com
2025-02-18 13:23:13 +01:00
Alexander Korotkov
fc069a3a63 Implement Self-Join Elimination
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result.  Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures.  Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.

This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.

  SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
  SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
  SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;

In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.

  SELECT * FROM t1 WHERE x IN
    (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);

Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.

The SJE proof is based on innerrel_is_unique() machinery.

We can remove a self-join when for each outer row:

 1. At most, one inner row matches the join clause;
 2. Each matched inner row must be (physically) the same as the outer one;
 3. Inner and outer rows have the same row mark.

In this patch, we use the next approach to identify a self-join:

 1. Collect all merge-joinable join quals which look like a.x = b.x;
 2. Add to the list above the baseretrictinfo of the inner table;
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables;
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
    possibility of self-join elimination, the inner and outer clauses must
    match exactly.

The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some of the existing regression tests changed due
to self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-02-17 12:44:12 +02:00
Andres Freund
c45963c5d5 ci: Collect core files on NetBSD and OpenBSD
Support for NetBSD and OpenBSD operating systems have been added to CI in the
prior commit. Now add support for collect core files and generating backtraces
using for all core files.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ32ySyYa06k9MFd+VY5vHhUyBpvgmJUZae5PihjzaurVg@mail.gmail.com
2025-02-12 09:40:20 -05:00
Andres Freund
e291573534 ci: Test NetBSD and OpenBSD
NetBSD and OpenBSD Postgres CI images are now generated [1], but aren't yet
utilized for Postgres' CI. This commit adds CI support for them.

For now the tasks will be manually triggered, to save on CI credits.

[1] https://github.com/anarazel/pg-vm-images

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ32ySyYa06k9MFd+VY5vHhUyBpvgmJUZae5PihjzaurVg@mail.gmail.com
2025-02-12 09:40:07 -05:00
Tom Lane
c366d2bdba Allow extension functions to participate in in-place updates.
Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)".  However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so.  Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request instead
of hard-wired logic to decide if in-place update is safe.
Preserve the previous optimizations by creating support functions
for the three functions that were previously hard-wired.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
2025-02-11 12:49:34 -05:00
Tom Lane
6c7251db0c Implement new optimization rule for updates of expanded variables.
If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
2025-02-11 12:34:59 -05:00
Andres Freund
ac00ff1c96 Add test of various escape functions
As highlighted by the prior commit, writing correct escape functions is less
trivial than one might hope.

This test module tries to verify that different escaping functions behave
reasonably. It e.g. tests:

- Invalidly encoded input to an escape function leads to invalidly encoded
  output

- Trailing incomplete multi-byte characters are handled sensibly

- Escaped strings are parsed as single statement by psql's parser (which
  derives from the backend parser)

There are further tests that would be good to add. But even in the current
state it was rather useful for writing the fix in the prior commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-1094
2025-02-10 10:03:37 -05:00
Peter Eisentraut
c09e5a6a01 Convert strategies to and from compare types
For each Index AM, provide a mapping between operator strategies and
the system-wide generic concept of a comparison type.  For example,
for btree, BTLessStrategyNumber maps to and from COMPARE_LT.  Numerous
places in the planner and executor think directly in terms of btree
strategy numbers (and a few in terms of hash strategy numbers.)  These
should be converted over subsequent commits to think in terms of
CompareType instead.  (This commit doesn't make any use of this API
yet.)

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-02 10:26:04 +01:00
Peter Eisentraut
2452e71ff2 Add script to keep .editorconfig in sync with .gitattributes
Our repo already contained an .editorconfig file, but it was not kept
up to date with .gitattributes.  This adds a script that keeps these
files in sync.  A big advantage of the editorconfig file is that it
many editors/IDEs get automatically configured to trim trailing
newlines and add a final newline on save, while .gitattributes only
complains about these problems instead of automatically fixing them.

This also adds rules to .gitattributes for Python files as well as for
C files in pg_bsd_indent directory (which have a different tab_width
than most C files due to being vendored in).

Author: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQQGzbroAXi+Yicp3HvcCo4=g84kaOgjuvQ5MW9F0ubOGg@mail.gmail.com
2025-02-01 10:09:45 +01:00
Amit Kapila
e65dbc9927 Change publication's publish_generated_columns option type to enum.
The current boolean publish_generated_columns option only supports a
binary choice, which is insufficient for future enhancements where
generated columns can be of different types (e.g., stored or virtual). The
supported values for the publish_generated_columns option are 'none' and
'stored'.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/d718d219-dd47-4a33-bb97-56e8fc4da994@eisentraut.org
Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
2025-01-23 15:28:37 +05:30
Michael Paquier
be31ac2519 Run perltidy
A follow-up patch will adjust the TAP tests to follow a more-structured
format for option lists in commands, that perltidy is able to cope
better with.  Putting the tree first in a clean state makes the next
change a bit easier.  v20230309 has been used.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87jzc46d8u.fsf@wibble.ilmari.org
2025-01-22 10:15:32 +09:00
Peter Eisentraut
6e4df237fb Add PageData C type
This adds the C type PageData and makes the existing type Page a
pointer to it.  This follows the usual PostgreSQL C type naming scheme
of Foo/FooData pairs.  (Prior to commit ddbba3aac8, PageData existed
as an unrelated type.)  The type definitions are compatible, so this
doesn't change anything except some of the naming.

Discussion: https://www.postgresql.org/message-id/flat/692ee0da-49da-4d32-8dca-da224cc2800e@eisentraut.org
2025-01-20 11:06:49 +01:00
Jeff Davis
286a365b9c Support Unicode full case mapping and conversion.
Generate tables from Unicode SpecialCasing.txt to support more
sophisticated case mapping behavior:

 * support case mappings to multiple codepoints, such as "ß"
   uppercasing to "SS"
 * support conditional case mappings, such as the "final sigma"
 * support titlecase variants, such as "dž" uppercasing to "DŽ" but
   titlecasing to "Dž"

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite
2025-01-17 15:56:20 -08:00
Melanie Plageman
f7a8fc10cc Add and use BitmapHeapScanDescData struct
Move the several members of HeapScanDescData which are specific to
Bitmap Heap Scans into a new struct, BitmapHeapScanDescData, which
inherits from HeapScanDescData.

This reduces the size of the HeapScanDescData for other types of scans
and will allow us to add additional bitmap heap scan-specific members in
the future without fear of bloating the HeapScanDescData.

Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
2025-01-16 18:42:39 -05:00
Tom Lane
d7674c9fab Seek zone abbreviations in the IANA data before timezone_abbreviations.
If a time zone abbreviation used in datetime input is defined in
the currently active timezone, use that definition in preference
to looking in the timezone_abbreviations list.  That allows us to
correctly handle abbreviations that have different meanings in
different timezones.  Also, it eliminates an inconsistency between
datetime input and datetime output: the non-ISO datestyles for
timestamptz have always printed abbreviations taken from the IANA
data, not from timezone_abbreviations.  Before this fix, it was
possible to demonstrate cases where casting a timestamp to text
and back fails or changes the value significantly because of that
inconsistency.

While this change removes the ability to override the IANA data about
an abbreviation known in the current zone, it's not clear that there's
any real use-case for doing so.  But it is clear that this makes life
a lot easier for dealing with abbreviations that have conflicts across
different time zones.

Also update the pg_timezone_abbrevs view to report abbreviations
that are recognized via the IANA data, and *not* report any
timezone_abbreviations entries that are thereby overridden.
Under the hood, there are now two SRFs, one that pulls the IANA
data and one that pulls timezone_abbreviations entries.  They're
combined by logic in the view.  This approach was useful for
debugging (since the functions can be called on their own).
While I don't intend to document the functions explicitly,
they might be useful to call directly.

Also improve DecodeTimezoneAbbrev's caching logic so that it can
cache zone abbreviations found in the IANA data.  Without that,
this patch would have caused a noticeable degradation of the
runtime of timestamptz_in.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
2025-01-16 14:11:19 -05:00
Dean Rasheed
80feb727c8 Add OLD/NEW support to RETURNING in DML queries.
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries
to explicitly return old and new values by using the special aliases
"old" and "new", which are automatically added to the query (if not
already defined) while parsing its RETURNING list, allowing things
like:

  RETURNING old.colname, new.colname, ...

  RETURNING old.*, new.*

Additionally, a new syntax is supported, allowing the names "old" and
"new" to be changed to user-supplied alias names, e.g.:

  RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ...

This is useful when the names "old" and "new" are already defined,
such as inside trigger functions, allowing backwards compatibility to
be maintained -- the interpretation of any existing queries that
happen to already refer to relations called "old" or "new", or use
those as aliases for other relations, is not changed.

For an INSERT, old values will generally be NULL, and for a DELETE,
new values will generally be NULL, but that may change for an INSERT
with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule
changes the command type. Therefore, we put no restrictions on the use
of old and new in any DML queries.

Dean Rasheed, reviewed by Jian He and Jeff Davis.

Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
2025-01-16 14:57:35 +00:00
Peter Eisentraut
44512e7c95 Add a bit of documentation related to IWYU
Add some basic information about IWYU to src/tools/pginclude/README.

Discussion: https://www.postgresql.org/message-id/flat/9395d484-eff4-47c2-b276-8e228526c8ae@eisentraut.org
2025-01-15 18:57:53 +01:00
Peter Eisentraut
6fdd5d9563 Drop warning-free support for Flex 2.5.35
This removes all the various workarounds for avoiding compiler
warnings with Flex 2.5.35.  Several recent patches have added
additional warnings that would either need to be fixed along the lines
of the existing workarounds, or we decide to no longer care about
this, which we do here.

Flex 2.5.35 is extremely outdated, and you can't even download it
anymore from any of the Flex project sites, so it's nearly impossible
to support.

After this, using Flex 2.5.35 will still work, but the generated code
will produce numerous compiler warnings.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/1a204ccd-7ae6-478c-a431-407b5c48ccc6@eisentraut.org
2025-01-15 15:35:08 +01:00
Peter Eisentraut
6339f6468e Rename RowCompareType to CompareType
RowCompareType served as a way to describe the fundamental meaning of
an operator, notionally independent of an operator class (although so
far this was only really supported for btrees).  Its original purpose
was for use inside RowCompareExpr, and it has also found some small
use outside, such as for get_op_btree_interpretation().

We want to expand this now, as a more general way to describe operator
semantics for other index access methods, including gist (to improve
GistTranslateStratnum()) and others not written yet.  To avoid future
confusion, we rename the type to CompareType and the symbols from
ROWCOMPARE_XXX to COMPARE_XXX to reflect their more general purpose.

Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-01-15 08:44:01 +01:00
Heikki Linnakangas
af8cd1639a Fix catcache invalidation of a list entry that's being built
If a new catalog tuple is inserted that belongs to a catcache list
entry, and cache invalidation happens while the list entry is being
built, the list entry might miss the newly inserted tuple.

To fix, change the way we detect concurrent invalidations while a
catcache entry is being built. Keep a stack of entries that are being
built, and apply cache invalidation to those entries in addition to
the real catcache entries. This is similar to the in-progress list in
relcache.c.

Back-patch to all supported versions.

Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
2025-01-14 14:28:49 +02:00
Michael Paquier
2c14037bb5 Refactor some code related to backend statistics
This commit changes the way pending backend statistics are tracked by
moving them into a new structure called PgStat_BackendPending, removing
PgStat_BackendPendingIO.  PgStat_BackendPending currently only includes
PgStat_PendingIO for the pending I/O stats.

pgstat_flush_backend() is extended with a "flags" argument to control
which parts of the stats of a backend should be flushed.

With this refactoring, it becomes easier to plug into backend statistics
more data.  A patch to add information related to WAL in this stats kind
is under discussion.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-01-10 09:00:48 +09:00