get_pg_version() is able to return a version number, that can be used
for comparisons based on PG_VERSION_NUM. A macro is added to convert
the result to a major version number, to work with PG_MAJORVERSION_NUM.
It is possible to pass to the routine an optional argument, where the
contents retrieved from PG_VERSION are saved. This requirement matters
for some of the frontend code (one example: pg_upgrade wants that for
tablespace paths with a version number strictly older than v10).
This will be used by a set of follow-up patches, to be consumed in
various frontend tools that duplicate a logic similar to do what this
new routine does, like:
- pg_resetwal
- pg_combinebackup
- pg_createsubscriber
- pg_upgrade
This routine supports both the post-v10 version number and the older
flavor (aka 9.6), as required at least by pg_upgrade.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.
This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).
The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences that is useful for
upgrades.
Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
\d shows publications for a given sequence.
\dRp indicates if a publication includes all sequences.
ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in future.
The view pg_publication_sequences provides information about the mapping
between publications and sequences.
This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.
Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
Previously StrategyGetBuffer() acquired the buffer header spinlock for every
buffer, whether it was reusable or not. If reusable, it'd be returned, with
the lock held, to GetVictimBuffer(), which then would pin the buffer with
PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for
holding spinlocks (i.e. that they are only held for a few lines of consecutive
code) and necessitates using PinBuffer_Locked(), which scales worse than
PinBuffer() due to holding the spinlock. This alone makes it worth changing
the code.
However, the main reason to change this is that a future commit will make
PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain
scalability for the much more common case of pinning a pre-existing buffer. By
pinning the buffer with a single atomic operation, iff the buffer is reusable,
we avoid any potential regression for miss-heavy workloads. There strictly are
fewer atomic operations for each potential buffer after this change.
The price for this improvement is that freelist.c needs two CAS loops and
needs to be able to set up the resource accounting for pinned buffers. The
latter is achieved by exposing a new function for that purpose from bufmgr.c,
that seems better than exposing the entire private refcount infrastructure.
The improvement seems worth the complexity.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().
The biggest part of the change is in assertions, by introducing
BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This
seems like an improvement even without aforementioned plans.
Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This commit introduces a new column mem_exceeded_count to the
pg_stat_replication_slots view. This counter tracks how often the
memory used by logical decoding exceeds the logical_decoding_work_mem
limit. The new statistic helps users determine whether exceeding the
logical_decoding_work_mem limit is a rare occurrences or a frequent
issue, information that wasn't available through existing statistics.
Bumps catversion.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
In the same spirit as 3bf905692, assume that all compilers we still
support provide the NAN macro, and get rid of workarounds for that.
The C standard allows implementations to omit NAN if the underlying
float arithmetic lacks quiet (non-signaling) NaNs. However, we've
required that feature for years: the workarounds only supported
lack of the macro, not lack of the functionality. I put in a
compile-time #error if there's no macro, just for clarity.
Also fix up the copies of these functions in ecpglib, and leave
a breadcrumb for the next hacker who touches them.
History of the hacks being removed here can be found in commits
1bc2d544b, 4d17a2146, cec8394b5.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1952095.1759764279@sss.pgh.pa.us
Extensions can stash data computed at plan time into this list using
planner_shutdown_hook (or perhaps other mechanisms) and then access
it from any code that has access to the PlannedStmt (such as explain
hooks), allowing for extensible debugging and instrumentation of
plans.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
These hooks allow plugins to get control at the earliest point at
which the PlannerGlobal object is fully initialized, and then just
before it gets destroyed. This is useful in combination with the
extendable plan state facilities (see extendplan.h) and perhaps for
other purposes as well.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
This allows extensions to have access to any data they've stored
in the ExplainState during planning. Unfortunately, it won't help
with EXPLAIN EXECUTE is used, but since that case is less common,
this still seems like an improvement.
Since planner() has quite a few arguments now, also add some
documentation of those arguments and the return value.
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
Per idea from Robert Haas, though applied differently than originally
suggested.
Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
It is possible to call pg_stat_reset_single_function_counters() for a
single function, but the reset time was missing the system view showing
its statistics. Like all the fields of pg_stat_user_functions, the GUC
track_functions needs to be enabled to show the statistics about
function executions.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatFuncEntry.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz
The INFINITY macro is always defined per C99 standard, so this should
mean we can now get rid of the workaround code for when that macro isn't
defined.
Also, delete the (now unneeded) #pragma code which was disabling a
compiler warning in MSVC. There was a comment explaining why the #pragma
was placed outside the function body to work around a MSVC compiler bug,
but the link explaining that was dead, as reported by jian he.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGARYETnNwtCK7QC0zE_7gq-tfN0mME=gT5rTNtC=VSHQ@mail.gmail.com
Instead, use the new mechanism that allows planner extensions to store
private state inside a PlannerInfo, treating GEQO as an in-core planner
extension. This is a useful test of the new facility, and also buys
back a few bytes of storage.
To make this work, we must remove innerrel_is_unique_ext's hack of
testing whether join_search_private is set as a proxy for whether
the join search might be retried. Add a flag that extensions can
use to explicitly signal their intentions instead.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Extension that make extensive use of planner hooks may want to
coordinate their efforts, for example to avoid duplicate computation,
but that's currently difficult because there's no really good way to
pass data between different hooks.
To make that easier, allow for storage of extension-managed private
state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very
similar lines to what we have permitted for ExplainState since commit
c65bc2e1d1.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Previously, subqueries were given names only after they were planned,
which makes it difficult to use information from a previous execution of
the query to guide future planning. If, for example, you knew something
about how you want "InitPlan 2" to be planned, you won't know whether
the subquery you're currently planning will end up being "InitPlan 2"
until after you've finished planning it, by which point it's too late to
use the information that you had.
To fix this, assign each subplan a unique name before we begin planning
it. To improve consistency, use textual names for all subplans, rather
than, as we did previously, a mix of numbers (such as "InitPlan 1") and
names (such as "CTE foo"), and make sure that the same name is never
assigned more than once.
We adopt the somewhat arbitrary convention of using the type of sublink
to set the plan name; for example, a query that previously had two
expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up
named expr_1 and expr_2. Because names are assigned before rather than
after planning, some of the regression test outputs show the numerical
part of the name switching positions: what was previously SubPlan 2 was
actually the first one encountered, but we finished planning it later.
We assign names even to subqueries that aren't shown as such within the
EXPLAIN output. These include subqueries that are a FROM clause item or
a branch of a set operation, rather than something that will be turned
into an InitPlan or SubPlan. The purpose of this is to make sure that,
below the topmost query level, there's always a name for each subquery
that is stable from one planning cycle to the next (assuming no changes
to the query or the database schema).
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
The hex_encode() and hex_decode() functions serve as the workhorses
for hexadecimal data for bytea's text format conversion functions,
and some workloads are sensitive to their performance. This commit
adds new implementations that use routines from port/simd.h, which
testing indicates are much faster for larger inputs. For small or
invalid inputs, we fall back on the existing scalar versions.
Since we are using port/simd.h, these optimizations apply to both
x86-64 and AArch64.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com>
Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan
This patch enhances the pg_get_sequence_data function to include the
page-level LSN (Log Sequence Number) of the sequence. This additional
metadata will be used by upcoming patches to support synchronization
of sequences during logical replication.
By exposing the LSN, we enable more accurate tracking of sequence
changes, which is essential for maintaining consistency across
replicated nodes.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
All the other structures describe the list of blocks used, and in the
case of a GIN_INSERT_LISTPAGE record block 0 refers to a list page with
the items added to it.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
It is possible to call pg_stat_reset_single_table_counters() on a
relation (index or table) but the reset time was missing from the system
views showing their statistics.
This commit adds the reset time as an attribute of pg_stat_all_tables,
pg_stat_all_indexes, and other relations related to them.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatTabEntry.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
Presently, the SIMD implementation of this function uses unsigned
saturating subtraction to find bytes less than or equal to the
given value, which is a workaround for the lack of unsigned
comparison instructions on some architectures. However, Neon
offers vminvq_u8(), which returns the minimum (unsigned) value in
the vector. This commit adds a Neon-specific implementation that
uses vminvq_u8() to optimize vector8_has_le() on AArch64.
In passing, adjust the SSE2 implementation to use vector8_min() and
vector8_eq() to find values less than or equal to the given value.
This was the only use of vector8_ssub(), so it has been removed.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aNHDNDSHleq0ogC_%40nathan
Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead,
lag, first_value, last_value and nth_value window functions. If
unspecified, the default is RESPECT NULLS which includes NULL values
in any result calculation. IGNORE NULLS ignores NULL values.
Built-in window functions are modified to call new API
WinCheckAndInitializeNullTreatment() to indicate whether they accept
IGNORE NULLS/RESPECT NULLS option or not (the API can be called by
user defined window functions as well). If WinGetFuncArgInPartition's
allowNullTreatment argument is true and IGNORE NULLS option is given,
WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return
evaluated function's argument expression on specified non NULL row (if
it exists) in the partition or the frame.
When IGNORE NULLS option is given, window functions need to visit and
evaluate same rows over and over again to look for non null rows. To
mitigate the issue, 2-bit not null information array is created while
executing window functions to remember whether the row has been
already evaluated to NULL or NOT NULL. If already evaluated, we could
skip the evaluation work, thus we could get better performance.
Author: Oliver Ford <ojford@gmail.com>
Co-authored-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com>
Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
Stop including utils/relcache.h in access/genam.h, and stop including
htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and
tidbitmap.h) are widely used in other header files, so it's in our best
interest that they remain as lean as reasonable.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does
not contain an aggregate or window function into the groupClause of
the query, making it exactly equivalent to specifying those same
expressions in an explicit GROUP BY list.
This feature is useful for certain kinds of data exploration. It's
already present in some other DBMSes, and the SQL committee recently
accepted it into the standard, so we can be reasonably confident in
the syntax being stable. We do have to invent part of the semantics,
as the standard doesn't allow for expressions in GROUP BY, so they
haven't specified what to do with window functions. We assume that
those should be treated like aggregates, i.e., left out of the
constructed GROUP BY list.
In passing, wordsmith some existing documentation about GROUP BY,
and update some neglected synopsis entries in select_into.sgml.
Author: David Christensen <david@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com
Stats kinds can set a new option called "track_entry_count" (disabled by
default, available for variable-numbered stats) that will make pgstats
track the number of entries that exist in its shared hashtable.
As there is only one code path where a new entry is added, and one code
path where entries are freed, the count tracking is straight-forward in
its implementation. Reads of these counters are optimistic, and may
change across two calls. The counter is incremented when an entry is
created (not when reused), and is decremented when an entry is freed
from the hashtable (marked for drop with its refcount reaching 0), which
is something that pgstats decides internally.
A first use case of this facility would be pg_stat_statements, where we
need to be able to cap the number of entries that would be stored in the
shared hashtable, based on its "max" GUC. The module currently relies
on hash_get_num_entries(), which offers a cheap way to count how many
entries are in its hash table, but we cannot do that in pgstats for
variable-sized stats kinds as a single hashtable is used for all the
stats kinds. Independently of PGSS, this is useful for other custom
stats kinds that want to cap, control, or track the number of entries
they have, without depending on a potentially expensive sequential scan
to know the number of entries while holding an extra exclusive lock.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Keisuke Kuroda <keisuke.kuroda.3862@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz
It had always been intended that this already works correctly as
pg_unreachable() uses __assume(0) on MSVC, and that directs the compiler
in a way so it knows that a given function won't return. However, with
ereport_domain(), it didn't work...
It's now understood that the failure to determine that elog(ERROR) does not
return comes from the inability of the MSVC compiler to detect the "const
int elevel_" is the same as the "elevel" macro parameter. MSVC seems to be
unable to make the "if (elevel_ >= ERROR) branch constantly-true when the
macro is used with any elevel >= ERROR, therefore the pg_unreachable() is
seen to only be present in a *conditional* branch rather than present
*unconditionally*.
While there seems to be no way to force the compiler into knowing that
elevel_ is equal to elevel within the ereport_domain() macro, there is a
way in C11 to determine if the elevel parameter is a compile-time
constant or not. This is done via some hackery using the _Generic()
intrinsic function, which gives us functionality similar to GCC's
__builtin_constant_p(), albeit only for integers.
Here we define pg_builtin_integer_constant_p() for this purpose.
Callers can check for availability via HAVE_PG_BUILTIN_INTEGER_CONSTANT_P.
ereport_domain() has been adjusted to use
pg_builtin_integer_constant_p() instead of __builtin_constant_p().
It's not quite clear at this stage if this now allows us to forego doing
the likes of "return NULL; /* keep compiler quiet */" as there may be other
compilers in use that have similar struggles. It's just a matter of time
before someone commits a function that does not "return" a value after
an elog(ERROR). Let's make time and lack of complaints about said commit
be the judge of if we need to continue the "/* keep compiler quiet */"
palaver.
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvom02B_XNVSkvxznVUyZbjGAR+5myA89ZcbEd3=PA9UcA@mail.gmail.com
Use our established coding pattern to reduce maintenance pain when
adding other per-process-type characteristics.
Like PG_KEYWORD, PG_CMDTAG, PG_RMGR.
To keep the strings translatable, the relevant makefile now also scans
src/include for this specific file. I didn't want to have it scan all
.h files, as then gettext would have to scan all header files. I didn't
find any way to affect the meson behavior in this respect though.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Discussion: https://postgr.es/m/202507151830.dwgz5nmmqtdy@alvherre.pgsql
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places. Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.
Backpatch to 18, where conflict.h first appeared.
Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
This commit corrects several issues in function comments:
* The parameter "rel" was incorrectly referred to as "relation" in the comments
for table_tuple_delete(), table_tuple_update(), and table_tuple_lock().
* In table_tuple_delete(), "changingPart" was listed as an output parameter
in the comments but is actually input.
* In table_tuple_update(), "slot" was listed as an input parameter
in the comments but is actually output.
* The comment for "update_indexes" in table_tuple_update() was mis-indented.
* The comments for heap_lock_tuple() incorrectly referenced a non-existent
"tid" parameter.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2nB6Ay8g=KEn7L3qbYX_4+sLk9XOMkV0XZqHR4cTY8ZvQ@mail.gmail.com
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.
Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.
Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted. Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.
The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.
Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not. In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that. What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.
Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want. (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)
Notably, this works even when none of the values are common enough
to be called MCEs. In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates. So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines. A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot. That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on. The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.
In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data. Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.
Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.
Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
Commit 216a784829 introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.
This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.
This change allows the non-builtin logical replication solutions to
implement parallel apply for large transactions.
Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.
Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
This change is a tighter rework of 7d85d87f4d, which tried to improve
the code so as it would work should PgStat_HashKey gain new fields that
create padding bytes. However, the previous change is proving to not be
enough as some code paths of pgstats do not pass PgStat_HashKey by
reference (valgrind would warn when padding is added to the structure,
through a new field).
Per discussion, let's document and check that PgStat_HashKey has no
padding rather than try to complicate the code of pgstats so as it is
able to work around that.
This removes a couple of memset(0) calls that should not be required.
While on it, this commit adds a static assertion checking that no
padding is introduced in the structure, by checking that the size of
PgStat_HashKey matches with the sum of the size of all its fields.
The object ID part of the hash key is already 8 bytes, which should be
plenty enough already. A comment is added to discourage the addition of
new fields.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0t9omat+HVSakJXwTMWvhpYFcAZb41RPWKwrKFUgmAFBQ@mail.gmail.com
Commit 2a600a93 made Datum 8 bytes wide everywhere. It was no longer
appropriate to use TypeSizeT on 32 bit systems, and JIT compilation
would fail with various type check errors. Introduce a separate
LLVMTypeRef with the name TypeDatum. TypeSizeT is still used in some
places for actual size_t values.
Reported-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Discussion: https://postgr.es/m/0a9f0be59171c2e8f1b3bc10f4fcf267%40postgrespro.ru
Up to now we've contented ourselves with a one-size-fits-all error
hint when we fail to find any match to a function or procedure call.
That was mostly okay in the beginning, but it was never great, and
since the introduction of named arguments it's really not adequate.
We at least ought to distinguish "function name doesn't exist" from
"function name exists, but not with those argument names". And the
rules for named-argument matching are arcane enough that some more
detail seems warranted if we match the argument names but the call
still doesn't work.
This patch creates a framework for dealing with these problems:
FuncnameGetCandidates and related code will now pass back a bitmask of
flags showing how far the match succeeded. This allows a considerable
amount of granularity in the reports. The set-bits-in-a-bitmask
approach means that when there are multiple candidate functions, the
report will reflect the match(es) that got the furthest, which seems
correct. Also, we can avoid mentioning "maybe add casts" unless
failure to match argument types is actually the issue.
Extend the same return-a-bitmask approach to OpernameGetCandidates.
The issues around argument names don't apply to operator syntax,
but it still seems worth distinguishing between "there is no
operator of that name" and "we couldn't match the argument types".
While at it, adjust these messages and related ones to more strictly
separate "detail" from "hint", following our message style guidelines'
distinction between those.
Reported-by: Dominique Devienne <ddevienne@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us