The test has been reported as having a race condition for the case of a
worker that should be terminated after a database rename. Based on the
report received from buildfarm member jay, the database renamed is
accessed by a different session, preventing the ALTER DATABASE to
complete, ultimately failing the test.
Honestly, I am not completely sure what is the origin of this
disturbance, but two possibilities are an autovacuum or parallel worker
(due to debug_parallel_query being used by the host). In order to
(hopefully) stabilize the test, autovacuum and debug_parallel_query are
now disabled in the configuration of the node used in the test.
The failure is hard to reproduce, so it will take a few weeks to make
sure that the test has become stable. Let's see where it goes.
Reported-by: Aya Iwata <iwata.aya@fujitsu.com>
Discussion: https://postgr.es/m/OS3PR01MB8889505E2F3E443CCA4BD72EEA45A@OS3PR01MB8889.jpnprd01.prod.outlook.com
Previously, this was 16 bytes. With the use of some bitflags and by
reducing the attcacheoff field size to a 16-bit type, we can halve the
size of the struct.
It's unlikely that caching the offsets for offsets larger than what will
fit in a 16-bit int will help much as the tuple is very likely to have
some non-fixed-width types anyway, the offsets of which we cannot cache.
Shrinking this down to 8 bytes helps by accessing fewer cachelines when
performing tuple deformation. The fields used there are all fully
fledged fields, which don't require any bitmasking to extract the value
of. It also helps to more efficiently calculate the address of a
compact_attrs[] element in TupleDesc as the x86 LEA instruction can work
with 8 byte offsets, which allows the element address to be calculated
from the TupleDesc's address in a single instruction using LEA's
concurrent shift and add.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com
Commit 6eedb2a5fd made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.
A similar issue previously existed in the GiST code. Commit b1f14c9672
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().
This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.
Backpatch to all supported versions.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux
Backpatch-through: 14
This code was recently adjusted by c456e3911, but that commit didn't get
the logic correct when finding the attnum to start walking the tuple in.
If there is a NULL, we need to start walking the tuple before it.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNnb-s_=VdVUZ9h7dPA0u3hxV8x2aU3obZytnqQZ_MiROA@mail.gmail.com
TOK_IDENT allows only non-keywords; identifier should be used
any place where either keywords or non-keywords should be accepted.
Hence, without this commit, any string that happens to be a keyword
can't be used as a partition schema, partition name, or plan name,
which is incorrect.
Author: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CAP53PkzKeD=t90OfeMsniYrcRe2THQbUx3g6wV17Y=ZtiwmWTQ@mail.gmail.com
The fundamental problem is that when we call clang to generate
bitcode, we might be using configure results from a different
compiler, which might not be fully compatible with the clang we are
using. In practice, clang supports most things other compilers
support, so this has apparently not been a problem in practice.
But commits 4cfce4e62c, 0af05b5dbb, and 59292f7aac have been
struggling to make typeof_unqual work in this situation. Clang added
support in version 19, GCC in version 14, so if you are using, say,
GCC 14 and Clang 16, the compilation with the latter will fail. Such
combinations are not very likely in practice, because GCC 14 and Clang
19 were released within a few months of each other, and so Linux
distributions are likely to have suitable combinations. But some
buildfarm members and some Fedora versions are affected, so this tries
to fix it.
The fully correct solution would be to run a separate set of configure
tests for that clang-for-bitcode, but that would be very difficult to
implement, and probably of limited use in practice. So the workaround
here is that we hardcodedly override the configure result under clang
based on the version number. As long as we only have a few of these
cases, this should be manageable.
Also swap the order of the tests of typeof_unqual: Commit 59292f7aac
tested the underscore variant first, but the reasons for that are now
gone.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
This commit teaches pg_dumpall to fail when both --clean and
--data-only are specified. Previously, it passed the options
through to pg_dump, which would fail after pg_dumpall had already
started producing output. Like recent commits b2898baaf7 and
7c8280eeb5, no back-patch.
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAKYtNArrHiJ0LDB9BFZiUWs6tC78QkBN50wiwO07WhxewYDS3Q%40mail.gmail.com
Remove a bunch of #include lines from execnodes.h. Most of these
requier suitable typedefs to be added, so that it still compiles
standalone. In one case, the fix is to move a struct definition to the
one .c file where it is needed.
Also some light clean up in plannodes.h and genam.h, though not as
extensive as in execnodes.h.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql
Commit 8fe315f18d added the stats_reset column to pg_statio_all_sequences and
included a regression test to verify that statistics in this view are reset
correctly. However, this test caused buildfarm member crake to report
a pg_upgradeCheck failure.
The failing test assumed that the blks_read and blks_hit counters
in pg_statio_all_sequences would be zero after calling
pg_stat_reset_single_table_counters(). On crake, however, either blks_read or
blks_hit sometimes appeared as 1 during the pg_upgradeCheck test, even right
after the reset.
Since these counters may change due to concurrent activity and the test is
unstable, this commit removes the checks for blks_read and blks_hit in
pg_statio_all_sequences from the regression test.
Per buildfarm member crake.
Discussion: https://postgr.es/m/CAHGQGwFcay_tX=7HSS=N=+Yd0FLEm2GrJgwxnqHM4wvxX0B=4g@mail.gmail.com
When an extension is located via extension_control_path and it has a
hardcoded $libdir/ path, this is stripped by the
extension_control_path mechanism. But when pg_upgrade verifies the
extension using LOAD, this stripping does not happen, and so
pg_upgrade will fail because it cannot load the extension. To work
around that, change pg_upgrade to itself strip the prefix when it runs
its checks. A test case is also added.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/43b3691c673a8b9158f5a09f06eacc3c63e2c02d.camel%40gmail.com
They were already using pg_attribute_aligned. This replaces that with
alignas and moves that into the required syntactic position.
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/d7a788fa-e609-4894-a8be-2f70e135424f%40eisentraut.org
A following commit will enable -Wstrict-prototypes and -Wold-style-definition
by default. This commit fixes the warnings that those new flags will generate
before actually adding the new flags.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/13d51b20-a69c-4ac1-8546-ec4fc278064f%40eisentraut.org
Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
When log_lock_waits is enabled, the "still waiting on lock" message is normally
emitted only once while a session continues waiting. However, if the wait is
interrupted, for example by wakeups from client_connection_check_interval,
SIGHUP for configuration reloads, or similar events, the message could be
emitted again each time the wait resumes.
For example, with very small client_connection_check_interval values
(e.g., 100 ms), this behavior could flood the logs with repeated messages,
making them difficult to use.
To prevent this, this commit guards the "still waiting on lock" message so
it is reported at most once during a lock wait, even if the wait is interrupted.
This preserves the intended behavior when no interrupts occur.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hüseyin Demir <huseyin.d3r@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHZUmg+r4kMcPYt_Z-txxVX+CJJhfra+qemxKXvAxYbpw@mail.gmail.com
ALTER TABLE .. CLUSTER ON and SET WITHOUT CLUSTER are not supported for
partitioned tables and already fail with a check happening when the
sub-command is executed, not when it is prepared.
This commit moves the relkind check for partitioned tables to happen
when the sub-command is prepared in ATSimplePermissions(). This matches
with the practice of the other sub-commands of ALTER TABLE, shaving one
translatable string.
mark_index_clustered() can be a bit simplified, switching one
elog(ERROR) to an assertion. Note that mark_index_clustered() can also
be called through a CLUSTER command, but it cannot be reached for a
partitioned table, per the assertion based on the relkind in
cluster_rel(), and there is only one caller of rebuild_relation().
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAEoWx2kggo1N2kDH6OSfXHL_5gKg3DqQ0PdNuL4LH4XSTKJ3-g@mail.gmail.com
pg_statio_all_sequences lacked a stats_reset column, unlike the other
pg_statio_* views that already expose it. This commit adds the column so
users can see when the statistics in this view were last reset.
Also this commit updates the documentation for
pg_stat_reset_single_table_counters() to clarify that it can reset statistics
for sequences and materialized views as well.
Catalog version bumped.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Shihao Zhong <zhong950419@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v0OPGyDpwxkX81CtTt9xsj9-TNxhm=8JdOvEKPsVVFNg@mail.gmail.com
Commit 4daa140a2f introduced proper decoding for speculative aborts. As a
result, the internal state is guaranteed to be clean when a new
speculative insert is encountered. This patch removes the defensive
cleanup code that is no longer reachable.
Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/23256.1772702981@localhost
Commit 2a525cc97e introduced the ON_ERROR = 'set_null' option for COPY,
allowing it to be used with foreign tables backed by file_fdw. However,
unlike ON_ERROR = 'ignore', no regression test was added to verify
this behavior for file_fdw.
This commit adds a regression test to ensure that foreign tables using
file_fdw work correctly with ON_ERROR = 'set_null', improving test coverage.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yi Ding <dingyi_yale@163.com>
Discussion: https://postgr.es/m/CAHGQGwGmPc6aHpA5=WxKreiDePiOEitfOFsW2dSo5m81xWXgRA@mail.gmail.com
This commit refactors hashbulkdelete() to use streaming reads, improving
the efficiency of the operation by prefetching upcoming buckets while
processing a current bucket. There are some specific changes required
to make sure that the cleanup work happens in accordance to the data
pushed to the stream read callback. When the cached metadata page is
refreshed to be able to process the next set of buckets, the stream is
reset and the data fed to the stream read callback has to be updated.
The reset needs to happen in two code paths, when _hash_getcachedmetap()
is called.
The author has seen better performance numbers than myself on this one
(with tweaks similar to 6c228755ad). The numbers are good enough for
both of us that this change is worth doing, in terms of IO and runtime.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
We had defenses against -ffast-math in timestamp-related files,
which is a pretty obsolete place for them since we've not supported
floating-point timestamps in a long time. Remove those and instead
put one in float.c, which is still broken by using this switch.
Add some commentary to put more color on why it's a bad idea.
Also remove the check from configure. That was just there to fail
faster, but it doesn't really seem necessary anymore, and besides
we have no corresponding check in meson.build.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/abFXfKC8zR0Oclon%40ip-10-97-1-34.eu-west-3.compute.internal
Print an OID value inserted into a SQL query with %u not %d.
The existing code accidentally fails to malfunction when
given an OID above 2^31, but only accidentally; future changes
to our SQL parser could perhaps break it.
Declare the Oid values that ecpg_type_infocache_push() and
ecpg_is_type_an_array() work with as "Oid" not "int".
This doesn't have any functional effect, but it's clearer.
At the moment I don't see a need to back-patch this.
Bug: #19429
Author: fairyfar@msn.com
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19429-aead3b1874be1a99@postgresql.org
This commit includes various optimizations to improve the performance of
tuple deformation.
We now precalculate CompactAttribute's attcacheoff, which allows us to
remove the code from the deform routines which was setting the
attcacheoff. Setting the attcacheoff is now handled by
TupleDescFinalize(), which must be called before the TupleDesc is used for
anything. Having TupleDescFinalize() means we can store the first
attribute in the TupleDesc which does not have an offset cached. That
allows us to add a dedicated deforming loop to deform all attributes up
to the final one with an attcacheoff set, or up to the first NULL
attribute, whichever comes first.
Here we also improve tuple deformation performance of tuples with NULLs.
Previously, if the HEAP_HASNULL bit was set in the tuple's t_infomask,
deforming would, one-by-one, check each and every bit in the NULL bitmap
to see if it was zero. Now, we process the NULL bitmap 1 byte at a time
rather than 1 bit at a time to find the attnum with the first NULL. We
can now deform the tuple without checking for NULLs up to just before that
attribute.
We also record the maximum attribute number which is guaranteed to exist
in the tuple, that is, has a NOT NULL constraint and isn't an
atthasmissing attribute. When deforming only attributes prior to the
guaranteed attnum, we've no need to access the tuple's natt count. As an
additional optimization, we only count fixed-width columns when
calculating the maximum guaranteed column, as this eliminates the need to
emit code to fetch byref types in the deformation loop for guaranteed
attributes.
Some locations in the code deform tuples that have yet to go through NOT
NULL constraint validation. We're unable to perform the guaranteed
attribute optimization when that's the case. This optimization is opt-in
via the TupleTableSlot using the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS
flag.
This commit also adds a more efficient way of populating the isnull
array by using a bit-wise SWAR trick which performs multiplication on the
inverse of the tuple's bitmap byte and masking out all but the lower bit
of each of the boolean's byte. This results in much more optimal code
when compared to determining the NULLness via att_isnull(). 8 isnull
elements are processed at once using this method, which means we need to
round the tts_isnull array size up to the next 8 bytes. The palloc code
does this anyway, but the round-up needed to be formalized so as not to
overwrite the sentinel byte in MEMORY_CONTEXT_CHECKING builds. Doing
this also allows the NULL-checking deforming loop to more efficiently
check the isnull array, rather than doing the bit-wise processing for each
attribute that att_isnull() does.
The level of performance improvement from these changes seems to vary
depending on the CPU architecture. Apple's M chips seem particularly
fond of the changes, with some of the tested deform-heavy queries going
over twice as fast as before. With x86-64, the speedups aren't quite as
large. With tables containing only a small number of columns, the
speedups will be less.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
As of this commit all TupleDescs must have TupleDescFinalize() called on
them once the TupleDesc is set up and before BlessTupleDesc() is called.
In this commit, TupleDescFinalize() does nothing. This change has only
been separated out from the commit that properly implements this function
to make the change more obvious. Any extension which makes its own
TupleDesc will need to be modified to call the new function.
The follow-up commit which properly implements TupleDescFinalize() will
cause any code which forgets to do this to fail in assert-enabled builds in
BlessTupleDesc(). It may still be worth mentioning this change in the
release notes so that extension authors update their code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
CatalogCacheCreateEntry() computed the space needed for a CatCTup
as sizeof(CatCTup) + MAXIMUM_ALIGNOF. That's not our usual style,
and it wastes memory by allocating more padding than necessary.
On 64-bit machines sizeof(CatCTup) would be maxaligned already
since it contains pointer fields, therefore this code is wasting
8 bytes compared to the more usual MAXALIGN(sizeof(CatCTup)).
While at it, we don't really need to do MemoryContextSwitchTo()
when we're only allocating one block.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/tencent_A42E0544C6184FE940CD8E3B14A3F0A39605@qq.com
Coverity complained that this function leaked the dumpdirpath string,
which it did. But we don't need to make a copy at all, because
there's not really any point in trimming trailing slashes from the
directory name here. If that were needed, the initial
file_exists_in_directory() test would have failed, since it doesn't
bother with that (and neither does anyplace else in this file).
Moreover, if we did want that, reimplementing canonicalize_path()
poorly is not the way to proceed. Arguably, all of this code should
be reexamined with an eye to using src/port/path.c's facilities, but
for today I'll settle for getting rid of the memory leak.
Future commits will use the visibility map in on-access pruning to fix
VM corruption and set the VM if the page is all-visible.
Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/C3AB3F5B-626E-4AAA-9529-23E9A20C727F%40gmail.com
BufferGetPage() isn't cheap and heap_update() calls it multiple times
when it could just save the page from a single call. Do that.
While we are at it, make separate variables for old and new page in
heap_xlog_update(). It's confusing to reuse "page" for both pages.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_a%2BhO4PCptyaPR7AMZd7FjcHfOFKKJT8ouU3KedMud0tQ%40mail.gmail.com
Solaris descendants (Illumos, OpenIndiana, OmniOS, etc.) hit System V
semaphore limits ("No space left on device" from semget) when running
many parallel test scripts under default system settings. We could
tell people to raise those settings, but there's a better answer.
Unnamed POSIX semaphores have been available on Solaris for decades
and work well, so prefer them, as was recently done for AIX.
This patch also updates the documentation to remove now-unnecessary
advice about raising project.max-sem-ids and project.max-msg-ids.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/470305.1772417108@sss.pgh.pa.us
"initdb -d" has been broken since commit f95d73ed4, because I changed
aclitemin to work in bootstrap mode but failed to consider aclitemout.
That routine isn't reached by default, but it is if the elog message
level is high enough, so it needs to work without catalog access too.
This patch just makes it use its existing code paths to print role
OIDs numerically. We could alternatively invent an inverse of
boot_get_role_oid() and print them symbolically, but that would take
more code and it's not apparent that it'd be any better for debugging
purposes.
Reported-by: Greg Burd <greg@burd.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4416.1773328045@sss.pgh.pa.us
The comment about ParallelWorkerNumbr in parallel.c says:
In parallel workers, it will be set to a value >= 0 and < the number
of workers before any user code is invoked; each parallel worker will
get a different parallel worker number.
However asserts in various places collecting instrumentation allowed
(ParallelWorkerNumber == num_workers). That would be a bug, as the value
is used as index into an array with num_workers entries.
Fixed by adjusting the asserts accordingly. Backpatch to all supported
versions.
Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 14
This commit plugs into pgstattuple_approx(), the SQL function faster
than pgstattuple() that returns approximate results, the streaming read
APIs. A callback is used to be able to skip all-visible pages via VM
lookup, to match with the logic prior to this commit.
Under test conditions similar to 6c228755ad (some dm_delay and
debug_io_direct=data), this can substantially improve the execution time
of the function, particularly for large relations.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
This changes the TupleTableSlotOps contract to make it so the
getsomeattrs() function is in charge of calling
slot_getmissingattrs().
Since this removes all code from slot_getsomeattrs_int() aside from the
getsomeattrs() call itself, we may as well adjust slot_getsomeattrs() so
that it calls getsomeattrs() directly. We leave slot_getsomeattrs_int()
intact as this is still called from the JIT code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com
Use fake LSNs in all nbtree critical sections that write a WAL record.
That way we can safely apply the _bt_killitems LSN trick with logged and
unlogged indexes alike. This brings the same benefits to plain scans of
unlogged relations that commit 2ed5b87f brought to plain scans of logged
relations: scans will drop their leaf page pin eagerly (by applying the
"dropPin" optimization), which avoids blocking progress by VACUUM. This
is particularly helpful with applications that allow a scrollable cursor
to remain idle for long periods.
Preparation for an upcoming commit that will add the amgetbatch
interface, and switch nbtree over to it (from amgettuple) to enable I/O
prefetching. The index prefetching read stream's effective prefetch
distance is adversely affected by any buffer pins held by the index AM.
At the same time, it can be useful for prefetching to read dozens of
leaf pages ahead of the scan to maintain an adequate prefetch distance.
The index prefetching patch avoids this tension by always eagerly
dropping index page pins of the kind traditionally held as an interlock
against unsafe concurrent TID recycling by VACUUM (essentially the same
way that amgetbitmap routines have always avoided holding onto pins).
The work from this commit makes that possible during scans of nbtree
unlogged indexes -- without our having to give up on setting LP_DEAD
bits on index tuples altogether.
Follow-up to commit d774072f, which moved the fake LSN infrastructure
out of GiST so that it could be used by other index AMs.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
Move utility functions used by GiST to generate fake LSNs into xlog.c
and xloginsert.c, so that other index AMs can also generate fake LSNs.
Preparation for an upcoming commit that will add support for fake LSNs
to nbtree, allowing its dropPin optimization to be used during scans of
unlogged relations. That commit is itself preparation for another
upcoming commit that will add a new amgetbatch/btgetbatch interface to
enable I/O prefetching.
Bump XLOG_PAGE_MAGIC due to XLOG_GIST_ASSIGN_LSN becoming
XLOG_ASSIGN_LSN.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
The function used GetXLogInsertRecPtr() to generate the fake LSN. Most
of the time this is the same as what XLogInsert() would return, and so
it works fine with the XLogFlush() call. But if the last record ends at
a page boundary, GetXLogInsertRecPtr() returns LSN pointing after the
page header. In such case XLogFlush() fails with errors like this:
ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000
Such failures are very hard to trigger, particularly outside aggressive
test scenarios.
Fixed by introducing GetXLogInsertEndRecPtr(), returning the correct LSN
without skipping the header. This is the same as GetXLogInsertRecPtr(),
except that it calls XLogBytePosToEndRecPtr().
Initial investigation by me, root cause identified by Andres Freund.
This is a long-standing bug in gistGetFakeLSN(), probably introduced by
c6b92041d3 in PG13. Backpatch to all supported versions.
Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
Backpatch-through: 14
Presently, many statements in stats_import.sql select all columns
from the pg_stats system view. A proposed follow-up commit would
add columns to this view (some of which are not stable across test
runs), breaking all of these tests. This commit introduces a
convenience view for those statements so that future changes are
minimally disruptive.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com
In 0b96e734c5 I (Andres) relied on page_collect_tuples() being called only
with an MVCC snapshot, and added assertions to that end, but did not realize
that IsMVCCSnapshot() allows both proper MVCC snapshots and historical
snapshots, which behave quite similarly to MVCC snapshots.
Unfortunately that can lead to incorrect visibility results during logical
decoding, as a historical snapshot is interpreted as a plain MVCC
snapshot. The only reason this wasn't noticed earlier is that it's hard to
reach as most of the time there are no sequential scans during logical
decoding.
To fix the bug and avoid issues like this in the future, split
IsMVCCSnapshot() into IsMVCCSnapshot() and IsMVCCLikeSnapshot(), where now
only the latter includes historic snapshots.
One effect of this is that during logical decoding no page-at-a-time snapshots
are used, as otherwise runtime branches to handle historic snapshots would be
needed in some performance critical paths. Given how uncommon sequential scans
are during logical decoding, that seems acceptable.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/61812.1770637345@localhost
As part of 6225403f2, I'd removed the override for the `stlib` target,
since NAME no longer contains a major version number. But I forgot that
its dependencies are declared before Makefile.shlib is included; those
dependencies were then omitted entirely.
Per buildfarm member indri, which appears to be the only system so far
that's bothered by an empty archive.
Now that libpq-oauth doesn't have to match the major version of libpq,
some things in pg_wchar.h are technically unsafe for us to use. (See
b6c7cfac8 for a fuller discussion.) This is unlikely to be a problem --
we only care about UTF-8 in the context of OAuth right now -- but if
anyone did introduce a way to hit it, it'd be extremely difficult to
debug or reproduce, and it'd be a potential security vulnerability to
boot.
Define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries to add a
dependency on the exported APIs will simply fail to link the shared
module.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com