The bug fixed in commit 72eab84a5 would not have occurred if initdb
had a less surprising rule about which columns should be marked
NOT NULL by default. Let's make that rule be strictly that the
column must be fixed-width and its predecessors must be fixed-width
and NOT NULL, removing the hacky and unsafe exceptions for oidvector
and int2vector.
Since we do still want all existing oidvector and int2vector columns
to be marked NOT NULL, we have to put BKI_FORCE_NOT_NULL labels on
them. But making this less magic and more documented seems like a
good idea, even if it's a shade more verbose.
I didn't bump catversion since the initial catalog contents are
not actually changed by this patch. Note however that the
contents of postgres.bki do change, and feeding an old copy of
that to a new backend will produce wrong results.
Discussion: https://postgr.es/m/204760.1595181800@sss.pgh.pa.us
This is more consistent with the IEEE 754 spec and our treatment of
NaNs elsewhere; in particular, the case has always acted that way in
"numeric" arithmetic.
Noted by Dean Rasheed.
Discussion: https://postgr.es/m/3421746.1594927785@sss.pgh.pa.us
The code has always set this column to NULL when it's not valid,
but the catalog header's description failed to reflect that,
as did the SGML docs, as did some of the code. To prevent future
coding errors of the same ilk, let's hide the field from C code
as though it were variable-length (which, in a sense, it is).
As with commit 72eab84a5, we can only fix this cleanly in HEAD
and v13; the problem extends further back but we'll need some
klugery in the released branches.
Discussion: https://postgr.es/m/367660.1595202498@sss.pgh.pa.us
Commit b9c130a1f failed to apply the publisher-to-subscriber column
mapping while checking which columns were updated. Perhaps less
significantly, it didn't exclude dropped columns either. This could
result in an incorrect updated-columns bitmap and thus wrong decisions
about whether to fire column-specific triggers on the subscriber while
applying updates. In HEAD (since commit 9de77b545), it could also
result in accesses off the end of the colstatus array, as detected by
buildfarm member skink. Fix the logic, and adjust 003_constraints.pl
so that the problem is exposed in unpatched code.
In HEAD, also add some assertions to check that we don't access off
the ends of these newly variable-sized arrays.
Back-patch to v10, as b9c130a1f was.
Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com
max_slot_wal_keep_size that was added in v13 and wal_keep_segments are
the GUC parameters to specify how much WAL files to retain for
the standby servers. While max_slot_wal_keep_size accepts the number of
bytes of WAL files, wal_keep_segments accepts the number of WAL files.
This difference of setting units between those similar parameters could
be confusing to users.
To alleviate this situation, this commit renames wal_keep_segments to
wal_keep_size, and make users specify the WAL size in it instead of
the number of WAL files.
There was also the idea to rename max_slot_wal_keep_size to
max_slot_wal_keep_segments, in the discussion. But we have been moving
away from measuring in segments, for example, checkpoint_segments was
replaced by max_wal_size. So we concluded to rename wal_keep_segments
to wal_keep_size.
Back-patch to v13 where max_slot_wal_keep_size was added.
Author: Fujii Masao
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele
Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com
The logical decoding infrastructure needs to know which top-level
transaction the subxact belongs to, in order to decode all the
changes. Until now that might be delayed until commit, due to the
caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring
incremental decoding.
So we also write the assignment info into WAL immediately, as part
of the next WAL record (to minimize overhead) only when wal_level=logical.
We can not remove the existing XLOG_XACT_ASSIGNMENT WAL as that is
required for avoiding overflow in the hot standby snapshot.
Bump XLOG_PAGE_MAGIC, since this introduces XLR_BLOCK_ID_TOPLEVEL_XID.
Author: Tomas Vondra, Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
There was no easy way to find how many times generic and custom plans
have been executed for a prepared statement. This commit exposes those
numbers of times in pg_prepared_statements view.
Author: Atsushi Torikoshi, Kyotaro Horiguchi
Reviewed-by: Tatsuro Yamada, Masahiro Ikeda, Fujii Masao
Discussion: https://postgr.es/m/CACZ0uYHZ4M=NZpofH6JuPHeX=__5xcDELF8hT8_2T+R55w4RQw@mail.gmail.com
Due to the layout of this catalog, subslotname has to be explicitly
marked BKI_FORCE_NULL, else initdb will default to the assumption
that it's non-nullable. Since, in fact, CREATE/ALTER SUBSCRIPTION
will store null values there, the existing marking is just wrong,
and has been since this catalog was invented.
We haven't noticed because not much in the system actually depends
on attnotnull being truthful. However, JIT'ed tuple deconstruction
does depend on that in some cases, allowing crashes or wrong answers
in queries that inspect pg_subscription. Commit 9de77b545 quite
accidentally exposed this on the buildfarm members that force JIT
activation.
Back-patch to v13. The problem goes further back, but we cannot
force initdb in released branches, so some klugier solution will
be needed there. Before working on that, push this simple fix
to try to get the buildfarm back to green.
Discussion: https://postgr.es/m/4118109.1595096139@sss.pgh.pa.us
This header hasn't changed recently, so the fact that it now fails
headerscheck/cpluspluscheck testing must be due to changes in what
it includes. Probably f21916791 is to blame, but I didn't try to
verify that.
Discussion: https://postgr.es/m/3699703.1595016554@sss.pgh.pa.us
This patch adds a "binary" option to CREATE/ALTER SUBSCRIPTION.
When that's set, the publisher will send data using the data type's
typsend function if any, rather than typoutput. This is generally
faster, if slightly less robust.
As committed, we won't try to transfer user-defined array or composite
types in binary, for fear that type OIDs won't match at the subscriber.
This might be changed later, but it seems like fit material for a
follow-on patch.
Dave Cramer, reviewed by Daniel Gustafsson, Petr Jelinek, and others;
adjusted some by me
Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
Teach Valgrind memcheck to maintain the "defined-ness" of each shared
buffer based on whether the backend holds at least one pin at the point
it is accessed by access method code. Bugs like the one fixed by commit
b0229f26 can be detected using this new instrumentation.
Note that backends running with Valgrind naturally have their own
independent ideas about whether any given byte in shared memory is safe
or unsafe to access. There is no risk that concurrent access by
multiple backends to the same shared memory will confuse Valgrind's
instrumentation, because everything already works at the process level
(or at the memory mapping level, if you prefer).
Author: Álvaro Herrera, Peter Geoghegan
Reviewed-By: Anastasia Lubennikova
Discussion: https://postgr.es/m/20150723195349.GW5596@postgresql.org
Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
This representation saves 8 bytes per tuple compared to HeapTuple, and
avoids the need to allocate, copy and free on the receiving side.
Gather can emit the returned MinimalTuple directly, but GatherMerge now
needs to make an explicit copy because it buffers multiple tuples at a
time. That should be no worse than before.
Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2B8T_ggoUTAE-U%3DA%2BOcPc4%3DB0nPPHcSfffuQhvXXjML6w%40mail.gmail.com
When using the following functions, users could see various types of
errors of the type "cache lookup failed for OID XXX" with elog(), that
can only be used for internal errors:
* pg_describe_object()
* pg_identify_object()
* pg_identify_object_as_address()
The set of APIs managing object addresses for all object types are made
smarter by gaining a new argument "missing_ok" that allows any caller to
control if an error is raised or not on an undefined object. The SQL
functions listed above are changed to handle the case where an object is
missing.
Regression tests are added for all object types for the cases where
these are undefined. Before this commit, these cases failed with cache
lookup errors, and now they basically return NULL (minus the name of the
object type requested).
Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
Three groups of issues needed to be addressed:
load_external_function() and related functions returned PGFunction,
even though not necessarily all callers are looking for a function of
type PGFunction. Since these functions are really just wrappers
around dlsym(), change to return void * just like dlsym().
In dynahash.c, we are using strlcpy() where a function with a
signature like memcpy() is expected. This should be safe, as the new
comment there explains, but the cast needs to be augmented to avoid
the warning.
In PL/Python, methods all need to be cast to PyCFunction, per Python
API, but this now runs afoul of these warnings. (This issue also
exists in core CPython.)
To fix the second and third case, we add a new type pg_funcptr_t that
is defined specifically so that gcc accepts it as a special function
pointer that can be cast to any other function pointer without the
warning.
Also add -Wcast-function-type to the standard warning flags, subject
to configure check.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1e97628e-6447-b4fd-e230-d109cec2d584%402ndquadrant.com
Incorrect function names were referenced. As this fixes some portions
of tableam.h, that is mentioned in the docs as something to look at when
implementing a table AM, backpatch down to 12 where this has been
introduced.
Author: Hironobu Suzuki
Discussion: https://postgr.es/m/8fe6d672-28dd-3f1d-7aed-ac2f6d599d3f@interdb.jp
Backpatch-through: 12
The stats with this commit was available only for WALSenders, however,
users might want to see for backends doing logical decoding via SQL API.
Then, users might want to reset and access these stats across server
restart which was not possible with the current patch.
List of commits reverted:
caa3c4242c Don't call elog() while holding spinlock.
e641b2a995 Doc: Update the documentation for spilled transaction
statistics.
5883f5fe27 Fix unportable printf format introduced in commit 9290ad198.
9290ad198b Track statistics for spilling of changes from ReorderBuffer.
Additionaly, remove the release notes entry for this feature.
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
This includes two changes:
- Addition of a new function pg_xact_commit_timestamp_origin() able, for
a given transaction ID, to return the commit timestamp and replication
origin of this transaction. An equivalent function existed in
pglogical.
- Addition of the replication origin to pg_last_committed_xact().
The commit timestamp manager includes already APIs able to return the
replication origin of a transaction on top of its commit timestamp, but
the code paths for replication origins were never stressed as those
functions have never looked for a replication origin, and the SQL
functions available have never included this information since their
introduction in 73c986a.
While on it, refactor a test of modules/commit_ts/ to use tstzrange() to
check that a transaction timestamp is within the wanted range, making
the test a bit easier to read.
Bump catalog version.
Author: Movead Li
Reviewed-by: Madan Kumar, Michael Paquier
Discussion: https://postgr.es/m/2020051116430836450630@highgo.ca
"relkind" normally refers to the char field from pg_class. However, in
the parse nodes AlterTableStmt and CreateTableAsStmt, "relkind" was used
for a field of type enum ObjectType, that could refer to other object
types than those possible for a relkind. Such fields being usually
named "objtype", switch the name in both structures to make things more
consistent. Note that this led to some confusion in functions that
also operate on a RangeTableEntry object, which also has a field named
"relkind".
This naming goes back to commit 09d4e96, where only OBJECT_TABLE and
OBJECT_INDEX were used. This got extended later to use as well
OBJECT_TYPE with e440e12, not really a relation kind.
Author: Mark Dilger
Reviewed-by: Daniel Gustafsson, Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/609181AE-E399-47C7-9221-856E0F96BF93@enterprisedb.com
Commit f7f70d5e2 left one inconsistency behind: we're still creating
pg_type entries for the composite types of sequences and toast tables,
but not arrays over those composites. But there seems precious little
reason to have named composite types for toast tables, and not much more
to have them for sequences (especially given the thought that sequences
may someday not be standalone relations at all).
So, let's close that inconsistency by removing these composite types,
rather than adding arrays for them. This buys back a little bit of
the initial pg_type bloat added by the previous patch, and could be
a significant savings in a large database with many toast tables.
Aside from a small logic rearrangement in heap_create_with_catalog,
this patch mostly needs to clean up some places that were assuming that
pg_class.reltype always has a valid value. Those are really pre-existing
bugs, given that it's documented otherwise; notably, the plpgsql changes
fix code that gives "cache lookup failed for type 0" on indexes today.
But none of these seem interesting enough to back-patch.
Also, remove the pg_dump/pg_upgrade infrastructure for propagating
a toast table's pg_type OID into the new database, since we no longer
need that.
Discussion: https://postgr.es/m/761F1389-C6A8-4C15-80CE-950C961F5341@gmail.com
The previous definition of the column was almost universally disliked,
so provide this updated definition which is more useful for monitoring
purposes: a large positive value is good, while zero or a negative value
means danger. This should be operationally more convenient.
Backpatch to 13, where the new column to pg_replication_slots (and the
feature it represents) were added.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
When we invented arrays of composite types (commit bc8036fc6),
we excluded system catalogs, basically just on the grounds of not
wanting to bloat pg_type. However, it's definitely inconsistent that
catalogs' composite types can't be put into arrays when others can.
Another problem is that the exclusion is done by checking
IsUnderPostmaster in heap_create_with_catalog, which means that
(1) If a user tries to create a table in single-user mode, it doesn't
get an array type. That's bad in itself, plus it breaks pg_upgrade.
(2) If someone drops and recreates a system view or information_schema
view (as we occasionally recommend doing), it will now have an array
type where it did not before, making for still more inconsistency.
So this is all pretty messy. Let's just get rid of the inconsistency
and decree that system-created relations should have array types if
similar user-created ones would, i.e. it only depends on the relkind.
As of HEAD, that means that the initial contents of pg_type grow from
411 rows to 605, which is a lot of growth percentage-wise, but it's
still quite a small catalog compared to others.
Wenjing Zeng, reviewed by Shawn Wang, further hacking by me
Discussion: https://postgr.es/m/761F1389-C6A8-4C15-80CE-950C961F5341@gmail.com
This introduces a new set of extended routines for procedure and
operator name lookups, with a flag bitmask argument that can modify the
result. The following options are available:
- Force schema qualification, ignoring search_path. This is similar to
the existing option for format_{operator|procedure}_qualified().
- Force NULL as result instead of a numeric OID for an undefined
object. This option is new.
This is a refactoring similar to 1185c78, that will be used for a future
patch to improve the SQL functions providing information using object
addresses for undefined objects.
Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
If a type scanned is undefined, type format routines have two behaviors
depending on if FORMAT_TYPE_ALLOW_INVALID is used by the caller or not:
- Issue a cache lookup error
- Return an undefined type name "???", "???[]" or "-"
The current interface is not really helpful for callers willing to
format properly a type name, but still make sure that the type is
defined as there could be types matching the strings generated when
looking for an undefined type, even if that should not be a problem in
practice. In order to counter that, add a new flag called
FORMAT_TYPE_INVALID_AS_NULL that returns a NULL result instead of "???
or "-" which does not generate an error. This flag will be used in a
follow-up patch improving the set of SQL functions showing information
for object addresses when it comes to undefined objects.
Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
src/backend/replication/README includes since 32bc08b a basic
description of the WAL receiver hooks available in walreceiver.h for a
module like libpqwalreceiver, but the README has never been updated to
reflect changes done to the hooks, so the contents of the README have
rotten with the time. This commit moves the description from the README
to walreceiver.h, where it will be hard to miss that a description
update or addition is needed depending on the modifications done to the
hooks.
Each hook now includes a description of what it does in walreceiver.h,
and the replication's README mentions walreceiver.h.
Thanks also to Amit Kapila for the discussion.
Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/20200502024606.GA471944@paquier.xyz
OpenSSL's native reports about problems related to protocol version
restrictions are pretty opaque and inconsistent. When we get an
SSL error that is plausibly due to this, emit a hint message that
includes the range of SSL protocol versions we (think we) are
allowing. This should at least get the user thinking in the right
direction to resolve the problem, even if the hint isn't totally
accurate, which it might not be for assorted reasons.
Back-patch to v13 where we increased the default minimum protocol
version, thereby increasing the risk of this class of failure.
Patch by me, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/a9408304-4381-a5af-d259-e55d349ae4ce@2ndquadrant.com
Commit 0d861bbb, which added deduplication to nbtree, had
_bt_check_unique() pass a TID to table_index_fetch_tuple_check() that
isn't safe to mutate. table_index_fetch_tuple_check()'s tid argument is
modified when the TID in question is not the latest visible tuple in a
hot chain, though this wasn't documented.
To fix, go back to using a local copy of the TID in _bt_check_unique(),
and update comments above table_index_fetch_tuple_check().
Backpatch: 13-, where B-Tree deduplication was introduced.
In pg_replication_slot, change output from normal/reserved/lost to
reserved/extended/unreserved/ lost, which better expresses the possible
states particularly near the time where segments are no longer safe but
checkpoint has not run yet.
Under the new definition, reserved means the slot is consuming WAL
that's still under the normal WAL size constraints; extended means it's
consuming WAL that's being protected by wal_keep_segments or the slot
itself, whose size is below max_slot_wal_keep_size; unreserved means the
WAL is no longer safe, but checkpoint has not yet removed those files.
Such as slot is in imminent danger, but can still continue for a little
while and may catch up to the reserved WAL space.
Also, there were some bugs in the calculations used to report the
status; fixed those.
Backpatch to 13.
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200616.120236.1809496990963386593.horikyota.ntt@gmail.com
The function has been reading global variable forceSyncCommit, mirroring
the intent of the caller that passed forceSync=forceSyncCommit. The
other caller, RecordTransactionCommitPrepared(), passed false. Since
COMMIT PREPARED can't share a transaction with any command, it certainly
doesn't share a transaction with a command that sets forceSyncCommit.
Reviewed by Michael Paquier.
Discussion: https://postgr.es/m/20200617032615.GC2916904@rfd.leadboat.com
It was possible for deduplication's single value strategy to mistakenly
believe that a very small duplicate tuple counts as one of the six large
tuples that it aims to leave behind after the page finally splits. This
could cause slightly suboptimal space utilization with very low
cardinality indexes, though only under fairly narrow conditions.
To fix, be particular about what kind of tuple counts as a
maxpostingsize-capped tuple. This avoids confusion in the event of a
small tuple that gets "wedged" between two large tuples, where all
tuples on the page are duplicates of the same value.
Discussion: https://postgr.es/m/CAH2-Wz=Y+sgSFc-O3LpiZX-POx2bC+okec2KafERHuzdVa7-rQ@mail.gmail.com
Backpatch: 13-, where deduplication was introduced (by commit 0d861bbb)
Since 1f39bce02, HashAgg nodes have had the ability to spill to disk when
memory consumption exceeds work_mem. That commit added new properties to
EXPLAIN ANALYZE to show the maximum memory usage and disk usage, however,
it didn't quite go as far as showing that information for parallel
workers. Since workers may have experienced something very different from
the main process, we should show this information per worker, as is done
in Sort.
Reviewed-by: Justin Pryzby
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/CAApHDvpEKbfZa18mM1TD7qV6PG+w97pwCWq5tVD0dX7e11gRJw@mail.gmail.com
Backpatch-through: 13, where the hashagg spilling code was added.
On platforms without support for 64bit atomic operations where we also
cannot rely on 64bit reads to have single copy atomicity, such atomics
are implemented using a spinlock based fallback. That means it's not
safe to even read such atomics from within a signal handler (since the
signal handler might run when the spinlock already is held).
To avoid this issue defer global barrier processing out of the signal
handler. Instead of checking local / shared barrier generation to
determine whether to set ProcSignalBarrierPending, introduce
PROCSIGNAL_BARRIER and always set ProcSignalBarrierPending when
receiving such a signal. Additionally avoid redundant work in
ProcessProcSignalBarrier if ProcSignalBarrierPending is unnecessarily.
Also do a small amount of other polishing.
Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20200609193723.eu5ilsjxwdpyxhgz@alap3.anarazel.de
Backpatch: 13-, where the code was introduced.
Don't use fread(), since that doesn't necessarily set errno. We could
use read() instead, but it's even better to use pg_pread(), which
allows us to avoid some extra calls to seek to the desired location in
the file.
Also, advertise a wait event while reading from a file, as we do for
most other places where we're reading data from files.
Patch by me, reviewed by Hamid Akhtar.
Discussion: http://postgr.es/m/CA+TgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA@mail.gmail.com
Commit 72d422a522 made xlog.c call
sendTablespace() with the 'sizeonly' argument set to true, which
required basebackup.c to export sendTablespace(). However, that's
kind of ugly, so instead defer the call to sendTablespace() until
basebackup.c regains control. That way, it can still be a static
function.
Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi.
Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
Introduce TAR_BLOCK_SIZE and replace many instances of 512 with
the new constant. Introduce function tarPaddingBytesRequired
and use it to replace numerous repetitions of (x + 511) & ~511.
Add preprocessor guards against multiple inclusion to pgtar.h.
Reformat the prototype for tarCreateHeader so it doesn't extend
beyond 80 characters.
Discussion: http://postgr.es/m/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com
Core by no means makes excessive use of these functions, but quite a large
number of those usages do require the caller to call strlen() on the
returned string. This is quite wasteful since these functions do already
have a good idea of the length of the string, so we might as well just
have them return that.
Reviewed-by: Andrew Gierth
Discussion: https://postgr.es/m/CAApHDvrm2A5x2uHYxsqriO2cUaGcFvND%2BksC9e7Tjep0t2RK_A%40mail.gmail.com