This commit introduces new foreign data wrapper API for TRUNCATE.
It extends TRUNCATE command so that it accepts foreign tables as
the targets to truncate and invokes that API. Also it extends postgres_fdw
so that it can issue TRUNCATE command to foreign servers, by adding
new routine for that TRUNCATE API.
The information about options specified in TRUNCATE command, e.g.,
ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to
truncate is also passed to FDW. FDW truncates the foreign data sources
that the passed foreign tables specify, based on those information.
For example, postgres_fdw constructs TRUNCATE command using them
and issues it to the foreign server.
For performance, TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that foreign tables to truncate belong to.
Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
Introduce a new GUC recovery_prefetch, disabled by default. When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.
The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.
The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
First, don't perform database access while holding a buffer lock.
When checking a heap, we can validate that TOAST pointers are sane by
performing a scan on the TOAST index and looking up the chunks that
correspond to each value ID that appears in a TOAST poiner in the main
table. But, to do that while holding a buffer lock at least risks
causing other backends to wait uninterruptibly, and probably can cause
undetected and uninterruptible deadlocks. So, instead, make a list of
checks to perform while holding the lock, and then perform the checks
after releasing it.
Second, adjust things so that we don't try to follow TOAST pointers
for tuples that are already eligible to be pruned. The TOAST tuples
become eligible for pruning at the same time that the main tuple does,
so trying to check them may lead to spurious reports of corruption,
as observed in the buildfarm. The necessary infrastructure to decide
whether or not the tuple being checked is prunable was added by
commit 3b6c1259f9, but it wasn't
actually used for its intended purpose prior to this patch.
Mark Dilger, adjusted by me to avoid a memory leak.
Discussion: http://postgr.es/m/AC5479E4-6321-473D-AC92-5EC36299FBC2@enterprisedb.com
Retry the call to heap_prune_page() in rare cases where there is
disagreement between the heap_prune_page() call and the call to
HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is
possible when a concurrently-aborted transaction makes a tuple DEAD
during the tiny window between each step. This was the only case where
a tuple considered DEAD by VACUUM still had storage following pruning.
VACUUM's definition of dead tuples is now uniformly simple and
unambiguous: dead tuples from each page are always LP_DEAD line pointers
that were encountered just after we performed pruning (and just before
we considered freezing remaining items with tuple storage).
Eliminating the tupgone=true special case enables INDEX_CLEANUP=off
style skipping of index vacuuming that takes place based on flexible,
dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping
indexes up-front before now, due to a subtle interaction with the
special case (see commit dd695979) -- this was a special case unto
itself. Now there are no special cases. And so now it won't matter
when or how we decide to skip index vacuuming: it won't affect how
pruning behaves, and it won't be affected by any of the implementation
details of pruning or freezing.
Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer
necessary because we now rely entirely on heap pruning taking care of
recovery conflicts. There is no longer any need to generate recovery
conflicts for DEAD tuples that pruning just missed. This also means
that heap vacuuming now uses exactly the same strategy for recovery
conflicts as index vacuuming always has: REDO routines never need to
process a latestRemovedXid from the WAL record, since earlier REDO of
the WAL record from pruning is sufficient in all cases. The generic
XLOG_HEAP2_CLEAN record type is now split into two new record types to
reflect this new division (these are called XLOG_HEAP2_PRUNE and
XLOG_HEAP2_VACUUM).
Also stop acquiring a super-exclusive lock for heap pages when they're
vacuumed during VACUUM's second heap pass. A regular exclusive lock is
enough. This is correct because heap page vacuuming is now strictly a
matter of setting the LP_DEAD line pointers to LP_UNUSED. No other
backend can have a pointer to a tuple located in a pinned buffer that
can be invalidated by a concurrent heap page vacuum operation.
Heap vacuuming can now be thought of as conceptually similar to index
vacuuming and conceptually dissimilar to heap pruning. Heap pruning now
has sole responsibility for anything involving the logical contents of
the database (e.g., managing transaction status information, recovery
conflicts, considering what to do with HOT chains). Index vacuuming and
heap vacuuming are now only concerned with recycling garbage items from
physical data structures that back the logical database.
Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record
changes.
Credit for the idea of retrying pruning a page to avoid the tupgone case
goes to Andres Freund.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently). If there
is no such libraries, a fallback implementation is available. Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.
HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC. This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.
This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1. I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.
List of commits reverted, in reverse chronological order:
ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
Until now the bsearch_arg function was used only in extended statistics
code, so it was defined in that code. But we already have qsort_arg in
src/port, so let's move it next to it.
This seems to have been just copied-and-pasted from some other
header checks. But our C code is entirely unprepared to support
such a header name, so it's only wasting cycles to look for it.
If we did need to support it, some #ifdefs would be required.
(A quick trawl at codesearch.debian.net finds some packages that
reference lz4/lz4.h; but they use *only* that spelling, and
appear to be intending to reference their own copy rather than
a system-level installation of liblz4. There's no evidence of
freestanding installations that require this spelling.)
Discussion: https://postgr.es/m/457962.1616362509@sss.pgh.pa.us
It's not okay to just shove the pkg_config results right into our
build flags, for a couple different reasons:
* This fails to maintain the separation between CPPFLAGS and CFLAGS,
as well as that between LDFLAGS and LIBS. (The CPPFLAGS angle is,
I believe, the reason for warning messages reported when building
with MacPorts' liblz4.)
* If pkg_config emits anything other than -I/-D/-L/-l switches,
it's highly unlikely that we want to absorb those. That'd be more
likely to break the build than do anything helpful. (Even the -D
case is questionable; but we're doing that for libxml2, so I kept it.)
Also, it's not okay to skip doing an AC_CHECK_LIB probe, as
evidenced by recent build failure on topminnow; that should
have been caught at configure time.
Model fixes for this on configure's libxml2 support.
It appears that somebody overlooked an autoheader run, too.
Discussion: https://postgr.es/m/20210119190720.GL8560@telsasoft.com
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery. This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.
Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory. This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.
The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.
Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.
A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.
In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.
Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query. The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync. This can lead to substantial reductions
in query latency.
Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com>
Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com>
Reviewed-by: Daniel Vérité <daniel@manitou-mail.org>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com>
Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com
This commit changes WAL archiver process so that it's treated as
an auxiliary process and can use shared memory. This is an infrastructure
patch required for upcoming shared-memory based stats collector patch
series. These patch series basically need any processes including archiver
that can report the statistics to access to shared memory. Since this patch
itself is useful to simplify the code and when users monitor the status of
archiver, it's committed separately in advance.
This commit simplifies the code for WAL archiving. For example, previously
backends need to signal to archiver via postmaster when they notify
archiver that there are some WAL files to archive. On the other hand,
this commit removes that signal to postmaster and enables backends to
notify archier directly using shared latch.
Also, as the side of this change, the information about archiver process
becomes viewable at pg_stat_activity view.
Author: Kyotaro Horiguchi
Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao
Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
Create a wrapper object, ParallelSlotArray, to encapsulate the
number of slots and the slot array itself, plus some other relevant
bits of information. This reduces the number of parameters we have
to pass around all over the place.
Allow for a ParallelSlotArray to contain slots connected to
different databases within a single cluster. The current clients
of this mechanism don't need this, but it is expected to be used
by future patches.
Defer connecting to databases until we actually need the connection
for something. This is a slight behavior change for vacuumdb and
reindexdb. If you specify a number of jobs that is larger than the
number of objects, the extra connections will now not be used.
But, on the other hand, if you specify a number of jobs that is
so large that it's going to fail, the failure would previously have
happened before any operations were actually started, and now it
won't.
Mark Dilger, reviewed by me.
Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
Discussion: http://postgr.es/m/BA592F2D-F928-46FF-9516-2B827F067F57@enterprisedb.com
1. Backends waiting for buffer I/O are now interruptible.
2. If something goes wrong in a backend that is currently performing
I/O, waiting backends no longer wake up until that backend reaches
AbortBufferIO() and broadcasts on the CV. Previously, any waiters would
wake up (because the I/O lock was automatically released) and then
busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS.
3. LWLockMinimallyPadded is removed, as it would now be unused.
Author: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016)
Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
Moving this logic into pg_regress fixes a potential failure with
parallel tests when pg_upgrade and the main regression test suite both
trigger the makefile rule that cleaned up testtablespace/ under
src/test/regress. Even if pg_upgrade was triggering this rule, it has
no need to do so as it uses a different tablespace path. So if
pg_upgrade triggered the makefile rule for the tablespace setup while
the main regression test suite ran the tablespace cases, it would fail.
61be85a was a similar attempt at achieving that, but that broke cases
where the regression tests require to run under an Administrator
account, like with Appveyor.
Reported-by: Andres Freund, Kyotaro Horiguchi
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/20201209012911.uk4d6nxcnkp7ehrx@alap3.anarazel.de
Instead of instr_time (struct timespec) and the INSTR_XXX macros,
introduce pg_time_usec_t and use integer arithmetic. Don't include the
connection time in TPS unless using -C mode, but report it separately.
Author: Fabien COELHO <coelho@cri.ensmp.fr>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de
These can be set in buildenv.pl or a "set" command within a Windows
terminal. The MSVC script vcregress.pl parses the values available in
the environment to build the resulting prove commands, and the parsing
of PROVE_TESTS is able to handle name patterns in the same way as other
platforms.
Not specifying those environment values makes vcregress.pl fall back to
the previous default, with no extra flags for the prove command, and all
the tests run within t/.
Author: Michael Paquier
Reviewed-by: Juan José Santamaría Flecha, Julien Rouhaud
Discussion: https://postgr.es/m/YD9GigwHoL6lFY2y@paquier.xyz
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Bump catalog version due to change of state in the catalog
(pg_subscription_rel).
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
The parallel slots infrastructure (which implements client-side
multiplexing of server connections doing similar things, not
threading or multiple processes or anything like that) are moved from
src/bin/scripts/scripts_parallel.c to src/fe_utils/parallel_slot.c.
The functions consumeQueryResult() and processQueryResult() which were
previously part of src/bin/scripts/common.c are now moved into that
file as well, becoming static helper functions. This might need to be
changed in the future, but currently they're not used for anything
else.
Some other functions from src/bin/scripts/common.c are moved to to
src/fe_utils and are split up among several files. connectDatabase(),
connectMaintenanceDatabase(), and disconnectDatabase() are moved to
connect_utils.c. executeQuery(), executeCommand(), and
executeMaintenanceCommand() are move to query_utils.c.
handle_help_version_opts() is moved to option_utils.c.
Mark Dilger, reviewed by me. The larger patch series of which this is
a part has also had review from Peter Geoghegan, Andres Freund, Álvaro
Herrera, Michael Paquier, and Amul Sul, but I don't know whether any
of them have reviewed this bit specifically.
Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
Discussion: http://postgr.es/m/5F743835-3399-419C-8324-2D424237E999@enterprisedb.com
Discussion: http://postgr.es/m/70655DF3-33CE-4527-9A4D-DDEB582B6BA0@enterprisedb.com
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs. While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL. Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation. Nonetheless, we can collect and use knowledge about these
relationships.
This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships. (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.) Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h". The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL. The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins. (As of this
commit, 217 relationships are checked by the test, versus 181 before.)
Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
This is a replacement for the existing --with-openssl, extending the
logic to make easier the addition of new SSL libraries. The grammar is
chosen to be similar to --with-uuid, where multiple values can be
chosen, with "openssl" as the only supported value for now.
The original switch, --with-openssl, is kept for compatibility.
Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se
For those system catalogs that have a unique indexes, make a primary
key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.
This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.
The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives. To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead. By convention, we
usually make a catalog's OID column its primary key, if it has one.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
pg_standby was useful more than a decade ago, but now it is obsolete.
It has been proposed that we retire it many times. Now seems like a
good time to finally do it, because "waiting restore commands"
are incompatible with a proposed recovery prefetching feature.
Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
With this commit, SHA1 goes through the implementation provided by
OpenSSL via EVP when building the backend with it, and uses as fallback
implementation KAME which was located in pgcrypto and already shaped for
an integration with a set of init, update and final routines.
Structures and routines have been renamed to make things consistent with
the fallback implementations of MD5 and SHA2.
uuid-ossp has used for ages a shortcut with pgcrypto to fetch a copy of
SHA1 if needed. This was built depending on the build options within
./configure, so this cleans up some code and removes the build
dependency between pgcrypto and uuid-ossp.
Note that this will help with the refactoring of HMAC, as pgcrypto
offers the option to use MD5, SHA1 or SHA2, so only the second option
was missing to make that possible.
Author: Michael Paquier
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
libpq's error messages for connection failures pretty well stand on
their own, especially since commits 52a10224e/27a48e5a1. Prefixing
them with 'could not connect to database "foo"' or the like is just
redundant, and perhaps even misleading if the specific database name
isn't relevant to the failure. (When it is, we trust that the
backend's error message will include the DB name.) Indeed, psql
hasn't used any such prefix in a long time. So, make all our other
programs and documentation examples agree with psql's practice.
Discussion: https://postgr.es/m/1094524.1611266589@sss.pgh.pa.us
"cl /?" produces a different output if run on a real or a virtual drive
(this can be set with a simple subst command), causing an error in the
MSVC scripts if building on a virtual drive because the platform to use
cannot be detected.
"cl /help", on the contrary, produces a consistent output if used on a
real or virtual drive. Changing to "/help" allows the compilation to
work with a virtual drive as long as the top of the code repository is
part of the drive, without impacting the build on real drives.
Reported-by: Robert Grange
Author: Juan José Santamaría Flecha
Discussion: https://postgr.es/m/16825-c4f104bcebc67034@postgresql.org
This continues the work done in b5913f6. All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options. Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case of VACUUM). The code of
REINDEX is restructured so as manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.
This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.
Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.
Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected. Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing. This is moved to src/common/ and
it gains the same overflow checks as the decoding part.
On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries. Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.
While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection. The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code. Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks. The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.
Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions. Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.
Also move the _with_retry() wrapper to a more permanent home.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
This struct is used by ReindexRelationConcurrently to keep track of the
relations to process. This saves having to obtain some data repeatedly,
and has future uses as well.
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
Provide synchronous vectored file I/O routines. These map to preadv()
and pwritev(), with fallback implementations for systems that don't have
them. Also provide a wrapper pg_pwritev_with_retry() that automatically
retries on short writes.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
The allocation of the cryptohash context data when building with OpenSSL
was happening in the memory context of the caller of
pg_cryptohash_create(), which could lead to issues with resowner cleanup
if cascading resources are cleaned up on an error. Like other
facilities using resowners, move the base allocation to TopMemoryContext
to ensure a correct cleanup on failure.
The resulting code gets simpler with this commit as the context data is
now hold by a unique opaque pointer, so as there is only one single
allocation done in TopMemoryContext.
After discussion, also change the cryptohash subroutines to return an
error if the caller provides NULL for the context data to ease error
detection on OOM.
Author: Heikki Linnakangas
Discussion: https://postgr.es/m/X9xbuEoiU3dlImfa@paquier.xyz
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX. However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function. And setenv() has
cleaner semantics too. So, let's reverse that old policy.
This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field). More importantly, extend
win32env.c to also support setenv(). Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.
Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention. I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.
Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
This adds six methods to the output plugin API, adding support for
streaming changes of two-phase transactions at prepare time.
* begin_prepare
* filter_prepare
* prepare
* commit_prepared
* rollback_prepared
* stream_prepare
Most of this is a simple extension of the existing methods, with the
semantic difference that the transaction is not yet committed and maybe
aborted later.
Until now two-phase transactions were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the
two-phase commands were communicated to the subscriber.
This patch provides the infrastructure for logical decoding plugins to be
informed of two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.
This also extends the 'test_decoding' plugin, implementing these new
methods.
This commit simply adds these new APIs and the upcoming patch to "allow
the decoding at prepare time in ReorderBuffer" will use these APIs.
Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, and Dilip Kumar
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ruhttps://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.
Reported-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits. The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode. A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start. New parameters allow a file descriptor open to the terminal to
be passed. pg_upgrade support has also been added.
Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us
Author: Masahiko Sawada, me, Stephen Frost