Redefine our exact width types with standard C99 types and macros,
including int64_t, INT64_MAX, INT64_C(), PRId64 etc. We were already
using <stdint.h> types in a few places.
One complication is that Windows' <inttypes.h> uses format strings like
"%I64d", "%I32", "%I" for PRI*64, PRI*32, PTR*PTR, instead of mapping to
other standardized format strings like "%lld" etc as seen on other known
systems. Teach our snprintf.c to understand them.
This removes a lot of configure clutter, and should also allow 64-bit
numbers and other standard types to be used in localized messages
without casting.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
The majority of all filenames are quoted in user facing error and
log messages, but a few were still printed without quotes. While
these filenames do not risk causing any ambiguity as their format
is strict, quote them anyways to be consistent across all logs.
Also concatenate a message to keep it one line to make it easier
to grep for in the code.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/080EEABE-6645-4A46-AB20-6285ADAC44FE@yesql.se
Make the primary messages more compact and make the detail messages
uniform. In initdb.c and pg_resetwal.c, use the newish
option_parse_int() to simplify some of the option parsing. For the
backend GUC wal_segment_size, add a GUC check hook to do the
verification instead of coding it in bootstrap.c. This might be
overkill, but that way the check is in the right place and it becomes
more self-documenting.
In passing, make pg_controldata use the logging API for warning
messages.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/9939aa8a-d7be-da2c-7715-0a0b5535a1f7@eisentraut.org
Not including the timeline IDs to the file names generated by pg_waldump
for the individual blocks saved could cause some of these files to be
overwritten when scanning segments across multiple timelines. Having
this information is also as much useful as the LSNs, to be able to know
from exactly which WAL segment a block is comes from.
While on it, this fixes a few comments in the tests, where the format of
the file was not described as matching with the reality.
Reported-by: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, David Christensen
Discussion: https://postgr.es/m/ZJp921+nITFnvBVS@paquier.xyz
We have two existing conventions for long options: either alphabetical
among short options, or all long options after all the short options.
But the convention apparently used here, next to a functionally
related option, is not one of them.
This makes it easier to specify values taken directly from WAL file
names.
The option parsing is arranged in the style of option_parse_int() (but
we need to parse unsigned int), to allow future refactoring in the
same manner.
Reviewed-by: Sébastien Lardière <sebastien@lardiere.net>
Discussion: https://www.postgresql.org/message-id/flat/8fef346e-2541-76c3-d768-6536ae052993@lardiere.net
This option extracts (potentially decompressing) full-page images
included in WAL records into a given target directory. These images are
subject to the same filtering rules as the normal display of the WAL
records, hence with --relation one can for example extract only the FPIs
issued on the relation defined. By default, the records are printed or
their stats computed (--stats), using --quiet would only save the images
without any output generated.
This is a tool aimed mostly for very experienced users, useful for
fixing page-level corruption or just analyzing the past state of a page,
and there were no easy way to do that with the in-core tools up to now
when looking at WAL.
Each block is saved in a separate file, to ease their manipulation, with
the file respecting <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with as format.
For instance, 00000000-010000C0.1663.1.6117.123_main refers to:
- WAL record LSN in hexa format (00000000-010000C0).
- Tablespace OID (1663).
- Database OID (1).
- Relfilenode (6117).
- Block number (123).
- Fork name of the file this block came from (_main).
Author: David Christensen
Reviewed-by: Sho Kato, Justin Pryzby, Bharath Rupireddy, Matthias van de
Meent
Discussion: https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.
If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.
This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.
Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Referring to the WAL as just "log" invites confusion with the
postmaster log, so avoid doing that in docs and error messages.
Also shorten "WAL segment file" to just "WAL file" in various
places.
Bharath Rupireddy, reviewed by Nathan Bossart and Kyotaro Horiguchi
Discussion: https://postgr.es/m/CALj2ACUeXa8tDPaiTLexBDMZ7hgvaN+RTb957-cn5qwv9zf-MQ@mail.gmail.com
Various bits of code were declaring signal handlers manually,
using "int signum" or variants of that. We evidently have no
platforms where that's actually wrong, but let's use our
SIGNAL_ARGS macro everywhere anyway. If nothing else, it's
good for finding signal handlers easily.
No need for back-patch, since this is just cosmetic AFAICS.
Discussion: https://postgr.es/m/2684964.1663167995@sss.pgh.pa.us
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
I rephrased the error messages to be more in the style of
option_parse_int(), and also made use of the new "detail" message
facility. I didn't actually use option_parse_int() (which could be
used for -n) because libpgfeutils wasn't used here yet and I wanted to
keep this just to string changes. But it could be done in the future.
All but a few existing callers assume without checking that this
function succeeds. While it probably will, that's a poor excuse for
not checking. Let's make it return void and instead throw an error
if it doesn't find the block reference. Callers that actually need
to handle the no-such-block case must now use the underlying function
XLogRecGetBlockTagExtended.
In addition to being a bit less error-prone, this should also serve
to suppress some Coverity complaints about XLogRecGetBlockRefInfo.
While at it, clean up some inconsistency about use of the
XLogRecHasBlockRef macro: make XLogRecGetBlockTagExtended use
that instead of open-coding the same condition, and avoid calling
XLogRecHasBlockRef twice in relevant code paths. (That is,
calling XLogRecHasBlockRef followed by XLogRecGetBlockTag is now
deprecated: use XLogRecGetBlockTagExtended instead.)
Patch HEAD only; this doesn't seem to have enough value to consider
a back-branch API break.
Discussion: https://postgr.es/m/425039.1649701221@sss.pgh.pa.us
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless. This mostly involves
s/pg_log_fatal/pg_log_error/g.
Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1). Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.
Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.
Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.
Patch by me. Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.
Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
Change two macros to be static inline functions instead to keep the
data type consistent. This avoids a "comparison is always true"
warning that was occurring with -Wtype-limits. In the process, change
the names to look less like macros.
Discussion: https://postgr.es/m/20220407063505.njnnrmbn4sxqfsts@alap3.anarazel.de
Allow extensions to specify a new custom resource manager (rmgr),
which allows specialized WAL. This is meant to be used by a Table
Access Method or Index Access Method.
Prior to this commit, only Generic WAL was available, which offers
support for recovery and physical replication but not logical
replication.
Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
After closedir() dirent->d_name is not valid anymore. As there alerady are a
few places relying on the limited lifetime of pg_waldump, do so here as well,
and just pg_strdup() the string.
The bug was introduced in fc49e24fa6.
Found by UBSan, run locally.
Backpatch: 11-, like fc49e24fa6 itself.
Follow-up improvements for commit 127aea2a based on discussion:
* use fork name for --fork, not number
* use -R, -B as short switches for --relation, --block
* re-alphabetize the list of switches (code, --help and docs)
Suggested-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> (fork name part)
Reviewed-by: David Christensen <david.christensen@crunchydata.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/3a4c2e93-7976-2320-fc0a-32097fe148a7%40enterprisedb.com
Teach xlogreader.c to decode the WAL into a circular buffer. This will
support optimizations based on looking ahead, to follow in a later
commit.
* XLogReadRecord() works as before, decoding records one by one, and
allowing them to be examined via the traditional XLogRecGetXXX()
macros and certain traditional members like xlogreader->ReadRecPtr.
* An alternative new interface XLogReadAhead()/XLogNextRecord() is
added that returns pointers to DecodedXLogRecord objects so that it's
now possible to look ahead in the WAL stream while replaying.
* In order to be able to use the new interface effectively while
streaming data, support is added for the page_read() callback to
respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of
waiting for more data to arrive.
No direct user of the new interface is included in this commit, though
XLogReadRecord() uses it internally. Existing code doesn't need to
change, except in a few places where it was accessing reader internals
directly and now needs to go through accessor macros.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
wal_compression gains a new value, "zstd", to allow the compression of
full-page images using the compression method of the same name.
Compression is done using the default level recommended by the library,
as of ZSTD_CLEVEL_DEFAULT = 3. Some benchmarking has shown that it
could make sense to use a level lower for the FPI compression, like 1 or
2, as the compression rate did not change much with a bit less CPU
consumed, but any tests done would only cover few scenarios so it is
hard to come to a clear conclusion. Anyway, there is no reason to not
use the default level instead, which is the level recommended by the
library so it should be fine for most cases.
zstd outclasses easily pglz, and is better than LZ4 where one wants to
have more compression at the cost of extra CPU but both are good enough
in their own scenarios, so the choice between one or the other of these
comes to a study of the workload patterns and the schema involved,
mainly.
This commit relies heavily on 4035cd5, that reshaped the code creating
and restoring full-page writes to be aware of the compression type,
making this integration straight-forward.
This patch borrows some early work from Andrey Borodin, though the patch
got a complete rewrite.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220222231948.GJ9008@telsasoft.com
When opening a WAL file smaller than XLOG_BLCKSZ (e.g. 0 bytes long) while
determining the wal_segment_size, pg_waldump checked errno, despite errno not
being set by the short read. Resulting in a bogus error message.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220214.181847.775024684568733277.horikyota.ntt@gmail.com
Backpatch: 11-, the bug was introducedin fc49e24fa
Previously, pg_waldump would not display its statistics summary if it
got interrupted by SIGINT (or say a simple Ctrl+C). It gains with this
commit a signal handler for SIGINT, trapping the signal to exit at the
earliest convenience to allow a display of the stats summary before
exiting. This makes the reports more interactive, similarly to strace
-c.
This new behavior makes the combination of the options --stats and
--follow much more useful, so as the user will get a report for any
invocation of pg_waldump in such a case. Information about the LSN
range of the stats computed is added as a header to the report
displayed.
This implementation comes from a suggestion by Álvaro Herrera and
myself, following a complaint by the author of this patch about --stats
and --follow not being useful together originally.
As documented, this is not supported on Windows, though its support
would be possible by catching the terminal events associated to Ctrl+C,
for example (this may require a more centralized implementation, as
other tools could benefit from a common API).
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUUx3PcK2z9h0_m7vehreZAUbcmOky9WSEpe8TofhV=PQ@mail.gmail.com
A couple of spots can benefit from an added errdetail(), which matches
what we were already doing in other places; and those that cannot
withstand errdetail() can get a more descriptive primary message.
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CALj2ACV+cX1eM03GfcA=ZMLXh5fSn1X1auJLz3yuS1duPSb9QA@mail.gmail.com
The logic is implemented so as there can be a choice in the compression
used when building a WAL record, and an extra per-record bit is used to
track down if a block is compressed with PGLZ, LZ4 or nothing.
wal_compression, the existing parameter, is changed to an enum with
support for the following backward-compatible values:
- "off", the default, to not use compression.
- "pglz" or "on", to compress FPWs with PGLZ.
- "lz4", the new mode, to compress FPWs with LZ4.
Benchmarking has showed that LZ4 outclasses easily PGLZ. ZSTD would be
also an interesting choice, but going just with LZ4 for now makes the
patch minimalistic as toast compression is already able to use LZ4, so
there is no need to worry about any build-related needs for this
implementation.
Author: Andrey Borodin, Justin Pryzby
Reviewed-by: Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/3037310D-ECB7-4BF1-AF20-01C10BB33A33@yandex-team.ru
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.
Commits reverted:
dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.
Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.
Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
Teach xlogreader.c to decode its output into a circular buffer, to
support optimizations based on looking ahead.
* XLogReadRecord() works as before, consuming records one by one, and
allowing them to be examined via the traditional XLogRecGetXXX()
macros.
* An alternative new interface XLogNextRecord() is added that returns
pointers to DecodedXLogRecord structs that can be examined directly.
* XLogReadAhead() provides a second cursor that lets you see
further ahead, as long as data is available and there is enough space
in the decoding buffer. This returns DecodedXLogRecord pointers to the
caller, but also adds them to a queue of records that will later be
consumed by XLogNextRecord()/XLogReadRecord().
The buffer's size is controlled with wal_decode_buffer_size. The buffer
could potentially be placed into shared memory, for future projects.
Large records that don't fit in the circular buffer are called
"oversized" and allocated separately with palloc().
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
Previously, the XLogReader module would fetch new input data using a
callback function. Redesign the interface so that it tells the caller
to insert more data with a special return value instead. This API suits
later patches for prefetching, encryption and maybe other future
projects that would otherwise require continually extending the callback
interface.
As incidental cleanup work, move global variables readOff, readLen and
readSegNo inside XlogReaderState.
Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version)
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
pg_waldump --stats=record identifies a record by a combination
of the RmgrId and the four bits of the xl_info field of the record.
But XACT records use the first bit of those four bits for an optional
flag variable, and the following three bits for the opcode to
identify a record. So previously the same type of XACT record
could have different four bits (three bits are the same but the
first one bit is different), and which could cause
pg_waldump --stats=record to show two lines of per-record statistics
for the same XACT record. This is a bug.
This commit changes pg_waldump --stats=record so that it processes
only XACT record differently, i.e., filters the opcode out of xl_info
and uses a combination of the RmgrId and those three bits as
the identifier of a record, only for XACT record. For other records,
the four bits of the xl_info field are still used.
Back-patch to all supported branches.
Author: Kyotaro Horiguchi
Reviewed-by: Shinya Kato, Fujii Masao
Discussion: https://postgr.es/m/2020100913412132258847@highgo.ca
Reuse cautionary language from src/test/ssl/README in
src/test/kerberos/README. SLRUs have had access to six-character
segments names since commit 73c986adde,
and recovery stopped calling HeapTupleHeaderAdvanceLatestRemovedXid() in
commit 558a9165e0. The other corrections
are more self-evident.
* Have both physical and logical walsender share a 'xlogreader' state
struct for tracking state. This replaces the existing globals sendSeg
and sendCxt.
* Change WALRead not to receive XLogReaderState->seg and ->segcxt as
separate arguments anymore; just use the ones from 'state'. This is
made possible by the above change.
* have the XLogReader segment_open contract require the callbacks to
install the file descriptor in the state struct themselves instead of
returning it. xlogreader was already ignoring any possible failed
return from the callbacks, relying solely on them never returning.
(This point is not altogether excellent, as it means the callbacks
have to know more of XLogReaderState; but to really improve on that
we would have to pass back error info from the callbacks to
xlogreader. And the complexity would not be saved but instead just
transferred to the callers of WALRead, which would have to learn how
to throw errors from the open_segment callback in addition of, as
currently, from pg_pread.)
* segment_open no longer receives the 'segcxt' as a separate argument,
since it's part of the XLogReaderState argument.
Per comments from Kyotaro Horiguchi.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
Code review for 0dc8ead463, prompted by a bug closed by 91c40548d5.
XLogReader's system for opening and closing segments had gotten too
complicated, with callbacks being passed at both the XLogReaderAllocate
level (read_page) as well as at the WALRead level (segment_open). This
was confusing and hard to follow, so restructure things so that these
callbacks are passed together at XLogReaderAllocate time, and add
another callback to the set (segment_close) to make it a coherent whole.
Also, ensure XLogReaderState is an argument to all the callbacks, so
that they can grab at the ->private data if necessary.
Document the whole arrangement more clearly.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
The primary motivation for this change is that it will be used by the
upcoming patch to add backup manifests, but it also seems to have some
potential more general use.
Andres Freund and Robert Haas
Discussion: http://postgr.es/m/20200330020814.nspra4mvby42yoa4@alap3.anarazel.de