Relational database
Tom Lane 860359ea02 Fix assorted bugs in archive_waldump.c.
1. archive_waldump.c called astreamer_finalize() nowhere.  This meant
that any data retained in decompression buffers at the moment we
detect archive EOF would never reach astreamer_waldump_content(),
resulting in surprising failures if we actually need the last few
bytes of the archive file.

To fix that, make read_archive_file() do the finalize once it detects
EOF.  Change its API to return a boolean "yes there's more data"
rather than the entirely-misleading raw count of bytes read.
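The failure in item 1 comes from a pipeline stage that holds bytes back, as a decompressor does, until something flushes it. The sketch below is a simplified model of that behavior, not the astreamer API. `BufferingStage`, `stage_content`, `stage_finalize` and `read_next_chunk` are hypothetical names, and `BLOCK_SIZE` is an arbitrary stand-in for however much a real decompressor buffers. The read function finalizes the stage when it reaches EOF and returns a boolean meaning "more data may follow", which mirrors the API change described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stage that, like a decompressor, emits
 * output only in whole blocks and retains any remainder internally. */
#define BLOCK_SIZE 8

typedef struct BufferingStage
{
    char        pending[BLOCK_SIZE];
    size_t      npending;
    size_t      delivered;      /* bytes that reached the consumer */
    bool        finalized;
} BufferingStage;

/* Downstream consumer: this sketch only counts the bytes it receives. */
static void
deliver(BufferingStage *st, const char *data, size_t len)
{
    (void) data;
    st->delivered += len;
}

static void
stage_content(BufferingStage *st, const char *data, size_t len)
{
    assert(!st->finalized);     /* no input is accepted after finalize */
    while (len > 0)
    {
        size_t      n = BLOCK_SIZE - st->npending;

        if (n > len)
            n = len;
        memcpy(st->pending + st->npending, data, n);
        st->npending += n;
        data += n;
        len -= n;
        if (st->npending == BLOCK_SIZE)
        {
            deliver(st, st->pending, BLOCK_SIZE);
            st->npending = 0;
        }
    }
}

/* Flush whatever is still held back; skipping this loses the tail. */
static void
stage_finalize(BufferingStage *st)
{
    if (st->npending > 0)
        deliver(st, st->pending, st->npending);
    st->npending = 0;
    st->finalized = true;
}

/* Feed the next chunk.  Returns true if more data may follow, or false
 * once EOF has been reached and the stage has been finalized. */
static bool
read_next_chunk(BufferingStage *st, const char *input, size_t total,
                size_t *offset, size_t chunk)
{
    size_t      n;

    if (*offset >= total)
    {
        if (!st->finalized)
            stage_finalize(st);
        return false;
    }
    n = total - *offset;
    if (n > chunk)
        n = chunk;
    stage_content(st, input + *offset, n);
    *offset += n;
    return true;
}
```

If the caller fed 20 bytes through this stage without finalizing, only 16 would reach the consumer, because the last 4 would still be sitting in `pending`. Putting the finalize inside the read function means every caller that loops until it returns false gets the tail for free.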

2. init_archive_reader() relied on privateInfo->cur_file to track
which WAL segment was being read, but cur_file can become NULL if a
member trailer is processed during a read_archive_file() call.  This
could cause unreproducible "could not find WAL in archive" failures,
particularly with compressed archives where all the WAL data fits in
a small number of compressed bytes.

Fix by scanning the hash table after each read to find any cached
WAL segment with sufficient data, instead of depending on cur_file.
Also reduce the minimum data requirement from XLOG_BLCKSZ to
sizeof(XLogLongPageHeaderData), since we only need the long page
header to extract the segment size.

We likewise need to fix init_archive_reader() to scan the whole
hash table for irrelevant entries, since we might have already
loaded more than one entry when the data is compressible enough.
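The fix in item 2 replaces "look at cur_file" with "scan every cached entry". Below is a minimal sketch of both scans. A fixed array stands in for the real hash table. `WalEntry`, `WalTable`, `find_ready_segment` and `drop_irrelevant` are hypothetical. `MIN_HEADER_BYTES` is a placeholder for `sizeof(XLogLongPageHeaderData)`, whose actual value this sketch does not claim. The sketch also assumes that any WAL segment's long page header can supply the segment size, and that the `relevant` flag is filled in by the caller once that size is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for sizeof(XLogLongPageHeaderData); value is illustrative. */
#define MIN_HEADER_BYTES 40
#define MAX_ENTRIES 16

typedef struct WalEntry
{
    bool        in_use;
    bool        relevant;       /* set by the caller once seg size is known */
    size_t      nbytes;         /* bytes buffered so far */
} WalEntry;

typedef struct WalTable
{
    WalEntry    entries[MAX_ENTRIES];
} WalTable;

/* Return any cached entry holding enough bytes for the long page header,
 * or NULL.  It never consults a "current file" pointer, which may already
 * be NULL if a member trailer was processed during the same read. */
static WalEntry *
find_ready_segment(WalTable *tab)
{
    assert(tab != NULL);
    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        WalEntry   *e = &tab->entries[i];

        if (e->in_use && e->nbytes >= MIN_HEADER_BYTES)
            return e;
    }
    return NULL;
}

/* Drop every irrelevant entry, not only the current one: one read of
 * highly compressible data may have loaded several. */
static int
drop_irrelevant(WalTable *tab)
{
    int         dropped = 0;

    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        WalEntry   *e = &tab->entries[i];

        if (e->in_use && !e->relevant)
        {
            e->in_use = false;
            dropped++;
        }
    }
    return dropped;
}
```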

3. get_archive_wal_entry() relied on tracking cur_file to identify
WAL hash table entries that need to be spilled to disk.  However,
this can't work for entries that are read completely within a
single read_archive_file() call: the caller will never see cur_file
pointing at such an entry.  Instead, scan the WAL hash table to
find entries we should spill.  This also fixes a buglet that any
hash table entries completely loaded during init_archive_reader
were never considered for spilling.

Also, simplify the logic tremendously by not attempting to spill
entries that haven't been read fully.  I am not convinced that the old
logic handled that correctly in every path, and it's really not worth
the complication and risk of bugs to try to spill entries on the fly.
We can just write them in a single go once they are no longer the
cur_file.
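The simplified spill rule in item 3 can be sketched as a table scan that considers only fully read entries. As before, an array stands in for the hash table. `spill_completed` and `spill_entry` are hypothetical names, and `spill_entry` only sets a flag where real code would write the buffer to a temporary file. The sketch also assumes that the entry the caller is about to read (`wanted`) should stay in memory. That is an assumption made for illustration, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16

typedef struct WalEntry
{
    bool        in_use;
    bool        complete;       /* member trailer seen: all bytes buffered */
    bool        spilled;        /* written out, memory released */
    size_t      nbytes;
} WalEntry;

typedef struct WalTable
{
    WalEntry    entries[MAX_ENTRIES];
    WalEntry   *cur_file;       /* entry being filled now; may be NULL */
} WalTable;

/* Hypothetical spill: a real implementation would write to disk here. */
static void
spill_entry(WalEntry *e)
{
    e->spilled = true;
}

/* Spill every fully read entry except cur_file and the one the caller
 * wants next.  Scanning the whole table also catches entries that were
 * completed inside a single read and were never seen as cur_file. */
static int
spill_completed(WalTable *tab, const WalEntry *wanted)
{
    int         n = 0;

    assert(tab != NULL);
    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        WalEntry   *e = &tab->entries[i];

        if (!e->in_use || e->spilled || !e->complete)
            continue;           /* partial entries stay in memory */
        if (e == tab->cur_file || e == wanted)
            continue;
        spill_entry(e);
        n++;
    }
    return n;
}
```

Skipping partial entries is the simplification the commit describes. An entry is written out in one go once it is complete, so nothing has to track how much of it was already spilled.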

4. Fix a rather critical performance problem: the code thought that
resetStringInfo() will reclaim storage, but it doesn't.  So by the
end of the run we'd have consumed storage space equal to the total
amount of WAL read, negating all the effort of the spill logic.
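In PostgreSQL, resetStringInfo() only sets the length back to zero. The data buffer keeps whatever size it grew to, so actually returning the memory requires freeing the buffer with pfree(). The stand-alone sketch below reproduces that distinction with plain malloc/free. `Buf`, `buf_reset` and `buf_release` are hypothetical names modeled loosely on StringInfoData's `data`/`len`/`maxlen` fields, and the 16-byte initial size is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for StringInfoData:
 * "len" bytes are in use out of "maxlen" allocated. */
typedef struct Buf
{
    char       *data;
    size_t      len;
    size_t      maxlen;
} Buf;

static void
buf_init(Buf *b)
{
    b->maxlen = 16;
    b->data = malloc(b->maxlen);
    if (b->data == NULL)
        abort();
    b->len = 0;
    b->data[0] = '\0';
}

static void
buf_append(Buf *b, const char *src, size_t n)
{
    if (b->len + n + 1 > b->maxlen)
    {
        char       *p;

        while (b->len + n + 1 > b->maxlen)
            b->maxlen *= 2;
        p = realloc(b->data, b->maxlen);
        if (p == NULL)
            abort();
        b->data = p;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Like resetStringInfo(): forget the contents but keep the allocation. */
static void
buf_reset(Buf *b)
{
    assert(b->data != NULL);
    b->len = 0;
    b->data[0] = '\0';
}

/* Actually give the memory back, then start over with a small buffer. */
static void
buf_release(Buf *b)
{
    free(b->data);
    buf_init(b);
}
```

After a 1000-byte append, `buf_reset` leaves `maxlen` at 1024, while `buf_release` brings it back to 16. A spill path that relies on the reset variant therefore keeps one segment-sized allocation alive per buffer.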

Also document the contract that cur_file can change (or become NULL)
during a single read_archive_file() call, since the decompression
pipeline may produce enough output to trigger multiple astreamer
callbacks.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
2026-03-22 18:24:42 -04:00
.github Add CODE_OF_CONDUCT.md, CONTRIBUTING.md, and SECURITY.md. 2024-07-02 13:03:58 -05:00
config Hardcode override of typeof_unqual for clang-for-bitcode 2026-03-16 19:24:49 +01:00
contrib plpgsql: optimize "SELECT simple-expression INTO var". 2026-03-20 18:23:45 -04:00
doc pg_verifybackup: Enable WAL parsing for tar-format backups 2026-03-20 15:31:35 -04:00
src Fix assorted bugs in archive_waldump.c. 2026-03-22 18:24:42 -04:00
.cirrus.star ci: Simplify ci-os-only handling 2025-08-14 12:09:34 -04:00
.cirrus.tasks.yml Revert "Change default value of default_toast_compression to "lz4"" 2026-03-05 08:25:35 +09:00
.cirrus.yml ci: Per-repo configuration for manually trigger tasks 2025-08-14 11:54:03 -04:00
.dir-locals.el Make Emacs perl-mode indent more like perltidy. 2019-01-13 11:32:31 -08:00
.editorconfig Update .editorconfig and .gitattributes for postgresql.conf.sample. 2025-11-18 10:28:36 -06:00
.git-blame-ignore-revs Add commit 015d32016d to .git-blame-ignore-revs. 2026-03-19 13:45:07 +09:00
.gitattributes Update .editorconfig and .gitattributes for postgresql.conf.sample. 2025-11-18 10:28:36 -06:00
.gitignore Update top-level .gitignore. 2022-12-04 15:23:00 -05:00
.mailmap Add a Git .mailmap file 2024-11-05 13:56:02 +01:00
aclocal.m4 autoconf: Move export_dynamic determination to configure 2022-12-06 18:55:28 -08:00
configure Enable -Wstrict-prototypes and -Wold-style-definition by default 2026-03-18 14:31:50 +01:00
configure.ac Enable -Wstrict-prototypes and -Wold-style-definition by default 2026-03-18 14:31:50 +01:00
COPYRIGHT Update copyright for 2026 2026-01-01 13:24:10 -05:00
GNUmakefile.in Allow selecting the git revision to be packaged by "make dist". 2024-05-03 11:08:50 -04:00
HISTORY Canonicalize some URLs 2020-02-10 20:47:50 +01:00
Makefile Restore AIX support. 2026-02-23 13:34:22 -05:00
meson.build Enable -Wstrict-prototypes and -Wold-style-definition by default 2026-03-18 14:31:50 +01:00
meson_options.txt Update copyright for 2026 2026-01-01 13:24:10 -05:00
README.md Revise the style of a paragraph in README.md. 2024-03-21 10:16:41 -05:00

PostgreSQL Database Management System

This directory contains the source code distribution of the PostgreSQL database management system.

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.

Copyright and license information can be found in the file COPYRIGHT.

General documentation about this version of PostgreSQL can be found at https://www.postgresql.org/docs/devel/. In particular, information about building PostgreSQL from the source code can be found at https://www.postgresql.org/docs/devel/installation.html.

The latest version of this software, and related software, may be obtained at https://www.postgresql.org/download/. For more information, see our website at https://www.postgresql.org/.