1. archive_waldump.c called astreamer_finalize() nowhere. This meant that any data retained in decompression buffers at the moment we detect archive EOF would never reach astreamer_waldump_content(), resulting in surprising failures if we actually need the last few bytes of the archive file. To fix that, make read_archive_file() do the finalize once it detects EOF. Change its API to return a boolean "yes there's more data" rather than the entirely-misleading raw count of bytes read.

2. init_archive_reader() relied on privateInfo->cur_file to track which WAL segment was being read, but cur_file can become NULL if a member trailer is processed during a read_archive_file() call. This could cause unreproducible "could not find WAL in archive" failures, particularly with compressed archives where all the WAL data fits in a small number of compressed bytes. Fix by scanning the hash table after each read to find any cached WAL segment with sufficient data, instead of depending on cur_file.

Also reduce the minimum data requirement from XLOG_BLCKSZ to sizeof(XLogLongPageHeaderData), since we only need the long page header to extract the segment size.

We likewise need to fix init_archive_reader() to scan the whole hash table for irrelevant entries, since we might have already loaded more than one entry when the data is compressible enough.

3. get_archive_wal_entry() relied on tracking cur_file to identify WAL hash table entries that need to be spilled to disk. However, this can't work for entries that are read completely within a single read_archive_file() call: the caller will never see cur_file pointing at such an entry. Instead, scan the WAL hash table to find entries we should spill. This also fixes a buglet that any hash table entries completely loaded during init_archive_reader() were never considered for spilling.

Also, simplify the logic tremendously by not attempting to spill entries that haven't been read fully. I am not convinced that the old logic handled that correctly in every path, and it's really not worth the complication and risk of bugs to try to spill entries on the fly. We can just write them in a single go once they are no longer the cur_file.

4. Fix a rather critical performance problem: the code thought that resetStringInfo() will reclaim storage, but it doesn't. So by the end of the run we'd have consumed storage space equal to the total amount of WAL read, negating all the effort of the spill logic.

Also document the contract that cur_file can change (or become NULL) during a single read_archive_file() call, since the decompression pipeline may produce enough output to trigger multiple astreamer callbacks.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
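Item 1 of the commit message above describes a read loop that must flush the streamer once EOF is seen, and that reports "more data?" as a boolean instead of a byte count. The following is a minimal standalone sketch of that pattern, not the actual archive_waldump.c code. Every name in it (`DemoStreamer`, `DemoSource`, `demo_content`, `demo_finalize`, `demo_read_archive`, `DEMO_BLOCK`) is hypothetical, and the "decompressor" is simulated by a streamer that only passes data downstream in fixed-size blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BLOCK 8			/* hypothetical internal block size */

/* Hypothetical streamer that holds bytes back until a full block is ready. */
typedef struct DemoStreamer
{
	char		pending[DEMO_BLOCK];
	size_t		npending;		/* bytes buffered, not yet delivered */
	size_t		delivered;		/* bytes passed downstream so far */
} DemoStreamer;

/* Hypothetical in-memory "archive file". */
typedef struct DemoSource
{
	const char *data;
	size_t		len;
	size_t		pos;
	bool		finalized;		/* has the streamer been flushed yet? */
} DemoSource;

/* Accept input; deliver downstream only when a whole block is available. */
static void
demo_content(DemoStreamer *s, const char *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		s->pending[s->npending++] = data[i];
		if (s->npending == DEMO_BLOCK)
		{
			s->delivered += DEMO_BLOCK;
			s->npending = 0;
		}
	}
}

/* Flush whatever is still buffered; without this the tail is lost. */
static void
demo_finalize(DemoStreamer *s)
{
	s->delivered += s->npending;
	s->npending = 0;
}

/*
 * Read up to "chunk" bytes and feed them to the streamer.  Returns true if
 * data was read, false at EOF.  On the first EOF, finalize the streamer so
 * that buffered bytes reach the consumer.
 */
static bool
demo_read_archive(DemoSource *src, DemoStreamer *s, size_t chunk)
{
	size_t		remaining = src->len - src->pos;
	size_t		n;

	assert(chunk > 0);

	if (remaining == 0)
	{
		if (!src->finalized)
		{
			demo_finalize(s);
			src->finalized = true;
		}
		return false;
	}

	n = (remaining < chunk) ? remaining : chunk;
	demo_content(s, src->data + src->pos, n);
	src->pos += n;
	return true;
}
```

A caller would loop with `while (demo_read_archive(&src, &s, chunk))`. The point of the boolean return is that the number of bytes read says nothing about how many bytes the consumer has actually seen, because the streamer may still be holding some of them.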
| Name |
|---|
| .github |
| config |
| contrib |
| doc |
| src |
| .cirrus.star |
| .cirrus.tasks.yml |
| .cirrus.yml |
| .dir-locals.el |
| .editorconfig |
| .git-blame-ignore-revs |
| .gitattributes |
| .gitignore |
| .mailmap |
| aclocal.m4 |
| configure |
| configure.ac |
| COPYRIGHT |
| GNUmakefile.in |
| HISTORY |
| Makefile |
| meson.build |
| meson_options.txt |
| README.md |
PostgreSQL Database Management System
This directory contains the source code distribution of the PostgreSQL database management system.
PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.
Copyright and license information can be found in the file COPYRIGHT.
General documentation about this version of PostgreSQL can be found at https://www.postgresql.org/docs/devel/. In particular, information about building PostgreSQL from the source code can be found at https://www.postgresql.org/docs/devel/installation.html.
The latest version of this software, and related software, may be obtained at https://www.postgresql.org/download/. For more information look at our web site located at https://www.postgresql.org/.