postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-06-27 09:22:52 -04:00

Author	SHA1	Message	Date
Michael Paquier	273fe94852	Fix off-by-one with NFC recomposition for Hangul U+11A7 (TBASE) The NFC recomposition incorrectly included TBASE as a valid T syllable, which is incorrect based on the Unicode specification (TBASE is one below the start of the range, range beginning at U+11A8). This would cause the TBASE to be silently swallowed in the normalization, leading to an incorrect result. A couple of regression tests are added to check more patterns with Hangul recomposition and decomposition, on top of a test to check the problem with TBASE. Diego has submitted the code fix, and I have written the tests. Author: Diego Frias <mail@dzfrias.dev> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/B92ED640-7D4A-4505-B09F-3548F58CBB16@dzfrias.dev Backpatch-through: 14	2026-06-05 07:50:12 +09:00
Tom Lane	01e568b8c1	Fix assorted places that need to use palloc_array(). multirange_recv and BlockRefTableReaderNextRelation were incautious about multiplying a possibly-large integer by a factor more than 1 and then using it as an allocation size. This is harmless on 64-bit systems where we'd compute a size exceeding MaxAllocSize and then fail, but on 32-bit systems we could overflow size_t leading to an undersized allocation and buffer overrun. Fix these places by using palloc_array() instead of a handwritten multiplication. (In HEAD, some of them were fixed already, but none of that work got back-patched at the time.) In addition, BlockRefTableReaderNextRelation passes the same value to BlockRefTableRead's "int length" parameter. If built for 64-bit frontend code, palloc_array() allows a larger array size than it otherwise would, potentially allowing that parameter to overflow. Add an explicit check to forestall that and keep the behavior the same cross-platform. Reported-by: Xint Code Author: Tom Lane <tgl@sss.pgh.pa.us> Backpatch-through: 14 Security: CVE-2026-6473	2026-05-11 05:13:47 -07:00
Tom Lane	8d1489d505	Prevent buffer overrun in unicode_normalize(). Some UTF8 characters decompose to more than a dozen codepoints. It is possible for an input string that fits into well under 1GB to produce more than 4G decomposed codepoints, causing unicode_normalize()'s decomp_size variable to wrap around to a small positive value. This results in a small output buffer allocation and subsequent buffer overrun. To fix, test after each addition to see if we've overrun MaxAllocSize, and break out of the loop early if so. In frontend code we want to just return NULL for this failure (treating it like OOM). In the backend, we can rely on the following palloc() call to throw error. I also tightened things up in the calling functions in varlena.c, using size_t rather than int and allocating the input workspace with palloc_array(). These changes are probably unnecessary given the knowledge that the original input and the normalized output_chars array must fit into 1GB, but it's a lot easier to believe the code is safe with these changes. Reported-by: Xint Code Reported-by: Bruce Dang <bruce@calif.io> Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi> Backpatch-through: 14 Security: CVE-2026-6473	2026-05-11 05:13:47 -07:00
Tom Lane	e1c30458a1	Make palloc_array() and friends safe against integer overflow. Sufficiently large "count" arguments could result in undetected overflow, causing the allocated memory chunk to be much smaller than what the caller will subsequently write into it. This is unlikely to be a hazard with 64-bit size_t but can sometimes happen on 32-bit builds, primarily where a function allocates workspace that's significantly larger than its input data. Rather than trying to patch the at-risk callers piecemeal, let's just redefine these macros so that they always check. To do that, move the longstanding add_size() and mul_size() functions into palloc.h and mcxt.c, and adjust them to not be specific to shared-memory allocation. Then invent palloc_mul(), palloc0_mul(), palloc_mul_extended() to use these functions. Actually, the latter use inlined copies to save one function call. repalloc_array() gets similar treatment. I didn't bother trying to inline the calls for repalloc0_array() though. In v14 and v15, this also adds repalloc_extended(), which previously was only available in v16 and up. We need copies of all this in fe_memutils.[hc] as well, since that module also provides palloc_array() etc. Reported-by: Xint Code Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Backpatch-through: 14 Security: CVE-2026-6473	2026-05-11 05:13:47 -07:00
Jeff Davis	6044f55a47	Fix callers of unicode_strtitle() using srclen == -1. Currently, only called that way in tests, which failed to fail. Discussion: https://postgr.es/m/581a72ff452bb045ba83bbe3c6cf4467702d4f0f.camel@j-davis.com Backpatch-through: 18	2026-04-20 14:45:44 -07:00
Thomas Munro	ea94d2e673	Fix comments for Korean encodings in encnames.c * JOHAB: replace the incorrect "simplified Chinese" description with a correct one that identifies it as the Korean combining (Johab) encoding standardized in KS X 1001 annex 3. * EUC_KR: drop a stray space before the comma in the existing comment, and note that the encoding covers the KS X 1001 precomposed (Wansung) form. * UHC: spell out "Unified Hangul Code", clarify that it is Microsoft Windows CodePage 949, and describe its relationship to EUC-KR (superset covering all 11,172 precomposed Hangul syllables). Backpatch-through: 14 Author: Henson Choi <assam258@gmail.com> Discussion: https://postgr.es/m/CAAAe_zAFz1v-3b7Je4L%2B%3DwZM3UGAczXV47YVZfZi9wbJxspxeA%40mail.gmail.com	2026-04-16 18:21:41 +12:00
Andrew Dunstan	c3e436b1cb	Fix heap-buffer-overflow in pglz_decompress() on corrupt input. When decoding a match tag, pglz_decompress() reads 2 bytes (or 3 for extended-length matches) from the source buffer before checking whether enough data remains. The existing bounds check (sp > srcend) occurs after the reads, so truncated compressed data that ends mid-tag causes a read past the allocated buffer. Fix by validating that sufficient source bytes are available before reading each part of the match tag. The post-read sp > srcend check is no longer needed and is removed. Found by fuzz testing with libFuzzer and AddressSanitizer. Backpatch-through: 14	2026-04-10 10:28:00 -04:00
Andrew Dunstan	3e49556302	Fix incremental JSON parser numeric token reassembly across chunks. When the incremental JSON parser splits a numeric token across chunk boundaries, it accumulates continuation characters into the partial token buffer. The accumulator's switch statement unconditionally accepted '+', '-', '.', 'e', and 'E' as valid numeric continuations regardless of position, which violated JSON number grammar (-? int [frac] [exp]). For example, input "4-" fed in single-byte chunks would accumulate the '-' into the numeric token, producing an invalid token that later triggered an assertion failure during re-lexing. Fix by tracking parser state (seen_dot, seen_exp, prev character) across the existing partial token and incoming bytes, so that each character class is accepted only in its grammatically valid position. Backpatch-through: 17	2026-04-10 10:21:38 -04:00
Tom Lane	de77775a7b	Fix some cases of indirectly casting away const. Newest versions of gcc+glibc are able to detect cases where code implicitly casts away const by assigning the result of strchr() or a similar function applied to a "const char " value to a target variable that's just "char ". This of course creates a hazard of not getting a compiler warning about scribbling on a string one was not supposed to, so fixing up such cases is good. This patch fixes a dozen or so places where we were doing that. Most are trivial additions of "const" to the target variable, since no actually-hazardous change was occurring. Thanks to Bertrand Drouvot for finding a couple more spots than I had. This commit back-patches relevant portions of `8f1791c61` and `9f7565c6c` into supported branches. However, there are two places in ecpg (in v18 only) where a proper fix is more complicated than seems appropriate for a back-patch. I opted to silence those two warnings by adding casts. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us Discussion: https://postgr.es/m/3988414.1771950285@sss.pgh.pa.us Backpatch-through: 14-18	2026-02-25 11:19:50 -05:00
Thomas Munro	efef05ba99	Fix mb2wchar functions on short input. When converting multibyte to pg_wchar, the UTF-8 implementation would silently ignore an incomplete final character, while the other implementations would cast a single byte to pg_wchar, and then repeat for the remaining byte sequence. While it didn't overrun the buffer, it was surely garbage output. Make all encodings behave like the UTF-8 implementation. A later change for master only will convert this to an error, but we choose not to back-patch that behavior change on the off-chance that someone is relying on the existing UTF-8 behavior. Security: CVE-2026-2006 Backpatch-through: 14 Author: Thomas Munro <thomas.munro@gmail.com> Reported-by: Noah Misch <noah@leadboat.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>	2026-02-09 12:12:33 +13:00
Thomas Munro	df0852fe03	Fix encoding length for EUC_CN. While EUC_CN supports only 1- and 2-byte sequences (CS0, CS1), the mb<->wchar conversion functions allow 3-byte sequences beginning SS2, SS3. Change pg_encoding_max_length() to return 3, not 2, to close a hypothesized buffer overrun if a corrupted string is converted to wchar and back again in a newly allocated buffer. We might reconsider that in master (ie harmonizing in a different direction), but this change seems better for the back-branches. Also change pg_euccn_mblen() to report SS2 and SS3 characters as having length 3 (following the example of EUC_KR). Even though such characters would not pass verification, it's remotely possible that invalid bytes could be used to compute a buffer size for use in wchar conversion. Security: CVE-2026-2006 Backpatch-through: 14 Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>	2026-02-09 12:12:29 +13:00
Tom Lane	8e5e3ff556	Don't put library-supplied -L/-I switches before user-supplied ones. For many optional libraries, we extract the -L and -l switches needed to link the library from a helper program such as llvm-config. In some cases we put the resulting -L switches into LDFLAGS ahead of -L switches specified via --with-libraries. That risks breaking the user's intention for --with-libraries. It's not such a problem if the library's -L switch points to a directory containing only that library, but on some platforms a library helper may "helpfully" offer a switch such as -L/usr/lib that points to a directory holding all standard libraries. If the user specified --with-libraries in hopes of overriding the standard build of some library, the -L/usr/lib switch prevents that from happening since it will come before the user-specified directory. To fix, avoid inserting these switches directly into LDFLAGS during configure, instead adding them to LIBDIRS or SHLIB_LINK. They will still eventually get added to LDFLAGS, but only after the switches coming from --with-libraries. The same problem exists for -I switches: those coming from --with-includes should appear before any coming from helper programs such as llvm-config. We have not heard field complaints about this case, but it seems certain that a user attempting to override a standard library could have issues. The changes for this go well beyond configure itself, however, because many Makefiles have occasion to manipulate CPPFLAGS to insert locally-desirable -I switches, and some of them got it wrong. The correct ordering is any -I switches pointing at within-the- source-tree-or-build-tree directories, then those from the tree-wide CPPFLAGS, then those from helper programs. There were several places that risked pulling in a system-supplied copy of libpq headers, for example, instead of the in-tree files. (Commit `cb36f8ec2` fixed one instance of that a few months ago, but this exercise found more.) The Meson build scripts may or may not have any comparable problems, but I'll leave it to someone else to investigate that. Reported-by: Charles Samborski <demurgos@demurgos.net> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/70f2155f-27ca-4534-b33d-7750e20633d7@demurgos.net Backpatch-through: 13	2025-07-29 15:17:40 -04:00
Heikki Linnakangas	b28c59a6cd	Use 'void ' for arbitrary buffers, 'uint8 ' for byte arrays A 'void ' argument suggests that the caller might pass an arbitrary struct, which is appropriate for functions like libc's read/write, or pq_sendbytes(). 'uint8 ' is more appropriate for byte arrays that have no structure, like the cancellation keys or SCRAM tokens. Some places used 'char ', but 'uint8 ' is better because 'char *' is commonly used for null-terminated strings. Change code around SCRAM, MD5 authentication, and cancellation key handling to follow these conventions. Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org	2025-05-08 22:01:25 +03:00
Noah Misch	627acc3caa	With GB18030, prevent SIGSEGV from reading past end of allocation. With GB18030 as source encoding, applications could crash the server via SQL functions convert() or convert_from(). Applications themselves could crash after passing unterminated GB18030 input to libpq functions PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or PQescapeString(). Extension code could crash by passing unterminated GB18030 input to jsonapi.h functions. All those functions have been intended to handle untrusted, unterminated input safely. A crash required allocating the input such that the last byte of the allocation was the last byte of a virtual memory page. Some malloc() implementations take measures against that, making the SIGSEGV hard to reach. Back-patch to v13 (all supported versions). Author: Noah Misch <noah@leadboat.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Backpatch-through: 13 Security: CVE-2025-4207	2025-05-05 04:52:04 -07:00
Jeff Davis	90260e2ec6	Fix INITCAP() word boundaries for PG_UNICODE_FAST. Word boundaries are based on whether a character is alphanumeric or not. For the PG_UNICODE_FAST collation, alphanumeric includes non-ASCII digits; whereas for the PG_C_UTF8 collation, it only includes digits 0-9. Pass down the right information from the pg_locale_t into initcap_wbnext to differentiate the behavior. Reported-by: Noah Misch <noah@leadboat.com> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com	2025-04-21 12:34:58 -07:00
Peter Geoghegan	a6cab6a78e	Harmonize function parameter names for Postgres 18. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 18 development. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit `035ce1fe`).	2025-04-12 12:07:36 -04:00
Fujii Masao	173c97812f	Use XLOG_CONTROL_FILE macro consistently for control file name. The XLOG_CONTROL_FILE macro (defined in access/xlog_internal.h) represents the control file name. While some parts of the codebase already use this macro, others previously hardcoded the file name as a string. This commit replaces those hardcoded strings with the macro, ensuring consistent usage throughout the code. This makes future maintenance easier and improves searchability, for example when grepping for control file usage. Author: Anton A. Melnikov <a.melnikov@postgrespro.ru> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Masao Fujii <masao.fujii@gmail.com> Discussion: https://postgr.es/m/0841ec77-47e5-452a-adb4-c6fa55d605fc@postgrespro.ru	2025-04-07 09:27:33 +09:00
Peter Eisentraut	82a46cca99	Update Unicode data to Unicode 16.0.0 Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/flat/146349e4-4687-4321-91af-f235572490a8@eisentraut.org	2025-04-03 12:00:09 +02:00
Peter Eisentraut	bbf24fe2f1	Update code comment Commit `4e7f62bc38` added a new input file to a script but didn't update the comment listing the input files.	2025-04-03 09:20:25 +02:00
Peter Eisentraut	34f04aa653	Fix update-unicode make target The addition of SpecialCasing.txt by commit `286a365b9c` was not added to the make target dependencies, so the invoked script would fail because the required file wasn't downloaded first. (The meson version appears to work correctly.)	2025-04-03 09:20:25 +02:00
Richard Guo	7c82b4f711	Fix integer-overflow problem in scram_SaltedPassword() Setting the iteration count for SCRAM secret generation to INT_MAX will cause an infinite loop in scram_SaltedPassword() due to integer overflow, as the loop uses the "i <= iterations" comparison. To fix, use "i < iterations" instead. Back-patch to v16 where the user-settable GUC scram_iterations has been added. Author: Kevin K Biju <kevinkbiju@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAM45KeEMm8hnxdTOxA98qhfZ9CzGDdgy3mxgJmy0c+2WwjA6Zg@mail.gmail.com	2025-03-26 17:46:51 +09:00
Nathan Bossart	626d7236b6	pg_upgrade: Add --swap for faster file transfer. This new option instructs pg_upgrade to move the data directories from the old cluster to the new cluster and then to replace the catalog files with those generated for the new cluster. This mode can outperform --link, --clone, --copy, and --copy-file-range, especially on clusters with many relations. However, this mode creates many garbage files in the old cluster, which can prolong the file synchronization step if --sync-method=syncfs is used. To handle that, we recommend using --sync-method=fsync with this mode, and pg_upgrade internally uses "initdb --sync-only --no-sync-data-files" for file synchronization. pg_upgrade will synchronize the catalog files as they are transferred. We assume that the database files transferred from the old cluster were synchronized prior to upgrade. This mode also complicates reverting to the old cluster, so we recommend restoring from backup upon failure during or after file transfer. We did consider teaching pg_upgrade how to generate a revert script for such failures, but we decided against it due to the rarity of failing during file transfer, the complexity of generating the script, and the potential for misusing the script. The new mode is limited to clusters located in the same file system. With some effort, we could probably support upgrades between different file systems, but this mode is unlikely to offer much benefit if we have to copy the files across file system boundaries. It is also limited to upgrades from version 10 or newer. There are a few known obstacles for using swap mode to upgrade from older versions. For example, the visibility map format changed in v9.6, and the sequence tuple format changed in v10. In fact, swap mode omits the --sequence-data option in its uses of pg_dump and instead reuses the old cluster's sequence data files. While teaching swap mode to deal with these kinds of changes is surely possible (and we may have to deal with similar problems in the future, anyway), it doesn't seem worth the effort to support upgrades from long-unsupported versions. Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan	2025-03-25 16:02:35 -05:00
Nathan Bossart	cf131fa942	initdb: Add --no-sync-data-files. This new option instructs initdb to skip synchronizing any files in database directories, the database directories themselves, and the tablespace directories, i.e., everything in the base/ subdirectory and any other tablespace directories. Other files, such as those in pg_wal/ and pg_xact/, will still be synchronized unless --no-sync is also specified. --no-sync-data-files is primarily intended for internal use by tools that separately ensure the skipped files are synchronized to disk. A follow-up commit will use this to help optimize pg_upgrade's file transfer step. The --sync-method=fsync implementation of this option makes use of a new exclude_dir parameter for walkdir(). When not NULL, exclude_dir specifies a directory to skip processing. The --sync-method=syncfs implementation of this option just skips synchronizing the non-default tablespace directories. This means that initdb will still synchronize some or all of the database files, but there's not much we can do about that. Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan	2025-03-25 16:02:35 -05:00
Peter Eisentraut	618c64ffd3	Revert workarounds for -Wmissing-braces false positives on old GCC We have collected several instances of a workaround for GCC bug 53119, which caused false-positive compiler warnings. This bug has long been fixed, but was still seen on the buildfarm, most recently on lapwing with gcc (Debian 4.7.2-5). (The GCC bug tracker mentions that a fix was backported to 4.7.4 and 4.8.3.) That compiler no longer runs warning-free since commit `6fdd5d9563`, so we don't need to keep these workarounds. And furthermore, the consensus appears to be that we don't want to keep supporting that era of platform anymore at all. This reverts the following commits: `d937904cce` `506428d091` `b449afb582` `6392f2a096` `bad0763a4d` `5e0c761d0a` and makes a few similar fixes to newer code. Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com	2025-03-20 11:25:58 +01:00
Jeff Davis	549ea06e42	Fix headerscheck warning. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/93731.1742310701@sss.pgh.pa.us	2025-03-18 08:37:07 -07:00
Andrew Dunstan	5eabd91a83	Silence perl critic Commit `27bdec0684` uses a loop variable that is not strictly local to the loop. Perlcritic disapproves, and there's really no reason as the variable is not used outside the loop. Per buildfarm animals koel and crake.	2025-03-15 17:41:54 -04:00
Jeff Davis	27bdec0684	Optimization for lower(), upper(), casefold() functions. Improve performance and reduce table sizes for case mapping. The main case mapping table stores only 16-bit offsets, which can be used to look up the mapped code point in any of the case tables (fold, lower, upper, or title case). Simple case pairs point to the same offsets. Generate a function in generate-unicode_case_table.pl that consists of a nested branches to test for specific codepoint ranges that determine the offset in the main table. Other approaches were considered, such as representing these ranges as another structure (rather than branches in a generated function), or a different approach such as a radix tree, or perfect hashing. The author implemented and tested these alternatives and settled on the generated branches. Author: Alexander Borisov <lex.borisov@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com	2025-03-15 13:00:50 -07:00
Peter Eisentraut	3691edfab9	pg_noreturn to replace pg_attribute_noreturn() We want to support a "noreturn" decoration on more compilers besides just GCC-compatible ones, but for that we need to move the decoration in front of the function declaration instead of either behind it or wherever, which is the current style afforded by GCC-style attributes. Also rename the macro to "pg_noreturn" to be similar to the C11 standard "noreturn". pg_noreturn is now supported on all compilers that support C11 (using _Noreturn), as well as GCC-compatible ones (using __attribute__, as before), as well as MSVC (using __declspec). (When PostgreSQL requires C11, the latter two variants can be dropped.) Now, all supported compilers effectively support pg_noreturn, so the extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped. This also fixes a possible problem if third-party code includes stdnoreturn.h, because then the current definition of #define pg_attribute_noreturn() __attribute__((noreturn)) would cause an error. Note that the C standard does not support a noreturn attribute on function pointer types. So we have to drop these here. There are only two instances at this time, so it's not a big loss. In one case, we can make up for it by adding the pg_noreturn to a wrapper function and adding a pg_unreachable(), in the other case, the latter was already done before. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw	2025-03-13 12:37:26 +01:00
Jeff Davis	d3b2e5e1ab	Refactor convert_case() to prepare for optimizations. Upcoming optimizations will add complexity to convert_case(). This patch reorganizes slightly so that the complexity can be contained within the logic to convert the case of a single character, rather than mixing it in with logic to iterate through the string. Reviewed-by: Alexander Borisov <lex.borisov@gmail.com> Discussion: https://postgr.es/m/44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com	2025-03-12 21:51:52 -07:00
Andres Freund	37c87e63f9	Change relpath() et al to return path by value For AIO, and also some other recent patches, we need the ability to call relpath() in a critical section. Until now that was not feasible, as it allocated memory. The fact that relpath() allocated memory also made it awkward to use in log messages because we had to take care to free the memory afterwards. Which we e.g. didn't do for when zeroing out an invalid buffer. We discussed other solutions, e.g. filling a pre-allocated buffer that's passed to relpath(), but they all came with plenty downsides or were larger projects. The easiest fix seems to be to make relpath() return the path by value. To be able to return the path by value we need to determine the maximum length of a relation path. This patch adds a long #define that computes the exact maximum, which is verified to be correct in a regression test. As this change the signature of relpath(), extensions using it will need to adapt their code. We discussed leaving a backward-compat shim in place, but decided it's not worth it given the use of relpath() doesn't seem widespread. Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra	2025-02-25 09:02:07 -05:00
Peter Eisentraut	3e4d868615	Remove various unnecessary (char ) casts Remove a number of (char ) casts that are unnecessary. Or in some cases, rewrite the code to make the purpose of the cast clearer. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-20 19:49:27 +01:00
Andres Freund	4dc2896353	Add pg_encoding_set_invalid() There are cases where we cannot / do not want to error out for invalidly encoded input. In such cases it can be useful to replace e.g. an incomplete multi-byte characters with bytes that will trigger an error when getting validated as part of a larger string. Unfortunately, until now, for some encoding no such sequence existed. For those encodings this commit removes one previously accepted input combination - we consider that to be ok, as the chosen bytes are outside of the valid ranges for the encodings, we just previously failed to detect that. As we cannot add a new field to pg_wchar_table without breaking ABI, this is implemented "in-line" in the newly added function. Author: Noah Misch <noah@leadboat.com> Reviewed-by: Andres Freund <andres@anarazel.de> Backpatch-through: 13 Security: CVE-2025-1094	2025-02-10 10:03:37 -05:00
Jeff Davis	4e7f62bc38	Add support for Unicode case folding. Expand case mapping tables to include entries for case folding, which are parsed from CaseFolding.txt. Discussion: https://postgr.es/m/a1886ddfcd8f60cb3e905c93009b646b4cfb74c5.camel%40j-davis.com	2025-01-23 09:06:50 -08:00
Andrew Dunstan	ea5ff5833c	Be clearer about when jsonapi's need_escapes is needed Most operations beyond pure json parsing need to set need_escapes to true to get access to field names and string scalars. Document this fact more explicitly. Slightly tweaked patch from: Author: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CADkLM=c49Vkfg2+A8ubSuEtaGEjuaKZXCA6SrXA8kdwHjx3uxQ@mail.gmail.com	2025-01-19 09:09:58 -05:00
Jeff Davis	286a365b9c	Support Unicode full case mapping and conversion. Generate tables from Unicode SpecialCasing.txt to support more sophisticated case mapping behavior: * support case mappings to multiple codepoints, such as "ß" uppercasing to "SS" * support conditional case mappings, such as the "final sigma" * support titlecase variants, such as "ǆ" uppercasing to "Ǆ" but titlecasing to "ǅ" Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org Reviewed-by: Peter Eisentraut, Daniel Verite	2025-01-17 15:56:20 -08:00
Tatsuo Ishii	72ceb21b02	Fix a compiler warning in initStringInfo(). Fix a compiler warning found by Cfbot. This was caused by commit bb86e85e442.	2025-01-11 15:52:37 +09:00
Tatsuo Ishii	a9dcbb4d5c	Add new StringInfo APIs to allow callers to specify the buffer size. Previously StringInfo APIs allocated buffers with fixed initial allocation size of 1024 bytes. This may be too large and inappropriate for some callers that can do with smaller memory buffers. To fix this, introduce new APIs that allow callers to specify initial buffer size. extern StringInfo makeStringInfoExt(int initsize); extern void initStringInfoExt(StringInfo str, int initsize); Existing APIs (makeStringInfo() and initStringInfo()) are changed to call makeStringInfoExt and initStringInfoExt respectively (via inline helper functions makeStringInfoInternal and initStringInfoInternal), with the default buffer size of 1024. Reviewed-by: Nathan Bossart, David Rowley, Michael Paquier, Gurjeet Singh Discussion: https://postgr.es/m/20241225.123704.1194662271286702010.ishii%40postgresql.org	2025-01-11 08:23:46 +09:00
Bruce Momjian	50e6eb731d	Update copyright for 2025 Backpatch-through: 13	2025-01-01 11:21:55 -05:00
Heikki Linnakangas	2571c1d5cc	meson: Export all libcommon functions in Windows builds This fixes "unresolved external symbol" errors with extensions that use functions from libcommon. This was reported with pgvector. Reported-by: Andrew Kane Author: Vladlen Popolitov Backpatch-through: 16, where Meson was introduced Discussion: https://www.postgresql.org/message-id/CAOdR5yF0krWrxycA04rgUKCgKugRvGWzzGLAhDZ9bzNv8g0Lag@mail.gmail.com	2024-12-25 18:14:18 +02:00
Michael Paquier	7b2690a571	Fix outdated comment of scram_build_secret() This routine documented that "iterations" would use a default value if set to 0 by the caller. However, the iteration should always be set by the caller to a value strictly more than 0, as documented by an assertion. Oversight in `b577743000`, that has made the iteration count of SCRAM configurable. Author: Matheus Alcantara Discussion: https://postgr.es/m/ac858943-4743-44cd-b4ad-08a0c10cbbc8@gmail.com Backpatch-through: 16	2024-12-10 12:54:09 +09:00
Andrew Dunstan	5c32c21afe	jsonapi: add lexer option to keep token ownership Commit `0785d1b8b` adds support for libpq as a JSON client, but allocations for string tokens can still be leaked during parsing failures. This is tricky to fix for the object_field semantic callbacks: the field name must remain valid until the end of the object, but if a parsing error is encountered partway through, object_field_end() won't be invoked and the client won't get a chance to free the field name. This patch adds a flag to switch the ownership of parsed tokens to the lexer. When this is enabled, the client must make a copy of any tokens it wants to persist past the callback lifetime, but the lexer will handle necessary cleanup on failure. Backend uses of the JSON parser don't need to use this flag, since the parser's allocations will occur in a short lived memory context. A -o option has been added to test_json_parser_incremental to exercise the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP tests make use of it. (The test program now cleans up allocated memory, so that tests can be usefully run under leak sanitizers.) Author: Jacob Champion Discussion: https://postgr.es/m/CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com	2024-11-27 12:07:14 -05:00
Álvaro Herrera	e6c32d9fad	Clean up newlines following left parentheses Most came in during the 17 cycle, so backpatch there. Some (particularly reorderbuffer.h) are very old, but backpatching doesn't seem useful. Like commits `c9d2977519`, `c4f113e8fe`.	2024-11-26 17:10:07 +01:00
Peter Eisentraut	ecb5af7798	Remove unused #include's from bin .c files as determined by IWYU Similar to commit `dbbca2cf29`, but for bin and some related files. Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org	2024-11-06 11:11:52 +01:00
Nathan Bossart	849110dd3e	Optimize sifting down in binaryheap. Presently, each iteration of the loop in sift_down() will perform 3 comparisons if both children are larger than the parent node (2 for comparing each child to the parent node, and a third to compare the children to each other). By first comparing the children to each other and then comparing the larger child to the parent node, we can accomplish the same thing with just 2 comparisons (while also not affecting the number of comparisons in any other case). Author: ChangAo Chen Reviewed-by: Robert Haas Discussion: https://postgr.es/m/tencent_0142D8DA90940B9930BCC08348BBD6D0BB07%40qq.com	2024-10-30 11:28:34 -05:00
Peter Eisentraut	2845cd1ca0	meson: Add missing dependency to unicode test programs The test programs in src/common/unicode/ (case_test, category_test, norm_test), don't build with meson if the nls option is enabled, because a libintl dependency is missing. Fix that. (The makefiles are ok.) Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/52db1d2b-4b96-473e-b323-a4b16a950fba%40eisentraut.org	2024-10-30 08:30:00 +01:00
Tom Lane	11b7de4a78	Unify src/common/'s definitions of MaxAllocSize. As threatened in the previous patch, define MaxAllocSize in src/include/common/fe_memutils.h rather than having several copies of it in different src/common/*.c files. This also provides an opportunity to document it better. While this would probably be safe to back-patch, I'll refrain (for now anyway).	2024-10-28 14:39:01 -04:00
Tom Lane	bd28431672	Guard against enormously long input in pg_saslprep(). Coverity complained that pg_saslprep() could suffer integer overflow, leading to under-allocation of the output buffer, if the input string exceeds SIZE_MAX/4. This hazard seems largely hypothetical, but it's easy enough to defend against, so let's do so. This patch creates a third place in src/common/ where we are locally defining MaxAllocSize so that we can test against that in the same way in backend and frontend compiles. That seems like about two places too many, so the next patch will move that into common/fe_memutils.h. I'm hesitant to do that in back branches however. Back-patch to v14. The code looks similar in older branches, but before commit `67a472d71` there was a separate test on the input string length that prevented this hazard. Per Coverity report.	2024-10-28 14:33:55 -04:00
Peter Eisentraut	4b652692e9	Fix memory leaks from incorrect strsep() uses Commit `5d2e1cc117` introduced some strsep() uses, but it did the memory management wrong in some cases. We need to keep a separate pointer to the allocate memory so that we can free it later, because strsep() advances the pointer we pass to it, and it at the end it will be NULL, so any free() calls won't do anything. (This fixes two of the four places changed in commit `5d2e1cc117`. The other two don't have this problem.) Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-10-18 11:29:20 +02:00
Peter Eisentraut	41b023946d	jsonapi: fully initialize dummy lexer Valgrind reports that checks on lex->inc_state are undefined for the "dummy lexer" used for incremental parsing, since it's only partially initialized on the stack. This was introduced in `0785d1b8b2`. Zero-initialize the whole struct. Author: Jacob Champion <jacob.champion@enterprisedb.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://www.postgresql.org/message-id/CAOYmi+n9QWr4gsAADZc6qFQjFViXQYVk=gBy_EvxuqsgPJcb_g@mail.gmail.com	2024-10-17 08:23:46 +02:00
Jeff Davis	a7f2f6adc2	Whitespace fixup from generated unicode tables. When running the 'update-unicode' build target, generate files that conform to pgindent whitespace rules.	2024-10-16 12:21:13 -07:00

1 2 3 4 5 ...

484 commits