Commit graph

484 commits

Author SHA1 Message Date
Michael Paquier
273fe94852 Fix off-by-one with NFC recomposition for Hangul U+11A7 (TBASE)
The NFC recomposition incorrectly included TBASE as a valid T syllable,
which is incorrect based on the Unicode specification (TBASE is one
below the start of the range, range beginning at U+11A8).

This would cause the TBASE to be silently swallowed in the
normalization, leading to an incorrect result.

A couple of regression tests are added to check more patterns with
Hangul recomposition and decomposition, on top of a test to check the
problem with TBASE.  Diego has submitted the code fix, and I have
written the tests.

Author: Diego Frias <mail@dzfrias.dev>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/B92ED640-7D4A-4505-B09F-3548F58CBB16@dzfrias.dev
Backpatch-through: 14
2026-06-05 07:50:12 +09:00
Tom Lane
01e568b8c1 Fix assorted places that need to use palloc_array().
multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size.  This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.

Fix these places by using palloc_array() instead of a handwritten
multiplication.  (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)

In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter.  If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow.  Add an explicit check to forestall that and keep the
behavior the same cross-platform.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:47 -07:00
Tom Lane
8d1489d505 Prevent buffer overrun in unicode_normalize().
Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value.  This results in a small output buffer
allocation and subsequent buffer overrun.

To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so.  In frontend code we want to
just return NULL for this failure (treating it like OOM).  In the
backend, we can rely on the following palloc() call to throw error.

I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array().  These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.

Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:47 -07:00
Tom Lane
e1c30458a1 Make palloc_array() and friends safe against integer overflow.
Sufficiently large "count" arguments could result in undetected
overflow, causing the allocated memory chunk to be much smaller
than what the caller will subsequently write into it.  This is
unlikely to be a hazard with 64-bit size_t but can sometimes
happen on 32-bit builds, primarily where a function allocates
workspace that's significantly larger than its input data.
Rather than trying to patch the at-risk callers piecemeal,
let's just redefine these macros so that they always check.

To do that, move the longstanding add_size() and mul_size() functions
into palloc.h and mcxt.c, and adjust them to not be specific to
shared-memory allocation.  Then invent palloc_mul(), palloc0_mul(),
palloc_mul_extended() to use these functions.  Actually, the latter
use inlined copies to save one function call.  repalloc_array() gets
similar treatment.  I didn't bother trying to inline the calls for
repalloc0_array() though.

In v14 and v15, this also adds repalloc_extended(), which previously
was only available in v16 and up.

We need copies of all this in fe_memutils.[hc] as well, since that
module also provides palloc_array() etc.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:47 -07:00
Jeff Davis
6044f55a47 Fix callers of unicode_strtitle() using srclen == -1.
Currently, only called that way in tests, which failed to fail.

Discussion: https://postgr.es/m/581a72ff452bb045ba83bbe3c6cf4467702d4f0f.camel@j-davis.com
Backpatch-through: 18
2026-04-20 14:45:44 -07:00
Thomas Munro
ea94d2e673 Fix comments for Korean encodings in encnames.c
* JOHAB: replace the incorrect "simplified Chinese" description with
    a correct one that identifies it as the Korean combining (Johab)
    encoding standardized in KS X 1001 annex 3.

  * EUC_KR: drop a stray space before the comma in the existing
    comment, and note that the encoding covers the KS X 1001
    precomposed (Wansung) form.

  * UHC: spell out "Unified Hangul Code", clarify that it is
    Microsoft Windows CodePage 949, and describe its relationship to
    EUC-KR (superset covering all 11,172 precomposed Hangul syllables).

Backpatch-through: 14
Author: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/CAAAe_zAFz1v-3b7Je4L%2B%3DwZM3UGAczXV47YVZfZi9wbJxspxeA%40mail.gmail.com
2026-04-16 18:21:41 +12:00
Andrew Dunstan
c3e436b1cb Fix heap-buffer-overflow in pglz_decompress() on corrupt input.
When decoding a match tag, pglz_decompress() reads 2 bytes (or 3
for extended-length matches) from the source buffer before checking
whether enough data remains.  The existing bounds check (sp > srcend)
occurs after the reads, so truncated compressed data that ends
mid-tag causes a read past the allocated buffer.

Fix by validating that sufficient source bytes are available before
reading each part of the match tag.  The post-read sp > srcend
check is no longer needed and is removed.

Found by fuzz testing with libFuzzer and AddressSanitizer.

Backpatch-through: 14
2026-04-10 10:28:00 -04:00
Andrew Dunstan
3e49556302 Fix incremental JSON parser numeric token reassembly across chunks.
When the incremental JSON parser splits a numeric token across chunk
boundaries, it accumulates continuation characters into the partial
token buffer.  The accumulator's switch statement unconditionally
accepted '+', '-', '.', 'e', and 'E' as valid numeric continuations
regardless of position, which violated JSON number grammar
(-? int [frac] [exp]).  For example, input "4-" fed in single-byte
chunks would accumulate the '-' into the numeric token, producing an
invalid token that later triggered an assertion failure during
re-lexing.

Fix by tracking parser state (seen_dot, seen_exp, prev character)
across the existing partial token and incoming bytes, so that each
character class is accepted only in its grammatically valid position.

Backpatch-through: 17
2026-04-10 10:21:38 -04:00
Tom Lane
de77775a7b Fix some cases of indirectly casting away const.
Newest versions of gcc+glibc are able to detect cases where code
implicitly casts away const by assigning the result of strchr() or
a similar function applied to a "const char *" value to a target
variable that's just "char *".  This of course creates a hazard of
not getting a compiler warning about scribbling on a string one was
not supposed to, so fixing up such cases is good.

This patch fixes a dozen or so places where we were doing that.
Most are trivial additions of "const" to the target variable,
since no actually-hazardous change was occurring.

Thanks to Bertrand Drouvot for finding a couple more spots than
I had.

This commit back-patches relevant portions of 8f1791c61 and
9f7565c6c into supported branches.  However, there are two
places in ecpg (in v18 only) where a proper fix is more
complicated than seems appropriate for a back-patch.  I opted
to silence those two warnings by adding casts.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
Discussion: https://postgr.es/m/3988414.1771950285@sss.pgh.pa.us
Backpatch-through: 14-18
2026-02-25 11:19:50 -05:00
Thomas Munro
efef05ba99 Fix mb2wchar functions on short input.
When converting multibyte to pg_wchar, the UTF-8 implementation would
silently ignore an incomplete final character, while the other
implementations would cast a single byte to pg_wchar, and then repeat
for the remaining byte sequence.  While it didn't overrun the buffer, it
was surely garbage output.

Make all encodings behave like the UTF-8 implementation.  A later change
for master only will convert this to an error, but we choose not to
back-patch that behavior change on the off-chance that someone is
relying on the existing UTF-8 behavior.

Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
2026-02-09 12:12:33 +13:00
Thomas Munro
df0852fe03 Fix encoding length for EUC_CN.
While EUC_CN supports only 1- and 2-byte sequences (CS0, CS1), the
mb<->wchar conversion functions allow 3-byte sequences beginning SS2,
SS3.

Change pg_encoding_max_length() to return 3, not 2, to close a
hypothesized buffer overrun if a corrupted string is converted to wchar
and back again in a newly allocated buffer.  We might reconsider that in
master (ie harmonizing in a different direction), but this change seems
better for the back-branches.

Also change pg_euccn_mblen() to report SS2 and SS3 characters as having
length 3 (following the example of EUC_KR).  Even though such characters
would not pass verification, it's remotely possible that invalid bytes
could be used to compute a buffer size for use in wchar conversion.

Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
2026-02-09 12:12:29 +13:00
Tom Lane
8e5e3ff556 Don't put library-supplied -L/-I switches before user-supplied ones.
For many optional libraries, we extract the -L and -l switches needed
to link the library from a helper program such as llvm-config.  In
some cases we put the resulting -L switches into LDFLAGS ahead of
-L switches specified via --with-libraries.  That risks breaking
the user's intention for --with-libraries.

It's not such a problem if the library's -L switch points to a
directory containing only that library, but on some platforms a
library helper may "helpfully" offer a switch such as -L/usr/lib
that points to a directory holding all standard libraries.  If the
user specified --with-libraries in hopes of overriding the standard
build of some library, the -L/usr/lib switch prevents that from
happening since it will come before the user-specified directory.

To fix, avoid inserting these switches directly into LDFLAGS during
configure, instead adding them to LIBDIRS or SHLIB_LINK.  They will
still eventually get added to LDFLAGS, but only after the switches
coming from --with-libraries.

The same problem exists for -I switches: those coming from
--with-includes should appear before any coming from helper programs
such as llvm-config.  We have not heard field complaints about this
case, but it seems certain that a user attempting to override a
standard library could have issues.

The changes for this go well beyond configure itself, however,
because many Makefiles have occasion to manipulate CPPFLAGS to
insert locally-desirable -I switches, and some of them got it wrong.
The correct ordering is any -I switches pointing at within-the-
source-tree-or-build-tree directories, then those from the tree-wide
CPPFLAGS, then those from helper programs.  There were several places
that risked pulling in a system-supplied copy of libpq headers, for
example, instead of the in-tree files.  (Commit cb36f8ec2 fixed one
instance of that a few months ago, but this exercise found more.)

The Meson build scripts may or may not have any comparable problems,
but I'll leave it to someone else to investigate that.

Reported-by: Charles Samborski <demurgos@demurgos.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/70f2155f-27ca-4534-b33d-7750e20633d7@demurgos.net
Backpatch-through: 13
2025-07-29 15:17:40 -04:00
Heikki Linnakangas
b28c59a6cd Use 'void *' for arbitrary buffers, 'uint8 *' for byte arrays
A 'void *' argument suggests that the caller might pass an arbitrary
struct, which is appropriate for functions like libc's read/write, or
pq_sendbytes(). 'uint8 *' is more appropriate for byte arrays that
have no structure, like the cancellation keys or SCRAM tokens. Some
places used 'char *', but 'uint8 *' is better because 'char *' is
commonly used for null-terminated strings. Change code around SCRAM,
MD5 authentication, and cancellation key handling to follow these
conventions.

Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
2025-05-08 22:01:25 +03:00
Noah Misch
627acc3caa With GB18030, prevent SIGSEGV from reading past end of allocation.
With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from().  Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString().  Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions.  All those functions have been
intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page.  Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach.  Back-patch to v13 (all supported versions).

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
2025-05-05 04:52:04 -07:00
Jeff Davis
90260e2ec6 Fix INITCAP() word boundaries for PG_UNICODE_FAST.
Word boundaries are based on whether a character is alphanumeric or
not. For the PG_UNICODE_FAST collation, alphanumeric includes
non-ASCII digits; whereas for the PG_C_UTF8 collation, it only
includes digits 0-9. Pass down the right information from the
pg_locale_t into initcap_wbnext to differentiate the behavior.

Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com
2025-04-21 12:34:58 -07:00
Peter Geoghegan
a6cab6a78e Harmonize function parameter names for Postgres 18.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 18 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2025-04-12 12:07:36 -04:00
Fujii Masao
173c97812f Use XLOG_CONTROL_FILE macro consistently for control file name.
The XLOG_CONTROL_FILE macro (defined in access/xlog_internal.h)
represents the control file name. While some parts of the codebase already
use this macro, others previously hardcoded the file name as a string.

This commit replaces those hardcoded strings with the macro,
ensuring consistent usage throughout the code. This makes future
maintenance easier and improves searchability, for example when
grepping for control file usage.

Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Masao Fujii <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/0841ec77-47e5-452a-adb4-c6fa55d605fc@postgrespro.ru
2025-04-07 09:27:33 +09:00
Peter Eisentraut
82a46cca99 Update Unicode data to Unicode 16.0.0
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/146349e4-4687-4321-91af-f235572490a8@eisentraut.org
2025-04-03 12:00:09 +02:00
Peter Eisentraut
bbf24fe2f1 Update code comment
Commit 4e7f62bc38 added a new input file to a script but didn't
update the comment listing the input files.
2025-04-03 09:20:25 +02:00
Peter Eisentraut
34f04aa653 Fix update-unicode make target
The addition of SpecialCasing.txt by commit 286a365b9c was not added
to the make target dependencies, so the invoked script would fail
because the required file wasn't downloaded first.  (The meson version
appears to work correctly.)
2025-04-03 09:20:25 +02:00
Richard Guo
7c82b4f711 Fix integer-overflow problem in scram_SaltedPassword()
Setting the iteration count for SCRAM secret generation to INT_MAX
will cause an infinite loop in scram_SaltedPassword() due to integer
overflow, as the loop uses the "i <= iterations" comparison.  To fix,
use "i < iterations" instead.

Back-patch to v16 where the user-settable GUC scram_iterations has
been added.

Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAM45KeEMm8hnxdTOxA98qhfZ9CzGDdgy3mxgJmy0c+2WwjA6Zg@mail.gmail.com
2025-03-26 17:46:51 +09:00
Nathan Bossart
626d7236b6 pg_upgrade: Add --swap for faster file transfer.
This new option instructs pg_upgrade to move the data directories
from the old cluster to the new cluster and then to replace the
catalog files with those generated for the new cluster.  This mode
can outperform --link, --clone, --copy, and --copy-file-range,
especially on clusters with many relations.

However, this mode creates many garbage files in the old cluster,
which can prolong the file synchronization step if
--sync-method=syncfs is used.  To handle that, we recommend using
--sync-method=fsync with this mode, and pg_upgrade internally uses
"initdb --sync-only --no-sync-data-files" for file synchronization.
pg_upgrade will synchronize the catalog files as they are
transferred.  We assume that the database files transferred from
the old cluster were synchronized prior to upgrade.

This mode also complicates reverting to the old cluster, so we
recommend restoring from backup upon failure during or after file
transfer.  We did consider teaching pg_upgrade how to generate a
revert script for such failures, but we decided against it due to
the rarity of failing during file transfer, the complexity of
generating the script, and the potential for misusing the script.

The new mode is limited to clusters located in the same file
system.  With some effort, we could probably support upgrades
between different file systems, but this mode is unlikely to offer
much benefit if we have to copy the files across file system
boundaries.

It is also limited to upgrades from version 10 or newer.  There are
a few known obstacles for using swap mode to upgrade from older
versions.  For example, the visibility map format changed in v9.6,
and the sequence tuple format changed in v10.  In fact, swap mode
omits the --sequence-data option in its uses of pg_dump and instead
reuses the old cluster's sequence data files.  While teaching swap
mode to deal with these kinds of changes is surely possible (and we
may have to deal with similar problems in the future, anyway), it
doesn't seem worth the effort to support upgrades from
long-unsupported versions.

Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Nathan Bossart
cf131fa942 initdb: Add --no-sync-data-files.
This new option instructs initdb to skip synchronizing any files
in database directories, the database directories themselves, and
the tablespace directories, i.e., everything in the base/
subdirectory and any other tablespace directories.  Other files,
such as those in pg_wal/ and pg_xact/, will still be synchronized
unless --no-sync is also specified.  --no-sync-data-files is
primarily intended for internal use by tools that separately ensure
the skipped files are synchronized to disk.  A follow-up commit
will use this to help optimize pg_upgrade's file transfer step.

The --sync-method=fsync implementation of this option makes use of
a new exclude_dir parameter for walkdir().  When not NULL,
exclude_dir specifies a directory to skip processing.  The
--sync-method=syncfs implementation of this option just skips
synchronizing the non-default tablespace directories.  This means
that initdb will still synchronize some or all of the database
files, but there's not much we can do about that.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Peter Eisentraut
618c64ffd3 Revert workarounds for -Wmissing-braces false positives on old GCC
We have collected several instances of a workaround for GCC bug 53119,
which caused false-positive compiler warnings.  This bug has long been
fixed, but was still seen on the buildfarm, most recently on lapwing
with gcc (Debian 4.7.2-5).  (The GCC bug tracker mentions that a fix
was backported to 4.7.4 and 4.8.3.)

That compiler no longer runs warning-free since commit 6fdd5d9563, so
we don't need to keep these workarounds.  And furthermore, the
consensus appears to be that we don't want to keep supporting that era
of platform anymore at all.

This reverts the following commits:

d937904cce
506428d091
b449afb582
6392f2a096
bad0763a4d
5e0c761d0a

and makes a few similar fixes to newer code.

Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com
2025-03-20 11:25:58 +01:00
Jeff Davis
549ea06e42 Fix headerscheck warning.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/93731.1742310701@sss.pgh.pa.us
2025-03-18 08:37:07 -07:00
Andrew Dunstan
5eabd91a83 Silence perl critic
Commit 27bdec0684 uses a loop variable that is not strictly local to
the loop. Perlcritic disapproves, and there's really no reason as the
variable is not used outside the loop.

Per buildfarm animals koel and crake.
2025-03-15 17:41:54 -04:00
Jeff Davis
27bdec0684 Optimization for lower(), upper(), casefold() functions.
Improve performance and reduce table sizes for case mapping.

The main case mapping table stores only 16-bit offsets, which can be
used to look up the mapped code point in any of the case tables (fold,
lower, upper, or title case). Simple case pairs point to the same
offsets.

Generate a function in generate-unicode_case_table.pl that consists of
a nested branches to test for specific codepoint ranges that determine
the offset in the main table.

Other approaches were considered, such as representing these ranges as
another structure (rather than branches in a generated function), or a
different approach such as a radix tree, or perfect hashing. The
author implemented and tested these alternatives and settled on the
generated branches.

Author: Alexander Borisov <lex.borisov@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com
2025-03-15 13:00:50 -07:00
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
Jeff Davis
d3b2e5e1ab Refactor convert_case() to prepare for optimizations.
Upcoming optimizations will add complexity to convert_case(). This
patch reorganizes slightly so that the complexity can be contained
within the logic to convert the case of a single character, rather
than mixing it in with logic to iterate through the string.

Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://postgr.es/m/44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com
2025-03-12 21:51:52 -07:00
Andres Freund
37c87e63f9 Change relpath() et al to return path by value
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.

The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.

We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.

To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.

As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Peter Eisentraut
3e4d868615 Remove various unnecessary (char *) casts
Remove a number of (char *) casts that are unnecessary.  Or in some
cases, rewrite the code to make the purpose of the cast clearer.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-20 19:49:27 +01:00
Andres Freund
4dc2896353 Add pg_encoding_set_invalid()
There are cases where we cannot / do not want to error out for invalidly
encoded input. In such cases it can be useful to replace e.g. an incomplete
multi-byte characters with bytes that will trigger an error when getting
validated as part of a larger string.

Unfortunately, until now, for some encoding no such sequence existed. For
those encodings this commit removes one previously accepted input combination
- we consider that to be ok, as the chosen bytes are outside of the valid
ranges for the encodings, we just previously failed to detect that.

As we cannot add a new field to pg_wchar_table without breaking ABI, this is
implemented "in-line" in the newly added function.

Author: Noah Misch <noah@leadboat.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 13
Security: CVE-2025-1094
2025-02-10 10:03:37 -05:00
Jeff Davis
4e7f62bc38 Add support for Unicode case folding.
Expand case mapping tables to include entries for case folding, which
are parsed from CaseFolding.txt.

Discussion: https://postgr.es/m/a1886ddfcd8f60cb3e905c93009b646b4cfb74c5.camel%40j-davis.com
2025-01-23 09:06:50 -08:00
Andrew Dunstan
ea5ff5833c Be clearer about when jsonapi's need_escapes is needed
Most operations beyond pure json parsing need to set need_escapes to
true to get access to field names and string scalars. Document this
fact more explicitly.

Slightly tweaked patch from:

Author: Corey Huinker <corey.huinker@gmail.com>

Discussion: https://postgr.es/m/CADkLM=c49Vkfg2+A8ubSuEtaGEjuaKZXCA6SrXA8kdwHjx3uxQ@mail.gmail.com
2025-01-19 09:09:58 -05:00
Jeff Davis
286a365b9c Support Unicode full case mapping and conversion.
Generate tables from Unicode SpecialCasing.txt to support more
sophisticated case mapping behavior:

 * support case mappings to multiple codepoints, such as "ß"
   uppercasing to "SS"
 * support conditional case mappings, such as the "final sigma"
 * support titlecase variants, such as "dž" uppercasing to "DŽ" but
   titlecasing to "Dž"

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite
2025-01-17 15:56:20 -08:00
Tatsuo Ishii
72ceb21b02 Fix a compiler warning in initStringInfo().
Fix a compiler warning found by Cfbot. This was caused by commit
bb86e85e442.
2025-01-11 15:52:37 +09:00
Tatsuo Ishii
a9dcbb4d5c Add new StringInfo APIs to allow callers to specify the buffer size.
Previously StringInfo APIs allocated buffers with fixed initial
allocation size of 1024 bytes. This may be too large and inappropriate
for some callers that can do with smaller memory buffers. To fix this,
introduce new APIs that allow callers to specify initial buffer size.

extern StringInfo makeStringInfoExt(int initsize);
extern void initStringInfoExt(StringInfo str, int initsize);

Existing APIs (makeStringInfo() and initStringInfo()) are changed to
call makeStringInfoExt and initStringInfoExt respectively (via inline
helper functions makeStringInfoInternal and initStringInfoInternal),
with the default buffer size of 1024.

Reviewed-by: Nathan Bossart, David Rowley, Michael Paquier, Gurjeet Singh
Discussion: https://postgr.es/m/20241225.123704.1194662271286702010.ishii%40postgresql.org
2025-01-11 08:23:46 +09:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Heikki Linnakangas
2571c1d5cc meson: Export all libcommon functions in Windows builds
This fixes "unresolved external symbol" errors with extensions that
use functions from libcommon. This was reported with pgvector.

Reported-by: Andrew Kane
Author: Vladlen Popolitov
Backpatch-through: 16, where Meson was introduced
Discussion: https://www.postgresql.org/message-id/CAOdR5yF0krWrxycA04rgUKCgKugRvGWzzGLAhDZ9bzNv8g0Lag@mail.gmail.com
2024-12-25 18:14:18 +02:00
Michael Paquier
7b2690a571 Fix outdated comment of scram_build_secret()
This routine documented that "iterations" would use a default value if
set to 0 by the caller.  However, the iteration should always be set by
the caller to a value strictly more than 0, as documented by an
assertion.

Oversight in b577743000, that has made the iteration count of SCRAM
configurable.

Author: Matheus Alcantara
Discussion: https://postgr.es/m/ac858943-4743-44cd-b4ad-08a0c10cbbc8@gmail.com
Backpatch-through: 16
2024-12-10 12:54:09 +09:00
Andrew Dunstan
5c32c21afe jsonapi: add lexer option to keep token ownership
Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.

This patch adds a flag to switch the ownership of parsed tokens to the
lexer. When this is enabled, the client must make a copy of any tokens
it wants to persist past the callback lifetime, but the lexer will
handle necessary cleanup on failure.

Backend uses of the JSON parser don't need to use this flag, since the
parser's allocations will occur in a short lived memory context.

A -o option has been added to test_json_parser_incremental to exercise
the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
tests make use of it. (The test program now cleans up allocated memory,
so that tests can be usefully run under leak sanitizers.)

Author: Jacob Champion

Discussion: https://postgr.es/m/CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com
2024-11-27 12:07:14 -05:00
Álvaro Herrera
e6c32d9fad
Clean up newlines following left parentheses
Most came in during the 17 cycle, so backpatch there.  Some
(particularly reorderbuffer.h) are very old, but backpatching doesn't
seem useful.

Like commits c9d2977519, c4f113e8fe.
2024-11-26 17:10:07 +01:00
Peter Eisentraut
ecb5af7798 Remove unused #include's from bin .c files
as determined by IWYU

Similar to commit dbbca2cf29, but for bin and some related files.

Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
2024-11-06 11:11:52 +01:00
Nathan Bossart
849110dd3e Optimize sifting down in binaryheap.
Presently, each iteration of the loop in sift_down() will perform
3 comparisons if both children are larger than the parent node (2
for comparing each child to the parent node, and a third to compare
the children to each other).  By first comparing the children to
each other and then comparing the larger child to the parent node,
we can accomplish the same thing with just 2 comparisons (while
also not affecting the number of comparisons in any other case).

Author: ChangAo Chen
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/tencent_0142D8DA90940B9930BCC08348BBD6D0BB07%40qq.com
2024-10-30 11:28:34 -05:00
Peter Eisentraut
2845cd1ca0 meson: Add missing dependency to unicode test programs
The test programs in src/common/unicode/ (case_test, category_test,
norm_test), don't build with meson if the nls option is enabled,
because a libintl dependency is missing.  Fix that.  (The makefiles
are ok.)

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/52db1d2b-4b96-473e-b323-a4b16a950fba%40eisentraut.org
2024-10-30 08:30:00 +01:00
Tom Lane
11b7de4a78 Unify src/common/'s definitions of MaxAllocSize.
As threatened in the previous patch, define MaxAllocSize in
src/include/common/fe_memutils.h rather than having several
copies of it in different src/common/*.c files.  This also
provides an opportunity to document it better.

While this would probably be safe to back-patch, I'll refrain
(for now anyway).
2024-10-28 14:39:01 -04:00
Tom Lane
bd28431672 Guard against enormously long input in pg_saslprep().
Coverity complained that pg_saslprep() could suffer integer overflow,
leading to under-allocation of the output buffer, if the input string
exceeds SIZE_MAX/4.  This hazard seems largely hypothetical, but it's
easy enough to defend against, so let's do so.

This patch creates a third place in src/common/ where we are locally
defining MaxAllocSize so that we can test against that in the same way
in backend and frontend compiles.  That seems like about two places
too many, so the next patch will move that into common/fe_memutils.h.
I'm hesitant to do that in back branches however.

Back-patch to v14.  The code looks similar in older branches, but
before commit 67a472d71 there was a separate test on the input string
length that prevented this hazard.

Per Coverity report.
2024-10-28 14:33:55 -04:00
Peter Eisentraut
4b652692e9 Fix memory leaks from incorrect strsep() uses
Commit 5d2e1cc117 introduced some strsep() uses, but it did the
memory management wrong in some cases.  We need to keep a separate
pointer to the allocate memory so that we can free it later, because
strsep() advances the pointer we pass to it, and it at the end it
will be NULL, so any free() calls won't do anything.

(This fixes two of the four places changed in commit 5d2e1cc117.  The
other two don't have this problem.)

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
2024-10-18 11:29:20 +02:00
Peter Eisentraut
41b023946d jsonapi: fully initialize dummy lexer
Valgrind reports that checks on lex->inc_state are undefined for the
"dummy lexer" used for incremental parsing, since it's only partially
initialized on the stack. This was introduced in 0785d1b8b2.
Zero-initialize the whole struct.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+n9QWr4gsAADZc6qFQjFViXQYVk=gBy_EvxuqsgPJcb_g@mail.gmail.com
2024-10-17 08:23:46 +02:00
Jeff Davis
a7f2f6adc2 Whitespace fixup from generated unicode tables.
When running the 'update-unicode' build target, generate files that
conform to pgindent whitespace rules.
2024-10-16 12:21:13 -07:00