501 commits

Author SHA1 Message Date
Michael Paquier
3d10ece612 Make implementation of SASLprep compliant for ASCII characters
This commit makes our implementation of SASLprep() compliant with RFC
3454 (Stringprep) and RFC 4013 (SASLprep).  Originally, as introduced in
60f11b87a2, the operation considered a password made of only ASCII
characters as valid, performing an optimization for this case to skip
the internal NFKC transformation.

However, the RFCs listed above use a different definition, with the
following characters being prohibited:
- 0x00~0x1F (0~31), control characters.
- 0x7F (127, DEL).

In its SCRAM protocol, Postgres applies a password as-is if SASLprep()
fails, so this change is safe on backward-compatibility grounds:
- A libpq client with the compliant SASLprep can connect to a server
with a non-compliant SASLprep.
- A libpq client with the non-compliant SASLprep can connect to a server
with a compliant SASLprep.

This commit removes the all-ASCII optimization used in pg_saslprep() and
applies SASLprep even if a password is made only of ASCII characters,
making the operation compatible with the RFC.  All the in-core callers
of pg_saslprep() do that:
- pg_be_scram_build_secret() in auth-scram.c, when generating a
SCRAM verifier for rolpassword in the backend.
- scram_init() in fe-auth-scram.c, when starting the SASL exchange.
- pg_fe_scram_build_secret() in fe-auth-scram.c, when generating a SCRAM
verifier for the frontend with libpq, to generate it for an ALTER/CREATE
ROLE command for example.

The test module test_saslprep shows the difference this change is
leading to.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aaEJ-El2seZHeFcG@paquier.xyz
2026-03-24 08:29:23 +09:00
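The ASCII rule described in 3d10ece612 can be illustrated with a small check. This is a minimal sketch, not the pg_saslprep() code: the function name ascii_has_prohibited is hypothetical, and it only covers the two ASCII ranges the commit message lists (0x00~0x1F and 0x7F).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether a buffer contains an ASCII character
 * from the prohibited ranges listed above (0x00-0x1F, or 0x7F).
 * Bytes with the high bit set are ignored here; a real SASLprep
 * implementation handles those through the full Stringprep tables.
 */
static bool
ascii_has_prohibited(const unsigned char *s, size_t len)
{
    assert(s != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if (s[i] <= 0x1F || s[i] == 0x7F)
            return true;
    }
    return false;
}
```

Under the fallback described in the message, a password for which this check returns true would be used as-is rather than rejected.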
Andrew Dunstan
c8a350a439 Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.

This change promotes code reuse and simplifies maintenance across
frontend tools.

Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
2026-03-20 15:31:35 -04:00
Peter Eisentraut
57ee397953 Update Unicode data to Unicode 17.0.0
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/2a668979-ed92-49a3-abf9-a3ec2d460ec2%40eisentraut.org
2026-03-20 08:42:50 +01:00
Peter Eisentraut
1b0c269f2e Implement unaccent Unicode data update in meson
The meson/ninja update-unicode target did not cover the required
updates in contrib/unaccent/.  This is fixed now.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/2a668979-ed92-49a3-abf9-a3ec2d460ec2%40eisentraut.org
2026-03-18 13:42:05 +01:00
Álvaro Herrera
868825aaeb Don't include wait_event.h in pgstat.h
wait_event.h itself includes wait_event_types.h, which is a generated
file, so it's nice that we can avoid compiling >10% of the tree just
because that file is regenerated.

To avoid breaking too many third-party modules, we now #include
utils/wait_classes.h in storage/latch.h.  Then, the very common case
of doing
	WaitLatch(..., PG_WAIT_EXTENSION)
continues to work by including just storage/latch.h.  (I didn't try to
determine how many modules would actually break if we don't do this, but
this seems a convenient and low-impact measure.)

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202602181214.gcmhx2vhlxzp@alvherre.pgsql
2026-03-06 16:24:58 +01:00
Peter Eisentraut
3f98862980 Fix some -Wcast-qual warnings
This fixes some warnings from -Wcast-qual that are easy to fix,
without using unconstify or the like.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/990c9117-b013-4026-aaf5-261fe2832c3d%40eisentraut.org
2026-02-27 21:57:33 +01:00
Peter Eisentraut
78727dcba3 meson: allow disabling building/installation of static libraries.
We now support the common meson option -Ddefault_library, with values
'both' (the default), 'shared' (install only shared libraries), and
'static' (install only static libraries).  The 'static' choice doesn't
actually work, since psql and other programs insist on linking to the
shared version of libpq, but it's there pro-forma.  It could be built
out if we really wanted, but since we have never supported the
equivalent in the autoconf build system, there doesn't appear to be an
urgent need.

With an eye to re-supporting AIX, the internal implementation
distinguishes whether to install libpgport.a and other static-only
libraries from whether to build/install the static variant of
libraries that we can build both ways.  This detail isn't exposed as a
meson option, though it could be if there's demand.

The Cirrus CI task SanityCheck now uses -Ddefault_library=shared to
save a little bit of build time (and to test this option).

Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/e8aa97db-872b-4087-b073-f296baae948d@eisentraut.org
2026-02-23 16:45:40 +01:00
Peter Eisentraut
8354b9d6b6 Use fallthrough attribute instead of comment
Instead of using comments to mark fallthrough switch cases, use the
fallthrough attribute.  This will (in the future, not here) allow
supporting other compilers besides gcc.  The commenting convention is
only supported by gcc, the attribute is supported by clang, and in the
fullness of time the C23 standard attribute would allow supporting
other compilers as well.

Right now, we package the attribute into a macro called
pg_fallthrough.  This commit defines that macro and replaces the
existing comments with that macro invocation.

We also raise the level of the gcc -Wimplicit-fallthrough= option from
3 to 5 to enforce the use of the attribute.

Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org
2026-02-19 08:51:12 +01:00
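The attribute-in-a-macro approach from 8354b9d6b6 might look roughly like the following. This is only a sketch: demo_fallthrough is a hypothetical stand-in, and the actual definition of pg_fallthrough may differ. The example function cleanup_steps_from is also invented for illustration.

```c
#include <assert.h>

/*
 * Hypothetical macro for illustration.  Use the GNU-style attribute where
 * the compiler advertises it, and expand to an empty statement otherwise.
 */
#if defined(__has_attribute)
#if __has_attribute(fallthrough)
#define demo_fallthrough __attribute__((fallthrough))
#endif
#endif
#ifndef demo_fallthrough
#define demo_fallthrough ((void) 0)
#endif

/* Count how many cleanup steps run when starting at the given stage. */
static int
cleanup_steps_from(int stage)
{
    int steps = 0;

    assert(stage >= 0);

    switch (stage)
    {
        case 2:
            steps++;
            demo_fallthrough;   /* intentional: continue with stage 1 */
        case 1:
            steps++;
            demo_fallthrough;   /* intentional: continue with stage 0 */
        case 0:
            steps++;
            break;
        default:
            break;
    }
    return steps;
}
```

Unlike a comment, the macro is an ordinary statement, so a compiler can verify it regardless of how it parses comments.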
Thomas Munro
74ee636cc9 Fix mb2wchar functions on short input.
When converting multibyte to pg_wchar, the UTF-8 implementation would
silently ignore an incomplete final character, while the other
implementations would cast a single byte to pg_wchar, and then repeat
for the remaining byte sequence.  While it didn't overrun the buffer, it
was surely garbage output.

Make all encodings behave like the UTF-8 implementation.  A later change
for master only will convert this to an error, but we choose not to
back-patch that behavior change on the off-chance that someone is
relying on the existing UTF-8 behavior.

Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
2026-02-09 12:08:58 +13:00
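The behavior chosen in 74ee636cc9 (silently ignore an incomplete final character) can be sketched with a made-up encoding. Everything here is hypothetical: demo_mb2wchar and its two-byte encoding rule are assumptions for illustration, not one of the PostgreSQL conversion functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical converter for a made-up encoding in which a byte with the
 * high bit set starts a two-byte character.  Returns the number of wide
 * characters written to "to", which must have room for "len" entries.
 */
static size_t
demo_mb2wchar(const unsigned char *from, unsigned int *to, size_t len)
{
    size_t cnt = 0;

    assert(from != NULL && to != NULL);

    while (len > 0)
    {
        if (*from & 0x80)
        {
            /* Incomplete final character: stop instead of emitting garbage. */
            if (len < 2)
                break;
            to[cnt++] = ((unsigned int) from[0] << 8) | from[1];
            from += 2;
            len -= 2;
        }
        else
        {
            to[cnt++] = *from++;
            len--;
        }
    }
    return cnt;
}
```

The length check before consuming the trail byte is what separates "ignore the short tail" from "cast each leftover byte to a wide character".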
Thomas Munro
af79c30dc3 Fix encoding length for EUC_CN.
While EUC_CN supports only 1- and 2-byte sequences (CS0, CS1), the
mb<->wchar conversion functions allow 3-byte sequences beginning SS2,
SS3.

Change pg_encoding_max_length() to return 3, not 2, to close a
hypothesized buffer overrun if a corrupted string is converted to wchar
and back again in a newly allocated buffer.  We might reconsider that in
master (ie harmonizing in a different direction), but this change seems
better for the back-branches.

Also change pg_euccn_mblen() to report SS2 and SS3 characters as having
length 3 (following the example of EUC_KR).  Even though such characters
would not pass verification, it's remotely possible that invalid bytes
could be used to compute a buffer size for use in wchar conversion.

Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
2026-02-09 12:08:58 +13:00
Peter Eisentraut
5ca5f12c2c Fix accidentally cast away qualifiers
This fixes cases where a qualifier (const, in all cases here) was
dropped by a cast, but the cast was otherwise necessary or desirable,
so the straightforward fix is to add the qualifier into the cast.

Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org
2026-01-26 16:02:31 +01:00
Álvaro Herrera
35e3fae738 Remove #include <math.h> where not needed
Liujinyang reported the one in binaryheap.c, I then found and analyzed
the rest.

For future patches, we require git archaeological analysis before we
accept patches of this nature.

Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
2026-01-15 19:09:47 +01:00
Peter Eisentraut
fa16e7fd84 Revert "Replace pg_restrict by standard restrict"
This reverts commit f0f2c0c1ae.

The original problem that led to the use of pg_restrict was that MSVC
couldn't handle plain restrict, and defining it to something else
would conflict with its __declspec(restrict) that is used in system
header files.  In C11 mode, this is no longer a problem, as MSVC
handles plain restrict.  This led to the commit to replace pg_restrict
with restrict.  But this did not take C++ into account.  Standard C++
does not have restrict, so we defined it as something else (for
example, MSVC supports __restrict).  But this then again conflicts
with __declspec(restrict) in system header files.  So we have to
revert this attempt.  The comments are updated to clarify that the
reason for this is now C++ only.

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/CAGECzQRoD7chJP1-dneSrhxUJv%2BBRcigoGOO4UwGzaShLot2Yw%40mail.gmail.com
2026-01-14 15:12:25 +01:00
Heikki Linnakangas
ad853bb877 Fix misc typos, mostly in comments
The only user-visible change is the fix in the "malformed
pg_dependencies" error detail. That one is new in commit e1405aa5e3,
so no backpatching required.
2026-01-08 18:10:08 +02:00
Jeff Davis
c4ff35f104 ICU: use UTF8-optimized case conversion API
Initializes a UCaseMap object once for use across calls, and uses
UTF8-optimized APIs.

Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/5a010b27-8ed9-4739-86fe-1562b07ba564@proxel.se
2026-01-06 14:09:07 -08:00
Peter Eisentraut
de746e0d2a Separate read and write pointers in pg_saslprep
Use separate pointers for reading const input ('p') and writing to
mutable output ('outp'), avoiding the need to cast away const on the
input parameter.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aUQHy/MmWq7c97wK%40ip-10-97-1-34.eu-west-3.compute.internal
2026-01-05 11:03:49 +01:00
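The read/write pointer split from de746e0d2a can be shown on a toy filter. This is a sketch under assumptions: copy_without_spaces is a hypothetical function, and only the pointer names p and outp come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical filter: copy "input" into "output", dropping spaces.
 * "output" must have room for strlen(input) + 1 bytes.
 */
static size_t
copy_without_spaces(const char *input, char *output)
{
    const char *p = input;      /* read pointer, keeps the const qualifier */
    char       *outp = output;  /* write pointer into the mutable buffer */

    assert(input != NULL && output != NULL);

    for (; *p != '\0'; p++)
    {
        if (*p != ' ')
            *outp++ = *p;
    }
    *outp = '\0';
    return (size_t) (outp - output);
}
```

Because the input is only ever touched through a const pointer, no cast is needed and an accidental write through p would fail to compile.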
Bruce Momjian
451c43974f Update copyright for 2026
Backpatch-through: 14
2026-01-01 13:24:10 -05:00
Michael Paquier
0c3c5c3b06 Use palloc_object() and palloc_array() in more areas of the tree
The idea is to encourage wider use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().

The following paths are included in this batch, treating all the areas
proposed by the author for the most trivial changes, except src/backend
(by far the largest batch):
src/bin/
src/common/
src/fe_utils/
src/include/
src/pl/
src/test/
src/tutorial/

Similar work has been done in 31d3847a37.

The code compiles the same before and after this commit, with the
following exceptions due to changes in line numbers because some of the
new allocation formulas are shorter:
blkreftable.c
pgfnames.c
pl_exec.c

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-09 14:53:17 +09:00
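The type-safety argument in 0c3c5c3b06 can be sketched with stand-in macros. These are assumptions for illustration: demo_alloc_object and demo_alloc_array are hypothetical names, and malloc() replaces palloc() so the example compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins.  The macro result already has type "type *", so
 * assigning it to a pointer of another type draws a compiler diagnostic,
 * which a bare "void *" allocation would not.
 */
#define demo_alloc_object(type) ((type *) malloc(sizeof(type)))
#define demo_alloc_array(type, count) ((type *) malloc(sizeof(type) * (count)))

typedef struct DemoPoint
{
    int x;
    int y;
} DemoPoint;

/* Allocate and fill one object; the caller frees it. */
static DemoPoint *
make_point(int x, int y)
{
    DemoPoint *pt = demo_alloc_object(DemoPoint);

    assert(pt != NULL);
    pt->x = x;
    pt->y = y;
    return pt;
}

/* Allocate a temporary array of n ints and sum the squares 1..n. */
static int
sum_of_squares(int n)
{
    int *vals;
    int  total = 0;

    assert(n > 0);
    vals = demo_alloc_array(int, n);
    assert(vals != NULL);

    for (int i = 0; i < n; i++)
        vals[i] = (i + 1) * (i + 1);
    for (int i = 0; i < n; i++)
        total += vals[i];

    free(vals);
    return total;
}
```

The macro form is also shorter than spelling out the cast and sizeof, which fits the message's remark about shorter allocation formulas.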
Tom Lane
8f1791c618 Fix some cases of indirectly casting away const.
Newest versions of gcc are able to detect cases where code implicitly
casts away const by assigning the result of strchr() or a similar
function applied to a "const char *" value to a target variable
that's just "char *".  This of course creates a hazard of not getting
a compiler warning about scribbling on a string one was not supposed
to, so fixing up such cases is good.

This patch fixes a dozen or so places where we were doing that.
Most are trivial additions of "const" to the target variable,
since no actually-hazardous change was occurring.  There is one
place in ecpg.trailer where we were indeed violating the intention
of not modifying a string passed in as "const char *".  I believe
that's harmless, not a live bug, but let's fix it by copying the
string before modifying it.

There is a remaining trouble spot in ecpg/preproc/variable.c,
which requires more complex surgery.  I've left that out of this
commit because I want to study that code a bit more first.

We probably will want to back-patch this once compilers that detect
this pattern get into wider circulation, but for now I'm just
going to apply it to master to see what the buildfarm says.

Thanks to Bertrand Drouvot for finding a couple more spots than
I had.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
2025-12-05 11:17:23 -05:00
Peter Eisentraut
4f941d432b Remove useless casting to same type
This removes some casts where the input already has the same type as
the type specified by the cast.  Their presence could cause risks of
hiding actual type mismatches in the future or silently discarding
qualifiers.  It also improves readability.  Same kind of idea as
7f798aca1d and ef8fe69360.  (This does not change all such
instances, but only those hand-picked by the author.)

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
2025-12-02 10:09:32 +01:00
Daniel Gustafsson
d22cc7326c Document that pg_getaddrinfo_all does not accept null hints
While the underlying getaddrinfo call accepts a null pointer for
the hintp parameter, pg_getaddrinfo_all does not.  Document this
difference with a comment to make it clear.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru>
Discussion: https://postgr.es/m/1e5efc94-407e-40b8-8b10-4d25f823c6d7@postgrespro.ru
2025-11-13 16:35:07 +01:00
Michael Paquier
84fb27511d Replace off_t by pgoff_t in I/O routines
PostgreSQL's Windows port has never been able to handle files larger
than 2GB due to the use of off_t for file offsets, only 32-bit on
Windows.  This causes signed integer overflow at exactly 2^31 bytes when
trying to handle files larger than 2GB, for the routines touched by this
commit.

Note that large files are forbidden by ./configure (3c6248a828) and
meson (recent change, see 79cd66f28c).  This restriction also exists
in v16 and older versions for the now-dead MSVC scripts.

The code base already defines pgoff_t as __int64 (64-bit) on Windows for
this purpose, and some function declarations in headers use it, but many
internals still rely on off_t.  This commit switches more routines to
use pgoff_t, offering more portability, for areas mainly related to file
extensions and storage.

These are not critical for WAL segments yet, which currently have a
maximum allowed size of 1GB (though this opens the door to allowing a
larger size for them).  This matters more for segment files if we want
to lift the large file restriction in ./configure and meson in the
future, which would make sense to remove once/if all traces of off_t are
gone from the tree.  This can additionally matter for out-of-core code
that may want files larger than 2GB in places where off_t is four bytes
in size.

Note that off_t is still used in other parts of the tree like
buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc.
These other code paths can be addressed separately, and their update
will be required if we want to remove the large file restriction in the
future.  This commit is a good first cut in itself towards more
portability, hopefully.

On Unix-like systems, pgoff_t is defined as off_t, so this change only
affects Windows behavior.

Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
2025-11-13 12:41:40 +09:00
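The 2^31 limit mentioned in 84fb27511d is easy to see with a little arithmetic. This is a sketch only: block_offset, fits_in_32bit_offset, and the 8192-byte block size are assumptions chosen for illustration, with int64_t standing in for a 64-bit offset type and INT32_MAX for the range of a 32-bit off_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BLOCK_SIZE 8192    /* assumed block size for this example */

/* Byte offset of a block, computed in 64 bits so it cannot overflow. */
static int64_t
block_offset(uint32_t blockno)
{
    return (int64_t) blockno * DEMO_BLOCK_SIZE;
}

/* Would this offset survive being stored in a signed 32-bit offset type? */
static bool
fits_in_32bit_offset(int64_t offset)
{
    assert(offset >= 0);
    return offset <= INT32_MAX;
}
```

Block 262144 starts at exactly 2^31 bytes, the first offset a signed 32-bit type cannot represent, while block 262143 still fits.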
Jeff Davis
3853a6956c Use C11 char16_t and char32_t for Unicode code points.
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/bedcc93d06203dfd89815b10f815ca2de8626e85.camel%40j-davis.com
2025-10-29 14:17:13 -07:00
Peter Eisentraut
f0f2c0c1ae Replace pg_restrict by standard restrict
MSVC in C11 mode supports the standard restrict qualifier, so we don't
need the workaround naming pg_restrict anymore.

Even though restrict is in C99 and should be supported by all
supported compilers, we keep the configure test and the hardcoded
redirection to __restrict, because that will also work in C++ in all
supported compilers.  (restrict is not part of the C++ standard.)

For backward compatibility for extensions, we keep a #define of
pg_restrict around, but our own code doesn't use it anymore.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org
2025-10-29 07:52:58 +01:00
Peter Eisentraut
3e908fb54f Fix compiler warnings around _CRT_glob
Newer compilers warned about

    extern int _CRT_glob = 0;

which is indeed a mysterious C construction, as it combines "extern"
and an initialization.  It turns out that according to the C standard,
the "extern" is ignored here, so we can remove it to resolve the
warnings.  But then we also need to add a real extern
declaration (without initializer) to satisfy
-Wmissing-variable-declarations.

(Note that this code is only active on MinGW.)

Discussion: https://www.postgresql.org/message-id/1053279b-da01-4eb4-b7a3-da6b5d8f73d1%40eisentraut.org
2025-10-01 17:13:52 +02:00
Peter Eisentraut
f5aabe6d58 Revert "Make some use of anonymous unions [pgcrypto]"
This reverts commit efcd5199d8.

I rebased my patch series incorrectly.  This patch contained unrelated
parts from another patch, which made the overall build fail.  Revert
for now and reconsider.
2025-09-30 13:12:16 +02:00
Peter Eisentraut
efcd5199d8 Make some use of anonymous unions [pgcrypto]
Make some use of anonymous unions, which are allowed as of C11, as
examples and encouragement for future code, and to test compilers.

This commit changes some structures in pgcrypto.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30 12:35:50 +02:00
Tom Lane
4300d8b6a7 Don't put library-supplied -L/-I switches before user-supplied ones.
For many optional libraries, we extract the -L and -l switches needed
to link the library from a helper program such as llvm-config.  In
some cases we put the resulting -L switches into LDFLAGS ahead of
-L switches specified via --with-libraries.  That risks breaking
the user's intention for --with-libraries.

It's not such a problem if the library's -L switch points to a
directory containing only that library, but on some platforms a
library helper may "helpfully" offer a switch such as -L/usr/lib
that points to a directory holding all standard libraries.  If the
user specified --with-libraries in hopes of overriding the standard
build of some library, the -L/usr/lib switch prevents that from
happening since it will come before the user-specified directory.

To fix, avoid inserting these switches directly into LDFLAGS during
configure, instead adding them to LIBDIRS or SHLIB_LINK.  They will
still eventually get added to LDFLAGS, but only after the switches
coming from --with-libraries.

The same problem exists for -I switches: those coming from
--with-includes should appear before any coming from helper programs
such as llvm-config.  We have not heard field complaints about this
case, but it seems certain that a user attempting to override a
standard library could have issues.

The changes for this go well beyond configure itself, however,
because many Makefiles have occasion to manipulate CPPFLAGS to
insert locally-desirable -I switches, and some of them got it wrong.
The correct ordering is any -I switches pointing at within-the-
source-tree-or-build-tree directories, then those from the tree-wide
CPPFLAGS, then those from helper programs.  There were several places
that risked pulling in a system-supplied copy of libpq headers, for
example, instead of the in-tree files.  (Commit cb36f8ec2 fixed one
instance of that a few months ago, but this exercise found more.)

The Meson build scripts may or may not have any comparable problems,
but I'll leave it to someone else to investigate that.

Reported-by: Charles Samborski <demurgos@demurgos.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/70f2155f-27ca-4534-b33d-7750e20633d7@demurgos.net
Backpatch-through: 13
2025-07-29 15:17:40 -04:00
Álvaro Herrera
2633dae2e4 Standardize LSN formatting by zero padding
This commit standardizes the output format for LSNs to ensure consistent
representation across various tools and messages.  Previously, LSNs were
inconsistently printed as `%X/%X` in some contexts, while others used
zero-padding.  This often led to confusion when comparing.

To address this, the LSN format is now uniformly set to `%X/%08X`,
ensuring the lower 32-bit part is always zero-padded to eight
hexadecimal digits.

Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2025-07-07 13:57:43 +02:00
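The `%X/%08X` format from 2633dae2e4 can be tried with a small helper. This is a sketch, assuming an LSN is held in a 64-bit unsigned integer whose upper and lower 32-bit halves are printed separately; format_lsn is a hypothetical name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an LSN as "high/low", with the low 32 bits
 * zero-padded to eight hexadecimal digits.  Returns snprintf()'s result.
 */
static int
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen, "%X/%08X",
                    (unsigned int) (lsn >> 32),
                    (unsigned int) (lsn & 0xFFFFFFFF));
}
```

For an LSN of 0x16B3748 this yields 0/016B3748, where the unpadded `%X/%X` form would have printed 0/16B3748.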
Heikki Linnakangas
b28c59a6cd Use 'void *' for arbitrary buffers, 'uint8 *' for byte arrays
A 'void *' argument suggests that the caller might pass an arbitrary
struct, which is appropriate for functions like libc's read/write, or
pq_sendbytes(). 'uint8 *' is more appropriate for byte arrays that
have no structure, like the cancellation keys or SCRAM tokens. Some
places used 'char *', but 'uint8 *' is better because 'char *' is
commonly used for null-terminated strings. Change code around SCRAM,
MD5 authentication, and cancellation key handling to follow these
conventions.

Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
2025-05-08 22:01:25 +03:00
Noah Misch
627acc3caa With GB18030, prevent SIGSEGV from reading past end of allocation.
With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from().  Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString().  Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions.  All those functions have been
intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page.  Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach.  Back-patch to v13 (all supported versions).

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
2025-05-05 04:52:04 -07:00
Jeff Davis
90260e2ec6 Fix INITCAP() word boundaries for PG_UNICODE_FAST.
Word boundaries are based on whether a character is alphanumeric or
not. For the PG_UNICODE_FAST collation, alphanumeric includes
non-ASCII digits; whereas for the PG_C_UTF8 collation, it only
includes digits 0-9. Pass down the right information from the
pg_locale_t into initcap_wbnext to differentiate the behavior.

Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com
2025-04-21 12:34:58 -07:00
Peter Geoghegan
a6cab6a78e Harmonize function parameter names for Postgres 18.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 18 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2025-04-12 12:07:36 -04:00
Fujii Masao
173c97812f Use XLOG_CONTROL_FILE macro consistently for control file name.
The XLOG_CONTROL_FILE macro (defined in access/xlog_internal.h)
represents the control file name. While some parts of the codebase already
use this macro, others previously hardcoded the file name as a string.

This commit replaces those hardcoded strings with the macro,
ensuring consistent usage throughout the code. This makes future
maintenance easier and improves searchability, for example when
grepping for control file usage.

Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Masao Fujii <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/0841ec77-47e5-452a-adb4-c6fa55d605fc@postgrespro.ru
2025-04-07 09:27:33 +09:00
Peter Eisentraut
82a46cca99 Update Unicode data to Unicode 16.0.0
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/146349e4-4687-4321-91af-f235572490a8@eisentraut.org
2025-04-03 12:00:09 +02:00
Peter Eisentraut
bbf24fe2f1 Update code comment
Commit 4e7f62bc38 added a new input file to a script but didn't
update the comment listing the input files.
2025-04-03 09:20:25 +02:00
Peter Eisentraut
34f04aa653 Fix update-unicode make target
The addition of SpecialCasing.txt by commit 286a365b9c was not added
to the make target dependencies, so the invoked script would fail
because the required file wasn't downloaded first.  (The meson version
appears to work correctly.)
2025-04-03 09:20:25 +02:00
Richard Guo
7c82b4f711 Fix integer-overflow problem in scram_SaltedPassword()
Setting the iteration count for SCRAM secret generation to INT_MAX
will cause an infinite loop in scram_SaltedPassword() due to integer
overflow, as the loop uses the "i <= iterations" comparison.  To fix,
use "i < iterations" instead.

Back-patch to v16 where the user-settable GUC scram_iterations has
been added.

Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAM45KeEMm8hnxdTOxA98qhfZ9CzGDdgy3mxgJmy0c+2WwjA6Zg@mail.gmail.com
2025-03-26 17:46:51 +09:00
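The loop-bound problem fixed in 7c82b4f711 can be sketched as below. This is not the scram_SaltedPassword() code: extra_rounds is a hypothetical function, and starting the counter at 1 is an assumption made so that the "i < iterations" form performs iterations - 1 extra rounds.

```c
#include <assert.h>

/*
 * Count the rounds performed after the first one.
 *
 * With a condition of "i <= iterations" and iterations == INT_MAX, the
 * condition can never become false: i would have to exceed INT_MAX, and
 * incrementing it past that value overflows.  With "i < iterations" the
 * loop ends as soon as i reaches iterations, so no overflow is possible.
 */
static long
extra_rounds(int iterations)
{
    long rounds = 0;

    assert(iterations >= 1);

    for (int i = 1; i < iterations; i++)
        rounds++;
    return rounds;
}
```

The general point is that an inclusive upper bound equal to the type's maximum leaves the counter no value at which the loop can exit.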
Nathan Bossart
626d7236b6 pg_upgrade: Add --swap for faster file transfer.
This new option instructs pg_upgrade to move the data directories
from the old cluster to the new cluster and then to replace the
catalog files with those generated for the new cluster.  This mode
can outperform --link, --clone, --copy, and --copy-file-range,
especially on clusters with many relations.

However, this mode creates many garbage files in the old cluster,
which can prolong the file synchronization step if
--sync-method=syncfs is used.  To handle that, we recommend using
--sync-method=fsync with this mode, and pg_upgrade internally uses
"initdb --sync-only --no-sync-data-files" for file synchronization.
pg_upgrade will synchronize the catalog files as they are
transferred.  We assume that the database files transferred from
the old cluster were synchronized prior to upgrade.

This mode also complicates reverting to the old cluster, so we
recommend restoring from backup upon failure during or after file
transfer.  We did consider teaching pg_upgrade how to generate a
revert script for such failures, but we decided against it due to
the rarity of failing during file transfer, the complexity of
generating the script, and the potential for misusing the script.

The new mode is limited to clusters located in the same file
system.  With some effort, we could probably support upgrades
between different file systems, but this mode is unlikely to offer
much benefit if we have to copy the files across file system
boundaries.

It is also limited to upgrades from version 10 or newer.  There are
a few known obstacles for using swap mode to upgrade from older
versions.  For example, the visibility map format changed in v9.6,
and the sequence tuple format changed in v10.  In fact, swap mode
omits the --sequence-data option in its uses of pg_dump and instead
reuses the old cluster's sequence data files.  While teaching swap
mode to deal with these kinds of changes is surely possible (and we
may have to deal with similar problems in the future, anyway), it
doesn't seem worth the effort to support upgrades from
long-unsupported versions.

Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Nathan Bossart
cf131fa942 initdb: Add --no-sync-data-files.
This new option instructs initdb to skip synchronizing any files
in database directories, the database directories themselves, and
the tablespace directories, i.e., everything in the base/
subdirectory and any other tablespace directories.  Other files,
such as those in pg_wal/ and pg_xact/, will still be synchronized
unless --no-sync is also specified.  --no-sync-data-files is
primarily intended for internal use by tools that separately ensure
the skipped files are synchronized to disk.  A follow-up commit
will use this to help optimize pg_upgrade's file transfer step.

The --sync-method=fsync implementation of this option makes use of
a new exclude_dir parameter for walkdir().  When not NULL,
exclude_dir specifies a directory to skip processing.  The
--sync-method=syncfs implementation of this option just skips
synchronizing the non-default tablespace directories.  This means
that initdb will still synchronize some or all of the database
files, but there's not much we can do about that.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
2025-03-25 16:02:35 -05:00
Peter Eisentraut
618c64ffd3 Revert workarounds for -Wmissing-braces false positives on old GCC
We have collected several instances of a workaround for GCC bug 53119,
which caused false-positive compiler warnings.  This bug has long been
fixed, but was still seen on the buildfarm, most recently on lapwing
with gcc (Debian 4.7.2-5).  (The GCC bug tracker mentions that a fix
was backported to 4.7.4 and 4.8.3.)

That compiler no longer runs warning-free since commit 6fdd5d9563, so
we don't need to keep these workarounds.  And furthermore, the
consensus appears to be that we don't want to keep supporting that era
of platform anymore at all.

This reverts the following commits:

d937904cce
506428d091
b449afb582
6392f2a096
bad0763a4d
5e0c761d0a

and makes a few similar fixes to newer code.

Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com
2025-03-20 11:25:58 +01:00
Jeff Davis
549ea06e42 Fix headerscheck warning.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/93731.1742310701@sss.pgh.pa.us
2025-03-18 08:37:07 -07:00
Andrew Dunstan
5eabd91a83 Silence perl critic
Commit 27bdec0684 uses a loop variable that is not strictly local to
the loop. Perlcritic disapproves, and there's really no reason as the
variable is not used outside the loop.

Per buildfarm animals koel and crake.
2025-03-15 17:41:54 -04:00
Jeff Davis
27bdec0684 Optimization for lower(), upper(), casefold() functions.
Improve performance and reduce table sizes for case mapping.

The main case mapping table stores only 16-bit offsets, which can be
used to look up the mapped code point in any of the case tables (fold,
lower, upper, or title case). Simple case pairs point to the same
offsets.

Generate a function in generate-unicode_case_table.pl that consists of
nested branches testing for specific code point ranges, which determine
the offset in the main table.

Other approaches were considered, such as representing these ranges as
another structure (rather than branches in a generated function), or a
different approach such as a radix tree, or perfect hashing. The
author implemented and tested these alternatives and settled on the
generated branches.
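
A toy sketch of the layout described above, assuming only four cased
code points and hand-written branches standing in for the generated
ones.  All names here (case_tables, main_table, main_index, map_case)
are hypothetical, and the real tables are far larger:

```c
#include <assert.h>
#include <stdint.h>

typedef enum { CASE_LOWER = 0, CASE_UPPER = 1, NUM_CASE_KINDS } CaseKind;

/* Per-kind tables of mapped code points; slot 0 means "no mapping". */
static const uint32_t case_tables[NUM_CASE_KINDS][4] = {
    /* lower */ {0, 0x61, 0x62, 0xE9},
    /* upper */ {0, 0x41, 0x42, 0xC9},
};

/* Main table: 16-bit offsets usable with any of the case tables. */
static const uint16_t main_table[] = {
    0,      /* index 0: code point has no case mapping */
    1, 2,   /* 'A', 'B' */
    1, 2,   /* 'a', 'b': simple pairs share the offsets of 'A', 'B' */
    3,      /* U+00C9 */
    3,      /* U+00E9: same offset as U+00C9 */
};

/* Hand-written stand-in for the generated nested branches. */
static uint16_t main_index(uint32_t cp)
{
    if (cp < 0x80)
    {
        if (cp >= 0x41 && cp <= 0x42)
            return (uint16_t) (cp - 0x41 + 1);
        if (cp >= 0x61 && cp <= 0x62)
            return (uint16_t) (cp - 0x61 + 3);
    }
    else if (cp == 0xC9)
        return 5;
    else if (cp == 0xE9)
        return 6;
    return 0;
}

/* Look up the mapped code point; unmapped input is returned unchanged. */
static uint32_t map_case(uint32_t cp, CaseKind kind)
{
    uint16_t idx = main_index(cp);
    uint16_t off;

    assert(idx < sizeof(main_table) / sizeof(main_table[0]));
    off = main_table[idx];
    return off ? case_tables[kind][off] : cp;
}
```

Because both members of a simple pair resolve to the same offset, only
the choice of case table decides which of the two code points comes
back.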

Author: Alexander Borisov <lex.borisov@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com
2025-03-15 13:00:50 -07:00
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)
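
As a rough illustration of the three variants, assuming a hypothetical
macro name my_noreturn (this is a sketch, not the actual definition
from the patch), the selection and the new placement in front of the
declaration might look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro for illustration; not PostgreSQL's definition. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define my_noreturn _Noreturn
#elif defined(__GNUC__)
#define my_noreturn __attribute__((noreturn))
#elif defined(_MSC_VER)
#define my_noreturn __declspec(noreturn)
#else
#define my_noreturn
#endif

/* The decoration goes in front of the declaration. */
my_noreturn static void die(const char *msg);

static void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(1);
}

static int checked_div(int a, int b)
{
    if (b == 0)
        die("division by zero");    /* control never continues past here */
    return a / b;
}

/* Usage: the non-error path returns normally. */
static void demo(void)
{
    assert(checked_div(6, 3) == 2);
}
```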

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
Jeff Davis
d3b2e5e1ab Refactor convert_case() to prepare for optimizations.
Upcoming optimizations will add complexity to convert_case(). This
patch reorganizes slightly so that the complexity can be contained
within the logic to convert the case of a single character, rather
than mixing it in with logic to iterate through the string.

Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://postgr.es/m/44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com
2025-03-12 21:51:52 -07:00
Andres Freund
37c87e63f9 Change relpath() et al to return path by value
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.

The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards, which we
e.g. didn't do when zeroing out an invalid buffer.

We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty of downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.

To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.
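
A minimal sketch of the return-by-value idea, assuming a simplified
"base/<db>/<rel>" layout and hypothetical names (DemoPathStr,
DEMO_PATH_MAXLEN, demo_relpath); the real macro covers many more path
components:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Exact maximum for "base/<db>/<rel>": a uint32 has at most 10 digits. */
#define DEMO_PATH_MAXLEN (sizeof("base/") - 1 + 10 + 1 + 10)

/* Wrapping the array in a struct lets it be returned by value. */
typedef struct DemoPathStr
{
    char str[DEMO_PATH_MAXLEN + 1];
} DemoPathStr;

/* Builds the path without allocating memory; nothing to free afterwards. */
static DemoPathStr demo_relpath(uint32_t db_oid, uint32_t rel_number)
{
    DemoPathStr path;
    int len;

    len = snprintf(path.str, sizeof(path.str),
                   "base/%" PRIu32 "/%" PRIu32, db_oid, rel_number);
    assert(len > 0 && (size_t) len <= DEMO_PATH_MAXLEN);
    return path;
}

/* Usage: the result lives on the caller's stack. */
static void demo(void)
{
    DemoPathStr p = demo_relpath(5, 16384);

    printf("relation file: %s\n", p.str);
}
```

Since the buffer is part of the returned struct, a caller in a critical
section or in a log message has no allocation to make or release.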

As this changes the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Peter Eisentraut
3e4d868615 Remove various unnecessary (char *) casts
Remove a number of (char *) casts that are unnecessary.  Or in some
cases, rewrite the code to make the purpose of the cast clearer.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-20 19:49:27 +01:00
Andres Freund
4dc2896353 Add pg_encoding_set_invalid()
There are cases where we cannot / do not want to error out for invalidly
encoded input. In such cases it can be useful to replace e.g. an incomplete
multi-byte character with bytes that will trigger an error when getting
validated as part of a larger string.

Unfortunately, until now, for some encodings no such sequence existed. For
those encodings this commit removes one previously accepted input combination
- we consider that to be ok, as the chosen bytes are outside of the valid
ranges for the encodings, we just previously failed to detect that.

As we cannot add a new field to pg_wchar_table without breaking ABI, this is
implemented "in-line" in the newly added function.
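
To illustrate the idea for UTF-8 only, here is a sketch with
hypothetical helpers (utf8_seq_len, utf8_is_valid, demo_set_invalid).
The byte pair is an assumption chosen for the example, and the
validator is simplified: it checks lead and continuation bytes only.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by a lead byte, or 0 if it can never start one. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if (b >= 0xC2 && b <= 0xDF)
        return 2;
    if (b >= 0xE0 && b <= 0xEF)
        return 3;
    if (b >= 0xF0 && b <= 0xF4)
        return 4;
    return 0;                   /* 0x80..0xC1 and 0xF5..0xFF */
}

/* Simplified validator: lead bytes and continuation bytes only. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        size_t n = utf8_seq_len(s[i]);

        if (n == 0 || n > len - i)
            return 0;
        for (size_t k = 1; k < n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += n;
    }
    return 1;
}

/* Overwrite two bytes with a pair that fails validation in any context. */
static void demo_set_invalid(unsigned char *dst)
{
    assert(dst != NULL);
    dst[0] = 0xC0;              /* never a valid UTF-8 lead byte */
    dst[1] = ' ';
}
```

A truncated three-byte character such as 0xE2 0x82 becomes valid again
if a later byte happens to complete it; after demo_set_invalid() the
surrounding string fails validation no matter what follows.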

Author: Noah Misch <noah@leadboat.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 13
Security: CVE-2025-1094
2025-02-10 10:03:37 -05:00
Jeff Davis
4e7f62bc38 Add support for Unicode case folding.
Expand case mapping tables to include entries for case folding, which
are parsed from CaseFolding.txt.

Discussion: https://postgr.es/m/a1886ddfcd8f60cb3e905c93009b646b4cfb74c5.camel%40j-davis.com
2025-01-23 09:06:50 -08:00