Commit graph

251 commits

Author SHA1 Message Date
John Naylor
911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
Michael Paquier
6fb7c5d67c Centralize timestamp computation of control file on updates
This commit moves the timestamp computation of the control file within
the routine of src/common/ in charge of updating the backend's control
file, which is shared by multiple frontend tools (pg_rewind,
pg_checksums and pg_resetwal) and the backend itself.

This change has as direct effect to update the control file's timestamp
when writing the control file in pg_rewind and pg_checksums, something
that is helpful to keep track of control file updates for those
operations, something also tracked by the backend at startup within its
logs.  This part is arguably a bug, as ControlFileData->time should be
updated each time a new version of the control file is written, but this
is a behavior change so no backpatch is done.

Author: Amul Sul
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
2021-11-29 13:36:13 +09:00
Tom Lane
3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
Tom Lane
46d665bc26 Allow psql's other uses of simple_prompt() to be interrupted by ^C.
This fills in the work left un-done by 5f1148224.  \prompt can
be canceled out of now, and so can password prompts issued during
\connect.  (We don't need to do anything for password prompts
issued during startup, because we aren't yet trapping SIGINT
at that point.)

Nathan Bossart

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-19 12:11:46 -05:00
Tom Lane
5f1148224b Provide a variant of simple_prompt() that can be interrupted by ^C.
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT).  This
is pretty user-unfriendly, so improve it.  To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.

This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets().  While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.

psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.

Nathan Bossart, minor tweaks by me

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-17 19:09:54 -05:00
Michael Paquier
098c134556 Fix buffer overrun in unicode string normalization with empty input
PostgreSQL 13 and newer versions are directly impacted by that through
the SQL function normalize(), which would cause a call of this function
to write one byte past its allocation if using in input an empty
string after recomposing the string with NFC and NFKC.  Older versions
(v10~v12) are not directly affected by this problem as the only code
path using normalization is SASLprep in SCRAM authentication that
forbids the case of an empty string, but let's make the code more robust
anyway there so as any out-of-core callers of this function are covered.

The solution chosen to fix this issue is simple, with the addition of a
fast-exit path if the decomposed string is found as empty.  This would
only happen for an empty string as at its lowest level a codepoint would
be decomposed as itself if it has no entry in the decomposition table or
if it has a decomposition size of 0.

Some tests are added to cover this issue in v13~.  Note that an empty
string has always been considered as normalized (grammar "IS NF[K]{C,D}
NORMALIZED", through the SQL function is_normalized()) for all the
operations allowed (NFC, NFD, NFKC and NFKD) since this feature has been
introduced as of 2991ac5.  This behavior is unchanged but some tests are
added in v13~ to check after that.

I have also checked "make normalization-check" in src/common/unicode/,
while on it (works in 13~, and breaks in older stable branches
independently of this commit).

The release notes should just mention this commit for v13~.

Reported-by: Matthijs van der Vleuten
Discussion: https://postgr.es/m/17277-0c527a373794e802@postgresql.org
Backpatch-through: 10
2021-11-11 15:00:59 +09:00
Daniel Gustafsson
0ded7039fa Fix memory leak in pg_hmac
The intermittent h buffer was not freed, causing it to leak. Backpatch
through 14 where HMAC was refactored to the current API.

Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Discussion: https://postgr.es/m/af07e620-7e28-a742-4637-2bc44aa7c2be@postgrespro.ru
Backpatch-through: 14
2021-10-01 22:47:05 +02:00
Michael Paquier
e767ddcd35 Fix typos and grammar in code comments
Several mistakes have piled in the code comments over the time,
including incorrect grammar, function names and simple typos.  This
commit takes care of a portion of these.

No backpatch is done as this is only cosmetic.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
2021-09-27 14:21:28 +09:00
John Naylor
5bc429aacb Extend collection of Unicode combining characters to beyond the BMP
The former limit was perhaps a carryover from an older hand-coded
table. Since commit bab982161 we have enough space in mbinterval to
store larger codepoints, so collect all combining characters.

Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com
2021-08-26 13:07:34 -04:00
John Naylor
bab982161e Update display widths as part of updating Unicode
The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era.  This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.

To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the offical sources.

In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.

Jacob Champion

Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
2021-08-26 10:53:56 -04:00
John Naylor
1563ecbc1b Revert "Rename unicode_combining_table to unicode_width_table"
This reverts commit eb0d0d2c73.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 10:06:12 -04:00
John Naylor
f8c8a8bccc Revert "Change mbbisearch to return the character range"
This reverts commit 78ab944cd4.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 09:58:28 -04:00
John Naylor
78ab944cd4 Change mbbisearch to return the character range
Add a width field to mbinterval and have mbbisearch return a
pointer to the found range rather than just bool for success.
A future commit will add another width besides zero, and this
will allow that to use the same search.

Reviewed by Jacob Champion
Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com
2021-08-25 13:08:11 -04:00
John Naylor
eb0d0d2c73 Rename unicode_combining_table to unicode_width_table
No functional changes. A future commit will use this table for
other purposes besides combining characters.
2021-08-25 13:01:35 -04:00
Michael Paquier
2576dcfb76 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13.  Fow now and
post-beta3, this is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:13 +09:00
Michael Paquier
b44669b2ca Simplify error handing of jsonapi.c for the frontend
This commit removes a dependency to the central logging facilities in
the JSON parsing routines of src/common/, which existed to log errors
when seeing error codes that do not match any existing values in
JsonParseErrorType, which is not something that should never happen.

The routine providing a detailed error message based on the error code
is made backend-only, the existing code being unsafe to use in the
frontend as the error message may finish by being palloc'd or point to a
static string, so there is no way to know if the memory of the message
should be pfree'd or not.  The only user of this routine in the frontend
was pg_verifybackup, that is changed to use a more generic error message
on parsing failure.

Note that making this code more resilient to OOM failures if used in
shared libraries would require much more work as a lot of code paths
still rely on palloc() & friends, but we are not sure yet if we need to
go down to that.  Still, removing the dependency to logging is a step
toward more portability.

This cleans up the handling of check_stack_depth() while on it, as it
exists only in the backend.

Per discussion with Jacob Champion and Tom Lane.

Discussion: https://postgr.es/m/YNwL7kXwn3Cckbd6@paquier.xyz
2021-07-02 09:35:12 +09:00
Tom Lane
42f94f56bf Fix incautious handling of possibly-miscoded strings in client code.
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.

This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them.  However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.

Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.)  Those functions were already
made proof against this class of problem, cf CVE-2006-2313.

To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it.  (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem.  So we just want a
version for those callers that don't have any better way of handling
this issue.)

Per private report from houjingyi.  Back-patch to all supported branches.
2021-06-07 14:15:25 -04:00
David Rowley
7fc26d11e3 Adjust locations which have an incorrect copyright year
A few patches committed after ca3b37487 mistakenly forgot to make the
copyright year 2021.  Fix these.

Discussion: https://postgr.es/m/CAApHDvqyLmd9P2oBQYJ=DbrV8QwyPRdmXtCTFYPE08h+ip0UJw@mail.gmail.com
2021-06-04 12:19:50 +12:00
Peter Eisentraut
82c3cd9741 Factor out system call names from error messages
Instead, put them in via a format placeholder.  This reduces the
number of distinct translatable messages and also reduces the chances
of typos during translation.  We already did this for the system call
arguments in a number of cases, so this is just the same thing taken a
bit further.

Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2021-04-23 14:21:37 +02:00
Michael Paquier
7ef8b52cf0 Fix typos and grammar in comments and docs
Author: Justin Pryzby
Discussion: https://postgr.es/m/20210416070310.GG3315@telsasoft.com
2021-04-19 11:32:30 +09:00
Michael Paquier
e6bdfd9700 Refactor HMAC implementations
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently).  If there
is no such libraries, a fallback implementation is available.  Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.

HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC.  This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.

This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1.  I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.

Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
2021-04-03 17:30:49 +09:00
Peter Eisentraut
f06b1c5982 pg_upgrade: Check version of target cluster binaries
This expands the binary validation in pg_upgrade with a version
check per binary to ensure that the target cluster installation
only contains binaries from the target version.

In order to reduce duplication, validate_exec is exported from
port.h and the local copy in pg_upgrade is removed.

Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/9328.1552952117@sss.pgh.pa.us
2021-03-03 09:45:56 +01:00
Tom Lane
ffd3944ab9 Improve reporting for syntax errors in multi-line JSON data.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was.  Simplify the logic a bit.  Add test cases.

Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
2021-03-01 16:44:17 -05:00
Michael Paquier
b83dcf7928 Add result size as argument of pg_cryptohash_final() for overflow checks
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that.  This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.

The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core.  Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes.  In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.

This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.

Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
2021-02-15 10:18:34 +09:00
Michael Paquier
42d74e0c44 Fix copy-paste error with SHA256 digest length in checksum_helper.c
Issue introduced by 87ae969, noticed while working on the area.  While
on it, fix some grammar in the surrounding static assertions.
2021-02-11 19:16:11 +09:00
Michael Paquier
fe61df7f82 Introduce --with-ssl={openssl} as a configure option
This is a replacement for the existing --with-openssl, extending the
logic to make easier the addition of new SSL libraries.  The grammar is
chosen to be similar to --with-uuid, where multiple values can be
chosen, with "openssl" as the only supported value for now.

The original switch, --with-openssl, is kept for compatibility.

Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se
2021-02-01 19:19:44 +09:00
Heikki Linnakangas
b80e10638e Add mbverifystr() functions specific to each encoding.
This makes pg_verify_mbstr() function faster, by allowing more efficient
encoding-specific implementations. All the implementations included in
this commit are pretty naive, they just call the same encoding-specific
verifychar functions that were used previously, but that already gives a
performance boost because the tight character-at-a-time loop is simpler.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
2021-01-28 14:40:07 +02:00
Michael Paquier
a8ed6bb8f4 Introduce SHA1 implementations in the cryptohash infrastructure
With this commit, SHA1 goes through the implementation provided by
OpenSSL via EVP when building the backend with it, and uses as fallback
implementation KAME which was located in pgcrypto and already shaped for
an integration with a set of init, update and final routines.
Structures and routines have been renamed to make things consistent with
the fallback implementations of MD5 and SHA2.

uuid-ossp has used for ages a shortcut with pgcrypto to fetch a copy of
SHA1 if needed.  This was built depending on the build options within
./configure, so this cleans up some code and removes the build
dependency between pgcrypto and uuid-ossp.

Note that this will help with the refactoring of HMAC, as pgcrypto
offers the option to use MD5, SHA1 or SHA2, so only the second option
was missing to make that possible.

Author: Michael Paquier
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
2021-01-23 11:33:04 +09:00
Michael Paquier
aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Michael Paquier
15b824da97 Fix and simplify some code related to cryptohashes
This commit addresses two issues:
- In pgcrypto, MD5 computation called pg_cryptohash_{init,update,final}
without checking for the result status.
- Simplify pg_checksum_raw_context to use only one variable for all the
SHA2 options available in checksum manifests.

Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/f62f26bb-47a5-8411-46e5-4350823e06a5@iki.fi
2021-01-08 10:37:03 +09:00
Michael Paquier
55fe26a4b5 Fix allocation logic of cryptohash context data with OpenSSL
The allocation of the cryptohash context data when building with OpenSSL
was happening in the memory context of the caller of
pg_cryptohash_create(), which could lead to issues with resowner cleanup
if cascading resources are cleaned up on an error.  Like other
facilities using resowners, move the base allocation to TopMemoryContext
to ensure a correct cleanup on failure.

The resulting code gets simpler with this commit as the context data is
now hold by a unique opaque pointer, so as there is only one single
allocation done in TopMemoryContext.

After discussion, also change the cryptohash subroutines to return an
error if the caller provides NULL for the context data to ease error
detection on OOM.

Author: Heikki Linnakangas
Discussion: https://postgr.es/m/X9xbuEoiU3dlImfa@paquier.xyz
2021-01-07 10:21:02 +09:00
Bruce Momjian
ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane
7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Bruce Momjian
3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Bruce Momjian
7705f8ca03 Fix function call typo in frontend Win32 code, commit 978f869b99
Reported-by: buildfarm member walleye

Backpatch-through: master
2020-12-25 20:49:50 -05:00
Tom Lane
0848cf4f55 Really fix the dummy implementations in cipher.c.
945083b2f wasn't enough to silence compiler warnings.
2020-12-25 14:45:24 -05:00
Bruce Momjian
8e59813e22 fix no-return function call in cipher.c from commit 978f869b99
Reported-by: buildfarm member sifaka

Backpatch-through: master
2020-12-25 14:40:46 -05:00
Bruce Momjian
e35b2bad1a remove uint128 requirement from patch 978f869b99 (CFE)
Used char[16] instead.

Reported-by: buildfarm member florican

Backpatch-through: master
2020-12-25 11:35:59 -05:00
Bruce Momjian
945083b2f7 Fix return value and const declaration from commit 978f869b99
This fixes the non-OpenSSL compile case.

Reported-by: buildfarm member sifaka

Backpatch-through: master
2020-12-25 11:00:32 -05:00
Bruce Momjian
978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Bruce Momjian
c3826f831e move hex_decode() to /common so it can be called from frontend
This allows removal of a copy of hex_decode() from ecpg, and will be
used by the soon-to-be added pg_alterckey command.

Backpatch-through: master
2020-12-24 17:25:48 -05:00
Michael Paquier
93e8ff8701 Refactor logic to check for ASCII-only characters in string
The same logic was present for collation commands, SASLprep and
pgcrypto, so this removes some code.

Author: Michael Paquier
Reviewed-by: Stephen Frost, Heikki Linnakangas
Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
2020-12-21 09:37:11 +09:00
Michael Paquier
9b584953e7 Improve some code around cryptohash functions
This adjusts some code related to recent changes for cryptohash
functions:
- Add a variable in md5.h to track down the size of a computed result,
moved from pgcrypto.  Note that pg_md5_hash() assumed a result of this
size already.
- Call explicit_bzero() on the hashed data when freeing the context for
fallback implementations.  For MD5, particularly, it would be annoying
to leave some non-zeroed data around.
- Clean up some code related to recent changes of uuid-ossp.  .gitignore
still included md5.c and a comment was incorrect.

Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
2020-12-14 12:38:13 +09:00
Michael Paquier
b67b57a966 Refactor MD5 implementations according to new cryptohash infrastructure
This commit heavily reorganizes the MD5 implementations that exist in
the tree in various aspects.

First, MD5 is added to the list of options available in cryptohash.c and
cryptohash_openssl.c.  This means that if building with OpenSSL, EVP is
used for MD5 instead of the fallback implementation that Postgres had
for ages.  With the recent refactoring work for cryptohash functions,
this change is straight-forward.  If not building with OpenSSL, a
fallback implementation internal to src/common/ is used.

Second, this reduces the number of MD5 implementations present in the
tree from two to one, by moving the KAME implementation from pgcrypto to
src/common/, and by removing the implementation that existed in
src/common/.  KAME was already structured with an init/update/final set
of routines by pgcrypto (see original pgcrypto/md5.h) for compatibility
with OpenSSL, so moving it to src/common/ has proved to be a
straight-forward move, requiring no actual manipulation of the internals
of each routine.  Some benchmarking has not shown any performance gap
between both implementations.

Similarly to the fallback implementation used for SHA2, the fallback
implementation of MD5 is moved to src/common/md5.c with an internal
header called md5_int.h for the init, update and final routines.  This
gets then consumed by cryptohash.c.

The original routines used for MD5-hashed passwords are moved to a
separate file called md5_common.c, also in src/common/, aimed at being
shared between all MD5 implementations as utility routines to keep
compatibility with any code relying on them.

Like the SHA2 changes, this commit had its round of tests on both Linux
and Windows, across all versions of OpenSSL supported on HEAD, with and
even without OpenSSL.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20201106073434.GA4961@paquier.xyz
2020-12-10 11:59:10 +09:00
Michael Paquier
16c302f512 Simplify code for getting a unicode codepoint's canonical class.
Three places of unicode_norm.c use a similar logic for getting the
combining class from a codepoint.  Commit 2991ac5 has added the function
get_canonical_class() for this purpose, but it was only called by the
backend.  This commit refactors the code to use this function in all
the places where the combining class is retrieved from a given
codepoint.

Author: John Naylor
Discussion: https://postgr.es/m/CAFBsxsHUV7s7YrOm6hFz-Jq8Sc7K_yxTkfNZxsDV-DuM-k-gwg@mail.gmail.com
2020-12-09 13:24:38 +09:00
Michael Paquier
4f48a6fbe2 Change SHA2 implementation based on OpenSSL to use EVP digest routines
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
This takes advantage of the refactoring done in 87ae969 that has
introduced the allocation and free routines for cryptographic hashes.

Since 1.1.0, OpenSSL does not publish the contents of the cryptohash
contexts, forcing any consumers to rely on OpenSSL for all allocations.
Hence, the resource owner callback mechanism gains a new set of routines
to track and free cryptohash contexts when using OpenSSL, preventing any
risks of leaks in the backend.  Nothing is needed in the frontend thanks
to the refactoring of 87ae969, and the resowner knowledge is isolated
into cryptohash_openssl.c.

Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson, Heikki Linnakangas
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
2020-12-04 10:49:23 +09:00
Michael Paquier
91624c2ff8 Fix compilation warnings in cryptohash_openssl.c
These showed up with -O2.  Oversight in 87ae969.

Author: Fujii Masao
Discussion: https://postgr.es/m/cee3df00-566a-400c-1252-67c3701f918a@oss.nttdata.com
2020-12-02 12:31:10 +09:00
Michael Paquier
87ae9691d2 Move SHA2 routines to a new generic API layer for crypto hashes
Two new routines to allocate a hash context and to free it are created,
as these become necessary for the goal behind this refactoring: switch
the all cryptohash implementations for OpenSSL to use EVP (for FIPS and
also because upstream does not recommend the use of low-level cryptohash
functions for 20 years).  Note that OpenSSL hides the internals of
cryptohash contexts since 1.1.0, so it is necessary to leave the
allocation to OpenSSL itself, explaining the need for those two new
routines.  This part is going to require more work to properly track
hash contexts with resource owners, but this not introduced here.
Still, this refactoring makes the move possible.

This reduces the number of routines for all SHA2 implementations from
twelve (SHA{224,256,386,512} with init, update and final calls) to five
(create, free, init, update and final calls) by incorporating the hash
type directly into the hash context data.

The new cryptohash routines are moved to a new file, called cryptohash.c
for the fallback implementations, with SHA2 specifics becoming a part
internal to src/common/.  OpenSSL specifics are part of
cryptohash_openssl.c.  This infrastructure is usable for more hash
types, like MD5 or HMAC.

Any code paths using the internal SHA2 routines are adapted to report
correctly errors, which are most of the changes of this commit.  The
zones mostly impacted are checksum manifests, libpq and SCRAM.

Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it
lacked the refactoring needed for libpq, as done here.

This patch has been tested on Linux and Windows, with and without
OpenSSL, and down to 1.0.1, the oldest version supported on HEAD.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
2020-12-02 10:37:20 +09:00
Peter Eisentraut
c9f0624bc2 Add support for abstract Unix-domain sockets
This is a variant of the normal Unix-domain sockets that don't use the
file system but a separate "abstract" namespace.  At the user
interface, such sockets are represented by names starting with "@".
Supported on Linux and Windows right now.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/6dee8574-b0ad-fc49-9c8c-2edc796f0033@2ndquadrant.com
2020-11-25 08:33:57 +01:00
Michael Paquier
ceaeac54f7 Fix minor issues with new unicode {de,re}composition code
The table generation script would incorrectly complain in the
recomposition sorting when matching code points.  This would not have
caused the generation of an incorrect table.  Note that this condition
is not reachable yet, but could have been reached with future updates.

pg_bswap.h does not need to be included in the frontend.x

Author: John Naylor
Discussion: https://postgr.es/m/CAFBsxsGWmExpvv=61vtDKCs7+kBbhkwBDL2Ph9CacziFKnV_yw@mail.gmail.com
2020-11-07 10:15:58 +09:00