This routine documented that "iterations" would use a default value if
set to 0 by the caller. However, the iteration should always be set by
the caller to a value strictly more than 0, as documented by an
assertion.
Oversight in b577743000, which made the SCRAM iteration count
configurable.
Author: Matheus Alcantara
Discussion: https://postgr.es/m/ac858943-4743-44cd-b4ad-08a0c10cbbc8@gmail.com
Backpatch-through: 16
Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.
This patch adds a flag to switch the ownership of parsed tokens to the
lexer. When this is enabled, the client must make a copy of any tokens
it wants to persist past the callback lifetime, but the lexer will
handle necessary cleanup on failure.
Backend uses of the JSON parser don't need to use this flag, since the
parser's allocations will occur in a short-lived memory context.
A -o option has been added to test_json_parser_incremental to exercise
the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
tests make use of it. (The test program now cleans up allocated memory,
so that tests can be usefully run under leak sanitizers.)
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com
Most came in during the 17 cycle, so backpatch there. Some
(particularly reorderbuffer.h) are very old, but backpatching doesn't
seem useful.
Like commits c9d2977519, c4f113e8fe.
Presently, each iteration of the loop in sift_down() will perform
3 comparisons if both children are larger than the parent node (2
for comparing each child to the parent node, and a third to compare
the children to each other). By first comparing the children to
each other and then comparing the larger child to the parent node,
we can accomplish the same thing with just 2 comparisons (while
also not affecting the number of comparisons in any other case).
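A minimal sketch of the reordered loop follows. It assumes a plain max-heap of `int`s rather than binaryheap.c's generic node array and comparator callback. The `cmp()` helper and the `ncompares` counter are hypothetical and exist only to make the comparison count visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter, only so the number of comparisons can be checked. */
static int ncompares;

static int
cmp(int a, int b)
{
    ncompares++;
    return (a > b) - (a < b);
}

/*
 * Restore the max-heap property below node i of heap[0..n-1].  The two
 * children are compared with each other first, then only the larger child
 * is compared with the parent: at most 2 comparisons per level.
 */
static void
sift_down(int *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t left = 2 * i + 1;
        size_t right = left + 1;
        size_t larger;
        int tmp;

        if (left >= n)
            break;              /* leaf: nothing below to fix */

        /* Pick the larger child (1 comparison, or 0 if only one child). */
        larger = left;
        if (right < n && cmp(heap[right], heap[left]) > 0)
            larger = right;

        /* Stop once the parent is >= its larger child. */
        if (cmp(heap[i], heap[larger]) >= 0)
            break;

        tmp = heap[i];
        heap[i] = heap[larger];
        heap[larger] = tmp;
        i = larger;
    }
}
```

For the array {1, 9, 8, 5, 4, 7, 6}, the root value 1 sinks two levels using four comparisons; comparing each child with the parent first would have taken six.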
Author: ChangAo Chen
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/tencent_0142D8DA90940B9930BCC08348BBD6D0BB07%40qq.com
As threatened in the previous patch, define MaxAllocSize in
src/include/common/fe_memutils.h rather than having several
copies of it in different src/common/*.c files. This also
provides an opportunity to document it better.
While this would probably be safe to back-patch, I'll refrain
(for now anyway).
Coverity complained that pg_saslprep() could suffer integer overflow,
leading to under-allocation of the output buffer, if the input string
exceeds SIZE_MAX/4. This hazard seems largely hypothetical, but it's
easy enough to defend against, so let's do so.
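The defense amounts to rejecting oversized input before the multiplication. The sketch below does not reproduce pg_saslprep() itself; `alloc_wchar_buffer()` is a hypothetical helper, and `pg_wchar` and `MaxAllocSize` are redefined locally (with PostgreSQL's values) so that the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies for illustration; PostgreSQL defines both in its headers. */
typedef unsigned int pg_wchar;
#define MaxAllocSize ((size_t) 0x3fffffff)   /* 1 GB - 1 */

/*
 * Hypothetical helper: allocate room for input_len code points plus a
 * terminator.  Checking the length before multiplying ensures that
 * (input_len + 1) * sizeof(pg_wchar) can neither wrap around SIZE_MAX
 * nor exceed MaxAllocSize.
 */
static pg_wchar *
alloc_wchar_buffer(size_t input_len)
{
    if (input_len >= MaxAllocSize / sizeof(pg_wchar))
        return NULL;
    return malloc((input_len + 1) * sizeof(pg_wchar));
}
```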
This patch creates a third place in src/common/ where we are locally
defining MaxAllocSize so that we can test against that in the same way
in backend and frontend compiles. That seems like about two places
too many, so the next patch will move that into common/fe_memutils.h.
I'm hesitant to do that in back branches however.
Back-patch to v14. The code looks similar in older branches, but
before commit 67a472d71 there was a separate test on the input string
length that prevented this hazard.
Per Coverity report.
Commit 5d2e1cc117 introduced some strsep() uses, but it did the
memory management wrong in some cases. We need to keep a separate
pointer to the allocated memory so that we can free it later, because
strsep() advances the pointer we pass to it, and at the end it
will be NULL, so any free() calls on it won't do anything.
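The pattern looks roughly like this. `count_fields()` is a hypothetical example rather than one of the functions touched by the commit. Note that strsep() is a BSD/glibc extension, not ISO C, hence the feature-test macro.

```c
#define _DEFAULT_SOURCE         /* expose strsep() and strdup() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: count the comma-separated fields of `list`.  strsep()
 * advances `cursor` and leaves it NULL after the last field, so the
 * allocation is remembered separately in `copy` for free().
 */
static int
count_fields(const char *list)
{
    char *copy = strdup(list);  /* owns the allocation */
    char *cursor = copy;        /* consumed by strsep() */
    int n = 0;

    if (copy == NULL)
        return -1;

    while (strsep(&cursor, ",") != NULL)
        n++;

    /* cursor is NULL here; free(cursor) would silently leak copy */
    free(copy);
    return n;
}
```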
(This fixes two of the four places changed in commit 5d2e1cc117. The
other two don't have this problem.)
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
Valgrind reports that checks on lex->inc_state are undefined for the
"dummy lexer" used for incremental parsing, since it's only partially
initialized on the stack. This was introduced in 0785d1b8b2.
Zero-initialize the whole struct.
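The difference can be sketched with a hypothetical stand-in struct; `DemoLexer` is not the real JsonLexContext, and only its `inc_state` member is modeled on the field named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer struct; not the real JsonLexContext. */
typedef struct DemoLexer
{
    const char *input;
    size_t      input_length;
    void       *inc_state;      /* only set for incremental lexers */
} DemoLexer;

static int
is_incremental(const DemoLexer *lex)
{
    return lex->inc_state != NULL;
}

static int
dummy_is_incremental(const char *input, size_t len)
{
    /*
     * "= {0}" zero-initializes every member, so inc_state is a null
     * pointer.  Assigning only input and input_length to an uninitialized
     * stack struct would leave inc_state indeterminate, which is what
     * Valgrind flags.
     */
    DemoLexer dummy = {0};

    dummy.input = input;
    dummy.input_length = len;
    return is_incremental(&dummy);
}
```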
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+n9QWr4gsAADZc6qFQjFViXQYVk=gBy_EvxuqsgPJcb_g@mail.gmail.com
Coverity pointed out that inc_lex_level() would leak memory
(not to mention corrupt the pstack data structure) if some
but not all of its three REALLOCs failed. To fix, store
successfully-updated pointers back into the pstack struct
immediately.
Oversight in 0785d1b8b, so no need for back-patch.
Based on a patch by Michael Paquier.
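The fix pattern can be sketched as follows, assuming a hypothetical stack of three parallel arrays. The struct and field names are illustrative only, not those of the real pstack.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parse stack with three parallel arrays. */
typedef struct DemoStack
{
    size_t  size;
    char  **names;
    int    *nulls;
    char   *preds;
} DemoStack;

/*
 * Double the capacity.  Each successful realloc() result is stored back
 * immediately, so if a later realloc() fails the struct still points at
 * valid blocks that the caller can free.  Returns 0 on success, -1 on
 * allocation failure.
 */
static int
grow_stack(DemoStack *st)
{
    size_t newsize = st->size * 2;
    void *p;

    p = realloc(st->names, newsize * sizeof(*st->names));
    if (p == NULL)
        return -1;
    st->names = p;

    p = realloc(st->nulls, newsize * sizeof(*st->nulls));
    if (p == NULL)
        return -1;
    st->nulls = p;

    p = realloc(st->preds, newsize * sizeof(*st->preds));
    if (p == NULL)
        return -1;
    st->preds = p;

    st->size = newsize;         /* only once all three have grown */
    return 0;
}
```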
For libpq, use PQExpBuffer instead of StringInfo. This requires us to
track allocation failures so that we can return JSON_OUT_OF_MEMORY as
needed rather than exit()ing.
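Tracking allocation failures rather than exiting can be sketched with a hypothetical buffer that has a sticky "broken" flag, similar in spirit to PQExpBuffer's broken state. `DemoBuf`, `demo_append()`, and `DemoResult` are illustrative names, not libpq or jsonapi API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_OK, DEMO_OUT_OF_MEMORY } DemoResult;

/* Hypothetical growable buffer; once an allocation fails it stays broken. */
typedef struct DemoBuf
{
    char   *data;
    size_t  len;
    size_t  cap;
    int     broken;
} DemoBuf;

static void
demo_append(DemoBuf *buf, const char *s, size_t n)
{
    if (buf->broken)
        return;                 /* keep failing quietly; checked at the end */

    if (buf->len + n + 1 > buf->cap)
    {
        size_t newcap = buf->cap ? buf->cap : 16;
        char *p;

        while (newcap < buf->len + n + 1)
            newcap *= 2;
        p = realloc(buf->data, newcap);
        if (p == NULL)
        {
            buf->broken = 1;    /* record the failure instead of exit() */
            return;
        }
        buf->data = p;
        buf->cap = newcap;
    }
    memcpy(buf->data + buf->len, s, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
}

/* Map the buffer state to a result code the caller can return. */
static DemoResult
demo_finish(const DemoBuf *buf)
{
    return buf->broken ? DEMO_OUT_OF_MEMORY : DEMO_OK;
}
```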
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
In case of torn UTF8 in the input data we might end up going
past the end of the string since we don't account for length.
While validation won't be performed on a sequence with a NULL
byte, it's better to avoid going past the end to begin with.
Fix by taking the length into consideration.
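Taking the length into account means checking the remaining byte count before any continuation byte is read. The sketch below is a hypothetical structural check only. Unlike a full validator, it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one. */
static int
utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 0;                   /* continuation byte or invalid lead byte */
}

/*
 * Return the length of the sequence at s, or 0 if it is malformed or torn.
 * The length check happens before any continuation byte is read, so at
 * most `remaining` bytes are ever examined.
 */
static int
utf8_valid_prefix(const unsigned char *s, size_t remaining)
{
    int len;

    assert(remaining > 0);
    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t) len > remaining)
        return 0;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    return len;
}
```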
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+mTnmM172g=_+Yvc47hzzeAsYPy2C4UBY3HK9p-AXNV0g@mail.gmail.com
Similarly to 2065ddf5e3, this introduces a define for "pg_tblspc".
This makes the style more consistent with the existing PG_STAT_TMP_DIR,
for example.
One difference from the other cases is the introduction of
PG_TBLSPC_DIR_SLASH, which is required in two places for recovery and
backups.
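A plausible reason to have a slash-terminated variant is prefix matching. Without the slash, "pg_tblspc" would also match "pg_tblspcx". In the sketch below, the define values are assumed from their names and `path_is_in_tblspc()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed values, inferred from the names of the defines. */
#define PG_TBLSPC_DIR        "pg_tblspc"
#define PG_TBLSPC_DIR_SLASH  "pg_tblspc/"

/* Hypothetical: does a data-directory-relative path lie under pg_tblspc? */
static bool
path_is_in_tblspc(const char *relpath)
{
    return strncmp(relpath, PG_TBLSPC_DIR_SLASH,
                   strlen(PG_TBLSPC_DIR_SLASH)) == 0;
}
```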
Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael
Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
OpenSSL 1.0.2 has been EOL upstream for some time, and is no longer
the default OpenSSL version with any vendor that packages
PostgreSQL. By retiring support for OpenSSL
1.0.2 we can remove a lot of no longer required complexity for
managing state within libcrypto which is now handled by OpenSSL.
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz
Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com
libpq must not use palloc/pfree. It's not allowed to exit on allocation
failure, and mixing the frontend pfree with malloc is architecturally
unsound.
Remove fe_memutils from the shlib build entirely, to keep devs from
accidentally depending on it in the future.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
The now preferred way to call realpath() is by passing NULL as the
second argument and getting a malloc'ed result. We still supported the
old way of providing our own buffer as a second argument, for some
platforms that didn't support the new way yet. Those were only
Solaris before version 11 and some older AIX versions (7.1 and
newer appear to support the new variant). We don't support those
platform versions anymore, so we can remove this extra code.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/9e638b49-5c3f-470f-a392-2cbedb2f7855%40eisentraut.org
Replace a static scratch buffer with a local variable, because a
static buffer makes the function not thread-safe. This function is
used in client code in libpq, so it needs to be thread-safe. It was
until commit b67b57a966, which replaced the implementation with the
one from pgcrypto.
Backpatch to v14, where we switched to the new implementation.
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://www.postgresql.org/message-id/dfa2015d-ad21-4802-a4cc-3850fc5fff3f@iki.fi
strtok() considers adjacent delimiters to be one delimiter, which is
arguably the wrong behavior in some cases. Replace with strsep(),
which has the right behavior: Adjacent delimiters create an empty
token.
Affected by this is the parsing of:
- Stored SCRAM secrets
("SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>")
- ICU collation attributes
("und@colStrength=primary;colCaseLevel=yes") for ICU older than
version 54
- PG_COLORS environment variable
("error=01;31:warning=01;35:note=01;36:locus=01")
- pg_regress command-line options with comma-separated list arguments
  (--dbname, --create-role) (currently only used by pg_regress_ecpg)
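The behavioral difference shows up on input with adjacent delimiters, such as "a::b". The two counting helpers below are hypothetical. strsep() is a BSD/glibc extension, hence the feature-test macro.

```c
#define _DEFAULT_SOURCE         /* expose strsep() on glibc */
#include <assert.h>
#include <string.h>

/* Count tokens with strtok(): runs of delimiters collapse into one. */
static int
count_strtok(const char *s, const char *delim)
{
    char buf[64];
    int n = 0;

    assert(strlen(s) < sizeof(buf));
    strcpy(buf, s);
    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}

/* Count tokens with strsep(): adjacent delimiters yield an empty token. */
static int
count_strsep(const char *s, const char *delim)
{
    char buf[64];
    char *cursor = buf;
    int n = 0;

    assert(strlen(s) < sizeof(buf));
    strcpy(buf, s);
    while (strsep(&cursor, delim) != NULL)
        n++;
    return n;
}
```

For "a::b" with ":" as the delimiter, strtok() sees two tokens, while strsep() sees three ("a", "", "b"). That makes an empty field visible to the caller instead of silently dropping it.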
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
Until now, when an enlargeStringInfo() call would cause the StringInfo to
exceed its maximum size, we reported an "out of memory" error. This is
misleading, as no memory allocation has actually failed.
Here we remove the "out of memory" text and replace it with something
more relevant to better indicate that it's a program limitation that's
been reached.
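The distinction can be sketched with a simplified, hypothetical string buffer whose growth routine reports "too long" separately from a genuine allocation failure. `DemoStr`, `demo_enlarge()`, and `GrowResult` are illustrative. `MaxAllocSize` is a local copy of PostgreSQL's value.

```c
#include <assert.h>
#include <stdlib.h>

#define MaxAllocSize ((size_t) 0x3fffffff)   /* local copy: 1 GB - 1 */

typedef enum { GROW_OK, GROW_TOO_LONG, GROW_OUT_OF_MEMORY } GrowResult;

/* Hypothetical simplified StringInfo; invariant: len < maxlen <= MaxAllocSize. */
typedef struct DemoStr
{
    char   *data;
    size_t  len;
    size_t  maxlen;
} DemoStr;

/* Make room for `needed` more bytes plus a terminator. */
static GrowResult
demo_enlarge(DemoStr *str, size_t needed)
{
    size_t newlen;
    char *p;

    if (needed >= MaxAllocSize - str->len)
        return GROW_TOO_LONG;           /* a design limit, not OOM */

    needed += str->len + 1;             /* total bytes now required */
    if (needed <= str->maxlen)
        return GROW_OK;

    newlen = str->maxlen ? str->maxlen : 64;
    while (newlen < needed)
        newlen *= 2;
    if (newlen > MaxAllocSize)
        newlen = MaxAllocSize;

    p = realloc(str->data, newlen);
    if (p == NULL)
        return GROW_OUT_OF_MEMORY;      /* genuinely out of memory */
    str->data = p;
    str->maxlen = newlen;
    return GROW_OK;
}
```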
Reported-by: Michael Banck
Reviewed-by: Daniel Gustafsson, Tom Lane
Discussion: https://postgr.es/m/18484-3e357ade5fe50e61@postgresql.org
Apply const qualifiers to char * arguments and fields throughout the
jsonapi. This allows the top-level APIs such as
pg_parse_json_incremental() to declare their input argument as const.
It also reduces the number of unconstify() calls.
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
Run pgindent, pgperltidy, and reformat-dat-files.
The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list. Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.
pgperltidy is as opinionated as always, and reformat-dat-files too.
Commit d6607016c7 moved all the jsonapi.c error messages into
token_error(). This needs to be added to the various nls.mk files
that use this. Since that makes token_error() effectively a globally
known symbol, the name seems a bit too general, so rename to
json_token_error() for more clarity.
json_lex_string() relies on pg_encoding_mblen_bounded() to point to the
end of a JSON string when generating an error message, and the input it
uses is not guaranteed to be null-terminated.
It was possible to walk off the end of the input buffer by a few bytes
when the last bytes consist of an incomplete multi-byte sequence, as
token_terminator would point to a location defined by
pg_encoding_mblen_bounded() rather than the end of the input. This
commit switches token_terminator so that the error uses data up to the
end of the JSON input.
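The invariant being restored is that the span quoted in an error never extends past the end of the input. The sketch below expresses that invariant with a hypothetical helper. It is not the jsonapi.c change itself, which makes token_terminator point at the end of the input.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical: end of the text quoted in an error for a bad character at
 * p.  The claimed length `mblen` comes from the lead byte and may exceed
 * what is left of the (not null-terminated) input, so clamp it to `end`.
 */
static const char *
error_span_end(const char *p, const char *end, int mblen)
{
    assert(p < end && mblen > 0);
    return (end - p) < mblen ? end : p + mblen;
}
```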
More work should be done so that this code can rely on an equivalent of
report_invalid_encoding(), so that incorrect byte sequences can be shown
in error messages in a readable form. This requires work for at least two
cases in the JSON parsing API: an incomplete token and an invalid escape
sequence. A more complete solution may be too invasive for a backpatch,
so this is left as a future improvement, taking care of the overread
first.
A test is added on HEAD, as test_json_parser makes this issue
straightforward to check.
Note that pg_encoding_mblen_bounded() no longer has any callers. This
will be removed on HEAD with a separate commit, as this is proving to
encourage unsafe coding.
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Backpatch-through: 13
JsonNonTerminal and JsonParserSem were added in commit 3311ea86ed.
The names of these two enums are not actually used, so there is no
need for typedefs. Instead use plain enums to declare the constants.
Noticed by Alvaro Herrera.
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
Per gripes from Michael Paquier
Discussion: https://postgr.es/m/ZhTQ6_w1vwOhqTQI@paquier.xyz
Along the way, also clean up a handful of typos in 3311ea86ed and
ea7b4e9a2a, found by Alexander Lakhin, and a couple of stylistic
snafus noted by Daniel Westermann and Daniel Gustafsson.
Coverity complained about not freeing some memory associated with
incrementally parsing backup manifests. To fix that, provide and use a new
shutdown function for the JsonManifestParseIncrementalState object, in
line with a suggestion from Tom Lane.
While analysing the problem, I noticed a buglet in freeing memory for
incremental json lexers. To fix that, remove a bogus condition on
freeing the memory allocated for them.
The previous formula was incorrect in the case where the function's
nblocks argument was a multiple of BLOCKS_PER_CHUNK, which happens
whenever a relation segment file is exactly 512MB or exactly 1GB in
length. In such cases, the formula would calculate a stop_offset of
0 rather than 65536, resulting in modified blocks in the second half
of a 1GB file, or all the modified blocks in a 512MB file, being
omitted from the incremental backup.
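The failure mode is easy to reproduce with a modulo-style formula. Both functions below are hypothetical illustrations, not the actual code. BLOCKS_PER_CHUNK = 65536 is assumed from the figure above, and the default 8kB block size is assumed, so 512MB is 65536 blocks and 1GB is 131072 blocks.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_CHUNK 65536          /* assumed, matching 65536 above */

typedef uint32_t BlockNumber;

/* Hypothetical buggy form: yields 0 when nblocks is a chunk multiple. */
static uint32_t
stop_offset_buggy(BlockNumber nblocks)
{
    return nblocks % BLOCKS_PER_CHUNK;
}

/* Hypothetical fixed form: blocks covered by chunk `chunkno`. */
static uint32_t
stop_offset_fixed(BlockNumber nblocks, uint32_t chunkno)
{
    BlockNumber start = chunkno * (BlockNumber) BLOCKS_PER_CHUNK;

    assert(nblocks > start);
    if (nblocks - start >= BLOCKS_PER_CHUNK)
        return BLOCKS_PER_CHUNK;        /* full chunk */
    return nblocks - start;             /* partial final chunk */
}
```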
Reported off-list by Tomas Vondra and Jakub Wartak.
Discussion: http://postgr.es/m/CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com
This adds the infrastructure for using the new non-recursive JSON parser
in processing manifests. It's important that callers make sure that the
last piece of json handed to the incremental manifest parser contains
the entire last few lines of the manifest, including the checksum.
Author: Andrew Dunstan
Reviewed-By: Jacob Champion
Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
This parser uses an explicit prediction stack, unlike the present
recursive descent parser where the parser state is represented on the
call stack. This difference makes the new parser suitable for use in
incremental parsing of huge JSON documents that cannot be conveniently
handled piece-wise by the recursive descent parser. One potential use
for this will be in parsing large backup manifests associated with
incremental backups.
Because this parser is somewhat slower than the recursive descent
parser, it is not replacing that parser, but is an additional parser
available to callers.
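Why an explicit stack enables incremental parsing can be seen in a much-simplified, hypothetical analogue. All state lives in a struct rather than on the C call stack, so input can arrive in arbitrary chunks. The stack holds the closing bracket that each open container predicts. Only bracket characters are examined (strings and scalars are ignored), so this illustrates the technique and is not a JSON parser.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 64

typedef enum { NEST_INCOMPLETE, NEST_DONE, NEST_ERROR } NestStatus;

/* Hypothetical parser state; persists between calls to nest_feed(). */
typedef struct NestState
{
    char    stack[DEMO_MAX_DEPTH];  /* predicted closing brackets */
    int     depth;
    int     seen_open;              /* has any container been opened? */
} NestState;

/* Consume one chunk; may be called repeatedly with consecutive chunks. */
static NestStatus
nest_feed(NestState *st, const char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        char c = chunk[i];

        if (c == '[' || c == '{')
        {
            if (st->depth == DEMO_MAX_DEPTH)
                return NEST_ERROR;
            st->stack[st->depth++] = (c == '[') ? ']' : '}';
            st->seen_open = 1;
        }
        else if (c == ']' || c == '}')
        {
            /* The closer must match the top prediction. */
            if (st->depth == 0 || st->stack[st->depth - 1] != c)
                return NEST_ERROR;
            st->depth--;
        }
    }
    return (st->seen_open && st->depth == 0) ? NEST_DONE : NEST_INCOMPLETE;
}
```

Feeding `{"a": [1, ` and then `2]}` returns NEST_INCOMPLETE and then NEST_DONE. A recursive descent parser would instead need the whole document, or would have to suspend its call stack between chunks.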
For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all
JSON parsing is done with the non-recursive parser, in which case only
trivial regression differences in error messages should be observed.
Author: Andrew Dunstan
Reviewed-By: Jacob Champion
Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).
This commit adds a hash table to binaryheap in order to track the
position of each node in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on average and O(log n)
in the worst case. This is known as an indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.
The current code does not use the new indexing logic, but it will be
used by an upcoming patch.
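The idea can be sketched as a hypothetical indexed max-heap over small integer ids. The commit uses a hash table to map each node to its array position. Because the ids here are small integers, a plain `pos[]` array plays that role in this sketch, and none of these names are binaryheap.c API.

```c
#include <assert.h>

#define DEMO_MAX_IDS 16

/* Hypothetical indexed max-heap over ids 0..DEMO_MAX_IDS-1. */
typedef struct IdxHeap
{
    int key[DEMO_MAX_IDS];      /* key of each id */
    int pos[DEMO_MAX_IDS];      /* slot of each id in nodes[], or -1 */
    int nodes[DEMO_MAX_IDS];    /* heap-ordered ids */
    int size;
} IdxHeap;

static void
heap_init(IdxHeap *h)
{
    h->size = 0;
    for (int i = 0; i < DEMO_MAX_IDS; i++)
        h->pos[i] = -1;
}

/* Put id into slot i and record the slot in the index. */
static void
set_slot(IdxHeap *h, int i, int id)
{
    h->nodes[i] = id;
    h->pos[id] = i;
}

static void
sift_up(IdxHeap *h, int i)
{
    int id = h->nodes[i];

    while (i > 0 && h->key[h->nodes[(i - 1) / 2]] < h->key[id])
    {
        set_slot(h, i, h->nodes[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
    set_slot(h, i, id);
}

static void
sift_down(IdxHeap *h, int i)
{
    int id = h->nodes[i];

    for (;;)
    {
        int child = 2 * i + 1;

        if (child >= h->size)
            break;
        if (child + 1 < h->size &&
            h->key[h->nodes[child + 1]] > h->key[h->nodes[child]])
            child++;
        if (h->key[id] >= h->key[h->nodes[child]])
            break;
        set_slot(h, i, h->nodes[child]);
        i = child;
    }
    set_slot(h, i, id);
}

static void
heap_add(IdxHeap *h, int id, int key)
{
    assert(h->pos[id] == -1 && h->size < DEMO_MAX_IDS);
    h->key[id] = key;
    set_slot(h, h->size++, id);
    sift_up(h, h->size - 1);
}

/* Change id's key; its slot comes from the index, with no O(n) search. */
static void
heap_update(IdxHeap *h, int id, int key)
{
    assert(h->pos[id] >= 0);
    h->key[id] = key;
    sift_up(h, h->pos[id]);
    sift_down(h, h->pos[id]);
}

/* Remove id by moving the last node into its slot and re-sifting. */
static void
heap_remove(IdxHeap *h, int id)
{
    int i = h->pos[id];
    int last;

    assert(i >= 0);
    h->pos[id] = -1;
    last = h->nodes[--h->size];
    if (i == h->size)
        return;                 /* id occupied the last slot */
    set_slot(h, i, last);
    sift_up(h, i);
    sift_down(h, h->pos[last]);
}
```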
Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com