This can be used to load an injection point and prewarm the
backend-level cache before running it, to avoid issues if the point
cannot be loaded due to restrictions in the code path where it would be
run, like a critical section where no memory allocation can happen
(load_external_function() can do allocations when expanding a library
name).
Tests can use a macro called INJECTION_POINT_LOAD() to load an injection
point. The test module injection_points gains some tests, and a SQL
function able to load an injection point.
Based on a request from Andrey Borodin, who has implemented a test for
multixacts requiring this facility.
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZkrBE1e2q2wGvsoN@paquier.xyz
Up until now, there was no ability to easily determine if a Material
node caused the underlying tuplestore to spill to disk or even see how
much memory the tuplestore used if it didn't.
Here we add some new functions to tuplestore.c to query this information
and add some additional output in EXPLAIN ANALYZE to display this
information for the Material node.
There are a few other executor node types that use tuplestores, so we
could also consider adding these details to the EXPLAIN ANALYZE for
those nodes too. Let's consider those independently from this. Having
the tuplestore.c infrastructure in to allow that is step 1.
Author: David Rowley
Reviewed-by: Matthias van de Meent, Dmitry Dolgov
Discussion: https://postgr.es/m/CAApHDvp5Py9g4Rjq7_inL3-MCK1Co2CRt_YWFwTU2zfQix0p4A@mail.gmail.com
The standby_slot_names GUC allows the specification of physical standby
slots that must be synchronized before the logical walsenders associated
with logical failover slots. However, for this purpose, the GUC name is
too generic.
Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Masahiko Sawada
Backpatch-through: 17
Discussion: https://postgr.es/m/ZnWeUgdHong93fQN@momjian.us
Shared statistics with a fixed number of objects are read from the stats
file in pgstat_read_statsfile() using members of PgStat_ShmemControl and
following an order based on their PgStat_Kind value.
Instead of being explicit, this commit changes the stats read to iterate
over the pgstat_kind_infos array to find the memory locations to read
into, based on a new shared_ctl_off in PgStat_KindInfo that can be used
to define the position of this stats kind in shared memory. This makes
the read logic simpler, and eases the introduction of future
improvements aimed at making this area more pluggable for external
modules.
Original idea suggested by Andres Freund.
Author: Tristan Partin
Reviewed-by: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/D12SQ7OYCD85.20BUVF3DWU5K7@neon.tech
This field is used to track if a stats kind can use a custom format
representation on disk when reading or writing its stats case. On HEAD,
this exists for replication slots stats, that need a mapping between an
internal index ID and the slot names.
named_on_disk is currently used nowhere and the callbacks
to_serialized_name and from_serialized_name are in charge of checking if
the serialization of the stats data should apply, so let's remove it.
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/ZmKVlSX_T5YvIOsd@paquier.xyz
Instead of looking up casts at parse time for converting the result
of JsonPath* query functions to the specified or the default
RETURNING type, always perform the conversion at runtime using either
the target type's input function or the function
json_populate_type().
There are two motivations for this change:
1. json_populate_type() coerces to types with typmod such that any
string values that exceed length limit cause an error instead of
silent truncation, which is necessary to be standard-conforming.
2. It was possible to end up with a cast expression that doesn't
support soft handling of errors causing bugs in the of handling
ON ERROR clause.
JsonExpr.coercion_expr which would store the cast expression is no
longer necessary, so remove.
Bump catversion because stored rules change because of the above
removal.
Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: Discussion: https://postgr.es/m/202405271326.5a5rprki64aw%40alvherre.pgsql
Previously, GetJsonPathVar() allowed a jsonpath expression to
reference any prefix of a PASSING variable's name. For example, the
following query would incorrectly work:
SELECT JSON_QUERY(context_item, jsonpath '$xy' PASSING val AS xyz);
The fix ensures that the length of the variable name mentioned in a
jsonpath expression matches exactly with the name of the PASSING
variable before comparing the strings using strncmp().
Reported-by: Alvaro Herrera (off-list)
Discussion: https://postgr.es/m/CA+HiwqFGkLWMvELBH6E4SQ45qUHthgcRH6gCJL20OsYDRtFx_w@mail.gmail.com
Commit 534287403 invented SHARED_DEPENDENCY_INITACL entries in
pg_shdepend, but installed them only for non-owner roles mentioned
in a pg_init_privs entry. This turns out to be the wrong thing,
because there is nothing to cue REASSIGN OWNED to go and update
pg_init_privs entries when the object's ownership is reassigned.
That leads to leaving dangling entries in pg_init_privs, as
reported by Hannu Krosing. Instead, install INITACL entries for
all roles mentioned in pg_init_privs entries (except pinned roles),
and change ALTER OWNER to not touch them, just as it doesn't
touch pg_init_privs entries.
REASSIGN OWNED will now substitute the new owner OID for the old
in pg_init_privs entries. This feels like perhaps not quite the
right thing, since pg_init_privs ought to be a historical record
of the state of affairs just after CREATE EXTENSION. However,
it's hard to see what else to do, if we don't want to disallow
dropping the object's original owner. In any case this is
better than the previous do-nothing behavior, and we're unlikely
to come up with a superior solution in time for v17.
While here, tighten up some coding rules about how ACLs in
pg_init_privs should never be null or empty. There's not any
obvious reason to allow that, and perhaps asserting that it's
not so will catch some bugs. (We were previously inconsistent
on the point, with some code paths taking care not to store
empty ACLs and others not.)
This leaves recordExtensionInitPrivWorker not doing anything
with its ownerId argument, but we'll deal with that separately.
catversion bump forced because of change of expected contents
of pg_shdepend when pg_init_privs entries exist.
Discussion: https://postgr.es/m/CAMT0RQSVgv48G5GArUvOVhottWqZLrvC5wBzBa4HrUdXe9VRXw@mail.gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 17 development.
pg_bsd_indent still has a couple of similar inconsistencies, which I
(pgeoghegan) have left untouched for now.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
This enum was used to determine the first ID to use when assigning a
custom wait event for extensions, which is always 1. It was kept so
as it would be possible to add new in-core wait events in the category
"Extension". There is no such thing currently, so let's remove this
enum until a case justifying it pops up. This makes the code simpler
and easier to understand.
This has as effect to switch back autoprewarm.c to use PG_WAIT_EXTENSION
rather than WAIT_EVENT_EXTENSION, on par with v16 and older stable
branches.
Thinko in c9af054653.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/195c6c45-abce-4331-be6a-e87724e1d060@eisentraut.org
This commit extends the backend-side infrastructure of injection points
so as it becomes possible to register some input data when attaching a
point. This private data can be registered with the function name and
the library name of the callback when attaching a point, then it is
given as input argument to the callback. This gives the possibility for
modules to pass down custom data at runtime when attaching a point
without managing that internally, in a manner consistent with the
callback entry retrieved from the hash shmem table storing the injection
point data.
InjectionPointAttach() gains two arguments, to be able to define the
private data contents and its size.
A follow-up commit will rely on this infrastructure to close a race
condition with the injection point detach in the module
injection_points.
While on it, this changes InjectionPointDetach() to return a boolean,
returning false if a point cannot be detached. This has been mentioned
by Noah as useful when it comes to implement more complex tests with
concurrent point detach, solid with the automatic detach done for local
points in the test module.
Documentation is adjusted in consequence.
Per discussion with Noah Misch.
Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/20240509031553.47@rfd.leadboat.com
If an ACL recorded in pg_init_privs mentions a non-pinned role,
that reference must also be noted in pg_shdepend so that we know
that the role can't go away without removing the ACL reference.
Otherwise, DROP ROLE could succeed and leave dangling entries
behind, which is what's causing the recent upgrade-check failures
on buildfarm member copperhead.
This has been wrong since pg_init_privs was introduced, but it's
escaped notice because typical pg_init_privs entries would only
mention the bootstrap superuser (pinned) or at worst the owner
of the extension (who can't go away before the extension does).
We lack even a representation of such a role reference for
pg_shdepend. My first thought for a solution was entries listing
pg_init_privs in classid, but that doesn't work because then there's
noplace to put the granted-on object's classid. Rather than adding
a new column to pg_shdepend, let's add a new deptype code
SHARED_DEPENDENCY_INITACL. Much of the associated boilerplate
code can be cribbed from code for SHARED_DEPENDENCY_ACL.
A lot of the bulk of this patch just stems from the new need to pass
the object's owner ID to recordExtensionInitPriv, so that we can
consult it while updating pg_shdepend. While many callers have that
at hand already, a few places now need to fetch the owner ID of an
arbitrary privilege-bearing object. For that, we assume that there
is a catcache on the relevant catalog's OID column, which is an
assumption already made in ExecGrant_common so it seems okay here.
We do need an entirely new routine RemoveRoleFromInitPriv to perform
cleanup of pg_init_privs ACLs during DROP OWNED BY. It's analogous
to RemoveRoleFromObjectACL, but we can't share logic because that
function operates by building a command parsetree and invoking
existing GRANT/REVOKE infrastructure. There is of course no SQL
command that would update pg_init_privs entries when we're not in
process of creating their extension, so we need a routine that can
do the updates directly.
catversion bump because this changes the expected contents of
pg_shdepend. For the same reason, there's no hope of back-patching
this, even though it fixes a longstanding bug. Fortunately, the
case where it's a problem seems to be near nonexistent in the field.
If it weren't for the buildfarm breakage, I'd have been content to
leave this for v18.
Patch by me; thanks to Daniel Gustafsson for review and discussion.
Discussion: https://postgr.es/m/1745535.1712358659@sss.pgh.pa.us
This addresses some post-commit review comments for commits 6185c973,
de3600452, and 9425c596a0, with the following changes:
* Fix JSON_TABLE() syntax documentation to use the term
"path_expression" for JSON path expressions instead of
"json_path_specification" to be consistent with the other SQL/JSON
functions.
* Fix a typo in the example code in JSON_TABLE() documentation.
* Rewrite some newly added comments in jsonpath.h.
* In JsonPathQuery(), add missing cast to int before printing an enum
value.
Reported-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com
This improves some error messages emitted by SQL/JSON query functions
by mentioning column name when available, such as when they are
invoked as part of evaluating JSON_TABLE() columns. To do so, a new
field column_name is added to both JsonFuncExpr and JsonExpr that is
only populated when creating those nodes for transformed JSON_TABLE()
columns.
While at it, relevant error messages are reworded for clarity.
Reported-by: Jian He <jian.universality@gmail.com>
Suggested-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com
GlobalVisTestNonRemovableHorizon() was only used for the implementation of
snapshot_too_old, which was removed in f691f5b80a. As using
GlobalVisTestNonRemovableHorizon() is not particularly efficient, no new uses
for it should be added. Therefore remove.
Discussion: https://postgr.es/m/20240415185720.q4dg4dlcyvvrabz4@awork3.anarazel.de
When matching constraints in AttachPartitionEnsureIndexes() we weren't
testing the constraint type, which could make a UNIQUE key lacking a
not-null constraint incorrectly satisfy a primary key requirement. Fix
this by testing that the constraint types match. (Other possible
mismatches are verified by comparing index properties.)
Discussion: https://postgr.es/m/202402051447.wimb4xmtiiyb@alvherre.pgsql
Let table AM define custom reloptions for its tables. This allows specifying
AM-specific parameters by the WITH clause when creating a table.
The reloptions, which could be used outside of table AM, are now extracted
into the CommonRdOptions data structure. These options could be by decision
of table AM directly specified by a user or calculated in some way.
The new test module test_tam_options evaluates the ability to set up custom
reloptions and calculate fields of CommonRdOptions on their base.
The code may use some parts from prior work by Hao Wu.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn
Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis
29f6a959c added a bump allocator type for efficient compact allocations.
Here we make use of this for non-bounded tuplesorts to store tuples.
This is very space efficient when storing narrow tuples due to bump.c
not having chunk headers. This means we can fit more tuples in work_mem
before spilling to disk, or perform an in-memory sort touching fewer
cacheline.
Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
This introduces a bump MemoryContext type. The bump context is best
suited for short-lived memory contexts which require only allocations
of memory and never a pfree or repalloc, which are unsupported.
Memory palloc'd into a bump context has no chunk header. This makes
bump a useful context type when lots of small allocations need to be
done without any need to pfree those allocations. Allocation sizes are
rounded up to the next MAXALIGN boundary, so with this and no chunk
header, allocations are very compact indeed.
Allocations are also very fast as bump does not check any freelists to
try and make use of previously free'd chunks. It just checks if there
is enough room on the current block, and if so it bumps the freeptr
beyond this chunk and returns the value that the freeptr was previously
pointing to. Simple and fast. A new block is malloc'd when there's not
enough space in the current block.
Code using the bump allocator must take care never to call any functions
which could try to call realloc() (or variants), pfree(),
GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated
chunk. Due to lack of chunk headers, these operations are unsupported.
To increase the chances of catching such issues, when compiled with
MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and
any attempt to perform an unsupported operation will result in an ERROR.
Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported
operation could result in a segfault.
A follow-on commit will implement the first user of bump.
Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
Reserve 4 bits for MemoryContextMethodID rather than 3. 3 bits did
technically allow a maximum of 8 memory context types, however, we've
opted to reserve some bit patterns which left us with only 4 slots, all
of which were used.
Here we add another bit which frees up 8 slots for future memory context
types.
In passing, adjust the enum names in MemoryContextMethodID to make it
more clear which ones can be used and which ones are reserved.
Author: Matthias van de Meent, David Rowley
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
This new DDL command splits a single partition into several parititions.
Just like ALTER TABLE ... MERGE PARTITIONS ... command, new patitions are
created using createPartitionTable() function with parent partition as the
template.
This commit comprises quite naive implementation which works in single process
and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the
operations including the tuple routing. This is why this new DDL command
can't be recommended for large partitioned tables under a high load. However,
this implementation come in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with lesser
locking and possibly parallel.
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively. This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).
Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient. All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.
The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value. Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan. Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.
Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.
Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute. These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).
Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes. They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem). They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).
There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions. Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.
Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
JSON_TABLE() allows JSON data to be converted into a relational view
and thus used, for example, in a FROM clause, like other tabular
data. Data to show in the view is selected from a source JSON object
using a JSON path expression to get a sequence of JSON objects that's
called a "row pattern", which becomes the source to compute the
SQL/JSON values that populate the view's output columns. Column
values themselves are computed using JSON path expressions applied to
each of the JSON objects comprising the "row pattern", for which the
SQL/JSON query functions added in 6185c9737c are used.
To implement JSON_TABLE() as a table function, this augments the
TableFunc and TableFuncScanState nodes that are currently used to
support XMLTABLE() with some JSON_TABLE()-specific fields.
Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN
clauses, which are required to provide more flexibility to extract
data out of nested JSON objects, but they are not implemented here
to keep this commit of manageable size.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
This is marked PGC_SIGHUP, so it can only be set in a configuration
file, not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE,
so it can't be set using ALTER SYSTEM. When set to false, the
ALTER SYSTEM command is disallowed.
There was considerable concern that this would be misinterpreted as
a security feature, which it is not, because a determined superuser
has various ways of bypassing it. Hence, a lot of work has gone into
wordsmithing the documentation, in the hopes of avoiding any such
confusion.
Jelte Fennemia-Nio and Gabriele Bartolini, with wording suggestions
for the documentation from many others.
Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com
The user-facing name is "Other Platforms and Clients", but the
internal name seems too focused on clients specifically, especially
given the plan to add a new setting to this session that is about
platform or deployment model compatibility rather than client
compatibility.
Jelte Fennema-Nio
Discussion: http://postgr.es/m/CAGECzQTfMbDiM6W3av+3weSnHxJvPmuTEcjxVvSt91sQBdOxuQ@mail.gmail.com
This adds 3 new variants of the random() function:
random(min integer, max integer) returns integer
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric
Each returns a random number x in the range min <= x <= max.
For the numeric function, the number of digits after the decimal point
is equal to the number of digits that "min" or "max" has after the
decimal point, whichever has more.
The main entry points for these functions are in a new C source file.
The existing random(), random_normal(), and setseed() functions are
moved there too, so that they can all share the same PRNG state, which
is kept private to that file.
Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev,
and Tomas Vondra.
Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com
Previously, the DSA segment size always started with 1MB and grew up
to DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios,
such as when the caller desired a soft constraint on the total DSA
segment size, limiting it to less than 1MB.
This commit introduces the capability to specify the initial and
maximum DSA segment sizes when creating a DSA area, providing more
flexibility and control over memory usage.
Reviewed-by: John Naylor, Tomas Vondra
Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com
It's now possible to specify a table access method via
CREATE TABLE ... USING for a partitioned table, as well change it with
ALTER TABLE ... SET ACCESS METHOD. Specifying an AM for a partitioned
table lets the value be used for all future partitions created under it,
closely mirroring the behavior of the TABLESPACE option for partitioned
tables. Existing partitions are not modified.
For a partitioned table with no AM specified, any new partitions are
created with the default_table_access_method.
Also add ALTER TABLE ... SET ACCESS METHOD DEFAULT, which reverts to the
original state of using the default for new partitions.
The relcache of partitioned tables is not changed: rd_tableam is not
set, even if a partitioned table has a relam set.
Author: Justin Pryzby <pryzby@telsasoft.com>
Author: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Author: Michaël Paquier <michael@paquier.xyz>
Reviewed-by: The authors themselves
Discussion: https://postgr.es/m/CAE-ML+9zM4wJCGCBGv01k96qQ3gFv4WFcFy=zqPHKeaEFwwv6A@mail.gmail.com
Discussion: https://postgr.es/m/20210308010707.GA29832%40telsasoft.com
Up to now, all of the "catcache list" objects within a catalog cache
were just chained together on a single dlist, requiring O(N) time to
search. Remarkably, we've not had serious performance problems with
that so far; but we got a complaint of a bad performance regression
from v15 in a case with a large number of roles in the system, which
traced down to O(N^2) total time when we probed N catcache lists.
Replace that data structure with a hashtable having an enlargeable
number of dlists, in an exactly parallel way to the data structure
we've used for years for the plain CatCTup cache members. The extra
cost of maintaining a hash table seems negligible, since we were
already computing a hash value for list searches.
Normally this'd be HEAD-only material, but in view of the performance
regression it seems advisable to back-patch into v16. In the v16
version of the patch, leave the dead cc_lists field where it is and
add the new fields at the end of struct catcache, to avoid possible
ABI breakage in case any external code is looking at these structs.
(We assume no external code is actually allocating new catcache
structs.)
Per report from alex work.
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz=vggErdSG7pv2s6vmmTOLJSg@mail.gmail.com
This adds the X509 attributes notBefore and notAfter to sslinfo
as well as pg_stat_ssl to allow verifying and identifying the
validity period of the current client certificate. OpenSSL has
APIs for extracting notAfter and notBefore, but they are only
supported in recent versions so we have to calculate the dates
by hand in order to make this work for the older versions of
OpenSSL that we still support.
Original patch by Cary Huang with additional hacking by Jacob
and myself.
Author: Cary Huang <cary.huang@highgo.ca>
Co-author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca
The new table AM method free_rd_amcache is responsible for freeing all the
memory related to rd_amcache and setting free_rd_amcache to NULL. If the new
method is not specified, we still assume rd_amcache to be a single chunk of
memory, which could be just pfree'd.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li
This introduces the following SQL/JSON functions for querying JSON
data using jsonpath expressions:
JSON_EXISTS(), which can be used to apply a jsonpath expression to a
JSON value to check if it yields any values.
JSON_QUERY(), which can be used to to apply a jsonpath expression to
a JSON value to get a JSON object, an array, or a string. There are
various options to control whether multi-value result uses array
wrappers and whether the singleton scalar strings are quoted or not.
JSON_VALUE(), which can be used to apply a jsonpath expression to a
JSON value to return a single scalar value, producing an error if it
multiple values are matched.
Both JSON_VALUE() and JSON_QUERY() functions have options for
handling EMPTY and ERROR conditions, which can be used to specify
the behavior when no values are matched and when an error occurs
during jsonpath evaluation, respectively.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He, Anton A. Melnikov,
Nikita Malakhov, Peter Eisentraut, Tomas Vondra
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
Based on comments from Peter Eisentraut.
* Document CREATE DATABASE ... BUILTIN_LOCALE.
* Determine required encoding based on locale name for CREATE
COLLATION. Use -1 for "C" (requires catversion bump).
* initdb output fixups.
* Make ctype_is_c a constant true for now.
* Fixups to ICU 010_create_database.pl test.
Discussion: https://postgr.es/m/4135cf11-206d-40ed-96c0-9363c1232379@eisentraut.org
New provider for collations, like "libc" or "icu", but without any
external dependency.
Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.
The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.
By offering a new builtin provider, it clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.
Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
Roles with MAINTAIN on a relation may run VACUUM, ANALYZE, REINDEX,
REFRESH MATERIALIZE VIEW, CLUSTER, and LOCK TABLE on the relation.
Roles with privileges of pg_maintain may run those same commands on
all relations.
This was previously committed for v16, but it was reverted in
commit 151c22deee due to concerns about search_path tricks that
could be used to escalate privileges to the table owner. Commits
2af07e2f74, 59825d1639, and c7ea3f4229 resolved these concerns by
restricting search_path when running maintenance commands.
Bumps catversion.
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20240305161235.GA3478007%40nathanxps13
... and in particular don't return them as replica identity.
The motivation for this change is letting the primary keys be seen by
code that derives NOT NULL constraints from them, when creating
inheritance children; before this change, if you had a deferrable PK,
pg_dump would not recreate the attnotnull marking properly, because the
column would not be considered as having anything to back said marking
after dropping the throwaway NOT NULL constraint.
The reason we don't want these PKs as replica identities is that
replication can corrupt data, if the uniqueness constraint is
transiently broken.
Reported-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b94QonkgsbDXofakHDnORQNgafd1y3Oa5QXfpQNJyXyQ7A@mail.gmail.com
You might run out of stack space with recursion, which is not nice in
functions that might be used e.g. at cleanup after transaction
abort. MemoryContext contains pointer to parent and siblings, so we
can traverse a tree of contexts iteratively, without using
stack. Refactor the functions to do that.
MemoryContextStats() still recurses, but it now has a limit to how
deep it recurses. Once the limit is reached, it prints just a summary
of the rest of the hierarchy, similar to how it summarizes contexts
with lots of children. That seems good anyway, because a context dump
with hundreds of nested contexts isn't very readable.
Report by Egor Chindyaskin and Alexander Lakhin.
Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
Author: Heikki Linnakangas
Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov, Tom Lane
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server making them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.
The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.
Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those.
Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
This implements a radix tree data structure based on the design in
"The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases"
by Viktor Leis, Alfons Kemper, and ThomasNeumann, 2013. The main
technique that makes it adaptive is using several different node types,
each with a different capacity of elements, and a different algorithm
for accessing them. The nodes start small and grow/shrink as needed.
The main advantage over hash tables is efficient sorted iteration and
better memory locality when successive keys are lexicographically
close together. The implementation currently assumes 64-bit integer
keys, and traversing the tree is in general slower than a linear
probing hash table, so this is not a general-purpose associative array.
The paper describes two other techniques not implemented here,
namely "path compression" and "lazy expansion". These can further
reduce memory usage and speed up traversal, but the former would add
significant complexity and the latter requires storing the full key
with the value. We do trivially compress the path when leading bytes
of the key are zeros, however.
For value storage, we use "combined pointer/value slots", as
recommended in the paper. Values of size equal or smaller than the the
platform's pointer type are stored in the array of child pointers in
the last level node, while larger values are each stored in a separate
allocation. This is for now fixed at compile time, but it would be
fairly trivial to allow determining at runtime how variable-length
values are stored.
One innovation in our implementation compared to the ART paper is
decoupling the notion of node "size class" from "kind". The size
classes within a given node kind have the same underlying type, but
a variable capacity for children, so we can introduce additional node
sizes with little additional code.
To enable different use cases to specialize for different value types
and for shared/local memory, we use macro-templatized code generation
in the same manner as simplehash.h and sort_template.h.
Future commits will use this infrastructure for storing TIDs.
Patch by Masahiko Sawada and John Naylor, but a substantial amount of
credit is due to Andres Freund, whose proof-of-concept was a valuable
source of coding idioms and awareness of performance pitfalls, and
who reviewed earlier versions.
Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
This reverts commit eae7be600b, following a discussion with Tom Lane,
due to concerns that this impacts the decisions made by the planner for
the number of workers spawned based on the inlining and const-folding of
index expressions and predicate for cases that would have worked until
this commit.
Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us
Backpatch-through: 12
As coded, the planner logic that calculates the number of parallel
workers to use for a parallel index build uses expressions and
predicates from the relcache, which are flattened for the planner by
eval_const_expressions().
As reported in the bug, an immutable parallel-unsafe function flattened
in the relcache would become a Const, which would be considered as
parallel-safe, even if the predicate or the expressions including the
function are not safe in parallel workers. Depending on the expressions
or predicate used, this could cause the parallel build to fail.
Tests are included that check parallel index builds with parallel-unsafe
predicate and expressions. Two routines are added to lsyscache.h to be
able to retrieve expressions and predicate of an index from its pg_index
data.
Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com
Backpatch-through: 12
Use GUC_ACTION_SAVE rather than GUC_ACTION_SET, necessary for working
with parallel query.
Now that the call requires more arguments, wrap the call in a new
function to avoid code duplication and offer a place for a comment.
Discussion: https://postgr.es/m/E1rhJpO-0027Wf-9L@gemulon.postgresql.org
Now that BackendId was just another index into the proc array, it was
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.
The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.
One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for backend with ProcNumber 0.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches. In order for this to work with good
performance, two additional changes are made:
1. the cache is divided in "banks" (to borrow terminology from CPU
caches), and algorithms such as eviction buffer search only affect
one specific bank. This forestalls the problem that linear searching
for a specific buffer across the whole cache takes too long: we only
have to search the specific bank, whose size is small. This work is
authored by Andrey Borodin.
2. Change the locking regime for the SLRU banks, so that each bank uses
a separate LWLock. This allows for increased scalability. This work
is authored by Dilip Kumar. (A part of this was previously committed as
d172b717c6f4.)
Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next. This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this. I (Álvaro) also added
lots of comments to make sure the design is sound.
The new GUCs match the names introduced by bcdfa5f2e2 in the
pg_stat_slru view.
The default values for these parameters are similar to the previous
sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB. (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.) The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0. These values based on
shared_buffers may need to be revisited, but that's an easy change.
There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally). However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
The new names are intended to match those in an upcoming patch that adds
a few GUCs to configure the SLRU buffer sizes.
Backwards compatibility concern: this changes the accepted names for
function pg_stat_slru_rest(). Since this function recognizes "any other
string" as a request to reset the entry for "other", this means that
calling it with the old names would silently reset "other" instead of
doing nothing or throwing an error.
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql
Many modern compilers are able to optimize function calls to functions
where the parameters of the called function match a leading subset of
the calling function's parameters. If there are no instructions in the
calling function after the function is called, then the compiler is free
to avoid any stack frame setup and implement the function call as a
"jmp" rather than a "call". This is called sibling call optimization.
Here we adjust the memory allocation functions in mcxt.c to allow this
optimization. This requires moving some responsibility into the memory
context implementations themselves. It's now the responsibility of the
MemoryContext to check for malloc failures. This is good as it both
allows the sibling call optimization, but also because most small and
medium allocations won't call malloc and just allocate memory to an
existing block. That can't fail, so checking for NULLs in that case
isn't required.
Also, traditionally it's been the responsibility of palloc and the other
allocation functions in mcxt.c to check for invalid allocation size
requests. Here we also move the responsibility of checking that into the
MemoryContext. This isn't to allow the sibling call optimization, but
more because most of our allocators handle large allocations separately
and we can just add the size check when doing large allocations. We no
longer check this for non-large allocations at all.
To make checking the allocation request sizes and ERROR handling easier,
add some helper functions to mcxt.c for the allocators to use.
Author: Andres Freund
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
This commit adds timeout that is expected to be used as a prevention
of long-running queries. Any session within the transaction will be
terminated after spanning longer than this timeout.
However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.
Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin <amborodin@acm.org>
Author: Japin Li <japinli@hotmail.com>
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Thanks to commit 3b00fdba9f, this check in the SIGTERM handler for
the startup process is now obsolete and can be removed. Instead of
leaving around the dead function write_stderr_signal_safe(), I've
opted to just remove it for now.
This partially reverts commit 97550c0711.
Reviewed-by: Andres Freund, Noah Misch
Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13
After calling smgropen(), it was not clear how long you could continue
to use the result, because various code paths including cache
invalidation could call smgrclose(), which freed the memory.
Guarantee that the object won't be destroyed until the end of the
current transaction, or in recovery, the commit/abort record that
destroys the underlying storage.
smgrclose() is now just an alias for smgrrelease(). It closes files
and forgets all state except the rlocator, but keeps the SMgrRelation
object valid.
A new smgrdestroy() function is used by rare places that know there
should be no other references to the SMgrRelation.
The short version:
* smgrclose() is now just an alias for smgrrelease(). It releases
resources, but doesn't destroy until EOX
* smgrdestroy() now frees memory, and should rarely be used.
Existing code should be unaffected, but it is now possible for code that
has an SMgrRelation object to use it repeatedly during a transaction as
long as the storage hasn't been physically dropped. Such code would
normally hold a lock on the relation.
This also replaces the "ownership" mechanism of SMgrRelations with a
pin counter. An SMgrRelation can now be "pinned", which prevents it
from being destroyed at end of transaction. There can be multiple pins
on the same SMgrRelation. In practice, the pin mechanism is only used
by the relcache, so there cannot be more than one pin on the same
SMgrRelation. Except with swap_relation_files XXX
Author: Thomas Munro, Heikki Linnakangas
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA@mail.gmail.com
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h. Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics). Moving the function to the much
less popular ascii.h resolves these issues for now.
This commit is back-patched for the benefit of the aforementioned
third-party tool. The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists. This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code. It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().
Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
This adds a new "Memory:" line under the "Planning:" group (which
currently only has "Buffers:") when the MEMORY option is specified.
In order to make the reporting reasonably accurate, we create a separate
memory context for planner activities, to be used only when this option
is given. The total amount of memory allocated by that context is
reported as "allocated"; we subtract memory in the context's freelists
from that and report that result as "used". We use
MemoryContextStatsInternal() to obtain the quantities.
The code structure to show buffer usage during planning was not in
amazing shape, so I (Álvaro) modified the patch a bit to clean that up
in passing.
Author: Ashutosh Bapat
Reviewed-by: David Rowley, Andrey Lepikhov, Jian He, Andy Fan
Discussion: https://www.postgresql.org/message-id/CAExHW5sZA=5LJ_ZPpRO-w09ck8z9p7eaYAqq3Ks9GDfhrxeWBw@mail.gmail.com
Formerly, these were only supported in to_char(), but there seems
little reason for that restriction. We should at least have enough
support to permit round-tripping the output of to_char().
In that spirit, TZ accepts either zone abbreviations or numeric
(HH or HH:MM) offsets, which are the cases that to_char() can output.
In an ideal world we'd make it take full zone names too, but
that seems like it'd introduce an unreasonable amount of ambiguity,
since the rules for POSIX-spec zone names are so lax.
OF is a subset of this, accepting only HH or HH:MM.
One small benefit of this improvement is that we can simplify
jsonpath's executeDateTimeMethod function, which no longer needs
to consider the HH and HH:MM cases separately. Moreover, letting
it accept zone abbreviations means it will accept "Z" to mean UTC,
which is emitted by JSON.stringify() for example.
Patch by me, reviewed by Aleksander Alekseev and Daniel Gustafsson
Discussion: https://postgr.es/m/1681086.1686673242@sss.pgh.pa.us
This commit implements ithe jsonpath .bigint(), .boolean(),
.date(), .decimal([precision [, scale]]), .integer(), .number(),
.string(), .time(), .time_tz(), .timestamp(), and .timestamp_tz()
methods.
.bigint() converts the given JSON string or a numeric value to
the bigint type representation.
.boolean() converts the given JSON string, numeric, or boolean
value to the boolean type representation. In the numeric case, only
integers are allowed. We use the parse_bool() backend function
to convert a string to a bool.
.decimal([precision [, scale]]) converts the given JSON string
or a numeric value to the numeric type representation. If precision
and scale are provided for .decimal(), then it is converted to the
equivalent numeric typmod and applied to the numeric number.
.integer() and .number() convert the given JSON string or a
numeric value to the int4 and numeric type representation.
.string() uses the datatype's output function to convert numeric
and various date/time types to the string representation.
The JSON string representing a valid date/time is converted to the
specific date or time type representation using jsonpath .date(),
.time(), .time_tz(), .timestamp(), .timestamp_tz() methods. The
changes use the infrastructure of the .datetime() method and perform
the datatype conversion as appropriate. Unlike the .datetime()
method, none of these methods accept a format template and use ISO
DateTime format instead. However, except for .date(), the
date/time related methods take an optional precision to adjust the
fractional seconds.
Jeevan Chalke, reviewed by Peter Eisentraut and Andrew Dunstan.
This adds a Node *escontext parameter to it and a bunch of functions
downstream to it, replacing any ereport()s in that path by either
errsave() or ereturn() as appropriate. This also adds code to those
functions where necessary to return early upon encountering a soft
error.
The changes here are mainly intended to suppress errors in the
functions of jsonfuncs.c. Functions in any external modules, such as
arrayfuncs.c, that those functions may in turn call are not changed
here based on the assumption that the various checks in jsonfuncs.c
functions should ensure that only values that are structurally valid
get passed to the functions in those external modules. An exception
is made for domain_check() to allow handling domain constraint
violation errors softly.
For testing, this adds a function jsonb_populate_record_valid(),
which returns true if jsonb_populate_record() would finish without
causing an error for the provided JSON object, false otherwise. Note
that jsonb_populate_record() internally calls populate_record(),
which in turn uses populate_record_field().
Extracted from a much larger patch to add SQL/JSON query functions.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewers have included (in no particular order) Andres Freund,
Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers,
Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby,
Álvaro Herrera, Jian He, Peter Eisentraut
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
Injection points are a new facility that makes possible for developers
to run custom code in pre-defined code paths. Its goal is to provide
ways to design and run advanced tests, for cases like:
- Race conditions, where processes need to do actions in a controlled
ordered manner.
- Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force
recovery, etc.
- Arbitrary sleeps.
This implements some basics, and there are plans to extend it more in
the future depending on what's required. Hence, this commit adds a set
of routines in the backend that allows developers to attach, detach and
run injection points:
- A code path calling an injection point can be declared with the macro
INJECTION_POINT(name).
- InjectionPointAttach() and InjectionPointDetach() to respectively
attach and detach a callback to/from an injection point. An injection
point name is registered in a shmem hash table with a library name and a
function name, which will be used to load the callback attached to an
injection point when its code path is run.
Injection point names are just strings, so as an injection point can be
declared and run by out-of-core extensions and modules, with callbacks
defined in external libraries.
This facility is hidden behind a dedicated switch for ./configure and
meson, disabled by default.
Note that backends use a local cache to store callbacks already loaded,
cleaning up their cache if a callback has found to be removed on a
best-effort basis. This could be refined further but any tests but what
we have here was fine with the tests I've written while implementing
these backend APIs.
Author: Michael Paquier, with doc suggestions from Ashutosh Bapat.
Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip
Kumar, Amul Sul, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
These support functions will transform expressions with constant
range values into direct comparisons on the range bound values,
which are frequently better-optimizable. The transformation is
skipped however if it would require double evaluation of a
volatile or expensive element expression.
Along the way, add the range opfamily OID to range typcache entries,
since load_rangetype_info has to compute that anyway and it seems
silly to duplicate the work later.
Kim Johan Andersson and Jian He, reviewed by Laurenz Albe
Discussion: https://postgr.es/m/94f64d1f-b8c0-b0c5-98bc-0793a34e0851@kimmet.dk
It can be useful for a hash function to expose separate initialization,
accumulation, and finalization steps. In particular, this is useful
for building inline hash functions for simplehash. Instead of trying
to whack around hash_bytes while maintaining its current behavior on
all platforms, we base this work on fasthash (MIT licensed) which
is simple, faster than hash_bytes for inputs over 12 bytes long,
and also passes the hash function testing suite SMHasher.
The fasthash functions have been reimplemented using our added-on
incremental interface to validate that this method will still give
the same answer, provided we have the input length ahead of time.
This functionality lives in a new header hashfn_unstable.h. The name
implies we have the freedom to change things across versions that
would be unacceptable for our other hash functions that are used for
e.g. hash indexes and hash partitioning. As such, these should only
be used for in-memory data structures like hash tables. There is also
no guarantee of being independent of endianness or pointer size.
As demonstration, use fasthash for pgstat_hash_hash_key. Previously
this called the 32-bit murmur finalizer on the three elements,
then joined them with hash_combine(). The new function is simpler,
faster and takes up less binary space. While the collision and bias
behavior were almost certainly fine with the previous coding, now we
have objective confidence of that.
There are other places that could benefit from this, but that is left
for future work.
Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao
Credit to Andres Freund for the idea
Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de
This changes the pg_attribute field attstattarget into a nullable
field in the variable-length part of the row. If no value is set by
the user for attstattarget, it is now null instead of previously -1.
This saves space in pg_attribute and tuple descriptors for most
practical scenarios. (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108
to 104.) Also, null is the semantically more correct value.
The ANALYZE code internally continues to represent the default
statistics target by -1, so that that code can avoid having to deal
with null values. But that is now contained to the ANALYZE code.
Only the DDL code deals with attstattarget possibly null.
For system columns, the field is now always null. The ANALYZE code
skips system columns anyway.
To set a column's statistics target to the default value, the new
command form ALTER TABLE ... SET STATISTICS DEFAULT can be used. (SET
STATISTICS -1 still works.)
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
If we have DECHIST statistics about the argument expression, use
the average number of distinct elements as the array length estimate.
(It'd be better to use the average total number of elements, but
that is not currently calculated by compute_array_stats(), and
it's unclear that it'd be worth extra effort to get.)
To do this, we have to change the signature of estimate_array_length
to pass the "root" pointer. While at it, also change its result
type to "double". That's probably not really necessary, but it
avoids any risk of overflow of the value extracted from DECHIST.
All existing callers are going to use the result in a "double"
calculation anyway.
Paul Jungwirth, reviewed by Jian He and myself
Discussion: https://postgr.es/m/CA+renyUnM2d+SmrxKpDuAdpiq6FOM=FByvi6aS6yi__qyf6j9A@mail.gmail.com
Second attempt at 283a95da92. Since we can't reorder the enum values
of JsonPathItemType, instead reorder the switch cases where they are
used to generally follow the order of the enum values, for better
maintainability.
This reverts commit 283a95da92.
The reordering of JsonPathItemType affects the binary on-disk
compatibility of the jsonpath type, so we must not change it. Revert
for now and consider.
Commit b437571714 added support for parallel builds for BRIN indexes,
using code similar to BTREE parallel builds, and also a new tuplesort
variant. This commit simplifies the new code in two ways:
* The "spool" grouping tuplesort and the heap/index is not necessary.
The heap/index are available as separate arguments, causing confusion.
So remove the spool, and use the tuplesort directly.
* The new tuplesort variant does not need the heap/index, as it sorts
simply by the range block number, without accessing the tuple data.
So simplify that too.
Initial report and patch by Ranier Vilela, further cleanup by me.
Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
When enabled (default off), this logs a backtrace anytime elog() or an
equivalent ereport() for internal errors is called.
This is not well covered by the existing backtrace_functions, because
there are many equally-worded low-level errors in many functions. And
if you find out where the error is, then you need to manually rewrite
the elog() to ereport() to attach the errbacktrace(), which is
annoying. Having a backtrace automatically on every elog() call could
be very helpful during development for various kinds of common errors
from palloc, syscache, node support, etc.
Discussion: https://www.postgresql.org/message-id/flat/ba76c6bc-f03f-4285-bf16-47759cfcab9e@eisentraut.org
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.
In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.
A new parameter summarize_wal enables or disables this new background
process. The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
- Remove MemoryContextAllocZeroAligned(). It was supposed to be a
faster version of MemoryContextAllocZero(), but modern compilers turn
the MemSetLoop() into a call to memset() anyway, making it more or
less identical to MemoryContextAllocZero(). That was the only user of
MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast().
- Convert newNode() to a static inline function. When this was
originally originally written, it was written as a macro because
testing showed that gcc didn't inline the size check as we
intended. Modern compiler versions do, and now that it just calls
palloc0() there is no size-check to inline anyway.
One nice effect is that the palloc0() takes one less argument than
MemoryContextAllocZeroAligned(), which saves a few instructions in the
callers of newNode().
Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro
Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.
The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.
The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).
This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.
Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
As of commits dd04e958c8 and 1833f1a1c3, tuplestore_donestoring(),
SPI_push(), SPI_pop(), SPI_push_conditional(),
SPI_pop_conditional(), and SPI_restore_connection() are no-op
macros provided for backwards compatibility. This commit removes
these macros, so any uses in third-party code will need to be
removed, too. Since these macros have been no-ops for a while,
such adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVeO58JM5tK2Qa8QC-%3DkC8sdkJOTd4BFU%3DK8zs4gGYpjQ%40mail.gmail.com
A WaitEventSet holds file descriptors or event handles (on Windows).
If FreeWaitEventSet is not called, those fds or handles are leaked.
Use ResourceOwners to track WaitEventSets, to clean those up
automatically on error.
This was a live bug in async Append nodes, if a FDW's
ForeignAsyncRequest function failed. (In back branches, I will apply a
more localized fix for that based on PG_TRY-PG_FINALLY.)
The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.
In the passing, fix misleading comment on what the 'nevents' argument
to WaitEventSetWait means.
Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane. Fixes bug #17828.
Reviewed-by: Alexander Lakhin, Thomas Munro
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
The code previously relied on "long" as type to track block numbers,
which would be 4 bytes in all Windows builds or any 32-bit builds. This
limited the code to be able to handle up to 16TB of data with the
default block size of 8kB, like during a CLUSTER. This code now relies
on a more portable int64, which should be more than enough for at least
the next 20 years to come.
This issue has been reported back in 2017, but nothing was done about it
back then, so here we go now.
Reported-by: Peter Geoghegan
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com
As of commit eaa5808e8e, MemoryContextResetAndDeleteChildren() is
just a backwards compatibility macro for MemoryContextReset(). Now
that some time has passed, this macro seems more likely to create
confusion.
This commit removes the macro and replaces all remaining uses with
calls to MemoryContextReset(). Any third-party code that use this
macro will need to be adjusted to call MemoryContextReset()
instead. Since the two have behaved the same way since v9.5, such
adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.
Reviewed-by: Amul Sul, Tom Lane, Alvaro Herrera, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20231113185950.GA1668018%40nathanxps13
To generate a dummy probes.h file when dtrace is not available, we had
two different scripts: A sed version, which is the original version,
and a Perl version, which was generated by s2p. This split was
necessary because Perl was not a mandatory build dependency on Unix,
but sed was not guaranteed to be available on Windows.
(The Meson build system used the sed version even on Windows, which
was probably incorrect and probably would have had to be fixed before
elevating that build system from experimental status.)
As of 721856ff24, Perl is a required build dependency, so this split
is no longer necessary. We can just use the Perl script in all build
environments and remove a whole bunch of infrastructure to keep the
two variants in sync.
The new Gen_dummy_probes.pl is not the version generated by s2p but a
new implementation written by hand by adapting the sed version to Perl
syntax.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/3fd0f1bc-4483-4ba9-8aa0-64765b052039@eisentraut.org
Rewrite array_in() and its subroutines so that we make only one
pass over the input text, rather than two. This requires
potentially re-pallocing the working arrays values[] and nulls[]
larger than our initial guess, but that cost will hopefully be made
up by avoiding duplicate parsing. In any case this coding seems
much clearer and more straightforward than what we had before.
This also fixes array_in() to reject non-rectangular input (that is,
different brace depths in different parts of the input) more reliably
than before, and to give a better error message when it does so.
This is analogous to the plpython and plperl fixes in 0553528e7 and
f47004add. Like those PLs, we now accept input such as '{{},{}}'
as a valid representation of an empty array, which we did not before.
Additionally, reject explicit array subscripts that are outside the
integer range (previously you just got whatever atoi() converted
them to), and make some other minor improvements in error reporting.
Although this is arguably a bug fix, it's also a behavioral change
that might trip somebody up, so no back-patch.
Tom Lane, Heikki Linnakangas, and Jian He. Thanks to Alexander Lakhin
for the initial report and for review/testing.
Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.
Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.
The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.
All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.
The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.
Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget. There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.
Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.
There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.
Add tests in src/test/modules/test_resowner.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds. In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen. Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.
To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.
The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text. v11 psql
lacks that parameter, so omit the tests in that branch.
Our thanks to Pedro Gallegos for reporting this problem.
Security: CVE-2023-5869
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val. Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice. Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.
Xing Guo, Aleksander Alekseev, Tom Lane
Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
Previously, ALTER SYSTEM failed if the target GUC wasn't present in
the session's GUC hashtable. That is a reasonable behavior for core
(single-part) GUC names, and for custom GUCs for which we have loaded
an extension that's reserved the prefix. But it's unnecessarily
restrictive otherwise, and it also causes inconsistent behavior:
you can "ALTER SYSTEM SET foo.bar" only if you did "SET foo.bar"
earlier in the session. That's fairly silly.
Hence, refactor things so that we can execute ALTER SYSTEM even
if the variable doesn't have a GUC hashtable entry, as long as the
name meets the custom-variable naming requirements and does not
have a reserved prefix. (It's safe to do this even if the
variable belongs to an extension we currently don't have loaded.
A bad value will at worst cause a WARNING when the extension
does get loaded.)
Also, adjust GRANT ON PARAMETER to have the same opinions about
whether to allow an unrecognized GUC name, and to throw the
same errors if not (it previously used a one-size-fits-all
message for several distinguishable conditions). By default,
only a superuser will be allowed to do ALTER SYSTEM SET on an
unrecognized name, but it's possible to GRANT the ability to
do it.
Patch by me, pursuant to a documentation complaint from
Gavin Panella. Arguably this is a bug fix, but given the
lack of other complaints I'll refrain from back-patching.
Discussion: https://postgr.es/m/2617358.1697501956@sss.pgh.pa.us
Discussion: https://postgr.es/m/169746329791.169914.16613647309012285391@wrigleys.postgresql.org
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system(). This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers. If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit(). This can end badly. For
example, both processes will try to remove themselves from the
PGPROC shared array.
To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before. If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.
This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.
Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
This commit introduces trigger on login event, allowing to fire some actions
right on the user connection. This can be useful for logging or connection
check purposes as well as for some personalization of environment. Usage
details are described in the documentation included, but shortly usage is
the same as for other triggers: create function returning event_trigger and
then create event trigger on login event.
In order to prevent the connection time overhead when there are no triggers
the commit introduces pg_database.dathasloginevt flag, which indicates database
has active login triggers. This flag is set by CREATE/ALTER EVENT TRIGGER
command, and unset at connection time when no active triggers found.
Author: Konstantin Knizhnik, Mikhail Gribkov
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru
Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko
Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund
Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk
Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu
The versions of these functions that accept object OIDs are supposed
to return NULL, rather than failing, if the target object has been
dropped. This makes it safe(r) to use them in queries that scan
catalogs, since the functions will be applied to objects that are
visible in the query's snapshot but might now be gone according to
the catalog snapshot. In most cases we implemented this by doing
a SearchSysCacheExists test and assuming that if that succeeds, we
can safely invoke the appropriate aclchk.c function, which will
immediately re-fetch the same syscache entry. It was argued that
if the existence test succeeds then the followup fetch must succeed
as well, for lack of any intervening AcceptInvalidationMessages call.
Alexander Lakhin demonstrated that this is not so when
CATCACHE_FORCE_RELEASE is enabled: the syscache entry will be forcibly
dropped at the end of SearchSysCacheExists, and then it is possible
for the catalog snapshot to get advanced while re-fetching the entry.
Alexander's test case requires the operation to happen inside a
parallel worker, but that seems incidental to the fundamental problem.
What remains obscure is whether there is a way for this to happen in a
non-debug build. Nonetheless, CATCACHE_FORCE_RELEASE is a very useful
test methodology, so we'd better make the code safe for it.
After some discussion we concluded that the most future-proof fix
is to give up the assumption that checking SearchSysCacheExists can
guarantee success of a later fetch. At best that assumption leads
to fragile code --- for example, has_type_privilege appears broken
for array types even if you believe the assumption holds. And it's
not even particularly efficient.
There had already been some work towards extending the aclchk.c
APIs to include "is_missing" output flags, so this patch extends
that work to cover all the aclchk.c functions that are used by the
has_xxx_privilege() functions. (This allows getting rid of some
ad-hoc decisions about not throwing errors in certain places in
aclchk.c.)
In passing, this fixes the has_sequence_privilege() functions to
provide the same guarantees as their cousins: for some reason the
SearchSysCacheExists tests never got added to those.
There is more work to do to remove the unsafe coding pattern with
SearchSysCacheExists in other places, but this is a pretty
self-contained patch so I'll commit it separately.
Per bug #18014 from Alexander Lakhin. Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD. (Perhaps we should clean up the has_sequence_privilege()
oversight in the back branches, but in the absence of field complaints
I'm not too excited about that either.)
Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org
* sync_method is renamed to wal_sync_method.
* sync_method_options[] is renamed to wal_sync_method_options[].
* assign_xlog_sync_method() is renamed to assign_wal_sync_method().
* The names of the available synchronization methods are now
prefixed with "WAL_SYNC_METHOD_" and have been moved into a
WalSyncMethod enum.
* PLATFORM_DEFAULT_SYNC_METHOD is renamed to
PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is
renamed to DEFAULT_WAL_SYNC_METHOD.
These more descriptive names help distinguish the code for
wal_sync_method from the code for DataDirSyncMethod (e.g., the
recovery_init_sync_method configuration parameter and the
--sync-method option provided by several frontend utilities). This
change also prevents name collisions between the aforementioned
sets of code. Since this only improves the naming of internal
identifiers, there should be no behavior change.
Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived. With new features
being added for SQL/JSON this is no longer the case. Add a routine
that knows how to free this struct and apply that to a few places, to
prevent this from becoming problematic.
At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.
This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it. AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.
Per Coverity complaint about datum_to_jsonb_internal().
Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
Unlike the other pg_stat_get_backend* functions,
pg_stat_get_backend_subxact() looks up the backend entry by using
its integer argument as a 1-based index in an internal array. The
other functions look for the entry with the matching session
backend ID. These numbers often match, but that isn't reliably
true.
This commit resolves this discrepancy by introducing
pgstat_get_local_beentry_by_backend_id() and using it in
pg_stat_get_backend_subxact(). We cannot use
pgstat_get_beentry_by_backend_id() because it returns a
PgBackendStatus, which lacks the locally computed additions
available in LocalPgBackendStatus that are required by
pg_stat_get_backend_subxact().
Author: Ian Barwick
Reviewed-by: Sami Imseih, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com
Backpatch-through: 16
Presently, pgstat_fetch_stat_beentry() accepts a session's backend
ID as its argument, and pgstat_fetch_stat_local_beentry() accepts a
1-based index in an internal array as its argument. The former is
typically used wherever a user must provide a backend ID, and the
latter is usually used internally when looping over all entries in
the array. This difference was first introduced by d7e39d72ca.
Before that commit, both functions accepted a 1-based index to the
internal array.
This commit renames these two functions to make it clear whether
they use the backend ID or the 1-based index to look up the entry.
This is preparatory work for a follow-up change that will introduce
a function for looking up a LocalPgBackendStatus using a backend
ID.
Reviewed-by: Ian Barwick, Sami Imseih, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com
Backpatch-through: 16
Make the primary messages more compact and make the detail messages
uniform. In initdb.c and pg_resetwal.c, use the newish
option_parse_int() to simplify some of the option parsing. For the
backend GUC wal_segment_size, add a GUC check hook to do the
verification instead of coding it in bootstrap.c. This might be
overkill, but that way the check is in the right place and it becomes
more self-documenting.
In passing, make pg_controldata use the logging API for warning
messages.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/9939aa8a-d7be-da2c-7715-0a0b5535a1f7@eisentraut.org
This new view, wrapped around a SRF, shows some information known about
wait events, as of:
- Name.
- Type (Activity, I/O, Extension, etc.).
- Description.
All the information retrieved comes from wait_event_names.txt, and the
description is the same as the documentation with filters applied to
remove any XML markups. This view is useful when joined with
pg_stat_activity to get the description of a wait event reported.
Custom wait events for extensions are included in the view.
Original idea by Yves Colin.
Author: Bertrand Drouvot
Reviewed-by: Kyotaro Horiguchi, Masahiro Ikeda, Tom Lane, Michael
Paquier
Discussion: https://postgr.es/m/0e2ae164-dc89-03c3-cf7f-de86378053ac@gmail.com
Previously, if a specialized comparator found equal datum1 keys,
the "comparetup" function would repeat the comparison on the
datum before proceeding with the unabbreviated first key
and/or additional sort keys.
Move comparing additional sort keys into "tiebreak" functions so
that specialized comparators can call these directly if needed,
avoiding duplicate work.
Reviewed by David Rowley
Discussion: https://postgr.es/m/CAFBsxsGaVfUrjTghpf%3DkDBYY%3DjWx1PN-fuusVe7Vw5s0XqGdGw%40mail.gmail.com
Currently, the names of the custom wait event must be registered for
each backend, requiring all these to link to the shared memory area of
an extension, even if these are not loaded with
shared_preload_libraries.
This patch relaxes the constraints related to this infrastructure by
storing the wait events and their names in two dynamic hash tables in
shared memory. This has the advantage to simplify the registration of
custom wait events to a single routine call that returns an event ID
ready for consumption:
uint32 WaitEventExtensionNew(const char *wait_event_name);
The caller of this routine can then cache locally the ID returned, to be
used for pgstat_report_wait_start(), WaitLatch() or a similar routine.
The implementation uses two hash tables: one with a key based on the
event name to avoid duplicates and a second using the event ID as key
for event lookups, like on pg_stat_activity. These tables can hold a
minimum of 16 entries, and a maximum of 128 entries, which should be plenty
enough.
The code changes done in worker_spi show how things are simplified (most
of the code removed in this commit comes from there):
- worker_spi_init() is gone.
- No more shared memory hooks required (size requested and
initialization).
- The custom wait event ID is cached in the process that needs to set
it, with one single call to WaitEventExtensionNew() to retrieve it.
Per suggestion from Andres Freund.
Author: Masahiro Ikeda, with a few tweaks from me.
Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcwzcx@awork3.anarazel.de
Store function config settings in lists to avoid the need to parse and
allocate for each function execution.
Speedup is modest but significant. Additionally, this change also
seems cleaner and supports some other performance improvements under
discussion.
Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel@j-davis.com
Reviewed-by: Nathan Bossart
Commit 19d8e2308b changed the list of set-of-columns that can be
returned by RelationGetIndexAttrBitmap, but didn't update its
"documentation". That was pretty hard to read already, so rewrite to
make it more comprehensible, adding the missing values while at it.
Backpatch to 16, like that commit.
Discussion: https://postgr.es/m/20230809091155.7c7f3gttjk3dj4ze@alvherre.pgsql
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
The previous commit removed the "override" APIs. Surviving APIs facilitate
plancache.c to snapshot search_path and test whether the current value equals
a remembered snapshot.
Aleksander Alekseev. Reported by Alexander Lakhin and Noah Misch.
Discussion: https://postgr.es/m/8ffb4650-52c4-6a81-38fc-8f99be981130@gmail.com
Two backend routines are added to allow extension to allocate and define
custom wait events, all of these being allocated in the type
"Extension":
* WaitEventExtensionNew(), that allocates a wait event ID computed from
a counter in shared memory.
* WaitEventExtensionRegisterName(), to associate a custom string to the
wait event ID allocated.
Note that this includes an example of how to use this new facility in
worker_spi with tests in TAP for various scenarios, and some
documentation about how to use them.
Any code in the tree that currently uses WAIT_EVENT_EXTENSION could
switch to this new facility to define custom wait events. This is left
as work for future patches.
Author: Masahiro Ikeda
Reviewed-by: Andres Freund, Michael Paquier, Tristan Partin, Bharath
Rupireddy
Discussion: https://postgr.es/m/b9f5411acda0cf15c8fbb767702ff43e@oss.nttdata.com
This Patch introduces three SQL standard JSON functions:
JSON()
JSON_SCALAR()
JSON_SERIALIZE()
JSON() produces json values from text, bytea, json or jsonb values,
and has facilitites for handling duplicate keys.
JSON_SCALAR() produces a json value from any scalar sql value,
including json and jsonb.
JSON_SERIALIZE() produces text or bytea from input which containis
or represents json or jsonb;
For the most part these functions don't add any significant new
capabilities, but they will be of use to users wanting standard
compliant JSON handling.
Catversion bumped as this changes ruleutils.c.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera,
Peter Eisentraut
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
This is to export datum_to_json(), datum_to_jsonb(), and
jsonb_from_cstring(), though the last one is exported as
jsonb_from_text().
A subsequent commit to add new SQL/JSON constructor functions will
need them for calling from the executor.
Discussion: https://postgr.es/m/20230720160252.ldk7jy6jqclxfxkq%40alvherre.pgsql
This adds the X509 attributes notBefore and notAfter to sslinfo
as well as pg_stat_ssl to allow verifying and identifying the
validity period of the current client certificate.
Author: Cary Huang <cary.huang@highgo.ca>
Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca
This essentially removes the JsonbTypeCategory enum and
jsonb_categorize_type() and integrates any jsonb-specific logic that
was in jsonb_categorize_type() into json_categorize_type(), now
moved to jsonfuncs.c. The remaining JsonTypeCategory enum and
json_categorize_type() cover the needs of the callers in both json.c
and jsonb.c. json_categorize_type() has grown a new parameter named
is_jsonb for callers to engage the jsonb-specific behavior of
json_categorize_type().
One notable change in the now exported API of json_categorize_type()
is that it now always returns *outfuncoid even though a caller may
have no need currently to see one.
This is in preparation of later commits to implement additional
SQL/JSON functions.
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
Commit 89e46da5e5 allowed using BTREE indexes that are neither
PRIMARY KEY nor REPLICA IDENTITY on the subscriber during apply of
update/delete. This patch extends that functionality to also allow HASH
indexes.
We explored supporting other index access methods as well but they don't
have a fixed strategy for equality operation which is required by the
current infrastructure in logical replication to scan the indexes.
Author: Kuroda Hayato
Reviewed-by: Peter Smith, Onder Kalaci, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58669D7414E59664E17A5827F522A@TYAPR01MB5866.jpnprd01.prod.outlook.com
This commit adds a new type of parallel message 'P' to allow a
parallel worker to poke at a leader to update the progress.
Currently it supports only incremental progress reporting but it's
possible to allow for supporting of other backend progress APIs in the
future.
There are no users of this new message type as of this commit. That
will follow in future commits.
Idea from Andres Freund.
Author: Sami Imseih
Reviewed by: Michael Paquier, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems. For Windows, win32_port.h redirects to a partial
implementation called _locale_t. We can now remove a lot of
compile-time tests for HAVE_LOCALE_T, and associated comments and dead
code branches that were needed for older computers.
Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed. With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
The documentation and the code is generated automatically from a new
file called wait_event_names.txt, formatted in sections dedicated to
each wait event class (Timeout, Lock, IO, etc.) with three tab-separated
fields:
- C symbol in enums
- Format in the system views
- Description in the docs
Using this approach has several advantages, as we have proved to be
rather bad in maintaining this area of the tree across the years:
- The order of each item in the documentation and the code, which should
be alphabetical, has become incorrect multiple times, and the script
generating the code and documentation has a few rules to enforce that,
making the maintenance a no-brainer.
- Some wait events were added to the code, but not documented, so this
cannot be missed now.
- The order of the tables for each wait event class is enforced in the
documentation (the input .txt file does so as well for clarity, though
this is not mandatory).
- Less code, shaving 1.2k lines from the tree, with 1/3 of the savings
coming from the code, the rest from the documentation.
The wait event types "Lock" and "LWLock" still have their own code path
for their code, hence only the documentation is created for them. These
classes are listed with a special marker called WAIT_EVENT_DOCONLY in
the input file.
Adding a new wait event now requires only an update of
wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions.
This commit has been tested with configure/Makefile, the CI and VPATH
build. clean, distclean and maintainer-clean were working fine.
Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
The following changes are done:
- Addition of WaitEventBufferPin and WaitEventExtension, that hold a
list of wait events related to each category.
- Addition of two functions that encapsulate the list of wait events for
each category.
- Rename BUFFER_PIN to BUFFERPIN (only this wait event class used an
underscore, requiring a specific rule in the automation script).
These changes make a bit easier the automatic generation of all the code
and documentation related to wait events, as all the wait event
categories are now controlled by consistent structures and functions.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/c6f35117-4b20-4c78-1df5-d3056010dcf5@gmail.com
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
Run pgindent and pgperltidy. It seems we're still some ways
away from all committers doing this automatically. Now that
we have a buildfarm animal that will whine about poorly-indented
code, we'll try to keep the tree more tidy.
Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us
Split nbtree's _bt_getbuf function is two: code that read locks or write
locks existing pages remains in _bt_getbuf, while code that deals with
allocating new pages is moved to a new, dedicated function called
_bt_allocbuf. This simplifies most _bt_getbuf callers, since it is no
longer necessary for them to pass a heaprel argument. Many of the
changes to nbtree from commit 61b313e4 can be reverted. This minimizes
the divergence between HEAD/PostgreSQL 16 and earlier release branches.
_bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to
_bt_getbuf. There are only 3 affected call sites, all of which continue
to pass a heaprel for recovery conflict purposes. Note that nbtree's
use of P_NEW was superficial; nbtree never actually relied on the P_NEW
code paths in bufmgr.c, so this change is strictly mechanical.
GiST already took the same approach; it has a dedicated function for
allocating new pages called gistNewBuffer(). That factor allowed commit
61b313e4 to make much more targeted changes to GiST.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com
Eventually it is likely worth trying to deal with this in a more expansive
way, by generating dependency files generated within the scripts. But it's not
entirely obvious how to do that in perl and is work more suitable for 17
anyway.
Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Discussion: https://postgr.es/m/87v8g7s6bf.fsf@wibble.ilmari.org
While executing maintenance operations (ANALYZE, CLUSTER, REFRESH
MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to
'pg_catalog, pg_temp' to prevent inconsistent behavior.
Functions that are used for functional indexes, in index expressions,
or in materialized views and depend on a different search path must be
declared with CREATE FUNCTION ... SET search_path='...'.
This change addresses a security risk introduced in commit 60684dd834,
where a role with MAINTAIN privileges on a table may be able to
escalate privileges to the table owner. That commit is not yet part of
any release, so no need to backpatch.
Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com
Reviewed-by: Greg Stark
Reviewed-by: Nathan Bossart
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
Non-leaf pages of GiST indexes contain key attributes, leaf pages
contain both key and non-key attributes, and gist_page_items() ignored
the handling of non-key attributes. This caused a few problems when
using gist_page_items() on a GiST index with INCLUDE:
- On a non-leaf page, the function would crash.
- On a leaf page, the function would work, but miss to display all the
values for included attributes.
This commit fixes gist_page_items() to handle such cases in a more
appropriate way, and now displays the values of key and non-key
attributes for each item separately in a style consistent with what
ruleutils.c would generate for the attribute list, depending on the page
type dealt with. In a way similar to how a record is displayed, values
would be double-quoted for key or non-key attributes if required.
ruleutils.c did not provide a routine able to control if non-key
attributes should be displayed, so an extended() routine for index
definitions is added to work around the leaf and non-leaf page
differences.
While on it, this commit fixes a third problem related to the amount of
data reported for key attributes. The code originally relied on
BuildIndexValueDescription() (used for error reports on constraints)
that would not print all the data stored in the index but the index
opclass's input type, so this limited the amount of information
available. This switch makes gist_page_items() much cheaper as there is
no need to run ACL checks for each item printed, which is not an issue
anyway as superuser rights are required to execute the functions of
pageinspect. Opclasses whose data cannot be displayed can rely on
gist_page_items_bytea().
The documentation of this function was slightly incorrect for the
output results generated on HEAD and v15, so adjust it on these
branches.
Author: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org
Backpatch-through: 14
This is equivalent to a revert of f193883 and fb32748, with the addition
that the declaration of the SQLValueFunction node needs to gain a couple
of node_attr for query jumbling. The performance impact of removing the
function call inlining is proving to be too huge for some workloads
where these are used. A worst-case test case of involving only simple
SELECT queries with a SQL keyword is proving to lead to a reduction of
10% in TPS via pgbench and prepared queries on a high-end machine.
None of the tests I ran back for this set of changes saw such a huge
gap, but Alexander Lakhin and Andres Freund have found that this can be
noticeable. Keeping the older performance would mean to do more
inlining in the executor when using COERCE_SQL_SYNTAX for a function
expression, similarly to what SQLValueFunction does. This requires more
redesign work and there is little time until 16beta1 is released, so for
now reverting the change is the best way forward, bringing back the
previous performance.
Bump catalog version.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com
An update of the GUC stats_fetch_consistency in a transaction would be
able to trigger an assertion when doing cache->snapshot. In this case,
when retrieving a pgstat entry after the switch, a new snapshot would be
rebuilt, confusing pgstat_build_snapshot() because a snapshot is already
cached with an unexpected mode ("cache").
In order to fix this problem, this commit adds a flag to force a
snapshot clear each time this GUC is changed. Some tests are added to
check, while on it.
Some optimizations in avoiding the snapshot clear should be possible
depending on what is cached and the current GUC value, I guess, but this
solution is simple, and ensures that the state of the cache is updated
each time a new pgstat entry is fetched, hence being consistent with the
level wanted by the client that has set the GUC.
Note that cache->none and snapshot->none would not cause issues, as
fetching a pgstat entry would be retrieved from shared memory on the
second attempt, however a snapshot would still be cached. Similarly,
none->snapshot and none->cache would build a new snapshot on the second
fetch attempt. Finally, snapshot->cache would cache a new snapshot on
the second attempt.
Reported-by: Alexander Lakhin
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org
backpatch-through: 15
Old versions of Solaris and illumos had buffer overrun bugs in their
strxfrm() implementations. The bugs were fixed more than a decade ago
and the relevant releases are long out of vendor support. It's time to
remove the defense added by commit be8b06c3.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+hUKGJ-ZPJwKHVLbqye92-ZXeLoCHu5wJL6L6HhNP7FkJ=meA@mail.gmail.com
This fixes many spelling mistakes in comments, but a few references to
invalid parameter names, function names and option names too in comments
and also some in string constants
Also, fix an #undef that was undefining the incorrect definition
Author: Alexander Lakhin
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com
This reverts commit 3d03b24c3 (Revert Add support for Kerberos
credential delegation) which was committed on the grounds of concern
about portability, but on further review and discussion, it's clear that
we are better off explicitly requiring MIT Kerberos as that appears to
be the only GSSAPI library currently that's under proper maintenance
and ongoing development. The API used for storing credentials was added
to MIT Kerberos over a decade ago while for the other libraries which
appear to be mainly based on Heimdal, which exists explicitly to be a
re-implementation of MIT Kerberos, the API never made it to a released
version (even though it was added to the Heimdal git repo over 5 years
ago..).
This post-feature-freeze change was approved by the RMT.
Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net
This reverts commit 3d4fa227bc.
Per discussion and buildfarm, this depends on APIs that seem to not
be available on at least one platform (NetBSD). Should be certainly
possible to rework to be optional on that platform if necessary but bit
late for that at this point.
Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us
Provide a way to ask the kernel to use O_DIRECT (or local equivalent)
where available for data and WAL files, to avoid or minimize kernel
caching. This hurts performance currently and is not intended for end
users yet. Later proposed work would introduce our own I/O clustering,
read-ahead, etc to replace the facilities the kernel disables with this
option.
The only user-visible change, if the developer-only GUC is not used, is
that this commit also removes the obscure logic that would activate
O_DIRECT for the WAL when wal_sync_method=open_[data]sync and
wal_level=minimal (which also requires max_wal_senders=0). Those are
non-default and unlikely settings, and this behavior wasn't (correctly)
documented. The same effect can be achieved with io_direct=wal.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com
Support GSSAPI/Kerberos credentials being delegated to the server by a
client. With this, a user authenticating to PostgreSQL using Kerberos
(GSSAPI) credentials can choose to delegate their credentials to the
PostgreSQL server (which can choose to accept them, or not), allowing
the server to then use those delegated credentials to connect to
another service, such as with postgres_fdw or dblink or theoretically
any other service which is able to be authenticated using Kerberos.
Both postgres_fdw and dblink are changed to allow non-superuser
password-less connections but only when GSSAPI credentials have been
delegated to the server by the client and GSSAPI is used to
authenticate to the remote system.
Authors: Stephen Frost, Peifeng Qiu
Reviewed-By: David Christensen
Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com
Add new options to the VACUUM and ANALYZE commands called
BUFFER_USAGE_LIMIT to allow users more control over how large to make the
buffer access strategy that is used to limit the usage of buffers in
shared buffers. Larger rings can allow VACUUM to run more quickly but
have the drawback of VACUUM possibly evicting more buffers from shared
buffers that might be useful for other queries running on the database.
Here we also add a new GUC named vacuum_buffer_usage_limit which controls
how large to make the access strategy when it's not specified in the
VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as
the access strategy was prior to this change. This setting also
controls how large to make the buffer access strategy for autovacuum.
Per idea by Andres Freund.
Author: Melanie Plageman
Reviewed-by: David Rowley
Reviewed-by: Andres Freund
Reviewed-by: Justin Pryzby
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de
A future patch will add support for extending relations by multiple blocks at
once. To be concurrency safe, the buffers for those blocks need to be marked
as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from
an IO error for a single buffer. This commit extends that infrastructure to
multiple buffers by using the resource owner infrastructure.
This commit increases the size of the ResourceOwnerData struct, which appears
to have a just about measurable overhead in very extreme workloads. Medium
term we are planning to substantially shrink the size of
ResourceOwnerData. Short term the increase is small enough to not worry about
it for now.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de
Convert to BCP47 language tags before storing in the catalog, except
during binary upgrade or when the locale comes from an existing
collation or template database.
The resulting language tags can vary slightly between ICU
versions. For instance, "@colBackwards=yes" is converted to
"und-u-kb-true" in older versions of ICU, and to the simpler (but
equivalent) "und-u-kb" in newer versions.
The process of canonicalizing to a language tag also understands more
input locale string formats than ucol_open(). For instance,
"fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is
ignored; effectively treating it the same as the locale "fr" and
opening the wrong collator. Canonicalization properly interprets the
language and region, resulting in the language tag "fr-CA", which can
then be understood by ucol_open().
This commit fixes a problem in prior versions due to ucol_open()
misinterpreting locale strings as described above. For instance,
creating an ICU collation with locale "fr_CA.UTF-8" would store that
string directly in the catalog, which would later be passed to (and
misinterpreted by) ucol_open(). After this commit, the locale string
will be canonicalized to language tag "fr-CA" in the catalog, which
will be properly understood by ucol_open(). Because this fix affects
the resulting collator, we cannot change the locale string stored in
the catalog for existing databases or collations; otherwise we'd risk
corrupting indexes. Therefore, only canonicalize locales for
newly-created (not upgraded) collations/databases. For similar
reasons, do not backport.
Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com
Reviewed-by: Peter Eisentraut
Up until now, logical replication actions have been performed as the
subscription owner, who will generally be a superuser. Commit
cec57b1a0f documented hazards
associated with that situation, namely, that any user who owns a
table on the subscriber side could assume the privileges of the
subscription owner by attaching a trigger, expression index, or
some other kind of executable code to it. As a remedy, it suggested
not creating configurations where users who are not fully trusted
own tables on the subscriber.
Although that will work, it basically precludes using logical
replication in the way that people typically want to use it,
namely, to replicate a database from one node to another
without necessarily having any restrictions on which database
users can own tables. So, instead, change logical replication to
execute INSERT, UPDATE, DELETE, and TRUNCATE operations as the
table owner when they are replicated.
Since this involves switching the active user frequently within
a session that is authenticated as the subscription user, also
impose SECURITY_RESTRICTED_OPERATION restrictions on logical
replication code. As an exception, if the table owner can SET
ROLE to the subscription owner, these restrictions have no
security value, so don't impose them in that case.
Subscription owners are now required to have the ability to
SET ROLE to every role that owns a table that the subscription
is replicating. If they don't, replication will fail. Superusers,
who normally own subscriptions, satisfy this property by default.
Non-superusers users who own subscriptions will need to be
granted the roles that own relevant tables.
Patch by me, reviewed (but not necessarily in its entirety) by
Jelte Fennema, Jeff Davis, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=UsYPfnOoDeFkw@mail.gmail.com
This commit only implements one prerequisite part for allowing logical
decoding. The commit message contains an explanation of the overall design,
which later commits will refer back to.
Overall design:
1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby. To prevent those errors, a new replication
conflict scenario needs to be addressed (as much as hot standby does).
2. Our chosen strategy for dealing with this type of replication slot
is to invalidate logical slots for which needed data has been removed.
3. To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access. That way, during WAL replay, we know when there is a risk of
conflict and, if so, if there is a conflict.
4. We can't rely on the standby's relcache entries for this purpose in
any way, because the startup process can't access catalog contents.
5. Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.
Why do we need this for logical decoding on standby?
First, let's forget about logical decoding on standby and recall that
on a primary database, any catalog rows that may be needed by a logical
decoding replication slot are not removed.
This is done thanks to the catalog_xmin associated with the logical
replication slot.
But, with logical decoding on standby, in the following cases:
- hot_standby_feedback is off
- hot_standby_feedback is on but there is no a physical slot between
the primary and the standby. Then, hot_standby_feedback will work,
but only while the connection is alive (for example a node restart
would break it)
Then, the primary may delete system catalog rows that could be needed
by the logical decoding on the standby (as it does not know about the
catalog_xmin on the standby).
So, it’s mandatory to identify those rows and invalidate the slots
that may need them if any. Identifying those rows is the purpose of
this commit.
Implementation:
When a WAL replay on standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, then that means the slot's catalog_xmin conflicts with
the xid, and we need to handle the conflict. While subsequent commits
will do the actual conflict handling, this commit adds a new field
isCatalogRel in such WAL records (and a new bit set in the
xl_heap_visible flags field), that is true for catalog tables, so as to
arrange for conflict handling.
The affected WAL records are the ones that already contain the
snapshotConflictHorizon field, namely:
- gistxlogDelete
- gistxlogPageReuse
- xl_hash_vacuum_one_page
- xl_heap_prune
- xl_heap_freeze_page
- xl_heap_visible
- xl_btree_reuse_page
- xl_btree_delete
- spgxlogVacuumRedirect
Due to this new field being added, xl_hash_vacuum_one_page and
gistxlogDelete do now contain the offsets to be deleted as a
FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment.
It's not needed on the others struct where isCatalogRel has
been added.
This commit just introduces the WAL format changes mentioned above. Handling
the actual conflicts will follow in future commits.
Bumps XLOG_PAGE_MAGIC as the several WAL records are changed.
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
This is done in preparation for logical decoding on standby, which needs to
include whether visibility affecting WAL records are about a (user) catalog
table. Which is only known for the table, not the indexes.
It's also nice to be able to pass the heap relation to GlobalVisTestFor() in
vacuumRedirectAndPlaceholder().
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com
This patch introduces the SQL standard IS JSON predicate. It operates
on text and bytea values representing JSON, as well as on the json and
jsonb types. Each test has IS and IS NOT variants and supports a WITH
UNIQUE KEYS flag. The tests are:
IS JSON [VALUE]
IS JSON ARRAY
IS JSON OBJECT
IS JSON SCALAR
These should be self-explanatory.
The WITH UNIQUE KEYS flag makes these return false when duplicate keys
exist in any object within the value, not necessarily directly contained
in the outermost object.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
This commit introduces the SQL/JSON standard-conforming constructors for
JSON types:
JSON_ARRAY()
JSON_ARRAYAGG()
JSON_OBJECT()
JSON_OBJECTAGG()
Most of the functionality was already present in PostgreSQL-specific
functions, but these include some new functionality such as the ability
to skip or include NULL values, and to allow duplicate keys or throw
error when they are found, as well as the standard specified syntax to
specify output type and format.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
For ICU collations, ensure that the locale's language exists in ICU,
and that the locale can be opened.
Basic validation helps avoid minor mistakes and misspellings, which
often fall back to the root locale instead of the intended
locale. It's even more important to avoid such mistakes in ICU
versions 54 and earlier, where the same (misspelled) locale string
could fall back to different locales depending on the environment.
Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com
Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com
Reviewed-by: Peter Eisentraut
When extracting an attr from a cached tuple in the syscache with
SysCacheGetAttr the isnull parameter must be checked in case the
attr cannot be NULL. For cases when this is known beforehand, a
wrapper is introduced which perform the errorhandling internally
on behalf of the caller, invoking an elog in case of a NULL attr.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/AD76405E-DB45-46B6-941F-17B1EB3A9076@yesql.se