These cases formerly failed with errors about "could not find array type
for data type". Now they yield arrays of the same element type and one
higher dimension.
The implementation involves creating functions with API similar to the
existing accumArrayResult() family. I (tgl) also extended the base family
by adding an initArrayResult() function, which allows callers to avoid
special-casing the zero-inputs case if they just want an empty array as
result. (Not all do, so the previous calling convention remains valid.)
This allowed simplifying some existing code in xml.c and plperl.c.
Ali Akbar, reviewed by Pavel Stehule, significantly modified by me
Code that check the flag no longer need #ifdef's, which is more convenient.
In particular, makes it easier to write extensions that depend on it.
In the passing, modify sslinfo's ssl_is_used function to check ssl_in_use
instead of the OpenSSL specific 'ssl' pointer. It doesn't make any
difference currently, as sslinfo is only compiled when built with OpenSSL,
but seems cleaner anyway.
Add a new XLOG_FPI_FOR_HINT record type, and use that for full-page images
generated for hint bit updates, when checksums are enabled. The new record
type is replayed exactly the same as XLOG_FPI, but allows them to be tallied
separately e.g. in pg_xlogdump.
Make it work more like FDW plans do: instead of assuming that there are
expressions in a CustomScan plan node that the core code doesn't know
about, insist that all subexpressions that need planner attention be in
a "custom_exprs" list in the Plan representation. (Of course, the
custom plugin can break the list apart again at executor initialization.)
This lets us revert the parts of the patch that exposed setrefs.c and
subselect.c processing to the outside world.
Also revert the GetSpecialCustomVar stuff in ruleutils.c; that concept
may work in future, but it's far from fully baked right now.
Instead of register_custom_path_provider and a CreateCustomScanPath
callback, let's just provide a standard function hook in set_rel_pathlist.
This is more flexible than what was previously committed, is more like the
usual conventions for planner hooks, and requires less support code in the
core. We had discussed this design (including centralizing the
set_cheapest() calls) back in March or so, so I'm not sure why it wasn't
done like this already.
There seems no prospect that any of this will ever be useful, and indeed
it's questionable whether some of it would work if it ever got called;
it's certainly not been exercised in a very long time, if ever. So let's
get rid of it, and make the comments about mark/restore in execAmi.c less
wishy-washy.
The mark/restore support for Result nodes is also currently dead code,
but that's due to planner limitations not because it's impossible that
it could be useful. So I left it in.
Get rid of the pernicious entanglement between planner and executor headers
introduced by commit 0b03e5951b.
Also, rearrange the CustomFoo struct/typedef definitions so that all the
typedef names are seen as used by the compiler. Without this pgindent
will mess things up a bit, which is not so important perhaps, but it also
removes a bizarre discrepancy between the declaration arrangement used for
CustomExecMethods and that used for CustomScanMethods and
CustomPathMethods.
Clean up the commentary around ExecSupportsMarkRestore to reflect the
rather large change in its API.
Const-ify register_custom_path_provider's argument. This necessitates
casting away const in the function, but that seems better than forcing
callers of the function to do so (or else not const-ify their method
pointer structs, which was sort of the whole point).
De-export fix_expr_common. I don't like the exporting of fix_scan_expr
or replace_nestloop_params either, but this one surely has got little
excuse.
Now that we have a policy of hiding varlena catalog fields behind
"#ifdef CATALOG_VARLEN", there is no need for their type names to be
acceptable to the C compiler. And experimentation shows that it does
not matter to pgindent either. (If it did, we'd have problems anyway,
since these typedefs are unreferenced so far as the C compiler is
concerned, and find_typedef fails to identify such typedefs.)
Hence, remove the phony typedefs that genbki.h provided to make
some varlena field definitions compilable.
In passing, rearrange #define's into what seemed a more logical order.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
For <, <=, > and >= strategies, mark the first scan key
as already matched if scanning in an appropriate direction.
If index tuple contains no nulls we can skip the first
re-check for each tuple.
Author: Rajeev Rastogi
Reviewer: Haribabu Kommi
Rework of the code and comments by Simon Riggs
Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit
85b506bbfc was mistaken in setting the relpersistence value of the
index directly in the relcache entry, within reindex_index. The reason
for the failure is that an invalidation message that comes after mucking
with the relcache entry directly, but before writing it to the catalogs,
would cause the entry to become rebuilt in place from catalogs with the
old contents, losing the update.
Fix by passing the correct persistence value to
RelationSetNewRelfilenode instead; this routine also writes the updated
tuple to pg_class, avoiding the problem. Suggested by Tom Lane.
This removes ATChangeIndexesPersistence() introduced by f41872d0c1
which was too ugly to live for long. Instead, the correct persistence
marking is passed all the way down to reindex_index, so that the
transient relation built to contain the index relfilenode can
get marked correctly right from the start.
Author: Fabrízio de Royes Mello
Review and editorialization by Michael Paquier
and Álvaro Herrera
The initial patch for RLS mistakenly included headers associated with
the executor and planner bits in rewrite/rowsecurity.h. Per policy and
general good sense, executor headers should not be included in planner
headers or vice versa.
The include of execnodes.h was a mistaken holdover from previous
versions, while the include of relation.h was used for Relation's
definition, which should have been coming from utils/relcache.h. This
patch cleans these issues up, adds comments to the RowSecurityPolicy
struct and the RowSecurityConfigType enum, and changes Relation->rsdesc
to Relation->rd_rsdesc to follow Relation field naming convention.
Additionally, utils/rel.h was including rewrite/rowsecurity.h, which
wasn't a great idea since that was pulling in things not really needed
in utils/rel.h (which gets included in quite a few places). Instead,
use 'struct RowSecurityDesc' for the rd_rsdesc field and add comments
explaining why.
Lastly, add an include into access/nbtree/nbtsort.c for
utils/sortsupport.h, which was evidently missed due to the above mess.
Pointed out by Tom in 16970.1415838651@sss.pgh.pa.us; note that the
concerns regarding a similar situation in the custom-path commit still
need to be addressed.
There was a window in RestoreBackupBlock where a page would be zeroed out,
but not yet locked. If a backend pinned and locked the page in that window,
it saw the zeroed page instead of the old page or new page contents, which
could lead to missing rows in a result set, or errors.
To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins,
zeroes, and locks the page, if it's not in the buffer cache already.
In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE,
to avoid breaking any 3rd party extensions that might use RBM_ZERO. More
importantly, this avoids renumbering the other enum values, which would
cause even bigger confusion in extensions that use ReadBufferExtended, but
haven't been recompiled.
Backpatch to all supported versions; this has been racy since hot standby
was introduced.
The hope is that we can use this to produce better diagnostics in
some cases.
Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
This only happens if a client issues a Parse message with an empty query
string, which is a bit odd; but since it is explicitly called out as legal
by our FE/BE protocol spec, we'd probably better continue to allow it.
Fix by adding tests everywhere that the raw_parse_tree field is passed to
functions that don't or shouldn't accept NULL. Also make it clear in the
relevant comments that NULL is an expected case.
This reverts commits a73c9dbab0 and
2e9650cbcf, which fixed specific crash
symptoms by hacking things at what now seems to be the wrong end, ie the
callee functions. Making the callees allow NULL is superficially more
robust, but it's not always true that there is a defensible thing for the
callee to do in such cases. The caller has more context and is better
able to decide what the empty-query case ought to do.
Per followup discussion of bug #11335. Back-patch to 9.2. The code
before that is sufficiently different that it would require development
of a separate patch, which doesn't seem worthwhile for what is believed
to be an essentially cosmetic change.
Previously the maximum size of GIN pending list was controlled only by
work_mem. But the reasonable value of work_mem and the reasonable size
of the list are basically not the same, so it was not appropriate to
control both of them by only one GUC, i.e., work_mem. This commit
separates new GUC, pending_list_cleanup_size, from work_mem to allow
users to control only the size of the list.
Also this commit adds pending_list_cleanup_size as new storage parameter
to allow users to specify the size of the list per index. This is useful,
for example, when users want to increase the size of the list only for
the GIN index which can be updated heavily, and decrease it otherwise.
Reviewed by Etsuro Fujita.
I missed an additional colon in previous patch. Oops. to make that mistake
less likely in the future, add comments as placeholders for unused inputs
and outputs in inline assembly.
At one time it wasn't terribly important what column names were associated
with the fields of a composite Datum, but since the introduction of
operations like row_to_json(), it's important that looking up the rowtype
ID embedded in the Datum returns the column names that users would expect.
That did not work terribly well before this patch: you could get the column
names of the underlying table, or column aliases from any level of the
query, depending on minor details of the plan tree. You could even get
totally empty field names, which is disastrous for cases like row_to_json().
To fix this for whole-row Vars, look to the RTE referenced by the Var, and
make sure its column aliases are applied to the rowtype associated with
the result Datums. This is a tad scary because we might have to return
a transient RECORD type even though the Var is declared as having some
named rowtype. In principle it should be all right because the record
type will still be physically compatible with the named rowtype; but
I had to weaken one Assert in ExecEvalConvertRowtype, and there might be
third-party code containing similar assumptions.
Similarly, RowExprs have to be willing to override the column names coming
from a named composite result type and produce a RECORD when the column
aliases visible at the site of the RowExpr differ from the underlying
table's column names.
In passing, revert the decision made in commit 398f70ec07 to add
an alias-list argument to ExecTypeFromExprList: better to provide that
functionality in a separate function. This also reverts most of the code
changes in d685814835, which we don't need because we're no longer
depending on the tupdesc found in the child plan node's result slot to be
blessed.
Back-patch to 9.4, but not earlier, since this solution changes the results
in some cases that users might not have realized were buggy. We'll apply a
more restricted form of this patch in older branches.
Reported by David Rowley: variadic macros are a problem. Get rid of
them using a trick suggested by Tom Lane: add extra parentheses where
needed. In the future we might decide we don't need the calls at all
and remove them, but it seems appropriate to keep them while this code
is still new.
Also from David Rowley: brininsert() was trying to use a variable before
initializing it. Fix by moving the brin_form_tuple call (which
initializes the variable) to within the locked section.
Reported by Peter Eisentraut: can't use "new" as a struct member name,
because C++ compilers will choke on it, as reported by cpluspluscheck.
This allows extension modules to define their own methods for
scanning a relation, and get the core code to use them. It's
unclear as yet how much use this capability will find, but we
won't find out if we never commit it.
KaiGai Kohei, reviewed at various times and in various levels
of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro
Herrera, and myself.
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes. They work by maintaining "summary" data about
block ranges. Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not. Normal index scans are not supported
because these indexes do not store TIDs.
As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.
For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range. This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results. In this commit I only include minmax.
Catalog version bumped due to new builtin catalog entries.
There's more that could be done here, but this is a good step forwards.
Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.
Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
PS:
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under
grant agreement n° 318633.
The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
existed since we got rid of old-style VACUUM FULL. I kept the code that sets
the flag, although it's not used for anything anymore, because it might
still be interesting information for debugging purposes that some tuples
have been deleted from a page.
Likewise, the code to turn the root page from non-leaf to leaf page was
removed when we got rid of old-style VACUUM FULL. Remove the code to replay
that action, too.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.
Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.
Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
Obviously, every translation unit should not be declaring this
separately. It needs to be PGDLLIMPORT as well, to avoid breaking
third-party code that uses any of the functions that the commit
mentioned above changed to macros.
The old algorithm was found to not be the usual CRC-32 algorithm, used by
Ethernet et al. We were using a non-reflected lookup table with code meant
for a reflected lookup table. That's a strange combination that AFAICS does
not correspond to any bit-wise CRC calculation, which makes it difficult to
reason about its properties. Although it has worked well in practice, seems
safer to use a well-known algorithm.
Since we're changing the algorithm anyway, we might as well choose a
different polynomial. The Castagnoli polynomial has better error-correcting
properties than the traditional CRC-32 polynomial, even if we had
implemented it correctly. Another reason for picking that is that some new
CPUs have hardware support for calculating CRC-32C, but not CRC-32, let
alone our strange variant of it. This patch doesn't add any support for such
hardware, but a future patch could now do that.
The old algorithm is kept around for tsquery and pg_trgm, which use the
values in indexes that need to remain compatible so that pg_upgrade works.
While we're at it, share the old lookup table for CRC-32 calculation
between hstore, ltree and core. They all use the same table, so might as
well.
A background worker can use pq_redirect_to_shm_mq() to direct protocol
that would normally be sent to the frontend to a shm_mq so that another
process may read them.
The receiving process may use pq_parse_errornotice() to parse an
ErrorResponse or NoticeResponse from the background worker and, if
it wishes, ThrowErrorData() to propagate the error (with or without
further modification).
Patch by me. Review by Andres Freund.
This reassociates a dynamic shared memory handle previous passed to
dsm_pin_mapping with the current resource owner, so that it will be
cleaned up at the end of the current query.
Patch by me. Review of the function name by Andres Freund, Amit
Kapila, Jim Nasby, Petr Jelinek, and Álvaro Herrera.
As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover
all cases where ANALYZE might be invoked in an unsafe context. We need to
test the result of IsInTransactionChain not IsTransactionBlock; which is
notationally a pain because IsInTransactionChain requires an isTopLevel
flag, which would have to be passed down through several levels of callers.
I chose to pass in_outer_xact (ie, the result of IsInTransactionChain)
rather than isTopLevel per se, as that seemed marginally more apropos
for the intermediate functions to know about.
Nobody seemed concerned about this naming when it originally went in,
but there's a pending patch that implements the opposite of
dsm_keep_mapping, and the term "unkeep" was judged unpalatable.
"unpin" has existing precedent in the PostgreSQL code base, and the
English language, so use this terminology instead.
Per discussion, back-patch to 9.4.
Instead of initializing a new TransInvalidationInfo for every
transaction or subtransaction, we can just do it for those
transactions or subtransactions that actually need to queue
invalidation messages. That also avoids needing to free those
entries at the end of a transaction or subtransaction that does
not generate any invalidation messages, which is by far the
common case.
Patch by me. Review by Simon Riggs and Andres Freund.
Previously, this was not exposed outside of miscinit.c. It is needed
for the pending pg_background patch, and will also be needed for
parallelism. Without it, there's no way for a background worker to
re-create the exact authentication environment that was present in the
process that started it, which could lead to security exposures.
Commit 7dbb606938 added a new CHECKPOINT_FLUSH_ALL flag. As that
commit needed to be backpatched I didn't change the numeric values of
the existing flags as that could lead to nastly problems if any
external code issued checkpoints. That's not a concern on master, so
renumber them there.
Also add a comment about CHECKPOINT_FLUSH_ALL above
CreateCheckPoint().