After much expenditure of effort, we've got this to the point where the
performance penalty is pretty minimal in typical cases.
Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane
This patch eliminates various bizarre behaviors caused by sloppy thinking
about the difference between a domain type and its underlying array type.
In particular, the operation of updating one element of such an array
has to be considered as yielding a value of the underlying array type,
*not* a value of the domain, because there's no assurance that the
domain's CHECK constraints are still satisfied. If we're intending to
store the result back into a domain column, we have to re-cast to the
domain type so that constraints are re-checked.
For similar reasons, such a domain can't be blindly matched to an ANYARRAY
polymorphic parameter, because the polymorphic function is likely to apply
array-ish operations that could invalidate the domain constraints. For the
moment, we just forbid such matching. We might later wish to insert an
automatic downcast to the underlying array type, but such a change should
also change matching of domains to ANYELEMENT for consistency.
To ensure that all such logic is rechecked, this patch removes the original
hack of setting a domain's pg_type.typelem field to match its base type;
the typelem will always be zero instead. In those places where it's really
okay to look through the domain type with no other logic changes, use the
newly added get_base_element_type function in place of get_element_type.
catversion bumped due to change in pg_type contents.
Per bug #5717 from Richard Huxton and subsequent discussion.
This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
is fired instead of performing a physical insert/update/delete. The
trigger function is passed the entire old and/or new rows of the view,
and must figure out what to do to the underlying tables to implement
the update. So this feature can be used to implement updatable views
using trigger programming style rather than rule hacking.
In passing, this patch corrects the names of some columns in the
information_schema.triggers view. It seems the SQL committee renamed
them somewhere between SQL:99 and SQL:2003.
Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
Since the code underlying pg_get_expr() is not secure against malformed
input, and can't practically be made so, we need to prevent miscreants
from feeding arbitrary data to it. We can do this securely by declaring
pg_get_expr() to take a new datatype "pg_node_tree" and declaring the
system catalog columns that hold nodeToString output to be of that type.
There is no way at SQL level to create a non-null value of type pg_node_tree.
Since the backend-internal operations that fill those catalog columns
operate below the SQL level, they are oblivious to the datatype relabeling
and don't need any changes.
SI invalidation events, rather than indirectly through the relcache.
In the previous coding, we had to flush a composite-type typcache entry
whenever we discarded the corresponding relcache entry. This caused problems
at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
from Jeff Davis, and might result in real-world problems given the kind of
unexpected relcache flush that that test mechanism is intended to model.
The new coding decouples relcache and typcache management, which is a good
thing anyway from a structural perspective. The cost is that we have to
search the typcache linearly to find entries that need to be flushed. There
are a couple of ways we could avoid that, but at the moment it's not clear
it's worth any extra trouble, because the typcache contains very few entries
in typical operation.
Back-patch to 8.2, the same as some other recent fixes in this general area.
The patch could be carried back to 8.0 with some additional work, but given
that it's only hypothetical whether we're fixing any problem observable in
the field, it doesn't seem worth the work now.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.
Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1. It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.
Review by Jaime Casanova and Tom Lane.
functions to the core XML code. Per discussion, the former depends on
XMLOPTION while the others do not. These supersede a version previously
offered by contrib/xml2.
Mike Fowler, reviewed by Pavel Stehule
better handling of NULL elements within the arrays. The third parameter
is a string that should be used to represent a NULL element, or should
be translated into a NULL element, respectively. If the third parameter
is NULL it behaves the same as the two-parameter form.
There are two incompatible changes in the behavior of the two-parameter form
of string_to_array. First, it will return an empty (zero-element) array
rather than NULL when the input string is of zero length. Second, if the
field separator is NULL, the function splits the string into individual
characters, rather than returning NULL as before. These two changes make
this form fully compatible with the behavior of the new three-parameter form.
Pavel Stehule, reviewed by Brendan Jurd
functionality, while creating an ambiguity in usage with ORDER BY that at
least two people have already gotten seriously confused by. Also, add an
opr_sanity test to check that we don't in future violate the newly minted
policy of not having built-in aggregates with the same name and different
numbers of parameters. Per discussion of a complaint from Thom Brown.
unqualified names.
- Add a missing_ok parameter to get_tablespace_oid.
- Avoid duplicating get_tablespace_od guts in objectNamesToOids.
- Add a missing_ok parameter to get_database_oid.
- Replace get_roleid and get_role_checked with get_role_oid.
- Add get_namespace_oid, get_language_oid, get_am_oid.
- Refactor existing code to use new interfaces.
Thanks to KaiGai Kohei for the review.
struct representing a tree entry, rather than being a separately allocated
piece of storage. This API is at least as clean as the old one (if not
more so --- there were some bizarre choices in there) and it permits a
very substantial memory savings, on the order of 2X in ginbulk.c's usage.
Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert
itself. These leaks resulted in the GIN index build context continuing
to bloat even after we'd filled it to maintenance_work_mem and started
to dump data out to the index.
In combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4. Speed
seems on par with 8.4 too, maybe even a bit faster, for a non-pathological
case in which HEAD was formerly slower.
Back-patch to 9.0 so we don't have a performance regression from 8.4.
I've added a quote_all_identifiers GUC which affects the behavior
of the backend, and a --quote-all-identifiers argument to pg_dump
and pg_dumpall which sets the GUC and also affects the quoting done
internally by those applications.
Design by Tom Lane; review by Alex Hunsaker; in response to bug #5488
filed by Hartmut Goebel.
Normally, we automatically restart after a backend crash, but in some
cases when PostgreSQL is invoked by clusterware it may be desirable to
suppress this behavior, so we provide an option which does this.
Since no existing GUC group quite fits, create a new group called
"error handling options" for this and the previously undocumented GUC
exit_on_error, which is now documented.
Review by Fujii Masao.
rather than just $N. This brings the display of nestloop-inner-indexscan
plans back to where it's been, and incidentally improves the display of
SubPlan parameters as well. In passing, simplify the EXPLAIN code by
having it deal primarily in the PlanState tree rather than separately
searching Plan and PlanState trees. This is noticeably cleaner for
subplans, and about a wash elsewhere.
One small difference from previous behavior is that EXPLAIN will no longer
qualify local variable references in inner-indexscan plan nodes, since it
no longer sees such nodes as possibly referencing multiple tables. Vars
referenced through PARAM_EXEC Params are still forcibly qualified, though,
so I don't think the display is any more confusing than before. Adjust a
couple of examples in the documentation to match this behavior.
being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane
pointed out. The bug affects FOR statement variants too, because you can
close an implicitly created cursor too by guessing the "<unnamed portal X>"
name created for it.
To fix that, "pin" the portal to prevent it from being dropped while it's
being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is
the oldest supported version.
description for vacuum_defer_cleanup_age to the correct category.
Sections in postgresql.conf are also sorted in the same order with docs.
Per gripe by Fujii Masao, suggestion by Heikki Linnakangas, and patch by me.
by a superuser -- "ALTER USER f RESET setting" already disallows removing such a
setting.
Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database
owner that's not superuser.
too, instead of duplicating the functionality (badly).
I renamed xml_init to pg_xml_init, because the former seemed just a bit too
generic to be safe as a global symbol. I considered likewise renaming
xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably
made it sufficiently PG-centric already.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
defined. Its reference to CurrentMemoryContext causes link failures on some
platforms, evidently because the inline function gets compiled despite lack of
use. Per buildfarm member warthog.
compilers, by applying a configure check to see if the compiler will accept
an unreferenced "static inline foo ..." function without warnings. It is
believed that such warnings are the only reason not to declare inlined
functions in headers, if the compiler understands "inline" at all.
Kurt Harriman
several places, but for now only GIN uses it during index creation.
Using self-balanced tree greatly speeds up index creation in corner cases
with preordered data.
Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
relation entries, so that they will get reset to InvalidBlockNumber whenever
an smgr-level flush happens. Because we now send smgr invalidation messages
immediately (not at end of transaction) when a relation truncation occurs,
this ensures that other backends will reset their values before they next
access the relation. We no longer need the unreliable assumption that a
VACUUM that's doing a truncation will hold its AccessExclusive lock until
commit --- in fact, we can intentionally release that lock as soon as we've
completed the truncation. This patch therefore reverts (most of) Alvaro's
patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can
also get rid of assorted no-longer-needed relcache flushes, which are far more
expensive than an smgr flush because they kill a lot more state.
In passing this patch fixes smgr_redo's failure to perform visibility-map
truncation, and cleans up some rather dubious assumptions in freespace.c and
visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
date.
needed by nothing else.
The restructuring I just finished doing on cache management exposed to me how
silly this routine was. Its function was to go into the catcache and blow
away all entries related to a given relation when there was a relcache flush
on that relation. However, there is no point in removing a catcache entry
if the catalog row it represents is still valid --- and if it isn't valid,
there must have been a catcache entry flush on it, because that's triggered
directly by heap_update or heap_delete on the catalog row. So this routine
accomplished nothing except to blow away valid cache entries that we'd very
likely be wanting in the near future to help reconstruct the relcache entry.
Dumb.
On top of which, it required a subtle and easy-to-get-wrong attribute in
syscache definitions, ie, the column containing the OID of the related
relation if any. Removing that is a very useful maintenance simplification.
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.
Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).
We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long
as we want to support in-place update from pre-9.0 databases.
of shared or nailed system catalogs. This has two key benefits:
* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
* We no longer have to use an unsafe reindex-in-place approach for reindexing
shared catalogs.
CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.
Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.
This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch. As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
of old and new toast tables can be done either at the logical level (by
swapping the heaps' reltoastrelid links) or at the physical level (by swapping
the relfilenodes of the toast tables and their indexes). This is necessary
infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared
system catalogs, where we cannot change reltoastrelid. The physical swap
saves a few catalog updates too.
We unfortunately have to keep the logical-level swap logic because in some
cases we will be adding or deleting a toast table, so there's no possibility
of a physical swap. However, that only happens as a consequence of schema
changes in the table, which we do not need to support for system catalogs,
so such cases aren't an obstacle for that.
In passing, refactor the cluster support functions a little bit to eliminate
unnecessarily-duplicated code; and fix the problem that while CLUSTER had
been taught to rename the final toast table at need, ALTER TABLE had not.
the relfilenode of currently-not-relocatable system catalogs.
1. Get rid of inval.c's dependency on relfilenode, by not having it emit
smgr invalidations as a result of relcache flushes. Instead, smgr sinval
messages are sent directly from smgr.c when an actual relation delete or
truncate is done. This makes considerably more structural sense and allows
elimination of a large number of useless smgr inval messages that were
formerly sent even in cases where nothing was changing at the
physical-relation level. Note that this reintroduces the concept of
nontransactional inval messages, but that's okay --- because the messages
are sent by smgr.c, they will be sent in Hot Standby slaves, just from a
lower logical level than before.
2. Move setNewRelfilenode out of catalog/index.c, where it never logically
belonged, into relcache.c; which is a somewhat debatable choice as well but
better than before. (I considered catalog/storage.c, but that seemed too
low level.) Rename to RelationSetNewRelfilenode.
3. Cosmetic cleanups of some other relfilenode manipulations.
the input values into a string. The two argument version also does the same
thing, but inserts delimiters between elements.
Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.
default of "plpgsql". This is more reasonable than it was when the DO patch
was written, because we have since decided that plpgsql should be installed
by default. Per discussion, having a parameter for this doesn't seem useful
enough to justify the risk of application breakage if the value is changed
unexpectedly.
and implement OVERLAY() for bit strings and bytea.
In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.
Leonardo F, reviewed by Kevin Grittner