There's no longer much pressure to switch the default GIN opclass for
jsonb, but there was still some unhappiness with the name "jsonb_hash_ops",
since hashing is no longer a distinguishing property of that opclass,
and anyway it seems like a relatively minor detail. At the suggestion of
Heikki Linnakangas, we'll use "jsonb_path_ops" instead; that captures the
important characteristic that each index entry depends on the entire path
from the document root to the indexed value.
Also add a user-facing explanation of the implementation properties of
these two opclasses.
Per discussion, this seems like a more consistent choice of name.
Fabrízio de Royes Mello, after a suggestion by Peter Eisentraut;
some additional documentation wordsmithing by me
Document existence operator adequately; fix obsolete claim that no
Unicode-escape semantic checks happen on input (it's still true for
json, but not for jsonb); improve examples; assorted wordsmithing.
I started out with the intention of just fixing the info about the jsonb
operator classes, but soon found myself copy-editing most of the JSON
material. Hopefully it's more readable now.
Back in 8.3, we installed permissions checks in these functions (see
commits 8bc225e799 and cc26599b72). But we forgot to document that
anywhere in the user-facing docs; it did get mentioned in the 8.3 release
notes, but nobody's looking at that any more. Per gripe from Suya Huang.
Per discussion, the old value of 128MB is ridiculously small on modern
machines; in fact, it's not even any larger than the default value of
shared_buffers, which it certainly should be. Increase to 4GB, which
is unlikely to be any worse than the old default for anyone, and should
be noticeably better for most. Eventually we might have an autotuning
scheme for this setting, but the recent attempt crashed and burned,
so for now just do this.
This reverts commit ee1e5662d8, as well as
a remarkably large number of followup commits, which were mostly concerned
with the fact that the implementation didn't work terribly well. It still
doesn't: we probably need some rather basic work in the GUC infrastructure
if we want to fully support GUCs whose default varies depending on the
value of another GUC. Meanwhile, it also emerged that there wasn't really
consensus in favor of the definition the patch tried to implement (ie,
effective_cache_size should default to 4 times shared_buffers). So whack
it all back to where it was. In a followup commit, I'll do what was
recently agreed to, which is to simply change the default to a higher
value.
The main problem is that DocBook SGML allows indexterm elements just
about everywhere, but DocBook XML is stricter. For example, this common
pattern
<varlistentry>
<indexterm>...</indexterm>
<term>...</term>
...
</varlistentry>
needs to be changed to something like
<varlistentry>
<term>...<indexterm>...</indexterm></term>
...
</varlistentry>
See also bb4eefe7bf.
There is currently nothing in the build system that enforces that things
stay valid, because that requires additional tools and will receive
separate consideration.
Commit a730183926 created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that. To clean it up:
* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h. We won't consider this symbol part of the FE/BE API.
* Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.
* So, relfilenode.h now includes relpath.h rather than vice-versa. This
direction of dependency is fine. (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)
* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.
* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().
* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers. (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)
* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().
* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames. This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.
The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.
Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject. The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments. However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature. So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations. We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.
The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column. I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.
While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency. I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
Mention impossibility of moving tablespaces, backing them up
independently, or the inadvisability of placing them on temporary
file systems.
Patch by Craig Ringer, adjustments by Ian Lawrence Warwick and me
Previously, these functions treated "" optin values as defaults in some
ways, but not in others, like when comparing to .pgpass. Also, add
documentation to clarify that now "" and NULL use defaults, like
PQsetdbLogin() has always done.
BACKWARD INCOMPATIBILITY
Patch by Adrian Vondendriesch, docs by me
Report by Jeff Janes
Document problems when disconnection causes loss of hot_standby_feedback
and suggest adjusting max_standby_archive_delay and
max_standby_streaming_delay.
Initial patch by Marko Tiikkaja, adjustments by me
Now that EXPLAIN also outputs a "planning time" measurement, the use of
"total" here seems rather confusing: it sounds like it might include the
planning time which of course it doesn't. Majority opinion was that
"execution time" is a better label, so we'll call it that.
This should be noted as a backwards incompatibility for tools that examine
EXPLAIN ANALYZE output.
In passing, I failed to resist the temptation to do a little editing on the
materialized-view example affected by this change.
Document abrupt streaming client disconnection might leave slots in use,
so max_wal_senders should be slightly higher than needed to allow for
immediate reconnection.
Per mention by Magnus
These are natural complements to the functions added by commit
0886fc6a5c, but they weren't included
in the original patch for some reason. Add them.
Patch by me, per a complaint by Tom Lane. Review by Tatsuo
Ishii.
In psql \d+, display oids only when they exist, and display replication
identity only when it is non-default. Also document the defaults for
replication identity for system and non-system tables. Update
regression output.
Add vacuumdb option --analyze-in-stages which runs ANALYZE three times
with different configuration settings, adopting the logic from the
analyze_new_cluster.sh script that pg_upgrade generates. That way,
users of pg_dump/pg_restore can also use that functionality.
Change pg_upgrade to create the script so that it calls vacuumdb instead
of implementing the logic itself.
Apparently, the old text was written at a time when the only use of
constraint_name here was for a constraint to be dropped, but that's
no longer true.
Etsuro Fujita
Views which are marked as security_barrier must have their quals
applied before any user-defined quals are called, to prevent
user-defined functions from being able to see rows which the
security barrier view is intended to prevent them from seeing.
Remove the restriction on security barrier views being automatically
updatable by adding a new securityQuals list to the RTE structure
which keeps track of the quals from security barrier views at each
level, independently of the user-supplied quals. When RTEs are
later discovered which have securityQuals populated, they are turned
into subquery RTEs which are marked as security_barrier to prevent
any user-supplied quals being pushed down (modulo LEAKPROOF quals).
Dean Rasheed, reviewed by Craig Ringer, Simon Riggs, KaiGai Kohei
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved. This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state. As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.
This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates. Follow-on commits will update some
of the built-in aggregates to use this feature.
David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
This operator class can accelerate subnet/supernet tests as well as
btree-equivalent ordered comparisons. It also handles a new network
operator inet && inet (overlaps, a/k/a "is supernet or subnet of"),
which is expected to be useful in exclusion constraints.
Ideally this opclass would be the default for GiST with inet/cidr data,
but we can't mark it that way until we figure out how to do a more or
less graceful transition from the current situation, in which the
really-completely-bogus inet/cidr opclasses in contrib/btree_gist are
marked as default. Having the opclass in core and not default is better
than not having it at all, though.
While at it, add new documentation sections to allow us to officially
document GiST/GIN/SP-GiST opclasses, something there was never a clear
place to do before. I filled these in with some simple tables listing
the existing opclasses and the operators they support, but there's
certainly scope to put more information there.
Emre Hasegeli, reviewed by Andreas Karlsson, further hacking by me
These functions won't throw an error if the object doesn't exist,
or if (for functions and operators) there's more than one matching
object.
Yugo Nagata and Nozomi Anzai, reviewed by Amit Khandekar, Marti
Raudsepp, Amit Kapila, and me.
Infrastructure to allow
plpgsql.extra_warnings
plpgsql.extra_errors
Initial extra checks only for shadowed_variables
Marko Tiikkaja and Petr Jelinek
Reviewed by Simon Riggs and Pavel Stěhule
VALIDATE CONSTRAINT
CLUSTER ON
SET WITHOUT CLUSTER
ALTER COLUMN SET STATISTICS
ALTER COLUMN SET ()
ALTER COLUMN RESET ()
All other sub-commands use AccessExclusiveLock
Simon Riggs and Noah Misch
Reviews by Robert Haas and Andres Freund
For variadic functions (other than VARIADIC ANY), the syntaxes foo(x,y,...)
and foo(VARIADIC ARRAY[x,y,...]) should be considered equivalent, since the
former is converted to the latter at parse time. They have indeed been
equivalent, in all releases before 9.3. However, commit 75b39e790 made an
ill-considered decision to record which syntax had been used in FuncExpr
nodes, and then to make equal() test that in checking node equality ---
which caused the syntaxes to not be seen as equivalent by the planner.
This is the underlying cause of bug #9817 from Dmitry Ryabov.
It might seem that a quick fix would be to make equal() disregard
FuncExpr.funcvariadic, but the same commit made that untenable, because
the field actually *is* semantically significant for some VARIADIC ANY
functions. This patch instead adopts the approach of redefining
funcvariadic (and aggvariadic, in HEAD) as meaning that the last argument
is a variadic array, whether it got that way by parser intervention or was
supplied explicitly by the user. Therefore the value will always be true
for non-ANY variadic functions, restoring the principle of equivalence.
(However, the planner will continue to consider use of VARIADIC as a
meaningful difference for VARIADIC ANY functions, even though some such
functions might disregard it.)
In HEAD, this change lets us simplify the decompilation logic in
ruleutils.c, since the funcvariadic/aggvariadic flag tells directly whether
to print VARIADIC. However, in 9.3 we have to continue to cope with
existing stored rules/views that might contain the previous definition.
Fortunately, this just means no change in ruleutils.c, since its existing
behavior effectively ignores funcvariadic for all cases other than VARIADIC
ANY functions.
In HEAD, bump catversion to reflect the fact that FuncExpr.funcvariadic
changed meanings; this is sort of pro forma, since I don't believe any
built-in views are affected.
Unfortunately, this patch doesn't magically fix everything for affected
9.3 users. After installing 9.3.5, they might need to recreate their
rules/views/indexes containing variadic function calls in order to get
everything consistent with the new definition. As in the cited bug,
the symptom of a problem would be failure to use a nominally matching
index that has a variadic function call in its definition. We'll need
to mention this in the 9.3.5 release notes.
The advice to join to pg_prepared_xacts via the transaction column was not
updated when the transaction column was replaced by virtualtransaction.
Since it's not quite obvious how to do that join, give an explicit example.
For consistency also give an example for the adjacent case of joining to
pg_stat_activity. And link-ify the view references too, just because we
can. Per bug #9840 from Alexey Bashtanov.
Michael Paquier and Tom Lane
The system realizes that DEFAULT NULL is dummy in simple cases, but not if
a cast function (such as a length coercion) needs to be applied. It's
dubious that suppressing that function call would be appropriate, anyway.
For the moment, let's just adjust the docs to say that you should omit the
DEFAULT clause if you don't want a rewrite to happen. Per gripe from Amit
Langote.
Any OS user able to access the socket can connect as the bootstrap
superuser and in turn execute arbitrary code as the OS user running the
test. Protect against that by placing the socket in the temporary data
directory, which has mode 0700 thanks to initdb. Back-patch to 8.4 (all
supported versions). The hazard remains wherever the temporary cluster
accepts TCP connections, notably on Windows.
Attempts to run "make check" from a directory with a long name will now
fail. An alternative not sharing that problem was to place the socket
in a subdirectory of /tmp, but that is only secure if /tmp is sticky.
The PG_REGRESS_SOCK_DIR environment variable is available as a
workaround when testing from long directory paths.
As a convenient side effect, this lets testing proceed smoothly in
builds that override DEFAULT_PGSOCKET_DIR. Popular non-default values
like /var/run/postgresql are often unwritable to the build user.
Security: CVE-2014-0067
This has been true for some time, but we were leaving users to discover it
the hard way.
Back-patch to 9.2. It might've been true before that, but we were claiming
Python 2.2 compatibility before that, so I won't guess at the exact
requirements back then.
Set function parameter names and defaults. Add jsonb versions (which the
code already provided for so the actual new code is trivial). Add jsonb
regression tests and docs.
Bump catalog version (which I apparently forgot to do when jsonb was
committed).
Assert errors were thrown for functions being passed invalid encodings,
while the main code handled it just fine.
Also document that libpq's PQclientEncoding() returns -1 for an encoding
lookup failure.
Per report from Peter Geoghegan
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the orgiginal text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.
The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.
This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.
Authors: Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
This covers all the SQL-standard trigger types supported for regular
tables; it does not cover constraint triggers. The approach for
acquiring the old row mirrors that for view INSTEAD OF triggers. For
AFTER ROW triggers, we spool the foreign tuples to a tuplestore.
This changes the FDW API contract; when deciding which columns to
populate in the slot returned from data modification callbacks, writable
FDWs will need to check for AFTER ROW triggers in addition to checking
for a RETURNING clause.
In support of the feature addition, refactor the TriggerFlags bits and
the assembly of old tuples in ModifyTable.
Ronan Dunklau, reviewed by KaiGai Kohei; some additional hacking by me.
krb_srvname is actually not available anymore as a parameter server-side, since
with gssapi we accept all principals in our keytab. It's still used in libpq for
client side specification.
In passing remove declaration of krb_server_hostname, where all the functionality
was already removed.
Noted by Stephen Frost, though a different solution than his suggestion
Previously, psql would print the "COPY nnn" command status only for COPY
commands executed server-side. Now it will print that for frontend copies
too (including \copy). However, we continue to suppress the command status
for COPY TO STDOUT, since in that case the copy data has been routed to the
same place that the command status would go, and there is a risk of the
status line being mistaken for another line of COPY data. Doing that would
break existing scripts, and it doesn't seem worth the benefit --- this case
seems fairly analogous to SELECT, for which we also suppress the command
status.
Kumar Rajeev Rastogi, with substantial review by Amit Khandekar
With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(n^2), it becomes unfeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.
This commit includes a triConsistent function for the built-in array and
tsvector opclasses.
Alexander Korotkov, with some changes by me.
In order for this to work, walsenders need the optional ability to
connect to a database, so the "replication" keyword now allows true
or false, for backward-compatibility, and the new value "database"
(which causes the "dbname" parameter to be respected).
walsender needs to loop not only when idle but also when sending
decoded data to the user and when waiting for more xlog data to decode.
This means that there are now three separate loops inside walsender.c;
although some refactoring has been done here, this is still a bit ugly.
Andres Freund, with contributions from Álvaro Herrera, and further
review by me.
Return '4' and report a meaningful error message when a non-existent or
invalid data directory is passed. Previously, pg_ctl would just report
the server was not running.
Patch by me and Amit Kapila
Report from Peter Eisentraut
This forces an input field containing the quoted null string to be
returned as a NULL. Without this option, only unquoted null strings
behave this way. This helps where some CSV producers insist on quoting
every field, whether or not it is needed. The option takes a list of
fields, and only applies to those columns. There is an equivalent
column-level option added to file_fdw.
Ian Barwick, with some tweaking by Andrew Dunstan, reviewed by Payal
Singh.
Author: Pavel Stěhule, editorialized somewhat by Álvaro Herrera
Reviewed-by: Tomáš Vondra, Marko Tiikkaja
With input from Fabrízio de Royes Mello, Jim Nasby
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables. The output format is controlled by a
so-called "output plugin"; an example is included. To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.
Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.
Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
This option makes pg_dump, pg_dumpall and pg_restore inject an IF EXISTS
clause to each DROP command they emit. (In pg_dumpall, the clause is
not added to individual objects drops, but rather to the CREATE DATABASE
commands, as well as CREATE ROLE and CREATE TABLESPACE.)
This allows for a better user dump experience when using --clean in case
some objects do not already exist. Per bug #7873 by Dave Rolsky.
Author: Pavel Stěhule
Reviewed-by: Jeevan Chalke, Álvaro Herrera, Josh Kupershmidt
A new MAX_RATE option allows imposing a limit to the network transfer
rate from the server side. This is useful to limit the stress that
taking a base backup has on the server.
pg_basebackup is now able to specify a value to the server, too.
Author: Antonin Houska
Patch reviewed by Stefan Radomski, Andres Freund, Zoltán Böszörményi,
Fujii Masao, and Álvaro Herrera.
- Write HIGH:MEDIUM instead of DEFAULT:!LOW:!EXP for clarity.
- Order 3DES last to work around inappropriate OpenSSL default.
- Remove !MD5 and @STRENGTH, because they are irrelevant.
- Add clarifying documentation.
Effectively, the new default is almost the same as the old one, but it
is arguably easier to understand and modify.
Author: Marko Kreen <markokr@gmail.com>
Space trimming rather than space-padding causes unusual behavior, which
might not be standards-compliant.
Also remove recently-added now-redundant C comment.
DocBook XML is superficially compatible with DocBook SGML but has a
slightly stricter DTD that we have been violating in a few cases.
Although XSLT doesn't care whether the document is valid, the style
sheets don't necessarily process invalid documents correctly, so we need
to work toward fixing this.
This first commit moves the indexterms in refentry elements to an
allowed position. It has no impact on the output.
Tablespaces can be relocated in plain backup mode by specifying one or
more -T olddir=newdir options.
Author: Steeve Lennmark <steevel@handeldsbanken.se>
Reviewed-by: Peter Eisentraut <peter_e@gmx.net>
The customization overrode the fast-forward code with its custom Up
link. So this is no longer really the fast-forward feature, so we might
as well turn that off and override the non-ff template instead, thus
removing one mental indirection.
Fix the wrong column span declaration.
Clarify and update the documentation.
The functions in slotfuncs.c don't exist in any released version,
but the changes to xlogfuncs.c represent backward-incompatibilities.
Per discussion, we're hoping that the queries using these functions
are few enough and simple enough that this won't cause too much
breakage for users.
Michael Paquier, reviewed by Andres Freund and further modified
by me.
Since the temporary server started by "make check" uses "trust"
authentication, another user on the same machine could connect to it
as database superuser, and then potentially exploit the privileges of
the operating-system user who started the tests. We should change
the testing procedures to prevent this risk; but discussion is required
about the best way to do that, as well as more testing than is practical
for an undisclosed security problem. Besides, the same issue probably
affects some user-written test harnesses. So for the moment, we'll just
warn people against using "make check" when there are untrusted users on
the same machine.
In passing, remove some ancient advice that suggested making the
regression testing subtree world-writable if you'd built as root.
That looks dangerously insecure in modern contexts, and anyway we
should not be encouraging people to build Postgres as root.
Security: CVE-2014-0067
The primary role of PL validators is to be called implicitly during
CREATE FUNCTION, but they are also normal functions that a user can call
explicitly. Add a permissions check to each validator to ensure that a
user cannot use explicit validator calls to achieve things he could not
otherwise achieve. Back-patch to 8.4 (all supported versions).
Non-core procedural language extensions ought to make the same two-line
change to their own validators.
Andres Freund, reviewed by Tom Lane and Noah Misch.
Security: CVE-2014-0061
Granting a role without ADMIN OPTION is supposed to prevent the grantee
from adding or removing members from the granted role. Issuing SET ROLE
before the GRANT bypassed that, because the role itself had an implicit
right to add or remove members. Plug that hole by recognizing that
implicit right only when the session user matches the current role.
Additionally, do not recognize it during a security-restricted operation
or during execution of a SECURITY DEFINER function. The restriction on
SECURITY DEFINER is not security-critical. However, it seems best for a
user testing his own SECURITY DEFINER function to see the same behavior
others will see. Back-patch to 8.4 (all supported versions).
The SQL standards do not conflate roles and users as PostgreSQL does;
only SQL roles have members, and only SQL users initiate sessions. An
application using PostgreSQL users and roles as SQL users and roles will
never attempt to grant membership in the role that is the session user,
so the implicit right to add or remove members will never arise.
The security impact was mostly that a role member could revoke access
from others, contrary to the wishes of his own grantor. Unapproved role
member additions are less notable, because the member can still largely
achieve that by creating a view or a SECURITY DEFINER function.
Reviewed by Andres Freund and Tom Lane. Reported, independently, by
Jonas Sundman and Noah Misch.
Security: CVE-2014-0060
Make a bit more noise about the timeout-interrupt bug. Also, remove the
release note entry for commit 423e1211a; that patch fixed a problem
introduced post-9.3.2, so there's no need to document it in the release
notes.
This documentation never got the word about the existence of check-world or
installcheck-world. Revise to recommend use of those, and document all the
subsidiary test suites. Do some minor wordsmithing elsewhere, too.
In passing, remove markup related to generation of plain-text regression
test instructions, since we don't do that anymore.
Back-patch to 9.1 where check-world was added. (installcheck-world exists
in 9.0; but since check-world doesn't, this patch would need additional
work to cover that branch, and it doesn't seem worth the effort.)
The documentation suggested using "echo | psql", but not the often-superior
alternative of a here-document. Also, be more direct about suggesting
that people avoid -c for multiple commands. Per discussion.
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.
Therefore, we now have multixact-specific freezing parameters:
vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)
vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids). Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.
autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids). This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.
Backpatch to 9.3 where multixacts were made to persist enough to require
freezing. To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.
Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
We have a practice of providing a "bread crumb" trail between the minor
versions where the migration section actually tells you to do something.
Historically that was just plain text, eg, "see the release notes for
9.2.4"; but if you're using a browser or PDF reader, it's a lot nicer
if it's a live hyperlink. So use "<xref>" instead. Any argument against
doing this vanished with the recent decommissioning of plain-text release
notes.
Vik Fearing
Providing this information as plain text was doubtless worth the trouble
ten years ago, but it seems likely that hardly anyone reads it in this
format anymore. And the effort required to maintain these files (in the
form of extra-complex markup rules in the relevant parts of the SGML
documentation) is significant. So, let's stop doing that and rely solely
on the other documentation formats.
Per discussion, the plain-text INSTALL instructions might still be worth
their keep, so we continue to generate that file.
Rather than remove HISTORY and src/test/regress/README from distribution
tarballs entirely, replace them with simple stub files that tell the reader
where to find the relevant documentation. This is mainly to avoid possibly
breaking packaging recipes that expect these files to exist.
Back-patch to all supported branches, because simplifying the markup
requirements for release notes won't help much unless we do it in all
branches.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts. Slots have some advantages over existing
techniques, as explained in the documentation.
In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.
Andres Freund and Robert Haas
New checks include input, month/day/time internal adjustments, addition,
subtraction, multiplication, and negation. Also adjust docs to
correctly specify interval size in bytes.
Report from Rok Kralj
This doesn't work for prepared queries, but it's not too easy to get
the information in that case and there's some debate as to exactly
what the right thing to measure is, so just do this for now.
Andreas Karlsson, with slight doc changes by me.
This patch adds an option, huge_tlb_pages, which allows requesting the
shared memory segment to be allocated using huge pages, by using the
MAP_HUGETLB flag in mmap(). This can improve performance.
The default is 'try', which means that we will attempt using huge pages,
and fall back to non-huge pages if it doesn't work. Currently, only Linux
has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as
'off'.
In the passing, don't try to round the mmap() size to a multiple of
pagesize. mmap() doesn't require that, and there's no particular reason for
PostgreSQL to do that either. When using MAP_HUGETLB, however, round the
request size up to nearest 2MB boundary. This is to work around a bug in
some Linux kernel versions, but also to avoid wasting memory, because the
kernel will round the size up anyway.
Many people were involved in writing this patch, including Christian Kruse,
Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund
and me.
json_build_array() and json_build_object allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json_object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name value pairs.
Catalog version bumped.
Andrew Dunstan, reviewed by Marko Tiikkaja.
This change allows us to eliminate the previous limit on stored query
length, and it makes the shared-memory hash table very much smaller,
allowing more statements to be tracked. (The default value of
pg_stat_statements.max is therefore increased from 1000 to 5000.)
In typical scenarios, the hash table can be large enough to hold all the
statements commonly issued by an application, so that there is little
"churn" in the set of tracked statements, and thus little need to do I/O
to the file.
To further reduce the need for I/O to the query-texts file, add a way
to retrieve all the columns of the pg_stat_statements view except for
the query text column. This is probably not of much interest for human
use but it could be exploited by programs, which will prefer using the
queryid anyway.
Ordinarily, we'd need to bump the extension version number for the latter
change. But since we already advanced pg_stat_statements' version number
from 1.1 to 1.2 in the 9.4 development cycle, it seems all right to just
redefine what 1.2 means.
Peter Geoghegan, reviewed by Pavel Stehule
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment. There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.
This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.
Patch by me. Review by Andres Freund and KaiGai Kohei.
Fix integer overflow issue noted by Magnus Hagander, as well as a bunch
of other infelicities in commit ee1e5662d8
and its unreasonably large number of followups.
From the Department of Nitpicking, be consistent with other escaping
and use 'E' instead of 'e' to escape the string in the example docs
for GET DISAGNOSTICS stack = PG_CONTEXT.
Noticed by Department Chief Magnus Hagander.
Mention that CREATE TABLE LIKE INCLUDING DEFAULTS creates a link between
the original and new tables if a default function modifies the database,
like nextval().
This allows ending recovery as a consistent state has been reached. Without
this, there was no easy way to e.g restore an online backup, without
replaying any extra WAL after the backup ended.
MauMau and me.
Add the ability to specify the objects to move by who those objects are
owned by (as relowner) and change ALL to mean ALL objects. This
makes the command always operate against a well-defined set of objects
and not have the objects-to-be-moved based on the role of the user
running the command.
Per discussion with Simon and Tom.
There was a bug in the psql's meta command \conninfo. When the
IP address was specified in the hostaddr and psql used it to create
a connection (i.e., psql -d "hostaddr=xxx"), \conninfo could not
display that address. This is because \conninfo got the connection
information only from PQhost() which could not return hostaddr.
This patch adds PQhostaddr(), and changes \conninfo so that it
can display not only the host name that PQhost() returns but also
the IP address which PQhostaddr() returns.
The bug has existed since 9.1 where \conninfo was introduced.
But it's too late to add new libpq function into the released versions,
so no backpatch.
Unlike our other array functions, this considers the total number of
elements across all dimensions, and returns 0 rather than NULL when the
array has no elements. But it seems that both of those behaviors are
almost universally disliked, so hopefully that's OK.
Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule
krb5 has been deprecated since 8.3, and the recommended way to do
Kerberos authentication is using the GSSAPI authentication method
(which is still fully supported).
libpq retains the ability to identify krb5 authentication, but only
gives an error message about it being unsupported. Since all authentication
is initiated from the backend, there is no need to keep it at all
in the backend.
Tablespaces have a few options which can be set on them to give PG hints
as to how the tablespace behaves (perhaps it's faster for sequential
scans, or better able to handle random access, etc). These options were
only available through the ALTER TABLESPACE command.
This adds the ability to set these options at CREATE TABLESPACE time,
removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to
get the correct options set on the tablespace.
Vik Fearing, reviewed by Michael Paquier.
This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
objects from one tablespace to another. This can be extremely handy and avoids
a lot of error-prone scripting. ALTER TABLESPACE ... MOVE will only move
objects the user owns, will notify the user if no objects were found, and can
be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
MATERIALIZED VIEWS).
This function provides a way of generating version 4 (pseudorandom) UUIDs
based on pgcrypto's PRNG. The main reason for doing this is that the
OSSP UUID library depended on by contrib/uuid-ossp is becoming more and
more of a porting headache, so we need an alternative for people who can't
install that. A nice side benefit though is that this implementation is
noticeably faster than uuid-ossp's uuid_generate_v4() function.
Oskari Saarenmaa, reviewed by Emre Hasegeli
The + modifier of \do didn't use to do anything, but now it adds an oprcode
column. This is useful both as an additional form of documentation of what
the operator does, and to save a step when finding out properties of the
underlying function.
Marko Tiikkaja, reviewed by Rushabh Lathia, adjusted a bit by me
Primarily, explain where to find the system-wide psqlrc file, per recent
gripe from John Sutton. Do some general wordsmithing and improve the
markup, too.
Also adjust psqlrc.sample so its comments about file location are somewhat
trustworthy. (Not sure why we bother with this file when it's empty,
but whatever.)
Back-patch to 9.2 where the startup file naming scheme was last changed.
We haven't wanted to do this in the past on the grounds that in rare
cases the original xmin value will be needed for forensic purposes, but
commit 37484ad2aa removes that objection,
so now we can.
Per extensive discussion, among many people, on pgsql-hackers.
Commit 37484ad2aa invalidated a good
chunk of documentation, so patch it up to reflect the new state of
play. Along the way, patch remaining documentation references to
FrozenXID to say instead FrozenTransactionId, so that they match the
way we actually spell it in the code.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
This fixes a problem noted as a followup to bug #8648: if a query has a
semantically-empty target list, e.g. SELECT * FROM zero_column_table,
ruleutils.c will dump it as a syntactically-empty target list, which was
not allowed. There doesn't seem to be any reliable way to fix this by
hacking ruleutils (note in particular that the originally zero-column table
might since have had columns added to it); and even if we had such a fix,
it would do nothing for existing dump files that might contain bad syntax.
The best bet seems to be to relax the syntactic restriction.
Also, add parse-analysis errors for SELECT DISTINCT with no columns (after
*-expansion) and RETURNING with no columns. These cases previously
produced unexpected behavior because the parsed Query looked like it had
no DISTINCT or RETURNING clause, respectively. If anyone ever offers
a plausible use-case for this, we could work a bit harder on making the
situation distinguishable.
Arguably this is a bug fix that should be back-patched, but I'm worried
that there may be client apps or PLs that expect "SELECT ;" to throw a
syntax error. The issue doesn't seem important enough to risk changing
behavior in minor releases.
WAL records of hint bit updates is useful to tools that want to examine
which pages have been modified. In particular, this is required to make
the pg_rewind tool safe (without checksums).
This can also be used to test how much extra WAL-logging would occur if
you enabled checksums, without actually enabling them (which you can't
currently do without re-initdb'ing).
Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
further changes by me.
This can be used to mark custom built binaries with an extra version
string such as a git describe identifier or distribution package release
version.
From: Oskari Saarenmaa <os@ohmu.fi>
Set min_recovery_apply_delay to force a delay in recovery apply for commit and
restore point WAL records. Other records are replayed immediately. Delay is
measured between WAL record time and local standby time.
Robert Haas, Fabrízio de Royes Mello and Simon Riggs
Detailed review by Mitsumasa Kondo
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983. This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all. Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.
Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.
Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and
other collection types. This feature, however, deals with set-returning
functions. Use a different syntax for this feature to keep open the
possibility of implementing the standard TABLE().
This sets up ECDH key exchange, when compiling against OpenSSL that
supports EC. Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be
used for SSL connections. The latter one means that EC keys are now
usable.
The reason for EC key exchange is that it's faster than DHE and it
allows to go to higher security levels where RSA will be horribly slow.
There is also new GUC option ssl_ecdh_curve that specifies the curve
name used for ECDH. It defaults to "prime256v1", which is the most
common curve in use in HTTPS.
From: Marko Kreen <markokr@gmail.com>
Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
The query ID is the internal hash identifier of the statement,
and was not available in pg_stat_statements view so far.
Daniel Farina, Sameer Thakur and Peter Geoghegan, reviewed by me.
By default, OpenSSL (and SSL/TLS in general) lets the client cipher
order take priority. This is OK for browsers where the ciphers were
tuned, but few PostgreSQL client libraries make the cipher order
configurable. So it makes sense to have the cipher order in
postgresql.conf take priority over client defaults.
This patch adds the setting "ssl_prefer_server_ciphers" that can be
turned on so that server cipher order is preferred. Per discussion,
this now defaults to on.
From: Marko Kreen <markokr@gmail.com>
Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
Previously missing or invalid service files returned NULL. Also fix
pg_upgrade to report "out of memory" for a null return from
PQconndefaults().
Patch by Steve Singer, rewritten by me
I'm putting these up for review before I start to extract the relevant
subsets for the older branches. It'll be easier to make any suggested
wording improvements at this stage.
This is mostly to fix incorrect migration instructions: since the preceding
minor releases advised reindexing some GIST indexes, it's important that
we back-link to that advice rather than earlier instances.
Also improve some bug descriptions and fix a few typos.
No back-patch yet; these files will get copied into the back branches
later in the release process.
Reviewed-by: Ali Dar <ali.munir.dar@gmail.com>
Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com>
Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>
Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a
transaction block from error (post-9.3) to warning. (Was nothing in <=
9.3.) Also change ABORT outside of a transaction block from notice to
warning.
ECPG is not supposed to allow and output nested comments in C. These comments
are only allowed in the SQL parts and must not be written into the C file.
Also the different handling of different comments is documented.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data. This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good. This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.
It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function. But this is already a step forward.
This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates. I (tgl) thought it was worth reviewing and committing
this infrastructure separately. In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.
Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
Pending patches for logical replication will use this to determine
which columns of a tuple ought to be considered as its candidate key.
Andres Freund, with minor, mostly cosmetic adjustments by me
These things didn't work because the planner omitted to do the necessary
preprocessing of a WindowFunc's argument list. Add the few dozen lines
of code needed to handle that.
Although this sounds like a feature addition, it's really a bug fix because
the default-argument case was likely to crash previously, due to lack of
checking of the number of supplied arguments in the built-in window
functions. It's not a security issue because there's no way for a
non-superuser to create a window function definition with defaults that
refers to a built-in C function, but nonetheless people might be annoyed
that it crashes rather than producing a useful error message. So
back-patch as far as the patch applies easily, which turns out to be 9.2.
I'll put a band-aid in earlier versions as a separate patch.
(Note that these features still don't work for aggregates, and fixing that
case will be harder since we represent aggregate arg lists as target lists
not bare expression lists. There's no crash risk though because CREATE
AGGREGATE doesn't accept defaults, and we reject named-argument notation
when parsing an aggregate call.)
For rather inscrutable reasons, SQL:2008 disallows copying-and-modifying a
window definition that has any explicit framing clause. The error message
we gave for this only made sense if the referencing window definition
itself contains an explicit framing clause, which it might well not.
Moreover, in the context of an OVER clause it's not exactly obvious that
"OVER (windowname)" implies copy-and-modify while "OVER windowname" does
not. This has led to multiple complaints, eg bug #5199 from Iliya
Krapchatov. Change to a hopefully more intelligible error message, and
in the case where we have just "OVER (windowname)", add a HINT suggesting
that omitting the parentheses will fix it. Also improve the related
documentation. Back-patch to all supported branches.
These variables no longer have any useful purpose, since there's no reason
to special-case brute force timezones now that we have a valid
session_timezone setting for them. Remove the variables, and remove the
SET/SHOW TIME ZONE code that deals with them.
The user-visible impact of this is that SHOW TIME ZONE will now show a
POSIX-style zone specification, in the form "<+-offset>-+offset", rather
than an interval value when a brute-force zone has been set. While perhaps
less intuitive, this is a better definition than before because it's
actually possible to give that string back to SET TIME ZONE and get the
same behavior, unlike what used to happen.
We did not previously mention the angle-bracket syntax when describing
POSIX timezone specifications; add some documentation so that people
can figure out what these strings do. (There's still quite a lot of
undocumented functionality there, but anybody who really cares can
go read the POSIX spec to find out about it. In practice most people
seem to prefer Olsen-style city names anyway.)
SGML documentation, as well as code comments, failed to note that an FDW's
validator will be applied to foreign-table options for foreign tables using
the FDW.
Etsuro Fujita
We don't need two index entries for lo_create pointing at the same section.
It's a bit pedantic for the toolchain to warn about this, but warn it does.
With these, one need no longer manipulate large object descriptors and
extract numeric constants from header files in order to read and write
large object contents from SQL.
Pavel Stehule, reviewed by Rushabh Lathia.
Move random() and setseed() to a separate table, to have them grouped
together. Also add a notice that random() is not cryptographically secure.
Original patch by Honza Horak, although I didn't use his version.
Add a makefile rule for building PDFs with FOP. Two new build targets
in doc/src/sgml are postgres-A4-fop.pdf and postgres-US-fop.pdf.
Run .fo output through xmllint for reformatting, so that errors are
easier to find. (The default output has hardly any line breaks, so you
might be looking for an error in column 20000.)
Set some XSLT parameters to optimize for building with FOP.
Remove some redundant or somewhat useless chapterinfo/author
information, because it renders strangely with the FO stylesheet.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Previously, unless all columns were auto-updateable, we wouldn't
inserts, updates, or deletes, or at least not without a rule or trigger;
now, we'll allow inserts and updates that target only the auto-updateable
columns, and deletes even if there are no auto-updateable columns at
all provided the view definition is otherwise suitable.
Dean Rasheed, reviewed by Marko Tiikkaja
Although previously-introduced APIs allow the process that registers a
background worker to obtain the worker's PID, there's no way to prevent
a worker that is not currently running from being restarted. This
patch introduces a new API TerminateBackgroundWorker() that prevents
the background worker from being restarted, terminates it if it is
currently running, and causes it to be unregistered if or when it is
not running.
Patch by me. Review by Michael Paquier and KaiGai Kohei.
Development of IRIX has been discontinued, and support is scheduled
to end in December of 2013. Therefore, there will be no supported
versions of this operating system by the time PostgreSQL 9.4 is
released. Furthermore, we have no maintainer for this platform.
The default table of contents in the XSLT HTML build is much too big and
deep. Configure it to look more like the one that is currently being
produced by the DSSSL build.
All of these platforms are very much obsolete.
As far as I can determine, the last version of SINIX, later renamed
Reliant, occurred some time between 2002 and 2005.
The last release of SunOS that would run on a sun3 was released in
November of 1991; the last release of OpenBSD which supported that
platform was in 2001. The highest clock speed of any processor in
the family was 25MHz.
The NS32K (national semiconductor 320xx) architecture was retired
in 1990.
Support can be re-added if a maintainer emerges for any of these
platforms, but it seems unlikely.
Reviewed by Andres Freund.
The XSLT toolchain requires an empty <index> element where the index is
supposed to appear. Add that with conditionals to hide it from the
DSSSL build.
The previous plan of having the check-tabs target a prerequisite of
"all" and "distprep" caused make distcheck to fail because make -q
distprep would never be satisfied. Put check-tabs into the html target
instead, so it is only called when a build actually happens.
make maintainer-check was obscure and rarely called in practice, and
many breakages were missed. Fold everything that make maintainer-check
used to do into the normal build. Specifically:
- Call duplicate_oids when genbki.pl is called.
- Check for tabs in SGML files when the documentation is built.
- Run msgfmt with the -c option during the regular build. Add an
additional configure check to see whether we are using the GNU
version. (make maintainer-check probably used to fail with non-GNU
msgfmt.)
Keep maintainer-check as around as phony target for the time being in
case anyone is calling it. But it won't do anything anymore.
Change the input/output format to {A,B,C}, to match the internal
representation.
Complete the implementations of line_in, line_out, line_recv, line_send.
Remove comments and error messages about the line type not being
implemented. Add regression tests for existing line operators and
functions.
Reviewed-by: rui hua <365507506hua@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview
containing a column of a type without a default btree operator
class. It also did not produce results consistent with a non-
concurrent REFRESH or a normal view if any column was of a type
which allowed user-visible differences between values which
compared as equal according to the type's default btree opclass.
Concurrent matview refresh was modified to use the new operators
to solve these problems.
Documentation was added for record comparison, both for the
default btree operator class for record, and the newly added
operators. Regression tests now check for proper behavior both
for a matview with a box column and a matview containing a citext
column.
Reviewed by Steve Singer, who suggested some of the doc language.
This option provides more detailed error messages when STRICT is used
and the number of rows returned is not one.
Marko Tiikkaja, reviewed by Ian Lawrence Barwick
Isolate transaction latency (elapsed time between submitting first
command and receiving response to last command) from client-side delays
pertaining to the --rate schedule. Under --rate, report schedule lag as
defined in the documentation. Report latency standard deviation
whenever we collect the measurements to do so. All of these changes
affect --progress messages and the final report.
Fabien COELHO, reviewed by Pavel Stehule.
DISCARD ALL will now discard cached sequence information, as well.
Fabrízio de Royes Mello, reviewed by Zoltán Böszörményi, with some
further tweaks by me.
Previously, arbitray system columns could be mentioned in table
constraints, but they were not correctly checked at runtime, because
the values weren't actually set correctly in the tuple. Since it
seems easy enough to initialize the table OID properly, do that,
and continue allowing that column, but disallow the rest unless and
until someone figures out a way to make them work properly.
No back-patch, because this doesn't seem important enough to take the
risk of destabilizing the back branches. In fact, this will pose a
dump-and-reload hazard for those upgrading from previous versions:
constraints that were accepted before but were not correctly enforced
will now either be enforced correctly or not accepted at all. Either
could result in restore failures, but in practice I think very few
users will notice the difference, since the use case is pretty
marginal anyway and few users will be relying on features that have
not historically worked.
Amit Kapila, reviewed by Rushabh Lathia, with doc changes by me.
Commit 95ef6a3448 removed the
ability to create rules on an individual column as of 7.3, but
left some residual code which has since been useless. This cleans
up that dead code without any change in behavior other than
dropping the useless column from the catalog.
The previous coding attempted to activate all the GUC settings specified
in SET clauses, so that the function validator could operate in the GUC
environment expected by the function body. However, this is problematic
when restoring a dump, since the SET clauses might refer to database
objects that don't exist yet. We already have the parameter
check_function_bodies that's meant to prevent forward references in
function definitions from breaking dumps, so let's change CREATE FUNCTION
to not install the SET values if check_function_bodies is off.
Authors of function validators were already advised not to make any
"context sensitive" checks when check_function_bodies is off, if indeed
they're checking anything at all in that mode. But extend the
documentation to point out the GUC issue in particular.
(Note that we still check the SET clauses to some extent; the behavior
with !check_function_bodies is now approximately equivalent to what ALTER
DATABASE/ROLE have been doing for awhile with context-dependent GUCs.)
This problem can be demonstrated in all active branches, so back-patch
all the way.
There's no inherent reason why an aggregate function can't be variadic
(even VARIADIC ANY) if its transition function can handle the case.
Indeed, this patch to add the feature touches none of the planner or
executor, and little of the parser; the main missing stuff was DDL and
pg_dump support.
It is true that variadic aggregates can create the same sort of ambiguity
about parameters versus ORDER BY keys that was complained of when we
(briefly) had both one- and two-argument forms of string_agg(). However,
the policy formed in response to that discussion only said that we'd not
create any built-in aggregates with varying numbers of arguments, not that
we shouldn't allow users to do it. So the logical extension of that is
we can allow users to make variadic aggregates as long as we're wary about
shipping any such in core.
In passing, this patch allows aggregate function arguments to be named, to
the extent of remembering the names in pg_proc and dumping them in pg_dump.
You can't yet call an aggregate using named-parameter notation. That seems
like a likely future extension, but it'll take some work, and it's not what
this patch is really about. Likewise, there's still some work needed to
make window functions handle VARIADIC fully, but I left that for another
day.
initdb forced because of new aggvariadic field in Aggref parse nodes.
I started out just to fix the broken markup in commit
1c20857661, but got distracted by
copy-editing. I see Bruce already fixed the markup, but I'll
commit the wordsmithing anyway.
It seems like a good idea to update these examples since some fairly
basic planner behaviors have changed in 9.3; notably that the startup cost
for an indexscan plan node is no longer invariably estimated at 0.00.
Using the infrastructure provided by this patch, it's possible either
to wait for the startup of a dynamically-registered background worker,
or to poll the status of such a worker without waiting. In either
case, the current PID of the worker process can also be obtained.
As usual, worker_spi is updated to demonstrate the new functionality.
Patch by me. Review by Andres Freund.
We already did this for -t (--table) in 9.3, but missed the other similar
options. For consistency, allow all of them to be specified multiple times.
Unfortunately it's too late to sneak this into 9.3, so commit to master
only.
Currently we don't need to update the pg_tablespace catalog
after redefining the symbolic links to the tablespaces
because pg_tablespace.spclocation column was removed in
PostgreSQL 9.2.
Back patch to 9.2 where pg_tablespace.spclocation was removed.
Ian Barwick, with minor change by me.
plpgsql often just remembers SPI-result tuple tables in local variables,
and has no mechanism for freeing them if an ereport(ERROR) causes an escape
out of the execution function whose local variable it is. In the original
coding, that wasn't a problem because the tuple table would be cleaned up
when the function's SPI context went away during transaction abort.
However, once plpgsql grew the ability to trap exceptions, repeated
trapping of errors within a function could result in significant
intra-function-call memory leakage, as illustrated in bug #8279 from
Chad Wagner.
We could fix this locally in plpgsql with a bunch of PG_TRY/PG_CATCH
coding, but that would be tedious, probably slow, and prone to bugs of
omission; moreover it would do nothing for similar risks elsewhere.
What seems like a better plan is to make SPI itself responsible for
freeing tuple tables at subtransaction abort. This patch attacks the
problem that way, keeping a list of live tuple tables within each SPI
function context. Currently, such freeing is automatic for tuple tables
made within the failed subtransaction. We might later add a SPI call to
mark a tuple table as not to be freed this way, allowing callers to opt
out; but until someone exhibits a clear use-case for such behavior, it
doesn't seem worth bothering.
A very useful side-effect of this change is that SPI_freetuptable() can
now defend itself against bad calls, such as duplicate free requests;
this should make things more robust in many places. (In particular,
this reduces the risks involved if a third-party extension contains
now-redundant SPI_freetuptable() calls in error cleanup code.)
Even though the leakage problem is of long standing, it seems imprudent
to back-patch this into stable branches, since it does represent an API
semantics change for SPI users. We'll patch this in 9.3, but live with
the leakage in older branches.
In my previous change to make pgstattuple use SnapshotDirty rather
than SnapshotNow, I failed to notice that the documenation also
needed to be updated to match. Fix.
This adds the ability to get the call stack as a string from within a
PL/PgSQL function, which can be handy for logging to a table, or to
include in a useful message to an end-user.
Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked
around by Stephen Frost.
This controls the target transaction rate to certain tps, rather than
maximum. Patch contributed by Fabien COELHO, reviewed by Greg Smith,
and slight editing by me.
Per discussion on pgsql-hackers, these aren't really needed. Interim
versions of the background worker patch had the worker starting with
signals already unblocked, which would have made this necessary.
But the final version does not, so we don't really need it; and it
doesn't work well with the new facility for starting dynamic background
workers, so just rip it out.
Also per discussion on pgsql-hackers, back-patch this change to 9.3.
It's best to get the API break out of the way before we do an
official release of this facility, to avoid more pain for extension
authors later.
Also, tweak wording in comments (per Andres) and documentation (myself)
to point out that it's the database's default tablespace that can be
passed as 0, not DEFAULTTABLESPACE_OID. Robert Haas noticed the bug in
the code, but didn't update the accompanying prose.
Future patches are expected to introduce logical replication that
works by decoding WAL. WAL contains relfilenodes rather than relation
OIDs, so this infrastructure will be needed to find the relation OID
based on WAL contents.
If logical replication does not make it into this release, we probably
should consider reverting this, since it will add some overhead to DDL
operations that create new relations. One additional index insert per
pg_class row is not a large overhead, but it's more than zero.
Another way of meeting the needs of logical replication would be to
the relation OID to WAL, but that would burden DML operations, not
only DDL.
Andres Freund, with some changes by me. Design review, in earlier
versions, by Álvaro Herrera.
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated. For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).
This option is part of the SQL specification.
Dean Rasheed, reviewed by Pavel Stehule
This allows us to specify the target relation with several expressions,
'relname', 'schemaname.relname' and OID in all pgstattuple functions.
pgstatindex() and pg_relpages() could not accept OID as the argument
so far.
Per discussion on -hackers, we decided to keep two types of interfaces,
with regclass-type and TEXT-type argument, for each pgstattuple
function because of the backward-compatibility issue. The functions
which have TEXT-type argument will be deprecated in the future release.
Patch by Satoshi Nagayasu, reviewed by Rushabh Lathia and Fujii Masao.
The documentation for ALTER VIEW had a minor copy-and-paste error in
defining the parameters. Noticed when reviewing the WITH CHECK OPTION
patch.
Backpatch to 9.2 where this was first introduced.
This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.
catversion bump due to change in Aggref and WindowFunc.
David Fetter, reviewed by Dean Rasheed.
This allows reads to continue without any blocking while a REFRESH
runs. The new data appears atomically as part of transaction
commit.
Review questioned the Assert that a matview was not a system
relation. This will be addressed separately.
Reviewed by Hitoshi Harada, Robert Haas, Andres Freund.
Merged after review with security patch f3ab5d4.
There is a new API, RegisterDynamicBackgroundWorker, which allows
an ordinary user backend to register a new background writer during
normal running. This means that it's no longer necessary for all
background workers to be registered during processing of
shared_preload_libraries, although the option of registering workers
at that time remains available.
When a background worker exits and will not be restarted, the
slot previously used by that background worker is automatically
released and becomes available for reuse. Slots used by background
workers that are configured for automatic restart can't (yet) be
released without shutting down the system.
This commit adds a new source file, bgworker.c, and moves some
of the existing control logic for background workers there.
Previously, there was little enough logic that it made sense to
keep everything in postmaster.c, but not any more.
This commit also makes the worker_spi contrib module into an
extension and adds a new function, worker_spi_launch, which can
be used to demonstrate the new facility.
This is like shared_preload_libraries except that it takes effect at
backend start and can be changed without a full postmaster restart. It
is like local_preload_libraries except that it is still only settable by
a superuser. This can be a better way to load modules such as
auto_explain.
Since there are now three preload parameters, regroup the documentation
a bit. Put all parameters into one section, explain common
functionality only once, update the descriptions to reflect current and
future realities.
Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>
This makes superuser-issued REFRESH MATERIALIZED VIEW safe regardless of
the object's provenance. REINDEX is an earlier example of this pattern.
As a downside, functions called from materialized views must tolerate
running in a security-restricted operation. CREATE MATERIALIZED VIEW
need not change user ID. Nonetheless, avoid creation of materialized
views that will invariably fail REFRESH by making it, too, start a
security-restricted operation.
Back-patch to 9.3 so materialized views have this from the beginning.
Reviewed by Kevin Grittner.
Previously, pg_upgrade docs recommended using .pgpass if using MD5
authentication to avoid being prompted for a password. Turns out pg_ctl
never prompts for a password, so MD5 requires .pgpass --- document that.
Also recommend 'peer' for authentication too.
Backpatch back to 9.1.
The old implementation converted PostgreSQL numeric to Python float,
which was always considered a shortcoming. Now numeric is converted to
the Python Decimal object. Either the external cdecimal module or the
standard library decimal module are supported.
From: Szymon Guz <mabewlun@gmail.com>
From: Ronan Dunklau <rdunklau@gmail.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
This value, now pg_stat_all_tables.n_mod_since_analyze, was already
tracked and used by autovacuum, but not exposed to the user.
Mark Kirkwood, review by Laurenz Albe
In 9.3, there's no particular limit on the number of bgworkers;
instead, we just count up the number that are actually registered,
and use that to set MaxBackends. However, that approach causes
problems for Hot Standby, which needs both MaxBackends and the
size of the lock table to be the same on the standby as on the
master, yet it may not be desirable to run the same bgworkers in
both places. 9.3 handles that by failing to notice the problem,
which will probably work fine in nearly all cases anyway, but is
not theoretically sound.
A further problem with simply counting the number of registered
workers is that new workers can't be registered without a
postmaster restart. This is inconvenient for administrators,
since bouncing the postmaster causes an interruption of service.
Moreover, there are a number of applications for background
processes where, by necessity, the background process must be
started on the fly (e.g. parallel query). While this patch
doesn't actually make it possible to register new background
workers after startup time, it's a necessary prerequisite.
Patch by me. Review by Michael Paquier.
Treat TOAST index just the same as normal one and get the OID
of TOAST index from pg_index but not pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, and
which is required infrastructure for upcoming
REINDEX CONCURRENTLY feature.
Patch by Michael Paquier, reviewed by Andres Freund and me.
Specifically, permit attaching them to the error in RAISE and retrieving
them from a caught error in GET STACKED DIAGNOSTICS. RAISE enforces
nothing about the content of the fields; for its purposes, they are just
additional string fields. Consequently, clarify in the protocol and
libpq documentation that the usual relationships between error fields,
like a schema name appearing wherever a table name appears, are not
universal. This freedom has other applications; consider a FDW
propagating an error from an RDBMS having no schema support.
Back-patch to 9.3, where core support for the error fields was
introduced. This prevents the confusion of having a release where libpq
exposes the fields and PL/pgSQL does not.
Pavel Stehule, lexical revisions by Noah Misch.
Make it easier for readers of the FP docs to find out about possibly
truncated values.
Per complaint from Tom Duffey in message
F0E0F874-C86F-48D1-AA2A-0C5365BF5118@trillitech.com
Author: Albe Laurenz
Reviewed by: Abhijit Menon-Sen
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
On Unix, you can embed double-quotes in single-quotes, and via versa.
However, on Windows, you can only escape double-quotes in double-quotes,
so use that in the pg_dump -t/table example.
Backpatch to 9.3.
Report from Mike Toews
Add ability for to_char() to output the timezone's UTC offset (OF). We
already have the ability to return the timezone abbeviation (TZ/tz).
Per request from Andrew Dunstan
Previous code had old/new prefixes on option values, e.g.
--old-datadir=OLDDATADIR. Remove them, for simplicity; now:
--old-datadir=DATADIR. Also update docs to do the same.
On immediate shutdown, or during a restart-after-crash sequence,
postmaster used to send SIGQUIT (and then abandon ship if shutdown); but
this is not a good strategy if backends don't die because of that
signal. (This might happen, for example, if a backend gets tangled
trying to malloc() due to gettext(), as in an example illustrated by
MauMau.) This causes problems when later trying to restart the server,
because some processes are still attached to the shared memory segment.
Instead of just abandoning such backends to their fates, we now have
postmaster hang around for a little while longer, send a SIGKILL after
some reasonable waiting period, and then exit. This makes immediate
shutdown more reliable.
There is disagreement on whether it's best for postmaster to exit after
sending SIGKILL, or to stick around until all children have reported
death. If this controversy is resolved differently than what this patch
implements, it's an easy change to make.
Bug reported by MauMau in message 20DAEA8949EC4E2289C6E8E58560DEC0@maumau
MauMau and Álvaro Herrera
Change -u (user) option to -U, for consistency with other tools like
pg_dump and psql. Also expand --user to --username, again for
consistency.
BACKWARD INCOMPATIBILITY
Adjust the wording in the first para of "Sequence Manipulation Functions"
so that neither of the link phrases in it break across line boundaries,
in either A4- or US-page-size PDF output. This fixes a reported build
failure for the 9.3beta2 A4 PDF docs, and future-proofs this particular
para against causing similar problems in future. (Perhaps somebody will
fix this issue in the SGML/TeX documentation tool chain someday, but I'm
not holding my breath.)
Back-patch to all supported branches, since the same problem could rise up
to bite us in future updates if anyone changes anything earlier than this
in func.sgml.
If there is no <date> element, the publication date for the EPUB
manifest is taken from the copyright year. But something like
"1996-2013" is not a legal date specification. So the EPUB output
currently fails epubcheck.
Put in a separate <date> element with the current year. Put it in
legal.sgml, because copyright.pl already instructs to update that
manually, so it hopefully won't be missed.
Most of the documentation uses "single-user mode", so use that in the
code as well. Adjust the documentation to match the new error message
wording. Also add a documentation index entry for "single-user mode".
Based-on-patch-by: Jeff Janes <jeff.janes@gmail.com>
ALTER TABLE .. VALIDATE CONSTRAINT previously
gave incorrect details about lock levels and
therefore incomplete reasons to use the option.
Initial bug report and fix from Marko Tiikkaja
Reworded by me to include comments by Kevin Grittner
Extend the FDW API (which we already changed for 9.3) so that an FDW can
report whether specific foreign tables are insertable/updatable/deletable.
The default assumption continues to be that they're updatable if the
relevant executor callback function is supplied by the FDW, but finer
granularity is now possible. As a test case, add an "updatable" option to
contrib/postgres_fdw.
This patch also fixes the information_schema views, which previously did
not think that foreign tables were ever updatable, and fixes
view_is_auto_updatable() so that a view on a foreign table can be
auto-updatable.
initdb forced due to changes in information_schema views and the functions
they rely on. This is a bit unfortunate to do post-beta1, but if we don't
change this now then we'll have another API break for FDWs when we do
change it.
Dean Rasheed, somewhat editorialized on by Tom Lane
Per discussion on -hackers. We treat Unicode escapes when unescaping
them similarly to the way we treat them in PostgreSQL string literals.
Escapes in the ASCII range are always accepted, no matter what the
database encoding. Escapes for higher code points are only processed in
UTF8 databases, and attempts to process them in other databases will
result in an error. \u0000 is never unescaped, since it would result in
an impermissible null byte.
Per discussion, this restriction isn't needed for any real security reason,
and it seems to confuse people more often than it helps them. It could
also result in some database states being unrestorable. So just drop it.
Back-patch to 9.0, where ALTER DEFAULT PRIVILEGES was introduced.
In 9.2, Unicode escape sequences are not analysed at all other than
to make sure that they are in the form \uXXXX. But in 9.3 many of the
new operators and functions try to turn JSON text values into text in
the server encoding, and this includes de-escaping Unicode escape
sequences. This processing had not taken into account the possibility
that this might contain a surrogate pair to designate a character
outside the BMP. That is now handled correctly.
This also enforces correct use of surrogate pairs, something that is not
done by the type's input routines. This fact is noted in the docs.
Although the DTD technically allows this, the resulting HTML is invalid
because it puts block elements inside inline elements. DocBook 5.0 also
doesn't allow it anymore, so it's fair to assume that this was never
really intended to work. Replace <synopsis> with <literal>, which is
the markup used elsewhere in the documentation in similar cases.
The documentation for ALTER TYPE .. RENAME claimed to support a
RESTRICT/CASCADE option at the 'type' level, which wasn't implemented
and doesn't make a whole lot of sense to begin with. What is supported,
and previously undocumented, is
ALTER TYPE .. RENAME ATTRIBUTE .. RESTRICT/CASCADE.
I've updated the documentation and back-patched this to 9.1 where it was
first introduced.
The 9.2 patch that added argument name support in SQL-language functions
missed updating a parenthetical comment about that in the CREATE FUNCTION
reference page. Noted by Erwin Brandstetter.
If a standby server has a cascading standby server connected to it, it's
possible that WAL has already been sent up to the next WAL page boundary,
splitting a WAL record in the middle, when the first standby server is
promoted. Don't throw an assertion failure or error in walsender if that
happens.
Also, fix a variant of the same bug in pg_receivexlog: if it had already
received WAL on previous timeline up to a segment boundary, when the
upstream standby server is promoted so that the timeline switch record falls
on the previous segment, pg_receivexlog would miss the segment containing
the timeline switch. To fix that, have walsender send the position of the
timeline switch at end-of-streaming, in addition to the next timeline's ID.
It was previously assumed that the switch happened exactly where the
streaming stopped.
Note: this is an incompatible change in the streaming protocol. You might
get an error if you try to stream over timeline switches, if the client is
running 9.3beta1 and the server is more recent. It should be fine after a
reconnect, however.
Reported by Fujii Masao.
What we have implemented is a radix tree (or a radix trie or a patricia
trie), but the docs and code comments incorrectly called it a "suffix tree".
Alexander Korotkov
It is surprisingly common mistake to leave out backup_label file from a base
backup. Say more explicitly that it must be included.
Jeff Janes, with minor rewording by me.
Previously this state was represented by whether the view's disk file had
zero or nonzero size, which is problematic for numerous reasons, since it's
breaking a fundamental assumption about heap storage. This was done to
allow unlogged matviews to revert to unpopulated status after a crash
despite our lack of any ability to update catalog entries post-crash.
However, this poses enough risk of future problems that it seems better to
not support unlogged matviews until we can find another way. Accordingly,
revert that choice as well as a number of existing kluges forced by it
in favor of creating a pg_class.relispopulated flag column.
The initial implementation of this feature was really unsupportable,
because it's relying on the physical size of an on-disk file to carry the
relation's populated/unpopulated state, which is at least a modularity
violation and could have serious long-term consequences. We could say that
an unlogged matview goes to empty on crash, but not everybody likes that
definition, so let's just remove the feature for 9.3. We can add it back
when we have a less klugy implementation.
I left the grammar and tab-completion support for CREATE UNLOGGED
MATERIALIZED VIEW in place, since it's harmless and allows delivering a
more specific error message about the unsupported feature.
I'm committing this separately to ease identification of what should be
reverted when/if we are able to re-enable the feature.
Restore 4-byte designation for docs. Fix 9.3 doc query to properly pad
to four digits.
Backpatch to all active branches
Per suggestions from Ian Lawrence Barwick
Previously, libpq and the backend had opposite ideas about whether
it was necessary for the client to send a CopyDone message after
receiving an ErrorResponse, making it impossible to cleanly exit
COPY BOTH mode. Fix libpq so that works correctly, adopting the
backend's notion that an ErrorResponse kills the copy in both
directions.
Adjust receivelog.c to avoid a degradation in the quality of the
resulting error messages. libpqwalreceiver.c is already doing
the right thing, so no adjustment needed there.
Add an explicit statement to the documentation explaining how
this part of the protocol is supposed to work, in the hopes of
avoiding future confusion in this area.
Since the consequences of all this confusion are very limited,
especially in the back-branches where no client ever attempts
to exit COPY BOTH mode without closing the connection entirely,
no back-patch.
This changes the behavior of the start and stop actions to exit
successfully if the server was already started or stopped.
This changes the default behavior of the start action: Before, if the
server was already running, it would print a message and succeed. Now,
that situation will result in an error. When running in idempotent
mode, no message is printed and pg_ctl exits successfully.
It was considered to just make the idempotent behavior the default and
only option, but pg_upgrade needs the old behavior.
This wasn't addressed in the original patch, but it doesn't take very
much additional code to cover the case, so let's get it done.
Since pg_trgm 1.1 hasn't been released yet, I just changed the definition
of what's in it, rather than inventing a 1.2.
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.
Currently, only GIN indexes are supported. We might be able to make
it work with GiST indexes later.
The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.
Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
This will hopefully be easier to use than pg_config for users who are
already used to the pkg-config interface. It also works better for
multi-arch installations.
reviewed by Tom Lane
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.
Catalog version bumped.
Andrew Dunstan, with some documentation assistance from Merlin Moncure.
I changed this in commit fd15dba543, but
missed the fact that the SGML documentation of the function specified
exactly what it did. Well, one of the two places where it's specified
documented that --- probably I looked at the other place and thought
nothing needed to be done. Sync the two places where encode() and
decode() are described.
The main change here is to call security_compute_create_name_raw()
rather than security_compute_create_raw(). This ups the minimum
requirement for libselinux from 2.0.99 to 2.1.10, but it looks
like most distributions will have picked that up before 9.3 is out.
KaiGai Kohei
This event takes place just before ddl_command_end, and is fired if and
only if at least one object has been dropped by the command. (For
instance, DROP TABLE IF EXISTS of a table that does not in fact exist
will not lead to such a trigger firing). Commands that drop multiple
objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event
to fire. Some firings might be surprising, such as
ALTER TABLE DROP COLUMN.
The trigger is fired after the drop has taken place, because that has
been deemed the safest design, to avoid exposing possibly-inconsistent
internal state (system catalogs as well as current transaction) to the
user function code. This means that careful tracking of object
identification is required during the object removal phase.
Like other currently existing events, there is support for tag
filtering.
To support the new event, add a new pg_event_trigger_dropped_objects()
set-returning function, which returns a set of rows comprising the
objects affected by the command. This is to be used within the user
function code, and is mostly modelled after the recently introduced
pg_identify_object() function.
Catalog version bumped due to the new function.
Dimitri Fontaine and Álvaro Herrera
Review by Robert Haas, Tom Lane
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.
Doing that results in a broken index entry in PDF output. We had only
a few like that, which is probably why nobody noticed before.
Standardize on putting the <term> first.
Josh Kupershmidt
One of the use-cases for postgres_fdw is extracting data from older PG
servers, so cross-version compatibility is important. Document what we
can do here, and further annotate some of the coding choices that create
compatibility constraints. In passing, remove one unnecessary
incompatibility with old servers, namely assuming that we didn't need to
quote the timezone name 'UTC'.
Commit 13fe298ca0 changed this GUC to be
PGC_SUSET, but neglected to update the documentation to match.
While at it, edit and rearrange the text a little for clarity.
Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again. Hint bit setting will
require full page write when block is dirtied, which causes various
infrastructure changes. Extensive comments, docs and README.
WARNING message thrown if checksum fails on non-all zeroes page;
ERROR thrown but can be disabled with ignore_checksum_failure = on.
Feature enabled by an initdb option, since transition from option off
to option on is long and complex and has not yet been implemented.
Default is not to use checksums.
Checksum used is WAL CRC-32 truncated to 16-bits.
Simon Riggs, Jeff Davis, Greg Smith
Wide input and assistance from many community members. Thank you.
Introduce pg_identify_object(oid,oid,int4), which is similar in spirit
to pg_describe_object but instead produces a row of machine-readable
information to uniquely identify the given object, without resorting to
OIDs or other internal representation. This is intended to be used in
the event trigger implementation, to report objects being operated on;
but it has usefulness of its own.
Catalog version bumped because of the new function.
Add section to the Reliability section about what is and is not protected for
various file types.
Further edits welcome.
Designed to allow 1-2 line change when/if checksums are committed.
Inspired by docs written by Jeff Davis, though completely different from his
patch.
The docs showed that early-January dates can be considered part of the
previous year for week-counting purposes, but failed to say explicitly
that late-December dates can also be considered part of the next year.
Fix that, and add a cross-reference to the "isoyear" field. Per bug
#7967 from Pawel Kobylak.
Remove use of PageSetTLI() from all page manipulation functions
and adjust README to indicate change in the way we make changes
to pages. Repurpose those bytes into the pd_checksum field and
explain how that works in comments about page header.
Refactoring ahead of actual feature patch which would make use
of the checksum field, arriving later.
Jeff Davis, with comments and doc changes by Simon Riggs
Direction suggested by Robert Haas; many others providing
review comments.
This GUC allows limiting the time spent waiting to acquire any one
heavyweight lock.
In support of this, improve the recently-added timeout infrastructure
to permit efficiently enabling or disabling multiple timeouts at once.
That reduces the performance hit from turning on lock_timeout, though
it's still not zero.
Zoltán Böszörményi, reviewed by Tom Lane,
Stephen Frost, and Hari Babu
Clarify the docs explaining what commit_delay does, and add a
recommendation about a useful value for it, namely half of the single-page
fsync time reported by pg_test_fsync. This is informed by testing of
the new-in-9.3 implementation of commit_delay; in prior versions it
was far harder to arrive at a useful setting.
In passing, do some wordsmithing and markup-fixing in the same general
area.
Also, change pg_test_fsync's default time-per-test from 2 seconds to 5.
The old value was about the minimum at which the results could be taken
seriously at all, and so seems a tad optimistic as a default.
Peter Geoghegan, reviewed by Noah Misch; some additional editing by me
There's still some discussion about exactly how postgres_fdw ought to
handle this case, but there seems no debate that we want to allow defaults
to be used for inserts into foreign tables. So remove the core-code
restrictions that prevented it.
While at it, get rid of the special grammar productions for CREATE FOREIGN
TABLE, and instead add explicit FEATURE_NOT_SUPPORTED error checks for the
disallowed cases. This makes the grammar a shade smaller, and more
importantly results in much more intelligible error messages for
unsupported cases. It's also one less thing to fix if we ever start
supporting constraints on foreign tables.
Apparently I lost some of the edits I had done on this page for commit
0ac5ad5134.
Per note from Etsuro Fujita, although I didn't use his patch.
Make some of the wording in the affected section a bit more complete,
too.
This adds the following:
json_agg(anyrecord) -> json
to_json(any) -> json
hstore_to_json(hstore) -> json (also used as a cast)
hstore_to_json_loose(hstore) -> json
The last provides heuristic treatment of numbers and booleans.
Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.
Andrew Dunstan, reviewed by Steve Singer.
Catalog version bumped.
This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers. There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.
KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table, references in queries refer to the
materialized data.
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.
In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you
the stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.
This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.
Etsuro Fujita, reviewed by Amit Kapila.
Like with pg_basebackup and pg_receivexlog, it's a bit strange to call the
option -d/--dbname, when in fact you cannot pass a database name in it.
Original patch by Amit Kapila, heavily modified by me.
You could already pass a database name just by passing it as the last
option, without -d. This is an alias for that, like the -d/--dbname option
in psql and many other client applications. For consistency.
Without this, there's no way to pass arbitrary libpq connection parameters
to these applications. It's a bit strange that the option is called
-d/--dbname, when in fact you can *not* pass a database name in it, but it's
consistent with other client applications where a connection string is also
passed using -d.
Original patch by Amit Kapila, heavily modified by me.
This program relies on rm_desc backend routines and the xlogreader
infrastructure to emit human-readable rendering of WAL records.
Author: Andres Freund, with many reworks by Álvaro
Reviewed (in a much earlier version) by Peter Eisentraut
There's still a lot of room for improvement, but it basically works,
and we need this to be present before we can do anything much with the
writable-foreign-tables patch. So let's commit it and get on with testing.
Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane
This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE
... SET functionality to allow creating settings that apply to all users
in all databases.
reviewed by Pavel Stehule
The reason this wasn't supported before was that GiST indexes need an
increasing sequence to detect concurrent page-splits. In a regular WAL-
logged GiST index, the LSN of the page-split record is used for that
purpose, and in a temporary index, we can get away with a backend-local
counter. Neither of those methods works for an unlogged relation.
To provide such an increasing sequence of numbers, create a "fake LSN"
counter that is saved and restored across shutdowns. On recovery, unlogged
relations are blown away, so the counter doesn't need to survive that
either.
Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
Improve description of the vacuum_freeze_table_age bug (it's much more
serious than we realized at the time the fix was committed), and correct
attribution of pg_upgrade -O/-o fix (Marti Raudsepp contributed that,
but Bruce forgot to credit him in the commit log).
No need to back-patch right now, it'll happen when the next set of
release notes are prepared.
Instead of hardcoding a specific link, give a general link to the
download section of the web site. This gives the user more download
options and the sysadmins more flexibility. Also, the previously
presented link didn't work for devel versions.
This function was misdeclared to take cstring when it should take internal.
This at least allows crashing the server, and in principle an attacker
might be able to use the function to examine the contents of server memory.
The correct fix is to adjust the system catalog contents (and fix the
regression tests that should have caught this but failed to). However,
asking users to correct the catalog contents in existing installations
is a pain, so as a band-aid fix for the back branches, install a check
in enum_recv() to make it throw error if called with a cstring argument.
We will later revert this in HEAD in favor of correcting the catalogs.
Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue.
Security: CVE-2013-0255
This patch changes pg_get_viewdef() and allied functions so that
PRETTY_INDENT processing is always enabled. Per discussion, only the
PRETTY_PAREN processing (that is, stripping of "unnecessary" parentheses)
poses any real forward-compatibility risk, so we may as well make dump
output look as nice as we safely can.
Also, set the default wrap length to zero (i.e, wrap after each SELECT
or FROM list item), since there's no very principled argument for the
former default of 80-column wrapping, and most people seem to agree this
way looks better.
Marko Tiikkaja, reviewed by Jeevan Chalke, further hacking by Tom Lane
In the previous coding, psql's state variable saying that output should
go to a file was only reset after successful completion of a query
returning tuples. Thus for example,
regression=# select 1/0
regression-# \g somefile
ERROR: division by zero
regression=# select 1/2;
regression=#
... huh, I wonder where that output went. Even more oddly, the state
was not reset even if it's the file that's causing the failure:
regression=# select 1/2 \g /foo
/foo: Permission denied
regression=# select 1/2;
/foo: Permission denied
regression=# select 1/2;
/foo: Permission denied
This seems to me not to satisfy the principle of least surprise.
\g is certainly not documented in a way that suggests its effects are
at all persistent.
To fix, adjust the code so that the flag is reset at exit from SendQuery
no matter what happened.
Noted while reviewing the \gset patch, which had comparable issues.
Arguably this is a bug fix, but I'll refrain from back-patching for now.
This is specified in the SQL standard. The CREATE RECURSIVE VIEW
specification is transformed into a normal CREATE VIEW statement with a
WITH RECURSIVE clause.
reviewed by Abhijit Menon-Sen and Stephen Frost
The new option specifies length of aggregation interval (in
seconds). May be used only together with -l. With this option, the log
contains per-interval summary (number of transactions, min/max latency
and two additional fields useful for variance estimation).
Patch contributed by Tomas Vondra, reviewed by Pavel Stehule. Slight
change by Tatsuo Ishii, suggested by Robert Hass to emit an error
message indicating that the option is not currently supported on
Windows.
This patch addresses the problem that applications currently have to
extract object names from possibly-localized textual error messages,
if they want to know for example which index caused a UNIQUE_VIOLATION
failure. It adds new error message fields to the wire protocol, which
can carry the name of a table, table column, data type, or constraint
associated with the error. (Since the protocol spec has always instructed
clients to ignore unrecognized field types, this should not create any
compatibility problem.)
Support for providing these new fields has been added to just a limited set
of error reports (mainly, those in the "integrity constraint violation"
SQLSTATE class), but we will doubtless add them to more calls in future.
Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with
additional hacking by Tom Lane.
Beyond 21474, the number of accounts exceed the range for int4. Change the
initialization code to use bigint for account id columns when scale is large
enough, and switch to using int64s for the variables in pgbench code. The
threshold where we switch to bigints is set at 20000, because that's easier
to remember and document than 21474, and ensures that there is some headroom
when int4s are used.
Greg Smith, with various changes by Euler Taveira de Oliveira, Gurjeet
Singh and Satoshi Nagayasu.
Give away ownership of shared objects (databases, tablespaces) along
with local objects, per original code intention. Try to make the
documentation clearer, too.
Per discussion about DROP OWNED's brokenness, in bug #7748.
This is not backpatched because it'd require some refactoring of the
ALTER/SET OWNER code for databases and tablespaces.
My "fix" for bugs #7578 and #6116 on DROP OWNED at fe3b5eb08a not only
misstated that it applied to REASSIGN OWNED (which it did not affect),
but it also failed to fix the problems fully, because I didn't test the
case of owned shared objects. Thus I created a new bug, reported by
Thomas Kellerer as #7748, which would cause DROP OWNED to fail with a
not-for-user-consumption error message. The code would attempt to drop
the database, which not only fails to work because the underlying code
does not support that, but is a pretty dangerous and undesirable thing
to be doing as well.
This patch fixes that bug by having DROP OWNED only attempt to process
shared objects when grants on them are found, ignoring ownership.
Backpatch to 8.3, which is as far as the previous bug was backpatched.
The SQL standard does not have general functions-in-FROM, but it does
allow UNNEST() there (see the <collection derived table> production),
and the semantics of that are defined to include lateral references.
So spec compliance requires allowing lateral references within UNNEST()
even without an explicit LATERAL keyword. Rather than making UNNEST()
a special case, it seems best to extend this flexibility to any
function-in-FROM. We'll still allow LATERAL to be written explicitly
for clarity's sake, but it's now a noise word in this context.
In theory this change could result in a change in behavior of existing
queries, by allowing what had been an outer reference in a function-in-FROM
to be captured by an earlier FROM-item at the same level. However, all
pre-9.3 PG releases have a bug that causes them to match variable
references to earlier FROM-items in preference to outer references (and
then throw an error). So no previously-working query could contain the
type of ambiguity that would risk a change of behavior.
Per a suggestion from Andrew Gierth, though I didn't use his patch.
Previously non-honored FREEZE mode was ignored. This also issues an
appropriate error message based on the cause of the failure, per
suggestion from Tom. Additional regression test case added.
In the initial implementation of plan caching, we saved the active
search_path when a plan was first cached, then reinstalled that path
anytime we needed to reparse or replan. The idea of that was to try to
reselect the same referenced objects, in somewhat the same way that views
continue to refer to the same objects in the face of schema or name
changes. Of course, that analogy doesn't bear close inspection, since
holding the search_path fixed doesn't cope with object drops or renames.
Moreover sticking with the old path seems to create more surprises than
it avoids. So instead of doing that, consider that the cached plan depends
on search_path, and force reparse/replan if the active search_path is
different than it was when we last saved the plan.
This gets us fairly close to having "transparency" of plan caching, in the
sense that the cached statement acts the same as if you'd just resubmitted
the original query text for another execution. There are still some corner
cases where this fails though: a new object added in the search path
schema(s) might capture a reference in the query text, but we'd not realize
that and force a reparse. We might try to fix that in the future, but for
the moment it looks too expensive and complicated.
Previously, the VARIADIC labeling was effectively ignored, but now these
functions act as though the array elements had all been given as separate
arguments.
Pavel Stehule
Since 9.0, the count parameter has only limited the number of tuples
actually returned by the executor. It doesn't affect the behavior of
INSERT/UPDATE/DELETE unless RETURNING is specified, because without
RETURNING, the ModifyTable plan node doesn't return control to execMain.c
for each tuple. And we only check the limit at the top level.
While this behavioral change was unintentional at the time, discussion of
bug #6572 led us to the conclusion that we prefer the new behavior anyway,
and so we should just adjust the docs to match rather than change the code.
Accordingly, do that. Back-patch as far as 9.0 so that the docs match the
code in each branch.
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com1290721684-sup-3951@alvh.no-ip.org1294953201-sup-2099@alvh.no-ip.org1320343602-sup-2290@alvh.no-ip.org1339690386-sup-8927@alvh.no-ip.org4FE5FF020200002500048A3D@gw.wicourts.gov4FEAB90A0200002500048B7D@gw.wicourts.gov
Remove extra line at bottom of table for new 'latex' mode border=3.
Also update 'latex'-longtable 'tableattr' docs to say
'whitespace-separated' instead of 'space'.
This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_STREAMING command.
When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.
This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
BASE_BACKUP command sends the begin and end WAL positions, it now also sends
the timeline IDs corresponding the positions.
On top of the previous support in pg_dump, add support to specify
multiple tables (by using the -t option multiple times) to
pg_restore, clsuterdb, reindexdb and vacuumdb.
Josh Kupershmidt, reviewed by Karl O. Pinc
(-i), producing only one progress message per 5 seconds along with
elapsed time and estimated remaining time. Also add elapsed time and
estimated remaining time to the default logging(prints one message
each 100000 rows).
Patch contributed by Tomas Vondra, reviewed by Jeevan Chalke and
Tatsuo Ishii.
Adds commandline option -R to pg_basebackup that creates a recovery.conf which
enables standby mode using the same parameters that pg_basebackup used to
connect to the master, and writes it into the output directory (or injects it
in the tar file when tar format is used).
Zoltan Boszormenyi, modified by Magnus Hagander, reviewed by Amit Kapila & Fujii Masao
SPI_execute() and related functions create a CachedPlan, execute it once,
and immediately discard it, so that the functionality offered by
plancache.c is of no value in this code path. And performance measurements
show that the extra data copying and invalidation checking done by
plancache.c slows down simple queries by 10% or more compared to 9.1.
However, enough of the SPI code is shared with functions that do need plan
caching that it seems impractical to bypass plancache.c altogether.
Instead, let's invent a variant version of cached plans that preserves
99% of the API but doesn't offer any of the actual functionality, nor the
overhead. This puts SPI_execute() performance back on par, or maybe even
slightly better, than it was before. This change should resolve recent
complaints of performance degradation from Dong Ye, Pavel Stehule, and
others.
By avoiding data copying, this change also reduces the amount of memory
needed to execute many-statement SPI_execute() strings, as for instance in
a recent complaint from Tomas Vondra.
An additional benefit of this change is that multi-statement SPI_execute()
query strings are now processed fully serially, that is we complete
execution of earlier statements before running parse analysis and planning
on following ones. This eliminates a long-standing POLA violation, in that
DDL that affects the behavior of a later statement will now behave as
expected.
Back-patch to 9.2, since this was a performance regression compared to 9.1.
(In 9.2, place the added struct fields so as to avoid changing the offsets
of existing fields.)
Heikki Linnakangas and Tom Lane
Code review for commit 2f582f76b1: don't use
a static variable for what ought to be a deparse_context field, fix
non-multibyte-safe test for spaces, avoid useless and potentially O(N^2)
(though admittedly with a very small constant) calculations of wrap
positions when we aren't going to wrap.
If pg_extension_config_dump() is executed again for a table already listed
in the extension's extconfig, the code was blindly making a new array entry.
This does not seem useful. Fix it to replace the existing array entry
instead, so that it's possible for extension update scripts to alter the
filter conditions for configuration tables.
In addition, teach ALTER EXTENSION DROP TABLE to check for an extconfig
entry for the target table, and remove it if present. This is not a 100%
solution because it's allowed for an extension update script to just
summarily DROP a member table, and that code path doesn't go through
ExecAlterExtensionContentsStmt. We could probably make that case clean
things up if we had to, but it would involve sticking a very ugly wart
somewhere in the guts of dependency.c. Since on the whole it seems quite
unlikely that extension updates would want to remove pre-existing
configuration tables, making the case possible with an explicit command
seems sufficient.
Per bug #7756 from Regina Obe. Back-patch to 9.1 where extensions were
introduced.
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.
There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.
START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of timeline reached, putting the walsender
back into accepting a replication command.
Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.
Commit 729205571e added privileges on data
types, but there were a number of oversights. The implementation of
default privileges for types missed a few places, and pg_dump was
utterly innocent of the whole concept. Per bug #7741 from Nathan Alden,
and subsequent wider investigation.
This patch makes "simple" views automatically updatable, without the need
to create either INSTEAD OF triggers or INSTEAD rules. "Simple" views
are those classified as updatable according to SQL-92 rules. The rewriter
transforms INSERT/UPDATE/DELETE commands on such views directly into an
equivalent command on the underlying table, which will generally have
noticeably better performance than is possible with either triggers or
user-written rules. A view that has INSTEAD OF triggers or INSTEAD rules
continues to operate the same as before.
For the moment, security_barrier views are not considered simple.
Also, we do not support WITH CHECK OPTION. These features may be
added in future.
Dean Rasheed, reviewed by Amit Kapila
For some reason lost in the mists of prehistory, RETURN was only coded to
allow a simple reference to a composite variable when the function's return
type is composite. Allow an expression instead, while preserving the
efficiency of the original code path in the case where the expression is
indeed just a composite variable's name. Likewise for RETURN NEXT.
As is true in various other places, the supplied expression must yield
exactly the number and data types of the required columns. There was some
discussion of relaxing that for pl/pgsql, but no consensus yet, so this
patch doesn't address that.
Asif Rehman, reviewed by Pavel Stehule
Background workers are postmaster subprocesses that run arbitrary
user-specified code. They can request shared memory access as well as
backend database connections; or they can just use plain libpq frontend
database connections.
Modules listed in shared_preload_libraries can register background
workers in their _PG_init() function; this is early enough that it's not
necessary to provide an extra GUC option, because the necessary extra
resources can be allocated early on. Modules can install more than one
bgworker, if necessary.
Care is taken that these extra processes do not interfere with other
postmaster tasks: only one such process is started on each ServerLoop
iteration. This means a large number of them could be waiting to be
started up and postmaster is still able to quickly service external
connection requests. Also, shutdown sequence should not be impacted by
a worker process that's reasonably well behaved (i.e. promptly responds
to termination signals.)
The current implementation lets worker processes specify their start
time, i.e. at what point in the server startup process they are to be
started: right after postmaster start (in which case they mustn't ask
for shared memory access), when consistent state has been reached
(useful during recovery in a HOT standby server), or when recovery has
terminated (i.e. when normal backends are allowed).
In case of a bgworker crash, actions to take depend on registration
data: if shared memory was requested, then all other connections are
taken down (as well as other bgworkers), just like it were a regular
backend crashing. The bgworker itself is restarted, too, within a
configurable timeframe (which can be configured to be never).
More features to add to this framework can be imagined without much
effort, and have been discussed, but this seems good enough as a useful
unit already.
An elementary sample module is supplied.
Author: Álvaro Herrera
This patch is loosely based on prior patches submitted by KaiGai Kohei,
and unsubmitted code by Simon Riggs.
Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
Heikki Linnakangas, Simon Riggs, Amit Kapila
storage.
Have pg_upgrade use it, and enable server options fsync=off and
full_page_writes=off.
Document that users turning fsync from off to on should run initdb
--sync-only.
[ Previous commit was incorrectly applied as a git merge. ]
We've generally recommended use of INSTEAD triggers over rules since that
feature was added; but this old text in the CREATE VIEW reference page
didn't get the memo. Noted by Thomas Kellerer.
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY
and pre-conditions met then rows will also be frozen.
Both options designed to avoid revisiting rows after commit,
increasing performance of subsequent commands after
data load and upgrade. pg_restore changes later.
Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner
This allows a caller to get back the exact conninfo array that was
used to create a connection, including parameters read from the
environment.
In doing this, restructure how options are copied from the conninfo
to the actual connection.
Zoltan Boszormenyi and Magnus Hagander
Commit 8cb53654db, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation. The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY. This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions. Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.
To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)
In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update. The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.
In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate. These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation. Previously we
could have been left with stale values of some fields in an index relcache
entry. It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.
In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.
This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.
Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
This reverts commit d573e239f0, "Take fewer
snapshots". While that seemed like a good idea at the time, it caused
execution to use a snapshot that had been acquired before locking any of
the tables mentioned in the query. This created user-visible anomalies
that were not present in any prior release of Postgres, as reported by
Tomas Vondra. While this whole area could do with a redesign (since there
are related cases that have anomalies anyway), it doesn't seem likely that
any future patch would be reasonably back-patchable; and we don't want 9.2
to exhibit a behavior that's subtly unlike either past or future releases.
Hence, revert to prior code while we rethink the problem.
This way it works more like the DSSSL build, and dependencies are
tracked better by make.
Also copy the CSS stylesheet to the html directory. This was forgotten
when the output directory was changed.
Some versions of the XSLT stylesheets don't handle the missing slash
correctly (they concatenate directory and file name without the slash).
This might never have worked correctly.
Without this, the connection will be killed after timeout if
wal_sender_timeout is set in the server.
Original patch by Amit Kapila, modified by me to fit recent changes in the
code.
We used to send structs wrapped in CopyData messages, which works as long as
the client and server agree on things like endianess, timestamp format and
alignment. That's good enough for running a standby server, which has to run
on the same platform anyway, but it's useful for tools like pg_receivexlog
to work across platforms.
This breaks protocol compatibility of streaming replication, but we never
promised that to be compatible across versions, anyway.
Represent a sequence's current value as a separate TableDataInfo dumpable
object, so that it can be dumped within the data section of the archive
rather than in pre-data. This fixes an undesirable inconsistency between
the meanings of "--data-only" and "--section=data", and also fixes dumping
of sequences that are marked as extension configuration tables, as per a
report from Marko Kreen back in July. The main cost is that we do one more
SQL query per sequence, but that's probably not very meaningful in most
databases.
Back-patch to 9.1, since it has the extension configuration issue even
though not the --section switch.
... and have sepgsql use it to determine whether to check permissions
during certain operations. Indexes that are being created as a result
of REINDEX, for instance, do not need to have their permissions checked;
they were already checked when the index was created.
Author: KaiGai Kohei, slightly revised by me
In commit 4317e0246c, I accidentally broke
this behavior while rearranging code to ensure that --create wouldn't
affect whether a DATABASE entry gets put into archive-format output.
Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is
either useless or dangerous depending on the usage scenario.
It should not do that, and no longer does.
A bright spot is that this refactoring makes it easy to allow the
combination of --clean and --create to work sensibly, ie, emit DROP
DATABASE then CREATE DATABASE before reconnecting. Ordinarily we'd
consider that a feature addition and not back-patch it, but it seems
silly to not include the extra couple of lines required in the 9.2
version of the code.
Per report from Guillaume Lelarge, though this is slightly more extensive
than his proposed patch.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.
It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.
Amit Kapila, heavily edited by me.
dblink now has its own validator function dblink_fdw_validator(), which is
better than the core function postgresql_fdw_validator() because it gets
the list of legal options from libpq instead of having a hard-wired list.
Make the dblink extension module provide a standard foreign data wrapper
dblink_fdw that encapsulates use of this validator, and recommend use of
that wrapper instead of making up wrappers on the fly.
Unfortunately, because ad-hoc wrappers *were* recommended practice
previously, it's not clear when we can get rid of postgresql_fdw_validator
without causing upgrade problems. But this is a step in the right
direction.
Shigeru Hanada, reviewed by KaiGai Kohei
libpq defines these functions as accepting "size_t" lengths ... but the
underlying backend functions expect signed int32 length parameters, and so
will misinterpret any value exceeding INT_MAX. Fix the libpq side to throw
error rather than possibly doing something unexpected.
This is a bug of long standing, but I doubt it's worth back-patching. The
problem is really pretty academic anyway with lo_read/lo_write, since any
caller expecting sane behavior would have to have provided a multi-gigabyte
buffer. It's slightly more pressing with lo_truncate, but still we haven't
supported large objects over 2GB until now.
Copy-editing for previous patch, plus fixing some longstanding markup
issues and oversights (like not mentioning that failures will set the
PQerrorMessage string).
4TB large objects (standard 8KB BLCKSZ case). For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.
Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
Use the terms "simple bind" and "search+bind" consistently do
distinguish the two modes (better than first mode and second mode in
any case). They were already used in some places, now it's just more
prominent.
Split up the list of options into one for common options and one for
each mode, for clarity.
Add configuration examples for either mode.
These reference pages still claimed that you have to be superuser to create
a database or schema owned by a different role. That was true before 8.1,
but it was changed in commits aa1110624c and
f91370cd2f to allow assignment of ownership
to any role you are a member of. However, at the time we were thinking of
that primarily as a change to the ALTER OWNER rules, so the need to touch
these two CREATE ref pages got missed.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.
Fabrízio de Royes Mello
This allows logging only some fraction of transactions, greatly reducing
the amount of log generated.
Tomas Vondra, reviewed by Robert Haas and Jeff Janes.
You can now get the number of rows processed by a COPY statement in a
PL/pgSQL function with "GET DIAGNOSTICS x = ROW_COUNT".
Pavel Stehule, reviewed by Amit Kapila, with some editing by me.
Both programs got the "magic" string wrong, causing standard-conforming tar
implementations to believe the output was just legacy tar format without
any POSIX extensions. This doesn't actually matter that much, especially
since pg_dump failed to fill the POSIX fields anyway, but still there is
little point in emitting tar format if we can't be compliant with the
standard. In addition, pg_dump failed to write the EOF marker correctly
(there should be 2 blocks of zeroes not just one), pg_basebackup put the
numeric group ID in the wrong place, and both programs had a pretty
brain-dead idea of how to compute the checksum. Fix all that and improve
the comments a bit.
pg_restore is modified to accept either the correct POSIX-compliant "magic"
string or the previous value. This part of the change will need to be
back-patched to avoid an unnecessary compatibility break when a previous
version tries to read tar-format output from 9.3 pg_dump.
Brian Weaver and Tom Lane
The syntax "su -c 'command' username" is not accepted by all versions of
su, for example not OpenBSD's. More portable is "su username -c
'command'". So change runtime.sgml to recommend that syntax. Also,
add a -D switch to the OpenBSD example script, for consistency with other
examples. Per Denis Lapshin and Gábor Hidvégi.
This allows easily splitting configuration into many files, deployed in a
directory.
Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
Produce a NOTICE when the label already exists, for consistency with other
CREATE IF NOT EXISTS commands. Also, fix the code so it produces something
more user-friendly than an index violation when the label already exists.
This not incidentally enables making a regression test that the previous
patch didn't make for fear of exposing an unpredictable OID in the results.
Also some wordsmithing on the documentation.
If the label is already in the enum the statement becomes a no-op.
This will reduce the pain that comes from our not allowing this
operation inside a transaction block.
Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.
Somewhere along the line, somebody decided to remove all trace of this
notation from the documentation text. It was still in the command syntax
synopses, or at least some of them, but with no indication what it meant.
This will not do, as evidenced by the confusion apparent in bug #7543;
even if the notation is now unnecessary, people will find it in legacy
SQL code and need to know what it does.
The documentation mentioned setting autovacuum_freeze_max_age to
"its maximum allowed value of a little less than two billion".
This led to a post asking about the exact maximum allowed value,
which is precisely two billion, not "a little less".
Based on question by Radovan Jablonovsky. Backpatch to 8.3.
When starting either an old or new postmaster, force it to place its Unix
socket in the current directory. This makes it even harder for accidental
connections to occur during pg_upgrade, and also works around some
scenarios where the default socket location isn't usable. (For example,
if the default location is something other than "/tmp", it might not exist
during "make check".)
When checking an already-running old postmaster, find out its actual socket
directory location from postmaster.pid, if possible. This dodges problems
with an old postmaster having a configured location different from the
default built into pg_upgrade's libpq. We can't find that out if the old
postmaster is pre-9.1, so also document how to cope with such scenarios
manually.
In support of this, centralize handling of the connection-related command
line options passed to pg_upgrade's subsidiary programs, such as pg_dump.
This should make future changes easier.
Bruce Momjian and Tom Lane
Extend xfunc.sgml's discussion of set-returning functions to show an
example of using LATERAL, and recommend that over putting SRFs in the
targetlist.
In passing, reword func.sgml's section on set-returning functions so
that it doesn't claim that the functions listed therein are all the
built-in set-returning functions. That hasn't been true for a long
time, and trying to make it so doesn't seem like it would be an
improvement. (Perhaps we should rename that section?)
Both per suggestions from Merlin Moncure.
Only warn when connecting to a newer server, since connecting to older
servers works pretty well nowadays. Also update the documentation a
little about current psql/server compatibility expectations.
The existing documentation in Linux Memory Overcommit seemed to
assume that PostgreSQL itself could never be the problem, or at
least it didn't tell you what to do about it.
Per discussion with Craig Ringer and Kevin Grittner.
This is so that these sections will have stable HTML tags that one can
link to, rather than things like "AEN1902". Perhaps we should mount a
campaign to do this everywhere, but I've found myself pointing at
syntax.sgml subsections often enough to be sure it's useful here.
As of 9.2, constraint exclusion should work okay with prepared statements:
the planner will try custom plans with actual values of the parameters,
and observe that they are a lot cheaper than the generic plan, and thus
never fall back to using the generic plan. Noted by Tatsuhito Kasahara.
The docs claimed that this mode only waits for the standby to receive WAL
data, but actually it waits for the data to be written out to the standby's
OS; which is a pretty significant difference because it removes the risk of
crash of the walreceiver process.
libxslt offers the ability to read and write both files and URLs through
stylesheet commands, thus allowing unprivileged database users to both read
and write data with the privileges of the database server. Disable that
through proper use of libxslt's security options.
Also, remove xslt_process()'s ability to fetch documents and stylesheets
from external files/URLs. While this was a documented "feature", it was
long regarded as a terrible idea. The fix for CVE-2012-3489 broke that
capability, and rather than expend effort on trying to fix it, we're just
going to summarily remove it.
While the ability to write as well as read makes this security hole
considerably worse than CVE-2012-3489, the problem is mitigated by the fact
that xslt_process() is not available unless contrib/xml2 is installed,
and the longstanding warnings about security risks from that should have
discouraged prudent DBAs from installing it in security-exposed databases.
Reported and fixed by Peter Eisentraut.
Security: CVE-2012-3488
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.
This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.
Honza Horak, reviewed and revised by Tom Lane
Should be limited to the maximum number of connections excluding
autovacuum workers, not including.
Add similar check for max_wal_senders, which should never be higher than
max_connections.
Previously, the -1 option was silently ignored.
Also, emit an error if -1 is used in a context where it won't be
respected, to avoid user confusion.
Original patch by Fabien COELHO, but this version is quite different
from the original submission.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.
The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned. The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags. Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.
In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason. (It was mainly
add_missing_from that caused so much fudging here in the first place.)
The planner support for this feature is very minimal, and will be improved
in future patches. It works well enough for testing purposes, though.
catversion bump forced due to new field in RangeTblEntry.
century specifications just like positive/AD centuries. Previously the
behavior was either wrong or inconsistent with positive/AD handling.
Centuries without years now always assume the first year of the century,
which is now documented.
After taking awhile to digest the row-processor feature that was added to
libpq in commit 92785dac2e, we've concluded
it is over-complicated and too hard to use. Leave the core infrastructure
changes in place (that is, there's still a row processor function inside
libpq), but remove the exposed API pieces, and instead provide a "single
row" mode switch that causes PQgetResult to return one row at a time in
separate PGresult objects.
This approach incurs more overhead than proper use of a row processor
callback would, since construction of a PGresult per row adds extra cycles.
However, it is far easier to use and harder to break. The single-row mode
still affords applications the primary benefit that the row processor API
was meant to provide, namely not having to accumulate large result sets in
memory before processing them. Preliminary testing suggests that we can
probably buy back most of the extra cycles by micro-optimizing construction
of the extra results, but that task will be left for another day.
Marko Kreen
The most user-visible part of this is to change the long options
--statusint and --noloop to --status-interval and --no-loop,
respectively, per discussion.
Also, consistently enclose file names in double quotes, per our
conventions; and consistently use the term "transaction log file" to
talk about WAL segments. (Someday we may need to go over this
terminology and make it consistent across the whole source code.)
Finally, reflow the code to better fit in 80 columns, and have pgindent
fix it up some more.
The initially implemented syntax, "CHECK NO INHERIT (expr)" was not
deemed very good, so switch to "CHECK (expr) NO INHERIT" instead. This
way it looks similar to SQL-standards compliant constraint attribute.
Backport to 9.2 where the new syntax and feature was introduced.
Per discussion.
This is apparently faster than doing things the other way around when
the scale factor is large.
Along the way, adjust -n to suppress vacuuming during initialization
as well as during test runs.
Jeff Janes, with some small changes by me.
Commit 3855968f32 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do. This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.
Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.
Dimitri Fontaine and Robert Haas
They don't actually do anything yet; that will get fixed in a
follow-on commit. But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.
Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
Historically we have not worried about fsync'ing anything during initdb
(in fact, initdb intentionally passes -F to each backend launch to prevent
it from fsync'ing). But with filesystems getting more aggressive about
caching data, that's not such a good plan anymore. Make initdb do a pass
over the finished data directory tree to fsync everything. For testing
purposes, the -N/--nosync flag can be used to restore the old behavior.
Also, testing shows that on Linux, sync_file_range() is much faster than
posix_fadvise() for hinting to the kernel that an fsync is coming,
apparently because the latter blocks on a rather small request queue while
the former doesn't. So use this function if available in initdb, and also
in the backend's pg_flush_data() (where it currently will affect only the
speed of CREATE DATABASE's cloning step).
We will later make pg_regress invoke initdb with the --nosync flag
to avoid slowing down cases such as "make check" in contrib. But
let's not do so until we've shaken out any portability issues in this
patch.
Jeff Davis, reviewed by Andres Freund
These functions support removing or replacing array element value(s)
matching a given search value. Although intended mainly to support a
future array-foreign-key feature, they seem useful in their own right.
Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
This hasn't been true since 9.1, when the default was changed to -1.
Remove the reference completely, keeping the discussion of the parameter
and it's shared memory effects on the config page.
Instead of letting every backend participating in a group commit wait
independently, have the first one that becomes ready to flush WAL wait
for the configured delay, and let all the others wait just long enough
for that first process to complete its flush. This greatly increases
the chances of being able to configure a commit_delay setting that
actually improves performance.
As a side consequence of this change, commit_delay now affects all WAL
flushes, rather than just commits. There was some discussion on
pgsql-hackers about whether to rename the GUC to, say, wal_flush_delay,
but in the absence of consensus I am leaving it alone for now.
Peter Geoghegan, with some changes, mostly to the documentation, by me.
A similar change was made previously for pg_cancel_backend, so now it
all matches again.
Dan Farina, reviewed by Fujii Masao, Noah Misch, and Jeff Davis,
with slight kibitzing on the doc changes by me.
The xlogid + segno representation of a particular WAL segment doesn't make
much sense in pg_resetxlog anymore, now that we don't use that anywhere
else. Use the WAL filename instead, since that's a convenient way to name a
particular WAL segment.
I did this partially for pg_resetxlog in the original xlogid/segno -> uint64
patch, but I neglected pg_upgrade and the docs. This should now be more
complete.
The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits. Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.
Remove the typedefs for int2 and int4 for now. They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.
Also, add some cross-links to the indexing documentation, so it's easier
to notice that && and other array operators have index support.
Ryan Kelly, edited by me.
The option --foreign-keys, used at initialization time, will create foreign
key constraints for the columns that represent references to other tables'
primary keys. This can help in benchmarking FK performance.
Jeff Janes
Previously, when executing an ON UPDATE SET NULL or SET DEFAULT action for
a multicolumn MATCH SIMPLE foreign key constraint, we would set only those
referencing columns corresponding to referenced columns that were changed.
This is what the SQL92 standard said to do --- but more recent versions
of the standard say that all referencing columns should be set to null or
their default values, no matter exactly which referenced columns changed.
At least for SET DEFAULT, that is clearly saner behavior. It's somewhat
debatable whether it's an improvement for SET NULL, but it appears that
other RDBMS systems read the spec this way. So let's do it like that.
This is a release-notable behavioral change, although considering that
our documentation already implied it was done this way, the lack of
complaints suggests few people use such cases.
Previously we followed the SQL92 wording, "MATCH <unspecified>", but since
SQL99 there's been a less awkward way to refer to the default style.
In addition to the code changes, pg_constraint.confmatchtype now stores
this match style as 's' (SIMPLE) rather than 'u' (UNSPECIFIED). This
doesn't affect pg_dump or psql because they use pg_get_constraintdef()
to reconstruct foreign key definitions. But other client-side code might
examine that column directly, so this change will have to be marked as
an incompatibility in the 9.3 release notes.
Before, some places didn't document the short options (-? and -V),
some documented both, some documented nothing, and they were listed in
various orders. Now this is hopefully more consistent and complete.
Since this is the easy way of doing it, it should be listed first. All
the old information is retained for those who want the more advanced way.
Also adds a subheading for compressing logs, that seems to have been missing
Aside from adjusting the documentation to say that these are deprecated,
we now report a warning (not an error) for use of GLOBAL, since it seems
fairly likely that we might change that to request SQL-spec-compliant temp
table behavior in the foreseeable future. Although our handling of LOCAL
is equally nonstandard, there is no evident interest in ever implementing
SQL modules, and furthermore some other products interpret LOCAL as
behaving the same way we do. So no expectation of change and no warning
for LOCAL; but it still seems a good idea to deprecate writing it.
Noah Misch
The simplest way to handle this is just to copy-and-paste the relevant
code block in fork_process.c, so that's what I did. (It's possible that
something more complicated would be useful to packagers who want to work
with either the old or the new API; but at this point the number of such
people is rapidly approaching zero, so let's just get the minimal thing
done.) Update relevant documentation as well.
Remove a couple of items that were actually back-patched bug fixes.
Add additional details to a couple of items which lacked a description.
Improve attributions for a couple of items I was involved with.
A few other miscellaneous corrections.
getopt_long() allows abbreviating long options, so we might as well
give the option the full name, and users can abbreviate it how they
like.
Do some general polishing of the --help output at the same time.
To replace it, add -X/--xlog-method that allows the specification
of fetch or stream.
Do this to avoid unnecessary backwards-incompatiblity. Spotted and
suggested by Peter Eisentraut.
Since the replication protocol deals with TimestampTz, we need to
care for the floating point case as well in the frontend tools.
Fujii Masao, with changes from Magnus Hagander
The initial implementation of pg_dump's --section option supposed that the
existing --schema-only and --data-only options could be made equivalent to
--section settings. This is wrong, though, due to dubious but long since
set-in-stone decisions about where to dump SEQUENCE SET items, as seen in
bug report from Martin Pitt. (And I'm not totally convinced there weren't
other bugs, either.) Undo that coupling and instead drive --section
filtering off current-section state tracked as we scan through the TOC
list to call _tocEntryRequired().
To make sure those decisions don't shift around and hopefully save a few
cycles, run _tocEntryRequired() only once per TOC entry and save the result
in a new TOC field. This required minor rejiggering of ACL handling but
also allows a far cleaner implementation of inhibit_data_for_failed_table.
Also, to ensure that pg_dump and pg_restore have the same behavior with
respect to the --section switches, add _tocEntryRequired() filtering to
WriteToc() and WriteDataChunks(), rather than trying to implement section
filtering in an entirely orthogonal way in dumpDumpableObject(). This
required adjusting the handling of the special ENCODING and STDSTRINGS
items, but they were pretty weird before anyway.
Minor other code review for the patch, too.
Drop special handling of host component with slashes to mean
Unix-domain socket. Specify it as separate parameter or using
percent-encoding now.
Allow omitting username, password, and port even if the corresponding
designators are present in URI.
Handle percent-encoding in query parameter keywords.
Alex Shulgin
some documentation improvements by myself
Per discussion, we should explain that we follow RFC 3339 and not really
the letter of the ISO 8601 spec for timestamp output format. Mostly
Brendan Jurd's wording, though I tweaked it to clarify that we do take 'T'
on input. Minor additional copy-editing and markup-tweaking, too.
initdb: Add -T option
oid2name: Put options in some non-random order
pg_dump: Put --section option in the right place
And some additional markup and terminology improvements.
Since we've got an "open items" list item about this, apparently some
people are pretty worried about it.
In passing remove a lot of trailing whitespace.
This example was quite old: it lacked the WAL writer and autovac launcher
as well as the more recently added checkpointer. Linux "ps" seems to show
slightly different stuff now too.
We previously recognized that citext wouldn't get marked as collatable
during pg_upgrade from a pre-9.1 installation, and hacked its
create-from-unpackaged script to manually perform the necessary catalog
adjustments. However, we overlooked the fact that domains over citext,
as well as the citext[] array type, need the same adjustments. Extend
the script to handle those cases.
Also, the documentation suggested that this was only an issue in pg_upgrade
scenarios, which is quite wrong; loading any dump containing citext from a
pre-9.1 server will also result in the type being wrongly marked.
I approached the documentation problem by changing the 9.1.2 release note
paragraphs about this issue, which is historically inaccurate. But it
seems better than having the information scattered in multiple places, and
leaving incorrect info in the 9.1.2 notes would be bad anyway. We'll still
need to mention the issue again in the 9.1.4 notes, but perhaps they can
just reference 9.1.2 for fix instructions.
Per report from Evan Carroll. Back-patch into 9.1.
Rewrite description of "include_if_exists" for clarity. Add subsection
headings to make the structure of the page a little clearer. A couple
other minor improvements too.
Josh Kupershmidt and Tom Lane
HEAD documentation was failing to build as US PDF for me, because a link
to "CREATE CAST" was getting split across pages. Adjust wording to
remove this rather gratuitous cross-reference.
It was already on its last legs, and it turns out that it was
accidentally broken in commit 89e850e6fd
and no one cared. So remove the rest the support for it and update
the documentation to indicate that Python 2.3 is now required.
It'd be nice to be able to spell Jan Urbanski's name with the correct
accent marks, but we haven't yet found a way that works in everybody's
docs toolchain. This way definitely doesn't.
Create separate appendixes for contrib extensions and other server
plugins on the one hand, and utility programs on the other. Recast
the documentation of the latter as refentries, so that man pages are
generated.
This adds the variable COMP_KEYWORD_CASE, which controls in what case
keywords are completed. This is partially to let users configure the
change from commit 69f4f1c357, but it
also offers more behaviors than were available before.
Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
The default for the choice attribute of the <arg> element is "opt",
which would normally put the argument inside brackets. But the DSSSL
stylesheets contain a hack that treats <arg> directly inside <group>
specially, so that <group><arg>-x</arg><arg>-y</arg></group> comes out
as [ -x | -y ] rather than [ [-x] | [-y] ], which it would technically
be. But when building man pages, this doesn't work, and so the
command synopses on the man pages contain lots of extra brackets.
By putting choice="opt" or choice="plain" explicitly on every <arg>
and <group> element, we avoid any toolchain dependencies like that,
and it also makes it clearer in the source code what is meant.
In passing, make some small corrections in the documentation about
which arguments are really optional or not.
Remove the following ports:
- dgux
- nextstep
- sunos4
- svr4
- ultrix4
- univel
These are obsolete and not worth rescuing. In most cases, there is
circumstantial evidence that they wouldn't work anymore anyway.
Add more markup in particular so that the command options appear
consistently in monospace in the HTML output.
On the vacuumdb reference page, remove listing all the possible
options in the synopsis. They have become too many now; we have the
detailed options list for that.
We had changed this from the default bold to monospace for all output
formats, but for man pages, this creates visual inconsistencies, so
revert to the default for man pages.
This patch adjusts the core statistics views to match the decision already
taken for pg_stat_statements, that values representing elapsed time should
be represented as float8 and measured in milliseconds. By using float8,
we are no longer tied to a specific maximum precision of timing data.
(Internally, it's still microseconds, but we could now change that without
needing changes at the SQL level.)
The columns affected are
pg_stat_bgwriter.checkpoint_write_time
pg_stat_bgwriter.checkpoint_sync_time
pg_stat_database.blk_read_time
pg_stat_database.blk_write_time
pg_stat_user_functions.total_time
pg_stat_user_functions.self_time
pg_stat_xact_user_functions.total_time
pg_stat_xact_user_functions.self_time
The first four of these are new in 9.2, so there is no compatibility issue
from changing them. The others require a release note comment that they
are now double precision (and can show a fractional part) rather than
bigint as before; also their underlying statistics functions now match
the column definitions, instead of returning bigint microseconds.
Get rid of the per-column documentation of underlying functions, which did
far more to clutter the view descriptions than it did to be helpful, and
was rather incomplete and typo-ridden anyway. Instead suggest that people
consult the definitions of the standard views to see the underlying
functions.
The older functions for obtaining individual facts about backends are now
somewhat obsoleted by pg_stat_get_activity, which means that they are not
documented by any standard view. So I put that information into a separate
table. (Maybe we should just deprecate them instead?)
In passing, fix a couple more documentation errors.
Display total time and I/O timings in milliseconds, for consistency with
the units used for timings in the core statistics views. The columns
remain of float8 type, so that sub-msec precision is available. (At some
point we will probably want to convert the core views to use float8 type
for the same reason, but this patch does not touch that issue.)
This is a release-note-requiring change in the meaning of the total_time
column. The I/O timing columns are new as of 9.2, so there is no
compatibility impact from redefining them.
Do some minor copy-editing in the documentation, too.
Get rid of section 8.5.6 (Date/Time Internals), which appears to confuse
people more than it helps, and anyway discussion of Postgres' internal
datetime calculation methods seems pretty out of place here. Instead,
make datatype.sgml just say that we follow the Gregorian calendar (a bit
of specification not previously present anywhere in that chapter :-()
and link to the History of Units appendix for more info. Do some mild
editorialization on that appendix, too, to make it clearer that we are
following proleptic Gregorian calendar rules rather than anything more
historically accurate.
Per a question from Florence Cousin and subsequent discussion in
pgsql-docs.
Prohibiting this outright would break dumps taken from older versions
that contain such casts, which would create far more pain than is
justified here.
Per report by Jaime Casanova and subsequent discussion.
The original syntax wasn't universally loved, and it didn't allow its
usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.
The pg_constraint column has accordingly been renamed connoinherit.
This commit partly reverts some of the changes in
61d81bd28d, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.
Author: Nikhil Sontakke
Some tweaks by me
The result object methods colnames() etc. would crash when called
after a command that did not produce a result set. Now they throw an
exception.
discovery and initial patch by Jean-Baptiste Quenot
The output of the new pg_xlog_location_diff function is of type numeric,
since it could theoretically overflow an int8 due to signedness; this
provides a convenient way to format such values.
Fujii Masao, with some beautification by me.
Per mailing list discussion, we would like to keep the bytea functions
parallel to the text functions, so rename bytea_agg to string_agg,
which already exists for text.
Also, to satisfy the rule that we don't want aggregate functions of
the same name with a different number of arguments, add a delimiter
argument, just like string_agg for text already has.
Previously we attempted to throw an error or at least warning for missing
schemas, but this was done inconsistently because of implementation
restrictions (in many cases, GUC settings are applied outside transactions
so that we can't do system catalog lookups). Furthermore, there were
exceptions to the rule even in the beginning, and we'd been poking more
and more holes in it as time went on, because it turns out that there are
lots of use-cases for having some irrelevant items in a common search_path
value. It seems better to just adopt a philosophy similar to what's always
been done with Unix PATH settings, wherein nonexistent or unreadable
directories are silently ignored.
This commit also fixes the documentation to point out that schemas for
which the user lacks USAGE privilege are silently ignored. That's always
been true but was previously not documented.
This is mostly in response to Robert Haas' complaint that 9.1 started to
throw errors or warnings for missing schemas in cases where prior releases
had not. We won't adopt such a significant behavioral change in a back
branch, so something different will be needed in 9.1.
postgres:// URIs are an attempt to "stop the bleeding" in this general
area that has been said to occur due to external projects adopting their
own syntaxes. The syntaxes supported by this patch:
postgres://[user[:pwd]@][unix-socket][:port[/dbname]][?param1=value1&...]
postgres://[user[:pwd]@][net-location][:port][/dbname][?param1=value1&...]
should be enough to cover most interesting cases without having to
resort to "param=value" pairs, but those are provided for the cases that
need them regardless.
libpq documentation has been shuffled around a bit, to avoid stuffing
all the format details into the PQconnectdbParams description, which was
already a bit overwhelming. The list of keywords has moved to its own
subsection, and the details on the URI format live in another subsection.
This includes a simple test program, as requested in discussion, to
ensure that interesting corner cases continue to work appropriately in
the future.
Author: Alexander Shulgin
Some tweaking by Álvaro Herrera, Greg Smith, Daniel Farina, Peter Eisentraut
Reviewed by Robert Haas, Alexey Klyukin (offlist), Heikki Linnakangas,
Marko Kreen, and others
Oh, it also supports postgresql:// but that's probably just an accident.
This patch reverts commit 191ef2b407
and thereby restores the pre-7.3 behavior of EXTRACT(EPOCH FROM
timestamp-without-tz). Per discussion, the more recent behavior was
misguided on a couple of grounds: it makes it hard to get a
non-timezone-aware epoch value for a timestamp, and it makes this one
case dependent on the value of the timezone GUC, which is incompatible
with having timestamp_part() labeled as immutable.
The other behavior is still available (in all releases) by explicitly
casting the timestamp to timestamp with time zone before applying EXTRACT.
This will need to be called out as an incompatible change in the 9.2
release notes. Although having mutable behavior in a function marked
immutable is clearly a bug, we're not going to back-patch such a change.
It's still non-deterministic in some sense ... but given fixed settings
and identical planning problems, it will now always choose the same plan,
so we probably shouldn't tar it with that brush. Per bug #6565 from
Guillaume Cottenceau. Back-patch to 9.0 where the behavior was fixed.
If we make the initially-called function return the table physical-size
estimate, acquire_inherited_sample_rows will be able to use that to
allocate numbers of samples among child tables, when the day comes that
we want to support foreign tables in inheritance trees.
ANALYZE now accepts foreign tables and allows the table's FDW to control
how the sample rows are collected. (But only manual ANALYZEs will touch
foreign tables, for the moment, since among other things it's not very
clear how to handle remote permissions checks in an auto-analyze.)
contrib/file_fdw is extended to support this.
Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.
Ants Aasma's original patch to add timing information for buffer I/O
requests exposed this data at the relation level, which was judged too
costly. I've here exposed it at the database level instead.
This patch provides a test case for libpq's row processor API.
contrib/dblink can deal with very large result sets by dumping them into
a tuplestore (which can spill to disk) --- but until now, the intermediate
storage of the query result in a PGresult meant memory bloat for any large
result. Now we use a row processor to convert the data to tuple form and
dump it directly into the tuplestore.
A limitation is that this only works for plain dblink() queries, not
dblink_send_query() followed by dblink_get_result(). In the latter
case we don't know the desired tuple rowtype soon enough. While hack
solutions to that are possible, a different user-level API would
probably be a better answer.
Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
Traditionally libpq has collected an entire query result before passing
it back to the application. That provides a simple and transactional API,
but it's pretty inefficient for large result sets. This patch allows the
application to process each row on-the-fly instead of accumulating the
rows into the PGresult. Error recovery becomes a bit more complex, but
often that tradeoff is well worth making.
Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
pg_stat_statements now hashes selected fields of the analyzed parse tree
to assign a "fingerprint" to each query, and groups all queries with the
same fingerprint into a single entry in the pg_stat_statements view.
In practice it is expected that queries with the same fingerprint will be
equivalent except for values of literal constants. To make the display
more useful, such constants are replaced by "?" in the displayed query
strings.
This mechanism currently supports only optimizable queries (SELECT,
INSERT, UPDATE, DELETE). Utility commands are still matched on the
basis of their literal query strings.
There remain some open questions about how to deal with utility statements
that contain optimizable queries (such as EXPLAIN and SELECT INTO) and how
to deal with expiring speculative hashtable entries that are made to save
the normalized form of a query string. However, fixing these issues should
require only localized changes, and since there are other open patches
involving contrib/pg_stat_statements, it seems best to go ahead and commit
what we've got.
Peter Geoghegan, reviewed by Daniel Farina
Currently, the only way to see the numbers this gathers is via
EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through
the stats collector and pg_stat_statements in subsequent patches.
Ants Aasma, reviewed by Greg Smith, with some further changes by me.
Fix loss of previous expression-simplification work when a transform
function fires: we must not simply revert to untransformed input tree.
Instead build a dummy FuncExpr node to pass to the transform function.
This has the additional advantage of providing a simpler, more uniform
API for transform functions.
Move documentation to a somewhat less buried spot, relocate some
poorly-placed code, be more wary of null constants and invalid typmod
values, add an opr_sanity check on protransform function signatures,
and some other minor cosmetic adjustments.
Note: although this patch touches pg_proc.h, no need for catversion
bump, because the changes are cosmetic and don't actually change the
intended catalog contents.
Per a suggestion from Euler Taveira, it seems like a good idea to include
this information in \du (and \dg) output. This costs nothing for people
who are not using the VALID UNTIL feature, while for those who are, it's
rather critical information.
Fabrízio de Royes Mello
PGAC_PATH_COLLATEINDEX supposed that it could use AC_PATH_PROGS to search
for collateindex.pl, but that macro will only accept files that are marked
executable, and at least some DocBook installations don't mark the script
executable (a case the docs Makefile was already prepared for). Accept the
script if it's present and readable in $DOCBOOKSTYLE/bin, and otherwise
search the PATH as before.
Having fixed that up, we don't need the fallback case that was in the docs
Makefile, and instead can throw an understandable error if configure didn't
find the script. Per recent trouble report from John Lumby.
Document that routine vacuuming is now also important for the purpose
of index-only scans; and mention in the section that describes the
visibility map that it is used to implement index-only scans.
Marti Raudsepp, with some changes by me.
Instead of just stopping after removing an arbitrary subset of orphaned
large objects, commit and start a new transaction after each -l objects.
This is just as effective as the original patch at limiting the number of
locks used, and it doesn't require doing the OID collection process
repeatedly to get everything. Since the option no longer changes the
fundamental behavior of vacuumlo, and it avoids a known server-side
limitation, enable it by default (with a default limit of 1000 LOs per
transaction).
In passing, be more careful about properly quoting the names of tables
and fields, and do some other cosmetic cleanup.
This is intended as infrastructure to allow sepgsql to cooperate with
connection pooling software, by allowing the effective security label
to be set for each new connection.
KaiGai Kohei, reviewed by Yeb Havinga.
add ability to control permissions of created files
have psql echo its queries for easier debugging
output four separate log files, and delete them on success
add -r/--retain option to keep log files after success
make logs file append-only
remove -g/-G/-l logging options
sugggest tailing appropriate log file on failure
enhance -v/--verbose behavior
This patch fixes the other major compatibility-breaking limitation of
SPGiST, that it didn't store anything for null values of the indexed
column, and so could not support whole-index scans or "x IS NULL"
tests. The approach is to create a wholly separate search tree for
the null entries, and use fixed "allTheSame" insertion and search
rules when processing this tree, instead of calling the index opclass
methods. This way the opclass methods do not need to worry about
dealing with nulls.
Catversion bump is for pg_am updates as well as the change in on-disk
format of SPGiST indexes; there are some tweaks in SPGiST WAL records
as well.
Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev.
(The original also stored nulls separately, but it reused GIN code to do
so; which required undesirable compromises in the on-disk format, and
would likely lead to bugs due to the GIN code being required to work in
two very different contexts.)
The original API definition was incapable of supporting whole-index scans
because there was no way to invoke leaf-value reconstruction without
checking any qual conditions. Also, it was inefficient for
multiple-qual-condition scans because value reconstruction got done over
again for each qual condition, and because other internal work in the
consistent functions likewise had to be done for each qual. To fix these
issues, pass the whole scankey array to the opclass consistent functions,
instead of only letting them see one item at a time. (Essentially, the
loop over scankey entries is now inside the consistent functions not
outside them. This makes the consistent functions a bit more complicated,
but not unreasonably so.)
In itself this commit does nothing except save a few cycles in
multiple-qual-condition index scans, since we can't support whole-index
scans on SPGiST indexes until nulls are included in the index. However,
I consider this a must-fix for 9.2 because once we release it will get
very much harder to change the opclass API definition.
Further reflection shows that a single callback isn't very workable if we
desire to let FDWs generate multiple Paths, because that forces the FDW to
do all work necessary to generate a valid Plan node for each Path. Instead
split the former PlanForeignScan API into three steps: GetForeignRelSize,
GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking
the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain,
and it's substantially more flexible for complex FDWs.
Add an fdw_private field to RelOptInfo so that the new functions can save
state there rather than possibly having to recalculate information two or
three times.
In addition, we'd not thought through what would be needed to allow an FDW
to set up subexpressions of its choice for runtime execution. We could
treat ForeignScan.fdw_private as an executable expression but that seems
likely to break existing FDWs unnecessarily (in particular, it would
restrict the set of node types allowable in fdw_private to those supported
by expression_tree_walker). Instead, invent a separate field fdw_exprs
which will receive the postprocessing appropriate for expression trees.
(One field is enough since it can be a list of expressions; also, we assume
the corresponding expression state tree(s) will be held within fdw_state,
so we don't need to add anything to ForeignScanState.)
Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this
further as we continue to work on that patch, but to me it feels a lot
closer to being right now.
GetForeignColumnOptions provides some abstraction for accessing
column-specific FDW options, on a par with the access functions that were
already provided here for other FDW-related information.
Adjust file_fdw.c to use GetForeignColumnOptions instead of equivalent
hand-rolled code.
In addition, add some SGML documentation for the functions exported by
foreign.c that are meant for use by FDW authors.
(This is the fdw_helper portion of the proposed pgsql_fdw patch.)
Hanada Shigeru, reviewed by KaiGai Kohei
The original API specification only allowed an FDW to create a single
access path, which doesn't seem like a terribly good idea in hindsight.
Instead, move the responsibility for building the Path node and calling
add_path() into the FDW's PlanForeignScan function. Now, it can do that
more than once if appropriate. There is no longer any need for the
transient FdwPlan struct, so get rid of that.
Etsuro Fujita, Shigeru Hanada, Tom Lane
In backup.sgml, point out that you need to be using the logging collector
if you want to log messages from a failing archive_command script. (This
is an oversimplification, in that it will work without the collector as
long as you're not sending postmaster stderr to /dev/null; but it seems
like a good idea to encourage use of the collector to avoid problems
with multiple processes concurrently scribbling on one file.)
In config.sgml, do some wordsmithing of logging_collector discussion.
Per bug #6518 from Janning Vygen
Comparing two xlog locations are useful for example when calculating
replication lag.
Euler Taveira de Oliveira, reviewed by Fujii Masao, and some cleanups
from me
This patch improves selectivity estimation for the array <@, &&, and @>
(containment and overlaps) operators. It enables collection of statistics
about individual array element values by ANALYZE, and introduces
operator-specific estimators that use these stats. In addition,
ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"
and "const <> ANY/ALL (array_column)" are estimated by treating them as
variants of the containment operators.
Since we still collect scalar-style stats about the array values as a
whole, the pg_stats view is expanded to show both these stats and the
array-style stats in separate columns. This creates an incompatible change
in how stats for tsvector columns are displayed in pg_stats: the stats
about lexemes are now displayed in the array-related columns instead of the
original scalar-related columns.
There are a few loose ends here, notably that it'd be nice to be able to
suppress either the scalar-style stats or the array-element stats for
columns for which they're not useful. But the patch is in good enough
shape to commit for wider testing.
Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
The only reason this didn't work before was that parserOpenTable()
rejects composite types. So use relation_openrv() directly and
manually do the errposition() setup that parserOpenTable() does.
For those of us who prefer the formatting of the docs using the
website stylesheets. Use "make STYLE=website draft" (for example) to use.
The stylesheet itself is referenced directly to the website, so there
is currently no copy of it stored in the source repository. Thus, docs
built with it will only look correct if the browser can access the website
when viewing them.
The <literal> markup is not visible as distinct on man pages, which
creates a bit of confusion when looking at the documentation of the
pg_basebackup -l option. Rather than reinventing the entire font
system for man pages to remedy this, just put some quotes around this
particular case, which should also help in other output formats.
Several places were still written as though standard_conforming_strings
didn't exist, much less be the default. Now that it is on by default,
we can simplify the text and just insert occasional notes suggesting that
you might have to think harder if it's turned off. Per discussion of a
suggestion from Hannes Frederic Sowa.
Back-patch to 9.1 where standard_conforming_strings was made the default.
Most people won't read them individually anyway, it's an easy way to find
them, and it's a lot of duplicated information if they are kept in two
different places.
This makes it easier to match a column name with the description of it,
and makes it possible to add more detailed documentation in the future.
This patch does not add that extra documentation at this point, only
the structure required for it.
Modeled on the changes already done to pg_stat_activity.
This check was overlooked when we added function execute permissions to the
system years ago. For an ordinary trigger function it's not a big deal,
since trigger functions execute with the permissions of the table owner,
so they couldn't do anything the user issuing the CREATE TRIGGER couldn't
have done anyway. However, if a trigger function is SECURITY DEFINER,
that is not the case. The lack of checking would allow another user to
install it on his own table and then invoke it with, essentially, forged
input data; which the trigger function is unlikely to realize, so it might
do something undesirable, for instance insert false entries in an audit log
table.
Reported by Dinesh Kumar, patch by Robert Haas
Security: CVE-2012-0866
This allows changing the location of the files that were previously
hard-coded to server.crt, server.key, root.crt, root.crl.
server.crt and server.key continue to be the default settings and are
thus required to be present by default if SSL is enabled. But the
settings for the server-side CA and CRL are now empty by default, and
if they are set, the files are required to be present. This replaces
the previous behavior of ignoring the functionality if the files were
not found.
Some line feeds are added to target lists and from lists to make
them more readable. By default they wrap at 80 columns if possible,
but the wrap column is also selectable - if 0 it wraps after every
item.
Andrew Dunstan, reviewed by Hitoshi Harada.
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK. This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).
There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag. But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.
KaiGai Kohei, with some wordsmithing by me.
Add new psql settings and command-line options to support setting the
field and record separators for unaligned output to a zero byte, for
easier interfacing with other shell tools.
reviewed by Abhijit Menon-Sen
Sometimes it may be useful to get actual row counts out of EXPLAIN
(ANALYZE) without paying the cost of timing every node entry/exit.
With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that.
Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.
Do not prompt when options were not specified. Assume --no-createdb,
--no-createrole, --no-superuser by default.
Also disable prompting for user name in dropdb, unless --interactive
was specified.
reviewed by Josh Kupershmidt
In dry-run mode, just the name of the file to be removed is printed to
stdout; this is so the user can easily plug it into another program
through a pipe. If debug mode is also specified, a more verbose message
is printed to stderr.
Author: Gabriele Bartolini
Reviewer: Josh Kupershmidt
Like the XML data type, we simply store JSON data as text, after checking
that it is valid. More complex operations such as canonicalization and
comparison may come later, but this is enough for not.
There are a few open issues here, such as whether we should attempt to
detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets
the basic framework in place.
The sequence USAGE privilege is sufficiently similar to the SQL
standard that it seems reasonable to show in the information schema.
Also add some compatibility notes about it on the GRANT reference
page.
Add result object functions .colnames, .coltypes, .coltypmods to
obtain information about the result column names and types, which was
previously not possible in the PL/Python SPI interface.
reviewed by Abhijit Menon-Sen
This patch fixes the planner so that it can generate nestloop-with-
inner-indexscan plans even with one or more levels of joining between
the indexscan and the nestloop join that is supplying the parameter.
The executor was fixed to handle such cases some time ago, but the
planner was not ready. This should improve our plans in many situations
where join ordering restrictions formerly forced complete table scans.
There is probably a fair amount of tuning work yet to be done, because
of various heuristics that have been added to limit the number of
parameterized paths considered. However, we are not going to find out
what needs to be adjusted until the code gets some real-world use, so
it's time to get it in there where it can be tested easily.
Note API change for index AM amcostestimate functions. I'm not aware of
any non-core index AMs, but if there are any, they will need minor
adjustments.
Add counters for number and size of temporary files used
for spill-to-disk queries for each database to the
pg_stat_database view.
Tomas Vondra, review by Magnus Hagander
Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.
Jun Ishizuka and Fujii Masao, with minor modifications
This reports the depth level of triggers currently in execution, or zero
if not called from inside a trigger.
No catversion bump in this patch, but you have to initdb if you want
access to the new function.
Author: Kevin Grittner
Replication occurs only to memory on standby, not to disk,
so provides additional performance if user wishes to
reduce durability level slightly. Adds concept of multiple
independent sync rep queues.
Fujii Masao and Simon Riggs
This separates the state (running/idle/idleintransaction etc) into
it's own field ("state"), and leaves the query field containing just
query text.
The query text will now mean "current query" when a query is running
and "last query" in other states. Accordingly,the field has been
renamed from current_query to query.
Since backwards compatibility was broken anyway to make that, the procpid
field has also been renamed to pid - along with the same field in
pg_stat_replication for consistency.
Scott Mead and Magnus Hagander, review work from Greg Smith
That avoids errors when the functions are used in queries like "SELECT
pg_relation_size(oid) FROM pg_class", and a table is dropped concurrently.
Phil Sorber
When creating a child table, or when attaching an existing table as
child of another, we must not allow inheritable constraints to be
merged with non-inheritable ones, because then grandchildren would not
properly get the constraint. This would violate the grandparent's
expectations.
Bugs noted by Robert Haas.
Author: Nikhil Sontakke
Allows a user to use pg_cancel_queries() to cancel queries in
other backends if they are running under the same role.
pg_terminate_backend() still requires superuser permissoins.
Short patch, many authors working on the bikeshed: Magnus Hagander,
Josh Kupershmidt, Edward Muller, Greg Smith.
superuser doesn't have doesn't make much sense, as a superuser can do
whatever he wants through other means, anyway. So instead of granting
replication privilege to superusers in CREATE USER time by default, allow
replication connection from superusers whether or not they have the
replication privilege.
Patch by Noah Misch, per discussion on bug report #6264
Point out in the compatibility section that granting grant options to
PUBLIC is not supported by PostgreSQL. This is already mentioned
earlier, but since it concerns the information schema, it might be
worth pointing out explicitly as a compatibility issue.
The original implementation of this interpreted it as a kind of
"inheritance" facility and named all the internal structures
accordingly. This turned out to be very confusing, because it has
nothing to do with the INHERITS feature. So rename all the internal
parser infrastructure, update the comments, adjust the error messages,
and split up the regression tests.
ALTER DOMAIN / DROP CONSTRAINT on a nonexistent constraint name did
not report any error. Now it reports an error. The IF EXISTS option
was added to get the usual behavior of ignoring nonexistent objects to
drop.
In commit e2c2c2e8b1 I made use of nested
list structures to show which clauses went with which index columns, but
on reflection that's a data structure that only an old-line Lisp hacker
could love. Worse, it adds unnecessary complication to the many places
that don't much care which clauses go with which index columns. Revert
to the previous arrangement of flat lists of clauses, and instead add a
parallel integer list of column numbers. The places that care about the
pairing can chase both lists with forboth(), while the places that don't
care just examine one list the same as before.
The only real downside to this is that there are now two more lists that
need to be passed to amcostestimate functions in case they care about
column matching (which btcostestimate does, so not passing the info is not
an option). Rather than deal with 11-argument amcostestimate functions,
pass just the IndexPath and expect the functions to extract fields from it.
That gets us down to 7 arguments which is better than 11, and it seems
more future-proof against likely additions to the information we keep
about an index path.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view. Views not configured with this
option cannot provide robust row-level security, but will perform far
better.
Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!). Review (in earlier versions) by Noah Misch and
others. Design advice by Tom Lane and myself. Further review and
cleanup by me.
You could already rename domains using ALTER TYPE, but with this new
command it is more consistent with how other commands treat domains as
a subcategory of types.
This adds support for the more or less SQL-conforming USAGE privilege
on types and domains. The intent is to be able restrict which users
can create dependencies on types, which restricts the way in which
owners can alter types.
reviewed by Yeb Havinga
This makes them enforceable only on the parent table, not on children
tables. This is useful in various situations, per discussion involving
people bitten by the restrictive behavior introduced in 8.4.
Message-Id:
8762mp93iw.fsf@comcast.netCAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com
Authors: Nikhil Sontakke, Alex Hunsaker
Reviewed by Robert Haas and myself
Operator classes can specify whether or not they support this; this
preserves the flexibility to use lossy representations within an index.
In passing, move constant data about a given index into the rd_amcache
cache area, instead of doing fresh lookups each time we start an index
operation. This is mainly to try to make sure that spgcanreturn() has
insignificant cost; I still don't have any proof that it matters for
actual index accesses. Also, get rid of useless copying of FmgrInfo
pointers; we can perfectly well use the relcache's versions in-place.
The need for this was debated when we put in the index-only-scan feature,
but at the time we had no near-term expectation of having AMs that could
support such scans for only some indexes; so we kept it simple. However,
the SP-GiST AM forces the issue, so let's fix it.
This patch only installs the new API; no behavior actually changes.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees. As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.
There are a number of areas that could still use improvement, but at this
point the code seems committable.
Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
Valid values are --pre-data, data and post-data. The option can be
given more than once. --schema-only is equivalent to
--section=pre-data --section=post-data. --data-only is equivalent
to --section=data.
Andrew Dunstan, reviewed by Joachim Wieland and Josh Berkus.
This works the same as include, except that an error is not thrown
if the file is missing. Instead the fact that it's missing is
logged.
Greg Smith, reviewed by Euler Taveira de Oliveira.
Andrew Dunstan, reviewed by Josh Berkus, Robert Haas and Peter Geoghegan.
This allows dumping of a table definition but not its data, on a per table basis.
Table name patterns are supported just as for --exclude-table.
Instead, add a function pg_tablespace_location(oid) used to return
the same information, and do this by reading the symbolic link.
Doing it this way makes it possible to relocate a tablespace when the
database is down by simply changing the symbolic link.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting. In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator. While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.
Robert Haas and Tom Lane
If unable to connect to "postgres", try "template1". This allows things to
work more smoothly in the case where the postgres database has been
dropped. And just in case that's not good enough, also allow the user to
specify a maintenance database to be used for the initial connection, to
cover the case where neither postgres nor template1 is suitable.
Add a function plpy.cursor that is similar to plpy.execute but uses an
SPI cursor to avoid fetching the entire result set into memory.
Jan Urbański, reviewed by Steve Singer
This can be used to set (or unset) environment variables that will
affect programs called by psql (such as the PAGER), probably most
usefully in a .psqlrc file.
Andrew Dunstan, reviewed by Josh Kupershmidt.
PGresults used to be read-only from the application's viewpoint, but now
that we've exposed various functions that allow modification of a PGresult,
that sweeping statement is no longer accurate. Noted by Dmitriy Igrishin.
The correct information appears in the text, so just remove the statement
in the table, where it did not fit nicely anyway. (Curiously, the correct
info has been there much longer than the erroneous table entry.)
Resolves problem noted by Daniele Varrazzo.
In HEAD and 9.1, also do a bit of wordsmithing on other text on the page.
The WITH [NO] DATA option was not supported, nor the ability to specify
replacement column names; the former limitation wasn't even documented, as
per recent complaint from Naoya Anzai. Fix by moving the responsibility
for supporting these options into the executor. It actually takes less
code this way ...
catversion bump due to change in representation of IntoClause, which might
affect stored rules.
It's not clear that a per-datatype typanalyze function would be any more
useful than a generic typanalyze for ranges. What *is* clear is that
letting unprivileged users select typanalyze functions is a crash risk or
worse. So remove the option from CREATE TYPE AS RANGE, and instead put in
a generic typanalyze function for ranges. The generic function does
nothing as yet, but hopefully we'll improve that before 9.2 release.
Per discussion, the zero-argument forms aren't really worth the catalog
space (just write 'empty' instead). The one-argument forms have some use,
but they also have a serious problem with looking too much like functional
cast notation; to the point where in many real use-cases, the parser would
misinterpret what was wanted.
Committing this as a separate patch, with the thought that we might want
to revert part or all of it if we can think of some way around the cast
ambiguity.
For a very long time, one of the parser's heuristics for resolving
ambiguous operator calls has been to assume that unknown-type literals are
of the same type as the other input (if it's known). However, this was
only used in the first step of quickly checking for an exact-types match,
and thus did not help in resolving matches that require coercion, such as
matches to polymorphic operators. As we add more polymorphic operators,
this becomes more of a problem. This patch adds another use of the same
heuristic as a last-ditch check before failing to resolve an ambiguous
operator or function call. In particular this will let us define the range
inclusion operator in a less limited way (to come in a follow-on patch).
A very long time ago, language names were specified as literals rather
than identifiers, so this code was added to do case-folding. But that
style has ben deprecated for many years so this isn't needed any more.
Language names will still be downcased when specified as unquoted
identifiers, but quoted identifiers or the old style using string
literals will be left as-is.
Change range_lower and range_upper to return NULL rather than throwing an
error when the input range is empty or the relevant bound is infinite. Per
discussion, throwing an error seems likely to be unduly hard to work with.
Also, this is more consistent with the behavior of the constructors, which
treat NULL as meaning an infinite bound.
Change range_before, range_after, range_adjacent to return false rather
than throwing an error when one or both input ranges are empty.
The original definition is unnecessarily difficult to use, and also can
result in undesirable planner failures since the planner could try to
compare an empty range to something else while deriving statistical
estimates. (This was, in fact, the cause of repeatable regression test
failures on buildfarm member jaguar, as well as intermittent failures
elsewhere.)
Also tweak rangetypes regression test to not drop all the objects it
creates, so that the final state of the regression database contains
some rangetype objects for pg_dump testing.
This adds the "auto" option to the \x command, which switches to the
expanded mode when the normal output would be wider than the screen.
reviewed by Noah Misch
This reverts commit 0180bd6180.
contrib/userlock is gone, but user-level locking still exists,
and is exposed via the pg_advisory* family of functions.
Since PostgreSQL 9.0, we've emitted a warning message when an operator
named => is created, because the SQL standard now reserves that token
for another use. But we've also shipped such an operator with hstore.
Use of the function hstore(text, text) has been recommended in
preference to =>(text, text). Per discussion, it's now time to take
the next step and stop shipping the operator. This will allow us to
prohibit the use of => as an operator name in a future release if and
when we wish to support the SQL standard use of this token.
The release notes should mention this incompatibility.
Patch by me, reviewed by David Wheeler, Dimitri Fontaine and Tom Lane.
Add option for parallel streaming of the transaction log while a
base backup is running, to get the logfiles before the server has
removed them.
Also add a tool called pg_receivexlog, which streams the transaction
log into files, creating a log archive without having to wait for
segments to complete, thus decreasing the window of data loss without
having to waste space using archive_timeout. This works best in
combination with archive_command - suggested usage docs etc coming later.
This allows different instances to use the eventlog with different
identifiers, by setting the event_source GUC, similar to how
syslog_ident works.
Original patch by MauMau, heavily modified by Magnus Hagander
A transaction can export a snapshot with pg_export_snapshot(), and then
others can import it with SET TRANSACTION SNAPSHOT. The data does not
leave the server so there are not security issues. A snapshot can only
be imported while the exporting transaction is still running, and there
are some other restrictions.
I'm not totally convinced that we've covered all the bases for SSI (true
serializable) mode, but it works fine for lesser isolation modes.
Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified
by Tom Lane
In general the data returned by an index-only scan should have the
datatypes originally computed by FormIndexDatum. If the index opclasses
use "storage" datatypes different from their input datatypes, the scan
tuple will not have the same rowtype attributed to the index; but we had
a hard-wired assumption that that was true in nodeIndexonlyscan.c. We'd
already hacked around the issue for the one case where the types are
different in btree indexes (btree name_ops), but this would definitely
come back to bite us if we ever implement index-only scans in GiST.
To fix, require the index AM to explicitly provide the tupdesc for the
tuple it is returning. btree can just pass back the index's tupdesc, but
GiST will have to work harder when and if it supports index-only scans.
I had previously proposed fixing this by allowing the index AM to fill the
scan tuple slot directly; but on reflection that seemed like a module
layering violation, since TupleTableSlots are creatures of the executor.
At least in the btree case, it would also be less efficient, since the
tuple deconstruction work would occur even for rows later found to be
invisible to the scan's snapshot.
Rearrange text to improve clarity, and add an example of implicit reference
to a plpgsql variable in a bound cursor's query. Byproduct of some work
I'd done on the "named cursor parameters" patch before giving up on it.
Add a column pg_class.relallvisible to remember the number of pages that
were all-visible according to the visibility map as of the last VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to estimate
how many heap page fetches can be avoided during an index-only scan.
This is pretty primitive and will no doubt see refinements once we've
acquired more field experience with the index-only scan mechanism, but
it's way better than using a constant.
Note: I had to adjust an underspecified query in the window.sql regression
test, because it was changing answers when the plan changed to use an
index-only scan. Some of the adjacent tests perhaps should be adjusted
as well, but I didn't do that here.
We have seen one too many reports of people trying to use 9.1 extension
files in the old-fashioned way of sourcing them in psql. Not only does
that usually not work (due to failure to substitute for MODULE_PATHNAME
and/or @extschema@), but if it did work they'd get a collection of loose
objects not an extension. To prevent this, insert an \echo ... \quit
line that prints a suitable error message into each extension script file,
and teach commands/extension.c to ignore lines starting with \echo.
That should not only prevent any adverse consequences of loading a script
file the wrong way, but make it crystal clear to users that they need to
do it differently now.
Tom Lane, following an idea of Andrew Dunstan's. Back-patch into 9.1
... there is not going to be much value in this if we wait till 9.2.
The documentation neglected to explain its behavior in a script file
(it only ends execution of the script, not psql as a whole), and failed
to mention the long form \quit either.
This might help to avoid confusion between the CREATE USER command,
and the deprecated CREATEUSER option to CREATE ROLE, as per a recent
complaint from Ron Adams. At any rate, having a cross-link here
seems like a good idea; two commands that are so similar should
reference each other.
We copy all the matched tuples off the page during _bt_readpage, instead of
expensively re-locking the page during each subsequent tuple fetch. This
costs a bit more local storage, but not more than 2*BLCKSZ worth, and the
reduction in LWLock traffic is certainly worth that. What's more, this
lets us get rid of the API wart in the original patch that said an index AM
could randomly decline to supply an index tuple despite having asserted
pg_am.amcanreturn. That will be important for future improvements in the
index-only-scan feature, since the executor will now be able to rely on
having the index data available.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.
There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.
Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
This variable provides only marginal error-prevention capability (since
it can only check the prefix of a qualified GUC name), and the consensus
is that that isn't worth the amount of hassle that maintaining the setting
creates for DBAs. So, let's just remove it.
With this commit, the system will silently accept a value for any qualified
GUC name at all, whether it has anything to do with any known extension or
not. (Unqualified names still have to match known built-in settings,
though; and you will get a WARNING at extension load time if there's an
unrecognized setting with that extension's prefix.)
There's still some discussion ongoing about whether to tighten that up and
if so how; but if we do come up with a solution, it's not likely to look
anything like custom_variable_classes.
This patch has two distinct purposes: to report multiple problems in
postgresql.conf rather than always bailing out after the first one,
and to change the policy for whether changes are applied when there are
unrelated errors in postgresql.conf.
Formerly the policy was to apply no changes if any errors could be
detected, but that had a significant consistency problem, because in some
cases specific values might be seen as valid by some processes but invalid
by others. This meant that the latter processes would fail to adopt
changes in other parameters even though the former processes had done so.
The new policy is that during SIGHUP, the file is rejected as a whole
if there are any errors in the "name = value" syntax, or if any lines
attempt to set nonexistent built-in parameters, or if any lines attempt
to set custom parameters whose prefix is not listed in (the new value of)
custom_variable_classes. These tests should always give the same results
in all processes, and provide what seems a reasonably robust defense
against loading values from badly corrupted config files. If these tests
pass, all processes will apply all settings that they individually see as
good, ignoring (but logging) any they don't.
In addition, the postmaster does not abandon reading a configuration file
after the first syntax error, but continues to read the file and report
syntax errors (up to a maximum of 100 syntax errors per file).
The postmaster will still refuse to start up if the configuration file
contains any errors at startup time, but these changes allow multiple
errors to be detected and reported before quitting.
Alexey Klyukin, reviewed by Andy Colson and av (Alexander ?)
with some additional hacking by Tom Lane
We'll now use "exists" for EXISTS(SELECT ...), "array" for ARRAY(SELECT
...), or the sub-select's own result column name for a simple expression
sub-select. Previously, you usually got "?column?" in such cases.
Marti Raudsepp, reviewed by Kyotaro Horiugchi
pg_trgm was already doing this unofficially, but the implementation hadn't
been thought through very well and leaked memory. Restructure the core
GiST code so that it actually works, and document it. Ordinarily this
would have required an extra memory context creation/destruction for each
GiST index search, but I was able to avoid that in the normal case of a
non-rescanned search by finessing the handling of the RBTree. It used to
have its own context always, but now shares a context with the
scan-lifespan data structures, unless there is more than one rescan call.
This should make the added overhead unnoticeable in typical cases.
I've made a significant effort at filling in the "Using EXPLAIN" section
to be reasonably complete about mentioning everything that EXPLAIN can
output, including the "Rows Removed" outputs that were added by Marko
Tiikkaja's recent documentation-free patch. I also updated the examples to
be consistent with current behavior; several of them were not close to what
the current code will do. No doubt there's more that can be done here, but
I'm out of patience for today.
Because these tests require root privileges, not to mention invasive
changes to the security configuration of the host system, it's not
reasonable for them to be invoked by a regular "make check" or "make
installcheck". Instead, dike out the Makefile's knowledge of the tests,
and change chkselinuxenv (now renamed "test_sepgsql") into a script that
verifies the environment is workable and then runs the tests. It's
expected that test_sepgsql will only be run manually.
While at it, do some cleanup in the error checking in the script, and
do some wordsmithing in the documentation.
The keywords and values arguments of these functions are more properly
declared "const char * const *" than just "const char **".
Lionel Elie Mamane, reviewed by Craig Ringer
This mode still exists for backwards compatibility, making
sslmode=require the same as sslmode=verify-ca when the file is present,
but not causing an error when it isn't.
Per bug 6189, reported by Srinivas Aji
This is implemented as a per-column boolean option, rather than trying
to match COPY's convention of a single option listing the column names.
Shigeru Hanada, reviewed by KaiGai Kohei
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach. The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.
In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed. The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps. It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan. The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced. Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
BREAKAGE.
Remove double-quoting of index/table names in reindexdb. BACKWARD
COMPABILITY BREAKAGE.
Document thate user/database names are preserved with double-quoting by
command-line tools like vacuumdb.
We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization. But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults. This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.
No actual code changes here, just header refactoring.
When building a GiST index that doesn't fit in cache, buffers are attached
to some internal nodes in the index. This speeds up the build by avoiding
random I/O that would otherwise be needed to traverse all the way down the
tree to the find right leaf page for tuple.
Alexander Korotkov
We've now seen more than one gripe from somebody who didn't get the memo
about how to install contrib modules in 9.1. Try to make it a little more
prominent that you aren't supposed to call the scripts directly anymore.
This example wasn't updated when we changed the behavior of bpcharlen()
in 8.0, nor when we changed the number of parameters taken by the bpchar()
cast function in 7.3. Per report from lsliang.
These changes allow backtick command evaluation and psql variable
interpolation to happen on substrings of a single meta-command argument.
Formerly, no such evaluations happened at all if the backtick or colon
wasn't the first character of the argument, and we considered an argument
completed as soon as we'd processed one backtick, variable reference, or
quoted substring. A string like 'FOO'BAR was thus taken as two arguments
not one, not exactly what one would expect. In the new coding, an argument
is considered terminated only by unquoted whitespace or backslash.
Also, clean up a bunch of omissions, infelicities and outright errors in
the psql documentation of variables and metacommand argument syntax.
The previous coding resulted in contrib modules unintentionally overriding
the use of CONTRIB_TESTDB. There seems no particularly good reason to
allow that (after all, the makefile can set CONTRIB_TESTDB if that's really
what it intends).
In passing, document REGRESS_OPTS where the other pgxs.mk options are
documented.
Back-patch to 9.1 --- in prior versions, there were no cases of contrib
modules setting REGRESS_OPTS without including the --dbname switch, so
while the coding was fragile there was no actual bug.
Instead of displaying comments on an arbitrary subset of the object
types which support them, make \dd display comments on exactly those
object types which don't have their own backlash commands. We now
regard the display of comments as properly the job of the relevant
backslash command (though many of them do so only in verbose mode)
rather than something that \dd should be responsible for. However,
a handful of object types have no backlash command, so make \dd
give information about those.
Josh Kupershmidt
The latch infrastructure is now capable of detecting all cases where the
walsender loop needs to wake up, so there is no reason to have an arbitrary
timeout.
Also, modify the walsender loop logic to follow the standard pattern of
ResetLatch, test for work to do, WaitLatch. The previous coding was both
hard to follow and buggy: it would sometimes busy-loop despite having
nothing available to do, eg between receipt of a signal and the next time
it was caught up with new WAL, and it also had interesting choices like
deciding to update to WALSNDSTATE_STREAMING on the strength of information
known to be obsolete.
The relevant backslash commands already exist, so we're just adding an
additional column. With this commit, all objects that have psql backslash
commands and accept comments should now display those comments at least
in verbose mode.
Josh Kupershmidt, with doc additions by me.
\dc and \dD now accept a "+" option, which will cause the comments to
be displayed. Along the way, correct a few oversights in the previous
commit in this area, 3b17efdfdd - namely,
(1) when \dL+ is used, make description still be the last column, for
consistency with what we've done elsewhere; and (2) document the
difference between \dC and \dC+.
Josh Kupershmidt, with a couple of doc changes by me.
Also, handle failure better: don't just blindly keep trying to delete
stuff after the transaction has already failed.
Tim Lewis, reviewed by Josh Kupershmidt, with further hacking by me.
There is what may actually be a mistake in our markup. The problem is
in a situation like
<para>
<command>FOO</command> is ...
there is strictly speaking a line break before "FOO". In the HTML
output, this does not appear to be a problem, but in the man page
output, this shows up, so you get double blank lines at odd places.
So far, we have attempted to work around this with an XSL hack, but
that causes other problems, such as creating run-ins in places like
<acronym>SQL</acronym> <command>COPY</command>
So fix the problem properly by removing the extra whitespace. I only
fixed the problems that affect the man page output, not all the
places.
Somebody added a cross-reference to shared_preload_libraries, but wrote the
wrong variable name when they did it (and didn't bother to make it a link
either).
Spotted by Christoph Anton Mitterer.
The output of \dL (list languages) is fairly narrow, so we just always
display the comment. \dC (list casts) can get fairly wide, so we only
display comments if the new \dC+ option is specified.
Josh Kupershmidt
Also change "switch" to "arg" because "switch" is a bit of a sloppy
term. So the environment variable is called
PSQL_EDITOR_LINENUMBER_ARG. Set "+" as hardcoded default value on
Unix (since "vi" is the hardcoded default editor), so many users won't
have to configure this at all. Move the documentation around a bit to
centralize the editor configuration under environment variables,
rather than repeating bits of it under every backslash command that
invokes an editor.
Previously, xpath() simply returned an empty array if the expression did
not yield a node set. This is useless for expressions that return scalars,
such as one with name() at the top level. Arrange to return the scalar
value as a single-element xml array, instead. (String values will be
suitably escaped.)
This change will also cause xpath_exists() to return true, not false,
for such expressions.
Florian Pflug, reviewed by Radoslaw Smogura
This requires a new shared catalog, pg_shseclabel.
Along the way, fix the security_label regression tests so that they
don't monkey with the labels of any pre-existing objects. This is
unlikely to matter in practice, since only the label for the "dummy"
provider was being manipulated. But this way still seems cleaner.
KaiGai Kohei, with fairly extensive hacking by me.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.
Fujii Masao, review, rework and doc rewrite by Simon Riggs
This is more SQL-spec-compliant, more easily extensible, and better
performing than the old method of inventing special variables.
Pavel Stehule, reviewed by Shigeru Hanada and David Wheeler
When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested
on an unshared database relation, and we can verify that no conflicting
locks can possibly be present, record the lock in a per-backend queue,
stored within the PGPROC, rather than in the primary lock table. This
eliminates a great deal of contention on the lock manager LWLocks.
This patch also refactors the interface between GetLockStatusData() and
pg_lock_status() to be a bit more abstract, so that we don't rely so
heavily on the lock manager's internal representation details. The new
fast path lock structures don't have a LOCK or PROCLOCK structure to
return, so we mustn't depend on that for purposes of listing outstanding
locks.
Review by Jeff Davis.
We already have similar functions for many other object types, including
operator classes, so it seems like we should have this one, too.
Extracted from a larger patch by Josh Kupershmidt
This function supports untranslated detail messages, in the same way that
errmsg_internal supports untranslated primary messages. We've needed this
for some time IMO, but discussion of some cases in the SSI code provided
the impetus to actually add it.
Kevin Grittner, with minor adjustments by me
The commit action of temporary tables is currently not cataloged, so
we can't easily show it. The previous value was outdated from before
we had different commit actions.