Commit graph

6779 commits

Author SHA1 Message Date
Andrew Dunstan
72d422a522 Map basebackup tablespaces using a tablespace_map file
Windows can't reliably restore symbolic links from a tar format, so
instead during backup start we create a tablespace_map file, which is
used by the restoring postgres to create the correct links in pg_tblspc.
The backup protocol also now has an option to request this file to be
included in the backup stream, and this is used by pg_basebackup when
operating in tar mode.

This is done on all platforms, not just Windows.

This means that pg_basebackup will not work in tar mode against 9.4
and older servers, as this protocol option isn't implemented there.

Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.
2015-05-12 09:29:10 -04:00
Alvaro Herrera
b488c580ae Allow on-the-fly capture of DDL event details
This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.

The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object.  This is sufficient for many
uses of this feature.  For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code.  The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.
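
As an illustrative sketch (function and trigger names are hypothetical),
a ddl_command_end trigger that reports the captured commands could look
like this:

    CREATE FUNCTION log_ddl_commands() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
            RAISE NOTICE 'DDL: % on %', r.command_tag, r.object_identity;
        END LOOP;
    END;
    $$;

    CREATE EVENT TRIGGER log_ddl ON ddl_command_end
        EXECUTE PROCEDURE log_ddl_commands();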

There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension.  The intention is that we will add some deparsing
code in a later release, as an in-core extension.

A new test module is included.  It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.

Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.

Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.

Based on original work by Dimitri Fontaine, though I didn't use his
code.

Discussion:
  https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
  https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org
  https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org
2015-05-11 19:14:31 -03:00
Tom Lane
1a8a4e5cde Code review for foreign/custom join pushdown patch.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
2015-05-10 14:36:36 -04:00
Andrew Dunstan
cb9fa802b3 Add new OID alias type regnamespace
Catalog version bumped

Kyotaro HORIGUCHI
2015-05-09 13:36:52 -04:00
Andrew Dunstan
0c90f6769d Add new OID alias type regrole
The new type has the scope of the whole database cluster, so with respect
to object dependencies it doesn't behave the same as the existing OID
alias types, which have database scope. To avoid confusion, constants of
the new type are prohibited from appearing where dependencies are made
involving it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which apply to all the reg* types in general.
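
For illustration (the role name "alice" is hypothetical):

    SELECT 'alice'::regrole::oid;                     -- role name -> OID
    SELECT relname, relowner::regrole FROM pg_class;  -- OID -> role name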

Kyotaro Horiguchi
2015-05-09 13:06:49 -04:00
Stephen Frost
4b342fb591 Bump catversion for pg_file_settings
Pointed out by Andres (thanks!)

Apologies for not including it in the initial patch.
2015-05-08 19:14:32 -04:00
Stephen Frost
a97e0c3354 Add pg_file_settings view and function
The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged information, and therefore access is
restricted to superusers through the GRANT system.
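
A minimal usage sketch, run as superuser (filtering on the parameter-name
column):

    SELECT * FROM pg_file_settings WHERE name = 'shared_buffers';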

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele
2015-05-08 19:09:26 -04:00
Heikki Linnakangas
de7688442f At promotion, archive last segment from old timeline with .partial suffix.
Previously, we would archive the possibly-incomplete WAL segment with its
normal filename, but that causes trouble if the server owning that timeline
is still running, and tries to archive the same segment later. It's not nice
for the standby to trip up the master's archival like that. And it's pretty
confusing, anyway, to have an incomplete segment in the archive that's
indistinguishable from a normal, complete segment.

To avoid such confusion, add a .partial suffix to the file. Or to be more
precise, make a copy of the old segment under the .partial suffix, and
archive that instead of the original file. pg_receivexlog also uses the
.partial suffix for the same purpose, to tell apart incompletely streamed
files from complete ones.

There is no automatic mechanism to use the .partial files at recovery, so
they will go unused, unless the administrator manually copies them to
the pg_xlog directory (and removes the .partial suffix). Recovery won't
normally need the WAL - when recovering to the new timeline, it will find
the same WAL on the first segment on the new timeline instead - but it
nevertheless feels better to archive the file with the .partial suffix, for
debugging purposes if nothing else.
2015-05-08 21:59:01 +03:00
Heikki Linnakangas
179cdd0981 Add macros to check if a filename is a WAL segment or other such file.
We had many instances of the strlen + strspn combination to check for that.
This makes the code a bit easier to read.
2015-05-08 21:58:57 +03:00
Andres Freund
e8898e9169 Minor ON CONFLICT related comments and doc fixes.
Geoff Winkless, Stephen Frost, Peter Geoghegan and me.
2015-05-08 19:24:14 +02:00
Robert Haas
53bb309d2d Teach autovacuum about multixact member wraparound.
The logic introduced in commit b69bf30b9b
and repaired in commits 669c7d20e6 and
7be47c56af helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.

To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases.  This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.

This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off.  We'll need to fix
that via a separate patch.

Thomas Munro and Robert Haas
2015-05-08 12:53:00 -04:00
Andres Freund
168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows specifying an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using an
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.
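
A typical upsert, sketched against a hypothetical counters table:

    INSERT INTO counters (key, hits) VALUES ('home', 1)
    ON CONFLICT (key)
    DO UPDATE SET hits = counters.hits + EXCLUDED.hits;

    -- or, to simply skip conflicting rows:
    INSERT INTO counters (key, hits) VALUES ('home', 1)
    ON CONFLICT (key) DO NOTHING;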

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO can now alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Andres Freund
2c8f4836db Represent columns requiring insert and update privileges independently.
Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held.  As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.

The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.

Bumps catversion as stored rules change.

Author: Peter Geoghegan
Reviewed-By: Andres Freund
2015-05-08 00:20:46 +02:00
Alvaro Herrera
db5f98ab4f Improve BRIN infra, minmax opclass and regression test
The minmax opclass was using the wrong support functions when
cross-datatypes queries were run.  Instead of trying to fix the
pg_amproc definitions (which apparently is not possible), use the
already correct pg_amop entries instead.  This requires jumping through
more hoops (read: extra syscache lookups) to obtain the underlying
functions to execute, but it is necessary for correctness.

Author: Emre Hasegeli, tweaked by Álvaro
Review: Andreas Karlsson

Also change BrinOpcInfo to record each stored type's typecache entry
instead of just the OID.  Turns out that the full type cache is
necessary in brin_deform_tuple: the original code used the indexed
type's byval and typlen properties to extract the stored tuple, which is
correct in Minmax; but in other implementations that want to store
something different, that's wrong.  The realization that this is a bug
comes from Emre also, but I did not use his patch.

I also adopted Emre's regression test code (with smallish changes),
which is more complete.
2015-05-07 13:02:22 -03:00
Robert Haas
1998261034 Avoid using a C++ keyword as a structure member name.
Per request from Peter Eisentraut.
2015-05-05 22:41:03 -04:00
Alvaro Herrera
3b6db1f445 Add geometry/range functions to support BRIN inclusion
This commit adds the following functions:
    box(point) -> box
    bound_box(box, box) -> box
    inet_same_family(inet, inet) -> bool
    inet_merge(inet, inet) -> cidr
    range_merge(anyrange, anyrange) -> anyrange

The first of these is also used to implement a new assignment cast from
point to box.

These functions are the first part of a base to implement an "inclusion"
operator class for BRIN, for multidimensional data types.
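
For illustration, the new functions behave like this (expected results
shown as comments):

    SELECT range_merge('[1,3)'::int4range, '[7,10)'::int4range);      -- [1,10)
    SELECT inet_same_family('192.168.1.10'::inet, '10.0.0.1'::inet);  -- t
    SELECT bound_box('(0,0),(1,1)'::box, '(3,3),(4,4)'::box);         -- (4,4),(0,0)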

Author: Emre Hasegeli
Reviewed by: Andreas Karlsson
2015-05-05 15:22:24 -03:00
Tom Lane
2503982be4 Improve procost estimates for some text search functions.
The text search functions that involve parsing raw text into lexemes are
remarkably CPU-intensive, so estimating them at the same cost as most other
built-in functions seems like a mistake; moreover, doing so turns out to
discourage the optimizer from using functional indexes on these functions.
After some debate, we've agreed to raise procost from 1 to 100 for
to_tsvector(), plainto_tsquery(), to_tsquery(), ts_headline(),
ts_match_tt(), and ts_match_tq(), which are all the text search functions
that parse raw text.
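
The new defaults can be inspected, and further adjusted per installation,
with the usual tools, e.g.:

    SELECT proname, procost FROM pg_proc WHERE proname = 'to_tsvector';
    ALTER FUNCTION to_tsvector(regconfig, text) COST 200;  -- site-specific override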

Also increase procost for the 2-argument form of ts_rewrite()
(tsquery_rewrite_query); while this function doesn't do text parsing,
it does execute a user-supplied SQL query, so its previous procost of 1 is
clearly a drastic underestimate.  It seems reasonable to assign it the same
cost we assign to PL functions by default, so 100 is the number here too.

I did not bother bumping catversion for this change, since it does not
break catalog compatibility with the server executable nor result in
any regression test changes.

Per complaint from Andrew Gierth and subsequent discussion.
2015-05-04 15:38:57 -04:00
Robert Haas
2ce439f337 Recursively fsync() the data directory after a crash.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk.  This could lead to data corruption.

Back-patch to all supported versions.

Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
2015-05-04 14:13:53 -04:00
Robert Haas
e7cb7ee145 Allow FDWs and custom scan providers to replace joins with scans.
Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way.  Previously,
it was only possible for a custom scan provider to scan a single
relation.  Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
2015-05-01 08:50:35 -04:00
Andres Freund
2b22795b32 Copy editing of the replication origins patch.
Michael Paquier and myself.
2015-05-01 12:22:13 +02:00
Andres Freund
1db12da85b Fix unaligned memory access in xlog parsing due to replication origin patch.
ParseCommitRecord() accessed xl_xact_origin directly. But the chunks in
the commit record's data only have 4 byte alignment, whereas
xl_xact_origin's members require 8 byte alignment on some
platforms. Update comments to make note of that, and copy the record to
stack local storage before reading.

With help from Stefan Kaltenbrunner in pinning down the buildfarm and
verifying the fix.
2015-05-01 11:36:14 +02:00
Robert Haas
924bcf4f16 Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 15:02:14 -04:00
Andres Freund
e0f26fc765 Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX.
We can't rely on UINT16_MAX being present, which is why we introduced
PG_UINT16_MAX...

Buildfarm animal bowerbird via Andrew Gierth.
2015-04-30 00:19:36 +02:00
Andres Freund
5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution on top of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consists of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in an efficient and
   crash-safe manner.
3) The ability to filter out changes performed at the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies, e.g. by filtering out all replayed changes.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows containing origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there are only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.
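
A rough sketch of the SQL-level interface (the origin name "node_a" is
hypothetical):

    SELECT pg_replication_origin_create('node_a');         -- register an origin
    SELECT pg_replication_origin_session_setup('node_a');  -- attribute this session's changes to it
    SELECT * FROM pg_replication_origin_status;            -- inspect per-origin replay progress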

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Stephen Frost
dcbf5948e1 Improve qual pushdown for RLS and SB views
The original security barrier view implementation, on which RLS is
built, prevented all non-leakproof functions from being pushed down to
below the view, even when the function was not receiving any data from
the view.  This optimization improves on that situation: instead of
checking strictly for non-leakproof functions, it checks for Vars being
passed to non-leakproof functions, and allows functions which do not
accept arguments or whose arguments are not from the current query level
(e.g. constants, which can be particularly useful) to be pushed down.

As discussed, this does mean that a function which is pushed down might
gain some idea that there are rows meeting certain criteria based on
the number of times the function is called, but this isn't a
particularly new issue and the documentation in rules.sgml already
addressed similar covert-channel risks.  That documentation is updated
to reflect that non-leakproof functions may be pushed down now, if
they meet the above-described criteria.

Author: Dean Rasheed, with a bit of rework to make things clearer,
along with comment and documentation updates from me.
2015-04-27 12:29:42 -04:00
Andres Freund
6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Peter Eisentraut
cac7658205 Add transforms feature
This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.
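
For instance, with the hstore/PL/Perl transform installed, an hstore
argument arrives in Perl as a hash reference (sketch only; the function
is purely illustrative):

    CREATE EXTENSION hstore;
    CREATE EXTENSION plperl;
    CREATE EXTENSION hstore_plperl;

    CREATE FUNCTION hstore_keys(h hstore) RETURNS text[]
    LANGUAGE plperl TRANSFORM FOR TYPE hstore
    AS $$ my $h = shift; return [keys %$h]; $$;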

reviews by Pavel Stěhule and Andres Freund
2015-04-26 10:33:14 -04:00
Stephen Frost
e89bd02f58 Perform RLS WITH CHECK before constraints, etc
The RLS capability is built on top of the WITH CHECK OPTION
system which was added for auto-updatable views.  However, unlike
WCOs on views (which are mandated by the SQL spec to not fire until
after all other constraints and checks are done), it makes much more
sense for RLS checks to happen earlier than constraint and uniqueness
checks.

This patch reworks the structure which holds the WCOs a bit, to be
explicitly either VIEW or RLS checks, and the RLS-related checks are now
done prior to the constraint and uniqueness checks.  This also allows
better error reporting as we are now reporting when a violation is due
to a WITH CHECK OPTION and when it's due to an RLS policy violation,
which was independently noted by Craig Ringer as being confusing.

The documentation is also updated to include a paragraph about when RLS
WITH CHECK handling is performed, as there have been a number of
questions regarding that and the documentation was previously silent on
the matter.

Author: Dean Rasheed, with some kibitzing and comment changes by me.
2015-04-24 20:34:26 -04:00
Heikki Linnakangas
62420ae7d6 Move functions related to index maintenance to separate source file.
There is enough code here to deserve a file of its own, rather than being
buried in the middle of execUtils.c.
2015-04-24 09:33:23 +03:00
Stephen Frost
0bf22e0c8b RLS fixes, new hooks, and new test module
In prepend_row_security_policies(), defaultDeny was always true, so if
there were any hook policies, the RLS policies on the table would just
get discarded.  Fixed to start off with defaultDeny as false and then
properly set later if we detect that only the default deny policy exists
for the internal policies.

The infinite recursion detection in fireRIRrules() didn't properly
manage the activeRIRs list in the case of WCOs, so it would incorrectly
report infinite recursion if the same relation with RLS appeared more
than once in the rtable, for example "UPDATE t ... FROM t ...".

Further, the RLS expansion code in fireRIRrules() was handling RLS in
the main loop through the rtable, which led to RTEs being visited twice
if they contained sublink subqueries, which
prepend_row_security_policies() attempted to handle by exiting early if
the RTE already had securityQuals.  That doesn't work, however, since
if the query involved a security barrier view on top of a table with
RLS, the RTE would already have securityQuals (from the view) by the
time fireRIRrules() was invoked, and so the table's RLS policies would
be ignored.  This is fixed in fireRIRrules() by handling RLS in a
separate loop at the end, after dealing with any other sublink
subqueries, thus ensuring that each RTE is only visited once for RLS
expansion.

The inheritance planner code didn't correctly handle non-target
relations with RLS, which would get turned into subqueries during
planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where
t1 has inheritance and t2 has RLS quals would fail.  Fix by making sure
to copy in and update the securityQuals when they exist for non-target
relations.

process_policies() was adding WCOs to non-target relations, which is
unnecessary, and could lead to a lot of wasted time in the rewriter and
the planner. Fix by only adding WCO policies when working on the result
relation.  Also in process_policies, we should be copying the USING
policies to the WITH CHECK policies on a per-policy basis, fix by moving
the copying up into the per-policy loop.

Lastly, as noted by Dean, we were simply adding policies returned by the
hook provided to the list of quals being AND'd, meaning that they would
actually restrict records returned and there was no option to have
internal policies and hook-based policies work together permissively (as
all internal policies currently work).  Instead, explicitly add support
for both permissive and restrictive policies by having a hook for each
and combining the results appropriately.  To ensure this is all done
correctly, add a new test module (test_rls_hooks) to test the various
combinations of internal, permissive, and restrictive hook policies.

Largely from Dean Rasheed (thanks!):

CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com

Author: Dean Rasheed, though I added the new hooks and test module.
2015-04-22 12:01:06 -04:00
Andres Freund
cef939c347 Rename pg_replication_slot's new active_in to active_pid.
In d811c037ce active_in was added, but subsequent discussion showed that
active_pid is the preferred name.

Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-22 09:43:40 +02:00
Andres Freund
d811c037ce Add 'active_in' column to pg_replication_slots.
Right now it is visible whether a replication slot is active in any
session, but not in which.  Adding the active_in column, containing the
pid of the backend having acquired the slot, makes it much easier to
associate pg_replication_slots entries with the corresponding
pg_stat_replication/pg_stat_activity row.
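
With that column (renamed to active_pid by the commit above), slots can
be matched to sessions with e.g.:

    SELECT s.slot_name, s.active_pid, a.application_name, a.client_addr
      FROM pg_replication_slots s
      JOIN pg_stat_activity a ON a.pid = s.active_pid;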

This should have been done from the start, but I (Andres) dropped the
ball there somehow.

Author: Craig Ringer, revised by me
Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-21 11:51:06 +02:00
Bruce Momjian
f92fc4c95d pg_upgrade: binary_upgrade_create_empty_extension() is strict
Was broken by commit 30982be4e5.

Patch by Jeff Janes
2015-04-17 20:08:42 -04:00
Peter Eisentraut
30982be4e5 Integrate pg_upgrade_support module into backend
Previously, these functions were created in a schema "binary_upgrade",
which was deleted after pg_upgrade was finished.  Because we don't want
to keep that schema around permanently, move them to pg_catalog but
rename them with a binary_upgrade_... prefix.

The provided functions are only small wrappers around global variables
that were added specifically for pg_upgrade use, so keeping the module
separate does not create any modularity.

The functions still check that they are only called in binary upgrade
mode, so it is not possible to call these during normal operation.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:37 -04:00
Heikki Linnakangas
b73e7a0716 Oops, fix misspelled #endif
I hope this fixes the Windows buildfarm failures.
2015-04-14 22:00:52 +03:00
Heikki Linnakangas
3dc2d62d04 Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.

Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)

Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 17:05:03 +03:00
Heikki Linnakangas
4f700bcd20 Reorganize our CRC source files again.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.

Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-14 17:03:42 +03:00
Heikki Linnakangas
b2a5545bd6 Don't archive bogus recycled or preallocated files after timeline switch.
After a timeline switch, we would leave behind recycled WAL segments that
are in the future, but on the old timeline. After promotion, and after they
become old enough to be recycled again, we would notice that they don't have
a .ready or .done file, create a .ready file for them, and archive them.
That's bogus, because the files contain garbage, recycled from an older
timeline (or preallocated as zeros). We shouldn't archive such files.

This could happen when we're following a timeline switch during replay, or
when we switch to new timeline at end-of-recovery.

To fix, whenever we switch to a new timeline, scan the data directory for
WAL segments on the old timeline, but with a higher segment number, and
remove them. Those don't belong to our timeline history, and are most
likely bogus recycled or preallocated files. They could also be valid files
that we streamed from the primary ahead of time, but in any case, they're
not needed to recover to the new timeline.
2015-04-13 16:53:49 +03:00
Magnus Hagander
9029f4b374 Add system view pg_stat_ssl
This view shows information about all connections, such as if the
connection is using SSL, which cipher is used, and which client
certificate (if any) is used.
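
A usage sketch, joining against pg_stat_activity (see the docs for the
full column list):

    SELECT a.pid, a.usename, s.ssl, s.version, s.cipher, s.clientdn
      FROM pg_stat_ssl s
      JOIN pg_stat_activity a USING (pid);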

Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier
2015-04-12 19:07:46 +02:00
Alvaro Herrera
27846f02c1 Optimize locking a tuple already locked by another subxact
Locking and updating the same tuple repeatedly led to some strange
multixacts being created which had several subtransactions of the same
parent transaction holding locks of the same strength.  However,
once a subxact of the current transaction holds a lock of a given
strength, it's not necessary to acquire the same lock again.  This made
some coding patterns much slower than required.

The fix is twofold.  First we change HeapTupleSatisfiesUpdate to return
HeapTupleBeingUpdated for the case where the current transaction is
already a single-xid locker for the given tuple; it used to return
HeapTupleMayBeUpdated for that case.  The new logic is simpler, and the
change to pgrowlocks is a testament to that: previously we needed to
check for the single-xid locker separately in a very ugly way.  That
test is simpler now.

As fallout from the HTSU change, some of its callers need to be amended
so that tuple-locked-by-own-transaction is taken into account in the
BeingUpdated case rather than the MayBeUpdated case.  For many of them
there is no difference; but heap_delete() and heap_update() now check
explicitly and do not grab a tuple lock in that case.

The HTSU change also means that routine MultiXactHasRunningRemoteMembers
introduced in commit 11ac4c73cb is no longer necessary and can be
removed; the case that used to require it is now handled naturally as
result of the changes to heap_delete and heap_update.

The second part of the fix to the performance issue is to adjust
heap_lock_tuple to avoid the slowness:

1. Previously we checked for the case that our own transaction already
held a strong enough lock and returned MayBeUpdated, but only in the
multixact case.  Now we do it for the plain Xid case as well, which
saves having to LockTuple.

2. If the current transaction is the only locker of the tuple (but with
a lock not as strong as what we need; otherwise it would have been
caught in the check mentioned above), we can skip sleeping on the
multixact, and instead go straight to create an updated multixact with
the additional lock strength.

3. Most importantly, make sure that both the single-xid-locker case and
the multixact-locker case optimization are applied always.  We do this
by checking both in a single place, rather than them appearing in two
separate portions of the routine -- something that is made possible by
the HeapTupleSatisfiesUpdate API change.  Previously we would only check
for the single-xid case when HTSU returned MayBeUpdated, and only
checked for the multixact case when HTSU returned BeingUpdated.  This
was at odds with what HTSU actually returned in one case: if our own
transaction was a locker in a multixact, it returned MayBeUpdated, so the
optimization never applied.  This is what led to the large multixacts in
the first place.

Per bug report #8470 by Oskari Saarenmaa.
2015-04-10 13:47:15 -03:00
Alvaro Herrera
e9a077cad3 pg_event_trigger_dropped_objects: add is_temp column
It now also reports temporary objects dropped that are local to the
backend.  Previously we weren't reporting any temp objects because it
was deemed unnecessary; but as it turns out, it is necessary if we want
to keep close track of DDL command execution inside one session.  Temp
objects are reported as living in schema pg_temp, which works because
such a schema-qualification always refers to the temp objects of the
current session.
2015-04-06 11:40:55 -03:00
Alvaro Herrera
4ff695b17d Add log_min_autovacuum_duration per-table option
This is useful for controlling autovacuum log volume, in situations where
monitoring only a set of tables is necessary.
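
For example, assuming the reloption follows the GUC name
(log_autovacuum_min_duration) and a hypothetical table name:

    ALTER TABLE busy_table SET (log_autovacuum_min_duration = 0);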

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.
2015-04-03 11:55:50 -03:00
Fujii Masao
8c8a886268 Add palloc_extended for frontend and backend.
This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control memory allocation at a lower level, using an
interface similar to MemoryContextAllocExtended. For example, the callers
can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.

Michael Paquier, reviewed by me.
2015-04-03 17:36:12 +09:00
Robert Haas
abd94bcac4 Use abbreviated keys for faster sorting of numeric datums.
Andrew Gierth, reviewed by Peter Geoghegan, with further tweaks by me.
2015-04-02 14:04:26 -04:00
Andres Freund
62e2a8dc2c Define integer limits independently from the system definitions.
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there are
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.

One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.

Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.

Discussion: 20150331141423.GK4878@alap3.anarazel.de
2015-04-02 17:43:35 +02:00
Robert Haas
4cd639baf4 Revert "psql: fix \connect with URIs and conninfo strings"
This reverts commit fcef161729, about
which both the buildfarm and my local machine are very unhappy.
2015-04-02 10:10:22 -04:00
Alvaro Herrera
fcef161729 psql: fix \connect with URIs and conninfo strings
psql was already accepting conninfo strings as the first parameter in
\connect, but the way it worked wasn't sane; some of the other
parameters would get the previous connection's values, causing it to
connect to a completely unexpected server or, more likely, to fail to
find any server at all because of completely wrong combinations of
parameters.

Fix by explicitly checking for a conninfo-looking parameter in the
dbname position; if one is found, use its complete specification rather
than mix with the other arguments.  Also, change tab-completion to not
try to complete conninfo/URI-looking "dbnames" and document that
conninfos are accepted as first argument.

There was a weak consensus to backpatch this, because while the behavior
of using the dbname as a conninfo is nowhere documented for \connect, it
is reasonable to expect that it works because it does work in many other
contexts.  Therefore this is backpatched all the way back to 9.0.

To implement this, routines previously private to libpq have been
duplicated so that psql can decide what looks like a conninfo/URI
string.  In back branches, just duplicate the same code all the way back
to 9.2, where URIs were introduced; 9.0 and 9.1 have a simpler version.
In master, the routines are moved to src/common and renamed.

Author: David Fetter, Andrew Dunstan.  Some editorialization by me
(probably earning a Gierth's "Sloppy" badge in the process.)
Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost,
Robert Haas, Andrew Dunstan.
2015-04-01 20:00:07 -03:00
Heikki Linnakangas
f770870d9e Move inet/cidr GiST opclass functions to correct place in header file.
They were accidentally placed under the GIN heading.

Andreas Karlsson
2015-04-01 19:20:45 +03:00
Andrew Dunstan
fa1e5afa8a Run pg_upgrade and pg_resetxlog with restricted token on Windows
As with initdb, these programs need to run with a restricted token, and
if they don't, pg_upgrade will fail when run as a user with Administrator
privileges.

Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable branches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.

Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
2015-03-30 17:07:52 -04:00
Alvaro Herrera
97690ea6e8 Change array_offset to return subscripts, not offsets
... and rename it and its sibling array_offsets to array_position and
array_positions, to account for the changed behavior.

Having the functions return subscripts better matches existing practice,
and is better suited to using the result value as a subscript into the
array directly.  For one-based arrays, the new definition is identical
to what was originally committed.

(We use the term "subscript" in the documentation, which is what we use
whenever we talk about arrays; but the functions themselves are named
using the word "position" to match the standard-defined POSITION()
functions.)
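
For illustration (one-based arrays):

    SELECT array_position(ARRAY['sun','mon','tue'], 'mon');  -- 2
    SELECT array_positions(ARRAY[1,4,3,1,2], 1);             -- {1,4}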

Author: Pavel Stěhule
Behavioral problem noted by Dean Rasheed.
2015-03-30 16:13:21 -03:00