Commit graph

1793 commits

Author SHA1 Message Date
Tom Lane
a8fc195052 In plpgsql, don't try to convert int2vector or oidvector to expanded array.
These types are storage-compatible with real arrays, but they don't support
toasting, so of course they can't support expansion either.

Per bug #14289 from Michael Overmeyer.  Back-patch to 9.5 where expanded
arrays were introduced.

Report: <20160818174414.1529.37913@wrigleys.postgresql.org>
2016-08-18 14:48:51 -04:00
Tom Lane
6356512140 Remove bogus dependencies on NUMERIC_MAX_PRECISION.
NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision
and scale you can write in a numeric typmod.  It might once have had
something to do with the allowed range of a typmod-less numeric value,
but at least since 9.1 we've allowed, and documented that we allowed,
any value that would physically fit in the numeric storage format;
which is something over 100000 decimal digits, not 1000.

Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit
on the allowed range of the exponent in scientific-format input.  That was
especially silly in view of the fact that you can enter larger numbers as
long as you don't use 'e' to do it.  Just constrain the value enough to
avoid localized overflow, and let make_result be the final arbiter of what
is too large.  Likewise adjust ecpg's equivalent of this code.

Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the
number of base-NBASE digits it would accept.  That created a dump/restore
hazard for binary COPY without doing anything useful; the wire-format
limit on number of digits (65535) is about as tight as we would want.

In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what
the numeric range limit is.  That code doesn't exist in the back branches.

Per gripe from Aravind Kumar.  Back-patch to all supported branches,
since they all contain the documentation claim about allowed range of
NUMERIC (cf commit cabf5d84b).

Discussion: <2895.1471195721@sss.pgh.pa.us>
2016-08-14 15:06:01 -04:00
Tom Lane
5035463769 Fix GiST index build for NaN values in geometric types.
GiST index build could go into an infinite loop when presented with boxes
(or points, circles or polygons) containing NaN component values.  This
happened essentially because the code assumed that x == x is true for any
"double" value x; but it's not true for NaNs.  The looping behavior was not
the only problem though: we also attempted to sort the items using simple
double comparisons.  Since NaNs violate the trichotomy law, qsort could
(in principle at least) get arbitrarily confused and mess up the sorting of
ordinary values as well as NaNs.  And we based splitting choices on box size
calculations that could produce NaNs, again resulting in undesirable
behavior.

To fix, replace all comparisons of doubles in this logic with
float8_cmp_internal, which is NaN-aware and is careful to sort NaNs
consistently, higher than any non-NaN.  Also rearrange the box size
calculation to not produce NaNs; instead it should produce an infinity
for a box with NaN on one side and not-NaN on the other.

I don't by any means claim that this solves all problems with NaNs in
geometric values, but it should at least make GiST index insertion work
reliably with such data.  It's likely that the index search side of things
still needs some work, and probably regular geometric operations too.
But with this patch we're laying down a convention for how such cases
ought to behave.

Per bug #14238 from Guang-Dih Lei.  Back-patch to 9.2; the code used before
commit 7f3bd86843 is quite different and doesn't lock up on my simple
test case, nor on the submitter's dataset.

Report: <20160708151747.1426.60150@wrigleys.postgresql.org>
Discussion: <28685.1468246504@sss.pgh.pa.us>
2016-07-14 18:46:00 -04:00
Tom Lane
cea17ba07a Be more predictable about reporting "lock timeout" vs "statement timeout".
If both timeout indicators are set when we arrive at ProcessInterrupts,
we've historically just reported "lock timeout".  However, some buildfarm
members have been observed to fail isolationtester's timeouts test by
reporting "lock timeout" when the statement timeout was expected to fire
first.  The cause seems to be that the process is allowed to sleep longer
than expected (probably due to heavy machine load) so that the lock
timeout happens before we reach the point of reporting the error, and
then this arbitrary tiebreak rule does the wrong thing.  We can improve
matters by comparing the scheduled timeout times to decide which error
to report.

I had originally proposed greatly reducing the 1-second window between
the two timeouts in the test cases.  On reflection that is a bad idea,
at least for the case where the lock timeout is expected to fire first,
because that would assume that it takes negligible time to get from
statement start to the beginning of the lock wait.  Thus, this patch
doesn't completely remove the risk of test failures on slow machines.
Empirically, however, the case this handles is the one we are seeing
in the buildfarm.  The explanation may be that the other case requires
the scheduler to take the CPU away from a busy process, whereas the
case fixed here only requires the scheduler to not give the CPU back
right away to a process that has been woken from a multi-second sleep
(and, perhaps, has been swapped out meanwhile).

Back-patch to 9.3 where the isolationtester timeouts test was added.

Discussion: <8693.1464314819@sss.pgh.pa.us>
2016-05-27 10:40:20 -04:00
Tom Lane
b4ce97aa5f Cope if platform declares mbstowcs_l(), but not locale_t, in <xlocale.h>.
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t.  According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t.  (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibc's in the wild for which it's a live concern.)

It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>.  This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.

Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().

Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
2016-03-15 13:19:57 -04:00
Tom Lane
47acf3add3 Remove new coupling between NAMEDATALEN and MAX_LEVENSHTEIN_STRLEN.
Commit e529cd4ffa introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.

It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.

Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place).  That means the cost is not really O(mn) but more like O(max(m,n)).

Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.

In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.

Per gripe from Kevin Day; solution suggested by Robert Haas.  Back-patch
to 9.5 where the unwanted restriction was introduced.
2016-01-22 11:53:06 -05:00
Robert Haas
120b31dc54 Comment improvements for abbreviated keys.
Peter Geoghegan and Robert Haas
2015-12-22 13:59:13 -05:00
Tom Lane
e69d3a82e4 Avoid caching expression state trees for domain constraints across queries.
In commit 8abb3cda0d I attempted to cache
the expression state trees constructed for domain CHECK constraints for
the life of the backend (assuming the domain's constraints don't get
redefined).  However, this turns out not to work very well, because
execQual.c will run those state trees with ecxt_per_query_memory pointing
to a query-lifespan context, and in some situations we'll end up with
pointers into that context getting stored into the state trees.  This
happens in particular with SQL-language functions, as reported by
Emre Hasegeli, but there are many other cases.

To fix, keep only the expression plan trees for domain CHECK constraints
in the typcache's data structure, and revert to performing ExecInitExpr
(at least) once per query to set up expression state trees in the query's
context.

Eventually it'd be nice to undo this, but that will require some careful
thought about memory management for expression state trees, and it seems
far too late for any such redesign in 9.5.  This way is still much more
efficient than what happened before 8abb3cda0.
2015-11-29 18:18:42 -05:00
Tom Lane
a35c5b7c1f Fix handling of inherited check constraints in ALTER COLUMN TYPE (again).
The previous way of reconstructing check constraints was to do a separate
"ALTER TABLE ONLY tab ADD CONSTRAINT" for each table in an inheritance
hierarchy.  However, that way has no hope of reconstructing the check
constraints' own inheritance properties correctly, as pointed out in
bug #13779 from Jan Dirk Zijlstra.  What we should do instead is to do
a regular "ALTER TABLE", allowing recursion, at the topmost table that
has a particular constraint, and then suppress the work queue entries
for inherited instances of the constraint.

Annoyingly, we'd tried to fix this behavior before, in commit 5ed6546cf,
but we failed to notice that it wasn't reconstructing the pg_constraint
field values correctly.

As long as I'm touching pg_get_constraintdef_worker anyway, tweak it to
always schema-qualify the target table name; this seems like useful backup
to the protections installed by commit 5f173040.

In HEAD/9.5, get rid of get_constraint_relation_oids, which is now unused.
(I could alternatively have modified it to also return conislocal, but that
seemed like a pretty single-purpose API, so let's not pretend it has some
other use.)  It's unused in the back branches as well, but I left it in
place just in case some third-party code has decided to use it.

In HEAD/9.5, also rename pg_get_constraintdef_string to
pg_get_constraintdef_command, as the previous name did nothing to explain
what that entry point did differently from others (and its comment was
equally useless).  Again, that change doesn't seem like material for
back-patching.

I did a bit of re-pgindenting in tablecmds.c in HEAD/9.5, as well.

Otherwise, back-patch to all supported branches.
2015-11-20 14:55:28 -05:00
Tom Lane
5fb20a5ba6 Fix incorrect translation of minus-infinity datetimes for json/jsonb.
Commit bda76c1c8c caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json().  Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely.  Per bug #13687 from Stepan Perlov.  Back-patch to 9.4, like
the previous commit.

In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
2015-10-20 11:07:05 -07:00
Peter Eisentraut
e45f8f8820 Group cluster_name and update_process_title settings together 2015-10-04 12:29:50 -04:00
Tom Lane
45d256c279 Allow planner to use expression-index stats for function calls in WHERE.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992.  The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future.  We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity.  But the code in clause_selectivity()
failed to make use of such data even if it exists.

Refactor so that that will happen.  I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions.  To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.

I preserved the behavior that the default estimate for a function call
is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.

Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
but not further, at least for the moment.
2015-09-24 18:35:46 -04:00
Noah Misch
bbdb9dfbc3 Remove the SECURITY_ROW_LEVEL_DISABLED security context bit.
This commit's parent made superfluous the bit's sole usage.  Referential
integrity checks have long run as the subject table's owner, and that
now implies RLS bypass.  Safe use of the bit was tricky, requiring
strict control over the SQL expressions evaluating therein.  Back-patch
to 9.5, where the bit was introduced.

Based on a patch by Stephen Frost.
2015-09-20 20:47:36 -04:00
Noah Misch
6dae6edcd8 Remove the row_security=force GUC value.
Every query of a single ENABLE ROW SECURITY table has two meanings, with
the row_security GUC selecting between them.  With row_security=force
available, every function author would have been advised to either set
the GUC locally or test both meanings.  Non-compliance would have
threatened reliability and, for SECURITY DEFINER functions, security.
Authors already face an obligation to account for search_path, and we
should not mimic that example.  With this change, only BYPASSRLS roles
need exercise the aforementioned care.  Back-patch to 9.5, where the
row_security GUC was introduced.

Since this narrows the domain of pg_db_role_setting.setconfig and
pg_proc.proconfig, one might bump catversion.  A row_security=force
setting in one of those columns will elicit a clear message, so don't.
2015-09-20 20:45:54 -04:00
Tom Lane
a2538da89f Fix subtransaction cleanup after an outer-subtransaction portal fails.
Formerly, we treated only portals created in the current subtransaction as
having failed during subtransaction abort.  However, if the error occurred
while running a portal created in an outer subtransaction (ie, a cursor
declared before the last savepoint), that has to be considered broken too.

To allow reliable detection of which ones those are, add a bookkeeping
field to struct Portal that tracks the innermost subtransaction in which
each portal has actually been executed.  (Without this, we'd end up
failing portals containing functions that had called the subtransaction,
thereby breaking plpgsql exception blocks completely.)

In addition, when we fail an outer-subtransaction Portal, transfer its
resources into the subtransaction's resource owner, so that they're
released early in cleanup of the subxact.  This fixes a problem reported by
Jim Nasby in which a function executed in an outer-subtransaction cursor
could cause an Assert failure or crash by referencing a relation created
within the inner subtransaction.

The proximate cause of the Assert failure is that AtEOSubXact_RelationCache
assumed it could blow away a relcache entry without first checking that the
entry had zero refcount.  That was a bad idea on its own terms, so add such
a check there, and to the similar coding in AtEOXact_RelationCache.  This
provides an independent safety measure in case there are still ways to
provoke the situation despite the Portal-level changes.

This has been broken since subtransactions were invented, so back-patch
to all supported branches.

Tom Lane and Michael Paquier
2015-09-04 13:37:16 -04:00
Tom Lane
e2035dc9a7 Fix bogus "out of memory" reports in tuplestore.c.
The tuplesort/tuplestore memory management logic assumed that the chunk
allocation overhead for its memtuples array could not increase when
increasing the array size.  This is and always was true for tuplesort,
but we (I, I think) blindly copied that logic into tuplestore.c without
noticing that the assumption failed to hold for the much smaller array
elements used by tuplestore.  Given rather small work_mem, this could
result in an improper complaint about "unexpected out-of-memory situation",
as reported by Brent DeSpain in bug #13530.

The easiest way to fix this is just to increase tuplestore's initial
array size so that the assumption holds.  Rather than relying on magic
constants, though, let's export a #define from aset.c that represents
the safe allocation threshold, and make tuplestore's calculation depend
on that.

Do the same in tuplesort.c to keep the logic looking parallel, even though
tuplesort.c isn't actually at risk at present.  This will keep us from
breaking it if we ever muck with the allocation parameters in aset.c.

Back-patch to all supported versions.  The error message doesn't occur
pre-9.3, not so much because the problem can't happen as because the
pre-9.3 tuplestore code neglected to check for it.  (The chance of
trouble is a great deal larger as of 9.3, though, due to changes in the
array-size-increasing strategy.)  However, allowing LACKMEM() to become
true unexpectedly could still result in less-than-desirable behavior,
so let's patch it all the way back.
2015-08-04 18:18:46 -04:00
Joe Conway
cfa928ff6f Plug RLS related information leak in pg_stats view.
The pg_stats view is supposed to be restricted to only show rows
about tables the user can read. However, it sometimes can leak
information which could not otherwise be seen when row level security
is enabled. Fix that by not showing pg_stats rows to users that would
be subject to RLS on the table the row is related to. This is done
by creating/using the newly introduced SQL visible function,
row_security_active().

Along the way, clean up three call sites of check_enable_rls(). The second
argument of that function should only be specified as other than
InvalidOid when we are checking as a different user than the current one,
as in when querying through a view. These sites were passing GetUserId()
instead of InvalidOid, which can cause the function to return incorrect
results if the current user has the BYPASSRLS privilege and row_security
has been set to OFF.

Additionally fix a bug causing RI Trigger error messages to unintentionally
leak information when RLS is enabled, and other minor cleanup and
improvements. Also add WITH (security_barrier) to the definition of pg_stats.

Bumped CATVERSION due to new SQL functions and pg_stats view definition.

Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav.
Patch by Joe Conway and Dean Rasheed with review and input by
Michael Paquier and Stephen Frost.
2015-07-28 13:21:37 -07:00
Tom Lane
6fcb337fa5 Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Andrew Dunstan
89ddd29bbd Support JSON negative array subscripts everywhere
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting.  Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator).  This inconsistency seemed likely to be
counter-intuitive to users.  To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators.  This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.

For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count.  From there on, it's as
if a positive-wise value was originally provided.

Note that there is still a minor inconsistency here across jsonb
deletion operators.  Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized by as an integer
literal) is being used on an object, which is wrong-headed.  The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements.  However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).

In passing, make SetArrayPath() stricter about not accepting cases where
there is trailing non-numeric garbage bytes rather than a clean NUL
byte.  This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator).  Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.

Peter Geoghegan and Andrew Dunstan
2015-07-17 21:06:39 -04:00
Noah Misch
aaf15ee33a Revoke support for strxfrm() that write past the specified array length.
This formalizes a decision implicit in commit
4ea51cdfe8 and adds clean detection of
affected systems.  Vendor updates are available for each such known bug.
Back-patch to 9.5, where the aforementioned commit first appeared.
2015-07-08 20:44:25 -04:00
Tom Lane
62d16c7fc5 Improve design and implementation of pg_file_settings view.
As first committed, this view reported on the file contents as they were
at the last SIGHUP event.  That's not as useful as reporting on the current
contents, and what's more, it didn't work right on Windows unless the
current session had serviced at least one SIGHUP.  Therefore, arrange to
re-read the files when pg_show_all_settings() is called.  This requires
only minor refactoring so that we can pass changeVal = false to
set_config_option() so that it won't actually apply any changes locally.

In addition, add error reporting so that errors that would prevent the
configuration files from being loaded, or would prevent individual settings
from being applied, are visible directly in the view.  This makes the view
usable for pre-testing whether edits made in the config files will have the
desired effect, before one actually issues a SIGHUP.

I also added an "applied" column so that it's easy to identify entries that
are superseded by later entries; this was the main use-case for the original
design, but it seemed unnecessarily hard to use for that.

Also fix a 9.4.1 regression that allowed multiple entries for a
PGC_POSTMASTER variable to cause bogus complaints in the postmaster log.
(The issue here was that commit bf007a27ac unintentionally reverted
3e3f65973a, which suppressed any duplicate entries within
ParseConfigFp.  However, since the original coding of the pg_file_settings
view depended on such suppression *not* happening, we couldn't have fixed
this issue now without first doing something with pg_file_settings.
Now we suppress duplicates by marking them "ignored" within
ProcessConfigFileInternal, which doesn't hide them in the view.)

Lesser changes include:

Drive the view directly off the ConfigVariable list, instead of making a
basically-equivalent second copy of the data.  There's no longer any need
to hang onto the data permanently, anyway.

Convert show_all_file_settings() to do its work in one call and return a
tuplestore; this avoids risks associated with assuming that the GUC state
will hold still over the course of query execution.  (I think there were
probably latent bugs here, though you might need something like a cursor
on the view to expose them.)

Arrange to run SIGHUP processing in a short-lived memory context, to
forestall process-lifespan memory leaks.  (There is one known leak in this
code, in ProcessConfigDirectory; it seems minor enough to not be worth
back-patching a specific fix for.)

Remove mistaken assignment to ConfigFileLineno that caused line counting
after an include_dir directive to be completely wrong.

Add missed failure check in AlterSystemSetConfigFile().  We don't really
expect ParseConfigFp() to fail, but that's not an excuse for not checking.
2015-06-28 18:06:14 -04:00
Heikki Linnakangas
cb2acb1081 Add missing_ok option to the SQL functions for reading files.
This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)

The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.

Michael Paquier, with plenty of kibitzing by me.
2015-06-28 21:35:46 +03:00
Kevin Grittner
604e99396d Add opaque declaration of HTAB to tqual.h.
Commit b89e151054 added the
ResolveCminCmaxDuringDecoding declaration to tqual.h, which uses an
HTAB parameter, without declaring HTAB.  It accidentally fails to
fail to build with current sources because a declaration happens to
be included, directly or indirectly, in all source files that
currently use tqual.h before tqual.h is first included, but we
shouldn't count on that.  Since an opaque declaration is enough
here, just use that, as was done in snapmgr.h.

Backpatch to 9.4, where the HTAB reference was added to tqual.h.
2015-06-27 09:55:06 -05:00
Tom Lane
5d1ff6bd55 Fix the logic for putting relations into the relcache init file.
Commit f3b5565dd4 was a couple of bricks shy
of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index
into the relcache init file, because that index is not used by any
syscache.  However, we have historically nailed that index into cache for
performance reasons.  The upshot was that load_relcache_init_file always
decided that the init file was busted and silently ignored it, resulting
in a significant hit to backend startup speed.

To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around
RelationSupportsSysCache(), which can know about additional relations
that should be in the init file despite being unknown to syscache.c.

Also install some guards against future mistakes of this type: make
write_relcache_init_file Assert that all nailed relations get written to
the init file, and make load_relcache_init_file emit a WARNING if it takes
the "wrong number of nailed relations" exit path.  Now that we remove the
init files during postmaster startup, that case should never occur in the
field, even if we are starting a minor-version update that added or removed
rels from the nailed set.  So the warning shouldn't ever be seen by end
users, but it will show up in the regression tests if somebody breaks this
logic.

Back-patch to all supported branches, like the previous commit.
2015-06-25 14:39:05 -04:00
Kevin Grittner
870681017a Fix typo in comment.
Backpatch to 9.4 to minimize possible conflicts.
2015-06-10 17:03:56 -05:00
Tom Lane
f3b5565dd4 Use a safer method for determining whether relcache init file is stale.
When we invalidate the relcache entry for a system catalog or index, we
must also delete the relcache "init file" if the init file contains a copy
of that rel's entry.  The old way of doing this relied on a specially
maintained list of the OIDs of relations present in the init file: we made
the list either when reading the file in, or when writing the file out.
The problem is that when writing the file out, we included only rels
present in our local relcache, which might have already suffered some
deletions due to relcache inval events.  In such cases we correctly decided
not to overwrite the real init file with incomplete data --- but we still
used the incomplete initFileRelationIds list for the rest of the current
session.  This could result in wrong decisions about whether the session's
own actions require deletion of the init file, potentially allowing an init
file created by some other concurrent session to be left around even though
it's been made stale.

Since we don't support changing the schema of a system catalog at runtime,
the only likely scenario in which this would cause a problem in the field
involves a "vacuum full" on a catalog concurrently with other activity, and
even then it's far from easy to provoke.  Remarkably, this has been broken
since 2002 (in commit 7863404417), but we had
never seen a reproducible test case until recently.  If it did happen in
the field, the symptoms would probably involve unexpected "cache lookup
failed" errors to begin with, then "could not open file" failures after the
next checkpoint, as all accesses to the affected catalog stopped working.
Recovery would require manually removing the stale "pg_internal.init" file.

To fix, get rid of the initFileRelationIds list, and instead consult
syscache.c's list of relations used in catalog caches to decide whether a
relation is included in the init file.  This should be a tad more efficient
anyway, since we're replacing linear search of a list with ~100 entries
with a binary search.  It's a bit ugly that the init file contents are now
so directly tied to the catalog caches, but in practice that won't make
much difference.

Back-patch to all supported branches.
2015-06-07 15:32:09 -04:00
Andrew Dunstan
37def42245 Rename jsonb_replace to jsonb_set and allow it to add new values
The function is given a fourth parameter, which defaults to true. When
this parameter is true, if the last element of the path is missing
in the original json, jsonb_set creates it in the result and assigns it
the new value. If it is false then the function does nothing unless all
elements of the path are present, including the last.

Based on some original code from Dmitry Dolgov, heavily modified by me.

Catalog version bumped.
2015-05-31 20:34:10 -04:00
Tom Lane
da33a3894e Revert exporting of internal GUC variable "data_directory".
This undoes a poorly-thought-out choice in commit 970a18687f, namely
to export guc.c's internal variable data_directory.  The authoritative
variable so far as C code is concerned is DataDir; there is no reason for
anything except specific bits of GUC code to look at the GUC variable.

After yesterday's commits fixing the fsync-on-restart patch, the only
remaining misuse of data_directory was in AlterSystemSetConfigFile(),
which would be much better off just using a relative path anyhow: it's
less code and it doesn't break if the DBA moves the data directory of a
running system, which is a case we've taken some pains over in the past.

This is mostly cosmetic, so no need for a back-patch (and I'd be hesitant
to remove a global variable in stable branches anyway).
2015-05-29 11:57:33 -04:00
Tom Lane
91e79260f6 Remove no-longer-required function declarations.
Remove a bunch of "extern Datum foo(PG_FUNCTION_ARGS);" declarations that
are no longer needed now that PG_FUNCTION_INFO_V1(foo) provides that.

Some of these were evidently missed in commit e7128e8dbb, but others
were cargo-culted in in code added since then.  Possibly that can be blamed
in part on the fact that we'd not fixed relevant documentation examples,
which I've now done.
2015-05-24 12:20:23 -04:00
Bruce Momjian
807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Andrew Dunstan
5302760a50 Unpack jbvBinary objects passed to pushJsonbValue
pushJsonbValue was accepting jbvBinary objects passed as WJB_ELEM or
WJB_VALUE data. While this succeeded, when those objects were later
encountered in attempting to convert the result to Jsonb, errors
occurred. With this change we ghuarantee that a JSonbValue constructed
from calls to pushJsonbValue does not contain any jbvBinary objects.
This cures a problem observed with jsonb_delete.

This means callers of pushJsonbValue no longer need to perform this
unpacking themselves. A subsequent patch will perform some cleanup in
that area.

The error was not triggered by any 9.4 code, but this is a publicly
visible routine, and so the error could be exercised by third party
code, therefore backpatch to 9.4.

Bug report from Peter Geoghegan, fix by me.
2015-05-22 10:21:41 -04:00
Heikki Linnakangas
fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Tom Lane
4db485e75b Put back a backwards-compatible version of sampling support functions.
Commit 83e176ec18 removed the longstanding
support functions for block sampling without any consideration of the
impact this would have on third-party FDWs.  The new API is not notably
more functional for FDWs than the old, so forcing them to change doesn't
seem like a good thing.  We can provide the old API as a wrapper (more
or less) around the new one for a minimal amount of extra code.
2015-05-18 18:34:37 -04:00
Andres Freund
f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage.  The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting.  The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Alvaro Herrera
26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Simon Riggs
f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Heikki Linnakangas
35fcb1b3d0 Allow GiST distance function to return merely a lower-bound.
The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.

This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.

Alexander Korotkov and me
2015-05-15 14:26:51 +03:00
Simon Riggs
83e176ec18 Separate block sampling functions
Refactoring ahead of tablesample patch

Requested and reviewed by Michael Paquier

Petr Jelinek
2015-05-15 04:02:54 +02:00
Peter Eisentraut
a486e35706 Add pg_settings.pending_restart column
with input from David G. Johnston, Robert Haas, Michael Paquier
2015-05-14 20:08:51 -04:00
Tom Lane
1dc5ebc907 Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
2015-05-14 12:08:49 -04:00
Andrew Dunstan
c6947010ce Additional functions and operators for jsonb
jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.

Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.
2015-05-12 15:52:45 -04:00
Alvaro Herrera
b488c580ae Allow on-the-fly capture of DDL event details
This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.

The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object.  This is sufficient for many
uses of this feature.  For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code.  The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.

There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension.  The intention is that we will add some deparsing
code in a later release, as an in-core extension.

A new test module is included.  It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.

Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.

Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.

Based on original work by Dimitri Fontaine, though I didn't use his
code.

Discussion:
  https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
  https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org
  https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org
2015-05-11 19:14:31 -03:00
Andrew Dunstan
cb9fa802b3 Add new OID alias type regnamespace
Catalog version bumped

Kyotaro HORIGUCHI
2015-05-09 13:36:52 -04:00
Andrew Dunstan
0c90f6769d Add new OID alias type regrole
The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.

Kyotaro Horiguchi
2015-05-09 13:06:49 -04:00
Stephen Frost
a97e0c3354 Add pg_file_settings view and function
The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged, information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele
2015-05-08 19:09:26 -04:00
Andres Freund
168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Alvaro Herrera
3b6db1f445 Add geometry/range functions to support BRIN inclusion
This commit adds the following functions:
    box(point) -> box
    bound_box(box, box) -> box
    inet_same_family(inet, inet) -> bool
    inet_merge(inet, inet) -> cidr
    range_merge(anyrange, anyrange) -> anyrange

The first of these is also used to implement a new assignment cast from
point to box.

These functions are the first part of a base to implement an "inclusion"
operator class for BRIN, for multidimensional data types.

Author: Emre Hasegeli
Reviewed by: Andreas Karlsson
2015-05-05 15:22:24 -03:00
Robert Haas
924bcf4f16 Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 15:02:14 -04:00
Andres Freund
5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Peter Eisentraut
cac7658205 Add transforms feature
This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund
2015-04-26 10:33:14 -04:00