Returning the direct result of bit arithmetic, in a macro intended to be
used in a boolean manner, can be problematic if the return value is
stored in a variable of type 'bool'. If bool is implemented using C99's
_Bool, that can lead to comparison failures if the variable is then
compared again with the expression (see ginStepRight() for an example
that fails), as _Bool forces the result to be 0/1. That happens in some
configurations of newer MSVC compilers. It's also problematic when
storing the result of such an expression in a narrower type.
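To illustrate the hazard, a minimal standalone example (the macro and
flag names here are invented, modeled on the gin case):

    #include <stdbool.h>
    #include <stdio.h>

    #define GIN_FLAG 0x04                       /* hypothetical flag bit */
    #define GinHasFlag(x) ((x) & GIN_FLAG)      /* yields 0 or 4, never 1 */

    int
    main(void)
    {
        unsigned int flags = GIN_FLAG;
        bool        stored = GinHasFlag(flags); /* _Bool coerces 4 to 1 */

        /* stored is 1, but the raw macro result is 4, so this is false */
        if (stored == GinHasFlag(flags))
            printf("equal\n");
        else
            printf("not equal, despite both being \"true\"\n");
        return 0;
    }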
Several gin macros have been declared in that style since gin's initial
commit in 8a3631f8d8.
There are a lot more macros like this, but this is the only one causing
regression test failures; and I don't want to commit and backpatch a
larger patch with lots of conflicts just before the next set of minor
releases.
Discussion: 20150811154237.GD17575@awork2.anarazel.de
Backpatch: All supported branches
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t. According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t. (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibcs in the wild for which it's a live concern.)
It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>. This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.
Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().
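The resulting include rule is roughly the following sketch (the
configure symbol names are given from memory and may not match the
committed ones exactly):

    /* Include <xlocale.h> if that's where locale_t lives, or if it is
     * needed to declare mbstowcs_l() and friends. */
    #if defined(LOCALE_T_IN_XLOCALE) || defined(WCSTOMBS_L_IN_XLOCALE)
    #include <xlocale.h>
    #endif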
Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, and fsync the containing directory. This sequence is
not generally adhered to currently, which exposes us to data loss
risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.
Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason to extend several copies of
the same logic, so centralize the link() callers.
This commit does not yet make use of the new functions; they're used in
a followup commit.
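For reference, a simplified standalone sketch of the sequence that
durable_rename() encapsulates (the real version reports errors via
ereport and handles directory fds more carefully):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    durable_rename_sketch(const char *oldfile, const char *newfile,
                          const char *dir)
    {
        int         fd;

        /* 1. flush the old file's contents */
        if ((fd = open(oldfile, O_RDWR)) < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);

        /* 2. the rename itself */
        if (rename(oldfile, newfile) != 0)
            return -1;

        /* 3. flush the file under its new name */
        if ((fd = open(newfile, O_RDWR)) < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);

        /* 4. flush the directory so the new name itself is durable */
        if ((fd = open(dir, O_RDONLY)) < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }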
Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds. Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity, btree's index-tuple size limit keeps that small, but
that's not the case for REPLICA IDENTITY FULL.
Change the tuple buffer infrastructure to separate allocate over-large
tuples, instead of always going through the slab cache.
This unfortunately requires changing the ReorderBufferTupleBuf
definition, since we need to store the allocated size someplace. To
avoid requiring output plugins to recompile, don't store
HeapTupleHeaderData directly after HeapTupleData, but point to it via
t_data; that leaves room for the allocated size. As there's no reason
for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience having it directly accessible.
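The revised layout is roughly the following sketch (reproduced from
memory; see reorderbuffer.h for the authoritative definition):

    typedef struct ReorderBufferTupleBuf
    {
        slist_node  node;               /* position in preallocated list */
        HeapTupleData tuple;            /* t_data points past this struct */
        Size        alloc_tuple_size;   /* allocated size, not tuple size */
        /* the HeapTupleHeaderData itself follows, reached via t_data */
    } ReorderBufferTupleBuf;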
Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
Logical decoding's reorderbuffer keeps transactions in an LSN-ordered
list for efficiency. To make that efficiently possible, upper-level
xids are forced to be logged before nested subtransaction xids. That
only works, though, if all those records are actually looked at:
unfortunately we didn't do so for e.g. row-level locks, which are
otherwise uninteresting for logical decoding.
This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".
It's not sufficient to just look at row-locking records; the xid could
first appear in a lot of other record types (which will trigger the
transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.
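The new call is made for every record during decoding, roughly like
this fragment (function and argument names as best I recall from the
patch):

    /* In LogicalDecodingProcessRecord(), before dispatching by rmgr,
     * report the record's xid so subtransaction/toplevel ordering is
     * tracked even for records decoding otherwise ignores. */
    ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
                            buf->origptr);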
Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
606c0123d6 attempted to reduce the cost of index scans using > and <
strategies, but got that completely wrong in a few complex cases.
Revert whole patch until we find a safe optimization.
Architecture reference material specifies this order, and s_lock.h
inline assembly agrees. The former order failed to provide mutual
exclusion to lwlock.c and perhaps to other clients. The two xlc
buildfarm members, hornet and mandrill, have failed sixteen times with
duplicate key errors involving pg_class_oid_index or pg_type_oid_index.
Back-patch to 9.5, where commit b64d92f1a5
introduced atomics.
Reviewed by Andres Freund and Tom Lane.
In commit 7b4bfc87 the DATA and DESCR entries for the new
row_security_active() function were inadvertently put after
the PROVOLATILE defines, rather than before them as they should
have been. Move them up to where they belong.
Backpatch to 9.5 where the new entries were introduced.
Reportedly, some compilers warn about tests like "c < 0" if c is unsigned,
and hence complain about the character range checks I added in commit
3bb3f42f37. This is a bit of a pain since
the regex library doesn't really want to assume that chr is unsigned.
However, since any such reconfiguration would involve manual edits of
regcustom.h anyway, we can put it on the shoulders of whoever wants to
do that to adjust this new range-checking macro correctly.
Per gripes from Coverity and Andres.
Previously, our regex code defined CHR_MAX as 0xfffffffe, which is a
bad choice because it is outside the range of type "celt" (int32).
Characters approaching that limit could lead to infinite loops in logic
such as "for (c = a; c <= b; c++)" where c is of type celt but the
range bounds are chr. Such loops will work safely only if CHR_MAX+1
is representable in celt, since c must advance to beyond b before the
loop will exit.
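A sketch of the failure mode (types reduced to their definitions):

    #include <stdint.h>

    typedef int32_t celt;               /* as in the regex code */
    typedef uint32_t chr;

    /* With the old CHR_MAX of 0xfffffffe, suppose b == (chr) 0xfffffffe.
     * For the loop to exit, c must advance beyond b; but 0xffffffff is
     * not representable in celt, so incrementing past INT32_MAX is
     * signed overflow -- undefined behavior, in practice a wraparound
     * that keeps the loop running forever. */
    static void
    scan_range(chr a, chr b)
    {
        celt        c;

        for (c = (celt) a; (chr) c <= b; c++)
        {
            /* process character c */
        }
    }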
Fortunately, there seems no reason not to restrict CHR_MAX to 0x7ffffffe.
It's highly unlikely that Unicode will ever assign codes that high, and
none of our other backend encodings need characters beyond that either.
In addition to modifying the macro, we have to explicitly enforce character
range restrictions on the values of \u, \U, and \x escape sequences, else
the limit is trivially bypassed.
Also, the code for expanding case-independent character ranges in bracket
expressions had a potential integer overflow in its calculation of the
number of characters it could generate, which could lead to allocating too
small a character vector and then overwriting memory. An attacker with the
ability to supply arbitrary regex patterns could easily cause transient DOS
via server crashes, and the possibility for privilege escalation has not
been ruled out.
Quite aside from the integer-overflow problem, the range expansion code was
unnecessarily inefficient in that it always produced a result consisting of
individual characters, abandoning the knowledge that we had a range to
start with. If the input range is large, this requires excessive memory.
Change it so that the original range is reported as-is, and then we add on
any case-equivalent characters that are outside that range. With this
approach, we can bound the number of individual characters allowed without
sacrificing much. This patch allows at most 100000 individual characters,
which I believe to be more than the number of case pairs existing in
Unicode, so that the restriction will never be hit in practice.
It's still possible for range() to take a while given a large character code
range, so also add statement-cancel detection to its loop. The downstream
function dovec() also lacked cancel detection, and could take a long time
given a large output from range().
Per fuzz testing by Greg Stark. Back-patch to all supported branches.
Security: CVE-2016-0773
All the other jsonb function descriptions refer to the arguments as being
"jsonb", but these two said "json". Make it consistent. Per bug #13905
from Petru Florin Mihancea.
No catversion bump --- we can't force one in the back branches, and this
isn't very critical anyway.
Commit e529cd4ffa introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.
It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.
Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN, which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place). That means the cost is not really O(mn) but more like O(max(m,n)).
Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.
In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.
Per gripe from Kevin Day; solution suggested by Robert Haas. Back-patch
to 9.5 where the unwanted restriction was introduced.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons. It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime. Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself. This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.
I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.
The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems. Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)
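In code terms the rule amounts to something like this, applied where
Consts are built (sketch; cf. makeConst):

    /* Flatten varlena values before storing them in a Const, so that
     * expanded, compressed, or short-header datums are canonicalized. */
    if (constlen == -1 && !constisnull)
        constvalue = PointerGetDatum(PG_DETOAST_DATUM(constvalue));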
The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum(). That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.
Back-patch to 9.5 where the expanded-object facility was added. It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
A scan for missed proisstrict markings in the core code turned up
these functions:
brin_summarize_new_values
pg_stat_reset_single_table_counters
pg_stat_reset_single_function_counters
pg_create_logical_replication_slot
pg_create_physical_replication_slot
pg_drop_replication_slot
The first three of these take an OID, so a null argument will normally look
like a zero to them, resulting in "ERROR: could not open relation with OID
0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX
functions. The other three will dump core on a null argument, though this
is mitigated by the fact that they won't do so until after checking that
the caller is superuser or has rolreplication privilege.
In addition, the pg_logical_slot_get/peek[_binary]_changes family was
intentionally marked nonstrict, but failed to make nullness checks on all
the arguments; so again a null-pointer-dereference crash is possible but
only for superusers and rolreplication users.
Add the missing ARGISNULL checks to the latter functions, and mark the
former functions as strict in pg_proc. Make that change in the back
branches too, even though we can't force initdb there, just so that
installations initdb'd in future won't have the issue. Since none of these
bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX
functions hardly misbehave at all), it seems sufficient to do this.
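The added checks follow the usual pattern for nonstrict functions,
roughly (sketch of one argument position; the real functions check
each argument they use):

    /* Inside pg_logical_slot_get_changes() and its siblings: */
    if (PG_ARGISNULL(0))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("slot name must not be null")));
    name = PG_GETARG_NAME(0);       /* safe only after the null check */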
In addition, fix some order-of-operations oddities in the slot_get_changes
family, mostly cosmetic, but not the part that moves the function's last
few operations into the PG_TRY block. As it stood, there was a
significant risk that an error exit would fail to clear historical
information from the system caches.
The slot_get_changes bugs go back to 9.4 where that code was introduced.
Back-patch appropriate subsets of the pg_proc changes into all active
branches, as well.
Commit a2e35b53c3 added an #include of catalog/objectaddress.h to
pg_operator.h, making it impossible for client-side code to #include
pg_operator.h. It's not entirely clear whether any client-side code needs
to include pg_operator.h, but it seems prudent to assume that there is some
such code somewhere. Therefore, split off the function definitions into a
new file pg_operator_fn.h, similarly to what we've done for some other
catalog header files.
Back-patch of part of commit 0dab5ef39b.
The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit
newestCommitTsXid and oldestCommitTsXid respectively. Also modify
related code in a similar fashion, particularly the user-facing output
emitted by pg_controldata and pg_resetxlog.
Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.
This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which later caused other failures such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite. Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existent user, which is bogus.
To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly. AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.
I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised. Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.
Per bug #13666 reported by Chris Pacejo.
Backpatch to 9.5.
(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched. If we
decide that we need this in earlier branches too, we should backpatch
both.)
It emerges that libreadline doesn't notice terminal window size change
events unless they occur while collecting input. This is easy to stumble
over if you resize the window while using a pager to look at query output,
but it can be demonstrated without any pager involvement. The symptom is
that queries exceeding one line are misdisplayed during subsequent input
cycles, because libreadline has the wrong idea of the screen dimensions.
The safest, simplest way to fix this is to call rl_reset_screen_size()
just before calling readline(). That causes an extra ioctl(TIOCGWINSZ)
for every command; but since it only happens when reading from a tty, the
performance impact should be negligible. A more valid objection is that
this still leaves a tiny window during entry to readline() wherein delivery
of SIGWINCH will be missed; but the practical consequences of that are
probably negligible. In any case, there doesn't seem to be any good way to
avoid the race, since readline exposes no functions that seem safe to call
from a generic signal handler --- rl_reset_screen_size() certainly isn't.
It turns out that we also need an explicit rl_initialize() call, else
rl_reset_screen_size() dumps core when called before the first readline()
call.
rl_reset_screen_size() is not present in old versions of libreadline,
so we need a configure test for that. (rl_initialize() is present at
least back to readline 4.0, so we won't bother with a test for it.)
We would need a configure test anyway since libedit's emulation of
libreadline doesn't currently include such a function. Fortunately,
libedit seems not to have any corresponding bug.
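The input path thus ends up looking approximately like this (the HAVE_
symbol name reflects the new configure test, from memory):

    /* once, early in startup */
    rl_initialize();

    /* per input cycle, just before collecting a line */
    #ifdef HAVE_RL_RESET_SCREEN_SIZE
        rl_reset_screen_size();         /* re-query terminal dimensions */
    #endif
        line = readline(prompt);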
Merlin Moncure, adjusted a bit by me
DROP OWNED BY handled GRANT-based ACLs but was not removing roles from
policies. Fix that by having DROP OWNED BY remove the role specified
from the list of roles the policy (or policies) apply to, or the entire
policy (or policies) if it only applied to the role specified.
As with ACLs, the DROP OWNED BY caller must have permission to modify
the policy or a WARNING is thrown and no change is made to the policy.
I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cac that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time what we want to know about includes indirect lateral
references, something the LateralJoinInfo data was ill-suited to compute
with any efficiency. The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references. For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation. Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code. This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.
Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.
More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references. If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too. The planner did not understand this and would generate
intermediate joins that could not be used. While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.
To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent. This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.
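The closure computation itself is plain iteration to a fixed point,
roughly as in this sketch (using the 9.5-era bitmapset API; details
simplified):

    for (;;)
    {
        bool        found_change = false;
        Index       rti;

        for (rti = 1; rti < root->simple_rel_array_size; rti++)
        {
            RelOptInfo *rel = root->simple_rel_array[rti];
            Relids      added = NULL;
            int         rti2 = -1;

            if (rel == NULL)
                continue;
            /* union in the lateral references of everything we reference */
            while ((rti2 = bms_next_member(rel->lateral_relids, rti2)) >= 0)
            {
                RelOptInfo *rel2 = root->simple_rel_array[rti2];

                if (rel2 != NULL)
                    added = bms_add_members(added, rel2->lateral_relids);
            }
            if (!bms_is_subset(added, rel->lateral_relids))
            {
                rel->lateral_relids =
                    bms_add_members(rel->lateral_relids, added);
                found_change = true;
            }
        }
        if (!found_change)
            break;              /* reached the transitive closure */
    }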
Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations. This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.
Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1. Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures. The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen. I chose to move the
supporting logic into joinrels.c where the latter functions are.
Back-patch to 9.3 where LATERAL support was introduced.
This module needs explicit initialization in order to replay WAL records
in recovery, but we had broken this recently following changes to make
other (stranger) scenarios work correctly. To fix, rework the
initialization sequence so that it always takes place before WAL replay
commences for both master and standby.
I could have gone for a more localized fix that just added a "startup"
call for the master server, but it seemed better to restructure the
existing callers as well so that the whole thing made more sense. As a
drawback, there is more control logic in xlog.c now than previously, but
doing otherwise meant passing down the ControlFile flag, which seemed
uglier as a whole.
This also meant adding a check to not re-execute ActivateCommitTs if it
had already been called.
Reported by Fujii Masao.
Backpatch to 9.5.
At the end of crash recovery, unlogged relations are reset to the empty
state, using their init fork as the template. The init fork is copied to
the main fork without going through shared buffers. Unfortunately, WAL
replay has not necessarily flushed those writes from shared buffers to
disk by that point. In normal crash recovery, and before the
introduction of 'fast promotions' in fd4ced523 / 9.3, the
END_OF_RECOVERY checkpoint flushes the buffers out in time. But with
fast promotions that's not the case anymore.
To fix, force WAL writes targeting the init fork to be flushed
immediately (using the new FlushOneBuffer() function). In 9.5+ that
flush can be triggered centrally from the code dealing with restoring
full-page writes (XLogReadBufferForRedoExtended); in earlier releases
that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay
function.
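In 9.5 the central fix amounts to roughly this (sketch, from memory of
the change):

    /* In XLogReadBufferForRedoExtended(), after restoring a full-page
     * image: init-fork pages must reach disk before recovery ends,
     * because the init-fork copy at end of recovery bypasses shared
     * buffers. */
    if (forknum == INIT_FORKNUM)
        FlushOneBuffer(buffer);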
Backpatch to 9.1, even though this is currently only known to trigger
in 9.3+. Flushing earlier is more robust, and it is advantageous to
keep the branches similar.
Typical symptoms of this bug are errors like
'ERROR: index "..." contains unexpected zero page at block 0'
shortly after promoting a node.
Reported-By: Thom Brown
Author: Andres Freund and Michael Paquier
Discussion: 20150326175024.GJ451@alap3.anarazel.de
Backpatch: 9.1-
While convincing myself that commit 7e19db0c09 would solve both of
the problems recently reported by Andreas Seltenreich, I realized that
add_paths_to_joinrel's handling of LATERAL restrictions could be made
noticeably simpler and faster if we were to retain the minimum possible
parameterization for each joinrel (that is, the set of relids supplying
unsatisfied lateral references in it). We already retain that for
baserels, in RelOptInfo.lateral_relids, so we can use that field for
joinrels too.
This is a back-port of commit edca44b152.
I originally intended not to back-patch that, but additional hacking
in this area turns out to be needed, making it necessary, not merely
optional, to compute lateral_relids for joinrels. In preparation for
those fixes,
sync the relevant code with HEAD as much as practical. (I did not risk
rearranging fields of RelOptInfo in released branches, however.)
Commit e7cb7ee145 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of the scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.
To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).
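The core of the EPQ change is approximately this (sketch of the
ExecScanFetch logic; variable names illustrative):

    /* During an EvalPlanQual recheck: */
    if (scanrelid == 0)
    {
        TupleTableSlot *slot = node->ss_ScanTupleSlot;

        /* Pushed-down join: the recheck method is responsible for
         * populating the slot itself, typically by re-evaluating the
         * join against the EPQ tuples. */
        if (!(*recheckMtd) (node, slot))
            ExecClearTuple(slot);       /* recheck failed: return empty */
        return slot;
    }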
KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
It was possible for the planner to decide to join a LATERAL subquery to
the outer side of an outer join before the outer join itself is completed.
Normally that's fine because of the associativity rules, but it doesn't
work if the subquery contains a lateral reference to the inner side of the
outer join. In such a situation the outer join *must* be done first.
join_is_legal() missed this consideration and would allow the join to be
attempted, but the actual path-building code correctly decided that no
valid join path could be made, sometimes leading to planner errors such as
"failed to build any N-way joins".
Per report from Andreas Seltenreich. Back-patch to 9.3 where LATERAL
support was added.
As pointed out by Fujii Masao, we weren't quite there on a standby
behaving sanely: first because we were failing to acquire the correct
state in the case where no XLOG_PARAMETER_CHANGE message was sent
(because a checkpoint had already happened after the setting was changed
in the master, and then the standby was restarted); and second because
promoting the standby with the feature enabled failed to activate it if
the master had the feature disabled.
This patch fixes both those misbehaviors hopefully without
re-introducing any old problems.
Also change the hint emitted in a standby together with the error
message about the feature being disabled, to make it point out that the
place to change the setting is the master. Otherwise, if the setting is
already enabled in the standby, it is very confusing to have it say that
the setting must be enabled ...
Authors: Álvaro Herrera, Petr Jelínek.
Backpatch to 9.5.
In commit 8abb3cda0d I attempted to cache
the expression state trees constructed for domain CHECK constraints for
the life of the backend (assuming the domain's constraints don't get
redefined). However, this turns out not to work very well, because
execQual.c will run those state trees with ecxt_per_query_memory pointing
to a query-lifespan context, and in some situations we'll end up with
pointers into that context getting stored into the state trees. This
happens in particular with SQL-language functions, as reported by
Emre Hasegeli, but there are many other cases.
To fix, keep only the expression plan trees for domain CHECK constraints
in the typcache's data structure, and revert to performing ExecInitExpr
(at least) once per query to set up expression state trees in the query's
context.
Eventually it'd be nice to undo this, but that will require some careful
thought about memory management for expression state trees, and it seems
far too late for any such redesign in 9.5. This way is still much more
efficient than what happened before 8abb3cda0.
The POSIX standard for tar headers requires archive member sizes to be
printed in octal with at most 11 digits, limiting the representable file
size to 8GB. However, GNU tar and apparently most other modern tars
support a convention in which oversized values can be stored in base-256,
allowing any practical file to be a tar member. Adopt this convention
to remove two limitations:
* pg_dump with -Ft output format failed if the contents of any one table
exceeded 8GB.
* pg_basebackup failed if the data directory contained any file exceeding
8GB. (This would be a fatal problem for installations configured with a
table segment size of 8GB or more, and it has also been seen to fail when
large core dump files exist in the data directory.)
File sizes under 8GB are still printed in octal, so that no compatibility
issues are created except in cases that would have failed entirely before.
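The base-256 convention stores the value big-endian in the 12-byte size
field with the high bit of the first byte set. A self-contained sketch
(helper name invented):

    #include <stdint.h>
    #include <stdio.h>

    /* Fill a 12-byte tar numeric field: POSIX octal below 8GB,
     * GNU-style base-256 above. */
    static void
    tar_put_size(char *field, uint64_t val)
    {
        if (val < ((uint64_t) 1 << 33))     /* fits in 11 octal digits */
            snprintf(field, 12, "%011llo", (unsigned long long) val);
        else
        {
            int         i;

            field[0] = (char) 0x80;         /* flag: base-256 encoding */
            for (i = 11; i > 0; i--)        /* big-endian value bytes */
            {
                field[i] = (char) (val & 0xFF);
                val >>= 8;
            }
        }
    }

    int
    main(void)
    {
        char        field[12];

        tar_put_size(field, (uint64_t) 16 << 30);   /* a 16GB member */
        printf("flag byte: 0x%02x\n", (unsigned char) field[0]);
        return 0;
    }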
In addition, this patch fixes several bugs in the same area:
* In 9.3 and later, we'd defined tarCreateHeader's file-size argument as
size_t, which meant that on 32-bit machines it would write a corrupt tar
header for file sizes between 4GB and 8GB, even though no error was raised.
This broke both "pg_dump -Ft" and pg_basebackup for such cases.
* pg_restore from a tar archive would fail on tables of size between 4GB
and 8GB, on machines where either "size_t" or "unsigned long" is 32 bits.
This happened even with an archive file not affected by the previous bug.
* pg_basebackup would fail if there were files of size between 4GB and 8GB,
even on 64-bit machines.
* In 9.3 and later, "pg_basebackup -Ft" failed entirely, for any file size,
on 64-bit big-endian machines.
In view of these potential data-loss bugs, back-patch to all supported
branches, even though removal of the documented 8GB limit might otherwise
be considered a new feature rather than a bug fix.
The previous way of reconstructing check constraints was to do a separate
"ALTER TABLE ONLY tab ADD CONSTRAINT" for each table in an inheritance
hierarchy. However, that way has no hope of reconstructing the check
constraints' own inheritance properties correctly, as pointed out in
bug #13779 from Jan Dirk Zijlstra. What we should do instead is to do
a regular "ALTER TABLE", allowing recursion, at the topmost table that
has a particular constraint, and then suppress the work queue entries
for inherited instances of the constraint.
Annoyingly, we'd tried to fix this behavior before, in commit 5ed6546cf,
but we failed to notice that it wasn't reconstructing the pg_constraint
field values correctly.
As long as I'm touching pg_get_constraintdef_worker anyway, tweak it to
always schema-qualify the target table name; this seems like useful backup
to the protections installed by commit 5f173040.
In HEAD/9.5, get rid of get_constraint_relation_oids, which is now unused.
(I could alternatively have modified it to also return conislocal, but that
seemed like a pretty single-purpose API, so let's not pretend it has some
other use.) It's unused in the back branches as well, but I left it in
place just in case some third-party code has decided to use it.
In HEAD/9.5, also rename pg_get_constraintdef_string to
pg_get_constraintdef_command, as the previous name did nothing to explain
what that entry point did differently from others (and its comment was
equally useless). Again, that change doesn't seem like material for
back-patching.
I did a bit of re-pgindenting in tablecmds.c in HEAD/9.5, as well.
Otherwise, back-patch to all supported branches.
Up until now, the total amount of data that could be passed to a
background worker at startup was one datum, which can be as small as
4 bytes on some systems. That's enough to pass a dsm_handle or an
array index, but not much else. Add a bgw_extra flag to the
BackgroundWorker struct, allowing up to 128 bytes to be passed to
a new worker on any platform.
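The struct simply gains a fixed-size scratch area, roughly (abridged
sketch; see bgworker.h for the real declaration):

    #define BGW_EXTRALEN 128

    typedef struct BackgroundWorker
    {
        /* ... existing naming, flags, and startup fields ... */
        Datum       bgw_main_arg;               /* formerly the only payload */
        char        bgw_extra[BGW_EXTRALEN];    /* new: opaque extra data */
        pid_t       bgw_notify_pid;
    } BackgroundWorker;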
Use this to fix a problem I recently discovered with the parallel
context machinery added in 9.5: the master assigns each worker an
array index, and each worker subsequently assigns itself an array
index, and there's nothing to guarantee that the two sets of indexes
match, leading to chaos.
Normally, I would not back-patch the change to add bgw_extra, since it
is basically a feature addition. However, since 9.5 is still in beta
and there seems to be no other sensible way to repair the broken
parallel context machinery, back-patch to 9.5. Existing background
worker code can ignore the bgw_extra field without a problem, but
might need to be recompiled since the structure size has changed.
Report and patch by me. Review by Amit Kapila.
Further tweak commit_ts.c so that on a standby the state is completely
consistent with that in the master, rather than behaving differently
in cases where the settings differ. Now the module should always be
active or inactive in lockstep in the standby and the master.
Author: Petr Jelínek, with some further tweaks by Álvaro Herrera.
Backpatch to 9.5, where commit timestamps were introduced.
Discussion: http://www.postgresql.org/message-id/5622BF9D.2010409@2ndquadrant.com
Commit bda76c1c8c caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json(). Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely. Per bug #13687 from Stepan Perlov. Back-patch to 9.4, like
the previous commit.
In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
This code previously counted the number of NFA states it created, and
complained if a limit was exceeded, so as to prevent bizarre regex patterns
from consuming unreasonable time or memory. That's fine as far as it went,
but the code paid no attention to how many arcs linked those states. Since
regexes can be contrived that have O(N) states but will need O(N^2) arcs
after fixempties() processing, it was still possible to blow out memory,
and take a long time doing it too. To fix, modify the bookkeeping to count
space used by both states and arcs.
I did not bother with including the "color map" in the accounting; it
can only grow to a few megabytes, which is not a lot in comparison to
what we're allowing for states+arcs (about 150MB on 64-bit machines
or half that on 32-bit machines).
Looking at some of the larger real-world regexes captured in the Tcl
regression test suite suggests that the most that is likely to be needed
for regexes found in the wild is under 10MB, so I believe that the current
limit has enough headroom to make it okay to keep it as a hard-wired limit.
In connection with this, redefine REG_ETOOBIG as meaning "regular
expression is too complex"; the previous wording of "nfa has too many
states" was already somewhat inapropos because of the error code's use
for stack depth overrun, and it was not very user-friendly either.
Back-patch to all supported branches.
Change the singly-linked in-arc and out-arc lists to be doubly-linked,
so that arc deletion is constant time rather than having worst-case time
proportional to the number of other arcs on the connected states.
Modify the bulk arc transfer operations copyins(), copyouts(), moveins(),
moveouts() so that they use a sort-and-merge algorithm whenever there's
more than a small number of arcs to be copied or moved. The previous
method is O(N^2) in the number of arcs involved, because it performs
duplicate checking independently for each copied arc. The new method may
change the ordering of existing arcs for the destination state, but nothing
really cares about that.
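The sort-and-merge technique is the classic one, sketched here with a
simplified arc key (field names illustrative):

    #include <stdlib.h>

    struct carc { int co; long to; };   /* color and target state */

    static int
    carc_cmp(const void *a, const void *b)
    {
        const struct carc *aa = a, *bb = b;

        if (aa->co != bb->co)
            return (aa->co < bb->co) ? -1 : 1;
        if (aa->to != bb->to)
            return (aa->to < bb->to) ? -1 : 1;
        return 0;
    }

    /* Copy into "out" the source arcs not already present on the
     * destination: O(N log N) overall, versus O(N^2) for per-arc
     * duplicate checks. */
    static size_t
    merge_missing(struct carc *src, size_t nsrc,
                  struct carc *dst, size_t ndst,
                  struct carc *out)
    {
        size_t      i = 0, j = 0, n = 0;

        qsort(src, nsrc, sizeof(*src), carc_cmp);
        qsort(dst, ndst, sizeof(*dst), carc_cmp);
        while (i < nsrc)
        {
            int         cmp = (j < ndst) ? carc_cmp(&src[i], &dst[j]) : -1;

            if (cmp < 0)
                out[n++] = src[i++];    /* missing from destination */
            else if (cmp == 0)
                i++;                    /* duplicate: skip */
            else
                j++;                    /* advance destination cursor */
        }
        return n;
    }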
Provide another bulk arc copying method mergeins(), which is unused as
of this commit but is needed for the next one. It basically is like
copyins(), but the source arcs might not all come from the same state.
Replace the O(N^2) bubble-sort algorithm used in carcsort() with a qsort()
call.
These changes greatly improve the performance of regex compilation for
large or complex regexes, at the cost of extra space for arc storage during
compilation. The original tradeoff was probably fine when it was made, but
now we care more about speed and less about memory consumption.
Back-patch to all supported branches.
check_role() tries to verify that the user has permission to become the
requested role, but this is inappropriate in a parallel worker, which
needs to exactly recreate the master's authorization settings. So skip
the check in that case.
This fixes a bug in commit 924bcf4f16.
Commit 2bd9e412f9 introduced a mechanism
for relaying protocol messages from a background worker to another
backend via a shm_mq. However, there was no provision for shutting
down the communication channel. Therefore, a protocol message sent
late in the shutdown sequence, such as a DEBUG message resulting from
cranking up log_min_messages, could crash the server. To fix, install
an on_dsm_detach callback that disables sending messages to the shm_mq
when the associated DSM is detached.
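The shutdown hook follows the usual on_dsm_detach() pattern,
approximately (the helper disabling the redirection is hypothetical
here):

    static void
    cleanup_message_redirection(dsm_segment *seg, Datum arg)
    {
        /* Stop routing protocol messages into the now-defunct shm_mq;
         * later messages fall back to the normal destination. */
        stop_redirecting_to_shm_mq();   /* hypothetical helper */
    }

    /* registered when the worker attaches its message queue */
    on_dsm_detach(seg, cleanup_message_redirection, (Datum) 0);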
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a parameterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.
This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down. Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.
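For FDW authors the visible change is one more argument when building
the scan node; by 9.5's final API the call looks roughly like this
(argument names illustrative):

    /* In GetForeignPlan: quals evaluated remotely must now also be
     * handed over for local re-evaluation during EvalPlanQual. */
    return make_foreignscan(tlist,
                            local_exprs,    /* checked locally at runtime */
                            scan_relid,
                            fdw_exprs,      /* expressions shipped to remote */
                            fdw_private,
                            NIL,            /* fdw_scan_tlist */
                            remote_exprs,   /* new: fdw_recheck_quals */
                            outer_plan);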
Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
Postmaster child processes that aren't supposed to be attached to shared
memory were not bothering to close the shared memory mapping handle they
inherit from the postmaster process. That's mostly harmless, since the
handle vanishes anyway when the child process exits -- but the syslogger
process, if used, doesn't get killed and restarted during recovery from a
backend crash. That meant that Windows doesn't see the shared memory
mapping as becoming free, so it doesn't delete it and the postmaster is
unable to create a new one, resulting in failure to recover from crashes
whenever logging_collector is turned on.
Per report from Dmitry Vasilyev. It's a bit astonishing that we'd not
figured this out long ago, since it's been broken from the very beginnings
of our native Windows support; probably some previously-unexplained trouble
reports trace to this.
A secondary problem is that on Cygwin (perhaps only in older versions?),
exec() may not detach from the shared memory segment after all, in which
case these child processes did remain attached to shared memory, posing
the risk of an unexpected shared memory clobber if they went off the rails
somehow. That may be a long-gone bug, but we can deal with it now if it's
still live, by detaching within the infrastructure introduced here to deal
with closing the handle.
Back-patch to all supported branches.
Tom Lane and Amit Kapila
The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received
SIGQUIT.
The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence. When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit. Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket
address.
However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.
Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.