postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-06-19 13:39:01 -04:00

Author	SHA1	Message	Date
Andres Freund	4bc0f165cb	Change default of backend_flush_after GUC to 0 (disabled). While beneficial, both for throughput and average/worst case latency, in a significant number of workloads, there are other workloads in which backend_flush_after can cause significant performance regressions in comparison to < 9.6 releases. The regression is most likely when the hot data set is bigger than shared buffers, but significantly smaller than the operating system's page cache. I personally think that the benefit of enabling backend flush control is considerably bigger than the potential downsides, but a fair argument can be made that not regressing is more important than improving performance/latency. As the latter is the consensus, change the default to 0. The other settings introduced in `428b1d6b2` do not have the same potential for regressions, so leave them enabled. Benchmarks leading up to changing the default have been performed by Mithun Cy, Ashutosh Sharma and Robert Haas. Discussion: CAD__OuhPmc6XH=wYRm_+Q657yQE88DakN4=Ybh2oveFasHkoeA@mail.gmail.com	2016-06-10 15:31:11 -07:00
Tom Lane	3303ea1a32	Remove reltarget_has_non_vars flag. Commit `b12fd41c6` added a "reltarget_has_non_vars" field to RelOptInfo, but failed to maintain it accurately. Since its only purpose was to skip calls to has_parallel_hazard() in the simple case where a rel's targetlist is all Vars, and that call is really pretty cheap in that case anyway, it seems like this is just a case of premature optimization. Let's drop the flag and do the calls unconditionally until it's proven that we need more smarts here.	2016-06-10 16:20:03 -04:00
Tom Lane	2f153ddfdd	Refactor to reduce code duplication for function property checking. As noted by Andres Freund, we'd accumulated quite a few similar functions in clauses.c that examine all functions in an expression tree to see if they satisfy some boolean test. Reduce the duplication by inventing a function check_functions_in_node() that applies a simple callback function to each SQL function OID appearing in a given expression node. This also fixes some arguable oversights; for example, contain_mutable_functions() did not check aggregate or window functions for mutability. I doubt that that represents a live bug at the moment, because we don't really consider mutability for aggregates; but it might someday be one. I chose to put check_functions_in_node() in nodeFuncs.c because it seemed like other modules might wish to use it in future. That in turn forced moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean. In passing, teach contain_leaked_vars_walker() about a few more expression node types it can safely look through, and improve the rather messy and undercommented code in has_parallel_hazard_walker(). Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>	2016-06-10 16:03:46 -04:00
Kevin Grittner	bf9a60ee33	Fix interaction between CREATE INDEX and "snapshot too old". Since indexes are created without valid LSNs, an index created while a snapshot older than old_snapshot_threshold existed could cause queries to return incorrect results when those old snapshots were used, if any relevant rows had been subject to early pruning before the index was built. Prevent usage of a newly created index until all such snapshots are released, for relations where this can happen. Questions about the interaction of "snapshot too old" with index creation were initially raised by Andres Freund. Reviewed by Robert Haas.	2016-06-10 09:25:31 -05:00
Tom Lane	cae1c788b9	Improve the situation for parallel query versus temp relations. Transmit the leader's temp-namespace state to workers. This is important because without it, the workers do not really have the same search path as the leader. For example, there is no good reason (and no extant code either) to prevent a worker from executing a temp function that the leader created previously; but as things stood it would fail to find the temp function, and then either fail or execute the wrong function entirely. We still prohibit a worker from creating a temp namespace on its own. In effect, a worker can only see the session's temp namespace if the leader had created it before starting the worker, which seems like the right semantics. Also, transmit the leader's BackendId to workers, and arrange for workers to use that when determining the physical file path of a temp relation belonging to their session. While the original intent was to prevent such accesses entirely, there were a number of holes in that, notably in places like dbsize.c which assume they can safely access temp rels of other sessions anyway. We might as well get this right, as a small down payment on someday allowing workers to access the leader's temp tables. (With this change, directly using "MyBackendId" as a relation or buffer backend ID is deprecated; you should use BackendIdForTempRelations() instead. I left a couple of such uses alone though, as they're not going to be reachable in parallel workers until we do something about localbuf.c.) Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down into localbuf.c, which is where it actually matters, instead of having it in relation_open(). This amounts to recognizing that access to temp tables' catalog entries is perfectly safe in a worker, it's only the data in local buffers that is problematic. Having done all that, we can get rid of the test in has_parallel_hazard() that says that use of a temp table's rowtype is unsafe in parallel workers. That test was unduly expensive, and if we really did need such a prohibition, that was not even close to being a bulletproof guard for it. (For example, any user-defined function executed in a parallel worker might have attempted such access.)	2016-06-09 20:16:11 -04:00
Robert Haas	4bc424b968	pgindent run for 9.6	2016-06-09 18:02:36 -04:00
Robert Haas	b12fd41c69	Don't generate parallel paths for rels with parallel-restricted outputs. Such paths are unsafe. To make it cheaper to detect when this case applies, track whether a relation's default PathTarget contains any non-Vars. In most cases, the answer will be no, which enables us to determine cheaply that the target list for a proposed path is parallel-safe. However, subquery pull-up can create cases that require us to inspect the target list more carefully. Amit Kapila, reviewed by me.	2016-06-09 12:43:36 -04:00
Tom Lane	7feb60c1bb	Clarify documentation of ceil/ceiling/floor functions. Document these as "nearest integer >= argument" and "nearest integer <= argument", which will hopefully be less confusing than the old formulation. New wording is from Matlab via Dean Rasheed. I changed the pg_description entries as well as the SGML docs. In the back branches, this will only affect installations initdb'd in the future, but it should be harmless otherwise. Discussion: <CAEZATCW3yzJo-NMSiQs5jXNFbTsCEftZS-Og8=FvFdiU+kYuSA@mail.gmail.com>	2016-06-09 11:58:00 -04:00
Tom Lane	e4158319f3	Mop-up for parallel degree-ectomy. Fix a couple of overlooked uses of "degree" terminology. Make the parallel worker count selection logic in create_plain_partial_paths more robust (in particular, it failed with max_parallel_workers_per_gather set to zero).	2016-06-09 11:16:26 -04:00
Robert Haas	c9ce4a1c61	Eliminate "parallel degree" terminology. This terminology provoked widespread complaints. So, instead, rename the GUC max_parallel_degree to max_parallel_workers_per_gather (leaving room for a possible future GUC max_parallel_workers that acts as a system-wide limit), and rename the parallel_degree reloption to parallel_workers. Rename structure members to match. These changes create a dump/restore hazard for users of PostgreSQL 9.6beta1 who have set the reloption (or applied the GUC using ALTER USER or ALTER DATABASE).	2016-06-09 10:00:26 -04:00
Tom Lane	77ba610805	Revert "Use Foreign Key relationships to infer multi-column join selectivity". This commit reverts `137805f89` as well as the associated commits `015e88942`, `5306df283`, and `68d704edb`. We found multiple bugs in this feature, and there was concern about possible planner slowdown (though to be fair, exhibiting a very large slowdown proved difficult). The way forward requires a considerable rewrite, which may or may not be possible to accomplish in time for beta2. In my judgment reviewing the rewrite will be easier to accomplish starting from a clean slate, so let's temporarily revert what's there now. This also leaves us in a safe state if it turns out to be necessary to postpone the rewrite to the next development cycle. Discussion: <20160429102531.GA13701@huehner.biz>	2016-06-07 17:21:17 -04:00
Tom Lane	f64340e743	Don't reset changes_since_analyze after a selective-columns ANALYZE. If we ANALYZE only selected columns of a table, we should not postpone auto-analyze because of that; other columns may well still need stats updates. As committed, the counter is left alone if a column list is given, whether or not it includes all analyzable columns of the table. Per complaint from Tomasz Ostrowski. It's been like this a long time, so back-patch to all supported branches. Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>	2016-06-06 17:44:17 -04:00
Robert Haas	c6dbf1fe79	Stop the executor if no more tuples can be sent from worker to leader. If a Gather node has read as many tuples as it needs (for example, due to Limit) it may detach the queue connecting it to the worker before reading all of the worker's tuples. Rather than let the worker continue to generate and send all of the results, have it stop after sending the next tuple. More could be done here to stop the worker even quicker, but this is about as well as we can hope to do for 9.6. This is in response to a problem report from Andreas Seltenreich. Commit `44339b892a` should be actually be sufficient to fix that example even without this change, but it seems better to do this, too, since we might otherwise waste quite a large amount of effort in one or more workers. Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com Amit Kapila	2016-06-06 14:52:58 -04:00
Tom Lane	d50183c578	Inline the easy cases in MakeExpandedObjectReadOnly(). This attempts to buy back some of whatever performance we lost from fixing bug #14174 by inlining the initial checks in MakeExpandedObjectReadOnly() into the callers. We can do that in a macro without creating multiple- evaluation hazards, so it's pretty much free notationally; and the amount of code added to callers should be minimal as well. (Testing a value can't take many more instructions than passing it to a subroutine.) Might as well inline DatumIsReadWriteExpandedObject() while we're at it. This is an ABI break for callers, so it doesn't seem safe to put into 9.5, but I see no reason not to do it in HEAD.	2016-06-03 18:34:05 -04:00
Tom Lane	69f526aa49	Mark read/write expanded values as read-only in ExecProject(). If a plan node output expression returns an "expanded" datum, and that output column is referenced in more than one place in upper-level plan nodes, we need to ensure that what is returned is a read-only reference not a read/write reference. Otherwise one of the referencing sites could scribble on or even delete the expanded datum before we have evaluated the others. Commit `1dc5ebc907`, which introduced this feature, supposed that it'd be sufficient to make SubqueryScan nodes force their output columns to read-only state. The folly of that was revealed by bug #14174 from Andrew Gierth, and really should have been immediately obvious considering that the planner will happily optimize SubqueryScan nodes out of the plan without any regard for this issue. The safest fix seems to be to make ExecProject() force its results into read-only state; that will cover every case where a plan node returns expression results. Actually we can delegate this to ExecTargetList() since we can recursively assume that plain Vars will not reference read-write datums. That should keep the extra overhead down to something minimal. We no longer need ExecMakeSlotContentsReadOnly(), which was introduced only in support of the idea that just a few plan node types would need to do this. In the future it would be nice to have the planner account for this problem and inject force-to-read-only expression evaluation nodes into only the places where there's a risk of multiple evaluation. That's not a suitable solution for 9.5 or even 9.6 at this point, though. Report: <20160603124628.9932.41279@wrigleys.postgresql.org>	2016-06-03 15:14:50 -04:00
Robert Haas	cac8321970	Mark PostmasterPid as PGDLLIMPORT. This is so that extensions can use it. Michael Paquier	2016-06-03 14:06:35 -04:00
Greg Stark	e1623c3959	Fix various common mispellings. Mostly these are just comments but there are a few in documentation and a handful in code and tests. Hopefully this doesn't cause too much unnecessary pain for backpatching. I relented from some of the most common like "thru" for that reason. The rest don't seem numerous enough to cause problems. Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings	2016-06-03 16:08:45 +01:00
Robert Haas	fdfaccfa79	Cosmetic improvements to freeze map code. Per post-commit review comments from Andres Freund, improve variable names, comments, and in one place, slightly improve the code structure. Masahiko Sawada	2016-06-03 08:43:41 -04:00
Kevin Grittner	4edb7bd2fd	C comment improvement & typo fix. Thomas Munro	2016-06-02 12:52:41 -05:00
Tom Lane	22b27b4c9e	Avoid useless closely-spaced writes of statistics files. The original intent in the stats collector was that we should not write out stats data oftener than every PGSTAT_STAT_INTERVAL msec. Backends will not make requests at all if they see the existing data is newer than that, and the stats collector is supposed to disregard requests having a cutoff_time older than its most recently written data, so that close-together requests don't result in multiple writes. But the latter part of that got broken in commit `187492b6c2`, so that if two backends concurrently decide the existing stats are too old, the collector would write the data twice. (In principle the collector's logic would still merge requests as long as the second one arrives before we've actually written data ... but since the message collection loop would write data immediately after processing a single inquiry message, that never happened in practice, and in any case the window in which it might work would be much shorter than PGSTAT_STAT_INTERVAL.) To fix, improve pgstat_recv_inquiry so that it checks whether the cutoff time is too old, and doesn't add a request to the queue if so. This means that we do not need DBWriteRequest.request_time, because the decision is taken before making a queue entry. And that means that we don't really need the DBWriteRequest data structure at all; an OID list of database OIDs will serve and allow removal of some rather verbose and crufty code. In passing, improve the comments in this area, which have been rather neglected. Also change backend_read_statsfile so that it's not silently relying on MyDatabaseId to have some particular value in the autovacuum launcher process. It accidentally worked as desired because MyDatabaseId is zero in that process; but that does not seem like a dependency we want, especially with no documentation about it. Although this patch is mine, it turns out I'd rediscovered a known bug, for which Tomas Vondra had already submitted a patch that's functionally equivalent to the non-cosmetic aspects of this patch. Thanks to Tomas for reviewing this version. Back-patch to 9.3 where the bug was introduced. Prior-Discussion: <1718942738eb65c8407fcd864883f4c8@fuzzy.cz> Patch: <4625.1464202586@sss.pgh.pa.us>	2016-05-31 15:55:15 -04:00
Andres Freund	87a3023c60	Move memory barrier in UnlockBufHdr to before releasing the lock. This bug appears to have been introduced late in the development of `48354581a4` ("Allow Pin/UnpinBuffer to operate in a lockfree manner."). Found while debugging a bug which turned out to be independent of the commit mentioned above. Backpatch: -	2016-05-30 15:35:53 -07:00
Alvaro Herrera	975ad4e602	Fix PageAddItem BRIN bug BRIN was relying on the ability to remove a tuple from an index page, then putting another tuple in the same line pointer. But PageAddItem refuses to add a tuple beyond the first free item past the last used item, and in particular, it rejects an attempt to add an item to an empty page anywhere other than the first line pointer. PageAddItem issues a WARNING and indicates to the caller that it failed, which in turn causes the BRIN calling code to issue a PANIC, so the whole sequence looks like this: WARNING: specified item offset is too large PANIC: failed to add BRIN tuple To fix, create a new function PageAddItemExtended which is like PageAddItem except that the two boolean arguments become a flags bitmap; the "overwrite" and "is_heap" boolean flags in PageAddItem become PAI_OVERWITE and PAI_IS_HEAP flags in the new function, and a new flag PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN. PageAddItem() retains its original signature, for compatibility with third-party modules (other callers in core code are not modified, either). Also, in the belt-and-suspenders spirit, I added a new sanity check in brinGetTupleForHeapBlock to raise an error if an TID found in the revmap is not marked as live by the page header. This causes it to react with "ERROR: corrupted BRIN index" to the bug at hand, rather than a hard crash. Backpatch to 9.5. Bug reported by Andreas Seltenreich as detected by his handy sqlsmith fuzzer. Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu	2016-05-30 14:47:22 -04:00
Tom Lane	9dd4178cec	Be more predictable about reporting "lock timeout" vs "statement timeout". If both timeout indicators are set when we arrive at ProcessInterrupts, we've historically just reported "lock timeout". However, some buildfarm members have been observed to fail isolationtester's timeouts test by reporting "lock timeout" when the statement timeout was expected to fire first. The cause seems to be that the process is allowed to sleep longer than expected (probably due to heavy machine load) so that the lock timeout happens before we reach the point of reporting the error, and then this arbitrary tiebreak rule does the wrong thing. We can improve matters by comparing the scheduled timeout times to decide which error to report. I had originally proposed greatly reducing the 1-second window between the two timeouts in the test cases. On reflection that is a bad idea, at least for the case where the lock timeout is expected to fire first, because that would assume that it takes negligible time to get from statement start to the beginning of the lock wait. Thus, this patch doesn't completely remove the risk of test failures on slow machines. Empirically, however, the case this handles is the one we are seeing in the buildfarm. The explanation may be that the other case requires the scheduler to take the CPU away from a busy process, whereas the case fixed here only requires the scheduler to not give the CPU back right away to a process that has been woken from a multi-second sleep (and, perhaps, has been swapped out meanwhile). Back-patch to 9.3 where the isolationtester timeouts test was added. Discussion: <8693.1464314819@sss.pgh.pa.us>	2016-05-27 10:40:20 -04:00
Tom Lane	f5e7b2f910	Mark wal_level as PGDLLIMPORT. Per buildfarm, this is needed to allow extensions to use XLogIsNeeded() in Windows builds.	2016-05-24 22:48:47 -04:00
Tom Lane	465e09da63	Add support for more extensive testing of raw_expression_tree_walker(). If RAW_EXPRESSION_COVERAGE_TEST is defined, do a no-op tree walk over every basic DML statement submitted to parse analysis. If we'd had this in place earlier, bug #14153 would have been caught by buildfarm testing. The difficulty is that raw_expression_tree_walker() is only used in limited cases involving CTEs (particularly recursive ones), so it's very easy for an oversight in it to not be noticed during testing of a seemingly-unrelated feature. The type of error we can expect to catch with this is complete omission of a node type from raw_expression_tree_walker(), and perhaps also recursion into a field that doesn't contain a node tree, though that would be an unlikely mistake. It won't catch failure to add new fields that need to be recursed into, unfortunately. I'll go enable this on one or two of my own buildfarm animals once bug #14153 is dealt with. Discussion: <27861.1464040417@sss.pgh.pa.us>	2016-05-23 19:08:26 -04:00
Tom Lane	8a4930e3fa	Fix latent crash in do_text_output_multiline(). do_text_output_multiline() would fail (typically with a null pointer dereference crash) if its input string did not end with a newline. Such cases do not arise in our current sources; but it certainly could happen in future, or in extension code's usage of the function, so we should fix it. To fix, replace "eol += len" with "eol = text + len". While at it, make two cosmetic improvements: mark the input string const, and rename the argument from "text" to "txt" to dodge pgindent strangeness (since "text" is a typedef name). Even though this problem is only latent at present, it seems like a good idea to back-patch the fix, since it's a very simple/safe patch and it's not out of the realm of possibility that we might in future back-patch something that expects sane behavior from do_text_output_multiline(). Per report from Hao Lee. Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>	2016-05-23 14:16:40 -04:00
Tom Lane	16ea51a263	Pin the built-in index access methods. This was overlooked in commit `473b93287`, which introduced DROP ACCESS METHOD. Although that command is restricted to superusers, we don't want even superusers dropping the built-in methods; "DROP ACCESS METHOD btree" in particular is unrecoverable from. Pin these objects in the same way that other initdb-created objects are pinned. I chose to bump catversion for this fix. That's not absolutely necessary perhaps, but it will ensure that no 9.6 production systems are missing the pin entries.	2016-05-19 14:40:02 -04:00
Tom Lane	8ee29a19d6	Stamp 9.6beta1.	2016-05-09 16:47:49 -04:00
Tom Lane	1a2c17f8e2	Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old. This patch essentially reverts commit `4c6780fd17`, in favor of a much simpler solution for the case where the new cluster would choose to create a TOAST table but the old cluster doesn't have one: just don't create a TOAST table. The existing code failed in at least two different ways if the situation arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the lock sanity check in create_toast_table failed; (2) pg_upgrade did not provide a pg_type OID for the new toast table, so that the crosscheck in TypeCreate failed. While both these problems were introduced by later patches, they show that the hack being used to cause TOAST table creation is overwhelmingly fragile (and untested). I also note that before the TypeCreate crosscheck was added, the code would have resulted in assigning an indeterminate pg_type OID to the toast table, possibly causing a later OID conflict in that catalog; so that it didn't really work even when committed. If we simply don't create a TOAST table, there will only be a problem if the code tries to store a tuple that's wider than a page, and field compression isn't sufficient to get it under a page. Given that the TOAST creation threshold is intended to be about a quarter of a page, it's very hard to believe that cross-version differences in the do-we-need-a-toast- table heuristic could result in an observable problem. So let's just follow the old version's conclusion about whether a TOAST table is needed. (If we ever do change needs_toast_table() so much that this conclusion doesn't apply, we can devise a solution at that time, and hopefully do it in a less klugy way than `4c6780fd17` did.) Back-patch to 9.3, like the previous patch. Discussion: <8110.1462291671@sss.pgh.pa.us>	2016-05-06 22:05:56 -04:00
Kevin Grittner	2cc41acd8f	Fix hash index vs "snapshot too old" problemms Hash indexes are not WAL-logged, and so do not maintain the LSN of index pages. Since the "snapshot too old" feature counts on detecting error conditions using the LSN of a table and all indexes on it, this makes it impossible to safely do early vacuuming on any table with a hash index, so add this to the tests for whether the xid used to vacuum a table can be adjusted based on old_snapshot_threshold. While at it, add a paragraph to the docs for old_snapshot_threshold which specifically mentions this and other aspects of the feature which may otherwise surprise users. Problem reported and patch reviewed by Amit Kapila	2016-05-06 07:47:12 -05:00
Dean Rasheed	93a8c6fd6c	Move and rename fmtReloptionsArray(). Move fmtReloptionsArray() from pg_dump.c to string_utils.c so that it is available to other frontend code. In particular psql's \ev and \sv commands need it to handle view reloptions. Also rename the function to appendReloptionsArray(), which is a more accurate description of what it does. Author: Dean Rasheed Reviewed-by: Peter Eisentraut Discussion: http://www.postgresql.org/message-id/CAEZATCWZjCgKRyM-agE0p8ax15j9uyQoF=qew7D2xB6cF76T8A@mail.gmail.com	2016-05-06 12:45:36 +01:00
Tom Lane	0b9a234432	Rename tsvector delete() to ts_delete(), and filter() to ts_filter(). The similarity of the original names to SQL keywords seems like a bad idea. Rename them before we're stuck with 'em forever. In passing, minor code and docs cleanup. Discussion: <4875.1462210058@sss.pgh.pa.us>	2016-05-05 19:43:32 -04:00
Alvaro Herrera	c1543a81a7	Revert timeline following in replication slots This reverts commits `f07d18b6e9`, `82c83b3372`, `3a3b309041`, and `24c5f1a103`. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.	2016-05-04 17:32:22 -03:00
Robert Haas	9888b34fdb	Fix more things to be parallel-safe. Conversion functions were previously marked as parallel-unsafe, since that is the default, but in fact they are safe. Parallel-safe functions defined in pg_proc.h and redefined in system_views.sql were ending up as parallel-unsafe because the redeclarations were not marked PARALLEL SAFE. While editing system_views.sql, mark ts_debug() parallel safe also. Andreas Karlsson	2016-05-03 14:36:38 -04:00
Alvaro Herrera	6b60916829	Fix thinko in comment Pointed out by Andres Freund	2016-05-02 16:46:42 -03:00
Alvaro Herrera	234a266066	Fix code comments regarding logical decoding Back in `3b02ea4f07` I added some comments in various places to explain how logical decoding and other things worked. Not all of the changes were welcome, because they were misleading or wrong. This changes them a little bit to make them more accurate. Some other comments are also changed to be more accurate. Also, fix a bunch of typos. Author: Álvaro Herrera, Craig Ringer Andres Freund reviewed some parts of this.	2016-05-02 16:04:29 -03:00
Robert Haas	37d0c2cb1a	Fix parallel safety markings for pg_start_backup. Commit `7117685461` made pg_start_backup parallel-restricted rather than parallel-safe, because it now relies on backend-private state that won't be synchronized with the parallel worker. However, it didn't update pg_proc.h. Separately, Andreas Karlsson observed that system_views.sql neglected to reiterate the parallel-safety markings whe redefining various functions, including this one; so add a PARALLEL RESTRICTED declaration there to match the new value in pg_proc.h.	2016-05-02 10:42:34 -04:00
Tom Lane	207d5a656e	Fix mishandling of equivalence-class tests in parameterized plans. Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like Nested Loop -> Scan X -> Nested Loop -> Scan Y -> Scan Z Filter: Z.Z = X.X The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs. This is extremely hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case. The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.	2016-04-29 20:19:38 -04:00
Kevin Grittner	7c3e8039f4	Add a few entries to the tail of time mapping, to see old values. Without a few entries beyond old_snapshot_threshold, the lookup would often fail, resulting in the more aggressive pruning or vacuum being skipped often enough to matter. This was very clearly shown by a python test script posted by Ants Aasma, and was likely a factor in an earlier but somewhat less clear-cut test case posted by Jeff Janes. This patch makes no change to the logic, per se -- it just makes the array of mapping entries big enough to make lookup misses based on timing much less likely. An occasional miss is still possible if a thread stalls for more than 10 minutes, but that does not create any problem with correctness of behavior. Besides, if things are so busy that a thread is stalling for more than 10 minutes, it is probably OK to skip the more aggressive cleanup at that particular point in time.	2016-04-29 16:46:08 -05:00
Andrew Dunstan	d34e7b2812	Fix comment whitespace in VS2105 patch per gripe from Michael Paquier.	2016-04-29 14:18:51 -04:00
Magnus Hagander	a03bda323b	Fix typo Author: Thomas Munro	2016-04-29 16:15:07 +02:00
Andrew Dunstan	0fb54de9aa	Support building with Visual Studio 2015 Adjust the way we detect the locale. As a result the minumum Windows version supported by VS2015 and later is Windows Vista. Add some tweaks to remove new compiler warnings. Remove documentation references to the now obsolete msysGit. Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich. Backpatch to 9.5	2016-04-29 08:09:07 -04:00
Tom Lane	23b09e15b9	Adjust DatumGetBool macro, this time for sure. Commit `23a41573c` attempted to fix the DatumGetBool macro to ignore bits in a Datum that are to the left of the actual bool value. But it did that by casting the Datum to bool; and on compilers that use C99 semantics for bool, that ends up being a whole-word test, not a 1-byte test. This seems to be the true explanation for contrib/seg failing in VS2015. To fix, use GET_1_BYTE() explicitly. I think in the previous patch, I'd had some idea of not having to commit to bool being exactly 1 byte wide, but regardless of what the compiler's bool is, boolean columns and Datums are certainly 1 byte wide. The previous fix was (eventually) back-patched into all active versions, so do likewise with this one.	2016-04-28 11:50:58 -04:00
Teodor Sigaev	f8467f7da8	Prevent to use magic constants Use macroses for definition amstrategies/amsupport fields instead of hardcoded values. Author: Nikolay Shaplov with addition for contrib/bloom	2016-04-28 16:39:25 +03:00
Teodor Sigaev	e2c79e14d9	Prevent multiple cleanup process for pending list in GIN. Previously, ginInsertCleanup could exit early if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. But in this case vacuum could miss tuples to be deleted. Cleanup process now locks metapage with a help of heavyweight LockPage(ExclusiveLock), and it guarantees that there is no another cleanup process at the same time. Lock is taken differently depending on caller of cleanup process: any vacuums and gin_clean_pending_list() will be blocked until lock becomes available, ordinary insert uses conditional lock to prevent indefinite waiting on lock. Insert into pending list doesn't use this lock, so insertion isn't blocked. Also, patch adds stopping of cleanup process when at-start-cleanup-tail is reached in order to prevent infinite cleanup in case of massive insertion. But it will stop only for automatic maintenance tasks like autovacuum. Patch introduces choice of limit of memory to use: autovacuum_work_mem, maintenance_work_mem or work_mem depending on call path. Patch for previous releases should be reworked due to changes between 9.6 and previous ones in this area. Discover and diagnostics by Jeff Janes and Tomas Vondra Patch by me with some ideas of Jeff Janes	2016-04-28 16:21:42 +03:00
Tom Lane	4c804fbdfb	Clean up parsing of synchronous_standby_names GUC variable. Commit `989be0810d` added a flex/bison lexer/parser to interpret synchronous_standby_names. It was done in a pretty crufty way, though, making assorted end-use sites responsible for calling the parser at the right times. That was not only vulnerable to errors of omission, but made it possible that lexer/parser errors occur at very undesirable times, and created memory leakages even if there was no error. Instead, perform the parsing once during check_synchronous_standby_names and let guc.c manage the resulting data. To do that, we have to flatten the parsed representation into a single hunk of malloc'd memory, but that is not very hard. While at it, work a little harder on making useful error reports for parsing problems; the previous code felt that "synchronous_standby_names parser returned 1" was an appropriate user-facing error message. (To be fair, it did also log a syntax error message, but separately from the GUC problem report, which is at best confusing.) It had some outright bugs in the face of invalid input, too. I (tgl) also concluded that we need to restrict unquoted names in synchronous_standby_names to be just SQL identifiers. The previous coding would accept darn near anything, which (1) makes the quoting convention both nearly-unnecessary and formally ambiguous, (2) makes it very hard to understand what is a syntax error and what is a creative interpretation of the input as a standby name, and (3) makes it impossible to further extend the syntax in future without a compatibility break. I presume that we're intending future extensions of the syntax, else this parsing infrastructure is massive overkill, so (3) is an important objection. Since we've taken a compatibility hit for non-identifier names with this change anyway, we might as well lock things down now and insist that users use double quotes for standby names that aren't identifiers. Kyotaro Horiguchi and Tom Lane	2016-04-27 17:55:25 -04:00
Robert Haas	360ca27a9b	Remove mergeHyperLogLog. It's buggy. If somebody needs this later, they'll need to put back a non-buggy vesion of it. Discussion: CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com Discussion: CAM3SWZRUOLsYoTT83QgdUy9D8ehYWm_nvbrrfcOOzikiRfFY7g@mail.gmail.com Peter Geoghegan	2016-04-27 10:55:32 -04:00
Robert Haas	59eb551279	Fix EXPLAIN VERBOSE output for parallel aggregate. The way that PartialAggregate and FinalizeAggregate plan nodes were displaying output columns before was bogus. Now, FinalizeAggregate produces the same outputs as an Aggregate would have produced, while PartialAggregate produces each of those outputs prefixed by the word PARTIAL. Discussion: 12585.1460737650@sss.pgh.pa.us Patch by me, reviewed by David Rowley.	2016-04-27 07:37:40 -04:00
Andres Freund	c6ff84b06a	Emit invalidations to standby for transactions without xid. So far, when a transaction with pending invalidations, but without an assigned xid, committed, we simply ignored those invalidation messages. That's problematic, because those are actually sent for a reason. Known symptoms of this include that existing sessions on a hot-standby replica sometimes fail to notice new concurrently built indexes and visibility map updates. The solution is to WAL log such invalidations in transactions without an xid. We considered to alternatively force-assign an xid, but that'd be problematic for vacuum, which might be run in systems with few xids. Important: This adds a new WAL record, but as the patch has to be back-patched, we can't bump the WAL page magic. This means that standbys have to be updated before primaries; otherwise "PANIC: standby_redo: unknown op code 32" errors can be encountered. XXX: Reported-By: Васильев Дмитрий, Masahiko Sawada Discussion: CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com	2016-04-26 20:21:54 -07:00
Noah Misch	213c7df033	Impose a full barrier in generic-xlc.h atomics functions. pg_atomic_compare_exchange_*_impl() were providing only the semantics of an acquire barrier. Buildfarm members hornet and mandrill revealed this deficit beginning with commit `008608b9d5`. While we have no report of symptoms in 9.5, we can't rule out the possibility of certain compilers, hardware, or extension code relying on these functions' specified barrier semantics. Back-patch to 9.5, where commit `b64d92f1a5` introduced atomics. Reviewed by Andres Freund.	2016-04-26 21:53:58 -04:00

1 2 3 4 5 ...

7254 commits