postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-02-25 10:50:50 -05:00

Author	SHA1	Message	Date
Robert Haas	2843c01a56	Report an ERROR if a parallel worker fails to start properly. Commit `28724fd90d` fixed things so that if a background worker fails to start due to fork() failure or because it is terminated before startup succeeds, BGWH_STOPPED will be reported. However, that only helps if the code that uses the background worker machinery notices the change in status, and the code in parallel.c did not. To fix that, do two things. First, make sure that when a worker exits, it triggers the leader to read from error queues. That way, if a worker which has attached to an error queue exits uncleanly, the leader is sure to throw some error, either the contents of the ErrorResponse sent by the worker, or "lost connection to parallel worker" if it exited without sending one. To cover the case where the worker never starts up in the first place or exits before attaching to the error queue, the ParallelContext now keeps track of which workers have sent at least one message via the error queue. A worker which sends no messages by the time the parallel operation finishes will be checked to see whether it exited before attaching to the error queue; if so, a new error message, "parallel worker failed to initialize", will be reported. If not, we'll continue to wait until it either starts up and exits cleanly, starts up and exits uncleanly, or fails to start, and then take the appropriate action. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com	2018-01-23 11:13:50 -05:00
Andres Freund	986a9153b9	Perform a lot more sanity checks when freezing tuples. The previous commit has shown that the sanity checks around freezing aren't strong enough. Strengthening them seems especially important because the existance of the bug has caused corruption that we don't want to make even worse during future vacuum cycles. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superflous translations, mark messages as internal. Author: Andres Freund and Alvaro Herrera Reviewed-By: Alvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-	2017-12-14 18:20:48 -08:00
Robert Haas	a4116c4602	Mark assorted variables PGDLLIMPORT. This makes life easier for extension authors who wish to support Windows. Brian Cloutier, slightly amended by me. Discussion: http://postgr.es/m/CAJCy68fscdNhmzFPS4kyO00CADkvXvEa-28H-OtENk-pa2OTWw@mail.gmail.com	2017-12-05 09:27:20 -05:00
Tom Lane	9773f0527f	Pad XLogReaderState's main_data buffer more aggressively. Originally, we palloc'd this buffer just barely big enough to hold the largest xlog record seen so far. It turns out that that can result in valgrind complaints, because some compilers will emit code that assumes it can safely fetch padding bytes at the end of a struct, and those padding bytes were unallocated so far as aset.c was concerned. We can fix that by MAXALIGN'ing the palloc request size, ensuring that it is big enough to include any possible padding that might've been omitted from the on-disk record. An additional objection to the original coding is that it could result in many repeated palloc cycles, in the worst case where we see a series of gradually larger xlog records. We can ameliorate that cheaply by imposing a minimum buffer size that's large enough for most xlog records. BLCKSZ/2 was chosen after a bit of discussion. In passing, remove an obsolete comment in struct xl_heap_new_cid that the combocid field is free due to alignment considerations. Perhaps that was true at some point, but it's not now. Back-patch to 9.5 where this code came in. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org	2017-11-26 15:17:25 -05:00
Robert Haas	19648ce553	Fix broken cleanup interlock for GIN pending list. The pending list must (for correctness) always be cleaned up by vacuum, and should (for the avoidance of surprising behavior) always be cleaned up by an explicit call to gin_clean_pending_list, but cleanup is optional when inserting. The old logic got this backward: cleanup was forced if (stats == NULL), but that's going to be false when vacuuming and true for inserts. Masahiko Sawada, reviewed by me. Discussion: http://postgr.es/m/CAD21AoBLUSyiYKnTYtSAbC+F=XDjiaBrOUEGK+zUXdQ8owfPKw@mail.gmail.com	2017-11-16 15:26:49 -05:00
Alvaro Herrera	08ba67d596	Revert bogus fixes of HOT-freezing bug It turns out we misdiagnosed what the real problem was. Revert the previous changes, because they may have worse consequences going forward. A better fix is forthcoming. The simplistic test case is kept, though disabled. Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de	2017-11-02 15:51:05 +01:00
Alvaro Herrera	d441cff142	Fix traversal of half-frozen update chains When some tuple versions in an update chain are frozen due to them being older than freeze_min_age, the xmax/xmin trail can become broken. This breaks HOT (and probably other things). A subsequent VACUUM can break things in more serious ways, such as leaving orphan heap-only tuples whose root HOT redirect items were removed. This can be seen because index creation (or REINDEX) complain like ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t" Because of relfrozenxid contraints, we cannot avoid the freezing of the early tuples, so we must cope with the results: whenever we see an Xmin of FrozenTransactionId, consider it a match for whatever the previous Xmax value was. This problem seems to have appeared in 9.3 with multixact changes, though strictly speaking it seems unrelated. Since 9.4 we have commit `37484ad2a` "Change the way we mark tuples as frozen", so the fix is simple: just compare the raw Xmin (still stored in the tuple header, since freezing merely set an infomask bit) to the Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so the original value is lost and we have nothing to compare the Xmax with. To cope with that case we need to compare the Xmin with FrozenXid, assume it's a match, and hope for the best. Sadly, since you can pg_upgrade a 9.3 instance containing half-frozen pages to newer releases, we need to keep the old check in newer versions too, which seems a bit brittle; I hope we can somehow get rid of that. I didn't optimize the new function for performance. The new coding is probably a bit slower than before, since there is a function call rather than a straight comparison, but I'd rather have it work correctly than be fast but wrong. This is a followup after `20b6552242` fixed a few related problems. Apparently, in 9.6 and up there are more ways to get into trouble, but in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so there must be a separate bug. Reported-by: Peter Geoghegan Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood, Yi Wen Wong, Álvaro Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com	2017-10-06 17:14:42 +02:00
Andres Freund	6c036d0108	Backpatch introduction of TupleDescAttr(tupdesc, i). `2cd7084524` / `c6293249d` change the way individual attributes in a TupleDesc are stored / accessed. To reduce the effort of making extensions compatible with postgresql 11, and to ease future backpatching, backpatch introduction of TupleDescAttr() to all releases. Do not backpatch change in storage, as that'd be a breaking change for existing and working extensions. Author: Andres Freund Discussion: https://postgr.es/m/20170820181723.tdswdinzptbcwhrr@alap3.anarazel.de Backpatch: 9.2-	2017-08-22 07:47:42 -07:00
Tatsuo Ishii	bd8ff38760	Fix comment. XLByteToSeg and XLByteToPrevSeg calculate only a segment number. The definition of these macros were modified by commit `dfda6ebaec` but the comment remain unchanged. Patch by Yugo Nagata. Back patched to 9.3 and beyond.	2017-08-01 08:08:32 +09:00
Robert Haas	9b6e8d8f86	Don't use bgw_main even to specify in-core bgworker entrypoints. On EXEC_BACKEND builds, this can fail if ASLR is in use. Backpatch to 9.5. On master, completely remove the bgw_main field completely, since there is no situation in which it is safe for an EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid breaking things for third-party code that doesn't care about working under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker entrypoints. Petr Jelinek, reviewed by me. Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com	2017-03-31 20:47:50 -04:00
Teodor Sigaev	8de6278d3b	Fix backup canceling Assert-enabled build crashes but without asserts it works by wrong way: it may not reset forcing full page write and preventing from starting exclusive backup with the same name as cancelled. Patch replaces pair of booleans nonexclusive_backup_running/exclusive_backup_running to single enum to correctly describe backup state. Backpatch to 9.6 where bug was introduced Reported-by: David Steele Authors: Michael Paquier, David Steele Reviewed-by: Anastasia Lubennikova https://commitfest.postgresql.org/13/1068/	2017-03-24 13:55:02 +03:00
Heikki Linnakangas	90e8599219	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:34:15 +02:00
Alvaro Herrera	fce4609d5e	Change some test macros to return true booleans These macros work fine when they are used directly in an "if" test or similar, but as soon as the return values are assigned to boolean variables (or passed as boolean arguments to some function), they become bugs, hopefully caught by compiler warnings. To avoid future problems, fix the definitions so that they return actual booleans. To further minimize the risk that somebody uses them in back-patched fixes that only work correctly in branches starting from the current master and not in old ones, back-patch the change to supported branches as appropriate. See also commit `af4472bcb8`, and the long discussion (and larger patch) in the thread mentioned in its commit message. Discussion: https://postgr.es/m/18672.1483022414@sss.pgh.pa.us	2017-01-18 18:06:13 -03:00
Alvaro Herrera	4e563a1f65	Fix ALTER TABLE / SET TYPE for irregular inheritance If inherited tables don't have exactly the same schema, the USING clause in an ALTER TABLE / SET DATA TYPE misbehaves when applied to the children tables since commit `9550e8348b`. Starting with that commit, the attribute numbers in the USING expression are fixed during parse analysis. This can lead to bogus errors being reported during execution, such as: ERROR: attribute 2 has wrong type DETAIL: Table has type smallint, but query expects integer. Since it wouldn't do to revert to the original coding, we now apply a transformation to map the attribute numbers to the correct ones for each child. Reported by Justin Pryzby Analysis by Tom Lane; patch by me. Discussion: https://postgr.es/m/20170102225618.GA10071@telsasoft.com	2017-01-09 19:26:58 -03:00
Tom Lane	5c24c61273	Remove duplicate macro definition. Seems to be a copy-and-pasteo. Odd that we heard no reports of compiler warnings about it. Thomas Munro	2016-11-05 11:51:46 -04:00
Heikki Linnakangas	fd94ac501f	Fix outdated comments, GIST search queue is not an RBTree anymore. The GiST search queue is implemented as a pairing heap rather than as Red-Black Tree, since 9.5 (commit `e7032610`). I neglected these comments in that commit.	2016-09-20 11:40:13 +03:00
Andres Freund	e79aaebccd	Fix deletion of speculatively inserted TOAST on conflict INSERT .. ON CONFLICT runs a pre-check of the possible conflicting constraints before performing the actual speculative insertion. In case the inserted tuple included TOASTed columns the ON CONFLICT condition would be handled correctly in case the conflict was caught by the pre-check, but if two transactions entered the speculative insertion phase at the same time, one would have to re-try, and the code for aborting a speculative insertion did not handle deleting the speculatively inserted TOAST datums correctly. TOAST deletion would fail with "ERROR: attempted to delete invisible tuple" as we attempted to remove the TOAST tuples using simple_heap_delete which reasoned that the given tuples should not be visible to the command that wrote them. This commit updates the heap_abort_speculative() function which aborts the conflicting tuple to use itself, via toast_delete, for deleting associated TOAST datums. Like before, the inserted toast rows are not marked as being speculative. This commit also adds a isolationtester spec test, exercising the relevant code path. Unfortunately 9.5 cannot handle two waiting sessions, and thus cannot execute this test. Reported-By: Viren Negi, Oskari Saarenmaa Author: Oskari Saarenmaa, edited a bit by me Bug: #14150 Discussion: <20160519123338.12513.20271@wrigleys.postgresql.org> Backpatch: 9.5, where ON CONFLICT was introduced	2016-08-17 17:03:36 -07:00
Tom Lane	ed0097e4f9	Add SQL-accessible functions for inspecting index AM properties. Per discussion, we should provide such functions to replace the lost ability to discover AM properties by inspecting pg_am (cf commit `65c5fcd35`). The added functionality is also meant to displace any code that was looking directly at pg_index.indoption, since we'd rather not believe that the bit meanings in that field are part of any client API contract. As future-proofing, define the SQL API to not assume that properties that are currently AM-wide or index-wide will remain so unless they logically must be; instead, expose them only when inquiring about a specific index or even specific index column. Also provide the ability for an index AM to override the behavior. In passing, document pg_am.amtype, overlooked in commit `473b93287`. Andrew Gierth, with kibitzing by me and others Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>	2016-08-13 18:31:14 -04:00
Tom Lane	a5fe473ad7	Minor cleanup for access/transam/parallel.c. ParallelMessagePending must be marked volatile, because it's set by a signal handler. On the other hand, it's pointless for HandleParallelMessageInterrupt to save/restore errno; that must be, and is, done at the outer level of the SIGUSR1 signal handler. Calling CHECK_FOR_INTERRUPTS() inside HandleParallelMessages, which itself is called from CHECK_FOR_INTERRUPTS(), seems both useless and hazardous. The comment claiming that this is needed to handle the error queue going away is certainly misguided, in any case. Improve a couple of error message texts, and use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to report loss of parallel worker connection, since that's what's used in e.g. tqueue.c. (Maybe it would be worth inventing a dedicated ERRCODE for this type of failure? But I do not think ERRCODE_INTERNAL_ERROR is appropriate.) Minor stylistic cleanups.	2016-08-01 16:12:01 -04:00
Andres Freund	eca0f1db14	Clear all-frozen visibilitymap status when locking tuples. Since `a892234` & `fd31cd265` the visibilitymap's freeze bit is used to avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so correctly relies on not adding xids to the heap without also unsetting the visibilitymap flag. Tuple locking related code has not done so. To allow selectively resetting all-frozen - to avoid pessimizing heap_lock_tuple - allow to selectively reset the all-frozen with visibilitymap_clear(). To avoid having to use visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical section, have visibilitymap_clear() return whether any bits have been reset. There's a remaining issue (denoted by XXX): After the PageIsAllVisible() check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page status could theoretically change. Practically that currently seems impossible, because updaters will hold a page level pin already. Due to the next beta coming up, it seems better to get the required WAL magic bump done before resolving this issue. The added flags field fields to xl_heap_lock and xl_heap_lock_updated require bumping the WAL magic. Since there's already been a catversion bump since the last beta, that's not an issue. Reviewed-By: Robert Haas, Amit Kapila and Andres Freund Author: Masahiko Sawada, heavily revised by Andres Freund Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com Backpatch: -	2016-07-18 02:01:13 -07:00
Alvaro Herrera	b78364df18	Remove unused arguments in two GiST subroutines These arguments became unused in commit `2c03216d83`. Noticed while skimming code for unrelated development. This is cosmetic, so no backpatch.	2016-06-28 16:01:13 -04:00
Alvaro Herrera	e3ad3ffa68	Fix handling of multixacts predating pg_upgrade After pg_upgrade, it is possible that some tuples' Xmax have multixacts corresponding to the old installation; such multixacts cannot have running members anymore. In many code sites we already know not to read them and clobber them silently, but at least when VACUUM tries to freeze a multixact or determine whether one needs freezing, there's an attempt to resolve it to its member transactions by calling GetMultiXactIdMembers, and if the multixact value is "in the future" with regards to the current valid multixact range, an error like this is raised: ERROR: MultiXactId 123 has not been created yet -- apparent wraparound and vacuuming fails. Per discussion with Andrew Gierth, it is completely bogus to try to resolve multixacts coming from before a pg_upgrade, regardless of where they stand with regards to the current valid multixact range. It's possible to get from under this problem by doing SELECT FOR UPDATE of the problem tuples, but if tables are large, this is slow and tedious, so a more thorough solution is desirable. To fix, we realize that multixacts in xmax created in 9.2 and previous have a specific bit pattern that is never used in 9.3 and later (we already knew this, per comments and infomask tests sprinkled in various places, but we weren't leveraging this knowledge appropriately). Whenever the infomask of the tuple matches that bit pattern, we just ignore the multixact completely as if Xmax wasn't set; or, in the case of tuple freezing, we act as if an unwanted value is set and clobber it without decoding. This guarantees that no errors will be raised, and that the values will be progressively removed until all tables are clean. Most callers of GetMultiXactIdMembers are patched to recognize directly that the value is a removable "empty" multixact and avoid calling GetMultiXactIdMembers altogether. To avoid changing the signature of GetMultiXactIdMembers() in back branches, we keep the "allow_old" boolean flag but rename it to "from_pgupgrade"; if the flag is true, we always return an empty set instead of looking up the multixact. (I suppose we could remove the argument in the master branch, but I chose not to do so in this commit). This was broken all along, but the error-facing message appeared first because of commit `8e9a16ab8f` and was partially fixed in `a25c2b7c4d`. This fix, backpatched all the way back to 9.3, goes approximately in the same direction as `a25c2b7c4d` but should cover all cases. Bug analysis by Andrew Gierth and Álvaro Herrera. A number of public reports match this bug: https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org	2016-06-24 18:29:28 -04:00
Tom Lane	8cf739de85	Fix building of large (bigger than shared_buffers) hash indexes. When the index is predicted to need more than NBuffers buckets, CREATE INDEX attempts to sort the index entries by hash key before insertion, so as to reduce thrashing. This code path got broken by commit `9f03ca9151`, which overlooked that _hash_form_tuple() is not just an alias for index_form_tuple(). The index got built anyway, but with garbage data, so that searches for pre-existing tuples always failed. Fix by refactoring to separate construction of the indexable data from calling index_form_tuple(). Per bug #14210 from Daniel Newman. Back-patch to 9.5 where the bug was introduced. Report: <20160623162507.17237.39471@wrigleys.postgresql.org>	2016-06-24 16:57:36 -04:00
Robert Haas	71d05a2c7b	pg_visibility: Add pg_truncate_visibility_map function. This requires some core changes as well so that we can properly WAL-log the truncation. Specifically, it changes the format of the XLOG_SMGR_TRUNCATE WAL record, so bump XLOG_PAGE_MAGIC. Patch by me, reviewed but not fully endorsed by Andres Freund.	2016-06-17 17:37:30 -04:00
Robert Haas	38e9f90a22	Fix lazy_scan_heap so that it won't mark pages all-frozen too soon. Commit `a892234f83` added a new bit per page to the visibility map fork indicating whether the page is all-frozen, but incorrectly assumed that if lazy_scan_heap chose to freeze a tuple then that tuple would not need to later be frozen again. This turns out to be false, because xmin and xmax (and conceivably xvac, if dealing with tuples from very old releases) could be frozen at separate times. Thanks to Andres Freund for help in uncovering and tracking down this issue.	2016-06-15 14:30:06 -04:00
Robert Haas	4bc424b968	pgindent run for 9.6	2016-06-09 18:02:36 -04:00
Robert Haas	c6dbf1fe79	Stop the executor if no more tuples can be sent from worker to leader. If a Gather node has read as many tuples as it needs (for example, due to Limit) it may detach the queue connecting it to the worker before reading all of the worker's tuples. Rather than let the worker continue to generate and send all of the results, have it stop after sending the next tuple. More could be done here to stop the worker even quicker, but this is about as well as we can hope to do for 9.6. This is in response to a problem report from Andreas Seltenreich. Commit `44339b892a` should be actually be sufficient to fix that example even without this change, but it seems better to do this, too, since we might otherwise waste quite a large amount of effort in one or more workers. Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com Amit Kapila	2016-06-06 14:52:58 -04:00
Robert Haas	fdfaccfa79	Cosmetic improvements to freeze map code. Per post-commit review comments from Andres Freund, improve variable names, comments, and in one place, slightly improve the code structure. Masahiko Sawada	2016-06-03 08:43:41 -04:00
Kevin Grittner	4edb7bd2fd	C comment improvement & typo fix. Thomas Munro	2016-06-02 12:52:41 -05:00
Tom Lane	f5e7b2f910	Mark wal_level as PGDLLIMPORT. Per buildfarm, this is needed to allow extensions to use XLogIsNeeded() in Windows builds.	2016-05-24 22:48:47 -04:00
Alvaro Herrera	c1543a81a7	Revert timeline following in replication slots This reverts commits `f07d18b6e9`, `82c83b3372`, `3a3b309041`, and `24c5f1a103`. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.	2016-05-04 17:32:22 -03:00
Teodor Sigaev	f8467f7da8	Prevent to use magic constants Use macroses for definition amstrategies/amsupport fields instead of hardcoded values. Author: Nikolay Shaplov with addition for contrib/bloom	2016-04-28 16:39:25 +03:00
Teodor Sigaev	e2c79e14d9	Prevent multiple cleanup process for pending list in GIN. Previously, ginInsertCleanup could exit early if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. But in this case vacuum could miss tuples to be deleted. Cleanup process now locks metapage with a help of heavyweight LockPage(ExclusiveLock), and it guarantees that there is no another cleanup process at the same time. Lock is taken differently depending on caller of cleanup process: any vacuums and gin_clean_pending_list() will be blocked until lock becomes available, ordinary insert uses conditional lock to prevent indefinite waiting on lock. Insert into pending list doesn't use this lock, so insertion isn't blocked. Also, patch adds stopping of cleanup process when at-start-cleanup-tail is reached in order to prevent infinite cleanup in case of massive insertion. But it will stop only for automatic maintenance tasks like autovacuum. Patch introduces choice of limit of memory to use: autovacuum_work_mem, maintenance_work_mem or work_mem depending on call path. Patch for previous releases should be reworked due to changes between 9.6 and previous ones in this area. Discover and diagnostics by Jeff Janes and Tomas Vondra Patch by me with some ideas of Jeff Janes	2016-04-28 16:21:42 +03:00
Tom Lane	bde361fef5	Fix memory leak and other bugs in ginPlaceToPage() & subroutines. Commit `36a35c550a` turned the interface between ginPlaceToPage and its subroutines in gindatapage.c and ginentrypage.c into a royal mess: page-update critical sections were started in one place and finished in another place not even in the same file, and the very same subroutine might return having started a critical section or not. Subsequent patches band-aided over some of the problems with this design by making things even messier. One user-visible resulting problem is memory leaks caused by the need for the subroutines to allocate storage that would survive until ginPlaceToPage calls XLogInsert (as reported by Julien Rouhaud). This would not typically be noticeable during retail index updates. It could be visible in a GIN index build, in the form of memory consumption swelling to several times the commanded maintenance_work_mem. Another rather nasty problem is that in the internal-page-splitting code path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before entering the critical section that it's supposed to be cleared in; a failure in between would leave the index in a corrupt state. There were also assorted coding-rule violations with little immediate consequence but possible long-term hazards, such as beginning an XLogInsert sequence before entering a critical section, or calling elog(DEBUG) inside a critical section. To fix, redefine the API between ginPlaceToPage() and its subroutines by splitting the subroutines into two parts. The "beginPlaceToPage" subroutine does what can be done outside a critical section, including full computation of the result pages into temporary storage when we're going to split the target page. The "execPlaceToPage" subroutine is called within a critical section established by ginPlaceToPage(), and it handles the actual page update in the non-split code path. The critical section, as well as the XLOG insertion call sequence, are both now always started and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and work in a short-lived memory context to eliminate the leakage problem. (Since a short-lived memory context had been getting created in the most common code path in the subroutines, this shouldn't cause any noticeable performance penalty; we're just moving the overhead up one call level.) In passing, fix a bunch of comments that had gone unmaintained throughout all this klugery. Report: <571276DD.5050303@dalibo.com>	2016-04-20 14:25:15 -04:00
Andres Freund	533cd2303a	Remove trailing commas in enums. These aren't valid C89. Found thanks to gcc's -Wc90-c99-compat. These exist in differing places in most supported branches.	2016-04-14 19:25:16 -07:00
Andres Freund	be65eddd80	Add required database and origin filtering for logical messages. Logical messages, added in `3fe3511d05`, during decoding failed to filter messages emitted in other databases and messages emitted "under" a replication origin the output plugin isn't interested in. Add tests to verify that both types of filtering actually work. While touching message.sql remove hunk obsoleted by `d25379e`. Bump XLOG_PAGE_MAGIC because xl_logical_message changed and because `3fe3511d05` had omitted doing so. `3fe3511d05` additionally didn't bump catversion, but `7a542700d` has done so since. Author: Petr Jelinek Reported-By: Andres Freund Discussion: 20160406142513.wotqy3ba3kanr423@alap3.anarazel.de	2016-04-13 17:38:54 -07:00
Tom Lane	5713f03973	Improve API of GenericXLogRegister(). Rename this function to GenericXLogRegisterBuffer() to make it clearer what it does, and leave room for other sorts of "register" actions in future. Also, replace its "bool isNew" argument with an integer flags argument, so as to allow adding more flags in future without an API break. Alexander Korotkov, adjusted slightly by me	2016-04-12 11:42:06 -04:00
Tom Lane	08e785436f	Get rid of GenericXLogUnregister(). This routine is unsafe as implemented, because it invalidates the page image pointers returned by previous GenericXLogRegister() calls. Rather than complicate the API or the implementation to avoid that, let's just get rid of it; the use-case for having it seems much too thin to justify a lot of work here. While at it, do some wordsmithing on the SGML docs for generic WAL.	2016-04-09 16:39:30 -04:00
Kevin Grittner	848ef42bb8	Add the "snapshot too old" feature This feature is controlled by a new old_snapshot_threshold GUC. A value of -1 disables the feature, and that is the default. The value of 0 is just intended for testing. Above that it is the number of minutes a snapshot can reach before pruning and vacuum are allowed to remove dead tuples which the snapshot would otherwise protect. The xmin associated with a transaction ID does still protect dead tuples. A connection which is using an "old" snapshot does not get an error unless it accesses a page modified recently enough that it might not be able to produce accurate results. This is similar to the Oracle feature, and we use the same SQLSTATE and error message for compatibility.	2016-04-08 14:36:30 -05:00
Teodor Sigaev	8b99edefca	Revert CREATE INDEX ... INCLUDING ... It's not ready yet, revert two commits `690c543550` - unstable test output `386e3d7609` - patch itself	2016-04-08 21:52:13 +03:00
Teodor Sigaev	386e3d7609	CREATE INDEX ... INCLUDING (column[, ...]) Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes	2016-04-08 19:45:59 +03:00
Simon Riggs	3fe3511d05	Generic Messages for Logical Decoding API and mechanism to allow generic messages to be inserted into WAL that are intended to be read by logical decoding plugins. This commit adds an optional new callback to the logical decoding API. Messages are either text or bytea. Messages can be transactional, or not, and are identified by a prefix to allow multiple concurrent decoding plugins. (Not to be confused with Generic WAL records, which are intended to allow crash recovery of extensible objects.) Author: Petr Jelinek and Andres Freund Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs Discussion: 5685F999.6010202@2ndquadrant.com	2016-04-06 10:05:41 +01:00
Magnus Hagander	7117685461	Implement backup API functions for non-exclusive backups Previously non-exclusive backups had to be done using the replication protocol and pg_basebackup. With this commit it's now possible to make them using pg_start_backup/pg_stop_backup as well, as long as the backup program can maintain a persistent connection to the database. Doing this, backup_label and tablespace_map are returned as results from pg_stop_backup() instead of being written to the data directory. This makes the server safe from a crash during an ongoing backup, which can be a problem with exclusive backups. The old syntax of the functions remain and work exactly as before, but since the new syntax is safer this should eventually be deprecated and removed. Only reference documentation is included. The main section on backup still needs to be rewritten to cover this, but since that is already scheduled for a separate large rewrite, it's not included in this patch. Reviewed by David Steele and Amit Kapila	2016-04-05 20:03:49 +02:00
Teodor Sigaev	65578341af	Add Generic WAL interface This interface is designed to give an access to WAL for extensions which could implement new access method, for example. Previously it was impossible because restoring from custom WAL would need to access system catalog to find a redo custom function. This patch suggests generic way to describe changes on page with standart layout. Bump XLOG_PAGE_MAGIC because of new record type. Author: Alexander Korotkov with a help of Petr Jelinek, Markus Nullmeier and minor editorization by my Reviewers: Petr Jelinek, Alvaro Herrera, Teodor Sigaev, Jim Nasby, Michael Paquier	2016-04-01 12:21:48 +03:00
Alvaro Herrera	24c5f1a103	Enable logical slots to follow timeline switches When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines; otherwise, decoding fails after failover, because the archives are in the historical timeline. This is required to make "failover logical slots" possible; it currently has no other use, although theoretically it could be used by an extension that creates a slot on a standby and continues to replay from the slot when the standby is promoted. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Author: Craig Ringer Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek	2016-03-30 20:07:05 -03:00
Alvaro Herrera	3b02ea4f07	XLogReader general code cleanup Some minor tweaks and comment additions, for cleanliness sake and to avoid having the upcoming timeline-following patch be polluted with unrelated cleanup. Extracted from a larger patch by Craig Ringer, reviewed by Andres Freund, with some additions by myself.	2016-03-30 18:56:13 -03:00
Teodor Sigaev	ccd6eb49a4	Introduce traversalValue for SP-GiST scan During scan sometimes it would be very helpful to know some information about parent node or all ancestor nodes. Right now reconstructedValue could be used but it's not a right usage of it (range opclass uses that). traversalValue is arbitrary piece of memory in separate MemoryContext while reconstructedVale should have the same type as indexed column. Subsequent patches for range opclass and quad4d tree will use it. Author: Alexander Lebedev, Teodor Sigaev	2016-03-30 18:29:28 +03:00
Robert Haas	314cbfc5da	Add new replication mode synchronous_commit = 'remote_apply'. In this mode, the master waits for the transaction to be applied on the remote side, not just written to disk. That means that you can count on a transaction started on the standby to see all commits previously acknowledged by the master. To make this work, the standby sends a reply after replaying each commit record generated with synchronous_commit >= 'remote_apply'. This introduces a small inefficiency: the extra replies will be sent even by standbys that aren't the current synchronous standby. But previously-existing synchronous_commit levels make no attempt at all to optimize which replies are sent based on what the primary cares about, so this is no worse, and at least avoids any extra replies for people not using the feature at all. Thomas Munro, reviewed by Michael Paquier and by me. Some additional tweaks by me.	2016-03-29 21:29:49 -04:00
Andres Freund	1a7a43672b	Don't use !! but != 0/NULL to force boolean evaluation. I introduced several uses of !! to force bit arithmetic to be boolean, but per discussion the project prefers != 0/NULL. Discussion: CA+TgmoZP5KakLGP6B4vUjgMBUW0woq_dJYi0paOz-My0Hwt_vQ@mail.gmail.com	2016-03-27 18:10:19 +02:00
Andres Freund	af4472bcb8	Change various GinIs macros to return 0/1. Returning the direct result of bit arithmetic, in a macro intended to be used in a boolean manner, can be problematic if the return value is stored in a variable of type 'bool'. If bool is implemented using C99's _Bool, that can lead to comparison failures if the variable is then compared again with the expression (see ginStepRight() for an example that fails), as _Bool forces the result to be 0/1. That happens in some configurations of newer MSVC compilers. It's also problematic when storing the result of such an expression in a narrower type. Several gin macros have been declared in that style since gin's initial commit in `8a3631f8d8`. There's a lot more macros like this, but this is the only one causing regression test failures; and I don't want to commit and backpatch a larger patch with lots of conflicts just before the next set of minor releases. Discussion: 20150811154237.GD17575@awork2.anarazel.de Backpatch: All supported branches	2016-03-27 17:46:48 +02:00

1 2 3 4 5 ...

1109 commits