postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-15 22:10:45 -04:00

Author	SHA1	Message	Date
Simon Riggs	2c8a4e9be2	Wake WALSender to reduce data loss at failover for async commit. WALSender now woken up after each background flush by WALwriter, avoiding multi-second replication delay for an all-async commit workload. Replication delay reduced from 7s with default settings to 200ms and often much less, allowing significantly reduced data loss at failover. Andres Freund and Simon Riggs	2012-06-07 19:22:47 +01:00
Robert Haas	b50991eedb	Fix more crash-safe visibility map bugs, and improve comments. In lazy_scan_heap, we could issue bogus warnings about incorrect information in the visibility map, because we checked the visibility map bit before locking the heap page, creating a race condition. Fix by rechecking the visibility map bit before we complain. Rejigger some related logic so that we rely on the possibly-outdated all_visible_according_to_vm value as little as possible. In heap_multi_insert, it's not safe to clear the visibility map bit before beginning the critical section. The visibility map is not crash-safe unless we treat clearing the bit as a critical operation. Specifically, if the transaction were to error out after we set the bit and before entering the critical section, we could end up writing the heap page to disk (with the bit cleared) and crashing before the visibility map page made it to disk. That would be bad. heap_insert has this correct, but somehow the order of operations got rearranged when heap_multi_insert was added. Also, add some more comments to visibilitymap_test, lazy_scan_heap, and IndexOnlyNext, expounding on concurrency issues. Per extensive code review by Andres Freund, and further review by Tom Lane, who also made the original report about the bogus warnings.	2012-06-07 12:48:13 -04:00
Tom Lane	3dd8e59681	Fix bogus handling of control characters in json_lex_string(). The original coding misbehaved if "char" is signed, and also made the extremely poor decision to print control characters literally when trying to complain about them. Report and patch by Shigeru Hanada. In passing, also fix core dump risk in report_parse_error() should the parse state be something other than what it expects.	2012-06-04 20:43:57 -04:00
Simon Riggs	d3abbbebe5	Avoid early reuse of btree pages, causing incorrect query results. When we allowed read-only transactions to skip assigning XIDs we introduced the possibility that a fully deleted btree page could be reused. This broke the index link sequence which could then lead to indexscans silently returning fewer rows than would have been correct. The actual incidence of silent errors from this is thought to be very low because of the exact workload required and locking pre-conditions. Fix is to remove pages only if index page opaque->btpo.xact precedes RecentGlobalXmin. Noah Misch, reviewed by Simon Riggs	2012-06-01 12:21:45 +01:00
Simon Riggs	055c352abb	After any checkpoint, close all smgr file handles in bgwriter	2012-06-01 09:24:53 +01:00
Simon Riggs	a297d64d92	Checkpointer starts before bgwriter to avoid missing fsync requests. Noted while testing Hot Standby startup.	2012-06-01 08:25:17 +01:00
Simon Riggs	1ec6a2bbc9	Provide interim statistics while in mid-checkpoint. Re-implements similar functionality in 9.1 and previously which was removed during split of checkpointer and bgwriter. Requested/spotted by Magnus Hagander	2012-06-01 08:19:06 +01:00
Tom Lane	a04dc87db1	Improve comment for GetStableLatestTransactionId().	2012-05-31 11:20:02 -04:00
Simon Riggs	a2b516dab9	Only throw recovery conflicts when InHotStandby. Bug fix to recent patch to allow Index Only Scans on Hot Standby. Bug report from Jaime Casanova	2012-05-31 13:11:47 +01:00
Tom Lane	ad0009e7be	Force PL and range-type support functions to be owned by a superuser. We allow non-superusers to create procedural languages (with restrictions) and range datatypes. Previously, the automatically-created support functions for these objects ended up owned by the creating user. This represents a rather considerable security hazard, because the owning user might be able to alter a support function's definition in such a way as to crash the server, inject trojan-horse SQL code, or even execute arbitrary C code directly. It appears that right now the only actually exploitable problem is the infinite-recursion bug fixed in the previous patch for CVE-2012-2655. However, it's not hard to imagine that future additions of more ALTER FUNCTION capability might unintentionally open up new hazards. To forestall future problems, cause these support functions to be owned by the bootstrap superuser, not the user creating the parent object.	2012-05-30 23:47:57 -04:00
Tom Lane	33c6eaf78e	Ignore SECURITY DEFINER and SET attributes for a PL's call handler. It's not very sensible to set such attributes on a handler function; but if one were to do so, fmgr.c went into infinite recursion because it would call fmgr_security_definer instead of the handler function proper. There is no way for fmgr_security_definer to know that it ought to call the handler and not the original function referenced by the FmgrInfo's fn_oid, so it tries to do the latter, causing the whole process to start over again. Ordinarily such misconfiguration of a procedural language's handler could be written off as superuser error. However, because we allow non-superuser database owners to create procedural languages and the handler for such a language becomes owned by the database owner, it is possible for a database owner to crash the backend, which ideally shouldn't be possible without superuser privileges. In 9.2 and up we will adjust things so that the handler functions are always owned by superusers, but in existing branches this is a minor security fix. Problem noted by Noah Misch (after several of us had failed to detect it :-(). This is CVE-2012-2655.	2012-05-30 23:27:57 -04:00
Tom Lane	cd0ff9c0f4	Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich. We used to only allow offsets less than +/-13 hours, then it was +/-14, then it was +/-15. That's still not good enough though, as per today's bug report from Patric Bechtel. This time I actually looked through the Olson timezone database to find the largest offsets used anywhere. The winners are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at +15:13:42 until 1867. So we'd better allow offsets less than +/-16 hours. Given the history, we are way overdue to have some greppable #define symbols controlling this, so make some ... and also remove an obsolete comment that didn't get fixed the last time. Back-patch to all supported branches.	2012-05-30 19:58:35 -04:00
Robert Haas	07ab1383e3	Fix two more bugs in fast-path relation locking. First, the previous code failed to account for the fact that, during Hot Standby operation, the startup process takes AccessExclusiveLocks on relations without setting MyDatabaseId. This resulted in fast path strong lock counts failing to be incremented when the startup process took locks, which in turn allowed conflicting lock requests to succeed when they should not have. Report by Erik Rijkers, diagnosis by Heikki Linnakangas. Second, LockReleaseAll() failed to honor the allLocks and lockmethodid restrictions with respect to fast-path locks. It's not clear to me whether this produces any user-visible breakage at the moment, but it's certainly wrong. Rearrange order of operations in LockReleaseAll to fix. Noted by Tom Lane.	2012-05-30 16:17:46 -04:00
Heikki Linnakangas	d1996ed5e8	Change the way parent pages are tracked during buffered GiST build. We used to mimic the way a stack is constructed when descending the tree during normal GiST inserts, but that was quite complicated during a buffered build. It was also wrong: in GiST, the left-to-right relationships on different levels might not match each other, so that when you know the parent of a child page, you won't necessarily find the parent of the page to the right of the child page by following the rightlinks at the parent level. This sometimes led to "could not re-find parent" errors while building a GiST index. We now use a simple hash table to track the parent of every internal page. Whenever a page is split, and downlinks are moved from one page to another, we update the hash table accordingly. This is also better for performance than the old method, as we never need to move right to re-find the parent page, which could take a significant amount of time for buffers that were created much earlier in the index build.	2012-05-30 12:05:57 +03:00
Heikki Linnakangas	be02b16826	Delete the temporary file used in buffered GiST build, after the build. There were two bugs here: We forgot to call gistFreeBuildBuffers() function at the end of build, and we passed interXact == true to BufFileCreateTemp, so the file wasn't automatically cleaned up at end-of-transaction either.	2012-05-30 12:05:57 +03:00
Heikki Linnakangas	4bc6fb57f7	Fix integer overflow bug in GiST buffering build calculations. The result of (maintenance_work_mem * 1024) / BLCKSZ doesn't fit in a signed 32-bit integer, if maintenance_work_mem >= 2GB. Use double instead. And while we're at it, write the calculations in an easier to understand form, with the intermediary steps written out and commented.	2012-05-29 22:27:42 +03:00
Tom Lane	2755abf386	Teach AbortOutOfAnyTransaction to clean up partially-started transactions. AbortOutOfAnyTransaction failed to do anything if the state it saw on entry corresponded to failing partway through StartTransaction. I fixed AbortCurrentTransaction to cope with that case way back in commit `60b2444cc3`, but evidently overlooked that AbortOutOfAnyTransaction should do likewise. Back-patch to all supported branches. It's not clear that this omission has any more-than-cosmetic consequences, but it's also not clear that it doesn't, so back-patching seems the least risky choice.	2012-05-28 23:57:06 -04:00
Peter Eisentraut	388d251679	Update SQL features list. Set E081 Basic Privileges to supported, since by the letter of it, we support it, even though not all possible forms of USAGE privileges are implemented.	2012-05-27 23:34:16 +03:00
Peter Eisentraut	27314d32a8	Suppress -Wunused-result warning about write(). This is related to `aa90e148ca`, but this code is only used under -DLINUX_OOM_ADJ, so it was apparently overlooked then.	2012-05-27 22:35:01 +03:00
Tom Lane	532fe28dad	Prevent synchronized scanning when systable_beginscan chooses a heapscan. The only interesting-for-performance case wherein we force heapscan here is when we're rebuilding the relcache init file, and the only such case that is likely to be examining a catalog big enough to be syncscanned is RelationBuildTupleDesc. But the early-exit optimization in that code gets broken if we start the scan at a random place within the catalog, so that allowing syncscan is actually a big deoptimization if pg_attribute is large (at least for the normal case where the rows for core system catalogs have never been changed since initdb). Hence, prevent syncscan here. Per my testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane, though neither of them seem to have actually hit this specific problem. Back-patch to 8.3, where syncscan was introduced.	2012-05-26 19:09:52 -04:00
Tom Lane	d3b97d1488	Fix string truncation to be multibyte-aware in text_name and bpchar_name. Previously, casts to name could generate invalidly-encoded results. Also, make these functions match namein() more exactly, by consistently using palloc0() instead of ad-hoc zeroing code. Back-patch to all supported branches. Karl Schnaitter and Tom Lane	2012-05-25 17:34:51 -04:00
Tom Lane	2a4c46e0ba	Fix array overrun in regex code. zaptreesubs() was coded to unconditionally reset a capture subre's corresponding pmatch[] entry. However, in regexes without backrefs, that array is caller-supplied and might not have as many entries as the regex has capturing parens. So check the array length and do nothing if there is no corresponding entry, much as subset() does. Failure to check this resulted in a stack clobber in the case reported by Marko Kreen. This bug appears to have been latent in the regex library from the beginning. It was not exposed because find() called dissect() not cdissect(), and the dissect() code path didn't ever call zaptreesubs() (formerly zapmem()). When I unified dissect() and cdissect() in commit `4dd78bf37a`, the problem was exposed. Now that I've seen this, I'm rather suspicious that we might need to back-patch it; but will refrain for now, for lack of evidence that the case can be hit in the previous coding.	2012-05-24 13:56:16 -04:00
Tom Lane	ed962fd712	Ensure that seqscans check for interrupts at least once per page. If a seqscan encounters many consecutive pages containing only dead tuples, it can remain in the loop in heapgettup for a long time, and there was no CHECK_FOR_INTERRUPTS anywhere in that loop. This meant there were real-world situations where a query would be effectively uncancelable for long stretches. Add a check placed to occur once per page, which should be enough to provide reasonable response time without adding any measurable overhead. Report and patch by Merlin Moncure (though I tweaked it a bit). Back-patch to all supported branches.	2012-05-22 19:42:05 -04:00
Robert Haas	8fbe5a317d	Fix error message for COMMENT/SECURITY LABEL ON COLUMN xxx IS 'yyy'. When the column name is an unqualified name, rather than table.column, the error message complains about too many dotted names, which is wrong. Report by Peter Eisentraut based on examination of the sepgsql regression test output, but the problem also affects COMMENT. New wording as suggested by Tom Lane.	2012-05-22 11:23:36 -04:00
Robert Haas	219c024c64	Repair out-of-date information in src/backend/storage/buffer/README. In commit `d526575f89`, we changed things so that buffer usage counts are incremented when the buffer is pinned, rather than when it is unpinned, but the README file didn't get the memo. Report by Amit Kapila.	2012-05-22 09:32:09 -04:00
Tom Lane	b94ce6e807	Move postmaster's RemovePgTempFiles call to a less randomly chosen place. There is no reason to do this as early as possible in postmaster startup, and good reason not to do it until we have completely created the postmaster's lock file, namely that it might contribute to pg_ctl thinking that postmaster startup has timed out. (This would require a rather unusual amount of time to be spent scanning temp file directories, but we have at least one field report of it happening reproducibly.) Back-patch to 9.1. Before that, pg_ctl didn't wait for additional info to be added to the lock file, so it wasn't a problem. Note that this is not a complete fix to the slow-start issue in 9.1, because we still had identify_system_timezone being run during postmaster start in 9.1. But that's at least a reasonably well-defined delay, with an easy workaround if needed, whereas the temp-files scan is not so predictable and cannot be avoided.	2012-05-21 22:50:30 -04:00
Tom Lane	efae4653c9	Update woefully-obsolete comment. The accurate info about what's in a lock file has been in miscadmin.h for some time, so let's just make this comment point there instead of maintaining a duplicative copy.	2012-05-21 22:11:00 -04:00
Peter Eisentraut	f1f6737e15	Fix incorrect logic in JSON number lexer. Detectable by gcc -Wlogical-op. Add two regression test cases that would previously allow incorrect values to pass.	2012-05-20 02:24:46 +03:00
Peter Eisentraut	2273a50364	Realign some --help output to have better spacing between columns	2012-05-18 20:34:14 +03:00
Heikki Linnakangas	1d27dcf578	Fix bug in gistRelocateBuildBuffersOnSplit(). When we create a temporary copy of the old node buffer, in stack, we mustn't leak that into any of the long-lived data structures. Before this patch, when we called gistPopItupFromNodeBuffer(), it got added to the array of "loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the pointer added to the loaded buffers array points to garbage. Often that goes unnoticed, because when we go through the array of loaded buffers to unload them, buffers with a NULL pageBuffer are ignored, which can often happen by accident even if the pointer points to garbage. This patch fixes that by marking the temporary copy in stack explicitly as temporary, and refraining from adding buffers marked as temporary to the array of loaded buffers. While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber and improve comments a bit. This isn't strictly necessary, but makes debugging easier.	2012-05-18 19:38:32 +03:00
Peter Eisentraut	939ec9b8a4	Update SQL features/conformance information to SQL:2011	2012-05-17 09:50:04 +03:00
Peter Eisentraut	be6d1c88a4	Change COLLATION keyword category. It was changed from unreserved to reserved as part of the COLLATION FOR syntax, but it turns out that type_func_name_keyword is sufficient.	2012-05-16 20:19:44 +03:00
Tom Lane	488c6dd170	Improve error message for ALTER COLUMN TYPE coercion failure. Per recent discussion, the error message for this was actually a trifle inaccurate, since it said "cannot be cast" which might be incorrect. Adjust that wording, and add a HINT suggesting that a USING clause might be needed.	2012-05-16 07:28:25 -04:00
Heikki Linnakangas	6593c5b5dc	Fix bug in freespace calculation in heap_multi_insert(). If the amount of freespace on page was less than the amount reserved by fillfactor, the calculation would underflow. This fixes bug #6643 reported by Tomonari Katsumata.	2012-05-16 14:13:06 +03:00
Peter Eisentraut	c8e086795a	Remove whitespace from end of lines. pgindent and perltidy should clean up the rest.	2012-05-15 22:19:41 +03:00
Peter Eisentraut	8afb026e57	Remove stray nbsp character	2012-05-15 21:38:59 +03:00
Heikki Linnakangas	d2495f272c	Fix bug in to_tsquery(). We were using memcpy() to copy to a possibly overlapping memory region, which is a no-no. Use memmove() instead.	2012-05-15 19:27:34 +03:00
Tom Lane	9b63e9869f	In pgstat.c, use a timeout in WaitLatchOrSocket only on Windows. We have no need for a timeout here really, but some broken products from Redmond seem to lose FD_READ events occasionally, and waking up and retrying the recv() is the only known way to work around that. Perhaps somebody will be motivated to figure out a better answer here; but not I.	2012-05-14 23:51:34 -04:00
Tom Lane	5a2bb06012	Revert "Add some temporary instrumentation to pgstat.c." This reverts commit `7d88bb73f7`. That instrumentation has served its purpose.	2012-05-14 23:08:10 -04:00
Tom Lane	e42a21b9e6	Assert that WaitLatchOrSocket callers cannot wait only for writability. Since we have chosen to report socket EOF and error conditions via the WL_SOCKET_READABLE flag bit, it's unsafe to wait only for WL_SOCKET_WRITEABLE; the caller would never be notified of the socket condition, and in some of these implementations WaitLatchOrSocket would busy-wait until something else happens. Add this restriction to the API specification, and add Asserts to check that callers don't try to do that. At some point we might want to consider adjusting the API to relax this restriction, but until we have an actual use case for waiting on a write-only socket, it seems premature to design a solution.	2012-05-14 16:12:28 -04:00
Tom Lane	d461d0502b	For testing purposes, reinsert a timeout in pgstat.c's wait call. Test results from buildfarm members mastodon/narwhal (Windows Server 2003) make it look like that platform just plain loses FD_READ events occasionally, and the only reason our previous coding seemed to work was that it timed out every couple of seconds and retried the whole operation. Try to verify this by reinserting a finite timeout into the pgstat loop. This isn't meant to be a permanent patch either, just to confirm or disprove a theory.	2012-05-14 15:03:14 -04:00
Tom Lane	f1ca51549e	Force pgwin32_recv into nonblock mode when called from pgstat.c. This should get rid of the usage of pgwin32_waitforsinglesocket entirely, and perhaps thereby remove the race condition that's evidently still present on some versions of Windows. The previous arrangement was a bit unsafe anyway, since waiting at the recv() would not allow pgstat to notice postmaster death.	2012-05-14 10:57:07 -04:00
Heikki Linnakangas	f15c2eae9c	Remove unnecessary pg_verifymbstr() calls from tsvector/query in functions. The input should've been validated well before it hits the input function. Doing so again is a waste of cycles.	2012-05-14 14:30:32 +03:00
Heikki Linnakangas	9e4637bf89	Update comments that became out-of-date with the PGXACT struct. When the "hot" members of PGPROC were split off to separate PGXACT structs, many PGPROC fields referred to in comments were moved to PGXACT, but the comments were neglected in the commit. Mostly this is just a search/replace of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for prepared transactions changed more, making some of the comments totally bogus. Noah Misch	2012-05-14 10:28:55 +03:00
Peter Eisentraut	64f09ca386	Remove leftovers of BeOS port. These should have been removed when the BeOS port was removed in `44f9021223`.	2012-05-14 04:50:39 +03:00
Peter Eisentraut	6bf1e7668d	Small punctuation editing of postgresql.conf.sample	2012-05-14 04:50:39 +03:00
Tom Lane	7d88bb73f7	Add some temporary instrumentation to pgstat.c. Log main-loop blocking events and the results of inquiry messages. This is to get some clarity as to what's happening on those Windows buildfarm members that still don't like the latch-ified stats collector. This bulks up the postmaster log a tad, so I won't leave it in place for long.	2012-05-13 21:11:31 -04:00
Tom Lane	b8347138e9	Fix DROP TABLESPACE to unlink symlink when directory is not there. If the tablespace directory is missing entirely, we allow DROP TABLESPACE to go through, on the grounds that it should be possible to clean up the catalog entry in such a situation. However, we forgot that the pg_tblspc symlink might still be there. We should try to remove the symlink too (but not fail if it's no longer there), since not doing so can lead to weird behavior subsequently, as per report from Michael Nolan. There was some discussion of adding dependency links to prevent DROP TABLESPACE when the catalogs still contain references to the tablespace. That might be worth doing too, but it's an orthogonal question, and in any case wouldn't be back-patchable. Back-patch to 9.0, which is as far back as the logic looks like this. We could possibly do something similar in 8.x, but given the lack of reports I'm not sure it's worth the trouble, and anyway the case could not arise in the form the logic is meant to cover (namely, a post-DROP transaction rollback having resurrected the pg_tablespace entry after some or all of the filesystem infrastructure is gone).	2012-05-13 18:06:52 -04:00
Tom Lane	966970ed63	Re-revert stats collector latch changes. This reverts commit `cb2f2873d6`, restoring the latch-ified stats collector logic. We'll soon see if this works any better on the Windows buildfarm machines.	2012-05-13 14:44:39 -04:00
Tom Lane	b85427f227	Attempt to fix some issues in our Windows socket code. Make sure WaitLatchOrSocket regards FD_CLOSE as a read-ready condition. We might want to tweak this further, but it was surely wrong as-is. Make pgwin32_waitforsinglesocket detach its private event object from the passed socket before returning. I suspect that failure to do so leads to race conditions when other code (such as WaitLatchOrSocket) attaches a different event object to the same socket. Moreover, the existing coding meant that repeated calls to pgwin32_waitforsinglesocket would perform ResetEvent on an event actively connected to a socket, which is rumored to be an unsafe practice; the WSAEventSelect documentation appears to recommend against this, though it does not say not to do it in so many words. Also, uniformly use the coding pattern "WSAEventSelect(s, NULL, 0)" to detach events from sockets, rather than passing the event in the second parameter. The WSAEventSelect documentation says that the second parameter is ignored if the third is 0, so theoretically this should make no difference. However, elsewhere on the same reference page the use of NULL in this context is recommended, and I have found suggestions on the net that some versions of Windows have bugs with a non-NULL second parameter in this usage. Some other mostly-cosmetic cleanup, such as using the right one of WSAGetLastError and GetLastError for reporting errors from these functions.	2012-05-13 14:35:40 -04:00
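
Commit d2495f272c above replaces memcpy() with memmove() because the source and destination regions could overlap. The following is a minimal sketch of that general rule, not the to_tsquery() code itself; the helper name remove_at and its array-of-int layout are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): delete the element at index
 * "pos" from an int array of length "n" by sliding the tail down one slot.
 *
 * The source (&items[pos + 1]) and destination (&items[pos]) overlap
 * whenever more than one element is moved, so memmove() is required.
 * memcpy() has undefined behavior when the regions overlap.
 */
static size_t
remove_at(int *items, size_t n, size_t pos)
{
    assert(pos < n);
    memmove(&items[pos], &items[pos + 1], (n - pos - 1) * sizeof(int));
    return n - 1;               /* new logical length */
}
```

Calling remove_at on {1, 2, 3, 4, 5} with pos = 1 leaves 1, 3, 4, 5 in the first four slots. When pos is the last index the byte count is zero and nothing is moved.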
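
Commit 4bc6fb57f7 above notes that (maintenance_work_mem * 1024) / BLCKSZ does not fit in a signed 32-bit integer once maintenance_work_mem reaches 2GB, and switches to double. A rough sketch of the arithmetic follows. It assumes the setting is held in kilobytes, a block size of 8192 bytes, and a 32-bit int; the function names are hypothetical and this is not the actual GiST build code.

```c
#include <assert.h>
#include <limits.h>

#define BLCKSZ 8192             /* assumed block size in bytes */

/*
 * Number of BLCKSZ-sized pages that fit in work_mem_kb kilobytes.
 * Converting to double before multiplying keeps the intermediate byte
 * count from overflowing a signed 32-bit int.
 */
static double
pages_in_work_mem(int work_mem_kb)
{
    assert(work_mem_kb >= 0);
    return ((double) work_mem_kb * 1024.0) / BLCKSZ;
}

/*
 * Returns 1 if the byte count work_mem_kb * 1024 would exceed INT_MAX,
 * i.e. if the original int-only expression would have overflowed.
 * The check itself is done in long long so that it cannot overflow.
 */
static int
byte_count_overflows_int(int work_mem_kb)
{
    return (long long) work_mem_kb * 1024 > INT_MAX;
}
```

With 2GB expressed as 2097152 kB, the byte count is 2^31, one more than a 32-bit INT_MAX, while the double-based calculation yields 262144 pages.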
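
Commit 3dd8e59681 above says the original json_lex_string() coding misbehaved if "char" is signed. One common way such a bug arises, sketched below under that assumption, is comparing a plain char against 32: on platforms where char is signed, bytes of 0x80 and above are negative and are wrongly classified as control characters. The function name is hypothetical and this is not the actual lexer code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical check (not the PostgreSQL lexer): report whether the byte
 * at *s is an ASCII control character (value below 32).
 *
 * Casting to unsigned char first makes the result independent of whether
 * plain "char" is signed. Without the cast, a byte such as 0xC3 (the lead
 * byte of many two-byte UTF-8 sequences) would compare as negative on
 * signed-char platforms and be reported as a control character.
 */
static int
is_control_byte(const char *s)
{
    assert(s != NULL);
    return (unsigned char) *s < 32;
}
```

Under this sketch a newline is reported as a control byte, while 'A' and the byte 0xC3 are not.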


12704 commits