postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-09 19:16:17 -04:00

Author	SHA1	Message	Date
Heikki Linnakangas	81f4e6cd27	Revert the behavior of inet/cidr functions to not unpack the arguments. I forgot to change the functions to use the PG_GETARG_INET_PP() macro, when I changed DatumGetInetP() to unpack the datum, like Datum*P macros usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP() macro, and didn't notice because it wasn't used. This fixes the memory leak when sorting inet values, as reported by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like the previous patch that broke it.	2011-12-12 10:05:29 +02:00
Tom Lane	d08b64581f	Stamp 8.4.10.	2011-12-01 16:53:13 -05:00
Peter Eisentraut	44ff103566	Fix server header file installation with vpath builds Several server header files would not be installed in vpath builds because they live in the build directory.	2011-11-10 20:57:50 +02:00
Heikki Linnakangas	61f4750bf8	Make DatumGetInetP() unpack inet datums with a 1-byte header, and add a new macro, DatumGetInetPP(), that does not. This brings these macros in line with other DatumGet*P() macros. Backpatch to 8.3, where 1-byte header varlenas were introduced.	2011-11-08 22:45:13 +02:00
Tom Lane	b05ce7550c	Fix race condition with toast table access from a stale syscache entry. If a tuple in a syscache contains an out-of-line toasted field, and we try to fetch that field shortly after some other transaction has committed an update or deletion of the tuple, there is a race condition: vacuum could come along and remove the toast tuples before we can fetch them. This leads to transient failures like "missing chunk number 0 for toast value NNNNN in pg_toast_2619", as seen in recent reports from Andrew Hammond and Tim Uckun. The design idea of syscache is that access to stale syscache entries should be prevented by relation-level locks, but that fails for at least two cases where toasted fields are possible: ANALYZE updates pg_statistic rows without locking out sessions that might want to plan queries on the same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without any meaningful lock at all. The least risky fix seems to be an idea that Heikki suggested when we were dealing with a related problem back in August: forcibly detoast any out-of-line fields before putting a tuple into syscache in the first place. This avoids the problem because at the time we fetch the parent tuple from the catalog, we should be holding an MVCC snapshot that will prevent removal of the toast tuples, even if the parent tuple is outdated immediately after we fetch it. (Note: I'm not convinced that this statement holds true at every instant where we could be fetching a syscache entry at all, but it does appear to hold true at the times where we could fetch an entry that could have a toasted field. We will need to be a bit wary of adding toast tables to low-level catalogs that don't have them already.) An additional benefit is that subsequent uses of the syscache entry should be faster, since they won't have to detoast the field. Back-patch to all supported versions. The problem is significantly harder to reproduce in pre-9.0 releases, because of their willingness to flush every entry in a syscache whenever the underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but there is still a window for trouble.	2011-11-01 19:48:56 -04:00
Tom Lane	07d306785e	Stamp 8.4.9.	2011-09-22 18:03:52 -04:00
Simon Riggs	7c24bac64c	PublishStartupProcessInformation() to avoid rare hang in recovery. Bgwriter could cause hang in recovery during page concurrent cleaning. Bug report and testing by Bernd Helmle, fix by me	2011-09-08 12:03:28 +01:00
Heikki Linnakangas	f2d875983b	Move the line to undefine setlocale() macro on Win32 outside USE_REPL_SNPRINTF ifdef block. It has nothing to do with whether the replacement snprintf function is used. It caused no live bug, because the replacement snprintf function is always used on Win32, but it was nevertheless misplaced.	2011-09-01 09:36:41 +03:00
Tom Lane	bcc9b17bbf	Fix race condition in relcache init file invalidation. The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.	2011-08-16 13:12:17 -04:00
Tom Lane	ceaf5052c6	Fix nested PlaceHolderVar expressions that appear only in targetlists. A PlaceHolderVar's expression might contain another, lower-level PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one certainly will be also, and so we have to make sure that both of them get into the placeholder_list with correct ph_may_need values during the initial pre-scan of the query (before deconstruct_jointree starts). We did this correctly for PlaceHolderVars appearing in the query quals, but overlooked the issue for those appearing in the top-level targetlist; with the result that nested placeholders referenced only in the targetlist did not work correctly, as illustrated in bug #6154. While at it, add some error checking to find_placeholder_info to ensure that we don't try to create new placeholders after it's too late to do so; they have to all be created before deconstruct_jointree starts. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.	2011-08-09 00:49:11 -04:00
Tom Lane	b503da135a	Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.	2011-05-30 17:07:19 -04:00
Tom Lane	5a4305fdf7	Avoid changing an index's indcheckxmin horizon during REINDEX. There can never be a need to push the indcheckxmin horizon forward, since any HOT chains that are actually broken with respect to the index must pre-date its original creation. So we can just avoid changing pg_index altogether during a REINDEX operation. This offers a cleaner solution than my previous patch for the problem found a few days ago that we mustn't try to update pg_index while we are reindexing it. System catalog indexes will always be created with indcheckxmin = false during initdb, and with this modified code we should never try to change their pg_index entries. This avoids special-casing system catalogs as the former patch did, and should provide a performance benefit for many cases where REINDEX formerly caused an index to be considered unusable for a short time. Back-patch to 8.3 to cover all versions containing HOT. Note that this patch changes the API for index_build(), but I believe it is unlikely that any add-on code is calling that directly.	2011-04-19 18:51:08 -04:00
Marc G. Fournier	7b8b256f08	Tag 8.4.8.	2011-04-15 00:17:14 -03:00
Tom Lane	5d3853a7fa	Fix plpgsql's issues with dropped columns in rowtypes in 8.4 branch. This is a back-patch of commit `dcb2bda9b7` of Aug 6 2009, which fixed assorted cases in which plpgsql would fail to cope with composite types that contain any dropped columns. Per discussion, this fix has been out in 9.0 for long enough to make it improbable that it creates any new bugs, so this is a low-risk fix. To make it even lower risk, I did not back-patch the changes in execQual.c, but just accepted the duplication of code between there and tupconvert.c. The added files tupconvert.h and tupconvert.c match their current states in HEAD.	2011-04-07 13:55:28 -04:00
Tom Lane	fca20bcec9	Prevent a rowtype from being included in itself. Eventually we might be able to allow that, but it's not clear how many places need to be fixed to prevent infinite recursion when there's a direct or indirect inclusion of a rowtype in itself. One such place is CheckAttributeType(), which will recurse to stack overflow in cases such as those exhibited in bug #5950 from Alex Perepelica. If we were sure it was the only such place, we could easily modify the code added by this patch to stop the recursion without a complaint ... but it probably isn't the only such place. Hence, throw error until such time as someone is excited enough about this type of usage to put work into making it safe. Back-patch as far as 8.3. 8.2 doesn't have the recursive call in CheckAttributeType in the first place, so I see no need to add code there in the absence of clear evidence of a problem elsewhere.	2011-03-28 15:45:08 -04:00
Tom Lane	1118e83198	Fix dangling-pointer problem in before-row update trigger processing. ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new tuple value by seeing if the returned tuple was pointer-equal to the old one. But the "old one" was in estate->es_junkFilter's result slot, which would be scribbled on if we had done an EvalPlanQual update in response to a concurrent update of the target tuple; therefore we were comparing a dangling pointer to a live one. Given the right set of circumstances we could get a false match, resulting in not forcing the tuple to be stored in the slot we thought it was stored in. In the case reported by Maxim Boguk in bug #5798, this led to "cannot extract system attribute from virtual tuple" failures when trying to do "RETURNING ctid". I believe there is a very-low-probability chance of more serious errors, such as generating incorrect index entries based on the original rather than the trigger-modified version of the row. In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers, ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to have similar APIs. In the back branches I just changed ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers case.	2011-02-21 21:18:19 -05:00
Tom Lane	2b3a0630b5	Fix tsmatchsel() to account properly for null rows. ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents. Back-patch to 8.4 where this stuff was introduced. Jesper Krogh	2011-02-17 19:01:01 -05:00
Magnus Hagander	ff18de8112	Undefine setlocale() macro on Win32 New versions of libintl redefine setlocale() to a macro which causes problems when the backend and libintl are linked against different versions of the runtime, which is often the case in msvc builds. Hiroshi Inoue, slightly updated comment by me	2011-02-01 13:22:58 +01:00
Marc G. Fournier	7df910c7d1	Tag 8.4.7	2011-01-27 22:23:36 -04:00
Tom Lane	d5b2587c20	Avoid unexpected conversion overflow in planner for distant date values. The "date" type supports a wider range of dates than int64 timestamps do. However, there is pre-int64-timestamp code in the planner that assumes that all date values can be converted to timestamp with impunity. Fortunately, what we really need out of the conversion is always a double (float8) value; so even when the date is out of timestamp's range it's possible to produce a sane answer. All we need is a code path that doesn't try to force the result into int64. Per trouble report from David Rericha. Back-patch to all supported versions. Although this is surely a corner case, there's not much point in advertising a date range wider than timestamp's if we will choke on such values in unexpected places.	2010-12-28 22:50:30 -05:00
Tom Lane	0a0eec670d	Remove optreset from src/port/ implementations of getopt and getopt_long. We don't actually need optreset, because we can easily fix the code to ensure that it's cleanly restartable after having completed a scan over the argv array; which is the only case we need to restart in. Getting rid of it avoids a class of interactions with the system libraries and allows reversion of my change of yesterday in postmaster.c and postgres.c. Back-patch to 8.4. Before that the getopt code was a bit different anyway.	2010-12-16 16:22:18 -05:00
Marc G. Fournier	35862ff7f2	Tag 8.4.6.	2010-12-13 22:59:19 -04:00
Tom Lane	f3224e010d	Force default wal_sync_method to be fdatasync on Linux. Recent versions of the Linux system header files cause xlogdefs.h to believe that open_datasync should be the default sync method, whereas formerly fdatasync was the default on Linux. open_datasync is a bad choice, first because it doesn't actually outperform fdatasync (in fact the reverse), and second because we try to use O_DIRECT with it, causing failures on certain filesystems (e.g., ext4 with data=journal option). This part of the patch is largely per a proposal from Marti Raudsepp. More extensive changes are likely to follow in HEAD, but this is as much change as we want to back-patch. Also clean up confusing code and incorrect documentation surrounding the fsync_writethrough option. Those changes shouldn't result in any actual behavioral change, but I chose to back-patch them anyway to keep the branches looking similar in this area. In 9.0 and HEAD, also do some copy-editing on the WAL Reliability documentation section. Back-patch to all supported branches, since any of them might get used on modern Linux versions.	2010-12-08 20:01:19 -05:00
Heikki Linnakangas	7ff02add83	The GiST scan algorithm uses LSNs to detect concurrent pages splits, but temporary indexes are not WAL-logged. We used a constant LSN for temporary indexes, on the assumption that we don't need to worry about concurrent page splits in temporary indexes because they're only visible to the current session. But that assumption is wrong, it's possible to insert rows and split pages in the same session, while a scan is in progress. For example, by opening a cursor and fetching some rows, and INSERTing new rows before fetching some more. Fix by generating fake increasing LSNs, used in place of real LSNs in temporary GiST indexes.	2010-11-16 11:29:28 +02:00
Marc G. Fournier	2f76a4b5a6	Tag 8.4.5	2010-10-01 10:35:31 -03:00
Tom Lane	dc9cc887b7	Fix PlaceHolderVar mechanism's interaction with outer joins. The point of a PlaceHolderVar is to allow a non-strict expression to be evaluated below an outer join, after which its value bubbles up like a Var and can be forced to NULL when the outer join's semantics require that. However, there was a serious design oversight in that, namely that we didn't ensure that there was actually a correct place in the plan tree to evaluate the placeholder :-(. It may be necessary to delay evaluation of an outer join to ensure that a placeholder that should be evaluated below the join can be evaluated there. Per recent bug report from Kirill Simonov. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.	2010-09-28 14:15:42 -04:00
Magnus Hagander	40f34ec4fd	Convert cvsignore to gitignore, and add .gitignore for build targets.	2010-09-22 12:57:08 +02:00
Tom Lane	e0664d5cdc	Fix up flushing of composite-type typcache entries to be driven directly by SI invalidation events, rather than indirectly through the relcache. In the previous coding, we had to flush a composite-type typcache entry whenever we discarded the corresponding relcache entry. This caused problems at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report from Jeff Davis, and might result in real-world problems given the kind of unexpected relcache flush that that test mechanism is intended to model. The new coding decouples relcache and typcache management, which is a good thing anyway from a structural perspective. The cost is that we have to search the typcache linearly to find entries that need to be flushed. There are a couple of ways we could avoid that, but at the moment it's not clear it's worth any extra trouble, because the typcache contains very few entries in typical operation. Back-patch to 8.2, the same as some other recent fixes in this general area. The patch could be carried back to 8.0 with some additional work, but given that it's only hypothetical whether we're fixing any problem observable in the field, it doesn't seem worth the work now.	2010-09-02 03:16:59 +00:00
Tom Lane	c472e780a3	Fix an additional set of problems in GIN's handling of lossy page pointers. Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before. Per followup investigation of a gripe from Artur Dabrowski. An example of a query that failed given his data set is select count() from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee: \| dd:')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:')); Back-patch to 8.4 where the lossy pointer code was introduced.	2010-08-01 19:16:55 +00:00
Tom Lane	8d3487dacc	Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem() routines to make them behave better in the presence of "lossy" index pointers. The previous coding was outright incorrect for some cases, as recently reported by Artur Dabrowski: scanGetItem would fail to return index entries in cases where one index key had multiple exact pointers on the same page as another key had a lossy pointer. Also, keyGetItem was extremely inefficient for cases where a single index key generates multiple "entry" streams, such as an @@ operator with a multiple-clause tsquery. The presence of a lossy page pointer in any one stream defeated its ability to use the opclass consistentFn, resulting in probing many heap pages that didn't really need to be visited. In Artur's example case, a query like WHERE tsvector @@ to_tsquery('a & b') was about 50X slower than the theoretically equivalent WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b') The way that I chose to fix this was to have GIN call the consistentFn twice with both TRUE and FALSE values for the in-doubt entry stream, returning a hit if either call produces TRUE, but not if they both return FALSE. The code handles this for the case of a single in-doubt entry stream, but punts (falling back to the stupid behavior) if there's more than one lossy reference to the same page. The idea could be scaled up to deal with multiple lossy references, but I think that would probably be wasted complexity. At least to judge by Artur's example, such cases don't occur often enough to be worth trying to optimize. Back-patch to 8.4. 8.3 did not have lossy GIN index pointers, so not subject to these problems.	2010-07-31 00:31:12 +00:00
Tom Lane	6b8494a3ad	Improved version of patch to protect pg_get_expr() against misuse: look through join alias Vars to avoid breaking join queries, and move the test to someplace where it will catch more possible ways of calling a function. We still ought to throw away the whole thing in favor of a data-type-based solution, but that's not feasible in the back branches. Completion of back-port of my patch of yesterday.	2010-07-30 17:56:59 +00:00
Tom Lane	5a188dcb87	Fix potential failure when hashing the output of a subplan that produces a pass-by-reference datatype with a nontrivial projection step. We were using the same memory context for the projection operation as for the temporary context used by the hashtable routines in execGrouping.c. However, the hashtable routines feel free to reset their temp context at any time, which'd lead to destroying input data that was still needed. Report and diagnosis by Tao Ma. Back-patch to 8.1, where the problem was introduced by the changes that allowed us to work with "virtual" tuples instead of materializing intermediate tuple values everywhere. The earlier code looks quite similar, but it doesn't suffer the problem because the data gets copied into another context as a result of having to materialize ExecProject's output tuple.	2010-07-28 04:51:08 +00:00
Heikki Linnakangas	0f0b236b03	The previous fix in CVS HEAD and 8.4 for handling the case where a cursor being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane pointed out. The bug affects FOR statement variants too, because you can close an implicitly created cursor too by guessing the "<unnamed portal X>" name created for it. To fix that, "pin" the portal to prevent it from being dropped while it's being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is the oldest supported version.	2010-07-05 09:27:24 +00:00
Robert Haas	bf53e7938d	Allow REASSIGNED OWNED to handle opclasses and opfamilies. Backpatch to 8.3, which is as far back as we have opfamilies. The opclass portion could probably be backpatched to 8.2, when REASSIGN OWNED was added, but for now I have not done that. Asko Tiidumaa, with minor adjustments by me.	2010-07-03 13:53:26 +00:00
Marc G. Fournier	c302ed9e4e	tag 8.4.4	2010-05-14 03:20:06 +00:00
Tom Lane	9b69647824	Fix "constraint_exclusion = partition" logic so that it will also attempt constraint exclusion on an inheritance set that is the target of an UPDATE or DELETE query. Per gripe from Marc Cousin. Back-patch to 8.4 where the feature was introduced.	2010-03-30 21:58:18 +00:00
Alvaro Herrera	08729b4e8a	Prevent ALTER USER f RESET ALL from removing the settings that were put there by a superuser -- "ALTER USER f RESET setting" already disallows removing such a setting. Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database owner that's not superuser.	2010-03-25 14:44:51 +00:00
Marc G. Fournier	d6c7c7c6bc	tag 8.4.3	2010-03-12 03:23:23 +00:00
Tom Lane	3c93c3ab95	Export xml.c's libxml-error-handling support so that contrib/xml2 can use it too, instead of duplicating the functionality (badly). I renamed xml_init to pg_xml_init, because the former seemed just a bit too generic to be safe as a global symbol. I considered likewise renaming xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably made it sufficiently PG-centric already.	2010-03-03 17:29:53 +00:00
Tom Lane	ed62e74522	Alter the configure script to fail immediately if the C compiler does not provide a working 64-bit integer datatype. As recently noted, we've been broken on such platforms since early in the 8.4 development cycle. Since it took nearly two years for anyone to even notice, it seems that the rationale for continuing to support such platforms has reached the point of non-existence. Rather than thrashing around to try to make it work again, we'll just admit up front that this no longer works. Back-patch to 8.4 since that branch is also broken. We should go around to remove INT64_IS_BUSTED support, but just in HEAD, so that seems like material for a separate commit.	2010-01-07 00:25:18 +00:00
Tom Lane	c5afcf9040	Add support for doing FULL JOIN ON FALSE. While this is really a rather peculiar variant of UNION ALL, and so wouldn't likely get written directly as-is, it's possible for it to arise as a result of simplification of less-obviously-silly queries. In particular, now that we can do flattening of subqueries that have constant outputs and are underneath an outer join, it's possible for the case to result from simplification of queries of the type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality regression for this type of query.	2010-01-05 23:25:44 +00:00
Tom Lane	e5fddc5290	Fix a bug introduced when set-returning SQL functions were made inline-able: we have to cope with the possibility that the declared result rowtype contains dropped columns. This fails in 8.4, as per bug #5240. While at it, be more paranoid about inserting binary coercions when inlining. The pre-8.4 code did not really need to worry about that because it could not inline at all in any case where an added coercion could change the behavior of the function's statement. However, when inlining a SRF we allow sorting, grouping, and set-ops such as UNION. In these cases, modifying one of the targetlist entries that the sort/group/setop depends on could conceivably change the behavior of the function's statement --- so don't inline when such a case applies.	2009-12-14 02:16:04 +00:00
Marc G. Fournier	5cc7c13022	tag for 8.4.2	2009-12-10 02:56:56 +00:00
Tom Lane	42ba393b59	Prevent indirect security attacks via changing session-local state within an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136	2009-12-09 21:58:04 +00:00
Heikki Linnakangas	d1c5bdf520	Fix bug in temporary file management with subtransactions. A cursor opened in a subtransaction stays open even if the subtransaction is aborted, so any temporary files related to it must stay alive as well. With the patch, we use ResourceOwners to track open temporary files and don't automatically close them at subtransaction end (though in the normal case temporary files are registered with the subtransaction resource owner and will therefore be closed). At end of top transaction, we still check that there's no temporary files marked as close-at-end-of-transaction open, but that's now just a debugging cross-check as the resource owner cleanup should've closed them already.	2009-12-03 11:03:35 +00:00
Heikki Linnakangas	5e4b869f29	Fix an old bug in multixact and two-phase commit. Prepared transactions can be part of multixacts, so allocate a slot for each prepared transaction in the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer the oldest member value from the current backends slot to the prepared xact slot. Also save and recover the value from the 2pc state file. The symptom of the bug was that after a transaction prepared, a shared lock still held by the prepared transaction was sometimes ignored by other transactions. Fix back to 8.1, where both 2PC and multixact were introduced.	2009-11-23 09:58:51 +00:00
Magnus Hagander	8467f3c5a1	Add inheritable ACE when creating a restricted token for execution on Win32. Also refactor the code around it to be more clear. Jesse Morris	2009-11-14 15:39:45 +00:00
Alvaro Herrera	83d4698cc9	Fix longstanding problems in VACUUM caused by untimely interruptions In VACUUM FULL, an interrupt after the initial transaction has been recorded as committed can cause postmaster to restart with the following error message: PANIC: cannot abort transaction NNNN, it was already committed This problem has been reported many times. In lazy VACUUM, an interrupt after the table has been truncated by lazy_truncate_heap causes other backends' relcache to still point to the removed pages; this can cause future INSERT and UPDATE queries to error out with the following error message: could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes The window to this race condition is extremely narrow, but it has been seen in the wild involving a cancelled autovacuum process. The solution for both problems is to inhibit interrupts in both operations until after the respective transactions have been committed. It's not a complete solution, because the transaction could theoretically be aborted by some other error, but at least fixes the most common causes of both problems.	2009-11-10 18:00:30 +00:00
Tom Lane	01adc8afd5	Dept of second thoughts: after studying index_getnext() a bit more I realize that it can scribble on scan->xs_ctup.t_self while following HOT chains, so we can't rely on that to stay valid between hashgettuple() calls. Introduce a private variable in HashScanOpaque, instead.	2009-11-01 22:31:02 +00:00
Tom Lane	891e225a70	Fix two serious bugs introduced into hash indexes by the 8.4 patch that made hash indexes keep entries sorted by hash value. First, the original plans for concurrency assumed that insertions would happen only at the end of a page, which is no longer true; this could cause scans to transiently fail to find index entries in the presence of concurrent insertions. We can compensate by teaching scans to re-find their position after re-acquiring read locks. Second, neither the bucket split nor the bucket compaction logic had been fixed to preserve hashvalue ordering, so application of either of those processes could lead to permanent corruption of an index, in the sense that searches might fail to find entries that are present. This patch fixes the split and compaction logic to preserve hashvalue ordering, but it cannot do anything about pre-existing corruption. We will need to recommend reindexing all hash indexes in the 8.4.2 release notes. To buy back the performance loss hereby induced in split and compaction, fix them to use PageIndexMultiDelete instead of retail PageIndexDelete operations. We might later want to do something with qsort'ing the page contents rather than doing a binary search for each insertion, but that seemed more invasive than I cared to risk in a back-patch. Per bug #5157 from Jeff Janes and subsequent investigation.	2009-11-01 21:25:33 +00:00

1 2 3 4 5 ...

4897 commits